I ran some SLOB tests over the weekend using the new SLOBv2 kit and noticed some interesting results. I was using SLOB to generate physical I/O but the “anomaly” is best demonstrated by putting SLOB in “Logical I/O mode”, i.e. by having a large enough buffer cache to satisfy all reads.
I’m calling SLOB with the following configuration parameters and 32 worker processes:
UPDATE_PCT=20 RUN_TIME=30000 WORK_LOOP=1000 SCALE=10000 WORK_UNIT=256 REDO_STRESS=HEAVY LOAD_PARALLEL_DEGREE=8 SHARED_DATA_MODULUS=0
Notice that the WORK_LOOP value is non-zero and RUN_TIME is very large – I’m choosing to run a fixed number of SLOB operations (“SLOBops”) rather than letting elapsed time define the length of each test. With WORK_LOOP at 1,000 and 32 worker processes, that should generate 32,000 SLOBops. Since UPDATE_PCT is 20%, I would expect around 32,000 × 20% = 6,400 update transactions, each followed by its own commit. So let’s look at a couple of interesting statistics in the AWR report generated from this run:
Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
redo synch writes                                97            2.0           0.0
user commits                                  6,400          134.5           1.0
That’s exactly the number of user commits we expected. But the number of redo synch writes is interesting…
Redo Synch Writes
When a session places a commit record into the log buffer, it posts the log writer process (LGWR) and then puts itself into a log file sync wait until LGWR notifies it that the record has been written to persistent storage. There are actually times when the session will not post LGWR (because it can see via a flag that LGWR is already writing), but one thing it always does is increment the counter redo synch writes. So in the AWR output above we would expect the number of redo synch writes to match the number of user commits… yet it doesn’t. Why?
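Incidentally, you don’t need a full AWR report to watch these two counters move; they are both exposed in V$SYSSTAT. A minimal sketch (assuming SELECT privilege on the view; note these are instance-wide cumulative totals, so you’d sample before and after a run and take the difference):

```sql
-- Instance-wide cumulative values of the two statistics discussed above.
-- Sample before and after the SLOB run and subtract to get the delta.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('redo synch writes', 'user commits')
ORDER  BY name;
```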
There’s a little-known optimization in Oracle PL/SQL which means that Oracle will not always wait for the log buffer flush to complete, but will instead carry on processing – effectively sacrificing the D (durability) of ACID compliance. This is best explained by Jonathan Lewis in Chapter 6 of his excellent book Oracle Core – if you haven’t read it, consider putting it at the top of your reading list.
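The shape of code that triggers the optimization is easy to reproduce. A minimal sketch (the table T and its columns are hypothetical, purely for illustration – this is not SLOB’s actual workload): each COMMIT inside the loop increments user commits, but Oracle only guarantees the redo is on disk by the time the PL/SQL block returns to the caller, so redo synch writes can stay far below the commit count.

```sql
-- Hypothetical demo table, not part of SLOB:
-- CREATE TABLE t (id NUMBER PRIMARY KEY, n NUMBER);
BEGIN
  FOR i IN 1 .. 1000 LOOP
    UPDATE t SET n = n + 1 WHERE id = MOD(i, 100) + 1;
    COMMIT;  -- inside PL/SQL: no per-iteration log file sync wait
  END LOOP;
END;       -- durability is only guaranteed once the block completes
/
```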
Because SLOB’s engine is a PL/SQL block containing a WHILE … LOOP, Oracle decides that the concept of durability can be rather loosely defined to be at the level of the PL/SQL block rather than the transactions being created within it. According to Jonathan, one way of persuading Oracle not to use this optimization is to use a database link; so let’s modify the slob.sql update statement to include the use of a loopback database link and see if the number of redo synch writes now rises to around 6,400:
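For the record, the change was along these lines – a sketch only, since the link name, credentials, TNS alias and the table/column names here are illustrative rather than SLOB’s actual DDL:

```sql
-- Hypothetical loopback link pointing back at the same database:
CREATE DATABASE LINK loopback
  CONNECT TO slob_user IDENTIFIED BY slob_pwd
  USING 'ORCL';

-- The update in slob.sql then references the table via the link,
-- which defeats the PL/SQL commit optimization:
UPDATE cf1@loopback
SET    c2 = 'XXXXXXXX'
WHERE  custid BETWEEN :v_low AND :v_high;
COMMIT;
```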
Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
redo synch writes                             6,564          302.3           0.5
user commits                                 12,811          590.0           1.0
Indeed it does… but now the number of user commits has doubled, presumably because Oracle is performing a two-phase commit: it doesn’t know that the loopback database link points back to the same database, so it treats each transaction as distributed.
Conclusion
I blogged this because I found it interesting, rather than because I had a point I was trying to prove. However, if there were to be any conclusions to this entry they would be the following:
- SLOB is a great tool for experimenting with the behaviour of Oracle under load
- Jonathan’s Oracle Core book is essential reading for anyone who wants to understand Oracle to a deeper level
It’s probably also worth keeping in mind that SLOB’s use of PL/SQL blocks may result in slightly different behaviour from the log writer than you might see from alternative load generation tools or applications which generate I/O.