The “Secret Sauce” for Exadata is it’s ability to offload processing to the storage tier. Offloading means that the storage servers can apply predicate filters at the storage layer, instead of shipping every possible block back to the database server(s). Another thing that happens with offloading is that the volume of data returned can be further reduced by column projection (i.e. if you only select 1 column from a 100 column table, there is no need to return the other 99 columns). Offloading is geared to long running queries that access a large amount of data. Offloading only works if the Oracle decides to use it’s direct path read mechanism. Direct path reads have traditionally been done by parallel query slaves but can also be done by serial queries. In fact, as of 11g, Oracle has changed the decision making process resulting in more aggressive use of serial direct path reads. I’ve seen this feature described as “serial direct path reads” and “adaptive direct path reads”.
I’ll digress here a bit to discuss this feature since direct path reads are critical to Exadata Offloading. Direct path reads do not load blocks into Oracle’s buffer cache. Instead, the data is returned directly to the PGA of the process requesting the data. This means that the data does not have to be in Oracle block format. That means no 8K block that is only partially filled, that may only have a record or two that you’re interested in, containing every column including ones you don’t want, and with additional header information - needs to be shipped back up from the storage layer. Instead, a much more compact result set containing only the columns requested and hopefully only the rows you need are returned. As I said, direct path reads are traditionally used by parallel query slaves. They are also used in a few other instances such as LOB access and sorts that spill over into TEMP. So the ability to use direct path reads is very important to the Exadata platform and thus the changes to the make them more attractive in 11g. Here are a few links to info on the subject of serial direct path reads:
Also be aware that direct path reads are only available for full scans (tables or indexes). So any statement that uses an index to get to a row in a table will not use this mechanism. Also keep in mind that direct path requires extra processing to ensure that all blocks on disk are current - (i.e. a check point), so frequently modified tables will suffer some overhead before direct path reads can be initiated.
I must say that I think the changes to the heuristics in 11g may be a little on the aggressive side for non-Exadata platforms (the changes may well be driven by Exadata). And by the way, serial direct path reads are not always faster than the normal reads that go through the buffer cache. Dion Cho has a good post on a performance problem due to serial direct path reads kicking in on one node of an 11g RAC environment (not Exadata). The node doing the direct path reads was running the query much slower than the node using the normal buffer cache reads. He also has a post on turning off serial direct path reads.
But enough about the direct path reads stuff, on to the Offloading. One of the first things I wanted to know when I got my first look at a system running on Exadata was whether a particular query was eligible for offloading and if so, how much of the expected i/o was saved by the feature. So of course I wrote a little script to show me that. Turns out there is plenty of info in V$SQL to see what’s going on. I called the script fsx.sql (short for Find_Sql_eXadata). Here’s a little demo:
SYS@LABRAT1> @test_offload SYS@LABRAT1> -- ran but turned off output SYS@LABRAT1> -- select /* test full */ * from kso.skew3; SYS@LABRAT1> SYS@LABRAT1> select /* test count */ count(col1) from kso.skew3; COUNT(COL1) ----------- 384000036 1 row selected. Elapsed: 00:00:24.04 SYS@LABRAT1> select /* test count(rll) */ count(pk_col), count(col1), count(col2), count(col3), count(col4) from kso.skew3; COUNT(PK_COL) COUNT(COL1) COUNT(COL2) COUNT(COL3) COUNT(COL4) ------------- ----------- ----------- ----------- ----------- 384000048 384000036 384000036 384000036 384000036 1 row selected. Elapsed: 00:00:39.09 SYS@LABRAT1> select /* test max */ max(col1) from kso.skew3; MAX(COL1) ---------- 1000000 1 row selected. Elapsed: 00:00:24.51 SYS@LABRAT1> select /* test max all */ max(pk_col), max(col1), max(col2), max(col3), max(col4) from kso.skew3; MAX(PK_COL) MAX(COL1) MAX(COL2) MAX(COL3) M ----------- ---------- ------------------------------ --------- - 1.0000E+12 1000000 asddsadasd 01-JAN-09 Y 1 row selected. Elapsed: 00:00:57.32 SYS@LABRAT1> select /* test avg */ avg(col1) from kso.skew3; AVG(COL1) ---------- 495000.194 1 row selected. Elapsed: 00:00:30.00 SYS@LABRAT1> select /* test avg filter */ avg(col1) from kso.skew3 where col1 between 2 and 10000; AVG(COL1) ---------- 1 row selected. Elapsed: 00:00:13.97 SYS@LABRAT1> select /* test index */ avg(pk_col) from kso.skew where col1 = 136133; AVG(PK_COL) ----------- 15636133 1 row selected. Elapsed: 00:00:00.00 SYS@LABRAT1> set echo off SYS@LABRAT1> set timing off SYS@LABRAT1> @fsx Enter value for sql_text: % test % Enter value for sql_id: Enter value for inst_id: INST SQL_ID CHILD PLAN_HASH EXECS AVG_ETIME AVG_LIO AVG_PIO AVG_PX OFFLOADABLE IO_SAVED_% SQL_TEXT ----- ------------- ------ ---------- ---------- ------------- ------------ ---------- ------ ----------- ---------- ---------------------------------------- 1 1767x3dw3m8pa 0 2684249835 1 57.31 1,956,226 1,956,219 0 Yes 9.35 select /* test max all */ max(pk_col), m 1 1wp72w4c5d449 0 2684249835 1 29.98 1,956,226 1,956,219 0 Yes 71.84 select /* test avg */ avg(col1) from kso 1 46vd0hducqms8 0 1667163222 1 422.77 2,032,691 1,956,219 0 No .00 select /* test full */ * from kso.skew3 1 5ry8vd8u0n0bf 0 2684249835 1 24.03 1,956,226 1,956,219 0 Yes 71.85 select /* test count */ count(col1) from 1 6jtf4rnxu2dq7 0 2684249835 1 24.50 1,956,226 1,956,219 0 Yes 71.85 select /* test max */ max(col1) from kso 1 8pfxs6mdf62mq 0 3723858078 1 .00 35 0 0 No .00 select /* test index */ avg(pk_col) from 1 abwyftyy395gk 0 2684249835 1 13.96 1,956,226 1,956,219 0 Yes 99.99 select /* test avg filter */ avg(col1) f 1 fxpkhxt4jwqqm 0 2684249835 1 39.08 1,956,226 1,956,219 0 Yes 9.35 select /* test count(rll) */ count(pk_co 8 rows selected. |
So I did several queries. Some were off-loadable and some were not. Then I ran the script to see what how the statements had been handled. There are a couple of things to note here:
It is possible to turn offloading off via a parameter, CELL_OFFLOAD_PROCESSING, or via a hint. Here’s a bit of the Oracle Exadata Storage Server Software User’s Guide:
The CELL_OFFLOAD_PROCESSING initialization parameter enables SQL processing offload to Exadata Cell. When the value of the parameter is set to TRUE, predicate evaluation can be offloaded to cells. The default value of the parameter is TRUE. If the parameter is set to FALSE at the session or system level, then the database performs all the predicate evaluation with cells serving blocks. You can set CELL_OFFLOAD_ PROCESSING dynamically with the SQL ALTER SYSTEM or ALTER SESSION command. For example:
SQL> ALTER SESSION SET CELL_OFFLOAD_PROCESSING = TRUE;
The CELL_OFFLOAD_PROCESSING parameter can also be set with the OPT_PARAM optimizer hint to enable or disable predicate filtering for a specific SQL command. For example:
– to disable cell_offload_processing for a SQL command SELECT /*+ OPT_PARAM(’cell_offload_processing’ ‘false’) */ COUNT(*) FROM EMPLOYEES;
– to enable cell_offload_processing for a SQL command SELECT /*+ OPT_PARAM(’cell_offload_processing’ ‘true’) */ COUNT(*) FROM EMPLOYEES;
And a quick example:
SYS@LABRAT1> select /* test avg filter hint */
2 /*+ OPT_PARAM('cell_offload_processing' 'false') */
3 avg(col1) from kso.skew3 where col1 between 2 and 10000;
AVG(COL1)
----------
1 row selected.
SYS@LABRAT1> @fsx
Enter value for sql_text: % test avg filter %
Enter value for sql_id:
Enter value for inst_id:
INST SQL_ID CHILD PLAN_HASH EXECS AVG_ETIME AVG_LIO AVG_PIO AVG_PX OFFLOADABLE IO_SAVED_% SQL_TEXT
----- ------------- ------ ---------- ---------- ------------- ------------ ---------- ------ ----------- ---------- ----------------------------------------
1 0nb4tjmjuq67t 0 2684249835 1 104.89 1,956,227 1,956,219 0 No .00 select /* test avg filter hint */ /*+ OP
1 abwyftyy395gk 0 2684249835 1 13.96 1,956,226 1,956,219 0 Yes 99.99 select /* test avg filter */ avg(col1) f
2 rows selected. |
So as you can see, the hint disabled offloading and resulted in the statement taking about 10X the elapsed time of the offloaded version.
Now for one more example to whet your appetite for things to come:
SYS@LABRAT1> select count(*) from kso.skew3 where col1 is null;
COUNT(*)
----------
12
1 row selected.
Elapsed: 00:00:00.08
SYS@LABRAT1> @fsx
Enter value for sql_text: %col1 is null%
Enter value for sql_id:
Enter value for inst_id:
INST SQL_ID CHILD PLAN_HASH EXECS AVG_ETIME AVG_LIO AVG_PIO AVG_PX OFFLOADABLE IO_SAVED_% SQL_TEXT
----- ------------- ------ ---------- ---------- ------------- ------------ ---------- ------ ----------- ---------- ----------------------------------------
1 0u1q4b7puqz6g 0 2684249835 1 .08 1,956,226 1,956,219 0 Yes 100.00 select count(*) from kso.skew3 where col
1 row selected.
Elapsed: 00:00:00.09
SYS@LABRAT1> @dplan
Enter value for sql_id: 0u1q4b7puqz6g
Enter value for child_no:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 0u1q4b7puqz6g, child number 0
-------------------------------------
select count(*) from kso.skew3 where col1 is null
Plan hash value: 2684249835
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 533K(100)| |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
|* 2 | TABLE ACCESS STORAGE FULL| SKEW3 | 12 | 60 | 533K (1)| 01:46:38 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - storage("COL1" IS NULL)
filter("COL1" IS NULL)
20 rows selected.
Elapsed: 00:00:00.02 |
Yes that’s a full scan on a 384M row table that occupies about 2M blocks. And it took 8/100’s of a second. That’s 80 milliseconds - roughly the amount of time necessary to do 10-15 single block disk i/o’s.
Wow! Can you say storage index? More on that in the next installment!
Full name
Kerry Osborne
My company
http://www.enkitec.com
Recent comments
21 weeks 1 day ago
31 weeks 2 hours ago
32 weeks 5 days ago
35 weeks 6 days ago
38 weeks 1 day ago
47 weeks 5 days ago
49 weeks 1 day ago
50 weeks 1 day ago
50 weeks 2 days ago
1 year 1 week ago