I had an interesting little project this morning. Of course it takes longer to write it down than to do actually do it, but it was kind of interesting and since I haven’t done a post in quite some time (and it’s the day before Thanksgiving, so it’s pretty quite at the office anyway) I decided to share. One of the Enkitec guys (Tim Fox) was doing a performance comparison between various platforms (Exadata using it’s IB Storage Network, Oracle Database Appliance (ODA) using it’s direct attached storage, and a standard database on a Dell box using EMC fiber channel attached storage). The general test idea was simple – see how the platforms stacked up for a query that required a full scan of a large table. More specifically, what Tim wanted to see was the relative speed at which the various storage platforms could return data. The expectation was that the direct attached storage would be fastest and the fibre channel storage would be slowest (especially since we only had a single 2G HBA). He tested ODA and Exadata and got basically what he expected, but when he went to test the database on the Dell he was surprised that it was actually faster than either of the other two tests. So here’s some output from the initial tests: First the Exadata. It’s an X2 quarter rack with one extra storage server. Note that we had to set cell_offload_processing to false to turn off the Exadata storage optimizations, thus giving us a measurement of the hardware capabilities without the Exadata offloading.
> !sqlp sqlp SQL*Plus: Release 11.2.0.2.0 Production on Wed Nov 23 11:08:28 2011 Copyright (c) 1982, 2010, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options SYS@DEMO1> @uptime INSTANCE_NAME STARTUP_TIME CURRENT_TIME DAYS SECONDS ---------------- ----------------- ----------------- ------- ---------- DEMO1 07-NOV-2011 12:37 23-NOV-2011 11:08 15.94 1377058 SYS@DEMO1> set sqlprompt "_USER'@'EXADATA'>' " SYS@EXADATA> SYS@EXADATA> ! cat /etc/redhat-release Enterprise Linux Enterprise Linux Server release 5.5 (Carthage) SYS@EXADATA> ! uname -a Linux enkdb03.enkitec.com 2.6.18-194.3.1.0.3.el5 #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux SYS@EXADATA> alter session set "_serial_direct_read"=always; Session altered. SYS@EXADATA> alter session set cell_offload_processing=false; Session altered. SYS@EXADATA> set autotrace on SYS@EXADATA> set timing on SYS@EXADATA> select count(*) from instructor.class_sales; COUNT(*) ---------- 90000000 Elapsed: 00:00:43.01 Execution Plan ---------------------------------------------------------- Plan hash value: 3145879882 ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 314K (1)| 00:00:02 | | 1 | SORT AGGREGATE | | 1 | | | | 2 | TABLE ACCESS STORAGE FULL| CLASS_SALES | 90M| 314K (1)| 00:00:02 | ---------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 1168567 consistent gets 1168557 physical reads 0 redo size 526 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SYS@EXADATA> set autotrace off SYS@EXADATA> @fss Enter value for sql_text: select count(*) from instructor.class_sales Enter value for sql_id: SQL_ID CHILD EXECS AVG_ROWS AVG_ETIME AVG_CPU AVG_PIO AVG_LIO SQL_TEXT ------------- ------ ---------- ---------- ------------- ------------- ------------- ------------ ---------------------------------------- b2br1x82p9862 0 1 1 43.00 3.16 1,168,557.00 1,168,567 select count(*) from instructor.class_sa Elapsed: 00:00:00.08 |
So the test on the Exadata took 43 seconds to read and transport roughly 1 million 8K blocks. The same test on the ODA looked like this:
[oracle@patty scripts]$ !rl rlwrap sqlplus / as sysdba SQL*Plus: Release 11.2.0.2.0 Production on Wed Nov 23 10:16:51 2011 Copyright (c) 1982, 2010, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options INSTANCE_NAME STARTUP_TIME CURRENT_TIME DAYS SECONDS ---------------- -------------------------- -------------------------- ------- ---------- ODA1 22-NOV-2011 12:23 23-NOV-2011 10:16 .91 78793 SYS@ODA1> ! cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.5 (Tikanga) SYS@ODA1> ! uname -a Linux patty 2.6.18-194.32.1.0.1.el5 #1 SMP Tue Jan 4 16:26:54 EST 2011 x86_64 x86_64 x86_64 GNU/Linux SYS@ODA1> set timing on SYS@ODA1> alter session set '_serial_direct_read'=always; Session altered. Elapsed: 00:00:00.00 SYS@ODA1> set autotrace on SYS@ODA1> select count(*) from instructor.class_sales; COUNT(*) ---------- 90000000 Elapsed: 00:00:30.60 Execution Plan ---------------------------------------------------------- Plan hash value: 3145879882 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 317K (1)| 00:00:05 | | 1 | SORT AGGREGATE | | 1 | | | | 2 | TABLE ACCESS FULL| CLASS_SALES | 90M| 317K (1)| 00:00:05 | -------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 99 recursive calls 0 db block gets 1154080 consistent gets 1153994 physical reads 516 redo size 526 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 7 sorts (memory) 0 sorts (disk) 1 rows processed SYS@ODA1> set autotrace off SYS@ODA1> @fss Enter value for sql_text: select count(*) from instructor.class_sales Enter value for sql_id: SQL_ID CHILD PLAN_HASH_VALUE EXECS ROWS_PROCESSED AVG_ETIME AVG_CPU AVG_PIO AVG_LIO SQL_TEXT ------------- ------ --------------- ---------- -------------- ---------- ---------- ---------- ---------- ---------------------------------------- b2br1x82p9862 0 3145879882 1 1 30.55 13.91 1,153,994 1,154,080 select count(*) from instructor.class_sa |
As expected, the direct attached disk was faster than moving the blocks across the IB network. It took about 30 seconds to read roughly the same number of blocks. The same test on the Dell produced this output:
[osborne@homer scripts]$ sqlp SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 23 11:20:20 2011 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SYS@EXADATA> set sqlprompt "'_USER'@'DELL'>'" SYS@DELL> !cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.5 (Tikanga) SYS@DELL> ! uname -a Linux homer.enkitec.com 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux SYS@DELL> alter session set '_serial_direct_read'=always; Session altered. SYS@DELL> set autotrace on SYS@DELL> set timing on SYS@DELL> select count(*) from instructor.class_sales; COUNT(*) ---------- 90000000 Elapsed: 00:00:11.31 Execution Plan ---------------------------------------------------------- Plan hash value: 3145879882 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 315K (1)| 01:03:08 | | 1 | SORT AGGREGATE | | 1 | | | | 2 | TABLE ACCESS FULL| CLASS_SALES | 90M| 315K (1)| 01:03:08 | -------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 77 recursive calls 0 db block gets 1168660 consistent gets 1168569 physical reads 0 redo size 526 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 10 sorts (memory) 0 sorts (disk) 1 rows processed SYS@DELL> set autotrace off SYS@DELL> @fss Enter value for sql_text: select count(*) from instructor.class_sales Enter value for sql_id: SQL_ID CHILD EXECS AVG_ROWS AVG_ETIME AVG_CPU AVG_PIO AVG_LIO SQL_TEXT ------------- ------ ---------- ---------- ------------- ------------- ------------- ------------ ---------------------------------------- b2br1x82p9862 0 1 1 11.29 11.25 1,168,569.00 1,168,660 select count(*) from instructor.class_sa Elapsed: 00:00:00.03 |
This is where the results were surprising. Tim expected the single 2G HBA to be considerably slower than the other two set ups, but it only took 11 seconds on the Dell / EMC set up to read the same data. So this is where I got to help Tim try to make it slower! (there’s a first time for everything) So what gives? The first thought was that Oracle was caching the data: But no – you’ll notice that we set the _serial_direct_read parameter to always, which forces direct path reads and bypasses the buffer cache altogether. Also, the stats bear out that Oracle is doing physical reads. But 1 Million real i/o’s can’t happen in 11 seconds so it’s got to be memory access right? (I’m sure you’re way ahead of me by now) The answer of course is "Right – it is memory access!" – but it’s the Linux file system cache not the Oracle buffer cache. A quick check of the Linux memory usage showed us that the file cache was over 20G and the table was less than 10G. So Oracle must be set up without Direct I/O.
SYS@DELL> !cat /proc/meminfo MemTotal: 32948892 kB MemFree: 2769420 kB Buffers: 615664 kB Cached: 24727756 kB SwapCached: 101844 kB Active: 21322184 kB Inactive: 5261948 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 32948892 kB LowFree: 2769420 kB SwapTotal: 34996216 kB SwapFree: 34147776 kB Dirty: 448 kB Writeback: 0 kB AnonPages: 1138828 kB Mapped: 4758828 kB Slab: 1088356 kB PageTables: 167464 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 51470660 kB Committed_AS: 15637340 kB VmallocTotal: 34359738367 kB VmallocUsed: 371108 kB VmallocChunk: 34359367199 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 Hugepagesize: 2048 kB SYS@DELL> !realfreemem.sh -a Free Memory: 2703M Cached Memory: 24148M Total Free Memory: 26851M Total Memory: 32176M Percent Memory Free (including cache): 83% SYS@DELL> @table_size Enter value for owner: INSTRUCTOR Enter value for table_name: CLASS_SALES Enter value for type: OWNER SEGMENT_NAME TYPE TOTALSIZE_MEGS TABLESPACE_NAME -------------------- ------------------------------ ------------------ -------------- ------------------------------ INSTRUCTOR CLASS_SALES TABLE 9,151.0 ODACOMP_DATA -------------- sum 9,151.0 Elapsed: 00:00:00.04 SYS@DELL> -- so how is Oracle configured for I/O? SYS@DELL> @parms Enter value for parameter: filesys Enter value for isset: Enter value for show_hidden: NAME VALUE ISDEFAUL ISMODIFIED ISSET -------------------------------------------------- ---------------------------------------------------------------------- -------- ---------- ---------- filesystemio_options NONE TRUE FALSE FALSE Elapsed: 00:00:00.01 |
So the database was not configured to use direct i/o or async i/o (filesystemio_options=none). A quick check with strace verified that direct i/o was not being used. So we modified the filesystemio_options setting and tried again.
SYS@DELL> alter system set filesystemio_options=setall scope=spfile; System altered. SYS@DELL> startup force ORACLE instance started. Total System Global Area 4910620672 bytes Fixed Size 2236648 bytes Variable Size 2298482456 bytes Database Buffers 2600468480 bytes Redo Buffers 9433088 bytes Database mounted. Database opened. SYS@ODACOMP> SYS@ODACOMP> @uptime INSTANCE_NAME STARTUP_TIME CURRENT_TIME DAYS SECONDS ---------------- -------------------------- -------------------------- ------- ---------- ODACOMP 23-NOV-2011 12:45 23-NOV-2011 12:47 .00 121 SYS@ODACOMP> set sqlprompt "'_USER'@'DELL'>'" SYS@DELL> SYS@DELL> ! cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.5 (Tikanga) SYS@DELL> ! uname -a Linux homer.enkitec.com 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux SYS@DELL> set timing on SYS@DELL> alter session set '_serial_direct_read'=always; Session altered. Elapsed: 00:00:00.00 SYS@DELL> set autotrace on SYS@DELL> select count(*) from instructor.class_sales; COUNT(*) ---------- 90000000 Elapsed: 00:01:39.73 Execution Plan ---------------------------------------------------------- Plan hash value: 3145879882 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 315K (1)| 01:03:08 | | 1 | SORT AGGREGATE | | 1 | | | | 2 | TABLE ACCESS FULL| CLASS_SALES | 90M| 315K (1)| 01:03:08 | -------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 1168567 consistent gets 1168557 physical reads 0 redo size 526 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SYS@DELL> set autotrace off SYS@DELL> @fss Enter value for sql_text: select count(*) from instructor.class_sales Enter value for sql_id: SQL_ID CHILD EXECS AVG_ROWS AVG_ETIME AVG_CPU AVG_PIO AVG_LIO SQL_TEXT ------------- ------ ---------- ---------- ------------- ------------- ------------- ------------ ---------------------------------------- b2br1x82p9862 0 1 1 99.71 7.32 1,168,557.00 1,168,567 select count(*) from instructor.class_sa Elapsed: 00:00:00.05 |
Ah that’s more like it. Roughly 100 seconds now to complete the query. That’s 10X slower. Nice job. It feels so good when you can make something go slower. ;) I should note that in general, it is a very good idea to set filesystemio_options=SETALL. So please don’t jump to the conclusion that setting this parameter to NONE will make your database run faster. These days, the memory is generally better used by Oracle than by the file system cache. Glenn Fawcett has a good (if somewhat dated) post which compares the effects of caching blocks in the file system cache vs. in the Oracle buffer cache in Solaris. As you might image, allowing Oracle to use the memory is generally more effective as shown in his results. So that’s it for today. By the way, this post took 4 times as long to write as the actual testing took. Maybe I’ll get better at it if I start practicing a little more often.
Full name
Kerry Osborne
My company
http://www.enkitec.com
Recent comments
21 weeks 1 day ago
30 weeks 6 days ago
32 weeks 4 days ago
35 weeks 5 days ago
38 weeks 14 hours ago
47 weeks 4 days ago
49 weeks 1 day ago
50 weeks 1 day ago
50 weeks 2 days ago
1 year 6 days ago