Blog update 2012.02.28: I’ve received countless inquiries about the storage used in the proof points I’m making in this post. I’d like to state clearly that the storage is not a production product, not a glimpse of something that may eventually become product or any such thing. This is a post about CPU, not about storage. That point will be clear as you read the words in the post.
In my recent article entitled How Many Non-Exadata RAC Licenses Do You Need to Match Exadata Performance I brought up the topic of processor requirements for Oracle with and without Exadata. I find the topic intriguing. It is my opinion that anyone influencing how their company’s Oracle-related IT budget is used needs to find this topic intriguing.
Before I can address the poll in the above-mentioned post I have to lay some groundwork. The groundwork I need to lay will come in this and an unknown number of installments in a series.
There is no value add for Oracle Database on Exadata in the OLTP/ERP use case. Full stop. OLTP/ERP does not offload processing to storage. Your full-rack Exadata configuration has 168 Xeon 5600 cores in the storage grid doing practically nothing in this use case. Or, I should say, the processing that does occur in the Exadata storage cells (in the OLTP/ERP use case) would be better handled in the database host. There simply is no value in introducing off-host I/O handling (and all the associated communication overhead) for random single-block accesses. Additionally, since Exadata cannot scale random writes, it is actually a very weak platform for these use cases. Allow me to explain.
Exadata Random Write I/O
While it is true Exadata offers the bandwidth for upwards of 1.5 million read IOPS (with low latency) in a full rack X2 configuration, the data sheet specification for random writes is a paltry 50,000 gross IOPS—or 25,000 with Automatic Storage Management normal redundancy. Applications do not exhibit 60:1 read to write ratios. Exadata bottlenecks on random writes long before an application can realize the Exadata Smart Flash Cache datasheet random read rates.
Oracle positions Exadata against products like EMC Greenplum for DW/BI/Analytics workloads. I fully understand this positioning because DW/BI is the primary use case for Exadata. In its inception Exadata addressed very important problems related to data flow. The situation as it stands today, however, is that Exadata addresses problems that no longer exist. Once again, allow me to explain.
The Scourge Of The Front-Side Bus Is Ancient History. That’s Important!
It was not long ago that provisioning ample bandwidth to Real Application Clusters for high-bandwidth scans was very difficult. I understand that. I also understand that, back in those days, commodity servers suffered from internal bandwidth problems limiting a server’s data-ingest capability from storage (PCI->CPU core). I speak of servers in the pre-Quick Path Interconnect (Nehalem EP) days. In those days it made little sense to connect more than, say, two active 4GFC fibre channel paths (~800 MB/s) to a server because the data would not flow unimpeded from storage to the processors. The bottleneck was the front-side bus choking off the flow of data from storage to processor cores. This fact essentially forced Oracle’s customers to create larger, more complex clusters for their RAC deployments just to accommodate the needed flow of data (throughput). That is, while some customers toiled with the most basic problems (e.g., storage connectivity), others solved that problem but still required larger clusters to get more front-side buses involved.
It wasn’t really about the processor cores. It was about the bus. Enter Exadata and storage offload processing.
Because the servers of yesteryear had bottlenecks between the storage adapters and the CPU cores (the front-side bus) it was necessary for Oracle to devise a means for reducing payload between storage and RAC host CPUs. Oracle chose to offload the I/O handling (calls to the Kernel for physical I/O), filtration and column projection to storage. This functionality is known as a Smart Scan. Let’s just forget for a moment that the majority of CPU-intensive processing, in a DW/BI query, occurs after filtration and projection (e.g., table joins, sort, aggregation, etc). Shame on me, I digress.
All right, so imagine for a moment that modern servers don’t really need the offload-processing “help” offered by Exadata? What if modern servers can actually handle data at extreme rates of throughput from storage, over PCI and into the processor cores without offloading the lower level I/O and filtration? Well, the answer to that comes down to how many processor cores are involved with the functionality that is offloaded to Exadata. That is a sophisticated topic, but I don’t think we are ready to tackle it yet because the majority of datacenter folks I interact with suffer from a bit of EarthStillFlat(tm) syndrome. That is, most folks don’t know their servers. They still think it takes lots and lots of processor cores to handle data flow like it did when processor cores were held hostage by front-side bus bottlenecks. In short, we can’t investigate how necessary offload processing is if we don’t know anything about the servers we intend to benefit with said offload. After all, Oracle database is the same software whether running on a Xeon 5600-based server in an Exadata rack or a Xeon 5600-based server not in an Exadata rack.
It is possible to know your servers. You just have to measure.
You might be surprised at how capable they are. Why presume modern servers need the help of offloading I/O (handling) and filtration. You license Oracle by the processor core so it is worthwhile knowing what those cores are capable of. I know my server and what it is capable of. Allow me to share a few things I know about my server’s capabilities.
My server is a very common platform as the following screenshot will show. It is a simple 2s12c24t Xeon 5600 (a.k.a. Westmere EP) server:
My server is attached to very high-performance storage which is presented to an Oracle database via Oracle Managed Files residing in an XFS file system in a md(4) software RAID volume. The following screenshot shows this association/hierarchy as well as the fact that the files are accessed with direct, asynchronous I/O. The screenshot also shows that the database is able to scan a table with 1 billion rows (206 GB) in 45 seconds (4.7 GB/s table scan throughput):
The io.sql script accounts for the volume of data that must be ingested to count the billion rows:
$ cat io.sql set timing off col physical_reads_GB format 999,999,999; select VALUE /1024 /1024 /1024 physical_reads_GB from v$sysstat where STATISTIC# = (select statistic# from v$statname where name like '%physical read bytes%'); set timing on
So this simple test shows that a 2s12c24t server is able to process 392 MB/s per processor core. When Exadata was introduced most data centers used 4GFC fibre channel for storage connectivity. The servers of the day were bandwidth limited. If only I could teleport my 2-socket Xeon 5600 server back in time and put it next to an Exadata V1 box. Once there, I’d be able to demonstrate a 2-socket server capable of handling the flow of data from 12 active 4GFC FC HBA ports! I’d be the talk of the town because similar servers of that era could neither connect as many active FC HBAs nor ingest the data flowing over the wires—the front-side bus was the bottleneck. But, the earth does not remain flat.
The following screenshot shows the results of five SQL statements explained as:
This blog series aims to embark on finding good answers to the question I raised in my recent article entitled How Many Non-Exadata RAC Licenses Do You Need to Match Exadata Performance. I’ve explained that offload to Exadata storage consists of payload reduction. I also offered a technical, historical perspective as why that was so important. I’ve also showed that a small, modern QPI-based server can flow data through processor cores at rates ranging from 407 MBPS/core down to 107 MBPS/core depending on what the SQL is doing (SQL with no predicates mind you).
Since payload reduction is the primary value add of Exadata I finished this installment in the series with an example of a simple 2s12c24t Xeon 5600 server filtering out all rows at a rate of 4,882 MB/s—essentially the same throughput as a simple count(*) of all rows as I showed earlier in this post. That is to say that, thus far, I’ve shown that my little lab system can sustain nearly 5GB/s disk throughput whether performing a simple count of rows or filtering out all rows (based on a date predicate). What’s missing here is the processor cost associated with the filtration and I’ll get to that soon enough.
We can’t accurately estimate the benefit of offload until we can accurately associate CPU cost to filtration. I’ll take this blog series to that point over the next few installments—so long as this topic isn’t too boring for my blog readers.
This is part I in the series. At this point I hope you are beginning to realize that modern servers are better than you probably thought. Moreover, I hope my words about the history of front-side bus impact on sizing systems for Real Application Clusters is starting to make sense. If not, by all means please comment.
As this blog series progresses I aim to help folks better appreciate the costs of performing certain aspects of Oracle query processing on modern hardware. The more we know about modern servers the closer we can get to answer the poll more accurately. You license Oracle by the processor core so it behooves you to know such things…doesn’t it?
By the way, modern storage networking has advanced far beyond 4GFC (400 MB/s).
Finally, as you can tell by my glee in scanning Oracle data from an XFS file system at nearly 5GB/s (direct I/O), I’m quite pleased at the demise of the front-side bus! Unless I’m mistaken, a cluster of such servers, with really fast storage, would be quite a configuration.