Top 60 Oracle Blogs

Recent comments

No /proc/diskstats Does Not Track **Your** Physical I/O Requests

You have applications that scan disk using large sequential reads so you take a peek at /proc/diskstats (field #4 on modern Linux distributions) before and after your test in order to tally up the number of reads your application performed. That’s ok. That’s also a good way to get erroneous data.

Your application makes calls for I/O transfers of a particular size. The device drivers for your storage might not be able to accommodate your transfer request in a single DMA and will therefore “chop it up” into multiple transfers. This is quite common with Fibre Channel device drivers where, for example, I/O requests larger than, say, 256KB get rendered into multiple 256KB transfers in the kernel.

This is not a new phenomenon. However, folks may not naturally expect how stats in /proc/diskstats reflect this phenomenon.

The following screen shot shows a simple shell script I wrote to illustrate the point I’m making. The script is very simple. As the screen shot shows, the script will execute a single dd(1) command with direct I/O for 1,000 reads of sizes varying from 4KB to 1024KB.


#000000;" src="" alt="" width="500" height="337" srcset=" 500w, 150w, 300w, 757w" sizes="(max-width: 500px) 100vw, 500px" />

First, I executed the script by pointing it to a file in an XFS file system on an NVMe drive. As the next screenshot shows, diskstats accurately tallied the application reads until the I/O sizes were larger than 256KB. We can see that the device is only taking DMA requests of up to 256KB. When the script was conducting 512KB and 1024KB I/O requests the count of reads doubled and quadrupled, respectively, as per /proc/diskstats.

#000000;" src="" alt="" width="500" height="207" srcset=" 500w, 150w, 300w, 768w, 835w" sizes="(max-width: 500px) 100vw, 500px" />

I’m sure readers are wondering if the file system (XFS) might be muddying the water. It is not. The following screen shot shows scanning the device directly and the resultant diskstats data remained the same as it was when I scanned a file on that device.


#000000;" src="" alt="" srcset=" 480w, 150w, 300w" sizes="(max-width: 480px) 100vw, 480px" />


This was not meant to be a life-changing blog post. The main point I’m sharing is that it’s important to not confuse your I/O requests with the I/O requests of the kernel. You can scan with whatever read size you’d like. However, the kernel has to abide by the transfer size limit announced by the device driver. To that end, you have your reads and the kernel has its reads. Just remember that /proc/diskstats is tracking the kernel’s read requests–not yours.