Search

OakieTags

Who's online

There are currently 0 users and 38 guests online.

Recent comments

Affiliations

V$CELL_THREAD_HISTORY – “ASH” for Exadata Storage Cells

Did you know that there’s something like Active Session History also in the Exadata storage cells? ;-)

The V$CELL_THREAD_HISTORY view is somewhat like V$ACTIVE_SESSION_HISTORY, but it’s measuring thread activity in the Exadata Storage Cells:

SQL> @desc v$cell_thread_history
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      CELL_NAME                                VARCHAR2(1024)
    2      SNAPSHOT_ID                              NUMBER
    3      SNAPSHOT_TIME                            DATE
    4      THREAD_ID                                NUMBER
    5      JOB_TYPE                                 VARCHAR2(32)
    6      WAIT_STATE                               VARCHAR2(32)
    7      WAIT_OBJECT_NAME                         VARCHAR2(32)
    8      SQL_ID                                   VARCHAR2(13)
    9      DATABASE_ID                              NUMBER
   10      INSTANCE_ID                              NUMBER
   11      SESSION_ID                               NUMBER
   12      SESSION_SERIAL_NUM                       NUMBER

It keeps about 10 minutes worth of samples of Exadata Storage Cell thread activity:

SQL> @minmax snapshot_time v$cell_thread_history
Show min/max (low/high) values in column "snapshot_time" of table v$cell_thread_history...

MIN(SNAPSHOT_TIME MAX(SNAPSHOT_TIME
----------------- -----------------
20130419 14:42:15 20130419 14:52:54

Note that it’s not the V$ view or the database instance which stores this array – it’s the storage cells themselves. If you query the v$cell_thread_history view, your Oracle database session is going to “gather” this instrumentation data from all the required cells and present it to you, that’s why the “cell statistics gather” wait event shows up:

PARSING IN CURSOR #140596385017248 len=42 dep=0 uid=0 oct=3 lid=0 tim=1366404896817011 hv=4063158547 ad='19e452578' sqlid='63awy1gg
t2xs8m'
select count(*) from v$cell_thread_history
END OF STMT
PARSE #140596385017248:c=0,e=115,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4272803188,tim=1366404896817010
EXEC #140596385017248:c=0,e=27,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4272803188,tim=1366404896817127
WAIT #140596385017248: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=40 tim=1366404896817152
WAIT #140596385017248: nam='cell statistics gather' ela= 283 cellhash#=0 p2=0 p3=0 obj#=40 tim=1366404896818846
WAIT #140596385017248: nam='cell statistics gather' ela= 352 cellhash#=0 p2=0 p3=0 obj#=40 tim=1366404896819317
WAIT #140596385017248: nam='cell statistics gather' ela= 376 cellhash#=0 p2=0 p3=0 obj#=40 tim=1366404896820929
WAIT #140596385017248: nam='cell statistics gather' ela= 326 cellhash#=0 p2=0 p3=0 obj#=40 tim=1366404896822198
WAIT #140596385017248: nam='cell statistics gather' ela= 580 cellhash#=0 p2=0 p3=0 obj#=40 tim=1366404896823620
...

Now, how frequently do the cells sample their “ASH” data:

SQL> SELECT DISTINCT (snapshot_time - LAG(snapshot_time) 
                      OVER (ORDER BY snapshot_time)) * 86400 lag_seconds
     FROM (SELECT distinct snapshot_time 
           FROM v$cell_thread_history WHERE cell_name = '192.168.12.3');

LAG_SECONDS
-----------

          1

Looks like the sampling is done exactly once per second!

So, great, what can we do with this data?

Here’s an Exadata performance script (cth.sql) which uses V$CELL_THREAD_HISTORY to dig into cell activity from the database layer.

First, I must warn you, the CTH (Cell Thread History) is not as powerful as the Database-level ASH (ASH has almost 100 columns in it already, but the v$cell_thread_history has only 12) and there seem to be some shortcomings in the instrumentation data, as you will see.
But it still allows us to drill down a bit deeper into Exadata storage cell activity straight from your favorite database access tool. And it also has database session-level details in it – which means this information can be used for systematic Exadata performance troubleshooting too!

As the database sessions do communicate the SQL_ID of the currently active SQL statement to the storage cells when sending them requests, we can list the TOP activity by SQL_ID inside the storage cells!

The cth.sql script takes 4 arguments:

  1. The columns to group the CTH activity data by
  2. The filter settings (which sessions, instances, sql_ids to report?)
  3. The begin time range to report
  4. The end time range to report

Note that the CTH view will only have about 10 minutes of history in it (unless the cells are almost idle). For longer-term analysis you might want to gather the history to some table every couple of minutes.

Also, the snapshot_time column shows the CELL time, so if there’s significant clock drift between the storage cells and DB nodes, you might measure the wrong range of data.

Anyway, let’s see the “TOP SQL” in all storage cells:

SQL> @exadata/cth sql_id 1=1 sysdate-1 sysdate
Querying V$CELL_THREAD_HISTORY ("ASH" for Storage Cells) ...

   SECONDS SQL_ID
---------- -------------
      1755
       977 akbpwk3j38vw7
       146 25y04n1twszyu
       135 98pza6kpfwczj
        14 5n83a4q202674
        12 0w2qpuc6u2zsp
        12 147a57cxq3w5y
         7 axdkf5tjztmy7
         6 apgb2g9q2zjh1
         2 dcq9a12vtcnuw
         2 a9gvfh5hx9u98
         1 f318xdxdn0pdc
         1 0ws7ahf1d78qa

13 rows selected.

Immediately we see that majority of the time consumed does not have any SQL ID associated with it, but we will get back to this later. For now, let’s pick the TOP SQL_ID and drill down into it further, by grouping the data by some more columns:

SQL> @exadata/cth sql_id,job_type,wait_state,wait_object_name sql_id='akbpwk3j38vw7' sysdate-1 sysdate
Querying V$CELL_THREAD_HISTORY ("ASH" for Storage Cells) ...

   SECONDS SQL_ID        JOB_TYPE                         WAIT_STATE                       WAIT_OBJECT_NAME
---------- ------------- -------------------------------- -------------------------------- --------------------------------
       868 akbpwk3j38vw7 PredicateFilter                  working
        52 akbpwk3j38vw7 PredicateDiskRead                working
        45 akbpwk3j38vw7 PredicateCacheGet                waiting_for_SKGXP_send
         7 akbpwk3j38vw7 CacheGet                         working
         6 akbpwk3j38vw7 PredicateCacheGet                working
         3 akbpwk3j38vw7 CacheGet                         waiting_for_SKGXP_send
         2 akbpwk3j38vw7 PredicateDiskRead                waiting_for_mutex                gdisk ext
         2 akbpwk3j38vw7 CacheGet                         waiting_for_mutex                RemoteSendPort
         1 akbpwk3j38vw7 PredicateFilter                  waiting_for_mutex                PredicateDisk-1
         1 akbpwk3j38vw7 PredicateCachePut                working

Now we have a breakdown of the storage cell thread activity, when (knowingly) doing work for the SQL_ID akbpwk3j38vw7. The job type shows the “phase” of execution in the data flow inside a storage cell. I think I need to write a separate post about the cell execution phases, but for now, the PredicateDiskRead, as the name says, deals with setting up the IOs for disk reads, the PredicateCacheGet is what deals with the (smart) IO requests handed to storage cells by DB sessions (and sending the results back too) and the PredicateFilter is the phase which does the actual processing of just read datablocks in storage cells (logically “opening” the block and passing it through cache, transactional layers to the data layer functions, which extract the required columns of the matching rows of any predicates offloaded to the storage cells). If you wonder where are the I/O wait states – it looks like for (at least) smart scans, thanks to asynchronous I/O the actual I/O waiting/reaping is done by separate threads, which do not “belong” to a single SQL_ID, but track the completion of all I/Os submitted by all threads independently. Cellsrv is a single (multithreaded) process, so it’s easy to separate tasks within the process. This is also how you avoid keeping all 100+ cellsrv threads “occupied” in case of slow IO.

The wait state column is more complicated than the database layer’s session wait state “waiting or not waiting” logic. There are no separate wait events listed in the CTH, but the wait state itself contains the reasons for some waits. Working means ON CPU (or trying to be on CPU if the OS scheduler hasn’t put the thread there yet) and all the waiting_ states, like the name says, mean waiting for something, like a next request over the network (waiting_for_SKGXP_receive) or asynchronous IO completion (waiting_for_IO_reap). Note that Exadata cellsrv architecture is radically different from what you’re used to from the Database world, it’s a multithreaded job queuing and scheduling engine where one thread may submit IOs and some completely different thread will be the one who checks for the asynchronous IO completion (IO reaping), while a yet another thread does the actual filter processing (offloaded execution) tasks. So don’t expect the cellsrv threads serving your smart scans to actually wait for any I/O, because there are independent threads doing I/O reaping for all incoming requests. If you want more deep details – I do talk about the cellsrv internals and data flow in my Advanced Exadata Performance seminar and also Roger MacNicol from Exadata Development will deliver an Exadata SmartScan Deep Dive presentation at Enkitec E4 in August this year (2013) ;-)

Here’s the (probably) full list of wait states from Oracle 11.2.0.3 BP11 running on Cellsrv 11.2.3.2.1)

initial
working
waiting_in_scheduler
waiting_for_mutex
waiting_for_condition_variable
waiting_for_IO_reap
waiting_for_SKGXP_receive
waiting_for_SKGXP_send
looking_for_job
waiting_for_msg_hdr
waiting_for_system_work
waiting_for_offline_done
waiting_for_storidx_lock
waiting_for_issued_ios
waiting_for_buffer_allocation
waiting_for_connect
waiting_for_flashcache_init
waiting_for_io_quiescence
waiting_for_io_draining

Here’s another example, where I show all activity in all storage cells, by SQL_ID (if known), the database_id, instance_id and the session_id of the database session:

SQL> @exadata/cth sql_id,database_id,session_id,job_type,wait_state 1=1 sysdate-1/24/60 sysdate
Querying V$CELL_THREAD_HISTORY ("ASH" for Storage Cells) ...

   SECONDS SQL_ID        DATABASE_ID SESSION_ID JOB_TYPE                         WAIT_STATE                       MIN(SNAPSHOT_TIME MAX(SNAPSHOT_TIME
---------- ------------- ----------- ---------- -------------------------------- -------------------------------- ----------------- -----------------
      1301 0qmhpg330zqfu  1538629110       2916 PredicateFilter                  working                          20130514 09:20:11 20130514 09:21:10
       313 0qmhpg330zqfu  1538629110          5 PredicateFilter                  working                          20130514 09:20:11 20130514 09:21:10
       242 0qmhpg330zqfu  1538629110        203 PredicateFilter                  working                          20130514 09:20:11 20130514 09:21:10
        73                         0          0 UnidentifiedJob                  waiting_for_IO_reap              20130514 09:20:11 20130514 09:21:10
         9                         0          0 UnidentifiedJob                  working                          20130514 09:20:13 20130514 09:20:58
         4                1538629110        972 CachePut                         working                          20130514 09:20:17 20130514 09:21:04
         3                         0          0 NetworkRead                      working                          20130514 09:20:17 20130514 09:20:34
         2                1538629110        584 CachePut                         working                          20130514 09:20:13 20130514 09:20:59
         2                1538629110        492 CachePut                         working                          20130514 09:20:26 20130514 09:20:52
         2 0qmhpg330zqfu  1538629110          5 PredicateDiskRead                working                          20130514 09:20:58 20130514 09:20:58
         1                         0          0 IO poor performance detection th                                  20130514 09:20:23 20130514 09:20:23
         1                         0          0 ProcessIoctl                     working                          20130514 09:20:26 20130514 09:20:26
         1 147a57cxq3w5y  1538629110       2140 CacheGet                         working                          20130514 09:20:12 20130514 09:20:12
         1 0w2qpuc6u2zsp  1538629110       1479 CacheGet                         working                          20130514 09:21:04 20130514 09:21:04
         1 0w2qpuc6u2zsp  1538629110        986 CacheGet                         working                          20130514 09:20:44 20130514 09:20:44
         1                         0          0 System stats collection thread   working                          20130514 09:20:41 20130514 09:20:41
         1 147a57cxq3w5y  1538629110        790 CacheGet                         waiting_for_SKGXP_send           20130514 09:20:29 20130514 09:20:29
         1 apgb2g9q2zjh1  1538629110       1280 CacheGet                         working                          20130514 09:21:01 20130514 09:21:01
         1 apgb2g9q2zjh1  1538629110       1085 CacheGet                         working                          20130514 09:21:02 20130514 09:21:02
         1                4075070345       1177 CacheGet                         working                          20130514 09:20:23 20130514 09:20:23

Notes:

  1. Exadata storage cell software (cellsrv) samples active thread states every second (like ASH), so every record in V$CELL_THREAD_HISTORY stands for 1 second of “cell activity time” if counted. You can COUNT the records in V$CELL_THREAD_HISTORY just like you can count them in ASH to get approximate (cell) activity time.
  2. You do not have to query the GV$ view as the V$ view itself returns all data from all storage cells (unless restricted by a where clause)
  3. There is no 1-1 correspondence of a cell thread and your database session, so don’t try to simply add these numbers up. A smart scan can asynchronously and independently work for your database session, working in advance, doing prefetching and preprocessing regardless of what your database session happens to be doing. One smart scan “session” can end up spawning multiple threads in each storage cell if the cell can not produce data fast enough than the database session can consume.
  4. The database_id column corresponds to the “global database ID” ( SELECT ksqdngunid FROM X$KSQDN ), not the usual DBID in v$database.

A warning: There have been a few cases where my session crashes (well, actually my process, but not the database) when querying this V$ view:

SQL> @exadata/cth wait_state,sql_id 1=1 sysdate-1/24 sysdate
Querying V$CELL_THREAD_HISTORY ("ASH" for Storage Cells) ...
      , MIN(snapshot_time), MAX(snapshot_time)
                                *
ERROR at line 5:
ORA-03113: end-of-file on communication channel

But it only has affected my own session so far, so when I log in and rerun the script, it works ok again. I yet have to file a bug for this – this is a documented V$ view and it shouldn’t crash when queried.

This post ended up quite long, so I will continue a Part 2 some day in the future :)

You can download the latest version of cth.sql from here.

Update: John Beresniewicz from Oracle sent me a treemap visualization of the CTH data he experimented with some years ago, here’s an example:

CellSrvTree1a