Search

OakieTags

Who's online

There are currently 0 users and 40 guests online.

Recent comments

Affiliations

12c First N

There have been a couple of nice posts about the “Top N” (or First N / Next N)  syntax that has appeared in 12c, here and here, for example. I particularly like the first set of examples because they include some execution plans that give you a good idea of what’s  going on under the covers. “Under the covers” is important, because if you don’t actually have a large data set to test on you might not realise what impact a “Top N” query might have on a production data set.

You can do just a little better then inference from the execution plan, so I thought I’d push the envelope slightly with a small example and appeal (once again, sorry Doug) to the 10053 trace file. Here’s my test query, with resulting execution plan:

select
	/*+ track this */
	*
from
	t1
order by
	n1
	offset 10 rows
fetch
	next 1 percent rows with ties
;

SQL_ID  957mk29yc8d6h, child number 0
-------------------------------------
select  /*+ track this */  * from  t1 order by  n1  offset 10 rows
fetch  next 1 percent rows with ties

Plan hash value: 2273146475

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |       |  3789 (100)|          |
|*  1 |  VIEW               |      |   100K|    14M|       |  3789   (1)| 00:00:01 |
|   2 |   WINDOW SORT       |      |   100K|    11M|    13M|  3789   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |   100K|    11M|       |   678   (2)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("from$_subquery$_002"."rowlimit_$$_rownumber">10 AND
              "from$_subquery$_002"."rowlimit_$$_rank"<=CASE  WHEN (10>=0) THEN 10 ELSE 0
              END +CEIL("from$_subquery$_002"."rowlimit_$$_total"*1/100)))

Note that I’ve asked Oracle to skip the first ten rows then report the next 1 percent of the data – based on a given ordering – but to include any rows beyond the 1 percent where the ordering values still match the last row of the 1 percent (Tim Hall’s post includes an example showing the difference between “with ties” and “rows only”).

Line 2 of the execution plan is very revealing – we have a “Window Sort” operation – which tells you that the optimizer has to be using some sort of analytic function to get the result set.

So here’s the “Unparsed Query” (extracted with a bit of re-formatting from the 10053 trace file) that Oracle generated from the query:

SELECT
	from$_subquery$_002.ID ID,
	from$_subquery$_002.N1 N1,
	from$_subquery$_002.SMALL_VC SMALL_VC,
	from$_subquery$_002.PADDING PADDING
FROM	(
	SELECT
		T1.ID ID,
		T1.N1 N1,
		T1.SMALL_VC SMALL_VC,
		T1.PADDING PADDING,
		T1.N1 "rowlimit_$_  0",
		ROW_NUMBER() OVER ( ORDER BY T1.N1) rowlimit_$$_rownumber,
		RANK()       OVER ( ORDER BY T1.N1) rowlimit_$$_rank,
		COUNT(*)     OVER () rowlimit_$$_total
	FROM
		TEST_USER.T1 T1
	) from$_subquery$_002
WHERE
	from$_subquery$_002.rowlimit_$$_rank <=  		CASE   			WHEN (10>=0)
				THEN FLOOR(TO_NUMBER(10))
				ELSE 0
			END +
		CEIL(from$_subquery$_002.rowlimit_$$_total*1/100)
AND	from$_subquery$_002.rowlimit_$$_rownumber > 10
ORDER BY
	from$_subquery$_002."rowlimit_$_  0"
;

As expected, we can see the query has been transformed to include analytic functions – three of them. Oracle is basically taking a nice, easily readable syntax, and converting it behind the scenes to the sort of SQL we would once have written by hand. (That’s a good thing, of course – clarity of code does protect you from accidents, and it’s best if messy rewrites are handled by machines rather than error-prone people).

Although we have three analytic functions, we need only one window sort since the count(*) is over() the whole data set (no partitioning or ordering required), and the rank() and row_number() functions are partitioned and ordered by the same expression. Nevertheless, behind the scenes, my particular query has had to sort the entire data set to get the 1% that I wanted.

Footnote: the rank() function appears because of the “with ties” option. If we had chosen the “rows only” option Oracle wouldn’t have needed to discover rows with the same rank() as the last of the 1 percent.

For completeness, here’s the script to generate my data set:

execute dbms_random.seed(0)

create table t1
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 10000
)
select
	rownum			id,
	trunc(dbms_random.value(0,5000))	n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 100000
/

Note that you don’t need to gather stats – 12c does that automatically on CTAS and “insert as select” into an empty table – though that doesn’t include histograms.