Case and Aggregate bug

The following description of a bug appeared on the Oracle Developer Community forum a little while ago: on an upgrade from 12c to 19c a query started producing the wrong results on a simple call to the avg() function. In fact it turned out to be a bug introduced in 12.2.0.1.

The owner of the thread posted a couple of zip files to build a test case, but I had to make a couple of edits, and change the nls_numeric_characters to ‘,.’, in order to get past a formatting error on a call to the to_timestamp() function. I’ve stripped the example to a minimum, and translated the column names from German (which was presumably the source of the nls_numeric_characters issue) to make it easier to demonstrate and play with the bug.
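The text above doesn't say how the parameter was changed. One way to do it, assuming the change is only wanted for the current session, is sketched here:

```sql
rem
rem     Assumption: a session-level change is sufficient.
rem     The first character is the decimal separator,
rem     the second is the group separator.
rem

alter session set nls_numeric_characters = ',.';
```

A session-level setting avoids touching the instance default and disappears when the session ends.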

First the basic data – you’ll notice that I’ve tested this on 12.1.0.2, 12.2.0.1 and 19.3.0.0 to find out when the bug appeared:

rem
rem     Script:         case_aggregate_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2020
rem     Purpose:        
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem

create table test(
        case_col        varchar2(11), 
        duration        number(*,0), 
        quarter         varchar2(6), 
        q2h_knum_b      varchar2(10)
   )
/

insert into test values('OK',22,'1.2020','AB1234');
insert into test values('OK',39,'1.2020','AB1234');
insert into test values('OK',30,'1.2020','AB1234');
insert into test values('OK',48,'1.2020','AB1234');
commit;

execute dbms_stats.gather_table_stats(user,'test')

create or replace force view v_test
as 
select 
        q2h_knum_b,
        case 
                when b.case_col not like 'err%'
                        then b.duration 
        end     duration,
        case 
                when b.case_col not like 'err%' 
                        then 1 
                        else 0 
        end     status_ok
from
        test b
where
        substr(b.quarter, -4) = 2020
;


break on report
compute avg of duration on report
select * from v_test;

---------------------------------------------

Q2H_KNUM_B   DURATION  STATUS_OK
---------- ---------- ----------
AB1234             22          1
AB1234             39          1
AB1234             30          1
AB1234             48          1
           ----------
avg             34.75


I’ve created a table, loaded some data, gathered stats, then created a view over the table. The view includes a couple of columns that use a simple case expression, and both expressions are based in the same way on the same base column (this may, or may not, be significant in what’s coming). I’ve then run off a simple query with a couple of SQL*Plus commands to report the actual content of the view with the average of the duration column – which is 34.75.
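As a cross-check that doesn't depend on the view or on any aggregate function, the expected figures can be worked out from the four literal values. This is just arithmetic on the inserted data; the column aliases are made up for illustration, and the median is included for later reference:

```sql
rem
rem     Arithmetic check on the four inserted durations: 22, 39, 30, 48
rem

select
        (22 + 39 + 30 + 48) / 4         expected_avg,           -- 139 / 4 = 34.75
        (30 + 39) / 2                   expected_median         -- mean of the two middle values = 34.5
from
        dual
/
```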

So now we run a couple of queries against the view which aggregate the data down to a single row, including the avg() of the duration, using the coalesce() function (rather than the older nvl() function) to convert any nulls to zero.


select
        coalesce(count(duration), 0)    duration_count,
        coalesce(median(duration), 0)   duration_med,
        coalesce(avg(duration), 0)      duration_avg,
        coalesce(sum(status_ok), 0)     ok_count
from
        v_test  v1
where
        instr('AB1234', q2h_knum_b) > 0
/

---------------------------------

DURATION_COUNT DURATION_MED DURATION_AVG   OK_COUNT
-------------- ------------ ------------ ----------
             4         34.5            0          4

You’ll notice that the duration_avg is reported as zero (this would be the same if I used nvl(), and would be a null if I omitted the coalesce()). This is clearly incorrect. This was the output from 19.3; 12.2 gives the same result, while 12.1.0.2 reports the average correctly as 34.75.
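The two variations mentioned in passing can be sketched as follows. I'm assuming the function is swapped (or dropped) on every column rather than just on the average; the comments restate the behaviour described above, they are not fresh test output:

```sql
rem     Variation 1: nvl() in place of coalesce()
rem     (described above as giving the same zero average)

select
        nvl(count(duration), 0)         duration_count,
        nvl(median(duration), 0)        duration_med,
        nvl(avg(duration), 0)           duration_avg,
        nvl(sum(status_ok), 0)          ok_count
from
        v_test  v1
where
        instr('AB1234', q2h_knum_b) > 0
/

rem     Variation 2: no null-handling function at all
rem     (described above as giving a null average)

select
        count(duration)                 duration_count,
        median(duration)                duration_med,
        avg(duration)                   duration_avg,
        sum(status_ok)                  ok_count
from
        v_test  v1
where
        instr('AB1234', q2h_knum_b) > 0
/
```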

There are several ways in which you can modify this query to get the right average – here’s one, just put the ok_count column first in the select list:


select
        coalesce(sum(status_ok), 0)     ok_count,
        coalesce(count(duration), 0)    duration_count,
        coalesce(median(duration), 0)   duration_med,
        coalesce(avg(duration), 0)      duration_avg
from
        v_test  v1
where
        instr('AB1234', q2h_knum_b) > 0
/

---------------------------------

  OK_COUNT DURATION_COUNT DURATION_MED DURATION_AVG
---------- -------------- ------------ ------------
         4              4         34.5        34.75


There’s no obvious reason why the error should occur, but there’s a little hint about what may be happening in the Column projection information from the execution plan. The basic plan is the same in both cases, so I’m only showing it once; but it’s followed by two versions of the projection information (restricted to operation 1) which I’ve formatted to improve readability:

Plan hash value: 2603667166

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     2 (100)|          |
|   1 |  SORT GROUP BY     |      |     1 |    20 |            |          |
|*  2 |   TABLE ACCESS FULL| TEST |     1 |    20 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter((INSTR('AB1234',"Q2H_KNUM_B")>0 AND
              TO_NUMBER(SUBSTR("B"."QUARTER",(-4)))=2020))

Column Projection Information (Operation 1 only):  (Wrong result)
-----------------------------------------------------------------
PERCENTILE_CONT(.5) WITHIN GROUP ( ORDER BY CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22],
COUNT(CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22], 
SUM  (CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN 1 ELSE 0 END)[22], 
SUM  (CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22]



Column Projection Information (Operation 1 only):  (Right result)
-----------------------------------------------------------------
PERCENTILE_CONT(.5) WITHIN GROUP ( ORDER BY CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22],
COUNT(CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22], 
SUM  (CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22], 
SUM  (CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN 1 ELSE 0 END)[22]

As you can see, to report avg() Oracle has projected sum() and count().

When we get the right result the sum() for duration appears immediately after the count().

When we get the wrong result the sum() for ok_count comes between the count() and sum() for duration.

This makes me wonder whether Oracle is somehow just losing track of the sum() for duration and therefore dividing null by the count().

This is purely conjecture, of course, and may simply be a coincidence – particularly since 12.1.0.2 gets the right result and shows exactly the same projection information.
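If you want to compare projection information on your own system, one way to pull it from memory is a call to dbms_xplan.display_cursor() with the projection format option. This is a sketch of a common approach, not necessarily the exact call used to produce the output above:

```sql
rem
rem     serveroutput must be off, otherwise the "last statement"
rem     that display_cursor() picks up is SQL*Plus's own call
rem     to dbms_output rather than your query.
rem

set serveroutput off

rem     Run the query of interest here, then immediately:

select  *
from    table(dbms_xplan.display_cursor(format => 'typical +projection'))
/
```

The projection section lists the expressions each operation passes up to its parent, which is where the order of the sum() and count() calls becomes visible.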

Readers are left to experiment with other variations to see if they can spot other suggestive patterns.
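As a possible starting point, here's one variation that projects an explicit sum()/count() alongside the built-in avg(), so the two can be compared in a single row. The duration_avg_manual alias is made up for illustration, and I'm not claiming any particular result: adding a column changes the select list, which may itself change whether the bug appears.

```sql
select
        coalesce(count(duration), 0)                            duration_count,
        coalesce(median(duration), 0)                           duration_med,
        coalesce(avg(duration), 0)                              duration_avg,
        -- hand-built average; nullif() avoids a divide-by-zero when no rows qualify
        coalesce(sum(duration) / nullif(count(duration), 0), 0) duration_avg_manual,
        coalesce(sum(status_ok), 0)                             ok_count
from
        v_test  v1
where
        instr('AB1234', q2h_knum_b) > 0
/
```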

Update (Aug 2020)

This is now logged as Bug 31732779 – WRONG RESULT WITH CASE STATEMENT AGGREGATION, though it’s not yet publicly visible.