Search

OakieTags

Who's online

There are currently 0 users and 23 guests online.

Recent comments

Affiliations

Datatype discipline

(This is based on some ‘issues’ a client was having, so the example is architected to make a particular point, but is relevant nonetheless)

In the distributed database world, we often have different names for the same piece of information.

Consider two tables which hold customer information.  In one table, they called the customer key "REF_NUM", and in the other table "CUST_ID".

That’s not ideal, but it’s not a big drama because at least the data types align.

        SQL> desc T1

                   Name                                        Null?    Type
                   ------------------------------------------- -------- ----------------------
                   REF_NUM                                     NOT NULL NUMBER(10)
                   T1_DATA                                              VARCHAR2(1000)

        SQL> desc T2

                   Name                                        Null?    Type
                   ------------------------------------------- -------- ----------------------
                   CUST_ID                                     NOT NULL NUMBER(10)
                   T2_DATA                                              VARCHAR2(1000)

When I do a join between these two tables, it costs me about 9400 I/O’s

        SQL> select *
          2  from  t1, t2
          3  where t1.ref_num = t2.cust_id
          4  and   t1.t1_data like '100%';

        Statistics
        ----------------------------------------------------------
                  0  recursive calls
                  1  db block gets
               9408  consistent gets
                  0  physical reads

Consider now the same two tables, with exactly the same data, but the data types are not aligned.  In one system, they used a numeric, and in the other system they used a string

        SQL> desc T1

                   Name                                        Null?    Type
                   ------------------------------------------- -------- ----------------------
                   REF_NUM                                     NOT NULL NUMBER(10)
                   T1_DATA                                              VARCHAR2(1000)

        SQL> desc T2

                   Name                                        Null?    Type
                   ------------------------------------------- -------- ----------------------
                   CUST_ID                                     NOT NULL VARCHAR2(10)
                   T2_DATA                                              VARCHAR2(1000)

Now look what happens to my SQL

        SQL> select *
          2  from  t1, t2
          3  where t1.ref_num = t2.cust_id
          4  and   t1.t1_data like '100%';

        Statistics
        ----------------------------------------------------------
                  0  recursive calls
                  2  db block gets
              18428  consistent gets
                  0  physical reads

It costs DOUBLE the amount of I/O’s – it runs twice as slowly.

Moreover, it’s not just performance that is at risk…You’re whole application might just start to crash randomly

Using the same tables above (with the conflicting data types) here’s a SQL that works fine "today"…

        SQL> select count(*)
          2  from  t1, t2
          3  where t1.ref_num = t2.cust_id
          4  and   t2.t2_data like '100%';

          COUNT(*)
        ----------
                71

Then a pesky user comes along and does what users do…adds data to the one of the tables.

        SQL> insert into T2 values ('X123','100some data');

        1 row created.

And now let’s re-run that same query again

        SQL> select count(*)
          2  from  t1, t2
          3  where t1.ref_num = t2.cust_id
          4  and   t2.t2_data like '100%';

        ERROR: ORA-01722: invalid number

Ker-splat….your application is toast…

Not being disciplined with data type selection causes pain later.