Search

Top 60 Oracle Blogs

Recent comments

Join Performance for UUID, STRING, and INTEGER with CockroachDB

overview

To continue on the UUID performance thread, I was recently asked by a customer how joins perform with various data types. I had not run a specific test, but suspected perform would be driven mostly by the size of the data types.

I wanted to verify my assumptions with real test data that shows the core performance of joins with CockroachDB.

the schema, data, and queries

For this test, two tables were created. The first table had one million rows and the second table had 200k matching primary keys for UUID, STRING, and INTEGER data types.

schema:

create table u1 (id uuid primary key);
create table u2 (id uuid primary key);

create table s1 (id string primary key);
create table s2 (id string primary key);

create table i1 (id integer primary key);
create table i2 (id integer primary key);

data load:

insert into u1 select token from generate_series(1,1000000);
insert into u2 select token from u1 offset 400000 limit 200000;

insert into s1 select token::string from u1;
insert into s2 select token::string from u1 offset 400000 limit 200000;

insert into i1 select gs from generate_series(1,1000000) as gs;
insert into i2 select gs from generate_series(400001,600000) as gs;

queries:

select count(*) from u1 join u2 using (id);
select count(*) from s1 join s2 using (id);
select count(*) from i1 join i2 using (id);

sizing of types and tables

The data type sizes are as follows:

  • UUID :: 16 bytes
  • STRING :: 36 bytes to store UUID as string
  • INTEGER :: 8 bytes for default 64bit values

CockroachDB uses prefix compression compress the primary key values, so the best way is to look at the size of the ranges for the various tables.

-- UUID table (u1)
--
SQL> show ranges from table u1;
  start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  NULL      | NULL    |       43 |     40.054517 |            1 |                       | {1}      | {""}

-- STRING table (s1) :: Storing UUIDs
--
SQL> show ranges from table s1;
  start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  NULL      | NULL    |       45 |            60 |            1 |                       | {1}      | {""}

-- INTEGER table (i1) :: Storing INTEGER values
--
SQL> show ranges from table i1;
  start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  NULL      | NULL    |       49 |     24.934101 |            1 |                       | {1}      | {""}

As you can see, the ranges are compressed pretty well which will surely help improve the performance of joins for all data-types.

raw size (MB) Stored(MB) Compression Ratio
INTEGER 61.0 24.9 2.45
UUID 122.1 40.1 3.05
STRING 274.7 60.0 4.58

… and the winner is

As expected, the performance does follow the size of the various columns. The String data type did perform better than expected, mainly due to the compression of the prefix and data within CockroachDB.

Please feel free to take CockroachDB to a spin and try this for yourself. The instructions to reproduce are in my github repository for join tests.