.
I have posted, a long time ago, about Google Spanner (inserting data and no decimal numeric data types) but many things have changed in this area. There is now a NUMERIC data type and many things have improved in this distributed SQL database, improving a bit the SQL compatibility.
I can use the Cloud Shell, which is very easy – one click fron the console – but here I’m showing how to install the gcloud CLI temporarily in a OEL environement (I’ve explained in a previous post how the OCI free tier is my preferred home)
curl https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-... | tar -C /var/tmp -zxf -
/var/tmp/google-cloud-sdk/install.sh --usage-reporting true --screen-reader false --rc-path /tmp/gcloud.tmp --command-completion true --path-update true --override-components
This downloads and unzips the Google Cloud SDK for Linux (there are other options like a YUM repo). I put it in a temporary directory here under /var/tmp. install.sh is interactive or you can supply all information on command line. I don’t want it to update my .bash_profile or .bashrc but want to see what it puts there, so just providing a temporary /tmp/gcloud.sh to have a lookt at it
[opc@a ~]$ /var/tmp/google-cloud-sdk/install.sh --usage-reporting true --screen-reader false --rc-path /tmp/gcloud.sh --command-completion true --path-update true --override-components
Welcome to the Google Cloud SDK!
Your current Cloud SDK version is: 321.0.0
The latest available version is: 321.0.0
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Components │
├───────────────┬──────────────────────────────────────────────────────┬──────────────────────────┬──────────┤
│ Status │ Name │ ID │ Size │
├───────────────┼──────────────────────────────────────────────────────┼──────────────────────────┼──────────┤
...
│ Installed │ BigQuery Command Line Tool │ bq │ < 1 MiB │
│ Installed │ Cloud SDK Core Libraries │ core │ 15.9 MiB │
│ Installed │ Cloud Storage Command Line Tool │ gsutil │ 3.5 MiB │
└───────────────┴──────────────────────────────────────────────────────┴──────────────────────────┴──────────┘
[opc@a ~]$ cat /tmp/gcloud.tmp
# The next line updates PATH for the Google Cloud SDK.
if [ -f '/var/tmp/google-cloud-sdk/path.bash.inc' ]; then . '/var/tmp/google-cloud-sdk/path.bash.inc'; fi
# The next line enables shell command completion for gcloud.
if [ -f '/var/tmp/google-cloud-sdk/completion.bash.inc' ]; then . '/var/tmp/google-cloud-sdk/completion.bash.inc'; fi
As you can see, I mentioned nothing for –override-components and the default is gsutil, core and bq
https://blog.dbi-services.com/wp-insides/uploads/sites/2/2020/12/Screens... 293w" sizes="(max-width: 180px) 100vw, 180px" />
This is the most simple cloud CLI I ever seen. Just type: gcloud init
(/var/tmp/google-cloud-sdk/bin/gcloud in my temporary installation) and it gives you an URL where you can get the verification code, using the web console authentication. Then you pick the cloud project and, optionally, a default region and zone. Those default informations are stored in: ~/.config/gcloud/configurations/config_default
and the credentials are in a sqllite database in ~/.config/gcloud/credentials.db
Gerald Venzl has recently published some free to use data about the world’s countries, capitals, and currencies. What I like with this data set is that, in addition to providing the CSV, Gerald have managed to provide a unique DDL + DML to create this data in SQL databases. And this works in the most common databases despite the fact that, beyond the SQL standard, data types and syntax is different in each engine.
https://twitter.com/GeraldVenzl/status/1338327177788411905?s=20
Google Spanner has a SQL-like API but I cannot run this without a few changes. But just a few, thanks to many recent improvements. And loading this data set will be the occasion to show those new features.
Here is the DDL to create the first table, REGIONS:
/*********************************************/
/***************** REGIONS *******************/
/*********************************************/
CREATE TABLE regions
(
region_id VARCHAR(2) NOT NULL,
name VARCHAR(13) NOT NULL,
CONSTRAINT regions_pk
PRIMARY KEY (region_id)
);
In order to run this in Google Spanner, I change VARCHAR to STRING and I move the PRIMARY KEY declaration out of the relational properties:
CREATE TABLE regions
(
region_id STRING(2) NOT NULL,
name STRING(13) NOT NULL
)
PRIMARY KEY (region_id)
;
It may be surprising to have the PRIMARY KEY declaration at this place but, because Google Spanner is a distributed database, the primary key is also a storage attribute. And it is mandatory as sharding is done by range partitioning on the primary key. Well, I would prefer to stay compatible with SQL and have, if needed, an additional organization clause. But Spanner is one of the first database trying to bring SQL to NoSQL and was originally designed to be used internally (like Google object storage matadata), providing SQL database benefits (SQL, ACID, joins,…) with the same scalability as distributed NoSQL databases. So compatibility with other SQL databases was probably not a priority.
Note that the NOT NULL constraint is allowed and we will see more about columns constraints later.
I have removed the comments because this is not allowed in Google Spanner. I don’t understand that, but remember that it takes its roots in NoSQL where the API calls are embedded in the code, and not in scripts, and the code has its own comments.
The second table is COUNTRIES and a country belongs to a region from the REGION table.
/*********************************************/
/**************** COUNTRIES ******************/
/*********************************************/
CREATE TABLE countries
(
country_id VARCHAR(3) NOT NULL,
country_code VARCHAR(2) NOT NULL,
name VARCHAR(100) NOT NULL,
official_name VARCHAR(200),
population NUMERIC(10),
area_sq_km NUMERIC(10,2),
latitude NUMERIC(8,5),
longitude NUMERIC(8,5),
timezone VARCHAR(40),
region_id VARCHAR(2) NOT NULL,
CONSTRAINT countries_pk
PRIMARY KEY (country_id),
CONSTRAINT countries_regions_fk001
FOREIGN KEY (region_id) REFERENCES regions (region_id)
);
CREATE INDEX countries_regions_fk001 ON countries (region_id);
On the datatypes, I’ll change VARCHAR to STRING and now we have a NUMERIC datatype in Spanner (was only IEEE 754 float) but NUMERIC has a fixed scale and precision (38,9) I’ll probably come back to it in another post. But there are still many limitations with NUMERIC (cannot create index on it, not easy to map when importing,…). It is definitely not a good choice here for areas, latitude and longitude. But I keep it just for DDL compatibility.
I move the PRIMARY KEY definition. But the most important here is actually about not changing the referential constraint at all:
CREATE TABLE countries
(
country_id STRING(3) NOT NULL,
country_code STRING(2) NOT NULL,
name STRING(100) NOT NULL,
official_name STRING(200),
population NUMERIC,
area_sq_km NUMERIC,
latitude NUMERIC,
longitude NUMERIC,
timezone STRING(40),
region_id STRING(2) NOT NULL,
CONSTRAINT countries_regions_fk001
FOREIGN KEY (region_id) REFERENCES regions (region_id)
)
PRIMARY KEY (country_id)
;
Before March 2020 The only possible referential integrity way actually a storage clause (at the same place as the PRIMARY KEY declaration) because referential integrity is not something easy in a distributed database. Because you distribute to scale and that works well only when your transaction is single-shard. This is why, initially, referential integrity was not a FOREIGN KEY but a compound PRIMARY KEY interleaved with the parent PRIMARY KEY. But we will see INTERLEAVE IN PARENT later. Here, COUTRIES has its own primary key, and then its own sharding scheme. That’ also mean that inserting a new country may have to check the parent key (region) in another shard. We will look at performance in another post.
When we declare a foreign key in Google Spanner, an index on it is created, then I didn’t copy the CREATE INDEX statement.
/*********************************************/
/***************** CITIES ********************/
/*********************************************/
CREATE TABLE cities
(
city_id VARCHAR(7) NOT NULL,
name VARCHAR(100) NOT NULL,
official_name VARCHAR(200),
population NUMERIC(8),
is_capital CHAR(1) DEFAULT 'N' NOT NULL,
latitude NUMERIC(8,5),
longitude NUMERIC(8,5),
timezone VARCHAR(40),
country_id VARCHAR(3) NOT NULL,
CONSTRAINT cities_pk
PRIMARY KEY (city_id),
CONSTRAINT cities_countries_fk001
FOREIGN KEY (country_id) REFERENCES countries (country_id),
CONSTRAINT cities_is_capital_Y_N_check001
CHECK (is_capital IN ('Y','N'))
);
CREATE INDEX cities_countries_fk001 ON cities (country_id);
In addition to the datatypes we have seen earlier I’ll transform CHAR to STRING, but I have to remove the DEFAULT declaration. Spanner recently introduced generated columns but those are always calculated, they cannot substitute to DEFAULT.
We declare check constraints since Oct. 2020 and I keep the same declaration. As far as I know they are not used by the optimizer but only to validate the data ingested.
I have a foreign key to COUNTRIES which I keep as non-interleaved. Because the COUNTRY_ID is not part of the CITIES primary key, and maybe because my data model may have to cope with cities changing to another country (geopolitical immutability).
CREATE TABLE cities
(
city_id STRING(7) NOT NULL,
name STRING(100) NOT NULL,
official_name STRING(200),
population NUMERIC,
is_capital STRING(1) NOT NULL,
latitude NUMERIC,
longitude NUMERIC,
timezone STRING(40),
country_id STRING(3) NOT NULL,
CONSTRAINT cities_countries_fk001
FOREIGN KEY (country_id) REFERENCES countries (country_id),
CONSTRAINT cities_is_capital_Y_N_check001
CHECK (is_capital IN ('Y','N'))
)
PRIMARY KEY (city_id)
;
Again, the index on COUNTRY_ID is created implicitely.
/*********************************************/
/***************** CURRENCIES ****************/
/*********************************************/
CREATE TABLE currencies
(
currency_id VARCHAR(3) NOT NULL,
name VARCHAR(50) NOT NULL,
official_name VARCHAR(200),
symbol VARCHAR(18) NOT NULL,
CONSTRAINT currencies_pk
PRIMARY KEY (currency_id)
);
/*********************************************/
/*********** CURRENCIES_COUNTRIES ************/
/*********************************************/
CREATE TABLE currencies_countries
(
currency_id VARCHAR(3) NOT NULL,
country_id VARCHAR(3) NOT NULL,
CONSTRAINT currencies_countries_pk
PRIMARY KEY (currency_id, country_id),
CONSTRAINT currencies_countries_currencies_fk001
FOREIGN KEY (currency_id) REFERENCES currencies (currency_id),
CONSTRAINT currencies_countries_countries_fk002
FOREIGN KEY (country_id) REFERENCES countries(country_id)
);
The CURRENCIES_COUNTRIES is the implementation of many-to-many relationship as a country may have multiple currencies and a currency can be used in multiple country. The primary key is a concatenation of the foreign keys. Here I’ll decide that referential integrity is a bit stronger and the list of currencies will be stored in each country, like storing it pre-joined for performance reason. If you come from Oracle, you may see this INTERLEAVE IN PARENT as a table CLUSTER but on a partitioned table. If you come from DynamoDB you may see the of it as the Adjacency lists modeling in the single-table design.
CREATE TABLE currencies
(
currency_id STRING(3) NOT NULL,
name STRING(50) NOT NULL,
official_name STRING(200),
symbol STRING(18) NOT NULL,
)
PRIMARY KEY (currency_id)
;
CREATE TABLE currencies_countries
(
currency_id STRING(3) NOT NULL,
country_id STRING(3) NOT NULL,
CONSTRAINT currencies_countries_countries_fk002
FOREIGN KEY (country_id) REFERENCES countries(country_id)
)
PRIMARY KEY (currency_id, country_id)
, INTERLEAVE IN PARENT currencies ON DELETE CASCADE;
Here, the many-to-many association between CURRENCIES and COUNTRIES is materialized as a composition with CURRENCIES, stored with it (INTERLEAVE), and removed with it (ON DELETE CASCADE). Note that I did that for the demo because CURRENCY_ID was first in the primary key declaration, but you may think more about data distribution and lifecycle when deciding on interleaving. Google Spanner partitions by range, and this means that all CURRENCIES_COUNTRIES associations will be stored together, in the same shard (called “split” in Spanner) for the same CURRENCY_ID.
There are many new multi-region configurations, within the same continent or distributed over multiple ones:
gcloud spanner instance-configs list
NAME DISPLAY_NAME
asia1 Asia (Tokyo/Osaka/Seoul)
eur3 Europe (Belgium/Netherlands)
eur5 Europe (London/Belgium/Netherlands)
eur6 Europe (Netherlands, Frankfurt)
nam-eur-asia1 United States, Europe, and Asia (Iowa/Oklahoma/Belgium/Taiwan)
nam10 United States (Iowa/Salt Lake/Oklahoma)
nam11 United States (Iowa, South Carolina, Oklahoma)
nam3 United States (Northern Virginia/South Carolina)
nam6 United States (Iowa/South Carolina/Oregon/Los Angeles)
nam7 United States (Iowa, Northern Virginia, Oklahoma)
nam8 United States (Los Angeles, Oregon, Salt Lake City)
nam9 United States (Northern Virginia, Iowa, South Carolina, Oregon)
regional-asia-east1 asia-east1
regional-asia-east2 asia-east2
regional-asia-northeast1 asia-northeast1
regional-asia-northeast2 asia-northeast2
regional-asia-northeast3 asia-northeast3
regional-asia-south1 asia-south1
regional-asia-southeast1 asia-southeast1
regional-asia-southeast2 asia-southeast2
regional-australia-southeast1 australia-southeast1
regional-europe-north1 europe-north1
regional-europe-west1 europe-west1
regional-europe-west2 europe-west2
regional-europe-west3 europe-west3
regional-europe-west4 europe-west4
regional-europe-west6 europe-west6
regional-northamerica-northeast1 northamerica-northeast1
regional-southamerica-east1 southamerica-east1
regional-us-central1 us-central1
regional-us-east1 us-east1
regional-us-east4 us-east4
regional-us-west1 us-west1
regional-us-west2 us-west2
regional-us-west3 us-west3
regional-us-west4 us-west4
I’ll use the latest dual-region in my continent, eur6, added on Dec. 2020 which has two read-write regions (Netherlands, Frankfurt) and the witness region in Zurich. Yes, this neutral position of Switzerland is a perfect fit in the distributed quorum, isn’t it?
Recent comments
3 years 2 days ago
3 years 12 weeks ago
3 years 16 weeks ago
3 years 17 weeks ago
3 years 22 weeks ago
3 years 43 weeks ago
4 years 11 weeks ago
4 years 41 weeks ago
5 years 25 weeks ago
5 years 25 weeks ago