Oakies Blog Aggregator

BIG NEWS: I’ve Been Selected On “Journey To Mars” Program !!

HAPPY APRIL FOOLS !! After many months of nervous waiting, knowing that my chances were extremely slim at best, I have just received my confirmation letter from NASA saying that I’ve been one of the lucky ones selected to participate in the “Journey To Mars” program !! I’m soooo damn excited !!! Planning is now […]

Whitepaper Announcement: Benchmarking Amazon Aurora.

This is just a quick blog post to inform readers of a good paper that shows some how-to information for benchmarking Amazon Aurora. This is mostly about sysbench, which is used to test transactional capabilities.

As an aside, many readers may have heard that I’m porting SLOB to PostgreSQL and will make that available in May 2018. It’ll be called “pgio” and is an implementation of the SLOB Method as described in the SLOB documentation. Adding pgio to tools like sysbench rounds out the toolkit for testing platform readiness for your PostgreSQL applications.

To get a copy of the benchmarking paper, click here.

#000000;" src="https://kevinclosson.files.wordpress.com/2018/03/aurorabenchmarkpaper.png?w=500&h=463" alt="" width="500" height="463" srcset="https://kevinclosson.files.wordpress.com/2018/03/aurorabenchmarkpaper.png?w=500&h=463 500w, https://kevinclosson.files.wordpress.com/2018/03/aurorabenchmarkpaper.pn... 150w, https://kevinclosson.files.wordpress.com/2018/03/aurorabenchmarkpaper.pn... 300w, https://kevinclosson.files.wordpress.com/2018/03/aurorabenchmarkpaper.png 727w" sizes="(max-width: 500px) 100vw, 500px" />

Docker: efficiently building images for large software

I see increasing demand to build a Docker image for the Oracle Database. But the installation process for Oracle does not really fit the Docker way of installing by layers: you need to unzip the distribution, install from it to the Oracle Home, remove the things that are not needed, strip the binaries,… Before addressing those specific issues, here are the little tests I’ve done to show how the build layers increase the size of the image.

I’m starting with an empty docker repository on XFS filesystem:

[root@VM121 docker]# df -hT /var/lib/docker
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdc xfs 80G 33M 80G 1% /var/lib/docker

add, copy, rename and append

For the example, I create a 100MB file in the context:

[root@VM121 docker]# mkdir -p /var/tmp/demo
[root@VM121 docker]# dd if=/dev/urandom of=/var/tmp/demo/file0.100M count=100 bs=1M

Here is my Dockerfile:

FROM alpine:latest as staging
WORKDIR /var/tmp
ADD file0.100M .
RUN cp file0.100M file1.100M
RUN rm file0.100M
RUN mv file1.100M file2.100M
RUN dd if=/dev/urandom of=file2.100M seek=100 count=100 bs=1M

The 1st step starts with an alpine image
The 2nd step sets the working directory
The 3rd step adds a 100M file from the context
The 4th step copies the file, so that we have 200M in two files
The 5th step removes the previous file, so that we have 100M in one file
The 6th step renames the file, staying with only one 100M file
The 7th step appends 100M to the file, leaving 200M in one file

Here is the build with default option:

[root@VM121 docker]# docker image build -t franck/demo /var/tmp/demo

The context, my 100MB file, is sent first:

Sending build context to Docker daemon 104.9MB

And here are my 7 steps:

Step 1/7 : FROM alpine:latest as staging
latest: Pulling from library/alpine
ff3a5c916c92: Pull complete
Digest: sha256:7df6db5aa61ae9480f52f0b3a06a140ab98d427f86d8d5de0bedab9b8df6b1c0
Status: Downloaded newer image for alpine:latest
---> 3fd9065eaf02
Step 2/7 : WORKDIR /var/tmp
Removing intermediate container 93d1b5f21bb9
---> 131b3e6f34e7
Step 3/7 : ADD file0.100M .
---> 22ca0b2f6424
Step 4/7 : RUN cp file0.100M file1.100M
---> Running in b4b1b9c7e29b
Removing intermediate container b4b1b9c7e29b
---> 8c7290a5c87e
Step 5/7 : RUN rm file0.100M
---> Running in 606e2c73d456
Removing intermediate container 606e2c73d456
---> 5287e66b019c
Step 6/7 : RUN mv file1.100M file2.100M
---> Running in 10a9b379150e
Removing intermediate container 10a9b379150e
---> f508f426f70e
Step 7/7 : RUN dd if=/dev/urandom of=file2.100M seek=100 count=100 bs=1M
---> Running in 9dcf6d80642c
100+0 records in
100+0 records out
Removing intermediate container 9dcf6d80642c
---> f98304641c54
Successfully built f98304641c54
Successfully tagged franck/demo:latest

So, what’s the size of my Docker repository after building this image with the 200MB file?

[root@VM121 docker]# df -hT /var/lib/docker
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdc xfs 80G 538M 80G 1% /var/lib/docker

I have more than 500MB here.

Actually, besides the alpine image downloaded, which is only 4MB, the image I have built is 528MB:

[root@VM121 docker]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
franck/demo latest f98304641c54 Less than a second ago 528MB
alpine latest 3fd9065eaf02 2 months ago 4.15MB

We can better understand this size by looking at intermediate images:

[root@VM121 docker]# docker image ls -a
REPOSITORY TAG IMAGE ID CREATED SIZE
franck/demo latest f98304641c54 1 second ago 528MB
f508f426f70e 27 seconds ago 319MB
5287e66b019c 36 seconds ago 214MB
8c7290a5c87e 37 seconds ago 214MB
22ca0b2f6424 42 seconds ago 109MB
131b3e6f34e7 47 seconds ago 4.15MB
alpine latest 3fd9065eaf02 2 months ago 4.15MB

The first one, ’22ca0b2f6424′, is from step 3 which added the 100MB file
The second one, ‘8c7290a5c87e’, is from the 4th step which copied the file, bringing the image to 200MB
The third one, ‘5287e66b019c’, is from the 5th step which removed the file. It didn’t increase the size, but it didn’t reclaim anything either.
The fourth one, ‘f508f426f70e’, is from the 6th step which renamed the file. But for Docker, this is like copying to a new layer, and that adds 100MB
Finally, the 7th step appended only 100MB, but this resulted in copying the full 200MB file to the new layer

We can see all those operations, and size added at each step, from the image history:

[root@VM121 docker]# docker image history franck/demo
IMAGE CREATED CREATED BY SIZE COMMENT
f98304641c54 1 second ago /bin/sh -c dd if=/dev/urandom of=file2.100M … 210MB
f508f426f70e 27 seconds ago /bin/sh -c mv file1.100M file2.100M 105MB
5287e66b019c 36 seconds ago /bin/sh -c rm file0.100M 0B
8c7290a5c87e 37 seconds ago /bin/sh -c cp file0.100M file1.100M 105MB
22ca0b2f6424 42 seconds ago /bin/sh -c #(nop) ADD file:339435a18aeeb1b69… 105MB
131b3e6f34e7 47 seconds ago /bin/sh -c #(nop) WORKDIR /var/tmp 0B
3fd9065eaf02 2 months ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
2 months ago /bin/sh -c #(nop) ADD file:093f0723fa46f6cdb… 4.15MB

All in one RUN

One workaround is to run everything in the same layer. Personally, I don’t like it because I don’t get the point of using a Dockerfile for just running one script.
So, here is the Dockerfile with only one RUN command:

FROM alpine:latest as staging
WORKDIR /var/tmp
ADD file0.100M .
RUN cp file0.100M file1.100M \
&& rm file0.100M \
&& mv file1.100M file2.100M \
&& dd if=/dev/urandom of=file2.100M seek=100 count=100 bs=1M

The build is similar except that there are fewer steps:

[root@VM121 docker]# docker image build -t franck/demo /var/tmp/demo
Sending build context to Docker daemon 104.9MB
Step 1/4 : FROM alpine:latest as staging
latest: Pulling from library/alpine
ff3a5c916c92: Pull complete
Digest: sha256:7df6db5aa61ae9480f52f0b3a06a140ab98d427f86d8d5de0bedab9b8df6b1c0
Status: Downloaded newer image for alpine:latest
---> 3fd9065eaf02
Step 2/4 : WORKDIR /var/tmp
Removing intermediate container 707644c15547
---> d4528b28c85e
Step 3/4 : ADD file0.100M .
---> e26215766e75
Step 4/4 : RUN cp file0.100M file1.100M && rm file0.100M && mv file1.100M file2.100M && dd if=/dev/urandom of=file2.100M seek=100 count=100 bs=1M
---> Running in 49c2774851f4
100+0 records in
100+0 records out
Removing intermediate container 49c2774851f4
---> df614ac1b6b3
Successfully built df614ac1b6b3
Successfully tagged franck/demo:latest

This leaves us with a smaller space usage:

[root@VM121 docker]# df -hT /var/lib/docker
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdc xfs 80G 340M 80G 1% /var/lib/docker

The image is smaller, but still larger than the final state (a 300MB image for only one 200MB file):

[root@VM121 docker]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
franck/demo latest df614ac1b6b3 Less than a second ago 319MB
alpine latest 3fd9065eaf02 2 months ago 4.15MB

This is because we have grouped the RUN steps, but the ADD has its own layer, adding a file that is removed later:

[root@VM121 docker]# docker image ls -a
REPOSITORY TAG IMAGE ID CREATED SIZE
franck/demo latest df614ac1b6b3 Less than a second ago 319MB
e26215766e75 20 seconds ago 109MB
d4528b28c85e 22 seconds ago 4.15MB
alpine latest 3fd9065eaf02 2 months ago 4.15MB
 
[root@VM121 docker]# docker image history franck/demo
IMAGE CREATED CREATED BY SIZE COMMENT
df614ac1b6b3 Less than a second ago /bin/sh -c cp file0.100M file1.100M … 210MB
e26215766e75 20 seconds ago /bin/sh -c #(nop) ADD file:fe0262a4b800bf66d… 105MB
d4528b28c85e 22 seconds ago /bin/sh -c #(nop) WORKDIR /var/tmp 0B
3fd9065eaf02 2 months ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
2 months ago /bin/sh -c #(nop) ADD file:093f0723fa46f6cdb… 4.15MB

This is the kind of issue we have when building an Oracle Database image. We need to ADD the zip file for the database distribution, and the latest bundle patch. It is removed later but still takes space in the image. Note that one workaround to avoid the ADD layer can be to get the files from an NFS or HTTP server with wget or curl in a RUN layer rather than an ADD one, as sketched below. There’s an example in Stefan Oehrli’s blog post.
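As a rough sketch of that workaround, mirroring the demo above (the file server URL is only a placeholder), the download, use and cleanup can all share a single RUN layer, so the removed file never persists in a layer of its own:

FROM alpine:latest as staging
WORKDIR /var/tmp
# fetch in a RUN layer instead of ADD, then use and remove the file in the same layer
RUN wget -q http://fileserver.example.com/file0.100M \
 && cp file0.100M file1.100M \
 && rm file0.100M \
 && mv file1.100M file2.100M \
 && dd if=/dev/urandom of=file2.100M seek=100 count=100 bs=1M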

--squash

With the latest versions of Docker, there’s an easy way to flatten all those intermediate images at the end.
Here I have 18.03 with experimental features enabled:

[root@VM121 docker]# docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 8
Server Version: 18.03.0-ce
Storage Driver: overlay2
Backing Filesystem: xfs
...
 
[root@VM121 docker]# cat /etc/docker/daemon.json
{
"experimental": true
}

I start with the same Dockerfile as before and just add --squash to the build command:

[root@VM121 docker]# docker image build --squash -t franck/demo /var/tmp/demo

The output is similar, but there is an additional image, reduced down to the size of my final state (with one 200MB file):

[root@VM121 docker]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
franck/demo latest 2ab439a723c4 Less than a second ago 214MB
c3058e598b0a 3 seconds ago 528MB
alpine latest 3fd9065eaf02 2 months ago 4.15MB

The intermediate image list shows that everything was done as without ‘--squash’, but with an additional step which reduced the size:

[root@VM121 docker]# docker image ls -a
REPOSITORY TAG IMAGE ID CREATED SIZE
franck/demo latest 2ab439a723c4 Less than a second ago 214MB
c3058e598b0a 3 seconds ago 528MB
1f14d93a592e 23 seconds ago 319MB
7563d40b650b 27 seconds ago 214MB
8ed15a5059bd 28 seconds ago 214MB
24b11b9026ce 31 seconds ago 109MB
382bb71a6a4a 33 seconds ago 4.15MB
alpine latest 3fd9065eaf02 2 months ago 4.15MB

This step is visible in the image history as a ‘merge’ step:

[root@VM121 docker]# docker image history franck/demo
IMAGE CREATED CREATED BY SIZE COMMENT
2ab439a723c4 Less than a second ago 210MB merge sha256:c3058e598b0a30c606c1bfae7114957bbc62fca85d6a70c2aff4473726431394 to sha256:3fd9065eaf02feaf94d68376da52541925650b81698c53c6824d92ff63f98353
3 seconds ago /bin/sh -c dd if=/dev/urandom of=file2.100M … 0B
23 seconds ago /bin/sh -c mv file1.100M file2.100M 0B
27 seconds ago /bin/sh -c rm file0.100M 0B
28 seconds ago /bin/sh -c cp file0.100M file1.100M 0B
31 seconds ago /bin/sh -c #(nop) ADD file:14cef588b48ffbbf1… 0B
33 seconds ago /bin/sh -c #(nop) WORKDIR /var/tmp 0B
2 months ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
2 months ago /bin/sh -c #(nop) ADD file:093f0723fa46f6cdb… 4.15MB

However, even if I have a smaller final image, my filesystem usage is even larger with this additional 210MB:

[root@VM121 docker]# df -hT /var/lib/docker
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdc xfs 80G 739M 80G 1% /var/lib/docker

Let’s prune it to get rid of those intermediate images:

[root@VM121 docker]# docker image prune -f
Deleted Images:
deleted: sha256:c3058e598b0a30c606c1bfae7114957bbc62fca85d6a70c2aff4473726431394
deleted: sha256:37ed4826d70def1978f9dc0ddf42618d951f65a79ce30767ac3a5037d514f8af
deleted: sha256:1f14d93a592eb49a210ed73bf65e6886fcec332786d54b55d6b0e16fb8a8beda
deleted: sha256:c65cf4c70aed04e9b57e7a2a4fa454d3c63f43c32af251d8c86f6f85f44b1757
deleted: sha256:7563d40b650b2126866e8072b8df92d5d7516d86b25a2f6f99aa101bb47835ba
deleted: sha256:31ee5456431e903cfd384b1cd7ccb7918d203dc73a131d4ff0b9e6517f0d51cd
deleted: sha256:8ed15a5059bd4c0c4ecb78ad77ed75da143b06923d8a9a9a67268c62257b6534
deleted: sha256:6be91d85dec6e1bda6f1c0d565e98dbf928b4ea139bf9cb666455e77a2d8f0d9
deleted: sha256:24b11b9026ce738a78ce3f7b8b5d86ba3fdeb15523a30a7c22fa1e3712ae679a
deleted: sha256:c0984945970276621780a7888adfde9c6e6ca475c42af6b7c54f664ad86f9c9f
deleted: sha256:382bb71a6a4a7ddec86faa76bb86ea0c1a764e5326ad5ef68ce1a6110ae45754
 
Total reclaimed space: 524.3MB

Now having only the squashed image:

[root@VM121 docker]# docker image ls -a
REPOSITORY TAG IMAGE ID CREATED SIZE
franck/demo latest 2ab439a723c4 32 minutes ago 214MB
alpine latest 3fd9065eaf02 2 months ago 4.15MB
 
[root@VM121 docker]# df -hT /var/lib/docker
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdc xfs 80G 237M 80G 1% /var/lib/docker

multi-stage build

Finally, you can do something similar to an intermediate squash using multi-stage build.

Here is my Dockerfile:

FROM alpine:latest as staging
WORKDIR /var/tmp
ADD file0.100M .
RUN cp file0.100M file1.100M
RUN rm file0.100M
RUN mv file1.100M file2.100M
RUN dd if=/dev/urandom of=file2.100M seek=100 count=100 bs=1M
 
FROM alpine:latest
WORKDIR /var/tmp
COPY --from=staging /var/tmp .

With multi-stage build, we can start the second stage from a different image, and add more steps, but here I just start with the same alpine image and copy the final layer of the previous build.

We see something very similar to the --squash one:

[root@VM121 docker]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
franck/demo latest 55f329385f8c Less than a second ago 214MB
fd26a00db784 8 seconds ago 528MB
alpine latest 3fd9065eaf02 2 months ago 4.15MB
 
[root@VM121 docker]# docker image ls -a
REPOSITORY TAG IMAGE ID CREATED SIZE
franck/demo latest 55f329385f8c 1 second ago 214MB
fd26a00db784 9 seconds ago 528MB
9bf5be367b63 32 seconds ago 319MB
531d78833ba8 35 seconds ago 214MB
05dd68114743 36 seconds ago 214MB
b9e5215a9fc8 39 seconds ago 109MB
ab332f486793 41 seconds ago 4.15MB
alpine latest 3fd9065eaf02 2 months ago 4.15MB

The history of the last stage shows the copy of 210MB from the previous one:

[root@VM121 docker]# docker image history franck/demo
IMAGE CREATED CREATED BY SIZE COMMENT
55f329385f8c 1 second ago /bin/sh -c #(nop) COPY dir:2b66b5c36eff5b51f… 210MB
ab332f486793 41 seconds ago /bin/sh -c #(nop) WORKDIR /var/tmp 0B
3fd9065eaf02 2 months ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
2 months ago /bin/sh -c #(nop) ADD file:093f0723fa46f6cdb… 4.15MB

The filesystem usage is similar to the --squash one. Even if we reduced the final image, all the intermediate states had to be stored:

[root@VM121 docker]# df -hT /var/lib/docker
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdc xfs 80G 737M 80G 1% /var/lib/docker

That looks good, if you accept using a large intermediate space while building the image, which gives you the possibility to debug without re-running from the beginning, thanks to the layers in cache. However, you still have the inefficiency that each time you try the build, the context is sent again even when not needed. And that takes long with a 3GB .zip in the case of an Oracle Database installation. Unfortunately, if you add the file to the .dockerignore once you know you have the ADD steps in cache, the next build attempt will not use the cache anymore. I would love to see a per-stage .dockerignore file for multi-stage builds. Or simply have Docker realize that some files in the context will not be needed by the COPY or ADD steps that are not yet in cache.
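As a small illustration of that idea, using the files of this demo, a .dockerignore in the build context could look like the following. This is only a sketch of the concept, not something that solves the cache invalidation problem described above:

# .dockerignore in /var/tmp/demo
# excluding the big file keeps it out of the context sent to the daemon,
# but any cached ADD of that file is then invalidated on the next build
file0.100M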

Sending the whole context at each build attempt, when debugging your Dockerfile, is not efficient at all and feels like punch-card era compilation, where people sent their cards off to be compiled overnight. One syntax error on the first line and you wait another day.

One solution is to have all the required files in an NFS or HTTPd server and get them with ADD from the URL as mentioned earlier.

Multi-stage with multi-contexts

Another solution is to put all the COPY or ADD steps from the context into one Dockerfile to build an image containing all the required files, and then build your image from it (and squash it at the end).

Here is my first Dockerfile, just adding the files from the context:

[root@VM121 docker]# ls /var/tmp/demo
Dockerfile file0.100M nocontext
[root@VM121 docker]# cat /var/tmp/demo/Dockerfile
FROM alpine:latest as staging
WORKDIR /var/tmp
ADD file0.100M .

I build this ‘staging’ image:

[root@VM121 docker]# docker image build -t franck/stage0 /var/tmp/demo
Sending build context to Docker daemon 104.9MB
Step 1/3 : FROM alpine:latest as staging
latest: Pulling from library/alpine
ff3a5c916c92: Pull complete
Digest: sha256:7df6db5aa61ae9480f52f0b3a06a140ab98d427f86d8d5de0bedab9b8df6b1c0
Status: Downloaded newer image for alpine:latest
---> 3fd9065eaf02
Step 2/3 : WORKDIR /var/tmp
Removing intermediate container 0eeed8e0cfd2
---> a5db3b29c8e1
Step 3/3 : ADD file0.100M .
---> 2a34e1e981be
Successfully built 2a34e1e981be
Successfully tagged franck/stage0:latest

This one is the minimal one:

[root@VM121 docker]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
franck/stage0 latest 2a34e1e981be Less than a second ago 109MB
alpine latest 3fd9065eaf02 2 months ago 4.15MB
 
[root@VM121 docker]# df -hT /var/lib/docker
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdc xfs 80G 139M 80G 1% /var/lib/docker

Now, I don’t need to send this context anymore during further development of my Dockerfile.
I’ve added the following steps to a Dockerfile in another directory:

[root@VM121 docker]# ls /var/tmp/demo/nocontext/
Dockerfile
[root@VM121 docker]# cat /var/tmp/demo/nocontext/Dockerfile
FROM franck/stage0 as stage1
WORKDIR /var/tmp
RUN cp file0.100M file1.100M
RUN rm file0.100M
RUN mv file1.100M file2.100M
RUN dd if=/dev/urandom of=file2.100M seek=100 count=100 bs=1M
FROM alpine:latest
WORKDIR /var/tmp

Here is the build, using multi-stage to get a squashed final image (you can also use --squash):

[root@VM121 docker]# docker image build -t franck/demo /var/tmp/demo/nocontext
 
Sending build context to Docker daemon 2.048kB
Step 1/9 : FROM franck/stage0 as stage1
---> 2a34e1e981be
Step 2/9 : WORKDIR /var/tmp
Removing intermediate container eabf57a8de05
...
Successfully built 82478bfa260d
Successfully tagged franck/demo:latest

At that point, there’s no advantage in space used, as I keep all layers for easy Dockerfile development:

[root@VM121 docker]# df -hT /var/lib/docker
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdc xfs 80G 738M 80G 1% /var/lib/docker
 
[root@VM121 docker]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
franck/demo latest 82478bfa260d About a minute ago 214MB
5772ad68d208 About a minute ago 528MB
franck/stage0 latest 2a34e1e981be About a minute ago 109MB
alpine latest 3fd9065eaf02 2 months ago 4.15MB

But now, if I want to add an additional step:

[root@VM121 docker]# cat >> /var/tmp/demo/nocontext/Dockerfile <<< 'RUN chmod a+x /var/tmp'

I can re-build quickly, using cached layers, and without the need to send the context again:

[root@VM121 docker]# docker image build -t franck/demo /var/tmp/demo/nocontext
Sending build context to Docker daemon 2.048kB
Step 1/10 : FROM franck/stage0 as stage1
---> 2a34e1e981be
Step 2/10 : WORKDIR /var/tmp
---> Using cache
---> fa562926cc2b
Step 3/10 : RUN cp file0.100M file1.100M
---> Using cache
---> 31ac716f4d61
Step 4/10 : RUN rm file0.100M
---> Using cache
---> d7392cf51ad9
Step 5/10 : RUN mv file1.100M file2.100M
---> Using cache
---> 4854e503885b
Step 6/10 : RUN dd if=/dev/urandom of=file2.100M seek=100 count=100 bs=1M
---> Using cache
---> 5772ad68d208
Step 7/10 : FROM alpine:latest
---> 3fd9065eaf02
Step 8/10 : WORKDIR /var/tmp
---> Using cache
---> a5db3b29c8e1
Step 9/10 : COPY --from=stage1 /var/tmp .
---> Using cache
---> 82478bfa260d
Step 10/10 : RUN chmod a+x /var/tmp
---> 4a69ee40a938
Successfully built 4a69ee40a938
Successfully tagged franck/demo:latest

Once I’m ok with my final image, I can remove the intermediate ones:

[root@VM121 docker]# docker image prune -f
Deleted Images:
deleted: sha256:5772ad68d20841197d1424f7c64edd21704e4c7b470acb2193de51ae8741385d
deleted: sha256:bab572d749684d126625a74be4f01cc738742f9c112a940391e3533e61dd55b9
deleted: sha256:4854e503885b4057809fe2867a743ae7898e3e06b329229519fdb5c9d8b10ac1
deleted: sha256:de4acb90433c30d6a21cc3b4483adbd403d8051f3c7c31e6bc095a304606355a
deleted: sha256:d7392cf51ad99d5d0b7a1a18d8136905c87bc738a5bc94dec03e92f5385bf9c8
deleted: sha256:f037e7f973f4265099402534cd7ba409f35272701166d59a1be8e5e39508b07c
deleted: sha256:31ac716f4d61f0048a75b8de6f18757970cf0670a0a3d711e4386bf098b32041
deleted: sha256:2dccb363c5beb4daf45586383df6454b198f824d52676f70318444c346c0fe9a
deleted: sha256:fa562926cc2b3cb56400e1068984bb4048f56713a3cf6dcfa3cf6d945023ebc4
 
Total reclaimed space: 419.4MB

And the staging one:

[root@VM121 docker]# docker image rm franck/stage0
Untagged: franck/stage0:latest
Deleted: sha256:2a34e1e981be9154c31c5ee7eb942cc121267f4416b6fe502ab93f2dceafd98c
Deleted: sha256:b996a1bdc829167f16dcbe58b717284764470661c3116a6352f15012e1dff07c

Finally, I optimized the development of the Dockerfile and finished with the minimal size.

[root@VM121 docker]# df -hT /var/lib/docker
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdc xfs 80G 237M 80G 1% /var/lib/docker

So what?

I’m always surprised by the lack of efficiency when building an image with a Dockerfile. Any serious application deployment involves several intermediate files, and the way docker build is layered inflates the size and the time required. Efficient layering and snapshotting work at block level. Here, at file level, any byte of data modified in a file, even metadata such as the file name, means a whole file copy. And for common applications, the installation steps are not as simple as adding new files. You may have files appended, object files added to libraries, then compiled, then stripped…

In this post, I tested some recent features, such as multi-stage builds and the experimental --squash, as well as a simple manual multi-stage build. Of course, you can do everything in the same layer, and even not use Dockerfiles at all, but then why use Docker? There’s also the Packer approach, which I’ve not tested yet. However, I like the Docker approach, but only when used correctly. Deploying an application like Oracle Database should use the layered build in the following way: additional steps for new options or new updates. This means that the files must be built elsewhere, in a staging container, and added in one step. And to be efficient, the context should be sent only when needed: when a non-cached ADD or COPY requires it.

 

The article Docker: efficiently building images for large software first appeared on Blog dbi services.

Docker-CE on Oracle Enterprise Linux 7

Here is how I install the latest Docker version on Oracle Linux 7. You can find several blog posts about it which all install ‘docker-engine’. But things move fast in this agile world and the Docker package name has changed. The Community Edition is now ‘docker-ce’ and you want this one to run the latest version.

I’m on OEL 7.4 but this should also work on RHEL 7:
[root@VM188 yum]# cat /etc/oracle-release
Oracle Linux Server release 7.4

docker-engine

If you enable [ol7_addons] you can install ‘docker-engine’:

# yum-config-manager --enable ol7_addons
# yum info docker-engine
Loaded plugins: ulninfo
Available Packages
Name : docker-engine
Arch : x86_64
Version : 17.06.2.ol
Release : 1.0.1.el7
Size : 21 M
Repo : ol7_addons/x86_64
Summary : The open-source application container engine
URL : https://dockerproject.org
License : ASL 2.0
Description : Docker is an open source project to build, ship and run any application as a
: lightweight container.

But forget it. That’s 17.06, which is 6 months old. You should consider Docker as a puppy. 6 months in puppy years is like 3 human years. So many changes happened.

You can remove all those old things:

# yum remove docker docker-common docker-selinux docker-engine

docker-ce

I’ve not found ‘docker-ce’ on OL7 repositories, as only the Enterprise Edition is there. Then I added the CentOS repo (with yum-config-manager that you can get with yum-utils if you don’t have it already):

yum -y install yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

Once done, I’ve just installed Docker Community Edition with:

yum -y install docker-ce

And, at that time I got the version 17.12 easily:

[root@VM188 ~]# yum info docker-ce
Loaded plugins: ulninfo
Installed Packages
Name : docker-ce
Arch : x86_64
Version : 17.12.0.ce
Release : 1.el7.centos
Size : 123 M
Repo : installed
From repo : docker-ce-stable
Summary : The open-source application container engine
URL : https://www.docker.com
License : ASL 2.0
Description : Docker is an open source project to build, ship and run any application as a
: lightweight container.
...

But now there’s a new version available:

Available Packages
Name : docker-ce
Arch : x86_64
Version : 18.03.0.ce
Release : 1.el7.centos
Size : 35 M
Repo : docker-ce-stable/x86_64
Summary : The open-source application container engine
URL : https://www.docker.com
License : ASL 2.0
Description : Docker is an open source project to build, ship and run any application as a
: lightweight container.

The problem is that if you want to install docker-ce in this latest version, you will now get:

Resolving Dependencies
--> Running transaction check
---> Package docker-ce.x86_64 0:18.03.0.ce-1.el7.centos will be installed
--> Processing Dependency: pigz for package: docker-ce-18.03.0.ce-1.el7.centos.x86_64
--> Finished Dependency Resolution
Error: Package: docker-ce-18.03.0.ce-1.el7.centos.x86_64 (docker-ce-stable)
Requires: pigz
You could try using --skip-broken to work around the problem
** Found 1 pre-existing rpmdb problem(s), 'yum check' output follows:
2:microcode_ctl-2.1-22.5.0.3.el7_4.x86_64 has missing requires of kernel

(Ok Google, this is what you need to index…)

pigz

Starting from version 18.02 there’s a new dependency on ‘pigz’ for parallel gzip.

To get this ‘pigz’ package from the OL7 repository you need to enable EPEL in /etc/yum.repos.d/public-yum-ol7.repo

[ol7_developer_EPEL]
name=Oracle Linux $releasever Developement Packages ($basearch)
baseurl=http://yum.oracle.com/repo/OracleLinux/OL7/developer_EPEL/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1
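If the [ol7_developer_EPEL] section already exists in your repo file (just disabled), enabling it with yum-config-manager, as done above for ol7_addons, should have the same effect. This is only a sketch assuming that repository id:

# yum-config-manager --enable ol7_developer_EPEL
# yum info pigz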

Now, I’m able to install the latest docker-ce:

[root@VM188 yum.repos.d]# yum install docker-ce
Loaded plugins: ulninfo
Resolving Dependencies
--> Running transaction check
---> Package docker-ce.x86_64 0:17.12.0.ce-1.el7.centos will be updated
---> Package docker-ce.x86_64 0:18.03.0.ce-1.el7.centos will be an update
--> Processing Dependency: pigz for package: docker-ce-18.03.0.ce-1.el7.centos.x86_64
--> Running transaction check
---> Package pigz.x86_64 0:2.3.4-1.el7 will be installed
--> Finished Dependency Resolution
 
Dependencies Resolved
 
================================================================================================================================
Package Arch Version Repository Size
================================================================================================================================
Updating:
docker-ce x86_64 18.03.0.ce-1.el7.centos docker-ce-stable 35 M
Installing for dependencies:
pigz x86_64 2.3.4-1.el7 ol7_developer_EPEL 80 k
 
Transaction Summary
================================================================================================================================
Install ( 1 Dependent package)
Upgrade 1 Package
 
Total download size: 35 M
Is this ok [y/d/N]: y

Oracle Database on Docker

You may wonder why I install Docker on Oracle Linux rather than CentOS. The MOS Doc ID 2216342.1 mentions that Oracle will support customers running Oracle Database (single instance) in Docker containers running on Oracle Linux 7 with UEK4 or Red Hat Enterprise Linux 7.

If you want to validate your Docker install for running Oracle Database, the easiest is to use the image build script provided by Oracle:

git clone https://github.com/oracle/docker-images.git
cd ./docker-images/OracleDatabase/SingleInstance/dockerfiles/
# download linuxx64_12201_database.zip and put it into the 12.2.0.1 subdirectory
sh buildDockerImage.sh -v 12.2.0.1 -e

Those are maintained by Gerald Venzl, Oracle product manager for database development, so they are obviously the best way to run Oracle Database on Docker. You can read all related best practices from the same author. Once you have that running, you have validated your environment and you can customize further if you want.
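For completeness, a minimal sketch of starting a container from the resulting image could look like this; the image tag and the ORACLE_PWD variable are assumptions based on the repository README, so check it for the exact names and options:

docker run --name oracle12c -p 1521:1521 -p 5500:5500 -e ORACLE_PWD=MySecretPwd oracle/database:12.2.0.1-ee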

 

The article Docker-CE on Oracle Enterprise Linux 7 first appeared on Blog dbi services.

Hybrid histograms

Just a quick post here so I could consolidate some information about histograms in 12c.

On my last Office Hours session, one of the questions posed asked for an explanation of the new types of histograms in use in 12c.  So I had a few slides and described conceptually, at a high level, how they work and what they represent.  If you’re new to 12c, or new to optimizer histograms in general, then take a look at the video below to get up to speed.  But for those who want to dig in at a deeper level, I also saw recently two absolutely cracking articles which discuss the algorithms and implementation in much more detail.  So I’ve linked them here as well.

http://www.oracle.com/technetwork/articles/database/histogram-construction-oracle-12c-4426845.html

http://www.oracle.com/technetwork/articles/database/maintenance-histograms-db-12c-4426850.html

On that note… if you celebrate Easter, have a safe and happy break.  See you on the other side of the weekend!


Lock Types

Every now and again I have to check what a particular lock (or enqueue) type is for and what the associated parameter values represent. This often means I have to think about the names of a couple of views and a collection of columns, then create a few column formats to make the output readable (though sometimes I can take advantage of the “print_table()” procedure that Tom Kyte wrote a long time ago). It only takes a little time to get the code right, but it’s a nuisance when I’m in a hurry, so I’ve just scribbled out a few lines of a script that takes a lock type as an input parameter and reports all the information I want.

rem
rem     Script:         lock_types.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2018
rem     Usage:          start lock_types {lock type}
rem

define m_lock_type='&1'

column  display new_value m_display

select
        case when substr(version,1,2) = '12'
        then
                'display_name'
        else
                'name'
        end                     display
from
        v$instance
;

set linesize 160
set pagesize 60
set trimspool on

column type             format a4
column name             format a32
column description      format a132
column id1_tag          format a32
column id2_tag          format a32
column is_user          format a4 heading "User"
column is_recycle       format a4 heading "Rcyc"

set feedback off
break on report skip 1

spool lock_types

select  *
from    V$lock_type
where   type = upper('&m_lock_type')
order by
        type
;

column  name                    format a42
column  parameter1              format a9
column  parameter2              format a24
column  parameter3              format a22
column  wait_class              format a14
column  display_name            format a42

select
        eve.name,
        eve.parameter1,
        eve.parameter2,
        eve.parameter3,
        eve.wait_class,
        nullif(eve.&m_display, eve.name) display_name
from
        v$event_name    eve
where
        eve.name like 'enq: ' || upper('&m_lock_type') || '%'
order by
        nullif(eve.wait_class,'Other'),
        eve.name
;

set feedback on

I’ve included a check (and hack) on the value of the major version because 12c introduced a “display_name” as well as a “name” for events, and the latter is sometimes a little more descriptive than the former, so it’s nice to have a single script that could print two different values for the versions that have them.

Here’s a sample of the output when I pass ‘IV’ as an input parameter:

TYPE NAME                             ID1_TAG                          ID2_TAG                          User Rcyc
---- -------------------------------- -------------------------------- -------------------------------- ---- ----
DESCRIPTION                                                                                                                              CON_ID
------------------------------------------------------------------------------------------------------------------------------------ ----------
IV   Library Cache Invalidation       object #                         time stamp                       NO   NO
Synchronizes library cache object invalidations across instances                                                                              0

NAME                                       PARAMETER PARAMETER2               PARAMETER3             WAIT_CLASS     DISPLAY_NAME
------------------------------------------ --------- ------------------------ ---------------------- -------------- ------------------------------------------
enq: IV -  contention                      type|mode id1                      id2                    Other

As you can see from the presence of the con_id column in v$lock_type, this output came from a 12c instance. I picked the IV lock because that’s the one that prompted me to check the meanings of the id[12] and parameter[123] columns when a question about waits for the IV lock appeared recently on Oracle-L. I’ve got two reasons for carrying on with this particular example: first, it demonstrates that the descriptions can be wrong; second, it allows me to demonstrate a quick tip on translation.

The question on Oracle-L related to a 4-node RAC system and reported one instance suffering long waits on the IV enqueue on a fairly regular basis when running a particular batch task. The OP reported the following values as the p1, p2, p3 values from v$session while the wait was going on:


P1    type|mode 1230372869
P2    id1       1398361667
P3    id2                3

According to the details in v$lock_type the enqueue is about library cache invalidation across instances – and that fits the OPs complaint because the system is a RAC system. The id1 value is supposed to be an obj# (object_id), but the OP said it wasn’t; and the id2 value is supposed to be a timestamp, but 3 is an odd value for a timestamp (though it might represent – for example – the 3 second wait that is a common time-out interval for enqueues). So, clearly, the descriptions can be wrong.

Translation

Take another look at p1 and p2, and turn them into Hexadecimal:


1230372869 (dec) = 0x49560005 (hex)
1398361667 (dec) = 0x53594E43 (hex)

If you happen to be good with hex and ASCII codes you’ll know that two-digit hex byte values in the range 41-5F are mostly the capital letters of the Roman alphabet (while 61-7F are mostly the lower case letters), so a second translation step gives us:


1230372869 (dec) = 0x49560005 (hex) = 'IV'  5
1398361667 (dec) = 0x53594E43 (hex) = 'SYNC'

The p1 parameter is described (correctly) as “type/mode” – this is an IV enqueue held or requested in mode 5; the p2 parameter is not an object number, it looks more like a text description of why the enqueue is being requested (the enqueue is, after all, described as being used to “synchronize library cache object invalidation”).

I still don’t know what the final parameter represents – I doubt if it’s really about a three second wait (but that could be checked by examining v$session_wait over a period of several seconds or v$session_wait_history), it might be an indication of the instance that the session is trying to synchronize with (though, again, that seems a long shot), or it might just be a “reason-code” describing why the synchronisation is necessary.

Whenever in doubt about the meaning for the various parameters, it’s always worth a quick dec -> hex -> ASCII check, just in case it offers some clues about the function of the enqueue.
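If you’d rather let the database do that conversion than work it out by hand, a quick sketch (using the p2 value from above, and the standard to_char, hextoraw and utl_raw functions) could look like this:

rem
rem     quick dec -> hex -> ASCII check (sketch) for the p2 value above
rem

select
        to_char(1398361667,'FMXXXXXXXX')                                      hex_value,
        utl_raw.cast_to_varchar2(hextoraw(to_char(1398361667,'FMXXXXXXXX')))  ascii_value
from    dual
;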

 

Richard Foote Consulting Seminars At Your Workplace !!

If you have 6 or more people in your organisation who would be interested in attending a seminar, please contact me at richard@richardfooteconsulting.com. I currently run my highly acclaimed “Oracle Indexing Internals and Best Practices” seminar as public events, which has now been run in some 18 countries. I’m also at work on developing an […]

How to cancel SQL statements in #Oracle 18c


A nice 18c New Feature is that you can cancel a certain SQL statement without using Resource Manager techniques. Here’s an example:

SQL> select banner_full from v$version;

BANNER_FULL
--------------------------------------------------------------------------------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.1.0.0.0

SQL> select distinct sid from v$mystat;

     SID
---------
      477

SQL> begin loop null; end loop; end;   
     /

This produces an endless loop. Now, from a second session, I need the SQL_ID of the statement and the SERIAL# of the offending session:

SQL> select sql_text,sql_id from v$sql where sql_text like '%loop null; end loop%';

SQL_TEXT
--------------------------------------------------------------------------------
SQL_ID
-------------
begin loop null; end loop; end;
8gadd3yhtd150

select sql_text,sql_id from v$sql where sql_text like '%loop null; end loop%'
6vbb9d7zj9t5w


SQL> select serial# from v$session where sid=477;

   SERIAL#
----------
     10569

Now I can use the New Feature:

SQL> alter system cancel sql '477,10569,8gadd3yhtd150';

System altered.

The offending session gets

ERROR at line 1:
ORA-01013: user requested cancel of current operation

We waited for this functionality for quite some time!

A look into oracle redo, part 8: generate redo

This blogpost looks at the very start of Oracle redo: the generation of it. In order to do that, I start off with a very simple table and look at the redo generation part. I guess the regular readers of this blogpost series understand that redo generation is closely connected with the changes made to blocks. This post therefore is not only about redo generation, but also about the technical implementation of block changes.

I created a simple table (create table test (f1 varchar2(10))) with no index to make the execution as simple as possible, and simply insert rows (insert into test values (‘a’)). It could be argued that not having an index makes it not the most real-life scenario, and this might be right. However, the goal here is to make the execution as simple as possible.

I then looked at the execution of the SQL, and created an overview of the relevant functions that are executed in my session:

insexe
  qerltcStart
  qerltcFetch
    qerltcSingleRowLoad
      qerltcSimpleInsRowCBK
        kdtSimpleInsRow
          kdtgrs
          kdtwrp
            kdtchg
              ktbchg2
                ktuchg2
                  ktcbgn
                  ktubnd
                    ktuGetTxForXcb
                      ksqgtlctx
                  ktugur
                  kcbchg1
                    kcbchg1_main
                      ktiimu_chg
                      kcb_block_status
                      ktiimu_chg					
                      ktiimu_chg
                      kcopcv
                      kcopcv
                      kcopcv
                      kcrfw_redo_gen_ext
                        kcrfw_copy_cv
                        kcscur_rba
                        kcbapl
                          kcbhlchk
                          kcoapl
                            kco_issue_callback
                              kturdh
                            kco_blkchk
                            kcoadv_hdr
                            kcbhfix_tail
                        kcbapl
                          kcbhlchk
                          kcoapl
                            kco_issue_callback
                              kturdb
                            kco_blkchk
                            kcoadv_hdr
                            kcbhfix_tail
                        kcbapl
                          kcbhlchk
                          kcoapl
                            kco_issue_callback
                              kdoirp
                            kcoadv_hdr
                            kcbhfix_tail	
                        kcrfw_partial_checksum
    qerltcPostInsertProcessing
    qerltcSetupOerdef
    qerltcFreeMemory

How to read this: I started looking at the functions that are executed when an insert statement is executed at the ‘insexe’ function, which is the main insert function. An indentation means the function is called from the previous, less indented function.

The insertion code calls query execute rowsource load table conventional functions (qerltc). Conventional here means that it’s not a direct insertion. The qerltcStart function essentially initialises memory which is needed to perform the execution of the insert statement.

After the rowsource layer, the data layer is entered using the kdtSimpleInsRow function. The first thing that needs to be done here is to find space in a block in which the row can be inserted. Finding a block is done in the kdtgrs function (kernel data table get space for rows). Once a block is found, the code inserts the row using the kdtwrp function (kernel data table write row piece). Via the kdtchg (kernel data table change) function, the transaction layer is entered.

The transaction layer functions that are then used are ktbchg2 (kernel transaction block (header) change) and ktuchg2 (kernel transaction undo called from write logs and perform changes), after which the transaction is initialised in ktcbgn (kernel transaction control begin); next is ktubnd (kernel transaction undo bind undo segment). Inside ktubnd, not only is the undo segment selected (‘bound’), but the transaction enqueue lock is set too (ksqgtlctx), which locks the row logically. Further down below you will see that Oracle also needs to provide a physical way of locking access when it makes changes, to prevent wrong data from being read.

Then we enter the cache layer via the kcbchg1 function (kernel cache buffers change). The cache layer is the actual layer that executes functions to change the block. The actual main function which performs block layer functions is kcbchg1_main.

In order for Oracle’s recovery mechanism to work, the redo must be generated before the block change, so a block can be recovered using the redo. Therefore, kcrfw_redo_gen_ext (kernel cache redo file write/broadcast SCN main redo generation function (12c)) is called. Inside kcrfw_redo_gen_ext, the redo change vectors are copied into the public redo strand using the kcrfw_copy_cv function (kernel cache redo file write/broadcast SCN copy change vectors), after which the changes to the block in the cache are done using the kcbapl (kernel cache buffers apply).

There are three kcbapl executions. The first one applies the change to the undo header, where the transaction is assigned a slot which points to the actual undo data inside the undo segment, visible inside the kco_issue_callback function kturdh (kernel transaction undo write undo header data to undo block). The second one applies the change to the undo block with the previous version of the data, so a change can be undone and read consistency can be provided. This is also visible inside the kco_issue_callback function: kturdb (kernel transaction undo write undo data to undo block). The third invocation of kcbapl has the function kdoirp (kernel data operations insert row piece), which performs the actual change of the block to insert the row.

If you look closely at the functions inside kcbapl, you see that a check is done on the buffer prior to the change (kcbhlchk, kernel cache buffers header logical check), and after the change the block metadata is changed to reflect the change (kcoadv_hdr, kernel cache operations advance SCN in buffer header, and kcbhfix_tail, kernel cache buffers header update tail).

After the change, a checksum is applied to the generated redo using kcrfw_partial_checksum.

This concludes the insert. A row lock is obtained, the redo change vectors have been copied into the public redo strand, and the changes have been applied to the undo and table blocks.

The functions return, and execution is back at the query execute rowsource layer (qer), which performs some post processing using qerltcPostInsertProcessing, qerltcSetupOerdef and qerltcFreeMemory.

A question that I asked myself is where the actual redo change vectors are created. It’s known that kcrfw_copy_cv copies the change vectors into the public redo strand, but these need to be created before you can copy them. Are these created on the spot in kcrfw_copy_cv? Probably not. Of course this dives so deep into the actual workings of the Oracle engine that Oracle does not reveal anything about it (that I am aware of).

Luckily, using the Intel Pin tools trace we can see memory access, and therefore we can track the memory areas that a process is using, and see where (in which function) memory is written before it is written into the public redo strand. First let’s look at the kcrfw_copy_cv function:

40626:kcrfw_copy_cv+544:0x00007ffcd200f778():W:8:0x10e3d3e5/283366373()
40627:_intel_fast_memcpy+7:0x00000000157f62a0():R:8:0x137fff/1277951()
40628:__intel_ssse3_rep_memcpy+30:0x00000000137d0c90():R:4:0xf2afbf82/4071604098(??)
40629:__intel_ssse3_rep_memcpy+8722:0x00007ffcd2011678():R:16:0x5223030ffff10100542b760/-263813982931104()
40630:__intel_ssse3_rep_memcpy+8727:0x00000000d80cbe54(Redo Buffers(pgsz:2048k)|+835156 redo|PUB_REDO_0+835156 ):W:16:0x5223030ffff10100542b760/-263813982931104()
40631:__intel_ssse3_rep_memcpy+8732:0x00007ffcd2011688():R:16:0x000010ffff00000000/4785070309113856()
40632:__intel_ssse3_rep_memcpy+8737:0x00007ffcd201168c():R:16:0x10ffff0000000040200/-4503599627107840()
40633:__intel_ssse3_rep_memcpy+8742:0x00000000d80cbe64(Redo Buffers(pgsz:2048k)|+835172 redo|PUB_REDO_0+835172 ):W:16:0x000010ffff00000000/4785070309113856()
40634:__intel_ssse3_rep_memcpy+8747:0x00000000d80cbe68(Redo Buffers(pgsz:2048k)|+835176 redo|PUB_REDO_0+835176 ):W:16:0x10ffff0000000040200/-4503599627107840()
40635:__intel_ssse3_rep_memcpy+8752:0x00007ffcd200f778():R:8:0x10e3d3e5/283366373()
40636:kcrfw_copy_cv+569:0x00007ffcd200f890():R:4:0x2/2()

What this shows, is the kcrfw_copy_cv function calling _intel_fast_memcpy. The _intel_fast_memcpy is using a specialised memcpy function, __intel_ssse3_rep_memcpy, which uses a processor extension called ‘ssse3’. At line numbers 40629, 40631 and 40632 16 bytes are read and at 40630, 40633 and 40634 the reads are written into the public redo strand.

In order to understand where the data in these 16-byte memory areas has been written, I use an extension to my pinatrace annotate oracle script that creates a sqlite database of the annotation results using the ‘-d’ switch.

In order to see where the contents of the memory address 0x00007ffcd2011678 have been written, fire up sqlite3 with the created database:

$ sqlite3 insert.txt.db
sqlite> select * from annotation where memory_address between printf('%d',0x00007ffcd2011678) and printf('%d',0x00007ffcd2011678)+15 and line < 40629;
14439|kssadd+322|140723831772792|??|W|8|279514983|??
14444|kpdbIdToUid+231|140723831772792|??|R|8|279514983|??
16562|kglget+33|140723831772792|??|W|8|0|??
16563|kglget+37|140723831772800|??|W|8|0|??
17380|kglget+290|140723831772792|??|R|8|0|??
17381|kglget+294|140723831772800|??|R|8|0|??
17393|kglpin+13|140723831772792|??|W|8|1810933992|Variable Size(pgsz:2048k)|+183544040 shared pool|KGLH0^d0901451 DS:free memory,duration 1,cls free+96 shared pool|KGLH0^d0901451,duration 1,cls recr+2792
17406|kglpin+77|140723831772804|??|W|4|1|??
17621|kglpin+857|140723831772800|??|W|4|1|??
17720|kglpin+1115|140723831772800|??|R|4|1|??
17726|kglpin+1164|140723831772804|??|R|4|1|??
18697|kglpin+1993|140723831772792|??|R|8|1810933992|Variable Size(pgsz:2048k)|+183544040 shared pool|KGLH0^d0901451 DS:free memory,duration 1,cls free+96 shared pool|KGLH0^d0901451,duration 1,cls recr+2792
18755|kzpchkbu_internal+133|140723831772792|??|W|4|12|??
18763|kzpchkbu_internal+187|140723831772800|??|W|1|2|??
18772|kzpchkbu_internal+230|140723831772801|??|W|1|0|??
18778|kzpchkbu_internal+285|140723831772798|??|W|2|0|??
18818|kzpchkbu_internal+567|140723831772792|??|R|4|12|??
18867|kzpchkbu_internal+1241|140723831772796|??|W|1|0|??
19084|kzpchkbu_internal+1496|140723831772796|??|R|1|0|??
19088|kzpchkbu_internal+1524|140723831772792|??|R|4|12|??
19090|kzpchkbu_internal+1528|140723831772801|??|R|1|0|??
19218|kzpchkbu_internal+1747|140723831772801|??|R|1|0|??
20936|kglati+52|140723831772792|??|W|8|139724814752480|??
21062|kglati+190|140723831772800|??|W|8|80|??
22201|kglati+407|140723831772792|??|R|8|139724814752480|??
22203|kglati+415|140723831772800|??|R|8|80|??
30665|__intel_memset+2691|140723831772800|??|W|16|0|??
31116|__intel_memset+2691|140723831772800|??|W|16|0|??
38371|ktuchg2+9341|140723831772792|??|W|1|5|??
38372|ktuchg2+9345|140723831772793|??|W|1|2|??
39504|kcopcv+207|140723831772794|??|W|2|35|??
39507|kcopcv+221|140723831772800|??|W|4|272|??
39511|kcopcv+249|140723831772796|??|W|4|4294901763|??
39517|ub8_to_kscn_impl+21|140723831772804|??|W|4|7744340|??
40116|kcrfw_redo_gen_ext+1260|140723831772796|??|R|2|3|??
sqlite>

The query is simple: look up all records which were generated before line 40629, which did access the memory region that is read for copying into the public redo strand.

This is the complete output. Please be aware that the high memory addresses are PGA memory addresses. PGA memory is highly fluid in nature. If you look at memory access traces you see that nearly every layer that is used during execution of SQL and PL/SQL requires allocation of memory areas to store information. In fact, it’s so fluid that some memory areas can be reused during repetitions in execution within the same statement.

The ‘__intel_memset’ functions are functions that set a memory area to a certain value, in this case zero. This might be an initialisation of memory. In order to be sure, look up the lines (30665/31116), and see from what function they were called:

30620:kdtgrs+233:0x00007ffcd2012188():W:8:0/0()
30621:kdtgrs+244:0x000000006c20fb98(Variable Size(pgsz:2048k)|+186710936 shared pool|PLMCD^6daf6103 DS:permanent memor,duration 4,cls perm+2472 shared pool|PLMCD^6daf6103,duration 4,cls freeabl+2512 ):R:1:0x8/8
()
30622:kdtgrs+250:0x000000006c20fba4(Variable Size(pgsz:2048k)|+186710948 shared pool|PLMCD^6daf6103 DS:permanent memor,duration 4,cls perm+2484 shared pool|PLMCD^6daf6103,duration 4,cls freeabl+2524 ):R:1:0x80/
128()
30623:kdtgrs+273:0x00007ffcd20110e8():W:8:0x10e0cc66/283167846()
30624:_intel_fast_memset+7:0x00000000157f62a0():R:8:0x137fff/1277951()
30625:__intel_memset+1072:0x00000000157f6724():R:4:0x2/2()
30626:__intel_memset+1128:0x00000000062c7240():R:8:0x92/146()
30627:__intel_memset+2633:0x00000000157b9480():R:4:0x600000/6291456()
30628:__intel_memset+2660:0x00007ffcd2011430():W:16:0x0000000000000000/0()
30629:__intel_memset+2664:0x00007ffcd2011440():W:16:0x0000000000000000/0()
30630:__intel_memset+2669:0x00007ffcd2011450():W:16:0x0000000000000000/0()
30631:__intel_memset+2674:0x00007ffcd2011460():W:16:0x0000000000000000/0()
30632:__intel_memset+2686:0x00007ffcd2011470():W:16:0x0000000000000000/0()
30633:__intel_memset+2691:0x00007ffcd2011480():W:16:0x0000000000000000/0()

It’s called from kdtgrs. And by looking at the long list of memset commands, this clearly is an initialisation of memory. This means that any write before the memset belongs to something completely different, and that these functions, which come after it, provided the redo information:

38371|ktuchg2+9341|140723831772792|??|W|1|5|??
38372|ktuchg2+9345|140723831772793|??|W|1|2|??
39504|kcopcv+207|140723831772794|??|W|2|35|??
39507|kcopcv+221|140723831772800|??|W|4|272|??
39511|kcopcv+249|140723831772796|??|W|4|4294901763|??
39517|ub8_to_kscn_impl+21|140723831772804|??|W|4|7744340|??

The ktuchg2 function hints at undo, and the kcopcv function is a function that seems to examine the block and create redo information based on the block. The ub8_to_kscn_impl is a function for the conversion of SCN numbers.

Let’s take the next memory region, and see what functions manipulated that region:
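Presumably this is the same query as before, just moved along to the next 16-byte region that the memcpy read (the address comes from line 40631 of the trace); something like:

sqlite> select * from annotation where memory_address between printf('%d',0x00007ffcd2011688) and printf('%d',0x00007ffcd2011688)+15 and line < 40631;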

38375|ktuchg2+9375|140723831772816|??|W|2|0|??
38376|ktuchg2+9380|140723831772818|??|W|2|0|??
38377|ktuchg2+9385|140723831772820|??|W|2|0|??
38378|ktuchg2+9390|140723831772822|??|W|2|0|??
39506|kcopcv+216|140723831772816|??|W|2|0|??
39510|kcopcv+237|140723831772814|??|W|2|65535|??
39518|ub8_to_kscn_impl+37|140723831772810|??|W|2|0|??
39519|ub8_to_kscn_impl+41|140723831772808|??|W|2|0|??
39523|kcopcv+326|140723831772812|??|W|1|1|??
39526|kcopcv+403|140723831772813|??|W|1|0|??

A little more of the same: ktuchg2/kcopcv/ub8_to_kscn_impl, although in another order.
And the third memory region:

38375|ktuchg2+9375|140723831772816|7ffcd2011690|??|W|2|0|0|??
38376|ktuchg2+9380|140723831772818|7ffcd2011692|??|W|2|0|0|??
38377|ktuchg2+9385|140723831772820|7ffcd2011694|??|W|2|0|0|??
38378|ktuchg2+9390|140723831772822|7ffcd2011696|??|W|2|0|0|??
38383|ktuchg2+9426|140723831772826|7ffcd201169a|??|W|2|32|20|??
39506|kcopcv+216|140723831772816|7ffcd2011690|??|W|2|0|0|??
39510|kcopcv+237|140723831772814|7ffcd201168e|??|W|2|65535|ffff|??
39523|kcopcv+326|140723831772812|7ffcd201168c|??|W|1|1|1|??
39526|kcopcv+403|140723831772813|7ffcd201168d|??|W|1|0|0|??
39895|kcobrh+157|140723831772824|7ffcd2011698|??|W|2|4|4|??

More of the same, although another function was added: ‘kcobrh’. I am not familiar with this function, but my suspicion is that it generates a header. Of course these 3 regions were copied using the same call.

The next memcpy call copies two 16 byte regions:

40641:kcrfw_copy_cv+544:0x00007ffcd200f778():W:8:0x10e3d3e5/283366373()
40642:_intel_fast_memcpy+7:0x00000000157f62a0():R:8:0x137fff/1277951()
40643:__intel_ssse3_rep_memcpy+49:0x00000000137d0a40():R:4:0xf2afc3cc/4071605196(??)
40644:__intel_ssse3_rep_memcpy+9244:0x00007ffcd201174c():R:16:0x120880000000000000/612489549322387456()
40645:__intel_ssse3_rep_memcpy+9249:0x00000000d80cbe88(Redo Buffers(pgsz:2048k)|+835208 redo|PUB_REDO_0+835208 ):W:16:0x120880000000000000/612489549322387456()
40646:__intel_ssse3_rep_memcpy+9254:0x00007ffcd201173c():R:16:0x100069600e810008a1c0/7593084918960792000()
40647:__intel_ssse3_rep_memcpy+9258:0x00000000d80cbe78(Redo Buffers(pgsz:2048k)|+835192 redo|PUB_REDO_0+835192 ):W:16:0x100069600e810008a1c0/7593084918960792000()
40648:__intel_ssse3_rep_memcpy+9262:0x00007ffcd200f778():R:8:0x10e3d3e5/283366373()
40649:kcrfw_copy_cv+569:0x00007ffcd200f890():R:4:0x2/2()

(lines 40644 and 40646)
This memory is written to by different functions; in fact, these were the functions that I suspected generate the redo information:

37790|ktuchg2+808|140723831773004|??|W|2|2|??
38266|ktugur+1575|140723831773006|??|W|2|136|??
38270|ktugur+1618|140723831773004|??|R|2|2|??
38271|ktugur+1625|140723831773004|??|W|2|18|??
38272|ktugur+1629|140723831773008|??|W|1|0|??
38279|ktugur+1663|140723831773012|??|W|2|0|??
38281|ktugur+1677|140723831773014|??|W|2|0|??
38283|ktugur+1690|140723831773016|??|W|4|0|??
38291|ktugur+1756|140723831773004|??|R|2|18|??
38292|ktugur+1763|140723831773004|??|W|2|18|??
38297|ktugur+1811|140723831773004|??|W|2|18|??

One write of ktuchg2 and then ktugur (kernel transaction undo generate undo and redo).

I went through all of the memory areas that are written to the public redo strand, and found the functions in which redo is generated:
– ktuchg2
– kcopcv
– kcobrh
– ktugur
Other functions that I have found in other regions of memory:
– kdtwrp
– kcosze
– ktugul

There is one other thing I found that I didn’t suspect. During the investigation I used a debugger to stop on certain functions during the insert (in fact, that is how I got the list of functions at the start of this article). While the insert was stopped, I tried querying the table, in order to see what happened visibly to the table at which I performed the insert.

I found that the kcbchg1_main function performs cache buffers chains latch gets without calling kslgetl/ksl_get_shared_latch(_int). This has been documented in a few places: it uses a macro to perform the latch get. However, when stopping the inserting session (by breaking execution using a debugger), I found that my other session, simply executing a ‘select’ query, hung on the latch.

Also, after the cache buffers chains latch get, the block is modified. That is obvious of course: we are doing an insert. However, during the entire time from kcopcv up to the two kcbapl functions, the block is not accessible for querying, and keeps any other query waiting in the wait event ‘buffer busy waits’. This is a direct violation of the mantra that writers don’t block readers.
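A simple way to see this for yourself: while the inserting session is stopped in the debugger, a third session can show what the blocked reader is waiting on. This is a minimal sketch, not part of the original investigation, and the meaning of p1/p2/p3 depends on the wait event:

-- run from a third session while the inserting session is stopped in the debugger;
-- shows what the blocked SELECT is currently waiting on
select sid, event, state, p1, p2, p3, seconds_in_wait
from   v$session
where  event in ('latch: cache buffers chains', 'buffer busy waits');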

Now, before you jump to conclusions, please bear with me a little longer. Is this a problem? No, it is the direct consequence of concurrency control, which, in a shared system, always must happen. Certain things can only be done one at a time; in fact, that is the whole purpose of a latch.

However, I do want to make you think about it. These concurrency control waits simply must happen in order not to disturb changes being made to a block, or to avoid reading an inconsistent version of a block. It has probably worked this way in Oracle for a long time, and it has not been a problem. Generally. Where it does become a problem is when your processes randomly sleep for longer periods of time. Because your inserting process stalls while executing the kcrfw_redo_gen_ext function, other processes will get a buffer busy wait.

So when does this happen? Why am I saying this? Well, these random sleeps of processes are something that actually happens when your system is overloaded on CPU, or when your system is actively swapping. Now think about this in a broader sense: whenever you run virtualised, and you do not have strict policies enforced on CPU and memory allocation, and especially if there is oversubscription of CPU and/or memory, these random sleeps can be introduced by the hypervisor. So never -ever- let your database server run short on CPU or memory, because the database will start showing very peculiar waits, which could easily put you on the wrong foot.

Conclusion
I actually tried putting three distinct messages in this blogpost. The first one is to show how a change (an insert) is done internally. This shows that Oracle is built up in layers, and a lot of these layers are crossed even when you do the simplest of things, like inserting a single row into a table. Yes, this is not simple, and even plain confusing. That is not something I can change. However, I do think this information is crucial in order to understand how the database works, and thus how to troubleshoot issues and understand what happens.

When a change is made to a block, first the redo is generated, then the undo is written, and then a row is inserted into the data block.

The second message is research on where the redo change vectors are generated. This will not help with anything concrete; it is background information for improving the understanding of how Oracle works. The research shows that redo is generated in a couple of functions:
– ktuchg2
– kcopcv
– kcobrh
– ktugur
– kdtwrp
– kcosze
– ktugul

At certain points, Oracle serialises access to blocks in order to make sure changes are not disturbed, and other processes do not read an inconsistent version of a block. This is technical locking, not the logical transaction locking that Oracle provides using enqueues. This is the third piece of information that I put in this article. This is directly usable and concrete information, and this is crucial, especially in today’s world of using virtualisation on premises or in the cloud.

If processes get stuck during these times of serialisation, all kinds of waits become visible, which might look like other problems; my first thought with buffer busy waits is a block being read into the buffer cache from disk or being written from the buffer cache to disk, not a block being modified in the cache. In fact, when inserting into a freshly truncated table, I found I could get TX lock waits while doing a select. Again: this is not obvious.

So the last paragraph warns you to make absolutely sure your database server is not oversubscribed on any resource, especially CPU and memory, because oversubscription can show itself in unpredictable ways, making problems harder to diagnose, and thus to solve.

DMV for Log Statistics in SQL Server

There aren’t too many changes to the Dynamic Management Views in SQL Server 2017, but one that was mentioned is worth a second look, and it often has a lot more going on than it appears at first glance.


DM_DB_LOG_STATS

This is an addition to SQL Server 2017 and is available in Azure SQL Database/Azure SQL Data Warehouse to simplify and unify how we query information about log statistics, not just backups.  The added benefit is that it only requires the VIEW DATABASE STATE permission, which eases the demand on anyone who has tried to grant rights to view backup information to non-DBA personnel in previous releases.  With this DMV, you can monitor, alert and kick off jobs to manage transaction log backups.
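As a quick illustration of how little is needed, granting the permission to a hypothetical (made-up) monitoring user could look like this:

-- hypothetical user name; VIEW DATABASE STATE is enough to query sys.dm_db_log_stats
GRANT VIEW DATABASE STATE TO [monitoring_user];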

It does require the database_id to be passed to it to provide results, but who keeps that around?  Much easier to just do a Cross Apply to sys.databases and make your life simple.

With the following query template:

SELECT name as 'Database Name', <column> as '<alias>'
FROM sys.databases AS s
CROSS APPLY sys.dm_db_log_stats(s.database_id)
where name='<database name>';

filled in with the view columns listed below (aliased if you prefer), and with the database name if you want to limit it to one database, you can gather valuable transaction log statistics.

Important View Column Names

recovery_model: SIMPLE, FULL or BULK_LOGGED
log_min_lsn / log_end_lsn: the earliest and latest LSN included
current_vlf_sequence_number: the current virtual log file sequence number
current_vlf_size_mb: the size in MB of the current virtual log file
total_vlf_count: number of virtual log files residing in the transaction log
total_log_size_mb: total size of the transaction log in MB
log_backup_time: time the last transaction log backup occurred
log_backup_lsn: the most recent LSN backed up
log_checkpoint_lsn: the last LSN that experienced a checkpoint
log_recovery_size_mb: the size of the log to be recovered since the last full backup
log_truncation_holdup_reason: lots of information here, and it can be joined to sys.databases.log_reuse_wait_desc.  The results quickly describe what is holding up truncation of the transaction log, letting you know about, for example, a rollback or replication.

What can you do with this query template and the information shown?

What LSNs are in my Transaction Logs?

For those of you learning about this from the Oracle side, the LSN or Log Sequence Number is like an SCN in Oracle.

SELECT name as 'Database Name', log_min_lsn as 'Minimum LSN', log_end_lsn as 'Maximum LSN'
FROM sys.databases AS s
CROSS APPLY sys.dm_db_log_stats(s.database_id)
where name='BaseballData';

This will result in the minimum and maximum LSNs that are currently in my logs for the database name given:

[Screenshot: minimum and maximum LSN returned for the BaseballData database]

Last Backup of the Transaction Log?

Knowledge of backups, both full and log backups, is an essential part of a DBA’s job (for on-premises and any IaaS deployment).  We all have our trusty scripts to produce reports, but there have been some additions to the DMVs to assist in providing more information in what I feel is a simpler format.

SELECT name AS 'Database Name', log_backup_time AS 'last log backup time'
FROM sys.databases AS s
CROSS APPLY sys.dm_db_log_stats(s.database_id);

[Screenshot: last log backup time returned for every database on the instance]

For this example, I took out the restriction to a single database, so you can see that all the databases for this SQL Server were returned, and that they are set to the “Simple” recovery model.

I can also monitor the size of the log since the last backup and upon a certain threshold, kick off a backup:

SELECT s.name as 'Database Name', log_since_last_log_backup_mb as 'Log Size Since Last Backup'
FROM sys.databases AS s
CROSS APPLY sys.dm_db_log_stats(s.database_id)
where name='BaseballData';

[Screenshot: log size since the last log backup for the BaseballData database]
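As a minimal sketch of the “kick off a backup” part (the 512 MB threshold and the backup path below are made-up values, not recommendations), the check could look something like this:

-- hypothetical threshold and backup path; back up the log when it has grown
-- more than 512 MB since the last log backup
DECLARE @log_mb float;

SELECT @log_mb = log_since_last_log_backup_mb
FROM sys.dm_db_log_stats(DB_ID('BaseballData'));

IF @log_mb > 512
BEGIN
    BACKUP LOG BaseballData TO DISK = N'D:\Backups\BaseballData_log.trn';
END;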

 

You can also switch the template around a little to join and pull more refined data from the dm_db_log_stats DMV.  Let’s say we want to return the database name, the recovery model and the last backup time of the transaction log for a single database, and we already know what that database_id is:

SELECT t.name as 'Database Name', s.recovery_model as 'Backup Model', s.log_backup_time AS 'last log backup time'
 FROM sys.dm_db_log_stats(5) AS S,sys.databases AS t
 where t.database_id=s.database_id;

[Screenshot: database name, recovery model and last log backup time for database_id 5]

As you can see from above, I chose to pass the database_ID to the DMV and then join to sys.databases to pull the database name, while providing the other column data from the DMV.

Trouble Avoidance

One area of contention for how SQL Server transaction logs perform has to do with the total number of Virtual Log Files (VLFs) vs. the number of active VLFs.  If there is a hold-up in truncating VLFs (too many are held with a “lock” that stops them from being truncated and reused) and the transaction log is required to autogrow, this can create a poor performance scenario.

Due to this, the dm_db_log_stats DMV can be used to monitor the number of VLFs, active VLFs, etc. and then alert when a threshold is reached.  This way the DBA isn’t forced to shrink the transaction log, which could impact overall database performance if it is done regularly (and is also a sign of poor code/database design in my opinion).

SELECT t.name as 'Database Name', s.total_vlf_count as 'Total VLFs', s.active_vlf_count as 'Active VLFs', s.log_truncation_holdup_reason as 'Reason for Log Holdup'
FROM sys.dm_db_log_stats(5) AS S,sys.databases AS t
where t.database_id=s.database_id;

[Screenshot: total VLFs, active VLFs and log truncation holdup reason]

And for my small sample database there aren’t a lot of VLFs (vs. a production system, which may have 10K or more), but you get the idea…
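To give an idea of what an alert on top of this could look like, here is a minimal sketch; the 1000-VLF threshold is made up, and RAISERROR is just a stand-in for whatever alerting mechanism you use:

-- hypothetical threshold; RAISERROR stands in for your alerting of choice
DECLARE @db sysname, @vlfs int;

SELECT TOP (1) @db = t.name, @vlfs = s.total_vlf_count
FROM sys.databases AS t
CROSS APPLY sys.dm_db_log_stats(t.database_id) AS s
WHERE s.total_vlf_count > 1000
ORDER BY s.total_vlf_count DESC;

IF @db IS NOT NULL
    RAISERROR('Database %s has %d VLFs in its transaction log', 16, 1, @db, @vlfs);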