Oakies Blog Aggregator

That demned elusive archive log!

Introduction

With apologies to Emma Orczy again for stealing a line from “The Scarlet Pimpernel” …


Power BI 101- Logging and Tracing, Part II

So we went over locations and the basics of logging and tracing in Power BI.  I now want to know how to make more sense from the data.  In Oracle, we use a utility called TKProf, (along with others and a number of third party tools) to make sense of what comes from the logs.  SQL Server has Log Analytics and the profiler, but what can I do with Power BI?

First, let’s discuss what happens when we have actual activity.  In my first post, the system was pretty static.  This time I chose to open a file with larger data refreshes from multiple sources, added tables, calculated columns and measures.  One Access DB, with over 10 million rows, is refreshed when I first open the PBIX file:

[Screenshot: logging.png]

Post loading, there’s a significant increase in the volume of MS Mashup Container, (calculations and measures) and msmdsrv, (data loading) logging:

[Screenshot: logging1.png]

Do I really want to go through all this data by hand?  BI is a reporting tool, so what if I bring the logs into Power BI?  Let’s start with the first MS Mashup Container log:

In Power BI, click on Get Data –> Text and change the file type to “All Files” in the explorer and go to the directory that contains the trace files:

C:\Users\\AppData\Local\Microsoft\Power BI Desktop\Traces\Performance

Remember that you will need “hidden items” set to display in order to browse down to this folder.  Choose the files you wish to load from the directory into Power BI and choose a Custom delimiter of a quote (“) to split the file.  This will load a table with a few columns you’ll need to remove, because they contain only colons, nulls and other syntax from the file.  Once you’ve done this, you should be left with a table of 15 columns of valuable data:

[Screenshot: the resulting columns]

I’ve renamed the columns to something more descriptive and I can now apply these changes and pull some value from the data.
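If you’d rather prep or sanity-check the trace files outside of Power BI first, the same quote-delimited parse can be sketched with pandas.  This is just an illustration, not part of the original workflow; the file pattern and the column clean-up rule are my own assumptions:

    import glob
    import os

    import pandas as pd

    # Assumed location of the Performance trace files (hidden under AppData).
    trace_dir = os.path.expandvars(
        r"%LOCALAPPDATA%\Microsoft\Power BI Desktop\Traces\Performance")

    frames = []
    for path in glob.glob(os.path.join(trace_dir, "*.log")):  # assumed *.log pattern
        # Split each line on the quote character, like the Custom delimiter in Power BI.
        df = pd.read_csv(path, sep='"', engine="python", header=None,
                         on_bad_lines="skip")
        df["source_file"] = os.path.basename(path)  # remember which log each row came from
        frames.append(df)

    traces = pd.concat(frames, ignore_index=True)
    # Drop columns holding a single repeated token (the colons, commas and other
    # syntax between fields), leaving the columns with real values.
    traces = traces.loc[:, traces.nunique(dropna=False) > 1]
    print(traces.head())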

Using the provided data, I can then produce a report that tells me what types of processes are the largest consumers of resources and time.  I can provide reports that give a visual on what’s going on in a Power BI environment.  The report is pretty straightforward:  wait events against percentage of waits, memory allocation over time, time waited and wait count.  These reports may seem really foreign to most data scientists, but for a DBA they should resonate and provide ways to offer assistance to the Power BI group in optimization.

[Screenshot: the Power BI report]

I can add a hierarchy to this to drill down into interesting areas of waits, and add more files going forward, identifying each table by the unique identifier and date of the file it came from.  I expect my reports and my direction to look different from the direction many have taken with Power BI performance, but I wanted to demonstrate that optimization is always about time.  I fully admit that I’m still learning, but I’m also approaching this from a database optimization perspective.  Please let me know your thoughts.

Happy hunting, folks!

 

 

 



Copyright © DBAKevlar [Power BI 101- Logging and Tracing, Part II], All Rights Reserved. 2018.

Announcement: Webinars for “Oracle Indexing Internals and Best Practices” Now Confirmed !!

Exciting News !! I can now confirm the dates for my first webinars of my fully updated and highly acclaimed “Oracle Indexing Internals and Best Practice” seminar. For details of all the extensive content covered in the webinars, please visit my Indexing Seminar page. The webinars will run for 4 hours each day, spanning a full week period […]

Installing ZFS on OEL7 UEK4 for Docker storage

By Franck Pachot

The Oracle Database is fully supported on Docker provided that the Linux distribution is Red Hat Enterprise Linux 7 or Oracle Enterprise Linux 7 with Unbreakable Enterprise Kernel 4. This is documented in MOS Note 2216342.1. Given the size of an Oracle database, several gigabytes even when empty, the many file updates made when the image is built, and the per-block modifications of the datafiles, a block-level copy-on-write filesystem is a must, and deduplication and compression are appreciated. This makes ZFS a good option for the Docker storage driver, as well as for external volumes. By the way, the Docker documentation about storage drivers mentions that zfs is a good choice for high-density workloads such as PaaS, and this of course includes Database as a Service.

I’ve run this example on OEL 7.2 created in the Oracle Cloud:
CaptureCreateInstance

kernel-uek-devel

We need to install the kernel headers. Of course, it is probably better to run a ‘yum update’ and reboot in order to run the latest kernel.
Here, I’m just installing the headers for the current kernel:

[root@localhost opc]# yum -y install kernel-uek-devel-$(uname -r)
...
Installed:
kernel-uek-devel.x86_64 0:4.1.12-112.14.13.el7uek
Dependency Installed:
cpp.x86_64 0:4.8.5-28.0.1.el7_5.1 gcc.x86_64 0:4.8.5-28.0.1.el7_5.1 glibc-devel.x86_64 0:2.17-222.el7
glibc-headers.x86_64 0:2.17-222.el7 kernel-headers.x86_64 0:3.10.0-862.9.1.el7 libdtrace-ctf.x86_64 0:0.8.0-1.el7
libmpc.x86_64 0:1.0.1-3.el7 mpfr.x86_64 0:3.1.1-4.el7
Dependency Updated:
glibc.x86_64 0:2.17-222.el7 glibc-common.x86_64 0:2.17-222.el7 libgcc.x86_64 0:4.8.5-28.0.1.el7_5.1
libgomp.x86_64 0:4.8.5-28.0.1.el7_5.1

DKMS

We need Dynamic Kernel Module Support to load ZFS modules. I had problems in the past with this so I install it step by step to verify that everything is ok. First, enable the EPEL repository:

[root@localhost opc]# yum install -y yum-utils
[root@localhost opc]# yum-config-manager --enable ol7_developer_EPEL

Then install DKMS:

[root@localhost opc]# yum -y install -y dkms
...
Installed:
dkms.noarch 0:2.4.0-1.20170926git959bd74.el7
Dependency Installed:
elfutils-default-yama-scope.noarch 0:0.170-4.el7 elfutils-libelf-devel.x86_64 0:0.170-4.el7
kernel-debug-devel.x86_64 0:3.10.0-862.9.1.el7 zlib-devel.x86_64 0:1.2.7-17.el7
Dependency Updated:
elfutils-libelf.x86_64 0:0.170-4.el7 elfutils-libs.x86_64 0:0.170-4.el7 zlib.x86_64 0:1.2.7-17.el7

Install ZFS repository

There is a zfs-release package that installs the /etc/yum.repos.d/zfs.repo:

[root@localhost opc]# sudo rpm -Uvh http://download.zfsonlinux.org/epel/zfs-release.el7_4.noarch.rpm
Retrieving http://download.zfsonlinux.org/epel/zfs-release.el7_4.noarch.rpm
warning: /var/tmp/rpm-tmp.yvRURo: Header V4 RSA/SHA256 Signature, key ID f14ab620: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:zfs-release-1-5.el7_4 ################################# [100%]

Basically, all it contains is the following enabled section:

[zfs]
name=ZFS on Linux for EL7 - dkms
baseurl=http://download.zfsonlinux.org/epel/7.4/$basearch/
enabled=1
metadata_expire=7d
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux

Install ZFS

This is the important part, installing ZFS:

[root@localhost opc]# sudo yum install -y zfs
...
======================================================================================================================
Package Arch Version Repository Size
======================================================================================================================
Installing:
zfs x86_64 0.7.9-1.el7_4 zfs 413 k
Installing for dependencies:
kernel-devel x86_64 3.10.0-862.9.1.el7 ol7_latest 16 M
libnvpair1 x86_64 0.7.9-1.el7_4 zfs 30 k
libuutil1 x86_64 0.7.9-1.el7_4 zfs 35 k
libzfs2 x86_64 0.7.9-1.el7_4 zfs 130 k
libzpool2 x86_64 0.7.9-1.el7_4 zfs 591 k
lm_sensors-libs x86_64 3.4.0-4.20160601gitf9185e5.el7 ol7_latest 41 k
spl x86_64 0.7.9-1.el7_4 zfs 29 k
spl-dkms noarch 0.7.9-1.el7_4 zfs 456 k
sysstat x86_64 10.1.5-13.el7 ol7_latest 310 k
zfs-dkms noarch 0.7.9-1.el7_4 zfs 4.9 M

The most important thing is to check that the zfs module is installed correctly:

zfs.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.1.12-112.14.13.el7uek.x86_64/extra/

I’ve seen cases where it was not and then the module cannot load. You can also check:

[root@localhost opc]# dkms status
spl, 0.7.9, 4.1.12-112.14.13.el7uek.x86_64, x86_64: installed
zfs, 0.7.9, 4.1.12-112.14.13.el7uek.x86_64, x86_64: installed

If you have a problem (such as “modprobe: FATAL: Module zfs not found” when loading the module), check the status and maybe re-install it with:

dkms remove zfs/0.7.9 --all
dkms --force install zfs/0.7.9

If everything is ok, you can load the module:

[root@localhost opc]# /sbin/modprobe zfs
[root@localhost opc]#

Create a ZFS filesystem

If the ZFS module was not loaded you have this error:

[root@localhost opc]# zpool list
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.

If it has been loaded correctly, you have no ZFS Storage Pool yet:

[root@localhost opc]# zpool list
no pools available

First I need to add a disk to my machine. Here I have only one disk created when I created the Compute Service:

[root@localhost opc]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvdb 202:16 0 128G 0 disk
├─xvdb1 202:17 0 500M 0 part /boot
└─xvdb2 202:18 0 127.5G 0 part
├─vg_main-lv_root 249:0 0 123.5G 0 lvm /
└─vg_main-lv_swap 249:1 0 4G 0 lvm [SWAP]

I add a new disk in the Storage tab:
CaptureCreateStorage
And attach it to my Cloud Instance:
CaptureAttachStorage

Here is the new disk visible from the system:

[root@localhost opc]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvdb 202:16 0 128G 0 disk
├─xvdb1 202:17 0 500M 0 part /boot
└─xvdb2 202:18 0 127.5G 0 part
├─vg_main-lv_root 249:0 0 123.5G 0 lvm /
└─vg_main-lv_swap 249:1 0 4G 0 lvm [SWAP]
xvdc 202:32 0 120G 0 disk
[root@localhost opc]# ls -l /dev/xvdc /dev/block/202:32
lrwxrwxrwx 1 root root 7 Jul 19 15:05 /dev/block/202:32 -> ../xvdc
brw-rw---- 1 root disk 202, 32 Jul 19 15:05 /dev/xvdc

Here is where I add a ZFS Storage Pool for Docker:

[root@localhost opc]# zpool create -f zpool-docker -m /var/lib/docker /dev/xvdc
 
[root@localhost opc]# zpool status
pool: zpool-docker
state: ONLINE
scan: none requested
config:
 
NAME STATE READ WRITE CKSUM
zpool-docker ONLINE 0 0 0
xvdc ONLINE 0 0 0
 
[root@localhost opc]# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zpool-docker 119G 118K 119G - 0% 0% 1.00x ONLINE -

And while I’m there, I set some attributes to enable compression and deduplication. As Docker writes to layers with 32k I/O, I set the recordsize accordingly:

zfs set compression=on zpool-docker
zfs set dedup=on zpool-docker
zfs set recordsize=32k zpool-docker

Note that if you reboot the instance, you will have to attach the storage again and then import the pool:

[root@localhost opc]# zpool import zpool-docker

Just to test that everything is ok, I install Docker as I did in a previous post:

[root@localhost opc]# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
[root@localhost opc]# yum-config-manager --enable ol7_addons
[root@localhost opc]# yum -y install docker-ce
[root@localhost opc]# systemctl start docker

Docker layers

I pull a small image and start a container on it:

[root@localhost opc]# docker run oraclelinux:7-slim

Here is the image and the ZFS dataset for its layer, mounted under /var/lib/docker/zfs:

[root@localhost opc]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
oraclelinux 7-slim b1af4ba0cf19 12 days ago 117MB
 
[root@localhost opc]# docker inspect oraclelinux:7-slim | jq -r .[0].GraphDriver
 
{
"Data": {
"Dataset": "zpool-docker/fe31ff466872588506b1a3a3575c64d458beeb94d15bea593e5048237abf4fcc",
"Mountpoint": "/var/lib/docker/zfs/graph/fe31ff466872588506b1a3a3575c64d458beeb94d15bea593e5048237abf4fcc"
},
"Name": "zfs"
}

And here is the container layer:

[root@localhost opc]# docker container ls -a
 
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9eb7610c1fc5 oraclelinux:7-slim "/bin/bash" 6 minutes ago Exited (0) 6 minutes ago inspiring_shannon
 
[root@localhost opc]# docker inspect inspiring_shannon | jq -r .[0].GraphDriver
 
{
"Data": {
"Dataset": "zpool-docker/5d1022761b6dee28a25e21ec8c5c73d99d09863f11439bbf86e742856f844982",
"Mountpoint": "/var/lib/docker/zfs/graph/5d1022761b6dee28a25e21ec8c5c73d99d09863f11439bbf86e742856f844982"
},
"Name": "zfs"
}

If you don’t have jq just ‘yum install jq’. It is very convenient to filter and display the ‘inspect’ output.

We can see those datasets from ZFS list:

[root@localhost opc]# zfs list -o creation,space,snapshot_count,written -r | sort
 
CREATION NAME AVAIL USED USEDSNAP USEDDS USEDREFRESERV USEDCHILD SSCOUNT WRITTEN
Thu Jul 19 15:13 2018 zpool-docker 115G 126M 0B 964K 0B 125M none 964K
Thu Jul 19 15:38 2018 zpool-docker/fe31ff466872588506b1a3a3575c64d458beeb94d15bea593e5048237abf4fcc 115G 125M 0B 125M 0B 0B none 0
Thu Jul 19 15:39 2018 zpool-docker/5d1022761b6dee28a25e21ec8c5c73d99d09863f11439bbf86e742856f844982 115G 87K 0B 87K 0B 0B none 87K
Thu Jul 19 15:39 2018 zpool-docker/5d1022761b6dee28a25e21ec8c5c73d99d09863f11439bbf86e742856f844982-init 115G 46K 0B 46K 0B 0B none 0

Here, sorted by creation time, we see the datasets used by each layer. The initial files, before having any image, are less than 1MB. The image uses 125MB. The container creation has written 87KB, with an additional 46KB once running.

 

The article Installing ZFS on OEL7 UEK4 for Docker storage appeared first on Blog dbi services.

Oracle Security Training by Pete Finnigan in 2018

Are you worried about the data in your databases being stolen? GDPR has just become law across the EU and the UK and affects businesses in other countries that process EU citizens’ data. Maybe you store and process credit card....[Read More]

Posted by Pete On 19/07/18 At 02:04 PM

Google Cloud Spanner – inserting data

By Franck Pachot

In a previous post I’ve created a Google Cloud Spanner database and inserted a few rows from the GUI. That is definitely not a solution for many rows, so here is a post about using the command line.

If I start the Google Shell from the icon on the Spanner page for my project, everything is set. But if I run it from elsewhere, using https://console.cloud.google.com/cloudshell as I did in A free persistent Google Cloud service with Oracle XE, I have to set the project:

franck_pachot@cloudshell:~$ gcloud config set project superb-avatar-210409
Updated property [core/project].
franck_pachot@superb-avatar-210409:~$

Instance

I create my Spanner instance with 3 nodes across the world:
franck_pachot@superb-avatar-210409:~$ time gcloud spanner instances create franck --config nam-eur-asia1 --nodes=3 --description Franck
Creating instance...done.
 
real 0m3.940s
user 0m0.344s
sys 0m0.092s

Database

and Spanner database – created in 6 seconds:

franck_pachot@superb-avatar-210409:~$ time gcloud spanner databases create test --instance=franck
Creating database...done.
real 0m6.832s
user 0m0.320s
sys 0m0.128s

Table

The DDL for table creation can also be run from there:

franck_pachot@superb-avatar-210409:~$ gcloud spanner databases ddl update test --instance=franck --ddl='create table DEMO1 ( ID1 int64, TEXT string(max) ) primary key (ID1)'
DDL updating...done.
'@type': type.googleapis.com/google.protobuf.Empty

I’m now ready to insert one million rows. Here is my table:

franck_pachot@superb-avatar-210409:~$ gcloud spanner databases ddl describe test --instance=franck
--- |-
CREATE TABLE DEMO1 (
ID1 INT64,
TEXT STRING(MAX),
) PRIMARY KEY(ID1)

Insert

The gcloud command line has a limited insert possibility:

franck_pachot@superb-avatar-210409:~$ time for i in $(seq 1 1000000) ; do gcloud beta spanner rows insert --table=DEMO1 --database=test --instance=franck --data=ID1=${i},TEXT=XXX${i} ; done
commitTimestamp: '2018-07-18T11:09:45.065684Z'
commitTimestamp: '2018-07-18T11:09:50.433133Z'
commitTimestamp: '2018-07-18T11:09:55.752857Z'
commitTimestamp: '2018-07-18T11:10:01.044531Z'
commitTimestamp: '2018-07-18T11:10:06.285764Z'
commitTimestamp: '2018-07-18T11:10:11.106936Z'
^C

Ok, let’s stop there. Calling the service for each row, with a latency of about 5 seconds per insert, is not efficient.

API

I’ll use the API from Python. Basically, a connection is a Spanner Client:

franck_pachot@superb-avatar-210409:~$ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.cloud import spanner
>>> spanner_client = spanner.Client()
>>> instance = spanner_client.instance('franck')
>>> database = instance.database('test')
>>>

Batch Insert

With this I can send a batch of rows to insert. Here is the full Python script I used to insert one million rows, in batches of 1000:

from google.cloud import spanner
spanner_client = spanner.Client()
instance = spanner_client.instance('franck')
database = instance.database('test')
for j in range(1000):
    records = []
    for i in range(1000):
        records.append((1 + j * 1000 + i, u'XXX' + str(i)))
    with database.batch() as batch:
        batch.insert(table='DEMO1', columns=('ID1', 'TEXT',), values=records)

This takes a little under 3 minutes:

franck_pachot@superb-avatar-210409:~$ time python3 test.py
 
real 2m52.707s
user 0m21.776s
sys 0m0.668s
franck_pachot@superb-avatar-210409:~$

If you remember my list of blogs on Variations on 1M rows insert, that’s not so fast. But remember that the rows are distributed across 3 nodes in 3 continents, and that inserting constantly increasing values sends all the batched rows to the same node. The PRIMARY KEY in Google Spanner is not only there to declare a constraint; it also determines the organization of the data.
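A common way to avoid this write hotspot is to avoid monotonically increasing keys, for example by prefixing the ID with a hash so that consecutive inserts scatter across the key range. Here is a minimal sketch of the idea, not used in the test above; the hashing scheme is only an illustration:

    import hashlib

    from google.cloud import spanner

    database = spanner.Client().instance('franck').database('test')

    def scattered_id(i):
        # A 31-bit hash prefix keeps the result within a signed INT64
        # while spreading consecutive values across the key range.
        prefix = int(hashlib.md5(str(i).encode()).hexdigest(), 16) % (1 << 31)
        return (prefix << 32) | i   # the original value stays in the low 32 bits

    records = [(scattered_id(i), u'XXX' + str(i)) for i in range(1000)]
    with database.batch() as batch:
        batch.insert(table='DEMO1', columns=('ID1', 'TEXT'), values=records)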

Query

The SELECT can also be run from there, from a read-only transaction called a ‘Snapshot’ because it does MVCC consistent reads:

frank_pachot@superb-avatar-210409:~$ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.cloud import spanner
>>> with spanner.Client().instance('franck').database('test').snapshot() as snapshot:
...     results = snapshot.execute_sql('SELECT COUNT(*) FROM DEMO1')
...     for row in results:
...         print(row)
...
[1000000]

The advantage of the read-only transaction is that it can do consistent reads without locking. The queries executed in a read-write transaction have to acquire some locks in order to guarantee consistency when reading across multiple nodes.
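For contrast, here is what a read-write transaction looks like with the same Python client, going through run_in_transaction, which takes care of committing and of retrying on aborts. This is my own sketch, not part of the original test:

    from google.cloud import spanner

    database = spanner.Client().instance('franck').database('test')

    def add_next_row(transaction):
        # Reads inside a read-write transaction acquire locks so they stay
        # consistent with the write committed at the end.
        rows = list(transaction.execute_sql('SELECT MAX(ID1) FROM DEMO1'))
        next_id = (rows[0][0] or 0) + 1
        transaction.insert(table='DEMO1',
                           columns=('ID1', 'TEXT'),
                           values=[(next_id, u'XXX' + str(next_id))])

    # Commits on success; the callback may be re-run if the transaction aborts.
    database.run_in_transaction(add_next_row)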

Interleave

So, you can look at the PRIMARY KEY as a partition by range, and we also have reference partitioning with INTERLEAVE IN PARENT. This reminds me of the Oracle CLUSTER segment, which is so rarely used because storing the tables separately is, in the end, the better compromise between performance and flexibility for a multi-purpose database.

Here is my creation of DEMO2, where ID1 is a foreign key referencing DEMO1:

franck_pachot@superb-avatar-210409:~$ time gcloud spanner databases ddl update test --instance=franck --ddl='create table DEMO2 ( ID1 int64, ID2 int64, TEXT string(max) ) primary key (ID1,ID2), interleave in parent DEMO1 on delete cascade'
DDL updating...done.
'@type': type.googleapis.com/google.protobuf.Empty
 
real 0m24.418s
user 0m0.356s
sys 0m0.088s

I’m now inserting 5 detail rows for each parent row:

from google.cloud import spanner
database = spanner.Client().instance('franck').database('test')
for j in range(1000):
    records = []
    for i in range(1000):
        for k in range(5):
            records.append((1 + j * 1000 + i, k, u'XXX' + str(i) + ' ' + str(k)))
    with database.batch() as batch:
        batch.insert(table='DEMO2', columns=('ID1', 'ID2', 'TEXT'), values=records)

This ran in 6 minutes.

Join (Cross Apply)

Here is the execution plan for

SELECT * FROM DEMO1 join DEMO2 using(ID1) where DEMO2.TEXT=DEMO1.TEXT

where I join the two tables and apply a filter on the join:
CaptureSpannerCrossApply

Thanks to the INTERLEAVE, the join runs locally. Each row from DEMO1 (the Input of the Cross Apply) is joined with DEMO2 (the Map of the Cross Apply) locally. Only the result is serialized. On this small number of rows we do not see the benefit of having the rows on multiple nodes. There are only 2 nodes with rows here (2 local executions) and probably one node contains most of the rows. The average time per node is 10.72 seconds and the elapsed time is 20.9 seconds, so I guess that one node ran in 20.9 seconds and the other in 1.35 only.

The same without the tables interleaved (here as DEMO3) is faster to insert, but the join is more complex because DEMO1 must be distributed to all nodes.
CaptureSpannerDistributedCrossApply
Without interleave, the input table of the local Cross Apply is a Batch Scan, which is actually like a temporary table distributed to all nodes (it seems to have 51 chunks here), created by the ‘Create Batch’ step. This is called a Distributed Cross Apply.

So what?

Google Spanner has only some aspects of SQL and relational databases. But it is still, like the NoSQL databases, a database where the data model is focused on one use case only, because the data model and the data organization have to be designed for specific data access patterns.

 

The article Google Cloud Spanner – inserting data appeared first on Blog dbi services.

Power BI 101 – Log Files and Tracing

Knowing where log files are and how to turn on debugging is an essential part of any technical job and this goes for Power BI, too.  Remember, as I learn, so does everyone else….Come on, pretty please?

Power BI Desktop

Log files and traces can be accessed one of two ways-

  • Via the Power BI Application
  • Via File Explorer

In the Power BI application, go to File –> Options and Settings –> Options –> Diagnostics.

Crash and dump files are automatically stored, with an option to disable them from this screen, but I’m unsure why you’d ever want to do this:  if Power BI does crash, you would lose any valuable data on what the cause was.

[Screenshot: tracing1.png]

To debug with a trace, you’ll need to enable it from this screen as well, as it’s not turned on by default.  Remember that tracing can be both resource and storage intensive, so only enable it when you actually need to diagnose something.  You can also choose to bypass tracing the geo code cache, as this is used to help map coordinates and it can be very chatty.

To view files, you can click on the open crash/dump file folder and this will open a File Explorer window at the traces directory on your PC.

Directly From File Explorer:

Ensure that File Explorer has viewing set to display hidden items.

C:\Users\\AppData\Local\Microsoft\Power BI Desktop\Traces

Log Files

These are all retained inside the Performance folder under the Traces directory.

The files will be named with the following naming convention:

..

Locating the files that you need for your current process is easiest if you sort by Date Modified.  Verify that you’re working with the file that is being written to and not the 0 KB file used only to keep track of startup and shutdown:

[Screenshot: tracing4.png]

The third log file in the list above, and also the one started before the executable for Power BI Desktop, (PID 13396), is the Microsoft Mashup Container, (Microsoft.Mashup.Container.NetFX40.exe), with its own PID of 16692.  It contains valuable information about calculations, measures and other caching processes.  Take care to ensure the PID used by Power BI in the logs matches the one you’re inspecting in Task Manager:  Excel and other programs are also known to run a version of this executable, so there may be more than one listed, for Power BI as well as for other Microsoft applications.
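If you’d rather locate the active files from a script than from File Explorer, here’s a minimal sketch that lists the Performance logs by modification time and skips the empty startup/shutdown tracker; the path is the same hidden AppData location as above, and the *.log pattern is an assumption:

    import datetime as dt
    import os
    from pathlib import Path

    # Same hidden location as above, resolved for the current user.
    perf_dir = Path(os.path.expandvars(
        r"%LOCALAPPDATA%\Microsoft\Power BI Desktop\Traces\Performance"))

    logs = sorted(perf_dir.glob("*.log"),
                  key=lambda p: p.stat().st_mtime, reverse=True)

    for log in logs:
        size = log.stat().st_size
        if size == 0:
            continue  # skip the empty startup/shutdown tracking file
        mtime = dt.datetime.fromtimestamp(log.stat().st_mtime)
        print(f"{mtime:%Y-%m-%d %H:%M:%S}  {size / 1024:8.1f} KB  {log.name}")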

Log File Breakdown

Each file will contain entries providing information on high level processing, including start time, total size of cache allocated for the task, process information, Process ID, (PID), Transaction ID, (TID) and duration.

An example of an entry can be seen below:

ObjectCacheSessions/CacheStats/Size {"Start":"2018-07-19T01:42:24.9707127Z","Action":"ObjectCacheSessions/CacheStats/Size","entryCount":"1","totalSize":"24","ProductVersion":"2.59.5135.781 (PBIDesktop)","ActivityId":"00000000-0000-0000-0000-000000000000","Process":"PBIDesktop","Pid":13396,"Tid":8,"Duration":"00:00:00.0046865"}

We can easily match up the Process name and the PID with what is displayed in our Task Manager detail view:

[Screenshot: tracing3.png]

We’ve now identified the process, the amount of memory allocated to perform the task captured in the log, the start time and the duration.  The information in these log files can assist when diagnosing Power BI Desktop crashes, but the data collected is quite rudimentary.
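Since each entry is just a label followed by a JSON payload, it’s also easy to pull these fields out programmatically.  Here’s a minimal sketch that parses one log file and totals the Duration per Action; the file name is a placeholder and the aggregation is my own addition:

    import json
    from collections import defaultdict

    totals = defaultdict(float)

    # Placeholder file name: point this at one of the PBIDesktop or Mashup logs.
    with open('PBIDesktop.13396.log', encoding='utf-8', errors='replace') as f:
        for line in f:
            start = line.find('{')       # each line is "<label> {json...}"
            if start == -1:
                continue
            try:
                entry = json.loads(line[start:])
            except ValueError:
                continue
            # Duration is formatted as HH:MM:SS.fffffff
            h, m, s = entry.get('Duration', '0:0:0').split(':')
            seconds = int(h) * 3600 + int(m) * 60 + float(s)
            totals[entry.get('Action', 'unknown')] += seconds

    for action, secs in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f'{secs:10.4f}s  {action}')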

If you shut down Power BI Desktop, the PBIDesktop* log file writes to the startup file, which was empty until then; it then empties and saves off the timestamp of the program’s exit.

The Microsoft Mashup file has much of the same information, but includes deeper level processing work by Power BI, such as work done in the Query Editor or when we create a measure or new column/table.

In the three examples from the file below, you can see a compile, a save and then an evaluate task.  Note that the Host Process ID is that of the Power BI Desktop process we saw earlier, but the interaction with the Microsoft Mashup Container is demonstrated as well:

SimpleDocumentEvaluator/GetResult/Compile {"Start":"2018-07-19T01:48:46.1441843Z","Action":"SimpleDocumentEvaluator/GetResult/Compile","HostProcessId":"13396","ProductVersion":"2.59.5135.781 (PBIDesktop)","ActivityId":"04248470-07e1-4862-b184-a32f186f26fd","Process":"Microsoft.Mashup.Container.NetFX40","Pid":16692,"Tid":1,"Duration":"00:00:00.4302569"}

ObjectCache/CacheStats/Size {"Start":"2018-07-19T01:48:47.3504705Z","Action":"ObjectCache/CacheStats/Size","HostProcessId":"13396","entryCount":"5","totalSize":"14564","ProductVersion":"2.59.5135.781 (PBIDesktop)","ActivityId":"04248470-07e1-4862-b184-a32f186f26fd","Process":"Microsoft.Mashup.Container.NetFX40","Pid":16692,"Tid":1,"Duration":"00:00:00.0000170"}

SimpleDocumentEvaluator/GetResult/Evaluate {"Start":"2018-07-19T01:48:46.5744678Z","Action":"SimpleDocumentEvaluator/GetResult/Evaluate","HostProcessId":"13396","ProductVersion":"2.59.5135.781 (PBIDesktop)","ActivityId":"04248470-07e1-4862-b184-a32f186f26fd","Process":"Microsoft.Mashup.Container.NetFX40","Pid":16692,"Tid":1,"Duration":"00:00:00.7780750"}

Another common file in the Performance directory follows the msmdsrv* naming convention; these collect log information on the data source loader.  Duration information and cache/memory allocation could offer valuable insight into poor performance during data loading.  The first stop is always to check the desktop settings to see what has been set for memory allocation rather than assuming it’s the default.

If I just start the program and don’t open anything, only the high level processing of starting, basic memory allocation and stopping will be tracked in the PBIDesktop* file until I open up a PBIX file.  Then anything that needs to be updated and refreshed for the visuals, etc. will begin to write log data to the Microsoft Mashup log file and if a data refresh must be performed, the msmdsrv file.

Trace files

When you do turn on tracing, as shown at the beginning of this post, a file is created in the parent directory, Traces.

When enabled, and after a restart of Power BI Desktop, you will receive not only similar information about the PID, TID, process and duration, but also granular information about Power BI and what’s going on behind the scenes:

  • Application graphics info
  • Settings
  • Parameters
  • Background processes
  • Caching
  • Extensions
  • Query edits
  • Changes applied

You’ll even see entries similar to the following:

SharedLocalStorageAccessor/AcquireMutex

A mutex is a mutual-exclusion lock, a small, efficient synchronization object.  As mutexes have thread affinity, a mutex can only be released by the thread in Power BI that owns it.  If it’s released by another thread, an exception will be thrown in the application and trapped in the trace file.
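As a rough analogy (using Python’s re-entrant lock rather than the Windows mutex Power BI actually uses), releasing a lock from a thread that doesn’t own it fails immediately:

    import threading

    lock = threading.RLock()   # re-entrant lock with thread affinity, like a mutex
    lock.acquire()             # now owned by the main thread

    def release_from_other_thread():
        try:
            lock.release()     # not the owning thread
        except RuntimeError as e:
            print('release failed:', e)   # "cannot release un-acquired lock"

    t = threading.Thread(target=release_from_other_thread)
    t.start()
    t.join()
    lock.release()             # the owning thread can release it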

An interesting aspect of tracing in Power BI Desktop is that the options are put back to their defaults, with granular tracing disabled, when you restart the application.

 

 

 

 

 

 

 



Copyright © DBAKevlar [Power BI 101 - Log Files and Tracing], All Rights Reserved. 2018.

Speaking At DOAG 2018 Conference And IT Tage 2018

I will be speaking at the yearly DOAG conference in December as well as at the IT Tage in November. My talk will be "Oracle Optimizer System Statistics Update 2018", where I summarize the history and current state of affairs regarding System Statistics and I/O calibration in recent Oracle versions like 12c and 18c.

Easy way to create large demo-tables in #Exasol and #Oracle


If you need a large set of data to test or demonstrate something, this does the trick:

Create demo table in Oracle

create table t as
select 
rownum as id, -- growing unique number
'Just some text' as textcol,
mod(rownum,5) as numcol1, -- numbers 0-4 round-robin
mod(rownum,1000) as numcol2 , -- numbers 0-999 round robin
5000 as numcol3, -- constant
to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'dd.mm.yyyy') as time_id -- dates for every month 
from dual connect by level<=1e4 -- 10000 rows 
; 

Create demo table in Exasol

create table t as
select 
rownum as id, -- growing unique number
'Just some text' as textcol,
mod(rownum,5) as numcol1, -- numbers 0-4 round-robin
mod(rownum,1000) as numcol2 , -- numbers 0-999 round robin
5000 as numcol3, -- constant
to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'dd.mm.yyyy') as time_id -- dates for every month 
from (select 1 from dual connect by level<=1e4) -- 10000 rows 
; 

In Oracle you may get this error message for a high number of rows (like 10 million): ORA-30009: Not enough memory for CONNECT BY operation.
With the following settings it works anyway:

SQL> alter session set workarea_size_policy=manual;

Session altered.

SQL> alter session set sort_area_size=1000000000;

Session altered.

SQL> create table t as
     select 
     rownum as id, 
     'Just some text' as textcol,
     mod(rownum,5) as numcol1, 
     mod(rownum,1000) as numcol2 , 
     5000 as numcol3, 
     to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'dd.mm.yyyy') as time_id  
     from dual connect by level<=1e7 ; 
Table created.
 
SQL> select count(*) from t;

  COUNT(*)
----------
  10000000

SQL> select bytes/1024/1024 as mb from user_segments where segment_name='T';

        MB
----------
       480

In Exasol, this problem doesn’t surface:

SQL_EXA> create table t as
         select
         rownum as id,
         'Just some text' as textcol,
         mod(rownum,5) as numcol1,
         mod(rownum,1000) as numcol2 ,
         5000 as numcol3,
         to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'dd.mm.yyyy') as time_id
         from (select 1 from dual connect by level<=1e7) ; 
EXA: create table t as... 
Rows affected: 10000000 
SQL_EXA> SELECT raw_object_size/1024/1024 as mb FROM SYS.EXA_ALL_OBJECT_SIZES where object_name='T';
EXA: SELECT raw_object_size/1024/1024 as mb FROM SYS.EXA_ALL_OBJECT_SIZES w...

MB
----------------------------------
          286.10229492187500000000

1 row in resultset.

And the resulting table is smaller because Exasol stores it in columnar compressed format.