Oakies Blog Aggregator

GDPR

A couple of days ago I posted my slides from the recent UKOUG Northern Technology day in Leeds, where I spoke about GDPR for the Oracle DBA. I said then that I am also preparing a service line for helping....[Read More]

Posted by Pete On 06/06/18 At 03:10 PM

This Week in PostgreSQL – May 31

Since last October I’ve been periodically writing up summaries of interesting content I see on the internet related to PostgreSQL (generally blog posts). My original motivation was just to learn more about PostgreSQL, but I’ve started sharing them with a few colleagues and received positive feedback. I thought I’d try posting one of these digests here on the Ardent blog – who knows, maybe a few old readers will find it interesting? Here’s the update that I put together last week – let me know what you think!


Hello from California!

Part of my team is here in Palo Alto and I’m visiting for a few days this week. You know… for all the remote work I’ve done over the years, I still really value this in-person, face-to-face time. These little trips from Seattle to other locations where my teammates physically sit are important to me.

This is also part of the reason I enjoy user groups and conferences so much. They’re opportunities to meet with other PostgreSQL users in real life. In fact – at this very moment – one of the most important PostgreSQL conferences in the world is happening: PgCon! Having attended a few other conferences over the past year, I’m holding down the fort in the office this week in order to send a bunch of other teammates… but you can be sure I’m keeping an eye on twitter. :)

https://www.pgcon.org/2018/
https://twitter.com/search?l=&q=pgcon%20OR%20pgcon_org

=====

In the meantime, let’s get busy with the latest updates from the PostgreSQL virtual world. First of all, I think the biggest headline is that (just in time for PgCon) we have the first official beta version of PostgreSQL 11! The release announcement headlines with major partitioning enhancements, more parallelism, a feature to speed up SQL execution by compiling certain operations on-demand into native machine code (JIT/Just-In-Time compilation), and numerous SQL enhancements. You can also read the first draft of the release notes. This is the time to start testing and give feedback to the development community!

https://www.postgresql.org/about/news/1855/
https://www.postgresql.org/docs/devel/static/release-11.html
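If you want to see the JIT machinery for yourself, here’s a minimal sketch (assuming a PostgreSQL 11 beta built with JIT support; lowering jit_above_cost is only to force compilation on a toy query):

set jit = on;
set jit_above_cost = 0;  -- force JIT even for a cheap query, just for demonstration
explain (analyze) select sum(x) from generate_series(1, 1000000) as t(x);
-- the plan output should end with a "JIT:" section showing how many functions
-- were compiled and how long compilation/optimization took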

Closely related to this, there’s one other really big headline that I’m excited about: the new AWS RDS Preview Environment. You can now try out the new pg11 features ahead of time with a single click! In part because the development community is so awesome, the first database available in the RDS Preview Environment is PostgreSQL. And the official PostgreSQL 11 beta release is _already_ available on RDS!! Personally I’m hoping that this benefits the community by getting more people to test and give feedback on new features being built for PostgreSQL 11. I hope it will make a great database even better.

https://aws.amazon.com/rds/databasepreview/
https://forums.aws.amazon.com/ann.jspa?annID=5788 (pg11 beta announcement)

Outside of the RDS and PG11-related stuff, I saw one other headline that I thought might be worth mentioning. On May 29, IBM published a blog post that caught my attention, featuring EnterpriseDB as an IBM partner on their Private Cloud deployments. You might not realize just how much PostgreSQL is being used and sold by IBM… but there’s Compose, ElephantSQL, and now EDB in the mix.

https://www.ibm.com/blogs/think/2018/05/next-enterprise-platform/

Part of the reason I took note of this was that I remember just last November when HPE ran a similar announcement, partnering with EDB on their on-premise subscription-based GreenLake platform.

https://news.hpe.com/hpe-expands-pay-per-use-it-to-key-business-workloads/

So it seems to me that EDB is doing some nice work at building up the PostgreSQL presence in the enterprise world – which I’m very happy to see. To be clear, this isn’t necessarily new… does anyone remember vPostgres?

https://blogs.vmware.com/vsphere/2016/03/getting-comfortable-with-vpostgres-and-the-vcenter-server-appliance-part-1.html

Nonetheless, it feels to me like the ball is moving forward. It feels like PostgreSQL maturity and adoption are continually progressing at a very healthy pace.

=====

Moving on from headlines, let’s get to the real stuff – the meaty technical articles. :)

First up, who likes testing and benchmarking? One of my favorite activities, truth be told! So I can’t quite convey just how excited I am about the preview release of Kevin Closson’s pgio testing kit. For those unfamiliar, Kevin has spent years refining his approach for testing storage through database I/O paths. Much work was done in the past with Oracle databases, and he calls his method SLOB. I’m excited to start using this kit for exploring the limits of storage through PostgreSQL I/O paths too.

https://kevinclosson.net/2018/05/23/sneak-preview-of-pgio-the-slob-method-for-postgresql-part-iv-how-to-reduce-the-amount-of-memory-in-the-linux-page-cache-for-testing-purposes/

Right after Kevin published that post, Franck Pachot followed up with a short article using pgio to look at the impact of the ext4 “sync” filesystem option (made relevant by the recently disclosed flaws in how PostgreSQL has been interacting with Linux’s implementation of fsync).

https://blog.dbi-services.com/postgres-the-fsync-issue-and-pgio-the-slob-method-for-postgresql/

In addition to Kevin’s release of pgio, I also saw three other generally fun technical articles. First, Kaarel Moppel from Cybertec published an article showing a much lower-than-expected impact of PL/pgSQL triggers on a simple pgbench run. Admittedly, I want to poke around at this myself, having seen a few situations where the impact seemed higher. Great article – and it certainly got some circulation on twitter.

https://www.cybertec-postgresql.com/en/are-triggers-really-that-slow-in-postgres/
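For readers who haven’t benchmarked triggers before, the kind of trigger in question looks roughly like this (a minimal sketch; the accounts table and updated_at column are made up for illustration):

create or replace function touch_row() returns trigger as $$
begin
  new.updated_at := now();  -- stamp the row on every update
  return new;
end;
$$ language plpgsql;

create trigger trg_touch
  before update on accounts
  for each row execute procedure touch_row();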

Next, Sebastian Insausti has published an article explaining PostgreSQL streaming replication. What I appreciate the most about this article is how Sebastian walks through the history of how streaming replication was developed. That context is so important and helpful!

https://severalnines.com/blog/postgresql-streaming-replication-deep-dive

Finally, the requisite vacuum post. :) This month we’ve got a nice technical article from Sourabh Ghorpade on the Gojek engineering team: a great high-level introduction to vacuuming in general, and a good story about how their team narrowly averted an “xid wraparound” crisis.

https://blog.gojekengineering.com/postgres-autovacuum-tuning-394bb99fe2c0
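If you want to keep an eye on your own wraparound headroom, this standard catalog query (safe to run anywhere) shows how far each database’s oldest transaction ID has aged:

select datname, age(datfrozenxid) as xid_age
from pg_database
order by xid_age desc;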

=====

We’ve been following Dimitri Fontaine’s series on PostgreSQL data types. Well, sadly (but inevitably), he has brought the series to a close. On May 24, Dimitri published a great summary of the whole data type series – this definitely deserves to be on every developer’s short list of PostgreSQL bookmarks!

https://tapoueh.org/blog/2018/05/postgresql-data-types/

But while we’re talking about data types, there were two more related articles worth pointing out this time around. First, Berend Tober from SeveralNines published a nice article back on the 9th about the serial data type in PostgreSQL. This is an integer type that automatically comes with a not-null constraint and auto-assignment from a sequence.

https://severalnines.com/blog/overview-serial-pseudo-datatype-postgresql
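To make that behavior concrete, here’s the documented equivalence (table names are arbitrary):

-- these two definitions behave the same way:
create table t1 (id serial);

create sequence t2_id_seq;
create table t2 (id integer not null default nextval('t2_id_seq'));
alter sequence t2_id_seq owned by t2.id;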

Secondly, Hans-Jürgen Schönig from Cybertec gives a nice overview of mapping data types from Oracle to PostgreSQL. He has a little paragraph in there about mapping Oracle numbers to PostgreSQL integers and numerics. That little paragraph probably deserves triple-bold emphasis: automatically mapping every NUMBER column to PostgreSQL numeric has been the cause of many, many performance woes in PostgreSQL databases!

https://www.cybertec-postgresql.com/en/mapping-oracle-datatypes-to-postgresql/
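To illustrate the point, here’s a sketch of the difference (column names invented; whether bigint fits depends on the precision of your actual NUMBER columns):

-- blanket-mapping every Oracle NUMBER to numeric means variable-width storage
-- and slower arithmetic:
create table orders_numeric (order_id numeric, quantity numeric);

-- when the values fit, native integer types are much cheaper to store,
-- compare and aggregate:
create table orders_native (order_id bigint, quantity integer);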

=====

For something that might be relevant to both developers and DBAs, I have a couple of articles about SQL tuning. First, Brian Fehrle has written a great tuning introduction for the severalnines blog. His starting point is pg_stat_activity and the EXPLAIN SQL command; exactly the same as my starting point. :)

https://severalnines.com/blog/why-postgresql-running-slow-tips-tricks-get-source
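For reference, my usual first look goes something like this (a sketch; works on PostgreSQL 9.6 and later, where the wait_event columns exist):

select pid, state, wait_event_type, wait_event,
       now() - query_start as runtime, query
from pg_stat_activity
where state <> 'idle';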

Next up, Louise Grandjonc from France has published a series of articles called “Understanding EXPLAIN”. Did you ever wonder why there are _two_ numbers reported for the execution time of each step? Louise answers this question and many more in these four articles!

http://www.louisemeta.com/blog/explain/
http://www.louisemeta.com/blog/explain-2/
http://www.louisemeta.com/blog/explain-3/
http://www.louisemeta.com/blog/explain-4/
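As a teaser for the series, this is the kind of output she explains (assuming a pgbench-initialized database):

explain (analyze) select * from pgbench_accounts where abalance > 0;
-- each plan node reports something like "actual time=0.019..58.331":
-- the first number is the time to produce the first row,
-- the second is the time to produce all rows for that node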

=====

Moving down the stack a little more, there were two articles about monitoring that seem worth passing along. Datadog has put out a lot of articles about monitoring recently; I hadn’t mentioned them before, but Emily Chang gave us yet another one back on April 12. As usual, I’m impressed with the level of quality in this thorough article, which is specifically focused on PostgreSQL on RDS. I especially appreciated the key metrics, things I’ve used myself to characterize workloads.

https://www.datadoghq.com/blog/aws-rds-postgresql-monitoring/

Earlier I mentioned the severalnines blog post about replication – and the pgDash team published a nice article on May 2 about monitoring replication. They give another nice general architectural overview of replication as well.

https://pgdash.io/blog/monitoring-postgres-replication.html
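For a quick taste of what they’re monitoring: the primary exposes one row per connected standby in pg_stat_replication, and a minimal lag check on PostgreSQL 10 might look like this (sketch only):

select client_addr, state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as replay_lag_bytes
from pg_stat_replication;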

=====

In my last update, I closed with a few articles about pgpool on the severalnines blog. It seemed worth mentioning that they have published a third and final article in the series.

https://severalnines.com/blog/guide-pgpool-part-three-hints-observations

I also spotted an article about pgpool on the AWS database blog. While focused on Aurora PostgreSQL, there’s plenty to be learned here about using pgpool with regular PostgreSQL.

https://aws.amazon.com/blogs/database/a-single-pgpool-endpoint-for-reads-and-writes-with-amazon-aurora-postgresql/

Along those lines, most people know about the other popular PostgreSQL connection pooler, pgbouncer. This is the connection pooler used by Gulcin Yildirim’s near-zero-downtime, Ansible-based upgrade tool, pglupgrade. He’s written a few articles about his process recently, and being a big Ansible fan, I’ve been following along.

https://blog.2ndquadrant.com/near-zero-downtime-automated-upgrades-postgresql-clusters-cloud/
https://blog.2ndquadrant.com/near-zero-downtime-automated-upgrades-postgresql-clusters-cloud-part-ii/

But I wonder if the landscape is about to change? Yesterday the Yandex team announced that they have built and released a new load balancer, Odyssey, to address limitations in pgpool and pgbouncer. I’ll be very interested to see what happens with Odyssey!

https://www.postgresql.org/message-id/C9D1137E-F2A7-4307-B758-E5ED5559FFCA@simply.name (announcement)
https://github.com/yandex/odyssey

And that wraps things up for this edition.

Have a great week and keep learning!

GDPR for the Oracle DBA

I did a talk at the recent UKOUG Northern Technology Summit in Leeds, UK on May 16th. This talk was an enhanced version of the one I did at the UKOUG tech conference in Birmingham in December 2017 to a....[Read More]

Posted by Pete On 04/06/18 At 08:40 PM

Index Column Order – Impact On Index Branch Blocks Part I (Day-In Day-Out)

I recently replied on Twitter to some comments regarding an excellent blog post by Franck Pachot – Covering indexes in Oracle, and branch size, where I disagreed somewhat with one of the conclusions stated in the post: “ensure that selective columns appear as early as possible (without compromising the index access efficiency of course) in order to […]

PeteFinnigan.com Limited Printed Oracle Security Training Manuals for Sale

Over the last year or so we have offered for sale leftover printed manuals from some of our training courses. Normally we only print the manuals for classes that we organise for in-person training, such as the classes....[Read More]

Posted by Pete On 03/06/18 At 01:47 PM

Oracle Security Training In York, UK, 2018

I have just updated our public training dates page to add two new dates for Oracle Security training classes that I will be running here in York, UK. We now have four dates covering three available classes. These are as....[Read More]

Posted by Pete On 02/06/18 At 06:54 PM

Measure the impact of DRBD on your PostgreSQL database thanks to pgio (the SLOB method for PostgreSQL)

You might want to replicate a PostgreSQL database with DRBD. In that case you should measure the impact of your DRBD setup, especially if you plan to use DRBD in synchronous mode. As I am a lucky beta tester of pgio (the SLOB method for PostgreSQL), let’s use it to measure that impact.

DRBD configuration

The purpose of this post is not to explain how to set up DRBD. The DRBD configuration used in this post is the following:

  • Primary host: ubdrbd1
  • Secondary host: ubdrbd2

Configuration:

root@ubdrbd1:/etc# cat drbd.conf
global {
    usage-count no;
}
common {
    protocol C;
}
resource postgresql {
    on ubdrbd1 {
        device /dev/drbd1;
        disk /dev/sdb1;
        address 172.16.170.153:7789;
        meta-disk internal;
    }
    on ubdrbd2 {
        device /dev/drbd1;
        disk /dev/sdb1;
        address 172.16.170.155:7789;
        meta-disk internal;
    }
}

The C protocol is the synchronous one, so that local write operations on the primary node are considered completed only after both the local and the remote disk write (on /dev/sdb1) have been confirmed.

To ensure that PostgreSQL writes to this replicated device, let’s create a filesystem on it:

root@ubdrbd1:/etc# mkfs.ext4 /dev/drbd1
mke2fs 1.44.1 (24-Mar-2018)
Creating filesystem with 5242455 4k blocks and 1310720 inodes
Filesystem UUID: a7cc203b-39fa-46b3-a707-f330f72ca5b1
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

And mount it, in sync mode, on the default PostgreSQL data location:

root@ubdrbd1:~# mkdir /var/lib/postgresql
root@ubdrbd1:~# mount -o sync /dev/drbd1 /var/lib/postgresql

pgio setup

Once PostgreSQL is up and running, we can set up pgio; it is as easy as this:

postgres@ubdrbd1:~/$ tar -xvf pgio-0.9.tar
postgres@ubdrbd1:~/$ cd pgio

Then edit the configuration file to suit your needs. In our case, the configuration file being used is the following:

postgres@ubdrbd1:~/pgio$ cat pgio.conf
UPDATE_PCT=10
RUN_TIME=120
NUM_SCHEMAS=1
NUM_THREADS=1
WORK_UNIT=255
UPDATE_WORK_UNIT=8
SCALE=200M
DBNAME=pgio
CONNECT_STRING="pgio"
CREATE_BASE_TABLE=TRUE

As you can see:

  • I want 1 schema and 1 thread per schema
  • I set UPDATE_PCT so that 10% of calls will do an UPDATE (to test writes, and thus the impact of DRBD replication)
  • I kept the default work unit to read 255 blocks and, for those 10% of updates, update 8 blocks only

Now, create the pgio tablespace and pgio database:

postgres=# create tablespace pgio location '/var/lib/postgresql/pgdata';
CREATE TABLESPACE
postgres=# create database pgio tablespace pgio;
CREATE DATABASE

and run setup.sh to load the data:

postgres@ubdrbd1:~/pgio$ ./setup.sh

Job info:      Loading 200M scale into 1 schemas as per pgio.conf->NUM_SCHEMAS.
Batching info: Loading 1 schemas per batch as per pgio.conf->NUM_THREADS.
Base table loading time: 3 seconds.
Waiting for batch. Global schema count: 1. Elapsed: 0 seconds.
Waiting for batch. Global schema count: 1. Elapsed: 13 seconds.

Group data loading phase complete.         Elapsed: 13 seconds.

Check the content:

postgres@ubdrbd1:~/pgio$ echo '\d+' | psql pgio
                      List of relations
 Schema |   Name    | Type  |  Owner   |  Size  | Description
--------+-----------+-------+----------+--------+-------------
 public | pgio1     | table | postgres | 200 MB |
 public | pgio_base | table | postgres | 29 MB  |
(2 rows)

Now, let’s run pgio twice:

  • once with DRBD not active
  • once with DRBD active

For each test, the pgio database will be recreated and the setup rerun. The same pgio run duration will also be used, so that we compare exactly the same things.

First run: DRBD not active

The result is the following:

postgres@ubdrbd1:~/pgio$ ./runit.sh
Date: Sat Jun 2 05:09:46 UTC 2018
Database connect string: "pgio".
Shared buffers: 128MB.
Testing 1 schemas with 1 thread(s) accessing 200M (25600 blocks) of each schema.
Running iostat, vmstat and mpstat on current host--in background.
Launching sessions. 1 schema(s) will be accessed by 1 thread(s) each.
pg_stat_database stats:
datname| blks_hit| blks_read|tup_returned|tup_fetched|tup_updated
BEFORE: pgio | 98053 | 34640 | 34309 | 6780 | 8
AFTER: pgio | 19467796 | 11956992 | 30807460 | 29665212 | 115478
DBNAME: pgio. 1 schemas, 1 threads(each). Run time: 120 seconds. RIOPS >99352< CACHE_HITS/s >161414<

As you can see, 115470 tuples (the AFTER minus BEFORE tup_updated values: 115478 - 8) have been updated during this 120-second run without DRBD in place.

Second run: DRBD active

For the purpose of this post, let’s simulate “slowness” on the DRBD replication. To do so, we can apply some throttling on the DRBD target device thanks to the blkio cgroup. I would suggest reading Frits Hoogland’s post to see how it can be implemented.

My cgroup configuration on the DRBD secondary server is the following:

root@ubdrbd2:~# ls -l /dev/sdb
brw-rw---- 1 root disk 8, 16 Jun 1 15:16 /dev/sdb

root@ubdrbd2:~# cat /etc/cgconfig.conf
mount {
    blkio = /cgroup/blkio;
}

group iothrottle {
    blkio {
        blkio.throttle.write_iops_device="8:16 500";
    }
}

root@ubdrbd2:~# cat /etc/cgrules.conf
* blkio /iothrottle

This means: the number of writes per second on /dev/sdb will be limited to 500 for all processes.

Let’s check that DRBD is active:

root@ubdrbd1:~# drbdsetup status --verbose --statistics
postgresql role:Primary suspended:no
    write-ordering:flush
  volume:0 minor:1 disk:UpToDate
      size:20969820 read:453661 written:6472288 al-writes:215 bm-writes:0 upper-pending:4 lower-pending:0 al-suspended:no blocked:no
  peer connection:Connected role:Secondary congested:no
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
        received:0 sent:5161968 out-of-sync:0 pending:4 unacked:0

root@ubdrbd2:~# drbdsetup status --verbose --statistics
postgresql role:Secondary suspended:no
    write-ordering:flush
  volume:0 minor:1 disk:UpToDate
      size:20969820 read:0 written:5296960 al-writes:0 bm-writes:0 upper-pending:0 lower-pending:2 al-suspended:no blocked:no
  peer connection:Connected role:Primary congested:no
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
        received:5296968 sent:0 out-of-sync:0 pending:0 unacked:2

With this configuration in place, let’s run pgio on the source machine:

postgres@ubdrbd1:~/pgio$ ./runit.sh
Date: Sat Jun 2 05:37:03 UTC 2018
Database connect string: "pgio".
Shared buffers: 128MB.
Testing 1 schemas with 1 thread(s) accessing 200M (25600 blocks) of each schema.
Running iostat, vmstat and mpstat on current host--in background.
Launching sessions. 1 schema(s) will be accessed by 1 thread(s) each.
pg_stat_database stats:
datname| blks_hit| blks_read|tup_returned|tup_fetched|tup_updated
BEFORE: pgio | 111461 | 44099 | 35967 | 5640 | 6
AFTER: pgio | 3879414 | 2132201 | 5799491 | 5766888 | 22425
DBNAME: pgio. 1 schemas, 1 threads(each). Run time: 120 seconds. RIOPS >17400< CACHE_HITS/s >31399<

As you can see, 22419 tuples (22425 - 6) have been updated during this 120-second run with DRBD and blkio throttling in place, which is far less than the 115470 observed in the first test.

pgio also provides snapshots of iostat, vmstat and mpstat, so that it is easy to check that the throttling was in place (writes per second < 500 on the source due to the throttling on the target):

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdb              0.00  425.42      0.00      3.20     0.00    84.41   0.00  16.56    0.00    0.18   0.08     0.00     7.71   0.16   6.64
sdb              0.00  415.18      0.00      3.16     0.00    74.92   0.00  15.29    0.00    0.20   0.08     0.00     7.80   0.17   7.26
.
.

compared to:

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdb              0.00 1311.07      0.00     10.14     0.00    62.98   0.00   4.58    0.00    0.10   0.14     0.00     7.92   0.10  13.70
sdb              0.00 1252.05      0.00      9.67     0.00    62.33   0.00   4.74    0.00    0.11   0.13     0.00     7.91   0.10  13.01
.
.

which is the run without DRBD and throttling in place.

So?

Thanks to pgio, we have been able to measure the impact of our DRBD replication setup on PostgreSQL by executing the exact same workload during both tests. We simulated a poorly performing DRBD replication by using blkio throttling on the secondary node.

Want to read more about pgio?

Sneak Preview of pgio (The SLOB Method for PostgreSQL) Part I: The Beta pgio README File.

Sneak Preview of pgio (The SLOB Method for PostgreSQL) Part II: Bulk Data Loading.

Sneak Preview of pgio (The SLOB Method for PostgreSQL) Part III: Link To The Full README file for Beta pgio v0.9.

Sneak Preview of pgio (The SLOB Method for PostgreSQL) Part IV: How To Reduce The Amount of Memory In The Linux Page Cache For Testing Purposes.

https://blog.dbi-services.com/postgres-the-fsync-issue-and-pgio-the-slob-method-for-postgresql/


Installing MAMP to play with PHP, MySQL and OpenFlights

You may wonder what I’m doing with those technologies, which are completely different from what I usually work with. I’ll give details in a future blog post, but the idea is to give a short introduction to databases to students at https://powercoders.org/, a coding academy for refugees in Switzerland. They install MAMP (My Apache – MySQL – PHP) during their curriculum for website development, and I’ll then use this environment to teach data modeling and SQL basics. Thus, I have to look at MAMP, PHP and MySQL for the first time… and I decided to load the OpenFlights open data to play with.
That explains the title.

So MAMP is like LAMP (Linux+Apache+MySQL+PHP) but with an M for macOS, and also Windows (W being an upside-down M, after all). Let’s install it on my laptop. I download it from https://www.mamp.info/en/downloads/, run the .exe, and everything is straightforward; the installer notifies me that the installation will be completed after a reboot.

What? Reboot? Seriously, we are in 2018, that’s Windows 10, I refuse to reboot to install a simple Apache server!

This bundle is very easy to use: a simple window to start and stop the servers (Apache and MySQL), a setup menu to configure them (I keep the defaults), and a link to the start page. Everything is installed under C:\MAMP (you can change that; I just kept the default settings). The first time you start the servers, the Windows Firewall dialog is raised and you have to accept it:


With all defaults (Apache on port 80), my web server pages are at http://localhost (serving the files in C:\MAMP\htdocs) and the administration page is at http://localhost/MAMP/
The MySQL administration page (phpMyAdmin) is at http://localhost/MAMP/index.php?page=phpmyadmin. It seems that, at least by default, I don’t need a password to get to the admin pages.

display_errors

I’ll write some PHP, and because it’s the first time in my life, I will have some errors. With the default configuration, Apache just raises an Error 500, which does not help me much with debugging. This configuration is fine for production, because displaying errors may give clues to hackers, but I’m here to play and I want to see error messages and line numbers.

I have to set display_errors=on for that. The current setting is displayed at http://localhost/MAMP/index.php?language=English&page=phpinfo#module_core and I can change it in C:\MAMP\conf\php7.2.1\php.ini. After a restart of the Apache server, I can see full error messages:

Warning: mysqli_real_connect(): (28000/1045): Access denied for user 'root'@'localhost' (using password: YES) in C:\MAMP\htdocs\index.php on line 123

Little fix

But now that I display the errors, I get this annoying message each time I try to do something in phpMyAdmin (which runs as PHP in the same Apache server):

MAMP "count(): Parameter must be an array or an object that implements Countable"


So this product, which is free but also has a ‘PRO’ version probably running the same code, is delivered with bad code, raising errors that were simply ignored. Don’t tell me it is just a warning: you will see that a parenthesis is misplaced, and raising only a warning for that is quite bad.
My common sense tells me that we should set display_errors=on and test a few screens before releasing software, but that step has probably been skipped. Fortunately, the message is clear: line 615 of C:\MAMP\bin\phpMyAdmin\libraries\sql.lib.php

The message is about count() not getting the correct parameter. Line 615 shows count($analyzed_sql_results['select_expr'] == 1), which is probably not correct because it counts a boolean expression. I changed it to (count($analyzed_sql_results['select_expr']) == 1), as I suppose the intent is to count and compare the result to one.

Well, I’ve never written a single line of PHP before and I already hate it for its weak error handling.

Load some data

I want to initialize the database with some data and I’ll use the OpenFlights database. I’ve downloaded and unzipped https://github.com/jpatokal/openflights/blob/master/sql/master.zip
I go to the unzipped directory and run MySQL:

cd /d D:\Downloads\openflights-master

Another little thing to fix here: the sql\create.sql and sql\load-data.sql files contain some lines starting with “\! echo”, but this \! command (to run a system command) exists on Linux and not in the Windows port of MySQL. We have to remove those lines before running the SQL scripts. I’m used to Oracle, where I can port my code and scripts from one platform to another, and I was a bit surprised by this.

Ready to connect:

C:\MAMP\bin\mysql\bin\mysql test --port=3306 --host=localhost --user root --password
Enter password:

The MySQL connection parameters are displayed on http://localhost/MAMP/, including the password (root).


mysql> source sql\create.sql
Query OK, 0 rows affected (0.00 sec)
 
Connection id: 314
Current database: flightdb2
 
Query OK, 0 rows affected (0.02 sec)
 
Query OK, 0 rows affected (0.02 sec)
 
Query OK, 0 rows affected (0.01 sec)
 
Query OK, 0 rows affected (0.01 sec)
 
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
 
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
 
Query OK, 0 rows affected (0.00 sec)
...

This has created the flightdb2 database, with openflights user, and 15 tables.
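To double-check from the same mysql prompt (a quick sanity check; output omitted here):

mysql> show databases;
mysql> show tables from flightdb2;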

Now, if you are still in the unzipped directory, you can load the data with the source sql\load-data.sql script, which loads from the data\*.dat files:

mysql> source sql\load-data.sql
Query OK, 6162 rows affected, 4 warnings (0.04 sec)
Records: 6162 Deleted: 0 Skipped: 0 Warnings: 4
 
Query OK, 7184 rows affected, 7184 warnings (0.12 sec)
Records: 7184 Deleted: 0 Skipped: 0 Warnings: 7184
 
Query OK, 67663 rows affected (0.53 sec)
Records: 67663 Deleted: 0 Skipped: 0 Warnings: 0
 
Query OK, 260 rows affected (0.01 sec)
Records: 260 Deleted: 0 Skipped: 0 Warnings: 0
 
Query OK, 12 rows affected (0.01 sec)
Records: 12 Deleted: 0 Skipped: 0 Warnings: 0

Query from PHP

So, for my first lines of PHP I’ve added the following to C:\MAMP\htdocs\index.php:

<?php
$user = 'openflights'; $password = '';
$db = 'flightdb2'; $host = 'localhost'; $port = 3306;
 
$conn = mysqli_init();
if (!$conn) {
    die("mysqli_init failed");
}
// the original post is truncated here; completing the error branch so the
// snippet runs: report why the connection failed
if (!$success = mysqli_real_connect($conn, $host, $user, $password, $db, $port)) {
    die("mysqli_real_connect failed: " . mysqli_connect_error());
}