There’s a thread running on OTN at present about deleting huge volumes of duplicated data from a table (to reduce it from 1.1 billion to about 22 million rows). The thread isn’t what I’m going to talk about, though, other than quoting some numbers from it to explain what this post is about.
Many of the questions that appear on OTN are deceptively simple until you start thinking carefully about the implications; one such showed up a little while ago:
What i want to do is to delete rows from table where it matches condition upper(CATEGORY_DESCRIPTION) like ‘%BOOK%’.
At the same time i want these rows to be inserted into other table.
The first problem is this: how carefully does the requirement need to be stated before you can decide how to address it? Trying to imagine awkward scenarios, or boundary conditions, can help to clarify the issue.
If you delete before you insert, how do you find the data to insert ?
There are many posts about the amount of memory that is taken by the Oracle database executables and the database SGA and PGA. The reason for adding yet another one on this topic is a question I recently gotten, and the complexities which surrounds memory usage on modern systems. The intention for this blogpost is to show a tiny bit about page sharing of linux for private pages, then move on to shared pages, and discuss how page allocation looks like with Oracle ASMM (sga_target or manual memory).
The version of linux in this blogpost is Oracle Linux 7.2, using kernel: 4.1.12-37.6.3.el7uek.x86_64 (UEK4)
The version of the Oracle database software is 188.8.131.52.160719 (july 2016).
While browsing the web recently for articles on the HyperLogLog algorithm that Oracle uses for some of its approximate functions, I came upon a blog post written in Jan 2014 with the title Use Subqueries to Count Distinct 50X Faster. There are various ways that subqueries can be used to rewrite queries for improved performance, but when the title caught my eye I couldn’t think of a way in which they could improve “count distinct”. It turned out that the word “subquery” was being used (quite correctly) in the sense of “inline view” while my mind had immediately turned to subqueries in the select list or where clause.
I thought I had written this note a few years ago, on OTN or Oracle-L if not on my blog, but I can’t find any sign of it so I’ve decided it’s time to write it (again) – starting as a question about the following code:
An important target of trouble-shooting, particularly when addressing performance problems, is to minimise the time and effort you have to spend to get a “good enough” result. A recent question on the OTN database forum struck me as a good demonstration of following this strategy; the problem featured a correlated update that had to access a view 84 times to update a small table; but the view was a complex view (apparently non-mergeable) and the update took several hours to complete even though the view, when instantiated, held only 63 rows.
The OP told us that the query “select * from view” took seven minutes to return those 63 rows, and wanted to know if we could find a nice way to perform the update in (approximately) that seven minutes, rather than using the correlated update approach that seemed to take something in the ballpark of 7 minutes per row updated.
I think the “column group” variant of extended stats is a wonderful addition to the Oracle code base, but there’s a very important detail about using the feature that I hadn’t really noticed until a question came up on the OTN database forum recently about a very bad join cardinality estimate.
The point is this: if you have a multi-column equality join and the optimizer needs some help to get a better estimate of join cardinality then column group statistics may help if you create matching stats at both ends of the join. There is a variation on this directive that helps to explain why I hadn’t noticed it before – multi-column indexes (with exactly the correct columns) have the same effect and, most significantly, the combination of one column group and a matching multi-column index will do the trick.
I received an email recently that started with the sort of opening sentence that I see far more often than I want to:
I have come across an interesting scenario that I would like to run by you, for your opinion.
It’s not that I object to being sent interesting scenarios, it’s just that they are rarely interesting – and this wasn’t one of those rare interesting ones. On the plus side it reminded me that I hadn’t vented one of my popular rants for some time.
Here’s the problem – see if you can work out the error before you get to the rant:
“I’ve got a table and a view on that table; and I’ve got a query that is supposed to use the view. Whether I use the table or the view in query the optimizer uses the primary key on the table to access the table – but when I use the table the query takes about 30 ms, when I use the view the query takes about 903 ms”.
In my last note on index usage I introduced the idea of looking at v$segstat (or v$segment_statistics) and comparing the “logical reads” statistic with the “db block changes” statistic as an indicator of whether or not the index was used in execution plans. This week I’ll explain the idea and show you some results – with a little commentary – from a production system that was reported on the OTN database forum.
The question of how to identify indexes that could be dropped re-appeared (yet again) on the OTN database forum last week. It’s not really surprising that it recurs so regularly – the problem isn’t an easy one to solve but new (and even less new) users keep hoping that there’s a quick and easy solution.
There are, however, strategies and pointers that can help you to optimise the trade-off between effort, risk, and reward. Broadly the idea is to spend a small amount of effort finding a relatively small number of “expensive” indexes that might be safe to drop, so that when you do the detailed analysis you have a good chance that the time spent will be rewarded by a positive result.
Before we get to some results posted on OTN, it’s worth thinking about the global impact and what we’re trying to achieve, and the threats that go with our attempt to achieve it.