Wallis Pereira, Kevin Closson: 2010-09-20 (Monday) Moscone South Rm 302
The presentation title, "Optimizing Servers for Oracle Database Performance", is so plain that it is easy to overlook. Kevin Closson is an opinion leader on Exadata.
There is also an unofficial meetup called "Oracle Closed World."
We promise everything and demand nothing, and we are experts in the art of buying things expensively and selling them cheaply.
This post is the first of three intended to provide a quick demonstration of the Oracle Data Mining API. Deeper concepts will be glossed over in the interest of keeping this primer material short and to the point.
Training a data mining model involves the process of learning the patterns and relationships in the underlying data. Identifying a business problem, gathering data, and cleaning data are critical steps that must be performed prior to training a model, and these steps tend to consume large amounts of time and effort, but they are outside the scope of this API-focused post.
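To give a quick taste of the API itself, training reduces to a single PL/SQL call once the data is ready. Here is a minimal sketch using DBMS_DATA_MINING.CREATE_MODEL; all table, column, and model names are hypothetical, and the settings table is optional (defaults apply without it):

```sql
-- Optional settings table: pick the algorithm (here, Naive Bayes).
CREATE TABLE odm_demo_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

INSERT INTO odm_demo_settings VALUES
  (dbms_data_mining.algo_name, dbms_data_mining.algo_naive_bayes);
COMMIT;

-- Train a classification model against a (hypothetical) training table.
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'ODM_DEMO_MODEL',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'CUSTOMER_TRAINING_DATA',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'BUYER_FLAG',
    settings_table_name => 'ODM_DEMO_SETTINGS');
END;
/
```

Once the call completes, the trained model lives in the database and can be applied with the PREDICTION family of SQL functions; later posts will pick up from there.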
Moscone West L2, Rm 2005
This presentation looks at various tips, tricks, and traps regarding the use and implementation of Oracle indexes. Topics include indexes and constraint management; how different types of indexes are needed for different constraint options; and how constraints, in turn, influence the cost-based optimizer (CBO). Next the session discusses subtle differences between unique and nonunique indexes and why these differences matter. It looks at tricks that make B-tree indexes extremely useful even if columns have few distinct values, as few as just one value. It also looks at how NULLs can actually be indexed effectively in B-tree indexes. Finally, it considers various problems with index monitoring and why the results can't be trusted.
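One of the NULL-indexing tricks the abstract alludes to can be sketched like this (the table and column names are hypothetical). A B-tree index stores no entry for a row when every indexed column is NULL, so appending a constant to the key guarantees that all rows, NULLs included, appear in the index:

```sql
-- A plain index on MGR cannot satisfy "WHERE mgr IS NULL":
-- rows with a NULL mgr are simply absent from the index.
CREATE INDEX emp_mgr_i ON emp (mgr);

-- Adding a constant second column means no row has an all-NULL key,
-- so every row is indexed and an IS NULL predicate can use the index:
DROP INDEX emp_mgr_i;
CREATE INDEX emp_mgr_i ON emp (mgr, 0);

SELECT * FROM emp WHERE mgr IS NULL;  -- an index range scan is now possible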
o Subqueries, including nested subqueries, are encapsulated
o Exists/In and Not Exists/Not In are indicated on the diagram
o Outer joins are denoted on the diagram
Welcome back. I realize this blog has been dead for a long time, but like Gabe Kotter, I’ve decided to return to my roots a bit and fire this blog back up.
Those of you who have spoken to me lately or have been following me on Twitter know that I’ve been pretty passionate about database design and development processes. I’ve almost never seen a database where the physical infrastructure is perfect, so ongoing changes to database structures and schemas are just a fact of life.
It’s managing that process of changing the database layout that’s been getting me all worked up lately – even when folks know what to change and how they’d like to change it, they don’t have tools or processes to introduce change into the database system in a traceable manner.
In one of my prior jobs, we were adamant about change-tracking all of our database changes – we used a rigorous change control and CM process based on object-change by object-change individual script files. Little did I know at the time that what we were practicing was a form of database refactoring…
I had thought that almost everyone understood the importance of maintaining traceability for database changes, but I’ve recently encountered situations where the lack of standards and tools means that changes are applied to databases in a haphazard fashion. While searching for some good arguments to use when introducing this concept, I came across a book by Scott Ambler and Pramod Sadalage entitled “Refactoring Databases: Evolutionary Database Design”.
Immediately I was happy with the concept: a whole “text-book” of how and why you need to manage the process of database structural change management. In the remainder of this post, I’ll be reviewing and commenting on this book.
Before I begin, I think it’s interesting to look at the review quotes in the front of the book. In some ways I wonder if folks know who this book is for – most of the quotes seem to patronize “data-professionals” saying it’s high time that they joined the modern world in embracing agile development techniques. References to “strong-armed DBAs” holding back projects seem a little harsh.
I continue to believe that the jack-of-all-trades DBA moniker is mostly to blame for the sad state of database physical design today. Ask folks what the primary task of a DBA group is, and chances are you’ll be told that it’s backup and recovery, not physical database design and construction. I even have a hard time with the term database development as I don’t really feel like I’m writing code when I’m doing physical database layout. I’ve been advocating the split of the DBA term into Database Operator (traditional DBA), Database Engineer (physical database designer and constructor) and Database Developer (stored procedure and SQL writer).
Using my terms, this book is best targeted at the Database Engineer and Developer.
What’s funny to me about the opprobrium heaped upon DBAs by agile developers is that I don’t think it’s a criticism of database technology in and of itself – but rather frustration with being forced to work with database professionals who lack the temperament, skills and experience to do database engineering and development. Let’s face it, a conservative operations DBA is (rightly) concerned primarily with system availability and reliability through ensuring proper backups and minimizing potential workload on the server. These are the DBAs who prefer to have hundreds of small independent databases in production all running at 2% utilization because it plays to their strengths.
It’s far harder to manage a large, multi-application tenant database running at 30-40% utilization experiencing continual structure changes – and that’s where this book starts to help.
The Preface has a nice section on “Why Evolutionary Database Development?”, which starts us off on understanding why it’s necessary to resist the desire to have a full and complete set of logical and physical models before performing database development. Early in my career I participated in efforts to create so-called Enterprise Data Models, which, being constructed by ivory-tower oversight and governance groups, lacked any sort of applicability to business and mission requirements. Sadly, they were out-of-date even when they were eventually completed. The authors do a nice job of highlighting the benefits of the incremental approach, and they also caution folks about the potential barriers to its adoption. In particular they point out the severe lack of tools supporting database SCM (this was written in 2006).
They also mention the need for database sandbox environments – they suggest individual developers get their own databases to experiment with. I’m not a big fan of this approach – I prefer a single development database that allows me to host a lot of data, with each developer getting their own schema to play around in. I also ALWAYS enable DDL auditing in ALL of my databases – that way I can track potential changes that might need to be promoted to the next environment (I also get to validate that my changes were applied to the higher environment – and, as icing on the cake, I can trap dumb ideas like embedding DDL statements inside transactional operations).
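For reference, turning on that kind of DDL auditing in Oracle might look like the following sketch; which statement audit options you enable will vary by site:

```sql
-- Audit records go to SYS.AUD$ once AUDIT_TRAIL is set (instance restart needed):
ALTER SYSTEM SET audit_trail = 'DB' SCOPE = SPFILE;

-- After the restart, audit the DDL you care about:
AUDIT TABLE;        -- CREATE/DROP/TRUNCATE TABLE
AUDIT ALTER TABLE;
AUDIT INDEX;        -- CREATE/ALTER/DROP INDEX
AUDIT PROCEDURE;    -- CREATE/DROP procedures, functions, packages

-- Later, review what changed and when:
SELECT username, action_name, obj_name, timestamp
  FROM dba_audit_trail
 WHERE action_name LIKE '%TABLE%'
 ORDER BY timestamp;
```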
Chapter 2 introduces the concept of Database Refactoring, with a quick introduction on refactoring in general (“a disciplined way to restructure code in small steps”). The authors do a nice job of pointing out that database refactoring is conceptually more difficult than code refactoring – that code refactoring only needs to maintain behavioral semantics, while database refactorings must also maintain informational semantics (pg. 15). The emphasis here includes the ability to introduce change in a transitional way that allows for multiple applications and multiple versions of applications to continue to run against the same database. A simple example of moving a column from a parent table to a child table is also included.
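The transitional approach can be sketched in Oracle SQL roughly as follows (the names are illustrative, not the book’s exact scripts). The point is that both the old and new locations of the column stay valid during a transition window, so old and new application versions can coexist:

```sql
-- Step 1: add the column in its new home and backfill it.
ALTER TABLE account ADD (balance NUMBER);
UPDATE account a
   SET a.balance = (SELECT c.balance FROM customer c
                     WHERE c.customer_id = a.customer_id);

-- Step 2: keep the two copies in sync during the transition window.
CREATE OR REPLACE TRIGGER customer_balance_sync
  AFTER UPDATE OF balance ON customer
  FOR EACH ROW
BEGIN
  UPDATE account
     SET balance = :NEW.balance
   WHERE customer_id = :NEW.customer_id;
END;
/

-- Step 3 (after every application has migrated): remove the scaffolding.
-- DROP TRIGGER customer_balance_sync;
-- ALTER TABLE customer DROP COLUMN balance;
```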
In section 2.3, the authors categorize database refactorings into 6 broad categories: Structural (modifying table definitions), Data Quality (think check constraints), Referential Integrity (capturing rules that might currently be maintained by application code), Architectural (transferring common logic from applications into database procedures to increase their usefulness), Method (stored procedure refactorings), and Non-Refactoring Transformations (evolving the schema to handle new data concepts).
They also introduce the idea of indicators that your database may require refactoring – they call them “Database Smells.”
These include common problems like multipurpose columns, multipurpose tables, redundant storage of data items, overloaded columns, and fear of altering the database because it is too complex.
In section 2.6, the authors explain how it is easier to refactor your database schema when you decrease the coupling between applications and the database – through concepts like persistence layers.
Chapter 3 walks you through the basics of a database refactoring process – including giving you a list of process steps. It also includes some good checks on determining whether or not the change is necessary and worth the effort. Finally, they talk about version control and visibility.
Chapter 4 is pretty short, and deals with deploying or migrating changes from environment to environment. This includes the concept of bundling changes together, scheduling and documenting deployments. Finally, they discuss the actual deployment process, including defining and possibly testing your backout procedures.
In my environments, we’d break up these deployment items into 3 main groups: items that are pre-deployable (i.e., can be deployed ahead of time without affecting current applications), items that require application outages, and items that can be deployed “post-deployment” (perhaps cleanup activities that require the structure change, but aren’t required by the applications).
Chapter 5 discusses strategies (actually lessons learned) for successfully moving database refactorings through your development process, including implementing traceability for database changes, simplifying database change review processes, and hunting down and eliminating duplicate SQL.
The rest of the book, Chapters 6 through 11, goes through specific kinds of refactorings (e.g., Introduce Calculated Column) along with basic pros/cons of each one and example SQL scripts (using the Oracle dialect). It serves as a reference catalog of database change concepts and is useful from a delineation perspective. I wish there were more meat in the pro and con section for each transformation, but in the end it’s a useful list.
Overall I thoroughly enjoyed the book and would recommend it for many development teams – project managers and developers should read at least the first 50 pages so as to understand how to integrate database development into the overall project plan. Traditional DBAs supporting development teams absolutely must read this – if only to enhance their ability to interact and fully support development activities.
That’s all I have for now – look for shorter, more incisive posts in the future!
In-Memory Non-PQ scaled reasonably well in throughput up to 8 sessions (see the TPC-H results).
However, In-Memory PQ, as shown in the test of In-Memory Parallel Query session counts, hit a plateau in qph at 4 concurrent sessions.
Parent Latch Statistics DB/Inst: ORACLE/oracle Snaps: 13-14
-> only latches with sleeps are shown
-> ordered by name
Latch Name               Get Requests       Misses     Sleeps   Spin Gets
------------------------ --------------- ------------ ---------- -----------
Real-time plan statistic 7,048 1,133 107 1,026
active service list 25,346 2,817 1,716 1,139
call allocation 41,422 2,215 39 2,177
dummy allocation 9,807 1,635 2 1,633
enqueues 63,653 3,107 10 3,097
messages 9,540 4 1 3
parameter table manageme 24,461 2,459 1 2,458
qmn task queue latch 52 1 1 0
query server process 31 11 24 0
resmgr:free threads list 9,808 1,488 1 1,487
Child Latch Statistics DB/Inst: ORACLE/oracle Snaps: 13-14
-> only latches with sleeps/gets > 1/100000 are shown
-> ordered by name, gets desc
Child Get Spin
Latch Name Num Requests Misses Sleeps Gets
---------------------- ------- ------------ ------------ ---------- -----------
cache buffers chains 49990 2,791,680 5,344 32 5,313
cache buffers chains 42860 112,396 2,595 4 2,586
cache buffers chains 37884 91,118 3,150 2 3,148
cache buffers chains 31347 16,878 1,166 1 1,165
cache buffers chains 16597 5,280 4 1 3
cache buffers chains 48852 5,280 3 1 2
cache buffers chains 10163 5,200 3 1 2
cache buffers chains 4153 4,570 2 1 1
cache buffers chains 13172 4,568 1 1 0
cache buffers chains 14344 4,564 2 1 1
cache buffers chains 18218 4,564 2 1 1
cache buffers chains 46359 4,560 2 1 1
cache buffers chains 47367 4,560 2 1 1
cache buffers chains 55125 4,560 1 1 0
cache buffers chains 65459 4,560 4 1 3
cache buffers chains 6706 4,560 3 1 2
cache buffers chains 5678 4,560 2 1 1
cache buffers chains 17931 4,560 3 1 2
cache buffers chains 36331 4,560 1 1 0
cache buffers chains 43826 4,000 2 1 1
cache buffers chains 23857 3,440 5 1 4
cache buffers chains 49962 3,040 3 1 2
cache buffers chains 42216 3,040 3 1 2
cache buffers chains 345 3,040 1 1 0
cache buffers chains 51797 3,040 1 1 0
cache buffers chains 26238 2,885 2 1 1
cache buffers chains 35482 2,880 1 1 0
cache buffers chains 32567 2,880 3 1 2
cache buffers chains 39474 2,495 3 1 2
cache buffers chains 30680 2,480 1 1 0
cache buffers chains 44458 2,480 1 1 0
cache buffers chains 20453 2,480 3 1 2
cache buffers chains 46045 2,480 1 1 0
cache buffers chains 11421 2,480 1 1 0
cache buffers chains 31662 2,160 4 2 3
cache buffers chains 29644 1,920 1 1 0
cache buffers chains 33569 1,520 3 1 2
cache buffers lru chai 45 54,182 211 2 209
object queue header op 23 54,897 1,443 1 1,442
object queue header op 18 54,867 1,173 2 1,171
object queue header op 24 54,844 1,457 1 1,456
object queue header op 22 54,823 1,386 5 1,381
parallel query stats 1 5,720 920 13 907
process queue 6 2,526 110 2 108
process queue referenc 30 128,252 18 3 16
shared pool 1 133,446 14,215 21 14,194
shared pool 2 127,339 19,019 12 19,007
A large number of latch get misses occurred on cache buffers chains.
This is the same phenomenon that appeared at 8 concurrent sessions in the In-Memory Parallel Query session-count test.
The queuing management that starts at 8 sessions is presumably derived from the thread count of the installed CPU, but on the Core i7 860 used here, waiting until 8 sessions feels a bit too late; in practice the CPU is already maxed out at 4 sessions.
For DWH systems, In-Memory Parallel Query is the most attractive feature, one you want to use no matter what.
But since it is physically difficult to fit all the data into the buffer cache, you supplement with ordinary Parallel Query; that is the ideal setup. That is why my previous post included the commercial video for the System-X server that can take up to 192 GB of memory.
However, parallel_degree_policy=auto is the only setting that controls In-Memory PQ, and it is not clear at what point the "queuing management" that keeps the CPU from being saturated actually kicks in.
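For what it is worth, here is my reading of the 11gR2 knobs involved (treat this as my understanding of the documentation, not a tested answer):

```sql
-- AUTO enables Auto DOP, statement queuing, and In-Memory Parallel Execution:
ALTER SYSTEM SET parallel_degree_policy = AUTO;

-- Statement queuing is supposed to begin once the number of active parallel
-- servers would exceed PARALLEL_SERVERS_TARGET, so lowering it should make
-- queuing kick in earlier than the default derived from CPU_COUNT:
ALTER SYSTEM SET parallel_servers_target = 8;
```

If the default PARALLEL_SERVERS_TARGET really is computed from the CPU thread count, that would explain why queuing only starts around 8 sessions on this Core i7 860.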