I've been doing a lot of work around large volume data loads into an Oracle 11.2 database recently using External Tables and Parallelism and, despite the fact it's all used well-known techniques, I think it's probably worth a couple of posts to re-emphasise how successful using the right tools for the right job can be.
But first, we ran into a particular problem that threw us off-track for a few hours and this post covers that issue. I have a feeling that at least one person one day will land at this post via Google or from a vague memory of me writing about it and the hassle it saves them will make me smile!
Our application is written using a combination of C# and PL/SQL so, even as we've been implementing more functionality in the database, we still depend on scheduling software calling C# that in turn calls PL/SQL. As I was developing the data loading code, I simplified testing for myself by knocking together a basic .sql script that called the various PL/SQL procedures in the correct order. Everything looked great so I checked in my code and gave the appropriate call specifications to the C# developers to write wrapper procedures. Once that was done, we prepared to run the proper schedule and be amazed and delighted by the new performance improvements.
Unfortunately, the whole thing ran like a dog (... a rather old, sweet but overweight dog with a bad case of asthma).
When I investigated, everything was running serially. I initially thought I'd screwed something up so tried to work out what I'd done wrong but, no matter what I tried, the code used parallelism reliably when called from the basic test harness script but, as soon as we called it from the C# application, it would go back to serial. I wondered about session-level parameter settings or different user accounts but there were no identifiable differences there.
Because the performance difference was so great and we were under a lot of pressure to deliver data to the other teams, I was quite worked up by this and frustrated and there was very little out there on Google but perhaps I should have checked My Oracle Support in the first place ....
Ah! That looked pretty similar to what we were seeing. It turned out to be a combination of the way that the Oracle RDBMS works and the default configuration of Oracle Data Provider for .Net (ODP.Net), which sets the enlist property to true. To quote the support doc - "This makes OCI calls which allows the the DML or transactions to become or be promoted to a distributed transaction." and, as documented in the generic RDBMS documentation, distributed transactions can't use Parallel DML.
As we had no requirement to use distributed transactions, the simple solution was to set enlist=false as a property in the connection string.
Bingo! Everything started running in parallel again ...