For those who want another look, or who want to share it: here is my presentation “Creating Structure in Unstructured Data”, given on Monday morning during the Hotsos 2013 Symposium. HTH, Marco. Hotsos 2013 – Creating Structure in Unstructured Data, from Marco Gralike
Last year I received quite a few questions about whether there was a tool, or a method, to get BAG data from the Dutch Kadaster (the land registry) into an Oracle database for all kinds of purposes. Basisregistraties Adressen en Gebouwen (BAG) data is delivered, among others by the Kadaster, in XML files in which all Kadaster data is recorded. These …
Just finished my Birds of a Feather XMLDB panel session in the Marriott Hotel and am now looking back on an eventful day. It all started off with a keynote session with, among others, Mark Hurd. The most interesting bit, in my honest opinion, was the announcement of more details on Oracle 12c. I will …
Last Tuesday, Mark Drake, Senior Product Manager, and I delivered a good presentation during UKOUG in Birmingham about how to use your database, via XMLDB functionality, as a file server. The presentation also demonstrated how you could extend the “standard” file server (aka your database) functionality with features like WebDAV driven ACL Security and
Although it is not a “pure” XML partitioning example (that is, partitioning data on criteria within the XML document), and before I forget to mention this exercise, I would like to point out the following URL:
This small exercise was set up based on questions and comments from a reader of this blog regarding the “Structured XMLIndex (Part 3) – Building Multiple XMLIndex Structures” content, after having trouble setting up structured and unstructured local XMLIndexes.
The forum link demonstrates how to:
This one is long overdue. There is a partitioning example for binary XML on this website based on Range, Hash and List partitioning, but seen from an XML point of view it is an incorrect way to do it, because the partition column is a regular column and not a virtual column as described in the Oracle XMLDB Developers Guide for 11.2. The examples given in that post partition the table on the ID column, but do not partition the XML based on a value or element IN the XML document. The following will.
So here is a small example of how it can be done based on a virtual column. Be aware that you should support these virtual columns with at least an index or XMLIndex structure for performance reasons.
CREATE TABLE binary_part_xml OF XMLType
  XMLTYPE STORE AS SECUREFILE BINARY XML
  VIRTUAL COLUMNS
  (
    LISTING_TYPE AS (XMLCast(XMLQuery('/LISTING/@TYPE' PASSING OBJECT_VALUE RETURNING CONTENT) AS VARCHAR2(100)))
  )
  PARTITION BY LIST (LISTING_TYPE)
  (
    PARTITION health VALUES ('Health'),
    PARTITION law_firms VALUES ('Law Firm')
  );
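To see the virtual column do its work, a minimal sketch like the following could be used. The sample documents and their NAME element are made up for illustration; only the LISTING element, its TYPE attribute and the partition names come from the table definition above.

```sql
-- Hypothetical sample documents: the TYPE attribute decides the partition.
INSERT INTO binary_part_xml
VALUES (XMLType('<LISTING TYPE="Health"><NAME>Example Clinic</NAME></LISTING>'));

INSERT INTO binary_part_xml
VALUES (XMLType('<LISTING TYPE="Law Firm"><NAME>Example and Partners</NAME></LISTING>'));

COMMIT;

-- Count the documents that ended up in each list partition.
SELECT COUNT(*) FROM binary_part_xml PARTITION (health);
SELECT COUNT(*) FROM binary_part_xml PARTITION (law_firms);

-- Read a value from the documents stored in one partition only.
SELECT XMLCast(XMLQuery('/LISTING/NAME' PASSING OBJECT_VALUE RETURNING CONTENT)
               AS VARCHAR2(100)) AS listing_name
FROM   binary_part_xml PARTITION (law_firms);
```

A document whose TYPE attribute is neither 'Health' nor 'Law Firm' has no matching partition in this definition, so the insert would be rejected unless a DEFAULT partition is added.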
Currently sitting in on the Oracle Open World 2010 presentation by Sam Idicula, Consulting Member of Technical Staff, and Mark Drake, Sr. Product Manager for Oracle XML DB. Before getting into the more in-depth topics, Sam explained XML Schema usage for validation via XML Schema validators such as XML Spy or JDeveloper. A good XML Schema validator is really needed nowadays, because the widely used XML Schemas out there, like HL7, have become very big. With binary XML storage, the XML Schema is stored in a post-parsed binary format. This has the advantage that Oracle knows about the format when storing the XML document. By registering the XML Schema that validates the binary XML content in the database, extra information becomes available to the database.
There can be a lot of recursive dependencies, via the import or include references in an XML Schema, which make it even more difficult to make optimal use of this information. For example, the HL7 (Health Level 7) setup consists of over 100 included XML Schemas. Oracle 11gR2 has greatly improved the performance and handling of the huge amount of metadata stored in such XML Schemas. Streaming schema validation, plus hints added via xdb:annotations, give the database even more information on how to handle these structures optimally, so performance can be improved even more. Some of those hints can be used to avoid the creation of objects, in this case while using XMLType Object Relational storage, via for instance xdb:defaultTable="" (providing an empty string), or to store parts of the XML document out of line. By the way, for this last example you should not use JDeveloper, because it will annotate the XML Schema incorrectly (the bug has been reported by me). One of the improvements in 184.108.40.206.0 is that cycle detection has been hugely improved, so cycles are handled even better in the mentioned version.
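As a rough sketch of the xdb:defaultTable="" hint mentioned above: the schema URL and the tiny LISTING schema below are made up for illustration, and the parameter choices are assumptions rather than what was shown in the session. The empty annotation on the global element asks XML DB not to create a default table for it during registration.

```sql
BEGIN
  DBMS_XMLSCHEMA.registerSchema(
    schemaurl => 'http://example.com/listing.xsd',   -- hypothetical URL
    schemadoc => q'[<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                                xmlns:xdb="http://xmlns.oracle.com/xdb">
                      <xs:element name="LISTING" xdb:defaultTable="">
                        <xs:complexType>
                          <xs:sequence>
                            <xs:element name="NAME" type="xs:string"/>
                          </xs:sequence>
                          <xs:attribute name="TYPE" type="xs:string"/>
                        </xs:complexType>
                      </xs:element>
                    </xs:schema>]',
    local     => TRUE,
    gentypes  => TRUE,    -- object types are still generated
    genbean   => FALSE,
    gentables => TRUE     -- but LISTING gets no default table
  );
END;
/
```

Querying USER_XML_SCHEMAS and USER_XML_TABLES afterwards is one way to check what was actually created.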
On the XMLDB home page on the Oracle OTN website a package of tools is provided (“Oracle XML DB Ease of Use Tools for Structured Storage”) which can make your life easier regarding those xdb:annotations, especially for those enormous XML Schemas. This tool set enables you to automate a lot of the XML Schema optimization hints you would like to apply. Via XQuery or other XML DB update statements you are also able to override the naming or storage options generated by the database. This can be done very easily via some simple anonymous PL/SQL blocks, using for example the DBMS_ANNOTATE-x packages contained in this tool set, which, as said, is freely available on the XML DB OTN Oracle website.
This tool set also comes with a white paper that demonstrates some of the best XML handling ideas and experience gathered by the Oracle XML DB Development team through many years of handling customer use cases. For example, if you know it is not applicable to your XML document, you can switch off or alter DOM validation handling while storing or handling your XML document in the database. You can also override ordering if that is acceptable for your XML Schema; this avoids Oracle checking it, which improves handling. Be very aware, though, that this can be dangerous: the person who created the XML Schema may have put the ordering in without caring about the real life implementation, or it may be an actual mandatory requirement in practice.
I have experienced multiple times that even with official XML Schemas the restrictions didn’t match real life use, so although automation really helps you to manage your registered XML Schemas more easily, you must be aware of those exceptions. XML Schemas can be only loosely based on real life implementations, which can get you into a lot of problems once a storage model based on such an XML Schema is used in your database design; those rules will be enforced via the XML Schema in the database.
As always, proper design with future needs in mind takes time. This is also the case when creating a good XML Schema.
In Oracle 11 you now have the possibility, via this tool set, to use DBMS_XML_MANAGE (for XMLType Object Relational storage) to resolve the table to column mapping for you, which makes it easier to identify and create supporting indexes on ComplexTypes. This has the advantage that you can create indexes with more meaningful names like, for example, “line_items_uniq_idx_01” or whatever the naming convention within your company might be.
The latest XDB tool set will now also contain an XDB_ANALYZE_XMLSCHEMA package, which sorts out all of the scripting and possible options while you feed it the actual XML Schemas. As was demonstrated by Mark Drake, all the FpML schemas, which have a lot of dependencies on each other, were analyzed, annotated and registered, and this created over 100 tables and more than 2500 objects in minutes. Try doing this by hand…
This package will also sort out the XML Schema dependencies and the order in which all those XML Schemas have to be registered (based on includes, imports and refs used by Simple- and ComplexTypes). Sometimes you have to break up CREATE TABLE statements because the maximum number of columns allowed by Oracle in one single CREATE TABLE statement is “only” 1000. This package will help you figure out how many of those Object Relational storage items have to be moved “out of line”, and/or at which level in the hierarchy of the XML tree to break things up, to avoid this 1000 column limitation but also to provide the design info needed to get the maximum performance.
This tool set for XMLType Object Relational storage is only useful if your XML design is highly relational. If not, your XMLType storage model should be based on Binary XML. The advantage of using XMLType Object Relational storage is that you make full use of Oracle relational technology and optimizations, which have been available for a long time, so that, for example, the Cost Based Optimizer can be used to its full extent. On the other hand, be aware that if your XML design is really relational, maybe you should have created it by relational means. There should be a proper use case to work with the XML format in the first place. My adage always is: if it is not XML, don’t use Oracle XMLDB. If it is, go for Oracle XMLDB, if only because it is a “no cost option” within your Oracle database and it has been designed, since version 220.127.116.11.0, to handle XML optimally in your database.
For further information about choosing the proper XMLType storage model and how to optimally query these structures, have a look at:
Something new? Eh? Should you do this? Eh?
All in all, probably not, but for me this was a good exercise towards some more up-to-date demo scripting for my “Boost your environment with XMLDB” presentation, or, under its hopefully clearer Oracle Open World name for almost the same presentation, “Interfacing with Your Database via Oracle XML DB” (S319105). Just up front, there are some issues with the following:
…but it is good fun for a small exercise based on the following OTN Thread: “Error with basic XMLTable”…
Let me show you what I mean.
Via “bfilename” you have been able for a long time, I guess since Oracle 9.2, to read a file as a BFILE, and because an “XMLTYPE” can swallow almost any datatype, you could do the following…
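A minimal sketch of that idea, under a few assumptions: the directory object XML_DIR, its path and the file example.xml are hypothetical, the file is assumed to be UTF-8 encoded, and the file must be readable by the database server.

```sql
-- Assumption: a directory object pointing at a folder on the database server.
CREATE OR REPLACE DIRECTORY xml_dir AS '/tmp/xml';

-- Construct an XMLType straight from the file; the second argument is the
-- character set id of the file content.
SELECT XMLTYPE(BFILENAME('XML_DIR', 'example.xml'),
               NLS_CHARSET_ID('AL32UTF8')) AS doc
FROM   dual;

-- The same expression can feed XMLTABLE to shred the file relationally.
-- The LISTING element and TYPE attribute are illustrative only.
SELECT x.listing_type
FROM   XMLTABLE('/LISTING'
                PASSING XMLTYPE(BFILENAME('XML_DIR', 'example.xml'),
                                NLS_CHARSET_ID('AL32UTF8'))
                COLUMNS listing_type VARCHAR2(100) PATH '@TYPE') x;
```

Creating the directory object requires the CREATE ANY DIRECTORY privilege; an existing directory object with read access works just as well.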
From time to time the main Oracle XML DB page gets updated with new whitepapers, tooling or Oracle By Example / Hands-on Lab examples. “Lately” some cool and interesting new whitepapers and updated tooling content were added to this main Oracle XML DB page. The following items and content are really worth reading. A small issue, though, is that you need a bit more than a basic understanding to put all these “lessons learned from the last one, two years” into context, but it's worth it, and otherwise a small reprise of the Oracle XML DB Developers Guide is always useful. A bit like re-reading the Oracle Concepts Manual.
The “Ease of Use Tools” (xdbutilities.zip tool set) for handling XMLType Object Relational storage has been updated and is now applicable to Oracle 10.x and 11.x. A version-specific tool set is no longer needed: this prepackaged set of PL/SQL packages is installable on both versions. The zip file also contains a whitepaper that describes some of the (performance) lessons learned while using XMLType Object Relational storage.
You will probably never build only one structured XMLIndex. A practical use case would be an unstructured XMLIndex indexing the semi-structured parts of your XML, multiple structured XMLIndexes indexing the highly structured XML islands of data, and maybe even an Oracle Text Context index indexing unstructured XML data.
So the next examples will show how to build an unstructured XMLIndex and how to build multiple structured XMLIndexes on top of the first one. They will also give some examples of what to do if you have made mistakes and/or how to apply some maintenance to the XMLIndex structures. You start off by determining which sections should be addressed by the unstructured XMLIndex and, via path subsetting, restrict the index to that part (also see “Oracle 11g – XMLIndex (Part 2) – XMLIndex Path Subsetting” for more info on path subsetting). There should be, I think, a good reason for indexing the same node path via multiple structured or unstructured XMLIndexes. One I can think of is to support different kinds of XML queries, but be aware that multiple XMLIndex structures on the same nodes come with extra index maintenance overhead.
Anyway, let's say you want most of the XML document indexed via an unstructured XMLIndex (I haven’t used path subsetting here for the unstructured XMLIndex, but as said, I should have), plus two extra structured XMLIndexes on top of the domain XMLIndex…
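The general shape of such a setup could look like the sketch below. This is not the script from the original post: the table, index, group and storage table names are hypothetical, and the ADD_GROUP parameter syntax is written as I recall it from the 11.2 XML DB Developer's Guide, so verify it against your release before relying on it.

```sql
-- Hypothetical binary XML table to index.
CREATE TABLE xml_docs OF XMLType
  XMLTYPE STORE AS SECUREFILE BINARY XML;

-- Unstructured XMLIndex (no path subsetting in this sketch).
CREATE INDEX xml_docs_uxi ON xml_docs (OBJECT_VALUE)
  INDEXTYPE IS XDB.XMLINDEX
  PARAMETERS ('PATH TABLE xml_docs_path_table');

-- First structured component on top of the same domain index.
ALTER INDEX xml_docs_uxi
  PARAMETERS ('ADD_GROUP GROUP listing_grp
               XMLTABLE listing_sxi_tab ''/LISTING''
               COLUMNS listing_type VARCHAR2(100) PATH ''@TYPE''');

-- Second structured component, kept in its own group and storage table.
ALTER INDEX xml_docs_uxi
  PARAMETERS ('ADD_GROUP GROUP name_grp
               XMLTABLE name_sxi_tab ''/LISTING''
               COLUMNS listing_name VARCHAR2(200) PATH ''NAME''');
```

Keeping each structured component in its own group makes it possible to maintain or remove one without touching the other, at the price of the extra index maintenance mentioned above.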