513 Re: [xml-dbms] All in one answer....

  • Ronald Bourret
    Dec 11, 2000
      Pareena Shah wrote:
      >
      > Question for the people thinking about the new version of XML DBMS: What do
      > you think about using something like sqlloader to bulk load transformed XML
      > data into an Oracle database? If I have a situation where I am going to be
      > processing large volumes of XML data into an Oracle database, and I want to
      > optimize by buffering rows, and using Oracle's direct path load
      > functionality, is sql loader the best way? Could you comment on the
      > advantages/disadvantages?

      This is an interesting idea, although it won't be included in the next
      release due to lack of time. (It would require completely rearchitecting
      the DOMToDBMS and DBMSToDOM classes.)

      The following discussion is not specific to Oracle's bulk loader; it
      describes how XML-DBMS might do bulk inserts in the future. It assumes
      that such inserts are possible through JDBC, and it is not clear to me
      that they are.

      The challenge is this. Suppose we have an XML document that looks like
      the following:

      <A>
      <A1>...</A1>
      <A2>...</A2>
      <A3>...</A3>
      <A4>...</A4>
      <B>
      <B1>...</B1>
      <B2>...</B2>
      <B3>...</B3>
      </B>
      </A>

      and that this document is mapped to tables A (columns A1-A4) and B
      (columns B1-B3) as expected, with the primary key in table A and the
      matching foreign key in table B. Now suppose you have a whole lot of
      these structures in a single XML document:

      <root>
      <A>
      <A1>...</A1>
      <A2>...</A2>
      <A3>...</A3>
      <A4>...</A4>
      <B>
      <B1>...</B1>
      <B2>...</B2>
      <B3>...</B3>
      </B>
      </A>
      ...
      <A>
      <A1>...</A1>
      <A2>...</A2>
      <A3>...</A3>
      <A4>...</A4>
      <B>
      <B1>...</B1>
      <B2>...</B2>
      <B3>...</B3>
      </B>
      </A>
      </root>

      Currently, the code inserts the row for the first A, then the row for
      the first B, then the row for the second A, then the row for the
      second B, and so on.
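
      As a rough illustration of that ordering (this is not the actual
      DOMToDBMS code), the sketch below walks a nested structure in document
      order and records one insert per row as soon as the row is reached. The
      RowAtATime and Node names, and the string log that stands in for the
      database, are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the XML-DBMS implementation.
class RowAtATime {
    /** Stand-in for an element mapped to a table row, e.g. <A> or <B>. */
    static class Node {
        final String table;
        final String id;
        final List<Node> children = new ArrayList<>();

        Node(String table, String id) {
            this.table = table;
            this.id = id;
        }

        /** Adds a child row and returns this node so calls can be chained. */
        Node child(Node c) {
            children.add(c);
            return this;
        }
    }

    /** Records one insert per row: the parent first, then each child in order. */
    static void insert(Node node, List<String> log) {
        log.add("INSERT " + node.table + " " + node.id);
        for (Node c : node.children) {
            insert(c, log);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        insert(new Node("A", "a1").child(new Node("B", "b1")), log);
        insert(new Node("A", "a2").child(new Node("B", "b2")), log);
        // The log alternates between tables: A, B, A, B.
        log.forEach(System.out::println);
    }
}
```

      Because each row is written the moment it is seen, a parent row always
      reaches the database before its children, but every row costs a
      separate insert.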

      To use bulk loading, the code would need to buffer rows for A and rows
      for B, then insert them when there are a certain number of rows in the
      buffer -- say 100. While this probably wouldn't be too bad in the above
      case, it could get very complicated in the general case.

      For example, imagine there can be an arbitrary number of B children for
      each A parent. The buffer for B rows would then usually fill up before
      the buffer for A rows. However, the code has to be careful about when it
      inserts rows: it can't simply wait until the buffer for B rows is full
      and then insert them. Because of referential integrity, it has to insert
      the A rows before the B rows that refer to them, so it needs to
      coordinate when the buffers are emptied. Now imagine doing this for an
      XML document that is nested arbitrarily deep and you'll see that the
      code is non-trivial.
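
      One simple way to coordinate the buffers, sketched here under stated
      assumptions rather than as a design for XML-DBMS, is to flush every
      buffer whenever any one of them fills, always writing parent tables
      before child tables. The OrderedRowBuffer class is hypothetical. It
      assumes the caller lists the tables parent-first, that the foreign keys
      live in the child tables, and that each parent row is added before its
      child rows. The "inserted" list stands in for the database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not part of XML-DBMS.
class OrderedRowBuffer {
    private final int threshold;
    // One buffer per table; iteration order is parent tables first.
    private final Map<String, List<String>> buffers = new LinkedHashMap<>();
    // Stands in for the database: rows in the order they were written.
    final List<String> inserted = new ArrayList<>();

    OrderedRowBuffer(int threshold, String... tablesParentFirst) {
        this.threshold = threshold;
        for (String table : tablesParentFirst) {
            buffers.put(table, new ArrayList<>());
        }
    }

    /** Buffers a row; when any one buffer fills, every buffer is flushed. */
    void add(String table, String row) {
        List<String> buffer = buffers.get(table);
        if (buffer == null) {
            throw new IllegalArgumentException("Unknown table: " + table);
        }
        buffer.add(row);
        if (buffer.size() >= threshold) {
            flush();
        }
    }

    /** Writes all buffered rows, parent tables before child tables. */
    void flush() {
        for (Map.Entry<String, List<String>> entry : buffers.entrySet()) {
            for (String row : entry.getValue()) {
                inserted.add(entry.getKey() + ":" + row);
            }
            entry.getValue().clear();
        }
    }

    public static void main(String[] args) {
        OrderedRowBuffer buf = new OrderedRowBuffer(3, "A", "B");
        buf.add("A", "a1");
        buf.add("B", "b1");
        buf.add("B", "b2");
        buf.add("B", "b3"); // B buffer is full: a1 is written before b1..b3
        buf.add("A", "a2");
        buf.add("B", "b4");
        buf.flush();        // write whatever is left at the end of the document
        System.out.println(buf.inserted);
    }
}
```

      The cost of this strategy is that parent buffers are often flushed
      while nearly empty. Flushing only the ancestors of the table whose
      buffer filled would keep batches larger, but it needs the bookkeeping
      described above. With JDBC, flush() might map onto
      PreparedStatement.addBatch and executeBatch, although whether that
      reaches a database's bulk-load path depends on the driver.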

      So while this is a good idea and worth looking at in the future, we
      don't have time to do it now.

      --
      Ronald Bourret
      Programming, Writing, and Training
      XML, Databases, and Schemas
      http://www.rpbourret.com