
Re: [xml-dbms] All in one answer....

  • Ronald Bourret
    Message 1 of 9 , Dec 11, 2000
      Ahhh. We're closer than I thought. I think the last thing we need to do
      is move the dispatch method from the transfer/map engine to the CLI
      class. In practical terms, this means moving the init, action, and
      transfer methods from Xmldbms to Transfer, and the init and action
      methods from Xmldbms to GenerateMap. Thus, assuming Transfer and
      GenerateMap inherit from a ProcessProperties class, they would look
      something like the following. Notice that dispatch is a public method,
      so people (like the GUI) who want to do text-based programming can write
      directly to it without going through a command line.

      public class Transfer extends ProcessProperties {

         public static void main(String[] args) throws Exception {
            // Parse the arguments and generate a Properties object
            // (getProperties is assumed to be a static method inherited
            // from ProcessProperties)
            Properties props = getProperties(args);

            // Dispatch the action
            dispatch(props);
         }

         public static void dispatch(Properties props) throws Exception {
            TransferEngine transferEngine = new TransferEngine();

            // Set up the parser and database
            ...

            // Dispatch the action
            String action = props.getProperty(ACTION);
            if (action.equals(STOREDOCUMENT)) {
               String mapFilename = props.getProperty(MAPFILE);
               String xmlFilename = props.getProperty(XMLFILE);
               int commitMode = convertCommitMode(props.getProperty(COMMITMODE));
               String keyGeneratorClass = props.getProperty(KEYGENERATORCLASS);
               transferEngine.storeDocument(mapFilename, xmlFilename,
                                            commitMode, keyGeneratorClass);
            } else if (action.equals(...)) {
               ...
            } else ... {
               ...
            }
         }
      }

      This will make the CLI classes more complex than they are now. (It
      requires them to know the transfer and map engine APIs.) However, this
      means we have a clean separation between the text-based interface and
      the programmatic interface. Furthermore, since the text-based interface
      is layered on top of the programmatic interface, it makes sense for the
      text-based interface (higher level) to know about the programmatic
      interface (lower level) but not vice versa. Finally, it means we can
      evolve both interfaces separately without too much worry of interference
      between the two.
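
As a rough illustration of the layering described above, here is a minimal, self-contained sketch. It is not the XML-DBMS code: `StubTransferEngine`, `TransferCli`, the `name=value` argument format, and the `MapFile` and `XMLFile` property names are all assumptions made up for this example. Only the idea of a single `Action` property and a public `dispatch` method comes from the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for the programmatic interface (lower level).
// It knows nothing about Properties objects or command lines.
class StubTransferEngine {
    final List<String> calls = new ArrayList<>();

    void storeDocument(String mapFilename, String xmlFilename) {
        calls.add("storeDocument(" + mapFilename + ", " + xmlFilename + ")");
    }
}

// Text-based interface (higher level): turns properties into engine calls.
class TransferCli {
    static final String ACTION = "Action";
    static final String MAPFILE = "MapFile";   // assumed property name
    static final String XMLFILE = "XMLFile";   // assumed property name
    static final String STOREDOCUMENT = "StoreDocument";

    // Assumed argument format for this sketch: name=value pairs.
    static Properties parseArgs(String[] args) {
        Properties props = new Properties();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected name=value: " + arg);
            }
            props.setProperty(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return props;
    }

    // Callable directly by a GUI or servlet, without a command line.
    static void dispatch(Properties props, StubTransferEngine engine) {
        String action = props.getProperty(ACTION);
        if (STOREDOCUMENT.equals(action)) {
            engine.storeDocument(props.getProperty(MAPFILE),
                                 props.getProperty(XMLFILE));
        } else {
            throw new IllegalArgumentException("Unknown action: " + action);
        }
    }

    public static void main(String[] args) {
        StubTransferEngine engine = new StubTransferEngine();
        dispatch(parseArgs(args), engine);
        System.out.println(engine.calls);
    }
}
```

Under these assumptions, the class could be run as `java TransferCli Action=StoreDocument MapFile=orders.map XMLFile=orders.xml`. The engine never sees the `Properties` object, so either layer can change without touching the other.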

      As a first stab at the text-based interface, see what you think of the
      properties in:


      Ignore the discussion of three separate files and the DatabasePropsFile
      and ParserPropsFile properties. The rest is pretty much the same as
      what's in textvalues.txt, except for: (1) renaming, and (2)
      consolidation of the action, t_status, and t_direction properties into a
      single Action property.

      Does this give people too many options? For example, should we remove
      commit mode and keygeneratorclass, infer the schema type from the file
      extension, and merge retrieveDocumentByKey and retrieveDocumentByKeys?
      On the one hand, this is supposed to be a simple API. On the other hand,
      people probably want control.

      You can also ignore the comment that these properties simply reflect the
      underlying API. Although that statement might be true now, I don't think
      it will be true in the future. In particular, the text-based API should
      have properties that make sense for execution in a language-independent,
      disconnected, probably stateless environment. The programmatic API
      should have methods that make sense for execution in a Java-based,
      connected, state-maintaining environment.

      For the moment, you can use the following as the transfer and map engine
      APIs, but we definitely need to take another look at these and see what
      makes sense for the future:


      Other comments below.

      adam flinton wrote:

      > Done Deal. I'll get that done ASAP. To which end could you cast your eyes
      > over the textvalues.txt & (a) Add anything which is missing (b) take out
      > anything unnecessary (c) check the names of the Key values e.g. XMLDocument
      > or Map or whatever.

      See comments above.

      > > The specific case I am thinking about is when a Web application calls
      > > the transfer engine to get an XML document. Currently, our
      > > API/property
      > > set only allows you to write the document to disk. This is inefficient
      > > and we should be able to stream the document directly back to the
      > > application as XML.
      > This is very true & is something which I've given some thought to (esp re
      > servlets (I am playing with using servlets for messaging.....no one ever
      > said that servlets need to produce / accept just HTML or indeed that their
      > output needs to be "visible")). It is almost the same as where do you get
      > the file from / put it to. E.g. let's imagine that you want to send the
      > resulting doc somewhere via http put. My initial answer (& it remains the
      > same right now) is that it simply means adding stuff to the XMLwriting
      > methods (or possibly even moving the file read / write out to a separate
      > class as in writeFile(File,location) sort of thing).

      Let's leave this alone for the moment, get the architecture in place,
      possibly do a beta release, and then take another look at this before
      final release. I can't help but think there's a reasonable solution to
      this in the text-based case. Perhaps the Transfer.dispatch method can
      return an Object?
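
One way to picture the "return an Object" idea is the sketch below. It is only an illustration under stated assumptions: `StreamingTransfer`, the `RetrieveDocumentByKey` and `StoreDocument` action values, the `Key` property, and the stubbed `retrieveDocument` helper are hypothetical and do not come from the XML-DBMS code.

```java
import java.util.Properties;

// Hypothetical sketch: dispatch hands the result back to the caller
// instead of always writing it to disk.
class StreamingTransfer {
    static final String ACTION = "Action";
    static final String KEY = "Key";                                     // assumed
    static final String RETRIEVEDOCUMENTBYKEY = "RetrieveDocumentByKey"; // assumed
    static final String STOREDOCUMENT = "StoreDocument";                 // assumed

    // Returns the retrieved document for retrieve actions and null for
    // actions that produce no result.
    static Object dispatch(Properties props) {
        String action = props.getProperty(ACTION);
        if (RETRIEVEDOCUMENTBYKEY.equals(action)) {
            return retrieveDocument(props.getProperty(KEY));
        } else if (STOREDOCUMENT.equals(action)) {
            // Storing is omitted in this sketch; there is nothing to return.
            return null;
        }
        throw new IllegalArgumentException("Unknown action: " + action);
    }

    // Stub standing in for a call to the transfer engine.
    static String retrieveDocument(String key) {
        return "<order id=\"" + key + "\"/>";
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(ACTION, RETRIEVEDOCUMENTBYKEY);
        props.setProperty(KEY, "42");
        // A servlet could write this value straight to its response.
        System.out.println(dispatch(props));
    }
}
```

The `Object` return type keeps the text-based signature stable while leaving the caller to decide whether the result goes to a file, a servlet response, or somewhere else.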

      > I'll have a look round....My only problems with XML DB'es per se are :
      > A) Most of the world's data is & will remain in SQL table structures (i.e
      > Relational not tree based)
      > B) A number of very good tree based DB'es exist such as Cache which have
      > been built (& optimised) over many years & in essence the 2 are the same
      > thing.

      Note that this is an "XML-based API to databases", not an "API to XML
      databases". That is, just as ODBC/JDBC is based on the relational model,
      this API is based on XML. And just as you can implement ODBC/JDBC over
      non-relational data by mapping that data to the relational model, you
      can implement this API over relational data by mapping the relational
      data to XML. (Presumably using something like mapping documents in
      XML-DBMS, DAD in DB2, or annotated schemas in SQL Server.)

      The goal of the API is to make all databases that support XML look the
      same, regardless of whether the underlying storage is native,
      relational, object-oriented, hierarchical, or whatever else.

      > Yup. The feature set is very simple to set out:
      > 1) Mapping / Design:
      > 1.1) Build a DB structure from an XML/ tree structure.
      > 1.2) Build XML from a DB / table based structure
      > 2) Operation:
      > 2.1) Transfer information as fast as possible from XML to an SQL DB
      > 2.2) Transfer information as fast as possible from an SQL DB to XML.

      This is a good summary and worth remembering.

      > Let's be honest....if one were a java programmer then one
      > could sidestep both transfer & GenerateMap & call DOMtoDBMS etc. yourself
      > passing in structures which you'd created yourself. Equally you could build
      > your own transfer engine etc.

      Agreed.


      > That's not the person I've been aiming @. I've
      > been aiming @ the Oracle/DB2/SQLServer/Sybase etc.etc DBA or the guy who
      > wants to get an answer in XML.

      Also agreed. I think what took so long to get through my head is that
      the text-based API is the simplest API and is separate from the lower
      level APIs, which give more control to people who want it.

      > RMI probs include non Java apps, firewalling.
      > JMS CORBA SOAP would all carry properties files as @ the end of the day they
      > carry text files & Properties files are just that. You could add servlets +
      > any other dynamic http protocol.

      OK. Let's set this aside for the moment. We've got enough to do...

      > I've been investigating the enhydra schemamapper class for use with
      > generating class'es / objects such that I can have a GUI app which accepts /
      > gets sent an XML doc & can then load the relevant class to deal with / map
      > to the xml doc. In essence XML per se is useless, unless something is done
      > with it (whether in a GUI or a servlet or whatever). So building / using
      > something which allows my developers to easily do something with the
      > resultant XML (& indeed provide XML for use by XMLDBMS) is also important.
      > Then it struck me (as things do when it's late & I'm tired) that in many
      > ways the org.xmlmiddleware kinda covered this too.
      > i.e one "action" might well be to produce the relevant class'es to deal with
      > the XML docs produced according to the schema (or indeed to create a new XML
      > doc) such that you have an SQLDB. You produce the SQL structure you wish to
      > have mapped. This results in a map file & a schema. What then?
      > Wellllllll........run that schema through with "action=produceclasses" (or
      > something similar) & voila you have something which your servlet / GUI
      > developers can then use. The thought was triggered partly by my own needs &
      > partly as we may well (you mentioned it sometime back) use the schemamapper
      > any way & this would allow the use of the same code infrastructure (e.g. the
      > abstraction of the parsers etc). It would also tie in with moving transfer,
      > genmap etc into separate classes as all I would be doing would be to add a
      > "genJava" class....

      I've thought of this, too, and it's what's behind Castor, Bluestone,
      Informix's Object Translator, Sun's Project Adelard, and probably some
      other things I'm not aware of. Personally, I think this is where things
      will go in the future. Let's face it, transferring data between an XML
      document and a database is not nearly as interesting as having an
      intermediate object that you can use to manipulate that data.

      As for XML-DBMS' involvement in this sort of thing, I've kept clear of
      it for two reasons. First, there are enough interesting problems in the
      straight XML <=> DBMS world to keep me busy for a long time. Second, a
      bunch of other people are already doing this, so I see little point in
      duplicating other people's work, especially when some of that work is
      Open Source.

      That said, I was planning to keep it in mind when designing the map
      factory for XML schemas, which could form the basis for this sort of
      thing.

      (By the way, last time I looked at the schemamapper class in Enhydra, it
      was woefully underpowered. That is, it supported just a tiny fragment of
      what schemas can do. I assume it will evolve as time goes on, but at the
      moment, it doesn't do us much good.)

      Ronald Bourret
      Programming, Writing, and Training
      XML, Databases, and Schemas
    • meyappan@yahoo.com
      Message 2 of 9 , Apr 9, 2001

        I am just wondering: if we have flat XML, that is, XML with no nested
        relationships, is it feasible to bulk load the XML data into Oracle
        using direct path load?


        --- In xml-dbms@y..., Ronald Bourret <rpbourret@r...> wrote:
        > Pareena Shah wrote:
        > >
        > > Question for the people thinking about the new version of XML DBMS:
        > > What do you think about using something like sqlloader to bulk load
        > > transformed XML data into an Oracle database? If I have a situation
        > > where I am going to be processing large volumes of XML data into an
        > > Oracle database, and I want to optimize by buffering rows, and using
        > > Oracle's direct path load functionality, is sql loader the best way?
        > > Could you comment on advantages/disadvantages?
        >
        > This is an interesting idea, although it won't be included in the
        > release due to lack of time. (It would require completely rewriting
        > the DOMToDBMS and DBMSToDOM classes.)
        >
        > The following discussion is not specific to Oracle's bulk loader, but
        > discusses how XML-DBMS might do bulk inserts in the future. This assumes
        > such updates are possible using JDBC, and it is not clear to me that
        > they are.
        >
        > The challenge is this. Suppose we have an XML document that looks like
        > the following:
        >
        >    <A>
        >       <A1>...</A1>
        >       <A2>...</A2>
        >       <A3>...</A3>
        >       <A4>...</A4>
        >       <B>
        >          <B1>...</B1>
        >          <B2>...</B2>
        >          <B3>...</B3>
        >       </B>
        >    </A>
        >
        > and that this document was mapped to tables A (columns A1-A4) and B
        > (columns B1-B3) as expected, with the primary key in table A. Now
        > suppose you have a whole lot of these structures in a single XML
        > document:
        >
        >    <root>
        >       <A>
        >          <A1>...</A1>
        >          <A2>...</A2>
        >          <A3>...</A3>
        >          <A4>...</A4>
        >          <B>
        >             <B1>...</B1>
        >             <B2>...</B2>
        >             <B3>...</B3>
        >          </B>
        >       </A>
        >       ...
        >       <A>
        >          <A1>...</A1>
        >          <A2>...</A2>
        >          <A3>...</A3>
        >          <A4>...</A4>
        >          <B>
        >             <B1>...</B1>
        >             <B2>...</B2>
        >             <B3>...</B3>
        >          </B>
        >       </A>
        >    </root>
        >
        > Currently, what the code does is insert the row for the first A, then
        > the row for the first B, then the row for the second A, then the row
        > for the second B, and so on.
        >
        > To use bulk loading, the code would need to buffer rows for A and rows
        > for B, then insert them when there are a certain number of rows in the
        > buffer -- say 100. While this probably wouldn't be too bad in this
        > case, it could get very complicated in the general case.
        >
        > For example, imagine there can be an arbitrary number of B children for
        > each A parent. Thus, the buffer for B rows would fill up before the
        > buffer for A rows. However, the code has to be careful about when it
        > inserts rows. That is, it can't just wait until the buffer for B rows is
        > full and then just insert them. Because of referential integrity, it has
        > to insert the A rows before the B rows, so you need to coordinate when
        > the buffers are emptied. Now, imagine doing this for an XML document
        > that is nested arbitrarily deep and you'll see that the code is
        > non-trivial.
        >
        > So while this is a good idea and worth looking at in the future, we
        > don't have time to do it now.
        > --
        > Ronald Bourret
        > Programming, Writing, and Training
        > XML, Databases, and Schemas
        > http://www.rpbourret.com
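
The buffer coordination problem in the quoted message can be sketched with one simple policy: keep one buffer per table, listed parent first, and whenever any buffer fills up, flush all of them in that order. This is only an illustration, not XML-DBMS code. `RowSink`, `OrderedRowBuffer`, and `BufferDemo` are hypothetical names, rows are plain strings, and the sketch assumes each parent row is added before its child rows and that the parent-first table order is known in advance (for example, from the map).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sink for flushed rows; a real one might issue batched inserts.
interface RowSink {
    void insertAll(String table, List<String> rows);
}

// Buffers rows per table and always flushes parent tables before children.
class OrderedRowBuffer {
    private final Map<String, List<String>> buffers = new LinkedHashMap<>();
    private final int limit;
    private final RowSink sink;

    // tablesParentFirst: table names ordered so each parent precedes its children.
    OrderedRowBuffer(List<String> tablesParentFirst, int limit, RowSink sink) {
        for (String table : tablesParentFirst) {
            buffers.put(table, new ArrayList<>());
        }
        this.limit = limit;
        this.sink = sink;
    }

    void add(String table, String row) {
        List<String> rows = buffers.get(table);
        if (rows == null) {
            throw new IllegalArgumentException("Unknown table: " + table);
        }
        rows.add(row);
        if (rows.size() >= limit) {
            flush();
        }
    }

    // Flushes every non-empty buffer in parent-first order, so a child row
    // is never inserted before the parent row it references.
    void flush() {
        for (Map.Entry<String, List<String>> e : buffers.entrySet()) {
            if (!e.getValue().isEmpty()) {
                sink.insertAll(e.getKey(), new ArrayList<>(e.getValue()));
                e.getValue().clear();
            }
        }
    }
}

class BufferDemo {
    public static void main(String[] args) {
        OrderedRowBuffer buffer = new OrderedRowBuffer(Arrays.asList("A", "B"), 3,
                (table, rows) -> System.out.println(table + " <- " + rows));
        buffer.add("A", "a1");
        buffer.add("B", "b1");
        buffer.add("B", "b2");
        buffer.add("B", "b3"); // B is full: A is flushed first, then B
        buffer.flush();        // flush whatever remains at end of document
    }
}
```

Flushing every buffer together is the bluntest way to keep referential integrity, and it gives up some batching on the parent tables. A finer-grained policy would flush only the ancestors of the full table, which is where the complexity described in the quoted message comes in.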
      • Ronald Bourret
        Message 3 of 9 , Apr 12, 2001
          meyappan@... wrote:

          > I am just wondering if we have a flat xml that is with no nested
          > relationship, Is it feasible to do bulk loading of xml data into
          > oracle using direct path load.

          Is "direct path load" a feature of Oracle? If so, XML-DBMS does not use
          it.

          By "flat XML" I assume you mean something like the following:


          If this is the case, XML-DBMS is probably overkill, as it is designed
          especially to work with nested XML. If you want to use a
          database-specific bulk-load utility, you can probably write your own
          code to do this fairly easily. Such code would presumably use ODBC
          (which supports bulk loads) or Oracle's own API.

          I've attached a rough example of what a SAX version of this code would
          look like at the end of this message -- you would need to modify it for
          bulk loading. (I have a vague feeling there is a state error somewhere
          in this code, but haven't ever run it so I'm not sure.)

          -- Ron

          The code to transfer data from XML to the database follows a common
          pattern, regardless of whether it uses SAX or DOM:

          1. Table element start: prepare an INSERT statement
          2. Row element start: clear INSERT statement parameters
          3. Column elements: buffer PCDATA and set INSERT statement parameters
          4. Row element end: execute INSERT statement
          5. Table element end: close INSERT statement

          The code does not make any assumptions about the names of the tags. In
          fact, it uses the name of the table-level tag to build the
          INSERT statement and the names of the column-level tags to identify
          parameters in the INSERT statement. Thus, these names
          could correspond exactly to the names in the database or could be mapped
          to names in the database using a configuration file.

          Here is the code using SAX:

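          // Where the parser currently is: outside the table element, or
          // inside a table, row, or column element.
          int state = OUTSIDE;
          PreparedStatement stmt;
          StringBuffer data;

          public void startElement(String uri, String name, String qName,
                                   Attributes attr) throws SAXException {
             try {
                if (state == OUTSIDE) {
                   // Table element start: prepare the INSERT statement
                   stmt = getInsertStmt(name);
                   state = TABLE;
                } else if (state == TABLE) {
                   // Row element start: clear the INSERT parameters
                   stmt.clearParameters();
                   state = ROW;
                } else { // if (state == ROW)
                   // Column element start: start buffering PCDATA
                   data = new StringBuffer();
                   state = COLUMN;
                }
             } catch (SQLException e) {
                throw new SAXException(e);
             }
          }

          public void characters(char[] chars, int start, int length) {
             if (state == COLUMN)
                data.append(chars, start, length);
          }

          public void endElement(String uri, String name, String qName)
                throws SAXException {
             try {
                if (state == COLUMN) {
                   setParameter(stmt, name, data.toString());
                   state = ROW;
                } else if (state == ROW) {
                   stmt.executeUpdate();
                   state = TABLE;
                } else { // if (state == TABLE)
                   stmt.close();
                   state = OUTSIDE;
                }
             } catch (SQLException e) {
                throw new SAXException(e);
             }
          }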
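
For bulk loading, one possible adaptation of the pattern above is to queue each row with JDBC's `PreparedStatement.addBatch()` and send the queue with `executeBatch()` every N rows. The sketch below is a hedged illustration, not part of XML-DBMS. `BatchInsertHandler` is a hypothetical class, the column list is assumed to be known in advance, and whether batching actually reaches a database-specific fast path (such as a direct path load) depends on the driver. It tracks an explicit "outside the table" state so that each start and end tag is matched to the right step.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: loads <table><row><col>value</col>...</row>...</table> in JDBC batches.
class BatchInsertHandler extends DefaultHandler {
    private static final int OUTSIDE = 0, TABLE = 1, ROW = 2, COLUMN = 3;

    private final Connection conn;
    private final List<String> columns; // assumed to be known in advance
    private final int batchSize;
    private int state = OUTSIDE;
    private int pending = 0;
    private PreparedStatement stmt;
    private StringBuilder data;

    BatchInsertHandler(Connection conn, List<String> columns, int batchSize) {
        this.conn = conn;
        this.columns = columns;
        this.batchSize = batchSize;
    }

    // Builds "INSERT INTO t (c1, c2) VALUES (?, ?)". The table name comes from
    // the document, so real code should validate it against known tables.
    static String buildInsert(String table, List<String> columns) {
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            marks.append(i == 0 ? "?" : ", ?");
        }
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + marks + ")";
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attrs) throws SAXException {
        try {
            if (state == OUTSIDE) {            // table element start
                stmt = conn.prepareStatement(buildInsert(qName, columns));
                state = TABLE;
            } else if (state == TABLE) {       // row element start
                stmt.clearParameters();
                state = ROW;
            } else if (state == ROW) {         // column element start
                data = new StringBuilder();
                state = COLUMN;
            } else {
                throw new SAXException("Unexpected nested element: " + qName);
            }
        } catch (SQLException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void characters(char[] chars, int start, int length) {
        if (state == COLUMN) {
            data.append(chars, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        try {
            if (state == COLUMN) {             // column element end
                int index = columns.indexOf(qName) + 1;
                if (index == 0) {
                    throw new SAXException("Unknown column: " + qName);
                }
                stmt.setString(index, data.toString());
                state = ROW;
            } else if (state == ROW) {         // row element end
                stmt.addBatch();
                if (++pending >= batchSize) {
                    stmt.executeBatch();
                    pending = 0;
                }
                state = TABLE;
            } else if (state == TABLE) {       // table element end
                if (pending > 0) {
                    stmt.executeBatch();
                    pending = 0;
                }
                stmt.close();
                state = OUTSIDE;
            }
        } catch (SQLException e) {
            throw new SAXException(e);
        }
    }
}
```

To use it, pass an instance to a SAX parser, for example `SAXParserFactory.newInstance().newSAXParser().parse(file, handler)`, with a `Connection` obtained from the driver of your choice. The sketch uses `qName` because a parser that is not namespace-aware may report an empty local name.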