
All in one answer....

  • Adam Flinton
    Message 1 of 9 , Dec 1, 2000
      Dear Ron,

      Gee thanks this kept me up until roughly 2 -2.30 AM.........lots of blue
      biro on the print outs......<G>.

      OK So in order:

      "API v. properties
      Proposed architecture
      Proposed GeneratePropertyFile command line
      Proposed transfer engine API
      Proposed transfer engine command line
      Proposed GenerateMap API
      Proposed GenerateMap command line"


      Some initial points which cover a large number of questions:

      1) Fundamentally there are 2 sorts of API

      a) The Command Line interface (Cross platform)
      b) Public Methods exposed for use by Java developers

      2) Properties objects / files seem to be the simplest ML imaginable. In Java
      everything is an object. In Unix (& in reality in most OSes) everything is a
      file. A Properties file bridges the gap. It is both an object & a file & it
      is not just any object..... it resolves to a very useful hashtable with a
      built-in write-to-file mechanism. What is more, properties files can be
      merged with both ease & speed to deliver a single Properties object, & a
      propfile could be written out by nearly any & every programming language
      possible.

      Think of them as (quite literally) equivalents of config.sys or
      autoexec.bat.
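
A minimal sketch of the merge-and-write behaviour described above, assuming a modern JDK (the Reader/Writer overloads of load and store did not exist in 2000). The class name PropMergeSketch, the "dbms" key and the sample values are illustrative only and are not part of XML-DBMS.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropMergeSketch {
    /** Parses key=value lines exactly as Properties.load would read a propfile. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Merges property sets left to right; a later value replaces an earlier one. */
    static Properties merge(Properties... sources) {
        Properties merged = new Properties();
        for (Properties source : sources) {
            merged.putAll(source);
        }
        return merged;
    }

    /** Writes a Properties object back out in propfile format. */
    static String toText(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, "merged propfile");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical "generic" and "personal" propfiles.
        Properties generic = parse("parser=Xerces\ndbms=Oracle\n");
        Properties personal = parse("user=adam\ndbms=DB2\n");
        System.out.print(toText(merge(generic, personal)));
    }
}
```

Because Properties extends Hashtable, the merge is just putAll, which is why the order of the sources decides which duplicate wins.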

      > Given the use cases, I think we need two different APIs to the transfer
      > engine and GenerateMap. The first is a normal API for programmatic
      > access. The second is a command line API for batch access. (What do
      > stored procedures use?)
      >

      I reckon the cmd line should be used simply to create a propfile / amend a
      propfile.

      > Both the API and the command line can take two forms:
      >
      > 1) Individual methods or command line arguments. For example:
      >
      > storeDocument(mapfile, xmldoc, commitMode, keyGeneratorClassName);
      >

      This is definitely possible for a Java Programmer anyway supposing
      storeDocument is a public method.

      > or:
      >
      > java Transfer mymap.map mydoc.xml afterinsert KeyGeneratorImpl
      >

      Look @ the initial string handling costs. I went down this route initially
      however.........discoveries:

      Let's imagine you want to pass in a new variable e.g. CommitMode. If the CLI
      is done via specific keywords or in a specific order then the initial string
      args[] handling methods would have to be amended etc every time a change was
      made. Why? As it is, all you need to do @ the mo is put in CommitMode=auto or
      whatever.
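
As a rough sketch of why key=value arguments need no parser changes when a new setting appears, the following turns an args[] array into a Properties object. It is an assumption about how such a front end might look, not the actual XML-DBMS code; ArgPropsSketch and fromArgs are made-up names.

```java
import java.util.Properties;

final class ArgPropsSketch {
    /**
     * Converts arguments of the form key=value into a Properties object.
     * Only the first '=' splits the pair, so values may contain '=' themselves.
     */
    static Properties fromArgs(String[] args) {
        Properties props = new Properties();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + arg);
            }
            props.setProperty(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return props;
    }

    public static void main(String[] args) {
        // A new setting such as CommitMode is picked up without touching fromArgs.
        Properties props = fromArgs(new String[] {
            "user=adam", "action=transfer", "CommitMode=auto"
        });
        System.out.println(props.getProperty("CommitMode"));
    }
}
```

The code that later needs CommitMode simply asks for it; nothing in the argument handling has to know the name in advance.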

      Simply gen'ing a prop file & then having methods extract the info from the
      propfile as & when required is extremely useful as it gives you an instant
      lookup facility where the lookup table is instantly writable. As an example
      if using X-D as an EJB one might have propfiles instantiated as Entity Beans
      for reference by lots of X-D processes. The propfile itself is
      subdivisible.....e.g. let us imagine a situation where 90% of any propfile
      is "generic" to either the comp or the app (e.g. "We are going to std'ize on
      Xerces1.1.2 & Oracle"). Equally the JDBC address of the DB might require a
      resolution via LDAP or whatever. As such you might only need to provide
      username & password. This to a degree is what my ideas on "naming"
      propfiles are predicated upon.

      > 2) A properties file. For example:
      >
      > execute(props);
      >
      > or:
      >
      > java Transfer myprops.prop
      >

      Nice & simple eh?

      I really want to retain the ability to feed in a propfile as a single string
      if required. e.g. user=adam password=dobbin action=transfer
      parser=de.tudarmstadt.ito.domutils.Trans_Xerces
      nq=de.tudarmstadt.ito.domutils.NQ_DOM2 etc.

      I also want to be able to pass in multiple Propfiles within that string.

      Part of the reason has to do with Servlets & their love of long single
      strings.

      > Currently, choice (2) is implemented for both the API (on XML-DBMS) and
      > the command line. I strongly prefer choice (1) for the API. This is
      > because the code is easier to read (and thus easier to learn and
      > maintain) and because the code is a bit faster. However, as you will see
      > in the proposed APIs, what I'm suggesting is a bit of a mix --
      > properties for the database and parser and methods for actions.
      >

      Choice 1 gives you a much larger & more unwieldy initial string / arg
      handling main method / group of methods. I am not sure that the code would
      be that much faster in the long run. If all you are providing is a propfile
      ref (either as a file from disk or from some sort of centrally held "in
      memory" hashtable) then how much more rapidly you could get to actually
      churning XML<>SQL is fairly debatable. In terms of Java's speed (esp with
      JIT & Hotspot) the main drag is object creation & disposal. With a
      Properties object you build it once & then simply quiz it. If you're
      interested in "readable code" (I too am a simple person who likes simple
      things) then it allows you to dispense with creating tons of string & other
      objects until you actually need them. It also allows you to centralise
      various IO functions (such as finding out if the file exists & if not what
      to do about it & if so....). As an example I would prefer not to pass a
      string filename into the various methods below such as DOM2DBMS but instead
      simply a file object which has already been opened & checked.

      > For the command line, I initially leaned towards (1) but, after some
      > thought, figured we could support a variation of (2) as well. (I want
      > separate files for database info, parser info, and actions, but the
      > actions file can point to the other two.) This allows people who use the
      > command line (generally non-programmers) a degree of flexibility in how
      > they set things up.
      >

      You can have as many property files as you wish. @ the end of the process
      you should get a single propfile out (if you want it writing out).

      Look on any propfiles you pass in as templates. If you want
      "^xerces122prop.txt^db2AdamsDBprop.txt^otherstuff.text" that's entirely fine.

      Proposed Arch:
      "

      GeneratePropertyFiles


      TransferCMDLine        GUI        GenerateMapCMDLine
                 \         /     \         /
                   \     /         \     /
               TransferEngine    GenerateMap
                      |               |
               DOMToDBMS, etc.  map factories

      Notice three things:

      1) GeneratePropertyFiles is a separate, utility process and does not
      call the transfer engine, etc. For more info, see email Proposed
      GeneratePropertyFile command line."

      I have no prob with that per se however I would like to retain the ability to
      have GeneratePropFiles (BTW we need another name for that as GPF has bad
      connotations for anyone who has used Windows.....) execute either GenMap or
      Transfer.

      "2) The transfer engine and GenerateMap are separate processes and do not
      contain their command line interfaces. This is particularly important in
      the case of the transfer engine, which may some day evolve into a
      standalone server-type process."

      You're right.

      I have been giving mucho thought to this.

      Questions

      (1) How would you want to communicate with such a server engine? IMHO it
      must be done via std file / string passing mechanisms (e.g. http/servlets,
      text messaging (JMS/SOAP/MQSeries/MSMQ)). The last thing we need is to
      require some port other than 80/8080. A properties file is a text file.
      Remember that one could pass the propfile to a servlet / EJB & then it could
      use that as a Session bean (i.e. specific to you but maintained as long as
      you're hooked up to the App server) which in itself is again writable.

      (2) The GenMap class is something that in production systems would
      (hopefully) only be used by the "back-office guys" & the maps would be
      backed up & possibly even stored within a DBMS itself. The last thing you'd
      want is for someone to overwrite a map file without knowing what they're
      doing.

      "3) I've drawn the GUI as calling the transfer engine / generate map APIs
      directly. This is intentional, but an early implementation can go
      through the command line classes if that's easier for now."


      I would see more as

                            GUI
                             |
                          CMDLine
                             |
                   GeneratePropertyFiles
                     |                |
              TransferEngine     GenerateMap
                     |                |
              DOMToDBMS, etc.   map factories


      Note if the transfer engine (or even DOMtoDBMS) is done as a set of public
      classes & public methods then someone could always call them directly
      (assuming that mapfilenames etc were passed in as strings)

      No probs with that.

      Subject: Proposed GeneratePropertyFile command line

      > In the proposed architecture, GenProp is a utility for generating
      > property files. We need this because property files do not appear to be
      > hand-editable. In particular, the things you saw being escaped on your
      > system (Linux?) appear to be different from what I saw being escaped on
      > my system (Win95).
      >

      Actually they are hand-editable. I built the initial files which I passed
      on to you just using notepad. I have been doing the dev work on Linux & Win98
      & Win NT.

      There are some minor points: use \\ instead of \, e.g. d:\myfile is best done
      as d:\\myfile

      xmlfileout=D:\\Move\\xd\\t54.xml

      But that's about it.
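
A small sketch of what the doubled backslash buys you, assuming the file is read with java.util.Properties.load. The class and method names are made up for the illustration; the escaping behaviour itself is that of the standard Properties format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class BackslashEscapeSketch {
    /** Loads one propfile line and returns the value stored under the given key. */
    static String loadValue(String propfileText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propfileText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Java source doubles every backslash again, so "\\\\" below is the
        // two characters \\ as they would be typed into the propfile.
        String escaped = "xmlfileout=D:\\\\Move\\\\xd\\\\t54.xml";
        // The same line typed with single backslashes in the propfile.
        String unescaped = "xmlfileout=D:\\Move\\xd\\t54.xml";

        // Doubled backslashes survive loading as single backslashes.
        System.out.println(loadValue(escaped, "xmlfileout"));
        // Single backslashes are treated as escapes: \M and \x lose the
        // backslash and \t becomes a tab, so the path is corrupted.
        System.out.println(loadValue(unescaped, "xmlfileout"));
    }
}
```

Forward slashes are another way round the problem on Windows, since they need no escaping in a propfile.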

      > Note that GeneratePropertyFile will not call Xmldbms/Transfer/etc. That
      > is, running stuff from the command line is a two-step process. You call
      > GeneratePropertyFile to generate one or more properties files, then you
      > actually call the transfer engine or GenerateMap. This is in line with
      > the notion that people in general will generate properties once and make
      > multiple calls (such as in a batch file) to the Transfer engine with a
      > fixed property file. If they're just playing, they'll use the GUI.
      >

      GPF should be able to call Xmldbms/Transfer/etc. At the moment whether it
      does or not is toggle-able ....

      This is if anything to make it easy for people to play around by both
      generating the property file & having it do something that they can then
      admire....<G>. However in a production system you should be able to load up
      a ready-to-go properties file, amend it with say user info etc & thus go
      straight into transfer.


      > I suggest the following:
      >
      > GeneratePropertyFile output-filename [-f property-filename | -p
      > property-value-pair] ...
      >
      > Properties are processed in the order encountered (left to right) with
      > each duplicated property overriding its predecessor.
      >

      What's the need?

      IMHO reasonable aims should be

      A) To keep it as simple as possible
      B) To keep the entry points as text based as possible.


      Subject: Proposed transfer engine API


      > I took a look at the capabilities of DBMSToDOM and DOMToDBMS. These are
      > pretty much exposed by Transfer/TransferResultSet today, the only
      > exception being that you can't currently specify the commit mode and
      > KeyGenerator class. A suggested API is therefore:
      >

      A Perfect example of when & why to use the properties / value pairs
      approach. Write it once & then you have an adaptable CLI without changing a
      damn thing.

      > public void setDatabaseProperties(Properties props);
      > public void setDatabaseProperties(String propFilename);
      > // user and password can also be set as properties
      > public void setUserInfo(String userName, String password);
      >
      > public void setParserProperties(Properties props);
      > public void setParserProperties(String propFilename);
      >


      At the end of the day all these are strings. This is exactly where I got to
      b4 I decided that it would get to be unreadable if done properly (i.e. every
      field being private but with associated get & set public methods).

      You simply don't need them as everything in the above list would be required,
      so fetching them out of a hashtable/properties object would (IMHO) probably
      be faster as you don't need to go round creating string objects, filling
      them, using the results & then disposing of them.

      > public void storeDocument(String mapFilename,
      > String xmlFilename,
      > int commitMode,
      > String keyGeneratorClassName);
      >
      > public void retrieveDocument(String mapFilename,
      > String xmlFilename,
      > String sqlStatement);
      > public void retrieveDocument(String mapFilename,
      > String xmlFilename,
      > String tableName,
      > Object[] keyValues);
      > public void retrieveDocument(String mapFilename,
      > String xmlFilename,
      > String[] tableNames,
      > Object[][] keyValues);
      >
      > There are several problems with this API:
      >

      They seem fine to me.

      > 1) It assumes that the transfer engine is in the same process space as
      > the calling application. For the moment, this is OK with me, as I think
      > we've got enough work to do without worrying about interprocess
      > communication. In the future, the transfer engine might reasonably run
      > as some sort of server process.
      >

      If you wanted to deal with transfer as a piece of code built into your code
      (Transfer x = new Transfer(); sort of thing) then it would indeed be in the
      same process space (I suppose you could go down the RPC/RMI kind route but
      why bother?) In theory what we're going to be doing is producing / consuming
      XML docs. As a result a "client" might simply request XML docs from a http
      server & return them via http or might even request a certain doc via a
      single string (e.g. to a servlet). No reason for the client to interact with
      the server engine except via text (file or string).


      > 2) It does not reuse information (notably maps). This could be solved by
      > removing the mapFilename argument from the existing methods and having a
      > setMap method:
      >

      Oddly enough I have thought about this (notably by considering whether
      mapfiles etc should be resolved & loaded as a file once & then passed in as
      a file, not as a string which must then be resolved to a file, found,
      loaded & then acted upon).

      > public void setMap(String mapFilename);
      >

      I was thinking more of

      public File setMap(String mapFilename);

      > One question this raises is how an application uses multiple maps --
      > that is, how to avoid recompiling maps just because you use map A, then
      > map B, then A again, etc. One possibility is to return a map ID/handle
      > that is then passed in to the store/retrieve methods. We would also then
      > need a releaseMap method.
      >
      Yup. I know it may sound harsh....but we could always construct a hashmap &
      fill it with different mapfiles....
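
The hashmap idea could be sketched roughly as below: compiled maps are cached by file name, so using map A, then B, then A again compiles each only once. This is only an illustration under assumptions. MapCacheSketch is a made-up class, the compiler function stands in for whatever XML-DBMS uses to compile a map file, and releaseMap mirrors the method name floated in the quoted text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class MapCacheSketch<M> {
    private final Map<String, M> compiled = new HashMap<>();
    private final Function<String, M> compiler; // hypothetical map compiler
    private int compileCount = 0;

    MapCacheSketch(Function<String, M> compiler) {
        this.compiler = compiler;
    }

    /** Returns the compiled map for a file name, compiling it on first use only. */
    M getMap(String mapFilename) {
        M map = compiled.get(mapFilename);
        if (map == null) {
            map = compiler.apply(mapFilename);
            compileCount++;
            compiled.put(mapFilename, map);
        }
        return map;
    }

    /** Drops a map from the cache; returns false if it was not cached. */
    boolean releaseMap(String mapFilename) {
        return compiled.remove(mapFilename) != null;
    }

    /** Number of times the compiler has actually been invoked. */
    int compileCount() {
        return compileCount;
    }

    public static void main(String[] args) {
        // The "compiled map" here is just a string; a real one would be an object.
        MapCacheSketch<String> cache = new MapCacheSketch<>(name -> "compiled:" + name);
        cache.getMap("a.map");
        cache.getMap("b.map");
        cache.getMap("a.map"); // served from the cache, no recompilation
        System.out.println(cache.compileCount());
    }
}
```

The file name doubles as the handle here; a server process shared between threads would need a thread-safe map instead of a plain HashMap.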

      > 3) It eliminates the table option (pass a table name and get SELECT *
      > FROM TABLE). This is because the signature of this method would match
      > the signature of the retrieveDocument(SQL) method. This doesn't bother
      > me much, as I can't imagine anybody using it in a production scenario.
      > If we want it on the GUI and the command line for purposes of people
      > playing, we can easily add it.
      >

      Ditto. I can only really imagine it being of use in the GUI.

      > 4) It probably exceeds the current capabilities of the KeyGenerator
      > interface. This is because there is no way to know what method to call
      > (if any) to initialize the key generator. (Currently, this is not a
      > problem because the calling application does the initialization and
      > DOMToDBMS just makes calls to get keys.) Given that Nick Semenov is the
      > only person who really seems to have even noticed this interface, I'm
      > willing to let it slide for now.
      >

      Hunky dory. In most production systems I suspect that key generation might
      be left to the DBMS anyway.



      Subject: Proposed transfer engine command line:


      > Here is a proposed command line parallel to the transfer engine
      > interface. It is not clear to me if this is in a separate class or in
      > the transfer engine class. My guess is separate, simply because this
      > will simplify things later if we decide to make the transfer engine into
      > some sort of standalone server process.
      >
      > Transfer properties-file [-u userName] [-p password]
      >
      > -OR-
      >

      No real need. I can certainly add this if required; however chucking the
      various vals into the propfile, as in Transfer properties-file user=adam
      password=dobbin, would also then allow you to add key1=128 key2=1234 etc.

      > Transfer database-property-file parser-property-file map-file xml-file
      > [-u userName] [-p password]
      > {-toxml [-c commitMode] [-k keyGeneratorClassName] |
      > -todbms -t tableName -k keyValues [-t tableName -k keyValues ...]
      > |
      > -todbms -s selectStatement}
      >

      Urrrggggg.

      > I originally only had the second syntax, but it has the obvious problem
      > of length (although it has the advantage of readability).
      >

      I am not so sure (really). user=adam is IMHO more naturally readable than -u
      adam

      > What I didn't like about a single properties file is that it mixes
      > together things that are unrelated. That is, parser properties are
      > unrelated to database properties are unrelated to actions. This means
      > that things like parser properties and database properties get
      > duplicated in all files, which I view as a very bad thing, since it
      > means they can't be changed easily.
      >

      That's not so.

      1) In order to work you need both a parser & a DB so both are related to
      actions
      2) In terms of duplication......(a) the Properties object needs to be created
      only once & can then be quizzed for vals. The alternative of creating lots of
      strings etc would IMHO be less efficient in terms of object creation &
      destruction. (b) They can be changed easily by simply loading in another
      value with that key. Again as an example imagine if in the DB properties
      part you simply put in ldap://MyDB & someone had written a little LDAP
      module which went off with the user info & pw & quizzed an LDAP server & as a
      result filled the jdbcurl & jdbcDriver vals (possibly not the driver, unless a
      classloader was in use, but almost certainly the jdbcUrl).

      At the moment that would be fairly simple to implement (I nearly did so just
      for fun but I only have a LDAP server @ home @ the mo.)


      > The solution to this is to have a single (action) properties file that
      > has
      > pointers to parser properties and database properties files. This I
      > could easily live with and it would just be an option -- either you pass
      > in a properties file or you pass in the whole shebang above.
      >

      More propfiles = more room for error + more IO activity + more processing.

      You can have as many propfiles as you wish (or as few).

      > In either case, you don't get the option to override properties with a
      > random property=value pairs. To my mind, this is simply too much -- at
      > some point, flexibility is just giving people a rope to hang themselves
      > with.
      >

      a) Only property names which are being looked for will have any effect e.g.
      in testing I used splag=splurg & other nonsense pairings.
      b) Again consider it in the same light as a config.sys. It isn't so much
      about avoiding -t or whatever as being able to send it a text file which is
      all it needs.
      c) It gives us massive adaptability. E.g. you have now got a CommitMode
      value. I would have to do nothing at the CLI level in order to support this;
      you'd simply have to add CommitMode=auto into either the prop file or the
      command line.

      Subject: Proposed GenerateMap API


      > Here's the proposed API for GenerateMap.
      >
      > public void setDatabaseProperties(Properties props);
      > public void setDatabaseProperties(String propFilename);
      > // user and password can also be set as properties
      > public void setUserInfo(String userName, String password);
      >
      > public void setParserProperties(Properties props);
      > public void setParserProperties(String propFilename);
      >

      Not needed.

      > // Type is DTD, DDML, or XML Schema. Generates .sql and .map
      > public void createMap(String filename, int type);
      >
      > // Generates .map and .dtd or .xsd (XML Schema)??
      > public void createMap(String tableName, String outputBasename);
      > public void createMap(String tableNames[], String outputBasename);
      > public void createMap(String sqlStatement, String outputBasename);
      >
      > Questions:
      >
      > 1) Should people be able to specify the name of the output .sql and .map
      > files? Or should we just continue to use the base name plus .sql/.map?
      >

      The way I see it......it should default to basename but you could provide
      the output names if you want (i.e if null then basename else....)
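
The "if null then basename else...." rule might look something like this. It is a sketch only; OutputNameSketch and outputName are hypothetical names, and the .map / .sql extensions are taken from the question being answered.

```java
final class OutputNameSketch {
    /**
     * Returns the caller-supplied output name, or basename plus the default
     * extension when no explicit name was given.
     */
    static String outputName(String explicitName, String basename, String extension) {
        return explicitName != null ? explicitName : basename + extension;
    }

    public static void main(String[] args) {
        System.out.println(outputName(null, "orders", ".map"));         // default name
        System.out.println(outputName("custom.sql", "orders", ".sql")); // explicit name
    }
}
```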

      > 2) The last createMap is illegal -- it has the same signature as the
      > second (tableName). Any ideas?
      >
      An Array can be an array of one
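
To spell the suggestion out: two methods taking (String, String) cannot coexist, but dropping the single-table overload and passing a one-element array keeps the table and SQL variants distinct. The sketch below assumes that reading of the reply; the method bodies are placeholders that only describe the call and are not the GenerateMap implementation.

```java
final class GenerateMapOverloadSketch {
    /** Table variant: one or many tables, always passed as an array. */
    static String createMap(String[] tableNames, String outputBasename) {
        return "tables " + String.join(",", tableNames) + " -> " + outputBasename + ".map";
    }

    /** SQL variant: no longer clashes, because no (String, String) table overload exists. */
    static String createMap(String sqlStatement, String outputBasename) {
        return "sql [" + sqlStatement + "] -> " + outputBasename + ".map";
    }

    public static void main(String[] args) {
        // A single table is simply an array of one.
        System.out.println(createMap(new String[] {"Orders"}, "orders"));
        System.out.println(createMap("SELECT * FROM Orders", "orders"));
    }
}
```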

      > Note that this and the transfer engine might be derivable from a base
      > class that implements the parser and database properties. Note also that
      > I haven't yet considered what exceptions can be thrown.
      >

      Yup.

      It should also be pointed out that as detailed in the textvalues.txt you can
      happily cut the properties file down to just those properties you need.

      Subject: Proposed GenerateMap command line


      > Here's the proposed command line for GenerateMap:
      >
      > GenerateMap properties-file [-u userName] [-p password]
      >
      > -OR-
      >
      > GenerateMap databasePropertyFile parserPropertyFile
      > [-u userName] [-p password]
      > { {-dtd | -ddml | -xsd} schemaFilename |
      > -t table [-t table ...] |
      > -s selectStatement}
      >
      > As is the case with Transfer, the single properties file would contain
      > pointers to the database and parsers property files.
      >

      Again you can have as many propfiles (or as few) as you wish.

      I'll think about it some more however I do feel that sticking to using
      propfiles gives us:

      A) Easily hand-editable / easy to build programmatically text files
      B) A pretty efficient object creation / destruction graph.
      C) Less code / easier to read code.
      D) Easy addition of other properties if & when the time comes.
      E) An easily updateable central store of values which could be easily moved
      towards an entirely in-memory EJB structure where part of the hashtable
      could be "variable" (e.g. user, SQL statement etc) & others might be system
      set (i.e. by an admin person) e.g. xmlparser, dbinfo, output xmlfilename (via a
      rules engine or similar).
      F) An app which could be used equally easily in a client or server, App
      server, Stored Procedure & where requests might come in via many means but
      all transmitting text / text files.

      Try doing some mem tests as I did (but on a very low level scale) between
      creating lots of strings & populating them & loading a propfile & then
      quizzing it. I couldn't find much if any difference.


      Adam
    • Ronald Bourret
      Message 2 of 9 , Dec 1, 2000
        Just a quick note. I think I'm finally understanding what you've been
        getting at with this properties stuff and am starting (I think) to see
        the light.

        Basically, it appears that you are thinking of the transfer engine as a
        server process and using properties as a communication protocol, with
        the nice tie-in that anybody can generate them. I am thinking of the
        transfer engine as an in-process class, demanding an API.

        Since we're moving towards a server process anyway, the
        properties-as-communication protocol makes sense and the API I've been
        talking about just moves down a layer.

        The problem I see with all this is that a properties file is a one-way
        protocol. That is, how do we return things like session IDs, errors, and
        even XML documents to the client?

        I'll give this some more thought and get back to you on Monday, if not
        sooner.

        -- Ron
      • Ronald Bourret
        Message 3 of 9 , Dec 6, 2000
          I've been wracking my brains for the last five days about these issues.
          Although some of my conclusions are changing, some are definitely not.
          I'm sure you're in the same state. Here's the current state of my head,
          in order of least inflammatory to most inflammatory:

          1) I agree that it is useful to have a command line interface to the
          transfer engine (in particular) and the map engine (to a much lesser
          extent). This is needed by languages that, for whatever reason, can't
          call the transfer engine directly.

          After bashing my head around on this, I've decided that you're right
          that a list of property files and properties is the best way to do this.
          It gives people flexibility and the tools to write clean calls. It also
          (in my opinion) gives them rope to hang themselves, but we can take care
          of that in documentation.

          As to the exact syntax, I suggest one small change, and that is to
          replace the list of files separated by ^ with a special File property,
          which indicates that the value is the name of a property file.
          Properties are read from left to right and, in case of a duplicate, the
          last value read is the value used. The syntax is therefore as follows:

          Transfer <property>=<value>...
          GenerateMap <property>=<value>...

          For example:

          Transfer File=MyStuff.prop
          Transfer File=parser.prop File=database.prop File=action.prop
          Transfer File=parser.prop File=database.prop Action=storeDocument
          XMLDocument=foo.xml Map=foo.map
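
One possible reading of the File property rule, sketched in Java: arguments are processed left to right, File=... pulls in a whole property file at that position, and the last value read for a name wins. The class name and the injected loader are assumptions made so the sketch runs without real files; an actual command line class would read each file from disk with Properties.load.

```java
import java.util.Properties;
import java.util.function.Function;

final class FilePropertySketch {
    /**
     * Resolves a list of property=value arguments into one Properties object.
     * A "File" property is replaced by the contents of the named property file,
     * obtained through the supplied loader.
     */
    static Properties resolve(String[] args, Function<String, Properties> loader) {
        Properties result = new Properties();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected property=value but got: " + arg);
            }
            String name = arg.substring(0, eq);
            String value = arg.substring(eq + 1);
            if (name.equals("File")) {
                result.putAll(loader.apply(value)); // file contents override earlier values
            } else {
                result.setProperty(name, value);    // a later pair overrides earlier values
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for database.prop; the keys inside it are invented for the demo.
        Properties database = new Properties();
        database.setProperty("User", "ron");
        database.setProperty("Map", "old.map");

        Properties resolved = resolve(
            new String[] {"File=database.prop", "Action=storeDocument", "Map=foo.map"},
            fileName -> database);
        // Map appears after the file on the command line, so foo.map wins.
        System.out.println(resolved.getProperty("Map"));
    }
}
```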

          2) Transfer and GenerateMap should be separate classes and call the
          transfer engine and map engine, respectively. These should be separate
          classes (a) so that it is clear to users what they are doing and (b)
          because I think they will evolve separately in the future. In
          particular, I expect that the transfer engine will become a
          multi-threaded server process and will need to be called in a very
          different manner than the map engine, which will probably only be used
          in-process.

          Furthermore, we should have a separate utility for generating property
          files that has a command line interface of the form:

          GeneratePropFile <property-file-name> <property>=<value>...

          All three of these (Transfer, GenerateMap, and GeneratePropFile) can
          obviously be derived from a single base class that processes the list of
          properties and files -- the only difference will be the main method and
          what class they then call.

          3) What worries me about the command line interfaces is that, while they
          work with our current feature set, they can't support all possible
          functionality. For example, what happens if we want a method to return a
          value? Do we write it to stdout?

          The specific case I am thinking about is when a Web application calls
          the transfer engine to get an XML document. Currently, our API/property
          set only allows you to write the document to disk. This is inefficient
          and we should be able to stream the document directly back to the
          application as XML.

          The general case I am thinking about is that the ultimate API for the
          transfer engine is quite likely to be the XML database API we're
          developing on the XML:DB mailing list. (You might want to join -- the
          discussion is quite good. See www.xmldb.org.) Although similar to our
          current API, it's at a slightly lower level, and would not be useable
          through a command line.

          This makes me think that the command line feature set is likely to be
          different from the transfer engine feature set. In particular, it will
          be limited to actions that make sense from a command line.

          4) At some level, the transfer engine and the map engine must have an
          API similar to what I proposed earlier. That is, whether this is a
          public API or is hidden behind a dispatch layer and called through
          properties, we still need to specify what functionality the engines
          expose. Therefore, we need to solidify these APIs, regardless of how
          they are called. I will continue discussion of these in separate email.
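
A rough sketch of how the two styles could be layered: explicit methods define what the engine can do, and a thin dispatch method reads an Action property and calls them. Everything here is illustrative. The method bodies are placeholders, commitMode is a String rather than the int proposed earlier, and the "Select" property name and the "auto" default are assumptions. Action, XMLDocument, Map and CommitMode are names that appear in this thread.

```java
import java.util.Properties;

final class DispatchSketch {
    // Explicit API: placeholders that only describe the call they represent.
    static String storeDocument(String mapFilename, String xmlFilename, String commitMode) {
        return "store " + xmlFilename + " using " + mapFilename + " commit=" + commitMode;
    }

    static String retrieveDocument(String mapFilename, String xmlFilename, String sqlStatement) {
        return "retrieve " + xmlFilename + " using " + mapFilename + " sql=" + sqlStatement;
    }

    /** Dispatch layer: turns a Properties object into a call on the explicit API. */
    static String execute(Properties props) {
        String action = required(props, "Action");
        switch (action) {
            case "storeDocument":
                return storeDocument(required(props, "Map"),
                                     required(props, "XMLDocument"),
                                     props.getProperty("CommitMode", "auto"));
            case "retrieveDocument":
                return retrieveDocument(required(props, "Map"),
                                        required(props, "XMLDocument"),
                                        required(props, "Select"));
            default:
                throw new IllegalArgumentException("Unknown Action: " + action);
        }
    }

    private static String required(Properties props, String name) {
        String value = props.getProperty(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("Action", "storeDocument");
        props.setProperty("XMLDocument", "foo.xml");
        props.setProperty("Map", "foo.map");
        System.out.println(execute(props));
    }
}
```

Java callers can use storeDocument directly, while the command line and any text-based protocol go through execute, so both camps get their interface from one set of engine methods.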

          5) The area where we just don't seem to agree is the API for the
          transfer engine and the map engine. You want a dispatch-style interface,
          I want explicit methods. I'm not sure what to do about the impasse here.

          I am absolutely adamant that we need an API with explicit methods. The
          first and foremost reason for this is readability in the code. It is the
          reason that SAX, DOM, JDBC, ODBC, OLE DB, Oracle CLI, ADO, the Windows
          API, JAXP, and hundreds of other APIs consist of multiple, explicit
          methods, as opposed to a single dispatch method.

          In fact, the only APIs I have ever seen that use dispatch methods are
          things like OLE Automation, Java Reflection, and program-to-program
          communication APIs. What all of these have in common is that the actual
          methods being called are not known until run time. That is not the case
          with us.

          (A distant second reason for an explicit API is speed, but I'm sure we
          could argue forever about which style is faster and could each find
          cases in which one was faster than the other. I think we also agree that
          the difference is negligible compared to total processing time.)

          Note that I view this API as separate from anything used for
          program-to-program communication. In particular, it should be possible
          for an application to call this API regardless of whether the transfer
          engine is running in process or as a separate application. Obviously, a
          program-to-program communication API and some sort of driver are needed
          in the latter case.

          6) How do we do program-to-program communication when the transfer
          engine is running as a server? One possibility is certainly to write our
          own API and use properties as a wire protocol, but I'm not convinced
          this is the way to go. (I have nothing against it -- I'm just ignorant.)

          In particular, are we reinventing the wheel here? What are the
          advantages / disadvantages of this over using RMI, JMS, SOAP, CORBA,
          sockets, or who knows what other technologies? Also, can the transfer
          engine be agnostic about this? That is, can we just write drivers for
          each protocol we choose to support? (This would be my first choice if it
          was possible.)

          Note that, after we define/choose a way to do program-to-program
          communication, I have no objections to applications intercepting this
          and talking to the transfer engine directly through the wire protocol.
          However, I'm not going to encourage this.

          Well, it's 1:00 AM and I can't think of anything else to say. I hope
          this spurs some ideas in you and we can solve this problem.

          -- Ron
        • adam flinton
          Message 4 of 9 , Dec 8, 2000
            > I've been wracking my brains for the last five days about
            > these issues.
            > Although some of my conclusions are changing, some are definitely not.
            > I'm sure you're in the same state. Here's the current state
            > of my head,
            > in order of least inflammatory to most inflammatory:
            >

            Ditto. BTW, sorry for the delay; I've been in London setting up a
            system for online grocery shopping for lastmile.com. Sheesh, talk about
            needing to be able to supplant the current mix/hodgepodge of systems / ways
            of doing things with something simple.... I am trying to integrate about 6
            systems, ranging from Warehouse Management (with Allpoints.com) through to
            our merchandizing system through to this & that.... all different. At least
            we've managed to fix on Oracle (yuck...), though that was the only real
            choice (2 of the systems are written for Oracle; all stored procs are PL/SQL
            etc. etc.). Loooooooooong days...... Web System uses this, Merch uses
            that, Warehouse Management uses something else......... Thank God for coffee &
            tobacco.... (well, caffeine & nicotine).


            Seriously though, it gives me the distinct feeling that, much as IPX etc. gave
            way to TCP, something like XMLDBMS + GUIs bound to XML docs + some simple
            tested text file messaging (e.g. HTTP/servlets/JMS etc.) is deeply needed
            just to get differing enterprise apps ("Best of Breed") talking easily &
            simply with each other. Either that or, quite simply, somewhere something is
            going to collapse: either the software "system" in toto (i.e. including
            all the transferring of info), or no one outside of large companies with deep
            pockets will be able to afford to build a system (i.e. the supporting
            commercial system might implode).

            Anyway enough of my wittering....

            > 1) I agree that it is useful to have a command line interface to the
            > transfer engine (in particular) and the map engine (to a much lesser
            > extent). This is needed by languages that, for whatever reason, can't
            > call the transfer engine directly.
            >

            Exactly... Oddly enough, see my wittering above.... Consider trying to link
            together a "macro system / app" where some is in C, some in VB, some in Java,
            some in straight SQL, some in a DB-specific scripting language such as PL/SQL,
            etc. etc. That's reality, & the concept of "One Language, One People" is
            unlikely to come in my lifetime....

            How should I put it...... text messaging is going to succeed because of
            language differences. XML = text / string; SQL = text / string. Thus IMHO an
            app which maps XML <> SQL MUST have a text interface.

            > After bashing my head around on this, I've decided that you're right
            > that a list of property files and properties is the best way
            > to do this.
            > It gives people flexibility and the tools to right clean
            > calls. It also
            > (in my opinion) gives them rope to hang themselves, but we
            > can take care
            > of that in documentation.
            >

            Yup. Heck a user can always type format c:.

            > As to the exact syntax, I suggest one small change, and that is to
            > replace the list of files separated by ^ with a special File property,
            > which indicates that the value is the name of a property file.
            > Properties are read from left to right and, in case of a
            > duplicate, the
            > last value read is the value used. The syntax is therefore as follows:
            >
            > Transfer <property>=<value>...
            > GenerateMap <property>=<value>...
            >
            > For example:
            >
            > Transfer File=MyStuff.prop
            > Transfer File=parser.prop File=database.prop File=action.prop
            > Transfer File=parser.prop File=database.prop Action=storeDocument
            > XMLDocument=foo.xml Map=foo.map
            >

            Done deal. I'll get that done ASAP. To which end, could you cast your eyes
            over the textvalues.txt & (a) add anything which is missing, (b) take out
            anything unnecessary, (c) check the names of the key values, e.g. XMLDocument
            or Map or whatever.
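
The left-to-right, last-value-wins rule agreed above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name PropertyArgs and the method parse are hypothetical and not part of XML-DBMS; only the File property and the <property>=<value> syntax come from the proposal.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Hypothetical helper (not part of XML-DBMS): merges File= files and inline properties. */
public class PropertyArgs {

    /** The special property whose value names a property file (from the proposal). */
    static final String FILE_KEY = "File";

    /** Reads the arguments left to right; a later value replaces an earlier one. */
    public static Properties parse(String[] args) throws IOException {
        Properties merged = new Properties();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected <property>=<value>, got: " + arg);
            }
            String key = arg.substring(0, eq);
            String value = arg.substring(eq + 1);
            if (FILE_KEY.equals(key)) {
                // Load the named file; its entries overwrite anything read earlier.
                try (InputStream in = new FileInputStream(value)) {
                    merged.load(in);
                }
            } else {
                // Inline property: overwrites any earlier value for the same key.
                merged.setProperty(key, value);
            }
        }
        return merged;
    }
}
```

With this sketch, "File=action.prop Action=storeDocument" ends up with Action=storeDocument even if action.prop sets a different Action, because the inline value is read last.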

            > 2) Transfer and GenerateMap should be separate classes and call the
            > transfer engine and map engine, respectively. These should be separate
            > classes (a) so that it is clear to users what they are doing and (b)
            > because I think they will evolve separately in the future. In
            > particular, I expect that the transfer engine will become a
            > multi-threaded server process and will need to be called in a very
            > different manner than the map engine, which will probably only be used
            > in-process.
            >


            OK. No Probs.

            > Furthermore, we should have a separate utility for generating property
            > files that has a command line interface of the form:
            >
            > GeneratePropFile <property-file-name> <property>=<value>...
            >
            > All three of these (Transfer, GenerateMap, and GeneratePropFile) can
            > obviously be derived from a single base class that processes
            > the list of
            > properties and files -- the only difference will be the main
            > method and
            > what class they then call.
            >


            OK.
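
As a rough illustration of the GeneratePropFile utility described above, the following sketch writes inline <property>=<value> arguments to the named file using java.util.Properties. The method name generate and the header comment text are assumptions; the sketch handles inline properties only and does not attempt the shared base class or File= merging.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Properties;

/** Sketch of: GeneratePropFile <property-file-name> <property>=<value>... */
public class GeneratePropFile {

    /** Collects the assignments (last value wins) and stores them in fileName. */
    static void generate(String fileName, String[] assignments) throws IOException {
        Properties props = new Properties();
        for (String a : assignments) {
            int eq = a.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected <property>=<value>, got: " + a);
            }
            props.setProperty(a.substring(0, eq), a.substring(eq + 1));
        }
        try (OutputStream out = new FileOutputStream(fileName)) {
            // store() writes standard key=value lines preceded by a comment header.
            props.store(out, "Generated by GeneratePropFile (sketch)");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: GeneratePropFile <property-file-name> <property>=<value>...");
            return;
        }
        generate(args[0], Arrays.copyOfRange(args, 1, args.length));
    }
}
```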

            > 3) What worries me about the command line interfaces is that,
            > while they
            > work with our current feature set, they can't support all possible
            > functionality. For example, what happens if we want a method
            > to return a
            > value? Do we write it to stdout?
            >

            This is the same as the CLI vs. API debate. In essence, unless you are
            programming in Java or somehow using CORBA or whatever, you'll be in
            trouble no matter what. As an example, how do you return an int from Java to
            VB? To Perl? etc. etc. I know this might seem like repetition,
            however....... we could always write it out to text in that case. Personally
            I reckon that if you want to do this efficiently then it means either using
            Java & prog'ing to the public methods, or using something like CORBA &
            prog'ing to the public methods via that.


            > The specific case I am thinking about is when a Web application calls
            > the transfer engine to get an XML document. Currently, our
            > API/property
            > set only allows you to write the document to disk. This is inefficient
            > and we should be able to stream the document directly back to the
            > application as XML.
            >

            This is very true & is something which I've given some thought to (esp. re
            servlets; I am playing with using servlets for messaging..... no one ever
            said that servlets need to produce / accept just HTML, or indeed that their
            output needs to be "visible"). It is almost the same as where you get
            the file from / put it to. E.g. let's imagine that you want to send the
            resulting doc somewhere via HTTP PUT. My initial answer (& it remains the
            same right now) is that it simply means adding stuff to the XML-writing
            methods (or possibly even moving the file read / write out to a separate
            class, as in a writeFile(File,location) sort of thing).
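
One way to picture moving the file write into a separate class is to have the writing code target a java.io.Writer, so that a file, an in-memory buffer, or a servlet response can all receive the document. The sketch below is an assumption-laden illustration: XmlOutputSketch, writeDocument, and writeToFile are hypothetical names, and the XML is passed in as a plain string rather than produced by the transfer engine.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

/** Hypothetical output helper: the destination is chosen by the caller, not the engine. */
public class XmlOutputSketch {

    /** Streams the XML text to any Writer (file, StringWriter, servlet response writer...). */
    static void writeDocument(String xml, Writer out) throws IOException {
        out.write(xml);
        out.flush();
    }

    /** File destination expressed in terms of the general method. */
    static void writeToFile(String xml, File location) throws IOException {
        try (Writer out = new FileWriter(location)) {
            writeDocument(xml, out);
        }
    }
}
```

A web application could pass its response writer to writeDocument and so avoid the round trip through disk, while the command line path keeps calling writeToFile.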

            > The general case I am thinking about is that the ultimate API for the
            > transfer engine is quite likely to be the XML database API we're
            > developing on the XML:DB mailing list. (You might want to join -- the
            > discussion is quite good. See www.xmldb.org.) Although similar to our
            > current API, it's at a slightly lower level, and would not be useable
            > through a command line.
            >


            I'll have a look round.... My only problems with XML DBs per se are:

            A) Most of the world's data is & will remain in SQL table structures (i.e.
            relational, not tree based).
            B) A number of very good tree based DBs exist, such as Cache, which have
            been built (& optimised) over many years, & in essence the 2 are the same
            thing.

            > This makes me think that the command line feature set is likely to be
            > different from the transfer engine feature set. In particular, it will
            > be limited to actions that make sense from a command line.
            >

            Yup. The feature set is very simple to set out:

            1) Mapping / Design:
            1.1) Build a DB structure from an XML/ tree structure.
            1.2) Build XML from a DB / table based structure

            2) Operation:
            2.1) Transfer information as fast as possible from XML to an SQL DB
            2.2) Transfer information as fast as possible from an SQL DB to XML.



            > 4) At some level, the transfer engine and the map engine must have an
            > API similar to what I proposed earlier. That is, whether this is a
            > public API or is hidden behind a dispatch layer and called through
            > properties, we still need to specify what functionality the engines
            > expose. Therefore, we need to solidify these APIs, regardless of how
            > they are called. I will continue discussion of these in
            > separate email.
            >


            Yup. Public Methods. If it ain't a public method & you're not using Java
            then you're passing in text/string

            > 5) The area where we just don't seem to agree is the API for the
            > transfer engine and the map engine. You want a dispatch-style
            > interface,
            > I want explicit methods. I'm not sure what to do about the
            > impasse here.
            >


            I am absolutely happy with public methods for transfer etc. All I want is to
            make XMLDBMS as accessible / useable to a non-Java person (e.g. a DBA) as to
            a Java programmer. Let's be honest.... if you were a Java programmer then you
            could sidestep both transfer & GenerateMap & call DOMtoDBMS etc. yourself,
            passing in structures which you'd created yourself. Equally, you could build
            your own transfer engine etc. That's not the person I've been aiming at. I've
            been aiming at the Oracle/DB2/SQLServer/Sybase etc. DBA, or the guy who
            wants to get an answer in XML. Yes, Java coders are obviously welcome, but at
            the end of the day Java is a processing medium through which information is
            passing. It is simply the data's transitory state. Obviously I'd like the
            public methods to be as easily useable / accessible to a Java programmer as
            possible, but mostly because that guy is likely to be me..... & I'm lazy, so
            being able to do something by saying import org.xmlmiddleware.* + say five
            lines of code is what I'm after (as a Java developer). However...... we have
            to expect that a large number of possible users would know XML and / or
            SQL without ****needing**** to know Java.
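
The two positions need not conflict: explicit public methods can serve Java callers while a thin text-driven layer maps the Action property onto them for everyone else. The following is a minimal sketch under assumptions. The TransferEngine interface, its method signatures, and the retrieveDocument action are hypothetical; only Action=storeDocument, XMLDocument, and Map appear in the thread.

```java
import java.util.Properties;

public class DispatchSketch {

    /** Hypothetical explicit-method API; names and signatures are illustrative only. */
    public interface TransferEngine {
        void storeDocument(String xmlDocument, String map);
        void retrieveDocument(String xmlDocument, String map);
    }

    /** Text-driven layer: reads the Action property and calls the matching explicit method. */
    public static void dispatch(TransferEngine engine, Properties props) {
        String action = props.getProperty("Action");
        String doc = props.getProperty("XMLDocument");
        String map = props.getProperty("Map");
        if ("storeDocument".equals(action)) {
            engine.storeDocument(doc, map);
        } else if ("retrieveDocument".equals(action)) {
            engine.retrieveDocument(doc, map);
        } else {
            throw new IllegalArgumentException("Unknown Action: " + action);
        }
    }
}
```

A Java developer would call engine.storeDocument(...) directly; a DBA or script would reach the same method through properties passed on the command line.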


            > I am absolutely adamant that we need an API with explicit methods. The
            > first and foremost reason for this is readability in the
            > code. It is the
            > reason that SAX, DOM, JDBC, ODBC, OLE DB, Oracle CLI, ADO, the Windows
            > API, JAXP, and hundreds of other APIs consist of multiple, explicit
            > methods, as opposed to a single dispatch method.
            >

            Yup indeedy. However as an example the Oracle CLI does not require that you
            know C or C++ simply because Oracle is written in that. Ditto OLEDB, ODBC
            etc.etc.

            I have no problem with someone calling transfer xml map etc.


            > In fact, the only APIs I have ever seen that use dispatch methods are
            > things like OLE Automation, Java Reflection, and program-to-program
            > communication APIs. What all of these have in common is that
            > the actual
            > methods being called are not known until run time. That is
            > not the case
            > with us.
            >

            That's simply not true. SQL is the best example. What's DB2 written in?
            what's Oracle written in? Do I care? Do I compile SQL?

            Think of a database trigger written in SQL.


            > (A distant second reason for an explicit API is speed, but I'm sure we
            > could argue forever about which style is faster and could each find
            > cases in which one was faster than the other. I think we also
            > agree that
            > the difference is negligible compared to total processing time.)
            >

            If you really wanted speed in the API then quite simply you don't want a
            call transfer xyz etc on the command line. Instead you'd want a transfer
            which accepted a file object (or a number of them) directly in Java within
            your app.

            The moment you bring a CLI into it then you have string handling.


            > Note that I view this API as separate from anything used for
            > program-to-program communication. In particular, it should be possible
            > for an application to call this API regardless of whether the transfer
            > engine is running in process or as a separate application.
            > Obviously, a
            > program-to-program communication API and some sort of driver
            > are needed
            > in the latter case.
            >

            Public methods are fine with me.

            > 6) How do we do program-to-program communication when the transfer
            > engine is running as a server? One possibility is certainly
            > to write our
            > own API and use properties as a wire protocol, but I'm not convinced
            > this is the way to go. (I have nothing against it -- I'm just
            > ignorant.)
            >

            See below:



            > In particular, are we reinventing the wheel here? What are the
            > advantages / disadvantages of this over using RMI, JMS, SOAP, CORBA,
            > sockets, or who knows what other technologies? Also, can the transfer
            > engine be agnostic about this? That is, can we just write drivers for
            > each protocol we choose to support? (This would be my first
            > choice if it
            > was possible.)
            >

            RMI probs include non Java apps, firewalling.

            JMS CORBA SOAP would all carry properties files as @ the end of the day they
            carry text files & Properties files are just that. You could add servlets +
            any other dynamic http protocol.
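
The point that a Properties object is just text, and so can ride inside any text-carrying transport, can be illustrated with a round trip through a string. This is a sketch only: PropertiesWireSketch, toText, and fromText are hypothetical names, and no particular transport (JMS, SOAP, servlets) is implied.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

/** Hypothetical helper: turns Properties into a text payload and back. */
public class PropertiesWireSketch {

    /** Serializes the properties to the standard key=value text form. */
    static String toText(Properties props) throws IOException {
        StringWriter out = new StringWriter();
        props.store(out, null); // null: no extra comment line beyond the timestamp
        return out.toString();
    }

    /** Rebuilds a Properties object from text received over any transport. */
    static Properties fromText(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }
}
```

The sender would put the result of toText into a message body and the receiver would call fromText on it; whether that is preferable to RMI or an existing protocol is exactly the open question in the thread.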


            > Note that, after we define/choose a way to do program-to-program
            > communication, I have no objections to applications intercepting this
            > and talking to the transfer engine directly through the wire protocol.
            > However, I'm not going to encourage this.
            >
            > Well, it's 1:00 AM and I can't think of anything else to say. I hope
            > this spurs some ideas in you and we can solve this problem.
            >

            Some ideas which I've also been working on via my sometimes circuitous brain
            paths......

            I've been investigating the Enhydra schemamapper class for use with
            generating classes / objects, such that I can have a GUI app which accepts /
            gets sent an XML doc & can then load the relevant class to deal with / map
            to the XML doc. In essence XML per se is useless unless something is done
            with it (whether in a GUI or a servlet or whatever). So building / using
            something which allows my developers to easily do something with the
            resultant XML (& indeed provide XML for use by XMLDBMS) is also important.
            Then it struck me (as things do when it's late & I'm tired) that in many
            ways org.xmlmiddleware kinda covered this too.

            I.e. one "action" might well be to produce the relevant classes to deal with
            the XML docs produced according to the schema (or indeed to create a new XML
            doc). Say you have an SQL DB. You produce the SQL structure you wish to
            have mapped. This results in a map file & a schema. What then?
            Wellllllll........ run that schema through with "action=produceclasses" (or
            something similar) & voila, you have something which your servlet / GUI
            developers can then use. The thought was triggered partly by my own needs &
            partly because we may well (you mentioned it some time back) use the schemamapper
            anyway, & this would allow the use of the same code infrastructure (e.g. the
            abstraction of the parsers etc.). It would also tie in with moving transfer,
            genmap etc. into separate classes, as all I would be doing would be to add a
            "genJava" class....

            I.e. it would be an all-in-one design / builder tool, starting with either a
            schema or an SQL structure and resulting in the means to get XML<>SQL &
            XML<>Java processing (GUI client etc.).

            OK nuff said.

            Right then actions:

            Could you review the textvalues.txt as a start.




            Adam
          • Pareena Shah
            Message 5 of 9 , Dec 8, 2000
              Question for the people thinking about the new version of XML DBMS: What do
              you think about using something like SQL*Loader to bulk load transformed XML
              data into an Oracle database? If I have a situation where I am going to be
              processing large volumes of XML data into an Oracle database, and I want to
              optimize by buffering rows and using Oracle's direct path load
              functionality, is SQL*Loader the best way? Could you comment on the
              advantages/disadvantages?

              ----- Original Message -----
              From: adam flinton <aflinton@...>
              To: Ronald Bourret <rpbourret@...>; xml-dbms Mailing List (E-mail)
              <xml-dbms@egroups.com>
              Sent: Friday, December 08, 2000 12:50 PM
              Subject: RE: RE: [xml-dbms] All in one answer....


              >
              > > I've been wracking my brains for the last five days about
              > > these issues.
              > > Although some of my conclusions are changing, some are definitely not.
              > > I'm sure you're in the same state. Here's the current state
              > > of my head,
              > > in order of least inflammatory to most inflammatory:
              > >
              >
              > Ditto. BTW, sorry for the delay; I've been in London setting up a
              > system for online grocery shopping for lastmile.com. Sheesh, talk about
              > needing to be able to supplant the current mix/hodgepodge of systems / ways
              > of doing things with something simple.... I am trying to integrate about 6
              > systems, ranging from Warehouse Management (with Allpoints.com) through to
              > our merchandizing system through to this & that.... all different. At least
              > we've managed to fix on Oracle (yuck...), though that was the only real
              > choice (2 of the systems are written for Oracle; all stored procs are PL/SQL
              > etc. etc.). Loooooooooong days...... Web System uses this, Merch uses
              > that, Warehouse Management uses something else......... Thank God for coffee &
              > tobacco.... (well, caffeine & nicotine).
              >
              >
              > Seriously though, it gives me the distinct feeling that, much as IPX etc. gave
              > way to TCP, something like XMLDBMS + GUIs bound to XML docs + some simple
              > tested text file messaging (e.g. HTTP/servlets/JMS etc.) is deeply needed
              > just to get differing enterprise apps ("Best of Breed") talking easily &
              > simply with each other. Either that or, quite simply, somewhere something is
              > going to collapse: either the software "system" in toto (i.e. including
              > all the transferring of info), or no one outside of large companies with deep
              > pockets will be able to afford to build a system (i.e. the supporting
              > commercial system might implode).
              >
              > Anyway enough of my wittering....
              >
              > > 1) I agree that it is useful to have a command line interface to the
              > > transfer engine (in particular) and the map engine (to a much lesser
              > > extent). This is needed by languages that, for whatever reason, can't
              > > call the transfer engine directly.
              > >
              >
              > Exactly... Oddly enough, see my wittering above.... Consider trying to link
              > together a "macro system / app" where some is in C, some in VB, some in Java,
              > some in straight SQL, some in a DB-specific scripting language such as PL/SQL,
              > etc. etc. That's reality, & the concept of "One Language, One People" is
              > unlikely to come in my lifetime....
              >
              > How should I put it...... text messaging is going to succeed because of
              > language differences. XML = text / string; SQL = text / string. Thus IMHO an
              > app which maps XML <> SQL MUST have a text interface.
              >
              > > After bashing my head around on this, I've decided that you're right
              > > that a list of property files and properties is the best way
              > > to do this.
              > > It gives people flexibility and the tools to right clean
              > > calls. It also
              > > (in my opinion) gives them rope to hang themselves, but we
              > > can take care
              > > of that in documentation.
              > >
              >
              > Yup. Heck a user can always type format c:.
              >
              > > As to the exact syntax, I suggest one small change, and that is to
              > > replace the list of files separated by ^ with a special File property,
              > > which indicates that the value is the name of a property file.
              > > Properties are read from left to right and, in case of a
              > > duplicate, the
              > > last value read is the value used. The syntax is therefore as follows:
              > >
              > > Transfer <property>=<value>...
              > > GenerateMap <property>=<value>...
              > >
              > > For example:
              > >
              > > Transfer File=MyStuff.prop
              > > Transfer File=parser.prop File=database.prop File=action.prop
              > > Transfer File=parser.prop File=database.prop Action=storeDocument
              > > XMLDocument=foo.xml Map=foo.map
              > >
              >
              > Done deal. I'll get that done ASAP. To which end, could you cast your eyes
              > over the textvalues.txt & (a) add anything which is missing, (b) take out
              > anything unnecessary, (c) check the names of the key values, e.g. XMLDocument
              > or Map or whatever.
              >
              > > 2) Transfer and GenerateMap should be separate classes and call the
              > > transfer engine and map engine, respectively. These should be separate
              > > classes (a) so that it is clear to users what they are doing and (b)
              > > because I think they will evolve separately in the future. In
              > > particular, I expect that the transfer engine will become a
              > > multi-threaded server process and will need to be called in a very
              > > different manner than the map engine, which will probably only be used
              > > in-process.
              > >
              >
              >
              > OK. No Probs.
              >
              > > Furthermore, we should have a separate utility for generating property
              > > files that has a command line interface of the form:
              > >
              > > GeneratePropFile <property-file-name> <property>=<value>...
              > >
              > > All three of these (Transfer, GenerateMap, and GeneratePropFile) can
              > > obviously be derived from a single base class that processes
              > > the list of
              > > properties and files -- the only difference will be the main
              > > method and
              > > what class they then call.
              > >
              >
              >
              > OK.
              >
              > > 3) What worries me about the command line interfaces is that,
              > > while they
              > > work with our current feature set, they can't support all possible
              > > functionality. For example, what happens if we want a method
              > > to return a
              > > value? Do we write it to stdout?
              > >
              >
              > This is the same as the CLI vs. API debate. In essence, unless you are
              > programming in Java or somehow using CORBA or whatever, you'll be in
              > trouble no matter what. As an example, how do you return an int from Java to
              > VB? To Perl? etc. etc. I know this might seem like repetition,
              > however....... we could always write it out to text in that case. Personally
              > I reckon that if you want to do this efficiently then it means either using
              > Java & prog'ing to the public methods, or using something like CORBA &
              > prog'ing to the public methods via that.
              >
              >
              > > The specific case I am thinking about is when a Web application calls
              > > the transfer engine to get an XML document. Currently, our
              > > API/property
              > > set only allows you to write the document to disk. This is inefficient
              > > and we should be able to stream the document directly back to the
              > > application as XML.
              > >
              >
              > This is very true & is something which I've given some thought to (esp. re
              > servlets; I am playing with using servlets for messaging..... no one ever
              > said that servlets need to produce / accept just HTML, or indeed that their
              > output needs to be "visible"). It is almost the same as where you get
              > the file from / put it to. E.g. let's imagine that you want to send the
              > resulting doc somewhere via HTTP PUT. My initial answer (& it remains the
              > same right now) is that it simply means adding stuff to the XML-writing
              > methods (or possibly even moving the file read / write out to a separate
              > class, as in a writeFile(File,location) sort of thing).
              >
              > > The general case I am thinking about is that the ultimate API for the
              > > transfer engine is quite likely to be the XML database API we're
              > > developing on the XML:DB mailing list. (You might want to join -- the
              > > discussion is quite good. See www.xmldb.org.) Although similar to our
              > > current API, it's at a slightly lower level, and would not be useable
              > > through a command line.
              > >
              >
              >
              > I'll have a look round.... My only problems with XML DBs per se are:
              >
              > A) Most of the world's data is & will remain in SQL table structures (i.e.
              > relational, not tree based).
              > B) A number of very good tree based DBs exist, such as Cache, which have
              > been built (& optimised) over many years, & in essence the 2 are the same
              > thing.
              >
              > > This makes me think that the command line feature set is likely to be
              > > different from the transfer engine feature set. In particular, it will
              > > be limited to actions that make sense from a command line.
              > >
              >
              > Yup. The feature set is very simple to set out:
              >
              > 1) Mapping / Design:
              > 1.1) Build a DB structure from an XML/ tree structure.
              > 1.2) Build XML from a DB / table based structure
              >
              > 2) Operation:
              > 2.1) Transfer information as fast as possible from XML to an SQL DB
              > 2.2) Transfer information as fast as possible from an SQL DB to XML.
              >
              >
              >
              > > 4) At some level, the transfer engine and the map engine must have an
              > > API similar to what I proposed earlier. That is, whether this is a
              > > public API or is hidden behind a dispatch layer and called through
              > > properties, we still need to specify what functionality the engines
              > > expose. Therefore, we need to solidify these APIs, regardless of how
              > > they are called. I will continue discussion of these in
              > > separate email.
              > >
              >
              >
              > Yup. Public Methods. If it ain't a public method & you're not using Java
              > then you're passing in text/string
              >
              > > 5) The area where we just don't seem to agree is the API for the
              > > transfer engine and the map engine. You want a dispatch-style
              > > interface,
              > > I want explicit methods. I'm not sure what to do about the
              > > impasse here.
              > >
              >
              >
              > I am absolutely happy with Public methods for transfer etc. All I want is to
              > make XMLDBMS as accessible / useable to a non Java person (e.g. a DBA) as to
              > a Java programmer. Let's be honest....if one were a java programmer then one
              > could sidestep both transfer & GenerateMap & call DOMtoDBMS etc. yourself
              > passing in structures which you'd created yourself. Equally you could build
              > your own transfer engine etc. That's not the person I've been aiming @. I've
              > been aiming @ the Oracle/DB2/SQLServer/Sybase etc.etc DBA or the guy who
              > wants to get an answer in XML. Yes Java coders are obviously welcome but @
              > the end of the day Java is a processing medium through which information is
              > passing. It is simply the data's transitory state. Obviously I'd like the
              > public methods to be as easily useable / accessible to a Java programmer as
              > possible but mostly because that guy is likely to be me.....& I'm lazy so
              > being able to do something by saying import org.xmlmiddleware.* + say five
              > lines of code is what I'm after (as a Java developer). However......we have
              > to expect that a large number of possible users would know either /and / or
              > XML & SQL without ****needing**** to know Java.
              >
              >
              > > I am absolutely adamant that we need an API with explicit methods. The
              > > first and foremost reason for this is readability in the
              > > code. It is the
              > > reason that SAX, DOM, JDBC, ODBC, OLE DB, Oracle CLI, ADO, the Windows
              > > API, JAXP, and hundreds of other APIs consist of multiple, explicit
              > > methods, as opposed to a single dispatch method.
              > >
              >
              > Yup indeedy. However as an example the Oracle CLI does not require that you
              > know C or C++ simply because Oracle is written in that. Ditto OLEDB, ODBC
              > etc.etc.
              >
              > I have no problem with someone calling transfer xml map etc.
              >
              >
              > > In fact, the only APIs I have ever seen that use dispatch methods are
              > > things like OLE Automation, Java Reflection, and program-to-program
              > > communication APIs. What all of these have in common is that
              > > the actual
              > > methods being called are not known until run time. That is
              > > not the case
              > > with us.
              > >
              >
              > That's simply not true. SQL is the best example. What's DB2 written in?
              > what's Oracle written in? Do I care? Do I compile SQL?
              >
              > Think of a database trigger written in SQL.
              >
              >
              > > (A distant second reason for an explicit API is speed, but I'm sure we
              > > could argue forever about which style is faster and could each find
              > > cases in which one was faster than the other. I think we also
              > > agree that
              > > the difference is negligible compared to total processing time.)
              > >
              >
              > If you really wanted speed in the API then quite simply you don't want a
              > call transfer xyz etc on the command line. Instead you'd want a transfer
              > which accepted a file object (or a number of them) directly in Java within
              > your app.
              >
              > The moment you bring a CLI into it then you have string handling.
              >
              >
              > > Note that I view this API as separate from anything used for
              > > program-to-program communication. In particular, it should be possible
              > > for an application to call this API regardless of whether the transfer
              > > engine is running in process or as a separate application.
              > > Obviously, a
              > > program-to-program communication API and some sort of driver
              > > are needed
              > > in the latter case.
              > >
              >
              > Public methods are fine with me.
              >
              > > 6) How do we do program-to-program communication when the transfer
              > > engine is running as a server? One possibility is certainly
              > > to write our
              > > own API and use properties as a wire protocol, but I'm not convinced
              > > this is the way to go. (I have nothing against it -- I'm just
              > > ignorant.)
              > >
              >
              > See below:
              >
              >
              >
              > > In particular, are we reinventing the wheel here? What are the
              > > advantages / disadvantages of this over using RMI, JMS, SOAP, CORBA,
              > > sockets, or who knows what other technologies? Also, can the transfer
              > > engine be agnostic about this? That is, can we just write drivers for
              > > each protocol we choose to support? (This would be my first
              > > choice if it
              > > was possible.)
              > >
              >
              > RMI probs include non Java apps, firewalling.
              >
              > JMS CORBA SOAP would all carry properties files as @ the end of the day they
              > carry text files & Properties files are just that. You could add servlets +
              > any other dynamic http protocol.
              >
              >
              > > Note that, after we define/choose a way to do program-to-program
              > > communication, I have no objections to applications intercepting this
              > > and talking to the transfer engine directly through the wire protocol.
              > > However, I'm not going to encourage this.
              > >
              > > Well, it's 1:00 AM and I can't think of anything else to say. I hope
              > > this spurs some ideas in you and we can solve this problem.
              > >
              >
              > Some ideas which I've also been working on via my sometimes circuitous brain
              > paths......
              >
              > I've been investigating the enhydra schemamapper class for use with
              > generating class'es / objects such that I can have a GUI app which accepts /
              > gets sent an XML doc & can then load the relevant class to deal with / map
              > to the xml doc. In essence XML per se is useless, unless something is done
              > with it (whether in a GUI or a servlet or whatever). So building / using
              > something which allows my developers to easily do something with the
              > resultant XML (& indeed provide XML for use by XMLDBMS) is also important.
              > Then it struck me (as things do when it's late & I'm tired) that in many
              > ways the org.xmlmiddleware kinda covered this too.
              >
              > i.e one "action" might well be to produce the relevant class'es to deal with
              > the XML docs produced according to the schema (or indeed to create a new XML
              > doc) such that you have an SQLDB. You produce the SQL structure you wish to
              > have mapped. This results in a map file & a schema. What then?
              > Wellllllll........run that schema through with "action=produceclasses" (or
              > something similar) & voila you have something which your servlet / GUI
              > developers can then use. The thought was triggered partly by my own needs &
              > partly as we may well (you mentioned it sometime back) use the schemamapper
              > any way & this would allow the use of the same code infrastructure (e.g. the
              > abstraction of the parsers etc). It would also tie in with moving transfer,
              > genmap etc into separate classes as all I would be doing would be to add a
              > "genJava" class....
              >
              > i.e it would be an all in one design / builder tool starting with either a
              > schema or an SQL structure and resulting in the means to get XML<>SQL &
              > XML<>Java processing (GUI client etc.)
              >
              > OK nuff said.
              >
              > Right then actions:
              >
              > Could you review the textvalues.txt as a start.
              >
              >
              >
              >
              > Adam
              >
              >
              >
            • Ronald Bourret
              ... This is an interesting idea, although it won't be included in the next release due to lack of time. (It would require completely rearchitecting the
              Message 6 of 9 , Dec 11, 2000
                Pareena Shah wrote:
                >
                > Question for the people thinking about the new version of XML DBMS: What do
                > you think about using something like sqlloader to bulk load transformed XML
                > data into an Oracle database? If I have a situation where I am going to be
                > processing large volumes of XML data into an Oracle database, and I want to
                > optimize by buffering rows, and using Oracle's direct path load
                > functionality, is sql loader the best way? Could you comment on the
                > advantages/disadvantages?

                This is an interesting idea, although it won't be included in the next
                release due to lack of time. (It would require completely rearchitecting
                the DOMToDBMS and DBMSToDOM classes.)

                The following discussion is not specific to Oracle's bulk loader, but
                discusses how XML-DBMS might do bulk inserts in the future. This assumes
                such updates are possible using JDBC, and it is not clear to me that
                they are.

                The challenge is this. Suppose we have an XML document that looks like
                the following:

                <A>
                <A1>...</A1>
                <A2>...</A2>
                <A3>...</A3>
                <A4>...</A4>
                <B>
                <B1>...</B1>
                <B2>...</B2>
                <B3>...</B3>
                </B>
                </A>

                and that this document was mapped to tables A (columns A1-A4) and B
                (columns B1-B3) as expected, with the primary key in table A. Now
                suppose you have a whole lot of these structures in a single XML
                document:

                <root>
                <A>
                <A1>...</A1>
                <A2>...</A2>
                <A3>...</A3>
                <A4>...</A4>
                <B>
                <B1>...</B1>
                <B2>...</B2>
                <B3>...</B3>
                </B>
                </A>
                ...
                <A>
                <A1>...</A1>
                <A2>...</A2>
                <A3>...</A3>
                <A4>...</A4>
                <B>
                <B1>...</B1>
                <B2>...</B2>
                <B3>...</B3>
                </B>
                </A>
                </root>

                Currently, the code inserts the row for the first A, then the row for
                the first B, then the row for the second A, then the row for the
                second B, and so on.

                To use bulk loading, the code would need to buffer rows for A and rows
                for B, then insert them when there are a certain number of rows in the
                buffer -- say 100. While this probably wouldn't be too bad in the above
                case, it could get very complicated in the general case.

                For example, imagine there can be an arbitrary number of B children for
                each A parent. Thus, the buffer for B rows would fill up before the
                buffer for A rows. However, the code has to be careful about when it
                inserts rows. That is, it can't just wait until the buffer for B rows is
                full and then just insert them. Because of referential integrity, it has
                to insert the A rows before the B rows, so you need to coordinate when
                the buffers are emptied. Now, imagine doing this for an XML document
                that is nested arbitrarily deep and you'll see that the code is
                non-trivial.
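
The coordination described above can be sketched for the two-table case. This is only an illustration under stated assumptions, not XML-DBMS code: the RowBuffers class, its method names, and the use of plain strings as rows are all hypothetical, and the "insert" is a list append standing in for a batched database insert. Whichever buffer fills first, both are emptied together, parent rows first, so no B row reaches the database before its A row.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: buffers parent (A) and child (B) rows and flushes parents first. */
class RowBuffers {
    private final int limit;                                  // rows per buffer before a flush
    private final List<String> parentRows = new ArrayList<>();
    private final List<String> childRows = new ArrayList<>();

    /** Records the order in which rows would reach the database. */
    final List<String> inserted = new ArrayList<>();

    RowBuffers(int limit) {
        this.limit = limit;
    }

    void addParent(String row) {
        parentRows.add(row);
        if (parentRows.size() >= limit) {
            flush();
        }
    }

    void addChild(String row) {
        childRows.add(row);
        if (childRows.size() >= limit) {
            flush();   // a full child buffer still forces the parents out first
        }
    }

    /** Empties both buffers, parents before children, so foreign keys resolve. */
    void flush() {
        inserted.addAll(parentRows);   // stand-in for a batched insert into table A
        parentRows.clear();
        inserted.addAll(childRows);    // stand-in for a batched insert into table B
        childRows.clear();
    }

    public static void main(String[] args) {
        RowBuffers buffers = new RowBuffers(3);
        buffers.addParent("A1");
        buffers.addChild("B1a");
        buffers.addChild("B1b");
        buffers.addChild("B1c");       // child buffer fills before the parent buffer
        buffers.flush();               // flush whatever remains at end of document
        System.out.println(buffers.inserted);
    }
}
```

For arbitrarily deep nesting, the same idea would presumably need one buffer per table, flushed in root-to-leaf order, which is where the real complexity lies.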

                So while this is a good idea and worth looking at in the future, we
                don't have time to do it now.

                --
                Ronald Bourret
                Programming, Writing, and Training
                XML, Databases, and Schemas
                http://www.rpbourret.com
              • Ronald Bourret
                Ahhh. We're closer than I thought. I think the last thing we need to do is move the dispatch method from the transfer/map engine to the CLI class. In practical
                Message 7 of 9 , Dec 11, 2000
                  Ahhh. We're closer than I thought. I think the last thing we need to do
                  is move the dispatch method from the transfer/map engine to the CLI
                  class. In practical terms, this means moving the init, action, and
                  transfer methods from Xmldbms to Transfer, and the init and action
                  methods from Xmldbms to GenerateMap. Thus, assuming Transfer and
                  GenerateMap inherit from a ProcessProperties class, they would look
                  something like the following. Notice that dispatch is a public method,
                  so people (like the GUI) who want to do text-based programming can write
                  directly to it without going through a command line.

                  public class Transfer extends ProcessProperties {

                     public static void main(String[] args) throws Exception {
                        // Parse the arguments and generate a Properties object.
                        // getProperties is assumed to be a static method inherited
                        // from ProcessProperties ("this" is not available in main).
                        Properties props = getProperties(args);

                        // Dispatch the action
                        dispatch(props);
                     }

                     public static void dispatch(Properties props) throws Exception
                     {
                        TransferEngine transferEngine = new TransferEngine();

                        // Set up the parser and database
                        transferEngine.setParserProperties(props);
                        transferEngine.setDatabaseProperties(props);

                        // Dispatch the action. getProperty returns a String,
                        // whereas get returns an Object and would need a cast.
                        String action = props.getProperty(ACTION);
                        if (action.equals(STOREDOCUMENT)) {
                           String mapFilename = props.getProperty(MAPFILE);
                           String xmlFilename = props.getProperty(XMLFILE);
                           int commitMode =
                              convertCommitMode(props.getProperty(COMMITMODE));
                           String keyGeneratorClass =
                              props.getProperty(KEYGENERATORCLASS);
                           transferEngine.storeDocument(mapFilename, xmlFilename,
                              commitMode, keyGeneratorClass);
                        } else if (action.equals(...)) {
                           ...
                           ...
                        } else ... {
                           ...
                        }
                     }
                  }
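
The getProperties call in main is not spelled out in this thread. A minimal sketch of what it might do follows, assuming (and this is only an assumption) that each command-line argument has the form key=value. The class name ArgsToProperties and the sample values StoreDocument and orders.map are hypothetical; only the Action property name comes from the discussion.

```java
import java.util.Properties;

/** Hypothetical sketch of turning command-line arguments into a Properties object. */
class ArgsToProperties {

    /** Assumes every argument looks like key=value; splits on the first '='. */
    static Properties getProperties(String[] args) {
        Properties props = new Properties();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + arg);
            }
            String key = arg.substring(0, eq).trim();
            String value = arg.substring(eq + 1).trim();
            props.setProperty(key, value);
        }
        return props;
    }

    public static void main(String[] args) {
        // Sample values are illustrative only.
        Properties props = getProperties(
            new String[] {"Action=StoreDocument", "MapFile=orders.map"});
        System.out.println(props.getProperty("Action"));
    }
}
```

Because the result is an ordinary Properties object, a GUI or other caller could build the same object by hand and pass it straight to dispatch without going through the command line.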

                  This will make the CLI classes more complex than they are now. (It
                  requires them to know the transfer and map engine APIs.) However, this
                  means we have a clean separation between the text-based interface and
                  the programmatic interface. Furthermore, since the text-based interface
                  is layered on top of the programmatic interface, it makes sense for the
                  text-based interface (higher level) to know about the programmatic
                  interface (lower level) but not vice versa. Finally, it means we can
                  evolve both interfaces separately without too much worry of interference
                  between the two.

                  As a first stab at the text-based interface, see what you think of the
                  properties in:

                  http://www.eGroups.com/message/xml-dbms/486

                  Ignore the discussion of three separate files and the DatabasePropsFile
                  and ParserPropsFile properties. The rest is pretty much the same as
                  what's in textvalues.txt, except for: (1) renaming, and (2)
                  consolidation of the action, t_status, and t_direction properties into a
                  single Action property.

                  Does this give people too many options? For example, should we remove
                  commit mode and keygeneratorclass, infer the schema type from the file
                  extension, and merge retrieveDocumentByKey and retrieveDocumentByKeys?
                  On the one hand, this is supposed to be a simple API. On the other hand,
                  people probably want control.

                  You can also ignore the comment that these properties simply reflect the
                  underlying API. Although that statement might be true now, I don't think
                  it will be true in the future. In particular, the text-based API should
                  have properties that make sense for execution in a language-independent,
                  disconnected, probably stateless environment. The programmatic API
                  should have methods that make sense for execution in a Java-based,
                  connected, state-maintaining environment.
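
To illustrate why properties suit a language-independent, disconnected interface, here is a small sketch showing that a Properties object can be turned into plain key=value text and rebuilt from that text using only the standard java.util.Properties store and load methods. The class name PropertiesAsText and the sample value are hypothetical, and nothing here is part of XML-DBMS.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch: a Properties object round-tripped through plain text. */
class PropertiesAsText {

    /** Writes the properties as key=value lines (preceded by a date comment). */
    static String toText(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // not expected for an in-memory writer
        }
        return out.toString();
    }

    /** Rebuilds a Properties object from text in the same format. */
    static Properties fromText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // not expected for an in-memory reader
        }
        return props;
    }

    public static void main(String[] args) {
        Properties request = new Properties();
        request.setProperty("Action", "StoreDocument");   // sample value only
        String text = toText(request);
        System.out.println(fromText(text).equals(request));
    }
}
```

Any program that can emit such text, in any language, could in principle drive the text-based interface without knowing the Java API.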

                  For the moment, you can use the following as the transfer and map engine
                  APIs, but we definitely need to take another look at these and see what
                  makes sense for the future:

                  http://www.eGroups.com/message/xml-dbms/480
                  http://www.eGroups.com/message/xml-dbms/485

                  Other comments below.

                  adam flinton wrote:

                  > Done Deal. I'll get that done ASAP. To which end could you cast your eyes
                  > over the textvalues.txt & (a) Add anything which is missing (b) take out
                  > anything unnecessary (c) check the names of the Key values e.g. XMLDocument
                  > or Map or whatever.

                  See comments above.

                  > > The specific case I am thinking about is when a Web application calls
                  > > the transfer engine to get an XML document. Currently, our
                  > > API/property
                  > > set only allows you to write the document to disk. This is inefficient
                  > > and we should be able to stream the document directly back to the
                  > > application as XML.
                  >
                  > This is very true & is something which I've given some thought to (esp re
                  > servlets (I am playing with using servlets for messaging.....no one ever
                  > said that servlets need to produce / accept just HTML or indeed that their
                  > output needs to be "visible")). It is almost the same as where do you get
                  > the file from / put it to. E.g. let's imagine that you want to send the
                  > resulting doc somewhere via http put. My initial answer (& it remains the
                  > same right now) is that it simply means adding stuff to the XMLwriting
                  > methods (or possibly even moving the file read / write out to a separate
                  > class as in writeFile(File,location) sort of thing).

                  Let's leave this alone for the moment, get the architecture in place,
                  possibly do a beta release, and then take another look at this before
                  final release. I can't help but think there's a reasonable solution to
                  this in the text-based case. Perhaps the Transfer.dispatch method can
                  return an Object?

                  > I'll have a look round....My only problems with XML DB'es per se are :
                  >
                  > A) Most of the world's data is & will remain in SQL table structures (i.e
                  > Relational not tree based)
                  > B) A number of very good tree based DB'es exist such as Cache which have
                  > been built (& optimised) over many years & in essence the 2 are the same
                  > thing.

                  Note that this is an "XML-based API to databases", not an "API to XML
                  databases". That is, just as ODBC/JDBC is based on the relational model,
                  this API is based on XML. And just as you can implement ODBC/JDBC over
                  non-relational data by mapping that data to the relational model, you
                  can implement this API over relational data by mapping the relational
                  data to XML. (Presumably using something like mapping documents in
                  XML-DBMS, DAD in DB2, or annotated schemas in SQL Server.)

                  The goal of the API is to make all databases that support XML look the
                  same, regardless of whether the underlying storage is native,
                  relational, object-oriented, hierarchical, or whatever else.

                  > Yup. The feature set is very simple to set out:
                  >
                  > 1) Mapping / Design:
                  > 1.1) Build a DB structure from an XML/ tree structure.
                  > 1.2) Build XML from a DB / table based structure
                  >
                  > 2) Operation:
                  > 2.1) Transfer information as fast as possible from XML to an SQL DB
                  > 2.2) Transfer information as fast as possible from an SQL DB to XML.

                  This is a good summary and worth remembering.

                  > Let's be honest....if one were a java programmer then one
                  > could sidestep both transfer & GenerateMap & call DOMtoDBMS etc. yourself
                  > passing in structures which you'd created yourself. Equally you could build
                  > your own transfer engine etc.

                  Agreed.

                  > That's not the person I've been aiming @. I've
                  > been aiming @ the Oracle/DB2/SQLServer/Sybase etc.etc DBA or the guy who
                  > wants to get an answer in XML.

                  Also agreed. I think what took so long to get through my head is that
                  the text-based API is the simplest API and is separate from the lower
                  level APIs, which give more control to people who want it.

                  > RMI probs include non Java apps, firewalling.
                  >
                  > JMS CORBA SOAP would all carry properties files as @ the end of the day they
                  > carry text files & Properties files are just that. You could add servlets +
                  > any other dynamic http protocol.

                  OK. Let's set this aside for the moment. We've got enough to do...

                  > I've been investigating the enhydra schemamapper class for use with
                  > generating class'es / objects such that I can have a GUI app which accepts /
                  > gets sent an XML doc & can then load the relevant class to deal with / map
                  > to the xml doc. In essence XML per se is useless, unless something is done
                  > with it (whether in a GUI or a servlet or whatever). So building / using
                  > something which allows my developers to easily do something with the
                  > resultant XML (& indeed provide XML for use by XMLDBMS) is also important.
                  > Then it struck me (as things do when it's late & I'm tired) that in many
                  > ways the org.xmlmiddleware kinda covered this too.
                  >
                  > i.e one "action" might well be to produce the relevant class'es to deal with
                  > the XML docs produced according to the schema (or indeed to create a new XML
                  > doc) such that you have an SQLDB. You produce the SQL structure you wish to
                  > have mapped. This results in a map file & a schema. What then?
                  > Wellllllll........run that schema through with "action=produceclasses" (or
                  > something similar) & voila you have something which your servlet / GUI
                  > developers can then use. The thought was triggered partly by my own needs &
                  > partly as we may well (you mentioned it sometime back) use the schemamapper
                  > any way & this would allow the use of the same code infrastructure (e.g. the
                  > abstraction of the parsers etc). It would also tie in with moving transfer,
                  > genmap etc into separate classes as all I would be doing would be to add a
                  > "genJava" class....

                  I've thought of this, too, and it's what's behind Castor, Bluestone,
                  Informix's Object Translator, Sun's Project Adelard, and probably some
                  other things I'm not aware of. Personally, I think this is where things
                  will go in the future. Let's face it, transferring data between an XML
                  document and a database is not nearly as interesting as having an
                  intermediate object that you can use to manipulate that data.

                  As for XML-DBMS' involvement in this sort of thing, I've kept clear of
                  it for two reasons. First, there are enough interesting problems in the
                  straight XML <=> DBMS world to keep me busy for a long time. Second, a
                  bunch of other people are already doing this, so I see little point in
                  duplicating other people's work, especially when some of that work is
                  Open Source.

                  That said, I was planning to keep it in mind when designing the map
                  factory for XML schemas, which could form the basis for this sort of
                  code.

                  (By the way, last time I looked at the schemamapper class in Enhydra, it
                  was woefully underpowered. That is, it supported just a tiny fragment of
                  what schemas can do. I assume it will evolve as time goes on, but at the
                  moment, it doesn't do us much good.)

                  --
                  Ronald Bourret
                  Programming, Writing, and Training
                  XML, Databases, and Schemas
                  http://www.rpbourret.com
                • meyappan@yahoo.com
                  Hi: I am just wondering if we have a flat xml that is with no nested relationship, Is it feasible to do bulk loading of xml data into oracle using direct path
                  Message 8 of 9 , Apr 9, 2001
                    Hi:

                    I am just wondering: if we have a flat XML document, that is, one with
                    no nested relationships, is it feasible to bulk load the XML data into
                    Oracle using direct path load?

                    Thanks
                    Meyyappan


                    --- In xml-dbms@y..., Ronald Bourret <rpbourret@r...> wrote:
                    > Pareena Shah wrote:
                    > >
                    > > Question for the people thinking about the new version of XML DBMS: What do
                    > > you think about using something like sqlloader to bulk load transformed XML
                    > > data into an Oracle database? If I have a situation where I am going to be
                    > > processing large volumes of XML data into an Oracle database, and I want to
                    > > optimize by buffering rows, and using Oracle's direct path load
                    > > functionality, is sql loader the best way? Could you comment on the
                    > > advantages/disadvantages?
                    >
                    > This is an interesting idea, although it won't be included in the next
                    > release due to lack of time. (It would require completely rearchitecting
                    > the DOMToDBMS and DBMSToDOM classes.)
                    >
                    > The following discussion is not specific to Oracle's bulk loader, but
                    > discusses how XML-DBMS might do bulk inserts in the future. This assumes
                    > such updates are possible using JDBC, and it is not clear to me that
                    > they are.
                    >
                    > The challenge is this. Suppose we have an XML document that looks like
                    > the following:
                    >
                    > <A>
                    > <A1>...</A1>
                    > <A2>...</A2>
                    > <A3>...</A3>
                    > <A4>...</A4>
                    > <B>
                    > <B1>...</B1>
                    > <B2>...</B2>
                    > <B3>...</B3>
                    > </B>
                    > </A>
                    >
                    > and that this document was mapped to tables A (columns A1-A4) and B
                    > (columns B1-B3) as expected, with the primary key in table A. Now
                    > suppose you have a whole lot of these structures in a single XML
                    > document:
                    >
                    > <root>
                    >   <A>
                    >     <A1>...</A1>
                    >     <A2>...</A2>
                    >     <A3>...</A3>
                    >     <A4>...</A4>
                    >     <B>
                    >       <B1>...</B1>
                    >       <B2>...</B2>
                    >       <B3>...</B3>
                    >     </B>
                    >   </A>
                    >   ...
                    >   <A>
                    >     <A1>...</A1>
                    >     <A2>...</A2>
                    >     <A3>...</A3>
                    >     <A4>...</A4>
                    >     <B>
                    >       <B1>...</B1>
                    >       <B2>...</B2>
                    >       <B3>...</B3>
                    >     </B>
                    >   </A>
                    > </root>
                    >
                    > Currently, the code inserts the row for the first A, then the row for
                    > the first B, then the row for the second A, then the row for the second
                    > B, and so on.
                    >
                    > To use bulk loading, the code would need to buffer rows for A and rows
                    > for B, then insert them when there are a certain number of rows in the
                    > buffer -- say 100. While this probably wouldn't be too bad in the above
                    > case, it could get very complicated in the general case.
                    >
                    > For example, imagine there can be an arbitrary number of B children for
                    > each A parent. Thus, the buffer for B rows would fill up before the
                    > buffer for A rows. However, the code has to be careful about when it
                    > inserts rows. That is, it can't just wait until the buffer for B rows is
                    > full and then just insert them. Because of referential integrity, it has
                    > to insert the A rows before the B rows, so you need to coordinate when
                    > the buffers are emptied. Now, imagine doing this for an XML document
                    > that is nested arbitrarily deep and you'll see that the code is
                    > non-trivial.
                    >
                    > So while this is a good idea and worth looking at in the future, we
                    > don't have time to do it now.
                    >
                    > --
                    > Ronald Bourret
                    > Programming, Writing, and Training
                    > XML, Databases, and Schemas
                    > http://www.rpbourret.com
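The buffer coordination described in the quoted message can be sketched roughly as follows. This is only an illustration, not XML-DBMS code: the class OrderedRowBuffer, its Sink interface, and the "flush every buffer, parents first, as soon as any one buffer fills" policy are all assumptions made for the example. It also assumes each parent row is buffered before its child rows, as in the A/B document above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: buffers rows per table and flushes all buffers parent-first. */
class OrderedRowBuffer {
    /** Receives one batch of rows for one table (for example, a bulk insert). */
    interface Sink {
        void insert(String table, List<String[]> rows);
    }

    // LinkedHashMap keeps the parent-first order given to the constructor.
    private final Map<String, List<String[]>> buffers = new LinkedHashMap<>();
    private final int limit;
    private final Sink sink;

    /** Tables must be listed parents first, e.g. "A" before "B". */
    OrderedRowBuffer(List<String> tablesParentFirst, int limit, Sink sink) {
        for (String t : tablesParentFirst) buffers.put(t, new ArrayList<>());
        this.limit = limit;
        this.sink = sink;
    }

    void add(String table, String... row) {
        List<String[]> buf = buffers.get(table);
        if (buf == null) throw new IllegalArgumentException("Unknown table: " + table);
        buf.add(row);
        // When any buffer fills, flush every buffer so that no child row
        // is inserted before the parent row it refers to.
        if (buf.size() >= limit) flush();
    }

    /** Flushes every non-empty buffer in parent-first order. */
    void flush() {
        for (Map.Entry<String, List<String[]>> e : buffers.entrySet()) {
            if (!e.getValue().isEmpty()) {
                sink.insert(e.getKey(), new ArrayList<>(e.getValue()));
                e.getValue().clear();
            }
        }
    }
}
```

With tables listed as A then B and a limit of 100, a full B buffer causes the pending A rows to be written first and the B rows second. Flushing everything at once is the simplest policy that respects referential integrity; it gives up some batching efficiency for the parent tables, and a deeper hierarchy would need the full parent-to-child ordering of all tables.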
                  • Ronald Bourret
                    Message 9 of 9 , Apr 12, 2001
                      meyappan@... wrote:

                      > I am just wondering: if we have flat XML, that is, with no nested
                      > relationships, is it feasible to bulk load the XML data into
                      > Oracle using direct path load?

                      Is "direct path load" a feature of Oracle? If so, XML-DBMS does not use
                      it.

                      By "flat XML" I assume you mean something like the following:

                      <Table>
                        <Row>
                          <Column1>...</Column1>
                          <Column2>...</Column2>
                          ...
                        </Row>
                        <Row>
                          ...
                        </Row>
                        ...
                      </Table>

                      If this is the case, XML-DBMS is probably overkill, as it is designed
                      especially to work with nested XML. If you want to use a
                      database-specific bulk-load utility, you can probably write your own
                      code to do this fairly easily. Such code would presumably use ODBC
                      (which supports bulk loads) or Oracle's own API.

                      I've attached a rough example of what a SAX version of this code would
                      look like at the end of this message -- you would need to modify it for
                      bulk loading. (I have a vague feeling there is a state error somewhere
                      in this code, but haven't ever run it so I'm not sure.)

                      -- Ron

                      The code to transfer data from XML to the database follows a common
                      pattern, regardless of whether it uses SAX or DOM:

                      1. Table element start: prepare an INSERT statement
                      2. Row element start: clear INSERT statement parameters
                      3. Column elements: buffer PCDATA and set INSERT statement parameters
                      4. Row element end: execute INSERT statement
                      5. Table element end: close INSERT statement

                      The code does not make any assumptions about the names of the tags. In
                      fact, it uses the name of the table-level tag to build the INSERT
                      statement and the names of the column-level tags to identify parameters
                      in the INSERT statement. Thus, these names could correspond exactly to
                      the names in the database or could be mapped to names in the database
                      using a configuration file.
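As a rough illustration of building the INSERT statement from tag names, something like the following might work. The class InsertBuilder and both method names are made up for this example, and it assumes the list of column names is already known (from a configuration file or from database metadata, say), which the message itself does not specify.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: builds a parameterized INSERT from table and column tag names. */
class InsertBuilder {
    /** Returns e.g. "INSERT INTO T (C1, C2) VALUES (?, ?)". */
    static String buildInsertSql(String table, List<String> columns) {
        if (columns.isEmpty()) throw new IllegalArgumentException("No columns");
        StringBuilder names = new StringBuilder();
        StringBuilder marks = new StringBuilder();
        for (String c : columns) {
            if (names.length() > 0) {
                names.append(", ");
                marks.append(", ");
            }
            names.append(c);
            marks.append("?");
        }
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + marks + ")";
    }

    /** Maps each column name to its 1-based JDBC parameter index. */
    static Map<String, Integer> parameterIndexes(List<String> columns) {
        Map<String, Integer> idx = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) idx.put(columns.get(i), i + 1);
        return idx;
    }
}
```

The resulting string can be passed to Connection.prepareStatement, and the index map used to choose the position for PreparedStatement.setString when a column element ends. Because tag names end up in the SQL text as identifiers, they should be checked against a list of known table and column names before use.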

                      Here is the code using SAX:

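                      import java.sql.PreparedStatement;
                      import java.sql.SQLException;
                      import org.xml.sax.Attributes;
                      import org.xml.sax.SAXException;
                      import org.xml.sax.helpers.DefaultHandler;

                      // Loads flat <Table><Row><Column>...</Column></Row></Table> XML.
                      // getInsertStmt() and setParameter() are left to the implementer.
                      // Uses the SAX local name, so the parser must be namespace-aware.
                      public abstract class FlatXMLLoader extends DefaultHandler {
                          // Nesting depth of the current element. Tracking depth (rather
                          // than a single state flag) keeps the second and later columns
                          // and rows from being mistaken for a new row or table.
                          private static final int TABLE = 1, ROW = 2, COLUMN = 3;
                          private int depth = 0;
                          private PreparedStatement stmt;
                          private StringBuffer data;

                          protected abstract PreparedStatement getInsertStmt(String table)
                              throws SQLException;
                          protected abstract void setParameter(PreparedStatement stmt,
                              String column, String value) throws SQLException;

                          public void startElement(String uri, String name, String qName,
                                                   Attributes attr) throws SAXException {
                              depth++;
                              try {
                                  if (depth == TABLE) {
                                      stmt = getInsertStmt(name);   // prepare INSERT
                                  } else if (depth == ROW) {
                                      stmt.clearParameters();       // start a new row
                                  } else if (depth == COLUMN) {
                                      data = new StringBuffer();    // start buffering PCDATA
                                  }
                              } catch (SQLException e) {
                                  throw new SAXException(e);
                              }
                          }

                          public void characters(char[] chars, int start, int length) {
                              if (depth == COLUMN)
                                  data.append(chars, start, length);
                          }

                          public void endElement(String uri, String name, String qName)
                                  throws SAXException {
                              try {
                                  if (depth == COLUMN) {
                                      setParameter(stmt, name, data.toString());
                                  } else if (depth == ROW) {
                                      stmt.executeUpdate();         // insert the row
                                  } else if (depth == TABLE) {
                                      stmt.close();
                                  }
                              } catch (SQLException e) {
                                  throw new SAXException(e);
                              }
                              depth--;
                          }
                      }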
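Since the pattern is said to be the same for SAX and DOM, here is a rough DOM counterpart for comparison. It is a sketch only: the class FlatXmlDomReader and the method readRows are invented for this example, and instead of touching a database it returns each row as a map from column tag name to text, which is the point where the INSERT parameters would be set and the statement executed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: reads flat Table/Row/Column XML with DOM. */
class FlatXmlDomReader {
    /** Returns one map (column tag name -> text) per row element. */
    static List<Map<String, String>> readRows(String xml) {
        Element table;
        try {
            table = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        List<Map<String, String>> rows = new ArrayList<>();
        for (Node r = table.getFirstChild(); r != null; r = r.getNextSibling()) {
            if (r.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
            Map<String, String> row = new LinkedHashMap<>();
            for (Node c = r.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE)
                    row.put(c.getNodeName(), c.getTextContent());
            }
            rows.add(row); // a database version would execute the INSERT here
        }
        return rows;
    }
}
```

The DOM version is shorter because the tree structure replaces the explicit state tracking, but it holds the whole document in memory, which matters for the large files that bulk loading usually involves.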