505 RE: RE: [xml-dbms] All in one answer....

  • adam flinton
    Dec 8, 2000
      > I've been wracking my brains for the last five days about
      > these issues.
      > Although some of my conclusions are changing, some are definitely not.
      > I'm sure you're in the same state. Here's the current state
      > of my head,
      > in order of least inflammatory to most inflammatory:
      >

      Ditto. BTW, sorry for the delay; I've been in London setting up a
      system for Online Grocery Shopping for lastmile.com. Sheesh, talk about
      needing to be able to supplant the current mix/hodgepodge of systems/ways
      of doing things with something simple....I am trying to integrate about 6
      systems ranging from Warehouse Management (with Allpoints.com) through to
      our merchandizing system through to this & that....all different. @ least
      we've managed to fix on Oracle (yuck...) though that was the only real
      choice (2 of the systems are written for Oracle; all Stored Procs are PL/SQL
      etc.etc.etc......). Loooooooooong days......Web System uses this, Merch uses
      that, Warehouse Management uses something else.........Thank God for Coffee &
      tobacco....(well caffeine & nicotine).


      Seriously though, it gives me the distinct feeling that, much as IPX etc gave
      way to TCP, something like XMLDBMS + GUI's bound to XML Docs + some simple
      tested text file messaging (e.g. http/Servlets/JMS etc) is deeply needed
      just to get differing Enterprise Apps ("Best of Breed") talking easily &
      simply with each other. Either that or quite simply somewhere something is
      going to collapse, i.e. either the Software "System" in toto (including
      all the transferring of info) OR no-one outside of large comps with deep
      pockets will be able to afford to build a system (i.e. the supporting
      commercial system might implode).

      Anyway enough of my wittering....

      > 1) I agree that it is useful to have a command line interface to the
      > transfer engine (in particular) and the map engine (to a much lesser
      > extent). This is needed by languages that, for whatever reason, can't
      > call the transfer engine directly.
      >

      Exactly...Oddly enough see my wittering above....Consider trying to link
      together a "macro system / App" where some is in C, some in VB, some in Java,
      some in straight SQL, some in a DB-specific scripting lang such as PL/SQL
      etc.etc.etc. That's reality & the concept of "One Language One People" is
      unlikely to come in my lifetime....

      How should I put it......text messaging is going to succeed because of
      Language Differences. XML = text / string. SQL = text / string. Thus IMHO an
      app which maps XML <> SQL MUST have a text interface.

      > After bashing my head around on this, I've decided that you're right
      > that a list of property files and properties is the best way
      > to do this.
      > It gives people flexibility and the tools to write clean
      > calls. It also
      > (in my opinion) gives them rope to hang themselves, but we
      > can take care
      > of that in documentation.
      >

      Yup. Heck a user can always type format c:.

      > As to the exact syntax, I suggest one small change, and that is to
      > replace the list of files separated by ^ with a special File property,
      > which indicates that the value is the name of a property file.
      > Properties are read from left to right and, in case of a
      > duplicate, the
      > last value read is the value used. The syntax is therefore as follows:
      >
      > Transfer <property>=<value>...
      > GenerateMap <property>=<value>...
      >
      > For example:
      >
      > Transfer File=MyStuff.prop
      > Transfer File=parser.prop File=database.prop File=action.prop
      > Transfer File=parser.prop File=database.prop Action=storeDocument
      > XMLDocument=foo.xml Map=foo.map
      >

      Done Deal. I'll get that done ASAP. To which end could you cast your eyes
      over the textvalues.txt & (a) add anything which is missing, (b) take out
      anything unnecessary, (c) check the names of the Key values e.g. XMLDocument
      or Map or whatever.
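
      A minimal sketch of how that left-to-right syntax might be parsed in Java.
      The class name PropertyArgs is hypothetical and not part of XML-DBMS; only
      the File property and the "last value read wins" rule come from the syntax
      quoted above.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper for illustration; not part of XML-DBMS itself.
class PropertyArgs {

    // Reads <property>=<value> tokens left to right. A later value replaces
    // an earlier one. The special File property names a property file whose
    // entries are merged in at that position.
    static Properties parse(String[] args) throws IOException {
        Properties props = new Properties();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException(
                        "Expected <property>=<value> but got: " + arg);
            }
            String name = arg.substring(0, eq);
            String value = arg.substring(eq + 1);
            if (name.equals("File")) {
                try (InputStream in = new FileInputStream(value)) {
                    props.load(in); // loaded entries overwrite earlier duplicates
                }
            } else {
                props.setProperty(name, value);
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // e.g. java PropertyArgs File=parser.prop Action=storeDocument
        parse(args).list(System.out);
    }
}
```

      Because files and plain properties go through the same loop, a property
      given after a File= argument overrides whatever that file set, and vice
      versa.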

      > 2) Transfer and GenerateMap should be separate classes and call the
      > transfer engine and map engine, respectively. These should be separate
      > classes (a) so that it is clear to users what they are doing and (b)
      > because I think they will evolve separately in the future. In
      > particular, I expect that the transfer engine will become a
      > multi-threaded server process and will need to be called in a very
      > different manner than the map engine, which will probably only be used
      > in-process.
      >


      OK. No Probs.

      > Furthermore, we should have a separate utility for generating property
      > files that has a command line interface of the form:
      >
      > GeneratePropFile <property-file-name> <property>=<value>...
      >
      > All three of these (Transfer, GenerateMap, and GeneratePropFile) can
      > obviously be derived from a single base class that processes
      > the list of
      > properties and files -- the only difference will be the main
      > method and
      > what class they then call.
      >


      OK.
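
      A rough sketch of that single-base-class idea, shown for GeneratePropFile
      only. The names PropertyTool, collect and execute are assumptions made up
      for illustration, and File= handling is left out to keep it short.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

// GeneratePropFile <property-file-name> <property>=<value>...
class GeneratePropFile extends PropertyTool {
    private final String fileName;

    GeneratePropFile(String fileName) {
        this.fileName = fileName;
    }

    // This tool's "action" is simply to save the collected properties.
    @Override
    protected void execute(Properties props) throws IOException {
        try (OutputStream out = new FileOutputStream(fileName)) {
            props.store(out, "Generated by GeneratePropFile (sketch)");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println(
                    "usage: GeneratePropFile <property-file-name> <property>=<value>...");
            return;
        }
        GeneratePropFile tool = new GeneratePropFile(args[0]);
        tool.execute(tool.collect(args, 1));
    }
}

// Hypothetical shared base: gathers the properties, then defers to the tool.
// Transfer and GenerateMap would differ only in main() and execute().
abstract class PropertyTool {
    protected abstract void execute(Properties props) throws IOException;

    // firstProperty is the index of the first <property>=<value> argument.
    final Properties collect(String[] args, int firstProperty) {
        Properties props = new Properties();
        for (int i = firstProperty; i < args.length; i++) {
            int eq = args[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException(
                        "Expected <property>=<value> but got: " + args[i]);
            }
            // setProperty replaces any earlier value for the same name
            props.setProperty(args[i].substring(0, eq), args[i].substring(eq + 1));
        }
        return props;
    }
}
```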

      > 3) What worries me about the command line interfaces is that,
      > while they
      > work with our current feature set, they can't support all possible
      > functionality. For example, what happens if we want a method
      > to return a
      > value? Do we write it to stdout?
      >

      This is the same as the CLI vs API debate. In essence unless you are
      programming in Java or somehow using CORBA or whatever then you'll be in
      trouble no matter what. As an example how do you return an int from Java to
      VB? To Perl? etc.etc. I know this might seem like repetition
      however.......we could always write it out to text in that case. Personally
      I reckon that if you want to do this efficiently then it means either using
      Java & prog'ing to the public methods or using something like CORBA &
      prog'ing to the public methods via that.


      > The specific case I am thinking about is when a Web application calls
      > the transfer engine to get an XML document. Currently, our
      > API/property
      > set only allows you to write the document to disk. This is inefficient
      > and we should be able to stream the document directly back to the
      > application as XML.
      >

      This is very true & is something which I've given some thought to (esp re
      servlets (I am playing with using servlets for messaging.....no one ever
      said that servlets need to produce / accept just HTML or indeed that their
      output needs to be "visible")). It is almost the same as where do you get
      the file from / put it to. E.g. let's imagine that you want to send the
      resulting doc somewhere via HTTP PUT. My initial answer (& it remains the
      same right now) is that it simply means adding stuff to the XML writing
      methods (or possibly even moving the file read / write out to a separate
      class as in writeFile(File,location) sort of thing).
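
      One way the "separate class" idea could look, as a sketch only. The
      interface DocumentTarget and both implementations are hypothetical names,
      not anything in the current code; the point is just that the engine
      writes to a target and does not care whether it is a file or a stream
      going straight back to the caller.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DocumentOutputDemo {
    public static void main(String[] args) throws IOException {
        String xml = "<order id=\"1\"/>";
        // Stream straight to the caller (stdout here; could be any stream).
        new StreamTarget(System.out).write(xml);
        // Or write to disk, as the current property set allows.
        Path file = Files.createTempFile("order", ".xml");
        new FileTarget(file).write(xml);
    }
}

// Hypothetical abstraction over "where the resulting document goes".
interface DocumentTarget {
    void write(String xml) throws IOException;
}

// Writes the document to a file on disk.
class FileTarget implements DocumentTarget {
    private final Path path;

    FileTarget(Path path) {
        this.path = path;
    }

    public void write(String xml) throws IOException {
        Files.write(path, xml.getBytes(StandardCharsets.UTF_8));
    }
}

// Writes the document to any OutputStream the caller supplies.
class StreamTarget implements DocumentTarget {
    private final OutputStream out;

    StreamTarget(OutputStream out) {
        this.out = out;
    }

    public void write(String xml) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(xml);
        writer.flush(); // flush but do not close: the caller owns the stream
    }
}
```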

      > The general case I am thinking about is that the ultimate API for the
      > transfer engine is quite likely to be the XML database API we're
      > developing on the XML:DB mailing list. (You might want to join -- the
      > discussion is quite good. See www.xmldb.org.) Although similar to our
      > current API, it's at a slightly lower level, and would not be useable
      > through a command line.
      >


      I'll have a look round....My only problems with XML DBs per se are:

      A) Most of the world's data is & will remain in SQL table structures (i.e.
      relational not tree based)
      B) A number of very good tree based DBs exist such as Cache which have
      been built (& optimised) over many years & in essence the 2 are the same
      thing.

      > This makes me think that the command line feature set is likely to be
      > different from the transfer engine feature set. In particular, it will
      > be limited to actions that make sense from a command line.
      >

      Yup. The feature set is very simple to set out:

      1) Mapping / Design:
      1.1) Build a DB structure from an XML/ tree structure.
      1.2) Build XML from a DB / table based structure

      2) Operation:
      2.1) Transfer information as fast as possible from XML to an SQL DB
      2.2) Transfer information as fast as possible from an SQL DB to XML.



      > 4) At some level, the transfer engine and the map engine must have an
      > API similar to what I proposed earlier. That is, whether this is a
      > public API or is hidden behind a dispatch layer and called through
      > properties, we still need to specify what functionality the engines
      > expose. Therefore, we need to solidify these APIs, regardless of how
      > they are called. I will continue discussion of these in
      > separate email.
      >


      Yup. Public Methods. If it ain't a public method & you're not using Java
      then you're passing in text/string

      > 5) The area where we just don't seem to agree is the API for the
      > transfer engine and the map engine. You want a dispatch-style
      > interface,
      > I want explicit methods. I'm not sure what to do about the
      > impasse here.
      >


      I am absolutely happy with public methods for transfer etc. All I want is to
      make XMLDBMS as accessible / useable to a non-Java person (e.g. a DBA) as to
      a Java programmer. Let's be honest....if you were a Java programmer then you
      could sidestep both transfer & GenerateMap & call DOMtoDBMS etc. yourself,
      passing in structures which you'd created yourself. Equally you could build
      your own transfer engine etc. That's not the person I've been aiming @. I've
      been aiming @ the Oracle/DB2/SQLServer/Sybase etc.etc. DBA or the guy who
      wants to get an answer in XML. Yes, Java coders are obviously welcome but @
      the end of the day Java is a processing medium through which information is
      passing. It is simply the data's transitory state. Obviously I'd like the
      public methods to be as easily useable / accessible to a Java programmer as
      possible, but mostly because that guy is likely to be me.....& I'm lazy, so
      being able to do something by saying import org.xmlmiddleware.* + say five
      lines of code is what I'm after (as a Java developer). However......we have
      to expect that a large number of possible users would know XML and/or SQL
      without ****needing**** to know Java.


      > I am absolutely adamant that we need an API with explicit methods. The
      > first and foremost reason for this is readability in the
      > code. It is the
      > reason that SAX, DOM, JDBC, ODBC, OLE DB, Oracle CLI, ADO, the Windows
      > API, JAXP, and hundreds of other APIs consist of multiple, explicit
      > methods, as opposed to a single dispatch method.
      >

      Yup indeedy. However as an example the Oracle CLI does not require that you
      know C or C++ simply because Oracle is written in that. Ditto OLEDB, ODBC
      etc.etc.

      I have no problem with someone calling transfer xml map etc.


      > In fact, the only APIs I have ever seen that use dispatch methods are
      > things like OLE Automation, Java Reflection, and program-to-program
      > communication APIs. What all of these have in common is that
      > the actual
      > methods being called are not known until run time. That is
      > not the case
      > with us.
      >

      That's simply not true. SQL is the best example. What's DB2 written in?
      What's Oracle written in? Do I care? Do I compile SQL?

      Think of a database trigger written in SQL.


      > (A distant second reason for an explicit API is speed, but I'm sure we
      > could argue forever about which style is faster and could each find
      > cases in which one was faster than the other. I think we also
      > agree that
      > the difference is negligible compared to total processing time.)
      >

      If you really wanted speed in the API then quite simply you don't want a
      call transfer xyz etc on the command line. Instead you'd want a transfer
      which accepted a file object (or a number of them) directly in Java within
      your app.

      The moment you bring a CLI into it then you have string handling.


      > Note that I view this API as separate from anything used for
      > program-to-program communication. In particular, it should be possible
      > for an application to call this API regardless of whether the transfer
      > engine is running in process or as a separate application.
      > Obviously, a
      > program-to-program communication API and some sort of driver
      > are needed
      > in the latter case.
      >

      Public methods are fine with me.

      > 6) How do we do program-to-program communication when the transfer
      > engine is running as a server? One possibility is certainly
      > to write our
      > own API and use properties as a wire protocol, but I'm not convinced
      > this is the way to go. (I have nothing against it -- I'm just
      > ignorant.)
      >

      See below:



      > In particular, are we reinventing the wheel here? What are the
      > advantages / disadvantages of this over using RMI, JMS, SOAP, CORBA,
      > sockets, or who knows what other technologies? Also, can the transfer
      > engine be agnostic about this? That is, can we just write drivers for
      > each protocol we choose to support? (This would be my first
      > choice if it
      > was possible.)
      >

      RMI probs include non Java apps, firewalling.

      JMS CORBA SOAP would all carry properties files as @ the end of the day they
      carry text files & Properties files are just that. You could add servlets +
      any other dynamic http protocol.
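
      A small sketch of why any text-carrying transport would do: a Properties
      object round-trips through a plain string using only java.util.Properties.
      The class name PropertyMessage is made up for illustration, and the
      transport itself (JMS, SOAP, servlet, whatever) is deliberately left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical helper: turns a property set into text and back again.
class PropertyMessage {

    // Serialises the properties in the standard .properties text format.
    static String toText(Properties props) throws IOException {
        StringWriter writer = new StringWriter();
        props.store(writer, null); // null = no header comment
        return writer.toString();
    }

    // Rebuilds the properties from text received over any transport.
    static Properties fromText(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties request = new Properties();
        request.setProperty("Action", "storeDocument");
        request.setProperty("XMLDocument", "foo.xml");
        request.setProperty("Map", "foo.map");

        String wireText = toText(request);          // what would be sent
        Properties received = fromText(wireText);   // what the server would see
        System.out.println(received.equals(request));
    }
}
```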


      > Note that, after we define/choose a way to do program-to-program
      > communication, I have no objections to applications intercepting this
      > and talking to the transfer engine directly through the wire protocol.
      > However, I'm not going to encourage this.
      >
      > Well, it's 1:00 AM and I can't think of anything else to say. I hope
      > this spurs some ideas in you and we can solve this problem.
      >

      Some ideas which I've also been working on via my sometimes circuitous brain
      paths......

      I've been investigating the enhydra schemamapper class for use with
      generating classes / objects such that I can have a GUI app which accepts /
      gets sent an XML doc & can then load the relevant class to deal with / map
      to the xml doc. In essence XML per se is useless unless something is done
      with it (whether in a GUI or a servlet or whatever). So building / using
      something which allows my developers to easily do something with the
      resultant XML (& indeed provide XML for use by XMLDBMS) is also important.
      Then it struck me (as things do when it's late & I'm tired) that in many
      ways the org.xmlmiddleware kinda covered this too.

      i.e. one "action" might well be to produce the relevant classes to deal with
      the XML docs produced according to the schema (or indeed to create a new XML
      doc). Say you have an SQL DB. You produce the SQL structure you wish to
      have mapped. This results in a map file & a schema. What then?
      Wellllllll........run that schema through with "action=produceclasses" (or
      something similar) & voila you have something which your servlet / GUI
      developers can then use. The thought was triggered partly by my own needs &
      partly as we may well (you mentioned it sometime back) use the schemamapper
      anyway & this would allow the use of the same code infrastructure (e.g. the
      abstraction of the parsers etc). It would also tie in with moving transfer,
      genmap etc into separate classes as all I would be doing would be to add a
      "genJava" class....

      i.e it would be an all in one design / builder tool starting with either a
      schema or an SQL structure and resulting in the means to get XML<>SQL &
      XML<>Java processing (GUI client etc.)

      OK nuff said.

      Right then actions:

      Could you review the textvalues.txt as a start.




      Adam