
#484: All in one answer....

  • Adam Flinton
    Dec 1, 2000
      Dear Ron,

      Gee, thanks. This kept me up until roughly 2-2.30 AM......... lots of blue
      biro on the printouts...... <G>.

      OK So in order:

      "API v. properties
      Proposed architecture
      Proposed GeneratePropertyFile command line
      Proposed transfer engine API
      Proposed transfer engine command line
      Proposed GenerateMap API
      Proposed GenerateMap command line"


      Some initial points, which cover a large number of questions:

      1) Fundamentally there are 2 sorts of API

      a) The Command Line interface (Cross platform)
      b) Public Methods exposed for use by Java developers

      2) Properties objects / files seem to be the simplest ML imaginable. In Java
      everything is an object. In Unix (& in reality in most OSes) everything is a
      file. A Properties file bridges the gap. It is both an object & a file, & it
      is not just any object..... it resolves to a very useful hashtable with a
      built-in write-to-file mechanism. What is more, properties files can be
      merged with both ease & speed to deliver a single Properties object, & a
      propfile could be written out by nearly every programming language there
      is.

      Think of them as (quite literally) equivalents of config.sys or
      autoexec.bat.
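      To make the "object & file" point concrete, here is a minimal sketch using
      only java.util.Properties: text is parsed into a Properties object, two
      objects are merged with putAll (later values win), & the result is written
      back out as text. PropMerge is a made-up name for illustration, not part
      of XML-DBMS.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper illustrating Properties as both an object and a file.
class PropMerge {

    // Parse text in the usual key=value propfile format.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Merge left to right: keys in 'override' replace the same keys in 'base'.
    static Properties merge(Properties base, Properties override) {
        Properties merged = new Properties();
        merged.putAll(base);
        merged.putAll(override);
        return merged;
    }

    // Write back out as propfile text; passing a FileWriter instead would write a file.
    static String toText(Properties p) {
        StringWriter out = new StringWriter();
        try {
            p.store(out, "merged propfile");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Properties generic = parse("parser=de.tudarmstadt.ito.domutils.Trans_Xerces\nuser=guest\n");
        Properties mine = parse("user=adam\n");
        System.out.print(toText(merge(generic, mine)));
    }
}
```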

      > Given the use cases, I think we need two different APIs to the transfer
      > engine and GenerateMap. The first is a normal API for programmatic
      > access. The second is a command line API for batch access. (What do
      > stored procedures use?)
      >

      I reckon the cmd line should be used simply to create a propfile / amend a
      propfile.

      > Both the API and the command line can take two forms:
      >
      > 1) Individual methods or command line arguments. For example:
      >
      > storeDocument(mapfile, xmldoc, commitMode, keyGeneratorClassName);
      >

      This is definitely possible for a Java programmer anyway, supposing
      storeDocument is a public method.

      > or:
      >
      > java Transfer mymap.map mydoc.xml afterinsert KeyGeneratorImpl
      >

      Look @ the initial string handling costs. I went down this route initially;
      however.........discoveries:

      Let's imagine you want to pass in a new variable, e.g. CommitMode. If the CLI
      is done via specific keywords or in a specific order, then the initial
      args[] string handling methods would have to be amended every time a change
      was made. Why do that when, as it is, all you need to do @ the mo is put in
      CommitMode=auto or whatever?
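      A minimal sketch of the key=value approach, assuming every argument has the
      form key=value (e.g. CommitMode=auto). KeyValueArgs is a hypothetical name.
      Nothing in the loop knows about CommitMode, which is the point: a new
      setting needs no change to the argument handling.

```java
import java.util.Properties;

// Hypothetical sketch: every command-line argument is a key=value pair.
class KeyValueArgs {

    static Properties fromArgs(String[] args) {
        Properties props = new Properties();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + arg);
            }
            // Later arguments with the same key override earlier ones.
            props.setProperty(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return props;
    }

    public static void main(String[] args) {
        // e.g. java KeyValueArgs CommitMode=auto user=adam
        System.out.println(fromArgs(args));
    }
}
```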

      Simply generating a propfile & then having methods extract the info from it
      as & when required is extremely useful, as it gives you an instant lookup
      facility where the lookup table is instantly writable. As an example, if
      using X-D as an EJB one might have propfiles instantiated as Entity Beans
      for reference by lots of X-D processes. The propfile itself is
      subdividable..... e.g. let us imagine a situation where 90% of any propfile
      is "generic" to either the company or the app (e.g. "We are going to
      standardize on Xerces 1.1.2 & Oracle"). Equally, the JDBC address of the DB
      might require resolution via LDAP or whatever. As such you might only need
      to provide username & password. This, to a degree, is what my ideas on
      "naming" propfiles are predicated upon.
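      One way to get the "90% generic" layering with nothing but the JDK is the
      Properties(Properties defaults) constructor: getProperty falls back to the
      defaults when a key is not set locally. The key names & the JDBC URL below
      are illustrative assumptions, not settings XML-DBMS actually defines.

```java
import java.util.Properties;

// Hypothetical sketch: shared defaults underneath a small per-user layer.
class LayeredProps {

    // Company- or app-wide settings, maintained once.
    static Properties companyDefaults() {
        Properties d = new Properties();
        d.setProperty("parser", "de.tudarmstadt.ito.domutils.Trans_Xerces");
        d.setProperty("jdbcUrl", "jdbc:oracle:thin:@dbhost:1521:orcl"); // hypothetical URL
        return d;
    }

    // The user supplies only what differs; every other lookup falls
    // through to 'defaults' because of the Properties(defaults) constructor.
    static Properties userLayer(Properties defaults, String user, String password) {
        Properties p = new Properties(defaults);
        p.setProperty("user", user);
        p.setProperty("password", password);
        return p;
    }

    public static void main(String[] args) {
        Properties p = userLayer(companyDefaults(), "adam", "dobbin");
        System.out.println(p.getProperty("user") + " via " + p.getProperty("parser"));
    }
}
```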

      > 2) A properties file. For example:
      >
      > execute(props);
      >
      > or:
      >
      > java Transfer myprops.prop
      >

      Nice & simple eh?

      I really want to retain the ability to feed in a propfile as a single string
      if required, e.g. user=adam password=dobbin action=transfer
      parser=de.tudarmstadt.ito.domutils.Trans_Xerces
      nq=de.tudarmstadt.ito.domutils.NQ_DOM2 etc.

      I also want to be able to pass in multiple Propfiles within that string.

      Part of the reason has to do with Servlets & their love of long single
      strings.
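      A sketch of feeding in a whole propfile as one string, assuming pairs are
      separated by whitespace & no value contains a space. It turns the spaces
      into line breaks & reuses Properties.load, so the usual propfile escaping
      rules still apply. SingleStringProps is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: a propfile passed as one space-separated string.
class SingleStringProps {

    static Properties parse(String line) {
        Properties props = new Properties();
        try {
            // One pair per line is the normal propfile layout, so convert
            // runs of whitespace into newlines and let Properties.load parse it.
            props.load(new StringReader(line.trim().replaceAll("\\s+", "\n")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(parse("user=adam password=dobbin action=transfer"));
    }
}
```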

      > Currently, choice (2) is implemented for both the API (on XML-DBMS) and
      > the command line. I strongly prefer choice (1) for the API. This is
      > because the code is easier to read (and thus easier to learn and
      > maintain) and because the code is a bit faster. However, as you will see
      > in the proposed APIs, what I'm suggesting is a bit of a mix --
      > properties for the database and parser and methods for actions.
      >

      Choice 1 gives you a much larger & more unwieldy initial string / arg
      handling main method (or group of methods). I am not sure that the code
      would be that much faster in the long run. If all you are providing is a
      propfile reference (either as a file from disk or from some sort of
      centrally held "in memory" hashtable), then whether choice 1 gets you to
      actually churning XML<>SQL any faster is fairly debatable. In terms of
      Java's speed (esp. with JIT & HotSpot) the main drag is object creation &
      disposal. With a Properties object you build it once & then simply quiz it.
      If you're interested in "readable code" (I too am a simple person who likes
      simple things), then it allows you to dispense with creating tons of string
      & other objects until you actually need them. It also allows you to
      centralise various IO functions (such as finding out whether the file
      exists & if not what to do about it, & if so....). As an example, I would
      prefer not to pass a string filename into the various methods below such as
      DOM2DBMS, but instead simply a File object which has already been opened &
      checked.
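      A sketch of "resolve & check once, then pass a File": the hypothetical
      helper below turns a filename into an absolute File & fails early if it is
      not a readable regular file, so downstream methods (such as DOM2DBMS) never
      have to repeat the check.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;

// Hypothetical sketch: centralise the "does it exist, can I read it" checks.
class CheckedFiles {

    static File requireReadable(String filename) {
        File f = new File(filename).getAbsoluteFile();
        if (!f.isFile() || !f.canRead()) {
            // Wrapped so callers are not forced to catch a checked exception.
            throw new UncheckedIOException(new FileNotFoundException("Cannot read " + f));
        }
        return f;
    }
}
```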

      > For the command line, I initially leaned towards (1) but, after some
      > thought, figured we could support a variation of (2) as well. (I want
      > separate files for database info, parser info, and actions, but the
      > actions file can point to the other two.) This allows people who use the
      > command line (generally non-programmers) a degree of flexibility in how
      > they set things up.
      >

      You can have as many property files as you wish. @ the end of the process
      you should get a single propfile out (if you want it written out).

      Look on any propfiles you pass in as templates. If you want
      "^xerces122prop.txt^db2AdamsDBprop.txt^otherstuff.text", that's entirely
      fine.
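      A sketch of one way such a caret-separated list could be read. The "^"
      separator is taken from the example above, but the loading rule (files
      applied left to right, later files overriding earlier ones) is an
      assumption made for illustration, as is the class name PropfileChain.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical sketch: "^a.txt^b.txt" loads a.txt, then b.txt on top of it.
class PropfileChain {

    static Properties load(String caretList) {
        Properties merged = new Properties();
        for (String name : caretList.split("\\^")) {
            if (name.trim().isEmpty()) {
                continue; // a leading '^' produces an empty first entry
            }
            // Propfiles are traditionally ISO-8859-1 encoded.
            try (Reader in = Files.newBufferedReader(Paths.get(name.trim()), StandardCharsets.ISO_8859_1)) {
                merged.load(in); // adds to 'merged', overwriting keys already present
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return merged;
    }
}
```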

      Proposed Arch:
      "

      GeneratePropertyFiles


      TransferCMDLine      GUI      GenerateMapCMDLine
              \           /   \           /
               \         /     \         /
             TransferEngine     GenerateMap
                   |                 |
            DOMToDBMS, etc.     map factories

      Notice three things:

      1) GeneratePropertyFiles is a separate, utility process and does not
      call the transfer engine, etc. For more info, see email Proposed
      GeneratePropertyFile command line."

      I have no problem with that per se; however, I would like to retain the
      ability to have GeneratePropFiles (BTW we need another name for that, as GPF
      has bad connotations for anyone who has used Windows.....) execute either
      GenMap or Transfer.

      "2) The transfer engine and GenerateMap are separate processes and do not
      contain their command line interfaces. This is particularly important in
      the case of the transfer engine, which may some day evolve into a
      standalone server-type process."

      You're right.

      I have been giving mucho thought to this.

      Questions

      (1) How would you want to communicate with such a server engine? IMHO it
      must be done via std file / string passing mechanisms (e.g. HTTP/servlets,
      text messaging (JMS/SOAP/MQSeries/MSMQ)). The last thing we need is to
      require some port other than 80/8080. A properties file is a text file.
      Remember that one could pass the propfile to a servlet / EJB & then it could
      use that as a Session bean (i.e. specific to you but maintained as long as
      you're hooked up to the app server), which in itself is again writable.

      (2) The GenMap class is something that in production systems would
      (hopefully) only be used by the "back-office guys" & the maps would be
      backed up & possibly even stored within a DBMS itself. The last thing you'd
      want is for someone to overwrite a map file without knowing what they're
      doing.

      "3) I've drawn the GUI as calling the transfer engine / generate map APIs
      directly. This is intentional, but an early implementation can go
      through the command line classes if that's easier for now."


      I would see it more as

                     GUI
                      |
                   CMDLine
                      |
            GeneratePropertyFiles
               |             |
        TransferEngine   GenerateMap
               |             |
        DOMToDBMS, etc.  map factories


      Note that if the transfer engine (or even DOMToDBMS) is done as a set of
      public classes & public methods, then someone could always call them
      directly (assuming that map filenames etc. were passed in as strings).

      No probs with that.
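      A sketch of that layering, with made-up class & property names
      (TransferEngineSketch, TransferCmdLineSketch, mapfile, action): the engine
      is a plain public class driven by a Properties object, & the command-line
      class does nothing but turn text into Properties & delegate. A Java
      programmer can skip the CLI & build the Properties directly. The engine's
      run() is only a placeholder for the real DOMToDBMS work.

```java
import java.util.Properties;

// Hypothetical engine: an ordinary class, usable without any command line.
class TransferEngineSketch {
    private final Properties props;

    TransferEngineSketch(Properties props) {
        this.props = props;
    }

    // Placeholder: reports what it would do instead of transferring data.
    String run() {
        String action = props.getProperty("action", "transfer");
        String map = props.getProperty("mapfile");
        if (map == null) {
            throw new IllegalStateException("mapfile property is required");
        }
        return action + " using " + map;
    }
}

// Hypothetical thin CLI: key=value arguments in, engine call out.
class TransferCmdLineSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq > 0) {
                props.setProperty(arg.substring(0, eq), arg.substring(eq + 1));
            }
        }
        System.out.println(new TransferEngineSketch(props).run());
    }
}
```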

      Subject: Proposed GeneratePropertyFile command line

      > In the proposed architecture, GenProp is a utility for generating
      > property files. We need this because property files do not appear to be
      > hand-editable. In particular, the things you saw being escaped on your
      > system (Linux?) appear to be different from what I saw being escaped on
      > my system (Win95).
      >

      Actually they are hand-editable. I built the initial files which I passed
      on to you just using Notepad. I have been doing the dev work on Linux, Win98
      & Win NT.

      There are some minor points: use \\ instead of \, e.g. d:\myfile is best
      written as d:\\myfile

      xmlfileout=D:\\Move\\xd\\t54.xml

      But that's about it.
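      A small sketch showing what Properties.load actually does with the two ways
      of writing the path above. With doubled backslashes the value comes back as
      D:\Move\xd\t54.xml; with single ones, \M & \x silently lose their backslash
      & \t turns into a tab. Letting Properties.store write the file avoids the
      problem, since it escapes backslashes itself, & forward slashes
      (D:/Move/xd/t54.xml) are also accepted by java.io.File on Windows.
      BackslashDemo is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: how propfile text containing Windows paths is read back.
class BackslashDemo {

    static String valueOf(String propText, String key) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty(key);
    }

    public static void main(String[] args) {
        // File text: xmlfileout=D:\\Move\\xd\\t54.xml  (doubled backslashes)
        System.out.println(valueOf("xmlfileout=D:\\\\Move\\\\xd\\\\t54.xml", "xmlfileout"));
        // File text: xmlfileout=D:\Move\xd\t54.xml  (single backslashes, mangled)
        System.out.println(valueOf("xmlfileout=D:\\Move\\xd\\t54.xml", "xmlfileout"));
    }
}
```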

      > Note that GeneratePropertyFile will not call Xmldbms/Transfer/etc. That
      > is, running stuff from the command line is a two-step process. You call
      > GeneratePropertyFile to generate one or more properties files, then you
      > actually call the transfer engine or GenerateMap. This is in line with
      > the notion that people in general will generate properties once and make
      > multiple calls (such as in a batch file) to the Transfer engine with a
      > fixed property file. If they're just playing, they'll use the GUI.
      >

      GPF should be able to call Xmldbms/Transfer/etc. At the moment, whether it
      does or not is toggle-able....

      This is, if anything, to make it easy for people to play around by both
      generating the property file & having it do something that they can then
      admire....<G>. However, in a production system you should be able to load
      up a ready-to-go properties file, amend it with, say, user info etc. & thus
      go straight into transfer.


      > I suggest the following:
      >
      > GeneratePropertyFile output-filename [-f property-filename | -p
      > property-value-pair] ...
      >
      > Properties are processed in the order encountered (left to right) with
      > each duplicated property overriding its predecessor.
      >

      What's the need?

      IMHO reasonable aims should be

      A) To keep it as simple as possible
      B) To keep the entry points as text based as possible.
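      For comparison, the left-to-right override rule in the quoted proposal is
      easy to sketch. GenPropArgs is a hypothetical name; the loader function
      stands in for reading a property file from disk, which keeps the sketch
      testable without files, & writing the result to the output filename is
      left out.

```java
import java.util.Properties;
import java.util.function.Function;

// Hypothetical sketch of: output-filename [-f property-filename | -p key=value] ...
class GenPropArgs {

    // args[0] is the output filename; the rest are "-f name" or "-p key=value"
    // pairs, applied left to right so each duplicate key overrides its predecessor.
    static Properties build(String[] args, Function<String, Properties> loader) {
        if (args.length == 0) {
            throw new IllegalArgumentException("usage: output-filename [-f file | -p key=value] ...");
        }
        Properties result = new Properties();
        for (int i = 1; i < args.length; i += 2) {
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value after " + args[i]);
            }
            String flag = args[i];
            String value = args[i + 1];
            if (flag.equals("-f")) {
                result.putAll(loader.apply(value));
            } else if (flag.equals("-p")) {
                int eq = value.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("Not key=value: " + value);
                }
                result.setProperty(value.substring(0, eq), value.substring(eq + 1));
            } else {
                throw new IllegalArgumentException("Unknown option " + flag);
            }
        }
        return result;
    }
}
```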


      Subject: Proposed transfer engine API


      > I took a look at the capabilities of DBMSToDOM and DOMToDBMS. These are
      > pretty much exposed by Transfer/TransferResultSet today, the only
      > exception being that you can't currently specify the commit mode and
      > KeyGenerator class. A suggested API is therefore:
      >

      A Perfect example of when & why to use the properties / value pairs
      approach. Write it once & then you have an adaptable CLI without changing a
      damn thing.

      > public void setDatabaseProperties(Properties props);
      > public void setDatabaseProperties(String propFilename);
      > // user and password can also be set as properties
      > public void setUserInfo(String userName, String password);
      >
      > public void setParserProperties(Properties props);
      > public void setParserProperties(String propFilename);
      >


      At the end of the day all these are strings. This is exactly where I got to
      before I decided that it would get to be unreadable if done properly (i.e.
      every field being private but with associated public get & set methods).

      You simply don't need them, as everything in the above list would be
      required, so fetching them out of a hashtable/Properties object would
      (IMHO) probably be faster, as you don't need to go round creating string
      objects, filling them, using the results & then disposing of them.

      > public void storeDocument(String mapFilename,
      > String xmlFilename,
      > int commitMode,
      > String keyGeneratorClassName);
      >
      > public void retrieveDocument(String mapFilename,
      > String xmlFilename,
      > String sqlStatement);
      > public void retrieveDocument(String mapFilename,
      > String xmlFilename,
      > String tableName,
      > Object[] keyValues);
      > public void retrieveDocument(String mapFilename,
      > String xmlFilename,
      > String[] tableNames,
      > Object[][] keyValues);
      >
      > There are several problems with this API:
      >

      They seem fine to me.

      > 1) It assumes that the transfer engine is in the same process space as
      > the calling application. For the moment, this is OK with me, as I think
      > we've got enough work to do without worrying about interprocess
      > communication. In the future, the transfer engine might reasonably run
      > as some sort of server process.
      >

      If you wanted to deal with Transfer as a piece of code built into your code
      (Transfer x = new Transfer(); sort of thing), then it would indeed be in the
      same process space. (I suppose you could go down the RPC/RMI kind of route,
      but why bother?) In theory what we're going to be doing is producing /
      consuming XML docs. As a result, a "client" might simply request XML docs
      from an HTTP server & return them via HTTP, or might even request a certain
      doc via a single string (e.g. to a servlet). There is no reason for the
      client to interact with the server engine except via text (file or string).


      > 2) It does not reuse information (notably maps). This could be solved by
      > removing the mapFilename argument from the existing methods and having a
      > setMap method:
      >

      Oddly enough I have thought about this (notably by considering whether map
      files etc. should be resolved & loaded as a file once & then passed in as a
      file, not as a string which must then be resolved to a file, found, loaded
      & then acted upon).

      > public void setMap(String mapFilename);
      >

      I was thinking more of

      public File setMap(String mapFilename);

      > One question this raises is how an application uses multiple maps --
      > that is, how to avoid recompiling maps just because you use map A, then
      > map B, then A again, etc. One possibility is to return a map ID/handle
      > that is then passed in to the store/retrieve methods. We would also then
      > need a releaseMap method.
      >
      Yup. I know it may sound harsh....but we could always construct a HashMap &
      fill it with different map files....
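      A sketch of the HashMap idea, which also covers the quoted releaseMap
      suggestion. T stands for whatever compiled map object the map factory
      produces; the compile function is a placeholder for that factory call, &
      MapCache is a hypothetical name. Going A, then B, then A again compiles A
      only once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: compiled maps cached by map filename.
class MapCache<T> {
    private final Map<String, T> maps = new HashMap<>();
    private final Function<String, T> compile; // stands in for the map factory

    MapCache(Function<String, T> compile) {
        this.compile = compile;
    }

    // Compile on first use, then hand back the cached instance.
    T getMap(String mapFilename) {
        return maps.computeIfAbsent(mapFilename, compile);
    }

    // Drop a map that is no longer needed (the quoted releaseMap idea).
    void releaseMap(String mapFilename) {
        maps.remove(mapFilename);
    }
}
```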

      > 3) It eliminates the table option (pass a table name and get SELECT *
      > FROM TABLE). This is because the signature of this method would match
      > the signature of the retrieveDocument(SQL) method. This doesn't bother
      > me much, as I can't imagine anybody using it in a production scenario.
      > If we want it on the GUI and the command line for purposes of people
      > playing, we can easily add it.
      >

      Ditto. I can only really imagine it being of use in the GUI.

      > 4) It probably exceeds the current capabilities of the KeyGenerator
      > interface. This is because there is no way to know what method to call
      > (if any) to initialize the key generator. (Currently, this is not a
      > problem because the calling application does the initialization and
      > DOMToDBMS just makes calls to get keys.) Given that Nick Semenov is the
      > only person who really seems to have even noticed this interface, I'm
      > willing to let it slide for now.
      >

      Hunky dory. In most production systems I suspect that key generation might
      be left to the DBMS anyway.



      Subject: Proposed transfer engine command line:


      > Here is a proposed command line parallel to the transfer engine
      > interface. It is not clear to me if this is in a separate class or in
      > the transfer engine class. My guess is separate, simply because this
      > will simplify things later if we decide to make the transfer engine into
      > some sort of standalone server process.
      >
      > Transfer properties-file [-u userName] [-p password]
      >
      > -OR-
      >

      No real need. I can certainly add this if required; however, chucking the
      various values into the propfile (e.g. Transfer properties-file user=adam
      password=dobbin) would also then allow you to add key1=128 key2=1234 etc.

      > Transfer database-property-file parser-property-file map-file xml-file
      > [-u userName] [-p password]
      > {-toxml [-c commitMode] [-k keyGeneratorClassName] |
      > -todbms -t tableName -k keyValues [-t tableName -k keyValues ...]
      > |
      > -todbms -s selectStatement}
      >

      Urrrggggg.

      > I originally only had the second syntax, but it has the obvious problem
      > of length (although it has the advantage of readability).
      >

      I am not so sure (really). user=adam is IMHO more naturally readable than -u
      adam

      > What I didn't like about a single properties file is that it mixes
      > together things that are unrelated. That is, parser properties are
      > unrelated to database properties are unrelated to actions. This means
      > that things like parser properties and database properties get
      > duplicated in all files, which I view as a very bad thing, since it
      > means they can't be changed easily.
      >

      That's not so.

      1) In order to work you need both a parser & a DB, so both are related to
      actions.
      2) In terms of duplication......(a) the Properties object needs to be
      created only once & can then be quizzed for values. The alternative of
      creating lots of strings etc. would IMHO be less efficient in terms of
      object creation & destruction. (b) They can be changed easily by simply
      loading in another value with that key. Again, as an example, imagine if in
      the DB properties part you simply put in ldap://MyDB & someone had written a
      little LDAP module which went off with the user info & password, quizzed an
      LDAP server & as a result filled in the jdbcUrl & jdbcDriver values
      (possibly not the driver, unless a classloader was in use, but almost
      certainly the jdbcUrl).

      At the moment that would be fairly simple to implement. (I nearly did so
      just for fun, but I only have an LDAP server @ home @ the mo.)
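      A sketch of how such an LDAP hook could slot in, with hypothetical names
      throughout (JdbcUrlResolver, Lookup, & the jdbcUrl/user/password keys). The
      LDAP query itself is left to the pluggable Lookup; in real code it might be
      written with JNDI (javax.naming), but that is not shown here.

```java
import java.util.Properties;

// Hypothetical sketch: replace an ldap:// reference with a real JDBC URL
// before the transfer runs; all other properties are left untouched.
class JdbcUrlResolver {

    // Pluggable lookup; an LDAP implementation is assumed, not provided.
    interface Lookup {
        String resolve(String ldapRef, String user, String password);
    }

    static void resolveInPlace(Properties props, Lookup lookup) {
        String url = props.getProperty("jdbcUrl");
        if (url != null && url.startsWith("ldap://")) {
            String real = lookup.resolve(url, props.getProperty("user"), props.getProperty("password"));
            props.setProperty("jdbcUrl", real);
        }
    }
}
```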


      > The solution to this is to have a single (action) properties file that
      > has
      > pointers to parser properties and database properties files. This I
      > could easily live with and it would just be an option -- either you pass
      > in a properties file or you pass in the whole shebang above.
      >

      More propfiles = more room for error + more IO activity + more processing.

      You can have as many propfiles as you wish (or as few).

      > In either case, you don't get the option to override properties with a
      > random property=value pairs. To my mind, this is simply too much -- at
      > some point, flexibility is just giving people a rope to hang themselves
      > with.
      >

      a) Only property names which are being looked for will have any effect, e.g.
      in testing I used splag=splurg & other nonsense pairings.
      b) Again, consider it in the same light as config.sys. It isn't so much
      about avoiding -t or whatever as about being able to send it a text file
      which has all it needs.
      c) It gives us massive adaptability. E.g. you have now got a CommitMode
      value. I would have to do nothing at the CLI level in order to support
      this; you'd simply have to add CommitMode=auto to either the propfile or the
      command line.

      Subject: Proposed GenerateMap API


      > Here's the proposed API for GenerateMap.
      >
      > public void setDatabaseProperties(Properties props);
      > public void setDatabaseProperties(String propFilename);
      > // user and password can also be set as properties
      > public void setUserInfo(String userName, String password);
      >
      > public void setParserProperties(Properties props);
      > public void setParserProperties(String propFilename);
      >

      Not needed.

      > // Type is DTD, DDML, or XML Schema. Generates .sql and .map
      > public void createMap(String filename, int type);
      >
      > // Generates .map and .dtd or .xsd (XML Schema)??
      > public void createMap(String tableName, String outputBasename);
      > public void createMap(String tableNames[], String outputBasename);
      > public void createMap(String sqlStatement, String outputBasename);
      >
      > Questions:
      >
      > 1) Should people be able to specify the name of the output .sql and .map
      > files? Or should we just continue to use the base name plus .sql/.map?
      >

      The way I see it......it should default to basename but you could provide
      the output names if you want (i.e if null then basename else....)
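      A sketch of the "if null then basename" rule; OutputNames.orDefault is a
      hypothetical helper, not an existing GenerateMap method.

```java
// Hypothetical sketch: explicit output name if given, else basename + extension.
class OutputNames {

    static String orDefault(String explicitName, String basename, String extension) {
        return explicitName != null ? explicitName : basename + extension;
    }

    public static void main(String[] args) {
        System.out.println(orDefault(null, "orders", ".map"));
        System.out.println(orDefault("custom.sql", "orders", ".sql"));
    }
}
```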

      > 2) The last createMap is illegal -- it has the same signature as the
      > second (tableName). Any ideas?
      >
      An array can be an array of one.
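      A sketch of that suggestion: if table names always go in as a String[]
      (possibly of length one), the table form no longer shares the
      (String, String) signature of the SQL form. GenerateMapSketch is a
      hypothetical name, & the bodies return a description instead of generating
      files, purely so the overloads can be seen working; the quoted proposal has
      them return void. One catch: a literal null as the first argument would be
      ambiguous & need a cast.

```java
// Hypothetical sketch: String[] for tables, String for SQL, so no clash.
class GenerateMapSketch {

    String createMap(String[] tableNames, String outputBasename) {
        return "tables " + String.join(",", tableNames) + " -> " + outputBasename;
    }

    String createMap(String sqlStatement, String outputBasename) {
        return "query " + sqlStatement + " -> " + outputBasename;
    }

    public static void main(String[] args) {
        GenerateMapSketch g = new GenerateMapSketch();
        System.out.println(g.createMap(new String[] {"Orders"}, "orders")); // one table
        System.out.println(g.createMap("SELECT * FROM Orders", "q"));       // SQL form
    }
}
```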

      > Note that this and the transfer engine might be derivable from a base
      > class that implements the parser and database properties. Note also that
      > I haven't yet considered what exceptions can be thrown.
      >

      Yup.

      It should also be pointed out that as detailed in the textvalues.txt you can
      happily cut the properties file down to just those properties you need.

      Subject: Proposed GenerateMap command line


      > Here's the proposed command line for GenerateMap:
      >
      > GenerateMap properties-file [-u userName] [-p password]
      >
      > -OR-
      >
      > GenerateMap databasePropertyFile parserPropertyFile
      > [-u userName] [-p password]
      > { {-dtd | -ddml | -xsd} schemaFilename |
      > -t table [-t table ...] |
      > -s selectStatement}
      >
      > As is the case with Transfer, the single properties file would contain
      > pointers to the database and parsers property files.
      >

      Again, you can have as many propfiles as you wish (or as few).

      I'll think about it some more however I do feel that sticking to using
      propfiles gives us:

      A) Easily hand-editable / easy-to-build-programmatically text files.
      B) A pretty efficient object creation / destruction graph.
      C) Less code / easier-to-read code.
      D) Easy addition of other properties if & when the time comes.
      E) An easily updatable central store of values which could easily be moved
      towards an entirely in-memory EJB structure, where part of the hashtable
      could be "variable" (e.g. user, SQL statement etc.) & other parts might be
      system-set (i.e. by an admin person), e.g. xmlparser, dbinfo, output
      xmlfilename (via a rules engine or similar).
      F) An app which could be used equally easily in a client or server, app
      server or stored procedure, where requests might come in via many means but
      all transmit text / text files.

      Try doing some memory tests as I did (albeit on a very small scale),
      comparing creating lots of strings & populating them with loading a
      propfile & then quizzing it. I couldn't find much, if any, difference.


      Adam