
Re: [pcgen-xml] Re: XML Conversion - phased

  • Keith Davies
    Message 1 of 14 , Feb 28, 2003
      On Fri, Feb 28, 2003 at 09:36:54AM -0500, CC Americas 1 Carstensen James wrote:
      > Keith wrote:
      > > [ ... ] Thoughts, comments?
      >
      > Let me summarize to make sure I understand your points. Your three
      > points of XML conversion are:
      >
      > 1. XML that exactly mimics current tab-separated LST files
      > 2. XML that describes all of the game elements in an XML way
      > 3. XML that goes all the way, takes #2 but also supports meta-level
      > descriptions and game rules.

      Correct in all cases.

      > I do see a few (minor) benefits of #1, but only as a step towards #2 &
      > #3.
      > * Can get XML reading into PCGen quickly and get it debugged before
      > moving further.
      > * _Very_ simple to write converters for existing LST files (both
      > community supported and user generated)
      > * Prepare the community to think in XML
      >
      > However I also agree with your statement that it doesn't bring enough to
      > PCGen to be worthwhile. Perhaps the Code Monkeys would want that as a
      > first step, but from a data-format standpoint it doesn't add any value
      > over tab-separated.

      That is my belief. While it may have some benefit from a coding
      standpoint, going to the obvious XML changes things from a semi-arcane
      format that is reasonably concise to a verbose but somewhat easier to
      understand format. While this may be of some benefit, I think that we
      wouldn't be staying here long enough for it to really be worth the trouble.

      > As you said, I agree with your thoughts that #2 is the best risk/gain in
      > the short term, and #3 is the best risk/gain in the long term. I won't
      > comment on branching the code - I don't know enough if that would
      > require a freeze on bug fixes/FREQ or if we could do the work once and
      > have it apply to both branches (CVS should do that, unless our
      > underlying representation of the data changes the nature of the bug
      > fixes).

      To be honest, I don't know how much of the existing code -- the code
      that would be used in #2 -- would really get reused. I think that the
      underlying mechanism of the program would be changing, and that the
      scope of the changes would be broad enough that, while we could use the
      existing code as a resource, it probably wouldn't make a great place to
      start. The internal data model would be changing, the front end code
      would be more or less replaced... all that would be left would be a bit
      of the code used to handle certain game constructs, and that probably
      wouldn't even work with the generalized data.

      Hence this is the 'high-risk' route. I think it'd lead to simpler code
      overall, but it means not being able to make full use of what already
      has been done.

      > Where I have a concern is going from #2 to #3 (or even #1 to #2/#3). #3
      > seems by far the most elegant and powerful, but is hampered by all of
      > the pre-existing LST files. In order to really take advantage of #3 I
      > can't see an automatic converter. Which means that we need to do a lot
      > of work just to preserve our existing LST functionality, which is work
      > not being used to move forward.

      Most of the simpler stuff -- which is, I think, most of everything --
      should be pretty straightforward to convert. It's where things get
      complex that we'll run into difficulty... but then again, I think they
      could get simpler because the XML schema models them better. For
      instance, right now we have a number of templates whose sole purpose is
      to implement changes to the character that cannot be done another way.
      The general rule in the XML is that anything that can change something
      can change anything about it.

      Gah. Ugly sentence. How about an example?

      A class level can change anything about the character it is applied to;
      it can add to stats, change the class hit die[1], increase or decrease
      character size, change race, whatever. Similarly, the race can give /n/
      levels of sorcerer, or just give 12 levels of sorcerer spellcasting (no
      hit dice, familiar, etc., just spells). Taking a feat can give bonus
      ranks in a skill; taking a tenth rank in a skill might give a feat. Why
      not?

      Granted, it means that things can be a little unsane (skill ranks give
      feats? Huh?), but I'm willing to leave that to the data people. As far
      as I'm concerned this is good.

      [1] well, not *exactly*, but close enough -- each level may add a hit
      die, but there's nothing saying that they have to be the same
      size... incidentally, this also makes it easy to support the
      material in Savage Species.
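
The rule above ("anything that can change something can change anything about it") can be sketched in code. This is only an illustration under stated assumptions, not the schema from the uploaded documentation: `Character`, `GameElement` and the effect helpers are hypothetical names invented for the example. Every game element (class level, race, feat, skill rank) carries a list of effects, and any effect may touch any part of the character.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Character:
    # Hypothetical, heavily simplified character model.
    stats: Dict[str, int] = field(default_factory=dict)
    size: str = "Medium"
    hit_dice: List[int] = field(default_factory=list)
    skill_ranks: Dict[str, int] = field(default_factory=dict)
    feats: List[str] = field(default_factory=list)


# An effect is any function that modifies a character in place.
Effect = Callable[[Character], None]


@dataclass
class GameElement:
    """A class level, race, feat, etc.: a name plus a list of effects."""
    name: str
    effects: List[Effect] = field(default_factory=list)

    def apply(self, character: Character) -> None:
        for effect in self.effects:
            effect(character)


def add_hit_die(sides: int) -> Effect:
    def effect(c: Character) -> None:
        c.hit_dice.append(sides)  # each level may add a die of any size
    return effect


def add_stat(stat: str, amount: int) -> Effect:
    def effect(c: Character) -> None:
        c.stats[stat] = c.stats.get(stat, 0) + amount
    return effect


def set_size(size: str) -> Effect:
    def effect(c: Character) -> None:
        c.size = size
    return effect


def add_skill_ranks(skill: str, ranks: int) -> Effect:
    def effect(c: Character) -> None:
        c.skill_ranks[skill] = c.skill_ranks.get(skill, 0) + ranks
    return effect


# Hypothetical data: a class level that changes size, a feat that gives ranks.
hero = Character(stats={"str": 10})
level = GameElement("giant level 1",
                    [add_hit_die(8), add_stat("str", 2), set_size("Large")])
feat = GameElement("keen eyes", [add_skill_ranks("spot", 2)])
level.apply(hero)
feat.apply(hero)
```

Because the class level and the feat share one effect mechanism, no helper template is needed to carry the size change; whether a given combination is sane stays a data decision.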

      > You had mentioned a hybrid of #2 and #3, which kept the formats
      > separate. How feasible is a hybrid of #2 and #3 in which we don't?
      > Just as now we have an evolution of tags, what would we have to do to
      > develop the XML format for #3 such that it also supports the data for
      > #2? In other words we put together XML specs for #3, but our initial
      > goal is only to provide functionality for the #2 subset in code,
      > converters, and list files. This gives us a fairly clean path to write
      > converters for in the short term and goes for the maximized risk/reward
      > you had mentioned, but then allows us (from both a code and LST
      > perspective) to add in the meta data and slowly take advantage of it.
      > Just as we currently add or update tags, this would also be working
      > toward the eventual goal of #3, but #3 designed as a superset of #2.

      The formats wouldn't be entirely separate, just parts of them. #2 would
      use the data schema for D&D that comes out of #3; #3 would also support
      the meta and gui mechanisms. Thus, #2 would be a subset of #3 in terms
      of XML, but they would be able to use the same game data. At least, #3
      would be if the correct meta definitions are loaded.

      Perhaps 'hybrid' was a poorly-chosen word. I didn't mean a hybrid
      program, but a cross between the two approaches.

      What you've described is almost what I had in mind, though. If we were
      to branch as I described earlier, the core branch would be modified to
      make use of the d20 (D&D) XML schema (parts of which are in the
      documentation I recently uploaded). This would get the extant data into
      XML (at least for D&D, and probably for the others; the differences
      aren't that profound yet) and the core branch would be able to use it.

      The extant data can be converted -- automagically where feasible,
      manually where not. I think many of the existing encodings may be made
      simpler, but I think that -- for the ugly parts -- we can continue to
      support pretty much the same behavior, at least initially. For
      instance, where a class has to add a template in order to support
      something (say, a class that increases the character's size) we can
      probably do that conversion automatically, but maybe come back later to
      make it more elegant (make the size change directly in the class
      description, rather than in the template).

      For the new branch, I don't know that we'd be able to scavenge a lot of
      code from the core program to use in the meta-enabled program. The two
      programs, while they have the same goal, don't really do it the same
      way. The third program moves as much as possible into data, and will, I
      hope, be simpler overall.


      Keith
      --
      Keith Davies
      keith.davies@...

      PCGen: <reaper/>, smartass
      "You just can't argue with a moron. It's like handling Nuclear
      waste. It's not good, it's not evil, but for Christ's sake, don't
      get any on you!!" -- Chuck, PCGen mailing list
    • Keith Davies
      Message 2 of 14 , Feb 28, 2003
        On Fri, Feb 28, 2003 at 08:07:04AM -0800, Keith Davies wrote:
        > On Fri, Feb 28, 2003 at 09:36:54AM -0500, CC Americas 1 Carstensen James wrote:
        > >
        > > Let me summarize to make sure I understand your points. Your three
        > > points of XML conversion are:
        > >
        > > 1. XML that exactly mimics current tab-separated LST files
        > > 2. XML that describes all of the game elements in an XML way
        > > 3. XML that goes all the way, takes #2 but also supports meta-level
        > > descriptions and game rules.

        <killer snip>

        Any questions or comments? I've been asked to move things up and unless
        someone comes up with something I haven't considered before now, it'll
        be decision time.


        Keith
        --
        Keith Davies
        keith.davies@...

        PCGen: <reaper/>, smartass
        "You just can't argue with a moron. It's like handling Nuclear
        waste. It's not good, it's not evil, but for Christ's sake, don't
        get any on you!!" -- Chuck, PCGen mailing list
      • Eric Beaudoin
        Message 3 of 14 , Feb 28, 2003
          At 11:07 2003.02.28, Keith Davies wrote:
          >To be honest, I don't know how much of the existing code -- the code
          >that would be used in #2 -- would really get reused. I think that the
          >underlying mechanism of the program would be changing, and that the
          >scope of the changes would be broad enough that, while we could use the
          >existing code as a resource, it probably wouldn't make a great place to
          >start. The internal data model would be changing, the front end code
          >would be more or less replaced... all that would be left would be a bit
          >of the code used to handle certain game constructs, and that probably
          >wouldn't even work with the generalized data.

          I was really hoping that this would not be the case, i.e. that only the parsing code would be changed. The monkeys have worked hard to make PCGEN go faster; rewriting it at this point is not only high risk, it will probably be a major step back.

          The goal should be that the external representation of the data has a minimal impact on how the application represents it internally. This XML representation will most probably also be used for E-Tools, so it stands a good chance of becoming the de facto data representation standard for d20 products. It is very important that we design a language that is as application-independent as possible.

          Also, for the phased parts, I'm for the quick gains. In my opinion, we should first go to a "transition" XML schema that mimics the .LST syntax that we have and then build on this. No matter how hard we try, we will not be able to create the "right" schema on the first try anyway. It will be a series of little evolutions getting us to the right point. We might as well start with something that is already familiar and work from there.

          My opinion anyway.

          P.S. Don't count on editors too much to hide the complexity of the schemas. All the major data contributors are still using text editors to get the job done. We should assume that it will stay that way and plan a schema that will be easy to work with rather than easy for the machine to read. It's easier to optimise a parser than to find and train dedicated data monkeys.


          -----------------------------------------------------------
          Éric "Space Monkey" Beaudoin
          >> In space, no one can hear you sleep...
          >> Camels too can climb trees (and sometimes eat them)
          <mailto:beaudoer@...>
        • Scott Ellsworth
          Message 4 of 14 , Mar 1 2:51 PM
            On Friday, February 28, 2003, at 11:34 PM, Eric Beaudoin wrote:

            > At 11:07 2003.02.28, Keith Davies wrote:
            >> The internal data model would be changing, the front end code
            >> would be more or less replaced... all that would be left would be a
            >> bit
            >> of the code used to handle certain game constructs, and that probably
            >> wouldn't even work with the generalized data.
            >
            > I was really hoping that this would not be the case i.e. that only the
            > parsing code would be changed. The monkeys have worked hard to make
            > PCGEN go faster, rewriting it at this point is not only high risk, it
            > will probably be a major step back.

            As one of the monkeys who has spent upwards of a coder-month working on
            optimization, I must humbly disagree. I am not sure that it would take
            less time to fix the current data model than to write a new one. This
            is not to say that the current code is bad, just that it was not
            written originally for optimal speed, and we have enough data that we
            really need to have indexed hashes for our data.

            For example, the current code regularly iterates over collections of
            keys, doing a caseless text comparison of each key, because the keys
            double as user editable text. Further, it regularly reparses the same
            string for information it already parsed. (Every tab switch requires
            reparsing a whole bunch of entries for Variables, splitting on |
            characters.) These two things alone eat up well over 90% of the
            execution time, according to my profiler, and they are so deep in the
            program that they are very, very hard to fix. It took me four hours to
            do the work to just replace global weapon profs, and that was one of
            the easier ones.

            Keith has proposed a core data model where keys are unique and always,
            always, always in lower case.
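
To make the contrast concrete, here is a minimal sketch, not PCGen's actual code: `find_by_scan` stands in for the caseless scan described above, and `KeyedRegistry` is a hypothetical class showing what unique lowercase keys buy. The entries are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple


def find_by_scan(entries: List[Tuple[str, str]], name: str) -> Optional[str]:
    """Linear scan with a caseless comparison per entry: O(n) per lookup."""
    wanted = name.lower()
    for display_name, value in entries:
        if display_name.lower() == wanted:
            return value
    return None


class KeyedRegistry:
    """Hash lookup on a unique lowercase key; display text is stored separately."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}
        self._display_names: Dict[str, str] = {}

    def add(self, key: str, display_name: str, value: str) -> None:
        if key != key.lower():
            raise ValueError(f"key must be lower case: {key!r}")
        if key in self._values:
            raise KeyError(f"duplicate key: {key!r}")
        self._values[key] = value
        self._display_names[key] = display_name

    def get(self, key: str) -> Optional[str]:
        # No case folding at lookup time: keys are normalized once, on insert.
        return self._values.get(key)

    def display_name(self, key: str) -> Optional[str]:
        return self._display_names.get(key)


# Hypothetical data, for illustration only.
entries = [("Longsword", "martial"), ("Dagger", "simple")]
registry = KeyedRegistry()
for display, value in entries:
    registry.add(display.lower(), display, value)
```

The user-editable display name can change freely without touching the key, so lookups never need a caseless comparison.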

            Further, in Keith's data model, we would not parse a string like
            BONUS:somestuff|moreStuff|+2|otherstuff
            more than once - the data would be broken apart on read, and stored in
            appropriate hashes or lists, so it would be easy to find out if an item
            gave a strength bonus, for example.
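
A rough sketch of "parse once, store in hashes" follows. The field meanings (category, target, value, then extras) are an assumption made for the example, as are the names `Bonus`, `parse_bonus` and `BonusIndex` and the sample strings; the real BONUS grammar may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple


@dataclass(frozen=True)
class Bonus:
    category: str            # assumed meaning of the first field
    target: str              # assumed meaning of the second field
    value: int               # assumed meaning of the third field
    extra: Tuple[str, ...]   # anything after the third field, kept as-is
    source: str              # which item/feat/class the bonus came from


def parse_bonus(raw: str, source: str) -> Bonus:
    """Split a 'BONUS:a|b|+2|...' string exactly once, at load time."""
    prefix, _, body = raw.partition(":")
    if prefix != "BONUS":
        raise ValueError(f"not a BONUS entry: {raw!r}")
    parts = body.split("|")
    if len(parts) < 3:
        raise ValueError(f"too few fields: {raw!r}")
    return Bonus(parts[0].lower(), parts[1].lower(), int(parts[2]),
                 tuple(parts[3:]), source)


class BonusIndex:
    """Parsed bonuses, hashed by (category, target) for direct lookup."""

    def __init__(self) -> None:
        self._by_target: DefaultDict[Tuple[str, str], List[Bonus]] = defaultdict(list)

    def load(self, raw: str, source: str) -> None:
        bonus = parse_bonus(raw, source)
        self._by_target[(bonus.category, bonus.target)].append(bonus)

    def bonuses_for(self, category: str, target: str) -> List[Bonus]:
        return list(self._by_target.get((category, target), []))

    def total(self, category: str, target: str) -> int:
        return sum(b.value for b in self.bonuses_for(category, target))


# Hypothetical entries, for illustration only.
index = BonusIndex()
index.load("BONUS:STAT|STR|+2|TYPE=Enhancement", source="belt")
index.load("BONUS:STAT|STR|1", source="potion")
```

After loading, asking whether anything gives a strength bonus is a single dictionary lookup (`index.bonuses_for("stat", "str")`) rather than a reparse of every string on each tab switch.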

            We could split these efforts: convert the data model, and convert the
            XML files as separate tasks. The lst files would remain essentially
            the same, but would have separate key and user visible name data, and
            the reader would break apart all bonuses/variables on reading. I have
            been noodling away at that in my spare time, but it is not a fast
            process.

            If this was a priority, we would want to get a number of code monkeys
            working on it as a major feature. We could easily get an order of
            magnitude speedup out of this, as we make over 5000 function calls just to
            switch a tab, and we should be making in the hundreds at the outside.

            Scott
          • Keith Davies
            Message 5 of 14 , Mar 1 3:22 PM
              On Sat, Mar 01, 2003 at 02:51:40PM -0800, Scott Ellsworth wrote:
              >
              > On Friday, February 28, 2003, at 11:34 PM, Eric Beaudoin wrote:
              >
              > > At 11:07 2003.02.28, Keith Davies wrote:
              > >> The internal data model would be changing, the front end code
              > >> would be more or less replaced... all that would be left would be a
              > >> bit
              > >> of the code used to handle certain game constructs, and that probably
              > >> wouldn't even work with the generalized data.
              > >
              > > I was really hoping that this would not be the case i.e. that only the
              > > parsing code would be changed. The monkeys have worked hard to make
              > > PCGEN go faster, rewriting it at this point is not only high risk, it
              > > will probably be a major step back.
              >
              > As one of the monkeys who has spent upwards of a coder-month working on
              > optimization, I must humbly disagree. I am not sure that it would take
              > less time to fix the current data model than to write a new one. This
              > is not to say that the current code is bad, just that it was not
              > written originally for optimal speed, and we have enough data that we
              > really need to have indexed hashes for our data.

              Scott, please contact me offlist. I have some questions that I think
              you can answer for me; it'd be a big help. ICQ 8550570, or AIM
              keithjdavies.

              Thanks,
              Keith
              --
              Keith Davies
              keith.davies@...

              PCGen: <reaper/>, smartass
              "You just can't argue with a moron. It's like handling Nuclear
              waste. It's not good, it's not evil, but for Christ's sake, don't
              get any on you!!" -- Chuck, PCGen mailing list
            • merton_monk <merton_monk@yahoo.com>
              Message 6 of 14 , Mar 1 7:34 PM
                I'm logged into AIM right now as CMPMerton (might be a space between
                CMP and Merton, I forget) if you'd like to discuss the persistence
                layer in PCGen or data model. Anyone who wants to help with the xml
                conversion can join... I'll probably be online for a couple of hours.

                -Bryan

                --- In pcgen-xml@yahoogroups.com, Keith Davies <keith.davies@k...> wrote:
                > On Sat, Mar 01, 2003 at 02:51:40PM -0800, Scott Ellsworth wrote:
                > >
                > > On Friday, February 28, 2003, at 11:34 PM, Eric Beaudoin wrote:
                > >
                > > > At 11:07 2003.02.28, Keith Davies wrote:
                > > >> The internal data model would be changing, the front end code
                > > >> would be more or less replaced... all that would be left would be a bit
                > > >> of the code used to handle certain game constructs, and that probably
                > > >> wouldn't even work with the generalized data.
                > > >
                > > > I was really hoping that this would not be the case i.e. that only the
                > > > parsing code would be changed. The monkeys have worked hard to make
                > > > PCGEN go faster, rewriting it at this point is not only high risk, it
                > > > will probably be a major step back.
                > >
                > > As one of the monkeys who has spent upwards of a coder-month working on
                > > optimization, I must humbly disagree. I am not sure that it would take
                > > less time to fix the current data model than to write a new one. This
                > > is not to say that the current code is bad, just that it was not
                > > written originally for optimal speed, and we have enough data that we
                > > really need to have indexed hashes for our data.
                >
                > Scott, please contact me offlist. I have some questions that I think
                > you can answer for me; it'd be a big help. ICQ 8550570, or AIM
                > keithjdavies.
                >
                > Thanks,
                > Keith
                > --
                > Keith Davies
                > keith.davies@k...
                >
                > PCGen: <reaper/>, smartass
                > "You just can't argue with a moron. It's like handling Nuclear
                > waste. It's not good, it's not evil, but for Christ's sake, don't
                > get any on you!!" -- Chuck, PCGen mailing list
              • CC Americas 1 Carstensen James
                Message 7 of 14 , Mar 3 6:34 AM
                  Keith,

                  Eric and Scott said it, but I think the big question is how much work
                  the Code Monkeys are willing to do. I'll sum up a couple of points from
                  different posts here.

                  Eric Beaudoin wrote:
                  > Also, for the phased parts, I'm for the quick gains. In my opinion, we
                  > should go to a "transition" XML schema that mimics the .LST syntax that
                  > we have first and then build on this. No matter how hard we try, we will
                  > not be able to create the "right" schema on the first try anyway. It
                  > will be a series of little evolutions getting us to the right point. We
                  > might as well start with something that is already familiar and work
                  > from there.

                  Eric, you've contributed much more than me to PCGen, so take this with the
                  respect it's intended. From looking at the list files, and looking at
                  the "cleaner" implementation that Keith wants, it's more of a revolution
                  than an evolution. Some things are just done in a different enough way
                  that getting there from a direct tab-to-XML conversion is not easy, may
                  not be optimal, and may end up just being "bolted on" rather than a
                  clean implementation.

                  I myself was hoping for the same evolution, but more from Step #2 to
                  Step #3, and even there I don't know how much that's true.

                  The current LST structure has evolved over time to handle more and more,
                  and it has both its strengths and weaknesses. XML has different
                  strengths and weaknesses, and I think that if we just pull over the LST
                  files to XML format, we'll be importing some of the weaknesses of LST
                  while not mitigating some of the weaknesses of XML.

                  But we grow from there - maybe that's worth that initial period to
                  convert people over quickest and then grow. Is growing from a syntax
                  that won't take advantage of XML's strengths to one that will a
                  longer/harder process than doing a single big change and having a good
                  foundation to build on?

                  Scott Ellsworth wrote:
                  > We could split these efforts: convert the data model, and convert the
                  > XML files as separate tasks. The lst files would remain essentially
                  > the same, but would have separate key and user visible name data, and
                  > the reader would break apart all bonuses/variables on reading. I have
                  > been noodling away at that in my spare time, but it is not a fast
                  > process.

                  Is this something that would be going on concurrently, before, or after
                  the XML conversion? It seems like even if we stayed with LST files
                  working out a way to do unique lowercase identifiers would be a benefit.

                  Cheers,
                  Blue

                  -----Original Message-----
                  From: Keith Davies [mailto:keith.davies@...]
                  Sent: Saturday, March 01, 2003 1:38 AM
                  To: pcgen-xml@yahoogroups.com
                  Subject: Re: [pcgen-xml] Re: XML Conversion - phased


                  On Fri, Feb 28, 2003 at 08:07:04AM -0800, Keith Davies wrote:
                  > On Fri, Feb 28, 2003 at 09:36:54AM -0500, CC Americas 1 Carstensen James wrote:
                  > >
                  > > Let me summarize to make sure I understand your points. Your three
                  > > points of XML conversion are:
                  > >
                  > > 1. XML that exactly mimics current tab-separated LST files
                  > > 2. XML that describes all of the game elements in an XML way
                  > > 3. XML that goes all the way, takes #2 but also supports meta-level
                  > > descriptions and game rules.

                  <killer snip>

                  Any questions or comments? I've been asked to move things up and unless
                  someone comes up with something I haven't considered before now, it'll
                  be decision time.


                  Keith
                  --
                  Keith Davies
                  keith.davies@...

                  PCGen: <reaper/>, smartass
                  "You just can't argue with a moron. It's like handling Nuclear
                  waste. It's not good, it's not evil, but for Christ's sake, don't
                  get any on you!!" -- Chuck, PCGen mailing list


                • Keith Davies
                  Message 8 of 14 , Mar 3 8:26 AM
                    On Mon, Mar 03, 2003 at 09:34:09AM -0500, CC Americas 1 Carstensen James wrote:
                    > Keith,
                    >
                    > Eric and Scott said it, but I think the big question is how much work
                    > the Code Monkeys are willing to do. I'll sum up a couple of points from
                    > different posts here.

                    Hi All,

                    things have changed over the weekend; I will be posting more later today
                    (I'm a little busy at work right now). The schedule is being moved up
                    and I've had to make some (somewhat unilateral) decisions. A plan has
                    been submitted to the BoD for ratification and I expect we'll start on
                    it very soon. To summarize, however:

                    1. We will use (more or less) the simplest and most direct translation
                    of LST to XML for now. *Some* of my design will apply, but for the
                    most part the XML will be very recognizable and easily understood by
                    non-XML monkeys. This was a key point both for the sake of data
                    monkey comfort and simple time pressure.
                    2. The internal data model will, for the most part, remain untouched.
                    We may be able to sneak some changes into the IDM, but for the most
                    part we're aiming at minimal impact.
                    3. XML serialization will be supported in parallel with LST I/O, at
                    least during the early stages. Once XML support is in, LST use is
                    deprecated and, in my schedule, slated for earliest possible removal
                    (next major release will be as soon as we can) in order to reduce
                    maintenance headaches.
                    4. All distributed data files will be converted in one pass, rather than
                    file type by file type (increase amount of pain, minimize duration).
                    5. A converter will be provided for LST files in the field.
                    6. We will use the time until the release of 5.0.0 for analysis of the
                    code and LST structure; once 5.0.0 hits the streets we'll branch the
                    source and start the changes.
                    7. After 6.0.0 (full XML, LST removed) is released, we branch the source
                    as described in an earlier posting and pursue the path I'd *like* to
                    take.

                    The plan above will get us into XML and play on the monkeys' familiarity
                    with LST files. It will be crap XML -- in the sense that while it will
                    comply with XML standards it will be poorly-designed XML that will not
                    take advantage of all the benefits that XML will give us -- but will
                    cause the least distress among data monkeys and least impact on the
                    code. The conversion will also be the simplest, with almost 1:1 congruence
                    except where benefits can be gained by making a change.
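
As a rough idea of what an almost 1:1 translation could look like, here is a minimal sketch. It assumes an LST line is a name followed by tab-separated `TAG:value` tokens; the element and attribute names (`feat`, `tag`, `name`) and the sample line are invented for the example and are not the schema in the plan.

```python
import xml.etree.ElementTree as ET


def lst_line_to_xml(line: str, element_name: str) -> ET.Element:
    """Turn one tab-separated LST line into one XML element, field by field."""
    fields = [f for f in line.rstrip("\n").split("\t") if f]
    if not fields:
        raise ValueError("empty LST line")
    # Assumption: the first field is the object's name.
    elem = ET.Element(element_name, {"name": fields[0]})
    for token in fields[1:]:
        # Assumption: remaining fields look like TAG:value.
        tag, _, value = token.partition(":")
        child = ET.SubElement(elem, "tag", {"name": tag})
        child.text = value
    return elem


# Hypothetical sample line, for illustration only.
sample = "Dodge\tTYPE:General\tDESC:Harder to hit"
xml_text = ET.tostring(lst_line_to_xml(sample, "feat"), encoding="unicode")
# xml_text is:
# <feat name="Dodge"><tag name="TYPE">General</tag><tag name="DESC">Harder to hit</tag></feat>
```

The sketch keeps LST tag names as attribute values rather than element names, so any tag converts without checking whether it is a legal XML name. A real converter would presumably map known tags to proper elements where a change brings a benefit.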

                    We're moving, folks, and while it's not the way I'd been hoping to go,
                    it will get us headed in the right direction.


                    Keith
                    --
                    Keith Davies
                    keith.davies@...

                    PCGen: <reaper/>, smartass
                    "You just can't argue with a moron. It's like handling Nuclear
                    waste. It's not good, it's not evil, but for Christ's sake, don't
                    get any on you!!" -- Chuck, PCGen mailing list