
RE: [pcgen-xml] XML Conversion - phased

  • CC Americas 1 Carstensen James
    Message 1 of 14, Feb 26 7:06 AM
      Folks,

      Just jotted down some thoughts on the PROs and CONs of a phased
      implementation of XML.

      PRO:
      Automatic conversion of data means that it's represented the
      same way, so the changes to the PCGen program could focus initially on
      just changing how we read data, and not change any big functionality at
      the same time (sure way to introduce more bugs).
      We would have XML converters for homebrew campaigns (something
      many users will want).
      This also allows us to test and verify the XML data input
      separate from the expanded XML functionality, since we would have a 1:1
      mapping between the XML and the current format, allowing us to use the
      current backend unchanged.
      In the second phase we already have working XML in place, and we
      can expand our functionality organically in bits and pieces (just as we
      do with our current FREQs), allowing PCGen to advance as well.

      CON:
      More work.
      An additional "conversion" schema.
      The need to do some fairly heavy conversion from our "initial"
      XML conversion to the final (more powerful) XML conversion. This may
      also put off LST writers, who learn a whole new format only to have
      parts of it deprecated and replaced on a regular basis as we morph to
      the "full-featured" XML.

      If PCGen were a new project I'd rather start it off correct, but with
      the existing amount of data, both community supported (OGL) and user
      supported (custom campaigns, personal use of non-OGL books, etc.), we
      should just discard the idea of doing it in steps, but rather reexamine
      it to see if it can fit our needs, or whether we have valid reasons to
      avoid it.

      Cheers,
      Blue

      -----Original Message-----
      From: Paul M. Lambert [mailto:plambert@...]
      Sent: Tuesday, February 25, 2003 5:19 PM
      To: pcgen-xml@yahoogroups.com
      Subject: Re: [pcgen-xml] Re: [pcgen] Question about future xml and
      current custom list files


      I vote for this.

      Why?

      Because it gets us over to XML first, which has some basic advantages
      that I like.

      And it gives more time for the _real_ work to be done. Moving the .LST
      structures to XML should be much less work and actually complete in a
      reasonable amount of time, and it'll make a converter very easy.

      Then the real work and discussion can occur on the better schema to
      support for actually redesigning the data.

      But I like getting a result sooner, even if it's not a 100% ideal world
      situation.

      --plambert

      On Tue, 25 Feb 2003, John Rudd wrote:

      >
      > (re-sent from the main pcgen list)
      >
      > > Subject: Re: [pcgen] Question about future xml and current custom
      > > list files
      > >
      > > On Tue, Feb 25, 2003 at 12:25:28PM -0800, John Rudd wrote:
      > > > > From: Keith Davies <keith.davies@...>
      > > > >
      > > > > I say 'may' because things are not completely settled, and
      > > > > conversion can be done a couple of different ways. One is to
      > > > > basically convert exactly what we have to XML. Not a huge net
      > > > > gain, and many of the hacks and implementation problems we face
      > > > > now will still be present. Better, IMO, is to use a more
      > > > > sophisticated schema that better represents the data... but
      > > > > this will require more effort to convert. At this point I'm not
      > > > > entirely sure how well it will automate.
      > > >
      > > > Why not do both? In phases?
      > > >
      > > > Phase 1) simple tab'ed lst file to xml'ed lst file.
      > > > Phase 2) full on xml support in PCGen with better formats and
      > > > everything.
      > > > Phase 3) Profit! ... er ... wrong audience.
      > > >
      > > > (and, another plug for contained attributes instead of attributes
      > > > as arguments to tags)


    • John Rudd
      Message 2 of 14, Feb 26 8:34 AM
        > From: "CC Americas 1 Carstensen James" <james.carstensen@...>
        >
        > Folks,
        >
        > Just thought out some thoughts on the PROs and CONs of a phased
        > implementation of XML.
        >
        > PRO:
        > Automatic conversion of data means that it's represented the
        > same way, so the changes to the PCGen program could focus initially on
        > just changing how we read data, and not change any big functionality at
        > the same time (sure way to introduce more bugs).

        (just to be clear, I think your parenthetical statement was meant to
        say "one big functionality change is the sure way to introduce more
        bugs")

        > We would have XML converters for homebrew campaigns (something
        > many users will want).
        > This also allows us to test and verify the XML data input
        > separate from the XML expanded functionality, since we would have a 1:1
        > between the XML and the current format allowing us to use the current
        > backend unchanged.
        > In the second phase we already have working XML in place, and we
        > can expand our functionality organically in bits and pieces (just we do
        > with our current FREQ) allowing PCGen to advance as well.
        >
        > CON:
        > More work.

        Not necessarily more work. It's really the same amount of work, broken
        up into chunks. It reduces some work, because you're not debugging one
        big functionality change to make sure all of its moving pieces move
        together (which becomes exponentially more difficult as you add
        individual moving parts), but it also introduces some work, because
        the overhead common to all changes will happen multiple times. In the
        end it should balance out.

        > An additional "conversion" schema.
        >
        > The need to do some fairly heavy conversion from our "initial"
        > XML conversion to the final (more powerful) XML conversion. This may
        > also put off LST writers who learn a whole new format to have parts of
        > it depreciated and replaced on a regular basis as we morph to the
        > "full-featured" XML.

        This could also be done in phases, allowing LST writers to adapt each
        sub-part of their LST files one at a time. In a way, this just becomes
        like the current system: the LST file format today is not the same as
        it was back in 2.x.y when I first started using PCGen. The difference
        is that you're adding XML features one by one, instead of changing the
        tabbed file format one piece at a time. Plus, it makes things easier
        on the LST writers, because it's easier to write simple,
        single-feature converters than large, complex, all-encompassing ones.

        For example, first you do the tab->xml conversion. Then you expand or
        re-work the equipment file format. Then you re-work the spell file
        format. Then you re-work the next one ... etc. And, along the way,
        maybe you also start to merge some files (since they're separated by
        tag containers instead of file formats) and things like that. And,
        for each step, you have a simple conversion script that just covers
        the features for that one change.
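
        For illustration, a minimal sketch of what one of those simple,
        single-step conversion scripts might look like, in Java (PCGen's own
        language). The input layout it assumes -- a name column followed by
        tab-separated KEY:value tokens -- and the output tag names are
        invented for the example; a real converter would follow whatever
        schema gets agreed on.

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.io.FileWriter;
        import java.io.IOException;
        import java.io.PrintWriter;

        public class LstToXmlSketch {
            public static void main(String[] args) throws IOException {
                try (BufferedReader in =
                             new BufferedReader(new FileReader(args[0]));
                     PrintWriter out =
                             new PrintWriter(new FileWriter(args[1]))) {
                    out.println("<list>");
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (line.isEmpty() || line.startsWith("#")) continue;
                        String[] fields = line.split("\t");
                        out.printf("  <entry name=\"%s\">%n",
                                escape(fields[0]));
                        for (int i = 1; i < fields.length; i++) {
                            int colon = fields[i].indexOf(':');
                            if (colon < 0) continue; // report in a real tool
                            String tag = fields[i].substring(0, colon)
                                    .toLowerCase();
                            String value = fields[i].substring(colon + 1);
                            // 1:1 mapping: each KEY:value token becomes
                            // exactly one element.
                            out.printf("    <%s>%s</%s>%n",
                                    tag, escape(value), tag);
                        }
                        out.println("  </entry>");
                    }
                    out.println("</list>");
                }
            }

            private static String escape(String s) {
                return s.replace("&", "&amp;").replace("<", "&lt;")
                        .replace(">", "&gt;").replace("\"", "&quot;");
            }
        }

        Run as "java LstToXmlSketch weapons.lst weapons.xml"; each later
        format change would then need only another script of roughly this
        size.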

        > If PCGen were a new project I'd rather start it off correct, but
        > with the existing amount of data, both community supported (OGL) and
        > user supported (custom campaigns, personal use of non-OGL books,
        > etc.), we should just discard the idea of doing it in steps, but
        > rather reexamine

        ^^^^^^
        was this supposed to be a "shouldn't"?

        > it to see if it can fit our needs, or whether we have valid reasons
        > to avoid it.
        >

        John
      • CC Americas 1 Carstensen James
        Message 3 of 14, Feb 26 9:00 AM
          John Rudd said:
          >(just to be clear, I think your parenthetical statement was meant
          >to say "one big functionality change is the sure way to introduce
          >more bugs")

          Yeap, good catch.

          >> CON:
          >> More work.
          >
          >Not necessarily more work. It's really the same amount of work,
          >broken up into chunks.

          Not sure if I agree. If you go over Keith's ideas there are some fairly
          large changes between the text file LST files and the proposed XML LST
          files. We'd have to:
          a) Come up with a schema for XML files that can match the LST files
          b) Work out a migration scheme (including instructions for users with
          homebrew lists) to go from the Phase 1 XML LST files to the Phase 2
          XML LST files.

          I think that this work may be well justified, but it adds whole pieces
          we didn't have to consider before.

          > It reduces some work, because you're not debugging one big
          > functionality change to make sure all of its moving pieces move
          > together (which becomes exponentially more difficult as you add
          > individual moving parts), but it also introduces some work,
          > because the overhead common to all changes will happen multiple
          > times. In the end it should balance out.

          That's a definite point.

          <tongue location="in cheek">
          But that's work for the code monkeys, not the XML monkeys. We
          don't care about that.
          </tongue>

          >> The need to do some fairly heavy conversion from our "initial"
          >> XML conversion to the final (more powerful) XML conversion. This
          >> may also put off LST writers, who learn a whole new format only
          >> to have parts of it deprecated and replaced on a regular basis as
          >> we morph to the "full-featured" XML.
          >
          > This could also be done in phases, allowing LST writers to adapt each
          > sub-part of their LST file one at a time.

          I agree with you, this is how I was envisioning it. However, that
          means the LST monkeys still get an extra learning phase. (Except
          those who purely use the editors.)

          Phase 0: (The way it is now). "Ooook, understand LST files. Gimme
          banana."
          Phase 1: (XML as tab-oriented). "Ooook, need to learn all these
          strange XML tags. Gimme banana."
          Phase 2+: (XML - taking full advantage of XML) "Arg, everything I just
          learned is now changing. This week they just redid class skills, last
          week was weapons, next week is spell lists. Gimme club. And banana."

          Again, this isn't insurmountable. This becomes a question of "do you
          want to relearn things all at once, or do you want to relearn things in
          pieces, but some of the early stuff you relearn will be changed again by
          the end." I don't know which is more of a problem.

          I'm leaning more towards doing it in phases, just to satisfy all of
          the users with custom LSTs, because we'd be better able to write the
          first tabbed-text to XML converters, but I have no pull. 8) Let's
          see what Keith thinks.

          Cheers,
          Blue
          XML lemur (self-appointed)
        • merton_monk <merton_monk@yahoo.com>
          Message 4 of 14, Feb 27 12:50 PM
            Hopefully most of our data monkeys will be taking full advantage of
            the GUI editors, though some work will always need to be done by
            hand. My gut feeling is that in a project like this, trying to fully
            convert all at once will simply require too much work up front. I
            think the total amount of work necessary this way would be less, but
            simply not practical considering how many monkeys we'll have to work
            on it. If there were significant enough advantages to going this
            route I might consider suspending all development (data and code)
            outside of this project so as to limit complicating factors, but it'd
            have to be a very compelling case.

            I'd like to get Keith's input on this as well, but I'm leaning
            toward a phased attack because the threshold for getting results
            is lower. We'll need to communicate to the code monkeys how to
            take optimal advantage of xml, and what kinds of data modeling
            changes we should make in the code. The data monkeys will need to
            be briefed on the new format, and the GUI Lst Editors will need to
            be able to export in xml format. This is a huge project, which is
            usually best tackled in portions.

            -Bryan
          • Keith Davies
            Message 5 of 14, Feb 27 1:58 PM
              On Thu, Feb 27, 2003 at 08:50:35PM +0000, merton_monk
              <merton_monk@...> wrote:
              > Hopefully most of our data monkeys will be taking full
              > advantage of the GUI editors, though some work will always
              > need to be done by hand. [ ... ] This is a huge project, which
              > is usually best tackled in portions.

              I've been watching this thread to see what other people's thoughts are
              before I said anything. We have a few ways to approach this.

              The first is the dead simple, straightforward conversion to XML.
              Net gain, IMO, isn't significant enough to warrant the change,
              because this would lead to, frankly, crap XML. It *would* make
              it at least possible to use third-party tools to extract the
              data from here, but it'll be denormalized and exhibit the
              flaws[1] evident in the LST file design.

              [1] Not to denigrate Bryan or anyone else who's worked on it; it's
              proven remarkably flexible and able to be bent to handle things well
              beyond what it should, by rights, be able to do. However, in the
              bending it's being used in ways that I think are unnecessarily
              complex.

              The second is to walk a middle line; define clear XML for each game item
              type to be described -- elements for weapons, spells, feats, etc. This
              is where I started. I found that it was not as elegant as I'd been
              hoping, and would be quite restrictive. However, this would probably be
              a reasonable approach in that it would in fact provide enough of a gain
              to be worth the effort, but again would depend on the data model
              currently in use. It'd provide an XML schema, it'd allow reuse of much
              of the existing PCGen source code (change only the I/O routines,
              ideally), but would not, at this point, provide an opportunity to
              remodel the data in PCGen (something Bryan has, IIRC, commented on being
              desirable). This step can also make use of a subset of what I've
              designed; by and large, if the sample schema I've described is used, the
              d20 items can be used more or less as described.
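
              To make that concrete, here is one guess at what a per-item
              element under such a schema might look like, built with the
              JDK's DOM API so the output is well-formed by construction. The
              element and attribute names are invented for illustration, not
              taken from the sample schema already uploaded.

              import javax.xml.parsers.DocumentBuilderFactory;
              import javax.xml.transform.OutputKeys;
              import javax.xml.transform.Transformer;
              import javax.xml.transform.TransformerFactory;
              import javax.xml.transform.dom.DOMSource;
              import javax.xml.transform.stream.StreamResult;
              import org.w3c.dom.Document;
              import org.w3c.dom.Element;

              public class WeaponElementSketch {
                  public static void main(String[] args) throws Exception {
                      Document doc = DocumentBuilderFactory.newInstance()
                              .newDocumentBuilder().newDocument();

                      // Unique lower-case key, separate from the display name.
                      Element weapon = doc.createElement("weapon");
                      weapon.setAttribute("key", "longsword");
                      doc.appendChild(weapon);

                      Element name = doc.createElement("name");
                      name.setTextContent("Longsword");
                      weapon.appendChild(name);

                      Element damage = doc.createElement("damage");
                      damage.setAttribute("dice", "1d8");
                      damage.setAttribute("type", "slashing");
                      weapon.appendChild(damage);

                      Element crit = doc.createElement("critical");
                      crit.setAttribute("range", "19-20");
                      crit.setAttribute("multiplier", "2");
                      weapon.appendChild(crit);

                      Transformer t = TransformerFactory.newInstance()
                              .newTransformer();
                      t.setOutputProperty(OutputKeys.INDENT, "yes");
                      t.transform(new DOMSource(doc),
                              new StreamResult(System.out));
                  }
              }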

              The third is to go all the way. Take the models that I've
              described and use them at the meta level to describe the rules
              and elements to be used in each game mode, and ideally even to
              describe how to edit the content of those elements. This is my
              preferred mode because it'll allow easier support of more game
              modes, and to be completely honest I think it may even be
              *simpler* code, because it handles general-case behavior rather
              than more special-case scenarios.


              Now, in terms of risk vs. gain in the short term, I think that the
              second option is better; this is more or less in agreement with Bryan.
              Once the second part has been done, the third could begin. I don't know
              how much would really be reusable in this case, though, because the
              mindset behind the code is quite different. The data, however, would
              either be reusable or at least, for the most part, easily converted.

              I really think that the third option is where we want to end up, though.
              As such, I'd rather work directly toward that goal; the same amount of
              work *for that part* will need to be done, and when completed the second
              part could (not necessarily *would*) be considered unnecessary work.

              What we may want to consider is a hybrid. Branch the code. Core PCGen
              could follow the second path; this is the best risk:gain path at this
              point and would mean that development would continue. The branch, which
              would be pursuing greater change in not only data file structure but
              also internal data structure and interface construction, is considerably
              higher risk... but also provides the greatest gain. I think the
              benefits are commensurate with the risk, but the risk is, I think, large
              enough that I would not be willing to interrupt the existing process in
              order to experiment, only to throw it away.

              In pursuit of the hybrid solution, I would make the following
              suggestions:

              1. Do the hybrid thing, branch.
              2. The XML data schema produced by the experimental branch is used by
              both branches. The gui and meta schemas would be provided for
              reference to the core branch, but only really used by the
              experimental branch.
              3. Two separate teams work on these, with reports to the BoD. I'd like
              to solicit for a dev team -- XML and Java -- on the main list; this
              may or may not affect those working on the core stuff right now.
              3a. A third team would handle conversion of the data alone; I want a
              toolsmith -- Java, C/C++, Perl, whatever (is Eric available?) --
              to build tools to do this.
              4. In terms of scheduling, I expect that the core branch could be
              converted to XML in a month or two. I do not know how long it would
              take for the experimental branch to complete; this would only be
              determinable as we start to work.

              This is *not* a fork, but two different paths of development to the same
              goal. One is safe and will get the travelers to their goal safely. The
              other is through the mountains, along cliffs and across rivers... but
              goes to, I hope, a better place.

              Thoughts, comments?


              Keith
              --
              Keith Davies
              keith.davies@...

              PCGen: <reaper/>, smartass
              "You just can't argue with a moron. It's like handling Nuclear
              waste. It's not good, it's not evil, but for Christ's sake, don't
              get any on you!!" -- Chuck, PCGen mailing list
            • CC Americas 1 Carstensen James
              Message 6 of 14, Feb 28 6:36 AM
                Keith wrote:
                > [ ... ] Thoughts, comments?

                Let me summarize to make sure I understand your points. Your three
                points of XML conversion are:

                1. XML that exactly mimics current tab-separated LST files
                2. XML that describes all of the game elements in an XML way
                3. XML that goes all the way, takes #2 but also supports
                meta-level descriptions and game rules.

                I do see a few (minor) benefits of #1, but only as a step
                towards #2 & #3.
                * Can get XML reading into PCGen quickly and get it debugged
                before moving further.
                * _Very_ simple to write converters for existing LST files
                (both community supported and user generated)
                * Prepare the community to think in XML

                However I also agree with your statement that it doesn't bring
                enough to PCGen to be worthwhile. Perhaps the Code Monkeys
                would want that as a first step, but from a data-format
                standpoint it doesn't add any value over tab-separated.

                As you said, I agree with your thoughts that #2 is the best
                risk/gain in the short term, and #3 is the best risk/gain in
                the long term. I won't comment on branching the code - I don't
                know enough about whether that would require a freeze on bug
                fixes/FREQs, or whether we could do the work once and have it
                apply to both branches (CVS should do that, unless our
                underlying representation of the data changes the nature of
                the bug fixes).

                Where I have a concern is going from #2 to #3 (or even #1 to
                #2/#3). #3 seems by far the most elegant and powerful, but is
                hampered by all of the pre-existing LST files. In order to
                really take advantage of #3 I can't see an automatic
                converter. Which means that we need to do a lot of work just
                to preserve our existing LST functionality, which is work not
                being used to move forward.

                You had mentioned a hybrid of #2 and #3, which kept the formats
                separate. How feasible is a hybrid of #2 and #3 in which we don't?
                Just as now we have an evolution of tags, what would we have to do to
                develop the XML format for #3 such that it also supports the data for
                #2? In other words we put together XML specs for #3, but our initial
                goal is only to provide functionality for the #2 subset in code,
                converters, and list files. This gives us a fairly clean path to write
                converters for in the short term and goes for the maximized risk/reward
                you had mentioned, but then allows us (from both a code and LST
                perspective) to add in the meta data and slowly take advantage of it.
                Just as we currently add or update tags, this would also be working
                toward the eventual goal of #3, but #3 designed as a superset of #2.

                However, unlike the current tags, which we occasionally need to redo to
                keep with the current vision, instead we'll have a clear path and will
                have designed with future functionality taken into consideration.
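
                As a sketch of how that superset could behave at read time: a
                #2-level loader can simply skip elements it does not yet
                understand (say, a #3-level meta block) instead of failing, so
                files written against the fuller spec stay loadable. The tag
                names below are invented for the example.

                import java.io.StringReader;
                import javax.xml.parsers.DocumentBuilderFactory;
                import org.w3c.dom.Document;
                import org.w3c.dom.Node;
                import org.w3c.dom.NodeList;
                import org.xml.sax.InputSource;

                public class ForwardCompatibleReader {
                    public static void main(String[] args) throws Exception {
                        // A "phase 3" file: the <meta> element is unknown to
                        // a "phase 2" reader. (Invented tags.)
                        String xml = "<weapon key='longsword'>"
                                + "<name>Longsword</name>"
                                + "<meta edit-hint='dropdown'/>"
                                + "</weapon>";

                        Document doc = DocumentBuilderFactory.newInstance()
                                .newDocumentBuilder()
                                .parse(new InputSource(new StringReader(xml)));

                        NodeList kids =
                                doc.getDocumentElement().getChildNodes();
                        for (int i = 0; i < kids.getLength(); i++) {
                            Node n = kids.item(i);
                            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
                            switch (n.getNodeName()) {
                                case "name":
                                    System.out.println("name = "
                                            + n.getTextContent());
                                    break;
                                default:
                                    // Unknown (future) element: skip rather
                                    // than error out, so phase-3 data stays
                                    // loadable by a phase-2 reader.
                                    System.out.println("skipping <"
                                            + n.getNodeName() + ">");
                            }
                        }
                    }
                }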

                Cheers,
                Blue

                -----Original Message-----
                From: Keith Davies [mailto:keith.davies@...]
                Sent: Thursday, February 27, 2003 4:58 PM
                To: pcgen-xml@yahoogroups.com
                Subject: Re: [pcgen-xml] Re: XML Conversion - phased

                <snip -- full text of Keith's message #5, quoted above>
              • Keith Davies
                Message 7 of 14, Feb 28 8:07 AM
                  On Fri, Feb 28, 2003 at 09:36:54AM -0500, CC Americas 1 Carstensen James wrote:
                  > Keith wrote:
                  > > [ ... ] Thoughts, comments?
                  >
                  > Let me summarize to make sure I understand your points. Your three
                  > points of XML conversion are:
                  >
                  > 1. XML that exactly mimics current tab-separated LST files
                  > 2. XML that describes all of the game elements in an XML way
                  > 3. XML that goes all the way, takes #2 but also supports
                  > meta-level descriptions and game rules.

                  Correct in all cases.

                  > I do see a few (minor) benefits of #1, but only as a step
                  > towards #2 & #3.
                  > * Can get XML reading into PCGen quickly and get it
                  > debugged before moving further.
                  > * _Very_ simple to write converters for existing LST files
                  > (both community supported and user generated)
                  > * Prepare the community to think in XML
                  >
                  > However I also agree with your statement that it doesn't
                  > bring enough to PCGen to be worthwhile. Perhaps the Code
                  > Monkeys would want that as a first step, but from a
                  > data-format standpoint it doesn't add any value over
                  > tab-separated.

                  That is my belief. While it may have some benefit from a
                  coding standpoint, going to the obvious XML changes things
                  from a semi-arcane format that is reasonably concise to a
                  verbose but somewhat easier to understand format. While this
                  may be of some benefit, I think that we wouldn't be staying
                  here long enough for it to really be worth the trouble.

                  > As you said, I agree with your thoughts that #2 is the
                  > best risk/gain in the short term, and #3 is the best
                  > risk/gain in the long term. I won't comment on branching
                  > the code - I don't know enough about whether that would
                  > require a freeze on bug fixes/FREQs, or whether we could do
                  > the work once and have it apply to both branches (CVS
                  > should do that, unless our underlying representation of the
                  > data changes the nature of the bug fixes).

                  To be honest, I don't know how much of the existing code -- the code
                  that would be used in #2 -- would really get reused. I think that the
                  underlying mechanism of the program would be changing, and that the
                  scope of the changes would be broad enough that, while we could use the
                  existing code as a resource, it probably wouldn't make a great place to
                  start. The internal data model would be changing, the front end code
                  would be more or less replaced... all that would be left would be a bit
                  of the code used to handle certain game constructs, and that probably
                  wouldn't even work with the generalized data.

                  Hence this is the 'high-risk' route. I think it'd lead to simpler code
                  overall, but it means not being able to make full use of what already
                  has been done.

                  > Where I have a concern is going from #2 to #3 (or even #1
                  > to #2/#3). #3 seems by far the most elegant and powerful,
                  > but is hampered by all of the pre-existing LST files. In
                  > order to really take advantage of #3 I can't see an
                  > automatic converter. Which means that we need to do a lot
                  > of work just to preserve our existing LST functionality,
                  > which is work not being used to move forward.

                  Most of the simpler stuff -- which is, I think, most of everything --
                  should be pretty straightforward to convert. It's where things get
                  complex that we'll run into difficulty... but then again, I think they
                  could get simpler because the XML schema models them better. For
                  instance, right now we have a number of templates whose sole purpose is
                  to implement changes to the character that cannot be done another way.
                  The general rule in the XML is that anything that can change something
                  can change anything about it.

                  Gah. Ugly sentence. How about an example?

                  A class level can change anything about the character it is applied to;
                  it can add to stats, change the class hit die[1], increase or decrease
                  character size, change race, whatever. Similarly, the race can give /n/
                  levels of sorcerer, or just give 12 levels of sorcerer spellcasting (no
                  hit dice, familiar, etc., just spells). Taking a feat can give bonus
                  ranks in a skill; taking a tenth rank in a skill might give a feat. Why
                  not?

                  Granted, it means that things can be a little unsane (skill ranks give
                  feats? Huh?), but I'm willing to leave that to the data people. As far
                  as I'm concerned this is good.

                  [1] well, not *exactly*, but close enough -- each level may add a hit
                  die, but there's nothing saying that they have to be the same
                  size... incidentally, this also makes it easy to support the
                  material in Savage Species.
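
                  A toy sketch of that general rule in code: if every source
                  of change (class level, feat, skill rank) carries the same
                  kind of generic modification list, one routine applies them
                  all, and the special-purpose templates go away. All names
                  here are invented.

                  import java.util.HashMap;
                  import java.util.List;
                  import java.util.Map;

                  public class GeneralModifierSketch {
                      // One generic modification: which property to touch,
                      // and by how much.
                      record Mod(String property, int delta) {}

                      public static void main(String[] args) {
                          Map<String, Integer> character = new HashMap<>();
                          character.put("str", 10);
                          character.put("size", 0); // 0 = Medium, +1 = Large
                          character.put("feats", 0);

                          // A class level, a feat and a tenth skill rank all
                          // carry the same kind of payload -- no
                          // special-purpose template needed.
                          List<Mod> classLevel = List.of(
                                  new Mod("str", 2), new Mod("size", 1));
                          List<Mod> feat = List.of(new Mod("skill.jump", 2));
                          List<Mod> rankTen = List.of(new Mod("feats", 1));

                          // One general routine applies every source of change.
                          for (List<Mod> source :
                                  List.of(classLevel, feat, rankTen)) {
                              for (Mod m : source) {
                                  character.merge(m.property(), m.delta(),
                                          Integer::sum);
                              }
                          }
                          System.out.println(character);
                      }
                  }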

                  > You had mentioned a hybrid of #2 and #3, which kept the formats
                  > separate. How feasible is a hybrid of #2 and #3 in which we don't?
                  > Just as now we have an evolution of tags, what would we have to do to
                  > develop the XML format for #3 such that it also supports the data for
                  > #2? In other words we put together XML specs for #3, but our initial
                  > goal is only to provide functionality for the #2 subset in code,
                  > converters, and list files. This gives us a fairly clean path to write
                  > converters for in the short term and goes for the maximized risk/reward
                  > you had mentioned, but then allows us (from both a code and LST
                  > perspective) to add in the meta data and slowly take advantage of it.
                  > Just as we currently add or update tags, this would also be working
                  > toward the eventual goal of #3, but #3 designed as a superset of #2.

                  The formats wouldn't be entirely separate, just parts of
                  them. #2 would use the data schema for D&D that comes out of
                  #3; #3 would also support the meta and gui mechanisms. Thus,
                  #2 would be a subset of #3 in terms of XML, but they would
                  be able to use the same game data. At least, #3 would be
                  able to, if the correct meta definitions are loaded.

                  Perhaps 'hybrid' was a poorly-chosen word. I didn't mean a hybrid
                  program, but a cross between the two approaches.

                  What you've described is almost what I had in mind, though. If we were
                  to branch as I described earlier, the core branch would be modified to
                  make use of the d20 (D&D) XML schema (parts of which are in the
                  documentation I recently uploaded). This would get the extant data into
                  XML (at least for D&D, and probably for the others; the differences
                  aren't that profound yet) and the core branch would be able to use it.

                  The extant data can be converted -- automagically where feasible,
                  manually where not. I think many of the existing encodings may be made
                  simpler, but I think that -- for the ugly parts -- we can continue to
                  support pretty much the same behavior, at least initially. For
                  instance, where a class has to add a template in order to support
                  something (say, a class that increases the character's size) we can
                  probably do that conversion automatically, but maybe come back later to
                  make it more elegant (make the size change directly in the class
                  description, rather than in the template).

                  For the new branch, I don't know that we'd be able to scavenge a lot of
                  code from the core program to use in the meta-enabled program. The two
                  programs, while they have the same goal, don't really do it the same
                  way. The third program moves as much as possible into data, and will, I
                  hope, be simpler overall.


                  Keith
                  --
                  Keith Davies
                  keith.davies@...

                  PCGen: <reaper/>, smartass
                  "You just can't argue with a moron. It's like handling Nuclear
                  waste. It's not good, it's not evil, but for Christ's sake, don't
                  get any on you!!" -- Chuck, PCGen mailing list
                • Keith Davies
                  Message 8 of 14, Feb 28 10:38 PM
                    On Fri, Feb 28, 2003 at 08:07:04AM -0800, Keith Davies wrote:
                    > On Fri, Feb 28, 2003 at 09:36:54AM -0500, CC Americas 1 Carstensen James wrote:
                    > >
                    > > Let me summarize to make sure I understand your points. Your three
                    > > points of XML conversion are:
                    > >
                    > > 1. XML that exactly mimics current tab-separated LST files
                    > > 2. XML that describes all of the game elements in an XML way
                    > > 3. XML that goes all the way, takes #2 but also
                    > > supports meta-level descriptions and game rules.

                    <killer snip>

                    Any questions or comments? I've been asked to move things up and unless
                    someone comes up with something I haven't considered before now, it'll
                    be decision time.


                    Keith
                    --
                    Keith Davies
                    keith.davies@...

                    PCGen: <reaper/>, smartass
                    "You just can't argue with a moron. It's like handling Nuclear
                    waste. It's not good, it's not evil, but for Christ's sake, don't
                    get any on you!!" -- Chuck, PCGen mailing list
                  • Eric Beaudoin
                    Message 9 of 14, Feb 28 11:34 PM
                      At 11:07 2003.02.28, Keith Davies wrote:
                      >To be honest, I don't know how much of the existing code -- the code
                      >that would be used in #2 -- would really get reused. I think that the
                      >underlying mechanism of the program would be changing, and that the
                      >scope of the changes would be broad enough that, while we could use the
                      >existing code as a resource, it probably wouldn't make a great place to
                      >start. The internal data model would be changing, the front end code
                      >would be more or less replaced... all that would be left would be a bit
                      >of the code used to handle certain game constructs, and that probably
                      >wouldn't even work with the generalized data.

                      I was really hoping that this would not be the case,
                      i.e. that only the parsing code would be changed. The
                      monkeys have worked hard to make PCGEN go faster;
                      rewriting it at this point is not only high risk, it
                      will probably be a major step back.

                      The goal should be that the external representation of
                      the data has a minimal impact on the application that
                      represents it internally. This XML representation will
                      most probably also be used for E-Tools, so it stands a
                      good chance of becoming the de facto data representation
                      standard for d20 products. It is very important that we
                      design a language that is as application-independent as
                      possible.

                      Also, for the phased parts, I'm for the quick gains. In
                      my opinion, we should go to a "transition" XML schema
                      that mimics the .LST syntax that we have first, and then
                      build on this. No matter how hard we try, we will not be
                      able to create the "right" schema on the first try
                      anyway. It will be a series of little evolutions getting
                      us to the right point. We might as well start with
                      something that is already familiar and work from there.

                      My opinion anyway.

                      P.S. Don't count on editors too much to hide the
                      complexity of the schemas. All the major data
                      contributors are still using text editors to get the job
                      done. We should assume that it will stay that way and
                      plan a schema that will be easy to work with, rather
                      than easy for the machine to read. It's easier to
                      optimise a parser than to find and train dedicated data
                      monkeys.


                      -----------------------------------------------------------
                      Éric "Space Monkey" Beaudoin
                      >> In space, no one can hear you sleep...
                      >> Camels too can climb trees (and sometimes eat them)
                      <mailto:beaudoer@...>
                    • Scott Ellsworth
                      Message 10 of 14, Mar 1, 2003
                        On Friday, February 28, 2003, at 11:34 PM, Eric Beaudoin wrote:

                        > At 11:07 2003.02.28, Keith Davies wrote:
                        >> The internal data model would be changing, the front end code
                        >> would be more or less replaced... all that would be left would be a
                        >> bit
                        >> of the code used to handle certain game constructs, and that probably
                        >> wouldn't even work with the generalized data.
                        >
                        > I was really hoping that this would not be the case i.e. that only the
                        > parsing code would be change. The monkeys have worked hard to make
                        > PCGEN go faster, rewriting it at this point is not only high risk, it
                        > will probably be a major step back.

                        As one of the monkeys who has spent upwards of a coder-month working on
                        optimization, I must humbly disagree. I am not sure that it would take
                        less time to fix the current data model than to write a new one. This
                        is not to say that the current code is bad, just that it was not
                        written originally for optimal speed, and we have enough data that we
                        really need to have indexed hashes for our data.

                        For example, the current code regularly iterates over collections of
                        keys, doing a caseless text comparison of each key, because the keys
                        double as user editable text. Further, it regularly reparses the same
                        string for information it already parsed. (Every tab switch requires
                        reparsing a whole bunch of entries for Variables, splitting on |
                        characters.) These two things alone eat up well over 90% of the
                        execution time, according to my profiler, and they are so deep in the
                        program that they are very, very hard to fix. It took me four hours to
                        do the work to just replace global weapon profs, and that was one of
                        the easier ones.

                        Keith has proposed a core data model where keys are unique and always,
                        always, always in lower case.

                        Further, in Keith's data model, we would not parse a string like
                        BONUS:somestuff|moreStuff|+2|otherstuff
                        more than once - the data would be broken apart on read, and stored in
                        appropriate hashes or lists, so it would be easy to find out if an item
                        gave a strength bonus, for example.

                        We could split these efforts: convert the data model,
                        and convert the XML files, as separate tasks. The lst
                        files would remain essentially the same, but would
                        have separate key and user-visible name data, and the
                        reader would break apart all bonuses/variables on
                        reading. I have been noodling away at that in my spare
                        time, but it is not a fast process.
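
                        A small sketch of those two ideas together -- unique
                        lower-case keys, and breaking a BONUS string apart
                        exactly once at load. The field layout is guessed from
                        the example above; the real BONUS grammar is richer.

                        import java.util.ArrayList;
                        import java.util.HashMap;
                        import java.util.List;
                        import java.util.Map;

                        public class BonusIndexSketch {
                            // One parsed BONUS entry; the real grammar is
                            // richer than this.
                            record Bonus(String type, String target,
                                    int value, String note) {}

                            static final Map<String, List<Bonus>> byKey =
                                    new HashMap<>();

                            // Split the string once, at load time, and index
                            // it under a unique lower-case key kept separate
                            // from the user-visible name.
                            static void load(String key, String raw) {
                                String[] p = raw.substring("BONUS:".length())
                                        .split("\\|");
                                Bonus b = new Bonus(p[0].toLowerCase(),
                                        p[1].toLowerCase(),
                                        Integer.parseInt(p[2]),
                                        p.length > 3 ? p[3] : "");
                                byKey.computeIfAbsent(key.toLowerCase(),
                                        k -> new ArrayList<>()).add(b);
                            }

                            // Later questions are hash lookups, not re-parses
                            // or caseless scans over every key.
                            static boolean givesBonusTo(String key,
                                    String target) {
                                return byKey
                                        .getOrDefault(key.toLowerCase(),
                                                List.of())
                                        .stream()
                                        .anyMatch(b -> b.target()
                                                .equals(target.toLowerCase()));
                            }

                            public static void main(String[] args) {
                                load("Belt of Giant Strength",
                                        "BONUS:STAT|STR|+4|TYPE=Enhancement");
                                System.out.println(givesBonusTo(
                                        "Belt of Giant Strength", "str"));
                            }
                        }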

                        If this was a priority, we would want to get a number of code monkeys
                        working on it as a major feature. We could easily get an order of
                        magnitude out of this, as we make over 5000 function calls just to
                        switch a tab, and we should be making in the hundreds at the outside.

                        Scott
                      • Keith Davies
                        Message 11 of 14, Mar 1, 2003
                          On Sat, Mar 01, 2003 at 02:51:40PM -0800, Scott Ellsworth wrote:
                          > As one of the monkeys who has spent upwards of a
                          > coder-month working on optimization, I must humbly
                          > disagree. [ ... ]

                          Scott, please contact me offlist. I have some questions that I think
                          you can answer for me; it'd be a big help. ICQ 8550570, or AIM
                          keithjdavies.

                          Thanks,
                          Keith
                          --
                          Keith Davies
                          keith.davies@...

                          PCGen: <reaper/>, smartass
                          "You just can't argue with a moron. It's like handling Nuclear
                          waste. It's not good, it's not evil, but for Christ's sake, don't
                          get any on you!!" -- Chuck, PCGen mailing list
                        • merton_monk <merton_monk@yahoo.com>
                          Message 12 of 14, Mar 1, 2003
                            I'm logged into AIM right now as CMPMerton (might be a space between
                            CMP and Merton, I forget) if you'd like to discuss the persistence
                            layer in PCGen or data model. Anyone who wants to help with the xml
                            conversion can join... I'll probably be online for a couple of hours.

                            -Bryan

                            --- In pcgen-xml@yahoogroups.com, Keith Davies
                            <keith.davies@k...> wrote:
                            > [ ... ] Scott, please contact me offlist. I have
                            > some questions that I think you can answer for
                            > me; it'd be a big help. ICQ 8550570, or AIM
                            > keithjdavies.
                          • CC Americas 1 Carstensen James
                            Keith, Eric and Scott said it, but I think the big question is how much work the Code Monkeys are willing to do. I ll sum up a couple of points from different
                            Message 13 of 14 , Mar 3, 2003
                              Keith,

                              Eric and Scott said it, but I think the big question is how much work
                              the Code Monkeys are willing to do. I'll sum up a couple of points from
                              different posts here.

                              Eric Beaudoin wrote:
> Also, for the phased part, I'm for the quick gains. In my opinion,
> we should go to a "transition" XML schema that mimics the .LST
> syntax that we have first, and then build on this. No matter how
> hard we try, we will not be able to create the "right" schema on
> the first try anyway. It will be a series of little evolutions
> getting us to the right point. We might as well start with
> something that is already familiar and work from there.
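(To make the "transition" idea concrete: a hypothetical equipment line
and a direct, tag-for-tag XML rendering of it. The element names are
illustrative only -- no schema has been agreed:)

    Longsword    TYPE:Weapon.Melee    COST:15    WT:4

    <equipment name="Longsword">
      <type>Weapon.Melee</type>
      <cost>15</cost>
      <wt>4</wt>
    </equipment>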

Eric, you've contributed much more than me to PCGen, so take this with
the respect it's intended. From looking at the list files, and looking
at the "cleaner" implementation that Keith wants, it's more of a
revolution than an evolution. Some things are just done in a different
enough way that getting there from a direct tab-to-XML conversion is
not easy, may not be optimal, and may end up just being "bolted on"
rather than cleanly implemented.

I myself was hoping for the same evolution, but more from Step #2 to
Step #3, and even there I don't know how true that is.

The current LST structure has evolved over time to handle more and more,
and it has both its strengths and weaknesses. XML has different
strengths and weaknesses, and I think that if we just pull the LST
files over to XML format, we'll be importing some of the weaknesses of
LST while not mitigating some of the weaknesses of XML.

But we grow from there - maybe that's worth it for the initial period:
convert people over quickly, then grow. Is growing from a syntax that
won't take advantage of XML's strengths to one that will a
longer/harder process than doing a single big change and having a good
foundation to build on?
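(One way to picture the "bolted on" risk: a straight conversion can
leave pipe-delimited LST strings embedded in attributes, where a schema
designed for XML would use nested structure. Both fragments below are
hypothetical:)

    Bolted on:

    <feat name="Dodge" prereq="PRESTAT:1,DEX=13"/>

    Designed for XML:

    <feat name="Dodge">
      <prereq kind="stat" stat="DEX" min="13"/>
    </feat>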

                              Scott Ellsworth wrote:
> We could split these efforts: convert the data model, and convert
> the XML files as separate tasks. The lst files would remain
> essentially the same, but would have separate key and user-visible
> name data, and the reader would break apart all bonuses/variables
> on reading. I have been noodling away at that in my spare time,
> but it is not a fast process.

Is this something that would be going on concurrently, before, or after
the XML conversion? It seems like even if we stayed with LST files,
working out a way to do unique lowercase identifiers would be a benefit.
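(A rough sketch of what "break apart all bonuses on reading" might look
like, assuming the familiar BONUS:STAT|STR|2 pipe form -- real BONUS
tags carry more variants than this handles:)

    // Hypothetical pre-parsed bonus: split the raw LST value once at
    // load time so the engine never re-tokenizes the string later.
    class ParsedBonus {
        final String category;  // e.g. "STAT"
        final String target;    // e.g. "STR"
        final String formula;   // e.g. "2" (could be a variable name)

        ParsedBonus(String lstValue) {
            // lstValue is the text after "BONUS:", e.g. "STAT|STR|2";
            // assumes exactly three pipe-separated parts.
            String[] parts = lstValue.split("\\|", 3);
            category = parts[0];
            target   = parts[1];
            formula  = parts[2];
        }
    }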

                              Cheers,
                              Blue

                              -----Original Message-----
                              From: Keith Davies [mailto:keith.davies@...]
                              Sent: Saturday, March 01, 2003 1:38 AM
                              To: pcgen-xml@yahoogroups.com
                              Subject: Re: [pcgen-xml] Re: XML Conversion - phased


                              On Fri, Feb 28, 2003 at 08:07:04AM -0800, Keith Davies wrote:
                              > On Fri, Feb 28, 2003 at 09:36:54AM -0500, CC Americas 1 Carstensen
                              James wrote:
                              > >
                              > > Let me summarize to make sure I understand your points. Your three
                              > > points of XML conversion are:
                              > >
> > 1. XML that exactly mimics current tab-separated LST files
> > 2. XML that describes all of the game elements in an XML way
> > 3. XML that goes all the way: takes #2 but also supports
> >    meta-level descriptions and game rules.

                              <killer snip>

                              Any questions or comments? I've been asked to move things up and unless
                              someone comes up with something I haven't considered before now, it'll
                              be decision time.


                              Keith
                              --
                              Keith Davies
                              keith.davies@...

                              PCGen: <reaper/>, smartass
                              "You just can't argue with a moron. It's like handling Nuclear
                              waste. It's not good, it's not evil, but for Christ's sake, don't
                              get any on you!!" -- Chuck, PCGen mailing list


                            • Keith Davies
                              Message 14 of 14 , Mar 3, 2003
                                On Mon, Mar 03, 2003 at 09:34:09AM -0500, CC Americas 1 Carstensen James wrote:
                                > Keith,
                                >
                                > Eric and Scott said it, but I think the big question is how much work
                                > the Code Monkeys are willing to do. I'll sum up a couple of points from
                                > different posts here.

                                Hi All,

Things have changed over the weekend; I will be posting more later today
(I'm a little busy at work right now). The schedule is being moved up
and I've had to make some (somewhat unilateral) decisions. A plan has
been submitted to the BoD for ratification and I expect we'll start on
it very soon. To summarize, however:

                                1. We will use (more or less) the simplest and most direct translation
                                of LST to XML for now. *Some* of my design will apply, but for the
                                most part the XML will be very recognizable and easily understood by
                                non-XML monkeys. This was a key point both for the sake of data
                                monkey comfort and simple time pressure.
                                2. The internal data model will, for the most part, remain untouched.
                                We may be able to sneak some changes into the IDM, but for the most
                                part we're aiming at minimal impact.
3. XML serialization will be supported in parallel with LST I/O, at
least during the early stages (see the sketch after this list).
Once XML support is in, LST use is deprecated and, in my schedule,
slated for earliest possible removal (in the next major release, as
soon as we can manage it) in order to reduce maintenance headaches.
4. All distributed data files will be converted in one pass, rather than
file type by file type (this increases the amount of pain but
minimizes its duration).
                                5. A converter will be provided for LST files in the field.
                                6. We will use the time until the release of 5.0.0 for analysis of the
                                code and LST structure; once 5.0.0 hits the streets we'll branch the
                                source and start the changes.
                                7. After 6.0.0 (full XML, LST removed) is released, we branch the source
                                as described in an earlier posting and pursue the path I'd *like* to
                                take.
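(A minimal sketch of what point 3's parallel support could look like --
dispatch on file extension, with everything below hypothetical rather
than PCGen's real loader:)

    import java.io.File;

    // Hypothetical loader keeping the LST and XML readers side by side
    // during the transition; the LST branch goes away at the 6.0.0 cut.
    class CampaignLoader {
        void load(File source) {
            String name = source.getName().toLowerCase();
            if (name.endsWith(".xml")) {
                loadXml(source);   // new path
            } else if (name.endsWith(".lst")) {
                loadLst(source);   // deprecated path
            } else {
                throw new IllegalArgumentException("Unknown format: " + name);
            }
        }

        private void loadXml(File f) { /* parse with an XML reader */ }
        private void loadLst(File f) { /* existing tab-separated reader */ }
    }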

The plan above will get us into XML and play on the monkeys' familiarity
with LST files. It will be crap XML -- in the sense that while it will
comply with XML standards it will be poorly designed XML that will not
take advantage of all the benefits that XML will give us -- but it will
cause the least distress among data monkeys and have the least impact on
the code. The conversion will also be the simplest, almost a 1:1
congruence except where benefits can be gained by making a change.
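(And a toy version of that near-1:1 conversion: the first tab-separated
field is the object's name, and each remaining TAG:value field becomes
a child element. Purely illustrative -- the element names are not the
ratified schema, and real code would escape &, < and > in values:)

    // Hypothetical converter for a single LST line; assumes every
    // field after the first is in TAG:value form.
    class LstToXml {
        static String convertLine(String lstLine, String element) {
            String[] fields = lstLine.split("\t");
            StringBuilder xml = new StringBuilder();
            xml.append("<").append(element)
               .append(" name=\"").append(fields[0]).append("\">\n");
            for (int i = 1; i < fields.length; i++) {
                int colon = fields[i].indexOf(':');
                String tag = fields[i].substring(0, colon).toLowerCase();
                String value = fields[i].substring(colon + 1);
                xml.append("  <").append(tag).append(">").append(value)
                   .append("</").append(tag).append(">\n");
            }
            return xml.append("</").append(element).append(">").toString();
        }
    }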

                                We're moving, folks, and while it's not the way I'd been hoping to go,
                                it will get us headed in the right direction.


                                Keith
                                --
                                Keith Davies
                                keith.davies@...

                                PCGen: <reaper/>, smartass
                                "You just can't argue with a moron. It's like handling Nuclear
                                waste. It's not good, it's not evil, but for Christ's sake, don't
                                get any on you!!" -- Chuck, PCGen mailing list