Loading ...
Sorry, an error occurred while loading the content.

[ARCH] Using a Graph for Character Data Store

Expand Messages
  • Tom Parker
    One key question in changing PCGen was the data format that we would be targeting internally to PCGen. A key aspect of the architecture was to make sure we
    Message 1 of 4 , Jun 15, 2009
    • 0 Attachment
      One key question in changing PCGen was the data format that we would be targeting internally to PCGen.

      A key aspect of the architecture was to make sure we separate static information (information about a class, e.g. "Mage") from information about that class when applied to a PlayerCharacter (such as the level of that class). While this doesn't impact the "Rules Data Store" it does impact the "Character Data Store". One of the challenges in PCGen 5.14 is that objects must be cloned in order to be used in a PlayerCharacter (to avoid cross-pollution with multiple PlayerCharacters are loaded). This has been partially addressed in 5.16 and the cloning has been mostly resolved in the current Trunk (5.17). However, many of the performance benefits of not cloning have not been realized in 5.17 (this is required work)

      One small tangent before I try to explain a few points about the data structure: PCGen has the concept of associations. When a Feat (such as "Martial Weapon Proficiency") can be taken with multiple targets (such as "Longsword" or "Spear"), we call those targets "associations". Generally the association is limited by what is in the CHOOSE token of the item to which the association is made. Typically, we will write associations in parenthesis, such as "Weapon Proficiency (Dagger)" would indicate an object (likely a Feat) that was called "Weapon Proficiency" with the association "Dagger".

      With that background, let's get into the targeted data format. Since much of the architecture was generated through the development of thought examples and counter-cases, I'll try to walk through a few items here. There are other factors in play, and this jumps over a few months of discussion to the current design point, so some work filling in details will be required going forward. For now, however:

      We started with a tree model, with Classes awarding slots and those slots being filled by Feats, Skills, etc. We quickly found a problem with that idea: If Weapon Proficiency (Dagger) is awarded by a Class and Weapon Proficiency (Short Sword) is awarded by, say, a Template, how do you fetch the associations for Weapon Proficiency?

      If the base Ability is referenced from two places in the Tree, then the tree reconverges, and isn't a tree... Either we have to different tree nodes that wrap the same PObject (possible, but results in tree search issues and dereferencing penalty) or we want this to be a Graph (in the mathematical sense not in the chart sense). The neat thing with a well-built Graph is that means one can quickly grab all of the associations to an Ability without traversing the entire Graph.

      This plays beautifully with another conceptual problem we need to solve: Prerequisites: While PCGen currently has a perspective of "Prerequisite", this is actually used in multiple ways. The first distinction is that there are times when we want to use an item as a "persistent requirement" (meaning the condition must always be met for the PC to have access to the item), while other conditions might only be a requirement when the object (such as a Language) is taken. This is important for addressing situations that are temporary (such as temporary changes to Strength). The long-term solution here is to split "Prerequisites" (conditions that need to be met when an object is taken) from "Requirements" (the condition must always be met). This is an existing Feature Request: http://sourceforge.net/tracker/index.php?func=detail&aid=1782186&group_id=25576&atid=384722

      The second distinction is in how a Prerequisite (literally both Prerequisites or Requirements, but I'm not going to type that every time) is used. Some Prerequisites are universal (meaning they apply every time something is taken) while others are limited (meaning they apply only when an item is taken in a specific context). These are easily distinguished in how they are stored in the LST files. Universal Prerequisites are stored on the object they limit. This means they appear (in 5.16) as a separate PRExxx token on the object (such as a minimum Dexterity on the Dodge Feat). Limited Prerequistes on the other hand, appear as part of another token (typically as a trailing PRExxx after a pipe (|) in another token, but occasionally in brackets [] ). These limited Prerequisites are part of the relationship between the "parent" object and the "child" object.

      With these limited Prerequisites, it is not an object that has the prerequisite, but instead the interaction between the parent and the child has the prerequisite. Thus, we actually have another actor. We therefore need to be able to store information in both the objects and in the relationships between those objects.

      Going back to graph theory, we have Nodes and Edges (edges connect nodes, nodes may also be called vertices depending on your graph theory instructor). The Nodes in our theory contain the objects (such as Feats and Templates), and the Edges are the relationships. These edges could be PRE/REQ and association aware. After all, it's the interaction (an edge, really) that is has the PRExxx in the limited Prerequisite example... Actually this mental model can work for hit points and other things that are also a result of associations.

      So that's the basics of why we're looking at a Graph as the Character Data Store going forward. There are some further details to address and tweak that model, but I'll stop here to let folks absorb and ask questions.

      TP.
    • Martijn Verburg
      Hi all, ... Wow! That s great progress, I didn t realise that trunk was that far advanced already. I m looking forward to fewer strange Why does my kitty
      Message 2 of 4 , Jun 16, 2009
      • 0 Attachment
        Hi all,

        > A key aspect of the architecture was to make sure we separate
        > static information (information about a class, e.g. "Mage") from
        > information about that class when applied to a PlayerCharacter
        > (such as the level of that class). While this doesn't impact the
        > "Rules Data Store" it does impact the "Character Data Store". One
        > of the challenges in PCGen 5.14 is that objects must be cloned in
        > order to be used in a PlayerCharacter (to avoid cross-pollution
        > with multiple PlayerCharacters are loaded). This has been
        > partially addressed in 5.16 and the cloning has been mostly
        > resolved in the current Trunk (5.17).

        Wow! That's great progress, I didn't realise that trunk was that far advanced already. I'm looking forward to fewer strange "Why does my kitty have a bastard sword in one hand errors" :)

        > However, many of the
        > performance benefits of not cloning have not been realized in 5.17
        > (this is required work).

        Are the performance benefits primarily going to be realised simply because we're no longer doing expensive cloning or are there other advantages we can eek out because we will no longer be cloning?

        > <snip>
        >
        > The long-term
        > solution here is to split "Prerequisites" (conditions that need to
        > be met when an object is taken) from "Requirements" (the condition
        > must always be met). This is an existing Feature Request:
        > http://sourceforge.net/tracker/index.php?func=detail&aid=1782186&
        > group_id=25576&atid=384722

        Ahhhhhhh, <Lightbulb goes off in head>.

        > <snip>
        >
        > Universal Prerequisites are stored on the object
        > they limit. This means they appear (in 5.16) as a separate PRExxx > token on the object (such as a minimum Dexterity on the Dodge
        > Feat). Limited Prerequistes on the other hand, appear as part of
        > another token (typically as a trailing PRExxx after a pipe (|) in
        > another token, but occasionally in brackets [] ). These limited
        > Prerequisites are part of the relationship between the "parent"
        > object and the "child" object.

        <bing, another lightbulb goes off>

        > With these limited Prerequisites, it is not an object that has the
        > prerequisite, but instead the interaction between the parent and
        > the child has the prerequisite. Thus, we actually have another
        > actor. We therefore need to be able to store information in both
        > the objects and in the relationships between those objects.
        >
        > Going back to graph theory, we have Nodes and Edges (edges connect
        > nodes, nodes may also be called vertices depending on your graph
        > theory instructor). The Nodes in our theory contain the objects
        > (such as Feats and Templates), and the Edges are the
        > relationships. These edges could be PRE/REQ and association
        > aware. After all, it's the interaction (an edge, really) that is
        > has the PRExxx in the limited Prerequisite example... Actually this
        > mental model can work for hit points and other things that are also
        > a result of associations.

        Having done graph theory in Comp Sci (10+ years ago now mind you) I grok this, well at least when I'm not coffee deprived! But do/can we have some visualisations of this concept on the wiki? Perhaps a link to a basic graph theory explanation/tutorial/paper/whatever on the web (even something like a Wikipedia article if it's not too complex).

        For me using a graph makes _total_ sense, we just need to make sure that newer developers or developers who haven't trained in formal graph theory get help for this concept.

        I haven't dived into the code around this area very much either, but to have a commented sample of how to ties two nodes together with a relationship on the wiki would be very useful.

        > So that's the basics of why we're looking at a Graph as the
        > Character Data Store going forward. There are some further details
        > to address and tweak that model, but I'll stop here to let folks
        > absorb and ask questions.

        Reading this was enlightening it made a couple of concepts stand out nice and clearly for me (especially after getting lost occasionally on experimental).

        Thanks Tom,

        K
      • Tom Parker
        ... [On removal of cloning] Wow!  That s great progress, I didn t realise that trunk was that far advanced already.  It s not 100% done, but you can see the
        Message 3 of 4 , Jun 16, 2009
        • 0 Attachment
          --- On Tue, 6/16/09, Martijn Verburg <martijnverburg@...> wrote:
          [On removal of cloning] Wow!  That's great progress, I didn't realise that trunk was that far advanced already. 

          It's not 100% done, but you can see the state here:
          http://www.pcgen-test.org/wiki/index.php?title=Architecture_Changes_5.17#Eliminate_Cloning

          > However, many of the
          > performance benefits of not cloning have not been realized in 5.17
          > (this is required work).

          Are the performance benefits primarily going to be realised simply because we're no longer doing expensive cloning or are there other advantages we can eek out because we will no longer be cloning?

          The literal cloning CPU effort is probably a small fraction of the performance issue.  It's the side effects of cloning that are painful.  In a clone environment, comparisons can't be instance equality (==), but must be string comparison on key... but you may not have the key to start, so you have to request the key (or abbreviation for a stat).  So even in trivial comparisons we can reduce:

          stat1.getAbbreviation().equalsIgnoreCase(stat2.getAbbreviation());
          to
          stat1.equals(stat2);

          Note that .equals() in this case for PCStat may have been reduced from a deep equals (that has to test values in the PCStat) to simply an instance equality (because no cloning means they are effectively type safe, thus can test solely on ==).  Thus for that equality test we're probably seeing 2 or 3 orders of magnitude of speed increase.  It's small savings each time, but keep in mind we do these tests thousands of times with every change to a PC.

          A concrete example is to look at PCStat objects (PCStats are Ability scores, things like STR, INT, WIS).  The cloning was literally removed in SVN 9690, but then SVN 9884-6, 9888-90, 9893-5, 9897, 9899, 9900 were all additional changes that removed behavior  required when a clone was present but wasteful when no cloning takes place.

          The removal of list searches, String comparisons, etc. provides improvement.  This is why I've been making the point that so much of the performance issue is dependent on other issues (and why I wanted to break the String dependencies between the LST files and the core).  We had to break the cloning behavior in order to be able to get to this point.  IMHO anyway.

          But do/can we have some visualisations of [the graph concept] on the wiki? 

          We will need some examples for the "thought cases"/"counter-cases" that drive us to the current design, so, yes, some will need to be done.


          Perhaps a link to a basic graph theory explanation/tutorial/paper/whatever on the web (even something like a Wikipedia article if it's not too complex).

          Wikipedia http://en.wikipedia.org/wiki/Graph_(mathematics) and http://en.wikipedia.org/wiki/Graph_theory are both pretty good, as is this:
          http://www.boost.org/doc/libs/1_39_0/libs/graph/doc/graph_theory_review.html

          Note that PCGen will at least be using a directed graph (since we have objects granting other objects).  Some other items from the Graph page also come into play (likely a weighted graph as well)

          At some level, I need to concede that a good computer science education cannot be undervalued.  Reading something like "Introduction to Algorithms" or "The Algorithm Design Manual" might be a substitute if the individual in question learns enough by reading.  Then again, we don't really need a large subset of graph theory (one can stop after graph searches), so a bit of delving on Wikipedia and reading some related articles (depth first search, breadth first search, Dijkstra's algorithm) is probably sufficient.

          I haven't dived into the code around this area very much either, but to have a commented sample of how to ties two nodes together with a relationship on the wiki would be very useful.

          Your request is data structure dependent. We're not there yet :)

          TP.
          --
          Tom Parker
          thpr@... and tppublic@...


        • Martijn Verburg
          Hi all, Finally getting to commenting on all of the various architecture threads again, here s the first one (and the easiest!). ... Great to be able to track
          Message 4 of 4 , Jul 21, 2009
          • 0 Attachment
            Hi all,

            Finally getting to commenting on all of the various architecture threads again, here's the first one (and the easiest!).

            > It's not 100% done, but you can see the state here:
            > http://www.pcgen-test.org/wiki/index.php?title=Architecture_Changes_5.17#Eliminate_Cloning

            Great to be able to track this, I've got it bookmarked :)

            >> Are the performance benefits primarily going to be realised simply
            >> because we're no longer doing expensive cloning or are there other
            >> advantages we can eek out because we will no longer be cloning?
            >
            > The literal cloning CPU effort is probably a small fraction of the
            > performance issue.  It's the side effects of cloning that are
            > painful.  In a clone environment, comparisons can't be instance
            > equality (==), but must be string comparison on key... but you may
            > not have the key to start, so you have to request the key (or
            > abbreviation for a stat).  So even in trivial comparisons we can
            > reduce:
            >
            > stat1.getAbbreviation().equalsIgnoreCase(stat2.getAbbreviation());
            > to
            > stat1.equals(stat2);
            >
            > Note that .equals() in this case for PCStat may have been reduced
            > from a deep equals (that has to test values in the PCStat) to
            > simply an instance equality (because no cloning means they are
            > effectively type safe, thus can test solely on ==).  Thus for that
            > equality test we're probably seeing 2 or 3 orders of magnitude of
            > speed increase.  It's small savings each time, but keep in mind we
            > do these tests thousands of times with every change to a PC.
            >
            > A concrete example is to look at PCStat objects (PCStats are
            > Ability scores, things like STR, INT, WIS).  The cloning was
            > literally removed in SVN 9690, but then SVN 9884-6, 9888-90,
            > 9893-5, 9897, 9899, 9900 were all additional changes that removed
            > behavior  required when a clone was present but wasteful when no
            > cloning takes place.
            >
            > The removal of list searches, String comparisons, etc. provides
            > improvement.  This is why I've been making the point that so much
            > of the performance issue is dependent on other issues (and why I
            > wanted to break the String dependencies between the LST files and
            > the core).  We had to break the cloning behavior in order to be
            > able to get to this point.  IMHO anyway.

            OK, I understand this more clearly now, thanks for the explanation.

            > We will need some examples for the "thought cases"/"counter-cases"
            > that drive us to the current design, so, yes, some will need to be
            > done.

            OK, I see you have this in your http://www.pcgen-test.org/wiki/index.php?title=Tom%27s_Arch_%22TO_DO%22_List (which covers the other points raised in this discussion as well)

            >> Perhaps a link to a basic graph theory explanation/tutorial/paper
            >>/whatever on the web (even something like a Wikipedia article if
            >>it's not too complex).
            >
            > Wikipedia http://en.wikipedia.org/wiki/Graph_(mathematics) and
            > http://en.wikipedia.org/wiki/Graph_theory are both pretty good, as
            > is this:
            > http://www.boost.org/doc/libs/1_39_0/libs/graph/doc/graph_theory_review.html
            >
            > Note that PCGen will at least be using a directed graph (since we
            > have objects granting other objects).  Some other items from the
            > Graph page also come into play (likely a weighted graph as well)
            >
            > At some level, I need to concede that a good computer science
            > education cannot be undervalued.  Reading something like
            > "Introduction to Algorithms" or "The Algorithm Design Manual" might
            > be a substitute if the individual in question learns enough by
            > reading.  Then again, we don't really need a large subset of graph
            > theory (one can stop after graph searches), so a bit of delving on
            > Wikipedia and reading some related articles (depth first search,
            > breadth first search, Dijkstra's algorithm) is probably sufficient.

            OK, I've posted this onto the wiki under http://www.pcgen-test.org/wiki/index.php?title=Graph_Theory#Graph_Theory_and_PCGen and linked to it from Architecture and Code Team pages.

            >> I haven't dived into the code around this area very much either,
            >> but to have a commented sample of how to ties two nodes together
            >> with a relationship on the wiki would be very useful.
            >
            > Your request is data structure dependent. We're not there yet :)

            Fair enough!

            K
          Your message has been successfully submitted and would be delivered to recipients shortly.