Hello! - (and a first question :-)

  • Enzo Maggi
    Message 1 of 5 , Feb 7, 2002
      Hello everybody
      (There are not many people around, as far as I can see...)

      I started using MSV a few weeks ago. I appreciate its unified vision of
      the different XML grammars, the strong support for Visitors, and the
      overall design goals.
      I understood (I hope) the internal AGM and VGM logic, and how the
      validator 'walks' through the expressions to validate complex content.
      I have a sort of newbie question; I apologize in advance if this is the
      case...

      The way in which complex expressions are built and validated is
      definitely the bottleneck of the structure, as Kohsuke also says in his
      comments on the ExpressionPool class. Since linear sequences of
      expressions are translated into recursive calls to
      createChildAcceptor(), the runtime quickly fills the stack even with
      moderately complex 'maxOccurs' expressions. Also, the number of
      occurrences of an expression in an instance document is not easily
      available while validating.
      So... what about having the cardinality of an expression as one of the
      properties of the expression itself, and translating recursive calls into
      loops whenever possible?
      XML Schema has the (inelegant) minOccurs and maxOccurs, but rewriting
      rules also exist for DTD-based expressions to compute the min and max
      occurs of a complex expression, given its sub-components.
      createChildAcceptor() could be used only in the case of complexContent, and
      sequences of the type <element A min=1 max=3> would not be translated
      into the implicitly recursive expression A, (A, (A)?)?
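
To make the rewriting concrete, here is a minimal Java sketch of the expansion described above. It is not MSV code: the class OccursExpansion and its methods are hypothetical names, and the result is only a string in the informal notation used in this message. It is meant to show that the nesting depth of the optional part grows with maxOccurs - minOccurs.

```java
// Hypothetical helper, not part of MSV: prints the nested form that a
// bounded repetition is rewritten into, e.g. A{1,3} -> A, (A, (A)?)?
class OccursExpansion {

    static String expand(String name, int min, int max) {
        if (min < 0 || max < min) {
            throw new IllegalArgumentException("need 0 <= min <= max");
        }
        // Optional tail, built from the innermost group outwards.
        String optional = "";
        for (int i = 0; i < max - min; i++) {
            optional = "(" + name + (optional.isEmpty() ? "" : ", " + optional) + ")?";
        }
        // Mandatory prefix: 'min' copies of the particle.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < min; i++) {
            if (out.length() > 0) out.append(", ");
            out.append(name);
        }
        if (!optional.isEmpty()) {
            if (out.length() > 0) out.append(", ");
            out.append(optional);
        }
        return out.toString();
    }

    // Nesting depth of the optional tail; this is what grows with maxOccurs.
    static int nestingDepth(int min, int max) {
        return max - min;
    }
}
```

With these assumptions, OccursExpansion.expand("A", 1, 3) returns the string A, (A, (A)?)? and a particle with min=1 max=100 would nest 99 levels deep.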

      Is this possible?

      Thanks,

      Enzo Maggi

      Note: I don't know the syntax of RELAX
    • Kohsuke KAWAGUCHI
      Message 2 of 5 , Feb 9, 2002
        Thank you for your interest in MSV.

        > I have a sort of newbie question, I apologize in advance if this is the
        > case...

        No, not at all. This is by no means a newbie question.


        > So... what about having the cardinality of an expression as one of the
        > properties of the expression itself, and translating recursive calls into
        > loops whenever possible?

        I actually heard this request once before. At that time, the proposal
        was to add OccurenceExp, a unary expression with a cardinality.

        But I'm still not sure whether it's a good idea.


        Firstly, OccurenceExp doesn't solve the memory usage problem, because
        all expressions have to be immutable.

        Say you have <element ref="X" maxOccurs="100"/>, then you'll first have

        OccurenceExp[val=100]
        |
        +- ElementExp

        But as the validation goes on, you'll eventually have 100 OccurenceExp
        with different "val" values. The number of expressions will be the same.
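
A rough Java sketch of this point, under stated assumptions: CountedExp and its Pool are hypothetical stand-ins for the proposed OccurenceExp and for an expression pool, not MSV's actual classes. Because the counter cannot be decremented in place, each consumed element yields a different immutable object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the proposed OccurenceExp; not MSV's actual classes.
final class CountedExp {
    final String child;   // name of the repeated element
    final int val;        // occurrences still allowed

    private CountedExp(String child, int val) {
        this.child = child;
        this.val = val;
    }

    // Stand-in for an expression pool: one shared immutable instance per value.
    static final class Pool {
        private final Map<Integer, CountedExp> cache = new HashMap<>();
        private final String child;

        Pool(String child) { this.child = child; }

        CountedExp get(int val) {
            return cache.computeIfAbsent(val, v -> new CountedExp(child, v));
        }

        int size() { return cache.size(); }
    }

    // Residual after consuming one child: a different object, because 'val'
    // cannot be decremented in place. Returns null once nothing is left.
    CountedExp afterOne(Pool pool) {
        return val > 1 ? pool.get(val - 1) : null;
    }

    // Feeds 'maxOccurs' children through the expression and reports how many
    // distinct CountedExp objects had to exist along the way.
    static int distinctExpressions(int maxOccurs) {
        Pool pool = new Pool("X");
        CountedExp current = pool.get(maxOccurs);
        while (current != null) {
            current = current.afterOne(pool);
        }
        return pool.size();
    }
}
```

In this model, CountedExp.distinctExpressions(100) returns 100, one object for each value of val from 100 down to 1.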


        Secondly, although it makes some expressions simpler, it also makes
        other expressions more complex.

        For example, A? can now be represented in more than one way (you can
        either use <choice> with <empty/>, or you can use this new OccurenceExp).


        And finally, I am now using MSV as the basis of other software (e.g.,
        the RELAX NG converter). Adding a new primitive expression would cause a
        huge change to it. Therefore I'm reluctant to incorporate such a
        fundamental change.


        I think the memory usage problem can be addressed by simply treating a
        big maxOccurs as maxOccurs="unbounded".
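
As a sketch only, that normalisation could look like the following. The class OccursLimit, its method names, and the idea of a configurable threshold are assumptions made for illustration; nothing here is an existing MSV option.

```java
// Hypothetical normalisation step; the threshold is an arbitrary assumption.
class OccursLimit {
    static final int UNBOUNDED = Integer.MAX_VALUE;

    // Returns the maxOccurs value the validator would actually enforce.
    static int effectiveMax(int declaredMax, int threshold) {
        return declaredMax > threshold ? UNBOUNDED : declaredMax;
    }

    // True if 'count' occurrences pass the (possibly relaxed) upper bound.
    static boolean accepts(int count, int declaredMax, int threshold) {
        return count <= effectiveMax(declaredMax, threshold);
    }
}
```

The trade-off is that the upper bound is no longer checked above the threshold: with a threshold of 50, a document containing 101 occurrences of an element declared with maxOccurs="100" would be accepted.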


        regards,
        ----------------------
        Kohsuke Kawaguchi
        E-Mail: kk@...


      • Enzo Maggi
        Message 3 of 5 , Feb 11, 2002
          Kohsuke KAWAGUCHI wrote:

          >
          >
          >Firstly, OccurenceExp doesn't solve the memory usage problem, because
          >all expressions have to be immutable.
          >
          >Say you have <element ref="X" maxOccurs="100"/>, then you'll first have
          >
          >OccurenceExp[val=100]
          > |
          > +- ElementExp
          >
          >But as the validation goes on, you'll eventually have 100 OccurenceExp
          >with different "val" values. The number of expressions will be the same.
          >


          ... got it. Thank you.
          Adding an occurrence expression would not avoid a combinatorial explosion at
          validation time for large values of maxOccurs...

          I have a concrete problem, directly deriving from the occurrence count;
          maybe I can solve this without the need to introduce new abstractions:

          In the TypeDetector, how can I find out the number of occurrences that
          caused a content model to be valid? For example:
          <element name="A">
          <complex>
          <element name="B" minOccurs="1" maxOccurs="3"/>
          </>
          </>

          And an instance
          <A>
          <B/>
          <B/>
          </>

          After validating A, I need to know that its content model B occurred 2
          times.
          I was not able to retrieve this information from the API, without
          writing a Validator myself...
          Is there a simpler way? (Even a pointer in the API would be enough)

          Thanks in advance

          Enzo Maggi
        • Kohsuke KAWAGUCHI
          Message 4 of 5 , Feb 12, 2002
            > I have a concrete problem, directly deriving from the occurrence count;
            > maybe I can solve this without the need to introduce new abstractions:
            >
            > In the TypeDetector, how can I find out the number of occurrences that
            > caused a content model to be valid? For example:
            > <element name="A">
            > <complex>
            > <element name="B" minOccurs="1" maxOccurs="3"/>
            > </>
            > </>
            >
            > And an instance
            > <A>
            > <B/>
            > <B/>
            > </>
            >
            > After validating A, I need to know that its content model B occurred 2
            > times.
            > I was not able to retrieve this information from the API, without
            > writing a Validator myself...
            > Is there a simpler way? (Even a pointer in the API would be enough)

            OK. Umm, tough question.

            I'm curious why you want to know this. Would you tell me why, if it's
            OK?

            To answer this question, I guess I need to understand more about exactly
            what you want. If the complex type is:

            <complexType>
            <sequence>
            <element name="B" maxOccurs="5" />
            <element name="C"/>
            <element name="B" maxOccurs="5" />
            </sequence>
            </complexType>

            and the document is:

            <A>
            <B/>
            <B/>
            <X/>
            <B/>
            <B/>
            </A>

            you want the numbering as 1,2,1,1,2. Is this right?

            Now if the complex type is

            <complexType>
            <sequence maxOccurs="5">
            <element ref="A"/>
            <element ref="B"/>
            </sequence>
            </complexType>

            What number sequence do you expect?

            What about if the complex type is:

            <complexType>
            <sequence maxOccurs="5">
            <element ref="A" maxOccurs="5" />
            <element ref="B"/>
            </sequence>
            </complexType>

            ?


            regards,
            ----------------------
            Kohsuke Kawaguchi
            E-Mail: kk@...


          • Enzo Maggi
            Message 5 of 5 , Feb 13, 2002
              Kohsuke KAWAGUCHI wrote:

              >>
              >>After validating A, I need to know that its content model B occurred 2
              >>times.
              >>
              >
              >OK. Umm, tough question.
              >
              >I'm curious why you want to know this. Would you tell me why, if it's
              >OK?
              >
              Thanks for your kind answer :->
              It's OK for me to tell you how I would use this.

              We have a compression algorithm for XML, the one used in the MPEG-7
              group; it relies on knowledge of the XML Schema to encode just the
              minimum information of an XML instance.
              Let's say that the Schema is

              <element A>
              <complexType>
              <sequence>
              <B>
              <C>
              </sequence>
              </complexType>
              </element>

              If you encounter the A element in a valid XML document, you can avoid
              encoding the inner B and C elements. They are simply implied by the Schema.
              For a choice element, one can just encode the index of the element that
              occurred in the instance.

              In a content model
              <sequence min=1 max=10>
              <A/>
              <B/>
              </sequence>

              We just encode the number of occurrences of the sequence in the instance...
              And so on.
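
A minimal Java sketch of this last case, to show the idea only. It is not the MPEG-7 encoding itself: SequenceCountCodec and its methods are hypothetical, elements are reduced to their names, and the "encoded" form is just an int.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: not the MPEG-7 bitstream format, just the idea that a
// repeated sequence can be represented by its repetition count.
class SequenceCountCodec {

    // Returns how many times 'pattern' repeats in 'children'; throws if the
    // children are not an exact repetition within [min, max].
    static int encode(List<String> children, List<String> pattern, int min, int max) {
        if (pattern.isEmpty() || children.size() % pattern.size() != 0) {
            throw new IllegalArgumentException("children do not match the sequence");
        }
        int count = children.size() / pattern.size();
        for (int i = 0; i < children.size(); i++) {
            if (!children.get(i).equals(pattern.get(i % pattern.size()))) {
                throw new IllegalArgumentException("unexpected element at index " + i);
            }
        }
        if (count < min || count > max) {
            throw new IllegalArgumentException("occurrence count out of range: " + count);
        }
        return count;
    }

    // Rebuilds the child element names from the count and the schema pattern.
    static List<String> decode(int count, List<String> pattern) {
        List<String> children = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            children.addAll(pattern);
        }
        return children;
    }
}
```

For the content model above, the children A, B, A, B would be encoded as the single number 2, and decoding 2 against the pattern (A, B) gives the four children back.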

              >
              >
              >To answer this question, I guess I need to understand more about exactly
              >what you want. If the complex type is:
              >
              ><complexType>
              > <sequence>
              > <element name="B" maxOccurs="5" />
              > <element name="C"/>
              > <element name="B" maxOccurs="5" />
              > </sequence>
              ></complexType>
              >
              >and the document is:
              >
              ><A>
              > <B/>
              > <B/>
              > <X/>
              > <B/>
              > <B/>
              ></A>
              >
              >you want the numbering as 1,2,1,1,2. Is this right?
              >

              Well, I would say 1,2,1,2 (1 'A' element , 2 'B' elements, 1 'C'
              element, 2 'B' elements)
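
For this simple case the counts are just the lengths of consecutive runs of the same element name among the children of A, assuming the middle child is the C element. A small Java sketch of that, with hypothetical names (OccurrenceRuns, Run) and no dependency on MSV:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing step, independent of MSV.
class OccurrenceRuns {

    // One run: an element name and how many times it occurred consecutively.
    static final class Run {
        final String name;
        final int count;
        Run(String name, int count) { this.name = name; this.count = count; }
        @Override public String toString() { return name + "x" + count; }
    }

    // Groups consecutive children with the same name into runs.
    static List<Run> runs(List<String> children) {
        List<Run> result = new ArrayList<>();
        int i = 0;
        while (i < children.size()) {
            String name = children.get(i);
            int j = i;
            while (j < children.size() && children.get(j).equals(name)) j++;
            result.add(new Run(name, j - i));
            i = j;
        }
        return result;
    }
}
```

This only works while adjacent particles have different names. It cannot tell which particle a child belongs to when two neighbouring particles share a name or when a repeated sequence contains a repeated element, which is why the information would really have to come from the validator.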

              >
              >Now if the complex type is
              >
              ><complexType>
              > <sequence maxOccurs="5">
              > <element ref="A"/>
              > <element ref="B"/>
              > </sequence>
              ></complexType>
              >
              >What number sequence do you expect?
              >

              When several interpretations of a content model are possible, the first one would be just fine; for example, the one that the Validator now (implicitly) uses to decide that the model is valid.

              I don't know if keeping track of the interpretation is possible (and/or interesting) for MSV.
              It's like having a "TypeInstance" object that corresponds - at run time - to the interpretation of a "Type".
              I could imagine other uses of a TypeInstance object (after all, the TypeInstance tree represents the input XML just like an alternative - typed? - DOM representation).

              Thanks in advance,

              Enzo Maggi