
Re: [msv-interest] Re: TheFactoryImpl

  • James Strachan
    Message 1 of 9, Dec 6, 2001
      One thought on the autodetection thing. Mostly the detection policy only needs the first Element to be able to decide what schema language it is. So if you had a SAX ContentHandler like this...
       
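      import org.xml.sax.Attributes;
      import org.xml.sax.ContentHandler;
      import org.xml.sax.SAXException;
      import org.xml.sax.helpers.DefaultHandler;

      public abstract class ProxyContentHandler extends DefaultHandler {
          // The real handler; chosen when the first element arrives.
          private ContentHandler validator;

          // Picks the real handler from the first element (left to subclasses).
          protected abstract ContentHandler chooseValidator(
              String namespaceURI,
              String localName,
              String qualifiedName,
              Attributes attributes
          ) throws SAXException;

          public void startElement(
              String namespaceURI,
              String localName,
              String qualifiedName,
              Attributes attributes
          ) throws SAXException {
              if ( validator == null ) {
                  validator = chooseValidator(
                      namespaceURI,
                      localName,
                      qualifiedName,
                      attributes );
                  validator.startDocument();
              }
              validator.startElement(
                  namespaceURI, localName, qualifiedName, attributes );
          }

          // endElement(), characters(), etc. would be forwarded the same way.
      }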
       
      i.e. the first SAX start element event is used to choose the validator, then all SAX events are passed through to the validator's ContentHandler. Then you just have one SAX parse.
       
      So from the JARV perspective there could be a VerifierFactory which uses this kind of ProxyContentHandler to deduce which real ContentHandler to use to load the Schema.
       
      James
      ----- Original Message -----
      Sent: Friday, November 30, 2001 7:44 PM
      Subject: [msv-interest] Re: TheFactoryImpl


      > com.sun.msv.verifier.jarv.TheFactoryImpl is *really* useful. Though it could
      > do with a more descriptive name. Maybe AutoDetectFactory?

      Yes... I'm always bad at naming things. I'll take your suggestion and change
      the name to AutoDetectFactory.


      > It would be nice to plugin the concept of auto-detection into the JARV API.

      I agree. I can think of an auto-detector which reads the document
      element of the specified schema file, determines the schema language,
      creates a JARV implementation of that schema language, and then lets it
      parse the schema file.


      But right now, this is not easy. VerifierFactory can only accept
      InputSource/File, but not SAX. Hence the chances are, a schema file will
      be parsed twice, once by the auto-detector and once by the actual schema
      parser. This is not nice.

      Mandating JARV implementations to be capable of parsing schemas through
      SAX is also not attractive. Firstly, some validators in fact use DOM to
      parse schemas (e.g., Xerces), and it doesn't fit well with DTD (as you
      cannot parse DTD through SAX.)

      Another way might be to work at the stream level. Read the first portion of
      the schema as a stream, detect the schema language, then rewind the
      stream and pass it to the schema parser as an InputStream. I thought
      about it once, but I'm not familiar with that kind of low-level XML
      handling (since we need to care about encoding, etc.)


      Is there any idea to solve these problems?

      regards,
      ----------------------
      K.Kawaguchi
      E-Mail: kohsukekawaguchi@...


      To unsubscribe from this group, send an email to:
      msv-interest-unsubscribe@yahoogroups.com



      Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
    • kohsukekawaguchi@yahoo.com
      Message 2 of 9, Dec 6, 2001
        > One thought on the autodetection thing. Mostly the detection policy only
        > needs the first Element to be able to decide what schema language it is.
        > So if you had a SAX ContentHandler like this...

        I'm a bit confused by your example. I think we are trying to detect a
        schema language from the root element of the schema file. (Say, if it
        has http://relaxng.org/ns/structure/1.0 URI, then it must be RELAX NG,
        etc.)

        But your example says you'll choose a validator based on the root
        element. Is that a typo? Or are you thinking about the auto-detection of
        the schema language from the root element of the *instance* document?
        (e.g., if it has xsi:schemaLocation, then the schema will be XML Schema,
        etc.)
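
A minimal sketch of the namespace-based detection described above, assuming the schema language can be read off the root element's namespace URI. The class name `SchemaLanguages` and the returned labels are hypothetical; the RELAX NG URI is the one quoted in this message, and the W3C XML Schema and Schematron 1.5 URIs are added for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup table: root-element namespace URI -> schema language. */
final class SchemaLanguages {
    private static final Map<String, String> BY_NAMESPACE = new HashMap<>();
    static {
        // URI quoted in the message above.
        BY_NAMESPACE.put("http://relaxng.org/ns/structure/1.0", "RELAX NG");
        // Additional well-known namespaces, included as illustration only.
        BY_NAMESPACE.put("http://www.w3.org/2001/XMLSchema", "W3C XML Schema");
        BY_NAMESPACE.put("http://www.ascc.net/xml/schematron", "Schematron");
    }

    private SchemaLanguages() { }

    /** Returns the schema language name, or null if the namespace is unknown. */
    static String fromRootNamespace(String namespaceURI) {
        return BY_NAMESPACE.get(namespaceURI);
    }
}
```

An unknown namespace returns null here, so the caller decides whether that is an error or a cue to try another detection strategy.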


        regards,
        --
        Kohsuke KAWAGUCHI +1 650 786 0721
        Sun Microsystems kohsuke.kawaguchi@...
      • James Strachan
        Message 3 of 9, Dec 7, 2001
          Yes sorry, I used the variable name 'validator' when I should have used something like 'schemaLoader' - I half confused myself as I wrote the previous email ;-)
           
          If there was an AutoDetectFactory which extends VerifierFactory and whose compileSchema() method used the first SAX element event to determine which real schema loader to use, then loaded whichever schema it found, this would still only involve one SAX parse and the auto-detection would add little overhead.  So the AutoDetectFactory becomes a facade, detecting which real schema loader to use as the schema is being loaded.
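
A rough sketch of that facade idea, using only the JDK's SAX classes rather than the JARV API. It assumes each real schema loader can be driven as a plain ContentHandler, which is exactly the capability under discussion and not something JARV provides today. `AutoDetectingLoader` and its `load` helper are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical facade: picks the real schema loader from the root element. */
final class AutoDetectingLoader extends DefaultHandler {
    private final Map<String, Supplier<ContentHandler>> loadersByNamespace;
    // Prefix mappings for the root element arrive before its startElement event.
    private final List<String[]> pendingPrefixes = new ArrayList<>();
    private ContentHandler loader; // null until the root element is seen

    AutoDetectingLoader(Map<String, Supplier<ContentHandler>> loadersByNamespace) {
        this.loadersByNamespace = loadersByNamespace;
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        if (loader == null) pendingPrefixes.add(new String[] {prefix, uri});
        else loader.startPrefixMapping(prefix, uri);
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        if (loader != null) loader.endPrefixMapping(prefix);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (loader == null) {
            Supplier<ContentHandler> factory = loadersByNamespace.get(uri);
            if (factory == null) {
                throw new SAXException("unrecognized schema namespace: " + uri);
            }
            loader = factory.get();
            loader.startDocument();
            // Replay the prefix mappings buffered before the loader existed.
            for (String[] p : pendingPrefixes) loader.startPrefixMapping(p[0], p[1]);
        }
        loader.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        loader.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        loader.characters(ch, start, length);
    }

    @Override
    public void endDocument() throws SAXException {
        if (loader != null) loader.endDocument();
    }

    /** Parses the schema text once, feeding every event through the facade. */
    static void load(String xml, AutoDetectingLoader handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
    }
}
```

The buffering of prefix mappings is the one non-obvious part: without it the chosen loader would never see the namespace declarations on the root element.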
           
          I'm ignoring DTDs here - I think a separate API is probably required for DTDs as they are the only non-XML format to consider so far. Maybe for DTDs we could have a method along the lines of...
             
          public class VerifierFactory {
              ...
              /** factory for DTDs */
              public static VerifierFactory newDTDInstance();
           
              /** auto detect factory */
              public static VerifierFactory newAutoDetectInstance();
          }
           
          James
        • kohsukekawaguchi@yahoo.com
          Message 4 of 9, Dec 17, 2001
            > If there was an AutoDetectFactory which extends VerifierFactory and whose
            > compileSchema() method used the first SAX element event to
            > determine which real schema loader to use, then loaded whichever schema it
            > found, this would still only involve one SAX parse and the
            > auto-detection would add little overhead. So the AutoDetectFactory
            > becomes a facade, detecting which real schema loader to use as the
            > schema is being loaded.

            As I wrote,

            --------------
            But right now, this is not easy. VerifierFactory can only accept
            InputSource/File, not SAX events. Hence the chances are, a schema file
            will be parsed twice, once by the auto-detector and once by the actual
            schema parser. This is not nice.

            Mandating JARV implementations to be capable of parsing schemas through
            SAX is also not attractive. Firstly, some validators in fact use DOM to
            parse schemas (e.g., Xerces), and it doesn't fit well with DTD (as you
            cannot parse DTD through SAX.)
            --------------

            We can solve the first problem by creating a DOM tree from the SAX
            events and then handing that tree to the DOM-based schema parser.
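
One way to build a DOM tree from SAX events with the JDK alone is JAXP's identity TransformerHandler writing into a DOMResult. This is only a sketch of that bridging step and not necessarily what MSV ends up doing; the class name `SaxToDom` is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

/** Hypothetical helper: replays a SAX event stream into a DOM Document. */
final class SaxToDom {
    private SaxToDom() { }

    static Document build(InputSource source) throws Exception {
        // Empty document that will receive the nodes.
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

        // An identity TransformerHandler is itself a SAX ContentHandler.
        SAXTransformerFactory transformers =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler toDom = transformers.newTransformerHandler();
        toDom.setResult(new DOMResult(document));

        SAXParserFactory parsers = SAXParserFactory.newInstance();
        parsers.setNamespaceAware(true);
        XMLReader reader = parsers.newSAXParser().getXMLReader();
        reader.setContentHandler(toDom);
        reader.parse(source);
        return document;
    }
}
```

In the auto-detection scenario the events would come from the proxy handler instead of directly from the XMLReader, but the handler wiring is the same.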

            We can solve the second problem (DTD) if we can sniff the stream and
            detect if it's DTD or XML, ...


            Umm, after all, maybe it can be done. Let me see if I can do it.



            regards,
            ----------------------
            K.Kawaguchi
            E-Mail: kohsukekawaguchi@...
          • James Strachan
            Message 5 of 9, Dec 19, 2001
              I was only really thinking of XML-based schemas and that the first element of an XML schema usually describes which schema it is. I think DTDs should always be treated separately. Maybe having a VerifierFactory method just for DTDs in the API might be worthwhile.

              James
            • Kohsuke KAWAGUCHI
              Message 6 of 9, Dec 21, 2001
                > I was only really thinking of XML-based schemas and that the first
                > element of an XML schema usually describes which schema it is. I think
                > DTDs should always be treated separately. Maybe having a VerifierFactory
                > method just for DTDs in the API might be worthwhile.

                There are other problems with DTD, but I feel a lot of people are
                considering XML Schema/RELAX NG/Schematron as alternatives to DTD.
                So it would be nice if we could treat DTD the same way as XML
                Schema/RELAX NG/Schematron.

                And even if we forget about DTD, we still need to "rewind" the stream
                because JARV doesn't support SAX-based schema parsing right now.


                So here is my plan:

                1. JARV accepts many different inputs: InputStream, File, String (as
                URL), etc. But all of them are eventually converted to InputStream or
                Reader.

                2. Use BufferedInputStream/BufferedReader to wrap the original
                InputStream/Reader.

                3. Then mark it before reading any byte/char.

                4. Try to parse it as XML with a non-validating parser.

                5. If it receives the startElement event, it must be an XML-based
                grammar. Use the tag name to figure out the language, reset the
                stream, and hand it to the appropriate JARV implementation.

                6. If it fails to parse as an XML document, reset the stream and hand
                it to the DTD implementation.
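
The plan above can be sketched with the JDK alone. This is an illustration under assumptions, not the MSV implementation: `SchemaSniffer`, `sniffRootNamespace` and the `readLimit` parameter are hypothetical, and the sketch reports the root element's namespace URI rather than dispatching to a JARV implementation. The shielding wrapper is there because a parser may close its input stream when it finishes or fails, which would break reset().

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sniffer following steps 2 to 6 of the plan. */
final class SchemaSniffer {
    /** Thrown from startElement to stop parsing once the root element is known. */
    private static final class RootSeen extends SAXException {
        final String namespaceURI;
        RootSeen(String namespaceURI) {
            super("root element seen");
            this.namespaceURI = namespaceURI;
        }
    }

    private SchemaSniffer() { }

    /**
     * Returns the root element's namespace URI, or null if the input does not
     * parse as XML up to its root element (treated here as "probably a DTD").
     * The stream is reset to its starting position before returning.
     */
    static String sniffRootNamespace(BufferedInputStream in, int readLimit)
            throws IOException {
        in.mark(readLimit); // step 3: mark before reading anything
        // Keep the parser from closing or re-marking the underlying stream.
        InputStream shield = new FilterInputStream(in) {
            @Override public void close() { }
            @Override public boolean markSupported() { return false; }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false); // step 4: non-validating parse
            factory.newSAXParser().parse(shield, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes atts) throws SAXException {
                    throw new RootSeen(uri); // step 5: first element decides
                }
            });
            return null; // no element seen; not expected for well-formed XML
        } catch (RootSeen e) {
            return e.namespaceURI;
        } catch (SAXException e) {
            return null; // step 6: not XML, so fall back to the DTD path
        } catch (ParserConfigurationException e) {
            throw new IOException(e);
        } finally {
            in.reset(); // rewind so the real schema parser starts at byte 0
        }
    }
}
```

One caveat with this approach: reset() only works if the parser has consumed no more than `readLimit` bytes, and parsers read ahead in blocks, so the limit has to be generous compared to where the root element starts.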


                Do you think it works?


                regards,
                --
                Kohsuke Kawaguchi
              • James Strachan
                Message 7 of 9, Jan 13, 2002
                  Sorry for the delay getting back to you - but this sounds great.

                  James