
TheFactoryImpl

  • James Strachan
    Message 1 of 9 , Nov 30, 2001
      This might be more of a question for the iso-relax list. The class
      com.sun.msv.verifier.jarv.TheFactoryImpl is *really* useful. Though it could
      do with a more descriptive name. Maybe AutoDetectFactory?

      It would be nice to plug the concept of auto-detection into the JARV API.

      VerifierFactory.newInstance() is now deprecated. Maybe we could add a new
      method for creating 'auto detect factories'?

      e.g.

      public static VerifierFactory newAutoDetectFactory();

      which is only used by those wishing to use auto-detection - rather than
      explicitly using the schema-specific newInstance(language) method?

      I love the idea of making application code schema language agnostic and just
      using the schema file or URI to determine what validation occurs.

      James


    • Kohsuke KAWAGUCHI
      Message 2 of 9 , Nov 30, 2001
        > com.sun.msv.verifier.jarv.TheFactoryImpl is *really* useful. Though it could
        > do with a more descriptive name. Maybe AutoDetectFactory?

        Yes... I'm always bad at naming things. I'll take your suggestion and change
        the name to AutoDetectFactory.


        > It would be nice to plug the concept of auto-detection into the JARV API.

        I agree. I can imagine an auto-detector that reads the document
        element of the specified schema file, determines the schema language,
        creates a JARV implementation for that schema language, and then lets
        it parse the schema file.


        But right now, this is not easy. VerifierFactory can only accept
        InputSource/File, not SAX events. So chances are a schema file will
        be parsed twice, once by the auto-detector and once by the actual schema
        parser. This is not nice.

        Requiring JARV implementations to be capable of parsing schemas through
        SAX is not attractive either. First, some validators in fact use DOM to
        parse schemas (e.g., Xerces). Second, it doesn't fit well with DTDs (as
        you cannot parse a DTD through SAX).

        Another way might be to work at the stream level: read the first portion
        of the schema as a stream, detect the schema language, then rewind the
        stream and pass it to the schema parser as an InputStream. I thought
        about it once, but I'm not familiar with that kind of low-level XML
        handling (we would need to care about encoding, etc.)


        Does anyone have ideas for solving these problems?

        regards,
        ----------------------
        K.Kawaguchi
        E-Mail: kohsukekawaguchi@...
      • James Strachan
        One thought on the autodetection thing. Mostly the detection policy only needs the first Element to be able to decide what schema language it is. So if you had
        Message 3 of 9 , Dec 6, 2001
        • 0 Attachment
          One thought on the autodetection thing. Mostly the detection policy only needs the first Element to be able to decide what schema language it is. So if you had a SAX ContentHandler like this...
           
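          import org.xml.sax.Attributes;
          import org.xml.sax.ContentHandler;
          import org.xml.sax.SAXException;
          import org.xml.sax.helpers.DefaultHandler;

          public abstract class ProxyContentHandler extends DefaultHandler {
              // Real handler, chosen lazily from the first element seen.
              private ContentHandler validator;

              // Picks the real handler; left abstract in this sketch.
              protected abstract ContentHandler chooseValidator(
                  String namespaceURI,
                  String localName,
                  String qualifiedName,
                  Attributes attributes
              ) throws SAXException;

              public void startElement(
                  String namespaceURI,
                  String localName,
                  String qualifiedName,
                  Attributes attributes
              ) throws SAXException {
                  if ( validator == null ) {
                      validator = chooseValidator(
                          namespaceURI,
                          localName,
                          qualifiedName,
                          attributes );
                      validator.startDocument();
                  }
                  validator.startElement(
                      namespaceURI, localName, qualifiedName, attributes );
              }

              // endElement(), characters(), etc. would be forwarded the same way.
          }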
           
          i.e. the first SAX start element event is used to choose the validator, then all SAX events are passed through to the validator's ContentHandler. Then you just have one SAX parse.
           
          So from the JARV perspective there could be a VerifierFactory which uses this kind of ProxyContentHandler to deduce which real ContentHandler to use to load the Schema.
           
          James
        • kohsukekawaguchi@yahoo.com
          Message 4 of 9 , Dec 6, 2001
            > One thought on the autodetection thing. Mostly the detection policy only
            > needs the first Element to be able to decide what schema language it is.
            > So if you had a SAX ContentHandler like this...

            I'm a bit confused by your example. I think we are trying to detect a
            schema language from the root element of the schema file. (Say, if it
            has the http://relaxng.org/ns/structure/1.0 namespace URI, then it
            must be RELAX NG, etc.)
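
A minimal sketch of that lookup, assuming detection keys off the namespace URI of the schema's root element. The class and method names are hypothetical (they are not part of JARV or MSV), and the XML Schema namespace URI is added for illustration; only the RELAX NG URI appears in this thread.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of JARV or MSV.
final class SchemaLanguageGuesser {
    // Namespace URI of the schema's root element -> schema language name.
    private static final Map<String, String> BY_NAMESPACE = Map.of(
        "http://relaxng.org/ns/structure/1.0", "RELAX NG",
        "http://www.w3.org/2001/XMLSchema", "XML Schema");

    /** Returns the language name, or null if the namespace is not recognized. */
    static String guess(String rootNamespaceUri) {
        // Map.of() maps reject null keys on lookup, so guard explicitly.
        return rootNamespaceUri == null ? null : BY_NAMESPACE.get(rootNamespaceUri);
    }
}
```

Other XML-based schema languages could be added as further map entries; a DTD has no root element, so it cannot be recognized this way.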

            But your example says you'll choose a validator based on the root
            element. Is that a typo? Or are you thinking about the auto-detection of
            the schema language from the root element of the *instance* document?
            (e.g., if it has xsi:schemaLocation, then the schema will be XML Schema,
            etc.)


            regards,
            --
            Kohsuke KAWAGUCHI +1 650 786 0721
            Sun Microsystems kohsuke.kawaguchi@...
          • James Strachan
            Message 5 of 9 , Dec 7, 2001
              Yes, sorry, I used the variable name 'validator' when I should have used something like 'schemaLoader' - I half confused myself as I wrote the previous email ;-)
               
              If there was an AutoDetectFactory extending VerifierFactory, whose compileSchema() method used the first SAX element event to determine which real schema loader to use and then loaded whichever schema it found, this would still involve only one SAX parse, and the auto-detection would add little overhead. So the AutoDetectFactory becomes a facade, detecting which real schema loader to use as the schema is being loaded.
               
              I'm ignoring DTDs here - I think a separate API is probably required for DTDs as they are the only non-XML format to consider so far. Maybe for DTDs we could have a method along the lines of...
                 
              public class VerifierFactory {
                  ...
                  /** factory for DTDs */
                  public static VerifierFactory newDTDInstance();
               
                  /** auto detect factory */
                  public static VerifierFactory newAutoDetectInstance();
              }
               
              James
            • kohsukekawaguchi@yahoo.com
              Message 6 of 9 , Dec 17, 2001
                > If there was an AutoDetectFactory which extends VerifierFactory which the
                > compileSchema() method would use the first SAX element event to
                > determine which real schema loader to use, then load whichever schema it
                > found - this would still only involve one SAX parse and the
                > auto-detection would add little overhead. So the AutoDetectFactory
                > becomes a facade, detecting which real schema loader to use as it's being
                > loaded.

                As I wrote,

                --------------
                But right now, this is not easy. VerifierFactory can only accept
                InputSource/File, not SAX events. So chances are a schema file
                will be parsed twice, once by the auto-detector and once by the actual
                schema parser. This is not nice.

                Requiring JARV implementations to be capable of parsing schemas through
                SAX is not attractive either. First, some validators in fact use DOM to
                parse schemas (e.g., Xerces). Second, it doesn't fit well with DTDs (as
                you cannot parse a DTD through SAX).
                --------------

                We can solve the first problem by creating a DOM tree from SAX events
                and then parsing it.
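
One way to build a DOM tree from SAX events with the standard JAXP classes is an identity TransformerHandler writing into a DOMResult. This is only a sketch under that assumption, not what MSV does; the class name SaxToDom and its build method are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper for illustration; not part of JARV or MSV.
final class SaxToDom {
    /** Parses the source once with SAX and returns the resulting DOM tree. */
    static Document build(InputSource source) throws Exception {
        // Empty document that will receive the nodes built from SAX events.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

        // An identity TransformerHandler is a ContentHandler that copies
        // the events it receives into the configured Result.
        SAXTransformerFactory tf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler domBuilder = tf.newTransformerHandler();
        domBuilder.setResult(new DOMResult(doc));

        SAXParserFactory pf = SAXParserFactory.newInstance();
        pf.setNamespaceAware(true);
        XMLReader reader = pf.newSAXParser().getXMLReader();
        reader.setContentHandler(domBuilder);
        reader.parse(source);
        return doc;
    }
}
```

In an auto-detecting setup, the same SAX event stream could feed both a language detector and this DOM builder, so a DOM-based schema parser would not need a second parse of the file.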

                We can solve the second problem (DTD) if we can sniff the stream and
                detect if it's DTD or XML, ...


                Umm, after all, maybe it can be done. Let me see if I can do it.



                regards,
                ----------------------
                K.Kawaguchi
                E-Mail: kohsukekawaguchi@...
              • James Strachan
                Message 7 of 9 , Dec 19, 2001
                  I was only really thinking of XML-based schemas and that the first element of an XML schema usually describes which schema it is. I think DTDs should always be treated separately. Maybe having a VerifierFactory method just for DTDs in the API might be worthwhile.

                  James
                • Kohsuke KAWAGUCHI
                  Message 8 of 9 , Dec 21, 2001
                    > I was only really thinking of XML-based schemas and that the first
                    > element of an XML schema usually describes which schema it is. I think
                    > DTDs should always be treated separately. Maybe having a VerifierFactory
                    > method just for DTDs in the API might be worthwhile.

                    There are other problems with DTD, but I feel a lot of people
                    consider XML Schema/RELAX NG/Schematron to be alternatives to
                    DTD. So it would be nice if we could treat DTD the same way as XML
                    Schema/RELAX NG/Schematron.

                    And even if we forget about DTD, we still need to "rewind" the stream
                    because JARV doesn't support SAX-based schema parsing right now.


                    So here is my plan:

                    1. JARV accepts many different inputs: InputStream, File, String (as
                       URL), etc. But all of them are eventually converted to an
                       InputStream or a Reader.

                    2. Use BufferedInputStream/BufferedReader to wrap the original
                       InputStream/Reader.

                    3. Mark it before reading any byte/char.

                    4. Try to parse it as XML with a non-validating parser.

                    5. If the parser delivers a startElement event, it must be an
                       XML-based grammar. Use the tag name to figure out the language,
                       reset the stream, and hand it to the appropriate JARV
                       implementation.

                    6. If it fails to parse as an XML document, reset the stream and hand
                       it to the DTD implementation.
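
The steps above can be sketched roughly as follows for the InputStream case. This is an illustration, not MSV code: the class SchemaSniffer, the 64 KB read limit, and the close-shielding wrapper are all assumptions of this sketch. The wrapper is there because a parser may close or re-mark the stream it is given, which would break the later reset.

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the plan; not part of JARV or MSV.
final class SchemaSniffer {
    /** Result for input that could not be parsed as XML. */
    static final String DTD = "DTD";

    // Assumed bound on what the parser reads before the first element.
    private static final int READ_LIMIT = 64 * 1024;

    /** Remembers the root element's namespace and stops the parse. */
    private static final class RootHandler extends DefaultHandler {
        String namespaceUri; // stays null until startElement fires

        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) throws SAXException {
            namespaceUri = uri; // "" when the root has no namespace
            throw new SAXException("root element seen, stop parsing");
        }
    }

    /** Hides close() and mark() so the parser cannot disturb the caller's mark. */
    private static final class Shield extends FilterInputStream {
        Shield(InputStream in) { super(in); }
        @Override public void close() { /* keep the underlying stream open */ }
        @Override public boolean markSupported() { return false; }
        @Override public void mark(int readLimit) { /* ignored */ }
        @Override public void reset() throws IOException {
            throw new IOException("mark/reset not supported");
        }
    }

    /**
     * Returns the namespace URI of the root element, or DTD if the input
     * is not well-formed XML. The stream is rewound before returning.
     */
    static String sniff(BufferedInputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);          // step 4: non-validating
        SAXParser parser = factory.newSAXParser();

        RootHandler handler = new RootHandler();
        in.mark(READ_LIMIT);                   // step 3: mark before reading
        try {
            parser.parse(new Shield(in), handler);
        } catch (SAXException e) {
            // Either our own stop signal (step 5) or a parse failure (step 6).
        } finally {
            in.reset();                        // rewind for the real schema parser
        }
        return handler.namespaceUri != null ? handler.namespaceUri : DTD;
    }
}
```

After sniff() returns, the caller still holds the rewound stream and can pass it to whichever implementation the result selects. The reset only works if the parser consumed no more than the read limit, so a real implementation would need to choose that limit with care.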


                    Do you think this would work?


                    regards,
                    --
                    Kohsuke Kawaguchi
                  • James Strachan
                    Message 9 of 9 , Jan 13, 2002
                      Sorry for the delay getting back to you - but this sounds great.

                      James