RE: [RSS-DEV] Re: default namespace?

  • Jon Hanna
    Message 1 of 14 , Nov 5, 2002
      > > Which is running a regex on the uri provided from the default
      > > namespace, expecting it to be either:
      > > http://purl.org/rss/1.0/ or
      > > http://my.netscape.com/rdf/simple/0.9/
      >
      > Ah, so Kevin's feed using xmlns="http://www.w3.org/1999/xhtml" as the
      > default would certainly freak it out. Shouldn't it, perhaps, be looking
      > at the namespaces instead and then digging out the prefix? That would
      > have found it.

      Yes, regexing for a predefined prefix is pure guesswork and shouldn't be
      expected to work.
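
To illustrate the namespace-based lookup suggested above, here is a minimal sketch in Python (not XML::RSS itself, and not a patch for it). It assumes the feed is well-formed XML; the function name `find_rss_namespace` and the sample feed are hypothetical. The idea is to compare namespace URIs, which a namespace-aware parser resolves for you, rather than guessing at prefixes or at what the default namespace happens to be.

```python
import xml.etree.ElementTree as ET

# The two namespace URIs mentioned in this thread.
RSS_10 = "http://purl.org/rss/1.0/"
RSS_09 = "http://my.netscape.com/rdf/simple/0.9/"
KNOWN_RSS_NAMESPACES = (RSS_10, RSS_09)


def find_rss_namespace(xml_text):
    """Return the RSS namespace URI of the first <channel> element, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # ElementTree expands every name to "{namespace-uri}local-name",
        # regardless of which prefix (or default namespace) the author chose.
        if element.tag.startswith("{"):
            uri, _, local = element.tag[1:].partition("}")
            if local == "channel" and uri in KNOWN_RSS_NAMESPACES:
                return uri
    return None


# Hypothetical feed: XHTML is the default namespace, RSS 1.0 is prefixed.
SAMPLE_FEED = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rss="http://purl.org/rss/1.0/"
    xmlns="http://www.w3.org/1999/xhtml">
  <rss:channel rdf:about="http://example.org/">
    <rss:title>Example</rss:title>
  </rss:channel>
</rdf:RDF>"""

print(find_rss_namespace(SAMPLE_FEED))
```

Because the comparison is on the URI, the same function should behave identically whether RSS is the default namespace or bound to any prefix.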
    • Ian Davis
      Message 2 of 14 , Nov 5, 2002
        On Monday, 04 November 2002 at 17:34, Bill Kearney wrote:
        > What would it take to get XML::RSS updated to support namespaces 'properly'?
        > http://search.cpan.org/dist/XML-RSS/RSS.pm

        I took a look at it this week while writing my LiSA[1] parser and noticed
        the hardcoded '#default' namespaces. If no-one else picks it up first
        I will hammer out a patch one night this week.

        Ian
      • Ian Davis
        Message 3 of 14 , Nov 5, 2002
          On Monday, 04 November 2002 at 17:34, Bill Kearney wrote:
          > What would it take to get XML::RSS updated to support namespaces 'properly'?
          > http://search.cpan.org/dist/XML-RSS/RSS.pm

          I have a completed patch for XML::RSS which I have sent to Jonathan
          Eisenzopf and Rael Dornfest for inclusion in the module. I have also
          offered to take over maintenance of the module if either of them do
          not have the time to do so any more.

          Thanks to Jon Hanna for his test cases. I wrote a unit test to run
          XML::RSS against each of the files and verify the parsing of the
          module. The module now parses all eight files identically, except for
          sixth.rdf which contains a BOM (byte order mark, which tells you which
          Unicode encoding is being used). I've had problems in the past parsing
          XML documents with a BOM using the XML::Parser module. I removed it
          for my unit test and XML::RSS parsed the CDATA without problems.

          I'll report back once I've heard back from either Jonathan or Rael.

          Ian
        • Jon Hanna
          Message 4 of 14 , Nov 6, 2002
            > I've had problems in the past parsing
            > XML documents with a BOM using the XML::Parser module.

            Yes the BOM is a bit weird (at least in UTF-8, in UTF-16 it is compulsory,
            end-of-story). There was some controversy for a while about whether the BOM
            should be compulsory, prohibited, or optional in XML documents encoded in
            UTF-8 and it was finally resolved that producers MAY use it (and hence
            parsers MUST accept it). See http://www.w3.org/XML/xml-V10-2e-errata#E22 for
            more.

            The simplest solution for fixing parsers that don't process a UTF-8 BOM
            is just to look for the byte sequence 0xEF, 0xBB, 0xBF and eat it before the
            rest of the processing is done (perhaps taking the opportunity to record
            that the stream is definitely UTF-8 if such action makes sense to your
            implementation).
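
The BOM-eating step described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (the whole document is already in memory as bytes); `strip_utf8_bom` is a hypothetical helper name, not part of XML::Parser or any other library discussed here.

```python
# The UTF-8 encoding of U+FEFF: the byte sequence 0xEF, 0xBB, 0xBF.
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data):
    """Return (payload, had_bom), with a leading UTF-8 BOM removed if present."""
    if data.startswith(UTF8_BOM):
        # The caller may record that the stream is definitely UTF-8.
        return data[len(UTF8_BOM):], True
    return data, False


payload, had_bom = strip_utf8_bom(b"\xef\xbb\xbf<rdf:RDF/>")
print(payload, had_bom)
```

In Python specifically, decoding with the standard "utf-8-sig" codec has a similar effect, but the explicit check makes the second return value available to the rest of the parser.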

            A more complicated matter is UTF-16 WITHOUT a BOM, however there is no
            requirement that XML parsers deal with this, so I don't have a test case for
            it (UTF-16 XML without a BOM is an error and how you do or do not process it
            is implementation-defined).
          • Ziv Caspi
            Message 5 of 14 , Nov 7, 2002
              Jon Hanna wrote:

              > A more complicated matter is UTF-16 WITHOUT a BOM, however there is no
              > requirement that XML parsers deal with this, so I don't have a test case
              > for it (UTF-16 XML without a BOM is an error and how you do or do not
              > process it is implementation-defined).

              A common trick when there's no BOM is to assume that the first character
              is '<', and then to test for all possible UTF-style encodings (there are
              not that many). The more interesting case is lack of BOM *and* 7-bit
              compatible with ASCII, in which case you need to locate the encoding
              attribute.
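
A rough Python sketch of the trick described above, for anyone who wants to experiment. It is not Aggie's parser and not a complete detector: it assumes there is no BOM, that the document starts with '<', and that only the UTF-8, UTF-16 and UTF-32 families plus ASCII-compatible encodings matter. The names `sniff_encoding` and `ENCODING_DECL` are made up for this example.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration in
# ASCII-compatible bytes, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
ENCODING_DECL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def sniff_encoding(data):
    """Guess the encoding of BOM-less XML bytes, assuming they start with '<'."""
    head = data[:4]
    # Check the 32-bit patterns first: UTF-32-LE also begins with b"<\x00".
    if head == b"\x00\x00\x00<":
        return "utf-32-be"
    if head == b"<\x00\x00\x00":
        return "utf-32-le"
    if head.startswith(b"\x00<"):
        return "utf-16-be"
    if head.startswith(b"<\x00"):
        return "utf-16-le"
    if head.startswith(b"<"):
        # 7-bit compatible with ASCII: locate the encoding declaration,
        # falling back to the XML default of UTF-8 when there is none.
        match = ENCODING_DECL.match(data[:200])
        return match.group(1).decode("ascii").lower() if match else "utf-8"
    return None  # does not start with '<' in any encoding handled here


print(sniff_encoding("<rdf:RDF/>".encode("utf-16-le")))
print(sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
```

The order of the checks matters, since the UTF-16 little-endian pattern is a prefix of the UTF-32 little-endian one.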

              (For those interested, Aggie has a small XML-encoding parser whose
              source is available. The parser is used when Aggie needs to take
              non-well-formed XML resources and "well-form" them for consumption by
              the .NET XML parser.)

              Ziv Caspi
              cell: +972-53-668-751
              web: http://radio.weblogs.com/0106548/
            • Jon Hanna
              Message 6 of 14 , Nov 8, 2002
                > > A more complicated matter is UTF-16 WITHOUT a BOM, however there is no
                > > requirement that XML parsers deal with this, so I don't have a test case
                > > for it (UTF-16 XML without a BOM is an error and how you do or do not
                > > process it is implementation-defined).
                >
                > A common trick when there's no BOM is to assume that the first character
                > is '<', and then to test for all possible UTF-style encodings (there are
                > not that many). The more interesting case is lack of BOM *and* 7-bit
                > compatible with ASCII, in which case you need to locate the encoding
                > attribute.

                Indeed. I certainly recommend anyone writing an XML parser to look at the
                heuristics detailed in the spec for doing this kind of interpretation.
                However, doing this is either error recovery or support for a
                character encoding apart from UTF-8 and UTF-16, and hence not something
                you MUST do to be compliant. As such, I don't think we need test cases
                for those scenarios.

                Can anyone think of other test scenarios at this level (XML parsing) that we
                need to do before I start thinking about test cases at the RDF and/or RSS
                level?