UTF-8 BOM (was RE: [soapbuilders] Follow-up UTF-8 test)

  • Michael Brennan
    Message 1 of 3 , Apr 2, 2001
      Thanks for the reference. I've overlooked that (and thought that UTF-8 never includes a BOM).
       
      Interestingly, the XML parser Sun ships with JAXP chokes on this. Now there is one more thing to test for conformance: XML parsers.
       
      Looks like this one is a bug in Sun's parser.  :-(
       
      I wonder how many other XML parsers have problems with this.
      -----Original Message-----
      From: Fredrik Lundh [mailto:fredrik@...]
      Sent: Saturday, March 31, 2001 1:04 AM
      To: soapbuilders@yahoogroups.com
      Subject: Re: [soapbuilders] Follow-up UTF-8 test

      michael wrote:
      > However, I am still seeing one odd problem: the returned message seemed to
      > have some garbage bytes preceding the XML prolog. It appears to be 3 bytes
      > whose hex values are: EFBBBF.
      >
      > I've seen this same sequence of bytes when I save a file in UTF-8 format
      > using Notepad.
      >
      > Any idea what's happening here?

      it's a unicode BOM (byte order mark).  it's not necessary for
      UTF-8, but your parser shouldn't choke on it.

      more info here:

          http://www.unicode.org/unicode/faq/utf_bom.html

      also see appendix F of the XML spec.

      Cheers /F
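For a parser that does choke on the leading EF BB BF, one workaround is to drop those three bytes before parsing. The following is a minimal sketch in Python, not anything from the toolkits discussed in this thread; the helper name strip_utf8_bom and the sample payload are made up for illustration.

```python
import codecs
import xml.etree.ElementTree as ET


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical payload: a BOM followed by a small XML document.
payload = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><greeting>hello</greeting>'

clean = strip_utf8_bom(payload)
root = ET.fromstring(clean)
```

Input without a BOM passes through unchanged, so the helper can be applied to every incoming message.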
    • Eric Kidd
      Message 2 of 3 , Apr 2, 2001
        On Mon, Apr 02, 2001 at 03:06:35PM -0700, Michael Brennan wrote:
        > Thanks for the reference. I've overlooked that (and thought that UTF-8
        > never includes a BOM).

        UTF-8 generally shouldn't include a BOM, IIRC. But if you see one, you
        should recognize it. UTF-16, according to the XML standard, must include a
        BOM.
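"Recognizing" a BOM can be as simple as checking the first few bytes against the known marks. This is a rough sketch in Python, assuming only UTF-8 and UTF-16 need to be told apart; UTF-32 and the BOM-less detection described in appendix F of the XML spec are deliberately left out, and the function names are invented for this example.

```python
import codecs


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0) if none is found."""
    for bom, name in (
        (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
        (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
        (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    ):
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_document(data: bytes, default: str = "utf-8") -> str:
    """Decode data using the BOM if present, else the assumed default encoding."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)
```

The default of UTF-8 for BOM-less input is an assumption of this sketch; a real XML processor would also look at the encoding declaration.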

        > I wonder how many other XML parsers have problems with this.

        This is probably a good reason not to send a UTF-8 BOM if you can avoid
        doing so. :-)
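On the sending side, whether a BOM goes out often comes down to which encoder is picked. As an illustration only (Python, not one of the toolkits in this thread), the standard "utf-8" codec writes no BOM, while "utf-8-sig" prepends one:

```python
import codecs

document = '<?xml version="1.0" encoding="UTF-8"?><ping/>'

# Plain UTF-8: no byte order mark is emitted.
without_bom = document.encode("utf-8")

# The "utf-8-sig" codec prepends EF BB BF, much like saving as UTF-8 in Notepad.
with_bom = document.encode("utf-8-sig")

has_bom_plain = without_bom.startswith(codecs.BOM_UTF8)
has_bom_sig = with_bom.startswith(codecs.BOM_UTF8)
```

Apart from the three leading bytes, the two outputs are identical.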

        Cheers,
        Eric

        --
        XML-RPC HOWTO: http://www.linuxdoc.org/HOWTO/XML-RPC-HOWTO/index.html
        XML-RPC for C and C++: http://xmlrpc-c.sourceforge.net/
      • Simon Fell
        Message 3 of 3 , Apr 2, 2001
          from my investigations over the weekend, the only time i could see that you
          were likely to get a UTF-8 BOM is if you are transcoding to UTF-8 from
          UTF-16.

          After I fixed my encoding problems, both expat and xerces-c handled it fine.
          (both UTF-8 & UTF-16 BOMs)

          Cheers
          Simon
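The transcoding case can be reproduced in a few lines. This is a sketch in Python under one assumption: the converter treats the UTF-16 BOM as an ordinary character (U+FEFF) rather than consuming it. Re-encoded as UTF-8, that character becomes the bytes EF BB BF.

```python
import codecs

# A UTF-16 little-endian document that starts with a BOM (FF FE).
utf16_doc = codecs.BOM_UTF16_LE + "<a/>".encode("utf-16-le")

# "utf-16-le" does not interpret the BOM, so U+FEFF stays in the text...
kept = utf16_doc.decode("utf-16-le")
# ...and comes out as EF BB BF when the text is written as UTF-8.
transcoded_with_bom = kept.encode("utf-8")

# The generic "utf-16" codec consumes the BOM, so nothing leaks through.
consumed = utf16_doc.decode("utf-16")
transcoded_clean = consumed.encode("utf-8")
```

Which of the two behaviours a given library shows depends on its codec, so it is worth checking the first bytes of the output.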


