1114 FW: UTF-8 BOM (was RE: [soapbuilders] Follow-up UTF-8 test)
- Apr 2, 2001
We found that we had a problem with this in the Apache implementation when we used a Reader, rather than a raw InputStream, to create the InputSource for the XML parser. The Reader apparently got confused by the BOM and consumed some of it but not all, so the parser couldn't cope with what was left. When we switched to passing the InputStream directly to the parser (in our case, Xerces), all was well. Just an FYI, in case this has something to do with the problem you're seeing.
--Glen
-----Original Message-----
From: Michael Brennan [mailto:michael_brennan@...]
Sent: Monday, April 02, 2001 6:07 PM
Subject: UTF-8 BOM (was RE: [soapbuilders] Follow-up UTF-8 test)
Thanks for the reference. I had overlooked that (and had thought that UTF-8 never includes a BOM).
Interestingly, the XML parser Sun ships with JAXP chokes on this. Now there is one more thing to test for conformance: XML parsers.
Looks like this one is a bug in Sun's parser. :-(
I wonder how many other XML parsers have problems with this.
-----Original Message-----
From: Fredrik Lundh [mailto:fredrik@...]
Sent: Saturday, March 31, 2001 1:04 AM
Subject: Re: [soapbuilders] Follow-up UTF-8 test
Michael wrote:
> However, I am still seeing one odd problem: the returned message seemed to
> have some garbage bytes preceding the XML prolog. It appears to be 3 bytes
> whose hex values are: EFBBBF.
> I've seen this same sequence of bytes when I save a file in UTF-8 format
> using Notepad.
> Any idea what's happening here?
It's a Unicode BOM (byte order mark). It's not necessary for
UTF-8, but your parser shouldn't choke on it.
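To make this concrete: the three bytes EF BB BF are just U+FEFF, the BOM character, encoded as UTF-8. The minimal Java sketch below (class and method names are illustrative only) checks that, and also shows that on current JDKs decoding those bytes with the UTF-8 charset, which is what an InputStreamReader-based Reader does, leaves a U+FEFF character in front of the XML declaration instead of dropping it. A parser reading from such a Reader sees that character before `<?xml`.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Render bytes as uppercase hex, e.g. {0xEF, 0xBB, 0xBF} -> "EFBBBF"
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+FEFF (the byte order mark) encoded as UTF-8
        byte[] bom = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(bom)); // EFBBBF

        // A message that starts with the BOM, followed by the XML prolog
        byte[] message = "\uFEFF<?xml version=\"1.0\"?><a/>"
                .getBytes(StandardCharsets.UTF_8);

        // Java's UTF-8 decoder does not strip the BOM, so the decoded
        // text begins with U+FEFF rather than with '<'
        String decoded = new String(message, StandardCharsets.UTF_8);
        System.out.println(decoded.charAt(0) == '\uFEFF'); // true
        System.out.println(decoded.startsWith("<?xml"));   // false
    }
}
```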
More info here:
Also see Appendix F of the XML spec ("Autodetection of Character Encodings").
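Glen's fix (hand the parser the raw InputStream so it can do its own encoding detection) is the simplest route when the parser handles the BOM. For a parser that rejects it, like the one Michael describes, one workaround in the spirit of Appendix F is to inspect the first bytes of the stream and skip a UTF-8 signature before parsing. The sketch below assumes a JAXP DocumentBuilder and only handles the UTF-8 case; `skipUtf8Bom` and `parse` are hypothetical helper names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SkipBom {
    // Hypothetical helper: returns a stream positioned just after a leading
    // UTF-8 BOM (EF BB BF) if present; otherwise the bytes read are pushed
    // back so the caller sees the stream unchanged.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until we
        // have 3 bytes or hit end of stream
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n);
        }
        return pin;
    }

    // Parse from the byte stream itself (not a Reader), so the parser can
    // still apply its own encoding detection to the remaining bytes.
    static Document parse(InputStream raw) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(skipUtf8Bom(raw)));
    }

    public static void main(String[] args) throws Exception {
        byte[] withBom = "\uFEFF<?xml version=\"1.0\" encoding=\"UTF-8\"?><Envelope/>"
                .getBytes(StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(withBom));
        System.out.println(doc.getDocumentElement().getTagName()); // Envelope
    }
}
```

Pushing the bytes back (rather than wrapping the stream in a Reader) keeps the fix at the byte level, which is the distinction Glen ran into.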