found that we had a problem with this in the Apache implementation when we used
a Reader, rather than a raw InputStream, to create the InputSource for the XML
parser. The Reader apparently got confused by the BOM and ate some of it
but not all, so the parser couldn't deal with it. When we switched to sending the
InputStream directly to the parser (in our case, Xerces), all was well.
Just an FYI, in case this might have something to do with the problem you're
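
For anyone who wants to try this on their own setup, here is a minimal sketch of the "hand the parser the raw InputStream" approach using the standard JAXP API. The class and method names (BomParseSketch, withUtf8Bom, parseFromStream) are made up for illustration, and the sample document is invented. Whether the parse succeeds depends on which parser JAXP picks up on your classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class BomParseSketch {

    // Prepends the UTF-8 BOM bytes (EF BB BF) to the encoded XML text.
    static byte[] withUtf8Bom(String xml) {
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    // Builds the InputSource from the byte stream, not from a Reader,
    // so the parser does its own encoding detection and sees the BOM intact.
    static Document parseFromStream(InputStream in) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(in));
    }

    public static void main(String[] args) throws Exception {
        byte[] data = withUtf8Bom(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><msg>hi</msg>");
        Document doc = parseFromStream(new ByteArrayInputStream(data));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

If this throws a parse error on your machine, that points at the parser's BOM handling and not at your own stream code.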
Thanks for the reference. I've overlooked that (and
thought that UTF-8 never includes a BOM).
Interestingly, the XML parser Sun ships with JAXP
chokes on this. Now there is one more thing to test for conformance: XML
Looks like this one is a bug in Sun's parser.
I wonder how many other XML parsers have problems with this.
> However, I am still seeing one odd problem: the returned message seemed to
> have some garbage bytes preceding the XML prolog. It appears to be 3 bytes
> whose hex values are: EFBBBF.
> I've seen this same sequence of bytes when I save a file in UTF-8
> using Notepad.
> Any idea what's happening
It's a Unicode BOM (byte order mark). It's not necessary for
UTF-8, but your parser shouldn't choke on it.
See Appendix F of the XML spec.
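
If you're stuck with a parser (or a Reader) that can't cope with the BOM, one possible workaround is to peek at the first three bytes and drop them when they are EF BB BF. This is only a rough sketch: Utf8BomSkipper and skipUtf8Bom are names I made up, and it handles the UTF-8 BOM only, not the UTF-16 ones.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper class, for illustration only.
public class Utf8BomSkipper {

    // Returns a stream positioned just after a leading UTF-8 BOM, if present.
    // If the stream does not start with EF BB BF, nothing is consumed.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than asked for, so loop until
        // we have 3 bytes or hit end of stream.
        while (n < 3) {
            int r = pb.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            // Not a BOM: push the bytes back so the caller sees them.
            pb.unread(head, 0, n);
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println((char) in.read());
    }
}
```

The returned stream can then be wrapped in an InputStreamReader or passed to the parser as usual.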