Text file differences - what's going on?
- Dec 2, 2010
see:
I was having problems getting X1 to index a bunch of files extracted
from a zip file torrent of cablegate.wikileaks.org.
I finally figured out that there must be something funny about the file
Above zip contains two files art_bad.html and art_good.html
art_bad.html is original file after extracting from the zip file.
art_good.html is the result of opening the "bad" file in Notepad.exe and
then saving it under new name.
Opening both files in Notetab: status bar reports both as UTF-8
If I try to open "art_bad.html" with PsPad-Hex viewer it says "Can't
If I run file.exe (from cygwin) against both it reports:
art_bad.html: xHTML document text
art_good.html: HTML document text
Using online hex dump tool http://www.fileformat.info/tool/hexdump.htm
shows the first 16 bytes of each as:
art_bad.html:
0000-0010: 3c 3f 78 6d-6c 20 76 65-72 73 69 6f-6e 3d 27 31 <?xml.ve rsion='1
art_good.html:
0000-0010: ef bb bf 3c-3f 78 6d 6c-20 76 65 72-73 69 6f 6e ...<?xml .version
So the difference is the first three bytes (ef bb bf): they are present
in the "good" version but missing from the "bad" version (the file as it
was before Save as ... in Notepad).
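The comparison above can also be done locally rather than with an online tool. What follows is only a minimal Python sketch: the function names head_hex, has_utf8_bom and describe are made up for illustration, and it assumes the three bytes ef bb bf are the UTF-8 byte order mark (the value Python exposes as codecs.BOM_UTF8).

```python
import codecs


def head_hex(data: bytes, count: int = 16) -> str:
    """Return the first `count` bytes as space-separated lowercase hex."""
    return " ".join(f"{b:02x}" for b in data[:count])


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark (ef bb bf)."""
    return data.startswith(codecs.BOM_UTF8)


def describe(path: str) -> str:
    """Summarise the first 16 bytes of a file and whether it starts with a BOM."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    return f"{path}: {head_hex(head)} (UTF-8 BOM: {has_utf8_bom(head)})"
```

Calling describe("art_bad.html") and describe("art_good.html") should show the same kind of difference as the hex dump, assuming the files are as described.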
I know I've seen this discussed here before but can't remember what this
is all about.
Can anybody explain and/or point me to a tool I can use to change the
"bad" files to "good" so I can get them indexed by X1?
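One possible way to do such a conversion, as a hedged Python sketch: it only reproduces the three-byte difference seen in the hex dump by prepending ef bb bf when it is missing. Whether those bytes are really what X1 needs is an assumption, and the names add_utf8_bom and convert_file are hypothetical.

```python
import codecs
from pathlib import Path


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 byte order mark unless the data already starts with it."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def convert_file(src: str, dst: str) -> None:
    """Write a copy of `src` to `dst` with a leading UTF-8 BOM."""
    Path(dst).write_bytes(add_utf8_bom(Path(src).read_bytes()))
```

For example, convert_file("art_bad.html", "art_fixed.html") would write a copy that starts with ef bb bf and leave the original untouched; the output name is just an example.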
Regards ... Alec (buralex@gmail& WinLiveMess - alec.m.burgess@skype)