
Re: [NTO] Text file differences - what's going on?

  • Alex Plantema
    Message 1 of 9, Dec 2, 2010
      On Thursday, 2 December 2010 23:01, Alec Burgess wrote:

      > After googling [EF BB BF] and reading
      > http://en.wikipedia.org/wiki/Byte_order_mark it appears that presence
      > of the byte order mark is supposed to be optional in UTF-8 files (and
      > deprecated).
      >
      > Am I reading that correctly? If so the behavior of PsPad (says can't
      > read in hex format) and X1 (won't index) should be considered "bugs"?

      I think you're right, see http://en.wikipedia.org/wiki/UTF-8#Byte_order_mark

      Alex.
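A minimal Python sketch of the point made above: a reader can treat the UTF-8 BOM (EF BB BF) as optional by checking for it and stripping it when present. The function name decode_utf8_optional_bom and the sample bytes are hypothetical; the stripping itself is done by Python's built-in "utf-8-sig" codec.

```python
import codecs

def decode_utf8_optional_bom(data: bytes) -> tuple[str, bool]:
    """Hypothetical helper: decode UTF-8 bytes, accepting but not requiring a BOM.

    Returns the decoded text and whether a UTF-8 BOM (EF BB BF) was present.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)
    # 'utf-8-sig' drops a leading BOM if there is one, otherwise acts like 'utf-8'
    return data.decode("utf-8-sig"), has_bom

# Assumed sample content, modeled on the "<?xml version='1" start of the files
with_bom = b"\xef\xbb\xbf<?xml version='1.0'?>"
without_bom = b"<?xml version='1.0'?>"

print(decode_utf8_optional_bom(with_bom))     # ("<?xml version='1.0'?>", True)
print(decode_utf8_optional_bom(without_bom))  # ("<?xml version='1.0'?>", False)
```

Both inputs decode to the same text, which is the behavior one would expect from a tool that treats the BOM as optional.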
    • loro
      Message 2 of 9, Dec 2, 2010
        Al wrote:-)
        >After googling [EF BB BF] and reading
        >http://en.wikipedia.org/wiki/Byte_order_mark it appears that presence of
        >the byte order mark is supposed to be optional in UTF-8 files (and
        >deprecated).

        FYI my copy of PSPad opens both files without problems. The hex view
        doesn't show me the BOM though, the files look identical. I haven't
        upgraded in a while, let's see, version 4.3.0 (1971).
      • Alec Burgess
        Message 3 of 9, Dec 2, 2010
          On 2010-12-02 21:36, loro wrote:
          > Al wrote:-)
          > >After googling [EF BB BF] and reading
          > >http://en.wikipedia.org/wiki/Byte_order_mark it appears that presence of
          > >the byte order mark is supposed to be optional in UTF-8 files (and
          > >deprecated).
          >
          > FYI my copy of PSPad opens both files without problems. The hex view
          > doesn't show me the BOM though, the files look identical. I haven't
          > upgraded in a while, let's see, version 4.3.0 (1971).
          Thanks for checking ... my version was 4.5.1 and I updated to the latest
          stable version, 4.5.4 (there is a 4.5.5 beta available).
          I hadn't realized ... didn't check ...
          PSPad adds three Explorer context-menu options: PSPad, PSPad Hex View,
          and PSPad Text Diff.
          I get the "Can not open file art_bad.html" error *ONLY* when I try to
          open art_bad.html from the context menu [PSPad Hex View]

          After opening the files in PSPad normally and then using View-Hex Edit Mode:
          art_bad.html (w/o Byte order mark)
          - shows in the status bar: Code page: ANSI (Windows), and shows the first
          byte as 3C (the '<' in "<?xml version='1")

          art_good.html (with Byte order mark)
          - shows in Status bar: Code page: UTF-8 and shows the first four bytes
          as FFFE 3C00 with first two chars: 'ÿþ<' in "ÿþ<?xml v" where the funny
          character 'ÿþ' is (I guess) UTF-8 FFFE
          - This doesn't appear to have any direct relation to the byte order mark
          [EF BB BF] and who knows why it's being shown that way. :-)
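For reference: FF FE is the byte-order mark for UTF-16 little-endian, and 3C 00 is '<' encoded as UTF-16LE; read as single Windows-1252 bytes, FF and FE display as 'ÿ' and 'þ'. That suggests the hex view may be showing a UTF-16LE copy of the text rather than the UTF-8 bytes on disk, though this is only an inference from the dump above. A small Python sketch reproducing those bytes (the sample string is assumed):

```python
# Assumed sample: the start of art_good.html as described in the message
text = "<?xml v"

# UTF-16 little-endian with a BOM: U+FEFF becomes FF FE, '<' becomes 3C 00
utf16le_with_bom = "\ufeff".encode("utf-16-le") + text.encode("utf-16-le")
print(utf16le_with_bom[:4].hex(" "))  # ff fe 3c 00

# Misreading those bytes as Windows-1252 gives the "ÿþ<" seen in the editor
print(utf16le_with_bom[:3].decode("cp1252"))  # ÿþ<

# For comparison, the UTF-8 BOM is the three bytes EF BB BF
print("\ufeff".encode("utf-8").hex(" "))  # ef bb bf
```

The 'utf-16' decoder reads the FF FE mark to pick little-endian order and drops it, so the same bytes decode back to the original text.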

          When I use Alex's "trick" (open art_good.html, delete all the text, and
          save it as either BOM_only.txt or BOM_only.html), PSPad shows both files
          as Code page: ANSI (Windows) with contents "" and, in hex, the
          expected "EFBB BF".
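A Python sketch of what such a BOM-only file contains, assuming the editor writes a BOM even for empty UTF-8 text. It shows that the three bytes decode to one invisible U+FEFF character as plain UTF-8, to nothing with "utf-8-sig", and to "ï»¿" when a tool treats them as Windows-1252. The temporary directory is only there for the demo.

```python
import os
import tempfile

# Encoding an empty string with 'utf-8-sig' yields just the BOM,
# like saving an emptied file in an editor that always writes one
bom_only = "".encode("utf-8-sig")
print(bom_only.hex(" "))  # ef bb bf

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "BOM_only.txt")  # file name taken from the message
    with open(path, "wb") as f:
        f.write(bom_only)
    with open(path, "rb") as f:
        raw = f.read()

print(len(raw))                       # 3
print(repr(raw.decode("utf-8")))      # '\ufeff' (BOM kept as an invisible character)
print(repr(raw.decode("utf-8-sig")))  # '' (BOM recognized and dropped)
print(raw.decode("cp1252"))           # ï»¿ (the bytes misread as Windows-1252)
```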

          I've been learning way more about this stuff than I ever really wanted!!

          I'll probably post a question/bug report to PSPad, ask on the X1 forums
          whether this "bug?" has been corrected in later versions of X1, and beg
          for an updated copy.

          So far only 593 of 251,287 cables have been made available on
          http://cablegate.wikileaks.org/ with torrent link:
          http://file.wikileaks.org/torrent/cablegate/cablegate-201012021301.7z.torrent
          and I gather from the media that I've got a couple of weeks to figure out
          how to search them once they are eventually released.

          One more data point: HippoEDIT was the giveawayoftheday a couple of
          weeks ago ... when I open art_bad.html in it, the file is immediately
          flagged as modified. I haven't checked, but I guess that if I save it,
          a BOM will have been inserted à la Notepad.
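The guess above is that the editor prepends a BOM on save, as Windows Notepad did when saving as UTF-8. A hypothetical Python helper, ensure_utf8_bom, sketching that behavior under the assumption that the input is already UTF-8; this is not HippoEDIT's or Notepad's actual code.

```python
import codecs

def ensure_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: prepend a UTF-8 BOM (EF BB BF) if one is missing.

    Assumes `data` is already UTF-8 encoded text.
    """
    if data.startswith(codecs.BOM_UTF8):
        # Already marked; a second BOM would show up as a stray U+FEFF in the text
        return data
    return codecs.BOM_UTF8 + data

original = b"<?xml version='1.0'?>"  # assumed sample, like the start of art_bad.html
saved = ensure_utf8_bom(original)
print(saved[:4].hex(" "))               # ef bb bf 3c
print(ensure_utf8_bom(saved) == saved)  # True
```

Making the helper idempotent means re-saving an already marked file leaves it unchanged, while an unmarked file grows by exactly three bytes.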

          --
          Regards ... Alec (buralex@gmail& WinLiveMess - alec.m.burgess@skype)
        • loro
          Message 4 of 9, Dec 2, 2010
            At 07:43 2010-12-03, Alec Burgess wrote:
            >with first two chars: 'ÿþ<' in "ÿþ<?xml v" where the funny
            >character 'ÿþ' is (I guess) UTF-8 FFFE

            Yeah, ÿþ is how it used to show up in the upper left corner of web pages.

            Lotta