
Re: [Clip] Code page/character issues

  • Axel Berger
    Message 1 of 14, Jan 12, 2013
      John Shotsky wrote:
      > If I export from FireFox, then copy the file to another folder,
      > and then open the copy in NoteTab, it does not have the
      > problem I am discussing. The original still has the problem.
      > If I export from FireFox, then rename the file in the original
      > folder using Windows, then open the file in NoteTab, it
      > does not have the problem I am discussing.

      This is really, really weird. There is no explanation I can think of
      except that Windows itself is saving extra meta-information about files
      and NoteTab can access that. You should do a file compare in hex
      (Total Commander can do that). If they are identical (they ought to be
      after a simple copy and must be after a rename), then it has to be
      external meta-information.

      Axel
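
The byte-for-byte comparison suggested above can also be done without a file manager. The following is a minimal sketch in Python; the function names `same_bytes` and `first_difference` are illustrative assumptions, not part of NoteTab or Total Commander. It only compares file contents and does not inspect any metadata Windows may keep outside the file.

```python
import filecmp
from pathlib import Path


def same_bytes(path_a, path_b):
    """True if both files have identical content (not just size and timestamp)."""
    # shallow=False forces a real content comparison.
    return filecmp.cmp(path_a, path_b, shallow=False)


def first_difference(path_a, path_b):
    """Return the offset of the first differing byte, or None if identical."""
    data_a = Path(path_a).read_bytes()
    data_b = Path(path_b).read_bytes()
    for offset, (a, b) in enumerate(zip(data_a, data_b)):
        if a != b:
            return offset
    if len(data_a) != len(data_b):
        # One file is a prefix of the other; they differ where the shorter ends.
        return min(len(data_a), len(data_b))
    return None
```

If `same_bytes` reports True for the original and the renamed file, the difference NoteTab sees cannot come from the file contents.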
    • John Shotsky
      Message 2 of 14, Jan 12, 2013
        Well, if it was mainstream, I wouldn't have brought it up here. At first, I thought maybe FireFox wasn't properly
        closing the file, which is why I rebooted between, but that didn't help. I've inspected properties, but that didn't show
        anything different between the before and after. The byte count doesn't change. But somehow, NoteTab is seeing
        'something' that I can't find, which is why I asked here. I'll see if I can find a more robust file metadata analyzer.
        FYI – EditPad Pro opens it fine, but it is also a Unicode editor, and I often use it to see what is 'really' in a text
        file.

        Now for the even weirder part: EditPad Pro DOES display it correctly in text view, but in hex/Ascii view, it shows the
        same thing as NoteTab!
        Text View: Speka Piragi
        Hex/Ascii View: Speķa Pīrāgi

        NoteTab correctly detects it as utf-8. But when I force it to Windows 1252, it displays as in NoteTab – incorrectly.
        That means that NoteTab is detecting it as Win1252 before it is renamed, and as UTF8 afterwards. Further, that means
        that either Windows or NoteTab is incorrectly detecting the code page. But, since EditPad Pro detects it correctly, I
        don't think it's Windows. This reminds me that, over time, I have often had problems with characters in NoteTab, and
        have done everything I could think of to resolve those issues. But this one seems unresolvable. (except for moving the
        file)

        Recipes, it turns out, can be very difficult, since they come from every country, and even when Anglicized, they often
        retain oddball characters. Web pages can display these characters fine, but text editors may have problems. I can't even
        guess why NoteTab detects the code page differently, but I am pretty certain that it is a NoteTab problem alone. I do
        not know what mechanism editors use to detect code pages.

        Regards,
        John
        RecipeTools Web Site: http://recipetools.gotdns.com/
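
On the question of what mechanism editors use to detect code pages: one common heuristic is to look for a byte order mark first, then check whether the bytes are valid UTF-8, and only then fall back to a legacy 8-bit code page. The sketch below illustrates that idea in Python. It is an assumption about how such detection can work in general, not a description of what NoteTab or EditPad Pro actually do, and `guess_encoding` is a hypothetical name.

```python
import codecs


def guess_encoding(raw: bytes) -> str:
    """Guess a text encoding from raw bytes using a BOM-then-validate heuristic."""
    # 1. A byte order mark, if present, is the strongest hint.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # 2. Pure ASCII reads the same in UTF-8 and CP1252.
    if raw.isascii():
        return "ascii"
    # 3. Non-ASCII bytes that form valid UTF-8 sequences are very unlikely
    #    to be anything else.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 4. Fall back to a legacy 8-bit code page (CP1252 is assumed here).
        return "cp1252"
```

A heuristic of this kind depends only on the bytes in the file, so by itself it would give the same answer before and after a rename.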

      • John Shotsky
        Message 3 of 14, Jan 12, 2013
          One last bit of information about this problem:
          When the file has been moved or renamed, it displays correctly in NoteTab, although it doesn't display the accents. It
          displays standard English characters. However, when this same file is viewed in EditPad Pro, the original accents are
          still present. It is only the DISPLAY that is different between these two instances. So, in one case NoteTab displays
          one way, and in the other case a different way.

          From what I've read about code page detection, there are commands that search through a document and report on
          characters found. I still have no idea why moving or renaming a file provides different results to NoteTab. But it is
          apparently something about this search or the decision about how to display that is not working correctly.

          In my research, I found this fascinating tale of the history and basis of characters on computers:
          http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html

          Regards,
          John
          RecipeTools Web Site: http://recipetools.gotdns.com/





          ------------------------------------

          Fookes Software: http://www.fookes.com/
          NoteTab website: http://www.notetab.com/
          NoteTab Discussion Lists: http://www.notetab.com/groups.php

        • John Wallace
          Message 4 of 14, Jan 12, 2013
            What would happen if you saved it as a .html file instead of .txt file?


          • Axel Berger
            Message 5 of 14, Jan 12, 2013
              John Shotsky wrote:
              > Text View: Speka Piragi
              > Hex/Ascii View: Speķa Pīrāgi
              > NoteTab correctly detects it as utf-8. But when I force it to
              > Windows 1252, it displays as in NoteTab – incorrectly.

              It has to, those characters are not in CP1252. Converting your sample
              and assuming mail transfer has not broken anything I get:

              Speķa Pīrāgi

              These are from the "Latin Extended-A" block
              http://www.sql-und-xml.de/unicode-database/latin-extended-a.html

              NoteTab will never be able to deal with them satisfactorily. What I
              don't get at all is how Win7 interferes with them, but then I have so
              far refrained from using eXPerimental and stick to Win98. Even that
              tries to interfere and impose its preferences over mine, but there I can
              more or less control it. Your identical byte count might result from
              using UTF-16, don't newer Windoses do that? If so the byte count should
              be twice the letter count.

              > But, since EditPad Pro detects it correctly, I
              > don't think it's Windows.

              If EditPad is true UTF, as you say, then it need not detect anything.
              NoteTab is strictly 8-bit and strictly codepage based; all it can do is
              read letters from inside that single chosen codepage when encoded as
              UTF-8. Letters from more than one codepage inside the same document will
              never work.

              Axel
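
The points about CP1252 and byte counts can be checked with a few lines of Python. This is a small illustration using the sample text as reconstructed above (written with escapes so the code does not depend on how this page is encoded); the variable names are arbitrary.

```python
# "Speķa Pīrāgi": ķ = U+0137, ī = U+012B, ā = U+0101 (all Latin Extended-A)
text = "Spe\u0137a P\u012br\u0101gi"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # no BOM; 2 bytes per letter for this text

# The three accented letters take two bytes each in UTF-8, the other nine take one.
print(len(text), len(utf8_bytes), len(utf16_bytes))

# Reading the UTF-8 bytes as CP1252 turns each accented letter into two characters.
# Byte 0x81 (second byte of ā) has no CP1252 mapping in Python's codec,
# so it is replaced with U+FFFD here.
misread = utf8_bytes.decode("cp1252", errors="replace")
print(misread)

# The letters themselves cannot be stored in CP1252 at all.
try:
    text.encode("cp1252")
    fits_cp1252 = True
except UnicodeEncodeError:
    fits_cp1252 = False
print(fits_cp1252)
```

The misread string is the same kind of garbage shown in the Hex/Ascii view quoted above, which is consistent with UTF-8 bytes being interpreted as Windows 1252.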
            • John Shotsky
              Message 6 of 14, Jan 12, 2013
                EditPad Pro is a Unicode editor, so yes, it displays Unicode and utf-8 and many other code pages correctly. But that
                file is not Unicode, it is 8-bit UTF. When one of these files is moved, NoteTab not only displays it correctly, but it
                also saves it correctly, that is, without the accents. So, that is the workaround for now. What is not acceptable is the
                file as first opened, which does not result in a question mark or any valid character in any code page. It is just
                garbage. Previously, NoteTab displayed a question mark for any character out of its map. Now, it doesn't.

                But that's not actually the point anyway. The file is UTF-8 when it is written, and after it is copied. Nothing is
                different about the file except that there is a copy in another location. The copy displays correctly in NoteTab, but
                the original doesn't. The copy works with my clip library, the original doesn't. If I export the original in NoteTab to
                UTF-8 it displays correctly, but of course just copying it works, as does renaming it, so I can't say the export
                actually does anything. However, if I export it to Ascii, question marks show up for those characters, as expected. The
                clip library can't work with a bunch of question marks either, of course, as there is no way to guess what the missing
                character is except through a very, very complex word map which replaces question marks with characters if the word is
                otherwise recognized. So, for the words you correctly detected, I would simply substitute the unaccented
                characters for accented ones and that would be fine. But I can't do that with the original, because it displays EXTRA
                characters, as indicated in the 'Hex/Ascii' view in my earlier message.

                So, for now, my instructions will include moving the FireFox-exported file to a work folder, and we'll go with that as
                long as it continues to work. As to the problem, I will leave it in the category of unresolvable.

                Regards,
                John
                RecipeTools Web Site: http://recipetools.gotdns.com/
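
Substituting unaccented characters for accented ones, as described above, can be sketched in Python with Unicode decomposition, provided the file has first been read as UTF-8. This is an illustrative approach and not how NoteTab or the clip library does it; `strip_accents` is a hypothetical helper name.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Replace accented letters with their base letters where Unicode allows."""
    # NFD splits e.g. ķ into "k" followed by a combining cedilla.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

For example, `strip_accents("Spe\u0137a P\u012br\u0101gi")` gives `"Speka Piragi"`. Letters without a Unicode decomposition, such as ø or ł, pass through unchanged and would still need a manual mapping.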

              • Axel Berger
                Message 7 of 14, Jan 12, 2013
                  John Shotsky wrote:
                  > But that file is not Unicode, it is 8-bit UTF.

                  To my understanding UTF-8 is not something separate from Unicode; it is
                  one of several possible encodings of Unicode.

                  > When one of these files is moved, NoteTab not only displays it
                  > correctly, but it also saves it correctly, that is, without the
                  > accents.

                  Sorry, but if those letters do have accents, then anything without is
                  INcorrect. It may be an acceptable workaround, like Muller or Mueller
                  instead of Müller, but never correct.

                  > So, that is the workaround for now.

                  Right

                  > But that's not actually the point anyway.

                  Agreed. Win7 does something strange here and I'm very happy I need not
                  concern myself with that.

                  > As to the problem, I will leave it in the category of unresolvable.

                  Probably best.