
Code page/character issues

  • John Shotsky
    Message 1 of 14 , Jan 11, 2013
      I have been having ongoing problems with characters displaying correctly in NoteTab. I often get text files as an export
      from an addon in FireFox, called RecipeFox. It captures recipes off web pages and exports them to a text file. It is
      unpredictable what the original code page or character set of the web page is, so an attempt is made to export to text
      in UTF-8, so that it will display correctly in NoteTab. However, at times, it does not. I have determined that, at
      times, NoteTab chooses the wrong code page or text format when it opens a text file for the first time, which causes
      certain high-order characters to display incorrectly (usually as a character preceded by an 'A' with a caret on top of it,
      but there are also other variations). Simple characters, such as copyright symbols and degree symbols, are likewise
      incorrectly displayed in this instance, but other times the exported text files are ok. It apparently is linked to an
      analysis of characters found in the text file, since it usually happens when higher order ASCII characters are found in
      the file.

      So, I took one of these files which displays incorrectly and tried exporting it to various formats, including ANSI,
      ASCII, UTF-8, and Unicode. They all display correctly, except that the ASCII version replaces higher order characters
      with (two) question marks INCLUDING those copyright and degree symbols that would otherwise be ok in a plain text file.
      (Copyright ?? )

      I can forward the original and each of the exported versions if someone wants to look at the reason why this happens. It
      is very frustrating, because it cannot be determined ahead of time when this is going to happen, and, if the file is
      saved one time, it is permanently corrupted and can no longer be exported to any of the other formats, since it is saved
      as plain ASCII by default. I cannot ask my users to always export any text file to UTF-8 or ANSI before editing and
      saving the file, so my real goal is to learn how to avoid this problem, and possibly get a fix for NoteTab if it is
      actually a NoteTab issue alone. Since our group doesn't permit attachments, just let me know if you want me to forward
      my files to you.
      Thanks,
      John
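
The symptoms described in this message are consistent with UTF-8 bytes being read as a single-byte code page. The following is a minimal Python sketch of that effect, assuming Windows-1252 is the code page being applied and that the ASCII export works byte by byte (neither is confirmed in the thread). The function names are illustrative only.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Windows-1252 (the wrong way)."""
    # errors="replace" covers the few byte values that Windows-1252 leaves undefined.
    return text.encode("utf-8").decode("cp1252", errors="replace")


def export_as_ascii(text: str) -> str:
    """Replace every non-ASCII byte of the UTF-8 form with a question mark."""
    return "".join(chr(b) if b < 0x80 else "?" for b in text.encode("utf-8"))


# The copyright and degree signs each take two bytes in UTF-8 (0xC2 plus one more),
# and 0xC2 is the letter 'A' with a circumflex in Windows-1252.
print(misread_as_cp1252("© 350°"))     # Â© 350Â°
print(export_as_ascii("Copyright ©"))  # Copyright ??
```

Under these assumptions, one two-byte character becomes two question marks, which would match the "Copyright ??" result reported above.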




    • Axel Berger
      Message 2 of 14 , Jan 11, 2013
        John Shotsky wrote:
        > usually, a character preceded by an 'A' with a carat on top of it,
        > but there are also other variations.

        NoteTab can only work with one 8-bit character set at a time. It can
        read and use UTF-8 notation if all characters used can be mapped to one
        code page; it can't deal with more than 256 different characters.
        So the safe way to go is to load as raw UTF-8 and run a clip to convert
        to something compatible like HTML's &#1234; notation.

        Axel
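
The conversion suggested above (raw UTF-8 in, HTML numeric character references out) can be sketched in Python rather than as a NoteTab clip. This is only an illustration of the idea, not Axel's clip. The function name is hypothetical, and it assumes the input really is valid UTF-8.

```python
def utf8_bytes_to_numeric_refs(raw: bytes) -> str:
    """Decode UTF-8 bytes and rewrite every non-ASCII character as &#NNN;."""
    text = raw.decode("utf-8")
    # 'xmlcharrefreplace' is a standard Python error handler that emits
    # decimal numeric character references for unencodable characters.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


sample = "Speķa Pīrāgi".encode("utf-8")
print(utf8_bytes_to_numeric_refs(sample))  # Spe&#311;a P&#299;r&#257;gi
```

The result is pure 7-bit ASCII, so it should survive any 8-bit code page an editor picks.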
      • John Shotsky
        Message 3 of 14 , Jan 11, 2013
          As I stated, I can't ask my users to know how to do that. They want to open a file, edit it a little, save and run my
          clips on it. Once they open, edit and save, the damage is done. What I am looking for is a way to intercept this problem
          so it doesn't happen, or a modification to NoteTab, if necessary, to make it work right. As I said, if I export this
          unedited file to either ANSI or UTF-8, it displays correctly, so why doesn't it just open it in that format to start
          with, instead of making bogus characters that aren't correct in *ANY* situation?
          (Speķa Pīrāgi) http://www.saveur.com/article/Recipes/Speka-Piragi-Bacon-Turnovers

          Regards,
          John
          RecipeTools Web Site: http://recipetools.gotdns.com/

        • Axel Berger
          Message 4 of 14 , Jan 11, 2013
            John Shotsky wrote:
            > What I am looking for is a way to intercept this problem
            > so it doesn't happen, or a modification to NoteTab, if necessary,
            > to make it work right.

            I second that. A general setting turning off all UTF capabilities and
            detection would be extremely nice to have. It will then be up to you to
            detect and deal with such characters, but for myself I've found that to
            be the easy part; stopping NoteTab from interfering is the hard bit.

            Axel
          • John Shotsky
            Message 5 of 14 , Jan 11, 2013
              Here is another interesting thing about this problem.
              If I export the file which creates the condition I'm discussing, and do not even open the file in NoteTab, then copy the
              file to another folder, then open the copy, it reads fine. If I open the original, it has the problem. That is, the
              simple act of copying an original file to another folder using Windows 7 'fixes' the problem. That is so bizarre, but at
              least I can tell my users to copy it to a different work folder before opening it.

              Regards,
              John
              RecipeTools Web Site: http://recipetools.gotdns.com/





              ------------------------------------

              Fookes Software: http://www.fookes.com/
              NoteTab website: http://www.notetab.com/
              NoteTab Discussion Lists: http://www.notetab.com/groups.php

            • Axel Berger
              Message 6 of 14 , Jan 11, 2013
                John Shotsky wrote:
                > If I open the original, it has the problem.

                Almost certainly not. Your "original" has already been opened and
                probably saved by NT at least once before. NT saves info about recent
                files in Note*.fpr. When I encounter intractable UTF problems, I rename
                the file and then open it, which does the same trick.

                > at least I can tell my users to copy it to a different work folder
                > before opening it.

                If they do that before opening it for the first time, they'll achieve
                nothing.

                But you can include the technique in a clip: Save the file, Rename it
                through ^!Dos (wait a second for caching), and open the "new" one. Even
                better, if you can make your users open NT with the /RawUTF8 parameter,
                your problems ought to be solved.

                (I should have remembered. When introduced I refrained from using that,
                because I wanted to keep the way back to version 5.8 and 6.2 open. By
                now I feel quite safe in 7.x and ought to reconsider. Now, where and how
                to find each and every place where NT is invoked? Registry, Programs
                folder, Links, Commands in dozens of shell-like programs, ... -- looks a
                lot like a big fool's errand, and needs to be done on at least three
                machines: Eric, a command line parameter is good, a setting would be
                much much better.)

                Axel
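
The "open it under a new name" workaround described above could be wrapped in a small helper outside NoteTab. The Python sketch below is an assumption-laden illustration: `fresh_copy` and `notetab_command` are hypothetical names, the executable path must be supplied by the caller, and placing `/RawUTF8` before the file name is a guess (the thread names the parameter but not its position).

```python
import shutil
from pathlib import Path
from typing import List


def fresh_copy(path: str, suffix: str = "-work") -> Path:
    """Copy a file to a new name in the same folder and return the new path."""
    src = Path(path)
    dst = src.with_name(f"{src.stem}{suffix}{src.suffix}")
    shutil.copyfile(src, dst)  # copies the bytes only, not the file metadata
    return dst


def notetab_command(notetab_exe: str, file_path: Path) -> List[str]:
    """Build (but do not run) a command line that opens the file with /RawUTF8.

    Assumption: the switch goes before the file name.
    """
    return [notetab_exe, "/RawUTF8", str(file_path)]
```

A caller could pass the returned list to `subprocess.run` once the real NoteTab path and argument order have been verified locally.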
              • John Shotsky
                Message 7 of 14 , Jan 11, 2013
                  When the text file is exported from Firefox, then opened in NoteTab, it has the problem I am discussing. Every time, for
                  the same export.
                  If I export from FireFox, then reboot my computer, and then open the file with NoteTab, it has the problem I am
                  discussing.
                  If I open the file from FireFox, then save it with another name, it still has the same problem.
                  If I open the file from FireFox, copy the contents, then paste into a new file and save THAT, it still has the same
                  problem.
                  If I export from FireFox, then copy the file to another folder, and then open the copy in NoteTab, it does not have the
                  problem I am discussing. The original still has the problem.
                  If I export from FireFox, then rename the file in the original folder using Windows, then open the file in NoteTab, it
                  does not have the problem I am discussing. There is no original at this point.
                  I can show it with screenshots, but if I copy or rename the file, the problem vanishes. Win7-64.
                  Regards,
                  John
                  RecipeTools Web Site: http://recipetools.gotdns.com/

                • Axel Berger
                  Message 8 of 14 , Jan 12, 2013
                    John Shotsky wrote:
                    > If I export from FireFox, then copy the file to another folder,
                    > and then open the copy in NoteTab, it does not have the
                    > problem I am discussing. The original still has the problem.
                    > If I export from FireFox, then rename the file in the original
                    > folder using Windows, then open the file in NoteTab, it
                    > does not have the problem I am discussing.

                    This is really, really weird. There is no explanation I can think of
                    except that Windows itself is saving extra meta-information about files
                    and NT can access that. You should do a file compare in hex
                    (Total Commander can do that). If they are identical (they ought to be
                    after a simple copy and must be after a rename), then it has to be external
                    meta-information.

                    Axel
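
For readers without a hex-compare tool, the byte-for-byte comparison suggested above can be approximated with a short Python sketch. The function name is hypothetical. It reports the offset of the first differing byte, or `None` when the two files have identical contents.

```python
from typing import Optional


def first_difference(path_a: str, path_b: str, chunk_size: int = 65536) -> Optional[int]:
    """Return the offset of the first differing byte, or None if the files match."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:  # both files ended at the same point
                return None
            offset += len(a)
```

If this returns `None` for the original and the renamed file, the difference in behaviour cannot come from the file contents themselves.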
                  • John Shotsky
                    Message 9 of 14 , Jan 12, 2013
                      Well, if it was mainstream, I wouldn't have brought it up here. At first, I thought maybe FireFox wasn't properly
                      closing the file, which is why I rebooted between, but that didn't help. I've inspected properties, but that didn't show
                      anything different between the before and after. The byte count doesn't change. But somehow, NoteTab is seeing
                      'something' that I can't find, which is why I asked here. I'll see if I can find a more robust file metadata analyzer.
                      FYI – EditPad Pro opens it fine, but it is also a Unicode editor, and I often use it to see what is 'really' in a text
                      file.

                      Now for the even weirder part: EditPad Pro DOES display it correctly in text view, but in hex/Ascii view, it shows the
                      same thing as NoteTab!
                      Text View: Speka Piragi
                      Hex/Ascii View: Speķa Pīrāgi

                      NoteTab correctly detects it as utf-8. But when I force it to Windows 1252, it displays as in NoteTab – incorrectly.
                      That means that NoteTab is detecting it as Win1252 before it is renamed, and as UTF8 afterwards. Further, that means
                      that either Windows or NoteTab is incorrectly detecting the code page. But, since EditPad Pro detects it correctly, I
                      don't think it's Windows. This reminds me that, over time, I have often had problems with characters in NoteTab, and
                      have done everything I could think of to resolve those issues. But this one seems unresolvable. (except for moving the
                      file)

                      Recipes, it turns out, can be very difficult, since they come from every country, and even when Anglicized, they often
                      retain oddball characters. Web pages can display these characters fine, but text editors may have problems. I can't even
                      guess why NoteTab detects the code page differently, but I am pretty certain that it is a NoteTab problem alone. I do
                      not know what mechanism editors use to detect code pages.

                      Regards,
                      John
                      RecipeTools Web Site: http://recipetools.gotdns.com/
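
On the question of how editors detect code pages: one widely used heuristic is to look for a byte order mark, then test whether the bytes are valid UTF-8, and otherwise fall back to a default 8-bit code page. The Python sketch below shows that general heuristic only. It is not NoteTab's or EditPad Pro's actual algorithm, which the thread does not describe, and the Windows-1252 fallback is an assumption.

```python
import codecs


def guess_encoding(raw: bytes, fallback: str = "cp1252") -> str:
    """Guess an encoding from the file's bytes alone (BOM first, then UTF-8 validity)."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        raw.decode("utf-8")  # strict decoding: any malformed sequence raises
    except UnicodeDecodeError:
        return fallback      # not valid UTF-8, so assume the 8-bit default
    return "utf-8"           # also returned for pure ASCII, which is valid UTF-8
```

A heuristic like this depends only on the bytes in the file, so by itself it would give the same answer before and after a rename.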

                    • John Shotsky
                      Message 10 of 14 , Jan 12, 2013
                        One last bit of information about this problem:
                        When the file has been moved or renamed, it displays correctly in NoteTab, although it doesn't display the accents. It
                        displays standard English characters. However, when this same file is viewed in EditPad Pro, the original accents are
                        still present. It is only the DISPLAY that is different between these two instances. So, in one case NoteTab displays
                        one way, and in the other case a different way.

                        From what I've read about code page detection, there are commands that search through a document and report on
                        characters found. I still have no idea why moving or renaming a file provides different results to NoteTab. But it is
                        apparently something about this search or the decision about how to display that is not working correctly.

                        In my research, I found this fascinating tale of the history and basis of characters on computers:
                        http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html

                        Regards,
                        John
                        RecipeTools Web Site: http://recipetools.gotdns.com/


                      • John Wallace
                        Message 11 of 14 , Jan 12, 2013
                          What would happen if you saved it as a .html file instead of .txt file?


                        • Axel Berger
                          Message 12 of 14 , Jan 12, 2013
                            John Shotsky wrote:
                            > Text View: Speka Piragi
                            > Hex/Ascii View: Speķa Pīrāgi
                            > NoteTab correctly detects it as utf-8. But when I force it to
                            > Windows 1252, it displays as in NoteTab – incorrectly.

                            It has to, those characters are not in CP1252. Converting your sample
                            and assuming mail transfer has not broken anything I get:

                            Speķa Pīrāgi

                            These are from the "extended block A"
                            http://www.sql-und-xml.de/unicode-database/latin-extended-a.html

                            NoteTab will never be able to deal with them satisfactorily. What I
                            don't get at all is how Win7 interferes with them, but then I have so
                            far refrained from using eXPerimental and stick to Win98. Even that
                            tries to interfere and impose its preferences over mine, but there I can
                            more or less control it. Your identical byte count might result from
                            using UTF-16, don't newer Windoses do that? If so the byte count should
                            be twice the letter count.
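The byte-count reasoning can be checked directly. The sketch below assumes the sample text is exactly the twelve characters shown above, written with escapes so the accented letters are unambiguous; `byte_counts` is a name made up for this illustration.

```python
# "Speķa Pīrāgi" with the three accented letters given as code points
sample = "Spe\u0137a P\u012br\u0101gi"


def byte_counts(text):
    """Count letters and encoded bytes for two common Unicode encodings."""
    return {
        "letters": len(text),
        "utf-8": len(text.encode("utf-8")),
        # utf-16-le writes no byte-order mark, so the count is exactly 2 per letter here
        "utf-16": len(text.encode("utf-16-le")),
    }


counts = byte_counts(sample)
print(counts)
```

For this sample UTF-16 takes twice the letter count, while UTF-8 adds one extra byte for each of the three accented letters. A byte count equal to the letter count would therefore fit neither encoding.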

                            > But, since EditPad Pro detects it correctly, I
                            > don't think it's Windows.

                            If EditPad is true UTF, as you say, then it need not detect anything.
                            NoteTab is strictly 8-bit and strictly codepage based, all it can do is
                            read letters from inside that single chosen codepage when encoded as
                            UTF-8. Letters from more than one codepage inside the same document will
                            never work.
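Both effects discussed in this thread can be reproduced outside NoteTab. The following is a minimal Python sketch, not a description of NoteTab's internals: it checks whether text fits into a single code page, and shows what UTF-8 bytes look like when they are read as Windows-1252. The function names are invented for the example.

```python
sample = "Spe\u0137a P\u012br\u0101gi"  # "Speķa Pīrāgi"


def fits_codepage(text, codepage="cp1252"):
    """True if every character of text exists in the given code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def misread_utf8(text, codepage="cp1252"):
    """Encode as UTF-8, then decode the bytes as if they were code page text."""
    # errors="replace" is needed because some byte values are undefined in cp1252
    return text.encode("utf-8").decode(codepage, errors="replace")


print(fits_codepage(sample))            # the Latvian letters are not in CP1252
print(fits_codepage("\u00a9 \u00b0"))   # copyright and degree signs are
print(misread_utf8("\u00a9"))           # copyright sign preceded by A-circumflex
```

The last line produces the copyright sign with an A-circumflex in front of it, which matches the symptom described at the start of the thread: a UTF-8 file opened under the wrong code page.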

                            Axel
                          • John Shotsky
                            Message 13 of 14 , Jan 12, 2013
                              EditPad Pro is a Unicode editor, so yes, it displays Unicode and UTF-8 and many other code pages correctly. But that
                              file is not Unicode, it is 8-bit UTF. When one of these files is moved, NoteTab not only displays it correctly, but it
                              also saves it correctly, that is, without the accents. So, that is the workaround for now. What is not acceptable is the
                              file as first opened, which does not result in a question mark or any valid character in any code page. It is just
                              garbage. Previously, NoteTab displayed a question mark for any character out of its map. Now, it doesn't.

                              But that's not actually the point anyway. The file is UTF-8 when it is written, and after it is copied. Nothing is
                              different about the file except that there is a copy in another location. The copy displays correctly in NoteTab, but
                              the original doesn't. The copy works with my clip library, the original doesn't. If I export the original in NoteTab to
                              UTF-8 it displays correctly, but of course just copying it works, as does renaming it, so I can't say the export
                              actually does anything. However, if I export it to ASCII, question marks show up for those characters, as expected. The
                              clip library can't work with a bunch of question marks either, of course, as there is no way to guess what the missing
                              character is except through a very, very complex word map which replaces question marks with characters if the word is
                              otherwise recognized. So, for the words you correctly detected below, I would simply substitute the unaccented
                              characters for accented ones and that would be fine. But I can't do that with the original, because it displays EXTRA
                              characters, as indicated in my 'Hex/Ascii' view below.
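The substitution of unaccented for accented characters described above can be sketched in Python as follows. This is an illustration using the standard `unicodedata` module, not the NoteTab clip itself, and it assumes the text has already been decoded correctly as UTF-8; `strip_accents` is a hypothetical helper name.

```python
import unicodedata


def strip_accents(text):
    """Replace accented letters with their unaccented base letters."""
    # NFD splits each accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFD", text)
    # Dropping the combining marks leaves only the base letters
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Spe\u0137a P\u012br\u0101gi"))
```

This only works on correctly decoded text. Applied to the garbled original, the extra characters would survive, because they are separate characters rather than accents.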

                              So, for now, my instructions will include moving the FireFox-exported file to a work folder, and we'll go with that as
                              long as it continues to work. As to the problem, I will leave it in the category of unresolvable.

                              Regards,
                              John
                              RecipeTools Web Site: http://recipetools.gotdns.com/

                              From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of Axel Berger
                              Sent: Saturday, January 12, 2013 07:23
                              To: ntb-clips@yahoogroups.com
                              Subject: Re: [Clip] Code page/character issues


                              John Shotsky wrote:
                              > Text View: Speka Piragi
                              > Hex/Ascii View: Speķa Pīrāgi
                              > NoteTab correctly detects it as utf-8. But when I force it to
                              > Windows 1252, it displays as in NoteTab – incorrectly.

                              It has to, those characters are not in CP1252. Converting your sample
                              and assuming mail transfer has not broken anything I get:

                              Speka Piragi

                              These are from the "extended block A"
                              http://www.sql-und-xml.de/unicode-database/latin-extended-a.html

                              NoteTab will never be able to deal with them satisfactorily. What I
                              don't get at all is how Win7 interferes with them, but then I have so
                              far refrained from using eXPerimental and stick to Win98. Even that
                              tries to interfere and impose its preferences over mine, but there I can
                              more or less control it. Your identical byte count might result from
                              using UTF-16, don't newer Windoses do that? If so the byte count should
                              be twice the letter count.

                              > But, since EditPad Pro detects it correctly, I
                              > don't think it's Windows.

                              If EditPad is true UTF, as you say, then it need not detect anything.
                              NoteTab is strictly 8-bit and strictly codepage based, all it can do is
                              read letters from inside that single chosen codepage when encoded as
                              UTF-8. Letters from more than one codepage inside the same document will
                              never work.

                              Axel



                            • Axel Berger
                              Message 14 of 14 , Jan 12, 2013
                                John Shotsky wrote:
                                > But that file is not Unicode, it is 8-bit UTF.

                                To my understanding UTF-8 is a specific encoding of Unicode, one of
                                several possible ones, not something separate from it.

                                > When one of these files is moved, NoteTab not only displays it
                                > correctly, but it also saves it correctly, that is, without the
                                > accents.

                                Sorry, but if those letters do have accents, then anything without is
                                INcorrect. It may be an acceptable workaround, like Muller or Mueller
                                instead of Müller, but never correct.

                                > So, that is the workaround for now.

                                Right

                                > But that's not actually the point anyway.

                                Agreed. Win7 does something strange here and I'm very happy I need not
                                concern myself with that.

                                > As to the problem, I will leave it in the category of unresolvable.

                                Probably best.