Question about special characters being replaced with '?' in HTML documents

  • John
    Message 1 of 10 , Feb 3, 2011
      Hi All-

      I've been using Notetab for some time now and have yet to resolve an issue that has plagued me for some time.

      I use NT to edit website pages (primarily HTML and PHP). When cutting and pasting certain characters like " (quote) the character is replaced with a question mark.

      For example - if I paste the following from another site:

      "We were never happy with the production. Perhaps we should have taken more time over the record. But now we've got the chance to improve things. There will be no re-recording, just a remix".

      It will appear in Notetab as:

      ?We were never happy with the production. Perhaps we should have taken more time over the record. But now we?ve got the chance to improve things. There will be no re-recording, just a remix?.

      However if I pasted the same into a blank notetab file, the single and double quotes appear.

      Can someone explain why NoteTab is replacing these valid characters with a question mark?

      Thanks so much-

      -John
      www.cygnus-x1.net
    • John Shotsky
      Message 2 of 10 , Feb 3, 2011
        It's probably because it's Unicode which is not fully supported by NoteTab. When you paste into a new doc, NoteTab uses
        the text format you have selected for new documents, so it's displayed as plain text, and question marks are used to
        display characters it can't handle as plain text. To see if it's Unicode, you can look at it with a hex editor, which
        will show two bytes for those characters.
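A minimal sketch of that check, assuming Python 3.8 or later is at hand instead of a hex editor. It prints the bytes a hex editor would show for the typographic quotes in three encodings. The helper name `hex_bytes` and the `samples` table are illustrative only. Two bytes per character is what UTF-16 produces; the same characters take three bytes in UTF-8 and a single high byte in Windows-1252.

```python
# Compare how the pasted typographic characters are stored in a few encodings.
samples = {
    "left double quote": "\u201c",
    "right double quote": "\u201d",
    "right single quote": "\u2019",
    "ASCII double quote": '"',
}


def hex_bytes(text, encoding):
    """Return the encoded bytes as a spaced hex string, as a hex editor shows them."""
    return text.encode(encoding).hex(" ")


for name, ch in samples.items():
    print(
        f"{name:20} U+{ord(ch):04X}",
        "cp1252:", hex_bytes(ch, "cp1252"),
        "utf-8:", hex_bytes(ch, "utf-8"),
        "utf-16-le:", hex_bytes(ch, "utf-16-le"),
    )
```

If the quotes in the source file show up as `22`, they are plain ASCII; anything else points to typographic quotes in one of the encodings above.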

        This also happens with other high order characters, such as the single character for 1/3 that is supported by some
        systems, but not NoteTab. Those aren't Unicode, they are just high order characters that aren't supported in the
        font/code page combination you are using. But since your problem is with lower-order ASCII characters, I suspect it is
        Unicode.

        One way around this would be to get all your source files in plain ASCII, then the problem would be gone for good. There
        are several ways to do that, including using NoteTab itself, although I can't speak for the accuracy of doing so, since
        I use another program when I encounter this problem.
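One possible way to do that conversion outside NoteTab, sketched in Python. The mapping table `ASCII_EQUIVALENTS` and the function `to_plain_ascii` are hypothetical names, and the list of characters is an assumption; extend it with whatever your sources actually contain.

```python
# Hypothetical mapping from common typographic characters to ASCII stand-ins.
ASCII_EQUIVALENTS = str.maketrans({
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_plain_ascii(text):
    """Replace the mapped typographic characters; everything else is left as-is."""
    return text.translate(ASCII_EQUIVALENTS)


print(to_plain_ascii("\u201cBut now we\u2019ve got the chance\u201d."))
```

Characters that are not in the table pass through unchanged, so the result is only pure ASCII if the table covers everything in the input.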

        Regards,
        John

      • Axel Berger
        Message 3 of 10 , Feb 3, 2011
          John Shotsky wrote:
          > But since your problem is with lower-order ASCII characters,
          > I suspect it is Unicode.

          Beg to differ. I'm sure it's not the standard ASCII double quotes right
          there on the keyboard but the typographic ones higher up in the charset.
          I'd say when you paste into an existing document, that document already
          has its encoding set, and if pasted characters don't exist in it, that's
          it. A new document can still adapt itself.

          Whatever it is, the existing document and the one the new text is copied
          from are in some way incompatible.
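The symptom described here can be reproduced in a short Python sketch. This only models the effect; that NoteTab behaves this way internally is an assumption, and `store_as` is an illustrative name. Text is encoded for a document with a fixed encoding, and any character that encoding lacks is replaced with `?`.

```python
pasted = "\u201cWe\u2019ve got the chance\u201d."


def store_as(text, encoding):
    """Encode text for a document in the given encoding, substituting '?'
    for every character that the encoding cannot represent."""
    return text.encode(encoding, errors="replace")


# ISO-8859-1 has no typographic quotes, so they are lost.
print(store_as(pasted, "iso-8859-1"))
# Windows-1252 has them as single high bytes (0x92, 0x93, 0x94).
print(store_as(pasted, "cp1252"))
# UTF-8 can represent every character.
print(store_as(pasted, "utf-8"))
```

Once the `?` has been written, the original character cannot be recovered from the saved file.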

          Axel
        • John Shotsky
          Message 4 of 10 , Feb 3, 2011
            Yes, of course it could be a character set problem - I never said it couldn't be. But my suspicion is that it really is
            a Unicode problem. NoteTab is very unreliable when working with Unicode. I can tell, if I have access to one of the
            original files, not saved by NoteTab. Once saved, the question marks are saved in place of the characters in question.

            Regards,
            John
            Recipe formatting tools: http://recipetools.gotdns.com
            RecipeTools FTP Site: ftp://recipetoolsftp.gotdns.com
            Beaverton Weather: http://shotsky.gotdns.com

          • Don
            Message 5 of 10 , Feb 3, 2011
              I have had similar problems in fact. I emailed about it not long ago.
              I think I found relief by switching to a different font if I remember
              correctly.

              On 2/4/2011 12:23 AM, John Shotsky wrote:
              > Yes, of course it could be a character set problem - I never said it couldn't be. But my suspicion is that it really is
              > a Unicode problem. NoteTab is very unreliable when working with Unicode. I can tell, if I have access to one of the
              > original files, not saved by NoteTab. Once saved, the question marks are saved in place of the characters in question.
              >
            • John Shotsky
              Message 6 of 10 , Feb 3, 2011
                But every font accepts double quotes. That's why I suggested Unicode.

                Regards,
                John
              • Don
                Message 7 of 10 , Feb 3, 2011
                  Or do they? Double quotes aren't always plain double quotes; they can
                  be smart quotes and so forth, which is where I was having trouble.

                  On 2/4/2011 12:56 AM, John Shotsky wrote:
                  > But every font accepts double quotes. That's why I suggested Unicode.
                  >
                • Axel Berger
                  Message 8 of 10 , Feb 4, 2011
                    John Shotsky wrote:
                    > But my suspicion is that it really is a Unicode problem.
                    > NoteTab is very unreliable when working with Unicode.

                    I agree, after all Unicode is a kind of character set. I tend to use the
                    primitive notepad for pasting and saving from websites and only paste
                    short snippets directly. In the latter case wrong characters are obvious
                    and quickly dealt with. (And many html pages are pasted into in the
                    first place and have broken characters right there, so there's no way
                    manual rework can be eliminated entirely.)

                    Axel
                  • Sheri
                    Message 9 of 10 , Feb 4, 2011
                      On 2/4/2011 4:02 AM, Axel Berger wrote:
                      > John Shotsky wrote:
                      >> But my suspicion is that it really is a Unicode problem.
                      >> NoteTab is very unreliable when working with Unicode.
                      > I agree, after all Unicode is a kind of character set. I tend to use the
                      > primitive notepad for pasting and saving from websites and only paste
                      > short snippets directly. In the latter case wrong characters are obvious
                      > and quickly dealt with. (And many html pages are pasted into in the
                      > first place and have broken characters right there, so there's no way
                      > manual rework can be eliminated entirely.)
                      >
                      > Axel
                      >

                      NoteTab is an ANSI editor, not a Unicode one. Since version 6, it
                      supports raw UTF-8 as a way of preserving Unicode. NoteTab allows you to
                      use the Microsoft "ansi" table for UTF-8. If you use raw UTF-8, upper
                      characters will always appear as multiple bytes (gobbledygook) in
                      NoteTab regardless of the font chosen. Raw UTF-8 is the only choice if
                      you want to edit Unicode (Basic Multilingual Plane only) completely
                      losslessly in NoteTab. When used for that purpose, it is more like a
                      data editor than a text editor. It is possible to use NoteTab to
                      reliably modify "unicode" that will be viewed outside of NoteTab, or to
                      carry out custom character conversions to "normal" ANSI formats. UTF-8
                      regex is the only tool provided.
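To illustrate what "multiple bytes" looks like, here is a small Python sketch that views UTF-8 bytes through an ANSI code page. It assumes Windows-1252 as the ANSI code page, and the function names are illustrative. The sample avoids the right double quote because its UTF-8 form contains byte 0x9D, which Python's `cp1252` codec leaves undefined.

```python
def view_raw_utf8_as_ansi(text, ansi="cp1252"):
    """Show how UTF-8 encoded text looks when each byte is displayed
    as a separate character of an ANSI code page."""
    return text.encode("utf-8").decode(ansi)


def read_back(view, ansi="cp1252"):
    """Reverse the view: the bytes are untouched, so nothing is lost."""
    return view.encode(ansi).decode("utf-8")


sample = "\u201cwe\u2019ve"  # left double quote, then we've with a curly apostrophe
view = view_raw_utf8_as_ansi(sample)
print(view)
print(read_back(view) == sample)
```

Each typographic character turns into three odd-looking characters on screen, but as long as those bytes are not altered, the text round-trips without loss.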

                      All of this operates reliably, but it easily goes awry when the user
                      makes a mistake. I think (and have previously reported) that better
                      error reporting could be implemented. For example, if you tell the PCRE
                      regex engine you are testing UTF-8 and it sees text that is not UTF-8
                      (e.g., because you've already made erroneous changes), it raises an
                      error that NoteTab doesn't report to you.
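A sketch of the kind of check that can be run outside NoteTab before a UTF-8 regex pass. It uses Python's own UTF-8 decoder as a stand-in, not PCRE, and `check_utf8` is a hypothetical helper name.

```python
def check_utf8(data: bytes):
    """Return None if data is valid UTF-8, otherwise a message that
    locates the first offending byte."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        bad = data[err.start]
        return f"invalid UTF-8 at byte offset {err.start}: 0x{bad:02x} ({err.reason})"
    return None


# Valid UTF-8: typographic quotes encoded as three bytes each.
print(check_utf8("\u201cok\u201d".encode("utf-8")))
# Not UTF-8: the same quotes stored as single Windows-1252 bytes.
print(check_utf8(b"\x93ok\x94"))
```

Running such a check first makes it visible that the text is no longer valid UTF-8, which is the situation described above where the error otherwise goes unreported.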

                      Despite the limitations, I've found having the capability to be useful.
                      Also, for more pedestrian use where losslessness is not imperative,
                      unicode can be converted to normal ANSI and edited normally in NoteTab.
                      But for obvious reasons when you need a unicode editor you should use one.

                      Regards,
                      Sheri
                    • loro
                      Message 10 of 10 , Feb 4, 2011
                        John wrote:
                        >For example - if I paste the following from another site:
                        >
                        >"We were never happy with the production. Perhaps we should have
                        >taken more time over the record. But now we've got the chance to
                        >improve things. There will be no re-recording, just a remix".
                        >
                        >It will appear in Notetab as:
                        >
                        >?We were never happy with the production. Perhaps we should have
                        >taken more time over the record. But now we?ve got the chance to
                        >improve things. There will be no re-recording, just a remix?.

                        Do you copy the text from a page displayed in a browser or from the
                        source? In either case is the source document on the web and can you
                        link to it? I tried to google it up, but that quote is used on
                        hundreds of sites, it seems.

                        Lotta