
Re: [NH] HTML syntax highlighting not working correct

  • Axel Berger
    Message 1 of 29 , Dec 27, 2012
      manon_purple wrote:
      > or only partial highlighted.

      In my experience URL highlighting has always been partially fooled by
      word wrap. I never found it a problem, but using NZ for many things
      besides HTML I'm used to doing without highlighting anyway.

      Axel
    • M
      Message 2 of 29 , Dec 27, 2012
        Hi,

        I noticed that syntax highlighting is haywire. I assumed that the
        programmers were already aware of this fault and were working on it. No
        doubt there will be an upgrade soon. I hope so.

        Michael Rawley.



        -----
        No virus found in this message.
        Checked by AVG - www.avg.com
        Version: 2013.0.2805 / Virus Database: 2637/5989 - Release Date: 12/26/12
      • manon_purple
        Message 3 of 29 , Dec 27, 2012
          Thanks for the input, Axel. So far I have not seen a relation with word wrap, but that could be an issue. For now I will revert to version 6.

        • manon_purple
          Message 4 of 29 , Dec 27, 2012
            Thanks, Michael. In version 7.1 a problem with syntax highlighting is supposed to have been remedied, according to the change log. I have no idea whether the programmers read this board. For now I will revert to version 6.

          • Fookes Software
            Message 5 of 29 , Jan 17, 2013
              Hi,

              Yes, we are aware of certain issues with the syntax highlighting
              feature. We have fixed many problems, but it's complicated, and some
              fixes unfortunately cause other problems elsewhere. We're doing
              our best to sort it out.

              --
              Regards,

              Julian - Fookes Software Helpdesk


            • shotsky1
              Message 6 of 29 , Nov 15, 2014

                A belated answer, but I did discover what was causing it in one case. Regardless of how I tried to open an HTML file, it always opened as Unicode, code page 65001 (UTF-8). That led to an investigation of why. Eventually, I traced it to a charset callout of utf8 in the head of the document. If you have that in the metadata, NoteTab will ignore your desired code page and go by that callout.

                I tested by simply opening the file, using a regex to remove the meta tag, and then saving. Then I opened the file with the 1252 code page, and it worked correctly. The highlighting went from totally bogus to very nearly perfect. It only took me half a day to figure out what was causing it. I knew there was no Unicode in the document, as it was originally saved as plain text (.txt) and that worked OK: it would open with whatever code page I wanted. It was only when saved as HTML that the code page could not be controlled.
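For readers who want to try the same workaround outside NoteTab, here is a minimal Python sketch of the "remove the charset meta tag" step described above. The function name `strip_charset_meta` and the regular expression are illustrative assumptions, not the regex John actually used, and the pattern only covers the two common forms of the declaration.

```python
import re

# Hypothetical pattern: matches <meta charset="..."> as well as the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(
    r"<meta\b[^>]*\bcharset\s*=[^>]*>\s*",
    re.IGNORECASE,
)


def strip_charset_meta(html: str) -> str:
    """Return the HTML text with any charset meta declaration removed."""
    return META_CHARSET.sub("", html)


if __name__ == "__main__":
    sample = '<head><meta charset="utf-8"><title>Recipe</title></head>'
    print(strip_charset_meta(sample))
```

Other meta tags (for example a viewport tag) are left alone, because the pattern requires the word `charset` inside the tag.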

                Hope this helps someone else, it has bugged me forever.

                John

              • Axel Berger
                Message 7 of 29 , Nov 15, 2014
                  "jshotsky@... [ntb-html]" wrote:
                  > Eventually, I
                  > traced it to a charset callout of utf8 in the head of the document.
                  > If you have that in the metadata, NoteTab will ignore your desired
                  > code page and go by that callout.

                  Could you do me a favour and retry opening the original file with the
                  command line option /RawUTF8?

                  I am right in the middle of a lot of work with a deadline and do not dare
                  install new software, so I can't test 7.2 as yet. From the list of
                  changes it seems my desperate wish to have that option added to the
                  permanent settings has been ignored again.

                  Does just giving that command line parameter solve your highlighting
                  problem? If so, it is one more argument for offering it as a permanent
                  setting in the setup.

                  Danke
                  Axel
                • John Shotsky
                  Message 8 of 29 , Nov 15, 2014

                    The problem is that it opens as Unicode/utf8 even when you don't want it to, if the charset is called out. But I tried it, and the files become read-only when opened that way.


                  • Axel Berger
                    Message 9 of 29 , Nov 15, 2014
                      "'John Shotsky' jshotsky@... [ntb-html]" wrote:
                      > The problem is that it opens as Unicode/utf8 even when you don't want it
                      > to if the charset is called out.

                      To my mind that's a good thing. In HTML whenever a charset is explicitly
                      given, that charset ought to be used.

                      > But, I tried it and the files become
                      > read only when opened that way.

                      I don't get it. I always use that command line option as my standard.
                      Setting it up involved editing the registry and many other places, a lot
                      of hassle, but it is what keeps me free from that forced conversion and
                      read-only nonsense.

                      Axel
                    • John Shotsky
                      Message 10 of 29 , Nov 16, 2014

                        Axel,

                        I don't WANT to open the file as utf8 in a plain text editor that has no Unicode in it. I don't WANT it to be html at all. I want the text file that happens to have html text in it to open at the code page I specify, not on the command line, but in a clip command. I can open a .txt file that has the exact same text in it as any code page I choose. But if the file type is .html, it will ONLY open based on that character set callout, regardless of WHAT I tell it to open as. That is just wrong. If I remove the charset callout, it does as I tell it to do. It should not be READING that text, it should be obeying the command that it is given when told to open the file.

                        It doesn't even make sense to be able to open a file as any given code page if it is going to read the text as it opens, and choose what code page to use automatically from that text.

                        From the NoteTab help:

                        ^!Open "FileName" [/C=CodePage] (added in v6.0)

                        Opens or selects the specified document "FileName", or opens multiple files (names separated by semicolon). Use fully qualified file names (with path name) to ensure the command finds the correct file. You can use wild cards with this command. The optional switch /R can be used to open the file(s) as Read-Only, and the /J= switch can be used to place the cursor at a specific line number when it is opened. If /J=-1, the cursor will be placed at the beginning of the last line. Use the /C= switch to define a Code Page value for the file(s) being opened. CodePage should be one of the numeric values listed here: http://www.notetab.com/redir/codepage

                        It doesn't say it will open the specified document UNLESS it has an html filetype AND has a charset callout in the text, in which case it will use that charset/codepage regardless of what you want to do. And it doesn't explain why, when it does that, it fails to properly highlight the html. It uses multiple colors in html tags, it highlights only parts of some tags, and it highlights text between tags.

                        That is also wrong, because it should first have opened at the code page I specified, and it should not blow the highlighting when opened, at all. As soon as I take that charset callout out of the text, it no longer matters what I tell it to open as, it just does what I tell it. And, the highlighting is correct.

                        This all started because I was trying to solve the problem of why the highlighting was erratic. I eventually discovered that it was opening as /65001 regardless of what I told it to do, and that led to the faulty highlighting. So, there are two bugs there: not following the command to open as instructed, and somehow blowing the highlighting when it opens as /65001.


                      • Marcelo Bastos
                        Message 11 of 29 , Nov 16, 2014
                          On 16/11/2014 10:59, 'John Shotsky' jshotsky@... [ntb-html] wrote:
                          > I don't WANT to open the file as utf8 in a plain text editor that has
                          > no Unicode in it. I don't WANT it to be html at all. I want the text
                          > file that happens to have html text in it to open at the code page I
                          > specify, not on the command line, but in a clip command. I can open a
                          > .txt file that has the exact same text in it as any code page I
                          > choose. But if the file type is .html, it will ONLY open based on that
                          > character set callout, regardless of WHAT I tell it to open as. That
                          > is just wrong. If I remove the charset callout, it does as I tell it
                          > to do. It should not be READING that text, it should be obeying the
                          > command that it is given when told to open the file.

                          Odd, that's not my experience. I have pages I download from a few
                          websites that are declared (in the header) as UTF-8, but since they
                          don't have any "high" characters, NoteTab treats them as regular ASCII.
                          NoteTab behaves somewhat oddly (search & replace fields behave oddly,
                          some files become read-only) when it is working with a Unicode file,
                          and I don't notice that in most cases.
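As a rough illustration of the distinction drawn above, the following Python sketch checks whether a file contains any "high" bytes at all. A file declared as UTF-8 that contains only 7-bit ASCII bytes reads identically under UTF-8 and under code page 1252. The function name `has_high_bytes` is made up for this example, and nothing here is claimed to be how NoteTab itself decides.

```python
def has_high_bytes(data: bytes) -> bool:
    """Return True if any byte lies outside the 7-bit ASCII range."""
    return any(b > 0x7F for b in data)


if __name__ == "__main__":
    plain = b'<meta charset="utf-8"><p>plain text</p>'
    accented = "<p>caf\u00e9</p>".encode("utf-8")
    print(has_high_bytes(plain), has_high_bytes(accented))
```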

                          When there ARE "high" characters, I use a clip (95% based on work by
                          Axel Berger, by the way) to convert it to a non-Unicode form. The clip
                          first reloads the file as /C=65001, in order to be able to "see" the
                          UTF-8 sequences; then does some quite smart search-and-replace to turn
                          them into HTML numeric entities, and then reloads the file again as
                          /C=1200 at the end.
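The conversion that clip performs can be sketched outside NoteTab as well. The Python below is not the clip itself, only an assumed equivalent of its core step: decode the bytes as UTF-8, then replace every non-ASCII character with an HTML numeric character reference. The function name `to_numeric_entities` is hypothetical; `xmlcharrefreplace` is a standard Python codec error handler.

```python
def to_numeric_entities(raw: bytes) -> str:
    """Decode UTF-8 bytes and express all non-ASCII characters as &#NNN; references."""
    text = raw.decode("utf-8")
    # The xmlcharrefreplace handler turns each unencodable character
    # into a decimal numeric character reference.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_numeric_entities("caf\u00e9 \u20ac5".encode("utf-8")))
```

The result is pure ASCII, so it should look the same whichever single-byte code page the editor then uses to open it.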

                          That last step needs an explanation: I have noticed that NoteTab
                          remembers the format the file was explicitly loaded last time. That is,
                          if you load the file as /C=65001, it will keep doing so for that file on
                          subsequent reloads. It's NOT related to anything INSIDE the file,
                          because simply renaming the file (in Windows Explorer) will "reset" it
                          to loading normally. The "reload as /C=1200" at the end is just to tell
                          NoteTab to load the file normally again.

                          Perhaps your problem is related to this behavior I noticed?


                          --
                          MCBastos

                          This message has been protected with the 2ROT13 algorithm. Unauthorized use will be prosecuted under the DMCA.
                          -=-=-
                          ... Sent from my Total Lack of Social Skills.
                          * Added by TagZilla 0.7a1 running on Seamonkey *
                          Get it at http://xsidebar.mozdev.org/modifiedmailnews.html#tagzilla
                        • John Shotsky
                          Message 12 of 29 , Nov 16, 2014
                            I have written code to convert Unicode characters to the ANSI range as well, and delete anything that is Unicode but not usable in
                            ANSI (such as the Asian character sets). My needs are different, because I need to convert characters that don't exist in ANSI to
                            ANSI sequences, such as the vulgar fraction ⅓. So, it detects that character and replaces it with the three-ANSI-character sequence
                            1/3. There are many others, such as some foreign letters that don't exist in ANSI, so I convert them to the nearest ANSI equivalent.
                            I get recipes from all over the world, and in Unicode, so in order to be functional in NoteTab, they have to become ANSI. For
                            example, I get some Vietnamese recipes that have phở, which is converted to pho.
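A minimal Python sketch of this kind of conversion follows. It is an illustration under stated assumptions, not John's clip code: "ANSI" is taken to mean Windows code page 1252, the names `fits_ansi` and `to_ansi` are invented for the example, and the fallback relies on Unicode compatibility decomposition rather than a hand-built table.

```python
import unicodedata


def fits_ansi(ch: str) -> bool:
    """True if the character can be encoded in code page 1252 (assumed to be 'ANSI')."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


def to_ansi(text: str) -> str:
    """Map text into the cp1252 range, approximating or dropping what does not fit."""
    out = []
    for ch in text:
        if fits_ansi(ch):
            out.append(ch)  # already representable, keep as-is
            continue
        # Decompose, e.g. the one-third fraction into 1, FRACTION SLASH, 3,
        # or an accented letter into base letter plus combining marks.
        for part in unicodedata.normalize("NFKD", ch):
            if unicodedata.combining(part):
                continue  # drop accents and other combining marks
            if part == "\u2044":  # FRACTION SLASH becomes a plain slash
                part = "/"
            if fits_ansi(part):
                out.append(part)
            # anything still unrepresentable (e.g. CJK) is deleted
    return "".join(out)


if __name__ == "__main__":
    print(to_ansi("\u2153 cup of ph\u1edf"))
```

Characters that already exist in code page 1252, such as é, pass through unchanged; only characters outside that range are approximated or removed.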

                            That said, the problem I have experienced is different. Remember, I said that the same TEXT saved to two different file names that
                            have never been opened in NoteTab before open differently. The text version opens as I direct it to open, but the one with the
                            charset callout opens as 65001 regardless. Unless, of course, I use NoteTab to strip that one line out before saving it to those two
                            file names, and then they BOTH open as instructed.
                            I do understand what you mean by a file having been opened before. There is (supposed to be) a byte order mark (BOM) at the beginning
                            of a Unicode file that identifies the file as Unicode.
                            So, even plain text files can be identified as Unicode by a code:
                            http://msdn.microsoft.com/en-us/library/windows/desktop/dd374101(v=vs.85).aspx
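To make the byte order mark idea concrete, here is a small Python sketch that inspects the first bytes of a file. The function name `sniff_bom` and the returned labels are assumptions for illustration; the BOM constants come from Python's standard `codecs` module. It says nothing about whether NoteTab itself looks at the BOM.

```python
import codecs

# UTF-32 must be checked before UTF-16, because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading byte order mark, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    print(sniff_bom(codecs.BOM_UTF8 + b"hello"))
    print(sniff_bom(b"no mark here"))
```

Many UTF-8 files are written without a BOM, so a `None` result does not prove the file is not Unicode.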

                            Regards,
                            John
                            RecipeTools Web Site: http://recipetools.gotdns.com/recipetools/
                            RecipeTools Yahoo Group: http://groups.yahoo.com/group/RecipeTools/
                            John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/
                            John's Mags Google Group: https://groups.google.com/forum/?hl=en#!forum/johnsmags
                            Subscribe to John's Mags: http://johnsmags.gotdns.com/johnsmags/subscribe.html



                            ------------------------------------

                            Fookes Software: http://www.fookes.com/
                            NoteTab website: http://www.notetab.com/
                            NoteTab Discussion Lists: http://www.notetab.com/groups.php
                          • Axel Berger
                            Message 13 of 29 , Nov 16, 2014
                              "'John Shotsky' jshotsky@... [ntb-html]" wrote:
                              > not on the command line, but in a clip command.

                              You don't need to. The command line parameter has to be given when
                              starting up NT. I have said and keep saying it's a hassle; I want a
                              permanent setup option in the main settings dialog.

                              > I want the text file that happens to have html text in it
                              > to open at the code page I specify

                              I haven't tried it, but the View->Options->File Filters setting might help
                              here. The thing is, NT either treats a file as text with no highlighting
                              and no other special features, or it treats it as HTML with all the
                              functions on. That's how I want it. And let's be quite clear here: if the
                              charset declaration does not match the charset actually used, then that file
                              is seriously broken. I don't want NT to stop doing the right thing on all the
                              correct files I edit, just so it won't do the wrong thing on one or two
                              broken files you may encounter once in a while.
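Whether a given file is "broken" in this sense can be checked roughly with a short script. The Python sketch below reads the declared charset from the start of the file and tests whether the bytes actually decode under it. The names `declared_charset` and `matches_declaration` and the 4096-byte search window are assumptions made for this example; the check only catches byte sequences that are invalid in the declared charset, so a file can pass and still be mislabelled.

```python
import re

# Hypothetical pattern covering <meta charset=...> and the http-equiv form.
CHARSET_RE = re.compile(
    rb'<meta\b[^>]*\bcharset\s*=\s*["\']?([\w.:-]+)',
    re.IGNORECASE,
)


def declared_charset(data: bytes):
    """Return the charset named in a meta declaration near the top of the file, or None."""
    match = CHARSET_RE.search(data[:4096])
    return match.group(1).decode("ascii").lower() if match else None


def matches_declaration(data: bytes) -> bool:
    """True if the file has no declaration, or its bytes decode under the declared charset."""
    name = declared_charset(data)
    if name is None:
        return True
    try:
        data.decode(name)
    except (UnicodeDecodeError, LookupError):
        # Invalid bytes for that charset, or a charset name Python does not know.
        return False
    return True


if __name__ == "__main__":
    page = '<meta charset="utf-8"><p>caf\u00e9</p>'
    print(matches_declaration(page.encode("utf-8")))   # bytes agree with the declaration
    print(matches_declaration(page.encode("cp1252")))  # declared UTF-8, saved as 1252
```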

                              And it's easy to get around: <Ctrl><N> for new file with your default
                              extension, .tex in my case but you can make it .txt, and <Ctrl><Shift><O>
                              for load block (might not be the default, not sure what I altered over the
                              years). There you are, a text file with no HTML functionality.

                              > That is just wrong.

                              No, it's a feature and a good one. You don't know the code page before
                              reading the file. In 99 % of cases the code page you specify will be a
                              guess, and in 99.9 % of cases the page the author specified will be the
                              correct one.

                              Axel
                            • Axel Berger
                              Message 14 of 29 , Nov 16, 2014
                                "'John Shotsky' jshotsky@... [ntb-html]" wrote:
                                > I do understand what you mean by having been opened before - there
                                > is (supposed to be) a byte order code character at the beginning
                                > of a Unicode file that identifies the file as Unicode.

                                True, but to the best of my knowledge NT doesn't use it. What it does use
                                is the file NOTEPRO.FPR, which is a long list of all recently edited files
                                and keeps information on things such as line length for word wrap. The
                                charset used seems to be in there too.

                                Axel
                              • John Shotsky
                                Message 15 of 29 , Nov 16, 2014

                                  There is no user interaction in my clips. There is no manual file opening/closing, etc. It is all under script control. That said, I don't care HOW it is encoded - I am going to change it anyway. When I'm done, it will be all ANSI, and it will be html5. And, I will have a charset callout that is correct. My files come from all over the world, and many of them have atrocious html. Some are missing tags that are mandatory, others use tags incorrectly and some use no classes to identify different things.

                                  My goal is to end up with an html5 document that is consistent regardless of the quality or type of the input. And, I will say again, when I tell NoteTab to open a file, under clip control, using the code page I specify according to NoteTab's own help file, I expect it to do what it says it will do. That it behaves differently is a bug. And the erratic highlighting is another bug.

                                  From: ntb-html@yahoogroups.com [mailto:ntb-html@yahoogroups.com]
                                  Sent: Sunday, November 16, 2014 08:13
                                  To: ntb-html@yahoogroups.com
                                  Subject: Re: [NH] HTML syntax highlighting not working correct

                                   

                                   

                                  "'John Shotsky' jshotsky@... [ntb-html]" wrote:

                                  > not on the command line, but in a clip command.

                                  You don't need to. Ther command parameter has to be given on starting up
                                  NT. I said and keep saying it's a hassle, I want a permanent setup option
                                  in the main settings dialog.

                                  > I want the text file that happens to have html text in it
                                  > to open at the code page I specify

                                  I haven't tried it, but the View->Options->File Filters setting might help
                                  here. The thing is, NT either treats a file as text with no highlighting
                                  and no other special features or it treats it as HTML with all the
                                  functions on. That's how I want it. And let's be quite clear here, if the
                                  charset declaration does not meet the charset actually used, then that file
                                  is seriously broken. I don't want NT not to do the right thing on all the
                                  correct files I edit, just so it won't do the wrong thing on one or two
                                  broken files you may encounter once in a while.

                                  And it's easy to get around: <Ctrl><N> for new file with your default
                                  extension, .tex in my case but you can make it .txt, and <Ctrl><Shift><O>
                                  for load block (might not be the default, not sure what I altered over the
                                  years). There you are, a text file with no HTML functionality.

                                  > That is just wrong.

                                  No, it's a feature and a good one. You don't know the code page before
                                  reading the file. In 99 % of cases the code page you specify will be a
                                  guess, and in 99.9 % of cases the page the author specified will be the
                                  correct one.

                                  Axel

                                • Axel Berger
                                  Message 16 of 29 , Nov 16, 2014
                                    "'John Shotsky' jshotsky@... [ntb-html]" wrote:
                                    > That it does differently is a bug.

                                    No it isn't, _especially_ not in your case. As you say

                                    > My files come from all over the
                                    > world, and many of them have atrocious html.

                                    So you don't know what's coming in and your guess of codepage is bound to
                                    be wrong about half the time. So when the file explicitly tells you what it
                                    is, that has to be better than your first guess.
                                    But never mind, you can have what you want, just make a new file and load
                                    block and all your worries are gone. If it's a clip anyway, the cost of the
                                    extra step is nil.

                                    > And the erratic highlighting is another bug.

                                    True, and I tried to help find a way around it.

                                    Axel
                                  • John Shotsky
                                    Message 17 of 29 , Nov 16, 2014

                                      Not true.

                                      I load all files as /65001 to start. I find/fix all the high order characters based on the entire UNICODE range. When that is done what remains is ALWAYS ANSI, nothing higher. Then the file is saved, closed, and the rest of the processing occurs. After stripping it, and saving as ANSI, it should BE ANSI. But it isn't, if it has that charset declaration in the file. And that is my complaint. It is a file with only 8-bit characters that is NOT Unicode, but it is opened as Unicode by NoteTab as long as that charset declaration is in there. So, yes, it IS a bug. It is explicitly SAVED BY NOTETAB as ANSI, but can't be opened as ANSI.
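The mismatch described here, bytes saved as ANSI while the text still declares UTF-8, can be checked outside NoteTab. What follows is a minimal Python sketch, not a description of how NoteTab decides. It assumes the declaration uses one of the two usual `<meta>` forms, and the names `declared_charset` and `decodes_as` are made up for illustration.

```python
import re

# Finds charset=... inside a <meta> tag, in either the HTML5 form
# (<meta charset="utf-8">) or the older http-equiv form.
_META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def declared_charset(data: bytes):
    """Return the charset named in the first meta declaration, or None."""
    match = _META_CHARSET.search(data)
    return match.group(1).decode("ascii").lower() if match else None


def decodes_as(data: bytes, encoding: str) -> bool:
    """True if the raw bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except (UnicodeDecodeError, LookupError):
        return False


# A file saved as ANSI (Windows-1252) that still carries a UTF-8 declaration.
sample = '<meta charset="utf-8"><p>caf\u00e9</p>'.encode("cp1252")

print(declared_charset(sample))       # what the file claims
print(decodes_as(sample, "utf-8"))    # whether the claim holds for these bytes
print(decodes_as(sample, "cp1252"))   # whether the bytes read cleanly as ANSI
```

A file that contains only 7-bit characters decodes cleanly both ways, so the conflict only shows up once a character above 127 survives the conversion.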

                                       

                                       


                                    • Axel Berger
                                      Message 18 of 29 , Nov 16, 2014
                                        "'John Shotsky' jshotsky@... [ntb-html]" wrote:
                                        > After stripping it, and saving as
                                        > ANSI, it should BE ANSI. But it isn't, if it has
                                        > that charset declaration in the file.

                                        Well John, in all honesty, if you go and change the charset and save again
                                        WITHOUT adapting and correcting the charset declaration, then the mistake
                                        and responsibility is all yours.
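Adapting the declaration when the encoding changes can be a one-line substitution. The following is a rough Python sketch under the assumption that the file has a single `<meta>` charset declaration; `set_declared_charset` is a hypothetical helper name, not a NoteTab function.

```python
import re

# Group 1 is everything up to the charset value, group 2 is the value itself,
# so only the value gets replaced.
_CHARSET_VALUE = re.compile(
    r'(<meta[^>]*charset\s*=\s*["\']?\s*)([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def set_declared_charset(html: str, new_charset: str) -> str:
    """Rewrite the charset named in the first meta declaration only."""
    return _CHARSET_VALUE.sub(lambda m: m.group(1) + new_charset, html, count=1)


before = '<head><meta charset="utf-8"><title>Soup</title></head>'
after = set_declared_charset(before, "windows-1252")
print(after)
```

Run right after the conversion step, this keeps the declaration and the saved bytes in agreement, so an editor or browser that trusts the declaration is no longer misled.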

                                        Axel
                                      • John Shotsky
                                        Message 19 of 29 , Nov 16, 2014

                                          I don't understand that statement. I open a document as Unicode, convert it to ANSI, save it as ANSI. That is, I change CHARACTERS from multibyte characters to single byte characters. That is all it does. There is no text processing at all, it is simply character conversion. It IS ANSI at that point. It is a file full of only one byte characters, saved in an ANSI format BY NOTETAB. NoteTab should not be reading the text and choosing to open it as something other than ANSI if it is told to open an ANSI file as ANSI. NoteTab is not a browser, and that charset declaration is for browsers, not text editors.

                                          It is not an error on my part to NOT change the charset declaration, I am using a text editor to edit text, and the program should not be reading the text and doing something different than it was told to do. It should be able to open any file as any code page it is told to open at, even if that means something doesn't display correctly. THAT would be user error. It is a program error to not open as told to do.  Note that it DOES open a .txt file as wanted, even with the charset declaration in the text. It should do that for ANY filetype.

                                           

                                           


                                        • Axel Berger
                                          Message 20 of 29 , Nov 16, 2014
                                            "'John Shotsky' jshotsky@... [ntb-html]" wrote:
                                            > I am using a text editor to edit text,

                                            No you don't. If you do and save as text, using the appropriate extension
                                            or telling NT to treat .htm as such, then all will be as you want. What you
                                            are really using is a powerful HTML editor with loads of specialized HTML
                                            functions that are triggered when you tell NT your file is HTML. Most of us
                                            here, presumably all who read the ntb-html list, like, want and enjoy that.
                                            NT is not just a HTML editor, when you tell it not to treat your file as
                                            html it won't, but it is one too and that is, to most of us here, one of
                                            its main benefits.

                                            Why on earth don't you just use a .txt extension for what has to be only a
                                            temporary file? You said yourself the final version will have its headers
                                            cleaned. If your temporary intermediate file is text, not HTML, then call
                                            it text and not .htm.

                                            Axel
                                          • johnta1
                                            Message 21 of 29 , Nov 16, 2014
                                              "Note that it DOES open a .txt file as wanted, even with the charset
                                              declaration in the text. It should do that for ANY filetype."

                                              So basically it works correctly except when the file is saved as a
                                              .htm/.html file?

                                              What if it is saved as .php? (what I would be interested in)

                                              Why not keep it as a text file (.txt) until all the editing is done and
                                              then change it to .html when uploaded to web?


                                              John Wallace
                                              Pontiac Power RULES!!!
                                              http://www.wallaceracing.com



                                              > I don't understand that statement. I open a document as Unicode, convert
                                              > it to ANSI, save it as ANSI. That is, I change CHARACTERS
                                              > from multibyte characters to single byte characters. That is all it does.
                                              > There is no text processing at all, it is simply character
                                              > conversion. It IS ANSI at that point. It is a file full of only one byte
                                              > characters, saved in an ANSI format BY NOTETAB. NoteTab
                                              > should not be reading the text and choosing to open it as something other
                                              > than ANSI if it is told to open an ANSI file as ANSI.
                                              > NoteTab is not a browser, and that charset declaration is for browsers,
                                              > not text editors.
                                              > It is not an error on my part to NOT change the charset declaration, I am
                                              > using a text editor to edit text, and the program should
                                              > not be reading the text and doing something different than it was told to
                                              > do. It should be able to open any file as any code page it
                                              > is told to open at, even if that means something doesn't display
                                              > correctly. THAT would be user error. It is a program error to not
                                              > open as told to do. Note that it DOES open a .txt file as wanted, even
                                              > with the charset declaration in the text. It should do that
                                              > for ANY filetype.
                                              >
                                              > Regards,
                                              > John
                                            • John Shotsky
                                              Message 22 of 29 , Nov 16, 2014

                                                [Why on earth don't you just use a .txt extension for what has to be only a temporary file? ]

                                                I do that, and it IS a temporary file. But I save the txt file to an html file later so it is easier to see the html tags, which is what this whole effort is about.

                                                The sequence of events is this:

                                                1. Open 'unknown' file as /65001 - Unicode.
                                                2. Convert all multi-byte characters to single byte characters, OR delete them.
                                                3. Save as a text file, and close it.
                                                4. Open the text file as ANSI and perform certain processing that works better as a text file than as an html file. These are actually more character conversions but require regex to work correctly, which it will not do in a Unicode file.
                                                5. Save as html file, and delete the .txt file.
                                                6. Perform all required html processing.
                                                7. Save the html file for user to pass to a wysiwyg html editor.

                                                At this point, all html is html5, there are no strange characters, and all anyone cares about is what things are, as opposed to how they are coded. They are all coded exactly the same, regardless of how they started out. There are billions of combinations of character sets and html coding practices, and I want just ONE standardized output. I do get what I want.

                                                8. The user copies out of the wysiwyg editor and pastes into NoteTab - pure text, no html.
                                                9. Then the rest of my clip library goes to work and does what is necessary to get the file properly marked up to feed into a recipe management program. It also creates 'standard' html cookbooks, which you can see here: http://johnsmags.gotdns.com/johnsmags/, and it makes ebooks of the same content.
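The first three steps of that sequence (open as UTF-8, convert or delete multi-byte characters, save as ANSI) can be sketched in Python for anyone who wants to reproduce the conversion outside NoteTab. This is an illustration only: the `REPLACEMENTS` table and the function names are assumptions, the real clips use their own character mappings, and "ANSI" is taken here to mean Windows-1252.

```python
# Hypothetical replacement table: characters Windows-1252 cannot hold,
# mapped to close single-byte equivalents.
REPLACEMENTS = {
    "\u2212": "-",   # minus sign -> hyphen-minus
    "\u2044": "/",   # fraction slash -> solidus
}


def to_ansi(text: str, table: dict = REPLACEMENTS) -> bytes:
    """Replace known characters, then drop whatever Windows-1252 cannot represent."""
    for source, target in table.items():
        text = text.replace(source, target)
    return text.encode("cp1252", errors="ignore")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a file as UTF-8 (code page 65001) and save it as ANSI."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding="utf-8", errors="replace", newline="") as src:
        text = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_ansi(text))


print(to_ansi("1\u20442 cup caf\u00e9"))
```

Note that this sketch leaves any charset declaration in the text as it was, which is exactly the situation discussed earlier in the thread.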

                                                 

                                                The user starts with a blank document with only a code to say what he/she wants to happen. In this instance, it is grabbing the content of an ebook cookbook. RecipeClips asks which ebook, and that is the only user interaction. Any format of ebook cookbook gets converted by Calibre into a standard format called 'htmlz'. That is basically a zip file with all the photos in a folder and the text in a separate file. RecipeClips unzips it, extracts the files, converts the index.html file from Unicode to ANSI, and proceeds through the process outlined above.
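Since an htmlz file is a zip archive, the unpacking step can be reproduced with any zip library. Here is a small Python sketch using the standard `zipfile` module; the function name `extract_htmlz` is hypothetical, and the member name `index.html` is taken from the description above rather than from the Calibre documentation.

```python
import zipfile


def extract_htmlz(htmlz_path: str, dest_dir: str) -> str:
    """Unpack an .htmlz archive and return the decoded text of its index.html."""
    with zipfile.ZipFile(htmlz_path) as archive:
        archive.extractall(dest_dir)        # photos and text land under dest_dir
        raw = archive.read("index.html")    # raises KeyError if the member is missing
    # Decode as UTF-8; invalid bytes become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")
```

The returned string is what would then go through the Unicode-to-ANSI conversion.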

                                                 

                                                But rather than argue, I simply explained what caused my problem with highlighting in case anyone else has that problem. If one didn't know that the declaration was causing the file to open other than as wanted, they might never find the cause. As it is, it took me hours to finally find the cause, which was simply that charset declaration existing in the text file.

                                                Here is a little clip I use on any open file to learn what codepage the file is opened as:

                                                Clip is called 'CodePage'.

                                                ^!Jump 1

                                                ^$GetDocCodePage$

                                                Pretty handy to verify things are as you think they are.

                                                 

                                                 


                                              • John Shotsky
                                                Message 23 of 29 , Nov 16, 2014

                                                  I could leave it as text for users, but as a developer, I want/need to see the highlighted html tags. As I mentioned, some of the html formatting is atrocious, and not even Tidy can fix it. An experienced html coder would never believe some of the things people do, such as creating breaks with classes, and self-closing them. Not to mention that html files have a LOT of content that is not WANTED when the goal is to access only part of the body content.

                                                  So, my code accepts html in any form, old or new, properly written or not, and converts it to html5, with all extraneous content (styling, for instance) removed, in a form in which ONLY the text content, and NONE of the html remains, but that text content is 'marked up' as to what it is. My work is all around recipes, so regardless of how a recipe is written, or how it is coded in html, RecipeClips must understand what it is, and how to handle it for a consistent end result.

                                                  This also works for web pages that are downloaded using, for instance, HTTrack, which again have ATROCIOUS html coding in many cases. Blogs are the worst.

                                                  RecipeClips is like a black box, into which you can pour recipes in many different export formats, html formats, ebooks or whatever, and get a single properly formatted result that is consistent. It is truly garbage in, good stuff out, because RecipeClips finds the garbage and gets rid of it.

                                                   

                                                  I could write a test for php, but I think anyone could test it by simply saving any html file as ANSI and a php filetype with that charset declaration present. Open it as wanted, and check what code page NoteTab thinks it is.
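For anyone who wants to run that test, the files can be generated rather than saved by hand. This is a Python sketch that writes byte-identical ANSI files differing only in extension; the file names and the helper `write_fixtures` are made up for illustration.

```python
from pathlib import Path

# Valid Windows-1252 bytes (the e-acute is the single byte 0xE9) together
# with a declaration that claims UTF-8.
CONTENT = (
    "<!DOCTYPE html>\n"
    '<html><head><meta charset="utf-8"></head>\n'
    "<body><p>caf\u00e9</p></body></html>\n"
).encode("cp1252")


def write_fixtures(folder, stem="charset-test", extensions=(".txt", ".htm", ".php")):
    """Write the same bytes under several extensions and return the paths."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = [folder / f"{stem}{ext}" for ext in extensions]
    for path in paths:
        path.write_bytes(CONTENT)
    return paths
```

Opening each file in NoteTab with the ANSI code page requested and then checking the reported code page would show which extensions trigger the declaration-based override.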

                                                   

                                                   


                                                • bruce.somers
                                                  Message 24 of 29 , Nov 16, 2014
                                                    > It is not an error on my part to NOT change the charset declaration, I am using a text editor to edit text, and the program should not be reading the text and doing something different than it was told to do.
                                                     
                                                    I'm with John. Unless NoteTab is explicitly instructed to treat the file as HTML (in this case) it certainly should not base any editing actions on the presence of an HTML charset declaration.
                                                     
                                                    Bruce
                                                     
                                                     
                                                  • John Shotsky
                                                    Message 25 of 29 , Dec 20, 2014

                                                      As a followup to the solution of this problem, I determined that if you open a file using the codepage command, it may ignore it. I had another error which clouded the actual problem and solution, though - I was removing the html header, and trying to just use the codepage open to force it to the proper code page. Without a proper html header, the highlighting was not working. No matter that it didn't use the codepage I specified, it did not display the html highlighting because it requires a proper html header (with utf-8 declaration) before the highlighting will work. I don't know why it cares, highlighting of html should be independent of code page present.

                                                      My code strips headers away from any html, then converts what remains to html5, and adds back an html5 header. I was looking at the text before the header was added back. Once I added the header back right after stripping 'whatever it was before', the highlighting worked. So - two different problems.
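The strip-then-restore step can be done in a single pass so the text is never left without a header. A rough Python sketch follows, assuming the wanted content sits between `<body>` tags. The template and the name `rewrap_as_html5` are illustrative and not the actual RecipeClips code.

```python
import re

_BODY = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)

HTML5_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="{charset}">
<title>{title}</title>
</head>
<body>
{body}
</body>
</html>
"""


def rewrap_as_html5(html: str, title: str = "", charset: str = "utf-8") -> str:
    """Discard the original header and wrap the body content in a fresh HTML5 one."""
    match = _BODY.search(html)
    body = match.group(1) if match else html   # no <body> tags: treat it all as body
    return HTML5_TEMPLATE.format(charset=charset, title=title, body=body.strip())
```

The `charset` argument should name the encoding the file is actually saved in, so that the declaration and the bytes do not disagree again.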

                                                      Just in case this helps someone else who might be trying to use NTP to convert webpages to html5.

                                                       

                                                       

                                                      From: ntb-html@yahoogroups.com [mailto:ntb-html@yahoogroups.com]
                                                      Sent: Sunday, November 16, 2014 11:41
                                                      To: ntb-html@yahoogroups.com
                                                      Subject: RE: [NH] HTML syntax highlighting not working correct

                                                       

                                                       

                                                      I could leave it as text for users, but as a developer, I want/need to see the highlighted html tags. As I mentioned, some of the html formatting is atrocious, and not even Tidy can fix it. An experienced html coder would never believe some of the things people do, such as creating breaks with classes, and self-closing them. Not to mention that html files have a LOT of content that is not WANTED when the goal is to access only part of the body content.

                                                      So, my code accepts html in any form, old or new, properly written or not, and converts it to html5, with all extraneous content (styling, for instance) removed, in a form in which ONLY the text content, and NONE of the html remains, but that text content is 'marked up' as to what it is. My work is all around recipes, so regardless of how a recipe is written, or how it is coded in html, RecipeClips must understand what it is, and how to handle it for a consistent end result.

                                                      This also works for web pages that are downloaded using, for instance, HTTrack, which again have ATROCIOUS html coding in many cases. Blogs are the worst.

                                                      RecipeClips is like a black box, into which you can pour recipes in many different export formats, html formats, ebooks or whatever, and get a single properly formatted result that is consistent. It is truly garbage in, good stuff out, because RecipeClips finds the garbage and gets rid of it.

                                                       

                                                      I could write a test for php, but I think anyone could test it by simply saving any html file as ANSI and a php filetype with that charset declaration present. Open it as wanted, and check what code page NoteTab thinks it is.
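
For anyone who wants to check such a test file outside the editor, here is a small Python sketch. It only compares what the file declares with what its bytes actually are; it says nothing about how NoteTab itself decides. The function names are invented for the example, and "ANSI" is assumed to mean a single-byte Windows code page such as Windows-1252.

```python
import codecs
import re


def declared_charset(data: bytes):
    """Return the charset named in a meta declaration near the top, or None."""
    m = re.search(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
                  data[:4096], re.IGNORECASE)
    return m.group(1).decode("ascii").lower() if m else None


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def report(data: bytes) -> dict:
    """Summarize the declared encoding versus what the bytes look like."""
    return {
        "declared": declared_charset(data),
        "utf8_bom": data.startswith(codecs.BOM_UTF8),
        "valid_utf8": is_valid_utf8(data),
        "pure_ascii": all(b < 0x80 for b in data),
    }
```

Usage would be something like `report(open("test.php", "rb").read())`, where the file name is just a placeholder. A file saved as ANSI that still carries a utf-8 declaration and contains accented characters should come back with `declared` set to `utf-8` and `valid_utf8` set to `False`.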

                                                       

                                                       

                                                      From: ntb-html@yahoogroups.com [mailto:ntb-html@yahoogroups.com]
                                                      Sent: Sunday, November 16, 2014 11:06
                                                      To: ntb-html@yahoogroups.com
                                                      Subject: RE: [NH] HTML syntax highlighting not working correct

                                                       

                                                       

                                                      "Note that it DOES open a .txt file as wanted, even with the charset
                                                      declaration in the text. It should do that for ANY filetype."

                                                      So basically it works correctly except when the file is saved as a
                                                      .htm/.html file?

                                                      What if it is saved as .php? (what I would be interested in)

                                                      Why not keep it as a text file (.txt) until all the editing is done and
                                                      then change it to .html when uploaded to web?

                                                      John Wallace
                                                      Pontiac Power RULES!!!
                                                      http://www.wallaceracing.com

                                                      > I don't understand that statement. I open a document as Unicode, convert
                                                      > it to ANSI, save it as ANSI. That is, I change CHARACTERS
                                                      > from multibyte characters to single byte characters. That is all it does.
                                                      > There is no text processing at all, it is simply character
                                                      > conversion. It IS ANSI at that point. It is a file full of only one byte
                                                      > characters, saved in an ANSI format BY NOTETAB. NoteTab
                                                      > should not be reading the text and choosing to open it as something other
                                                      > than ANSI if it is told to open an ANSI file as ANSI.
                                                      > NoteTab is not a browser, and that charset declaration is for browsers,
                                                      > not text editors.
                                                      > It is not an error on my part to NOT change the charset declaration, I am
                                                      > using a text editor to edit text, and the program should
                                                      > not be reading the text and doing something different than it was told to
                                                      > do. It should be able to open any file as any code page it
                                                      > is told to open at, even if that means something doesn't display
                                                      > correctly. THAT would be user error. It is a program error to not
                                                      > open as told to do. Note that it DOES open a .txt file as wanted, even
                                                      > with the charset declaration in the text. It should do that
                                                      > for ANY filetype.
                                                      >
                                                      > Regards,
                                                      > John

                                                    • Axel Berger
                                                      Message 26 of 29 , Dec 21, 2014
                                                        "'John Shotsky' jshotsky@... [ntb-html]" wrote:
                                                        > Without a proper html header, the highlighting was not working.

                                                        Not so, or at least not generally. Does your file have a .htm extension? I
                                                        frequently export HTML code from my database without any headers and
                                                        without the <HTML></HTML> and <BODY></BODY> wraps. When I open those files
                                                        in NT to add the missing parts, the complete highlighting is there.

                                                        Axel
                                                      • loro
                                                        Message 27 of 29 , Dec 26, 2014
                                                          Axel wrote:
                                                          >"'John Shotsky' jshotsky@... [ntb-html]" wrote:
                                                          > > Without a proper html header, the highlighting was not working.
                                                          >
                                                          >Not so, or at least not generally. Does your file have a .htm extension? I
                                                          >frequently export HTML code from my database without any headers and
                                                          >without the <HTML></HTML> and <BODY></BODY> wraps. When I open those files
                                                          >in NT to add the missing parts, the complete highlighting is there.

                                                          Yar, you can make HTML highlighting show up in any type of document
                                                          by just adding the extension to the list in Settings. No doctypes or
                                                          anything else needed, not even proper HTML. <whateveryouplease> will
                                                          be highlighted. So this sounds a little weird.


                                                          Lotta
                                                        • John Shotsky
                                                          Message 28 of 29 , Dec 26, 2014

                                                            If I could send screenshots, I could show you text between tags that is highlighted, as well as tags that are not highlighted. In all cases, the html has no characters above 255. (ANSI).
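
The "no characters above 255" claim is easy to verify with a few lines of Python. This is a sketch with an invented helper name, not part of any NoteTab clip. One caveat worth keeping in mind: if "ANSI" means Windows-1252, a few characters such as the euro sign are stored as a single byte even though their Unicode code point is above 255, so the check below reports them.

```python
def chars_above_255(text: str):
    """Return (position, character) pairs for every code point above 255."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 255]


def fits_single_byte(text: str, codepage: str = "cp1252") -> bool:
    """True if the whole text can be stored in the given single-byte code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False
```

An empty list from `chars_above_255` confirms the claim; a non-empty list combined with `fits_single_byte` returning `True` means the text is still storable as ANSI under the Windows-1252 assumption.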

                                                            Saving the doc, closing it and reopening it seems to help get the highlighting correct again. This is simple html with only divs and headers, as well as html head/foot. At this point, I've just learned to live with it. It only happens when debugging, which is the only reason I see it. If I open a file, it always displays correctly.

                                                             

                                                             

                                                            From: ntb-html@yahoogroups.com [mailto:ntb-html@yahoogroups.com]
                                                            Sent: Friday, December 26, 2014 07:11
                                                            To: ntb-html@yahoogroups.com
                                                            Subject: Re: [NH] HTML syntax highlighting not working correct

                                                             

                                                             

                                                            Axel wrote:

                                                            >"'John Shotsky' jshotsky@... [ntb-html]" wrote:
                                                            > > Without a proper html header, the highlighting was not working.
                                                            >
                                                            >Not so, or at least not generally. Does your file have a .htm extension? I
                                                            >frequently export HTML code from my database without any headers and
                                                            >without the <HTML></HTML> and <BODY></BODY> wraps. When I open those files
                                                            >in NT to add the missing parts, the complete highlighting is there.

                                                            Yar, you can make HTML highlighting show up in any type of document
                                                            by just adding the extension to the list in Settings. No doctypes or
                                                            anything else needed, not even proper HTML. <whateveryouplease> will
                                                            be highlighted. So this sounds a little weird.

                                                            Lotta
