Re: [NH] Notetab refuses to perform edits on this .html file

  • Marcelo Bastos
    Message 1 of 9 , Oct 9, 2012
      Interviewed by CNN on 09/10/2012 21:51, stitch.happy told the world:
      > I'm just getting my feet wet with using NoteTab to edit .html docs.
      >
      > I made a local copy of this web page:
      >
      > http://twitter.github.com/bootstrap/scaffolding.html
      >
      I didn't check, but most times when I couldn't edit a file, it turned out
      to be a Unicode file. NoteTab has limited Unicode support.

      The thing to do is save the file as a copy, then edit the copy.

      The problem: if there were Unicode characters there, you lost them. You
      then have to figure out what they were and where they went originally.
      And then you have to find out the character entities for them and enter
      them manually.

      One way to do that, I found, is by using Microsoft Word. Open the
      original file in Word and save it as "Web page, filtered." Word is pretty
      useless as an HTML editor, but it does have good Unicode support, and it
      will usually convert Unicode to a Win-1252 file with all the
      1252-incompatible characters converted to HTML numeric entities. Then you
      open this file in Notepad, search for "&#", and there you have it, the
      mystery characters.
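For readers without Word, the same conversion can be sketched in a few lines of Python. This is only an illustration of the idea, not what Word does internally: the function name `to_cp1252_with_entities` is made up for this example, and it relies on Python's standard `xmlcharrefreplace` error handler to turn anything Windows-1252 cannot hold into a decimal numeric entity.

```python
def to_cp1252_with_entities(text: str) -> bytes:
    """Encode text as Windows-1252.

    Characters that the code page cannot represent are written as
    decimal numeric character references such as &#10003;.
    """
    return text.encode("cp1252", errors="xmlcharrefreplace")


# Sample text: e-acute and the euro sign exist in Windows-1252,
# the check mark (U+2713) does not.
sample = "caf\u00e9 \u2713 \u20ac"
converted = to_cp1252_with_entities(sample)
print(converted)
```

Searching the result for "&#" then shows exactly which characters did not fit, as described above.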

      And that is the second reason I still keep Word on my computer, since I
      hardly ever use it for writing nowadays. (The first reason is that the
      file-compare feature in Word is pretty kickass, and I have to compare
      files now and then.)

      --
      MCBastos

      This message has been protected with the 2ROT13 algorithm. Unauthorized use will be prosecuted under the DMCA.
      -=-=-
      ... Sent from my HAL 9000.
      * Added by TagZilla 0.7a1 running on Seamonkey 2.12.1 *
      Get it at http://xsidebar.mozdev.org/modifiedmailnews.html#tagzilla
    • Axel Berger
      Message 2 of 9 , Oct 10, 2012
        Marcelo Bastos wrote:
        > The problem: if there were Unicode characters there, you lost them.

        Which is why that's not the way to do it. I hope the following is correct
        (i.e. works first time); I really hate this "feature". You can
        a) Open the file as codepage "UTF-8 (no conversion)" and possibly also
        switch off document --> Read only.
        or
        b) Open an empty document, open your page in another editor, and copy and
        paste all of it over.

        To get rid of the UTF characters and convert them to HTML entities you
        can run this clip:

        ---------------------------------------------------------------
        :loop
        ^!Find "[\xC0-\xF7][\x80-\xBF]*" RS
        ^!IfError donelatin
        ^!IfMatch "[\xC2-\xC3][\x80-\xBF]" "^$GetSelection$" latin1
        ^!IfMatch "[\xC0-\xDF][\x80-\xBF]" "^$GetSelection$" zwei
        ^!IfMatch "[\xE0-\xEF][\x80-\xBF]{2}" "^$GetSelection$" drei
        ^!IfMatch "[\xF0-\xF7][\x80-\xBF]{3}" "^$GetSelection$" vier
        ^!Continue Illegal sequence, can't be converted.
        ^!Goto loop
        :zwei
        ^!Set %first%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";2)$)$ MOD 64)$
        ^!Set %second%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";1)$)$ MOD 32)$
        ^!Set %third%=0
        ^!Set %fourth%=0
        ^!Goto makeent
        :drei
        ^!Set %first%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";3)$)$ MOD 64)$
        ^!Set %second%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";2)$)$ MOD 64)$
        ^!Set %third%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";1)$)$ MOD 16)$
        ^!Set %fourth%=0
        ^!Goto makeent
        :vier
        ^!Set %first%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";4)$)$ MOD 64)$
        ^!Set %second%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";3)$)$ MOD 64)$
        ^!Set %third%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";2)$)$ MOD 64)$
        ^!Set %fourth%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";1)$)$ MOD 8)$
        :makeent
        ^!Set %first%=^$Calc(262144*^%fourth%+4096*^%third%+64*^%second%+^%first%;0)$
        ^!InsertText &#^%first%;
        ^!Goto loop
        :latin1
        ^!Set %first%=^$StrCopyRight("^$GetSelection$";1)$
        ^!Set %second%=^$StrCopyLeft("^$GetSelection$";1)$
        ^!Set %first%=^$Calc(^$CharToDec(^%first%)$ MOD 64)$
        ^!Set %second%=^$Calc(^$CharToDec(^%second%)$ MOD 4)$
        ^!InsertText ^$DecToChar(^$Calc(64*^%second%+^%first%)$)$
        ^!Goto loop
        :donelatin
        ^!Replace "&#8364;" >> "€" WASTI
        ^!Replace "&#352;" >> "Š" WASTI
        ^!Replace "&#353;" >> "š" WASTI
        ^!Replace "&#381;" >> "Ž" WASTI
        ^!Replace "&#382;" >> "ž" WASTI
        ^!Replace "&#338;" >> "Œ" WASTI
        ^!Replace "&#339;" >> "œ" WASTI
        ^!Replace "&#376;" >> "Ÿ" WASTI
        ---------------------------------------------------------------

        Beware of broken long lines. Each line begins with either "^" or ":".
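
The arithmetic in the :zwei, :drei, :vier and :makeent sections may be easier to follow in a general-purpose language. Below is a rough Python sketch of the same calculation, not a port of the clip: the function name `utf8_sequence_to_entity` is invented here, the input is assumed to be exactly one 2-, 3- or 4-byte UTF-8 sequence, and, like the clip, it does no validity checking. Unlike the clip, it does not give the Latin-1 range (lead bytes C2 and C3) special treatment; everything becomes an entity.

```python
def utf8_sequence_to_entity(seq: bytes) -> str:
    """Turn one multi-byte UTF-8 sequence into a decimal HTML entity.

    The lead byte keeps its low 5, 4 or 3 bits (MOD 32, 16 or 8 for
    2-, 3- or 4-byte sequences); every continuation byte contributes
    its low 6 bits (MOD 64). No validation is performed.
    """
    lead_modulus = {2: 32, 3: 16, 4: 8}[len(seq)]
    code_point = seq[0] % lead_modulus
    for byte in seq[1:]:
        # Shift what we have by 6 bits and append the next 6 payload bits.
        code_point = code_point * 64 + byte % 64
    return f"&#{code_point};"


# The check mark U+2713 is stored as the three bytes E2 9C 93.
print(utf8_sequence_to_entity(b"\xe2\x9c\x93"))
```

Multiplying by 64 once per continuation byte gives the same weights the clip spells out in :makeent (64, 4096, 262144).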







        --
        Dipl.-Ing. F. Axel Berger Tel: +49/ 2174/ 7439 07
        Johann-Häck-Str. 14 Fax: +49/ 2174/ 7439 68
        D-51519 Odenthal-Heide eMail: Axel-Berger@...
        Deutschland (Germany) http://berger-odenthal.de
      • stitch.happy
        Message 3 of 9 , Oct 10, 2012
          Thanks, John and Marcelo, for your suggestions. It turned out to be a Unicode character, a checkmark, that was the problem. I appreciate the help!

          I used Notepad to open the file, did a Save As, and selected ANSI format. I got a warning that said I was about to lose Unicode-formatted characters. I saved it as a new file and did a file compare (CompareIt!) between the two files.
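
A different way to find the offending characters, without saving a lossy copy and comparing, is to scan the text for anything Windows-1252 ("ANSI") cannot represent. The Python sketch below is only an illustration of that idea; the function name `find_non_cp1252` and the sample page are made up for this example.

```python
def find_non_cp1252(text: str):
    """List every character that Windows-1252 cannot represent.

    Returns tuples of (line, column, character, numeric entity),
    with 1-based line and column numbers.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            try:
                ch.encode("cp1252")
            except UnicodeEncodeError:
                hits.append((line_no, col_no, ch, f"&#{ord(ch)};"))
    return hits


# Hypothetical page fragment containing a check mark (U+2713).
page = "<li>Done \u2713</li>\n<p>caf\u00e9</p>"
for hit in find_non_cp1252(page):
    print(hit)
```

Each hit also carries the numeric entity that could be typed in place of the character.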

          Regards,
          Bev

          --- In ntb-html@yahoogroups.com, Marcelo Bastos <bytext@...> wrote:

          > I didn't check, but most time when I couldn't edit a file, it turned out
          > to be a Unicode file. Notetab has limited Unicode support.
        • stitch.happy
          Message 4 of 9 , Oct 10, 2012
            Thanks, Axel. I sent the prev reply before seeing this. This looks very handy.

            -Bev

          • stitch.happy
            Message 5 of 9 , Oct 10, 2012
              Sweet. Worked the first time. Now a part of my clip library of handy stuff, with attribution to Axel. I used method (a). Thanks, Axel! And thanks for the hint to unwrap broken long lines. Hints like that to the newbies keep the frustration down. Keep up the good work, folks!

              Bev

            • Marcelo Bastos
              Message 6 of 9 , Oct 10, 2012
                Interviewed by CNN on 10/10/2012 07:01, Axel Berger told the world:
                > Marcelo Bastos wrote:
                >> The problem: if there were Unicode characters there, you lost them.
                > Which is why that's not the way to do it. Hope the following is correct
                > (i.e. works first time), I really hate this "feature". You can
                > a) Open the file as codepage "UTF-8 (no conversion)" and possibly also
                > switch off document --> Read only.
                That's a very nice piece of clip programming, and yes, it DID work first
                time. (Well, after I fixed a couple statements that had been
                line-wrapped by the mail systems, that is.) Thank you, it will prove
                most useful in the coming weeks.
                I had a quick look at the logic, and it seems to be generic enough to
                tackle the entire Basic Multilingual Plane. Which is good, since I have
                to deal with a couple of text sources that just *love* to use obscure
                characters from languages you never heard about for aesthetic effect.

                I'm already thinking about four or five ways I can integrate it into my
                workflow. It will probably end up as the main subroutine of a larger
                clip. I'm thinking of starting with an auto-reload of the file as "UTF-8
                (no conversion)," then a preprocessing search-and-replace to get rid of
                the most common cases, like "smart quotes" (not strictly needed, but it
                should speed up the process quite a bit), and a post-processing
                "cleanup" phase using a couple clips I already have in hand.

                --

                MCBastos This message has been protected with the 2ROT13 algorithm.
                Unauthorized use will be prosecuted under the DMCA.

                -=-=-
                ... Sent from my Total Lack of Social Skills.
                * Added by TagZilla 0.7a1 running on Seamonkey 2.13 *
                Get it at http://xsidebar.mozdev.org/modifiedmailnews.html#tagzilla
              • Axel Berger
                Message 7 of 9 , Oct 10, 2012
                  Marcelo Bastos wrote:
                  > I had a quick look at the logic, and it seems to be generic enough to
                  > tackle the entire Basic Multilingual Plane.

                  Even more than that, it will also translate illegal UTF into equally
                  illegal entities. I have another clip that checks a document for legal
                  UTF and flags errors such as ANSI characters.

                  ---------------------------------------------------------------
                  :loop
                  ^!Find "([\x80-\xBF]|[\xC0-\xFF][\x80-\xBF]*)" RS
                  ^!IfError usasc
                  ^!IfMatch "[\xC2-\xDF][\x80-\xBF]" "^$GetSelection$" loop
                  ^!IfMatch "\xE0[\xA0-\xBF][\x80-\xBF]" "^$GetSelection$" loop
                  ^!IfMatch "[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}" "^$GetSelection$" loop
                  ^!IfMatch "\xED[\x80-\x9F][\x80-\xBF]" "^$GetSelection$" loop
                  ^!IfMatch "\xF0[\x90-\xBF][\x80-\xBF]{2}" "^$GetSelection$" loop
                  ^!IfMatch "[\xF1-\xF3][\x80-\xBF]{3}" "^$GetSelection$" loop
                  ^!IfMatch "\xF4[\x80-\x8F][\x80-\xBF]{2}" "^$GetSelection$" loop
                  ^!Continue Illegal sequence, no UTF-8
                  ^!Goto loop
                  :usasc
                  ^!Continue No errors found
                  ---------------------------------------------------------------
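
For comparison, here is a minimal Python sketch of the same kind of check. It is not a translation of the clip: it leans on Python's strict UTF-8 decoder, which rejects stray continuation bytes, overlong forms, surrogates and values above U+10FFFF, the same cases the byte ranges above exclude. The function name `first_utf8_error` is made up for this example.

```python
def first_utf8_error(data: bytes):
    """Return (offset, byte value) of the first invalid UTF-8 byte,
    or None when the whole input is well-formed UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# A lone 0xE9 is "e-acute" in ANSI / Latin-1, but not valid UTF-8.
print(first_utf8_error(b"caf\xe9 ok"))
print(first_utf8_error("caf\u00e9 \u2713".encode("utf-8")))
```

Unlike the clip, this stops at the first error rather than prompting at each one.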

                  Axel