Re: [NH] Notetab refuses to perform edits on this .html file

  • Axel Berger
    Message 1 of 9, Oct 10, 2012
      Marcelo Bastos wrote:
      > The problem: if there were Unicode characters there, you lost them.

      Which is why that's not the way to do it. I hope the following is correct
      (i.e. works the first time); I really hate this "feature". You can
      a) open the file with codepage "UTF-8 (no conversion)" and possibly also
      switch off document --> Read only.
      or
      b) Open an empty document and your page in another editor and copy and
      paste all of it over.

      To get rid of the UTF-8 characters and convert them to HTML entities, you
      can run this clip:

      ---------------------------------------------------------------
      :loop
      ^!Find "[\xC0-\xF7][\x80-\xBF]*" RS
      ^!IfError donelatin
      ^!IfMatch "[\xC2-\xC3][\x80-\xBF]" "^$GetSelection$" latin1
      ^!IfMatch "[\xC0-\xDF][\x80-\xBF]" "^$GetSelection$" zwei
      ^!IfMatch "[\xE0-\xEF][\x80-\xBF]{2}" "^$GetSelection$" drei
      ^!IfMatch "[\xF0-\xF7][\x80-\xBF]{3}" "^$GetSelection$" vier
      ^!Continue Illegal sequence, can't be converted.
      ^!Goto loop
      :zwei
      ^!Set %first%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";2)$)$ MOD 64)$
      ^!Set %second%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";1)$)$ MOD 32)$
      ^!Set %third%=0
      ^!Set %fourth%=0
      ^!Goto makeent
      :drei
      ^!Set %first%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";3)$)$ MOD 64)$
      ^!Set %second%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";2)$)$ MOD 64)$
      ^!Set %third%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";1)$)$ MOD 16)$
      ^!Set %fourth%=0
      ^!Goto makeent
      :vier
      ^!Set %first%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";4)$)$ MOD 64)$
      ^!Set %second%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";3)$)$ MOD 64)$
      ^!Set %third%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";2)$)$ MOD 64)$
      ^!Set %fourth%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";1)$)$ MOD 8)$
      :makeent
      ^!Set %first%=^$Calc(262144*^%fourth%+4096*^%third%+64*^%second%+^%first%;0)$
      ^!InsertText &#^%first%;
      ^!Goto loop
      :latin1
      ^!Set %first%=^$StrCopyRight("^$GetSelection$";1)$
      ^!Set %second%=^$StrCopyLeft("^$GetSelection$";1)$
      ^!Set %first%=^$Calc(^$CharToDec(^%first%)$ MOD 64)$
      ^!Set %second%=^$Calc(^$CharToDec(^%second%)$ MOD 4)$
      ^!InsertText ^$DecToChar(^$Calc(64*^%second%+^%first%)$)$
      ^!Goto loop
      :donelatin
      ^!Replace "&#8364;" >> "€" WASTI
      ^!Replace "&#352;" >> "Š" WASTI
      ^!Replace "&#353;" >> "š" WASTI
      ^!Replace "&#381;" >> "Ž" WASTI
      ^!Replace "&#382;" >> "ž" WASTI
      ^!Replace "&#338;" >> "Œ" WASTI
      ^!Replace "&#339;" >> "œ" WASTI
      ^!Replace "&#376;" >> "Ÿ" WASTI
      ---------------------------------------------------------------

      Beware of broken long lines. Each line begins with either "^" or ":".
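For anyone who wants to follow the arithmetic in :zwei, :drei, :vier, :makeent and :latin1, here is a minimal Python sketch that mirrors it on a raw byte string. It is not NoteTab clip code and not Axel's clip. It assumes the file's bytes are already in memory (as they are when NoteTab opens the file as "UTF-8 (no conversion)"), and it assumes the eight ^!Replace lines under :donelatin turn the numeric entities for € Š š Ž ž Œ œ Ÿ back into their Windows-1252 characters. All function names are hypothetical, and one simplification is deliberate: a sequence whose length does not fit its lead byte is simply left alone.

```python
import re

# Same pattern as the clip's ^!Find: a lead byte followed by any continuation bytes.
SEQUENCE = re.compile(rb"[\xC0-\xF7][\x80-\xBF]*")

# Lead-byte divisor per sequence length (the clip's MOD 32, MOD 16, MOD 8).
LEAD_MOD = {2: 32, 3: 16, 4: 8}

# Assumption: :donelatin maps these eight entities back to Windows-1252 characters.
CP1252_EXTRAS = {ord(c) for c in "€ŠšŽžŒœŸ"}


def expected_length(lead: int) -> int:
    """Length implied by the lead byte: C0-DF -> 2, E0-EF -> 3, F0-F7 -> 4."""
    if lead <= 0xDF:
        return 2
    if lead <= 0xEF:
        return 3
    return 4


def code_point(seq: bytes) -> int:
    """Horner form of :makeent's 262144*fourth + 4096*third + 64*second + first."""
    value = seq[0] % LEAD_MOD[len(seq)]
    for byte in seq[1:]:
        value = value * 64 + byte % 64  # each continuation byte adds 6 bits
    return value


def _convert(match) -> bytes:
    seq = match.group(0)
    if len(seq) != expected_length(seq[0]):
        return seq  # illegal sequence: the clip only warns and leaves it in place
    if seq[0] in (0xC2, 0xC3):
        # :latin1, U+0080..U+00FF become a single ANSI byte
        return bytes([64 * (seq[0] % 4) + seq[1] % 64])
    cp = code_point(seq)
    if cp in CP1252_EXTRAS:
        return chr(cp).encode("cp1252")  # what :donelatin is assumed to do
    return b"&#%d;" % cp  # decimal numeric character reference


def utf8_to_entities(doc: bytes) -> bytes:
    return SEQUENCE.sub(_convert, doc)


print(utf8_to_entities("café ✓ €".encode("utf-8")))
# b'caf\xe9 &#10003; \x80'
```

The check mark (E2 9C 93) goes through the three-byte branch to 10003, while é and € end up as single Windows-1252 bytes.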

      > You then have to figure out what they were and where they went originally.
      > And then you have to find out the character entities for them and enter
      > them manually.
      >
      > One way to do that, I found, is by using Microsoft Word. Open the
      > original file in Word, save it as "Web page, filtered." Word is pretty
      > useless as a HTML editor, but it does have good Unicode support, and it
      > will usually convert Unicode to a Win-1252 file with all the
      > 1252-incompatible characters to HTML numbered entities. Then you open
      > this file in Notepad, search for "&#", and there you have it, the
      > mystery characters.
      >
      > And that is the second reason I still keep Word in my computer, since I
      > hardly ever use it for writing nowadays. (The first reason is that the
      > file-compare feature in Word is pretty kickass, and I have to compare
      > files now and then).
      >
      > --
      > MCBastos
      >
      > This message has been protected with the 2ROT13 algorithm. Unauthorized use will be prosecuted under the DMCA.
      > -=-=-
      > ... Sent from my HAL 9000.
      > * Added by TagZilla 0.7a1 running on Seamonkey 2.12.1 *
      > Get it at http://xsidebar.mozdev.org/modifiedmailnews.html#tagzilla
      >
      > ------------------------------------
      >
      > Fookes Software: http://www.fookes.com/
      > NoteTab website: http://www.notetab.com/
      > NoteTab Discussion Lists: http://www.notetab.com/groups.php
      >
      > ***
      > Yahoo! Groups Links
      >
      >
      >
      --
      Dipl.-Ing. F. Axel Berger Tel: +49/ 2174/ 7439 07
      Johann-Häck-Str. 14 Fax: +49/ 2174/ 7439 68
      D-51519 Odenthal-Heide eMail: Axel-Berger@...
      Deutschland (Germany) http://berger-odenthal.de
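Marcelo's Word trick quoted above (save as Windows-1252, with every character that does not fit written as a numbered entity) can be approximated without Word. The sketch below uses Python's standard cp1252 codec with the "xmlcharrefreplace" error handler. It only converts characters and leaves the markup alone, unlike Word's "Web page, filtered" export; the function names are illustrative.

```python
import re


def to_win1252_with_entities(text: str) -> bytes:
    """Encode as Windows-1252; anything cp1252 cannot hold becomes &#NNNN;."""
    return text.encode("cp1252", errors="xmlcharrefreplace")


def numbered_entities(data: bytes) -> list:
    """Rough equivalent of searching the converted file for "&#"."""
    return re.findall(rb"&#\d+;", data)


page = "<p>Caf\u00e9 \u2713 done</p>"
converted = to_win1252_with_entities(page)
print(converted)                     # b'<p>Caf\xe9 &#10003; done</p>'
print(numbered_entities(converted))  # [b'&#10003;']
```

As with the Notepad search, entities that were already in the original page show up in the list too.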
    • stitch.happy
      Message 2 of 9, Oct 10, 2012
        Thanks, John and Marcelo, for your suggestions. It turned out to be a Unicode character, a checkmark, that was the problem. I appreciate the help!

        I used Notepad to open the file and did a Save-As and selected ANSI format. Got a warning that said I was about to lose Unicode formatted characters. Saved as a new file and did a file compare (CompareIt!) between the two files.

        Regards,
        Bev

        --- In ntb-html@yahoogroups.com, Marcelo Bastos <bytext@...> wrote:

        > I didn't check, but most of the time when I couldn't edit a file, it turned out
        > to be a Unicode file. Notetab has limited Unicode support.
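Notepad's warning says that something will be lost in the ANSI save, but not what or where, which is why Bev needed a file compare to find the check mark. A small Python sketch can list the offending characters directly. It assumes "ANSI" means Windows-1252 (the usual Western-European default) and that the file is read as UTF-8; the function name is hypothetical.

```python
import unicodedata


def non_ansi_characters(text: str):
    """Yield (line, column, character, Unicode name) for characters cp1252 cannot encode."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            try:
                ch.encode("cp1252")
            except UnicodeEncodeError:
                yield line_no, col, ch, unicodedata.name(ch, "<unnamed>")


sample = "<li>Item one</li>\n<li>Done \u2713</li>\n"
for hit in non_ansi_characters(sample):
    print(hit)  # (2, 10, '✓', 'CHECK MARK')
```

For a real file, pass it the result of reading the page with `open(path, encoding="utf-8")`, where `path` is whatever file you are checking.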
      • stitch.happy
        Message 3 of 9, Oct 10, 2012
          Thanks, Axel. I sent the prev reply before seeing this. This looks very handy.

          -Bev

          --- In ntb-html@yahoogroups.com, Axel Berger <Axel-Berger@...> wrote:
          >
          > Marcelo Bastos wrote:
          > > The problem: if there were Unicode characters there, you lost them.
          >
          > Which is why that's not the way to do it. Hope the following is correct
          > (i.e. works first time), I really hate this "feature". You can
          > a) Open the file with codepage "UTF-8 (no conversion)" and possibly also
          > switch off document --> Read only.
          > or
          > b) Open an empty document and your page in another editor and copy and
          > paste all of it over.
          >
        • stitch.happy
          Message 4 of 9, Oct 10, 2012
            Sweet. Worked the first time. Now a part of my clip library of handy stuff, with attribution to Axel. I used method (a). Thanks, Axel! And thanks for the hint to unwrap broken long lines. Hints like that for newbies keep the frustration down. Keep up the good work, folks!

            Bev

          • Marcelo Bastos
            Message 5 of 9, Oct 10, 2012
              Interviewed by CNN on 10/10/2012 07:01, Axel Berger told the world:
              > Marcelo Bastos wrote:
              >> The problem: if there were Unicode characters there, you lost them.
              > Which is why that's not the way to do it. Hope the following is correct
              > (i.e. works first time), I really hate this "feature". You can
              > a) Open the file with codepage "UTF-8 (no conversion)" and possibly also
              > switch off document --> Read only.
              That's a very nice piece of clip programming, and yes, it DID work first
              time. (Well, after I fixed a couple statements that had been
              line-wrapped by the mail systems, that is.) Thank you, it will prove
              most useful in the coming weeks.
              I had a quick look at the logic, and it seems to be generic enough to
              tackle the entire Basic Multilingual Plane. Which is good, since I have
              to deal with a couple of text sources that just *love* to use obscure
              characters from languages you never heard about for aesthetic effect.
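Marcelo's reading can be checked against the clip's own arithmetic. Taking the largest values each MOD can leave in :drei, and the last valid four-byte sequence in :vier (with b1 to b4 the bytes of the selected sequence):

```latex
\begin{aligned}
\text{:drei:}\quad c &= 4096\,(b_1 \bmod 16) + 64\,(b_2 \bmod 64) + (b_3 \bmod 64)\\
c_{\max} &= 4096 \cdot 15 + 64 \cdot 63 + 63 = 65535 = \text{U+FFFF}\\[4pt]
\text{:vier:}\quad c &= 262144\,(b_1 \bmod 8) + 4096\,(b_2 \bmod 64) + 64\,(b_3 \bmod 64) + (b_4 \bmod 64)\\
c(\texttt{F4 8F BF BF}) &= 262144 \cdot 4 + 4096 \cdot 15 + 64 \cdot 63 + 63 = 1114111 = \text{U+10FFFF}
\end{aligned}
```

So the two- and three-byte branches top out at U+FFFF, the end of the Basic Multilingual Plane, and the four-byte branch carries on through the supplementary planes (emoji, for example) up to the last valid code point.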

              I'm already thinking about four or five ways I can integrate it into my
              workflow. It will probably end up as the main subroutine of a larger
              clip. I'm thinking of starting with an auto-reload of the file as "UTF-8
              (no conversion)," then a preprocessing search-and-replace to get rid of
              the most common cases, like "smart quotes" (not strictly needed, but it
              should speed up the process quite a bit), and a post-processing
              "cleanup" phase using a couple clips I already have in hand.

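The workflow Marcelo outlines (handle the common cases such as smart quotes first, then convert whatever is left) could look like this as a Python sketch. The quote table, the pure-ASCII target and the function name are illustrative choices, not part of any clip in this thread, and the reload and cleanup steps are left out.

```python
# Illustrative pre-processing table: curly quotes -> straight ASCII quotes.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def clean_for_ascii_html(text: str) -> bytes:
    """Straighten smart quotes, then turn every remaining non-ASCII character into &#NNNN;."""
    straightened = text.translate(SMART_QUOTES)
    return straightened.encode("ascii", errors="xmlcharrefreplace")


print(clean_for_ascii_html("\u201cDone\u201d \u2713"))  # b'"Done" &#10003;'
```

Targeting plain ASCII means the result reads the same whether NoteTab later opens it as ANSI or as UTF-8.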
              --

              MCBastos This message has been protected with the 2ROT13 algorithm.
              Unauthorized use will be prosecuted under the DMCA.

              -=-=-
              ... Sent from my Total Lack of Social Skills.
              * Added by TagZilla 0.7a1 running on Seamonkey 2.13 *
              Get it at http://xsidebar.mozdev.org/modifiedmailnews.html#tagzilla
            • Axel Berger
              Message 6 of 9, Oct 10, 2012
                Marcelo Bastos wrote:
                > I had a quick look at the logic, and it seems to be generic enough to
                > tackle the entire Basic Multilingual Plane.

                Even more than that, it will also translate illegal UTF-8 sequences into
                equally illegal entities. I have another clip that checks a document for
                legal UTF-8 and flags errors such as stray ANSI characters.

                ---------------------------------------------------------------
                :loop
                ^!Find "([\x80-\xBF]|[\xC0-\xFF][\x80-\xBF]*)" RS
                ^!IfError usasc
                ^!IfMatch "[\xC2-\xDF][\x80-\xBF]" "^$GetSelection$" loop
                ^!IfMatch "\xE0[\xA0-\xBF][\x80-\xBF]" "^$GetSelection$" loop
                ^!IfMatch "[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}" "^$GetSelection$" loop
                ^!IfMatch "\xED[\x80-\x9F][\x80-\xBF]" "^$GetSelection$" loop
                ^!IfMatch "\xF0[\x90-\xBF][\x80-\xBF]{2}" "^$GetSelection$" loop
                ^!IfMatch "[\xF1-\xF3][\x80-\xBF]{3}" "^$GetSelection$" loop
                ^!IfMatch "\xF4[\x80-\x8F][\x80-\xBF]{2}" "^$GetSelection$" loop
                ^!Continue Illegal sequence, no UTF-8
                ^!Goto loop
                :usasc
                ^!Continue No errors found
                ---------------------------------------------------------------

                Axel
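The same check can be sketched in Python over the raw bytes, reusing Axel's patterns; they correspond to the well-formed UTF-8 byte sequences of RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF). As an example of the "equally illegal entities" mentioned above, the surrogate sequence ED A0 80 goes through :drei to &#55296;, which is not a valid character reference either. The function name is illustrative, and reading each ^!IfMatch as a whole-sequence match (fullmatch) is an assumption.

```python
import re

# The clip's ^!Find pattern: a stray continuation byte, or a lead byte plus continuations.
CANDIDATE = re.compile(rb"[\x80-\xBF]|[\xC0-\xFF][\x80-\xBF]*")

# The clip's seven ^!IfMatch patterns, combined into one alternation.
WELL_FORMED = re.compile(
    rb"[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
)


def illegal_sequences(doc: bytes):
    """Return (offset, bytes) for every non-ASCII run that is not well-formed UTF-8."""
    return [
        (m.start(), m.group(0))
        for m in CANDIDATE.finditer(doc)
        if not WELL_FORMED.fullmatch(m.group(0))
    ]


print(illegal_sequences("caf\u00e9 \u2713".encode("utf-8")))  # []
print(illegal_sequences(b"caf\xe9"))       # [(3, b'\xe9')]  stray ANSI byte
print(illegal_sequences(b"\xed\xa0\x80"))  # [(0, b'\xed\xa0\x80')]  surrogate
```

Python's strict decoder draws the same line: `doc.decode("utf-8")` raises `UnicodeDecodeError` on both flagged inputs above.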