
Bug? Replacing Tremas

  • Axel Berger
    Message 1 of 5 , Aug 13, 2013
      Copying and pasting from other people's PDFs I encounter all kinds of
      strange characters. The most common of them can be replaced
      automatically. There is one strange problem though:

      An umlaut "ä" often arrives as the base letter preceded or followed by
      a separate trema, "a¨" or "¨a", but contrary to all other cases

      ^!Replace "¨a" >> "ä" HASTI

      does not work. The equivalent Regex

      ^!Replace "\xA8a" >> "ä" HRASTI

      does work though. Why is this and why does it affect the trema but no
      other special character?
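
A minimal sketch of the same clean-up outside NoteTab, in Python rather than clip code. It assumes the pasted text is single-byte Latin-1 / Windows-1252 data in which the standalone trema is byte 0xA8, as in the regex above. The names `UMLAUTS`, `PATTERN` and `fix_tremas` are made up for illustration, and the sketch does not explain why the literal (non-regex) replace fails in NoteTab.

```python
import re

# Assumption: only German vowels need fixing; extend the table as required.
UMLAUTS = {"a": "ä", "o": "ö", "u": "ü", "A": "Ä", "O": "Ö", "U": "Ü"}

# A trema (byte 0xA8 in Latin-1) directly before or directly after a vowel.
PATTERN = re.compile("\xa8([aouAOU])|([aouAOU])\xa8")


def fix_tremas(text: str) -> str:
    """Replace 'trema + vowel' or 'vowel + trema' with the precomposed umlaut."""
    def repl(match: re.Match) -> str:
        vowel = match.group(1) or match.group(2)
        return UMLAUTS[vowel]
    return PATTERN.sub(repl, text)


if __name__ == "__main__":
    raw = b"M\xa8adchen und Bu\xa8cher"        # bytes as pasted from the PDF
    print(fix_tremas(raw.decode("latin-1")))   # Mädchen und Bücher
```

Writing the trema as `\xa8` rather than pasting it keeps the pattern independent of how the editor or the script file itself is encoded.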

      Danke
      Axel

      P.S: Ceterum censeo:
      The workaround for repeating a clip ^!Find through F3 is a real bore and
      all too frequently has glitches. In spite of a generous ^!Delay the
      ^!Keyboard #^%Pattern%# all too often drops characters. This used to
      work so nicely before.

      --
      Dipl.-Ing. F. Axel Berger Tel: +49/ 2174/ 7439 07
      Johann-Häck-Str. 14 Fax: +49/ 2174/ 7439 68
      D-51519 Odenthal-Heide eMail: Axel-Berger@...
      Deutschland (Germany) http://berger-odenthal.de
    • John Shotsky
      Message 2 of 5 , Aug 13, 2013
        It might be that the character is not supported in the same code page as the other characters. I use a lot of codes like that to
        accommodate German, Russian, Polish, etc characters that are in an otherwise known code page. These often come from web pages that
        don't properly code such characters for the internet. Browsers are pretty lax about coding like that, and often display correctly
        even when the character is not properly encoded with character entities. I'm not sure about PDF, but assume the same is possible.

        I also use those codes to substitute completely different characters, when I know the actual character is not going to be properly
        handled later, as in the case of single-character fractions. The internet provides those characters for 1/3rds, 1/5ths, 1/6ths
        and 1/8ths. They are called 'vulgar fractions'. I substitute the three-character equivalent, such as 1/8, 3/8, etc. NoteTab only
        supports 1/2 and 1/4; the rest are turned into question marks.
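
The fraction substitution described above could be sketched as follows in Python. This illustrates the idea and is not the clip actually used. The names `FRACTIONS` and `expand_fractions` are hypothetical, and the sketch assumes the text is still available as Unicode, before the characters have been turned into question marks.

```python
# Unicode "vulgar fraction" characters mapped to three-character equivalents.
# The 1/2 and 1/4 characters are left alone because NoteTab is said to handle them.
FRACTIONS = {
    "\u2153": "1/3", "\u2154": "2/3",
    "\u2155": "1/5", "\u2156": "2/5", "\u2157": "3/5", "\u2158": "4/5",
    "\u2159": "1/6", "\u215a": "5/6",
    "\u215b": "1/8", "\u215c": "3/8", "\u215d": "5/8", "\u215e": "7/8",
}

# str.translate works on a table keyed by code point.
TABLE = str.maketrans(FRACTIONS)


def expand_fractions(text: str) -> str:
    """Replace single-character fractions with plain ASCII such as 3/8."""
    return text.translate(TABLE)


if __name__ == "__main__":
    print(expand_fractions("\u215c cup sugar, \u2153 cup milk"))
    # 3/8 cup sugar, 1/3 cup milk
```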
        Lastly, it may be that the actual character in your case is Unicode, which would provide more than one byte per character. I use
        EditPad Pro (expired trial) to investigate things like that, as it has a hex editor and recognizes the Unicode characters. Once you
        see how it is coded, you can usually create some regex to fix it, as long as it isn't saved as a question mark first.
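
Without a hex editor, a few lines of Python show how a character is coded. The sketch below is only an illustration; `hex_bytes` is a hypothetical helper. It prints the bytes for three ways a trema can appear: the standalone character U+00A8, a letter followed by the combining diaeresis U+0308, and the precomposed "ä".

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


if __name__ == "__main__":
    print(hex_bytes("\u00a8", "latin-1"))   # A8        standalone trema, one byte
    print(hex_bytes("\u00a8", "utf-8"))     # C2 A8     same character in UTF-8
    print(hex_bytes("a\u0308", "utf-8"))    # 61 CC 88  'a' + combining diaeresis
    print(hex_bytes("\u00e4", "utf-8"))     # C3 A4     precomposed 'ä'
```

A single byte A8 points to an 8-bit code page, while C2 A8 or CC 88 indicates UTF-8 text viewed as raw bytes.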

        Regards,
        John
        RecipeTools Web Site: http://recipetools.gotdns.com/
        John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/

      • Axel Berger
        Message 3 of 5 , Aug 13, 2013
          John Shotsky wrote:
          > It might be that the character is not supported in the same
          > code page as the other characters.

          It is. As I always work in 8-bit mode (including UTF raw) byte 168 is
          and stays byte 168 - it may be displayed differently from what was
          intended but that's not relevant here.
          As I always set the find part of a replace by copy and paste from the
          text I'm working on in NoteTab I can also exclude mistyping.

          > I'm not sure about PDF, but assume the same is possible.

          Absolutely. But what I'm working with is no longer the PDF but text from
          it already pasted into NoteTab. So I can see exactly what it is I want
          to replace.

          > it may be that the actual character in your case is Unicode,
          > which would provide more than one byte per character.

          As stated above, in my case that can't be; I'd see the two-character
          sequence. In fact those Unicode characters are part of what comes up
          routinely, and again it is character 168 = A8 that tends to cause
          problems. An example taken from my clip is:

          ^!Replace "¡©" >> "-" WASTI
          ^!Replace "\xA8C" >> "--" WRASTI

          In the first case I could just copy and paste those two characters; in
          the second the trema had to be written as \xA8 or the replace won't work.

          > as it has a hex editor and recognizes the Unicode characters

          So does NoteTab, as long as you strictly keep it in 8-bit mode. I like the
          new feature of displaying the byte value of single selected characters
          in the status line. Before that I used this clip:

          ^!Set %varchar%=^$GetChar$
          ^!If ^$GetSelSize$ < 1 SKIP
          ^!Set %varchar%=^$StrCopyLeft("^$GetSelection$";1)$
          ^!Set %varchar%=^$CharToDec(^%varchar%)$
          ^!Info Hex: ^$IntToHex(^%varchar%)$ Dec: ^%varchar%
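
For readers without NoteTab, roughly the same lookup in Python: take the first character of a selection and report its byte value in hex and decimal. This is a sketch under the assumption of 8-bit (Latin-1) text; `char_info` is a hypothetical name, and the output format only imitates the clip's ^!Info line.

```python
def char_info(selection: str, encoding: str = "latin-1") -> str:
    """Report the byte value of the first character of selection."""
    if not selection:
        raise ValueError("nothing selected")
    # Like StrCopyLeft(...;1) in the clip: only the first character counts.
    value = selection[:1].encode(encoding)[0]
    return f"Hex: {value:X} Dec: {value}"


if __name__ == "__main__":
    print(char_info("\xa8a"))   # Hex: A8 Dec: 168
```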

          Axel
        • Art Kocsis
          Message 4 of 5 , Aug 14, 2013
            At 8/13/2013 07:17 AM, Axel wrote:
            ><snip>
            >An Umlaut "ä" is often the letter preceded or followed by the trema "a¨"
            >or "¨a" but contrary to all other cases...

            Danger: Notetab forums can be time killers!

            Not having been exposed to the word "trema" before I had to look it up.
            That only took three or four hours. As with all things internet, one link leads to another so into the fascinating world of linguistics I plunged.

            One can (and many do) spend a lifetime studying the history, evolution and usage of just the various glyphs used in languages around the world, so spending only a few hours isn't bad. For anyone interested, Wikipedia is a good place to start.

            Meanwhile, back at the ranch...

            Art
          • John Shotsky
            Message 5 of 5 , Aug 14, 2013
              I hear you. There is a plethora of 'facts' that I have gathered that way, and my wife understands that my brain has room for 'those'
              facts, but not 'hers'.

              Regards,
              John
              RecipeTools Web Site: http://recipetools.gotdns.com/
              John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/
