
alphanumeric character transcoding, by ones and pairs

  • rickah
    Message 1 of 19, Jun 13, 2013
      I need to replace characters and character pairs in whole documents. This relates to an Asian language that does not have a standardized keyboard, so any two fonts might use completely different keyboard layouts rendering them incompatible for 'computerizing'.

      For example, I want to create a clip that would rewrite the characters in this first string so they exactly match the characters in the second string.

      1: lABTAkvFHWkolaKELdAFqalanWElacO:laKv:sE:muIsE:sFlanOlapLUB
      2: vhRxhul.oGhuDvgcHs'h.ngvgEGHvgCd;vgcl;qH;rk>qH;q.vgeDvgysKR

      The characters that look like punctuation marks in these strings are actually letters, not word breaks. This language does not traditionally use spaces or punctuation.

      Making the following list is as far as I got. Note that "bJ" (bS) is one letter, while b, B, j or J in any other combination are separate letters. That makes it somewhat persnickety.

      Thank you very much for getting me started in the right direction.

      --list--
      ^!Replace "bJ" >> "bS"
      ^!Replace "dD" >> "'f"
      ^!Replace "kY" >> "uF"
      ^!Replace "KY" >> "cF"
      ^!Replace "mD" >> "rf"
      ^!Replace "mJ" >> "rS"
      ^!Replace "pJ" >> "yS"
      ^!Replace "PJ" >> "zS"
      ^!Replace "SJ" >> "pS"
      ^!Replace "sJ" >> "qS"
      ^!Replace "a" >> "g"
      ^!Replace "A" >> "H"
      ^!Replace "b" >> "b"
      ^!Replace "B" >> "R"
      ^!Replace "c" >> "C"
      ^!Replace "d" >> "'"
      ^!Replace "E" >> "h"
      ^!Replace "e" >> "J"
      ^!Replace "F" >> "."
      ^!Replace "f" >> "m"
      ^!Replace "g" >> "i"
      ^!Replace "h" >> "["
      ^!Replace "H" >> "o"
      ^!Replace "I" >> ">"
      ^!Replace "i" >> "X"
      ^!Replace "j" >> "*"
      ^!Replace "J" >> "S"
      ^!Replace "K" >> "c"
      ^!Replace "k" >> "u"
      ^!Replace "L" >> "s"
      ^!Replace "l" >> "v"
      ^!Replace "m" >> "r"
      ^!Replace "n" >> "e"
      ^!Replace "o" >> "D"
      ^!Replace "O" >> "d"
      ^!Replace "p" >> "y"
      ^!Replace "P" >> "z"
      ^!Replace "q" >> "n"
      ^!Replace "r" >> "&"
      ^!Replace "S" >> "p"
      ^!Replace "s" >> "q"
      ^!Replace "t" >> "w"
      ^!Replace "T" >> "x"
      ^!Replace "u" >> "k"
      ^!Replace "v" >> "l"
      ^!Replace "w" >> "0"
      ^!Replace "W" >> "G"
      ^!Replace "X" >> "{"
      ^!Replace "x" >> "t"
      ^!Replace "y" >> ","
      ^!Replace "Y" >> "F"
      ^!Replace "z" >> "±S"
      ^!Replace ":" >> ";"
      ^!Replace "\" >> "-"
      --end--
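Why the list can't simply be run top to bottom: each ^!Replace works on the output of the previous one, so a letter can be converted twice. For example, "E" becomes "h", and a later line then turns that "h" into "[". A minimal Python sketch of the effect (Python only stands in for the clip language here, and RULES is a two-line excerpt of the list above):

```python
# Two rules from the list above, in the order they appear there.
RULES = [("E", "h"), ("h", "[")]


def naive_transcode(text: str) -> str:
    """Apply every rule to the whole text, one after another,
    the way consecutive ^!Replace commands would."""
    for search, replacement in RULES:
        text = text.replace(search, replacement)
    return text


print(naive_transcode("Eh"))  # prints [[ instead of the intended h[
```

Every approach later in this thread is essentially a way of making sure each original letter is converted exactly once.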
    • flo.gehrke
      Message 2 of 19, Jun 13, 2013
        --- In ntb-clips@yahoogroups.com, "rickah" <richolland@...> wrote:
        >
        > (...) I want create a clip that would re-write the characters
        > in this first string to exactly match the characters in the
        > second string.
        >
        > 1: lABTAkvFHWkolaKELdAFqalanWElacO:laKv:sE:muIsE:sFlanOlapLUB
        > 2: vhRxhul.oGhuDvgcHs'h.ngvgEGHvgCd;vgcl;qH;rk>qH;q.vgeDvgysKR
        > (...)
        > Making the following list is as far as I got.
        > --list--
        > ^!Replace "bJ" >> "bS"
        > ^!Replace "dD" >> "'f"
        > etc

        This is not a complete solution, just a basic idea:

        Example: to convert "ABdEHhdDSJ" into "HR'ho[fpS", try the following clip:

        ^!SetArray %Char%=A;H;B;R;E;h;H;o;h;[;dD;f;SJ;pS;d;'
        ^!Set %i%=0

        :Loop
        ^!Inc %i%
        ^!If ^%i% > ^%Char0% Out
        ^!Set %Search%=^%Char^%i%%
        ^!Inc %i%
        ^!Set %ReplaceWith%=^%Char^%i%%
        ^!Replace "(?<!\|)^%Search%" >> "|^%ReplaceWith%" WARS
        ^!Goto Loop

        :Out
        ^!Replace "|" >> "" WATS

        In array %Char%, each search string is followed by the string it has to be replaced with. Note: in the array, a single character like 'd' must come after any pair that starts with it, like 'dD'.

        The point is to avoid replacing an 'h' that came from an 'E' a second time with '['. To prevent this, every replacement is prefixed with '|', so '|h' won't be replaced with '[' again. This is achieved by the negative lookbehind '(?<!\|)'.

        There are probably more issues with this, but I hope it is useful as a first approach...
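The same marker idea, sketched in Python so it can be tried outside NoteTab. The array CHAR is Flo's example array; the function name is hypothetical, and re.escape is used so that search strings containing regex metacharacters are matched literally:

```python
import re

# Flo's example array: each search string is followed by its replacement.
CHAR = ["A", "H", "B", "R", "E", "h", "H", "o", "h", "[",
        "dD", "f", "SJ", "pS", "d", "'"]


def marker_transcode(text: str, pairs: list[str]) -> str:
    """Replace each search string that is not directly preceded by '|'
    with '|' + replacement, then strip all '|' markers at the end."""
    for i in range(0, len(pairs), 2):
        search, replacement = pairs[i], pairs[i + 1]
        pattern = r"(?<!\|)" + re.escape(search)
        # A function replacement stops re.sub from interpreting backslashes.
        text = re.sub(pattern, lambda _m: "|" + replacement, text)
    return text.replace("|", "")


print(marker_transcode("ABdEHhdDSJ", CHAR))  # HR'ho[fpS
```

Only the first character of each replacement sits directly after a '|', so only that character is protected; that limitation matters once two-character replacements are involved.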

        Regards,
        Flo
      • Ian NTnerd
        Message 3 of 19, Jun 13, 2013
          Flo,

          It still has bugs; see the bottom of this message.

          Thanks for helping me learn more about regular expressions, in particular
          negative lookbehind.

          I'd simplify the array by putting it in a separate clip, with each pair on
          its own line and the two halves of each pair separated by a tab. That way
          the data is separated from the clip and much easier to maintain. Note that
          this email has changed the tabs to multiple spaces, so they need to be
          changed back for the clip to work.
          H="_list"
          bJ bS
          dD 'f
          kY uF
          KY cF
          mD rf
          mJ rS
          pJ yS
          PJ zS
          SJ pS
          sJ qS
          a g
          A H
          b b
          B R
          c C
          d '
          E h
          e J
          F .
          f m
          g i
          h [
          H o
          I >
          i X
          j *
          J S
          K c
          k u
          L s
          l v
          m r
          n e
          o D
          O d
          p y
          P z
          q n
          r &
          S p
          s q
          t w
          T x
          u k
          v l
          w 0
          W G
          X {
          x t
          y ,
          Y F
          z ±S
          : ;
          \\ -

          Note that the \ character above has to be doubled for this to work.

          H="transcode"
          ^!SetListDelimiter ^p
          ^!SetArray %Charpair%=^$GetClipText("list")$
          ^!Set %i%=0

          :Loop

          ^!Inc %i%

          ^!If ^%i% > ^%Charpair0% Out
          ^!SetListDelimiter ^T
          ^!SetArray %pair%=^%Charpair^%i%%
          ^!Set %Search%=^%pair1%
          ;^!Inc %i%
          ^!Set %ReplaceWith%=^%pair2%
          ^!Replace "(?<!\|)^%Search%" >> "|^%ReplaceWith%" WARS
          ^!Goto Loop

          :Out
          ^!Replace "|" >> "" WATS


          This only partly works. In the two-character replacements, the second
          character can get changed a second time.

          Data
          Start line
          lABTAkvFHWkolaKELdAFqalanWElacO:laKv:sE:muIsE:sFlanOlapLUB
          transformed line
          vHRxHul.oGuDvgchs'H.ngvgeGhvgCd;vgcl;qh;rk>qh;q.vgedvgysUR
          Wanted line
          vhRxhul.oGhuDvgcHs'h.ngvgEGHvgCd;vgcl;qH;rk>qH;q.vgeDvgysKR


          So I think this approach fails. Changing the order may fix the problem, but
          it may introduce new problems. It really needs to be processed as a string.

          But there may also be some inconsistency between the supplied replacements
          and the two strings, e.g. whether the second letter should be H or h.
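The bug Ian describes can be reproduced with a two-rule excerpt of the list: "SJ" becomes "|pS", but only the "p" sits directly after the '|' marker, so the later rule "S" -> "p" still hits the trailing "S". A minimal Python sketch (the function name is hypothetical; the lookbehind is the same one the clip uses):

```python
import re

# Two rules from the list, in list order: the pair "SJ" and,
# further down, the single character "S".
PAIRS = [("SJ", "pS"), ("S", "p")]


def marker_transcode(text: str, pairs: list[tuple[str, str]]) -> str:
    """Flo's marker idea: prefix each replacement with a single '|' and
    skip matches that are directly preceded by '|'."""
    for search, replacement in pairs:
        pattern = r"(?<!\|)" + re.escape(search)
        text = re.sub(pattern, lambda _m: "|" + replacement, text)
    return text.replace("|", "")


# "SJ" -> "|pS"; the "S" is not directly after a '|', so "S" -> "p" still applies.
print(marker_transcode("SJ", PAIRS))  # pp (the list says pS)
```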

          Ian

        • flo.gehrke
          Message 4 of 19, Jun 14, 2013
            --- In ntb-clips@yahoogroups.com, Ian NTnerd <indiamcq@...> wrote:
            >
            > Flo,
            >
            > Still has bugs see bottom (...)

            Ian,

            Yes, you are right. I was fairly sure that my "basic idea" would run into more difficulties once the complete list of characters was taken into account.

            We probably can't avoid first splitting the target string into valid characters and sequences. Here are just a few of them...

            ; Assign all valid characters/sequences to %List%
            ^!SetArray %List%=bJ;dD;kY;KY;mD;mJ;pJ;PJ;SJ;sJ;a;A;b;B;c;d;E;e;F;f;g;h;H;o
            ^!Set %i%=1
            :Split
            ; Split target string into valid characters/sequences
            ^!Replace "(?<!\|)^%List^%i%%(?!\|)" >> "|$0" WARS
            ^!Inc %i%
            ^!If ^%i% > ^%List0% ContinueWith...
            ^!Goto Split

            Now it's easier to match valid characters/sequences and to protect them from being replaced a second time. The loop could now be...

            ; Reset the counter, which the split loop above has used up
            ^!Set %i%=0
            :Loop
            ^!Inc %i%
            ^!If ^%i% > ^%Char0% Out
            ^!Set %Search%=\|^%Char^%i%%\|
            ^!Inc %i%
            ^!Set %RepWith%=^%Char^%i%%
            ^!Replace "^%Search%" >> "#^%RepWith%#" WARS
            ^!Goto Loop

            :Out
            ^!Replace "#" >> "" WATS

            Thus, a valid string like '|A|' will be replaced with '#H#', which should protect it from being replaced again.

            Of course, this is still imperfect. For example, we have to pay attention to characters that are regex metacharacters. But at the moment I have to leave that task to you or anyone else who has the time and patience to continue working on this...
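The split-first idea can also be done in one left-to-right pass with a regex alternation that lists the two-character letters before a catch-all single character. A Python sketch, not the clip itself: DIGRAPHS comes from rickah's list, while MAPPING is a hypothetical five-entry excerpt. Because every letter is looked up exactly once, no '|' or '#' markers are needed:

```python
import re

# Two-character letters from rickah's list.
DIGRAPHS = ["bJ", "dD", "kY", "KY", "mD", "mJ", "pJ", "PJ", "SJ", "sJ"]

# Alternatives are tried left to right, so a listed pair wins over its
# first character; "." then picks up any other single character.
TOKEN = re.compile("|".join(map(re.escape, DIGRAPHS)) + r"|.", re.DOTALL)

# Hypothetical excerpt of the full mapping.
MAPPING = {"bJ": "bS", "dD": "'f", "b": "b", "j": "*", "d": "'"}


def split_letters(text: str) -> list[str]:
    """Split text into letters, treating each listed pair as one unit."""
    return TOKEN.findall(text)


def transcode(text: str) -> str:
    """Look up each letter once; unknown letters pass through unchanged."""
    return "".join(MAPPING.get(tok, tok) for tok in split_letters(text))


print(split_letters("bJbjdDd"))  # ['bJ', 'b', 'j', 'dD', 'd']
print(transcode("bJbjdDd"))      # bSb*'f'
```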

            Regards,
            Flo
          • Ian NTnerd
            Message 5 of 19, Jun 16, 2013
              Flo,

              I don't think you need to split first. Just extend your initial idea so
              that any replacement with two characters gets a | inserted between
              them. Then your first regex replace works fine.

              So add these few lines after %ReplaceWith% is set but before the
              replace happens:
              ^!If ^$StrSize("^%ReplaceWith%")$ = 2 addchar ELSE noadd
              :addchar
              ^!Set %ReplaceWith%=^$StrCopyLeft("^%ReplaceWith%";1)$|^$StrCopyRight("^%ReplaceWith%";1)$

              :noadd

              That gets me better results. I still think the before and after data
              strings are inconsistent with the supplied change values.
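Ian's fix can be sketched in Python as follows. The sketch generalises it slightly (an assumption, not what the clip does): every character of a replacement gets its own '|' marker, whatever its length, whereas the clip only splits two-character replacements. For two characters the result is the same "|p|S". The helper names are hypothetical:

```python
import re

# Two rules from the list, in list order.
PAIRS = [("SJ", "pS"), ("S", "p")]


def protect(replacement: str) -> str:
    """Put a '|' in front of every character of the replacement."""
    return "".join("|" + ch for ch in replacement)


def marker_transcode(text: str, pairs: list[tuple[str, str]]) -> str:
    for search, replacement in pairs:
        pattern = r"(?<!\|)" + re.escape(search)
        text = re.sub(pattern, lambda _m: protect(replacement), text)
    return text.replace("|", "")


# "SJ" -> "|p|S"; now the "S" is preceded by '|' and is left alone.
print(marker_transcode("SJ", PAIRS))  # pS
```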

              I really appreciate what I learn from your RE.

              Ian

            • flo.gehrke
              Message 6 of 19, Jun 17, 2013
                --- In ntb-clips@yahoogroups.com, Ian NTnerd <indiamcq@...> wrote:
                >
                > Flo, I don't think you need to split first. Just extend your
                > initial idea so that the replacement that has two characters
                > has a | inserted between the characters. Then your first RE
                > replace works fine.

                Ian,

                Yes, I think you are right. I was misled by those 54 ^!Replace lines in message #23877. Only today do I see that there seems to be no difference between single characters and 2-character sequences.

                So why the heck does rickah make it three replacements: 'bJ' -> 'bS', 'b' -> 'b', and 'J' -> 'S' if the result is always the same? Also why 'KY' -> 'cF', 'K' -> 'c', and 'Y' -> 'F' etc...?

                Unfortunately, the only one who could answer this has made himself scarce. If it holds, though, the job is much easier.

                Obviously, we have to add 'D' -> 'f': 'dD' and 'mD' both turn D into f, but the list has no single-character rule for D.
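Flo's observation can be checked mechanically: for each pair in rickah's list, compare its replacement with what the two single-character rules produce. A Python sketch using only entries from the list, plus the 'D' -> 'f' rule proposed here:

```python
# Pair rules from rickah's list.
DIGRAPHS = {"bJ": "bS", "dD": "'f", "kY": "uF", "KY": "cF", "mD": "rf",
            "mJ": "rS", "pJ": "yS", "PJ": "zS", "SJ": "pS", "sJ": "qS"}

# The single-character rules those pairs are built from,
# plus Flo's proposed 'D' -> 'f'.
SINGLES = {"b": "b", "d": "'", "k": "u", "K": "c", "m": "r", "p": "y",
           "P": "z", "S": "p", "s": "q", "J": "S", "Y": "F", "D": "f"}

# Pairs whose result differs from converting the two characters separately.
special = [pair for pair, out in DIGRAPHS.items()
           if SINGLES[pair[0]] + SINGLES[pair[1]] != out]
print(special)  # []
```

An empty list means that, for conversion purposes, the pairs need no rules of their own.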

                Another idea is to store the search and replace characters in a LIST.TXT file as follows:

                a|g
                A|H
                b|b
                B|R
                c|C
                D|f
                d|'
                etc...

                Among the search characters, the backslash is the only metacharacter that must be escaped: '\\'.

                The clip will load and convert this list into an array %Char%:

                ^!SetScreenUpdate Off
                ^!SetClipboard ^$GetFileText(^$GetDocumentPath$LIST.TXT)$
                ^!SetClipboard ^$StrReplace((\||\r\n);»;^$GetClipboard$;RA)$
                ^!SetListDelimiter »
                ^!SetArray %Char%=^$GetClipboard$
                ^!Set %i%=0

                :Loop
                ^!Inc %i%
                ^!If ^%i% > ^%Char0% Out
                ^!Set %Search%=^%Char^%i%%
                ^!Inc %i%
                ^!Set %RepWith%=^%Char^%i%%
                ^!Replace "(?<!\|)^%Search%" >> "|^%RepWith%" WARS
                ^!Goto Loop

                :Out
                ^!Replace "|" >> "" WATS

                Tested with NT Pro 7.1.
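A rough Python equivalent of loading LIST.TXT and running the loop, as a sketch only: LIST_TXT is a short excerpt of the file, the search strings go through re.escape (so the backslash entry is written as a single \ here, unlike in the clip), and every character of a replacement gets its own '|' marker as in Ian's variant. Function names are hypothetical:

```python
import re

# Short excerpt of LIST.TXT: one "search|replacement" pair per line.
# In this raw string the last entry's search string is a single backslash.
LIST_TXT = r"""a|g
A|H
D|f
d|'
\|-
"""


def load_list(text: str) -> list[tuple[str, str]]:
    """Split each non-empty line at its first '|' into (search, replacement)."""
    pairs = []
    for line in text.splitlines():
        if line:
            search, replacement = line.split("|", 1)
            pairs.append((search, replacement))
    return pairs


def transcode(text: str, pairs: list[tuple[str, str]]) -> str:
    for search, replacement in pairs:
        protected = "".join("|" + ch for ch in replacement)
        # Skip anything directly after a '|' marker, i.e. already replaced.
        text = re.sub(r"(?<!\|)" + re.escape(search),
                      lambda _m: protected, text)
    return text.replace("|", "")


print(transcode("aAdD\\", load_list(LIST_TXT)))  # gH'f-
```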

                > I still think the before and after data strings are inconsistent
                > with the supplied change values.

                Yes, I agree with you. In the 54 ^!Replace commands, for example, 'A' is replaced with 'H' and not with 'h' as in the sample.

                Regards,
                Flo
              • flo.gehrke
                Message 7 of 19, Jun 17, 2013
                  --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
                  >
                  > Another idea is to store the search and replace characters in a LIST.TXT file as follows:
                  >
                  > a|g
                  > etc...

                  The complete LIST.TXT that is used in my clip:

                  a|g
                  A|H
                  b|b
                  B|R
                  c|C
                  D|f
                  d|'
                  E|h
                  e|J
                  F|.
                  f|m
                  g|i
                  h|[
                  H|o
                  I|>
                  i|X
                  j|*
                  J|S
                  K|c
                  k|u
                  L|s
                  l|v
                  m|r
                  n|e
                  o|D
                  O|d
                  p|y
                  P|z
                  q|n
                  r|&
                  S|p
                  s|q
                  t|w
                  T|x
                  u|k
                  v|l
                  w|0
                  W|G
                  X|{
                  x|t
                  y|,
                  Y|F
                  z|±S
                  :|;
                  \\|-

                  Note: The backslash (last entry) must be escaped '\\'.

                  Flo
                • rickah
                  Message 8 of 19, Jun 18, 2013
                    Thanks a million Ian and Flo. I didn't want to interrupt before now because I've been enjoying just watching the process of working out this problem. I'll spend today trying to understand what you two have suggested and making it work for my situation.
                    I'll post my results.
                    RicHolland
                  • rickah
                    Message 9 of 19, Jun 18, 2013
                      Flo and Ian,
                      I apologize for not noticing this earlier, but you two seemed to be doing so well without me, I was afraid I'd break your focus.

                      If "bJ" occurs together it becomes a single and different character.
                      If "bj" occurs together it is two separate characters. These are completely case sensitive pairings.

                      ONLY these character pairs follow this rule:
                      "bJ" >> "bS"
                      "dD" >> "'f"
                      "kY" >> "uF"
                      "KY" >> "cF"
                      "mD" >> "rf"
                      "mJ" >> "rS"
                      "pJ" >> "yS"
                      "PJ" >> "zS"
                      "SJ" >> "pS"
                      "sJ" >> "qS"

                    • rickah
                      Message 10 of 19, Jun 18, 2013
                        Flo and Ian,
                        To display what function this all serves, I created a simple webpage. The two fonts must be installed to view it properly.
                        web page: https://sites.google.com/site/my37s8ks8a/Latin-Sgaw
                        fonts zip: https://sites.google.com/site/my37s8ks8a/2KarenFonts.zip

                        I found a character error in my original sample line. The corrected line below has one additional 'A'. I hope this didn't cause much trouble:
                        1: lABTAkvFHWAkolaKELdAFqalanWElacO:laKv:sE:muIsE:sFlanolapLUB

                      • flo.gehrke
                        Message 11 of 19, Jun 18, 2013
                          --- In ntb-clips@yahoogroups.com, "rickah" <richolland@...> wrote:
                          >
                          > Flo and Ian, To display what function this all serves, I created
                          > a simple webpage...

                          I had a look at your webpage and tested those two lines (Karen Standard/Karen Normal Unique) with my clip (as posted with messages #23882 and #23883).

                          I get a correct transcoding of your KSTD sample...

                          lABTAkvFHWAkolaKELdAFqalanWElacO:laKv:sE:muIsE:sFlanolapLUB

                          to your KNU sample...

                          vhRxhul.oGhuDvgcHs'h.ngvgEGHvgCd;vgcl;qH;rk>qH;q.vgeDvgysKR

                          However, unlike your 54 replacements, this only works after changing my LIST.TXT as follows...

                          A >> h
                          E >> H
                          n >> E
                          U >> K
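Since every rule in Flo's LIST.TXT maps one character, the whole conversion can also be written as a single-pass character table. This is a different technique from the clip's marker loop (Python's str.maketrans and str.translate), sketched here with Flo's list plus his four corrections (A, E, n and U):

```python
# Flo's LIST.TXT with his corrections A -> h, E -> H, n -> E and U -> K.
MAPPING = {
    "a": "g", "A": "h", "b": "b", "B": "R", "c": "C", "D": "f", "d": "'",
    "E": "H", "e": "J", "F": ".", "f": "m", "g": "i", "h": "[", "H": "o",
    "I": ">", "i": "X", "j": "*", "J": "S", "K": "c", "k": "u", "L": "s",
    "l": "v", "m": "r", "n": "E", "o": "D", "O": "d", "p": "y", "P": "z",
    "q": "n", "r": "&", "S": "p", "s": "q", "t": "w", "T": "x", "u": "k",
    "U": "K", "v": "l", "w": "0", "W": "G", "X": "{", "x": "t", "y": ",",
    "Y": "F", "z": "±S", ":": ";", "\\": "-",
}
# Keys are single characters; values may be longer, like "±S".
TABLE = str.maketrans(MAPPING)


def transcode(text: str) -> str:
    """Map every character exactly once; unmapped characters pass through."""
    return text.translate(TABLE)


print(transcode("lABTAkvFHW"))  # vhRxhul.oG
```

Traced by hand over the full corrected sample line, this table reproduces the KNU line except at one position: the input contains 'n' twice, and the target shows 'E' for the first and 'e' for the second, so no single entry for 'n' can match both. That fits Ian's suspicion that the sample data and the list don't fully agree.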

                          Regards,
                          Flo
                        • rickah
                          Message 12 of 19, Jun 19, 2013
                            Many thanks, Flo. Most excellent.

                            I had to update to NTB v7 (for the %RepWith%), and I had trouble with this one line relating to the list.txt, for obvious reasons:

                            ^!SetListDelimiter »

                            instead of:

                            ^!SetListDelimiter >>
                            --

                            And a gold star for catching the "... n >> E, > U >> K" ...

                            This will help tremendously, since there are more than those two font variations to work with. Only recently has that language had a Unicode font with a standard keyboard layout. Gradually changing the various older texts over to the new font set will be so much easier now.

                            Thanks again, Flo and Ian,
                            Richard

                          • Ian NTnerd
                            Message 13 of 19, Jun 19, 2013
                              Richard,

                              Part of the idea is for you to learn how clips go together. If we give
                              you everything, you don't learn as much. :-)

                              I changed the list delimiter from a tab, which email messes up, to
                              space, greater-than, greater-than, space, with
                              ^!SetListDelimiter " >> "

                              Here is my working code with start and end samples.

                              If you use it to go the other way, from Normal to Standard, then the [
                              character needs to be escaped as \[ in the list.

                              H="Karen Standard to Normal"
                              ^!SetListDelimiter ^p
                              ^!SetArray %Charpair%=^$GetClipText("list")$
                              ^!Set %i%=0

                              :Loop
                              ^!Inc %i%
                              ^!If ^%i% > ^%Charpair0% Out
                              ^!SetListDelimiter " >> "
                              ^!SetArray %pair%=^%Charpair^%i%%
                              ^!Set %Search%=^%pair1%
                              ^!Set %ReplaceWith%=^%pair2%
                              ^!SetDebug Off
                              ^!If ^$StrSize("^%ReplaceWith%")$ = 2 addchar ELSE noadd
                              :addchar
                              ^!Set %ReplaceWith%=^$StrCopyLeft("^%ReplaceWith%";1)$|^$StrCopyRight("^%ReplaceWith%";1)$
                              :noadd
                              ^!Replace "(?<!\|)^%Search%" >> "|^%ReplaceWith%" WARS
                              ^!Goto Loop

                              :Out
                              ^!Replace "|" >> "" WATS

                              H=";List follows has the form character1 space greater_than greater_than space character2"


                              H="_list"
                              bJ >> bS
                              dD >> 'f
                              kY >> uF
                              KY >> cF
                              mD >> rf
                              mJ >> rS
                              pJ >> yS
                              PJ >> zS
                              SJ >> pS
                              sJ >> qS
                              a >> g
                              A >> H
                              b >> b
                              B >> R
                              c >> C
                              d >> '
                              E >> h
                              e >> J
                              F >> .
                              f >> m
                              g >> i
                              h >> [
                              H >> o
                              I >> >
                              i >> X
                              j >> *
                              J >> S
                              K >> c
                              k >> u
                              L >> s
                              l >> v
                              m >> r
                              n >> E
                              o >> D
                              O >> d
                              p >> y
                              P >> z
                              q >> n
                              r >> &
                              S >> p
                              s >> q
                              t >> w
                              T >> x
                              u >> k
                              U >> K
                              v >> l
                              w >> 0
                              W >> G
                              X >> {
                              x >> t
                              y >> ,
                              Y >> F
                              z >> ±S
                              : >> ;
                              \\ >> -



                              H="Karen Standard"
                              lABTAkvFHWAkolaKELdAFqalanWElacO:laKv:sE:muIsE:sFlanolapLUB

                              H="Karen Normal"
                              vhRxhul.oGhuDvgcHs'h.ngvgEGHvgCd;vgcl;qH;rk>qH;q.vgeDvgysKR


                              On 19/06/2013 11:13 PM, rickah wrote:
                              >
                              >
                              >
                              > Many thanks, Flo. Most excellent.
                              >
                              > I had to update to NTB v7 (for the %RepWith%), and I had trouble with
                              > this one line relating to the list.txt, for obvious reasons:
                              >
                              > ^!SetListDelimiter »
                              >
                              > instead of:
                              >
                              > ^!SetListDelimiter >>
                              > --
                              >
                              > And a gold star for catching the "... n >> E, > U >> K" ...
                              >
                              > This will help tremendously since there are more than those two font
                              > variations to work with. Only recently does that language have a
                              > Unicode font with a standard keyboard layout. Working toward gradually
                              > changing the various older texts into the new font set will be so much
                              > easier now.
                              >
                              > Thanks again, Flo and Ian,
                              > Richard
                              >
                            • rickah
                              Message 14 of 19 , Jun 20, 2013
                                Ian,
                                I was not expecting nearly so much. The finished script is so complex yet compact (i.e., elegant) that it will take some time to study just to figure out what it does. I couldn't be happier that Flo took up this challenge.

                                You guys went far above and beyond what I expected. I was completely thrilled to be able to re-code an entire test page of text with one click, with no mis-coding that I could detect. Internet searches that are not possible with one font set may now work with another.

                                After some study and research, I'm going to see whether I can use what I learn to make it available in a web-shareable format. The people I know who could benefit from this script are not very computer-literate to begin with.

                                I cannot thank y'all enough.

                                Yours, Richard.

                                --- In ntb-clips@yahoogroups.com, Ian NTnerd <indiamcq@...> wrote:
                                >
                                > Richard,
                                >
                                > Part of the idea is for you to learn how clips go together.
                                > If we give you all you don't learn as much. :-)
                                >
                                > I changed the tab list delimiter so email does not mess up the tab.
                                > Now it is space greater than, greater than space.
                                > With
                                > ^!SetListDelimiter " >> "
                                --
                              • flo.gehrke
                                Message 15 of 19 , Jun 20, 2013
                                  --- In ntb-clips@yahoogroups.com, Ian NTnerd <indiamcq@...> wrote:
                                  >
                                  >
                                  > Here is my working code with start and end samples...

                                  If those sample lines on Richard's webpage show a correct transcoding, then there are two differences in Ian's result: his clip replaces 'A' with 'H' instead of 'h', and 'E' with 'h' instead of 'H'. It seems to work the other way round, as I mentioned in message #23887.

                                  There's also an issue with 'n'. In the table on Richard's webpage, 'n' is replaced with 'e'. This agrees with his sample lines where, at position #52, 'n' is replaced with 'e'. At #26, however, 'n' is replaced with 'E'. Why is that?

                                  In other words: if 'n' has to be replaced with 'e', where does an 'E' come from? As a replacement character, 'E' occurs neither in Richard's 54 replacements (see message #23877) nor in the table on his webpage.

                                  So, if I'm not mistaken, some fine-tuning is needed here.

                                  Regards,
                                  Flo
                                • rickah
                                  Message 16 of 19 , Jun 21, 2013
                                    Yes, you are again correct. I did make those changes but failed to note them all here. This is my LIST.TXT as it stands now.

                                    --
                                    LIST.TXT
                                    a»g
                                    A»h
                                    b»b
                                    B»R
                                    c»C
                                    D»f
                                    d»'
                                    E»H
                                    e»J
                                    F».
                                    f»m
                                    g»i
                                    G»A
                                    h»[
                                    H»o
                                    I»>
                                    i»X
                                    j»*
                                    J»S
                                    K»c
                                    k»u
                                    L»s
                                    l»v
                                    m»r
                                    n»e
                                    o»D
                                    O»d
                                    p»y
                                    P»z
                                    q»n
                                    r»&
                                    S»p
                                    s»q
                                    t»w
                                    T»x
                                    u»k
                                    U»K
                                    v»l
                                    w»0
                                    W»G
                                    X»{
                                    x»t
                                    y»,
                                    Y»F
                                    z»&S
                                    ,»<
                                    »A
                                    :»;
                                    …»µ
                                    .»$
                                    \\»-

                                    --end--

                                    KSTD keyboard character "G" and KNU character "A" are non-printing 'gaps' used merely for visual effect; they are not part of the written language. In this list, a KSTD space (" ") is changed to KNU "A " to both show a visible gap and add a word-delimiting space.

                                    Common "Western" punctuation marks will eventually come in handy for newer publications, but these are not found in traditional S'gaw.

                                    Cheers,
                                    Richard.
                                  • flo.gehrke
                                    Message 17 of 19 , Jun 21, 2013
                                      --- In ntb-clips@yahoogroups.com, "rickah" <richolland@...> wrote:
                                      >
                                      > Yes, you are again correct. I did make those changes but failed
                                      > to note them all here. This is my LIST.TXT as it stands now
                                      > --
                                      > LIST.TXT
                                      > a»g
                                      > A»h
                                      > (...)

                                      Richard,

                                      Two more ideas:

                                      1. If '»' is OK for you as the separator in LIST.TXT, then we no longer have to replace the pipe in that list, only the CRNL line breaks.

                                      2. Your new LIST.TXT contains...

                                      .»$

                                      Please note that the dot is a RegEx metacharacter which means 'any character except NL'. As a literal character it must be escaped with '\.' on the left:

                                      \.»$

                                      If you like, we could omit the escaping in the list and insert two command lines that automatically check each search character and add the backslash if needed -- see below.

                                      If you prefer this solution, then remove all backslashes on the left in LIST.TXT.

                                      With these ideas in mind, the latest version could be...


                                      ^!SetHintInfo Working...
                                      ^!SetScreenUpdate Off
                                      ^!SetClipboard ^$GetFileText(^$GetDocumentPath$LIST.TXT)$
                                      ^!SetClipboard ^$StrReplace(\R;»;^$GetClipboard$;RA)$
                                      ^!SetListDelimiter »
                                      ^!SetArray %Char%=^$GetClipboard$
                                      ^!Set %i%=0

                                      :Loop
                                      ^!Inc %i%
                                      ^!If ^%i% > ^%Char0% Out
                                      ^!Set %Search%=^%Char^%i%%
                                      ; New: Check for metacharacters
                                      ; --- Long line start---
                                      ^!IfMatch "(\.|\[|\(|\)|\^|\$|\*|\+|\?|\\|{|\|)" "^%Search%" Next Else Skip
                                      ; --- Long line end ---
                                      ^!Set %Search%=\^%Search%
                                      ^!Inc %i%
                                      ^!Set %RepWith%=^%Char^%i%%
                                      ^!Replace "(?<!\|)^%Search%" >> "|^%RepWith%" WARS
                                      ^!Goto Loop

                                      :Out
                                      ^!Replace "|" >> "" WATS
                                      ^!Info Finished!

                                      Regards,
                                      Flo
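The clip above rests on one trick: every replacement is written with a leading pipe, the search pattern refuses any match directly preceded by a pipe, and a final replace deletes all the pipes. For readers who want to experiment outside NoteTab, here is a rough Python sketch of the same idea (not NoteTab clip code; the function name `transcode_with_markers` is hypothetical). Like the clip, it assumes the document contains no pipe characters of its own, and it uses Python's `re.escape` in place of the clip's metacharacter check.

```python
import re


def transcode_with_markers(text, pairs):
    """Apply (search, rep_with) pairs in list order, the way the clip does.

    Each match becomes "|" + rep_with, and the negative lookbehind (?<!\\|)
    skips any match sitting directly after a "|" marker. All markers are
    stripped at the end, so the input must not contain "|" itself.
    """
    for search, rep_with in pairs:
        # re.escape stands in for the clip's metacharacter check,
        # e.g. "." becomes "\." so it only matches a literal dot.
        pattern = r"(?<!\|)" + re.escape(search)
        # A function replacement avoids any special meaning of "\" in rep_with.
        text = re.sub(pattern, lambda m: "|" + rep_with, text)
    # Final pass: remove the protective markers.
    return text.replace("|", "")
```

With the pairs ("M", "&Sl") then ("S", "p"), the input "M" comes out as "&pl"; with the two entries in the opposite order it comes out as "&Sl". The lookbehind only protects the first character after a pipe, so the rest of a multi-character replacement can still be rewritten by a later entry.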
                                    • rickah
                                      Message 18 of 19 , Jun 22, 2013
                                        That does simplify things. I don't have to guess which characters need to be escaped. Eventually, nearly all keyboard characters may need to be added to the list.


                                        --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
                                        > Richard,
                                        >
                                        > Two more ideas:
                                        >
                                        > > >
                                        > If you like we could omit the escaping in the list and insert two command lines which will automatically check any search character and add the backslash if needed -- see below.
                                        >
                                        > If you prefer this solution, then remove all backslashes on the left in LIST.TXT.
                                        >
                                        > Regarding these ideas, now the latest version could be...
                                        >

                                        ^!IfMatch "(\.|\[|\(|\)|\^|\$|\*|\+|\?|\\|{|\|)" "^%Search%" Next Else Skip
                                        ^!Set %Search%=\^%Search%

                                        I've put the list.txt in the notepad.exe folder so I don't lose track of it.

                                        Would you be able to help implement Ian's suggestion to use a clip list instead? I think it would make things much easier to share. (The clip H="_list" matches LIST.TXT.)

                                        > ^!SetListDelimiter ^p
                                        > ^!SetArray %Charpair%=^$GetClipText("list")$
                                        > ^!Set %i%=0

                                        > :Loop
                                        > ^!Inc %i%
                                        > ^!If ^%i% > ^%Charpair0% Out
                                        > ^!SetListDelimiter ^T
                                        > ^!SetArray %pair%=^%Charpair^%i%%
                                        > ^!Set %Search%=^%pair1%
                                        > ;^!Inc %i%
                                        > ^!Set %ReplaceWith%=^%pair2%
                                        > ^!Replace "(?<!\|)^%Search%" >> "|^%ReplaceWith%" WARS
                                        > ^!Goto Loop

                                        One very minor issue when setting list items: with "M|&Sl", the letter M is replaced by the complex character "&Sl". I found that this entry must come after "S|p"; otherwise "&Sl" becomes "&pl".

                                        This reminds me that I'll eventually be working with character codes such as ၁ and "\u1063\u103A". Do you foresee much difficulty?

                                        Rick
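The "M|&Sl" / "S|p" ordering effect follows from the pipe-marker approach: the lookbehind only guards the character right after a pipe, so the "S" inside "&Sl" is still exposed to a later "S" entry. A different technique, not one proposed in this thread, sidesteps list order entirely: scan the text once from left to right, take the longest matching search string at each position, and never re-scan inserted text. Below is a hedged Python sketch of that idea; `build_transcoder` is a hypothetical name. Because Python strings are Unicode, search and replacement strings such as "\u1063\u103A" need no special handling, assuming the documents are read as Unicode text.

```python
import re


def build_transcoder(pairs):
    """Return a function that transcodes text in one left-to-right pass.

    Assumes pairs is non-empty and no search string is empty. Characters
    that match no search string are copied through unchanged.
    """
    mapping = dict(pairs)
    # Python's regex alternation picks the first alternative that matches,
    # not the longest, so sort longer search strings first: "bJ" beats "b".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def transcode(text):
        # Each match is replaced exactly once; inserted text is never
        # re-scanned, so the order of the pairs does not matter.
        return pattern.sub(lambda m: mapping[m.group(0)], text)

    return transcode
```

Sorting longest-first is what lets two-character entries such as "bJ" win over their single-letter parts, so the list no longer has to be arranged by hand.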
                                      • rickah
                                        Message 19 of 19 , Jun 28, 2013
                                          I'm going to end up with three or four conversion lists, so having them as separate text files is the better idea.

                                          Thanks again, Flo.
                                          R. Holland.
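Keeping three or four conversion lists as separate files in the same one-pair-per-line format as LIST.TXT needs only a small loader. A minimal Python sketch follows; `parse_list` is a hypothetical name, and its rules (blank lines skipped, each line split at the first '»', whitespace preserved so that the space entry survives) are assumptions drawn from the list Richard posted.

```python
def parse_list(text, delimiter="»"):
    """Parse LIST.TXT-style text: one 'search»replacement' pair per line.

    Whitespace is kept on purpose, so an entry mapping a space survives.
    Blank lines are skipped. Each line is split at the first delimiter
    only, so a search string cannot itself contain the delimiter.
    """
    pairs = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line == "":
            continue
        search, found, rep_with = line.partition(delimiter)
        if not found or search == "":
            raise ValueError(f"line {number}: expected 'search{delimiter}replacement'")
        pairs.append((search, rep_with))
    return pairs
```

Reading a file would be something like `parse_list(pathlib.Path("LIST.TXT").read_text(encoding=...))`; the thread does not say which encoding the existing lists use, and '»' is a non-ASCII character, so the encoding has to be chosen to match how the files were saved.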