
[Clip] Re: alphanumeric character transcoding, by ones and pairs

  • flo.gehrke
    Message 1 of 19 , Jun 17, 2013
      --- In ntb-clips@yahoogroups.com, Ian NTnerd <indiamcq@...> wrote:
      >
      > Flo, I don't think you need to split first. Just extend your
      > initial idea so that the replacement that has two characters
      > has a | inserted between the characters. Then your first RE
      > replace works fine.

      Ian,

      Yes, I think you are right. I was misled by those 54 ^!Replace lines in message #23877. Only today, I see that there seems to be no difference between single characters and 2-character-sequences.

      So why the heck does rickah make it three replacements: 'bJ' -> 'bS', 'b' -> 'b', and 'J' -> 'S' if the result is always the same? Also why 'KY' -> 'cF', 'K' -> 'c', and 'Y' -> 'F' etc...?

      Unfortunately, the only one who could answer this has made himself scarce. So the job is much easier, indeed.

      Obviously, we have to add 'D' -> 'f'.
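A quick way to check the claim that the two-character sequences give the same result as replacing each character on its own. This is only a Python sketch outside NoteTab; the ten pairs and the single-character values are copied from the lists posted in this thread, and the function name `split_pairs` is made up for the illustration.

```python
# Two-character sequences and their replacements, as listed in the thread.
PAIRS = {
    "bJ": "bS", "dD": "'f", "kY": "uF", "KY": "cF", "mD": "rf",
    "mJ": "rS", "pJ": "yS", "PJ": "zS", "SJ": "pS", "sJ": "qS",
}

# Single-character replacements needed for those pairs (from LIST.TXT).
SINGLES = {
    "b": "b", "J": "S", "d": "'", "D": "f", "k": "u", "Y": "F",
    "K": "c", "m": "r", "p": "y", "P": "z", "S": "p", "s": "q",
}

def split_pairs(pairs, singles):
    """Sort pairs into those that equal the two single replacements and those that differ."""
    redundant, distinct = [], []
    for search, replacement in pairs.items():
        composed = "".join(singles[ch] for ch in search)
        if composed == replacement:
            redundant.append(search)
        else:
            distinct.append(search)
    return redundant, distinct

redundant, distinct = split_pairs(PAIRS, SINGLES)
print(len(redundant), distinct)  # prints: 10 []
```

With these values every pair is covered by the single-character entries, provided 'D' -> 'f' is in the list.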

      Another idea is to store the search and replace characters in a LIST.TXT file as follows:

      a|g
      A|H
      b|b
      B|R
      c|C
      D|f
      d|'
      etc...

      Among the search characters, the backslash is the only metacharacter that must be escaped: '\\'.

      The clip will load and convert this list into an array %Char%:

      ^!SetScreenUpdate Off
      ^!SetClipboard ^$GetFileText(^$GetDocumentPath$LIST.TXT)$
      ^!SetClipboard ^$StrReplace((\||\r\n);»;^$GetClipboard$;RA)$
      ^!SetListDelimiter »
      ^!SetArray %Char%=^$GetClipboard$
      ^!Set %i%=0

      :Loop
      ^!Inc %i%
      ^!If ^%i% > ^%Char0% Out
      ^!Set %Search%=^%Char^%i%%
      ^!Inc %i%
      ^!Set %RepWith%=^%Char^%i%%
      ^!Replace "(?<!\|)^%Search%" >> "|^%RepWith%" WARS
      ^!Goto Loop

      :Out
      ^!Replace "|" >> "" WATS

      Tested with NT Pro 7.1.
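
For readers without NoteTab, the same idea (replace one pair at a time, mark every replaced character with '|', and skip marked characters via the negative lookbehind) can be sketched in Python. This is an illustration under assumptions, not the clip itself: the function name `transcode` is made up, the search strings are treated as literal text via `re.escape` rather than as regular expressions, and the text is assumed to contain no '|' of its own.

```python
import re

def transcode(text, pairs):
    """Apply (search, replacement) pairs in order, protecting replaced characters."""
    for search, replacement in pairs:
        # (?<!\|) skips any match that directly follows the "|" marker,
        # i.e. a character that an earlier pass already produced.
        pattern = r"(?<!\|)" + re.escape(search)
        # A function replacement keeps the replacement text literal.
        text = re.sub(pattern, lambda m, r=replacement: "|" + r, text)
    # Remove the protection markers once every pair has been applied.
    return text.replace("|", "")

# 'E' becomes 'h', and that new 'h' must not be turned into '[' afterwards.
print(transcode("Eh", [("E", "h"), ("h", "[")]))  # prints: h[
```

Without the marker, the second pass would turn both characters into '['.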

      > I still think the before and after data strings are inconsistent
      > with the supplied change values.

      Yes, I agree with you. In the 54 ^!Replace commands, for example, 'A' is replaced with 'H' and not with 'h' as in the sample.

      Regards,
      Flo
    • flo.gehrke
      Message 2 of 19 , Jun 17, 2013
        --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
        >
        > Another idea is to store the search and replace characters in a LIST.TXT file as follows:
        >
        > a|g
        > etc...

        The complete LIST.TXT that is used in my clip:

        a|g
        A|H
        b|b
        B|R
        c|C
        D|f
        d|'
        E|h
        e|J
        F|.
        f|m
        g|i
        h|[
        H|o
        I|>
        i|X
        j|*
        J|S
        K|c
        k|u
        L|s
        l|v
        m|r
        n|e
        o|D
        O|d
        p|y
        P|z
        q|n
        r|&
        S|p
        s|q
        t|w
        T|x
        u|k
        v|l
        w|0
        W|G
        X|{
        x|t
        y|,
        Y|F
        z|±S
        :|;
        \\|-

        Note: The backslash (last entry) must be escaped '\\'.
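
The way the clip turns this file into an array (replace every '|' and line break with '»', then split on '»') can be mirrored in Python as a rough sketch. The function name `load_pairs` is made up for the illustration, and the sketch assumes that no entry contains '|' or '»' itself.

```python
import re

def load_pairs(list_text):
    """Flatten 'search|replacement' lines into a list of (search, replacement) tuples."""
    # Same trick as the clip: one common delimiter for fields and lines.
    flat = re.sub(r"\||\r?\n", "»", list_text.strip("\r\n"))
    items = flat.split("»")
    # Odd positions hold search characters, even positions their replacements.
    return list(zip(items[0::2], items[1::2]))

print(load_pairs("a|g\r\nA|H\r\nb|b\r\n"))
# prints: [('a', 'g'), ('A', 'H'), ('b', 'b')]
```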

        Flo
      • rickah
        Message 3 of 19 , Jun 18, 2013
          Thanks a million Ian and Flo. I didn't want to interrupt before now because I've been enjoying just watching the process of working out this problem. I'll spend today trying to understand what you two have suggested and making it work for my situation.
          I'll post my results.
          RicHolland
        • rickah
          Message 4 of 19 , Jun 18, 2013
            Flo and Ian,
            I apologize for not noticing this earlier, but you two seemed to be doing so well without me, I was afraid I'd break your focus.

            If "bJ" occurs together it becomes a single and different character.
            If "bj" occurs together it is two separate characters. These are completely case sensitive pairings.

            ONLY these character pairs follow this rule:
            "bJ" >> "bS"
            "dD" >> "'f"
            "kY" >> "uF"
            "KY" >> "cF"
            "mD" >> "rf"
            "mJ" >> "rS"
            "pJ" >> "yS"
            "PJ" >> "zS"
            "SJ" >> "pS"
            "sJ" >> "qS"

            > > Flo, I don't think you need to split first. Just extend your
            > > initial idea so that the replacement that has two characters
            > > has a | inserted between the characters. Then your first RE
            > > replace works fine.
            >
            > Ian,
            >
            > Yes, I think you are right. I was misled by those 54 ^!Replace lines in message #23877. Only today, I see that there seems to be no difference between single characters and 2-character-sequences.
            >
            > So why the heck does rickah make it three replacements: 'bJ' -> 'bS', 'b' -> 'b', and 'J' -> 'S' if the result is always the same? Also why 'KY' -> 'cF', 'K' -> 'c', and 'Y' -> 'F' etc...?
            >
            > Unfortunately, the only one who could answer this has made himself scarce. So the job is much easier, indeed.
          • rickah
            Message 5 of 19 , Jun 18, 2013
              Flo and Ian,
              To display what function this all serves, I created a simple webpage. The two fonts must be installed to view it properly.
              web page: https://sites.google.com/site/my37s8ks8a/Latin-Sgaw
              fonts zip: https://sites.google.com/site/my37s8ks8a/2KarenFonts.zip

              I found a character error in my original sample line. There is one additional 'A' in this line. I hope this didn't cause much trouble:
              1: lABTAkvFHWAkolaKELdAFqalanWElacO:laKv:sE:muIsE:sFlanolapLUB

              =====

              > In array %Char%, you see a sequence where each character is followed by the character it has to be replaced with. Note: In the array, a string like 'd' must follow 'dD'.
              >
              > The point is that an 'h' which has replaced an 'E' must not be replaced again with '['. To prevent this, each character that has already been replaced is protected with '|'. So '|h' won't get replaced with '[' again. This is achieved by the Negative Lookbehind '(?<!\|)'.
              >
              > Probably, there are more issues in this but I hope it might be useful as a first approach...
              >
              > Regards,
              > Flo
              >
            • flo.gehrke
              Message 6 of 19 , Jun 18, 2013
                --- In ntb-clips@yahoogroups.com, "rickah" <richolland@...> wrote:
                >
                > Flo and Ian, To display what function this all serves, I created
                > a simple webpage...

                I had a look at your webpage and tested those two lines (Karen Standard/Karen Normal Unique) with my clip (as posted with messages #23882 and #23883).

                I get to a correct transcoding of your KSTD sample...

                lABTAkvFHWAkolaKELdAFqalanWElacO:laKv:sE:muIsE:sFlanolapLUB

                to your KNU sample...

                vhRxhul.oGhuDvgcHs'h.ngvgEGHvgCd;vgcl;qH;rk>qH;q.vgeDvgysKR

                However, unlike your 54 replacements, this works only after changing my LIST.TXT as follows...

                A >> h
                E >> H
                n >> E
                U >> K

                Regards,
                Flo
              • rickah
                Message 7 of 19 , Jun 19, 2013
                  Many thanks, Flo. Most excellent.

                  I had to update to NTB v7 (for the %RepWith%), and I had trouble with this one line relating to the list.txt, for obvious reasons:

                  ^!SetListDelimiter »

                  instead of:

                  ^!SetListDelimiter >>
                  --

                  And a gold star for catching the "... n >> E, U >> K" ...

                  This will help tremendously since there are more than those two font variations to work with. Only recently has that language had a Unicode font with a standard keyboard layout. Working toward gradually changing the various older texts into the new font set will be so much easier now.

                  Thanks again, Flo and Ian,
                  Richard

                • Ian NTnerd
                  Message 8 of 19 , Jun 19, 2013
                    Richard,

                    Part of the idea is for you to learn how clips go together. If we give
                    you everything, you don't learn as much. :-)

                    I changed the list delimiter from a tab so that email does not mangle it.
                    It is now space, greater-than, greater-than, space, set with:
                    ^!SetListDelimiter " >> "

                    Here is my working code with start and end samples.

                    If you use it to go the other way (Normal to Standard), the [
                    character must be escaped as \[ in the list.

                    H="Karen Standard to Normal"
                    ^!SetListDelimiter ^p
                    ^!SetArray %Charpair%=^$GetClipText("list")$
                    ^!Set %i%=0

                    :Loop
                    ^!Inc %i%
                    ^!If ^%i% > ^%Charpair0% Out
                    ^!SetListDelimiter " >> "
                    ^!SetArray %pair%=^%Charpair^%i%%
                    ^!Set %Search%=^%pair1%
                    ^!Set %ReplaceWith%=^%pair2%
                    ^!SetDebug Off
                    ^!If ^$StrSize("^%ReplaceWith%")$ = 2 addchar ELSE noadd
                    :addchar
                    ^!Set %ReplaceWith%=^$StrCopyLeft("^%ReplaceWith%";1)$|^$StrCopyRight("^%ReplaceWith%";1)$
                    :noadd
                    ^!Replace "(?<!\|)^%Search%" >> "|^%ReplaceWith%" WARS
                    ^!Goto Loop

                    :Out
                    ^!Replace "|" >> "" WATS

                    H=";List follows has the form character1 space greater_than greater_than space character2"


                    H="_list"
                    bJ >> bS
                    dD >> 'f
                    kY >> uF
                    KY >> cF
                    mD >> rf
                    mJ >> rS
                    pJ >> yS
                    PJ >> zS
                    SJ >> pS
                    sJ >> qS
                    a >> g
                    A >> H
                    b >> b
                    B >> R
                    c >> C
                    d >> '
                    E >> h
                    e >> J
                    F >> .
                    f >> m
                    g >> i
                    h >> [
                    H >> o
                    I >> >
                    i >> X
                    j >> *
                    J >> S
                    K >> c
                    k >> u
                    L >> s
                    l >> v
                    m >> r
                    n >> E
                    o >> D
                    O >> d
                    p >> y
                    P >> z
                    q >> n
                    r >> &
                    S >> p
                    s >> q
                    t >> w
                    T >> x
                    u >> k
                    U >> K
                    v >> l
                    w >> 0
                    W >> G
                    X >> {
                    x >> t
                    y >> ,
                    Y >> F
                    z >> ±S
                    : >> ;
                    \\ >> -



                    H="Karen Standard"
                    lABTAkvFHWAkolaKELdAFqalanWElacO:laKv:sE:muIsE:sFlanolapLUB

                    H="Karen Normal"
                    vhRxhul.oGhuDvgcHs'h.ngvgEGHvgCd;vgcl;qH;rk>qH;q.vgeDvgysKR
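
The " >> " list format can also be read outside NoteTab. The following Python sketch is an illustration only; the function name `parse_list` is made up. It splits each line on the first " >> " so that an entry such as "I >> >", whose replacement is itself a greater-than sign, still parses correctly.

```python
def parse_list(lines, delimiter=" >> "):
    """Turn 'search >> replacement' lines into (search, replacement) tuples."""
    pairs = []
    for line in lines:
        if delimiter not in line:
            continue  # skip blank or malformed lines
        # Split only once, so a '>' in the replacement is left alone.
        search, replacement = line.split(delimiter, 1)
        pairs.append((search, replacement))
    return pairs

print(parse_list(["bJ >> bS", "I >> >", "", "h >> ["]))
# prints: [('bJ', 'bS'), ('I', '>'), ('h', '[')]
```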


                  • rickah
                    Message 9 of 19 , Jun 20, 2013
                      Ian,
                      I was not expecting nearly so much. The finished script is so complex yet compact (i.e., elegant) that it will take some time to study just to figure out what it does. I couldn't be happier that Flo took up this challenge.

                      You guys went far above and beyond what I expected. I was completely thrilled to be able to re-code an entire test page of text with one click, with no mis-coding that I could detect. Now, internet searches that are not possible using one font set may work when using another.

                      After some study and research, I'm going to see if I can use what I learn to make it available in a web-share-able format. The people I know who could benefit from this script are not very computer literate to begin with.

                      I cannot thank y'all enough.

                      Yours, Richard.

                      --- In ntb-clips@yahoogroups.com, Ian NTnerd <indiamcq@...> wrote:
                      >
                      > Richard,
                      >
                      > Part of the idea is for you to learn how clips go together.
                      > If we give you all you don't learn as much. :-)
                      >
                      > I changed the tab list delimiter so email does not mess up the tab.
                      > Now it is space greater than, greater than space.
                      > With
                      > ^!SetListDelimiter " >> "
                      --
                    • flo.gehrke
                      Message 10 of 19 , Jun 20, 2013
                        --- In ntb-clips@yahoogroups.com, Ian NTnerd <indiamcq@...> wrote:
                        >
                        >
                        > Here is my working code with start and end samples...

                        If those sample lines on Richard's webpage show a correct transcoding then there are two differences in Ian's result: His clip replaces 'A >> H' instead of 'h', and 'E >> h' instead of 'H'. It seems to work the other way round as I mentioned in message #23887.

                        There's also an issue with 'n'. In the table on Richard's webpage, 'n' is replaced with 'e'. This accords with his sample lines where, at position #52, 'n' is replaced with 'e'. At #26, however, 'n' is replaced with 'E'. Why this?

                        In other words: If 'n' has to be replaced with 'e' where does an 'E' come from? As a replace character, 'E' doesn't occur either in Richard's 54 replacements (see message #23877) or in the table on his webpage.

                        So, if I'm not mistaken, some fine-tuning is needed here.
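
The 'n' discrepancy can be found mechanically by comparing the two sample lines position by position. The Python sketch below is only an illustration (the function name `inconsistent_mappings` is made up); it assumes both samples have the same length, which holds for the two lines posted in this thread.

```python
from collections import defaultdict

KSTD = "lABTAkvFHWAkolaKELdAFqalanWElacO:laKv:sE:muIsE:sFlanolapLUB"
KNU = "vhRxhul.oGhuDvgcHs'h.ngvgEGHvgCd;vgcl;qH;rk>qH;q.vgeDvgysKR"

def inconsistent_mappings(source, target):
    """Return source characters that map to more than one target character."""
    if len(source) != len(target):
        raise ValueError("samples must have the same length")
    seen = defaultdict(dict)  # source char -> {target char: [1-based positions]}
    for pos, (src, dst) in enumerate(zip(source, target), start=1):
        seen[src].setdefault(dst, []).append(pos)
    return {src: dsts for src, dsts in seen.items() if len(dsts) > 1}

print(inconsistent_mappings(KSTD, KNU))
# prints: {'n': {'E': [26], 'e': [52]}}
```

Every other character in the sample maps to exactly one replacement.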

                        Regards,
                        Flo
                      • rickah
                        Message 11 of 19 , Jun 21, 2013
                          Yes, you are again correct. I did make those changes but failed to note them all here. This is my LIST.TXT as it stands now.

                          --
                          LIST.TXT
                          a»g
                          A»h
                          b»b
                          B»R
                          c»C
                          D»f
                          d»'
                          E»H
                          e»J
                          F».
                          f»m
                          g»i
                          G»A
                          h»[
                          H»o
                          I»>
                          i»X
                          j»*
                          J»S
                          K»c
                          k»u
                          L»s
                          l»v
                          m»r
                          n»e
                          o»D
                          O»d
                          p»y
                          P»z
                          q»n
                          r»&
                          S»p
                          s»q
                          t»w
                          T»x
                          u»k
                          U»K
                          v»l
                          w»0
                          W»G
                          X»{
                          x»t
                          y»,
                          Y»F
                          z»&S
                          ,»<
                          »A
                          :»;
                          …»µ
                          .»$
                          \\»-

                          --end--

                          KSTD keyboard character "G" and KNU character "A" are non-printing 'gaps' merely for visual effect but are not part of the written language. In this list, a KSTD space (" ") is changed to KNU "A " to both show a visible gap and add a word delimiting space.

                          Common "Western" punctuation marks will eventually come in handy for newer publications, but these are not found in traditional S'gaw.

                          Cheers,
                          Richard.
                        • flo.gehrke
                          Message 12 of 19 , Jun 21, 2013
                            --- In ntb-clips@yahoogroups.com, "rickah" <richolland@...> wrote:
                            >
                            > Yes, you are again correct. I did make those changes but failed
                            > to note them all here. This is my LIST.TXT as it stands now
                            > --
                            > LIST.TXT
                            > a»g
                            > A»h
                            > (...)

                            Richard,

                            Two more ideas:

                            1. If '»' is OK for you as the separator in LIST.TXT, then we no longer have to replace the pipe in that list, only the line breaks (CRLF).

                            2. Your new LIST.TXT contains...

                            .»$

                            Please note that the dot is a RegEx metacharacter which means 'any character except NL'. As a literal character it must be escaped with '\.' on the left:

                            \.»$

                            If you like we could omit the escaping in the list and insert two command lines which will automatically check any search character and add the backslash if needed -- see below.

                            If you prefer this solution then remove all backslashes on the left in LIST.TXT.

                            With these ideas applied, the latest version could be...


                            ^!SetHintInfo Working...
                            ^!SetScreenUpdate Off
                            ^!SetClipboard ^$GetFileText(^$GetDocumentPath$LIST.TXT)$
                            ^!SetClipboard ^$StrReplace(\R;»;^$GetClipboard$;RA)$
                            ^!SetListDelimiter »
                            ^!SetArray %Char%=^$GetClipboard$
                            ^!Set %i%=0

                            :Loop
                            ^!Inc %i%
                            ^!If ^%i% > ^%Char0% Out
                            ^!Set %Search%=^%Char^%i%%
                            ; New: Check for metacharacters
                            ; --- Long line start---
                            ^!IfMatch "(\.|\[|\(|\)|\^|\$|\*|\+|\?|\\|{|\|)" "^%Search%" Next Else Skip
                            ; --- Long line end ---
                            ^!Set %Search%=\^%Search%
                            ^!Inc %i%
                            ^!Set %RepWith%=^%Char^%i%%
                            ^!Replace "(?<!\|)^%Search%" >> "|^%RepWith%" WARS
                            ^!Goto Loop

                            :Out
                            ^!Replace "|" >> "" WATS
                            ^!Info Finished!
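
To see why the check matters, here is a small Python sketch of the same idea, not NoteTab code. The helper name `escape_if_meta` is made up, the character set is copied from the ^!IfMatch pattern above, and the helper is meant for single search characters only.

```python
import re

# The metacharacters tested by the ^!IfMatch line: . [ ( ) ^ $ * + ? \ { |
META = set(".[()^$*+?\\{|")

def escape_if_meta(ch):
    """Prefix a single search character with a backslash if it is a regex metacharacter."""
    return "\\" + ch if ch in META else ch

# An unescaped dot matches every character, so everything is replaced.
print(re.sub(".", "$", "a.b"))                  # prints: $$$
# The escaped dot matches only the literal dot.
print(re.sub(escape_if_meta("."), "$", "a.b"))  # prints: a$b
```

In Python itself, `re.escape` does this job for strings of any length; the helper only mirrors the two clip lines.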

                            Regards,
                            Flo
                          • rickah
                            Message 13 of 19 , Jun 22, 2013
                              That does simplify things. I don't have to guess which characters need to be escaped. Eventually, nearly all keyboard characters may need to be added to the list.


                              --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
                              > Richard,
                              >
                              > Two more ideas:
                              >
                              > > >
                              > If you like we could omit the escaping in the list and insert two command lines which will automatically check any search character and add the backslash if needed -- see below.
                              >
                              > If you prefer this solution then remove all backslashes on the left in LIST.TXT.
                              >
                              > Regarding these ideas, now the latest version could be...
                              >

                              ^!IfMatch "(\.|\[|\(|\)|\^|\$|\*|\+|\?|\\|{|\|)" "^%Search%" Next Else Skip
                              ^!Set %Search%=\^%Search%

                              I've put the list.txt in the notepad.exe folder so I don't lose track of it.

                              Would you be able to help implement Ian's suggestion to use a clip list instead? I think it would make things much easier to share. (The clip H="_list" matches the LIST.TXT.)

                              > ^!SetListDelimiter ^p
                              > ^!SetArray %Charpair%=^$GetClipText("list")$
                              > ^!Set %i%=0

                              > :Loop
                              > ^!Inc %i%
                              > ^!If ^%i% > ^%Charpair0% Out
                              > ^!SetListDelimiter ^T
                              > ^!SetArray %pair%=^%Charpair^%i%%
                              > ^!Set %Search%=^%pair1%
                              > ;^!Inc %i%
                              > ^!Set %ReplaceWith%=^%pair2%
                              > ^!Replace "(?<!\|)^%Search%" >> "|^%ReplaceWith%" WARS
                              > ^!Goto Loop

                              One very minor problem when setting list items: with "M|&Sl", the letter M is replaced by the complex character "&Sl". I found that this entry must follow "S|p", or "&Sl" becomes "&pl".
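
The ordering problem arises because only the first character of a multi-character replacement is protected by '|'. One possible way around it, extending Ian's idea of putting a '|' between the characters, is to put the marker in front of every replacement character. The Python sketch below illustrates this under assumptions: the function name `transcode` and the `protect_all` switch are made up, search strings are treated as literal text, and the text must not contain '|' itself.

```python
import re

def transcode(text, pairs, protect_all=True):
    """Apply pairs in order; optionally protect every character of each replacement."""
    for search, replacement in pairs:
        if protect_all:
            # "&Sl" becomes "|&|S|l", so no later pass can touch any part of it.
            marked = "".join("|" + ch for ch in replacement)
        else:
            # Only the first character is protected, as in the posted clip.
            marked = "|" + replacement
        text = re.sub(r"(?<!\|)" + re.escape(search), lambda m, r=marked: r, text)
    return text.replace("|", "")

pairs = [("M", "&Sl"), ("S", "p")]
print(transcode("MS", pairs, protect_all=False))  # prints: &plp
print(transcode("MS", pairs, protect_all=True))   # prints: &Slp
```

With every character protected, the position of the "M" entry in the list no longer matters.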

                              This reminds me that I'll eventually be working with character codes such as: ၁ and "\u1063\u103A". Do you foresee much difficulty?

                              Rick
                            • rickah
                              Message 14 of 19 , Jun 28, 2013
                                I'm going to end up with three or four conversion lists, so having them as separate text files is the better idea.

                                Thanks again, Flo.
                                R. Holland.