Re: Finding gaps in a sequence

  • flo.gehrke
    Message 1 of 29, Dec 1, 2011
      Joy,

      I also went through your clip again (messages #22230, #22245). I like that formula '^$Calc(^%V1%*676 + ^%V2%*26 + ^%V3%)$' which, actually, seems to be the heart of your solution.

      So I combined it with some ideas of my first concept and managed to speed up your clip significantly. Originally, your clip needed 78 seconds (on my notebook) to check a list of 10,000 codes. The following version does it in 9 seconds:


      ^!SetHintInfo Working...
      ; Assign code list to array %List%
      ^!SetListDelimiter ^%NL%
      ^!SetArray %List%=^$GetText$
      ^!Set %AZ%="abcdefghijklmnopqrstuvwxyz"
      ^!Set %i%=1

      :CodeToInt
      ; Save current code to variable for later output in case of gap
      ^!Set %CurrCode%=^%List^%i%%
      ; Convert code to number(with Joy's formula)
      ^!Set %First%=^$Convert(^%List^%i%%)$
      ^!Inc %First%
      ^!Inc %i%
      ^!If ^%i% > ^%List0% Out
      ^!Set %Second%=^$Convert(^%List^%i%%)$
      ^!IfSame ^%First% ^%Second% CodeToInt Else False

      :False
      ^!Append %Gaps%=^%CurrCode%^P
      ^!Goto CodeToInt

      :Out
      ^!IfEmpty ^%Gaps% Next Else Skip_2
      ^!Info No gaps!
      ^!Goto Skip_3
      ^!Toolbar New Document
      ^!InsertText Gap found after...^P^%Gaps%
      ^!Toolbar Second Window
      ^!ClearVariables


      The subclip with custom function ^$Convert$ and your formula is...

      ^!Set %C1%=^$StrIndex(^&;1)$
      ^!Set %C2%=^$StrIndex(^&;2)$
      ^!Set %C3%=^$StrIndex(^&;3)$
      ^!Set %V1%=^$StrPos(^%C1%;^%AZ%;0)$
      ^!Set %V2%=^$StrPos(^%C2%;^%AZ%;0)$
      ^!Set %V3%=^$StrPos(^%C3%;^%AZ%;0)$
      ^!Result ^$Calc(^%V1%*676 + ^%V2%*26 + ^%V3%)$
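
As a cross-check outside NoteTab, the same logic can be sketched in Python. This is not the clip itself: the names `code_to_int` and `find_gaps` are made up for the illustration, and the sketch assumes a sorted list of three-letter lower-case codes. It converts each code with the 676/26/1 formula and reports every code whose successor is not exactly one higher.

```python
def code_to_int(code: str) -> int:
    """Base-26 value of a three-letter code, with a=0 ... z=25."""
    v1, v2, v3 = (ord(ch) - ord("a") for ch in code)
    return v1 * 676 + v2 * 26 + v3


def find_gaps(codes: list[str]) -> list[str]:
    """Return each code whose successor in the list is not the next value."""
    gaps = []
    for current, following in zip(codes, codes[1:]):
        if code_to_int(current) + 1 != code_to_int(following):
            gaps.append(current)
    return gaps


if __name__ == "__main__":
    sample = ["zbx", "zby", "zbz", "zca", "zcc", "zcd"]
    print("Gap found after:", find_gaps(sample))
```

As in the clip, the gaps are collected first and reported once at the end.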


      Thanks again for your proposal! Maybe you'll have a look at this revised version...

      Regards,
      Flo


      --- In ntb-clips@yahoogroups.com, "joy8388608" <mycroftj@...> wrote:
      >
      >
      >
      > --- In ntb-clips@yahoogroups.com, "Eb" <ebbtidalflats@> wrote:
      > >
      > > Hi Flo,
      > >
      > > You are right about what the hex conversion was supposed to do.
      > > In the meantime I found my original char to hex clip, which only converted a single digit. I applied the single-digit approach to your problem. While I got it to work, it just raised another problem.
      > >
      > > The alphabet is like a base-26 number set (English alphabet), after shifting a to zero. Straight conversion to numbers creates gaps, where it rolls to the next digit, i.e. aaz --> aba has a gap of 26!, the value of the next digit, and azz to baa has a gap much larger!
      >
      >
      > Sorry if I misunderstood you but I'll reply just in case in order to save you possible extra work and confusion...
      >
      > You said aaz to aba has a gap but you correctly noted a=0...z=25.
      > Therefore, aaz=(0*26^2 + 0*26 + 25)=25 and aba=(0*26^2 + 1*26 + 0)=26 - No gap. Likewise, azz=675 and baa=676. Again, no gap.
      >
      > Hope this helps, sorry if I misunderstood.
      >
      > Joy
      >
    • Eb
      Message 2 of 29, Dec 1, 2011
        Flo,

        --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:

        > I've tested it successfully. In a list of 10,000 3-digit-alpha-codes it needs 118 seconds to find a gap.


        Is that fast or slow?


        > Maybe it's a bit complicated to see those gaps because it outputs numbers and not the code -- but never mind. What matters here is the basic concept.


        To change the output to the original code, just make a copy of the 'codes' array and use the unadulterated copy to display the gap (or display both the original code and the numeric code, since the numbers give a clearer picture of how large the gap is).


        Cheers
      • Art Kocsis
        Message 3 of 29, Dec 1, 2011
          At 11/30/2011 13:28, Eb wrote:
          >The alphabet is like a base-26 number set (English alphabet), after
          >shifting a to zero. Straight conversion to numbers creates gaps, where it
          >rolls to the next digit, i.e. aaz --> aba has a gap of 26!, the value of
          >the next digit, and azz to baa has a gap much larger!

          If I am interpreting correctly what you said here, the statement is not
          correct - there is no gap using the alphabet as symbols for a base 26
          numbering system.

          Any integer (including negative ones) may be used as a base for counting
          sequentially; a number then takes the form sum(d(i) * b^i), where d(i) is
          the ith "digit" (right to left, 0 based) and "b" is the base, with each
          digit ranging from 0 to base-1. In the case of using the alphabet symbols
          to represent base 26 digits: a=0, b=1 ... z=25 and the base 10 value of
          any such number would be d2 * 26^2 + d1 * 26^1 + d0 * 26^0 or
          d2*676 + d1*26 + d0*1.

          Thus aaz = 0*676 + 0*26 + 25*1 = 25 and aba = 0*676 + 1*26 +0*1 = 26
          (no gap)
          Also azz = 0*676 + 25*26 + 25*1 = 675 and baa = 1*676 + 0*26 +0*1 = 676
          (again, no gap)
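
The positional formula above can be checked with a few lines of Python. This is only an illustrative sketch; `base26_value` is a hypothetical helper name, and it assumes the input contains nothing but lower-case a to z. It evaluates sum(d(i) * 26^i) by multiplying the running total by 26 before adding each digit, so it works for codes of any length.

```python
def base26_value(code: str) -> int:
    """Evaluate a lower-case alphabetic code as a base-26 number (a=0 ... z=25)."""
    value = 0
    for ch in code:
        # Shift the digits seen so far one place to the left, then add the new digit.
        value = value * 26 + (ord(ch) - ord("a"))
    return value


if __name__ == "__main__":
    for code in ("aaz", "aba", "azz", "baa"):
        print(code, base26_value(code))
```

For the four codes discussed here it gives 25, 26, 675 and 676, so neither rollover leaves a gap.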

          Your code does this correctly, so the statement may just be ambiguously worded.

          BTW, very clever use of ^!Inc & ^!Dec to do arithmetic! I'll have to
          remember that.

          I have noted that none of the suggested solutions have done any input data
          verification but all assume that each line truly begins with a three-character (lower
          case) alpha code. Your use of ^$GetDocMatchAll("^[a-z]{3}")$ to
          extract the sequence codes would seem to offer a simple, one-line way to
          verify that assumption: just compare the size of the ^%codes% array to the
          line count of the source document.

          ^!If ^$GetParaCount$ <> ^%codes0% ^!Continue Input data error - missing sequence code(s)
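
The same check can be sketched in Python for anyone testing their data outside NoteTab. The function name `lines_missing_code` is invented for this example, and it assumes one record per line. Instead of comparing two counts, it lists the offending line numbers, which makes a bad record easier to find.

```python
import re

# Same pattern as in the clip: three lower-case letters at the start of the line.
CODE_AT_LINE_START = re.compile(r"^[a-z]{3}")


def lines_missing_code(text: str) -> list[int]:
    """Return 1-based numbers of lines that do not begin with three lower-case letters."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if not CODE_AT_LINE_START.match(line)
    ]


if __name__ == "__main__":
    sample = "aaa first record\nab2 broken record\naac third record\n"
    print("Lines without a valid code:", lines_missing_code(sample))
```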


          Namaste', Art
        • Eb
          Message 4 of 29, Dec 2, 2011
            Joy,

            I observed a gap while using a non-mathematical (==>) technique to convert from base 26 (the alphabet) to base 16 by using the ascii codes:

            aaz ==> 0 x 41 41 5A = 4,276,570
            aba ==> 0 x 41 42 41 = 4,276,801
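
These two figures can be reproduced with a short Python sketch. The helper name `ascii_hex_value` is made up here, and the upper-casing step is an assumption: 0x41 and 0x5A are the ASCII codes of upper-case A and Z, so the sketch converts the code to upper case before concatenating the bytes.

```python
def ascii_hex_value(code: str) -> int:
    """Concatenate the ASCII bytes of the upper-cased code and read them as one number."""
    return int.from_bytes(code.upper().encode("ascii"), "big")


if __name__ == "__main__":
    low = ascii_hex_value("aaz")
    high = ascii_hex_value("aba")
    print(hex(low), low)
    print(hex(high), high)
    print("difference:", high - low)
```

Each character occupies a full byte (256 possible values) although only 26 of them are used, which is why consecutive codes are no longer consecutive numbers at a rollover.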

            Once I shifted to the base 26 array approach, I may have stayed in the haze of non-math confusion for a bit longer <g>.


            Eb


          • Eb
            Message 5 of 29, Dec 2, 2011
              Yes, I was still confused by my earlier attempt to convert character codes to hex codes using ascii.

              My test clip still had elements of hex code in it.

              Color me embarrassed.

              Eb

              --- In ntb-clips@yahoogroups.com, Art Kocsis <artkns@...> wrote:
              >
              > At 11/30/2011 13:28, Eb wrote:
              > >The alphabet is like a base-26 number set (English alphabet), after
              > >shifting a to zero. Straight conversion to numbers creates gaps, where it
              > >rolls to the next digit, i.e. aaz --> aba has a gap of 26!, the value of
              > >the next digit, and azz to baa has a gap much larger!
              >
              > If I am interpreting correctly what you said here, the statement is not
              > correct - there is no gap using the alphabet as symbols for a base 26
              > numbering system.
            • ebbtidalflats
              Message 6 of 29, Dec 2, 2011
                Hi Art,

                I suspect that none of the people offering solutions are privy to the format of the data file. So verifying input must be left to Flo.

                For example, the ^$GetDocMatchAll$ statement must include the field delimiter to avoid also matching the first three characters of longer words, which might not be index codes at all.


                Cheers


                Eb

                --- In ntb-clips@yahoogroups.com, Art Kocsis <artkns@...> wrote:
                > ...
                > I have noted that none of the suggested solutions have done any input data
                > verification but all assume that each line truly begins with a three-character (lower
                > case) alpha code. Your use of ^$GetDocMatchAll("^[a-z]{3}")$ to
                > extract the sequence codes would seem to offer a simple, one-line way to
                > verify that assumption: just compare the size of the ^%codes% array to the
                > line count of the source document.
                >
                > ^!If ^$GetParaCount$ <> ^%codes0% ^!Continue Input data error - missing sequence code(s)
                >
                >
                > Namaste', Art
                >
              • flo.gehrke
                Message 7 of 29, Dec 2, 2011
                  > --- In ntb-clips@yahoogroups.com, Art Kocsis <artkns@> wrote:
                  > I have noted that none of the suggested solutions have done any
                  > input data verification but all assume that each line truly
                  > begins with a three-character (lower case) alpha code...

                  --- In ntb-clips@yahoogroups.com, "ebbtidalflats" <ebbtidalflats@...> wrote:
                  >
                  > Hi Art,
                  >
                  > I suspect that none of the people offering solutions are privy to
                  > the format of the data file. So verifying input must be left to Flo.

                  Friends,

                  I started this topic with message #22221 writing...

                  > I've got a database where each record is indexed with an alpha-code
                  > from 'aaa' to 'zzz'. Every now and then, I want to find out if there
                  > is a gap in a sorted list of these codes. There's a gap, for
                  > example, in...
                  >
                  > zbx
                  > zby
                  > zbz
                  > zca
                  > zcc
                  > zcd

                  So why speculate about the format of the data? Why invent characters and strings which actually are not there?

                  "For we write none other things unto you,
                  than what ye read or acknowledge..."
                  Corinthians 2, 1:13

                  Flo
                • joy8388608
                  Message 8 of 29, Dec 5, 2011
                    Flo -

                    Very interesting. Your clip is much faster than mine even when I turned ScreenUpdate off. Mine took 41 seconds and yours took 15 for 17550 lines (aaa to zzz with 26 .rr lines removed). Why? I'm not sure. Perhaps it is because you work with an array, even though the lines on a screen are probably just another type of array.

                    This has been fun, interesting, and I've learned several new things.

                    Oh, yes. You don't have to, but as I posted previously, you can modify the value of %AZ% to "bcdefghijklmnopqrstuvwxyz" (remove the 'a') for correctness.
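
Why the change is optional can be shown with a small Python sketch. It assumes that ^$StrPos$ returns a 1-based position and 0 when the character is not found; `convert`, `AZ_FULL` and `AZ_NO_A` are names invented for the illustration. With the full alphabet every digit is one too high, which shifts every result by the same constant (676 + 26 + 1 = 703), so consecutive codes still differ by exactly 1.

```python
AZ_FULL = "abcdefghijklmnopqrstuvwxyz"
AZ_NO_A = AZ_FULL[1:]  # "bcdefghijklmnopqrstuvwxyz"


def convert(code: str, alphabet: str) -> int:
    """Mimic the subclip: look up each letter's position, then apply the formula."""
    # str.find returns -1 when absent and a 0-based index otherwise, so adding 1
    # gives a 1-based position with 0 meaning "not found".
    v1, v2, v3 = (alphabet.find(ch) + 1 for ch in code)
    return v1 * 676 + v2 * 26 + v3


if __name__ == "__main__":
    for code in ("aaa", "aaz", "aba", "zzz"):
        print(code, convert(code, AZ_FULL), convert(code, AZ_NO_A))
```

Gap detection only compares neighbouring values, so the constant offset does not affect the result; removing the 'a' simply makes the numbers true base-26 values starting at 0.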

                    Thanks,
                    Joy

                    P.S. On the off chance anyone else (still) wants to play with this for learning purposes, I wrote a quick clip to generate the lines aaa to zzz. Let me know if anyone wants me to post the code.


                  • flo.gehrke
                    Message 9 of 29, Dec 5, 2011
                      --- In ntb-clips@yahoogroups.com, "joy8388608" <mycroftj@...> wrote:
                      >
                      > Flo -
                      >
                      > Very interesting. Your clip is much faster than mine even
                      > when I turned ScreenUpdate off. Mine took 41 seconds and
                      > yours took 15 for 17550 lines (aaa to zzz with 26 .rr lines
                      > removed). Why? I'm not sure...


                      Joy,

                      I think there are three main reasons for that:

                      1. Assigning the whole list to an array

                      2. Calculating ^$ConvertTo26$ only twice -- it's done three times in your clip

                      3. Gathering up the gaps with ^!Append and outputting them all at once -- no ^!InsertText

                      > I wrote a quick clip to generate the lines aaa to zzz. Let
                      > me know if anyone wants me to post the code.

                      I put my hand up and would enjoy seeing that clip!

                      Flo
                    • joy8388608
                      Message 10 of 29, Dec 7, 2011

                        My pleasure. Joy

                        Generate Base 26 numbers
                        ; by Joy
                        ^!Continue This will generate 17576 lines from aaa to zzz.

                        ^!SKIP Leave Screen update on? (Slower...)
                        ^!Setscreenupdate OFF
                        ^!StatusShow Generating sequences aaa to zzz...

                        ; Start with aaa
                        ^!Set %I%=-1

                        :LoopStart
                        ^!Inc %I%
                        ^!Set %Num%=^%I%

                        ; Find value of first digit (of 3) (will be 0 to 25)
                        ^!Set %x%=^$Calc(INT(^%Num%/676))$

                        ; Convert first digit to letter (will be a to z)
                        ^!Set %B26%=^$DecToChar(^$Calc(^%x%+97)$)$

                        ; adjust value of current number
                        ^!Set %Num%=^$Calc(^%Num% - (^%x%*676))$

                        ; Find value of second digit (of 3) (will be 0 to 25)
                        ^!Set %x%=^$Calc(INT(^%Num%/26))$

                        ; Convert second digit to letter (will be a to z) and append
                        ^!Set %B26%=^%B26%^$DecToChar(^$Calc(^%x%+97)$)$

                        ; adjust value of current number
                        ^!Set %Num%=^$Calc(^%Num% - (^%x%*26))$

                        ; Convert remaining value (0 to 25) to letter (will be a to z) and append
                        ^!Set %B26%=^%B26%^$DecToChar(^$Calc(^%Num%+97)$)$

                        ; Output value
                        ^!InsertText ^%B26%^%NL%

                        ^!If "^%B26%" <> "zzz" LoopStart

                        ^!Sound SystemExclamation
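
For comparison, the arithmetic of this clip can be sketched in Python. The function name `generate_codes` is hypothetical, and the sketch returns a list instead of inserting text into a document. It follows the same steps: split the counter into three base-26 digits by dividing by 676 and 26, then map each digit to a letter by adding 97, the ASCII code of 'a'.

```python
def generate_codes() -> list[str]:
    """Return all 17576 three-letter codes from 'aaa' to 'zzz' in order."""
    codes = []
    for num in range(26 ** 3):
        # First digit and what is left over after removing it.
        first, rest = divmod(num, 676)
        # Second and third digits.
        second, third = divmod(rest, 26)
        codes.append("".join(chr(97 + digit) for digit in (first, second, third)))
    return codes


if __name__ == "__main__":
    all_codes = generate_codes()
    print(len(all_codes), all_codes[0], all_codes[-1])
```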