Re: [Clip] Re: Finding gaps in a sequence

  • Art Kocsis
    Message 1 of 29 , Dec 1, 2011
      At 11/30/2011 13:28, Eb wrote:
      >The alphabet is like a base-26 number set (English alphabet), after
      >shifting a to zero. Straight conversion to numbers creates gaps, where it
      >rolls to the next digit, i.e. aaz --> aba has a gap of 26!, the value of
      >the next digit, and azz to baa has a gap much larger!

      If I am interpreting correctly what you said here, the statement is not
      correct - there is no gap using the alphabet as symbols for a base 26
      numbering system.

      Any integer (including negative ones) may be used as a base for counting
      sequentially, and a number takes the form sum(d(i) * b^i), where d(i) is the
      ith "digit" (right to left, 0-based) and "b" is the base. In the case of
      using the alphabet symbols to represent base 26 digits, a=0, b=1 ... z=25,
      and the base 10 value of any such three-letter number is d2 * 26^2 +
      d1 * 26^1 + d0 * 26^0, or d2*676 + d1*26 + d0*1.

      Thus aaz = 0*676 + 0*26 + 25*1 = 25 and aba = 0*676 + 1*26 +0*1 = 26
      (no gap)
      Also azz = 0*676 + 25*26 + 25*1 = 675 and baa = 1*676 + 0*26 +0*1 = 676
      (again, no gap)
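
As an illustration of the arithmetic above, here is a minimal Python sketch (Python rather than clip code; `code_to_int` is a hypothetical helper name, not something from the thread). It assumes lowercase codes with a=0 ... z=25 and simply evaluates the positional sum:

```python
def code_to_int(code: str) -> int:
    """Interpret a lowercase alpha code as a base-26 number (a=0 ... z=25)."""
    value = 0
    for ch in code:
        if not "a" <= ch <= "z":
            raise ValueError(f"unexpected character {ch!r} in {code!r}")
        # Shift the digits seen so far one place left, then add the new digit.
        value = value * 26 + (ord(ch) - ord("a"))
    return value


# The two rollovers discussed above: each pair differs by exactly 1.
for left, right in (("aaz", "aba"), ("azz", "baa")):
    print(left, code_to_int(left), right, code_to_int(right))
```

Both pairs come out as consecutive integers, which is the "no gap" result worked out by hand above.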

      Your code does this correctly, so the statement may just be ambiguously worded.

      BTW, very clever use of ^!Inc & ^!Dec to do arithmetic! I'll have to
      remember that.

      I have noted that none of the suggested solutions do any input data
      verification; all assume that each line truly begins with a three-character
      (lowercase) alpha code. Your use of ^$GetDocMatchAll("^[a-z]{3}")$ to
      extract the sequence codes would seem to offer a simple, one-line way to
      verify that assumption: just compare the size of the ^%codes% array to the
      line count of the source document.

      ^!If ^$GetParaCount$ <> ^%codes0% ^!Continue Input data error - missing sequence code(s)
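
The same check can be sketched in Python for anyone who wants to try the idea outside NoteTab. This is only an illustration under the assumption stated above (every line should start with three lowercase letters); `lines_without_code` is a hypothetical name, and instead of only comparing counts it reports which lines fail:

```python
import re

# Same pattern as in the clip: three lowercase letters at the start of a line.
CODE_AT_START = re.compile(r"[a-z]{3}")


def lines_without_code(text: str) -> list[int]:
    """Return the 1-based numbers of lines that do not start with a code."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if not CODE_AT_START.match(line)
    ]


sample = "zbx\nzby\n\nZca\nzcc"
bad = lines_without_code(sample)
if bad:
    print("Input data error - missing sequence code(s) on lines", bad)
```

If the returned list is empty, the number of codes found equals the number of lines, which is the condition the one-line clip test compares.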


      Namaste', Art
    • Eb
      Message 2 of 29 , Dec 2, 2011
        Joy,

        I observed a gap while using a non-mathematical (==>) technique to convert from base 26 (the alphabet) to base 16 by using the ASCII codes:

        aaz ==> 0 x 41 41 5A = 4,276,570
        aba ==> 0 x 41 42 41 = 4,276,801

        Once I shifted to the base 26 array approach, I may have stayed in the haze of non-math confusion for a bit longer <g>.
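
For readers who want to reproduce those two figures, here is a small Python sketch. It assumes the conversion simply strings the ASCII codes together as one hexadecimal number; 0x41 is the code for uppercase "A", so the sketch upper-cases the input first. `ascii_hex_value` is a hypothetical name used only for this illustration:

```python
def ascii_hex_value(code: str) -> int:
    """Concatenate the ASCII codes of the upper-cased letters and read them as hex."""
    return int(code.upper().encode("ascii").hex(), 16)


aaz = ascii_hex_value("aaz")  # bytes 41 41 5A
aba = ascii_hex_value("aba")  # bytes 41 42 41
print(aaz, aba, aba - aaz)
```

Each letter occupies a full byte (a base-256 digit) but only 26 of the 256 values are used, so rolling from Z (0x5A) to A (0x41) in the next position jumps by 256 - 25 = 231 rather than by 1.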


        Eb


        --- In ntb-clips@yahoogroups.com, "joy8388608" <mycroftj@...> wrote:
        >
        >
        >
        > --- In ntb-clips@yahoogroups.com, "Eb" <ebbtidalflats@> wrote:
        > >
        > > Hi Flo,
        > >
        > > You are right about what the hex conversion was supposed to do.
        > > In the meantime I found my original char to hex clip, which only converted a single digit. I applied the single-digit approach to your problem. While I got it to work, it just raised another problem.
        > >
        > > The alphabet is like a base-26 number set (English alphabet), after shifting a to zero. Straight conversion to numbers creates gaps, where it rolls to the next digit, i.e. aaz --> aba has a gap of 26!, the value of the next digit, and azz to baa has a gap much larger!
        >
        >
        > Sorry if I misunderstood you but I'll reply just in case in order to save you possible extra work and confusion...
        >
        > You said aaz to aba has a gap but you correctly noted a=0...z=25.
        > Therefore, aaz=(0*26^2 + 0*26 + 25)=25 and aba=(0*26^2 + 1*26 + 0)=26 - No gap. Likewise, azz=675 and baa=676. Again, no gap.
        >
        > Hope this helps, sorry if I misunderstood.
        >
        > Joy
        >
      • Eb
        Message 3 of 29 , Dec 2, 2011
          Yes, I was still confused by my earlier attempt to convert character codes to hex codes using ASCII.

          My test clip still had elements of hex code in it.

          Color me embarrassed.

          Eb

          --- In ntb-clips@yahoogroups.com, Art Kocsis <artkns@...> wrote:
          >
          > At 11/30/2011 13:28, Eb wrote:
          > >The alphabet is like a base-26 number set (English alphabet), after
          > >shifting a to zero. Straight conversion to numbers creates gaps, where it
          > >rolls to the next digit, i.e. aaz --> aba has a gap of 26!, the value of
          > >the next digit, and azz to baa has a gap much larger!
          >
          > If I am interpreting correctly what you said here, the statement is not
          > correct - there is no gap using the alphabet as symbols for a base 26
          > numbering system.
        • ebbtidalflats
          Message 4 of 29 , Dec 2, 2011
            Hi Art,

            I suspect that none of the people offering solutions are privy to the format of the data file. So verifying input must be left to Flo.

            For example, the ^$GetDocMatchAll$ statement must include the field delimiter to avoid also matching the first three characters of longer words, which might not be index codes at all.
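
To make the over-matching concrete, here is a hedged Python sketch. The thread never states the real file format, so the tab used as a field delimiter is purely an assumption for illustration, as are the sample lines:

```python
import re

# Hypothetical data: a code, then (assumed) a tab and the record text.
sample = "zbx\tfirst record\nzebra crossing\nzby"

# Loose pattern: any three lowercase letters at the start of a line.
loose = re.findall(r"^[a-z]{3}", sample, flags=re.MULTILINE)

# Stricter pattern: the three letters must be followed by the assumed
# delimiter (a tab) or by the end of the line.
strict = re.findall(r"^[a-z]{3}(?=\t|$)", sample, flags=re.MULTILINE)

print(loose)   # includes the first three letters of "zebra"
print(strict)  # only the real codes
```

With the loose pattern, "zeb" is picked up from "zebra" as if it were an index code; anchoring the match to the delimiter avoids that.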


            Cheers


            Eb

            --- In ntb-clips@yahoogroups.com, Art Kocsis <artkns@...> wrote:
            > ...
            > I have noted that none of the suggested solutions have done any input data
            > verification but all assume that each line truly begins with a three (lower
            > case) alpha character. Your use of ^$GetDocMatchAll("^[a-z]{3}")$ to
            > extract the sequence codes would seem to offer a simple, one-line way to
            > verify that assumption: just compare the size of the ^%codes% array to the
            > line count of the source document.
            >
            > ^!If ^$GetParaCount$ <> %codes0% ^!Continue Input data error - missing
            > sequence code(s)
            >
            >
            > Namaste', Art
            >
          • flo.gehrke
            Message 5 of 29 , Dec 2, 2011
              > --- In ntb-clips@yahoogroups.com, Art Kocsis <artkns@> wrote:
              > I have noted that none of the suggested solutions have done any
              > input data verification but all assume that each line truly
              > begins with a three (lower case) alpha character...

              --- In ntb-clips@yahoogroups.com, "ebbtidalflats" <ebbtidalflats@...> wrote:
              >
              > Hi Art,
              >
              > I suspect that none of the people offering solutions are privy to
              > the format of the data file. So verifying input must be left to Flo.

              Friends,

              I started this topic with message #22221 writing...

              > I've got a database where each record is indexed with an alpha-code
              > from 'aaa' to 'zzz'. Every now and then, I want to find out if there
              > is a gap in a sorted list of these codes. There's a gap, for
              > example, in...
              >
              > zbx
              > zby
              > zbz
              > zca
              > zcc
              > zcd

              So why speculate about the format of the data? Why invent characters and strings which actually are not there?

              "For we write none other things unto you,
              than what ye read or acknowledge..."
              Corinthians 2, 1:13

              Flo
            • joy8388608
              Message 6 of 29 , Dec 5, 2011
                Flo -

                Very interesting. Your clip is much faster than mine, even when I turned ScreenUpdate off. Mine took 41 seconds and yours took 15 for 17550 lines (aaa to zzz with 26 .rr lines removed). Why? I'm not sure. Perhaps it is because you work with an array, even though the lines on a screen are probably just another type of array.

                This has been fun, interesting, and I've learned several new things.

                Oh, yes. You don't have to, but as I posted previously, you can modify the value of %AZ% to "bcdefghijklmnopqrstuvwxyz" (remove the 'a') for correctness.
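
Why this is optional can be seen in a short Python sketch. It assumes that ^$StrPos$ returns a 1-based position and 0 when the character is not found; `str_pos` and `convert` are hypothetical stand-ins for the clip functions, not the clip code itself:

```python
AZ_FULL = "abcdefghijklmnopqrstuvwxyz"
AZ_NO_A = AZ_FULL[1:]  # "bcdefghijklmnopqrstuvwxyz"


def str_pos(ch: str, s: str) -> int:
    """1-based position of ch in s, or 0 if absent (assumed StrPos behaviour)."""
    return s.find(ch) + 1


def convert(code: str, alphabet: str) -> int:
    """Apply the V1*676 + V2*26 + V3 formula with digits taken from alphabet."""
    v1, v2, v3 = (str_pos(ch, alphabet) for ch in code)
    return v1 * 676 + v2 * 26 + v3


for code in ("aaa", "aaz", "aba", "zzz"):
    print(code, convert(code, AZ_FULL), convert(code, AZ_NO_A))
```

With the full alphabet every digit is one too high, so every value is shifted by the same constant (676 + 26 + 1 = 703). Neighbouring codes still differ by exactly 1, so gap detection works either way; removing the 'a' just makes the values true base-26 numbers starting at 0.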

                Thanks,
                Joy

                P.S. On the off chance anyone else (still) wants to play with this for learning purposes, I wrote a quick clip to generate the lines aaa to zzz. Let me know if anyone wants me to post the code.


                --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
                >
                > Joy,
                >
                > I also went through your clip again (messages #22230, #22245). I like that formula '^$Calc(^%V1%*676 + ^%V2%*26 + ^%V3%)$' which, actually, seems to be the heart of your solution.
                >
                > So I combined it with some ideas of my first concept and managed to speed up your clip significantly. Originally, your clip needed 78 seconds (on my notebook) to check a list of 10,000 codes. The following version is doing it in 9 seconds:
                >
                >
                > ^!SetHintInfo Working...
                > ; Assign code list to array %List%
                > ^!SetListDelimiter ^%NL%
                > ^!SetArray %List%=^$GetText$
                > ^!Set %AZ%="abcdefghijklmnopqrstuvwxyz"
                > ^!Set %i%=1
                >
                > :CodeToInt
                > ; Save current code to variable for later output in case of gap
                > ^!Set %CurrCode%=^%List^%i%%
                > ; Convert code to number(with Joy's formula)
                > ^!Set %First%=^$Convert(^%List^%i%%)$
                > ^!Inc %First%
                > ^!Inc %i%
                > ^!If ^%i% > ^%List0% Out
                > ^!Set %Second%=^$Convert(^%List^%i%%)$
                > ^!IfSame ^%First% ^%Second% CodeToInt Else False
                >
                > :False
                > ^!Append %Gaps%=^%CurrCode%^P
                > ^!Goto CodeToInt
                >
                > :Out
                > ^!IfEmpty ^%Gaps% Next Else Skip_2
                > ^!Info No gaps!
                > ^!Goto Skip_3
                > ^!Toolbar New Document
                > ^!InsertText Gap found after...^P^%Gaps%
                > ^!Toolbar Second Window
                > ^!ClearVariables
                >
                >
                > The subclip with custom function ^$Convert$ and your formula is...
                >
                > ^!Set %C1%=^$StrIndex(^&;1)$
                > ^!Set %C2%=^$StrIndex(^&;2)$
                > ^!Set %C3%=^$StrIndex(^&;3)$
                > ^!Set %V1%=^$StrPos(^%C1%;^%AZ%;0)$
                > ^!Set %V2%=^$StrPos(^%C2%;^%AZ%;0)$
                > ^!Set %V3%=^$StrPos(^%C3%;^%AZ%;0)$
                > ^!Result ^$Calc(^%V1%*676 + ^%V2%*26 + ^%V3%)$
                >
                >
                > Thanks again for your proposal! Maybe you'll have a look at this revised version...
                >
                > Regards,
                > Flo
              • flo.gehrke
                Message 7 of 29 , Dec 5, 2011
                  --- In ntb-clips@yahoogroups.com, "joy8388608" <mycroftj@...> wrote:
                  >
                  > Flo -
                  >
                  > Very interesting. Your clip is much faster than mine even
                  > when I turned ScreenUpdate off. Mine took 41 seconds and
                  > yours took 15 for 17550 lines (aaa to zzz with 26 .rr lines
                  > removed). Why? I'm not sure...

                  Joy,

                  I think there are three main reasons for that:

                  1. Assigning the whole list to an array

                  2. Calculating ^$ConvertTo26$ only twice -- it's done three times in your clip

                  3. Gathering up the gaps with ^!Append and outputting them all at once -- no ^!InsertText
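
The three points can be sketched in Python as follows. This is an illustration of the general approach rather than a translation of either clip: it goes one step further and converts each code only once, and `code_to_int` and `find_gaps` are hypothetical names:

```python
def code_to_int(code: str) -> int:
    """Base-26 value of a three-letter lowercase code (a=0 ... z=25)."""
    v1, v2, v3 = (ord(ch) - ord("a") for ch in code)
    return v1 * 676 + v2 * 26 + v3


def find_gaps(codes: list[str]) -> list[str]:
    """Return every code that is not directly followed by its successor."""
    values = [code_to_int(code) for code in codes]  # whole list held in memory
    gaps = []
    for i in range(len(codes) - 1):
        if values[i] + 1 != values[i + 1]:
            gaps.append(codes[i])  # collect now, output once at the end
    return gaps


sample = ["zbx", "zby", "zbz", "zca", "zcc", "zcd"]
gaps = find_gaps(sample)
print("Gap found after...", gaps if gaps else "No gaps!")
```

The sample is the list from the first message of the topic, where "zcb" is missing, so the gap is reported after "zca".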

                  > I wrote a quick clip to generate the lines aaa to zzz. Let
                  > me know if anyone wants me to post the code.

                  I put my hand up and would enjoy seeing that clip!

                  Flo
                • joy8388608
                  Message 8 of 29 , Dec 7, 2011
                    --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
                    >
                    > --- In ntb-clips@yahoogroups.com, "joy8388608" <mycroftj@> wrote:
                    > >
                    > > Flo -
                    > >
                    > > Very interesting. Your clip is much faster than mine even
                    > > when I turned ScreenUpdate off. Mine took 41 seconds and
                    > > yours took 15 for 17550 lines (aaa to zzz with 26 .rr lines
                    > > removed). Why? I'm not sure...
                    >
                    > Joy,
                    >
                    > I think there are three main reasons for that:
                    >
                    > 1. Assigning the whole list to an array
                    >
                    > 2. Calculating ^$ConvertTo26$ only twice -- it's done three times in your clip
                    >
                    > 3. Gathering up the gaps with ^!Append and outputting them all at once -- no ^!InsertText
                    >
                    > > I wrote a quick clip to generate the lines aaa to zzz. Let
                    > > me know if anyone wants me to post the code.
                    >
                    > I put my hand up and would enjoy seeing that clip!
                    >
                    > Flo
                    >

                    My pleasure. Joy

                    Generate Base 26 numbers
                    ; by Joy
                    ^!Continue This will generate 17576 lines from aaa to zzz.

                    ^!SKIP Leave Screen update on? (Slower...)
                    ^!Setscreenupdate OFF
                    ^!StatusShow Generating sequences aaa to zzz...

                    ; Start with aaa
                    ^!Set %I%=-1

                    :LoopStart
                    ^!Inc %I%
                    ^!Set %Num%=^%I%

                    ; Find value of first digit (of 3) (will be 0 to 25)
                    ^!Set %x%=^$Calc(INT(^%Num%/676))$

                    ; Convert first digit to letter (will be a to z)
                    ^!Set %B26%=^$DecToChar(^$Calc(^%x%+97)$)$

                    ; adjust value of current number
                    ^!Set %Num%=^$Calc(^%Num% - (^%x%*676))$

                    ; Find value of second digit (of 3) (will be 0 to 25)
                    ^!Set %x%=^$Calc(INT(^%Num%/26))$

                    ; Convert second digit to letter (will be a to z) and append
                    ^!Set %B26%=^%B26%^$DecToChar(^$Calc(^%x%+97)$)$

                    ; adjust value of current number
                    ^!Set %Num%=^$Calc(^%Num% - (^%x%*26))$

                    ; Convert remaining value (0 to 25) to letter (will be a to z) and append
                    ^!Set %B26%=^%B26%^$DecToChar(^$Calc(^%Num%+97)$)$

                    ; Output value
                    ^!InsertText ^%B26%^%NL%

                    ^!If "^%B26%" <> "zzz" LoopStart

                    ^!Sound SystemExclamation
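
For comparison, the same digit-by-digit conversion can be sketched in Python. This is only an illustration of the technique used in the clip (divide by 676, then by 26, and map each digit 0 to 25 to a letter via character code 97 = "a"); `int_to_code` is a hypothetical name:

```python
def int_to_code(number: int) -> str:
    """Convert 0..17575 to a three-letter code, aaa..zzz."""
    first, rest = divmod(number, 676)  # first digit and what is left over
    second, third = divmod(rest, 26)   # second and third digits
    return "".join(chr(97 + digit) for digit in (first, second, third))


codes = [int_to_code(number) for number in range(26 ** 3)]
print(len(codes), codes[0], codes[-1])
```

Writing the list out one code per line gives the same 17576-line test file the clip produces.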