Re: Empty Array Elements...

  • flo.gehrke
    Message 1 of 7 , Dec 8, 2010
      --- In ntb-clips@yahoogroups.com, "Paul" <xboa721@...> wrote:
      >
      > Hi all,
      >
      > I have a peculiar problem today! I want GetDocMatchAll to recognise an empty match and assign it to an array element (the empty string)!
      >
      > Here's the deal:
      >
      > ^!Find "{[^{}]++}" WRS
      >
      > finds nested bracket terms. Here's the challenge example:
      >
      > {strongly|}
      >
      > And the assignment code:
      > ^!SetArray %Nest^%Index%_%=^$GetDocMatchAll([^|}{]++)$
      >
      > And I want:
      > ^!Info Nest^%Index%_0 = ^%Nest^%Index%_0%
      >
      > to show ^%arrayelements_0%=2.
      >
      > Currently either GetDocMatchAll OR the array assignment is ignoring the 'blank' match that the original ^!Find locates. I haven't been able to determine which is the cause and I'm not sure if this can be accomplished.
      >
      > Naturally I'm considering using a token in the bracket like VOIDoption to indicate that the two options are either "strongly" or "". i.e. {strongly|VOIDoption} though it would be cleaner not to.
      >
      > Thanks in advance for your help.
      > Paul

      Paul,

      I'm trying to understand your examples. When running...

      ^!Find "{[^{}]++}" WRS

      against the subject...

      {strongly|}

      you will get one match: '{strongly|}'. So where's the "empty match"?

      ^!Find "[^|}{]++" WRS

      will get one match as well: 'strongly'. Again, no empty match.

      Accordingly,...

      ^!SetArray %Nest^%Index%_%=^$GetDocMatchAll([^|}{]++)$
      ^!Info Nest^%Index%_0 = ^%Nest^%Index%_0%

      will output 'Nest_0 = 1' (and not '2').

      If there actually are "empty matches" (matches of zero length), NT will count them correctly. For example: When running...

      ^!SetArray %Array%=^$GetDocMatchAll("(?=.)")$
      ^!Info %Array0% = ^%Array0%

      against the subject

      abc

      the output will be '%Array0% = 3', because '(?=.)' matches at the position before each character, i.e. it produces three empty matches (matches which don't consume any character). Consequently,...

      ^!Info ^%Array%

      will output ';;'. That is, NT will assign three empty values to the array.
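The same count can be checked outside NoteTab. The sketch below uses Python's `re` module as a stand-in for NoteTab's regex engine (an assumption; the two engines are not identical), and the `;` join only imitates the way the array is displayed above.

```python
import re

subject = "abc"

# (?=.) is a lookahead: it succeeds before each character but consumes
# nothing, so every match is the empty string.
matches = re.findall(r"(?=.)", subject)

count = len(matches)        # number of zero-length matches
joined = ";".join(matches)  # three empty values joined by a ';' delimiter

print(count)   # 3
print(joined)  # ;;
```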

      Regards,
      Flo
    • diodeom
      Message 2 of 7 , Dec 8, 2010
        Paul <xboa721@...> wrote:
        >
        > I want GetDocMatchAll to recognise an empty match and assign it to an array element (the empty string)!
        >
        > Here's the deal:
        >
        > ^!Find "{[^{}]++}" WRS
        >
        > finds nested bracket terms. Here's the challenge example:
        >
        > {strongly|}
        >
        > And the assignment code:
        > ^!SetArray %Nest^%Index%_%=^$GetDocMatchAll([^|}{]++)$
        >
        > And I want:
        > ^!Info Nest^%Index%_0 = ^%Nest^%Index%_0%
        >
        > to show ^%arrayelements_0%=2.
        >
        > Currently either GetDocMatchAll OR the array assignment is ignoring the 'blank' match that the original ^!Find locates. I haven't been able to determine which is the cause and I'm not sure if this can be accomplished.
        >
        >

        The pattern [^|}{] is quantified here with a plus, so it expects *at least one* non-pipe/bracket in order to make a match.

        If it were simply set to look for zero or more, it would also capture unwanted "empties" before and after the brackets. One way to prevent this is to demand that every match be both preceded and followed by a pipe/bracket:

        In selection: {strongly|}
        ^$GetDocMatchAll("(?<=[|}{])[^|}{]*+(?=[|}{])")$
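As a rough illustration of the difference, here is a Python sketch. It assumes Python's `re` as a stand-in for NoteTab's engine and uses the greedy `*` in place of the possessive `*+`; for a negated character class that is followed by one of its excluded characters, both find the same matches.

```python
import re

selection = "{strongly|}"

# Zero-or-more without any guard: empty matches also turn up
# next to the braces, which is the unwanted side effect.
bare = re.findall(r"[^|}{]*", selection)

# Require a pipe or brace on both sides of every match.
guarded = re.findall(r"(?<=[|}{])[^|}{]*(?=[|}{])", selection)

print(bare)     # contains extra empty strings around the braces
print(guarded)  # ['strongly', '']
```

The guarded pattern yields exactly two elements, the second of which is the empty option.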

        Alternatively, these funky look-behind/ahead assertions could be avoided if the selection that GetDocMatchAll operates on (acquired in your previous step) were reduced to just what's inside the brackets, e.g.:

        ^!Find "{\K[^{}]++" WRS
        Then in selection: strongly|
        ^$GetDocMatchAll("[^|}{]*+")$
      • Eb
        Message 3 of 7 , Dec 8, 2010
          Paul,

          without fully understanding the context, but expecting the content AND the curlies already selected, the easy answer is to assign the VBAR as array element, AND leave it out of your forbidden character set:

          ^!SetArray %Nest^%Index%_%=^$GetDocMatchAll([^}{]+)$

          I'm not sure why you included a double '+', but to capture the example you will get a single array element with an array delimiter OTHER than the vbar. However, with the VBAR as delimiter AND as part of the match, the assignment will automatically return two elements, even if one (or both) is empty.

          Assuming you have a reason for the Find command, if you add look-behind and ahead assertions for the curlies, you will not even need a GetDocMatchAll function, but just an array assignment.

          i.e.

          ^!Find "(?<={)[^{}]+(?=})" WRS
          ^!SetArray ... = ^$GetSelection$

          (you may need to escape the curlies)
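The idea can be sketched in Python, with `re` standing in for the `^!Find` step and `str.split` standing in for the array assignment. Both are assumptions for illustration only, as is the premise that the array delimiter has been set to the vertical bar; the sample sentence is made up.

```python
import re

document = "I {strongly|} recommend a cordless drill."

# Lookbehind/lookahead keep the braces out of the match itself.
found = re.search(r"(?<=\{)[^{}]+(?=\})", document)

# Splitting on the vertical bar plays the role of the array assignment.
options = found.group().split("|") if found else []

print(options)  # ['strongly', '']
```

Because the bar is both part of the selection and the delimiter, the trailing empty option survives as its own element.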


          Cheers,


          Eb


          --- In ntb-clips@yahoogroups.com, "Paul" <xboa721@...> wrote:
          >
          > ... assign it to an array element (the empty string)!
          >
          > ^!Find "{[^{}]++}" WRS
          > ;{strongly|}
          > ^!SetArray %Nest^%Index%_%=^$GetDocMatchAll([^|}{]++)$
          > ^!Info Nest^%Index%_0 = ^%Nest^%Index%_0%
          >
          > to show ^%arrayelements_0%=2.
        • Paul
          Message 4 of 7 , Dec 8, 2010
            Thank you for the replies.

            Without implementing the ideas presented (it's waaaaay too late tonight!) they make sense and I'm sure will clear the sticking point for me. Many thanks.

            In answer to the questions: having removed myself from the details of the regex for so long, I'd basically overlooked the double+ operator as a first port of call for the ^!Find.

            The lookbehind/ahead assertions really are no problem and in a subsequent search I have to use them anyway.. however it remains to be seen from testing what is required and what works best.

            I'm intrigued by a direct search without the use of GDMA (GetDocMatchAll); however, given the complexity of the search within a search, I might just stick with the system I'm currently running :) Then again, I will get a chance to try it out soon. Perhaps the processing time is better.

            Certainly, the proof's in the pudding and I'll be baking soon!

            Kind regards,
            Paul

            p.s. does anyone run an annual competition for search terms that regex can't find? thought not! ;)


          • Paul
            Message 5 of 7 , Dec 8, 2010
              Ah.. the simple answer here Eb is that a document may contain the following:

              {For an example|Zum bespiel} this is {a good|ein besser|the best} sentence to {understand|comprehend|make sense of} the {higher|} purpose behind the program's {intention|purpose|function|ideology}.

              The first {part{ing|ner} left the house in {tatters|pristine condition} as the absent minded vicar {rowed across the {creek.|Thames.}|ran around in circles!}

              Ok, so it's not simple, but did you ever read a Choose Your Own Adventure story by Edward Packard? Great stuff. Well, it's a similar idea run on a nested bracket system. The purpose lends itself to article marketing and to chasing lazy uni students who submit others' work as their own (perhaps with a few words changed).

              So that's content, and yes the curlies were already selected. So if I understand your comment correctly, leaving the VBAR out of the forbidden char set is not an option, as per the previous examples.
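To see why the vertical bar has to stay in the forbidden set once nesting is involved, consider this small Python sketch. It assumes Python's `re` as a stand-in for NoteTab's engine, a shortened version of the vicar sentence, and a `*1*` token chosen only for illustration.

```python
import re

text = "the vicar {rowed across the {creek.|Thames.}|ran around in circles!}"

# [^{}]+ cannot cross a brace, so only a group with no group inside it matches.
innermost = re.findall(r"\{[^{}]+\}", text)
print(innermost)  # ['{creek.|Thames.}']

# Once the inner group is swapped for a token, the outer group
# becomes an innermost group in turn.
collapsed = text.replace(innermost[0], "*1*")
outer = re.findall(r"\{[^{}]+\}", collapsed)
print(outer)  # ['{rowed across the *1*|ran around in circles!}']
```

Each pass only ever sees one level of options, so the bars inside a found group always separate options of that group alone.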

              Cheers.

            • Paul
              Message 6 of 7 , Dec 9, 2010
                Thanks Diodeom,

                With reference to: ^!Find "{[^{}]++}" WRS

                > The pattern [^|}{] is quantified here with a plus, so it expects *at least one* non-pipe/bracket in order to make a match.

                If there is no match I want to exit the search routine.

                > ^$GetDocMatchAll("(?<=[|}{])[^|}{]*+(?=[|}{])")$

                This works superbly, albeit with the funky look-behind/ahead assertions!

                > ..if the selection that GetDocMatchAll operates on (acquired in your previous step) were reduced to just what's inside the brackets, e.g.:
                >
                > ^!Find "{\K[^{}]++" WRS

                This knocks out the LHS curly from the found term which is a problem when I replace it with a token.

                Test
                As a quick test using the following text:

                7. {Knocking it|Putting it} Together
                The {old school|old fashioned|traditional} way to get a screw {into|in} a piece of wood {is|was} to use a {screwdriver|screw driver}! {As with|Like using} any hand tool this is a bit of a {practised|skilled} art.

                I {strongly|} recommend you beg-borrow-steel a cordless {driver|screwdriver} or {at least|as a second option} a cordless drill. Driving a screw at a {steady|constant} {pace|rate|speed} into the wood will give the best hold.


                I get the following results:

                Nest  Number  Contents

                1:    2       Knocking it | Putting it
                2:    3       old school | old fashioned | traditional
                3:    2       into | in
                4:    2       is | was
                5:    2       screwdriver | screw driver
                6:    2       As with | Like using
                7:    2       practised | skilled
                8:    2       strongly | (empty)
                9:    2       driver | screwdriver
                10:   2       at least | as a second option
                11:   2       steady | constant
                12:   3       pace | rate | speed

                7. *1* Together
                The *2* way to get a screw *3* a piece of wood *4* to use a *5*! *6* any hand tool this is a bit of a *7* art.

                I *8* recommend you beg-borrow-steel a cordless *9* or *10* a cordless drill. Driving a screw at a *11* *12* into the wood will give the best hold.

                Result? I get nest(8) reporting 2 options, one of which is blank, as desired.
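The tokenizing step described in this test can be approximated in Python. This is only a sketch under stated assumptions: `tokenize` is a hypothetical helper, not part of the clip; Python's `re` stands in for NoteTab's engine; and the greedy `+` replaces the possessive `++`.

```python
import re

def tokenize(text):
    """Replace each innermost {a|b|...} group with *n* and collect its options."""
    nests = []  # nests[n - 1] holds the option list of token *n*
    pattern = re.compile(r"\{([^{}]+)\}")

    def store(match):
        # Splitting on the bar keeps an empty option as an empty string.
        nests.append(match.group(1).split("|"))
        return f"*{len(nests)}*"

    # Repeat until no group is left, so that outer groups are handled
    # once their inner groups have been replaced by tokens.
    while pattern.search(text):
        text = pattern.sub(store, text)
    return text, nests

sample = "I {strongly|} recommend a cordless {driver|screwdriver}."
tokenized, nests = tokenize(sample)

print(tokenized)  # I *1* recommend a cordless *2*.
for number, options in enumerate(nests, start=1):
    print(number, len(options), options)
```

The first nest reports two options, one of them empty, which matches the outcome reported for nest 8 above.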

                Curiously ^!Find "{[^{}]*+}" WRS
                also works which got me a little puzzled.. shouldn't 0 or more of the [^{}] match *any* text? Even with the ungreedy? Not a biggie. Unless you can see it obviously I wouldn't spend time on it. Cheers.
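A possible explanation for the closing question, sketched in Python with greedy quantifiers standing in for the possessive ones (an assumption, as is the made-up sample text): `*+` is a possessive quantifier rather than an ungreedy one, and the literal braces on both sides still have to match. Allowing zero characters therefore only adds the empty pair `{}` to what is found.

```python
import re

text = "plain text {old school|traditional} and an empty pair {} here"

# At least one character between the braces.
one_or_more = re.findall(r"\{[^{}]+\}", text)

# Zero or more characters between the braces.
zero_or_more = re.findall(r"\{[^{}]*\}", text)

print(one_or_more)   # ['{old school|traditional}']
print(zero_or_more)  # ['{old school|traditional}', '{}']
```

Text outside the braces is never matched by either version, because every match must start with `{` and end with `}`.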

                Thank you very much.
                Paul