18616 [Clip] Re: Line frequency analysis.

  • Sheri
    Nov 3, 2008
      --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
      >
      > Sheri wrote:
      > > --- In ntb-clips@yahoogroups.com, John Fitzsimons <johnf@> wrote:
      > >
      > >> I want to end up with a list like......
      > >>
      > >> 000001,0.verizon.windows2000
      > >> 000003,0.verizon.windowsxp
      > >> 000012,24hoursupport.helpdesk
      > >> 000008,alt.computer
      > >>
      > >> Is there an existing way/clip to do this ? If not then can
      > >> someone provide the needed code to produce this result please ?
      > >>
      > >
      > > This will do it exactly as above, but version 5+ is required:
      > >
      > > ^!SetScreenUpdate Off
      > > ^!Jump Doc_End
      > > ^!If ^$GetCol$>1 Next Else Skip
      > > ^!InsertText ^P
      > > ^!Jump Doc_Start
      > > :Loop
      > > ^!Find "^(.+\r\n)\1*" RS
      > > ^!IfError Quit
      > > ^!Set %count%=^$StrCount("^%NL%";"^$GetSelection$";Yes;Yes)$
      > > ^!Set %fill%=^$Calc(6-^$StrSize(^%count%)$)$
      > > ^!Replace "(.+\r\n)\1*" >> "^$StrFill("0";^%fill%)$^%count%,$1" RHS
      > > ^!Goto Loop
      > > :Quit
      > > ^!ClearVariable %count%
      > > ^!ClearVariable %fill%
      > > ;end of clip
      >
      >
      > I have done that much less efficiently in the past.
      >
      > I'll try to break it down ...
      > set screen update off just speeds life up
      > jump doc end takes us to the end

      > I think you are adding a blank line at the end next ...
      > interesting way -- although you could have multiple blank lines
      > at the end and you aren't removing them

      It doesn't matter to the clip if there are multiple blank lines at
      the end. The only thing that matters is that the last line with
      content ends with a CRLF (aka ^%NL%).
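
      For anyone following along outside NoteTab, here is a rough Python
      sketch of why that final CRLF matters. It is only a stand-in:
      Python's re module is not NoteTab's regex engine, and the names
      ensure_final_crlf and matched_lines, plus the sample text, are made
      up for illustration.

```python
import re

# Python stand-in for the clip's Find pattern (assumption: not NoteTab's engine).
# re.MULTILINE lets ^ match at the start of every line.
RUN = re.compile(r"^(.+\r\n)\1*", re.MULTILINE)

def ensure_final_crlf(text):
    """Append a CRLF only when the text does not already end with one."""
    if text and not text.endswith("\r\n"):
        return text + "\r\n"
    return text

def matched_lines(text):
    """Return the first line of every run the pattern finds."""
    return [m.group(1).rstrip("\r\n") for m in RUN.finditer(text)]

# Hypothetical sample: the last line has no CRLF, so the pattern misses it.
unterminated = "alt.computer\r\nalt.test"
print(matched_lines(unterminated))                     # ['alt.computer']
print(matched_lines(ensure_final_crlf(unterminated)))  # ['alt.computer', 'alt.test']
```

      Extra blank lines at the end are harmless in this sketch too,
      because the pattern needs at least one character before the CRLF.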

      > Loop ... finds basically anything of one character or more
      > followed by a new line

      > if there is an error it quits ... so if there were a blank line
      > in the middle of the list, it would quit because it is less than
      > one in length?

      Find finds next, it skips what doesn't match. So a blank line in the
      middle is skipped over. Only problem with a blank line in the middle
      would be if it occurs in the middle of repeated content. Each set of
      repeated content needs to be consecutive, so a blank line would cause
      multiple count outputs for that content/line.
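
      A small Python sketch of that effect, assuming the lines have
      already been split into a list. The helper name consecutive_counts
      and the sample data are hypothetical, and itertools.groupby is used
      here in place of the clip's regex.

```python
from itertools import groupby

def consecutive_counts(lines):
    """Count runs of identical adjacent lines.

    Blank lines are left out of the result, but they still end the run
    that precedes them, just as a blank line interrupts the repeated match.
    """
    return [(len(list(group)), line)
            for line, group in groupby(lines)
            if line != ""]

# Uninterrupted repeats give one count.
print(consecutive_counts(["alt.computer", "alt.computer", "alt.computer"]))
# [(3, 'alt.computer')]

# A blank line in the middle splits the same content into two counts.
print(consecutive_counts(["alt.computer", "alt.computer", "", "alt.computer"]))
# [(2, 'alt.computer'), (1, 'alt.computer')]
```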

      >
      > PLEASE EXPLAIN THIS PART:
      > It apparently counts the number of times the string occurs in the
      > document? -- this part confused me a little. I assume that the
      > \1* is the key part because it finds all incidents of that term,
      > highlights them and deletes them to replace them with your final
      > product.

      \1 Matches the same thing that matched for substring 1, i.e., the part
      in parentheses (.+\r\n)

      IOW, a repetition of the whole line.

      The asterisk after \1 says it can match zero or more times.

      After the find, the repeated lines are selected. So to find the number
      of repetitions, I just count the number of ^%NL%'s in the selection
      (aka highlight).
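
      The same find-and-count idea can be tried in Python. This is a
      sketch under assumptions, not the clip itself: Python's re module
      stands in for NoteTab's regex, count_runs and format_runs are
      hypothetical names, and the sample text is invented.

```python
import re

# (.+\r\n) captures one whole line including its CRLF;
# \1* then matches zero or more exact repeats of that captured line.
RUN = re.compile(r"^(.+\r\n)\1*", re.MULTILINE)

def count_runs(text):
    """Return (count, line) for each run of identical consecutive lines."""
    runs = []
    for m in RUN.finditer(text):
        # The whole match is the "selection"; one CRLF per repeated line.
        count = m.group(0).count("\r\n")
        runs.append((count, m.group(1).rstrip("\r\n")))
    return runs

def format_runs(text):
    """Emit 'NNNNNN,line' with the count zero-padded to six digits."""
    return "".join(f"{count:06d},{line}\r\n" for count, line in count_runs(text))

sample = "0.verizon.windows2000\r\nalt.computer\r\nalt.computer\r\n"
print(count_runs(sample))
# [(1, '0.verizon.windows2000'), (2, 'alt.computer')]
```

      The six-digit padding in format_runs plays the role of the
      StrFill/Calc lines in the clip.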

      >
      > I get the $1 being the () part of the find.
      > From there it essentially just cycles.
      >
      >
      > Sheri, even though his data was sorted, should we not do a sort
      > to begin?

      For the clip to do its job, the repetitions need to be consecutive,
      so unless the list is already sorted, yes.
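
      To see why sorting first matters, here is a short Python sketch. It
      assumes the lines are already in a list; run_counts is a
      hypothetical helper and the sample lines are made up.

```python
from itertools import groupby

def run_counts(lines):
    """Count only adjacent repeats, the way a consecutive-run match does."""
    return [(len(list(group)), line) for line, group in groupby(lines)]

unsorted_lines = ["alt.computer", "24hoursupport.helpdesk", "alt.computer"]

# Unsorted: the two alt.computer lines are not adjacent, so they are
# reported separately.
print(run_counts(unsorted_lines))
# [(1, 'alt.computer'), (1, '24hoursupport.helpdesk'), (1, 'alt.computer')]

# Sorted: identical lines sit together and are counted once.
print(run_counts(sorted(unsorted_lines)))
# [(1, '24hoursupport.helpdesk'), (2, 'alt.computer')]
```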

      > What about word wrap. Could it affect your outcome at all?

      No, word wrap does not add ^%NL%'s

      >
      >
      > Here is how I did something similar without regex -- I bet yours
      > is faster. In my case I am taking one element out of a delimited
      > list vs his example that has just one element, ie, the entire
      > line.

      > Here is mine:

      > :NewTeam
      > ;first time set team
      > ^!Set %GrabField2%=^$GetField(^$GetRow$;^%TeamField%)$
      > ;^!Info ^%GrabField2%
      > ^!Set %Team%=^$GetSelection$
      > ^!Set %TeamCount%=0
      > ;^!Info ^%TeamCount%
      > :Loop
      > ;get team for this line
      > ^!Set %GrabField2%=^$GetField(^$GetRow$;^%TeamField%)$
      > ;be sure that this line is for current team
      > ;if not, go to ProcessTeam
      > ;otherwise continue here
      > ^!If "^%Team%" <> "^$GetSelection$" ProcessTeam
      > ^!Set %TeamCount%=^$Calc(^%TeamCount%+1;0)$
      > ;^!Info ^%TeamCount%
      > ^!Jump +1
      > ^!GoTo Loop
      >
      > :ProcessTeam
      > ;There is more here that does something with the info
      > ^!GoTo NewTeam
      >

      That looks fine. John may prefer it, since he was still using 4.95 in
      a previous posting this year. Setting screen update off might make it
      faster.

      John, there is a free Light version of 5.7b (the latest version) that
      includes clipcode and regex capability.

      Regards,
      Sheri