Re: Line frequency analysis.

That works fine. Here's one that ignores trailing white space and
empty lines in the data (but goes back to counting strings for count):

^!SetScreenUpdate off
^!Jump Doc_Start
:Loop
^!Find "^(.+)(\s*\r\n\1)*" RS
^!IfError Out
^!SetArray %farray%=^$GetReSubstrings$
;begin long line
^!Set %Count%=^$StrCount("^%NL%";"^$GetDocReplaceAll("\s*(\r\n)+|(?<=.)\z";"\r\n")$";YES;YES)$
;end long line
^!Set %Fill%=^$Calc(6-^$StrSize(^%Count%)$)$
^!InsertText ^$StrFill("0";^%fill%)$^%Count%,^%farray1%
^!Goto Loop
:Out
^!Set %farray%=""
^!ClearVariable %farray%
^!ClearVariable %Count%
^!ClearVariable %Fill%
;end of clip
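
For readers without NoteTab at hand, here is a rough Python sketch of what the clip appears to compute. It is not a translation of the clip. It assumes duplicates are adjacent (the ^!Find pattern only matches consecutive repeats), that lines end in CRLF, and that the output is a six-digit zero-padded count, a comma, and the line, as the %Fill% and ^!InsertText lines suggest. The function name line_frequencies is made up for this example.

```python
from itertools import groupby


def line_frequencies(text):
    """Count runs of identical adjacent lines, ignoring trailing
    white space and empty lines (assumed reading of the clip)."""
    # Strip trailing white space from every CRLF-terminated line.
    lines = (line.rstrip() for line in text.split("\r\n"))
    # Drop lines that are empty after stripping.
    kept = [line for line in lines if line]
    # groupby only merges adjacent equal items, like the clip's regex.
    return [f"{len(list(group)):06d},{line}" for line, group in groupby(kept)]


if __name__ == "__main__":
    sample = "b \r\nb\r\n\r\na\r\n"
    for row in line_frequencies(sample):
        print(row)
```

The `06d` format plays the role of the %Fill% padding for counts below one million.
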
Sheri,

I think there's a problem with this solution. It ignores trailing
blanks and empty lines in the counting of duplicate lines but not in
the ^!Find command. So if we have got...

0.verizon.windowsxp <-- trailing blank
0.verizon.windowsxp
0.verizon.windowsxp

for example, the clip doesn't find three duplicates but interprets
this as a single line followed by two duplicates of another line.
That is, it works only if all three duplicates end with a trailing
blank.

So it may be better to remove trailing blanks prior to ^!Find,
wouldn't it?

Regards,
Flo
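
The effect described above can be reproduced outside NoteTab with a small Python sketch. Python's re module is not the PCRE build NoteTab uses, so treat this as an approximation: `[^\r\n]` stands in for `.` (Python's dot also matches `\r`), and the helper names runs and strip_trailing_blanks are invented for the example.

```python
import re

# Stand-in for the clip's ^!Find pattern "^(.+)(\s*\r\n\1)*".
FIND = re.compile(r"^([^\r\n]+)(\s*\r\n\1)*", re.MULTILINE)


def runs(text):
    """Return (captured line, number of lines in the match) per match."""
    return [(m.group(1), m.group(0).count("\r\n") + 1)
            for m in FIND.finditer(text)]


def strip_trailing_blanks(text):
    """Remove spaces and tabs directly before a line break or the end."""
    return re.sub(r"[ \t]+(?=\r\n|\Z)", "", text)


sample = ("0.verizon.windowsxp \r\n"
          "0.verizon.windowsxp\r\n"
          "0.verizon.windowsxp\r\n")

# The trailing blank becomes part of the capture, so the run is split.
print(runs(sample))
# After stripping trailing blanks first, the three lines form one run.
print(runs(strip_trailing_blanks(sample)))
```

On the raw sample the first match covers one line and the second covers two; after stripping, a single match covers all three.
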

P.S. By the way: It makes no difference -- but what about...

^$GetDocReplaceAll("\s+$|(?<=.)\z";"\r\n")$

(since NT v.5.0 the \s matches "any white space", including CRLF).

You're right.

...

Not necessary if we change the ^!Find to this:

^!Find "^(.+\S)(\s*\r\n\1)*" RS

...

Haven't tested that and it may work fine. However it ...

Hmm, maybe it should be

^!Find "^(.*\S)(\s*\r\n\1)*" RS

just in case there is only one visible character on the line.

Regards,
Sheri
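
A quick way to see why `.*\S` is the safer choice is to try both variants on lines with a single visible character. The sketch below uses Python's re module as an approximation of NoteTab's PCRE, with `[^\r\n]` in place of `.` because Python's dot also matches `\r`. The names PLUS, STAR and runs are made up for the illustration.

```python
import re

# "^(.+\S)(\s*\r\n\1)*" and "^(.*\S)(\s*\r\n\1)*", adapted for Python.
PLUS = r"^([^\r\n]+\S)(\s*\r\n\1)*"
STAR = r"^([^\r\n]*\S)(\s*\r\n\1)*"


def runs(pattern, text):
    """Return (captured line, number of lines in the match) per match."""
    rx = re.compile(pattern, re.MULTILINE)
    return [(m.group(1), m.group(0).count("\r\n") + 1)
            for m in rx.finditer(text)]


blanks = "abc \r\nabc\r\nabc\r\n"   # first line has a trailing blank
single = "x\r\nx \r\nx\r\n"         # one visible character per line

print(runs(PLUS, blanks))   # both variants group all three lines
print(runs(STAR, blanks))
print(runs(PLUS, single))   # .+\S needs at least two characters
print(runs(STAR, single))
```

Because `\S` forces the capture to end on a visible character, trailing blanks stay outside the backreference and are absorbed by `\s*` instead.
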
Indeed that pattern matches in the middle of \r\n. So after
replacement, we end up with \r\n\n. The reason it makes no difference
to the outcome is that we are only counting "\r\n" in the result of
^$GetDocReplaceAll$. You would be able to see a problem if that text
were inserted in the document. (Something else happens when you
insert it, though: \r\n\n becomes \r\n\r\n because the input control
needs line breaks to be \r\n. You can check this by comparing the
string size prior to insertion with the size of the selection after
insertion.)

I think it is preferable to greedily replace white space that precedes
"\r\n" with "". The white space does include other CRLFs.

Regards,
Sheri
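
The difference between the replacements can be checked with a short Python sketch. Python's re module is only an approximation of NoteTab's PCRE: `\Z` is used where PCRE has `\z`, and the multiline flag is set explicitly on the assumption that NoteTab treats `$` as end of line. The third function is one possible reading of "greedily replace white space that precedes \r\n"; its pattern is not from the thread. All function names are invented here.

```python
import re


def normalise_dollar(text):
    """The P.S. pattern: \\s+$ can stop between \\r and \\n."""
    return re.sub(r"\s+$|(?<=.)\Z", "\r\n", text, flags=re.MULTILINE)


def normalise_crlf(text):
    """The pattern used in the clip: always consumes whole CRLF pairs."""
    return re.sub(r"\s*(\r\n)+|(?<=.)\Z", "\r\n", text)


def strip_before_crlf(text):
    """Assumed reading of the suggestion: delete white space before CRLF."""
    return re.sub(r"\s+(?=\r\n)", "", text)


sample = "abc \r\n\r\nabc\r\n"

for fn in (normalise_dollar, normalise_crlf, strip_before_crlf):
    out = fn(sample)
    # Show the raw result and the number of CRLF pairs it contains.
    print(fn.__name__, repr(out), out.count("\r\n"))
```

The first result contains a stray `\n`, yet all three contain the same number of `\r\n` pairs, which is why the count is unaffected.
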
Yes, I can see now what happens here. I think your explanation is in
better accordance with the PCRE Documentation than the NoteTab Help
on RegEx. The latter says: "$ assert end of string (or line, in
multiline mode)".

The PCRESYNTAX Documentation from PCRE 7.7 is more detailed: "$ end
of subject, also before newline at end of subject, also before
internal newline in multiline mode."

Thanks again, Sheri! I can clearly see the difference now and why
^$GetDocReplaceAll("\s+$|(?<=.)\z";"\r\n")$ doesn't affect the
result.

Flo
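
The positions where `$` can match are easy to list with a few lines of Python. Python's re module follows the same rule as the PCRE wording quoted above when the newline is a bare `\n`, but it is a different engine, so this is only an illustration.

```python
import re

subject = "ab\r\ncd\r\n"   # offsets: a=0 b=1 \r=2 \n=3 c=4 d=5 \r=6 \n=7

# Multiline mode: before every \n and at the very end.
multiline = [m.start() for m in re.finditer(r"$", subject, re.MULTILINE)]

# Default mode: only before a final \n and at the very end.
default = [m.start() for m in re.finditer(r"$", subject)]

print(multiline)
print(default)
```

Offset 3 sits between `\r` and `\n`, which is the "middle of \r\n" match discussed earlier in the thread.
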

