RE: [NTS] Changing CR & LF

  • John Shotsky
    Message 1 of 16 , Jun 3, 2008
      Art,

      Nice essay.

      Just for clarification, my usage doesn't use those two clips in sequence - one is used at the beginning, the other is
      used at the end. I do all the processing within my clip libraries (which now run to thousands of lines) using \n
      internally. That is, it is an easy form in which to work within the clips, but they are all changed back at exit.

      In Options, you can change your 'Save As' format, but it will not really affect the document you have opened - it
      applies to new documents. It is counterintuitive, but if you open a Unicode document, then save it, you will see that
      it's still in Unicode regardless of the format you have specified for output. You can see this by looking at the file
      size. If you copy the contents of that document and paste into a new document, then save, you'll see that the file size
      has reduced by 50%, as it should. This really threw me, because my clips are not designed to run on Unicode documents,
      so the clip library would just fail miserably. I was getting Unicode from OCR because of wider support for symbols. So
      now, if I receive Unicode documents, I immediately convert them by copy/paste/save, and go from there.
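
A minimal sketch of the size difference described above, assuming the "Unicode" files were UTF-16 and the converted copies used a single-byte ANSI code page (cp1252 here). Both encodings are assumptions, since the thread does not name them, and the function name is made up for the example.

```python
# Sketch: size of the same text saved as UTF-16 versus a single-byte code page.
# The encodings (utf-16, cp1252) and the sample text are assumptions.

def encoded_sizes(text):
    """Return (utf16_bytes, ansi_bytes) for the given text."""
    utf16 = text.encode("utf-16")   # Python prepends a 2-byte byte order mark
    ansi = text.encode("cp1252")    # one byte per character
    return len(utf16), len(ansi)


if __name__ == "__main__":
    sample = "1 cup flour\r\n2 eggs\r\n" * 100
    utf16_size, ansi_size = encoded_sizes(sample)
    print("UTF-16:", utf16_size, "bytes; ANSI:", ansi_size, "bytes")
```

Apart from the two-byte mark at the start, the UTF-16 copy is exactly twice the size of the ANSI copy, which is consistent with the file shrinking by about 50% after copy/paste/save.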

      You can

      From: ntb-scripts@yahoogroups.com [mailto:ntb-scripts@yahoogroups.com] On Behalf Of Art Kocsis
      Sent: Tuesday, June 03, 2008 6:54 AM
      To: ntb-scripts@yahoogroups.com
      Subject: RE: [NTS] Changing CR & LF

      Warning - long post, question at end.

      Well, Thanks to you guys and the help files some progress has been made.
      Probably most of this is quite well known to you all but I need to document
      what I have learned and maybe it will do someone else some good.

      I don't know about you but oftentimes with a problem such as this I get very
      frustrated. Seemingly duplicate tests yield different results. It typically
      means something else is going on that is unknown. That is what happened here.

      Lotta, you gave me a clue but I glossed over it until I found it explicitly
      in the RegEx help file:

      "all types of documents are temporarily converted to Windows texts
      for display purposes" [i.e., CRLF for line terminators]

      So all my test cases - using CR (0Dh), LF (0Ah) or CRLF (0D0Ah) as line
      terminators - were identical inside NoteTab. This is true for NoteTab 4.95
      as well as 5.61.

      Test (EOL=0A).txt
      ----------------------------------------------------------
      00000000 24242424 24240A24 24242424 240A2424 $$$$$$?$$$$$$?$$
      00000010 24242424 0A252525 $$$$?%%%

      Test (EOL=0A) Cut & Paste to UE.txt
      ----------------------------------------------------------
      00000000 24242424 24240D0A 24242424 24240D0A $$$$$$??$$$$$$??
      00000010 24242424 24240D0A 252525 $$$$$$??%%%

      So I could load this Unix file, look at it, play with it, add new
      lines and then save it. When I looked at the saved file it still
      has the LF terminators. However, when I cut and pasted the text
      from NoteTab into a hex editor, I can compare the disk contents
      to the NoteTab contents and see the CRLF terminator replacement.
      The same holds true for a Mac file. So that little mystery is
      resolved.
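
The two dumps above can be reproduced outside NoteTab. The sketch below is not NoteTab's code; it only assumes that the in-memory conversion rewrites every terminator as CRLF, and it prints the bytes in the same 4-byte words as the dumps. The function names are hypothetical.

```python
import re


def to_crlf(data):
    """Rewrite CRLF, lone CR and lone LF terminators as CRLF."""
    # CRLF is listed first so an existing pair is not doubled.
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


def hex_words(data):
    """Format bytes as space-separated 4-byte hex words."""
    digits = data.hex().upper()
    return " ".join(digits[i:i + 8] for i in range(0, len(digits), 8))


if __name__ == "__main__":
    unix_file = b"$$$$$$\n$$$$$$\n$$$$$$\n%%%"   # same content as the test file
    print(hex_words(unix_file))                  # bytes as stored on disk
    print(hex_words(to_crlf(unix_file)))         # bytes after conversion
```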

      Just to be pedantic and unambiguous a Unix file uses a LF (0Ah)
      for its line terminator, a Mac file uses a CR (0Dh) and a Windows
      file uses both CRLF (0D0Ah) in that order.
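
To check which convention a file on disk actually uses, the terminators can be counted directly from the raw bytes. This is only an illustrative sketch; the labels and the function name are assumptions, not anything NoteTab exposes.

```python
import re
from collections import Counter

# CRLF must come first in the alternation so the pair is counted once.
EOL = re.compile(rb"\r\n|\r|\n")
NAMES = {b"\r\n": "Windows (CRLF)", b"\n": "Unix (LF)", b"\r": "Mac (CR)"}


def count_terminators(data):
    """Count each kind of line terminator found in raw file bytes."""
    return Counter(NAMES[m.group()] for m in EOL.finditer(data))


if __name__ == "__main__":
    print(count_terminators(b"one\r\ntwo\nthree\rfour\r\n"))
```

A file that reports more than one kind has mixed line endings, which is the situation the HTML pages discussed in this thread were in.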

      I also discovered that a right click on a tab results in a pull
      down menu that has a "Save Format" sub-menu where one can specify
      the save format to be Windows, Unix, Mac, EBCDIC or Original as
      well as use ANSI or ASCII character sets. That would have been
      nice to notice years ago instead of just now ;(. This only sets
      the save mode for that tab only, however. Is there a global default
      setting somewhere? The default now seems to be "Original".

      So back to the RegEx problem. I have to say the RegEx help file
      alone was worth the $10 for the upgrade. [That doesn't mean that
      I know RegEx, just that it is by far a LOT clearer and more
      comprehensive than anything else that I found.] I was also struck by how
      complex RegEx and its implementation are. My head is still reeling and I may
      be more confused now than when I started.

      According to the RegEx help file, the NoteTab implementation operates
      in non-utf8 mode. [Someday I will take the time to find out just what
      utf8 is!] And Notetab defaults to BSR_ANYCRLF mode:

      "NoteTab version 5.61 uses PCRE's newline option of ANYCRLF by default.
      Earlier versions defaulted to CRLF. ... NoteTab also uses by default PCRE's
      BSR_ANYCRLF option, which allows \R (i.e., backslash-R) to match linebreak
      characters related to Windows, Unix and Mac text files."

      As I understand it (BSR is an abbreviation for "backslash R"), the
      BSR_ANYCRLF mode means that \R will match CR, LF, or CRLF line endings but
      not any Unicode ending. I guess this is good but it seems redundant (except
      for disk files), since all line endings are converted to CRLF while editing
      in Windows. What I don't understand is the discussion about specifying a
      newline convention via (*CR), (*LF), (*CRLF), (*ANYCRLF) or (*ANY). Is this
      redundant with BSR_ANYCRLF, more fine tuned, or ??? Are these something to
      set globally or in front of every pattern?

      So it would appear that I could implement my clip to remove all double line
      endings (i.e., to create a single spaced document), by simply replacing every
      run of one or more CRLFs with a single CRLF and then setting the format mode
      to Windows/DOS (or at least saving it as Windows/DOS). Something like:
      CRLF+ >> CRLF
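
One detail worth spelling out: written literally as a regular expression, a quantifier after the two characters would repeat only the LF, so the CRLF pair has to be grouped. A minimal sketch in Python (used here only for illustration, with a hypothetical function name), assuming the text already uses CRLF throughout:

```python
import re


def collapse_crlf_runs(text):
    """Replace every run of one or more CRLF pairs with a single CRLF."""
    # (?:\r\n)+ repeats the whole pair; \r\n+ would repeat only the \n.
    return re.sub(r"(?:\r\n)+", "\r\n", text)


if __name__ == "__main__":
    doc = "line one\r\n\r\n\r\nline two\r\nline three"
    print(repr(collapse_crlf_runs(doc)))
```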

      The RegEx help doesn't say anything about how to use \R so I may have to
      take back some of my praise for the help file. The regular Notetab help
      helps some.

      However, John's technique does seem to work - at least resulting in single
      spacing.

      ^!Replace "\R+" >> "\n" ARSTW
      ^!Replace "\n" >> "^%NL%" ARSTW

      This seems inefficient, so ...

      ^!Replace "\R+" >> "^%NL%" ARSTW

      also works [once I get the right syntax<g>].
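
For comparison outside NoteTab, here is a rough Python equivalent of that one-liner. Python's `re` module has no \R, so the sketch spells out the three endings that \R covers in ANYCRLF mode; the function name and the default newline are assumptions for the example.

```python
import re

# Stand-in for \R+ under ANYCRLF: one or more of CRLF, CR or LF.
ANYCRLF_RUN = re.compile(r"(?:\r\n|\r|\n)+")


def single_space(text, newline="\r\n"):
    """Collapse every run of line breaks, of any flavour, to one newline."""
    return ANYCRLF_RUN.sub(newline, text)


if __name__ == "__main__":
    mixed = "a\n\n\nb\r\r\nc\r\n\r\nd"
    print(repr(single_space(mixed)))
```

Because the pattern accepts all three endings, it also normalises mixed terminators to the chosen newline in the same pass.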

      This seems almost ridiculous - hours and hours of hair pulling and
      frustration for a single line of code!!! Yet this part works much faster
      than the non RegEx clip.

      So now I am left with the problem of setting the save format mode to Windows.
      I would prefer not actually saving the file, just setting the mode for a
      later save. Lotta, you mentioned something about changing an INI value in a
      clip. What value? Can I make the Windows mode a default or force all saves
      to be Windows mode?

      Thanks again for all your help (and patience for reading all of this!).

      Art



    • Art Kocsis
      Message 2 of 16 , Jun 3, 2008
        Hi John,

        Thanks for the tip. I wonder how many other settings in the options I have
        not seen all these years. Actually, setting the "Options | Documents |
        Format Save As:" to "windows" DOES work for existing documents (unless you
        have overridden the setting via a right click). It even works for docs that
        are already loaded. Try it. Load some docs of various types, change the
        "save as" mode and then right click on the tabs to verify the "save as"
        mode. Anyway, what used to take forever (30 seconds or more for large files)
        is now down to less than a second. RegEx rocks!!! [But I do wish it had
        easier syntax.]

        Yes, I knew you did your clip processing between the two replace
        statements. However, my first attempt at a one liner didn't work so I tried
        your sequence. Then comparing them I found I had missed a caret before the
        %NL%. RegEx is nice but it is extremely picky about syntax.

        Regarding the Unicode, I don't run into Unicode files very often but did
        discover that copy and save trick. I noticed that the "Options | General |
        Protect Unicode Files" had been checked by default. Would unchecking it
        eliminate the need to copy and save? I don't have a known Unicode file to
        check it out.
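
One way to find or make a known Unicode test file is to look for a byte order mark at the start of the file. The sketch below assumes the "Unicode" files in question are UTF-16 with a BOM, which the thread does not confirm; the function name is hypothetical, and a file without a BOM would not be detected.

```python
import codecs


def looks_like_utf16(data):
    """Return True if the bytes start with a UTF-16 byte order mark."""
    # Note: a UTF-32 LE file also begins with the UTF-16 LE mark.
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


if __name__ == "__main__":
    print(looks_like_utf16("hello".encode("utf-16")))   # made-up Unicode sample
    print(looks_like_utf16(b"hello"))                   # plain single-byte text
```

Writing `"some text".encode("utf-16")` to disk would give a small test file for trying the "Protect Unicode Files" option.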

        Thanks again, Art

      • buralex@gmail.com
        Message 3 of 16 , Jun 4, 2008
          Art Kocsis <artkns@...> said on Jun 03, 2008 23:16 -0400 (in
          part):
          > RegEx rocks!!! [But I do wish it had easier syntax.]
          Use it enough and you'll start trying to enter Regex in the Google toolbar.
          <tip> it doesn't work :-) </tip>

          More seriously - if you want to get a better handle on Regex syntax
          RegexBuddy is the way to go. And it has a forum as helpful as the
          Notetab group of mail lists. (Fortunately not as active)

          I use RegexBuddy for almost every non-trivial regex I attempt in Notetab
          then just paste the "correct" expression back into a ^!Find or ^!Replace
          statement.

          Regards ... Alec -- buralex-gmail
          --



        • Art Kocsis
          Message 4 of 16 , Jun 12, 2008
            Even though it's kind of embarrassing to display a kludgy clip, I thought I
            would share this in the hopes that it would inspire other RegEx beginners
            to learn and use RegEx.

            Years ago I got tired of all the empty lines in HTML pages that I was editing
            (largely due to WYSIWYG editors such as FrontPage), so I decided to write a
            clip to get rid of them. I don't know if this was my first clip or not but
            it was early. One of my big problems was handling the various line
            terminators - CR, LF, CRLF - that appeared in the code. I did not learn
            until last month that they were all converted to CRLF in the working image.
            Even so, the ^P token did not work consistently so I came up with this
            scheme. It worked but was quite slow. Finally losing patience with its
            slowness, I decided to redo the clip using RegEx. As you know, with your
            help, I was successful. Below, for your amusement/education/motivation are
            the before & after clips.

            Three lessons can be learned:

            1) Even kludges can be made to work and are useful. Keep trying.
            2) RegEx is quite esoteric yet is conquerable and is extremely efficient.
            3) We need better documentation, especially a User Guide.

            Enjoy, Art

            Note: NoteTab has EXTREMELY picky syntax. When it says "space delimited"
            that means a SINGLE space - two or more spaces => "syntax error"!

            ;^!Replace "SearchText" >> "ReplaceText" [Options TCIBGWHRSA]
            ; W: Whole. Search entire document (not just from the cursor position).
            ; S: Silent. NoteTab will not display any message box.
            ; A: All. Replace matched occurrences, not just first one.
            ;
            ; "ALT+M" invokes the Modify menu
            ; "L" invokes the Lines submenu
            ; "T" trims the selected text


            ;######### Old, Non-RegEx Clip. Could take 30 sec or more on a 40K file
            ^!StatusShow Running Single Space
            ^!Toolbar Select All
            ^!Keyboard ALT+M L T
            ^!Replace "^L" >> "^C" WSA
            ^!Replace "^C^C^C^C^C^C" >> "^C" WSA
            ^!Replace "^C^C^C^C^C" >> "^C" WSA
            ^!Replace "^C^C^C^C" >> "^C" WSA
            ^!Replace "^C^C^C" >> "^C" WSA
            ^!Replace "^C^C" >> "^C" WSA
            ^!Replace "^C" >> "^P" WSA
            ^!Jump Doc_Start


            ;######### New, RegEx Clip. Takes a fraction of a second on even very large files
            ^!StatusShow Running Single Space
            ^!Toolbar Select All
            ^!Keyboard ALT+M L T
            ^!Replace "\R+" >> "^%NL%" ARSTW
            ^!Jump Doc_Start
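
The difference between the two clips can be modelled outside NoteTab. The sketch below is an approximation, not a translation: it assumes each old-style pass behaves like a non-overlapping, left-to-right literal replace, and it uses a hypothetical marker character in place of the clip's intermediate token.

```python
import re

MARK = "\x00"  # hypothetical stand-in for one line break; assumed absent from the text


def collapse_by_passes(text):
    """Model the old clip: fixed literal passes for runs of 6, 5, 4, 3 and 2."""
    text = re.sub(r"\r\n|\r|\n", MARK, text)     # every terminator -> marker
    for run in (6, 5, 4, 3, 2):                  # same pass order as the clip
        text = text.replace(MARK * run, MARK)    # non-overlapping, left to right
    return text.replace(MARK, "\r\n")


def collapse_by_regex(text):
    """Model the new clip: one pattern that matches a run of any length."""
    return re.sub(r"(?:\r\n|\r|\n)+", "\r\n", text)


if __name__ == "__main__":
    doc = "a" + "\r\n" * 7 + "b"
    print(collapse_by_passes(doc) == collapse_by_regex(doc))
```

Under that assumption the five passes reduce any run of up to 208 line breaks to one, but a run of 209 is left as two (209 -> 39 -> 11 -> 5 -> 3 -> 2). The single pattern has no such limit and scans the text once instead of six times.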
          • John Shotsky
            Message 5 of 16 , Jun 12, 2008
              Art,

              ^!Replace "\R+" >> "^%NL%" ARSTW


              Will change your paragraph spacing, because any paragraph separation lines will disappear.
              If you want to retain a blank line between paragraphs, it should be:

              ^!Replace "\R" >> "^%NL%" ARSTW

              John

            • Alec Burgess
              Message 6 of 16 , Jun 12, 2008
                On Thu, Jun 12, 2008 at 6:54 PM, Art Kocsis <artkns@...> wrote:

                > ;######### New, RegEx Clip. Takes a fraction of a second on even very large files
                > ^!StatusShow Running Single Space
                > ^!Toolbar Select All
                > ^!Keyboard ALT+M L T
                > ^!Replace "\R+" >> "^%NL%" ARSTW
                > ^!Jump Doc_Start
                >

                Just a couple of comments Art:
                As always there are many ways to skin the cat in Notetab clips. I usually
                use them in this order:

                - Native command (looking for a native command in Clip Help assists in
                learning others I may not have even realized exist)
                - Menu Command
                - Toolbar Command (actually I never use these - I think ALL Toolbar
                commands are available from menu.)
                - Keyboard Command - avoid like the plague - hard to figure out what they
                do, may require Waits to work correctly, (however they ARE necessary when
                trying to drive an external window from within Notetab)

                so:

                > ^!Toolbar Select All
                > ^!Keyboard ALT+M L T
                >
                would be:

                > ^!select ALL
                > ; or optionally ^!Menu Edit/"Select All";
                > ^!Menu Modify/Lines/Trim Blanks


                what say the rest of the regular contributors?
                --
                Regards ... Alec
                --


              • Art Kocsis
                Message 7 of 16 , Jun 12, 2008
                  Thank you for your interest John, but my clip does exactly what I want
                  to do: change an entire document to single spacing, i.e., change every
                  instance of one or more consecutive newlines to a single newline. Your
                  clip is, in essence, a null clip as it just replaces each instance of a
                  newline with a newline. Compare the two expressions operating on a
                  multiply spaced document.
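
The comparison can be sketched in Python, with the three line endings written out because Python's `re` module has no \R. The sketch assumes the working text is already all CRLF, as described earlier in the thread; the function names are made up for the example.

```python
import re


def replace_each(text):
    """Like replacing \\R with a newline: each break becomes one CRLF."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


def replace_runs(text):
    """Like replacing \\R+ with a newline: each run of breaks becomes one CRLF."""
    return re.sub(r"(?:\r\n|\r|\n)+", "\r\n", text)


if __name__ == "__main__":
    doc = "para one\r\n\r\n\r\npara two\r\n"
    print(replace_each(doc) == doc)        # text is unchanged
    print(repr(replace_runs(doc)))         # blank lines are gone
```

On CRLF-only text the first form leaves the document as it was, while the second removes every blank line, including the ones that separate paragraphs.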

                  Namaste', Art


                • Art Kocsis
                  Message 8 of 16 , Jun 12, 2008
                    Hi Alec,

                    This says it all: "learning others I may not have even realized exist"

                    I had originally coded the Trim as:

                    ^!Keyboard ALT+M O &100 L O &100 T

                    But it didn't work so I did the

                    ^!Keyboard ALT+M L T

                    which did work, but is esoteric.

                    After learning about ^!Toolbar, I tried this, as it would be self-documenting:

                    ^!Toolbar Trim Blanks

                    but that didn't work because, as you know, it's not on a toolbar! So
                    I gave up and went back to what worked.

                    So thanks for the tip. I didn't know about the ^!Select ALL or ^!Menu
                    until now. It is much better and is what I had wanted to do all along.

                    Getting back to the Clip Help file: it is huge and I find I spend a huge
                    amount of time searching it, even for stuff I have seen before, let alone
                    commands or techniques of which I am not aware. There are not
                    enough internal links and the organization is frequently not the way
                    that I think. To find all these tidbits I would need to read the file end
                    to end a few times which would take forever. In the meantime, I am
                    collecting the various command names that I use or that look promising
                    into a single sorted file. Maybe that will help. Thankfully there is this
                    list, where I reap the benefit of other eyes reading the help file, like

                    "trying to drive an external window from within Notetab" ?? What is that??

                    Thanks again, Art

