
regex to find paragraphs in a document

  • Don - HtmlFixIt.com
    Message 1 of 6 , Apr 9, 2007
      Sheri should be proud. I actually figured out a couple of regexes!

      But I am back at a wall here. I am trying to convert a text document
      into html. I have two types of headings in the document and I have
      inserted the proper number of returns before each to indicate what type
      of heading it is. Now I want to wrap my paragraphs in p tags. I am
      using NTP 5.x

      My document starts with text in this format (P = a line break, i.e. a
      CR/LF pair):

      P
      P
      P
      P
      HEADING2P
      paragraph 1 here etc etc etcP
      P
      paragraph 2 here againP
      P
      P
      Another Heading3P
      paragraph 3 here againP

      and so forth.

      What I am doing is running the file and finding every heading preceded
      by four returns and nesting it in <h2> tags (heading example 2 above).
      That is working fine with this regex:
      ^!Jump Doc_Start
      ^!Replace "\r\n\r\n\r\n\r\n([^\r\n]+)" >> "\r\n\r\n<h2 class="cap_control">$1</h2>" TIRSWA

      When I am done, two of the returns in front of it have been removed.

      Then I repeat that process, finding every heading with three returns in
      front of it and leaving behind h3 tags (heading example 3 above):
      ^!Replace "\r\n\r\n\r\n([^\r\n]+)" >> "\r\n\r\n<h3 class="cap_control">$1</h3>" TIRSWA

      Now I am trying to wrap my paragraphs in <p></p> tags. How do I find them?

      They are preceded by an </h#> tag line (examples one and three above
      after application of regex) or by a blank line (example 2 paragraph above).

      I hope that makes sense.
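
For anyone who wants to try the two heading passes outside NoteTab, here is a minimal Python sketch of the same idea. It assumes CRLF line endings, as in the clip, and uses Python's `re` module rather than NoteTab's regex engine. The function name `mark_headings` is made up for illustration.

```python
import re

def mark_headings(text: str) -> str:
    """Tag headings by the number of CRLFs in front of them (h2 first, then h3)."""
    # Four returns before a line mark an h2 heading; two returns are left behind.
    text = re.sub(r"\r\n\r\n\r\n\r\n([^\r\n]+)",
                  r'\r\n\r\n<h2 class="cap_control">\1</h2>', text)
    # Three returns before a line mark an h3 heading; again two returns remain.
    text = re.sub(r"\r\n\r\n\r\n([^\r\n]+)",
                  r'\r\n\r\n<h3 class="cap_control">\1</h3>', text)
    return text
```

The order matters: the four-return pass has to run first, otherwise the three-return pattern would also match inside the longer run of returns.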
    • Sheri
      Message 2 of 6 , Apr 9, 2007
        Don - HtmlFixIt.com wrote:
        > Sheri should be proud. I actually figured out a couple of regexes!
        >
        >
        :)

        Hi Don,

        I would point out that you don't need ^!Jump Doc_Start if you're using
        the "W" whole document option. Also, "T" is meaningless in combination
        with "R" regex option.

        You can try this for your paragraphs:

        ^!Replace "^(?!\<h).+$" >> "<p>$0</p>" RAWS

        It matches the beginning of a line (if that BOL is not followed by the
        start of heading tag), and everything on that line (as long as there's
        at least one character) up to the CRLF. Because of the parentheses, the
        text is captured as subpattern 1. Then it replaces the matched text with
        subpattern 1 surrounded by paragraph tags.

        Regards,
        Sheri
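
The paragraph-wrapping idea can also be tried in Python. This is only a sketch under a few assumptions: the text uses CRLF line endings, headings have already been tagged, and `wrap_paragraphs` is a hypothetical name. One adaptation is needed: Python's dot also matches a carriage return, so the sketch uses `[^\r\n]+` where the clip has `.+$`.

```python
import re

# Same idea as the clip's "^(?!\<h).+$": start of a line that is not followed
# by "<h", then at least one character. [^\r\n]+ stands in for .+$ because
# Python's dot would also swallow the CR of a CRLF line ending.
PARAGRAPH = re.compile(r"^(?!<h)[^\r\n]+", re.MULTILINE)

def wrap_paragraphs(text: str) -> str:
    """Wrap every non-blank line that does not start with "<h" in <p>...</p>."""
    # \g<0> is the whole match, like $0 in the clip's replacement string.
    return PARAGRAPH.sub(r"<p>\g<0></p>", text)
```

Blank lines are left alone because the pattern requires at least one character. Note that any line starting with `<h` is skipped, not only heading lines.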
      • Sheri
        Message 3 of 6 , Apr 9, 2007
          --- In ntb-clips@yahoogroups.com, Sheri <silvermoonwoman@...> wrote:
          >
          > Don - HtmlFixIt.com wrote:
          > > Sheri should be proud. I actually figured out a couple of regexes!
          > >
          > >
          > :)
          >
          > Hi Don,
          >
          > I would point out that you don't need ^!Jump
          > Doc_Start if you're using the "W" whole document
          > option. Also, "T" is meaningless in combination with
          > "R" regex option.
          >
          > You can try this for your paragraphs:

          >
          > ^!Replace "^(?!\<h).+$" >> "<p>$0</p>" RAWS
          >
          > It matches the beginning of a line (if that BOL is
          > not followed by the start of heading tag), and
          > everything on that line (as long as there's at least
          > one character) up to the CRLF. Because of the
          > parentheses, the text is captured as subpattern 1. Then
          > it replaces the matched text with subpattern 1
          > surrounded by paragraph tags.

          Sorry, I sent a bit too quickly. Ignore what I said about subpattern
          1, I took the parentheses out of the pattern because they were
          unnecessary. The parentheses were previously around the dot plus in
          the pattern. Then the replacement referred to $1 instead of $0.
          Thought it might confuse you that the dot plus was $1, because of the
          other parentheses in the pattern. Parentheses surrounding an assertion
          do not count.

          Regards,
          Sheri
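
The point about assertion parentheses not counting as a subpattern can be checked in any regex engine with lookahead support. A small Python sketch, using Python's `re` module as a stand-in for NoteTab's engine:

```python
import re

# Version with capturing parentheses around the dot plus.
with_group = re.compile(r"^(?!<h)(.+)$", re.MULTILINE)
# Version without them, as in the posted clip.
without_group = re.compile(r"^(?!<h).+$", re.MULTILINE)

print(with_group.groups)     # 1: the lookahead (?!...) is not counted
print(without_group.groups)  # 0: no capturing groups at all

m = with_group.search("plain text line")
# Whole match ($0 in the clip) and subpattern 1 ($1) hold the same text here.
print(m.group(0) == m.group(1))  # True
```

So with the parentheses in place the dot plus is subpattern 1, and without them the replacement has to refer to the whole match instead.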
        • Don - HtmlFixIt.com
          Message 4 of 6 , Apr 9, 2007
            >> I would point out that you don't need ^!Jump
            >> Doc_Start if you're using the "W" whole document
            >> option. Also, "T" is meaningless in combination with
            >> "R" regex option.

            Yes, good point. When I first started I was doing just one. When I
            added the W I should have deleted the jump doc start.
            I am getting the T because I am using somebody's clip bar help, and I
            don't think it works properly for regex searches. So I use the
            Normal replace dialog.

            There are actually a couple of little bugs in Clipcode Syntax that I
            keep meaning to write down.

            One is the regex replace as mentioned above.

            If you type replace and hit the ccsyn icon (that's how I do it anyway),
            you get an opportunity to select either Normal or Regular Expression.

            I choose regular and get essentially this as output:
            ^!Replace "x" >> "y" Ignore case (can also be accomplished with (?i) in
            expression)==Yes^=I|_Not specified^=}AWxS{(T=C)

            Also, when you use iferror you get this:
            ^!IfError GoToLabelTrue [ELSE GoToLabelFalse]

            after a nonsense option pops up. I think it should be prompting me
            for the labels for the goto and else goto, but it doesn't.

            Let me say again how much I love CCSYN! Thank you for your efforts in
            that regard.

            I will try your paragraph method next. I was kind of getting them
            (don't laugh) using this:
            :Loop1
            ;find paragraphs after heading
            ^!Find "[\w[:punct:]]\r\n[\w]" TIRS
            ^!Jump Select_End
            ^!MoveCursor -1
            ^!Select Line
            ^!InsertHtml <p>^&</p>
            ^!If "^$GetRow$" = "^$GetLineCount$" Loop2


            One problem with that is that it was grabbing the return at the end of
            the paragraph so then the extra replace is necessary to reverse the </p>
            and the ^P.

            I was also trying ^!Select paragraphs, but same issue there with
            grabbing the ^P inside my paragraph tags.
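
The problem of the line break ending up inside the paragraph tags can be illustrated with a line-by-line loop that splits each line into its text and its line ending before wrapping. This is a Python sketch, not clip code, and `wrap_lines` is a hypothetical helper name.

```python
def wrap_lines(text: str) -> str:
    """Wrap non-blank, non-heading lines in <p> tags, keeping line endings outside."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")   # the text of the line without its CR/LF
        ending = line[len(body):]    # whatever line ending was stripped off
        if body and not body.startswith("<h"):
            body = f"<p>{body}</p>"
        out.append(body + ending)    # the ending goes back after the closing tag
    return "".join(out)
```

Because the line ending is put back after `</p>`, no second replace is needed to swap the closing tag and the return.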
          • Sheri
            Message 5 of 6 , Apr 9, 2007
              Hi Don,

              Thanks for reporting those bugs. If you mention things as you come
              across them, I'll try to fix them up. I posted an update to Clipcode
              Syntax in the files area. :)

              Regards,
              Sheri
            • Lee Underwood
              Message 6 of 6 , Apr 9, 2007
                On 4/9/2007 11:31 PM, Sheri wrote:
                ........................................................

                >Hi Don,
                >
                >I posted an update to Clipcode Syntax in the files area. :)
                >
                >Regards,
                >Sheri
                ........................................................

                Nice work, Sheri, as usual. Thanks! I'll have to look this one over carefully.

                Lee

                ...........................................................
                Lee Underwood
                Jupitermedia Corporation

                Managing Editor
                <http://www.webdeveloper.com/>WebDeveloper.com |
                <http://scriptsearch.internet.com/>ScriptSearch |
                <http://javascript.internet.com/>JavaScript Source |
                <http://www.thecounter.com/>TheCounter |
                <http://www.theguestbook.com/>TheGuestbook

                Associate Editor
                <http://www.webreference.com/>WebReference.com

