regex to find paragraphs in a document
- Sheri should be proud. I actually figured out a couple of regexes!
But I am back at a wall here. I am trying to convert a text document
into html. I have two types of headings in the document and I have
inserted the proper number of returns before each to indicate what type
of heading it is. Now I want to wrap my paragraphs in p tags. I am
using NTP 5.x.
In my document I start with this (P=line feed/return or whatever it is)
format of text:
paragraph 1 here etc etc etcP
paragraph 2 here againP
paragraph 3 here againP
and so forth.
What I am doing is running the file and finding every heading preceded
by four returns and nesting it in <h2> tags (heading example 2 above).
That is working fine with this regex:
^!Replace "\r\n\r\n\r\n\r\n([^\r\n]+)" >> "\r\n\r\n<h2
When I am done I have removed two of the returns in front of it.
Then I repeat that process, finding every heading with three returns in
front of it, and leave behind h3 tags (heading example 3 paragraph above):
^!Replace "\r\n\r\n\r\n([^\r\n]+)" >> "\r\n\r\n<h3
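For readers outside NoteTab, the two heading passes above can be sketched in Python. This is only an illustration of the regex logic, not the clip itself: the replacement strings are cut off in the post, so the `<h2>\1</h2>` and `<h3>\1</h3>` replacements, the `mark_headings` name, and the sample text are all assumptions.

```python
import re

def mark_headings(text: str) -> str:
    """Wrap headings in <h2>/<h3> based on how many CRLFs precede them."""
    # Four CRLFs before a line: treat it as an h2 and leave two CRLFs behind.
    # (Assumed replacement; the original clip line is truncated.)
    text = re.sub(r"\r\n\r\n\r\n\r\n([^\r\n]+)", r"\r\n\r\n<h2>\1</h2>", text)
    # Three CRLFs: h3. This pass runs second; run first, it would also match
    # the last three of the four CRLFs in front of an h2 heading.
    text = re.sub(r"\r\n\r\n\r\n([^\r\n]+)", r"\r\n\r\n<h3>\1</h3>", text)
    return text

# Made-up sample document with one heading of each kind.
sample = ("intro text\r\n\r\n\r\n\r\nBig Heading\r\nbody text"
          "\r\n\r\n\r\nSmall Heading\r\nmore body\r\n")
converted = mark_headings(sample)
print(repr(converted))
```

After the first pass each h2 heading has only two CRLFs in front of it, so the three-CRLF pattern can no longer match it.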
Now I am trying to wrap my paragraphs in <p></p> tags. How do I find them?
They are preceded by an </h#> tag line (examples one and three above
after application of regex) or by a blank line (example 2 paragraph above).
I hope that makes sense.
- Don - HtmlFixIt.com wrote:
> Sheri should be proud. I actually figured out a couple of regexes!:)
I would point out that you don't need ^!Jump Doc_Start if you're using
the "W" whole document option. Also, "T" is meaningless in combination
with "R" regex option.
You can try this for your paragraphs:
^!Replace "^(?!\<h).+$" >> "<p>$0</p>" RAWS
It matches the beginning of a line (if that BOL is not followed by the
start of heading tag), and everything on that line (as long as there's
at least one character) up to the CRLF. Because of the parentheses, the
text is captured as subpattern 1. Then it replaces the matched text with
subpattern 1 surrounded by paragraph tags.
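The same idea can be tried outside NoteTab with a rough Python sketch. It is an approximation under stated assumptions: Python's `.` also matches `\r`, so `[^\r\n]+` stands in for `.+$` to keep the CRLF out of the match, and the `wrap_paragraphs` name and sample text are made up for illustration.

```python
import re

# Beginning of a line, not followed by "<h", then the rest of that line.
# [^\r\n]+ is used instead of ".+$" because Python's "." would swallow "\r".
PARAGRAPH_LINE = re.compile(r"^(?!<h)[^\r\n]+", re.MULTILINE)

def wrap_paragraphs(text: str) -> str:
    """Surround every non-empty, non-heading line with <p>...</p>."""
    # \g<0> is the whole match, the counterpart of $0 in the clip.
    return PARAGRAPH_LINE.sub(r"<p>\g<0></p>", text)

sample = ("<h2>Big Heading</h2>\r\nparagraph 1 here\r\n"
          "\r\nparagraph 2 here again\r\n")
wrapped = wrap_paragraphs(sample)
print(repr(wrapped))
```

Blank lines are skipped because the pattern needs at least one character on the line, and heading lines are skipped by the negative lookahead.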
- --- In email@example.com, Sheri <silvermoonwoman@...> wrote:
> Don - HtmlFixIt.com wrote:
> > Sheri should be proud. I actually figured out a couple of regexes!
> Hi Don,
> I would point out that you don't need ^!Jump
> Doc_Start if you're using the "W" whole document
> option. Also, "T" is meaningless in combination with
> "R" regex option.
> You can try this for your paragraphs:
> ^!Replace "^(?!\<h).+$" >> "<p>$0</p>" RAWS
> It matches the beginning of a line (if that BOL is
> not followed by the start of heading tag), and
> everything on that line (as long as there's at least
> one character) up to the CRLF. Because of the
> parentheses, the text is captured as subpattern 1. Then
> it replaces the matched text with subpattern 1
> surrounded by paragraph tags.
Sorry, I sent a bit too quickly. Ignore what I said about subpattern
1, I took the parentheses out of the pattern because they were
unnecessary. The parentheses were previously around the dot plus in
the pattern. Then the replacement referred to $1 instead of $0.
Thought it might confuse you that the dot plus was $1, because of the
other parentheses in the pattern. Parentheses surrounding an assertion
do not count.
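The point about assertion parentheses not counting can be checked with a small Python sketch. Python's `re` module is used here only as a stand-in for NoteTab's regex engine, on the assumption that both number capturing groups the same way, and the sample line is made up.

```python
import re

# Earlier form: parentheses around the dot plus make it subpattern 1.
with_group = re.compile(r"^(?!<h)(.+)$")
# Posted form: no capturing parentheses, so only the whole match exists.
without_group = re.compile(r"^(?!<h).+$")

line = "paragraph 1 here"
m1 = with_group.match(line)
m2 = without_group.match(line)

# The lookahead's parentheses are not counted as a capturing group.
print(with_group.groups, without_group.groups)
# Whole match and subpattern 1 hold the same text in the earlier form.
print(m1.group(0) == m1.group(1))
print(m2.group(0))
```

Either form matches the same text, so the replacement can refer to the whole match and the extra parentheses can go.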
>> I would point out that you don't need ^!Jump
>> Doc_Start if you're using the "W" whole document
>> option. Also, "T" is meaningless in combination with
>> "R" regex option.
Yes, good point. When I first started I was doing just one. When I
added the W I should have deleted the jump doc start.
I am getting the T because I am using somebody's clip bar help, and I
don't think it works properly for the regex search. So I use the
Normal replace dialog.
There are actually a couple of little bugs in cc syntax that I keep
meaning to write down.
One is the regex replace mentioned above.
If you type replace and hit the ccsyn icon (that's how I do it anyway),
you get an opportunity to select either Normal or Regular Expression.
I choose regular and get essentially this as output:
^!Replace "x" >> "y" Ignore case (can also be accomplished with (?i) in
Also, when you use iferror you get this:
^!IfError GoToLabelTrue [ELSE GoToLabelFalse]
after a nonsense option pops up. I think it should be prompting me for
the labels for the goto and the else goto, but it doesn't.
Let me say again how much I love CCSYN! Thank you for your efforts in
I will try your paragraph method next. I was kind of getting them
(don't laugh) using this:
;find paragraphs after heading
^!Find "[\w[:punct:]]\r\n[\w]" TIRS
^!If "^$GetRow$" = "^$GetLineCount$" Loop2
One problem with that is that it was grabbing the return at the end of
the paragraph so then the extra replace is necessary to reverse the </p>
and the ^P.
I was also trying ^!Select paragraphs, but same issue there with
grabbing the ^P inside my paragraph tags.
- Hi Don,
Thanks for reporting those bugs. If you mention things as you come
across them, I'll try to fix them up. I posted an update to Clipcode
Syntax in the files area. :)
On 4/9/2007 11:31 PM, Sheri wrote:
>I posted an update to Clipcode Syntax in the files area. :)
Nice work, Sheri, as usual. Thanks! I'll have to look this one over carefully.