Re: [NTB] Help with regex... removing some html tags
- At 02:46 PM 8/6/03 -0000, you wrote:
>Another regular expression neophyte here...
Hi Mike,
>I have about 250 html pages and I need to remove all of the
>can't find the right combination for the regex.
> every possible character and line feed combination
>Using the following statement:
>It finds the first occurrence of <script and the very last occurrence
>of </script>. It never finds </script> tags that fall in-between. Of
>course, it also selects other tags that I need to keep when it only
>grabs the last </script>.
>I haven't even thought about what the replacement statement should
>look like... I'm guessing that it would be blank because I want to
>remove all occurrences.
>Any help would be appreciated.
You should not post questions about advanced programming or script use to
the basic list. This should be posted to the NoteTab Clips list. I have
posted this reply there with a Cc: to you in case you are not on that list
just yet. You can join the clips list by sending an empty post to:
NoteTab currently has a "greedy" regex engine, which is why the match runs
from the first <script to the very last </script>. Eric will be writing a
new version, 5.0, with a new regex engine that will not have this problem.
In the meantime you must use a regular search and replace.
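To illustrate what "greedy" means here, the following is a minimal sketch in Python rather than NoteTab clip code. The sample page and the variable names are made up for the illustration, and it assumes an engine that supports lazy quantifiers (`.*?`), which is the general technique and not something the NoteTab engine discussed above is known to offer.

```python
import re

# Hypothetical sample page: two script blocks with text to keep between them.
html = (
    "<p>keep 1</p>\n"
    "<script type='text/javascript'>\nvar a = 1;\n</script>\n"
    "<p>keep 2</p>\n"
    "<script>var b = 2;</script>\n"
    "<p>keep 3</p>\n"
)

flags = re.DOTALL | re.IGNORECASE  # let "." cross line breaks, ignore tag case

# Greedy: ".*" extends to the LAST </script>, so "keep 2" is swallowed too.
greedy = re.sub(r"<script.*</script>", "", html, flags=flags)

# Lazy: ".*?" stops at the FIRST </script> following each <script.
lazy = re.sub(r"<script.*?</script>", "", html, flags=flags)
```

With the greedy pattern the paragraph between the two script blocks is lost, which is the symptom described in the question; the lazy pattern removes each block separately and keeps it.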
Here is the clip:
;08/06/2003, 12:53:04 PM
^!Find "<script\" TISA
^!IfError Exit ELSE Next
^!Find "</script>" TISA
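The idea behind the clip (find the opening tag, find the next closing tag, drop everything between them, repeat until no opening tag is left) can be sketched outside NoteTab as follows. This is a Python illustration and not a translation of the clip: the function name `strip_blocks` and its parameters are hypothetical, and case-insensitive matching is an assumption.

```python
import re

def strip_blocks(text, start="<script", end="</script>"):
    """Remove every span running from `start` through the next `end`."""
    # re.escape makes these literal, plain-text searches (no regex wildcards).
    start_re = re.compile(re.escape(start), re.IGNORECASE)
    end_re = re.compile(re.escape(end), re.IGNORECASE)
    kept = []
    pos = 0
    while True:
        s = start_re.search(text, pos)
        if s is None:          # no further opening tag: done
            break
        e = end_re.search(text, s.end())
        if e is None:          # unclosed block: leave the rest untouched
            break
        kept.append(text[pos:s.start()])  # keep the text before the block
        pos = e.end()                     # resume right after the closing tag
    kept.append(text[pos:])
    return "".join(kept)
```

Because each search for the closing tag starts right after the opening tag just found, the loop always pairs an opening tag with the nearest closing tag, which avoids the greedy-match problem without relying on regex features.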
- Thanks for the help... didn't know that this was a regex bug.
Sorry 'bout postin' outside of ntbClips. I thought it was just a one
liner in the find/replace dialog box.
- I realize this thread is old, but I'm trying to do a similar job and this script doesn't work in NTP v6. I want to remove text starting with " and ending with ". A program that removes all styles from an HTML page would work, but I thought a script would be simpler for minor editing. Larry's script didn't work for me using either "\"" or '"'. Any suggestions?
--- In firstname.lastname@example.org, Larry Thomas <larryt@...> wrote:
> At 02:46 PM 8/6/03 -0000, you wrote:
> >Another regular expression neophyte here...
> >I have about 250 html pages and I need to remove all of the
> >can't find the right combination for the regex.
> Here is the clip:
> ;lrt@...
> ;08/06/2003, 12:53:04 PM
> ^!Find "<script\" TISA
> ^!IfError Exit ELSE Next
> ^!Set %Cursor%=^$Getrow$:^$Getcol$
> ^!Find "</script>" TISA
> ^!Jump Select_End
> ^!SelectTo ^%Cursor%
> ^!Select Lines
> ^!Goto Loop
> lrt@...
- --- In email@example.com, "pdqweb" <pat@...> wrote:
> I want to remove text starting with " and ending with ".
Assuming (!) that all your quotes surrounding segments to remove are properly paired:
^!Replace "\x22[^"]++\x22" >> "" WARS
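For anyone who wants to try the pattern outside NoteTab first, here is a minimal Python sketch of the same idea. The function name `remove_quoted` is made up for the example. It uses a plain `+` where the clip uses the possessive `++`; for this particular pattern the two match the same text, since the character class can never consume a quote, and the plain form also runs on Python versions older than 3.11.

```python
import re

def remove_quoted(text):
    """Delete each run of: quote, one or more non-quote characters, quote."""
    # Assumes quotes are properly paired; a stray quote shifts every later pairing.
    return re.sub(r'"[^"]+"', "", text)
```

Note that the quotes themselves are removed along with the text between them, and an empty pair `""` is left alone because the pattern requires at least one character inside.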