Re: [NTB] Help with regex... removing some html tags

  • Larry Thomas
    Message 1 of 4, Aug 6, 2003
      At 02:46 PM 8/6/03 -0000, you wrote:
      >Another regular expression neophyte here...
      >
      >I have about 250 html pages and I need to remove all of the
      >javascript. Each page has multiple scripts. My problem is, I just
      >can't find the right combination for the regex.
      >
      >Each javascript has:
      > <script
      > every possible character and line feed combination
      > </script>
      >
      >Using the following statement:
      > (<script\a+(</script>))
      >
      >It finds the first occurrence of <script and the very last occurrence
      >of </script>. It never finds </script> tags that fall in-between. Of
      >course, it also selects other tags that I need to keep when it only
      >grabs the last </script>.
      >
      >I haven't even thought about what the replacement statement should
      >look like... I'm guessing that it would be blank because I want to
      >remove all occurrences.
      >
      >Any help would be appreciated.
      >-Mike

      Hi Mike,

      You should not post questions about advanced programming or script use to
      the basic list. This should be posted to the NoteTab Clips list. I have
      posted this reply there with a Cc: to you in case you are not on that list
      just yet. You can join the clips list by sending an empty post to:

      ntb-Clips-Subscribe@yahoogroups.com

      NoteTab currently has a "Greedy" regex engine, which is why your pattern
      runs from the first <script to the very last </script>. Eric will be
      writing a new version called 5.0 which will have a new regex engine
      without this problem. In the meantime you must use a regular search and
      replace.

      Here is the clip:

      H="Delete Javascript"
      ;lrt@...
      ;08/06/2003, 12:53:04 PM
      :Loop
      ^!Find "<script\" TISA
      ^!IfError Exit ELSE Next
      ^!Set %Cursor%=^$Getrow$:^$Getcol$
      ^!Find "</script>" TISA
      ^!Jump Select_End
      ^!SelectTo ^%Cursor%
      ^!Select Lines
      ^!InsertText
      ^!Goto Loop

      Regards,

      Larry
      lrt@...
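
For readers outside NoteTab, here is a minimal Python sketch of the two ideas in this reply: why a greedy pattern runs to the last </script>, and how a plain find loop like the clip's avoids that. It is an illustration only, not a translation of the clip. The sample HTML and the name `strip_scripts_by_search` are made up for the example, and unlike the clip (which extends the selection to whole lines) it removes only the text from <script through </script>.

```python
import re

# Hypothetical sample page with two separate script blocks.
html = (
    "<p>keep 1</p>\n"
    "<script type=\"text/javascript\">\nalert('a');\n</script>\n"
    "<p>keep 2</p>\n"
    "<SCRIPT>\nalert('b');\n</SCRIPT>\n"
    "<p>keep 3</p>\n"
)

FLAGS = re.IGNORECASE | re.DOTALL

# Greedy: .* runs to the LAST </script>, so "keep 2" is swallowed too.
greedy = re.sub(r"<script\b.*</script>", "", html, flags=FLAGS)

# Lazy: .*? stops at the FIRST </script> after each <script.
lazy = re.sub(r"<script\b.*?</script>", "", html, flags=FLAGS)

OPEN = re.compile(r"<script", re.IGNORECASE)
CLOSE = re.compile(r"</script>", re.IGNORECASE)


def strip_scripts_by_search(text):
    """Plain search loop: find '<script', find the next '</script>', drop the span."""
    kept = []
    pos = 0
    while True:
        start = OPEN.search(text, pos)
        if start is None:
            break  # no more opening tags
        end = CLOSE.search(text, start.end())
        if end is None:
            break  # unclosed script: leave the rest untouched
        kept.append(text[pos:start.start()])
        pos = end.end()
    kept.append(text[pos:])
    return "".join(kept)


if __name__ == "__main__":
    print(strip_scripts_by_search(html))
```

On this sample the search loop and the lazy pattern give the same result, while the greedy pattern also deletes the paragraph between the two scripts.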
    • mbl60181
      Message 2 of 4, Aug 6, 2003
        Thanks for the help... didn't know that this was a regex bug.
        Sorry 'bout postin' outside of ntbClips. I thought it was just a one
        liner in the find/replace dialog box.
      • pdqweb
        Message 3 of 4, Mar 10 11:13 AM
          I realize this thread is old, but I'm trying to do a similar job and this script doesn't work in NTP v6. I want to remove text starting with " and ending with ". A program that removes all styles from an HTML page would work, but I thought a script would be simpler for minor editing. Larry's script didn't work for me using either "\"" or '"'. Any suggestions?

          --- In ntb-clips@yahoogroups.com, Larry Thomas <larryt@...> wrote:
          >
          > At 02:46 PM 8/6/03 -0000, you wrote:
          > >Another regular expression neophyte here...
          > >
          > >I have about 250 html pages and I need to remove all of the
          > >javascript. Each page has multiple scripts. My problem is, I just
          > >can't find the right combination for the regex.
          >
          > Here is the clip:
          >
          > H="Delete Javascript"
          > ;lrt@...
          > ;08/06/2003, 12:53:04 PM
          > :Loop
          > ^!Find "<script\" TISA
          > ^!IfError Exit ELSE Next
          > ^!Set %Cursor%=^$Getrow$:^$Getcol$
          > ^!Find "</script>" TISA
          > ^!Jump Select_End
          > ^!SelectTo ^%Cursor%
          > ^!Select Lines
          > ^!InsertText
          > ^!Goto Loop
          >
          > Regards,
          >
          > Larry
          > lrt@...
          >
        • diodeom
          Message 4 of 4, Mar 10 11:32 AM
            --- In ntb-clips@yahoogroups.com, "pdqweb" <pat@...> wrote:
            >
            > I want to remove text starting with " and ending with ".
            >

            Assuming (!) that all your quotes surrounding segments to remove are properly paired:

            ^!Jump 1
            ^!Replace "\x22[^"]++\x22" >> "" WARS
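
As a rough cross-check outside NoteTab, the same pattern can be tried in Python. This is a sketch under assumptions: the sample string and the name `remove_quoted` are invented for the example, and a plain `+` stands in for the possessive `++` (which Python's `re` only accepts from version 3.11). The matches should be the same, since `[^"]` can never consume the closing quote.

```python
import re

# A double quote, one or more non-quote characters, then a closing double quote.
QUOTED = re.compile(r'"[^"]+"')


def remove_quoted(text):
    """Delete every "..." segment, quotes included (assumes properly paired quotes)."""
    return QUOTED.sub("", text)


if __name__ == "__main__":
    sample = '<p style="color: red" class="x">hi</p>'
    print(remove_quoted(sample))
```

As in the clip, an unpaired quote shifts every later pairing, so it is worth testing on a copy of the file first.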