Re: alternative not regex syntax (was modify strip html preserve urls)

  • Don - HtmlFixIt.com
    Message 1 of 15 , Nov 3, 2009
      No, I am taking information and moving it from one html environment to
      another where mark-up is not needed to get my result.
      Formatting is handled in the upload method I am using by line breaks vs
      <p> tags for example, and that is converted via php to render the p tag
      -- so I no longer need the p tag and so forth. The reality is that I am
      going back into an html environment, but losing mark-up styling
      information while preserving links and maybe one or two other tags.

      So is there an easy regex I can use to find all tags that aren't a href
      or /a and delete them? Ideally I may add another "or" to it as well.
      So maybe this: find <^a href.*?|\a|ul|li> and delete or replace with
      nothing.

      So that would be ^ = not a href or \a or ul or li ... or something like
      that. Syntax help on alternative nots?

      Nots are new to my regex brain.

      Don
      Axel Berger wrote:
      > "Don - HtmlFixIt.com" wrote:
      >> Won't work Axel, the </a> has been removed.
      >
      > Yes, but isn't some clickable text unexpected and rather misleading in a
      > non-HTML environment? Wouldn't it be better to show the URL verbatim and
      > make that clickable?
      >
      > It depends on what kind of end result you have in mind I suppose, and
      > I'm not as yet very sure what it is you want to achieve.
      >
      > Axel
      >
      >
    • ebbtidalflats
      Message 2 of 15 , Nov 4, 2009
        Don,

        Constructing a NOT pattern is fairly difficult because [^...] negates a character set, not a pattern, so you might need several passes to deal with the variety of tags.

        Instead, in only three replace actions, you could:
        1. tokenize the links (for example {a href ...}button text{/a})
        2. strip all html (no need to preserve URLs, since they don't exist anymore)
        3. then restore the tokenized links to html links.

        Cheers,


        Eb
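The three replace actions Eb describes can be sketched in Python. This is only an illustration, not NoteTab clip code: the function name `strip_except_links` and the `{` `}` token delimiters are assumptions, and the sketch assumes curly braces do not otherwise appear in the text.

```python
import re

# Hypothetical token delimiters: assumes "{" and "}" never occur in the input.
A_TAG = re.compile(r"<(/?a\b[^>]*)>", re.IGNORECASE)     # <a ...> or </a>
ANY_TAG = re.compile(r"<[^>]*>")                         # any remaining tag
TOKEN = re.compile(r"\{(/?a\b[^}]*)\}", re.IGNORECASE)   # {a ...} or {/a}


def strip_except_links(html: str) -> str:
    """Remove every tag except opening and closing <a> tags."""
    tokenized = A_TAG.sub(r"{\1}", html)     # 1. tokenize the links
    stripped = ANY_TAG.sub("", tokenized)    # 2. strip all remaining html
    return TOKEN.sub(r"<\1>", stripped)      # 3. restore the tokenized links


sample = '<p>See <a href="http://example.com/">the <b>site</b></a>.</p>'
result = strip_except_links(sample)
print(result)
```

The `\b` after `a` keeps tags such as `<abbr>` from being mistaken for links. If the text can contain curly braces, a different pair of delimiters would be needed.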

      • Sheri
        Message 3 of 15 , Nov 5, 2009
          --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
          >
          > Here is the rub, I want to keep hyperlinks and delete everything
          > else in the html world. So can I simply look for something like:
          > <not \a or \a href> and replace with nothing?
          > My not regex is pretty (can't say sloppy because it doesn't exist).

          Try:
          ^!Replace "<(?(?=/?a)(*FAIL)|[^>]*>)" >> "" RAWS0

          Regards,
          Sheri
        • Sheri
          Message 4 of 15 , Nov 5, 2009
            --- In ntb-clips@yahoogroups.com, "Sheri" <silvermoonwoman@...> wrote:
            >
            > --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@> wrote:
            > >
            > > Here is the rub, I want to keep hyperlinks and delete everything
            > > else in the html world. So can I simply look for something like:
            > > <not \a or \a href> and replace with nothing?
            > > My not regex is pretty (can't say sloppy because it doesn't exist).
            >
            > Try:
            > ^!Replace "<(?(?=/?a)(*FAIL)|[^>]*>)" >> "" RAWS0

            also this:

            ^!Replace "<(?!/?+a)[^>]*>" >> "" RAWS0
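For readers outside NoteTab, here is a rough Python equivalent of the negative-lookahead pattern above, extended to the "a or ul or li" case Don asked about. The helper name `strip_tags_except` and its `keep` parameter are hypothetical. Two deliberate differences from the clip, both assumptions of this sketch: it uses a plain `/?` because Python's `re` only accepts the possessive `/?+` from version 3.11 and the result is the same here, and it adds `\b` after the tag name, since `(?!/?+a)` as written would also spare any tag whose name merely starts with "a", such as `<address>`.

```python
import re


def strip_tags_except(html, keep=("a",)):
    """Delete every tag whose name is not listed in `keep`."""
    names = "|".join(re.escape(name) for name in keep)
    # (?!...) fails the match when "<" is followed by an optional "/" and
    # one of the kept tag names, so those tags are left untouched.
    pattern = re.compile(rf"<(?!/?(?:{names})\b)[^>]*>", re.IGNORECASE)
    return pattern.sub("", html)


sample = ('<ul><li><a href="x.html">One</a> <em>new</em></li></ul>'
          '<address>A</address>')
links_only = strip_tags_except(sample)
links_and_lists = strip_tags_except(sample, keep=("a", "ul", "li"))
print(links_only)
print(links_and_lists)
```

In NoteTab itself the same idea would presumably be an alternation inside the lookahead, along the lines of `(?!/?(?:a|ul|li)\b)`, but that variant has not been tested here.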
          • Don - HtmlFixIt.com
            Message 5 of 15 , Nov 5, 2009
              dang I was still working on the first puzzle!
              does that cover either an a tag or an /a tag then Sheri?

            • Sheri
              Message 6 of 15 , Nov 5, 2009
                --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                >
                > dang I was still working on the first puzzle!
                > does that cover either an a tag or an /a tag then Sheri?

                That should preserve both opening and closing "a" tags while removing all other tags. Matches are not confined to single lines; to confine them to single lines, the character class [^>]* would need to be [^>\r\n]*.
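The effect of the two character classes can be seen in a small Python sketch. The sample strings are made up for illustration, and the `\b` after `a` is an addition not present in Sheri's pattern.

```python
import re

# Same negative lookahead in both; only the character class differs.
MULTI_LINE = re.compile(r"<(?!/?a\b)[^>]*>")        # a match may span lines
SINGLE_LINE = re.compile(r"<(?!/?a\b)[^>\r\n]*>")   # a match stops at a line break

# A stray "<" on one line and a stray ">" on the next line.
stray = "a < b\nand c > d"
# A real tag whose attributes wrap onto a second line.
wrapped = '<div\n class="c">text</div>'

print(repr(MULTI_LINE.sub("", stray)))     # the text between < and > is lost
print(repr(SINGLE_LINE.sub("", stray)))    # left unchanged
print(repr(MULTI_LINE.sub("", wrapped)))   # the wrapped tag is removed
print(repr(SINGLE_LINE.sub("", wrapped)))  # the wrapped opening tag survives
```

So the choice is a trade-off: the multi-line form removes tags that wrap across lines but can swallow text between unrelated angle brackets, while the single-line form is safer for stray brackets but leaves wrapped tags behind.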
