
RE: [Clip] modify strip html preserve urls

  • John Shotsky
    Message 1 of 15, Nov 3, 2009
      It might be easier to strip the html then rebuild the links. What does the
      source look like?

      John



      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf
      Of Don - HtmlFixIt.com
      Sent: Tuesday, November 03, 2009 7:13 AM
      To: ntb-clips@yahoogroups.com
      Subject: Re: [Clip] modify strip html preserve urls





      I guess in my mind I took preserve urls as preserve hyperlinks.

      Here is the rub, I want to keep hyperlinks and delete everything else in
      the html world. So can I simply look for something like:
      <not \a or \a href> and replace with nothing?
      My not regex is pretty (can't say sloppy because it doesn't exist).

      Axel Berger wrote:
      > "Don - HtmlFixIt.com" wrote:
      >> I thought it would actually leave the tags
      >
      > Well, it's called "strip HTML", innit?
      >
      >> need to clean tags another way then
      >
      > The "<http://(.*?)>" sequence should be easy to find. Just wrap whatever
      > you want around it.
      >
      > Axel
      >





    • Axel Berger
      Message 2 of 15, Nov 3, 2009
        "Don - HtmlFixIt.com" wrote:
        > So can I simply look for something like:
        > <not \a or \a href> and replace with nothing?

        Yes, I suppose you can. But I suggest my way is simpler: erase all HTML
        through the menu function, then find the URLs and restore the tag
        around them. Caution: My suggested find will not work for local relative
        references without the "http:".

        Axel
      • Don - HtmlFixIt.com
        Message 3 of 15, Nov 3, 2009
          Won't work Axel, the </a> has been removed. I suppose I could put a
          marker in there ...

          Axel Berger wrote:
          > "Don - HtmlFixIt.com" wrote:
          >> So can I simply look for something like:
          >> <not \a or \a href> and replace with nothing?
          >
          > Yes, I suppose you can. But I suggest my way is simpler: erase all HTML
          > through the menu function, then find the URLs and restore the tag
          > around them. Caution: My suggested find will not work for local relative
          > references without the "http:".
          >
          > Axel
        • Axel Berger
          Message 4 of 15, Nov 3, 2009
            "Don - HtmlFixIt.com" wrote:
            > Won't work Axel, the </a> has been removed.

            Yes, but isn't some clickable text unexpected and rather misleading in a
            non-HTML environment? Wouldn't it be better to show the URL verbatim and
            make that clickable?

            It depends on what kind of end result you have in mind I suppose, and
            I'm not as yet very sure what it is you want to achieve.

            Axel
          • Don - HtmlFixIt.com
            Message 5 of 15, Nov 3, 2009
              No, I am taking information and moving it from one html environment to
              another where mark-up is not needed to get my result.
              Formatting is handled in the upload method I am using by line breaks
              rather than <p> tags, for example, and those are converted via php to
              render the p tag, so I no longer need the p tag and so forth. The
              reality is that I am going back into an html environment, but losing
              mark-up styling information while preserving links and maybe one or
              two other tags.

              So is there an easy regex I can use to find all tags that aren't a href
              or /a and delete them? Ideally I may add another "or" to it as well.
              So maybe this: find <^a href.*?|\a|ul|li> and delete or replace with
              nothing.

              So that would be ^=not a href or \a or ul or li ... or something like
              that. Syntax help on alternative nots?

              Nots are new to my regex brain.

              Don
              Axel Berger wrote:
              > "Don - HtmlFixIt.com" wrote:
              >> Won't work Axel, the </a> has been removed.
              >
              > Yes, but isn't some clickable text unexpected and rather misleading in a
              > non-HTML environment? Wouldn't it be better to show the URL verbatim and
              > make that clickable?
              >
              > It depends on what kind of end result you have in mind I suppose, and
              > I'm not as yet very sure what it is you want to achieve.
              >
              > Axel
              >
              >
            • ebbtidalflats
              Message 6 of 15, Nov 4, 2009
                Don,

                Constructing a NOT pattern is fairly difficult because [^...] deals with character sets, not patterns, and you might need several passes to deal with the variety.

                Instead, in only three replace actions, you could:
                1. tokenize the links (for example {a href ...}button text{/a})
                2. strip all html (no need to preserve URLs, since they don't exist anymore)
                3. then restore the tokenized links to html links.
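
The three steps above can be sketched outside NoteTab as well. The following is only an illustration in Python, not a clip: the function name `keep_links_only` and the curly-brace token format are assumptions taken from the example in step 1, and the sketch assumes the source text contains no literal `{a ...}` or `{/a}` sequences of its own.

```python
import re

def keep_links_only(html: str) -> str:
    """Hypothetical helper: tokenize links, strip all tags, restore links."""
    # 1. Tokenize: turn <a ...> and </a> into {a ...} and {/a}.
    #    \b keeps tags such as <abbr> or <address> from being treated as links.
    tokenized = re.sub(r"<(/?a\b[^>]*)>", r"{\1}", html, flags=re.IGNORECASE)
    # 2. Strip every remaining tag; the links are safe because they no
    #    longer use angle brackets.
    stripped = re.sub(r"<[^>]*>", "", tokenized)
    # 3. Restore the tokens to real anchor tags.
    return re.sub(r"\{(/?a\b[^}]*)\}", r"<\1>", stripped, flags=re.IGNORECASE)

sample = '<p>See <a href="http://example.com">the <b>site</b></a>.</p>'
print(keep_links_only(sample))
```

In NoteTab the same idea would be three ^!Replace lines, one per step. If the text may already contain curly braces, a rarer token pair would be a safer choice.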

                Cheers,


                Eb

                --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                >
                > No, I am taking information and moving it from one html environment to
                > another where mark-up is not needed to get my result.
                > Formatting is handled in the upload method I am using by line breaks
                > rather than <p> tags, for example, and those are converted via php to
                > render the p tag, so I no longer need the p tag and so forth. The
                > reality is that I am going back into an html environment, but losing
                > mark-up styling information while preserving links and maybe one or
                > two other tags.
                >
                > So is there an easy regex I can use to find all tags that aren't a href
                > or /a and delete them? Ideally I may add another "or" to it as well.
                > So maybe this: find <^a href.*?|\a|ul|li> and delete or replace with
                > nothing.
                >
                > So that would be ^=not a href or \a or ul or li ... or something like
                > that. Syntax help on alternative nots?
                >
                > Nots are new to my regex brain.
                >
                > Don
                > Axel Berger wrote:
                > > "Don - HtmlFixIt.com" wrote:
                > >> Won't work Axel, the </a> has been removed.
                > >
                > > Yes, but isn't some clickable text unexpected and rather misleading in a
                > > non-HTML environment? Wouldn't it be better to show the URL verbatim and
                > > make that clickable?
                > >
                > > It depends on what kind of end result you have in mind I suppose, and
                > > I'm not as yet very sure what it is you want to achieve.
                > >
                > > Axel
                > >
                > >
                >
              • Sheri
                Message 7 of 15, Nov 5, 2009
                  --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                  >
                  > Here is the rub, I want to keep hyperlinks and delete everything
                  > else in the html world. So can I simply look for something like:
                  > <not \a or \a href> and replace with nothing?
                  > My not regex is pretty (can't say sloppy because it doesn't exist).

                  Try:
                  ^!Replace "<(?(?=/?a)(*FAIL)|[^>]*>)" >> "" RAWS0

                  Regards,
                  Sheri
                • Sheri
                  Message 8 of 15, Nov 5, 2009
                    --- In ntb-clips@yahoogroups.com, "Sheri" <silvermoonwoman@...> wrote:
                    >
                    > --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@> wrote:
                    > >
                    > > Here is the rub, I want to keep hyperlinks and delete everything
                    > > else in the html world. So can I simply look for something like:
                    > > <not \a or \a href> and replace with nothing?
                    > > My not regex is pretty (can't say sloppy because it doesn't exist).
                    >
                    > Try:
                    > ^!Replace "<(?(?=/?a)(*FAIL)|[^>]*>)" >> "" RAWS0

                    also this:

                    ^!Replace "<(?!/?+a)[^>]*>" >> "" RAWS0
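
For anyone who wants to try the negative-lookahead idea outside NoteTab, here is a rough Python equivalent. It is a sketch, not a translation of the clip: the names `STRIP_NON_ANCHOR` and `strip_except_anchors` are made up for illustration, the possessive `?+` is dropped because not every regex engine accepts it, and a `\b` is added (an assumption about what is wanted) so that tags that merely start with "a", such as `<abbr>`, are removed rather than kept.

```python
import re

# "<" not followed by "a" or "/a" as a whole tag name, then everything up to ">".
STRIP_NON_ANCHOR = re.compile(r"<(?!/?a\b)[^>]*>", re.IGNORECASE)

def strip_except_anchors(html: str) -> str:
    """Hypothetical helper: delete every tag except <a ...> and </a>."""
    return STRIP_NON_ANCHOR.sub("", html)

print(strip_except_anchors('<ul><li><a href="x.html">X</a></li></ul>'))
print(strip_except_anchors('<abbr title="t">A</abbr>'))
```

To keep further tags such as ul and li, the lookahead can take alternatives, for example `(?!/?(?:a|ul|li)\b)`.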
                  • Don - HtmlFixIt.com
                    Message 9 of 15, Nov 5, 2009
                      dang I was still working on the first puzzle!
                      does that cover either an a tag or an /a tag then Sheri?

                      Sheri wrote:
                      >
                      > --- In ntb-clips@yahoogroups.com, "Sheri" <silvermoonwoman@...> wrote:
                      >> --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@> wrote:
                      >>> Here is the rub, I want to keep hyperlinks and delete everything
                      >>> else in the html world. So can I simply look for something like:
                      >>> <not \a or \a href> and replace with nothing?
                      >>> My not regex is pretty (can't say sloppy because it doesn't exist).
                      >> Try:
                      >> ^!Replace "<(?(?=/?a)(*FAIL)|[^>]*>)" >> "" RAWS0
                      >
                      > also this:
                      >
                      > ^!Replace "<(?!/?+a)[^>]*>" >> "" RAWS0
                    • Sheri
                      Message 10 of 15, Nov 5, 2009
                        --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                        >
                        > dang I was still working on the first puzzle!
                        > does that cover either an a tag or an /a tag then Sheri?

                        Should preserve both opening and closing "a" tags, while otherwise removing tags. Tags are not being confined to single lines. To confine to single lines the character class [^>]* would need to be [^>\r\n]*
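
The difference between the two character classes can be seen in a small Python sketch. The pattern names and the sample string are invented for illustration, and the `\b` after the "a" is an added assumption rather than part of the clip above.

```python
import re

# Tags may span lines: [^>]* also matches line breaks.
MULTI_LINE_TAGS = re.compile(r"<(?!/?a\b)[^>]*>", re.IGNORECASE)
# Tags confined to one line: the class excludes \r and \n as well.
SINGLE_LINE_TAGS = re.compile(r"<(?!/?a\b)[^>\r\n]*>", re.IGNORECASE)

sample = '<div\nclass="x">text</div>'

multi = MULTI_LINE_TAGS.sub("", sample)    # removes the broken-up <div ...> too
single = SINGLE_LINE_TAGS.sub("", sample)  # leaves the tag that spans two lines

print(repr(multi))
print(repr(single))
```

The single-line form is the more cautious one when the text may contain a stray "<" that is not part of a tag, since a match can never run past the end of the line.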

                        >
                        > Sheri wrote:
                        > >
                        > > --- In ntb-clips@yahoogroups.com, "Sheri" <silvermoonwoman@> wrote:
                        > >> --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@> wrote:
                        > >>> Here is the rub, I want to keep hyperlinks and delete everything
                        > >>> else in the html world. So can I simply look for something like:
                        > >>> <not \a or \a href> and replace with nothing?
                        > >>> My not regex is pretty (can't say sloppy because it doesn't exist).
                        > >> Try:
                        > >> ^!Replace "<(?(?=/?a)(*FAIL)|[^>]*>)" >> "" RAWS0
                        > >
                        > > also this:
                        > >
                        > > ^!Replace "<(?!/?+a)[^>]*>" >> "" RAWS0
                        >