modify strip html preserve urls

  • Don - HtmlFixIt.com
    Message 1 of 15 , Nov 2, 2009
      When I run Strip HTML with the preserve-URLs option, I get the
      <a href="whatever.html"> but it deletes the corresponding </a>.
      I want the </a> left alone as well. Am I missing something?
    • loro
      Message 2 of 15 , Nov 2, 2009
        Don wrote:
        >when I do this I get the <a href="whatever.html"> but it deletes the
        >corresponding </a>
        >I want the </a> left alone as well .... am I missing something?

        Are you sure? What version? Because what NoteTab has always done
        before is turn this
        <a href="http://...">Link text</a>
        into this:
        <http://....>Link text

        No A HREF, just angle brackets around the URL.
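
To make the described behaviour concrete, here is a rough Python sketch that imitates the result shown above. It is not NoteTab's implementation; the function name, the placeholder characters, and the regexes are assumptions chosen only for illustration, and the sketch handles only double-quoted href values.

```python
import re

def strip_html_preserve_urls(html: str) -> str:
    """Imitate the described result: keep each link's URL in angle brackets, drop all tags."""
    # Swap each opening <a ... href="URL" ...> for NUL + URL + SOH so the
    # tag-stripping step below does not touch the URL. (Hypothetical markers.)
    marked = re.sub(
        r'<a\s[^>]*?href="([^"]*)"[^>]*>',
        lambda m: "\x00" + m.group(1) + "\x01",
        html,
        flags=re.IGNORECASE,
    )
    # Remove every remaining tag, including the closing </a>.
    stripped = re.sub(r"<[^>]*>", "", marked)
    # Turn the markers into the angle brackets around the URL.
    return stripped.replace("\x00", "<").replace("\x01", ">")

print(strip_html_preserve_urls('<p>See <a href="http://example.com/">Link text</a></p>'))
# prints: See <http://example.com/>Link text
```

Note that the closing </a> disappears along with every other tag, which is the behaviour Don observed.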

        Lotta
      • Don - HtmlFixIt.com
        Message 3 of 15 , Nov 2, 2009
          So that is expected behavior then. I thought it would actually leave
          the tags; I need to clean the tags another way then.

          > ------------------------------------
          >
          > Fookes Software: http://www.fookes.com/
          > NoteTab website: http://www.notetab.com/
          > NoteTab Discussion Lists: http://www.notetab.com/groups.php
        • Axel Berger
          Message 4 of 15 , Nov 3, 2009
            "Don - HtmlFixIt.com" wrote:
            > I thought it would actually leave the tags

            Well, it's called "strip HTML", innit?

            > need to clean tags another way then

            The "<http://(.*?)>" sequence should be easy to find. Just wrap whatever
            you want around it.
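
As a rough illustration of that find-and-wrap idea, the Python sketch below searches for the angle-bracketed URLs and wraps an anchor tag around each one. Using the URL itself as the link text is an assumption made here, since the original link text is no longer delimited after stripping; the function name is hypothetical.

```python
import re

def rebuild_links(text: str) -> str:
    """Wrap each <http://...> sequence in an anchor tag, using the URL as link text."""
    # The lazy .*? stops at the first ">", so each URL is captured separately.
    return re.sub(r"<(http://.*?)>", r'<a href="\1">\1</a>', text)

print(rebuild_links("See <http://example.com/>Link text"))
# prints: See <a href="http://example.com/">http://example.com/</a>Link text
```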

            Axel
          • Don - HtmlFixIt.com
            Message 5 of 15 , Nov 3, 2009
              I guess in my mind I took "preserve URLs" as "preserve hyperlinks".

              Here is the rub: I want to keep hyperlinks and delete everything else in
              the HTML world. So can I simply look for something like
              <not /a or a href> and replace it with nothing?
              My "not" regex is pretty ... well, I can't say sloppy, because it doesn't exist.

            • John Shotsky
              Message 6 of 15 , Nov 3, 2009
                It might be easier to strip the HTML and then rebuild the links. What does
                the source look like?

                John



              • Axel Berger
                Message 7 of 15 , Nov 3, 2009
                  "Don - HtmlFixIt.com" wrote:
                  > So can I simply look for something like:
                  > <not \a or \a href> and replace with nothing?

                  Yes, I suppose you can. But I suggest my way is simpler: erase all HTML
                  through the menu function, then find the URLs and restore the tag
                  around them. Caution: my suggested find will not work for local relative
                  references without the "http:".

                  Axel
                • Don - HtmlFixIt.com
                  Message 8 of 15 , Nov 3, 2009
                    Won't work Axel, the </a> has been removed. I suppose I could put a
                    marker in there ...

                  • Axel Berger
                    Message 9 of 15 , Nov 3, 2009
                      "Don - HtmlFixIt.com" wrote:
                      > Won't work Axel, the </a> has been removed.

                      Yes, but isn't some clickable text unexpected and rather misleading in a
                      non-HTML environment? Wouldn't it be better to show the URL verbatim and
                      make that clickable?

                      It depends on what kind of end result you have in mind I suppose, and
                      I'm not as yet very sure what it is you want to achieve.

                      Axel
                    • Don - HtmlFixIt.com
                      Message 10 of 15 , Nov 3, 2009
                        No, I am taking information and moving it from one HTML environment to
                        another, where mark-up is not needed to get my result.
                        Formatting is handled by the upload method I am using: line breaks
                        stand in for <p> tags, for example, and PHP converts them back into
                        p tags on rendering, so I no longer need the p tag, and so forth. The
                        reality is that I am going back into an HTML environment, but losing
                        mark-up styling information while preserving links and maybe one or
                        two other tags.

                        So is there an easy regex I can use to find all tags that aren't a href
                        or /a and delete them? Ideally I may add another "or" to it as well.
                        So maybe this: find <^a href.*?|/a|ul|li> and delete or replace with
                        nothing.

                        So that would be ^ = not a href or /a or ul or li ... or something like
                        that. Syntax help on alternative nots?

                        Nots are new to my regex brain.

                        Don
                      • ebbtidalflats
                        Message 11 of 15 , Nov 4, 2009
                          Don,

                          Constructing a NOT pattern is fairly difficult because the [^...] deals with character sets, not patterns, and might need several passes to deal with the variety.

                          Instead, in only three replace actions, you could:
                          1. tokenize the links (for example {a href ...}button text{/a})
                          2. strip all html (no need to preserve URLs, since they don't exist anymore)
                          3. then restore the tokenized links to html links.
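
The three steps above can be sketched in Python as follows. This is only an illustration of the tokenizing idea, not a NoteTab clip; the function name is hypothetical, and the sketch assumes the text does not already contain sequences such as {a ...} or {/a}.

```python
import re

def strip_tags_keep_links(html: str) -> str:
    """Tokenize anchor tags, strip all other tags, then restore the anchors."""
    # 1. Tokenize: <a ...> becomes {a ...} and </a> becomes {/a}.
    #    \b keeps tags such as <abbr> from being mistaken for anchors.
    tokenized = re.sub(r"<(/?a\b[^>]*)>", r"{\1}", html, flags=re.IGNORECASE)
    # 2. Strip every remaining tag; the links are safe because they are no longer tags.
    stripped = re.sub(r"<[^>]*>", "", tokenized)
    # 3. Restore the tokens to real anchor tags.
    return re.sub(r"\{(/?a\b[^}]*)\}", r"<\1>", stripped, flags=re.IGNORECASE)

print(strip_tags_keep_links('<p><b>Go</b> to <a href="page.html">this page</a>.</p>'))
# prints: Go to <a href="page.html">this page</a>.
```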

                          Cheers,


                          Eb

                        • Sheri
                          Message 12 of 15 , Nov 5, 2009
                            --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                            >
                            > Here is the rub, I want to keep hyperlinks and delete everything
                            > else in the html world. So can I simply look for something like:
                            > <not \a or \a href> and replace with nothing?
                            > My not regex is pretty (can't say sloppy because it doesn't exist).

                            Try:
                            ^!Replace "<(?(?=/?a)(*FAIL)|[^>]*>)" >> "" RAWS0

                            Regards,
                            Sheri
                          • Sheri
                            Message 13 of 15 , Nov 5, 2009
                              --- In ntb-clips@yahoogroups.com, "Sheri" <silvermoonwoman@...> wrote:
                              >
                              > --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@> wrote:
                              > >
                              > > Here is the rub, I want to keep hyperlinks and delete everything
                              > > else in the html world. So can I simply look for something like:
                              > > <not \a or \a href> and replace with nothing?
                              > > My not regex is pretty (can't say sloppy because it doesn't exist).
                              >
                              > Try:
                              > ^!Replace "<(?(?=/?a)(*FAIL)|[^>]*>)" >> "" RAWS0

                              also this:

                              ^!Replace "<(?!/?+a)[^>]*>" >> "" RAWS0
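
For readers who want to try the negative-lookahead idea outside NoteTab, here is a hedged Python sketch. It uses a plain /? rather than the possessive /?+ and adds two things that are not in the clip above: a \b word boundary, so that tags which merely start with "a" (such as <abbr>) are still removed, and a hypothetical keep parameter for extra tag names such as ul and li.

```python
import re

def strip_tags_except(html: str, keep=("a",)) -> str:
    """Remove every tag whose name is not listed in keep."""
    names = "|".join(re.escape(name) for name in keep)
    # (?!...) is a negative lookahead: the match is abandoned when the text
    # after "<" is an optional "/" followed by one of the kept tag names.
    pattern = re.compile(rf"<(?!/?(?:{names})\b)[^>]*>", re.IGNORECASE)
    return pattern.sub("", html)

sample = '<div><abbr>e.g.</abbr> <a href="x.html">link</a></div>'
print(strip_tags_except(sample))
# prints: e.g. <a href="x.html">link</a>

print(strip_tags_except("<ul><li><b>x</b></li></ul>", keep=("a", "ul", "li")))
# prints: <ul><li>x</li></ul>
```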
                            • Don - HtmlFixIt.com
                              Message 14 of 15 , Nov 5, 2009
                                Dang, I was still working on the first puzzle!
                                Does that cover both an a tag and an /a tag then, Sheri?

                              • Sheri
                                Message 15 of 15 , Nov 5, 2009
                                  --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                                  >
                                  > dang I was still working on the first puzzle!
                                  > does that cover either an a tag or an /a tag then Sheri?

                                  Should preserve both opening and closing "a" tags, while removing all other tags. The pattern does not confine a tag to a single line, because [^>]* also matches line breaks. To confine matches to single lines, the character class [^>]* would need to be [^>\r\n]*
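
The difference between the two character classes can be seen in a small Python sketch. The sample text with a stray "<" is made up for illustration, and the \b in the lookahead is an addition that is not part of the clip.

```python
import re

# Tag body may span line breaks.
MULTI_LINE_TAG = re.compile(r"<(?!/?a\b)[^>]*>")
# Tag body must stay on one line.
SINGLE_LINE_TAG = re.compile(r"<(?!/?a\b)[^>\r\n]*>")

text = "if x < 3\nthen <b>stop</b>"

# The stray "<" on line 1 pairs with the ">" of <b> on line 2,
# so text between them is swallowed.
print(repr(MULTI_LINE_TAG.sub("", text)))
# prints: 'if x stop'

# With line breaks excluded, the stray "<" cannot reach a ">" and is left alone.
print(repr(SINGLE_LINE_TAG.sub("", text)))
# prints: 'if x < 3\nthen stop'
```

The trade-off is that the single-line version will miss genuine tags whose attributes are wrapped across lines.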
