Strip XML tags

  • Martin ONeill
    Message 1 of 4, Nov 6, 2005
      I think an OPML file is in XML. How can I strip XML tags? If I use
      Strip HTML Tags, all the data is stripped out as well.

      Thanks
      Martin
    • loro
      Message 2 of 4, Nov 6, 2005
        Martin ONeill wrote:
        >I think an OPML file is in XML. How can I strip XML tags? If I use
        >Strip HTML Tags, all the data is stripped out as well.

        Is it the radio UserLand outline stuff you mean? Like this one?
        <http://static.userland.com/gems/radiodiscuss/specification.opml>

        Almost all tags are empty and the real content is in the attribute values,
        so Strip HTML won't work. You could use ^$GetHtmlTagAttr()$ instead.

        As an example, to get the value of "text" and strip the rest of the
        <outline> tags in the document above, you could do something like this:

        ---------------------
        H="OPML"
        ^!Jump text_start
        :loop
        ^!Find <outline IS
        ^!IfError end
        ^!Replace "^$GetHTMLTag$" >> "^$GetHtmlTagAttr("^$GetHTMLTag$";text)$" S
        ^!Goto loop
        ---------------------

        I know there can be several attributes and that there are more things both
        to strip and preserve. Just a push (hopefully) in the right direction. ;-)
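
        For comparison outside NoteTab, the same idea (keep the value of the
        "text" attribute, drop the rest of each <outline> tag) can be sketched
        in Python with the standard xml.etree.ElementTree module. This is only
        an illustration, not part of the clip above: the sample document, the
        function name outline_texts and the indentation by nesting depth are
        assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML sample, made up for this sketch (not the file from the thread).
SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="1.0">
  <head><title>Sample</title></head>
  <body>
    <outline text="Chapter 1">
      <outline text="Section 1.1"/>
      <outline text="Section 1.2" type="link" url="http://example.com/"/>
    </outline>
    <outline text="Chapter 2"/>
  </body>
</opml>"""


def outline_texts(opml_source, indent="  "):
    """Return the 'text' attribute of every <outline>, indented by nesting depth."""
    root = ET.fromstring(opml_source)
    lines = []

    def walk(element, depth):
        # Visit only the direct <outline> children, then recurse into each one.
        for child in element.findall("outline"):
            # A missing 'text' attribute yields an empty line instead of an error.
            lines.append(indent * depth + child.get("text", ""))
            walk(child, depth + 1)

    body = root.find("body")
    if body is not None:
        walk(body, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(outline_texts(SAMPLE_OPML)))
```

        Because a real XML parser is used, other attributes such as "type" or
        "url" are simply ignored here; they could be read the same way with
        child.get().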

        Lotta
      • Martin ONeill
        Message 3 of 4, Nov 6, 2005
          --- In ntb-clips@yahoogroups.com, loro <loro-spam01-@t...> wrote:
          > Is it the radio UserLand outline stuff you mean? Like this one?
          > <http://static.userland.com/gems/radiodiscuss/specification.opml>

          I'm not sure, as that page would not load for me. The OPML I'm
          referring to is at http://www.opml.org/spec

          I have tried your clip, but on a small circa 50k file, the clip goes
          into some sort of loop and does not finish.

          I don't know anything about XML and probably need to learn a bit. Is
          there any simple online source that I could refer to? - particularly
          regarding the XML tags.

          Many thanks,
          Martin
        • loro
          Message 4 of 4, Nov 6, 2005
            Martin ONeill wrote:
            >I'm not sure, as that page would not load for me. The OPML I'm
            >referring to is at http://www.opml.org/spec

            Yeah, they have a link to the sample file I posted a link to. We are
            talking about the same thing. That's a starting point. ;-)

            >I have tried your clip, but on a small circa 50k file, the clip goes
            >into some sort of loop and does not finish.

            Don't know why it loops, but it only looks for the 'text' attributes found
            in that sample file.

            >I don't know anything about XML and probably need to learn a bit. Is
            >there any simple online source that I could refer to?

            The samples at opml.org, I guess. But you don't need to know XML beyond
            understanding the concept of tags. If you have this:
            <tag>Content</tag>
            "Strip HTML" gets rid of '<tag>' and '</tag>' and leaves 'Content'.

            But this OPML format seems to build on mostly empty elements (they have no
            content), with the bits you want to keep in attributes, like so:
            <tag attr="stuff we want here" />
            "Strip HTML" kills the whole tag and leaves you with nothing. That's why
            you must find the attribute values and save them somehow.
            ^$GetHtmlTagAttr()$ does that.
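
            The difference between the two cases can be reproduced with a small
            Python sketch. It is a rough stand-in, not how NoteTab implements
            "Strip HTML" or ^$GetHtmlTagAttr()$: the two regular expressions and the
            helper names strip_tags and tag_attributes are assumptions made for the
            example, and the patterns are deliberately naive (they would break on
            attribute values that contain '>' or use single quotes).

```python
import re

# Naive patterns, assumed for this sketch only.
TAG_PATTERN = re.compile(r"<[^>]*>")                # anything from '<' to the next '>'
ATTR_PATTERN = re.compile(r'(\w+)\s*=\s*"([^"]*)"')  # name="value" pairs


def strip_tags(markup):
    """Delete every tag and keep whatever text sits between tags."""
    return TAG_PATTERN.sub("", markup)


def tag_attributes(tag):
    """Return the double-quoted attributes of a single tag as a dict."""
    return dict(ATTR_PATTERN.findall(tag))


element_with_content = "<tag>Content</tag>"
empty_element = '<tag attr="stuff we want here" />'

print(repr(strip_tags(element_with_content)))  # the content survives
print(repr(strip_tags(empty_element)))         # nothing is left
print(tag_attributes(empty_element)["attr"])   # the attribute value is recovered
```

            Stripping the empty element returns an empty string because the data
            sits inside the tag itself, so the attribute value has to be read
            before the tag is removed.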

            If you post a link to a file of the type you are actually working with,
            or a long enough sample of it, someone can probably come up with something.

            Lotta