Re: remove and clean CDATA out of xml

  • bw
    Message 1 of 11 , Feb 1, 2010
      Sorry, I do not understand the concept of top posting, but I guess you
      mean start a new thread for a different question ;-)

      I just needed to add a /g in order to get it done everywhere.

      Thanks! Very helpful for me to understand even more the power of vim :)
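
For reference, here is a minimal sketch of the substitution quoted below with the /g flag added, so that every reference on a line is converted rather than only the first. This variant calls nr2char() directly instead of going through printf("%s ", ...), which appends a space after each decoded character. That change is an assumption about the desired output, not something stated in the thread. It also assumes 'encoding' is utf-8 and handles only decimal references such as &#233;.

```vim
" Decode decimal character references (e.g. &#233;) everywhere in the buffer.
" Assumes 'encoding' is utf-8, so nr2char() yields UTF-8 characters.
:%s/&#\(\d\+\);/\=nr2char(str2nr(submatch(1), 10))/g
```

Hexadecimal references (&#xE9;) and named ones (&eacute;) are not matched by this pattern and would need their own substitutions.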

      On 01/02/2010, Christian Brabandt <cblists@...> wrote:
      > On Mon, February 1, 2010 4:49 pm, bw wrote:
      >> Your last comment made me think. I would like all the html encoded
      >> parts like &#201;, &#233;, &#8217; etc... to be transformed into real
      >> utf8 as the feed should be utf8. (É, é and ’)
      >
      > Please don't top post.
      >
      > Regarding your question, I believe this:
      > :%s/&#\(\d\+\);/\=printf("%s ", nr2char(str2nr(submatch(1),10)))/
      >
      > should do what you want.
      >
      >
      > regards,
      > Christian
      >
      > --
      > You received this message from the "vim_use" maillist.
      > For more information, visit http://www.vim.org/maillist.php


      --
      [Bb](astia{2}n)?\s?[Ww](ak{2}ie)?$

    • Raúl Núñez de Arenas Coronado
      Message 2 of 11 , Feb 1, 2010
        Saluton bw :)

        bw <b...@...> skribis:
        > Sorry, I do not understand the concept of top posting, but I guess you
        > mean start a new thread for a different question ;-)

        No, it's putting the reply text *before* the quoted text:
        http://en.wikipedia.org/wiki/Posting_style#Top-posting

        The preferred style on the list is interleaved-posting (also explained
        in the link above), but a good bunch of members just do as they please.

        --
        Raúl "DervishD" Núñez de Arenas Coronado
        Linux Registered User 88736 | http://www.dervishd.net
        It's my PC and I'll cry if I want to... RAmen!

      • bw
        Message 3 of 11 , Feb 2, 2010
          > :%s/<!\[\[CDATA\[\(\%(\%(]]>\)\@!\_.\)\{-}\)]]>/\=substitute(submatch(1),'<[^>]*>', '', 'g')/g

          I have a hard time understanding the \(\%(\%(]]>\)\@!\_.\)\{-}\) part.
          What does it do? What does \% mean? I do understand it will take
          anything in CDATA brackets and run the substitute command over it.


          thanks

        • Tim Chase
          Message 4 of 11 , Feb 2, 2010
            bw wrote:
            >> :%s/<!\[\[CDATA\[\(\%(\%(]]>\)\@!\_.\)\{-}\)]]>/\=substitute(submatch(1),'<[^>]*>', '', 'g')/g
            >
            > I have a hard time understanding the \(\%(\%(]]>\)\@!\_.\)\{-}\) part.
            > What does it do? What does \% mean? I do understand it will take
            > anything in CDATA brackets and run the substitute command over it.

            The \%(...\) is a non-capturing group.
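
A small illustration of that difference, using nothing beyond Vim's built-in matchlist(): a \%(...\) group does not consume a submatch number, so the capture groups around it keep consecutive indices. The sample string is made up.

```vim
" 'b' is grouped by \%(...\) but not captured, so 'c' becomes submatch 2.
:echo matchlist('abc', '\(a\)\%(b\)\(c\)')[1:2]
" prints ['a', 'c']
```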

            The command breaks down as

            :%s/              substitute

            <!\[\[CDATA\[     a literal "<![[CDATA["

            \(                begin capturing
            \%(               begin non-capturing group #1
            \%(               begin non-capturing group #2
            ]]>               a literal "]]>" close tag
            \)                (end non-cap group #2)
            \@!               isn't allowed to match here
            \_.               match any one character incl NL
            \)                (end non-cap group #1)
            \{-}              as few as possible
            \)                end capture group
            ]]>               the literal "]]>" that matches
            /                 and replace it with
            \=                the following expression
            substitute(       uh...substitute :)
            submatch(1),      the content of the CDATA
            '<[^>]*>',        all tags and replace them
            '',               with nothing
            'g')              for all of the tags
            /g                for all of the matches on a line
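
The two central pieces of this breakdown can be tried in isolation with matchstr() and substitute(). This is only a sketch. The sample string is made up, and the pattern uses the standard XML opener <![CDATA[ (one bracket after the "!") rather than the <![[CDATA[ spelled out in the thread, so adjust it to whatever your data actually contains.

```vim
" Hypothetical sample text with two "]]>" sequences in it.
:let s = '<![CDATA[a<b>c]]> tail ]]>'

" \%(\%(]]>\)\@!\_.\) consumes one character only where "]]>" does not start,
" so even a greedy * cannot run past the first "]]>".
" \zs and \ze trim the reported match down to the CDATA content.
:let inner = matchstr(s, '<!\[CDATA\[\zs\%(\%(]]>\)\@!\_.\)*\ze]]>')
:echo inner
" prints a<b>c

" The replacement expression then strips the tags from that content.
:echo substitute(inner, '<[^>]*>', '', 'g')
" prints ac
```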

            In retrospect, because "]]>" unilaterally closes a CDATA and
            you're capturing everything inside, you might be able to simplify
            that to just

            :%s/<!\[\[CDATA\[\(\_.\{-}\)]]>/\=substitute(submatch(1),'<[^>]*>', '', 'g')/g
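
As a sketch of what the simplified command does, here it is against a made-up sample line. The sample and the expected result are illustrative, and the pattern assumes the standard XML opener <![CDATA[ (a single bracket after the "!"), so adapt it if your feed really contains <![[CDATA[. Because \{-} is non-greedy, \_.\{-} already stops at the first "]]>" without the look-ahead guard.

```vim
" Before (hypothetical buffer line):
"   <title><![CDATA[Hello <b>world</b>]]></title>
:%s/<!\[CDATA\[\(\_.\{-}\)]]>/\=substitute(submatch(1), '<[^>]*>', '', 'g')/g
" After:
"   <title>Hello world</title>
```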

            HTH,

            -tim

