
RE: find/replace problem and a good tutorial

  • Scholte, J.C.M.
    Message 1 of 11 , May 30, 2005
      try: http://www.oreilly.com/catalog/regex/

      :%s/_\([^_]\+\)_/<em>\1<\/em>/g
      :%s/\*\([^\*]\+\)\*/<strong>\1<\/strong>/g
      :%s/??\([^?]\+\)??/<cite>\1<\/cite>/g

      Hans Scholte, <DPC/>



      -----Original Message-----
      From: Juan Pablo Aqueveque [mailto:jp.aqueveque@...]
      Sent: Monday, 30 May 2005 16:22
      To: vim@...
      Subject: find/replace problem and a good tutorial


      Hi All:

      My biggest problem with Vim has always been regular expressions. They
      are very difficult to understand, and even harder to write. For
      instance, I want to do the following find and replace:

      _word_ to <em>word</em>
      *word* to <strong>word</strong>
      ??word?? to <cite>word</cite>

      Besides help with the problem above, how can I learn regular
      expressions in a simple way? A good URL on the subject would be
      much appreciated.

      In advance, thank you very much for your help, the members of this
      list are always willing to help.


      --
      juan pablo aqueveque
      www.juque.cl
    • Tim Chase
      Message 2 of 11 , May 30, 2005
        > _word_ to <em>word</em>
        > *word* to <strong>word</strong>
        > ??word?? to <cite>word</cite>

        There are a variety of ways these can be defined :)

        This would likely be something like

        :%s/\<_\(\S*\)_\>/\='<em>'.substitute(submatch(1), '_', ' ', 'g').'<\/em>'/g

        :%s/\*\(\S*\)\*/\='<strong>'.substitute(submatch(1), '*', ' ', 'g').'<\/strong>'/g

        These should take care of _cases_like_this_ though I haven't
        figured out a clean way for it to handle _cases like this_ in
        terms of distinguishing them from _a first case_ and _a second_
        where there are two on the same line. If it's just a single
        word, they can be simplified to

        :%s/\<_\(\S*\)_\>/<em>\1<\/em>/g
        :%s/\*\(\S*\)\*/<strong>\1<\/strong>/g

        or if only a single "_" marks both the beginning and the end, as
        in the above _this is an example_, then it can be done similarly
        with

        :%s/\<_\([^_]*\)_\>/<em>\1<\/em>/g
        :%s/\*\([^*]*\)\*/<strong>\1<\/strong>/g

        Also, note that the "*...*" ones don't make use of the \<...\>
        because that's not considered a word-boundary. This assumes your
        'iskeyword' property doesn't include an asterisk, but does
        include the underscore. YMMV if you've bunged with this option
        :)

        The third one is a bit trickier, as you've got to find two
        adjacent characters to work with. If *no* question marks can
        occur in the text, it's not so bad. Something like

        %s/??\([^?]\+\)??/<cite>\1<\/cite>/g

        However, if you're allowed to have something like

        ??What time is it? she said??

        then something, perhaps, like the following (mostly untested)


        %s/??\([^?]\%(.\%(??\)\@<!\)*\)??/<cite>\1<\/cite>/g

        > Besides help with the problem above, how can I learn regular
        > expressions in a simple way? A good URL on the subject would be
        > much appreciated.

        There are a number of ways to come at it. Having taken a course
        in programming languages really helps, or if you've written your
        own finite-state-machines (FSMs). :) As for links and texts,
        there's an O'Reilly book with "regular expressions" in the title
        which is supposed to be quite good. Regexps really are a
        programming language of sorts, so if you code, it's just a matter
        of breaking down the problem (the target matches you want) into
        regexp atoms. Additionally, you'll often see us break down
        complex regexps to help folks understand the magic they're doing.

        So, in that same spirit... :)

        In the first example (the one with the substitute() and
        submatch() functions) it breaks down like this:

        %s/foo/bar/g you're surely familiar with this.

        where "foo" is defined as "\<_\(\S*\)_\>" and "bar" is the
        evaluation of an expression. For more help on replacing with the
        results of expressions, see ":help sub-replace-special".

        Now, that first expression is
        \<       ensure the pattern match begins at the start of a word
        _        that begins with an underscore
        \(...\)  mark some stuff that we'll reference later
        \S*      everything that's not considered whitespace (WS)
        _        the closing/ending underscore
        \>       make sure a word ends here (next comes WS, punctuation,
                 or the end of the line)

        So basically, it's "when an underscore starts something, is
        followed by a bunch of non-whitespace stuff, and then ends with
        an underscore, remember the stuff that wasn't whitespace".

        This then gets massaged via the "sub-replace-special" evaluation
        of the "\=". First, we start by creating the tags...we know it
        will look something like

        "<em>".stuff."<\/em>"

        (note that we have to escape the forward slash or else the
        ":s/foo/bar/g" gets confused by it, and thinks that it's reached
        the end of the replacement text)

        Now, the "stuff" is simply the originally captured stuff from
        above, only we want to replace any underscores in it as well so
        we don't end up with something like

        _this_is_a_test_

        becoming

        <em>this_is_a_test</em>

        The substitute() function takes care of this replacement.
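
For anyone who wants to try the callback idea outside Vim, here is a minimal sketch in Python. It is not the Vim command above: Python's `re` syntax differs from Vim's, `\b` only approximates `\<` and `\>`, and the function name `emphasize` is made up for illustration. The lambda plays the role of the `\=` expression and `str.replace` stands in for substitute().

```python
import re

def emphasize(text):
    """Wrap _underscore_delimited_ runs in <em>, turning inner underscores into spaces."""
    # \b approximates Vim's \< and \>; (\S*) captures the non-whitespace run.
    return re.sub(
        r'\b_(\S*)_\b',
        lambda m: '<em>' + m.group(1).replace('_', ' ') + '</em>',
        text,
    )

print(emphasize("These handle _cases_like_this_ fine"))
```

Because `\S*` cannot cross whitespace, two marked words on one line are handled separately, which matches the behaviour described for the Vim version.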

        ==================

        For the second bout of them, it's the same thing as before, only
        we don't have to worry about stripping out extraneous
        underscores. This simplifies matters. No need for the
        "sub-replace-special" expressions, substitute() calls, etc. We
        can just use the back-references, making the

        <em>\1<\/em>

        where the "\1" is replaced with the text we previously tagged as
        "interesting". Again, we escape the forward slash to keep it from
        terminating the replacement expression prematurely.
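
The back-reference idea carries over to most regex dialects. As a rough illustration only, here are the three negated-class patterns from the first reply rewritten for Python's `re` module; the names `RULES` and `convert` are hypothetical, and no slash escaping is needed because Python has no `/` delimiter.

```python
import re

# (pattern, replacement) pairs; \1 refers back to the captured group.
RULES = [
    (r'_([^_]+)_', r'<em>\1</em>'),
    (r'\*([^*]+)\*', r'<strong>\1</strong>'),
    (r'\?\?([^?]+)\?\?', r'<cite>\1</cite>'),
]

def convert(text):
    """Apply each markup rule in turn and return the converted text."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text

print(convert("_a first case_ and _a second_"))
```

The negated character class is what keeps the two marked phrases on one line from being merged into a single match.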

        ==================

        The third theme & variation on this is to change what constitutes
        the search target. Previously, we wanted non-whitespace. This
        one, we simply want anything and everything that's not an
        underscore. So we swap the "\S*" for "[^_]*" which is how one
        denotes "anything that isn't an underscore".

        ==================

        Again, the fourth is the same as the previous, only the
        prohibited characters are question-marks, rather than
        underscores. This is something akin to

        :%s/??\(.\{-}\)??/<cite>\1<\/cite>/g

        which stops the match at the first "??" it sees after the opening
        "??"
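
To see the difference between the negated class and the non-greedy match, here is a small Python sketch (an analogue, not the Vim commands themselves). Python's `.*?` is assumed here as the counterpart of Vim's `.\{-}`; the names `NEGATED`, `LAZY` and `cite` are invented for the example.

```python
import re

NEGATED = re.compile(r'\?\?([^?]+)\?\?')  # no "?" allowed between the markers
LAZY = re.compile(r'\?\?(.*?)\?\?')       # shortest run up to the next "??"

def cite(pattern, text):
    """Replace each ??...?? span matched by pattern with <cite>...</cite>."""
    return pattern.sub(r'<cite>\1</cite>', text)

sample = "??What time is it? she said??"
print(cite(NEGATED, sample))  # the lone "?" blocks the negated class
print(cite(LAZY, sample))     # the lazy version tolerates a single "?"
```

With the negated class the sample line is left untouched, while the lazy version wraps the whole quotation.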

        ==================

        Lastly, that "??...??" one is trickier. Previously, we could look
        for a single starting atom, some stuff, and a single ending atom.
        This time, we have to look for the starting marker, some stuff
        that doesn't include the ending marker, followed by the ending
        marker. From that, you should be able to discern that we've got

        :%s/??\(stuff\)??/<cite>\1<\/cite>/g

        which is about the same as above. However, things get messy in
        that "stuff" portion. The initial "[^?]" is a single character
        that isn't a question mark. This prevents troubles that may crop
        up with things like

        ?????

        We then group (but don't bother to tag, using the \%(...\)
        syntax) any characters that aren't immediately followed/preceded
        (depending on where you start counting) by a pair of question marks.


        [^?]         a character that isn't a question mark
        \%(...\)*    a bunch of valid things we group, but don't need to track
        .            any character
        \%(??\)\@<!  ensure that "??" doesn't match before this point.



        They can be complex & hairy, but they've gotta make sense to the
        regexp interpreter at some point, so they can be dissected :)
        It's just a matter of breaking down the problem into bits that
        you know have solutions, then stringing them all together.

        Further help within vim can be found at topics such as

        :help :s
        :he sub-replace-special
        :he 'iskeyword'
        :he substitute()
        :he submatch()
        :he /\(
        :he /\%(
        :he /[]
        :he /\@<!
        :he /\1


        Hope this gets you well on the road to regexp mastery...

        -tim
      • Marian Csontos
        Message 3 of 11 , May 30, 2005
          On Mon, 30 May 2005 17:02:57 +0200, Scholte, J.C.M. <J.C.M.Scholte@...>
          wrote:

          > try: http://www.oreilly.com/catalog/regex/
          >
          > :%s/_\([^_]\+\)_/<em>\1<\/em>/g
          > :%s/\*\([^\*]\+\)\*/<strong>\1<\/strong>/g
          > :%s/??\([^?]\+\)??/<cite>\1<\/cite>/g

          In this case I'd prefer
          :%s/??\([^?]\{-}\)??/<cite>\1<\/cite>/g

          Marian



        • Vigil
          Message 4 of 11 , May 31, 2005
            On Mon, 30 May 2005, Juan Pablo Aqueveque wrote:

            > Besides help with the problem above, how can I learn regular
            > expressions in a simple way? A good URL on the subject would be
            > much appreciated.

            http://www.geocities.com/volontir/

            --

            .
          • A. J. Mechelynck
            Message 5 of 11 , May 31, 2005
              Vigil wrote:
              > On Mon, 30 May 2005, Juan Pablo Aqueveque wrote:
              >
              >> Besides help with the problem above, how can I learn regular
              >> expressions in a simple way? A good URL on the subject would be
              >> much appreciated.
              >
              >
              > http://www.geocities.com/volontir/
              >

              You might start with

              :help 03.9
              :help usr_27.txt
              :help pattern.txt

              in ascending order of difficulty. I believe that everything is there,
              but like all Vim help, it is best to read it attentively; often
              "hands-on" experimenting (trying some searches on a "real" file and
              seeing if they work) is the best way to learn.

              There may be books at your bookshop; but keep in mind that Perl regular
              expressions, Vim regular expressions, and "grep" regular expressions,
              are all similar but not identical.

              There are also two references and one URL at the very end of the "tutor"
              file:

              :view $VIMRUNTIME/tutor/tutor
              G

              I don't know if the books are still in print, or the URL still operational.


              Best regards,
              Tony.
            • jkilbour@pol.net
              Message 6 of 11 , May 31, 2005
                I would like to identify the null fields in a set of files (which have
                different numbers of fields), i.e. to find not just the number of fields
                that are null but also which fields are null. Is this possible using Vim
                regular expressions?
              • Eljay Love-Jensen
                Message 7 of 11 , May 31, 2005
                  Hi jkilbour,

                  >I would like to identify the null fields in a set of files (which have different numbers of fields), i.e. to find not just the number of fields that are null but also which fields are null. Is this possible using Vim regular expressions?

                  Given...
                  |one|two|three||five||seven|

                  You want to search for:
                  /||

                  And you want the search to indicate to you, somehow, whether you are sitting on empty field 4, or empty field 6?

                  Is that what you are asking?

                  I don't think that's possible. (Which means Tony will show how to do it in a minute or two.)

                  I do believe it is possible to attack the problem vertically, in that you can find all the empty 1st fields. Then, with a separate search, all the empty 2nd fields. Then with another separate search, all the empty 3rd fields. 4th, 5th, 6th, and finally empty 7th fields.

                  Would that suffice?

                  Here's an example of finding the empty fourth field:
                  /^\(|[^|]*\)\{3}\zs||

                  NOTE: I presumed in my example that the 1st data field is delimited with an initial vertical bar, and likewise the last data field is delimited with a terminating vertical bar. If they aren't, you'll have to adjust the search pattern accordingly.
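
As a rough cross-check of the "one search per field position" idea, here is a Python sketch under the same assumption that every line starts and ends with a vertical bar. It is an analogue rather than the Vim search itself: Python has no `\zs`, so a lookahead is used instead, and `has_empty_field` is a made-up helper name.

```python
import re

def has_empty_field(line, n):
    """Return True if the n-th (1-based) |-delimited field of line is empty."""
    # Skip n-1 fields of the form "|text", then require "|" followed directly by "|".
    pattern = r'^(?:\|[^|]*){%d}\|(?=\|)' % (n - 1)
    return re.search(pattern, line) is not None

line = "|one|two|three||five||seven|"
print([n for n in range(1, 8) if has_empty_field(line, n)])
```

Looping over the field positions, as in the last line, is one way to report which fields are empty rather than just how many.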

                  HTH,
                  --Eljay
                • A. J. Mechelynck
                  Message 8 of 11 , May 31, 2005
                    Eljay Love-Jensen wrote:
                    > Hi jkilbour,
                    >
                    >
                    >>I would like to identify the null fields in a set of files (which have different numbers of fields), i.e. to find not just the number of fields that are null but also which fields are null. Is this possible using Vim regular expressions?
                    >
                    >
                    > Given...
                    > |one|two|three||five||seven|
                    >
                    > You want to search for:
                    > /||
                    >
                    > And you want the search to indicate to you, somehow, whether you are sitting on empty field 4, or empty field 6?
                    >
                    > Is that what you are asking?
                    >
                    > I don't think that's possible. (Which means Tony will show how to do it in a minute or two.)

                    Thanks for your high opinion of my capacities. Regexes were however
                    never my forte. I don't know if it is possible, but if it is, it would
                    require a more complicated regex than what I feel up to generating at
                    the moment. The first step would be to define what to replace the above
                    line by. Maybe generate a quickfix "error file" which would reference
                    each matching || (overlappings allowed!) so that :cn would find them all
                    in turn (using :vimgrep on Vim 7 if possible)? Or else, generate a file with

                    25|one|two|three|4|five|6|seven

                    or

                    25|4|6

                    if the line you showed was line 25?

                    Or something else? Let jkilbour answer.

                    >
                    > I do believe it is possible to attack the problem vertically, in that you can find all the empty 1st fields. Then, with a separate search, all the empty 2nd fields. Then with another separate search, all the empty 3rd fields. 4th, 5th, 6th, and finally empty 7th fields.
                    >
                    > Would that suffice?
                    >
                    > Here's an example of finding the empty fourth field:
                    > /^\(|[^|]*\)\{3}\zs||
                    >
                    > NOTE: I presumed in my example that the 1st data field is delimited with an initial vertical bar, and likewise the last data field is delimited with a terminating vertical bar. If they aren't, you'll have to adjust the search pattern accordingly.
                    >
                    > HTH,
                    > --Eljay
                    >
                    >
                    >
                    >

                    Best regards,
                    Tony.
                  • Hari Krishna Dara
                    Message 9 of 11 , May 31, 2005
                      On Tue, 31 May 2005 at 6:46am, jkilbour@... wrote:

                      > I would like to identify the null fields in a set of files (which have
                      > different numbers of fields), i.e. to find not just the number of fields
                      > that are null but also which fields are null. Is this possible using Vim
                      > regular expressions?
                      >

                      Depending on what exactly you want to do with them, you might be able to
                      create multiple solutions. Maybe you can first number all of the fields
                      and then search for those that are empty to look them up.

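                      " Replacement callback: counts fields and labels the empty ones.
                      function! Submatch()
                        let g:idx = g:idx + 1
                        let match = submatch(1)
                        return (match == '' ? g:idx.':'.'<null>' : match)
                      endfunction

                      " Reset the counter, then substitute on the current line.
                      let g:idx = 0 | s/|\([^|]*|\@=\)/\='|'.Submatch()/g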

                      The above will transform

                      |one|two|three||five||seven|

                      into

                      |one|two|three|4:<null>|five|6:<null>|seven|

                      All that you need to do then is to search for nulls using a pattern such
                      as "\d\+:<null>". The actual regex to use to do the above substitution
                      will depend on exact specifications, such as can you have a "|"
                      character inside a field, and if so how you escape them. I am not a
                      regex guru myself, but if you need further help, you can describe your
                      needs in more details, for me or others on the list to come up with the
                      right pattern.
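
The same counting trick can be tried outside Vim. The following Python sketch is only an analogue of the function-plus-`\=` approach above: `label_nulls` is a hypothetical name, `itertools.count` replaces the global counter, and `(?=\|)` is Python's spelling of the `|\@=` lookahead. It assumes fields never contain a literal "|".

```python
import re
from itertools import count

def label_nulls(line):
    """Replace each empty |-delimited field with '<position>:<null>'."""
    counter = count(1)  # field positions start at 1

    def repl(m):
        idx = next(counter)   # advances once per matched field
        field = m.group(1)
        return '|' + (f'{idx}:<null>' if field == '' else field)

    # A field is "|" plus non-bar text, and must be followed by another "|".
    return re.sub(r'\|([^|]*)(?=\|)', repl, line)

print(label_nulls('|one|two|three||five||seven|'))
```

Since the counter is created inside the function, every line starts numbering from 1 again, which corresponds to resetting g:idx before each substitution.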

                      There are many regex gurus on this list, so I won't be surprised to see
                      a much simpler/easy to use solution. However, if you want to deal with a
                      programmatic approach, you can take a look at my multvals.vim plugin to
                      iterate over the fields and do something with them.

                      call MvIterCreate('|one|two|three||five||seven|', '|', 'Iter')
                      let n = 0
                      while MvIterHasNext('Iter')
                        let ele = MvIterNext('Iter')
                        if ele == ''
                          echo 'Found null at: ' . n
                        endif
                        let n = n + 1
                      endwhile
                      call MvIterDestroy('Iter')

                      PS: Multvals treats the first "|" in the string also as a separator
                      resulting in one extra field, but you should be able to workaround that.

                      --
                      HTH,
                      Hari



                    • Antony Scriven
                      Message 10 of 11 , Jun 1, 2005
                        Hello

                        On May 31, Hari Krishna Dara wrote:

                        > On Tue, 31 May 2005 at 6:46am, jkilbour@... wrote:
                        >
                        > > I would like to identify the null fields in a set of
                        > > files (which have different numbers of fields), i.e. to
                        > > find not just the number of fields that are null but
                        > > also which fields are null. Is this possible using Vim
                        > > regular expressions?
                        >
                        > Depending on what exactly you want to do with them, you
                        > might be able to create multiple solutions. Maybe you
                        > can first number all of the fields and then search for
                        > those that are empty to look them up.
                        >
                        > function! Submatch()
                        > let g:idx = g:idx + 1
                        > let match = submatch(1)
                        > return (match == '' ? g:idx.':'.'<null>' : match)
                        > endfunction
                        >
                        > let g:idx = 0 | s/|\([^|]*|\@=\)/\='|'.Submatch()/g
                        >
                        > The above will transform
                        >
                        > |one|two|three||five||seven|
                        >
                        > into
                        >
                        > |one|two|three|4:<null>|five|6:<null>|seven|
                        >
                        > [...]

                        This is somewhat shorter and transforms the whole buffer (if
                        I understand the problem correctly; I missed the original
                        mail):

                        %s/\(^.*|\)\@<=|/\=strlen(substitute(submatch(1),'[^|]','','g')).':<null>|'/g

                        But I offer this mostly for interesting ways to use \@<=. It
                        is, IMO, unmaintainable. I think \= plus a function, as Hari
                        has done, is normally the best approach for this sort of thing.

                        Antony