Re: find/replace problem and a good tutorial

  • Tim Chase
    Message 1 of 11, May 30, 2005
      > _word_ to <em>word</em>
      > *word* to <strong>word</strong>
      > ??word?? to <cite>word</cite>

      There are a variety of ways these can be defined :)

      This would likely be something like

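      :%s/\<_\(\S*\)_\>/\="<em>".substitute(submatch(1), '_', ' ', 'g')."<\/em>"/g

      :%s/\*\(\S*\)\*/\="<strong>".substitute(submatch(1), '*', ' ', 'g')."<\/strong>"/g

      (The tags are in double quotes on purpose: inside a "\=" expression
      a single-quoted '<\/em>' would keep its backslash in the output.)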

      These should take care of _cases_like_this_ though I haven't
      figured out a clean way for it to handle _cases like this_ in
      terms of distinguishing them from _a first case_ and _a second_
      where there are two on the same line. If it's just a single
      word, they can be simplified to

      :%s/\<_\(\S*\)_\>/<em>\1<\/em>/g
      :%s/\*\(\S*\)\*/<strong>\1<\/strong>/g

      or if only a single "_" marks both the beginning and the end, as
      in the above _this is an example_, then it can be done similarly
      with

      :%s/\<_\([^_]*\)_\>/<em>\1<\/em>/g
      :%s/\*\([^*]*\)\*/<strong>\1<\/strong>/g

      Also, note that the "*...*" ones don't make use of \<...\>
      because an asterisk isn't a keyword character, so there is no
      word boundary for \< or \> to match next to it. This assumes your
      'iskeyword' option doesn't include an asterisk, but does include
      the underscore (as it does by default). YMMV if you've bunged
      with this option :)
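
To experiment with the single-word and multi-word variants outside Vim, here is a rough Python translation using the standard `re` module. It is a sketch, not part of the original post: it assumes Python's `\b` is a fair stand-in for Vim's `\<`/`\>` (both count `_` as a word character by default), and `inline_markup` is a hypothetical helper name.

```python
import re

def inline_markup(text: str) -> str:
    """Hypothetical helper: convert _x y_ to <em> and *x y* to <strong>."""
    # Mirrors :%s/\<_\([^_]*\)_\>/<em>\1<\/em>/g
    # \b_ ... _\b approximates Vim's \<_ ... _\> ('_' is a word char in both).
    text = re.sub(r"\b_([^_]*)_\b", r"<em>\1</em>", text)
    # Mirrors :%s/\*\([^*]*\)\*/<strong>\1<\/strong>/g
    # No word boundaries here, since '*' is not a word character.
    text = re.sub(r"\*([^*]*)\*", r"<strong>\1</strong>", text)
    return text

print(inline_markup("in _a first case_ and *a second* one"))
```

As in the Vim version, `[^_]*` lets the emphasis span spaces, while an underscore buried inside an identifier such as `snake_case_name` is left alone because there is no word boundary before it.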

      The third one is a bit trickier, as you've got to find two
      adjacent characters to work with. If *no* question marks can
      occur in the text, it's not so bad. Something like

      %s/??\([^?]\+\)??/<cite>\1<\/cite>/g
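
The same idea in Python, as an illustrative sketch (the helper name `cite_simple` is hypothetical). One difference worth knowing: with Vim's default 'magic' setting a bare `?` is literal, whereas Python treats `?` as a quantifier, so each one has to be escaped.

```python
import re

# Python version of :%s/??\([^?]\+\)??/<cite>\1<\/cite>/g
CITE_SIMPLE = re.compile(r"\?\?([^?]+)\?\?")

def cite_simple(text: str) -> str:
    """Hypothetical helper: wrap ??title?? in <cite> tags (no '?' inside)."""
    return CITE_SIMPLE.sub(r"<cite>\1</cite>", text)

print(cite_simple("see ??Moby Dick?? and ??Ulysses??"))
# A '?' inside the title stops [^?]+ short, so this line is left unchanged.
print(cite_simple("??What time is it? she said??"))
```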

      However, if you're allowed to have something like

      ??What time is it? she said??

      then something, perhaps, like the following (mostly untested)


      %s/??\([^?]\%(.\%(??\)\@<!\)*\)??/<cite>\1<\/cite>/g

      > Besides helping me with the previous problem, how can I learn
      > regular expressions in a simple way?, some remarkable URL in
      > this respect it would be very valued.

      There are a number of ways to come at it. Having taken a course
      in programming languages really helps, as does having written
      your own finite-state machines (FSMs). :) As for links and texts,
      there's an O'Reilly book with "regular expressions" in the title
      which is supposed to be quite good. Regexps really are a
      programming language of sorts, so if you code, it's just a matter
      of breaking down the problem (the target matches you want) into
      regexp atoms. Additionally, you'll often see us break down
      complex regexps to help folks understand the magic they're doing.

      So, in that same spirit... :)

      In the first example (the one with the substitute() and
      submatch() functions) it breaks down like this:

      %s/foo/bar/g you're surely familiar with this.

      where "foo" is defined as "\<_\(\S*\)_\>" and "bar" is the
      evaluation of an expression. For more help on replacing with the
      results of expressions, see ":help sub-replace-special".

      Now, that first expression is
      \<       ensure the pattern match begins at the start of a word
      _        that begins with an underscore
      \(...\)  mark some stuff that we'll reference later
      \S*      as many characters as possible that aren't whitespace (WS)
      _        the closing/ending underscore
      \>       make sure a word ends here (followed by a non-keyword
               character such as WS or punctuation, or the end of the line)

      So basically, it's "when an underscore starts something, is
      followed by a bunch of non-whitespace stuff, and then ends with
      an underscore, remember the stuff that wasn't whitespace".
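
For comparison, the same pattern can be written one atom per line with Python's `re.VERBOSE` flag, which allows a comment next to each piece. This is a translation sketch, assuming Python's `\b` behaves like Vim's `\<`/`\>` here because both treat `_` as a word character.

```python
import re

# A re.VERBOSE transcription of the Vim pattern \<_\(\S*\)_\>,
# one atom per line, following the breakdown above.
EM_WORD = re.compile(r"""
    \b      # \<      : a word starts here ('_' counts as a word char)
    _       # _       : the opening underscore
    (\S*)   # \(\S*\) : capture a run of non-whitespace characters
    _       # _       : the closing underscore
    \b      # \>      : a word ends here
""", re.VERBOSE)

m = EM_WORD.search("handles _cases_like_this_ fine")
print(m.group(1))
```

The captured text still contains its inner underscores; stripping those is the job of the replacement step.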

      This then gets massaged via the "sub-replace-special" evaluation
      of the "\=". First, we start by creating the tags...we know it
      will look something like

      "<em>".stuff."<\/em>"

      (note that we have to escape the forward slash or else the
      ":s/foo/bar/g" gets confused by it, and thinks that it's reached
      the end of the replacement text)

      Now, the "stuff" is simply the originally captured stuff from
      above, only we want to replace any underscores in it as well so
      we don't end up with something like

      _this_is_a_test_

      becoming

      <em>this_is_a_test</em>

      The substitute() function takes care of this replacement.
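
In Python the "\=" expression corresponds roughly to passing a function to `re.sub`: the match object's `group(1)` plays the part of `submatch(1)`, and `str.replace` the part of `substitute(..., '_', ' ', 'g')`. A sketch with a hypothetical helper name:

```python
import re

def em_with_spaces(text: str) -> str:
    """Hypothetical helper: _this_is_a_test_ -> <em>this is a test</em>."""
    # The lambda stands in for Vim's \= expression:
    #   m.group(1)               ~ submatch(1)
    #   .replace("_", " ")       ~ substitute(..., '_', ' ', 'g')
    return re.sub(
        r"\b_(\S*)_\b",
        lambda m: "<em>" + m.group(1).replace("_", " ") + "</em>",
        text,
    )

print(em_with_spaces("_this_is_a_test_"))
```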

      ==================

      For the second bout of them, it's the same thing as before, only
      we don't have to worry about stripping out extraneous
      underscores. This simplifies matters. No need for the
      "sub-replace-special" expressions, substitute() calls, etc. We
      can just use a back-reference, making the replacement simply

      <em>\1<\/em>

      where the "\1" is replaced with the text we previously tagged as
      "interesting". Again, we escape the forward slash to keep it from
      terminating the replacement text prematurely.

      ==================

      The third theme & variation on this is to change what constitutes
      the search target. Previously, we wanted non-whitespace. This
      one, we simply want anything and everything that's not an
      underscore. So we swap the "\S*" for "[^_]*" which is how one
      denotes "anything that isn't an underscore".

      ==================

      Again, the fourth is the same as the previous, only the
      prohibited characters are question-marks, rather than
      underscores. This is something akin to

      :%s/??\(.\{-}\)??/<cite>\1<\/cite>/g

      which stops the match at the first "??" it sees after the opening
      "??"

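Vim's non-greedy `\{-}` is spelled `*?` in Python. A short sketch (hypothetical helper names) contrasting it with the greedy `.*`, which would run on to the last "??" on the line:

```python
import re

def cite_lazy(text: str) -> str:
    # .*? is Python's counterpart of Vim's .\{-}: match as little as
    # possible, so each <cite> ends at the first closing "??".
    return re.sub(r"\?\?(.*?)\?\?", r"<cite>\1</cite>", text)

def cite_greedy(text: str) -> str:
    # Greedy .* for contrast: it swallows everything up to the LAST "??".
    return re.sub(r"\?\?(.*)\?\?", r"<cite>\1</cite>", text)

s = "??What time is it? she said?? and ??Ulysses??"
print(cite_lazy(s))
print(cite_greedy(s))
```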
      ==================

      Lastly, that "??...??" one is trickier. Previously, we could look
      for a single starting atom, some stuff, and a single ending atom.
      This time, we have to look for the starting marker, some stuff
      that doesn't include the ending marker, followed by the ending
      marker. From that, you should be able to discern that we've got

      :%s/??\(stuff\)??/<cite>\1<\/cite>/g

      which is about the same as above. However, things get messy in
      that "stuff" portion. The initial "[^?]" is a single character
      that isn't a question mark. This prevents troubles that may crop
      up with things like

      ?????

      We then group (but don't bother to tag, using the \%(...\)
      syntax) any characters that aren't immediately followed/preceded
      (depending on where you start counting) by a pair of question marks.


      [^?]          a character that isn't a question mark
      \%(...\)*     a bunch of valid things we group, but don't need to track
      .             any character
      \%(??\)\@<!   ensure that "??" doesn't match just before this point,
                    i.e. the character we just took didn't complete a "??"
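
The same construction translates almost token for token into Python, as a sketch for experimenting: `\%(...\)` becomes the non-capturing `(?:...)`, and `\%(??\)\@<!` becomes the negative look-behind `(?<!\?\?)`, which Python accepts because it is fixed-width. The helper name is hypothetical, and it has only been checked against the examples shown.

```python
import re

# Translation of %s/??\([^?]\%(.\%(??\)\@<!\)*\)??/<cite>\1<\/cite>/g
CITE_LOOKBEHIND = re.compile(
    r"\?\?"              # opening ??
    r"("
    r"[^?]"              # first character must not be '?'
    r"(?:.(?<!\?\?))*"   # any char, as long as it doesn't complete a "??"
    r")"
    r"\?\?"              # closing ??
)

def cite_with_questions(text: str) -> str:
    """Hypothetical helper: allow single '?' inside ??...?? titles."""
    return CITE_LOOKBEHIND.sub(r"<cite>\1</cite>", text)

print(cite_with_questions("??What time is it? she said??"))
print(cite_with_questions("??a?? and ??b??"))
```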



      They can be complex & hairy, but they've gotta make sense to the
      regexp interpreter at some point, so they can be dissected :)
      It's just a matter of breaking down the problem into bits that
      you know have solutions, then stringing them all together.

      Further help within vim can be found at topics such as

      :help :s
      :he sub-replace-special
      :he 'iskeyword'
      :he substitute()
      :he submatch()
      :he /\(
      :he /\%(
      :he /[]
      :he /\@<!
      :he /\1


      Hope this gets you well on the road to regexp mastery...

      -tim
    • Marian Csontos
      Message 2 of 11, May 30, 2005
        On Mon, 30 May 2005 17:02:57 +0200, Scholte, J.C.M. <J.C.M.Scholte@...>
        wrote:

        > try: http://www.oreilly.com/catalog/regex/
        >
        > :%s/_\([^_]\+\)_/<em>\1<\/em>/g
        > :%s/\*\([^\*]\+\)\*/<strong>\1<\/strong>/g
        > :%s/??\([^?]\+\)??/<cite>\1<\/cite>/g

        In this case I'd prefer
        :%s/??\([^?]\{-}\)??/<cite>\1<\/cite>/g

        Marian



      • Vigil
        Message 3 of 11, May 31, 2005
          On Mon, 30 May 2005, Juan Pablo Aqueveque wrote:

          > Besides helping me with the previous problem, how can I learn regular
          > expressions in a simple way?, some remarkable URL in this respect it
          > would be very valued.

          http://www.geocities.com/volontir/

          --

          .
        • A. J. Mechelynck
          Message 4 of 11, May 31, 2005
            Vigil wrote:
            > On Mon, 30 May 2005, Juan Pablo Aqueveque wrote:
            >
            >> Besides helping me with the previous problem, how can I learn regular
            >> expressions in a simple way?, some remarkable URL in this respect it
            >> would be very valued.
            >
            >
            > http://www.geocities.com/volontir/
            >

            You might start with

            :help 03.9
            :help usr_27.txt
            :help pattern.txt

            in ascending order of difficulty. I believe that everything is there,
            but like all Vim help, it is best to read it attentively; often
            "hands-on" experimenting (trying some searches on a "real" file and
            seeing if they work) is the best way to learn.

            There may be books at your bookshop; but keep in mind that Perl regular
            expressions, Vim regular expressions, and "grep" regular expressions,
            are all similar but not identical.

            There are also two references and one URL at the very end of the "tutor"
            file:

            :view $VIMRUNTIME/tutor/tutor
            G

            I don't know if the books are still in print, or the URL still operational.


            Best regards,
            Tony.
          • jkilbour@pol.net
            Message 5 of 11, May 31, 2005
              I would like to identify the null fields in a set of files (which have
              different numbers of fields), i.e. to find not just the number of fields
              that are null but also which fields are null. Is this possible using Vim
              regular expressions?
            • Eljay Love-Jensen
              Message 6 of 11, May 31, 2005
                Hi jkilbour,

                >I would like to identify the null fields in a set of files (which have different numbers of fields; i.e. to find not just the number of fields that are null but also which fields are null. Is this possible using vim regular expressions?

                Given...
                |one|two|three||five||seven|

                You want to search for:
                /||

                And you want the search to tell you, somehow, whether you are sitting on empty field 4 or empty field 6?

                Is that what you are asking?

                I don't think that's possible. (Which means Tony will show how to do it in a minute or two.)

                I do believe it is possible to attack the problem vertically, in that you can find all the empty 1st fields. Then, with a separate search, all the empty 2nd fields. Then with another separate search, all the empty 3rd fields. 4th, 5th, 6th, and finally empty 7th fields.

                Would that suffice?

                Here's an example of finding the empty fourth field:
                /^\(|[^|]*\)\{3}\zs||

                NOTE: I presumed in my example that the 1st data field is delimited with an initial vertical bar, and likewise the last data field is delimited with a terminating vertical bar. If your data isn't, you'll have to adjust the search pattern accordingly.
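
Eljay's per-column idea can be tried out in Python as well. This sketch assumes, as the note says, that every line starts and ends with a `|`; `field_is_empty` is a hypothetical name, and `re.match` anchors at the start of the string the way `^` does in the Vim pattern.

```python
import re

def field_is_empty(line: str, n: int) -> bool:
    """Hypothetical helper: is field n (1-based) of a |-delimited line empty?"""
    # Generalises /^\(|[^|]*\)\{3}\zs|| to field n: skip n-1 "|field"
    # groups from the start of the line, then require "||".
    pattern = r"(?:\|[^|]*){%d}\|\|" % (n - 1)
    return re.match(pattern, line) is not None

line = "|one|two|three||five||seven|"
print([n for n in range(1, 8) if field_is_empty(line, n)])
```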

                HTH,
                --Eljay
              • A. J. Mechelynck
                Message 7 of 11, May 31, 2005
                  Eljay Love-Jensen wrote:
                  > Hi jkilbour,
                  >
                  >
                  >>I would like to identify the null fields in a set of files (which have different numbers of fields; i.e. to find not just the number of fields that are null but also which fields are null. Is this possible using vim regular expressions?
                  >
                  >
                  > Given...
                  > |one|two|three||five||seven|
                  >
                  > You want to search for:
                  > /||
                  >
                  > And you want the search to discriminate to you, somehow, whether your are sitting on empty field 4, or empty field 6?
                  >
                  > Is that what you are asking?
                  >
                  > I don't think that's possible. (Which means Tony will show how to do it in a minute or two.)

                  Thanks for your high opinion of my capacities. Regexes were however
                  never my forte. I don't know if it is possible, but if it is, it would
                  require a more complicated regex than what I feel up to generating at
                  the moment. The first step would be to define what to replace the above
                  line by. Maybe generate a quickfix "error file" which would reference
                  each matching || (overlappings allowed!) so that :cn would find them all
                  in turn (using :vimgrep on Vim 7 if possible)? Or else, generate a file with

                  25|one|two|three|4|five|6|seven

                  or

                  25|4|6

                  if the line you showed was line 25?

                  Or something else? Let jkilbour answer.
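
Tony's second output format (line number followed by the numbers of the empty fields, as in "25|4|6") is easy to produce with a short script outside Vim. A Python sketch, assuming each line starts and ends with `|` and that fields never contain an escaped `|`; `null_field_report` is a hypothetical name:

```python
def null_field_report(lines):
    """Hypothetical helper: one "lineno|n|m..." entry per line with empty fields."""
    report = []
    for lineno, line in enumerate(lines, start=1):
        # Drop the outer bars by slicing; str.strip('|') would also eat an
        # empty first or last field.
        fields = line[1:-1].split("|")
        empties = [str(n) for n, f in enumerate(fields, start=1) if f == ""]
        if empties:
            report.append("|".join([str(lineno)] + empties))
    return report

print(null_field_report(["|one|two|", "|one|two|three||five||seven|"]))
```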

                  >
                  > I do believe it is possible to attack the problem vertically, in that you can find all the empty 1st fields. Then, with a separate search, all the empty 2nd fields. Then with another separate search, all the empty 3rd fields. 4th, 5th, 6th, and finally empty 7th fields.
                  >
                  > Would that suffice?
                  >
                  > Here's an example of finding the empty fourth field:
                  > /^\(|[^|]*\)\{3}\zs||
                  >
                  > NOTE: I presumed in my example that the 1st data data field is delimited with an initial vertical bar, and likewise the last data field is delimited with a terminating vertical bar. If it doesn't, you'll have to adjust the search pattern accordingly.
                  >
                  > HTH,
                  > --Eljay

                  Best regards,
                  Tony.
                • Hari Krishna Dara
                  Message 8 of 11, May 31, 2005
                    On Tue, 31 May 2005 at 6:46am, jkilbour@... wrote:

                    > I would like to identify the null fields in a set of files (which have
                    > different numbers of fields; i.e. to find not just the number of fields
                    > that are null but also which fields are null. Is this possible using vim
                    > regular expressions?
                    >

                    Depending on what exactly you want to do with them, you might be able to
                    create multiple solutions. Maybe you can first number all of the fields
                    and then search for those that are empty to look them up.

                    function! Submatch()
                      let g:idx = g:idx + 1
                      let match = submatch(1)
                      return (match == '' ? g:idx.':'.'<null>' : match)
                    endfunction

                    let g:idx = 0 | s/|\([^|]*|\@=\)/\='|'.Submatch()/g

                    The above will transform

                    |one|two|three||five||seven|

                    into

                    |one|two|three|4:<null>|five|6:<null>|seven|

                    All that you need to do then is to search for nulls using a pattern such
                    as "\d\+:<null>". The actual regex to use to do the above substitution
                    will depend on exact specifications, such as can you have a "|"
                    character inside a field, and if so how you escape them. I am not a
                    regex guru myself, but if you need further help, you can describe your
                    needs in more detail, for me or others on the list to come up with the
                    right pattern.
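
For comparison, Hari's counter-in-a-function approach maps onto Python's `re.sub` with a callable. A sketch; `number_nulls` is a hypothetical name, `(?=\|)` stands in for Vim's `|\@=` look-ahead, and the counter restarts on every call much as `let g:idx = 0` resets it before the `:s`.

```python
import re
from itertools import count

def number_nulls(line: str) -> str:
    """Hypothetical helper mirroring the Submatch() approach."""
    idx = count(1)  # like g:idx: advanced once for every field matched

    def repl(m):
        i = next(idx)
        field = m.group(1)
        return "|" + (field if field else f"{i}:<null>")

    # |\([^|]*|\@=\)  ->  \|([^|]*)(?=\|)
    return re.sub(r"\|([^|]*)(?=\|)", repl, line)

print(number_nulls("|one|two|three||five||seven|"))
```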

                    There are many regex gurus on this list, so I won't be surprised to see
                    a much simpler, easier-to-use solution. However, if you want a
                    programmatic approach, you can take a look at my multvals.vim plugin to
                    iterate over the fields and do something with them.

                    call MvIterCreate('|one|two|three||five||seven|', '|', 'Iter')
                    let n = 0
                    while MvIterHasNext('Iter')
                      let ele = MvIterNext('Iter')
                      if ele == ''
                        echo 'Found null at: ' . n
                      endif
                      let n = n + 1
                    endwhile
                    call MvIterDestroy('Iter')

                    PS: Multvals treats the first "|" in the string also as a separator
                    resulting in one extra field, but you should be able to workaround that.
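
The extra-field quirk in the PS is easy to see with Python's `str.split` too. A small sketch of the same loop (plain Python, not multvals itself):

```python
line = "|one|two|three||five||seven|"

# str.split keeps empty strings: the leading '|' yields an extra empty
# field (the quirk mentioned above) and, in Python, so does the trailing '|'.
print(line.split("|"))

# One workaround: slice off the outer delimiters, then number from 1.
fields = line[1:-1].split("|")
nulls = [n for n, f in enumerate(fields, start=1) if f == ""]
for n in nulls:
    print("Found null at:", n)
```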

                    --
                    HTH,
                    Hari



                  • Antony Scriven
                    Message 9 of 11, Jun 1, 2005
                      Hello

                      On May 31, Hari Krishna Dara wrote:

                      > On Tue, 31 May 2005 at 6:46am, jkilbour@... wrote:
                      >
                      > > I would like to identify the null fields in a set of
                      > > files (which have different numbers of fields; i.e. to
                      > > find not just the number of fields that are null but
                      > > also which fields are null. Is this possible using vim
                      > > regular expressions?
                      >
                      > Depending on what exactly you want to do with them, you
                      > might be able to create multiple solutions. May be you
                      > can first number all of the fields and then search for
                      > those that are empty to lookup them up.
                      >
                      > function! Submatch()
                      > let g:idx = g:idx + 1
                      > let match = submatch(1)
                      > return (match == '' ? g:idx.':'.'<null>' : match)
                      > endfunction
                      >
                      > let g:idx = 0 | s/|\([^|]*|\@=\)/\='|'.Submatch()/g
                      >
                      > The above will transform
                      >
                      > |one|two|three||five||seven|
                      >
                      > into
                      >
                      > |one|two|three|4:<null>|five|6:<null>|seven|
                      >
                      > [...]

                      This is somewhat shorter and transforms the whole buffer (if
                      I understand the problem correctly; I missed the original
                      mail):

                      %s/\(^.*|\)\@<=|/\=strlen(substitute(submatch(1),'[^|]','','g')).':<null>|'/g

                      But I offer this mostly for interesting ways to use \@<=. It
                      is, IMO, unmaintainable. I think \= plus a function, as Hari
                      has done, is normally the best approach for this sort of thing.
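
Python's `re` module refuses variable-width look-behind such as `(?<=^.*\|)`, so a direct port of this one-liner isn't possible there. A sketch of an equivalent (hypothetical helper name): match the second bar of each `||` with a fixed-width look-behind, then count the bars to its left on the same line, which is what `strlen(substitute(submatch(1), '[^|]', '', 'g'))` computes.

```python
import re

def number_nulls_by_position(text: str) -> str:
    """Hypothetical helper: replace each empty field with "<n>:<null>"."""
    def repl(m):
        # Start of the current line (0 if there is no earlier newline).
        line_start = text.rfind("\n", 0, m.start()) + 1
        # Bars to the left of this one on the same line = the empty field's index.
        n = text.count("|", line_start, m.start())
        return f"{n}:<null>|"

    # (?<=\|)\| : a bar immediately preceded by another bar, i.e. the
    # second bar of "||" (a fixed-width look-behind, which Python allows).
    return re.sub(r"(?<=\|)\|", repl, text)

print(number_nulls_by_position("|one|two|three||five||seven|"))
```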

                      Antony