
Re: search disk and regexp

  • Sheri
    Message 1 of 3 , Oct 14, 2008
      --- In notetab@yahoogroups.com, "Adrien Verlee" <adrien.verlee@...> wrote:
      >
      > Hello,
      >
      > I want to find the words between <h1> and </h1> in an entire folder.
      > When I tick reg exp and type in the Find field of the Search Disk
      > dialog: <h1>\w</h1> or <h1>[a-zA-Z]</h1>, or <h1>(.*)</h1>, nothing
      > happens; no files are found (and there are more than 500 files with
      > <h1>...</h1>)
      >
      > So, what am I doing wrong?
      > --
      > adrien
      >

      With the first two, you are looking for situations where there is only
      one character between <h1> and </h1>. With the third, you are looking
      for <h1> followed by as many characters that are not line breaks as
      possible, followed by </h1>. Because .* is greedy, it first runs to the
      end of the line and then backs off to the last </h1> on that line, so
      it can take in more than one heading; the non-greedy form below is
      safer.

      <h1>\w+</h1>

      would match the tags surrounding one or more word characters (letters,
      digits, and the underscore).

      <h1>[a-zA-Z]+</h1>

      would allow only alphabetic characters between the tags.

      <h1>.*?</h1>

      would match from an opening tag through the first closing tag (but
      everything must be on the same line; the same goes for the others).
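
As a rough illustration of how the three patterns differ, here is a small sketch in Python rather than NoteTab (Python's re module is not the engine NoteTab uses, so treat it only as an approximation). The sample lines and the first_match helper are made up for the example.

```python
import re

# Hypothetical sample lines; each string stands for one line of an HTML file.
samples = [
    "<h1>Title</h1>",
    "<h1>Two words</h1>",
    "<h1>A</h1> and <h1>B</h1>",
]

# The three patterns suggested above.
patterns = {
    "word_chars": r"<h1>\w+</h1>",
    "letters": r"<h1>[a-zA-Z]+</h1>",
    "lazy_dot": r"<h1>.*?</h1>",
}

def first_match(pattern, line):
    """Return the first matched text on the line, or None if nothing matches."""
    m = re.search(pattern, line)
    return m.group(0) if m else None

results = {
    name: [first_match(p, line) for line in samples]
    for name, p in patterns.items()
}

for name, found in results.items():
    print(name, found)
```

The first two patterns skip the heading that contains a space, while the lazy dot accepts it; on the line with two headings, all three stop at the first closing tag.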

      If you opt for one of the first two, they could be improved by using
      two plus signs e.g.

      <h1>\w++</h1> or <h1>[a-zA-Z]++</h1>

      because in both cases, since "<" is not in the matching characters of
      \w or [a-zA-Z], there is no need to backtrack. The second plus sign
      says to consume all of the \w or [a-zA-Z] characters without
      backtracking and then require the next characters to be </h1>.
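
To see that giving up backtracking does not change which lines match here, the sketch below compares the plain greedy pattern with a possessive-style one in Python. This is an assumption-laden stand-in: Python's re module only accepts the ++ syntax from version 3.11 on, so the example emulates it with a lookahead plus a backreference, which is a different spelling of the same idea and not what NoteTab itself runs. The sample lines are invented.

```python
import re

# Plain greedy form, as in <h1>\w+</h1>.
greedy = re.compile(r"<h1>\w+</h1>")

# Emulated possessive form: the lookahead captures the whole run of word
# characters, and \1 then consumes exactly that run, so the engine cannot
# hand characters back before trying </h1>.
possessive_like = re.compile(r"<h1>(?=(\w+))\1</h1>")

# Hypothetical lines: a good heading, an unclosed one, and one with a space.
lines = ["<h1>Title</h1>", "<h1>Title_2", "<h1>Two words</h1>"]

greedy_hits = [bool(greedy.search(s)) for s in lines]
possessive_hits = [bool(possessive_like.search(s)) for s in lines]

print(greedy_hits)
print(possessive_hits)
```

Both lists come out the same; the possessive form simply fails sooner on lines that cannot match, because it never retries shorter runs of word characters.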

      If your tags could be split over multiple lines, that can be done for
      example with

      (?s)<h1>.+?</h1>

      The (?s) makes the dot match line-break characters as well. The
      difference between using a plus and an asterisk after the dot is that
      with plus, there must be at least one character between the tags.
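
A small Python sketch of the (?s) behaviour, again only as an approximation of what NoteTab's engine does. The text variable is a made-up file body with one heading split over two lines, one empty heading, and one ordinary heading.

```python
import re

# Hypothetical file content: a split heading, an empty heading, a normal one.
text = "<h1>Split\nheading</h1>\n<h1></h1>\n<h1>One line</h1>"

# Without (?s) the dot stops at line breaks, so the split heading is missed.
same_line_only = re.findall(r"<h1>.+?</h1>", text)

# With (?s) the dot also matches line breaks.
across_lines_plus = re.findall(r"(?s)<h1>.+?</h1>", text)
across_lines_star = re.findall(r"(?s)<h1>.*?</h1>", text)

print(same_line_only)
print(across_lines_plus)
print(across_lines_star)
```

One thing to watch with the plus form: because it needs at least one character, the empty <h1></h1> is not matched on its own, and in this sample the match runs on through the next heading's closing tag. The asterisk form returns the three headings separately.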

      Regards,
      Sheri
    • Adrien Verlee
      Message 2 of 3 , Oct 14, 2008
        On 14 Oct 2008, at 16:07, Sheri wrote:

        > <h1>.*?</h1>


        This is perfect! Thank you.
        --
        adrien