
search disk and regexp

  • Adrien Verlee
    Message 1 of 3 , Oct 14, 2008
      Hello,

      I want to find the words between <h1> and </h1> across an entire folder.
      When I tick "reg exp" and type into the Find field of the Search Disk
      dialog: <h1>\w</h1> or <h1>[a-zA-Z]</h1>, or <h1>(.*)</h1>, nothing
      happens; no files are found (and there are more than 500 files with
      <h1>...</h1>).

      So, what am I doing wrong?
      --
      adrien
    • Sheri
      Message 2 of 3 , Oct 14, 2008
        --- In notetab@yahoogroups.com, "Adrien Verlee" <adrien.verlee@...> wrote:
        >
        > Hello,
        >
        > I want to find the words between <h1> and </h1> across an entire folder.
        > When I tick "reg exp" and type into the Find field of the Search Disk
        > dialog: <h1>\w</h1> or <h1>[a-zA-Z]</h1>, or <h1>(.*)</h1>, nothing
        > happens; no files are found (and there are more than 500 files with
        > <h1>...</h1>).
        >
        > So, what am I doing wrong?
        > --
        > adrien
        >

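        With the first two, you are looking for cases where there is exactly
        one character between <h1> and </h1>. With the third, you are looking
        for <h1> followed by as many characters as possible that are not line
        breaks, followed by </h1>. The greedy .* first runs to the end of the
        line and then has to backtrack to find a </h1>, so both tags must be on
        the same line, and if a line holds several headings it matches through
        the last </h1> rather than the first.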

        <h1>\w+</h1>

        would match the tags surrounding one or more word characters (letters,
        digits, and underscores, but not spaces or punctuation).

        <h1>[a-zA-Z]+</h1>

        would allow only alphabetic characters between the tags.

        <h1>.*?</h1>

        would match from an opening tag through the first closing tag (but
        everything must be on the same line; the same goes for the others).
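
The three single-line patterns above can be compared with a small sketch. It uses Python's `re` module rather than NoteTab's Search Disk dialog, so it only illustrates the regex behaviour, and the sample line and the `first_match` helper are made up for this example.

```python
import re

# Hypothetical sample line; not taken from the original poster's files.
line = "<h1>Chapter_1</h1><h1>Two words</h1>"

def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Exactly one word character between the tags: no heading here is that short.
one_char = first_match(r"<h1>\w</h1>", line)

# One or more word characters: letters, digits, and underscores qualify.
word_run = first_match(r"<h1>\w+</h1>", line)

# Letters only: "Chapter_1" has "_" and "1", "Two words" has a space.
letters_only = first_match(r"<h1>[a-zA-Z]+</h1>", line)

# Lazy dot: stops at the first closing tag on the line.
lazy = first_match(r"<h1>.*?</h1>", line)

# A heading containing a space is not matched by \w+ at all.
with_space = first_match(r"<h1>\w+</h1>", "<h1>Two words</h1>")

print(one_char, word_run, letters_only, lazy, with_space)
```

Note that only the lazy-dot pattern copes with headings that contain spaces or punctuation.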

        If you opt for one of the first two, they could be improved by using
        two plus signs (a possessive quantifier), e.g.

        <h1>\w++</h1> or <h1>[a-zA-Z]++</h1>

        because in both cases "<" is not among the characters matched by
        \w or [a-zA-Z], so there is no need to backtrack. The second plus sign
        says to consume all of the \w or [a-zA-Z] characters without
        backtracking and then require the next characters to be </h1>.

        If your tags could be split over multiple lines, that can be done for
        example with

        (?s)<h1>.+?</h1>

        The (?s) makes the dot match line-break characters as well. The
        difference between using a plus and an asterisk after the dot is that
        with plus, there must be at least one character between the tags.
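
As a rough illustration of the (?s) option and of plus versus asterisk, here is a sketch using Python's `re` module, which accepts the same inline flag. The sample text is invented for the example, and NoteTab's own engine is not being exercised here.

```python
import re

# Hypothetical text: one heading split over two lines, then an empty heading.
text = "<h1>Split\nheading</h1>\n<h1></h1>"

# Without (?s) the dot stops at the line break, and the empty heading
# has no character for .+? to consume, so nothing is found.
same_line_only = re.findall(r"<h1>.+?</h1>", text)

# With (?s) the dot also matches the line break, so the split heading is found.
across_lines = re.findall(r"(?s)<h1>.+?</h1>", text)

# With an asterisk instead of a plus, the empty heading is found as well.
allow_empty = re.findall(r"(?s)<h1>.*?</h1>", text)

print(same_line_only)
print(across_lines)
print(allow_empty)
```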

        Regards,
        Sheri
      • Adrien Verlee
        Message 3 of 3 , Oct 14, 2008
          On 14 Oct 2008, at 16:07, Sheri wrote:

          > <h1>.*?</h1>


          This is perfect! Thank you.
          --
          adrien