--- In firstname.lastname@example.org, "Adrien Verlee" <adrien.verlee@...> wrote:
> I want to find the words between <h1> and </h1> on an entire folder.
> When I tick reg exp and write in the find field of the Search Disk
> dialog: <h1>\w</h1> or <h1>[a-zA-Z]</h1>, or <h1>(.*)</h1>. Nothing
> happens, no files are found (and there are more than 500 files with
> <h1> tags). Thus, what am I doing wrong?
With the first two, you are looking for situations where there is only
one character between <h1> and </h1>. With the third, you are looking
for <h1> followed by any run of characters that are not line breaks,
followed by </h1>. Since the dot never matches a line break, this can
only match when both tags are on the same line, and because .* is
greedy it runs through to the last </h1> on that line.
<h1>\w+</h1>
would match the tags surrounding multiple word characters (which
include letters, digits, and underscores).
<h1>[a-zA-Z]+</h1>
would allow only alphabetic characters between the tags.
<h1>.+?</h1>
would match from an opening tag through the first closing tag (but
everything must be on the same line -- same for the others).
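
To try these out away from the editor, here is a minimal sketch using Python's re module as a stand-in. That is an assumption: the Search Disk dialog has its own regex engine, although these basic constructs behave the same way in most engines. The sample headings are made up, and the patterns are the one-character forms from the question next to one-or-more and lazy variants.

```python
import re

# Made-up sample: one heading per line, so every match stays on a single line.
sample = "\n".join([
    "<h1>A</h1>",
    "<h1>Intro</h1>",
    "<h1>Chapter_2</h1>",
    "<h1>Two words</h1>",
])

def matches(pattern):
    """Return every substring of the sample that the pattern matches."""
    return re.findall(pattern, sample)

one_char = matches(r"<h1>\w</h1>")        # exactly one word character
word_chars = matches(r"<h1>\w+</h1>")     # one or more letters, digits, underscores
letters = matches(r"<h1>[a-zA-Z]+</h1>")  # one or more ASCII letters only
lazy_any = matches(r"<h1>.+?</h1>")       # anything up to the first closing tag

for label, found in [
    ("one character", one_char),
    ("word characters", word_chars),
    ("letters only", letters),
    ("lazy dot", lazy_any),
]:
    print(label, "->", found)
```

Only the lazy dot pattern picks up the heading that contains a space, since neither \w nor [a-zA-Z] matches a space.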
If you opt for one of the first two, they could be improved by using
two plus signs e.g.
<h1>\w++</h1> or <h1>[a-zA-Z]++</h1>
because in both cases, since "<" is not in the matching characters of
\w or [a-zA-Z], there is no need to backtrack. The second plus sign
says to consume all of the \w or [a-zA-Z] characters without
backtracking and then require the next characters to be </h1>.
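
As a rough illustration that the double-plus form finds the same matches and only changes how the engine gets there, here is a sketch in Python (an assumption, not the editor's own engine). Python's re module accepts ++ only from version 3.11 on, so the sketch falls back to a lookahead-plus-backreference idiom on older versions, which likewise refuses to give characters back. The sample text is made up.

```python
import re
import sys

# Made-up sample text with one single-word heading and one two-word heading.
text = "<h1>Title</h1> and <h1>Two words</h1>"

def whole_matches(pattern):
    """Return the full text of each match, ignoring any capture groups."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# Ordinary greedy quantifier: may backtrack when </h1> does not follow.
greedy = whole_matches(r"<h1>\w+</h1>")

if sys.version_info >= (3, 11):
    # Possessive quantifier: consume all word characters, never backtrack.
    no_backtrack = whole_matches(r"<h1>\w++</h1>")
else:
    # Emulation for older Pythons: the lookahead captures the full run of
    # word characters, and the backreference then consumes it in one piece.
    no_backtrack = whole_matches(r"<h1>(?=(\w+))\1</h1>")

print(greedy)
print(no_backtrack)
```

Both lists contain the same single match; the heading with a space fails either way, the second form just fails without retrying shorter runs of \w.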
If your tags could be split over multiple lines, that can be handled
with
(?s)<h1>.+?</h1>
The (?s) makes the dot match line-break characters as well. The
difference between using a plus and an asterisk after the dot is that
with plus, there must be at least one character between the tags.
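
The effect of (?s), and of plus versus asterisk, can be checked with a small sketch. It uses Python's re module as a stand-in for the editor's engine, which is an assumption, and a made-up sample containing a heading split over two lines, a one-line heading, and an empty heading.

```python
import re

# Made-up sample: a heading split over two lines, a normal one, an empty one.
text = "<h1>Split\nheading</h1>\n<h1>Same line</h1>\n<h1></h1>"

# Without (?s) the dot stops at a line break, so the split heading is missed.
single_line = re.findall(r"<h1>.+?</h1>", text)

# With (?s) the dot also matches line breaks; plus needs at least one character.
dotall_plus = re.findall(r"(?s)<h1>.+?</h1>", text)

# With an asterisk the empty heading is accepted as well.
dotall_star = re.findall(r"(?s)<h1>.*?</h1>", text)

print(single_line)
print(dotall_plus)
print(dotall_star)
```

One thing to watch with the plus form: if an empty <h1></h1> is followed by another heading, the lazy dot has to take at least one character and will run on to the next closing tag, so the asterisk form is the safer choice when empty headings are possible.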