
Yet another regex question (fixed width look behind?)

  • Klaas Gadeyne
    Message 1 of 4 , Dec 13, 2007
      I'm looking for a perl regex that matches all strings except those
      beginning with a particular substring.

      Let's suppose the substring is "/debian", then the following strings won't match:

      /debian/dists/etch/Release
      /debian/db/packages.db
      /debian/

      All other strings, for instance

      /index.html
      /bodypages/index.php
      /index.php?param=value
      /index.php?param=debian

      will match

      As might be clear from the above example, the goal is to match certain
      parts of URLs that are fed into a web server, which means I cannot use
      Perl statements (such as negating the match) to keep the regex itself rather
      simple :-(
      Fighting my way through the manpages, I found some patterns that come
      close (such as the fixed-width negative look-behind (?<!pattern)), but
      nothing works as it should (or, more correctly, as I want it to :-). For example,

      if ("/debian/index.php" =~ /^(\/(?<!debian\/).*)/ ) {
          print "It, $1, matches\n";
      }
      else {
          print "It doesn't match\n";
      }

      yields

      It, /debian/index.php, matches

      And using a non-greedy variant (.*?), I get

      It, , matches

      Both of these expressions shouldn't match though :-(

      Any ideas? Thx!

      Klaas

      The context:
      An Apache configuration where all URLs that do not start with "/debian/..."
      should be redirected to port 443 (https).

      So I need a single line in my Apache configuration that looks like

      RedirectMatch ^/[^(debian/.*)] https://myserver/$1

      (I know the above regex is nonsense)
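      A quick Perl check shows why this pattern cannot work. Inside square
      brackets every character is a literal member of a set, so [^(debian/.*)]
      means "exactly one character that is not (, d, e, b, i, a, n, /, ., * or )",
      not "anything other than debian/". A minimal sketch, assuming a plain perl
      interpreter; /foo.html is a made-up path added only for contrast.

```perl
use strict;
use warnings;

# [^(debian/.*)] is a negated character class: it matches ONE character
# that is not any of ( d e b i a n / . * ). It does not negate a string.
my $attempt = qr{^/[^(debian/.*)]};

# '/foo.html' is a hypothetical path; the others come from the question.
for my $path ('/index.html', '/bodypages/index.php', '/debian/', '/foo.html') {
    printf "%-22s %s\n", $path, ($path =~ $attempt ? 'matches' : 'no match');
}
```

      Only /foo.html matches: /index.html and /bodypages/index.php are rejected
      too, simply because their second characters (i and b) are letters of "debian".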
    • merlyn@stonehenge.com
      Message 2 of 4 , Dec 13, 2007
        >>>>> "Klaas" == Klaas Gadeyne <klaas.gadeyne@...> writes:

        Klaas> Let's suppose the substring is "/debian", then the following strings won't match:

        Klaas> /debian/dists/etch/Release
        Klaas> /debian/db/packages.db
        Klaas> /debian/

        Klaas> All other strings, for instance

        Klaas> /index.html
        Klaas> /bodypages/index.php
        Klaas> /index.php?param=value
        Klaas> /index.php?param=debian

        Klaas> will match

        Klaas> As might be clear from the above example, the goal is to match certain
        Klaas> parts of URLs that are fed into a web server, which means I cannot use
        Klaas> Perl statements (such as negating the match) to keep the regex itself rather
        Klaas> simple :-(

        Not sure what you mean by that. Won't

        my $your_regex = qr{^/debian/};
        if (/$your_regex/) {
            # reject the line
        } else {
            # accept the line
        }

        work?
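        A runnable version of this suggestion, with the sample paths from the
        question hard-coded for illustration (the @paths, @accepted and @rejected
        arrays are additions for the demo). The regex stays positive and simple,
        and Perl's if/else supplies the negation, which, as Klaas explains, is not
        available when the whole rule has to fit in a single Apache directive.

```perl
use strict;
use warnings;

# Reject anything that begins with "/debian/"; accept everything else.
my $your_regex = qr{^/debian/};

# Sample paths taken from the original question.
my @paths = ('/debian/dists/etch/Release', '/debian/db/packages.db', '/debian/',
             '/index.html', '/bodypages/index.php',
             '/index.php?param=value', '/index.php?param=debian');

my (@accepted, @rejected);
for (@paths) {
    if (/$your_regex/) {
        push @rejected, $_;    # reject the line
    } else {
        push @accepted, $_;    # accept the line
    }
}

print "accepted: $_\n" for @accepted;
print "rejected: $_\n" for @rejected;
```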

        --
        Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
        <merlyn@...> <URL:http://www.stonehenge.com/merlyn/>
        Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
        See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
      • Jeff Pinyan
        Message 3 of 4 , Dec 13, 2007
          On Dec 13, 2007 11:56 AM, Klaas Gadeyne <klaas.gadeyne@...> wrote:
          >
          > I'm looking for a perl regex that matches all strings except those
          > beginning with a particular substring.
          >
          > Let's suppose the substring is "/debian", then the following strings won't match:
          >
          > /debian/dists/etch/Release
          > /debian/db/packages.db
          > /debian/
          >
          > All other strings, for instance
          >
          > /index.html
          > /bodypages/index.php
          > /index.php?param=value
          > /index.php?param=debian
          >
          > will match

          Something like this will work:

          if ($string =~ m{^(?!/debian)}) {
              print "[$string] does not begin with '/debian'\n";
          }

          > if ("/debian/index.php" =~ /^(\/(?<!debian\/).*)/ ) {
          > print "It, $1, matches\n";
          > }

          The problem is that your regex says: "match beginning-of-line, then a
          /, then look behind us to make sure we're not preceded by 'debian/',
          ..." That check can never fail, since the only character matched so
          far is the / at the beginning of the string, so the look-behind never
          sees 'debian/' and the whole regex always matches. I think my
          solution above will work fine for you.
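          The difference is easy to see by running both patterns over the sample
          paths from the question. A minimal sketch: the original look-behind
          version and the suggested negative look-ahead are applied to every path
          and the result of each is printed side by side.

```perl
use strict;
use warnings;

# Sample paths from the original question.
my @reject = ('/debian/dists/etch/Release', '/debian/db/packages.db', '/debian/');
my @accept = ('/index.html', '/bodypages/index.php',
              '/index.php?param=value', '/index.php?param=debian');

# Original attempt: the look-behind is tested right after the leading "/",
# where the only text behind the cursor is "/", so it never sees "debian/"
# and the assertion always succeeds.
my $lookbehind = qr{^(/(?<!debian/).*)};

# Suggested fix: a negative look-ahead at the start of the string.
my $lookahead = qr{^(?!/debian)};

for my $path (@reject, @accept) {
    printf "%-28s look-behind: %-4s look-ahead: %s\n",
        $path,
        ($path =~ $lookbehind ? 'yes' : 'no'),
        ($path =~ $lookahead  ? 'yes' : 'no');
}
```

          Every row reports "look-behind: yes"; the look-ahead says "no" only for
          the three /debian paths.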

          > So I need a single line in my apache configuration that looks like
          >
          > RedirectMatch ^/[^(debian/.*)] https://myserver/$1

          I don't know if Apache can handle (?!...), but if it does, then my code
          should still be acceptable:

          RedirectMatch ^(?!/debian/)(.*) https://myserver/$1
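          The pattern can be tried out in Perl before it goes into the server
          configuration. A minimal sketch, assuming the directive's regex engine
          treats this pattern the way Perl does (the thread leaves that open).
          Because (.*) also captures the leading "/", putting it after
          "https://myserver/" would produce a doubled slash, so the sketch appends
          the capture directly after the host name.

```perl
use strict;
use warnings;

# Perl rendition of the RedirectMatch pattern above: anything not starting
# with "/debian/" matches, and (.*) captures the whole path.
my $redirect = qr{^(?!/debian/)(.*)};

for my $path ('/index.html', '/bodypages/index.php', '/debian/db/packages.db') {
    if ($path =~ $redirect) {
        # $1 already begins with "/", so appending it directly after the
        # host avoids "https://myserver//index.html".
        print "$path -> https://myserver$1\n";
    }
    else {
        print "$path -> not redirected\n";
    }
}
```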


          --
          [Mary said,] "Do whatever he tells you." ~ John 2:5
        • klaas.gadeyne@fmtc.be
          Message 4 of 4 , Dec 13, 2007
            > On Dec 13, 2007 11:56 AM, Klaas Gadeyne <klaas.gadeyne@...> wrote:
            [...]
            > I don't know if Apache can handle (?!...), but if it does, then my code
            > should still be acceptable.
            >
            > RedirectMatch ^(?!/debian/)(.*) https://myserver/$1

            Well, AFAIK mod_alias (which provides RedirectMatch) doesn't, but
            mod_rewrite does.

            I'm not sure if this is the best possible option, but

            RewriteEngine On
            RewriteRule !^/debian.*$ https://www.myserver.org%{REQUEST_URI} [R]

            seems to do the trick
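            The leading ! in the RewriteRule pattern inverts the test, which is
            the job Perl's !~ operator does. The mod_rewrite documentation also
            notes that a negated pattern leaves no backreferences, which is why the
            substitution uses %{REQUEST_URI} instead of $1. A minimal Perl analogue,
            where redirect_target is a hypothetical helper written only for
            illustration:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring:
#   RewriteRule !^/debian.*$ https://www.myserver.org%{REQUEST_URI} [R]
# Returns the redirect target, or undef when the rule does not apply.
sub redirect_target {
    my ($request_uri) = @_;

    # "!~" plays the role of mod_rewrite's leading "!". The trailing ".*$"
    # of the original pattern adds nothing, so "^/debian" is enough here.
    if ($request_uri !~ m{^/debian}) {
        return "https://www.myserver.org$request_uri";
    }
    return;    # undef: /debian... requests are served without a redirect
}

for my $uri ('/index.html', '/index.php', '/debian/dists/etch/Release') {
    my $target = redirect_target($uri);
    print "$uri -> ", (defined $target ? $target : 'not redirected'), "\n";
}
```

            Because the pattern has no trailing slash, it also leaves alone any
            path that merely starts with the letters /debian (for example a
            hypothetical /debian-cd/); ^/debian/ would be stricter, but would then
            redirect a bare /debian.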

            Thx for the support!

            Klaas