
Re: Accomodate for (J|j)ava(s|S)cript in regex

  • Joshua Goldman
    Message 1 of 4 , Jan 17, 2005
      Eugeny got what I was thinking of

      > do it like this
      > [Jj]ava[Ss]cript

      Square brackets mean "one of the set of characters" (or if it starts
      with ^ "any character but one of the set"). [J|j] means "either J or j
      or |", which would probably work in 99% of the cases since the string
      "|ava|cript:" is not going to occur too often.
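
A small sketch of the difference, using Python's re module as a stand-in (Xenu's own regex engine may differ in details, and the sample strings are made up for illustration). Inside square brackets the pipe is just another literal character:

```python
import re

# Pipe inside brackets: matches J, j, or a literal | (same for the S position).
bracket_with_pipe = re.compile(r"[J|j]ava[S|s]cript:")
# Plain character sets: matches only the intended letters.
bracket_plain = re.compile(r"[Jj]ava[Ss]cript:")

# Hypothetical sample strings; the last one is the unlikely false positive.
samples = ["javascript:", "JavaScript:", "Javascript:", "javaScript:", "|ava|cript:"]

pipe_hits = [s for s in samples if bracket_with_pipe.fullmatch(s)]
plain_hits = [s for s in samples if bracket_plain.fullmatch(s)]

print(pipe_hits)   # includes "|ava|cript:"
print(plain_hits)  # only the four real spellings
```

Both patterns accept the four real spellings; only the version with the pipe also accepts the odd "|ava|cript:" string.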

      I've been thinking about trying to catch javascript:open("foo.html").
      The problem is that anything that catches this will also catch any
      other JavaScript call that has a string as its first parameter. In my
      use of Xenu, I didn't have this problem because I knew exactly what
      the JavaScript functions were.
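
To illustrate the trade-off, here is a rough sketch in Python (not Xenu itself). It assumes the doubly parenthesized pattern from the original question, plus the variant suggested further down the thread with a ? after the scheme group. The sample strings and the helper name links are hypothetical:

```python
import re

# Scheme prefix required: only /, ftp://, http:// or https:// targets match.
STRICT = re.compile(
    r'''[Jj]ava[sS]cript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]'''
)
# Scheme prefix optional: any quoted first argument matches.
LOOSE = re.compile(
    r'''[Jj]ava[sS]cript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)?[^'"]+)['"]'''
)

# Hypothetical inputs: a local file, a non-URL string, and a full URL.
samples = [
    'javascript:open("foo.html")',
    "javascript:show('benchmarks')",
    'javascript:open("http://example.com/")',
]

def links(pattern, lines):
    """Return group 1 (the candidate link) for every line the pattern matches."""
    return [m.group(1) for m in map(pattern.search, lines) if m]

strict_hits = links(STRICT, samples)
loose_hits = links(LOOSE, samples)
print(strict_hits)
print(loose_hits)
```

The loose pattern picks up foo.html, but it also reports benchmarks as a link, which is exactly the false positive described above.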

      Thanks for pointing out the link to the regexp page.
      http://www.regular-expressions.info/alternation.html

      --- In xenu-usergroup@yahoogroups.com, "frank visser" <f.visser3@c...>
      wrote:
      >
      > hi josh,
      >
      > wish you were right, but no:
      > http://www.regular-expressions.info/alternation.html
      >
      > never heard of pipe symbol used with [...].
      >
      > will try out your suggestion though.
      >
      > the reason i wanted to exclude ('foo.htm' as a match is that i wanted
      > to avoid ('benchmarks', etc., but you are right, i might include
      > items that refer to a URL.
      >
      > will dig into that as well and let u know.
      >
      > frank
      >
      > --- In xenu-usergroup@yahoogroups.com, "Josh Goldman"
      > <Josh-Goldman@r...> wrote:
      > > Shouldn't that be square brackets, not parentheses? That is:
      > >

      fixed thanks to Eugeny
      > > [Jj]ava[sS]cript: *[_a-zA-Z0-9]+ *\( *['"](/|ftp://|https?://)[^'"]+)['"]
      > >
      > > Unquoted parentheses ( ) mark the section of the string that you will be
      > > referencing with \1 or \2, whereas square brackets define a set of
      > > characters, any one of which may match.
      > >
      > > In the correct string, the first unquoted ( should come after the initial
      > > ['"]. If you have an unquoted ( ) before it, in this case "(J|j)", then Xenu
      > > will try to find the link using "J" rather than the actual http string, since
      > > it is probably taking the result of the regular expression and getting the
      > > value of \1.
      > >
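The group-numbering effect can be sketched in Python (an illustration only; that Xenu reads \1 is the poster's guess, and the sample HTML and function name popup are invented). Both patterns below use the doubly parenthesized link group from the original question:

```python
import re

# Hypothetical page fragment containing a JavaScript link.
text = """<a href="JavaScript:popup('http://example.com/page.html')">"""

# Parentheses around J|j and s|S create capture groups 1 and 2.
with_parens = re.compile(
    r'''(J|j)ava(s|S)cript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]'''
)
# Character sets create no groups, so the link stays in group 1.
with_brackets = re.compile(
    r'''[Jj]ava[sS]cript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]'''
)

m_parens = with_parens.search(text)
m_brackets = with_brackets.search(text)

print(m_parens.group(1))    # the letter matched by (J|j)
print(m_parens.group(3))    # the link has been pushed to group 3
print(m_brackets.group(1))  # the link is group 1 again
```

A tool that always takes \1 as the link would therefore see just "J" with the parenthesized version.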
      > > You also seem to have an extra parenthesis before the ftp. ['"]((/|ftp
      > >
      > > It's been a while since I've worked with regexp so it is possible that I am
      > > wrong, but here's my explanation of the regexp
      > >
      > > [Jj]ava[sS]cript: *[_a-zA-Z0-9]+ *\( *['"](/|ftp://|https?://)[^'"]+)['"]
      > >
      > > match a string
      > > that starts with either J or j
      > > followed by ava
      > > then either s or S
      > > followed by cript:
      > > then 0 or more space characters
      > > then a function name consisting of 1 or more characters from the set _, a-z, A-Z, and 0-9
      > > then 0 or more space characters
      > > then the literal ( left parenthesis
      > > then 0 or more space characters
      > > then either ' or "
      > > the following string will be returned as \1
      > > either / or ftp:// or https:// or http:// (s? means 0 or 1 s)
      > > followed by one or more characters that can be anything except ' or "
      > > end of \1 string
      > > followed by ' or "
      > >
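The walkthrough above can be written out as a commented pattern. This is a sketch using Python's re.VERBOSE mode, not Xenu syntax. As quoted in the thread, the pattern has a closing parenthesis with no matching opener, so the sketch assumes the doubly parenthesized form from the original question. The name LINK_RE and the sample strings are made up:

```python
import re

LINK_RE = re.compile(r'''
    [Jj]ava[sS]cript:        # "javascript:" with either case for J and S
    [ ]*                     # 0 or more spaces
    [_a-zA-Z0-9]+            # function name: 1 or more of _, a-z, A-Z, 0-9
    [ ]*                     # 0 or more spaces
    \(                       # the literal left parenthesis
    [ ]*                     # 0 or more spaces
    ['"]                     # opening quote, single or double
    (                        # start of group 1, the link
      (/|ftp://|https?://)   # group 2: /, ftp://, http:// or https://
      [^'"]+                 # 1 or more characters other than a quote
    )                        # end of group 1
    ['"]                     # closing quote, single or double
''', re.VERBOSE)

# Hypothetical inputs: one absolute URL and one local file reference.
hit = LINK_RE.search("""<a href="javascript:openWin('http://example.com/a.html')">""")
miss = LINK_RE.search('Javascript:Open("foo.html")')

print(hit.group(1))
print(miss)
```

The second sample returns no match, which is the local-file limitation noted just below.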
      > > This regexp won't catch local file references, such as
      > > Javascript:Open("foo.html")
      > > You could possibly fix that by putting a ? after (/|ftp://|https?://)
      > >
      > > Message: 2
      > > Date: Sun, 16 Jan 2005 10:11:26 -0000
      > > From: "frank visser" <f.visser3@c...>
      > > Subject: Accomodate for (J|j)ava(s|S)cript in regex
      > >
      > >
      > > hi all,
      > >
      > > i am trying to upgrade the regex
      > > javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]
      > >
      > > for all cases of "javascript", "Javascript", "javaScript"
      > > and "JavaScript".
      > >
      > > as follows:
      > >
      > > (J|j)ava(s|S)cript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]
      > >
      > > but this causes "broken links" in Xenu to show up of the type: