Regex non-match issue

  • frank visser
    Message 1 of 2, Jan 2, 2005
      hi all,

      on this korean web page:
      http://www.intel.com/cd/products/services/apac/kor/server/85975.htm

      the following javascript link is skipped by xenu 1.2beta:
      javascript:openWin('http://www.intel.com/kr/hangul/business/bss/products/desktop/p4p/benefits/ht/flash/illustration.htm','760','570');

      using the following time-honoured regex:

      javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]

      when i run the regex through some online regex tester, it does match.
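
For reference, a minimal Python sketch of what the pattern above captures from the link as typed in this message. It assumes the line breaks inside the link are mail wrapping and joins them onto one line. Python's `re` module stands in for the online tester here and is not necessarily the regex engine Xenu uses.

```python
import re

# The pattern from the message, unchanged.
PATTERN = r"""javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]"""

# The link as typed in the message (lowercase "javascript:"),
# with the wrapped pieces joined (assumption: the breaks are mail wrapping).
link = (
    "javascript:openWin('http://www.intel.com/kr/hangul/business/bss/"
    "products/desktop/p4p/benefits/ht/flash/illustration.htm','760','570');"
)

match = re.search(PATTERN, link)
# Group 1 is the URL between the first pair of quotes.
captured_url = match.group(1) if match else None
print(captured_url)
```

Note that the pattern only allows spaces, not a line break, between the function name and the opening parenthesis.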

      why would this URL be the exception? the regex i use with xenu
      normally captures all URLs starting with http:// !

      i'm sure it has nothing to do with the page being in korean... ;-)

      if it helps, this same URL also does not match when it occurs on
      other web pages, so there must be something wrong with it, but i
      can't spot the error.

      who can help?

      frank
    • Tilman Hausherr
      Message 2 of 2, Jan 2, 2005
        On Sun, 02 Jan 2005 13:18:50 -0000, frank visser wrote:

        >
        >
        >hi all,
        >
        >on this korean web page:
        >http://www.intel.com/cd/products/services/apac/kor/server/85975.htm
        >
        >the following javascript link is skipped by xenu 1.2beta:
        >javascript:openWin

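        No, it has a big "J"! On the page the link starts with "Javascript:", and your regex only matches a lowercase "javascript:".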

        Tilman
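
The case issue can be reproduced outside Xenu with a short Python sketch. The link below is a shortened, made-up stand-in for the page's markup (only the capital "J" matters), and `RELAXED` is a hypothetical variant of the pattern, not something from the thread. Whether Xenu's regex engine offers a case-insensitive switch is not stated here, so the character class `[Jj]` is shown as the portable option.

```python
import re

# The pattern from the thread, unchanged.
ORIGINAL = r"""javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]"""

# Hypothetical variant: accept either case for the first letter only.
RELAXED = "[Jj]" + ORIGINAL[1:]

# Made-up link for illustration; it starts with a capital "J".
link_on_page = "Javascript:openWin('http://www.example.com/illustration.htm','760','570');"

original_hit = re.search(ORIGINAL, link_on_page)               # case-sensitive: no match
relaxed_hit = re.search(RELAXED, link_on_page)                 # matches "J" or "j"
ignorecase_hit = re.search(ORIGINAL, link_on_page, re.IGNORECASE)  # Python-specific flag

print(original_hit is None)
print(relaxed_hit.group(1))
print(ignorecase_hit.group(1))
```

The `[Jj]` form only covers the first letter. A link written as "JAVASCRIPT:" or "JavaScript:" would need a wider character class or a case-insensitive match.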


        >('http://www.intel.com/kr/hangul/business/bss/products/desktop/p4p/benefits/ht/flash/illustration.htm','760','570');
        >
        >using the following time-honoured regex:
        >
        >javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]
        >
        >when i run the regex through some online regex tester, it does match.
        >
        >why would this URL be the exception? the regex i use with xenu
        >normally captures all URLs starting with http:// !
        >
        >i'm sure it has nothing to do with the page being in korean... ;-)
        >
        >if it helps, this same URL also does not match when it occurs on
        >other web pages, so there must be something wrong with it, but i
        >can't spot the error.
        >
        >who can help?
        >
        >frank