FW: complex javascript link

  • Sattler,Eugeny,SAMARA,B&C
    Message 1 of 3 , Nov 22, 2004
      23.11.04 2:07, Frank Visser <f.visser3 (a)chello.nl> wrote

      FV> Would you know a way to rewrite the regex i am currently using:
      FV>
      FV> Javascript=javascript: *[_a-zA-Z0-9]+ *\( *['"]([^'"]+)['"]
      FV>
      FV> So that it will not match the javascript function below:
      FV>
      FV> javascript:BenchMarks('notebook','index','?iid=ipp_mobiletech+tools_compare&');
      FV>
      FV> xenu now parses it incorrectly, which leads to many "broken" links.
      FV> I'd rather have xenu to skip this type of javascript.
      FV> So I want the regex to match only URL like strings, starting with h|f|/
      FV> for http, ftp and /relative_links.
      Hi Frank!
      Still haven't read "Regular Expressions syntax" part of PowerGREP manual?
      :-))

      The task you mentioned is so-o-o easy!

      I suggest this:

      Javascript=
      javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]

      Explanation:
      (/|ftp://|https?://)
      matches either "/" or "ftp://" or "http://" or "https://"
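A quick way to check the suggested pattern before putting it into Xenu is to run it against sample links. The sketch below uses Python's `re` module as a stand-in for Xenu's own regex library, which is an assumption (the two may differ in details); the `openWin` links and the `first_url_argument` helper are made up for illustration.

```python
import re

# Pattern copied from the message above. Python's `re` is only a stand-in
# for the regex library inside Xenu, so treat the results as indicative.
URL_ARG = re.compile(
    r"""javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]"""
)


def first_url_argument(link):
    """Return the URL-like first argument of a javascript: link, or None."""
    match = URL_ARG.search(link)
    return match.group(1) if match else None


# Frank's problem link: the first argument is 'notebook', so nothing matches.
skipped = first_url_argument(
    "javascript:BenchMarks('notebook','index','?iid=ipp_mobiletech+tools_compare&');"
)

# Hypothetical links whose first argument starts with "/" or "https://".
relative = first_url_argument("javascript:openWin('/products/index.html');")
absolute = first_url_argument("javascript:openWin('https://example.com/a?b=1');")

print(skipped, relative, absolute)
```

With these inputs the BenchMarks link yields no capture, while the two URL-like arguments are captured whole.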

      BTW, that was already part of my initial (not simplified) regex,
      but, as you remember, we decided to get rid of it because we wanted
      our regular expression to be as simple as possible, so as to be sure
      that the regex processing library of XenuLS (not entirely Perl
      compatible) understands it.

      But I would go further: I suggest also matching URL chunks
      starting with "?param_name=param_value", as in:
      javascript:OpenWin('?iid=ipp_mobiletech+tools_compare&');

      So I suggest this
      javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://|\?[_a-zA-Z0-9]+=)[^'"]+?)['"]

      Explanation: because of the "\?[_a-zA-Z0-9]+=" alternative in the regex,
      we also pass to the URL checker a javascript parameter that starts with
      a question mark, followed by a word consisting of letters and/or digits
      and/or underscores, followed by an equals sign. This
      "?param_name=param_value" URL chunk will be concatenated with the base
      href and then the whole thing will be checked.
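The extended pattern and the "concatenate with base href" step can be tried out the same way. This is only a sketch: Python's `re` and `urllib.parse.urljoin` stand in for whatever Xenu does internally, and the `BASE_HREF` value and the `resolve_link` helper are hypothetical.

```python
import re
from urllib.parse import urljoin

# Extended pattern from the message above, with the "?name=" alternative.
URL_OR_QUERY_ARG = re.compile(
    r"""javascript: *[_a-zA-Z0-9]+ *\( *['"]"""
    r"""((/|ftp://|https?://|\?[_a-zA-Z0-9]+=)[^'"]+?)['"]"""
)

# Hypothetical base href of the page that contains the link.
BASE_HREF = "http://www.example.com/compare/index.html"


def resolve_link(link, base_href=BASE_HREF):
    """Return the absolute URL to check, or None if the link is skipped."""
    match = URL_OR_QUERY_ARG.search(link)
    if match is None:
        return None
    # urljoin resolves a query-only reference against the base page.
    return urljoin(base_href, match.group(1))


resolved = resolve_link("javascript:OpenWin('?iid=ipp_mobiletech+tools_compare&');")
still_skipped = resolve_link(
    "javascript:BenchMarks('notebook','index','?iid=ipp_mobiletech+tools_compare&');"
)

print(resolved)
print(still_skipped)
```

The OpenWin link resolves to the base page plus the query string, and the BenchMarks link is still skipped because its first argument is 'notebook'.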

      --
      Best regards,
      Eugeny mailto:accmailer%20AT%20yandex.ru




    • Frank Visser
      Message 2 of 3 , Nov 22, 2004
        Eugeny,

        Super! I will try it out later today!

        Not sure if I want the parameters though, it's for statistical purposes only.



        frank



        _____

        From: Sattler,Eugeny,SAMARA,B&C [mailto:Eugeny.Sattler@...]
        Sent: Tuesday, 23 November 2004 7:51
        To: xenu-usergroup@yahoogroups.com
        Subject: [xenu-usergroup] FW: complex javascript link



        23.11.04 2:07, Frank Visser <f.visser3 (a)chello.nl> wrote

        FV> Would you know a way to rewrite the regex i am currently using:
        FV>
        FV> Javascript=javascript: *[_a-zA-Z0-9]+ *\( *['"]([^'"]+)['"]
        FV>
        FV> So that it will not match the javascript function below:
        FV>
        FV> javascript:BenchMarks('notebook','index','?iid=ipp_mobiletech+tools_compare&');
        FV>
        FV> xenu now parses it incorrectly, which leads to many "broken" links.
        FV> I'd rather have xenu to skip this type of javascript.
        FV> So I want the regex to match only URL like strings, starting with h|f|/
        FV> for http, ftp and /relative_links.
        Hi Frank!
        Still haven't read "Regular Expressions syntax" part of PowerGREP manual?
        :-))

        The task you mentioned is so-o-o easy!

        I suggest this:

        Javascript=
        javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'
        <ftp://|https?:/)[^'> "]+)['"]

        Explanation:
        (/|ftp://|https?://) <ftp://|https?:/)>
        matches either "/" or "ftp://" or "http://" or "https://"

        BTW, that already was a part of my initial (not simplified) regex,
        but, as you remember, we decided to get rid of this because we wanted
        our regular expression to be as simple as possible so as to be sure
        regex processing library (not entirely perl compatible) of XenuLS
        understands it.

        But I would go further: I suggest also matching URL chunks
        starting with "?param_name=param_value", as in:
        javascript:OpenWin('?iid=ipp_mobiletech+tools_compare&');

        So I suggest this
        javascript: *[_a-zA-Z0-9]+ *\(
        *['"]((/|ftp://|https?://|\?[_a-zA-Z0-9]+=)[^'
        <ftp://|https?:/|/?[_a-zA-Z0-9]+=)[^'> "]+?)['"]

        Explanation: because of the "\?[_a-zA-Z0-9]+=" alternative in the regex,
        we also pass to the URL checker a javascript parameter that starts with
        a question mark, followed by a word consisting of letters and/or digits
        and/or underscores, followed by an equals sign. This
        "?param_name=param_value" URL chunk will be concatenated with the base
        href and then the whole thing will be checked.

        --
        Best regards,
        Eugeny mailto:accmailer%20AT%20yandex.ru




      • frank visser
        Message 3 of 3 , Nov 23, 2004
          hi eugeny,

          your suggestion did not work, xenu now skipped ALL javascript, but
          when i deleted the part between <...>, it did work:

          Javascript=javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]

          probably < and > are not recognized by xenu?

          just curious:

          - wasn't there a / missing after the second https:/ ?
          - what did [^' between the <...> section do?

          thanks again for your help,

          frank
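One possible explanation for the behaviour Frank describes, not confirmed anywhere in the thread: the `<ftp://|https?:/)[^'>` fragments look like text a mail client added when it auto-linked part of the regex, and if the wrapped lines are pasted back together as-is, the fragment lands inside the `[^'"]` character class. The sketch below assumes that joining and uses Python's `re` as a stand-in for Xenu's regex library; the sample links are made up.

```python
import re

# Regex as Eugeny wrote it.
CLEAN = r"""javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]"""

# The two wrapped lines from the quoted mail, joined without a separator.
# How they were actually pasted into Xenu is unknown; this is an assumption.
MANGLED = (
    r"""javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)"""
    r"""[^'<ftp://|https?:/)[^'> "]+)['"]"""
)

# Inside [^...] the inserted text is just a list of extra excluded characters:
# < f t p : / | h s ? ) [ ^ > and space, on top of ' and ".
typical_link = "javascript:openWin('/products/index.html');"
clean_hit = re.search(CLEAN, typical_link) is not None
mangled_hit = re.search(MANGLED, typical_link) is not None

# A URL that happens to avoid every excluded character would still match.
short_hit = re.search(MANGLED, "javascript:x('/a');") is not None

print(clean_hit, mangled_hit, short_hit)
```

Under these assumptions the mangled pattern is still a valid regex, but it rejects almost any realistic URL, since the rest of the argument may no longer contain "/", ":", or the letters f, t, p, h, s. That would be consistent with Xenu skipping nearly all javascript links until the `<...>` part was deleted.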


