Re: HA: [xenu-usergroup] Accommodate for (J|j)ava(s|S)cript in regex

  • frank visser
    Message 1 of 12 , Jan 17, 2005
      hi evgeny,

      thanks for looking into this.

      i have tried all possible variations, but only lower
      case "javascript" seems to give a match.

      when i try this link:

      <a href="Javascript:openWin('http://www.intel.com/business/bss/industry/government/cs_demo/govt_flash.htm?iid=ibe_home+govt_static&','759','532')">TESTLINK</a>

      or

      <a href="javaScript:openWin('http://www.intel.com/business/bss/industry/government/cs_demo/govt_flash.htm?iid=ibe_home+govt_static&','759','532')">TESTLINK</a>

      or

      <a href="JavaScript:openWin('http://www.intel.com/business/bss/industry/government/cs_demo/govt_flash.htm?iid=ibe_home+govt_static&','759','532')">TESTLINK</a>


      and change the regex accordingly:

      Javascript=[Jj]ava[Ss]cript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]

      i don't get a match.

      is there some rule saying that "javascript:" should be lower case?

      does not make sense.

      when i enter the URL in a regex tester such as found on:
      http://www.forta.com/books/0672325667/

      i can vary S/s and J/j and case sensitive/case insensitive, and it
      all works as expected.

      tilman, is Xenu able to handle this aspect of case (in)sensitivity?

      it does seem so, for the function names matched with [_a-zA-Z0-9]+
      sometimes do have capitals in them, and are matched.


      frank



      --- In xenu-usergroup@yahoogroups.com, Eugeny.Sattler@R... wrote:
      > > i am trying to upgrade the regex
      > > javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]
      > > for all cases of "javascript", "Javascript", "javaScript"
      > > and "JavaScript".
      > > as follows:
      > > (J|j)ava(s|S)cript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)
      > > [^'"]+)['"]
      > > but this causes "broken links" in Xenu to show up of the type:
      >
      > do it like this
      > [Jj]ava[Ss]cript
      >
      > Eugeny
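
One way to check the pattern outside Xenu is a small standalone test. The sketch below uses standard C++ `std::regex`, not Xenu's own regex parser, so it only shows that the `[Jj]ava[Ss]cript` pattern by itself accepts the mixed-case variants and captures the URL in group 1. The helper name `extractUrl` is made up for illustration.

```cpp
#include <regex>
#include <string>

// The [Jj]ava[Ss]cript pattern from the message above, joined onto one line.
// Capture group 1 is the URL passed as the first quoted argument.
// Assumption: std::regex (ECMAScript syntax) stands in for Xenu's parser.
static const std::regex kJavascriptLink(
    R"re([Jj]ava[Ss]cript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"])re");

// Hypothetical helper: returns the captured URL, or an empty string
// when the pattern does not match the href value.
std::string extractUrl(const std::string& href) {
    std::smatch m;
    if (std::regex_search(href, m, kJavascriptLink)) {
        return m[1].str();
    }
    return std::string();
}
```

Calling `extractUrl` with the href value of each test link (the part between the double quotes) should return the intel.com URL for all three spellings, which points at something other than the pattern itself when Xenu reports no match.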
    • frank visser
      Message 2 of 12 , Jan 19, 2005
        hi tilman,

        does this line of code in Xenu:

        if (pcsLink->Left(11) == "javascript:" &&
        !m_csJavascript.IsEmpty())

        cause that only "javascript:" cases with lower case "j" etc. are matched?

        would it help to make the match case insensitive, so also
        "Javascript", "JavaScript", "JAVASCRIPT:" are matched.

        on the web, URLs are case insensitive, so is HTML.

        frank
      • Tilman Hausherr
        Message 3 of 12 , Jan 20, 2005
          On Thu, 20 Jan 2005 07:31:50 -0000, frank visser wrote:

          >hi tilman,
          >
          >does this line of code in Xenu:
          >
          >if (pcsLink->Left(11) == "javascript:" &&
          >!m_csJavascript.IsEmpty())
          >
          >cause that only "javascript:" cases with lower case "j" etc. are matched?

          Yes.

          >would it help to make the match case insensitive, so also
          >"Javascript", "JavaScript", "JAVASCRIPT:" are matched.
          >
          >on the web, URLs are case insensitive, so is HTML.

          URLs are not. Maybe the beginning. I also thought that this line would
          make trouble, but you said that your new regex worked fine...

          Tilman
        • Eugeny.Sattler@RU.NESTLE.com
          Message 4 of 12 , Jan 20, 2005
            >would it help to make the match case insensitive, so also
            >"Javascript", "JavaScript", "JAVASCRIPT:" are matched.


            putting [Jj][Aa][Vv][Aa][Ss][Cc][Rr][Ii][Pp][Tt]
            into the regex would catch all possible permutations although if Tilman
            made his (borrowed from codeguru) regexp parser case insensitive,
            the regex could be more elegant.
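
To illustrate the "more elegant" option: with a regex engine that has a case-insensitive switch, the lower-case pattern can stay as it is. The sketch below uses standard C++ `std::regex` with the `icase` flag. It is not Xenu's regex parser, and whether that parser offers such a switch is not stated in this thread. The function name `matchesIgnoringCase` is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when href matches the lower-case pattern
// with case-insensitive matching switched on for the whole pattern.
bool matchesIgnoringCase(const std::string& href) {
    static const std::regex pattern(
        R"re(javascript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"])re",
        std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(href, pattern);
}
```

Note that the flag applies to the entire pattern, so `HTTP://` and `FTP://` would be accepted as well, whereas the bracket form only relaxes the letters that are spelled out.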
          • frank visser
            Message 5 of 12 , Jan 20, 2005
              hi tilman, evgeny,

              obviously, the javascript regex addition to xenu is in beta, so i am
              trying to find cases where it doesn't work.

              but since i am not a programmer, this goes slow with me.

              i don't care about elegance as long as it works, but i could not get
              the [Jj] etc. solution to work at all - that was my problem.

              am i mistaken? will try again.

              frank



              --- In xenu-usergroup@yahoogroups.com, Eugeny.Sattler@R... wrote:
              > >would it help to make the match case insensitive, so also
              > >"Javascript", "JavaScript", "JAVASCRIPT:" are matched.
              >
              >
              > putting [Jj][Aa][Vv][Aa][Ss][Cc][Rr][Ii][Pp][Tt]
              > into the regex would catch all possible permutations although if
              > Tilman made his (borrowed from codeguru) regexp parser case insensitive,
              > the regex could be more elegant.
            • frank visser
              Message 6 of 12 , Jan 20, 2005
                hi tilman,

                my programmer colleague says it is possible to "ignore" the "case" of
                the string "javascript" in your code as follows:


                // variation 1: C runtime stricmp on the first 11 characters
                if ((0 == stricmp(pcsLink->Left(11), "javascript:")) &&
                    !m_csJavascript.IsEmpty()) {
                    Regexp reXenu(m_csJavascript);
                    if (reXenu.Match(*pcsLink)) {
                        *pcsLink = reXenu[1];
                    }
                }


                // variation 2: CString::CompareNoCase on the same prefix
                if ((0 == pcsLink->Left(11).CompareNoCase("javascript:")) &&
                    !m_csJavascript.IsEmpty()) {
                    Regexp reXenu(m_csJavascript);
                    if (reXenu.Match(*pcsLink)) {
                        *pcsLink = reXenu[1];
                    }
                }


                two variations, written from the top of his head, so don't count on
                it, but perhaps you get the idea.

                does this make sense?


                frank

                --- In xenu-usergroup@yahoogroups.com, Tilman Hausherr <tilman@s...>
                wrote:
                > On Thu, 20 Jan 2005 07:31:50 -0000, frank visser wrote:
                >
                > >hi tilman,
                > >
                > >does this line of code in Xenu:
                > >
                > >if (pcsLink->Left(11) == "javascript:" &&
                > >!m_csJavascript.IsEmpty())
                > >
                > >cause that only "javascript:" cases with lower case "j" etc. are
                > >matched?
                >
                > Yes.
                >
                > >would it help to make the match case insensitive, so also
                > >"Javascript", "JavaScript", "JAVASCRIPT:" are matched.
                > >
                > >on the web, URLs are case insensitive, so is HTML.
                >
                > URLs are not. Maybe the beginning. I also thought that this line
                > would make trouble, but you said that your new regex worked fine...
                >
                > Tilman
              • Tilman Hausherr
                Message 7 of 12 , Jan 20, 2005
                  On Thu, 20 Jan 2005 16:12:55 -0000, frank visser wrote:

                  >hi tilman,
                  >
                  >my programmer colleague says it is possible to "ignore" the "case" of
                  >the string "javascript" in your code as follows:
                  >
                  >
                  >if ((0 == stricmp(pcsLink->Left(11), "javascript:")) &&
                  >    !m_csJavascript.IsEmpty()) {
                  >    Regexp reXenu(m_csJavascript);
                  >    if (reXenu.Match(*pcsLink)) {
                  >        *pcsLink = reXenu[1];
                  >    }
                  >}
                  >
                  >
                  >if ((0 == pcsLink->Left(11).CompareNoCase("javascript:")) &&
                  >    !m_csJavascript.IsEmpty()) {
                  >    Regexp reXenu(m_csJavascript);
                  >    if (reXenu.Match(*pcsLink)) {
                  >        *pcsLink = reXenu[1];
                  >    }
                  >}
                  >
                  >
                  >two variations, written from the top of his head, so don't count on
                  >it, but perhaps you get the idea.
                  >
                  >does this make sense?

                  Of course it does. I've done Visual C++ MFC programming ever since it came out,
                  and C programming for 20 years. I never asked for advice about how to
                  solve it, I just thought Xenu had worked for you because you had
                  mentioned something like that, so I didn't think much about it although
                  the code line looked "suspicious" to me. I'll make an appropriate change
                  this weekend anyway.

                  (Alternatively, consider teaching your "§/#?/("! CMS to use lowercase
                  "javascript:")

                  Tilman

                  >
                  >
                  >frank
                  >
                  >--- In xenu-usergroup@yahoogroups.com, Tilman Hausherr <tilman@s...>
                  >wrote:
                  >> On Thu, 20 Jan 2005 07:31:50 -0000, frank visser wrote:
                  >>
                  >> >hi tilman,
                  >> >
                  >> >does this line of code in Xenu:
                  >> >
                  >> >if (pcsLink->Left(11) == "javascript:" &&
                  >> >!m_csJavascript.IsEmpty())
                  >> >
                  >> >cause that only "javascript:" cases with lower case "j" etc. are
                  >> >matched?
                  >>
                  >> Yes.
                  >>
                  >> >would it help to make the match case insensitive, so also
                  >> >"Javascript", "JavaScript", "JAVASCRIPT:" are matched.
                  >> >
                  >> >on the web, URLs are case insensitive, so is HTML.
                  >>
                  >> URLs are not. Maybe the beginning. I also thought that this line
                  >> would make trouble, but you said that your new regex worked fine...
                  >>
                  >> Tilman
                • Tilman Hausherr
                  Message 8 of 12 , Jan 20, 2005
                    On Thu, 20 Jan 2005 14:06:09 -0000, frank visser wrote:

                    >i don't care about elegance as long as it works, but i could not get
                    >the [Jj] etc. solution to work at all - that was my problem.
                    >
                    >am i mistaken? will try again.

                    I'll correct it this weekend.
                  • frank visser
                    Message 9 of 12 , Jan 21, 2005
                      thanks a lot tilman!

                      (please note i still use the cookie-accepting version ;-) - you once
                      remarked you considered adding this as an option in xenu:

                      "[ ] accept cookies."

                      saves you the trouble of creating separate versions.)

                      frank

                      --- In xenu-usergroup@yahoogroups.com, Tilman Hausherr <tilman@s...>
                      wrote:
                      > On Thu, 20 Jan 2005 14:06:09 -0000, frank visser wrote:
                      >
                      > >i don't care about elegance as long as it works, but i could not
                      > >get the [Jj] etc. solution to work at all - that was my problem.
                      > >
                      > >am i mistaken? will try again.
                      >
                      > I'll correct it this weekend.
                    • Tilman Hausherr
                      Message 10 of 12 , Jan 21, 2005
                        On Fri, 21 Jan 2005 13:21:46 -0000, frank visser wrote:

                        >thanks a lot tilman!
                        >
                        >(please note i still use the cookie-accepting version ;-) - you once
                        >remarked you considered adding this as an option in xenu:
                        >
                        >"[ ] accept cookies."
                        >
                        >saves you the trouble of creating separate versions.)

                        You will be able (in the next version this weekend) to enable cookies
                        with

                        [Options]
                        AllowCookies=1

                        This is unofficial, I will only make a description on the website, but
                        won't put it in the options dialog box.

                        The javascript thing is now also done. Please test it to see whether it
                        works the way you want. I didn't test it so that you have some work,
                        too. (Note that you will STILL have to use an appropriate Regexp!)

                        http://home.snafu.de/tilman/tmp/xenubeta.zip

                        Tilman
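
Why an appropriate regexp is still needed can be shown with a two-stage sketch modelled on the code discussed earlier in this thread: first a case-insensitive test for the `javascript:` prefix, then the user-configured pattern, whose capture group 1 replaces the link. This is standard C++ with `std::regex`, not Xenu's source. How the beta actually implements the change is not shown here, and the names `startsWithJavascript` and `rewriteLink` are made up for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

// Stage 1 (assumed): case-insensitive test for the 11-character prefix.
bool startsWithJavascript(const std::string& link) {
    const std::string scheme = "javascript:";
    return link.size() >= scheme.size() &&
           std::equal(scheme.begin(), scheme.end(), link.begin(),
                      [](char expected, char actual) {
                          return expected == static_cast<char>(std::tolower(
                                     static_cast<unsigned char>(actual)));
                      });
}

// Stage 2 (assumed): apply the user-configured pattern. On a match the
// link is replaced by capture group 1; otherwise it is returned unchanged.
std::string rewriteLink(const std::string& link, const std::string& userPattern) {
    if (userPattern.empty() || !startsWithJavascript(link)) {
        return link;
    }
    const std::regex re(userPattern);
    std::smatch m;
    if (std::regex_search(link, m, re) && m.size() > 1) {
        return m[1].str();
    }
    return link;
}
```

With a pattern that begins with lower-case `javascript:`, a `JavaScript:` link now passes stage 1 but still fails stage 2 and stays unchanged. With the `[Jj]ava[Ss]cript:` pattern, the same link is rewritten to the captured URL.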
                      • frank visser
                        Message 11 of 12 , Jan 24, 2005
                          hi tilman,

                          the regex patch works fine.

                          with:

                          Javascript=[Jj]ava[Ss]cript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]

                          it covers my needs.

                          frank

                          --- In xenu-usergroup@yahoogroups.com, Tilman Hausherr <tilman@s...>
                          wrote:
                          > On Fri, 21 Jan 2005 13:21:46 -0000, frank visser wrote:
                          >
                          > >thanks a lot tilman!
                          > >
                          > >(please note i still use the cookie-accepting version ;-) - you once
                          > >remarked you considered adding this as an option in xenu:
                          > >
                          > >"[ ] accept cookies."
                          > >
                          > >saves you the trouble of creating separate versions.)
                          >
                          > You will be able (in the next version this weekend) to enable cookies
                          > with
                          >
                          > [Options]
                          > AllowCookies=1
                          >
                          > This is unofficial, I will only make a description on the website, but
                          > won't put it in the options dialog box.
                          >
                          > The javascript thing is now also done. Please test it to see whether it
                          > works the way you want. I didn't test it so that you have some work,
                          > too. (Note that you will STILL have to use an appropriate Regexp!)
                          >
                          > http://home.snafu.de/tilman/tmp/xenubeta.zip
                          >
                          > Tilman