
Re: [xenu-usergroup] Filter URLs

  • Tilman Hausherr
    Message 1 of 6, Oct 10, 2007
      You don't need the wildcard version if you just want to exclude
      something that starts with

      https://my.own.site/first/removepage

      The wildcard version does not support "\" or regular expressions. The
      best would be to enter

      *removepage*

      in the exclusion list.
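      This glob-style exclusion can be sketched in a few lines (a minimal illustration using Python's fnmatch; Xenu's actual matcher is internal and may behave differently):

```python
from fnmatch import fnmatch

# Hypothetical sketch: treat each exclusion entry as a shell-style glob
# and exclude a URL when any pattern matches.
EXCLUDES = ["*removepage*"]

def is_excluded(url: str) -> bool:
    """Return True if the URL matches any exclusion pattern."""
    return any(fnmatch(url, pattern) for pattern in EXCLUDES)

print(is_excluded("https://my.own.site/first/removepage?pageID=xyz"))  # True
print(is_excluded("https://my.own.site/first/index.html"))             # False
```

      Because the pattern has "*" on both sides, it catches the dangerous URL wherever "removepage" appears in the path.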

      Tilman

      On Wed, 10 Oct 2007 18:49:37 -0000, markus_liebelt_82 wrote:

      >Hello everyone,
      >
      >I have read the posts on the topic of filtering, but have not found
      >what I expected. What I want to achieve is the following:
      >
      >1. We have a lot of small web sites that all start with
      >https://my.own.site/; the sites themselves are then
      >https://my.own.site/first, https://my.own.site/second, ...
      >2. There are special actions that I want to filter out all the
      >time. All of them have a '?' in the URL for the parameters.
      >3. I only want to check the consistency of one of the web sites,
      >not all of them.
      >
      >So the filter should do:
      >Include: https://my.own.site/first
      >Exclude: https://my.own.site/[-first]
      >Exclude: https://my.own.site/*\?*
      >
      >I have downloaded xenuwild2.zip, which allows specifying not only
      >the beginning but the whole URL with wildcards.
      >
      >Is there a way to achieve what I want with Xenu? It is a fabulous
      >tool, but without this feature it would be too dangerous to use
      >(https://my.own.site/first/removepage?pageID=xyz).
      >
      >Bye, and thanks for a fine tool
      >
      >Markus
    • Tilman Hausherr
      Message 2 of 6, Oct 11, 2007
        I've uploaded the version 1.2j:


        8.10.2007 (1.2j)
        Major improvements:
        - 5.6.2007: second options pane with 7 "secret" settings
        - 7.7.2007: up/down sort symbol on column header

        http://www.codeguru.com/cpp/controls/listview/advanced/article.php/c4179/
        Minor improvements:
        - 4.10.2006: visible URLs are first in new threads
        - 4.10.2006: update listctrl when "busy" is set
        - 7.10.2006: 2nd part of report more efficient for huge sites
        - 12.10.2006: REMOVEDOUBLESLASH compile option removes "/../" too
        - 15.10.2006: application/xhtml+xml is hypertext, too
        - 15.10.2006: Switched to InnoSetup 5.1.8
        - 30.10.2006: Skip aim://, ymsgr://, rtsp://, xmpp://
        - 30.11.2006: better error message for ShellExecute() errors
        - 30.11.2006: "//" in URL after the host name is not "broken" when after a "?"
        - 8.1.2007: Max title length 1024
        - 16.1.2007: ftp dialogbox wider
        - 19.1.2007: [Options] MakeLowerCase=1 ==> converts all URLs to lower case (default is 0)
        - 3.3.2007: [Options] ListLocalDirectories=1 ==> local directory listing (default is 0)
        - ??.3.2007: [Options] AllowLocalFilesInRemoteCheck=1 ==> Allow file:// links in remote check (default is 0)
        - 16.3.2007: Skip callto:
        - 25.3.2007: meta generator
        - 31.3.2007: Upgraded to InnoSetup 5.1.11
        - 31.3.2007: Title TrimRight()
        - 31.3.2007: update listctrl when title becomes known
        - 31.3.2007: convert titles in sitemap to &...; notation
        - 1.4.2007: Added most of http://www.htmlhelp.com/reference/html40/entities/special.html to conversion table
        - 29.5.2007: "asterisk" sound when done
        - 2.6.2007: -save option for command line version to save .XEN file (does overwrite)
        - 2.6.2007: all command line options for command line version can now be combined
        - 5.6.2007: MakeLowerCase, vNormalizeURL() slightly changed internally
        - 6.6.2007: .XEN Archive version 10
        - 8.6.2007: "Autostart" feature when opening .XEN file
        - 8.6.2007: all command line options for command line version can be used when opening .XEN file
        - 28.7.2007: retry feature in command line version (test)
        - 3.8.2007: Upgraded to InnoSetup 5.1.13
        - 15.8.2007: reset sort icon, and vUpdateColumnSortIcon() at InsertAll()

        Bug fixes:
        - 7.12.2006: check for iIndex < pList->GetItemCount()
        - 13.2.2007: corrected bug in ListLocalDirectories feature (last file ignored)
        - 15.2.2007: wildcard version adds "*" at the end of each entry in "Check URL list"
        - 23.5.2007: aim: instead of aim://
        - 20.8.2007: remove "file://" for ShellExecute()
        - 21.9.2007: % size corrected in statistic (was % count!)
        - 22.9.2007: fixed FindFile security leak,
        http://goodfellas.shellcode.com.ar/own/VULWKU200706142
      • markus_liebelt_82
        Message 3 of 6, Oct 11, 2007
          Hi Tilman,

          nice idea, I will check whether that is sufficient for the dangerous
          links. But what about reducing the check to only one part of the web
          site and excluding the rest? Any idea how to do that? There should be
          an ordered list of match expressions, with rules for whether they
          should be followed or not.

          Like:
          Follow not: removepage
          Follow: https://my.own.site/first
          Follow not: https://my.own.site/

          Will that work when I start with https://my.own.site/first and
          include the other two in the "do not check" section? It looks to me
          as if links to other parts of the site (in which I am not interested)
          are followed as well.

          By the way, where is the new version downloadable?

          Bye
          Markus

        • Tilman Hausherr
          Message 4 of 6, Oct 11, 2007
            On Thu, 11 Oct 2007 15:48:07 -0000, markus_liebelt_82 wrote:

            >Hi Tilman,
            >
            >nice idea, I will check whether that is sufficient for the dangerous
            >links. But what about reducing the check to only one part of the web
            >site and excluding the rest? Any idea how to do that? There should be
            >an ordered list of match expressions, with rules for whether they
            >should be followed or not.
            >
            >Like:
            >Follow not: removepage
            >Follow: https://my.own.site/first
            >Follow not: https://my.own.site/

            If you use the wildcard version, you should add this:

            exclude:
            *removepage*
            https://my.own.site/*

            include:
            https://my.own.site/first*

            However, this is an obvious contradiction.

            >Will that work when I start with https://my.own.site/first and
            >include the other two in the "do not check" section? It looks to me
            >as if links to other parts of the site (in which I am not interested)
            >are followed as well.

            I think it doesn't make sense. It won't work, since you are excluding
            everything.
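            A minimal sketch of why this fails, assuming excludes take precedence over includes (the precedence rule and function name are assumptions, not documented Xenu behaviour):

```python
from fnmatch import fnmatch

# Assumed rule: follow a URL only if it matches an include pattern
# and no exclude pattern.
EXCLUDES = ["*removepage*", "https://my.own.site/*"]
INCLUDES = ["https://my.own.site/first*"]

def should_follow(url: str) -> bool:
    included = any(fnmatch(url, p) for p in INCLUDES)
    excluded = any(fnmatch(url, p) for p in EXCLUDES)
    return included and not excluded

# The broad exclude also matches everything under /first/,
# so nothing is ever followed:
print(should_follow("https://my.own.site/first/page.html"))  # False
print(should_follow("https://my.own.site/second/"))          # False
```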

            A possible solution is to use
            https://my.own.site/first/
            as the start URL. Note the "/" at the end. It might check
            https://my.own.site/rest/ for existence, but won't spider it.
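            The check-but-don't-spider behaviour can be sketched as a simple prefix test (the function name and return labels are illustrative, not Xenu's):

```python
# Assumed behaviour: URLs under the start URL are spidered recursively;
# everything else is only checked for existence.
START = "https://my.own.site/first/"

def action_for(url: str) -> str:
    """Classify a discovered URL: crawl it, or just verify it exists."""
    return "spider" if url.startswith(START) else "check-only"

print(action_for("https://my.own.site/first/page.html"))  # spider
print(action_for("https://my.own.site/rest/"))            # check-only
```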

            >By the way, where is the new version dowloadable?

            http://home.snafu.de/tilman/XENU.ZIP

            (this is not the wildcard version)

            Tilman

          • markus_liebelt_82
            Message 5 of 6, Oct 11, 2007
              It's a bad habit to answer one's own questions, but:
              - It works now, with the hint to include a '/' at the end of the
              starting point.
              - I need the wildcards to exclude *action*, *showComment* ...
              - I am able to skip the rest of the web site by giving an
              exclusion rule like in my example.

              Thanks a lot, and have a nice day :-)

              Markus
