
Re: [xenu-usergroup] Re: Filter URLs

  • Tilman Hausherr
    Message 1 of 6, Oct 11, 2007
      On Thu, 11 Oct 2007 15:48:07 -0000, markus_liebelt_82 wrote:

      >Hi Tilman,
      >
      >nice idea, I will check whether that is sufficient for the dangerous
      >links. But what about reducing the web site to only one part and
      >excluding the rest? Any idea how to do that? There should be an
      >order of match expressions, with rules saying whether they should
      >be followed or not.
      >
      >Like:
      >Do not follow: removepage
      >Follow: https://my.own.site/first
      >Do not follow: https://my.own.site/

      If you use the wildcard version, you should add this:

      exclude:
      *removepage*
      https://my.own.site/*

      include:
      https://my.own.site/first*

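      However, this is an obvious contradiction: the exclude pattern
      https://my.own.site/* also matches every URL under
      https://my.own.site/first.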
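
      To make the contradiction concrete, here is a minimal Python sketch.
      It is not Xenu code. It assumes that "*" matches any run of
      characters, as in shell-style globbing, and uses Python's fnmatch
      module as a stand-in for the wildcard version's matcher. The helper
      names and the example URLs are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns taken from the lists above. The matching semantics are an
# assumption: "*" is treated as "any run of characters".
EXCLUDE = ["*removepage*", "https://my.own.site/*"]
INCLUDE = ["https://my.own.site/first*"]


def matches_any(url, patterns):
    """Return True if the URL matches at least one wildcard pattern."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


def conflicting(url):
    """Return True if the URL is matched by both the include and exclude lists."""
    return matches_any(url, INCLUDE) and matches_any(url, EXCLUDE)


if __name__ == "__main__":
    for url in (
        "https://my.own.site/first/index.html",
        "https://my.own.site/second/index.html",
    ):
        print(url, "-> conflict" if conflicting(url) else "-> no conflict")
```

      Under these assumptions, every URL that the include list accepts is
      also caught by the exclude list, so the two lists cannot both be
      honoured.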

      >Will that work when I start with https://my.own.site/first and
      >put the other two in the "do not check" section? It looks to me
      >as if links to other parts of the web (in which I am not
      >interested) are followed as well.

      I don't think that makes sense. It won't work, since you would be
      excluding everything.

      A possible solution is to use
      https://my.own.site/first/
      as the start URL. Note the "/" at the end. It might check
      https://my.own.site/rest/ for existence, but won't spider it.
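
      One way to picture why the trailing "/" could matter is the Python
      sketch below. It rests on an assumption that this thread does not
      confirm: that the crawler takes everything up to and including the
      last "/" of the start URL as its scope, and only spiders URLs inside
      that scope. The function names are hypothetical and this is not Xenu
      code.

```python
def crawl_scope(start_url):
    """Return the start URL up to and including its last "/".

    Assumption: the crawler treats this prefix as the area to spider.
    """
    return start_url[: start_url.rfind("/") + 1]


def is_spidered(url, start_url):
    """Return True if the URL lies inside the assumed crawl scope."""
    return url.startswith(crawl_scope(start_url))


if __name__ == "__main__":
    other = "https://my.own.site/rest/page.html"
    for start in ("https://my.own.site/first", "https://my.own.site/first/"):
        print(start, "-> scope", crawl_scope(start),
              "-> spiders other part:", is_spidered(other, start))
```

      Under that assumption, a start URL without the trailing "/" would
      put the whole of https://my.own.site/ in scope, while the version
      with the "/" limits spidering to https://my.own.site/first/.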

      >By the way, where can the new version be downloaded?

      http://home.snafu.de/tilman/XENU.ZIP

      (this is not the wildcard version)

      Tilman

      >
      >Bye
      > Markus
      >
      >--- In xenu-usergroup@yahoogroups.com, Tilman Hausherr <tilman@...>
      >wrote:
      >>
      >> You don't need the wildcard version if you just want to exclude
      >> something that starts with
      >>
      >> https://my.own.site/first/removepage
      >>
      >> The wildcard version does not support "\" or regular expressions.
      >> The best would be to enter
      >>
      >> *removepage*
      >>
      >> in the exclusion list.
      >>
      >> Tilman
      >>
      >> On Wed, 10 Oct 2007 18:49:37 -0000, markus_liebelt_82 wrote:
      >>
      >> >Hello all,
      >> >
      >> >I have read the posts on the topic of filtering, but have not found
      >> >what I expected. I want to achieve the following:
      >> >
      >> >1. We have a lot of small web sites that all start with
      >> >https://my.own.site/; the web sites themselves are then
      >> >https://my.own.site/first, https://my.own.site/second ...
      >> >2. There are special actions that I want to filter out all the time.
      >> >All of them have a '?' in the URL for the parameters.
      >> >3. I only want to check the consistency of one of the web sites,
      >> >not of all of them.
      >> >
      >> >So the filter should do:
      >> >Include: https://my.own.site/first
      >> >Exclude: https://my.own.site/[-first]
      >> >Exclude: https://my.own.site/*\?*
      >> >
      >> >I have downloaded xenuwild2.zip, which allows specifying not only
      >> >the beginning but the whole URL with wildcards.
      >> >
      >> >Is there a way to achieve what I want with Xenu? It is a
      >> >fabulous tool, but without this feature it would be too dangerous
      >> >to use (https://my.own.site/first/removepage?pageID=xyz).
      >> >
      >> >Bye, and thanks for a fine tool
      >> >
      >> >Markus
    • markus_liebelt_82
      Message 2 of 6, Oct 11, 2007
        It is a bad habit to answer one's own questions, but:
        - It works now with the hint to include a '/' at the end of the
        start URL.
        - I need the wildcards to exclude *action*, *showComment* ...
        - I am able to skip the rest of the web site by giving an exclusion
        rule like the one in my example.
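
        The setup in the list above can be sketched in Python as follows.
        This is only an illustration under assumptions, not Xenu's actual
        logic: "*" is taken to match any run of characters (fnmatch is
        used as a stand-in), URLs under the start URL are taken to be
        spidered, and anything else is only checked. The classify helper
        and the sample URLs are hypothetical.

```python
from fnmatch import fnmatchcase

# Start URL with the trailing "/" and the wildcard exclusions named above.
START_URL = "https://my.own.site/first/"
EXCLUDE = ["*action*", "*showComment*"]


def classify(url):
    """Classify a URL as "skip", "spider" or "check only".

    Assumptions: exclusions win over everything else, and only URLs
    that begin with the start URL are spidered.
    """
    if any(fnmatchcase(url, pattern) for pattern in EXCLUDE):
        return "skip"
    if url.startswith(START_URL):
        return "spider"
    return "check only"


if __name__ == "__main__":
    for url in (
        "https://my.own.site/first/page.html",
        "https://my.own.site/first/showComment?id=1",
        "https://my.own.site/second/",
    ):
        print(url, "->", classify(url))
```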

        Thanks a lot, and have a nice day :-)

        Markus
