
Filter URLs

  • markus_liebelt_82
    Message 1 of 6 , Oct 10, 2007
      Hello everyone,

      I have read the posts on the topic of filtering, but have not found
      what I was looking for. I want to achieve the following:

      1. We have a lot of small web sites that all start with
      https://my.own.site/; the sites themselves are then
      https://my.own.site/first, https://my.own.site/second, ...
      2. There are special actions that I want to filter out all the
      time. All of them have a '?' in the URL for the parameters.
      3. I only want to check the consistency of one of the web sites,
      not of all of them.

      So the filter should do:
      Include: https://my.own.site/first
      Exclude: https://my.own.site/[-first]
      Exclude: https://my.own.site/*\?*
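The intended include/exclude behaviour can be sketched in Python with shell-style wildcards (an illustrative model only; the pattern syntax, rule precedence, and function names here are assumptions, not Xenu's actual matcher):

```python
from fnmatch import fnmatch

# Illustrative rules modelling the post's intent: spider only the
# "first" sub-site, and never follow URLs that carry query parameters.
INCLUDE = ["https://my.own.site/first*"]
EXCLUDE = ["*[?]*"]  # "[?]" matches a literal '?' in fnmatch patterns

def should_follow(url: str) -> bool:
    """Exclusions win; otherwise the URL must match an include pattern."""
    if any(fnmatch(url, pat) for pat in EXCLUDE):
        return False
    return any(fnmatch(url, pat) for pat in INCLUDE)

print(should_follow("https://my.own.site/first/page.html"))            # True
print(should_follow("https://my.own.site/second/page.html"))           # False
print(should_follow("https://my.own.site/first/removepage?pageID=x"))  # False
```

Note that in `fnmatch` a bare `?` is itself a wildcard (any single character), so the literal question mark has to be escaped as a character class `[?]` — the same ambiguity the `*\?*` attempt above runs into.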

      I have downloaded xenuwild2.zip, which allows specifying not only
      the beginning of the URL but the whole URL, with wildcards.

      Is there a way to achieve what I want with Xenu? It is a fabulous
      tool, but without this feature it would be too dangerous to use
      (https://my.own.site/first/removepage?pageID=xyz).

      Bye, and thanks for a fine tool

      Markus
    • Tilman Hausherr
      Message 2 of 6 , Oct 10, 2007
        You don't need the wildcard version if you just want to exclude
        something that starts with

        https://my.own.site/first/removepage

        The wildcard version does not support "\" or regular expressions. The
        best would be to enter

        *removepage*

        in the exclusion list.
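For illustration (a hypothetical model, not Xenu's code): a pattern with a leading and trailing `*`, like `*removepage*`, behaves like a plain substring test, which is why it catches the dangerous URL regardless of what comes before or after it.

```python
from fnmatch import fnmatch

url = "https://my.own.site/first/removepage?pageID=xyz"

# "*removepage*" is equivalent to asking whether the substring
# "removepage" appears anywhere in the URL.
assert fnmatch(url, "*removepage*") == ("removepage" in url)
print(fnmatch(url, "*removepage*"))  # True
```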

        Tilman

      • Tilman Hausherr
        Message 3 of 6 , Oct 11, 2007
          I've uploaded the version 1.2j:


          8.10.2007 (1.2j)
          Major improvements:
          - 5.6.2007: second options pane with 7 "secret" settings
          - 7.7.2007: up/down sort symbol on column header

          http://www.codeguru.com/cpp/controls/listview/advanced/article.php/c4179/
          Minor improvements:
          - 4.10.2006: visible URLs are first in new threads
          - 4.10.2006: update listctrl when "busy" is set
          - 7.10.2006: 2nd part of report more efficient for huge sites
          - 12.10.2006: REMOVEDOUBLESLASH compile option removes "/../" too
          - 15.10.2006: application/xhtml+xml is hypertext, too
          - 15.10.2006: Switched to InnoSetup 5.1.8
          - 30.10.2006: Skip aim://, ymsgr://, rtsp://, xmpp://
          - 30.11.2006: better error message for ShellExecute() errors
          - 30.11.2006: "//" in URL after the host name is not "broken" when after a "?"
          - 8.1.2007: Max title length 1024
          - 16.1.2007: ftp dialogbox wider
          - 19.1.2007: [Options] MakeLowerCase=1 ==> converts all URLs to lower case (default is 0)
          - 3.3.2007: [Options] ListLocalDirectories=1 ==> local directory listing (default is 0)
          - ??.3.2007: [Options] AllowLocalFilesInRemoteCheck=1 ==> allow file:// links in remote check (default is 0)
          - 16.3.2007: Skip callto:
          - 25.3.2007: meta generator
          - 31.3.2007: Upgraded to InnoSetup 5.1.11
          - 31.3.2007: Title TrimRight()
          - 31.3.2007: update listctrl when title becomes known
          - 31.3.2007: convert titles in sitemap to &...; notation
          - 1.4.2007: Added most of
          http://www.htmlhelp.com/reference/html40/entities/special.html to
          conversion table
          - 29.5.2007: "asterisk" sound when done
          - 2.6.2007: -save option for command line version to save .XEN file
          (does overwrite)
          - 2.6.2007: all command line options for command line version can now be
          combined
          - 5.6.2007: MakeLowerCase, vNormalizeURL() slightly changed internally
          - 6.6.2007: .XEN Archive version 10
          - 8.6.2007: "Autostart" feature when opening .XEN file
          - 8.6.2007: all command line options for command line version can be
          used when opening .XEN file
          - 28.7.2007: retry feature in command line version (test)
          - 3.8.2007: Upgraded to InnoSetup 5.1.13
          - 15.8.2007: reset sort icon, and vUpdateColumnSortIcon() at InsertAll()

          Bug fixes:
          - 7.12.2006: check for iIndex < pList->GetItemCount()
          - 13.2.2007: corrected bug in ListLocalDirectories feature (last file
          ignored)
          - 15.2.2007: wildcard version adds "*" at the end of each entry in
          "Check URL list"
          - 23.5.2007: aim: instead of aim://
          - 20.8.2007: remove "file://" for ShellExecute()
          - 21.9.2007: % size corrected in statistic (was % count!)
          - 22.9.2007: fixed FindFile security leak,
          http://goodfellas.shellcode.com.ar/own/VULWKU200706142
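The `[Options]` switches in the changelog above are INI-style settings; assuming they go in Xenu's INI file (the changelog names the section and keys, but not the file location), enabling all three would look like:

```ini
[Options]
; 19.1.2007: convert all URLs to lower case (default is 0)
MakeLowerCase=1
; 3.3.2007: local directory listing (default is 0)
ListLocalDirectories=1
; ??.3.2007: allow file:// links in remote check (default is 0)
AllowLocalFilesInRemoteCheck=1
```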
        • markus_liebelt_82
          Message 4 of 6 , Oct 11, 2007
            Hi Tilman,

            nice idea, I will check whether that is sufficient for the
            dangerous links. But what about reducing the check to only one
            part of the web site and excluding the rest? Any idea how to do
            that? There should be an ordered list of match expressions, with
            rules for whether they should be followed or not.

            Like:
            Follow not: removepage
            Follow: https://my.own.site/first
            Follow not: https://my.own.site/

            Will that work when I start with https://my.own.site/first and
            include the other two in the "do not check" section? It looks to
            me as if links to other parts of the web site (in which I am not
            interested) are followed as well.

            By the way, where can the new version be downloaded?

            Bye
            Markus

          • Tilman Hausherr
            Message 5 of 6 , Oct 11, 2007
              On Thu, 11 Oct 2007 15:48:07 -0000, markus_liebelt_82 wrote:

              >Hi Tilman,
              >
              >nice idea, I will prove if that is sufficient for the dangerous
              >links. But what about reducing the web site to only one part and
              >excluding the rest? Any idea how to get that? There should be an
              >order or match expressions, with rules if they should followed or not.
              >
              >Like:
              >Follow not: removepage
              >Follow: https://my.own.site/first
              >Follow not: https://my.own.site/

              if you use the wildcard version, you should add this

              exclude:
              *removepage*
              https://my.own.site/*

              include:
              https://my.own.site/first*

              However, that is an obvious contradiction.

              >Will that work when I start with: https://my.own.site/first and
              >include the other two in the "do not check" section? It looks for me
              >as links to other parts of the web (in which I am not interested in)
              >are followed as well.

              I think it doesn't make sense. It won't work, since you are excluding
              everything.

              A possible solution is to use
              https://my.own.site/first/
              as the start URL. Note the "/" at the end. It might check
              https://my.own.site/rest/ for existence, but won't spider it.
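The effect of the trailing slash can be sketched as a prefix test (a hypothetical model of the spidering decision, not Xenu's source; function name is illustrative):

```python
START_URL = "https://my.own.site/first/"  # note the trailing '/'

def will_spider(url: str) -> bool:
    # With the slash, the start URL acts as a directory prefix: only
    # URLs under it are followed recursively. Links outside it would be
    # checked for existence but not spidered.
    return url.startswith(START_URL)

print(will_spider("https://my.own.site/first/page.html"))  # True
print(will_spider("https://my.own.site/rest/page.html"))   # False, checked only
```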

              >By the way, where is the new version dowloadable?

              http://home.snafu.de/tilman/XENU.ZIP

              (this is not the wildcard version)

              Tilman

            • markus_liebelt_82
              Message 6 of 6 , Oct 11, 2007
                It is a bad habit to answer one's own questions, but:
                - It works now, with the hint to include a '/' at the end of
                the starting point.
                - I need the wildcards to exclude *action*, *showComment*, ...
                - I am able to skip the rest of the web site by giving an
                exclusion rule like in my example.

                Thanks a lot, and have a nice day :-)

                Markus
