1.3.9--wildcards?

  • Jonathan Crane
    Message 1 of 6 , May 18, 2011
      Xenu Linkchecker Usergroup

      Tilman, does this version support wildcards or is there still a separate wildcard version?

      j

    • Tilman Hausherr
      Message 2 of 6 , May 22, 2011
        No, the wildcards version is still separate.

        Tilman

        On Wed, 18 May 2011 13:34:59 +0000, Jonathan Crane wrote:

        >Tilman, does this version support wildcards or is there still a separate wildcard version?
        >j
      • Fischer, Thomas
        Message 3 of 6 , May 23, 2011
          Hi Tilman,
           
          I am checking some of my websites and found some unexpected problems. Can you comment?
           
          1. On the page
          http://www.mathguide.de/cgi-bin/ssgfi/navigator.pl?db=math&type=subj
           
          Xenu gives a "not found" error for a link "http://www.mathguide.de/db=math/type=form" which is linked from "Source Type Catalog".
           While that URL really doesn't exist, the actual link (in the source code) is
           
          <A HREF="db=math/type=subj"><IMG ALT="Subject Catalog" SRC="/grafiken/new-left.gif" ALIGN=TOP BORDER=0 HEIGHT=15 WIDTH=15 HSPACE=5>Subject Catalog</A>
           
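          with a base of that page as
           
          <BASE HREF="http://www.MathGuide.de/cgi-bin/ssgfi/navigator2.pl/">
           
          which is (correctly, I assume) evaluated to the working link
          http://www.MathGuide.de/cgi-bin/ssgfi/navigator2.pl/db=math/type=subj
          by the browsers.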
          This is part of a frame (e.g. http://www.mathguide.de/cgi-bin/ssgfi/navigator2.pl/db=math/type=subj/frame=1), but this shouldn't matter, should it?
           
          2. I get the error message
          http://www.liv.ac.uk/maths/
             \_____ error code: 400 (no object data)
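          and similarly,
          http://www.google.com/Top/World/Deutsch/Wissenschaft/Mathematik/
             \_____ error code: 404 (not found)
          and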
          http://magma.maths.usyd.edu.au/magma/
             \_____ error code: 401 (auth required)
          while the links seem to work fine.
           
          I know of "forbidden requests" like
          http://de.wikibooks.org/wiki/Regal:Mathematik
             \_____ error code: 403 (forbidden request)
          but this seems to be different and starts to get annoying.
           
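          3. I am shown errors like
           
          http://www.mpib-berlin.mpg.de/DOK/metatagd.htm
           http://www.mpib-berlin.mpg.de/de/DOK/metatagd.htm
             \_____ error code: 404 (not found)
          or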
          http://www.nist.gov/dads/
           http://xlinux.nist.gov/dads//
             \_____ error code: 404 (not found)
           
          Is there a way to find out how they relate to my website (www.MathGuide.de in this case)?
           
          All the best, and thanks for this great piece of free software
          Thomas

          --
          Dr. Thomas Fischer
          Research and Development Department (RDD)
          Göttingen State and University Library
          Georg-August-Universität Göttingen
          37073 Göttingen
          Germany

          Tel.: +49 551 393883
          and   +43 662 621498
          fischer@...-goettingen.de
          http://www.MathGuide.de/

        • Tilman Hausherr
          Message 4 of 6 , May 23, 2011
            On Mon, 23 May 2011 13:58:24 +0200, Fischer, Thomas wrote:

            >Hi Tilman,
            >
            >I am checking some of my websites and found some unexpected problems. Can you comment?
            >
            >1. On the page
            >http://www.mathguide.de/cgi-bin/ssgfi/navigator.pl?db=math&type=subj
            >
            >Xenu gives a "not found" error for a link "http://www.mathguide.de/db=math/type=form" which is linked from "Source Type Catalog".
            > While that URL really doesn't exist, the actual link (in the source code) is
            >
            ><A HREF="db=math/type=subj"><IMG ALT="Subject Catalog" SRC="/grafiken/new-left.gif" ALIGN=TOP BORDER=0 HEIGHT=15 WIDTH=15 HSPACE=5>Subject Catalog</A>


            No, it's

            <A HREF="db=math/type=form"><IMG ALT="Source Type Catalog"
            SRC="/grafiken/new-right.gif" ALIGN=TOP BORDER=0 HEIGHT=15 WIDTH=15
            HSPACE=5>Source Type Catalog</A>

            and that "Source Type Catalog" link is really broken, even with a
            browser. (Opera)

            >
            >with a base of that page as
            >
            ><BASE HREF="http://www.MathGuide.de/cgi-bin/ssgfi/navigator2.pl/">

            No, it is
            <BASE HREF="http://www.MathGuide.de/">

            >
            >which is (correctly, I assume) evaluated to the working link
            >http://www.MathGuide.de/cgi-bin/ssgfi/navigator2.pl/db=math/type=subj
            >by the browsers.
            >This is part of a frame (e.g. http://www.mathguide.de/cgi-bin/ssgfi/navigator2.pl/db=math/type=subj/frame=1), but this shouldn't matter, should it?

            Correct.
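The disagreement above comes down to how a relative HREF is resolved against the page's BASE. As a minimal sketch, using Python's standard `urllib.parse.urljoin` rather than anything from Xenu itself, the two bases mentioned in this thread give different absolute URLs for the same relative link:

```python
from urllib.parse import urljoin

# The BASE actually present in the page, according to Tilman's reply.
ACTUAL_BASE = "http://www.MathGuide.de/"
# The BASE Thomas assumed the page had.
ASSUMED_BASE = "http://www.MathGuide.de/cgi-bin/ssgfi/navigator2.pl/"
# The relative HREF of the "Source Type Catalog" link.
HREF = "db=math/type=form"

# Resolve the same relative link against each base.
resolved_actual = urljoin(ACTUAL_BASE, HREF)
resolved_assumed = urljoin(ASSUMED_BASE, HREF)

print(resolved_actual)   # the URL that Xenu reported as not found
print(resolved_assumed)  # the URL the link was meant to reach
```

With the short base, the link lands directly under the site root, which matches the URL in the "not found" report.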

            >
            >2. I get the error message
            >http://www.liv.ac.uk/maths/
            > \_____ error code: 400 (no object data)
            > and similarly,
            >http://www.google.com/Top/World/Deutsch/Wissenschaft/Mathematik/
            > \_____ error code: 404 (not found)
            >and
            >http://magma.maths.usyd.edu.au/magma/
            > \_____ error code: 401 (auth required)
            >while the links seem to work fine.
            >
            >I know of "forbidden requests" like
            >http://de.wikibooks.org/wiki/Regal:Mathematik
            > \_____ error code: 403 (forbidden request)
            >but this seems to be different and starts to get annoying.

            Yes, all these servers "hate" Xenu, as described here
            http://home.snafu.de/tilman/xenulink.html Nr. 20

            >3. I am shown errors like
            >
            >http://www.mpib-berlin.mpg.de/DOK/metatagd.htm
            > http://www.mpib-berlin.mpg.de/de/DOK/metatagd.htm
            > \_____ error code: 404 (not found)
            >or
            >http://www.nist.gov/dads/
            > http://xlinux.nist.gov/dads//
            > \_____ error code: 404 (not found)
            >
            >Is there a way to find out how they relate to my website (www.MathGuide.de in this case)?

            Yes, see the redirection segment in the report

            Tilman

            >
            >All the best, and thanks for this great piece of free software
            >Thomas
          • Fischer, Thomas
            Message 5 of 6 , May 24, 2011
              Hi Tilman,

              thanks for the speedy reply!

              > >1. On the page
              > >http://www.mathguide.de/cgi-bin/ssgfi/navigator.pl?db=math&type=subj
              > >
              > >Xenu gives a "not found" error for a link
              > "http://www.mathguide.de/db=math/type=form" which is linked
              > from "Source Type Catalog".
              > > While that URL really doesn't exist, the actual link (in the source
              > >code) is
              > >
              > ><A HREF="db=math/type=subj"><IMG ALT="Subject Catalog"
              > >SRC="/grafiken/new-left.gif" ALIGN=TOP BORDER=0 HEIGHT=15 WIDTH=15
              > >HSPACE=5>Subject Catalog</A>
              >
              > No, it's
              >
              > <A HREF="db=math/type=form"><IMG ALT="Source Type Catalog"
              > SRC="/grafiken/new-right.gif" ALIGN=TOP BORDER=0 HEIGHT=15
              > WIDTH=15 HSPACE=5>Source Type Catalog</A>
              >
              > and that "Source Type Catalog" link is really broken, even
              > with a browser. (Opera)
              >
              > >
              > >with a base of that page as
              > >
              > ><BASE HREF="http://www.MathGuide.de/cgi-bin/ssgfi/navigator2.pl/">
              >
              > No, it is
              > <BASE HREF="http://www.MathGuide.de/">

              You are so right! Sorry for my blindness, these pages are built together dynamically and I didn't look right. It took me ages and three runs of Xenu to eradicate all the "navigator.pl"-links that -- while working -- created the erroneous links. I would have needed some additional back step in the error report:

              http://www.mathguide.de/cgi-bin/ssgfi/navigator.pl?db=math&type=form
              http://www.mathguide.de/db=math/type=subj
              \_____ error code: 404 (not found)

              Where is the first page linked from?
              I eventually created a site map that helped a little.

              > >2. I get the error message
              > >http://www.liv.ac.uk/maths/
              > > \_____ error code: 400 (no object data) and similarly,
              > >http://www.google.com/Top/World/Deutsch/Wissenschaft/Mathematik/
              > > \_____ error code: 404 (not found)
              > >and
              > >http://magma.maths.usyd.edu.au/magma/
              > > \_____ error code: 401 (auth required) while the links seem to work
              > >fine.
              > >
              > >I know of "forbidden requests" like
              > >http://de.wikibooks.org/wiki/Regal:Mathematik
              > > \_____ error code: 403 (forbidden request) but this seems to be
              > >different and starts to get annoying.
              >
              > Yes, all these servers "hate" Xenu, as described here
              > http://home.snafu.de/tilman/xenulink.html Nr. 20

              I wasn't aware of that. But it *is* a nuisance.
              The page "http://www.andilinks.com/linkckg.shtm" referred to on xenulink.html Nr. 20 says:
              "Be sure to mark those that deny Xenu so they can be easily excluded or remembered on the next pass."
              I have no clue how I can exclude dozens (50?) of websites from being checked. Filling them into the tiny space at the bottom of the "Check URL" form seems pretty cumbersome; some additional "Exclusionlist" file might be helpful.
              Actually, I already have problems dealing with the 15 or so exclusions I use, and I have some trouble managing my Xenu.ini file (about 1900 lines by now). Is there a description of the specifics of this file somewhere? Most of the syntax can be guessed, but the connection between the [Recent URL List] and the specific inclusions and exclusions isn't quite clear to me.

              > >3. I am shown errors like
              > >
              > >http://www.mpib-berlin.mpg.de/DOK/metatagd.htm
              > > http://www.mpib-berlin.mpg.de/de/DOK/metatagd.htm
              > > \_____ error code: 404 (not found)
              > >or
              > >http://www.nist.gov/dads/
              > > http://xlinux.nist.gov/dads//
              > > \_____ error code: 404 (not found)
              > >
              > >Is there a way to find out how they relate to my website
              > (www.MathGuide.de in this case)?
              >
              > Yes, see the redirection segment in the report

              Thanks, I found that. I suppose it would be too complicated to bring these two bits of information together automatically? I have loads of redirects and would try to fix the ones which are broken *and* permanent first. But they are not so easily spotted.

              Another thing (I might have mentioned before): we use Dublin Core Metadata, which requires the following header information (see http://dublincore.org/documents/dc-html/):

              <head profile="http://dublincore.org/documents/2008/08/04/dc-html/">
              <title>...</title>
              <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" >
              <meta name="DC.title" content="..." >
              </head>
              Unfortunately that means that I get a warning

              http://purl.org/dc/elements/1.1/
              redirected to: http://dublincore.org/2010/10/11/dcelements.rdf
              status code: 302 (object temporarily moved)

              for every single page on my site.
              Would a "don't check 'http://purl.org/dc/elements/1.1/'" help in this case?

              BTW, It seems that Xenu can check Tono's URLs just fine, but my respective mail didn't make it to the list yet.

              Thanks again
              Thomas
            • Tilman Hausherr
              Message 6 of 6 , May 24, 2011
                On Tue, 24 May 2011 15:58:39 +0200, Fischer, Thomas wrote:

                >Hi Tilman,
                >
                >thanks for the speedy reply!
                >
                >> >1. On the page
                >> >http://www.mathguide.de/cgi-bin/ssgfi/navigator.pl?db=math&type=subj
                >> >
                >> >Xenu gives a "not found" error for a link
                >> "http://www.mathguide.de/db=math/type=form" which is linked
                >> from "Source Type Catalog".
                >> > While that URL really doesn't exist, the actual link (in the source
                >> >code) is
                >> >
                >> ><A HREF="db=math/type=subj"><IMG ALT="Subject Catalog"
                >> >SRC="/grafiken/new-left.gif" ALIGN=TOP BORDER=0 HEIGHT=15 WIDTH=15
                >> >HSPACE=5>Subject Catalog</A>
                >>
                >> No, it's
                >>
                >> <A HREF="db=math/type=form"><IMG ALT="Source Type Catalog"
                >> SRC="/grafiken/new-right.gif" ALIGN=TOP BORDER=0 HEIGHT=15
                >> WIDTH=15 HSPACE=5>Source Type Catalog</A>
                >>
                >> and that "Source Type Catalog" link is really broken, even
                >> with a browser. (Opera)
                >>
                >> >
                >> >with a base of that page as
                >> >
                >> ><BASE HREF="http://www.MathGuide.de/cgi-bin/ssgfi/navigator2.pl/">
                >>
                >> No, it is
                >> <BASE HREF="http://www.MathGuide.de/">
                >
                >You are so right! Sorry for my blindness, these pages are built together dynamically and I didn't look right. It took me ages and three runs of Xenu to eradicate all the "navigator.pl"-links that -- while working -- created the erroneous links. I would have needed some additional back step in the error report:
                >
                >http://www.mathguide.de/cgi-bin/ssgfi/navigator.pl?db=math&type=form
                > http://www.mathguide.de/db=math/type=subj
                > \_____ error code: 404 (not found)
                >
                >Where is the first page linked from?

                Search it and press ALT-ENTER

                >I eventually created a site map that helped a little.
                >
                >> >2. I get the error message
                >> >http://www.liv.ac.uk/maths/
                >> > \_____ error code: 400 (no object data) and similarly,
                >> >http://www.google.com/Top/World/Deutsch/Wissenschaft/Mathematik/
                >> > \_____ error code: 404 (not found)
                >> >and
                >> >http://magma.maths.usyd.edu.au/magma/
                >> > \_____ error code: 401 (auth required) while the links seem to work
                >> >fine.
                >> >
                >> >I know of "forbidden requests" like
                >> >http://de.wikibooks.org/wiki/Regal:Mathematik
                >> > \_____ error code: 403 (forbidden request) but this seems to be
                >> >different and starts to get annoying.
                >>
                >> Yes, all these servers "hate" Xenu, as described here
                >> http://home.snafu.de/tilman/xenulink.html Nr. 20
                >
                >I wasn't aware of that. But it *is* a nuisance.
                >The page "http://www.andilinks.com/linkckg.shtm" referred to on xenulink.html Nr. 20 says:
                >"Be sure to mark those that deny Xenu so they can be easily excluded or remembered on the next pass."
                >I have no clue how I can exclude dozens (50?) of websites from being checked. Filling them into the tiny space at the bottom of the "Check URL" form seems pretty cumbersome; some additional "Exclusionlist" file might be helpful.

                Yeah, that part sucks. I should replace this with a normal big text box
                some day.

                >Actually, I already have problems dealing with the 15 or so exclusions I use, and I have some trouble managing my Xenu.ini file (about 1900 lines by now). Is there a description of the specifics of this file somewhere? Most of the syntax can be guessed, but the connection between the [Recent URL List] and the specific inclusions and exclusions isn't quite clear to me.

                No, you should guess :-) [Recent URL List] is for the combo box. There
                is also a general include / exclude in the ini file, and there are the
                same with a URL in it; these are URL-specific.

                My own Xenu.ini file is 300K.
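For an ini file that has grown to thousands of lines, a quick overview of its sections can help before editing it by hand. The following is only a sketch using Python's standard `configparser`; it assumes the file follows ordinary `[section]` / `key=value` ini conventions, and apart from `[Recent URL List]` the section and key names in the sample are hypothetical, not taken from a real Xenu.ini.

```python
import configparser


def summarize_ini(text):
    """Return a dict mapping each section name to its number of entries."""
    parser = configparser.ConfigParser(
        delimiters=("=",),      # URLs contain ':', so only '=' separates key and value
        strict=False,           # tolerate duplicate sections or keys
        allow_no_value=True,    # tolerate keys without a value
        interpolation=None,     # '%' in URLs must not be treated as interpolation
    )
    parser.optionxform = str    # keep key names case-sensitive
    parser.read_string(text)
    return {name: len(parser.options(name)) for name in parser.sections()}


# Hypothetical sample; only the first section name appears in this thread.
SAMPLE = """\
[Recent URL List]
URL1=http://www.MathGuide.de/

[Hypothetical Section]
first=value
second=value
"""

summary = summarize_ini(SAMPLE)
for name, count in summary.items():
    print(f"{name}: {count} entries")
```

To run it on a real file, read the file's text first and pass it to `summarize_ini`; the encoding of Xenu.ini is not stated in this thread, so that would need checking.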

                >
                >> >3. I am shown errors like
                >> >
                >> >http://www.mpib-berlin.mpg.de/DOK/metatagd.htm
                >> > http://www.mpib-berlin.mpg.de/de/DOK/metatagd.htm
                >> > \_____ error code: 404 (not found)
                >> >or
                >> >http://www.nist.gov/dads/
                >> > http://xlinux.nist.gov/dads//
                >> > \_____ error code: 404 (not found)
                >> >
                >> >Is there a way to find out how they relate to my website
                >> (www.MathGuide.de in this case)?
                >>
                >> Yes, see the redirection segment in the report
                >
                >Thanks, I found that. I suppose it would be too complicated to bring these two bits of information together automatically? I have loads of redirects and would try to fix the ones which are broken *and* permanent first. But they are not so easily spotted.

                Tricky indeed.
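Combining the two pieces of information can at least be sketched outside Xenu. The example below assumes the redirect entries and the broken URLs have already been copied out of the report by hand into simple Python records; the `Redirect` type and function name are hypothetical, and the 301 status for the nist.gov entry is an assumption for illustration, since the thread does not state it.

```python
from dataclasses import dataclass

# HTTP status codes that mean "permanently moved".
PERMANENT_CODES = {301, 308}


@dataclass(frozen=True)
class Redirect:
    source: str   # URL linked from the site
    target: str   # URL the server redirects to
    status: int   # HTTP status code of the redirect


def broken_permanent_redirects(redirects, broken_urls):
    """Return redirects that are permanent and whose target is broken."""
    broken = set(broken_urls)
    return [
        r for r in redirects
        if r.status in PERMANENT_CODES and r.target in broken
    ]


redirects = [
    # Status 301 is assumed here; the thread only shows the 404 on the target.
    Redirect("http://www.nist.gov/dads/", "http://xlinux.nist.gov/dads//", 301),
    # Temporary redirect reported in the thread; its target is not broken.
    Redirect("http://purl.org/dc/elements/1.1/",
             "http://dublincore.org/2010/10/11/dcelements.rdf", 302),
]
broken_urls = ["http://xlinux.nist.gov/dads//"]

to_fix_first = broken_permanent_redirects(redirects, broken_urls)
for r in to_fix_first:
    print(f"{r.source} -> {r.target} ({r.status})")
```

Only the entries that are both permanent and broken come back, which is the subset Thomas wanted to fix first.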

                >Another thing (I might have mentioned before): we use Dublin Core Metadata, which requires the following header information (see http://dublincore.org/documents/dc-html/):
                >
                > <head profile="http://dublincore.org/documents/2008/08/04/dc-html/">
                > <title>...</title>
                > <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" >
                > <meta name="DC.title" content="..." >
                > </head>
                >Unfortunately that means that I get a warning
                >
                >http://purl.org/dc/elements/1.1/
                >redirected to: http://dublincore.org/2010/10/11/dcelements.rdf
                >status code: 302 (object temporarily moved)
                >
                >for every single page on my site.
                >Would a "don't check 'http://purl.org/dc/elements/1.1/'" help in this case?

                Sure, simply exclude http://purl.org. Maybe I answered that last time.
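The effect of such an exclusion can be illustrated with a small prefix filter. This is a sketch of the general idea only: how Xenu matches exclusions internally is not described in this thread, and the one-prefix-per-line list format and the function names are assumptions.

```python
def load_exclusions(text):
    """Parse one URL prefix per line, skipping blank lines and '#' comments."""
    prefixes = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            prefixes.append(line)
    return prefixes


def is_excluded(url, prefixes):
    """Return True if the URL starts with any of the exclusion prefixes."""
    return any(url.startswith(prefix) for prefix in prefixes)


# Hypothetical exclusion list kept in a text file, shown inline here.
EXCLUSION_TEXT = """\
# hosts that redirect or reject the checker
http://purl.org
"""

prefixes = load_exclusions(EXCLUSION_TEXT)
urls = [
    "http://purl.org/dc/elements/1.1/",
    "http://www.nist.gov/dads/",
]
to_check = [url for url in urls if not is_excluded(url, prefixes)]
print(to_check)
```

A bare prefix such as `http://purl.org` also matches any other host whose name merely begins with the same characters, so ending the prefix with a slash is the safer choice when that matters.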

                >BTW, It seems that Xenu can check Tono's URLs just fine, but my respective mail didn't make it to the list yet.

                I'm not the moderator :)

                Tilman

                >
                >Thanks again
                >Thomas