
Re: [xenu-usergroup] Broken Link Question

  • Bruce Hartford
    Message 1 of 30 , Oct 3, 2010
      On 10/3/2010 3:25 PM, Shiner wrote:
       

      On 08/07/2010 16:57, Bruce Hartford wrote:

       

      I have a different question regarding broken links reported by Xenu.

      My website is http://crmvet.org. When I run Xenu against it and produce
      a report, it lists broken links by the page they are on, which is great
      and very useful. For example:

      http://crmvet.org/biblio.htm
      http://www.kimopress.com/
      \_____ error code: 12002 (timeout)

      Which I interpret to mean that the link to www.kimopress.com on the
      biblio.htm page is not working. So far, so good.

      But it ALSO lists broken links that do not appear to be on my website at
      all. For example:

      http://archives.cnn.com/1999/US/12/08/king.assassination.01
      http://archives.cnn.com/1999/US/12/08/king.assassination.01/
      \_____ error code: 404 (not found)

      http://cnnstudentnews.cnn.com/2000/LAW/06/08/henry.avants
      http://www.cnn.com/studentnews/2000/LAW/06/08/henry.avants
      \_____ error code: 404 (not found)

      This looks to me as if Xenu is reporting a broken link on the
      archives.cnn.com and cnnstudentnews.cnn.com sites. Am I misunderstanding
      this? If not, why is Xenu reporting on broken links that are not on my
      site?


      Do you have "check external domains" enabled?

      Yes, if I don't have that checked it appears to only check the internal links between pages within my site. I want to check my outgoing links to other sites, but I don't want to check links ON those other sites.

      Bruce
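
The two CNN entries quoted above fit a redirect pattern: the first line of each pair is the URL the site links to, and the second is the address it redirects to, which then returns 404. Below is a minimal Python sketch of how a report grouped "by page" could end up listing the redirecting URL as the "page". It is an illustration under that assumption, not Xenu's actual logic; `broken_links_by_page`, its arguments, and the `example.org` linking page are hypothetical.

```python
from collections import defaultdict


def broken_links_by_page(links, redirects, status):
    """Group broken URLs under the last URL that pointed at them.

    links:     list of (page, url) pairs found while crawling
    redirects: dict mapping a URL to the URL it redirects to
    status:    dict mapping a final URL to its HTTP status code
    """
    report = defaultdict(list)
    for page, url in links:
        referrer = page
        seen = set()
        # Follow the redirect chain; each hop becomes the new referrer.
        while url in redirects and url not in seen:
            seen.add(url)
            referrer, url = url, redirects[url]
        code = status.get(url, 200)  # URLs without an entry count as OK
        if code >= 400:
            report[referrer].append((url, code))
    return dict(report)


src = "http://archives.cnn.com/1999/US/12/08/king.assassination.01"
report = broken_links_by_page(
    links=[("http://example.org/page.htm", src)],  # hypothetical linking page
    redirects={src: src + "/"},
    status={src + "/": 404},
)
for page, broken in report.items():
    print(page)
    for url, code in broken:
        print("   ", url, "error code:", code)
```

In this sketch the linking page drops out of the report as soon as a redirect is followed, which is why the listed "page" can be a URL on someone else's site.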

    • Bruce Hartford
      Message 2 of 30 , Oct 4, 2010
        Well, now that I know that this is expected behavior, I'm cool. I did
        try to find this issue in the documentation, but if it's there I must
        have missed it. Perhaps something like, "When you look at your list of
        broken links (by page or link) you may see broken links from sites other
        than your own. The reason for this is [explanation]."

        Anyway, thanks for responding and clarifying that those mysterious
        broken links are not because I did something wrong.

        Bruce


        On 10/4/2010 1:47 AM, Tilman Hausherr wrote:
        > Sigh... this issue has really come up often in all these years, so maybe
        > I should really do something. It's a design issue, mostly. I can't list
        > all the URLs that link to the "mysterious" URL because there could be many,
        > and there might be a whole chain of redirections.
        >
        > What would be a solution?
        >
        > A small note like this?
        >
        >
        > Broken links, ordered by link:
        > http://archives.cnn.com/1999/US/12/08/king.assassination.01/
        > error code: 404 (not found), linked from page(s):
        > http://archives.cnn.com/1999/US/12/08/king.assassination.01
        > (Attention: the URL above redirects)
        >
        > 1 broken link(s) reported
        >
        > Return to Top
        >
        > Broken links, ordered by page:
        > http://archives.cnn.com/1999/US/12/08/king.assassination.01
        > (Attention: the URL above redirects)
        > http://archives.cnn.com/1999/US/12/08/king.assassination.01/
        > \_____ error code: 404 (not found)
        >
        > 1 broken link(s) reported
        >
        > Return to Top
        >
        >
        > Or a different text, like
        >
        > (Attention: see "List of redirected URLs" to find out more about this URL)
        > or
        > (Attention: enable "List of redirected URLs" to find out more about this
        > URL)
        > with an internal link to the section that deals with that URL.
        >
        > Then I'll probably get many mails from people who disabled that list in
        > the report but don't remember how to enable it :-(
        >
        > Tilman
        >
        >
        >
        > On Mon, 4 Oct 2010 09:55:29 +0200, "Fischer, Thomas"
        > <fischer@...-goettingen.de> wrote:
        >
        >> Hello Tilman,
        >>
        >> this could be the same issue as in my mail from 2010-05-20 (Very
        >> External Links Checked).
        >> It might be due to the information in Xenu's report. E.g. I got
        >>
        >> http://new.oberlin.edu/arts-and-sciences/departments/mathematics
        >> redirected to: http://new.oberlin.edu/arts-and-sciences/departments/mathematics/
        >> status code: 302 (object temporarily moved)
        >> linked from page(s):
        >> http://www.oberlin.edu/math/
        >>
        >> and found that I linked to
        >> http://www.oberlin.edu/math/
        >> which in turn redirects to
        >> http://new.oberlin.edu/arts-and-sciences/departments/mathematics/
        >> while http://www.oberlin.edu/math/ is out of my range.
        >> Could Xenu's information somehow be changed to reflect this situation
        >> more clearly?
        >>
        >> Best regards
        >> Thomas
        >> (and thanks for a great product!)
        >>
        >> From: xenu-usergroup@yahoogroups.com
        >> [mailto:xenu-usergroup@yahoogroups.com] On behalf of Tilman Hausherr
        >> Sent: Monday, 4 October 2010 05:56
        >> To: xenu-usergroup@yahoogroups.com
        >> Subject: Re: [xenu-usergroup] A different question about broken links
        >>
        >> See in the report, the redirection list. You probably link to these CNN
        >> sites that redirect.
        >>
        >> Tilman
        >>
        >> On Tue, 06 Jul 2010 08:41:40 -0700, Bruce Hartford wrote:
        >>
        >> >I have a different question regarding broken links reported by Xenu.
        >> >
        >> >My website is http://crmvet.org. When I run Xenu against it and produce
        >> >a report, it lists broken links by the page they are on, which is great
        >> >and very useful. For example:
        >> >
        >> >http://crmvet.org/biblio.htm
        >> >http://www.kimopress.com/
        >> >\_____ error code: 12002 (timeout)
        >> >
        >> >Which I interpret to mean that the link to www.kimopress.com on the
        >> >biblio.htm page is not working. So far, so good.
        >> >
        >> >But it ALSO lists broken links that do not appear to be on my website at
        >> >all. For example:
        >> >
        >> >http://archives.cnn.com/1999/US/12/08/king.assassination.01
        >> >http://archives.cnn.com/1999/US/12/08/king.assassination.01/
        >> >\_____ error code: 404 (not found)
        >> >
        >> >http://cnnstudentnews.cnn.com/2000/LAW/06/08/henry.avants
        >> >http://www.cnn.com/studentnews/2000/LAW/06/08/henry.avants
        >> >\_____ error code: 404 (not found)
        >> >
        >> >This looks to me as if Xenu is reporting a broken link on the
        >> >archives.cnn.com and cnnstudentnews.cnn.com sites. Am I misunderstanding
        >> >this? If not, why is Xenu reporting on broken links that are not on my
        >> >site?
        >> >
        >> >Thanks.
        >> >
        >> >Bruce
      • Tilman Hausherr
        Message 3 of 30 , May 16, 2011
          I hereby declare that today's beta is a release candidate.
          http://home.snafu.de/tilman/tmp/xenubeta.zip

          Please support me by testing your website with it.

          I will probably release the 1.3.9 version at the end of the week.

          Tilman
        • Jonathan Crane
          Message 4 of 30 , May 17, 2011

            Tilman, mind if I ask you to post the changes to an email message to the group?

            j

             

            Jonathan Crane

             

          • ultra_blue
            Message 5 of 30 , May 17, 2011
              Hi, Tilman:

              Is a change log available?

              Thanks!
              Greg


              --- In xenu-usergroup@yahoogroups.com, Tilman Hausherr <tilman@...> wrote:
              >
              > I hereby declare that today's beta is a release candidate.
              > http://home.snafu.de/tilman/tmp/xenubeta.zip
              >
              > Please support me by testing your website with it.
              >
              > I will probably release the 1.3.9 version at the end of the week.
              >
              > Tilman
              >
            • Tilman Hausherr
              Message 6 of 30 , May 17, 2011
                On Tue, 17 May 2011 16:26:08 -0000, ultra_blue wrote:

                >Hi, Tilman:
                >
                >Is a change log available?
                >
                >Thanks!
                >Greg

                Here's the "whatsnew" file:

                1.3.9

                Major improvements:
                16.4.2011-25.4.2011: Output duplicate content, title, description in the
                manager section

                Minor improvements:
                4.9.2010: excludeMSO behaviour without "/" now
                6.9.2010: remove "reset entry" from context menu when inactive
                7.9.2010: ContextMenuManager for VS2010
                29.9.2010: Report: "List of valid *internal* URLs you can submit to a
                search engine"
                12.10.2010: If-Modified-Since option in INI file
                18.10.2010: Max depth to 9999 instead of 999
                23.11.2010: clarify include/exclude text for wildcard version
                30.12.2010: don't open in internet archive etc when URL from internet
                archive
                1.3.2011: remove percents for URLs in properties dialog
                3.3.2011: remove percents in URLs in report
                9.4.2011: -post querystring for command line version
                15.4.2011: Keywords meta tag column (absolutely useless for google & co,
                but people believe in it)
                14.5.2011: Skip ldap:
                16.5.2011: about box with "FAQ" instead of "Click me", corrected main
                window title

                Bug fixes:
                10.9.2010: process CSS comments
                5.10.2010: remove quotes for charset component in HTTP header from nginx
                14.10.2010: mtime, not ctime for local files
                7.11.2010: Unicode comparison for local orphan search
                26.2.2011: flag ICU_NO_ENCODE in AfxParseURLEx() because of bug on
                Hebrew systems with "tav" character in URL
                28.2.2011: repaint visible line if charset changed
                3.3.2011: handle &#nnnn; and &#xnnnn; as unicode in ProcessLink() and
                others
                3.3.2011: don't reset charset for redirections
                4.3.2011: &#xnnnn; handling for anchors
                8.3.2011: can now check mail domains with no MX record, but with A
                record
                14.3.2011: corrected license text
                14.5.2011: later read jar contents marked as InJar; exclude paths of
                these from orphan check
                14.5.2011: Fixed abort box for local orphan search
                16.5.2011: Fixed toolbar gripper paint problem in XP
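
To illustrate the 3.3.2011 and 4.3.2011 entries above: `&#nnnn;` is a decimal and `&#xnnnn;` a hexadecimal numeric character reference, and both name a Unicode code point. A minimal Python sketch of that decoding using the standard library follows; `decode_link_text` is a hypothetical helper name, and this says nothing about how ProcessLink() is actually implemented.

```python
import html


def decode_link_text(raw):
    """Decode decimal (&#nnnn;) and hexadecimal (&#xnnnn;) references.

    html.unescape also decodes named entities such as &amp;.
    """
    return html.unescape(raw)


# Decimal reference for U+00E9 and hexadecimal reference for U+05EA (tav).
print(decode_link_text("caf&#233;"))
print(decode_link_text("page.htm#&#x5EA;"))
```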

                Misc:
                1.3.2011: remove side effect in csRemovePercents()
                4.3.2011: &#nnnn; central conversion routine
                9.4.2011: remove FORMTEST twice, focus on POST query string when
                checkbox set
                15.4.2011: CLinkInfo Archive format version 16 (Keywords)
                16.4.2011: CLinkInfo Archive format version 17 (MD5 hash)
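
The whatsnew file lists duplicate-content output and an MD5 hash in the archive format, but it does not say how the two are related. One plausible reading is that pages with identical bodies are found by comparing hashes. The Python sketch below shows that general technique only; `find_duplicates` and the sample data are hypothetical, and it is an assumption that Xenu works this way.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Return groups of URLs whose bodies are byte-for-byte identical.

    pages: dict mapping a URL to its body as bytes
    """
    by_hash = defaultdict(list)
    for url, body in pages.items():
        # Identical bodies always produce the same digest.
        by_hash[hashlib.md5(body).hexdigest()].append(url)
    return [sorted(urls) for urls in by_hash.values() if len(urls) > 1]


pages = {
    "/index.htm": b"<html>welcome</html>",
    "/default.htm": b"<html>welcome</html>",
    "/about.htm": b"<html>about</html>",
}
print(find_duplicates(pages))
```

Storing one short digest per URL keeps the comparison cheap even when the page bodies themselves are not kept in memory.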



                4.9.2010 (1.3.8)

                Major improvements:
                19.6.2010: check css @import statements within <STYLE>...</STYLE>
                check url() elements within <STYLE>...</STYLE>
                check url() element within STYLE=
                (dedicated to The gorgeous Princess Victoria of
                Sweden, whose
                wedding to Clark Kent contributed that there's really
                nothing on
                television besides herself and the soccer world cup :-) )
                See also who's got his hand:
                http://thausherr.blogspot.com/2010/06/prinzessin-im-griff.html
                20.6.2010: parse css files similar to <STYLE>...</STYLE>
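
As a rough illustration of what checking `@import` statements and `url()` elements involves, here is a small Python sketch that pulls link targets out of a stylesheet with regular expressions, after removing CSS comments. The function name `css_links` and the patterns are assumptions made for this example; a real CSS parser handles many more cases (escapes, nested parentheses in quoted strings), and nothing here reflects Xenu's own parser.

```python
import re

# Comments are removed first so that url() inside a comment is not reported.
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# url(target), url('target') or url("target"), with optional whitespace.
URL_FUNC = re.compile(r"url\(\s*(['\"]?)(.*?)\1\s*\)", re.IGNORECASE)
# @import "target" or @import 'target' (the url() form is caught above).
IMPORT_STR = re.compile(r"@import\s+(['\"])(.*?)\1", re.IGNORECASE)


def css_links(css):
    """Return link targets found in url() and @import "..." constructs."""
    css = COMMENT.sub("", css)
    found = [m.group(2) for m in URL_FUNC.finditer(css)]
    found += [m.group(2) for m in IMPORT_STR.finditer(css)]
    return found


sample = (
    "@import \"base.css\"; /* url(old.png) */ "
    "body { background: url( 'img/bg.png' ); } "
    "@import url(print.css);"
)
print(css_links(sample))
```

The same function could be applied to the text of a `<STYLE>` element, a `STYLE=` attribute value, or a standalone .css file.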

                Minor improvements:
                1.7.2010: sort "broken page-local links" section in report
                3.7.2010: url property dialog now resizeable
                6.7.2010: mailto with empty rest => "mailto:", not "mailto:@".
                24.7.2010: mailto:name%40host.com => mailto:name@...
                25.7.2010: all mailto: URLs of a host with successful DNS lookup are set
                to "skip type"
                27.7.2010: ditto also for previously failed mailto: URLs of that
                successfully looked up host
                27.7.2010: light green color for "mail host ok", which replaces text
                "skip type" for mailto:
                7.8.2010: renamed "maximum level" to "maximum depth"
                14.8.2010: GraphViz only for "ok" links
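
The mailto: entries above (6.7.2010 and 24.7.2010) describe small normalisation steps before the mail host can be looked up. A minimal Python sketch of that idea, using hypothetical helpers `normalize_mailto` and `mail_host` and the placeholder domain `example.com`; dropping everything after "?" is an extra assumption here, and the actual DNS lookup is left out.

```python
from urllib.parse import unquote


def normalize_mailto(url):
    """Decode percent-escapes such as %40; an empty address stays 'mailto:'."""
    rest = url.partition(":")[2]
    return "mailto:" + unquote(rest)


def mail_host(url):
    """Return the host part of a mailto: URL, or '' if there is none."""
    address = normalize_mailto(url)[len("mailto:"):]
    address = address.split("?", 1)[0]  # drop ?subject=... and similar
    return address.rpartition("@")[2] if "@" in address else ""


print(normalize_mailto("mailto:name%40example.com"))
print(normalize_mailto("mailto:"))
print(mail_host("mailto:name%40example.com?subject=hi"))
```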

                Misc:
                20.6.2010: changed link counting method, now in AddUrl
                4.7.2010: clean possible memory leaks when finishing; FreeLibrary() for
                DNSAPI.DLL
                7.7.2010: changed toolbars slightly, preparations for VS2010
                20.7.2010: for VS2010, expand application class with virtual INI
                functions because I hate the registry
                15.8.2010: "#" as error (not in public release)
                24.8.2010: DLL security: fully qualified path for LoadLibrary()

                Bug fixes:
                20.6.2010: Lower case in check for .gif, .png etc
                23.7.2010: corrected bug in change from 25.5.2010 "set recent URL list
                to 100 instead of 10"
                1.8.2010: correct bug about CCriticalSection usage for ServerMap and
                CharsetMap
                2.9.2010: fix for false alert in VS2010 buffer overflow check


                12.6.2010 (1.3.7)

                Minor improvements:
                12.6.2010: .class files that are in an external .jar file are marked as
                skipped
                ".class in Jar" property is now saved in .XEN file

                Bug fixes:
                14.6.2010: correct skip of ".class in Jar" property when choosing next
                thread
                set all unhandled ".class in Jar" URLs as "not found" when
                all else done

                Misc:
                12.6.2010: CLinkInfo Archive format version 15 (".class in Jar"
                property)



                11.6.2010 (1.3.6)

                Major improvements:
                24.2.2010: Check the domains of mail addresses (DNS lookup for MX
                record)

                Minor improvements:
                7.12.2009: Include PARSETEST4 section in general release (convert
                characters >80H to %XX, for "international" URLs)
                19.12.2009: For "international" characters in local files: Use Unicode
                for local directory search, URL launch in browser, read/check local
                files
                20.12.2009: But not for Windows 95/98/ME
                22.12.2009: add ".class" for applets if needed, replace "." with "/".
                example:
                http://www.colorado.edu/physics/2000/applets/bec.html
                27.12.2009: updated to NSIS 2.46
                10.1.2010: use version 6 list column sort arrows on XP and higher
                14.1.2010: added Description column
                15.1.2010: added warning when settings overwritten by profile
                16.1.2010: attempt at decoding .jar files for APPLET ARCHIVE thanks to
                http://www.codeguru.com/cpp/cpp/cpp_mfc/article.php/c4049/
                However:
                - only one .jar archive per applet
                - no unicode in file names
                - name of archive must end with .jar
                - .jar file must be internal, or the class link will
                remain broken
                - .class "in Jar" property isn't saved in .XEN file
                (which prevents standard access in favor of waiting for .jar lookup)
                24.1.2010: added <video src=
                27.1.2010: improved list control divider double click (title is the
                minimum)
                26.2.2010: improved extra text in domain mail check
                13.3.2010: Get page body only if not redirection or redirection but no
                "Location:" in header
                (should make PARSETEST3 fix superfluous)
                16.3.2010: ...
                30.3.2010: Abort box for ftp orphan search
                2.4.2010: [Options] Accept="*/*" (default value)
                14.4.2010-6.5.2010: milliseconds in duration
                12.5.2010: reset e-mail flag when loading .XEN file, because if set it
                would mail and quit after loading a finished job
                12.5.2010: include link text in report (LINKTEXT compile option)
                25.5.2010: set recent URL list to 100 instead of 10
                3.6.2010: version nr. in report
                6.6.2010: show count of included / excluded URLs in the report
                6.6.2010: Abort box for orphan search always

                Bug fixes:
                15.12.2009: PARSETEST4 section: replaced "> 80X" with ">= 80X"
                20.12.2009: added version check for Unicode Clipboard and Sitemap for
                Windows 95/98/ME (like 27.1.2009)
                21.12.2009: corrected broken banner links
                22.12.2009: tell "anchor occurs multiple times" only once per URL
                4.1.2010: remove stuff after "?" in mailto: due to Microsoft error in
                AfxParseURLEx()
                10.1.2010: fixed list column sort arrows wrongly displayed in unsorted
                columns (on 7, but not on XP)
                12.1.2010: fixed "//" bug in applet codebase in local url
                15.1.2010: disabled and unchecked "Inactive" checkbox after loading new
                profile
                18.1.2010: fixed title line of tab export
                20.1.2010: Don't assume URLs to be UTF-8, use current charset instead
                However: this solution isn't perfect, because the correct
                charset of an URL would be the referring URL
                But in most cases it will work, because URLs usually
                have the same charset
                Known bug: Root URL with exotic characters
                20.1.2010: Corrected exotic URLs in sitemap
                26.1.2010: Fixed % in file: URLs, only convert %XX
                27.1.2010: "Conversion to lowercase" option uses codepage for conversion
                31.1.2010: Fixed bug in report (max size + max size url), probably
                introduced on 15.1.2010
                15.3.2010: vNormalizeURL() with conversion to UTF8 prior to
                AfxMyParseURL()
                store URLs in UTF8, unless already ANSI or ISO-8859-1 (1252)
                vRemovePercents for display only
                3.4.2010: prevent reentrant calls to vDoIdle();
                set fileNotFound status if tmp URL content file deleted by
                antivirus software
                10.4.2010: replaced "> 80X" with ">= 80X" in vAnsi2EntityEscaped()
                30.4.2010: changed user agent with "/" as requested in
                http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.43
                and
                http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.8
                6.6.2010: add milliseconds in sum for manager statistics avg
                calculation
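
The "> 80X" versus ">= 80X" fixes above concern which bytes of an "international" URL get converted to %XX: byte 80H itself has to be escaped too. A minimal Python sketch of that conversion follows; `escape_high_bytes` is a hypothetical name, the charset is passed in explicitly, and this is not the code of the PARSETEST4 section or of vAnsi2EntityEscaped().

```python
def escape_high_bytes(url, charset="utf-8"):
    """Percent-encode every byte >= 0x80 of url, encoded in the given charset."""
    out = []
    for byte in url.encode(charset):
        if byte >= 0x80:  # ">= 80H", so that 0x80 itself is escaped as well
            out.append("%{:02X}".format(byte))
        else:
            out.append(chr(byte))
    return "".join(out)


print(escape_high_bytes("http://example.org/caf\u00e9"))
print(escape_high_bytes("http://example.org/caf\u00e9", "latin-1"))
print(escape_high_bytes("\u20ac", "cp1252"))  # the euro sign is byte 0x80 here
```

The same character produces different escapes depending on the charset, which matches the 20.1.2010 note that URLs should not simply be assumed to be UTF-8.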


                Misc:
                14.1.2010: CLinkInfo Archive format version 12 (Description)
                15.1.2010: CLinkInfo Archive format version 13 (size now 64 bit value)
                27.1.2010: OnNewDocument() with vNormalizeURL() instead of
                AfxMyParseURL()
                29.1.2010: OnNewDocument(): moved duplicate code to end
                5.5.2010: CLinkInfo Archive format version 14 (milliseconds)
                6.6.2010: MinSize, MaxSize unsigned


                5.12.2009 (1.3.5)

                Bug fixes:
                4.12.2009: Skip xmpp: and others properly
                4.12.2009: fixed another *.LNK file loss bug in NSIS script that would
                occur when installing in existing folder

                Misc:
                5.11.2009: processorArchitecture="*" in manifest
                28.11.2009: improved error messages for MultiByteToWideChar()
                29.11.2009: updated to NSIS 2.45
                1.12.2009: About box with correct spelling: "Xenu's"
                5.12.2009: created this version on new PC


                5.11.2009 (1.3.4)

                Minor improvements:
                30.5.2009: ignore "view-source:"
                1.6.2009: set SECURITY_FLAG_IGNORE_REVOCATION after
                ERROR_INTERNET_SEC_CERT_REV_FAILED (works only the first time, sadly)
                1.6.2009: ErrorDlg for ERROR_INTERNET_SEC_CERT_REV_FAILED only if
                SECURITY_FLAG_IGNORE_REVOCATION not set
                5.6.2009: set up minimum status line segment widths
                26.7.2009: Use local timezone when displaying date+time of website,
                instead of GMT
                29.8.2009: show time status every second
                9.10.2009: mention empty URLs in report to avoid confusion

                Bug fixes:
                20.10.2009: ignore MIME type and charset when result not HTTP_STATUS_OK
                5.11.2009: fixed /S setup.exe bug in NSIS script

                Misc:
                1.6.2009: ErrorDlg (certificates etc) now from app window, not desktop
                window
                9.10.2009: Test monocolor
                15.10.2009: merged AFX_INET_SERVICE_HTTP and AFX_INET_SERVICE_HTTPS in
                ThreadProcGET()
                16.10.2009: tired of character in version number, now using digits
                31.10.2009: VS2010 fixes: PFNCALLBACK, OnTimer (UINT_PTR), LRESULT
                OnFindReplace, INT_PTR lHnd
                3.11.2009: CLinkInfo Archive format version 11 (m_iThisURL ->
                m_dwThisURL)

                25.4.2009 (1.3c)

                Bug fixes:
                18.4.2009: Changed behaviour of google sitemap creation - only convert
                five characters to ampersand
                21.4.2009: Changed behaviour of google sitemap creation - Convert >80H
                characters to %XX

                Minor improvements:
                21.4.2009: Only take the first TITLE, not later TITLEs


                18.4.2009 (1.3b)
                Install this version if you're in China, use Windows 95/98/ME, or check
                sites with over a million URLs.

                Minor improvements:
                23.12.2008: ErrorDlg for ERROR_INTERNET_SEC_CERT_DATE_INVALID
                24.12.2008: http:// within the path, like in archive.org URLs is never a
                "//" error (see 30.11.2006)
                25.12.2008: ErrorDlg for ERROR_INTERNET_SEC_CERT_CN_INVALID and
                ERROR_INTERNET_SEC_CERT_REV_FAILED
                27.12.2008: Optimized array growth
                28.12.2008: Started IGNORETITLES compile option to save memory (ignores
                titles and server name)
                28.12.2008: Optimized charset, use ref in global hash table instead of
                CString (saves memory)
                29.12.2008: IGNORETITLES now ignores externals totally
                29.12.2008: improved speed for collecting pending URLs within visible
                section of xenu window
                29.12.2008: CCriticalSection for CharsetMap
                29.12.2008: Optimized server software name, use ref in global hash table
                instead of CString (saves memory)
                1.3.2009: Use PARSETEST version for general release
                18.4.2009: New NSIS installer script with much help from Andrey
                Aleksanyants

                Bug fixes:
                1.1.2009: Don't drop all input if </a> missing after <a...>
                12.1.2009: Made own upper case conversion (for users in China)
                27.1.2009: added version check, because Unicode API calls don't work for
                Windows 95/98/ME
                1.3.2009: corrected NL before charset in TAB export
                31.3.2009: corrected bug in high port numbers
                7.4.2009: fixed bug for PARSETEST version, moved replace "/./" with "/"
                higher in AfxMyParseURL()
                17.4.2009: report file not created message

                Misc:
                26.1.2009: replaced InitCommonControls() with InitCommonControlsEx()
                17.4.2009: attempt at html FORMs with POST query string (for FORMTEST
                version only)

                20.12.2008 (1.3)
                I've decided to call this version 1.3 instead of 1.2k. The international
                charset, the google maps,
                and the "ID=" anchor were widely requested, so I guess this is really a
                good leap forward.

                Major improvements:
                - 22.1.2008: UTF-8 in Xenu Window and report
                - 23.1.2008: all charsets in Xenu Window and report
                - 2.2.2008: parse charset meta tag (note that header settings have
                priority!)
                - 2.2.2008: improved speed of charset handling by using hash table
                - 29.2.2008: Google Sitemap
                - 25.7.2008: parse ID= anchor
                - 23.11.2008, 6.12.2008: GraphViz export

                Minor improvements:
                - 20.10.2007: Updated to InnoSetup 5.2.1
                - 28.11.2007: decode "encrypted" mailto (Use parse result for
                AFX_INET_SERVICE_MAILTO)
                - 1.12.2007: Updated to InnoSetup 5.2.2
                - 14.3.2007: Updated to InnoSetup 5.2.3
                - 22.1.2008: CLinkInfo Archive format version 10 (charset)
                - 8.3.2008: Accept-Language option (command line version only)
                -language en / de / ...
                - 10.4.2008: removed wsprintf() calls
                - 8.5.2008: passive ftp option for ftp URLs too (previously just in
                Orphan check) *** unfinished, not saved in .XEN; Version 11 ***
                - 31.5.2008: new icon, sponsored by www.hitflip.de, designed by Dominic
                Raths
                - 8.6.2008: InternetCloseHandle() with trace, also trace in ftp stuff
                - 9.6.2008: ShellExecute() separated in File and Path; new
                vShellLaunchURL() function
                - 29.6.2008: Google Sitemaps: higher priority for root URL
                - 5.7.2008: Don't quit if mail fails
                - 5.7.2008: xenulog.txt with date, too
                - 5.7.2008: Improved email dialogbox (disable fields)
                - 25.7.2008: Skip </...> in parser segment; changed bParseAnchorTag() to
                be more general
                - 5.8.2008: report: detect and report redirection loops
                - 9.9.2008: IMG LONGDESC
                - 10.10.2008: Abort Dialog for sitemap
                - 15.10.2008: Fixed C++ language issues (scope of variables in 'for'
                loop) for VS2008; #define _WIN32_IE 0x0400
                - 17.10.2008: manifest for common control XP look and feel;
                HOLLOW_BRUSH for Bitmap in Tip Dialog, solves problem in
                2005 attempt
                http://tech.groups.yahoo.com/group/xenu-usergroup/message/445
                positive side effect: can now display exotic
                charset in text control
                - 18.10.2008: manifest resource
                http://www.codeproject.com/KB/winsdk/xptheme.aspx
                - 18.10.2008: sort list of redirections in report
                - 31.10.2008: GetExitCodeThread() result when not STILL_ACTIVE
                - 2.11.2008: added <OBJECT DATA="...">
                - 6.12.2008: Added column to pagemap

                Bug fixes:
                - 13.12.2007: catch empty URL in HREF etc
                - 14.2.2008: corrected WCHAR divider size bug in MakeShortStringW()
                - 11.3.2008: Google Sitemap only for internal URLs; escape &'"<>
                - 26.3.2008: CXenuDoc()::m_bCheckExternal set to profile value
                - 9.9.2008: corrected bug in mailto (user name was missing)
                - 9.9.2008: loop detection algorithm in redirection report sometimes
                had an endless loop itself
                - 24.9.2008: check for ";" removed in ParseImgTag() and ParseAnchorTag()
                - 26.9.2008: reset charset only for HTML with bodies,
                because of http://www.adventure-inn.com/ch/description/
                - 1.10.2008: URLs are also UTF-8
                - 1.10.2008: Clipboard URL copy in Unicode format
                - 5.10.2008: IDC_URL in Property Dialog also UTF-8
                - 11.10.2008: All fields in Property Dialog now in UTF-8
                - 2.11.2008: No double separator in context menu for local files with
                non-existing MIME types


                Misc:
                - 19.10.2008: ShellExecute with 0 as first param
                - 7.11.2008: Orphan size as LONGLONG
                - 19.11.2008: PARSETEST3 version for % stuff in redirections
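
The redirection loop report (5.8.2008) and the later fix for a detector that could itself loop forever (9.9.2008) suggest something like the following. This is only a sketch under assumptions: the name `find_redirect_loop` and the dictionary of redirections are hypothetical, and Xenu's real algorithm is not described in this changelog. Remembering every URL already visited guarantees that the walk terminates.

```python
# Hypothetical sketch of redirection loop detection; not Xenu's own code.
# 'redirects' maps a URL to the URL it redirects to (an assumed structure).
def find_redirect_loop(start, redirects):
    """Follow redirections from 'start'; return the loop as a list, or None."""
    position = {}   # URL -> index in 'path' where it was first seen
    path = []       # URLs visited so far, in order
    url = start
    while url in redirects:
        if url in position:
            # Seen before: everything from its first visit onwards is the loop.
            return path[position[url]:]
        position[url] = len(path)
        path.append(url)
        url = redirects[url]
    # Reached a URL that does not redirect any further: no loop.
    return None


if __name__ == "__main__":
    table = {"/a": "/b", "/b": "/c", "/c": "/b"}
    print(find_redirect_loop("/a", table))
```

Because each URL is recorded at most once, the loop runs at most as many times as there are redirections in the table.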


                8.10.2007 (1.2j)
                Major improvements:
                - 5.6.2007: second options pane with 7 "secret" settings
                - 7.7.2007: up/down sort symbol on column header
                http://www.codeguru.com/cpp/controls/listview/advanced/article.php/c4179/
                Minor improvements:
                - 4.10.2006: visible URLs are first in new threads
                - 4.10.2006: update listctrl when "busy" is set
                - 7.10.2006: 2nd part of report more efficient for huge sites
                - 12.10.2006: REMOVEDOUBLESLASH compile option removes "/../" too
                - 15.10.2006: application/xhtml+xml is hypertext, too
                - 15.10.2006: Updated to InnoSetup 5.1.8
                - 30.10.2006: Skip aim://, ymsgr://, rtsp://, xmpp://
                - 30.11.2006: better error message for ShellExecute() errors
                - 30.11.2006: "//" in URL after the host name is not "broken" when after
                a "?"
                - 8.1.2007: Max title length 1024
                - 16.1.2007: ftp dialogbox wider
                - 19.1.2007: [Options] MakeLowerCase=1 ==> converts all URLs to lower
                case
                (default is 0)
                - 3.3.2007: [Options] ListLocalDirectories=1 ==> local directory
                listing (default is 0)
                - ??.3.2007: [Options] AllowLocalFilesInRemoteCheck=1 ==> Allow
                file:// links in remote check
                (default is 0)
                - 16.3.2007: Skip callto:
                - 25.3.2007: meta generator
                - 31.3.2007: Upgraded to InnoSetup 5.1.11
                - 31.3.2007: Title TrimRight()
                - 31.3.2007: update listctrl when title becomes known
                - 31.3.2007: convert titles in sitemap to &...; notation
                - 1.4.2007: Added most of
                http://www.htmlhelp.com/reference/html40/entities/special.html to
                conversion table
                - 29.5.2007: "asterisk" sound when done
                - 2.6.2007: -save option for command line version to save .XEN file
                (does overwrite)
                - 2.6.2007: all command line options for command line version can now be
                combined
                - 5.6.2007: MakeLowerCase, vNormalizeURL() slightly changed internally
                - 6.6.2007: .XEN Archive version 10
                - 8.6.2007: "Autostart" feature when opening .XEN file
                - 8.6.2007: all command line options for command line version can be
                used when opening .XEN file
                - 28.7.2007: retry feature in command line version (test)
                - 3.8.2007: Upgraded to InnoSetup 5.1.13
                - 15.8.2007: reset sort icon, and vUpdateColumnSortIcon() at InsertAll()

                Bug fixes:
                - 7.12.2006: check for iIndex < pList->GetItemCount()
                - 13.2.2007: corrected bug in ListLocalDirectories feature (last file
                ignored)
                - 15.2.2007: wildcard version adds "*" at the end of each entry in
                "Check URL list"
                - 23.5.2007: aim: instead of aim://
                - 20.8.2007: remove "file://" for ShellExecute()
                - 21.9.2007: % size corrected in statistic (was % count!)
                - 22.9.2007: fixed CFindFile security leak,
                http://goodfellas.shellcode.com.ar/own/VULWKU200706142

                1.10.2006 (1.2i)
                Major improvements:
                (none)
                Minor improvements:
                - 25.6.2006: Property dialogbox with count
                - 25.7.2006: Added orphan size
                - 19.8.2006: PARSETEST2 compile option (restore all %XX, like 23.7.2005)
                - 19.8.2006: Updated to InnoSetup 5.1.6
                - 9.9.2006: NEW Dialog Box wider
                - 16.9.2006: [Options] MaxRetry
                - file %TEMP%\XENULOG.TXT for people who have trouble launching the
                browser
                (the file is not automatically sent to anyone, this must be done
                manually)
                Bug fixes:
                - 6.6.2006: vNormalizeURL (csBaseURL);
                - 24.7.2006: Microsoft bug in CStdioFile::ReadString workaround
                (happened with files with a multiple of 128 with no CR on
                last line)
                http://www.mpdvc.de/html.htm#Q71
                http://avensoft.biz/kb/kbDetail.wsp?kb_id=162
                - 26.7.2006: Added missing HTTP status codes (412-415)
                - 30.7.2006 - 2.8.2006: corrected many HTML errors in report (Thanks
                Spike!)
                - 14.9.2006: corrected bug in 18.3.2006 feature that made Xenu slow when
                unfinished URLs only at the bottom of huge URL list
                - 18.9.2006: corrected error handling for smtp.Connect()
                - 10.11.2006: total elapsed hours, instead of modulo 24 in status line

                2.6.2006 (1.2h)
                Major improvements:
                - Tip of the day
                Minor improvements:
                - ALT part of <IMG > used for the title column
                - [Options] FailSimilarHosts=0 (current behaviour and default is 1)
                - more statistics for managers (min size with link, max size with link,
                avg size)
                - "In Links" and "Out Links" in headings for better readability when
                small
                - correct error message for empty ftp orphan directories
                - error message for empty local orphan directories
                - error message for non existing local orphan directories
                - orphan list sorted
                - (Test / by request only) IGNOREFRONTPAGEORPHANS
                - ftp host field allows port number, ftphostname:port
                - ftp dialog fields stored in .INI file
                - ftp default page (e.g. index.html, home.html, default.asp, etc)
                - ftp dialog does not appear when Xenu is launched with "-url", but is
                still available in "corporate" version
                - ReportBroken2 more efficient
                - 8.6.2005 Updated to InnoSetup 5
                - slight change in .TAB format: Status-Code and Status-Text instead of
                Status only
                - prevent empty input in NEW dialog
                - Ignore "error" HTTP_STATUS_ACCEPTED (for user with VMware, host Fedora
                Core 9 who has NAT problems)
                - changed handling of "%XX" with file:// orphan files
                - 23.7.2005: AfxMyParseURL removes "%XX" with file:// URLs
                - include/exclude wildcard test thanks to
                http://www.codeproject.com/string/wildcmp.asp
                - better text for ftp orphan dialog
                - 18.3.2006: currently selected URL is first next new thread
                - 19.3.2006: ftp/gopher segment only when such URLs exist
                - 19.3.2006: put include/exclude settings into report
                - 1.4.2006: link to Google Sitemaps in report
                Bug fixes:
                - in file://///UNC-Host/Share, leading "//" is not an error
                - &#xnnn; now recognised (in addition to &#nnn;)
                - vNormalizeURL() when reading URL List
                - need space or semicolon before a "name", "href", etc
                - process % when checking an ftp URL on an ftp server
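
The "&#xnnn; now recognised (in addition to &#nnn;)" fix above concerns numeric character references. A minimal sketch of decoding both forms follows; the function name `decode_numeric_refs` is hypothetical and the code is an illustration in Python, not Xenu's vReplaceAmpStuff().

```python
# Hypothetical sketch: decode decimal (&#nnn;) and hexadecimal (&#xnnn;)
# numeric character references. Named entities such as &amp; are left alone.
import re

_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(text: str) -> str:
    def replace(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code)
        except (ValueError, OverflowError):
            # Not a valid code point: keep the reference as written.
            return match.group(0)

    return _NUMERIC_REF.sub(replace, text)


if __name__ == "__main__":
    print(decode_numeric_refs("caf&#233; and caf&#xE9;"))
```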


                18.3.2005 (1.2g)
                Major improvements:
                - Attempt at javascript thanks to
                http://www.codeguru.com/Cpp/Cpp/string/regex/article.php/c2779/
                details explained at
                http://home.snafu.de/tilman/xenulink.html#javascript
                Minor improvements:
                - [Options] ExcludeMSO=1 and Xenu ignores URLs that end with
                /filelist.xml
                /editdata.mso
                /oledata.mso
                - Show elapsed time in status bar [15.1.2005 changed archiving format]
                - TARGET=_blank instead of TARGET=Xenu in report
                - New Version 2.44 of CSMTPConnection http://www.naughter.com/smtp.html
                - "//" in local files is always an error
                - mailed report as "XXXX.htm" instead of "XXXX.tmp.htm"
                - Version String in .XEN file
                - vTimeoutSimilarHosts() more efficient with huge sites
                - Faster local link checking (no copying to %temp% file)
                - HTTP_STATUS_REDIRECT_KEEP_VERB (307)
                - ERROR_INTERNET_CLIENT_AUTH_CERT_NEEDED (12044) error handling
                - passive ftp mode in orphan dialog box
                - Send XENU.INI file as mail test instead of CONFIG.SYS
                - Orphan check also for https://
                - "New" Dialogbox can be used to enter a ftp link (no crawling!)
                - Cookies allowed when [Options] AllowCookies=1
                don't use this if you have links that delete or change something!
                - (Test / by request only) PARSETEST, ORPHANS_CASEINSENSIVE
                Bug fixes:
                - better error handling for error 12003 in FTP orphan check
                - _findclose in local orphan check (to unlock directory!)
                - /> bug fixed in META REFRESH
                - &# handling in vReplaceAmpStuff() and in bProcessLink()
                - handle redirection target as a possibly relative link
                - No empty URLs in URL list
                - date, size for file:///
                - alexa, google cache and wayback only for http:// and https://
                - offset in ParseTag as int instead of short for tags > 64K
                - cut off after '?' in remote orphan check
                - exclude excluded URLs in Orphan list
                - WINVER 0x0400

                6.8.2004 (1.2f)
                Major improvements:
                - Real setup (InnoSetup 4)
                - Status code for redirections
                - Context menu: Open in Google Cache
                - Context menu: Open in Wayback Machine
                - Context menu: Open Alexa
                Minor improvements:
                - report as "XXXX.htm" instead of "XXXX.tmp.htm"
                - Max-Level also "connected" to the URL
                - Compiled on Windows XP
                - List of unfinished threads when closing
                - Don't display ODP context menu for broken http://editors.dmoz.org
                links
                - "Display error" mentioned in Properties Box when too many links
                - Look for subdirectories when doing orphan searches
                - Remove "file:///" when launching local URLs without DDE
                - Change "\" to "/" for "file://" URLs because of problems with Opera
                7.5
                Bug fixes:
                - Deletes TGH*.* files also when limited number of levels
                - Can work with http://www.dbdebunk.com: "location: " instead of
                "Location: "
                - Correct time in report (minute and second were mismatched)
                - ReportStatistics and ReportOrphans flags in .XEN file
                - No error message when click "not on a line"
                - Prevent re-entrancy of vAttention() when e-mailing report

                28.9.2003 (1.2e)
                Major improvements:
                - Remote Orphans
                - Bugfix for sites > 65535 links: m_FromTab set to 32bit
                - timeout feature (default: 60 secs)
                - STOP button in addition to the PAUSE toolbar button
                - Scan https:// websites with bad certificate (ERROR_INTERNET_INVALID_CA
                = 12045)
                - Validate URL with right mouse click
                Minor improvements:
                - Skip irc://, mms://, rtsp://, pnm://, wtai://
                - <hr> instead of "=========" in report
                - </li> in report, so that it is correct HTML
                - "Normalization" of URLs in include/exclude list
                - Len = 0 when file error with http GET
                - OpenRequest() with INTERNET_FLAG_NO_COOKIES
                - Site Map recursion warning
                - "//" in URL after the host name is not "broken" when part of
                "http://" or "https://"
                - empty line in report after local link error for a page
                - Local orphans case insensitive
                - Automatic retries only when m_bBusy
                - CInternetSession local to thread, to make STOP possible
                - "http://dmoz.org" instead of "http://dmoz.org/" comparison, to avoid
                extra menu item for dmoz-internal links
                - Properties at right-mouse-key always the last item
                - Make current item visible after sort
                - More random spidering to balance the load
                - URL sort case insensitive
                - Buffer overflow bug in unknown errors removed
                [29.5.2005 reinstalled VC++ after HD loss]

                14.9.2002 (1.2d)
                - "//" in URL after the host name is not "broken" when "http://" or
                "https://" after a "?"
                - Corrected bug that local non-HTML files would be downloaded in full
                - Corrected GUI bug in "new" dialog
                - Converted %5F to _
                - Change in cmdline version about profile reading
                (Matching now done before Normalization)

                16.7.2002 (1.2c)
                - <BLOCKQUOTE CITE
                - Consider nonexistent types like "httttp" as "not found"
                - Editing of ODP websites in the right-mouse-menu
                (useful for editors at http://dmoz.org)
                - For local files, launch related applications (e.g. viewer, editor)
                with the right-mouse-menu
                - Corrected bug that had root page twice in Xenu list
                - "//" in URL after the host name is always an error
                - Prevent closing when threads running
                - "R" launches "Properties" in right-mouse-menu
                - Save directory of "Browse" location
                - Enlarged "New" Dialogbox
                - Retry also for error 403
                - Local checks for "#"
                - HTTP_STATUS_PROXY_AUTH_REQ handling not dependent on password setting
                - Ignore "error" HTTP_STATUS_RESET_CONTENT (at
                http://www.vietnamthink.com/ )
                - Corrected '%' bug with Orphan files
                - "\" not a bug when after a "?"
                - Correct # of Threads and URLs in status line when finished
                - Corrected Bug with stuff like "nohref=" or "classname=" inside

                30.11.2001 (1.2b)
                - !!!!! Moved the xenu.ini file from
                \windows or \winnt to the current working directory
                - Corrected bug with </Script>
                - <TR BACKGROUND
                - <TH BACKGROUND

                6.10.2001 (1.2a)
                - extra column: time spent
                - Correct count for broken links in report
                - Can get size of some ftp files
                - <TABLE BACKGROUND
                - Append header information from redirected files even if a body exists,
                because of http://wap.loop.de
                - Look up MIME type for local files
                - Unofficial Option in XENU.INI:
                [Options] UseDDE=0 to disable DDE on some systems
                - Combined html and wml (WAP) scanning
                - <INPUT SRC="image.gif"> checked
                - Skip <SCRIPT>...</SCRIPT>
                - Logo in About-Box changed
                - Min Level can be 0
                - CTRL-Numpad-ADD to resize all columns
                - Attempt at Orphan files
                - Improved speed
                - Better method for Url lookup
                - no UrlTable search in ctor of CLinkInfo
                - check for "txt", "jpg" etc more efficient
                - m_csRootURL tested in bIncluded()
                - CLinkInfo::vAddFromURL more efficient
                - Internal function bHasBrokenToURLs() more efficient
                - Corrected weird bug in initial Combo-Box
                - Changed Text in NEW Dialogbox
                - Compiled with VC++ 6

                22.7.2001 (1.1f)
                - Changed User-Agent string to
                Xenu Link Sleuth
                because of problems with many websites, e.g. www.sptimes.com

                21.7.2001 (1.1e)
                - CTRL-W and CTRL-Q shortcuts for Close and Exit
                - Ability to consider hard redirections as errors
                - Changed character in User-Agent string from ' to ´

                2.7.2001 (1.1d)
                - new error "no info to return" for empty web pages
                - corrected bug about saving to tab file when file exists
                - added statistics for managers :-)
                - HEAD command also for .zip, .exe .swf (saves bandwidth)
                - serializing requests for name/password
                - changed include/exclude so as to work only on the *beginning* of URLs
                (don't forget to start them with "http"!)
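
Matching include/exclude entries only against the *beginning* of a URL can be sketched as a simple prefix test. The function `is_included` and its rules (exclude wins over include, an empty include list admits everything) are assumptions made for illustration; the changelog does not spell out how Xenu combines the two lists.

```python
# Hypothetical sketch of prefix-based include/exclude matching.
# Assumptions: exclude takes priority; an empty include list means "all".
def is_included(url, include, exclude):
    """Return True if 'url' should be checked."""
    if any(url.startswith(prefix) for prefix in exclude):
        return False
    if not include:
        return True
    return any(url.startswith(prefix) for prefix in include)


if __name__ == "__main__":
    include = ["http://example.com/"]
    exclude = ["http://example.com/private/"]
    print(is_included("http://example.com/docs/a.html", include, exclude))
    print(is_included("http://example.com/private/x.html", include, exclude))
    # An entry without "http" never matches the beginning of a URL:
    print(is_included("http://example.com/", ["example.com"], []))
```

The last call shows why entries have to start with "http": a bare host name is found in the middle of the URL, not at its beginning.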

                (1.1c)
                - Added some extra error messages
                - Saving columns width
                - Adjusting column width with double-click
                - e-mail feature
                - removed mailto:www-request@... from report
                - added LAYER SRC, IFRAME SRC and IMG LOWSRC
                - sort URLs in broken link section of the report
                - HEAD command also for .txt, .png, .rtf and .pdf (saves bandwidth)

                (1.1b)
                - Added <TD BACKGROUND="">
                - file:/// instead of file://
                - added BGSOUND
                - Compiled with VC++ 5.0, smaller
                - Can now launch URLs even with registry poorly configured
                - URLs of the report open in new window
                - Property box with Link Text / Title
                - URLs for include/exclude are "bound" to the URL

                (1.1a)
                - [ and ] in URLs
                - corrected bug in CODEBASE (must add "/" if not there)
                - corrected bug that deleted include/exclude fields
                - improved include/exclude dialog
                - added text for error 300
                - corrected bug about password sites

                (1.0w 14.4.2000)
                - PLUGINSPACE in EMBED tag now checked
                - APPLET now checked, with CLASS and ARCHIVE, relative to CODEBASE

                (1.0v 7.4.2000)
                - EMBED tag now checked
                - "Options" in the "New" dialog
                - "Return to top" in Report
                - Corrected bug in site map: broken links are not included
                - Now converting more &blah; characters
                - Titles get also converted
                - converting &blah; characters before normalizing
                - convert &# characters in URL
                - can now handle URLs like http://user:password@host/ or ftp or https
                - export always exports *all*, regardless of the view.
                - sadly an old bug is back in: URLs with "\" are not recognised as
                broken.
                - Links that start with "/../" are considered to be broken

                (1.0u 15.10.1999)
                - "skip these" feature - this really excludes URLs
                - &U for Check URL menu

                (1.0t 9.9.1999)
                - corrected /./ bug
                - added CTRL-B to switch between views

                (1.0s 12.8.1999)
                - "normalizing" received URLs. Advantage: hostnames always converted
                into lower case.
                - considering all pending URLs with the same host as failed when
                timeout, connection failed, or similar
                - moved the "Browse..." button
                - changed the URL combining method, now using Microsoft's
                InternetCombineURL() instead of my own algorithm
                - proxy authentication now supported
                - corrected bug with '
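
The "normalizing" entry above says that host names are always converted to lower case. A rough Python illustration of that one step is given below. Xenu itself uses WinINet calls such as InternetCombineURL(); the function `normalize_host_case` is hypothetical and covers only the case conversion, leaving path, query and any user:password part untouched.

```python
# Hypothetical sketch: lower-case the scheme and host name of a URL,
# leaving path, query, fragment and user:password as they are.
from urllib.parse import urlsplit, urlunsplit


def normalize_host_case(url: str) -> str:
    parts = urlsplit(url)  # urlsplit already lower-cases the scheme
    # Keep "user:password@" unchanged; only the host[:port] part is lowered.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(normalize_host_case("HTTP://WWW.Example.COM/Path/Index.html"))
```

With normalized host names, two links that differ only in the spelling of the host end up as one entry in the URL list instead of two.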

                (1.0r 29.5.1999)
                - Corrected bug with image maps

                (1.0q 29.5.1999)
                - include titles of links
                - include / exclude
                - allowing the use of '
                - corrected bug re: e.g. "src" being used *before* the actual "src" word
                - new tags: link, script (the applet tag will come in a later version)
                - removed empty <ul></ul> sequences in the report
                - date in the title of the report
                - corrected bug re: HTML pages with CR only
                - set "text/html" for local files
                - save size of columns

                (1.0p 8.1.1999)
                - corrected bug about URL-in-URL
                - convert & when in URLs
                - REFRESH META Tag
                - Focus set to OK after entering local file
                - remote URLs with "\" now always fail (because netscape cannot handle
                them)

                13.10.1998 (1.0o)
                - corrected bug that prevented checking local files with a space in it
                - corrected bug that thread count was not updated when finished
                - corrected bug that ignored http:/host/directory error
                - added banners

                If anyone has locations that offer banners, please e-mail me.
                I would advertise for non-profit organisations that deal with
                human rights or environmental topics. Attention - I will only
                use banners that I like, and link to organisations that I like.

                5.9.1998 (1.0n)
                - Can check local files - useful for people who don't want to install
                a local WWW server; simplified toolbar / initial window
                - "Check External" in INI file for new windows
                - "Show Broken Links Only" in INI for new windows
                - Corrected "//" bug for www.workstation.digital.com
                - Added random seed for banner (actually, uploaded this already on 17.7)
                - included HTML file in the ZIP file
                - RegisterShellFileTypes(FALSE) to prevent the "new" and "print"
                in the registry for new users
                - Errors between 1 and 199 are also "errors"
                - maximize MDI child when opening
                - Randomize checking, so that there is less volume on just one host
                (reduces peak volume on the ISP who hosts the site being checked)
                - Slight change in report because of OPERA bug with <PRE> after <H2>

                16.7.1998 (1.0m)
                - Added banners in report
                - corrected the "406" bug

                24.6.1998 (1.0l)
                - Added a column at the right (error text).
                - removed "DELETE_ON_CLOSE" technique, didn't work on Windows NT
                due to different OS behaviour. Sorry!

                5.6.1998 (1.0k)
                - Changed ftp access completely. It is now reliable, but won't work with
                proxies.
                - more than 32767 URLs
                - Optimized HTML parser

                18.4.1998 (1.0j)
                (I was on vacation, and I am still behind in my other activities,
                so no "big" new feature this time)
                - no need to enter "http://" in the NEW dialog box
                - Cool Xenu icon! See on the page above.
                - CTRL-R for "retry broken links"
                - Removed "search" from context menu (nothing was associated with it)

                6.2.1998 (1.0i)
                - URL launch should now work properly with Netscape Communicator

                1.2.1998 (1.0h)
                - added "export to TAB separated file" for Excel (for Marc)
                - added max level
                - 100% CPU usage problem solved (Miguelito) / changed idle processing
                - Site Map
                - URL launching improved (but still not perfect)

                25.12.1997 (also 1.0g)
                - corrected "%26" endless loop bug (Electronic Telegraph)

                24.12.1997 (1.0g)
                - added lots of new options (for Stu)
                - choose what you want to have in the report
                - choose to "fail" passworded sites
                - changed the way that URLs are launched: now with DDE so that only
                one instance (but another window) of Netscape comes up. Behaviour
                with IE and Opera might be different
                - corrected "text/html;...." bug (for Hanno)
                - you can now launch URLs with ENTER
                - you can now get the property box with ALT-ENTER
                - force reload for every call --> INTERNET_FLAG_RELOAD (for Doug)
                - changed initial dialog box, after two users didn't realize that one
                has to input only one URL, and not every page of the site
                - removed unused toolbar icons and menu elements

                23.11.1997 (1.0f)
                - corrected bug that made it difficult to check local or very fast sites
                - corrected minor bug in Properties Dialog
                - Added column with link level
                - Added error message for wrong input
                - Added different tries for image maps

                12.10.1997 (1.0e)
                - list of redirected URLs (useful because certain ISPs, e.g.
                www.primenet.com do not provide proper error returns, instead they
                redirect to an error page)
                - checking of targets of redirected URLs (this often leads to more
                broken links, as lots of sites make automatic redirection without
                checking if the target site exists)
                - ftp & gopher list for manual check
                - added tips on how to repair broken links in the FAQ
                - retry mechanism enhanced (for sites that fail with the HEAD command)
                - error handler improved (open file problem)
                - status line accuracy improved

                7.9.1997 (1.0d)
                - "Find" dialog box
                - # of threads can be configured (watch your TCP/IP line glow!)
                - corrected bug related to titles that do not end
                - added authorization for "simple" password sites (HTTP error 401)
                (will not work with web-based passwords, e.g. NY Times)

                24.8.1997 (1.0c)
                - HTML report, so that you can view with your browser
                - % of checked URLs in the status bar
                - URL list to choose from in "new" dialog box
                - Automatic retry with GET when certain conditions are
                met that suggest that the server cannot process the HEAD
                command (www.amazon.com , www.wildkidz.com, www.dejanews.com )
                - corrected display bug in "Reset Item" feature
                - corrected bug when http:// appears in the middle of a URL
                (www.sueddeutsche.de used this)
                - corrected bug that incorrectly processed URLs that started
                with a space
                - corrected bug when saving while busy, that made reloading crash
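The HEAD-then-GET retry mentioned in the 1.0c notes can be pictured roughly as below. This is only a sketch, not Xenu's code: the changelog does not say which conditions trigger the retry, so treating HTTP 405 and 501 as "server cannot process HEAD" is an assumption, and `check_url` and its `opener` parameter are hypothetical names.

```python
import urllib.error
import urllib.request


def check_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status for url, trying HEAD first and GET as a fallback."""
    last_status = None
    for method in ("HEAD", "GET"):
        request = urllib.request.Request(url, method=method)
        try:
            with opener(request, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as error:
            last_status = error.code
            # Assumed heuristic: 405 / 501 suggest the server rejects HEAD,
            # so the same URL is requested again with GET.
            if method == "HEAD" and error.code in (405, 501):
                continue
            return error.code
    return last_status
```

Any other error status (a 404, for instance) is returned as-is without a second request, so a genuinely missing page is still reported as broken.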

                15.8.1997 (1.0b)
                - <BASE HREF="url"> now handled correctly (www.trancenet.org used it)
                - "Reset Item" feature to recheck a single broken URL
                - Automatic saving of window placement in INI file
                - Error msg when trying to check non-http/https sites
                - Reports are deleted when the next report is made
                (*** Please go to your temp directory and delete all the TGH*.* files)
                - "Scroll bug" found and removed!
                - Now possibility to check your bookmark file
                - found column click bug, corrected, implemented time sorting
                - New column: server.
                - New column: title.
                - Properties Dialog Box

                10.8.1997
                - ability to save & restore
                - complete list of URLs (good to submit to a search engine)
                - new icons
                - # of threads in status line
                - correct size of dynamic html files
                - "copy" and "launch URL" function in menu and popup menu
                - launch report all the time
              • tarastockford
                Message 7 of 30 , May 18, 2011
                  Hi Tilman

                  The duplicates feature is useful, thanks. It would be good to have separate subheadings for duplicate content, duplicate titles and duplicate descriptions if possible, instead of them being all mixed together.

                  The release candidate is working well for me so far.

                  Thanks

                  Tara
                • Tilman Hausherr
                  Message 8 of 30 , May 18, 2011
                    Hi Tara,

                    While what you write does make sense, the problem is that I should
                    report dup titles only if the pages themselves are not duplicates, i.e. it
                    is all connected. So if I were to separate the three, I would have to go
                    through my URL lists three times instead of just once, i.e. it would be
                    even slower. For a website with 70000 URLs, the manager statistics
                    currently take about 15 seconds. Sure, I could just write the segments
                    into temporary files and then put it back together, but that would also
                    be more work...

                    So I'll keep this mail but wait to see if more people complain about that :-)

                    I might also separate that part of the report from the manager section
                    and create a "SEO fans" section...

                    Tilman
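The rule described above ("report dup titles only if the pages themselves are not duplicates") can be illustrated with a small sketch. It is not how Xenu is implemented; the `(url, content_hash, title)` tuple layout and the function name `duplicate_report` are hypothetical, and descriptions are left out to keep it short.

```python
from collections import defaultdict


def duplicate_report(pages):
    """pages: iterable of (url, content_hash, title) tuples (hypothetical layout)."""
    by_content = defaultdict(list)
    by_title = defaultdict(dict)
    for url, content_hash, title in pages:  # one pass over the URL list
        by_content[content_hash].append(url)
        # Keep one representative URL per distinct content for each title,
        # so identical pages are not also counted as duplicate titles.
        by_title[title].setdefault(content_hash, url)

    dup_content = [urls for urls in by_content.values() if len(urls) > 1]
    dup_titles = {
        title: list(reps.values())
        for title, reps in by_title.items()
        if len(reps) > 1
    }
    return dup_content, dup_titles
```

With pages `a` and `b` sharing both content and title, and page `c` sharing only the title, the sketch lists `a`/`b` as duplicate content and `a`/`c` as a duplicate title.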

                    On Wed, 18 May 2011 16:30:00 -0000, tarastockford wrote:

                    >Hi Tilman
                    >
                    >The duplicates feature is useful, thanks. It would be good to have separate subheadings for duplicate content, duplicate titles and duplicate descriptions if possible, instead of them being all mixed together.
                    >
                    >The release candidate is working well for me so far.
                    >
                    >Thanks
                    >
                    >Tara
                    >
                    >
                    >
                    >------------------------------------
                    >
                    >Yahoo! Groups Links
                    >
                    >
                    >
                  • Tilman Hausherr
                    Message 9 of 30 , May 21, 2011
                      ...And another.
                      http://home.snafu.de/tilman/tmp/xenubeta.zip

                      One guy had a seemingly minor bug in the report that showed that a
                      change I made one year ago wasn't really good enough.

                      The bug was about foreign characters on pages done with ISO-8859-1 (this
                      is ANSI, sort of). If you have foreign characters in titles (even if you
                      use UTF8), and especially in URLs, please try the current beta.

                      Tilman

                      On Mon, 16 May 2011 18:55:08 +0200, Tilman Hausherr wrote:

                      >I hereby declare that today's beta is a release candidate.
                      >http://home.snafu.de/tilman/tmp/xenubeta.zip
                      >
                      >Please support me by testing your website with it.
                      >
                      >I will probably release the 1.3.9 version at the end of the week.
                      >
                      >Tilman
                    • Bruce Hartford
                      Message 10 of 30 , Dec 16, 2011
                        Using the Xenu 1.3.8 I'm getting a ton of false reports of broken links
                        to external websites. They have error code 404 (not found), 12007 (no
                        such host), 12029 (no connection). Yet when I click on the supposedly
                        bad link in the HTML Broken Link Report, the page loads with no problem
                        or delay. Over the past few weeks, I estimate that 95% of the supposedly
                        broken links Xenu reports have actually been valid URLs.

                        Any thoughts?

                        Bruce
                      • Tilman Hausherr
                        Message 11 of 30 , Dec 31, 2011
                          Try less threads. Also uncheck "fail all URLs of same failed host" in
                          the advanced options dialog.

                          Tilman

                          On Fri, 16 Dec 2011 13:25:14 -0800, Bruce Hartford wrote:

                          >Using the Xenu 1.3.8 I'm getting a ton of false reports of broken links
                          >to external websites. They have error code 404 (not found), 12007 (no
                          >such host), 12029 (no connection). Yet when I click on the supposedly
                          >bad link in the HTML Broken Link Report, the page loads with no problem
                          >or delay. Over the past few weeks, I estimate that 95% of the supposedly
                          >broken links Xenu reports have actually been valid URLs.
                          >
                          >Any thoughts?
                          >
                          >Bruce
                        • Bruce Hartford
                          Message 12 of 30 , Dec 31, 2011
                            Thanks, I tried your suggestion but no joy. I'm still getting a huge
                            number of false "error code: 12007 (no such host)" errors. When I click
                            on the URL of the supposedly unavailable site it pops right up.

                            Bruce



                            On 12/31/2011 2:17 AM, Tilman Hausherr wrote:
                            > Try less threads. Also uncheck "fail all URLs of same failed host" in
                            > the advanced options dialog.
                            >
                            > Tilman
                            >
                            > On Fri, 16 Dec 2011 13:25:14 -0800, Bruce Hartford wrote:
                            >
                            >> Using the Xenu 1.3.8 I'm getting a ton of false reports of broken links
                            >> to external websites. They have error code 404 (not found), 12007 (no
                            >> such host), 12029 (no connection). Yet when I click on the supposedly
                            >> bad link in the HTML Broken Link Report, the page loads with no problem
                            >> or delay. Over the past few weeks, I estimate that 95% of the supposedly
                            >> broken links Xenu reports have actually been valid URLs.
                            >>
                            >> Any thoughts?
                            >>
                            >> Bruce

                            --
                            Bruce Hartford
                            Webspinner: Civil Rights Movement Veterans website http://www.crmvet.org
                            Sojourner's Blog: http://ohfreedom.wordpress.com
                          • Fischer, Thomas
                            Message 13 of 30 , Jan 3, 2012
                              Hi Bruce,
                               
                              can you give examples of the URLs that are reported as failing while working when clicked? Xenu is pickier than browsers about "correct URLs", e.g. it will not follow erroneous relative paths and will regard directories without trailing slash as errors.
                               
                              All the best
                              Thomas


                              From: xenu-usergroup@yahoogroups.com [mailto:xenu-usergroup@yahoogroups.com] On Behalf Of Bruce Hartford
                              Sent: Saturday, 31 December 2011 22:24
                              To: xenu-usergroup@yahoogroups.com
                              Subject: Re: [xenu-usergroup] Loads of False Error

                               

                              Thanks, I tried your suggestion but no joy. I'm still getting a huge
                              number of false "error code: 12007 (no such host)" errors. When I click
                              on the URL of the supposedly unavailable site it pops right up.

                              Bruce

                              On 12/31/2011 2:17 AM, Tilman Hausherr wrote:
                              > Try less threads. Also uncheck "fail all URLs of same failed host" in
                              > the advanced options dialog.
                              >
                              > Tilman
                              >
                              > On Fri, 16 Dec 2011 13:25:14 -0800, Bruce Hartford wrote:
                              >
                              >> Using the Xenu 1.3.8 I'm getting a ton of false reports of broken links
                              >> to external websites. They have error code 404 (not found), 12007 (no
                              >> such host), 12029 (no connection). Yet when I click on the supposedly
                              >> bad link in the HTML Broken Link Report, the page loads with no problem
                              >> or delay. Over the past few weeks, I estimate that 95% of the supposedly
                              >> broken links Xenu reports have actually been valid URLs.
                              >>
                              >> Any thoughts?
                              >>
                              >> Bruce

                              --
                              Bruce Hartford
                              Webspinner: Civil Rights Movement Veterans website http://www.crmvet.org
                              Sojourner's Blog: http://ohfreedom.wordpress.com

                            • Tilman Hausherr
                              Message 14 of 30 , Jan 3, 2012
                                Were these all the same host? (I.e. the part between http:// and the
                                third / )

                                Btw you also need to press CTRL-R to retry after making the changes I
                                mentioned.

                                Tilman

                                On 31.12.2011 22:23, Bruce Hartford wrote:
                                > Thanks, I tried your suggestion but no joy. I'm still getting a huge
                                > number of false "error code: 12007 (no such host)" errors. When I click
                                > on the URL of the supposedly unavailable site it pops right up.
                                >
                                > Bruce
                                >
                                >
                                >
                                > On 12/31/2011 2:17 AM, Tilman Hausherr wrote:
                                >> Try less threads. Also uncheck "fail all URLs of same failed host" in
                                >> the advanced options dialog.
                                >>
                                >> Tilman
                                >>
                                >> On Fri, 16 Dec 2011 13:25:14 -0800, Bruce Hartford wrote:
                                >>
                                >>> Using the Xenu 1.3.8 I'm getting a ton of false reports of broken links
                                >>> to external websites. They have error code 404 (not found), 12007 (no
                                >>> such host), 12029 (no connection). Yet when I click on the supposedly
                                >>> bad link in the HTML Broken Link Report, the page loads with no problem
                                >>> or delay. Over the past few weeks, I estimate that 95% of the supposedly
                                >>> broken links Xenu reports have actually been valid URLs.
                                >>>
                                >>> Any thoughts?
                                >>>
                                >>> Bruce
                              • Bruce Hartford
                                Message 15 of 30 , Jan 3, 2012
                                  On 1/3/2012 1:42 AM, Fischer, Thomas wrote:
                                  > Hi Bruce,
                                  >
                                  > can you give examples of the URLs that are reported as failing while
                                  > working when clicked? Xenu is pickier than browsers about "correct
                                  > URLs", e.g. it will not follow erroneous relative paths and will
                                  > regard directories without trailing slash as errors.

                                  http://www.crmvet.org/crmlinks.htm
                                  http://www.bobzellner.com/
                                  \_____ error code: 12007 (no such host)
                                  http://www.farmworkermovement.us/ufwarchives/index.shtml
                                  \_____ error code: 12007 (no such host)
                                  https://mycampus.asurams.edu/web/event-civil-rights/home
                                  \_____ error code: 12007 (no such host)
                                  http://www.pba.org/programming/programs/thisisatlanta/atlcivilrights
                                  \_____ error code: 12007 (no such host)
                                  http://www.atlantastudentmovement.org/
                                  \_____ error code: 12007 (no such host)

                                  Bruce
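For anyone wanting to check the "same host?" question and the 12007 (no such host) reports outside Xenu, here is a minimal sketch. It assumes the error reflects a failed name lookup on the machine running the check; the function name `failing_hosts` and the injectable `resolve` parameter are hypothetical, and the result only says whether the host name resolves from your own machine at that moment.

```python
import socket
from collections import Counter
from urllib.parse import urlsplit


def failing_hosts(urls, resolve=socket.getaddrinfo):
    """Return (URL count per host, list of hosts that do not resolve)."""
    # The host is the part between "http://" and the next "/".
    counts = Counter(urlsplit(url).hostname for url in urls)
    unresolved = []
    for host in counts:
        try:
            resolve(host, None)  # DNS lookup only, no HTTP request
        except socket.gaierror:
            unresolved.append(host)
    return counts, unresolved


if __name__ == "__main__":
    reported = [
        "http://www.bobzellner.com/",
        "http://www.farmworkermovement.us/ufwarchives/index.shtml",
        "https://mycampus.asurams.edu/web/event-civil-rights/home",
        "http://www.pba.org/programming/programs/thisisatlanta/atlcivilrights",
        "http://www.atlantastudentmovement.org/",
    ]
    counts, unresolved = failing_hosts(reported)
    print("URLs per host:", dict(counts))
    print("Hosts that did not resolve:", unresolved)
```

If every host resolves here but Xenu still reports 12007, that would point toward something specific to the checking run (for example many simultaneous lookups) rather than the links themselves.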