
Re: [xenu-usergroup] Feature Suggestion: Please provide a loadable website definition file.

  • Thomas Fischer
    Message 1 of 6, Jan 4, 2010
      Hello Tilman,

      > > On Sat, 02 Jan 2010 14:20:54 -0000, Rolf Hemmerling wrote:
      > > My Feature Suggestion:
      > > a) Please provide a loadable website definition file
      > > ( = configuration file )
      > >
      > > as typing in the "exceptions" each time is annoying ( after
      > > you offer exceptions, thanks for that ). And thinking about
      > > proper exceptions is hard work, so you have to write it down
      > > anyhow - why not in a configuration file ?!
      > >
      > > So that a non-technical expert might run XENU monthly/weekly
      > > etc., by just running Xenu and loading the website file
      > > prepared by an expert.
      >
      > They are loaded after you have typed in the URL, or got it
      > from the drop down box.

      I'd like to second Rolf on his request.
      There is the file Xenu.ini with all the information, and I prepared it for
      my users accordingly, so that they can start their linkchecks without
      worrying about restrictions.
      But it is a strange mixture of preferences, history, settings, etc., and
      Xenu changes it every time a check is run.
      So while it serves some of the purposes that Rolf is asking for, it is not
      quite as straightforward as it could be.
      I would prefer splitting it into several files, in particular one file
      per website being checked, holding all the additional settings it needs
      and left unchanged by Xenu (unless preferences are altered).
      My Xenu.ini is now 55 KB, and it is starting to get a little confusing.

      All the best
      Thomas
    • Tilman Hausherr
      Message 2 of 6, Jan 4, 2010
        Hi,

        For those of you who are annoyed by the bug with mail URLs of the kind

        mailto:user@...?subject=xxx

        there's a new version that solves it:
        http://home.snafu.de/tilman/tmp/xenubeta.zip

        Tilman
      • Tilman Hausherr
        Message 3 of 6, Jan 13, 2010
          It seems that there's a bug in my software with links like this one:

          <a href="http://www.dctp.tv/#/meinungsmacher/udo-vetter-lawblog">interview</a>

          I just throw away everything after the #, so I spider
          http://www.dctp.tv/, which shows different content.

          Does anybody know the meaning of a # that appears "deep inside" a URL,
          and what the correct logic would be to distinguish it from the classic
          '#' as explained in
          http://www.w3.org/Addressing/URL/uri-spec.html ? Could it be "it doesn't
          count if the '#' is before a '/'"?

          If so, what about this URL
          http://www.ftd.de/auto/bilder/:galerie-die-fiatisierung-von-chrysler/50059172.html#utm_source=rss&utm_medium=rss_feed&utm_campaign=/
          where the content is identical to this URL
          http://www.ftd.de/auto/bilder/:galerie-die-fiatisierung-von-chrysler/50059172.html
          ?
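A minimal Python sketch of the RFC 3986 rule, assuming the standard-library urllib.parse.urldefrag (the helper name crawl_key is hypothetical, not Xenu's code). The fragment starts at the first '#' and runs to the end of the URL, so a '/' after the '#' does not pull anything back into the path. Applied to the URLs above, both ftd.de variants map to the same fetch target.

```python
from urllib.parse import urldefrag


def crawl_key(url: str) -> str:
    """Hypothetical helper: the part of a URL a crawler actually fetches.

    Per RFC 3986, the fragment begins at the first '#' and extends to the
    end, so any '/' after it belongs to the fragment, not the path.
    """
    return urldefrag(url).url


urls = [
    "http://www.dctp.tv/#/meinungsmacher/udo-vetter-lawblog",
    "http://www.ftd.de/auto/bilder/:galerie-die-fiatisierung-von-chrysler/50059172.html#utm_source=rss&utm_medium=rss_feed&utm_campaign=/",
    "http://www.ftd.de/auto/bilder/:galerie-die-fiatisierung-von-chrysler/50059172.html",
]

for u in urls:
    base, frag = urldefrag(u)  # DefragResult unpacks as (url, fragment)
    print(base, "| fragment:", repr(frag))

# The two ftd.de URLs collapse into a single fetch target.
print(len({crawl_key(u) for u in urls}))  # 2
```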

          Tilman
        • Daniel Norton
          Message 4 of 6, Jan 13, 2010
            That's not a bug in your software; it's a bug in the website. The hash sign (#) is a reserved character in a URI, and a URI containing a hash sign should retrieve the same document as the URI without the hash sign and everything following it (the fragment identifier). From RFC 3986 (highlight added):

            4.4 Same-Document Reference

            When a URI reference refers to a URI that is, aside from its fragment component (if any), identical to the base URI (Section 5.1), that reference is called a "same-document" reference. The most frequent examples of same-document references are relative references that are empty or include only the number sign ("#") separator followed by a fragment identifier.

            When a same-document reference is dereferenced for a retrieval action, the target of that reference is defined to be within the same entity (representation, document, or message) as the reference; therefore, a dereference should not result in a new retrieval action.

            The specification does not provide for any exceptions for characters (such as "/") after the hash mark, so they must be considered to be part of the fragment identifier. The W3 document you referenced concurs.
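The same-document rule quoted above can be sketched in Python as follows. This is an illustration under assumptions, not Xenu's logic: is_same_document is a hypothetical helper that resolves the reference against the base URI with urllib.parse.urljoin and then compares the two with their fragments stripped. A True result means the reference needs no new retrieval.

```python
from urllib.parse import urldefrag, urljoin


def is_same_document(base: str, ref: str) -> bool:
    """Hypothetical helper sketching RFC 3986 section 4.4.

    A reference is 'same-document' when its resolved target equals the
    base URI once fragments are ignored; dereferencing it should not
    trigger a new retrieval.
    """
    target = urljoin(base, ref)  # resolve the reference against the base
    return urldefrag(target).url == urldefrag(base).url


base = "http://www.dctp.tv/"
print(is_same_document(base, "#/meinungsmacher/udo-vetter-lawblog"))  # True
print(is_same_document(base, "http://www.dctp.tv/#/meinungsmacher/udo-vetter-lawblog"))  # True
print(is_same_document(base, "/meinungsmacher/udo-vetter-lawblog"))  # False
```

An empty reference resolves to the base itself and also counts as same-document, matching the RFC's note about empty relative references.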

            --
            Daniel
