AW: [xenu-usergroup] Feature Suggestion: Please provide a loadable website definition file.
- Hello Tilman,
I'd like to second Rolf on his request.
> > On Sat, 02 Jan 2010 14:20:54 -0000, Rolf Hemmerling wrote:
> > My Feature Suggestion:
> > a) Please provide a loadable website definition file
> > ( = configuration file )
> > as typing in the "exceptions" each time is annoying ( after
> > you offer exceptions, thanks for that ). And thinking about
> > proper exceptions is hard work, so you have to write it down
> > anyhow - why not in a configuration file ?!
> > So that a non-technical expert might run XENU monthly/weekly
> > etc., by just running Xenu and loading the website file
> > prepared by an expert.
> They are loaded after you have typed in the URL, or got it
> from the drop down box.
There is the file Xenu.ini with all the information, and I prepared it for
my users accordingly, so that they can start their linkchecks without
worrying about restrictions.
But it is a strange mixture of preferences, history, settings etc, and every
time a check is run it is changed by Xenu.
So while it serves some of the purposes that Rolf is asking for, it is not
quite as straightforward as it could be.
I would prefer a separation of the file into different files, in particular
one file for each website that is checked with all the additional settings
needed and which remains unchanged by Xenu (unless preferences are altered).
My Xenu.ini is now 55 KB and it is starting to get a little confusing.
All the best
- It seems that there's a bug in my software with links like this one:
I just throw away everything after the #, so I would spider to
http://www.dctp.tv/ , which shows a different content.
Does anybody know the meaning of a # that appears "deep inside" a URL,
and what the correct logic would be to differentiate it from the classic
'#' as explained in
http://www.w3.org/Addressing/URL/uri-spec.html ? Could it be "it doesn't
count if the '#' is before a '/'" ?
If so, what about this URL
where the content is identical to this URL
- That's not a bug in your software, it's a bug in the website. The hash sign (#) in a URI is a reserved character and a URI with a hash sign (#) should retrieve the same document as the URI without the hash sign and everything following it (the fragment identifier). From RFC 3986 (highlight added):
- 4.4 Same-Document Reference
When a URI reference refers to a URI that is, aside from its fragment component (if any), identical to the base URI (Section 5.1), that reference is called a "same-document" reference. The most frequent examples of same-document references are relative references that are empty or include only the number sign ("#") separator followed by a fragment identifier.
When a same-document reference is dereferenced for a retrieval action, the target of that reference is defined to be within the same entity (representation, document, or message) as the reference; therefore, a dereference should not result in a new retrieval action.
The specification does not provide for any exceptions for characters (such as "/") after the hash mark, so they must be considered to be part of the fragment identifier. The W3 document you referenced concurs.
--Daniel
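To illustrate the rule, here is a minimal sketch in Python, not Xenu's actual code. It uses the standard library's urllib.parse.urldefrag, which splits at the first '#' and treats everything after it, slashes included, as the fragment. The function name crawl_target and the example.com URLs are made up for illustration.

```python
from urllib.parse import urldefrag


def crawl_target(url: str) -> str:
    """Return the URL a link checker would actually request.

    Everything from the first '#' onward is the fragment identifier,
    even if it contains '/' characters, so it is dropped before retrieval.
    """
    base, _fragment = urldefrag(url)
    return base


# Hypothetical URLs with a '#' "deep inside" (not taken from the thread).
for link in (
    "http://www.example.com/#/section/page",
    "http://www.example.com/docs/index.html#chapter/2",
):
    base, fragment = urldefrag(link)
    print(f"request: {base}   fragment: {fragment}")
```

With this logic, a '#' followed by a '/' gets no special treatment: the part after the '#' never reaches the server, so any difference in displayed content has to come from client-side script on the site.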