use of # between '/'s
- It seems that there's a bug in my software with links like this one:
I just throw away everything after the #, so I would spider to
http://www.dctp.tv/ , which shows different content.
Does anybody know the meaning of a # that appears "deep inside" a URL,
and what the correct logic would be to differentiate it from the classic
'#' as explained in
http://www.w3.org/Addressing/URL/uri-spec.html ? Could it be "it doesn't
count if the '#' is before a '/'" ?
If so, what about this URL
where the content is identical to this URL
- That's not a bug in your software; it's a bug in the website. The hash sign (#) is a reserved character in a URI: it separates the URI proper from the fragment identifier. A URI with a hash sign should therefore retrieve the same document as the URI without the hash sign and everything following it. From RFC 3986 (highlight added):
- 4.4 Same-Document Reference
When a URI reference refers to a URI that is, aside from its fragment component (if any), identical to the base URI (Section 5.1), that reference is called a "same-document" reference. The most frequent examples of same-document references are relative references that are empty or include only the number sign ("#") separator followed by a fragment identifier.
When a same-document reference is dereferenced for a retrieval action, the target of that reference is defined to be within the same entity (representation, document, or message) as the reference; therefore, a dereference should not result in a new retrieval action.
The specification does not provide for any exceptions for characters (such as "/") after the hash mark, so they must be considered to be part of the fragment identifier. The W3 document you referenced concurs.
--Daniel
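For what it's worth, here is a minimal sketch of that rule in Python, assuming the spider is free to use the standard library's urllib.parse. The function names (spider_target, fragment_of) and the example.com URL are hypothetical, made up for illustration; they are not taken from the original poster's software or from the links in question.

```python
from urllib.parse import urldefrag


def spider_target(url: str) -> str:
    """Return the URL a spider would request: the input minus its fragment."""
    # urldefrag splits at the first '#'. Everything after it, including
    # any '/' or '?' characters, is treated as the fragment identifier.
    return urldefrag(url).url


def fragment_of(url: str) -> str:
    """Return the fragment identifier (without the leading '#'), or ''."""
    return urldefrag(url).fragment


if __name__ == "__main__":
    # Hypothetical URL with slashes after the hash sign.
    example = "http://www.example.com/#/section/page"
    print(spider_target(example))
    print(fragment_of(example))
```

In other words, throwing away everything after the first # is what the spec calls for. If the site serves different content for different fragments, it is doing so with client-side scripting, which a plain HTTP spider will not see.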