Here's another style question for SquishQL: should URI/URIrefs be quoted in
some way or should they be unquoted?
A related issue is how lists should be written: with commas, without commas,
or with optional commas. This affects URIs because a trailing comma can be
read as part of the URI.
e.g. <http://somewhere.org/file.html> vs http://somewhere.org/file.html
If they are quoted, how? RFC2396 suggests <>
Some of the issues (a short extract from RFC2396 is below; other references
are inline):
1/ <> is just extra length for things that are already long.
2/ URIs and URIrefs can't contain a literal space (it must be escaped as %20)
so space could be used to delimit URIs. But they can contain quite a few
punctuation characters (RFC2396 sections 2.2 and 2.3):
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
So it gets confusing because:
urn:example:/name)
is a URI, so the final ")" in
(?x ?y urn:example:/name)
is read as part of the URI, and the list should be written
(?x ?y urn:example:/name) )
If we have no comma separators and unquoted URIs, this is legal as a
single URI:
urn:example:fred,
which might be considered confusing by some.
3/ Absolute URIs must start with a scheme name and a scheme name is:
scheme = alpha *( alpha | digit | "+" | "-" | "." )
4/ N-Triples always quotes URIs.
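
To make issues 2/ and 3/ concrete, here is a minimal Python sketch of what a
lexer for unquoted URIs would do. It is only an illustration: the names
URI_CHARS, UNQUOTED_URI and longest_uri are made up for this example, and the
"scheme, colon, then the longest run of URI characters" rule is an assumed
lexer design, not anything SquishQL defines.

```python
import re

# Characters RFC 2396 allows unescaped in a URI (reserved + unreserved,
# sections 2.2 and 2.3), plus "%" for escapes and "#" for a fragment.
URI_CHARS = r"A-Za-z0-9;/?:@&=+$,\-_.!~*'()%#"

# Assumed lexer rule: a scheme name, ":", then the longest run of URI chars.
# The scheme part follows: scheme = alpha *( alpha | digit | "+" | "-" | "." )
UNQUOTED_URI = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:[" + URI_CHARS + r"]*")

def longest_uri(text):
    """Return the longest unquoted URI at the start of text, or None."""
    m = UNQUOTED_URI.match(text)
    return m.group(0) if m else None

# The closing ")" and the trailing "," are swallowed into the URI.
print(longest_uri("urn:example:/name) (?a ?b ?c)"))
print(longest_uri("urn:example:fred, urn:example:joe"))
```

Under this rule only whitespace ends an unquoted URI, so any list or bracket
punctuation has to be separated from the URI by a space.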
Options:
a - Always quote URIs
b - Have unquoted URIs, with more escapes.
c - Allow both unquoted and quoted URIs: a URI must be quoted if it is
confusing (we have to define "confusing")
Others?
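
As a rough illustration of option c, here is one possible reading of
"confusing", sketched in Python. The rule used (quote the URI if it contains
"(", ")" or ",") is purely an assumption for the example, and the names
QUERY_PUNCTUATION, needs_quoting and write_uri are hypothetical.

```python
# Assumed set of characters that the query syntax also uses as punctuation.
QUERY_PUNCTUATION = set("(),")

def needs_quoting(uri):
    """One possible definition of 'confusing': the URI contains a character
    that the query syntax also uses as punctuation."""
    return any(ch in QUERY_PUNCTUATION for ch in uri)

def write_uri(uri):
    """Write a URI under option c: quote with <> only when needed."""
    return "<" + uri + ">" if needs_quoting(uri) else uri

print(write_uri("http://somewhere.org/file.html"))
print(write_uri("urn:example:/name)"))
print(write_uri("urn:example:fred,"))
```

Any real definition would have to be agreed on and kept in step with the
grammar, which is part of the cost of option c.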
Notes:
I would like to have quoted URIs always with the quote mechanism being <>.
Andy
Extract from RFC 2396:
2.4.3. Excluded US-ASCII Characters
Although they are disallowed within the URI syntax, we include here a
description of those US-ASCII characters that have been excluded and
the reasons for their exclusion.
The control characters in the US-ASCII coded character set are not
used within a URI, both because they are non-printable and because
they are likely to be misinterpreted by some control mechanisms.
control = <US-ASCII coded characters 00-1F and 7F hexadecimal>
The space character is excluded because significant spaces may
disappear and insignificant spaces may be introduced when URI are
transcribed or typeset or subjected to the treatment of word-
processing programs. Whitespace is also used to delimit URI in many
contexts.
space = <US-ASCII coded character 20 hexadecimal>
The angle-bracket "<" and ">" and double-quote (") characters are
excluded because they are often used as the delimiters around URI in
text documents and protocol fields. The character "#" is excluded
because it is used to delimit a URI from a fragment identifier in URI
references (Section 4). The percent character "%" is excluded because
it is used for the encoding of escaped characters.
delims = "<" | ">" | "#" | "%" | <">
Other characters are excluded because gateways and other transport
agents are known to sometimes modify such characters, or they are
used as delimiters.
unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
Data corresponding to excluded characters must be escaped in order to
be properly represented within a URI.
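
Because the extract above excludes "<" and ">" from URIs, a <>-quoted URI can
be delimited without any escaping rules. A minimal Python sketch follows; the
token grammar (URIs in <>, variables written ?name, and "(", ")", "," as
punctuation) and the names TOKEN and tokenize are assumptions made for this
example only.

```python
import re

# Assumed token shapes: <uri>, ?variable, and single punctuation characters.
TOKEN = re.compile(r"<([^<>\s]*)>|(\?[A-Za-z_][A-Za-z0-9_]*)|([(),])")

def tokenize(text):
    """Split a query fragment into (kind, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace only separates tokens
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        if m.group(1) is not None:
            tokens.append(("uri", m.group(1)))
        elif m.group(2) is not None:
            tokens.append(("var", m.group(2)))
        else:
            tokens.append(("punct", m.group(3)))
        pos = m.end()
    return tokens

# The ")" inside <> stays in the URI; the one outside closes the list.
print(tokenize("(?x ?y <urn:example:/name)>)"))
```

With quoting, no space is needed between the URI and the closing bracket or a
following comma.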