Finding News Taxonomies [was: RE: Towards a TAG consideration of CURIEs]
- John Cowan wrote:
> Misha Wolf scripsit:
> > As we would very strongly prefer to end up with a Web page per
> > Taxonomy,
> Is that really sensible when taxonomies are very large? Consider
> SNOMED-CT, with upwards of 300,000 terms. I should think that
> the choice of / vs. # should be allowed to depend on the taxonomy
> in use.
Well, there are two options for URI construction:
a) use simple concatenation of taxonomy URI and code,
b) require that a specified string be injected between the taxonomy
URI and the code.
I agree with David Booth that consuming programs shouldn't have to
contain hardwired knowledge of the rules for each taxonomy. I'm not
sure, though, that there exists a viable mechanism for telling a
program which of the above to do, for each of the hundreds of
taxonomies used for News. I haven't looked at GRDDL for some time,
but I seem to recall that it is designed for interpreting document
instances, so is probably not the right tool for specifying how to
handle a taxonomy that will be used by millions of documents. I
also don't recall such a capability in RDDL, though I haven't looked
at that for quite some time either.
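To make the concern about option (b) concrete, here is a minimal sketch in Python. It assumes that every consumer somehow holds a table mapping each taxonomy URI to its injected string. The table, its example.org entries and the function name are all hypothetical; nothing like this is defined by the IPTC, GRDDL or RDDL.

```python
# Hypothetical per-taxonomy table. Option (b) only works if every consumer
# can obtain something like this; the entries are made up for illustration.
DELIMITERS = {
    "http://example.org/taxonomy-a": "/",
    "http://example.org/taxonomy-b": "#_",
}


def concept_uri_injected(taxonomy_uri, code, delimiters=DELIMITERS):
    """Option (b): inject the taxonomy's declared string before the code."""
    try:
        delimiter = delimiters[taxonomy_uri]
    except KeyError:
        # Without an entry the consumer cannot build the URI at all.
        raise LookupError(f"no delimiter known for {taxonomy_uri}") from None
    return taxonomy_uri + delimiter + code
```

The open question in the paragraph above is how such a table would reach consumers for hundreds of taxonomies without being hardwired into each program.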
So if we limited ourselves to one rule only, and if we wanted to
support the use of both "#" and "/", we would probably have to go
for simple concatenation and specify that in cases where any of the
codes would not be legal fragment IDs, the taxonomy URI must end
with a character which will sanitise the code. This approach is
illustrated by choices 1 and 2 in my previous mail:
1. Simple concatenation using "/" as the delimiter
"http://www.iptc.org/NewsCodes/" & "123456" ->
2. Simple concatenation using "#_" as the delimiter
"http://www.iptc.org/NewsCodes#_" & "123456" ->
One of the disadvantages is that a number of RDF tools can't cope
with choice 2. At any rate, this seemed to be the case when I last
looked into this matter.
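As a rough illustration of choices 1 and 2, the single concatenation rule can be sketched in Python as below. The function names are mine. The check assumes that "legal fragment ID" means something like an XML name, which may not start with a digit, and it is ASCII-only, so it is narrower than the real XML rules.

```python
import re

# ASCII-only approximation of an XML name (assumption: real XML names
# also allow many non-ASCII characters, which this sketch ignores).
_NAME_LIKE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def concept_uri(taxonomy_uri: str, code: str) -> str:
    """Simple concatenation: the taxonomy URI carries its own delimiter."""
    return taxonomy_uri + code


def fragment_is_name_like(uri: str) -> bool:
    """True if the URI has no fragment or its fragment looks like a name."""
    _, hash_sign, fragment = uri.partition("#")
    if not hash_sign:
        return True
    return _NAME_LIKE.fullmatch(fragment) is not None


print(concept_uri("http://www.iptc.org/NewsCodes/", "123456"))
print(concept_uri("http://www.iptc.org/NewsCodes#_", "123456"))
```

Under that assumption a bare "#" followed by "123456" fails the check, while "#_" followed by the same code passes, which is the sanitising effect of the trailing "_".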
News Standards Manager, Reuters, http://www.reuters.com/
Vice Chair, News Architecture WP, IPTC, http://www.iptc.org/
- Booth, David (HP Software - Boston) scripsit:
> Would it be feasible to mandate a particular prefix as part of all
> taxonomy IDs, such as "code:"? For example:
Or for that matter just "_". I *never* understood why that was
such a problem.
> I know you (or someone else) mentioned that publishers do not want to
> modify their existing codes, but something like this would be easy for
> both human and machine to syntactically distinguish from the original
> codes ("12345" or "foo"). In that sense the prefix seems conceptually
> no different from other XML syntax that surrounds the original codes and
> must be parsed away to retrieve the original codes.
+1
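The "parsed away" point can be sketched in a few lines of Python. This assumes the "code:" prefix from the quoted example; the function names are hypothetical, and swapping the constant for "_" gives the shorter variant.

```python
PREFIX = "code:"  # example prefix from the quoted message; "_" would work the same way


def add_prefix(code: str) -> str:
    """Wrap a publisher's original code without altering it."""
    return PREFIX + code


def strip_prefix(value: str) -> str:
    """Recover the original code, rejecting values that lack the prefix."""
    if not value.startswith(PREFIX):
        raise ValueError(f"expected a value starting with {PREFIX!r}: {value!r}")
    return value[len(PREFIX):]
```

Because the prefix is fixed, the round trip needs no per-taxonomy knowledge, and the publisher's codes ("12345" or "foo") come back unchanged.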
Where the wombat has walked, John Cowan <cowan@...>
it will inevitably walk again. http://www.ccil.org/~cowan