Finding News Taxonomies [was: RE: Towards a TAG consideration of CURIEs]

  • Misha Wolf
    Message 1 of 5 , Apr 7, 2007
      John Cowan wrote:

      > Misha Wolf scripsit:
      >
      > > As we would very strongly prefer to end up with a Web page per
      > > Taxonomy,
      >
      > Is that really sensible when taxonomies are very large? Consider
      > SNOMED-CT, with upwards of 300,000 terms. I should think that
      > the choice of / vs. # should be allowed to depend on the taxonomy
      > in use.

      Well, there are two options for URI construction:

      a) use simple concatenation of taxonomy URI and code,

      b) require that a specified string be injected between the taxonomy
      URI and the code.

      I agree with David Booth that consuming programs shouldn't have to
      contain hardwired knowledge of the rules for each taxonomy. I'm not
      sure, though, that there exists a viable mechanism for telling a
      program which of the above to do, for each of the hundreds of
      taxonomies used for News. I haven't looked at GRDDL for some time,
      but I seem to recall that it is designed for interpreting document
      instances, so is probably not the right tool for specifying how to
      handle a taxonomy that will be used by millions of documents. I
      also don't recall such a capability in RDDL, though I haven't looked
      at that either for quite some time.

      So if we limited ourselves to one rule only, and if we wanted to
      support the use of both "#" and "/", we would probably have to go
      for simple concatenation and specify that in cases where any of the
      codes would not be legal fragment IDs, the taxonomy URI must end
      with a character which will sanitise the code. This approach is
      illustrated by choices 1 and 2 in my previous mail:

      1. Simple concatenation using "/" as the delimiter
      "http://www.iptc.org/NewsCodes/" & "123456" ->
      "http://www.iptc.org/NewsCodes/123456"

      2. Simple concatenation using "#_" as the delimiter
      "http://www.iptc.org/NewsCodes#_" & "123456" ->
      "http://www.iptc.org/NewsCodes#_123456"
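
      The two choices above can be sketched in a few lines of Python. This
      is only an illustration of the "simple concatenation" rule, not an
      IPTC-defined API: the function names are hypothetical, and the
      fragment check is an assumed ASCII-only approximation of the XML
      name rules (a name may not start with a digit).

```python
import re

# Assumption: a legal fragment ID looks like an XML name, i.e. it starts
# with a letter or "_" (ASCII-only approximation of the real rules).
_FRAGMENT_OK = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")


def build_code_uri(taxonomy_uri: str, code: str) -> str:
    """Simple concatenation: nothing is injected between URI and code."""
    return taxonomy_uri + code


def is_legal_fragment(fragment: str) -> bool:
    """Check a fragment ID against the approximate name pattern above."""
    return bool(_FRAGMENT_OK.match(fragment))


# Choice 1: the taxonomy URI ends in "/".
print(build_code_uri("http://www.iptc.org/NewsCodes/", "123456"))

# Choice 2: the taxonomy URI ends in "#_", so the fragment becomes
# "_123456" rather than the bare (digit-initial) code "123456".
print(build_code_uri("http://www.iptc.org/NewsCodes#_", "123456"))
```

      Under that assumed pattern, is_legal_fragment("123456") is False
      while is_legal_fragment("_123456") is True, which is the sense in
      which the trailing "_" sanitises the code. The consuming program
      applies the same rule in both cases and needs no per-taxonomy
      knowledge.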

      One of the disadvantages is that a number of RDF tools can't cope
      with choice 2. At any rate, this seemed to be the case when I last
      looked into this matter.

      Misha Wolf
      News Standards Manager, Reuters, http://www.reuters.com/
      Vice Chair, News Architecture WP, IPTC, http://www.iptc.org/

    • John Cowan
      Message 2 of 5 , Apr 12, 2007
        Booth, David (HP Software - Boston) scripsit:

        > Would it be feasible to mandate a particular prefix as part of all
        > taxonomy IDs, such as "code:"? For example:

        Or for that matter just "_". I *never* understood why that was
        such a problem.

        > I know you (or someone else) mentioned that publishers do not want to
        > modify their existing codes, but something like this would be easy for
        > both human and machine to syntactically distinguish from the original
        > codes ("12345" or "foo"). In that sense the prefix seems conceptually
        > no different from other XML syntax that surrounds the original codes and
        > must be parsed away to retrieve the original codes.

        +1
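
        As a rough sketch of what "parsed away" could mean in practice,
        assuming a fixed prefix of "_" (it could equally be "code:"); the
        function names here are hypothetical and not part of any
        specification:

```python
# Assumption: one mandated prefix shared by all taxonomy IDs.
PREFIX = "_"


def to_taxonomy_id(code: str) -> str:
    """Turn a publisher's original code into a prefixed taxonomy ID."""
    return PREFIX + code


def to_original_code(taxonomy_id: str) -> str:
    """Strip the mandated prefix to recover the publisher's code."""
    if not taxonomy_id.startswith(PREFIX):
        raise ValueError(f"missing prefix {PREFIX!r}: {taxonomy_id!r}")
    return taxonomy_id[len(PREFIX):]


print(to_taxonomy_id("12345"))
print(to_original_code("_foo"))
```

        The publisher's codes ("12345", "foo") are left unchanged; only
        the ID used in the URI carries the prefix.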

        --
        Where the wombat has walked, John Cowan <cowan@...>
        it will inevitably walk again. http://www.ccil.org/~cowan