CURIEs: A proposal

  • Misha Wolf
    Message 1 of 14, Jun 2, 2006
      Hi folks,

      A modest proposal, drawing on ideas from Mark, Henry, Tim, Dan, Norm
      and others:

      1 We agree on a generic syntax and generic rules for Compact URIs
      (CURIEs) in attribute values.

      2 We agree that restricted syntaxes and rules will be (or have
      been) defined for specific purposes. One such purpose is XML
      Namespaces and QNames.

      3 Groups within the W3C and elsewhere will define other restricted
      syntaxes and rules for their own purposes.

      4 The generic syntax for a CURIE in an attribute value will be:
      <foo bar="prefix:suffix"/>

      5 The generic syntax for multiple CURIEs in an attribute value
      will (where permitted) be:
      <foo bar="prefix1:suffix1 ... prefixN:suffixN"/>

      6 Both the prefix and the suffix may (in the generic case) be
      numeric.

      7 Each language must specify:

      7a the syntactic constraints (if any) on the prefix and suffix.

      7b how CURIEs and URIs are distinguished, eg through dedicated
      attributes or through a special syntax.

      7c the mechanism for specifying the prefix-to-IRI mapping. The
      mechanism may use information provided out-of-band.

      7d whether and, if so, how the prefix and suffix are combined to
      form an IRI.

      7e whether the prefix and suffix form a tuple or whether they are
      just a compact representation for an IRI.

      7f whether the IRI mapped to the prefix is required to be
      dereferenceable.

      7g whether the IRI built from the prefix and suffix (and, possibly,
      including also other building blocks) is required to be
      dereferenceable.

      7h whether any fragment identifiers in these IRIs are required to
      be legal XML names.

      8 To avoid confusion with XML Namespaces and QNames:

      8a The xmlns attribute is reserved for use with XML Namespaces and
      QNames.

      8b If a prefix matches an xmlns declaration then the CURIE MUST be
      interpreted as a QName.
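
A minimal sketch of points 4 to 6, assuming a whitespace-separated attribute value and a split at the first colon. The proposal leaves the syntactic constraints to each language (7a), so both choices are assumptions, and the name `parse_curies` is hypothetical. The result is kept as a {prefix, suffix} tuple and is not turned into an IRI:

```python
def parse_curies(attribute_value):
    """Split an attribute value into a list of (prefix, suffix) tuples.

    Assumptions (not fixed by the proposal): CURIEs are separated by
    whitespace, and the first colon separates prefix from suffix.
    No constraint is placed on the characters of either part, so
    numeric prefixes and suffixes are accepted.
    """
    pairs = []
    for token in attribute_value.split():
        prefix, sep, suffix = token.partition(":")
        if not sep:
            raise ValueError(f"not a CURIE (no colon): {token!r}")
        pairs.append((prefix, suffix))
    return pairs


if __name__ == "__main__":
    print(parse_curies("sic:0070 iata:LGA"))
```

A language that restricts the generic syntax (as QNames do) would add its own checks on top of this.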

      Misha
      ------------------- NewsML 2 resources ------------------------------
      http://www.iptc.org | http://www.iptc.org/std-dev/NAR/1.0
      http://www.iptc.org/std-dev | http://groups.yahoo.com/group/newsml-2


      To find out more about Reuters visit www.about.reuters.com

      Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Reuters Ltd.
    • Misha Wolf
      Message 2 of 14, Jun 12, 2006
        I'm concerned that people are incorrectly assuming that CURIEs
        address a single set of requirements. On this mistaken foundation,
        various proposals are developed, which do not satisfy the full set
        of requirements. I hope to respond to Harry's mail tomorrow. In
        the meantime, please take a look at my presentation to the W3C AC,
        which summarises the News Industry's requirements:
        http://lists.w3.org/Archives/Public/www-archive/2006Jun/0013.html
        and at my mail titled "CURIEs: A proposal":
        http://lists.w3.org/Archives/Public/www-tag/2006Jun/0007.html

        Misha


        -----Original Message-----
        From: public-rdf-in-xhtml-tf-request@... [mailto:public-rdf-in-xhtml-tf-request@...] On Behalf Of Harry Halpin
        Sent: 09 June 2006 10:37
        To: mark.birbeck@...
        Cc: www-tag@...; public-rdf-in-xhtml-tf@...
        Subject: Re: CURIEs: A proposal


        I think Henry's pointing to the conceptual problem with making CURIEs a
        "superset" of QNames. Unlike CURIEs, QNames (at least as far as I've been
        able to discover, correct me if I'm wrong - the spec just seems silent) do
        not define an algorithm for converting an entire QName to an IRI, and by
        "algorithm" we're not talking about anything fancy - just
        concatenating the namespace URI and the local name as strings, which is
        what most processors do anyway - as pointed out by Borden [1] and
        raised to the TAG [2], who seemed to answer a somewhat different
        question in their finding.

        Yet a processor can map (namespace name, local name) like
        (http://www.w3.org/1999/XSL/Transform,template) to an IRI by doing:

        http://www.w3.org/1999/XSL/Transformtemplate

        Or by doing:

        http://www.w3.org/1999/XSL/Transform#template

        And it seems both would be equally valid or invalid, depending on your
        opinion.
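
The two constructions can be written out as follows. This only illustrates the ambiguity described above; the function names are hypothetical and neither rule is taken from a spec:

```python
def concat(namespace_iri, local_name):
    """Plain string concatenation of namespace IRI and local name."""
    return namespace_iri + local_name


def hash_join(namespace_iri, local_name):
    """Concatenation with a '#' inserted between the two parts."""
    return namespace_iri + "#" + local_name


if __name__ == "__main__":
    ns = "http://www.w3.org/1999/XSL/Transform"
    print(concat(ns, "template"))
    print(hash_join(ns, "template"))
```

The same pair yields two different IRIs, so two processors that each pick one rule would not agree on the result.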

        So making CURIEs a superset of QNames is a bit difficult as long as
        the QName (namespace prefix, local name)=>IRI construction is
        unspecified. And using the ":" for both QNames and CURIEs means that given
        any "x:y" element or attribute name one couldn't tell whether one meant
        an IRI or a (namespace prefix, local name). So there seem to be two
        choices:

        1) Unspecified IRI construction for the entire (namespace prefix, local
        name) is a *bug* in QNames and should be corrected post-hoc by the CURIE
        proposal. If this is the case, then CURIEs should use ":" and make
        themselves a superset of QNames.

        or

        2) Unspecified IRI construction in QNames is a *feature* and so CURIEs
        should exist as a parallel standard, and so use [insert character
        besides ":" here] in order to keep confusion between QNames and CURIEs
        at a minimum.

        My earlier point was that some communities (i.e. some of the microformat
        people I talked to at WWW2006) mentioned that they would like another
        character besides ":" for "namespaces in microformats." Next time I'll
        just tell them to escape their colons :)

        [1] http://www.openhealth.org/RDF/QNameQuagmire.html
        [2] http://www.w3.org/2001/tag/issues.html#rdfmsQnameUriMapping-6

        --
        -harry

        Harry Halpin, University of Edinburgh
        http://www.ibiblio.org/hhalpin 6B522426



      • Misha Wolf
        Message 3 of 14, Jun 13, 2006
          Hi Stuart,

          > Hello Misha,
          >
          > FWIW... a colleague suggested the use of '::' to separate prefix
          > from suffix ie. prefix::suffix
          >
          > Rationale:
          > 1) Visually/Syntactically distinct from QNames.
          > 2) Appealingly similar in appearance to QNames.

          I believe that QNames should form a subset of CURIEs. This implies
          that the same syntax should be used.

          > Regarding 7(a-h) below:
          > This seems to me to leave far too many things open for each
          > language using CURIEs to have to specify - making it difficult to
          > conceive of generic libraries for handling CURIEs. In particular:
          >
          > 7a) there should only be one set of syntactic constraints;

          The IPTC requires numeric suffix values. Others may not. Also, I
          believe that QNames should form a subset of CURIEs. Both of these
          imply that the same constraints cannot apply.

          > 7b) see '::' suggestion above
          > 7d) *if* CURIEs are genuinely a compact way of writing a URI, there
          > should be a *single* mapping from a CURIE to a URI/IRI.

          As I showed in my presentation at the AC meeting, the IPTC requires
          a tuple, *not* (just) a compact way of writing a URI. Furthermore,
          the IPTC requires that a {prefix, numeric suffix} can be used to
          build a legal (X)HTML URI, complete with fragment ID, eg:
          .../iptc.org/example#_12345
          Others require plain concatenation. Hence, there can't be a single
          construction rule.

          > 7e) should have a single answer... which probably (regrettably)
          > means a CURIE is a tuple of {prefix, suffix, prefixURI,
          > compactedURI}

          Maybe. But not compactedURI, please.

          > 7f-g) seems like normal good practice with URIs applies; any CURIE
          > spec should remain silent.

          See comments re numerics above.

          > 7h) again surely a matter for generic URI/IRI syntax.

          See comments re numerics above.

          > Fixing all of that would leave solely the matter of establishing a
          > prefix=>URI mapping on a per-language basis (7c), and I would hope
          > there would be a single approach for XML-based languages - other
          > non-XML-based languages (N3 (and friends), SPARQL...) would have
          > to define their own mechanisms.

          See above.

          > 8b) seems troubling because it risks confusing a QName with a
          > CURIE.

          See above.

          Regards,
          Misha
        • Misha Wolf
          Message 4 of 14, Jun 13, 2006
            Hi Henry,

            > My initial reaction is similar to Stuart's -- there's a lot to
            > agree with here, but
            >
            > 1) I think we do better to keep QNames (shorthand for an *expanded
            > name* which is a pair of an absolute IRI and an NMTOKEN local
            > name) and CURIEs (shorthand for an IRI) clearly distinguished
            > conceptually;
            >
            > 2) We think seriously about an alternative to the ':' as the
            > separator for CURIEs.

            I seem to be having a real problem contesting the accepted reality
            that a CURIE is a shorthand for an IRI. As I explained in my
            Edinburgh presentation, what the IPTC requires is a {prefix, suffix}
            tuple, associated with two IRIs:
            - one corresponding to the prefix,
            - one corresponding to some combination of the prefix and the
            suffix.

            As you and others pointed out in Edinburgh, the most obvious way
            of combining prefix and suffix would yield illegal fragment IDs,
            given that many of the suffix values are numeric. The available
            solutions appear to be:

            S1. Define a new media-type, which allows numeric fragment IDs.
            This approach would prevent the use of HTML or HTML/RDF pages
            for documenting taxonomies.

            S2. Don't use numeric suffix values. This approach to reality was,
            I'm pleased to say, recognised as ostrich-like when the W3C and
            the IETF accepted IRIs as real-world constructs and URIs as
            mangled IRIs, to be used in those environments which can't cope
            with IRIs.

            S3. Define a construction rule which avoids these problems, eg:
            <prefixIRI> & "#_" & <suffix>

            This reminds me that simple concatenation:
            <prefixIRI> & <suffix>
            would be broken as we would have to end the <prefixIRI> with a "#".
            This would mean that the page corresponding to a taxonomy as a whole
            would be identified by, eg:
            http://example.org/taxonomy1#
            My reading of the relevant specs suggests that this would be wrong.
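
A rough sketch of why S3 helps, assuming the fragment ID has to be a legal XML name. The check below is an ASCII-only approximation of the XML NCName production (the real production also admits many non-ASCII characters), and both function names are hypothetical:

```python
import re

# ASCII-only approximation of an XML NCName: a letter or underscore,
# followed by letters, digits, '.', '-' or '_'. Assumption: this is
# sufficient for the numeric codes discussed here.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(text):
    """Return True if text is a legal name under the approximation."""
    return NCNAME_ASCII.fullmatch(text) is not None


def build_code_iri(prefix_iri, suffix):
    """Construction rule S3: <prefixIRI> & "#_" & <suffix>."""
    return prefix_iri + "#_" + suffix


if __name__ == "__main__":
    print(is_ascii_ncname("12345"))
    print(is_ascii_ncname("_12345"))
    print(build_code_iri("http://example.org/taxonomy1", "12345"))
```

Under this rule the prefix IRI does not need a trailing "#", so the taxonomy as a whole keeps a fragment-free identifier.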

            Regards,
            Misha
          • Misha Wolf
            Message 5 of 14, Jun 13, 2006
              Hi Harry,

              > I think Henry's pointing to the conceptual problem with making
              > CURIEs a "superset" of QNames. Unlike CURIEs, QNames (at least as
              > far as I've been able to discover, correct me if I'm wrong - the
              > spec just seems silent) do not define an algorithm for converting
              > an entire QName to an IRI, and by "algorithm" we're not talking
              > about anything fancy - just concatenating the namespace URI and the
              > local name as strings, which is what most processors do anyway - as
              > pointed out by Borden [1] and raised to the TAG [2], who seemed to
              > answer a somewhat different question in their finding.

              I contest the assertion that there is some agreed rule for how to
              obtain an IRI from the {prefix, suffix} of a CURIE.

              > Yet a processor can map (namespace name, local name) like
              > (http://www.w3.org/1999/XSL/Transform,template) to an IRI by doing:
              >
              > http://www.w3.org/1999/XSL/Transformtemplate
              >
              > Or by doing:
              >
              > http://www.w3.org/1999/XSL/Transform#template
              >
              > And it seems both would be equally valid or invalid, depending on
              > your opinion.

              If this is so, it should be fixed asap.

              > So making CURIEs a superset of QNames is a bit difficult as long
              > as the QName (namespace prefix, local name)=>IRI construction is
              > unspecified.

              This is not a problem, as the same is true for CURIEs.

              > And so using the ":" for QNames and CURIEs means that given any
              > "x:y" element or attribute name one couldn't tell whether one meant
              > an IRI or a (namespace prefix, local name). So there seem to be two
              > choices:
              >
              > 1) Unspecified IRI construction for the entire (namespace prefix,
              > local name) is a *bug* in QNames and should be corrected post-hoc by
              > the CURIE proposal. If this is the case, then CURIEs should use ":"
              > and then make themselves a superset of QNames.

              It is not a bug in QNames.

              > or
              >
              > 2) Unspecified IRI construction in QNames is a *feature* and so
              > CURIEs should exist as a parallel standard, and so use [insert
              > character besides ":" here] in order to keep confusion between
              > QNames and CURIEs at a minimum.

              There is no difference between QNames and CURIEs in this matter.

              > My earlier point was that some communities (i.e. some of the
              > microformat people I talked to at WWW2006) mentioned that they
              > would like another character besides ":" for "namespaces in
              > microformats." Next time I'll just tell them to escape their
              > colons :)
              >
              > [1] http://www.openhealth.org/RDF/QNameQuagmire.html
              > [2] http://www.w3.org/2001/tag/issues.html#rdfmsQnameUriMapping-6

              Regards,
              Misha
            • Dan Connolly
              Message 6 of 14, Jun 13, 2006
                On Jun 2, 2006, at 2:14 PM, Misha Wolf wrote:
                > Hi folks,
                >
                > A modest proposal, drawing on ideas from Mark, Henry, Tim, Dan, Norm
                > and others:

                I found the notes from our discussion in Edinburgh, Misha, but then I
                left them at home and I'm travelling. I got a better picture of the
                requirements, and we discussed several options.

                As I understand it, IPTC has a whole bunch of codes... collections
                of codes, in fact. Vocabularies, I gather.

                The goal is a compact syntax to encode a code within a vocabulary,
                such that you can get from this compact syntax a URI for the code
                within the vocabulary and for the vocabulary itself.

                Some of the codes start with digits. We suspect (though we're not
                certain) that vocabularies are homogeneous in this respect: within
                a vocabulary, either all the codes start with a digit or none do.

                I gather these are for use in NewsML2, and there's a desire
                to share technology between NewsML2 and XHTML2 and other
                formats and to use the URIs with RDF tools.

                We discussed a number of possibilities... for the sake of
                example, a numeric code set I know about (though I'm not at all
                sure it's actually used in IPTC...) is SIC codes
                (http://en.wikipedia.org/wiki/SIC_codes ) and a non-numeric
                code set that I know about is IATA codes
                (http://en.wikipedia.org/wiki/IATA_airport_code ).

                Option A: Have a syntax for binding, say, sic: to http://sic.org/vocab1#
                and use sic:0070 for a code. To get a URI for that code, concatenate
                them: http://sic.org/vocab1#0070 . To get a URI for the vocabulary,
                concatenate them and then strip off the fragment: http://sic.org/vocab1 .
                Similarly, bind, say, iata: to http://iata.org/airports# and
                let iata:LGA expand to http://iata.org/airports#LGA and
                then to get the vocabulary, strip off the fragment:
                http://iata.org/airports .

                The sic:0070 short-hand does not match XML/XPath QName syntax,
                so you can't use it in RDF/XML. You can't even make up a QName
                for the URI http://sic.org/vocab1#0070 so you simply can't use
                it as a property name in RDF/XML. (The example of a SIC code
                is not something that you're likely to want to use as an RDF
                property name.)


                Option B: Bind sic: to http://sic.org/vocab1 and use sic:0070.
                To get a URI for that code, concatenate them with a # between:
                http://sic.org/vocab1#0070 . To get a URI for the vocabulary,
                look in the binding, and get http://sic.org/vocab1 .

                Option C: Like A, but for any codes that don't start with an XML name
                start character, put a _ in front of it before you use it in any of
                these web technologies. So sic:_0070 is the short syntax,
                http://sic.org/vocab1#_0070 is the URI for the code, and again, to
                get the URI for the vocab, strip off the fragment: http://sic.org/vocab1 .
                Now we can use the short syntax as a QName in RDF/XML.

                In Option C, the IATA stuff is the same as in Option A:
                bind iata: to http://iata.org/airports# and
                let iata:LGA expand to http://iata.org/airports#LGA
                and strip off the fragment to get the vocabulary and get
                http://iata.org/airports .
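
Options A to C can be compared side by side in a short sketch. The function names are hypothetical, the bindings are the example ones used above, and the "XML name start character" test in `escape_code` is an ASCII-only approximation:

```python
from urllib.parse import urldefrag


def option_a(binding, code):
    """Binding ends in '#': concatenate, then strip the fragment."""
    code_uri = binding + code
    vocab_uri = urldefrag(code_uri).url
    return code_uri, vocab_uri


def option_b(binding, code):
    """Binding has no '#': join with '#', vocabulary is the binding."""
    return binding + "#" + code, binding


def escape_code(code):
    """Prefix '_' unless the code starts with an ASCII letter or '_'.

    Assumption: ASCII-only stand-in for the XML name start character
    test; the real character class is much larger.
    """
    first = code[:1]
    if first == "_" or (first.isascii() and first.isalpha()):
        return code
    return "_" + code


def option_c(binding, code):
    """Like option A, but with the code escaped before use."""
    return option_a(binding, escape_code(code))


if __name__ == "__main__":
    print(option_a("http://sic.org/vocab1#", "0070"))
    print(option_b("http://sic.org/vocab1", "0070"))
    print(option_c("http://sic.org/vocab1#", "0070"))
    print(option_c("http://iata.org/airports#", "LGA"))
```

A and B give the same pair of URIs and differ only in where the "#" lives; C changes the code itself, which is visible in the short syntax as well as in the URI.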


                There might have been some other options that I've forgotten.

                And I'm not sure to what extent compatibility with existing NewsML
                practice is a requirement.

                The proposal you make here seems much more complicated
                than any of those options, and it involves a lot more coordination
                (new rules that are binding on "Groups within the W3C and elsewhere").

                > 1 We agree on a generic syntax and generic rules for Compact URIs
                > (CURIEs) in attribute values.
                >
                > 2 We agree that restricted syntaxes and rules will be (or have
                > been) defined for specific purposes. One such purpose is XML
                > Namespaces and QNames.
                >
                > 3 Groups within the W3C and elsewhere will define other restricted
                > syntaxes and rules for their own purposes.
                >
                > 4 The generic syntax for a CURIE in an attribute value will be:
                > <foo bar="prefix:suffix"/>
                >
                > 5 The generic syntax for multiple CURIEs in an attribute value
                > will (where permitted) be:
                > <foo bar="prefix1:suffix1 ... prefixN:suffixN"/>
                >
                > 6 Both the prefix and the suffix may (in the generic case) be
                > numeric.
                >
                > 7 Each language must specify:
                >
                > 7a the syntactic constraints (if any) on the prefix and suffix.
                >
                > 7b how CURIEs and URIs are distinguished, eg through dedicated
                > attributes or through a special syntax.
                >
                > 7c the mechanism for specifying the prefix-to-IRI mapping. The
                > mechanism may use information provided out-of-band.
                >
                > 7d whether and, if so, how the prefix and suffix are combined to
                > form an IRI.
                >
                > 7e whether the prefix and suffix form a tuple or whether they are
                > just a compact representation for an IRI.
                >
                > 7f whether the IRI mapped to the prefix is required to be
                > dereferenceable.
                >
                > 7g whether the IRI built from the prefix and suffix (and, possibly,
                > including also other building blocks) is required to be
                > dereferenceable.
                >
                > 7h whether any fragment identifiers in these IRIs are required to
                > be legal XML names.
                >
                > 8 To avoid confusion with XML Namespaces and QNames:
                >
                > 8a The xmlns attribute is reserved for use with XML Namespaces and
                > QNames.
                >
                > 8b If a prefix matches an xmlns declaration then the CURIE MUST be
                > interpreted as a QName.
                >
                > Misha



                --
                Dan Connolly, W3C http://www.w3.org/People/Connolly/
              • Misha Wolf
                Message 7 of 14, Jun 13, 2006
                  Hi Dan,

                  Various groups are interested in the CURIE initiative. These groups
                  don't all have the same requirements. I hope that we can agree on a
                  good solution, which meets all of the requirements. During our
                  post-presentation discussion in Edinburgh, we discussed specifically
                  the IPTC's requirements. The mail you responded to below was my
                  attempted synthesis, which tackles a broader canvas than just the
                  IPTC's needs. I've summarised some of the options for the IPTC's
                  problem space, and some of the problems with those options, in my
                  reply to Henry:
                  http://lists.w3.org/Archives/Public/www-tag/2006Jun/0046.html

                  I'll respond now to specific points in your mail:

                  > As I understand it, IPTC has a whole bunch of codes... collections
                  > of codes, in fact. Vocabularies, I gather.

                  Indeed. Note that many of these vocabularies exist independently of
                  the IPTC, eg:
                  - BCP-47 (eg "zh-Hant", ie Traditional Chinese)
                  - CUSIP (eg "037833100", ie Apple Computer)
                  - ISBN (eg "0-321-18578-1", ie The Unicode Standard, Version 4.0)
                  - ISIN (eg "US0378331005", ie Apple Computer)
                  - ISO-3166-Alpha-2 (eg "CS", ie Serbia and Montenegro)
                  - ISO-4217-Alpha (eg "JPY", ie Japanese Yen)
                  - ISO-4217-Num (eg "392", ie Japanese Yen)
                  - ISSN (eg "0261-3077", ie The Guardian)
                  - NYSE (eg "A", ie Agilent Technologies)
                  - SEDOL (eg "0263494", ie BAE Systems)
                  - Valoren (eg "1203203", ie UBS)

                  > The goal is a compact syntax to encode a code within a vocabulary,
                  > such that you can get from this compact syntax a URI for the code
                  > within the vocabulary and for the vocabulary itself.

                  And to ensure that receiving systems and people receive codes which
                  they understand.

                  > Some of the codes start with digits. We suspect (though we're not
                  > certain) that vocabularies are homogeneous in this respect: within
                  > a vocabulary, either all the codes start with a digit or none do.
                  >
                  > I gather these are for use in NewsML2, and there's a desire
                  > to share technology between NewsML2 and XHTML2 and other
                  > formats and to use the URIs with RDF tools.
                  >
                  > We discussed a number of possibilities... for the sake of
                  > example, a numeric code set I know about (though I'm not at all
                  > sure it's actually used in IPTC...) is SIC codes
                  > (http://en.wikipedia.org/wiki/SIC_codes ) and a non-numeric
                  > code set that I know about is IATA codes
                  > (http://en.wikipedia.org/wiki/IATA_airport_code ).
                  >
                  > Option A: Have a syntax for binding, say, sic: to http://sic.org/vocab1#
                  > and use sic:0070 for a code. To get a URI for that code, concatenate
                  > them: http://sic.org/vocab1#0070 . To get a URI for the vocabulary,
                  > concatenate them and then strip off the fragment: http://sic.org/vocab1 .
                  > Similarly, bind, say, iata: to http://iata.org/airports# and
                  > let iata:LGA expand to http://iata.org/airports#LGA and
                  > then to get the vocabulary, strip off the fragment:
                  > http://iata.org/airports .
                  >
                  > The sic:0070 short-hand does not match XML/XPath QName syntax,
                  > so you can't use it in RDF/XML. You can't even make up a QName
                  > for the URI http://sic.org/vocab1#0070 so you simply can't use
                  > it as a property name in RDF/XML. (The example of a SIC code
                  > is not something that you're likely to want to use as an RDF
                  > property name.)
                  >
                  > Option B: Bind sic: to http://sic.org/vocab1 and use sic:0070.
                  > To get a URI for that code, concatenate them with a # between:
                  > http://sic.org/vocab1#0070 . To get a URI for the vocabulary,
                  > look in the binding, and get http://sic.org/vocab1 .
                  >
                  > Option C: Like A, but for any codes that don't start with an XML name
                  > start character, put a _ in front of it before you use it in any of
                  > these web technologies. So sic:_0070 is the short syntax,
                  > http://sic.org/vocab1#_0070 is the URI for the code, and again, to
                  > get the URI for the vocab, strip off the fragment: http://sic.org/vocab1 .
                  > Now we can use the short syntax as a QName in RDF/XML.

                  We can't do this as receiving systems (and people) would not
                  recognise the codes.

                  > In Option C, the IATA stuff is the same as in Option A:
                  > bind iata: to http://iata.org/airports# and
                  > let iata:LGA expand to http://iata.org/airports#LGA
                  > and strip off the fragment to get the vocabulary and get
                  > http://iata.org/airports .
                  >
                  >
                  > There might have been some other options that I've forgotten.

                  The option I favour is:
                  vocabIRI = http://sic.org/vocab1
                  prefix = sic
                  suffix (aka code) = 0070
                  CURIE = sic:0070
                  construction rule = <vocabIRI> & "#_" & <code>
                  codeIRI = http://sic.org/vocab1#_0070
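
The favoured option can be sketched as below, assuming a simple in-memory table of prefix bindings. `BINDINGS`, `ExpandedCurie` and `expand` are hypothetical names, and how bindings are really declared is left open (point 7c of the proposal). The code in the CURIE stays exactly as published; only the construction rule adds the "#_":

```python
from typing import NamedTuple


class ExpandedCurie(NamedTuple):
    """The {prefix, suffix} tuple together with its two IRIs."""
    prefix: str
    suffix: str
    vocab_iri: str
    code_iri: str


# Hypothetical prefix-to-IRI bindings, using the example vocabulary.
BINDINGS = {"sic": "http://sic.org/vocab1"}


def expand(curie, bindings=BINDINGS):
    """Apply the rule <vocabIRI> & "#_" & <code> to a CURIE."""
    prefix, sep, suffix = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE (no colon): {curie!r}")
    if prefix not in bindings:
        raise ValueError(f"unbound prefix: {prefix!r}")
    vocab_iri = bindings[prefix]
    return ExpandedCurie(prefix, suffix, vocab_iri, vocab_iri + "#_" + suffix)


if __name__ == "__main__":
    print(expand("sic:0070"))
```

Keeping the tuple alongside both IRIs means a receiving system still sees the unmodified code `0070`.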

                  > And I'm not sure to what extent compatibility with existing NewsML
                  > practice is a requirement.

                  It isn't. But compatibility with the real world *is* a requirement.

                  > The proposal you make here seems much more complicated
                  > than any of those options, and it involves a lot more coordination
                  > (new rules that are binding on "Groups within the W3C and elsewhere").

                  See my intro.

                  Regards,
                  Misha
                  ------------------- NewsML 2 resources ------------------------------
                  http://www.iptc.org | http://www.iptc.org/std-dev/NAR/1.0
                  http://www.iptc.org/std-dev | http://groups.yahoo.com/group/newsml-2


                  To find out more about Reuters visit www.about.reuters.com

                  Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Reuters Ltd.
                • Dan Connolly
                  Message 8 of 14 , Jun 13, 2006
                    On Jun 13, 2006, at 12:01 PM, Misha Wolf wrote:
                    > Hi Dan,
                    >
                    > Various groups are interested in the CURIE initiative. These groups
                    > don't all have the same requirements. I hope that we can agree on a
                    > good solution, which meets all of the requirements. During our
                    > post-presentation discussion in Edinburgh, we discussed specifically
                    > the IPTC's requirements. The mail you responded to below was my
                    > attempted synthesis, which tackles a broader canvas than just the
                    > IPTC's needs.

                    I'm a little in the dark... if I knew which other groups are
                    involved and what their requirements are, I'd be in a better
                    position to evaluate the proposal... and in a better position
                    to know whether a critical mass of the relevant constituents
                    agrees.


                    > I've summarised some of the options for the IPTC's
                    > problem space, and some of the problems with those options, in my
                    > reply to Henry:
                    > http://lists.w3.org/Archives/Public/www-tag/2006Jun/0046.html
                    >
                    > I'll respond now to specific points in your mail:
                    >
                    >> As I understand it, IPTC has a whole bunch of codes... collections
                    >> of codes, in fact. Vocabularies, I gather.
                    >
                    > Indeed. Note that many of these vocabularies exist independently of
                    > the IPTC, eg:
                    > - BCP-47 (eg "zh-Hant", ie Traditional Chinese)
                    > - CUSIP (eg "037833100", ie Apple Computer)
                    > - ISBN (eg "0-321-18578-1", ie The Unicode Standard, Version 4.0)
                    > - ISIN (eg "US0378331005", ie Apple Computer)
                    > - ISO-3166-Alpha-2 (eg "CS", ie Serbia and Montenegro)
                    > - ISO-4217-Alpha (eg "JPY", ie Japanese Yen)
                    > - ISO-4217-Num (eg "392", ie Japanese Yen)
                    > - ISSN (eg "0261-3077", ie The Guardian)
                    > - NYSE (eg "A", ie Agilent Technologies)
                    > - SEDOL (eg "0263494", ie BAE Systems)
                    > - Valoren (eg "1203203", ie UBS)
                    >
                    >> The goal is a compact syntax to encode a code within a vocabulary,
                    >> such that you can get from this compact syntax a URI for the code
                    >> within the vocabulary and for the vocabulary itself.
                    >
                    > And to ensure that receiving systems and people receive codes which
                    > they understand.

                    Can we expect receiving systems/people to learn about whatever
                    proposal we come up with? Or do they have to understand based
                    on what they already know, without any code changes for systems
                    and without people reading more specs (or other docs)?

                    If we can't expect systems to pick up new technology, that's
                    sort of a non-starter, no?

                    [...]
                    >> Option C: Like A, but for any codes that don't start with an XML name
                    >> start character, put a _ in front of it before you use it in any of
                    >> these web technologies. So sic:_0070 is the short syntax,
                    >> http://sic.org/vocab1#_0070
                    >> is the URI for the code, and again, to get the URI for the vocab,
                    >> strip off the fragment: http://sic.org/vocab1 .
                    >> Now we can use the short syntax as a QName in RDF/XML.
                    >
                    > We can't do this as receiving systems (and people) would not
                    > recognise the codes.

                    Receiving systems can execute the algorithm and determine
                    the relevant URIs and then look up the URIs in the Web, no?

                    >> In Option C, the IATA stuff is the same as in Option A:
                    >> bind iata: to http://iata.org/airports# and
                    >> let iata:LGA expand to http://iata.org/airports#LGA
                    >> and strip off the fragment to get the vocabulary and get
                    >> http://iata.org/airports .
                    >>
                    >>
                    >> There might have been some other options that I've forgotten.
                    >
                    > The option I favour is:
                    > vocabIRI = http://sic.org/vocab1
                    > prefix = sic
                    > suffix (aka code) = 0070
                    > CURIE = sic:0070
                    > construction rule = <vocabIRI> & "#_" & <code>
                    > codeIRI = http://sic.org/vocab1#_0070

                    That looks reasonable as far as I can tell.


                    >> And I'm not sure to what extent compatibility with existing NewsML
                    >> practice is a requirement.
                    >
                    > It isn't.

                    Good to know.

                    > But compatibility with the real world *is* a requirement.

                    I don't know what to make of that remark.

                    >> The proposal you make here seems much more complicated
                    >> than any of those options, and it involves a lot more coordination
                    >> (new rules that are binding on "Groups within the W3C and elsewhere").
                    >
                    > See my intro.
                    >
                    > Regards,
                    > Misha
                    > ------------------- NewsML 2 resources ------------------------------
                    > http://www.iptc.org | http://www.iptc.org/std-dev/NAR/1.0
                    > http://www.iptc.org/std-dev | http://groups.yahoo.com/group/newsml-2



                    --
                    Dan Connolly, W3C http://www.w3.org/People/Connolly/
                  • Misha Wolf
                    Message 9 of 14 , Jun 13, 2006
                      Hi Dan,

                      > I'm a little in the dark... if I knew what the other groups are
                      > involved and what their requirements are, I'd be in a better
                      > position to evaluate the proposal... and in a better position to
                      > know if a critical mass of the relevant constituents agree.

                      First of all there is the RDF-in-XHTML task force. And then
                      there are the various W3C specifications tracked down by Mark,
                      which variously claim to be using QNames but aren't, or are
                      explicitly using assorted quasi-QNames. Details of a number of
                      these were given by Mark in his presentation at Edinburgh and in
                      earlier mails.

                      As far as proposals are concerned, I distinguish between the
                      specific proposal for solving the IPTC's problem, and a more
                      general proposal for an architecture which encompasses all of
                      these abbreviated forms, including QNames (given in my mail at
                      the start of this thread).

                      > > And to ensure that receiving systems and people receive codes
                      > > which they understand.

                      > Can we expect receiving systems/people to learn about whatever
                      > proposal we come up with? Or do they have to understand based
                      > on what they already know, without any code changes for systems
                      > and without people reading more specs (or other docs)?
                      >
                      > If we can't expect systems to pick up new technology, that's
                      > sort of a non-starter, no?

                      Lots of systems insert/store/display naked codes directly. As we
                      are trying to make NewsML 2 easy to use, we aren't going to be
                      telling people that they have to mangle their data because some
                      W3C spec can't cope with it the way it is. We are very happy to
                      generate mangled URIs for (X)HTML/RDF pages documenting the
                      vocabularies, but we won't mangle the base data. In another mail
                      I gave the example of IRIs vs URIs. It has been accepted that
                      users need to be able to include resource identifiers, as they
                      understand them, in XML documents. The mangling to URIs happens
                      behind the scenes.

                      > Receiving systems can execute the algorithm and determine
                      > the relevant URIs and then look up the URIs in the Web, no?

                      Indeed.

                      > > The option I favour is:
                      > > vocabIRI = http://sic.org/vocab1
                      > > prefix = sic
                      > > suffix (aka code) = 0070
                      > > CURIE = sic:0070
                      > > construction rule = <vocabIRI> & "#_" & <code>
                      > > codeIRI = http://sic.org/vocab1#_0070

                      > That looks reasonable as far as I can tell.

                      > > [...] compatibility with the real world *is* a requirement.

                      > I don't know what to make of that remark.

                      Simply a restatement of the position that the codes themselves
                      must be left as is. To see why, try any of these in Google:

                      CUSIP 037833100 -> Apple Computer

                      SEDOL 0263494 -> BAE Systems

                      Valoren 1203203 -> UBS

                      ISBN 0-321-18578-1 -> The Unicode Standard

                      ISSN 0261-3077 -> The Guardian

                      ISO 4217 392 -> Japanese Yen

                      Then try them again, this time prefixing the numeric value with
                      a "_". The result is, in each case except for the last one:

                      Your search - [...] - did not match any documents.

                      Suggestions:
                      Make sure all words are spelled correctly.
                      Try different keywords.
                      Try more general keywords.
                      Try fewer keywords.

                      In the last case, three hits are shown for the string:
                      ISO 4217 _392
                      but they are all irrelevant.

                      Regards,
                      Misha
                      ------------------- NewsML 2 resources ------------------------------
                      http://www.iptc.org | http://www.iptc.org/std-dev/NAR/1.0
                      http://www.iptc.org/std-dev | http://groups.yahoo.com/group/newsml-2


                    • Misha Wolf
                      Message 10 of 14 , Jun 14, 2006
                        Hi Karl,

                        I was not discussing URI syntax. I was explaining that these
                        vocabularies and codes are in widespread use and was using Google
                        search to demonstrate this. While a search for each of the
                        strings on the left-hand side does yield many hits, these hits
                        include resources related to the entities I listed on the right-
                        hand side.

                        Regards,
                        Misha
                        ------------------- NewsML 2 resources ------------------------------
                        http://www.iptc.org | http://www.iptc.org/std-dev/NAR/1.0
                        http://www.iptc.org/std-dev | http://groups.yahoo.com/group/newsml-2



                        -----Original Message-----
                        From: Karl Dubost [mailto:karl@...]
                        Sent: 14 June 2006 02:21
                        To: Misha Wolf
                        Cc: Dan Connolly; www-tag@...; newsml-2@yahoogroups.com; public-rdf-in-xhtml-tf@...
                        Subject: Re: CURIEs: A proposal


                        On 06-06-14 at 07:01, Misha Wolf wrote:
                        > Simply a restatement of the position that the codes themselves
                        > must be left as is. To see why, try any of these in Google:
                        >
                        > CUSIP 037833100 -> Apple Computer
                        > SEDOL 0263494 -> BAE Systems
                        > Valoren 1203203 -> UBS
                        > ISBN 0-321-18578-1 -> The Unicode Standard
                        > ISSN 0261-3077 -> The Guardian
                        > ISO 4217 392 -> Japanese Yen

                        I do not get the same answers and, moreover, I do not get a single
                        answer but a list of links to multiple web pages when I try
                        http://www.alltheweb.com/
                        http://www.av.com/
                        http://www.google.com/

                        I'm not sure the result of a search can be considered as an
                        identifier. Or I have missed something in this thread.

                        This is for example a possible identifier
                        http://worldcatlibraries.org/wcpa/isbn/0-321-18578-1
                        These work too (and are often used, unfortunately, but you did talk
                        about the real world)
                        http://www.amazon.co.jp/gp/product/0321185781/
                        http://www.amazon.com/gp/product/0321185781/
                        Or these ones?
                        http://isbn.nu/0-321-18578-1
                        http://isbn.nu/0321185781



                        --
                        Karl Dubost - http://www.w3.org/People/karl/
                        W3C Conformance Manager, QA Activity Lead
                        QA Weblog - http://www.w3.org/QA/
                        *** Be Strict To Be Cool ***


                      • Dan Connolly
                        Message 11 of 14 , Jun 14, 2006
                          On Jun 13, 2006, at 6:01 PM, Misha Wolf wrote:
                          > [...]
                          >>> The option I favour is:
                          >>> vocabIRI = http://sic.org/vocab1
                          >>> prefix = sic
                          >>> suffix (aka code) = 0070
                          >>> CURIE = sic:0070
                          >>> construction rule = <vocabIRI> & "#_" & <code>
                          >>> codeIRI = http://sic.org/vocab1#_0070
                          >
                          >> That looks reasonable as far as I can tell.

                          In today's discussion[1], we realized airport:LGA in NewsML 2
                          wouldn't work like airport:LGA in, say, SPARQL. We also
                          realized that if an RDF user copied a namespace URI
                          ending with a # to a NewsML document, that construction
                          rule would result in a bogus URI containing "##".

                          What do you think of this...?

                          construction-rule =
                          if vocabURI ends with a #, s1 = '' (empty string) else s1 = '#'
                          if code does not start with a name-start character, s2 = '_' else s2 = ''
                          return <vocabURI> + s1 + s2 + <code>

                          3 examples:

                          vocabIRI = http://sic.org/vocab1
                          prefix = sic
                          suffix (aka code) = 0070
                          CURIE = sic:0070
                          codeIRI = http://sic.org/vocab1#_0070

                          vocabIRI = http://sic.org/vocab2#
                          prefix = sic
                          suffix (aka code) = 0070
                          CURIE = sic:0070
                          codeIRI = http://sic.org/vocab2#_0070

                          vocabIRI = http://sic.org/vocab3
                          prefix = country
                          suffix (aka code) = canada
                          CURIE = country:canada
                          codeIRI = http://sic.org/vocab3#canada
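
The rule and its three examples can be sketched in Python as follows. This is an illustration only: the names code_iri and NAME_START are hypothetical, and the name-start test is an ASCII-only approximation (letters and "_"), whereas the XML definition of a name-start character also admits many non-ASCII letters.

```python
import string

# Assumption: ASCII-only stand-in for "XML name-start character".
NAME_START = set(string.ascii_letters + "_")


def code_iri(vocab_uri: str, code: str) -> str:
    """Apply the proposed construction rule to a vocabulary URI and a code."""
    # s1: add '#' only if the vocabulary URI does not already end with one.
    s1 = "" if vocab_uri.endswith("#") else "#"
    # s2: add '_' only if the code does not start with a name-start character.
    s2 = "" if code[:1] in NAME_START else "_"
    return vocab_uri + s1 + s2 + code


# The three examples from the message above.
print(code_iri("http://sic.org/vocab1", "0070"))
print(code_iri("http://sic.org/vocab2#", "0070"))
print(code_iri("http://sic.org/vocab3", "canada"))
```

Note that a code that already begins with "_" passes through unchanged in this sketch, because "_" is itself a name-start character.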


                          [1] minutes of TAG ftf Wed 14 Jun, to appear

                          --
                          Dan Connolly, W3C http://www.w3.org/People/Connolly/
                        • Laurent Le Meur
                          Message 12 of 14 , Jun 14, 2006
                            Hi Dan,

                            > construction-rule =
                            > if vocabURI ends with a #, s1 = '' (empty string) else s1 = '#'
                            > if code does not start with a name-start character, s2 = '_' else s2 = ''
                            > return <vocabURI> + s1 + s2 + <code>

                            I guess that this rule can be implemented by any baby programmer in any
                            language, so it is totally acceptable.

                            One extra benefit is that one can get back from a full URI to a CURIE:
                            CURIE construction-rule =
                            find the first '#', scanning from right to left
                            map the left part to any prefix you want
                            if the end part begins with a '_' followed by a digit, strip the '_' to
                            get the code; if not, you've got the code already.
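
A rough Python sketch of this reverse rule follows. The function names split_code_iri and to_curie and the prefixes mapping are hypothetical, introduced only for illustration. As stated, the rule strips the "_" only when a digit follows, so a code starting with some other non-name-start character (for example "-") would not round-trip under this sketch.

```python
def split_code_iri(code_iri: str) -> tuple[str, str]:
    """Split a code IRI into (vocabulary IRI, code) at the last '#'."""
    vocab_iri, sep, fragment = code_iri.rpartition("#")
    if not sep:
        raise ValueError("IRI contains no '#': " + code_iri)
    # Strip the leading '_' only when it is followed by a digit.
    if len(fragment) >= 2 and fragment[0] == "_" and fragment[1] in "0123456789":
        fragment = fragment[1:]
    return vocab_iri, fragment


def to_curie(code_iri: str, prefixes: dict[str, str]) -> str:
    """Map the vocabulary IRI to a caller-chosen prefix and rebuild the CURIE."""
    vocab_iri, code = split_code_iri(code_iri)
    return prefixes[vocab_iri] + ":" + code


# Assumed prefix bindings, taken from the examples earlier in the thread.
prefixes = {"http://sic.org/vocab1": "sic", "http://sic.org/vocab3": "country"}
print(to_curie("http://sic.org/vocab1#_0070", prefixes))
print(to_curie("http://sic.org/vocab3#canada", prefixes))
```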

                            Laurent Le Meur
                            NewsML-2 Architecture WP chair
                            Agence France Presse



                            -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

                            This e-mail, and any file transmitted with it, is confidential and intended solely for the use of the individual or entity to whom it is addressed. If you have received this email in error, please contact the sender and delete the email from your system. If you are not the named addressee you should not disseminate, distribute or copy this email.

                            For more information on Agence France-Presse, please visit our web site at http://www.afp.com

                            -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
                          • Misha Wolf
                            Message 13 of 14 , Jun 15, 2006
                              Hi Dan,

                              > [1] minutes of TAG ftf Wed 14 Jun, to appear

                              In the absence of [1] it is not possible for me to form a definite
                              opinion, but I have a strong feeling of mutual incomprehension.
                              How can the TAG form an opinion on the URI construction rule in the
                              absence of an overall architectural vision? Maybe such a vision
                              has been formulated and will emerge in [1]. I won't spend too much
                              time on this now, as I'm acting on insufficient information, so
                              will make just a few brief points:

                              - For excellent reasons, there is no universal rule for how one
                              should construct the IRI for an element/attribute name from a
                              namespace IRI and a localname. Is the TAG proposing that in
                              the case of attribute values alone, there should be a universal
                              rule?

                              - Even in the case of attribute values, languages may wish to
                              build URIs using "/" as the delimiter rather than "#". How
                              can there be a single rule for all?

                              - "airport:LGA" is just a string in the absence of language-
                              specific rules, such as how the IRI-to-prefix mapping is
                              established. QNames use xmlns for this purpose. As I showed
                              in my Edinburgh presentation, NewsML 2 does not. I have no
                              idea what SPARQL uses.

                              So, again, I'm asking for an overall architectural vision, and I
                              refer you to my "CURIEs: A proposal", 2 June 2006:
                              http://lists.w3.org/Archives/Public/www-tag/2006Jun/0007.html

                              Regards,
                              Misha
                              ------------------- NewsML 2 resources ------------------------------
                              http://www.iptc.org | http://www.iptc.org/std-dev/NAR/1.0
                              http://www.iptc.org/std-dev | http://groups.yahoo.com/group/newsml-2


                            • Dan Connolly
                              Message 14 of 14 , Jun 19, 2006
                                On Thu, 2006-06-15 at 13:37 +0100, Misha Wolf wrote:
                                > Hi Dan,
                                >
                                > > [1] minutes of TAG ftf Wed 14 Jun, to appear
                                >
                                > In the absence of [1] it is not possible for me to form a definite
                                > opinion, but I have a strong feeling of mutual incomprehension.
                                > How can the TAG form an opinion on the URI construction rule in the
                                > absence of an overall architectural vision? Maybe such a vision
                                > has been formulated and will emerge in [1]. I won't spend too much
                                > time on this now, as I'm acting on insufficient information, so
                                > will make just a few brief points:
                                >
                                > - For excellent reasons, there is no universal rule for how one
                                > should construct the IRI for an element/attribute name from a
                                > namespace IRI and a localname. Is the TAG proposing that in
                                > the case of attribute values alone, there should be a universal
                                > rule?

                                Good question. I think not... I think this is input to the NewsML 2
                                design, intended to minimize confusion when people want to transfer
                                their knowledge from NewsML 2 to SPARQL and back.

                                It's not clear that the TAG is proposing anything, by the way;
                                I don't think the chair put a question to the group.

                                As to mutual incomprehension, yes, it seems a higher-bandwidth
                                medium is in order... more on that separately.


                                --
                                Dan Connolly, W3C http://www.w3.org/People/Connolly/
                                D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E