
The Mire 'twixt Documents And Data

  • Sean B. Palmer
    Message 1 of 2 , Dec 2, 2000
      Documents are here to stay, and so is data. Roughly put, we have HTML/WWW
      for documents, and XML/RDF/SW for data. The problem we all face is that it
      is very rare to have either a pure document or pure data. Documents always
      have data to back them up, and conversely data always needs some kind of
      prose explanation.
      Look upon this as "explicit reification" if you must: everything needs a
      prose definition at some level. Does this mean the SW has failed before it
      has started? Of course not! It will work for pure data models, but there
      aren't all that many pure data models out there... the information we
      mainly deal with is simply annotated data.
      At the moment it appears that we have a mini formatting war [1] going on
      over documents vs. data, alongside the ongoing battles about XML Schema vs.
      XML DTDs (or, put a bit more rationally, XML vs. XHTML). But why can't we
      just come to a sort of half-document, half-data consensus?
      [[[
      I believe that one of the best ways to transition into RDF, if not a
      long-term deployment strategy for RDF, is to manage the information in
      human-consumable form (XHTML) annotated with just enough info to extract the
      RDF statements that the human info is intended to convey. [...] We all know
      that we have to produce a human-readable version of the thing... why not use
      that as the primary source?
      ]]] - [2]
      Or in other words, using XHTML [3] as a repository for data, but one that
      can still be marked up with annotations, explanations, and summaries...aha!
      The key concepts we have here are the following: data can be stored somehow
      in XHTML, and annotated with two different types of further data:
      annotation intended to facilitate the machine transformation and extraction
      of that data into machine (RDF?) form, and annotation to assist humans in
      the interpretation of that data [4].
      The two most important building blocks for this conversation will be these
      simple little tags and attributes (their meanings are self-explanatory):-

      <annotation xmlns="[TBD]">
      <inverseOf
      xmlns="http://www.daml.org/2000/10/daml-ont.daml">
      @annotation @class @type

      If we added those simple tags etc. to a kind of XHTML slurry, then we would
      have a lot more power to walk through the mire 'twixt documents and data.
      But this is all an abstract conversation, isn't it? Not really. Browsers
      worldwide grok XHTML, and a few can use CSS to style other forms of XML. At
      the moment, to cleanly extract data from XHTML, we have to pepper it (i.e.
      annotate it) with hundreds of "classes": class attributes [5] that imply
      our meaning, for example as discussed in the semantic design principles
      [6]. Instead, we could just add a few custom annotation-based and
      logic-based tags (like the ones above) to (e.g.) m12n, and create a
      transformable form of XHTML to bridge the gap.
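      As a rough illustration of the class-attribute approach described above,
      here is a minimal Python sketch. The convention it uses is an assumption
      made for this example, not something defined in this message: an element
      carrying both id and class is treated as a resource, and any descendant
      with a class attribute is treated as one of its properties. The sample
      markup, the class names (Person, name, homepage) and the function
      extract_statements are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the id and the class names are invented for illustration.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="sbp" class="Person">
      <p>This page is maintained by <span class="name">Sean B. Palmer</span>,
      who can be found at
      <a class="homepage" href="http://www.mysterylights.com/sbp/">his home page</a>.</p>
    </div>
  </body>
</html>"""


def extract_statements(xhtml_text):
    """Return (subject, predicate, object) tuples found in annotated XHTML."""
    root = ET.fromstring(xhtml_text)
    statements = []
    # Assumed convention: an element with both id and class describes a resource.
    for subject_el in root.iter():
        subject = subject_el.get("id")
        if subject is None or subject_el.get("class") is None:
            continue
        statements.append((subject, "type", subject_el.get("class")))
        # Descendants carrying a class attribute become properties of it.
        for prop_el in subject_el.iter():
            if prop_el is subject_el or prop_el.get("class") is None:
                continue
            # Links contribute their target; other elements contribute their text.
            value = prop_el.get("href") or "".join(prop_el.itertext()).strip()
            statements.append((subject, prop_el.get("class"), value))
    return statements


if __name__ == "__main__":
    for statement in extract_statements(SAMPLE):
        print(statement)
```

      Run against the sample, this yields three statements about "sbp": its
      type, its name, and its home page. The same markup still renders as
      ordinary XHTML in a browser, which is the point of keeping the
      human-readable version as the primary source.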
      Strangely enough, the W3C's Amaya already has an annotation system [7], and
      an annotation server [8]. But it doesn't tie into the document at all, and
      therefore I doubt it sees any use at all (sorry!). However, the principle
      of using annotations with data is a great idea, and one that surely should
      be pursued.
      Summary:-
      We need some kind of "lingua franca" for annotating data in a form that is
      both human-readable and transformable into a machine-readable format. (And
      yes, this does have smackings of SDF [9]).

      There aren't many examples of semantically annotated XHTML out there (in
      fact, I can't find a single satisfactory one...) so I urge people to create
      examples.

      References:-
      [1] http://doctypes.org/
      - Doctypes.org, M. Altheim
      [2] http://lists.w3.org/Archives/Public/www-rdf-interest/2000Mar/0103.html
      - XSLT for screen-scraping RDF out of real-world data, Dan Connolly
      [3] http://www.w3.org/TR/xhtml1/
      - XHTML 1.0, Steven Pemberton et al.
      [4] http://www.mysterylights.com/sbp/#docordata
      - Documents vs. Data, Sean B. Palmer
      [5] http://www.w3.org/TR/html401/struct/global.html#adef-class
      - The class Attribute - HTML 4.01, Dave Raggett et al.
      [6] http://www.mysterylights.com/sbp/#semanticprinciples
      - Design Principles to Aid Semantics, Sean B. Palmer
      [7] http://www.w3.org/2000/02/collaboration/annotation/AmayaDocs/Annotation.html
      - Annotations in Amaya
      [8] http://annotest.w3.org/
      - The W3C's Annotea project
      [9] http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/0033.html
      - Semantic Document Frameworks, Sean B. Palmer

      P.S. Apologies for the cross post: this note (i.e. rant) covers quite a few
      topics...

      Kindest Regards,
      Sean B. Palmer
      http://www.mysterylights.com/sbp/
      http://www.w3.org/WAI/ [ERT/GL/PF]
      "Perhaps, but let's not get bogged down in semantics."
      - Homer J. Simpson, BABF07.
    • Sean B. Palmer
      Message 2 of 2 , Dec 2, 2000
        > With xlink, topic maps, and RDF, we have plenty of possibilities for
        > annotating documents, even third-party documents. Provided, that
        > is, they are marked up in some useful way. xhtml isn't usually
        > enough for that.

        Yes, that's exactly my point. We need some kind of system (that isn't too
        complicated) that allows us to annotate data, and have it either processed or
        displayed. I'm sure if people just added some proprietary extensions to
        XHTML, took away some of the junk, and mixed in a few other languages (XLink
        and RDF) then you would have the basis for an architecturally sound, well
        processable "language".
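
        To make the "mix in XLink" idea a little more concrete, here is a small
        Python sketch that pulls XLink annotations out of an otherwise ordinary
        XHTML document. The XLink namespace and the type, href and title
        attributes come from the XLink specification; the sample document and the
        helper name extract_links are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical sample mixing XHTML structure with XLink attributes.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <p>See the <span xlink:type="simple"
       xlink:href="http://www.w3.org/TR/xhtml1/"
       xlink:title="XHTML 1.0">XHTML specification</span>.</p>
  </body>
</html>"""


def extract_links(xhtml_text):
    """Collect every element that carries an xlink:href attribute."""
    root = ET.fromstring(xhtml_text)
    links = []
    for element in root.iter():
        # ElementTree spells namespaced attributes as "{namespace}localname".
        href = element.get(f"{{{XLINK_NS}}}href")
        if href is None:
            continue
        links.append({
            "href": href,
            "title": element.get(f"{{{XLINK_NS}}}title"),
            "text": "".join(element.itertext()).strip(),
        })
    return links


if __name__ == "__main__":
    for link in extract_links(SAMPLE):
        print(link)
```

        A processor that does not understand the XLink attributes can still treat
        the document as plain XHTML, while one that does can pick the links out
        as data.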

        > What we need are usable tools, preferably gui editor-like tools, to let
        > us do these things. We've got the standards infrastructure, I think. I
        > want to be able to take a document, highlight parts and add notes,
        > comments [...] and links to other documents, [...]

        This is the next major step towards a brighter Web (dare I say the Semantic
        Web). Someone needs to create a "Mosaic for the SW", and they have to do it
        fairly soon. We need something like the very original NeXT browser that
        TimBL created: something that allows you to read and write to the Web in a
        WYSIWYG environment without needing to see the source code or URIs. Also,
        that "browser/editor" is going to settle on some type of output, and I think
        it will be something similar to the annotated data in XHTML thing I am
        talking about. Amaya already is 1% of the way there, but people need to move
        it on the next 99%...
        It would also be compatible with the Web of trust, and with only partly
        recognizable languages, because it should be able to work out most
        statements. For the main part, though, you would have the basic structural
        framework (using some XHTML tags) and then have the data annotated with
        XLink, RDF and so on. Bringing all of these things together is one of the
        hardest things, but like I say, Amaya is getting there. Also, IE5 can
        display plain XML with CSS, so if some group of people developed some of
        these principles and brought them all in line, there is no stopping them
        creating the "Mosaic of the SW".

        Of course, people must first settle on the fact that documents and data
        are inseparable. That was the original purpose of the first message, and
        this one moves me on to the next level (and hence this question): is
        anyone developing a WYSIWYG annotation GUI?

        Kindest Regards,
        Sean B. Palmer
        http://www.mysterylights.com/sbp/
        http://www.w3.org/WAI/ [ERT/GL/PF]
        "Perhaps, but let's not get bogged down in semantics."
        - Homer J. Simpson, BABF07.
