
RE: [syndication] Digest Number 368

  • David Galbraith
    Message 1 of 7 , Sep 4, 2001
      Hi,
      Mark wrote:

      > If we're expecting a publisher to mark up their page, why not give
      > them better control over what becomes the title, the link, etc.,
      > rather than playing guesswork?

      Here is a draft of a more generalized version of using inline metadata that
      I've been working on; comments appreciated.
      Aaron's excellent script takes a link mentioned in a weblog posting and
      assumes that the surrounding commentary is about that link (if there is more
      than one link in a posting then you have to choose which one the commentary
      is about). In effect, the XML that is produced is metadata about a page
      other than the one where the metadata is published within span tags.
      The more generalized approach assumes that you want to automatically
      generate metadata about the page that you are running the script over,
      generate richer metadata, and allow for multiple links.
      In addition, you will want to use namespaces defined by URIs to avoid
      collisions, and it may be useful to use the concept of weblog-style
      permanent archiving to attach the default namespaces at the level of
      individual postings, as opposed to web pages, which are transient. In
      other words, metadata should be attached to a piece of content (a posting,
      which may be a paragraph or several pages) as it was authored, as opposed to
      a page, which is merely a rendering of part of some content or a collection
      of different pieces of content.

      Below is a rough draft of what a marked-up HTML page would look like (the
      markup is called swml, 'semantic web markup language', for want of a better
      description):

      <html>
      <head>
      <title>The Liar : !</title>
      <!-- because we are using the weblog method of defining permanent links to
      items as they appear in an archive, the metadata can be extracted from
      individual nuggets of information on the page; the page is merely a
      temporary view of the data. This allows for permanent extraction of
      information from dynamic websites and is analogous to a view of records
      from a database and the like -->
      <!-- because we are using swml strict, the following stylesheet reference can
      define layout which will not be extracted as metadata -->
      <link rel="stylesheet" type="text/css" href="/stylesheets/pretty_things.css">
      <!-- the mention of swml strict tells parsers to only extract metadata
      explicitly defined using swml class attributes -->
      <META NAME="Keywords" CONTENT="weblog stuff, swml, strict">
      <META NAME="Description" CONTENT="swml strict">
      <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
      </head>
      <body>
      <span class="swml--rdf--item">
      <!-- the following line is a blank link so that references to this item point
      to its top (this is common weblog usage and has nothing to do with swml
      per se) -->
      <a name="3748433"></a>
      <span class="swml--blog--date_posted">22.5.01...</span>
      <blockquote>
      <span class="swml--rss1--description">
      <!-- the following swml class is a link plus content: after extraction, the
      contents of the link are wrapped in a tag using the name of the class plus
      _text, and the link itself is given the name of the class plus _link -->
      <a class="swml--blog--headline" href="http://c.moreover.com/click/here.pl?x19439590">Fears
      of a clown</a>
      <br>Things look bleak for <span class="swml--name">Bob Manion</span>
      , aka "Flasher the Clown". For the past two decades, Bob has
      <br>graced public festivals near his home in Clayton, California.
      <br>WHAT? :!
      </span><br>
      <!-- we can use the same class name more than once; the order in which they
      are grouped determines what metadata belongs to what link -->
      <a class="swml--blog--headline" href="http://c.moreover.com/click/here.pl?x19439590">
      Things still look bleak.</a><br>
      <!-- since we are using swml class notation on this page, the following item
      is ignored as metadata (although it is still part of the parent metadata
      class, item), and items like the one below can be rendered using css
      stylesheets without affecting metadata extraction -->
      <span class="presentation_item">Pretty thing.</span><br>
      <span class="swml--blog--time_posted">12:10 PM</span>
      <!-- the following line is commented out for non-displayed inline metadata,
      but the swml metadata can still be extracted -->
      <!--<span class="swml--blog--author">David Galbraith</span>-->
      <!-- the following line is what determines the rdf about attribute for the
      item; it also specifies the actual link that points to the archived
      version -->
      <a class="rdf--about-3748433" href="http://www.theliar.com/oldlies/2001_05_01_oldlies.html#3748433">:.</a>
      </blockquote>
      <p>
      </span>
      <span class="swml--rdf--item">
      <a name="3554025"></a>
      <span class="swml--blog--date_posted">8.5.01...</span>
      <blockquote>
      <span class="swml--rss1--description"><a class="swml--blog--headline"
      href="http://c.moreover.com/click/here.pl?x18749952">Ugly males are better
      partners</a>The less attractive
      make much better fathers because they don't go around chasing attractive
      females.<br>
      :!</span><br>
      <span class="swml--blog--time_posted">2:28 PM</span>
      <!--<span class="swml--blog--author">David Galbraith</span>-->
      <a class="rdf--about-3554025" href="http://www.theliar.com/oldlies/2001_05_01_oldlies.html#3554025">:.</a>
      </blockquote>
      <p>
      </span>
      </body>
      </html>
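
A minimal sketch of how "strict" extraction over a page like the one above might work, written in Python with the standard library's html.parser. It is only an illustration, not the script discussed in this thread: the `swml--` prefix (double hyphen as separator), the `SwmlExtractor` and `extract` names, and the sample snippet are all assumptions. It collects the text of every span or link whose class starts with the prefix and ignores everything else; it does not read href values or commented-out metadata, and it assumes span and a tags are properly nested.

```python
from html.parser import HTMLParser


class SwmlExtractor(HTMLParser):
    """Collect (class name, text) pairs for span/a elements with an swml class."""

    TRACKED = {"span", "a"}  # only these tags are tracked; <br>, <p> etc. are skipped

    def __init__(self, prefix="swml--"):  # prefix is an assumed convention
        super().__init__()
        self.prefix = prefix
        self.open = []   # stack of [class name or None, list of text parts]
        self.found = []  # (class name, text), in the order elements close

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            cls = dict(attrs).get("class") or ""
            # Strict mode: classes without the prefix are presentation only.
            self.open.append([cls if cls.startswith(self.prefix) else None, []])

    def handle_data(self, data):
        # Text belongs to every enclosing metadata element, not just the innermost.
        for cls, parts in self.open:
            if cls is not None:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.TRACKED and self.open:
            cls, parts = self.open.pop()
            if cls is not None:
                self.found.append((cls, " ".join("".join(parts).split())))


def extract(html, prefix="swml--"):
    parser = SwmlExtractor(prefix)
    parser.feed(html)
    parser.close()
    return parser.found


# Hypothetical sample, loosely modelled on the first item of the page above.
sample = """
<span class="swml--rdf--item">
  <span class="swml--blog--date_posted">22.5.01</span>
  <span class="swml--rss1--description">
    <a class="swml--blog--headline" href="http://example.org/story">Fears of a clown</a>
    <br>Things look bleak for <span class="swml--name">Bob Manion</span>.
  </span>
  <span class="presentation_item">Pretty thing.</span>
</span>
"""

if __name__ == "__main__":
    for class_name, text in extract(sample):
        print(class_name, "=>", text)
```

Note that the text of the presentation span still ends up inside the enclosing item, which matches the idea that it remains part of the parent class while not being metadata in its own right.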


      The following shows RSS 1.0 output based upon parsing the above SWML.
      Since RSS is chosen as the default namespace, any span class attribute
      without an explicitly declared namespace is presumed to belong to the
      namespace of the archive URI, which uses the alias 'my:' by default.
      For rendering in other vocabularies, it may be desirable to set the
      namespace aliased by 'my:' as the default.

      <?xml version="1.0" encoding="UTF-8" ?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
      xmlns="http://purl.org/rss/1.0/"
      xmlns:my="http://www.theliar.com/oldlies/2001_05_01_oldlies.html">
      <channel rdf:about="http://www.theliar.com/oldlies/2001_05_01_oldlies.html">
      <title>Permanent RDF-Ready Archive</title>
      <link>http://www.theliar.com/oldlies/2001_05_01_oldlies.html</link>
      <description>Weblog archive</description>
      <sy:updatePeriod>daily</sy:updatePeriod>
      <items>
      <rdf:Seq>
      <rdf:li rdf:resource="http://www.theliar.com/oldlies/2001_05_01_oldlies.html#3748433" />
      <rdf:li rdf:resource="http://www.theliar.com/oldlies/2001_05_01_oldlies.html#3554025" />
      </rdf:Seq>
      </items>
      </channel>
      <item rdf:about="http://www.theliar.com/oldlies/2001_05_01_oldlies.html#3748433">
      <link>http://www.theliar.com/oldlies/2001_05_01_oldlies.html#3748433</link>
      <description>Fears of a clown. Things look bleak for Bob Manion, aka
      "Flasher the Clown". For the past two decades, Bob has graced public
      festivals near his home in Clayton, California.
      WHAT? :!.</description>
      <my:name>Bob Manion</my:name>
      <blog:headline>
      <blog:headline_link>http://c.moreover.com/click/here.pl?x19439590</blog:headline_link>
      <blog:headline_text>Fears of a clown.</blog:headline_text>
      </blog:headline>
      <blog:headline>
      <blog:headline_link>http://c.moreover.com/click/here.pl?x19439590</blog:headline_link>
      <blog:headline_text>Things still look bleak.</blog:headline_text>
      </blog:headline>
      <blog:date_posted>22.5.01...</blog:date_posted>
      <blog:time_posted>12:10 PM</blog:time_posted>
      <blog:author>David Galbraith</blog:author>
      </item>
      <item rdf:about="http://www.theliar.com/oldlies/2001_05_01_oldlies.html#3554025">
      <link>http://www.theliar.com/oldlies/2001_05_01_oldlies.html#3554025</link>
      <description>Ugly males are better partners. The less attractive make much
      better fathers because they don't go around chasing attractive
      females.</description>
      <blog:headline>
      <blog:headline_link>http://c.moreover.com/click/here.pl?x18749952</blog:headline_link>
      <blog:headline_text>Ugly males are better partners.</blog:headline_text>
      </blog:headline>
      <blog:date_posted>8.5.01...</blog:date_posted>
      <blog:time_posted>2:28 PM</blog:time_posted>
      <blog:author>David Galbraith</blog:author>
      </item>
      </rdf:RDF>
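
The naming rule behind this output can be sketched roughly as follows, again in Python. This is a guess at the mapping rather than a definition of it: the function names, the `--` separator, and the `rss1` label for the default vocabulary are assumptions for illustration. A class with no vocabulary part falls back to the 'my:' alias, a class in the default vocabulary loses its prefix, and a link class expands into `_link` and `_text` children.

```python
def qualified_name(class_name, default_vocab="rss1", fallback_alias="my",
                   marker="swml", sep="--"):
    """Map an swml class name to an element name, or None if it is not swml.

    All parameter defaults are assumed conventions, not part of any spec.
    """
    parts = class_name.split(sep)
    if parts[0] != marker or len(parts) not in (2, 3):
        return None  # e.g. a plain presentation class
    if len(parts) == 2:
        # No vocabulary given: use the archive namespace alias.
        return f"{fallback_alias}:{parts[1]}"
    vocab, local = parts[1], parts[2]
    # Names in the default vocabulary are emitted without a prefix.
    return local if vocab == default_vocab else f"{vocab}:{local}"


def expand_link(qname, href, text):
    """A link class yields two children: <name>_link and <name>_text."""
    return {f"{qname}_link": href, f"{qname}_text": text}


if __name__ == "__main__":
    for name in ("swml--name", "swml--blog--headline",
                 "swml--rss1--description", "presentation_item"):
        print(name, "->", qualified_name(name))
```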

      ...................................
      David Galbraith - Chief Architect, founder
      Moreover Technologies, Inc.
      http://www.moreover.com
      mailto:david@...
      415-577-8828 (US)
      0777-565-8880 (UK)
      ...................................
      Moreover Technologies White Paper:
      "Managing Online Information to
      Maximize Corporate Intranet ROI"
      http://x.moreover.com/c/?sig
      ...................................

      >
      > Message: 1
      > Date: Mon, 3 Sep 2001 10:12:24 -0700
      > From: Mark Nottingham <mnot@...>
      > Subject: Re: RSSify your web page
      >
      >
      > Before this approach explodes too much, it seemed like the RSSify
      > engine was *only* basing items on a <span class="rss:item"> tag, and
      > using heuristics to discover the rest.
      >
      > If we're expecting a publisher to mark up their page, why not give
      > them better control over what becomes the title, the link, etc.,
      > rather than playing guesswork?
      >
      > I thought that Aaron's original engine, and the W3C version [1] did
      > this...
      >
      >
      > [1] http://www.w3.org/2000/08/w3c-synd/
      >
      >
      >
      >
    • Julian Bond
      Message 2 of 7 , Sep 5, 2001
        In article <LNBBIHIAEFCFHPLINLECGEHCACAB.david@...>, David
        Galbraith <david@...> writes
        >Below is a rough draft of what a marked-up HTML page would look like (the
        >markup is called swml 'semantic web markup language' for want of a better
        >description):

        Interesting but...

        If you have control over both the source and destination of this data,
        then I suppose you could do something with it. But I have the same
        problem with this that I have with the route RSSDF 1.0 has gone. It's
        all very well producing wonderful extensions to the standard but it's
        all a bit pointless isn't it, if there are no implementations of readers
        that know how to make sense of it.

        This is the problem with standards. Until you have a critical mass of
        implementations, they are irrelevant.

        So here are the questions. What would you do with this stuff in a reader?
        And how would you persuade people to produce it?

        Right now we have a standard in 0.91 (and a half) that has lots of
        implementations of both the source and destination of the data. The
        problem now is first, getting more of the rest of the world to produce
        it and second, producing better readers.

        I'm repeating myself, but the whole <span class="rss:item"> is nothing
        more than a temporary kludge. The real answer is to get RSS produced by
        default by as many CMS as possible.

        --
        Julian Bond email: julian_bond@...
        CV/Resume: http://www.voidstar.com/cv/
        WebLog: http://www.voidstar.com/
        HomeURL: http://www.shockwav.demon.co.uk/
        M: +44 (0)77 5907 2173 T: +44 (0)192 0412 433
        ICQ:33679568 tag:So many words, so little time
      • Rick Bradley
        Message 3 of 7 , Sep 5, 2001
          * Julian Bond (julian_bond@...) [010905 14:28]:
          > Right now we have a standard in 0.91 (and a half) that has lots of
          > implementations of both the source and destination of the data. The
          > problem now is first, getting more of the rest of the world to produce
          > it and second, producing better readers.
          >
          > I'm repeating myself, but the whole <span class="rss:item"> is nothing
          > more than a temporary kludge. The real answer is to get RSS produced by
          > default by as many CMS as possible.

          To get this kind of adoption a document containing the following needs
          to reach people capable of making it so at any given site using any
          given CMS (the bulk of which I contend are likely to be more
          home-grown than off-the-shelf):

          - a terse and dead-simple explanation of why RSS matters to those
          who will pay for its integration (it has a cost whether we wish to
          accept it or not):

          "If you add RSS support to your site you will get X, Y, and Z.
          These are things which will obviously increase your {bottom-line,
          advertising exposure, good karma, etc.}."

          - A short and dead-simple description of what a typical web site would
          emit as RSS of a /single/ specific version. This would ideally show
          mock-up HTML content and a 1:1 correspondence between the HTML elements
          in the mock-up and their corresponding RSS elements.

          - A short and dead-simple set of instructions on where to put the RSS feed,
          and how often to update it.

          - Links to already-built tools, and more in-depth resources.

          Does such a document exist? If yes, then it's not available enough to
          those who might consider using RSS. If not, then one needs to be
          written.

          Rick
          --
          Mostly useless pseudo-random number: 768
          Rick Bradley - http://xns.org/=rick@... (95 F)
        • Chris Croome
          Message 4 of 7 , Sep 5, 2001
            Hi

            On Wed 05-Sep-2001 at 07:59:29PM +0100, Julian Bond wrote:
            >
            > The real answer is to get RSS produced by default by as many CMS as
            > possible.

            I hope it's OK if I take that as a prompt to let the list know about a
            web content management tool that can produce RSS 0.9, 0.91 and 1.0
            feeds, MKDoc [1].

            It can also produce DC HTML and RDF metadata, and all content management
            is done via a web interface.

            Chris

            [1] http://mkdoc.com/

            --
            Chris Croome
            http://www.webarchitects.co.uk/
          • Julian Bond
            Message 5 of 7 , Sep 5, 2001
              In article <20010905160014.L17336@...>, Rick Bradley
              <roundeye@...> writes
              >To get this kind of adoption a document containing the following needs
              >to reach people capable of making it so at any given site using any
              >given CMS (the bulk of which I contend are likely to be more
              >home-grown than off-the-shelf):

              Hmm, what about all the slashclone code, manila, through cold fusion, MS
              CMS, up to documentum, vignette, bladerunner etc. Anyone writing their
              own CMS now for a single web site needs their head examined. And in
              general, for commercial content sites they don't use a home grown
              solution at all, at all.

              > - a terse and dead-simple explanation of why RSS matters to those
              > who will pay for its integration (it has a cost whether we wish to
              > accept it or not):
              >
              > "If you add RSS support to your site you will get X, Y, and Z.
              > These are things which will obviously increase your {bottom-line,
              > advertising exposure, good karma, etc.}."
              >
              > - A short and dead-simple description of what a typical web site would
              > emit as RSS of a /single/ specific version. This would ideally show
              > mock-up HTML content and a 1:1 correspondence between the HTML elements
              > in the mock-up and their corresponding RSS elements.
              >
              > - A short and dead-simple set of instructions on where to put the RSS feed,
              > and how often to update it.
              >
              > - Links to already-built tools, and more in-depth resources.
              >
              >Does such a document exist? If yes, then it's not available enough to
              >those who might consider using RSS. If not, then one needs to be
              >written.

              http://www.voidstar.com/rssfaq
              http://blogspace.com/rss
              http://www.purplepages.ie/RSS/

              I'm sure all three would welcome any input that provided the above. The
              first one is built in a CMS where you can comment and suggest changes or
              additions directly.

              One of the side effects of the Syndic8 project is hopefully going to be
              more people contacting more potential sources of RSS, and they will need
              to answer the same questions repeatedly. "Available enough" perhaps
              means "sufficiently promoted".

              --
              Julian Bond email: julian_bond@...
              CV/Resume: http://www.voidstar.com/cv/
              WebLog: http://www.voidstar.com/
              HomeURL: http://www.shockwav.demon.co.uk/
              M: +44 (0)77 5907 2173 T: +44 (0)192 0412 433
              ICQ:33679568 tag:So many words, so little time
            • Mike Dierken
              Message 6 of 7 , Sep 6, 2001
                RE: [syndication] SWML

                >
                > I'm repeating myself, but the whole <span class="rss:item"> is nothing
                > more than a temporary kludge. The real answer is to get RSS
                > produced by default by as many CMS as possible.
                >
                Actually, it is an example of an Architectural Form - a concept from SGML.

                The difference between:
                 <span class="rss:item">
                and
                 <rss:item html="span">

                is pretty minimal & mainly textual (at the file storage level).

                If you give configuration info to a SAX parser - like "use 'class=' to get the tagname" then you can consume either text format & the result (sax event or DOM, whichever) looks the same. (I don't know of any SAX parser that has this built in, so this unfortunately is only xml theory, not xml practice.)

                Using architectural forms lets the publisher decide what they like & makes 'transmogrifying' relatively simple and efficient. Arbitrary transforms - like with XSL - can also do the job, but are more expensive computationally.

                If you try to get each producer to use a common format - which probably is not native to their data model - then you are pushing the burden of transformation onto them. Separating this allows for a 'middle-man' to do the transform. If you don't have a system with a middle-man (like web-servers & web-browsers originally had) then you'll probably have to push this onto either the server or the client. If servers already have a plug-in approach (like a CGI/servlet/etc.) then it probably isn't that difficult for providers to do the work - but they won't unless there are enough clients.


                Mike

              • Eric Bohlman
                Message 7 of 7 , Sep 7, 2001
                  9/6/01 3:25:09 PM, Mike Dierken <mike@...> wrote:
                  > If you give configuration info to a SAX parser - like "use 'class=' to get
                  > the tagname" then you can consume either text format & the result (sax
                  > event or DOM, whichever) looks the same. (I don't know of any SAX parser
                  > that has this built in, so this unfortunately is only xml theory, not xml
                  > practice.)

                  You can do such things with a SAX filter, which is in fact the natural place to do architectural
                  processing (presuming that you don't require an awful lot of lookahead).
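
As a rough illustration of the filter idea, here is a sketch using Python's standard xml.sax module, where `XMLFilterBase` sits between the parser and the consumer. The `ClassToTagFilter` and `transform` names and the `rss:` prefix test are assumptions for the example, and it only handles well-formed XML (XHTML), not tag-soup HTML. The filter renames any element whose class attribute starts with the prefix, so both textual forms produce the same events downstream.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class ClassToTagFilter(XMLFilterBase):
    """Rename <span class="rss:item"> to <rss:item html="span"> on the fly."""

    def __init__(self, parent=None, prefix="rss:"):  # prefix is an assumption
        super().__init__(parent)
        self.prefix = prefix
        self.names = []  # emitted name for each open element, for endElement

    def startElement(self, name, attrs):
        cls = attrs.get("class", "")
        if cls.startswith(self.prefix):
            # Drop class, remember the original tag name in an html attribute.
            kept = {k: v for k, v in attrs.items() if k != "class"}
            kept["html"] = name
            name, attrs = cls, AttributesImpl(kept)
        self.names.append(name)
        super().startElement(name, attrs)

    def endElement(self, name):
        # No lookahead needed: the stack tells us what the open tag became.
        super().endElement(self.names.pop())


def transform(xml_text):
    """Run the filter over a string and serialize the resulting events."""
    out = io.StringIO()
    reader = ClassToTagFilter(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.StringIO(xml_text))
    return out.getvalue()


if __name__ == "__main__":
    print(transform('<div><span class="rss:item">Fears of a clown</span></div>'))
    print(transform('<div><rss:item html="span">Fears of a clown</rss:item></div>'))
```

Any content handler could replace the `XMLGenerator` here; it is used only to make the resulting events visible as text.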