RSS: Introducing Myself
- Hi. My name is Dan Libby.
Disclaimer 1: My apologies if this sounds pompous, or you disagree with my recommendations. Feel free to take it with a grain of salt.
Disclaimer 2: I speak only for myself, and not for my employer(s) - past or present.
I was the primary author of the RSS 0.9 and 0.91 spec and the architect behind the My Netscape Network (a separate project from My Netscape, which I also worked on). I left Netscape in 1999, in part because of what I felt was mis-handling (non-handling?) of RSS and the MN platform. I fully expected the format to die an ignominious death, and I was pleasantly surprised recently to poke my head out of the sand and find so many people still using it. I am glad that the net community has begun adopting RSS, and would like to see it realize the original vision. So I have been watching the recent discussions with interest. It is my hope that some background will at least make my original intent and reasoning clear and perhaps even help us avoid a fork, though perhaps that is inevitable?
The Original My Netscape Network Vision:
We would create a platform and an RDF vocabulary for syndicating metadata about websites and aggregating them on My Netscape and ultimately in the web browser. Because we only retrieved metadata, the website authors would still receive users' click-throughs to view the full site, thus benefitting both the aggregator and the publisher. My Netscape would run an RDF database that stored all the content. Preferences, akin to mail filters, would allow the user to filter only the data in which they are interested onto the page, from the entire pool of data. For example, a user interested in articles about "Football" would be able to set up a personalized channel that simply consisted of a filter for Football, or even for a particular team or player. Or for all references to Slashdot.org, or whatever. This fit our personalization scheme well, and would (I hoped) give us the largest selection of content, with the greatest degree of personalization available. Tools would be made available to simplify the process of creating these files, and to validate them, and life would be good.
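As a rough illustration of the "personalized channel as a filter" idea, here is a minimal Python sketch. It is not how My Netscape was built: the Item record, the function names, and the sample pool are all hypothetical, and a plain in-memory list stands in for the RDF database described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical metadata record for one syndicated story."""
    title: str
    link: str
    description: str = ""


def make_keyword_filter(keyword):
    """Build a mail-filter-like predicate that matches items mentioning keyword."""
    needle = keyword.lower()

    def matches(item):
        haystack = f"{item.title} {item.link} {item.description}".lower()
        return needle in haystack

    return matches


def personalized_channel(pool, *filters):
    """Return the items from the shared pool that pass at least one filter."""
    return [item for item in pool if any(f(item) for f in filters)]


# Made-up pool of metadata gathered from many publishers.
POOL = [
    Item("Football scores roundup", "http://sports.example.com/1"),
    Item("New kernel released", "http://slashdot.org/articles/1"),
    Item("Gardening tips", "http://home.example.com/2"),
]

football = personalized_channel(POOL, make_keyword_filter("football"))
slashdot = personalized_channel(POOL, make_keyword_filter("slashdot.org"))
```

Each channel here holds only metadata (title and link), so reading the full story still means clicking through to the publisher's site.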
What Actually Happened:
1) A decision was made that for the first implementation, we did not actually need a "real" RDF database, which did not even really exist at the time. Instead we could put the data in our existing store, and instead display data, one "channel" at a time. This made publishers happier anyway, because they would get their own window and logo. We could always do the "full" implementation later.
2) The original RDF/RSS spec was deemed "too complex" for the "average user". The RDF data model itself is complex to the uninitiated, and thus the placement of certain XML elements representing arc types seemed redundant and arbitrary to some. Support for XML namespaces was basically non-existent. My (poor) solution was to create a simpler format, RSS 0.9, that was technically valid RDF, but dropped namespaces and created a non-connected graph. We decided that it could always be "transformed" into a graph for the to-be-built RDF database, but this imposed a 1 channel per file limitation. People were willing to live with it. (Note: the "inChannel" tag in the RSS 1.0 proposal solves this problem neatly.) This marked the beginning of the Full Functionality vs Keep It Simple Stupid debate that continues to this day. It is interesting to note that the _original_ spec I wrote is actually much closer to RSS 1.0 than to either 0.9 or 0.91. At the time, I insisted that we publish it, if only to make the RDF crowd happy, and we ironically called it the Futures Document.
3) We shipped the first implementation, sans tools. Basically, there was a spec for RSS 0.9, some samples, and a web-based validation tool. No further support was given for a while, and I was kept busy working on other projects. Even still, channels started coming in, and the system worked in a rudimentary fashion.
4) At some point, it was decided that we needed to rev the RSS spec to allow things like per-item descriptions, i18n support, ratings, and image width and height. Due to artificial (in my view) time constraints, it was again decided to continue with the current storage solution, and I realized that we were *never* going to get around to the rest of the project as originally conceived. At the time, the primary users of RSS (Dave Winer the most vocal among them) were asking why it needed to be so complex and why it didn't have support for various features, e.g. update frequencies. We really had no good answer, given that we weren't using RDF for any useful purpose. Further, because RDF can be expressed in XML in multiple ways, I was uncomfortable publishing a DTD for RSS 0.9, since the DTD would claim that technically valid RDF/RSS data conforming to the RDF graph model was not valid RSS. Anyway, it didn't feel "clean". The compromise was to produce RSS 0.91, which could be validated with any validating XML parser, and which incorporated much of UserLand's vocabulary, thus removing most (I think) of Dave's major objections. I felt slightly bad about this, but given actual usage at the time, I felt it better suited the needs of its users: simplicity, correctness, and a larger vocabulary, without RDF baggage. (I also had a really fun time writing a vocab-independent XML validation system in Python, which it turns out is pretty similar to XML-Schema.)
5) We shipped the thing in a very short time, meeting the time constraints, then spent a month or two fixing it all. :-) It was apparently not deemed "strategic", and thus was never given more than maintenance attention.
6) People on the net began creating all sorts of tools on their own, and publishing how-to articles, and all sorts of things, and using it in ways not envisioned by, err, some. And now we are here, debating it all over again. Fortunately, this time it is in an open forum.
My Perspective On "The Right Thing":
1) I agree with Dave and others that ease/simplicity of USE is very important. I think the success of RSS 0.9* has been because it was so simple. Anyone who knew HTML could do it, which was good, because they had to do it by hand.
2) Simplicity and ease of use do not require a simple format. Microsoft Word is pretty simple to use, but try reading the binary representation of a saved file sometime. Or even their new XML representation. The important thing is that the end-user tools be simple. This means pre-built scripts for script-writers and field-by-field hand-holding entry for those who would otherwise hand-code, and a validator for both.
3) Flexibility and extensibility are necessities and supersede even the need for simplicity. Without them, the format will assuredly split and will be used in ways never intended. With them, it is safe to add your own random data type, and the receiver is free to interpret or ignore as it sees fit. As long as everyone agrees on the core, RSS remains a useful mechanism. For this reason, I would suggest that Dublin Core be added to the core spec, in the way that RSS 0.91 has been (as a core 'module'). This was originally intended anyway, as evidenced by the "futures" document.
4) Validation is extremely important -- important enough to be listed apart from "tools". Someone publishing a document *must* be able to validate that the document is correct before sending it, particularly when setting up an automated system. Validation further helps prevent the format from splitting, particularly in areas where the spec may be unclear. For XML, validation requires minimally a DTD, and optimally XML-Schema and/or further application level processing. For RDF, validation requires an RDF-Schema aware processor (I believe).
5) Given the above points, I (for the most part) support the RSS 1.0 spec, as written. I believe it has a high degree of flexibility while maintaining a relatively simple core set. However, to be *practical*, we must first create the tools for 1) validation, 2) processing, and 3) generation, pretty much in that order. With proper validation tools, people can begin writing processors and generators, or even producing files by hand. Without them, it is like shooting in the dark.
5a) Another note on this, and a caveat -- given that the RSS 1.0 spec utilizes RDF, I believe that the tools and format itself should be RDF aware _from the start_. A solid foundation is key to building anything that is going to last. This means that it is the *data model* that is important, not the physical syntax of "start with channel, then several items, etc". In fact, I believe the spec itself should be an RDF Schema depicting the data model, with simple examples of how to express it in XML. Anything less results in confusion and a mish-mash of incompatible tools, where some are simple XML processors and some are full RDF-aware processors. I see this as the largest hurdle for RSS as RDF, given the comparative lack of RDF tools to XML tools. If we are not willing to commit to this in the spec and tools, then we may as well go back to a plain XML format. In other words, put up or shut up.
6) Is RDF Necessary? Well, no. Not for plain syndication anyway. That's why I got rid of it in 0.91. But it is pretty cool. Now, after a year of working with it on a day to day basis, I have a fairly good understanding of what it is and is not good at. It is good at expressing a data model and allowing one to refer to arbitrary things without duplicating data, something the XML tree structure is weak at. Since RSS has "Summary" as its third word (regardless of version), that seems like a pretty good match. I think that basing the format in RDF will add value as more and more people are using it and are able to refer to things in databases all over the web without physically re-bundling the data. In other words, the value at the beginning will be small or non-existent, but will grow non-linearly over time.
7) I think that the original vision mentioned above is still do-able, particularly given the existence of Guha's RDFDB and similar tools, and that someone could build a very kickass personalization/filtering and syndication system that way. (Of course, given proper transformations and a suitable backend, you could regardless of the format.)
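To make points 3 and 4 a little more concrete, here is a minimal Python sketch of a receiver that reads the core item fields it understands, sets aside any extension elements it does not recognize, and runs a very basic check on the core. Everything here is illustrative: the sample feed, the "x" extension namespace, and the helper names are made up, and the choice of title and link as the required item fields is an assumption of this sketch. It is also not a substitute for real validation against a DTD or schema; it only shows the "interpret or ignore" behavior.

```python
import xml.etree.ElementTree as ET

# Core item vocabulary this hypothetical receiver understands.
CORE_ITEM_FIELDS = ("title", "link", "description")

# Made-up feed: one item carrying an extension element from a hypothetical namespace.
SAMPLE = """<rss version="0.91" xmlns:x="http://example.com/hypothetical-ext">
  <channel>
    <title>Example</title>
    <link>http://example.com/</link>
    <description>Sample channel</description>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
      <x:mood>cheerful</x:mood>
    </item>
  </channel>
</rss>"""


def read_items(xml_text):
    """Return one (core_fields, ignored_tags) pair per <item> element."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        core, ignored = {}, []
        for child in item:
            if child.tag in CORE_ITEM_FIELDS:
                core[child.tag] = (child.text or "").strip()
            else:
                # Unknown or extension element: note it and move on.
                ignored.append(child.tag)
        items.append((core, ignored))
    return items


def missing_core_fields(core):
    """List the fields this sketch treats as required that are absent."""
    return [name for name in ("title", "link") if name not in core]


items = read_items(SAMPLE)
core, ignored = items[0]
problems = missing_core_fields(core)
```

A receiver written this way keeps working when a publisher adds new elements, as long as both sides still agree on the core.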
That's my $.02. My congrats and thanks to the authors and champions of the RSS 1.0 spec and all of you who have given RSS renewed life after Netscape all but abandoned it, and to Rael Dornfest for making me take notice.