RDF to SQL
This is mainly a follow-up to something Chris from DIA (he's on the mailing
list now) and I were talking about Monday, but I think the list at large
would be interested in seeing this.
When I met with Chris on Monday, he raised an important point: whatever
system is used to share data should be really easy to use, in part to
encourage people to use it.
Sharing data as raw databases makes it really easy to drop the data into
a website, he suggested. And, I totally admit that it's much easier
than dealing with RDF.
But, as I responded Monday, once data is in RDF, it's easy to export it
into a database. Two weeks ago I had been working on an RDF querying
engine (for fun, really, since there are already existing programs to do
this), and this week I added to it an SQL output format. The result is
the ability to query an RDF data model and output it, more or less, as a
First some background...
GovTrack publishes an RDF version of all of its data at
http://www.govtrack.us/data/rdf/. You should take a look at the
people.rdf file, if you haven't seen it yet, to get a general idea of the
structure of the data.
You can browse the data at http://www.govtrack.us/rdfbrowse.xpd. The
browser program itself knows nothing about the type of data that it's
browsing, which is a good example of the advantage of using RDF. All of
the different types of information magically just come together, with no
glue specific to each type of data. (The browser uses the RDF schemas
in http://www.govtrack.us/share/ and some labels present in the RDF
files above to display nice names in place of some URIs.)
RDF can be written in XML or Notation 3, among other formats. There are
N3 versions of the schemas in the share directory if you want to see
what they look like. N3 is a much simpler format than RDF/XML. It's
basically just a list of statements: subject predicate object, followed
by a period.
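
To make the "subject predicate object, followed by a period" idea concrete,
here is a minimal Python sketch that reads a simplified, N3-like list of
statements into triples. It is only an illustration: the example.org URIs
and the parse_statements helper are made up for this note, and real N3 also
has prefixes, literals containing spaces, and other shorthand that this
sketch ignores.

```python
# Hypothetical sample statements; the URIs are placeholders, not GovTrack data.
SAMPLE = """
<http://example.org/people/A> <http://example.org/schema#name> "Alice" .
<http://example.org/people/A> <http://example.org/schema#state> <http://example.org/states/NY> .
"""


def parse_statements(text):
    """Split each 'subject predicate object .' line into a 3-tuple.

    Assumes one statement per line and no whitespace inside a term.
    """
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.endswith("."):
            raise ValueError("statement must end with a period: %r" % line)
        parts = line[:-1].split()
        if len(parts) != 3:
            raise ValueError("expected subject, predicate, object: %r" % line)
        triples.append(tuple(parts))
    return triples


TRIPLES = parse_statements(SAMPLE)
```

Each entry in TRIPLES is one statement, so the whole file is just a flat
list with no nesting to untangle.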
For the query engine that I wrote, the query itself is written as RDF
(in this case as N3). You give it an RDF graph with some nodes marked
as variables, and the engine tells you the different ways it can match
up (bind) those variables with entities in the target data model.
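
The matching step can be sketched in a few lines of Python. This is not the
engine's actual code, just a naive illustration of the idea under some
assumptions: variables are written with a leading "?", the data is a plain
list of triples, and the "ex:" names are invented for the example.

```python
def is_var(term):
    # Assumption for this sketch: variables are terms that start with "?".
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Return every way to bind the variables in `patterns` against `triples`."""
    binding = binding or {}
    if not patterns:
        return [binding]  # every pattern matched; this binding is a solution
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for p, t in zip(first, triple):
            if is_var(p):
                # Bind the variable, or check it agrees with an earlier binding.
                if candidate.setdefault(p, t) != t:
                    ok = False
                    break
            elif p != t:
                ok = False
                break
        if ok:
            results.extend(match(rest, triples, candidate))
    return results


# Hypothetical data model and query graph.
DATA = [
    ("ex:alice", "ex:holdsOffice", "ex:office1"),
    ("ex:office1", "ex:state", "ex:NY"),
    ("ex:bob", "ex:holdsOffice", "ex:office2"),
    ("ex:office2", "ex:state", "ex:CA"),
]
QUERY = [
    ("?person", "ex:holdsOffice", "?office"),
    ("?office", "ex:state", "?state"),
]
BINDINGS = match(QUERY, DATA)
```

Here BINDINGS holds one dictionary per match, each mapping ?person, ?office,
and ?state to an entity from DATA.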
Ok, the example...
You can try it out at http://www.govtrack.us/rdfquery.cgi. Admittedly,
it's difficult to come up with valid queries because the structure of the
data isn't all that simple.
The example queries the data model for all representatives currently
serving in an office. (Since the data model is pretty rich, it's also
possible to write queries to list the population of each state for any
senator who voted Nay on legislation related to Copyright, for instance.)
Anyway, the idea is that once the data is in RDF, we could come up with
some queries to generate database versions of the information, and then
also publish those.
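
As a rough sketch of what "database versions" could look like, the Python
below turns a list of query results (one dictionary per row) into CREATE
TABLE and INSERT statements. The table name, column names, sample rows, and
the bindings_to_sql helper are all hypothetical, and this is not the output
format of the query engine itself. It also assumes the table and column
names are trusted identifiers, since only the values are quoted.

```python
def bindings_to_sql(table, columns, rows):
    """Build SQL text that recreates `rows` as a table of TEXT columns."""

    def quote(value):
        # Standard SQL escaping: double any single quote inside the literal.
        return "'" + str(value).replace("'", "''") + "'"

    col_list = ", ".join(columns)
    col_defs = ", ".join(c + " TEXT" for c in columns)
    statements = ["CREATE TABLE %s (%s);" % (table, col_defs)]
    for row in rows:
        values = ", ".join(quote(row[c]) for c in columns)
        statements.append(
            "INSERT INTO %s (%s) VALUES (%s);" % (table, col_list, values)
        )
    return statements


# Hypothetical query results, one dictionary per matched row.
ROWS = [
    {"person": "Alice O'Neil", "state": "NY"},
    {"person": "Bob", "state": "CA"},
]
SQL = bindings_to_sql("representatives", ["person", "state"], ROWS)
```

The statements in SQL can be written to a file and loaded into whatever
database a site already uses, which is the easy drop-in path Chris was
asking for.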
- Joshua Tauberer
** Nothing Unreal Exists **