Re: [govtrack] Re: linking Govtrack data

  • Josh Tauberer
    Message 1 of 8, Jul 8, 2008
      Roger Williams wrote:
      > --- In govtrack@yahoogroups.com, Josh Tauberer <tauberer@...> wrote:
      >> The data is marginally interlinked. For instance, their New York
      >> resource:
      >> http://dbpedia.org/resource/New_York
      >> is owl:sameAs'd to my New York resource:
      >> http://www.rdfabout.com/rdf/usgov/geo/us/ny
      >>
      >> This was done based on their previously existing owl:sameAs links
      >> to the Geonames dataset, and then the links I created between
      >> Geonames and my Census data set (http://www.rdfabout.com/demo/census).
      >> That's the extent of the interlinking --- otherwise the datasets
      >> don't connect at all. No politician interlinking, for instance.
      >> (Also see: http://wiki.dbpedia.org/Interlinking)
      >>
      > Yeah...I looked at this DBPedia page before. What I got confused
      > about is that it has a bubble for "US Census Data" and
      > then "GovTrack" is greyed. There is no legend for this graphic.
      >
      > Do you know what the greyed bubbles connote? Is this like more
      > than one "degree of separation"?

      Yes, I think. The diagram is the generic Linking Open Data diagram that
      shows all of the datasets in the LOD community. Besides DBPedia, the
      highlighted ones are those which DBPedia itself asserts owl:sameAs links
      to. The US Census Data refers to my Census data set which has the
      geographic identifiers, and then hanging off of that (the GovTrack
      bubble) is the political side to my data.
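The New York link in the quoted text was obtained by chaining two sets of links: DBPedia's existing owl:sameAs links to Geonames, plus the links from Geonames to the Census data set. Treating owl:sameAs as symmetric and transitive, that chaining can be sketched with a union-find over plain URI pairs. This is a minimal illustration, not necessarily how the rdfabout.com links were actually produced, and the Geonames URI below is a hypothetical placeholder.

```python
# Sketch: derive new owl:sameAs links by composing existing ones transitively.
# GEONAMES_NY is a hypothetical placeholder, not the real Geonames identifier.

DBPEDIA_NY = "http://dbpedia.org/resource/New_York"
GEONAMES_NY = "http://sws.geonames.org/EXAMPLE/"  # hypothetical placeholder
RDFABOUT_NY = "http://www.rdfabout.com/rdf/usgov/geo/us/ny"

# Existing pairwise links: DBPedia -> Geonames, then Geonames -> Census data.
existing_links = [
    (DBPEDIA_NY, GEONAMES_NY),
    (GEONAMES_NY, RDFABOUT_NY),
]


def same_as_closure(links):
    """Group URIs into equivalence classes (owl:sameAs is symmetric and transitive)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in links:
        union(a, b)

    groups = {}
    for uri in parent:
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


def derived_links(links):
    """Return sameAs pairs implied by the closure but not stated directly."""
    stated = {frozenset(pair) for pair in links}
    derived = set()
    for group in same_as_closure(links):
        members = sorted(group)
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                if frozenset((a, b)) not in stated:
                    derived.add((a, b))
    return derived


if __name__ == "__main__":
    # Print each derived link as an N-Triples statement.
    for a, b in sorted(derived_links(existing_links)):
        print(f"<{a}> <http://www.w3.org/2002/07/owl#sameAs> <{b}> .")
```

Run on the two sample links, the only derived pair is the direct DBPedia-to-rdfabout.com link for New York, which is the kind of link described in the quoted message.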

      > But I can guess that this can be helped by code to link all of
      > the "Clinton" [in GovTrack] references to "Hillary/Bill/New
      > York/Arkansas" [in Wiki/DBPedia].
      >
      > With some stuff I think about, the first question is: how hard is
      > this curation? Can this be done with Protege
      > [http://protege.stanford.edu/]?

      Identifying connections isn't hard to do by hand, except that there are a
      lot of connections to identify, and it's more efficient if you can write
      a program to do it en masse. For Members of Congress, OK, there are only
      535 now (435 Representatives plus 100 Senators), but there are some
      10,000 looking back to 1789 (or whatever), plus there are some 75,000
      geographic entities that might be related between data sets.
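Writing "a program to do it en masse" for politicians could start as simply as matching records on a normalized (name, state) key and setting aside anything ambiguous (two people sharing a surname, as in the "Clinton" question) for manual review. The sketch below assumes hypothetical record layouts, identifiers and fictional sample people; it is not GovTrack's or DBPedia's actual schema, and real matching would likely also need dates of service, party and name variants.

```python
# Sketch: propose links between politician records in two datasets by matching
# on a normalized (name, state) key. Record layouts, IDs and people are hypothetical.
import unicodedata
from collections import defaultdict


def normalize(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(cleaned.lower().split())


def match_records(left, right):
    """Return (matches, ambiguous); matches maps a left id to a single right id.

    A left record is matched only when exactly one right record shares its key;
    if several do, the candidates are reported as ambiguous for manual review.
    """
    index = defaultdict(list)
    for rec in right:
        index[(normalize(rec["name"]), rec["state"])].append(rec["id"])

    matches, ambiguous = {}, {}
    for rec in left:
        candidates = index.get((normalize(rec["name"]), rec["state"]), [])
        if len(candidates) == 1:
            matches[rec["id"]] = candidates[0]
        elif candidates:
            ambiguous[rec["id"]] = candidates
    return matches, ambiguous


# Hypothetical sample data (fictional people, made-up identifiers).
govtrack_people = [
    {"id": "gt:1001", "name": "Jane Q. Doe", "state": "NY"},
    {"id": "gt:1002", "name": "John Roe", "state": "AR"},
    {"id": "gt:1003", "name": "Sam Smith", "state": "OH"},
]
wiki_people = [
    {"id": "wiki:Jane_Q._Doe", "name": "Jane Q Doe", "state": "NY"},
    {"id": "wiki:John_Roe_(politician)", "name": "John Roe", "state": "AR"},
    {"id": "wiki:Sam_Smith_(Ohio)", "name": "Sam Smith", "state": "OH"},
    {"id": "wiki:Samuel_Smith_(Ohio)", "name": "Sam Smith", "state": "OH"},
]

if __name__ == "__main__":
    found, unclear = match_records(govtrack_people, wiki_people)
    print(found)    # confident one-to-one matches
    print(unclear)  # left for a person to resolve
```

Refusing to guess when a key is shared keeps the automatic pass conservative; the ambiguous list is where hand curation, or a tool like Protege, would still come in.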

      > Then if we do update GovTrack with all this, what happens when
      > GovTrack is refreshed?

      Whatever connections are made would be stored in its own file, and since
      the identifiers for things are stable, it would continue to make sense
      as data is updated.
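Keeping the connections "in its own file" could look like a small N-Triples file of owl:sameAs statements that is written once and reloaded after each refresh of the main data. Because the identifiers are stable, the links keep pointing at the right resources; the sketch simply drops any link whose local URI no longer exists. The file name, the (local, external) pair layout and the refresh step are assumptions for illustration, and the regex only handles simple URI-only triples, not full N-Triples.

```python
# Sketch only: keep curated owl:sameAs links in their own N-Triples file, apart
# from the regularly refreshed dataset. File names and the pair layout are hypothetical.
import os
import re
import tempfile

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Matches simple "<s> <p> <o> ." lines; literals and blank nodes are not handled.
_TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.$")


def write_links(path, pairs):
    """Write (local_uri, external_uri) pairs as owl:sameAs N-Triples, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for local, external in sorted(pairs):
            f.write(f"<{local}> <{OWL_SAME_AS}> <{external}> .\n")


def read_links(path):
    """Read owl:sameAs pairs back, skipping blank lines, comments and other triples."""
    pairs = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            m = _TRIPLE.match(line.strip())
            if m and m.group(2) == OWL_SAME_AS:
                pairs.add((m.group(1), m.group(3)))
    return pairs


def links_for_refresh(refreshed_uris, pairs):
    """After a data refresh, keep the links whose local URI still exists.

    With stable identifiers most links survive unchanged; any that do not
    are dropped rather than left pointing at a resource that is gone.
    """
    return {(local, external) for local, external in pairs if local in refreshed_uris}


if __name__ == "__main__":
    links = {
        ("http://www.rdfabout.com/rdf/usgov/geo/us/ny",
         "http://dbpedia.org/resource/New_York"),
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sameas-links.nt")  # hypothetical file name
        write_links(path, links)
        reloaded = read_links(path)
    print(links_for_refresh({"http://www.rdfabout.com/rdf/usgov/geo/us/ny"}, reloaded))
```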

      --
      - Josh Tauberer
      - GovTrack.us

      http://razor.occams.info

      "Yields falsehood when preceded by its quotation! Yields
      falsehood when preceded by its quotation!" Achilles to
      Tortoise (in "Gödel, Escher, Bach" by Douglas Hofstadter)