Mixing Facts with Speculation/Gossip
- Aroundthecapitol.com seems to be impacted by
intermingling speculation with tables of data taken
from official sources. Its creator had a section for
anonymous discussions and speculation about politicians
and bills, including who may or may not be running in
future elections.
Apparently he was threatened with legal action for
something someone said. See his blog entry:
What does everyone think about this? I understand the
legal implications of publishing anonymous comments
(and I think he could probably win if he had the
resources to fight it), but I think there is an
underlying issue regarding data source integrity.
I'm all for free speech and public discourse on the
issues, but is a data source site the proper place for
such discussions? Does a public forum put the integrity
of the data source, or of the entities maintaining it,
into question?
Another questionable area is the fact that the site's
creator is a lobbyist himself and has not ruled out
running for public office. As a user of the website,
how can I be assured that the data has not been
I guess what I'm really looking for is how do I make
sure my own data gathering efforts are not
discredited? Process translucence perhaps?
- Hi Scott,
You've raised some really good points. I think you're absolutely right that pure data mining in the same site with an open forum that has a decided purpose of shaping public opinion is something we'd REALLY like to avoid. That was the reason I separated Pythia from WWW on ProgressiveNation.net. The idea: let visitors mine what they need to mine, the way they need to mine it -- on the mining site. Once they've drawn their conclusions or are ready to publish their studies, they must go back to the main site.
As we're scraping and conditioning the raw data, we might think about a common format for citing the source, perhaps briefly describing the normalization procedures and stating outright that no opinion has been made on any of the contents. I'm not familiar with other members' sites, so I'm not sure what sort of mining and publishing operations are already underway.
Joshua, would OGDEX support such a resource registration/citation form? This kind of thing is more common in the genealogy world, where citation of source is EVERYTHING. (Genealogists are even more cynical than political researchers :chuckle:) That would give us an idea of the resources we've got on hand, so we can begin building the inter-site communications. (More on that in another post).
I think by registering our resources on OGDEX and/or GovTrack we can lend some assurance that we've at least done a peer-review and have agreed that the data (wherever it may currently lie) is coherent, standardized, and as complete as currently possible, given the technologies we're forced to employ. (The expression "BFH" [Big Friendly Hammer] springs to mind.)
Resource registration isn't the whole solution, but it would be a start.
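To make the citation idea a bit more concrete, here is a minimal sketch in Python of what one registration record could carry: the source, when it was retrieved, the normalization steps applied, and the no-opinion statement. Every name in it (SourceCitation, its fields, the example URL) is hypothetical and only for illustration; none of it is an existing OGDEX or GovTrack format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Fixed statement attached to every record (wording taken from the proposal above).
NO_OPINION = "No opinion has been made on any of the contents."


@dataclass
class SourceCitation:
    """Hypothetical citation record for one scraped data set."""
    dataset: str                 # what the data is
    source_name: str             # who published the original
    source_url: str              # where it was taken from
    retrieved: date              # when it was scraped
    normalization_steps: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the citation as a plain-text block."""
        lines = [
            f"Dataset: {self.dataset}",
            f"Source: {self.source_name} <{self.source_url}>",
            f"Retrieved: {self.retrieved.isoformat()}",
            "Normalization:",
        ]
        # List each step, or say explicitly that nothing was changed.
        lines += [f"  - {step}" for step in self.normalization_steps] or ["  - none"]
        lines.append(NO_OPINION)
        return "\n".join(lines)


if __name__ == "__main__":
    citation = SourceCitation(
        dataset="roll-call votes",
        source_name="Example Legislature",       # placeholder, not a real source
        source_url="http://example.org/votes",   # placeholder URL
        retrieved=date(2005, 3, 1),
        normalization_steps=["trimmed whitespace", "dates converted to ISO 8601"],
    )
    print(citation.render())
```

Keeping the normalization steps as an explicit list is one way to let a reader see exactly what was done to the raw data before it was published.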
----- Original Message -----
From: Scott Beardsley <sc0ttbeardsley@...>
Sent: Tue, 1 Mar 2005 00:34:50 +0000
Subject: [govtrack] Mixing Facts with Speculation/Gossip
- Bill Farrell wrote:
> As we're scraping and conditioning the raw data, we might think about
> a common format for citing the source, perhaps briefly describing the
> normalization procedures, and state outright that no opinion has been
> made on any of the contents. [snip]
> Joshua, would OGDEX support such a resource registration/citation
> form?
OGDEX was started as a community effort with no particular leadership,
so don't look at me. :) Its mission is to serve as a hub for efforts
along these lines, and I agree that we will need a system for describing
data sources, and I also think OGDEX would work well as a central place
to list sources in that way.
I have lots of ideas (as per usual) about how to go about describing
data sources, but before I present them 1) I need to get everyone to
agree that RDF is the way to approach this (otherwise we're going to be
debating XML formats forever), and 2) we need to actually get different
data sources available on the web.
The ideal way to get all of this going is for someone that has data
(e.g. you, Bill) to pick out a slice of their data that is related to
what's on GovTrack but doesn't overlap with it, and then for you and
GovTrack to export that data in a common format (e.g. RDF). These days
I'm just waiting for this to happen.
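As a rough sketch of what exporting such a slice could look like, the Python below writes a few records out as N-Triples, one of the plain-text RDF serializations. The record layout, the example.org URIs, and the externalId property are assumptions made up for illustration; only the FOAF name property is an existing vocabulary term. This is not GovTrack's actual export.

```python
from typing import Dict, Iterable, Mapping

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
# Hypothetical property for an identifier one site has and the other lacks.
EXTERNAL_ID = "http://example.org/terms/externalId"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(
    records: Iterable[Mapping[str, str]],
    subject_base: str,
    predicates: Dict[str, str],
) -> str:
    """Emit one triple per (record, mapped field) pair.

    Each record needs an 'id' key; `predicates` maps field names
    to the property URIs used for them.
    """
    lines = []
    for rec in records:
        subject = f"<{subject_base}{rec['id']}>"
        for key, predicate in predicates.items():
            if rec.get(key) is not None:
                literal = escape_literal(str(rec[key]))
                lines.append(f'{subject} <{predicate}> "{literal}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    people = [
        {"id": "p1", "name": "Jane Doe", "external_id": "X000001"},  # made-up data
        {"id": "p2", "name": "John Roe"},                            # no external id
    ]
    print(
        to_ntriples(
            people,
            "http://example.org/people/",
            {"name": FOAF_NAME, "external_id": EXTERNAL_ID},
        )
    )
```

If two sites agree on the subject URIs, triples exported this way can simply be concatenated, which is the main attraction of a common format here.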
Take the bioguide IDs, for instance. They're related to GovTrack's data
in that they're about the same people that GovTrack has data about, but
they don't overlap because I don't have that info. The common format I
hope to convince you of is RDF (pending my finishing the explanation of
RDF), and then I can work with you on getting the data exported in that
- Joshua Tauberer
** Nothing Unreal Exists **