
Re: [govtrack] Proposed new terms of data use

  • Aaron Swartz
    Message 1 of 15, Oct 14, 2008
      Josh, who are the malefactors here? We can help you put a little
      pressure on them. I suspect just raising the issue with them publicly
      will help a great deal.
    • Josh Tauberer
      Message 2 of 15, Oct 14, 2008
        Aaron Swartz wrote:
        > Let me know if we at watchdog.net can do anything to be more
        > open/helpful/compliant with this. With each passing week I'm more
        > interested in collaborating with other groups and opening things up
        > more.

        Use XML instead of/in addition to JSON/SQL. Normalize names to IDs in
        the XML. Document what's in the files (http://watchdog.net/about/api is
        broken atm so I don't know what's there).

        > Josh, who are the malefactors here? We can help you put a little
        > pressure on them. I suspect just raising the issue with them publicly
        > will help a great deal.

        I don't want to go down that path. No one is doing anything particularly
        egregious or at all malicious.

        --
        - Josh Tauberer
        - GovTrack.us

        http://razor.occams.info

        "Yields falsehood when preceded by its quotation! Yields
        falsehood when preceded by its quotation!" Achilles to
        Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
      • Aaron Swartz
        Message 3 of 15, Oct 14, 2008
          > Use XML instead of/in addition to JSON/SQL.

          The dumps are really meant to be TSV rather than SQL. And our webpages
          all speak RDF/XML. But I'm probably not going to do XML dumps because
          I hate the format with a white-hot passion and I fail to see the
          point. Surely TSV is just as easy to parse.

          > Normalize names to IDs in the XML.

          This is true in the TSV.

          > Document what's in the files (http://watchdog.net/about/api is
          > broken atm so I don't know what's there).

          Sorry, fixed. There's some docs in /data now; let me know what you
          think is missing.
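
As a rough illustration of the "just as easy to parse" point, here is a minimal sketch that reads a tab-separated dump with Python's standard `csv` module and indexes the rows by ID. The column names (`politician_id`, `name`, `district`) and the sample rows are hypothetical and are not taken from the actual watchdog.net files.

```python
import csv
import io

# Hypothetical sample data; a real dump would be opened from a file instead.
SAMPLE_TSV = (
    "politician_id\tname\tdistrict\n"
    "p001\tJane Doe\tNY-12\n"
    "p002\tJohn Roe\tCA-08\n"
)


def read_tsv(text):
    """Parse TSV text with a header row into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def index_by_id(rows, key="politician_id"):
    """Build a lookup table from the ID column to the full row."""
    return {row[key]: row for row in rows}


rows = read_tsv(SAMPLE_TSV)
by_id = index_by_id(rows)
print(by_id["p001"]["name"])
```

Because names are already normalized to IDs, joining two such files is a dictionary lookup on the shared ID column.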
        • Ilan Rabinovitch
          Message 4 of 15, Oct 14, 2008
            Josh Tauberer wrote:
            >
            > Use XML instead of/in addition to JSON/SQL. Normalize names to IDs in
            > the XML. Document what's in the files (http://watchdog.net/about/api is
            > broken atm so I don't know what's there).
            >
            >
            Josh,

            At the moment GeekPAC is using your data by parsing the feeds via rsync
            and putting them into a SQL database. I'm still doing a little clean
            up, but I do plan to post both the database dumps, as well as the Deki
            extensions we've written that perform the SQL queries we display. Does
            that fall in line with what you were thinking for acceptable use?

            We are not currently adding anything new to the data so re-outputting to
            XML seems a bit redundant. The reason we prefer SQL is that it's easier
            for us to do relational queries on SQL than XML.

            Regards,

            Ilan
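
To make the "relational queries" point concrete, here is a small sketch using Python's built-in `sqlite3` module. The schema (`people`, `votes`) and the rows are invented for illustration; they are not GeekPAC's or GovTrack's actual tables.

```python
import sqlite3


def build_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE votes (
            person_id INTEGER REFERENCES people(id),
            bill TEXT,
            position TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [(1, "Jane Doe"), (2, "John Roe")],
    )
    conn.executemany(
        "INSERT INTO votes VALUES (?, ?, ?)",
        [(1, "h1", "aye"), (2, "h1", "nay"), (1, "h2", "aye")],
    )
    conn.commit()
    return conn


def aye_counts(conn):
    """Count 'aye' votes per person with a single join."""
    cursor = conn.execute(
        """
        SELECT p.name, COUNT(*)
        FROM people AS p
        JOIN votes AS v ON v.person_id = p.id
        WHERE v.position = 'aye'
        GROUP BY p.id
        ORDER BY p.name
        """
    )
    return cursor.fetchall()


print(aye_counts(build_db()))
```

The same question asked of per-record XML files would need the join written by hand, which is the convenience being described.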
          • Josh Tauberer
            Message 5 of 15, Oct 14, 2008
              Michael Dale wrote:
              > In terms of data re-sharing ...you could license the "transformed" data
              > that govTrack makes available under cc-by-sa but creative commons
              > license does not say much about ~how~ the transformations are re-made
              > available.

              Besides what Fred posted (thanks Fred), I'm not sure I can even assert
              copyright over the data --- it's a database, there's basically no
              creative difference from the public domain original. I wouldn't really
              want to anyway, for the same reason I probably wouldn't create real TOS,
              since I do think the data should be free.

              > focus on providing constructive advice to groups working in this space
              > to maximize the commons and reusability of the data, i.e. provide a means
              > of "querying the data with gov_track ID if the govtrack data is used"

              Sure, your feedback and the other comments have been instructive for
              figuring out that angle.

              > I will quickly profile data usage / re-usage on metavid.org ;)

              I didn't have MetaVid in mind. :) I would, however, love to see database
              dumps (in a useful format) rather than having to query for everything.

              (I hope I'm the only one who hates queries and APIs as a primary means
              of data access....)

              --
              - Josh Tauberer
              - GovTrack.us

              http://razor.occams.info

              "Yields falsehood when preceded by its quotation! Yields
              falsehood when preceded by its quotation!" Achilles to
              Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
            • Josh Tauberer
              Message 6 of 15, Oct 14, 2008
                Ilan Rabinovitch wrote:
                > Josh Tauberer wrote:
                >> Use XML instead of/in addition to JSON/SQL. Normalize names to IDs in
                >> the XML. Document what's in the files (http://watchdog.net/about/api is
                >> broken atm so I don't know what's there).
                >>
                >>
                > Josh,
                >
                > At the moment GeekPAC is using your data by parsing the feeds via rsync
                > and putting them into a SQL database. I'm still doing a little clean
                > up, but I do plan to post both the database dumps, as well as the Deki
                > extensions we've written that perform the SQL queries we display. Does
                > that fall in line with what you were thinking for acceptable use?
                >
                > We are not currently adding anything new to the data so re-outputting to
                > XML seems a bit redundant.

                In that case you have nothing to worry about. :)

                --
                - Josh Tauberer
                - GovTrack.us

                http://razor.occams.info

                "Yields falsehood when preceded by its quotation! Yields
                falsehood when preceded by its quotation!" Achilles to
                Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
              • David Moore
                Message 7 of 15, Oct 14, 2008
                  Hi everyone, David with OpenCongress here. Definitely count us in on
                  whatever community standards are agreed upon, we're happy to contribute.
                  More details below, think that Josh is right to bring it up.

                  As a foundation, our site code is open-source under the GPL and we offer
                  a host of RSS feeds & widgets & sharing tools to push info out.

                  We've always wanted to build an open API, but to be honest, given our
                  small staff & limited programming time, it wasn't as much of a priority
                  as major feature development.

                  Of course, that hasn't stopped us from starting work on a totally open
                  API on the back burner, making all data on OC & created by the OC user
                  community available. We've looped in a volunteer programmer to work on
                  the project with us in his spare time.

                  The OpenCongress API should do the trick as far as putting more data
                  from our corner of the transparency world on the communal table. Overall
                  goal is to provide programmers w/ an API that they could access and get
                  the bills associated with a given issue area, their status, and
                  blogs/commentary/social wisdom about them. We'll be able to provide
                  developers with at least the following data for non-commercial use:

                  a) Aggregated news & blog coverage of bills, Senators, and
                  Representatives, including those ranked "most useful"

                  b) Counts and locations of users tracking bills, Members, committees,
                  issues, etc.

                  c) User comments, incl. those rated "most useful", i.e. filtered up

                  d) User approval ratings for Members

                  e) User votes "aye" or "nay" on bills sitewide

                  f) Users also tracking related bills, issues, Members (connections)

                  g) Users who support/oppose also support/oppose related bills & Members

                  h) Users' OC friend relationships -- in their district, state, and
                  nationwide

                  i) Coming soon, more personally bookmarked content from users of MyOC

                  Coming from this, a few sample use cases:

                  i. Political bloggers will be able to more easily access user opinion on
                  bills & issues & Members in a specific Congressional district, e.g., "In
                  the NY-12 Congressional District, public opinion is running strongly
                  against this bill, with 147 out of 195 users opposing it. These users
                  are also opposing this related bill, and have given their Rep an
                  approval rating of only 29%, etc."

                  ii. Issue-based groups will be able to create highly customizable
                  widgets identifying the most significant bills, votes, related issue
                  areas, and Members relating to them. Groups will be able to easily
                  display & re-publish the news coverage, blog coverage, and user comments
                  rated "most helpful" on their issue by OC users.

                  iii. With future planned feature development, users will be able to
                  interact with each other in new ways, and contribute analysis of bills &
                  votes on the site -- this too will be made available to programmers
                  looking to keep their communities in touch with issue areas they care
                  about. All the social actions & opinions taking place on OC will be
                  available through the API.

                  If you're interested in helping us build the API, we'd love volunteer
                  time -- send me an email at drm@... -- or if you have
                  questions, feel free to drop me a line as well. I don't really have a
                  pinpoint estimate of when the API will be finished at its current rate,
                  given other development work underway, but it should be ready before the
                  start of the next Congress in January '09, and hopefully much before then.

                  Input welcome on all the above, and volunteer help greatly appreciated,
                  Thanks,
                  -David

                  --
                  David Moore
                  c: (917) 753-3462
                  www.opencongress.org
                • Josh Tauberer
                  Message 8 of 15, Oct 14, 2008
                    Bah! APIs! The next time someone says API I'm gonna jump out a window.
                    I've got a window right here. It's open. I'm ready.

                    The one case an API makes sense as a primary means of data access is
                    when the data is so large and inseparable that it cannot be reasonably
                    distributed in files. It would have to be, say, at least several hundred
                    megabytes if not a few gigabytes for that to be the case --- and even
                    then one would have to justify not making use of resources like
                    public.resource.org to host it.

                    Can you imagine the outrage if the FEC decided to make its data
                    available only via an API with an API key that was limited to some fixed
                    number of queries per day? What's the first thing that would happen?
                    People (people like Carl Malamud right?) would reconstruct the database
                    and make it available via FTP.

                    Besides the case where the data is just too big, if the data is not
                    available in a flat file, it is IMO simply not open data, and as far as
                    what I am talking about on this thread, it "doesn't count".

                    (APIs take time to program correctly. Yes. Insufficient resources =
                    acceptable reason not to have an API. Database dumps do not take serious
                    effort.)
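
A minimal sketch of why a dump is cheap to produce, assuming the data already sits in a SQL database: the function below writes one table to a tab-separated flat file using only the Python standard library. The `bills` table and its rows are hypothetical, and SQLite stands in for whatever database a site actually runs.

```python
import csv
import io
import sqlite3


def dump_table_to_tsv(conn, table, out):
    """Write every row of `table` to `out` as TSV, header row first."""
    # The table name is interpolated directly, so it must come from
    # trusted code, never from user input.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Hypothetical table standing in for a site's real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO bills VALUES (?, ?)",
    [(1, "Example Act"), (2, "Sample Resolution")],
)

buffer = io.StringIO()
dump_table_to_tsv(conn, "bills", buffer)
print(buffer.getvalue(), end="")
```

In practice `out` would be a file opened for writing, and the call would be repeated for each table that is meant to be public.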

                    --
                    - Josh Tauberer
                    - GovTrack.us

                    http://razor.occams.info

                    "Yields falsehood when preceded by its quotation! Yields
                    falsehood when preceded by its quotation!" Achilles to
                    Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)


                    David Moore wrote:
                    > Hi everyone, David with OpenCongress here. Definitely count us in on
                    > whatever community standards are agreed upon, we're happy to contribute.
                    > More details below, think that Josh is right to bring it up.
                    >
                    > As a foundation, our site code is open-source under the GPL and we offer
                    > a host of RSS feeds & widgets & sharing tools to push info out.
                    >
                    > We've always wanted to build an open API, but to be honest, given our
                    > small staff & limited programming time, it wasn't as much of a priority
                    > as major feature development.
                    >
                    > Of course, that hasn't stopped us from starting work on a totally open
                    > API on the back burner, making all data on OC & created by the OC user
                    > community available. We've looped in a volunteer programmer to work on
                    > the project with us in his spare time.
                    >
                    > The OpenCongress API should do the trick as far as putting more data
                    > from our corner of the transparency world on the communal table. Overall
                    > goal is to provide programmers w/ an API that they could access and get
                    > the bills associated with a given issue area, their status, and
                    > blogs/commentary/social wisdom about them. We'll be able to provide
                    > developers with at least the following data for non-commercial use:
                    >
                    > a) Aggregated news & blog coverage of bills, Senators, and
                    > Representatives, including those ranked "most useful"
                    >
                    > b) Counts and locations of users tracking bills, Members, committees,
                    > issues, etc.
                    >
                    > c) User comments, incl. those rated "most useful", i.e. filtered up
                    >
                    > d) User approval ratings for Members
                    >
                    > e) User votes "aye" or "nay" on bills sitewide
                    >
                    > f) Users also tracking related bills, issues, Members (connections)
                    >
                    > g) Users who support/oppose also support/oppose related bills & Members
                    >
                    > h) Users' OC friend relationships -- in their district, state, and
                    > nationwide
                    >
                    > i) Coming soon, more personally bookmarked content from users of MyOC
                    >
                    > Coming from this, a few sample use cases:
                    >
                    > i. Political bloggers will be able to more easily access user opinion on
                    > bills & issues & Members in a specific Congressional district, e.g., "In
                    > the NY-12 Congressional District, public opinion is running strongly
                    > against this bill, with 147 out of 195 users opposing it. These users
                    > are also opposing this related bill, and have given their Rep an
                    > approval rating of only 29%, etc."
                    >
                    > ii. Issue-based groups will be able to create highly customizable
                    > widgets identifying the most significant bills, votes, related issue
                    > areas, and Members relating to them. Groups will be able to easily
                    > display & re-publish the news coverage, blog coverage, and user comments
                    > rated "most helpful" on their issue by OC users.
                    >
                    > iii. With future planned feature development, users will be able to
                    > interact with each other in new ways, and contribute analysis of bills &
                    > votes on the site -- this too will be made available to programmers
                    > looking to keep their communities in touch with issue areas they care
                    > about. All the social actions & opinions taking place on OC will be
                    > available through the API.
                    >
                    > If you're interested in helping us build the API, we'd love volunteer
                    > time -- send me an email at drm@... -- or if you have
                    > questions, feel free to drop me a line as well. I don't really have a
                    > pinpoint estimate of when the API will be finished at its current rate,
                    > given other development work underway, but it should be ready before the
                    > start of the next Congress in January '09, and hopefully much before then.
                    >
                    > Input welcome on all the above, and volunteer help greatly appreciated,
                    > Thanks,
                    > -David
                    >
                  • aronpilhofer
                    Message 9 of 15, Oct 15, 2008
                      > Bah! APIs! The next time someone says API I'm gonna jump out a window.

                      Let's hope it's a low floor, because I wanted to let folks know we've
                      just released our campaign finance API. Not necessarily of great use
                      to this group, but who knows.

                      http://developer.nytimes.com/docs/campaign_finance_api

                      Incidentally, I agree that APIs are a rather crappy way of
                      distributing data in toto, but who is arguing this as an either/or?
                      There is significant value in both.

                      First, you mention how horrible it would be should the FEC create an
                      API. But not everyone has the technical know-how to handle, what, 12?
                      13? million FEC records, much less make sense of the arcane poorly
                      documented system they use to categorize and code individual records.
                      If you don't know what you are doing, you can end up completely
                      shooting yourself in the foot.

                      And don't even get me started on the electronic filings, which is what
                      we are using for our own API. The process of massaging those data into
                      something meaningful is far far more complicated than it should be.
                      (Like, who's the genius who decided not to require campaigns to
                      disclose their aggregate amount of unitemized donors?)

                      So, why should you be required to become a campaign finance expert in
                      order to use the data? That's an artificial and unnecessary barrier.

                      Second, not everyone wants all 8 kazillion records. They may only care
                      about specific donors, or specific candidates, or specific localities.
                      A well-written API (ours is a work in progress, so, don't judge it
                      just yet) is another way of lowering the barrier of entry.

                      I agree that the term and the concept are getting a bit overused. But
                      that isn't a compelling reason NOT to make access to data easier for
                      people.

                      > Again, I'm not actually enacting this policy over my data.

                      On the specific point that started this thread, it might be a good
                      time to gently remind you that this is not your data. It's the
                      public's data, which you (and god bless you for having done it) have
                      taken the time and effort to make available in a rational format for
                      the betterment of all.

                      It is a lesson I think we all learned on the playground: sharing is
                      not always reciprocal. There are going to be people out there who
                      take, and don't give back. I understand your frustration, but I don't
                      think adding some new requirement is going to help all that much, and
                      may actually end up hurting more than anything else.

                      My 2 cents,
                      Aron
                    • Josh Tauberer
                      Message 10 of 15, Oct 15, 2008
                        aronpilhofer wrote:
                        > Incidentally, I agree that APIs are a rather crappy way of
                        > distributing data in toto, but who is arguing this as an either/or?
                        > There is significant value in both.

                        Well, look, I wasn't making a statement about APIs in general.

                        I was responding to a response to my statement about contributing to the
                        commons, and I was saying that an API doesn't contribute the data to the
                        commons.

                        In the case of the Times's FEC API, the data is already available in
                        bulk from the FEC. You're providing an additional service to make things
                        easier, and I say that is only a good thing. You're also a commercial
                        enterprise, with different goals, and I meant to only be addressing the
                        strictly nonprofit/transparency world, though I know I didn't say it.

                        > On the specific point that started this thread, it might be a good
                        > time to gently remind you that this is not your data.

                        For all the time I put into it, I think I get a little say in how it is
                        used (if you access my server to get it). I have no moral obligation to
                        provide the data to everyone. At worst it would be hypocritical to start
                        adding restrictions when I talk about openness, which is why I don't
                        actually have any.

                        And the irony is not lost on me that if I actually add a restriction,
                        someone could fork the project.

                        > There are going to be people out there who
                        > take, and don't give back.

                        But that doesn't mean I shouldn't have an expectation about what they
                        *ought* to be doing. The fact that someone isn't contributing data that
                        they have back doesn't mean I stop asking.

                        --
                        - Josh Tauberer
                        - GovTrack.us

                        http://razor.occams.info

                        "Yields falsehood when preceded by its quotation! Yields
                        falsehood when preceded by its quotation!" Achilles to
                        Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
                      • aronpilhofer
                        Message 11 of 15, Oct 15, 2008
                          > Well, look, I wasn't making a statement about APIs in general.

                          Fair enough. I move to strike my statement from the record.

                          > For all the time I put into it, I think I get a little say in how it is

                          I guess that depends on what restrictions you do decide to slap on it,
                          if any. I'm not telling you anything you don't know -- but that's part
                          of the deal when you decide to open things up. People take and don't
                          play nice. It sucks, but you can't really have it both ways.

                          > The fact that someone isn't contributing data that
                          > they have back doesn't mean I stop asking.

                          No one said that. But putting some kind of license on the data to
                          enforce it, that's another matter.