Re: [govtrack] Proposed new terms of data use

  • Aaron Swartz
    Message 1 of 15 , Oct 14, 2008
      > Use XML instead of/in addition to JSON/SQL.

      The dumps are really meant to be TSV rather than SQL. And our webpages
      all speak RDF/XML. But I'm probably not going to do XML dumps because
      I hate the format with a white-hot passion and I fail to see the
      point. Surely TSV is just as easy to parse.
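As a rough illustration of the parsing claim above, here is a minimal sketch of reading a tab-separated dump with Python's standard `csv` module. The sample data and column names (`id`, `name`, `district`) are hypothetical and are not taken from the actual watchdog.net dumps.

```python
import csv
import io

# Hypothetical sample standing in for a dump file; real column names may differ.
SAMPLE_TSV = "id\tname\tdistrict\n1\tJane Doe\tNY-12\n2\tJohn Roe\tCA-08\n"


def read_tsv(handle):
    """Yield each row of a tab-separated file as a dict keyed by the header row."""
    # QUOTE_NONE treats quote characters as ordinary data, which suits plain TSV.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


rows = list(read_tsv(io.StringIO(SAMPLE_TSV)))
print(rows[0]["name"])
```

For a file on disk, the same function could be given `open(path, newline="", encoding="utf-8")` in place of the in-memory buffer.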

      > Normalize names to IDs in the XML.

      This is true in the TSV.

      > Document what's in the files (http://watchdog.net/about/api is
      > broken atm so I don't know what's there).

      Sorry, fixed. There's some docs in /data now; let me know what you
      think is missing.
    • Ilan Rabinovitch
      Message 2 of 15 , Oct 14, 2008
        Josh Tauberer wrote:
        >
        > Use XML instead of/in addition to JSON/SQL. Normalize names to IDs in
        > the XML. Document what's in the files (http://watchdog.net/about/api is
        > broken atm so I don't know what's there).
        >
        >
        Josh,

        At the moment GeekPAC is using your data by parsing the feeds via rsync
        and putting them into a SQL database. I'm still doing a little clean
        up, but I do plan to post both the database dumps, as well as the Deki
        extensions we've written that perform the SQL queries we display. Does
        that fall in line with what you were thinking for acceptable use?

        We are not currently adding anything new to the data, so re-outputting to
        XML seems a bit redundant. The reason we prefer SQL is that it's easier
        for us to do relational queries on SQL than on XML.
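To make the point about relational queries concrete, here is a small sketch using Python's built-in `sqlite3` module. The `people` and `votes` tables, their columns, and the sample rows are invented for illustration; they are not GeekPAC's or GovTrack's actual schema.

```python
import sqlite3

# In-memory database with two hypothetical tables linked by a person ID.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE votes (person_id INTEGER, bill TEXT, vote TEXT);
    INSERT INTO people VALUES (1, 'Jane Doe'), (2, 'John Roe');
    INSERT INTO votes VALUES (1, 'hr1', 'aye'), (2, 'hr1', 'nay'), (1, 'hr2', 'nay');
    """
)


def votes_on(conn, bill):
    """Return (name, vote) pairs for one bill by joining votes to people."""
    cur = conn.execute(
        "SELECT p.name, v.vote FROM votes v "
        "JOIN people p ON p.id = v.person_id "
        "WHERE v.bill = ? ORDER BY p.name",
        (bill,),
    )
    return cur.fetchall()


result = votes_on(conn, "hr1")
print(result)
```

The join is a single statement here; answering the same question against nested XML would typically mean walking two document trees and matching IDs by hand.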

        Regards,

        Ilan
      • Josh Tauberer
        Message 3 of 15 , Oct 14, 2008
          Michael Dale wrote:
          > In terms of data re-sharing ... you could license the "transformed" data
          > that GovTrack makes available under cc-by-sa, but the Creative Commons
          > license does not say much about ~how~ the transformations are re-made
          > available.

          Besides what Fred posted (thanks Fred), I'm not sure I can even assert
          copyright over the data --- it's a database, there's basically no
          creative difference from the public domain original. I wouldn't really
          want to anyway, for the same reason I probably wouldn't create real TOS,
          since I do think the data should be free.

          > focus on providing constructive advice to groups working in this space
          > to maximize the commons and reusability of the data, i.e. provide a means
          > of "querying the data with gov_track ID if the govtrack data is used"

          Sure, your feedback and the other comments have been instructive for
          figuring out that angle.

          > I will quickly profile data usage / re-usage on metavid.org ;)

          I didn't have MetaVid in mind. :) I would, however, love to see database
          dumps (in a useful format) rather than having to query for everything.

          (I hope I'm the only one who hates queries and APIs as a primary means
          of data access....)

          --
          - Josh Tauberer
          - GovTrack.us

          http://razor.occams.info

          "Yields falsehood when preceded by its quotation! Yields
          falsehood when preceded by its quotation!" Achilles to
          Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
        • Josh Tauberer
          Message 4 of 15 , Oct 14, 2008
            Ilan Rabinovitch wrote:
            > Josh Tauberer wrote:
            >> Use XML instead of/in addition to JSON/SQL. Normalize names to IDs in
            >> the XML. Document what's in the files (http://watchdog.net/about/api is
            >> broken atm so I don't know what's there).
            >>
            >>
            > Josh,
            >
            > At the moment GeekPAC is using your data by parsing the feeds via rsync
            > and putting them into a SQL database. I'm still doing a little clean
            > up, but I do plan to post both the database dumps, as well as the Deki
            > extensions we've written that perform the SQL queries we display. Does
            > that fall in line with what you were thinking for acceptable use?
            >
            > We are not currently adding anything new to the data, so re-outputting to
            > XML seems a bit redundant.

            In that case you have nothing to worry about. :)

          • David Moore
            Message 5 of 15 , Oct 14, 2008
              Hi everyone, David with OpenCongress here. Definitely count us in on
              whatever community standards are agreed upon, we're happy to contribute.
              More details below, think that Josh is right to bring it up.

              As a foundation, our site code is open-source under the GPL and we offer
              a host of RSS feeds & widgets & sharing tools to push info out.

              We've always wanted to build an open API, but to be honest, given our
              small staff & limited programming time, it wasn't as much of a priority
              as major feature development.

              Of course, that hasn't stopped us from starting work on a totally open
              API on the back burner, making all data on OC & created by the OC user
              community available. We've looped in a volunteer programmer to work on
              the project with us in his spare time.

              The OpenCongress API should do the trick as far as putting more data
              from our corner of the transparency world on the communal table. Overall
              goal is to provide programmers w/ an API that they could access and get
              the bills associated with a given issue area, their status, and
              blogs/commentary/social wisdom about them. We'll be able to provide
              developers with at least the following data for non-commercial use:

              a) Aggregated news & blog coverage of bills, Senators, and
              Representatives, including those ranked "most useful"

              b) Counts and locations of users tracking bills, Members, committees,
              issues, etc.

              c) User comments, incl. those rated "most useful", i.e. filtered up

              d) User approval ratings for Members

              e) User votes "aye" or "nay" on bills sitewide

              f) Users also tracking related bills, issues, Members (connections)

              g) Users who support/oppose also support/oppose related bills & Members

              h) Users' OC friend relationships -- in their district, state, and
              nationwide

              i) Coming soon, more personally bookmarked content from users of MyOC

              Coming from this, a few sample use cases:

              i. Political bloggers will be able to more easily access user opinion on
              bills & issues & Members in a specific Congressional district, e.g., "In
              the NY-12 Congressional District, public opinion is running strongly
              against this bill, with 147 out of 195 users opposing it. These users
              are also opposing this related bill, and have given their Rep an
              approval rating of only 29%, etc."

              ii. Issue-based groups will be able to create highly customizable
              widgets identifying the most significant bills, votes, related issue
              areas, and Members relating to them. Groups will be able to easily
              display & re-publish the news coverage, blog coverage, and user comments
              rated "most helpful" on their issue by OC users.

              iii. With future planned feature development, users will be able to
              interact with each other in new ways, and contribute analysis of bills &
              votes on the site -- this too will be made available to programmers
              looking to keep their communities in touch with issue areas they care
              about. All the social actions & opinions taking place on OC will be
              available through the API.

              If you're interested in helping us build the API, we'd love volunteer
              time -- send me an email at drm@... -- or if you have
              questions, feel free to drop me a line as well. I don't really have a
              pinpoint estimate of when the API will be finished at its current rate,
              given other development work underway, but it should be ready before the
              start of the next Congress in January '09, and hopefully much before then.

              Input welcome on all the above, and volunteer help greatly appreciated,
              Thanks,
              -David

              --
              David Moore
              c: (917) 753-3462
              www.opencongress.org
            • Josh Tauberer
              Message 6 of 15 , Oct 14, 2008
                Bah! APIs! The next time someone says API I'm gonna jump out a window.
                I've got a window right here. It's open. I'm ready.

                The one case an API makes sense as a primary means of data access is
                when the data is so large and inseparable that it cannot be reasonably
                distributed in files. It would have to be, say, at least several hundred
                megabytes if not a few gigabytes for that to be the case --- and even
                then one would have to justify not making use of resources like
                public.resource.org to host it.

                Can you imagine the outrage if the FEC decided to make its data
                available only via an API with an API key that was limited to some fixed
                number of queries per day? What's the first thing that would happen?
                People (people like Carl Malamud right?) would reconstruct the database
                and make it available via FTP.

                Besides the case where the data is just too big, if the data is not
                available in a flat file, it is IMO simply not open data, and as far as
                what I am talking about on this thread, it "doesn't count".

                (APIs take time to program correctly. Yes. Insufficient resources =
                acceptable reason not to have an API. Database dumps do not take serious
                effort.)
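As a sketch of how little code a flat-file dump can take, the following writes one table out as TSV using only Python's standard library. SQLite, the `bills` table, and its rows are stand-ins chosen for illustration; the thread does not say which database any of these sites actually use.

```python
import csv
import io
import sqlite3


def dump_table_tsv(conn, table, out):
    """Write every row of `table` to `out` as tab-separated text with a header row."""
    # The table name is interpolated directly, so pass only trusted names.
    cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid')
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # column names
    writer.writerows(cur)


# Hypothetical table standing in for a site's real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (id TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO bills VALUES (?, ?)",
    [("hr1", "First bill"), ("s2", "Second bill")],
)

buf = io.StringIO()
dump_table_tsv(conn, "bills", buf)
dump = buf.getvalue()
print(dump, end="")
```

Running this on a schedule and placing the output on a static file server would be one way to publish the dump; that deployment detail is an assumption, not something described in the thread.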


              • aronpilhofer
                Message 7 of 15 , Oct 15, 2008
                  > Bah! APIs! The next time someone says API I'm gonna jump out a window.

                  Let's hope it's a low floor, because I wanted to let folks know we've
                  just released our campaign finance API. Not necessarily of great use
                  to this group, but who knows.

                  http://developer.nytimes.com/docs/campaign_finance_api

                  Incidentally, I agree that APIs are a rather crappy way of
                  distributing data in toto, but who is arguing this as an either/or?
                  There is significant value in both.

                  First, you mention how horrible it would be should the FEC create an
                  API. But not everyone has the technical know-how to handle, what, 12?
                  13? million FEC records, much less make sense of the arcane, poorly
                  documented system they use to categorize and code individual records.
                  If you don't know what you are doing, you can end up completely
                  shooting yourself in the foot.

                  And don't even get me started on the electronic filings, which is what
                  we are using for our own API. The process of massaging those data into
                  something meaningful is far, far more complicated than it should be.
                  (Like, who's the genius who decided not to require campaigns to
                  disclose their aggregate amount of unitemized donors?)

                  So, why should you be required to become a campaign finance expert in
                  order to use the data? That's an artificial and unnecessary barrier.

                  Second, not everyone wants all 8 kazillion records. They may only care
                  about specific donors, or specific candidates, or specific localities.
                  A well-written API (ours is a work in progress, so, don't judge it
                  just yet) is another way of lowering the barrier of entry.
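To illustrate the kind of narrow question a query layer answers on the user's behalf, here is a sketch that totals contributions to a single candidate from a bulk tab-separated file. The field names (`donor`, `candidate`, `amount`), candidate IDs, and sample rows are hypothetical; they are not the FEC's record layout or the Times API's schema.

```python
import csv
import io

# Hypothetical bulk records; a real FEC file uses different, coded fields.
SAMPLE = (
    "donor\tcandidate\tamount\n"
    "A. Smith\tC001\t250\n"
    "B. Jones\tC002\t1000\n"
    "A. Smith\tC002\t500\n"
)


def total_by_candidate(handle, candidate_id):
    """Sum the amounts of all records in the file for one candidate ID."""
    reader = csv.DictReader(handle, delimiter="\t")
    return sum(
        int(row["amount"]) for row in reader if row["candidate"] == candidate_id
    )


total = total_by_candidate(io.StringIO(SAMPLE), "C002")
print(total)
```

With the bulk file, the caller has to know the layout and do the filtering; an API moves that knowledge to the server, which is the lower barrier to entry being argued for here.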

                  I agree that the term and the concept is getting a bit overused. But
                  that isn't a compelling reason NOT to make access to data easier for
                  people.

                  > Again, I'm not actually enacting this policy over my data.

                  On the specific point that started this thread, it might be a good
                  time to gently remind you that this is not your data. It's the
                  public's data, which you (and god bless you for having done it) have
                  taken the time and effort to make available in a rational format for
                  the betterment of all.

                  It is a lesson I think we all learned on the playground: sharing is
                  not always reciprocal. There are going to be people out there who
                  take, and don't give back. I understand your frustration, but I don't
                  think adding some new requirement is going to help all that much, and
                  may actually end up hurting more than anything else.

                  My 2 cents,
                  Aron
                • Josh Tauberer
                  Message 8 of 15 , Oct 15, 2008
                    aronpilhofer wrote:
                    > Incidentally, I agree that APIs are a rather crappy way of
                    > distributing data in toto, but who is arguing this as an either/or?
                    > There is significant value in both.

                    Well, look, I wasn't making a statement about APIs in general.

                    I was responding to a response to my statement about contributing to the
                    commons, and I was saying that an API doesn't contribute the data to the
                    commons.

                    In the case of the Times's FEC API, the data is already available in
                    bulk from the FEC. You're providing an additional service to make things
                    easier, and I say that is only a good thing. You're also a commercial
                    enterprise, with different goals, and I meant to only be addressing the
                    strictly nonprofit/transparency world, though I know I didn't say it.

                    > On the specific point that started this thread, it might be a good
                    > time to gently remind you that this is not your data.

                    For all the time I put into it, I think I get a little say in how it is
                    used (if you access my server to get it). I have no moral obligation to
                    provide the data to everyone. At worst it would be hypocritical to start
                    adding restrictions when I talk about openness, which is why I don't
                    actually have any.

                    And the irony is not lost on me that if I actually add a restriction,
                    someone could fork the project.

                    > There are going to be people out there who
                    > take, and don't give back.

                    But that doesn't mean I shouldn't have an expectation about what they
                    *ought* to be doing. The fact that someone isn't contributing data that
                    they have back doesn't mean I stop asking.

                  • aronpilhofer
                    Message 9 of 15 , Oct 15, 2008
                      > Well, look, I wasn't making a statement about APIs in general.

                      Fair enough. I move to strike my statement from the record.

                      > For all the time I put into it, I think I get a little say in how it is

                      I guess that depends on what restrictions you do decide to slap on it,
                      if any. I'm not telling you anything you don't know -- but that's part
                      of the deal when you decide to open things up. People take and don't
                      play nice. It sucks, but you can't really have it both ways.

                      > The fact that someone isn't contributing data that
                      > they have back doesn't mean I stop asking.

                      No one said that. But putting some kind of license on the data to
                      enforce it, that's another matter.