Proposed new terms of data use

  • Josh Tauberer
    Message 1 of 15, Oct 13, 2008
      This weekend I was in the Bay Area at the Free Culture 2008 conference
      (really great!) and a combination of things happened:

      At a somewhat random meeting before the conference, someone said to me
      that the government transparency/civic hacking community is quite like a
      lot of little islands. We don't actually work together nearly as much as
      we could or should.

      During the conference, it became clearer to me that there may be more
      one-way transactions going on with GovTrack data/scripts than I thought.

      So I put the following forward. This is not entirely serious, but it is
      not a joke either.

      If you are using GovTrack data and are enhancing it, combining it with
      other data, or using it with modified versions of my scripts, then I
      *expect* you to contribute something back to the "commons" in a
      *compatible* format, or risk having your access to the GovTrack
      database severely rate-limited.

      What I would mean, if I were serious about this, is:

      If you have new data, you have to either:
      a) Give it to me to post, or
      b) Post it publicly and document where it is on my wiki or on some
      suitable neutral-ground wiki listing data resources,
      and it must be:
      c) In a format similar to the formats I use, meaning it must use
      my identifiers for things. (Unless you predate GovTrack, in
      which case I'll concede that it is my responsibility to use
      *your* identifiers.)
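      To make requirement c) concrete: matching records to GovTrack
      identifiers usually means normalizing free-text names and looking them
      up in a table. The sketch below is a minimal illustration, assuming a
      hypothetical name-to-ID table; the table, the IDs (900001, 900002), and
      the function names are placeholders, not GovTrack's real files or
      identifiers.

```python
import re
import unicodedata

# Hypothetical lookup table: normalized name -> GovTrack person ID.
# These IDs are placeholders, not real GovTrack identifiers.
NAME_TO_GOVTRACK_ID = {
    "jane q public": 900001,
    "john doe": 900002,
}


def normalize_name(name: str) -> str:
    """Lowercase, drop accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    letters_only = re.sub(r"[^a-z\s]", " ", ascii_only.lower())
    return " ".join(letters_only.split())


def to_govtrack_id(name: str):
    """Return the GovTrack ID for a name, or None if there is no match."""
    return NAME_TO_GOVTRACK_ID.get(normalize_name(name))


# Tag each (hypothetical) row with an ID column alongside the original name.
rows = [{"name": "Jane Q. Public", "score": 87},
        {"name": "Someone Else", "score": 12}]
for row in rows:
    row["govtrack_id"] = to_govtrack_id(row["name"])
```

      Unmatched names come back as None so they can be reviewed by hand
      rather than guessed; real matching would also have to cope with
      nicknames and with different people who share a name.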

      If you modify my script, you have to send me back a patch. (I think this
      might be the terms of the AGPL anyway.)

      The rationale here is that I'm tired of seeing forking of various sorts.
      It pains me personally and it makes me upset that we can't all just do
      things in a common way.

      Here's the short form:

      If you're using my data and you have something missing from the
      "commons", i.e. the pool of publicly available data, then you are
      expected to give back in a substantive, meaningful, commensurate, and
      compatible way. The fact that you have agreed to not share someone
      else's closed data does not absolve you of the expectation.

      Again, I'm not actually enacting this policy over my data. But I'm tempted.

      --
      - Josh Tauberer
      - GovTrack.us

      http://razor.occams.info

      "Yields falsehood when preceded by its quotation! Yields
      falsehood when preceded by its quotation!" Achilles to
      Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
    • Aaron Swartz
      Message 2 of 15, Oct 13, 2008
        Hi Josh,

        Let me know if we at watchdog.net can do anything to be more
        open/helpful/compliant with this. With each passing week I'm more
        interested in collaborating with other groups and opening things up
        more.
      • Michael Dale
        Message 3 of 15, Oct 13, 2008
          Hi All,

          I also enjoyed the FC conference, and had some productive talks with
          Josh about MetaVid data integrations :)

          In terms of data re-sharing... you could license the "transformed" data
          that GovTrack makes available under CC-BY-SA, but a Creative Commons
          license does not say much about ~how~ the transformations are made
          available again.

          It's hard to quantify your desire to encourage reusability, but using
          shared keys seems like a good start. Maybe it would be more productive
          to profile "good" reuse that easily returns things to the commons in
          a structured and easy-to-access way, and "bad" reuse that has all the
          output in Flash SWF with cryptic connections to the server ;). I.e.,
          maybe focus on providing constructive advice to groups working in this
          space to maximize the commons and the reusability of the data, e.g.
          provide a means of "querying the data with gov_track ID if the
          govtrack data is used".

          I will quickly profile data usage / re-usage on metavid.org ;)

          We had to scrape data from many sites to build the present semantic
          congress dataset, since not all the sites make their enhancements to
          the data available in clean, easy-to-reuse formats... I am sure all
          these sites want to share their datasets but are simply limited by time
          and/or resources...

          The good thing is that as technologies like semantic wikis spread, it
          becomes really easy to write a scraper and then add in massive amounts
          of arbitrary structured data, since you just have to edit a few wiki
          pages rather than modify SQL tables.

          But the real fun comes when you want to display and share the data...
          If you look under the hood at MetaVid you see that all the "views" or
          pages are just "mashups" on the site itself. I.e., the
          http://metavid.org/wiki/Members_of_Congress page is just an inline
          semantic query using http://tinyurl.com/3fubtd. You could pretty easily
          make your own "members of congress" page that highlighted any given set
          of properties you're interested in. People, bills, and interest group
          pages work in a similar way.

          More info on MetaVid semantic queries here:
          http://metavid.org/wiki/Sample_Semantic_Queries_page

          It was easy for us to make all the enhancements of congress data
          instantly accessible to any site that we scraped because of the nature
          of the MetaVid platform. I.e., GovTrack can pull videos via gov_track
          ID, and MapLight, if they wanted to, could pull videos via MapLight ID,
          etc.

          I imagine it would not be difficult for other platforms to offer
          access to the data via the GovTrack keys. I definitely echo Josh's call
          for more congress data sites to make their data accessible via shared
          keys! The Sunlight API can also help in this regard:
          http://services.sunlightlabs.com/api/

          P.S.: We are presently re-launching MetaVid, so everyone is encouraged
          to blog about it and provide feedback if they want. Here's a blog post
          calling for people to blog about it / participate:
          http://metavid.org/blog/

          peace,
          michael



        • Fred Benenson
          Message 4 of 15, Oct 13, 2008
            Just to jump in -- CC-BY-SA requires the modified work to be released under the same CC license. This means, effectively, that you can change the format so long as the resulting version is under the same license.

            But BY-SA may not be the best option, and we (I work for CC full time) discourage use of CC licenses for large data sets of questionable copyright status. Science Commons is working on this issue for scientific data.

            The reason is that requiring attribution when data is used can be overly onerous for researchers (especially within scientific fields of study). It also becomes very difficult to track and provide attribution across multiple generations.

            My feeling is that Josh might be best served by simply offering (and perhaps predicating access to the data on) a request that the data be used in this way, not a legal mandate.

            F
             


          • Aaron Swartz
            Message 5 of 15, Oct 14, 2008
              Josh, who are the malefactors here? We can help you put a little
              pressure on them. I suspect just raising the issue with them publicly
              will help a great deal.
            • Josh Tauberer
              Message 6 of 15, Oct 14, 2008
                Aaron Swartz wrote:
                > Let me know if we at watchdog.net can do anything to be more
                > open/helpful/compliant with this. With each passing week I'm more
                > interested in collaborating with other groups and opening things up
                > more.

                Use XML instead of/in addition to JSON/SQL. Normalize names to IDs in
                the XML. Document what's in the files (http://watchdog.net/about/api is
                broken atm so I don't know what's there).
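                As an illustration of "normalize names to IDs in the XML", here is a
                minimal Python sketch using the standard library's ElementTree. The
                element and attribute names (people, person, id, score) and the rows
                are hypothetical, not an actual GovTrack or watchdog.net schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows that already carry a GovTrack person ID.
rows = [
    {"govtrack_id": 900001, "name": "Jane Q. Public", "score": 87},
    {"govtrack_id": 900002, "name": "John Doe", "score": 54},
]


def rows_to_xml(rows: list) -> str:
    """Emit one <person> element per row, keyed by its GovTrack ID."""
    root = ET.Element("people")
    for row in rows:
        person = ET.SubElement(root, "person", id=str(row["govtrack_id"]))
        # The display name is kept for readability; the id attribute is the key.
        person.set("name", row["name"])
        ET.SubElement(person, "score").text = str(row["score"])
    return ET.tostring(root, encoding="unicode")


print(rows_to_xml(rows))
```

                Carrying the ID as an attribute lets a consumer join on it directly
                instead of re-matching names.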

                > Josh, who are the malefactors here? We can help you put a little
                > pressure on them. I suspect just raising the issue with them publicly
                > will help a great deal.

                I don't want to go down that path. No one is doing anything particularly
                egregious or at all malicious.

                --
                - Josh Tauberer
                - GovTrack.us

              • Aaron Swartz
                Message 7 of 15, Oct 14, 2008
                  > Use XML instead of/in addition to JSON/SQL.

                  The dumps are really meant to be TSV rather than SQL. And our webpages
                  all speak RDF/XML. But I'm probably not going to do XML dumps because
                  I hate the format with a white-hot passion and I fail to see the
                  point. Surely TSV is just as easy to parse.
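                  For what it's worth, Python's standard csv module does read TSV with
                  a change of delimiter. A minimal sketch, assuming a dump with a
                  header row and fields that never contain tabs or newlines (the
                  column names and values are hypothetical):

```python
import csv
import io

# Hypothetical TSV dump: a header row, then one tab-separated record per line.
TSV_DUMP = (
    "govtrack_id\tname\tscore\n"
    "900001\tJane Q. Public\t87\n"
    "900002\tJohn Doe\t54\n"
)


def read_tsv(text: str) -> list:
    """Parse a TSV dump into dicts keyed by the header row.

    QUOTE_NONE treats quote characters as ordinary text, which is only
    safe if fields never contain tabs or newlines.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)


records = read_tsv(TSV_DUMP)
```

                  Every value comes back as a string, and the escaping convention for
                  embedded tabs or newlines has to be documented separately, whereas
                  XML defines its own escaping.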

                  > Normalize names to IDs in the XML.

                  This is true in the TSV.

                  > Document what's in the files (http://watchdog.net/about/api is
                  > broken atm so I don't know what's there).

                  Sorry, fixed. There are some docs in /data now; let me know what you
                  think is missing.
                • Ilan Rabinovitch
                  Message 8 of 15, Oct 14, 2008
                    Josh Tauberer wrote:
                    >
                    > Use XML instead of/in addition to JSON/SQL. Normalize names to IDs in
                    > the XML. Document what's in the files (http://watchdog.net/about/api is
                    > broken atm so I don't know what's there).
                    >
                    >
                    Josh,

                    At the moment GeekPAC is using your data by pulling the feeds via rsync,
                    parsing them, and putting them into a SQL database. I'm still doing a
                    little cleanup, but I do plan to post both the database dumps and the
                    Deki extensions we've written that perform the SQL queries we display.
                    Does that fall in line with what you were thinking for acceptable use?

                    We are not currently adding anything new to the data, so re-outputting
                    to XML seems a bit redundant. The reason we prefer SQL is that it's
                    easier for us to do relational queries on SQL than on XML.
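                    To illustrate the relational-query point, here is a minimal sketch with
                    Python's built-in sqlite3 module (which database GeekPAC actually uses
                    isn't stated; the tables, bill identifiers, and vote values below are
                    hypothetical, not GovTrack's schema). Once people and votes share the
                    GovTrack ID as a key, a single join answers questions that would
                    otherwise need custom code over the XML files.

```python
import sqlite3

# In-memory database with two hypothetical tables keyed by GovTrack ID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (govtrack_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE votes (govtrack_id INTEGER REFERENCES people(govtrack_id),
                        bill TEXT, vote TEXT);
""")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [(900001, "Jane Q. Public"), (900002, "John Doe")])
conn.executemany("INSERT INTO votes VALUES (?, ?, ?)",
                 [(900001, "h1234", "aye"), (900002, "h1234", "nay"),
                  (900001, "s42", "aye")])


def aye_counts(conn: sqlite3.Connection) -> list:
    """Count 'aye' votes per person by joining votes to people on the ID."""
    query = """
        SELECT p.name, COUNT(*) AS ayes
        FROM votes AS v
        JOIN people AS p ON p.govtrack_id = v.govtrack_id
        WHERE v.vote = 'aye'
        GROUP BY p.govtrack_id, p.name
        ORDER BY ayes DESC
    """
    return conn.execute(query).fetchall()


print(aye_counts(conn))
```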

                    Regards,

                    Ilan
                  • Josh Tauberer
                    Message 9 of 15, Oct 14, 2008
                      Michael Dale wrote:
                      > In terms of data re-sharing ...you could license the "transformed" data
                      > that govTrack makes available under cc-by-sa but creative commons
                      > license does not says much about ~how~ the transformations are re-made
                      > available.

                      Besides what Fred posted (thanks Fred), I'm not sure I can even assert
                      copyright over the data --- it's a database, there's basically no
                      creative difference from the public domain original. I wouldn't really
                      want to anyway, for the same reason I probably wouldn't create real TOS,
                      since I do think the data should be free.

                      > focus on providing constructive advice to groups working in this space
                      > to maximize the commons and re usability of the data. ie provide a means
                      > of "querying the data with gov_track ID if the govtrack data is used"

                      Sure, your feedback and the other comments have been instructive for
                      figuring out that angle.

                      > I will quickly profile data usage / re-usage on metavid.org ;)

                      I didn't have MetaVid in mind. :) I would, however, love to see database
                      dumps (in a useful format) rather than having to query for everything.

                      (I hope I'm the only one who hates queries and APIs as a primary means
                      of data access....)

                      --
                      - Josh Tauberer
                      - GovTrack.us

                    • Josh Tauberer
                      Message 10 of 15, Oct 14, 2008
                        Ilan Rabinovitch wrote:
                        > Josh Tauberer wrote:
                        >> Use XML instead of/in addition to JSON/SQL. Normalize names to IDs in
                        >> the XML. Document what's in the files (http://watchdog.net/about/api is
                        >> broken atm so I don't know what's there).
                        >>
                        >>
                        > Josh,
                        >
                        > At the moment GeekPAC is using your data by parsing the feeds via rsync
                        > and putting them into a SQL database. I'm still doing a little clean
                        > up, but I do plan to post both the database dumps, as well as the Deki
                        > extensions we've written that perform the SQL queries we display. Does
                        > that fall in line with what you were thinking for acceptable use?
                        >
                        > We are not currently adding anything new to the data so reoutputing to
                        > XML seems a bit redundant.

                        In that case you have nothing to worry about. :)

                        --
                        - Josh Tauberer
                        - GovTrack.us

                      • David Moore
                        Message 11 of 15, Oct 14, 2008
                          Hi everyone, David with OpenCongress here. Definitely count us in on
                          whatever community standards are agreed upon; we're happy to contribute.
                          More details below. I think Josh is right to bring it up.

                          As a foundation, our site code is open-source under the GPL and we offer
                          a host of RSS feeds & widgets & sharing tools to push info out.

                          We've always wanted to build an open API, but to be honest, given our
                          small staff & limited programming time, it wasn't as much of a priority
                          as major feature development.

                          Of course, that hasn't stopped us from starting work on a totally open
                          API on the back burner, making all data on OC & created by the OC user
                          community available. We've looped in a volunteer programmer to work on
                          the project with us in his spare time.

                          The OpenCongress API should do the trick as far as putting more data
                          from our corner of the transparency world on the communal table. The
                          overall goal is to provide programmers w/ an API that they could access
                          to get the bills associated with a given issue area, their status, and
                          blogs/commentary/social wisdom about them. We'll be able to provide
                          developers with at least the following data for non-commercial use:

                          a) Aggregated news & blog coverage of bills, Senators, and
                          Representatives, including those ranked "most useful"

                          b) Counts and locations of users tracking bills, Members, committees,
                          issues, etc.

                          c) User comments, incl. those rated "most useful", i.e. filtered up

                          d) User approval ratings for Members

                          e) User votes "aye" or "nay" on bills sitewide

                          f) Users also tracking related bills, issues, Members (connections)

                          g) Users who support/oppose also support/oppose related bills & Members

                          h) Users' OC friend relationships -- in their district, state, and
                          nationwide

                          i) Coming soon, more personally bookmarked content from users of MyOC

                          Coming from this, a few sample use cases:

                          i. Political bloggers will be able to more easily access user opinion on
                          bills & issues & Members in a specific Congressional district, e.g., "In
                          the NY-12 Congressional District, public opinion is running strongly
                          against this bill, with 147 out of 195 users opposing it. These users
                          are also opposing this related bill, and have given their Rep an
                          approval rating of only 29%, etc."

                          ii. Issue-based groups will be able to create highly customizable
                          widgets identifying the most significant bills, votes, related issue
                          areas, and Members relating to them. Groups will be able to easily
                          display & re-publish the news coverage, blog coverage, and user comments
                          rated "most helpful" on their issue by OC users.

                          iii. With future planned feature development, users will be able to
                          interact with each other in new ways, and contribute analysis of bills &
                          votes on the site -- this too will be made available to programmers
                          looking to keep their communities in touch with issue areas they care
                          about. All the social actions & opinions taking place on OC will be
                          available through the API.

                          If you're interested in helping us build the API, we'd love volunteer
                          time -- send me an email at drm@... -- or if you have
                          questions, feel free to drop me a line as well. I don't have a
                          firm estimate of when the API will be finished at its current rate,
                          given other development work underway, but it should be ready before the
                          start of the next Congress in January '09, and hopefully well before then.

                          Input welcome on all the above, and volunteer help greatly appreciated,
                          Thanks,
                          -David

                          --
                          David Moore
                          c: (917) 753-3462
                          www.opencongress.org
                        • Josh Tauberer
                          Message 12 of 15, Oct 14, 2008
                            Bah! APIs! The next time someone says API I'm gonna jump out a window.
                            I've got a window right here. It's open. I'm ready.

                            The one case where an API makes sense as a primary means of data access is
                            when the data is so large and inseparable that it cannot be reasonably
                            distributed in files. It would have to be, say, at least several hundred
                            megabytes if not a few gigabytes for that to be the case --- and even
                            then one would have to justify not making use of resources like
                            public.resource.org to host it.

                            Can you imagine the outrage if the FEC decided to make its data
                            available only via an API with an API key that was limited to some fixed
                            number of queries per day? What's the first thing that would happen?
                            People (people like Carl Malamud, right?) would reconstruct the database
                            and make it available via FTP.

                            Besides the case where the data is just too big, if the data is not
                            available in a flat file, it is IMO simply not open data, and as far as
                            what I am talking about on this thread, it "doesn't count".

                            (APIs take time to program correctly. Yes. Insufficient resources =
                            acceptable reason not to have an API. Database dumps do not take serious
                            effort.)
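                            On the point that dumps take little effort: for a SQLite database, a
                            flat-file export is a few lines of Python. A minimal sketch, assuming
                            a hypothetical people table; other databases ship their own dump
                            tools (e.g. mysqldump, pg_dump).

```python
import csv
import io
import sqlite3


def dump_table_tsv(conn: sqlite3.Connection, table: str) -> str:
    """Export every row of `table` as TSV text, header row first."""
    # The table name is interpolated directly, so it must be trusted input.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(column[0] for column in cursor.description)
    writer.writerows(cursor)
    return out.getvalue()


# Usage with a tiny in-memory example table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (govtrack_id INTEGER, name TEXT)")
conn.execute("INSERT INTO people VALUES (900001, 'Jane Q. Public')")
tsv = dump_table_tsv(conn, "people")
```

                            For a SQL-format dump instead, sqlite3's Connection.iterdump() yields
                            the CREATE and INSERT statements for the whole database.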

                            --
                            - Josh Tauberer
                            - GovTrack.us

                            David Moore wrote:
                            > Hi everyone, David with OpenCongress here. Definitely count us in on
                            > whatever community standards are agreed upon, we're happy to contribute.
                            > More details below, think that Josh is right to bring it up.
                            >
                            > As a foundation, our site code is open-source under the GPL and we offer
                            > a host of RSS feeds & widgets & sharing tools to push info out.
                            >
                            > We've always wanted to build an open API, but to be honest, given our
                            > small staff & limited programming time, it wasn't as much of a priority
                            > as major feature development.
                            >
                            > Of course, that hasn't stopped us from starting work on a totally open
                            > API on the back burner, making all data on OC & created by the OC user
                            > community available. We've looped in a volunteer programmer to work on
                            > the project with us in his spare time.
                            >
                            > The OpenCongress API should do the trick as far as putting more data
                            > from our corner of the transparency world on the communal table. Overall
                            > goal is to provide programmers w/ an API that they could access and get
                            > the bills associated with a given issue area, their status, and
                            > blogs/commentary/social wisdom about them. We'll be able to provide
                            > developers with at least the following data for non-commercial use:
                            >
                            > a) Aggregated news & blog coverage of bills, Senators, and
                            > Representatives, including those ranked "most useful"
                            >
                            > b) Counts and locations of users tracking bills, Members, committees,
                            > issues, etc.
                            >
                            > c) User comments, incl. those rated "most useful", i.e. filtered up
                            >
                            > d) User approval ratings for Members
                            >
                            > e) User votes "aye" or "nay" on bills sitewide
                            >
                            > f) Users also tracking related bills, issues, Members (connections)
                            >
                            > g) Users who support/oppose also support/oppose related bills & Members
                            >
                            > h) Users' OC friend relationships -- in their district, state, and
                            > nationwide
                            >
                            > i) Coming soon, more personally bookmarked content from users of MyOC
                            >
                            > Coming from this, a few sample use cases:
                            >
                            > i. Political bloggers will be able to more easily access user opinion on
                            > bills & issues & Members in a specific Congressional district, e.g., "In
                            > the NY-12 Congressional District, public opinion is running strongly
                            > against this bill, with 147 out of 195 users opposing it. These users
                            > are also opposing this related bill, and have given their Rep an
                            > approval rating of only 29%, etc."
                            >
                            > ii. Issue-based groups will be able to create highly customizable
                            > widgets identifying the most significant bills, votes, related issue
                            > areas, and Members relating to them. Groups will be able to easily
                            > display & re-publish the news coverage, blog coverage, and user comments
                            > rated "most helpful" on their issue by OC users.
                            >
                            > iii. With future planned feature development, users will be able to
                            > interact with each other in new ways, and contribute analysis of bills &
                            > votes on the site -- this too will be made available to programmers
                            > looking to keep their communities in touch with issue areas they care
                            > about. All the social actions & opinions taking place on OC will be
                            > available through the API.
                            >
                            > If you're interested in helping us build the API, we'd love volunteer
                            > time -- send me an email at drm@... -- or if you have
                            > questions, feel free to drop me a line as well. I don't really have a
                            > pinpoint estimate of when the API will be finished at its current rate,
                            > given other development work underway, but it should be ready before the
                            > start of the next Congress in January '09, and hopefully much before then.
                            >
                            > Input welcome on all the above, and volunteer help greatly appreciated,
                            > Thanks,
                            > -David
                            >
                          • aronpilhofer
                            Message 13 of 15 , Oct 15, 2008
                              > Bah! APIs! The next time someone says API I'm gonna jump out a window.

                              Let's hope it's a low floor, because I wanted to let folks know we've
                              just released our campaign finance API. Not necessarily of great use
                              to this group, but who knows.

                              http://developer.nytimes.com/docs/campaign_finance_api

                              Incidentally, I agree that APIs are a rather crappy way of
                              distributing data in toto, but who is arguing this as an either/or?
                              There is significant value in both.

                              First, you mention how horrible it would be should the FEC create an
                              API. But not everyone has the technical know-how to handle, what, 12?
                              13? million FEC records, much less make sense of the arcane, poorly
                              documented system they use to categorize and code individual records.
                              If you don't know what you are doing, you can end up completely
                              shooting yourself in the foot.

                              And don't even get me started on the electronic filings, which is what
                              we are using for our own API. The process of massaging those data into
                              something meaningful is far, far more complicated than it should be.
                              (Like, who's the genius who decided not to require campaigns to
                              disclose their aggregate amount of unitemized donors?)

                              So, why should you be required to become a campaign finance expert in
                              order to use the data? That's an artificial and unnecessary barrier.

                              Second, not everyone wants all 8 kazillion records. They may only care
                              about specific donors, or specific candidates, or specific localities.
                              A well-written API (ours is a work in progress, so, don't judge it
                              just yet) is another way of lowering the barrier of entry.

                              I agree that the term and the concept is getting a bit overused. But
                              that isn't a compelling reason NOT to make access to data easier for
                              people.

                              > Again, I'm not actually enacting this policy over my data.

                              On the specific point that started this thread, it might be a good
                              time to gently remind you that this is not your data. It's the
                              public's data, which you (and god bless you for having done it) have
                              taken the time and effort to make available in a rational format for
                              the betterment of all.

                              It is a lesson I think we all learned on the playground: sharing is
                              not always reciprocal. There are going to be people out there who
                              take, and don't give back. I understand your frustration, but I don't
                              think adding some new requirement is going to help all that much, and
                              may actually end up hurting more than anything else.

                              My 2 cents,
                              Aron
                            • Josh Tauberer
                              Message 14 of 15 , Oct 15, 2008
                                aronpilhofer wrote:
                                > Incidentally, I agree that APIs are a rather crappy way of
                                > distributing data in toto, but who is arguing this as an either/or?
                                > There is significant value in both.

                                Well, look, I wasn't making a statement about APIs in general.

                                I was responding to a response to my statement about contributing to the
                                commons, and I was saying that an API doesn't contribute the data to the
                                commons.

                                In the case of the Times's FEC API, the data is already available in
                                bulk from the FEC. You're providing an additional service to make things
                                easier, and I say that is only a good thing. You're also a commercial
                                enterprise, with different goals, and I meant to only be addressing the
                                strictly nonprofit/transparency world, though I know I didn't say it.

                                > On the specific point that started this thread, it might be a good
                                > time to gently remind you that this is not your data.

                                For all the time I put into it, I think I get a little say in how it is
                                used (if you access my server to get it). I have no moral obligation to
                                provide the data to everyone. At worst it would be hypocritical to start
                                adding restrictions when I talk about openness, which is why I don't
                                actually have any.

                                And the irony is not lost on me that if I actually add a restriction,
                                someone could fork the project.

                                > There are going to be people out there who
                                > take, and don't give back.

                                But that doesn't mean I shouldn't have an expectation about what they
                                *ought* to be doing. The fact that someone isn't contributing data that
                                they have back doesn't mean I stop asking.

                                --
                                - Josh Tauberer
                                - GovTrack.us

                                http://razor.occams.info

                                "Yields falsehood when preceded by its quotation! Yields
                                falsehood when preceded by its quotation!" Achilles to
                                Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
                              • aronpilhofer
                                Message 15 of 15 , Oct 15, 2008
                                  > Well, look, I wasn't making a statement about APIs in general.

                                  Fair enough. I move to strike my statement from the record.

                                  > For all the time I put into it, I think I get a little say in how it is

                                  I guess that depends on what restrictions you do decide to slap on it,
                                  if any. I'm not telling you anything you don't know -- but that's part
                                  of the deal when you decide to open things up. People take and don't
                                  play nice. It sucks, but you can't really have it both ways.

                                  > The fact that someone isn't contributing data that
                                  > they have back doesn't mean I stop asking.

                                  No one said that. But putting some kind of license on the data to
                                  enforce it, that's another matter.