Red Cross Documents: extra credit challenge

  • Dan Prives
    Message 1 of 6 , Mar 1, 2006
      In response to an inquiry from Sen. Charles Grassley, the American Red Cross has provided several thousand pages of background materials, ranging from thank you letters to board members to minutes from finance committee meetings.

      The problem is, the documents are in 8 large, unindexed PDF files of 200-300 pages each.

      I would like to throw out the challenge to anyone in the community to break up these files into individual document files, index them, label them, and make them publicly available on the Internet within the next week or so.

      Links to the large document files can be found here:

      http://finance.senate.gov/sitepages/grassley.htm

      They are the links labeled Response 1 through Response 8.

      Regards,
      Dan Prives
      "Where Most Needed"
      Blogging daily about charities at:
      http://www.wheremostneeded.org
    • Don Cameron
      Message 2 of 6 , Mar 1, 2006
        -----original message-----
        >> I would like to throw out the challenge to anyone in the community to break up these files into individual document files, index them, label them, and make them publicly available on the Internet within the next week or so.>>



        Hi Dan,

        Being one who loves a technical challenge, your request was hard to pass up, so for the sake of the exercise I downloaded the first of these PDFs (17 MB) to see how hard it would be to do as you ask...

        Running a 'select-all' > 'copy' from Acrobat Reader took all of about 10 seconds... Pasting the resulting clipboard contents into Word a few more seconds. Here the challenge begins, although fortunately the Red Cross have added standard delineators between each document (a reference number), meaning it's quite simple to automate the function of adding a page break.
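
A minimal sketch of how the split at each reference number might be automated outside Word. The thread does not state what the Red Cross reference numbers look like, so the `ARC-000123` pattern below is purely hypothetical, as is the assumption that each reference line sits at the top of the document it labels; adjust both to match the actual files.

```python
import re

# Hypothetical delimiter: a line holding only a reference number like "ARC-000123".
# The real reference-number format is not given in this thread.
REF_LINE = re.compile(r"^[ \t]*(ARC-\d{6})[ \t]*$", re.MULTILINE)

def split_documents(text):
    """Split pasted text into (reference, body) pairs at each reference line."""
    # With one capture group, re.split returns
    # [preamble, ref1, body1, ref2, body2, ...].
    parts = REF_LINE.split(text)
    docs = []
    for i in range(1, len(parts) - 1, 2):
        docs.append((parts[i], parts[i + 1].strip()))
    return docs
```

Any text before the first reference line is treated as a preamble and dropped, which may or may not be what you want for a cover page.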

        The last task (converting to HTML or PDF as single files) is relatively simple, so I would estimate about 20-30 mins per file to get the outcome you are after. Adding an index is a bit more complex but hardly onerous.
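
One way the index could be approached, sketched under assumptions: the documents have already been separated into (reference, text) pairs, and the index is nothing more than a keyword lookup. The function name `build_index` and the keyword list are illustrative, not anything described in the thread.

```python
from collections import defaultdict

def build_index(docs, keywords):
    """Map each keyword to the references of documents that mention it.

    docs is a list of (reference, body) pairs; matching is a plain
    case-insensitive substring test, so it is only a rough first pass.
    """
    index = defaultdict(list)
    for ref, body in docs:
        lowered = body.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                index[kw].append(ref)
    return dict(index)
```

A substring match will not cope with scanned pages that have no usable text, so those would still need labelling by hand.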

        However...

        - Would publishing the resultant docs infringe on copyright?

        - Do you have the permission of the Red Cross to do this?

        - Is it ethical to pull apart someone else's document for republication?

        - Why are we doing this? (It's easy enough for anyone to read the originals)

        - It appears as though some of the content contains letters from individuals - IMO we would also need the permission of individual authors for re-publication.

        Interested in your thoughts on these points before going any further...

        Rgds, Don
      • Dan Prives
        Message 3 of 6 , Mar 2, 2006
          Don:

          Thanks for the interest. These items were provided by the American Red Cross in response to a letter from Sen. Charles Grassley of the Senate Finance Committee. Sen. Grassley's letter is attached to this press release:

          http://finance.senate.gov/press/Gpress/2005/prg122905.pdf

          Sen. Grassley then included them as attachments to a press release, so they are public domain, which apparently was Sen. Grassley's intent in making the inquiry. The site you downloaded these from is the Senate Finance Committee's.

          The interest in these documents is illuminating the governance and operations of the American Red Cross, so that we can get a better understanding of how the organization operates. My own interest is in increasing understanding & getting beyond finger pointing. It's easy to read the originals, but it's not easy to find them by subject (for instance, the audit committee minutes on a specific date).

          Or maybe I just need to learn how to separate the documents. Is this really something I could do in OpenOffice.org Writer or MS Word?

          Regards,
          Dan Prives
          "Where Most Needed"
          Blogging daily about charities at:
          http://www.wheremostneeded.org




          -----original message-----
          >>- Would publishing the resultant docs infringe on copyright? - Do you have the permission of the Red Cross to do this? - Is it ethical to pull apart someone else's document for republication? - Why are we doing this? (It's easy enough for anyone to read the originals) - It appears as though some of the content contains letters from individuals - IMO we would also need the permission of individual authors for re-publication.>>
        • Don Cameron
          Message 4 of 6 , Mar 2, 2006
            -----original message-----
            >> maybe I just need to learn how to separate the documents. Is this really something I could do in OpenOffice.org Writer or MS Word?>>

            Hi Dan - Yes - When opening a PDF in Acrobat Reader, from the 'Edit' menu choose 'Select All' and 'Copy'. This copies the content to the clipboard. You can subsequently 'Paste' this content into another application. I use Word 2003, so the text included in any images (scanned documents) is also pasted along with regular text. I'm not sure whether OpenOffice supports this; I assume it does, although I haven't tested it.

            Not being aware of the background, I'm still reluctant to re-publish these documents without permission, although maybe someone else will... I'm assuming they were sent in confidence to your Senator and s/he made them public under rights granted under political process? - I'm not sure this really makes them 'public domain', or that such legal exclusions carry to you or me re-publishing the documents.

            In my experience the Red Cross is very open and approachable - my wife was a Red Cross Disaster Victim Registrar for many years and we share a lot of respect for the work of this org - When you asked them for this info what was the response?

            Cheers, Don
          • Dan Prives
            Message 5 of 6 , Mar 3, 2006
              Don:

              I recall that you are outside the US. What is prompting this interest is the recent resignation of the President of ARC after just a few years in the post. The same thing happened to their previous head. In both cases, the press thought it was about performance during disasters, but the people who resigned said it was about internal/board difficulties. So Sen. Grassley and the rest of us in the US are interested in seeing these internal board documents, as they may reveal more about what is really going on.

              I didn't get very good results when I tried the simple cut and paste. Some of the documents came out fairly well, but others were just gibberish.

              Regards,

              Dan Prives
              "Where Most Needed"
              Blogging daily about charities at:
              http://www.wheremostneeded.org




              -----original message-----
              >>Not being aware of the background I'm still reluctant to re-publish these documents without permission, although maybe someone else will... I'm assuming they were sent in confidence to your Senator and s/he made them public under rights granted under political process? - I'm not sure this really makes them 'public domain', or that such legal exclusions carry to you or I re-publishing the documents. In my experience the Red Cross is very open and approachable - my wife was a Red Cross Disaster Victim Registrar for many years and we share a lot of respect for the work of this org - When you asked them for this info what was the response?>>
            • Don Cameron
              Message 6 of 6 , Mar 5, 2006
                -----original message-----
                >> I didn't get very good results when I tried the simple cut and paste. Some of the documents came out fairly well, but others were just gibberish.>>



                Dan,

                Much of the content of these documents appears to be scanned images (hand-written memos or typed letters with hand-written notes). These will not copy/paste properly if selected as 'text' during a copy/paste operation.
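
A rough sketch of how pasted text might be screened to decide which documents came through as gibberish and should be handled as images instead. The heuristic (share of plain alphabetic tokens) and the 0.6 threshold are assumptions chosen for illustration, not something tested against the Red Cross files.

```python
import re

# A token counts as "word-like" if it is letters only, optionally
# followed by one trailing punctuation mark.
WORDLIKE = re.compile(r"[A-Za-z]+[.,;:!?]?")

def looks_like_gibberish(text, threshold=0.6):
    """Return True when too few tokens look like ordinary words."""
    tokens = text.split()
    if not tokens:
        return True  # nothing pasted at all, e.g. a pure image page
    wordlike = sum(1 for t in tokens if WORDLIKE.fullmatch(t))
    return wordlike / len(tokens) < threshold
```

Pages heavy with figures or dates (finance committee tables, for example) may be flagged wrongly, so the result is best treated as a list of pages to check by eye.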

                To copy images from a PDF using Acrobat Reader (ver 6.0 or later required) -

                1. Open the PDF document

                2. On the Basic toolbar click the down arrow beside the Select Text button

                3. Choose 'Select Image'

                4. Click on the graphic to select it

                5. Right click on the graphic and select 'Copy Image to Clipboard'

                6. Paste the resultant image into Word or OO Writer

                Rgds, Don