2 issues

  • Ben Woosley
    Message 1 of 14, Apr 30, 2010
      Just FYI, two issues I'm seeing:
      • Rolls from the keithpoole data source use a different naming scheme than we see elsewhere.  e.g. 'h16.xml' rather than '1987-h16.xml'. For congress 101, both sources are present, under the 2 different naming schemes.
      • Most Senate bills are missing for Congress 101.  This is a bit odd because the roll data seems complete, while on the Bill side there are many missing bills, including, it seems, the vast majority of Senate bills.  Congresses 100 and 102 don't seem to have this problem.
      -Ben
    • Josh Tauberer
      Message 2 of 14, May 2, 2010
        > * Rolls from the keithpoole data source use a different naming
        > scheme than we see elsewhere. e.g. 'h16.xml' rather than
        > '1987-h16.xml'. For congress 101, both sources are present, under
        > the 2 different naming schemes.

        Right, I was forced into this because earlier in Congressional history
        (esp. at the beginning) Congresses didn't line up with years very nicely.
        In fact, they still don't, but as a matter of practicality it hadn't
        mattered. The 1st Congress, for instance, ends on March 3, 1791,
        and the 2nd Congress starts the next day. That means a vote with an id
        that has the year and roll call number is ambiguous between possibly two
        sessions.

        So I changed the naming scheme for roll call votes brought in this way
        to just the roll call number in the files, and in the website URLs I
        think it has the session number and roll call number.

        An unavoidable mess.

        In the 101st Congress, I think the Senate votes may go with the new
        (i.e. 102-111th) scheme and the House votes may go with the historical
        (i.e. 1st-101st) scheme. Maybe.
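
A minimal sketch of how a consumer of the data might cope with both file naming schemes at once. It assumes only the two patterns quoted above ('h16.xml' and '1987-h16.xml'); the 's' prefix for Senate rolls and the names `RollFile` and `parse_roll_filename` are hypothetical, chosen for illustration.

```python
import re
from typing import NamedTuple, Optional


class RollFile(NamedTuple):
    year: Optional[int]  # None for the year-less (historical) scheme
    chamber: str         # "h" for House; "s" for Senate is an assumption
    number: int          # roll call number


# Accepts both 'h16.xml' and '1987-h16.xml'.
_ROLL_RE = re.compile(r"^(?:(\d{4})-)?([hs])(\d+)\.xml$")


def parse_roll_filename(name: str) -> RollFile:
    """Split a roll file name into its optional year, chamber, and number."""
    m = _ROLL_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized roll file name: {name!r}")
    year, chamber, number = m.groups()
    return RollFile(int(year) if year else None, chamber, int(number))
```

A caller can then branch on whether `year` is `None` to tell which scheme a given file follows, which may help in a Congress where both are present.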

        > * Most senate bills missing for congress 101. This is a bit odd
        > because the roll data seems complete, while on the Bill side,
        > there are many missing bills, including it seems the vast majority
        > of Senate bills. Congress 100 and 102 don't seem to have this problem

        I'm not sure how far back I've imported data from THOMAS. The further I
        go back, the more errors pop up. For sure it's relatively complete as
        far back as the 103rd. Also, bill and vote data come from different
        sources, so you can't judge the completeness of one by the other.

        These things have all been in progress for a long time...

        > Bobby Jindal carries a PVS id (26879) which has apparently been retired and replaced (with 35481).

        I'll update the db. I used to enter PVS IDs myself but now get it from
        Sunlight's API on a periodical/random basis.

        Thanks.

        Josh
      • Eric Mill
        Message 3 of 14, May 2, 2010
          And I'll make sure Sunlight updates the PVS ID for Bobby Jindal.

        • Ben Woosley
          Message 4 of 14, May 2, 2010
            Thanks Josh,

            On rolls, you might consider using the new, year-less scheme across the board, with soft links or the like aliasing the xml to the old naming scheme.

            -Ben



          • Josh Tauberer
            Message 5 of 14, May 11, 2010
              On 05/02/2010 08:14 PM, Ben Woosley wrote:
              > On rolls, you might consider using the new, year-less scheme across the
              > board, with soft links or the like aliasing the xml to the old naming
              > scheme.

              Ah but the House and Senate currently reset their numbering at the start
              of each calendar year, so e.g. there are two roll #1's in each chamber
              for the 111th Congress.

              (What's a poor schema designer to do? :)
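
One way to picture the collision is with a composite key. This is only a sketch: the `RollKey` class, its field names, and the slug format are hypothetical, and it assumes that Congress, session, chamber, and roll number together identify a vote, which the thread suggests but does not confirm for every era.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollKey:
    congress: int
    session: int   # assumed to distinguish the years within one Congress
    chamber: str   # "h" or "s"
    number: int    # roll call number, which restarts each calendar year

    def slug(self) -> str:
        """Hypothetical identifier that stays unique across sessions."""
        return f"{self.congress}-{self.session}-{self.chamber}{self.number}"


# Two different House votes in the 111th Congress, both numbered 1.
first = RollKey(111, 1, "h", 1)
second = RollKey(111, 2, "h", 1)

# Under a year-less name such as 'h1', the two votes collapse into one.
yearless = {f"{k.chamber}{k.number}" for k in (first, second)}
```

Here `yearless` ends up with a single element while `first` and `second` remain distinct keys, which is the ambiguity described above.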

              - Josh Tauberer
              - CivicImpulse / GovTrack.us

              http://razor.occams.info | www.govtrack.us | civicimpulse.com

              "Members of both sides are reminded not to use guests of the
              House as props."

            • Harvey Frey
              Message 6 of 14, May 24, 2010
                    Has anyone tried downloading bills from Thomas in HTML format, or converting their Text downloads to HTML?

                    The guy at the Thomas help desk at first didn't understand what I wanted, and then said that no one had ever asked for that before.


                    As you know, when you download the Contents page of a bill, the hyperlinks to the actual sections point to your own local folder instead of to the Thomas page where they exist.

                    If you add a base statement to point them back to the Thomas site, the links disappear in a few minutes since the search expires.

                    If you download all their referenced pages, you can't use their links to convert them to local hyperlinks, since they address them through a cgi program instead of through static name anchors.

                    You can download a bill as plain text rather than HTML, but manually adding all the name anchors and hyperlinks would be a massive job.

                    (I did it for the Patriot Act, and it wasn't fun, especially since it looked like they intentionally obfuscated the references, sometimes using Public Law References, sometimes USC, and sometimes common names, so it was a detective job to find the sections they were amending.)

                    Has anyone tried this, say with perl?

                    I'm specifically interested in HR 3590, the recent Health Reform Bill. Some bills are posted as a massive XML file, but this one isn't. (If it were, I suppose you could use their id/id-ref pairs to construct href/name pairs, and then clean out the rest of the XML cruft.)

                Harvey

                =============================
                Harvey S. Frey MD PhD Esq.
                hsfrey@...  www.harp.org
                -----------------------------
                "Withdrawing in disgust is not the same thing as apathy."
                - Brian Eno

                =============================
              • Josh Tauberer
                Message 7 of 14, May 24, 2010
                  That's what GovTrack does to get bill text.
                  http://www.govtrack.us/data/us/bills.text/111/h/h3590.html

                  I clean the HTML and make sure it's well-formed XML before putting it there.
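
The well-formedness check can be sketched with Python's standard library. This is not GovTrack's actual cleaning code, which the thread does not show; `is_well_formed` is a hypothetical helper that only reports whether a string parses as XML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as a single well-formed XML document."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True
```

For example, `is_well_formed("<p>Sec. 1.<br/></p>")` is true, while the same fragment with an unclosed `<br>` is rejected, which is the kind of HTML habit a cleaning pass has to repair.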

                  - Josh Tauberer
                  - CivicImpulse / GovTrack.us

                  http://razor.occams.info | www.govtrack.us | civicimpulse.com

                  "Members of both sides are reminded not to use guests of the
                  House as props."

                • Neil M.
                  Message 8 of 14, May 24, 2010
                    I've got an old python script that takes the US Code text files (from
                    http://uscode.house.gov), scans for references and outputs mediawiki
                    formatted pages and wiki links. It shouldn't be too difficult to modify
                    it to parse some other text/HTML file and output HTML instead. What is
                    it you want to link to exactly? The aforementioned US Code website?
                    Thomas? Both?
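
The script itself is not shown in the thread, so the following is only a rough sketch of the general idea: scan text for US Code citations and wrap them in MediaWiki links. The citation pattern, the `link_usc` name, and the wiki page naming ("Title N Section M") are all assumptions made for illustration.

```python
import re

# Matches simple citations such as '42 U.S.C. 1395w'. Real citations come in
# many more forms; this pattern is deliberately narrow.
USC_RE = re.compile(r"\b(\d+)\s+U\.S\.C\.\s+(\d+[a-z]*)")


def link_usc(text: str) -> str:
    """Wrap each recognized US Code citation in a MediaWiki link."""
    def repl(m: re.Match) -> str:
        title, section = m.group(1), m.group(2)
        # Hypothetical page naming scheme for the target wiki.
        return f"[[Title {title} Section {section}|{m.group(0)}]]"

    return USC_RE.sub(repl, text)
```

Emitting HTML instead would mostly mean changing the string built in `repl`.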

                    Neil

                  • Harvey Frey
                    Message 9 of 14, May 24, 2010
                      Josh:

                          What I'm trying to do is clean it up so the section hierarchy is displayed, and add hyperlinks to the text from the TOC. This is an excerpt of the TOC. What doesn't show here is that the lines are hyperlinks, and I've used a different background color for the sections of text added to other acts. (The back-ticks are not very salient.)

                          It's not that I'm a neat freak - I just get easily confused when the margins are all raggedy and the fonts don't go with the hierarchy.

                      Harvey

                      SECTION 1. SHORT TITLE; TABLE OF CONTENTS.

                        (a) Short Title- This Act may be cited as the `Patient Protection and Affordable Care Act'.
                        (b) Table of Contents- The table of contents of this Act is as follows:
                          Sec. 1. Short title; table of contents.

                      TITLE I--QUALITY, AFFORDABLE HEALTH CARE FOR ALL AMERICANS




                    • Harvey Frey
                      Message 10 of 14, May 25, 2010
                        Hi Neil:

                            Thanks for the response!

                            I simply want to use the 'contents' section of a bill to hyperlink to the actual text paragraphs within the same bill, to make the bills easier to navigate and comprehend.

                            If I save the contents page from Thomas, the links point to a cgi script and disappear within a few minutes. AFAIK, Thomas uses no permanent links - everything runs through their cgi script.

                            If I download the entire bill from Thomas, it contains no links at all, so I need to be able to find section headings and put name anchors there, and put corresponding href anchors in the correct TOC line, but not do it for incidental references which are not headings.

                            So it's not a problem of finding stereotypical text references to USC sections and constructing HTML from them. That would be a pretty straightforward RegExp problem.
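
The heading-versus-reference distinction might be sketched as follows. This is a minimal illustration, not a tested converter for Thomas downloads: it assumes TOC entries begin a line with "Sec. N." and real headings begin a line with "SECTION N." or "SEC. N.", as in the HR 3590 excerpt earlier in the thread. The `link_toc` name and the `sec-N` anchor ids are hypothetical, and it uses `id` attributes where the thread speaks of name anchors.

```python
import html
import re

# A TOC entry starts the line in mixed case; a heading starts it in capitals.
TOC_RE = re.compile(r"^\s*Sec\. (\d+[A-Za-z]*)\. ")
HEAD_RE = re.compile(r"^\s*(?:SECTION|SEC\.) (\d+[A-Za-z]*)\. ")


def link_toc(lines):
    """Turn plain-text bill lines into HTML with TOC entries linked to headings."""
    out = []
    for line in lines:
        text = html.escape(line.strip())
        head = HEAD_RE.match(line)
        toc = TOC_RE.match(line)
        if head:
            # Anchor target placed on the heading itself.
            out.append(f'<h3 id="sec-{head.group(1)}">{text}</h3>')
        elif toc:
            # TOC line becomes a link to the matching heading.
            out.append(f'<p><a href="#sec-{toc.group(1)}">{text}</a></p>')
        else:
            # Incidental references in the middle of a line are left alone.
            out.append(f"<p>{text}</p>")
    return out
```

Because both patterns are anchored at the start of the line, a mention such as "as amended by Sec. 1. of this Act" is not linked. Quoted headings inside amendments (the back-ticked ones) would need extra handling that this sketch does not attempt.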

                        Harvey
                        ================================================

                        Neil M. wrote:
                        I've got an old python script that takes the US Code text files (from
                        http://uscode.house.gov), scans for references and outputs mediawiki
                        formatted pages and wiki links.  It shouldn't be too difficult to modify
                        it to parse some other text/HTML file and output HTML instead.  What is
                        it you want to link to exactly?  The aforementioned US Code website?
                        Thomas?  Both?
                        
                        Neil
                        
                        On 5/24/2010 3:59 PM, Harvey Frey wrote:
                          
                         
                        
                            Has anyone tried downloading bills from Thomas in HTML format, or
                        converting their Text downloads to HTML?
                        
                            The guy at the Thomas help desk at first didn't understand what I
                        wanted, and then said that no one had ever asked for that before.
                        
                            As you know, when you download the Contents page of a bill, the
                        hyperlinks to the actual sections point to your own local folder instead
                        of to the Thomas page where they exist.
                        
                            If you add a base statement to point them back to the Thomas site,
                        the links disappear in a few minutes since the search expires.
                        
                            If you download all their referenced pages, you can't use their
                        links to convert them to local hyperlinks, since they address them
                        through a cgi program instead of through static name anchors.
                        
                            Your can download a bill as plain text rather than HTML, but
                        manually adding all the name anchors and hyperlinks would be a massive job.
                        
                            (I did it for the Patriot Act, and it wasn't fun, especially since
                        it looked like they intentionally obfuscated the references, sometimes
                        using Public Law References, sometimes USC, and sometimes common names,
                        so it was a detective job to find the sections they were amending.)
                        
                            Has anyone tried this, say with perl?
                        
                            I'm specifically interested in HR 3590, the recent Health Reform
                        Bill. Some bills are posted as a massive XML file, but this one isn't.
                        (If it were, I suppose you could use their id/id-ref pairs to construct
                        href/name pairs, and then clean out the rest of the XML cruft.)
                        
                        Harvey
                        
                        =============================
                        Harvey S. Frey MD PhD Esq.
                        hsfrey@...  www.harp.org
                        -----------------------------
                        "Withdrawing in disgust is not the same thing as apathy."
                        - Brian Eno
                        =============================
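
For bills that are posted as XML, the id/id-ref idea above might look roughly like the following. This is only a minimal Python sketch: the element names (`toc-entry`, `section`, `header`, `text`), the `idref` attribute spelling, and the sample document are all assumptions made up for illustration, and the real bill XML may well be laid out differently.

```python
import xml.etree.ElementTree as ET
from html import escape

# Made-up sample; real bill XML may use different element/attribute names.
SAMPLE = """<bill>
  <toc>
    <toc-entry idref="s1">Sec. 1. Short title.</toc-entry>
    <toc-entry idref="s2">Sec. 2. Definitions.</toc-entry>
  </toc>
  <section id="s1"><header>Short title</header><text>This Act may be cited as the Example Act.</text></section>
  <section id="s2"><header>Definitions</header><text>In this Act, terms have their usual meaning.</text></section>
</bill>"""

def bill_xml_to_html(xml_text):
    """Turn idref/id pairs into href/name pairs and drop the other markup."""
    root = ET.fromstring(xml_text)
    out = ["<ul>"]
    # Each contents entry becomes a link to "#<idref>".
    for entry in root.iter("toc-entry"):
        target = escape(entry.get("idref"), quote=True)
        label = escape("".join(entry.itertext()).strip())
        out.append('<li><a href="#%s">%s</a></li>' % (target, label))
    out.append("</ul>")
    # Each section gets a name anchor matching its id.
    for sec in root.iter("section"):
        sec_id = escape(sec.get("id"), quote=True)
        header = escape(sec.findtext("header", default=""))
        text = escape(sec.findtext("text", default=""))
        out.append('<h3><a name="%s">%s</a></h3>' % (sec_id, header))
        out.append("<p>%s</p>" % text)
    return "\n".join(out)

print(bill_xml_to_html(SAMPLE))
```

Because the XML already pairs each contents entry with its target, no guessing about which text is a heading is needed in this case.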
                        
                        
                            
                        
                        ------------------------------------
                        
                        Yahoo! Groups Links
                        
                        <*> To visit your group on the web, go to:
                            http://groups.yahoo.com/group/govtrack/
                        
                        <*> Your email settings:
                            Individual Email | Traditional
                        
                        <*> To change settings online go to:
                            http://groups.yahoo.com/group/govtrack/join
                            (Yahoo! ID required)
                        
                        <*> To change settings via email:
                            govtrack-digest@yahoogroups.com 
                            govtrack-fullfeatured@yahoogroups.com
                        
                        <*> To unsubscribe from this group, send an email to:
                            govtrack-unsubscribe@yahoogroups.com
                        
                        <*> Your use of Yahoo! Groups is subject to:
                            http://docs.yahoo.com/info/terms/
                        
                        
                          
                      • Neil M.
                        I had some free time, something like this? http://www.nabber.org/media/HR3590.html Neil
                        Message 11 of 14 , May 25, 2010
                        • 0 Attachment
                          I had some free time, something like this?

                          http://www.nabber.org/media/HR3590.html

                          Neil

                          On 5/25/2010 7:23 PM, Harvey Frey wrote:
                          >
                          >
                          > Hi Neil:
                          >
                          > Thanks for the response!
                          >
                          > I simply want to use the 'contents' section of a bill to hyperlink
                          > to the actual text paragraphs within the same bill, to make the bills
                          > easier to navigate and comprehend.
                          >
                          > If I save the contents page from Thomas, the links point to a cgi
                          > script and disappear within a few minutes. AFAIK, Thomas uses no
                          > permanent links - everything runs through their cgi script.
                          >
                          > If I download the entire bill from Thomas, it contains no links at
                          > all, so I need to be able to find section headings and put name anchors
                          > there, and put corresponding href anchors in the correct TOC line, but
                          > not do it for incidental references which are not headings.
                          >
                          > So it's not a problem of finding stereotypical text references to
                          > USC sections and constructing HTML from them. That would be a pretty
                          > straightforward RegExp problem.
                          >
                          > Harvey
                          > ================================================
                          >
                          > Neil M. wrote:
                          >
                          >> I've got an old python script that takes the US Code text files (from
                          >> http://uscode.house.gov), scans for references and outputs mediawiki
                          >> formatted pages and wiki links. It shouldn't be too difficult to modify
                          >> it to parse some other text/HTML file and output HTML instead. What is
                          >> it you want to link to exactly? The aforementioned US Code website?
                          >> Thomas? Both?
                          >>
                          >> Neil
                          >
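
For comparison, the "straightforward RegExp problem" of spotting U.S. Code citations could be sketched in Python roughly as follows. The pattern only covers simple forms such as `42 U.S.C. 1395w-4` or `26 USC 36B`, and the URL template is a placeholder invented for the example, not a real address.

```python
import re

# Matches simple citations like "42 U.S.C. 1395w-4" or "26 USC 36B".
USC_REF = re.compile(
    r"\b(\d+)\s+U\.?S\.?C\.?\s+(?:§\s*)?(\d+[A-Za-z]*(?:-\d+[A-Za-z]*)?)"
)

# Hypothetical link target, for illustration only.
URL_TEMPLATE = "https://example.org/uscode/{title}/{section}"

def link_usc_refs(text):
    """Wrap each recognized citation in an HTML link."""
    def repl(match):
        url = URL_TEMPLATE.format(title=match.group(1), section=match.group(2))
        return '<a href="%s">%s</a>' % (url, match.group(0))
    return USC_REF.sub(repl, text)

print(link_usc_refs("as defined in 42 U.S.C. 1395w-4(b)."))
```

Citations by Public Law number or by common name would need their own patterns or a lookup table, which this sketch does not attempt.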
                        • Harvey Frey
                          Neil: Precisely! :-D Thank You! Did you write a script to do that? Manually it would surely have taken more than a little free time !! Harvey
                          Message 12 of 14 , May 26, 2010
                          • 0 Attachment
                            Neil:

                                Precisely! :-D
                                Thank You!

                                Did you write a script to do that?
                                Manually it would surely have taken more than a little "free time" !!

                            Harvey

                          • Neil M.
                            Yes, I wrote a quick Python script. If anyone wants it, just let me know. Neil
                            Message 13 of 14 , May 26, 2010
                            • 0 Attachment
                              Yes, I wrote a quick Python script. If anyone wants it, just let me know.

                              Neil
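
For anyone curious what such a script involves, here is a rough sketch of the general approach; it is not Neil's actual script. It assumes a plain-text bill in which contents lines start with `Sec. 1001. ` and real section headings start with `SEC. 1001. ` in capitals. Both patterns and the function name are guesses for illustration.

```python
import re
from html import escape

# Assumed layouts: mixed-case "Sec. N. " in the contents,
# upper-case "SEC. N. " at the start of a real section heading.
TOC_LINE = re.compile(r"^\s*Sec\. (\d+[A-Za-z]?)\. ")
HEADING = re.compile(r"^\s*SEC\. (\d+[A-Za-z]?)\. ")

def link_toc(text):
    """Add href anchors to contents lines and name anchors to headings."""
    out = []
    for line in text.splitlines():
        safe = escape(line)
        toc = TOC_LINE.match(line)
        head = HEADING.match(line)
        if toc:
            safe = '<a href="#sec%s">%s</a>' % (toc.group(1), safe)
        elif head:
            safe = '<a name="sec%s">%s</a>' % (head.group(1), safe)
        # Anything else, including "section 1002 of this Act", is left alone.
        out.append(safe)
    return "<pre>\n" + "\n".join(out) + "\n</pre>"

sample = (
    "Sec. 1001. Amendments.\n"
    "Sec. 1002. Grants.\n"
    "\n"
    "SEC. 1001. AMENDMENTS.\n"
    "  As provided in section 1002 of this Act.\n"
    "SEC. 1002. GRANTS.\n"
)
print(link_toc(sample))
```

Anchoring the patterns at the start of the line is what keeps incidental references in running text from being treated as headings.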

                            • Harvey Frey
                              ... Yes, please! It does need a little manual post-editing, since the section numbers in amended texts can be (and are) duplicates. Harvey
                              Message 14 of 14 , May 26, 2010
                              • 0 Attachment
                                Neil:

                                > if anyone wants it just let me know

                                     Yes, please!

                                    It does need a little manual post-editing, since the section numbers in amended texts can be (and are) duplicates.

                                Harvey
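
One way a script might cope with the duplicate section numbers, sketched under the assumption that the headings have already been pulled out as a list of section-number strings in document order: give each repeat a suffixed anchor name so the anchors stay unique. The function name and the `-dupN` suffix are invented here, and deciding which occurrence a contents line should point to would still be a manual call.

```python
from collections import Counter

def unique_anchor_names(section_numbers):
    """Return one anchor name per heading, suffixing repeated numbers."""
    seen = Counter()
    names = []
    for num in section_numbers:
        seen[num] += 1
        if seen[num] == 1:
            # First occurrence keeps the plain name the contents links to.
            names.append("sec%s" % num)
        else:
            # Later occurrences (e.g. quoted amended text) get a suffix.
            names.append("sec%s-dup%d" % (num, seen[num]))
    return names

print(unique_anchor_names(["1001", "1002", "1001", "2713", "1001"]))
```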

                                Neil M. wrote:
                                Yes I wrote a quick Python script if anyone wants it just let me know.
                                
                                Neil
                                
                                On 5/26/2010 11:05 AM, Harvey Frey wrote:
                                  
                                 
                                
                                Neil:
                                
                                    Precisely! :-D
                                    Thank You!
                                
                                    Did you write a script to do that?
                                    Manually it would surely have taken more than a little "free time" !!
                                
                                Harvey
                                
                                Neil M. wrote:
                                
                                    
                                I had some free time, something like this?
                                
                                http://www.nabber.org/media/HR3590.html
                                
                                Neil
                                
                                On 5/25/2010 7:23 PM, Harvey Frey wrote:
                                  
                                      
                                 
                                
                                Hi Neil:
                                
                                    Thanks for the response!
                                
                                    I simply want to use the 'contents' section of a bill to hyperlink
                                to the actual text paragraphs within the same bill, to make the bills
                                easier to navigate and comprehend.
                                
                                    If I save the contents page from Thomas, the links point to a cgi
                                script and disappear within a few minutes. AFAIK, Thomas uses no
                                permanent links - everything runs through their cgi script.
                                
                                    If I download the entire bill from Thomas, it contains no links at
                                all, so I need to be able to find section headings and put name anchors
                                there, and put corresponding href anchors in the correct TOC line, but
                                not do it for incidental references which are not headings.
                                
                                    So it's not a problem of finding stereotypical text references to
                                USC sections and constructing HTML from them. That would be a pretty
                                straightforward RegExp problem.
                                
                                Harvey
                                ================================================
                                
                                Neil M. wrote:
                                
                                    
                                        
                                I've got an old python script that takes the US Code text files (from
                                http://uscode.house.gov), scans for references and outputs mediawiki
                                formatted pages and wiki links.  It shouldn't be too difficult to modify
                                it to parse some other text/HTML file and output HTML instead.  What is
                                it you want to link to exactly?  The aforementioned US Code website?
                                Thomas?  Both?
                                
                                Neil
                                
                                On 5/24/2010 3:59 PM, Harvey Frey wrote:
                                  
                                      
                                          
                                 
                                
                                    Has anyone tried downloading bills from Thomas in HTML format, or
                                converting their Text downloads to HTML?
                                
                                    The guy at the Thomas help desk at first didn't understand what I
                                wanted, and then said that no one had ever asked for that before.
                                
                                    As you know, when you download the Contents page of a bill, the
                                hyperlinks to the actual sections point to your own local folder instead
                                of to the Thomas page where they exist.
                                
                                    If you add a base statement to point them back to the Thomas site,
                                the links disappear in a few minutes since the search expires.
                                
                                    If you download all their referenced pages, you can't use their
                                links to convert them to local hyperlinks, since they address them
                                through a cgi program instead of through static name anchors.
                                
                                    Your can download a bill as plain text rather than HTML, but
                                manually adding all the name anchors and hyperlinks would be a massive job.
                                
                                    (I did it for the Patriot Act, and it wasn't fun, especially since
                                it looked like they intentionally obfuscated the references, sometimes
                                using Public Law References, sometimes USC, and sometimes common names,
                                so it was a detective job to find the sections they were amending.)
                                
                                    Has anyone tried this, say with perl?
                                
                                    I'm specifically interested in HR 3590, the recent Health Reform
                                Bill. Some bills are posted as a massive XML file, but this one isn't.
                                (If it were, I suppose you could use their id/id-ref pairs to construct
                                href/name pairs, and then clean out the rest of the XML cruft.)
                                
                                Harvey
                                
                                =============================
                                Harvey S. Frey MD PhD Esq.
                                hsfrey@...  www.harp.org
                                -----------------------------
                                "Withdrawing in disgust is not the same thing as apathy."
                                - Brian Eno
                                =============================
                                
                                
                                    
                                        
                                            
                                ------------------------------------
                                
                                Yahoo! Groups Links
                                
                                <*> To visit your group on the web, go to:
                                    http://groups.yahoo.com/group/govtrack/
                                
                                <*> Your email settings:
                                    Individual Email | Traditional
                                
                                <*> To change settings online go to:
                                    http://groups.yahoo.com/group/govtrack/join
                                    (Yahoo! ID required)
                                
                                <*> To change settings via email:
                                    govtrack-digest@yahoogroups.com 
                                    govtrack-fullfeatured@yahoogroups.com
                                
                                <*> To unsubscribe from this group, send an email to:
                                    govtrack-unsubscribe@yahoogroups.com
                                
                                <*> Your use of Yahoo! Groups is subject to:
                                    http://docs.yahoo.com/info/terms/
                                
                                
                                  