261 Re: [govtrack] Two Project Ideas [bill versioning]
- Nov 9, 2006
Scott Burns wrote:
> Instead of trying to convert PDFs and remove formatting you can get
> basic HTML versions of these bills from Thomas. This bill, for
> example, can be found here:

Right, I forgot that Thomas's HTML versions are pretty good.

> I haven't played around with the queries there enough to figure out
> if there's a reliable URL to get directly to the text display of the
> version you want

Not as far as I know, either.

In that case, the task may be a lot easier. Convert the HTML into XML,
and then run a difference with an XML differencing tool, such as xmldiff
(a Python script, very slow when I tried it just now, but seems to
actually be useful for this project and can read the HTML directly) or
Which might do the same thing faster and better, but I haven't tried.
It's in C++ and needs to be compiled.
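
To make the idea concrete, here is a rough sketch of the compare step using only the Python standard library. It is not xmldiff and does not use its interface: it swaps in a simpler technique, pulling the visible text runs out of each HTML version and comparing them with difflib. The sample markup and the names html_to_blocks and diff_versions are made up for illustration, and real Thomas pages would probably need extra cleanup first.

```python
import difflib
from html.parser import HTMLParser


class TextBlocks(HTMLParser):
    """Collect the visible text of an HTML document, one entry per text run."""

    def __init__(self):
        super().__init__()
        self.blocks = []

    def handle_data(self, data):
        # Collapse runs of whitespace so layout-only changes are ignored.
        text = " ".join(data.split())
        if text:
            self.blocks.append(text)


def html_to_blocks(html):
    """Return the list of non-empty text runs found in an HTML string."""
    parser = TextBlocks()
    parser.feed(html)
    parser.close()
    return parser.blocks


def diff_versions(old_html, new_html):
    """Return unified-diff lines between the text of two bill versions."""
    return list(difflib.unified_diff(
        html_to_blocks(old_html),
        html_to_blocks(new_html),
        fromfile="old", tofile="new", lineterm=""))


# Made-up sample markup, not the text of a real bill.
old = "<html><body><p>SECTION 1. Short title.</p><p>The fee is $10.</p></body></html>"
new = "<html><body><p>SECTION 1. Short title.</p><p>The fee is $25.</p></body></html>"

for line in diff_versions(old, new):
    print(line)
```

This throws away the tag structure, so it only reports changed text, not moved or re-nested sections. A real XML differencing tool would be needed for that.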
- Joshua Tauberer
"Strike up the klezmer and start acting like a man. You're
about to have a truth-mitzvah." -- The Colbert Report