IQPDF - the PDF search tool in Liberty Basic - Index generator now in the Cloud
Hi all,
Just an update on IQPDF, the PDF search tool written in Liberty Basic (IQPDF.COM).
I have made substantial revisions to the core search engine code and methods in preparation for an iOS port to the iPad, and these have significantly improved both speed and accuracy.
I have also discovered LB Booster (LBB, by R. T. Russell), which lets me compile the program into a single EXE file. LBB has some limitations compared to Liberty and is not a development environment, but once the code is stable it improves speed and compactness considerably.
I have also moved the index generation side of the IQPDF app into the cloud on Amazon AWS to automate the indexing process. It now runs rather sweetly: you email a PDF to PDF@... and receive the index by return. The original PDF is not sent back, as you already have a copy.
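The post doesn't say how the AWS server receives or parses incoming mail. Purely as an illustration, and assuming the raw message bytes are already in hand (for example, fetched from a mailbox), here is a minimal Python sketch (not Liberty Basic) of the "pull the PDFs out of an email" step using the standard library `email` package. The function name `extract_pdf_attachments` is hypothetical and not part of IQPDF.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def extract_pdf_attachments(raw_email: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, pdf_bytes) for every PDF attached to a raw email.

    Hypothetical helper: a sketch of the server-side step, not IQPDF's code.
    """
    # policy.default makes the parser return an EmailMessage, which
    # provides iter_attachments() and get_content().
    msg = BytesParser(policy=policy.default).parsebytes(raw_email)
    pdfs = []
    for part in msg.iter_attachments():
        if part.get_content_type() == "application/pdf":
            name = part.get_filename() or "attachment.pdf"
            # For non-text parts, get_content() returns the decoded bytes.
            pdfs.append((name, part.get_content()))
    return pdfs


if __name__ == "__main__":
    # Build a sample message with one PDF and one non-PDF attachment.
    demo = EmailMessage()
    demo["From"] = "tester@example.com"
    demo["Subject"] = "Index please"
    demo.set_content("Please index the attached PDF.")
    demo.add_attachment(b"%PDF-1.4 minimal test", maintype="application",
                        subtype="pdf", filename="sample.pdf")
    demo.add_attachment(b"\x00\x01", maintype="application",
                        subtype="octet-stream", filename="data.bin")
    for name, data in extract_pdf_attachments(demo.as_bytes()):
        print(name, len(data), "bytes")
```

Because the loop collects every PDF part, one email can carry several PDFs, which fits the "any realistic number of attachments" behaviour described for the server.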
To make the whole application work (both user and server side) I have had to pull quite a few techniques and 'tricks' together, and the impressive thing is Liberty's ability to accommodate it all. As it turns out, that was rather easier than in the iOS development tools :)
Anyway, IQPDF is free for personal use, and I would appreciate some external testing of the server side: please send it a variety of PDFs to process and index. I don't keep the PDFs or the generated indexes, only a log of CPU time, sender email address, and PDF filename and size.
Email does have a practical attachment limit of about 20-25 MB, but within that limit the server can accept any realistic number of PDF attachments in a single email. I am currently testing an FTP solution for really large PDFs, but that is mainly for my work environment, which is where this project started.
IQPDF is a PDF search application for single- or multiple-word searches across one or more files. It uses a form of binary 3D index to reference the PDF text content in a unique way. You give it a list of input words, it looks them up in the PDF, and it returns ranked pages displayed in SumatraPDF. Pages are ranked on a word count / word order basis that is proving very effective. In many ways it ain't Google search, but for what I designed it for, it beats Google Desktop into a cocked hat. It is spectacularly fast: a multiple-word search of, say, the Complete Works of Shakespeare (731,211 words in 3,066 pages) takes less than 450 ms on a first pass on my fairly pedestrian PC. Once the index file is in the buffer, this drops to under 100 ms per search. Not too bad at all.
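The post doesn't describe the "binary 3D index" or the exact ranking formula, so the following is only a rough Python sketch of the general idea, using a plain inverted index instead. It assumes a hypothetical scoring scheme: pages are ranked first by how many distinct query words they contain, then by how many adjacent query-word pairs appear next to each other in query order, then by total occurrences. The names `build_index` and `rank_pages` are illustrative, not IQPDF's.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9']+")


def build_index(pages: list[str]) -> dict[str, dict[int, list[int]]]:
    """Inverted index: word -> {page number -> word positions on that page}."""
    index: dict[str, dict[int, list[int]]] = defaultdict(lambda: defaultdict(list))
    for page_no, text in enumerate(pages, start=1):
        for pos, word in enumerate(TOKEN.findall(text.lower())):
            index[word][page_no].append(pos)
    return index


def rank_pages(index: dict[str, dict[int, list[int]]], query: str) -> list[int]:
    """Return page numbers ranked by an assumed count/order score."""
    words = TOKEN.findall(query.lower())
    candidates: set[int] = set()
    for w in words:
        candidates.update(index.get(w, {}))  # .get avoids growing the defaultdict
    scored = []
    for page in candidates:
        postings = [index.get(w, {}).get(page, []) for w in words]
        distinct = sum(1 for p in postings if p)
        total = sum(len(p) for p in postings)
        # Order bonus: query word i immediately followed by query word i+1.
        in_order = sum(
            1 for a, b in zip(postings, postings[1:])
            if any(i + 1 in set(b) for i in a)
        )
        scored.append((-distinct, -in_order, -total, page))
    scored.sort()
    return [page for *_, page in scored]


if __name__ == "__main__":
    pages = ["to be or not to be", "be quick", "not to worry, to be fair"]
    index = build_index(pages)
    print(rank_pages(index, "to be"))  # → [1, 3, 2]
```

Building the index once and answering each query with dictionary lookups is what keeps per-search time small; how IQPDF's own binary format achieves its timings is not covered in the post.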
I am posting this here to showcase what LB can do when pushed reasonably hard; this is not a spam posting. Click through to IQPDF.COM, download the runtimes and have a play. The full King James Bible and the Complete Works of Shakespeare are available there with indexes to get you started. In fact, using IQPDF on these alone is worth the effort, as it makes finding 'stuff' so easy. Do try emailing some of your own PDFs to PDF@... and copying both the PDF and the returned index file into the folder where you put the IQPDF application. You might be surprised at what you can find.