
Re: [NTO] pdf converter to html

  • Alec Burgess
    Message 1 of 3 , Feb 1, 2013
      Hi Adrien:
      I use pdftotext (http://en.wikipedia.org/wiki/Pdftotext), downloaded from
      http://www.foolabs.com/xpdf/home.html
      I use this command:
      ^!replace ".*" >> "pdftotext.exe -nopgbrk -layout "$0"" rwais
      on a buffer containing the names of the PDF files to be converted, save
      that as a batch file, and run it in the folder containing the files to be
      converted.
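
For anyone without NoteTab, the same batch-file step can be sketched in Python. This is only an illustration under assumptions: the function name `build_batch_lines` and the sample file names are hypothetical, and the only pdftotext options used are the `-nopgbrk` and `-layout` flags from the command above.

```python
def build_batch_lines(pdf_names):
    """Return one pdftotext command line per PDF file name.

    Mirrors the ^!replace step: every non-empty line of the list
    is wrapped in a pdftotext call with the name in double quotes.
    """
    lines = []
    for name in pdf_names:
        name = name.strip()
        if not name:
            continue  # skip blank lines in the file list
        lines.append(f'pdftotext.exe -nopgbrk -layout "{name}"')
    return lines


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    names = ["report one.pdf", "notes.pdf"]
    print("\n".join(build_batch_lines(names)))
```

Saving the printed lines as a .bat file in the folder that holds the PDFs gives the same kind of batch file as the one described above.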

      I then run this clip:
      H=pdfToTextToHtml
      ; --- use this on list of files ^!replace ".*" >> "pdftotext.exe -nopgbrk -layout "$0"" rwais
      ^!replace "^.{5,68}\R(?!\R)" >> "$0\r\n\r\n" rwais
      ^!replace "(?<=\.|"\?\!)\R" >> "\r\n\r\n\r\n" rwais
      ^!select all
      ^!toolbar "join lines"
      ^!toolbar "Document to HTML"
      ^!replace "(?<=[a-z])\</p\>\r\n\r\n\<p\>(?!chapter)" >> "\x20" rwais
      ^!replace "(?<=,|\-|\.\.\.|:)\</p\>\r\n\r\n\<p\>" >> "\x20" rwais
      ^!replace "(Mr|Mrs|Dr)\.\</p\>\r\n\r\n\<p\>\r\n" >> "\x20" rwais
      ^!save as "^$getname(^$getdocname$)$.html"

      on the resulting TXT files, which creates the basic HTML files you requested.
      Note: the ^!replace lines above are just tweaks I use to get the
      paragraph and line breaks the way I want them.
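
The general idea of the clip (join wrapped lines, then wrap each paragraph in <p> tags) can also be sketched outside NoteTab. The Python below is a rough approximation, not a port of the clip: the function name `text_to_basic_html` is hypothetical, and it assumes paragraphs are separated by blank lines, without the extra punctuation-based tweaks.

```python
import html
import re


def text_to_basic_html(text):
    """Convert plain text to basic HTML paragraphs.

    Assumes blank lines separate paragraphs; lines inside a
    paragraph are joined with single spaces.
    """
    # Normalize Windows and old Mac line endings
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on one or more blank (or whitespace-only) lines
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Join the wrapped lines of this paragraph
        joined = " ".join(
            line.strip() for line in block.split("\n") if line.strip()
        )
        if joined:
            # Escape &, <, > and quotes so the text is valid HTML
            paragraphs.append(f"<p>{html.escape(joined)}</p>")
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = "First line\r\nwraps here.\r\n\r\nSecond & last."
    print(text_to_basic_html(sample))
```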


      On 2013-02-01 07:37, Adrien Verlee wrote:
      > Many converters exist.
      > Before I wade through them (exceptionally, I now have to convert 2 PDFs),
      > is there perhaps someone with experience who can say which one to pick?
      >
      > Conversion to basic html is sufficient.
      --
      Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)
    • Adrien Verlee
      Message 2 of 3 , Feb 2, 2013
        On 2/02/2013 0:04, Alec Burgess wrote:
        > I use pdftotext http://en.wikipedia.org/wiki/Pdftotext download from
        > http://www.foolabs.com/xpdf/home.html

        Sometimes one is foolish enough to look elsewhere when the solution lies
        next door.

        I only had to open the PDF in Acrobat Reader X and save it as text. My
        Word macro can then run on that.

        But thanks for your post.
        --
        Translation > www.adrien-verlee.be

        - Unshared information = lost information -