Re: [Clip] Extracting text from WORD files

  • Larry Hamilton
    Message 1 of 7 , Oct 13, 2003
      Robin Chapple wrote:
      > I have a task to extract the text from 26 WORD documents. Is this a
      > task that I can achieve with clips?
      >
      > Thanks,
      >
      > Robin Chapple

      Robin,

      This is a task I encountered a couple of years ago with over 100 Word
      documents that contained census readings and all I needed was the text to
      wrap <pre> </pre> tags around for simple HTML files.

      Here is the clip I used, and it gets the headers and footers, if any.

      There were several methods that I encountered from others on the list, but
      those solutions did not quite do what I needed. I even looked for command
      line utilities to extract text, but none of them could do what opening the
      Word doc itself allowed.

      You may need to adjust the delays on the keyboard commands to get them to do
      what you need. For only 26 files, this will be faster than building a new
      clip or doing it by hand.

      HTH,

      Larry Hamilton
      lmh@...
      My Web Site: http://notlimah.tripod.com/
      Webmaster: Hamilton National Genealogical Society, Inc.
      http://www.hamiltongensociety.org/

      <copy below this line>
      ;March 05, 2002 Larry Hamilton lmh@...
      ;Brute force method to open Word document, and use toolbar commands to copy
      ;headers and footers from document. The command-line tools I found do not pull
      ;out the header and footer text. Only Word saving as Text does so.
      ^!ClearVariables
      ^!SetDebug ON

      ;I just hard coded the path to keep it simple.
      ^!Set %File%=^$GetFileFirst("c:\Census";*.doc)$
      ^!ChDir C:\Census
      :LOOP
      ;The following was used for testing to make sure it does what is desired.
      ;^!Info ^%File% > ^$GetName(^%File%)$.txt

      ^!"C:\Program Files\Microsoft Office\Office\WINWORD.EXE" ^$GetShort(^%File%)$

      ^!SetHintInfo ^$GetDate(hh:nn:ss am/pm dddd, mmmm dd, yyyy)$
      ^!FocusApp "Microsoft Word - ^$GetName(^%File%)$"
      ^!IfDiff "^$GetAppTitle$" "Microsoft Word - ^$GetName(^%File%)$" Skip_-2
      ^!StatusClose
      ^!Delay 15
      ;The following Keyboard sequence will save the currently opened document
      ;with the same name in TXT format. It puts the headers & footers at the end
      ;of the file, so it still needs to be cleaned up.
      ^!Keyboard ALT+F A &100 TAB &100 T &100 ENTER


      ^!Set %File%=^$GetFileNext$
      ^!GoTo LOOP
      ^!CloseFileFind
      </copy above this line>
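
The follow-up step mentioned in this message, wrapping the extracted text in <pre> </pre> tags for a simple HTML file, is not part of the clip. A minimal sketch of that step in Python, assuming the .txt files already exist; the function name `wrap_in_pre` is hypothetical and not from the original post:

```python
import html


def wrap_in_pre(text: str) -> str:
    """Wrap plain text in <pre> tags for a simple HTML page.

    The characters &, < and > are escaped so that the text is displayed
    literally instead of being interpreted as markup.
    """
    return "<pre>\n" + html.escape(text, quote=False) + "\n</pre>\n"
```

Calling `wrap_in_pre()` on the contents of each saved .txt file and writing the result to an .html file would complete the conversion. The headers and footers that Word appends at the end of the text would still need to be cleaned up by hand.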
    • hugo_paulissen
      Message 2 of 7 , Oct 14, 2003
        Robin, Larry,

        I had to do this for a couple of hundred files once. What follows
        is a very quick and dirty clip (warning!), which opens the documents
        in Word (one at a time), and copies the text to NoteTab. The document
        is then saved with the same name plus a txt-extension... If all files
        are processed the clip should stop.

        Please note that you should have Word open - and that there should be
        no document loaded in Word before you start the clip. (This can be
        fixed by changing the FocusApp line...); the following clip assumes
        the title bar of MS Word only shows Microsoft Word.

        Hugo

        ^!Set %path%="C:\WINDOWS\Desktop\OutlookFiles"
        ^!SetArray %Files%=^$GetFiles("^%path%";*.doc)$
        ^!Set %X%=1
        :EXPORT
        ^!If ^%X% > ^%Files0% END
        ^!FocusApp "Microsoft Word"
        ^!Delay 1
        ^!Keyboard CTRL+O
        ^!Delay 1
        ^!Keyboard #^%Files^%X%%# ENTER
        ^!Delay 1
        ^!Keyboard CTRL+A CTRL+C CTRL+W
        ^!ActivateApp
        ^!Select ALL
        ^!InsertText ^$GetClipboard$
        ^!Save AS "^%Files^%X%%.txt"
        ^!INC %X%
        ^!Delay 1
        ^!GoTo EXPORT
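
The file handling in the clip above (collect every .doc file in a folder, then save each one under the same name plus a .txt extension) can be sketched outside NoteTab as follows. This is only an illustration of the loop and the naming; it does not drive Word or copy any text, and the function name `plan_exports` is hypothetical:

```python
from pathlib import Path


def plan_exports(folder):
    """Pair each .doc file in `folder` with its export target.

    The target keeps the full original name and appends ".txt",
    so "report.doc" becomes "report.doc.txt", as in the clip.
    """
    docs = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".doc"
    )
    return [(p, p.with_name(p.name + ".txt")) for p in docs]
```

Printing the pairs returned for a folder before starting a batch run is a cheap way to confirm which files will be processed and what the output files will be called.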
      • Don Passenger
        Message 3 of 7 , Oct 14, 2003
          That does what I said, only it has notetab code in it ;-)

          --

          Don Passenger