Re: [Clip] Extracting text from WORD files
- Robin Chapple wrote:
> I have a task to extract the text from 26 WORD documents. Is this a
> task that I can achieve with clips?
> Robin Chapple
Robin,
This is a task I encountered a couple of years ago with over 100 Word
documents that contained census readings and all I needed was the text to
wrap <pre> </pre> tags around for simple HTML files.
Here is the clip I used, and it gets the headers and footers, if any.
There were several methods that I encountered from others on the list, but
those solutions did not quite do what I needed. I even looked for command
line utilities to extract text, but none of them could do what opening the
Word doc itself allowed.
You may need to adjust the delays on the keyboard commands to get them to do
what you need. For only 26 files, this will be faster than building a new
clip or doing it by hand.
My Web Site: http://notlimah.tripod.com/
Webmaster: Hamilton National Genealogical Society, Inc.
<copy below this line>
;March 05, 2002 Larry Hamilton lmh@...
;Brute force method to open Word document, and use toolbar commands to copy
;headers and footers from document. The commandline tools I found do not pull
;out the header and footer text. Only Word saving as Text does so.
;I just hard coded the path to keep it simple.
;The following was used for testing to make sure it does what is desired.
;^!Info ^%File% > ^$GetName(^%File%)$.txt
^!"C:\Program Files\Microsoft Office\Office\WINWORD.EXE"
^!SetHintInfo ^$GetDate(hh:nn:ss am/pm dddd, mmmm dd, yyyy)$
^!FocusApp "Microsoft Word - ^$GetName(^%File%)$"
^!IfDiff "^$GetAppTitle$" "Microsoft Word - ^$GetName(^%File%)$" Skip_-2
;The following Keyboard sequence will save the currently opened document
;with the same name in TXT format. It puts the headers & footers at the end
;of the file, so it still needs to be cleaned up.
^!Keyboard ALT+F A &100 TAB &100 T &100 ENTER
</copy above this line>
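For the final step mentioned above (wrapping the extracted text in <pre> </pre> tags for simple HTML files), here is a rough sketch in Python rather than clip code. It is only one possible way to do it, not the method used at the time. The function names wrap_in_pre and convert_folder are made up for illustration, and it assumes the .txt files Word saved are in the Windows cp1252 code page, which may not hold for every setup.

```python
import html
from pathlib import Path


def wrap_in_pre(text: str, title: str) -> str:
    """Return a minimal HTML page with the text inside <pre> tags."""
    # Escape &, < and > so they display literally inside <pre>.
    body = html.escape(text, quote=False)
    return (
        "<html>\n<head>\n<meta charset=\"utf-8\">\n"
        "<title>" + html.escape(title) + "</title>\n</head>\n"
        "<body>\n<pre>\n" + body + "\n</pre>\n</body>\n</html>\n"
    )


def convert_folder(folder: str) -> list:
    """Write NAME.html next to every NAME.txt in folder; return the new paths."""
    written = []
    for txt_path in sorted(Path(folder).glob("*.txt")):
        # Assumption: Word's text export used the Windows ANSI code page.
        text = txt_path.read_text(encoding="cp1252", errors="replace")
        html_path = txt_path.with_suffix(".html")
        html_path.write_text(wrap_in_pre(text, txt_path.stem), encoding="utf-8")
        written.append(html_path)
    return written
```

Calling convert_folder on the folder holding the saved .txt files would leave one .html file beside each of them. The headers and footers that Word appends at the end of each file would still need to be tidied by hand.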
- Robin, Larry,
I had to do this for a couple of hundred files once. What follows
is a very quick and dirty clip (warning!), which opens the documents
in Word (one at a time), and copies the text to NoteTab. The document
is then saved with the same name plus a txt-extension... Once all files
have been processed, the clip should stop.
Please note that you should have Word open, and that there should be
no document loaded in Word before you start the clip. The following
clip assumes the title bar of MS Word only shows Microsoft Word. (This
can be changed by editing the FocusApp line...)
^!If ^%X% > ^%Files0% END
^!FocusApp "Microsoft Word"
^!Keyboard #^%Files^%X%%# ENTER
^!Keyboard CTRL+A CTRL+C CTRL+W
^!Save AS "^%Files^%X%%.txt"