Word 2002 to HTML

  • R Shapp
    Message 1 of 18 , Nov 6, 2003
      Hi Group,

      I need help converting Word documents to web pages. About three or four times
      a month, I get scientific papers as email attachments which must be uploaded
      to a website. These papers are written in several different versions of
      MSWord. They almost always contain tables, graphs, and images as well as
      specifically indented text, the formatting of which must be maintained.

      My mail client is MSOE v6 running under WinXP Home.

      I save the documents to disk then open them in Word 2002 and use File > "Save
      as Web Page..." with a file extension of "htm". I open the htm document in
      MSIE v6 and View Source in NoteTab Pro. Then the misery begins.

      Each paper contains what appear to be hundreds of lines of style sheets or xml
      codes. All the images are named "imagexxx.gif" or "imagexxx.wmz" or
      "imagexxx.jpg" where "xxx" is consecutively numbered from 001 upwards. Of
      course these file names conflict with similarly named image files from all the
      other papers.

      I end up hand-coding the html for these papers line-by-line.

      Some of this problem could be solved by converting all the incoming documents
      to PDF files, but most of my users don't like the Adobe Reader.

      Can anyone suggest a more efficient overall strategy?

      Lacking a grand strategy, how about a tactic that will allow me to rename all
      the image files to something unique -- maybe replace the word "image" with a
      string specific to each paper?

      Thanks for the help,

      Ray Shapp
    • Greg Chapman
      Message 2 of 18 , Nov 7, 2003
        Hi Ray,

        > I save the documents to disk then open them in Word 2002 and use
        > File > "Save
        > as Web Page..." with a file extension of "htm".

        That is probably your big mistake! :-)

        I use WORD2000, and have only seen WORD2002 once. However, it was a long
        enough glance to notice that there is a "Filtered HTML" type. That's what
        you need for uploading to the web.

        "Save as Web Page" should be reserved for round-tripping to non-MS word
        processors. It was Microsoft's attempt to hijack HTML as an internal
        format for WORD. As you can see if you look at the style sheet code, it's
        there to let you preserve all the formatting that is possible in a word
        processor but impossible in HTML. For web purposes it's useless bloat.

        WORD2000 users can obtain a plug-in from the Microsoft site, which does the
        same job. (Search for the file "msohtmf2.exe" (MS Office HTML Filter v2)).
        Apart from the main filter, it introduces some nice little tricks such as
        "Copy as HTML". For example, if you find it tricky to create the raw code
        for intricate patterns of nested tables, then create them in WORD and use
        the "Copy as HTML" feature to paste them into NoteTab! :-)

        Greg
      • hugo_paulissen
        Message 3 of 18 , Nov 7, 2003
          Ray,

          A very quick note. I have to do similar things regularly. There are
          ways to change the conversion to HTML (using other tools; generally
          resulting in cleaner HTML but seldom/never in a satisfactory format),
          but if the formatting is important it is quite handy to do it from
          within Word...

          The way to overcome the duplicate picture names is to have Word put all
          the files that belong to the original HTML in a separate folder. You end
          up with something like xyz.htm (file) and xyz_supportfiles (folder).
          Then you do not have to edit the files manually when you put them on a
          server. (Do not forget to put the folder on the server as well...)

          In Word 2000 this setting can be found in...
          Options | General | Web-options | Files | Organize supporting files
          in a folder

          I just noticed a setting for "Rely on CSS for font formatting" as well.
          It was unchecked on my machine, but I imagine that it would help
          reduce the file size of the htm files.

          Problem is that you seldom go to these settings and forget about them
          eventually.

          Regards,

          Hugo
        • R Shapp
          Message 4 of 18 , Nov 7, 2003
            Hi Greg and Hugo,

            That Filtered HTML mode is exactly what I needed! I now have reasonably
            tractable HTML coding that I can paste into the proper templates.

            Do you have any suggestions about an efficient way to give the image files
            unique names? Right now, all the images from each conversion begin with the
            name "image001.gif" and progress in numerical sequence. I'd like a way to
            replace "image" with a unique string that is related to the content of the
            original paper. I can use the NoteTab Replace Text feature to change all the
            hyperlinks within the source HTML. Do you know of a way to rename batches of
            files in Windows Explorer folders? The renaming feature for batches of files
            in Explorer introduces parentheses in the names. I don't think that would be
            acceptable to the file naming conventions on the website.

            Hugo, the use of separate folders for the image files of each paper is a good
            fall-back solution if NoteTab doesn't already have a renaming function for
            batches of files. I seem to remember that NoteTab has a way to manipulate
            file names -- maybe via the use of regular expressions???

            Thanks again for the great help.

            Ray Shapp
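A minimal sketch of the rename-and-relink step described above, written in Python rather than as a NoteTab clip. Nothing here comes from the thread itself: the function names, the example prefix, and the file and folder names in the usage note are hypothetical, and the pattern assumes Word's default naming (`image001.gif`, `image002.jpg`, `image003.wmz`, and so on).

```python
import re
from pathlib import Path

# Assumed naming scheme: "image" + digits + a known image extension.
WORD_IMAGE = re.compile(r"\bimage(\d+\.(?:gif|jpe?g|png|wmz))\b", re.IGNORECASE)


def new_name(old_name, prefix):
    """Map 'image001.gif' to '<prefix>001.gif'; return None for other names."""
    match = WORD_IMAGE.fullmatch(old_name)
    return prefix + match.group(1) if match else None


def rewrite_links(html_text, prefix):
    """Replace every Word-style image name in the HTML with the prefixed one."""
    return WORD_IMAGE.sub(lambda m: prefix + m.group(1), html_text)


def rename_paper(html_path, image_dir, prefix):
    """Rename the image files on disk, then update the links in the HTML file."""
    for path in sorted(Path(image_dir).iterdir()):
        target = new_name(path.name, prefix)
        if target is not None:
            path.rename(path.with_name(target))
    html_file = Path(html_path)
    # latin-1 maps every byte to one character, so this round trip leaves
    # all bytes other than the rewritten (ASCII) file names untouched.
    text = html_file.read_bytes().decode("latin-1")
    html_file.write_bytes(rewrite_links(text, prefix).encode("latin-1"))
```

A call such as `rename_paper("paper.htm", "paper_files", "shapp2003_")` would rename the files and fix the links in one pass. The prefix should stick to plain ASCII letters, digits, and underscores so that the names stay acceptable to the web server.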
          • John Zeman
            Message 5 of 18 , Nov 7, 2003
              Ray, have you considered using the alt tag in your images? That's the
              tag which displays text when you position your mouse pointer over an
              image in a web page. A sample of alt tag usage is below:

              <img src="images/cedar-creek-title1.gif" width="300" height="171"
              alt="Cedar Creek Logo">

              Where
              alt="Cedar Creek Logo"
              is the alt tag.

              If you really want to rename several files instead, I wrote a clip a
              couple of years ago that does mass file renaming, but since switching to
              the Directory Opus file manager last spring I quit using it.

              I probably still have that clip around here somewhere in the cobwebs
              though. :)

              John



            • Alec Burgess
              Message 6 of 18 , Nov 7, 2003
                Greg:

                > WORD2000 users can obtain a plug-in from the Microsoft site, which does
                > the same job. (Search for the file "msohtmf2.exe" (MS Office HTML Filter
                > v2)). Apart from the main filter, it introduces some nice little tricks
                > such as "Copy as HTML" For example, if you find it tricky to create the
                > raw code for intricate patterns of nested tables, then create them in
                > WORD and use the "Copy as HTML" feature to paste them into NoteTab! :-)

                The "Copy as HTML" feature sounds neat. I've got Office 2002.
                I tried searching in Word 2002's Help for "copy as html", hoping it had
                been supplied as standard - no joy.
                Ditto checking Word-Tools-Options-General-Web Options.
                Ditto Google: "copy as html" "Office 2002".
                Ditto d/l'ing and attempting to install w/fingers crossed (You don't have
                Office 2000 - ya, I know :-( )

                Any suggestions? - noting that you said you'd "only seen WORD2002 once" :-)
                I guess "Save as ... filtered HTML", open in browser and then "Copy partial
                source" would work, though.

                Regards ... Alec
              • R Shapp
                Message 7 of 18 , Nov 7, 2003
                  Hi John and Alec,

                  <<have you considered using the alt tag in your images?>>

                  Yes, I always use the alt tag -- Tidy demands it.

                  <<"Save as ... filtered HTML", >>

                  I don't use Word 2000 anymore, but in Word 2002, the Save as... Save as Type
                  window option is "Web Page, Filtered (*.htm; *.html)".

                  <<since switching to the Directory Opus file manager last spring >>

                  That comment got me looking for file renaming utilities, and I settled on
                  Flash Renamer by rl vision ( http://www.rlvision.com/ ).

                  Problem solved!

                  Thanks to all who helped.

                  Ray Shapp
                • John Zeman
                  Message 8 of 18 , Nov 7, 2003
                    Glad you found your answer, Ray.

                    I have to admit I've been kicking myself in the head ever since I
                    posted what I did about using the "alt tag", because there ain't no
                    such animal as an alt tag. I have a bad habit of misusing the HTML
                    term "tag". I know what a tag is, but I tend to use that term
                    where I really shouldn't. Just to set the record straight, the
                    alt "tag" is really an "attribute" of the img element (and of other
                    elements as well).

                    Thanks to all for not taking me to the woodshed for stating things
                    inaccurately before.

                    John
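To make the element/attribute distinction concrete, here is a small Python sketch that lists every `img` element lacking an `alt` attribute, which is the same thing Tidy complains about. It uses only the standard library's `html.parser`; the class and function names are hypothetical and not part of any tool mentioned in this thread.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> element that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html_text):
    """Return the src values of all img elements without an alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html_text)
    finder.close()
    return finder.missing
```

Running `images_missing_alt()` over a page saved from Word gives a quick checklist of images that still need alternative text before the page goes through Tidy.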


                  • Larry Hamilton
                    Message 9 of 18 , Nov 9, 2003
                      Alec Burgess wrote:
                      > Ditto d/l'ing and attempting to install w/fingers crossed (You don't
                      > have Office 2000 - ya, I know :-( )
                      >
                      > Any suggestions? - noting that you said you'd "only seen WORD2002
                      > once" :-) I guess "Save as ... filtered HTML", open in browser and
                      > then "Copy partial source" would work though
                      >
                      > Regards ... Alec

                      I have Word 2002 at work, and hate its efforts at HTML. I have also had
                      others send me web pages created by Word 2002.

                      I will keep the suggestions in mind next time.

                      What I have found works best with pre-existing HTML created by Word 2002 is
                      to open it with Open Office. When it is saved, it cleans out the
                      non-essential stuff. If I recall correctly, it works best if you use Word to
                      save the HTML document as a DOC, then open that in Open Office and save as
                      HTML. Open Office is a big download, but I find it worth it to not have to
                      fiddle with trying to clean up the MS non-standard HTML.

                      Larry Hamilton
                      lmh@...
                      http://notlimah.tripod.com/
                    • Adrian/ Rosemary Worsfold
                      Message 10 of 18 , Nov 9, 2003
                        I see what Larry Hamilton says about Open Office and cleaning up or producing
                        better HTML. Open Office also allows editing and producing PowerPoint files.
                        But it is too awkward and unreliable. How does 602Text compare? To me its
                        HTML is a halfway house, with a long internal style sheet and a lot of classes
                        and repeated styles in every paragraph opening tag:

                        <.p class="Normal" style="text-indent: -39px; margin-left: 39px; font-size:
                        12pt; ">

                        It is hopeless at importing a .doc document generated by Word, especially
                        one with tables (they all are), and the HTML output is unreliable in effect.
                        Humans are the most economical and efficient. But Textshield (an RTF writer
                        that reads .doc, adds images, and saves several ways) didn't do too badly,
                        with no style sheet but repeated styles, except that what should have been
                        ULs became OLs. (Dots added to the tags.)

                        <BODY bgcolor=White>

                        <.p align="center" style="text-indent: 0; margin-left: 0; margin-right: 0">
                        <.FONT FACE="Comic Sans MS" SIZE=+1 COLOR=WindowText>
                        <.span style="background-color: Window">
                        <.font size="3">.<.U>Families.<./P>
                        <.p style="text-indent: 0; margin-left: 0; margin-right: 0">.<./P>
                        <.p style="text-indent: 0; margin-left: 0; margin-right: 0">.<./U>Today it is
                        recognised that families come in all shapes and sizes. It used to be easy to
                        define a family. It meant two parents and 2.4 children, as an average. Now
                        families exist:.<./P>
                        <.p style="text-indent: 0; margin-left: 0; margin-right: 0">.<./P>
                        <.p style="text-indent: 18; margin-left: 0; margin-right: 0">.<.OL>
                        <.LI>With one or two parents.<./LI>.<.LI>
                        Parents who are married or unmarried.<./LI>.<.LI>
                        Parents of two sexes and occasionally one.<./LI>.<.LI>
                        Adults with children and without children.<./LI>.<.LI>
                        Three generations or more in one house.<./LI>.<.LI>
                        Children of different partners with one parent in common.<./LI>.<.LI>
                        Children alternating between parents' houses.<./LI>.<.LI>
                        Some children knowing four parents each and eight grandparents.<./LI>.<.LI>
                        Some children knowing even more!.<./LI>.<.LI>
                        Some children not genetically their parents' children (being adopted and
                        otherwise!).<./LI>
                        <./LI>.<./OL>
                        Was it ever really any different or is this just more recognised? Go around any
                        graveyard and see clues; look at the half-truths revealed and concealed in any
                        family tree exercise (is genealogy deception?)..<./P>


                        Adrian Worsfold

                        http://www.pluralist.co.uk
                      • Mark McLaughlin
                        Message 11 of 18 , Nov 10, 2003
                          Further to this topic folks may want to check out the Demoroniser:

                          http://www.fourmilab.ch/webtools/demoroniser/


                          It's designed to clean up HTML that was written by Micro$oft
                          applications.


                          Cheers


                          Mark McLaughlin
                          ----------------------------------------------------------------
                          Best Color Video Production CD-ROM Website Design
                          mailto:mark@... Ph. 250-744-4111 Fx.
                          www.bestcolorvideo.com/
                          www.BCVnet.com Website Hosting & Server Colocation
                          " We Produce Videos & Internet Websites for YOUR Business "
                          ----------------------------------------------------------------

                        • Greg Chapman
                          Message 12 of 18 , Nov 17, 2003
                            Hi Alec,

                            Sorry! Just realised I never responded....

                            > The copy as HTML feature sounds neat. I've got Office 2002
                            > I tried searching in Word 2002's Help for "copy as html" hoping
                            > it had been supplied as standard - no joy
                            > Ditto checking Word-Tools-Options-General-Web Options
                            > Ditto Google: "copy as html" "Office 2002"
                            > Ditto d/l'ing and attempting to install w/fingers crossed (You don't have
                            > Office 2000 - ya, I know :-( )
                            >
                            > Any suggestions? - noting that you said you'd "only seen WORD2002
                            > once" :-)

                            And the answer is no! :-(

                            A student brought his own laptop to a session I was running. It really was
                            "only seen once"!

                            All I could do is send you a zipped WORD file which includes screen dumps of
                            the various filter windows - to show you what you are missing!

                            Greg
                          • Kathy Jungjohann
                            Message 13 of 18 , Nov 17, 2003
                              Greg,
                              That must be where I'm going wrong as well:
                              Not setting the filter options right ...
                              even choosing filtered html leaves all the junk in ...
                              Please send screen snaps to me as well ...

                              Kathy

                              > > Any suggestions? - noting that you said you'd "only seen WORD2002
                              > > once" :-)
                              >And the answer is no! :-(
                              >All I could do is send you a zipped WORD file which includes screen dumps of
                              >the various filter windows - to show you what you are missing!
                              >
                              >Greg
                            • Greg Chapman
                              Message 14 of 18 , Nov 17, 2003
                                Hi Kathy,

                                > That must be where I going wrong as well:
                                > Not setting the filter options right ...
                                > even choosing filtered html leaves all the junk in ...
                                > Please send screen snaps to me as well ...

                                You have been following the thread and are aware that the HTML Filter
                                (msohtmf2.exe) only works in WORD 2000, not 2002, aren't you?

                                I've uploaded to the link below a file containing text extracted from a
                                couple of pages from the MS Support site, which provide installation and
                                some usage instructions, and some screen dumps showing the window appearance
                                and options screen.

                                Visit:
                                http://www.eastwalton.fsworld.co.uk/HTMLFilterNotes.zip

                                Greg
                              • Kathy Jungjohann
                                Message 15 of 18 , Nov 17, 2003
                                  Yes, have been following thread ... been meaning to jump in ...
                                  It's only with 2002 that I haven't been able to solve the conversion problem.
                                  (Oh, for the good ol' days of a simpler Word conversion ...)
                                  There are a lot of alt characters in the text I need to convert - accented
                                  letters.
                                  Textism used to work until he rewrote the program and it replaced all
                                  accented letters with question marks - no matter how I tried saving the
                                  file!! (Maybe he's rewritten it again since I quit using it, and it is now
                                  more successful ...) http://textism.com/wordcleaner/

                                  Thanks,
                                  Kathy

                                  >You have been following the thread and are aware that the HTML Filter
                                  >(msohtmf2.exe) only works in WORD 2000, not 2002, aren't you?
                                  >
                                  >Visit:
                                  >http://www.eastwalton.fsworld.co.uk/HTMLFilterNotes.zip
                                  >
                                  >Greg
                                • Alec Burgess
                                  Message 16 of 18 , Nov 17, 2003
                                    Greg:

                                    I've found that the built-in Word 2002 option to save as ... web-filtered
                                    and then opening in a browser and using their Copy partial source is
                                    adequate if clumsy.

                                    Just for fun, I created a simple empty 2x2 table, saved it as web-filtered,
                                    opened it in NoteTab and ran Tidy against it. The error report showed just:
                                    >>
                                    line 8 column 1 - Warning: <style> lacks "type" attribute
                                    line 37 column 1 - Warning: <table> lacks "summary" attribute
                                    <<
                                    seems good enough to me :-)
                                    I don't do enough with Word and/or HTML to be seriously bothered.

                                    > All I could do is send you a zipped WORD file which includes screen
                                    > dumps of the various filter windows - to show you what you are
                                    > missing!

                                    Please do.

                                    Regards ... Alec
                                  • Greg Chapman
                                    Message 17 of 18 , Nov 18, 2003
                                      Hi Alec,

                                      > > All I could do is send you a zipped WORD file which includes screen
                                      > > dumps of the various filter windows - to show you what you are
                                      > > missing!
                                      >
                                      > Please do.

                                      Hope the link:
                                      http://www.eastwalton.fsworld.co.uk/HTMLFilterNotes.zip
                                      sufficed!

                                      Greg
                                    • Alec Burgess
                                      Message 18 of 18 , Nov 18, 2003
                                        Thanks Greg.

                                        It looks like the Word 2002 "save as ... Html-filtered" does basically the
                                        same thing (possibly excluding some of the optional removals) but only one
                                        at a time.

                                        I guess only Bill knows why it wasn't included in Office XP.

                                        Regards ... Alec