Re: [NH] Tidy weirdness
--- In ntb-html@y..., Jim Beidle <JBeidle@c...> wrote:
> Hmmm...the first thing that pops into my mind is that Word 2000 doesn't
> produce HTML, it produces a MS-specification XML page. What you really need
> is something to scrub out the XML. There are a couple of XML cleaners on
> the Notetab library site at http://www.notetab.com/html.htm that you can
> use. Look at the whole page, not just the XML portion of it.
I think I've isolated the problem as being that HTML Tidy does not
like the arbitrary line breaks used by Word, which fall in the middle
of tags and such, and who can blame it? Unfortunately I couldn't
join the lines because the documents are very long, and apparently
Notetab couldn't cope with such a long paragraph (100,000 characters, with
all of that useless repeated formatting data). What I did was download
a Microsoft product which strips all of their proprietary XML from
the HTML. I got it at:
With that removed, the code had fallen to 40,000 characters, which was
small enough to join, and then HTML Tidy worked its magic.
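For anyone who would rather script the two steps described above (join Word's wrapped lines, strip the Office-specific markup) than rely on a separate download, here is a minimal Python sketch. It is not the Microsoft filter and not what Tidy's word-2000 option does internally; the patterns are assumptions about typical Word 2000 output (XML data islands, conditional comments, o:/v:/w: namespace tags, mso- style declarations), and `clean_word_html` is a hypothetical helper name.

```python
import re

# Patterns for typical Word 2000 debris. These are assumptions about Word's
# output, not a complete or official list.
XML_ISLANDS = re.compile(r"<xml\b[^>]*>.*?</xml>", re.IGNORECASE | re.DOTALL)
CONDITIONAL_COMMENTS = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->",
                                  re.IGNORECASE | re.DOTALL)
OFFICE_TAGS = re.compile(r"</?[ovw]:[^>]*>", re.IGNORECASE)  # <o:p>, <v:shape>, ...
STYLE_ATTR = re.compile(r"""\s+style=(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def _drop_mso(match: re.Match) -> str:
    """Keep only the non-mso- declarations of one style attribute."""
    quote, body = match.group(1), match.group(2)
    kept = [d.strip() for d in body.split(";")
            if d.strip() and not d.strip().lower().startswith("mso-")]
    # Drop the attribute entirely if nothing useful is left.
    return f" style={quote}{';'.join(kept)}{quote}" if kept else ""


def clean_word_html(html: str) -> str:
    """Join Word's wrapped lines, then strip Office-specific markup."""
    # Assumes Word wraps only at whitespace, so a single space rejoins
    # tags that were split across lines (this also flattens <pre> blocks).
    text = " ".join(line.strip() for line in html.splitlines())
    text = XML_ISLANDS.sub("", text)
    text = CONDITIONAL_COMMENTS.sub("", text)
    text = OFFICE_TAGS.sub("", text)
    text = STYLE_ATTR.sub(_drop_mso, text)
    return text.strip()
```

Python has no trouble with a 100,000-character "paragraph", so the order can be join first, strip second; the result would then still go through Tidy as before.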
> Of course, you could do what I do and refuse to use Word as a HTML editor
> ;-) Even Front Page Express :-P does a better job of wysiwyg layout than
> Word and provides code that's easier to clean. Or just use Notetab
If I honestly had any choice, I would not be using Word.
> I hope this helped a bit, and good luck!
Unfortunately, Frontpage isn't part of my installation. Mind you, if
I honestly had any choice, I'd be soaking up some rays in the south
of France right now. Notwithstanding my personal bitterness, thanks
for your advice, Jim.
--- In ntb-html@y..., "Grant" <emerge@p...> wrote:
> There is a special tidy switch which can be used to clean Word HTML files:
> word-2000: [yes|no]
> The best way to get all the switch options from within Notetab is to use a
> config file.
> In my xhtml library (available from the library download repository on the
> Notetab site) there are two tidy-related clips
> which help generate a complete tidy config file via a Notetab wizard.
> The wizard contains all the tidy config options, including the 'word-2000: '
> switch, which you can try.
> The two clips stand alone and can be taken out of the general xhtml library.
> Included below
This looks like a much more elegant way of configuring Tidy. I have a
solution to my current problems (see other mail), but I shall download
your libraries for future use (I don't trust myself to figure out
where the line breaks go after so many brain-frazzling hours of
The trouble with Tidy, as with so many things in the computer world,
is that there is a constant battle between power (and HTML Tidy is
powerful) and complexity. What I like about it is that with the
default options it generally does a good job. But HTML authoring and
programming are not my main job, so I simply haven't the time to learn
the finer points of configuration. It looks like your scripts take the
edge off this, for which many thanks.
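The config-file approach Grant describes can be sketched briefly. A Tidy config file is plain text with one `option: value` pair per line. The Python below uses a hypothetical `tidy_config` helper (it is not Grant's Notetab clips or wizard) to produce one that switches on `word-2000`; `clean` and `wrap` are real Tidy options, but including them is an assumption about what a Word cleanup profile might want.

```python
def tidy_config(options: dict) -> str:
    """Render a dict as HTML Tidy config text, one 'name: value' line per option."""
    lines = []
    for name, value in options.items():
        # Tidy spells boolean options as yes/no.
        if isinstance(value, bool):
            value = "yes" if value else "no"
        lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


# An assumed profile for Word 2000 output; adjust to taste.
WORD_CLEANUP = {
    "word-2000": True,  # the switch Grant mentions: strip Word 2000 surplus markup
    "clean": True,      # replace presentational tags such as <font> with CSS
    "wrap": 0,          # 0 turns off Tidy's own line wrapping
}

if __name__ == "__main__":
    # Print the config; redirect it to a file such as word.cfg.
    print(tidy_config(WORD_CLEANUP), end="")
```

Saved as, say, word.cfg, it would typically be passed to Tidy with `tidy -config word.cfg page.htm`.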
Hi Jim and Swirus
> Hmmm...the first thing that pops into my mind is that Word 2000 doesn't
> produce HTML, it produces a MS-specification XML page. What you really need
> is something to scrub out the XML. There are a couple of XML cleaners on
> the Notetab library site at http://www.notetab.com/html.htm that you can
> use. Look at the whole page, not just the XML portion of it.
There's also the official HTML filter from Microsoft. I picked up my copy
from a magazine cover disk, but try a search for the file "msohtmlf2.exe".
This is v2 of the Microsoft Office HTML filter.
It does a number of things, including placing an "Export to compact HTML"
button on the standard toolbar and adding export options to the File
menu, including one that creates a CSS file from your document. HTML Tidy
will still find some garbage to correct, but it's a massive improvement on
the standard output.
> There's also the official HTML filter from Microsoft. I
> picked up my copy from a magazine cover disk, but try a
> search for the file "msohtmlf2.exe". This is v2 of the
> Microsoft Office HTML filter.
It's at http://office.microsoft.com/downloads/2000/Msohtmf2.aspx
+44 (7850) 150133
PO Box 211 Welwyn AL6 0EX UK