Re: [NH] Tidy weirdness

  • swirus@yahoo.com
    Message 1 of 6, Aug 14, 2001
      --- In ntb-html@y..., Jim Beidle <JBeidle@c...> wrote:
      > Hmmm...the first thing that pops into my mind is that Word 2000 doesn't
      > produce HTML, it produces a MS-specification XML page. What you really need
      > is something to scrub out the XML. There are a couple of XML cleaners on
      > the Notetab library site at http://www.notetab.com/html.htm that you can
      > use. Look at the whole page, not just the XML portion of it.

      I think I've isolated the problem: HTMLTidy does not like the
      arbitrary line breaks Word inserts, which fall in the middle of tags
      and such, and who can blame it? Unfortunately I couldn't join the
      lines, because the documents are very long and Notetab apparently
      couldn't handle such a long paragraph (100,000 characters, with all
      of that useless repeated formatting data). What I did instead was
      download a Microsoft product which strips all of their proprietary
      XML from the HTML. I got it at:

      http://office.microsoft.com/downloads/2000/Msohtmf2.aspx

      With that removed, the code had shrunk to 40,000 characters, which
      was small enough to join, and then HTMLTidy worked its magic.
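For anyone hitting the same Notetab paragraph limit, the line joining can also be done outside the editor. This is a minimal sketch, not the approach used above: it assumes Word only broke lines where whitespace was already allowed, and that the file has no `<pre>`, `<script>` or `<textarea>` content whose line breaks matter, so each break can safely become a single space. The script name, `join_lines` and `main` are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Sequence


def join_lines(html: str) -> str:
    """Collapse every line break (CRLF, CR or LF) into a single space.

    Assumes the breaks sit where whitespace is harmless, i.e. there is no
    <pre>, <script> or <textarea> content whose line breaks are significant.
    """
    return re.sub(r"\r\n|\r|\n", " ", html)


def main(argv: Sequence[str]) -> None:
    # Hypothetical usage: python join_lines.py word_export.htm joined.htm
    if len(argv) != 3:
        print("usage: join_lines.py INPUT.htm OUTPUT.htm")
        return
    src, dst = Path(argv[1]), Path(argv[2])
    # latin-1 maps every byte to one character, so a single-byte or UTF-8
    # file passes through unchanged apart from the joined line breaks.
    text = src.read_text(encoding="latin-1")
    dst.write_text(join_lines(text), encoding="latin-1")


if __name__ == "__main__":
    main(sys.argv)
```

Replacing each break with a space rather than deleting it keeps words on either side of a break from running together.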

      > Of course, you could do what I do and refuse to use Word as a HTML editor
      > ;-) Even Front Page Express :-P does a better job of wysiwyg layout than
      > Word and provides code that's easier to clean. Or just use Notetab
      >
      > I hope this helped a bit, and good luck!

      If I honestly had any choice, I would not be using Word.
      Unfortunately, Frontpage isn't part of my installation. Mind you, if
      I honestly had any choice, I'd be soaking up some rays in the south
      of France right now. Notwithstanding my personal bitterness, thanks
      for your advice, Jim.

      John.
    • swirus@yahoo.com
      Message 2 of 6, Aug 14, 2001
        --- In ntb-html@y..., "Grant" <emerge@p...> wrote:
        > There is a special tidy switch which can be used to clean Word HTML files:
        > word-2000: [yes|no]
        > The best way to get all the switch options from within Notetab is to use a
        > config file.
        >
        > In my xhtml library (available from the library download repository on the
        > Notetab site) there are two tidy related clips
        > -tidy
        > _TidyConfigSetup
        > which help generate a complete tidy config file via a Notetab wizard.
        > The wizard contains all the tidy config options including the 'word-2000: '
        > switch which you can try.
        > The two clips stand alone and can be taken out of the general xhtml library.
        > Included below
        >
        This looks like a much more elegant way of configuring tidy. I have a
        solution to my current problems (see other mail), but I shall download
        your libraries for future use (I don't trust myself to figure out
        where the line breaks go after so many brain-frazzling hours of
        correcting MSHTML(TM)).

        The trouble with tidy, as with so many things in the computer world,
        is that there is a constant battle between power (and HTMLTidy is
        powerful) and complexity. What I like about it is that with the
        default options it generally does a good job. But HTML authoring and
        programming are not my main job, so I simply haven't the time to learn
        the finer points of configuration. It looks like your scripts take the
        edge off this, for which many thanks.
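The config-file approach Grant describes can also be driven from a small script instead of the Notetab wizard. This is a minimal sketch, not Grant's clips: it assumes an HTML Tidy build on the PATH that accepts `-config FILE` and understands the `word-2000` switch (recent tidy-html5 releases do). The option set, the helper names and the `word2000.cfg` file name are illustrative.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# Illustrative option set; "word-2000" is the switch Grant mentions.
TIDY_OPTIONS: Dict[str, str] = {
    "word-2000": "yes",  # strip Word 2000 "Save as Web Page" clutter
    "clean": "yes",      # replace presentational markup with CSS
    "wrap": "0",         # 0 disables line wrapping in the output
}


def config_text(options: Dict[str, str]) -> str:
    """Render options in tidy's one-per-line 'name: value' config syntax."""
    return "".join(f"{name}: {value}\n" for name, value in options.items())


def tidy_command(config: Path, source: Path) -> List[str]:
    """Build the command line; tidy writes the cleaned document to stdout."""
    return ["tidy", "-config", str(config), str(source)]


def run_tidy(source: Path, config: Path = Path("word2000.cfg")) -> str:
    """Write the config file, run tidy on `source` and return its output."""
    config.write_text(config_text(TIDY_OPTIONS), encoding="ascii")
    result = subprocess.run(tidy_command(config, source),
                            capture_output=True, text=True)
    # tidy exits with 0 (no problems), 1 (warnings only) or 2 (errors).
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout
```

Calling `run_tidy(Path("report.htm"))` on a hypothetical Word export would return the cleaned markup. Keeping the options in one dictionary means adding another switch from the wizard's list is a one-line change.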

        Cheers,
        John
      • Greg Chapman
        Message 3 of 6, Aug 14, 2001
          Hi Jim and Swirus

          > Hmmm...the first thing that pops into my mind is that Word 2000 doesn't
          > produce HTML, it produces a MS-specification XML page. What you really need
          > is something to scrub out the XML. There are a couple of XML cleaners on
          > the Notetab library site at http://www.notetab.com/html.htm that you can
          > use. Look at the whole page, not just the XML portion of it.

          There's also the official HTML filter from Microsoft. I picked up my copy
          from a magazine cover disk, but try a search for the file "msohtmlf2.exe".
          This is v2 of the Microsoft Office HTML filter.

          It does a number of things, including placing an "Export to compact
          HTML" button on the standard toolbar and adding export options to the
          File menu, one of which creates a CSS file from your document. HTML
          TIDY will still find some garbage to correct, but it's a massive
          improvement on the standard output.

          Greg
        • Bob Janes
          Message 4 of 6, Aug 14, 2001
            > There's also the official HTML filter from Microsoft. I
            > picked up my copy from a magazine cover disk, but try a
            > search for the file "msohtmlf2.exe". This is v2 of the
            > Microsoft Office HTML filter.

            It's at http://office.microsoft.com/downloads/2000/Msohtmf2.aspx

            Best wishes

            Bob

            --

            Bob Janes
            Organisational Consultant
            +44 (7850) 150133
            PO Box 211 Welwyn AL6 0EX UK
            mailto:bob.janes@...
            www.webster-and-janes.co.uk