Re: [NH] Clipbook / program to convert Word junk -> HTML

  • Rudolf Horbas
    Message 1 of 9, Jan 3, 2007
      > 2. Use HTMLTidy to convert it from UTF-8 to another encoding

      That's what I just wanted to post.

      > Unfortunately, I'm writing this on the road right now (extended New Year
      > holiday) and I don't have all my references at hand to give you the
      > nitty gritty details.

      Here I can help; my New Year holiday ended yesterday:
      http://tidy.sourceforge.net/docs/quickref.html#word-2000

      Instead of making a new config file for NoteTab, I'd suggest using
      TidyGUI (no longer maintained, but functional):
      http://perso.orange.fr/ablavier/TidyGUI/index.html

      The tab "cleanup" has the option "Source document is from MS Word 2000".

      (You could alternatively just use TidyGUI to save a Tidy.cfg for NoteTab.)

      HTH, and Happy New Year to all!
      Rudi
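A minimal sketch of the "convert it from UTF-8 to another encoding" step quoted above, written in Python rather than Tidy. It is only an illustration of the idea, not what HTMLTidy does internally: the function name `utf8_to_ascii_html` is hypothetical, and it assumes the input really is valid UTF-8. Every character that plain ASCII cannot hold is written as a numeric character reference, which any browser can display regardless of the page encoding.

```python
def utf8_to_ascii_html(raw: bytes) -> str:
    """Decode UTF-8 bytes and return ASCII-only text for use in HTML.

    Characters outside ASCII (curly quotes, accented letters, ...) are
    replaced by numeric character references such as &#8220;.
    """
    text = raw.decode("utf-8")
    # "xmlcharrefreplace" is a standard Python codec error handler that
    # emits &#NNNN; for every character the target encoding lacks.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # Sample bytes standing in for text pasted out of Word and saved as UTF-8.
    sample = "\u201cGood and ill\u201d Aragorn to \u00c9omer".encode("utf-8")
    print(utf8_to_ascii_html(sample))
```

Note that this does nothing about the Word-specific markup itself; that is still a job for Tidy's Word 2000 cleanup option.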
    • Julie
      Message 2 of 9, Jan 3, 2007
        At 1/3/2007 10:29 AM, Rudolf Horbas wrote:

        >Here I can help; my New Year holiday ended yesterday:
        >http://tidy.sourceforge.net/docs/quickref.html#word-2000
        >
        >Instead of making a new config file for NoteTab, I'd suggest using
        >TidyGUI (no longer maintained, but functional):
        >http://perso.orange.fr/ablavier/TidyGUI/index.html
        >
        >The tab "cleanup" has the option "Source document is from MS Word 2000".

        Are characters like this font dependent? I've tried all the encoding
        options, as well as checking the "source document is from MS Word
        2000" tab, but none of the tries has correctly converted the characters.

        from the http://textism.com/wordcleaner/ site my text from my other
        post translates as

        “Good and ill have not changed since yesteryear; nor are they
        one thing among Elves and Dwarves and another thing among Men. It is
        man’s part to discern them as much in the Golden Wood as in his
        own house.” Aragorn to Éomer

        Which looks fine on preview in the browser. Any helpful hints?
      • loro
        Message 3 of 9, Jan 3, 2007
          Julie wrote:
          >from the http://textism.com/wordcleaner/ site my text from my other
          >post translates as
          >
          >“Good and ill have not changed since yesteryear; nor are they
          >one thing among Elves and Dwarves and another thing among Men. It is
          >man’s part to discern them as much in the Golden Wood as in his
          >own house.” Aragorn to Éomer
          >
          >Which looks fine on preview in the browser. Any helpful hints?

          It's Word's curly quotes that give you trouble. They are non-standard. You
          can turn them off in Word. I don't know what happens if you try to turn
          them off on a document that already has them. Maybe if you turn them off
          and then paste the text into a new document?

          I think it's this:
          Tools | AutoCorrect, then you have them on both the AutoFormat and
          AutoFormat As You Type tabs, "Replace straight quotes with smart quotes".

          Too bad about WordCleaner. It used to be free. :-(

          Lotta
        • Julie
          Message 4 of 9, Jan 3, 2007
            Hey Lotta

            >It's Word's curly quotes that give you trouble. They are non-standard. You
            >can turn them off in Word. I don't know what happens if you try to turn
            >them off on a document that already has them. Maybe if you turn them off
            >and then paste the text into a new document?

            It's also accented letters like Éomer. Many of
            these are articles that have been posted in blogs
            that I've collected... I can't believe people
            posted that mess! A friend wants to repost them
            cleaned up, so I thought I'd see if there was an easy way to do this. :-)

            >Too bad about WordCleaner. It used to be free. :-(

            The site gives me six uses a day. The potential
            project isn't a rush at least... doesn't matter
            how long it takes, but I have a substantial
            number of articles to convert. Could take a while. LOL

            Julie


            --
            No virus found in this outgoing message.
            Checked by AVG Free Edition.
            Version: 7.5.432 / Virus Database: 268.16.4/615 - Release Date: 1/3/2007 1:34 PM
          • loro
            Message 5 of 9, Jan 3, 2007
              I wrote:
              >It's Word's curly quotes that give you trouble.

              You can do it with Notetab too. Notetab can display the curly quotes and
              the Replace thingie recognizes them, so you can select one of each kind and
              do a "replace all" with the entity for the corresponding legit curly quote.

              Lotta
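The "replace all" idea above can also be scripted. The following is a rough Python sketch, not a NoteTab clip: the names `QUOTE_ENTITIES` and `replace_curly_quotes` are made up for illustration. It assumes the text has already been read into a string, and it covers both the proper Unicode quote characters and the code points 0x91 to 0x94 that show up when Windows-1252 text was read as Latin-1.

```python
# Map each curly quote to the named HTML entity for the same character.
QUOTE_ENTITIES = {
    "\u2018": "&lsquo;",  # left single quote
    "\u2019": "&rsquo;",  # right single quote / apostrophe
    "\u201c": "&ldquo;",  # left double quote
    "\u201d": "&rdquo;",  # right double quote
    # Same quotes as they appear when Windows-1252 bytes were decoded
    # as Latin-1 (assumption: the file was misread this way).
    "\x91": "&lsquo;",
    "\x92": "&rsquo;",
    "\x93": "&ldquo;",
    "\x94": "&rdquo;",
}

# str.maketrans accepts a dict of single characters -> replacement strings.
_QUOTE_TABLE = str.maketrans(QUOTE_ENTITIES)


def replace_curly_quotes(text: str) -> str:
    """Return text with every curly quote replaced by its HTML entity."""
    return text.translate(_QUOTE_TABLE)


if __name__ == "__main__":
    print(replace_curly_quotes("\u201cIt is man\u2019s part\u201d"))
```

Doing it in one pass with a translation table avoids running four separate search-and-replace operations, but the result should be the same as doing them by hand.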
            • loro
              Message 6 of 9, Jan 3, 2007
                Julie wrote:
                > >It's Word's curly quotes that give you trouble.

                >It's also accented letters like Éomer.

                Ah. The first example came through all jumbled so I went by the second one.

                >The site gives me six uses a day. The potential
                >project isn't a rush at least... doesn't matter
                >how long it takes, but I have a substantial
                >number of articles to convert. Could take a while. LOL

                You could use a proxy. ;-o)

                Lotta
              • Julie
                Message 7 of 9, Jan 3, 2007
                  Hey Lotta -

                  >You could use a proxy. ;-o)

                  The thought has crossed my mind. <G>

                  Julie


                • bruce.somers@web.de
                  Message 8 of 9, Jan 4, 2007
                    Julie <gleits@...> wrote:

                    I can't believe people
                    posted that mess! A friend wants to repost them
                    cleaned up, so I thought I'd see if there was an easy way to do this. :-)

                    No, you needn't believe that. It's much more likely that some component (program) used by the poster of the blog entry has replaced what it considered to be non-standard characters (curly quotes, accented characters, etc.) with their corresponding "escape-codes", because many viewers will not have the character sets needed to display them. Many systems recognize only the extremely provincial and badly limited ASCII character set.

                    It's probably the blog software that is not able to replace those escape-codes with the corresponding characters.

                    Bruce
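If the collected articles really do contain such escape codes (HTML character references), turning them back into the characters they stand for is a one-liner in Python. This is only a sketch under that assumption; the wrapper name `decode_escape_codes` is hypothetical, while `html.unescape` is part of the Python standard library.

```python
import html


def decode_escape_codes(text: str) -> str:
    """Replace HTML character references with the characters they name.

    Handles both numeric references (&#8220;) and named ones (&Eacute;).
    """
    return html.unescape(text)


if __name__ == "__main__":
    escaped = "&#8220;Good and ill&#8221; Aragorn to &Eacute;omer"
    print(decode_escape_codes(escaped))
```

The decoded text then has to be saved in an encoding that can hold those characters, such as UTF-8, or the same problem comes straight back.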