displaying UTF-8 correctly

  • Tilman Hausherr
    Message 1 of 5, Jan 23, 2008
      There's a new beta version that has an improvement when displaying UTF-8
      pages in the Xenu window:
      http://home.snafu.de/tilman/tmp/xenubeta.zip

      Try these pages
      http://www.mthojgaard.dk/pages/ou-fo_byggeri
      http://www.pickwicktea.com/ru/
      with the old and the new version to see what I mean.

      Xenu only looks at what's in the HTTP header (charset), so your server
      has to be configured correctly. Charset settings in the HTML page are
      not relevant.
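
      To illustrate what "only the HTTP header" means, here is a minimal
      Python sketch (not Xenu's actual code, which is not shown in this
      thread). The function names and the windows-1252 fallback are
      assumptions made for the example.

```python
from email.message import Message


def charset_from_header(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(body, content_type, fallback="windows-1252"):
    """Decode raw page bytes using only the header charset.

    The fallback codepage is an assumption for this sketch; any charset
    declared inside the HTML itself is deliberately ignored.
    """
    charset = charset_from_header(content_type) or fallback
    return body.decode(charset, errors="replace")


if __name__ == "__main__":
    raw = "Привет".encode("utf-8")
    print(decode_body(raw, "text/html; charset=UTF-8"))
```

      A server that sends only "text/html" with no charset parameter
      would fall through to the fallback here, which is why the server
      configuration matters.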

      I'm planning to generalize this improvement at a later time, so that it
      would work with any charset/codepage.

      The .XEN format has changed, so don't save any .XEN file if you're
      planning to reuse the "old" version. Old .XEN files will be read by the
      new version, but won't have the display improvement.

      Tilman
    • Tilman Hausherr
      Message 2 of 5, Jan 23, 2008
        On Wed, 23 Jan 2008 20:05:41 +0100, Tilman Hausherr wrote:

        >I'm planning to generalize this improvement at a later time, so that it
        >would work with any charset/codepage.

        Which I just did. I tested it with aljazeera.net and pravda.ru, and Xenu
        displays Arabic and Russian characters in the window and the report (but
        not in the Properties dialog).

        A later step will be to handle charsets that do not display with the
        default font (Arial), e.g. Chinese and Japanese.

        Tilman


      • Tilman Hausherr
        Message 3 of 5, Feb 2, 2008
          I am now also handling charsets that do not appear in the header, but in
          a meta tag in the page itself. (Note that header settings take higher
          priority.)
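
          The priority rule can be sketched in Python as follows. This is
          only an illustration, not Xenu's code: the helper names, the
          simplified regular expression, the 4096-byte scan limit and the
          windows-1252 default are all assumptions.

```python
import re
from email.message import Message

# Simplified pattern: finds charset=... inside a <meta ...> tag. It covers
# both <meta charset="x"> and the http-equiv form, but is not a full parser.
META_CHARSET_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def header_charset(content_type):
    """Charset from the HTTP Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(body):
    """Charset from a meta tag near the start of the raw page bytes, or None."""
    match = META_CHARSET_RE.search(body[:4096])
    return match.group(1).decode("ascii").lower() if match else None


def pick_charset(content_type, body, default="windows-1252"):
    """The header wins; the meta tag is consulted only if the header has no charset."""
    return header_charset(content_type) or meta_charset(body) or default
```

          For example, a page sent with "text/html; charset=UTF-8" is
          treated as UTF-8 even if its meta tag claims koi8-r.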

          I still don't handle font switching (although I already tested some
          code). Apparently, it isn't needed: I have read that Windows fonts have
          "links" where missing codepages can be found, and that this is handled
          automatically.

          With the current version, I was able to display Chinese, Korean, Arabic,
          Russian, Japanese, Greek and Indian(!) websites.

          I don't know whether Xenu itself will work correctly when it is run in
          these countries.

          Anyway, I uploaded a new beta. Mail me if it doesn't work properly.

          Tilman

          On Thu, 24 Jan 2008 07:42:11 +0100, Tilman Hausherr wrote:

          >A later step will be to handle charsets that do not display with the
          >default (arial) font (e.g. Chinese, Japanese).
        • Tilman Hausherr
          Message 4 of 5, Feb 2, 2008
            If you just downloaded the beta version, you might want to do it again.
            The version I uploaded had a test feature that does not delete its
            files (TGH*.*) in the %temp% directory.
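
            If you ran that build and want to see what was left behind, a small
            Python sketch like the following lists the matching files. It only
            lists them and deletes nothing. The function name is made up for
            this example, and it assumes that every TGH*.* file in the temp
            directory comes from Xenu, which may not hold on your machine.

```python
import glob
import os
import tempfile


def find_leftover_files(directory=None, pattern="TGH*.*"):
    """Return a sorted list of files matching the pattern in the temp directory.

    With no directory given, the user's temp directory is used
    (on Windows this is normally what %temp% points to).
    """
    directory = directory or tempfile.gettempdir()
    return sorted(glob.glob(os.path.join(directory, pattern)))


if __name__ == "__main__":
    for path in find_leftover_files():
        print(path)
```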

            Tilman

            On Sat, 02 Feb 2008 19:59:50 +0100, Tilman Hausherr wrote:

            >Anyway, I uploaded a new beta. Mail me if it doesn't work properly.
          • Tilman Hausherr
            Message 5 of 5, Feb 14, 2008
              If you just downloaded the beta version, you might want to do it again.

              The current version had a nasty bug that would fill 1 KB of memory
              with zeroes. It should have caused a lot of trouble, but apparently all
              it did was prevent the column sorting from working correctly.

              http://home.snafu.de/tilman/tmp/xenubeta.zip

              Tilman

              On Sat, 02 Feb 2008 20:24:42 +0100, Tilman Hausherr wrote:

              >If you just downloaded the beta version, you might want to do it again.
              >The version I uploaded had a test feature to not delete files in the
              >%temp% directory. (TGH*.* files)