Re: [xenu-usergroup] displaying UTF-8 correctly

  • Tilman Hausherr
    Message 1 of 5 , Feb 2, 2008
      I am now also handling charsets that are declared not in the HTTP
      header but in a meta tag in the page itself. (Note that the header
      setting takes priority.)
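
To illustrate the priority rule, here is a minimal sketch in Python. It is not Xenu's actual code: the function name `detect_charset`, the two regular expressions, the 4096-byte scan limit and the `iso-8859-1` fallback are all assumptions made for this example.

```python
import re

# Hypothetical helpers for illustration only; not taken from Xenu.
_CHARSET_IN_HEADER = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)
_CHARSET_IN_META = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def detect_charset(content_type, body, default="iso-8859-1"):
    """Return a lowercase charset name, preferring the HTTP header.

    content_type: value of the Content-Type header, or None if absent.
    body: raw page bytes.
    default: assumed fallback when neither source declares a charset.
    """
    # 1. The HTTP header wins whenever it names a charset.
    if content_type:
        match = _CHARSET_IN_HEADER.search(content_type)
        if match:
            return match.group(1).lower()

    # 2. Otherwise look for a meta tag near the start of the page
    #    (the 4096-byte limit is an arbitrary choice for this sketch).
    match = _CHARSET_IN_META.search(body[:4096])
    if match:
        return match.group(1).decode("ascii").lower()

    # 3. Nothing declared anywhere.
    return default
```

With this sketch, a page served as `text/html; charset=UTF-8` is treated as UTF-8 even if its meta tag says something else, while a page served as plain `text/html` falls back to whatever its meta tag declares.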

      I still don't handle font switching (although I already tested some
      code). Apparently, it isn't needed: I have read that Windows fonts have
      "links" where missing codepages can be found, and that this is handled
      automatically.

      With the current version, I was able to display Chinese, Korean, Arabic,
      Russian, Japanese, Greek and Indian(!) websites.

      I don't know whether Xenu itself will work correctly when run in these
      countries.

      Anyway, I uploaded a new beta. Mail me if it doesn't work properly.

      Tilman

      On Thu, 24 Jan 2008 07:42:11 +0100, Tilman Hausherr wrote:

      >On Wed, 23 Jan 2008 20:05:41 +0100, Tilman Hausherr wrote:
      >
      >>There's a new beta version that has an improvement when displaying UTF-8
      >>pages in the Xenu window:
      >>http://home.snafu.de/tilman/tmp/xenubeta.zip
      >>
      >>Try these pages
      >>http://www.mthojgaard.dk/pages/ou-fo_byggeri
      >>http://www.pickwicktea.com/ru/
      >>with the old and the new version to see what I mean.
      >>
      >>Xenu only looks at what's in the HTTP header (charset), so your server
      >>has to be configured correctly. Charset settings in the HTML page are
      >>not relevant.
      >>
      >>I'm planning to generalize this improvement at a later time, so that it
      >>would work with any charset/codepage.
      >
      >Which I just did. I tested it with aljazeera.net and pravda.ru and Xenu
      >displays Arabic/Russian characters in the window and the report (but not
      >in the Properties dialog).
      >
      >A later step will be to handle charsets that do not display with the
      >default (Arial) font (e.g. Chinese, Japanese).
      >
      >Tilman
      >
      >
      >>
      >>The .XEN format has changed, so don't save any .XEN file if you're
      >>planning to reuse the "old" version. Old .XEN files will be read by the
      >>new version, but won't have the display improvement.
      >>
      >>Tilman
    • Tilman Hausherr
      Message 2 of 5 , Feb 2, 2008
        If you just downloaded the beta version, you might want to do it again.
        The version I uploaded had a test feature enabled that leaves its
        temporary files (TGH*.*) in the %temp% directory instead of deleting
        them.
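
If you ran that build and want to see whether any such files were left behind, something like the following Python sketch could list them. The function name `find_leftover_files` is made up for this example, and it only assumes the files start with `TGH` as described above; it lists matches and deletes nothing.

```python
import glob
import os
import tempfile


def find_leftover_files(temp_dir=None, pattern="TGH*"):
    """List regular files in temp_dir whose names match pattern.

    temp_dir defaults to the system temp directory, which on Windows
    normally follows the TEMP/TMP environment variables.
    Nothing is deleted; the caller decides what to do with the list.
    """
    if temp_dir is None:
        temp_dir = tempfile.gettempdir()
    # Escape the directory part so special characters in it are not
    # treated as wildcards; only the file name pattern is a glob.
    matches = glob.glob(os.path.join(glob.escape(temp_dir), pattern))
    return sorted(path for path in matches if os.path.isfile(path))


if __name__ == "__main__":
    for path in find_leftover_files():
        print(path)
```

Review the printed list before removing anything, since other programs could in principle use the same prefix.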

        Tilman

      • Tilman Hausherr
        Message 3 of 5 , Feb 14, 2008
          If you just downloaded the beta version, you might want to do it again.

          The current version had a nasty bug that would fill 1 KB of memory
          with zeroes. It should have caused a lot of trouble, but apparently
          all it did was prevent column sorting from working correctly.

          http://home.snafu.de/tilman/tmp/xenubeta.zip

          Tilman
