
Re: [xenu-usergroup] displaying UTF-8 correctly

  • Tilman Hausherr
    Message 1 of 5 , Jan 23, 2008
      On Wed, 23 Jan 2008 20:05:41 +0100, Tilman Hausherr wrote:

      >There's a new beta version that has an improvement when displaying UTF-8
      >pages in the Xenu window:
      >http://home.snafu.de/tilman/tmp/xenubeta.zip
      >
      >Try these pages
      >http://www.mthojgaard.dk/pages/ou-fo_byggeri
      >http://www.pickwicktea.com/ru/
      >with the old and the new version to see what I mean.
      >
      >Xenu only looks at what's in the HTTP header (charset), so your server
      >has to be configured correctly. Charset settings in the HTML page are
      >not relevant.
      >
      >I'm planning to generalize this improvement at a later time, so that it
      >would work with any charset/codepage.

      Which I just did. I tested it with aljazeera.net and pravda.ru, and Xenu
      displays Arabic and Russian characters in the window and the report (but
      not in the Properties dialog).
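
To make the header-based approach concrete, here is a minimal Python sketch. It is not Xenu's code: the function names are hypothetical, and the ISO-8859-1 fallback for a missing or unknown charset is an assumption of this sketch. It reads the charset parameter from a Content-Type header value and decodes the page bytes with whatever codepage is named there.

```python
def charset_from_content_type(content_type, default="iso-8859-1"):
    """Return the charset named in a Content-Type header value.

    The default is an assumption of this sketch, used when the
    header carries no charset parameter.
    """
    # Parameters follow the media type, separated by semicolons.
    for part in content_type.split(";")[1:]:
        name, _, value = part.partition("=")
        if name.strip().lower() == "charset":
            # Strip optional quotes around the value.
            return value.strip().strip("\"'").lower() or default
    return default


def decode_body(body, content_type):
    """Decode raw page bytes using the charset from the header."""
    charset = charset_from_content_type(content_type)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back rather than fail.
        return body.decode("iso-8859-1", errors="replace")
```

For example, a Russian page served as `text/html; charset=windows-1251` is decoded with that codepage, while a page served without any charset parameter falls back to the default.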

      A later step will be to handle charsets that do not display with the
      default (Arial) font (e.g. Chinese, Japanese).

      Tilman


      >
      >The .XEN format has changed, so don't save any .XEN file if you're
      >planning to reuse the "old" version. Old .XEN files will be read by the
      >new version, but won't have the display improvement.
      >
      >Tilman
      >
      >Yahoo! Groups Links
    • Tilman Hausherr
      Message 2 of 5 , Feb 2 10:59 AM
        I am now also handling charsets that do not appear in the header, but in
        a meta tag in the page itself. (Note that header settings take higher
        priority.)
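
The priority rule can be sketched in a few lines of Python. Again, this is not Xenu's code: the function name, the regular expression, the 4096-byte scan window and the ISO-8859-1 default are all assumptions made for illustration. A charset from the HTTP header wins; the meta tag is consulted only when the header names none.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def pick_charset(header_charset, html_bytes, default="iso-8859-1"):
    """Choose a charset: HTTP header first, then meta tag, then default."""
    if header_charset:
        # Header settings take higher priority than the page itself.
        return header_charset.lower()
    # Only scan the start of the page (window size is an assumption).
    match = META_CHARSET.search(html_bytes[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

With this sketch, a page whose header says UTF-8 but whose meta tag says windows-1251 is treated as UTF-8; the meta tag only matters when the server sends no charset at all.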

        I still don't handle font switching (although I already tested some
        code). Apparently, it isn't needed: I have read that Windows fonts have
        "links" where missing codepages can be found, and that this is handled
        automatically.

        With the current version, I was able to display Chinese, Korean, Arabic,
        Russian, Japanese, Greek and Indian(!) websites.

        I don't know whether Xenu itself will work correctly in these countries.

        Anyway, I uploaded a new beta. Mail me if it doesn't work properly.

        Tilman

      • Tilman Hausherr
        Message 3 of 5 , Feb 2 11:24 AM
          If you just downloaded the beta version, you might want to do it again.
          The version I uploaded had a test feature enabled that left its
          temporary files (TGH*.*) in the %temp% directory instead of deleting them.

          Tilman

        • Tilman Hausherr
          Message 4 of 5 , Feb 14 2:11 AM
            If you just downloaded the beta version, you might want to do it again.

            The current version had a nasty bug that would fill 1K of memory
            with zeroes. It should have caused a lot of trouble, but apparently all
            it did was prevent the column sorting from working correctly.

            http://home.snafu.de/tilman/tmp/xenubeta.zip

            Tilman
