
Unicode support on Win32?

  • Andrej Borsenkow
    Message 1 of 13, Sep 29, 2000
      Recently I tried to edit an exported registry key with Vim. It was exported
      under Win2k. To my surprise, I saw only binary data instead of text ... until
      I realized that Win2k exports the registry in Unicode (I have no idea if that
      can be changed).

      What is the current state of Unicode support in Vim? As I understand it, Vim 6
      includes Unicode support. Is it possible to find binaries somewhere?

      TIA

      -andrej

      Have a nice DOS!
    • Bram Moolenaar
      Message 2 of 13, Sep 29, 2000
        Andrej Borsenkow wrote:

        > Recently I tried to edit an exported registry key with Vim. It was exported
        > under Win2k. To my surprise, I saw only binary data instead of text ... until
        > I realized that Win2k exports the registry in Unicode (I have no idea if that
        > can be changed).
        >
        > What is the current state of Unicode support in Vim? As I understand it, Vim 6
        > includes Unicode support. Is it possible to find binaries somewhere?

        Vim 6.0 supports Unicode. Internally it uses UTF-8. You need to set the
        'charcode' and 'filecharcode' options to edit a Unicode file (I would guess
        Windows uses a 16 bit coding, thus use "ucs-2" for 'filecharcode').

        I haven't tried this on Windows though. You need at least a Unicode font.
        You might need to do some experimenting to make it work. If you (or someone
        else) do find out how to make it work, adding some text to the docs would be
        good. Then others can start using it too.

        --
        hundred-and-one symptoms of being an internet addict:
        42. Your virtual girlfriend finds a new net sweetheart with a larger bandwidth.

        /// Bram Moolenaar Bram@... http://www.moolenaar.net \\\
        \\\ Vim: http://www.vim.org ICCF Holland: http://iccf-holland.org ///
      • Andrej Borsenkow
        Message 3 of 13, Oct 2, 2000
          >
          > I haven't tried this on Windows though. You need at least a Unicode font.
          > You might need to do some experimenting to make it work. If you (or someone
          > else) do find out how to make it work, adding some text to the docs would be
          > good. Then others can start using it too.
          >

          I would be happy to do it, but I cannot compile Vim under Win32, and I cannot
          find version 6 on the FTP site. Is there any place to download it from?

          TIA

          -andrej
        • Andrej Borsenkow
          Message 4 of 13, Nov 21, 2000
            >
            >
            > Andrej Borsenkow wrote:
            >
            > > Recently I tried to edit an exported registry key with Vim. It was exported
            > > under Win2k. To my surprise, I saw only binary data instead of text ... until
            > > I realized that Win2k exports the registry in Unicode (I have no idea if that
            > > can be changed).
            > >
            > > What is the current state of Unicode support in Vim? As I understand it, Vim 6
            > > includes Unicode support. Is it possible to find binaries somewhere?
            >
            > Vim 6.0 supports Unicode. Internally it uses UTF-8. You need to set the
            > 'charcode' and 'filecharcode' options to edit a Unicode file (I would guess
            > Windows uses a 16 bit coding, thus use "ucs-2" for 'filecharcode').
            >
            > I haven't tried this on Windows though. You need at least a Unicode font.
            > You might need to do some experimenting to make it work. If you (or someone
            > else) do find out how to make it work, adding some text to the docs would be
            > good. Then others can start using it too.
            >

            Experimenting with vim6.0n

            Vim likes monospaced fonts. Unfortunately, the only two Unicode fonts I have
            here on a stock Win2k SP1 + IE5.5 SP1 install are "Lucida Sans Unicode" and
            "Arial Unicode MS". Both are proportional. Actually, Lucida seems to include
            only the subset known as WCS, but that should be enough for my purposes.

            Setting filecharcode=ucs-2 and charcode=latin-1 gives me something like this
            (for the same registry dump under Win2k):

            y?W^@i^@n^@d^@o^@w^@s^@ ^@R^@e^@g^@i^@s^@t^@r^@y^@ ...

            That really looks like LSB-first (little-endian) UCS-2 encoding. Could the
            first word be some kind of text direction (left-to-right) indicator?

            Setting charcode to ucs-2 results in an undecipherable display. That may be
            correct, because I cannot select a UCS-2 font; still, the interaction between
            filecharcode, charcode and the font puzzles me a bit :-)

            Does anybody know of a monospaced Unicode font? Or is it possible to use
            proportional fonts in Vim?

            -andrej
          • Ričardas Čepas
            Message 5 of 13, Nov 21, 2000
              On Tue Nov 21 11:27:40 2000 +0300 Andrej Borsenkow wrote:

              ...
              >
              > Setting charcode to ucs-2 results in an undecipherable display. That may be
              > correct, because I cannot select a UCS-2 font; still, the interaction between
              > filecharcode, charcode and the font puzzles me a bit :-)
              >
              > Does anybody know of a monospaced Unicode font? Or is it possible to use
              > proportional fonts in Vim?
              >
              Is something wrong with Courier New? It doesn't cover as many characters
              as Arial Unicode, but it is a monospaced Unicode font and has Cyrillic
              characters, BTW.
              --
              ☻ Ričardas Čepas ☺
              ~~
              ~
            • Andrej Borsenkow
              Message 6 of 13, Nov 21, 2000
                >
                > On Tue Nov 21 11:27:40 2000 +0300 Andrej Borsenkow wrote:
                >
                > ...
                > >
                > > Setting charcode to ucs-2 results in an undecipherable display. That may be
                > > correct, because I cannot select a UCS-2 font; still, the interaction between
                > > filecharcode, charcode and the font puzzles me a bit :-)
                > >
                > > Does anybody know of a monospaced Unicode font? Or is it possible to use
                > > proportional fonts in Vim?
                > >
                > Is something wrong with Courier New? It doesn't cover as many characters
                > as Arial Unicode, but it is a monospaced Unicode font and has Cyrillic
                > characters, BTW.

                Yes. It does not work :-) At least, setting

                filecharcode=ucs-2
                charcode=ucs-2
                selecting font 'Courier New'
                opening Unicode text file

                results in a screenful of squares with blanks. But it may not depend on the font.

                -andrej
              • Bram Moolenaar
                Message 7 of 13, Nov 21, 2000
                  Andrej Borsenkow wrote:

                  > Experimenting with vim6.0n
                  >
                  > Vim likes monospaced fonts. Unfortunately, the only two Unicode fonts I have
                  > here on a stock Win2k SP1 + IE5.5 SP1 install are "Lucida Sans Unicode" and
                  > "Arial Unicode MS". Both are proportional. Actually, Lucida seems to include
                  > only the subset known as WCS, but that should be enough for my purposes.

                  Hmm, that won't work then. There is still the idea of using a proportional
                  font but positioning each character in a mono-spaced cell. Then at least you
                  can use any font (although it will look ugly). Hopefully someone will make a
                  patch for this, because many people run into this problem.

                  > Setting filecharcode=ucs-2 and charcode=latin-1 gives me something like this
                  > (for the same registry dump under Win2k):
                  >
                  > y?W^@i^@n^@d^@o^@w^@s^@ ^@R^@e^@g^@i^@s^@t^@r^@y^@ ...
                  >
                  > That really looks like LSB-first (little-endian) UCS-2 encoding. Could the
                  > first word be some kind of text direction (left-to-right) indicator?

                  Looks like no conversion was done. Try this instead:

                  :set charcode=latin-1
                  :e ++cc=ucs-2l your_file

                  This uses latin-1 internally, and forces the file to be read with
                  'filecharcode' set to "ucs-2l". There are other ways to do this, but this is
                  the simplest.

                  The first Unicode character is a BOM (Byte Order Mark). This isn't treated
                  specially right now.
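To illustrate what the registry dump above contains, here is a minimal Python sketch (not Vim code). It assumes the export begins with the little-endian byte order mark FF FE followed by UCS-2LE text; the sample string and the helper `strip_bom_and_decode` are hypothetical. Read as latin-1 with no conversion, FF FE appears as "ÿþ" (presumably the "y?" in the dump) and every ASCII letter is followed by a NUL byte, which Vim shows as ^@.

```python
import codecs

# Hypothetical sample: the start of a Win2k registry export, stored as
# little-endian UCS-2 with a leading byte order mark (BOM).
raw = codecs.BOM_UTF16_LE + "Windows Registry".encode("utf-16-le")

# Reading the bytes as latin-1 (no conversion): the BOM becomes two odd
# characters and each ASCII letter is followed by a NUL byte (^@ in Vim).
as_latin1 = raw.decode("latin-1")
print(repr(as_latin1[:6]))  # 'ÿþW\x00i\x00'


def strip_bom_and_decode(data: bytes) -> str:
    """Pick the byte order announced by a UTF-16 BOM and drop the BOM itself."""
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE: little-endian
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF: big-endian
        return data[2:].decode("utf-16-be")
    raise ValueError("no UTF-16 byte order mark found")


print(strip_bom_and_decode(raw))  # Windows Registry
```

Dropping the BOM during decoding also keeps it out of the editable text, which is the behaviour asked for later in the thread.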

                  > Setting charcode to ucs-2 results in an undecipherable display. That may be
                  > correct, because I cannot select a UCS-2 font; still, the interaction between
                  > filecharcode, charcode and the font puzzles me a bit :-)

                  In most cases you would work with 'charcode' set to ucs-2 or utf-8 (whichever
                  most of your files are encoded in) and leave 'filecharcode' empty. Then you
                  set 'filecharcodes' so that most formats are detected automatically.

                  Your font must always match 'charcode'. 'charcode' is the option that tells
                  Vim the encoding of all internal items. You shouldn't change this much,
                  because the encoding of registers, viminfo etc. would become messed up.

                  > Does anybody know of a monospaced Unicode font? Or is it possible to use
                  > proportional fonts in Vim?

                  You could try this page:

                  http://www.microsoft.com/typography/fontpack/default.htm

                  Installing an international version of Internet Explorer might also help,
                  since it comes with a number of fonts.

                  --
                  hundred-and-one symptoms of being an internet addict:
                  227. You sleep next to your monitor. Or on top of it.

                • Andrej Borsenkow
                  Message 8 of 13, Nov 21, 2000
                    >
                    > > Setting filecharcode=ucs-2 and charcode=latin-1 gives me something like
                    > > this (for the same registry dump under Win2k):
                    > >
                    > > y?W^@i^@n^@d^@o^@w^@s^@ ^@R^@e^@g^@i^@s^@t^@r^@y^@ ...
                    > >
                    > > That really looks like LSB-first (little-endian) UCS-2 encoding. Could
                    > > the first word be some kind of text direction (left-to-right) indicator?
                    >
                    > Looks like no conversion was done. Try this instead:
                    >
                    > :set charcode=latin-1
                    > :e ++cc=ucs-2l your_file
                    >
                    > This uses latin-1 internally, and forces the file to be read with
                    > 'filecharcode' set to "ucs-2l". There are other ways to do this, but this
                    > is the simplest.
                    >


                    The same result. On the status line it clearly says:

                    [noeol] [Not converted] [unix]
                  • Andrej Borsenkow
                    Message 9 of 13, Nov 21, 2000
                      > >
                      > > Looks like no conversion was done. Try this instead:
                      > >
                      > > :set charcode=latin-1
                      > > :e ++cc=ucs-2l your_file
                      > >
                      >
                      > The same result. On the status line it clearly says:
                      >
                      > [noeol] [Not converted] [unix]
                      >
                      >

                      O.K., I figured it out. To do the conversion, Vim uses either iconv or a
                      charconv expression. iconv is obviously unavailable in the Win32 port.

                      Hmm, I wonder if at least built-in conversion between UCS-[24][LM] and
                      UTF-8 makes sense. It is mechanical, and no table is needed.

                      Is there a better (faster) UCS <-> UTF-8 algorithm than the example in
                      RFC 2279?
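The point that UCS-2 to UTF-8 conversion is purely mechanical can be shown with a short Python sketch of the RFC 2279 bit layout, restricted to the 16-bit UCS-2 range, so at most three UTF-8 bytes per character are needed. The function name `ucs2le_to_utf8` is hypothetical and this is not how Vim implements its conversion; Python's own codecs appear only in the usage line, to check the result.

```python
def ucs2le_to_utf8(data: bytes) -> bytes:
    """Convert little-endian UCS-2 bytes to UTF-8 using the RFC 2279 layout.

    Surrogate code units (D800-DFFF) are not treated specially, since plain
    UCS-2 has no surrogate pairs.
    """
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even number of bytes")
    out = bytearray()
    for i in range(0, len(data), 2):
        cp = data[i] | (data[i + 1] << 8)  # little-endian: low byte comes first
        if cp < 0x80:  # 0xxxxxxx
            out.append(cp)
        elif cp < 0x800:  # 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        else:  # 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)


sample = "Wörter Жук €"  # 1-, 2- and 3-byte UTF-8 characters
converted = ucs2le_to_utf8(sample.encode("utf-16-le"))
print(converted == sample.encode("utf-8"))  # True
```

Only shifts and masks are involved, which is why no lookup table is needed; the big-endian variant differs only in which byte of each pair is shifted.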

                      -andrej
                    • Bram Moolenaar
                      Message 10 of 13, Nov 21, 2000
                        Andrej Borsenkow wrote:

                        > > Is something wrong with Courier New? It doesn't cover as
                        > > many characters as Arial Unicode but it is monospaced Unicode font
                        > > and has cyrillic characters BTW.
                        >
                        > Yes. It does not work :-) At least, setting
                        >
                        > filecharcode=ucs-2
                        > charcode=ucs-2
                        > selecting font 'Courier New'
                        > opening Unicode text file
                        >
                        > results in a screenful of squares with blanks. But it may not depend on the font.

                        Try using "ucs-2l" instead of "ucs-2". The default for Unicode is big-endian,
                        but Windows uses little-endian.
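The "squares with blanks" symptom is what one would expect when little-endian data is read as big-endian. This Python sketch, using only the standard codecs, swaps the byte order on a short hypothetical sample: each letter lands in the CJK Unified Ideographs block (U+4E00 to U+9FFF), which Courier New presumably has no glyphs for, while the space 0x0020 becomes U+2000 (EN QUAD), itself a space character.

```python
text = "Win Reg"
le_bytes = text.encode("utf-16-le")  # each ASCII letter followed by a zero byte

# Wrong byte order: every 16-bit unit is read with its two bytes swapped.
wrong = le_bytes.decode("utf-16-be")
print([hex(ord(c)) for c in wrong])
# ['0x5700', '0x6900', '0x6e00', '0x2000', '0x5200', '0x6500', '0x6700']

# Right byte order ("ucs-2l" in Vim's terms): the original text comes back.
print(le_bytes.decode("utf-16-le"))  # Win Reg
```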

                        Perhaps I should implement BOM recognition?

                        --
                        hundred-and-one symptoms of being an internet addict:
                        233. You start dreaming about web pages...in html.

                      • Bram Moolenaar
                        Message 11 of 13, Nov 21, 2000
                          Andrej Borsenkow wrote:

                          > > > Setting filecharcode=ucs-2 and charcode=latin-1 gives me something like
                          > > > this (for the same registry dump under Win2k):
                          > > >
                          > > > y?W^@i^@n^@d^@o^@w^@s^@ ^@R^@e^@g^@i^@s^@t^@r^@y^@ ...
                          > > >
                          > > > That really looks like LSB-first (little-endian) UCS-2 encoding. Could
                          > > > the first word be some kind of text direction (left-to-right) indicator?
                          > >
                          > > Looks like no conversion was done. Try this instead:
                          > >
                          > > :set charcode=latin-1
                          > > :e ++cc=ucs-2l your_file
                          > >
                          > > This uses latin-1 internally, and forces the file to be read with
                          > > 'filecharcode' set to "ucs-2l". There are other ways to do this, but this
                          > > is the simplest.
                          >
                          > The same result. On the status line it clearly says:
                          >
                          > [noeol] [Not converted] [unix]

                          In the next message:

                          > O.K., I figured it out. To do the conversion, Vim uses either iconv or a
                          > charconv expression. iconv is obviously unavailable in the Win32 port.
                          >
                          > Hmm, I wonder if at least built-in conversion between UCS-[24][LM] and
                          > UTF-8 makes sense. It is mechanical, and no table is needed.

                          That's what already happens. Well, what _should_ happen.

                          This means there was a problem when doing the conversion. Could you send me a
                          small example file in this encoding, so that I can try it out to see what
                          happens?

                          --
                          hundred-and-one symptoms of being an internet addict:
                          236. You start saving URL's in your digital watch.

                        • Andrej Borsenkow
                          Message 12 of 13, Nov 21, 2000
                            >
                            > Andrej Borsenkow wrote:
                            >
                            > > > Is something wrong with Courier New? It doesn't cover as many characters
                            > > > as Arial Unicode, but it is a monospaced Unicode font and has Cyrillic
                            > > > characters, BTW.
                            > >
                            > > Yes. It does not work :-) At least, setting
                            > >
                            > > filecharcode=ucs-2
                            > > charcode=ucs-2
                            > > selecting font 'Courier New'
                            > > opening Unicode text file
                            > >
                            > > results in a screenful of squares with blanks. But it may not depend on
                            > > the font.
                            >
                            > Try using "ucs-2l" instead of "ucs-2". The default for Unicode is
                            > big-endian, but Windows uses little-endian.
                            >

                            Yep! That really did the trick! With Courier New as well, not that I like
                            this font very much.

                            My actual problem was not Cyrillic (not that I don't need it, of course
                            :-), but rather registry files. Registry exports under Win2k are in
                            Unicode, and I normally use Vim. This was the one exception where I was
                            forced to use Notepad.

                            > Perhaps I should implement BOM recognition?
                            >


                            At least, it should not be in the editable text. Currently, trying to move
                            the cursor to the beginning of the file looks a bit funny :-)


                            Is it possible to autodetect the file encoding? Does anybody know how
                            Windows does it?

                            Also, while it is possible to _display_ text in Unicode (I just successfully
                            did it with mixed ASCII/Russian/German), it is impossible to _enter_ text in
                            either Russian or German. It looks like Vim interprets each input character
                            as part of a two-byte wide char.

                            -andrej
                          • Bram Moolenaar
                            Message 13 of 13, Nov 21, 2000
                              Andrej Borsenkow wrote:

                              > > Perhaps I should implement BOM recognition?
                              >
                              > At least, it should not be in the editable text. Currently, trying to move
                              > the cursor to the beginning of the file looks a bit funny :-)

                              I think this should be done like the 'endofline' option. The 'bomb' option
                              would be appropriate: BOM at Beginning of file. :-)

                              > Is it possible to autodetect the file encoding? Does anybody know how
                              > Windows does it?

                              Probably by detecting the BOM. It is nearly impossible for a non-Unicode
                              file to start with one.

                              I could make it so that when 'filecharcodes' contains "ucs-bom", Vim will
                              look for a BOM and skip this choice when there isn't one. You can then set
                              'filecharcodes' to "ucs-bom,utf-8,latin-1". That should detect most
                              files automatically.
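The proposed "ucs-bom,utf-8,latin-1" order can be sketched in Python as a fallback chain (in Vim as eventually released, the option is spelled 'fileencodings' and does accept "ucs-bom"). The helper `detect_and_decode` and its `BOMS` table are hypothetical, only UTF-8 and UTF-16 BOMs are checked, and this is not Vim's actual detection code.

```python
import codecs

# BOMs checked for the "ucs-bom" entry (UTF-8 and UTF-16 only in this sketch).
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def detect_and_decode(data: bytes, order=("ucs-bom", "utf-8", "latin-1")):
    """Return (encoding, text) for the first entry in `order` that fits `data`."""
    for name in order:
        if name == "ucs-bom":
            for bom, codec in BOMS:
                if data.startswith(bom):
                    try:
                        return codec, data[len(bom):].decode(codec)
                    except UnicodeDecodeError:
                        break  # BOM present but body invalid: give up on ucs-bom
            continue  # no usable BOM: skip this choice, as proposed above
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("no encoding in the list matched")


print(detect_and_decode(codecs.BOM_UTF16_LE + "Win".encode("utf-16-le")))  # ('utf-16-le', 'Win')
print(detect_and_decode("Жук".encode("utf-8")))  # ('utf-8', 'Жук')
print(detect_and_decode(b"caf\xe9"))  # ('latin-1', 'café')
```

The BOM check has to come first: BOM-less UCS-2 text that contains only ASCII is also valid UTF-8 (the zero bytes decode as NUL characters), so "utf-8" would otherwise claim it. Since latin-1 accepts any byte sequence, it works as the final catch-all.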

                              > Also, while it is possible to _display_ text in Unicode (I just successfully
                              > did it with mixed ASCII/Russian/German), it is impossible to _enter_ text in
                              > either Russian or German. It looks like Vim interprets each input character
                              > as part of a two-byte wide char.

                              This is something that requires work. How can Vim detect that the system is
                              sending a multi-byte character? Is there a special Unicode input function?
                              The Win32 input function should check 'charcode' and perhaps do UCS-2 to UTF-8
                              conversion.
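The input-side conversion suggested above can be sketched in Python under a stated assumption: that Windows delivers each keystroke as one 16-bit UTF-16 code unit (as WM_CHAR does for a window created through the wide-character API). The helper `typed_unit_to_internal` is hypothetical; it converts that unit into the bytes a buffer should hold for a given 'charcode', treating "ucs-2" as big-endian per the default mentioned earlier in the thread, and it rejects surrogate halves instead of pairing them to keep the sketch short.

```python
def typed_unit_to_internal(code_unit: int, charcode: str) -> bytes:
    """Convert one 16-bit input code unit to bytes in the internal encoding."""
    if not 0 <= code_unit <= 0xFFFF:
        raise ValueError("expected a single 16-bit code unit")
    if 0xD800 <= code_unit <= 0xDFFF:
        raise ValueError("surrogate halves must be paired first (not handled here)")
    ch = chr(code_unit)
    if charcode == "utf-8":
        return ch.encode("utf-8")  # UCS-2 to UTF-8 conversion
    if charcode == "ucs-2":
        return code_unit.to_bytes(2, "big")  # big-endian, the Unicode default
    if charcode == "latin-1":
        return ch.encode("latin-1")  # raises UnicodeEncodeError above U+00FF
    raise ValueError(f"unsupported charcode: {charcode}")


print(typed_unit_to_internal(0x0416, "utf-8"))  # b'\xd0\x96'  (Cyrillic Zhe)
print(typed_unit_to_internal(0x0416, "ucs-2"))  # b'\x04\x16'
print(typed_unit_to_internal(0x00FC, "latin-1"))  # b'\xfc'  (u with umlaut)
```

The key design point is that the whole code unit is converted at once, so the input routine never hands Vim single bytes that it would then misread as halves of a wide character.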

                              --
                              You can tune a file system, but you can't tuna fish
                              -- man tunefs
