
bug: gvim 7.0.205 on xp can not display ucs-2

  • Mike Li
    Message 1 of 9 , Mar 5, 2007
      gvim 7.0 (patches 1-205) under windows xp, built with the mingw
      compiler under cygwin (gcc -mno-cygwin), cannot display ucs-2 text
      files. see below for the xxd dump of a big-endian ucs-2 text file
      containing a single chinese character (U+6c38) followed by CR LF:

      0000000: 6c 38 00 0d 00 0a l8....

      the same problem is seen with the little-endian (ucs-2le) version of
      the same file:

      0000000: 38 6c 0d 00 0a 00 8l....

      the presence or absence of a BOM (byte order mark) at the beginning
      of the file does not make a difference. the issue is also seen with
      gvim from the original windows binary distribution.

      console vim 7.0 (patches 1-205) under fedora core 6, built with gcc
      4.0, works fine with '++enc=ucs-2'. the original binary from the yum
      package vim-enhanced-7.0.201-1.fc6 also works fine.

      -x
    • Mike Li
      Message 2 of 9 , Mar 5, 2007
        console vim 7.0 (patches 1-205), built with the mingw compiler under
        cygwin (gcc -mno-cygwin), as well as the console vim 7.0.122 binary
        distributed with cygwin have the same problem as the gvim binaries
        under windows xp.

        -x

      • Doug Cook
        Message 3 of 9 , Mar 6, 2007
          What gets displayed?

          Does this happen on gVim as well?

          Do Chinese characters appear correctly in the console window when using
          other programs?

        • Mike Li
          Message 4 of 9 , Mar 6, 2007
            for big-endian, the following is displayed:

            l8^@^M^@

            for little-endian, the following is displayed:

            8l^M^@
            ^@

            "^@" and "^M" are control characters.

            this definitely happens on gvim as well as console vim under windows
            xp. i'm not sure if it happens with other programs in the console
            window, as the platform is windows. i have the appropriate fonts, and
            notepad displays the little-endian version correctly.
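
The output above is what one would expect if every byte of the file is treated as its own character. A small Python sketch of that reading follows; it is a simplified model of the display, not Vim's actual rendering code, and `caret_display` is a hypothetical helper.

```python
def caret_display(data):
    """Treat each byte as one character, split lines on LF, and draw
    control bytes below 0x20 in caret notation (NUL -> ^@, CR -> ^M)."""
    lines = []
    for raw_line in data.split(b"\n"):
        shown = ""
        for b in raw_line:
            shown += "^" + chr(b + 64) if b < 32 else chr(b)
        lines.append(shown)
    # A trailing LF ends the last line; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


big_endian = bytes.fromhex("6c38000d000a")
little_endian = bytes.fromhex("386c0d000a00")

print(caret_display(big_endian))
print(caret_display(little_endian))
```

In the little-endian file the LF byte (0a) comes before its padding NUL, which is why a lone ^@ ends up on a second line.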

            -x

          • Mike Li
            Message 5 of 9 , Mar 6, 2007
              one point of clarification: the correctly functioning fedora console
              vim binaries were run under x11 (rxvt-unicode) with appropriate
              truetype fonts.

              -x

            • Mike Li
              Message 6 of 9 , Mar 6, 2007
                one more update: if i add the following two lines to my _vimrc, then
                the ucs-2le text file works:

                set fileencodings+=ucs-2le
                set encoding=utf-8

                note that both need to be set before i edit the file. once i load the
                file, setting them no longer helps.

                -x

              • A.J.Mechelynck
                Message 7 of 9 , Mar 6, 2007
                  Mike Li wrote:
                  > one more update: if i add the following two lines to my _vimrc, then
                  > the ucs-2le text file works:
                  >
                  > set fileencodings+=ucs-2le
                  > set encoding=utf-8
                  >
                  > note that both need to be set before i edit the file. once i load the
                  > file, setting them no longer helps.
                  >
                  > -x

                  Of course:
                  - Vim needs to be able to represent Unicode codepoints in memory (":set
                  encoding=utf-8"). This must be done before any attempt to read the file.

                  - Vim needs to know how to detect the encoding. This can be done in several ways:

                  * if 'fileencodings' *begins* with "ucs-bom", any Unicode encoding with BOM
                  will be recognised.
                  * if you use ":e ++enc=ucs-2be filename", the file will be interpreted
                  according to big-endian UCS-2
                  * if 'fileencodings' contains "ucs-2le", and every encoding listed before it
                  is rejected as invalid for that file, then the file will be read as
                  little-endian UCS-2, provided it contains no byte sequences that are invalid
                  in that encoding.

                  Of course, if you change 'fileencodings' after the file has been read, it is
                  too late.
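
The lookup order described above can be sketched in Python. This is a rough model under stated assumptions, not Vim's implementation: `guess_encoding` and the `BOMS` table are hypothetical, Python codec names stand in for Vim's encoding names, and only three BOMs are checked.

```python
import codecs

# Illustrative BOM table (assumption: only these three are considered).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def guess_encoding(data, candidates):
    """Return the first candidate that accepts the bytes, or None.

    The special name "ucs-bom" matches only when a known BOM is present.
    """
    for name in candidates:
        if name == "ucs-bom":
            for bom, codec in BOMS:
                if data.startswith(bom):
                    return codec
            continue  # no BOM found: fall through to the next candidate
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # invalid for this encoding: try the next one
        return name
    return None


little_endian = bytes.fromhex("386c0d000a00")
with_bom = codecs.BOM_UTF16_BE + bytes.fromhex("6c38000d000a")

print(guess_encoding(with_bom, ["ucs-bom", "latin-1"]))
print(guess_encoding(little_endian, ["ucs-bom", "utf-16-le"]))
print(guess_encoding(little_endian, ["utf-8", "utf-16-le"]))
```

In this sketch the last call returns "utf-8", because the sample bytes are all below 0x80 and so are valid in any ASCII-compatible encoding tried first. That is why the position of "ucs-2le" in the list matters.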

                  You also need to set a font containing the glyphs for whatever codepoints
                  you'll want to see. This is not a trivial problem in multilingual files: e.g.,
                  I know of no fixed-width font having both Chinese and Arabic glyphs. Setting the
                  font is done in gvim by means of the 'guifont' option; console Vim uses
                  whatever font is set by the hardware text console or by the software terminal
                  emulator.

                  See:
                  :help 'encoding'
                  :help 'fileencodings'
                  :help ++opt
                  :help mbyte.txt
                  :help 'guifont'

                  Best regards,
                  Tony.
                  --
                  Did you know that there are 71.9 acres of nipple tissue in the U.S.?
                • Doug Cook
                  Message 8 of 9 , Mar 6, 2007
                    Using gVim, if I load your file normally, I do get ASCII instead of Unicode.
                    But if I then type:

                    :e ++enc=ucs-2

                    it appears to work. I don't have a Chinese font, so I get a box, but it is a
                    single character and it is double-width, so it appears to be interpreted
                    correctly by Vim. The same thing happens automatically if I insert the UCS-2
                    byte order mark "fe ff" at the start of the file.

                    The console version is a bit more tricky, and correct behavior from Vim
                    might depend on various system settings. In my case, the console version
                    reports a conversion error.

                  • Mike Li
                    Message 9 of 9 , Mar 6, 2007
                      much thanks to Doug and A.J. -- i now see that it wasn't a bug at all.
                      sorry for the noise.

                      -x
