
Re: bug: gvim 7.0.205 on xp can not display ucs-2

  • Mike Li
    Message 1 of 9 , Mar 6, 2007
      for big-endian, the following is displayed:

      l8^@^M^@

      for little-endian, the following is displayed:

      8l^M^@
      ^@

      "^@" and "^M" are control characters.
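
The display above is what one would expect when each byte of a UCS-2 file is shown as a separate one-byte character. The following Python sketch is not from the thread; the helper names `hexdump` and `naive_lines` are made up for illustration, and Python's UTF-16 codecs stand in for UCS-2, which is identical for a BMP character such as U+6C38. It rebuilds the two six-byte files and renders them byte by byte, splitting lines at the 0x0a byte and using caret notation for control bytes:

```python
# U+6C38 followed by CR LF, as in the xxd dumps quoted in this thread.
text = "\u6c38\r\n"

big_endian = text.encode("utf-16-be")     # bytes 6c 38 00 0d 00 0a
little_endian = text.encode("utf-16-le")  # bytes 38 6c 0d 00 0a 00


def hexdump(data):
    """Space-separated hex bytes, in the style of the xxd output."""
    return " ".join(f"{b:02x}" for b in data)


def naive_lines(data):
    """What a one-byte-per-character reader sees: split on the 0x0a byte,
    show control bytes in caret notation (0x00 -> ^@, 0x0d -> ^M)."""
    lines = []
    for raw in data.split(b"\n"):
        lines.append("".join("^" + chr(b + 64) if b < 32 else chr(b) for b in raw))
    return lines


print(hexdump(big_endian), naive_lines(big_endian))
print(hexdump(little_endian), naive_lines(little_endian))
```

In the little-endian file the 0x0a byte comes before its 0x00 partner, which is why a stray "^@" lands on a second line.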

      this definitely happens on gvim as well as console vim under windows
      xp. i'm not sure if it happens with other programs in the console
      window, as the platform is windows. i have the appropriate fonts, and
      notepad displays the little-endian version correctly.

      -x

      On 3/6/07, Doug Cook <douglasevancook@...> wrote:
      > What gets displayed?
      >
      > Does this happen on gVim as well?
      >
      > Do Chinese characters appear correctly in the console window when using
      > other programs?
      >
      > -----Original Message-----
      > From: Mike Li [mailto:entrophage@...]
      > Sent: Monday, March 05, 2007 10:04 PM
      > To: vim-dev@...
      > Subject: Re: bug: gvim 7.0.205 on xp can not display ucs-2
      >
      > console vim 7.0 (patches 1-205), built with the mingw compiler under
      > cygwin (gcc -mno-cygwin), as well as the console vim 7.0.122 binary
      > distributed with cygwin have the same problem as the gvim binaries
      > under windows xp.
      >
      > -x
      >
      > On 3/5/07, Mike Li <entrophage@...> wrote:
      > > gvim 7.0 (patches 1-205) under windows xp, built with the mingw
      > > compiler under cygwin (gcc -mno-cygwin), can not display ucs-2 text
      > > files. see below for the xxd-dump of an ucs-2 text file containing a
      > > single chinese character (U+6c38):
      > >
      > > 0000000: 6c 38 00 0d 00 0a l8....
      > >
      > > the same problem is seen with the little-endian (ucs-2le) version of
      > > the same file:
      > >
      > > 0000000: 38 6c 0d 00 0a 00 8l....
      > >
      > > the presence or absence of a BOM (byte order marker) at the beginning
      > > of the file does not make a difference. the issue is also seen with
      > > gvim from the original windows binary distribution.
      > >
      > > console vim 7.0 (patches 1-205) under fedora core 6, built with gcc
      > > 4.0, works fine with '++enc=ucs-2'. the original binary from the yum
      > > package vim-enhanced-7.0.201-1.fc6 also works fine.
      > >
      > > -x
      > >
      >
      >
    • Mike Li
      Message 2 of 9 , Mar 6, 2007
        one point of clarification: the correctly functioning fedora console
        vim binaries were run under x11 (rxvt-unicode) with appropriate
        truetype fonts.

        -x

      • Mike Li
        Message 3 of 9 , Mar 6, 2007
          one more update: if i add the following two lines to my _vimrc, then
          the ucs-2le text file works:

          set fileencodings+=ucs-2le
          set encoding=utf-8

          note that both need to be set before i edit the file. once i load the
          file, setting them no longer helps.

          -x

        • A.J.Mechelynck
          Message 4 of 9 , Mar 6, 2007
            Mike Li wrote:
            > one more update: if i add the following two lines to my _vimrc, then
            > the ucs-2le text file works:
            >
            > set fileencodings+=ucs-2le
            > set encoding=utf-8
            >
            > note that both need to be set before i edit the file. once i load the
            > file, setting them no longer helps.
            >
            > -x

            Of course:
            - Vim needs to be able to represent Unicode codepoints in memory (":set
            encoding=utf-8"). This must be done before any attempt to read the file.

            - Vim needs to know how to detect the encoding. This can be done in several ways:

            * if 'fileencodings' *begins* with "ucs-bom", any Unicode encoding with BOM
            will be recognised.
            * if you use ":e ++enc=ucs-2be filename", the file will be interpreted
            according to big-endian UCS-2
            * if 'fileencodings' contains "ucs-2le", and every entry preceding it is
            rejected as invalid for that file, then the file will be read as
            little-endian UCS-2, provided it contains no byte sequences that are
            invalid in that encoding.

            Of course, if you change 'fileencodings' after the file has been read, it is
            too late.
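
The fall-through described above can be modelled with a short Python sketch. It is an illustration only and not Vim's implementation: the function name `detect` and the `BOMS` table are hypothetical, Python codec names replace Vim's encoding names (except the "ucs-bom" entry, which is handled specially), and Python's UTF-16 codecs stand in for UCS-2. Each candidate is tried in order, and the first one that decodes the whole file without error wins:

```python
import codecs

# Byte order marks checked by the "ucs-bom" entry in this sketch.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect(data, candidates):
    """Return (codec_name, decoded_text) for the first candidate that fits."""
    for name in candidates:
        if name == "ucs-bom":
            for bom, codec in BOMS:
                if data.startswith(bom):
                    return codec, data[len(bom):].decode(codec)
            continue  # no BOM found: fall through to the next candidate
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue  # invalid for this encoding: try the next one
    raise ValueError("no candidate encoding fits")


le_file = "\u6c38\r\n".encode("utf-16-le")               # no BOM
be_file = codecs.BOM_UTF16_BE + "\u6c38\r\n".encode("utf-16-be")

print(detect(le_file, ["ucs-bom", "utf-16-le"]))
print(detect(le_file, ["ucs-bom", "latin-1", "utf-16-le"]))
print(detect(be_file, ["ucs-bom", "latin-1"]))
```

The second call shows why the position in the list matters: an 8-bit encoding such as latin-1 accepts any byte sequence, so an entry placed after it is never reached.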

            You also need to set a font containing the glyphs for whatever codepoints
            you'll want to see. This is not a trivial problem in multilingual files: e.g.,
            I know no fixed-width font having both Chinese and Arabic glyphs. Setting the
            font is done in gvim by means of the 'guifont' option; console Vim uses
            whatever font is set by the hardware text console or by the software terminal
            emulator.

            See:
            :help 'encoding'
            :help 'fileencodings'
            :help ++opt
            :help mbyte.txt
            :help 'guifont'

            Best regards,
            Tony.
            --
            Did you know that there are 71.9 acres of nipple tissue in the U.S.?
          • Doug Cook
            Message 5 of 9 , Mar 6, 2007
              Using gVim, if I load your file normally, I do get ASCII instead of Unicode.
              But if I then type:

              :e ++enc=ucs-2

              it appears to work. I don't have a Chinese font, so I get a box, but it is a
              single character and it is double-width, so it appears to be interpreted
              correctly by Vim. The same thing happens automatically if I insert the UCS-2
              byte order mark "fe ff" at the start of the file.

              The console version is a bit more tricky, and correct behavior from Vim
              might depend on various system settings. In my case, the console version
              reports a conversion error.

            • Mike Li
              Message 6 of 9 , Mar 6, 2007
                much thanks to Doug and A.J. -- i now see that it wasn't a bug at all.
                sorry for the noise.

                -x
