
character code

  • Raphael Bauduin
    Message 1 of 6 , Apr 30, 2002
      Hi,

      is there a way to see the code of the character you are on?
      I'm working with a file coming from DOS, and I'd like to see whether it has
      CR, LF, ... (but not only that: this is a CSV export of an Access db I want
      to import into postgres. Some fields are text and contain a 'new line', but I
      don't know how this is represented in the CSV.)

      The hex view by

      :set display=uhex

      is not practical for that.

      Still about characters: in the very same file coming from a Windows box, I have
      some characters represented by <..> codes. For example:
      soci<82>t<82> for société. Should I rework this file before importing the data into
      a database, or will the database convert it correctly?

      Even a broader question: do you have any documentation pointers for those character
      representations?

      Thanks for your help!

      Raph
      --
      Free Software and Open Source Developers Meeting
      See you at the 2002 edition. Check the 2001 sessions on www.opensource-tv.com
      Visit http://www.fosdem.org and become member of the mailing list!
    • Preben Peppe Guldberg
      Message 2 of 6 , Apr 30, 2002
        Thus wrote Raphael Bauduin (rb@...) on [020430]:

        > is there a way to see the code of the character you are on?

        Are you looking for "ga" and ":ascii"? See ":help ga".

        > Still about characters: in the very same file coming from a Windows box, I have
        > some characters represented by <..> codes. For example:
        > soci<82>t<82> for société. Should I rework this file before importing the data into
        > a database, or will the database convert it correctly?

        My take on this is that vim cannot properly display the characters. As
        long as you don't copy'n'paste them, but use the actual file (or write
        some lines to another file), you should be fine.

        Peppe
        --
        Preben "Peppe" Guldberg
        c928400@...
        "Before you criticize someone, walk a mile in his shoes. That way, if
        he gets angry, he'll be a mile away - and barefoot." --Sarah Jackson
      • Raphael Bauduin
        Message 3 of 6 , Apr 30, 2002
          > My take on this is that vim cannot properly display the characters. As
          > long as you don't copy'n'paste them, but use the actual file (or write
          > some lines to another file), you should be fine.

          now I know ga, here are some more details about the word société:
          in the file coming from access, the é character is represented by
          <~B> <M-^B> 130, Hex 82, Octal 202

          In the same file I can insert a é, and ga gives:
          <é> <|i> <M-i> 233, Hex e9, Octal 351

          How come there are two results for what is actually the same character?


          One more detail: with ga, I can't look at the 'new line' characters.
          :set list shows the end of the line with a '$', but I don't know whether it's a
          CR/LF or not....

          Raph

        • Ron Aaron
          Message 4 of 6 , Apr 30, 2002
            >> My take on this is that vim cannot properly display the characters. As
            >> long as you don't copy'n'paste them, but use the actual file (or write
            >> some lines to another file), you should be fine.
            >
            > now I know ga, here are some more details about the word société: in the
            > file coming from access, the é character is represented by
            > <~B> <M-^B> 130, Hex 82, Octal 202
            >
            > In the same file I can insert a é, and ga gives:
            > <é> <|i> <M-i> 233, Hex e9, Octal 351
            >
            > How come there are two results for what is actually the same character?

            Hello, Raphael -

            The reason is that they are *not* the same character!

            The character encodings are different. The value 0xe9 is indeed the é
            character in latin1 (it is also the Unicode code point for é, which is
            what you see when 'encoding' is 'utf-8'). Do you know which encoding the
            file uses, where é is 0x82?

            What are your settings for 'enc' and 'fencs', and 'fenc' after you read the
            file in?
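
To see the same point outside Vim, here is a minimal Python sketch. It assumes the DOS side uses cp437 (not confirmed in this thread), and the helper names describe_byte and encoded_forms are made up for illustration. It formats a byte value roughly the way "ga" reports it and shows the bytes that one character, é, becomes in three encodings.

```python
def describe_byte(value: int) -> str:
    """Format a byte value roughly the way Vim's ga reports it."""
    return f"{value}, Hex {value:02x}, Octal {value:03o}"


def encoded_forms(char: str, encodings=("cp437", "latin-1", "utf-8")) -> dict:
    """Return the raw bytes that one character becomes in each encoding."""
    return {name: char.encode(name) for name in encodings}


if __name__ == "__main__":
    # "\u00e9" is the character é (e with acute accent).
    for name, raw in encoded_forms("\u00e9").items():
        print(name, [describe_byte(b) for b in raw])
```

The same character is one byte (0x82) in cp437, a different single byte (0xe9) in latin1, and two bytes in UTF-8, which is why "ga" can report different numbers for what looks like the same letter.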


            > Another precision: with ga, I can't take a look at the 'new
            > line' characters.

            That is correct. The line endings are removed when the file is read and are
            not stored in the buffer text, so they are not visible to you. You can tell
            which kind the file had by looking at 'ff':
            :set ff?
            if it's 'dos' then you have 'crlf' line endings, if 'unix' it is 'lf' alone.

            Also, you can do

            :enew
            :r!xxd #

            to create a new buffer with the output of 'xxd' (a hex dumper) for the buffer
            you were just looking at. This will show every byte in the file exactly,
            without interpretation.
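
If xxd is not at hand, a rough equivalent can be sketched in Python. This is only an illustration: the function names are invented here, and the dump layout is similar to xxd's but not identical. It also counts the line-ending styles, which helps with the CSV question, since a bare LF inside a text field shows up separately from the CR/LF record terminators.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LF bytes and lone CR bytes in raw data."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,   # LF not preceded by CR
        "cr": data.count(b"\r") - crlf,   # CR not followed by LF
    }


def hex_dump(data: bytes, width: int = 16) -> list:
    """Return dump lines: offset, hex bytes, printable ASCII (others as '.')."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(width * 3)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_part} {text}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample: one CSV record with a bare LF inside a quoted field.
    sample = b'1,"soci\x82t\x82\nline two"\r\n'
    print(count_line_endings(sample))
    print("\n".join(hex_dump(sample)))
```

To run it on a real file, read it in binary mode (open(path, "rb").read()) so that Python does not translate the line endings itself.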

            Best,
            Ron
          • Jürgen Krämer
            Message 5 of 6 , May 1 11:32 PM
              Hi,

              Ron Aaron wrote:
              >
              > The reason is that they are *not* the same character!
              >
              > The character encodings are different. The character 0xe9 in the
              > 'utf-8' encoding is indeed the é character. Do you know what is the
              > character 0x82 encoding?

              the encoding might be DOS code page 437 or 850 or ... This cannot
              be decided from the single mapping 'é' == 0x82, because those code
              pages do not differ in this case. If Raphael could provide some more
              incorrectly displayed characters, I could look it up.
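
As a rough way to find which bytes would settle the question, the following Python sketch compares how two code pages decode each byte in the upper half of the range. The function name differing_bytes is invented for this example, and it relies on Python's own cp437 and cp850 codec tables rather than anything in Vim.

```python
def differing_bytes(enc_a: str, enc_b: str) -> list:
    """List the byte values 0x80-0xff that decode differently in two encodings."""
    diffs = []
    for value in range(0x80, 0x100):
        raw = bytes([value])
        # errors="replace" keeps the loop going if a codec leaves a byte undefined.
        if raw.decode(enc_a, errors="replace") != raw.decode(enc_b, errors="replace"):
            diffs.append(value)
    return diffs


if __name__ == "__main__":
    diffs = differing_bytes("cp437", "cp850")
    print("0x82 distinguishes them:", 0x82 in diffs)
    for value in diffs:
        raw = bytes([value])
        print(f"0x{value:02x}", raw.decode("cp437"), raw.decode("cp850"))
```

Any byte from the file that appears in this list (0x9b, for instance) would distinguish the two code pages; 0x82 does not.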

              Regards,
              Jürgen

              --
              Jürgen Krämer Softwareentwicklung/-support
              Habel GmbH mailto:software@...
              Hinteres Öschle 2 Tel: (0 74 61) 93 53 15
              78604 Rietheim-Weilheim Fax: (0 74 61) 93 53 99
            • Antoine J. Mechelynck
              Message 6 of 6 , May 3 8:15 AM
                Raphael (or anyone having that kind of problems) could also view the
                following help topics:

                :h 'encoding'
                :h 'fileencoding'
                :h 'fileencodings'
                :h 'termencoding'
                :h :digraphs

                Succinctly:

                :set enc? shows your current encoding.
                :set fenc? shows the encoding for the current file (defaults to the
                current 'encoding' if empty).
                :set fencs? shows which encodings are recognised when reading a file:
                they are tried in order; the first one which works is used, or if none
                does there is a fallback to (I think) latin1.
                :set tenc? shows which encoding is used to interpret your key presses
                (defaults to the current 'encoding' if empty, so don't forget to set it
                if, for instance, you want to edit unicode files using a Latin or
                ISO-8859 keyboard).
                :dig (with no parameters) shows all the digraphs currently defined for
                your current encoding (to use one in Insert or Replace mode, type
                <Ctrl-K><char1><char2>).
                :dig <char1><char2> <number> [<char3><char4> <number>] ... defines one
                or more new ones.
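
The "tried in order, first one which works" idea behind 'fencs' can be sketched in Python. This is only an illustration of the principle, not Vim's actual detection logic, which does more than this; the function name pick_encoding and its default arguments are assumptions made for the example.

```python
def pick_encoding(data: bytes, candidates=("utf-8",), fallback="latin-1") -> str:
    """Return the first candidate that decodes data without error, else fallback."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; try the next one
        return name
    return fallback


if __name__ == "__main__":
    dos_bytes = b"soci\x82t\x82"                          # bytes as in the Access export
    utf8_bytes = "soci\u00e9t\u00e9".encode("utf-8")      # the same word saved as UTF-8
    print(pick_encoding(dos_bytes))
    print(pick_encoding(utf8_bytes))
```

An 8-bit encoding such as latin1 accepts any byte sequence, so it never fails as a candidate. That is consistent with the DOS file loading without complaint and 0x82 simply showing up as <82> instead of é.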

                On my Windows box, using the iso-8859-15 encoding in gvim, I get é = 233
                (with digraph e'); 130 has digraph BH. Under UTF-8 it is the same (but many
                more digraphs are defined of course, with charcodes between 1 and 64262).
                Strangely enough, in vim in a DOS box under Windows, with the same :set
                options, I get é = 130 (with digraph e'); far fewer digraphs are defined
                (233 isn't, and there are none under 128). Maybe it inherits my cp437
                setting from the BIOS (no codepage software is loaded in either CONFIG.SYS
                or AUTOEXEC.BAT).

                Tony.
