if "\xe4"=="\xe4" fails, why?

  • mbbill
    Message 1 of 7, Nov 29, 2006
      I ran into a very strange problem recently.
      When I set the following options:
      set encoding=utf-8
      set ignorecase
      the expression "\xe4"=="\xe4" evaluates to false.
      I tested it using:
      if "\xe4"=="\xe4"
        echo "test"
      endif
      but I got no output. Why?



      --
      Best regards,
      mbbill mailto:bill.mb@...
    • A.J.Mechelynck
      Message 2 of 7, Nov 29, 2006
        A.J.Mechelynck wrote:
        > mbbill wrote:
        >> I met a very strange problem recently, that is
        >> when I set the following options:
        >> set encoding=utf-8
        >> set ignorecase
        >> then the expression: if "\xe4"=="\xe4" fails.
        >> I test it using:
        >> if "\xe4"=="\xe4"
        >> echo "test"
        >> endif
        >> but I got nothing output, why ?
        >>
        >>
        >
        > I confirm this:
        >
        > :echo ("\xe4" == "\xe4")
        >
        > outputs 0
        >
        > I guess the strings, or at least one of them, are not evaluated as "the
        > U+00E4 codepoint, i.e., 0xC3 0xA4" but as "the one-byte string 0xE4,
        > which is not a valid Unicode codepoint when followed by a null". The
        > latter would be NaS (Not a String) in evaluations, and give the same
        > kind of strange results as NaN (Not a Number) in floating-point
        > comparisons.
        >
        > This conjecture seems to be confirmed by
        >
        > :echo ("\xe4")
        >
        > which outputs <e4> in blue, not ä (a-umlaut) in black, which is output by
        >
        > :echo "ä"
        >
        > and by
        >
        > :echo ("\<Char-0xe4>")
        >
        >
        > Bug or feature?
        >
        >
        > Best regards,
        > Tony.
        >

        P.S.

        :echo ("ä" == "\xc3\xa4")

        outputs 1 (one, i.e., TRUE). I think this proves my conjecture above.
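
For readers who want to check the byte-level claim outside Vim, here is a minimal sketch in Python. It only illustrates how UTF-8 treats these bytes; it says nothing about how Vim evaluates "\xe4" internally, and the helper name is_valid_utf8 is made up for this example.

```python
# Illustration only: how U+00E4 and the lone byte 0xE4 behave under UTF-8.
# This is not Vim's implementation; is_valid_utf8 is a hypothetical helper.

A_UMLAUT = "\u00e4"  # a-umlaut, Unicode codepoint U+00E4

# U+00E4 is encoded in UTF-8 as the two bytes 0xC3 0xA4.
utf8_bytes = A_UMLAUT.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xE4 is a leading byte with its trailing bytes missing.
lone_e4_valid = is_valid_utf8(b"\xe4")

# The two-byte sequence 0xC3 0xA4 is well-formed.
two_byte_valid = is_valid_utf8(b"\xc3\xa4")

# In an 8-bit encoding such as Latin-1, the single byte 0xE4 is a-umlaut.
latin1_char = b"\xe4".decode("latin-1")
```

Under these assumptions the lone byte is rejected as UTF-8 while the two-byte sequence is accepted, which matches the observation that "ä" equals "\xc3\xa4" when 'encoding' is utf-8.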


        Best regards,
        Tony.
      • mbbill
        Message 3 of 7, Nov 29, 2006
          Hello A.J.Mechelynck,

          Thursday, November 30, 2006, 1:15:14 PM, you wrote:

          > A.J.Mechelynck wrote:
          >> mbbill wrote:
          >>> I met a very strange problem recently, that is
          >>> when I set the following options:
          >>> set encoding=utf-8
          >>> set ignorecase
          >>> then the expression: if "\xe4"=="\xe4" fails.
          >>> I test it using:
          >>> if "\xe4"=="\xe4"
          >>>  echo "test"
          >>> endif
          >>> but I got nothing output, why ?
          >>>
          >> I confirm this:
          >>
          >> :echo ("\xe4" == "\xe4")
          >>
          >> outputs 0
          >>
          >> I guess the strings, or at least one of them, are not evaluated as "the
          >> U+00E4 codepoint, i.e., 0xC3 0xA4" but as "the one-byte string 0xE4,
          >> which is not a valid Unicode codepoint when followed by a null". The
          >> latter would be NaS (Not a String) in evaluations, and give the same
          >> kind of strange results as NaN (Not a Number) in floating-point
          >> comparisons.
          >>
          >> This conjecture seems to be confirmed by
          >>
          >> :echo ("\xe4")
          >>
          >> which outputs <e4> in blue, not ä (a-umlaut) in black, which is output by
          >>
          >> :echo "ä"
          >>
          >> and by
          >>
          >> :echo ("\<Char-0xe4>")
          >>
          >> Bug or feature?
          >>
          >> Best regards,
          >> Tony.
          >
          > P.S.
          >
          > :echo ("ä" == "\xc3\xa4")
          >
          > outputs 1 (one, i.e., TRUE). I think this proves my conjecture above.

          Yes, I agree with your opinion.
          When I test it elsewhere, I cannot always reproduce the "bug"; maybe some other options affect the result of the expression.



          --
          Best regards,
          mbbill mailto:bill.mb@...
        • A.J.Mechelynck
          Message 4 of 7, Nov 29, 2006
            mbbill wrote:
            > Hello A.J.Mechelynck,
            >
            > Thursday, November 30, 2006, 1:15:14 PM, you wrote:
            >
            >> A.J.Mechelynck wrote:
            >>> mbbill wrote:
            >>>> I met a very strange problem recently, that is
            >>>> when I set the following options:
            >>>> set encoding=utf-8
            >>>> set ignorecase
            >>>> then the expression: if "\xe4"=="\xe4" fails.
            >>>> I test it using:
            >>>> if "\xe4"=="\xe4"
            >>>>  echo "test"
            >>>> endif
            >>>> but I got nothing output, why ?
            >>>>
            >>> I confirm this:
            >>>
            >>> :echo ("\xe4" == "\xe4")
            >>>
            >>> outputs 0
            >>>
            >>> I guess the strings, or at least one of them, are not evaluated as "the
            >>> U+00E4 codepoint, i.e., 0xC3 0xA4" but as "the one-byte string 0xE4,
            >>> which is not a valid Unicode codepoint when followed by a null". The
            >>> latter would be NaS (Not a String) in evaluations, and give the same
            >>> kind of strange results as NaN (Not a Number) in floating-point
            >>> comparisons.
            >>>
            >>> This conjecture seems to be confirmed by
            >>>
            >>> :echo ("\xe4")
            >>>
            >>> which outputs <e4> in blue, not ä (a-umlaut) in black, which is output by
            >>>
            >>> :echo "ä"
            >>>
            >>> and by
            >>>
            >>> :echo ("\<Char-0xe4>")
            >>>
            >>> Bug or feature?
            >>>
            >>> Best regards,
            >>> Tony.
            >>
            >> P.S.
            >>
            >> :echo ("ä" == "\xc3\xa4")
            >>
            >> outputs 1 (one, i.e., TRUE). I think this proves my conjecture above.
            >
            > Yes, I agree with your opinion.
            > When I test it somewhere else, I can not let the "bug" come again sometimes, may be some other options can affect the result of the expression.
            >
            >
            >

            In all 8-bit encodings, "\xe4" is (IIUC) whatever is represented in that
            encoding by the byte 0xe4, which is usually a valid character. In Unicode
            (always internally UTF-8 in Vim) 0xE4 is not a valid character, unless it is
            followed by exactly two bytes (no more, no less) in the range 0x80-0xBF,
            because UTF-8 codepoints are represented by one to six bytes each, and these
            bytes are as follows:
            0x00-0x7F: standalone byte
            0x80-0xBF: trailing byte (any byte but the first, in a multibyte sequence)
            0xC0-0xDF: leading byte of a two-byte sequence
            0xE0-0xEF: leading byte of a three-byte sequence
            0xF0-0xF7: leading byte of a four-byte sequence
            0xF8-0xFB: leading byte of a five-byte sequence
            0xFC-0xFD: leading byte of a six-byte sequence
            0xFE-0xFF: invalid
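
The table above can be sketched as a small lookup, here in Python purely for illustration. The function name utf8_byte_role is hypothetical, and the ranges follow the original one-to-six-byte scheme listed above; the current UTF-8 standard (RFC 3629) limits sequences to four bytes, so the five- and six-byte rows are kept only to mirror the table.

```python
# Illustration only: classify a single byte according to the table above.
# utf8_byte_role is a hypothetical helper, not part of Vim or any library.
# Ranges follow the original 1-to-6-byte UTF-8 scheme; RFC 3629 allows at most 4.

def utf8_byte_role(b: int) -> tuple:
    """Return (role, sequence_length) for one byte value 0x00-0xFF.

    sequence_length is the total number of bytes in the sequence that the
    byte starts; it is 0 for trailing and invalid bytes.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x7F:
        return ("standalone", 1)
    if b <= 0xBF:
        return ("trailing", 0)
    if b <= 0xDF:
        return ("leading", 2)
    if b <= 0xEF:
        return ("leading", 3)
    if b <= 0xF7:
        return ("leading", 4)
    if b <= 0xFB:
        return ("leading", 5)
    if b <= 0xFD:
        return ("leading", 6)
    return ("invalid", 0)


# 0xE4 starts a three-byte sequence, so it needs exactly two trailing bytes.
role_e4 = utf8_byte_role(0xE4)
# 0xC3 0xA4 (a-umlaut): a two-byte leader followed by one trailing byte.
role_c3 = utf8_byte_role(0xC3)
role_a4 = utf8_byte_role(0xA4)
```

With this lookup, 0xE4 comes out as the leading byte of a three-byte sequence, which is why a string consisting of that byte alone is not well-formed UTF-8.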

            I don't know how "\xe4" tests in non-Unicode multibyte encodings such as those
            used for Chinese, Japanese, Korean, etc.


            Best regards,
            Tony.
          • Charles E Campbell Jr
            Message 5 of 7, Dec 1, 2006
              mbbill wrote:

              >I met a very strange problem recently, that is
              >when I set the following options:
              >set encoding=utf-8
              >set ignorecase
              >then the expression: if "\xe4"=="\xe4" fails.
              >I test it using:
              >if "\xe4"=="\xe4"
              > echo "test"
              >endif
              >but I got nothing output, why ?
              >
              >
              >
              >
              >
              Try

              set encoding=utf-8
              if "\xe4" == "\xe4"
                redraw!
                echo "equal!"
              else
                redraw!
                echo "not equal"
              endif


              Looks like your message is doing an unwanted disappearing act.

              Regards,
              Chip Campbell
            • Charles E Campbell Jr
              Message 6 of 7, Dec 1, 2006
                A.J.Mechelynck wrote:

                > Charles E Campbell Jr wrote:
                >
                >> mbbill wrote:
                >>
                >>> I met a very strange problem recently, that is
                >>> when I set the following options:
                >>> set encoding=utf-8
                >>> set ignorecase
                >>> then the expression: if "\xe4"=="\xe4" fails.
                >>> I test it using:
                >>> if "\xe4"=="\xe4"
                >>> echo "test"
                >>> endif
                >>> but I got nothing output, why ?
                >>>
                >>>
                >>>
                >>>
                >>>
                >> Try
                >>
                >> set encoding=utf-8
                >> if "\xe4" == "\xe4"
                >> redraw!
                >> echo "equal!"
                >> else
                >> redraw!
                >> echo "not equal"
                >> endif
                >>
                >>
                >> Looks like your message is doing an unwanted disappearing act.
                >>
                >> Regards,
                >> Chip Campbell
                >>
                >>
                >
                > It's not as simple as that, Dr. Chip: I get 0 (zero) as reply to
                >
                > :echo ("\xe4" == "\xe4")
                >
                > when 'encoding' is UTF-8. However, the byte 0xE4 by itself is not a
                > valid character in UTF-8. I also get 1 (one) in reply to
                >
                > :echo ("ä" == "\xc3\xa4")
                >
                > where ä (a-umlaut) is Unicode codepoint U+00E4, represented in UTF-8
                > by the two bytes 0xC3 0xA4.

                That's peculiar; I get (when I source the script):
                equal!
                1

                with the following emendation to the script:

                set encoding=utf-8
                if "\xe4" == "\xe4"
                  redraw!
                  echo "equal!"
                else
                  redraw!
                  echo "not equal"
                endif
                echo ("\xe4" == "\xe4")

                But, without those redraws, I get no message.

                Regards,
                Chip Campbell
              • A.J.Mechelynck
                Message 7 of 7, Dec 1, 2006
                  Charles E Campbell Jr wrote:
                  > A.J.Mechelynck wrote:
                  [...]
                  >> It's not as simple as that, Dr. Chip: I get 0 (zero) as reply to
                  >>
                  >> :echo ("\xe4" == "\xe4")
                  >>
                  >> when 'encoding' is UTF-8. However, the byte 0xE4 by itself is not a
                  >> valid character in UTF-8. I also get 1 (one) in reply to
                  >>
                  >> :echo ("ä" == "\xc3\xa4")
                  >>
                  >> where ä (a-umlaut) is Unicode codepoint U+00E4, represented in UTF-8
                  >> by the two bytes 0xC3 0xA4.
                  >
                  > That's peculiar; I get (when I source the script):
                  > equal!
                  > 1
                  >
                  > with the following emendation to the script:
                  >
                  > set encoding=utf-8
                  > if "\xe4" == "\xe4"
                  > redraw!
                  > echo "equal!"
                  > else
                  > redraw!
                  > echo "not equal"
                  > endif
                  > echo ("\xe4" == "\xe4")
                  >
                  > But, without those redraws, I get no message.
                  >
                  > Regards,
                  > Chip Campbell
                  >
                  >

                  I still get 0 (and sourcing the above scriptlet gives me

                  not equal
                  0

                  ). In some earlier post, the OP mentioned he needed 'ignorecase' to see the
                  "unequal" behaviour (I have encoding=utf-8 ignorecase as defaults).

                  Using gvim 7.0.174, huge version with GTK2-GNOME GUI, on SuSE Linux 9.3


                  Best regards,
                  Tony.