Re: patch: Enabling utf-8 hangul input.

  • Bram Moolenaar
    Message 1 of 9, Jan 4, 2012
      I replied to Shawn Kim (long ago):

      > > >>>> In response to the following comment made by Bram on Aug 2, 2007:
      > > >>>> (can be viewed at http://groups.google.com/group/vim_dev/browse_thread/thread/3b73a504c77ba803/)
      > > >>>>
      > > >>>>> I hesitate removing the Hangul support without knowing for sure that it
      > > >>>>> is not needed. Browsing through the messages I do see remarks that it
      > > >>>>> might still be useful to a few people.
      > > >>>>>
      > > >>>>> Perhaps the Hangul support can be changed to also work for UTF-8?
      > > >>>>
      > > >>>> I (finally) made a patch that enables the hangul-input module to work
      > > >>>> with UTF-8.
      > > >>>
      > > >>> Thanks. I'm glad to finally see this implemented.
      > > >>> It still needs some work though.
      > > >>>
      > > >>>> Finally, hg diff:
      > > >>>> ... It is too long, but I cannot find a way to attach a file, so here
      > > >>>> goes the diff:
      > > >>>
      > > >>> Please do send this as an attachment. Long lines got wrapped, making it
      > > >>> impossible to apply.
      > > >>>
      > > >>> The change to getchar.c should not be there. Perhaps you are not
      > > >>> encoding the strings that go into the input buffer correctly? A CSI
      > > >>> should be put there as three characters: CSI KS_EXTRA KE_CSI.
      > > >>> I guess fix_input_buffer() can be used in push_raw_key().
      > > >>
      > > >> 1. I took a look at fix_input_buffer() and used it to "fix" the hangul input buffer.
      > > >> But the fix_input_buffer() function did not do anything.
      > > >> It escapes CSI into the K_SPECIAL KS_EXTRA KE_CSI sequence
      > > >> only when the first byte of the input buffer is CSI.
      > > >> But the hangul codes in question have 0x9b in the middle or at the end,
      > > >> e.g. EB A0 9B.
      > > >> The function never gets a chance to "fix" the buffer.
      > > >
      > > > I think that when CSI appears halfway through a UTF-8 byte sequence it
      > > > doesn't need to be escaped. Only when it's at the start of a character
      > > > does it need to be escaped, to avoid it being interpreted as a special
      > > > key byte sequence.
      > >
      > > Yes, I also believe the 0x9b in the middle of an encoded byte sequence
      > > does not need to be escaped. It's part of a valid code.
      >
      > I was wrong, it does need to be escaped. But for the GUI this happens
      > early on, not in fix_input_buffer(). See key_press_event(), first use
      > of CSI.
      >
      > > >> 2. 0x9b in the hangul codes is valid. I encoded the strings correctly.
      > > >> 0x9b (CSI) is part of the UTF-8 encoded hangul code.
      > > >
      > > > The encoding in the input buffer is a bit weird, it includes special
      > > > byte sequences, and then what the user types has to be escaped to avoid
      > > > that byte sequence being handled in the wrong way.
      > > >
      > > >> 3. Question: I guess that CSI is some kind of special character that
      > > >> indicates the subsequent characters have a special meaning, right? Then,
      > > >> in GUI mode, in what case can a user generate the CSI code?
      > > >> If I knew what CSI does and when it is generated, it would be
      > > >> much easier for me to do the job.
      > > >
      > > > In the GUI it's a bit different, we don't read raw bytes from what the
      > > > user types, but create a byte stream from events. E.g. in
      > > > src/gui_gtk_x11.c in key_press_event().
      > >
      > > The hangul input automaton is initiated from THAT routine.
      > > The following is the call stack when the hangul input automaton is in action:
      > >
      > > src/gui_gtk_x11.c: key_press_event()
      > > --> src/ui.c: add_to_input_buffer()
      > > --> src/hangulin.c: hangul_input_process() (the automaton)
      > >
      > > or
      > >
      > > src/gui_x11.c: gui_x11_key_hit_cb()
      > > --> src/ui.c: add_to_input_buffer()
      > > --> src/hangulin.c: hangul_input_process() (the automaton)
      > >
      > > The hangul_input_process() creates hangul code from what user has
      > > typed in. And then it puts the hangul code in "inbuf" buffer by
      > > calling push_raw_key().
      > >
      > > Then, somewhere along the way, "inbuf" is processed by vgetc() in
      > > src/getchar.c. The function finds the 0x9b (CSI) in the middle of
      > > the code, and the routine I commented out (src/getchar.c: vgetc())
      > > interprets the 0x9b as a special code and modifies "inbuf", although
      > > it should not be interpreted as a special key but preserved as is.
      > >
      > > Am I missing something?
      > > And what should I do to avoid interpreting 0x9b as CSI?
      > >
      > > Please note that the hangul input routine is meaningful only when
      > > the MULTIBYTE and GUI options are enabled.
      >
      > You need to do the same thing as what happens in the loop in
      > key_press_event() to escape the CSI characters.
      >
      > Also see the comment above add_to_input_buf().
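Bram's advice amounts to escaping every CSI (0x9b) byte as the three bytes CSI KS_EXTRA KE_CSI, even when it sits inside a UTF-8 sequence such as EB A0 9B, before the bytes reach the input buffer. The following is a minimal C sketch of that idea, modeled loosely on the kind of loop he points to in key_press_event(); it is not Vim's code. The function name escape_csi() is hypothetical, and the KS_EXTRA and KE_CSI values below are placeholders that would have to be taken from Vim's own headers.

```c
#include <assert.h>
#include <stddef.h>

/* CSI is 0x9b, as discussed in the thread. */
#define CSI      0x9b
/* ASSUMPTION: placeholder values, not necessarily Vim's; use keymap.h. */
#define KS_EXTRA 253
#define KE_CSI   0x01

/*
 * Hypothetical helper: copy "len" bytes from "src" to "dst", replacing
 * every CSI byte with CSI KS_EXTRA KE_CSI, regardless of where it occurs
 * inside a multibyte character.  Stops early instead of overflowing "dst"
 * and never writes a partial escape.  Returns the number of bytes written.
 */
static size_t escape_csi(const unsigned char *src, size_t len,
                         unsigned char *dst, size_t dst_size)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; ++i)
    {
        if (src[i] == CSI)
        {
            /* Need room for the complete three-byte escape. */
            if (n + 3 > dst_size)
                break;
            dst[n++] = CSI;
            dst[n++] = KS_EXTRA;
            dst[n++] = KE_CSI;
        }
        else
        {
            if (n + 1 > dst_size)
                break;
            dst[n++] = src[i];
        }
    }
    return n;
}
```

With this approach the hangul bytes EB A0 9B would become EB A0 9B followed by KS_EXTRA KE_CSI, so that vgetc() can turn the escape back into a literal 0x9b instead of treating it as the start of a special key. Calling such a step just before push_raw_key() in hangul_input_process() is one plausible place for it, but that placement is an assumption, not something confirmed in the thread.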

      Any update on this?

      --
      Compilation process failed successfully.

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
      \\\ an exciting new programming language -- http://www.Zimbu.org ///
      \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php