:helpgrep doesn't match any multi-byte words when enc!=utf-8

  • mattn
    Message 1 of 5 , Nov 28, 2011
      Hi list.

      When using Vim with encoding!=utf-8, :helpgrep doesn't match any multi-byte words.
      For Japanese, localized help files can be added with the '.jax' extension, and they must be utf-8 encoded.
      But Vim matches the search words against them in a different encoding (the current 'encoding'), without converting the help text first.
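
To illustrate the mismatch described above, here is a minimal sketch in C. It is not Vim code: `utf8_to_cp932_hiragana()` and `line_contains()` are hypothetical helpers written for this example, and the converter only handles ASCII and hiragana (U+3041 to U+3093). The point is that the same character has different bytes in utf-8 and cp932, so a byte-wise match only succeeds after the utf-8 help line has been converted to the encoding of the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (NOT Vim's string_convert()): converts a UTF-8 string
 * to cp932, supporting only ASCII and hiragana U+3041..U+3093.
 * Returns the number of bytes written (excluding the NUL), or -1 when the
 * input is unsupported or the destination buffer is too small.
 */
static int utf8_to_cp932_hiragana(const unsigned char *src,
                                  unsigned char *dst, size_t dstlen)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return -1;
    while (*src != 0) {
        if (*src < 0x80) {
            /* ASCII bytes are identical in both encodings. */
            if (n + 1 >= dstlen)
                return -1;
            dst[n++] = *src++;
        } else if ((src[0] & 0xF0) == 0xE0
                   && (src[1] & 0xC0) == 0x80
                   && (src[2] & 0xC0) == 0x80) {
            /* Decode a three-byte UTF-8 sequence into a code point. */
            unsigned cp = ((unsigned)(src[0] & 0x0F) << 12)
                        | ((unsigned)(src[1] & 0x3F) << 6)
                        | (unsigned)(src[2] & 0x3F);
            unsigned sjis;

            if (cp < 0x3041 || cp > 0x3093)
                return -1;
            /* Hiragana occupy 0x829F..0x82F1 in cp932, in the same order. */
            sjis = 0x829F + (cp - 0x3041);
            if (n + 2 >= dstlen)
                return -1;
            dst[n++] = (unsigned char)(sjis >> 8);
            dst[n++] = (unsigned char)(sjis & 0xFF);
            src += 3;
        } else {
            return -1;
        }
    }
    dst[n] = 0;
    return (int)n;
}

/* Byte-wise search, standing in for a regexp match on raw bytes. */
static int line_contains(const char *line, const char *pattern)
{
    return strstr(line, pattern) != NULL;
}
```

For example, searching the utf-8 line "help \xE3\x81\x82" for the cp932 pattern "\x82\xA0" fails with `line_contains()`, while the same search on the output of `utf8_to_cp932_hiragana()` succeeds.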

      Below is a patch. Please check and include.


      Thanks.

      diff -r 379a6398d462 src/quickfix.c
      --- a/src/quickfix.c Wed Oct 26 23:48:21 2011 +0200
      +++ b/src/quickfix.c Mon Nov 28 21:50:56 2011 +0900
      @@ -3914,6 +3914,14 @@
           regmatch.rm_ic = FALSE;
           if (regmatch.regprog != NULL)
           {
      +#ifdef FEAT_MBYTE
      + vimconv_T vc;
      +
      + vc.vc_type = CONV_NONE;
      + if (!enc_utf8)
      +    convert_setup(&vc, "utf-8", p_enc);
      +#endif
      +
        /* create a new quickfix list */
        qf_new_list(qi, *eap->cmdlinep);
       
      @@ -3948,21 +3956,31 @@
        lnum = 1;
        while (!vim_fgets(IObuff, IOSIZE, fd) && !got_int)
        {
      -    if (vim_regexec(&regmatch, IObuff, (colnr_T)0))
      +
      +    char_u    *line = IObuff;
      +#ifdef FEAT_MBYTE
      +    if (vc.vc_type != CONV_NONE) {
      + line = string_convert(&vc, IObuff, NULL);
      + if (!line)
      +    line = IObuff;
      +    }
      +#endif
      +
      +    if (vim_regexec(&regmatch, line, (colnr_T)0))
           {
      - int l = (int)STRLEN(IObuff);
      + int l = (int)STRLEN(line);
       
        /* remove trailing CR, LF, spaces, etc. */
      - while (l > 0 && IObuff[l - 1] <= ' ')
      -     IObuff[--l] = NUL;
      + while (l > 0 && line[l - 1] <= ' ')
      +     line[--l] = NUL;
       
        if (qf_add_entry(qi, &prevp,
           NULL, /* dir */
           fnames[fi],
           0,
      -    IObuff,
      +    line,
           lnum,
      -    (int)(regmatch.startp[0] - IObuff)
      +    (int)(regmatch.startp[0] - line)
        + 1, /* col */
           FALSE, /* vis_col */
           NULL, /* search pattern */
      @@ -3975,6 +3993,11 @@
           break;
        }
           }
      +#ifdef FEAT_MBYTE
      +    if (line != IObuff)
      + vim_free(line);
      +#endif
      +
           ++lnum;
           line_breakcheck();
        }
      @@ -3990,6 +4013,11 @@
        qi->qf_lists[qi->qf_curlist].qf_ptr =
           qi->qf_lists[qi->qf_curlist].qf_start;
        qi->qf_lists[qi->qf_curlist].qf_index = 1;
      +
      +#ifdef FEAT_MBYTE
      + if (vc.vc_type != CONV_NONE)
      +    convert_setup(&vc, NULL, NULL);
      +#endif
           }
       
           if (p_cpo == empty_option)
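
The patch above follows a "convert, fall back to the raw buffer, free only what was allocated" pattern. The sketch below shows that ownership pattern in isolation. It is an illustration under assumptions, not Vim code: `match_line()`, `convert_fn`, `upper_copy()` and `always_fail()` are hypothetical names, and uppercasing merely stands in for an encoding conversion. In the patch, `vim_free(line)` sits after the block that contains `break`, so it may be worth checking that the early-exit path also releases the converted copy; funnelling every path through a single free, as here, is one way to do that.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical converter type: returns a malloc'ed copy, or NULL on failure. */
typedef char *(*convert_fn)(const char *src);

/*
 * Returns 1 when `needle` occurs in the (possibly converted) line, else 0.
 * If conversion fails, the raw line is matched instead, as in the patch.
 */
static int match_line(const char *raw, convert_fn conv, const char *needle)
{
    char *owned = NULL;         /* non-NULL only when a copy was allocated */
    const char *line = raw;
    int found;

    assert(raw != NULL && needle != NULL);
    if (conv != NULL) {
        owned = conv(raw);
        if (owned != NULL)
            line = owned;       /* conversion worked: match the converted text */
    }
    found = strstr(line, needle) != NULL;
    free(owned);                /* free(NULL) is a no-op: one exit path suffices */
    return found;
}

/* Stand-in "conversion" for the example: an uppercased copy of the input. */
static char *upper_copy(const char *src)
{
    size_t len = strlen(src);
    char *dst = malloc(len + 1);
    size_t i;

    if (dst == NULL)
        return NULL;
    for (i = 0; i <= len; i++)
        dst[i] = (char)toupper((unsigned char)src[i]);
    return dst;
}

/* Stand-in for a conversion that always fails. */
static char *always_fail(const char *src)
{
    (void)src;
    return NULL;
}
```

With these stand-ins, `match_line("help text", upper_copy, "HELP")` matches on the converted copy, and `match_line("help text", always_fail, "help")` falls back to the raw line.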

      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php
    • Bram Moolenaar
      Message 2 of 5 , Nov 28, 2011
        Yasuhiro Matsumoto wrote:

        > When using vim with encoding!=utf-8, :helpgrep don't match any multi-byte
        > words.
        > For japanese, '.jax' is possible to add localized help files. And it must
        > be utf-8 encoded.
        > But vim is matching the words with different encoding.
        >
        > Below is a patch. Please check and include.

        Thanks for the patch, I'll look into it later.

        I wonder how many users still use another encoding than utf-8.
        Unicode makes life so much simpler.

        --
        Some of the well known MS-Windows errors:
        EMEMORY Memory error caused by..., eh...
        ELICENSE Your license has expired, give us more money!
        EMOUSE Mouse moved, reinstall Windows
        EILLEGAL Illegal error, you are not allowed to see this
        EVIRUS Undetectable virus found

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ an exciting new programming language -- http://www.Zimbu.org ///
        \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

      • Gary Johnson
        Message 3 of 5 , Nov 28, 2011
          On 2011-11-28, Bram Moolenaar wrote:
          > Yasuhiro Matsumoto wrote:
          >
          > > When using vim with encoding!=utf-8, :helpgrep don't match any multi-byte
          > > words.
          > > For japanese, '.jax' is possible to add localized help files. And it must
          > > be utf-8 encoded.
          > > But vim is matching the words with different encoding.
          > >
          > > Below is a patch. Please check and include.
          >
          > Thanks for the patch, I'll look into it later.
          >
          > I wonder how many users still use another encoding than utf-8.
          > Unicode makes life so much simpler.

          Where I used to work, all the Unix systems hadn't been updated in
          years and were still using ISO 8859-1. That was a year ago and I
          doubt that it's changed.

          Regards,
          Gary

        • mattn
          Message 4 of 5 , Nov 28, 2011
            Add one thing.

            It's better to check that the encoding isn't latin1-like:
            https://gist.github.com/1400283

            > I wonder how many users still use another encoding than utf-8.
            > Unicode makes life so much simpler.

            A lot of Windows users probably use DBCS encodings.
            For example, the order of character codes is not the same in cp932 and utf-8, so we would have to change how regular expressions such as [ア-ンあ-ん] are written to match Japanese kana.

            Also, I think some applications don't work correctly with utf-8 encoding even on Unix, e.g. because of ambiguous character widths, a lack of fixed-width fonts, dictionaries, etc.

            diff -r 379a6398d462 src/quickfix.c
            --- a/src/quickfix.c Wed Oct 26 23:48:21 2011 +0200
            +++ b/src/quickfix.c Tue Nov 29 09:00:19 2011 +0900
            @@ -3914,6 +3914,14 @@
                 regmatch.rm_ic = FALSE;
                 if (regmatch.regprog != NULL)
                 {
            +#ifdef FEAT_MBYTE
            + vimconv_T vc;
            +
            + vc.vc_type = CONV_NONE;
            + if (!enc_utf8 && !enc_latin1like)
            +    convert_setup(&vc, "utf-8", p_enc);
            +#endif
            +
              /* create a new quickfix list */
              qf_new_list(qi, *eap->cmdlinep);
             
            @@ -3948,21 +3956,31 @@
              lnum = 1;
              while (!vim_fgets(IObuff, IOSIZE, fd) && !got_int)
              {
            -    if (vim_regexec(&regmatch, IObuff, (colnr_T)0))
            +
            +    char_u    *line = IObuff;
            +#ifdef FEAT_MBYTE
            +    if (vc.vc_type != CONV_NONE) {
            + line = string_convert(&vc, IObuff, NULL);
            + if (!line)
            +    line = IObuff;
            +    }
            +#endif
            +
            +    if (vim_regexec(&regmatch, line, (colnr_T)0))
                 {
            - int l = (int)STRLEN(IObuff);
            + int l = (int)STRLEN(line);
             
              /* remove trailing CR, LF, spaces, etc. */
            - while (l > 0 && IObuff[l - 1] <= ' ')
            -     IObuff[--l] = NUL;
            + while (l > 0 && line[l - 1] <= ' ')
            +     line[--l] = NUL;
             
              if (qf_add_entry(qi, &prevp,
                 NULL, /* dir */
                 fnames[fi],
                 0,
            -    IObuff,
            +    line,
                 lnum,
            -    (int)(regmatch.startp[0] - IObuff)
            +    (int)(regmatch.startp[0] - line)
              + 1, /* col */
                 FALSE, /* vis_col */
                 NULL, /* search pattern */
            @@ -3975,6 +3993,11 @@
                 break;
              }
                 }
            +#ifdef FEAT_MBYTE
            +    if (line != IObuff)
            + vim_free(line);
            +#endif
            +
                 ++lnum;
                 line_breakcheck();
              }
            @@ -3990,6 +4013,11 @@
              qi->qf_lists[qi->qf_curlist].qf_ptr =
                 qi->qf_lists[qi->qf_curlist].qf_start;
              qi->qf_lists[qi->qf_curlist].qf_index = 1;
            +
            +#ifdef FEAT_MBYTE
            + if (vc.vc_type != CONV_NONE)
            +    convert_setup(&vc, NULL, NULL);
            +#endif
                 }
             
                 if (p_cpo == empty_option)

          • mattn
            Message 5 of 5 , Nov 28, 2011
              Bram

              > if (!enc_utf8 && !enc_latin1like) 

              If the part '&& !enc_latin1like' is not needed, please remove it.
