[vim-multibyte] Illegal match at multibyte trail

  • Taro Muraoka
    Message 1 of 4 , Jan 15, 2000
      -Simple search in search.c. (2K)
      Please test.
      ----
      Taro Muraoka <koron@...>


      Problem: An illegal match at the trail byte of a multibyte character caused trouble.
      Solution: Add code that takes multibyte characters into account. Advance the
      pointer by 2 bytes if the character pointed to is the lead byte of a multibyte character.
      Files: src/search.c
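
The skipping rule described in the Solution line can be sketched on its own, outside Vim. This is only an illustration under stated assumptions: a two-byte encoding with the usual Shift-JIS lead byte ranges (0x81-0x9F and 0xE0-0xFC), and the hypothetical names `is_lead_byte()` and `pattern_has_upper()`, which are not Vim's functions.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper (not Vim's IsLeadByte): true if c is a
 * Shift-JIS lead byte, assuming the usual ranges. */
static bool is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Return true if pat contains an uppercase ASCII letter.
 * Both bytes of a double-byte character are skipped, so a trail
 * byte that happens to equal 'A'..'Z' is never tested. */
static bool pattern_has_upper(const char *pat)
{
    const unsigned char *p = (const unsigned char *)pat;

    while (*p != '\0')
    {
        if (is_lead_byte(*p))
        {
            ++p;                /* step onto the trail byte */
            if (*p == '\0')     /* truncated character at end of string */
                break;
            ++p;                /* step past the trail byte */
        }
        else
        {
            if (isupper(*p))
                return true;
            ++p;
        }
    }
    return false;
}
```

For example, the Shift-JIS bytes 0x83 0x41 have a trail byte equal to ASCII 'A'; with the skip in place, `pattern_has_upper("\x83\x41")` reports no uppercase letter, while a plain byte-by-byte `isupper()` loop would.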


      *** ./src.orig/search.c Thu Dec 30 07:41:50 1999
      --- ./src/search.c Sat Jan 15 14:45:06 2000
      ***************
      *** 265,270 ****
      --- 265,276 ----
      {
      /* don't ignore case if pattern has uppercase */
      for (p = pat; *p; )
      + #ifdef MULTI_BYTE
      + if( IsLeadByte( *p ) ){
      + if( !*++p ) break;
      + ++p;
      + }else
      + #endif
      if (isupper(*p++))
      reg_ic = FALSE;
      }
      ***************
      *** 1081,1088 ****
      --- 1087,1102 ----
      {
      if ((col += dir) < 0 || col >= len)
      return FALSE;
      + #ifdef MULTI_BYTE
      + if (dir < 0 && IsTrailByte(p, &p[col]))
      + continue; /* skip multibyte's trail byte */
      + #endif
      if (p[col] == c)
      break;
      + #ifdef MULTI_BYTE
      + if (dir > 0 && IsLeadByte(p[col]))
      + col += dir; /* skip multibyte's trail byte */
      + #endif
      }
      }
      if (type)
      ***************
      *** 1421,1427 ****
      --- 1435,1450 ----
      comment_col = check_linecomment(linep);
      }
      else
      + #if defined(MULTI_BYTE) && defined(WIN32)
      + {
      + --pos.col;
      + if ( is_dbcs == (int)DBCS_JPN &&
      + IsTrailByte(linep, linep + pos.col) )
      + --pos.col;
      + }
      + #else
      --pos.col;
      + #endif
      }
      else /* forward search */
      {
      ***************
      *** 1440,1446 ****
      --- 1463,1478 ----
      line_breakcheck();
      }
      else
      + #if defined(MULTI_BYTE) && defined(WIN32)
      + {
      + ++pos.col;
      + if ( is_dbcs == (int)DBCS_JPN &&
      + IsTrailByte(linep, linep + pos.col) )
      + ++pos.col;
      + }
      + #else
      ++pos.col;
      + #endif
      }

      /*
    • Bram Moolenaar
      Message 2 of 4 , Jan 15, 2000
        Taro Muraoka wrote:

        > -Simple search in search.c. (2K)
        > Please test.

        > Problem: An illegal match at the trail byte of a multibyte character caused trouble.
        > Solution: Add code that takes multibyte characters into account. Advance the
        > pointer by 2 bytes if the character pointed to is the lead byte of a multibyte character.
        > Files: src/search.c

        Looks like a good fix. However, I don't see why is_dbcs isn't tested in the
        first two locations, and specifically compared to DBCS_JPN in the second two.

        Also, I don't understand why the second two should only be used for Win32.
        Wouldn't they be required for all systems?

        Please consider this change instead:

        *** ../vim-5.6a.28/src/search.c Wed Dec 29 12:11:01 1999
        --- src/search.c Sat Jan 15 21:12:19 2000
        ***************
        *** 265,272 ****
        {
        /* don't ignore case if pattern has uppercase */
        for (p = pat; *p; )
        ! if (isupper(*p++))
        ! reg_ic = FALSE;
        }
        no_smartcase = FALSE;
        }
        --- 265,283 ----
        {
        /* don't ignore case if pattern has uppercase */
        for (p = pat; *p; )
        ! {
        ! #ifdef MULTI_BYTE
        ! if (is_dbcs && IsLeadByte(*p))
        ! {
        ! if (*++p == NUL)
        ! break;
        ! ++p;
        ! }
        ! else
        ! #endif
        ! if (isupper(*p++))
        ! reg_ic = FALSE;
        ! }
        }
        no_smartcase = FALSE;
        }
        ***************
        *** 1081,1088 ****
        --- 1092,1107 ----
        {
        if ((col += dir) < 0 || col >= len)
        return FALSE;
        + #ifdef MULTI_BYTE
        + if (is_dbcs && dir < 0 && IsTrailByte(p, &p[col]))
        + continue; /* skip multibyte's trail byte */
        + #endif
        if (p[col] == c)
        break;
        + #ifdef MULTI_BYTE
        + if (is_dbcs && dir > 0 && IsLeadByte(p[col]))
        + ++col; /* skip multibyte's trail byte */
        + #endif
        }
        }
        if (type)
        ***************
        *** 1421,1427 ****
        --- 1440,1452 ----
        comment_col = check_linecomment(linep);
        }
        else
        + {
        --pos.col;
        + #ifdef MULTI_BYTE
        + if (is_dbcs && IsTrailByte(linep, linep + pos.col))
        + --pos.col;
        + #endif
        + }
        }
        else /* forward search */
        {
        ***************
        *** 1440,1446 ****
        --- 1465,1478 ----
        line_breakcheck();
        }
        else
        + {
        + #ifdef MULTI_BYTE
        + if (is_dbcs && IsLeadByte(linep[pos.col])
        + && linep[pos.col + 1] != NUL)
        + ++pos.col;
        + #endif
        ++pos.col;
        + }
        }

        /*

        --
        hundred-and-one symptoms of being an internet addict:
        187. You promise yourself that you'll only stay online for another
        15 minutes...at least once every hour.

        --/-/---- Bram Moolenaar ---- Bram@... ---- Bram@... ---\-\--
        \ \ www.vim.org/iccf www.moolenaar.net www.vim.org / /
      • Taro Muraoka
        Message 3 of 4 , Jan 15, 2000
          Bram Moolenaar wrote:
          > Looks like a good fix. However, I don't see why is_dbcs isn't tested in the
          > first two locations, and specifically compared to DBCS_JPN in the second two.
          >
          > Also, I don't understand why the second two should only be used for Win32.
          > Wouldn't they be required for all systems?
          >
          > Please consider this change instead:
          >
          > *** ../vim-5.6a.28/src/search.c Wed Dec 29 12:11:01 1999
          > --- src/search.c Sat Jan 15 21:12:19 2000
          :
          :
          (omit)

          It works very well. Thank you, Bram, for the fix.

          The first two locations were written by me. I had forgotten to think
          about 8-bit characters. The second two were written by someone else. He
          probably thought that IsLeadByte() is very expensive and that this code is
          not needed for EUC, the encoding used on UNIX. Multibyte characters in EUC
          consist of bytes in 0x80-0xFF, but multibyte characters in Shift-JIS
          consist of bytes in 0x40-0xFF.
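
The difference in byte ranges can be shown with a small sketch. It assumes the usual encodings of the katakana character "so": 0x83 0x5C in Shift-JIS and 0xA5 0xBD in EUC-JP. The name `naive_contains()` is hypothetical and only stands for a search that compares single bytes without regard to character boundaries.

```c
#include <stdbool.h>
#include <string.h>

/* The same character in two encodings (assumed byte values). */
static const char sjis_so[] = "\x83\x5C";   /* trail byte 0x5C == '\\' */
static const char euc_so[]  = "\xA5\xBD";   /* both bytes are >= 0x80  */

/* Naive byte search: true if byte c occurs anywhere in s,
 * regardless of character boundaries. */
static bool naive_contains(const char *s, char c)
{
    return strchr(s, c) != NULL;
}
```

Under these assumptions, `naive_contains(sjis_so, '\\')` is true although the text contains no backslash, while `naive_contains(euc_so, '\\')` is false, because no byte of an EUC multibyte character falls in the ASCII range.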

          ----
          Taro Muraoka mailto:koron@...
        • Bram Moolenaar
          Message 4 of 4 , Jan 16, 2000
            Taro Muraoka wrote:

            > It works very well. Thank you, Bram, for the fix.
            >
            > The first two locations were written by me. I had forgotten to think
            > about 8-bit characters. The second two were written by someone else. He
            > probably thought that IsLeadByte() is very expensive and that this code is
            > not needed for EUC, the encoding used on UNIX. Multibyte characters in EUC
            > consist of bytes in 0x80-0xFF, but multibyte characters in Shift-JIS
            > consist of bytes in 0x40-0xFF.

            Thanks for checking. I have included these fixes. I didn't include the one
            for the regexp code yet, I'll do that later.

            IsLeadByte() isn't high cost, but IsTrailByte() is, because it starts looking
            at the start of the string.
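
A rough sketch of why a trail-byte test has to walk from the start of the string. It is not Vim's IsTrailByte(); `is_lead_byte()` and `is_trail_byte()` are hypothetical stand-ins that assume Shift-JIS lead byte ranges and a well-formed string, with `pos` pointing inside it.

```c
#include <stdbool.h>

/* Hypothetical helper: true if c is a Shift-JIS lead byte
 * (assumed ranges 0x81-0x9F and 0xE0-0xFC). */
static bool is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* True if *pos is the second byte of a double-byte character.
 * A byte such as 0x83 can be either a lead byte or a trail byte,
 * so the only reliable way to tell is to walk character by
 * character from base; the cost grows with (pos - base). */
static bool is_trail_byte(const unsigned char *base,
                          const unsigned char *pos)
{
    const unsigned char *p = base;

    while (p < pos)
    {
        if (is_lead_byte(*p))
        {
            if (p + 1 == pos)
                return true;    /* pos is the byte right after a lead byte */
            p += 2;             /* skip the whole double-byte character */
        }
        else
            ++p;                /* single-byte character */
    }
    return false;
}
```

The lead-byte test is a constant-time range check, whereas the trail-byte test is linear in the column, which matches the cost difference described above.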

            This multi-byte code is really two-byte code. For Vim 6.0 we will have to
            change it all drastically to support UTF-8. Perhaps the multi-byte codes that
            are supported until now can be converted to UTF-8? The main advantage will be
            that you can edit two buffers with different encoding at the same time.

            --
            hundred-and-one symptoms of being an internet addict:
            193. You ask your girlfriend to drive home so you can sit back with
            your PDA and download the information to your laptop

            --/-/---- Bram Moolenaar ---- Bram@... ---- Bram@... ---\-\--
            \ \ www.vim.org/iccf www.moolenaar.net www.vim.org / /