
vim7: spell and non-ascii letters (word border problems?)

  • Mikolaj Machowski
    Message 1 of 5 , Jul 3, 2005
      Hello,

      Using today's CVS.

      (post should be in iso-8859-2)

      This is similar to a problem from the past: non-ASCII letters being
      treated as word borders.

      Settings:

      set spell
      set spelllang=pl (the same behaviour with pl,en)
      set encoding=iso-8859-2

      Spell file freshly recreated from the sources with the attached patch.

      Word:

      dług <- second letter is l with slash, 179 (hex b3) in iso-8859-2 enc.

      When typing

      dł

      d is highlighted as a bad word

      dłu

      d is still highlighted as a bad word

      dług

      the highlighting vanishes because the whole "dług" is a valid word (debt).

      It looks like a space counts as a stronger word border than a non-ASCII
      letter. (OTOH words are properly added to the 'spellfile'.)
      This disrupts writing a great deal. I gave only the simplest example I
      could find. The same behaviour shows up when typing words like grzęda
      (here "grz" is highlighted until d is typed, when "grzęd" becomes a
      valid word (perches)), część, język, powyższe, etc. There are many
      situations where the user is alarmed by highlighting of perfectly
      legitimate text.

      Also, when a bad word contains a non-ASCII letter, only the part up to
      that letter (or after that letter) is highlighted:

      Śniadecki, siglów
       ^^^^^^^^  ^^^^
      Parts underlined with ^ are highlighted as bad.
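
The symptom described above can be reproduced in miniature. The following is a minimal sketch, not Vim code: the `word_table` type and both functions are hypothetical names invented for illustration. It assumes a byte-indexed word-character table in which only ASCII letters are marked, so the byte 0xb3 (ł in iso-8859-2) ends the word after "d".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical word-character table: isw[c] != 0 means byte c is part of a word. */
typedef struct {
    unsigned char isw[256];
} word_table;

/* Mark only ASCII letters as word characters; bytes >= 128 stay unmarked. */
static void table_init_ascii(word_table *t)
{
    int c;

    memset(t->isw, 0, sizeof t->isw);
    for (c = 'a'; c <= 'z'; ++c)
        t->isw[c] = 1;
    for (c = 'A'; c <= 'Z'; ++c)
        t->isw[c] = 1;
}

/* Return the number of leading bytes of "s" that are word characters in "t". */
static size_t word_len(const word_table *t, const char *s)
{
    size_t n = 0;

    while (s[n] != '\0' && t->isw[(unsigned char)s[n]])
        ++n;
    return n;
}
```

With this table, `word_len()` on the bytes of "dług" stops after the first byte, which matches the lone highlighted "d". Marking 0xb3 as a word character makes the same call cover all four bytes.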

      m.

      --
      LaTeX + Vim = http://vim-latex.sourceforge.net/
      Vim-list(s) Users Map: (last change 15 May)
      http://skawina.eu.org/mikolaj/vimlist
      CLEWN - http://clewn.sf.net
    • Bram Moolenaar
      Message 2 of 5 , Jul 4, 2005
        Mikolaj Machowski wrote:

        > Today CVS.
        >
        > (post should be in iso-8859-2)
        >
        > This is similar to one problem from the past - considering non-ascii
        > letters as word borders.
        >
        > Settings:
        >
        > set spell
        > set spelllang=pl (the same behaviour with pl,en)
        > set encoding=iso-8859-2
        >
        > Freshly recreated spell file with attached patch from sources
        >
        > Word:
        >
        > dług <- second letter is l with slash, 179 (hex b3) in iso-8859-2 enc.
        >
        > When typing
        >
        > dł
        >
        > d is highlighted as bad word
        >
        > dłu
        >
        > d is still highlighted as bad word
        >
        > dług
        >
        > highlighting vanishes because whole "dług" is valid word (debt).
        >
        > Looks like space is more important as word border than non-ascii letter.
        > (OTOH words are properly added to the 'spellfile')
        > This situation disturbs writing very, very much. I gave only simplest
        > example I could find. Behaviour can be observed also when typing words
        > like: grzęda (here "grz" is highlighted up to typing of d when "grzęd"
        > becomes valid word (perches)), część, język, powyższe, etc. There are
        > really many situations when user will be alarmed with highlighting in
        > completely legit situations.
        >
        > Also when bad word contain non-ascii letter only part up to that letter
        > (or after that letter) is highlighted:
        >
        > Śniadecki, siglów
        >  ^^^^^^^^  ^^^^
        > Parts underscored with ^ are highlighted as bad.

        There was a problem in reading the list with word characters from the
        .spl file. This patch should fix it:

        Index: spell.c
        ===================================================================
        RCS file: /cvsroot/vim/vim7/src/spell.c,v
        retrieving revision 1.34
        diff -u -r1.34 spell.c
        --- spell.c 3 Jul 2005 21:25:22 -0000 1.34
        +++ spell.c 4 Jul 2005 09:01:49 -0000
        @@ -591,7 +591,7 @@
        static void int_wordlist_spl __ARGS((char_u *fname));
        static void spell_load_cb __ARGS((char_u *fname, void *cookie));
        static slang_T *spell_load_file __ARGS((char_u *fname, char_u *lang, slang_T *old_lp, int silent));
        -static char_u *read_cnt_string __ARGS((FILE *fd, int cnt_bytes, int *errp));
        +static char_u *read_cnt_string __ARGS((FILE *fd, int cnt_bytes, int *lenp));
        static int set_sofo __ARGS((slang_T *lp, char_u *from, char_u *to));
        static void set_sal_first __ARGS((slang_T *lp));
        #ifdef FEAT_MBYTE
        @@ -603,7 +603,7 @@
        static int find_region __ARGS((char_u *rp, char_u *region));
        static int captype __ARGS((char_u *word, char_u *end));
        static void spell_reload_one __ARGS((char_u *fname, int added_word));
        -static int set_spell_charflags __ARGS((char_u *flags, char_u *upp));
        +static int set_spell_charflags __ARGS((char_u *flags, int cnt, char_u *upp));
        static int set_spell_chartab __ARGS((char_u *fol, char_u *low, char_u *upp));
        static void write_spell_chartab __ARGS((FILE *fd));
        static int spell_casefold __ARGS((char_u *p, int len, char_u *buf, int buflen));
        @@ -1837,12 +1837,12 @@

        /* <charflagslen> <charflags> */
        p = read_cnt_string(fd, 1, &cnt);
        - if (cnt == FAIL)
        + if (cnt < 0)
        goto endFAIL;

        /* <fcharslen> <fchars> */
        - fol = read_cnt_string(fd, 2, &cnt);
        - if (cnt == FAIL)
        + fol = read_cnt_string(fd, 2, &ccnt);
        + if (ccnt < 0)
        {
        vim_free(p);
        goto endFAIL;
        @@ -1850,7 +1850,7 @@

        /* Set the word-char flags and fill SPELL_ISUPPER() table. */
        if (p != NULL && fol != NULL)
        - i = set_spell_charflags(p, fol);
        + i = set_spell_charflags(p, cnt, fol);

        vim_free(p);
        vim_free(fol);
        @@ -1861,7 +1861,7 @@

        /* <midwordlen> <midword> */
        lp->sl_midword = read_cnt_string(fd, 2, &cnt);
        - if (cnt == FAIL)
        + if (cnt < 0)
        goto endFAIL;

        /* <prefcondcnt> <prefcond> ... */
        @@ -1912,10 +1912,10 @@
        {
        ftp = &((fromto_T *)gap->ga_data)[gap->ga_len];
        ftp->ft_from = read_cnt_string(fd, 1, &i);
        - if (i == FAIL)
        + if (i <= 0)
        goto endFAIL;
        ftp->ft_to = read_cnt_string(fd, 1, &i);
        - if (i == FAIL)
        + if (i <= 0)
        {
        vim_free(ftp->ft_from);
        goto endFAIL;
        @@ -1957,19 +1957,24 @@

        /* <salfromlen> <salfrom> */
        bp = read_cnt_string(fd, 2, &cnt);
        - if (cnt == FAIL)
        + if (cnt < 0)
        goto endFAIL;

        /* <saltolen> <salto> */
        fol = read_cnt_string(fd, 2, &cnt);
        - if (cnt == FAIL)
        + if (cnt < 0)
        {
        vim_free(bp);
        goto endFAIL;
        }

        /* Store the info in lp->sl_sal and/or lp->sl_sal_first. */
        - i = set_sofo(lp, bp, fol);
        + if (bp != NULL && fol != NULL)
        + i = set_sofo(lp, bp, fol);
        + else if (bp != NULL || fol != NULL)
        + i = FAIL; /* only one of two strings is an error */
        + else
        + i = OK;

        vim_free(bp);
        vim_free(fol);
        @@ -2036,7 +2041,7 @@

        /* <saltolen> <salto> */
        smp->sm_to = read_cnt_string(fd, 1, &ccnt);
        - if (ccnt == FAIL)
        + if (ccnt < 0)
        {
        vim_free(smp->sm_lead);
        goto formerr;
        @@ -2052,10 +2057,13 @@
        smp->sm_oneof_w = NULL;
        else
        smp->sm_oneof_w = mb_str2wide(smp->sm_oneof);
        - smp->sm_to_w = mb_str2wide(smp->sm_to);
        + if (smp->sm_to == NULL)
        + smp->sm_to_w = NULL;
        + else
        + smp->sm_to_w = mb_str2wide(smp->sm_to);
        if (smp->sm_lead_w == NULL
        || (smp->sm_oneof_w == NULL && smp->sm_oneof != NULL)
        - || smp->sm_to_w == NULL)
        + || (smp->sm_to_w == NULL && smp->sm_to != NULL))
        {
        vim_free(smp->sm_lead);
        vim_free(smp->sm_to);
        @@ -2074,11 +2082,13 @@

        /* <maplen> <mapstr> */
        p = read_cnt_string(fd, 2, &cnt);
        - if (cnt == FAIL)
        + if (cnt < 0)
        goto endFAIL;
        - set_map_str(lp, p);
        - vim_free(p);
        -
        + if (p != NULL)
        + {
        + set_map_str(lp, p);
        + vim_free(p);
        + }

        /* round 1: <LWORDTREE>
        * round 2: <KWORDTREE>
        @@ -2155,13 +2165,13 @@
        * Read a length field from "fd" in "cnt_bytes" bytes.
        * Allocate memory, read the string into it and add a NUL at the end.
        * Returns NULL when the count is zero.
        - * Sets "*errp" to FAIL when there is an error, OK otherwise.
        + * Sets "*cntp" to -1 when there is an error, length of the result otherwise.
        */
        static char_u *
        -read_cnt_string(fd, cnt_bytes, errp)
        +read_cnt_string(fd, cnt_bytes, cntp)
        FILE *fd;
        int cnt_bytes;
        - int *errp;
        + int *cntp;
        {
        int cnt = 0;
        int i;
        @@ -2173,18 +2183,20 @@
        if (cnt < 0)
        {
        EMSG(_(e_spell_trunc));
        - *errp = FAIL;
        + *cntp = -1;
        return NULL;
        }
        + *cntp = cnt;
        + if (cnt == 0)
        + return NULL; /* nothing to read, return NULL */

        /* allocate memory */
        str = alloc((unsigned)cnt + 1);
        if (str == NULL)
        {
        - *errp = FAIL;
        + *cntp = -1;
        return NULL;
        }
        - *errp = OK;

        /* Read the string. Doesn't check for truncated file. */
        for (i = 0; i < cnt; ++i)
        @@ -2697,6 +2709,9 @@
        {
        char_u *p;

        + if (lp->sl_midword == NULL) /* there aren't any */
        + return;
        +
        for (p = lp->sl_midword; *p != NUL; )
        #ifdef FEAT_MBYTE
        if (has_mbyte)
        @@ -5604,34 +5619,39 @@
        * Set the spell character tables from strings in the .spl file.
        */
        static int
        -set_spell_charflags(flags, upp)
        +set_spell_charflags(flags, cnt, fol)
        char_u *flags;
        - char_u *upp;
        + int cnt; /* length of "flags" */
        + char_u *fol;
        {
        /* We build the new tables here first, so that we can compare with the
        * previous one. */
        spelltab_T new_st;
        int i;
        - char_u *p = upp;
        + char_u *p = fol;
        int c;

        clear_spell_chartab(&new_st);

        - for (i = 0; flags[i] != NUL; ++i)
        + for (i = 0; i < 128; ++i)
        {
        - new_st.st_isw[i + 128] = (flags[i] & CF_WORD) != 0;
        - new_st.st_isu[i + 128] = (flags[i] & CF_UPPER) != 0;
        + if (i < cnt)
        + {
        + new_st.st_isw[i + 128] = (flags[i] & CF_WORD) != 0;
        + new_st.st_isu[i + 128] = (flags[i] & CF_UPPER) != 0;
        + }

        - if (*p == NUL)
        - return FAIL;
        + if (*p != NUL)
        + {
        #ifdef FEAT_MBYTE
        - c = mb_ptr2char_adv(&p);
        + c = mb_ptr2char_adv(&p);
        #else
        - c = *p++;
        + c = *p++;
        #endif
        - new_st.st_fold[i + 128] = c;
        - if (i + 128 != c && new_st.st_isu[i + 128] && c < 256)
        - new_st.st_upper[c] = i + 128;
        + new_st.st_fold[i + 128] = c;
        + if (i + 128 != c && new_st.st_isu[i + 128] && c < 256)
        + new_st.st_upper[c] = i + 128;
        + }
        }

        return set_spell_finish(&new_st);
        @@ -8836,6 +8856,8 @@

        /* replace string */
        s = smp[n].sm_to;
        + if (s == NULL)
        + s = (char_u *)"";
        pf = smp[n].sm_rules;
        p0 = (vim_strchr(pf, '<') != NULL) ? 1 : 0;
        if (p0 == 1 && z == 0)
        @@ -9138,18 +9160,20 @@
        if (p0 == 1 && z == 0)
        {
        /* rule with '<' is used */
        - if (reslen > 0 && *ws != NUL && (wres[reslen - 1] == c
        + if (reslen > 0 && ws != NULL && *ws != NUL
        + && (wres[reslen - 1] == c
        || wres[reslen - 1] == *ws))
        reslen--;
        z0 = 1;
        z = 1;
        k0 = 0;
        - while (*ws != NUL && word[i + k0] != NUL)
        - {
        - word[i + k0] = *ws;
        - k0++;
        - ws++;
        - }
        + if (ws != NULL)
        + while (*ws != NUL && word[i + k0] != NUL)
        + {
        + word[i + k0] = *ws;
        + k0++;
        + ws++;
        + }
        if (k > k0)
        mch_memmove(word + i + k0, word + i + k,
        sizeof(int) * (STRLEN(word + i + k) + 1));
        @@ -9162,14 +9186,19 @@
        /* no '<' rule used */
        i += k - 1;
        z = 0;
        - while (*ws != NUL && ws[1] != NUL && reslen < MAXWLEN)
        - {
        - if (reslen == 0 || wres[reslen - 1] != *ws)
        - wres[reslen++] = *ws;
        - ws++;
        - }
        + if (ws != NULL)
        + while (*ws != NUL && ws[1] != NUL
        + && reslen < MAXWLEN)
        + {
        + if (reslen == 0 || wres[reslen - 1] != *ws)
        + wres[reslen++] = *ws;
        + ws++;
        + }
        /* new "actual letter" */
        - c = *ws;
        + if (ws == NULL)
        + c = NUL;
        + else
        + c = *ws;
        if (strstr((char *)s, "^^") != NULL)
        {
        if (c != NUL)
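
One plausible reading of the set_spell_charflags() hunk above: the flag bytes are raw values, so a character with no flags set is stored as a zero byte, and a loop that runs until `flags[i] != NUL` fails can stop before it reaches the non-ASCII letters. The sketch below is not Vim code. The function names are hypothetical and the value of `CF_WORD` is an assumption; it only contrasts a NUL-terminated scan with a length-counted scan over the same flag bytes.

```c
#include <string.h>

#define CF_WORD 0x01    /* assumed value: byte is a word character */

/* NUL-terminated scan: stops at the first flag byte that is zero.
 * Returns the number of bytes marked as word characters in "isw". */
static int mark_word_chars_nul(const unsigned char *flags, unsigned char *isw)
{
    int i;
    int n = 0;

    for (i = 0; i < 128 && flags[i] != 0; ++i)
    {
        isw[i + 128] = (flags[i] & CF_WORD) != 0;
        n += isw[i + 128];
    }
    return n;
}

/* Counted scan: visits all "cnt" flag bytes, zero or not.
 * Returns the number of bytes marked as word characters in "isw". */
static int mark_word_chars_cnt(const unsigned char *flags, int cnt,
                               unsigned char *isw)
{
    int i;
    int n = 0;

    for (i = 0; i < 128 && i < cnt; ++i)
    {
        isw[i + 128] = (flags[i] & CF_WORD) != 0;
        n += isw[i + 128];
    }
    return n;
}
```

If the flags for bytes 128 and up start with zeros (as they would when the first entries are control characters with no flags), the first function marks nothing, while the second still reaches the entry for 0xb3.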


        --
        hundred-and-one symptoms of being an internet addict:
        220. Your wife asks for sex and you tell her where to find you on IRC.

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
        \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///
      • Mikolaj Machowski
        Message 3 of 5 , Jul 4, 2005
          On Monday 04 July 2005 12:27, Bram Moolenaar wrote:
          >
          > There was a problem in reading the list with word characters from the
          > .spl file. This patch should fix it:
          >
          Patch malformed at line 41.

          Or when saving message in other way:

          patch v. 2.5.9

          mikolaj@localhost ~/vim7/src $ patch -p0 --dry-run < dlug2
          patching file spell.c
          Hunk #3 FAILED at 1837.
          Hunk #4 FAILED at 1850.
          Hunk #5 FAILED at 1861.
          Hunk #6 FAILED at 1912.
          Hunk #7 FAILED at 1957.
          Hunk #8 FAILED at 2041.
          Hunk #9 FAILED at 2057.
          Hunk #10 FAILED at 2082.
          Hunk #11 FAILED at 2165.
          Hunk #12 FAILED at 2183.
          Hunk #13 succeeded at 2709 with fuzz 2.
          Hunk #14 FAILED at 5619.
          Hunk #15 FAILED at 8856.
          Hunk #16 FAILED at 9160.
          Hunk #17 FAILED at 9186.
          14 out of 17 hunks FAILED -- saving rejects to file spell.c.rej

          When I merged the changes by hand, I got a message about an unknown
          ws (note, however, that I could have made a mistake; this is a big
          patch).

          Looks like you sent me a patch against an unpublished version of spell.c

          m.
        • Bram Moolenaar
          Message 4 of 5 , Jul 4, 2005
            Mikolaj Machowski wrote:

            > On Monday 04 July 2005 12:27, Bram Moolenaar wrote:
            > >
            > > There was a problem in reading the list with word characters from the
            > > .spl file. This patch should fix it:
            > >
            > Patch malformed at line 41.
            >
            > Or when saving message in other way:
            >
            > patch v. 2.5.9
            >
            > mikolaj@localhost ~/vim7/src $ patch -p0 --dry-run < dlug2
            > patching file spell.c
            > Hunk #3 FAILED at 1837.
            > Hunk #4 FAILED at 1850.
            > Hunk #5 FAILED at 1861.
            > Hunk #6 FAILED at 1912.
            > Hunk #7 FAILED at 1957.
            > Hunk #8 FAILED at 2041.
            > Hunk #9 FAILED at 2057.
            > Hunk #10 FAILED at 2082.
            > Hunk #11 FAILED at 2165.
            > Hunk #12 FAILED at 2183.
            > Hunk #13 succeeded at 2709 with fuzz 2.
            > Hunk #14 FAILED at 5619.
            > Hunk #15 FAILED at 8856.
            > Hunk #16 FAILED at 9160.
            > Hunk #17 FAILED at 9186.
            > 14 out of 17 hunks FAILED -- saving rejects to file spell.c.rej
            >
            > When merged changes by hand got message about unknown ws (note however
            > I could make some mistake, this is big patch).
            >
            > Looks like you send me patch to not published version of spell.c

            The patch was against the spell.c in CVS. It probably got mangled by
            the mail system somewhere (e.g., changing tabs to spaces). Anyway, it
            will be in the next snapshot.

            --
            A computer without Windows is like a fish without a bicycle.

            /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
            /// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
            \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
            \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///
          • Mikolaj Machowski
            Message 5 of 5 , Jul 4, 2005
              On Monday 04 July 2005 21:43, Bram Moolenaar wrote:
              >
              > The patch was against the spell.c in CVS. It probably got mangled by
              > the mail system somewhere (e.g, changing tabs to spaces). Anyway, it
              > will be in the next snapshot.

              Aah. With patch -l everything works (both the patching and the
              patch itself).

              m.
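
Why `patch -l` helps when tabs were turned into spaces can be illustrated with a loose line comparison. This is a simplified, symmetric sketch and not GNU patch's implementation; `lines_match_loosely` is a hypothetical name. It treats any run of spaces or tabs as equal to any other run and ignores blanks at the end of a line.

```c
#include <stddef.h>

/* Return 1 for a space or a tab, 0 otherwise. */
static int is_blank(char c)
{
    return c == ' ' || c == '\t';
}

/* Return 1 if "a" and "b" are equal when every run of blanks counts as a
 * single separator and trailing blanks are ignored; return 0 otherwise. */
static int lines_match_loosely(const char *a, const char *b)
{
    for (;;)
    {
        if (is_blank(*a) || is_blank(*b))
        {
            int blanks_a = 0;
            int blanks_b = 0;

            /* Skip the whole run of blanks on each side. */
            while (is_blank(*a)) { ++a; blanks_a = 1; }
            while (is_blank(*b)) { ++b; blanks_b = 1; }

            if (*a == '\0' && *b == '\0')
                return 1;           /* only trailing blanks differed */
            if (*a == '\0' || *b == '\0')
                return 0;           /* one line has more text */
            if (blanks_a != blanks_b)
                return 0;           /* a separator on one side only */
            continue;
        }
        if (*a != *b)
            return 0;
        if (*a == '\0')
            return 1;
        ++a;
        ++b;
    }
}
```

Under this comparison a context line indented with a tab matches the same line indented with spaces, which is the kind of difference a mail system introduces, while a line whose tokens differ still fails to match.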