Re: More spell files available

  • Tony Mechelynck
    Message 1 of 8 , Sep 1, 2005
      ----- Original Message -----
      From: "Bram Moolenaar" <Bram@...>
      To: <vim-dev@...>
      Sent: Thursday, September 01, 2005 6:09 PM
      Subject: More spell files available


      >
      > I have made spell files for all the languages that Myspell supports.
      > You can download them directly from:
      > ftp://ftp.vim.org/pub/vim/unstable/runtime/spell/
      >
      > This requires a recent snapshot, since the spell file format has been
      > changed about a week ago. I don't expect the format to change again,
      > since I now use sections for different parts of information. Thus when
      > adding something for compound words, spell files that don't use compound
      > words will remain valid.
      [...]

      OK, if the next snapshot includes them they will find themselves in my next
      W32 distribution, see http://users.skynet.be/antoine.mechelynck/vim/ --
      check the "last change" at the top of the page, then click "The experimental
      Vim 7" to get to the relevant paragraph. (Read before downloading! No
      warranty, no reimbursements).

      However, since I go to town this evening, I don't know exactly _when_ I will
      compile the next snapshot. That's why it's important to check the change
      date (at top of page) and/or the snapshot number (part of the .zip filename;
      yesterday's was 0139). The "last change" timestamp is in UTC, which is 2
      hours earlier than the current "official time" where I live (Central
      European summer time, zone +0200). OTOH the "compile date/time" in the
      ":version" listing of my builds is my "time zone time" which explains why
      the "compile date" in the ":version" listing can be later (by up to approx.
      2 hours) than the "change date" on the HTML page. If you know your own time
      zone (compared to UTC) you can determine when my latest build was produced.
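
      The conversion described above amounts to adding the zone offset to the
      UTC time. A minimal sketch in Vim script, where UtcToLocalHour() is a
      hypothetical helper name and only whole-hour offsets are handled:

      ```vim
      " Convert a UTC hour (0-23) to the local hour, given the zone offset
      " in whole hours (e.g. 2 for zone +0200, -5 for zone -0500).
      function! UtcToLocalHour(utc_hour, offset_hours) abort
        " Add 24 before the modulo so that a negative sum wraps around
        " to the previous day instead of giving a negative hour.
        return (a:utc_hour + a:offset_hours + 24) % 24
      endfunction
      ```

      For example, UtcToLocalHour(16, 2) gives 18: a page changed at 16:00 UTC
      was changed at 18:00 Central European summer time.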
      <OFFTOPIC>
      Don't confuse adding and subtracting: the sun rises in the East and sets in
      the West, therefore "midday" happens progressively later, the more you go to
      the West (on a world map whose left and right edges coincide with the
      International Date Line, which runs approximately but not exactly
      North-South through the Pacific and makes a zigzag between Kamchatka and
      Alaska). IOW when it is 12:00 UTC (in 24-hour notation, i.e., midday) it is
      "morning" (on the same day) in the Americas and "afternoon" (also on the
      same day) in most of the other continents.
      </OFFTOPIC>


      Best regards,
      Tony.
    • Tony Mechelynck
      Message 2 of 8 , Sep 1, 2005
        ----- Original Message -----
        From: "Tony Mechelynck" <antoine.mechelynck@...>
        To: <vim-dev@...>; "Bram Moolenaar" <Bram@...>
        Sent: Thursday, September 01, 2005 6:55 PM
        Subject: Re: More spell files available


        > ----- Original Message -----
        > From: "Bram Moolenaar" <Bram@...>
        > To: <vim-dev@...>
        > Sent: Thursday, September 01, 2005 6:09 PM
        > Subject: More spell files available
        >
        >
        >>
        >> I have made spell files for all the languages that Myspell supports.
        >> You can download them directly from:
        >> ftp://ftp.vim.org/pub/vim/unstable/runtime/spell/
        >>
        >> This requires a recent snapshot, since the spell file format has been
        >> changed about a week ago. I don't expect the format to change again,
        >> since I now use sections for different parts of information. Thus when
        >> adding something for compound words, spell files that don't use compound
        >> words will remain valid.
        > [...]
        >
        > OK, if the next snapshot includes them they will find themselves in my
        > next W32 distribution, see
        > http://users.skynet.be/antoine.mechelynck/vim/ -- check the "last change"
        > at the top of the page, then click "The experimental Vim 7" to get to the
        > relevant paragraph. (Read before downloading! No warranty, no
        > reimbursements).
        >
        > However, since I go to town this evening, I don't know exactly _when_ I
        > will compile the next snapshot. That's why it's important to check the
        > change date (at top of page) and/or the snapshot number (part of the .zip
        > filename; yesterday's was 0139). [...]

        OK, it's done. Snapshot 140 distribution for W32 has just been uploaded and
        there are indeed a lot of new spell-related files which didn't exist
        yesterday.

        Happy Vimming!
        Tony.
      • Mikolaj Machowski
        Message 3 of 8 , Sep 2, 2005
          On Thursday, 1 September 2005 18:09, Bram Moolenaar wrote:
          > - Try if suggestions make sense. You may set 'verbose' to see the
          [cut]
          > As always, suggestions are welcome.

          One thing about suggestions.

          Word: rzubr (a misspelling of żubr, a bison-like animal from
          Central Europe)

          set spelllang=pl
          set spelllang=pl,en

          With either setting, the correct spelling comes at the top.

          set spelllang=en,pl

          With this setting, strange things happen. I understand that English
          words have preference, but other Polish words also come before żubr:

          1 "Rysu br"
          2 "Sabr"
          3 "Issuer"
          4 "Rubra"
          5 "Rs suer"
          6 "Rósłby" <- R{oacute}słby
          7 "Tsuby"
          8 "Tsuba"
          9 "Tsubo"
          10 "Tsubą" <- Tsub{aogonek}
          11 "Tsubę" <- Tsub{eogonek}
          12 "Reube"
          13 "Rubs"
          14 "Ruby"
          15 "Subj"
          16 "Subs"
          17 "Tsub"
          18 "Sutr"
          19 "Reub"
          20 "Rube"
          21 "Rubi"
          22 "Ruhr"
          23 "Żubr" <- correct word {Z.}ubr

          The ideal would be to split the suggestions into two lists, one for
          each language. Unfortunately, if I remember correctly, this is not
          possible because Vim creates one big word list in memory, with
          preference (in this case) for suggestions taken from "en".
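
          The comparison above can be reproduced from a script rather than by
          hand. This is a minimal sketch, assuming a Vim with spell support,
          'encoding' set to utf-8, and the pl and en spell files installed.
          RankOf() and CompareSuggestions() are hypothetical helper names, not
          part of Vim. The result gives, for each 'spelllang' value, the
          position of the expected word among the suggestions (0 means it was
          not suggested at all).

          ```vim
          " Return the 1-based position of {expected} in {suggestions},
          " ignoring case, or 0 when it is not in the list.
          function! RankOf(expected, suggestions) abort
            return index(a:suggestions, a:expected, 0, 1) + 1
          endfunction

          " For every 'spelllang' value in {langs}, record where {expected}
          " lands among the first 25 suggestions for the bad word {bad}.
          function! CompareSuggestions(bad, expected, langs) abort
            let saved_lang = &l:spelllang
            let saved_spell = &l:spell
            let ranks = {}
            try
              " spellsuggest() only works when 'spell' is set.
              setlocal spell
              for lang in a:langs
                let &l:spelllang = lang
                let ranks[lang] = RankOf(a:expected, spellsuggest(a:bad, 25))
              endfor
            finally
              " Always restore the user's settings.
              let &l:spelllang = saved_lang
              let &l:spell = saved_spell
            endtry
            return ranks
          endfunction
          ```

          Usage, for the word discussed here:
          :echo CompareSuggestions('rzubr', 'żubr', ['pl', 'pl,en', 'en,pl'])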

          m.
        • Bram Moolenaar
          Message 4 of 8 , Sep 3, 2005
            Mikolaj Machowski wrote:

            > On Thursday, 1 September 2005 18:09, Bram Moolenaar wrote:
            > > - Try if suggestions make sense. You may set 'verbose' to see the
            > [cut]
            > > As always, suggestions are welcome.
            >
            > One thing with suggestions.
            >
            > Word: rzubr(badly spelled {z with dot above}ubr - bison-like animal
            > from Central Europe)
            >
            > set spelllang=pl
            > set spelllang=pl,en
            >
            > Correct spelling comes at the top.

            I notice that when adding ",en" the scoring changes. The sound-a-like
            mechanism for English is also used for Polish. Perhaps we should not do
            that? However, if you had used:
            :set spelllang=en,en-math
            Then you do want to use the English sound folding for en-math too.

            Perhaps you can add SOFO items to the Polish spell file? That would
            give better sound folding and suggestions. And we can avoid using the
            English sound folding for Polish.

            > set spelllang=en,pl
            >
            > Strange things happen. I understand English words have preference but
            > there are also other Polish words before {z.}ubr:

            The sound folding appears to change the scoring. It's strange though
            that "en,pl" differs so much from "pl,en".

            > Ideal would be to split suggestions in two lists, one for each language.
            > Unfortunately if I remember correctly this is not possible because Vim
            > creates in memory one, big word list with preferences (in this case) for
            > suggestions taken from "en".

            Making two lists should not be necessary, since the scoring mechanism
            should find the best matching words. Thus it should recognize the
            language implicitly. Perhaps it would be useful to indicate what word
            list the suggestion came from.

            --
            hundred-and-one symptoms of being an internet addict:
            168. You have your own domain name.

            /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
            /// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
            \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
            \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///
          • Mikolaj Machowski
            Message 5 of 8 , Sep 3, 2005
              On Saturday, 3 September 2005 12:12, Bram Moolenaar wrote:
              > > > - Try if suggestions make sense. You may set 'verbose' to see the
              > >
              > > [cut]
              > >
              > > > As always, suggestions are welcome.
              > >
              > > One thing with suggestions.
              > >
              > > Word: rzubr(badly spelled {z with dot above}ubr - bison-like animal
              > > from Central Europe)
              > >
              > > set spelllang=pl
              > > set spelllang=pl,en
              > >
              > > Correct spelling comes at the top.
              >
              > I notice that when adding ",en" the scoring changes. The sound-a-like
              > mechanism for English is also used for Polish. Perhaps we should not do
              >
              > that?

              That would be best.

              > However, if you would have used:
              > :set spelllang=en,en-math

              Can Vim recognise the difference between en and pl (or any other
              language code)?

              If it could, it could treat en,pl differently but use the same
              technique for en,en-math...

              > Then you do want to use the English sound folding for en-math too.
              >
              > Perhaps you can add SOFO items to the Polish spell file? That would
              > give better sound folding and suggestions. And we can avoid using the
              > English sound folding for Polish.

              I don't think so. As I understand from ":help SOFO", this is a
              letter-for-letter mechanism, while in Polish there are many
              exchanges of one letter for two letters.

              I also ran some tests, and only the use of REP made a significant
              improvement in the suggestions.

              > > set spelllang=en,pl
              > >
              > > Strange things happen. I understand English words have preference but
              > > there are also other Polish words before {z.}ubr:
              >
              > The sound folding appears to change the scoring. It's strange though
              > that "en,pl" differs so much from "pl,en".

              I understood that the first language enforces its rules on the
              second (and every following) language. That is quite logical, but
              as my example shows, it has some strange effects.

              > > Ideal would be to split suggestions in two lists, one for each
              > > language. Unfortunately if I remember correctly this is not possible
              > > because Vim creates in memory one, big word list with preferences (in
              > > this case) for suggestions taken from "en".
              >
              > Making two lists should not be necessary, since the scoring mechanism
              > should find the best matching words. Thus it should recognize the
              > language implicitly. Perhaps it would be useful to indicate what word
              > list the suggestion came from.

              Yes. And the list could be sorted by this indication (to group them).

              Maybe Vim could also guess which language is currently being used.
              I proposed this previously: take the current line with 1 or 2 lines
              of context on each side (3-5 lines total) and spell check it with
              each language from spelllang separately. Give priority to the
              language with the lowest number of errors.

              Pseudo-code:

              let spelllang_set = &spelllang
              let langlist = split(&spelllang, ',')
              let langbads = {}
              for i in langlist
                let &spelllang = i
                " Current line plus 2 lines of context on each side; max() keeps
                " the start line valid near the top of the buffer
                let text = join(getline(max([1, line(".")-2]), line(".")+2))
                " Get rid of punctuation and split text by whitespace
                let wordlist = split(substitute(text, '[[:punct:]]', ' ', 'g'))
                let counter = 0
                for k in wordlist
                  " spellbadword() returns an empty word when k is spelled
                  " correctly ('spell' must be set)
                  if spellbadword(k)[0] != ''
                    let counter += 1
                  endif
                endfor
                let langbads[counter] = i
              endfor

              " Now we have a dictionary like {"20":"en", "3":"pl"}. It is quite
              " safe to assume that in this situation we want to write in pl, so
              let &spelllang = langbads[min(keys(langbads))]
              " Hmm. I remember some problems with remapping of z?
              normal! z?
              let &spelllang = spelllang_set

              It would be faster if implemented in the Vim binary. Maybe as an
              option for 'spellsuggest': "lang:2", where the number is the number
              of context lines.

              One problem remains: special dictionaries. There would hardly be
              any text written entirely in en-math.

              m.
            • Bram Moolenaar
              Message 6 of 8 , Sep 4, 2005
                Mikolaj Machowski wrote:

                > > I notice that when adding ",en" the scoring changes. The
                > > sound-a-like mechanism for English is also used for Polish. Perhaps
                > > we should not do that?
                >
                > It would be the best.

                OK, I'll look into using the sound folding only for the language it is
                specified for.

                > > However, if you would have used:
                > > :set spelllang=en,en-math
                >
                > Can Vim recognise difference between en,pl (or any other lang code?)
                >
                > If could make difference for en,pl but use the same technic for
                > en,en-math...

                The main issue would actually be the additions. This is what someone
                adds to his personal dictionary with "zg". You do want sound folding
                for that.

                Otherwise, if there is a language specified with two letters, it would
                be possible to use the same sound folding for other languages with these
                letters that don't specify sound folding themselves. That would work for
                "en", "en-math", "en-whatever". Hopefully this isn't too tricky.

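                The prefix rule described above could be sketched as follows. This
                only illustrates the rule; it is not how Vim implements it.
                SoundFoldSource() is a hypothetical name, and the list of languages
                that define their own sound folding is passed in as an assumption
                instead of being read from the spell files.

                ```vim
                " Map each 'spelllang' entry in {langs} to the entry whose sound
                " folding it would use. {has_folding} lists the entries that
                " define sound folding themselves. '' means no sound folding.
                function! SoundFoldSource(langs, has_folding) abort
                  let source = {}
                  for lang in a:langs
                    if index(a:has_folding, lang) >= 0
                      " The language has its own sound folding.
                      let source[lang] = lang
                      continue
                    endif
                    let source[lang] = ''
                    for cand in a:has_folding
                      " Borrow only from a two-letter language name that
                      " matches the first two letters of this entry.
                      if strlen(cand) == 2 && strpart(lang, 0, 2) ==# cand
                        let source[lang] = cand
                        break
                      endif
                    endfor
                  endfor
                  return source
                endfunction
                ```

                With SoundFoldSource(['en', 'en-math', 'pl'], ['en']), both "en" and
                "en-math" map to "en", while "pl" maps to '' and so would no longer
                be scored with the English sound folding.
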
                > > Perhaps you can add SOFO items to the Polish spell file? That would
                > > give better sound folding and suggestions. And we can avoid using the
                > > English sound folding for Polish.
                >
                > Don't think so. As i understand from ":help SOFO" this is
                > letter-for-letter mechanism while in Polish there is many
                > letter-for-2letters exchanges.

                You would need to use SAL items then. That's a lot more complicated,
                but also provides the possibility for more accurate sounds-a-like
                matching.

                > Also made some tests and only use of REP was making significant
                > improvement in suggestions.

                OK. You could suggest this to the maintainers of the Polish word list.

                > Maybe also Vim could guess which language is currently used.
                > I proposed it previously: Get current line with 1 or 2 lines of context
                > (3-5 lines total), pass it to spell checking probing each language from
                > spelllang separately. Give priority to settings of language with lower
                > number of errors.

                It's possible, but in border cases this will go wrong. Especially when
                mixing short lines of Polish and English. I also think there is not
                much use for it, since Vim already supports mixing languages.

                --
                "Hit any key to continue" it said, but nothing happened after F sharp.

                /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                /// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
                \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///
              • Mikolaj Machowski
                Message 7 of 8 , Sep 4, 2005
                  On Sunday, 4 September 2005 17:52, Bram Moolenaar wrote:
                  > > > Perhaps you can add SOFO items to the Polish spell file? That would
                  > > > give better sound folding and suggestions. And we can avoid using
                  > > > the English sound folding for Polish.
                  > >
                  > > Don't think so. As i understand from ":help SOFO" this is
                  > > letter-for-letter mechanism while in Polish there is many
                  > > letter-for-2letters exchanges.
                  >
                  > You would need to use SAL items then. That's a lot more complicated,
                  > but also provides the possibility for more accurate sounds-a-like
                  > matching.
                  >
                  see below
                  > > Also made some tests and only use of REP was making significant
                  > > improvement in suggestions.
                  >
                  > OK. You could suggest this to the maintainers of the Polish word list.

                  I tested it 2 months ago, and only REP gave a significant improvement
                  in suggestions (SAL only a slight one). My REP lines, from the
                  beginning of July, are in the kurnik files.
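
                  For readers unfamiliar with REP items, here is a hypothetical
                  fragment of an affix (.aff) file. These are not the actual lines
                  from the kurnik files, only an illustration built on well-known
                  Polish spelling confusions (rz/ż, ó/u, ch/h). Each item names a
                  replacement to try when generating suggestions, and unlike a
                  SOFO mapping it may exchange one letter for two.

                  ```text
                  # Hypothetical REP items for common Polish spelling confusions.
                  # The first line gives the number of REP items that follow.
                  REP 6
                  REP rz ż
                  REP ż rz
                  REP ó u
                  REP u ó
                  REP ch h
                  REP h ch
                  ```

                  With an item such as "REP rz ż", the misspelling "rzubr" from
                  earlier in this thread is one replacement away from "żubr".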

                  m.