Re: "breakat" non-English chars when set linebreak and wrap

  • Bram Moolenaar
    Message 1 of 4 , Aug 27, 2005
      Yao G. Zhan wrote:

      > I have quite a few text files that mix English with non-English
      > chars such as Chinese. Usually they are documents with very long
      > lines, where every line is a paragraph in itself. So I use "set wrap".
      > For English text, I prefer "set linebreak" so that a word is not
      > broken at the end of a screen line. But VIM doesn't work as I expected
      > when breaking the line at chars specified in "breakat", especially
      > with Chinese text, where a character is a word on its own. For example:
      >
      > set linebreak
      > set wrap
      >
      > now I have this text in a long line (I'll use X to represent a single
      > Chinese char in case you can't display it.)
      >
      > English begins. English ends. Chinese begins.XXXXXXXXX.
      >
      > Then I resize the window a bit narrower. This line should wrap like:
      >
      > English begins. English ends. Chinese begins.XXXXX
      > XXXX.
      >
      > This is because each Chinese char is a word on its own. I expect VIM to
      > break at Chinese chars as well as "breakat". But actually VIM wraps it
      > like:
      >
      > English begins. English ends. Chinese begins.
      > XXXXXXXXX.
      >
      > This happens although there is still enough space to display some
      > Chinese chars after the period sign "." in the first line.
      >
      > Is there any way to make VIM work as I expect?

      I understand the problem. 'breakat' is a list of characters, thus it
      doesn't allow a regexp or character range. Adding all Chinese
      characters to it would make it much too long.

      Perhaps we could allow character ranges. But previously something like
      "[a-z]" would mean the characters "][az-". Perhaps doubling the square
      brackets isn't too bad: "[[a-z]]"? Otherwise a separate option could be
      used.
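
      To illustrate the doubled-bracket idea, here is a minimal sketch of how
      such a value might be parsed. The "[[a-z]]" syntax is only the proposal
      floated above, and parse_breakat is a hypothetical helper name, not
      anything that exists in Vim. Single brackets keep their old meaning as
      literal characters.

```python
def parse_breakat(value):
    """Split a hypothetical 'breakat' value into literal chars and ranges.

    Only the doubled form "[[x-y]]" is treated as a range; everything
    else, including a single "[" or "]", stays a literal character.
    """
    singles = set()
    ranges = []
    i = 0
    while i < len(value):
        if value.startswith("[[", i):
            close = value.find("]]", i + 2)
            body = value[i + 2:close] if close != -1 else ""
            # Accept exactly "lo-hi" between the doubled brackets.
            if close != -1 and len(body) == 3 and body[1] == "-":
                ranges.append((body[0], body[2]))
                i = close + 2
                continue
        singles.add(value[i])
        i += 1
    return singles, ranges


def in_breakat(ch, singles, ranges):
    """True if ch is a literal break character or falls inside a range."""
    return ch in singles or any(lo <= ch <= hi for lo, hi in ranges)
```

      With this sketch, "[a-z]" still yields the five literal characters,
      while " .[[a-z]]" yields a space, a period, and one range.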

      Anyway, using a regexp here will certainly slow down processing.
      Currently a 256-entry lookup table is used to speed up processing. That
      won't work for multi-byte characters...
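
      The limitation can be shown with a small model of such a table. This is
      a sketch in Python, not Vim's actual C code; build_breakat_table and
      is_break_char are illustrative names, and DEFAULT_BREAKAT is the default
      value given in the Vim documentation for 'breakat'.

```python
# Default 'breakat' value as documented by Vim: space, tab, and punctuation.
DEFAULT_BREAKAT = " \t!@*-+;:,./?"


def build_breakat_table(breakat):
    """Build a 256-entry table: table[c] is True if byte value c is a break char."""
    table = [False] * 256
    for ch in breakat:
        code = ord(ch)
        if code < 256:  # anything larger has no slot in the table
            table[code] = True
    return table


def is_break_char(table, ch):
    """Constant-time lookup; code points above 255 can never match."""
    code = ord(ch)
    return code < 256 and table[code]
```

      A Chinese character such as U+4E2D has no slot in the table, so it can
      never be treated as a break character with this design.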

      --
      Nobody will ever need more than 640 kB RAM.
      -- Bill Gates, 1983
      Windows 98 requires 16 MB RAM.
      -- Bill Gates, 1999
      Logical conclusion: Nobody will ever need Windows 98.

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
      \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
      \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///
    • Camillo Särs
      Message 2 of 4 , Aug 29, 2005
        Hi Bram,

        Bram Moolenaar wrote:
        > Anyway, using a regexp here will certainly slow down processing.
        > Currently a 256-entry lookup table is used to speed up processing. That
        > won't work for multi-byte characters...

        Do you keep the Unicode character properties in memory somewhere? In
        that case you might want to consider doing a lookup in that table
        instead. Actually, I believe that that's the only "right" solution that
        would work reasonably correctly under any language.
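
        As a rough sketch of a property-based lookup, Python's standard
        unicodedata module can stand in for such a table. Using the East
        Asian Width property ("W" or "F") as the break criterion is an
        assumption made for illustration; proper line breaking is defined by
        a separate Unicode property that this module does not expose.
        can_break_after is a hypothetical helper name.

```python
import unicodedata


def can_break_after(ch, breakat=" \t!@*-+;:,./?"):
    """Allow a break after a 'breakat' char or after a wide East Asian char.

    East Asian Width "W" (wide) and "F" (fullwidth) are used here only as
    an approximation of "this character is a word on its own".
    """
    if ch in breakat:
        return True
    return unicodedata.east_asian_width(ch) in ("W", "F")
```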

        Regards,
        Camillo
        --
        Camillo Särs <ged@...> Aim for the impossible and you
        http://camillo.särs.net will achieve the improbable
      • Bram Moolenaar
        Message 3 of 4 , Aug 29, 2005
          Camillo Särs wrote:

          > Bram Moolenaar wrote:
          > > Anyway, using a regexp here will certainly slow down processing.
          > > Currently a 256-entry lookup table is used to speed up processing. That
          > > won't work for multi-byte characters...
          >
          > Do you keep the Unicode character properties in memory somewhere? In
          > that case you might want to consider doing a lookup in that table
          > instead. Actually, I believe that that's the only "right" solution that
          > would work reasonably correctly under any language.

          There are a few properties of Unicode characters that Vim knows, such as
          the cell width and upper/lower case. But whether a sequence of
          characters can be wrapped at any point isn't among them. A rough
          separation into latin1 and non-latin1 characters is sufficient when
          mixing Asian text with English. Perhaps that's sufficient for most
          people.
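
          The rough latin1 / non-latin1 split can be sketched as follows. This
          is an illustration of the idea, not Vim's implementation: wrap_line
          and can_break_between are hypothetical names, and every character
          is assumed to occupy one screen cell, as in the "X" example from
          the original question.

```python
BREAKAT = " \t!@*-+;:,./?"


def can_break_between(prev, nxt, break_non_latin1):
    """A break is allowed after a 'breakat' char, or next to any non-latin1 char."""
    if prev in BREAKAT:
        return True
    return break_non_latin1 and (ord(prev) > 255 or ord(nxt) > 255)


def wrap_line(line, width, break_non_latin1=True):
    """Split one long line into screen rows of at most `width` characters."""
    rows = []
    start = 0
    while len(line) - start > width:
        end = start + width
        cut = end  # fall back to a hard break if no opportunity is found
        for p in range(end, start, -1):
            if can_break_between(line[p - 1], line[p], break_non_latin1):
                cut = p
                break
        rows.append(line[start:cut])
        start = cut
    rows.append(line[start:])
    return rows
```

          With the rule switched off, the sample line wraps right after
          "Chinese begins." as reported; with it switched on, the first row
          is filled with Chinese characters up to the window width.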

          --
          hundred-and-one symptoms of being an internet addict:
          120. You ask a friend, "What's that big shiny thing?" He says, "It's the sun."
