"breakat" non-English chars when set linebreak and wrap

  • Yao G. Zhan
    Message 1 of 4 , Aug 26, 2005
      Hello!

      I have quite a few text files that mix English with non-English
      characters such as Chinese. They are usually documents with very long
      lines, where every line is a paragraph in itself, so I use "set wrap".
      For English text I prefer "set linebreak", so that a word is not split
      at the end of a screen line. But Vim doesn't behave as I expected: it
      breaks lines only at the characters listed in "breakat", which is a
      problem for Chinese text, where each character is a word on its own.
      For example:

      set linebreak
      set wrap

      Now I have this text on one long line (I'll use X to represent a single
      Chinese character, in case you can't display it):

      English begins. English ends. Chinese begins.XXXXXXXXX.

      Then I make the window a bit narrower. The line should wrap like this:

      English begins. English ends. Chinese begins.XXXXX
      XXXX.

      This is because each Chinese character is a word on its own. I expect
      Vim to break at Chinese characters as well as at the characters in
      "breakat". But Vim actually wraps it like this:

      English begins. English ends. Chinese begins.
      XXXXXXXXX.

      This happens even though there is still enough space to display some
      Chinese characters after the period "." on the first line.

      Is there any way to make Vim work as I expect?

      Thank you!
    • Bram Moolenaar
      Message 2 of 4 , Aug 27, 2005
        Yao G. Zhan wrote:

        > I have quite a few text files that mix English with non-English
        > characters such as Chinese. They are usually documents with very long
        > lines, where every line is a paragraph in itself, so I use "set wrap".
        > For English text I prefer "set linebreak", so that a word is not split
        > at the end of a screen line. But Vim doesn't behave as I expected: it
        > breaks lines only at the characters listed in "breakat", which is a
        > problem for Chinese text, where each character is a word on its own.
        > For example:
        >
        > set linebreak
        > set wrap
        >
        > Now I have this text on one long line (I'll use X to represent a single
        > Chinese character, in case you can't display it):
        >
        > English begins. English ends. Chinese begins.XXXXXXXXX.
        >
        > Then I make the window a bit narrower. The line should wrap like this:
        >
        > English begins. English ends. Chinese begins.XXXXX
        > XXXX.
        >
        > This is because each Chinese character is a word on its own. I expect
        > Vim to break at Chinese characters as well as at the characters in
        > "breakat". But Vim actually wraps it like this:
        >
        > English begins. English ends. Chinese begins.
        > XXXXXXXXX.
        >
        > This happens even though there is still enough space to display some
        > Chinese characters after the period "." on the first line.
        >
        > Is there any way to make Vim work as I expect?

        I understand the problem. 'breakat' is a list of characters, thus it
        doesn't allow a regexp or character range. Adding all Chinese
        characters to it would make it much too long.

        Perhaps we could allow character ranges. But previously something like
        "[a-z]" would mean the characters "][az-". Perhaps doubling the square
        brackets isn't too bad: "[[a-z]]"? Otherwise a separate option could be
        used.

        Anyway, using a regexp here will certainly slow down processing.
        Currently a 256-entry lookup table is used to speed up processing. That
        won't work for multi-byte characters...
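
To make the limitation concrete, here is a minimal sketch of a 256-entry lookup table built from a 'breakat'-style character list. It is written in Python for illustration and is not Vim's actual C source; the names build_table, BREAK_TABLE and can_break_after are hypothetical. The only value taken from Vim is the default 'breakat' string.

```python
# Illustrative sketch only; Vim's real implementation is in C and differs.
BREAKAT = " \t!@*-+;:,./?"  # Vim's default 'breakat' value


def build_table(breakat):
    """Return a 256-entry list; entry N is True if byte value N is a break char."""
    table = [False] * 256
    for ch in breakat:
        code = ord(ch)
        if code < 256:  # anything larger simply has no slot in the table
            table[code] = True
    return table


BREAK_TABLE = build_table(BREAKAT)


def can_break_after(ch):
    """O(1) lookup for single-byte characters; always False beyond the table."""
    code = ord(ch)
    return code < 256 and BREAK_TABLE[code]


print(can_break_after("."))       # listed in 'breakat'
print(can_break_after("a"))       # not listed
print(can_break_after("\u4e2d"))  # a Chinese character: outside the table
```

The lookup is a single index operation, which is why it is fast, and also why a character whose code does not fit in one byte can never be marked as a break character this way.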

        --
        Nobody will ever need more than 640 kB RAM.
        -- Bill Gates, 1983
        Windows 98 requires 16 MB RAM.
        -- Bill Gates, 1999
        Logical conclusion: Nobody will ever need Windows 98.

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
        \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///
      • Camillo Särs
        Message 3 of 4 , Aug 29, 2005
          Hi Bram,

          Bram Moolenaar wrote:
          > Anyway, using a regexp here will certainly slow down processing.
          > Currently a 256-entry lookup table is used to speed up processing. That
          > won't work for multi-byte characters...

          Do you keep the Unicode character properties in memory somewhere? In
          that case you might want to consider doing a lookup in that table
          instead. Actually, I believe that's the only "right" solution that
          would work reasonably correctly under any language.
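
As a rough sketch of what a property lookup could look like, the Python snippet below consults the East Asian Width property through the standard unicodedata module. This is an assumption made for illustration: the property that really governs break opportunities is Unicode's Line_Break property (UAX #14), which Python's standard library does not expose, so East Asian Width is only a stand-in. The helper name is_wide is hypothetical.

```python
import unicodedata


def is_wide(ch):
    """True if the character is classified as Wide or Fullwidth.

    This uses East Asian Width as a stand-in for a real line-break
    property; it is an approximation, not the Unicode line breaking rules.
    """
    return unicodedata.east_asian_width(ch) in ("W", "F")


print(is_wide("\u4e2d"))  # CJK ideograph
print(is_wide("a"))       # ASCII letter
```

A table-driven check like this stays cheap per character, but it is only as good as the property chosen.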

          Regards,
          Camillo
          --
          Camillo Särs <ged@...> Aim for the impossible and you
          http://camillo.särs.net will achieve the improbable
        • Bram Moolenaar
          Message 4 of 4 , Aug 29, 2005
            Camillo Särs wrote:

            > Bram Moolenaar wrote:
            > > Anyway, using a regexp here will certainly slow down processing.
            > > Currently a 256-entry lookup table is used to speed up processing. That
            > > won't work for multi-byte characters...
            >
            > Do you keep the Unicode character properties in memory somewhere? In
            > that case you might want to consider doing a lookup in that table
            > instead. Actually, I believe that's the only "right" solution that
            > would work reasonably correctly under any language.

            There are a few properties of Unicode characters that Vim knows, such as
            the cell width and upper/lower case. But whether a sequence of
            characters can be wrapped at any point isn't among them. A rough
            separation into latin1 and non-latin1 characters is sufficient when
            mixing Asian text with English. Perhaps that's sufficient for most
            people.
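
The rough latin1 / non-latin1 separation can be sketched as a greedy wrapper, shown here in Python. This is a simulation under stated assumptions and not how Vim renders text: every character is treated as one screen cell wide (as with the X placeholders in the original example), a break is allowed after any 'breakat' character, and optionally on either side of any character above U+00FF. The names wrap and is_non_latin1 are hypothetical.

```python
BREAKAT = " \t!@*-+;:,./?"  # Vim's default 'breakat' value


def is_non_latin1(ch):
    """Rough split: anything above U+00FF counts as non-latin1."""
    return ord(ch) > 0xFF


def wrap(text, width, breakat=BREAKAT, break_non_latin1=True):
    """Greedy wrap of one long line; assumes one screen cell per character."""
    lines = []
    while len(text) > width:
        cut = width  # fall back to a hard break at the window edge
        # Search backwards for the last allowed break position.
        for i in range(width, 0, -1):
            prev, nxt = text[i - 1], text[i]
            if prev in breakat:
                cut = i
                break
            if break_non_latin1 and (is_non_latin1(prev) or is_non_latin1(nxt)):
                cut = i
                break
        lines.append(text[:cut])
        text = text[cut:]
    lines.append(text)
    return lines


line = "English begins. English ends. Chinese begins." + "\u4e2d" * 9 + "."
for row in wrap(line, 50):
    print(row)
print("---")
for row in wrap(line, 50, break_non_latin1=False):
    print(row)
```

With the extra rule enabled, the first screen line is filled with five Chinese characters after the period; with it disabled, the whole Chinese run moves to the second line, which is the behaviour reported at the start of this thread.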

            --
            hundred-and-one symptoms of being an internet addict:
            120. You ask a friend, "What's that big shiny thing?" He says, "It's the sun."
