[RFC] colnr() function

  • ZyX
    Message 1 of 5 , Jul 4, 2014
      Given the recent discussion around matchaddpos() and the fact that converting a virtual column to a byte offset has a wider variety of use cases (I personally needed this to get a <C-v>-selected block without altering marks, registers, cursor position, etc.), I propose a new function, colnr():

      colnr :: (string, number, options) -> col

      This function accepts `number`, a generic index, and translates it to a byte offset according to the given options, which may be one of the following:

      1. `'col'`: returns `number` or the length of the string plus one, whichever is smaller. Alias for {'unicode': 0, 'tabstop': 1, 'fixed_width_tab': 1}.
      2. `'codepoint'`: returns the byte offset of the Unicode codepoint whose index is given as `number` (here and below, indexes start from 1). Alias for {'tabstop': 1, 'fixed_width_tab': 1, 'ambiwidth': 'single', 'fullwidth_len': 1, 'diacritics_len': 1, 'invalid_unicode': 'error'}. The opposite of `strchars()`.
      3. `'virtcol'`: alias for {}, which is the same as {'tabstop': &tabstop, 'fixed_width_tab': 0, 'ambiwidth': &ambiwidth, 'fullwidth_len': 2, 'diacritics_len': 0, 'diacritics_start_len': 1, 'unicode': 1, 'invalid_unicode': 'strtrans'}.
      4. A dictionary with the following keys:

      - tabstop: effective value of &tabstop. All defaults are listed in 3.
      - fixed_width_tab: if non-zero, every tab is assumed to occupy `tabstop` columns.
      - ambiwidth: effective value of &ambiwidth.
      - fullwidth_len: number of indexes a fullwidth character occupies.
      - diacritics_len: number of indexes a diacritic character occupies.
      - diacritics_start_len: number of indexes occupied by a diacritic character at the very start of the string.
      - unicode: if non-zero, recognizes one Unicode character as occupying one index. Disables the `ambiwidth`, `fullwidth_len`, `diacritics_len`, `diacritics_start_len` and `invalid_unicode` options.
      - invalid_unicode: determines how to treat bytes that are not valid Unicode. Valid values are:
        - strtrans: use the length of the `strtrans()` result.
        - error: fail with an error and return zero.
        - single: treat as occupying one index.
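
The way the three string aliases expand into the dictionary form can be sketched as follows. This is only an illustration in Python, not part of the proposal: `DEFAULTS`, `PRESETS` and `resolve_options` are hypothetical names, and the fixed values 8 and 'single' merely stand in for the current &tabstop and &ambiwidth.

```python
# Defaults as listed for 'virtcol' in item 3 above.
DEFAULTS = {
    "tabstop": 8,             # assumption: stands in for Vim's &tabstop
    "fixed_width_tab": 0,
    "ambiwidth": "single",    # assumption: stands in for Vim's &ambiwidth
    "fullwidth_len": 2,
    "diacritics_len": 0,
    "diacritics_start_len": 1,
    "unicode": 1,
    "invalid_unicode": "strtrans",
}

# The string aliases from items 1 to 3, copied from the proposal.
PRESETS = {
    "col": {"unicode": 0, "tabstop": 1, "fixed_width_tab": 1},
    "codepoint": {
        "tabstop": 1, "fixed_width_tab": 1, "ambiwidth": "single",
        "fullwidth_len": 1, "diacritics_len": 1, "invalid_unicode": "error",
    },
    "virtcol": {},
}


def resolve_options(options):
    """Expand an alias or a partial dictionary into a full option set."""
    if isinstance(options, str):
        options = PRESETS[options]
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise KeyError("unknown option(s): " + ", ".join(sorted(unknown)))
    # Keys that were not given fall back to the defaults.
    return {**DEFAULTS, **options}
```

With this model, `resolve_options({'tabstop': 4})` keeps every default except `tabstop`, which is the behaviour item 4 describes.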

      Use-cases:

      1. Transform a screen column to a byte offset:

      echo colnr("\ta", 2, 'virtcol') " Returns 1 since 2 is in the middle of the tab

      2. Transform a screen column reported by some compiler to a byte offset:

      echo colnr(string, idx, {'tabstop': 8})

      3. Transform a screen column reported by some compiler that only knows ASCII and still thinks 1 character == 1 byte:

      echo colnr(string, idx, {'tabstop': 8, 'unicode': 0})

      4. Transform an index into a unicode() string reported by some Python program:

      echo colnr(string, idx + 1, 'codepoint')

      5. Allow configuring the transformer without :if branches, for example to allow a plain byte index:

      echo colnr(string, idx, 'col') " Will echo idx most of the time
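
Use-cases 1 and 2 can be modelled outside Vim. The Python below is a rough sketch of the proposed 'virtcol' behaviour, not an implementation of colnr(): the name `virtcol_to_byte` is hypothetical, the string is assumed to be stored as UTF-8, East Asian wide and fullwidth characters are assumed to take two columns, and `ambiwidth` and invalid bytes are ignored.

```python
import unicodedata


def virtcol_to_byte(text, vcol, tabstop=8):
    """Map a 1-based screen column to a 1-based UTF-8 byte offset."""
    byte_offset = 1  # offset of the character currently examined
    width = 0        # screen columns consumed so far
    for char in text:
        if char == "\t":
            # A tab extends to the next multiple of tabstop.
            cell = tabstop - (width % tabstop)
        elif unicodedata.combining(char):
            # A diacritic takes no column unless it starts the string.
            cell = 1 if byte_offset == 1 else 0
        elif unicodedata.east_asian_width(char) in ("W", "F"):
            cell = 2
        else:
            cell = 1
        width += cell
        if vcol <= width:
            # The requested column falls inside this character.
            return byte_offset
        byte_offset += len(char.encode("utf-8"))
    # Past the end of the string: length in bytes plus one.
    return byte_offset
```

Here `virtcol_to_byte("\ta", 2)` gives 1, as in use-case 1, because column 2 still lies inside the tab, while column 9 gives 2, the offset of `a`.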

      --
      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php

      ---
      You received this message because you are subscribed to the Google Groups "vim_dev" group.
      To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
      For more options, visit https://groups.google.com/d/optout.
    • Bram Moolenaar
      Message 2 of 5 , Jul 6, 2014
        ZyX wrote:

        > Given recent discussion around matchaddpos() and the fact that
        > converting virtual column to byte offset has a larger variety of use
        > cases (I personally needed this to get <C-v>-selected block without
        > altering marks, registers, cursor position, etc) I propose the new
        > function colnr():
        >
        > colnr :: (string, number, options) -> col
        >
        > This function accepts number: generic index and translates it to byte
        > offset according to given options which may be one of the following:
        >
        > 1. `'col'`: returns `number` or length of the string plus one,
        > whichever is smaller. Alias to {'unicode': 0, 'tabstop': 1,
        > 'fixed_width_tab': 1}
        > 2. `'codepoint'`: returns byte offset of Unicode codepoint with index
        > given as `number` (here and below index starts from 1). Alias to
        > {'tabstop': 1, 'fixed_width_tab': 1, 'ambiwidth': 'single',
        > 'fullwidth_len': 1, 'diacritics_len': 1, 'invalid_unicode': 'error'}.
        > Is opposite of `strchars()`.
        > 3. `'virtcol'`: alias to {} which is the same as {'tabstop': &tabstop,
        > 'fixed_width_tab': 0, 'ambiwidth': &ambiwidth, 'fullwidth_len': 2,
        > 'diacritics_len': 0, 'diacritics_start_len': 1, 'unicode': 1,
        > 'invalid_unicode': 'strtrans'}.
        > 4. A dictionary with the following keys:
        >
        > - tabstop: effective value of &tabstop. All defaults are listed in 3.
        > - fixed_width_tab: if non-zero, makes every tab be assumed to occupy `tabstop` columns.
        > - ambiwidth: effective value of &ambiwidth.
        > - fullwidth_len: number of indexes fullwidth characters occupy.
        > - diacritics_len: number of indexes diacritics characters occupy.
        > - diacritics_start_len: length of the one diacritic character that is at the start of the string.
        > - unicode: if non-zero, recognizes one unicode character as occupying one index. Disables `ambiwidth`, `fullwidth_len`, `diacritics_len`, `diacritics_start_len` and `invalid_unicode` options.

        It's probably better to use 'encoding': some users will have
        encodings other than Unicode, e.g. a compiler in Japan or China would
        output double-byte characters.

        > - invalid_unicode: determines how to treat bytes that are not valid unicode. Valid values are
        > - strtrans: use length of the `strtrans` result.
        > - error: fail with error, return zero.
        > - single: treat as occupying one index.
        >
        > Use-cases:
        >
        > 1. Transform screen column to byte offset:
        >
        > echo colnr("\ta", 2, 'virtcol') " Returns 1 since 2 is in the middle of the tab
        > 2. Transform screen column reported by some compiler to byte offset:
        >
        > echo colnr(string, idx, {'tabstop': 8})
        > 3. Transform screen column reported by some compiler that only knows ASCII and still thinks 1 character == 1 byte:
        >
        > echo colnr(string, idx, {'tabstop': 8, 'unicode': 0})
        > 4. Transform index of unicode() string reported by some python program:
        >
        > echo colnr(string, idx + 1, 'codepoint')
        > 5. Allow configuring transformer without :if branches: allow byte index
        >
        > echo colnr(string, idx, 'col') " Will echo idx most of time

        Sounds useful. It's going to be a lot of work to implement though.
        Especially with all the options.

        I would suggest first writing the documentation, thus explaining to a
        Vim script programmer how to use it, to find out if it's not too
        difficult to understand.
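
The remark about 'encoding' above can be illustrated with a small sketch. It is written in Python with a hypothetical helper, `char_index_to_byte`, and only shows that the same character index maps to different byte offsets depending on the encoding the text is stored in; it says nothing about how Vim would implement this.

```python
def char_index_to_byte(text, index, encoding="utf-8"):
    """Return the 1-based byte offset of the character at 1-based index."""
    # Encode everything before the requested character and count the bytes.
    prefix = text[: index - 1]
    return len(prefix.encode(encoding)) + 1


sample = "\u65e5\u672c\u8a9e"  # three Japanese characters

utf8_offset = char_index_to_byte(sample, 2, "utf-8")    # 3 bytes per character
eucjp_offset = char_index_to_byte(sample, 2, "euc_jp")  # 2 bytes per character
print(utf8_offset, eucjp_offset)
```

The second character starts at byte 4 in UTF-8 but at byte 3 in EUC-JP, so a column reported by a tool that works in a double-byte encoding cannot be converted correctly under a Unicode-only assumption.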

        --
        Two cows are standing together in a field. One asks the other:
        "So what do you think about this Mad Cow Disease?"
        The other replies: "That doesn't concern me. I'm a helicopter."

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ an exciting new programming language -- http://www.Zimbu.org ///
        \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

      • LCD 47
        Message 3 of 5 , Jul 6, 2014
          On 4 July 2014, ZyX <zyx.vim@...> wrote:
          > Given recent discussion around matchaddpos() and the fact that
          > converting virtual column to byte offset has a larger variety of use
          > cases (I personally needed this to get <C-v>-selected block without
          > altering marks, registers, cursor position, etc) I propose the new
          > function colnr():
          >
          > colnr :: (string, number, options) -> col
          >
          > This function accepts number: generic index and translates it to byte
          > offset according to given options which may be one of the following:
          >
          > 1. `'col'`: returns `number` or length of the string plus one,
          > whichever is smaller. Alias to {'unicode': 0, 'tabstop': 1,
          > 'fixed_width_tab': 1}
          >
          > 2. `'codepoint'`: returns byte offset of Unicode codepoint with
          > index given as `number` (here and below index starts from 1). Alias
          > to {'tabstop': 1, 'fixed_width_tab': 1, 'ambiwidth': 'single',
          > 'fullwidth_len': 1, 'diacritics_len': 1, 'invalid_unicode':
          > 'error'}. Is opposite of `strchars()`.
          >
          > 3. `'virtcol'`: alias to {} which is the same as {'tabstop': &tabstop,
          > 'fixed_width_tab': 0, 'ambiwidth': &ambiwidth, 'fullwidth_len': 2,
          > 'diacritics_len': 0, 'diacritics_start_len': 1, 'unicode': 1,
          > 'invalid_unicode': 'strtrans'}.
          >
          > 4. A dictionary with the following keys:
          >
          > - tabstop: effective value of &tabstop. All defaults are listed in 3.
          > - fixed_width_tab: if non-zero, makes every tab be assumed to
          > occupy `tabstop` columns.
          > - ambiwidth: effective value of &ambiwidth.
          > - fullwidth_len: number of indexes fullwidth characters occupy.
          > - diacritics_len: number of indexes diacritics characters occupy.
          > - diacritics_start_len: length of the one diacritic character that
          > is at the start of the string.
          > - unicode: if non-zero, recognizes one unicode character as
          > occupying one index. Disables `ambiwidth`, `fullwidth_len`,
          > `diacritics_len`, `diacritics_start_len` and `invalid_unicode`
          > options.
          > - invalid_unicode: determines how to treat bytes that are not
          > valid unicode. Valid values are
          > - strtrans: use length of the `strtrans` result.
          > - error: fail with error, return zero.
          > - single: treat as occupying one index.
          [...]

          This seems useful, but do you _have_ to do all that in a single
          function? Perhaps 2. and 3. could be split into separate functions?

          Also, I'd say 4. should just use the actual settings rather than
          parse them from a Christmas tree of options. Vim tradition seems to be
          to leave it to the user to save and restore the context before doing the
          job.

          /lcd

        • ZyX
          Message 4 of 5 , Jul 6, 2014
            > This seems useful, but do you _have_ to do all that in a single
            > function? Perhaps 2. and 3. could be split into separate functions?

            It does not make sense as long as 4. exists.

            > Also, I'd say 4. should just use the actual settings rather than
            > parse them from a Christmas tree of options. Vim tradition seems to be
            > to leave it to the user to save and restore the context before doing the
            > job.

            There are no options for treating fullwidth characters as single width or for messing with diacritics exactly that way.

            Also see use-case 5. If I proposed new options, *the very first thing* I would implement is a wrapper function that takes a dictionary and sets and restores those options, because it is simpler to explain to contributors how to configure support for a specific compiler if the configuration is such a dictionary.

            I know the tradition, but if you look at things like &ignorecase you will see that in one place &ignorecase is ignored although there is a separate setting for ignoring case, in another place it is not ignored but there still is a separate setting, and in a third place there are no options at all and you have to use explicit \C/\c. Given that there are no options whatsoever for some features of this function, I think it would be inconsistent to pass some options in a dictionary and take others only from the environment.

          • ZyX
            Message 5 of 5 , Jul 6, 2014
              > Also, I'd say 4. should just use the actual settings rather than
              > parse them from a Christmas tree of options. Vim tradition seems to be
              > to leave it to the user to save and restore the context before doing the
              > job.

              And, by the way, setting &tabstop and &ambiwidth without &lazyredraw will certainly cause a massive redraw. And I am almost certain that setting &ambiwidth will result in a broken display.

              Also, do you want to explain to your users why your script has overridden the &tabstop value that was set in `ftplugin/python.vim`? This will happen if you save and restore values. Not to mention that most developers who use save/restore either have not heard of :finally, for some reason do not realize that it applies there, or simply forget to use it. The same goes for :setlocal.
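
The save/restore pattern being criticised, and the role :finally plays in it, can be shown by analogy. This is a Python sketch, not Vim script: `settings` is a plain dictionary standing in for Vim's options and `temporary_options` is a hypothetical helper. It only shows why restoring has to happen in a `finally` clause, and does not address the redraw or ftplugin concerns raised above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_options(settings, overrides):
    """Apply overrides to settings and restore the old values afterwards."""
    saved = {name: settings[name] for name in overrides}
    settings.update(overrides)
    try:
        yield settings
    finally:
        # Runs even if the body raises, like :finally in Vim script.
        settings.update(saved)


settings = {"tabstop": 4, "ambiwidth": "single"}

try:
    with temporary_options(settings, {"tabstop": 8}):
        inside = settings["tabstop"]  # overridden value is visible here
        raise RuntimeError("conversion failed")
except RuntimeError:
    pass

after = settings["tabstop"]  # restored despite the error
```

Without the `finally` clause, the error raised inside the block would leave `tabstop` at 8. Passing the values as an argument, as the proposed dictionary does, avoids touching shared state at all.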
