Re: Vim 7 performance notes

  • Nikolai Weibull
    Message 1 of 30, Feb 4, 2007
      On 2/4/07, Yakov Lerner <iler.ml@...> wrote:

      > Gnu malloc (glibc) is exceptionally fast, iirc.

      Yes. It's based on Doug Lea's malloc:

      http://gee.cs.oswego.edu/dl/html/malloc.html

      nikolai
    • Bram Moolenaar
      Message 2 of 30, Feb 4, 2007
        Alexei Alexandrov wrote:

        > >
        > > The idea to gradually increase the chunk size makes sense, but not
        > > everywhere. For the syntax stack it's probably better to start with a
        > > stack that is mostly needed, then growing quite quickly (say double the
        > > chunk size every time). That's because when recursive things are
        > > involved we need much more space. And copying the stack to another
        > > place is more expensive than clearing with zeroes.
        > >
        > > Perhaps you can do some investigation about what the size mostly ends up
        > > to be. Then use that with a special version of ga_grow() that increases
        > > the chunk size every time.
        >
        > I've also tried the approach with persisting the regstack and backpos
        > allocation across calls to vim_regexec_both. It seems to work even
        > better than increasing the grow size in my particular test cases. And
        > I can't think up any situation when it can slow things down. Could you
        > please take a look at the patch attached and provide your opinion?

        Speed should be OK this way, but it does keep up to 32 Kbyte allocated.
        That may not seem much, but if we do this in many places it quickly adds
        up.
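A minimal C sketch of the trade-off under discussion: the buffer survives across calls so that the many shallow matches reuse it, and after a call it is freed only if it grew past a keep limit, which caps what stays allocated between calls. The names `reuse_buf_T`, `buf_ensure()`, `buf_release()` and the 16384-byte `KEEP_LIMIT` are illustrative assumptions, not the attached regexp.c patch or Vim's garray_T API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical keep limit: memory held between calls never exceeds this. */
#define KEEP_LIMIT 16384

typedef struct {
    char   *data;  /* backing storage, NULL when nothing is kept */
    size_t  cap;   /* allocated size in bytes */
} reuse_buf_T;

/* Make sure the buffer can hold at least "need" bytes.
 * Returns 0 on success, -1 on allocation failure (buffer left unchanged). */
static int buf_ensure(reuse_buf_T *b, size_t need)
{
    assert(b != NULL);
    if (need <= b->cap)
        return 0;                 /* reuse the allocation from earlier calls */
    char *p = realloc(b->data, need);
    if (p == NULL)
        return -1;
    b->data = p;
    b->cap = need;
    return 0;
}

/* Called when a match is finished: keep small buffers for the next call,
 * give large ones back so at most KEEP_LIMIT bytes stay allocated. */
static void buf_release(reuse_buf_T *b)
{
    assert(b != NULL);
    if (b->cap > KEEP_LIMIT) {
        free(b->data);
        b->data = NULL;
        b->cap = 0;
    }
}
```

Lowering `KEEP_LIMIT` trades memory held between calls against extra allocations for the occasional match that needs more.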

        Can you show the benchmarks you used to see the performance and the
        stack space that is being used? Otherwise we keep guessing the best
        numbers.

        Coding detail: please don't use "if (!number)", use "if (number == 0)",
        that is so much easier to read. Checking if ga_data is NULL would be
        simpler.

        --
        From "know your smileys":
        :q vi user saying, "How do I get out of this damn emacs editor?"

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ download, build and distribute -- http://www.A-A-P.org ///
        \\\ help me help AIDS victims -- http://ICCF-Holland.org ///
      • Alexei Alexandrov
        Message 3 of 30, Feb 5, 2007
          Hi Bram Moolenaar, you wrote:

          >
          > Speed should be OK this way, but it does keep up to 32 Kbyte allocated.
          > That may not seem much, but if we do this in many places it quickly adds
          > up.

          Any "keep limit" greater than the initial size (e.g. 16384 bytes) gives the same effect in many cases. By "many cases" I mean cases where the stack doesn't grow beyond 512 bytes (very common for syntax highlighting). Is 16384 still too big? Regexp matching and syntax highlighting are among the most important things in Vim, so maybe it's tolerable? Also, as far as I understand, Vim previously (version 6) used the program stack as regstack, right? Compared with that program-stack solution, the current solution with the proposed change is much better, because the program stack never shrinks: once it grows to 1M, for example, the memory is never given back. At least, this is so on Windows, Linux and, I guess, most Unixes.

          > Can you show the benchmarks you used to see the performance and the
          > stack space that is being used? Otherwise we keep guessing the best
          > numbers.

          OK. Platform used for investigations: x86, Windows XP SP 2. Pentium 4 Northwood, 2.4 GHz, 512M RAM.
          I did two things: measured stack usage and measured performance. To understand the stack usage I added some simple logging to regexp.c, printing ga_maxlen before regstack and backpos are cleared, and forced the arrays to have a grow size of 1 (so that ga_maxlen is the high watermark in bytes). Of course, for the performance measurements I used a version with the normal grow size and without logging.
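Forcing a grow size of 1 makes ga_maxlen equal to the largest number of bytes the array ever needed, i.e. a high watermark. The same measurement with an explicit counter might look like this minimal C sketch; `wm_stack_T`, `wm_push()`, `wm_pop()` and `wm_report_and_reset()` are hypothetical names, and the sketch does only the accounting, not regexp.c's actual stack handling.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stack accounting: tracks sizes only, not contents. */
typedef struct {
    size_t used;  /* bytes currently on the stack */
    size_t peak;  /* high watermark: largest "used" seen since last reset */
} wm_stack_T;

static void wm_push(wm_stack_T *s, size_t nbytes)
{
    s->used += nbytes;
    if (s->used > s->peak)
        s->peak = s->used;
}

static void wm_pop(wm_stack_T *s, size_t nbytes)
{
    assert(nbytes <= s->used);  /* popping more than was pushed is a bug */
    s->used -= nbytes;
}

/* Log the watermark and reset it, e.g. once per match call. */
static size_t wm_report_and_reset(wm_stack_T *s, const char *label)
{
    size_t peak = s->peak;
    fprintf(stderr, "%s: high watermark %zu bytes\n", label, peak);
    s->used = 0;
    s->peak = 0;
    return peak;
}
```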

          The version with logging was used to perform the following:

          1. With syntax highlighting on, the following files were viewed from gg to G (with PgDn), and the following stack-size high watermarks were observed: spell.c (444 bytes), $VIMRUNTIME/filetype.vim (820 bytes), a big text file with a lot of syntax errors (252 bytes).

          2. Command

          gvim -c "vimgrep /a\(.\)*z/ *.c | q"

          was executed in the Vim 7 source directory. Stack watermark: 31008 bytes. This is an example of a non-optimal regexp that tends to take a lot of stack space.

          Other similar test cases led to the following conclusions: 1) there are a lot of vim_regexec_both calls during syntax highlighting, and they work in very shallow stacks (<1K); 2) when the user searches with a regexp, there are cases where the regular expression requires a large amount of memory (>10K).

          The performance measurements compared the original version (7.0.188) with a modified regexp.c (initial size: 8192, keep limit: 16384). Each measurement was performed 3 times and the minimum time was taken.

          First, I tested the syntax highlighting speed:
          Command: gvim.exe -f $VIMRUNTIME/filetype.vim -c "for i in range(199) | redraw! | endfor | q"
          Original version: 10.6 seconds
          Modified version: 8.5 seconds
          The modified version is about 25% faster (10.6 / 8.5 ≈ 1.25), i.e. it takes about 20% less time.

          Second, I grepped through the Vim sources again:
          Command: gvim.exe -c "vimgrep /a.*z/ *.c | q"
          Original version: 6.6 seconds
          Modified version: 5.6 seconds
          The modified version takes about 15% less time (1 − 5.6 / 6.6 ≈ 0.15), i.e. it is about 18% faster.

          > Coding detail: please don't use "if (!number)", use "if (number == 0)",
          > that is so much easier to read. Checking if ga_data is NULL would be
          > simpler.

          Got it - no problem.

          --
          Alexei Alexandrov
        • Bram Moolenaar
          Message 4 of 30, Feb 5, 2007
            Alexei Alexandrov wrote:

            > OK. Platform used for investigations: x86, Windows XP SP 2. Pentium 4
            > Northwood, 2.4 GHz, 512M RAM.
            > I did 2 things: understanding stack usage and performance measurement.
            > To understand the stack usage I added some simple logging to regexp.c:
            > printing ga_maxlen before regstack and backpos clearing and forced the
            > arrays to have the grow size 1 (so that ga_maxlen will be high
            > watermark in bytes). Of course, for performance investigations I used
            > version with normal grow size and without logging.
            >
            > The version with logging was used to perform the following:
            >
            > 1. With syntax highlighting on, the following files were viewed from
            > gg to G (with PgDn) and the following high watermark of stack size was
            > observed: spell.c (444 bytes), $VIMRUNTIME/filetype.vim (820 bytes),
            > big text file with a lot of syntax errors (252 bytes)

            It sounds like keeping only 1024 bytes would already work for most
            situations. That would be an acceptable amount to keep allocated at
            all times. So why don't we use this as the initial size and, when it
            grows larger, free it when finished? Perhaps the growth size can be
            doubled each time.
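The proposal above (start at 1024 bytes, double the growth size every time, give back the excess when finished) can be sketched in C as follows. This is a simplified, hypothetical stand-in (`grow_buf_T`, `grow_init()`, `grow_to()` and `grow_finish()` are illustrative names, not Vim's ga_grow()); it counts reallocations to show why geometric growth keeps copying cheap for deep cases such as the 31008-byte watermark reported in this thread.

```c
#include <assert.h>
#include <stdlib.h>

#define INITIAL_SIZE 1024  /* hypothetical: enough for most shallow matches */

typedef struct {
    char   *data;
    size_t  cap;       /* bytes allocated */
    size_t  growsize;  /* next increment; doubled after every growth */
    int     nrealloc;  /* growths since the last finish, for illustration */
} grow_buf_T;

static int grow_init(grow_buf_T *g)
{
    g->data = malloc(INITIAL_SIZE);
    g->cap = (g->data != NULL) ? INITIAL_SIZE : 0;
    g->growsize = INITIAL_SIZE;
    g->nrealloc = 0;
    return (g->data != NULL) ? 0 : -1;
}

/* Grow until at least "need" bytes fit, doubling the increment each step. */
static int grow_to(grow_buf_T *g, size_t need)
{
    assert(g->data != NULL);
    while (g->cap < need) {
        size_t newcap = g->cap + g->growsize;
        char  *p = realloc(g->data, newcap);
        if (p == NULL)
            return -1;            /* the old block is still valid */
        g->data = p;
        g->cap = newcap;
        g->growsize *= 2;
        g->nrealloc++;
    }
    return 0;
}

/* When the match is finished: shrink back to the initial size if we grew. */
static void grow_finish(grow_buf_T *g)
{
    if (g->cap > INITIAL_SIZE) {
        char *p = realloc(g->data, INITIAL_SIZE);
        if (p != NULL) {          /* on failure just keep the larger block */
            g->data = p;
            g->cap = INITIAL_SIZE;
        }
    }
    g->growsize = INITIAL_SIZE;
    g->nrealloc = 0;
}
```

Reaching 31008 bytes takes 5 growths here (1024 → 2048 → 4096 → 8192 → 16384 → 32768). With a fixed 1024-byte increment the same growth would take 30 reallocations (1024 + 30 × 1024 = 31744), each of which may copy the whole stack.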

            > 2. Command
            >
            > gvim -c "vimgrep /a\(.\)*z/ *.c | q"
            >
            > was executed in VIM 7 source directory. Stack watermark - 31008 bytes.
            > This is example of non-optimal regexp which tends to take a lot of
            > stack space.

            Right, this may happen, and the stack size will greatly depend on the
            line length.

            > Similar other test cases were tried leading to the following
            > conclusions: 1) there is a lot of vim_regexec_both calls during syntax
            > highlighting which work in very shallow stacks (<1K); 2) when user
            > searches for something with regexp there are cases when regular
            > expression can require big amount of memory (>10K).
            >
            > The performance measurements were done against original version
            > (7.0.188) and modified regexp.c (initial: 8192, keep limit: 16384).
            > Each measurement was performed 3 times, minimal time was picked up.
            >
            > First, I test the syntax highlighting speed:
            > Command: gvim.exe -f $VIMRUNTIME/filetype.vim -c "for i in range(199) | redraw! | endfor | q"
            > Original version: 10.6 seconds
            > Modified version: 8.5 seconds
            > The difference is about 25%.
            >
            > Second, I did some grepping through Vim sources again:
            > Command: gvim.exe -c "vimgrep /a.*z/ *.c | q"
            > Original version: 6.6 seconds
            > Modified version: 5.6 seconds
            > The difference is about 15%.

            That's very useful, thanks for diving into this.

            --
            From "know your smileys":
            8<}} Glasses, big nose, beard

          • Mikolaj Machowski
            Message 5 of 30, Feb 6, 2007
              Hello,

              Nice work. Could you send the patches or put them somewhere? I'd
              like to test them on more complex regexps.

              TIA

              m.
            • Alexei Alexandrov
              Message 6 of 30, Feb 6, 2007
                Hi Mikolaj Machowski, you wrote:

                > Nice work. Could you send or place somewhere patches? I'd like to test
                > them on more complex regexps.

                Here it is. Note that the biggest speed-up is observed when a regexp is matched many times. The regexp matching mechanism itself is not affected at all, so if you have one regexp that runs for a very long time you probably won't notice any major difference.

                --
                Alexei Alexandrov
              • Alexei Alexandrov
                Message 7 of 30, Feb 6, 2007
                  Hi Alexei Alexandrov, you wrote:

                  > Hi Bram et al.,
                  >
                  > I'm doing some performance investigations of Vim code trying to understand
                  > whether there are any possibilities to improve it.

                  I've also noticed that Vim spends a significant amount of time at startup loading spell files (I have 2 languages in my .vimrc: set spelllang=en,ru). The time is mostly spent in EnterCriticalSection/LeaveCriticalSection, with getc() higher up the stack. The reason is CRT locking, since the runtime is multithreaded. This is on Windows, but on Linux it should be similar.

                  As far as I understand, Vim doesn't access the spell file from multiple threads, so the situation can be improved a lot: on Linux by using getc_unlocked(); on Windows, starting with VS 2005, there is the function _getc_nolock(). Before VS 2005 this function can be emulated by a macro:

                  #define _getc_nolock(_stream) (--(_stream)->_cnt >= 0 ? \
                  0xff & *(_stream)->_ptr++ : _filbuf(_stream))

                  By switching to a non-locking getc() in spell.c I was able to reduce Vim startup time from about 0.9 seconds to about 0.7 seconds.
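A minimal sketch of the suggested change for POSIX systems such as Linux, assuming a hypothetical helper `read_all_unlocked()` that just counts bytes (spell.c's actual reading code is not shown in this thread). As a design choice, the stream lock is taken once with flockfile() for the whole loop, so the per-byte getc_unlocked() calls do no locking while other threads, if any, still cannot interleave. On VS 2005 and later, _getc_nolock() would play the role of getc_unlocked().

```c
#define _POSIX_C_SOURCE 200809L  /* expose getc_unlocked() and flockfile() */
#include <assert.h>
#include <stdio.h>

/* Hypothetical reader: counts the bytes in "fp" (a real one would decode
 * them).  One lock for the whole loop instead of one lock per getc(). */
static long read_all_unlocked(FILE *fp)
{
    long n = 0;
    int  c;

    assert(fp != NULL);
    flockfile(fp);                       /* lock the stream once */
    while ((c = getc_unlocked(fp)) != EOF)
        n++;                             /* no per-character locking here */
    funlockfile(fp);
    return n;
}
```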

                  --
                  Alexei Alexandrov
                • George V. Reilly
                  Message 8 of 30, Feb 6, 2007
                    Alexei Alexandrov wrote:
                    > I've also noticed that Vim spends somewhat significant time on startup loading spell files (I have 2 languages in my .vimrc: set spelllang=en,ru). The time is mostly spent in EnterCriticalSection/LeaveCriticalSection with getc() upper the stack. The reason for this is CRT blocking since the runtime is multithreaded. It's Windows, but on Linux it should be similar.
                    >
                    > As far as I understand, Vim doesn't access the spell file from multiple threads. Thus, the situation can be improved a lot: on Linux by using getc_unlocked. On Windows, starting VS 2005 there is a function _getc_nolock. Before VS 2005 this function can be emulated by macro:
                    >
                    > #define _getc_nolock(_stream) (--(_stream)->_cnt >= 0 ? \
                    > 0xff & *(_stream)->_ptr++ : _filbuf(_stream))
                    >
                    > By switching to non-blocking getc() in spell.c I was able to reduce Vim startup time from about 0.9 seconds to about 0.7 seconds.
                    >

                    How did you measure the time in EnterCriticalSection and
                    LeaveCriticalSection? If there's no lock contention, these routines are
                    little more than InterlockedIncrement and InterlockedDecrement, without
                    a kernel transition or blocking. If the lock is already held, then by
                    definition, EnterCriticalSection has to block until the lock is
                    available. Similarly, if LeaveCriticalSection detects that there are
                    other callers waiting, it will signal one of the waiters.

                    In other words, if you're seeing significant time in Enter/LeaveCS, I
                    can think of two causes. Either your measurement tool has perturbed the
                    results, or there really is some multithreaded lock contention. The
                    former seems more likely, as Vim is single-threaded, but who knows what
                    some DLLs in the Vim process might be doing.

                    I would be very wary of using the _getc_nolock macro until we understand
                    why you are seeing those results.

                    --
                    /George V. Reilly george@...
                    http://www.georgevreilly.com/blog
                    The biggest mistake is not learning from all your other mistakes.
                  • Alexei Alexandrov
                    Message 9 of 30, Feb 7, 2007
                      Hi George V. Reilly, you wrote:

                      >
                      > How did you measure the time in EnterCriticalSection and
                      > LeaveCriticalSection?

                      I discovered the problem using a performance tuning tool that takes a sampling approach to collect statistical profile data. The intrusiveness of this class of tools is very low. I verified that the problem exists by compiling Vim with an unlocked getc() and measuring the difference (without any tool, just with $ time ...).

                      > If there's no lock contention, these routines are little more than
                      > InterlockedIncrement and InterlockedDecrement, without a kernel transition
                      > or blocking.

                      You're absolutely right: no context switch is needed when the CS is free. But InterlockedIncrement/Decrement is not that cheap. On a uniprocessor machine it takes about 200 CPU cycles per atomic operation, so an EnterCS/LeaveCS pair takes about 400 CPU cycles. The program that confirms these numbers is attached. This means that reading 1 Mbyte of data with a locking getc() (roughly the size of the Russian + English .spl files) costs about 400e6 cycles for these useless attempts to synchronize. At my machine's frequency of 2.4 GHz = 2400e6 cycles per second, 400e6 cycles is 1/6 of a second, about 0.17 seconds, which is in line with the speedup I observed.
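The estimate above, written out, assuming one getc() call per byte, 1 Mbyte ≈ 10^6 bytes, and the quoted figure of about 200 cycles for each of the two interlocked operations per call:

```latex
\begin{aligned}
N_{\text{cycles}} &\approx 10^{6}\ \text{calls} \times (2 \times 200)\ \tfrac{\text{cycles}}{\text{call}} = 4 \times 10^{8}\ \text{cycles} \\
t &\approx \frac{4 \times 10^{8}\ \text{cycles}}{2.4 \times 10^{9}\ \text{cycles/s}} = \tfrac{1}{6}\ \text{s} \approx 0.17\ \text{s}
\end{aligned}
```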

                      And remember that you need to pay this price on every Vim startup.

                      >
                      > In other words, if you're seeing significant time in Enter/LeaveCS, I
                      > can think of two causes. Either your measurement tool has perturbed the
                      > results, or there really is some multithreaded lock contention. The
                      > former seems more likely, as Vim is single-threaded, but who knows what
                      > some DLLs in the Vim process might be doing.

                      No, there isn't any contention. The critical section in the Microsoft multithreaded CRT is per FILE *, so nobody can compete with you for it unless you give them the FILE *. As far as I can tell, the FILE * for the opened spell file is absolutely private to spell.c.

                      Also, the numbers above show that this overhead is what you pay without any contention. If there were contention, the overhead would be much bigger.
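The program mentioned above is not included in this archive. As a rough, hypothetical stand-in, this C11 sketch times a plain increment against an atomic read-modify-write (`atomic_fetch_add()`, which on x86 compiles to a lock-prefixed instruction, the same kind of operation InterlockedIncrement uses), taking the best of three runs as in the regexp measurements. `now_ns()`, `ns_per_op()`, `best_of_3()` and the iteration count are assumptions; absolute numbers depend heavily on the CPU and will likely differ from the Pentium 4 figures quoted here.

```c
#define _POSIX_C_SOURCE 200809L  /* clock_gettime() */
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

#define ITERATIONS 10000000L     /* hypothetical; large enough to time */

static double now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average nanoseconds per increment; "atomic" selects the locked variant.
 * The counters are atomic/volatile so the compiler cannot drop the loop. */
static double ns_per_op(int atomic)
{
    static atomic_long acounter;
    static volatile long counter;
    double start = now_ns();

    for (long i = 0; i < ITERATIONS; i++) {
        if (atomic)
            atomic_fetch_add(&acounter, 1);  /* lock-prefixed on x86 */
        else
            counter++;                       /* plain load/add/store */
    }
    return (now_ns() - start) / (double)ITERATIONS;
}

/* Best of three runs, mirroring the measurement method in this thread. */
static double best_of_3(int atomic)
{
    double best = ns_per_op(atomic);
    for (int run = 1; run < 3; run++) {
        double t = ns_per_op(atomic);
        if (t < best)
            best = t;
    }
    return best;
}
```

The difference between `best_of_3(1)` and `best_of_3(0)` approximates the extra cost of one interlocked operation; by the reasoning above, an uncontended EnterCS/LeaveCS pair costs roughly two of them.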

                      >
                      > I would be vary wary of using the _getc_nolock macro until we understand
                      > why you are seeing those results.
                      >

                      --
                      Alexei Alexandrov
                    • Alexei Alexandrov
                      Message 10 of 30, Feb 7, 2007
                      • Alexei Alexandrov
                        Message 11 of 30, Feb 7, 2007
                          Hi Alexei Alexandrov, you wrote:

                          >
                          > The program which confirms these numbers is attached.
                          >
                          Forgot the attachment.

                          --
                          Alexei Alexandrov
                        • Mikolaj Machowski
                          Message 12 of 30, Feb 7, 2007
                            On Wednesday, 07 February 2007, Alexei Alexandrov wrote:
                            > Hi Mikolaj Machowski, you wrote:
                            > > Nice work. Could you send or place somewhere patches? I'd like to test
                            > > them on more complex regexps.
                            >
                            > Here it is. Note that the biggest speed-up is observed when regexp is
                            > matched a lot of times. The regexp mechanism itself is not affected at
                            > all here - so if you have one regexp which runs very long you won't
                            > probably notice any major difference.

                            When I tested it with VST it gave a 3.4% speed improvement (same
                            methodology: 3 tests before and after, best result taken).

                            m.
                          • Alexei Alexandrov
                            Message 13 of 30, Feb 8, 2007
                              Hi Mikolaj Machowski, you wrote:

                              >
                              > When testing it with VST it gave 3.4% speed improvements (the same
                              > metodology - 3 tests before and after, choose the best results).
                              >

                              Well, it's not that much, but it's still a positive result. :-)

                              --
                              Alexei Alexandrov
                            • Nikolai Weibull
                              Message 14 of 30, Feb 8, 2007
                                On 2/8/07, Alexei Alexandrov <alexei.alexandrov@...> wrote:

                                > > When testing it with VST it gave 3.4% speed improvements (the same
                                > > metodology - 3 tests before and after, choose the best results).

                                > Well, it's not that much but it's still positive result. :-)

                                Considering how intense these computations are and how often they are
                                performed (probably the most time-consuming part of Vim), any
                                improvement will help a lot.

                                It's nice to see people alongside Bram working on improving Vim. Your
                                work is highly appreciated (as well ;-).

                                nikolai
                              • Bram Moolenaar
                                Message 15 of 30, Feb 8, 2007
                                  Alexei Alexandrov wrote:

                                  > I've also noticed that Vim spends somewhat significant time on startup
                                  > loading spell files (I have 2 languages in my .vimrc: set
                                  > spelllang=en,ru). The time is mostly spent in
                                  > EnterCriticalSection/LeaveCriticalSection with getc() upper the stack.
                                  > The reason for this is CRT blocking since the runtime is
                                  > multithreaded. It's Windows, but on Linux it should be similar.

                                  I would not assume it's similar until there is proof.

                                  > As far as I understand, Vim doesn't access the spell file from
                                  > multiple threads. Thus, the situation can be improved a lot: on Linux
                                  > by using getc_unlocked. On Windows, starting VS 2005 there is a
                                  > function _getc_nolock. Before VS 2005 this function can be emulated by
                                  > macro:
                                  >
                                  > #define _getc_nolock(_stream) (--(_stream)->_cnt >= 0 ? \
                                  > 0xff & *(_stream)->_ptr++ : _filbuf(_stream))
                                  >
                                  > By switching to non-blocking getc() in spell.c I was able to reduce
                                  > Vim startup time from about 0.9 seconds to about 0.7 seconds.

                                  This sounds like a bug in getc(). It should know that only one thread
                                  is reading the file and skip the locking. I don't think we should fix
                                  library bugs in Vim, unless it's crashing or another significant problem.

                                  Perhaps you can already get a big increase by not compiling for
                                  debugging? With MSVC this usually has a big impact. Having debugging
                                  enabled also largely defeats profiling.

                                  --
                                  From "know your smileys":
                                  :-E Has major dental problems

                                • Alexei Alexandrov
                                  Message 16 of 30, Feb 10, 2007
                                    Hi Bram Moolenaar, you wrote:

                                    >> It's Windows, but on Linux it should be similar.
                                    >
                                    > I would not assume it's similar until there is proof.
                                    >
                                    Of course. I'm going to investigate it there.

                                    >
                                    > This sounds like a bug in getc(). It should know that only one thread
                                    > is reading the file and skip the locking. I don't think we should fix
                                    > library bugs in Vim, unless it's crashing or another significant problem.
                                    >
                                    It can't be a bug. I might be missing what you mean, but I can't see how it can know that only one thread is reading a file. It has no way of knowing whether you gave this FILE * to other threads. It tries to be lightweight: as I described in a separate mail, it uses InterlockedIncrement/Decrement, but those are not that lightweight either. They don't require switching to kernel mode, but they still take about 200 CPU cycles each.

                                    The only optimization I can see would be to avoid locking (and even attempting to lock) when there is only one thread in the current process and this particular call is guaranteed not to create any threads. But 1) that may still be expensive, and 2) Vim probably has some background threads anyway.

                                    > Perhaps you can already get a big increase by not compiling for
                                    > debugging? With MSVC this usually has a big impact. Also largely
                                    > defeats profiling with debugging enabled.
                                    >
                                    I do _all_ performance measurements using an optimized binary with symbols. This is a must for performance tuning.

                                    --
                                    Alexei Alexandrov
                                  • Bram Moolenaar
                                    ... Hmm, getc() is apparently preparing for the worst and isn t optimized for the 98% of the situations where there is only one thread reading from the fd.
                                    Message 17 of 30 , Feb 10, 2007
                                      Alexei Alexandrov wrote:

                                      > > This sounds like a bug in getc(). It should know that only one thread
                                      > > is reading the file and skip the locking. I don't think we should fix
                                      > > library bugs in Vim, unless it's crashing or another significant problem.
                                      > >
                                      > It can't be a bug. I might be missing what you mean, but I can't see
                                      > how it can know that only one thread is reading a file. It has no
                                      > clue whether you gave this FILE * to other threads or not. It tries
                                      > to be lightweight: as I described in a separate mail, it uses
                                      > InterlockedIncrement/Decrement, but those are not that lightweight
                                      > either. They don't require switching to kernel mode, but each one
                                      > still takes about 200 CPU cycles.
                                      >
                                      > The only optimization I can see would be to avoid locking (and even
                                      > the attempt to lock) when there is only one thread in the current
                                      > process and this particular call is guaranteed not to create any
                                      > threads. But 1) that may still be expensive, and 2) Vim probably has
                                      > some background threads anyway.

                                      Hmm, getc() is apparently preparing for the worst and isn't optimized
                                      for the 98% of situations where there is only one thread reading
                                      from the fd. As a result, the very frequently used getc() is slow.

                                      Besides using getc_unlocked(), we could use read() and do our own
                                      buffering. That can't be very complicated. And we wouldn't need to
                                      figure out whether the getc_unlocked() function is available.


                                      --
                                      hundred-and-one symptoms of being an internet addict:
                                      102. When filling out your driver's license application, you give
                                      your IP address.

                                      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                                      /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                                      \\\ download, build and distribute -- http://www.A-A-P.org ///
                                      \\\ help me help AIDS victims -- http://ICCF-Holland.org ///
                                    • Alexei Alexandrov
                                      Message 18 of 30 , Feb 10, 2007
                                        Hi Bram Moolenaar, you wrote:

                                        >
                                        > It sounds like keeping only 1024 bytes would already work for most
                                        > situations. That would be an acceptable amount to keep allocated at
                                        > all times. So why don't we use this as the initial size, and when it
                                        > grows larger we free it when finished. The growth size can be doubled
                                        > each time perhaps.
                                        >

                                        I chose the 8192/16384 pair because it's the closest to the original 10000 bytes. 10000 itself would also be fine, but I like round numbers...

                                        The attached patch contains changes that are, I think, close to what you describe above. Could you please take a look at it?

                                        >
                                        > Right, this may happen, and the stack size will greatly depend on the
                                        > line length.
                                        >
                                        ...
                                        >
                                        > That's very useful, thanks for diving into this.
                                        >

                                        My pleasure.

                                        --
                                        Alexei Alexandrov
                                      • Bram Moolenaar
                                        Message 19 of 30 , Feb 10, 2007
                                          Alexei Alexandrov wrote:

                                          > Hi Bram Moolenaar, you wrote:
                                          >
                                          > >
                                          > > It sounds like keeping only 1024 bytes would already work for most
                                          > > situations. That would be an acceptable amount to keep allocated at
                                          > > all times. So why don't we use this as the initial size, and when it
                                          > > grows larger we free it when finished. The growth size can be doubled
                                          > > each time perhaps.
                                          >
                                          > I chose the 8192/16384 pair because it's the closest to the original
                                          > 10000 bytes. 10000 itself would also be fine, but I like round numbers...
                                          >
                                          > The attached patch contains changes that are, I think, close to what
                                          > you describe above. Could you please take a look at it?

                                          It's starting to look better: fewer disadvantages, and it should still
                                          give a fair speedup.

                                          10000 bytes is OK for when it's going to be freed again soon. If you
                                          want to keep the memory allocated something less would be more
                                          appropriate. And thus you need to start low, but could increase it in
                                          larger steps (since it's going to be freed on return anyway).

                                          The growarray is used in lots of places. Adding another field to it
                                          will cause more memory to be used. Isn't it easier to make another
                                          version of ga_grow() that increases the growsize when allocating another
                                          block?

                                          Instead of doubling each time, which quickly results in big chunks,
                                          another way would be to first allocate one block at the start size of
                                          about 1000 bytes, then set the growsize to 10000. So we grow at the
                                          same speed as before. Then no extra field or function is needed; it's a
                                          local change in the regexp code.

                                          Something like:
                                          if (regstack.ga_data == NULL)
                                          {
                                              ga_init2(&regstack, 1, 1000);
                                              ga_grow(&regstack, 1);
                                          }
                                          regstack.ga_growsize = 10000;

                                          I do wonder if the memory needs to be cleared when re-using the
                                          allocated memory. Can you check that?

                                          --
                                          A meeting is an event at which the minutes are kept and the hours are lost.

                                        • Alexei Alexandrov
                                          Message 20 of 30 , Oct 2, 2007
                                            Bram Moolenaar wrote:

                                            >
                                            > Instead of doubling each time, which quickly results in big chunks,
                                            > another way would be to first allocate one block at the start size of
                                            > about 1000 bytes, then set the growsize to 10000. So we grow at the
                                            > same speed as before. Then no extra field or function is needed; it's a
                                            > local change in the regexp code.
                                            >
                                            > Something like:
                                            > if (regstack.ga_data == NULL)
                                            > {
                                            >     ga_init2(&regstack, 1, 1000);
                                            >     ga_grow(&regstack, 1);
                                            > }
                                            > regstack.ga_growsize = 10000;
                                            >
                                            > I do wonder if the memory needs to be cleared when re-using the
                                            > allocated memory. Can you check that?
                                            >

                                            I'd like to get back to this topic, now that I have used my patch for
                                            half a year without a problem.

                                            The patch is attached. It addresses your comments fully. Please take a look.

                                            Regarding clearing the stacks: from what I can tell from the code, it's
                                            not needed. regmatch() resets the stack heads at the beginning by
                                            resetting the len of the arrays, and then it uses only the part of the
                                            stack it fills.

                                            In fact, I also considered moving the stack allocation management
                                            directly into regmatch(), since the stacks are actually used only there.
                                            But that's more of a cosmetic change and shouldn't affect the
                                            functionality in any way (though we might get 2 reallocations per
                                            regexec_both call in the case of big chunks).

                                            So does the patch look like a good one to you? Or will I just live with
                                            it here locally? :)


                                            --~--~---------~--~----~------------~-------~--~----~
                                            You received this message from the "vim_dev" maillist.
                                            For more information, visit http://www.vim.org/maillist.php
                                            -~----------~----~----~----~------~----~------~--~---
                                          • Bram Moolenaar
                                            Message 21 of 30 , Oct 3, 2007
                                              Alexei Alexandrov wrote:

                                              > > Instead of doubling each time, which quickly results in big chunks,
                                              > > another way would be to first allocate one block at the start size of
                                              > > about 1000 bytes, then set the growsize to 10000. So we grow at the
                                              > > same speed as before. Then no extra field or function is needed; it's a
                                              > > local change in the regexp code.
                                              > >
                                              > > Something like:
                                              > > if (regstack.ga_data == NULL)
                                              > > {
                                              > >     ga_init2(&regstack, 1, 1000);
                                              > >     ga_grow(&regstack, 1);
                                              > > }
                                              > > regstack.ga_growsize = 10000;
                                              > >
                                              > > I do wonder if the memory needs to be cleared when re-using the
                                              > > allocated memory. Can you check that?
                                              > >
                                              >
                                              > I'd like to get back to this topic, now that I have used my patch for
                                              > half a year without a problem.
                                              >
                                              > The patch is attached. It addresses your comments fully. Please take a look.
                                              >
                                              > Regarding clearing the stacks: from what I can tell from the code, it's
                                              > not needed. regmatch() resets the stack heads at the beginning by
                                              > resetting the len of the arrays, and then it uses only the part of the
                                              > stack it fills.
                                              >
                                              > In fact, I also considered moving the stack allocation management
                                              > directly into regmatch(), since the stacks are actually used only there.
                                              > But that's more of a cosmetic change and shouldn't affect the
                                              > functionality in any way (though we might get 2 reallocations per
                                              > regexec_both call in the case of big chunks).
                                              >
                                              > So does the patch look like a good one to you? Or will I just live with
                                              > it here locally? :)

                                              The patch looks OK to me. The big question is: how much performance do
                                              we gain?

                                              There is also another regexp improvement underway; this was part of the
                                              Google Summer of Code. It would be nice if we had a performance
                                              measurement mechanism, so that the regexp stuff can be tuned. A Vim
                                              script would be best, so that it can be run everywhere. Perhaps using
                                              some of the syntax highlighting, since that uses regexp a lot and
                                              provides a real-world situation. Since the actual display updating is
                                              not what we want to measure, using the synID() function might work.
                                              Combined with ":syn sync fromstart".

                                              --
                                              ARTHUR: Now stand aside worthy adversary.
                                              BLACK KNIGHT: (Glancing at his shoulder) 'Tis but a scratch.
                                              ARTHUR: A scratch? Your arm's off.
                                              "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD


                                            • Mikolaj Machowski
                                              Message 22 of 30 , Oct 3, 2007
                                                On Wednesday, 03 October 2007, Bram Moolenaar wrote:
                                                > > So does the patch look like a good one to you? Or will I just live
                                                > > with it here locally? :)
                                                >
                                                > The patch looks OK to me. The big question is: how much performance do
                                                > we gain?

                                                I repeated tests done in Feb. The same methodology.

                                                Vim 7.1.135
                                                Script: VST.
                                                Three profiles before patch, three profiles after patch.

                                                The difference between the two best results is 2.2% (compared with 3.4%
                                                in Feb). In addition, I compared the worst with the best:

                                                worst pre - best post: 4.7%
                                                best pre - worst post: 1.2%

                                                Of course, all numbers are time gains achieved with the regexp-perf patch.

                                                m.


                                              • Bram Moolenaar
                                                Message 23 of 30 , Oct 3, 2007
                                                  Mikolaj Machowski wrote:

                                                  > On Wednesday, 03 October 2007, Bram Moolenaar wrote:
                                                  > > > So does the patch look like a good one to you? Or will I just live
                                                  > > > with it here locally? :)
                                                  > >
                                                  > > The patch looks OK to me. The big question is: how much performance do
                                                  > > we gain?
                                                  >
                                                  > I repeated tests done in Feb. The same methodology.
                                                  >
                                                  > Vim 7.1.135
                                                  > Script: VST.
                                                  > Three profiles before patch, three profiles after patch.
                                                  >
                                                  > Difference between two best results is 2.2% (comparing with 3.4% from
                                                  > Feb). In addition I compared worst with best:
                                                  >
                                                  > worst pre - best post: 4.7%
                                                  > best pre - worst post: 1.2%
                                                  >
                                                  > Of course all numbers are time gains achieved with regexp-perf patch.

                                                  What do you mean by VST? It would be good to describe this in a way
                                                  that I can run the same test on my system.

                                                  --
                                                  Scientists decoded the first message from an alien civilization:
                                                  SIMPLY SEND 6 TIMES 10 TO THE 50 ATOMS OF HYDROGEN TO THE STAR
                                                  SYSTEM AT THE TOP OF THE LIST, CROSS OFF THAT STAR SYSTEM, THEN PUT
                                                  YOUR STAR SYSTEM AT THE BOTTOM OF THE LIST AND SEND IT TO 100 OTHER
                                                  STAR SYSTEMS. WITHIN ONE TENTH GALACTIC ROTATION YOU WILL RECEIVE
                                                  ENOUGH HYDROGREN TO POWER YOUR CIVILIZATION UNTIL ENTROPY REACHES ITS
                                                  MAXIMUM! IT REALLY WORKS!


                                                • Mikołaj Machowski
                                                  Message 24 of 30 , Oct 4, 2007
                                                    > > Vim 7.1.135
                                                    > > Script: VST.
                                                    > > Three profiles before patch, three profiles after patch.
                                                    > >
                                                    > > Difference between two best results is 2.2% (comparing with 3.4% from
                                                    > > Feb). In addition I compared worst with best:
                                                    > >
                                                    > > worst pre - best post: 4.7%
                                                    > > best pre - worst post: 1.2%
                                                    > >
                                                    > What do you mean by VST? It would be good to describe this in a way
                                                    > that I can run the same on my system.
                                                    >
                                                    VST - Vim reStructuredText

                                                    What is it:
                                                    http://skawina.eu.org/mikolaj/vst.html

                                                    http://skawina.eu.org/mikolaj/vst.zip

                                                    Tests were done on vst.txt with the :Vst command.

                                                    m.




                                                  • Alexei Alexandrov
                                                    Message 25 of 30 , Oct 4, 2007
                                                      Bram Moolenaar wrote:
                                                      >>
                                                      >> So does the patch look like a good one to you? Or will I just live with
                                                      >> it here locally? :)
                                                      >
                                                      > The patch looks OK to me. The big question is: how much performance do
                                                      > we gain?
                                                      >
                                                      > There is also another regexp improvement underway; this was part of the
                                                      > Google Summer of Code. It would be nice if we had a performance
                                                      > measurement mechanism, so that the regexp stuff can be tuned. A Vim
                                                      > script would be best, so that it can be run everywhere. Perhaps using
                                                      > some of the syntax highlighting, since that uses regexp a lot and
                                                      > provides a real-world situation. Since the actual display updating is
                                                      > not what we want to measure, using the synID() function might work.
                                                      > Combined with ":syn sync fromstart".
                                                      >

                                                      OK, so I'll need to prepare some numbers.

                                                      For now, some historical background. I started this performance
                                                      investigation because of a pathological performance degradation in my
                                                      Vim script for coloring the rgb.txt file (the file is in the root of the
                                                      Vim runtime file tree). I color it so that each line is shown in the RGB
                                                      color it represents. The script is attached.

                                                      It creates a large number of highlight groups with short, simple matches.
                                                      It was lightning fast on Vim 6.3, and redrawing became slo-o-o-w on Vim 7
                                                      (after the regexp stack was moved from the OS stack to the heap). This is
                                                      why I investigated it. So I wouldn't call it an improvement; it's
                                                      actually a return to the performance of 6.3. Now I need to prove to you
                                                      that it's actually "back". OK, will do :-)

                                                      --
                                                      Alexei Alexandrov

                                                    • Bram Moolenaar
                                                      Message 26 of 30 , Oct 4, 2007
                                                        Mikolaj Machowski wrote:

                                                        > > > Vim 7.1.135
                                                        > > > Script: VST.
                                                        > > > Three profiles before patch, three profiles after patch.
                                                        > > >
                                                        > > > Difference between two best results is 2.2% (comparing with 3.4% from
                                                        > > > Feb). In addition I compared worst with best:
                                                        > > >
                                                        > > > worst pre - best post: 4.7%
                                                        > > > best pre - worst post: 1.2%
                                                        > > >
                                                        > > What do you mean with VST? It would be good to describe this in a way
                                                        > > that I can run the same on my system.
                                                        > >
                                                        > VST - Vim reStructuredText
                                                        >
                                                        > What is it:
                                                        > http://skawina.eu.org/mikolaj/vst.html
                                                        >
                                                        > http://skawina.eu.org/mikolaj/vst.zip
                                                        >
                                                        > Tests were done on vst.txt and with :Vst command

                                                        OK, I know about that. I'm not sure it would give a representative
                                                        measure of regexp performance. I would think most people use regexp
                                                        patterns for syntax highlighting.

                                                        --
                                                        Team-building exercises come in many forms but they all trace their roots back
                                                        to the prison system. In your typical team-building exercise the employees
                                                        are subjected to a variety of unpleasant situations until they become either a
                                                        cohesive team or a ring of car jackers.
                                                        (Scott Adams - The Dilbert principle)

                                                        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                                                        /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                                                        \\\ download, build and distribute -- http://www.A-A-P.org ///
                                                        \\\ help me help AIDS victims -- http://ICCF-Holland.org ///
