Re: runaway vim processes

  • Daniel Elstner
    Message 1 of 14, Mar 1, 2003
      On Sat, 2003-03-01 at 03:21, Michael P. Soulier wrote:
      > On 27/02/03 Daniel Elstner did speaketh:
      >
      > > Now that was way more than 10 minutes. Anyway, I think I got it
      > > working, the patch is attached.
      >
      > God I love open source.

      Cheers.

      Just to be sure -- did you test the patch and verify that it works for
      you, or was this just a generic expression of joy about the rapid
      customer service in the Open Source world? :-)

      --Daniel
    • Daniel Elstner
      Message 2 of 14, Mar 3, 2003
        Hey,

        I found some interesting information on the issue. This Debian bug
        report explains what's going on:

        http://lists.debian.org/debian-glibc/2002/debian-glibc-200212/msg00347.html

        As mentioned in the mail, the problem with coroutines also applies to
        sigaltstack(). There seems to be only one way around the problem:
        simply don't use sigaltstack().

        The attached patch disables the alternative stack if compiling on Linux
        with pthreads, except for SIGSEGV. This is AFAIK the only signal for
        which the alternative stack is really necessary. Thus there shouldn't
        be a regression in functionality when switching to fixed-up pthreads
        some day.
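
The attached patch is not part of this archive. As a minimal sketch of the idea described above, the flag selection could look like the following. The helper names and the `affected_threads` parameter are hypothetical; the patch itself reportedly makes this decision at compile time.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: choose sa_flags for one signal.
 * affected_threads is nonzero when building for Linux with pthreads,
 * the configuration in which the alternate stack causes trouble. */
int handler_flags(int sig, int affected_threads)
{
    if (affected_threads && sig != SIGSEGV)
        return 0;          /* handler runs on the ordinary stack */
    return SA_ONSTACK;     /* handler runs on the alternate stack */
}

/* Hypothetical helper: install fn for sig using the flags chosen above.
 * Returns the result of sigaction(): 0 on success, -1 on failure. */
int install_handler(int sig, void (*fn)(int), int affected_threads)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fn;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = handler_flags(sig, affected_threads);
    return sigaction(sig, &sa, NULL);
}
```

With this split, only SIGSEGV keeps SA_ONSTACK in the affected configuration, so a handler for any other signal never runs with the stack pointer inside the alternate stack.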

        Regards,
        --Daniel
      • Bram Moolenaar
        Message 3 of 14, Mar 3, 2003
          Daniel Elstner wrote:

          > I found some interesting information on the issue. This Debian bug
          > report explains what's going on:
          >
          > http://lists.debian.org/debian-glibc/2002/debian-glibc-200212/msg00347.html
          >
          > As mentioned in the mail, the problem with coroutines also applies to
          > sigaltstack(). There seems to be only one way around the problem:
          > simply don't use sigaltstack().
          >
          > The attached patch disables the alternative stack if compiling on Linux
          > with pthreads, except for SIGSEGV. This is AFAIK the only signal for
          > which the alternative stack is really necessary. Thus there shouldn't
          > be a regression in functionality when switching to fixed-up pthreads
          > some day.

          This makes sense. I'm glad you found out about this Linux problem.

          The solution would not work though: Catching SIGSEGV on the alternate
          stack works (assuming that longjmp() works with threading).
          But when SIGSEGV occurs for another reason it would run into the problem
          with the stack pointer and generate another SIGSEGV, thus loop forever.

          I cannot think of a solution without losing the ability to catch
          out-of-stack errors. We _need_ the alternate stack, and it can't be
          used when threading is enabled...
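
For readers unfamiliar with the technique under discussion, here is a rough sketch of catching an out-of-stack error on an alternate stack and recovering with siglongjmp(). It is not Vim's code: all names are hypothetical, it assumes a single-threaded process on a system where stack exhaustion raises SIGSEGV (as on Linux), and it assumes a finite stack limit well under 256 MiB.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf recover_point;
static char alt_stack_mem[64 * 1024];   /* memory for the alternate stack */

/* Runs on the alternate stack, because the normal one is exhausted. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);
}

/* Deliberately deep recursion; the volatile buffer keeps each frame
 * large and stops the compiler from turning this into a loop. */
static int recurse(int depth)
{
    volatile char pad[4096];

    pad[0] = (char)depth;
    if (depth >= 65536)     /* safety cap, roughly 256 MiB of stack */
        return 0;
    return recurse(depth + 1) + pad[0];
}

/* Returns 1 if the overflow was caught, 0 if the cap was reached
 * without a fault, -1 if the handler could not be set up. */
int overflow_is_caught(void)
{
    stack_t ss, old_ss;
    struct sigaction sa, old_sa;
    int caught;

    ss.ss_sp = alt_stack_mem;
    ss.ss_size = sizeof alt_stack_mem;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, &old_ss) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;
    if (sigaction(SIGSEGV, &sa, &old_sa) != 0)
        return -1;

    /* Save the signal mask too, so SIGSEGV is unblocked after the jump. */
    if (sigsetjmp(recover_point, 1) == 0) {
        recurse(0);
        caught = 0;
    } else {
        caught = 1;
    }

    sigaction(SIGSEGV, &old_sa, NULL);
    sigaltstack(&old_ss, NULL);
    return caught;
}
```

Without SA_ONSTACK the kernel would have no room to build the handler's frame on the exhausted stack, and the process would simply be killed. That is why the alternate stack matters for this particular case.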

          --
          To keep milk from turning sour: Keep it in the cow.

          /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
          /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
          \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
          \\\ Help AIDS victims, buy at Amazon -- http://ICCF.nl/click1.html ///
        • Daniel Elstner
          Message 4 of 14, Mar 3, 2003
            On Mon, 2003-03-03 at 21:05, Bram Moolenaar wrote:
            > Daniel Elstner wrote:
            >
            > > I found some interesting information on the issue. This Debian bug
            > > report explains what's going on:
            > >
            > > http://lists.debian.org/debian-glibc/2002/debian-glibc-200212/msg00347.html
            > >
            > > As mentioned in the mail, the problem with coroutines also applies to
            > > sigaltstack(). There seems to be only one way around the problem:
            > > simply don't use sigaltstack().
            > >
            > > The attached patch disables the alternative stack if compiling on Linux
            > > with pthreads, except for SIGSEGV. This is AFAIK the only signal for
            > > which the alternative stack is really necessary. Thus there shouldn't
            > > be a regression in functionality when switching to fixed-up pthreads
            > > some day.
            >
            > This makes sense. I'm glad you found out about this Linux problem.
            >
            > The solution would not work though: Catching SIGSEGV on the alternate
            > stack works (assuming that longjmp() works with threading).
            > But when SIGSEGV occurs for another reason it would run into the problem
            > with the stack pointer and generate another SIGSEGV, thus loop forever.

            Yes, it's not pretty. My idea was that reducing the problem to SIGSEGV
            is a lot better than the current situation, and switching to a fixed
            pthread (i.e. glibc compiled with minimum kernel >= 2.4) would fix the
            problem without recompiling Vim.

            SIGSEGV is fatal in most apps; Vim will just behave slightly more
            annoyingly and livelock instead of crashing immediately. And for this we
            can blame someone else.

            > I cannot think of a solution without losing the ability to catch
            > out-of-stack errors. We _need_ the alternate stack, and it can't be
            > used when threading is enabled...

            There is a solution -- rebuild glibc with minimum kernel >= 2.4 :/

            Regards,
            --Daniel
          • Daniel Elstner
            Message 5 of 14, Mar 3, 2003
              On Mon, 2003-03-03 at 21:05, Bram Moolenaar wrote:

              > The solution would not work though: Catching SIGSEGV on the alternate
              > stack works (assuming that longjmp() works with threading).

              I just confirmed that at least that part still works. Inserting a
              recursive call early in regmatch() successfully resulted in E363, and
              Vim did not crash.

              --Daniel
            • Daniel Elstner
              Message 6 of 14, Mar 3, 2003
                On Tue, 2003-03-04 at 08:55, Daniel Elstner wrote:
                > On Mon, 2003-03-03 at 21:05, Bram Moolenaar wrote:
                >
                > > The solution would not work though: Catching SIGSEGV on the alternate
                > > stack works (assuming that longjmp() works with threading).
                >
                > I just confirmed that at least that part still works. Inserting a
                > recursive call early in regmatch() successfully resulted in E363, and
                > Vim did not crash.

                Ooops, sorry, I take that back. /me just realized that there is an
                explicit call to mch_stackcheck() in regmatch(). Is there any simple
                way to test the functionality of the signal handler?

                --Daniel
              • Bram Moolenaar
                Message 7 of 14, Mar 4, 2003
                  Daniel Elstner wrote:

                  > On Tue, 2003-03-04 at 08:55, Daniel Elstner wrote:
                  > > On Mon, 2003-03-03 at 21:05, Bram Moolenaar wrote:
                  > >
                  > > > The solution would not work though: Catching SIGSEGV on the alternate
                  > > > stack works (assuming that longjmp() works with threading).
                  > >
                  > > I just confirmed that at least that part still works. Inserting a
                  > > recursive call early in regmatch() successfully resulted in E363, and
                  > > Vim did not crash.
                  >
                  > Ooops, sorry, I take that back. /me just realized that there is an
                  > explicit call to mch_stackcheck() in regmatch(). Is there any simple
                  > way to test the functionality of the signal handler?

                  That's what you get when double checking for errors...

                  I think the simplest way is to undefine HAVE_GETRLIMIT in auto/config.h
                  and compile again.

                  Since we do HAVE_GETRLIMIT on linux, and threading may cause a hang, it
                  might be better to rely on mch_stackcheck() and not use the alternate
                  stack. Thus use your #ifdef around setting sa.sa_flags, also for
                  SIGSEGV.

                      /* Setup to use the alternate stack for the signal function. */
                      sa.sa_handler = func_deadly;
                      sigemptyset(&sa.sa_mask);
                  # if defined(__linux__) && defined(_REENTRANT)
                      /* Linux with kernel 2.2 has a bug in thread handling in
                       * combination with using the alternate stack: library functions
                       * will use the ordinary stack anyway, causing a SEGV signal,
                       * which recursively calls deathtrap and hangs. */
                      sa.sa_flags = 0;
                  # else
                      sa.sa_flags = SA_ONSTACK;
                  # endif
                      sigaction(signal_info[i].sig, &sa, NULL);

                  Does that look OK?
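
Relying on an explicit check instead of the signal handler means comparing the current stack position against a limit derived from getrlimit(). The sketch below shows the general shape of such a check. It is not the body of mch_stackcheck(): the function names, the 1/16 safety margin, and the assumption of a downward-growing stack are all illustrative.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <sys/resource.h>

/* Lowest stack address we are willing to reach; 0 means "unknown". */
static uintptr_t stack_floor;

/* Call once early in main(), passing the address of a local variable
 * there, so that near_base lies close to the top of the stack. */
void stack_check_init(const void *near_base)
{
    struct rlimit rl;
    uintptr_t base = (uintptr_t)near_base;
    rlim_t usable;

    stack_floor = 0;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return;                     /* no usable limit: checks always pass */

    usable = rl.rlim_cur / 16 * 15; /* keep 1/16 of the limit as margin */
    if (usable < base)
        stack_floor = base - (uintptr_t)usable;
}

/* Returns nonzero while the address of a local variable (probe) is
 * still above the computed floor, i.e. recursing further looks safe. */
int stack_check_ok(const void *probe)
{
    return stack_floor == 0 || (uintptr_t)probe > stack_floor;
}
```

A recursive function would pass the address of one of its own locals to stack_check_ok() before descending and report an error when the check fails. This never involves a signal, so it sidesteps the alternate-stack problem entirely, at the price of only protecting code paths that make the call.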

                  --
                  SUPERIMPOSE "England AD 787". After a few more seconds we hear hoofbeats in
                  the distance. They come slowly closer. Then out of the mist comes KING
                  ARTHUR followed by a SERVANT who is banging two half coconuts together.
                  "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

                  /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                  /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
                  \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
                  \\\ Help AIDS victims, buy at Amazon -- http://ICCF.nl/click1.html ///
                • Daniel Elstner
                  Message 8 of 14, Mar 4, 2003
                    On Tue, 2003-03-04 at 10:25, Bram Moolenaar wrote:

                    > Since we do HAVE_GETRLIMIT on linux, and threading may cause a hang, it
                    > might be better to rely on mch_stackcheck() and not use the alternate
                    > stack. Thus use your #ifdef around setting sa.sa_flags, also for
                    > SIGSEGV.

                    Yes, that sounds good to me.

                    > /* Setup to use the alternate stack for the signal function. */
                    > sa.sa_handler = func_deadly;
                    > sigemptyset(&sa.sa_mask);
                    > # if defined(__linux__) && defined(_REENTRANT)
                    > /* Linux with kernel 2.2 has a bug in thread handling in
                    > * combination with using the alternate stack: library functions
                    > * will use the ordinary stack anyway, causing a SEGV signal,
                    > * which recursively calls deathtrap and hangs. */
                    > sa.sa_flags = 0;
                    > # else
                    > sa.sa_flags = SA_ONSTACK;
                    > # endif
                    > sigaction(signal_info[i].sig, &sa, NULL);
                    >
                    > Does that look OK?

                    Yep. But I'd change the wording of the comment to:

                    /* On Linux, glibc compiled for minimum kernel 2.2 has a bug in
                     * thread handling in combination with using the alternate stack:
                     * pthread library functions try to use the stack pointer to
                     * identify the current thread, causing a SEGV signal, which
                     * recursively calls deathtrap and hangs. */

                    Otherwise people might get confused since Linux 2.2 is quite rare on
                    desktop systems these days.
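
To make the reworded comment concrete: if a thread library finds the current thread by rounding the stack pointer up to a fixed-size, aligned stack slot, then a stack pointer inside an alternate signal stack rounds to an unrelated address. The following is a simplified illustration of that arithmetic only, not the actual glibc code; the 2 MiB slot size and all names are assumptions.

```c
#include <stdint.h>

/* Assumed size of each thread's aligned stack slot. */
#define SLOT_SIZE ((uintptr_t)2 * 1024 * 1024)

/* Round a stack pointer up to the top of the slot containing it.
 * In this model the thread's bookkeeping data sits just below that top. */
uintptr_t slot_top_for_sp(uintptr_t sp)
{
    return (sp | (SLOT_SIZE - 1)) + 1;
}

/* Nonzero if sp would be attributed to the thread whose slot ends at
 * thread_slot_top. */
int sp_maps_to_thread(uintptr_t sp, uintptr_t thread_slot_top)
{
    return slot_top_for_sp(sp) == thread_slot_top;
}
```

For a thread whose slot ends at 0x40200000, a stack pointer of 0x401FFFF0 maps back to that thread, while a stack pointer of 0x08100010 inside a statically allocated alternate stack rounds to 0x08200000, which does not belong to any thread. Dereferencing bookkeeping data at such an address is the kind of access that would raise the SEGV described in the comment.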

                    --Daniel