Re: runaway vim processes

  • Michael P. Soulier
    Message 1 of 14 , Feb 28, 2003
      On 27/02/03 Daniel Elstner did speaketh:

      > Now that was way more than 10 minutes. Anyway, I think I got it
      > working, the patch is attached.

      God I love open source.

      Mike

      --
      Michael P. Soulier <msoulier@...>, GnuPG pub key: 5BC8BE08
      "...the word HACK is used as a verb to indicate a massive amount
      of nerd-like effort." -Harley Hahn, A Student's Guide to Unix
      HTML Email Considered Harmful: http://expita.com/nomime.html
    • Daniel Elstner
      Message 2 of 14 , Mar 1 10:51 AM
        On Sat, 2003-03-01 at 03:21, Michael P. Soulier wrote:
        > On 27/02/03 Daniel Elstner did speaketh:
        >
        > > Now that was way more than 10 minutes. Anyway, I think I got it
        > > working, the patch is attached.
        >
        > God I love open source.

        Cheers.

        Just to be sure -- did you test the patch and verify that it works for
        you, or was this just a generic expression of joy about the rapid
        customer service in the Open Source world? :-)

        --Daniel
      • Daniel Elstner
        Message 3 of 14 , Mar 3 8:01 AM
          Hey,

          I found some interesting information on the issue. This Debian bug
          report explains what's going on:

          http://lists.debian.org/debian-glibc/2002/debian-glibc-200212/msg00347.html

          As mentioned in the mail, the problem with coroutines also applies to
          sigaltstack(). There seems to be only one way around the problem:
          simply don't use sigaltstack().

          The attached patch disables the alternative stack if compiling on Linux
          with pthreads, except for SIGSEGV. This is AFAIK the only signal for
          which the alternative stack is really necessary. Thus there shouldn't
          be a regression in functionality when switching to fixed-up pthreads
          some day.

          Regards,
          --Daniel
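
The patch itself is not reproduced in this archive. As a rough illustration of the idea described above (alternate stack only for SIGSEGV when building on Linux with pthreads), a minimal sketch might look like the following. The helper name `deadly_sa_flags` is hypothetical and is not taken from Vim's source or from the patch.

```c
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 700   /* makes SA_ONSTACK visible under strict -std modes */
#endif
#include <signal.h>

/* Hypothetical helper: choose the sa_flags value for a deadly-signal handler.
 * On Linux builds with threading enabled (_REENTRANT), only SIGSEGV keeps the
 * alternate stack; every other signal is handled on the ordinary stack.
 * On all other builds the alternate stack is used for every signal. */
static int deadly_sa_flags(int sig)
{
#if defined(__linux__) && defined(_REENTRANT)
    return (sig == SIGSEGV) ? SA_ONSTACK : 0;
#else
    (void)sig;
    return SA_ONSTACK;
#endif
}
```

The result would be assigned to `sa.sa_flags` before calling `sigaction()` for each signal.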
        • Bram Moolenaar
          Message 4 of 14 , Mar 3 12:05 PM
            Daniel Elstner wrote:

            > I found some interesting information on the issue. This Debian bug
            > report explains what's going on:
            >
            > http://lists.debian.org/debian-glibc/2002/debian-glibc-200212/msg00347.html
            >
            > As mentioned in the mail, the problem with coroutines also applies to
            > sigaltstack(). There seems to be only one way around the problem:
            > simply don't use sigaltstack().
            >
            > The attached patch disables the alternative stack if compiling on Linux
            > with pthreads, except for SIGSEGV. This is AFAIK the only signal for
            > which the alternative stack is really necessary. Thus there shouldn't
            > be a regression in functionality when switching to fixed-up pthreads
            > some day.

            This makes sense. I'm glad you found out about this Linux problem.

            The solution would not work though: Catching SIGSEGV on the alternate
            stack works (assuming that longjmp() works with threading).
            But when SIGSEGV occurs for another reason it would run into the problem
            with the stack pointer and generate another SIGSEGV, thus loop forever.

            I cannot think of a solution without losing the ability to catch
            out-of-stack errors. We _need_ the alternate stack, and it can't be
            used when threading is enabled...

            --
            To keep milk from turning sour: Keep it in the cow.

            /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
            /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
            \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
            \\\ Help AIDS victims, buy at Amazon -- http://ICCF.nl/click1.html ///
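
For readers unfamiliar with the mechanism under discussion, here is a minimal sketch of how a handler is made to run on an alternate stack with `sigaltstack()` and `SA_ONSTACK`. It is not Vim's code: the names `probe_handler` and `install_on_alt_stack` and the 64 KiB stack size are assumptions chosen for illustration. The handler only records whether one of its local variables lies inside the alternate stack.

```c
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 700   /* sigaltstack(), stack_t, SA_ONSTACK */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static uintptr_t alt_base;   /* start address of the alternate stack */
static size_t    alt_size;   /* its size in bytes */
static volatile sig_atomic_t ran_on_alt_stack = -1;  /* -1: handler not run yet */

/* Record whether a local variable of the handler lives on the alternate stack. */
static void probe_handler(int sig)
{
    char marker;
    uintptr_t p = (uintptr_t)&marker;

    (void)sig;
    ran_on_alt_stack = (p >= alt_base && p < alt_base + alt_size);
}

/* Register an alternate stack and install probe_handler for `sig` with
 * SA_ONSTACK. Returns 0 on success, -1 on failure. */
static int install_on_alt_stack(int sig)
{
    stack_t ss;
    struct sigaction sa;
    void *mem;

    alt_size = 64 * 1024;            /* assumed size, well above the minimum */
    mem = malloc(alt_size);
    if (mem == NULL)
        return -1;
    alt_base = (uintptr_t)mem;

    ss.ss_sp = mem;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;        /* run the handler on the alternate stack */
    return sigaction(sig, &sa, NULL);
}
```

Calling `install_on_alt_stack(SIGUSR1)` and then `raise(SIGUSR1)` in a single-threaded program should leave `ran_on_alt_stack` set to 1. The same setup is what allows a SIGSEGV caused by stack exhaustion to be handled at all, since the ordinary stack has no room left for the handler.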
          • Daniel Elstner
            Message 5 of 14 , Mar 3 12:27 PM
              On Mon, 2003-03-03 at 21:05, Bram Moolenaar wrote:
              > Daniel Elstner wrote:
              >
              > > I found some interesting information on the issue. This Debian bug
              > > report explains what's going on:
              > >
              > > http://lists.debian.org/debian-glibc/2002/debian-glibc-200212/msg00347.html
              > >
              > > As mentioned in the mail, the problem with coroutines also applies to
              > > sigaltstack(). There seems to be only one way around the problem:
              > > simply don't use sigaltstack().
              > >
              > > The attached patch disables the alternative stack if compiling on Linux
              > > with pthreads, except for SIGSEGV. This is AFAIK the only signal for
              > > which the alternative stack is really necessary. Thus there shouldn't
              > > be a regression in functionality when switching to fixed-up pthreads
              > > some day.
              >
              > This makes sense. I'm glad you found out about this Linux problem.
              >
              > The solution would not work though: Catching SIGSEGV on the alternate
              > stack works (assuming that longjmp() works with threading).
              > But when SIGSEGV occurs for another reason it would run into the problem
              > with the stack pointer and generate another SIGSEGV, thus loop forever.

              Yes, it's not pretty. My idea was that reducing the problem to SIGSEGV
              is a lot better than the current situation, and switching to a fixed
              pthread (i.e. glibc compiled with minimum kernel >= 2.4) would fix the
              problem without recompiling Vim.

              SIGSEGV is fatal in most apps; Vim will just behave slightly more
              annoyingly and livelock instead of crashing immediately. And for this we
              can blame someone else.

              > I cannot think of a solution without losing the ability to catch
              > out-of-stack errors. We _need_ the alternate stack, and it can't be
              > used when threading is enabled...

              There is a solution -- rebuild glibc with minimum kernel >= 2.4 :/

              Regards,
              --Daniel
            • Daniel Elstner
              Message 6 of 14 , Mar 3 11:55 PM
                On Mon, 2003-03-03 at 21:05, Bram Moolenaar wrote:

                > The solution would not work though: Catching SIGSEGV on the alternate
                > stack works (assuming that longjmp() works with threading).

                I just confirmed that at least that part still works. Inserting a
                recursive call early in regmatch() successfully resulted in E363, and
                Vim did not crash.

                --Daniel
              • Daniel Elstner
                Message 7 of 14 , Mar 3 11:58 PM
                  On Tue, 2003-03-04 at 08:55, Daniel Elstner wrote:
                  > On Mon, 2003-03-03 at 21:05, Bram Moolenaar wrote:
                  >
                  > > The solution would not work though: Catching SIGSEGV on the alternate
                  > > stack works (assuming that longjmp() works with threading).
                  >
                  > I just confirmed that at least that part still works. Inserting a
                  > recursive call early in regmatch() successfully resulted in E363, and
                  > Vim did not crash.

                  Ooops, sorry, I take that back. /me just realized that there is an
                  explicit call to mch_stackcheck() in regmatch(). Is there any simple
                  way to test the functionality of the signal handler?

                  --Daniel
                • Bram Moolenaar
                  Message 8 of 14 , Mar 4 1:25 AM
                    Daniel Elstner wrote:

                    > On Tue, 2003-03-04 at 08:55, Daniel Elstner wrote:
                    > > On Mon, 2003-03-03 at 21:05, Bram Moolenaar wrote:
                    > >
                    > > > The solution would not work though: Catching SIGSEGV on the alternate
                    > > > stack works (assuming that longjmp() works with threading).
                    > >
                    > > I just confirmed that at least that part still works. Inserting a
                    > > recursive call early in regmatch() successfully resulted in E363, and
                    > > Vim did not crash.
                    >
                    > Ooops, sorry, I take that back. /me just realized that there is an
                    > explicit call to mch_stackcheck() in regmatch(). Is there any simple
                    > way to test the functionality of the signal handler?

                    That's what you get when double checking for errors...

                    I think the simplest way is to undefine HAVE_GETRLIMIT in auto/config.h
                    and compile again.

                    Since we do HAVE_GETRLIMIT on linux, and threading may cause a hang, it
                    might be better to rely on mch_stackcheck() and not use the alternate
                    stack. Thus use your #ifdef around setting sa.sa_flags, also for
                    SIGSEGV.

                    /* Setup to use the alternate stack for the signal function. */
                    sa.sa_handler = func_deadly;
                    sigemptyset(&sa.sa_mask);
                    # if defined(__linux__) && defined(_REENTRANT)
                        /* Linux with kernel 2.2 has a bug in thread handling in
                         * combination with using the alternate stack: library functions
                         * will use the ordinary stack anyway, causing a SEGV signal,
                         * which recursively calls deathtrap and hangs. */
                        sa.sa_flags = 0;
                    # else
                        sa.sa_flags = SA_ONSTACK;
                    # endif
                    sigaction(signal_info[i].sig, &sa, NULL);

                    Does that look OK?

                    --
                    SUPERIMPOSE "England AD 787". After a few more seconds we hear hoofbeats in
                    the distance. They come slowly closer. Then out of the mist comes KING
                    ARTHUR followed by a SERVANT who is banging two half coconuts together.
                    "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

                    /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                    /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
                    \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
                    \\\ Help AIDS victims, buy at Amazon -- http://ICCF.nl/click1.html ///
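
The suggestion above is to rely on an explicit, `getrlimit()`-based stack check instead of catching SIGSEGV on an alternate stack. The following is only a sketch of that general idea, not Vim's `mch_stackcheck()`: the names `stack_check_init`, `stack_check_ok` and `stack_floor` are hypothetical, and it assumes a stack that grows towards lower addresses and a 1/16 safety margin.

```c
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 700   /* getrlimit(), RLIMIT_STACK */
#endif
#include <stdint.h>
#include <sys/resource.h>

/* Lowest stack address considered safe; 0 means "limit unknown, never refuse". */
static uintptr_t stack_floor;

/* Call once, early in main(), passing the address of a local variable there. */
static void stack_check_init(const void *top)
{
    struct rlimit rl;
    uintptr_t budget;

    stack_floor = 0;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return;

    /* Allow 15/16 of the soft limit and keep the rest as a safety margin. */
    budget = (uintptr_t)(rl.rlim_cur / 16 * 15);
    if (budget < (uintptr_t)top)
        stack_floor = (uintptr_t)top - budget;
}

/* Return 1 if the address `here` (a local of the caller) is still above the
 * floor, 0 if the caller should stop recursing and report an error. */
static int stack_check_ok(const void *here)
{
    return stack_floor == 0 || (uintptr_t)here >= stack_floor;
}
```

A deeply recursive function would call `stack_check_ok()` with the address of one of its locals on entry and bail out with an error when it returns 0. Because this never depends on a signal handler, it is unaffected by the threading problem with the alternate stack.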
                  • Daniel Elstner
                    Message 9 of 14 , Mar 4 7:15 AM
                      On Tue, 2003-03-04 at 10:25, Bram Moolenaar wrote:

                      > Since we do HAVE_GETRLIMIT on linux, and threading may cause a hang, it
                      > might be better to rely on mch_stackcheck() and not use the alternate
                      > stack. Thus use your #ifdef around setting sa.sa_flags, also for
                      > SIGSEGV.

                      Yes, that sounds good to me.

                      > /* Setup to use the alternate stack for the signal function. */
                      > sa.sa_handler = func_deadly;
                      > sigemptyset(&sa.sa_mask);
                      > # if defined(__linux__) && defined(_REENTRANT)
                      > /* Linux with kernel 2.2 has a bug in thread handling in
                      > * combination with using the alternate stack: library functions
                      > * will use the ordinary stack anyway, causing a SEGV signal,
                      > * which recursively calls deathtrap and hangs. */
                      > sa.sa_flags = 0;
                      > # else
                      > sa.sa_flags = SA_ONSTACK;
                      > # endif
                      > sigaction(signal_info[i].sig, &sa, NULL);
                      >
                      > Does that look OK?

                      Yep. But I'd change the wording of the comment to:

                      /* On Linux, glibc compiled for minimum kernel 2.2 has a bug in
                       * thread handling in combination with using the alternate stack:
                       * pthread library functions try to use the stack pointer to
                       * identify the current thread, causing a SEGV signal, which
                       * recursively calls deathtrap and hangs. */

                      Otherwise people might get confused since Linux 2.2 is quite rare on
                      desktop systems these days.

                      --Daniel