delayed ACKs for retransmitted packets: ouch!

  • Neal Cardwell
    Message 1 of 4 , Nov 2, 1998
      Recently i've been looking at a scenario where i'm seeing delayed ACKs for
      retransmitted packets really destroy the performance of New Reno.

      The TCP congestion control draft (draft-ietf-tcpimpl-cong-control-00.txt)
      specifies that "Out-of-order data segments SHOULD be acknowledged
      immediately, in order to trigger the fast retransmit algorithm." Many
      implementations -- at least FreeBSD 3.0 and Linux 2.1, and probably most
      others, i'm guessing -- interpret this by sending an immediate
      acknowledgment only if a data segment they receive is above a hole in
      their receive queue. That is, they ACK only if the sequence number is above
      and not equal to rcv_next (see Figure 27.15 in Stevens vol 2 for the code
      snippet that does this in Net/3 and FreeBSD).

      Unfortunately, this means that if the sender retransmits a single segment
      which fills in a hole, then the receiver finds that this segment fits in
      nicely at rcv_next. So the receiver will sit around until its delayed ACK
      timer expires, possibly hundreds of ms later. Only then will it ACK to the
      sender that the hole has successfully been filled, and only then will the
      sender be able to continue on, perhaps filling other holes.
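
The difference between the two readings of "out-of-order" can be sketched in C. This is only an illustration under stated assumptions: the `rx_state` struct and both function names are hypothetical and do not come from FreeBSD, Linux, or the draft.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receiver state; names are illustrative only. */
struct rx_state {
    uint32_t rcv_next;    /* next in-order sequence number expected */
    bool     ooo_queued;  /* true if data above a hole is already queued */
};

/* Narrow reading: ACK at once only when the segment lands above a hole.
 * The signed difference keeps the comparison valid across sequence wrap. */
static bool ack_now_strict(const struct rx_state *rx, uint32_t seg_seq)
{
    return (int32_t)(seg_seq - rx->rcv_next) > 0;
}

/* Broader reading: also ACK at once when the segment starts exactly at
 * rcv_next while out-of-order data is queued, i.e. it fills a hole. */
static bool ack_now_broad(const struct rx_state *rx, uint32_t seg_seq)
{
    if (ack_now_strict(rx, seg_seq))
        return true;
    return seg_seq == rx->rcv_next && rx->ooo_queued;
}
```

Under the narrow reading a retransmission that arrives at `rcv_next` is treated like ordinary in-order data and waits for the delayed ACK timer; under the broader reading it is acknowledged at once.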

      Consider the following sequence plots of tcpdumps of two TCP connections:

      http://www.cs.washington.edu/homes/cardwell/misc/xfer1.ps
      http://www.cs.washington.edu/homes/cardwell/misc/xfer2.ps

      These show a Linux 2.1.126 sender at UW sending 100KB to my Linux 2.0.32
      machine at home over my 440Kbps DSL line. The traces are from the
      perspective of the sender. The RTT is about 22ms for short packets, and
      the MSS is about 1460 bytes.

      These transfers should have taken about 2 seconds, judging from the slope
      of the ACKs during slow start. But of course slow start overshoots, and
      there are many losses at around the 1 second mark in both traces. Now
      because the Linux 2.1.126 sender is using New Reno, it spends several
      painful seconds in Fast Recovery filling in the holes, one segment at a
      time. As a result the second transfer, for instance, spends nearly 5
      seconds in Fast Recovery; during this period I'm getting about 30Kbps on
      average, and not so happy about the $ i forked over for DSL buying me
      modem performance!
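
A back-of-the-envelope model of these numbers, as a small C sketch. Assumptions: 100KB is taken as 102400 bytes and 440Kbps as 440000 bit/s; each hole is assumed to cost one round trip plus the receiver's ACK delay; the function names are made up for illustration. Nearly 5 seconds at about 30Kbps is roughly 13 segments of 1460 bytes, so 13 holes is used as an example input, and the 350 ms ACK delay is a hypothetical value, not one measured from the traces.

```c
#include <stdint.h>

/* Milliseconds needed to move `bytes` over a link of `bits_per_sec`
 * (integer division truncates). */
static uint64_t ideal_transfer_ms(uint64_t bytes, uint64_t bits_per_sec)
{
    return bytes * 8u * 1000u / bits_per_sec;
}

/* Toy model of hole-by-hole recovery: every hole costs one round trip
 * plus however long the receiver sits on the ACK for the retransmission. */
static uint64_t recovery_ms(uint64_t holes, uint64_t rtt_ms,
                            uint64_t ack_delay_ms)
{
    return holes * (rtt_ms + ack_delay_ms);
}
```

With these inputs, `ideal_transfer_ms(102400, 440000)` gives 1861 ms, `recovery_ms(13, 22, 0)` gives 286 ms, and `recovery_ms(13, 22, 350)` gives 4836 ms. In this model the ACK delay, not the round-trip time, accounts for nearly all of the recovery period.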

      Why does it spend so long in Fast Recovery? I think the main problem is
      that the receiver is delaying its ACKs for the retransmitted segments that
      are nicely filling holes in its receive queue. It happens to be delaying
      them by a lot, due to the particular delayed ACK implementation in Linux
      2.0. But i think the point is that delaying acknowledgments is a very bad
      idea when the sender is filling in holes one packet at a time, as it will
      tend to do in Fast Recovery, or immediately after an RTO (assuming no
      SACK).

      So what i'm asking is this: is it a good idea to clarify or extend the
      notion of "out-of-order" data that should be ACKed immediately, in such a
      way that data segments that fill in a hole in the receive queue should be
      ACKed immediately? This would seem to alleviate this problem with New
      Reno. Are there other scenarios where it would make things worse instead?

      neal
    • Kacheong Poon
      Message 2 of 4 , Nov 3, 1998
        > The TCP congestion control draft (draft-ietf-tcpimpl-cong-control-00.txt)
        > specifies that "Out-of-order data segments SHOULD be acknowledged
        > immediately, in order to trigger the fast retransmit algorithm." Many
        > implementations -- at least FreeBSD 3.0 and Linux 2.1, and probably most
        > others, i'm guessing -- interpret this by sending an immediate
        > acknowledgment only if a data segment they receive is above a hole in
        > their receive queue. That is, they ACK only if the sequence number is above
        > and not equal to rcv_next (see Figure 27.15 in Stevens vol 2 for the code
        > snippet that does this in Net/3 and FreeBSD).

        That is quite interesting. Solaris sends an ACK immediately in this case.
        That is why we never saw this problem when we added NewReno to Solaris 2.6.
        Maybe you can try a Solaris 2.6 or 7 machine as the receiver and see what
        happens. Also compare the result with a Solaris sender.

        > So what i'm asking is this: is it a good idea to clarify or extend the
        > notion of "out-of-order" data that should be ACKed immediately, in such a
        > way that data segments that fill in a hole in the receive queue should be
        > ACKed immediately? This would seem to alleviate this problem with New
        > Reno. Are there other scenarios where it would make things worse instead?

        Actually, I have always considered a segment that fills a hole to be an
        out-of-order segment. IMHO, this looks like a bug in the implementation.
        Maybe this can be another item in the known problem draft?

        K. Poon.
        kcpoon@...
      • Neal Cardwell
        Message 3 of 4 , Nov 6, 1998
          Although Linux 2.0 appears to perform poorly in acking retransmitted
          packets in some cases, just for the record i thought i'd note that Luigi
          Rizzo pointed out to me that BSD does actually do the right thing when it
          gets a data segment that falls below previously-received data:

          Luigi said:
          > Fig27.15 in Stevens deals with TCP_REASS, and in order to have a
          > DELACK you need that the reassembly queue be empty, and this does
          > not happen when the received pkt fills a hole (you know it's a hole
          > because you already have a pkt _after_ it).

          thanks, Luigi!
          neal
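
The rule Luigi describes can be traced with a toy receiver in C. This is a sketch under assumptions, not the Net/3 code: it tracks whole segments by index instead of byte sequence numbers, and every name in it (`toy_rx`, `toy_rx_segment`, `MAX_SEGS`) is hypothetical.

```c
#include <stdbool.h>

#define MAX_SEGS 32  /* illustrative bound on segment indices */

enum ack_action { ACK_DELAY, ACK_NOW };

/* Toy receiver: segments are numbered 0, 1, 2, ... */
struct toy_rx {
    int  rcv_next;          /* next in-order segment expected */
    bool queued[MAX_SEGS];  /* segments held above rcv_next */
    int  nqueued;           /* size of the toy reassembly queue */
};

static enum ack_action toy_rx_segment(struct toy_rx *rx, int seg)
{
    if (seg < 0 || seg >= MAX_SEGS)
        return ACK_NOW;  /* outside the toy's range */

    /* Delayed ACK only if in order AND the reassembly queue is empty. */
    if (seg == rx->rcv_next && rx->nqueued == 0) {
        rx->rcv_next++;
        return ACK_DELAY;
    }

    if (seg > rx->rcv_next && !rx->queued[seg]) {
        /* Above a hole: hold it in the reassembly queue. */
        rx->queued[seg] = true;
        rx->nqueued++;
    } else if (seg == rx->rcv_next) {
        /* Fills a hole: advance past it and any contiguous queued data. */
        rx->rcv_next++;
        while (rx->rcv_next < MAX_SEGS && rx->queued[rx->rcv_next]) {
            rx->queued[rx->rcv_next] = false;
            rx->nqueued--;
            rx->rcv_next++;
        }
    }
    return ACK_NOW;  /* everything off the fast path is ACKed at once */
}
```

Starting from a zeroed `struct toy_rx` and feeding segments 0, 1, 3, 4, 2, 5 in that order, the toy delays the ACKs for 0, 1 and 5 and ACKs 3, 4 and 2 immediately. Segment 2 is the retransmission that fills the hole, and it is not delayed because segments 3 and 4 are still queued when it arrives.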
        • David S. Miller
          Message 4 of 4 , Nov 7, 1998
            Date: Mon, 2 Nov 1998 19:19:40 -0800 (PST)
            From: Neal Cardwell <cardwell@...>

            during this period I'm getting about 30Kbps on average, and not so
            happy about the $ i forked over for DSL buying me modem
            performance!

            Sorry I did not respond sooner.

            Pick your poison, here is the fix for both 2.0.x and 2.1.x
            Linux TCP stacks. First 2.0.x:

            --- net/ipv4/tcp_input.c.~1~ Tue Jul 21 08:13:48 1998
            +++ net/ipv4/tcp_input.c Sat Nov 7 06:48:59 1998
            @@ -1929,8 +1929,14 @@
                      * Delay the ack if possible. Send ack's to
                      * fin frames immediately as there shouldn't be
                      * anything more to come.
            +         *
            +         * ACK immediately if we still have any out of
            +         * order data. This is because we desire "maximum
            +         * feedback during loss". --DaveM
                      */
            -        if (!sk->delay_acks || th->fin) {
            +        if (!sk->delay_acks || th->fin ||
            +            ((sk->acked_seq == skb->end_seq) &&
            +             (skb->next != (struct sk_buff *) &sk->receive_queue))) {
                             tcp_send_ack(sk);
                     } else {
                             /*
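
One way to read the added condition, sketched in C with toy types. This is an interpretation, not a statement of how the 2.0 stack works: it assumes `acked_seq` has already been advanced over all contiguous data when the test runs, and that the receive queue is a circular list whose head acts as a sentinel. The field names echo the patch, but `toy_skb`, `toy_sock` and the helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the kernel structures named in the patch. */
struct toy_skb {
    struct toy_skb *next;
    uint32_t end_seq;              /* sequence number just past this buffer */
};

struct toy_sock {
    struct toy_skb receive_queue;  /* sentinel head of a circular list */
    uint32_t acked_seq;            /* next in-order sequence number expected */
};

static void toy_queue_init(struct toy_sock *sk)
{
    sk->receive_queue.next = &sk->receive_queue;  /* empty list */
}

static void toy_insert_after(struct toy_skb *prev, struct toy_skb *skb)
{
    skb->next = prev->next;
    prev->next = skb;
}

/* True when skb is the last contiguous buffer (in-order data ends exactly
 * at its end) and another buffer follows it in the queue. That follower
 * must sit beyond a hole, so out-of-order data remains. */
static bool toy_out_of_order_remains(const struct toy_sock *sk,
                                     const struct toy_skb *skb)
{
    return sk->acked_seq == skb->end_seq &&
           skb->next != &sk->receive_queue;
}
```

Read this way, the test fires only while a hole is still outstanding. Once a retransmission fills the last hole, `acked_seq` moves past the newly queued buffer, the test is false, and the normal delayed ACK logic applies again.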

            And here is the same fix for current 2.1.x Linux TCP:

            Index: net/ipv4/tcp_input.c
            ===================================================================
            RCS file: /vger/u4/cvs/linux/net/ipv4/tcp_input.c,v
            retrieving revision 1.135
            retrieving revision 1.136
            diff -u -r1.135 -r1.136
            --- tcp_input.c 1998/11/07 10:54:42 1.135
            +++ tcp_input.c 1998/11/07 14:36:18 1.136
            @@ -5,7 +5,7 @@
              *
              * Implementation of the Transmission Control Protocol(TCP).
              *
            - * Version: $Id: tcp_input.c,v 1.135 1998/11/07 10:54:42 davem Exp $
            + * Version: $Id: tcp_input.c,v 1.136 1998/11/07 14:36:18 davem Exp $
              *
              * Authors: Ross Biro, <bir7@...>
              *          Fred N. van Kempen, <waltje@...>
            @@ -1517,7 +1517,7 @@
              * - delay time <= 0.5 HZ
              * - we don't have a window update to send
              * - must send at least every 2 full sized packets
            - * - must send an ACK if we have any SACKs
            + * - must send an ACK if we have any out of order data
              *
              * With an extra heuristic to handle loss of packet
              * situations and also helping the sender leave slow
            @@ -1530,8 +1530,8 @@
                         tcp_raise_window(sk) ||
                         /* We entered "quick ACK" mode or... */
                         tcp_in_quickack_mode(tp) ||
            -            /* We have pending SACKs */
            -            (tp->sack_ok && tp->num_sacks)) {
            +            /* We have out of order data */
            +            (skb_peek(&tp->out_of_order_queue) != NULL)) {
                             /* Then ack it now */
                             tcp_send_ack(sk);
                     } else {

            Enjoy. BTW, I never noticed this because most of the time when I'm
            working on loss recovery on 2.1.x Linux both ends speak SACK. The fix
            here for 2.1.x just turns the old SACK test into a more general test.
            I'm hoping 2.2.x gets released soon so SACK can finally be widely
            deployed.

            Thanks a lot for pointing out this problem.

            Later,
            David S. Miller
            davem@...