Loading ...
Sorry, an error occurred while loading the content.

Re: Postfix lost connection issue

Expand Messages
  • Viktor Dukhovni
    ... The message size is ~6 MB. ... Message content transmission starts at 20:08:50.323. ... Transmission aborts 22.084 seconds later. So the bandwidth used is
    Message 1 of 11 , May 2, 2013
    • 0 Attachment
      On Thu, May 02, 2013 at 11:09:35PM -0700, mailtime wrote:

      > 20:08:45.927 TX: <MAIL FROM:<sender@...> SIZE=6469087>

      The message size is ~6 MB.

      > 20:08:50.163 TX: <DATA>
      > 20:08:50.323 RX: <354 End data with <CR><LF>.<CR><LF>>

      Message content transmission starts at 20:08:50.323.

      > 20:09:12.407 SMTPResponse(Sending Body): Socket error - 10053 -
      > WSAECONNABORTED

      Transmission aborts 22.084 seconds later. So the bandwidth used
      is at most 3Mb/s, but sufficiently recent Postfix releases would
      normally log the number of bytes received, and cleanup did see
      headers, so your Postfix is likely relatively old. Do capture the
      session and apply wireshark or similar.

      --
      Viktor.
    • mailtime
      Thanks for your advice Viktor I have done a capture and loaded into WireShark. 1, 0.000000 , 1.2.3.4 , 192.168.1.239 , TCP , 66 , 19524 smtp [SYN] Seq=0
      Message 2 of 11 , May 7, 2013
      • 0 Attachment
        Thanks for your advice Viktor
        I have done a capture and loaded into WireShark.


        1,"0.000000","1.2.3.4","192.168.1.239","TCP","66","19524 > smtp [SYN] Seq=0
        Win=8192 Len=0 MSS=1380 WS=256 SACK_PERM=1"
        2,"0.000022","192.168.1.239","1.2.3.4","TCP","66","smtp > 19524 [SYN, ACK]
        Seq=0 Ack=1 Win=5840 Len=0 MSS=1460 SACK_PERM=1 WS=128"
        3,"0.043797","1.2.3.4","192.168.1.239","TCP","60","19524 > smtp [ACK] Seq=1
        Ack=1 Win=131072 Len=0"
        4,"0.047331","192.168.1.239","1.2.3.4","SMTP","96","S: 220 spambox.co.nz
        ESMTP Postfix"
        5,"0.093672","1.2.3.4","192.168.1.239","SMTP","90","C: EHLO
        remote.sender.com"
        6,"0.093699","192.168.1.239","1.2.3.4","TCP","54","smtp > 19524 [ACK] Seq=43
        Ack=37 Win=5888 Len=0"
        7,"0.095070","192.168.1.239","1.2.3.4","SMTP","261","S: 250-spambox.co.nz |
        250-PIPELINING | 250-SIZE 20728640 | 250-VRFY | 250-ETRN | 250-STARTTLS |
        250-AUTH PLAIN LOGIN CRAM-MD5 | 250-AUTH=PLAIN LOGIN CRAM-MD5 |
        250-ENHANCEDSTATUSCODES | 250-8BITMIME | 250 DSN"
        8,"0.140698","1.2.3.4","192.168.1.239","SMTP","115","C: MAIL
        FROM:<jono.smith@...> SIZE=221190"
        9,"0.146588","192.168.1.239","1.2.3.4","SMTP","68","S: 250 2.1.0 Ok"
        10,"0.190450","1.2.3.4","192.168.1.239","SMTP","89","C: RCPT
        TO:<recipient@...>"
        11,"0.230190","192.168.1.239","1.2.3.4","TCP","54","smtp > 19524 [ACK]
        Seq=264 Ack=133 Win=5888 Len=0"
        12,"3.625029","192.168.1.239","1.2.3.4","SMTP","68","S: 250 2.1.5 Ok"
        13,"3.668700","1.2.3.4","192.168.1.239","SMTP","60","C: DATA"

        Which follows on for a while transmitting then this occurs

        66,"4.267143","192.168.1.239","1.2.3.4","TCP","54","smtp > 19524 [ACK]
        Seq=315 Ack=47059 Win=64128 Len=0"
        67,"4.272844","1.2.3.4","192.168.1.239","SMTP","1434","C: DATA fragment,
        1380 bytes"
        68,"4.278587","1.2.3.4","192.168.1.239","SMTP","1434","C: DATA fragment,
        1380 bytes"
        69,"4.278597","192.168.1.239","1.2.3.4","TCP","54","smtp > 19524 [ACK]
        Seq=315 Ack=49819 Win=64128 Len=0"
        70,"4.284366","1.2.3.4","192.168.1.239","SMTP","1434","C: DATA fragment,
        1380 bytes"
        71,"4.290093","1.2.3.4","192.168.1.239","SMTP","1434","[TCP Previous segment
        not captured] C: DATA fragment, 1380 bytes"
        72,"4.290104","192.168.1.239","1.2.3.4","TCP","66","smtp > 19524 [ACK]
        Seq=315 Ack=51199 Win=64128 Len=0 SLE=53959 SRE=55339"
        73,"4.295853","1.2.3.4","192.168.1.239","SMTP","1434","C: DATA fragment,
        1380 bytes"
        74,"4.295859","192.168.1.239","1.2.3.4","TCP","66","[TCP Dup ACK 72#1] smtp
        > 19524 [ACK] Seq=315 Ack=51199 Win=64128 Len=0 SLE=53959 SRE=56719"
        75,"4.301731","1.2.3.4","192.168.1.239","SMTP","1434","C: DATA fragment,
        1380 bytes"
        76,"4.301738","192.168.1.239","1.2.3.4","TCP","66","[TCP Dup ACK 72#2] smtp
        > 19524 [ACK] Seq=315 Ack=51199 Win=64128 Len=0 SLE=53959 SRE=58099"
        77,"4.307588","1.2.3.4","192.168.1.239","SMTP","1434","[TCP Previous segment
        not captured] C: DATA fragment, 1380 bytes"

















        --
        View this message in context: http://postfix.1071664.n5.nabble.com/Postfix-lost-connection-issue-tp57670p57788.html
        Sent from the Postfix Users mailing list archive at Nabble.com.
      • Viktor Dukhovni
        ... The wireshark output, which omits much detail, (but the GUI allows you to drill in various details) is for you to interpret. If you want help from me,
        Message 3 of 11 , May 7, 2013
        • 0 Attachment
          On Tue, May 07, 2013 at 04:46:35AM -0700, mailtime wrote:

          > Thanks for your advice Viktor
          > I have done a capture and loaded into WireShark.

          The wireshark output, which omits much detail, (but the GUI allows
          you to drill in various details) is for you to interpret. If you
          want help from me, post complete output from "tcpdump -nr capture.pcap".

          That said, clearly some packets from the sender are lost, and never
          retransmitted. The TCP connection negotiates selective ACK and
          window scaling on both sides. First thing I would do is disable
          window scaling on your Postfix server. This will reduce throughput
          for mail from far away senders (without window scaling Hong Kong
          to USA will be at most 64KB per ~0.25 sec or 4 sec per MB of data).

          The problem is almost certainly a firewall bug either at the sending
          or receiving side. Updating a more recent firewall software or
          firmware version may help.

          --
          Viktor.
        • Wietse Venema
          ... Postfix 2.6 and later have a workaround for TCP window scaling bugs. For detailed instructions, see: http://www.postfix.org/postconf.5.html#tcp_windowsize
          Message 4 of 11 , May 7, 2013
          • 0 Attachment
            Viktor Dukhovni:
            > That said, clearly some packets from the sender are lost, and never
            > retransmitted. The TCP connection negotiates selective ACK and
            > window scaling on both sides. First thing I would do is disable
            > window scaling on your Postfix server. This will reduce throughput
            > for mail from far away senders (without window scaling Hong Kong
            > to USA will be at most 64KB per ~0.25 sec or 4 sec per MB of data).

            Postfix 2.6 and later have a workaround for TCP window scaling bugs.
            For detailed instructions, see:
            http://www.postfix.org/postconf.5.html#tcp_windowsize

            Wietse
          • Viktor Dukhovni
            ... On which operating systems does this suppress window scaling during the TCP 3-way handshake? I tried MacOSX and NetBSD, neither seemed to suppress window
            Message 5 of 11 , May 7, 2013
            • 0 Attachment
              On Tue, May 07, 2013 at 12:57:27PM -0400, Wietse Venema wrote:

              > Viktor Dukhovni:
              > > That said, clearly some packets from the sender are lost, and never
              > > retransmitted. The TCP connection negotiates selective ACK and
              > > window scaling on both sides. First thing I would do is disable
              > > window scaling on your Postfix server. This will reduce throughput
              > > for mail from far away senders (without window scaling Hong Kong
              > > to USA will be at most 64KB per ~0.25 sec or 4 sec per MB of data).
              >
              > Postfix 2.6 and later have a workaround for TCP window scaling bugs.
              > For detailed instructions, see:
              > http://www.postfix.org/postconf.5.html#tcp_windowsize

              On which operating systems does this suppress window scaling during
              the TCP 3-way handshake?

              I tried MacOSX and NetBSD, neither seemed to suppress window scaling
              when the server window size was set to 32767:

              C > S: Flags [S], seq 227891400, win 65535, options
              [mss 1460,wscale 4,TS val 342104207 ecr 0,sackOK,eol], length 0
              S > C: Flags [S.], seq 3628790798, ack 227891401, win 32767, options
              [mss 1460,wscale 3,TS val 1 ecr 342104207,sackOK], length 0

              Have not tried Linux yet, perhaps this works there...

              --
              Viktor.
            • Wietse Venema
              ... It s five years since this I added the tcp_windowsize feature, and the systems that I was using at the time have been decomissioned. Even with minuscule
              Message 6 of 11 , May 7, 2013
              • 0 Attachment
                Viktor Dukhovni:
                > On Tue, May 07, 2013 at 12:57:27PM -0400, Wietse Venema wrote:
                >
                > > Viktor Dukhovni:
                > > > That said, clearly some packets from the sender are lost, and never
                > > > retransmitted. The TCP connection negotiates selective ACK and
                > > > window scaling on both sides. First thing I would do is disable
                > > > window scaling on your Postfix server. This will reduce throughput
                > > > for mail from far away senders (without window scaling Hong Kong
                > > > to USA will be at most 64KB per ~0.25 sec or 4 sec per MB of data).
                > >
                > > Postfix 2.6 and later have a workaround for TCP window scaling bugs.
                > > For detailed instructions, see:
                > > http://www.postfix.org/postconf.5.html#tcp_windowsize
                >
                > On which operating systems does this suppress window scaling during
                > the TCP 3-way handshake?

                It's five years since this I added the tcp_windowsize feature, and
                the systems that I was using at the time have been decomissioned.
                Even with minuscule window sizes I see a non-zero wscale option in
                the SYN+ACK response from FreeBSD 8. I'll update the documentation.

                Wietse
              • mailtime
                Thanks both for the info. I have attached a log file showing the tcp information. It starts to go wrong around Frame 24660: I did disable window scaling on
                Message 7 of 11 , May 7, 2013
                • 0 Attachment
                  Thanks both for the info. I have attached a log file showing the tcp
                  information.
                  It starts to go wrong around Frame 24660:
                  I did disable window scaling on Postfix and for a while i thought it had
                  solved the issue but then more of the same as you can see in the log.
                  (hopefully showing the correct information!)

                  TCPLog <http://postfix.1071664.n5.nabble.com/file/n57846/TCPLog>

                  Thanks in advance




                  --
                  View this message in context: http://postfix.1071664.n5.nabble.com/Postfix-lost-connection-issue-tp57670p57846.html
                  Sent from the Postfix Users mailing list archive at Nabble.com.
                • Viktor Dukhovni
                  ... What do you mean by on Postfix ? This is a kernel setting, modified via sysctl or similar. In any case the session you posted has no window scaling,
                  Message 8 of 11 , May 8, 2013
                  • 0 Attachment
                    On Tue, May 07, 2013 at 10:59:45PM -0700, mailtime wrote:

                    > It starts to go wrong around Frame 24660:
                    > I did disable window scaling on Postfix

                    What do you mean by "on Postfix"? This is a kernel setting, modified
                    via "sysctl" or similar. In any case the session you posted has
                    no window scaling, so that is moot. You could try disabling
                    selective ACK support if that's possible:

                    net.ipv4.tcp_sack = 0
                    net.ipv4.tcp_dsack = 0
                    net.ipv4.tcp_fack = 0

                    > and for a while i thought it had
                    > solved the issue but then more of the same as you can see in the log.
                    > (hopefully showing the correct information!)
                    >
                    > TCPLog <http://postfix.1071664.n5.nabble.com/file/n57846/TCPLog>

                    Somebody's firewall or NAT device is pining for the scrap heap.
                    The receiver's duplicate acks eventually trigger sender RSTs instead
                    of retransmission of the lost segment. What we don't know is whose.

                    --
                    Viktor.
                  Your message has been successfully submitted and would be delivered to recipients shortly.