Loading ...
Sorry, an error occurred while loading the content.

Postfix lost connection issue

Expand Messages
  • mailtime
    Hi, After many years of hassle free use from our email system we are now getting a rather odd disconnect as shown below. May 3 16:07:36 spambox
    Message 1 of 11 , May 2 10:31 PM
    • 0 Attachment
      Hi,
      After many years of hassle free use from our email system we are now getting
      a rather odd disconnect as shown below.


      May 3 16:07:36 spambox nss_wins[14039]: connect from
      sender.remote.com[1.2.3.4]
      May 3 16:07:38 spambox postgrey[2865]: action=pass, reason=client AWL,
      client_name=remote.remote.com, client_address=1.2.3.4,
      sender=John.Smith@..., recipient=user@...
      May 3 16:07:38 spambox nss_wins[14039]: 0518C62E981:
      client=remote.remote.com[1.2.3.4]
      May 3 16:07:38 spambox postfix/cleanup[16322]: 0518C62E981: hold: header
      Received: from remote.remote.com (remote.remote.com [1.2.3.4])??by
      spambox.co.nz (Postfix) with ESMTP id 0518C62E981??for
      <user@...>; Fri, 3 May 2 from remote.remote.com[1.2.3.4];
      from=<John.Smith@...> to=<user@...> proto=ESMTP
      helo=<remote.remote.com>
      May 3 16:07:38 spambox postfix/cleanup[16322]: 0518C62E981:
      message-id=<5C82EDFA9CB70146A77EB00A438DC6340B7614DB@...>
      May 3 16:07:39 spambox nss_wins[14039]: lost connection after DATA from
      remote.remote.com[1.2.3.4]
      May 3 16:07:39 spambox nss_wins[14039]: disconnect from
      remote.remote.com[1.2.3.4]


      What is frustrating is that is is not occurring every single time, some
      emails from this domain pass through with no delay, other times it drops the
      connection and the sending server queues it and tries again.
      There are two mail servers that send to clients on this system that are
      having the issue.
      Both of the servers are running Mail Marshal (if that's any help!)

      I have done some research on the "lost connection after Data" and many
      articles refer to loss of network connectivity which we have tested and is
      not the issue. Others refer to incorrect, or lowering the MTU of either NIC
      or Router (have confirmed this is also not the issue.

      I would be interested to know if anyone else has had a problem like this.





      --
      View this message in context: http://postfix.1071664.n5.nabble.com/Postfix-lost-connection-issue-tp57670.html
      Sent from the Postfix Users mailing list archive at Nabble.com.
    • Viktor Dukhovni
      ... Yuck, nss_wins openlog(3) breakage! :-( I would find a way of disabling this. Perhaps proxymap can help. Also report the bug to your O/S vendor, shared
      Message 2 of 11 , May 2 10:42 PM
      • 0 Attachment
        On Thu, May 02, 2013 at 10:31:07PM -0700, mailtime wrote:

        > After many years of hassle free use from our email system we are now getting
        > a rather odd disconnect as shown below.
        >
        >
        > May 3 16:07:36 spambox nss_wins[14039]: connect from
        > sender.remote.com[1.2.3.4]

        Yuck, nss_wins openlog(3) breakage! :-( I would find a way of
        disabling this. Perhaps proxymap can help. Also report the
        bug to your O/S vendor, shared libraries must not create this
        mess.

        > May 3 16:07:39 spambox nss_wins[14039]: lost connection after DATA from
        > remote.remote.com[1.2.3.4]

        http://www.postfix.org/DEBUG_README.html#sniffer

        Then read the result with wireshark. Make a careful note of the
        window-scaling, and MTU negotiated at the start of the session,
        what happened shortly before the sender closed the session. Who
        sent the first FIN or RST? Were there any retransmissions.

        You could disable window scaling on your end, or set the default
        scale exponent to 0. Most problems are firewall or NAT device
        problems and window scaling can confuse the more crude implementations.

        Any other logging from smtpd (aka nss_wins) 14039 between the
        connection start and end?

        --
        Viktor.
      • mailtime
        Thanks for your thoughts i will follow that article and see what i get back. In regard to the 14039, no only the 4 entries (pertaining to that transaction) I
        Message 3 of 11 , May 2 11:09 PM
        • 0 Attachment
          Thanks for your thoughts i will follow that article and see what i get back.
          In regard to the 14039, no only the 4 entries (pertaining to that
          transaction)

          I do finally have a log file from one of sending mail servers though.

          20:08:45.214 RX: <220 spambox.co.nz ESMTP Postfix>
          20:08:45.214 TX: <EHLO remote.remote.com>
          20:08:45.926 RX: <250-spambox.co.nz>
          20:08:45.926 RX: <250-PIPELINING>
          20:08:45.926 RX: <250-SIZE 20728640>
          20:08:45.926 RX: <250-VRFY>
          20:08:45.926 RX: <250-ETRN>
          20:08:45.926 RX: <250-STARTTLS>
          20:08:45.926 RX: <250-AUTH PLAIN LOGIN CRAM-MD5>
          20:08:45.926 RX: <250-AUTH=PLAIN LOGIN CRAM-MD5>
          20:08:45.927 RX: <250-ENHANCEDSTATUSCODES>
          20:08:45.927 RX: <250-8BITMIME>
          20:08:45.927 RX: <250 DSN>
          20:08:45.927 TX: <MAIL FROM:<sender@...> SIZE=6469087>
          20:08:46.088 RX: <250 2.1.0 Ok>
          20:08:46.088 TX: <RCPT TO:<user@...>>
          20:08:46.940 RX: <250 2.1.5 Ok>
          20:08:46.941 TX: <RCPT TO:<user2@...>>
          20:08:50.163 RX: <250 2.1.5 Ok>
          20:08:50.163 TX: <DATA>
          20:08:50.323 RX: <354 End data with <CR><LF>.<CR><LF>>
          20:09:12.407 SMTPResponse(Sending Body): Socket error - 10053 -
          WSAECONNABORTED
          20:09:12.407 SMTPSend(QUIT): Socket send failed to 202.162.181.245 error -
          10053 - WSAECONNABORTED
          20:09:12.407 Thread exiting for B5174efe90000.5174efe90000.0001.mml after
          28519 millisecs






          --
          View this message in context: http://postfix.1071664.n5.nabble.com/Postfix-lost-connection-issue-tp57670p57672.html
          Sent from the Postfix Users mailing list archive at Nabble.com.
        • Viktor Dukhovni
          ... The message size is ~6 MB. ... Message content transmission starts at 20:08:50.323. ... Transmission aborts 22.084 seconds later. So the bandwidth used is
          Message 4 of 11 , May 2 11:36 PM
          • 0 Attachment
            On Thu, May 02, 2013 at 11:09:35PM -0700, mailtime wrote:

            > 20:08:45.927 TX: <MAIL FROM:<sender@...> SIZE=6469087>

            The message size is ~6 MB.

            > 20:08:50.163 TX: <DATA>
            > 20:08:50.323 RX: <354 End data with <CR><LF>.<CR><LF>>

            Message content transmission starts at 20:08:50.323.

            > 20:09:12.407 SMTPResponse(Sending Body): Socket error - 10053 -
            > WSAECONNABORTED

            Transmission aborts 22.084 seconds later. So the bandwidth used
            is at most 3Mb/s, but sufficiently recent Postfix releases would
            normally log the number of bytes received, and cleanup did see
            headers, so your Postfix is likely relatively old. Do capture the
            session and apply wireshark or similar.

            --
            Viktor.
          • mailtime
            Thanks for your advice Viktor I have done a capture and loaded into WireShark. 1, 0.000000 , 1.2.3.4 , 192.168.1.239 , TCP , 66 , 19524 smtp [SYN] Seq=0
            Message 5 of 11 , May 7 4:46 AM
            • 0 Attachment
              Thanks for your advice Viktor
              I have done a capture and loaded into WireShark.


              1,"0.000000","1.2.3.4","192.168.1.239","TCP","66","19524 > smtp [SYN] Seq=0
              Win=8192 Len=0 MSS=1380 WS=256 SACK_PERM=1"
              2,"0.000022","192.168.1.239","1.2.3.4","TCP","66","smtp > 19524 [SYN, ACK]
              Seq=0 Ack=1 Win=5840 Len=0 MSS=1460 SACK_PERM=1 WS=128"
              3,"0.043797","1.2.3.4","192.168.1.239","TCP","60","19524 > smtp [ACK] Seq=1
              Ack=1 Win=131072 Len=0"
              4,"0.047331","192.168.1.239","1.2.3.4","SMTP","96","S: 220 spambox.co.nz
              ESMTP Postfix"
              5,"0.093672","1.2.3.4","192.168.1.239","SMTP","90","C: EHLO
              remote.sender.com"
              6,"0.093699","192.168.1.239","1.2.3.4","TCP","54","smtp > 19524 [ACK] Seq=43
              Ack=37 Win=5888 Len=0"
              7,"0.095070","192.168.1.239","1.2.3.4","SMTP","261","S: 250-spambox.co.nz |
              250-PIPELINING | 250-SIZE 20728640 | 250-VRFY | 250-ETRN | 250-STARTTLS |
              250-AUTH PLAIN LOGIN CRAM-MD5 | 250-AUTH=PLAIN LOGIN CRAM-MD5 |
              250-ENHANCEDSTATUSCODES | 250-8BITMIME | 250 DSN"
              8,"0.140698","1.2.3.4","192.168.1.239","SMTP","115","C: MAIL
              FROM:<jono.smith@...> SIZE=221190"
              9,"0.146588","192.168.1.239","1.2.3.4","SMTP","68","S: 250 2.1.0 Ok"
              10,"0.190450","1.2.3.4","192.168.1.239","SMTP","89","C: RCPT
              TO:<recipient@...>"
              11,"0.230190","192.168.1.239","1.2.3.4","TCP","54","smtp > 19524 [ACK]
              Seq=264 Ack=133 Win=5888 Len=0"
              12,"3.625029","192.168.1.239","1.2.3.4","SMTP","68","S: 250 2.1.5 Ok"
              13,"3.668700","1.2.3.4","192.168.1.239","SMTP","60","C: DATA"

              Which follows on for a while transmitting then this occurs

              66,"4.267143","192.168.1.239","1.2.3.4","TCP","54","smtp > 19524 [ACK]
              Seq=315 Ack=47059 Win=64128 Len=0"
              67,"4.272844","1.2.3.4","192.168.1.239","SMTP","1434","C: DATA fragment,
              1380 bytes"
              68,"4.278587","1.2.3.4","192.168.1.239","SMTP","1434","C: DATA fragment,
              1380 bytes"
              69,"4.278597","192.168.1.239","1.2.3.4","TCP","54","smtp > 19524 [ACK]
              Seq=315 Ack=49819 Win=64128 Len=0"
              70,"4.284366","1.2.3.4","192.168.1.239","SMTP","1434","C: DATA fragment,
              1380 bytes"
              71,"4.290093","1.2.3.4","192.168.1.239","SMTP","1434","[TCP Previous segment
              not captured] C: DATA fragment, 1380 bytes"
              72,"4.290104","192.168.1.239","1.2.3.4","TCP","66","smtp > 19524 [ACK]
              Seq=315 Ack=51199 Win=64128 Len=0 SLE=53959 SRE=55339"
              73,"4.295853","1.2.3.4","192.168.1.239","SMTP","1434","C: DATA fragment,
              1380 bytes"
              74,"4.295859","192.168.1.239","1.2.3.4","TCP","66","[TCP Dup ACK 72#1] smtp
              > 19524 [ACK] Seq=315 Ack=51199 Win=64128 Len=0 SLE=53959 SRE=56719"
              75,"4.301731","1.2.3.4","192.168.1.239","SMTP","1434","C: DATA fragment,
              1380 bytes"
              76,"4.301738","192.168.1.239","1.2.3.4","TCP","66","[TCP Dup ACK 72#2] smtp
              > 19524 [ACK] Seq=315 Ack=51199 Win=64128 Len=0 SLE=53959 SRE=58099"
              77,"4.307588","1.2.3.4","192.168.1.239","SMTP","1434","[TCP Previous segment
              not captured] C: DATA fragment, 1380 bytes"

















              --
              View this message in context: http://postfix.1071664.n5.nabble.com/Postfix-lost-connection-issue-tp57670p57788.html
              Sent from the Postfix Users mailing list archive at Nabble.com.
            • Viktor Dukhovni
              ... The wireshark output, which omits much detail, (but the GUI allows you to drill in various details) is for you to interpret. If you want help from me,
              Message 6 of 11 , May 7 7:58 AM
              • 0 Attachment
                On Tue, May 07, 2013 at 04:46:35AM -0700, mailtime wrote:

                > Thanks for your advice Viktor
                > I have done a capture and loaded into WireShark.

                The wireshark output, which omits much detail, (but the GUI allows
                you to drill in various details) is for you to interpret. If you
                want help from me, post complete output from "tcpdump -nr capture.pcap".

                That said, clearly some packets from the sender are lost, and never
                retransmitted. The TCP connection negotiates selective ACK and
                window scaling on both sides. First thing I would do is disable
                window scaling on your Postfix server. This will reduce throughput
                for mail from far away senders (without window scaling Hong Kong
                to USA will be at most 64KB per ~0.25 sec or 4 sec per MB of data).

                The problem is almost certainly a firewall bug either at the sending
                or receiving side. Updating a more recent firewall software or
                firmware version may help.

                --
                Viktor.
              • Wietse Venema
                ... Postfix 2.6 and later have a workaround for TCP window scaling bugs. For detailed instructions, see: http://www.postfix.org/postconf.5.html#tcp_windowsize
                Message 7 of 11 , May 7 9:57 AM
                • 0 Attachment
                  Viktor Dukhovni:
                  > That said, clearly some packets from the sender are lost, and never
                  > retransmitted. The TCP connection negotiates selective ACK and
                  > window scaling on both sides. First thing I would do is disable
                  > window scaling on your Postfix server. This will reduce throughput
                  > for mail from far away senders (without window scaling Hong Kong
                  > to USA will be at most 64KB per ~0.25 sec or 4 sec per MB of data).

                  Postfix 2.6 and later have a workaround for TCP window scaling bugs.
                  For detailed instructions, see:
                  http://www.postfix.org/postconf.5.html#tcp_windowsize

                  Wietse
                • Viktor Dukhovni
                  ... On which operating systems does this suppress window scaling during the TCP 3-way handshake? I tried MacOSX and NetBSD, neither seemed to suppress window
                  Message 8 of 11 , May 7 10:26 AM
                  • 0 Attachment
                    On Tue, May 07, 2013 at 12:57:27PM -0400, Wietse Venema wrote:

                    > Viktor Dukhovni:
                    > > That said, clearly some packets from the sender are lost, and never
                    > > retransmitted. The TCP connection negotiates selective ACK and
                    > > window scaling on both sides. First thing I would do is disable
                    > > window scaling on your Postfix server. This will reduce throughput
                    > > for mail from far away senders (without window scaling Hong Kong
                    > > to USA will be at most 64KB per ~0.25 sec or 4 sec per MB of data).
                    >
                    > Postfix 2.6 and later have a workaround for TCP window scaling bugs.
                    > For detailed instructions, see:
                    > http://www.postfix.org/postconf.5.html#tcp_windowsize

                    On which operating systems does this suppress window scaling during
                    the TCP 3-way handshake?

                    I tried MacOSX and NetBSD, neither seemed to suppress window scaling
                    when the server window size was set to 32767:

                    C > S: Flags [S], seq 227891400, win 65535, options
                    [mss 1460,wscale 4,TS val 342104207 ecr 0,sackOK,eol], length 0
                    S > C: Flags [S.], seq 3628790798, ack 227891401, win 32767, options
                    [mss 1460,wscale 3,TS val 1 ecr 342104207,sackOK], length 0

                    Have not tried Linux yet, perhaps this works there...

                    --
                    Viktor.
                  • Wietse Venema
                    ... It s five years since this I added the tcp_windowsize feature, and the systems that I was using at the time have been decomissioned. Even with minuscule
                    Message 9 of 11 , May 7 11:32 AM
                    • 0 Attachment
                      Viktor Dukhovni:
                      > On Tue, May 07, 2013 at 12:57:27PM -0400, Wietse Venema wrote:
                      >
                      > > Viktor Dukhovni:
                      > > > That said, clearly some packets from the sender are lost, and never
                      > > > retransmitted. The TCP connection negotiates selective ACK and
                      > > > window scaling on both sides. First thing I would do is disable
                      > > > window scaling on your Postfix server. This will reduce throughput
                      > > > for mail from far away senders (without window scaling Hong Kong
                      > > > to USA will be at most 64KB per ~0.25 sec or 4 sec per MB of data).
                      > >
                      > > Postfix 2.6 and later have a workaround for TCP window scaling bugs.
                      > > For detailed instructions, see:
                      > > http://www.postfix.org/postconf.5.html#tcp_windowsize
                      >
                      > On which operating systems does this suppress window scaling during
                      > the TCP 3-way handshake?

                      It's five years since this I added the tcp_windowsize feature, and
                      the systems that I was using at the time have been decomissioned.
                      Even with minuscule window sizes I see a non-zero wscale option in
                      the SYN+ACK response from FreeBSD 8. I'll update the documentation.

                      Wietse
                    • mailtime
                      Thanks both for the info. I have attached a log file showing the tcp information. It starts to go wrong around Frame 24660: I did disable window scaling on
                      Message 10 of 11 , May 7 10:59 PM
                      • 0 Attachment
                        Thanks both for the info. I have attached a log file showing the tcp
                        information.
                        It starts to go wrong around Frame 24660:
                        I did disable window scaling on Postfix and for a while i thought it had
                        solved the issue but then more of the same as you can see in the log.
                        (hopefully showing the correct information!)

                        TCPLog <http://postfix.1071664.n5.nabble.com/file/n57846/TCPLog>

                        Thanks in advance




                        --
                        View this message in context: http://postfix.1071664.n5.nabble.com/Postfix-lost-connection-issue-tp57670p57846.html
                        Sent from the Postfix Users mailing list archive at Nabble.com.
                      • Viktor Dukhovni
                        ... What do you mean by on Postfix ? This is a kernel setting, modified via sysctl or similar. In any case the session you posted has no window scaling,
                        Message 11 of 11 , May 8 9:40 AM
                        • 0 Attachment
                          On Tue, May 07, 2013 at 10:59:45PM -0700, mailtime wrote:

                          > It starts to go wrong around Frame 24660:
                          > I did disable window scaling on Postfix

                          What do you mean by "on Postfix"? This is a kernel setting, modified
                          via "sysctl" or similar. In any case the session you posted has
                          no window scaling, so that is moot. You could try disabling
                          selective ACK support if that's possible:

                          net.ipv4.tcp_sack = 0
                          net.ipv4.tcp_dsack = 0
                          net.ipv4.tcp_fack = 0

                          > and for a while i thought it had
                          > solved the issue but then more of the same as you can see in the log.
                          > (hopefully showing the correct information!)
                          >
                          > TCPLog <http://postfix.1071664.n5.nabble.com/file/n57846/TCPLog>

                          Somebody's firewall or NAT device is pining for the scrap heap.
                          The receiver's duplicate acks eventually trigger sender RSTs instead
                          of retransmission of the lost segment. What we don't know is whose.

                          --
                          Viktor.
                        Your message has been successfully submitted and would be delivered to recipients shortly.