"lost connection with domain while sending end of data -- message may be sent more than once"

  • Hargis, Mandy
    Message 1 of 10 , Apr 30, 2007

      Good morning,

       

      I have two Solaris 10 servers running Postfix 2.3.7 performing Internet-bound SMTP relay. I brought these online 3.19.07.  Beginning 4.24.07 we began getting flooded with reports that messages destined for many domains are getting a warning message "lost connection with domain while sending end of data -- message may be sent more than once".  Messages ARE being delivered, anywhere between 15-40 times.  Senders at my site are getting a copy of the “warning” message anywhere between 15-40 times.

       

      I saw the FAQ “Mail fails consistently with timeout or lost connection” and have verified we are not using PIX firewalls anywhere and that path MTU discovery is disabled.

       

      Around the same time, we started getting similar reports for inbound messages.  Folks are receiving the same email message about a dozen times per day.   These inbound servers are running Solaris 9, Postfix 2.0.12 (upgrading soon) with equally ancient versions of amavisd-new and SpamAssassin performing inbound SMTP relay.   These have been up and running successfully for years, which is why I’m not convinced this is a Postfix issue – though everyone is pointing their finger here at the moment.

       

      Inbound and outbound - it does not happen with all messages.   This problem seems to be related to messages containing a .vcf  or .html file attachment. 

       

      I’ll gladly include my postconf -n if you think it will be helpful.  Nothing has changed on any of these servers recently.  Network and firewall groups claim there have been no changes there either.  I realize this may not be a Postfix issue at all – I’m just at a loss at the moment and figured I’d see if anyone on the list has seen anything like this.

       

      Many thanks,

      Mandy Hargis

       

       

    • Victor Duchovni
      Message 2 of 10 , Apr 30, 2007
        On Mon, Apr 30, 2007 at 11:11:08AM -0400, Hargis, Mandy wrote:

        > I saw the FAQ " Mail fails consistently with timeout or lost connection"
        > and have verified we are not using PIX firewalls anywhere and that path
        > MTU discovery is disabled.

        There's your problem. *Enable* Path MTU discovery, or set a reasonably
        low MTU (usually something around 1380 rather than 1500 is enough to
        accommodate most MTU-reducing VPNs). Don't block ICMP "unreachable"
        messages.

        --
        Viktor.

        Disclaimer: off-list followups get on-list replies or get ignored.
        Please do not ignore the "Reply-To" header.

        If my response solves your problem, the best way to thank me is to not
        send an "it worked, thanks" follow-up. If you must respond, please put
        "It worked, thanks" in the "Subject" so I can delete these quickly.
      • Wietse Venema
        Message 3 of 10 , Apr 30, 2007
          Hargis, Mandy:
          > I saw the FAQ " Mail fails consistently with timeout or lost connection"
          > and have verified we are not using PIX firewalls anywhere and that path
          > MTU discovery is disabled.

          1) What evidence do you have that IP path MTU discovery is turned off?

          2) IP path MTU discovery on/off matters for sending mail only.

          With IP path MTU discovery off, you're asking remote routers to
          fragment packets for you. This *should* work. It's less fragile
          than relying on ICMP feedback, but it also is more wasteful.

          Wietse
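
A rough sketch of the two behaviours contrasted above: routers fragmenting oversized packets when path MTU discovery is off, versus the sender relying on ICMP feedback when it is on. The function names and the simplified one-hop model are illustrative assumptions, not anything taken from Postfix or Solaris; header sizes assume IPv4 and TCP without options.

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options
TCP_HEADER_BYTES = 20   # assumes no TCP options


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet for this MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def packet_fate(packet_size: int, path_mtu: int,
                dont_fragment: bool, icmp_delivered: bool) -> str:
    """Simplified model of what happens to one packet at the narrowest hop."""
    if packet_size <= path_mtu:
        return "delivered"
    if not dont_fragment:
        # Path MTU discovery off: the router fragments the packet.
        return "fragmented by router"
    if icmp_delivered:
        # Path MTU discovery on and ICMP "unreachable" gets back to the sender.
        return "sender lowers MTU and retransmits"
    # Path MTU discovery on, but ICMP is filtered somewhere on the path.
    return "silently dropped"


print(tcp_mss(1500), tcp_mss(1380))
print(packet_fate(1500, 1380, dont_fragment=True, icmp_delivered=False))
```

The last case (DF set, ICMP filtered) is the one that typically shows up as connections that stall once full-sized packets are sent, which is why the advice is not to block ICMP "unreachable" messages.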
        • Hargis, Mandy
          Message 4 of 10 , Apr 30, 2007
            1/ ndd -get /dev/ip ip_path_mtu_discovery
            0

            2/ Our problem is definitely more frequent with sending messages,
            though it's happening receiving as well.

            Thanks,
            Mandy

          • Wietse Venema
            Message 5 of 10 , Apr 30, 2007
              Hargis, Mandy:
              > 1/ ndd -get /dev/ip ip_path_mtu_discovery
              > 0

              Let's assume for now that this setting works.

              > 2/ Our problem is definitely more frequent with sending messages,
              > though it's happening receiving as well.

              So messages arrive at the remote site, but the Postfix SMTP client
              times out before the remote server responds to "."

              What is the output from:

              postconf | grep smtp_data_

              - Is the timeout problem message size dependent?

              - How many of those messages are you sending in parallel?

              Wietse
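
To make the duplicate-delivery mechanism concrete, here is a toy model, under the assumption that each attempt reaches the remote server in full but the reply to "." is lost for the first few attempts. The function name and its parameter are hypothetical; this is not how Postfix is implemented, only the logic of why the recipient ends up with several copies.

```python
def simulate_delivery(lost_replies: int) -> tuple[int, int]:
    """Return (copies the recipient receives, warnings the sender side logs).

    Every attempt delivers one copy, because the remote server has already
    accepted the message by the time the client gives up waiting. The client
    only stops retrying once it actually sees the reply to ".".
    """
    copies = 0
    warnings = 0
    while True:
        copies += 1                  # remote server accepted the message
        if warnings < lost_replies:
            warnings += 1            # no reply seen: defer and retry later
            continue
        return copies, warnings      # reply finally seen: message is done


print(simulate_delivery(3))
```

In this model every lost reply costs one extra copy and one "may be sent more than once" warning, which matches reports of a message and its warning both showing up many times.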
            • Hargis, Mandy
              Message 6 of 10 , May 1, 2007
                >So messages arrive at the remote site, but the Postfix SMTP client
                >times out before the remote server responds to "."

                That is what appears to be happening. The recipient definitely gets the
                message (multiple times).

                >What is the output from:
                >postconf | grep smtp_data_

                smtp_data_done_timeout = 600s
                smtp_data_init_timeout = 120s
                smtp_data_xfer_timeout = 180s

                >Is the timeout problem message size dependent?
                No, many messages are smaller than 8k in size.

                >How many of those messages are you sending in parallel?
                One message/recipient.

                Thanks,
                Mandy
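
For comparing such settings across several servers, a small sketch that turns `postconf` output like the above into seconds. The helper name is made up for illustration, and it only handles the simple `name = <number><unit>` form with the Postfix time suffixes s, m, h, d and w.

```python
import re

# Postfix time-unit suffixes and their length in seconds.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

LINE_RE = re.compile(r"^\s*(\S+)\s*=\s*(\d+)([smhdw])\s*$")


def parse_postconf_times(text: str) -> dict[str, int]:
    """Map each 'name = <number><unit>' line to its value in seconds."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name, number, unit = match.groups()
            result[name] = int(number) * UNIT_SECONDS[unit]
    return result


sample = """\
smtp_data_done_timeout = 600s
smtp_data_init_timeout = 120s
smtp_data_xfer_timeout = 180s
"""
timeouts = parse_postconf_times(sample)
print(timeouts)
```

With the values shown, the client waits up to 600 seconds (ten minutes) for the reply to "." before giving up on the connection.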
              • Wietse Venema
                Message 7 of 10 , May 1, 2007
                  Hargis, Mandy:
                  >
                  > >So messages arrive at the remote site, but the Postfix SMTP client
                  > times out before the remote server responds to "."
                  >
                  > That is what appears to be happening. The recipient definitely gets the
                  > message (multiple times).
                  >
                  > >What is the output from:
                  > >postconf | grep smtp_data_
                  >
                  > smtp_data_done_timeout = 600s
                  > smtp_data_init_timeout = 120s
                  > smtp_data_xfer_timeout = 180s

                  So the remote server does not respond to "." in 600s.

                  Are you perhaps behind a NAT gateway? This may expire the connection
                  from its tables too early. Such boxes tend to be optimized for
                  short-lived http connections which is bad for email.

                  Is the remote SMTP server behind a NAT gateway?

                  In either case, it may help to turn on keep-alives.
                  For example, in FreeBSD:

                  sysctl -w net.inet.tcp.keepidle=100000

                  This is currently not built into Postfix.

                  > >Is the timeout problem message size dependent?
                  > No, many messages are smaller than 8k in size.

                  Are there large messages that DON'T fail?

                  > >How many of those messages are you sending in parallel?
                  > One message/recipient.

                  How many EMAIL MESSAGES are you sending in parallel?

                  Wietse
                • Wietse Venema
                  Message 8 of 10 , May 1, 2007
                    Wietse Venema:
                    > Are you perhaps behind a NAT gateway? This may expire the connection
                    > from its tables too early. Such boxes tend to be optimized for
                    > short-lived http connections which is bad for email.
                    >
                    > Is the remote SMTP server behind a NAT gateway?
                    >
                    > In either case, it may help to turn on keep-alives.
                    > For example, in FreeBSD:
                    >
                    > sysctl -w net.inet.tcp.keepidle=100000

                    Linux specifies the interval in seconds:

                    sysctl -w net.ipv4.tcp_keepalive_time=100

                    Solaris specifies it in milliseconds, like *BSD:

                    ndd -set /dev/tcp tcp_keepalive_interval 100000

                    Linux sends keepalive probes only after an application turns on
                    the SO_KEEPALIVE option on a socket.

                    I suppose Solaris has the same behavior.

                    To turn on the SO_KEEPALIVE in Postfix, see attached patches for
                    Postfix 2.3, and for 2.4 and later. It takes an existing workaround
                    for Solaris, and turns it on for all platforms.

                    Wietse
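
For readers unfamiliar with SO_KEEPALIVE, this is roughly what "an application turns on the option on a socket" looks like, shown here with Python's standard `socket` module rather than the C code of the patches mentioned above (which are not reproduced here). The helper name is an assumption for illustration.

```python
import socket


def make_keepalive_socket() -> socket.socket:
    """Create a TCP socket with keepalive probes requested.

    The kernel-wide settings (such as the sysctl/ndd values above) decide
    how long the connection must be idle before the first probe is sent.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = make_keepalive_socket()
# getsockopt returns a non-zero value when the option is enabled.
keepalive_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
print(keepalive_enabled)
```

The option is per socket, so lowering the kernel's keepalive interval has no effect on connections whose application never requested keepalives.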
                  • Hargis, Mandy
                    Message 9 of 10 , May 1, 2007
                      >Are you perhaps behind a NAT gateway? This may expire the connection
                      >from its tables too early. Such boxes tend to be optimized for
                      >short-lived http connections which is bad for email.

                      I'm not behind a NAT gateway.

                      >Is the remote SMTP server behind a NAT gateway?

                      The remote SMTP servers include hundreds of servers such as verizon.net,
                      yahoo, many .edus, gmail, etc.

                      >In either case, it may help to turn on keep-alives.

                      Current Solaris setting:
                      > ndd -get /dev/tcp tcp_keepalive_interval
                      7200000

                      >This is currently not built into Postfix.

                      I have not installed the Postfix patch that you provided in the separate
                      message. I'm just wondering how my inbound SMTP servers could have been
                      running for three + years without this patch or problem. How could it
                      be necessary all of a sudden?

                      >Are there large messages that DON'T fail?

                      Yes many large messages have no problems. Oddly enough this seems to
                      happen when a message contains a .vcf or .html file attachment.

                      >How many EMAIL MESSAGES are you sending in parallel?
                      default_destination_concurrency_limit = 20
                    • Wietse Venema
                      Message 10 of 10 , May 1, 2007
                        Hargis, Mandy:
                        > > ndd -get /dev/tcp tcp_keepalive_interval
                        > 7200000

                        On Solaris you don't need the patch. Postfix keepalives are already
                        turned on to work around kernel bugs.

                        However, 7200000 milliseconds is two hours, and that won't make a
                        difference if the problem is caused by NAT boxes with too-short
                        timeouts. Try 10s and see if it makes a difference.

                        ndd -set /dev/tcp tcp_keepalive_interval 10000

                        > I have not installed the Postfix patch that you provided in the separate
                        > message. I'm just wondering how my inbound SMTP servers could have been
                        > running for three + years without this patch or problem. How could it
                        > be necessary all of a sudden?

                        I suppose that if Postfix didn't change, then something else did.
                        Either this, or the problem already existed and you just didn't
                        know about it....

                        If you experience this problem with many sites, then it is
                        very likely that the problem is at your end of the world.

                        This is another reason why I suspect that something in your
                        infrastructure was changed recently.

                        Wietse
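
The arithmetic behind the two keepalive values can be checked with a short sketch. The 300-second NAT idle timeout below is a made-up figure purely for illustration (the thread never establishes that a NAT box is involved, let alone its timeout), and the function names are hypothetical.

```python
def ms_to_seconds(milliseconds: int) -> float:
    """Convert a millisecond keepalive setting to seconds."""
    return milliseconds / 1000


def keepalive_beats_idle_timeout(keepalive_ms: int, idle_timeout_s: int) -> bool:
    """True if the first keepalive probe goes out before an intermediate
    device with the given idle timeout would drop the connection."""
    return ms_to_seconds(keepalive_ms) < idle_timeout_s


CURRENT_MS = 7_200_000    # value reported by ndd in this thread
SUGGESTED_MS = 10_000     # value suggested as an experiment
ASSUMED_NAT_IDLE_S = 300  # hypothetical, not from the thread

print(ms_to_seconds(CURRENT_MS) / 3600, "hours")
print(ms_to_seconds(SUGGESTED_MS), "seconds")
print(keepalive_beats_idle_timeout(CURRENT_MS, ASSUMED_NAT_IDLE_S))
print(keepalive_beats_idle_timeout(SUGGESTED_MS, ASSUMED_NAT_IDLE_S))
```

With a two-hour interval, no probe would be sent within the 600-second wait for the reply to "." either, so the current setting cannot keep an idle connection alive during that wait.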