RE: "lost connection with domain while sending end of data -- message may be sent more than once"

  • Hargis, Mandy
    Message 1 of 10 , May 1, 2007
      >So messages arrive at the remote site, but the Postfix SMTP client
      >times out before the remote server responds to "."

      That is what appears to be happening. The recipient definitely gets the
      message (multiple times).

      >What is the output from:
      >postconf | grep smtp_data_

      smtp_data_done_timeout = 600s
      smtp_data_init_timeout = 120s
      smtp_data_xfer_timeout = 180s

      >Is the timeout problem message size dependent?
      No, many messages are smaller than 8k in size.

      >How many of those messages are you sending in parallel?
      One message/recipient.

      Thanks,
      Mandy
    • Wietse Venema
      Message 2 of 10 , May 1, 2007
        Hargis, Mandy:
        >
        > >So messages arrive at the remote site, but the Postfix SMTP client
        > times out before the remote server responds to "."
        >
        > That is what appears to be happening. The recipient definitely gets the
        > message (multiple times).
        >
        > >What is the output from:
        > >postconf | grep smtp_data_
        >
        > smtp_data_done_timeout = 600s
        > smtp_data_init_timeout = 120s
        > smtp_data_xfer_timeout = 180s

        So the remote server does not respond to "." in 600s.

        Are you perhaps behind a NAT gateway? This may expire the connection
        from its tables too early. Such boxes tend to be optimized for
        short-lived http connections which is bad for email.

        Is the remote SMTP server behind a NAT gateway?

        In either case, it may help to turn on keep-alives.
        For example, in FreeBSD:

        sysctl -w net.inet.tcp.keepidle=100000

        This is currently not built into Postfix.

        > >Is the timeout problem message size dependent?
        > No, many messages are smaller than 8k in size.

        Are there large messages that DON'T fail?

        > >How many of those messages are you sending in parallel?
        > One message/recipient.

        How many EMAIL MESSAGES are you sending in parallel?

        Wietse
      • Wietse Venema
        Message 3 of 10 , May 1, 2007
          Wietse Venema:
          > Are you perhaps behind a NAT gateway? This may expire the connection
          > from its tables too early. Such boxes tend to be optimized for
          > short-lived http connections which is bad for email.
          >
          > Is the remote SMTP server behind a NAT gateway?
          >
          > In either case, it may help to turn on keep-alives.
          > For example, in FreeBSD:
          >
          > sysctl -w net.inet.tcp.keepidle=100000

          Linux specifies the interval in seconds:

          sysctl -w net.ipv4.tcp_keepalive_time=100

          Solaris specifies it in milliseconds, like *BSD:

          ndd -set /dev/tcp tcp_keepalive_interval 100000

          Linux sends keepalive probes only after an application turns on
          the SO_KEEPALIVE option on a socket.

          I suppose Solaris has the same behavior.

          To turn on the SO_KEEPALIVE in Postfix, see attached patches for
          Postfix 2.3, and for 2.4 and later. It takes an existing workaround
          for Solaris, and turns it on for all platforms.
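
To illustrate what "turning on SO_KEEPALIVE on a socket" means at the
application level, here is a minimal Python sketch. It is not the Postfix
patch (the attachments are not reproduced in this archive), and the helper
name enable_keepalive is hypothetical. The per-socket TCP_KEEPIDLE option
is assumed to exist only on some platforms (Linux among them), so the
sketch checks for it before use.

```python
import socket


def enable_keepalive(sock, idle_seconds=None):
    """Ask the kernel to send keepalive probes on this socket.

    Returns True if SO_KEEPALIVE reads back as enabled.
    """
    # Without this option the kernel-wide keepalive timers are never
    # applied to the connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional per-socket override of the idle time before the first
    # probe. TCP_KEEPIDLE is not available everywhere, so guard it.
    if idle_seconds is not None and hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)

    # Some systems report a nonzero value other than 1 when enabled.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


# The option can be set before the socket is connected.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
keepalive_on = enable_keepalive(sock, idle_seconds=100)
sock.close()
```

The system-wide sysctl/ndd settings above only change the timers; the
socket option is what makes the kernel use them for a given connection.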

          Wietse
        • Hargis, Mandy
          Message 4 of 10 , May 1, 2007
            >Are you perhaps behind a NAT gateway? This may expire the connection
            >from its tables too early. Such boxes tend to be optimized for
            >short-lived http connections which is bad for email.

            I'm not behind a NAT gateway.

            >Is the remote SMTP server behind a NAT gateway?

            The remote SMTP servers include hundreds of servers such as verizon.net,
            yahoo, many .edus, gmail, etc.

            >In either case, it may help to turn on keep-alives.

            Current Solaris setting:
            > ndd -get /dev/tcp tcp_keepalive_interval
            7200000

            >This is currently not built into Postfix.

            I have not installed the Postfix patch that you provided in the separate
            message. I'm just wondering how my inbound SMTP servers could have been
            running for three + years without this patch or problem. How could it
            be necessary all of a sudden?

            >Are there large messages that DON'T fail?

            Yes, many large messages have no problems. Oddly enough, this seems to
            happen when a message contains a .vcf or .html file attachment.

            >How many EMAIL MESSAGES are you sending in parallel?
            default_destination_concurrency_limit = 20
          • Wietse Venema
            Message 5 of 10 , May 1, 2007
              Hargis, Mandy:
              > > ndd -get /dev/tcp tcp_keepalive_interval
              > 7200000

              On Solaris you don't need the patch. Postfix keepalives are already
              turned on to work around kernel bugs.

              However, 7200000 milliseconds is two hours, and that won't make a
              difference if the problem is caused by NAT boxes with too-short
              timeouts. Try 10s and see if it makes a difference.

              ndd -set /dev/tcp tcp_keepalive_interval 10000
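
The arithmetic behind this advice can be checked with a short Python
sketch. The 300-second idle timeout used for the middlebox is purely an
assumed figure for illustration; the thread does not state what any NAT
box or firewall in the path actually uses, and the function names are
hypothetical.

```python
def ms_to_seconds(milliseconds):
    """Convert a Solaris-style keepalive interval (ms) to seconds."""
    return milliseconds / 1000


def keepalive_prevents_expiry(keepalive_ms, middlebox_idle_timeout_s):
    """True if a probe goes out before the middlebox would drop the
    idle connection from its state table."""
    return ms_to_seconds(keepalive_ms) < middlebox_idle_timeout_s


ASSUMED_NAT_IDLE_TIMEOUT_S = 300  # hypothetical value, not from the thread

current_hours = ms_to_seconds(7_200_000) / 3600
current_ok = keepalive_prevents_expiry(7_200_000, ASSUMED_NAT_IDLE_TIMEOUT_S)
suggested_ok = keepalive_prevents_expiry(10_000, ASSUMED_NAT_IDLE_TIMEOUT_S)

print(current_hours)   # 2.0
print(current_ok)      # False
print(suggested_ok)    # True
```

Under that assumption, the default two-hour interval never fires before
the connection state expires, while a 10-second interval does.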

              > I have not installed the Postfix patch that you provided in the separate
              > message. I'm just wondering how my inbound SMTP servers could have been
              > running for three + years without this patch or problem. How could it
              > be necessary all of a sudden?

              I suppose that if Postfix didn't change, then something else did.
              Either this, or the problem already existed and you just didn't
              know about it....

              If you experience this problem with many sites, then it is
              very likely that the problem is at your end of the world.

              This is another reason why I suspect that something in your
              infrastructure was changed recently.

              Wietse