  • Xie, Wei
    Message 1 of 20 , Apr 28, 2014

      Need help!!!

       

      We are using Postfix-2.6.6 with TLS running on RHEL 5.10 for production trial.

       

      In the email delivery log, we see a delay of 967 seconds.

       

      Apr 28 11:03:18 cio-krc-pf03 postfix/smtp[5015]: 9934181190: to=<turek.16@...>, relay=mail.us.messaging.microsoft.com[216.32.181.178]:25, delay=967, delays=0/964/1.5/1.3, dsn=2.6.0, status=sent (250 2.6.0 <27520027.166481398696426625.JavaMail.erequest.do.not.reply@...> [InternalId=9787221] Queued mail for delivery)

       

      We know the definitions of the four a/b/c/d delay fields. How do we tune performance to reduce the delay in the queue manager?

       

      # Message delivery time stamps
      # delays=a/b/c/d, where
      #   a = time before queue manager, including message transmission
      #   b = time in queue manager
      #   c = connection setup including DNS, HELO and TLS;
      #   d = message transmission time
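
As an illustration of the legend above, here is a minimal sketch that pulls the four components out of a delivery log line and reports which one dominates. The log line is abbreviated from the one in this post. The key names (`before_qmgr`, `in_qmgr`, and so on) are labels invented for this example, not Postfix identifiers.

```python
import re

# Labels for the a/b/c/d components; illustrative names, not Postfix identifiers.
DELAY_FIELDS = ("before_qmgr", "in_qmgr", "conn_setup", "transmission")

def parse_delays(log_line):
    """Return the four delays= components of a delivery log line as floats, or None."""
    match = re.search(r"\bdelays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)", log_line)
    if match is None:
        return None
    return dict(zip(DELAY_FIELDS, (float(v) for v in match.groups())))

# Abbreviated copy of the slow delivery's log line.
line = ("Apr 28 11:03:18 cio-krc-pf03 postfix/smtp[5015]: 9934181190: "
        "delay=967, delays=0/964/1.5/1.3, dsn=2.6.0, status=sent")
delays = parse_delays(line)
print(delays)
print(max(delays, key=delays.get))  # in_qmgr: the queue manager wait dominates
```

For this line, almost all of the 967 seconds is the `b` component, so connection setup and transmission account for under 3 seconds of the total.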

       

      Here are our TLS settings from the Postfix configuration file /etc/postfix/main.cf. All other parameters are left at their defaults.

       

      ##################
      #
      # Configure TLS client to FOPE/ZIX
      #
      #################
      smtp_tls_security_level = encrypt
      smtp_tls_loglevel = 2
      smtp_tls_session_cache_database = btree:/var/lib/postfix/smtp_scache
      smtp_tls_note_starttls_offer = yes
      #
      # Enable "smtp_tls_CAfile" to fix the warning in the log /var/log/maillog
      # "Untrusted TLS connection established to mail.us.messaging.microsoft.com[216.32.180.22]:25:
      #  TLSv1 with cipher AES128-SHA (128/128 bits)"
      #
      smtp_tls_CAfile = /etc/postfix/service_certs/osu_ues/DigiCertCA.crt
      #
      ####################
      # Configure TLS server
      ####################
      smtpd_tls_security_level = may
      smtpd_tls_key_file = /etc/postfix/service_certs/osu_ues/OSU_UES_WC_Cert_Ex_key.pem
      smtpd_tls_cert_file = /etc/postfix/service_certs/osu_ues/OSU_UES_WC_Cert_Ex_certificate.pem
      smtpd_tls_CAfile = /etc/postfix/service_certs/osu_ues/DigiCertCA.crt
      smtpd_tls_CApath = /etc/postfix/service_certs/osu_ues
      smtpd_tls_loglevel = 2
      smtpd_tls_received_header = yes
      smtpd_tls_session_cache_database = btree:/var/lib/postfix/smtpd_scache

       

      We have other servers running Postfix-2.6.6 with TLS on RHEL 6.4, and email delivery there is very fast (see below). Their TLS settings and other default parameters are the same as on the RHEL 5.10 server above.

       

      Apr 28 10:58:11 cio-krc-pf07 postfix/smtp[16802]: 64DA2500066: to=<neff.194@...>, relay=mail.us.messaging.microsoft.com[216.32.180.22]:25, delay=0.72, delays=0/0/0.28/0.44, dsn=2.6.0, status=sent (250 2.6.0 <fafe453d676b3e3e740bc382f3bfcb5b@...> [InternalId=19618080] Queued mail for delivery).

       

       

      Thanks to all,

       

      Carl Xie

      OCIO/Enterprise messaging Group

      Ohio State University

    • Wietse Venema
      Message 2 of 20 , Apr 28, 2014
        Xie, Wei:
        > Need help!!!
        >
        > We are using Postfix-2.6.6 with TLS running on RHEL 5.10 for production trial.
        >
        > From the email delivery log, we see delay 967 seconds.
        >
        > Apr 28 11:03:18 cio-krc-pf03 postfix/smtp[5015]: 9934181190: to=<turek.16@...>, relay=mail.us.messaging.microsoft.com[216.32.181.178]:25, delay=967, delays=0/964/1.5/1.3, dsn=2.6.0, status=sent (250 2.6.0 <27520027.166481398696426625.JavaMail.erequest.do.not.reply@...> [InternalId=9787221] Queued mail for delivery)

        This needs 1.5 seconds for the TCP, SMTP and TLS handshake, and 1.3
        seconds to deliver the message.

        RHEL 5.10 is old. Why are you evaluating it for production?

        > We have other servers running Postfix-2.6.6 with TLS on RHEL 6.4
        > and email delivery is very fast below. The parameters for TLS and
        > other default parameters are same as above Postfix running on RHEL
        > 5.10 server.
        >
        > Apr 28 10:58:11 cio-krc-pf07 postfix/smtp[16802]: 64DA2500066: to=<neff.194@...>, relay=mail.us.messaging.microsoft.com[216.32.180.22]:25, delay=0.72, delays=0/0/0.28/0.44, dsn=2.6.0, status=sent (250 2.6.0 <fafe453d676b3e3e740bc382f3bfcb5b@...> [InternalId=19618080] Queued mail for delivery).

        This needs 0.28 seconds for the TCP, SMTP and TLS handshake, and it also
        uses less time to deliver mail. With less time needed to do the job,
        less mail piles up in the Postfix queue.

        If the same Postfix version and configuration produce different results
        on different OS distributions, then it is a good bet that the OS
        distribution is causing the difference, and that this requires
        Viktor to look into what may be causing this.

        Are you sure you did not swap the 5.10 and 6.4 release names?

        Wietse
      • Viktor Dukhovni
        Message 3 of 20 , Apr 28, 2014
          On Mon, Apr 28, 2014 at 03:06:16PM +0000, Xie, Wei wrote:

          >
          > Apr 28 11:03:18 cio-krc-pf03 postfix/smtp[5015]: 9934181190:
          > to=<turek.16@...>,
          > relay=mail.us.messaging.microsoft.com[216.32.181.178]:25,
          > delay=967, delays=0/964/1.5/1.3, dsn=2.6.0, status=sent
          > (250 2.6.0 <27520027.166481398696426625.JavaMail.erequest.do.not.reply@...>
          > [InternalId=9787221] Queued mail for delivery)

          Were all other messages to the same domain similarly delayed? What
          is the complete set of logs for this queue-id? How many recipients
          did this message have?

          What was the output rate of email to this domain in the 30 minutes
          preceding this log entry? (Avoid counting multiple recipients
          with the same queue-id, relay and remote server response as separate
          deliveries).

          Have you configured any concurrency controls or rate delay for this
          destination?

          If this is a new IP address sending high volume mail to a major
          email provider, it may take time for your "IP reputation" to be
          established. During that time it may be prudent to gradually
          ramp-up the volume of mail sent by routing a reduced fraction of
          your mail via that particular server (less input, not slower output).

          > We know about the definition of four a/b/c/d delays information.
          > How do we do performance tuning to reduce delay in queue manager?

          This delay means a large number of messages waiting behind the
          messages currently being delivered, subject to concurrency and
          rate delays.

          > Here are our configurations about TLS in Postfix configuration
          > file /etc/postfix/main.cf. Other default parameters are used. We
          > do not change them.

          There is no reason to expect that TLS has anything to do with this.
          Why are you sending TLS settings?

          You need to make sure that the new server has a working local DNS
          cache, its IP reputation is good, and your steady-state input rate
          does not exceed the output rate of remote destinations.

          --
          Viktor.
        • Xie, Wei
          Message 4 of 20 , Apr 28, 2014
            >>> RHEL 5.10 is old. Why are you evaluating it for production?

            The story behind this is a funny one. We run Symantec DLP 12.0.1 with Postfix 2.6.6 on RHEL 6.4 for outbound email, which is delivered to FOPE (a Windows antispam system, relay=mail.us.messaging.microsoft.com). Sometimes the Symantec DLP application repeatedly restarts its own processes for unknown reasons. We tried every fix Symantec recommended, but the problem is still not solved. Finally, Symantec told us that their software is not supported on RHEL 6.x; they will support us only if we downgrade to RHEL 5.8. After confirming that everything worked on the test/dev servers, we chose one production server to downgrade to RHEL 5.10 for a real production trial. But performance is not good: outbound email spends too long in the queue. We need to address the root cause, which is why I sent this email asking for help.

            >>> Are you sure you did not swap the 5.10 and 6.4 release names?

            I am sure I did not swap the 5.10 and 6.4 release names. Postfix 2.3.3 is bundled with RHEL 5.10, whereas Postfix 2.6.6 is bundled with RHEL 6.4. When we turned on TLS with Postfix 2.3.3, we saw some errors in the log /var/log/maillog. With Symantec's approval (they support all releases of Postfix on RHEL 5.10), we upgraded Postfix 2.3.3 to Postfix 2.6.6, the same version as on our other RHEL 6.4 servers.

            Thanks,

            Carl



          • Wietse Venema
            Message 5 of 20 , Apr 28, 2014
              Wietse:
              > RHEL 5.10 is old. Why are you evaluating it for production?

              Xie, Wei:
              > [Symantec support issue]

              As Viktor noted, it is possible that they slow you down because

              - You're sending mail from a new IP address that has no reputation,

              - or you're sending mail from an old IP address that has sent them
              spam in the past,

              - or there is something about your DNS records that they don't like
              (the IP address is embedded in the name of the A/PTR record, or the
              PTR record name does not match the A record),

              - or Postfix is configured to use a myhostname that does not match
              the A/PTR record,

              - or you have not updated your SPF record to include the new machine,

              - or a list of other things.

              If they don't like you and each SMTP session needs 3 seconds, then
              those delays quickly add up to 1000.

              Wietse
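
The arithmetic behind "those delays quickly add up to 1000" can be sketched roughly as follows. This is a back-of-the-envelope model only: it assumes every session takes the same time and that a fixed number of sessions run in parallel. The function name and the backlog sizes are made up for illustration, and the concurrency of 20 is assumed to be the stock `default_destination_concurrency_limit` that would apply if nothing was overridden.

```python
def queue_delay(backlog, concurrency, session_seconds):
    """Approximate wait (seconds) for a message that joins behind `backlog`
    messages, with `concurrency` parallel sessions, one message per session."""
    return backlog * session_seconds / concurrency

# One session at a time: about 333 waiting messages already cost ~1000 s.
print(queue_delay(333, 1, 3.0))           # 999.0

# Assumed 20 parallel sessions: a backlog of a few thousand does the same.
print(round(queue_delay(6667, 20, 3.0)))  # 1000
```

The model also runs in reverse: a fixed concurrency and a fixed per-session time give a hard ceiling on output rate (`concurrency / session_seconds` messages per second), and any input rate above that ceiling makes the backlog, and therefore the delay, grow without bound.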
            • Viktor Dukhovni
              Message 6 of 20 , Apr 28, 2014
                On Mon, Apr 28, 2014 at 01:23:41PM -0400, Wietse Venema wrote:

                > Xie, Wei:
                > > [Symantec support issue]
                >
                > As Viktor noted, it is possible that they slow you down because
                >
                > - You're sending mail from a new IP address that has no reputation,
                >
                > - or you're sending mail from an old IP address that has sent them
                > spam in the past,
                >
                > - or there is something about your DNS records that they don't like
                > (the IP address is embedded in the name of the A/PTR record, or the
                > PTR record name does not match the A record),
                >
                > - or Postfix is configured to use a myhostname that does not match
                > the A/PTR record,
                >
                > - or you have not updated your SPF record to include the new machine,
                >
                > - or a list of other things.

                Or a local shot in the foot via rate delays or overly low concurrency
                ceilings, perhaps motivated by strict receiving-system rate
                ceilings. If the receiving system makes it difficult enough to
                send mail, get the problem fixed on their end; I am under the
                impression you're paying them money to be your service provider.

                --
                Viktor.
              • Marius Gologan
                Message 7 of 20 , Apr 28, 2014
                  Hi,

                  I experienced funny stories with Symantec, on Windows, resulting in a
                  discount for another year for my employer and a pat on the back for me.

                  Some suggestions:
                  - Try to identify the tmp folder where Symantec keeps the message while it
                  is being filtered, and see how long it is kept there.
                  - For test purposes, try to set a time limit in Symantec (you probably don't
                  have such an option). If you do, set a limit of 60 seconds and see if
                  anything changes.
                  - You may find some improvement by disabling some features. Limit the
                  scanning to executables, scripts, archives and documents, instead of
                  scanning all files (pictures, video, audio).
                  - Activate/deactivate DLP and other modules to see if something changes.
                  - You may have a scanning depth level setting too.

                  Regards,
                  Marius.

                • Xie, Wei
                  Message 8 of 20 , Apr 28, 2014
                    >>>Were all other messages to the same domain similarly delayed?

                    When congestion occurred, all other messages to the same domain were similarly delayed. The delays grew longer and longer (the longest exceeded 1200 seconds), and so did the active queue: according to the output of 'qshape active' and 'mailq | grep \* | wc -l', over 9,000 messages were queued.

                    All outbound email is sent to FOPE (a Windows antispam system) for scanning. The destination is mail.us.messaging.microsoft.com.

                    >>> What is the complete set of logs for this queue-id?

                    Apr 28 10:47:11 cio-krc-pf03 postfix/smtpd[31853]: 9934181190: client=cio-tnc-ht06.osuad.osu.edu[164.107.81.171]
                    Apr 28 10:47:11 cio-krc-pf03 postfix/cleanup[31859]: 9934181190: message-id=<27520027.166481398696426625.JavaMail.erequest.do.not.reply@...>
                    Apr 28 10:47:11 cio-krc-pf03 postfix/cleanup[31859]: 9934181190: warning: header Subject: eRequest Submitted from cio-tnc-ht06.osuad.osu.edu[164.107.81.171]; from=<erequest.do.not.reply@...> to=<turek.16@...> proto=ESMTP helo=<CIO-TNC-HT06.osuad.osu.edu>
                    Apr 28 10:47:11 cio-krc-pf03 postfix/qmgr[31812]: 9934181190: from=<erequest.do.not.reply@...>, size=1905, nrcpt=1 (queue active)
                    Apr 28 11:03:18 cio-krc-pf03 postfix/smtp[5015]: 9934181190: to=<turek.16@...>, relay=mail.us.messaging.microsoft.com[216.32.181.178]:25, delay=967, delays=0/964/1.5/1.3, dsn=2.6.0, status=sent (250 2.6.0 <27520027.166481398696426625.JavaMail.erequest.do.not.reply@...> [InternalId=9787221] Queued mail for delivery)
                    Apr 28 11:03:18 cio-krc-pf03 postfix/qmgr[31812]: 9934181190: removed

                    >>> How many recipients did this message have?

                    Only one.

                    >>> What was the output rate of email to this domain in the 30 minutes preceding this log entry? (Avoid counting multiple recipients with the same queue-id, relay and remote server response as separate deliveries).

                    Today, 10:00:00 to 10:29:59: 15,361 deliveries to this domain.
                    Today, 10:30:00 to 10:59:59: 28,827 deliveries to this domain.
                    Today, 11:00:00 to 11:29:59: 11,127 deliveries to this domain.
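
To compare these counts with per-session timings, it helps to convert them to deliveries per second. A minimal sketch, with the window labels and the helper name chosen only for this example, and with the third figure taken as 11,127 (an assumption about where the comma belongs):

```python
WINDOW_SECONDS = 30 * 60  # each reporting window is 30 minutes

def rate_per_second(count, window_seconds=WINDOW_SECONDS):
    """Average deliveries per second over one reporting window."""
    return count / window_seconds

deliveries = {
    "10:00-10:29": 15361,
    "10:30-10:59": 28827,
    "11:00-11:29": 11127,  # assumption: third window taken as 11,127 deliveries
}
for window, count in deliveries.items():
    print(f"{window}: {rate_per_second(count):.1f} deliveries/s")
```

This works out to roughly 8.5, 16.0 and 6.2 deliveries per second for the three windows.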

                    >>> Have you configured any concurrency controls or rate delay for this destination?

                    No, we kept the defaults unchanged. Which concurrency-control or rate-delay parameters should we check?
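
One way to check whether any such setting has been overridden is to scan the output of `postconf -n` (which lists only explicitly configured parameters) for the usual suspects. The sketch below works on that output as text. The parameter names are real Postfix settings, but whether they are the relevant ones on this server is an assumption, and the sample output is hypothetical.

```python
# Real Postfix parameter names related to delivery concurrency and rate delay;
# which of them matter on a given server is an assumption.
THROTTLE_PARAMS = (
    "default_process_limit",
    "initial_destination_concurrency",
    "default_destination_concurrency_limit",
    "smtp_destination_concurrency_limit",
    "default_destination_rate_delay",
    "smtp_destination_rate_delay",
)

def overridden_throttles(postconf_n_output):
    """Return {name: value} for throttle parameters found in `postconf -n` output."""
    found = {}
    for line in postconf_n_output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() in THROTTLE_PARAMS:
            found[name.strip()] = value.strip()
    return found

# Hypothetical `postconf -n` output, for illustration only.
sample = """\
myhostname = cio-krc-pf03
smtp_tls_security_level = encrypt
smtp_destination_concurrency_limit = 2
"""
print(overridden_throttles(sample))
```

An empty result would mean none of these parameters is set in main.cf, so the compiled-in defaults apply.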

                    >>> If this is a new IP address sending high volume mail to a major email provider, it may take time for your "IP reputation" to be established. During that time it may be prudent to gradually ramp-up the volume of mail sent by routing a reduced fraction of your mail via that particular server (less input, not slower output).

                    The IP address is not new. It has been in use for 1.5 years.

                    >>> We know about the definition of four a/b/c/d delays information.
                    >>> How do we do performance tuning to reduce delay in queue manager?
                    >>>
                    >>>This delay means a large number of messages waiting behind the messages currently being delivered, subject to concurrency and rate delays.

                    How can we increase the delivery rate so that the b delay goes down?

                    >>>There is no reason to expect that TLS has anything to do with this.
                    >>>Why are you sending TLS settings?

                    Our security department requires us to turn on TLS. I provided the TLS settings only for reference. If they are not useful, please ignore them. Sorry!

                    >>> You need to make sure that the new server has a working local DNS cache, its IP reputation is good, and your steady-state input rate does not exceed the output rate of remote destinations.

                    How can we check that the new server has a working local DNS cache? By checking the file /etc/resolv.conf?
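
As a first step, /etc/resolv.conf can at least show whether the resolver points at the local machine. The sketch below parses resolv.conf-style text and checks whether the first nameserver is a loopback address. That only suggests a local caching resolver is intended; it does not prove one is running or answering quickly. The sample file contents and the function names are hypothetical.

```python
import ipaddress

def nameservers(resolv_conf_text):
    """Extract nameserver addresses from resolv.conf-style text, in order."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        # Drop comments introduced by '#' or ';'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers

def uses_local_resolver(resolv_conf_text):
    """True if the first listed nameserver is a loopback address."""
    servers = nameservers(resolv_conf_text)
    if not servers:
        return False
    try:
        return ipaddress.ip_address(servers[0]).is_loopback
    except ValueError:
        return False

# Hypothetical resolv.conf contents, for illustration only.
sample = """\
# example file
search example.com
nameserver 127.0.0.1
nameserver 192.0.2.53
"""
print(nameservers(sample))          # ['127.0.0.1', '192.0.2.53']
print(uses_local_resolver(sample))  # True
```

On a real host the text would come from reading /etc/resolv.conf. Timing repeated lookups of the relay's name would be a more direct test of whether answers are actually being cached.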

                    The server's IP reputation should be good; it has never been blocked.

                    During the peak period, 10:30:00 to 10:59:59, the other servers running Postfix-2.6.6 on RHEL 6.4 were fine. Only this server, running Postfix-2.6.6 on RHEL 5.10, experienced serious delays. Do we need to change some parameters to increase the delivery rate, or set up a special channel or allocate fixed SMTP processes for specific outbound domains?

                    Our outbound email has four main destination domains, all relayed through Windows FOPE.

                    Buckeyemail.osu.edu ---------------> mail.us.messaging.microsoft.com
                    Gmail.com ---------------------------> mail.us.messaging.microsoft.com
                    Yahoo.com ----------------------------> mail.us.messaging.microsoft.com
                    Hotmail.com ---------------------------> mail.us.messaging.microsoft.com

                    Thanks,

                    Carl

                  • Xie, Wei
                    Message 9 of 20 , Apr 28, 2014
                      >> - You're sending mail from a new IP address that has no reputation,
                      The server IP is not a new IP.

                      >> - or you're sending mail from an old IP address that has sent them spam in the past,
                      The reputation of this IP should be good. We have never been blocked or warned by Windows FOPE.

                      >>- or there is something about your DNS records that they don't like (the IP address is embedded in the name of the A/PTR record, or the PTR record name does not match the A record),

                      The DNS records are good.

                      >> - or Postfix is configured to use a myhostname that does not match the A/PTR record,

                      The following two settings exist on both the RHEL 6.4 and RHEL 5.10 servers. Do we really need to change myhostname to a fully-qualified domain name (i.e. cio-krc-pf03.osuad.osu.edu)?

                      myhostname = cio-krc-pf03
                      mydomain = osuad.osu.edu

                      >> - or you have not updated your SPF record to include the new machine,

                      The machine is already in SPF record for domain 'osu.edu'.

                      >> - or a list of other things.

                      We hope to find some other cause.

                      Thanks,

                      Carl


                    • Xie, Wei
                      Message 10 of 20 , Apr 28, 2014
                        Marius,

                        To isolate the problem, we changed the local iptables rules to redirect port 25 to Postfix port 10026 instead of DLP port 10025, so that outbound email bypasses DLP and goes directly to Postfix. Even so, judging by how quickly /var/log/maillog scrolls (tail -f /var/log/maillog), delivery on this RHEL 5.10 server is noticeably slower than on our other servers.

                        Thanks,

                        Carl

                        -----Original Message-----
                        From: Marius Gologan [mailto:marius.gologan@...]
                        Sent: Monday, April 28, 2014 2:06 PM
                        To: Xie, Wei
                        Cc: postfix-users@...
                        Subject: RE: RHEL 5.10 vs 6.4 performancs difference

                        Hi,

                        I experienced funny stories with Symantec, on Windows, resulting in a discount for another year for my employer and a pat on the back for me.

                        Some suggestions:
                        - try to identify the tmp folder where Symantec keeps the message while it is being filtered and see how long it is kept there.
                        - for testing purposes: try to set a time limit in Symantec (you probably don't have such an option). If you do, set a limit of 60 seconds and see if anything changes.
                        - You may find some improvements in disabling some features. Limit the scanning to executables, scripts, archives and documents, instead of scanning all files (pictures, video, audio).
                        - Activate/deactivate DLP and other modules to see if something changes.
                        - You may have depth level scanning too.

                        Regards,
                        Marius.

                        -----Original Message-----
                        From: owner-postfix-users@...
                        [mailto:owner-postfix-users@...] On Behalf Of Xie, Wei
                        Sent: Monday, April 28, 2014 8:01 PM
                        To: Wietse Venema
                        Cc: 'postfix-users@...'
                        Subject: RE: RHEL 5.10 vs 6.4 performancs difference

                        >>> RHEL 5.10 is old. Why are you evaluating it for production?

                        It is a funny story. We are running the Symantec DLP application 12.0.1 + Postfix 2.6.6 on RHEL 6.4 for outbound email, which is delivered to FOPE (a Windows antispam system, relay=mail.us.messaging.microsoft.com). Sometimes the Symantec DLP application repeatedly restarts its own processes for unknown reasons. We have tried every solution Symantec recommended, but the problem is still not fixed. Finally Symantec told us that their software is not supported on RHEL 6.x; they will support us only if we downgrade to RHEL 5.8. After testing that everything works on test/dev servers, we chose one production server to downgrade to RHEL 5.10 for a real production trial. But performance is not good: outbound email spends too long in the queue. We need to find the root cause. That's why I sent this email to ask for help.

                        >>> Are you sure you did not swap the 5.10 and 6.4 release names?

                        I am sure I did not swap the 5.10 and 6.4 release names. The Postfix-2.3.3 is bundled with RHEL 5.10 whereas Postfix-2.6.6 is bundled with RHEL 6.4.
                        When we turned on TLS on Postfix-2.3.3, we saw some errors in the log /var/log/maillog. After getting confirmation from Symantec that they support all releases of Postfix on RHEL 5.10, we upgraded Postfix-2.3.3 to Postfix-2.6.6, the same version as on our other RHEL 6.4 servers.

                        Thanks,

                        Carl



                        -----Original Message-----
                        From: Wietse Venema [mailto:wietse@...]
                        Sent: Monday, April 28, 2014 11:50 AM
                        To: Xie, Wei
                        Cc: 'postfix-users@...'
                        Subject: RHEL 5.10 vs 6.4 performancs difference

                        Xie, Wei:
                        > Need help!!!
                        >
                        > We are using Postfix-2.6.6 with TLS running on RHEL 5.10 for
                        > production trial.
                        >
                        > From the email delivery log, we see delay 967 seconds.
                        >
                        > Apr 28 11:03:18 cio-krc-pf03 postfix/smtp[5015]: 9934181190:
                        > to=<turek.16@...>,
                        > relay=mail.us.messaging.microsoft.com[216.32.181.178]:25, delay=967,
                        > delays=0/964/1.5/1.3, dsn=2.6.0, status=sent (250 2.6.0
                        > <27520027.166481398696426625.JavaMail.erequest.do.not.reply@...>
                        > [InternalId=9787221] Queued mail for delivery)

                        This needs 1.5 seconds for the TCP, SMTP and TLS handshake, and 1.3 seconds to deliver the message.

                        RHEL 5.10 is old. Why are you evaluating it for production?

                        > We have other servers running Postfix-2.6.6 with TLS on RHEL 6.4 and
                        > email delivery is very fast below. The parameters for TLS and other
                        > default parameters are same as above Postfix running on RHEL
                        > 5.10 server.
                        >
                        > Apr 28 10:58:11 cio-krc-pf07 postfix/smtp[16802]: 64DA2500066:
                        > to=<neff.194@...>,
                        > relay=mail.us.messaging.microsoft.com[216.32.180.22]:25, delay=0.72, delays=0/0/0.28/0.44, dsn=2.6.0, status=sent (250 2.6.0 <fafe453d676b3e3e740bc382f3bfcb5b@...>
                        > [InternalId=19618080] Queued mail for delivery).

                        This needs 0.28 seconds for the TCP, SMTP and TLS handshake, and it also uses less time to deliver mail. With less time needed to do the job, less mail piles up in the Postfix queue.

                        If the same Postfix version and configuration produce different results on different OS distributions, then it is a good bet that the OS distribution is causing the difference, and that this requires Viktor to look into what may be causing this.

                        Are you sure you did not swap the 5.10 and 6.4 release names?

                        Wietse
                      • Viktor Dukhovni
                        Message 11 of 20 , Apr 28, 2014
                          On Mon, Apr 28, 2014 at 06:09:43PM +0000, Xie, Wei wrote:

                          > When congestion occurred, all other messages to the same domain
                          > were similarly delayed. The delay are longer and longer (the
                          > longest exceeded 1200 seconds) and the length of active queue is
                          > longer and longer (get to know from the outputs from commands
                          > 'qshape active' and 'mailq |grep \* |wc -l, the queued messages
                          > were over 9,000).

                          Clearly the output rate is not keeping up with the input rate.

                          > >>> What is the complete set of logs for this queue-id?
                          >
                          > Apr 28 10:47:11 cio-krc-pf03 postfix/smtpd[31853]: 9934181190: client=cio-tnc-ht06.osuad.osu.edu[164.107.81.171]
                          > Apr 28 10:47:11 cio-krc-pf03 postfix/qmgr[31812]: 9934181190: from=<erequest.do.not.reply@...>, size=1905, nrcpt=1 (queue active)
                          > Apr 28 11:03:18 cio-krc-pf03 postfix/smtp[5015]: 9934181190: to=<turek.16@...>, relay=mail.us.messaging.microsoft.com[216.32.181.178]:25, delay=967, delays=0/964/1.5/1.3, dsn=2.6.0, status=sent (250 2.6.0 <27520027.166481398696426625.JavaMail.erequest.do.not.reply@...> [InternalId=9787221] Queued mail for delivery)
                          > Apr 28 11:03:18 cio-krc-pf03 postfix/qmgr[31812]: 9934181190: removed

                          Yes indeed nothing seems to happen for 964 seconds sitting in the queue.

                          > >>>What was the output rate of email to this domain in the 30
                          > >>> minutes preceding this log entry? (Avoid counting multiple
                          > >>> recipients with the same queue-id, relay and remote server
                          > >>> response as separate deliveries).
                          >
                          > Today 10:00:00 ~ 10:29:59 the output rate of email to this domain in the 30 minutes was 15,361.
                          > Today 10:30:00 ~ 10:59:59 the output rate of email to this domain in the 30 minutes was 28,827.
                          > Today 11:00:00 ~ 11:29:59 the output rate of email to this domain in the 30 minutes was 111,27.

                          Can you clarify that last one? Is that ~11 thousand? The comma
                          seems misplaced. The earlier rate appears to be ~100 messages per
                          minute, or just over 1 per second. You should measure the average
                          "c+d" in the log for these time frames, again counting multiple
                          recipients in a single delivery as one event.
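The measurement requested above can be scripted. The following is a minimal sketch, not a Postfix tool: the function name `mean_cd_by_relay` and the regular expressions are assumptions based on the log lines quoted in this thread. It averages c+d per relay for "status=sent" entries and counts recipients that share a queue-id, relay and server response as a single delivery.

```python
import re
from collections import defaultdict

# Field patterns for Postfix smtp delivery log lines, modelled on the
# entries quoted in this thread (hypothetical helper, not part of Postfix).
QID_RE = re.compile(r"postfix/smtp\[\d+\]: ([0-9A-F]+): ")
RELAY_RE = re.compile(r"relay=([^\[,\s]+)")
DELAYS_RE = re.compile(r"delays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)")
STATUS_RE = re.compile(r"status=sent \((.*)\)")


def mean_cd_by_relay(lines):
    """Return {relay: (deliveries, mean c+d in seconds)} for sent mail."""
    seen = set()
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        qid = QID_RE.search(line)
        relay = RELAY_RE.search(line)
        delays = DELAYS_RE.search(line)
        status = STATUS_RE.search(line)
        if not (qid and relay and delays and status):
            continue
        # Recipients delivered in one SMTP transaction share the queue-id,
        # relay and server response: count them as one delivery.
        key = (qid.group(1), relay.group(1), status.group(1))
        if key in seen:
            continue
        seen.add(key)
        c, d = float(delays.group(3)), float(delays.group(4))
        totals[relay.group(1)][0] += 1
        totals[relay.group(1)][1] += c + d
    return {name: (n, total / n) for name, (n, total) in totals.items()}
```

It could be called as `mean_cd_by_relay(open("/var/log/maillog"))` after restricting the log to the time frame of interest. For the two sample entries quoted in this thread (c+d of 2.8 and 0.72 seconds) it reports two deliveries with a mean of 1.76 seconds.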

                          Supposing the 2.8 second delivery latency is typical, a delivery
                          rate of 1-2 messages per second suggests a destination concurrency
                          limit of "2", rather than the default limit of 20.

                          You need to post "postconf -n" output or at least:

                          default_destination_concurrency_limit
                          smtp_destination_concurrency_limit

                          check which transport is used for this domain, and if not "smtp",
                          post the concurrency limit for that.

                          > >>> Have you configured any concurrency controls or rate delay for this destination?
                          >
                          > No. keep default unchanged. Which parameters for concurrency controls or rate delay need to be checked?

                          All <transport>_destination_concurrency_limit settings in main.cf

                          Any <transport>_destination_rate_delay settings in main.cf
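A quick way to run this check is to filter "postconf -n" output for the two parameter families named above. This is a minimal sketch; the helper name `destination_limits` is an assumption made for illustration.

```python
import re

# Matches any <transport>_destination_concurrency_limit or
# <transport>_destination_rate_delay setting, including default_*.
LIMIT_RE = re.compile(
    r"^(\w+_destination_(?:concurrency_limit|rate_delay))\s*=\s*(.*)$"
)


def destination_limits(postconf_text):
    """Return {parameter: value} for explicitly configured limits."""
    found = {}
    for line in postconf_text.splitlines():
        match = LIMIT_RE.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2).strip()
    return found
```

The input can be captured with `subprocess.run(["postconf", "-n"], capture_output=True, text=True).stdout`. An empty result means none of these parameters is set in main.cf, so the built-in defaults apply.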

                          > >>>This delay means a large number of messages waiting behind the messages currently being delivered, subject to concurrency and rate delays.
                          >
                          > How can we increase delivery rate so that b-delay is down?

                          Either increase concurrency or reduce latency. Network captures
                          may show which protocol stage is responsible for most of the delay,
                          even with TLS one can tell whether the delay is at the beginning
                          or at the end of the TLS session or just low bandwidth throughout.

                          > How can we check the new server has a working local DNS cache?
                          > Check the file /etc/resolv.conf?

                          Yes, but also time MX, A and AAAA lookups for the destination relay.
                          How is the relay specified, with or without surrounding "[]"?

                          > In peak hours 10:30:00~ 10:59:59, other servers running Postfix-2.6.6
                          > on RHEL 6.4 were fine.

                          That's meaningless, what was their output rate? What was their
                          input rate? What was the typical "c+d" latency? If you want help
                          with performance problems you need to start gathering and crunching
                          data, being lazy and avoiding hard numbers is not an option.

                          > Only this server running Postfix-2.6.6 on RHEL 5.10 experienced serious
                          > delay.

                          Delay happens when the input rate exceeds the output rate.

                          > Do we need change some parameters to increase delivery rate or
                          > set special channel/allocate fixed SMTP processes for specified
                          > outbound domains?

                          Random parameter twiddling rarely solves congestion, but it can
                          cause it. Before changing anything the reason for the congestion
                          needs to be identified. The output rate looks anaemic to me, why
                          is the output concurrency so low?

                          > Our outbound emails have four main destination domains to be
                          > relayed to Windows FOPE.
                          >
                          > Buckeyemail.osu.edu ---------------> mail.us.messaging.microsoft.com
                          > Gmail.com ---------------------------> mail.us.messaging.microsoft.com
                          > Yahoo.com ----------------------------> mail.us.messaging.microsoft.com
                          > Hotmail.com ---------------------------> mail.us.messaging.microsoft.com

                          Did you measure the output rate for all mail destined to this relay,
                          or just the first domain? The correct measurement is to aggregate
                          counts by transport next-hop. Please report output rates for all
                          these combined, or rather all mail with a relay of
                          "mail.us.messaging.microsoft.com".
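The aggregation described above can be sketched as follows. This is an illustrative script under stated assumptions: the function name `deliveries_per_window` is hypothetical, the regular expressions are modelled on the log lines quoted in this thread, and the date is ignored, so the input should cover a single day.

```python
import re
from collections import Counter

# Syslog prefix, hour and minute, then the smtp client's queue-id.
HEAD_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}):(\d{2}):\d{2}\s+\S+\s+"
    r"postfix/smtp\[\d+\]:\s+([0-9A-F]+):\s"
)
RELAY_RE = re.compile(r"relay=([^\[,\s]+)")
STATUS_RE = re.compile(r"status=sent \((.*)\)")


def deliveries_per_window(lines, minutes=30):
    """Count sent deliveries per (relay, window start), one per transaction."""
    seen = set()
    counts = Counter()
    for line in lines:
        head = HEAD_RE.match(line)
        relay = RELAY_RE.search(line)
        status = STATUS_RE.search(line)
        if not (head and relay and status):
            continue
        # Same queue-id, relay and response means one delivery that
        # carried several recipients.
        key = (head.group(3), relay.group(1), status.group(1))
        if key in seen:
            continue
        seen.add(key)
        minute_of_day = int(head.group(1)) * 60 + int(head.group(2))
        start = minute_of_day // minutes * minutes
        window = f"{start // 60:02d}:{start % 60:02d}"
        counts[(relay.group(1), window)] += 1
    return counts
```

Because the key is the relay name rather than the recipient domain, mail for all four destination domains lands in the same bucket when it goes to the same next-hop.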

                          --
                          Viktor.
                        • Xie, Wei
                          Message 12 of 20 , Apr 28, 2014
                            >>Or local shot in foot via rate delays or overly low concurrency ceilings, be these perhaps motivated by strict receiving system rate ceilings. If the receiving system makes it difficult enough to send mail, get the problem fixed on their end, I am under the impression you're paying them money to be your service provider.

                            Symantec's support service responds much more slowly than the experts here, including you.

                            Anyways, thanks a lot!

                            Carl

                            -----Original Message-----
                            From: owner-postfix-users@... [mailto:owner-postfix-users@...] On Behalf Of Viktor Dukhovni
                            Sent: Monday, April 28, 2014 1:32 PM
                            To: postfix-users@...
                            Subject: Re: RHEL 5.10 vs 6.4 performancs difference

                            On Mon, Apr 28, 2014 at 01:23:41PM -0400, Wietse Venema wrote:

                            > Xie, Wei:
                            > > [Symantec support issue]
                            >
                            > As Viktor noted, it is possible that they slow you down because
                            >
                            > - You're sending mail from a new IP address that has no reputation,
                            >
                            > - or you're sending mail from an old IP address that has sent them
                            > spam in the past,
                            >
                            > - or there is something about your DNS records that they don't like
                            > (the IP address is embedded in the name of the A/PTR record, or the
                            > PTR record name does not match the A record),
                            >
                            > - or Postfix is configured to use a myhostname that does not match the
                            > A/PTR record,
                            >
                            > - or you have not updated your SPF record to include the new machine,
                            >
                            > - or a list of other things.

                            Or local shot in foot via rate delays or overly low concurrency ceilings, be these perhaps motivated by strict receiving system rate ceilings. If the receiving system makes it difficult enough to send mail, get the problem fixed on their end, I am under the impression you're paying them money to be your service provider.

                            --
                            Viktor.
                          • Xie, Wei
                            Message 13 of 20 , Apr 28, 2014
                              Viktor,

                              > Today 11:00:00 ~ 11:29:59 the output rate of email to this domain in the 30 minutes was 111,27.

                              It is a typo. It should be 11,127.

                              I will carefully read the rest of your email and digest it.

                              Thanks,

                              Carl

                              -----Original Message-----
                              From: owner-postfix-users@... [mailto:owner-postfix-users@...] On Behalf Of Viktor Dukhovni
                              Sent: Monday, April 28, 2014 2:42 PM
                              To: postfix-users@...
                              Subject: Re: Backlog to outsourced email provider

                              On Mon, Apr 28, 2014 at 06:09:43PM +0000, Xie, Wei wrote:

                              > When congestion occurred, all other messages to the same domain were
                              > similarly delayed. The delay are longer and longer (the longest
                              > exceeded 1200 seconds) and the length of active queue is longer and
                              > longer (get to know from the outputs from commands 'qshape active' and
                              > 'mailq |grep \* |wc -l, the queued messages were over 9,000).

                              Clearly the output rate is not keeping up with the input rate.

                              > >>> What is the complete set of logs for this queue-id?
                              >
                              > Apr 28 10:47:11 cio-krc-pf03 postfix/smtpd[31853]: 9934181190: client=cio-tnc-ht06.osuad.osu.edu[164.107.81.171]
                              > Apr 28 10:47:11 cio-krc-pf03 postfix/qmgr[31812]: 9934181190: from=<erequest.do.not.reply@...>, size=1905, nrcpt=1 (queue active)
                              > Apr 28 11:03:18 cio-krc-pf03 postfix/smtp[5015]: 9934181190: to=<turek.16@...>, relay=mail.us.messaging.microsoft.com[216.32.181.178]:25, delay=967, delays=0/964/1.5/1.3, dsn=2.6.0, status=sent (250 2.6.0 <27520027.166481398696426625.JavaMail.erequest.do.not.reply@...> [InternalId=9787221] Queued mail for delivery)
                              > Apr 28 11:03:18 cio-krc-pf03 postfix/qmgr[31812]: 9934181190: removed

                              Yes indeed nothing seems to happen for 964 seconds sitting in the queue.

                              > >>>What was the output rate of email to this domain in the 30 minutes
                              > >>>preceding this log entry? (Avoid counting multiple recipients
                              > >>>with the same queue-id, relay and remote server response as
                              > >>>separate deliveries).
                              >
                              > Today 10:00:00 ~ 10:29:59 the output rate of email to this domain in the 30 minutes was 15,361.
                              > Today 10:30:00 ~ 10:59:59 the output rate of email to this domain in the 30 minutes was 28,827.
                              > Today 11:00:00 ~ 11:29:59 the output rate of email to this domain in the 30 minutes was 111,27.

                              Can you clarify that last one? Is that ~11 thousand? The comma seems misplaced. The earlier rate appears to be ~100 messages per minute, or just over 1 per second. You should measure the average "c+d" in the log for these time frames, again counting multiple recipients in a single delivery as one event.

                              Supposing the 2.8 second delivery latency is typical, a delivery rate of 1-2 messages per second suggests a destination concurrency limit of "2", rather than the default limit of 20.

                              You need to post "postconf -n" output or at least:

                              default_destination_concurrency_limit
                              smtp_destination_concurrency_limit

                              check which transport is used for this domain, and if not "smtp", post the concurrency limit for that.

                              > >>> Have you configured any concurrency controls or rate delay for this destination?
                              >
                              > No. keep default unchanged. Which parameters for concurrency controls or rate delay need to be checked?

                              All <transport>_destination_concurrency_limit settings in main.cf

                              Any <transport>_destination_rate_delay settings in main.cf

                              > >>>This delay means a large number of messages waiting behind the messages currently being delivered, subject to concurrency and rate delays.
                              >
                              > How can we increase delivery rate so that b-delay is down?

                              Either increase concurrency or reduce latency. Network captures may show which protocol stage is responsible for most of the delay, even with TLS one can tell whether the delay is at the beginning or at the end of the TLS session or just low bandwidth throughout.

                              > How can we check the new server has a working local DNS cache?
                              > Check the file /etc/resolv.conf?

                              Yes, but also time MX, A and AAAA lookups for the destination relay.
                              How is the relay specified, with or without surrounding "[]"?

                              > In peak hours 10:30:00~ 10:59:59, other servers running Postfix-2.6.6
                              > on RHEL 6.4 were fine.

                              That's meaningless, what was their output rate? What was their input rate? What was the typical "c+d" latency? If you want help with performance problems you need to start gathering and crunching data, being lazy and avoiding hard numbers is not an option.

                              > Only this server running Postfix-2.6.6 on RHEL 5.10 experienced
                              > serious delay.

                              Delay happens when the input rate exceeds the output rate.

                              > Do we need change some parameters to increase delivery rate or set
                              > special channel/allocate fixed SMTP processes for specified outbound
                              > domains?

                              Random parameter twiddling rarely solves congestion, but it can cause it. Before changing anything the reason for the congestion needs to be identified. The output rate looks anaemic to me, why is the output concurrency so low?

                              > Our outbound emails have four main destination domains to be relayed
                              > to Windows FOPE.
                              >
                              > Buckeyemail.osu.edu ---------------> mail.us.messaging.microsoft.com
                              > Gmail.com ---------------------------> mail.us.messaging.microsoft.com
                              > Yahoo.com ----------------------------> mail.us.messaging.microsoft.com
                              > Hotmail.com ---------------------------> mail.us.messaging.microsoft.com

                              Did you measure the output rate for all mail destined to this relay, or just the first domain? The correct measurement is to aggregate counts by transport next-hop. Please report output rates for all these combined, or rather all mail with a relay of "mail.us.messaging.microsoft.com".

                              --
                              Viktor.
                            • Viktor Dukhovni
                              Message 14 of 20 , Apr 28, 2014
                                Note, my "quick" mental arithmetic was wrong, 28,000 in 30 minutes,
                                is ~15 per second not ~1.5 per second, which at a latency of 2.8
                                seconds, is closer to a respectable concurrency of ~40.
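The corrected arithmetic is an application of Little's law: the average number of deliveries in flight equals the delivery rate multiplied by the time each delivery takes. The sketch below reworks it with the 28,827 count reported for the 10:30 window and the 2.8 second c+d from the single log entry quoted in this thread. The function names are illustrative, and one log entry is not a measured average, so treat the result as a rough estimate.

```python
def delivery_rate(messages, window_seconds):
    """Average deliveries per second over the measurement window."""
    return messages / window_seconds


def implied_concurrency(rate_per_second, latency_seconds):
    """Little's law: deliveries in flight = rate x time per delivery."""
    return rate_per_second * latency_seconds


# Figures from this thread: 28,827 deliveries in 30 minutes, c+d = 2.8 s.
rate = delivery_rate(28_827, 30 * 60)
concurrency = implied_concurrency(rate, 2.8)
print(f"{rate:.1f} msg/s, ~{concurrency:.0f} concurrent deliveries")
```

With these inputs the rate comes out at about 16 messages per second and the implied concurrency at about 45, in line with the "~15 per second" and "~40" estimates above, which were rounded from 28,000.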

                                If this output rate is not high enough, you need to work with the
                                remote vendor to reduce latency, or raise your concurrency further
                                still.

                                Post relevant entries from your transport table, and all concurrency
                                parameters (or just postconf -n).

                                --
                                Viktor.
                              • Xie, Wei
                                Message 15 of 20 , Apr 28, 2014
                                  Viktor,

                                  >>You need to post "postconf -n" output or at least:
                                  >>
                                  >> default_destination_concurrency_limit
                                  >> smtp_destination_concurrency_limit
                                  >>
                                  >>check which transport is used for this domain, and if not "smtp", post the concurrency limit for that.

                                  Here is the output of 'postconf -n':

                                  alias_database = dbm:/etc/aliases
                                  alias_maps = hash:/etc/aliases
                                  command_directory = /usr/sbin
                                  config_directory = /etc/postfix
                                  daemon_directory = /usr/libexec/postfix
                                  data_directory = /var/lib/postfix
                                  debug_peer_level = 2
                                  header_checks = regexp:/etc/postfix/header_checks
                                  html_directory = no
                                  inet_interfaces = $myhostname, localhost
                                  inet_protocols = ipv4
                                  mail_owner = postfix
                                  mailbox_size_limit = 53000000
                                  mailq_path = /usr/bin/mailq.postfix
                                  manpage_directory = /usr/share/man
                                  message_size_limit = 52428800
                                  mydestination = $myhostname, localhost.$mydomain, localhost
                                  mydomain = osuad.osu.edu
                                  myhostname = cio-krc-pf03.osuad.osu.edu
                                  newaliases_path = /usr/bin/newaliases.postfix
                                  queue_directory = /var/spool/postfix
                                  readme_directory = /usr/share/doc/postfix-2.6.14-documentation/readme
                                  relayhost = mail.us.messaging.microsoft.com
                                  sample_directory = /usr/share/doc/postfix-2.6.14-documentation/samples
                                  sendmail_path = /usr/sbin/sendmail.postfix
                                  setgid_group = postdrop
                                  smtp_tls_CAfile = /etc/postfix/service_certs/osu_ues/DigiCertCA.crt
                                  smtp_tls_loglevel = 2
                                  smtp_tls_note_starttls_offer = yes
                                  smtp_tls_security_level = encrypt
                                  smtp_tls_session_cache_database = btree:/var/lib/postfix/smtp_scache
                                  smtpd_tls_CAfile = /etc/postfix/service_certs/osu_ues/DigiCertCA.crt
                                  smtpd_tls_CApath = /etc/postfix/service_certs/osu_ues
                                  smtpd_tls_cert_file = /etc/postfix/service_certs/osu_ues/OSU_UES_WC_Cert_Ex_certificate.pem
                                  smtpd_tls_key_file = /etc/postfix/service_certs/osu_ues/OSU_UES_WC_Cert_Ex_key.pem
                                  smtpd_tls_loglevel = 2
                                  smtpd_tls_received_header = yes
                                  smtpd_tls_security_level = may
                                  smtpd_tls_session_cache_database = btree:/var/lib/postfix/smtpd_scache
                                  unknown_local_recipient_reject_code = 550

                                  Here are the settings for the following two parameters:

                                  default_destination_concurrency_limit = 20
                                  smtp_destination_concurrency_limit = $default_destination_concurrency_limit

                                  No transport is used so far.

                                  > >>> Have you configured any concurrency controls or rate delay for this destination?
                                  >
                                  > No. keep default unchanged. Which parameters for concurrency controls or rate delay need to be checked?
                                  >
                                  > All <transport>_destination_concurrency_limit settings in main.cf
                                  >
                                  > Any <transport>_destination_rate_delay settings in main.cf

                                  There is no parameter defined for transport in main.cf.

                                  >>Either increase concurrency or reduce latency. Network captures may show which protocol stage is responsible for most of the delay, even with TLS one can tell whether the delay is at the beginning or at the end of the TLS session or just low bandwidth throughout.

                                  We prefer to increase concurrency.


                                  >>> How can we check the new server has a working local DNS cache?
                                  >>> Check the file /etc/resolv.conf?
                                  >>
                                  >>Yes, but also time MX, A and AAAA lookups for the destination relay.

                                  # time dig MX mail.us.messaging.microsoft.com
                                  real 0m0.014s
                                  user 0m0.002s
                                  sys 0m0.003s

                                  # time dig A mail.us.messaging.microsoft.com
                                  real 0m0.016s
                                  user 0m0.004s
                                  sys 0m0.004s

                                  # time dig AAAA mail.us.messaging.microsoft.com
                                  real 0m0.058s
                                  user 0m0.001s
                                  sys 0m0.004s

                                  >>How is the relay specified with or without surrounding "[]"?

                                  Without surrounding "[]".

                                  relayhost = mail.us.messaging.microsoft.com

                                  >> In peak hours 10:30:00~ 10:59:59, other servers running Postfix-2.6.6
                                  >> on RHEL 6.4 were fine.
                                  >
                                  >That's meaningless, what was their output rate? What was their input rate? What was the typical "c+d" latency. If you want help with performance problems you need to start gathering and crunching data, being lazy and avoiding hard numbers is not an option.

                                  You are totally correct.

                                  On this RHEL 5.10 server, the output rate of email to this domain in the 30 minutes from 10:30:00 to 10:59:59 today was 10,928.

                                  On the other 6 RHEL 6.4 servers, the output rate of email to this domain in the same 30 minutes ranged from 4,824 to 6,564.

                                  >> Our outbound emails have four main destination domains to be relayed
                                  >> to Windows FOPE.
                                  >>
                                  >> Buckeyemail.osu.edu ---------------> mail.us.messaging.microsoft.com
                                  >> Gmail.com ---------------------------> mail.us.messaging.microsoft.com
                                  >> Yahoo.com ----------------------------> mail.us.messaging.microsoft.com
                                  >>Hotmail.com ---------------------------> mail.us.messaging.microsoft.com

                                  >Did you measure the output rate for all mail destined to this relay, or just the first domain? The correct measurement is to aggregate counts by transport next-hop. Please report output rates for all these combined, or rather all mail with a relay of "mail.us.messaging.microsoft.com".

                                  We measured the output rate for all mail destined to this relay. I used your criteria to double-check and got the following output rates. The data I gave you in my previous email was not accurate.

                                  Today 10:00:00 ~ 10:29:59 the output rate of email to this relay in the 30 minutes was 9,623.

                                  Today 10:30:00 ~ 10:59:59 the output rate of email to this relay in the 30 minutes was 10,928.
                                  the output rate of email to domain "buckeyemail.osu.edu" via this relay was 6,399
                                  the output rate of email to domain "gmail.com" via this relay was 2,803
                                  the output rate of email to domain "yahoo.com" via this relay was 619
                                  the output rate of email to domain "Hotmail.com" via this relay was 336
                                  the output rate of email to other domains via this relay was 771

                                  Today 11:00:00 ~ 11:29:59 the output rate of email to this relay in the 30 minutes was 15,597.
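One way to produce per-relay counts like these is sketched below in Python. This is an illustration only, not the procedure actually used in this thread: the function name `count_deliveries` and the regular expression are assumptions modelled on the delivery log line quoted in this thread, and the 30-minute bucket labels are an arbitrary choice. Recipients that share the same queue-id, relay and remote server response are counted as a single delivery, as requested.

```python
import re
from collections import Counter

# Assumed pattern for a Postfix smtp delivery line, based on the log line
# quoted in this thread; other syslog formats may need adjustments.
LINE_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d+)\s+(?P<hh>\d\d):(?P<mm>\d\d):\d\d\s+\S+\s+"
    r"postfix/smtp\[\d+\]:\s+(?P<qid>[0-9A-F]+):\s+to=<[^>]*>,.*?"
    r"relay=(?P<relay>[^\[,]+)\[[^\]]*\](?::\d+)?,.*?"
    r"status=sent\s+\((?P<resp>.*)\)\s*$"
)


def count_deliveries(lines, relay_name):
    """Count successful deliveries to relay_name per 30-minute window.

    Lines with the same (queue-id, relay, response) are one delivery.
    """
    seen = set()
    buckets = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m["relay"] != relay_name:
            continue
        key = (m["qid"], m["relay"], m["resp"])
        if key in seen:
            continue  # another recipient of an already-counted delivery
        seen.add(key)
        half = "00" if int(m["mm"]) < 30 else "30"
        label = "{} {} {}:{}".format(m["mon"], m["day"], m["hh"], half)
        buckets[label] += 1
    return buckets
```

It could be fed the mail log directly, for example `count_deliveries(open("/var/log/maillog"), "mail.us.messaging.microsoft.com")`.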

                                  Thanks,

                                  Carl

                                  -----Original Message-----
                                  From: owner-postfix-users@... [mailto:owner-postfix-users@...] On Behalf Of Viktor Dukhovni
                                  Sent: Monday, April 28, 2014 2:42 PM
                                  To: postfix-users@...
                                  Subject: Re: Backlog to outsourced email provider

                                  On Mon, Apr 28, 2014 at 06:09:43PM +0000, Xie, Wei wrote:

                                  > When congestion occurred, all other messages to the same domain were
                                  > similarly delayed. The delays grew longer and longer (the longest
                                  > exceeded 1200 seconds) and the active queue grew longer and
                                  > longer (as seen in the output of 'qshape active' and
                                  > 'mailq | grep \* | wc -l'; the queued messages were over 9,000).

                                  Clearly the output rate is not keeping up with the input rate.

                                  > >>> What is the complete set of logs for this queue-id?
                                  >
                                  > Apr 28 10:47:11 cio-krc-pf03 postfix/smtpd[31853]: 9934181190:
                                  > client=cio-tnc-ht06.osuad.osu.edu[164.107.81.171]
                                  > Apr 28 10:47:11 cio-krc-pf03 postfix/qmgr[31812]: 9934181190:
                                  > from=<erequest.do.not.reply@...>, size=1905, nrcpt=1 (queue active)
                                  > Apr 28 11:03:18 cio-krc-pf03 postfix/smtp[5015]: 9934181190:
                                  > to=<turek.16@...>,
                                  > relay=mail.us.messaging.microsoft.com[216.32.181.178]:25, delay=967,
                                  > delays=0/964/1.5/1.3, dsn=2.6.0, status=sent (250 2.6.0
                                  > <27520027.166481398696426625.JavaMail.erequest.do.not.reply@...>
                                  > [InternalId=9787221] Queued mail for delivery)
                                  > Apr 28 11:03:18 cio-krc-pf03 postfix/qmgr[31812]: 9934181190: removed

                                  Yes indeed nothing seems to happen for 964 seconds sitting in the queue.

                                  > >>>What was the output rate of email to this domain in the 30 minutes
                                  > >>>preceding this log entry? (Avoid counting multiple recipients
                                  > >>>with the same queue-id, relay and remote server response as
                                  > >>>separate deliveries).
                                  >
                                  > Today 10:00:00 ~ 10:29:59 the output rate of email to this domain in the 30 minutes was 15,361.
                                  > Today 10:30:00 ~ 10:59:59 the output rate of email to this domain in the 30 minutes was 28,827.
                                  > Today 11:00:00 ~ 11:29:59 the output rate of email to this domain in the 30 minutes was 111,27.

                                  Can you clarify that last one, is that ~11 thousand, the comma seems misplaced. The earlier rate appears to be ~100 messages per minute, or just over 1 per second. You should measure the average "c+d" in the log for these time frames, again counting multiple recipients in a single delivery as one event.
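A minimal sketch of such a "c+d" measurement in Python follows. It is not the author's tooling: the function name `mean_c_plus_d` and the regular expression are assumptions based on the `delays=a/b/c/d` log line quoted in this thread. Log lines that share a queue-id, relay and server response (multiple recipients in one delivery) contribute only once.

```python
import re

# Assumed pattern for the delays=a/b/c/d field of a Postfix smtp delivery
# line, modelled on the log line quoted in this thread.
DELAYS_RE = re.compile(
    r"postfix/smtp\[\d+\]:\s+(?P<qid>[0-9A-F]+):.*?"
    r"relay=(?P<relay>[^,]+),.*?"
    r"delays=(?P<a>[\d.]+)/(?P<b>[\d.]+)/(?P<c>[\d.]+)/(?P<d>[\d.]+),.*?"
    r"status=sent\s+\((?P<resp>.*)\)"
)


def mean_c_plus_d(lines):
    """Average connection setup (c) plus transmission (d) time in seconds.

    Returns None when no delivery line matches.
    """
    seen = set()
    total = 0.0
    for line in lines:
        m = DELAYS_RE.search(line)
        if not m:
            continue
        key = (m["qid"], m["relay"], m["resp"])
        if key in seen:
            continue  # same delivery, additional recipient
        seen.add(key)
        total += float(m["c"]) + float(m["d"])
    return total / len(seen) if seen else None
```

Restricting the input to the lines of one time window (for example with `grep` on the timestamp) gives the per-window average asked for above.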

                                  Supposing the 2.8 second delivery latency to be typical, a delivery rate of 1-2 messages per second suggests a destination concurrency limit of "2", rather than the default limit of 20.

                                  You need to post "postconf -n" output or at least:

                                  default_destination_concurrency_limit
                                  smtp_destination_concurrency_limit

                                  check which transport is used for this domain, and if not "smtp", post the concurrency limit for that.

                                  > >>> Have you configured any concurrency controls or rate delay for this destination?
                                  >
                                  > No. keep default unchanged. Which parameters for concurrency controls or rate delay need to be checked?

                                  All <transport>_destination_concurrency_limit settings in main.cf

                                  Any <transport>_destination_rate_delay settings in main.cf

                                  > >>>This delay means a large number of messages waiting behind the messages currently being delivered, subject to concurrency and rate delays.
                                  >
                                  > How can we increase delivery rate so that b-delay is down?

                                  Either increase concurrency or reduce latency. Network captures may show which protocol stage is responsible for most of the delay, even with TLS one can tell whether the delay is at the beginning or at the end of the TLS session or just low bandwidth throughout.

                                  > How can we check the new server has a working local DNS cache?
                                  > Check the file /etc/resolv.conf?

                                  Yes, but also time MX, A and AAAA lookups for the destination relay.
                                  How is the relay specified with or without surrounding "[]"?

                                  > In peak hours 10:30:00~ 10:59:59, other servers running Postfix-2.6.6
                                  > on RHEL 6.4 were fine.

                                  That's meaningless, what was their output rate? What was their input rate? What was the typical "c+d" latency. If you want help with performance problems you need to start gathering and crunching data, being lazy and avoiding hard numbers is not an option.

                                  > Only this server running Postfix-2.6.6 on RHEL 5.10 experienced
                                  > serious delay.

                                  Delay happens when the input rate exceeds the output rate.

                                  > Do we need change some parameters to increase delivery rate or set
                                  > special channel/allocate fixed SMTP processes for specified outbound
                                  > domains?

                                  Random parameter twiddling rarely solves congestion, but it can cause it. Before changing anything the reason for the congestion needs to be identified. The output rate looks anaemic to me, why is the output concurrency so low?

                                  > Our outbound emails have four main destination domains to be relayed
                                  > to Windows FOPE.
                                  >
                                  > Buckeyemail.osu.edu ---------------> mail.us.messaging.microsoft.com
                                  > Gmail.com ---------------------------> mail.us.messaging.microsoft.com
                                  > Yahoo.com ----------------------------> mail.us.messaging.microsoft.com
                                  > Hotmail.com ---------------------------> mail.us.messaging.microsoft.com

                                  Did you measure the output rate for all mail destined to this relay, or just the first domain? The correct measurement is to aggregate counts by transport next-hop. Please report output rates for all these combined, or rather all mail with a relay of "mail.us.messaging.microsoft.com".

                                  --
                                  Viktor.
                                • Viktor Dukhovni
                                  Message 16 of 20 , Apr 28, 2014
                                    On Mon, Apr 28, 2014 at 11:05:56PM +0000, Xie, Wei wrote:

                                    > header_checks = regexp:/etc/postfix/header_checks
                                    > relayhost = mail.us.messaging.microsoft.com

                                    This is effectively a miniature transport entry:

                                    relay_transport = relay:mail.us.messaging.microsoft.com
                                    default_transport = relay:mail.us.messaging.microsoft.com

                                    Don't know whether the vendor intends for you to do MX lookups here
                                    or not (you're doing MX lookups). The MX record just returns the
                                    original hostname.

                                    $ dig +noall +ans -t mx mail.us.messaging.microsoft.com
                                    mail.us.messaging.microsoft.com. IN MX 10 mail.us.messaging.microsoft.com.

                                    $ dig +noall +ans -t a mail.us.messaging.microsoft.com
                                    mail.us.messaging.microsoft.com. IN A 216.32.181.178
                                    mail.us.messaging.microsoft.com. IN A 216.32.180.22

                                    > smtp_tls_CAfile = /etc/postfix/service_certs/osu_ues/DigiCertCA.crt
                                    > smtp_tls_loglevel = 2
                                    > smtpd_tls_loglevel = 2

                                    You're killing your syslog daemon with debug logging. Why is the
                                    TLS loglevel set to 2? Have you looked at your logs? They are
                                    full of debugging noise and likely severely limit performance.
                                    For normal operation set the log level to 1. Also make sure your
                                    syslogd is not doing synchronous logging of each log entry.

                                    > smtp_tls_note_starttls_offer = yes

                                    Futile, given:

                                    > smtp_tls_security_level = encrypt

                                    > Here are the settings for the following two parameters:
                                    >
                                    > default_destination_concurrency_limit = 20

                                    Fix your logging, then measure again. A concurrency of 20 may be
                                    sufficient when the log level is sane.

                                    > smtp_destination_concurrency_limit = $default_destination_concurrency_limit

                                    This is redundant.

                                    > >>Either increase concurrency or reduce latency. Network captures may show which protocol stage is responsible for most of the delay, even with TLS one can tell whether the delay is at the beginning or at the end of the TLS session or just low bandwidth throughout.
                                    >
                                    > We prefer to increase concurrency.

                                    The vendor might limit your concurrency, don't do that quite yet.

                                    > >>How is the relay specified with or without surrounding "[]"?
                                    >
                                    > Without surrounding "[]".
                                    >
                                    > relayhost = mail.us.messaging.microsoft.com

                                    Ask the vendor whether they want you to use MX indirection or not.

                                    > On this RHEL 5.10 server, today 10:30:00 ~ 10:59:59 the output rate of
                                    > email to this domain in the 30 minutes was 10,928.
                                    >
                                    > On other 6 RHEL 6.4 servers, today 10:30:00 ~ 10:59:59 the output rate of
                                    > email to this domain in the 30 minutes were 4,824 ~ 6,564.

                                    You're comparing apples and oranges, the RHEL 6 hosts don't receive
                                    nearly enough traffic to be congested, they would perhaps be equally
                                    congested under the same load. However, they may have sensibly
                                    configured logging with TLS loglevel 1, and/or no synchronous log
                                    writes.

                                    > Today 10:00:00 ~ 10:29:59 the output rate of email to this relay
                                    > in the 30 minutes was 9,623.
                                    >
                                    > Today 10:30:00 ~ 10:59:59 the output rate of email to this relay
                                    > in the 30 minutes was 10,928.

                                    That's more like it: Throughput * Latency = Concurrency

                                    10928 / 1800 * 2.8 = 17.0

                                    So with latencies around 2.8 seconds your estimated concurrency is
                                    ~17 which is close enough to 20. The problem is either that your
                                    syslogd is overwhelmed and too slow or the vendor service is too slow.
                                    Fix the first problem first.
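The relationship above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, the 2.8 second latency is the single "c+d" sample from the log line in this thread (assumed typical, not measured), and 20 is the default concurrency limit reported earlier.

```python
def implied_concurrency(deliveries, window_seconds, latency_seconds):
    """Average deliveries in flight: throughput * latency."""
    return deliveries / window_seconds * latency_seconds


def max_deliveries(concurrency_limit, latency_seconds, window_seconds):
    """Most deliveries a window can hold: concurrency / latency * window."""
    return concurrency_limit / latency_seconds * window_seconds


# Delivery counts per 30-minute (1800 s) window as reported in this thread.
for count in (9623, 10928):
    print(count, round(implied_concurrency(count, 1800, 2.8), 1))

# Ceiling for one window with the default limit of 20, assuming 2.8 s latency.
print(round(max_deliveries(20, 2.8, 1800)))
```

If the measured average latency differs from 2.8 seconds, both results change in proportion, so it pays to measure "c+d" over the whole window first.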


                                    > Today 11:00:00 ~ 11:29:59 the output rate of email to this relay
                                    > in the 30 minutes was 15,597.

                                    15597 / 1800 * 2.8 = 24.3

                                    So the latency number from that one message is likely a bit above
                                    average. Understand and memorize this simple formula:

                                    Throughput = Concurrency / Latency

                                    Fix your logging settings in main.cf and make sure that you follow
                                    the advice at the bottom of:

                                    http://www.postfix.org/LINUX_README.html

                                    Syslogd performance

                                    LINUX syslogd uses synchronous writes by default. Because of
                                    this, syslogd can actually use more system resources than
                                    Postfix. To avoid such badness, disable synchronous mail logfile
                                    writes by editing /etc/syslog.conf and by prepending a "-" to
                                    the logfile name:

                                    /etc/syslog.conf:
                                    mail.* -/var/log/mail.log

                                    Send a "kill -HUP" to the syslogd to make the change effective.

                                    --
                                    Viktor.
                                  • Xie, Wei
                                    Message 17 of 20 , Apr 28, 2014
                                      >>>You're comparing apples and oranges, the RHEL 6 hosts don't receive nearly enough traffic to be congested, they would perhaps be equally congested under the same load. However, they may have sensibly configured logging with TLS loglevel 1, and/or no synchronous log writes

                                      On our Symantec DLP servers running on RHEL 6.4 and RHEL 5.10, TLS loglevel is set to 2 in /etc/postfix/main.cf and syslog for mail activities is set to non-synchronous log writes in /etc/rsyslog.conf.

                                      smtp_tls_loglevel = 2
                                      smtpd_tls_loglevel = 2

                                      # Log all the mail messages in one place.
                                      mail.* -/var/log/maillog

                                      I do not know why the RHEL 5.10 server got much more traffic than the other RHEL 6.4 servers during this morning's peak hours.

                                      >>> Fix your logging, then measure again. A concurrency of 20 may be sufficient when the log level is sane.

                                      On this RHEL 5.10 server, I have changed the TLS loglevel from 2 to 1 in /etc/postfix/main.cf. We will keep watching the logs during peak hours this week.

                                      smtp_tls_loglevel = 1
                                      smtpd_tls_loglevel = 1

                                      >>> Ask the vendor whether they want you to use MX indirection or not.

                                      Last time, the Symantec vendor told us it was fine to do MX lookups, and the Windows FOPE vendor recommended that we do MX lookups.

                                      Anyway, thank you so much for the great help! I am really learning a lot from you!!!

                                      Good night,

                                      Carl

                                    • Xie, Wei
                                      Message 18 of 20 , Apr 30, 2014
                                        Viktor,

                                        >>>Fix your logging, then measure again. A concurrency of 20 may be sufficient when the log level is sane.

                                        Monday night I fixed my logging as below.

                                        > smtp_tls_loglevel = 1
                                        > smtpd_tls_loglevel = 1

                                        Here is the throughput per 30 minutes from 08:00 through 18:59 yesterday and today:

                                        April 29 (Tuesday):
                                        08:00:00 - 08:29:59: 10961
                                        08:30:00 - 08:59:59: 13615
                                        09:00:00 - 09:29:59: 14595
                                        09:30:00 - 09:59:59: 8773
                                        10:00:00 - 10:29:59: 14430
                                        10:30:00 - 10:59:59: 10008
                                        11:00:00 - 11:29:59: 15775
                                        11:30:00 - 11:59:59: 8831
                                        12:00:00 - 12:29:59: 10278
                                        12:30:00 - 12:59:59: 7385
                                        13:00:00 - 13:29:59: 10667
                                        13:30:00 - 13:59:59: 11157
                                        14:00:00 - 14:29:59: 14754
                                        14:30:00 - 14:59:59: 16204
                                        15:00:00 - 15:29:59: 14562
                                        15:30:00 - 15:59:59: 8669
                                        16:00:00 - 16:29:59: 12502
                                        16:30:00 - 16:59:59: 5390
                                        17:00:00 - 17:29:59: 10168
                                        17:30:00 - 17:59:59: 11201
                                        18:00:00 - 18:29:59: 11841
                                        18:30:00 - 18:59:59: 5495

                                        April 30 (Wednesday):
                                        08:00:00 - 08:29:59: 12537
                                        08:30:00 - 08:59:59: 6535
                                        09:00:00 - 09:29:59: 10978
                                        09:30:00 - 09:59:59: 9147
                                        10:00:00 - 10:29:59: 18220
                                        10:30:00 - 10:59:59: 12779
                                        11:00:00 - 11:29:59: 12659
                                        11:30:00 - 11:59:59: 8974
                                        12:00:00 - 12:29:59: 13835
                                        12:30:00 - 12:59:59: 14805
                                        13:00:00 - 13:29:59: 16831
                                        13:30:00 - 13:59:59: 7153
                                        14:00:00 - 14:29:59: 11017
                                        14:30:00 - 14:59:59: 10422
                                        15:00:00 - 15:29:59: 15617
                                        15:30:00 - 15:59:59: 11271
                                        16:00:00 - 16:29:59: 11120
                                        16:30:00 - 16:59:59: 7963
                                        17:00:00 - 17:29:59: 7759
                                        17:30:00 - 17:59:59: 4817
                                        18:00:00 - 18:29:59: 5815
                                        18:30:00 - 18:59:59: 3581
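Per-window counts like the ones above can be reproduced from the mail log with a short script. The following is a minimal Python sketch, not the script actually used in this thread. It assumes the standard syslog line layout shown in the first message, looks only at lines logged by postfix/smtp with status=sent, and counts each queue ID once per window so that multi-recipient messages are not double-counted. The function name half_hour_counts is hypothetical.

```python
import re
from collections import Counter

# Syslog timestamp, host, postfix/smtp[pid], queue ID, and a successful delivery.
LINE_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d+)\s+(?P<hh>\d\d):(?P<mm>\d\d):\d\d\s+\S+\s+"
    r"postfix/smtp\[\d+\]:\s+(?P<qid>[0-9A-Za-z]+):.*\bstatus=sent\b"
)

def half_hour_counts(lines):
    """Count distinct queue IDs delivered in each 30-minute window."""
    seen = set()
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        # Windows start at minute 00 or minute 30 of each hour.
        start_min = 0 if int(m["mm"]) < 30 else 30
        bucket = f'{m["mon"]} {int(m["day"]):02d} {m["hh"]}:{start_min:02d}'
        key = (bucket, m["qid"])
        if key in seen:  # same message, another recipient
            continue
        seen.add(key)
        counts[bucket] += 1
    return counts
```

Typical use would be something like `half_hour_counts(open("/var/log/maillog"))`, printing the resulting buckets in sorted order.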

                                        >>>Understand and memorize this simple formula:
                                        >>>
                                        >>> Throughput = Concurrency / Latency

                                        If Latency = 20, Concurrency=2.8s, Throughput=7.14286/second, which is equal to 12,857 per 30 minutes. Is this a threshold? If real throughput is greater than roughly this number, delays will obviously occur in peak hours, as below on Monday, right?

                                        April 28 (Monday):
                                        10:30:00 - 10:59:59: 10928 delays (>=120s) were consecutive - 121s ~ 2402s
                                        11:00:00 - 11:29:59: 15597 delays (>=120s) were consecutive - 648s ~ 2380s
                                        11:30:00 - 11:59:59: 3821 delays (>=120s) were consecutive - 514s ~ 813s

                                        If we increase default_destination_concurrency_limit to 30, the threshold of throughput per 30 minutes will be 19,285.71 (30/2.8 * 1800 = 19,285.71), which is greater than the throughput yesterday and today. Does this avoid obvious delays?

                                        Also, I read Postfix Performance Tuning at http://www.postfix.org/TUNING_README.html, in particular "Tuning the number of simultaneous deliveries" and "Tuning the number of recipients per delivery".

                                        * For a high-volume destination, it seems we can increase default_destination_concurrency_limit (20 -> 30?) and lower smtp_connect_timeout (30s -> 5s);
                                        * For a high-volume destination, it seems we can increase default_destination_recipient_limit (50 -> 100?)

                                        And I read Postfix Bottleneck Analysis at http://www.postfix.org/QSHAPE_README.html, in particular "The active queue":

                                        (The only way to reduce congestion is to either reduce the input rate or increase the throughput. Increasing the throughput requires either increasing the concurrency or reducing the latency of deliveries.

                                        For high volume sites a key tuning parameter is the number of "smtp" delivery agents allocated to the "smtp" and "relay" transports. High volume sites tend to send to many different destinations, many of which may be down or slow, so a good fraction of the available delivery agents will be blocked waiting for slow sites. Also mail destined across the globe will incur large SMTP command-response latencies, so high message throughput can only be achieved with more concurrent delivery agents. )

                                        and "Example 4: High volume destination backlog", including the following paragraph:

                                        ************************************
                                        Postfix version 2.5 and later:

                                        In master.cf set up a dedicated clone of the "smtp" transport for the destination in question. In the example below we will call it "fragile".

                                        In master.cf configure a reasonable process limit for the cloned smtp transport (a number in the 10-20 range is typical).

                                        IMPORTANT!!! In main.cf configure a large per-destination pseudo-cohort failure limit for the cloned smtp transport.

                                        /etc/postfix/main.cf:
                                        transport_maps = hash:/etc/postfix/transport
                                        fragile_destination_concurrency_failed_cohort_limit = 100
                                        fragile_destination_concurrency_limit = 20

                                        /etc/postfix/transport:
                                        example.com fragile:

                                        /etc/postfix/master.cf:
                                        # service type private unpriv chroot wakeup maxproc command
                                        fragile unix - - n - 20 smtp

                                        See also the documentation for default_destination_concurrency_failed_cohort_limit and default_destination_concurrency_limit
                                        *******************************************************************************************************

                                        Can we divide the destination domains into three transport groups and create two clones of the "smtp" transport? We tested this configuration on our test server, and it seems that extra smtp processes are created for both the "buckeye" transport and the "famous-ISP" transport, although the default smtp processes are still created. Will this change further reduce latency and provide more concurrency so that throughput will increase?

                                        Group1 - buckeyemail.osu.edu uses "buckeye" transport
                                        Group2 - gmail.com, yahoo.com and hotmail.com use "famous-ISP" transport
                                        Group3 - other domains use default "smtp" transport

                                        /etc/postfix/main.cf:
                                        transport_maps = hash:/etc/postfix/transport
                                        buckeye_destination_concurrency_failed_cohort_limit = 100
                                        buckeye_destination_concurrency_limit = 30
                                        famous-ISP_destination_concurrency_failed_cohort_limit = 100
                                        famous-ISP_destination_concurrency_limit = 20

                                        /etc/postfix/transport:
                                        buckeyemail.osu.edu buckeye:mail.us.messaging.microsoft.com
                                        gmail.com famous-ISP:mail.us.messaging.microsoft.com
                                        yahoo.com famous-ISP:mail.us.messaging.microsoft.com
                                        hotmail.com famous-ISP:mail.us.messaging.microsoft.com

                                        /etc/postfix/master.cf:
                                        # service type private unpriv chroot wakeup maxproc command
                                        buckeye unix - - n - 30 smtp
                                          -o smtp_connect_timeout=5
                                        famous-ISP unix - - n - 20 smtp

                                        Thanks and good night,

                                        Carl

                                        -----Original Message-----
                                        From: owner-postfix-users@... [mailto:owner-postfix-users@...] On Behalf Of Viktor Dukhovni
                                        Sent: Monday, April 28, 2014 7:45 PM
                                        To: postfix-users@...
                                        Subject: Re: Backlog to outsourced email provider

                                        On Mon, Apr 28, 2014 at 11:05:56PM +0000, Xie, Wei wrote:

                                        > header_checks = regexp:/etc/postfix/header_checks
                                        > relayhost = mail.us.messaging.microsoft.com

                                        This is effectively a miniature transport entry:

                                        relay_transport = relay:mail.us.messaging.microsoft.com
                                        default_transport = relay:mail.us.messaging.microsoft.com

                                        Don't know whether the vendor intends for you to do MX lookups here or not (you're doing MX lookups). The MX record just returns the original hostname.

                                        $ dig +noall +ans -t mx mail.us.messaging.microsoft.com
                                        mail.us.messaging.microsoft.com. IN MX 10 mail.us.messaging.microsoft.com.

                                        $ dig +noall +ans -t a mail.us.messaging.microsoft.com
                                        mail.us.messaging.microsoft.com. IN A 216.32.181.178
                                        mail.us.messaging.microsoft.com. IN A 216.32.180.22

                                        > smtp_tls_CAfile = /etc/postfix/service_certs/osu_ues/DigiCertCA.crt
                                        > smtp_tls_loglevel = 2
                                        > smtpd_tls_loglevel = 2

                                        You're killing your syslog daemon with debug logging. Why is the TLS loglevel set to 2? Have you looked at your logs? They are full of debugging noise and likely severely limit performance.
                                        For normal operation set the log level to 1. Also make sure your syslogd is not doing synchronous logging of each log entry.

                                        > smtp_tls_note_starttls_offer = yes

                                        Futile, given:

                                        > smtp_tls_security_level = encrypt

                                        > Here are the settings for the following two parameters:
                                        >
                                        > default_destination_concurrency_limit = 20

                                        Fix your logging, then measure again. A concurrency of 20 may be sufficient when the log level is sane.

                                        > smtp_destination_concurrency_limit = $default_destination_concurrency_limit

                                        This is redundant.

                                        > >>Either increase concurrency or reduce latency. Network captures may show which protocol stage is responsible for most of the delay, even with TLS one can tell whether the delay is at the beginning or at the end of the TLS session or just low bandwidth throughout.
                                        >
                                        > We prefer to increase concurrency.

                                        The vendor might limit your concurrency, don't do that quite yet.

                                        > >>How is the relay specified with or without surrounding "[]"?
                                        >
                                        > Without surrounding "[]".
                                        >
                                        > relayhost = mail.us.messaging.microsoft.com

                                        Ask the vendor whether they want you to use MX indirection or not.

                                        > On this RHEL 5.10 server, today 10:30:00 ~ 10:59:59 the output rate of
                                        > email to this domain in the 30 minutes was 10,928.
                                        >
                                        > On other 6 RHEL 6.4 servers, today 10:30:00 ~ 10:59:59 the output rate
                                        > of email to this domain in the 30 minutes were 4,824 ~ 6,564.

                                        You're comparing apples and oranges, the RHEL 6 hosts don't receive nearly enough traffic to be congested, they would perhaps be equally congested under the same load. However, they may have sensibly configured logging with TLS loglevel 1, and/or no synchronous log writes.

                                        > Today 10:00:00 ~ 10:29:59 the output rate of email to this relay in
                                        > the 30 minutes was 9,623.
                                        >
                                        > Today 10:30:00 ~ 10:59:59 the output rate of email to this relay in
                                        > the 30 minutes was 10,928.

                                        That's more like it: Throughput * Latency = Concurrency

                                        10928 / 1800 * 2.8 = 17.0

                                        So with latencies around 2.8 seconds your estimated concurrency is
                                        ~17 which is close enough to 20. The problem is either that your syslogd is overwhelmed and too slow or the vendor service is too slow.
                                        Fix the first problem first.


                                        > Today 11:00:00 ~ 11:29:59 the output rate of email to this relay in
                                        > the 30 minutes was 15,597.

                                        15597 / 1800 * 2.8 = 24.3

                                        So the latency number from that one message is likely a bit above average. Understand and memorize this simple formula:

                                        Throughput = Concurrency / Latency

                                        Fix your logging settings in main.cf and make sure that you follow the advice at the bottom of:

                                        http://www.postfix.org/LINUX_README.html

                                        Syslogd performance

                                        LINUX syslogd uses synchronous writes by default. Because of
                                        this, syslogd can actually use more system resources than
                                        Postfix. To avoid such badness, disable synchronous mail logfile
                                        writes by editing /etc/syslog.conf and by prepending a "-" to
                                        the logfile name:

                                        /etc/syslog.conf:
                                        mail.* -/var/log/mail.log

                                        Send a "kill -HUP" to the syslogd to make the change effective.

                                        --
                                        Viktor.
                                      • Viktor Dukhovni
                                        Message 19 of 20 , Apr 30, 2014
                                          On Thu, May 01, 2014 at 04:25:01AM +0000, Xie, Wei wrote:

                                          > Monday night I fixed my logging as below.
                                          >
                                          > > smtp_tls_loglevel = 1
                                          > > smtpd_tls_loglevel = 1

                                          Good.

                                          > Here are throughput per 30 minutes from 8:00 through 18:00 yesterday and today

                                          When your queue is not congested (queue backlog is negligible),
                                          the output rate is simply equal to the input rate and does not mean
                                          much other than that you're sending below the peak output capacity
                                          (which is a good thing).

                                          Therefore, while measuring output rates, you also need to determine
                                          whether there is indeed a backlog. Along with all these numbers,
                                          you need to track:

                                          - Exponentially smoothed moving average "c+d" values.
                                          - Exponentially smoothed moving average "b" values.

                                          The "c+d" values (abnormally high delivery latency) will measure
                                          potential remote causes of congestion, while "b" value will measure
                                          the resulting delays.

                                          The exponential smoothing avoids undue contribution from single
                                          message spikes and quickly forgets stale history. For each new
                                          delivery (again avoid double-counting multiple recipients in a
                                          single message) apply something like the Perl snippet below:

                                          $alpha = 0.05; # If you want less noisy data at the cost of not seeing
                                          # some shorter-term spikes, reduce $alpha to ~0.02.
                                          $b_moving = (1-$alpha) * $b_moving + $alpha * $b;
                                          $cd_moving = (1-$alpha) * $cd_moving + $alpha * $cd;

                                          Then print "$b_moving" and "$cd_moving" every 100 or so deliveries.
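The smoothing step above can be wired to the log's delays=a/b/c/d field. The following Python sketch is one way to do it, under stated assumptions: it reads lines logged by postfix/smtp, skips a line when its queue ID equals that of the previous matching line (a rough way to avoid double-counting recipients of the same message), and seeds both averages with the first observation rather than zero. The seeding choice and the function name smoothed_delays are assumptions made for illustration and are not taken from the thread.

```python
import re

# Queue ID plus the four delay components from a postfix/smtp log line.
DELAYS_RE = re.compile(
    r"postfix/smtp\[\d+\]:\s+(?P<qid>[0-9A-Za-z]+):.*"
    r"\bdelays=(?P<a>[\d.]+)/(?P<b>[\d.]+)/(?P<c>[\d.]+)/(?P<d>[\d.]+)"
)

def smoothed_delays(lines, alpha=0.05, report_every=100):
    """Yield (deliveries, b_moving, cd_moving) every `report_every` deliveries."""
    b_moving = cd_moving = None
    last_qid = None
    n = 0
    for line in lines:
        m = DELAYS_RE.search(line)
        if not m or m["qid"] == last_qid:
            continue  # not a delivery line, or another recipient of the same message
        last_qid = m["qid"]
        b = float(m["b"])                   # time spent in the queue manager
        cd = float(m["c"]) + float(m["d"])  # connection setup + transmission
        if b_moving is None:
            # Assumption: start from the first sample instead of zero.
            b_moving, cd_moving = b, cd
        else:
            b_moving = (1 - alpha) * b_moving + alpha * b
            cd_moving = (1 - alpha) * cd_moving + alpha * cd
        n += 1
        if n % report_every == 0:
            yield n, b_moving, cd_moving
```

Lowering alpha (for example to 0.02) gives smoother output at the cost of hiding shorter spikes, as noted above.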

                                          > >>>Understand and memorize this simple formula:
                                          > >>>
                                          > >>> Throughput = Concurrency / Latency
                                          >
                                          > If Latency = 20,

                                          Latency has units of time, it is how long it takes to deliver a
                                          single message, so the above makes no sense.

                                          > Concurrency=2.8s,

                                          Concurrency is dimensionless, it counts the number of simultaneous
                                          deliveries. This makes no sense.

                                          You have to *measure* the latency (smoothed "c+d"), not guess from
                                          a single message. That was just a crude estimate based on the drop
                                          of water provided to estimate the number of fish in the ocean.

                                          > Is this a threshold?

                                          The output rate cannot exceed the peak concurrency divided by the
                                          average latency. This is only a problem if the input rate is higher
                                          still. For email, the solution is to first work to eliminate anomalous
                                          latency, and then if possible increase concurrency.
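The relationship described here (Throughput = Concurrency / Latency) can be checked with a few lines of arithmetic. The Python sketch below uses hypothetical helper names; the 2.8 s latency is only the single-message figure discussed in this thread, used as a placeholder until a measured, smoothed "c+d" value is available.

```python
def max_output_rate(concurrency, avg_latency_s):
    """Upper bound on deliveries per second: concurrency / latency."""
    return concurrency / avg_latency_s

def implied_concurrency(deliveries, window_s, avg_latency_s):
    """Concurrency needed to sustain an observed rate: throughput * latency."""
    return deliveries / window_s * avg_latency_s

if __name__ == "__main__":
    latency = 2.8   # seconds; placeholder, not a measured average
    window = 1800   # one 30-minute window, in seconds
    for limit in (20, 30):
        ceiling = max_output_rate(limit, latency) * window
        print(f"concurrency {limit}: at most {ceiling:.0f} deliveries per 30 min")
    for sent in (10928, 15597):
        needed = implied_concurrency(sent, window, latency)
        print(f"{sent} deliveries per 30 min needs concurrency of about {needed:.1f}")
```

The ceiling only matters when the input rate exceeds it; below that, the output rate simply follows the input rate.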

                                          "Money can buy bandwidth, but latency is forever".
                                          -- John Mashey, MIPS

                                          The latency for email delivery between well functioning systems is
                                          often two orders of magnitude smaller than the latency when something
                                          is wrong. So it makes sense to first control the latency, but
                                          physics imposes tight lower bounds, at which point if more throughput
                                          is required, you need more concurrency, and email delivery is highly
                                          parallelizable.

                                          > If real throughput is approximately greater than this number,

                                          s/throughput/input/

                                          > delays will obviously occur in peak hours, as below on Monday, right?

                                          Mail piles up when it arrives faster than it leaves.
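A toy model makes the point concrete. The Python sketch below tracks the backlog across fixed intervals, given arrivals per interval and a fixed output capacity per interval. The numbers in the usage note are made up for illustration and are not taken from the logs in this thread; the function name backlog_over_time is hypothetical.

```python
def backlog_over_time(arrivals_per_interval, capacity_per_interval):
    """Return the queue backlog at the end of each interval."""
    backlog = 0
    history = []
    for arrivals in arrivals_per_interval:
        # Whatever cannot be delivered in this interval carries over to the next.
        backlog = max(0, backlog + arrivals - capacity_per_interval)
        history.append(backlog)
    return history
```

For example, `backlog_over_time([10, 15, 12, 5], 12)` returns `[0, 3, 3, 0]`: the backlog appears when arrivals exceed capacity and drains only once arrivals drop below it.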

                                          > If we increase default_destination_concurrency_limit = 30, the
                                          > threshold of throughput per 30 minutes will be 19,285.71 ( 30/2.8
                                          > * 1800=19,285.71), which is greater than throughput yesterday and
                                          > today. Does this avoid obvious delays?

                                          The 2.8 was pulled out of a hat, you really should have a strong
                                          impression by now that I have little patience for lazy guess-work.
                                          Don't guess, measure! The 2.8s number is I think way too high,
                                          surely the provider can do better.

                                          Perhaps your DNS is configured poorly and their lookups are slowing
                                          down deliveries? Or you're hitting a congested shared system that
                                          the provider needs to make more performant. Are you paying them
                                          enough money to get good service? Can someone else deliver good
                                          service for a similar cost?

                                          > Also, I read Postfix Performance Tuning at URL
                                          > http://www.postfix.org/TUNING_README.html about "Tuning the number
                                          > of simultaneous deliveries" and " Tuning the number of recipients
                                          > per delivery".

                                          This does not apply with "transactional" email where each message
                                          has just one recipient. For mail to large lists, the Postfix
                                          default of 50 recipients per message is about right in most cases.
                                          When virus scanning messages to large lists (content filter
                                          transports), I used to set the recipient limit to ~1000, but both
                                          ends of the SMTP connection were configured by me. Remote systems
                                          may not support much more than (and sometimes unfortunately less
                                          than) the RFC requirement of at least 100 recipients per message.

                                          > * For high volume destination, it seems we are able to increase
                                          > default_destination_concurrency_limit (20->30?)

                                          *After* figuring out what the latency is, why it is, and what if
                                          anything can be done about it.

                                          > and lower smtp_connect_timeout (30s ->5s);

                                          Fine on your own network, unlikely to make any difference with large
                                          providers that use load-balancers, which almost never exhibit any
                                          connection latency.

                                          > * For high volume destination, it seems we are able to increase
                                          > default_destination_recipient_limit (50 ->100?)

                                          Won't make any difference if each message has just one recipient.
                                          What is the distribution of message recipient counts in your logs?
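One way to answer this from the logs is to read the nrcpt= value that the queue manager logs when a message becomes active. The Python sketch below is an illustration under assumptions: it expects postfix/qmgr lines of the usual form "QUEUEID: from=<...>, size=..., nrcpt=N (queue active)", and it keeps one value per queue ID so that messages re-activated after a deferral are not counted twice. The function name recipient_count_distribution is hypothetical.

```python
import re
from collections import Counter

# Queue ID and recipient count from a postfix/qmgr "queue active" line.
NRCPT_RE = re.compile(
    r"postfix/qmgr\[\d+\]:\s+(?P<qid>[0-9A-Za-z]+):.*\bnrcpt=(?P<n>\d+)"
)

def recipient_count_distribution(lines):
    """Map recipient count -> number of messages with that many recipients."""
    per_message = {}
    for line in lines:
        m = NRCPT_RE.search(line)
        if m:
            per_message[m["qid"]] = int(m["n"])  # one entry per queue ID
    return Counter(per_message.values())
```

If nearly every message shows nrcpt=1, changing the recipient limit will have no effect, as noted above.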

                                          > For high volume sites a key tuning parameter is the number of
                                          > "smtp" delivery agents allocated to the "smtp" and "relay" transports.
                                          > High volume sites tend to send to many different destinations, many
                                          > of which may be down or slow, so a good fraction of the available
                                          > delivery agents will be blocked waiting for slow sites. Also mail
                                          > destined across the globe will incur large SMTP command-response
                                          > latencies, so high message throughput can only be achieved with
                                          > more concurrent delivery agents. )

                                          All your mail goes to a single relay host. The above is about high
                                          volume sending systems that send "direct to MX".

                                          > and " Example 4: High volume destination backlog", including the
                                          > following paragraph:

                                          No need to quote QSHAPE_README at me, I wrote it. :-)

                                          > In master.cf set up a dedicated clone of the "smtp" transport
                                          > for the destination in question. In the example below we will call
                                          > it "fragile".

                                          Your destination is not "fragile". That's only needed for destinations
                                          that get throttled due to repeated timeouts, connection failures
                                          or the destination refusing service under load. Your destination
                                          is "slow", not "fragile".

                                          > Can we divide destination domains into three transport groups
                                          > and create two clones of the "smtp" transport?

                                          All your mail goes to a single relay host, there's nothing to divide.

                                          --
                                          Viktor.
                                        • Xie, Wei
                                          Message 20 of 20 , May 1 6:57 AM
                                            Viktor,

                                            >>Understand and memorize this simple formula:
                                            >>
                                            >> Throughput = Concurrency / Latency

                                            >If Latency = 20, Concurrency=2.8s, Throughput=7.14286/second, which is equal to 12,857/30minutes. Is this a threshold? If real throughput is approximately greater than this number, delays will obviously occurred in peak hours as below on Monday, right?

                                            That was a typo; I was tired late last night. It should be:

                                            If Latency = 2.8s, Concurrency=20, Throughput=7.14286/second, which is equal to 12,857 per 30 minutes. Is this a threshold? If real throughput is greater than roughly this number, delays will obviously occur in peak hours, as below on Monday, right?

                                            >Therefore, while measuring output rates, you need to also determine whether there is indeed a backlog. Therefore, associated with all these numbers you need to track:
                                            >
                                            > - Exponentially smoothed moving average "c+d" values.
                                            > - Exponentially smoothed moving average "b" values.
                                            >
                                            >The "c+d" values (abnormally high delivery latency) will measure potential remote causes of congestion, while "b" value will measure the resulting delays.

                                            Actually, I am working on this now. Once I finish, I will post the results.

                                            >The exponential smoothing avoids undue contribution from single message spikes and quickly forgets stale history. For each new delivery (again avoid double-counting multiple recipients in a single message) apply something like the Perl snippet below:
                                            >
                                            > $alpha = 0.05; # If you want less noisy data at the cost of not seeing
                                            > # some shorter-term spikes, reduce $alpha to ~0.02.
                                            > $b_moving = (1-$alpha) * $b_moving + $alpha * $b;
                                            > $cd_moving = (1-$alpha) * $cd_moving + $alpha * $cd;
                                            >
                                            >Then print "$b_moving" and "$cd_moving" every 100 or so deliveries.

                                            The initial values of $b_moving and $cd_moving should be zero, right?

                                            >> If we increase default_destination_concurrency_limit = 30, the
                                            >> threshold of throughput per 30 minutes will be 19,285.71 ( 30/2.8
                                            >> * 1800=19,285.71), which is greater than throughput yesterday and
                                            >> today. Does this avoid obvious delays?
                                            >
                                            >The 2.8 was pulled out of a hat, you really should have a strong impression by now that I have little patience for lazy guess-work.
                                            >Don't guess, measure! The 2.8s number is I think way too high, surely the provider can do better.

                                            The 2.8s figure was first given in your email below, on Monday at 7:46pm. I just used it in a calculation to ask my question. These past few days I have been working until midnight, writing small scripts to scan the logs and do the measurements.

                                            ==============================================================
                                            > Today 10:00:00 ~ 10:29:59 the output rate of email to this relay in
                                            > the 30 minutes was 9,623.
                                            >
                                            > Today 10:30:00 ~ 10:59:59 the output rate of email to this relay in
                                            > the 30 minutes was 10,928.

                                            That's more like it: Throughput * Latency = Concurrency

                                            10928 / 1800 * 2.8 = 17.0

                                            So with latencies around 2.8 seconds your estimated concurrency is
                                            ~17, which is close enough to 20. The problem is either that your syslogd is overwhelmed and too slow, or that the vendor service is too slow.
                                            Fix the first problem first.


                                            > Today 11:00:00 ~ 11:29:59 the output rate of email to this relay in
                                            > the 30 minutes was 15,597.

                                            15597 / 1800 * 2.8 = 24.3
                                            ===========================================
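The concurrency estimates quoted above come from Throughput * Latency = Concurrency. A minimal Python sketch of that arithmetic follows. The function name is made up for illustration, and the 2.8 s latency is only the placeholder figure used in this thread, not a measured value.

```python
def estimated_concurrency(deliveries: int, window_seconds: float,
                          latency_seconds: float) -> float:
    """Concurrency = throughput * latency (Little's law)."""
    throughput = deliveries / window_seconds  # deliveries per second
    return throughput * latency_seconds


# 30-minute delivery counts reported in the thread; 2.8 s is a placeholder latency.
for deliveries in (9623, 10928, 15597):
    estimate = estimated_concurrency(deliveries, 1800, 2.8)
    print(f"{deliveries} deliveries / 30 min -> concurrency ~{estimate:.1f}")
```

With a measured (smoothed) latency in place of 2.8, an estimate near the configured concurrency limit would suggest the transport is running at capacity.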

                                            Thanks a lot!!!

                                            Carl

                                            -----Original Message-----
                                            From: owner-postfix-users@... [mailto:owner-postfix-users@...] On Behalf Of Viktor Dukhovni
                                            Sent: Thursday, May 01, 2014 1:11 AM
                                            To: postfix-users@...
                                            Subject: Re: Backlog to outsourced email provider

                                            On Thu, May 01, 2014 at 04:25:01AM +0000, Xie, Wei wrote:

                                            > Monday night I fixed my logging as below.
                                            >
                                            > > smtp_tls_loglevel = 1
                                            > > smtpd_tls_loglevel = 1

                                            Good.

                                            > Here are throughput per 30 minutes from 8:00 through 18:00 yesterday
                                            > and today

                                            When your queue is not congested (queue backlog is negligible), the output rate is simply equal to the input rate and does not mean much other than that you're sending below the peak output capacity (which is a good thing).

                                            So while measuring output rates, you also need to determine whether there is in fact a backlog. Alongside all these numbers you need to track:

                                            - Exponentially smoothed moving average "c+d" values.
                                            - Exponentially smoothed moving average "b" values.

                                            The "c+d" values will expose abnormally high delivery latency, a potential remote cause of congestion, while the "b" values will measure the resulting queue delays.

                                            The exponential smoothing avoids undue contribution from single message spikes and quickly forgets stale history. For each new delivery (again avoid double-counting multiple recipients in a single message) apply something like the Perl snippet below:

                                            $alpha = 0.05; # If you want less noisy data at the cost of not seeing
                                            # some shorter-term spikes, reduce $alpha to ~0.02.
                                            $b_moving = (1-$alpha) * $b_moving + $alpha * $b;
                                            $cd_moving = (1-$alpha) * $cd_moving + $alpha * $cd;

                                            Then print "$b_moving" and "$cd_moving" every 100 or so deliveries.
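For anyone who would rather script this in Python, here is a minimal sketch of the same exponential smoothing applied to Postfix smtp log lines. It makes several assumptions not stated in the thread: the regular expression is fitted to the `delays=a/b/c/d` log format shown earlier, only `status=sent` lines are counted, each queue ID is counted once (a simplification of "avoid double-counting multiple recipients"), and the averages are seeded from the first delivery rather than from zero. Starting from zero also works, since the start-up bias shrinks by a factor of (1 - alpha) with each delivery.

```python
import re

# Queue ID plus the four delays=a/b/c/d fields of an smtp delivery log line.
# The pattern is an assumption based on the log sample in this thread.
LOG_RE = re.compile(
    r"postfix/smtp\[\d+\]: ([0-9A-Za-z]+): "
    r".*?delays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)"
)


def smoothed_delays(lines, alpha=0.05, report_every=100):
    """Yield (deliveries, b_moving, cd_moving) every `report_every` deliveries."""
    b_moving = cd_moving = None
    seen = set()  # queue IDs already counted; grows without bound in this sketch
    count = 0
    for line in lines:
        if "status=sent" not in line:
            continue
        m = LOG_RE.search(line)
        if not m or m.group(1) in seen:
            continue
        seen.add(m.group(1))
        b = float(m.group(3))                       # time in queue manager
        cd = float(m.group(4)) + float(m.group(5))  # connection setup + transmission
        if b_moving is None:
            b_moving, cd_moving = b, cd             # seed with first sample (assumption)
        else:
            b_moving = (1 - alpha) * b_moving + alpha * b
            cd_moving = (1 - alpha) * cd_moving + alpha * cd
        count += 1
        if count % report_every == 0:
            yield count, b_moving, cd_moving
```

One way to use it is `for n, b, cd in smoothed_delays(open("/var/log/maillog")): print(n, b, cd)`. The log path is the one mentioned earlier in the thread.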

                                            > >>>Understand and memorize this simple formula:
                                            > >>>
                                            > >>> Throughput = Concurrency / Latency
                                            >
                                            > If Latency = 20,

                                            Latency has units of time, it is how long it takes to deliver a single message, so the above makes no sense.

                                            > Concurrency=2.8s,

                                            Concurrency is dimensionless, it counts the number of simultaneous deliveries. This makes no sense.

                                            You have to *measure* the latency (smoothed "c+d"), not guess from a single message. The 2.8s figure was just a crude estimate from a single log entry, like using one drop of water to estimate the number of fish in the ocean.

                                            > Is this a threshold?

                                            The output rate cannot exceed the peak concurrency divided by the average latency. This is only a problem if the input rate is higher still. For email, the solution is to first work to eliminate anomalous latency, and then if possible increase concurrency.
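As a rough illustration of that ceiling, the Python sketch below computes the peak output rate for a given concurrency limit and average latency, and how fast a backlog grows when input exceeds it. The function names are hypothetical. The latency used in the example is the unmeasured 2.8 s placeholder from this thread, so the printed figure is illustrative only.

```python
def peak_output_rate(concurrency: int, avg_latency_s: float) -> float:
    """Maximum sustainable deliveries per second: concurrency / latency."""
    if avg_latency_s <= 0:
        raise ValueError("average latency must be positive")
    return concurrency / avg_latency_s


def backlog_growth(input_rate: float, concurrency: int,
                   avg_latency_s: float, duration_s: float) -> float:
    """Messages added to the queue over duration_s when input exceeds the peak."""
    surplus = input_rate - peak_output_rate(concurrency, avg_latency_s)
    return max(0.0, surplus) * duration_s


# Concurrency limit of 20 with the placeholder 2.8 s latency, per 30 minutes.
ceiling_per_30_min = peak_output_rate(20, 2.8) * 1800
print(f"peak output per 30 minutes: {ceiling_per_30_min:.0f}")
```

Halving the latency raises the ceiling exactly as much as doubling the concurrency, which is why eliminating anomalous latency comes first.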

                                            "Money can buy bandwidth, but latency is forever".
                                            -- John Mashey, MIPS

                                            The latency for email delivery between well functioning systems is often two orders of magnitude smaller than the latency when something is wrong. So it makes sense to first control the latency, but physics imposes tight lower bounds, at which point if more throughput is required, you need more concurrency, and email delivery is highly parallelizable.

                                            > If real throughput is approximately greater than this number,

                                            s/throughput/input/

                                            > delays will obviously occur in peak hours as below on Monday, right?

                                            Mail piles up when it arrives faster than it leaves.

                                            > If we increase default_destination_concurrency_limit = 30, the
                                            > threshold of throughput per 30 minutes will be 19,285.71 ( 30/2.8
                                            > * 1800=19,285.71), which is greater than throughput yesterday and
                                            > today. Does this avoid obvious delays?

                                            The 2.8 was pulled out of a hat, you really should have a strong impression by now that I have little patience for lazy guess-work.
                                            Don't guess, measure! The 2.8s number is I think way too high, surely the provider can do better.

                                            Perhaps your DNS is configured poorly and their lookups are slowing down deliveries? Or you're hitting a congested shared system that the provider needs to make more performant. Are you paying them enough money to get good service? Can someone else deliver good service for a similar cost?

                                            > Also, I read Postfix Performance Tuning at URL
                                            > http://www.postfix.org/TUNING_README.html about "Tuning the number of
                                            > simultaneous deliveries" and "Tuning the number of recipients per
                                            > delivery".

                                            This does not apply with "transactional" email where each message has just one recipient. For mail to large lists, the Postfix default of 50 recipients per message is about right in most cases.
                                            When virus scanning messages to large lists (content filter transports), I used to set the recipient limit to ~1000, but both ends of the SMTP connection were configured by me. Remote systems may not support much more than the RFC requirement of at least 100 recipients per message (and sometimes, unfortunately, support fewer).

                                            > * For high volume destination, it seems we are able to increase
                                            > default_destination_concurrency_limit (20->30?)

                                            *After* figuring out what the latency is, why it is, and what if anything can be done about it.

                                            > and lower smtp_connect_timeout (30s ->5s);

                                            Fine on your own network, unlikely to make any difference with large providers that use load-balancers, which almost never exhibit any connection latency.

                                            > * For high volume destination, it seems we are able to increase
                                            > default_destination_recipient_limit (50 ->100?)

                                            Won't make any difference if each message has just one recipient.
                                            What is the distribution of message recipient counts in your logs?
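One possible way to answer that question is sketched below in Python. It assumes the queue manager logs a line of the form `postfix/qmgr[PID]: QUEUEID: from=<...>, size=N, nrcpt=N (queue active)` each time a message enters the active queue. The regular expression and function name are illustrative. A message that is deferred and retried is logged more than once, so the sketch keeps one entry per queue ID.

```python
import re
from collections import Counter

# Queue ID and recipient count from qmgr "queue active" log lines (assumed format).
QMGR_RE = re.compile(
    r"postfix/qmgr\[\d+\]: ([0-9A-Za-z]+): from=<[^>]*>, size=\d+, nrcpt=(\d+)"
)


def recipient_count_distribution(lines):
    """Return Counter mapping recipients-per-message -> number of messages."""
    per_message = {}
    for line in lines:
        m = QMGR_RE.search(line)
        if m:
            # Retried messages are logged again; keep one entry per queue ID.
            per_message[m.group(1)] = int(m.group(2))
    return Counter(per_message.values())
```

If nearly every message shows a count of 1, the recipient limit has no effect on throughput.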

                                            > For high volume sites a key tuning parameter is the number of "smtp"
                                            > delivery agents allocated to the "smtp" and "relay" transports.
                                            > High volume sites tend to send to many different destinations, many of
                                            > which may be down or slow, so a good fraction of the available
                                            > delivery agents will be blocked waiting for slow sites. Also mail
                                            > destined across the globe will incur large SMTP command-response
                                            > latencies, so high message throughput can only be achieved with more
                                            > concurrent delivery agents. )

                                            All your mail goes to a single relay host. The above is about high volume sending systems that send "direct to MX".

                                            > and "Example 4: High volume destination backlog", including the
                                            > following paragraph:

                                            No need to quote QSHAPE_README at me, I wrote it. :-)

                                            > In master.cf set up a dedicated clone of the "smtp" transport
                                            > for the destination in question. In the example below we will call
                                            > it "fragile".

                                            Your destination is not "fragile". That's only needed for destinations that get throttled due to repeated timeouts, connection failures or the destination refusing service under load. Your destination is "slow", not "fragile".

                                            > Can we divide destination domains into three transport groups and
                                            > create two clones of the "smtp" transport?

                                            All your mail goes to a single relay host, there's nothing to divide.

                                            --
                                            Viktor.