performance problems

  • Jeremie CEINTREY
    Message 1 of 19 , Mar 30, 2012
      Hi everybody,


      First, thank you for your help.

      I often encounter performance problems with Postfix and mailing-list traffic.
      I run a Postfix mail relay that filters traffic for about 12,000 mailboxes. The mailboxes are hosted on other mail servers.

      I also use amavisd-new with SpamAssassin and ClamAV.
      The Postfix version is 2.8, and I use it with a postscreen configuration.

      My problem is that I often see the mail queue grow to 7,000 messages, and it then takes about 2 hours to deliver them. I don't understand why it is so slow. When I look at the senders with qshape -s active, it is always mailing-list traffic, not spam.

      Thank you for your help.

      Jérémie


    • Nerijus Kislauskas
      Message 2 of 19 , Mar 30, 2012
        On 03/30/2012 10:40 AM, Jeremie CEINTREY wrote:
        > My problem is that I often see that mailq grows up to 7000 mails and it
        > takes about 2 hours to deliver those mails. I don't understand why it is
        > so slow.

        We don't have a crystal ball. Logs (and sometimes configs) are very helpful.
        --
        Sincerely,
        Nerijus Kislauskas
      • Robert Schetterer
        Message 3 of 19 , Mar 30, 2012
          On 30.03.2012 09:40, Jeremie CEINTREY wrote:
          > Hi everybody,
          >
          >
          > First, thank you for your help.
          >
          > I often encounter performance problems with postfix and mailing-list.
          > I have a relay mail postfix filtering mail traffic for about 12 000
          > mailboxes. Mailboxes are hosted on other mail servers.
          >
          > I use also amavisd-new with spammassassin and clamav.
          > Postfix version is 2.8 and i use it with postscreen configuration.
          >
          > My problem is that I often see that mailq grows up to 7000 mails and it
          > takes about 2 hours to deliver those mails. I don't understand why it is
          > so slow. When i take a look at sender with qshape -s active i see who
          > are sending mails and it is always mailling, but it is not spam traffic.
          >
          > Thank you for your help.
          >
          > Jérémie
          >
          >

          As others stated, logs and configs would be helpful.

          Have you already tried slow transports to the big mail providers?

          --
          Best Regards

          MfG Robert Schetterer

          Germany/Munich/Bavaria
        • Jeremie CEINTREY
          Message 4 of 19 , Mar 30, 2012
            ok, sorry,

            postconf -n
            alias_maps = hash:/etc/aliases
            append_dot_mydomain = no
            biff = no
            bounce_size_limit = 512000
            broken_sasl_auth_clients = yes
            command_directory = /usr/sbin
            config_directory = /etc/postfix
            content_filter = smtp-amavis:[127.0.0.1]:10024
            debug_peer_level = 2
            delay_warning_time = 4h
            disable_dns_lookups = no
            disable_vrfy_command = yes
            header_size_limit = 102400
            home_mailbox = Maildir/
            html_directory = no
            in_flow_delay = 1s
            lmtp_host_lookup = dns
            mail_name = courriel
            mail_owner = postfix
            mailq_path = /usr/bin/mailq
            manpage_directory = /usr/local/man
            maximal_backoff_time = 4000s
            message_size_limit = 10485760
            minimal_backoff_time = 300s
            mydestination = $myhostname
            mydomain = pharmagest.com
            myhostname = venus.pharmagest.com
            mynetworks = cidr:/etc/postfix/network_table.cidr
            myorigin = $mydomain
            newaliases_path = /usr/bin/newaliases
            postscreen_access_list = cidr:/etc/postfix/postscreen_access.cidr
            postscreen_bare_newline_action = ignore
            postscreen_bare_newline_enable = no
            postscreen_blacklist_action = enforce
            postscreen_dnsbl_action = enforce
            postscreen_dnsbl_sites = zen.spamhaus.org*3     dnsbl.njabl.org*2       bl.spameatingmonkey.net*2       dnsbl.ahbl.org          bl.spamcop.net          dnsbl.sorbs.net         b.barracudacentral.org*2
            postscreen_dnsbl_threshold = 3
            postscreen_greet_action = enforce
            postscreen_greet_banner = Bienvenue et merci d'attendre un moment
            postscreen_non_smtp_command_action = enforce
            postscreen_non_smtp_command_enable = yes
            postscreen_pipelining_action = enforce
            postscreen_pipelining_enable = yes
            queue_directory = /var/spool/postfix
            queue_run_delay = 300s
            readme_directory = no
            receive_override_options = no_address_mappings
            relay_domains = hash:/etc/postfix/relay_domains
            relayhost = [192.168.202.4]
            sample_directory = /etc/postfix
            sendmail_path = /usr/sbin/sendmail
            setgid_group = postdrop
            smtpd_banner = $myhostname ESMTP $mail_name
            smtpd_data_restrictions = reject_unauth_pipelining
            smtpd_helo_required = yes
            smtpd_recipient_restrictions = reject_non_fqdn_recipient, reject_unknown_recipient_domain, check_recipient_access hash:/etc/postfix/recipient_access, permit_mynetworks, reject_unauth_destination, reject_unauth_pipelining
            smtpd_sender_restrictions = reject_non_fqdn_sender, check_sender_access hash:/etc/postfix/sender_access, reject_unknown_sender_domain,  reject_invalid_hostname, permit_mynetworks, check_sender_access hash:/etc/postfix/localdomains
            smtpd_tls_auth_only = yes
            smtpd_tls_loglevel = 1
            smtpd_use_tls = no
            strict_rfc821_envelopes = yes
            transport_maps = hash:/etc/postfix/transport, hash:/etc/postfix/slow
            unknown_local_recipient_reject_code = 450

            About the logs: which logs do you want? /var/log/mail.log is huge.






          • Simone Caruso
            Message 5 of 19 , Mar 30, 2012
              >> First, thank you for your help.
              >>
              >> I often encounter performance problems with postfix and mailing-list.
              >> I have a relay mail postfix filtering mail traffic for about 12 000
              >> mailboxes. Mailboxes are hosted on other mail servers.
              >>
              >> I use also amavisd-new with spammassassin and clamav.
              >> Postfix version is 2.8 and i use it with postscreen configuration.
              >>
              >> My problem is that I often see that mailq grows up to 7000 mails and it
              >> takes about 2 hours to deliver those mails. I don't understand why it is
              >> so slow. When i take a look at sender with qshape -s active i see who
              >> are sending mails and it is always mailling, but it is not spam traffic.
              >>
              >> Thank you for your help.
              >>
              >> Jérémie
              >>
              >>

              I don't understand what is slow: your AV/AS (antivirus/antispam) filter or your smarthost/relay server?

              For amavis, try increasing the number of forked processes, and use sysstat
              (iostat, pidstat, vmstat, sar) to monitor performance (graphing helps too).

              --
              Simone Caruso
              IT Consultant
              +39 349 65 90 805
            • Ralf Hildebrandt
              Message 6 of 19 , Mar 30, 2012
                > content_filter = smtp-amavis:[127.0.0.1]:10024

                So, are the queued mails BEFORE smtp-amavis or after it?

                How many amavis processes are running? Are they all busy (use
                amavisd-nanny to find out!)

                --
                Ralf Hildebrandt
                Geschäftsbereich IT | Abteilung Netzwerk
                Charité - Universitätsmedizin Berlin
                Campus Benjamin Franklin
                Hindenburgdamm 30 | D-12203 Berlin
                Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
                ralf.hildebrandt@... | http://www.charite.de
              • Jeremie CEINTREY
                Message 7 of 19 , Mar 30, 2012
                  The mails are in the active queue.

                  Amavis processes:
                  $max_servers = 8; # 2 processes per core

                  Currently, the server is OK, not stressed at all; it is the mail relaying that is slow.

                  top command :

                  top
                  top - 11:25:59 up 10 days, 17:53,  6 users,  load average: 1,27, 0,88, 0,68
                  Tasks: 148 total,   2 running, 146 sleeping,   0 stopped,   0 zombie
                  %Cpu(s): 71,2 us,  3,2 sy,  0,0 ni, 24,3 id,  0,2 wa,  0,0 hi,  1,2 si,  0,0 st
                  Kb Mem:   2074508 total,  1828516 used,   245992 free,       76 buffers
                  Kb Swap:  1951892 total,    91196 used,  1860696 free,   808452 cached

                    PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
                  25163 amavis    20   0 83148  70m 3612 S  39,8  3,5   0:23.15 /usr/sbin/amavi
                  23990 amavis    20   0 83364  70m 3616 S  35,2  3,5   0:34.83 /usr/sbin/amavi
                  24838 amavis    20   0  230m 224m 2244 S  20,9 11,1   0:02.24 fsavd
                  24294 amavis    20   0 84804  72m 3624 R  18,3  3,6   0:28.56 /usr/sbin/amavi
                  24151 amavis    20   0  231m 225m 2276 S  13,3 11,1   0:21.65 fsavd
                  25183 amavis    20   0 82980  70m 3596 S   6,0  3,5   0:23.12 /usr/sbin/amavi
                   1857 clamav    20   0  219m 159m 6236 S   5,6  7,9 217:15.14 clamd
                   2861 bind      20   0  130m  68m 2096 S   1,7  3,4 172:15.23 named
                   1605 root      20   0 53668 1648 1340 S   0,7  0,1  11:25.61 dccifd
                   1281 syslog    20   0  2048  692  584 S   0,3  0,0  17:02.68 syslogd
                   8724 amavis    20   0  229m 224m 3348 S   0,3 11,1   0:22.29 fsavd
                  25033 postfix   20   0  6936 2668 2056 S   0,3  0,1   0:00.36 smtp
                      1 root      20   0  2244  560  540 S   0,0  0,0   0:07.27 init
                      2 root      20   0     0    0    0 S   0,0  0,0   0:00.00 kthreadd






                              

                • Robert Schetterer
                  Message 8 of 19 , Mar 30, 2012
                    On 30.03.2012 10:33, Jeremie CEINTREY wrote:
                    > About Logs, what log do you want ? /var/log/mail.log is huge

                    Grep, for example, for big mail sites that use greylisting or
                    similar (450 responses etc.) and so lead to big queues.
                    After all, you should always have some idea of what is going on
                    in your logs, even when they are big.
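One way to approach a huge mail.log without reading it line by line is to tally deferred deliveries per recipient domain. This is a minimal sketch, not a definitive tool: the function name `deferred_by_domain` is made up for illustration, and it assumes the usual Postfix delivery log format, with `to=<user@domain>` and `status=deferred` on the same line.

```shell
# Tally deferred deliveries per recipient domain.
# Reads Postfix log lines on stdin; prints "count domain", busiest first.
deferred_by_domain() {
    grep 'status=deferred' \
        | sed -n 's/.*[ ,]to=<[^@>]*@\([^>]*\)>.*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Something like `deferred_by_domain < /var/log/mail.log | head` would then list the domains with the most deferrals. The `[ ,]` before `to=<` is there so that an `orig_to=<...>` field is not picked up by mistake.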

                    --
                    Best Regards

                    MfG Robert Schetterer

                    Germany/Munich/Bavaria
                  • Ralf Hildebrandt
                    Message 9 of 19 , Mar 30, 2012
                      * Jeremie CEINTREY <jeremie.ceintrey@...>:
                      > mails are in active queue.
                      >
                      > Amavis Processes :
                      > $max_servers = 8; # 2 processes by core
                      >
                      > Actually, the server is ok, not stressed at all, the relay mail is slow.

                      What about amavisd-nanny?

                      --
                      Ralf Hildebrandt
                      Geschäftsbereich IT | Abteilung Netzwerk
                      Charité - Universitätsmedizin Berlin
                      Campus Benjamin Franklin
                      Hindenburgdamm 30 | D-12203 Berlin
                      Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
                      ralf.hildebrandt@... | http://www.charite.de
                    • Simone Caruso
                      Message 10 of 19 , Mar 30, 2012
                        On 30/03/2012 11:57, Ralf Hildebrandt wrote:
                        > * Jeremie CEINTREY <jeremie.ceintrey@...>:
                        >> mails are in active queue.
                        >>
                        >> Amavis Processes :
                        >> $max_servers = 8; # 2 processes by core
                        >>
                        >> Actually, the server is ok, not stressed at all, the relay mail is slow.
                        >
                        > What about amavisd-nanny?
                        >
                        What about network bandwidth or latency? Do you have a "local" DNS cache? What about disk I/O?

                        --
                        Simone Caruso
                        IT Consultant
                        +39 349 65 90 805
                      • Tomas Macek
                        Message 11 of 19 , Mar 30, 2012
                          On Fri, 30 Mar 2012, Jeremie CEINTREY wrote:

                          > mails are in active queue.
                          >
                          > Amavis Processes :
                          > $max_servers = 8; # 2 processes by core
                          >
                          > Actually, the server is ok, not stressed at all, the relay mail is slow.
                          >

                          What do you have for amavis in your master.cf file?

                          The master.cf option "-o max_use=<XXX>" in the amavis line must correspond
                          to the $max_servers from amavisd.conf. This was once my own bottleneck :-)

                          Best Regards, Tomas
                        • Ralf Hildebrandt
                          Message 12 of 19 , Mar 30, 2012
                            * Tomas Macek <macek@...>:
                            > On Fri, 30 Mar 2012, Jeremie CEINTREY wrote:
                            >
                            > >mails are in active queue.
                            > >
                            > >Amavis Processes :
                            > >$max_servers = 8; # 2 processes by core
                            > >
                            > >Actually, the server is ok, not stressed at all, the relay mail is slow.
                            > >
                            >
                            > What from amavis do you have in your master.cf file?
                            >
                            > The master.cf option "-o max_use=<XXX>" in the amavis line must
                            > correspond to the $max_servers from amavisd.conf. This was once my
                            > own bottleneck :-)

                            That's not correct.

                            max_use specifies how often Postfix RE-USES a service instance

                            The NUMBER (column "maxproc") should correspond to $max_servers from amavisd.conf.
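To check this on a given system, the maxproc column can be read straight out of master.cf. The sketch below rests on some assumptions: the helper name `maxproc_of` is hypothetical, the service is taken to be called `smtp-amavis` (as in the content_filter setting posted earlier) with type `unix`, and master.cf is taken to live in /etc/postfix as config_directory indicates.

```shell
# Print the maxproc column (7th field) of a master.cf service entry.
# $1 = service name, $2 = path to master.cf (defaults to /etc/postfix/master.cf).
# Continuation lines ("  -o ...") are skipped because their first field is "-o".
maxproc_of() {
    awk -v svc="$1" '$1 == svc && $2 == "unix" { print $7 }' \
        "${2:-/etc/postfix/master.cf}"
}
```

With `$max_servers = 8` in amavisd.conf, `maxproc_of smtp-amavis` would be expected to print 8. A `-` in that column means the service falls back to default_process_limit.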

                            --
                            Ralf Hildebrandt
                            Geschäftsbereich IT | Abteilung Netzwerk
                            Charité - Universitätsmedizin Berlin
                            Campus Benjamin Franklin
                            Hindenburgdamm 30 | D-12203 Berlin
                            Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
                            ralf.hildebrandt@... | http://www.charite.de
                          • Tomas Macek
                            Message 13 of 19 , Mar 30, 2012
                              On Fri, 30 Mar 2012, Ralf Hildebrandt wrote:

                              > * Tomas Macek <macek@...>:
                              >> On Fri, 30 Mar 2012, Jeremie CEINTREY wrote:
                              >>
                              >>> mails are in active queue.
                              >>>
                              >>> Amavis Processes :
                              >>> $max_servers = 8; # 2 processes by core
                              >>>
                              >>> Actually, the server is ok, not stressed at all, the relay mail is slow.
                              >>>
                              >>
                              >> What from amavis do you have in your master.cf file?
                              >>
                              >> The master.cf option "-o max_use=<XXX>" in the amavis line must
                              >> correspond to the $max_servers from amavisd.conf. This was once my
                              >> own bottleneck :-)
                              >
                              > That's not correct.
                              >
                              > max_use specifies how often Postfix RE-USES a service instance
                              >
                              > The NUMBER (column "maxproc") should correspond to $max_servers from amavisd.conf.
                              >

                              Oops! Thank you for the correction!

                              T.
                            • Stan Hoeppner
                              Message 14 of 19 , Mar 31, 2012
                                On 3/30/2012 2:40 AM, Jeremie CEINTREY wrote:

                                Due to lack of any evidence I'll have to make some coarse educated
                                guesses here.

                                > I often encounter performance problems with postfix and mailing-list.

                                Mailing list servers. That implies rapid and likely parallel delivery
                                over a short interval.

                                > I have a relay mail postfix filtering mail traffic for about 12 000 mailboxes. Mailboxes are hosted on other mail servers.
                                >
                                > I use also amavisd-new with spammassassin and clamav.
                                > Postfix version is 2.8 and i use it with postscreen configuration.

                                This is likely not the problem.

                                > My problem is that I often see that mailq grows up to 7000 mails and it takes about 2 hours to deliver those mails. I don't understand why it is so slow. When i take a look at sender with qshape -s active i see who are sending mails and it is always mailling, but it is not spam traffic.

                                Let's assume you don't have any obvious Postfix mis-configuration.

                                So let's think this through logically, step by step. The first and
                                obvious issue is that mail is coming in faster than it can go out, which
                                is why the queue is filling up. So we need to identify the cause of the
                                slow outbound queue. A couple of possible causes:

                                1. Downstream servers are limiting your delivery rate
                                2. Your storage system doesn't have enough IOPS performance to handle
                                both the incoming write load *and* outgoing read load

                                #1 should be identifiable by lots of premature disconnects and/or 4xx
                                rejections from downstream servers. If this is indeed the problem you
                                can have admins of those servers make necessary changes, or you can
                                create individual relay transports configured with delays.

                                #2 can be fixed by replacing the queue disk with a faster device with
                                more IOPS capability, either a disk with higher RPM, an SSD, or multiple
                                disks in a striped RAID configuration, accomplished either with a
                                software RAID driver or with a hardware RAID controller.

                                Both #1 and #2 can usually easily be fixed by limiting the rate of
                                incoming mail, specifically by reducing the number of allowed parallel
                                connections. Both mailing lists and bulk mailers tend to send a large
                                volume of mail in a short period of time. The sending MTAs tend to do
                                this by opening multiple connections to your MTA, Postfix in this case,
                                which allows 50 connections per client by default. To limit the number
                                of inbound parallel connections, you would change the value of the
                                following parameter

                                smtpd_client_connection_count_limit

                                from the default of 50 to something much lower, such as 1. This will
                                limit the number of connections from mailing list servers, bulk mailers,
                                and spammers, while having no negative impact on normal inbound mail
                                flow. Regular/normal mail delivery is usually accomplished with one
                                email being sent over a single connection which is then closed upon
                                successful delivery. With 50 open connections, a sending MTA such as a
                                mailing list server, could potentially send hundreds of messages per
                                second. If your disk isn't fast enough, which is likely your problem,
                                it will spend all of its IOPS writing the inbound mail to the queue,
                                which takes precedence over reads, causing read starvation and thus slow
                                delivery. This is likely why your queue piles up and takes so long to
                                drain.

                                The net effect of this is that instead of your outbound queue piling up
                                messages, these messages pile up in the outbound queue of the sending
                                MTA, allowing you to receive them at a sane rate, and preventing your
                                queue from piling up, and delaying delivery.

                                An almost identical question came up on this list about a month or so
                                ago. Setting smtpd_client_connection_count_limit=1 solved the OP's
                                queue problem. It'll probably solve yours as well.
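As a sketch of applying that setting from the command line rather than editing main.cf by hand: `postconf -e` and `postfix reload` are standard Postfix commands, while the wrapper name `set_client_connection_limit` is hypothetical and only there to keep the two steps together.

```shell
# Hypothetical helper: cap simultaneous SMTP connections per client,
# then reload Postfix so running daemons pick up the change.
set_client_connection_limit() {
    limit="${1:-1}"   # defaults to 1, the value suggested above
    postconf -e "smtpd_client_connection_count_limit = ${limit}" \
        && postfix reload
}
```

Running `postconf smtpd_client_connection_count_limit` afterwards shows the value in effect. Keep in mind that, by default, clients matching $mynetworks are exempt from these limits (smtpd_client_event_limit_exceptions), so the cap only applies to external senders.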

                                Hope this information is helpful. Please let us know if this fixes your
                                problem.

                                --
                                Stan
                              • Jeremie CEINTREY
                                Message 15 of 19 , Apr 1, 2012
                                  Thank you very much for your explanations.

                                  I'm going to test with smtpd_client_connection_count_limit = 1.

                                  Three days ago I added smtpd_client_connection_rate_limit = 10, which limits each client to 10 connections per time unit; the time unit is 60s by default.
                                  I noticed that it works well and slows down the big mailers. As you wrote, when a mailing-list campaign was in progress, I could see hundreds of mails arriving from a single domain with tail -f /var/log/mail.log | grep cleanup, for example:

                                  tail -f /var/log/mail.log | grep 'postfix/cleanup.*@domain_of_big_mailer'

                                  Still, I'm going to test with smtpd_client_connection_count_limit = 1, which is similar to the smtpd_client_connection_rate_limit and smtpd_client_message_rate_limit parameters.

                                  I will give you news in about a week.





                                  From: "Stan Hoeppner" <stan@...>
                                  To: "Jeremie CEINTREY" <jeremie.ceintrey@...>
                                  Cc: postfix-users@...
                                  Sent: Saturday, 31 March 2012 19:16:57
                                  Subject: Re: performance problems

                                  On 3/30/2012 2:40 AM, Jeremie CEINTREY wrote:

                                  Due to lack of any evidence I'll have to make some coarse educated
                                  guesses here.

                                  > I often encounter performance problems with postfix and mailing-list.

                                  Mailing list servers.  That implies rapid and likely parallel delivery
                                  over a short interval.

                                  > I have a relay mail postfix filtering mail traffic for about 12 000 mailboxes. Mailboxes are hosted on other mail servers.
                                  >
                                  > I use also amavisd-new with spammassassin and clamav.
                                  > Postfix version is 2.8 and i use it with postscreen configuration.

                                  This is likely not the problem.

                                  > My problem is that I often see that mailq grows up to 7000 mails and it takes about 2 hours to deliver those mails. I don't understand why it is so slow. When i take a look at sender with qshape -s active i see who are sending mails and it is always mailling, but it is not spam traffic.

                                  Let's assume you don't have any obvious Postfix mis-configuration.

                                  So let's think this through logically, step by step.  The first and
                                  obvious issue is that mail is coming in faster than it can go out, which
                                  is why the queue is filling up.  So we need to identify the cause of the
                                  slow outbound queue.  A couple of possible causes:

                                  1.  Downstream servers are limiting your delivery rate
                                  2.  Your storage system doesn't have enough IOPS performance to handle
                                      both the incoming write load *and* outgoing read load

                                  #1 should be identifiable by lots of premature disconnects and/or 4xx
                                  rejections from downstream servers.  If this is indeed the problem you
                                  can have admins of those servers make necessary changes, or you can
                                  create individual relay transports configured with delays.

                                  #2 can be fixed by replacing the queue disk with a faster device with
                                  more IOPS capability, either a disk with higher RPM, an SSD, or multiple
                                  disks in a striped RAID configuration, accomplished either with a
                                  software RAID driver or with a hardware RAID controller.

                                  Both #1 and #2 can usually be fixed easily by limiting the rate of
                                  incoming mail, specifically by reducing the number of allowed parallel
                                  connections.  Both mailing lists and bulk mailers tend to send a large
                                  volume of mail in a short period of time.  The sending MTAs tend to do
                                  this by opening multiple connections to your MTA, Postfix in this case,
                                  which allows 50 connections per client by default.  To limit the number
                                  of inbound parallel connections, you would change the value of the
                                  following parameter

                                  smtpd_client_connection_count_limit

                                  from the default of 50 to something much lower, such as 1.  This will
                                  limit the number of connections from mailing list servers, bulk mailers,
                                  and spammers, while having no negative impact on normal inbound mail
                                  flow.  Regular/normal mail delivery is usually accomplished with one
                                  email being sent over a single connection which is then closed upon
                                  successful delivery.  With 50 open connections, a sending MTA, such as a
                                  mailing list server, could potentially send hundreds of messages per
                                  second.  If your disk isn't fast enough, which is likely your problem,
                                  it will spend all of its IOPS writing the inbound mail to the queue,
                                  which takes precedence over reads, causing read starvation and thus slow
                                  delivery.  This is likely why your queue piles up and takes so long to
                                  drain.
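
                                  For reference, the change described above is a single line in main.cf.
                                  This is a sketch that assumes the common /etc/postfix/main.cf location;
                                  adjust the path for your system and run "postfix reload" afterwards.

                                  ```
                                  # /etc/postfix/main.cf (path is an assumption; it varies by installation)
                                  # Allow each remote SMTP client only one simultaneous connection.
                                  # Clients matching smtpd_client_event_limit_exceptions (by default,
                                  # $mynetworks) are not subject to this limit.
                                  smtpd_client_connection_count_limit = 1
                                  ```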

                                  The net effect of the lower limit is that these messages wait in the
                                  outbound queue of the sending MTA instead of piling up in yours.  You
                                  receive them at a sane rate, which keeps your queue from growing and
                                  delaying delivery.

                                  An almost identical question came up on this list about a month or so
                                  ago.  Setting smtpd_client_connection_count_limit=1 solved the OP's
                                  queue problem.  It'll probably solve yours as well.

                                  Hope this information is helpful.  Please let us know if this fixes your
                                  problem.

                                  --
                                  Stan

                                • Stan Hoeppner
                                  ... smtpd_client_connection_count_limit tends to only slow down bulk mailers and not normal non-bulk mailers, which is why I recommended it.
                                  Message 16 of 19 , Apr 2, 2012
                                    On 4/2/2012 1:51 AM, Jeremie CEINTREY wrote:
                                    > Thank you very much for your explanations.
                                    >
                                    > I'm going to test with smtpd_client_connection_count_limit = 1
                                    >
                                    > Three days ago I added smtpd_client_connection_rate_limit = 10, which limits each client to 10 connections per time unit; the time unit is 60s by default.
                                    > I noticed that it works well and slows down big mailers. As you wrote, when a mailing list campaign was in progress, I was able to see hundreds of mails arriving from a domain with tail -f /var/log/mail.log | grep cleanup
                                    >
                                    > tail -f /var/log/mail.log | grep 'postfix/cleanup.*@domain_of_big_mailer'
                                    >
                                    > Still, I'm going to test with smtpd_client_connection_count_limit = 1, which looks similar to the smtpd_client_connection_rate_limit and smtpd_client_message_(rate|count)_limit parameters.

                                    smtpd_client_connection_count_limit tends to only slow down bulk mailers
                                    and not 'normal' non-bulk mailers, which is why I recommended it.

                                    smtpd_client_connection_rate_limit and
                                    smtpd_client_message_(rate|count)_limit will delay delivery from
                                    'normal' mailers on occasion, possibly very frequently. This is a
                                    negative side effect most would want to avoid. This type of restriction
                                    should be configured only on a domain or IP subnet basis so you only
                                    affect the bulk mailers. Postfix doesn't have an inbuilt way to do so.
                                    These settings are global. Thus, if you want to use this type of rate
                                    delay you would want to use an add-on policy daemon. The policy daemon
                                    method has a downside: it requires an smtpd process for each connection
                                    to be delayed, eating extra system resources.

                                    Setting smtpd_client_connection_count_limit also sets
                                    postscreen_client_connection_count_limit if you're using postfix 2.8 and
                                    postscreen. Thus the limit is enforced before connections are handed to
                                    smtpd processes, so you don't needlessly eat up additional smtpds.

                                    Thus, it's much simpler and more effective to use
                                    smtpd_client_connection_count_limit to achieve your goal, without
                                    multiple unwanted side effects.
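
                                    To judge which clients a connection limit would affect, it can help
                                    to count smtpd connections per client in the log. This is a rough
                                    sketch that assumes the standard "connect from host[addr]" lines
                                    logged by postfix/smtpd; the function name connects_by_client is made
                                    up for illustration.

                                    ```shell
                                    # Hypothetical helper: count smtpd connections per client, busiest first.
                                    # Matches "postfix/smtpd[pid]: connect from host[addr]" lines only, so
                                    # "disconnect from" lines are not counted.
                                    connects_by_client() {
                                        grep 'postfix/smtpd\[[0-9]*]: connect from ' "$1" \
                                            | sed 's/.*: connect from //' \
                                            | sort | uniq -c | sort -rn
                                    }

                                    # Example: connects_by_client /var/log/mail.log | head
                                    ```

                                    This counts total connections over the whole log, not simultaneous
                                    ones, so treat it only as a rough indicator of who the bulk senders
                                    are.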

                                    --
                                    Stan
                                  • Wietse Venema
                                    ... Note that postscreen either blocks a client or hands it off to a Postfix SMTP server process. The connection count limit in postscreen applies only to the
                                    Message 17 of 19 , Apr 3, 2012
                                      Stan Hoeppner:
                                      > Setting smtpd_client_connection_count_limit also sets
                                      > postscreen_client_connection_count_limit if you're using postfix 2.8 and
                                      > postscreen. Thus the limit is enforced before connections are handed to
                                      > smtpd processes, so you don't needlessly eat up additional smtpds.

                                      Note that postscreen either blocks a client or hands it off to a
                                      Postfix SMTP server process. The connection count limit in postscreen
                                      applies only to the SMTP clients that are (not yet) handed off to
                                      an SMTP server process. Once the hand-off is done, postscreen does
                                      not know when an SMTP session ends, so the session no longer counts
                                      towards the postscreen connection count limit. The code was tricky
                                      enough that I did not want to introduce a postscreen-to-anvil
                                      dependency.

                                      The postscreen connection count limit is still effective for "hit
                                      and run" spambots that make a burst of connections at approximately
                                      the same time. Such clients will exceed the connection limit while
                                      waiting for the pregreet timer to expire, or for DNS[BW]L lookups
                                      to complete.

                                      Wietse
                                    • Stan Hoeppner
                                      ... Ahh, thanks for the clarification Wietse. The smtpd_client_connection_count_limit is still enforced against post hand off client connections though,
                                      Message 18 of 19 , Apr 3, 2012
                                        On 4/3/2012 10:27 AM, Wietse Venema wrote:
                                        > Stan Hoeppner:
                                        >> Setting smtpd_client_connection_count_limit also sets
                                        >> postscreen_client_connection_count_limit if you're using postfix 2.8 and
                                        >> postscreen. Thus the limit is enforced before connections are handed to
                                        >> smtpd processes, so you don't needlessly eat up additional smtpds.
                                        >
                                        > Note that postscreen either blocks a client or hands it off to a
                                        > Postfix SMTP server process. The connection count limit in postscreen
                                        > applies only to the SMTP clients that are (not yet) handed off to
                                        > an SMTP server process. Once the hand-off is done, postscreen does
                                        > not know when an SMTP session ends, so the session no longer counts
                                        > towards the postscreen connection count limit. The code was tricky
                                        > enough that I did not want to introduce a postscreen-to-anvil
                                        > dependency.

                                        Ahh, thanks for the clarification Wietse. The
                                        smtpd_client_connection_count_limit is still enforced against post hand
                                        off client connections though, correct?

                                        > The postscreen connection count limit is still effective for "hit
                                        > and run" spambots that make a burst of connections at approximately
                                        > the same time. Such clients will exceed the connection limit while
                                        > waiting for the pregreet timer to expire, or for DNS[BW]L lookups
                                        > to complete.

                                        So the postscreen connection limit is good for slowing bots, no surprise
                                        since bots are the postscreen target, but the smtpd connection limit is
                                        still appropriate/needed for slowing legit bulk mailer clients, assuming
                                        one chooses to use it vs the other anvil based restrictions.

                                        --
                                        Stan
                                      • Wietse Venema
                                        ... Correct. postscreen by design has no effect on known, non-bot, clients. Wietse
                                        Message 19 of 19 , Apr 3, 2012
                                          Stan Hoeppner:
                                          > So the postscreen connection limit is good for slowing bots, no surprise
                                          > since bots are the postscreen target, but the smtpd connection limit is
                                          > still appropriate/needed for slowing legit bulk mailer clients, assuming
                                          > one chooses to use it vs the other anvil based restrictions.

                                          Correct. postscreen by design has no effect on known, non-bot, clients.

                                          Wietse