filling incoming queue

  • Pavel Urban
    Message 1 of 15 , Feb 29, 2008
      Hello,

      I have a fairly busy mail server (~2 million mails per day) and it has
      trouble handling the 'incoming' queue.

      Incoming: 1028210
      Active: 177
      Deferred: 43310

      RedHat Linux 4, Postfix 2.4.7, dual Opteron, 4GB RAM

      'top' says:

      top - 09:35:42 up 218 days, 18:58, 4 users, load average: 6.95, 7.45, 8.29
      Tasks: 616 total, 2 running, 614 sleeping, 0 stopped, 0 zombie
      Cpu0 : 2.6% us, 3.3% sy, 0.0% ni, 87.9% id, 4.9% wa, 0.3% hi, 1.0% si
      Cpu1 : 2.0% us, 2.3% sy, 0.0% ni, 92.5% id, 2.9% wa, 0.0% hi, 0.3% si
      Cpu2 : 2.0% us, 3.6% sy, 0.0% ni, 1.0% id, 92.8% wa, 0.3% hi, 0.3% si
      Cpu3 : 1.3% us, 3.0% sy, 0.0% ni, 89.4% id, 6.3% wa, 0.0% hi, 0.0% si
      Mem: 4045708k total, 4019720k used, 25988k free, 328260k buffers
      Swap: 2096472k total, 139116k used, 1957356k free, 1975056k cached

      PID USER PR NI %CPU TIME+ %MEM VIRT RES SHR S COMMAND
      11758 clamav 15 0 2 766:32.75 6.8 503m 267m 1064 S clamd
      17158 postfix 16 0 2 2:46.95 0.1 29272 5760 1596 D qmgr
      30955 root 16 0 2 0:00.18 0.0 6552 1548 848 R top
      18241 postfix 16 0 1 0:57.27 0.1 26552 2908 1568 S trivial-rewrite
      17155 root 16 0 1 1:56.64 0.0 25292 1716 1316 S master
      2496 root 15 0 1 931:03.77 0.0 0 0 0 D kjournald
      3350 root 16 0 1 806:46.89 0.0 7256 600 316 S syslog-ng
      17241 postfix 16 0 1 0:21.37 0.1 26468 2832 1568 S trivial-rewrite
      18014 postfix 16 0 1 0:27.82 0.1 26340 2704 1568 S trivial-rewrite
      28503 postfix 16 0 1 0:00.33 0.1 29048 3740 2436 S smtpd
      29360 postfix 16 0 1 0:00.15 0.1 29032 3600 2316 S smtpd
      30360 postfix 15 0 1 0:00.13 0.1 29020 3564 2316 S smtpd
      27979 clamav 15 0 0 63:58.47 0.1 650m 4200 344 S clamsmtpd
      28694 postfix 16 0 0 0:00.19 0.1 29036 3752 2436 S smtpd
      28829 postfix 16 0 0 0:00.33 0.1 26332 3120 2192 S smtpd
      29480 postfix 16 0 0 0:00.10 0.1 29020 3568 2316 S smtpd
      29626 postfix 16 0 0 0:00.16 0.1 26092 2464 1776 S cleanup

      It seems to me that qmgr is unable to keep up with the rest of the
      system. The incoming queue is full of mode 0700 files. header_checks
      and body_checks contain only a few trivial regexps. I'm using
      clamsmtpd+clamd on this server; a second, nearly identical server
      uses no content_filter at all, but suffers from the same problem.
      I don't see any warning/fatal/panic messages apart from the smtpd
      process limit warning. Any suggestions? I'm getting desperate :-(
      Thank you!
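
A quick way to check the "mode 0700 files" observation is to tally queue files by their permission bits. According to the Postfix QSHAPE_README, cleanup creates new queue files with mode 0600 and switches them to 0700 once they are complete, so a large pile of 0700 files would be finished messages waiting for the queue manager. The following is a minimal Python sketch; the function name `count_by_mode` and the choice to walk the whole (hashed) directory tree are assumptions made for illustration, not part of Postfix.

```python
import os
import stat
from collections import Counter


def count_by_mode(queue_dir):
    """Count regular files under queue_dir, grouped by permission bits.

    Returns a dict such as {"0700": 12, "0600": 3}. The tree is walked
    recursively because hash_queue_names may spread files over subdirectories.
    """
    counts = Counter()
    for root, _dirs, files in os.walk(queue_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                mode = stat.S_IMODE(os.lstat(path).st_mode)
            except FileNotFoundError:
                # The file was moved or removed while we were scanning.
                continue
            counts[format(mode, "04o")] += 1
    return dict(counts)
```

It could be pointed at the incoming directory below the `queue_directory` shown in the postconf output, for example `count_by_mode("/var/spool/postfix/incoming")`.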


      [root@relay2new 0]# postconf -n
      alias_database = hash:/etc/postfix/aliases
      alias_maps = hash:/etc/postfix/aliases
      biff = no
      body_checks = regexp:/etc/postfix/body_checks
      bounce_size_limit = 50000
      command_directory = /usr/sbin
      config_directory = /etc/postfix
      content_filter = smtp-amavis:[127.0.0.1]:10026
      daemon_directory = /usr/libexec/postfix
      debug_peer_level = 2
      default_destination_recipient_limit = 1000
      default_process_limit = 500
      defer_transports = etrn-only
      delay_warning_time = 8h
      disable_vrfy_command = yes
      duplicate_filter_limit = 1000
      fast_flush_domains = $relay_domains
      hash_queue_depth = 2
      hash_queue_names = deferred defer active incoming
      header_checks = pcre:/etc/postfix/header_checks
      html_directory = /usr/share/doc/postfix-2.4.7-documentation/html
      in_flow_delay = 10s
      inet_interfaces = all
      initial_destination_concurrency = 10
      mail_owner = postfix
      mailq_path = /usr/bin/mailq.postfix
      manpage_directory = /usr/share/man
      maximal_queue_lifetime = 5
      message_size_limit = 13721600
      mydestination = $myhostname, localhost.$mydomain, mailrelay.iol.cz
      myhostname = relay.iol.cz
      mynetworks = /etc/postfix/network_table
      newaliases_path = /usr/bin/newaliases.postfix
      proxy_interfaces = 194.228.41.114
      qmgr_message_active_limit = 900000
      qmgr_message_recipient_limit = 900000
      queue_directory = /var/spool/postfix
      queue_minfree = 20582400
      readme_directory = /usr/share/doc/postfix-2.4.7-documentation/readme
      recipient_canonical_maps = hash:/etc/postfix/recipient_canonical
      relay_domains = $mydestination hash:/etc/postfix/relay_domains
      sample_directory = /usr/share/doc/postfix-2.4.7-documentation/examples
      sendmail_path = /usr/sbin/sendmail.postfix
      setgid_group = postdrop
      smtp_connect_timeout = 15s
      smtp_data_done_timeout = 90s
      smtp_data_init_timeout = 60s
      smtp_discard_ehlo_keyword_address_maps = hash:/etc/postfix/ehlo_keywords
      smtp_helo_timeout = 90s
      smtp_mail_timeout = 90s
      smtp_quit_timeout = 60s
      smtp_rcpt_timeout = 90s
      smtpd_banner = $myhostname ESMTP $mail_name ($mail_version)
      smtpd_client_event_limit_exceptions = 10.0.0.0/8 192.168.0.0/16
      172.23.52.0/24
      smtpd_client_restrictions = hash:/etc/postfix/access_client
      permit_mynetworks reject_unauth_pipelining
      smtpd_enforce_tls = no
      smtpd_etrn_restrictions = check_policy_service inet:127.0.0.1:9998
      smtpd_hard_error_limit = 5
      smtpd_helo_required = no
      smtpd_helo_restrictions =
      smtpd_recipient_limit = 1000
      smtpd_recipient_restrictions = hash:/etc/postfix/spec_domains
      hash:/etc/postfix/access_to reject_non_fqdn_recipient
      reject_unknown_recipient_domain check_sender_access
      hash:/etc/postfix/freemail_access check_helo_access
      hash:/etc/postfix/helo_checks permit_mynetworks permit_mx_backup
      reject_unauth_destination
      smtpd_restriction_classes = from_freemail_host
      smtpd_sender_restrictions = reject_non_fqdn_sender
      reject_unknown_sender_domain hash:/etc/postfix/access_from
      smtpd_starttls_timeout = 300s
      smtpd_tls_CApath = /usr/share/ssl/certs
      smtpd_tls_cert_file = /etc/postfix/tls/relay.iol.cz.2007.crt
      smtpd_tls_key_file = /etc/postfix/tls/relay.iol.cz.key
      smtpd_tls_loglevel = 1
      smtpd_tls_session_cache_timeout = 3600s
      smtpd_use_tls = yes
      tls_random_source = dev:/dev/urandom
      transport_maps = hash:/etc/postfix/transport-cluster
      unknown_local_recipient_reject_code = 550

      [root@relay2new 0]# cat /etc/postfix/master.cf
      #
      # Postfix master process configuration file. For details on the format
      # of the file, see the Postfix master(5) manual page.
      #
      # ==========================================================================
      # service type private unpriv chroot wakeup maxproc command + args
      # (yes) (yes) (yes) (never) (100)
      # ==========================================================================
      #smtp inet n - n - 400 smtpd
      #smtp inet n - n - 1000 smtpd
      #submission inet n - n - - smtpd
      # -o smtpd_etrn_restrictions=reject
      # -o smtpd_client_restrictions=permit_sasl_authenticated,reject

      127.0.0.1:9998 inet n n n - 0 spawn
      user=nobody argv=/usr/local/bin/etrn-relay.pl
      jendva unix - - n - 2 smtp
      -o smtp_destination_concurrency_limit=2
      #smtps inet n - n - - smtpd
      # -o smtpd_tls_wrappermode=yes -o smtpd_sasl_auth_enable=yes
      #submission inet n - n - - smtpd
      # -o smtpd_etrn_restrictions=reject
      # -o smtpd_enforce_tls=yes -o smtpd_sasl_auth_enable=yes
      #628 inet n - n - - qmqpd
      pickup fifo n - n 60 1 pickup
      cleanup unix n - n - 0 cleanup
      qmgr fifo n - n 300 1 qmgr
      #qmgr fifo n - n 300 1 oqmgr
      tlsmgr unix - - n 1000? 1 tlsmgr
      rewrite unix - - n - - trivial-rewrite
      bounce unix - - n - 0 bounce
      defer unix - - n - 0 bounce
      trace unix - - n - 0 bounce
      verify unix - - n - 1 verify
      flush unix n - n 1000? 0 flush
      proxymap unix - - n - - proxymap
      smtp unix - - n - - smtp
      # When relaying mail as backup MX, disable fallback_relay to avoid MX loops
      relay unix - - n - - smtp
      -o fallback_relay=
      etrn-only unix - - n - - smtp

      # -o smtp_helo_timeout=5 -o smtp_connect_timeout=5
      showq unix n - n - - showq
      error unix - - n - - error
      discard unix - - n - - discard
      local unix - n n - - local
      virtual unix - n n - - virtual
      lmtp unix - - n - - lmtp
      anvil unix - - n - 1 anvil
      scache unix - - n - 1 scache
      #
      # ====================================================================
      # Interfaces to non-Postfix software. Be sure to examine the manual
      # pages of the non-Postfix software to find out what options it wants.
      #
      # Many of the following services use the Postfix pipe(8) delivery
      # agent. See the pipe(8) man page for information about ${recipient}
      # and other message envelope options.
      # ====================================================================
      #
      # maildrop. See the Postfix MAILDROP_README file for details.
      # Also specify in main.cf: maildrop_destination_recipient_limit=1
      #
      maildrop unix - n n - - pipe
      flags=DRhu user=vmail argv=/usr/local/bin/maildrop -d ${recipient}
      #
      # The Cyrus deliver program has changed incompatibly, multiple times.
      #
      old-cyrus unix - n n - - pipe
      flags=R user=cyrus argv=/usr/lib/cyrus-imapd/deliver -e -m
      ${extension} ${user}
      # Cyrus 2.1.5 (Amos Gouaux)
      # Also specify in main.cf: cyrus_destination_recipient_limit=1
      cyrus unix - n n - - pipe
      user=cyrus argv=/usr/lib/cyrus-imapd/deliver -e -r ${sender} -m
      ${extension} ${user}
      #
      # See the Postfix UUCP_README file for configuration details.
      #
      uucp unix - n n - - pipe
      flags=Fqhu user=uucp argv=uux -r -n -z -a$sender - $nexthop!rmail
      ($recipient)
      #
      # Other external delivery methods.
      #
      ifmail unix - n n - - pipe
      flags=F user=ftn argv=/usr/lib/ifmail/ifmail -r $nexthop ($recipient)
      bsmtp unix - n n - - pipe
      flags=Fq. user=foo argv=/usr/local/sbin/bsmtp -f $sender $nexthop
      $recipient
      retry unix - - n - - error

      #if replacing default instance, (un)comment following lines

      smtp inet n - n - 200 smtpd
      ##
      ## AmaViS
      ##
      smtp-amavis unix - - n - 64 smtp
      -o smtp_send_xforward_command=yes
      -o disable_dns_lookups=yes
      -o max_use=20
      127.0.0.1:10025 inet n - n - 130 smtpd
      -o content_filter=
      -o smtpd_restriction_classes=
      -o smtpd_delay_reject=no
      -o smtpd_client_restrictions=permit_mynetworks,reject
      -o smtpd_helo_restrictions=
      -o smtpd_sender_restrictions=
      -o smtpd_recipient_restrictions=permit_mynetworks,reject
      -o smtpd_data_restrictions=reject_unauth_pipelining
      -o smtpd_end_of_data_restrictions=
      -o mynetworks=127.0.0.0/8
      -o smtpd_error_sleep_time=0
      -o smtpd_soft_error_limit=1001
      -o smtpd_hard_error_limit=1000
      -o smtpd_client_connection_count_limit=0
      -o smtpd_client_connection_rate_limit=0
      -o smtpd_milters=
      -o local_header_rewrite_clients=
      -o local_recipient_maps=
      -o relay_recipient_maps=
      -o receive_override_options=no_header_body_checks,no_unknown_recipient_checks


      --
      ***********************************************************************
      Pavel Urban (pavel.urban@...)
      O2 system disaster
      Telefonica O2 Czech Republic, a.s. - www.cz.o2.com
      ***********************************************************************
      Vegetables should not operate electronic equipment.
      Computer Stupidities, http://rinkworks.com/stupid/
      ***********************************************************************
    • MailingListe
      Message 2 of 15 , Feb 29, 2008
        Quoting Pavel Urban <urbanp@...>:

        > Hello,
        >
        > I have quite busy mail server (~2mio mails per day) and it has trouble
        > handling 'incoming' queue.
        >
        > Incoming: 1028210
        > Active: 177
        > Deferred: 43310
        >
        > RedHat Linux 4, Postfix 2.4.7, dual Opteron. 4GB RAM
        >
        > 'top' says:
        >
        > top - 09:35:42 up 218 days, 18:58, 4 users, load average: 6.95, 7.45, 8.29
        > Tasks: 616 total, 2 running, 614 sleeping, 0 stopped, 0 zombie
        > Cpu0 : 2.6% us, 3.3% sy, 0.0% ni, 87.9% id, 4.9% wa, 0.3% hi, 1.0% si
        > Cpu1 : 2.0% us, 2.3% sy, 0.0% ni, 92.5% id, 2.9% wa, 0.0% hi, 0.3% si
        > Cpu2 : 2.0% us, 3.6% sy, 0.0% ni, 1.0% id, 92.8% wa, 0.3% hi, 0.3% si
        > Cpu3 : 1.3% us, 3.0% sy, 0.0% ni, 89.4% id, 6.3% wa, 0.0% hi, 0.0% si
        > Mem: 4045708k total, 4019720k used, 25988k free, 328260k buffers
        > Swap: 2096472k total, 139116k used, 1957356k free, 1975056k cached
        >
        > PID USER PR NI %CPU TIME+ %MEM VIRT RES SHR S COMMAND
        > 11758 clamav 15 0 2 766:32.75 6.8 503m 267m 1064 S clamd
        > 17158 postfix 16 0 2 2:46.95 0.1 29272 5760 1596 D qmgr
        > 30955 root 16 0 2 0:00.18 0.0 6552 1548 848 R top
        > 18241 postfix 16 0 1 0:57.27 0.1 26552 2908 1568 S trivial-rewrite
        > 17155 root 16 0 1 1:56.64 0.0 25292 1716 1316 S master
        > 2496 root 15 0 1 931:03.77 0.0 0 0 0 D kjournald
        > 3350 root 16 0 1 806:46.89 0.0 7256 600 316 S syslog-ng
        > 17241 postfix 16 0 1 0:21.37 0.1 26468 2832 1568 S trivial-rewrite
        > 18014 postfix 16 0 1 0:27.82 0.1 26340 2704 1568 S trivial-rewrite
        > 28503 postfix 16 0 1 0:00.33 0.1 29048 3740 2436 S smtpd
        > 29360 postfix 16 0 1 0:00.15 0.1 29032 3600 2316 S smtpd
        > 30360 postfix 15 0 1 0:00.13 0.1 29020 3564 2316 S smtpd
        > 27979 clamav 15 0 0 63:58.47 0.1 650m 4200 344 S clamsmtpd
        > 28694 postfix 16 0 0 0:00.19 0.1 29036 3752 2436 S smtpd
        > 28829 postfix 16 0 0 0:00.33 0.1 26332 3120 2192 S smtpd
        > 29480 postfix 16 0 0 0:00.10 0.1 29020 3568 2316 S smtpd
        > 29626 postfix 16 0 0 0:00.16 0.1 26092 2464 1776 S cleanup
        >
        > It seems to me like qmgr is unable to keep up with the rest of the
        > system. Incoming queue is full of 0700 mode files. header and
        > body_checks contain just several trivial regexp-es. I'm using
        > clamsmtpd+clamd on this server, the second nearly similar one is not
        > using any content_filter at all, but suffers from this similar problem.
        > I don't see any warnings/fatal/panic messages except of smtpd process
        > limit. Any suggestions? I'm getting desperate :-( Thank you!

        At first glance I would say your I/O is saturated (see 92.8% wa
        for Cpu2). The reason could be that the storage is simply too slow
        for massive random I/O, or that some other task is doing massive
        I/O on that machine. Some things to check:

        - What filesystem is the queue on? With 1 million files in
        incoming this could get you into trouble. The high TIME value for
        kjournald could point in that direction.
        - Check whether syslog-ng is doing buffered logging. If it is not,
        syslog may be saturating the I/O.
        - The memory usage of clamd is somewhat high, but I don't use clamd,
        so I can't really judge this.
        - The limit of 64 processes for the amavis feed may be too high, but
        I don't know what you are doing with it.
        - The deferred count is high. This should not be the case for an
        inbound relay. Why does Postfix have to defer the messages?
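
The I/O-wait reading above comes from the per-CPU lines of 'top'. As a small illustration, the Python sketch below pulls the "wa" percentage out of lines in the format shown in this thread. The function names and the 50% threshold are assumptions chosen for the example; other versions of top print these lines differently, so the pattern would need adjusting.

```python
import re

# Matches per-CPU lines such as:
# "Cpu2 : 2.0% us, 3.6% sy, 0.0% ni, 1.0% id, 92.8% wa, 0.3% hi, 0.3% si"
CPU_LINE = re.compile(r"^\s*(Cpu\d+)\s*:.*,\s*([\d.]+)%\s*wa")


def iowait_by_cpu(top_output):
    """Return {cpu_name: iowait_percent} for every per-CPU line found."""
    result = {}
    for line in top_output.splitlines():
        match = CPU_LINE.match(line)
        if match:
            result[match.group(1)] = float(match.group(2))
    return result


def cpus_waiting_on_io(top_output, threshold=50.0):
    """List the CPUs whose iowait exceeds threshold (an arbitrary cut-off)."""
    waits = iowait_by_cpu(top_output)
    return sorted(cpu for cpu, wa in waits.items() if wa > threshold)
```

Fed the four Cpu lines from the original post, `cpus_waiting_on_io` would single out Cpu2, the one CPU stuck almost entirely in I/O wait.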

        Regards

        Andreas

        --
        All your trash belong to us ;-) www.spamschlucker.org
        To: stephan@...
      • MailingListe
        Message 3 of 15 , Feb 29, 2008
          Quoting Pavel Urban <urbanp@...>:

          > Hello,
          >
          > I have quite busy mail server (~2mio mails per day) and it has trouble
          > handling 'incoming' queue.
          >
          > Incoming: 1028210
          > Active: 177
          > Deferred: 43310
          >

          Additionally have a look at

          http://www.postfix.org/QSHAPE_README.html#incoming_queue

          Regards

          Andreas


          --
          All your trash belong to us ;-) www.spamschlucker.org
          To: stephan@...
        • Victor Duchovni
          Message 4 of 15 , Feb 29, 2008
            On Fri, Feb 29, 2008 at 01:31:38PM +0100, MailingListe wrote:

            > Quoting Pavel Urban <urbanp@...>:
            >
            > >Hello,
            > >
            > >I have quite busy mail server (~2mio mails per day) and it has trouble
            > >handling 'incoming' queue.
            > >
            > >Incoming: 1028210
            > >Active: 177
            > >Deferred: 43310
            > >
            >
            > Additionally have a look at
            >
            > http://www.postfix.org/QSHAPE_README.html#incoming_queue
            >

            It is important to keep in mind that frequent "reload" operations may
            also result in saturated "incoming" queues on systems with a lot of mail.

            Every time the queue manager is restarted, it moves all mail from "active"
            to "incoming" to reprocess the queue. If it is restarted again shortly
            afterwards, it never makes any progress.

            So a key question here is how long the queue manager gets to run before
            someone (or a cron job?) reloads Postfix. Logs showing either stability
            or instability of the queue manager "pid" over time would be most helpful.
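
One way to produce the evidence asked for here is to scan the mail log and record each point where the qmgr process id changes. The Python sketch below does that; it assumes the classic syslog layout seen elsewhere in this thread (a 15-character timestamp followed by a `postfix/qmgr[pid]:` tag), and the function name is made up for the example.

```python
import re

# Process tag as written by Postfix to syslog, e.g. "postfix/qmgr[17158]:".
QMGR_TAG = re.compile(r"postfix/qmgr\[(\d+)\]")


def qmgr_pid_changes(lines):
    """Return (timestamp, pid) for the first log line of each new qmgr pid."""
    changes = []
    last_pid = None
    for line in lines:
        match = QMGR_TAG.search(line)
        if match is None:
            continue
        pid = int(match.group(1))
        if pid != last_pid:
            # Classic syslog timestamps fill the first 15 characters,
            # e.g. "Feb 29 07:27:33".
            changes.append((line[:15], pid))
            last_pid = pid
    return changes
```

Opening the log and passing the file object, as in `qmgr_pid_changes(open("/var/log/maillog"))`, yields one entry per queue manager start. Entries hours or days apart suggest a stable qmgr; entries minutes apart would match the reload problem described above.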

            --
            Viktor.

            Disclaimer: off-list followups get on-list replies or get ignored.
            Please do not ignore the "Reply-To" header.

            To unsubscribe from the postfix-users list, visit
            http://www.postfix.org/lists.html or click the link below:
            <mailto:majordomo@...?body=unsubscribe%20postfix-users>

            If my response solves your problem, the best way to thank me is to not
            send an "it worked, thanks" follow-up. If you must respond, please put
            "It worked, thanks" in the "Subject" so I can delete these quickly.
          • Victor Duchovni
            Message 5 of 15 , Feb 29, 2008
              On Fri, Feb 29, 2008 at 10:46:45AM -0500, Victor Duchovni wrote:

              > On Fri, Feb 29, 2008 at 01:31:38PM +0100, MailingListe wrote:
              >
              > > Quoting Pavel Urban <urbanp@...>:
              > >
              > > >Hello,
              > > >
              > > >I have quite busy mail server (~2mio mails per day) and it has trouble
              > > >handling 'incoming' queue.
              > > >
              > > >Incoming: 1028210
              > > >Active: 177
              > > >Deferred: 43310
              > > >
              > >
              > > Additionally have a look at
              > >
              > > http://www.postfix.org/QSHAPE_README.html#incoming_queue
              > >
              >
              > It is important to keep in mind that frequent "reload" operations may
              > also result in saturated "incoming" queues on systems with a lot of mail.
              >
              > Every time the queue manager is restarted, it moves all mail from "active"
              > to "incoming" to reprocess the queue, if it is restarted again shortly
              > after it never makes any progress.
              >
              > So a key question here is how long the queue manager gets to run before
              > someone (or a cron job?) reloads Postfix. Logs showing either stability
              > or instability of the queue manager "pid" over time would be most helpful.

              This is covered in the last paragraph of:

              http://www.postfix.org/QSHAPE_README.html#active_queue

              --
              Viktor.

            • Pavel Urban
              Message 6 of 15 , Feb 29, 2008
                Victor Duchovni wrote:
                > On Fri, Feb 29, 2008 at 01:31:38PM +0100, MailingListe wrote:
                >
                >> Quoting Pavel Urban <urbanp@...>:
                >>
                >>> Hello,
                >>>
                >>> I have quite busy mail server (~2mio mails per day) and it has trouble
                >>> handling 'incoming' queue.
                >>>
                >>> Incoming: 1028210
                >>> Active: 177
                >>> Deferred: 43310
                >>>
                >> Additionally have a look at
                >>
                >> http://www.postfix.org/QSHAPE_README.html#incoming_queue
                >>
                >
                > It is important to keep in mind that frequent "reload" operations may
                > also result in saturated "incoming" queues on systems with a lot of mail.
                >
                > Every time the queue manager is restarted, it moves all mail from "active"
                > to "incoming" to reprocess the queue, if it is restarted again shortly
                > after it never makes any progress.
                >
                > So a key question here is how long the queue manager gets to run before
                > someone (or a cron job?) reloads Postfix. Logs showing either stability
                > or instability of the queue manager "pid" over time would be most helpful.
                >

                Thanks for the tip, but that doesn't seem to be the case:

                [root@relay2new ~]# perl -ne 'if (/qmgr\[(\d+)\]/) {print if $last != $1; $last=$1;}' </var/log/maillog
                Feb 26 12:49:33 relay2new postfix/qmgr[30697]: CF10C78C1DA: from=<>, size=3596, nrcpt=1 (queue active)
                Feb 27 14:09:50 relay2new postfix/qmgr[15012]: CCF6CF2D982: from=<bichromate@...>, size=3436, nrcpt=20 (queue active)
                Feb 28 06:40:53 relay2new postfix/qmgr[23354]: CCF2C808A8C: from=<jay6santisuk78@...>, size=1048, nrcpt=1 (queue active)
                Feb 28 08:23:15 relay2new postfix/qmgr[32517]: CC49F7BD1BC: from=<byung-uk@...>, size=2346, nrcpt=1 (queue active)
                Feb 28 12:17:39 relay2new postfix/qmgr[29852]: CC9C57B928E: from=<yeoman@...>, size=13049, nrcpt=1 (queue active)
                Feb 29 07:27:33 relay2new postfix/qmgr[17158]: CCE867AFA0B: from=<elknarf@...>, size=1816, nrcpt=1 (queue active)
              • Victor Duchovni
                Message 7 of 15 , Feb 29, 2008
                  On Fri, Feb 29, 2008 at 05:42:52PM +0100, Pavel Urban wrote:

                  > >So a key question here is how long the queue manager gets to run before
                  > >someone (or a cron job?) reloads Postfix. Logs showing either stability
                  > >or instability of the queue manager "pid" over time would be most helpful.
                  > >
                  >
                  > Thanks for the tip, but it doesn't seem to be the case
                  >
                  > [root@relay2new ~]# perl -ne 'if (/qmgr\[(\d+)\]/) {print if $last !=
                  > $1; $last=$1;}' </var/log/maillog
                  > Feb 26 12:49:33 relay2new postfix/qmgr[30697]: CF10C78C1DA: from=<>,
                  > size=3596, nrcpt=1 (queue active)
                  > Feb 27 14:09:50 relay2new postfix/qmgr[15012]: CCF6CF2D982:
                  > from=<bichromate@...>, size=3436, nrcpt=20 (queue active)
                  > Feb 28 06:40:53 relay2new postfix/qmgr[23354]: CCF2C808A8C:
                  > from=<jay6santisuk78@...>, size=1048, nrcpt=1 (queue active)
                  > Feb 28 08:23:15 relay2new postfix/qmgr[32517]: CC49F7BD1BC:
                  > from=<byung-uk@...>, size=2346, nrcpt=1 (queue active)
                  > Feb 28 12:17:39 relay2new postfix/qmgr[29852]: CC9C57B928E:
                  > from=<yeoman@...>, size=13049, nrcpt=1 (queue active)
                  > Feb 29 07:27:33 relay2new postfix/qmgr[17158]: CCE867AFA0B:
                  > from=<elknarf@...>, size=1816, nrcpt=1 (queue active)

                  With that out of the way, what is the output rate? (qmgr[pid]: qid:
                  removed log entries per minute). What is the input rate? (cleanup[pid]:
                  qid: message-id=<...> entries per minute)?
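
The two rates asked for here can be read straight from the mail log: each accepted message produces one `cleanup[pid]: qid: message-id=` line, and each finished message produces one `qmgr[pid]: qid: removed` line. The Python sketch below counts both per minute. It assumes classic syslog timestamps and the uppercase hexadecimal queue ids seen in this thread; the function name is illustrative only.

```python
import re

# One line per message entering the queue (input side).
CLEANUP = re.compile(r"postfix/cleanup\[\d+\]: [0-9A-F]+: message-id=")
# One line per message leaving the queue (output side).
REMOVED = re.compile(r"postfix/qmgr\[\d+\]: [0-9A-F]+: removed")


def rates_per_minute(lines):
    """Return {minute: (messages_in, messages_out)} in order of first appearance."""
    rates = {}
    for line in lines:
        if CLEANUP.search(line):
            index = 0
        elif REMOVED.search(line):
            index = 1
        else:
            continue
        # "Feb 29 07:27:33" truncated to the minute: "Feb 29 07:27".
        counts = rates.setdefault(line[:12], [0, 0])
        counts[index] += 1
    return {minute: tuple(counts) for minute, counts in rates.items()}
```

If the input count stays above the output count minute after minute, the incoming queue can only grow, whatever the underlying cause turns out to be.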

                  Inject a new message into the queue and find its queue id in the logs.
                  What is the current time (from date(1)), and what is the timestamp of
                  the queue file in the incoming queue (from "ls -l")?
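
The comparison between date(1) and "ls -l" can also be done in one step by subtracting the file's modification time from the current time. A minimal Python sketch follows; the function name and the optional `now` parameter are assumptions added for the example. A negative result would mean the file is stamped in the future relative to the local clock.

```python
import os
import time


def queue_file_age(path, now=None):
    """Seconds elapsed between the file's mtime and `now` (default: current time).

    The mtime is the same timestamp that "ls -l" displays.
    """
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime
```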

                  What table mechanisms are used for address class and transport lookups?

                  --
                  Viktor.

                • Wietse Venema
                  Message 8 of 15 , Feb 29, 2008
                    Pavel Urban:
                    > Feb 26 12:49:33 relay2new postfix/qmgr[30697]: CF10C78C1DA: from=<>,
                    > Feb 27 14:09:50 relay2new postfix/qmgr[15012]: CCF6CF2D982:
                    > Feb 28 06:40:53 relay2new postfix/qmgr[23354]: CCF2C808A8C:
                    > Feb 28 08:23:15 relay2new postfix/qmgr[32517]: CC49F7BD1BC:
                    > Feb 28 12:17:39 relay2new postfix/qmgr[29852]: CC9C57B928E:
                    > Feb 29 07:27:33 relay2new postfix/qmgr[17158]: CCE867AFA0B:

                    Why is the qmgr restarted in the first place? Does it abort with
                    fatal errors?

                    Wietse
                  • Pavel Urban
                    Message 9 of 15 , Feb 29, 2008
                      Wietse Venema wrote:
                      > Pavel Urban:
                      >> Feb 26 12:49:33 relay2new postfix/qmgr[30697]: CF10C78C1DA: from=<>,
                      >> Feb 27 14:09:50 relay2new postfix/qmgr[15012]: CCF6CF2D982:
                      >> Feb 28 06:40:53 relay2new postfix/qmgr[23354]: CCF2C808A8C:
                      >> Feb 28 08:23:15 relay2new postfix/qmgr[32517]: CC49F7BD1BC:
                      >> Feb 28 12:17:39 relay2new postfix/qmgr[29852]: CC9C57B928E:
                      >> Feb 29 07:27:33 relay2new postfix/qmgr[17158]: CCE867AFA0B:
                      >
                      > Why is the qmgr restarted in the first place? Does it abort with
                      > fatal errors?
                      >
                      > Wietse
                      >

                      No, these are reloads due to configuration changes. We've tried to tune
                      the system, but weren't very successful :-(

                      --
                      ***********************************************************************
                      Pavel Urban (pavel.urban (at) o2.com)
                      O2 system disaster
                      Telefonica O2 Czech Republic, a.s. - www.cz.o2.com
                      ***********************************************************************
                      Vegetables should not operate electronic equipment.
                      Computer Stupidities, http://rinkworks.com/stupid/
                      ***********************************************************************
                    • Wietse Venema
                      Message 10 of 15 , Feb 29, 2008
                        Pavel Urban:
                        > Wietse Venema wrote:
                        > > Pavel Urban:
                        > >> Feb 26 12:49:33 relay2new postfix/qmgr[30697]: CF10C78C1DA: from=<>,
                        > >> Feb 27 14:09:50 relay2new postfix/qmgr[15012]: CCF6CF2D982:
                        > >> Feb 28 06:40:53 relay2new postfix/qmgr[23354]: CCF2C808A8C:
                        > >> Feb 28 08:23:15 relay2new postfix/qmgr[32517]: CC49F7BD1BC:
                        > >> Feb 28 12:17:39 relay2new postfix/qmgr[29852]: CC9C57B928E:
                        > >> Feb 29 07:27:33 relay2new postfix/qmgr[17158]: CCE867AFA0B:
                        > >
                        > > Why is the qmgr restarted in the first place? Does it abort with
                        > > fatal errors?
                        > >
                        > > Wietse
                        > >
                        >
                        > No, these are reloads due to configuration changes. We've tried to tune
                        > the system, but weren't very successful :-(

                        With 1 million messages in the incoming queue and only 177 messages
                        in the active queue, the queue manager is not succeeding in finding
                        mail in the queue.

                        Is your mail queue mounted from a file server? Are the clocks of the
                        Postfix machine and of the file server synchronized? Postfix will
                        try to compensate, but older versions have fewer workarounds.
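
If the queue does live on a file server whose clock runs ahead, one visible symptom would be queue files stamped later than the Postfix host's own clock. The Python sketch below lists such files. The function name, the `tolerance` parameter and the recursive walk are assumptions for illustration. It only reports timestamps and makes no claim about how any particular Postfix version treats these files.

```python
import os
import time


def future_stamped_files(queue_dir, now=None, tolerance=0.0):
    """Return sorted paths under queue_dir whose mtime is later than now + tolerance."""
    if now is None:
        now = time.time()
    found = []
    for root, _dirs, files in os.walk(queue_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtime = os.stat(path).st_mtime
            except FileNotFoundError:
                # The file was moved or removed while we were scanning.
                continue
            if mtime > now + tolerance:
                found.append(path)
    return sorted(found)
```

An empty result for the incoming directory would make clock skew an unlikely explanation. A long list would be worth reporting back to the list together with the output of date(1) on both machines.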

                        Wietse
                      • Pavel Urban
                        Message 11 of 15 , Feb 29, 2008
                          Victor Duchovni wrote:
                          > On Fri, Feb 29, 2008 at 05:42:52PM +0100, Pavel Urban wrote:
                          >
                          >>> So a key question here is how long the queue manager gets to run before
                          >>> someone (or a cron job?) reloads Postfix. Logs showing either stability
                          >>> or instability of the queue manager "pid" over time would be most helpful.
                          >>>
                          >> Thanks for the tip, but it doesn't seem to be the case
                          >>
                          >> [root@relay2new ~]# perl -ne 'if (/qmgr\[(\d+)\]/) {print if $last !=
                          >> $1; $last=$1;}' </var/log/maillog
                          >> Feb 26 12:49:33 relay2new postfix/qmgr[30697]: CF10C78C1DA: from=<>,
                          >> size=3596, nrcpt=1 (queue active)
                          >> Feb 27 14:09:50 relay2new postfix/qmgr[15012]: CCF6CF2D982:
                          >> from=<bichromate@...>, size=3436, nrcpt=20 (queue active)
                          >> Feb 28 06:40:53 relay2new postfix/qmgr[23354]: CCF2C808A8C:
                          >> from=<jay6santisuk78@...>, size=1048, nrcpt=1 (queue active)
                          >> Feb 28 08:23:15 relay2new postfix/qmgr[32517]: CC49F7BD1BC:
                          >> from=<byung-uk@...>, size=2346, nrcpt=1 (queue active)
                          >> Feb 28 12:17:39 relay2new postfix/qmgr[29852]: CC9C57B928E:
                          >> from=<yeoman@...>, size=13049, nrcpt=1 (queue active)
                          >> Feb 29 07:27:33 relay2new postfix/qmgr[17158]: CCE867AFA0B:
                          >> from=<elknarf@...>, size=1816, nrcpt=1 (queue active)
                          >
                          > With that out of the way, what is the output rate? (qmgr[pid]: qid:
                          > removed log entries per minute). What is the input rate? (cleanup[pid]:
                          > qid: message-id=<...> entries per minute)?
                          >
                          > Inject a new message into the queue. Find its queue id in the logs. What
                          > is the current time (from date(1)) and the timestamp of the queue file
                          > in the incoming queue (from "ls -l").
                          >
                          > What table mechanisms are used for address class and transport lookups?
                          >

                          [root@relay2new skripty]# tail -1000000 /var/log/maillog |./analyzuj.pl
                          last minute cleanup count: 632
                          last minute qmgr count: 401
                          last minute qmgr count: 829
                          last minute cleanup count: 1248
                          last minute cleanup count: 1254
                          last minute qmgr count: 819
                          last minute cleanup count: 1250
                          last minute qmgr count: 832
                          last minute qmgr count: 677
                          last minute cleanup count: 1093
                          last minute qmgr count: 807
                          last minute cleanup count: 1237
                          last minute cleanup count: 1210
                          last minute qmgr count: 854
                          last minute cleanup count: 1080
                          last minute qmgr count: 711
                          last minute qmgr count: 686
                          last minute cleanup count: 1083
                          last minute cleanup count: 1182
                          last minute qmgr count: 732
                          last minute qmgr count: 844
                          last minute cleanup count: 1305
                          last minute cleanup count: 1332
                          last minute qmgr count: 866
                          last minute qmgr count: 855
                          last minute cleanup count: 1300
                          last minute cleanup count: 1235
                          last minute qmgr count: 852
                          last minute cleanup count: 1137
                          last minute qmgr count: 699
                          last minute cleanup count: 1191
                          last minute qmgr count: 784
                          last minute cleanup count: 1180
                          last minute qmgr count: 721
                          last minute cleanup count: 1273
                          last minute qmgr count: 733
                          last minute cleanup count: 1222
                          last minute qmgr count: 708
                          last minute qmgr count: 876
                          last minute cleanup count: 1337
                          last minute qmgr count: 933
                          last minute cleanup count: 1469
                          last minute cleanup count: 1333
                          last minute qmgr count: 913
                          last minute qmgr count: 924
                          last minute cleanup count: 1281
                          last minute cleanup count: 1387
                          last minute qmgr count: 954
                          last minute qmgr count: 757
                          last minute cleanup count: 1222
                          last minute cleanup count: 1143
                          last minute qmgr count: 699
                          last minute cleanup count: 1143
                          last minute qmgr count: 690
                          last minute cleanup count: 1168
                          last minute qmgr count: 705
                          last minute qmgr count: 671
                          last minute cleanup count: 1171
                          last minute qmgr count: 631
                          last minute cleanup count: 1114
                          last minute cleanup count: 1173
                          last minute qmgr count: 670
                          last minute qmgr count: 728
                          last minute cleanup count: 1171
                          last minute cleanup count: 1169
                          last minute qmgr count: 693
                          last minute cleanup count: 1109
                          last minute qmgr count: 613
                          last minute cleanup count: 1049
                          last minute qmgr count: 640
                          last minute qmgr count: 677
                          last minute cleanup count: 1121
                          last minute cleanup count: 1126
                          last minute qmgr count: 638
                          last minute cleanup count: 1160
                          last minute qmgr count: 668
                          last minute qmgr count: 683
                          last minute cleanup count: 1159
                          last minute cleanup count: 1149
                          last minute qmgr count: 636
                          last minute cleanup count: 1215
                          last minute qmgr count: 626
                          last minute qmgr count: 1185
                          last minute cleanup count: 1584
                          last minute cleanup count: 1665
                          last minute qmgr count: 1236
                          last minute cleanup count: 1188
                          last minute qmgr count: 713
                          last minute cleanup count: 1247
                          last minute qmgr count: 817
                          last minute cleanup count: 1171
                          last minute qmgr count: 769
                          last minute cleanup count: 1227
                          last minute qmgr count: 762
                          last minute cleanup count: 1143
                          last minute qmgr count: 652
                          last minute qmgr count: 784
                          last minute cleanup count: 1175
                          last minute cleanup count: 1390
                          last minute qmgr count: 974
                          last minute qmgr count: 1097
                          last minute cleanup count: 1439
                          last minute cleanup count: 1576
                          last minute qmgr count: 1213
                          last minute cleanup count: 1526
                          last minute qmgr count: 1279
                          last minute qmgr count: 1168
                          last minute cleanup count: 1480
                          last minute cleanup count: 1639
                          last minute qmgr count: 1279
                          last minute cleanup count: 1665
                          last minute qmgr count: 1341
                          last minute cleanup count: 1449
                          last minute qmgr count: 1145
                          last minute qmgr count: 1181
                          last minute cleanup count: 1522
                          last minute qmgr count: 1351
                          last minute cleanup count: 1748
                          last minute cleanup count: 1405
                          last minute qmgr count: 1090
                          last minute qmgr count: 1033
                          last minute cleanup count: 1391
                          last minute qmgr count: 1074
                          last minute cleanup count: 1451
                          last minute cleanup count: 1395
                          last minute qmgr count: 1112
                          last minute cleanup count: 1389
                          last minute qmgr count: 1096
                          last minute qmgr count: 1021
                          last minute cleanup count: 1371
                          last minute cleanup count: 1462
                          last minute qmgr count: 1132
                          last minute qmgr count: 1453
                          last minute cleanup count: 1758
                          last minute cleanup count: 1449
                          last minute qmgr count: 997
                          last minute cleanup count: 1448
                          last minute qmgr count: 1139
                          last minute cleanup count: 1397
                          last minute qmgr count: 1055
                          last minute cleanup count: 1728
                          last minute qmgr count: 1520
                          last minute qmgr count: 1592
                          last minute cleanup count: 1790
                          last minute qmgr count: 1076
                          last minute cleanup count: 1256
                          last minute cleanup count: 1449
                          last minute qmgr count: 1288
                          last minute qmgr count: 1136
                          last minute cleanup count: 1315
                          last minute cleanup count: 1695
                          last minute qmgr count: 1404
                          last minute cleanup count: 1783
                          last minute qmgr count: 1476
                          last minute cleanup count: 1822
                          last minute qmgr count: 1561
                          last minute qmgr count: 1626
                          last minute cleanup count: 1841
                          last minute qmgr count: 1610
                          last minute cleanup count: 1874
                          last minute qmgr count: 1413
                          last minute cleanup count: 1676
                          last minute cleanup count: 1848
                          last minute qmgr count: 1616
                          last minute qmgr count: 1645
                          last minute cleanup count: 1979
                          last minute cleanup count: 1861
                          last minute qmgr count: 1598
                          last minute cleanup count: 1532
                          last minute qmgr count: 1245
                          last minute cleanup count: 1530
                          last minute qmgr count: 1100
                          last minute cleanup count: 1703
                          last minute qmgr count: 1332
                          last minute cleanup count: 1777
                          last minute qmgr count: 1501
                          last minute cleanup count: 1745
                          last minute qmgr count: 1527
                          last minute qmgr count: 1518
                          last minute cleanup count: 1766
                          last minute qmgr count: 1387
                          last minute cleanup count: 1552
                          last minute qmgr count: 1200
                          last minute cleanup count: 1461


                          [root@relay2new ~]# ls -l /var/spool/postfix/incoming/7/3/7349D7E7B3B
                          -rwx------ 1 postfix postfix 2748 Feb 29 19:54 /var/spool/postfix/incoming/7/3/7349D7E7B3B
                          [root@relay2new ~]# date
                          Fri Feb 29 19:54:57 CET 2008

                          lookups are just hashed tables (.db), no ldap, no mysql.

                          --
                          ***********************************************************************
                          Pavel Urban (pavel.urban (at) o2.com)
                          O2 system disaster
                          Telefonica O2 Czech Republic, a.s. - www.cz.o2.com
                          ***********************************************************************
                          Vegetables should not operate electronic equipment.
                          Computer Stupidities, http://rinkworks.com/stupid/
                          ***********************************************************************
                        • Victor Duchovni
                          Message 12 of 15 , Feb 29, 2008
                            On Fri, Feb 29, 2008 at 08:00:54PM +0100, Pavel Urban wrote:

                            > [root@relay2new skripty]# tail -1000000 /var/log/maillog |./analyzuj.pl

                            It would be better to print the cleanup and qmgr counts for any given
                            minute on the same line, much easier to understand, but in aggregate,
                            you have delivered ~93,000 messages and received ~127,000, so the queue
                            is now 34,000 messages longer than 1,000,000 log entries ago.
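
A minimal sketch of the same-line report suggested above, assuming syslog lines shaped like the ones quoted in this thread ("postfix/cleanup[pid]: qid: message-id=<...>" for input and "postfix/qmgr[pid]: qid: removed" for output). The function names are hypothetical and this is not the poster's analyzuj.pl script.

```python
import re
from collections import defaultdict

# Matches e.g. "Feb 29 19:54:57 relay2new postfix/qmgr[17158]: 7349D7E7B3B: removed"
# Group 1 is the timestamp truncated to the minute, group 2 the daemon name,
# group 3 the text that follows the queue id.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d):\d\d\s+\S+\s+postfix/(cleanup|qmgr)\[\d+\]:\s+"
    r"[0-9A-F]+:\s+(.*)$"
)


def per_minute_counts(lines):
    """Return {minute: [received, removed]} in the order minutes first appear."""
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        minute, daemon, rest = m.groups()
        if daemon == "cleanup" and rest.startswith("message-id="):
            counts[minute][0] += 1  # input: one entry per message received
        elif daemon == "qmgr" and rest.startswith("removed"):
            counts[minute][1] += 1  # output: one entry per message leaving the queue
    return dict(counts)


def format_report(counts):
    """One line per minute, with input, output and net queue growth side by side."""
    return [
        f"{minute} in={received} out={removed} net={received - removed:+d}"
        for minute, (received, removed) in counts.items()
    ]
```

To try it on the real log, something like `print("\n".join(format_report(per_minute_counts(open("/var/log/maillog")))))` would do. A positive "net" value in most minutes means the queue is growing.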

                            To drain the queue, clearly the output rate needs to exceed the input
                            rate. Normally, qmgr is able to fill the active queue as fast as mail
                            comes in and the incoming queue is nearly empty. In your case incoming
                            queue scans are slower than the arrival rate, but deliveries from the
                            active queue are fast.

                            Is the queue on a very slow disk? Have you tried mounting with "noatime"?

                            >
                            > [root@relay2new ~]# ls -l /var/spool/postfix/incoming/7/3/7349D7E7B3B
                            > -rwx------ 1 postfix postfix 2748 Feb 29 19:54
                            > /var/spool/postfix/incoming/7/3/7349D7E7B3B
                            > [root@relay2new ~]# date
                            > Fri Feb 29 19:54:57 CET 2008

                            So timestamps don't appear to be the problem.
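
The check above (file timestamp from "ls -l" versus the output of date) can be sketched as a small helper. This is only an illustration; the function name is hypothetical, and it assumes the script runs on the Postfix machine so that "now" comes from the local clock.

```python
import os
import time


def mtime_skew_seconds(path, now=None):
    """Return local time minus the file's modification time, in seconds.

    For a queue file that was just written, a value near zero means the
    file system's timestamps agree with the local clock. A large positive
    or negative value suggests the clocks disagree.
    """
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime
```

For a freshly injected message, calling `mtime_skew_seconds()` on its file under /var/spool/postfix/incoming should return a few seconds at most, as it did in the listing quoted above.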

                            > lookups are just hashed tables (.db), no ldap, no mysql.

                            Why is the queue manager so I/O starved? Perhaps the disk is totally
                            saturated by input processing, if so, reduce the input concurrency
                            until the queue manager catches up. The work-load may be too big for
                            the hardware. Consider RAID controllers with battery caches, ...

                            --
                            Viktor.

                            Disclaimer: off-list followups get on-list replies or get ignored.
                            Please do not ignore the "Reply-To" header.

                            To unsubscribe from the postfix-users list, visit
                            http://www.postfix.org/lists.html or click the link below:
                            <mailto:majordomo@...?body=unsubscribe%20postfix-users>

                            If my response solves your problem, the best way to thank me is to not
                            send an "it worked, thanks" follow-up. If you must respond, please put
                            "It worked, thanks" in the "Subject" so I can delete these quickly.
                          • Pavel Urban
                            Message 13 of 15 , Feb 29, 2008
                              Victor Duchovni wrote:
                              > On Fri, Feb 29, 2008 at 08:00:54PM +0100, Pavel Urban wrote:
                              >
                              >> [root@relay2new skripty]# tail -1000000 /var/log/maillog |./analyzuj.pl
                              >
                              > It would be better to print the cleanup and qmgr counts for any given
                              > minute on the same line, much easier to understand, but in aggregate,
                              > you have delivered ~93,000 messages and received ~127,000, so the queue
                              > is now 34,000 messages longer than 1,000,000 log entries ago.
                              >
                              > To drain the queue, clearly the output rate needs to exceed the input
                              > rate. Normally, qmgr is able to fill the active queue as fast as mail
                              > comes in and the incoming queue is nearly empty. In your case incoming
                              > queue scans are slower than the arrival rate, but deliveries from the
                              > active queue are fast.
                              >
                              > Is the queue on a very slow disk? Have you tried mounting with "noatime"?
                              >
                              >> [root@relay2new ~]# ls -l /var/spool/postfix/incoming/7/3/7349D7E7B3B
                              >> -rwx------ 1 postfix postfix 2748 Feb 29 19:54
                              >> /var/spool/postfix/incoming/7/3/7349D7E7B3B
                              >> [root@relay2new ~]# date
                              >> Fri Feb 29 19:54:57 CET 2008
                              >
                              > So timestamps don't appear to be the problem.
                              >
                              >> lookups are just hashed tables (.db), no ldap, no mysql.
                              >
                              > Why is the queue manager so I/O starved? Perhaps the disk is totally
                              > saturated by input processing, if so, reduce the input concurrency
                              > until the queue manager catches up. The work-load may be too big for
                              > the hardware. Consider RAID controllers with battery caches, ...
                              >

                              /dev/sdb1 on /var/spool type ext3 (rw,noatime)

                              On the second disk, it is even

                              /dev/sdb1 on /var/spool type ext3 (rw,noatime,data=writeback)

                              . It is HW RAID1, 10 and 15k disks. You are confirming my initial
                              suspicion - it is not enough and I cannot do anything else on an
                              application level. I shall try some benchmarking on a backup server and
                              then perhaps SAN with bigger FibreChannel disks and cache.

                              Thanks a lot!

                            • Victor Duchovni
                              Message 14 of 15 , Feb 29, 2008
                                On Fri, Feb 29, 2008 at 08:32:05PM +0100, Pavel Urban wrote:

                                > Victor Duchovni wrote:
                                > >On Fri, Feb 29, 2008 at 08:00:54PM +0100, Pavel Urban wrote:
                                > >
                                > >>[root@relay2new skripty]# tail -1000000 /var/log/maillog |./analyzuj.pl
                                > >
                                > >It would be better to print the cleanup and qmgr counts for any given
                                > >minute on the same line, much easier to understand, but in aggregate,
                                > >you have delivered ~93,000 messages and received ~127,000, so the queue
                                > >is now 34,000 messages longer than 1,000,000 log entries ago.
                                > >
                                > >To drain the queue, clearly the output rate needs to exceed the input
                                > >rate. Normally, qmgr is able to fill the active queue as fast as mail
                                > >comes in and the incoming queue is nearly empty. In your case incoming
                                > >queue scans are slower than the arrival rate, but deliveries from the
                                > >active queue are fast.
                                > >
                                > >Is the queue on a very slow disk? Have you tried mounting with "noatime"?
                                > >
                                > >>[root@relay2new ~]# ls -l /var/spool/postfix/incoming/7/3/7349D7E7B3B
                                > >>-rwx------ 1 postfix postfix 2748 Feb 29 19:54
                                > >>/var/spool/postfix/incoming/7/3/7349D7E7B3B
                                > >>[root@relay2new ~]# date
                                > >>Fri Feb 29 19:54:57 CET 2008
                                > >
                                > >So timestamps don't appear to be the problem.
                                > >
                                > >>lookups are just hashed tables (.db), no ldap, no mysql.
                                > >
                                > >Why is the queue manager so I/O starved? Perhaps the disk is totally
                                > >saturated by input processing, if so, reduce the input concurrency
                                > >until the queue manager catches up. The work-load may be too big for
                                > >the hardware. Consider RAID controllers with battery caches, ...
                                > >
                                >
                                > /dev/sdb1 on /var/spool type ext3 (rw,noatime)
                                >
                                > On the second disk, it is even
                                >
                                > /dev/sdb1 on /var/spool type ext3 (rw,noatime,data=writeback)
                                >
                                > . It is HW RAID1, 10 and 15k disks. You are confirming my initial
                                > suspicion - it is not enough and I cannot do anything else on an
                                > application level. I shall try some benchmarking on a backup server and
                                > then perhaps SAN with bigger FibreChannel disks and cache.
                                >

                                SAN could be slower, Postfix probably cares more about latency than
                                bandwidth. Is the SAN caching on the controller inside your machine
                                or on the controller in the array? (More likely the latter).

                                In any case, I get ~200 msgs/sec on Dell 2850 class hardware with Battery
                                RAID cache (onboard no SAN) enabled. That's 12,000 per minute, and you
                                are topping out at 1/12th of that, so either the NVRAM cache is disabled
                                or too small.
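
The arithmetic behind that comparison, written out as a small sketch. The 200 msgs/sec figure and the 1/12 ratio are the ones stated above; the variable names are only illustrative.

```python
SECONDS_PER_MINUTE = 60

# Reference throughput quoted above for Dell 2850 class hardware.
reference_per_second = 200
reference_per_minute = reference_per_second * SECONDS_PER_MINUTE

# "Topping out at 1/12th of that" corresponds to this many messages per minute.
observed_peak_per_minute = reference_per_minute / 12

print(reference_per_minute, observed_peak_per_minute)
```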

                                Your volume of deferred mail also seems high. Are you scanning the
                                deferred queue too often?

                                --
                                Viktor.

                              • Pavel Urban
                                Message 15 of 15 , Mar 1, 2008
                                  Victor Duchovni wrote:
                                  > On Fri, Feb 29, 2008 at 08:32:05PM +0100, Pavel Urban wrote:
                                  >
                                  > SAN could be slower, Postfix probably cares more about latency than
                                  > bandwidth. Is the SAN caching on the controller inside your machine
                                  > or on the controller in the array? (More likely the latter).
                                  >
                                  > In any case, I get ~200 msgs/sec on Dell 2850 class hardware with Battery
                                  > RAID cache (onboard no SAN) enabled. That's 12,000 per minute, and you
                                  > are topping out at 1/12th of that, so either the NVRAM cache is disabled
                                  > or too small.
                                  >
                                  > Your volume of deferred mail also seems high. Are you scanning the
                                  > deferred queue too often?
                                  >

                                  I think it will make a big difference. These are Sun X4200s; they are
                                  meant to be connected to a SAN or to an external array - they have SAS
                                  disks and some highly suspicious onboard RAID controller. I hope a 2Gbit
                                  FC infrastructure and FC arrays (yes, with cache on the array's
                                  controller, but anyway) should be enough.

                                  I haven't changed the queue scanning interval.
