Large incoming queue

  • Francisco Reyes
    Message 1 of 10 , Sep 29, 2006
      Postfix 2.2.10
      Scenario: two mailstore machines running courier and receiving mail from 3
      MX machines.

      Several times per week the incoming queue on two machines grows into the
      hundreds (500 to 1000 usually) and sometimes stays high for half an hour.
      Rarely it has stayed high for over 30 minutes, but it has happened. There
      were 3 SQL lookups and I have changed 2 of them to hash lookups. I plan to
      change the 3rd soon.

      Things I have tried
      Changing SQL lookups to hash lookups
      Increasing default_process_limit to 120 (will higher be better?)
      Increasing initial_destination_concurrency to 10
      Setting in_flow_delay to 10s

      The active queue rarely has much in it, usually below 50, so for some reason
      the issue seems to be only with the incoming queue (as verified with qshape).

      Looking at vmstat we see that when this happens the 'b' column seems high,
      30+ values. Several times we tried shutting down Courier IMAP for half a
      minute or so and that seemed to help.

      Are we not having enough disk speed to process the load?
      Is there any way, from the mailstore, to decrease the amount of mail
      accepted into incoming and speed up delivery?
      Would prefer to not slow down the MX machines because there are other
      mailstores that are not having the problem.

      The two mailstores in question have the heaviest domains in terms of how
      much incoming traffic they get. One has 4GB of RAM and the second has 8GB of
      RAM. Both have 8 SATA 7,200 RPM disks. 6 of them in RAID10 and 2 as hot
      spare. Both are dual CPU.

      I keep reading the cleanup and incoming queue sections of the manual, but
      have not seen much else to try... except perhaps seeing if there is a way to
      increase the number of cleanup processes (if there is even a way).

      Reading the cleanup docs I see that the only check we do related to cleanup
      is message size. Everything else is defaults.

      Any recommendations greatly appreciated.
    • Sandy Drobic
      Message 2 of 10 , Sep 29, 2006
        Francisco Reyes wrote:
        > Postfix 2.2.10
        > Scenario: two mailstore machines running courier and receiving mail from
        > 3 MX machines.
        >
        > Several times per week the incoming queue on two machines grows into
        > the hundreds (500 to 1000 usually) and sometimes stays high for half an
        > hour. Rarely it has stayed high for over 30 minutes, but it has happened.
        > There were 3 SQL lookups and I have changed 2 of them to hash lookups.
        > I plan to change the 3rd soon.
        >
        > Things I have tried
        > Changing SQL lookups to hash lookups
        > Increasing default_process_limit to 120 (will higher be better?)
        > Increasing initial_destination_concurrency to 10
        > Setting in_flow_delay to 10s
        >
        > The active queue rarely has much in it, usually below 50, so for some
        > reason the issue seems to be only with the incoming queue (as verified
        > with qshape).
        >
        > Looking at vmstat we see that when this happens the 'b' column seems
        > high, 30+ values. Several times we tried shutting down Courier IMAP for
        > half a minute or so and that seemed to help.

        If processes are in uninterruptible sleep I would suspect the RAID
        controller module in the first place.
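To put numbers on how often the 'b' column is high, the vmstat output can be logged and the column pulled out afterwards. The following is a minimal sketch, assuming whitespace-separated output with a header row that names the `r` and `b` columns; the sample text is made up for illustration and is not taken from the machines in this thread, and `blocked_counts` is a hypothetical helper name.

```python
def blocked_counts(vmstat_text):
    """Return the values of the 'b' column from vmstat-style output."""
    rows = [ln.split() for ln in vmstat_text.strip().splitlines() if ln.strip()]
    # The column header is the first row that contains both 'r' and 'b'.
    header_idx = next(i for i, cols in enumerate(rows) if "b" in cols and "r" in cols)
    b_col = rows[header_idx].index("b")
    counts = []
    for cols in rows[header_idx + 1:]:
        # Skip repeated headers or short lines; keep only numeric samples.
        if len(cols) > b_col and cols[b_col].isdigit():
            counts.append(int(cols[b_col]))
    return counts


# Made-up sample in the rough shape of FreeBSD "vmstat 5" output.
SAMPLE = """
 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr ad0   in   sy   cs us sy id
 1 2 0  812340 120344  310   0   0   0 290   0  45  980 4100 2300 12  6 82
 0 34 0 815220 118900  295   0   0   0 280   0 210 1500 5200 3100 10  9 81
 2 31 0 816004 117512  301   0   0   0 275   0 198 1450 5100 3000 11  8 81
"""

counts = blocked_counts(SAMPLE)
busy = [c for c in counts if c >= 30]  # "30+" is the level reported in the thread
print(counts, "samples at or above 30:", len(busy))
```

Correlating the timestamps of the high samples with the times the incoming queue grows would show whether the two actually move together.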

        > Are we not having enough disk speed to process the load?
        > Is there any way, from the mailstore, to decrease the amount of mail
        > accepted into incoming and speed up delivery?
        > Would prefer to not slow down the MX machines because there are other
        > mailstores that are not having the problem.

        Try to find out which process consumes most of the I/O. Is it really the
        local delivery from Postfix or is it an imap process?

        > The two mailstores in question have the heaviest domains in terms of how
        > much incoming traffic they get. One has 4GB of RAM and the second has
        > 8GB of RAM. Both have 8 SATA 7,200 RPM disks. 6 of them in RAID10 and 2
        > as hot spare. Both are dual CPU.

        Are the raid controllers equipped with battery backup units? It helps a
        lot, if the write cache is enabled on the controllers and the controller
        has lots of fast cache RAM. Often you can enable the write cache without
        a battery backup unit, though this is not recommended.

        > I keep reading the cleanup and incoming queue sections of the manual, but
        > have not seen much else to try... except perhaps seeing if there is a way
        > to increase the number of cleanup processes (if there is even a way).

        If cleanup is a bottleneck you might have lots of regexp
        header-/bodychecks. If so reduce them and change them to pcre.

        > Reading the cleanup docs I see that the only check we do related to
        > cleanup is message size. Everything else is defaults.
        >
        > Any recommendations greatly appreciated.

        Sandy
        --
        List replies only please!
        Please address PMs to: news-reply2 (@) japantest (.) homelinux (.) com
      • Francisco Reyes
        Message 3 of 10 , Sep 30, 2006
          Sandy Drobic writes:

          > Try to find out which process consumes most of the I/O. Is it really the
          > local delivery from Postfix or is it an imap process?

          Imap processes.

          > Are the raid controllers equipped with battery backup units? It helps a
          > lot, if the write cache is enabled

          It has one, and the write cache is enabled.

          > has lots of fast cache RAM.

          About 128MB of cache. It is a 3ware controller. 9500SX

          > If cleanup is a bottleneck you might have lots of regexp
          > header-/bodychecks. If so reduce them and change them to pcre.

          Only check we do at that stage is mail size.
        • Adrian Ulrich
          Message 4 of 10 , Oct 1, 2006
            > Looking at vmstat we see that when this happens the 'b' column seems high,
            > 30+ values.

            Are you using Linux or *BSD?

            Running something like iostat (Solaris /BSD?) or sysstat (Linux: http://perso.orange.fr/sebastien.godard/)
            could give you a clue about how busy your Raid-system is:

            Example output, showing a *very* busy 'sda' (svctm / %util)
            # iostat -x 5
            Device:  rrqm/s  wrqm/s     r/s   w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
            md0        0.00    0.00    0.00  0.00     0.00    0.00      0.00      0.00   0.00   0.00   0.00
            sda        4.99    0.00  153.49  1.00  6125.35    7.98     39.70      1.17   7.54   6.34  97.88


            > Are we not having enough disk speed to process the load?

            If sysstat/iostat shows high svctm/%util values: yes.
            Otherwise your SQL-Server may be the bottleneck.
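The two columns are related: as far as I understand the sysstat output, svctm is the average time in milliseconds the device spends servicing one request, and %util is the share of wall clock time the device was busy. Under that assumption, requests per second times svctm should roughly reproduce %util. A small worked check using the 'sda' figures from the example output:

```python
# Figures copied from the 'sda' line of the example output above.
reads_per_s = 153.49    # r/s
writes_per_s = 1.00     # w/s
svctm_ms = 6.34         # average service time per request, in milliseconds
reported_util = 97.88   # %util as printed by iostat

# Milliseconds of device time consumed per second of wall clock time.
busy_ms_per_s = (reads_per_s + writes_per_s) * svctm_ms

# Convert "ms busy per 1000 ms" into a percentage.
estimated_util = busy_ms_per_s / 1000.0 * 100.0

print(f"estimated %util: {estimated_util:.2f} (iostat reported {reported_util})")
```

The estimate lands within a fraction of a percent of the reported value, which is what one would expect if the disk is spending nearly all of its time servicing small requests.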



            > Is there any way, from the mailstore, to decrease the amount of mail
            > accepted into incoming and speed up delivery?

            You could try to decrease the number of smtpd processes (master.cf) or
            play with *_destination_concurrency_limit on your frontend servers.


            > The two mailstores in question have the heaviest domains in terms of how
            > much incoming traffic they get.

            How about spreading the heavy-domains across multiple backends?


            > Any recommendations greatly appreciated.

            You are using Maildir, correct?

            Regards,
            Adrian


            --
            A. Top posters
            Q. What's the most annoying thing on Usenet?
          • Sandy Drobic
            Message 5 of 10 , Oct 1, 2006
              Francisco Reyes wrote:
              > Sandy Drobic writes:
              >
              >> Try to find out which process consumes most of the I/O. Is it really
              >> the local delivery from Postfix or is it an imap process?
              >
              > Imap processes.

              In that case no fiddling with Postfix settings will alleviate the problem.

              >> Are the raid controllers equipped with battery backup units? It helps
              >> a lot, if the write cache is enabled
              >
              > It has one, and the write cache is enabled.
              >
              >> has lots of fast cache RAM.
              >
              > About 128MB of cache. It is a 3ware controller. 9500SX

              Increasing the cache size will help a bit, also adding one or two more
              hdds to the raid to spread the I/O over more hdds. How many hdds are there
              in your raid? Which raid level?

              Though I am afraid your system is just overtaxed. In that case you can
              probably only decrease the load or increase the system performance with
              better hardware.

              You need some hard figures to find out if it is sufficient to tweak your
              system or you need new hardware. Try "sar" on Linux to get a continuous
              measure of your system load.

              Sandy

              --
              List replies only please!
              Please address PMs to: news-reply2 (@) japantest (.) homelinux (.) com
            • Francisco Reyes
              Message 6 of 10 , Oct 2, 2006
                Adrian Ulrich writes:

                > Are you using Linux or *BSD?

                FreeBSD 6.1

                > Running something like iostat (Solaris /BSD?) or sysstat (Linux: http://perso.orange.fr/sebastien.godard/)
                > could give you a clue about how busy your Raid-system is:

                Have used iostat and I see lots of small transactions.

                > If sysstat/iostat shows high svctm/%util

                What do you mean by "svctm/%util"?


                > Otherwise your SQL-Server may be the bottleneck.

                Eliminated 2 of the 3 SQL lookups and plan to remove the 3rd today.

                > You could try to decrease the number of smtpd processes (master.cf) or
                > play with *_destination_concurrency_limit on your frontend servers.

                Ideally I would like to decrease incoming mails only to these two machines.
                Other machines are fine.


                > How about spreading the heavy-domains across multiple backends?

                We have 4 mailstores. These two have the better hardware so we moved the
                heaviest smtp traffic domains to these two machines. I am still trying to
                convince the powers that be.. to get a SCSI setup for these domains.

                > You are using Maildir, correct?

                Correct. Maildir run off Courier IMAP, serving imap and pop3.
              • Victor Duchovni
                Message 7 of 10 , Oct 2, 2006
                  On Mon, Oct 02, 2006 at 10:55:10AM -0400, Francisco Reyes wrote:

                  > Adrian Ulrich writes:
                  >
                  > >Are you using Linux or *BSD?
                  >
                  > FreeBSD 6.1
                  >
                  > >Running something like iostat (Solaris /BSD?) or sysstat (Linux:
                  > >http://perso.orange.fr/sebastien.godard/)
                  > >could give you a clue about how busy your Raid-system is:
                  >
                  > Have used iostat and I see lots of small transactions.

                  Is syslogd writing synchronously to disk? If so, disable that behaviour.

                  --
                  Viktor.

                  Disclaimer: off-list followups get on-list replies or get ignored.
                  Please do not ignore the "Reply-To" header.

                  To unsubscribe from the postfix-users list, visit
                  http://www.postfix.org/lists.html or click the link below:
                  <mailto:majordomo@...?body=unsubscribe%20postfix-users>

                  If my response solves your problem, the best way to thank me is to not
                  send an "it worked, thanks" follow-up. If you must respond, please put
                  "It worked, thanks" in the "Subject" so I can delete these quickly.
                • Francisco Reyes
                  Message 8 of 10 , Oct 2, 2006
                    Sandy Drobic writes:

                    >> Imap processes.
                    >
                    > In that case no fiddling with Postfix settings will alleviate the problem.

                    So far eliminating two of the 3 SQL lookups helped.
                    Also moved IMAP connections to a gigabit switch... so hopefully IMAP
                    connections will get served faster.

                    > Increasing the cache size will help a bit

                    Don't think these controllers can take any more memory.

                    >, also adding one or two more
                    > hdds to the raid to spread the I/O over more hdds. How many hdds are there
                    > in your raid? Which raid level?

                    8 disks total: 6 in RAID 10 and 2 as hot spares. I wanted to get all 8 in
                    the RAID, but the owner preferred to keep two as hot spares.
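As a rough feel for what the two extra disks might buy, here is a back-of-envelope sketch. It assumes the usual RAID 10 rule of thumb (reads can be served by any disk, every write goes to both halves of a mirror pair) and an assumed figure of about 80 small random I/Os per second for a 7,200 RPM SATA disk. Neither number was measured on these machines, and controller cache effects are ignored.

```python
# Assumption (not from the thread): a 7,200 RPM SATA disk sustains very
# roughly 80 small random I/Os per second.
ASSUMED_IOPS_PER_DISK = 80


def raid10_random_iops(disks, iops_per_disk=ASSUMED_IOPS_PER_DISK):
    """Back-of-envelope RAID 10 estimate: reads can use every disk,
    while each write must land on both halves of a mirror pair."""
    if disks < 4 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks (at least 4)")
    return {"read": disks * iops_per_disk, "write": (disks // 2) * iops_per_disk}


six = raid10_random_iops(6)    # current layout: 6 disks in the array
eight = raid10_random_iops(8)  # hypothetical: both spares added to the array
gain = eight["write"] / six["write"] - 1
print(six, eight, f"write gain: {gain:.0%}")
```

Whatever per-disk figure is assumed, the relative gain from 6 to 8 disks stays at about one third, so it would help but would not change the picture if the array is badly overloaded.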

                    > Though I am afraid your system is just overtaxed. In that case you can
                    > probably only decrease the load or increase the system performance with
                    > better hardware.

                    Thanks for the help. After a couple of weeks of looking into this I was
                    starting to get that impression. Waiting to see how the last changes help
                    the system. Also plan to remove the last SQL lookup today.

                    > You need some hard figures to find out if it is sufficient to tweak your
                    > system or you need new hardware.

                    Started to work on that too. Basically all the heavy SMTP domains were moved
                    to these machines and I think we need to move those domains to a new machine
                    with a SCSI setup.

                    We are also considering a move to Dovecot. As customers let email pile up
                    in directories, Courier takes longer and longer to open large folders.

                    Our early testing with Dovecot looks very promising. The first time a
                    directory is read it takes long, but it is still quicker than Courier, and
                    after that access is through index files, which seems noticeably faster
                    than Courier. Hopefully a change in IMAP server will decrease I/O so there
                    will be more resources for Postfix.
                  • Wietse Venema
                    Message 9 of 10 , Oct 2, 2006
                      Francisco Reyes:
                      > Adrian Ulrich writes:
                      >
                      > > Are you using Linux or *BSD?
                      >
                      > FreeBSD 6.1
                      >
                      > > Running something like iostat (Solaris /BSD?) or sysstat (Linux: http://perso.orange.fr/sebastien.godard/)
                      > > could give you a clue about how busy your Raid-system is:
                      >
                      > Have used iostat and I see lots of small transactions.

                      On FreeBSD, this gives cpu, memory, disk utilization and more:

                      systat -vmstat

                      > > Otherwise your SQL-Server may be the bottleneck.
                      >
                      > Eliminated 2 of the 3 SQL lookups and plan to remove the 3rd today.
                      >
                      > > You could try to decrease the number of smtpd processes (master.cf) or
                      > > play with *_destination_concurrency_limit on your frontend servers.
                      >
                      > Ideally I would like to decrease incoming mails only to these two machines.
                      > Other machines are fine.

                      If none of cpu/memory/disk are saturated, then it's likely waiting
                      for things going across the network.

                      "netstat -I" can identify problems with collisions or bad packets;
                      "netstat -s" can help to identify trouble higher up the network
                      stack.

                      Wietse
                      >
                      > > How about spreading the heavy-domains across multiple backends?
                      >
                      > We have 4 mailstores. These two have the better hardware so we moved the
                      > heaviest smtp traffic domains to these two machines. I am still trying to
                      > convince the powers that be.. to get a SCSI setup for these domains.
                      >
                      > > You are using Maildir, correct?
                      >
                      > Correct. Maildir run off Courier IMAP, serving imap and pop3.
                      >
                      >
                    • Adrian Ulrich
                      Message 10 of 10 , Oct 2, 2006
                        Hi,

                        > Have used iostat and I see lots of small transactions.

                        It's a mail server after all ;-)


                        > What do you mean by "svctm/%util"?

                        Service-Time + Percent-Busy

                        I don't have access to a FreeBSD host at the moment, but the output of
                        Solaris "iostat -xnz 5" looks like this (output on FreeBSD should be
                        similar):

                         r/s  w/s    kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
                        20.6  8.9  1281.1  53.1   1.8   0.3    62.1    11.2   9  21  c0d0

                        If %b (percent busy) is above roughly 70%, you are having speed issues
                        with your storage system.
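That rule of thumb is easy to apply to logged output. A minimal sketch, assuming whitespace-separated iostat output with a header row that contains `%b` and `device` as in the sample above; `busy_devices` is a hypothetical helper, and the second data line in the sample is invented purely to show a device over the threshold.

```python
BUSY_THRESHOLD = 70.0  # percent; the rough figure suggested above


def busy_devices(iostat_text, threshold=BUSY_THRESHOLD):
    """Return {device: %b} for devices whose %b exceeds the threshold."""
    header = None
    flagged = {}
    for line in iostat_text.strip().splitlines():
        cols = line.split()
        if "%b" in cols and "device" in cols:
            header = cols          # remember the column layout
            continue
        if header is None or len(cols) != len(header):
            continue               # skip banners and partial lines
        row = dict(zip(header, cols))
        try:
            pct_busy = float(row["%b"])
        except ValueError:
            continue               # not a data line after all
        if pct_busy > threshold:
            flagged[row["device"]] = pct_busy
    return flagged


# First data line is the sample quoted above; the second is made up
# to show what a saturated device would look like.
SAMPLE = """
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
20.6 8.9 1281.1 53.1 1.8 0.3 62.1 11.2 9 21 c0d0
310.2 45.0 9800.4 610.7 4.2 1.9 13.5 6.1 30 93 c0d1
"""

print(busy_devices(SAMPLE))
```

Because columns are looked up by header name, the same function should cope with other iostat layouts as long as the header uses the same `%b` and `device` labels; FreeBSD's column names may differ and would need adjusting.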


                        > Ideally I would like to decrease incoming mails only to these two machines.
                        > Other machines are fine.

                        Limit the number of smtpd processes (= incoming mail) on these two hosts.


                        > I am still trying to
                        > convince the powers that be.. to get a SCSI setup for these domains.

                        How about a 'shared storage' for all 4 Backends?

                        <ADV>
                        I'm a very happy NetApp(.com) customer: we are using a bunch
                        of FAS270 boxes for mail queues and have never had any (storage-related)
                        speed issues. They can do NFS, FCP and iSCSI (urgs), provide RAID4 and
                        RAID4-DP (think 'RAID6' -> double parity), and let you create
                        snapshots (without any performance issues, unlike
                        ${everyone_else's_solution}), etc.
                        </ADV>



                        > > You are using Maildir, correct?
                        > Correct. Maildir run off Courier IMAP, serving imap and pop3.

                        So switching to Dovecot might help: IMAP with dovecot seems to be
                        pretty fast.

                        Regards,
                        Adrian