Re: R: Scheduling policies for outgoing smtp server

  • Wietse Venema
    Message 1 of 17 , Apr 8 10:31 AM
      Giorgio Luchi:
      > I agree with you and with the sentence "Postfix is designed to
      > work on a mail queue of any size. This is possible because Postfix
      > works with a fixed memory budget.", and we don't want to break the
      > architecture.

      Good. The use of unbounded memory came up recently in a different
      Postfix context.

      > What I'm asking at this stage is whether the scheduling
      > algorithm we are requesting could be a good choice for an "only
      > outgoing/relay" SMTP server.
      > I know that the instruction "foreach transport's job (round-robin
      > by ip or auth-user)" is too superficial to address the problem
      > completely but, if there is interest, I'm sure that we could find
      > a way to solve the issue.

      That solution would have no "foreach job in the queue".

      If Postfix has room in memory for N job entries, and some customer
      sends 10*N, then Postfix will not look at any other customer's mail
      for some amount of time. This is because by design Postfix cannot
      know what other jobs exist in the mail queue.
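
The starvation effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `fifo_admission_order` and the slot count `N = 4` are hypothetical, and the model ignores everything about Postfix except "a bounded number of in-memory slots, filled in FIFO order".

```python
from collections import deque


def fifo_admission_order(incoming, active_slots):
    """Admit messages in FIFO order, at most `active_slots` at a time."""
    queue = deque(incoming)
    batches = []
    while queue:
        size = min(active_slots, len(queue))
        batches.append([queue.popleft() for _ in range(size)])
    return batches


N = 4  # hypothetical number of in-memory job slots
# Customer A queues 10*N messages, then customer B sends a single message.
incoming = ["A"] * (10 * N) + ["B"]

batches = fifo_admission_order(incoming, N)
first_b = next(i for i, batch in enumerate(batches) if "B" in batch)
print(f"customer B is first admitted in batch {first_b} of {len(batches)}")
```

In this toy model customer B waits until all ten of customer A's batches have passed through the bounded window, which is the delay the paragraph above describes.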

      > I've seen the ideas behind "dead destination", "preemption", "dealing
      > with memory resource limits"; similar approaches could be applied to our
      > request: we could limit the size of the customers' queue, we could
      > limit the size of each customer queue to a max number of messages,
      > ... (others); the solution may not be 100% fair,
      > but a little less could be enough to address the problem.

      How does Postfix enforce per-destination concurrency policy, when
      mail is kept in per-customer queues?

      What happens when Postfix opens a queue file and discovers that
      this customer's queue is full? Postfix must not open the same files
      again and again in an infinite loop, and Postfix must not have
      "foreach job in the queue" knowledge.

      We could move such messages to the deferred queue, but that would
      make queue management more expensive, replacing rename(incoming,
      active) +remove() with rename(incoming, deferred) +rename(deferred,
      active) +remove().

      > I'd like to know if we can engage someone that works on the core
      > of postfix to implement our feature. We'd like to have this feature
      > on our servers and so we are also glad to pay for someone that can
      > agree with us and that could collaborate with us to implement this
      > request.
      > I'm not very familiar with feature requests in open source projects,
      > so I apologize if I'm not following the "right procedure".

      I'm sure that each project has its "right procedure".

      With Postfix the procedure is to present a design that is detailed
      enough so that other people can understand how it works (a little
      shorter perhaps than the scheduler_readme file). Then, people like
      Victor and myself will look for problems and whether they can be
      fixed.

      Wietse
    • Stan Hoeppner
      Message 2 of 17 , Apr 8 12:08 PM
        On 4/8/2013 12:31 PM, Wietse Venema wrote:
        > Giorgio Luchi:
        >> I agree with you and with the sentence "Postfix is designed to
        >> work on a mail queue of any size. This is possible because Postfix
        >> works with a fixed memory budget.", and we don't want to break the
        >> architecture.
        ...
        > How does Postfix enforce per-destination concurrency policy, when
        > mail is kept in per-customer queues?
        >
        > What happens when Postfix opens a queue file and discovers that
        > this customer's queue is full? Postfix must not open the same files
        > again and again in an infinite loop, and Postfix must not have
        > "foreach job in the queue" knowledge.
        >
        > We could move such messages to the deferred queue, but that would
        > make queue management more expensive, replacing rename(incoming,
        > active) +remove() with rename(incoming, deferred) +rename(deferred,
        > active) +remove().
        >
        >> I'd like to know if we can engage someone that works on the core
        >> of postfix to implement our feature. We'd like to have this feature
        >> on our servers and so we are also glad to pay for someone that can
        >> agree with us and that could collaborate with us to implement this
        >> request.

        Isn't this a class of problem that can be fairly easily solved using
        virtual machines? Dedicate a VM and Postfix per customer, without
        needing to hack up the MTA. If the issue is "queue fairness" then one
        virtual machine per customer should address this. Disk space is so
        cheap today that dedicating a few GB to a queue for each customer isn't
        a limiting factor. With a sufficiently stripped down custom Linux or
        FreeBSD image the OS memory footprint should be small enough to pack
        many VMs/customers onto one machine. In the case of Linux one may be
        able to use KVM/KSM to consolidate identical in-memory binary images,
        cutting down the total memory footprint even further. The same can be
        done with VMware ESXi, probably more easily, but the
        free version probably limits the number of virtual machines to a
        value lower than what you'd need.

        A small farm of eight inexpensive 1U single socket quad core servers
        with 32GB RAM and a couple of 200GB mirrored SSDs could handle quite a
        few customers. Reserve 2GB RAM and 20GB disk for the hypervisor and
        assign 512MB RAM and 3GB disk to each customer VM. This allows for 60
        customer dedicated servers per box each with a ~2.5GB queue. With 8
        such servers that's 480 customer outbound dedicated relay servers in 8U
        of rack space, for relatively little hardware investment. This could be
        tweaked to fit even more customer VMs/queues per box depending on per
        customer queue requirements.
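
The sizing arithmetic above can be checked with a few lines of Python. The function name `vms_per_host` is made up for illustration; the numbers are the ones given in the paragraph.

```python
def vms_per_host(ram_gb, disk_gb, hv_ram_gb, hv_disk_gb, vm_ram_gb, vm_disk_gb):
    """Number of customer VMs that fit on one host, limited by RAM and by disk."""
    by_ram = int((ram_gb - hv_ram_gb) // vm_ram_gb)
    by_disk = int((disk_gb - hv_disk_gb) // vm_disk_gb)
    return min(by_ram, by_disk)


# 32GB RAM and 200GB mirrored SSD per box; 2GB RAM and 20GB disk reserved
# for the hypervisor; 512MB RAM and 3GB disk per customer VM.
per_host = vms_per_host(32, 200, 2, 20, 0.5, 3)
total = per_host * 8  # eight 1U servers

print(per_host, total)
```

Both the RAM limit and the disk limit come out at 60 VMs per box, so neither resource is left idle with these figures.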

        --
        Stan
      • Reindl Harald
        Message 3 of 17 , Apr 8 12:16 PM
          On 08.04.2013 21:08, Stan Hoeppner wrote:
          > Isn't this a class of problem that can be fairly easily solved using
          > virtual machines? Dedicate a VM and Postfix per customer, without
          > needing to hack up the MTA. If the issue is "queue fairness" then one
          > virtual machine per customer should address this. Disk space is so
          > cheap today that dedicating a few GB to a queue for each customer isn't
          > a limiting factor. With a sufficiently stripped down custom Linux or
          > FreeBSD image the OS memory footprint should be small enough to pack
          > many VMs/customers onto one machine. In the case of Linux one may be
          > able to use KVM/KSM to consolidate all the like in memory binary images,
          > cutting down the total memory footprint even further. The same can be
          > done with VMWare ESXi, probably more easily in the latter case, but this
          > freebie version probably limits the number of virtual machines to a
          > value lower than what you'd need

          Have fun with a growing number of customers, up to several hundred.
          http://www.postfix.org/MULTI_INSTANCE_README.html
        • Simone Caruso
          Message 4 of 17 , Apr 8 3:00 PM
            On 06/04/2013 15:01, Wietse Venema wrote
            > There must be other solutions that can work with a fixed memory
            > budget and that can push excessive mail to a "low-priority" queue
            > that is processed when it does not interfere with other delivery.
            >
            > Any solution that requires knowledge of THE COMPLETE MAIL QUEUE
            > will be firmly rejected. I will not allow a built-in limitation on
            > the amount of mail that Postfix can handle.
            >
            (If I understood Giorgio's needs correctly.)
            Could the following points be a way to obtain round-robin delivery based on
            sender address/IP?

            1. Change the "incoming queue" directory structure, indexing sub-directories
            with a per-sender index.
            2. Change the algorithm used by qmgr to move email from the "incoming" to the
            "active" queue (a foreach over subdirs).

            This increases disk I/O, but you can obtain pretty fair delivery between senders.

            --
            Simone Caruso
            IT Consultant
            +39 349 65 90 805
          • Wietse Venema
            Message 5 of 17 , Apr 8 4:47 PM
              Simone Caruso:
              > On 06/04/2013 15:01, Wietse Venema wrote
              > > There must be other solutions that can work with a fixed memory
              > > budget and that can push excessive mail to a "low-priority" queue
              > > that is processed when it does not interfere with other delivery.
              > >
              > > Any solution that requires knowledge of THE COMPLETE MAIL QUEUE
              > > will be firmly rejected. I will not allow a built-in limitation on
              > > the amount of mail that Postfix can handle.
              > >
              > (if i understood correctly the needs of Giorgio)
              > Can be the following points the way to obtain round-robin delivery based on
              > sender address/ip?
              >
              > 1. change the "incoming queue" directory structure indexing sub-directories with
              > a per-sender index
              > 2. changing the algorithm used by qmgr to move email from "Incoming" to "Active"
              > queue, (a foreach over subdirs)
              >
              > This increases disks i/o but you can obtain a pretty fair delivery
              > between senders

              I just simulated the performance hit of 256 incoming queues by setting

              hash_queue_names = incoming
              hash_queue_depth = 2

              and running smtp-source, sending mail to an alias for /dev/null.

              Postfix queue performance for small messages already dropped by
              30%, with the write cache enabled on a 10,000RPM SAS disk (which
              is recommended for a production server only when the write cache
              has a battery to survive power failures).

              The performance drop will be worse with one queue directory per
              customer, unless you have very few customers of course.
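
To see where the figure of 256 directories comes from, here is a simplified model of a hashed queue directory. It assumes that each of the first `hash_queue_depth` characters of a hexadecimal queue ID names one subdirectory level; the helper names and the sample queue ID are hypothetical, and real Postfix queue ID handling has details this sketch leaves out.

```python
import posixpath


def hashed_queue_path(queue_name, queue_id, depth):
    """Simplified model: one subdirectory level per leading queue ID character."""
    levels = list(queue_id[:depth])
    return posixpath.join(queue_name, *levels, queue_id)


def directory_count(depth, fanout=16):
    """Leaf directories, assuming `fanout` possible characters per level."""
    return fanout ** depth


print(hashed_queue_path("incoming", "3F2A1B0C9D", 2))
print(directory_count(2))
```

With a depth of 2 and 16 possible hex characters per level, the model yields 16 * 16 = 256 leaf directories, which matches the number of incoming queues simulated above.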

              Wietse
            • Simone Caruso
              Message 6 of 17 , Apr 8 5:50 PM
                > I just simulated the performance hit of 256 incoming queues by setting
                >
                > hash_queue_names = incoming
                > hash_queue_depth = 2
                >
                > and running smtp-source, sending mail to an alias for /dev/null.
                >
                > Postfix queue performance for small messages already dropped by
                > 30%, with the write cache enabled on a 10,000RPM SAS disk (which
                > is recommended for a production server only when the write cache
                > has a battery to survive power failures).
                >
                > The performance drop will be worse with one queue directory per
                > customer, unless you have very few customers of course.
                >
                >
                I expected some degradation in performance, but not so much (you tried with a
                lot of queues too).

                I think the example environment is a mail marketing relay server, Giorgio said:
                "User A, with ip address IP_A, sends 1 different email to 1 million of different
                domain destinations"

                The indexing approach can fit this specific application (marketing cloud
                service!?); the daemon doesn't need to scan lots of hashes/subdirs on
                disk (a small hash that can be loaded in memory can be less expensive).

                --
                Simone Caruso
                IT Consultant
                +39 349 65 90 805
              • Wietse Venema
                Message 7 of 17 , Apr 9 4:00 AM
                  Simone Caruso:
                  > > I just simulated the performance hit of 256 incoming queues by setting
                  > >
                  > > hash_queue_names = incoming
                  > > hash_queue_depth = 2
                  > >
                  > > and running smtp-source, sending mail to an alias for /dev/null.
                  > >
                  > > Postfix queue performance for small messages already dropped by
                  > > 30%, with the write cache enabled on a 10,000RPM SAS disk (which
                  > > is recommended for a production server only when the write cache
                  > > has a battery to survive power failures).
                  > >
                  > > The performance drop will be worse with one queue directory per
                  > > customer, unless you have very few customers of course.
                  > >
                  > >
                  > I expected some degradation in performance, but not so much (you tried with a
                  > lot of queues too).
                  >
                  > I think the example environment is a mail marketing relay server, Giorgio said:
                  > "User A, with ip address IP_A, sends 1 different email to 1 million of different
                  > domain destinations"
                  >
                  > The indexing approach can fit this this specific application (marketing cloud
                  > service!?); the daemon
                  > don't need to scan on disk lots of hashes/subdirs. (a small size hash loadable
                  > in memory can be less expensive)

                  I reject solutions that require in-memory information about all
                  mail in the queue.

                  More generally, I reject solutions that cause Postfix to fail with
                  more than N messages in the queue, regardless of the value of N.

                  Wietse
                • Giorgio Luchi
                  Message 8 of 17 , Apr 15 4:20 AM
                    Hi,
                    I'm sorry for the delay; I've been very busy with some projects.
                    I'll continue the discussion with my opinion and some details.

                    No virtual machines and no multi-instance solution: we have more than 10,000 customers, so these solutions are not applicable. We don't want to classify them (in order to have fewer outgoing queues).

                    I'm trying to find a solution that respects the requirement "Postfix mustn't fail with more than N messages in the queue".

                    I work for an ISP, so I consider "incoming" emails and "outgoing" emails as two separate services. The first is already served well by the existing scheduling algorithm; for the second I think that there could be a separate scheduling algorithm. So I'm thinking about the possibility of choosing a different algorithm through a configuration parameter (e.g. specifying different startup options in master.cf for different transports).

                    >How does Postfix enforce per-destination concurrency policy, when mail is kept in per-customer queues?
                    Assuming that we choose the right algorithm for every transport (the default could be the existing one):
                    - with "incoming" algorithm (the existing one): there is no need for "sender round-robin", in fact the current algorithm doesn't use sender information
                    - with "outgoing" algorithm: it should be analogous; in this case we need to do only "sender round-robin", the destination doesn't matter.

                    The "outgoing" scheduling algorithm looks like the "incoming" with the difference in the step of picking up the email (like I wrote in a previous post).

                    How can we implement round-robin by sender IP/authenticated user and preserve the memory constraint too?
                    - "sender" is the sender's IP address or the authenticated user name (e.g. "80.93.143.50" or "giorgio.luchi")
                    - "rrsender_message_limit" is the max number of messages in the sender queue (e.g. "3")
                    - "rrsender_queue_limit" is the max number of senders actually loaded in RAM (e.g. "6000")
                    - Regarding "What happens when Postfix opens a queue file and discovers that this customer's queue is full? Postfix must not open the same files again and again in an infinite loop, and Postfix must not have "foreach job in the queue" knowledge." and "We could move such messages to the deferred queue, but that would make queue management more expensive, replacing rename(incoming, active) +remove() with rename(incoming, deferred) +rename(deferred, active) +remove()": I think the solution should behave as in the second proposal. I know that it costs more in terms of IOPS, but in our ISP environment we can spend some effort and money on this operation and accept "lower performance" (we can regain the lost capacity by balancing the load across several dedicated servers)
                    - so, at any time, there will be at most rrsender_queue_limit queues of rrsender_message_limit messages each in RAM; that means 6000 different senders that need to send emails simultaneously (which I think is a big number)
                    - we could also implement preemption in a way analogous to what the current scheduler does (if a message is sent to several recipients)
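
The bounded structure described in the list above could look roughly like the following sketch. It is not Postfix code: the class `BoundedSenderQueues` is hypothetical, the two limits stand in for the proposed "rrsender_queue_limit" and "rrsender_message_limit" parameters, and a refused message is simply reported to the caller, which would have to set it aside (for example by moving it to the deferred queue).

```python
from collections import OrderedDict, deque


class BoundedSenderQueues:
    """Per-sender queues with a hard bound on senders and messages per sender."""

    def __init__(self, queue_limit, message_limit):
        self.queue_limit = queue_limit      # max senders held in memory
        self.message_limit = message_limit  # max messages per sender
        self.queues = OrderedDict()

    def capacity(self):
        """Upper bound on messages held in memory at any time."""
        return self.queue_limit * self.message_limit

    def admit(self, sender, message):
        """Return True if admitted, False if the caller must set the message aside."""
        queue = self.queues.get(sender)
        if queue is None:
            if len(self.queues) >= self.queue_limit or self.message_limit < 1:
                return False
            queue = self.queues[sender] = deque()
        if len(queue) >= self.message_limit:
            return False
        queue.append(message)
        return True

    def next_message(self):
        """Round-robin: serve the front sender once, then move it to the back."""
        if not self.queues:
            return None
        sender, queue = next(iter(self.queues.items()))
        message = queue.popleft()
        del self.queues[sender]
        if queue:
            self.queues[sender] = queue
        return sender, message


queues = BoundedSenderQueues(queue_limit=6000, message_limit=3)
print(queues.capacity())  # 6000 senders * 3 messages each
```

The memory bound is the product of the two limits (18000 messages with the example values), regardless of how much mail is on disk. What the sketch does not answer is the question raised earlier in the thread: how refused messages are revisited without reopening the same files again and again.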

                    I hope this can help in understanding and in finding a solution to what we need.

                    Regards
                    Giorgio Luchi
                  • Wietse Venema
                    Message 9 of 17 , Apr 15 7:00 AM
                      Coming back to the original example of a one-million message queue:
                      Postfix is designed to survive extreme overload, but all mail will
                      be delayed. This is no different than the road to the airport:
                      when it becomes full, all vehicles will be delayed. Both the Postfix
                      scheduler and the road to the airport have a finite capacity.
                      Once they become congested you get first-in, first-out.

                      If you want fairness with 1M+ messages and a scheduler with a fixed
                      memory budget, then you need a scheduler admission policy.

                      For the problem at hand, the important scheduler decisions are:

                      a) Output side: which (destination, recipients) to deliver next.

                      I think that this part does not need to be changed, precisely
                      because the scheduler can see only a subset of all recipients
                      in the mail queue. The trick is deciding what recipient subset
                      the scheduler gets to see.

                      b) Input side: how many recipients to read from a queue file.

                      This is an important part of Patrik's scheduler, but fairness
                      between multi- and single-recipient mail is not the issue here.

                      In the case of a queue full of single-recipient messages, the
                      only choice is to exclude a queue file from consideration (in
                      terms of the road to the airport, to not allow a car to enter
                      the road).

                      Basically this means some sort of "active queue" admission policy
                      that includes queue file skipping based on some policy.

                      A queue file can be skipped in more than one way. Besides moving a
                      file to the deferred queue, a file can also be skipped by leaving it
                      in the incoming queue and setting the mtime time stamp a little
                      into the future. We could also add an "overflow" queue that is read
                      when the incoming queue becomes empty. Which option is better depends
                      on how large the excess of mail is.
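
The mtime-based skipping option can be sketched in a few lines. This is an illustration only, not queue manager code: `skip_until_later` and `eligible_files` are hypothetical helpers, the scan rule "ignore files whose mtime lies in the future" is taken from the paragraph above, and the timestamps are fixed so that the example is deterministic.

```python
import os
import tempfile


def skip_until_later(path, now, delay_seconds):
    """Push a file's mtime into the future so a scan at `now` passes over it."""
    atime = os.stat(path).st_atime
    os.utime(path, (atime, now + delay_seconds))


def eligible_files(directory, now):
    """Files whose mtime is not in the future relative to `now`."""
    return sorted(
        name for name in os.listdir(directory)
        if os.stat(os.path.join(directory, name)).st_mtime <= now
    )


def demo():
    now = 1_000_000_000.0  # fixed, made-up scan time
    with tempfile.TemporaryDirectory() as incoming:
        for name in ("msg_a", "msg_b"):
            path = os.path.join(incoming, name)
            with open(path, "w") as handle:
                handle.write("placeholder\n")
            os.utime(path, (now, now))
        # Admission policy decides that msg_a must wait for a minute.
        skip_until_later(os.path.join(incoming, "msg_a"), now, delay_seconds=60)
        return eligible_files(incoming, now), eligible_files(incoming, now + 61)


before, after = demo()
print(before, after)
```

The skipped file stays where it is and costs one metadata update instead of two renames, which is why this option looks cheaper than a round trip through the deferred queue when the excess is small.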

                      Wietse
                    • Viktor Dukhovni
                      Message 10 of 17 , Apr 15 9:59 AM
                        On Mon, Apr 15, 2013 at 01:20:58PM +0200, Giorgio Luchi wrote:

                        > How can we implement round-robin by sender ip/authenticated user
                        > and to preserve the memory constraint too?
                        >
                        > - "sender" is the sender's ip address or the authenticated user
                        > name (i.e. "80.93.143.50" or "giorgio.luchi")
                        > - "rrsender_message_limit" is the max number of messages in the
                        > sender queue (i.e. "3")
                        > - "rrsender_queue_limit" is the max number of senders actually
                        > loaded in RAM (i.e. "6000")
                        >
                        > [...]
                        >
                        > I hope this can help in understanding and in finding a solution
                        > to what we need.

                        Your specific proposal is not viable. The scheduler (queue manager
                        processing the active queue) works with a bounded subset of messages
                        and message recipients. Think of these as people who are in the
                        airport terminal whose ticket class determines which line they may
                        join, with some lines getting better service than others.

                        The capacity of the roads leading to the terminal
                        is finite. A first-class ticket holder cannot take advantage of
                        preferential treatment at the terminal if he is stuck behind
                        thousands of economy ticket holders on a congested highway.

                        The highway is analogous to the Postfix incoming and deferred queues
                        and finite capacity of smtpd(8) processes to take in new mail.

                        You must abandon all hope of trying to make the highway more "fair"
                        than FIFO; this is not possible.

                        What is possible is tweaking the algorithm at the terminal. It is
                        in principle possible to adjust the scheduler algorithm for selecting
                        the next message to send (out of those already in the active queue).

                        Right now the scheduler is able to interleave delivery of later
                        arriving small messages with progressive delivery of larger (mailing
                        list) messages, so that a single newsletter delivery does not
                        substantially delay the delivery of all subsequent mail.

                        If the same newsletter were to be injected into the queue as a
                        burst of individual messages the scheduler would not be able to
                        apply Patrik Rak's nqmgr FIFO preemption logic.

                        One could conceivably enhance the algorithm to support a notion
                        of job groups, where the preemption logic works at two different
                        levels.

                        - A job group can be preempted by a later job group after the
                        first job group consumes enough delivery slots.

                        - A job within a group can be preempted by a later job within that
                        group after the first job consumes enough delivery slots.

                        It would then be up to the input stage to tag queue files with job
                        group identifiers. This could be done by a policy service that returns
                        a new access(5) action to set the job group id.

                        The effect is to logically "re-assemble" a multi-message bulk
                        mailing from a single source (be it by IP, SASL username, sender
                        address, ...) into a single logical message which is subject to
                        the scheduler preemption algorithm.
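
The "re-assembly" idea can be illustrated with a toy model. Everything here is hypothetical: messages are plain dictionaries, the job group id is assumed to be the client IP, and `interleave` is a crude one-slot-per-group round robin that merely stands in for preemption. It is not the nqmgr algorithm.

```python
def group_jobs(messages, key):
    """Collect message ids into logical jobs, keyed by their job group id."""
    groups = {}
    for message in messages:
        groups.setdefault(message[key], []).append(message["id"])
    return groups


def interleave(groups):
    """Crude stand-in for group-level preemption: one slot per group per round."""
    pending = [list(ids) for ids in groups.values()]
    order = []
    while any(pending):
        for ids in pending:
            if ids:
                order.append(ids.pop(0))
    return order


# A burst of three individual messages from one source, then one message
# from a second source that arrives later.
messages = [
    {"id": "m1", "client": "10.0.0.1"},
    {"id": "m2", "client": "10.0.0.1"},
    {"id": "m3", "client": "10.0.0.1"},
    {"id": "m4", "client": "10.0.0.2"},
]

groups = group_jobs(messages, key="client")
print(groups)
print(interleave(groups))
```

Once the burst is treated as a single logical job, the later message from the second source no longer has to wait behind every message of the burst, but only among messages that are already in the active queue.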

                        This is perhaps a fruitful direction, though an ISP would likely
                        get more "bang for the buck" by rate limiting input with a policy
                        service. The outlined design would be of greater utility in large
                        corporate and some hosted email marketing scenarios.

                        If anyone were to design and build such a thing on contract, it
                        would probably be Patrik Rak (if he's available). Otherwise,
                        nobody else comes to mind; you'd have to hire someone who's capable
                        of understanding Patrik's design in detail, and extending it
                        correctly.

                        The result would be unlikely to be incorporated into the mainstream
                        Postfix release unless it were exceedingly well documented and
                        implemented with care. Most likely you'd be stuck maintaining it
                        as a private patch indefinitely.

                        So the best approach is perhaps to find some other MTA that does
                        what you want if one exists. Otherwise, try to leverage the existing
                        feature set to approximate what you want, rate limiting input is
                        my best suggestion.
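
As a rough idea of what rate limiting input with a policy service involves, here is a minimal sketch of the decision logic only. It assumes the policy delegation request format of name=value lines ended by an empty line, keys the limit on `sasl_username` with a fallback to `client_address`, and uses a token bucket whose rate and burst values are made up. The socket server part is omitted, and the `buckets` table is unbounded, so a real service would have to expire idle entries.

```python
class TokenBucket:
    """Allow `burst` messages at once, refilled at `rate` tokens per second."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = None

    def allow(self, now):
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def parse_request(text):
    """Parse name=value lines up to the first empty line."""
    attrs = {}
    for line in text.splitlines():
        if not line:
            break
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


buckets = {}  # hypothetical in-memory table; unbounded in this sketch


def policy_action(attrs, now, rate=1.0, burst=3):
    """Return the policy reply for one request."""
    key = attrs.get("sasl_username") or attrs.get("client_address", "unknown")
    bucket = buckets.setdefault(key, TokenBucket(rate, burst))
    if bucket.allow(now):
        return "action=DUNNO\n\n"
    return "action=DEFER_IF_PERMIT Sending rate limit exceeded\n\n"


request = (
    "request=smtpd_access_policy\n"
    "protocol_state=RCPT\n"
    "client_address=80.93.143.50\n"
    "sasl_username=giorgio.luchi\n"
    "\n"
)
attrs = parse_request(request)
replies = [policy_action(attrs, now=0.0) for _ in range(4)]
print(replies)
```

The sender gets a temporary rejection once the burst is used up, so the excess mail stays in the customer's own client or MTA instead of filling the relay's queue.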

                        --
                        Viktor.
                      • Timo Röhling
                        Message 11 of 17 , Apr 15 11:10 AM
                          On 2013-04-05 12:36, Giorgio Luchi wrote:
                          > - User A [...] sends 1 different email to 1 million
                          > of different domain destinations
                          > [...]
                          > - User B [...] sends an email to a different more domain destinatio

                          Doesn't this scream for two different postfix instances?
                          - One high priority instance with strict rate limiting
                          - One low priority instance for bulk messages, possibly with some sort
                          of traffic shaping

                          You could check for a Precedence: bulk header to forward mails to the
                          correct postfix instance automatically. Or am I overlooking an obvious
                          problem here?
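
The classification rule itself is simple, as this sketch shows. The instance names "bulk" and "priority" and the function `choose_instance` are made up for illustration; in a real Postfix setup the decision would more likely live in configuration than in a separate program.

```python
from email import message_from_string


def choose_instance(raw_message):
    """Route on the Precedence header: bulk mail goes to the low priority instance."""
    message = message_from_string(raw_message)
    precedence = (message.get("Precedence") or "").strip().lower()
    return "bulk" if precedence == "bulk" else "priority"


newsletter = "From: a@example.com\nPrecedence: bulk\nSubject: Offer\n\nHello\n"
personal = "From: b@example.com\nSubject: Lunch\n\nHello\n"

print(choose_instance(newsletter), choose_instance(personal))
```

The rule only works when senders actually set the header, so it depends on customers cooperating.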

                          Timo
                        • Giorgio Luchi
                          Message 12 of 17 , Apr 16 4:07 AM
                            Viktor, I'll report your answer internally and then we'll decide the next step.

                            Timo, your suggestion is also valid; the problem is that we must teach our customers the guidelines to follow, and that's not an easy task. But we can do something with it.

                            I'll be back on this thread as soon as I have news.

                            Thanks to all
                            Giorgio Luchi