Re: Tweaking DNS timeouts

  • Viktor Dukhovni
    Message 1 of 25 , May 17, 2013
      On Fri, May 17, 2013 at 12:26:13PM -0500, /dev/rob0 wrote:

      > > Increasing the greet-wait to 10+ seconds could result in
      > > legitimate clients hanging up, so I would not recommend that.
      >
      > Do we have any testing to validate this? I'm pretty sure I recall
      > from a few years back on the old original SPAM-L list that some
      > Sendmail people[1] were saying they used greet pauses in excess of 30
      > seconds.

      It creates a lot of needless congestion on legitimate sending
      systems even if they don't hang up.

      Now every message (from a small MTA that does not visit often)
      starts to take 30s to make a delivery. Queue throughput collapses
      and Patrick Raq's MTA can't deliver new mail in a timely fashion.
      On the plus side, Wietse and Patrick may finally consider my
      "concurrency ballooning" suggestion. :-)

      Much of the damage to the SMTP infrastructure is done by well-meaning
      anti-spam measures. Let's not take it too far.
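As a rough illustration of the congestion argument above, the following Python sketch estimates how a greet pause caps a sender's throughput. It assumes a sending MTA with a fixed number of concurrent delivery slots, where every delivery opens a fresh connection and waits out the full pause. The function name and the numbers (20 slots, 1 second per SMTP transaction) are hypothetical and do not come from the thread.

```python
def deliveries_per_hour(concurrency: int, greet_pause_s: float,
                        transaction_s: float) -> float:
    """Upper bound on deliveries per hour when each delivery holds one
    slot for the greet pause plus the SMTP transaction itself."""
    return concurrency * 3600 / (greet_pause_s + transaction_s)


# Hypothetical sender: 20 concurrent slots, 1 second per transaction.
no_pause = deliveries_per_hour(20, 0, 1)
long_pause = deliveries_per_hour(20, 30, 1)

print(f"no greet pause:  {no_pause:.0f} deliveries/hour")
print(f"30 s greet pause: {long_pause:.0f} deliveries/hour")
print(f"slowdown factor: {no_pause / long_pause:.0f}x")
```

Under these assumptions the pause dominates the cost of each delivery, which is why the slowdown factor is close to the ratio of the pause to the transaction time.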

      --
      Viktor.
    • /dev/rob0
      Message 2 of 25 , May 17, 2013
        On Fri, May 17, 2013 at 05:53:47PM +0000, Viktor Dukhovni wrote:
        > On Fri, May 17, 2013 at 12:26:13PM -0500, /dev/rob0 wrote:
        > Wietse:
        > > > Increasing the greet-wait to 10+ seconds could result in
        > > > legitimate clients hanging up, so I would not recommend that.
        > >
        > > Do we have any testing to validate this? I'm pretty sure I
        > > recall from a few years back on the old original SPAM-L list
        > > that some Sendmail people[1] were saying they used greet
        > > pauses in excess of 30 seconds.
        >
        > It creates a lot of needless congestion on legitimate sending
        > systems even if they don't hang up.
        >
        snip
        >
        > Much of the damage to the SMTP infrastructure is done by
        > well-meaning anti-spam measures. Let's not take it too far.

        I understand all this and agree. I'm not advocating a 30+ second
        greet pause. My original goal was to reduce delays.

        Most of those who manage really busy outbounds will have gone to the
        trouble of getting listed on DNS whitelists. And for these outbounds,
        an occasional 10-second greet pause is better than "Service currently
        unavailable" and PASS NEW.

        But I think this is all moot, and my quick fix, to stop querying
        psbl.surriel.com, was the best. The moral of the story being, use
        DNSBL sites with adequate response times and five nines. It's
        probably also moot if the postscreen_dnsbl_threshold score is only
        calculated when in excess thereof in case of DNS timeouts.
        --
        http://rob0.nodns4.us/ -- system administration and consulting
        Offlist GMX mail is seen only if "/dev/rob0" is in the Subject:
      • Wietse Venema
        Message 3 of 25 , May 17, 2013
          /dev/rob0:
          >
          > I guess this says that postscreen_dnsbl_action fires at the end of
          > the greet pause when postscreen_dnsbl_threshold is met, but
          > postscreen_dnsbl_whitelist_threshold is not calculated. Here's the

          [begin background material]

          I misunderstood how postscreen works (I do not constantly stare
          at Postfix source code, having other things to work on that pay the
          bills).

          I thought that the whitelist would be applied only when DNS lookups
          complete *before* the pregreet timer expires. That is,

          - When some DNS lookup is taking too long, no DNS score is available.

          This is consistent with how postscreen whitelisting works for non-DNS
          tests. It applies the whitelist threshold only when DNS lookup
          completes before the pregreet timer expires.

          However, the bullet above is incorrect. When some DNS lookup takes
          too long, a DNS score is available, and the postscreen DNS blocking
          code uses that partial score.

          This is safe when there are only positive scores (if the partial
          score is already over the threshold then the client should be
          blocked even if some DNS results are not yet in).

          This is less safe when there may also be exculpatory evidence (in
          the form of DNSWL lookups). But, sites are usually not listed in
          both white and block lists.

          [end background material]
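The difference between the safe and the less safe case can be made concrete with a small model. This is a minimal Python sketch, not postscreen's actual code: the list names, weights, threshold, and function names are hypothetical, and a lookup that has not completed yet is represented as None.

```python
from typing import Dict, Optional


def partial_score(weights: Dict[str, int],
                  results: Dict[str, Optional[bool]]) -> int:
    """Sum the weights of the lists that have answered 'listed' so far.

    results maps list name -> True (listed), False (not listed),
    or None (lookup still pending).
    """
    return sum(w for name, w in weights.items() if results.get(name) is True)


def pending_could_clear(weights: Dict[str, int],
                        results: Dict[str, Optional[bool]],
                        threshold: int) -> bool:
    """True if pending lookups with negative weight could still pull
    the score below the blocking threshold."""
    pending_negative = sum(w for name, w in weights.items()
                           if results.get(name) is None and w < 0)
    return partial_score(weights, results) + pending_negative < threshold


threshold = 2
results = {"bl-a.example": True, "bl-b.example": None, "wl.example": None}

# Only positive weights: a partial score over the threshold is final.
blocklists_only = {"bl-a.example": 2, "bl-b.example": 1}
print(partial_score(blocklists_only, results) >= threshold)      # True
print(pending_could_clear(blocklists_only, results, threshold))  # False

# With a pending whitelist lookup, the same partial score is not final.
with_whitelist = {"bl-a.example": 2, "bl-b.example": 1, "wl.example": -3}
print(partial_score(with_whitelist, results) >= threshold)       # True
print(pending_could_clear(with_whitelist, results, threshold))   # True
```

In the second case the client is blocked on the partial score even though a whitelist answer that is still outstanding could have cleared it.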

          I can change postscreen to also use partial scores for whitelisting
          of non-DNS tests, and thereby make whitelisting of non-DNS tests
          consistent with DNS-based blocking (that's one less WTF factor).
          This requires minor code duplication.

          Wietse
        • Wietse Venema
          Message 4 of 25 , May 17, 2013
            Wietse Venema:
            > I can change postscreen to also use partial scores for whitelisting
            > of non-DNS tests, and thereby make whitelisting of non-DNS tests
            > consistent with DNS-based blocking (that's one less WTF factor).
            > This requires minor code duplication.

            Released as snapshot 20130517.

            Wietse
          • /dev/rob0
            Message 5 of 25 , May 17, 2013
              On Fri, May 17, 2013 at 10:06:38PM -0400, Wietse Venema wrote:
              > Wietse Venema:
              > > I can change postscreen to also use partial scores for
              > > whitelisting of non-DNS tests, and thereby make whitelisting
              > > of non-DNS tests consistent with DNS-based blocking (that's one
              > > less WTF factor). This requires minor code duplication.
              >
              > Released as snapshot 20130517.

              For testing I reenabled PSBL, and I'll see what comes in overnight.
              I thought I could make my own pseudo-DNSBL on a random IP address
              with port 53 blocked, but I need to set up an NS record to point to
              that. I'll do that tomorrow if results tonight are inconclusive.
              --
              http://rob0.nodns4.us/ -- system administration and consulting
              Offlist GMX mail is seen only if "/dev/rob0" is in the Subject:
            • Wietse Venema
              Message 6 of 25 , May 18, 2013
                /dev/rob0:
                > On Fri, May 17, 2013 at 10:06:38PM -0400, Wietse Venema wrote:
                > > Wietse Venema:
                > > > I can change postscreen to also use partial scores for
                > > > whitelisting of non-DNS tests, and thereby make whitelisting
                > > > of non-DNS tests consistent with DNS-based blocking (that's one
                > > > less WTF factor). This requires minor code duplication.
                > >
                > > Released as snapshot 20130517.
                >
                > For testing I reenabled PSBL, and I'll see what comes in overnight.
                > I thought I could make my own pseudo-DNSBL on a random IP address
                > with port 53 blocked, but I need to set up an NS record to point to
                > that. I'll do that tomorrow if results tonight are inconclusive.

                For whitelisting I used a wild-card "A" record, and for timeout
                testing I used an NS record that resolves to a firewalled port (a
                black hole).

                This confirmed that postscreen will now use partial scores to
                whitelist pending non-DNSBL tests.

                I can make those domain names available for general testing (but
                not now as I am in the middle of a copper-to-fiber conversion).

                Wietse
              • /dev/rob0
                Message 7 of 25 , May 19, 2013
                  Still watching logs, this one just passed by. Probably unrelated to
                  the changes in 20130517, but I was curious about it:

                  May 19 13:24:20 harrier postfix/postscreen[3533]: CONNECT from [188.42.15.19]:48706 to [207.223.116.211]:25
                  May 19 13:24:26 harrier postfix/postscreen[3533]: NOQUEUE: reject: RCPT from [188.42.15.19]:48706: 450 4.3.2 Service currently unavailable; from=<bounce@...>, to=<munged@...>, proto=ESMTP, helo=<mail18.consumer-news123.com>
                  May 19 13:24:26 harrier postfix/postscreen[3533]: PASS NEW [188.42.15.19]:48706
                  May 19 13:24:26 harrier postfix/postscreen[3533]: DISCONNECT [188.42.15.19]:48706

                  All is well and good for a non-whitelisted host, but apparently it
                  was too quick in coming back to the secondary MX IP address ...

                  May 19 13:24:26 harrier postfix/postscreen[3533]: CONNECT from [188.42.15.9]:33610 to [207.223.116.214]:25
                  May 19 13:24:26 harrier postfix/postscreen[3533]: WHITELIST VETO [188.42.15.9]:33610

                  ... all in the same second, but according to syslog, sequentially
                  after having earned whitelist status.

                  May 19 13:24:32 harrier postfix/postscreen[3533]: NOQUEUE: reject: RCPT from [188.42.15.9]:33610: 450 4.3.2 Service currently unavailable; from=<bounce@...>, to=<munged@...>, proto=ESMTP, helo=<mail8.consumer-news123.com>
                  May 19 13:24:32 harrier postfix/postscreen[3533]: DISCONNECT [188.42.15.9]:33610
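The delay between CONNECT and the 450 rejection in excerpts like the ones above can be measured with a short script. This is a sketch in Python that only understands the postscreen line shapes quoted in this message. The regular expression, the function name, and the assumed year (syslog timestamps carry none; 2013 is taken from the message dates) are illustrative choices.

```python
import re
from datetime import datetime

# Matches the postscreen syslog line shapes quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ postfix/postscreen\[\d+\]: "
    r"(?P<event>CONNECT from|NOQUEUE: reject: RCPT from|PASS NEW|"
    r"WHITELIST VETO|DISCONNECT) "
    r"\[(?P<ip>[0-9a-fA-F.:]+)\]:(?P<port>\d+)"
)


def connect_to_reject_seconds(lines, year=2013):
    """Map (ip, port) -> seconds between CONNECT and the RCPT rejection."""
    connects, delays = {}, {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        key = (m["ip"], m["port"])
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        if m["event"] == "CONNECT from":
            connects[key] = when
        elif m["event"].startswith("NOQUEUE") and key in connects:
            delays[key] = (when - connects[key]).total_seconds()
    return delays


log = [
    "May 19 13:24:26 harrier postfix/postscreen[3533]: CONNECT from [188.42.15.9]:33610 to [207.223.116.214]:25",
    "May 19 13:24:26 harrier postfix/postscreen[3533]: WHITELIST VETO [188.42.15.9]:33610",
    "May 19 13:24:32 harrier postfix/postscreen[3533]: NOQUEUE: reject: RCPT from [188.42.15.9]:33610: 450 4.3.2 Service currently unavailable; from=<bounce@...>, to=<munged@...>, proto=ESMTP, helo=<mail8.consumer-news123.com>",
]
print(connect_to_reject_seconds(log))
```

For the three lines shown, the result is a single entry for 188.42.15.9 port 33610 with a delay of 6.0 seconds.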

                  Another six seconds pass before this one is turned away, which
                  suggests that the greet pause was repeated. Makes sense, because
                  "WHITELIST VETO" means it was not seen before.
                  --
                  http://rob0.nodns4.us/ -- system administration and consulting
                  Offlist GMX mail is seen only if "/dev/rob0" is in the Subject:
                • Wietse Venema
                  Message 8 of 25 , May 19, 2013
                    /dev/rob0:
                    > Still watching logs, this one just passed by. Probably unrelated to
                    > the changes in 20130517, but I was curious about it:
                    >
                    > May 19 13:24:20 harrier postfix/postscreen[3533]: CONNECT from [188.42.15.19]:48706 to [207.223.116.211]:25
                    > May 19 13:24:26 harrier postfix/postscreen[3533]: NOQUEUE: reject: RCPT from [188.42.15.19]:48706: 450 4.3.2 Service currently unavailable; from=<bounce@...>, to=<munged@...>, proto=ESMTP, helo=<mail18.consumer-news123.com>
                    > May 19 13:24:26 harrier postfix/postscreen[3533]: PASS NEW [188.42.15.19]:48706
                    > May 19 13:24:26 harrier postfix/postscreen[3533]: DISCONNECT [188.42.15.19]:48706

                    postscreen does not find the client IP address in the permanent
                    postscreen_access_list, does not find the client IP address in the
                    temporary postscreen_cache_map, logs the "all tests passed" status,
                    updates the temporary postscreen_cache_map with the expiration time
                    for each test, and forgets the test results.

                    > All is well and good for a non-whitelisted host, but apparently it
                    > was too quick in coming back to the secondary MX IP address ...
                    >
                    > May 19 13:24:26 harrier postfix/postscreen[3533]: CONNECT from [188.42.15.9]:33610 to [207.223.116.214]:25
                    > May 19 13:24:26 harrier postfix/postscreen[3533]: WHITELIST VETO [188.42.15.9]:33610
                    >
                    > ... all in the same second, but according to syslog, sequentially
                    > after having earned whitelist status.

                    postscreen logs "CONNECT from", does not find the client IP address
                    in the permanent postscreen_access_list, and does not find the
                    client IP address in the temporary postscreen_cache_map. Therefore
                    this is handled as a non-whitelisted client that connects to the
                    "wrong" IP address.

                    Why wasn't the client IP address found in the temporary
                    postscreen_cache_map? Maybe silent corruption of the cache database.
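The lookup order described above can be modelled in a few lines. This is a simplified Python sketch, not postscreen's implementation. It assumes the temporary cache is keyed by the exact client IP address, and the function name, the return labels, and the one-day expiration time are hypothetical. The addresses are the ones in the quoted log excerpts.

```python
from typing import Dict, Set


def classify(client_ip: str, on_primary_mx: bool, access_list: Set[str],
             cache: Dict[str, float], now: float) -> str:
    """Decide how a new connection is treated in this simplified model."""
    if client_ip in access_list:
        return "permanent-list"   # found in the permanent access list
    if cache.get(client_ip, 0.0) > now:
        return "cached-pass"      # unexpired entry in the temporary cache
    if not on_primary_mx:
        return "veto"             # unknown client on the "wrong" address
    return "run-tests"            # unknown client on the primary address


cache: Dict[str, float] = {}
now = 0.0

# First connection: unknown client on the primary address passes the tests,
# and the cache is updated with an (assumed) expiration time.
print(classify("188.42.15.19", True, set(), cache, now))
cache["188.42.15.19"] = now + 86400

# Second connection in the log: a different address, on the secondary MX.
print(classify("188.42.15.9", False, set(), cache, now))

# For comparison: the first address returning on the secondary MX.
print(classify("188.42.15.19", False, set(), cache, now))
```

Note that the two log excerpts show different client addresses (188.42.15.19 and 188.42.15.9). In this model, a cache keyed by exact IP address misses for the second connection for that reason alone, without any cache corruption.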

                    Wietse