
Understanding postscreen timeouts

  • Alex
    Message 1 of 10 , May 1 2:21 PM
      Hi,

      I'm using postfix-2.10.3 with fedora20 and have configured postscreen with spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally receiving the following timeout message:

      May  1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply timeout 10s for swl.spamhaus.org

      This appears to happen during periods of load, but also when the server is idle. I understand it's possible to increase the timeout, but I would think 10s would be long enough, so I didn't want to start doing that. This also happens on multiple hosts on several unrelated networks.

      I'm also using a half-dozen RBLs, but they don't all always timeout.

      I'm using a local bind caching server on the hosts that are involved. Should I consider setting up rbldnsd for this instead? Or is that only for caching local RBLs?

      What is the result of this timeout? Does postscreen/dnsblog retry, or is the attempt failed and the mail just passed on?

      Here is the relevant postscreen info from my config. Please let me know if the full config is necessary.

      postscreen_access_list = permit_mynetworks, cidr:/etc/postfix/postscreen_access.cidr
      postscreen_blacklist_action = drop
      postscreen_dnsbl_action = enforce
      postscreen_dnsbl_reply_map = pcre:$config_directory/postscreen_dnsbl_reply_map.pcre
      postscreen_dnsbl_sites = mykey.zen.dq.spamhaus.net*3 b.barracudacentral.org*2 bl.spameatingmonkey.net*2 bl.spamcop.net dnsbl.sorbs.net psbl.surriel.com bl.mailspike.net swl.spamhaus.org*-4 list.dnswl.org=127.[0..255].[0..255].0*-2 list.dnswl.org=127.[0..255].[0..255].1*-3 list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
      postscreen_dnsbl_threshold = 3
      postscreen_greet_action = enforce
      postscreen_whitelist_interfaces = static:all 172.XX.YY.160/32 64.XX.YY.0/24 67.XX.YY.0/24
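Each entry in the `postscreen_dnsbl_sites` value above combines a list domain, an optional `=filter` on the returned address, and an optional `*weight` that defaults to 1. As a rough illustration only, here is a Python sketch that splits entries along those lines. `SiteEntry`, `parse_entry` and `parse_sites` are hypothetical names, and the sketch assumes the simple `domain[=filter][*weight]` shape used in this thread rather than reproducing Postfix's own parser.

```python
from typing import NamedTuple, Optional


class SiteEntry(NamedTuple):
    domain: str
    reply_filter: Optional[str]  # None: any listed reply counts
    weight: int                  # 1 when no "*weight" suffix is given


def parse_entry(entry: str) -> SiteEntry:
    """Split one 'domain[=filter][*weight]' entry (simplified sketch)."""
    # The weight, if any, follows the last '*'; it may be negative.
    body, star, weight_text = entry.rpartition("*")
    if star:
        weight = int(weight_text)
    else:
        body, weight = entry, 1
    # The optional reply filter follows the first '='.
    domain, eq, reply_filter = body.partition("=")
    return SiteEntry(domain, reply_filter if eq else None, weight)


def parse_sites(value: str) -> list[SiteEntry]:
    """Postfix lists may be separated by whitespace and/or commas."""
    return [parse_entry(tok) for tok in value.replace(",", " ").split()]


if __name__ == "__main__":
    sample = ("mykey.zen.dq.spamhaus.net*3 b.barracudacentral.org*2 "
              "bl.spamcop.net swl.spamhaus.org*-4 "
              "list.dnswl.org=127.[0..255].[0..255].0*-2")
    for site in parse_sites(sample):
        print(site)
```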

      Thanks so much,
      Alex

    • Wietse Venema
      Message 2 of 10 , May 1 2:38 PM
        Alex:
        > I'm using postfix-2.10.3 with fedora20 and have configured postscreen with
        > spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
        > receiving the following timeout message:
        >
        > May 1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
        > timeout 10s for swl.spamhaus.org

        This time limit has unfortunately escaped my attention. It is not
        yet configurable.

        The warning message means that postscreen gives up waiting for the
        DNS lookup result. This is a safety mechanism.

        > I'm also using a half-dozen RBLs, but they don't all always timeout.

        I see occasional timeouts on residential and co-located servers.
        By default the resolver *system library* routines wait 5s before
        retrying; this may be configurable in resolv.conf, but the
        postscreen time limit is still hard-coded.

        Wietse
      • Alex
        Message 3 of 10 , May 1 6:15 PM
          Hi,

          On Thu, May 1, 2014 at 5:38 PM, Wietse Venema <wietse@...> wrote:
          > Alex:
          > > I'm using postfix-2.10.3 with fedora20 and have configured postscreen with
          > > spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
          > > receiving the following timeout message:
          > >
          > > May  1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
          > > timeout 10s for swl.spamhaus.org
          >
          > This time limit has unfortunately escaped my attention.  It is not
          > yet configurable.
          >
          > The warning message means that postscreen gives up waiting for the
          > DNS lookup result. This is a safety mechanism.
          >
          > > I'm also using a half-dozen RBLs, but they don't all always timeout.
          >
          > I see occasional timeouts on residential and co-located servers.
          > By default the resolver *system library* routines wait 5s before
          > retrying; this may be configurable in resolv.conf, but the
          > postscreen time limit is still hard-coded.

          These are both corporate 10 Mbps dedicated links and I don't think latency and/or bandwidth is a problem.

          It actually appears swl.spamhaus.org is the main problem. It doesn't even resolve when I try to do it manually. This was a recommendation I used from this list some time ago. Has something changed? This is my current config:

          postscreen_dnsbl_sites = mykey.zen.dq.spamhaus.net*3
                  b.barracudacentral.org*2
                  bl.spameatingmonkey.net*2
                  bl.spamcop.net
                  dnsbl.sorbs.net
                  psbl.surriel.com
                  bl.mailspike.net
                  swl.spamhaus.org*-4
                  list.dnswl.org=127.[0..255].[0..255].0*-2
                  list.dnswl.org=127.[0..255].[0..255].1*-3
                  list.dnswl.org=127.[0..255].[0..255].[2..255]*-4

          I'm also curious what resolvers people are using for their mail servers? bind? Looking at my query graphs, it appears to be about 30 queries/sec on average for each host, just as a local caching server.

          Thanks,
          Alex

        • Stan Hoeppner
          Message 4 of 10 , May 1 8:15 PM
            On 5/1/2014 8:15 PM, Alex wrote:
            ...
            > These are both corporate 10 Mbps dedicated links and I don't think latency
            > and/or bandwidth is a problem.

            The problem, if network related, will be UDP packet loss somewhere in
            the end-to-end path, not b/w or latency on the CPE link into the
            provider's net.
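To see how packet loss can interact with postscreen's fixed 10s limit, consider a deliberately simplified timing model in Python: each lost UDP query costs one full resolver timeout (the 5s library default Wietse mentions) before the retry, and the successful try costs one round trip. The model ignores back-off, multiple nameservers in resolv.conf and TCP fallback, so treat it as an illustration under those assumptions, not a description of any particular resolver.

```python
POSTSCREEN_LIMIT_S = 10.0  # dnsblog reply timeout, hard-coded in Postfix 2.10


def lookup_time_s(lost_packets: int, rtt_s: float,
                  retry_timeout_s: float = 5.0) -> float:
    """Simplified: each lost query waits a full timeout, then one RTT succeeds."""
    return lost_packets * retry_timeout_s + rtt_s


def postscreen_gets_answer(lost_packets: int, rtt_s: float) -> bool:
    """True if the simulated answer arrives before postscreen gives up."""
    return lookup_time_s(lost_packets, rtt_s) < POSTSCREEN_LIMIT_S


for lost in range(3):
    t = lookup_time_s(lost, rtt_s=0.05)
    status = "answered" if postscreen_gets_answer(lost, 0.05) else "timed out"
    print(f"{lost} lost packet(s): {t:.2f}s, {status}")
```

Under these assumptions a single lost packet still fits inside the limit, while two consecutive losses for the same lookup push it past 10s, which would be consistent with occasional rather than constant warnings.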

            > It actually appears swl.spamhaus.org is the main problem. It doesn't even
            > resolve when I try to do it manually.

            From here:

            $ host 2.0.0.127.swl.spamhaus.org
            2.0.0.127.swl.spamhaus.org has address 127.0.2.2

            What response do you receive?
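The name looked up above follows the usual DNSBL convention: the octets of the client's IPv4 address are reversed and prepended to the list's zone, and 127.0.0.2 is the conventional test entry that a working list answers for. A small Python helper (hypothetical, IPv4 only) that builds the same query name for any zone in `postscreen_dnsbl_sites`:

```python
import ipaddress


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Return the DNS name queried for client_ip in a DNSBL/DNSWL zone."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError if not IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.rstrip('.')}"


print(dnsbl_query_name("127.0.0.2", "swl.spamhaus.org"))
```

Passing the printed name to `host` repeats the same manual check.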

            Due to your query volume you require paid service for Spamhaus Zen. The
            same terms apply to all Spamhaus services. Your IPs may have been
            blacklisted from DWL due to high query volume. Contact Spamhaus. If
            your contract entitles you to all Spamhaus lists, the fix may be as
            simple as changing the SWL hostname and adding your key.

            > This was a recommendation I used from
            > this list some time ago. Has something changed?

            See above.

            > postscreen_dnsbl_sites = mykey.zen.dq.spamhaus.net*3
            > b.barracudacentral.org*2
            > bl.spameatingmonkey.net*2
            > bl.spamcop.net
            > dnsbl.sorbs.net
            > psbl.surriel.com
            > bl.mailspike.net

            With these 7 dnsbls you will have extreme overlap of listed IPs. The
            last 5 will gain you little to nothing and simply add latency to your
            mail transactions, which is something you do not want in a high volume
            environment. I'd recommend you use Zen and BRBL, remove the rest, and
            rely on SWL and dnswl for FP mitigation during SMTP. You also run
            SpamAssassin on all of these hosts, so there's no need to pile on dnsbl
            queries at SMTP connect.

            > swl.spamhaus.org*-4
            > list.dnswl.org=127.[0..255].[0..255].0*-2
            > list.dnswl.org=127.[0..255].[0..255].1*-3
            > list.dnswl.org=127.[0..255].[0..255].[2..255]*-4

            Consolidate these last 3 to something like:
            list.dnswl.org=127.0.[2..14].[2..3]*-4

            To understand why, read "Return Codes" at:
            http://dnswl.org/tech
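A reply filter such as `127.0.[2..14].[2..3]` is compared octet by octet with the address the list returns, where `[lo..hi]` is an inclusive range. The Python sketch below handles only literal octets and `[lo..hi]` ranges, the two forms used in this thread; `filter_matches` is a hypothetical helper, not part of Postfix.

```python
import re

# One octet is either a literal number or an inclusive range "[lo..hi]".
_OCTET = r"\d+|\[\d+\.\.\d+\]"
_FILTER = re.compile(rf"({_OCTET})\.({_OCTET})\.({_OCTET})\.({_OCTET})")


def _octet_ok(token: str, value: int) -> bool:
    if token.startswith("["):
        lo, hi = (int(x) for x in token[1:-1].split(".."))
        return lo <= value <= hi
    return int(token) == value


def filter_matches(reply_filter: str, reply: str) -> bool:
    """True if every octet of the reply address satisfies the filter."""
    m = _FILTER.fullmatch(reply_filter)
    if m is None:
        raise ValueError(f"unsupported filter: {reply_filter!r}")
    octets = [int(x) for x in reply.split(".")]
    if len(octets) != 4:
        raise ValueError(f"not a dotted quad: {reply!r}")
    return all(_octet_ok(tok, val) for tok, val in zip(m.groups(), octets))
```

With the consolidated filter, `127.0.10.3` matches, while `127.0.15.3` (category 15) and `127.0.10.1` (trust level 1) do not.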

            > I'm also curious what resolvers people are using for their mail servers?
            > bind? Looking at my query graphs, it appears to be about 30 queries/sec on
            > average for each host, just as a local caching server.

            That's ~2.6M queries/day/host. Eliminating the 5 unnecessary dnsbl
            queries will lower this considerably. If you're not happy with bind,
            check out: http://doc.powerdns.com/html/built-in-recursor.html
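The daily figure is plain arithmetic, and the saving from trimming the list can be counted the same way. This sketch assumes, as Wietse confirms later in the thread, that postscreen sends one lookup per distinct domain no matter how many filter entries name it; it counts raw lookups per connection before any caching, so real resolver traffic would be lower.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def queries_per_day(queries_per_second: float) -> float:
    return queries_per_second * SECONDS_PER_DAY


def lookups_per_connection(sites: list[str]) -> int:
    """One DNS lookup per distinct domain in postscreen_dnsbl_sites."""
    return len({s.split("=")[0].split("*")[0] for s in sites})


ALEX_SITES = [
    "mykey.zen.dq.spamhaus.net*3", "b.barracudacentral.org*2",
    "bl.spameatingmonkey.net*2", "bl.spamcop.net", "dnsbl.sorbs.net",
    "psbl.surriel.com", "bl.mailspike.net", "swl.spamhaus.org*-4",
    "list.dnswl.org=127.[0..255].[0..255].0*-2",
    "list.dnswl.org=127.[0..255].[0..255].1*-3",
    "list.dnswl.org=127.[0..255].[0..255].[2..255]*-4",
]
TRIMMED_SITES = [
    "mykey.zen.dq.spamhaus.net*3", "b.barracudacentral.org*2",
    "swl.spamhaus.org*-4", "list.dnswl.org=127.0.[2..14].[2..3]*-4",
]

print(f"{queries_per_day(30):,.0f} queries/day")  # 2,592,000, i.e. ~2.6M
print(lookups_per_connection(ALEX_SITES), "->",
      lookups_per_connection(TRIMMED_SITES), "lookups per connection")  # 9 -> 4
```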

            If you have more than a handful of hosts doing 2.5M queries/day, you
            should seriously consider building a couple of resolvers homed in
            different networks and having the MX hosts query the pair. This will
            cut down considerably on the query load you're placing on your dns[b|w]l
            servers, as resolver cache will be much more effective.
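The caching argument can be illustrated with a toy model: several MX hosts see the same client address, and a cached answer is reused until a fixed TTL expires. The host names, the documentation-range address and the 300s TTL below are made-up assumptions for illustration; real DNSBL TTLs and traffic patterns vary.

```python
Arrival = tuple[float, str, str]  # (time in seconds, MX host, client IP)


def upstream_lookups(arrivals: list[Arrival], ttl_s: float, shared: bool) -> int:
    """Count cache misses that must go out to the dns[b|w]l servers.

    shared=False: each MX host keeps its own local cache.
    shared=True: all MX hosts query one common resolver cache.
    Arrivals must be in time order.
    """
    expires: dict = {}
    misses = 0
    for t, host, ip in arrivals:
        key = ip if shared else (host, ip)
        if expires.get(key, float("-inf")) <= t:  # absent or expired
            misses += 1
            expires[key] = t + ttl_s
    return misses


# Hypothetical traffic: one client hits three MX hosts, then returns later.
ARRIVALS = [
    (0, "mx1", "192.0.2.1"),
    (5, "mx2", "192.0.2.1"),
    (10, "mx3", "192.0.2.1"),
    (400, "mx1", "192.0.2.1"),
]
print("per-host caches:", upstream_lookups(ARRIVALS, 300, shared=False))  # 4
print("shared cache:   ", upstream_lookups(ARRIVALS, 300, shared=True))   # 2
```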

            Cheers,

            Stan
          • Tom Hendrikx
            Message 5 of 10 , May 2 12:57 AM
              On 05/02/2014 03:15 AM, Alex wrote:
              > Hi,
              >
              > On Thu, May 1, 2014 at 5:38 PM, Wietse Venema <wietse@...> wrote:
              > > Alex:
              > > > I'm using postfix-2.10.3 with fedora20 and have configured postscreen with
              > > > spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
              > > > receiving the following timeout message:
              > > >
              > > > May 1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
              > > > timeout 10s for swl.spamhaus.org
              > >
              > > This time limit has unfortunately escaped my attention. It is not
              > > yet configurable.
              > >
              > > The warning message means that postscreen gives up waiting for the
              > > DNS lookup result. This is a safety mechanism.
              > >
              > > > I'm also using a half-dozen RBLs, but they don't all always timeout.
              > >
              > > I see occasional timeouts on residential and co-located servers.
              > > By default the resolver *system library* routines wait 5s before
              > > retrying; this may be configurable in resolv.conf, but the
              > > postscreen time limit is still hard-coded.
              >
              > These are both corporate 10 Mbps dedicated links and I don't think latency
              > and/or bandwidth is a problem.
              >
              > It actually appears swl.spamhaus.org is the main problem. It doesn't
              > even resolve when I try to do it manually. This was a recommendation
              > I used from this list some time ago. Has something changed?

              As a feed user of spamhaus, it's easy to see the amount of data that is
              actually in the zones. Both DWL and SWL zones are empty, so the
              whitelist experiments of spamhaus seem to be either 'on hold' or dead.
              Feel free to drop the zones from your setup.

              This won't fix dns lookup problems in general though.

              Tom
            • Wietse Venema
              Message 6 of 10 , May 2 4:07 AM
                Stan Hoeppner:
                > > swl.spamhaus.org*-4
                > > list.dnswl.org=127.[0..255].[0..255].0*-2
                > > list.dnswl.org=127.[0..255].[0..255].1*-3
                > > list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
                >
                > Consolidate these last 3 to something like:
                > list.dnswl.org=127.0.[2..14].[2..3]*-4

                These three will result in one list.dnswl.org query, just like the
                consolidated one. There is no performance difference.

                However, there is a correctness difference. The consolidated form
                has the same weight 4 for all results, while the original form
                has different weights.
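The difference can be checked by scoring a few sample replies under both forms. This Python sketch assumes a simplified model of postscreen scoring in which every entry whose filter matches the reply adds its weight; it supports only literal octets and `[lo..hi]` ranges, and `entry_score` is a hypothetical helper. The sample replies are illustrative, not real listings.

```python
import re

_OCTET = r"\d+|\[\d+\.\.\d+\]"
_FILTER = re.compile(rf"({_OCTET})\.({_OCTET})\.({_OCTET})\.({_OCTET})")


def _octet_ok(token: str, value: int) -> bool:
    if token.startswith("["):
        lo, hi = (int(x) for x in token[1:-1].split(".."))
        return lo <= value <= hi
    return int(token) == value


def _filter_matches(reply_filter: str, reply: str) -> bool:
    m = _FILTER.fullmatch(reply_filter)
    if m is None:
        raise ValueError(f"unsupported filter: {reply_filter!r}")
    octets = [int(x) for x in reply.split(".")]
    return len(octets) == 4 and all(
        _octet_ok(tok, val) for tok, val in zip(m.groups(), octets))


def entry_score(entries: list[str], reply: str) -> int:
    """Sum the weights of all 'domain=filter*weight' entries matching reply."""
    total = 0
    for entry in entries:
        body, star, weight_text = entry.rpartition("*")
        if not star:  # no explicit weight: default of 1
            body, weight_text = entry, "1"
        _domain, _, reply_filter = body.partition("=")
        if _filter_matches(reply_filter, reply):
            total += int(weight_text)
    return total


ORIGINAL = [
    "list.dnswl.org=127.[0..255].[0..255].0*-2",
    "list.dnswl.org=127.[0..255].[0..255].1*-3",
    "list.dnswl.org=127.[0..255].[0..255].[2..255]*-4",
]
CONSOLIDATED = ["list.dnswl.org=127.0.[2..14].[2..3]*-4"]

for reply in ("127.0.15.0", "127.0.10.1", "127.0.10.3"):
    print(reply, entry_score(ORIGINAL, reply), entry_score(CONSOLIDATED, reply))
```

In this model both forms agree on a high-trust reply (-4), but only the original form rewards category 15 and trust levels 0 and 1 (-2 and -3 versus 0).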

                Wietse
              • Wietse Venema
                Message 7 of 10 , May 2 5:41 AM
                  Wietse Venema:
                  > Alex:
                  > > I'm using postfix-2.10.3 with fedora20 and have configured postscreen with
                  > > spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
                  > > receiving the following timeout message:
                  > >
                  > > May 1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
                  > > timeout 10s for swl.spamhaus.org
                  >
                  > This time limit has unfortunately escaped my attention. It is not
                  > yet configurable.

                  Fixed in Postfix 2.12.

                  Wietse

                  20140501

                  Cleanup: postscreen_dnsbl_timeout parameter. Files:
                  mantools/postlink, proto/postconf.proto, global/mail_params.h,
                  postscreen/postscreen.c, postscreen/postscreen_dnsbl.c.
                • Stan Hoeppner
                  Message 8 of 10 , May 2 3:45 PM
                    On 5/2/2014 6:07 AM, Wietse Venema wrote:
                    > Stan Hoeppner:
                    >>> swl.spamhaus.org*-4
                    >>> list.dnswl.org=127.[0..255].[0..255].0*-2
                    >>> list.dnswl.org=127.[0..255].[0..255].1*-3
                    >>> list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
                    >>
                    >> Consolidate these last 3 to something like:
                    >> list.dnswl.org=127.0.[2..14].[2..3]*-4
                    >
                    > These three will result in one list.dnswl.org query, just like the
                    > consolidated one. There is no performance difference.

                    Correct. The reason for consolidating these is not to reduce queries.

                    > However, there is a correctness difference. The consolidated form
                    > has the same weight 4 for all results, while the original form
                    > has different weights.

                    The consolidated form gives no score to a 4th octet value of [0..1], but
                    gives -4 to [2..3]. This is the key difference.

                    Alex' form and weights are not correct. And that is why I posted the
                    link to the return codes. The second 'octet' is always zero, not a
                    range. The 3rd octet has a range of 2-15, and the 4th octet a range of
                    0-3. Specifying a range of 0-255 or 2-255 to cover "the future" may
                    have the opposite effect, resulting in potential disaster, depending on
                    how/if/when dnswl changes things. Such wildcards should not be used.

                    A value of 15 in the 3rd octet means the sender is an Email Marketing
                    Provider. Most people would never whitelist such senders. Alex
                    currently does. Most people would give no preference to a 4th octet
                    score of 0 which means "no trust". Alex is giving -2. And he is giving
                    -3 to a 4th octet score of 1, "low trust". The recommended scale is
                    -0.1, -1.0, -10, -100, and this is how SpamAssassin handles dnswl
                    scoring. Using a 4 point scale instead of 100, a 4th octet value of 0
                    or 1 should be given NO whitelisting preference at all, which is what my
                    consolidated example does.
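The dnswl.org return codes as described above (127.0.<category>.<trust>, category 2 to 15, trust 0 to 3) can be written down as a small decoder. In this Python sketch the labels "medium" and "high" for trust levels 2 and 3 are assumptions not stated in the thread (check http://dnswl.org/tech), and the hypothetical `stan_weight` simply encodes the consolidated policy of -4 for trust 2 or 3 outside category 15.

```python
from typing import NamedTuple

# 4th octet: 0 "no trust" and 1 "low trust" per the thread; 2 and 3 labels assumed.
TRUST_NAMES = {0: "none", 1: "low", 2: "medium", 3: "high"}
EMAIL_MARKETING = 15  # 3rd-octet category for Email Marketing Providers


class DnswlReply(NamedTuple):
    category: int
    trust: int
    trust_name: str
    is_marketer: bool


def decode_dnswl(reply: str) -> DnswlReply:
    """Decode a list.dnswl.org answer of the form 127.0.<category>.<trust>."""
    octets = [int(x) for x in reply.split(".")]
    if len(octets) != 4 or octets[:2] != [127, 0]:
        raise ValueError(f"not a dnswl.org return code: {reply}")
    category, trust = octets[2], octets[3]
    # Reject anything outside the documented ranges instead of guessing.
    if not 2 <= category <= 15 or trust not in TRUST_NAMES:
        raise ValueError(f"out-of-range dnswl.org return code: {reply}")
    return DnswlReply(category, trust, TRUST_NAMES[trust],
                      category == EMAIL_MARKETING)


def stan_weight(reply: DnswlReply) -> int:
    """Same effect as list.dnswl.org=127.0.[2..14].[2..3]*-4."""
    if reply.is_marketer or reply.trust < 2:
        return 0
    return -4


for code in ("127.0.15.0", "127.0.10.1", "127.0.10.3"):
    r = decode_dnswl(code)
    print(code, r.trust_name, "marketer" if r.is_marketer else "-", stan_weight(r))
```

Raising an error on unexpected values, rather than matching them with a `[0..255]` wildcard, mirrors the warning about future changes to the return codes.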

                    Cheers,

                    Stan
                  • Alex
                    Message 9 of 10 , May 2 5:10 PM
                      Hi,

                      On Fri, May 2, 2014 at 6:45 PM, Stan Hoeppner <stan@...> wrote:
                      > On 5/2/2014 6:07 AM, Wietse Venema wrote:
                      > > Stan Hoeppner:
                      > >>>         swl.spamhaus.org*-4
                      > >>>         list.dnswl.org=127.[0..255].[0..255].0*-2
                      > >>>         list.dnswl.org=127.[0..255].[0..255].1*-3
                      > >>>         list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
                      > >>
                      > >> Consolidate these last 3 to something like:
                      > >>      list.dnswl.org=127.0.[2..14].[2..3]*-4
                      > >
                      > > These three will result in one list.dnswl.org query, just like the
                      > > consolidated one. There is no performance difference.
                      >
                      > Correct.  The reason for consolidating these is not to reduce queries.
                      >
                      > > However, there is a correctness difference. The consolidated form
                      > > has the same weight 4 for all results, while the original form
                      > > has different weights.
                      >
                      > The consolidated form gives no score to a 4th octet value of [0..1], but
                      > gives -4 to [2..3].  This is the key difference.
                      >
                      > Alex' form and weights are not correct.  And that is why I posted the
                      > link to the return codes.  The second 'octet' is always zero, not a
                      > range.  The 3rd octet has a range of 2-15, and the 4th octet a range of
                      > 0-3.  Specifying a range of 0-255 or 2-255 to cover "the future" may
                      > have the opposite effect, resulting in potential disaster, depending on
                      > how/if/when dnswl changes things.  Such wildcards should not be used.
                      >
                      > A value of 15 in the 3rd octet means the sender is an Email Marketing
                      > Provider.  Most people would never whitelist such senders.  Alex
                      > currently does.  Most people would give no preference to a 4th octet
                      > score of 0 which means "no trust".  Alex is giving -2.  And he is giving
                      > -3 to a 4th octet score of 1, "low trust".  The recommended scale is
                      > -0.1, -1.0, -10, -100, and this is how SpamAssassin handles dnswl
                      > scoring.  Using a 4 point scale instead of 100, a 4th octet value of 0
                      > or 1 should be given NO whitelisting preference at all, which is what my
                      > consolidated example does.

                      Somehow your first message to the list on this topic didn't make it to me. Had to read it in the archives. Anyway, thanks so much. My postscreen config was generated through a discussion on this list with rob0 some time ago, as well as his postscreen config (http://rob0.nodns4.us/howto/postfix/main.cf). Perhaps if he's reading, he can correct this.

                      I can't believe I've been whitelisting mass mailers. That's far from what I would want to be doing. In fact, I'm considering figuring out some spamassassin rules to better identify them based on the dnswl queries.

                      Regarding your DNS caching comments, thanks for this too. I hadn't realized there would be bandwidth savings by having one or two DNS servers that are queried on the network versus having a local cache on each mail server. I've always been a bind loyalist, but will consider the powerDNS program if it doesn't improve.

                      I've already made the postscreen changes on the systems, and am already noticing fewer DNS queries.

                      I've also removed swl.spamhaus.org entirely, thanks to a conversation with spamhaus and comments from Tom Hendrikx about it being discontinued.

                      Thanks everyone!
                      Alex


                    • /dev/rob0
                      Message 10 of 10 , May 2 8:00 PM
                        On Fri, May 02, 2014 at 08:10:18PM -0400, Alex wrote:
                        > On Fri, May 2, 2014 at 6:45 PM, Stan Hoeppner
                        > <stan@...> wrote:
                        > > On 5/2/2014 6:07 AM, Wietse Venema wrote:
                        > > > Stan Hoeppner:
                        > > >>> swl.spamhaus.org*-4
                        > > >>> list.dnswl.org=127.[0..255].[0..255].0*-2
                        > > >>> list.dnswl.org=127.[0..255].[0..255].1*-3
                        > > >>> list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
                        > > >>
                        > > >> Consolidate these last 3 to something like:
                        > > >> list.dnswl.org=127.0.[2..14].[2..3]*-4
                        > > >
                        > > > These three will result in one list.dnswl.org query, just like
                        > > > the consolidated one. There is no performance difference.
                        > >
                        > > Correct. The reason for consolidating these is not to reduce
                        > > queries.
                        > >
                        > > > However, there is a correctness difference. The consolidated
                        > > > form has the same weight 4 for all results, while the original
                        > > > form has different weights.
                        > >
                        > > The consolidated form gives no score to a 4th octet value of
                        > > [0..1], but gives -4 to [2..3]. This is the key difference.
                        > >
                        > > Alex' form and weights are not correct. And that is why I posted
                        > > the link to the return codes. The second 'octet' is always zero,
                        > > not a range. The 3rd octet has a range of 2-15, and the 4th
                        > > octet a range of 0-3. Specifying a range of 0-255 or 2-255 to
                        > > cover "the future" may have the opposite effect, resulting in
                        > > potential disaster, depending on how/if/when dnswl changes
                        > > things. Such wildcards should not be used.

                        Good point. I thought of this, but did not bother to implement it
                        that way. Eventually I will change it.

                        > > A value of 15 in the 3rd octet means the sender is an Email
                        > > Marketing Provider. Most people would never whitelist such
                        > > senders. Alex currently does. Most people would give no
                        > > preference to a 4th octet score of 0 which means "no trust".

                        Well, I whitelist mildly. Do note that this is a whitelist, under
                        management by people who, I suppose, don't like spam any more than
                        you or I.

                        A DNSWL.org return of 127.0.15.0 means an email marketer who is
                        nominally trying to limit spam (thus deserving a whitelist entry),
                        but who might not be doing that well.

                        A -1 score makes sense. It's not enough to override Zen or a
                        grouping of other DNSBLs, but if that's the only result from
                        postscreen_dnsbl_sites, it's enough to bypass the after-220 checks.

                        > > Alex is giving -2. And he is giving -3 to a 4th octet score of
                        > > 1, "low trust". The recommended scale is -0.1, -1.0, -10, -100,
                        > > and this is how SpamAssassin handles dnswl scoring.

                        Yes, I think -1, -2 and -4 make sense. I lump 4th octet 2 and 3
                        together because I'm a 2. :) Also, a -4 is going to override any
                        borderline DNSBL score. If it doesn't, I expect something to give
                        somewhere. In my studies, I found very little overlap between the
                        DNSBLs and the DNSWLs.

                        > > Using a 4 point scale instead of 100, a 4th octet value of
                        > > 0 or 1 should be given NO whitelisting preference at all,
                        > > which is what my consolidated example does.

                        But I don't agree with that. Scoring at the content scanning stage
                        differs from scoring in postscreen. DNSWL.org assumes that their
                        trust level "none" sites are not actually making money from spam. I
                        can't speak for Mathias, but I am pretty sure that he would delist
                        ANY known spammer.

                        > Somehow your first message to the list on this topic didn't make it
                        > to me. Had to read it in the archives. Anyway, thanks so much. My
                        > postscreen config was generated through a discussion on this list
                        > with rob0 some time ago, as well as his postscreen config (
                        > http://rob0.nodns4.us/howto/postfix/main.cf). Perhaps if he's
                        > reading, he can correct this.

                        Hiya! Yes, I remember. BTW, the better link to share is the HTML
                        page, http://rob0.nodns4.us/postscreen.html , which has all the
                        explanations and warnings.

                        > I can't believe I've been whitelisting mass mailers. That's far
                        > from what I would want to be doing. In fact, I'm considering
                        > figuring out some spamassassin rules to better identify them based
                        > on the dnswl queries.

                        If you want to be adventurous (and to violate the DNSWL.org spirit)
                        nothing stops you from using 127.0.15.0 with a positive score in
                        postscreen ... or even as a reject_rbl_client in smtpd!

                        I figure these are at worst the gray hats. And why bother delaying
                        them with after-220 tests they will pass anyway? So yes, my
                        policy here was considered and deliberate. But looking back, I'll
                        agree that a -1 would make more sense than -2.

                        Stan probably tends to be more aggressive than I am. There's no
                        right/wrong to that, it's a choice.

                        > Regarding your DNS caching comments, thanks for this too. I hadn't
                        > realized there would be bandwidth savings by having one or two DNS
                        > servers that are queried on the network versus having a local cache
                        > on each mail server. I've always been a bind loyalist, but will
                        > consider the powerDNS program if it doesn't improve.

                        I've always been a BIND loyalist too. Now I'm paid to be a BIND
                        loyalist. I have nothing against the competition, certainly I can't
                        say anything bad about them.

                        But I can assure you that if you know ways in which BIND needs to
                        improve, ISC wants to hear from you.

                        Bigger doesn't always mean better, this I grant (just look at
                        Microsoft!) But in the case of BIND it means that an enormous
                        worldwide userbase is assisting ISC in continually improving BIND.

                        I don't mind questioning my loyalties from time to time, but I
                        wouldn't blindly jump ship from software I know and trust unless
                        there was a very good reason.

                        > I've already made the postscreen changes on the systems, and
                        > already noticing fewer DNS queries.
                        >
                        > I've also removed swl.spamhaus.org entirely, thanks to a
                        > conversation with spamhaus and comments from Tom Hendrikx about
                        > it being discontinued.

                        Yep, I will be doing the same. Unfortunately I probably won't get
                        around to updating my web page very soon. Note also that I used
                        dnsbl.ahbl.org in postscreen; by the beginning of 2015 that will
                        become disastrous, as they are planning to put a wildcard in the
                        zone.
                        --
                        http://rob0.nodns4.us/
                        Offlist GMX mail is seen only if "/dev/rob0" is in the Subject: