
Re: reject_unknown_client_hostname rejecting on SERVFAIL

  • Wietse Venema
    Message 1 of 8 , Jun 2, 2008
      Bernhard Schmidt:
      > Hi,
      >
      > we're running 2.5.1 with the pretty hard setting
      >
      > smtpd_client_restrictions =
      > check_client_access cidr:biiiiig_whitelist,
      > reject_unknown_client_hostname
      >
      > unknown_client_reject_code = ${stress?421}${stress:550}
      >
      > According to the documentation (and my expectation) the reject code
      > should be 450 if either address->name or name->address fails (e.g.
      > timeout or SERVFAIL)
      >
      > According to my logs this is true if the name->address mapping fails,
      > but if the address->name mapping fails with SERVFAIL the mail is still
      > rejected with 550.
      >
      > lxmhs17:~ # host 89.139.242.190
      > ;; connection timed out; no servers could be reached
      > lxmhs17:~ # host 89.139.242.190
      > Host 190.242.139.89.in-addr.arpa not found: 2(SERVFAIL)
      >
      > but
      >
      > Jun 2 14:25:54 lxmhs17 postfix/smtpd[11967]: NOQUEUE: reject: RCPT from
      > unknown[89.139.242.190]: 550 5.7.1 Client host rejected: cannot find
      > your hostname, [89.139.242.190]; from=<akstcaaonlinemnsdgs@...>
      > to=<2180-3517loescher@...> proto=ESMTP helo=<harari>
      >
      > Is this a bug or am I doing something seriously wrong?

      Postfix uses the getnameinfo() SYSTEM LIBRARY routine.

      Apparently, your system's version reports DNS error code SERVFAIL
      as an unrecoverable error condition.

      Postfix considers the following getnameinfo() results recoverable:
      EAI_AGAIN, EAI_MEMORY, or EAI_SYSTEM. See src/smtpd/smtpd_peer.c.

      So this would be a bug in your system library.

      Wietse
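
The classification described above can be sketched in Python, the language used for the tests later in this thread. This is only an illustration, not Postfix's code: the function name reject_code and its permanent_code parameter are made up for the example, and the 450/550 values are taken from the original question.

```python
import socket

# Error codes the reply above lists as recoverable.
# hasattr() guards against platforms whose socket module lacks a constant.
RECOVERABLE = {
    getattr(socket, name)
    for name in ("EAI_AGAIN", "EAI_MEMORY", "EAI_SYSTEM")
    if hasattr(socket, name)
}


def reject_code(gai_code, permanent_code=550):
    """Pick an SMTP reply code for a failed address->name lookup.

    Returns 450 when the getnameinfo() error is in the recoverable set,
    otherwise the configured permanent code (hypothetical parameter).
    """
    return 450 if gai_code in RECOVERABLE else permanent_code
```

With this rule, a library that reports SERVFAIL as EAI_NONAME always ends up on the permanent branch, which would match the 550 seen in the log.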
    • Victor Duchovni
      Message 2 of 8 , Jun 2, 2008
        On Mon, Jun 02, 2008 at 02:28:15PM +0200, Bernhard Schmidt wrote:

        > Hi,
        >
        > we're running 2.5.1 with the pretty hard setting
        >
        > smtpd_client_restrictions =
        > check_client_access cidr:biiiiig_whitelist,
        > reject_unknown_client_hostname
        >
        > unknown_client_reject_code = ${stress?421}${stress:550}
        >
        > According to the documentation (and my expectation) the reject code
        > should be 450 if either address->name or name->address fails (e.g.
        > timeout or SERVFAIL)
        >
        > According to my logs this is true if the name->address mapping fails,
        > but if the address->name mapping fails with SERVFAIL the mail is still
        > rejected with 550.
        >
        > lxmhs17:~ # host 89.139.242.190
        > ;; connection timed out; no servers could be reached
        > lxmhs17:~ # host 89.139.242.190
        > Host 190.242.139.89.in-addr.arpa not found: 2(SERVFAIL)
        >
        > but
        >
        > Jun 2 14:25:54 lxmhs17 postfix/smtpd[11967]: NOQUEUE: reject: RCPT from
        > unknown[89.139.242.190]: 550 5.7.1 Client host rejected: cannot find
        > your hostname, [89.139.242.190]; from=<akstcaaonlinemnsdgs@...>
        > to=<2180-3517loescher@...> proto=ESMTP helo=<harari>

        This does not prove the point. The nameservice for this IP is rather intermittently
        available. Sometimes it just works, other times it fails.

        When it works, I get:

        $ ./getnameinfo 89.139.242.190
        Hostname: 89-139-242-190.bb.netvision.net.il
        Address: 89.139.242.190

        $ ./getaddrinfo 89-139-242-190.bb.netvision.net.il
        Hostname: 89-139-242-190.bb.netvision.net.il
        Addresses: 89.139.242.190

        It seems to be working right now, but we don't know what the story
        was when Postfix reported this error. Either at some point NXDOMAIN
        was actually returned or, as Wietse says, the system library returns
        unexpected error codes for temporary lookup failures.

        --
        Viktor.
      • Bernhard Schmidt
        Message 3 of 8 , Jun 2, 2008
          Hello,

          >> Is this a bug or am I doing something seriously wrong?
          >
          > Postfix uses the getnameinfo() SYSTEM LIBRARY routine.
          >
          > Apparently, your system's version reports DNS error code SERVFAIL
          > as an unrecoverable error condition.
          >
          > Postfix considers the following getnameinfo() results recoverable:
          > EAI_AGAIN, EAI_MEMORY, or EAI_SYSTEM. See src/smtpd/smtpd_peer.c.
          >
          > So this would be a bug in your system library.

          This appears to be the case. I can't really code C, but my tests in
          Python (which passes gaierrors through pretty well) show rc=-2 ('Name or
          service not known') after quite some delay. I verified with tcpdump that
          I did in fact get back SERVFAIL responses from both resolvers.

          Does anyone else see that? I see that on SLES10.1 (glibc-2.4-31.43.6)
          and on Ubuntu Hardy (libc6_2.7-10ubuntu3). I guess that should be the
          most common platform around.

          Bernhard
        • Bernhard Schmidt
          Message 4 of 8 , Jun 2, 2008
            On Mon, Jun 02, 2008 at 03:53:25PM +0200, Bernhard Schmidt wrote:

            > >> Is this a bug or am I doing something seriously wrong?
            > >
            > > Postfix uses the getnameinfo() SYSTEM LIBRARY routine.
            > >
            > > Apparently, your system's version reports DNS error code SERVFAIL
            > > as an unrecoverable error condition.
            > >
            > > Postfix considers the following getnameinfo() results recoverable:
            > > EAI_AGAIN, EAI_MEMORY, or EAI_SYSTEM. See src/smtpd/smtpd_peer.c.
            > >
            > > So this would be a bug in your system library.
            >
            > This appears to be the case, I can't really code C but my tests in
            > python (which forward gaierrors pretty well) show rc=-2 ('Name or
            > service not known') after quite some delay. I verified with tcpdump that
            > I did in fact get back SERVFAIL responses from both resolvers.
            >
            > Does anyone else see that? I see that on SLES10.1 (glibc-2.4-31.43.6)
            > and on Ubuntu Hardy (libc6_2.7-10ubuntu3). I guess that should be the
            > most common platform around.

            Okay, I did some more research. Again, disclaimer, I'm nowhere near
            understanding more than the basics of C, so I might be missing something
            really big here.

            All line numbers refer to the vanilla Postfix 2.5.2 code.

            In src/smtpd/smtpd_peer.c:308 sockaddr_to_hostname() is called. This
            function is defined in src/util/myaddrinfo.c:610 which basically calls
            getnameinfo(...., NI_NAMEREQD) in line 676. This would be the function
            that would need to return EAI_AGAIN, EAI_MEMORY or EAI_SYSTEM to have a
            tempfail and thus a 450 reject code, right?

            RFC3493 Section 6.2 allows the following Error Return Values for
            getnameinfo():

            | Error Return Values:
            |
            | The getnameinfo() function shall fail and return the corresponding
            | value if:
            |
            | [EAI_AGAIN] The name could not be resolved at this time.
            | Future attempts may succeed.
            |
            | [EAI_BADFLAGS] The flags had an invalid value.
            |
            | [EAI_FAIL] A non-recoverable error occurred.
            |
            | [EAI_FAMILY] The address family was not recognized or the address
            | length was invalid for the specified family.
            |
            | [EAI_MEMORY] There was a memory allocation failure.
            |
            | [EAI_NONAME] The name does not resolve for the supplied parameters.
            | NI_NAMEREQD is set and the host's name cannot be
            | located, or both nodename and servname were null.
            |
            | [EAI_OVERFLOW] An argument buffer overflowed.
            |
            | [EAI_SYSTEM] A system error occurred. The error code can be found
            | in errno.

            From my point of view, in case of a timeout/SERVFAIL both EAI_AGAIN
            (because it is a temporary error) and EAI_NONAME (because NI_NAMEREQD is
            set) are allowed. Again, the usual disclaimer about my understanding of
            C code applies, but it certainly looks like both glibc trunk and FreeBSD
            chose to return EAI_NONAME in this situation:

            http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/net/getnameinfo.c?annotate=1.20
            lines 273/274 and
            http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/inet/getnameinfo.c?annotate=1.36&cvsroot=glibc
            lines 294-300

            This matches the behaviour of my python test (not C, but the error codes
            match on both platforms which is a very strong hint in my POV)

            On Linux (glibc 2.7):
            >>> import socket
            >>> socket.getnameinfo( ('89.139.242.190', 0), socket.NI_NAMEREQD)
            Traceback (most recent call last):
              File "<stdin>", line 1, in <module>
            socket.gaierror: (-2, 'Name or service not known')
            # define EAI_NONAME -2 /* NAME or SERVICE is unknown. */

            On FreeBSD 7.0-RELEASE:
            >>> import socket
            >>> socket.getnameinfo( ('89.139.242.190', 0), socket.NI_NAMEREQD)
            Traceback (most recent call last):
              File "<stdin>", line 1, in <module>
            socket.gaierror: (8, 'hostname nor servname provided, or not known')
            #define EAI_NONAME 8 /* nodename nor servname ... */

            Am I missing anything?

            Bernhard
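
Because the numeric value of EAI_NONAME differs between the two platforms shown above, comparing results is easier with symbolic names. A minimal sketch, assuming the local Python socket module exposes the platform's EAI_* constants; the helper name eai_name is hypothetical:

```python
import socket

# Build a reverse map from numeric EAI_* values to their symbolic names,
# using whatever constants this platform's socket module defines.
EAI_NAMES = {
    getattr(socket, name): name
    for name in dir(socket)
    if name.startswith("EAI_")
}


def eai_name(code):
    """Return the symbolic name for a gaierror code, or a placeholder."""
    return EAI_NAMES.get(code, f"unknown({code})")
```

Passing the first element of a socket.gaierror's args to eai_name() gives a name that can be compared directly across systems, instead of -2 on one and 8 on the other.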
          • Wietse Venema
            Message 5 of 8 , Jun 2, 2008
              Bernhard Schmidt:
              > Okay, I did some more research. Again, disclaimer, I'm nowhere near
              > understanding more than the basics of C, so I might be missing something
              > really big here.
              >
              > All line numbers refer to the vanilla Postfix 2.5.2 code.
              >
              > In src/smtpd/smtpd_peer.c:308 sockaddr_to_hostname() is called. This
              > function is defined in src/util/myaddrinfo.c:610 which basically calls
              > getnameinfo(...., NI_NAMEREQD) in line 676. This would be the function
              > that would need to return EAI_AGAIN, EAI_MEMORY or EAI_SYSTEM to have a
              > tempfail and thus a 450 reject code, right?

              Yes, unless your Postfix was compiled with EMULATE_IPV4_ADDRINFO,
              which is only supported on older systems that have no IPv6 support,
              and that have no getnameinfo() etc. routines.

              If your system library reports SERVFAIL errors as EAI_NONAME, then
              there is no way to report this as a recoverable error.

              Wietse
            • Bernhard Schmidt
              Message 6 of 8 , Jun 5, 2008
                Hello Wietse,

                >> In src/smtpd/smtpd_peer.c:308 sockaddr_to_hostname() is called. This
                >> function is defined in src/util/myaddrinfo.c:610 which basically calls
                >> getnameinfo(...., NI_NAMEREQD) in line 676. This would be the function
                >> that would need to return EAI_AGAIN, EAI_MEMORY or EAI_SYSTEM to have a
                >> tempfail and thus a 450 reject code, right?
                > If your system library reports SERVFAIL errors as EAI_NONAME, then
                > there is no way to report this as a recoverable error.

                Have you encountered any stack that behaves correctly here? I'm trying
                to take this up with the glibc developers, having a working example
                (preferably open source) would be very helpful.

                Bernhard
              • Bernhard Schmidt
                Message 7 of 8 , Jun 14, 2008
                  On Mon, Jun 02, 2008 at 05:25:32PM -0400, Wietse Venema wrote:

                  > If your system library reports SERVFAIL errors as EAI_NONAME, then
                  > there is no way to report this as a recoverable error.

                  For the record, after spending hours of barking up wrong trees (or at
                  least the wrong branches of the correct tree) this problem has finally
                  been resolved. Executive summary: this is/was indeed a bug in the system
                  library.

                  We originally observed this problem on SLES 10.1, which includes glibc
                  2.4. After you pointed towards an erroneous return value of
                  getnameinfo(), I did some tests on my workstation (Ubuntu Hardy, glibc
                  2.7) and found it to be affected as well. Since there had been no
                  changes in glibc CVS since that version for that code, I concluded that
                  this bug was still present in current glibc.

                  This assumption was wrong. The bug in glibc was fixed by the
                  following commit for glibc 2.5:

                  http://sourceware.org/cgi-bin/cvsweb.cgi/libc/inet/getnameinfo.c.diff?r1=1.34&r2=1.35&cvsroot=glibc&f=h

                  My Hardy workstation (glibc 2.7) was still broken because of an
                  unrelated problem with the mDNS/avahi module installed on Ubuntu by
                  default:

                  bschmidt@lxbsc01:~$ grep ^hosts: /etc/nsswitch.conf
                  hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
                  bschmidt@lxbsc01:~$ ./getnameinfo 62.85.116.236
                  rv:Name or service not known(-2)
                  bschmidt@lxbsc01:~$ sudo vim /etc/nsswitch.conf
                  bschmidt@lxbsc01:~$ grep ^hosts: /etc/nsswitch.conf
                  hosts: files dns
                  bschmidt@lxbsc01:~$ ./getnameinfo 62.85.116.236
                  rv:Temporary failure in name resolution(-3)
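
To spot this kind of configuration on other hosts, the hosts: line can be inspected programmatically. The following is a rough sketch, not a complete nsswitch.conf parser: the function names are hypothetical, and it only lists which sources are consulted ahead of dns. The thread does not explain why the mdns entries change the result, so the sketch makes no attempt to model that.

```python
import re


def hosts_entries(nsswitch_text):
    """Return the source names on the 'hosts:' line, or [] if absent.

    Bracketed action items such as [NOTFOUND=return] are dropped.
    """
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            rest = re.sub(r"\[[^\]]*\]", " ", rest)  # remove [STATUS=action]
            return rest.split()
    return []


def sources_before_dns(nsswitch_text):
    """List the sources that are consulted before 'dns'."""
    before = []
    for source in hosts_entries(nsswitch_text):
        if source == "dns":
            break
        before.append(source)
    return before
```

Anything other than files in the returned list (for example mdns4_minimal, as in the first grep output above) is a candidate to rule out before blaming the C library.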

                  After recompiling the glibc 2.4 in SLES10 with the patch applied
                  getnameinfo() and thus Postfix behave as expected.

                  Would it be unreasonable to add a heads-up to the manpage? Definitely
                  affected are:

                  * SLES 10 (including the recently released SP2) shipping glibc 2.4
                  * Debian Etch shipping glibc 2.3
                  * FreeBSD 7.0-RELEASE (not shipping any glibc but according to my tests
                  broken as well)

                  I'll file the appropriate bug reports with Novell and Debian in the next
                  couple of days, but it will probably take years rather than months to
                  fix all the systems out there, so a small note in the manpage would
                  probably be a good idea. And/or maybe ship a small test program that can
                  be used to determine whether your system library is broken. I can
                  provide an IP address where the reverse lookup will always fail if
                  necessary.

                  Regards,
                  Bernhard
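
A test program of the kind suggested above might look roughly like this in Python. It is a sketch, not something shipped with Postfix: check_library and its resolver parameter are hypothetical names, and the verdict strings are illustrative. The result is only meaningful for an address whose reverse lookup is known to fail with SERVFAIL or a timeout.

```python
import socket


def check_library(ip, resolver=socket.getnameinfo):
    """Classify how the system library reports a failing reverse lookup.

    'resolver' is injectable so the logic can be exercised without DNS;
    by default the real socket.getnameinfo() is used.
    """
    try:
        resolver((ip, 0), socket.NI_NAMEREQD)
    except socket.gaierror as exc:
        code = exc.args[0]
        if code == socket.EAI_AGAIN:
            return "ok: failure reported as temporary (EAI_AGAIN)"
        if code == socket.EAI_NONAME:
            return "suspect: failure reported as permanent (EAI_NONAME)"
        return f"other: gaierror code {code}"
    return "inconclusive: lookup succeeded"
```

A "suspect" verdict for such an address would correspond to the behaviour that made Postfix answer 550 instead of 450 in this thread. An "inconclusive" verdict just means the nameservers happened to answer at that moment.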