Loading ...
Sorry, an error occurred while loading the content.
 

Why "lost connection after RCPT" when we reject?

Expand Messages
  • D'Arcy J.M. Cain
    There s a new trick in the spammer s bag of tricks. Companies like strikeiron and briteverify are springing up promising to verify email addresses so that
    Message 1 of 18 , Jul 11, 2014
      There's a new trick in the spammer's bag of tricks. Companies like
      strikeiron and briteverify are springing up promising to verify email
      addresses so that senders can limit sending invalid emails to MTAs and
      thus wind up on their suspicious sender list. I can't think of a
      single legitimate use for this service.

      In order to find spammers or, as described above, their agents, I would
      like to find incomplete sessions. Unfortunately there is only one
      string to trigger on, the subject mentioned one. We get this message
      in at least three scenarios that I can see. One, someone sends email
      to an invalid address and we reject the balance of the session. Two,
      we reject the session because of an RBL. Three, someone is probing to
      find out if an address is valid. I don't really care about number two
      since I have already dealt with it but Postfix still sends out the log
      line for it even after it has already logged the reject. Is there any
      way to not have that logged or at least change the log message?

      Even number one is unnecessary since we already know that someone has
      attempted to send to an invalid user. It's really just the third case
      that we care about. If I do this I guess I can drop the "User unknown
      in local recipient table" test.

      I suppose there is a number four where the sender has a system issue
      and disconnects prematurely but this probably doesn't happen often
      enough to worry about especially if I only take note once the sender
      passes some reasonable threshold.

      --
      D'Arcy J.M. Cain
      System Administrator, Vex.Net
      http://www.Vex.Net/ IM:darcy@...
      VoIP: sip:darcy@...
    • lists@rhsoft.net
      ... you did not provide any log but lost connection after RCPT means the client did not quit the smtp session properly and so the client is broken * client
      Message 2 of 18 , Jul 11, 2014
        Am 11.07.2014 21:02, schrieb D'Arcy J.M. Cain:
        > There's a new trick in the spammer's bag of tricks. Companies like
        > strikeiron and briteverify are springing up promising to verify email
        > addresses so that senders can limit sending invalid emails to MTAs and
        > thus wind up on their suspicious sender list. I can't think of a
        > single legitimate use for this service.
        >
        > In order to find spammers or, as described above, their agents, I would
        > like to find incomplete sessions. Unfortunately there is only one
        > string to trigger on, the subject mentioned one. We get this message
        > in at least three scenarios that I can see. One, someone sends email
        > to an invalid address and we reject the balance of the session. Two,
        > we reject the session because of an RBL. Three, someone is probing to
        > find out if an address is valid. I don't really care about number two
        > since I have already dealt with it but Postfix still sends out the log
        > line for it even after it has already logged the reject. Is there any
        > way to not have that logged or at least change the log message?
        >
        > Even number one is unnecessary since we already know that someone has
        > attempted to send to an invalid user. It's really just the third case
        > that we care about. If I do this I guess I can drop the "User unknown
        > in local recipient table" test.
        >
        > I suppose there is a number four where the sender has a system issue
        > and disconnects prematurely but this probably doesn't happen often
        > enough to worry about especially if I only take note once the sender
        > passes some reasonable threshold

        you did not provide any log but "lost connection after RCPT"
        means the client did not quit the smtp session properly and
        so the client is broken

        * client connects
        * client send SMTP commands
        * postfix answers with the REJECT status
        * client blindly closes the connection
      • D'Arcy J.M. Cain
        On Fri, 11 Jul 2014 21:06:59 +0200 ... Are you sure that you read my message? That s only one of the three scenarios that generates that log. ... That s
        Message 3 of 18 , Jul 11, 2014
          On Fri, 11 Jul 2014 21:06:59 +0200
          "lists@..." <lists@...> wrote:
          > > this message in at least three scenarios that I can see. One,
          > > someone sends email to an invalid address and we reject the balance
          > > of the session. Two, we reject the session because of an RBL.
          > > Three, someone is probing to find out if an address is valid. I

          > you did not provide any log but "lost connection after RCPT"
          > means the client did not quit the smtp session properly and
          > so the client is broken

          Are you sure that you read my message? That's only one of the three
          scenarios that generates that log.

          > * client connects
          > * client send SMTP commands
          > * postfix answers with the REJECT status
          > * client blindly closes the connection

          That's number one above. The problem is that Postfix logs the same
          message for scenario number three which is the one I want to isolate.

          Actually, number three can also look like number one when they try an
          invalid address so grepping for the lost connection log line would be
          fine if I could ignore number two.

          --
          D'Arcy J.M. Cain
          System Administrator, Vex.Net
          http://www.Vex.Net/ IM:darcy@...
          VoIP: sip:darcy@...
        • lists@rhsoft.net
          ... no - you did not understand what i told you the client is expected to close the SMTP session sane after receive the reject answer from the server and not
          Message 4 of 18 , Jul 11, 2014
            Am 11.07.2014 22:16, schrieb D'Arcy J.M. Cain:
            > On Fri, 11 Jul 2014 21:06:59 +0200
            > "lists@..." <lists@...> wrote:
            >>> this message in at least three scenarios that I can see. One,
            >>> someone sends email to an invalid address and we reject the balance
            >>> of the session. Two, we reject the session because of an RBL.
            >>> Three, someone is probing to find out if an address is valid. I
            >
            >> you did not provide any log but "lost connection after RCPT"
            >> means the client did not quit the smtp session properly and
            >> so the client is broken
            >
            > Are you sure that you read my message? That's only one of the three
            > scenarios that generates that log.
            >
            >> * client connects
            >> * client send SMTP commands
            >> * postfix answers with the REJECT status
            >> * client blindly closes the connection
            >
            > That's number one above. The problem is that Postfix logs the same
            > message for scenario number three which is the one I want to isolate.
            >
            > Actually, number three can also look like number one when they try an
            > invalid address so grepping for the lost connection log line would be
            > fine if I could ignore number two

            no - you did not understand what i told you

            the client is expected to close the SMTP session sane after
            receive the reject answer from the server and not just close
            the connection without saying "bye"

            in other words:

            * the 4 steps above are what happening
            * the first 3 steps are expected
            * step 4 should be a quit from the client before close

            look at the last post of that thread
            http://postfix.1071664.n5.nabble.com/lost-connection-after-RCPT-td903.html
          • Noel Jones
            ... But there s really only one scenario. The only time postfix logs that message is when the connection is lost after RCPT. This is always caused by either
            Message 5 of 18 , Jul 11, 2014
              On 7/11/2014 3:16 PM, D'Arcy J.M. Cain wrote:
              > On Fri, 11 Jul 2014 21:06:59 +0200
              > "lists@..." <lists@...> wrote:
              >>> this message in at least three scenarios that I can see. One,
              >>> someone sends email to an invalid address and we reject the balance
              >>> of the session. Two, we reject the session because of an RBL.
              >>> Three, someone is probing to find out if an address is valid. I
              >
              >> you did not provide any log but "lost connection after RCPT"
              >> means the client did not quit the smtp session properly and
              >> so the client is broken
              >
              > Are you sure that you read my message? That's only one of the three
              > scenarios that generates that log.

              But there's really only one scenario. The only time postfix logs
              that message is when the connection is lost after RCPT. This is
              always caused by either A) a poorly written mail engine that
              improperly drops the connection, or B) a network problem.
              Unfortunately, it's impossible to tell the difference from your end.
              All postfix knows is the connection was lost unexpectedly, and it
              would be improper to not log it.

              You're focusing on what happens before the lost connection. That's a
              job for log analysis tools.

              I suppose the "recipient count" could be added to the "lost
              connection" message. That might be modestly useful to the general
              user base. Maybe something like:

              postfix/smtpd[nnn]: lost connection after RCPT from
              test.example.com[192.0.2.100], nrcpt=N

              But that's just an idea, not a fully thought-out proposal. Feel free
              to submit a patch.

              Of course, the spamware writers could easily fix this little
              artifact by sending QUIT after their payload is rejected rather than
              just dropping the connection. They already know this. Apparently
              (for now) they would rather save a few milliseconds and move on to
              the next target.




              -- Noel Jones
            • Wietse Venema
              Noel Jones: [ Charset ISO-8859-1 converted... ] ... I wonder, does that include rejected recipients? What about recipients in earlier transactions within the
              Message 6 of 18 , Jul 11, 2014
                Noel Jones:
                [ Charset ISO-8859-1 converted... ]
                > On 7/11/2014 3:16 PM, D'Arcy J.M. Cain wrote:
                > > On Fri, 11 Jul 2014 21:06:59 +0200
                > > "lists@..." <lists@...> wrote:
                > >>> this message in at least three scenarios that I can see. One,
                > >>> someone sends email to an invalid address and we reject the balance
                > >>> of the session. Two, we reject the session because of an RBL.
                > >>> Three, someone is probing to find out if an address is valid. I
                > >
                > >> you did not provide any log but "lost connection after RCPT"
                > >> means the client did not quit the smtp session properly and
                > >> so the client is broken
                > >
                > > Are you sure that you read my message? That's only one of the three
                > > scenarios that generates that log.
                >
                > But there's really only one scenario. The only time postfix logs
                > that message is when the connection is lost after RCPT. This is
                > always caused by either A) a poorly written mail engine that
                > improperly drops the connection, or B) a network problem.
                > Unfortunately, it's impossible to tell the difference from your end.
                > All postfix knows is the connection was lost unexpectedly, and it
                > would be improper to not log it.
                >
                > You're focusing on what happens before the lost connection. That's a
                > job for log analysis tools.
                >
                > I suppose the "recipient count" could be added to the "lost
                > connection" message. That might be modestly useful to the general
                > user base. Maybe something like:
                >
                > postfix/smtpd[nnn]: lost connection after RCPT from
                > test.example.com[192.0.2.100], nrcpt=N
                >
                > But that's just an idea, not a fully thought-out proposal. Feel free
                > to submit a patch.

                I wonder, does that include rejected recipients? What about recipients
                in earlier transactions within the same SMTP session? Whatever we
                log would need to be easy to explain.

                Wietse
                > Of course, the spamware writers could easily fix this little
                > artifact by sending QUIT after their payload is rejected rather than
                > just dropping the connection. They already know this. Apparently
                > (for now) they would rather save a few milliseconds and move on to
                > the next target.
                >
                >
                >
                >
                > -- Noel Jones
                >
              • Bill Cole
                ... No, it seems to be a generic description of all 3 scenarios you describe. The most common modern configuration of Postfix postpones policy-based rejections
                Message 7 of 18 , Jul 11, 2014
                  On 11 Jul 2014, at 16:16, D'Arcy J.M. Cain wrote:

                  > On Fri, 11 Jul 2014 21:06:59 +0200
                  > "lists@..." <lists@...> wrote:
                  >>> this message in at least three scenarios that I can see. One,
                  >>> someone sends email to an invalid address and we reject the balance
                  >>> of the session. Two, we reject the session because of an RBL.
                  >>> Three, someone is probing to find out if an address is valid. I
                  >
                  >> you did not provide any log but "lost connection after RCPT"
                  >> means the client did not quit the smtp session properly and
                  >> so the client is broken
                  >
                  > Are you sure that you read my message? That's only one of the three
                  > scenarios that generates that log.
                  >
                  >> * client connects
                  >> * client send SMTP commands
                  >> * postfix answers with the REJECT status
                  >> * client blindly closes the connection
                  >
                  > That's number one above.

                  No, it seems to be a generic description of all 3 scenarios you
                  describe.

                  The most common modern configuration of Postfix postpones policy-based
                  rejections for reasons that are determined early in the SMTP
                  conversation until the RCPT command, because historically some sending
                  software (even some which was nominally legitimate, e.g. Exchange,
                  qmail) reacted in undesirable ways to 5xx replies to earlier commands.
                  "Rejection" in all 3 of your scenarios is a 5xx reply code from Postfix
                  to the client's RCPT command.

                  The formally proper reactions by a well-behaved SMTP client to that
                  event do not include unilaterally closing the TCP connection over which
                  the SMTP session is running. The proper way to end an SMTP session is to
                  send the QUIT command and await the inevitable 221 reply and
                  server-initiated TCP closure, but many spammer tools (and a few MUA's)
                  either never send the QUIT or send it with a TCP FIN flag of their own
                  and immediately dispose of their end of the connection. That event is
                  logged by Postfix as a "lost connection after RCPT" because that is what
                  it is. A proper client could also have offered another RCPT or RSET
                  followed by a new transaction.

                  > The problem is that Postfix logs the same
                  > message for scenario number three which is the one I want to isolate.

                  Scenario 3 is indistinguishable from Scenario 1. The only useful means
                  in the modern world to probe the validity of user@... requires
                  the prober to say "RCPT TO:<user@...>" to an MX for example.com
                  (after suitable HELO/EHLO & MAIL commands) and check the reply code. A
                  definitive test also requires following through with an actual message
                  if the reply to RCPT begins with '250' and then waiting for a true reply
                  to the message, but most spam facilitators will just check the RCPT
                  reply code and move to the next address in their list.

                  > Actually, number three can also look like number one when they try an
                  > invalid address so grepping for the lost connection log line would be
                  > fine if I could ignore number two.

                  You're looking for a solution in the wrong place. Postfix logs its
                  rejections of RCPT commands clearly in lines that include the envelope
                  sender, rejected recipient, ehlo/helo argument, connecting IP/hostname,
                  and the specific reply code & text that it gave to the client. For
                  example, here are 4 rejections by my server today, selected for their
                  lack of obscenity in response text and very lightly redacted, showing 4
                  different types of rejection including those you want to identify. Note:
                  I have "smtpd_delay_open_until_valid_rcpt = no" & "enable_long_queue_ids
                  = yes" so these have nice long meaningful and eternally unique IDs where
                  your logs might have NOQUEUE or shorter traditional Postfix IDs:

                  Fri Jul 11 16:13:46 toaster postfix/smtpd[44274]: 3h956y3tPTz1XtSCD:
                  reject: RCPT from drone13.ral.icpbounce.com[66.192.165.132]: 550 5.7.1
                  <bounces+630579.127201093.883421@...>: Sender address
                  rejected: Too much spam from your ESP.;
                  from=<bounces+630579.127201093.883421@...>
                  to=<REDACT@...> proto=ESMTP
                  helo=<drone13.ral.icpbounce.com>

                  Fri Jul 11 16:14:19 toaster postfix/smtpd[44274]: 3h957b4lzFz1XtSCJ:
                  reject: RCPT from unknown[87.252.208.73]: 550 5.7.1
                  <kwame_akaba@...>: Sender address rejected: Mail from Yahoo's
                  sewer is not welcome here. Get a real mail service.;
                  from=<kwame_akaba@...> to=<REDACT@...> proto=ESMTP
                  helo=<researchandmarkets.org>

                  Both of those are due to local check_sender_access rules.

                  Fri Jul 11 18:05:22 toaster postfix/smtpd[45421]: 3h97bj3k7fz1XtSJP:
                  reject: RCPT from unknown[85.237.53.92]: 554 5.7.1 Service unavailable;
                  Client host [85.237.53.92] blocked using psbl.surriel.com; Listed in
                  PSBL, see http://psbl.org/listing?ip=85.237.53.92; from=<info@...>
                  to=<REDACT@...> proto=ESMTP helo=<mail.pekof.com>

                  A DNSBL rejection.

                  Fri Jul 11 18:34:13 toaster postfix/smtpd[45719]: 3h98Dz6dv0z1XtSLG:
                  reject: RCPT from vds1-14.ispfr.net[195.14.0.23]: 550 5.1.1
                  <admin@...>: Recipient address rejected: User unknown in relay
                  recipient table; from=<REDACT@...>
                  to=<admin@...> proto=ESMTP helo=<de505.ispfr.net>

                  A spammer trying to mail or verify a bogus address.
                • D'Arcy J.M. Cain
                  On Fri, 11 Jul 2014 16:52:12 -0500 ... But A has subsets. I want to ask the question Who connected, confirmed a valid address and disconnected without
                  Message 8 of 18 , Jul 12, 2014
                    On Fri, 11 Jul 2014 16:52:12 -0500
                    Noel Jones <njones@...> wrote:
                    > But there's really only one scenario. The only time postfix logs
                    > that message is when the connection is lost after RCPT. This is
                    > always caused by either A) a poorly written mail engine that
                    > improperly drops the connection, or B) a network problem.

                    But 'A' has subsets. I want to ask the question "Who connected,
                    confirmed a valid address and disconnected without sending mail?" Is
                    that an unreasonable question without needing to do stateful log
                    analysis? It's not that I am a stranger to that sort of log analysis
                    but the Postfix engine already has that information. All I am saying
                    is that it would be nice if the lost connection message (or a separate
                    message) made note of the status at the time of disconnection.

                    Actually, a separate log entry makes sense because I want to know that
                    information whether the connection was dropped properly or not. In
                    other words, after a disconnect of any sort I want to know if the sender
                    sent an invalid address, a valid one which it followed up with DATA or
                    a valid one that it did not follow up.

                    > Unfortunately, it's impossible to tell the difference from your end.
                    > All postfix knows is the connection was lost unexpectedly, and it
                    > would be improper to not log it.

                    I understand that. In fact, I understand more about what gets logged
                    now from this discussion which I thank you and others for.

                    > You're focusing on what happens before the lost connection. That's a
                    > job for log analysis tools.

                    Which I was trying to avoid mainly because I analyze the logs every
                    five minutes to see who to block. By the end of the day that gets very
                    CPU intensive. I was hoping for a simple grep|sed solution. Maybe I
                    need a single process that runs all day doing a tail -f on the log file.

                    Hmm. I wonder if I can play games with syslogd so that mail logs go to
                    maillog as well as a socket that I can read. I'll have to play with
                    that.

                    > Of course, the spamware writers could easily fix this little
                    > artifact by sending QUIT after their payload is rejected rather than
                    > just dropping the connection. They already know this. Apparently
                    > (for now) they would rather save a few milliseconds and move on to
                    > the next target.

                    This is what I am worried about. Right now I am just counting dropped
                    connections but that's not a long term solution.

                    --
                    D'Arcy J.M. Cain
                    System Administrator, Vex.Net
                    http://www.Vex.Net/ IM:darcy@...
                    VoIP: sip:darcy@...
                  • Noel Jones
                    ... My first thought was a simple number of valid recipients within this session before it disconnected , similar to the nrcpt counter in the cleanup log
                    Message 9 of 18 , Jul 12, 2014
                      On 7/11/2014 5:06 PM, Wietse Venema wrote:
                      >>
                      >> I suppose the "recipient count" could be added to the "lost
                      >> connection" message. That might be modestly useful to the general
                      >> user base. Maybe something like:
                      >>
                      >> postfix/smtpd[nnn]: lost connection after RCPT from
                      >> test.example.com[192.0.2.100], nrcpt=N
                      >>
                      >> But that's just an idea, not a fully thought-out proposal. Feel free
                      >> to submit a patch.
                      >
                      > I wonder, does that include rejected recipients? What about recipients
                      > in earlier transactions within the same SMTP session? Whatever we
                      > log would need to be easy to explain.
                      >
                      > Wietse

                      My first thought was a simple "number of valid recipients within
                      this session before it disconnected", similar to the nrcpt counter
                      in the cleanup log entry, or the "recipient count" in the policy
                      service. This seems dirt simple to explain, which is always good.
                      One could use this simple display to look for non-zero events worthy
                      of investigation. Zero count shows a host that was already rejected
                      for some reason and can be ignored.

                      proposed log:
                      postfix/smtpd[nnn]: lost connection after RCPT from
                      test.example.com[192.0.2.100], nrcpt=N



                      Probably more useful to help identify abuse would be a counter of
                      valid/total RCPT commands within a session that drops. nrcpt=N/T
                      where N is valid recipients, T is total RCPT commands. I think
                      valid/total is easier to explain than valid/rejected, and makes a
                      pretty fraction display.

                      proposed log:
                      postfix/smtpd[nnn]: lost connection after RCPT from
                      test.example.com[192.0.2.100], nrcpt=N/T




                      -- Noel Jones
                    • Bill Cole
                      ... A formally well-behaved address verifier is most obvious in Postfix syslog messages by obscurity. This is all syslog messages generated by a manual SMTP
                      Message 10 of 18 , Jul 12, 2014
                        On 12 Jul 2014, at 9:19, D'Arcy J.M. Cain wrote:

                        > I want to ask the question "Who connected,
                        > confirmed a valid address and disconnected without sending mail?" Is
                        > that an unreasonable question without needing to do stateful log
                        > analysis? It's not that I am a stranger to that sort of log analysis
                        > but the Postfix engine already has that information. All I am saying
                        > is that it would be nice if the lost connection message (or a separate
                        > message) made note of the status at the time of disconnection.
                        >
                        > Actually, a separate log entry makes sense because I want to know that
                        > information whether the connection was dropped properly or not. In
                        > other words, after a disconnect of any sort I want to know if the
                        > sender
                        > sent an invalid address, a valid one which it followed up with DATA or
                        > a valid one that it did not follow up.

                        A formally well-behaved address verifier is most obvious in Postfix
                        syslog messages by obscurity. This is all syslog messages generated by a
                        manual SMTP session testing 6 addresses and disconnecting properly:

                        Sat Jul 12 18:54:45 toaster postfix/postscreen[65414] <Info>: CONNECT
                        from [127.0.0.1]:64826 to \[127.0.0.1\]:25
                        Sat Jul 12 18:54:45 toaster postfix/postscreen[65414] <Info>:
                        WHITELISTED [127.0.0.1]:64826
                        Sat Jul 12 18:54:45 toaster postfix/smtpd[65416] <Info>: connect from
                        localhost[127.0.0.1]
                        Sat Jul 12 18:55:06 toaster postfix/smtpd[65416] <Info>:
                        3h9mff597Zz1XtTxB: client=localhost[127.0.0.1]
                        Sat Jul 12 18:56:18 toaster postfix/smtpd[65416] <Info>: disconnect from
                        localhost[127.0.0.1]

                        IOW: Postfix logs nothing about what a client does unless it results in
                        a failure reply or a queued message.

                        Absent a log message from Postfix itself, you could get a message out of
                        a milter (e.g. MIMEDefang) of each RCPT.
                      • Wietse Venema
                        ... [I am making an exception to respond on-list to known people.] Interesting idea, but why not log these numbers with the disconnect event? This is logged
                        Message 11 of 18 , Jul 12, 2014
                          Noel Jones:
                          > Probably more useful to help identify abuse would be a counter of
                          > valid/total RCPT commands within a session that drops. nrcpt=N/T
                          > where N is valid recipients, T is total RCPT commands. I think
                          > valid/total is easier to explain than valid/rejected, and makes a
                          > pretty fraction display.
                          >
                          > proposed log:
                          > postfix/smtpd[nnn]: lost connection after RCPT from
                          > test.example.com[192.0.2.100], nrcpt=N/T

                          [I am making an exception to respond on-list to known people.]

                          Interesting idea, but why not log these numbers with the "disconnect"
                          event? This is logged for all SMTP sessions, whether or not the client
                          terminates a session with the QUIT command.

                          And more counters might be of interest: the distribution of
                          accepted/total number of {helo/mail/rcpt/data/dot/other} commands
                          would give the demographics of an SMTP session. If a client hangs
                          up after sending MAIL FROM and that command was or was not accepted,
                          then that is a clue that would otherwise only be available with
                          verbose logging.

                          Wietse
                        • Noel Jones
                          ... Yes, that had occurred to me, but then you would still have to correlate the stats on the disconnect line with a premature lost connection earlier in the
                          Message 12 of 18 , Jul 12, 2014
                            On 7/12/2014 7:09 PM, Wietse Venema wrote:
                            > Noel Jones:
                            >> Probably more useful to help identify abuse would be a counter of
                            >> valid/total RCPT commands within a session that drops. nrcpt=N/T
                            >> where N is valid recipients, T is total RCPT commands. I think
                            >> valid/total is easier to explain than valid/rejected, and makes a
                            >> pretty fraction display.
                            >>
                            >> proposed log:
                            >> postfix/smtpd[nnn]: lost connection after RCPT from
                            >> test.example.com[192.0.2.100], nrcpt=N/T
                            >
                            > [I am making an exception to respond on-list to known people.]
                            >
                            > Interesting idea, but why not log these numbers with the "disconnect"
                            > event? This is logged for all SMTP sessions, whether or not the client
                            > terminates a session with the QUIT command.

                            Yes, that had occurred to me, but then you would still have to
                            correlate the stats on the disconnect line with a premature lost
                            connection earlier in the log. At least for now, the lost
                            connection is a nice flag for possible abuse, and normal disconnects
                            are less interesting.

                            My goal is something a simple grep command can identify for further
                            investigation.


                            >
                            > And more counters might be of interest: the distribution of
                            > accepted/total number of {helo/mail/rcpt/data/dot/other} commands
                            > would give the demographics of an SMTP session. If a client hangs
                            > up after sending MAIL FROM and that command was or was not accepted,
                            > then that is a clue that would otherwise only be available with
                            > verbose logging.

                            I was trying to start with something easily implemented. Moving past
                            that...

                            A new "connection stats" line logged separately after the disconnect
                            could include all that and more, would surely be used for things I
                            haven't thought of, while still being fairly easy to explain.

                            Sample log expanding on the earlier ideas, n=valid T=total S=seconds:

                            postfix/smtpd[nnn]: stats: test.example.com[192.0.2.100]:port,
                            helo=n/T, auth=n/T, mail=n/T, rcpt=n/T, data=n/T, dot=n/T, quit=n/T,
                            other=n/T, bytes=transmited/received, duration=SSS.ss,
                            TLS={none|anonymous|trusted|...}

                            I'm not sure how to indicate a lost connection in the sample above.
                            Would including a "quit=n/T" be sufficient, 0/0 indicating a lost
                            connection, 1/1 normal? Or would there need to be a separate
                            end={normal|lost} indicator? Or maybe better for documenting,
                            "quit={yes|none}"

                            Admittedly, I have no idea what it would take to add all that info.
                            Not my intention to propose a 3-month project. But you asked, so
                            I'll shoot for the moon.



                            -- Noel Jones
                          • Wietse Venema
                            In response to Noel s followup, here is a proposal that can make Postfix trouble shooting / anomaly detection easier. This would reveal information that is
                            Message 13 of 18 , Jul 14, 2014
                              In response to Noel's followup, here is a proposal that can make
                              Postfix trouble shooting / anomaly detection easier. This would
                              reveal information that is currently available only by turning on
                              verbose logging.

                              Proposal:

                              The Postfix SMTP server maintains two counters for each known
                              command: one counter for the total number of times the command was
                              issued during an SMTP session, and one counter for the number of
                              normal completions (a 2XX reply status). These counters are reset
                              before the server accepts the next SMTP connection. Perhaps there
                              should also be a counter for unknown commands.

                              Upon disconnect. the Postfix SMTP server logs statistics for each
                              command that has a non-zero counter. The syntax is:

                              command-name=normal-completions/total

                              Example: a "normal" session with ESMTP handshake, one mail delivery
                              transaction with one recipient, and closed with "quit":

                              ehlo=1/1 mail=1/1 rcpt=1/1 data=1/1 quit=1/1

                              An "abnormal" session that drops after a rejected recipient:

                              helo=1/1 mail=1/1 rcpt=0/1

                              A "normal" ESMTP session with vrfy:

                              ehlo=1/1 vrfy=1/1 quit=1/1

                              An "abnormal" session that drops after 10 rejected AUTH commands:

                              ehlo=1/1 auth=0/10

                              The logging shows only counters for commands that were actually
                              issued. To save space we could replace "n/n" (two identical numbers)
                              with just "n". I don't know if this would actually simplify parsing.

                              As the examples show this is really a small amount of text, so there
                              is no reason to increase logging overhead by using a separate record.
                              Since the stats would be logged at the end of a session, they can
                              be logged in the "disconnect" record.

                              Wietse
                            • Andreas Schulze
                              ... Hello Wietse, the proposal sounds good. Such intormation could be helpful. Do you think it should be logged always or only while debugging? I use to
                              Message 14 of 18 , Jul 14, 2014
                                Wietse Venema:
                                > Since the stats would be logged at the end of a session, they can
                                > be logged in the "disconnect" record.

                                Hello Wietse,

                                the proposal sounds good. Such intormation could be helpful.

                                Do you think it should be logged always or only while debugging?
                                I use to "postconf -e "debug_peer_list=$buggy_client" when searching anomalies
                                and would expect such details only in that context.

                                Andreas
                              • Noel Jones
                                ... As I understand it, the stats will always be logged, and there will not be an option to turn them off. This is just a few characters added to the existing
                                Message 15 of 18 , Jul 14, 2014
                                  On 7/14/2014 9:40 AM, Andreas Schulze wrote:
                                  > Wietse Venema:
                                  >> Since the stats would be logged at the end of a session, they can
                                  >> be logged in the "disconnect" record.
                                  >
                                  > Hello Wietse,
                                  >
                                  > the proposal sounds good. Such intormation could be helpful.
                                  >
                                  > Do you think it should be logged always or only while debugging?
                                  > I use to "postconf -e "debug_peer_list=$buggy_client" when searching anomalies
                                  > and would expect such details only in that context.
                                  >
                                  > Andreas
                                  >


                                  As I understand it, the stats will always be logged, and there will
                                  not be an option to turn them off.

                                  This is just a few characters added to the existing "disconnect" log
                                  entry, rather than the dozens/hundreds of log lines added by
                                  debugging, and shouldn't make a significant difference in log size.

                                  This also allows you to retroactively examine normal logs for
                                  anomalies, which isn't practical with debugging.




                                  -- Noel Jones
                                • Noel Jones
                                  ... Seems to me an error count, including unknown commands, would be helpful. err=n Or unknown commands/total errors maybe? err=n/T This would not be logged
                                  Message 16 of 18 , Jul 14, 2014
                                    On 7/14/2014 8:25 AM, Wietse Venema wrote:
                                    > In response to Noel's followup, here is a proposal that can make
                                    > Postfix trouble shooting / anomaly detection easier. This would
                                    > reveal information that is currently available only by turning on
                                    > verbose logging.
                                    >
                                    > Proposal:
                                    >
                                    > The Postfix SMTP server maintains two counters for each known
                                    > command: one counter for the total number of times the command was
                                    > issued during an SMTP session, and one counter for the number of
                                    > normal completions (a 2XX reply status). These counters are reset
                                    > before the server accepts the next SMTP connection. Perhaps there
                                    > should also be a counter for unknown commands.

                                    Seems to me an error count, including unknown commands, would be
                                    helpful. err=n

                                    Or unknown commands/total errors maybe? err=n/T

                                    This would not be logged if there are no errors.

                                    >
                                    > Upon disconnect. the Postfix SMTP server logs statistics for each
                                    > command that has a non-zero counter. The syntax is:
                                    >
                                    > command-name=normal-completions/total
                                    >
                                    > Example: a "normal" session with ESMTP handshake, one mail delivery
                                    > transaction with one recipient, and closed with "quit":
                                    >
                                    > ehlo=1/1 mail=1/1 rcpt=1/1 data=1/1 quit=1/1

                                    ...

                                    > The logging shows only counters for commands that were actually
                                    > issued. To save space we could replace "n/n" (two identical numbers)
                                    > with just "n". I don't know if this would actually simplify parsing.

                                    I vote for consistent 1/1. But I don't feel strongly about it.

                                    >
                                    > As the examples show this is really a small amount of text, so there
                                    > is no reason to increase logging overhead by using a separate record.
                                    > Since the stats would be logged at the end of a session, they can
                                    > be logged in the "disconnect" record.

                                    Thanks.



                                    -- Noel Jones

                                    >
                                    > Wietse
                                    >
                                  • Wietse Venema
                                    ... On second consideration, the main benefit is that anomalies become easier to recognize. This is best demonstrated with a few examples: - normal ESMTP
                                    Message 17 of 18 , Jul 14, 2014
                                      > A "normal" ESMTP session with vrfy:
                                      >
                                      > ehlo=1/1 vrfy=1/1 quit=1/1
                                      >
                                      > An "abnormal" session that drops after 10 rejected AUTH commands:
                                      >
                                      > ehlo=1/1 auth=0/10
                                      >
                                      > The logging shows only counters for commands that were actually
                                      > issued. To save space we could replace "n/n" (two identical numbers)
                                      > with just "n". I don't know if this would actually simplify parsing.

                                      On second consideration, the main benefit is that anomalies become
                                      easier to recognize.

                                      This is best demonstrated with a few examples:

                                      - "normal" ESMTP session with vrfy:

                                      ehlo=1 vrfy=1 quit=1

                                      - "abnormal" session that drops after 10 rejected AUTH commands:

                                      ehlo=1 auth=0/10

                                      Note that the "/" appears only when there is an anomaly. Here, the
                                      number of "good" auth commands (0) differs from the total number
                                      of auth commands (10).

                                      In a logfile analyzer, anomalies would match 'disconnect.*=\d+/\d+'
                                      (perl or pcre syntax). I think that we have a winner.

                                      Wietse
                                    • Noel Jones
                                      ... Bingo! (that s good) -- Noel Jones
                                      Message 18 of 18 , Jul 14, 2014
                                        On 7/14/2014 10:21 AM, Wietse Venema wrote:
                                        >
                                        > Note that the "/" appears only when there is an anomaly. Here, the
                                        > number of "good" auth commands (0) differs from the total number
                                        > of auth commands (10).
                                        >
                                        > In a logfile analyzer, anomalies would match 'disconnect.*=\d+/\d+'
                                        > (perl or pcre syntax). I think that we have a winner.
                                        >
                                        > Wietse
                                        >


                                        Bingo! (that's good)




                                        -- Noel Jones
                                      Your message has been successfully submitted and would be delivered to recipients shortly.