Re: sporadic bouts of lost connections to exchange 2010 hub transport
On 9/25/2012 8:29 AM, Ralf Hildebrandt wrote:
> * Mikael Bak <mbak@...>:
>> Hi Stan,
>> On 09/25/2012 08:22 AM, Stan Hoeppner wrote:
>>> Apparently Linux and Windows TCP window scaling doesn't always work
>>> reliably together. Try disabling TCP window scaling on the Linux box(en):
>> Perhaps off topic, but do you have any links to documents or similar
>> that proves that there is a problem between the two operating systems
>> with regard to TCP window scaling. This is the first time I hear about
>> this to be honest.
> I was wondering about this as well. I mean, it doesn't happen THAT

First, this does seem to be a rare issue. Given the behavior you're
seeing, it seems likely the problem is in the TCP stack. TCP window
scaling mis-negotiation simply seems a likely culprit. Linux kernels
have a workaround hack for window scaling issues:

    man 7 tcp

    tcp_workaround_signed_windows (Boolean; default: disabled;
    since Linux 2.6.26)
        If enabled, assume that no receipt of a window-scaling
        option means that the remote TCP is broken and treats
        the window as a signed quantity. If disabled, assume
        that the remote TCP is not broken even if we do not
        receive a window scaling option from it.
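
Before changing anything it may help to see what the Linux box is
currently set to. The following is only a minimal sketch, assuming the
usual /proc/sys/net/ipv4 layout; read_tcp_sysctl is a hypothetical
helper name, not something from Postfix or the kernel docs.

```python
from pathlib import Path


def read_tcp_sysctl(name, root="/proc/sys/net/ipv4"):
    """Return the integer value of a net.ipv4 sysctl, or None if unreadable.

    `root` is a parameter only so the helper can be pointed at a test
    directory; on a real Linux host the default should be correct.
    """
    path = Path(root) / name
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing (older kernel, non-Linux host), unreadable,
        # or not a plain integer.
        return None


if __name__ == "__main__":
    for name in ("tcp_window_scaling", "tcp_workaround_signed_windows"):
        print(name, "=", read_tcp_sysctl(name))
```

A value of 1 for tcp_window_scaling means scaling is enabled, 0 means
disabled; None here just means the file could not be read.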
To me this seems a partial workaround, not a complete one, which is why I
recommended testing with window scaling totally disabled on one side of
the connection. Since window scaling is designed to maximize throughput
for streaming data transfer applications such as FTP, disabling it will
have little, if any, negative impact on SMTP traffic, which is
transactional and bursty in nature. Disabling window scaling in your
Postfix/Exchange case should simply force both hosts to use the 64KB
maximum window of the unscaled 16-bit TCP window field (RFC 1323 is what
adds the scaling option on top of it). If the problem is window
negotiation, disabling it should fix the problem.
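
To make the arithmetic concrete, here is a small sketch of how the
16-bit window field and the RFC 1323 shift count combine, and what a
disagreement about the shift would look like. The numbers (raw window
512, shift 7) are made up for illustration and are not taken from your
hosts; effective_window is a hypothetical name.

```python
MAX_UNSCALED_WINDOW = 0xFFFF  # 16-bit window field: 65535 bytes
MAX_SHIFT = 14                # largest shift count RFC 1323 allows


def effective_window(raw_window, shift=0):
    """Window in bytes implied by a raw 16-bit field and a shift count."""
    if not 0 <= raw_window <= MAX_UNSCALED_WINDOW:
        raise ValueError("raw window must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return raw_window << shift


# What the advertising host means: 512 scaled by 2**7.
intended = effective_window(512, shift=7)

# What a peer sees if it believes no scaling was negotiated.
misread = effective_window(512, shift=0)

if __name__ == "__main__":
    print("intended window:", intended, "bytes")
    print("misread window: ", misread, "bytes")
    print("largest unscaled window:", effective_window(MAX_UNSCALED_WINDOW))
```

With scaling disabled the shift is always 0 on both sides, so there is
nothing left for the two stacks to disagree about.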
The rarity of manifestation seems to indicate that on occasion you have
long bursts of traffic between the two hosts, bursts of sufficient
duration for one or both hosts to open its window past 64KB to increase
throughput. The scale factor itself is negotiated once, in the SYN
exchange, but it only starts to matter at that point. When this occurs,
and if the two hosts disagree about the negotiated factor, you may see
things break at the application level.
Regarding docs or links, I couldn't find any official documentation
describing this issue, only a few scattered forum posts, which is likely
directly related to the rarity of occurrence.
You could always put a trace on the Linux ethernet interface to confirm
the TCP problem. But given the rarity of occurrence, twice in 4 weeks,
that would yield a rather large file to search.
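
If you do capture, the interesting part is small: the window scale
option only appears in the SYN and SYN/ACK segments, so those are the
packets to compare. Below is a rough sketch of pulling the shift count
out of the raw TCP options bytes of one such segment. It assumes you
have already extracted the options field with whatever capture tool you
use; window_scale_shift is a hypothetical helper, and the sample bytes
are hand-built, not from a real trace.

```python
def window_scale_shift(options):
    """Return the window scale shift count from TCP options, or None.

    `options` is the raw options field of a TCP header as bytes.
    None means no window scale option (kind 3) was present.
    """
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:          # End of Option List
            break
        if kind == 1:          # No-Operation: single byte, no length
            i += 1
            continue
        if i + 1 >= len(options):
            break              # truncated option, stop parsing
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break              # malformed length, stop parsing
        if kind == 3 and length == 3:
            return options[i + 2]
        i += length
    return None


# Hand-built example: MSS 1460, SACK permitted, timestamps, NOP,
# window scale with shift count 7.
sample_syn_options = bytes.fromhex(
    "020405b4" "0402" "080a0000000000000000" "01" "030307"
)

if __name__ == "__main__":
    print("shift count:", window_scale_shift(sample_syn_options))
```

If the SYN from one host carries the option and the SYN/ACK from the
other does not, scaling should be off for that connection, and any
window handling that assumes otherwise would be the thing to look at.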