Fwd: The Emperor's New Clothes
- Interesting info on the 2.4.10 kernel
Interesting reading all around.
--- Moshe Bar <moshe@...> wrote:
> To: moelabs@...
> From: Moshe Bar <moshe@...>
> Subject: The Emperor's New Clothes
> Date: Sun, 30 Sep 2001 08:24:37 +0000
> Reply-to: moelabs@...
> Dear readers,
>
> If the Linux kernel were a car, then as of version 2.4.10 its engine
> would be a Ferrari. The virtual memory management is a kernel's
> engine, because it is the VM that determines how fast the kernel car
> goes, how elastic its drag is and how well it behaves under stress.
>
> Andrea Arcangeli, a friend of mine, is the guy who developed the
> completely new VM engine for kernel 2.4.10. Actually, he already
> supplied the necessary patches for it as of 2.4.10pre10, then fixed it
> further and, out of the blue, Linus accepted the new VM and merged it
> into the official Linus kernel tree. (Alan Cox is always a bit more
> conservative, and his -ac kernels are not up to the new VM yet.)
>
> This week I spent two days with Andrea when we went to a very
> prestigious university to speak about high performance computing
> (Andrea did the ccNUMA part of Linux, and I am heavily involved with
> Mosix, the High Performance Computing clustering kernel extension).
> During the two days, over many a bottle of beer, we had plenty of time
> to discuss his VM. I was mainly interested in how the new VM affects
> Mosix. Because Mosix must migrate virtual memory pages belonging to
> the programs' address spaces between cluster nodes, it is very
> important to understand the VM correctly and to interface with it
> efficiently.
>
> I have yet to meet a heavy Linux user (be it for workstations or
> servers) who is happy with the memory behaviour of any of the kernel
> versions 2.4.0 to 2.4.9. All of them behaved very poorly, and many
> people complained of "swapping storms": unexplained heavy swapping
> activity without any real memory shortage. Linus himself said in a
> recent kernel list mailing that he is not happy yet with the VM.
>
> When I published my article at Byte.com comparing the VMs of FreeBSD
> 4.1.1 (then already an outdated version) and Linux 2.4.0 (see
> http://www.byte.com/documents/s=558/byt20010130s0010/), I got over
> ... emails. Most of the senders were angry at me for finding out in my
> benchmarks that FreeBSD was superior to and higher-performing than
> Linux. I had to contact the FBI about two of the senders because they
> threatened me (one with setting fire to my house, the other with
> attacking my computers). I have not heard from the FBI since, but the
> point is that a lot of Linux bigots got angry at me for saying how
> things were.
>
> Andrea Arcangeli wasn't happy with the pre-2.4.10 kernel performance
> either. Mainly he took exception to:
> - kswapd looping forever on DMA or NORMAL class-zones
> - swap+ram will be almost all available address space (modulo when
>   the swap cache serves to avoid swap-ins of shared anonymous memory)
> - swapout storms
> - benchmarks that, when run repeatedly, gradually slow down
>
> These problems were enough for many customers to resist the migration
> to the 2.4 kernels and instead continue using 2.2.19-type kernels.
> Obviously, compared to 2.4, the 2.2 series has many shortcomings, like
> no zero-copy networking, the division of page cache and buffer cache
> in file system operations, big spinlocks for many parts of the kernel,
> etc.
>
> So, Andrea just sat down one evening and in one week wrote a new VM
> from scratch. One week! This is a truly remarkable feat. A VM is a
> major piece of software and very complex in its nature. One needs to
> satisfy many opposing objectives: efficient handling of server-type
> loads and interactive-type loads at the same time, and ease of
> implementation versus optimized use of every last small feature of
> the CPU. It must also be able to run well on Intel CPUs spanning 4 or
> 5 generations as well as on AMD chips, Alphas, MIPSs, SPARCs, ARM and
> what have you. Andrea, by the way, does all his development on an
> AlphaServer with 2 500MHz CPUs and 3 gigabytes of RAM.
>
> The implementation of the new VM results in a much simpler VM. Let me
> explain how it works.
>
> The old 2.4 VM had a major design problem, and it manifested itself
> mainly when freeing physically dirty pages (remember, dirty pages are
> the 4KB frames of RAM whose contents have been modified, through a
> virtual memory mapping, since they were last written to disk). The
> last owner of the page (usually the VM, except in swapoff) has to
> clear the dirty bit before freeing the page. When the page is being
> released in swapoff, it may be a little more complicated (we may need
> to grab the pagecache_lock to ensure nobody starts using the page
> while we clear it).
>
> So, Andrea went and did the following: all physical pages are now
> divided into active and inactive. These two are further divided into
> dirty and clean, for both active and inactive. When the active dirty
> pages reach about 66% of the total number of pages, the VM starts to
> scan them for the oldest ones, which are put into inactive dirty and
> then, later on, from there to the swap when memory becomes tight.
> This part is very central to the new VM, and it is, well, remarkably
> simple.
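The four-list scheme described above can be modelled in a few lines of Python. This is a toy sketch under explicit assumptions, not the 2.4.10 kernel code. The class and method names (ToyVM, touch, reclaim) are hypothetical. "Memory becoming tight" is reduced to an explicit reclaim(n) call. The 66% threshold is simply the figure quoted in the mailing.

```python
from collections import OrderedDict

# "About 66%" is the figure quoted in the mailing; treat it as illustrative.
ACTIVE_DIRTY_THRESHOLD = 0.66


class ToyVM:
    """Toy model of active/inactive x dirty/clean page lists (hypothetical names)."""

    def __init__(self, total_pages: int) -> None:
        self.total_pages = total_pages
        # Each OrderedDict is ordered oldest -> youngest by last reference.
        self.active_clean: OrderedDict = OrderedDict()
        self.active_dirty: OrderedDict = OrderedDict()
        self.inactive_clean: OrderedDict = OrderedDict()
        self.inactive_dirty: OrderedDict = OrderedDict()
        self.swap: list = []

    def _remove(self, page: int) -> bool:
        """Take a page off whichever list holds it; return True if it was dirty."""
        for lst, dirty in ((self.active_clean, False), (self.active_dirty, True),
                           (self.inactive_clean, False), (self.inactive_dirty, True)):
            if page in lst:
                del lst[page]
                return dirty
        if page in self.swap:  # swap-in: the fresh copy matches swap, so clean
            self.swap.remove(page)
        return False

    def touch(self, page: int, write: bool = False) -> None:
        """Reference a page: it becomes the youngest active page, dirty if written."""
        dirty = self._remove(page) or write
        (self.active_dirty if dirty else self.active_clean)[page] = True
        self._balance()

    def _balance(self) -> None:
        # Once active dirty pages reach ~66% of all pages, move the oldest
        # ones to inactive dirty until the count drops below the threshold.
        limit = int(ACTIVE_DIRTY_THRESHOLD * self.total_pages)
        while self.active_dirty and len(self.active_dirty) >= limit:
            page, _ = self.active_dirty.popitem(last=False)
            self.inactive_dirty[page] = True

    def reclaim(self, n: int) -> None:
        """Memory is tight: write up to n of the oldest inactive dirty pages to swap."""
        for _ in range(min(n, len(self.inactive_dirty))):
            page, _ = self.inactive_dirty.popitem(last=False)
            self.swap.append(page)
```

For example, with total_pages=9 the threshold is int(0.66 * 9) = 5. Writing to pages 0 through 5 leaves pages 2 to 5 on the active dirty list and demotes the two oldest, pages 0 and 1, to inactive dirty. A subsequent reclaim(1) moves page 0 to swap. OrderedDict is used here because popitem(last=False) gives cheap oldest-first removal.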
> This simple mechanism totally changes the behaviour of the 2.4.10
> kernel under heavy load and also makes the system much more
> predictable.
>
> Another very important change (too complex to explain here, but Andrea
> and I worked on it together to make sure it actually works under
> cluster load, too) is that swap is now additional to RAM, just like in
> 2.2 times. All earlier 2.4 kernels (since 2.3.48) needed at least the
> same amount of swap as RAM, and then more, to give you any additional
> virtual memory. This meant that on an 8GB server, you needed to put in
> almost a full 9GB disk just to be able to swap, just like in Solaris
> or ...
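The swap-sizing difference is easy to put in numbers. The sketch below uses a deliberately simplified rule of thumb, which is an assumption rather than a description of kernel internals. Before 2.4.10, usable virtual memory is modelled as max(RAM, swap), because swap had to cover RAM before it added anything. In 2.4.10, as in 2.2, it is modelled as RAM + swap. The function names are hypothetical.

```python
def usable_vm_old(ram_gb: float, swap_gb: float) -> float:
    """Simplified 2.3.48-2.4.9 model: swap must cover RAM before adding anything."""
    return max(ram_gb, swap_gb)


def usable_vm_new(ram_gb: float, swap_gb: float) -> float:
    """Simplified 2.4.10 (and 2.2) model: swap is additional to RAM."""
    return ram_gb + swap_gb


def swap_needed(ram_gb: float, extra_gb: float, additive: bool) -> float:
    """Swap needed for extra_gb of virtual memory beyond RAM under each model."""
    return extra_gb if additive else ram_gb + extra_gb


if __name__ == "__main__":
    ram = 8  # GB, the 8GB server from the mailing
    for additive in (False, True):
        swap = swap_needed(ram, 1, additive)
        total = usable_vm_new(ram, swap) if additive else usable_vm_old(ram, swap)
        print(f"additive={additive}: {swap} GB swap -> {total} GB virtual memory")
```

Both configurations end up with 9GB of virtual memory on the 8GB server. Under the old rule that takes a 9GB swap area, while the additive rule needs only 1GB, which matches the "almost a full 9GB disk" remark.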
> Finally, the page scanner doesn't scan if there are theoretically no
> freeable pages, whereas before it did. Oh, and the stupid OOM killer
> (out-of-memory killer) never really worked, so Andrea disabled it, as
> I did for all my kernels, too. It sometimes killed init, for
> goodness' sake.
>
> Upon returning home this week after meeting with Andrea, I went to my
> lab and searched for the disk images of the server comparison I ran
> in January this year (FreeBSD 4.1.1 versus Linux 2.4.0). I took the
> Compaq ML500 server I have been reviewing (2x 1GHz CPUs, 2GB RAM) and
> upgraded the FreeBSD disk image to 4.4-STABLE and the Linux one to
> 2.4.10. I also upgraded to the latest stable versions of Sendmail
> (8.12.1) and MySQL (version 3.23.42). I also compiled everything with
> the latest version of gcc, 3.0.2, and tuned the two instances to the
> best of my knowledge (softupdates and increased maxusers for FreeBSD,
> and default values for Linux).
>
> Upon running the benchmarks again, I noticed that the 2.4.10 kernel is
> now almost the same speed as FreeBSD, but still a bit slower. Since
> 2.4.10 is just the first release with the new VM, improvements are
> certain to come. Marcelo Tosatti, the 18-year-old kernel genius
> working for Conectiva (like Rik van Riel) in Brazil, understood the
> new VM by reading the code within 1 hour of it being available for
> download, and proposed important changes to it. Linus has, by the way,
> toyed on various occasions with the idea of making Marcelo the
> official VM maintainer.
>
> The actual benchmarking will go into a separate article, maybe for my
> Byte column, as it is a story by itself.
>
> The story of this mailing is that the 2.4 kernel has finally grown up
> with the 2.4.10 release. Nobody, or only a handful of people, have
> realized that. Now you know about it, too. Spread the good news and
> immediately install 2.4.10 on your busy server. The server will thank
> you for it.
>
> Get back to me with any questions. Oh, and by the way, any comments
> are welcome. I don't know if it is the recession, the high-tech slump
> or the tragic events of early September, but comments to my mailings
> have slowed down to a few precious emails. For the first time since
> 1998, when I created this mailing list, the number of subscribers is
> going down. Subscribers leave a mailing list silently: their addresses
> start to bounce and they get removed. I guess it is a sign of the
> times that the net number has started (?) to go down. I wonder if this
> list is still in tune with the times and if it still has a place in
> your Inbox. Let me know.
>
> Best regards
> Moshe Bar
> Copyright 2001 by Moshe Bar. Reprint and publication without my
> permission are not permitted.
Richard Jobity, Trinidad and Tobago
ICQ: 5183191, AIM: richjob
"I see dead websites."