
Fwd: The Emperor's New Clothes

  • Richard Jobity
    Message 1 of 1 , Sep 30, 2001
      Interesting info on the 2.4.10 kernel

      Interesting reading all around.

      --- Moshe Bar <moshe@...> wrote:
      > To: moelabs@...
      > From: Moshe Bar <moshe@...>
      > Subject: The Emperor's New Clothes
      > Date: Sun, 30 Sep 2001 08:24:37 +0000
      > Reply-to: moelabs@...
      > Dear readers,
      > If the Linux kernel were a car, then as of version 2.4.10 its engine
      > would be a Ferrari's. Virtual memory management is a kernel's engine
      > because it is the VM that determines how fast the kernel car drives,
      > how elastic its drag is, and how well it behaves under stress.
      > Andrea Arcangeli, a friend of mine, is the guy who developed the
      > completely new VM engine for kernel 2.4.10. Actually, he supplied the
      > necessary patches for it already as of 2.4.10pre10, then fixed it some
      > more and, out of the blue, Linus accepted the new VM and incorporated
      > it into the official Linus kernel tree. (Alan Cox is always a bit
      > behind, and his ac kernels are not up to the new VM yet.)
      > This week I spent two days with Andrea when we went to a very
      > prestigious university to speak about high performance computing
      > (Andrea did the ccNUMA part of Linux and I am heavily involved with
      > Mosix, a High Performance Computing clustering kernel extension).
      > During the two days, over many a bottle of beer, we had plenty of time
      > to discuss his new VM. I was mainly interested in how the new VM
      > affects Mosix. Because Mosix must migrate virtual memory pages
      > belonging to the programs' address spaces between cluster nodes, it is
      > very important to understand the VM correctly and to interface with it
      > efficiently.
      > I have yet to meet a heavy Linux user (be it for workstations or
      > servers) happy with its memory behaviour in any of the kernel versions
      > 2.4.0 to 2.4.9. All of them behaved very poorly and unpredictably, and
      > many people complained of "swapping storms": unexplained heavy
      > swapping activity without any real memory shortage. Linus himself said
      > in a recent kernel list mailing that he is not happy yet with the VM.
      > When I published my article at Byte.com comparing the VM of FreeBSD
      > 4.1.1 (then already an outdated version) to that of Linux 2.4.0 (see
      > http://www.byte.com/documents/s=558/byt20010130s0010/ ), I got over
      > 3000 emails. Most of the senders were angry at me for finding in
      > benchmarks that FreeBSD was superior to and faster than Linux. I had
      > to contact the FBI about two of the senders because they threatened me
      > (one with setting fire to my house, the other with attacking my
      > computers). I have not heard from the FBI since, but the point is that
      > a lot of Linux bigots got angry at me for saying how things were.
      > Andrea Arcangeli wasn't happy with the performance of the pre-2.4.10
      > kernels either. Mainly he took exception to:
      > • kswapd looping forever on DMA or NORMAL class-zones
      > • swap+ram will be almost all available address space (modulo when the
      >   swap cache serves to avoid swapins of shared anonymous memory after
      >   a fork)
      > • swapout storms
      > • benchmarks that gradually slow down when run repeatedly
      > These problems were enough for many customers to resist the migration
      > to the 2.4 kernels and instead continue using the 2.2.19 kind of
      > kernels. Obviously, compared to 2.4, the 2.2 series has many
      > shortcomings, such as no zero-copy networking, the division of page
      > cache and buffer cache in file system operations, big spinlocks for
      > many parts of the kernel, etc.
      > So, Andrea just sat down one evening and in one week wrote a new VM
      > from scratch. One week! This is an extremely remarkable feat. A VM is
      > a major piece of software and very complex in its nature. One needs to
      > satisfy many opposing objectives: efficient handling of server-type
      > loads and interactive-type loads at the same time; ease of
      > implementation versus optimized use of every last small feature of the
      > CPU. It must also be able to run well on Intel CPUs spanning 4 or 5
      > generations as well as on AMD chips, Alphas, MIPSs, Sparcs, ARM and
      > what have you. Andrea, by the way, does all his development on a
      > Compaq AlphaServer with two 500 MHz CPUs and 3 GB of RAM.
      > The implementation of the new VM results in a much simpler and faster
      > VM. Let me explain how it works.
      > The old 2.4 VM had a major design problem, and it manifested itself
      > mainly when freeing physically dirty pages (remember, dirty pages are
      > the 4 KB frames of RAM whose contents have been modified through one
      > of the virtual memory pages residing in them). The last owner of the
      > page (usually the VM, except in swapoff) has to clear the dirty flag
      > before freeing the page. In the swapoff case it may be a little more
      > complicated (we may need to grab the pagecache_lock to ensure nobody
      > starts using the page while we clear it).
      > So, Andrea went and did the following: all physical pages are now
      > divided into active and inactive, and each of these two groups is
      > further divided into dirty and clean. When the active dirty pages
      > reach about 66% of the total number of pages, the VM starts to scan
      > them for the oldest ones to be put into inactive dirty and then, later
      > still, from there to the swap when memory becomes tight. This part is
      > very central to the new VM and its simplicity is ... well, simply
      > stunning.
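
The aging scheme described in the message can be sketched as a toy model in Python. This is only an illustration of the prose, not kernel code: the names ToyVM, Page, last_used, add_active_dirty and reclaim are hypothetical, the 66% figure is taken from the text, and the real 2.4.10 implementation may well differ in its details.

```python
from dataclasses import dataclass, field

# "About 66%" of all pages, per the message; kept as an integer percentage
# so the limit is computed without floating-point rounding.
ACTIVE_DIRTY_PERCENT = 66


@dataclass
class Page:
    number: int
    last_used: int  # hypothetical age stamp: a smaller value means an older page


@dataclass
class ToyVM:
    total_pages: int
    active_dirty: list = field(default_factory=list)
    inactive_dirty: list = field(default_factory=list)
    swapped: list = field(default_factory=list)

    def add_active_dirty(self, page):
        """A page was written to: track it as active and dirty, then age if needed."""
        self.active_dirty.append(page)
        self.age_if_needed()

    def age_if_needed(self):
        """Once active dirty pages exceed the limit, demote the oldest ones."""
        limit = self.total_pages * ACTIVE_DIRTY_PERCENT // 100
        excess = len(self.active_dirty) - limit
        if excess <= 0:
            return
        self.active_dirty.sort(key=lambda p: p.last_used)  # oldest first
        self.inactive_dirty.extend(self.active_dirty[:excess])
        del self.active_dirty[:excess]

    def reclaim(self, wanted):
        """Memory is tight: move up to `wanted` inactive dirty pages to swap."""
        victims = self.inactive_dirty[:wanted]
        del self.inactive_dirty[:wanted]
        self.swapped.extend(victims)
        return len(victims)
```

In this sketch, nothing is written to swap until reclaim() is called, which mirrors the point in the text that pages first drift from active dirty to inactive dirty and only go to swap "later still", when memory becomes tight.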
      > This simple mechanism totally changes the behaviour of the 2.4.10
      > kernel under heavy load and also makes for much better predictability
      > of the system. Another very important change (too complex to explain
      > here, but Andrea and I worked on it together to make sure it is
      > actually efficient under cluster load, too) is that the swap is now
      > additional to the RAM, just like in 2.2 times. All earlier 2.4 kernels
      > (since 2.3.48) needed at least as much swap as RAM, and then more on
      > top of that to give you additional virtual memory. This meant that on
      > an 8 GB server you needed to put aside almost a full 9 GB disk just to
      > be able to swap, just like in Solaris or HP-UX.
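
The sizing consequence of that change can be illustrated with a little arithmetic. The following is a deliberately simplified model, assuming that under the earlier 2.4 behaviour swap only adds virtual memory beyond what it needs to mirror RAM (modelled here as max(ram, swap)), while in 2.2 and 2.4.10 swap adds to RAM. The function name and the max() approximation are assumptions made for this sketch, not something stated in the message.

```python
def usable_virtual_memory_gb(ram_gb, swap_gb, swap_is_additive):
    """Rough estimate of total virtual memory, in GB, under the two models.

    swap_is_additive=True  models 2.2 and 2.4.10: swap comes on top of RAM.
    swap_is_additive=False models 2.4.0 to 2.4.9 as described in the message:
    swap must first cover RAM, so only the part beyond RAM is extra
    (approximated here as max(ram, swap); an assumption of this sketch).
    """
    if swap_is_additive:
        return ram_gb + swap_gb
    return max(ram_gb, swap_gb)


# The 8 GB server with a 9 GB swap disk from the message:
old_model = usable_virtual_memory_gb(8, 9, swap_is_additive=False)
new_model = usable_virtual_memory_gb(8, 9, swap_is_additive=True)
```

Under these assumptions the 9 GB disk buys only about 1 GB of extra virtual memory in the old model, whereas in the additive model the same disk more than doubles the total.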
      > Finally, the page scanner doesn't scan if there are theoretically no
      > freeable pages, whereas before it did. Oh, and the stupid OOM killer
      > (out-of-memory killer) never really worked, so Andrea disabled it, as
      > I did for all my kernels, too. It sometimes killed init, for goodness'
      > sake.
      > Upon returning home this week after meeting with Andrea, I went to my
      > lab and searched for the disk images of the server comparison I ran
      > back in January this year (FreeBSD 4.1.1 versus Linux 2.4.0). I took
      > the Compaq ML500 server I have been reviewing (2x 1 GHz CPUs, 2 GB
      > RAM) and upgraded the FreeBSD disk image to 4.4-Stable and the Linux
      > version to 2.4.10. I also upgraded to the latest stable versions of
      > Sendmail (8.12.1) and MySQL (3.23.42). I also compiled everything with
      > the latest version of gcc, 3.0.2, and tuned the two instances to the
      > best of my knowledge (softupdates and increased maxusers for FreeBSD,
      > and untouched default values for Linux).
      > Upon running the benchmarks again, I noticed that the 2.4.10 kernel is
      > now almost the same speed as FreeBSD, but still a bit slower. Since
      > 2.4.10 is just the first release with the new VM, improvements are
      > certain to come. Marcelo Tosatti, the 18-year-old kernel genius
      > working for Conectiva (like Rik van Riel) in Brazil, understood the
      > new VM by reading the code within 1 hour of it being available for
      > download and proposed important changes to it. Linus has, by the way,
      > on several occasions played with the idea of making Marcelo the
      > official VM maintainer.
      > The actual benchmarking will go into a separate article, maybe for
      > my
      > Byte column, as it is a story by itself.
      > The story of this mailing is that the 2.4 kernel has finally grown up
      > with the 2.4.10 release. Nobody, or only a handful of people, has
      > realized that. Now you know about it, too. Spread the good news and
      > immediately install 2.4.10 on your busy server. The server will thank
      > you for it.
      > Get back to me with any questions. Oh, and by the way, any comments
      > are welcome. I don't know if it is the recession, the high-tech slump
      > or the tragic events from early September, but comments to my mailings
      > have slowed down to a few precious emails. For the first time since
      > early 1998, when I created this mailing list, the number of
      > subscribers is going down. Subscribers go away silently in a mailing
      > list: the mailings to them bounce and they get removed. I guess it is
      > a sign of the times that the net number has started (?) to go down. I
      > wonder if this list is still in tune with the times and if it still
      > has a place in your Inbox. Let me know.
      > Best regards
      > Moshe Bar
      > Copyright 2001 by Moshe Bar. Reprint and publication without my
      > consent not permitted.

      Richard Jobity, Trinidad and Tobago
      ICQ: 5183191, AIM: richjob

      "I see dead websites."
