We do a lot of our intensive runs on a Beowulf-style cluster using
what's sometimes called an "island" model, in which nearly-independent
runs execute asynchronously on each of the nodes, and in which small
numbers of individuals (programs) are allowed to migrate occasionally
between nodes (once per generation in our case). This is similar in many ways
to what John Koza and many others have done -- there are a lot of
variations (e.g. with respect to possible migration patterns), but
even the simplest variations (e.g. allowing migration between any
pair of nodes) often produce good results. And it's simple to
implement -- you just have to arrange for "sending" migrants
occasionally, and for using "received" migrants occasionally when
selecting individuals/parents for the next generation. And you have
to do something to launch/kill/etc. all of your processes across the
network -- for this we use a bunch of hacked-together shell scripts,
the details depend on the cluster configuration etc., but essentially
these are all just loops that call ssh on each of the nodes. Aside
from these changes (the migration stuff and the launching/killing
stuff) one can use existing GP code as-is.
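The migration hooks described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the author's actual code: the GP specifics (trees, fitness evaluation, variation) are stubbed out with integers, and only the island-model plumbing is shown.

```python
import random

MIGRANTS = 2  # small migrant counts help maintain diversity across islands

def fitness(ind):
    return ind  # stand-in fitness: lower is better

def send_migrants(population):
    """Pick copies of the best few individuals to ship to another island."""
    return sorted(population, key=fitness)[:MIGRANTS]

def receive_migrants(population, migrants):
    """Fold received migrants in by replacing the worst individuals."""
    population.sort(key=fitness)
    population[-len(migrants):] = migrants

# Two toy islands; real nodes would run whole GP generations between exchanges.
random.seed(0)
island_a = [random.randint(50, 100) for _ in range(10)]
island_b = [random.randint(0, 49) for _ in range(10)]

# One generation's exchange: each island sends, then incorporates.
a_out, b_out = send_migrants(island_a), send_migrants(island_b)
receive_migrants(island_a, b_out)
receive_migrants(island_b, a_out)

print(min(island_a) < 50)  # island A now holds some of B's better individuals
```

In a real system, `send_migrants` and `receive_migrants` would talk to other nodes (or a shared directory) rather than to in-memory lists, and received migrants would simply be eligible when selecting parents for the next generation.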
Since it's often actually better to use small numbers of migrants
than it is to use large numbers of migrants (e.g. because diversity
is better maintained with low migration rates), and since no
synchronization is needed, the demands of this approach on the
network are extremely low. You can even do the communication through
the file system (with a cross-mounted directory), with no noticeable
performance hit. That's how I did it with some of my Lisp-based GP
systems, though I'm now using a system that does the communication
via sockets (the clusterized version of PushGP built into the breve
simulation environment; http:www.spiderland.org/breve).
I think this technique often produces a lot of bang for the buck --
very little programming overhead, very few demands on the network, and
a lot of problem-solving power delivered by the cluster.
PS for a completely different take on parallelizing GP via the web
see the paper that Jon Klein and I presented at GECCO on "unwitting"
genetic programming: http://hampshire.edu/lspector/unwitting-gecco-2007/
On Aug 24, 2007, at 8:39 AM, Emyr James wrote:
> I'm currently doing my own GP implementation in C++. I tried to use
> Beagle but I found the learning curve of the code to be a bit too
> much for the time I have available for my MSc project.
> I have access to a 30-node Beowulf cluster here and I'm intending to
> use the MPI library to do some parallel stuff. Rather than trying to
> parallelise the code, I'm simply going to do independent runs on each
> node so that I can get data for 30 runs at once.
> I think this is the best way to parallelise any Monte Carlo type
> application. You typically want to run the same experiment many times
> using the same parameters to get some decent statistics, and doing it
> this way gives you the best speedup possible. Trying to do something
> more complicated will just lose you efficiency in the inter-node
> comms. With this simple parallelisation approach there is hardly any
> node comms, so you get the best possible speedup.
> dgoe@... wrote:
> > Christophe:
> > Linux is used to do most of the high-powered clustering.
> > Most of this type uses Linux exchange.
> > Programming is involved to make the task and data available to
> > do the computational sub-task that is handed off to any of the
> > sub-task processors.
> > Although you could run different mutations/crossovers... on
> > machines as stand-alone, and pass the result back to the central
> > machine for the next generation. Haven't worked out the sharing of
> > data sets and the update, but I am sure there are ways to do it
> > with minimal overhead.
> > BOINC allows some of this and seems to be state of the art in web
> > processing. This is free software.
> > http://boinc.berkeley.edu/
> > If you get into this please let me know.
> > I will eventually want to use this across the web.
> > You can probably work out some exchange of data from machine to
> > machine and pass the results back through a local area network (LAN).
> > Hope this helps.
> > Are you familiar with Koza's GP algorithm?
> > Would appreciate anything to better understand Koza's methodology.
> > Dan Goe
> > Exclusive use only.
> > ----------------------------------------------------
> > From : christophe_jacquelin <cjacquel@imagetomosaic.com>
> > To : genetic_programming@yahoogroups.com
> > Subject : [GP] GP on several computers in //
> > Date : Wed, 22 Aug 2007 20:11:13 -0000
> > > Hello,
> > >
> > > How to run a GP program on several computers in parallel? (like
> > > the 1000 PCs of Mr. Koza).
> > > Which software to use to manage the different computers ?
> > >
> > > Thanks,
> > > Christophe,
Lee Spector, Professor of Computer Science
School of Cognitive Science, Hampshire College
893 West Street, Amherst, MA 01002-3359
Phone: 413-559-5352, Fax: 413-559-5438