Re: [GP] Binary Codings; & Living Genetic Systems

Message 1 of 10 , Jun 30, 2006
> Dave, I don't see how you can justify saying that the result of crossover is
> randomly valued. Your respect for binary coded GAs might go up if you
> clarified that point for yourself.

I've only coded a single binary GA, so perhaps my assumptions are wrong.
(I have coded and used numerous integer and floating-point GAs, "FPGAs"
below.) Or maybe my use of the term "random" is inaccurate. Here is what
I mean:

Let's assume that the binary chromosome encodes a set of integers used
in a numerical optimization problem, and that a straight (non-Gray)
coding is used.

To perform single-point crossover in a binary GA, randomly choose a
position in the chromosome, cut at that point, and recombine with
another chromosome that has been cut at the same place.

If that cut occurs within a parameter, what will the newly encoded
integer value be? It will be a new number that contains the high-order
bits of the first chromosome and the low-order bits of the second in
one child, and the complementary combination in the other.

Now if the cut occurs in the low-order bits, the new integer will be
"close" in value to the parent that supplied its high-order bits, but
its actual value can't be predicted entirely. Further, if the cut
occurs in the higher-order bits, the new integer value can be wildly
different from the original value(s). This is what I meant by "random"
-- the delta between the original integer(s) and the newly formed one
is not entirely predictable, and so that is why I say that it resembles
mutation in a sense, introducing wildly different integer values into
the search. FPGAs do not do this as a result of crossover.
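
As a minimal Python sketch of this behavior (the 16-bit width and the
parent values are arbitrary choices for illustration):

BITS = 16  # assumed width of one encoded integer parameter
MASK = (1 << BITS) - 1

def one_point_crossover(a, b, cut):
    # Cut 'cut' bits from the low-order end of one BITS-wide field.
    # child1 keeps the high-order bits of 'a' and takes the low-order
    # bits of 'b'; child2 is the complementary recombination.
    low = (1 << cut) - 1
    child1 = (a & (MASK ^ low)) | (b & low)
    child2 = (b & (MASK ^ low)) | (a & low)
    return child1, child2

a, b = 0xFF00, 0x00FF   # two parent values for the same parameter
for cut in (4, 12):     # a low-order cut vs. a high-order cut
    c1, c2 = one_point_crossover(a, b, cut)
    print(f"cut={cut:2d}  deltas: {c1 - a:+6d} {c2 - b:+6d}")
# cut= 4  deltas:    +15    -15   (small, mutation-like creep)
# cut=12  deltas:  -3585  +3585   (a large jump in parameter value)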

(Aside -- perhaps what I'm about to assert is blasphemy, but it seems
that the only reason that Xover is not a catastrophic mechanism in
BGAs is because the set of encodings across the population for a
particular integer tends to converge to a single value over the course
of the evolution. I.e., if you look at a single "column" in the
population and watch it, it tends to become homogeneous. At that
point, crossover is exactly as it is in FPGA and the problem of
cutting an integer in half is not an issue. Essentially the BGA has
become an FPGA with respect to the crossover operation.)
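
One way to watch this convergence, as a small Python sketch (the toy
population is an assumption for illustration):

def column_homogeneity(population):
    # For each locus ("column"), the fraction held by the majority bit.
    # 1.0 means that column has converged, so crossover through it can
    # no longer create a new value there.
    n = len(population)
    freqs = []
    for i in range(len(population[0])):
        ones = sum(ind[i] for ind in population)
        freqs.append(max(ones, n - ones) / n)
    return freqs

pop = [[1, 1, 0, 1],
       [1, 1, 1, 0],
       [1, 1, 0, 0]]            # a toy 3-individual population
print(column_homogeneity(pop))  # [1.0, 1.0, 0.666..., 0.666...]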

Of course, if the binary string encodes something other than an integer,
then this may not be the case, which brings me to a second point.

necessarily better" I would entirely agree: all representation schemes are
subject to the problem space that is being searched. This is the basis for
the "No Free Lunch" theorem, of course -- no representation scheme
can be separated (in the evaluation of it) from the problem you are
trying to solve.

> With respect to living systems, are you trying to argue that the proportion
> of exon to intron should be constant across genomes of different sizes, or
> that some other rule should hold? What I hear you saying is that introns are
> always good, because they help force crossover (which chemically speaking
> could occur at any base pair) to happen at useful boundaries within the
> genome.

Yes, essentially that is it.

I am arguing that the number of crossovers that can occur without
destroying a gamete cell is subject to the amount of disruption that
can be absorbed by the gene set, and that this number increases with
larger and larger intron space. As I pointed out in my last note,
exon shuffling is only possible because of introns, and organisms
without introns do not use crossover as an evolutionary mechanism. It
may be that the number of crossovers that can occur in the formation
of a gamete is dictated by this exon to intron ratio, and that
examining this value across species may show some kind of pan-species
relationship along this line. It is true that "high-order" eukaryotes
(multicellular species with discernible nuclei) have large intron
spaces and prokaryotes (single cell organisms without nuclei) have
very few. Introns serve to stabilize the crossover operation in a sense.
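
As a back-of-the-envelope Python sketch of this intuition (the exon
fractions and the crossover count are illustrative assumptions, and
treating cut points as independent is a simplification):

def p_disruption(exon_fraction, n_crossovers):
    # Probability that at least one of n uniform random cut points
    # lands inside coding (exon) sequence, assuming independent cuts
    # and that only exon hits are disruptive.
    return 1 - (1 - exon_fraction) ** n_crossovers

for exon_frac in (0.90, 0.10, 0.02):  # prokaryote-like .. intron-rich
    print(f"exon fraction {exon_frac:.2f}: "
          f"P(disruption, 3 crossovers) = {p_disruption(exon_frac, 3):.2f}")
# 0.90 -> 1.00,  0.10 -> 0.27,  0.02 -> 0.06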

More importantly I think we are all saying that in some problem
spaces, such as numerical optimization, certain chromosome
organizations can enhance the evolutionary process, and that there is
value in modularizing the chromosome. I guess I am also saying that
nature has provided a good example of this in living genetic systems.

Dave Faulkner
Message 2 of 10 , Jul 1, 2006
hello again list! this is turning out to be a nice discussion

may I mention Sean B. Carroll's book Endless Forms Most Beautiful and the
idea of "evo devo", which explains, for dilettantes like me with no hard
biological knowledge, the embryological significance of the position of
switch components in DNA strings, and how, if they move (as a result of
crossover), it can have a huge effect on the developmental history of the
phenotype.

so as Faulkner says, representations are not a simple mapping, and any
useful scheme will have to be many-layered, with meta-instructions coded
for in meta-meta strings, coded for in meta-meta-meta strings, and so on
ad infinitum (or as many levels as have evolved to be handy)

also I am not so sure Carroll really believes in 'junk' DNA, though, like
Faulkner, I have often thought they would work well as buffer zones

Dawkins's old article 'The Evolution of Evolvability' (AL II, 1989?) is a
good intro to the importance of developmental algorithms, as well as
genetic ones, in any morphological project anyway.

and of course if you want to evolve these meta-instructions/emergent
developmental algorithms, why not turn to GP, which is set up to do just
this?

hurrah!

paul coates
UEL UK

Message 3 of 10 , Jul 1, 2006
<< I am resubmitting this post as I did not receive it back after 12 hours.
<< My apologies if you received this twice.

Sorry, but I have to rebut this response...

Mariusz wrote:

> The description above is a simplification - the functional mapping from
> DNA to proteins is not as simple as described above. All levels: DNA
> molecules, RNA and mRNA molecules, etc, together with other proteins may
> (and often do) participate in formation of new protein strands from a
> given DNA codon(s). The reactions are not quite like simple sequences
> (or directed-graph-like), but they all are more network-like, with
> hypercyclic dependencies.

...etc.

Of course this is a simplification. This is a discussion about
genetic mechanisms, not molecular biology; nor is it a discussion
about how protein construction is controlled via transcription
regulatory networks. It is a discussion about how a chromosome can be
organized to enhance the evolutionary process.

Further, the role of the intron regions is not well understood. Most
of the known control regions are actually codings for proteins derived
from genes that are in the exons, not the introns. I.e., proteins are
constructed to control the creation of other proteins. Moreover, large
regions of introns are simple repetitions of nonsense sequences that
have no usable function, control or otherwise -- except perhaps to
serve as a buffer for crossover and mutation events, or to provide
spacing when the DNA folds back on itself as part of some control.
Theory about the role of introns seems to me very speculative, based
on observations of the conservation of certain sequences, etc.

Actually this spacing concept of introns may be a useful general
concept in recombination. A single protein can be encoded across a
sequence of exons and introns, and if the crossover event occurs
within that gene but hits an intron region, then the protein will
survive and a new protein may emerge. This is because each exon codes
for a module within the protein, and the crossover rearranges the
modules rather than individual amino acid encodings. (ref: "Molecular
Biology of the Cell", 4th edition, p462.) But this is similar to the
FPGA organization if some number of FP numbers map to a single
parameter in the objective function. This differs from the typical
FPGA organization but might be a useful idea of modularity within
the chromosome.
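
A toy Python sketch of this modular survival (the exon strings, the
'-' spacer, and the lengths are all illustrative assumptions):

import random

def build_chromosome(exons, intron_len):
    # Lay out exon modules separated by intron spacers ("-"); return
    # the string plus the index range occupied by each exon.
    chrom, spans, pos = "", [], 0
    for exon in exons:
        spans.append((pos, pos + len(exon)))
        chrom += exon + "-" * intron_len
        pos += len(exon) + intron_len
    return chrom, spans

def cut_splits_exon(spans, cut):
    return any(lo < cut < hi for lo, hi in spans)

random.seed(0)
exons = ["AAAA", "BBBB", "CCCC"]  # toy exon modules
for intron_len in (0, 20):
    chrom, spans = build_chromosome(exons, intron_len)
    hits = sum(cut_splits_exon(spans, random.randrange(1, len(chrom)))
               for _ in range(10000))
    print(f"intron_len={intron_len:2d}: {hits / 10000:.2f} "
          "of random cuts split an exon module")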

Continuing, isn't it interesting that there are relatively few introns in
species like bacteria (i.e., asexually reproducing species) that do NOT
perform crossover? For these species, mutation is the primary mechanism
of change in their evolutionary path (excluding plasmids).

Why is it so difficult to accept that 98% of the chromosome is non-coding?
At least half of my code, character by character, is comments and spacing!
Actually I'd like to claim credit for this idea (it came to me in class one night)
but it is actually discussed somewhat in the above reference. What seems
clear to me is that introns serve, at a minimum, to organize the chromosome
into modules that resist crossover disruption, and this seems very similar
to the chromosome organization in an FPGA.

On the other hand, I would certainly concede that introns are not entirely
without function (another simplification on my part) and you've got me
doing some reading on the topic. Apparently there are at least five
types of introns, some of which code for regulatory RNAs.

I would like to conclude with the following quote from ScientificAmerican.com:

"In 1978 Walter Gilbert of Harvard expressed a different view of the
nature of introns (in the same report in which he coined the terms
'exon' and 'intron'). He suggested that introns could speed up
evolution by promoting genetic recombinations between exons. This
process (which he called 'exon shuffling') would be directly
associated with formation of new genes. Introns, from this
perspective, have a profound purpose. They serve as hot spots for
recombination in the formation of new combinations of exons. In other
words, they are in our genes because they have been used during
evolution as a faster pathway to assemble new genes. Over the past 10
years, the exon shuffling idea has been supported by data from various
experimental approaches."
Message 4 of 10 , Jul 1, 2006
Dave,

Thanks for clarifying your meaning.

>Now if the cut occurs in the low order bits, then the new integer will be
>"close" in value to the parent that supplied its high-order bits, but its
>actual value can't be predicted entirely. Further, if the cut occurs in the
>higher order bits, then the new integer value can be wildly different from
>the original value(s). This is what I meant by "random" -- the delta between
>the original integer(s) and the newly formed one is not entirely predictable,

OK, assuming a straightforward integer encoding (this does need to be
distinguished from "binary" encoding, even for numerical parameters), the
max difference is just less than the value of the bit that is just on the
high side of the cut, for both children. I don't know if that qualifies as
wildly unpredictable, but it is the mechanism that allows the algorithm to
jump to a new part of the search space. Crossover jumps, mutation creeps.
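
That bound is easy to confirm exhaustively for small widths; a Python
sketch (the 8-bit width is chosen only to keep the search small):

def children(a, b, cut, bits=8):
    low = (1 << cut) - 1
    high = ((1 << bits) - 1) ^ low
    return (a & high) | (b & low), (b & high) | (a & low)

# Exhaustive check over all 8-bit parent pairs: a child differs from
# the parent that supplied its high-order bits by at most 2**cut - 1,
# i.e. just less than the value of the lowest bit kept above the cut.
bits = 8
for cut in range(1, bits):
    worst = max(abs(children(a, b, cut, bits)[0] - a)
                for a in range(1 << bits) for b in range(1 << bits))
    assert worst == 2 ** cut - 1
    print(f"cut={cut}: max |delta| = {worst} < {2 ** cut}")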

>and so that is why I say
>that it resembles mutation in a sense, introducing wildly different integer
>values into the search. FPGAs do not do this as a result of crossover.

You'll have to specify your favorite FP crossover operator before we can
discuss what it does. I can imagine lots of variations on FP crossover. Do
you assume that crossover always occurs between FP parameters, which implies
that crossover never changes the value of a FP param, just rearranges them?
Or do you assume that crossover can occur at an FP locus, which implies
feeding the two parent values into a function and getting a new value for
each of the children at that locus?
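
To make the two variants concrete, a small Python sketch (the vectors
are arbitrary, and the arithmetic mean is just one common choice of
blend function):

def parameter_crossover(p1, p2, cut):
    # Variant 1: cut only *between* FP parameters; values are
    # rearranged intact and never changed.
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def blend_crossover(p1, p2, cut):
    # Variant 2: a cut *at* an FP locus feeds both parent values into
    # a function to produce a new value at that locus.
    blend = (p1[cut] + p2[cut]) / 2.0
    return (p1[:cut] + [blend] + p2[cut + 1:],
            p2[:cut] + [blend] + p1[cut + 1:])

p1 = [1.0, 2.0, 3.0, 4.0]
p2 = [10.0, 20.0, 30.0, 40.0]
print(parameter_crossover(p1, p2, 2))  # ([1.0, 2.0, 30.0, 40.0], ...)
print(blend_crossover(p1, p2, 2))      # 16.5 appears at locus 2 in both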

>
>(Aside -- perhaps what I'm about to assert is blasphemy, but it seems
>that the only reason that Xover is not a catastrophic mechanism in
>BGAs is because the set of encodings across the population for a
>particular integer tends to converge to a single value over the course
>of the evolution. I.e., if you look at a single "column" in the
>population and watch it, it tends to become homogeneous. At that
>point, crossover is exactly as it is in FPGA and the problem of
>cutting an integer in half is not an issue. Essentially the BGA has
>become an FPGA with respect to the crossover operation.)

Domino convergence is to be expected in BinInt problems. I agree there is a
very rough analogy here to what you've been saying about exons and introns,
but not necessarily to FPGAs. In any case, jumping around the search space
is not a catastrophe, it is the way the algorithm works.

> > With respect to living systems, are you trying to argue that the
> > proportion of exon to intron should be constant across genomes of
> > different sizes, or that some other rule should hold? What I hear you
> > saying is that introns are always good, because they help force
> > crossover (which chemically speaking could occur at any base pair) to
> > happen at useful boundaries within the genome.
>
>Yes, essentially that is it.
>Introns serve to stabilize the crossover operation in a sense.

It's an interesting conjecture, but I'm not aware of evidence to support it.
I've heard the contrary, that intron/exon ratios rise in "higher" animals.
If so, this could be taken as evidence that introns are more useful for
creating alternative splicing sites within a gene than for creating any kind
of guarantee that crossover takes place predominantly at gene boundaries. I
don't know enough about intron research to speculate if different kinds of
intron could serve different purposes.
Message 5 of 10 , Jul 1, 2006
On 2006-Jun-30, at 11:42 PM, David vun Kannon wrote:

> With respect to living systems, are you trying to argue
> that the proportion of exon to intron should be constant
> across genomes of different sizes, or that some other rule
> should hold?

To a good first approximation (and modulo a number of notable
"outliers"), the ratio of non-coding to coding DNA in the
so-called "higher eukaryotes" is directly proportional to
genome size.

In other words, to a first approximation, a genome that is
twice as large will have about twice as much non-coding DNA.

Put a third way, most of the "higher eukaryotes" sequenced so far
have on the order of 10,000--30,000 "coding regions," each of which
typically contains between 1000 and 10,000 "coding characters;"
the remainder of the genome is (as far as anyone currently knows)
"non-coding" (or at least, it doesn't code for _proteins_ ---
albeit it cannot yet be ruled out that some of the putative
"non-coding" DNA might actually code for "small regulatory RNAs").
Since all euk genomes have roughly the same order of magnitude
of "coding" DNA, and since this "coding" DNA accounts for only
a small fraction of the total genome size, with the remainder being
allegedly "non-coding," it is perfectly natural that the amount
of so-called "non-coding" (AKA "Junk") DNA would be directly
proportional to genome size.
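
As a toy Python calculation of that arithmetic (the fixed coding-DNA
figure of 3e7 bp is an illustrative assumption, not a measurement):

# If coding DNA C stays roughly constant while genome size G varies,
# then non-coding DNA, G - C, tracks G almost exactly for large G.
C = 3e7
for G in (1e8, 1e9, 3e9):
    print(f"G = {G:.0e} bp: non-coding = {G - C:.2e} bp "
          f"({(G - C) / G:.0%} of the genome)")
# G = 1e+08: 70% non-coding; 1e+09: 97%; 3e+09: 99%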

There is also the additional complication that the so-called
"higher eukaryotes" seem to make more use of "variant splicing"
that the single-celled eukaryotes, i.e., each coding region
actually codes for multiple (but related) protein products.
Hence the number of "genes" may actually be significantly larger
than the number of "coding regions."

One empirical datum in favor of the idea that a significant
fraction of eukaryotic regulation may be carried out by small RNAs
rather than proteins is the observation that in prokaryotes,
the number of "metabolic" genes appears to grow roughly linearly
with genome size, whereas the number of "regulatory" genes appears
to grow roughly quadratically with genome size. Assuming for the
moment that both these empirical "scaling laws" continued
indefinitely, it would imply that there would be more than one
regulator per metabolic gene in any prokaryote larger than
10,000--20,000 genes, i.e., the "regulatory" genes would
outnumber the genes being controlled, and the genome would be
"mostly regulatory," which seems absurd --- and indeed,
prokaryotes larger than this size do not seem to exist.
For the empirical data supporting this argument, see
<http://www.arxiv.org/abs/q-bio.MN/0311021>,
whose authors speculate that the so-called "higher eukaryotes"
may have had to evolve novel control mechanisms such as the use
of small "non-coding" regulatory RNAs to circumvent the problem
that a genome larger than 10,000-20,000 genes would be mostly
"regulatory" rather than "metabolic" if it attempted to use
protein interactions to regulate its metabolism.
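
A small Python sketch of the scaling arithmetic (the constants are
illustrative, calibrated only so the crossover point lands near the
quoted range):

# With metabolic genes ~ k*G and regulatory genes ~ c*G**2,
# regulators overtake the genes they control at G* = k/c.
k, c = 1.0, 1.0 / 15000
for G in (1000, 5000, 15000, 30000):
    metabolic, regulatory = k * G, c * G ** 2
    print(f"G = {G:6d} genes: regulatory/metabolic = "
          f"{regulatory / metabolic:.2f}")
# the ratio passes 1.0 at G* = 15,000; beyond that the genome would
# be "mostly regulatory" under these scaling laws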

See also related work exploring the idea that the "higher euks"
may be dominated by small "non-coding" regulatory RNAs
rather than proteins.

-- Gordon D. Pusch

perl -e '\$_ = "gdpusch\@..."; s/[A-Z]+\.//g; print;'
Message 6 of 10 , Jul 5, 2006
Hi,

For anyone who is interested, some of these topics will be touched on in
the Evolution and Resiliency tutorial on Sunday afternoon at GECCO.
Artificial evolutionary algorithms with variable-length codings exhibit a
variety of interesting behaviors in terms of both growth and contraction
and of coding and non-coding regions. It seems likely that how genomes
evolve, even in our relatively simple artificial systems, is a lot more
complex, and interesting, than is widely assumed. The tutorial (although
it often becomes more of a discussion than a tutorial) covers recent
research into how pressure for genetically robust/resilient solutions
influences the evolutionary process, particularly genome size,
coding/non-coding regions, etc.

Hope to see you there,
Terry Soule
Department of Computer Science
University of Idaho
tsoule@...
