
RE: [GP] Binary Codings; & Living Genetic Systems

  • Paul S. Coates -- Message 1 of 10, Jul 1, 2006
      hello again, list! this is turning out to be a nice discussion

      may I mention Sean B. Carroll's book Endless Forms Most Beautiful - the
      idea of "evo devo" - which explains, for dilettantes like me with no hard
      biological knowledge, the embryological significance of the position of
      switch components in DNA strings, and how, if they move (as a result of
      crossover), it can have a huge effect on the developmental history of the
      phenotype.

      so as Faulkner says, representations are not a simple mapping, and any
      useful scheme will have to be many-layered, with meta-instructions coded
      for in meta-meta strings, coded for in meta-meta-meta strings, and so on
      ad infinitum (or as many levels as have evolved to be handy)

      also I am not so sure Carroll really believes in 'junk' DNA, though I have
      often thought, like Faulkner, that such regions would work well as buffer
      zones

      Dawkins' old article "The Evolution of Evolvability" (AL II, 1989?) is a
      good intro to the importance of developmental algorithms, as well as
      genetic ones, in any morphological project anyway.

      and of course, if you want to evolve these meta-instructions/emergent
      developmental algorithms, why not turn to GP, which is set up to do just
      this?

      hurrah!

      paul coates
      UEL UK



      ________________________________

      From: genetic_programming@yahoogroups.com on behalf of Dave Faulkner
      Sent: Sat 01-Jul-06 7:12 AM
      To: genetic_programming@yahoogroups.com
      Subject: Re: [GP] Binary Codings; & Living Genetic Systems





      > Dave, I don't see how you can justify saying that the result of crossover
      > is randomly valued. Your respect for binary coded GAs might go up if you
      > clarified that point for yourself.

      I've only coded a single binary GA, so perhaps my assumptions are wrong.
      (I've coded and used numerous integer and floating-point GAs -- "FPGAs"
      below.) Or maybe my use of the term "random" is inaccurate. Here is what
      I mean:

      Let's assume that the binary chrom encodes a set of integers that are used
      in a numerical optimization problem. I also assume that a straight,
      non-Gray coding is used.

      To perform a single-point crossover in a binary GA, randomly choose a
      position in the chrom, perform a cut at that point, and recombine with
      another chromosome that has been cut at the same place.

      If that cut occurs within a parameter, then what will the newly encoded
      integer value be? It will be a new number that contains the high-order
      bits of the first chrom and the low-order bits of the second in one chrom,
      and its counterpart (high-order bits of the second, low-order bits of the
      first) in the other chrom.

      Now if the cut occurs in the low-order bits, then the new integer will be
      "close" to the second in value, but its actual value can't be predicted
      entirely. Further, if the cut occurs in the higher-order bits, then the
      new integer value can be wildly different from the original value(s). This
      is what I meant by "random" -- the delta between the original integer(s)
      and the newly formed one is not entirely predictable, and so that is why I
      say that it resembles mutation in a sense, introducing wildly different
      integer values into the search. FPGAs do not do this as a result of
      crossover.
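
      To make that concrete, here is a minimal Python sketch of the effect (the
      8-bit width, the helper names, and the parent values are my own, chosen so
      the deltas are easy to see):

      BITS = 8

      def to_bits(x):
          # fixed-width binary string, e.g. 128 -> "10000000"
          return format(x, f"0{BITS}b")

      def crossover(a, b, cut):
          # single-point crossover; child1 takes a's high bits and b's low bits
          sa, sb = to_bits(a), to_bits(b)
          return int(sa[:cut] + sb[cut:], 2), int(sb[:cut] + sa[cut:], 2)

      a, b = 128, 127   # 10000000 and 01111111 -- nearly equal as integers
      for cut in (1, 4, 7):
          print(cut, crossover(a, b, cut))
      # cut=1 -> (255, 0): a cut in the high-order bits throws the children to
      # the extremes even though the parents differ in value by only 1;
      # cut=7 -> (129, 126): a cut in the low-order bits only nudges the values.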

      (Aside -- perhaps what I'm about to assert is blasphemy, but it seems
      that the only reason that Xover is not a catastrophic mechanism in
      BGAs is because the set of encodings across the population for a
      particular integer tends to converge to a single value over the course
      of the evolution. I.e., if you look at a single "column" in the
      population and watch it, it tends to become homogeneous. At that
      point, crossover is exactly as it is in FPGA and the problem of
      cutting an integer in half is not an issue. Essentially the BGA has
      become an FPGA with respect to the crossover operation.)
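
      (A throwaway illustration of that "column" claim, with a toy population
      of my own invention: once a column is homogeneous, no cut through it can
      produce a new bit there.)

      population = ["1101", "1100", "1110", "1100"]

      def column_homogeneous(pop, col):
          # True when every chromosome agrees at this locus
          return len({chrom[col] for chrom in pop}) == 1

      print([column_homogeneous(population, c) for c in range(4)])
      # -> [True, True, False, False]: any crossover cut through columns 0 or 1
      # just copies the shared bits, exactly as argued above.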

      Of course, if the binary string encodes something other than an integer,
      then this may not be the case, which brings me to a second point.

      With respect to your comments in your last note about "different, but not
      necessarily better," I would entirely agree: all representation schemes
      are subject to the problem space that is being searched. This is the basis
      for the "No Free Lunch" theorem, of course -- no representation scheme can
      be evaluated separately from the problem you are trying to solve.

      > With respect to living systems, are you trying to argue that the
      > proportion of exon to intron should be constant across genomes of
      > different sizes, or that some other rule should hold? What I hear you
      > saying is that introns are always good, because they help force crossover
      > (which chemically speaking could occur at any base pair) to happen at
      > useful boundaries within the genome.

      Yes, essentially that is it.

      I am arguing that the number of crossovers that can occur without
      destroying a gamete cell is subject to the amount of disruption that
      can be absorbed by the gene set, and that this number increases with
      larger and larger intron space. As I pointed out in my last note,
      exon shuffling is only possible because of introns, and organisms
      without introns do not use crossover as an evolutionary mechanism. It
      may be that the number of crossovers that can occur in the formation
      of a gamete is dictated by this exon to intron ratio, and that
      examining this value across species may show some kind of pan-species
      relationship along this line. It is true that "high-order" eukaryotes
      (multicellular species with discernible nuclei) have large intron
      spaces and prokaryotes (single cell organisms without nuclei) have
      very few. Introns serve to stabilize the crossover operation in a sense.
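
      As a back-of-the-envelope model of that "absorbed disruption" idea (my
      simplification, not a biological claim): if cut points fall uniformly over
      the genome and a fraction f of it is exonic, the chance that k independent
      cuts all land safely in introns is (1 - f)**k.

      def p_all_cuts_safe(f, k):
          # probability that all k uniform cuts miss the exons
          return (1.0 - f) ** k

      for f in (0.5, 0.1, 0.02):   # e.g. f = 0.02 for a 98% non-coding genome
          print(f, [round(p_all_cuts_safe(f, k), 3) for k in (1, 10, 50)])
      # At f = 0.02 even 50 cuts all miss exons about 36% of the time; at
      # f = 0.5 a single cut destroys a gene half the time. More intron space
      # directly buys more survivable crossovers.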

      More importantly I think we are all saying that in some problem
      spaces, such as numerical optimization, certain chromosome
      organizations can enhance the evolutionary process, and that there is
      value in modularizing the chromosome. I guess I am also saying that
      nature has provided a good example of this in living genetic systems.


      Dave Faulkner








    • Dave Faulkner -- Message 2 of 10, Jul 1, 2006
        << I am resubmitting this post as I did not receive it back after 12 hours.
        << My apologies if you received this twice.

        Sorry, but I have to rebut this response...

        Mariusz wrote:

        > The description above is a simplification - the functional mapping from
        > DNA to proteins is not as simple as described above. All levels: DNA
        > molecules, RNA and mRNA molecules, etc., together with other proteins
        > may (and often do) participate in formation of new protein strands from
        > a given DNA codon(s). The reactions are not quite like simple sequences
        > (or directed-graph-like), but they are all more network-like, with
        > hypercyclic dependencies.

        ...etc.

        Of course this is a simplification. This is a discussion about
        genetic mechanisms and not molecular biology; it is also not a
        discussion about how proteins are constructed or about how that
        construction is controlled via transcription regulatory networks. It
        is a discussion about how a chromosome can be organized to
        enhance the evolutionary process.

        Further, the role of the intron regions is not understood. Most of the
        known control regions actually code for proteins derived from genes that
        are in the exons - not introns. I.e., proteins are constructed to
        control the creation of other proteins. Moreover, large regions of
        introns are simple repetitions of nonsense sequences that have no usable
        function, control or otherwise -- except perhaps to serve as a buffer
        for crossover and mutation events, or as spacing when the DNA folds back
        on itself as part of some control. Theory about the role of introns
        seems to me very speculative, based on observations of the conservation
        of certain sequences, etc.

        Actually this spacing concept of introns may be a useful general concept
        in recombination. A single protein can be encoded across a sequence of
        exons and introns, and if the crossover event occurs within that gene
        but hits an intron region, then the protein will survive and a new
        protein may emerge. This is because each exon codes for a module within
        the protein, and the crossover rearranges the modules rather than
        individual amino acid encodings. (ref: "Molecular Biology of the Cell",
        4th edition, p. 462.) This would be similar to an FPGA organization in
        which some number of FP values map to a single parameter in the
        objective function -- a departure from the typical FPGA organization,
        but perhaps a useful form of modularity within the chromosome.
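
        As a toy illustration of that modularity (the encoding is mine: letter
        runs for genes, '-' padding for introns):

        import random

        chrom_a = "AAAA----BBBB----CCCC"
        chrom_b = "aaaa----bbbb----cccc"

        cut = random.randrange(len(chrom_a))
        child = chrom_a[:cut] + chrom_b[cut:]
        if chrom_a[cut] == "-":
            print(f"cut at {cut}: {child} -- intron hit, genes survive intact")
        else:
            print(f"cut at {cut}: {child} -- exon hit, a gene is split")

        # Here 8 of 20 positions are intronic, so 40% of uniform cuts fall
        # between genes; widen the padding and that fraction rises toward 1,
        # which is the buffering role argued for above.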

        Continuing, isn't it interesting that there are relatively few introns
        in species like bacteria (i.e., asexually reproducing species) that do
        NOT perform crossover? For these species, mutation is the primary
        mechanism of change in their evolutionary path (excluding plasmids).

        Why is it so difficult to accept that 98% of the chromosome is
        non-coding? At least half of my code, character by character, is
        comments and spacing! Actually I'd like to claim credit for this idea
        (it came to me in class one night), but it is actually discussed
        somewhat in the above reference. What seems clear to me is that introns
        serve, at a minimum, to organize the chromosome into modules that resist
        crossover disruption, and this seems very similar to the chromosome
        organization in an FPGA.

        On the other hand, I would certainly concede that introns are not
        entirely without function (another simplification on my part), and
        you've got me doing some reading on the topic. Apparently there are at
        least five types of introns, some of which code for regulatory RNAs.

        I would like to conclude with the following quote from ScientificAmerican.com:

        "In 1978 Walter Gilbert of Harvard expressed a different view of the
        nature of introns (in the same report in which he coined the terms
        'exon' and 'intron'). He suggested that introns could speed up
        evolution by promoting genetic recombinations between exons. This
        process (which he called 'exon shuffling') would be directly
        associated with formation of new genes. Introns, from this
        perspective, have a profound purpose. They serve as hot spots for
        recombination in the formation of new combinations of exons. In other
        words, they are in our genes because they have been used during
        evolution as a faster pathway to assemble new genes. Over the past 10
        years, the exon shuffling idea has been supported by data from various
        experimental approaches."
      • David vun Kannon -- Message 3 of 10, Jul 1, 2006
          Dave,

          Thanks for clarifying your meaning.

          >Now if the cut occurs in the low order bits, then the new integer will
          >be "close" to the second in value, but its actual value can't be
          >predicted entirely. Further, if the cut occurs in the higher order
          >bits, then the new integer value can be wildly different from the
          >original value(s). This is what I meant by "random" -- the delta
          >between the original integer(s) and the newly formed one is not
          >entirely predictable,

          OK, assuming a straightforward integer encoding (this does need to be
          distinguished from "binary" encoding, even for numerical parameters),
          the max difference is just less than the value of the bit that is just
          on the high side of the cut, for both children. I don't know if that
          qualifies as wildly unpredictable, but it is the mechanism that allows
          the algorithm to jump to a new part of the search space. Crossover
          jumps, mutation creeps.
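
          A quick check of that bound with made-up 8-bit values: swapping the k
          low-order bits changes each child, relative to the parent that
          supplied its high bits, by |b_low - a_low| < 2**k, the value of the
          bit just on the high side of the cut.

          a, b, k = 0b11001010, 0b10110101, 4     # 202, 181, cut above bit 3
          mask = (1 << k) - 1
          child1 = (a & ~mask) | (b & mask)       # high bits of a, low bits of b
          child2 = (b & ~mask) | (a & mask)
          assert abs(child1 - a) < (1 << k) and abs(child2 - b) < (1 << k)
          print(child1 - a, child2 - b)           # -> -5 5, inside the bound 16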

          >and so that is why I say
          >that it resembles mutation in a sense, introducing wildly different integer
          >values into the search. FPGAs do not do this as a result of crossover.

          You'll have to specify your favorite FP crossover operator before we can
          discuss what it does. I can imagine lots of variations on FP crossover. Do
          you assume that crossover always occurs between FP parameters, which implies
          that crossover never changes the value of a FP param, just rearranges them?
          Or do you assume that crossover can occur at an FP locus, which implies
          feeding the two parent values into a function and getting a new value for
          each of the children at that locus?
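
          For definiteness, both flavors sketched in Python (the operator names
          and the blend choice are mine; the blend here is a random convex
          combination, one of many possibilities):

          import random

          def boundary_crossover(p1, p2):
              # cuts only between FP parameters: values rearranged, never changed
              cut = random.randrange(1, len(p1))
              return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

          def blend_crossover(p1, p2):
              # crossover *at* an FP locus: the two parent values feed a function
              # that yields a new value for each child at that locus
              cut = random.randrange(len(p1))
              w = random.random()
              v1 = w * p1[cut] + (1 - w) * p2[cut]
              v2 = w * p2[cut] + (1 - w) * p1[cut]
              return (p1[:cut] + [v1] + p2[cut + 1:],
                      p2[:cut] + [v2] + p1[cut + 1:])

          p1, p2 = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
          print(boundary_crossover(p1, p2))
          print(blend_crossover(p1, p2))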

          >
          >(Aside -- perhaps what I'm about to assert is blasphemy, but it seems
          >that the only reason that Xover is not a catastrophic mechanism in
          >BGAs is because the set of encodings across the population for a
          >particular integer tends to converge to a single value over the course
          >of the evolution. I.e., if you look at a single "column" in the
          >population and watch it, it tends to become homogeneous. At that
          >point, crossover is exactly as it is in FPGA and the problem of
          >cutting an integer in half is not an issue. Essentially the BGA has
          >become an FPGA with respect to the crossover operation.)

          Domino convergence is to be expected in BinInt problems. I agree there is a
          very rough analogy here to what you've been saying about exons and introns,
          but not necessarily to FPGAs. In any case, jumping around the search space
          is not a catastrophe, it is the way the algorithm works.

          > > With respect to living systems, are you trying to argue that the
          > > proportion of exon to intron should be constant across genomes of
          > > different sizes, or that some other rule should hold? What I hear
          > > you saying is that introns are always good, because they help force
          > > crossover (which chemically speaking could occur at any base pair)
          > > to happen at useful boundaries within the genome.
          >
          >Yes, essentially that is it.
          >Introns serve to stabilize the crossover operation in a sense.

          It's an interesting conjecture, but I'm not aware of evidence to support it.
          I've heard the contrary, that intron/exon ratios rise in "higher" animals.
          If so, this could be taken as evidence that introns are more useful for
          creating alternative splicing sites within a gene than for creating any kind
          of guarantee that crossover takes place predominantly at gene boundaries. I
          don't know enough about intron research to speculate if different kinds of
          intron could serve different purposes.
        • Gordon D. Pusch -- Message 4 of 10, Jul 1, 2006
            On 2006-Jun-30, at 11:42 PM, David vun Kannon wrote:

            > With respect to living systems, are you trying to argue
            > that the proportion of exon to intron should be constant
            > across genomes of different sizes, or that some other rule
            > should hold?

            To a good first approximation (and modulo a number of notable
            "outliers"), the ratio of non-coding to coding DNA in the
            so-called "higher eukaryotes" is directly proportional to
            genome size.

            In other words, to a first approximation, a genome that is
            twice as large will have about twice as much non-coding DNA.

            Put a third way, most of the "higher eukaryotes" so far sequenced
            have on the order of 10,000--30,000 "coding regions," each of which
            typically contains between 1000 and 10,000 "coding characters;"
            the remainder of the genome is (as far as anyone currently knows)
            "non-coding" (or at least, it doesn't code for _proteins_ ---
            albeit it cannot yet be ruled out that some of the putative
            "non-coding" DNA might actually code for "small regulatory RNAs").
            Since all euk genomes have roughly the same order of magnitude
            of "coding" DNA, and since this "coding" DNA accounts for only
            a small fraction of the total genome size, with the remainder being
            allegedly "non-coding," it is perfectly natural that the amount
            of so-called "non-coding" (AKA "Junk") DNA would be directly
            proportional to genome size.

            There is also the additional complication that the so-called
            "higher eukaryotes" seem to make more use of "variant splicing"
            than the single-celled eukaryotes, i.e., each coding region
            actually codes for multiple (but related) protein products.
            Hence the number of "genes" may actually be significantly larger
            than the number of "coding regions."

            One empirical datum in favor of the idea that a significant
            fraction of eukaryotic regulation may be carried out by small RNAs
            rather than proteins is the observation that in prokaryotes,
            the number of "metabolic" genes appears to grow roughly linearly
            with genome size, whereas the number of "regulatory" genes appears
            to grow roughly quadratically with genome size. Assuming for the
            moment that both these empirical "scaling laws" continued
            indefinitely, it would imply that there would be more than one
            regulator per metabolic gene in any prokaryote larger than
            10,000--20,000 genes, i.e., the "regulatory" genes would
            outnumber the genes being controlled, and the genome would be
            "mostly regulatory," which seems absurd --- and indeed,
            prokaryotes larger than this size do not seem to exist.
            For the empirical data supporting this argument, see
            <http://www.arxiv.org/abs/q-bio.MN/0311021>,
            whose authors speculate that the so-called "higher eukaryotes"
            may have had to evolve novel control mechanisms such as the use
            of small "non-coding" regulatory RNAs to circumvent the problem
            that a genome larger than 10,000-20,000 genes would be mostly
            "regulatory" rather than "metabolic" if it attempted to use
            protein interactions to regulate its metabolism.

            See also <http://www.arxiv.org/abs/q-bio.MN/0412027>,
            which likewise explores the idea that the "higher euks"
            may be dominated by small "non-coding" regulatory RNAs
            rather than proteins.
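
            To see the shape of that argument numerically (the scaling exponents
            are from the cited paper; the constants below are invented purely
            for illustration):

            C_MET = 1.0            # metabolic genes ~ C_MET * N
            C_REG = 1.0 / 15000    # regulatory genes ~ C_REG * N**2

            def regulators_per_metabolic(n):
                return (C_REG * n * n) / (C_MET * n)

            for n in (1000, 5000, 15000, 30000):
                print(n, round(regulators_per_metabolic(n), 2))
            # -> 0.07, 0.33, 1.0, 2.0: the ratio grows linearly with genome size
            # and passes 1 near N = 15,000, which is why a genome much beyond
            # ~10,000--20,000 genes would be "mostly regulatory."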


            -- Gordon D. Pusch

            perl -e '$_ = "gdpusch\@..."; s/[A-Z]+\.//g; print;'
          • Terry Soule -- Message 5 of 10, Jul 5, 2006
              Hi,

              For anyone who is interested, some of these topics will be touched
              on in the Evolution and Resiliency tutorial on Sunday afternoon at
              GECCO. Artificial evolutionary algorithms with variable-length
              codings exhibit a variety of interesting behaviors, both in growth
              and contraction and in coding and non-coding regions. It seems
              likely that how genomes evolve, even in our relatively simple
              artificial systems, is a lot more complex, and interesting, than
              is widely assumed. The tutorial (although it often becomes more of
              a discussion than a tutorial) covers recent research into how
              pressure for genetically robust/resilient solutions influences the
              evolutionary process, particularly genome size, coding/non-coding
              regions, etc.

              Hope to see you there,
              Terry Soule
              Department of Computer Science
              University of Idaho
              tsoule@...

              -----Original Message-----
              From: genetic_programming@yahoogroups.com
              [mailto:genetic_programming@yahoogroups.com]On Behalf Of Gordon D. Pusch
              Sent: Saturday, July 01, 2006 10:22 PM
              To: genetic_programming@yahoogroups.com
              Subject: Re: [GP] Binary Codings; & Living Genetic Systems

