Re: [GP] Binary Codings; & Living Genetic Systems

  • David vun Kannon
    Message 1 of 10 , Jul 1, 2006
      Dave,

      Thanks for clarifying your meaning.

      >Now if the cut occurs in the low order bits, then the new integer will be
      >"close" to the second in value, but its actual value can't be predicted
      >entirely. Further, if the cut occurs in the higher order bits, then the
      >new integer value can be wildly different from the original value(s).
      >This is what I meant by "random" -- the delta between the original
      >integer(s) and the newly formed one is not entirely predictable,

      OK, assuming a straightforward integer encoding (this does need to be
      distinguished from "binary" encoding, even for numerical parameters) the
      maximum difference between a child and the parent that supplies its
      high-order bits is just less than the value of the bit on the high side of
      the cut, and this holds for both children. I don't know if that qualifies
      as wildly unpredictable, but it is the mechanism that allows the algorithm
      to jump to a new part of the search space. Crossover jumps, mutation creeps.
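
The bound above can be checked with a short sketch. It assumes plain unsigned integers and a single cut that swaps the low-order bits; `one_point_crossover` and `check_bound` are illustrative names, not from any particular GA library.

```python
import random

def one_point_crossover(a: int, b: int, cut: int) -> tuple[int, int]:
    """Swap the low `cut` bits of two non-negative integers."""
    low_mask = (1 << cut) - 1
    child1 = (a & ~low_mask) | (b & low_mask)  # high bits of a, low bits of b
    child2 = (b & ~low_mask) | (a & low_mask)  # high bits of b, low bits of a
    return child1, child2

def check_bound(n_bits: int = 16, trials: int = 10_000, seed: int = 0) -> bool:
    """Check that each child stays within 2**cut of its high-bit parent."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.getrandbits(n_bits)
        b = rng.getrandbits(n_bits)
        cut = rng.randrange(1, n_bits)
        c1, c2 = one_point_crossover(a, b, cut)
        bit_above_cut = 1 << cut  # value of the bit just on the high side
        if abs(c1 - a) >= bit_above_cut or abs(c2 - b) >= bit_above_cut:
            return False
    return True
```

Measured against the other parent (the one that supplied only the low-order bits), the same child can be far away, and that is the "jump".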

      >and so that is why I say
      >that it resembles mutation in a sense, introducing wildly different integer
      >values into the search. FPGAs do not do this as a result of crossover.

      You'll have to specify your favorite FP crossover operator before we can
      discuss what it does. I can imagine lots of variations on FP crossover. Do
      you assume that crossover always occurs between FP parameters, which implies
      that crossover never changes the value of a FP param, just rearranges them?
      Or do you assume that crossover can occur at an FP locus, which implies
      feeding the two parent values into a function and getting a new value for
      each of the children at that locus?
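
To make the two cases concrete, here is a minimal sketch of both variants on lists of floats. The blend rule in the second function (a weighted average with a hypothetical weight `alpha`) is only one assumed choice of "function"; many others are possible, and neither function is claimed to be the operator Dave has in mind.

```python
def crossover_between_params(p1, p2, cut):
    """Cut falls between parameters: values are rearranged, never changed."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def crossover_at_locus(p1, p2, locus, alpha):
    """Cut falls on one parameter: the parent values there are blended.

    The weighted average is an assumption made for illustration.
    """
    x, y = p1[locus], p2[locus]
    new1 = alpha * x + (1 - alpha) * y
    new2 = alpha * y + (1 - alpha) * x
    child1 = p1[:locus] + [new1] + p2[locus + 1:]
    child2 = p2[:locus] + [new2] + p1[locus + 1:]
    return child1, child2
```

In the first variant every value in a child already exists in one of the parents; in the second, the values at `locus` are new.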

      >
      >(Aside -- perhaps what I'm about to assert is blasphemy, but it seems
      >that the only reason that Xover is not a catastrophic mechanism in
      >BGAs is because the set of encodings across the population for a
      >particular integer tends to converge to a single value over the course
      >of the evolution. I.e., if you look at a single "column" in the
      >population and watch it, it tends to become homogeneous. At that
      >point, crossover is exactly as it is in FPGA and the problem of
      >cutting an integer in half is not an issue. Essentially the BGA has
      >become an FPGA with respect to the crossover operation.)

      Domino convergence is to be expected in BinInt problems. I agree there is a
      very rough analogy here to what you've been saying about exons and introns,
      but not necessarily to FPGAs. In any case, jumping around the search space
      is not a catastrophe, it is the way the algorithm works.

      > > With respect to living systems, are you trying to argue that the
      > > proportion of exon to intron should be constant across genomes of
      > > different sizes, or that some other rule should hold? What I hear you
      > > saying is that introns are always good, because they help force
      > > crossover (which chemically speaking could occur at any base pair) to
      > > happen at useful boundaries within the genome.
      >
      >Yes essentially that is it.
      >Introns serve to stabilize the crossover operation in a sense.

      It's an interesting conjecture, but I'm not aware of evidence to support it.
      I've heard the contrary, that intron/exon ratios rise in "higher" animals.
      If so, this could be taken as evidence that introns are more useful for
      creating alternative splicing sites within a gene than for creating any kind
      of guarantee that crossover takes place predominantly at gene boundaries. I
      don't know enough about intron research to speculate if different kinds of
      intron could serve different purposes.
    • Gordon D. Pusch
      Message 2 of 10 , Jul 1, 2006
        On 2006-Jun-30, at 11:42 PM, David vun Kannon wrote:

        > With respect to living systems, are you trying to argue
        > that the proportion of exon to intron should be constant
        > across genomes of different sizes, or that some other rule
        > should hold?

        To a good first approximation (and modulo a number of notable
        "outliers"), the ratio of non-coding to coding DNA in the
        so-called "higher eukaryotes" is directly proportional to
        genome size.

        In other words, to a first approximation, a genome that is
        twice as large will have about twice as much non-coding DNA.

        Put a third way, most of the "higher eukaryotes" sequenced so far
        have on the order of 10,000--30,000 "coding regions," each of which
        typically contains between 1000 and 10,000 "coding characters;"
        the remainder of the genome is (as far as anyone currently knows)
        "non-coding" (or at least, it doesn't code for _proteins_ ---
        albeit it cannot yet be ruled out that some of the putative
        "non-coding" DNA might actually code for "small regulatory RNAs").
        Since all euk genomes have roughly the same order of magnitude
        of "coding" DNA, and since this "coding" DNA accounts for only
        a small fraction of the total genome size, with the remainder being
        allegedly "non-coding," it is perfectly natural that the amount
        of so-called "non-coding" (AKA "Junk") DNA would be directly
        proportional to genome size.
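
The arithmetic behind this can be sketched as follows. The number of coding regions and their length are picked from inside the ranges quoted above; the two genome sizes are hypothetical round numbers chosen only for illustration, not measurements of any organism.

```python
def noncoding_chars(genome_size, n_coding_regions, chars_per_region):
    """Non-coding characters = total genome minus total coding characters."""
    return genome_size - n_coding_regions * chars_per_region

# Values inside the quoted ranges (10,000--30,000 regions, 1000--10,000 chars).
N_REGIONS = 20_000
CHARS_PER_REGION = 2_000

# Hypothetical genome sizes, one twice the other.
small = noncoding_chars(1_000_000_000, N_REGIONS, CHARS_PER_REGION)
large = noncoding_chars(2_000_000_000, N_REGIONS, CHARS_PER_REGION)

# Because the coding part is roughly constant and small, doubling the
# genome roughly doubles the non-coding part.
ratio = large / small
```

With these assumed numbers `ratio` comes out slightly above 2, since the fixed coding part is a small share of either genome.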

        There is also the additional complication that the so-called
        "higher eukaryotes" seem to make more use of "variant splicing"
        than the single-celled eukaryotes, i.e., each coding region
        actually codes for multiple (but related) protein products.
        Hence the number of "genes" may actually be significantly larger
        than the number of "coding regions."

        One empirical datum in favor of the idea that a significant
        fraction of eukaryotic regulation may be carried out by small RNAs
        rather than proteins is the observation that in prokaryotes,
        the number of "metabolic" genes appears to grow roughly linearly
        with genome size, whereas the number of "regulatory" genes appears
        to grow roughly quadratically with genome size. Assuming for the
        moment that both these empirical "scaling laws" continued
        indefinitely, it would imply that there would be more than one
        regulator per metabolic gene in any prokaryote larger than
        10,000--20,000 genes, i.e., the "regulatory" genes would
        outnumber the genes being controlled, and the genome would be
        "mostly regulatory," which seems absurd --- and indeed,
        prokaryotes larger than this size do not seem to exist.
        For the empirical data supporting this argument, see
        <http://www.arxiv.org/abs/q-bio.MN/0311021>,
        whose authors speculate that the so-called "higher eukaryotes"
        may have had to evolve novel control mechanisms such as the use
        of small "non-coding" regulatory RNAs to circumvent the problem
        that a genome larger than 10,000-20,000 genes would be mostly
        "regulatory" rather than "metabolic" if it attempted to use
        protein interactions to regulate its metabolism.
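
The scaling argument can be written out as a small sketch. It assumes metabolic genes grow as `a * G` and regulatory genes as `b * G**2`, where `G` is genome size in genes; the coefficients `A` and `B` below are made up so that the crossover lands inside the quoted 10,000--20,000 range, and are not taken from the cited paper.

```python
def gene_counts(total_genes, a, b):
    """Return (metabolic, regulatory) counts under the assumed scaling laws."""
    metabolic = a * total_genes          # roughly linear growth
    regulatory = b * total_genes ** 2    # roughly quadratic growth
    return metabolic, regulatory

def crossover_size(a, b):
    """Genome size at which regulatory genes equal metabolic genes.

    Solving b * G**2 == a * G for G > 0 gives G = a / b.
    """
    return a / b

# Hypothetical coefficients, chosen only to illustrate the argument.
A = 0.3
B = 2e-5
```

Below `crossover_size(A, B)` the metabolic genes outnumber the regulators; above it the relation flips, which is the "mostly regulatory" regime the argument calls absurd.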

        See also <http://www.arxiv.org/abs/q-bio.MN/0412027>,
        which likewise explores the idea that the "higher euks"
        may be dominated by small "non-coding" regulatory RNAs
        rather than proteins.


        -- Gordon D. Pusch

        perl -e '$_ = "gdpusch\@..."; s/[A-Z]+\.//g; print;'
      • Terry Soule
        Message 3 of 10 , Jul 5, 2006
          Hi,

          For anyone who is interested some of these topics will be touched on in the
          Evolution and Resiliency tutorial on Sunday afternoon at GECCO. Artificial
          evolutionary algorithms with variable length codings exhibit a variety of
          interesting behaviors in terms of both growth and contraction and coding and
          non-coding regions. It seems likely that how genomes evolve even in our
          relatively simple artificial systems is a lot more complex, and interesting,
          than is widely assumed. The tutorial (although it often becomes more of a
          discussion than a tutorial) covers recent research into how pressure for
          genetically robust/resilient solutions influences the evolutionary process,
          particularly genome size, coding/non-coding regions, etc.

          Hope to see you there,
          Terry Soule
          Department of Computer Science
          University of Idaho
          tsoule@...
