Loading ...
Sorry, an error occurred while loading the content.
 

Re: [GP] prerequisites for using GP

Expand Messages
  • Julian Togelius
    Hi Sergey, Given the limited size of your problem, my guess is that a standard GP system with arithmetic function nodes would quickly find a solution with very
    Message 1 of 9 , Mar 30, 2009
      Hi Sergey,

      Given the limited size of your problem, my guess is that a standard GP
      system with arithmetic function nodes would quickly find a solution
      with very low error on the test set. However, it would likely display
      serious overfitting to the data points at hand. That is, if you draw a
      seventh data point from the same underlying distribution, there is a
      large risk that it would not be adequately predicted by your evolved
      expression tree (the error on the "validation set" would likely be
      large).

      This is something symbolic regression has in common with most
      supervised learning algorithms, and unfortunately there is no simple
      way around it. It just seems like you don't have enough data.

      Julian

      2009/3/30 Sergey Grechin <hq9000_vst@...>:
      > Hello everybody!
      >
      > I hope this list is an appropriate place to ask my question.
      >
      > It's going to be rather lenthy mail since I need to put across some
      > background. Thanks in advance for your attention.
      >
      > Imagine we have a system with N numerical inputs and 1 numerical output.
      > Let's say that using GP I want to guess the implicit relation between
      > inputs and output. The criterion for the better fitness will be the
      > lesser summarized error of the candidate solution at K historical samples.
      >
      > Very classical statement.
      >
      > Then to be concrete let's say I have N=30 (number of inputs) and K=6
      > (number of historical samples). As you see N is several times greater
      > than K.
      >
      > Now, if I wanted to use linear regression (or any parametric model
      > with about 30 parameters) - there would be now way to
      > proceed since input data set is too small to resolve against 30
      > unknown coefficients.
      >
      > But I can easily try to attack the problem with GP.
      > Formally nothing prevents me from doing so.
      >
      > But if paramtric model fail to deal with this case -
      > ****
      >
      > 1. what is so
      > special about symbolic regression that it does allow us to do this?
      >
      > 2. is there any prerequisites for problem statement to be able to use
      > GP?
      >
      > ****
      >
      > If we look at this as quantities of information - it's OK to derive
      > values for 30 coefficients from 30 facts about the model (30 samples).
      > But GP pretends to be able to find a formula interconnecting 30 inputs
      > basing on 6 facts...
      >
      > I'm really sorry for not being able to describe my concerns more
      > precisely but I think I've managed to conduct the feel of the
      > problem.
      >
      > Thanks!
      >
      > Sergey
      >
      >



      --
      Julian Togelius
      IDSIA
      Galleria 2
      6928 Manno-Lugano
      Switzerland
      julian@...
      http://julian.togelius.com
      http://www.idsia.ch/~togelius
      +41-764-110679
      +46-705-192088
    • Bob MacCallum
      It sounds like any evolutionary search-based approach will overfit massively to such a small dataset. I would not even attempt this problem with GP.
      Message 2 of 9 , Mar 30, 2009
        It sounds like any evolutionary search-based approach will overfit
        massively to such a small dataset.
        I would not even attempt this problem with GP.

        On Mon, Mar 30, 2009 at 5:22 AM, Sergey Grechin <hq9000_vst@...> wrote:
        > Hello everybody!
        >
        > I hope this list is an appropriate place to ask my question.
        >
        > It's going to be rather lenthy mail since I need to put across some
        > background. Thanks in advance for your attention.
        >
        > Imagine we have a system with N numerical inputs and 1 numerical output.
        > Let's say that using GP I want to guess the implicit relation between
        > inputs and output. The criterion for the better fitness will be the
        > lesser summarized error of the candidate solution at K historical samples.
        >
        > Very classical statement.
        >
        > Then to be concrete let's say I have N=30 (number of inputs) and K=6
        > (number of historical samples). As you see N is several times greater
        > than K.
        >
        > Now, if I wanted to use linear regression (or any parametric model
        > with about 30 parameters) - there would be now way to
        > proceed since input data set is too small to resolve against 30
        > unknown coefficients.
        >
        > But I can easily try to attack the problem with GP.
        > Formally nothing prevents me from doing so.
        >
        > But if paramtric model fail to deal with this case -
        > ****
        >
        > 1. what is so
        > special about symbolic regression that it does allow us to do this?
        >
        > 2. is there any prerequisites for problem statement to be able to use
        > GP?
        >
        > ****
        >
        > If we look at this as quantities of information - it's OK to derive
        > values for 30 coefficients from 30 facts about the model (30 samples).
        > But GP pretends to be able to find a formula interconnecting 30 inputs
        > basing on 6 facts...
        >
        > I'm really sorry for not being able to describe my concerns more
        > precisely but I think I've managed to conduct the feel of the
        > problem.
        >
        > Thanks!
        >
        > Sergey
        >
        >
      • Terence Soule
        I believe that there have been some successes using GP to analyze microarray data where the ratio of inputs to training data is similar although the scale is
        Message 3 of 9 , Mar 30, 2009
          I believe that there have been some successes using GP to analyze
          microarray data where the ratio of inputs to training data is similar
          although the scale is much larger (1000's to 10,000s of input
          variables and only 100's of historical data points). You could look
          at the techniques that have been used there. Although if you really
          only have K=6 I think you're in trouble.

          Terry Soule
          Department of Computer Science
          University of Idaho
          Moscow, ID 83843
          tsoule@...

          On Mon, Mar 30, 2009 at 11:08 PM, Sergey Grechin <hq9000_vst@...> wrote:
          > Hi,
          >
          > Thanks for opinion.
          > Any idea where to find for reasonable approach to analyze such data?
          > I have no option to get more observations.
          >
          > I think I can somehow fight this overfitting by harder constrains on
          > complexity of the member of the population.
          >
          > What do you think?
          >
          > Sergey
          >
          >>It sounds like any evolutionary search-based approach will overfit
          >>massively to such a small dataset.
          >>I would not even attempt this problem with GP.
          >
          >
          >
          >
          >
          > ------------------------------------
          >
          > Yahoo! Groups Links
          >
          >
          >
          >
        • Sergey Grechin
          Hi, Thanks for opinion. Any idea where to find for reasonable approach to analyze such data? I have no option to get more observations. I think I can somehow
          Message 4 of 9 , Mar 30, 2009
            Hi,

            Thanks for opinion.
            Any idea where to find for reasonable approach to analyze such data?
            I have no option to get more observations.

            I think I can somehow fight this overfitting by harder constrains on
            complexity of the member of the population.

            What do you think?

            Sergey

            >It sounds like any evolutionary search-based approach will overfit
            >massively to such a small dataset.
            >I would not even attempt this problem with GP.
          • R. Muhammad Atif Azad
            On Tue, 31 Mar 2009, R. Muhammad Atif Azad wrote: I agree that your best hope is GP simplifying the functional mapping: it may make a large number of inputs
            Message 5 of 9 , Mar 31, 2009
              On Tue, 31 Mar 2009, R. Muhammad Atif Azad wrote:

              I agree that your best hope is GP simplifying the functional mapping: it may
              make a large number of inputs redundant as well. One obvious benchmark may be
              (with least squares regression) would be to keep the degree of the polynomial
              at least less than the number of sample points (or else there is no data
              compression achieved). To avoid overfitting in that case, you may also
              consider ridge regression.

              On Mon, 30 Mar 2009, Sergey Grechin wrote:

              >>
              >> Hi,
              >>
              >> Thanks for opinion.
              >> Any idea where to find for reasonable approach to analyze such data?
              >> I have no option to get more observations.
              >>
              >> I think I can somehow fight this overfitting by harder constrains on
              >> complexity of the member of the population.
              >>
              >> What do you think?
              >>
              >> Sergey
              >>
              >> >It sounds like any evolutionary search-based approach will overfit
              >> >massively to such a small dataset.
              >> >I would not even attempt this problem with GP.
            • Terence Soule
              Hi, I think if you just go to Google scholar and do a search on: microarrays genetic programming you should turn up a number of papers. Here s a few
              Message 6 of 9 , Mar 31, 2009
                Hi,

                I think if you just go to Google scholar and do a search on:
                microarrays genetic programming
                you should turn up a number of papers. Here's a few citations:
                @article{hong2006ccb,
                title={{The classification of cancer based on DNA microarray data
                that uses diverse ensemble genetic programming}},
                author={Hong, J.H. and Cho, S.B.},
                journal={Artificial Intelligence in Medicine},
                volume={36},
                number={1},
                pages={43--58},
                year={2006},
                publisher={Elsevier}
                }


                @article{langdon2004gpm,
                title={{Genetic programming for mining DNA chip data from cancer patients}},
                author={Langdon, WB and Buxton, BF},
                journal={Genetic Programming and Evolvable Machines},
                volume={5},
                number={3},
                pages={251--257},
                year={2004},
                publisher={Springer}
                }

                Terry

                On Tue, Mar 31, 2009 at 10:11 AM, Sergey Grechin <hq9000_vst@...> wrote:
                > Hi Terry,
                >
                > Could you plase give a name of the paper or link or something that could
                > help me to find more info on this research?
                >
                > Thanks
                >
                > Sergey
                >
                >> I believe that there have been some successes using GP to analyze
                >> microarray data where the ratio of inputs to training data is similar
                >> although the scale is much larger (1000's to 10,000s of input
                >> variables and only 100's of historical data points). You could look
                >> at the techniques that have been used there. Although if you really
                >> only have K=6 I think you're in trouble.
                >
                >> Terry Soule
                >> Department of Computer Science
                >> University of Idaho
                >> Moscow, ID 83843
                >> tsoule@...
                >
                >> On Mon, Mar 30, 2009 at 11:08 PM, Sergey Grechin <hq9000_vst@...> wrote:
                >>> Hi,
                >>>
                >>> Thanks for opinion.
                >>> Any idea where to find for reasonable approach to analyze such data?
                >>> I have no option to get more observations.
                >>>
                >>> I think I can somehow fight this overfitting by harder constrains on
                >>> complexity of the member of the population.
                >>>
                >>> What do you think?
                >>>
                >>> Sergey
                >>>
                >>>>It sounds like any evolutionary search-based approach will overfit
                >>>>massively to such a small dataset.
                >>>>I would not even attempt this problem with GP.
                >>>
                >>>
                >>>
                >>>
                >>>
                >>> ------------------------------------
                >>>
                >>> Yahoo! Groups Links
                >>>
                >>>
                >>>
                >>>
                >>
                >
                >
                >
                > --
                > Ñ óâàæåíèåì,
                >  Sergey                          mailto:hq9000_vst@...
                >
                >
                >
                > ------------------------------------
                >
                > Yahoo! Groups Links
                >
                >
                >
                >
              • Sergey Grechin
                Hi Terry, Could you plase give a name of the paper or link or something that could help me to find more info on this research? Thanks Sergey ... -- Ñ
                Message 7 of 9 , Mar 31, 2009
                  Hi Terry,

                  Could you plase give a name of the paper or link or something that could
                  help me to find more info on this research?

                  Thanks

                  Sergey

                  > I believe that there have been some successes using GP to analyze
                  > microarray data where the ratio of inputs to training data is similar
                  > although the scale is much larger (1000's to 10,000s of input
                  > variables and only 100's of historical data points). You could look
                  > at the techniques that have been used there. Although if you really
                  > only have K=6 I think you're in trouble.

                  > Terry Soule
                  > Department of Computer Science
                  > University of Idaho
                  > Moscow, ID 83843
                  > tsoule@...

                  > On Mon, Mar 30, 2009 at 11:08 PM, Sergey Grechin <hq9000_vst@...> wrote:
                  >> Hi,
                  >>
                  >> Thanks for opinion.
                  >> Any idea where to find for reasonable approach to analyze such data?
                  >> I have no option to get more observations.
                  >>
                  >> I think I can somehow fight this overfitting by harder constrains on
                  >> complexity of the member of the population.
                  >>
                  >> What do you think?
                  >>
                  >> Sergey
                  >>
                  >>>It sounds like any evolutionary search-based approach will overfit
                  >>>massively to such a small dataset.
                  >>>I would not even attempt this problem with GP.
                  >>
                  >>
                  >>
                  >>
                  >>
                  >> ------------------------------------
                  >>
                  >> Yahoo! Groups Links
                  >>
                  >>
                  >>
                  >>
                  >



                  --
                  Ñ óâàæåíèåì,
                  Sergey mailto:hq9000_vst@...
                • adil raja
                  Salam mate, Can u give me a call at my number at 0323 444 9019? Best, Adil Raja ________________________________ From: R. Muhammad Atif Azad
                  Message 8 of 9 , Apr 1, 2009
                    Salam mate,
                    Can u give me a call at my number at 0323 444 9019?

                    Best,
                    Adil Raja




                    ________________________________
                    From: R. Muhammad Atif Azad <atif.azad@...>
                    To: genetic_programming@yahoogroups.com
                    Sent: Tuesday, March 31, 2009 5:44:31 PM
                    Subject: Re: [GP] prerequisites for using GP




                    On Tue, 31 Mar 2009, R. Muhammad Atif Azad wrote:

                    I agree that your best hope is GP simplifying the functional mapping: it may
                    make a large number of inputs redundant as well. One obvious benchmark may be
                    (with least squares regression) would be to keep the degree of the polynomial
                    at least less than the number of sample points (or else there is no data
                    compression achieved). To avoid overfitting in that case, you may also
                    consider ridge regression.

                    On Mon, 30 Mar 2009, Sergey Grechin wrote:

                    >>
                    >> Hi,
                    >>
                    >> Thanks for opinion.
                    >> Any idea where to find for reasonable approach to analyze such data?
                    >> I have no option to get more observations.
                    >>
                    >> I think I can somehow fight this overfitting by harder constrains on
                    >> complexity of the member of the population.
                    >>
                    >> What do you think?
                    >>
                    >> Sergey
                    >>
                    >> >It sounds like any evolutionary search-based approach will overfit
                    >> >massively to such a small dataset.
                    >> >I would not even attempt this problem with GP.






                    [Non-text portions of this message have been removed]
                  Your message has been successfully submitted and would be delivered to recipients shortly.