Re: [GP] Overfitting in GP
- On 24/8/04 4:20, Nitin Muttil wrote:
> I am noticing that when the depth of GP tree is
> increased, the model performs better in training, but worsens for the test
> data. I assume this is because of overfitting, something similar to
> overfitting in neural networks, when hidden layers/nodes are increased.
> 1) Is this actually overfitting?
It sounds like it, Nitin, but you would need to look at the fits themselves
to see whether it is.
> If so, is there a optimal GP equation size,
> or has it to be fixed by trial and error?
It is highly data dependent, in my experience. One person's overfitting can
be another person's invaluable information! This is no different from any
other regression/forecasting problem, for which there is very extensive
literature.
> 2) Can I get pointers to studies to find optimal values of GP parameters like
> equation size, population size, crossover and mutation rate, etc.
In different applications, yes - see the GP bibliography, etc. But you are
the judge as to whether any of that will apply to your data.
Assuming that your training dataset is noisy, I would suggest that your next
step is to produce a noise-free training dataset. You may need to complete
this by hand. Try training on that and then testing on regular noisy data.
Then try varying tree depth and complexity, population size, etc., to see
what effects those have. You will need not only to look at fitness and
prediction error, but actually to look at data plots. If your data has a lot
of outliers that you do not wish to fit, then you may find it better to
use an absolute deviation in the fitness function, rather than the classical
squared deviation, which tends to weight towards outliers, of course. These
issues and others have been covered well in the GP and statistical
literature.
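
To make the absolute-versus-squared point concrete, here is a minimal sketch in plain Python. The function names and the data are made up for illustration and do not come from any GP package; the two candidate prediction lists stand in for the outputs of two evolved models.

```python
# Hypothetical fitness helpers (lower is better); not taken from any GP library.

def squared_error_fitness(predicted, observed):
    """Mean squared deviation: large residuals dominate the score."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def absolute_error_fitness(predicted, observed):
    """Mean absolute deviation: each residual counts in proportion to its size."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Made-up observations: a clean trend plus one outlier at the end.
observed = [1.0, 2.0, 3.0, 4.0, 50.0]

# Candidate A follows the trend and ignores the outlier.
follows_trend = [1.0, 2.0, 3.0, 4.0, 5.0]

# Candidate B is the least-squares line through all five points,
# so it is dragged towards the outlier.
pulled_by_outlier = [-8.0, 2.0, 12.0, 22.0, 32.0]

for name, predicted in [("A (trend)", follows_trend), ("B (pulled)", pulled_by_outlier)]:
    print(name,
          "squared:", squared_error_fitness(predicted, observed),
          "absolute:", absolute_error_fitness(predicted, observed))
```

With this data the squared deviation ranks candidate B as fitter, while the absolute deviation ranks candidate A as fitter, which is the effect described above. Which ranking is the right one still depends on whether the outlier is noise or information.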
I wish you success,
Dr Howard Oakley
The Works columnist for MacUser magazine (UK)
- Hi Nitin,
Interesting problem. Yes, it seems like overfitting. Have you
considered pruning or even ensembles? You could even put a penalty
term in the fitness function which penalises trees if they are very
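
A size penalty of the kind suggested above could be sketched as follows. This is only an illustration: the nested-tuple tree representation, the names `tree_size` and `penalised_fitness`, and the penalty weight of 0.01 per node are all assumptions, not part of any particular GP system.

```python
# Hypothetical sketch of a size-penalised fitness (lower is better).
# Trees are written as nested tuples: (operator, child, child, ...); anything
# that is not a tuple is treated as a leaf (variable or constant).

def tree_size(tree):
    """Count the nodes in a nested-tuple expression tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(tree_size(child) for child in tree[1:])


def penalised_fitness(raw_error, tree, penalty_per_node=0.01):
    """Add a penalty proportional to tree size to the raw prediction error."""
    return raw_error + penalty_per_node * tree_size(tree)


small_tree = ("+", "x", 1.0)                                   # 3 nodes
large_tree = ("+", ("*", "x", ("sin", "x")),
              ("-", ("*", "x", "x"), 1.0))                     # 10 nodes

# Made-up training errors: the larger tree fits the training data slightly better.
print(penalised_fitness(0.50, small_tree))
print(penalised_fitness(0.45, large_tree))
```

Here the larger tree has the lower raw error, but once the penalty is added the smaller tree scores better. The penalty weight would have to be tuned to the data, much like tree depth itself.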
--- In firstname.lastname@example.org, "Nitin Muttil"
> Dear GP list,
> I have been trying GP for harmful algal bloom (HAB) predictions. To
> explain what HAB is in brief, it is an explosive growth of algae in
> coastal waters, caused due to dumping pollutants in those waters. HABs
> can be toxic and thus may harm aquatic life and in some cases even
> I am evolving GP models using a training dataset and then testing
> the models on the unseen test data. I am noticing that when the depth
> of GP tree is increased, the model performs better in training, but
> worsens for the test data. I assume this is because of overfitting,
> something similar to overfitting in neural networks, when hidden
> layers/nodes are increased.
> My questions are:
> 1) Is this actually overfitting? If so, is there a optimal GP
> equation size, or has it to be fixed by trial and error?
> 2) Can I get pointers to studies to find optimal values of GP
> parameters like equation size, population size, crossover and mutation
> rate, etc.
> Thanks very much and any help would be highly appreciated.
> Best regards,
- Another thing you can do is have about 15 different
test datasets. Then use a randomly selected test set
to test the GPs against. That way the models cannot be
tuned to one particular test set. REMEMBER NOT EVERY
CREATURE TRAVELS IN THE SAME SHOES.
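
The mechanics of that suggestion might look like the sketch below. It only shows how held-out data could be dealt into 15 test sets and one picked at random for each evaluation; the helper names, the seeds, the toy data and the use of mean absolute error are all assumptions made for illustration.

```python
import random

# Hypothetical helpers; names and defaults are illustrative only.

def make_test_sets(records, n_sets=15, seed=0):
    """Shuffle held-out (x, y) records and deal them into n_sets disjoint test sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return [shuffled[i::n_sets] for i in range(n_sets)]


def evaluate_on_random_set(model, test_sets, rng):
    """Score a model (a callable x -> prediction) on one randomly chosen test set."""
    chosen = rng.choice(test_sets)
    return sum(abs(model(x) - y) for x, y in chosen) / len(chosen)


def candidate(x):
    """Stand-in for an evolved GP expression."""
    return 2 * x + 1


# Made-up held-out data: 60 records give 15 test sets of 4 records each.
records = [(x, 2 * x + 1) for x in range(60)]
test_sets = make_test_sets(records)
score = evaluate_on_random_set(candidate, test_sets, random.Random(1))
print(len(test_sets), score)
```

The test sets here are disjoint from each other, and they should also be kept separate from the training data for the scores to mean anything.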