Re: [Re: AI-GEOSTATS: cross-validation]

Mar 26, 2002
Dear Ercan,

the choice of the cross-validation error function should depend on the
objectives of your work: you could focus on extreme values, on the
root mean squared error (RMSE), the mean absolute error (MAE), the mean
relative error (MRE), etc. Unfortunately I didn't find much literature on how
to select an "adequate" error function. Jeff Myers discussed the problem, and
a case study can be found at

ftp://ftp.geog.uwo.ca/SIC97/Thieken/Thieken.html

You could also have a look at the following paper

ftp://ftp.geog.uwo.ca/SIC97/Tomczak/Tomczak.html

which deals with cross-validation and jackknifing.
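
For reference, the error functions named above can be written down in a few
lines. This is only a sketch: the function names and the sample values are
made up for illustration, and MRE is taken here as the mean of
|observed - predicted| / |observed|, which is one common definition among
several and assumes that no observation is zero.

```python
import math

def rmse(obs, pred):
    """Root mean squared error: penalises large errors more heavily."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def mre(obs, pred):
    """Mean relative error (assumed definition; observations must be non-zero)."""
    return sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)

def max_abs_error(obs, pred):
    """Largest single error: one way of focusing on extreme misses."""
    return max(abs(o - p) for o, p in zip(obs, pred))

# Hypothetical cross-validation pairs (observed value, value predicted without it)
observed = [1.0, 2.0, 4.0, 10.0]
predicted = [1.5, 2.0, 3.0, 6.0]

for name, fn in [("RMSE", rmse), ("MAE", mae), ("MRE", mre), ("MAX", max_abs_error)]:
    print(name, round(fn(observed, predicted), 4))
```

The single large miss on the last pair dominates RMSE and the maximum error
much more than it does MAE or MRE, which is why the choice depends on what
kind of error matters for the application.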

As for k-fold cross-validation (isn't that the name for cross-validation
methods in which a subset of the dataset is removed?), you would have to
repeat the resampling quite a large number of times in order to avoid
results that are biased by one particular split.
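
A rough sketch of what repeating the resampling could look like is given
below. Everything in it is an assumption made for illustration: the
nearest-neighbour predictor is only a stand-in for whatever model is being
tested, the data are synthetic, and the helper names are hypothetical.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_neighbour_predict(train, x):
    """Stand-in model: return the value of the closest training point."""
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

def k_fold_mse(data, k, rng):
    """One pass of k-fold cross-validation, returning the mean squared error."""
    sq_err = 0.0
    for fold in k_fold_indices(len(data), k, rng):
        held_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        for i in fold:
            x, obs = data[i]
            sq_err += (obs - nearest_neighbour_predict(train, x)) ** 2
    return sq_err / len(data)

def repeated_k_fold_mse(data, k, repeats, seed=0):
    """Repeat k-fold with fresh random splits; report mean, min and max score."""
    rng = random.Random(seed)
    scores = [k_fold_mse(data, k, rng) for _ in range(repeats)]
    return sum(scores) / repeats, min(scores), max(scores)

# Synthetic 1-D data: (location, value)
data = [(float(x), 0.5 * x + (x % 3)) for x in range(30)]

mean, lowest, highest = repeated_k_fold_mse(data, k=5, repeats=20)
print("mean MSE:", mean, "range over repeats:", lowest, "to", highest)
```

The spread between the lowest and highest score shows how much a single
split can move the result; the average over many repeats is less dependent
on any one partition.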

Just a few thoughts,

Gregoire

Soeren Nymand Lophaven <snl@...> wrote:
> Dear Ercan
>
> Cross validation could for example be performed by discarding a single
> observation from the dataset and predicting this single observation
> based on the rest of the dataset and with the proposed model. This is
> repeated for all the observations in the dataset, and as a measure of
> the goodness of the model you could calculate
>
> GOM = sum[(observation - predicted)^2]/n
>
> If you have n observations this procedure is referred to as n-fold cross
> validation (also known as leave-one-out). Alternatively you could split
> your dataset into 10 parts; discarding one part at a time and predicting
> it from the other 9 parts with the proposed model would give you 10-fold
> cross validation.
>
> If you want to compare different models, of course the one which gives
> the lowest GOM value is the best for predicting the variable.
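
The leave-one-out procedure and the GOM comparison quoted above could be
sketched as follows. The thread does not say which models are being
compared, so inverse-distance weighting with different powers is used here
purely as a stand-in; the function names and the sample points are likewise
hypothetical.

```python
import math

def idw_predict(train, x, y, power):
    """Inverse-distance-weighted prediction at (x, y); stand-in for a real model."""
    num = den = 0.0
    for tx, ty, tv in train:
        d = math.hypot(tx - x, ty - y)
        if d == 0.0:
            return tv  # exactly on a data point: return its value
        w = 1.0 / d ** power
        num += w * tv
        den += w
    return num / den

def loo_gom(data, power):
    """GOM = sum[(observation - predicted)^2] / n with one point left out at a time."""
    total = 0.0
    for i, (x, y, obs) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every observation except the i-th
        total += (obs - idw_predict(train, x, y, power)) ** 2
    return total / len(data)

# Made-up observations: (x, y, value)
data = [(0, 0, 1.0), (1, 0, 2.0), (0, 1, 2.0), (1, 1, 3.0),
        (2, 1, 4.0), (1, 2, 4.0), (2, 2, 5.0)]

# Each candidate "model" is one choice of the IDW power
goms = {power: loo_gom(data, power) for power in (1.0, 2.0, 4.0)}
best = min(goms, key=goms.get)

for power, gom in goms.items():
    print("power", power, "GOM", gom)
print("lowest GOM for power", best)
```

Since GOM is a mean squared error, "best" here means best under a
squared-error criterion; a different error function may rank the candidates
differently.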
>
> Best regards / Venlig hilsen
>
> Søren Lophaven
>
> Master of Science in Engineering | Ph.D. student
> Informatics and Mathematical Modelling | Building 321, Room 011
> Technical University of Denmark | 2800 kgs. Lyngby, Denmark
> E-mail: snl@... | http://www.imm.dtu.dk/~snl
> Telephone: +45 45253419 |
>
>
> On Mon, 25 Mar 2002, Ercan Yesilirmak wrote:
>
> > Dear list members
> >
> > My question is as follows:
> >
> > For a variable, after fitting a number of models, all of
> > which seem good, how do I decide which one is the best
> > among them based on cross-validation results?
> > I.e., how do I use the cross-validation results of one model
> > to compare it with those of the others?
> >
> > Best regards
> > Ercan Yesilirmak
> >
> >
>
>
Mar 26, 2002
Ercan -

I can provide you with a copy of my 1991 paper on Type-Casting of Error that
describes classifying Type I and II errors according to threshold cutoffs.
This paper also describes an environmental application of the technique.
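
As a generic illustration of classifying cross-validation errors against a
threshold cutoff (this is not taken from the 1991 paper, whose actual method
may differ), each pair of observed and predicted values can be sorted into
one of four classes. The threshold, the sample pairs and the function names
below are invented, and which of the two misclassifications counts as Type I
and which as Type II depends on how the null hypothesis is framed.

```python
def classify(obs, pred, threshold):
    """Classify one (observed, predicted) pair relative to a cutoff."""
    obs_over = obs >= threshold
    pred_over = pred >= threshold
    if obs_over and pred_over:
        return "true positive"
    if not obs_over and not pred_over:
        return "true negative"
    if pred_over:
        return "false positive"  # predicted above the cutoff, observed below
    return "false negative"      # predicted below the cutoff, observed above

def error_counts(pairs, threshold):
    """Tally the four classes over all cross-validation pairs."""
    counts = {"true positive": 0, "true negative": 0,
              "false positive": 0, "false negative": 0}
    for obs, pred in pairs:
        counts[classify(obs, pred, threshold)] += 1
    return counts

# Hypothetical (observed, predicted) pairs and a hypothetical cutoff of 10.0
pairs = [(12.0, 15.0), (3.0, 11.0), (14.0, 6.0), (2.0, 4.0), (9.0, 10.5)]
counts = error_counts(pairs, threshold=10.0)
print(counts)
```

Two models with similar RMSE can differ noticeably in these counts, which is
what makes a threshold-based view useful when the decision of interest is
whether a value exceeds a limit.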

Jeff Myers
Fellow Engineer
Westinghouse Safety Management Solutions
2131 S. Centennial Ave., SE
Aiken, SC 29803
803.502.9747 (direct)
803.502.9767 (main)
803.502.2747 (fax)
803.221.1141 (cell)
email: jeff.myers@...
website: http://www.gemdqos.com

