- Jun 16, 1998Hello

thanks to every one who responded to my question. Here is a summary

of responses, I'm also going to post a few follow up questions and

more details about my problem (which some people requested) in a

couple of days. I've included all the postings at the bottom of this

mail (apologies for the length but I don't think I'm competent to

edit some of the mails I received)

With regard to a rule of thumb for search radius, a consensus

appeared to be the correlation radius, but most qualified

their answers to suggest some investigation before choosing.

Some people suggested that many points (i.e up to 100 say) could be

included particularly as the power of modern computers makes

computational time less significant. One person suggested that 10-20

points was a more appropriate number.

Several people commented on problems they experience when trying

to match the demands of the kriging algorithm with the reality of

the available dataset. This is something of particular interest

to us because in many ways our data has all the worst qualities:

very limited, non-uniformly spread, of varying quality and biased.

But I'll return to this in a future post.

thanks Tom

--------------------------------------------------------------------

My original question was:

>I am investigating the use of geostatistical techniques for

Gregoire Dubois has been working on an Atlas of deposition of Cs-137

>interpolating deposition of radioactive material following an

>accident. In such an event data would be limited and so the

>experiments have been constrained to limited data.

>

>Initially I am trying ordinary kriging but will move on to more

>refined approaches. One problem is specifying an appropriate

>search radius and maximum/minimum number of points to include.

>

>As I see it there are arguments for and against a short seach

>radius:

>

>for:

>-allows local rescaling of the mean in ordinary kriging

>-doesn't require knowledge of the covariance at very large

>separations

>-reduces computational demands

>

> against

> -poor/patchy coverage of the final estimate surface

> -a datum may be beyond the search radius but it does provide

> information about the global but not local mean value

>

>I would be grateful for any advice or a rule of thumb for choosing

>an appropriate search radius for my purpose and advice on the

>effect of changing the minimum number of points that can be

>include in a kriging estimate.

after Chernobyl:

> I was concentrating more on the descriptive aspect of the deposition

Hao Zhang spotted that the deposition process was unlikely to be

>patterns and therefore didn't use geostats methods since these were

>smoothing too strongly the data. The main reason is that I was also

>using a GIS without having the possibility to use the variogram

>model during the kriging. My option was therefore to define the

>search radius as the range given by the variogram and to use a

>minimum number of points (4-6 points depending on the dataset) with

>one point in each search quadrant. Such approach has been conforted

>by cross-validations. There is indeed no rule of thumb to answer

>your question. Maybe cross validations will help you to take some

>decisions but it will be difficult to generalize your decisions.

>About the "patchy" coverage has an important advantage: it really

>shows the decision maker how trickky the data are and how variable

>are the values taken by the measurements even at small distances.

>Unfortunately, a generalisation of the deposition pattern is often

>required to underline the global structure of the pattern which

>might be hidden by these so called "bulls eyes effects". The ideal

>approach should involve as much as possible local functions: zonal

>kriging or even better, the class kriging that has been recently

>developed during the Spatial Interpolation Comparison 97 organised

>on Ai-geostats.

stationary, (we intend to investigate non-stationary methods

next):

>Julian Besag has done some work for nonstationary data. A model

Dan Cornford wrote:

>goes like the following

>

> x(s)= b'y(s) + e(s)

>

>where y(s) is a vector of covariates, e(s) is the residual and

>nonstationary (due to the fact that it is impossible to

>include all related covariates in the model). e(s) is modeled by

>intrinsic autoregression. With small datasets, Bayesian estimates

>of parameters were considered by him. This approach has been applied

>to agriculture variety trial, cancer risk in space and image

>analysis.

>In general the value in O.K. will be a weighted sum of those

Susan Skirvan wrote:

>observations within the correlation radius (that is the effective

>range for many models - although for some (e.g. squared

>exponential)

>models all points will receive some weight albeit very small) when

>all the data is used. The question of how much data should be

>included (or setting a search radius can only be answere with

>respect to:

>

>- sampling design

>- correlation scale(s)

>- what you want to map

>

>I practice I tend to use a search radius that is of the order of the

>correlation range (this means I get a lot of sites contributing in

>some cases but with modern computers this is no problem). I guess

>what you do depends on what aspects of the data you want to bring

>out - but to get the optimal estimate (conditional on all the

>kriging assumptions being met!!!) one should go for a search radius

>~ correlation range.

>what has been helpful for me is to examine both the resulting kriged

Prof Mikhail Kanevski is also working in this area:

>map and the kriging variance map for discontinuities, where values

>jump abruptly due to insufficient points being used for estimation.

>

>When I see this problem, increasing the search radius has been

>helpful in smoothing such discontinuities. Increasing the number

>of points used has been more helpful in reducing kriging

>variances. I think that the appropriate choice of model (for

>example, I have had a nested model with a spherical and a linear

>component, representing short- and long-range spatial

>correlations) will help compensate for possible local changes in

>mean.

>we have done and published many papers related to the spatial data

Syed Abdul Rahman commented:

>analysis of radioactively contaminated territories after the

>Chernobyl accident. You can find the list of these publications on

>our homepage. May be it can be interesting for you.

>We are developing also software tools.

>

>http://www.ibrae.ac.ru/~mkanev/

>

>A) Generally, the rule of thumb is to limit the search neighborhood,

Dan McCarn warned about the problem of computational error.

>to avoid stretching the stationarity assumption. Of course, this is

>only fine and dandy if you have sufficient data. This could range

>from three to six wells in an offshore development to hundreds of

>thousands of points in a remote sensing problem.

>

>B) Consider that cross-validation can be used to fine tune

>search neighborhood parameters such as radii, points per

>quadrant, anisotropy ratios, and so forth.

>

>C) Limiting the number of CPU cycles can be a moot point with

>today's hardware capabilities. Also, techniques are available

>to increase computation speed, e.g. sparse matrix solvers in

>special kriging cases.

>

>D) Realistically speaking, 100 points is about the maximum for

>a problem involving a unique neighborhood (using all points

>instead of using a local neighborhood). Covariance matrix is

>inverted only once, and doesn't change from location to location.

>The answer to your question goes beyond finding a "limited"

Thanks again

>neighborhood to satify the stationarity assumption. Keep in mind

>that in kriging, one solves an order NxN matrix, N being the number

>of points for the neighborhood. My experience has been that 10 to

>20 points pretty well maximizes the accuracy. More points, and

>there is more numerical error associated with the solver. Solving

>a system is usually an N cubed process. (1/3 N**3 for LU

>Decomposition; 2/3 N**3 for max. Pivot Gauss elimination). A large

>N results in a very large number of calculations (divisions,

>additions, etc.), with an associated buildup of numerical error

>which can swamp the problem.

>

>I presented findings of a study of a published benchmark examining

>this problem (McCarn & Carr, 1992) that you might review to

>understand the underlying issues. Some ignore the practical lessons

>related to errors associated with trying to solve a very large

>system and still give it a large neighborhood. They treat the

>kriging code as a "black box" assuming infinite precision.

>"Stretching the envelope" here translates into adding many, many

>small errors together and eventually having a quite large error.

>

>You may or may not have access to multiple solvers or iterative

>refinement of a solver, but if you do, please compare the results

>of the solvers with varying numbers of N. If you have a VAX or an

>ALPHA, compile the code in REAL*16 and compare the results.

>Depending on the sample geometry, varying degrees of

>ill-conditioning can result, up to and including failure of the

>solver. Frequently, iterative refinement can solve most of these

>problems, but there is no "magic bullet" and every kriging problem

>is unique.

>

>McCarn, D.W. and Carr, J.R. (1992): Influence of Numerical Precision

>and Equation Solution Algorithm on Computation of Kriging Weights,

>Computers & Geosciences, Vol. 18, No. 9, pp. 1127-1167.

>

Dr Tom Charnock

National Radiological Protection Board

Didcot, Chilton, Oxon, OX11 0RQ, UK.

01235 822623

Tom.Charnock@...

--

*To post a message to the list, send it to ai-geostats@....

*As a general service to list users, please remember to post a summary

of any useful responses to your questions.

*To unsubscribe, send email to majordomo@... with no subject and

"unsubscribe ai-geostats" in the message body.

DO NOT SEND Subscribe/Unsubscribe requests to the list!