I apologize for the lateness of this posted summary to my question.

What follows is my question and the replies I had.

Thanks goes to:

From: Goovaerts@... (Goovaerts Pierre)

From: Robert B Szerbiak <szerbiak@...>

From: "J. Felipe Costa" <costa@...>>From cj_murray@... Tue Jan 14 03:23:36 1997

From: jeffrey@... (jeffrey YARUS)

Thanks in particular to Pierre, who gave me a few pages from him book due

out later this year. I recommend getting it!

My Post:

From: Andrew Gill <agill@...>

Subject: GEOSTATS: GSLIB sGs Question

Howdy All

Can someone help me out with a question.

My understanding of GSLIB's sgsimm Sequential Gaussian Simulation program

is that it produces a uniform map of values, each of which comes from a

Gaussian [Normal] distribution, and as a whole has an experimental variogram

much more similar to the sample experimental variogram than that of ordinary

kriging.

Q1: Is this true?

Q2: if so, how do they do it?

I am guessing that at each point, z_i~~N(mu,sigma). I presume the parameters

are the kriged value and kriging variance. That seems reasonable. Now how

do they impose the constraint on the variogram produced by the simulation??

Any help will be gratefully accepted [and passed off as my own].

I apologize if this is a RTFM-type question.

uoY knahT

Andrew

--

--------------------------------------------------------------------------

Dr Andrew Gill Room:203d

Department of Applied Mathematics Email: agill@...

Mathematics Building Phone: 08-3035577

The University of Adelaide Fax: 08-3033696

Adelaide, SA, 5005

--------------------------------------------------------------------------

--

The Replies:

From: Goovaerts@... (Goovaerts Pierre)

Dear Andrew,

GSLIB's sgsim program allows one to generate multiple

"maps" of values reproducing the histogram and pattern

of spatial variability inferred from the data.

The algorithm requires the use of the multiGaussian RF model.

Since most variables do not exhibit a Gaussian histogram,

the first step is to transform the original values into

"normal scores". The normal-score transform consists of identifying

each original value z_p with the corresponding p-quantile y_p

in the standard Gaussian distribution.

Note that this transformation ensures only the univariate normality

of the data, and that there is no way to assess the validity

of the MULTIGaussian assumption.

Anyway, once the data have been normal score transformed, one

computes the variogram and model it. Then, we proceed with

the sequential simulation algorithm:

1. Define a random path visiting only once each node to be simulated.

2. At each node determine the parameters (mean and variance) of the

Gaussian conditional cumulative distribution function (ccdf) using

simple kriging and the normal score semivariogram model.

The conditioning information consists of a specified number of both

normal score data and values simulated at previously visited grid nodes.

3. Draw a simulated value from that ccdf, and add it to the data set.

4. Proceed to the next node along the random path, and repeat the two previous steps.

5. Loop until all $N$ nodes are simulated.

The final step consists of back-transforming the simulated normal scores

into simulated values for the original variable, which amounts to applying the

inverse of the normal score transform to the simulated values.

In reply to your questions:

1. The back-transform ensures the reproduction of the sample histogram.

2. The reproduction of the variogram is ensured by conditioning each

ccdf to the previously simulated values. Note that the reproduction

of the variogram does not require the successive ccdf models to be Gaussian;

they can be of any type as long as their means and variances are determined by

simple kriging (A. Journel, Modeling uncertainty: some conceptual thoughts.

In R. Dimitrakopoulos, editor, "Geostatistics for the Next Century", pages 30--43.

Kluwer, Dordrecht, 1994).

Note that the theoretical background underlying most GSLIB programs will be

available in a book entitled "Geostatistics for Natural Resources Evaluation"

by P. Goovaerts; it will be published by Oxford University Press in July 1997 (ISBN: 511538-4,

cost $59.95). The book (512 pages) aims at bridging the gap between Isaaks

and Srivastava's introductory book and GSLIB's more complete user guide.

The most important geostatical concepts are illustrated using soil contamination data

that are provided with the book and were analysed using the public-domain geostatistical

software library GSLIB, which makes the book highly suitable for teaching both theory and

practice of geostatistics.

Dr. Pierre Goovaerts

Universite Catholique de Louvain, Unite BIOM

Place Croix du Sud 2 Bte 16,

B-1348 Louvain-la-Neuve, Belgium

>From szerbiak@... Mon Jan 13 11:19:18 1997

From: Robert B Szerbiak <szerbiak@...>

The way I understand the process after using it several times is this:

values are simulated on a grid sequentially chosen from a

random distribution. Specifying a search radius determines

what neighborhood of conditioning points will be used in

determining the mean and variance. The final kriged value is

found by choosing at random a value from the probability

distribution and adding the mean and variance. Proceeding

sequentially with next grid point to be kriged it will have a

denser population to be conditioned from (thus sequential Gaussian

simulation). Points found later sequentially in the process

naturally are conditioned by previous values and since the

initial kriged values are restricted to a user defined

neighborhood will tend to give the appearance of sample

statistics. Hope this helps. Most of this can be gleaned from

Isaaks, GSLIB and RC^2 (Udo Araktingi).

____________________________________________________________________________

Robert B. Szerbiak szerbiak@...

http://www.utdallas.edu/~szerbiak

The University of Texas at Dallas PHONE: (972) 883-2843

Center For Lithospheric Studies FAX (972) 883-2829

MS/ FA31, BOX 830688

Richardson, TX 75083-0688

From: "J. Felipe Costa" <costa@...>

To: Andrew Gill <agill@...>

I suggest you read either Isaaks, 1991 PhD thesis or G. Luster 1985 PhD

thesis (both from Stanford)

They discuss in details the way sgsim was derived. The initial proposal

in this topic can be found also in:

M. Johnson

Multivariate Statisitical Simulation, Wiley and Sons, 1987

Chap 3 and 4

However I prefer Isaaks style in presenting the idea.

Good luck

------------------------------------------------------------------------

J. Felipe Costa, PhD candidate Phone:

University of Queensland national: (07)3365.3473

Dept. of Earth Sciences international: (61)(7)3365-3473

Brisbane, Qld 4072 Fax: (07) 3365-1277

E-mail: costa@... home: (07) 3878.6475

------------------------------------------------------------------------

>From cj_murray@... Tue Jan 14 03:23:36 1997

Andrew,

Do you have the GSLIB manual (Deutsch and Journel, 1992, GSLIB: Geostatistical

software library and user's guide, Oxford University Press)? I think that the

manual does a decent job of explaining the sequential simulation algorithm used

in sgsim (see sections V.1.3 and V.2.3). Other explanations of Gaussian

simulation that I think are well written are contained in Ed Isaaks' doctoral

dissertation (1990, The application of Monte Carlo methods to the analysis of

spatially correlated data), Gordon Luster's doctoral dissertation (1985, Raw

materials for portland cement: Applications of conditional simulation of

coregionalization), or my doctoral dissertation (1992, Geostatistical

applications in petroleum geology and sedimentary geology). All three

dissertations were completed at Stanford University, and should be available

from University Microfilms. Hope this helps,

Chris Murray

----------------------------------------------------------------------

Chris Murray | email: cj_murray@...

Applied Geology and Geochemistry Group | phone: 509-376-5848

Battelle, Pacific NW National Lab | FAX: 509-376-5368

3110 Port of Benton Blvd., MSIN K6-81 |

P.O. Box 999, Richland, WA 99352 |

----------------------------------------------------------------------

_______________________________________________________________________________

From: jeffrey@... (jeffrey YARUS)

Hi Andrew:

Answering your question mathematically is explained pretty well in the GSLIB

book by Deustch and Journel (starting on page 123). Understanding in english

how things are actually happening is more difficult, but I'm willing to give

it a try.

First of all, you are correct that the variogram of the sGs better represents

the variogram of the experimental data (raw data) than ordinary kriging. The

reason is that kriging does not attempt to reproduce the variability inherent in

the data, only a reliable mean. If you think about this in simple way, kriging is a sophisticated "interpolation" algorithm. It produces an "average" value at

the location of grid nodes that are a function of set of neighboring control

points. Although the weighting scheme (via the variogram) is unique, the final

product is an average. Therefore, the final set of kriged values at grid

nodes will not show the highs and lows which may be present in the raw data.

You can test this by producing a variogram of a kriged surface and comparing it

to the experimental variogram from the raw data.

Simulation (sGs or any other) attempts to reproduce the variance, and in fact

does not do a very good job at reproducing a "mean" value at any one node. The

relationship between a kriged surface and a conditional simulation surface of

the same data is this: the average of an infinite number of realizations from

conditional simulation would equal the kriged result. In practice, if you

performed a number (say 50) conditional simulations and averaged the resulting

grids, you would see a something that closley resembled the kirged result.

With this in mind, let me address gaussian simulation. The steps I will address

were originally designed to explain truning band simulation (a form of gaussian

simulation) but can be applied to almost all simulation techniques for the

purpose of explaination:

1. The first step is to krige your data (put the results aside for a moment).

2. The second step is to produce a random gaussian field with same dementions

(cartisian coordinate system) as the kriged data.

3. Sample the random gaussian field at the location of the raw data.

The purpose here is to generate a random gaussian value at the location of

a control point.

NOTE: so far what you have are three files; one of the original kriged data,

one random gaussian field, and one consisting of gaussian values at the points of control.

4. Krige the gaussian values located at the location of the control points.

This gives you a fourth file consisting of a kriged surface of random

gaussian numbers generated from the sampled values created in step 2.

5. Subtract the grid generated in step 2 from the one generated in step 4.

This produces a grid with zero values at the well locations. Since

kriging is an exact estimator, the control point values in step 4 must

match those in step 2.

6. Add the result of step 5 to the original kriged grid in step 1. Voila, you

have your first realization!

If you can imagine, in step one you modeled your experimantal semi-variogram.

In step 4, you use the same variogram model as in one. This way, you

automatically build in the variogram.

Finally, sGs.

The only difference in sGs comes in the treatment of step 4. The idea here is

to increase the neighborhood of control points every time a kriged estimate is

made. The increaed neighborhood of control points comes from priviously

estimated nodes. That is, the process starts out just like before. However,

each time a node is estimated, it is treated as a "new control point," and can

therefore be used as a neighbor in subsequent estimations. Usually, the program

allows for a practical number of these "new control points" so as not to bog

down the computaitonal time.

Obviously, the actually programing is more efficient then the algorithmic-like

steps I have give. However, I find this type of explaination somewhat helpful

in undertanding the general concept of what is happening. Hopefully, this has

been of some value to you.

Let me know if I can be of any further help... Cheers.

---------------

Jeffrey YARUS GEOMATH, Inc.

phone : (713) 293-8550 ext.16 Suite 1125

fax : (713) 293-8294 200 WestLake Park Blvd.

e-mail : Jyarus@... Houston, TX 77079, USA

>From agill Wed Jan 15 08:37:53 1997

Subject: Sequential Gaussian Simulation

To: Jyarus@...

Hi Jeffrey

Thanks for the explanation of sgs. I still have a couple of questions.

In Step1 you krig the data using the variogram model you want the results

to have. However, the kriged data are smoother. I understand that.

In Step2 you generate a random Gaussian grid. Is each value [ie at each

grid location] sampled from the same univariate Gaussian distribution?

If so, does this assume each value is independent from each other? Also,

what are the parameters of the Gaussian distribution? Is it standard normal

N(0,1)?

In Step4 you krig the Gaussian data. Here you use the variogram model that

you used in Step1, right? [ie not that modelled from the sample Gaussian

data, which would be different?] Now, wouldnt the kriged Gaussian data be

smoother [as in Step1]? [ie give a smoother variogram?] Or is that OK?

In Step5 you calculate Step4-Step2, and in Step6 you add this to Step1.

I understand that Step4-Step2 gives you zero values at the data locations.

I can see that when you add this to Step1 you modify the kriged data only

at the non-data locations. I can see that this will change the variogram

produced by this altered image [adding a residual which is sort of random].

I cant see how this will produce the variogram model you want.

Step1 is too smooth, Step4 is too smooth, Step5 is ???, and Step6=Step1+Step5

is ???

My "logic" would then be:

In terms of the variograms: (CV is correct variogram)

Step1: CV-A

Step4: CV-B

Step5: CV-B-C [C is variogram of Step2]

Step6:CV-A+CV-B-C

If Step6 should equal CV, then A+B+C=CV has to hold. Is this the case?

I realize my logic is pretty rough, but can you shed some [more] light on

this?

Thanks again

Andrew

--

Date: Wed, 15 Jan 1997 11:32:18 -0600

From: jeffrey@... (jeffrey YARUS)

The Gaussian field in step 2 is a standard normal field. The important thing to

realize is that the "residual" which is added to step 1 (in step 6) maintains

the anisotropy features of the variogram and increases the variance to the

original kriged surface (step one). In step 2, the field should be random, thus

each point is independant. And, you do not necessarily apply the variogram

model from step one at this point. Remember, step 2 and 3 essentially results

in a set of random numbers. Yes, the resulting kriged surfacd is "smoothed,"

but it represents a surface that is totally independant of the original surface!

So, the smoothing of this surface will not ultimately prevent us from attaining

the final variablilty in in step 6 that we need.

Again, The purpose of step 2 is generate a set of psudo raw data values, which

will be kriged using the same variogram model from step 1. Yes, this will

result in a surface which will not yield the same variogram seen in the raw

data, but this is okay! The reason the variogram reproduced from a kriged

surface is different is that it lacks the degree of variance seen in the raw

data. The whole purpose of steps 3,4, and 5,is to generate a kind of

oriented "noise" to add to the original kriged surface. If I add this "noise"

or variance to the kriged surface, and then reproduce the variogram from the

simulated surface, my variogram will better approximate that of the raw data.

Hope this helps clairify the process. If not, PLEASE, email me again and I will

attempt another response.

---------------

Jeffrey YARUS GEOMATH, Inc.

phone : (713) 293-8550 ext.16 Suite 1125

fax : (713) 293-8294 200 WestLake Park Blvd.

e-mail : Jyarus@... Houston, TX 77079, USA

Date: Thu, 16 Jan 1997 11:04:47 +0100

From: Goovaerts@... (Goovaerts Pierre)

Dear Andrew,

The use of the kriging variance as a measure of the error calls for two stringent

hypotheses (which are rarely met in practice):

- the estimation error is modeled as a realization of a Gaussian RV

which implies the symmetry of the local distribution of error.

- the error variance is independent of the data value (homoscedasticity).

Under the multiGaussian RF model, one shows that the kriging variance is a

measure of estimation accuracy. It is the reason why the simple kriging

variance is used as the variance of the conditional distribution in SGSSIM.

However, common practice has shown that the kriging variance is usually

non correlated with the actual estimation error. It is but a fact which may

be related to the violation of the multiGaussian assumption.

Dr. Pierre Goovaerts

Universite Catholique de Louvain, Unite BIOM

Place Croix du Sud 2 Bte 16,

B-1348 Louvain-la-Neuve, Belgium

--

*To post a message to the list, send it to ai-geostats@....

*As a general service to list users, please remember to post a summary

of any useful responses to your questions.