Thanks for your reply.

Yes, you are correct. The frequency with which a field of size x appears

in the discovered data is the product of f(x), its natural abundance in

the parent population, and g(x), the probability of discovery as a

function of its size. And yes, the larger the field, the greater the

probability that it is found.

I looked at describing the probability that a field will be discovered,

conditional on its size, for my empirical dataset as a discovery sequence.

Although there is a first-order trend of decreasing size with increasing

position in the discovery sequence, there were lots of perturbations: for

example, the 3rd largest field was discovered 23rd in the discovery

sequence. Describing the discovery sequence may therefore not be a

straightforward function. I may start off with a simplistic model to

demodulate the observed data, but may have more success with an integral,

as you mentioned for the early geometrical work.
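
To illustrate why such perturbations are not surprising even under a very simple model, here is a minimal sketch in Python (NumPy). It assumes, purely for illustration, that the next discovery is drawn from the fields not yet found with probability proportional to size raised to some exponent. The function name simulate_discovery_order, the exponent parameter, and the 25 synthetic lognormal field sizes are all hypothetical and are not taken from the real dataset.

```python
import numpy as np

def simulate_discovery_order(sizes, rng, exponent=1.0):
    """Return field indices in simulated order of discovery.

    Assumption: at each step the next discovery is drawn from the fields
    not yet found, with probability proportional to size**exponent.
    """
    weights = np.asarray(sizes, dtype=float) ** exponent
    remaining = list(range(weights.size))
    order = []
    while remaining:
        w = weights[remaining]
        pick = int(rng.choice(len(remaining), p=w / w.sum()))
        order.append(remaining.pop(pick))
    return np.array(order)

rng = np.random.default_rng(1)

# Hypothetical basin: 25 fields, lognormal sizes in arbitrary units,
# sorted so that index 0 is the largest field.
sizes = np.sort(rng.lognormal(mean=3.0, sigma=1.5, size=25))[::-1]

n_runs = 1000
# position[r, i] = place in the discovery sequence (1 = first)
# of the i-th largest field in run r.
position = np.empty((n_runs, sizes.size), dtype=int)
for r in range(n_runs):
    order = simulate_discovery_order(sizes, rng)
    position[r, order] = np.arange(1, sizes.size + 1)

mean_position = position.mean(axis=0)
late_third = np.mean(position[:, 2] > 10)

print("mean discovery position, largest to smallest field:")
print(np.round(mean_position, 1))
print("share of runs with the 3rd largest field found after 10th place:",
      late_third)
```

The mean position shows the first-order trend, while individual runs can place a large field late in the sequence. An exponent other than 1 would be one way to make discovery more or less sensitive to size; which value (if any) suits a real basin is something the analogue area would have to indicate.
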

The small dataset I have is probably not suitable for describing the

discovery sequence theoretically, so I will use an analogue area to

establish the discovery sequence function.

Kind regards

Beatrice

Hydrocarbons Group

Institute of Geological and Nuclear Sciences Limited

69 Gracefield Road, Lower Hutt, WELLINGTON

NEW ZEALAND

64 4 570 4821

b.mare-jones@...

www.gns.cri.nz

(Ted Harding) <Ted.Harding@...>

02/09/2005 20:56

Please respond to ted.harding

To: Chris Hlavka <chlavka@...>

cc: ai-geostats@..., Beatrice Mare-Jones <B.Mare-Jones@...>

Subject: Re: [ai-geostats] Pareto vs Lognormal distribution

I'm intruding into foreign territory here, since I don't have

experience in exploring for gas fields etc., (though I have had

to deal with patches of contamination, which is the same sort

of thing on a small scale). So apologies if I blunder around

and tread on toes!

Be that as it may, a point which has occurred to me in reading

this thread is that the distribution being observed is the

distribution of size conditional on being discovered, and the

probability of being discovered may be expected to increase

with size.

So the frequency f(x) of occurrence of a size x in nature is

attenuated by a factor equal to the probability that an item

of size x will be discovered -- g(x) say. The frequency of x

in the observed data is f(x | D), say, and so

f(x | D) = f(x)*g(x)

from which f(x) = f(x | D)/g(x). From this point, provided

there is a reasonably justifiable model for g(x) (to within

a constant of proportionality, e.g. simply g(x) = x), you can

"demodulate" the observed data to infer the "wild" data.
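
As a rough illustration of this demodulation step, the sketch below (Python with NumPy, not part of the original discussion) weights each discovered size by 1/g(x) and normalises the result, so g(x) only needs to be known up to a constant. The function name demodulate, the choice g(x) = x, and the synthetic lognormal "wild" population are assumptions made for the example only.

```python
import numpy as np

def demodulate(sizes, g, bins):
    """Estimate the "wild" size density from discovered sizes.

    Each discovered size x is weighted by 1/g(x), where g is proportional
    to the assumed probability of discovery; the weighted histogram is
    then normalised to integrate to 1 over the bins.
    """
    sizes = np.asarray(sizes, dtype=float)
    weights = 1.0 / g(sizes)
    return np.histogram(sizes, bins=bins, weights=weights, density=True)

# Synthetic check (all numbers are made up for illustration).
rng = np.random.default_rng(0)
wild = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)  # sizes "in nature"

# Discover each item with probability proportional to its size, g(x) = x.
found = wild[rng.random(wild.size) < wild / wild.max()]

bins = np.logspace(-2, 2, 21)
density, edges = demodulate(found, g=lambda x: x, bins=bins)

# The same weights give a demodulated estimate of the wild mean size.
w = 1.0 / found
wild_mean_estimate = np.sum(w * found) / np.sum(w)

print("mean of discovered sizes:  ", found.mean())
print("demodulated mean estimate: ", wild_mean_estimate)
print("true mean of wild sizes:   ", wild.mean())
```

In this synthetic setting the raw mean of the discovered sizes overstates the wild mean, while the weighted estimate should land much closer to it. With real data the result is only as good as the assumed g(x), and sizes that are essentially never discovered cannot be recovered by reweighting.
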

There has for many decades been a similar problem in classical

geometrical probability (the ancestor of spatial statistics

and morphometry), namely to infer the distribution of (e.g.)

areas of cells given the observed distribution of the sizes

of transects by lines, or of counts of sampling points intersecting

them, leading to an integral equation.

Maybe all this is old hat in the areas you are investigating,

but since it did not seem to be even implicit in the discussion

so far I thought I would bring it to the surface.

Best wishes to all,

Ted.

On 01-Sep-05 Chris Hlavka wrote:

> Beatrice - This is a vexing problem that I've tried to deal with in

> sizes of features in satellite imagery (Hlavka, C. A. and J. L.

> Dungan, 2002. Areal estimates of fragmented land cover - effects of

> pixel size and model-based corrections. International Journal of

> Remote Sensing 23(4): 711-724.) The affine (count versus continuous)

> nature of the digital imagery is at least part of the problem. I've

> used probability plots to assess type of distribution.

>

> In gas field work, there is evidence that the apparent lognormality

> of field-sizes is due to lower rates of discovery of smaller fields

> than larger fields - especially for older surveys. It has been

> noted that newer field data was closer to Pareto than older data and

> thus inferred that the actual distribution is Pareto. -- Chris

>

>

>

>>Hello list

>>

>>I am a PhD student looking at developing a statistical model to predict

>>the size-distribution of an area's oil and gas fields.

>>

>>It is clear that previous investigators prefer either a Pareto power

>>law

>>or a lognormal distribution to approximate field-size distributions.

>>

>>The data I am using does not look like it comes from a Pareto

>>distribution

>>- which I explain as being a result of undersampling - which previous

>>investigators have reported - that undersampling occurs because the

>>small

>>fields are not sampled or recorded. However, by using basin-modelling

>>software to simulate oil and gas fields (for the same basin that my

>>discovered empirical data comes from) I notice that this sample is also

>>undersampled - that is fields under a certain size are not being

>>simulated

>>- which is probably due to the resolution of my input data but what is

>>interesting is that the undersampling actually occurs throughout all

>>the

>>size ranges - including the medium to larger sizes - which I would not

>>have expected. Like the discovery dataset (n = 25) the simulated

>>dataset

>>(n = 140) looks like it is more from a lognormal distribution than a

>>Pareto distribution.

>>

>>My conclusion is that without being able to say that a Pareto is better

>>than a lognormal and vice versa it appears only logical to use both

>>distributions.

>>

>>Geologically there does not seem to be a reason why a modal size

>>(greater

>>than what is detectable by exploration methods) of fields should exist

>>-

>>which would be the case if the data was from a lognormal distribution

>>-

>>except if the distribution is highly right skewed (at the small field

>>size) and the mode is actually just below the detectable size.

>>

>>Geologically there does seem reason for fields to become so small that

>>they become entities (that trap oil and gas) - and this relationship

>>may

>>be better approximated by a Pareto.

>>

>>

>>The Pareto and lognormal forms are similar but maybe one is better to

>>approximate field sizes than the other.

>>My question is do you think a Pareto distribution better approximates

>>an

>>oil and gas size distribution than a lognormal (or vice versa) and if

>>so

>>why.

>>

>>

>>I am currently working on goodness of fit test to throw some more light

>>on

>>this - but if anyone has anything to say I'd appreciate some comments.

>>

>>Thank you,

>>

>>Kind regards

>>

>>Beatrice

>>

>>Geological and Nuclear Sciences

>>New Zealand

>>www.gns.cri.nz

>>

>>

>>* By using the ai-geostats mailing list you agree to follow its rules

>>( see http://www.ai-geostats.org/help_ai-geostats.htm )

>>

>>* To unsubscribe to ai-geostats, send the following in the subject

>>or in the body (plain text format) of an email message to

>>sympa@...

>>

>>Signoff ai-geostats

>

>

> --

> ***************************************

> Chris Hlavka

> NASA/Ames Research Center 242-4

> Moffett Field, CA 94035-1000

> (650)604-3328 FAX 604-4680

> Christine.A.Hlavka@...

> ***************************************

--------------------------------------------------------------------

E-Mail: (Ted Harding) <Ted.Harding@...>

Fax-to-email: +44 (0)870 094 0861

Date: 02-Sep-05 Time: 09:16:02

------------------------------ XFMail ------------------------------
