## [Fwd: Re: AI-GEOSTATS: Large sample size and normal distribution]

---------------------------- Original Message ----------------------------
Subject: Re: AI-GEOSTATS: Large sample size and normal distribution
From: "Ruben Roa Ureta" <rroa@...>
Date: Sat, August 2, 2003, 5:22 pm
To: "Chaosheng Zhang" <Chaosheng.Zhang@...>
Cc: ai-geostat@...
--------------------------------------------------------------------------

> Dear list,
>
> I'm wondering if anyone out there has experience dealing with the
> probability distribution of data sets with a large sample size, e.g.,
> n>10,000. I am studying the probability features of chemical element
> concentrations in a USGS sediment database with around 50,000 samples,
> and have found that it is virtually impossible for any real data set to
> pass tests for normality, as the tests become too powerful as the
> sample size increases. It is widely observed that geochemical data do
> not follow a normal or even a lognormal distribution. However, I feel
> that the large sample size is also causing trouble.

I presume your null hypothesis is that the data come from the given
distribution, as is usual in goodness-of-fit tests. If so, the logical
inconsistencies of the standard hypothesis test based on the p-value are
magnified under large n.
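
To illustrate how the p-value behaves as n grows, here is a minimal sketch in Python. It assumes a simple large-sample z test of zero skewness as a stand-in for the normality tests discussed (the thread does not say which tests were used), with the usual approximation that the standard error of the sample skewness under normality is about sqrt(6/n). The function name `skewness_test` and the skewness value 0.05 are hypothetical, chosen only for illustration.

```python
import math

def skewness_test(sample_skewness, n):
    """Large-sample z test of zero skewness under a normal null.

    Uses the approximation SE(skewness) ~ sqrt(6 / n), valid for large n.
    Returns (z, two-sided p-value).
    """
    z = sample_skewness / math.sqrt(6.0 / n)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p

# Hypothetical: the same small departure (skewness 0.05) seen at different n.
for n in (100, 1_000, 10_000, 50_000):
    z, p = skewness_test(0.05, n)
    print(f"n={n:>6}  z={z:6.2f}  p={p:.3g}")
```

The departure from normality is identical in every row; only n changes. The z statistic grows like sqrt(n), so a departure that is unremarkable at n=100 is rejected decisively at n=50,000.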
You have these options at least:
1) Find some authority that says that for large sample sizes the p-value
is less informative, e.g. Lindley and Scott. 1984. New Cambridge
Elementary Statistical Tables. Cambridge Univ Press; and then you can
throw away your goodness-of-fit test. But be warned that equally important
authorities have said exactly the opposite, that the force of the
p-value is stronger for large sample sizes (Peto et al. 1976. British
Journal of Cancer 34:585-612). To make matters worse, still other
equally important authorities have said that the sample size doesn't
matter (Cornfield 1966, American Statistician 20:18-23).
2) Do a more reasonable analysis than the standard goodness-of-fit test. I
suggest you plot the likelihood function under normal and lognormal models
and derive the probabilistic features of your data by direct inspection of
the function. You can also test for different location or scale parameters
using the likelihood ratio (its direct value, not its derived asymptotic
distribution in the sample space) for any two well-defined hypotheses.
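
Option 2 can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the data are simulated from a lognormal distribution as a hypothetical stand-in for the concentration values (not the USGS data), the helper names `normal_loglik`, `lognormal_loglik` and `mle` are made up for this example, and the likelihoods are evaluated numerically rather than plotted. The lognormal log-likelihood includes the Jacobian term so that both models are compared on the original data scale.

```python
import math
import random

def normal_loglik(xs, mu, sigma):
    """Log-likelihood of xs under Normal(mu, sigma)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - ss / (2 * sigma ** 2)

def lognormal_loglik(xs, mu, sigma):
    """Log-likelihood of positive xs when log(x) ~ Normal(mu, sigma)."""
    logs = [math.log(x) for x in xs]
    # Jacobian term -sum(log x) keeps the density on the original scale.
    return normal_loglik(logs, mu, sigma) - sum(logs)

def mle(xs):
    """Maximum-likelihood mean and standard deviation (divisor n)."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return m, s

random.seed(1)
# Hypothetical stand-in data: 5000 draws, log-scale mean 1.0 and sd 0.8.
data = [random.lognormvariate(1.0, 0.8) for _ in range(5000)]

# Maximized log-likelihood under each model.
mu_n, sd_n = mle(data)
mu_l, sd_l = mle([math.log(x) for x in data])
ll_normal = normal_loglik(data, mu_n, sd_n)
ll_lognormal = lognormal_loglik(data, mu_l, sd_l)
print("max log-likelihood, normal   :", round(ll_normal, 1))
print("max log-likelihood, lognormal:", round(ll_lognormal, 1))

# Direct log likelihood ratio for two fully specified hypotheses
# about the log-scale location (H1: mu=1.0 versus H2: mu=1.2).
log_lr = lognormal_loglik(data, 1.0, 0.8) - lognormal_loglik(data, 1.2, 0.8)
print("log likelihood ratio, H1 vs H2:", round(log_lr, 1))
```

The log likelihood ratio is read directly as the strength of support for one hypothesis over the other; no p-value or asymptotic reference distribution is involved. Evaluating `lognormal_loglik` over a grid of `mu` or `sigma` values gives the curve one would plot and inspect.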
Ruben

--
* To post a message to the list, send it to ai-geostats@...
* As a general service to the users, please remember to post a summary of any useful responses to your questions.
* To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
* Support to the list is provided at http://www.ai-geostats.org