[Fwd: Re: AI-GEOSTATS: Large sample size and normal distribution]

  • Ruben Roa Ureta
    Message 1 of 1, Aug 2, 2003
      ---------------------------- Original Message ----------------------------
      Subject: Re: AI-GEOSTATS: Large sample size and normal distribution
      From: "Ruben Roa Ureta" <rroa@...>
      Date: Sat, 2 August 2003, 5:22 pm
      To: "Chaosheng Zhang" <Chaosheng.Zhang@...>
      Cc: ai-geostat@...
      --------------------------------------------------------------------------

      > Dear list,
      >
      > I'm wondering if anyone out there has experience dealing with the
      > probability distribution of data sets with a large sample size, e.g.,
      > n > 10,000. I am studying the probability features of chemical element
      > concentrations in a USGS sediment database with a sample size of
      > around 50,000, and have found that it is virtually impossible for any
      > real data set to pass tests for normality, as the tests become too
      > powerful with increasing sample size. It is widely observed that
      > geochemical data do not follow a normal or even a lognormal
      > distribution. However, I feel that the large sample size is also
      > causing trouble.

      I presume your null hypothesis is that the data come from the given
      distribution, as is usual in goodness-of-fit tests. If so, your sample
      size will almost surely lead to rejection: with n around 50,000, even a
      departure from the hypothesized distribution too small to matter in
      practice becomes statistically significant. The well-known logical
      inconsistencies of standard hypothesis testing based on the p-value are
      magnified when n is large.
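A minimal sketch of the effect described above, using only Python's standard library. It assumes the Jarque-Bera test (one of many normality tests, chosen here only because its asymptotic p-value, the chi-square survival function with 2 degrees of freedom, is simply exp(-JB/2)) and a hypothetical, mildly skewed model Z + 0.05(Z^2 - 1) with population skewness of roughly 0.3. The function names and the simulated data are illustrative assumptions, not taken from the thread.

```python
import math
import random


def jarque_bera(xs):
    """Jarque-Bera statistic and its asymptotic p-value.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness and K the
    sample kurtosis. Under normality JB is asymptotically chi-square with
    2 df, whose survival function is exp(-x / 2).
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


def nearly_normal(n, a, rng):
    """Hypothetical model: Z + a*(Z^2 - 1), a normal variate with a small skew."""
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        out.append(z + a * (z * z - 1.0))
    return out


rng = random.Random(2003)
A = 0.05  # same mild departure from normality at every sample size

# Fraction of 200 small samples (n = 50) rejected at alpha = 0.05
small_rejections = sum(
    jarque_bera(nearly_normal(50, A, rng))[1] < 0.05 for _ in range(200)
) / 200

# One large sample, comparable in size to the USGS data set in the question
_, p_large = jarque_bera(nearly_normal(50_000, A, rng))

print(f"n = 50:     rejected in {small_rejections:.0%} of samples")
print(f"n = 50,000: p-value = {p_large:.1e}")
```

The departure from normality is identical in both cases; only n changes. At n = 50 the test rarely rejects, while at n = 50,000 the p-value is vanishingly small.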
      You have at least these options:
      1) Find an authority who says that for large sample sizes the p-value
      is less informative, e.g. Lindley and Scott (1984), New Cambridge
      Elementary Statistical Tables, Cambridge University Press; then you can
      throw away your goodness-of-fit test. But be warned that equally
      important authorities have said exactly the opposite, that the force of
      a given p-value is stronger for large sample sizes (Peto et al. 1976,
      British Journal of Cancer 34:585-612). To make matters even worse, other
      equally important authorities have said that the sample size doesn't
      matter (Cornfield 1966, American Statistician 20:18-23).
      2) Do a more reasonable analysis than the standard goodness-of-fit test.
      I suggest you plot the likelihood function under the normal and
      lognormal models and derive the probabilistic features of your data by
      direct inspection of the function. You can also compare different
      location or scale parameters using the likelihood ratio (its direct
      value, not its derived asymptotic distribution in the sample space) for
      any two well-defined hypotheses.
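A minimal sketch of option 2, assuming Python's standard library only and a simulated (hypothetical) lognormal sample standing in for the USGS concentrations. All function names are hypothetical. It compares the normal and lognormal models by their maximized log-likelihoods, prints the relative likelihood L(mu)/L(mu_hat) of the mean on a small grid in place of a plot, and reports the likelihood ratio for two fixed means as a direct value with no p-value attached. The scale parameter is handled by profiling (replacing sigma by its maximum-likelihood value for each mu); that is one choice among several for nuisance parameters and is an assumption, not necessarily what is intended above.

```python
import math
import random


def normal_loglik(xs, mu, sigma):
    """Log-likelihood of an i.i.d. normal(mu, sigma) model."""
    ss = sum((x - mu) ** 2 for x in xs)
    return -len(xs) * math.log(sigma * math.sqrt(2.0 * math.pi)) - ss / (2.0 * sigma ** 2)


def normal_mle(xs):
    """Maximum-likelihood estimates: mean and sd with divisor n."""
    n = len(xs)
    mu = sum(xs) / n
    return mu, math.sqrt(sum((x - mu) ** 2 for x in xs) / n)


def max_loglik_normal(xs):
    return normal_loglik(xs, *normal_mle(xs))


def max_loglik_lognormal(xs):
    """Maximized lognormal log-likelihood; every x must be > 0.

    X = exp(Y) with Y normal, so the density of X carries the Jacobian 1/x,
    which gives the '- sum(log x)' term."""
    logs = [math.log(x) for x in xs]
    return normal_loglik(logs, *normal_mle(logs)) - sum(logs)


def profile_loglik_mu(xs, mu):
    """Normal log-likelihood for mu, with sigma replaced by its MLE given mu."""
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return normal_loglik(xs, mu, sigma)


rng = random.Random(1)
# Hypothetical positive "concentrations": lognormal with median 20
data = [rng.lognormvariate(math.log(20.0), 0.5) for _ in range(500)]

# Likelihood ratio, lognormal vs normal, each at its own maximum
log_lr = max_loglik_lognormal(data) - max_loglik_normal(data)
print(f"log LR (lognormal : normal) = {log_lr:.1f}")

# Relative likelihood of the mean, printed on a grid instead of plotted
mu_hat, _ = normal_mle(data)
ll_max = profile_loglik_mu(data, mu_hat)
for mu in (mu_hat - 2, mu_hat - 1, mu_hat, mu_hat + 1, mu_hat + 2):
    rel = math.exp(profile_loglik_mu(data, mu) - ll_max)
    print(f"mu = {mu:6.2f}  L(mu)/L(mu_hat) = {rel:.3f}")

# Direct likelihood ratio for two fixed (hypothetical) values of the mean
mu1, mu2 = 21.0, 23.0
lr_12 = math.exp(profile_loglik_mu(data, mu1) - profile_loglik_mu(data, mu2))
print(f"L(mu = {mu1}) / L(mu = {mu2}) = {lr_12:.3g}")
```

Both families have two parameters, so comparing their maximized likelihoods puts them on an equal footing; the ratio is read directly as the strength of evidence for one model over the other.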
      Ruben



      --
      * To post a message to the list, send it to ai-geostats@...
      * As a general service to the users, please remember to post a summary of any useful responses to your questions.
      * To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
      * Support to the list is provided at http://www.ai-geostats.org