Re: AI-GEOSTATS: In need of some help.

  • Eric LEWIN
    Dec 20, 2000
      > I need a way to compare two small populations (very small sample sizes...
      >5 and 6... both of which lack normality). I would like to compare them
      >based on 3-5 parameters. Because of the above limitations I have given up
      >on the validity of a t-test (which assumes a normal distribution and
      >larger sample sizes). My basic question is this: are these two small
      >populations statistically different or do they belong to the same
      >population ?

      My thinking is that, on the one hand, there is no theoretical lower limit
      on the number of data, except the obvious "at least one" (or two, if some
      information on the spread is required). What changes with such very small
      samples is the value of the 2nd kind risk, where it can be calculated: it
      becomes high. Put differently, for a given 1st kind risk, the decision
      criterion becomes so "wide" that the test has very little power to
      discriminate between genuine statistical agreement, which is the question
      asked, and an unlucky coincidence in the data. For instance, put on one
      pan of the scales five or six realisations of a random variate defined
      between 0 and 1 with a bimodal distribution (modes at 0.25 and 0.75), and
      on the other pan not many more data from a normal random variate of
      mean = 0.5 and standard deviation = 0.25 (just a guess at a
      counter-example...).
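
To get a rough feel for how little power such sample sizes leave, the counter-example above can be simulated. This is only a minimal sketch under stated assumptions: the bimodal law is a hypothetical equal mixture of two triangular densities (one possible law with modes at 0.25 and 0.75), the statistic is the two-sample maximum CDF difference, and the rejection threshold is calibrated by Monte Carlo on samples drawn from a common law rather than taken from a table. All function names are illustrative.

```python
import math
import random


def ks_statistic(x, y):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    best = 0  # integer numerator of the largest gap, in units of 1/(n*m)
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        best = max(best, abs(i * m - j * n))
    return best / (n * m)


def draw_bimodal(rng):
    # Assumption: equal mixture of two triangular densities on [0, 1],
    # with modes at 0.25 and 0.75.
    if rng.random() < 0.5:
        return rng.triangular(0.0, 0.5, 0.25)
    return rng.triangular(0.5, 1.0, 0.75)


def draw_normal(rng):
    # Normal variate with mean 0.5 and standard deviation 0.25.
    return rng.gauss(0.5, 0.25)


def critical_value(n, m, alpha, trials, rng):
    """Monte Carlo threshold c such that about alpha (or less) of null D exceed c."""
    null_d = sorted(
        ks_statistic([rng.random() for _ in range(n)],
                     [rng.random() for _ in range(m)])
        for _ in range(trials)
    )
    return null_d[math.ceil((1 - alpha) * trials) - 1]


def estimate_power(n, m, alpha=0.05, trials=5000, seed=0):
    """Return (threshold, fraction of bimodal-vs-normal trials that reject)."""
    rng = random.Random(seed)
    crit = critical_value(n, m, alpha, trials, rng)
    rejections = sum(
        ks_statistic([draw_bimodal(rng) for _ in range(n)],
                     [draw_normal(rng) for _ in range(m)]) > crit
        for _ in range(trials)
    )
    return crit, rejections / trials


crit, power = estimate_power(5, 6)
print(f"reject when D > {crit:.3f}; estimated power = {power:.3f}")
```

Re-running with larger n and m shows how the estimated rejection rate changes with sample size; the figures depend on the assumed bimodal law and on the seed.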

      On the other hand, in practice, placing no restriction on the class of
      plausible probability laws implies a non-parametric test, whose decision
      intervals cannot necessarily be calculated to a known precision. More
      precisely, in the present case, the test I am thinking of for checking
      whether two samples come from the same parent distribution, with no other
      assumption, is the Kolmogorov-Smirnov test. It is based on the
      distribution of the maximum absolute difference between the two empirical
      cumulative distribution functions (CDFs), a distribution whose pdf
      expression is only known _asymptotically_ (as far as I have learned from
      statistics books; moreover, the asymptotic function is an infinite
      series, which may converge poorly numerically in some cases, but that is
      another story). By asymptotic I mean that the approximation becomes more
      and more valid as the sample size increases. However, no indication is
      given of the quality of this approximation! And in the present case, as
      the question concerns very small samples, I have found nothing on the
      validity of this criterion... So if some theoretical statistician can
      confirm, or refute and complete this, I'll be happy to learn more.
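
One way to look at the quality of the asymptotic approximation for sizes such as 5 and 6 is to compare it with a direct enumeration. Under the null hypothesis, and assuming continuous data with no ties, every assignment of the pooled ranks to the two samples is equally likely, so the null distribution of the maximum difference D can be tabulated by brute force (462 arrangements for 5 and 6). The sketch below does this and prints it next to the usual asymptotic series Q(lambda) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lambda^2), with lambda = D * sqrt(n*m/(n+m)). Function names, the 100-term truncation, and the clamping to [0, 1] are choices made for this illustration only.

```python
import math
from itertools import combinations


def exact_tail(n, m):
    """Map each attainable numerator k (D = k/(n*m)) to the exact P(D >= k/(n*m)).

    Assumes the null hypothesis and no ties, so all C(n+m, n) rank
    arrangements are equally likely.
    """
    counts = {}
    total = 0
    for first in combinations(range(n + m), n):
        members = set(first)  # pooled ranks that belong to the first sample
        i = j = 0
        best = 0
        for rank in range(n + m):
            if rank in members:
                i += 1
            else:
                j += 1
            best = max(best, abs(i * m - j * n))
        counts[best] = counts.get(best, 0) + 1
        total += 1
    tail = {}
    running = 0
    for k in sorted(counts, reverse=True):
        running += counts[k]
        tail[k] = running / total
    return tail


def asymptotic_tail(d, n, m, terms=100):
    """Truncated asymptotic series for P(D >= d), clamped to [0, 1]."""
    lam = d * math.sqrt(n * m / (n + m))
    s = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(1.0, max(0.0, s))


n, m = 5, 6
print("   D     exact   asymptotic")
for k, p_exact in sorted(exact_tail(n, m).items()):
    d = k / (n * m)
    print(f"{d:.3f}  {p_exact:.4f}  {asymptotic_tail(d, n, m):.4f}")
```

The two columns can then be compared near the usual significance levels. The alternating series is also where the poor numerical convergence shows up: for small lambda the terms decay slowly, which is why the truncated sum is clamped here.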

      A last word: in case you (I mean, anyone on the list) are interested, I
      have written a Matlab script (v.4.2) that does the job: asking for two
      sample files, plotting the samples and their two empirical CDFs,
      calculating the maximum difference, and evaluating the corresponding
      probability according to the (asymptotic) K-S law.

      --E'ric Lewin

      PS: I am not fully sure of the exact statistical English terminology (1st
      or 2nd "kind risk", etc.); if I am wrong, thanks for correcting me.


      +=[ Éric LEWIN <mailto:eric.lewin@...> Tél: (33/0)4 76 63 59 13 ]=+
      +===[ LGCA (Labo. de Géodynamique des Chaînes Alpines), Grenoble (France) ]===+



      --
      * To post a message to the list, send it to ai-geostats@...
      * As a general service to the users, please remember to post a summary of any useful responses to your questions.
      * To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
      * Support to the list is provided at http://www.ai-geostats.org