
RE: [ai-geostats] F and T-test for samples drawn from the same p

  • Pierre Goovaerts
    Dec 5, 2004
      Hello,

      I am currently principal investigator on a major NIH grant
      that aims to develop software for hypothesis testing
      using alternative hypotheses, specified by the user, that
      differ from the omnibus "spatial independence" hypothesis;
      we call them "spatial neutral models".
      For example, you can test for clusters of cancer rates
      "above and beyond" a regional background in exposure.
      The p-values are computed by randomization: I applied
      geostatistical simulation to generate multiple realizations
      that are then used to derive the empirical distribution of
      the test statistic.
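The randomization scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software from the grant: a stationary Gaussian AR(1) series (`ar1_realization`, hypothetical) stands in for geostatistical simulation under a neutral model, and the hypothetical statistic `half_mean_difference` is the absolute difference between the means of the two halves of a transect. The p-value is the usual Monte Carlo estimate (1 + number of simulated statistics at least as large as the observed one) / (number of simulations + 1).

```python
import random
import statistics


def ar1_realization(rng, n=200, rho=0.9):
    """Hypothetical neutral model: a stationary Gaussian AR(1) series with unit
    variance and lag-1 correlation rho, standing in for a geostatistical simulation."""
    x = [rng.gauss(0.0, 1.0)]
    innovation_sd = (1.0 - rho * rho) ** 0.5  # keeps the marginal variance at 1
    for _ in range(n - 1):
        x.append(rho * x[-1] + innovation_sd * rng.gauss(0.0, 1.0))
    return x


def half_mean_difference(x):
    """Hypothetical test statistic: |mean of first half - mean of second half|."""
    half = len(x) // 2
    return abs(statistics.fmean(x[:half]) - statistics.fmean(x[half:]))


def monte_carlo_p_value(observed, simulate, statistic, n_sim=999, seed=0):
    """Upper-tail randomization p-value: (1 + #{T_sim >= T_obs}) / (n_sim + 1),
    where each T_sim is computed on one realization of the neutral model."""
    rng = random.Random(seed)
    t_obs = statistic(observed)
    exceed = sum(1 for _ in range(n_sim) if statistic(simulate(rng)) >= t_obs)
    return (1 + exceed) / (n_sim + 1)


# Usage: an observed transect with a clear jump halfway along.
observed = [0.0] * 100 + [5.0] * 100
p = monte_carlo_p_value(observed, ar1_realization, half_mean_difference)
```

Swapping `ar1_realization` for a different simulator changes the neutral model without touching the p-value logic, which is what makes it possible to test against a null hypothesis other than spatial independence.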

      I presented an example at the last GeoEnv conference
      and have put a PDF copy of the paper, which is currently
      in press, on my website.

      Cheers,

      Pierre

      <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

      Dr. Pierre Goovaerts
      President of PGeostat, LLC
      Chief Scientist with Biomedware Inc.
      710 Ridgemont Lane
      Ann Arbor, Michigan, 48103-1535, U.S.A.

      E-mail: goovaert@...
      Phone: (734) 668-9900
      Fax: (734) 668-7788
      http://alumni.engin.umich.edu/~goovaert/

      <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

      On Sun, 5 Dec 2004, Colin Daly wrote:

      >
      >
      > Hi
      >
      > Sorry to repeat myself, but the samples are not independent. Independence is a fundamental assumption of these types of tests, and you cannot interpret the tests if this assumption is violated. Where spatial correlation exists, the true standard error of the mean is nowhere near as small as the s/sqrt(n) that Chaosheng discusses, because the sqrt(n) factor relies on independence.
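To make the point about the standard error concrete, here is a minimal sketch assuming an AR(1)-type correlation between successive, equally spaced samples, rho(k) = rho**k (an assumption chosen for illustration; real spatial data would use a fitted covariance or variogram). For n stationary samples the variance of the sample mean is (sigma^2 / n) * [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho(k)], so correlation acts like replacing n by a smaller effective sample size. The function names are illustrative.

```python
import math


def effective_sample_size(n, rho):
    """Effective number of independent samples for n equally spaced values whose
    correlation at lag k is assumed to be rho**k (AR(1)-type)."""
    inflation = 1.0 + 2.0 * sum((1.0 - k / n) * rho ** k for k in range(1, n))
    return n / inflation


def standard_error_of_mean(sigma, n, rho):
    """Standard error of the sample mean under the same correlation model.
    With rho = 0 this reduces to the familiar sigma / sqrt(n)."""
    return sigma / math.sqrt(effective_sample_size(n, rho))


# 1000 samples with lag-1 correlation 0.9 carry roughly the information of about
# 53 independent ones, so the naive sigma / sqrt(1000) understates the error ~4x.
naive = standard_error_of_mean(1.0, 1000, 0.0)
correlated = standard_error_of_mean(1.0, 1000, 0.9)
```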
      >
      > Again, as I said before, if the data have any kind of trend, it is completely meaningless to try to use these tests; and with no trend but some 'ordinary' correlation, you must find a way of taking the data redundancy into account or risk getting hopelessly pessimistic results (in the sense of rejecting the null hypothesis of equal means far too often).
      >
      > Consider a trivial example: a one-dimensional random function that takes constant values over intervals of length one. It takes the value a_0 on the interval [0,1[, the value a_1 on [1,2[, and so on (suppose, for example, that each a_n is drawn at random from a Gaussian distribution with the same mean and variance). Now suppose you are given samples on the interval [0,2]. You notice what looks like a jump between [0,1[ and [1,2[, so you test for a difference in the means. If you apply an F test you will easily find that the means differ (and more convincingly the more samples you have drawn!). However, by construction of the random function, the mean is not different. We have been lulled into the false conclusion of differing means by assuming that all our data are independent.
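Colin's thought experiment is easy to check by simulation. The setup below is a hypothetical sketch, not from the thread: m samples are drawn in each unit interval, each equal to that interval's value a_n plus a small independent measurement error (`noise_sd`, added only so the within-group variance is not exactly zero), and a pooled two-sample t test (equivalent to the two-group F test, since F = t^2) is applied at a nominal 5% level using the large-sample critical value 1.96. For comparison, the same test is run on fully independent samples.

```python
import math
import random
import statistics


def two_sample_t(x, y):
    """Pooled (equal-variance) two-sample t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def rejection_rate(m=50, noise_sd=0.1, trials=1000, independent=False, seed=1):
    """Fraction of trials with |t| > 1.96, although both groups share the same mean."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        if independent:
            # Every sample is an independent draw: the test's assumptions hold.
            x = [rng.gauss(0.0, 1.0) for _ in range(m)]
            y = [rng.gauss(0.0, 1.0) for _ in range(m)]
        else:
            # Colin's random function: one Gaussian value a_0 on [0,1[ and one a_1
            # on [1,2[, both with mean 0, so the true means are equal by construction.
            a0, a1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            x = [a0 + rng.gauss(0.0, noise_sd) for _ in range(m)]
            y = [a1 + rng.gauss(0.0, noise_sd) for _ in range(m)]
        if abs(two_sample_t(x, y)) > 1.96:
            rejections += 1
    return rejections / trials


correlated = rejection_rate()
independent = rejection_rate(independent=True)
```

With these settings the correlated case rejects the (true) null hypothesis in the vast majority of trials, and increasing `m` pushes the rate higher still, which matches "more convincingly the more samples you have drawn"; the independent case stays near the nominal 5%.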
      >
      > Regards
      >
      > Colin Daly
      >
      >
      > -----Original Message-----
      > From: Chaosheng Zhang [mailto:Chaosheng.Zhang@...]
      > Sent: Sun 12/5/2004 11:42 AM
      > To: ai-geostats@...
      > Cc: Colin Badenhorst; Isobel Clark; Donald E. Myers
      > Subject: Re: [ai-geostats] F and T-test for samples drawn from the same p
      > Dear all,
      >
      > I'm wondering whether sample size (the number of samples, n) is playing a role here.
      >
      > Since Colin is using Excel to analyse several thousand samples, I have checked the t-test functions in Excel. In the Data Analysis Tools help, a formula is provided for the "t-Test: Two-Sample Assuming Unequal Variances" analysis. This formula is the same as those in many textbooks (there are other forms of it). Unfortunately, I cannot find the formula for "assuming equal variances" in Excel, but I assume it is similar and should be the same as those in some textbooks.
      >
      > From the formula, you can see that when the sample size is large, even a small difference in means gives a large t value. When the sample size is large enough, even slight differences between the means of the two data sets (x bar and y bar) can be detected, and this will result in rejection of the null hypothesis. This is in fact quite reasonable. When the sample size is large, you are confident in the mean values (Central Limit Theorem), which have a very small standard error (s/sqrt(n)). Therefore, you are confident in detecting differences between the two data sets. Even when there is only a slight difference, you can still say: yes, they are "significantly" different.
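The scaling described above can be seen directly in the unequal-variance (Welch) form of the two-sample t statistic, t = (xbar - ybar) / sqrt(s_x^2/n_x + s_y^2/n_y), assumed here to be the textbook form behind Excel's "t-Test: Two-Sample Assuming Unequal Variances" tool. The minimal sketch below works from summary statistics; the function names are illustrative.

```python
import math


def welch_t(mean_x, var_x, n_x, mean_y, var_y, n_y):
    """Unequal-variance two-sample t statistic from summary statistics."""
    return (mean_x - mean_y) / math.sqrt(var_x / n_x + var_y / n_y)


def welch_df(var_x, n_x, var_y, n_y):
    """Welch-Satterthwaite approximate degrees of freedom."""
    a, b = var_x / n_x, var_y / n_y
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))


# The same small difference in means (0.1, with unit variances) at two sample sizes:
small = welch_t(0.1, 1.0, 100, 0.0, 1.0, 100)        # ~0.71, well below 1.96
large = welch_t(0.1, 1.0, 10_000, 0.0, 1.0, 10_000)  # ~7.07, far beyond 1.96
```

Because the denominator shrinks like 1/sqrt(n), a fixed difference in means produces a t value that grows like sqrt(n). This is also exactly where the independence assumption enters: s^2/n is the variance of a sample mean only when the samples are independent.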
      >
      > You may remember that some time ago we had a discussion on the large-sample-size problem for tests of normality. When the sample size is large enough, the result can always be predicted (for real data sets): rejection of the null hypothesis.
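The same mechanism applies to normality tests. As an illustration only (the thread does not say which normality test was discussed), the Jarque-Bera statistic JB = (n/6) * (S^2 + K^2/4), with sample skewness S and excess kurtosis K, is compared with a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exactly exp(-x/2). Holding a slight departure from normality fixed and letting n grow gives:

```python
import math


def jarque_bera(n, skewness, excess_kurtosis):
    """Jarque-Bera statistic from the sample size and sample shape measures."""
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)


def jb_p_value(jb):
    """Asymptotic p-value: the chi-square(2) survival function is exp(-x/2)."""
    return math.exp(-jb / 2.0)


# A slight skewness of 0.1 and no excess kurtosis:
p_small = jb_p_value(jarque_bera(100, 0.1, 0.0))      # JB ~ 0.17, p ~ 0.92
p_large = jb_p_value(jarque_bera(100_000, 0.1, 0.0))  # JB ~ 167, p ~ 0 (reject)
```

JB is linear in n for fixed shape measures, so any real departure from normality, however small, is eventually declared significant.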
      >
      > Cheers,
      >
      > Chaosheng
      >
      > --------------------------------------------------------------------------
      > Dr. Chaosheng Zhang
      > Lecturer in GIS
      > Department of Geography
      > National University of Ireland, Galway
      > IRELAND
      > Tel: +353-91-524411 x 2375
      > Direct Tel: +353-91-49 2375
      > Fax: +353-91-525700
      > E-mail: Chaosheng.Zhang@...
      > Web 1: www.nuigalway.ie/geography/zhang.html
      > Web 2: www.nuigalway.ie/geography/gis/index.htm
      > ----------------------------------------------------------------------------
      >
      > ----- Original Message -----
      > From: "Isobel Clark" <drisobelclark@...>
      > To: "Donald E. Myers" <myers@...>
      > Cc: "Colin Badenhorst" <CBadenhorst@...>; <ai-geostats@...>
      > Sent: Saturday, December 04, 2004 11:49 AM
      > Subject: [ai-geostats] F and T-test for samples drawn from the same p
      >
      > > Don
      > >
      > > Thank you for the extended clarification of F and t
      > > hypothesis tests. For those unfamiliar with the
      > > concept, it is worth noting that the F test for
      > > multiple means may be more familiar under the title
      > > "Analysis of variance".
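For readers meeting the F test for multiple means for the first time, here is a minimal standard-library sketch of the one-way analysis of variance F statistic (the function name is illustrative): F is the between-group mean square divided by the within-group mean square, with (k - 1, n - k) degrees of freedom for k groups and n observations in total. Like the t test, it treats every observation as independent.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = statistics.fmean([v for g in groups for v in g])
    group_means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means))
    return (ss_between / (k - 1)) / (ss_within / (n - k)), k - 1, n - k


f, df1, df2 = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# f == 27.0 with (2, 6) degrees of freedom
```

With two groups, F equals the square of the pooled two-sample t statistic, which is why the two-group F test and the t test for equal means always reach the same conclusion.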
      > >
      > > My own brief answer was in the context of Colin's
      > > question, where it was quite clear that he was talking
      > > about the simplest F variance-ratio and t comparison-of-
      > > means tests.
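The two simpler tests mentioned here can be sketched as follows (function names are illustrative). The variance-ratio F statistic is s_x^2 / s_y^2 with (n_x - 1, n_y - 1) degrees of freedom, often arranged with the larger variance in the numerator; the equal-variance t statistic compares the means using the pooled variance. Critical values would still come from F and t tables or a statistics library.

```python
import math
import statistics


def variance_ratio_f(x, y):
    """F = s_x^2 / s_y^2 with (n_x - 1, n_y - 1) degrees of freedom."""
    return statistics.variance(x) / statistics.variance(y), len(x) - 1, len(y) - 1


def pooled_t(x, y):
    """Equal-variance two-sample t statistic with n_x + n_y - 2 degrees of freedom."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


f_stat, df_num, df_den = variance_ratio_f([2, 4, 6], [1, 2, 3])  # 4.0 with (2, 2) df
t_stat, df_t = pooled_t([1, 2, 3], [4, 5, 6])                   # t**2 == 13.5 with 4 df
```

Both statistics inherit the independence assumption discussed elsewhere in this thread: the pooled variance divided by n describes the spread of a sample mean only for independent samples.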
      > >
      > > Isobel
      >
      > --------------------------------------------------------------------------------
      >
      > > * By using the ai-geostats mailing list you agree to follow its rules
      > > ( see http://www.ai-geostats.org/help_ai-geostats.htm )
      > >
      > > * To unsubscribe from ai-geostats, send the following in the subject or in the body (plain text format) of an email message to sympa@...
      > >
      > > Signoff ai-geostats
      >
      > DISCLAIMER:
      > This message contains information that may be privileged or confidential and is the property of the Roxar Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorised to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.