
AI-GEOSTATS: Mix distribution - summary

  • Monica Palaseanu-Lovejoy
    Message 1 of 2 , Mar 17, 2004
      Dear list,
      Here is my experience with both RMIX (Peter Macdonald) and
      Toolkit (Isobel Clark) regarding mixture distributions in data.

      Both tools are fit for purpose, but each has pluses and minuses
      depending on what you are looking for.

      Toolkit is very easy to use, and it takes probably 20 minutes to
      figure it out and produce a first decent model of your data. You
      can use raw data or log data, and can fit up to 4 normal or
      log-normal distributions to the data. For each distribution it
      gives the mean, the standard deviation, and the percentage that
      the distribution contributes to the modelled data, as well as the
      degrees of freedom and chi-square of the model. Isobel told me
      that the chi-square is very sensitive to small deviations in the
      tail of a distribution. Because of this, her tool groups any
      intervals which have an 'expected' frequency of less than 4
      samples. This leaves fewer degrees of freedom but is more robust
      to the odd extreme value in the tails. I think in a way this
      “forces” the distribution to be more “normal”.
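
      The grouping of sparse intervals described above can be sketched
      roughly as follows. This is only my illustration of the general
      idea, not Toolkit's actual algorithm: the function names, the
      left-to-right pooling rule, and the counts are all made up for
      the example.

```python
def pool_sparse_bins(observed, expected, min_expected=4.0):
    """Merge adjacent bins until every pooled bin has expected >= min_expected.

    Hypothetical helper: pools greedily from left to right and folds any
    sparse leftover at the right end into the last pooled bin.
    """
    pooled_obs, pooled_exp = [], []
    acc_obs, acc_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs, acc_exp = 0.0, 0.0
    if acc_exp > 0:  # sparse leftover in the right tail
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over the bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up histogram counts and model frequencies, for illustration only.
observed = [1, 0, 3, 9, 14, 10, 4, 0, 2]
expected = [0.5, 1.5, 4.0, 9.0, 13.0, 9.0, 4.0, 1.5, 0.5]

pooled_obs, pooled_exp = pool_sparse_bins(observed, expected)
print("bins before/after pooling:", len(observed), len(pooled_obs))
print("chi-square before pooling:", chi_square(observed, expected))
print("chi-square after pooling: ", chi_square(pooled_obs, pooled_exp))
```

      In this toy case most of the unpooled chi-square comes from the
      two sparse tail bins, which is the sensitivity Isobel described.
      Pooling removes it at the cost of fewer bins, and therefore fewer
      degrees of freedom.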

      With RMIX you can fit any number of distributions, and it is not
      limited to normals and log-normals: it can also use binomial,
      negative binomial, gamma, Weibull and Poisson distributions. The
      learning curve is very steep, as with any R package, and if you
      are a beginner in R, like I am, it takes several hours to figure
      it out (if it is your first experience with R it can take several
      days ....). Also, it seems that the final result in R is sometimes
      sensitive to the first modelling attempt. The graphical output of
      RMIX is clearer than that of Toolkit, and it can plot the
      frequency errors between the empirical data (empirical histogram)
      and the mixture distributions. As output, for each distribution
      you get the mean, std.dev. and percentage, and of course the
      degrees of freedom, chi-square and p-value for the model.

      To define the best fit, Toolkit uses non-linear least squares,
      and I suppose RMIX uses maximum likelihood.
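
      To make the difference between the two criteria concrete, here is
      a minimal sketch of both objective functions for a mixture of
      normals. It is not the code of either tool; all function names
      are my own, and the sample values and histogram are hypothetical.
      Only the component parameters are taken from the first set of
      results reported in this message.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_pdf(x, components):
    # components: list of (weight, mean, sd), with weights summing to 1
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


def mixture_cdf(x, components):
    return sum(w * normal_cdf(x, m, s) for w, m, s in components)


def negative_log_likelihood(data, components):
    """Maximum likelihood minimises this over the component parameters."""
    return -sum(math.log(mixture_pdf(x, components)) for x in data)


def expected_counts(edges, n, components):
    """Model frequency in each histogram interval [edges[i], edges[i+1])."""
    return [n * (mixture_cdf(hi, components) - mixture_cdf(lo, components))
            for lo, hi in zip(edges[:-1], edges[1:])]


def least_squares_objective(observed, edges, components):
    """Least squares minimises the squared histogram-versus-model errors."""
    expected = expected_counts(edges, sum(observed), components)
    return sum((o - e) ** 2 for o, e in zip(observed, expected))


# First pair of fitted components reported in this message (log scale);
# the weights are the reported percentages divided by 100.
components = [(0.400037, 4.1504, 0.2212), (0.599963, 4.3481, 0.8655)]

# Hypothetical log(Zn) values and histogram, NOT the real data.
sample = [3.9, 4.1, 4.2, 4.4, 5.0, 5.6]
edges = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
counts = [1, 2, 8, 5, 3, 1]

print("negative log-likelihood:", negative_log_likelihood(sample, components))
print("sum of squared errors:  ", least_squares_objective(counts, edges, components))
```

      The likelihood works on the individual values, while the least
      squares criterion works on the binned histogram, so the two can
      prefer different parameters for the same data.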

      Now a comparison of results from modelling pollution data (Zn).
      In both cases I used log data, and I fit 2 normal distributions:


          Mean     std.dev.   %
      1.  4.1504   0.2212     40.0037
      2.  4.3481   0.8655     59.9963
      Chi-square: 22.8914   DF: 20   p-value = 0.2942


          Mean     std.dev.   %
      1.  4.116    0.2515     43.58
      2.  4.237    1.0146     56.42
      Chi-square: 4.6009   DF: 6   p-value = 0.5959
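
      As a quick sanity check, the reported p-values can be recomputed
      from the chi-square and DF values alone. The sketch below uses
      the closed form of the chi-square upper tail that holds for even
      degrees of freedom (both 20 and 6 are even). The function name is
      my own, and this is not how either tool computes it.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with even df.

    For df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


p_first = chi_square_sf_even_df(22.8914, 20)
p_second = chi_square_sf_even_df(4.6009, 6)
print("first fit:  p =", p_first)
print("second fit: p =", p_second)
```

      Both values should come out close to the 0.2942 and 0.5959
      reported above, so the chi-square, DF and p-value lines are
      consistent with each other.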

      I hope this helps. In my case, after doing this modelling I
      realized that the best kriging procedure I can employ is
      disjunctive kriging.


      * To post a message to the list, send it to ai-geostats@...
      * As a general service to the users, please remember to post a summary of any useful responses to your questions.
      * To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
      * Support to the list is provided at http://www.ai-geostats.org