1088GEOSTATS: Multivariate sample size estimation

  • Martin Pelikan
    Nov 20, 1998
      I have the following problem. I have a multivariate discrete data set (n
      binary variables, N instances in total), and I would like to find a
      reasonable estimate of the marginal frequencies of all instances of k
      variables (there are 2^k such instances, since the variables are
      binary). I would like to do this by drawing a smaller random sample from
      the original sample in order to reduce the computational requirements,
      but I do not know how to estimate the sample size needed to get
      accurate proportions of these instances in the given data set. I know
      how to approach this for a single random variable: estimate the
      standard deviation of the variable in the sample and then use it to
      estimate the sample size for the desired accuracy. However, I do not
      know how to do this for multivariate frequencies, i.e. how to estimate
      the size of a random sample I have to draw from a given finite sample
      of N instances of n binary variables to get an accurate estimate of
      p(X_i1=x_i1,...,X_ik=x_ik) for all x_ij in {0,1}.
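
      For concreteness, the single-variable approach mentioned above might
      look roughly like the sketch below. It assumes a normal approximation
      for one binary proportion and a finite-population correction for
      sampling without replacement from the N instances. The function name
      and its parameters (p_hat, margin, confidence) are illustrative
      choices, not part of the original problem statement, and the sketch
      does not address the multivariate case asked about here.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p_hat, margin, population_size, confidence=0.95):
    """Sample size for estimating ONE binary proportion to within +/- margin.

    Assumptions (illustrative only): normal approximation to the sampling
    distribution, simple random sampling without replacement from a finite
    data set of population_size instances.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Variance of a Bernoulli variable; p_hat = 0.5 is the worst case.
    variance = p_hat * (1 - p_hat)
    # Sample size that would be needed for an infinite population.
    n0 = z * z * variance / (margin * margin)
    # Finite-population correction for drawing from N instances.
    n = n0 / (1 + (n0 - 1) / population_size)
    # Never ask for more instances than the data set contains.
    return min(population_size, math.ceil(n))


if __name__ == "__main__":
    # Worst-case proportion, +/- 0.05 accuracy, N = 10000 instances.
    print(sample_size_for_proportion(0.5, 0.05, 10000))
```

      Using p_hat = 0.5 gives the most conservative size when no pilot
      estimate of the standard deviation is available.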

      Please send any suggestions to my email address.

      Thank you in advance,


      Martin Pelikan
      Illinois Genetic Algorithms Laboratory
      University of Illinois at Urbana-Champaign
      117 Transportation Building
      104 S. Mathews Avenue, Urbana, IL 61801
      tel: (217) 333-2346, fax: (217) 244-5705
