Message 1 of 1 , Nov 20, 1998
Hello,

I have the following problem. I have a multivariate discrete data set (n
binary variables, N instances in total) and I would like to find a
reasonable estimate of the marginal frequencies of all instances of k
variables (there are 2^k such instances, considering binary variables
only). I would like to do this by drawing a smaller random sample from
the original sample in order to decrease the computational requirements.
But I have no idea how to estimate the sample size needed to get
accurate proportions of these instances in the given data set. I know
how to approach this with one random variable: by estimating the
standard deviation of this variable in the sample and then using it to
estimate the sample size for the desired accuracy. However, I have no
idea how to do this with multivariate frequencies, i.e. how to estimate
the size of a random sample I have to draw from a given finite sample of
N instances of n binary variables to get an accurate estimate of
p(X_i1=x_i1,...,X_ik=x_ik) for all x_ij from {0,1} (binary variables).
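
To make the single-variable approach concrete, here is a minimal sketch
of what I have in mind. It assumes a normal approximation for one
proportion, the worst-case value p = 0.5 when nothing is known, and a
finite population correction because the sample is drawn from N
instances. The function name and its parameters are my own and purely
illustrative. The n_cells argument is only a guess at one conservative
way to cover all 2^k cells at once (a Bonferroni split of the error
probability), not an established answer to the question.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(N, margin, confidence=0.95, p=0.5, n_cells=1):
    """Sample size so that one estimated proportion is within +/- margin.

    N          -- size of the finite data set being subsampled
    margin     -- desired half-width of the confidence interval
    confidence -- overall confidence level
    p          -- assumed true proportion (0.5 is the worst case)
    n_cells    -- ASSUMPTION: number of proportions estimated together;
                  the error probability is divided by it (Bonferroni)
    """
    alpha = (1.0 - confidence) / n_cells
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal quantile
    n0 = z * z * p * (1.0 - p) / (margin * margin)  # infinite-population size
    n = n0 / (1.0 + (n0 - 1.0) / N)  # finite population correction
    return min(N, math.ceil(n))


if __name__ == "__main__":
    k = 3
    print(sample_size_for_proportion(N=100_000, margin=0.02))
    print(sample_size_for_proportion(N=100_000, margin=0.02, n_cells=2 ** k))
```

The second call only illustrates how the required size grows when the
same accuracy is demanded for all 2^k cells under that assumption.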

Please send any suggestions to my email address,
pelikan@....

Thank you in advance,

Martin

