Hello,

I have the following problem. I have a multivariate discrete data set
(n binary variables, N instances in total) and I would like to find a
reasonable estimate of the marginal frequencies of all value
combinations of k of the variables (there are 2^k such combinations,
since the variables are binary). I would like to do this by drawing a
smaller random sample from the original data set in order to reduce the
computational cost. However, I have no idea how to estimate the sample
size needed to get accurate proportions of these combinations in the
given data set.

I know how to approach this for a single random variable: estimate the
standard deviation of the variable in the sample, then use it to
estimate the sample size for the desired accuracy. What I do not know
is how to do this for multivariate frequencies, i.e. how to estimate
the size of a random sample I have to draw from a given finite sample
of N instances of n binary variables to get an accurate estimate of
p(X_i1=x_i1,...,X_ik=x_ik) for all x_ij in {0,1}.
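To make the one-variable case concrete, here is a minimal sketch of the
approach I mean. It assumes the usual normal approximation for a single
proportion plus a finite population correction for drawing from N
instances; the function name, the parameters and the default confidence
level are only illustrative, and it does not address the multivariate
question above.

```python
import math
from statistics import NormalDist


def sample_size_single_proportion(p_guess, margin, confidence=0.95, population=None):
    """Sample size for estimating ONE proportion to within +/- margin.

    Assumptions (illustrative, not part of the original question):
    - normal approximation to the sampling distribution of the proportion;
    - p_guess is a pilot estimate of the proportion (0.5 is the worst case);
    - if population (N) is given, the finite population correction is applied.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Variance of a 0/1 variable with success probability p_guess.
    variance = p_guess * (1.0 - p_guess)
    # Required size when sampling from an effectively infinite population.
    n0 = z * z * variance / (margin * margin)
    if population is not None:
        # Finite population correction for sampling without replacement.
        n0 = n0 / (1.0 + (n0 - 1.0) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    print(sample_size_single_proportion(0.5, 0.05))
    print(sample_size_single_proportion(0.5, 0.05, population=10000))
```

With p_guess = 0.5, a margin of 0.05 and 95% confidence this gives 385,
or 370 when the data set has N = 10000 instances. What I am missing is
the analogue of this calculation for all 2^k joint proportions at once.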

Thank you in advance,

Martin

