Hello,

I have the following problem. I have a multivariate discrete data set
(n binary variables, N instances in total) and I would like to find a
reasonable estimate of the marginal frequencies of all instances of k
variables (there are 2^k such instances, since the variables are
binary). I would like to do this by drawing a smaller random sample
from the original sample in order to reduce the computational
requirements. However, I do not know how to estimate the sample size
needed to get accurate proportions of these instances in the given data
set. I know how to approach this for a single random variable: estimate
the standard deviation of the variable in the sample and then use it to
estimate the sample size for the desired accuracy. What I do not know is
how to do this for multivariate frequencies, i.e. how to estimate the
size of the random sample I have to draw from a given finite sample of N
instances of n binary variables to get an accurate estimate of
p(X_i1=x_i1,...,X_ik=x_ik) for all x_ij from {0,1}.
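
To make the question concrete, here is a minimal Python sketch of the
one-variable calculation described above, together with two conservative
ways it might be extended to all 2^k cells. These extensions are
assumptions for illustration, not an established answer to the question:
a Bonferroni split of the error probability under the normal
approximation, and a Hoeffding bound combined with a union bound (which
also holds when sampling without replacement). The function names, the
worst-case choice p = 0.5, and the num_subsets parameter are all
hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_one_proportion(margin, confidence=0.95, p=0.5, population=None):
    """One binary variable, normal approximation.

    p=0.5 is the worst case (it maximises the variance p*(1-p)).
    If a finite population size N is given, apply the usual
    finite population correction n = n0 / (1 + (n0 - 1) / N).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * p * (1 - p) / (margin * margin)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def sample_size_all_cells_bonferroni(k, margin, confidence=0.95,
                                     population=None, num_subsets=1):
    """Assumed extension: split the error probability evenly over all
    2**k cells (times num_subsets, if several k-subsets are estimated)."""
    events = num_subsets * 2 ** k
    alpha_per_event = (1 - confidence) / events
    return sample_size_one_proportion(margin, 1 - alpha_per_event, 0.5, population)


def sample_size_all_cells_hoeffding(k, margin, confidence=0.95,
                                    population=None, num_subsets=1):
    """Assumed extension: Hoeffding's inequality plus a union bound,
    n >= ln(2 * events / delta) / (2 * margin**2).
    No normality assumption, but usually more conservative."""
    events = num_subsets * 2 ** k
    delta = 1 - confidence
    n = math.ceil(math.log(2 * events / delta) / (2 * margin * margin))
    # Never ask for more instances than the data set contains.
    return n if population is None else min(n, population)


if __name__ == "__main__":
    for k in (1, 3, 5):
        print(k,
              sample_size_all_cells_bonferroni(k, 0.05, population=100_000),
              sample_size_all_cells_hoeffding(k, 0.05, population=100_000))
```

Both extensions treat every cell as if its true proportion could be 0.5,
so they are likely to overestimate the sample size when 2^k is large and
most cells are rare; whether a tighter bound exists for this setting is
exactly the open question.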

Please send any suggestions to my email address, pelikan@....

Thank you in advance,

Martin

----------------------------------------------
Martin Pelikan
Illinois Genetic Algorithms Laboratory
University of Illinois at Urbana Champaign
117 Transportation Building
104 S. Mathews Avenue, Urbana, IL 61801
tel: (217) 333-2346, fax: (217) 244-5705
----------------------------------------------

--
*To post a message to the list, send it to ai-geostats@....
*As a general service to list users, please remember to post a summary
of any useful responses to your questions.
*To unsubscribe, send email to majordomo@... with no subject and
"unsubscribe ai-geostats" in the message body.
DO NOT SEND Subscribe/Unsubscribe requests to the list!