On Sat, Jul 07, 2012, Omer Zak wrote about "Re: [hackers-il] Algorithm for randomly choosing a question":

> The proposed approach has the problem of memory consumption of order of

> O(N) (where N is the number of the questions). I'd like to see an

> algorithm whose memory consumption is closer to O(1) than to O(N).

Omer, here is an O(1) *RAM* algorithm. Note that you still have O(N)

disk usage, of course - there are N questions, and they need to be saved

somewhere, and so is the history of the success for them.

If you drop the O(1) memory requirement (which I think is an overkill for

any type of modern computer), you can make things simpler.

Note - I assume the number of probability classes (in your example,

1, 1/2 and 1/100, if I remember correctly) is constant (O(1)) and not

O(N).

Keep on disk a file for each probability class - one list of questions

with normal probability (1), another list of questions with half

probability, and a third list of questions with 1/100 probability.

Each will be a list of numbers (indexing questions stored in a separate

file).

In *memory*, keep the number of questions in each class: n1, n2, n3,

and the partial probability sums:

p1 = n1*1

p2 = n1*1 + n2*(1/2.)

p3 = n1*1 + n2*(1/2.) + n3*(1/100.)

Each time you want to draw a random question, pick a random number

between 0 and p3. If it's between 0 and p1, you decided to take a

question from class 1. If it's between p1 and p2, you decided to take

a question from class 3. If it's between p2 and p3, then from class 3.

You decide *which* question to take in the obvious way - e.g., if you

got a random number p between p1 and p2, then the question number in the

class 2 list is m=(p-p1)/(1/2). You then need to look at the m'th

position in the on-disk list of class 2 questions, to get the actual

index of the question.

Finally, when the user answers you need to update your data. You need

to remove the question from the previous class (to do this in O(1) time,

just move the last element of the list into the newly formed hole),

and to add it to the new class (put it last), and of course update

n1,n2,n3 and p1,p2,p3.

Again, if you don't mind O(N) memory usage - actually just N*4 bytes -

then I suggest that you do keep the list of questions - just their

numbers - in memory. Then you won't need to read and write the disk

all the time.

Nadav.

P.S. This is of course a variant on the classic shuffling (or random

permutation) algorithm. In the shuffling algorithm, you need to draw

random questions from the list, but after you used a question, you don't

want to see it again (its probability becomes zero). Instead of keeping

the used questions on the list (and getting O(N^2) time complexity for

the whole shuffle operation), the trick is to move them to the end of

the list, and to remember how many questions we still have (in the

beginning of the list) to draw from.

--

Nadav Har'El | Sunday, Jul 8 2012, 18 Tammuz 5772

nyh@... |-----------------------------------------

Phone +972-523-790466, ICQ 13349191 |If Barbie is so popular, why do you have

http://nadav.harel.org.il |to buy her friends?