- Jul 8, 2012Hello Nadav,

Thanks for your answer.

I ended up implementing something similar to your proposal.

The questions are stored in a SQLite database, so I don't mind at all

O(N) disk space consumption. I did want O(1) RAM consumption, as RAM is

to be considered as a scarce resource in smartphones.

The number of probability classes is finite (3 if to be exact). In

principle, it is possible to have a more sophisticated algorithm for

selecting the next question (with unlimited, time varying probability

classes) but I don't think that the subject matter warrants this kind of

sophistication.

The disk files for each probability class are being kept implicitly - as

result sets of queries for each probability class.

The actual performance of the algorithm (O(1), O(N) or whatever)

actually depends upon the DB performance, which I didn't bother to

benchmark or fine tune as a function of the number of questions.

I did not (even implicitly) implement the optimization that you

suggested for O(1) disk filee modifications.

--- Omer

On Sun, 2012-07-08 at 10:23 +0300, Nadav Har'El wrote:

> On Sat, Jul 07, 2012, Omer Zak wrote about "Re: [hackers-il] Algorithm for randomly choosing a question":

> > The proposed approach has the problem of memory consumption of order of

> > O(N) (where N is the number of the questions). I'd like to see an

> > algorithm whose memory consumption is closer to O(1) than to O(N).

>

> Omer, here is an O(1) *RAM* algorithm. Note that you still have O(N)

> disk usage, of course - there are N questions, and they need to be saved

> somewhere, and so is the history of the success for them.

> If you drop the O(1) memory requirement (which I think is an overkill for

> any type of modern computer), you can make things simpler.

>

> Note - I assume the number of probability classes (in your example,

> 1, 1/2 and 1/100, if I remember correctly) is constant (O(1)) and not

> O(N).

>

> Keep on disk a file for each probability class - one list of questions

> with normal probability (1), another list of questions with half

> probability, and a third list of questions with 1/100 probability.

> Each will be a list of numbers (indexing questions stored in a separate

> file).

>

> In *memory*, keep the number of questions in each class: n1, n2, n3,

> and the partial probability sums:

> p1 = n1*1

> p2 = n1*1 + n2*(1/2.)

> p3 = n1*1 + n2*(1/2.) + n3*(1/100.)

>

> Each time you want to draw a random question, pick a random number

> between 0 and p3. If it's between 0 and p1, you decided to take a

> question from class 1. If it's between p1 and p2, you decided to take

> a question from class 3. If it's between p2 and p3, then from class 3.

>

> You decide *which* question to take in the obvious way - e.g., if you

> got a random number p between p1 and p2, then the question number in the

> class 2 list is m=(p-p1)/(1/2). You then need to look at the m'th

> position in the on-disk list of class 2 questions, to get the actual

> index of the question.

>

> Finally, when the user answers you need to update your data. You need

> to remove the question from the previous class (to do this in O(1) time,

> just move the last element of the list into the newly formed hole),

> and to add it to the new class (put it last), and of course update

> n1,n2,n3 and p1,p2,p3.

>

> Again, if you don't mind O(N) memory usage - actually just N*4 bytes -

> then I suggest that you do keep the list of questions - just their

> numbers - in memory. Then you won't need to read and write the disk

> all the time.

>

> Nadav.

>

> P.S. This is of course a variant on the classic shuffling (or random

> permutation) algorithm. In the shuffling algorithm, you need to draw

> random questions from the list, but after you used a question, you don't

> want to see it again (its probability becomes zero). Instead of keeping

> the used questions on the list (and getting O(N^2) time complexity for

> the whole shuffle operation), the trick is to move them to the end of

> the list, and to remember how many questions we still have (in the

> beginning of the list) to draw from.

--

Sent from a PC running a top secret test version of Windows 97.

My own blog is at http://www.zak.co.il/tddpirate/

My opinions, as expressed in this E-mail message, are mine alone.

They do not represent the official policy of any organization with which

I may be affiliated in any way.

WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html - << Previous post in topic Next post in topic >>