5189Re: [hackers-il] Algorithm for randomly choosing a question
- Jul 8, 2012Hello Nadav,
Thanks for your answer.
I ended up implementing something similar to your proposal.
The questions are stored in a SQLite database, so I don't mind at all
O(N) disk space consumption. I did want O(1) RAM consumption, as RAM is
to be considered as a scarce resource in smartphones.
The number of probability classes is finite (3 if to be exact). In
principle, it is possible to have a more sophisticated algorithm for
selecting the next question (with unlimited, time varying probability
classes) but I don't think that the subject matter warrants this kind of
The disk files for each probability class are being kept implicitly - as
result sets of queries for each probability class.
The actual performance of the algorithm (O(1), O(N) or whatever)
actually depends upon the DB performance, which I didn't bother to
benchmark or fine tune as a function of the number of questions.
I did not (even implicitly) implement the optimization that you
suggested for O(1) disk filee modifications.
On Sun, 2012-07-08 at 10:23 +0300, Nadav Har'El wrote:
> On Sat, Jul 07, 2012, Omer Zak wrote about "Re: [hackers-il] Algorithm for randomly choosing a question":
> > The proposed approach has the problem of memory consumption of order of
> > O(N) (where N is the number of the questions). I'd like to see an
> > algorithm whose memory consumption is closer to O(1) than to O(N).
> Omer, here is an O(1) *RAM* algorithm. Note that you still have O(N)
> disk usage, of course - there are N questions, and they need to be saved
> somewhere, and so is the history of the success for them.
> If you drop the O(1) memory requirement (which I think is an overkill for
> any type of modern computer), you can make things simpler.
> Note - I assume the number of probability classes (in your example,
> 1, 1/2 and 1/100, if I remember correctly) is constant (O(1)) and not
> Keep on disk a file for each probability class - one list of questions
> with normal probability (1), another list of questions with half
> probability, and a third list of questions with 1/100 probability.
> Each will be a list of numbers (indexing questions stored in a separate
> In *memory*, keep the number of questions in each class: n1, n2, n3,
> and the partial probability sums:
> p1 = n1*1
> p2 = n1*1 + n2*(1/2.)
> p3 = n1*1 + n2*(1/2.) + n3*(1/100.)
> Each time you want to draw a random question, pick a random number
> between 0 and p3. If it's between 0 and p1, you decided to take a
> question from class 1. If it's between p1 and p2, you decided to take
> a question from class 3. If it's between p2 and p3, then from class 3.
> You decide *which* question to take in the obvious way - e.g., if you
> got a random number p between p1 and p2, then the question number in the
> class 2 list is m=(p-p1)/(1/2). You then need to look at the m'th
> position in the on-disk list of class 2 questions, to get the actual
> index of the question.
> Finally, when the user answers you need to update your data. You need
> to remove the question from the previous class (to do this in O(1) time,
> just move the last element of the list into the newly formed hole),
> and to add it to the new class (put it last), and of course update
> n1,n2,n3 and p1,p2,p3.
> Again, if you don't mind O(N) memory usage - actually just N*4 bytes -
> then I suggest that you do keep the list of questions - just their
> numbers - in memory. Then you won't need to read and write the disk
> all the time.
> P.S. This is of course a variant on the classic shuffling (or random
> permutation) algorithm. In the shuffling algorithm, you need to draw
> random questions from the list, but after you used a question, you don't
> want to see it again (its probability becomes zero). Instead of keeping
> the used questions on the list (and getting O(N^2) time complexity for
> the whole shuffle operation), the trick is to move them to the end of
> the list, and to remember how many questions we still have (in the
> beginning of the list) to draw from.
Sent from a PC running a top secret test version of Windows 97.
My own blog is at http://www.zak.co.il/tddpirate/
My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html
- << Previous post in topic Next post in topic >>