Loading ...
Sorry, an error occurred while loading the content.

5189Re: [hackers-il] Algorithm for randomly choosing a question

Expand Messages
  • Omer Zak
    Jul 8, 2012
    • 0 Attachment
      Hello Nadav,
      Thanks for your answer.

      I ended up implementing something similar to your proposal.
      The questions are stored in a SQLite database, so I don't mind at all
      O(N) disk space consumption. I did want O(1) RAM consumption, as RAM is
      to be considered as a scarce resource in smartphones.

      The number of probability classes is finite (3 if to be exact). In
      principle, it is possible to have a more sophisticated algorithm for
      selecting the next question (with unlimited, time varying probability
      classes) but I don't think that the subject matter warrants this kind of
      sophistication.

      The disk files for each probability class are being kept implicitly - as
      result sets of queries for each probability class.
      The actual performance of the algorithm (O(1), O(N) or whatever)
      actually depends upon the DB performance, which I didn't bother to
      benchmark or fine tune as a function of the number of questions.

      I did not (even implicitly) implement the optimization that you
      suggested for O(1) disk filee modifications.

      --- Omer


      On Sun, 2012-07-08 at 10:23 +0300, Nadav Har'El wrote:
      > On Sat, Jul 07, 2012, Omer Zak wrote about "Re: [hackers-il] Algorithm for randomly choosing a question":
      > > The proposed approach has the problem of memory consumption of order of
      > > O(N) (where N is the number of the questions). I'd like to see an
      > > algorithm whose memory consumption is closer to O(1) than to O(N).
      >
      > Omer, here is an O(1) *RAM* algorithm. Note that you still have O(N)
      > disk usage, of course - there are N questions, and they need to be saved
      > somewhere, and so is the history of the success for them.
      > If you drop the O(1) memory requirement (which I think is an overkill for
      > any type of modern computer), you can make things simpler.
      >
      > Note - I assume the number of probability classes (in your example,
      > 1, 1/2 and 1/100, if I remember correctly) is constant (O(1)) and not
      > O(N).
      >
      > Keep on disk a file for each probability class - one list of questions
      > with normal probability (1), another list of questions with half
      > probability, and a third list of questions with 1/100 probability.
      > Each will be a list of numbers (indexing questions stored in a separate
      > file).
      >
      > In *memory*, keep the number of questions in each class: n1, n2, n3,
      > and the partial probability sums:
      > p1 = n1*1
      > p2 = n1*1 + n2*(1/2.)
      > p3 = n1*1 + n2*(1/2.) + n3*(1/100.)
      >
      > Each time you want to draw a random question, pick a random number
      > between 0 and p3. If it's between 0 and p1, you decided to take a
      > question from class 1. If it's between p1 and p2, you decided to take
      > a question from class 3. If it's between p2 and p3, then from class 3.
      >
      > You decide *which* question to take in the obvious way - e.g., if you
      > got a random number p between p1 and p2, then the question number in the
      > class 2 list is m=(p-p1)/(1/2). You then need to look at the m'th
      > position in the on-disk list of class 2 questions, to get the actual
      > index of the question.
      >
      > Finally, when the user answers you need to update your data. You need
      > to remove the question from the previous class (to do this in O(1) time,
      > just move the last element of the list into the newly formed hole),
      > and to add it to the new class (put it last), and of course update
      > n1,n2,n3 and p1,p2,p3.
      >
      > Again, if you don't mind O(N) memory usage - actually just N*4 bytes -
      > then I suggest that you do keep the list of questions - just their
      > numbers - in memory. Then you won't need to read and write the disk
      > all the time.
      >
      > Nadav.
      >
      > P.S. This is of course a variant on the classic shuffling (or random
      > permutation) algorithm. In the shuffling algorithm, you need to draw
      > random questions from the list, but after you used a question, you don't
      > want to see it again (its probability becomes zero). Instead of keeping
      > the used questions on the list (and getting O(N^2) time complexity for
      > the whole shuffle operation), the trick is to move them to the end of
      > the list, and to remember how many questions we still have (in the
      > beginning of the list) to draw from.
      --
      Sent from a PC running a top secret test version of Windows 97.
      My own blog is at http://www.zak.co.il/tddpirate/

      My opinions, as expressed in this E-mail message, are mine alone.
      They do not represent the official policy of any organization with which
      I may be affiliated in any way.
      WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html
    • Show all 7 messages in this topic