Loading ...
Sorry, an error occurred while loading the content.
 

Secondary Hash Values Considered Harmful

Expand Messages
  • Shlomi Fish
    The Freecell Solver s internal hash (which lives in fcs_hash.c and fcs_hash.h) has gone through several modifications and tweaks in its history. At one time, I
    Message 1 of 2 , Apr 23, 2009
      The Freecell Solver's internal hash (which lives in fcs_hash.c and fcs_hash.h)
      has gone through several modifications and tweaks in its history. At one time,
      I replaced Perl's hash function with the one from http://burtleburtle.net/bob/
      , but then decided to also store the two hash values (generated by the old and
      new hash functions) next to the actual item, so one can compare them. So
      naturally, I had to run the two hash functions in order to computer the two
      distinct hash values for each element that I'd like to insert.

      For some months now I wondered if the secondary hash function (= the one that
      was used only for ruling out false-duplicates, not the one used for actually
      computing the chain of the elements) was really worth it. Its advantages are a
      possible elimination of some false-duplicate states[Birthday], but its
      disadvantages are an overhead of computing the secondary hash, the memory
      occupied by the code of the secondary hash and the secondary hash values (in
      the hash-wrapper for each element), and some small overhead of the
      comparisons.

      So today I implemented a compile-time option that surgically removes the
      internal hash value/function from the resultant binaries' internal hash, and
      then I benchmarked them.

      Perhaps unsurprisingly, the results were not disappointing:

      1. While the Microsoft 32,000 previously finished on my machine (a Pentium 4
      2.4 GHz running Mandriva Linux Cooker) at 228.917690038681s , they now
      finished at 197.041167974472s (162.4 boards / second).

      2. With the FCS_FREECELL_ONLY flag, the time it took to solve the 32K was
      reduced from 194.563528776169s down to 175.43700504303s ( 182.4 boards /
      second ).

      ----------

      These are huge gains for so little effort. I'm quite surprised they were so
      substantial. It seems that a lot of time was unnecessarily spent dealing with
      the secondary hash. I have a few other hash details in mind that I'd like to
      benchmark and see which one is better.

      Meanwhile, I think I'll make the new CMake/C-preprocessor option the default.

      Regards,

      Shlomi Fish


      [Birthday] - see the Birthday paradox here:
      http://burtleburtle.net/bob/hash/birthday.html

      With a 31-bit hash value we get about one collision per 2**16 == 65536 items
      on average. With a combined 31-bit+31-bit hash values we get about one
      collision per 3037000499.97605 items on average.
      --
      -----------------------------------------------------------------
      Shlomi Fish http://www.shlomifish.org/
      Freecell Solver - http://fc-solve.berlios.de/

      God gave us two eyes and ten fingers so we will type five times as much as we
      read.
    • Shlomi Fish
      ... [snipped] ... Next I covered the optimize_for_caching run-time flag of the hash. What it does is promote entries that were found to the beginning of the
      Message 2 of 2 , Apr 25, 2009
        On Friday 24 April 2009 00:52:21 Shlomi Fish wrote:
        > The Freecell Solver's internal hash (which lives in fcs_hash.c and
        > fcs_hash.h) has gone through several modifications and tweaks in its
        > history. At one time, I replaced Perl's hash function with the one from
        > http://burtleburtle.net/bob/ , but then decided to also store the two hash
        > values (generated by the old and new hash functions) next to the actual
        > item, so one can compare them. So naturally, I had to run the two hash
        > functions in order to computer the two distinct hash values for each
        > element that I'd like to insert.
        [snipped]
        >
        > These are huge gains for so little effort. I'm quite surprised they were so
        > substantial. It seems that a lot of time was unnecessarily spent dealing
        > with the secondary hash. I have a few other hash details in mind that I'd
        > like to benchmark and see which one is better.

        Next I covered the optimize_for_caching run-time flag of the hash. What it
        does is promote entries that were found to the beginning of the chain in the
        hash table. Eliminating it (first in run-time, then in compile-time) also
        turned out to have a positive effect on the solver's speed, but this time
        relatively small. Still - we saved a second or two.

        I haven't made the no-secondary-hash option the default yet, but I plan to.

        Regards,

        Shlomi Fish

        --
        -----------------------------------------------------------------
        Shlomi Fish http://www.shlomifish.org/
        Understand what Open Source is - http://xrl.us/bjn82

        God gave us two eyes and ten fingers so we will type five times as much as we
        read.
      Your message has been successfully submitted and would be delivered to recipients shortly.