
Re: [hackers-il] To Hash or not to Hash [was Re: "On Lisp" now available online for download ]

  • Arik Baratz
    Hashes are application specific. A good hash is good in a specific
    Message 1 of 22 , Feb 6, 2002
    • 0 Attachment


      Hashes are application specific. A good hash is good in a specific situation -
      that is, for each function f(x) of a random variable x there is a hash function
      h(f(x)) that generates a uniform distribution.

      If you want a catch-all hash function, use a cryptographic hash like MD5. If you
      want a lean and mean function, you'd have to analyze the domain you are working
      with and think up a function that hashes it ok.

      The glibc hash is very simplistic - you can beat it in many ways.

      -- Arik

      On 06.02.2002 at 15:49:49, Shlomi Fish <shlomif@...> wrote:

      > On 5 Feb 2002, Oleg Goldshmidt wrote:
      >
      > > Omer Musaev <omerm@...> writes:
      > >
      > > > some more esoteric features entered the library, while some basic
      > > > ones ( most notably hash ) are absent.
      > >
      > > How many times have I missed that! It does come with gcc, but is not
      > > standard...
      > >
      >
      > I first thought that it was strange that STL lacked hashes. But then I
      > remembered something: hashing theory is very comprehensive and there are a
      > lot of ways to implement and tweak a hash. Points that come to mind:
      >
      > 1. Chaining vs. Open-Addressing.
      >
      > 2. The Hash function being chosen. (refer to:
      > http://burtleburtle.net/bob/hash/)
      >
      > 3. Modulo bucketing vs. multiplicative bucketing.
      >
      > 4. Re-hashing.
      >
      > 5. Storing the hash values next to the keys (to make for faster
      > comparisons and re-hashings)
      >
      > 6. Promoting or caching frequently accessed elements in the buckets.
      >
      > 7. Using something other than a linked list as a bucket - a doubly-linked
      > list, a binary tree, a vector, another hash (;-) - I know someone who did
      > that because he did not know better).
      >
      > 8. Perfect Hashing.
      >
      > 9. Universal Hashing.
      >
      > 10. Which operations: an atomic check-if-exists and if not add? An atomic
      > check-and-replace, an insert-if-not-exist, etc.
      >
      > ---
      >
      > When working on Freecell Solver, I noticed that my own hash performed
      > better than Glib's because I used better optimizations. I believe it would
      > have out-performed Glib's hash in most other cases. (Note - its API is
      > still very incomplete, because of the requirements of FCS). Had I written
      > a Gtk+/GNOME app, I probably would have been "forced" to use Glib's hash
      > because it's part of the Gtk+ architecture, which would have made the
      > application slower.
      >
      > Some languages, like Perl, force a certain implementation (and a certain
      > hash function) on their users. In Perl at least, one can program
      > primitives that behave like hashes in Perl and in C, but they are not the
      > default. Mark-Jason Dominus demonstrates that the hash function can go
      > wrong:
      >
      > http://perl.plover.com/#badhash
      >
      > So basically the dilemma is what hash implementation to have, or what
      > range of hashing options to support. I'm not sure it is possible to build
      > a hash abstraction that would support all of the hashing options and will
      > not be too bloated or unmaintainable.
      >
      > Had STL contained a hash, it would have been possible that some
      > programmers could have (and possibly rightfully) blamed their program's
      > performance on the hash. Since a hash is usually quite trivial to program,
      > it's, IMO, a good idea to sometimes use your own custom hash
      > implementation.
      >
      > Regards,
      >
      > Shlomi Fish
      >
      > There is no IGLU Cabal! They tried to save complexity and preferred to use
      > a hash over a balanced binary tree. They managed to insert all of the
      > elements in O(n) time, but then discovered it would take an extra
      > O(n*log(n)) to print them in order. This caused confusion and
      > disappointment and brought the demise of the cabal.
      >
      >
      > > > [1] http://www.stlport.org/resources/StepanovUSA.html
      > >
      > > There is a great passage there about Stepanov's early realization that
      > > "algorithms are defined on algebraic structures". In my head, it echoes
      > > the "show me your structures and the block diagram becomes irrelevant"
      > > very nicely.
      > >
      > > What does the hackers-il population think of Stepanov's criticism of
      > > OOP?
      > >
      > > "I have yet to see an interesting piece of code that comes from these
      > > OO people."
      > >
      > > "...It might be a profitable thing for all your readers to learn Java,
      > > but it has no intellectual value whatsoever."
      > >
      > > What I did like was the "Money Oriented Programming" moniker ;-)
      > >
      > > --
      > > Oleg Goldshmidt | ogoldshmidt@...
      > > "If it ain't broken, it has not got enough features yet."
      > >
      > >
      > > To unsubscribe from this group, send an email to:
      > > hackers-il-unsubscribe@egroups.com
      > >
      > >
      > >
      > > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
      > >
      > >
      >
      >
      >
      > ----------------------------------------------------------------------
      > Shlomi Fish shlomif@...
      > Home Page: http://t2.technion.ac.il/~shlomif/
      > Home E-mail: shlomif@...
      >
      > "Let's suppose you have a table with 2^n cups..."
      > "Wait a second - is n a natural number?"




      Arik Baratz
      System Engineer
      arikb@...

      Office:
      4 Hamelacha St.
      Raa’nana 43661
      ISRAEL

      Tel: +972 (9) 743-9250 ext. 214
      Fax: +972 (9) 743-9251
      Cell: +972 (52) 354 959
      eFax: +1 (978) 926-8913
      ICQ: 210 8214

      Privileged and / or confidential Information may be contained in this electronic
      mail message.

      You may not copy or deliver this message to anyone without my consent.

      If you are not the addressee indicated in this message, or you feel that this
      message is not intended for you, Please destroy this message and kindly notify
      the sender by replying to this electronic mail.

      Please advise immediately if you or your employer do not agree to the use of
      Internet email for messages of this kind.

      Opinions, conclusions and other information in this message that do not relate
      to the official business of Vidius shall be understood as neither given nor
      endorsed by it.
    • Omer Zak
      ... [... more options were snipped ...] ... On the other hand, let's not forget what I believe to be the reason for FORTH's demise. FORTH is a very elegant
      Message 2 of 22 , Feb 6, 2002
      • 0 Attachment
        On Wed, 6 Feb 2002, Shlomi Fish wrote:

        > On 5 Feb 2002, Oleg Goldshmidt wrote:
        >
        > > Omer Musaev <omerm@...> writes:
        > >
        > > > some more esoteric features entered the library, while some basic
        > > > ones ( most notably hash ) are absent.
        > >
        > > How many times have I missed that! It does come with gcc, but is not
        > > standard...
        > >
        >
        > I first thought that it was strange that STL lacked hashes. But then I
        > remembered something: hashing theory is very comprehensive and there are a
        > lot of ways to implement and tweak a hash. Points that come to mind:
        >
        > 1. Chaining vs. Open-Addressing.

        [... more options were snipped ...]

        > So basically the dilemma is what hash implementation to have, or what
        > range of hashing options to support. I'm not sure it is possible to build
        > a hash abstraction that would support all of the hashing options and will
        > not be too bloated or unmaintainable.
        >
        > Had STL contained a hash, it would have been possible that some
        > programmers could have (and possibly rightfully) blamed their program's
        > performance on the hash. Since a hash is usually quite trivial to program,
        > it's, IMO, a good idea to sometimes use your own custom hash
        > implementation.

        On the other hand, let's not forget what I believe to be the reason for
        FORTH's demise. FORTH is a very elegant language, with unorthodox ideas.
        It was invented by Chuck Moore, who has his own eccentric (and
        fresh) ideas about how one should program.

        The reason FORTH didn't take hold (at least in my own projects) was that
        it lacked standard libraries for the things which I needed. It expected
        people to reinvent the wheel (and optimize it to their project's needs)
        all the time. It didn't take to heart Pareto's Law (80% of the
        computer time/programmer time/memory requirements/bug expenses of software
        are in 20% of the code). People should optimize and design their own
        implementations of data structures only when and where they are critical
        to the software's performance. For non-critical parts of the software,
        standard libraries are good enough and should be used.

        The moral of the story for hash functions in the STL:
        The STL should have provided a standard hash implementation (like Perl does).
        But the standard implementation should (like implementations of all
        other STL data structures) have provisions for people to substitute their
        optimized algorithms when those algorithms are really needed for a
        specific application.
        --- Omer
        There is no IGLU Cabal. We'll organize the next meeting when we finish
        to prove, from First Principles, that 1 + 1 = 2.
        WARNING TO SPAMMERS: see at http://www.zak.co.il/spamwarning.html
      • Nadav Har'El
        ... When efficiency is the key, canned hash implementations are many times not good enough. They waste memory when memory is important to you, they are too
        Message 3 of 22 , Feb 6, 2002
        • 0 Attachment
          On Wed, Feb 06, 2002, Shlomi Fish wrote about "[hackers-il] To Hash or not to Hash [was Re: "On Lisp" now available online for download ]":
          > On 5 Feb 2002, Oleg Goldshmidt wrote:
          >
          > > Omer Musaev <omerm@...> writes:
          > >
          > > > some more esoteric features entered the library, while some basic
          > > > ones ( most notably hash ) are absent.
          > >
          > > How many times have I missed that! It does come with gcc, but is not
          > > standard...
          > >
          >
          > I first thought that it was strange that STL lacked hashes. But then I
          > remembered something: hashing theory is very comprehensive and there are a
          > lot of ways to implement and tweak a hash. Points that come to mind:
          >...

          When efficiency is the key, "canned" hash implementations are often
          not good enough. They waste memory when memory is important to you, they
          are too slow when speed is important, the hash function doesn't work well
          for your (weird) choice of keys, they don't have predictable run times
          (e.g., one insert can take 1000 times as long as another, because your hash
          spontaneously decided to grow and reorder itself), and a bunch of other
          problems.
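
The unpredictable insert time mentioned above can be illustrated with a simple cost model. This is only a sketch under my own assumptions: the function name insert_costs is hypothetical, the table is assumed to double its capacity whenever it is full, and "cost" counts entry writes rather than measuring real time. It does not model any particular library's growth policy.

```python
def insert_costs(n, initial_capacity=8):
    """Return the number of entry writes each of n inserts performs."""
    capacity, size, costs = initial_capacity, 0, []
    for _ in range(n):
        cost = 1                  # writing the new entry itself
        if size == capacity:      # table is full: grow and re-place every entry
            cost += size
            capacity *= 2
        size += 1
        costs.append(cost)
    return costs


costs = insert_costs(1025)
worst = max(costs)                   # the insert that triggered the last growth
average = sum(costs) / len(costs)    # stays small: growth is rare
```

Under these assumptions almost every insert costs a single write, while the one that triggers the final growth costs over a thousand. The average stays low, which is why a general-purpose table is fine for most uses and a problem only when every single operation must be fast.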

          But when absolute efficiency and suiting the exact problem isn't needed,
          a general-purpose hash is a very useful tool. Consider for example the
          hashes in Perl or Awk. I've used them countless times. They let you do
          very powerful stuff very easily. And I never cared if they are not
          the most efficient solution possible.

          So I think the STL *should* contain a hash, at least for keys with certain
          simple types (integer arrays, strings, etc.). Someone who needs very special
          implementations for very specific purposes will probably implement something
          different - but someone who just needs "something that works well" will be
          happy with the STL's hash.

          Other STL containers have tradeoffs similar to those of hashes - consider, for
          example, priority queues. The STL's implementation is based on heaps, and
          has O(log n) insertion and pop. It's easy to write a different implementation
          with an O(n) insertion of a whole sorted group of events and O(1) pop - such
          an implementation might be faster for special uses. It's possible to implement
          hash-like O(1) insert and O(1) pop if you make other types of assumptions on
          the distribution of events on the queue.
          For example, in one program I once wrote in C++, efficiency was of
          utmost importance and priority queues were required in two places -
          in one place the STL implementation was very good (better than something I
          tried to write myself), but in another place, my O(1) implementation was
          better.
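
As a rough illustration of the tradeoff described above (and not the implementation from the program mentioned), the following Python sketch contrasts a heap-based queue, built on the standard heapq module, with a special-purpose queue that accepts one batch of already sorted events and then pops in O(1). The names heap_drain and PresortedQueue are hypothetical, and the "batch arrives sorted" restriction is my assumption.

```python
import heapq
from collections import deque


def heap_drain(events):
    """General-purpose queue: O(log n) per push and per pop."""
    heap = []
    for e in events:
        heapq.heappush(heap, e)
    return [heapq.heappop(heap) for _ in range(len(heap))]


class PresortedQueue:
    """Hypothetical special-purpose queue.

    Assumes the events arrive as one batch that is already sorted, so
    loading costs O(n) for the whole batch and each pop costs O(1).
    """

    def __init__(self, sorted_events):
        self._items = deque(sorted_events)  # O(n) copy, no comparisons

    def pop(self):
        return self._items.popleft()  # O(1)

    def __len__(self):
        return len(self._items)


events = [3, 7, 7, 12, 40]  # already in priority order
queue = PresortedQueue(events)
drained = [queue.pop() for _ in range(len(queue))]
```

Both approaches hand the events back in the same order. The special-purpose one is cheaper only because it gives up the ability to accept arbitrary insertions, which is the kind of assumption a general library container cannot make.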

          In the same program I also needed a hashtable, and in that case two things
          were very important: very low memory use and predictable time (no sudden
          reorganizations of the hash), so SGI's STL hash was not suitable at
          all. I ended up writing a very optimized solution that took
          only about 6 bytes (if I remember correctly) per entry over the length of
          the entry (key+value) itself, under certain assumptions (one key assumption
          being that the hash table will never contain more than 2^16 entries).

          --
          Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat 5762
          nyh@... |-----------------------------------------
          Phone: +972-53-245868, ICQ 13349191 |I want to live forever or die in the
          http://nadav.harel.org.il |attempt.
        • Shlomi Fish
          ... Sounds like a good compromise. In any case, deriving a class from a base hash class may make the code a bit more difficult but not much, assuming the code
          Message 4 of 22 , Feb 6, 2002
          • 0 Attachment
            On Thu, 7 Feb 2002, Omer Zak wrote:

            > The moral of the story for hash functions in the STL:
            > The STL should have provided a standard hash implementation (like Perl does).
            > But the standard implementation should (like implementations of all
            > other STL data structures) have provisions for people to substitute their
            > optimized algorithms when those algorithms are really needed for a
            > specific application.

            Sounds like a good compromise. In any case, deriving a class from a base
            hash class may make the code a bit more difficult but not much, assuming
            the code makes use of hash<mytype> & instead of
            my_own_efficient_hash<mytype> & (pardon my C++ and STL ignorance in case
            it shows).

            > --- Omer
            > There is no IGLU Cabal. We'll organize the next meeting when we finish
            > to prove, from First Principles, that 1 + 1 = 2.

            Purposely taking the joke too seriously:

            I believe Russell and Whitehead managed to derive arithmetic from logic in
            "Principia Mathematica". However, I believe 2 is defined as "1 + 1", or as
            the number that follows one, even though it undoubtedly has many more
            properties.

            It's like the De Morgan Laws (not(A and B) = not(A) or not(B) and vice
            versa). One can show that they hold for every A and B and that's enough of
            a proof. I am not referring to the general case (for A1, A2, A3, A4...)
            that has to be proved by mathematical induction.
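
For the two-variable case, "showing that they hold for every A and B" amounts to checking four rows of a truth table. A minimal sketch of that exhaustive check follows; the function name de_morgan_holds is hypothetical.

```python
from itertools import product


def de_morgan_holds():
    """Check both De Morgan laws for every combination of two booleans."""
    for a, b in product([False, True], repeat=2):
        if (not (a and b)) != ((not a) or (not b)):
            return False
        if (not (a or b)) != ((not a) and (not b)):
            return False
    return True


result = de_morgan_holds()
```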

            Regards,

            Shlomi Fish

            There is no IGLU Cabal! One of its most prominent members committed suicide
            when he failed to find a proof that the axiom that "A is A, and A is not
            not-A" holds for every A.




            ----------------------------------------------------------------------
            Shlomi Fish shlomif@...
            Home Page: http://t2.technion.ac.il/~shlomif/
            Home E-mail: shlomif@...

            "Let's suppose you have a table with 2^n cups..."
            "Wait a second - is n a natural number?"
          • Shlomi Fish
            ... Actually if f(x) = C, where C is a constant, no such hash function exist or can exist. ... Have you ever tried running md5sum on a CD-ROM you just
            Message 5 of 22 , Feb 6, 2002
            • 0 Attachment
              On Wed, 6 Feb 2002, Arik Baratz wrote:

              > Hashes are application specific. A good hash is good in a specific situation -
              > that is, for each function f(x) of a random variable x there is a hash function
              > h(f(x)) that generates a uniform distribution.
              >

              Actually, if f(x) = C, where C is a constant, no such hash function exists
              or can exist.
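
To make the objection concrete: a deterministic hash maps equal inputs to equal outputs, so if every key is the same constant, every key lands in the same bucket no matter which function is used. The sketch below uses zlib.crc32 purely as a convenient deterministic hash; the helper name bucket_counts and the bucket count of 16 are my assumptions.

```python
import zlib
from collections import Counter


def bucket_counts(keys, num_buckets):
    """Count how many keys land in each bucket under a deterministic hash."""
    return Counter(zlib.crc32(k) % num_buckets for k in keys)


constant_keys = [b"C"] * 1000                          # f(x) = C for every x
varied_keys = [str(i).encode() for i in range(1000)]   # keys that actually differ

constant_spread = bucket_counts(constant_keys, 16)
varied_spread = bucket_counts(varied_keys, 16)
```

The constant keys occupy exactly one bucket, while the varied keys are spread over several.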

              > If you want a catch-all hash function, use a cryptographic hash like MD5.

              Have you ever tried running md5sum on a CD-ROM you just downloaded from
              the Internet? It takes forever to run. Usually, one will prefer to use a
              non-cryptographic hash function because it is faster and, although it can be
              tricked, usually generates equally good results. If there is a limited
              number of elements, then a Perfect Hash, which you'll have to compile from
              the set, would probably be a better choice.

              This is of course assuming the keys are fully serializable in some way.
              Sometimes, preparing a serialized key is not very straightforward.
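
For the Perfect Hash case mentioned above, here is a minimal brute-force sketch: given a fixed set of serialized keys, it searches for a seed under which no two keys share a slot. The names slot and find_perfect_seed, the key set, and the table size are all hypothetical. hashlib.blake2b is used only because it accepts a salt, which makes it easy to generate a family of functions; a real perfect-hash generator would use a much cheaper function and a smarter search.

```python
import hashlib


def slot(key, seed, table_size):
    """Map key to a slot; seed selects one member of a family of hashes."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % table_size


def find_perfect_seed(keys, table_size, max_seed=100_000):
    """Search for a seed under which all keys get distinct slots."""
    for seed in range(max_seed):
        slots = {slot(k, seed, table_size) for k in keys}
        if len(slots) == len(keys):
            return seed
    return None  # no collision-free seed found within the search limit


KEYS = [b"if", b"else", b"while", b"for", b"return"]
TABLE_SIZE = 8
SEED = find_perfect_seed(KEYS, TABLE_SIZE)
```

Once a seed is found, lookups need no collision handling at all, but the seed has to be recomputed whenever the key set changes, which is why this only suits a fixed, known set.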

              > If you
              > want a lean and mean function, you'd have to analyze the domain you are working
              > with and think up a function that hashes it ok.
              >

              That was the case with Freecell Solver, where I recently switched from MD5
              to Perl's hash function and did not notice too big a difference in the
              speed.

              However, the hash function being chosen is just one of the elements of
              constructing a good hash (albeit a very important one, because a hash is
              only as good as its hash function). Refer to my previous posts for other
              elements like chaining vs. open addressing, promoting or caching elements,
              etc. In C++, I'm not sure how well one can re-use the chain's elements when
              they are re-hashed, just pointing them to different elements, which is
              another technique I used in FCS.
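
The re-use technique can be sketched as follows. This is not the Freecell Solver code; it is a minimal chained table, with hypothetical names (Node, ChainedTable) and an assumed growth rule, that stores each key's hash value in its node. On growth the existing nodes are relinked into the new bucket array using the stored hash, so nothing is reallocated and no key is hashed again.

```python
import zlib


class Node:
    __slots__ = ("hash", "key", "value", "next")

    def __init__(self, h, key, value, nxt):
        self.hash, self.key, self.value, self.next = h, key, value, nxt


class ChainedTable:
    def __init__(self, num_buckets=4):
        self.buckets = [None] * num_buckets
        self.count = 0

    def _find(self, h, key):
        node = self.buckets[h % len(self.buckets)]
        while node is not None:
            # Compare the cheap stored hash first, the key only on a match.
            if node.hash == h and node.key == key:
                return node
            node = node.next
        return None

    def put(self, key, value):
        h = zlib.crc32(key)
        node = self._find(h, key)
        if node is not None:
            node.value = value
            return
        if self.count >= 2 * len(self.buckets):  # assumed load-factor limit
            self._grow()
        i = h % len(self.buckets)
        self.buckets[i] = Node(h, key, value, self.buckets[i])
        self.count += 1

    def get(self, key, default=None):
        node = self._find(zlib.crc32(key), key)
        return default if node is None else node.value

    def _grow(self):
        old = self.buckets
        self.buckets = [None] * (2 * len(old))
        for node in old:
            while node is not None:
                nxt = node.next
                i = node.hash % len(self.buckets)  # stored hash, no recompute
                node.next = self.buckets[i]        # relink the same node
                self.buckets[i] = node
                node = nxt


table = ChainedTable()
for n in range(100):
    table.put(str(n).encode(), n)
```

Storing the hash costs one extra word per entry and buys both faster comparisons during lookup and a cheaper re-hash.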

              What I like about ANSI C is that it does what you want when you want it,
              and if you know what you're doing there are very few side effects. It
              does allow for a higher error rate than most other languages I know,
              though.

              > The glibc hash is very simplistic - you can beat it in many ways.
              >

              s/glibc/glib/? (glibc does not contain a hash, AFAI'm aware).

              My point was that if you are working on a Gtk+/GNOME application, you are
              stuck with glib's hash whether you want it or not, because it's part
              of the Gtk+/GNOME architecture. You can code your own hash in ANSI C, but
              then you may be criticized for making the code unreadable, or deviating
              from the Standard Way of Doing Things<tm> there. And in Glib 1.2 at
              least, I believe it's impossible to define a hash with the same interface
              as Glib's hash, not to mention that the interface itself is not very
              optimized (or sensible).

              Regards,

              Shlomi Fish




              ----------------------------------------------------------------------
              Shlomi Fish shlomif@...
              Home Page: http://t2.technion.ac.il/~shlomif/
              Home E-mail: shlomif@...

              "Let's suppose you have a table with 2^n cups..."
              "Wait a second - is n a natural number?"
            • Ofir Carny
              Message 6 of 22 , Feb 7, 2002
                As far as I remember, for some applications, you can also use a random hash
                in order to avoid being tricked.

                This means a hash function which is not constant (or which depends on another
                non-constant parameter).

                Ofir


                **********************************************************************
                This email and any files transmitted were checked by
                Port Authority Enterprise for unathorized content.
                **********************************************************************
              • Nadav Har'El
                Message 7 of 22 , Feb 7, 2002
                  On Thu, Feb 07, 2002, Ofir Carny wrote about "RE: [hackers-il] Re: To Hash or not to Hash [was Re: \"On Lisp\" now available ]":
                  > As far as I remember, for some applications, you can also use a random hash
                  > in order to avoid being tricked.
                  >
                  > This means a hash function which is not constant (or depends on another not
                  > constant parameter).

                  What do you mean? If you place an entry somewhere in the table, and next
                  time you go looking for it your "random hash" lands you somewhere else, how
                  will you find that existing entry? Maybe you mean ordering entries in one
                  hash chain in a random order? But I can't see what that would get you -
                  hash chains are supposed to be short anyway.

                  --
                  Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat 5762
                  nyh@... |-----------------------------------------
                  Phone: +972-53-245868, ICQ 13349191 |The world is coming to an end ... SAVE
                  http://nadav.harel.org.il |YOUR BUFFERS!!!
                • Ofir Carny
                  Message 8 of 22 , Feb 7, 2002
                    As I said, it is only good for specific applications. Obviously, you can't
                    change a hash function without rebuilding an existing table. However, in some
                    applications it is enough to prevent a malicious attempt to 'break' your
                    function.

                    I didn't refer to the chain. However (unrelated), you can use a second
                    function to keep the chain inside the table itself, avoiding memory
                    allocation for collisions in a constant-sized table.
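
The last remark reads like open addressing: colliding entries go into other slots of the same fixed-size table, and a second function picks the probe step, so no chain nodes are ever allocated. Below is a minimal Python sketch, assuming double hashing is the scheme meant (the message does not say). The class name FixedTable and both hash functions are made up for illustration.

```python
class FixedTable:
    """Fixed-size open-addressing table; collisions are resolved by double hashing."""

    def __init__(self, size=11):
        # A prime size guarantees that every step in 1..size-1 visits all slots.
        self.size = size
        self.slots = [None] * size  # each slot is None or a (key, value) pair

    def _h1(self, key):
        # First function: picks the home slot (hypothetical choice).
        return hash(key) % self.size

    def _h2(self, key):
        # Second function: picks the probe step, never zero (hypothetical choice).
        return 1 + (hash(key) // self.size) % (self.size - 1)

    def _probe(self, key):
        # Yield each slot index exactly once, starting at the home slot.
        index, step = self._h1(key), self._h2(key)
        for _ in range(self.size):
            yield index
            index = (index + step) % self.size

    def put(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise OverflowError("table is full")

    def get(self, key, default=None):
        for i in self._probe(key):
            if self.slots[i] is None:
                return default  # an empty slot ends the probe sequence
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return default
```

Deletion is left out on purpose: with open addressing it needs tombstones or re-insertion, which would obscure the point about avoiding allocation.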
                    -----Original Message-----
                    From: Nadav Har'El [mailto:nyh@...]
                    Sent: Thursday, February 07, 2002 11:39 AM
                    To: hackers-il@yahoogroups.com
                    Subject: Re: [hackers-il] Re: To Hash or not to Hash [was Re: \"On
                    Lisp\" now available ]


                    On Thu, Feb 07, 2002, Ofir Carny wrote about "RE: [hackers-il] Re: To Hash or not to Hash [was Re: \"On Lisp\" now available ]":
                    > As far as I remember, for some applications, you can also use a random hash
                    > in order to avoid being tricked.
                    >
                    > This means a hash function which is not constant (or depends on another not
                    > constant parameter).

                    What do you mean? If you place an entry somewhere in the table, and next
                    time you go looking for it your "random hash" lands you somewhere else, how
                    will you find that existing entry? Maybe you mean ordering entries in one
                    hash chain in a random order? But I can't see what that would get you -
                    hash chains are supposed to be short anyway.

                  • Nadav Har'El
                    Message 9 of 22 , Feb 7, 2002
                      On Thu, Feb 07, 2002, Ofir Carny wrote about "RE: [hackers-il] Re: To Hash or not to Hash [was Re: \"On Lisp\" now available ]":
                      > As I said, it is only good for specific applications, obviously, you can't
                      > change a hash function without rebuilding an existing table, however in some
                      > applications it is enough to prevent a malicious attempt to 'break' your
                      > function.

                      Oh, I see - you meant choosing, once, a *hash function* at random, but then
                      use the same hash function all the time? Ok.
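
A sketch of that "pick once, then keep using it" idea in Python, assuming integer keys and the textbook family h(x) = ((a*x + b) mod p) mod m. The names make_hash and P are invented here, and nothing in the thread says this particular family is the one meant.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, larger than any key we intend to hash


def make_hash(num_buckets, rng=random):
    """Pick one function h(x) = ((a*x + b) mod P) mod num_buckets at random."""
    a = rng.randrange(1, P)  # a must be non-zero
    b = rng.randrange(0, P)

    def h(x):
        # Keys are assumed to be integers in the range 0 .. P-1.
        return ((a * x + b) % P) % num_buckets

    return h


h = make_hash(num_buckets=16)
# The same key always lands in the same bucket for the lifetime of this table...
assert h(12345) == h(12345)
# ...but another table (or another run) gets its own, independent function.
h_other = make_hash(num_buckets=16)
```

If the goal is to resist a malicious user, passing rng=random.SystemRandom() is the safer choice, since the default generator is not meant to be unpredictable.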

                      --
                      Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat 5762
                      nyh@... |-----------------------------------------
                      Phone: +972-53-245868, ICQ 13349191 |Unlike Microsoft, a restaurant will give
                      http://nadav.harel.org.il |me food for free if I find a bug in it!
                    • Shlomi Fish
                      Message 10 of 22 , Feb 7, 2002
                        On Thu, 7 Feb 2002, Nadav Har'El wrote:

                        > On Thu, Feb 07, 2002, Ofir Carny wrote about "RE: [hackers-il] Re: To Hash or not to Hash [was Re: \"On Lisp\" now available ]":
                        > > As I said, it is only good for specific applications, obviously, you can't
                        > > change a hash function without rebuilding an existing table, however in some
                        > > applications it is enough to prevent a malicious attempt to 'break' your
                        > > function.
                        >
                        > Oh, I see - you meant choosing, once, a *hash function* at random, but then
                        > use the same hash function all the time? Ok.
                        >

                        There is a methodology to construct a random hash function out of a
                        universal set of hash functions. This is called Universal Hashing, and I
                        studied it in my DS and Algorithms course. An example of it would be to
                        pick an arbitrary random string and prepend (or append) it to the data
                        before it is MD5'ed. That way, even if the user deliberately creates
                        different strings whose first 32 MD5 bits are the same, he still won't
                        be able to outsmart the hash, because the prefix will make their
                        hash values completely different.

                        Of course, letting the user know what the prefix is will render it
                        useless. So it's kind of like a "security by obscurity" methodology.
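
The random-prefix scheme can be sketched in a few lines of Python, using the standard hashlib and os modules. The names PREFIX, salted_hash32 and bucket, and the 16-byte prefix length, are arbitrary choices for this illustration.

```python
import hashlib
import os

# Chosen once per table (or per process) and kept secret; 16 bytes is an arbitrary choice.
PREFIX = os.urandom(16)


def salted_hash32(data: bytes, prefix: bytes = PREFIX) -> int:
    """Return the first 32 bits of MD5(prefix + data) as an unsigned integer."""
    digest = hashlib.md5(prefix + data).digest()
    return int.from_bytes(digest[:4], "big")


def bucket(data: bytes, num_buckets: int) -> int:
    """Map data to a bucket index using the secret-prefixed hash."""
    return salted_hash32(data) % num_buckets
```

Strictly speaking this is a keyed hash rather than a universal family with a proven collision bound, but it shows the same idea: whoever picks the keys cannot predict the buckets without knowing the prefix.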

                        Regards,




                        ----------------------------------------------------------------------
                        Shlomi Fish shlomif@...
                        Home Page: http://t2.technion.ac.il/~shlomif/
                        Home E-mail: shlomif@...

                        "Let's suppose you have a table with 2^n cups..."
                        "Wait a second - is n a natural number?"
                      • Arik Baratz
                        Message 11 of 22 , Feb 7, 2002

                          On 07.02.2002 at 09:01:02, Shlomi Fish <shlomif@...> wrote:

                          > > Hashes are application specific. A good hash is good in a specific
                          > > situation - that is, for each function f(x) of a random variable x there
                          > > is a hash function h(f(x)) that generates a uniform distribution.
                          > Actually if f(x) = C, where C is a constant, no such hash function exists
                          > or can exist.

                          Oh yeah? How about f(x) = rand(seed)? Obviously, if f(x) = C, you might want to
                          use a data structure other than a hash anyway.

                          > > If you want a catch-all hash function, use a cryptographic hash like MD5.
                          > Have you ever tried running md5sum on a CD-ROM you just downloaded from
                          > the Internet? It takes forever to run. Usually, one will prefer to use a
                          > non-cryptographic hash function because it is faster and, although it can
                          > be tricked, usually generates equally good results. If there is a limited
                          > number of elements, then a Perfect Hash, which you\'ll have to compile from
                          > the set, would probably be a better choice.
                          >
                          > This is of course assuming the keys are fully serializable in some way.
                          > Sometimes, preparing a serialized key is not very straightforward.
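
For the "Perfect Hash compiled from the set" mentioned in the quote above, here is a brute-force Python sketch. It assumes the key set is small and fixed, and simply tries seeds until every key gets a slot of its own. The function names and the seeded-MD5 construction are invented for this example.

```python
import hashlib


def slot(key: bytes, seed: int, size: int) -> int:
    """Slot of key under the given seed: a seeded MD5, reduced modulo the table size."""
    digest = hashlib.md5(seed.to_bytes(4, "big") + key).digest()
    return int.from_bytes(digest[:8], "big") % size


def build_perfect_hash(keys, max_seed=100_000):
    """Search for a seed that maps the fixed key set to distinct slots 0..n-1."""
    keys = list(keys)
    size = len(keys)
    for seed in range(max_seed):
        if len({slot(k, seed, size) for k in keys}) == size:
            return seed, size  # no two keys share a slot
    raise ValueError("no collision-free seed found; try a larger table")


keys = [b"apple", b"banana", b"cherry", b"date", b"elderberry"]
seed, size = build_perfect_hash(keys)
table = [None] * size
for k in keys:
    table[slot(k, seed, size)] = k  # every key gets a slot to itself
```

The search cost grows quickly with the number of keys, so this only illustrates the idea. Dedicated generators such as GNU gperf use much smarter constructions.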

                          Actually I have just md5sum-ed a 2GB file, so I know how slow it is... I totally
                          agree, barring special circumstances (e.g. when you need the trapdoor effect,
                          say, in issuing confirmation numbers for flights, where the confirmation
                          number must not lead back to info in the ticket).

                          > > If you want a lean and mean function, you\\\'d have to analyze the domain
                          > > you are working with and think up a function that hashes it ok.
                          > That was the case with Freecell Solver, where I recently switched from MD5
                          > to Perl\'s hash function and did not notice too big a difference in the
                          > speed.
                          >
                          > However, the hash function being chosen is just one of the elements of
                          > constructing a good hash (albeit a very important one, because a hash is
                          > only as good as its hash function). Refer to my previous posts for other
                          > elements like chaining vs. open addressing, promoting or caching elements,
                          > etc. In C++, I\'m not sure how well one can re-use the chain\'s elements
                          > when they are re-hashed, just pointing them to different elements, which
                          > is another technique I used in FCS.
                          >
                          > What I like about ANSI C is that it does what you want when you want it,
                          > and if you know what you\'re doing there are very few side-effects. It
                          > does provide for a greater error ratio than most other languages I know,
                          > though.
                          >
                          > > The glibc hash is very simplistic - you can beat it in many ways.
                          > >
                          >
                          > s/glibc/glib/? (glibc does not contain a hash, AFAI\'m aware).

                          yes

                          > My point was that if you are working on a Gtk+/GNOME application, you are
                          > stuck with the glib\'s hash whether you want it or not, because it\'s part
                          > of the Gtk+/GNOME architecture. You can code your own hash in ANSI C, but
                          > then you may be criticized for making the code unreadable, or deviating
                          > from the Standard Way of Doing Things<tm> there. And in Glib 1.2 at
                          > least, I believe it\'s impossible to define a hash with the same interface
                          > as Glib\'s hash, not to mention that the interface itself is not very
                          > optimized (or sensible).

                          Well, the SWoDT is not always the best, and you ought not always to listen to
                          what other people say about your style. Well, maybe listen, but not always
                          embrace.

                          -- Arik
                        • Arik Baratz
                          Message 12 of 22 , Feb 7, 2002


                            Perhaps all you want is to distribute a set of values in a uniform fashion.

                            -- Arik

                            On 07.02.2002 at 12:09:56, Ofir Carny <ofir@...> wrote:

                            > As I said, it is only good for specific applications, obviously, you can\'t
                            > change a hash function without rebuilding an existing table, however in some
                            > applications it is enough to prevent a malicious attempt to \'break\' your
                            > function.
                            >
                            > I didn\'t refer to the chain, however (unrelated), you can use a second
                            > function to use the table for the chain, avoiding memory allocation for
                            > collisions in a constant sized table.




                            Arik Baratz
                            System Engineer
                            arikb@...

                            Office:
                            4 Hamelacha St.
                            Raa'nana 43661
                            ISRAEL

                            Tel: +972 (9) 743-9250 ext. 214
                            Fax: +972 (9) 743-9251
                            Cell: +972 (52) 354 959
                            eFax: +1 (978) 926-8913
                            ICQ: 210 8214

                            Privileged and / or confidential Information may be contained in this electronic
                            mail message.

                            You may not copy or deliver this message to anyone without my consent.

                            If you are not the addressee indicated in this message, or you feel that this
                            message is not intended for you, Please destroy this message and kindly notify
                            the sender by replying to this electronic mail.

                            Please advise immediately if you or your employer do not agree to the use of
                            Internet email for messages of this kind.

                            Opinions, conclusions and other information in this message that do not relate
                            to the official business of Vidius shall be understood as neither given nor
                            endorsed by it.
                          • Nadav Har'El
                            Message 13 of 22 , Feb 7, 2002
                              On Thu, Feb 07, 2002, Arik Baratz wrote about "=?iso-8859-1?Q?Re: [hackers-il] Re: To Hash or not to Hash [was Re: \\\"On Lisp\\\" now av=":
                              > ailable ]?=
                              > MIME-Version: 1.0
                              > Content-Type: text/plain; charset=iso-8859-1

                              Wow, something is *REALLY* wrong with your mail program, Vidius filter,
                              or whatever... Part of the subject got chopped off into the main
                              text (as you can see in the quote above),

                              > > > want a lean and mean function, you\\\'d have to analyze the domain you are
                              > working
                              > > > with and think up a function that hashes it ok.
                              > > That was the case with Freecell Solver, where I recently switched from MD5
                              > > to Perl\'s hash function and did not notice too big a difference in the

                              and something caused all single quotes in your message (even its subject line)
                              to be backslashed, sometimes by more than one backslash. What is this - a
                              mailer written in a shell? :)

                              Weird ;)


                              --
                              Nadav Har'El | Thursday, Feb 7 2002, 26 Shevat 5762
                              nyh@... |-----------------------------------------
                              Phone: +972-53-245868, ICQ 13349191 |I before E except after C. We live in a
                              http://nadav.harel.org.il |weird society!
                            • Arik Baratz
                              Message 14 of 22 , Feb 7, 2002
                                \\\\\\"On Lisp\\\\\\\" now av=3D?=
                                MIME-Version: 1.0
                                Content-Type: text/plain; charset=iso-8859-1

                                On 07.02.2002 at 21:18:58, Nadav Har\'El <nyh@...> wrote:

                                > On Thu, Feb 07, 2002, Arik Baratz wrote about \"=?iso-8859-1?Q?Re: [hackers-il] Re: To Hash or not to Hash [was Re: \\\\\\\"On Lisp\\\\\\\" now av=\":
                                > > ailable ]?=
                                > > MIME-Version: 1.0
                                > > Content-Type: text/plain; charset=iso-8859-1
                                >
                                > Wow, something is *REALLY* wrong with your mail program, Vidius filter,
                                > or whatever... A part of the the subject got hacked off into the main
                                > text (as you can see in the quote above),

                                I\'m in the United States right now, and I\'m using a beta version of JawMail
                                (http://jawmail.sf.net). It has some flaws, I admit.
                                >
                                > > > > want a lean and mean function, you\\\\\\\'d have to analyze the domain you are
                                > > working
                                > > > > with and think up a function that hashes it ok.
                                > > > That was the case with Freecell Solver, where I recently switched from MD5
                                > > > to Perl\\\'s hash function and did not notice too big a difference in the
                                >
                                > and something caused all single quotes in your message (even its subject line)
                                > to be backslashed, sometimes by more than one backslash. What is this - a
                                > mailer written in a shell? :)
                                >
                                > Weird ;)

                                Not in shell, in PHP. It\'s somewhat buggy, but it works...


                                Arik Baratz
                                System Engineer
                                arikb@...

                              • mulix
                                Message 15 of 22 , Feb 7, 2002
                                  On Thu, 7 Feb 2002, Arik Baratz wrote:

                                  > \\\\\\"On Lisp\\\\\\\" now av=3D?=
                                  > MIME-Version: 1.0
                                  > Content-Type: text/plain; charset=iso-8859-1

                                  [snipped other such monstrosities, and then arik said]
                                  > Not in shell, in PHP. It\'s somewhat buggy, but it works...

                                  egads, whatever happened to good ol' telnet my.mail.server.com 25 and
                                  talking SMTP like reasonable human beings?

                                  ObHackersIL - nothing. things have certainly hit a low point.
                                  --
                                  mulix

                                  http://vipe.technion.ac.il/~mulix/
                                  http://syscalltrack.sf.net/