Loading ...
Sorry, an error occurred while loading the content.

Re: [hackers-il] To Hash or not to Hash [was Re: "On Lisp" now available online for download ]

Expand Messages
  • Nadav Har'El
    ... When efficiency is the key, canned hash implementations are many times not good enough. They waste memory when memory is important to you, they are too
    Message 1 of 22 , Feb 6, 2002
    • 0 Attachment
      On Wed, Feb 06, 2002, Shlomi Fish wrote about "[hackers-il] To Hash or not to Hash [was Re: "On Lisp" now available online for download ]":
      > On 5 Feb 2002, Oleg Goldshmidt wrote:
      >
      > > Omer Musaev <omerm@...> writes:
      > >
      > > > some more esoteric features entered the library, while some basic
      > > > ones ( most notably hash ) are absent.
      > >
      > > How many times have I missed that! It does come with gcc, but is not
      > > standard...
      > >
      >
      > I first thought that it was strange that STL lacked hashes. But then I
      > remembered something: hashing theory is very comprehensive and there are a
      > lot of ways to implement and tweak a hash. Points that come to mind:
      >...

      When efficiency is the key, "canned" hash implementations are many times
      not good enough. They waste memory when memory is important to you, they
      are too slow when speed is important, the hash function doesn't work well
      for your (weird) choice of keys, they don't have predictable run times
      (e.g., one insert can take 1000 times of another insert, because your hash
      spontaneously decided to grow and reorder itself), and a bunch of other
      problems.

      But when absolute efficiency and suiting the exact problem isn't needed,
      a general-purpose hash is a very useful tool. Consider for example the
      hashes in Perl or Awk. I've used them countless times. They let you do
      very powerful stuff very easily. And I never cared if they are not
      the most efficient solution possible.

      So I think the STL *should* contain a hash, at least for keys with certain
      simple types (integer arrays, strings, etc.). Someone who needs very special
      implementations for very specific purposes will probably implement something
      differet - but someone who just needs "something that works well" will be
      happy with the STL's hash.

      Other STL containers have similar tradeoffs as with hashes - consider, for
      example, priority queues. STL's implementation are based on "heaps", and
      have O(logn) insertion and pop. It's easy to write a different implementation
      with a O(n) insertion of a whole sorted group of events and O(1) pop - such
      implementation might be faster for special uses. It's possible to implement
      hash-like O(1) insert and O(1) pop if you make other types of assumptions on
      the distribution of events on the queue.
      For example, in one program I once wrote in C++, where efficiency was of
      utmost importance and required the use of priority queues in two places -
      in one place the STL implementation was very good (better than something I
      tried to write myself), but in another place, my O(1) implementation was
      better.

      In the same program I also needed a hashtable, and in that case two things
      were very important: very low memory use and predictable time (no sudden
      reorganizations of the hash), so the SGI's STL hash was not suitable at
      all. I ended up writing a very optimized solution, that ended up taking
      only about 6 bytes (if I remember correctly) per entry over the length of
      the entry (key+value) itself, under certain assumption (one key assumption
      that the hash table will never contain more than 2^16 entries).

      --
      Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat 5762
      nyh@... |-----------------------------------------
      Phone: +972-53-245868, ICQ 13349191 |I want to live forever or die in the
      http://nadav.harel.org.il |attempt.
    • Shlomi Fish
      ... Sounds like a good compromise. In any case, deriving a class from a base hash class may make the code a bit more difficult but not much, assuming the code
      Message 2 of 22 , Feb 6, 2002
      • 0 Attachment
        On Thu, 7 Feb 2002, Omer Zak wrote:

        >
        > On Wed, 6 Feb 2002, Shlomi Fish wrote:
        >
        > > On 5 Feb 2002, Oleg Goldshmidt wrote:
        > >
        > > > Omer Musaev <omerm@...> writes:
        > > >
        > > > > some more esoteric features entered the library, while some basic
        > > > > ones ( most notably hash ) are absent.
        > > >
        > > > How many times have I missed that! It does come with gcc, but is not
        > > > standard...
        > > >
        > >
        > > I first thought that it was strange that STL lacked hashes. But then I
        > > remembered something: hashing theory is very comprehensive and there are a
        > > lot of ways to implement and tweak a hash. Points that come to mind:
        > >
        > > 1. Chaining vs. Open-Addressing.
        >
        > [... more options were snipped ...]
        >
        > > So basically the dilemma is what hash implementation to have, or what
        > > range of hashing options to support. I'm not sure it is possible to build
        > > a hash abstraction that would support all of the hashing options and will
        > > not be too bloated or unmaintainable.
        > >
        > > Had STL contained a hash, it would have been possible that some
        > > programmers could have (and possibly rightfully) blamed their program's
        > > performance on the hash. Since a hash is usually quite trivial to program,
        > > it's, IMO, a good idea to sometimes use your own custom hash
        > > implementation.
        >
        > On the other hand, let's not forget what I believe to be the reason for
        > FORTH's demise. FORTH is a very elegant language, with unorthodox ideas.
        > It was invented by Chuck Moore, who is having his own eccentric (and
        > fresh) ideas about how one should program.
        >
        > The reason FORTH didn't take hold (at least in my own projects) was that
        > it lacked standard libraries for the things which I needed. It expected
        > people to reinvent the wheel (and optimize it to their project's needs)
        > all the time. It didn't take to heart Pareto's Law (80% of the
        > computer time/programmer time/memory requirements/bug expenses of software
        > are in 20% of the code). People should optimize and design their own
        > implementations of data structures only when and where they are critical
        > to the software's performance. For non-critical parts of the software,
        > standard libraries are good enough and should be used.
        >
        > The morale of the story to hash functions in STL:
        > STL should have provided a standard hash implementation (like Perl does).
        > But the standard implementation should (like implementations of all
        > other STL data structuers) have provisions for people to substitute their
        > optimized algorithms when those algorithms are really needed for a
        > specific application.

        Sounds like a good compromise. In any case, deriving a class from a base
        hash class may make the code a bit more difficult but not much, assuming
        the code makes use of hash<mytype> & instead of
        my_own_efficient_hash<mytype> & (pardon my C++ and STL ignorance in case
        it shows).

        > --- Omer
        > There is no IGLU Cabal. We'll organize the next meeting when we finish
        > to prove, from First Principles, that 1 + 1 = 2.

        Purposely taking the joke too seriously:

        I believe Russel and Whitham managed to prove arithmetics from logic in
        "Prinicipa Mathematica". However, I believe 2 is defined as "1 + 1", or as
        the number that follows one, even though it undoubtedly has much more
        properties.

        It's like the De-Morgan Laws (not(A and B) = not(A) or not(B) and vice
        versa). One can show that they hold for every A and B and that's enough of
        a proof. I am not referring to the general case (for A1, A2, A3, A4...)
        that has to be proved by mathemtical induction.

        Regards,

        Shlomi Fish

        There is no IGLU Cabal! One of its most prominent members commited suicide
        when he failed to find a proof that the axiom that "A is A, and A is not
        not-A" holds for every A.

        > WARNING TO SPAMMERS: see at http://www.zak.co.il/spamwarning.html
        >
        >
        >
        > To unsubscribe from this group, send an email to:
        > hackers-il-unsubscribe@egroups.com
        >
        >
        >
        > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
        >
        >



        ----------------------------------------------------------------------
        Shlomi Fish shlomif@...
        Home Page: http://t2.technion.ac.il/~shlomif/
        Home E-mail: shlomif@...

        "Let's suppose you have a table with 2^n cups..."
        "Wait a second - is n a natural number?"
      • Shlomi Fish
        ... Actually if f(x) = C, where C is a constant, no such hash function exist or can exist. ... Have you ever tried running md5sum on a CD-ROM you just
        Message 3 of 22 , Feb 6, 2002
        • 0 Attachment
          On Wed, 6 Feb 2002, Arik Baratz wrote:

          > online for download ]?=
          > MIME-Version: 1.0
          > Content-Type: text/plain; charset=iso-8859-1
          >
          >
          > Hashes are application specific. A good hash is good in a specific situation -
          > that is, for each function f(x) of a random variable x there is a hash function
          > h(f(x)) that generates a uniform distribution.
          >

          Actually if f(x) = C, where C is a constant, no such hash function exist
          or can exist.

          > If you want a catch-all hash function, use a cryptographic hash like MD5.

          Have you ever tried running md5sum on a CD-ROM you just downloaded from
          the Internet? It takes forever to run. Usually, one will prefer to use a
          non-cryptographic hash function because it is faster and albeit can be
          tricked, usually generates equally good results. If there are a limited
          number of elements, than a Perfect Hash, which you'll have to compile from
          the set, would probably be a better choice.

          This is of course assuming the keys are fully serializable in some way.
          Sometimes, preparing a serialized key is not very straightforward.

          > If you
          > want a lean and mean function, you\'d have to analyze the domain you are working
          > with and think up a function that hashes it ok.
          >

          That was the case with Freecell Solver, where I recently switched from MD5
          to Perl's hash function and did not notice too big a difference in the
          speed.

          However, the hash function being chosen is just one of the elements of
          constructing a good hash (albeit a very important one, because a hash is
          only as good as it). Refer to my previous posts for other elements like
          chaining vs. open addressing, promoting or caching elements, etc. In C++,
          I'm not sure how well can one re-use the chain's elements, when they are
          re-hashed, and just pointing them to differnt elements, which is another
          technique I used in FCS.

          What I like about ANSI C, is that it does what you want when you want it,
          and if you know what you're doing there are very little side-effects. It
          does provide for a greater error ratio, than most other languages I know,
          though.

          > The glibc hash is very simplistic - you can beat it in many ways.
          >

          s/glibc/glib/? (glibc does not contain a hash, AFAI'm aware).

          My point was that if you are working on a Gtk+/GNOME application, you are
          stuck with the glib's hash whether you want it or not, because it's part
          of the Gtk+/GNOME architecture. You can code your own hash in ANSI C, but
          then you may be criticized for making the code unreadable, or deviating
          from the Standard Way of Doing Things<tm> there. And in Glib's 1.2 at
          least, I believe it's impossible to define a hash with the same interface
          as Glib's hash, not to mention that the interface itself is not very
          optimizied (or sensible).

          Regards,

          Shlomi Fish

          > -- Arik
          >
          > On 06.02.2002 at 15:49:49, Shlomi Fish <shlomif@...> wrote:
          >
          > > On 5 Feb 2002, Oleg Goldshmidt wrote:
          > >
          > > > Omer Musaev <omerm@...> writes:
          > > >
          > > > > some more esoteric features entered the library, while some basic
          > > > > ones ( most notably hash ) are absent.
          > > >
          > > > How many times have I missed that! It does come with gcc, but is not
          > > > standard...
          > > >
          > >
          > > I first thought that it was strange that STL lacked hashes. But then I
          > > remembered something: hashing theory is very comprehensive and there are a
          > > lot of ways to implement and tweak a hash. Points that come to mind:
          > >
          > > 1. Chaining vs. Open-Addressing.
          > >
          > > 2. The Hash function being chosen. (refer to:
          > > http://burtleburtle.net/bob/hash/)
          > >
          > > 3. Modulo bucketing vs. multiplicative bucketing.
          > >
          > > 4. Re-hashing.
          > >
          > > 5. Storing the hash values next to the keys (to make for faster
          > > comparisons and re-hashings)
          > >
          > > 6. Promoting or caching frequently accessed elements in the backets.
          > >
          > > 7. Using something other than a linked list as a bucket - a doubly-linked
          > > list, a binary tree, a vector, another hash (;-) - I know someone who did
          > > that because he did not know better),
          > >
          > > 8. Perfect Hashing.
          > >
          > > 9. Universal Hashing.
          > >
          > > 10. Which operations: an atomic check-if-exists and if not add? An atomic
          > > check-and-replace, an insert-if-not-exist, etc.
          > >
          > > ---
          > >
          > > When working on Freecell Solver, I noticed that my own hash performed
          > > better than Glib\'s because I used better optimizations. I believe it would
          > > have out-performed Glib\'s hash in most other cases. (Note - it\'s API is
          > > still very incomplete, because of the requirements of FCS). Had I written
          > > a Gtk+/GNOME app, I probably would have been \"forced\" to use it because
          > > it\'s part of the Gtk+ architecture, which would have made the application
          > > slower.
          > >
          > > Some languages, like Perl, force a certain implementation (and a certain
          > > hash function) on their users. In Perl at least, one can program
          > > primitives that behave like hashes in Perl and in C, but they are not the
          > > default. Mark-Jason Dominus demonstrates that the hash function can go
          > > wrong:
          > >
          > > http://perl.plover.com/#badhash
          > >
          > > So basically the dilemma is what hash implementation to have, or what
          > > range of hashing options to support. I\'m not sure it is possible to build
          > > a hash abstraction that would support all of the hashing options and will
          > > not be too bloated or unmaintainable.
          > >
          > > Had STL contained a hash, it would have been possible that some
          > > programmers could have (and possibly rightfully) blamed their program\'s
          > > performance on the hash. Since a hash is usually quite trivial to program,
          > > it\'s, IMO, a good idea to sometimes use your own custom hash
          > > implementation.
          > >
          > > Regards,
          > >
          > > Shlomi Fish
          > >
          > > There is no IGLU Cabal! They tried to save complexity and preferred to use
          > > a hash over a balanced binary tree. They managed to insert all of the
          > > elements in O(n) time, but then discovered it would take an extra
          > > O(n*log(n)) to print them in order. This caused confusion and
          > > disappointment and brought the demise of the cabal.
          > >
          > >
          > > > > [1] http://www.stlport.org/resources/StepanovUSA.html
          > > >
          > > > There is a great passage there about Stepanov\'s early realization that
          > > > \"agorithms are defined on algebraic structures\". In my head, it echoes
          > > > the \"show me your structures and the block diagram becomes irrelevant\"
          > > > very nicely.
          > > >
          > > > What does the hackers-il population think of Stepanov\'s criticism of
          > > > OOP?
          > > >
          > > > \"I have yet to see an interesting piece of code that comes from these
          > > > OO people.\"
          > > >
          > > > \"...It might be a profitable thing for all your readers to learn Java,
          > > > but it has no intellectual value whatsoever.\"
          > > >
          > > > What I did like was the \"Money Oriented Programming\" moniker ;-)
          > > >
          > > > --
          > > > Oleg Goldshmidt | ogoldshmidt@...
          > > > \"If it ain\'t broken, it has not got enough features yet.\"
          > > >
          > > >
          > > > To unsubscribe from this group, send an email to:
          > > > hackers-il-unsubscribe@egroups.com
          > > >
          > > >
          > > >
          > > > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
          > > >
          > > >
          > >
          > >
          > >
          > > ----------------------------------------------------------------------
          > > Shlomi Fish shlomif@...
          > > Home Page: http://t2.technion.ac.il/~shlomif/
          > > Home E-mail: shlomif@...
          > >
          > > \"Let\'s suppose you have a table with 2^n cups...\"
          > > \"Wait a second - is n a natural number?\"
          > >
          > >
          > >
          > > To unsubscribe from this group, send an email to:
          > > hackers-il-unsubscribe@egroups.com
          > >
          > >
          > >
          > > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
          > >
          > >
          >
          >
          >
          >
          > Arik Baratz
          > System Engineer
          > arikb@...
          >
          > Office:
          > 4 Hamelacha St.
          > Raa’nana 43661
          > ISRAEL
          >
          > Tel: +972 (9) 743-9250 ext. 214
          > Fax: +972 (9) 743-9251
          > Cell: +972 (52) 354 959
          > eFax: +1 (978) 926-8913
          > ICQ: 210 8214
          >
          > Privileged and / or confidential Information may be contained in this electronic
          > mail message.
          >
          > You may not copy or deliver this message to anyone without my consent.
          >
          > If you are not the addressee indicated in this message, or you feel that this
          > message is not intended for you, Please destroy this message and kindly notify
          > the sender by replying to this electronic mail.
          >
          > Please advise immediately if you or your employer do not agree to the use of
          > Internet email for messages of this kind.
          >
          > Opinions, conclusions and other information in this message that do not relate
          > to the official business of Vidius shall be understood as neither given nor
          > endorsed by it.
          >
          >
          > To unsubscribe from this group, send an email to:
          > hackers-il-unsubscribe@egroups.com
          >
          >
          >
          > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
          >
          >



          ----------------------------------------------------------------------
          Shlomi Fish shlomif@...
          Home Page: http://t2.technion.ac.il/~shlomif/
          Home E-mail: shlomif@...

          "Let's suppose you have a table with 2^n cups..."
          "Wait a second - is n a natural number?"
        • Ofir Carny
          As far as I remember, for some applications, you can also use a random hash in order to avoid being tricked. This means a hash function which is not constant
          Message 4 of 22 , Feb 7, 2002
          • 0 Attachment
            As far as I remember, for some applications, you can also use a random hash
            in order to avoid being tricked.

            This means a hash function which is not constant (or depends on another not
            constant parameter).

            Ofir


            **********************************************************************
            This email and any files transmitted were checked by
            Port Authority Enterprise for unathorized content.
            **********************************************************************
          • Nadav Har'El
            ... What do you mean? If you place an entry somewhere in the table, and next time you go looking for it your random hash lands you somewhere else, how will
            Message 5 of 22 , Feb 7, 2002
            • 0 Attachment
              On Thu, Feb 07, 2002, Ofir Carny wrote about "RE: [hackers-il] Re: To Hash or not to Hash [was Re: \"On Lisp\" now available ]":
              > As far as I remember, for some applications, you can also use a random hash
              > in order to avoid being tricked.
              >
              > This means a hash function which is not constant (or depends on another not
              > constant parameter).

              What do you mean? If you place an entry somewhere in the table, and next
              time you go looking for it your "random hash" lands you somewhere else, how
              will you find that existing entry? Maybe you mean ordering entries in one
              hash chain in a random order? But I can't see what that would get you -
              hash chains are supposed to be short anyway.

              --
              Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat 5762
              nyh@... |-----------------------------------------
              Phone: +972-53-245868, ICQ 13349191 |The world is coming to an end ... SAVE
              http://nadav.harel.org.il |YOUR BUFFERS!!!
            • Ofir Carny
              As I said, it is only good for specific applications, obviously, you can t change a hash function without rebuilding an existing table, however in some
              Message 6 of 22 , Feb 7, 2002
              • 0 Attachment
                As I said, it is only good for specific applications, obviously, you can't
                change a hash function without rebuilding an existing table, however in some
                applications it is enough to prevent a malicious attempt to 'break' your
                function.

                I didn't refer to the chain, however (unrelated), you can use a second
                function to use the table for the chain, avoiding memory allocation for
                collisions in a constant sized table.
                -----Original Message-----
                From: Nadav Har'El [mailto:nyh@...]
                Sent: Thursday, February 07, 2002 11:39 AM
                To: hackers-il@yahoogroups.com
                Subject: Re: [hackers-il] Re: To Hash or not to Hash [was Re: \"On
                Lisp\" now available ]


                On Thu, Feb 07, 2002, Ofir Carny wrote about "RE: [hackers-il] Re: To Hash
                or not to Hash [was Re: \"On Lisp\" now available ]":
                > As far as I remember, for some applications, you can also use a random
                hash
                > in order to avoid being tricked.
                >
                > This means a hash function which is not constant (or depends on another
                not
                > constant parameter).

                What do you mean? If you place an entry somewhere in the table, and next
                time you go looking for it your "random hash" lands you somewhere else, how
                will you find that existing entry? Maybe you mean ordering entries in one
                hash chain in a random order? But I can't see what that would get you -
                hash chains are supposed to be short anyway.

                --
                Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat
                5762
                nyh@...
                |-----------------------------------------
                Phone: +972-53-245868, ICQ 13349191 |The world is coming to an end ... SAVE
                http://nadav.harel.org.il |YOUR BUFFERS!!!

                To unsubscribe from this group, send an email to:
                hackers-il-unsubscribe@egroups.com



                Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/



                **********************************************************************
                This email and any files transmitted were checked by
                Port Authority Enterprise for unathorized content.
                **********************************************************************
              • Nadav Har'El
                ... Oh, I see - you meant choosing, once, a *hash function* at random, but then use the same hash function all the time? Ok. -- Nadav Har El
                Message 7 of 22 , Feb 7, 2002
                • 0 Attachment
                  On Thu, Feb 07, 2002, Ofir Carny wrote about "RE: [hackers-il] Re: To Hash or not to Hash [was Re: \"On Lisp\" now available ]":
                  > As I said, it is only good for specific applications, obviously, you can't
                  > change a hash function without rebuilding an existing table, however in some
                  > applications it is enough to prevent a malicious attempt to 'break' your
                  > function.

                  Oh, I see - you meant choosing, once, a *hash function* at random, but then
                  use the same hash function all the time? Ok.

                  --
                  Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat 5762
                  nyh@... |-----------------------------------------
                  Phone: +972-53-245868, ICQ 13349191 |Unlike Microsoft, a restaurant will give
                  http://nadav.harel.org.il |me food for free if I find a bug in it!
                • Shlomi Fish
                  ... There is a methodology to construct a random hash function out of a universal set of hash functions. This is called Universal Hashing, and I studied about
                  Message 8 of 22 , Feb 7, 2002
                  • 0 Attachment
                    On Thu, 7 Feb 2002, Nadav Har'El wrote:

                    > On Thu, Feb 07, 2002, Ofir Carny wrote about "RE: [hackers-il] Re: To Hash or not to Hash [was Re: \"On Lisp\" now available ]":
                    > > As I said, it is only good for specific applications, obviously, you can't
                    > > change a hash function without rebuilding an existing table, however in some
                    > > applications it is enough to prevent a malicious attempt to 'break' your
                    > > function.
                    >
                    > Oh, I see - you meant choosing, once, a *hash function* at random, but then
                    > use the same hash function all the time? Ok.
                    >

                    There is a methodology to construct a random hash function out of a
                    universal set of hash functions. This is called Universal Hashing, and I
                    studied about it in my DS and Algorithms course. An example for it, would
                    be to randomize an arbitrary string to prepend (or append) to the data
                    before it is MD5'ed. That way, even if the user deliberately creates
                    different strings whose first 32-bit MD5 bits are the same, he'll still
                    won't be able to out-smart the hash, because the prefix will make their
                    salt values completely different.

                    Of course, letting the user know what the prefix is will render it
                    useless. So it's kind of like a "security by obscurity" methodolgy.

                    Regards,

                    > --
                    > Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat 5762
                    > nyh@... |-----------------------------------------
                    > Phone: +972-53-245868, ICQ 13349191 |Unlike Microsoft, a restaurant will give
                    > http://nadav.harel.org.il |me food for free if I find a bug in it!
                    >
                    >
                    > To unsubscribe from this group, send an email to:
                    > hackers-il-unsubscribe@egroups.com
                    >
                    >
                    >
                    > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
                    >
                    >



                    ----------------------------------------------------------------------
                    Shlomi Fish shlomif@...
                    Home Page: http://t2.technion.ac.il/~shlomif/
                    Home E-mail: shlomif@...

                    "Let's suppose you have a table with 2^n cups..."
                    "Wait a second - is n a natural number?"
                  • Arik Baratz
                    ailable ]?= MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 ... - ... function ... Oh yeah? how about f(x) = rand(seed)? Obviously, if f(x)=C,
                    Message 9 of 22 , Feb 7, 2002
                    • 0 Attachment
                      ailable ]?=
                      MIME-Version: 1.0
                      Content-Type: text/plain; charset=iso-8859-1

                      On 07.02.2002 at 09:01:02, Shlomi Fish <shlomif@...> wrote:

                      > >
                      > > Hashes are application specific. A good hash is good in a specific situation
                      -
                      > > that is, for each function f(x) of a random variable x there is a hash
                      function
                      > > h(f(x)) that generates a uniform distribution.
                      > Actually if f(x) = C, where C is a constant, no such hash function exist
                      > or can exist.

                      Oh yeah? how about f(x) = rand(seed)? Obviously, if f(x)=C, you might want to
                      use a data structure different than a hash anyways.

                      > > If you want a catch-all hash function, use a cryptographic hash like MD5.
                      > Have you ever tried running md5sum on a CD-ROM you just downloaded from
                      > the Internet? It takes forever to run. Usually, one will prefer to use a
                      > non-cryptographic hash function because it is faster and albeit can be
                      > tricked, usually generates equally good results. If there are a limited
                      > number of elements, than a Perfect Hash, which you\'ll have to compile from
                      > the set, would probably be a better choice.
                      >
                      > This is of course assuming the keys are fully serializable in some way.
                      > Sometimes, preparing a serialized key is not very straightforward.

                      Actually I have just md5sum-ed a 2GB file, so I know how slow it is... I totaly
                      agree that baring special circumstances (i.e. in order to create the trapdoor
                      effect, say, in issuing confirmation numbers for flights, when the confirmation
                      cannot lead back to info in the ticket).

                      > > If you
                      > > want a lean and mean function, you\\\'d have to analyze the domain you are
                      working
                      > > with and think up a function that hashes it ok.
                      > That was the case with Freecell Solver, where I recently switched from MD5
                      > to Perl\'s hash function and did not notice too big a difference in the
                      > speed.
                      >
                      > However, the hash function being chosen is just one of the elements of
                      > constructing a good hash (albeit a very important one, because a hash is
                      > only as good as it). Refer to my previous posts for other elements like
                      > chaining vs. open addressing, promoting or caching elements, etc. In C++,
                      > I\'m not sure how well can one re-use the chain\'s elements, when they are
                      > re-hashed, and just pointing them to differnt elements, which is another
                      > technique I used in FCS.
                      >
                      > What I like about ANSI C, is that it does what you want when you want it,
                      > and if you know what you\'re doing there are very little side-effects. It
                      > does provide for a greater error ratio, than most other languages I know,
                      > though.
                      >
                      > > The glibc hash is very simplistic - you can beat it in many ways.
                      > >
                      >
                      > s/glibc/glib/? (glibc does not contain a hash, AFAI\'m aware).

                      yes

                      > My point was that if you are working on a Gtk+/GNOME application, you are
                      > stuck with the glib\'s hash whether you want it or not, because it\'s part
                      > of the Gtk+/GNOME architecture. You can code your own hash in ANSI C, but
                      > then you may be criticized for making the code unreadable, or deviating
                      > from the Standard Way of Doing Things<tm> there. And in Glib\'s 1.2 at
                      > least, I believe it\'s impossible to define a hash with the same interface
                      > as Glib\'s hash, not to mention that the interface itself is not very
                      > optimizied (or sensible).

                      Well, the SWoDT is not always the best, and you ought not always to listen to
                      what other people say about your style. Well, maybe listen, but not always
                      embrace.

                      -- Arik
                    • Arik Baratz
                      vailable ]?= MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Perhaps all you want is to distribute a set of values in a uniform fashion. -- Arik
                      Message 10 of 22 , Feb 7, 2002
                      • 0 Attachment
                        vailable ]?=
                        MIME-Version: 1.0
                        Content-Type: text/plain; charset=iso-8859-1


                        Perhaps all you want is to distribute a set of values in a uniform fashion.

                        -- Arik

                        On 07.02.2002 at 12:09:56, Ofir Carny <ofir@...> wrote:

                        > As I said, it is only good for specific applications, obviously, you can\'t
                        > change a hash function without rebuilding an existing table, however in some
                        > applications it is enough to prevent a malicious attempt to \'break\' your
                        > function.
                        >
                        > I didn\'t refer to the chain, however (unrelated), you can use a second
                        > function to use the table for the chain, avoiding memory allocation for
                        > collisions in a constant sized table.
                        > -----Original Message-----
                        > From: Nadav Har\'El [mailto:nyh@...]
                        > Sent: Thursday, February 07, 2002 11:39 AM
                        > To: hackers-il@yahoogroups.com
                        > Subject: Re: [hackers-il] Re: To Hash or not to Hash [was Re: \\\"On
                        > Lisp\\\" now available ]
                        >
                        >
                        > On Thu, Feb 07, 2002, Ofir Carny wrote about \"RE: [hackers-il] Re: To Hash
                        > or not to Hash [was Re: \\\"On Lisp\\\" now available ]\":
                        > > As far as I remember, for some applications, you can also use a random
                        > hash
                        > > in order to avoid being tricked.
                        > >
                        > > This means a hash function which is not constant (or depends on another
                        > not
                        > > constant parameter).
                        >
                        > What do you mean? If you place an entry somewhere in the table, and next
                        > time you go looking for it your \"random hash\" lands you somewhere else,
                        how
                        > will you find that existing entry? Maybe you mean ordering entries in one
                        > hash chain in a random order? But I can\'t see what that would get you -
                        > hash chains are supposed to be short anyway.
                        >
                        > --
                        > Nadav Har\'El | Thursday, Feb 7 2002, 25 Shevat
                        > 5762
                        > nyh@...
                        > |-----------------------------------------
                        > Phone: +972-53-245868, ICQ 13349191 |The world is coming to an end ... SAVE
                        > http://nadav.harel.org.il |YOUR BUFFERS!!!
                        >
                        > To unsubscribe from this group, send an email to:
                        > hackers-il-unsubscribe@egroups.com
                        >
                        >
                        >
                        > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
                        >
                        >
                        >
                        > **********************************************************************
                        > This email and any files transmitted were checked by
                        > Port Authority Enterprise for unathorized content.
                        > **********************************************************************
                        >
                        >
                        >
                        > To unsubscribe from this group, send an email to:
                        > hackers-il-unsubscribe@egroups.com
                        >
                        >
                        >
                        > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
                        >
                        >




                        Arik Baratz
                        System Engineer
                        arikb@...

                        Office:
                        4 Hamelacha St.
                        Raa’nana 43661
                        ISRAEL

                        Tel: +972 (9) 743-9250 ext. 214
                        Fax: +972 (9) 743-9251
                        Cell: +972 (52) 354 959
                        eFax: +1 (978) 926-8913
                        ICQ: 210 8214

                        Privileged and / or confidential Information may be contained in this electronic
                        mail message.

                        You may not copy or deliver this message to anyone without my consent.

                        If you are not the addressee indicated in this message, or you feel that this
                        message is not intended for you, Please destroy this message and kindly notify
                        the sender by replying to this electronic mail.

                        Please advise immediately if you or your employer do not agree to the use of
                        Internet email for messages of this kind.

                        Opinions, conclusions and other information in this message that do not relate
                        to the official business of Vidius shall be understood as neither given nor
                        endorsed by it.
                      • Nadav Har'El
                        ... Wow, something is *REALLY* wrong with your mail program, Vidius filter, or whatever... A part of the the subject got hacked off into the main text (as you
                        Message 11 of 22 , Feb 7, 2002
                        • 0 Attachment
                          On Thu, Feb 07, 2002, Arik Baratz wrote about "=?iso-8859-1?Q?Re: [hackers-il] Re: To Hash or not to Hash [was Re: \\\"On Lisp\\\" now av=":
                          > ailable ]?=
                          > MIME-Version: 1.0
                          > Content-Type: text/plain; charset=iso-8859-1

                          Wow, something is *REALLY* wrong with your mail program, Vidius filter,
                          or whatever... A part of the the subject got hacked off into the main
                          text (as you can see in the quote above),

                          > > > want a lean and mean function, you\\\'d have to analyze the domain you are
                          > working
                          > > > with and think up a function that hashes it ok.
                          > > That was the case with Freecell Solver, where I recently switched from MD5
                          > > to Perl\'s hash function and did not notice too big a difference in the

                          and something caused all single quotes in your message (even its subject line)
                          to be backslashed, sometimes by more than one backslash. What is this - a
                          mailer written in a shell? :)

                          Weird ;)


                          --
                          Nadav Har'El | Thursday, Feb 7 2002, 26 Shevat 5762
                          nyh@... |-----------------------------------------
                          Phone: +972-53-245868, ICQ 13349191 |I before E except after C. We live in a
                          http://nadav.harel.org.il |weird society!
                        • Arik Baratz
                          On Lisp now av=3D?= MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 ... [hackers-il] Re: To Hash or not to Hash [was Re:
                          Message 12 of 22 , Feb 7, 2002
                          • 0 Attachment
                            \\\\\\"On Lisp\\\\\\\" now av=3D?=
                            MIME-Version: 1.0
                            Content-Type: text/plain; charset=iso-8859-1

                            On 07.02.2002 at 21:18:58, Nadav Har\'El <nyh@...> wrote:

                            > On Thu, Feb 07, 2002, Arik Baratz wrote about \"=?iso-8859-1?Q?Re:
                            [hackers-il] Re: To Hash or not to Hash [was Re: \\\\\\\"On Lisp\\\\\\\" now
                            av=\":
                            > > ailable ]?=
                            > > MIME-Version: 1.0
                            > > Content-Type: text/plain; charset=iso-8859-1
                            >
                            > Wow, something is *REALLY* wrong with your mail program, Vidius filter,
                            > or whatever... A part of the the subject got hacked off into the main
                            > text (as you can see in the quote above),

                            I\'m in the united states right now, and I\'m using JawMail
                            http://jawmail.sf.net beta version. It has some flaws, I admit.
                            >
                            > > > > want a lean and mean function, you\\\\\\\'d have to analyze the domain
                            you are
                            > > working
                            > > > > with and think up a function that hashes it ok.
                            > > > That was the case with Freecell Solver, where I recently switched from
                            MD5
                            > > > to Perl\\\'s hash function and did not notice too big a difference in
                            the
                            >
                            > and something caused all single quotes in your message (even its subject
                            line)
                            > to be backslashed, sometimes by more than one backslash. What is this - a
                            > mailer written in a shell? :)
                            >
                            > Weird ;)

                            Not in shell, in PHP. It\'s somewhat buggy, but it works...


                            Arik Baratz
                            System Engineer
                            arikb@...

                            Office:
                            4 Hamelacha St.
                            Raa’nana 43661
                            ISRAEL

                            Tel: +972 (9) 743-9250 ext. 214
                            Fax: +972 (9) 743-9251
                            Cell: +972 (52) 354 959
                            eFax: +1 (978) 926-8913
                            ICQ: 210 8214

                            Privileged and / or confidential Information may be contained in this electronic
                            mail message.

                            You may not copy or deliver this message to anyone without my consent.

                            If you are not the addressee indicated in this message, or you feel that this
                            message is not intended for you, Please destroy this message and kindly notify
                            the sender by replying to this electronic mail.

                            Please advise immediately if you or your employer do not agree to the use of
                            Internet email for messages of this kind.

                            Opinions, conclusions and other information in this message that do not relate
                            to the official business of Vidius shall be understood as neither given nor
                            endorsed by it.
                          • mulix
                            ... [snipped other such monstrosities, and then arik said] ... egads, whatever happened to good ol telnet my.mail.server.com 25 and talking SMTP like
                            Message 13 of 22 , Feb 7, 2002
                            • 0 Attachment
                              On Thu, 7 Feb 2002, Arik Baratz wrote:

                              > \\\\\\"On Lisp\\\\\\\" now av=3D?=
                              > MIME-Version: 1.0
                              > Content-Type: text/plain; charset=iso-8859-1

                              [snipped other such monstrosities, and then arik said]
                              > Not in shell, in PHP. It\'s somewhat buggy, but it works...

                              egads, whatever happened to good ol' telnet my.mail.server.com 25 and
                              talking SMTP like reasonable human beings?

                              ObHackersIL - nothing. things have certainly hit a low point.
                              --
                              mulix

                              http://vipe.technion.ac.il/~mulix/
                              http://syscalltrack.sf.net/
                            Your message has been successfully submitted and would be delivered to recipients shortly.