Loading ...
Sorry, an error occurred while loading the content.
 

The FORTH Question (was: Re: [hackers-il] To Hash or not to Hash)

Expand Messages
  • Omer Zak
    ... [... more options were snipped ...] ... On the other hand, let s not forget what I believe to be the reason for FORTH s demise. FORTH is a very elegant
    Message 1 of 22 , Feb 6, 2002
      On Wed, 6 Feb 2002, Shlomi Fish wrote:

      > On 5 Feb 2002, Oleg Goldshmidt wrote:
      >
      > > Omer Musaev <omerm@...> writes:
      > >
      > > > some more esoteric features entered the library, while some basic
      > > > ones ( most notably hash ) are absent.
      > >
      > > How many times have I missed that! It does come with gcc, but is not
      > > standard...
      > >
      >
      > I first thought that it was strange that STL lacked hashes. But then I
      > remembered something: hashing theory is very comprehensive and there are a
      > lot of ways to implement and tweak a hash. Points that come to mind:
      >
      > 1. Chaining vs. Open-Addressing.

      [... more options were snipped ...]

      > So basically the dilemma is what hash implementation to have, or what
      > range of hashing options to support. I'm not sure it is possible to build
      > a hash abstraction that would support all of the hashing options and will
      > not be too bloated or unmaintainable.
      >
      > Had STL contained a hash, it would have been possible that some
      > programmers could have (and possibly rightfully) blamed their program's
      > performance on the hash. Since a hash is usually quite trivial to program,
      > it's, IMO, a good idea to sometimes use your own custom hash
      > implementation.

      On the other hand, let's not forget what I believe to be the reason for
      FORTH's demise. FORTH is a very elegant language, with unorthodox ideas.
      It was invented by Chuck Moore, who is having his own eccentric (and
      fresh) ideas about how one should program.

      The reason FORTH didn't take hold (at least in my own projects) was that
      it lacked standard libraries for the things which I needed. It expected
      people to reinvent the wheel (and optimize it to their project's needs)
      all the time. It didn't take to heart Pareto's Law (80% of the
      computer time/programmer time/memory requirements/bug expenses of software
      are in 20% of the code). People should optimize and design their own
      implementations of data structures only when and where they are critical
      to the software's performance. For non-critical parts of the software,
      standard libraries are good enough and should be used.

      The morale of the story to hash functions in STL:
      STL should have provided a standard hash implementation (like Perl does).
      But the standard implementation should (like implementations of all
      other STL data structuers) have provisions for people to substitute their
      optimized algorithms when those algorithms are really needed for a
      specific application.
      --- Omer
      There is no IGLU Cabal. We'll organize the next meeting when we finish
      to prove, from First Principles, that 1 + 1 = 2.
      WARNING TO SPAMMERS: see at http://www.zak.co.il/spamwarning.html
    • Nadav Har'El
      ... When efficiency is the key, canned hash implementations are many times not good enough. They waste memory when memory is important to you, they are too
      Message 2 of 22 , Feb 6, 2002
        On Wed, Feb 06, 2002, Shlomi Fish wrote about "[hackers-il] To Hash or not to Hash [was Re: "On Lisp" now available online for download ]":
        > On 5 Feb 2002, Oleg Goldshmidt wrote:
        >
        > > Omer Musaev <omerm@...> writes:
        > >
        > > > some more esoteric features entered the library, while some basic
        > > > ones ( most notably hash ) are absent.
        > >
        > > How many times have I missed that! It does come with gcc, but is not
        > > standard...
        > >
        >
        > I first thought that it was strange that STL lacked hashes. But then I
        > remembered something: hashing theory is very comprehensive and there are a
        > lot of ways to implement and tweak a hash. Points that come to mind:
        >...

        When efficiency is the key, "canned" hash implementations are many times
        not good enough. They waste memory when memory is important to you, they
        are too slow when speed is important, the hash function doesn't work well
        for your (weird) choice of keys, they don't have predictable run times
        (e.g., one insert can take 1000 times of another insert, because your hash
        spontaneously decided to grow and reorder itself), and a bunch of other
        problems.

        But when absolute efficiency and suiting the exact problem isn't needed,
        a general-purpose hash is a very useful tool. Consider for example the
        hashes in Perl or Awk. I've used them countless times. They let you do
        very powerful stuff very easily. And I never cared if they are not
        the most efficient solution possible.

        So I think the STL *should* contain a hash, at least for keys with certain
        simple types (integer arrays, strings, etc.). Someone who needs very special
        implementations for very specific purposes will probably implement something
        differet - but someone who just needs "something that works well" will be
        happy with the STL's hash.

        Other STL containers have similar tradeoffs as with hashes - consider, for
        example, priority queues. STL's implementation are based on "heaps", and
        have O(logn) insertion and pop. It's easy to write a different implementation
        with a O(n) insertion of a whole sorted group of events and O(1) pop - such
        implementation might be faster for special uses. It's possible to implement
        hash-like O(1) insert and O(1) pop if you make other types of assumptions on
        the distribution of events on the queue.
        For example, in one program I once wrote in C++, where efficiency was of
        utmost importance and required the use of priority queues in two places -
        in one place the STL implementation was very good (better than something I
        tried to write myself), but in another place, my O(1) implementation was
        better.

        In the same program I also needed a hashtable, and in that case two things
        were very important: very low memory use and predictable time (no sudden
        reorganizations of the hash), so the SGI's STL hash was not suitable at
        all. I ended up writing a very optimized solution, that ended up taking
        only about 6 bytes (if I remember correctly) per entry over the length of
        the entry (key+value) itself, under certain assumption (one key assumption
        that the hash table will never contain more than 2^16 entries).

        --
        Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat 5762
        nyh@... |-----------------------------------------
        Phone: +972-53-245868, ICQ 13349191 |I want to live forever or die in the
        http://nadav.harel.org.il |attempt.
      • Shlomi Fish
        ... Sounds like a good compromise. In any case, deriving a class from a base hash class may make the code a bit more difficult but not much, assuming the code
        Message 3 of 22 , Feb 6, 2002
          On Thu, 7 Feb 2002, Omer Zak wrote:

          >
          > On Wed, 6 Feb 2002, Shlomi Fish wrote:
          >
          > > On 5 Feb 2002, Oleg Goldshmidt wrote:
          > >
          > > > Omer Musaev <omerm@...> writes:
          > > >
          > > > > some more esoteric features entered the library, while some basic
          > > > > ones ( most notably hash ) are absent.
          > > >
          > > > How many times have I missed that! It does come with gcc, but is not
          > > > standard...
          > > >
          > >
          > > I first thought that it was strange that STL lacked hashes. But then I
          > > remembered something: hashing theory is very comprehensive and there are a
          > > lot of ways to implement and tweak a hash. Points that come to mind:
          > >
          > > 1. Chaining vs. Open-Addressing.
          >
          > [... more options were snipped ...]
          >
          > > So basically the dilemma is what hash implementation to have, or what
          > > range of hashing options to support. I'm not sure it is possible to build
          > > a hash abstraction that would support all of the hashing options and will
          > > not be too bloated or unmaintainable.
          > >
          > > Had STL contained a hash, it would have been possible that some
          > > programmers could have (and possibly rightfully) blamed their program's
          > > performance on the hash. Since a hash is usually quite trivial to program,
          > > it's, IMO, a good idea to sometimes use your own custom hash
          > > implementation.
          >
          > On the other hand, let's not forget what I believe to be the reason for
          > FORTH's demise. FORTH is a very elegant language, with unorthodox ideas.
          > It was invented by Chuck Moore, who is having his own eccentric (and
          > fresh) ideas about how one should program.
          >
          > The reason FORTH didn't take hold (at least in my own projects) was that
          > it lacked standard libraries for the things which I needed. It expected
          > people to reinvent the wheel (and optimize it to their project's needs)
          > all the time. It didn't take to heart Pareto's Law (80% of the
          > computer time/programmer time/memory requirements/bug expenses of software
          > are in 20% of the code). People should optimize and design their own
          > implementations of data structures only when and where they are critical
          > to the software's performance. For non-critical parts of the software,
          > standard libraries are good enough and should be used.
          >
          > The morale of the story to hash functions in STL:
          > STL should have provided a standard hash implementation (like Perl does).
          > But the standard implementation should (like implementations of all
          > other STL data structuers) have provisions for people to substitute their
          > optimized algorithms when those algorithms are really needed for a
          > specific application.

          Sounds like a good compromise. In any case, deriving a class from a base
          hash class may make the code a bit more difficult but not much, assuming
          the code makes use of hash<mytype> & instead of
          my_own_efficient_hash<mytype> & (pardon my C++ and STL ignorance in case
          it shows).

          > --- Omer
          > There is no IGLU Cabal. We'll organize the next meeting when we finish
          > to prove, from First Principles, that 1 + 1 = 2.

          Purposely taking the joke too seriously:

          I believe Russel and Whitham managed to prove arithmetics from logic in
          "Prinicipa Mathematica". However, I believe 2 is defined as "1 + 1", or as
          the number that follows one, even though it undoubtedly has much more
          properties.

          It's like the De-Morgan Laws (not(A and B) = not(A) or not(B) and vice
          versa). One can show that they hold for every A and B and that's enough of
          a proof. I am not referring to the general case (for A1, A2, A3, A4...)
          that has to be proved by mathemtical induction.

          Regards,

          Shlomi Fish

          There is no IGLU Cabal! One of its most prominent members commited suicide
          when he failed to find a proof that the axiom that "A is A, and A is not
          not-A" holds for every A.

          > WARNING TO SPAMMERS: see at http://www.zak.co.il/spamwarning.html
          >
          >
          >
          > To unsubscribe from this group, send an email to:
          > hackers-il-unsubscribe@egroups.com
          >
          >
          >
          > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
          >
          >



          ----------------------------------------------------------------------
          Shlomi Fish shlomif@...
          Home Page: http://t2.technion.ac.il/~shlomif/
          Home E-mail: shlomif@...

          "Let's suppose you have a table with 2^n cups..."
          "Wait a second - is n a natural number?"
        • Shlomi Fish
          ... Actually if f(x) = C, where C is a constant, no such hash function exist or can exist. ... Have you ever tried running md5sum on a CD-ROM you just
          Message 4 of 22 , Feb 6, 2002
            On Wed, 6 Feb 2002, Arik Baratz wrote:

            > online for download ]?=
            > MIME-Version: 1.0
            > Content-Type: text/plain; charset=iso-8859-1
            >
            >
            > Hashes are application specific. A good hash is good in a specific situation -
            > that is, for each function f(x) of a random variable x there is a hash function
            > h(f(x)) that generates a uniform distribution.
            >

            Actually if f(x) = C, where C is a constant, no such hash function exist
            or can exist.

            > If you want a catch-all hash function, use a cryptographic hash like MD5.

            Have you ever tried running md5sum on a CD-ROM you just downloaded from
            the Internet? It takes forever to run. Usually, one will prefer to use a
            non-cryptographic hash function because it is faster and albeit can be
            tricked, usually generates equally good results. If there are a limited
            number of elements, than a Perfect Hash, which you'll have to compile from
            the set, would probably be a better choice.

            This is of course assuming the keys are fully serializable in some way.
            Sometimes, preparing a serialized key is not very straightforward.

            > If you
            > want a lean and mean function, you\'d have to analyze the domain you are working
            > with and think up a function that hashes it ok.
            >

            That was the case with Freecell Solver, where I recently switched from MD5
            to Perl's hash function and did not notice too big a difference in the
            speed.

            However, the hash function being chosen is just one of the elements of
            constructing a good hash (albeit a very important one, because a hash is
            only as good as it). Refer to my previous posts for other elements like
            chaining vs. open addressing, promoting or caching elements, etc. In C++,
            I'm not sure how well can one re-use the chain's elements, when they are
            re-hashed, and just pointing them to differnt elements, which is another
            technique I used in FCS.

            What I like about ANSI C, is that it does what you want when you want it,
            and if you know what you're doing there are very little side-effects. It
            does provide for a greater error ratio, than most other languages I know,
            though.

            > The glibc hash is very simplistic - you can beat it in many ways.
            >

            s/glibc/glib/? (glibc does not contain a hash, AFAI'm aware).

            My point was that if you are working on a Gtk+/GNOME application, you are
            stuck with the glib's hash whether you want it or not, because it's part
            of the Gtk+/GNOME architecture. You can code your own hash in ANSI C, but
            then you may be criticized for making the code unreadable, or deviating
            from the Standard Way of Doing Things<tm> there. And in Glib's 1.2 at
            least, I believe it's impossible to define a hash with the same interface
            as Glib's hash, not to mention that the interface itself is not very
            optimizied (or sensible).

            Regards,

            Shlomi Fish

            > -- Arik
            >
            > On 06.02.2002 at 15:49:49, Shlomi Fish <shlomif@...> wrote:
            >
            > > On 5 Feb 2002, Oleg Goldshmidt wrote:
            > >
            > > > Omer Musaev <omerm@...> writes:
            > > >
            > > > > some more esoteric features entered the library, while some basic
            > > > > ones ( most notably hash ) are absent.
            > > >
            > > > How many times have I missed that! It does come with gcc, but is not
            > > > standard...
            > > >
            > >
            > > I first thought that it was strange that STL lacked hashes. But then I
            > > remembered something: hashing theory is very comprehensive and there are a
            > > lot of ways to implement and tweak a hash. Points that come to mind:
            > >
            > > 1. Chaining vs. Open-Addressing.
            > >
            > > 2. The Hash function being chosen. (refer to:
            > > http://burtleburtle.net/bob/hash/)
            > >
            > > 3. Modulo bucketing vs. multiplicative bucketing.
            > >
            > > 4. Re-hashing.
            > >
            > > 5. Storing the hash values next to the keys (to make for faster
            > > comparisons and re-hashings)
            > >
            > > 6. Promoting or caching frequently accessed elements in the backets.
            > >
            > > 7. Using something other than a linked list as a bucket - a doubly-linked
            > > list, a binary tree, a vector, another hash (;-) - I know someone who did
            > > that because he did not know better),
            > >
            > > 8. Perfect Hashing.
            > >
            > > 9. Universal Hashing.
            > >
            > > 10. Which operations: an atomic check-if-exists and if not add? An atomic
            > > check-and-replace, an insert-if-not-exist, etc.
            > >
            > > ---
            > >
            > > When working on Freecell Solver, I noticed that my own hash performed
            > > better than Glib\'s because I used better optimizations. I believe it would
            > > have out-performed Glib\'s hash in most other cases. (Note - it\'s API is
            > > still very incomplete, because of the requirements of FCS). Had I written
            > > a Gtk+/GNOME app, I probably would have been \"forced\" to use it because
            > > it\'s part of the Gtk+ architecture, which would have made the application
            > > slower.
            > >
            > > Some languages, like Perl, force a certain implementation (and a certain
            > > hash function) on their users. In Perl at least, one can program
            > > primitives that behave like hashes in Perl and in C, but they are not the
            > > default. Mark-Jason Dominus demonstrates that the hash function can go
            > > wrong:
            > >
            > > http://perl.plover.com/#badhash
            > >
            > > So basically the dilemma is what hash implementation to have, or what
            > > range of hashing options to support. I\'m not sure it is possible to build
            > > a hash abstraction that would support all of the hashing options and will
            > > not be too bloated or unmaintainable.
            > >
            > > Had STL contained a hash, it would have been possible that some
            > > programmers could have (and possibly rightfully) blamed their program\'s
            > > performance on the hash. Since a hash is usually quite trivial to program,
            > > it\'s, IMO, a good idea to sometimes use your own custom hash
            > > implementation.
            > >
            > > Regards,
            > >
            > > Shlomi Fish
            > >
            > > There is no IGLU Cabal! They tried to save complexity and preferred to use
            > > a hash over a balanced binary tree. They managed to insert all of the
            > > elements in O(n) time, but then discovered it would take an extra
            > > O(n*log(n)) to print them in order. This caused confusion and
            > > disappointment and brought the demise of the cabal.
            > >
            > >
            > > > > [1] http://www.stlport.org/resources/StepanovUSA.html
            > > >
            > > > There is a great passage there about Stepanov\'s early realization that
            > > > \"agorithms are defined on algebraic structures\". In my head, it echoes
            > > > the \"show me your structures and the block diagram becomes irrelevant\"
            > > > very nicely.
            > > >
            > > > What does the hackers-il population think of Stepanov\'s criticism of
            > > > OOP?
            > > >
            > > > \"I have yet to see an interesting piece of code that comes from these
            > > > OO people.\"
            > > >
            > > > \"...It might be a profitable thing for all your readers to learn Java,
            > > > but it has no intellectual value whatsoever.\"
            > > >
            > > > What I did like was the \"Money Oriented Programming\" moniker ;-)
            > > >
            > > > --
            > > > Oleg Goldshmidt | ogoldshmidt@...
            > > > \"If it ain\'t broken, it has not got enough features yet.\"
            > > >
            > > >
            > > > To unsubscribe from this group, send an email to:
            > > > hackers-il-unsubscribe@egroups.com
            > > >
            > > >
            > > >
            > > > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
            > > >
            > > >
            > >
            > >
            > >
            > > ----------------------------------------------------------------------
            > > Shlomi Fish shlomif@...
            > > Home Page: http://t2.technion.ac.il/~shlomif/
            > > Home E-mail: shlomif@...
            > >
            > > \"Let\'s suppose you have a table with 2^n cups...\"
            > > \"Wait a second - is n a natural number?\"
            > >
            > >
            > >
            > > To unsubscribe from this group, send an email to:
            > > hackers-il-unsubscribe@egroups.com
            > >
            > >
            > >
            > > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
            > >
            > >
            >
            >
            >
            >
            > Arik Baratz
            > System Engineer
            > arikb@...
            >
            > Office:
            > 4 Hamelacha St.
            > Raa’nana 43661
            > ISRAEL
            >
            > Tel: +972 (9) 743-9250 ext. 214
            > Fax: +972 (9) 743-9251
            > Cell: +972 (52) 354 959
            > eFax: +1 (978) 926-8913
            > ICQ: 210 8214
            >
            > Privileged and / or confidential Information may be contained in this electronic
            > mail message.
            >
            > You may not copy or deliver this message to anyone without my consent.
            >
            > If you are not the addressee indicated in this message, or you feel that this
            > message is not intended for you, Please destroy this message and kindly notify
            > the sender by replying to this electronic mail.
            >
            > Please advise immediately if you or your employer do not agree to the use of
            > Internet email for messages of this kind.
            >
            > Opinions, conclusions and other information in this message that do not relate
            > to the official business of Vidius shall be understood as neither given nor
            > endorsed by it.
            >
            >
            > To unsubscribe from this group, send an email to:
            > hackers-il-unsubscribe@egroups.com
            >
            >
            >
            > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
            >
            >



            ----------------------------------------------------------------------
            Shlomi Fish shlomif@...
            Home Page: http://t2.technion.ac.il/~shlomif/
            Home E-mail: shlomif@...

            "Let's suppose you have a table with 2^n cups..."
            "Wait a second - is n a natural number?"
          • Ofir Carny
            As far as I remember, for some applications, you can also use a random hash in order to avoid being tricked. This means a hash function which is not constant
            Message 5 of 22 , Feb 7, 2002
              As far as I remember, for some applications, you can also use a random hash
              in order to avoid being tricked.

              This means a hash function which is not constant (or depends on another not
              constant parameter).

              Ofir


              **********************************************************************
              This email and any files transmitted were checked by
              Port Authority Enterprise for unathorized content.
              **********************************************************************
            • Nadav Har'El
              ... What do you mean? If you place an entry somewhere in the table, and next time you go looking for it your random hash lands you somewhere else, how will
              Message 6 of 22 , Feb 7, 2002
                On Thu, Feb 07, 2002, Ofir Carny wrote about "RE: [hackers-il] Re: To Hash or not to Hash [was Re: \"On Lisp\" now available ]":
                > As far as I remember, for some applications, you can also use a random hash
                > in order to avoid being tricked.
                >
                > This means a hash function which is not constant (or depends on another not
                > constant parameter).

                What do you mean? If you place an entry somewhere in the table, and next
                time you go looking for it your "random hash" lands you somewhere else, how
                will you find that existing entry? Maybe you mean ordering entries in one
                hash chain in a random order? But I can't see what that would get you -
                hash chains are supposed to be short anyway.

                --
                Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat 5762
                nyh@... |-----------------------------------------
                Phone: +972-53-245868, ICQ 13349191 |The world is coming to an end ... SAVE
                http://nadav.harel.org.il |YOUR BUFFERS!!!
              • Ofir Carny
                As I said, it is only good for specific applications, obviously, you can t change a hash function without rebuilding an existing table, however in some
                Message 7 of 22 , Feb 7, 2002
                  As I said, it is only good for specific applications, obviously, you can't
                  change a hash function without rebuilding an existing table, however in some
                  applications it is enough to prevent a malicious attempt to 'break' your
                  function.

                  I didn't refer to the chain, however (unrelated), you can use a second
                  function to use the table for the chain, avoiding memory allocation for
                  collisions in a constant sized table.
                  -----Original Message-----
                  From: Nadav Har'El [mailto:nyh@...]
                  Sent: Thursday, February 07, 2002 11:39 AM
                  To: hackers-il@yahoogroups.com
                  Subject: Re: [hackers-il] Re: To Hash or not to Hash [was Re: \"On
                  Lisp\" now available ]


                  On Thu, Feb 07, 2002, Ofir Carny wrote about "RE: [hackers-il] Re: To Hash
                  or not to Hash [was Re: \"On Lisp\" now available ]":
                  > As far as I remember, for some applications, you can also use a random
                  hash
                  > in order to avoid being tricked.
                  >
                  > This means a hash function which is not constant (or depends on another
                  not
                  > constant parameter).

                  What do you mean? If you place an entry somewhere in the table, and next
                  time you go looking for it your "random hash" lands you somewhere else, how
                  will you find that existing entry? Maybe you mean ordering entries in one
                  hash chain in a random order? But I can't see what that would get you -
                  hash chains are supposed to be short anyway.

                  --
                  Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat
                  5762
                  nyh@...
                  |-----------------------------------------
                  Phone: +972-53-245868, ICQ 13349191 |The world is coming to an end ... SAVE
                  http://nadav.harel.org.il |YOUR BUFFERS!!!

                  To unsubscribe from this group, send an email to:
                  hackers-il-unsubscribe@egroups.com



                  Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/



                  **********************************************************************
                  This email and any files transmitted were checked by
                  Port Authority Enterprise for unathorized content.
                  **********************************************************************
                • Nadav Har'El
                  ... Oh, I see - you meant choosing, once, a *hash function* at random, but then use the same hash function all the time? Ok. -- Nadav Har El
                  Message 8 of 22 , Feb 7, 2002
                    On Thu, Feb 07, 2002, Ofir Carny wrote about "RE: [hackers-il] Re: To Hash or not to Hash [was Re: \"On Lisp\" now available ]":
                    > As I said, it is only good for specific applications, obviously, you can't
                    > change a hash function without rebuilding an existing table, however in some
                    > applications it is enough to prevent a malicious attempt to 'break' your
                    > function.

                    Oh, I see - you meant choosing, once, a *hash function* at random, but then
                    use the same hash function all the time? Ok.

                    --
                    Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat 5762
                    nyh@... |-----------------------------------------
                    Phone: +972-53-245868, ICQ 13349191 |Unlike Microsoft, a restaurant will give
                    http://nadav.harel.org.il |me food for free if I find a bug in it!
                  • Shlomi Fish
                    ... There is a methodology to construct a random hash function out of a universal set of hash functions. This is called Universal Hashing, and I studied about
                    Message 9 of 22 , Feb 7, 2002
                      On Thu, 7 Feb 2002, Nadav Har'El wrote:

                      > On Thu, Feb 07, 2002, Ofir Carny wrote about "RE: [hackers-il] Re: To Hash or not to Hash [was Re: \"On Lisp\" now available ]":
                      > > As I said, it is only good for specific applications, obviously, you can't
                      > > change a hash function without rebuilding an existing table, however in some
                      > > applications it is enough to prevent a malicious attempt to 'break' your
                      > > function.
                      >
                      > Oh, I see - you meant choosing, once, a *hash function* at random, but then
                      > use the same hash function all the time? Ok.
                      >

                      There is a methodology to construct a random hash function out of a
                      universal set of hash functions. This is called Universal Hashing, and I
                      studied about it in my DS and Algorithms course. An example for it, would
                      be to randomize an arbitrary string to prepend (or append) to the data
                      before it is MD5'ed. That way, even if the user deliberately creates
                      different strings whose first 32-bit MD5 bits are the same, he'll still
                      won't be able to out-smart the hash, because the prefix will make their
                      salt values completely different.

                      Of course, letting the user know what the prefix is will render it
                      useless. So it's kind of like a "security by obscurity" methodolgy.

                      Regards,

                      > --
                      > Nadav Har'El | Thursday, Feb 7 2002, 25 Shevat 5762
                      > nyh@... |-----------------------------------------
                      > Phone: +972-53-245868, ICQ 13349191 |Unlike Microsoft, a restaurant will give
                      > http://nadav.harel.org.il |me food for free if I find a bug in it!
                      >
                      >
                      > To unsubscribe from this group, send an email to:
                      > hackers-il-unsubscribe@egroups.com
                      >
                      >
                      >
                      > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
                      >
                      >



                      ----------------------------------------------------------------------
                      Shlomi Fish shlomif@...
                      Home Page: http://t2.technion.ac.il/~shlomif/
                      Home E-mail: shlomif@...

                      "Let's suppose you have a table with 2^n cups..."
                      "Wait a second - is n a natural number?"
                    • Arik Baratz
                      ailable ]?= MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 ... - ... function ... Oh yeah? how about f(x) = rand(seed)? Obviously, if f(x)=C,
                      Message 10 of 22 , Feb 7, 2002
                        ailable ]?=
                        MIME-Version: 1.0
                        Content-Type: text/plain; charset=iso-8859-1

                        On 07.02.2002 at 09:01:02, Shlomi Fish <shlomif@...> wrote:

                        > >
                        > > Hashes are application specific. A good hash is good in a specific situation
                        -
                        > > that is, for each function f(x) of a random variable x there is a hash
                        function
                        > > h(f(x)) that generates a uniform distribution.
                        > Actually if f(x) = C, where C is a constant, no such hash function exist
                        > or can exist.

                        Oh yeah? how about f(x) = rand(seed)? Obviously, if f(x)=C, you might want to
                        use a data structure different than a hash anyways.

                        > > If you want a catch-all hash function, use a cryptographic hash like MD5.
                        > Have you ever tried running md5sum on a CD-ROM you just downloaded from
                        > the Internet? It takes forever to run. Usually, one will prefer to use a
                        > non-cryptographic hash function because it is faster and albeit can be
                        > tricked, usually generates equally good results. If there are a limited
                        > number of elements, than a Perfect Hash, which you\'ll have to compile from
                        > the set, would probably be a better choice.
                        >
                        > This is of course assuming the keys are fully serializable in some way.
                        > Sometimes, preparing a serialized key is not very straightforward.

                        Actually I have just md5sum-ed a 2GB file, so I know how slow it is... I totaly
                        agree that baring special circumstances (i.e. in order to create the trapdoor
                        effect, say, in issuing confirmation numbers for flights, when the confirmation
                        cannot lead back to info in the ticket).

                        > > If you
                        > > want a lean and mean function, you\\\'d have to analyze the domain you are
                        working
                        > > with and think up a function that hashes it ok.
                        > That was the case with Freecell Solver, where I recently switched from MD5
                        > to Perl\'s hash function and did not notice too big a difference in the
                        > speed.
                        >
                        > However, the hash function being chosen is just one of the elements of
                        > constructing a good hash (albeit a very important one, because a hash is
                        > only as good as it). Refer to my previous posts for other elements like
                        > chaining vs. open addressing, promoting or caching elements, etc. In C++,
                        > I\'m not sure how well can one re-use the chain\'s elements, when they are
                        > re-hashed, and just pointing them to differnt elements, which is another
                        > technique I used in FCS.
                        >
                        > What I like about ANSI C, is that it does what you want when you want it,
                        > and if you know what you\'re doing there are very little side-effects. It
                        > does provide for a greater error ratio, than most other languages I know,
                        > though.
                        >
                        > > The glibc hash is very simplistic - you can beat it in many ways.
                        > >
                        >
                        > s/glibc/glib/? (glibc does not contain a hash, AFAI\'m aware).

                        yes

                        > My point was that if you are working on a Gtk+/GNOME application, you are
                        > stuck with the glib\'s hash whether you want it or not, because it\'s part
                        > of the Gtk+/GNOME architecture. You can code your own hash in ANSI C, but
                        > then you may be criticized for making the code unreadable, or deviating
                        > from the Standard Way of Doing Things<tm> there. And in Glib\'s 1.2 at
                        > least, I believe it\'s impossible to define a hash with the same interface
                        > as Glib\'s hash, not to mention that the interface itself is not very
                        > optimizied (or sensible).

                        Well, the SWoDT is not always the best, and you ought not always to listen to
                        what other people say about your style. Well, maybe listen, but not always
                        embrace.

                        -- Arik
                      • Arik Baratz
                        vailable ]?= MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Perhaps all you want is to distribute a set of values in a uniform fashion. -- Arik
                        Message 11 of 22 , Feb 7, 2002
                          vailable ]?=
                          MIME-Version: 1.0
                          Content-Type: text/plain; charset=iso-8859-1


                          Perhaps all you want is to distribute a set of values in a uniform fashion.

                          -- Arik

                          On 07.02.2002 at 12:09:56, Ofir Carny <ofir@...> wrote:

                          > As I said, it is only good for specific applications, obviously, you can\'t
                          > change a hash function without rebuilding an existing table, however in some
                          > applications it is enough to prevent a malicious attempt to \'break\' your
                          > function.
                          >
                          > I didn\'t refer to the chain, however (unrelated), you can use a second
                          > function to use the table for the chain, avoiding memory allocation for
                          > collisions in a constant sized table.
                          > -----Original Message-----
                          > From: Nadav Har\'El [mailto:nyh@...]
                          > Sent: Thursday, February 07, 2002 11:39 AM
                          > To: hackers-il@yahoogroups.com
                          > Subject: Re: [hackers-il] Re: To Hash or not to Hash [was Re: \\\"On
                          > Lisp\\\" now available ]
                          >
                          >
                          > On Thu, Feb 07, 2002, Ofir Carny wrote about \"RE: [hackers-il] Re: To Hash
                          > or not to Hash [was Re: \\\"On Lisp\\\" now available ]\":
                          > > As far as I remember, for some applications, you can also use a random
                          > hash
                          > > in order to avoid being tricked.
                          > >
                          > > This means a hash function which is not constant (or depends on another
                          > not
                          > > constant parameter).
                          >
                          > What do you mean? If you place an entry somewhere in the table, and next
                          > time you go looking for it your \"random hash\" lands you somewhere else,
                          how
                          > will you find that existing entry? Maybe you mean ordering entries in one
                          > hash chain in a random order? But I can\'t see what that would get you -
                          > hash chains are supposed to be short anyway.
                          >
                          > --
                          > Nadav Har\'El | Thursday, Feb 7 2002, 25 Shevat
                          > 5762
                          > nyh@...
                          > |-----------------------------------------
                          > Phone: +972-53-245868, ICQ 13349191 |The world is coming to an end ... SAVE
                          > http://nadav.harel.org.il |YOUR BUFFERS!!!
                          >
                          > To unsubscribe from this group, send an email to:
                          > hackers-il-unsubscribe@egroups.com
                          >
                          >
                          >
                          > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
                          >
                          >
                          >
                          > **********************************************************************
                          > This email and any files transmitted were checked by
                          > Port Authority Enterprise for unathorized content.
                          > **********************************************************************
                          >
                          >
                          >
                          > To unsubscribe from this group, send an email to:
                          > hackers-il-unsubscribe@egroups.com
                          >
                          >
                          >
                          > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
                          >
                          >




                          Arik Baratz
                          System Engineer
                          arikb@...

                          Office:
                          4 Hamelacha St.
                          Raa’nana 43661
                          ISRAEL

                          Tel: +972 (9) 743-9250 ext. 214
                          Fax: +972 (9) 743-9251
                          Cell: +972 (52) 354 959
                          eFax: +1 (978) 926-8913
                          ICQ: 210 8214

                          Privileged and / or confidential Information may be contained in this electronic
                          mail message.

                          You may not copy or deliver this message to anyone without my consent.

                          If you are not the addressee indicated in this message, or you feel that this
                          message is not intended for you, Please destroy this message and kindly notify
                          the sender by replying to this electronic mail.

                          Please advise immediately if you or your employer do not agree to the use of
                          Internet email for messages of this kind.

                          Opinions, conclusions and other information in this message that do not relate
                          to the official business of Vidius shall be understood as neither given nor
                          endorsed by it.
                        • Nadav Har'El
                          ... Wow, something is *REALLY* wrong with your mail program, Vidius filter, or whatever... A part of the the subject got hacked off into the main text (as you
                          Message 12 of 22 , Feb 7, 2002
                            On Thu, Feb 07, 2002, Arik Baratz wrote about "=?iso-8859-1?Q?Re: [hackers-il] Re: To Hash or not to Hash [was Re: \\\"On Lisp\\\" now av=":
                            > ailable ]?=
                            > MIME-Version: 1.0
                            > Content-Type: text/plain; charset=iso-8859-1

                            Wow, something is *REALLY* wrong with your mail program, Vidius filter,
                            or whatever... A part of the the subject got hacked off into the main
                            text (as you can see in the quote above),

                            > > > want a lean and mean function, you\\\'d have to analyze the domain you are
                            > working
                            > > > with and think up a function that hashes it ok.
                            > > That was the case with Freecell Solver, where I recently switched from MD5
                            > > to Perl\'s hash function and did not notice too big a difference in the

                            and something caused all single quotes in your message (even its subject line)
                            to be backslashed, sometimes by more than one backslash. What is this - a
                            mailer written in a shell? :)

                            Weird ;)


                            --
                            Nadav Har'El | Thursday, Feb 7 2002, 26 Shevat 5762
                            nyh@... |-----------------------------------------
                            Phone: +972-53-245868, ICQ 13349191 |I before E except after C. We live in a
                            http://nadav.harel.org.il |weird society!
                          • Arik Baratz
                            On Lisp now av=3D?= MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 ... [hackers-il] Re: To Hash or not to Hash [was Re:
                            Message 13 of 22 , Feb 7, 2002
                              \\\\\\"On Lisp\\\\\\\" now av=3D?=
                              MIME-Version: 1.0
                              Content-Type: text/plain; charset=iso-8859-1

                              On 07.02.2002 at 21:18:58, Nadav Har\'El <nyh@...> wrote:

                              > On Thu, Feb 07, 2002, Arik Baratz wrote about \"=?iso-8859-1?Q?Re:
                              [hackers-il] Re: To Hash or not to Hash [was Re: \\\\\\\"On Lisp\\\\\\\" now
                              av=\":
                              > > ailable ]?=
                              > > MIME-Version: 1.0
                              > > Content-Type: text/plain; charset=iso-8859-1
                              >
                              > Wow, something is *REALLY* wrong with your mail program, Vidius filter,
                              > or whatever... A part of the the subject got hacked off into the main
                              > text (as you can see in the quote above),

                              I\'m in the united states right now, and I\'m using JawMail
                              http://jawmail.sf.net beta version. It has some flaws, I admit.
                              >
                              > > > > want a lean and mean function, you\\\\\\\'d have to analyze the domain
                              you are
                              > > working
                              > > > > with and think up a function that hashes it ok.
                              > > > That was the case with Freecell Solver, where I recently switched from
                              MD5
                              > > > to Perl\\\'s hash function and did not notice too big a difference in
                              the
                              >
                              > and something caused all single quotes in your message (even its subject
                              line)
                              > to be backslashed, sometimes by more than one backslash. What is this - a
                              > mailer written in a shell? :)
                              >
                              > Weird ;)

                              Not in shell, in PHP. It\'s somewhat buggy, but it works...


                              Arik Baratz
                              System Engineer
                              arikb@...

                              Office:
                              4 Hamelacha St.
                              Raa’nana 43661
                              ISRAEL

                              Tel: +972 (9) 743-9250 ext. 214
                              Fax: +972 (9) 743-9251
                              Cell: +972 (52) 354 959
                              eFax: +1 (978) 926-8913
                              ICQ: 210 8214

                              Privileged and / or confidential Information may be contained in this electronic
                              mail message.

                              You may not copy or deliver this message to anyone without my consent.

                              If you are not the addressee indicated in this message, or you feel that this
                              message is not intended for you, Please destroy this message and kindly notify
                              the sender by replying to this electronic mail.

                              Please advise immediately if you or your employer do not agree to the use of
                              Internet email for messages of this kind.

                              Opinions, conclusions and other information in this message that do not relate
                              to the official business of Vidius shall be understood as neither given nor
                              endorsed by it.
                            • mulix
                              ... [snipped other such monstrosities, and then arik said] ... egads, whatever happened to good ol telnet my.mail.server.com 25 and talking SMTP like
                              Message 14 of 22 , Feb 7, 2002
                                On Thu, 7 Feb 2002, Arik Baratz wrote:

                                > \\\\\\"On Lisp\\\\\\\" now av=3D?=
                                > MIME-Version: 1.0
                                > Content-Type: text/plain; charset=iso-8859-1

                                [snipped other such monstrosities, and then arik said]
                                > Not in shell, in PHP. It\'s somewhat buggy, but it works...

                                egads, whatever happened to good ol' telnet my.mail.server.com 25 and
                                talking SMTP like reasonable human beings?

                                ObHackersIL - nothing. things have certainly hit a low point.
                                --
                                mulix

                                http://vipe.technion.ac.il/~mulix/
                                http://syscalltrack.sf.net/
                              Your message has been successfully submitted and would be delivered to recipients shortly.