Loading ...
Sorry, an error occurred while loading the content.

About Hashtbl

Expand Messages
  • Fabrice Marchant
    Hi ! I ve 2 questions about this module : * I have no idea about any possible use of Hashtbl.fold... * Iterating through the table by key ? Hashtabl.iter doc
    Message 1 of 9 , Apr 5, 2007
    • 0 Attachment
      Hi !

      I've 2 questions about this module :

      * I have no idea about any possible use of Hashtbl.fold...

      * Iterating through the table by key ?

      Hashtabl.iter doc for "Hashtbl.iter f tbl" says "...if the table contains several bindings for the same key, they are passed to f in reverse order of introduction..."
      OK, for a given key, iter exhibits the bindings in this order. But the output of different keys are mixed.

      In fact the issue I encounter is :
      I'm only interested in list of rare multiple data that share the same hash key. ( Numerous lists of keys only once bound do not care. )

      Please how to get the list of these keys that share multiple bindings, in order then to get these data with Hashtbl.find_all ?

      Maybe must I keep track of the fact some keys are multiple, (checking this with Hashtabl.find) in parallel with hash-table filling ?

      Or is there any straightforward way to perform this ?

      I trust in your clever advices.

      Fabrice
    • Richard Jones
      ... Hashtbl itself is a little bit strange because of the single key / multiple value thing. It was originally designed to store the symbol tables inside the
      Message 2 of 9 , Apr 5, 2007
      • 0 Attachment
        On Thu, Apr 05, 2007 at 12:38:23PM +0200, Fabrice Marchant wrote:
        > Hi !
        >
        > I've 2 questions about this module :
        >
        > * I have no idea about any possible use of Hashtbl.fold...

        Hashtbl itself is a little bit strange because of the single key /
        multiple value thing. It was originally designed to store the symbol
        tables inside the OCaml compiler (or so legend goes anyway ...), and
        in that context this is useful. In other contexts it's not very
        useful.

        A common use of Hashtbl is to limit it so that you can only add a
        single value for each key. You might also consider using Map instead,
        which has its own set of problems.

        Hashtbl.fold is used to iterate over all (key, value) pairs in the
        hash table. It's very useful.

        > Maybe must I keep track of the fact some keys are multiple,
        > (checking this with Hashtabl.find) in parallel with hash-table filling

        Basically yes.

        Have a look at the attached implementation of a Counter, using a hash
        table in this mode. In particular at the implementation of the
        function 'get_ref'.

        As a bonus this implementation also uses Hashtbl.fold.

        Rich.

        ----------------------------------------------------------------------
        (** Basic counting module.
        * $Id: counter.mli,v 1.4 2006/09/05 15:37:59 rich Exp $
        *)

        type 'a t
        (** Count items of type ['a]. *)

        val create : unit -> 'a t
        (** Create a new counter. *)

        val incr : 'a t -> 'a -> unit
        (** [incr counter thing] adds one to the count of [thing]s in [counter]. *)

        val decr : 'a t -> 'a -> unit
        (** [decr counter thing] subtracts one to the count of [thing]s in [counter]. *)

        val add : 'a t -> 'a -> int -> unit
        (** [add counter thing n] adds [n] to the count of [thing]s in [counter]. *)

        val sub : 'a t -> 'a -> int -> unit
        (** [sub counter thing n] subtracts [n] to the count of [thing]s in [counter]. *)

        val set : 'a t -> 'a -> int -> unit
        (** [set counter thing n] sets the count of [thing]s to [n]. *)

        val get : 'a t -> 'a -> int
        (** [get counter thing] returns the count of [thing]s. (Returns 0 for
        * [thing]s which have not been added.
        *)

        val incr_get : 'a t -> 'a -> int
        (** Faster form of {!Counter.incr} followed by {!Counter.get}. *)

        val zero : 'a t -> 'a -> unit
        (** [zero counter thing] sets the count of [thing]s to 0.
        * See also {!Counter.clear}.
        *)

        val read : 'a t -> (int * 'a) list
        (** [read counter] reads the frequency of each thing. They are sorted
        * with the thing appearing most frequently first. Only things occurring
        * non-zero times are returned.
        *)

        val length : 'a t -> int
        (** Return the number of distinct things. See also {!Counter.total} *)

        val total : 'a t -> int
        (** Return the number of things counted (the total number of counts).
        * See also {!Counter.length}
        *)

        val clear : 'a t -> unit
        (** [clear counter] zeroes all counts. *)
        ----------------------------------------------------------------------
        (* Basic counting module.
        * $Id: counter.ml,v 1.4 2006/09/05 15:37:59 rich Exp $
        *)

        type 'a t = ('a, int ref) Hashtbl.t

        let create () =
        Hashtbl.create 13

        let get_ref counter thing =
        try
        Hashtbl.find counter thing
        with
        Not_found ->
        let r = ref 0 in
        Hashtbl.add counter thing r;
        r

        let incr counter thing =
        let r = get_ref counter thing in
        incr r

        let decr counter thing =
        let r = get_ref counter thing in
        decr r

        let add counter thing n =
        let r = get_ref counter thing in
        r := !r + n

        let sub counter thing n =
        let r = get_ref counter thing in
        r := !r - n

        let set counter thing n =
        let r = get_ref counter thing in
        r := n

        (* Don't use get_ref, to avoid unnecessarily creating 'ref 0's. *)
        let get counter thing =
        try
        !(Hashtbl.find counter thing)
        with
        Not_found -> 0

        (* This is a common pair of operations, worth optimising. *)
        let incr_get counter thing =
        let r = get_ref counter thing in
        Pervasives.incr r;
        !r

        let zero = Hashtbl.remove

        let read counter =
        let counts =
        Hashtbl.fold (
        fun thing r xs ->
        let r = !r in
        if r <> 0 then (r, thing) :: xs
        else xs
        ) counter [] in
        List.sort (fun (a, _) (b, _) -> compare (b : int) (a : int)) counts

        let length = Hashtbl.length

        let total counter =
        let total = ref 0 in
        Hashtbl.iter (fun _ r -> total := !total + !r) counter;
        !total

        let clear = Hashtbl.clear
        ----------------------------------------------------------------------

        --
        Richard Jones
        Red Hat
      • Fabrice Marchant
        Hi Rich ! Thank you for your precious answer. ... Ah, this solves my two questions ! It is useful to build a list from the table. I will hack your Counter.read
        Message 3 of 9 , Apr 6, 2007
        • 0 Attachment
          Hi Rich !

          Thank you for your precious answer.

          > Hashtbl.fold is used to iterate over all (key, value) pairs in the
          > hash table. It's very useful.
          Ah, this solves my two questions !
          It is useful to build a list from the table. I will hack your Counter.read and keep only multiple values before the sort of the list.

          > You might also consider using Map instead,...
          I've begun to experiment with it.

          > ...which has its own set of problems.
          Please, what are they ?

          Best regards.

          Fabrice
          --------
          Never put off until tomorrow that which can be done the day after tomorrow.
          ( Who said that ?)
        • Jon Harrop
          ... Hash tables are faster and smaller than maps in OCaml. Hash tables provide a default structural hash function which, when it works (when structural
          Message 4 of 9 , Apr 6, 2007
          • 0 Attachment
            On Friday 06 April 2007 23:40, Fabrice Marchant wrote:
            > > You might also consider using Map instead,...
            > > ...which has its own set of problems.
            >
            > Please, what are they ?

            Hash tables are faster and smaller than maps in OCaml. Hash tables provide a
            default structural hash function which, when it works (when structural
            equality is semantic equality) makes hash tables easier to use than maps.

            Maps have better worse-case performance. Maps are threadsafe. Maps are kept
            sorted.

            --
            Dr Jon D Harrop, Flying Frog Consultancy Ltd.
            OCaml for Scientists
            http://www.ffconsultancy.com/products/ocaml_for_scientists
          • Richard Jones
            ... In addition to Jon s answer: Map requires use of the functor syntax. This can be quite obscure -- I know that I usually have to look up the syntax when I
            Message 5 of 9 , Apr 7, 2007
            • 0 Attachment
              On Sat, Apr 07, 2007 at 12:40:13AM +0200, Fabrice Marchant wrote:
              > > You might also consider using Map instead,...
              > I've begun to experiment with it.
              >
              > > ...which has its own set of problems.
              > Please, what are they ?

              In addition to Jon's answer: Map requires use of the functor syntax.
              This can be quite obscure -- I know that I usually have to look up the
              syntax when I use it.

              In more general terms, Map is persistent. This Wikipedia page has
              some diagrams I copied from the Okasaki book which explains
              persistence:

              http://en.wikipedia.org/wiki/Purely_functional

              Rich.

              --
              Richard Jones
              Red Hat
            • Dave Benjamin
              ... Also, since Map requires the type of keys to be given at module construction time, it s not possible to do certain things with Map like write a function
              Message 6 of 9 , Apr 7, 2007
              • 0 Attachment
                On Sat, 7 Apr 2007, Richard Jones wrote:

                > On Sat, Apr 07, 2007 at 12:40:13AM +0200, Fabrice Marchant wrote:
                >>> You might also consider using Map instead,...
                >> I've begun to experiment with it.
                >>
                >>> ...which has its own set of problems.
                >> Please, what are they ?
                >
                > In addition to Jon's answer: Map requires use of the functor syntax.
                > This can be quite obscure -- I know that I usually have to look up the
                > syntax when I use it.

                Also, since Map requires the type of keys to be given at module
                construction time, it's not possible to do certain things with Map like
                write a function that is polymorphic across all Maps.

                For instance, we can write a function to return the keys from an IntMap:

                module IntMap = Map.Make(struct
                type t = int
                let compare = compare
                end)

                let keys m =
                List.rev (IntMap.fold (fun k v a -> k :: a) m [])

                But we can't use this function with a StringMap or any other kind of Map
                since there's no way to abstract over the module itself.

                The PMap module in Extlib implements a persistent Map without using
                functors, so writing functions like "keys" is again possible.

                The only downside to PMap is that you can't safely compare two PMaps like
                you can with Map.compare, since you can't guarantee that two PMaps have
                equivalent comparison functions. Though from my experience, the need to do
                this is rare.

                Back to the original question: Most of the time, if you just use
                Hashtbl.replace and pretend like Hashtbl.add does not exist, you'll get
                the behavior you want. I like Map because it's purely functional, but for
                most tasks Hashtbl is faster and easier to use (ignoring the
                multiple-value thing).

                Dave
              • Jon Harrop
                ... On the contrary, there are many ways to abstract that function. The simplest is to factor out the fold that is used: let keys fold m = List.rev
                Message 7 of 9 , Apr 7, 2007
                • 0 Attachment
                  On Saturday 07 April 2007 21:18, Dave Benjamin wrote:
                  > let keys m =
                  > List.rev (IntMap.fold (fun k v a -> k :: a) m [])
                  >
                  > But we can't use this function with a StringMap or any other kind of Map
                  > since there's no way to abstract over the module itself.

                  On the contrary, there are many ways to abstract that function. The simplest
                  is to factor out the fold that is used:

                  let keys fold m =
                  List.rev (IntMap.fold (fun k _ t -> k :: t) m [])

                  An alternative is to put keys in a functor that accepts a module implementing
                  fold as an argument.

                  F# doesn't have functors and implements Sets and Maps the way you suggest. I
                  find it easier to use than OCaml in this context but it has some other
                  important differences:

                  1. Thanks to .NET being OO, structural comparison defers to customized
                  comparison functions, e.g. comparing two Maps with = works.

                  2. Because .NET is OO, it can be significantly slower than OCaml.

                  --
                  Dr Jon D Harrop, Flying Frog Consultancy Ltd.
                  OCaml for Scientists
                  http://www.ffconsultancy.com/products/ocaml_for_scientists
                • Fabrice Marchant
                  Thanks Jon for the explanations about differences between Map and Hashtbl. I wonder how it is possible to discover these properties because I do not saw such a
                  Message 8 of 9 , Apr 7, 2007
                  • 0 Attachment
                    Thanks Jon for the explanations about differences between Map and Hashtbl.
                    I wonder how it is possible to discover these properties because I do not saw such a comparison in the doc.
                    Maybe we can learn this in studying the sources of these modules ?

                    Thanks Rich for your Wikipedia page.
                    I discover there is about the same things in the Book of Philippe Narbel "Programmation fonctionnelle générique et objet"
                    http://www.vuibert.com/livre12724.html
                    (and Okasaki is in its Bibliography).
                    Chapter 6.4. "L'effet de persistance" Persistance benefits, and drawbacks too, are studied.

                    Dave, thank you for your explanations. I'm glad to discover this alternative with PMap !
                    And it make me remember about Pomap from Markus Mottl ( who has read Okasaki too ) : I must consider this 4th lib.
                    http://www.ocaml.info/home/ocaml_sources.html


                    Best regards.

                    Fabrice.
                  • Jon Harrop
                    ... Read my book. :-) While you re there, please add it to the list of books on Wikipedia s OCaml page! -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. OCaml
                    Message 9 of 9 , Apr 7, 2007
                    • 0 Attachment
                      On Saturday 07 April 2007 23:29, Fabrice Marchant wrote:
                      > Thanks Jon for the explanations about differences between Map and
                      > Hashtbl. I wonder how it is possible to discover these properties because I
                      > do not saw such a comparison in the doc. Maybe we can learn this in
                      > studying the sources of these modules ?

                      Read my book. :-)

                      While you're there, please add it to the list of books on Wikipedia's OCaml
                      page!

                      --
                      Dr Jon D Harrop, Flying Frog Consultancy Ltd.
                      OCaml for Scientists
                      http://www.ffconsultancy.com/products/ocaml_for_scientists
                    Your message has been successfully submitted and would be delivered to recipients shortly.