Loading ...
Sorry, an error occurred while loading the content.
 

Fwd: [Caml - list] Using zippers to handle huge trees

Expand Messages
  • Stalkern 2
    Thanks to Claudio for the hint; here I join a message that I had bookmarked on the seniors list before knowing that I needed it :-) and that I m about to
    Message 1 of 7 , May 9, 2003
      Thanks to Claudio for the hint; here I join a message that I had bookmarked on
      the seniors' list before knowing that I needed it :-) and that I'm about to
      study.

      I'd also have a question more or rather a bunch of infosophical hot spots.

      In Computer Science, trees are everywhere. Filesystems, LDAP, Document Bases,
      XML, and so on and so forth.
      Maybe it's one of the easiest recursive structures around. Now if I go and
      look for trees in the Dictionary of Algorithms and Data Structures at
      http://www.nist.gov/dads/ , or in Structure And Interpretation of Computer
      Program by Abelson and Sussman, I get a... forest. But this forest seems to
      be generally aimed at searching... on made-to-measure trees. So I'm in
      trouble for recognizing these algorithms in real -life example. I wonder:

      1) What of the known tree structures is a plain XML file? I mean, the
      structure is the one of the DTD, but does it have a specific name and theory
      other than the one of Aristoteles, or is it "just a tree"?

      2) What of the known tree structures is its DOM? Is its structure a footprint
      of teh XML's one, or does it come from some other tradition or school of
      thought?

      3) What of the known tree structures is a XML file with extended links? Is it
      just a set of interwoven trees or is there a theory for that, such for
      matrixes?

      4) Trees can be ordered sets and for instance distinguish left-branching and
      right-branching. Can they also implement n dimension apart from ordered
      parenting and ordered sibling?

      5) Does linking between separate trees always require that a supertree exists
      or can be thought of? (something like PXP's superroots)

      Thanks for any hint and food for thought.

      Ernesto


      ---------- Messaggio inoltrato ----------

      Subject: [Caml-list] Using zippers to handle huge trees
      Date: Friday 11 April 2003 13:17
      From: Diego Olivier Fernandez Pons <Diego-Olivier.FERNANDEZ-PONS at
      cicrp.jussieu.fr>
      To: caml-list at inria.fr

      Bonjour,

      On Wednesday, 09 Apr 2003, Yang Shouxun wrote:
      > I don't know how to write a tail recursive version to build trees.
      > If there are not that many continuous attributes and the dataset is
      > not so large, the tree stops growing before stack overflow.

      On Wednesday 09 April 2003, Markus Mottl wrote:
      > The trick is to use continuation passing style (CPS): you pass a
      > function closure (continuation) containing everything that's needed
      > in subsequent computations. Instead of returning a result, the
      > sub-function calls the continuation with the result, which makes the
      > functions tail-recursive.

      Zippers are a simple way to handle huge (in fact infinite) trees.

      1. Explanation of zippers
      2. Related work
      3. Examples of code

      1. Explanation of zippers

      Zippers may be seen as 'functional pointers' since they offer :
      - purely functional and typed operations
      - O(1) acces to the pointed element
      - O(1) pointer movements

      The restrictions are that only one pointer is allowed by data
      structure and every pointer movement allocates memory.

      Take a classical type declaration for binary search trees

      type 'a tree = E | N of 'a tree * 'a * 'a tree * int

      Consider a binary search tree and an inner node to which you want to
      point. To have a 0(1) acces to the pointed subtree, it has to be
      directly available from the base of the data structure. Then, the
      rest of the tree must be kept in a separate place. We will deconstruct
      it along the path from the root to the pointed node

      type 'a path =

      | Root
      | Left of 'a * 'a tree * 'a path
      | Right of 'a tree * 'a * 'a path

      type 'a zipper = 'a tree * 'a path

      The zipper contrains as annouced :
      - the pointed subtree
      - the rest of the tree breaked along the path to the root

      Then we define the pointer movements (one for each pointer in the data
      structure) :

      exception Bottom

      (* To be replaced by a balancing constructor *)
      let makeDTree = fun l v r -> N (l, v, r, 0)

      let move_left = fun (tree, path) ->
      match tree with

      | E -> raise Bottom
      | N (l, v, r, _) -> (l, Left (v, r, path))

      let move_right = fun (tree, path) ->
      match tree with

      | E -> raise Bottom
      | N (l, v, r, _) -> (r, Right (l, v, path))

      let move_up = fun (tree, path) ->
      match path with

      | Root -> raise Top
      | Left (v, r, tail) -> (makeDTree tree v r, tail)
      | Right (l, v, tail) -> (makeDTree l v tree, tail)

      Now we can build an arbitrary large tree by the following procedure :
      - build a tree of bounded depth
      - choose the node which will be developped next
      - move the current pointer to that node
      - continue building the tree

      This procedure uses a bounded stack space

      2. Related work

      Zippers were invented by Gérard Huet. There is a paper on the JFP
      G. Huet. The Zipper. J. Functional Programming, 7 (5), Sept 1997, pp.
      549--554.

      He uses n-ary trees and binary trees in his examples. The main
      difference is that in binary trees the pointers are not organized in
      the same way (his 'left' operation is what in Baire is (left o up))

      Ralf Hinze has tried to give a general framework for functional
      pointers named a web (you give your data structure and the basic
      transformations and the data structure does the rest)

      Ralf Hinze and Johan Jeuring. Functional Pearl: Weaving a Web. Journal
      of Functional Programming, 11(6):681-689, November 2001.

      Available on the net via Hinze's home page.
      In my opinion his article is not really convincing and not very clear.

      Several libraries already use zippers

      - Zen (Gérard Huet, Caml) uses zippers to handle acyclic automata
      minimization

      - SML/NJ Standard library (John Reppy) uses zippers to handle deletion
      in red-black trees

      - MetaPRL (Caml) uses zippers to handle insertion and deletion in
      splay and red-black trees

      - Grammatical Framework (Aarne Ranta, Haskell) uses zippers to
      navigate through n-ary trees.

      All this code is available on the web.

      3. Examples of code

      Here are some examples of usual binary search trees operations made
      whith zippers (insert, delete, move_pointer_to, ...) extracted from
      Baire (current version 11 avril 2003, plenty of bugs and breaked
      code, you will find it in Baire's download pages) :

      let rec move_to_top = function ((tree, path) as pointer) ->
      match path with

      | Root -> pointer
      | Left (v, r, tail) -> move_to_top (makeDTree tree v r, tail)
      | Right (l, v, tail) -> move_to_top (makeDTree l v tree, tail)

      let rec move_to x = function ((tree, path) as pointer) ->
      match tree with

      | E ->

      (match path with

      | Right (_, rv, _) when x <= rv ->

      move_to x (move_up pointer)

      | Left (lv, _, _) when x >= lv ->

      move_to x (move_up pointer)

      | _ -> pointer

      )

      | N (_, v, _, _) ->

      match compare x v with

      | n when n < 0 ->

      (match path with

      | Right (_, rv, _) when x < rv ->

      move_to x (move_up pointer)

      | Right _ | Root | Left _ ->

      move_to x (move_left pointer)
      )

      | n when n > 0 ->

      (match path with

      | Left (lv, _, _) when x > lv ->

      move_to x (move_up pointer)

      | Left _ | Root | Right _ ->

      move_to x (move_right pointer)
      )

      | _ -> pointer

      let rec member_path x = function

      | Right (l, v, tail) ->

      (match compare x v with

      | n when n < 0 -> member x l
      | 0 -> true
      | _ -> member_path x tail

      )

      | Left (v, r, tail) ->

      (match compare x v with

      | n when n > 0 -> member x r
      | 0 -> true
      | _ -> member_path x tail

      )

      | Root -> false

      let rec zipper_member x = function (tree, path) ->
      match tree with

      | E -> member_path x path
      | N (l, v, r, _) ->

      match compare x v with

      | n when n < 0 ->

      (match path with

      | Right (_, rv, _) when x <= rv -> member_path x path
      | Right _ | Root | Left _ -> member x l

      )

      | n when n > 0 ->

      (match path with

      | Left (lv, _, _) when x >= lv -> member_path x path
      | Left _ | Root | Right _ -> member x r

      )

      | _ -> true

      let current_tree = function (tree, _) -> tree

      let current_value = function (tree, _) ->
      match tree with

      | E -> None
      | N (_, v, _, _) -> Some v

      let current_value' = function (tree, _) ->
      match tree with

      | E -> raise Empty
      | N (_, v, _, _) -> v

      let rec zipper_insert x = function ((tree, path) as pointer) ->
      match tree with

      | E ->

      (match path with

      | Right (_, rv, _) when x <= rv ->

      zipper_insert x (move_up pointer)

      | Left (lv, _, _) when x >= lv ->

      zipper_insert x (move_up pointer)

      | _ -> (makeTree E x E, path)

      )

      | N (_, v, _, _) ->

      match compare x v with

      | n when n < 0 ->

      (match path with

      | Right (_, rv, _) when x < rv ->

      zipper_insert x (move_up pointer)

      | Right _ | Root | Left _ ->

      zipper_insert x (move_left pointer)
      )

      | n when n > 0 ->

      (match path with

      | Left (lv, _, _) when x > lv ->

      zipper_insert x (move_up pointer)

      | Left _ | Root | Right _ ->

      zipper_insert x (move_right pointer)
      )

      | _ -> pointer

      let rec zipper_delete x = function ((tree, path) as pointer) ->
      match tree with

      | E ->

      (match path with

      | Right (_, rv, _) when x <= rv ->

      zipper_delete x (move_up pointer)

      | Left (lv, _, _) when x >= lv ->

      zipper_delete x (move_up pointer)

      | _ -> pointer

      )

      | N (l, v, r, _) ->

      match compare x v with

      | n when n < 0 ->

      (match path with

      | Right (_, rv, _) when x < rv ->

      zipper_delete x (move_up pointer)

      | Right _ | Root | Left _ ->

      zipper_delete x (move_left pointer)
      )

      | n when n > 0 ->

      (match path with

      | Left (lv, _, _) when x > lv ->

      zipper_delete x (move_up pointer)

      | Left _ | Root | Right _ ->

      zipper_delete x (move_right pointer)
      )

      | _ -> move_to x (appendB l r, path)

      let rec path_to_list result = function

      | Root -> result
      | Left (v, r, path) ->

      path_to_list (result @ v :: to_ordered_list r) path

      | Right (l, v, path) ->

      path_to_list (to_ordered_list_rec (v :: result) l) path

      let zipper_to_list = function (tree, path) ->
      path_to_list (to_list tree) path


      Diego Olivier

      -------------------
      To unsubscribe, mail caml-list-request@... Archives:
      http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ:
      http://caml.inria.fr/FAQ/ Beginner's list:
      http://groups.yahoo.com/group/ocaml_beginners

      -------------------------------------------------------
    • Brian Hurt
      ... 1) They re easy to implement 2) They have very nice properties. In addition to fast searching, they also have reasonably fast inserts and deletes. 3)
      Message 2 of 7 , May 9, 2003
        On Fri, 9 May 2003, Stalkern 2 wrote:
        >
        > In Computer Science, trees are everywhere. Filesystems, LDAP, Document Bases,
        > XML, and so on and so forth.
        > Maybe it's one of the easiest recursive structures around.

        1) They're easy to implement

        2) They have very nice properties. In addition to fast searching, they
        also have reasonably fast inserts and deletes.

        3) Trees are a special case of the more general concept of graphs- which
        have their own branch of mathematical theory (called, coincidentally
        enough, graph theory) dedicated to them.

        > Now if I go and
        > look for trees in the Dictionary of Algorithms and Data Structures at
        > http://www.nist.gov/dads/ , or in Structure And Interpretation of Computer
        > Program by Abelson and Sussman, I get a... forest. But this forest seems to
        > be generally aimed at searching... on made-to-measure trees. So I'm in
        > trouble for recognizing these algorithms in real -life example. I wonder:

        You listed a whole slew of real-life uses of trees above :-). And you
        forgot/didn't notice hundreds of other examples.

        >
        > 1) What of the known tree structures is a plain XML file? I mean, the
        > structure is the one of the DTD, but does it have a specific name and theory
        > other than the one of Aristoteles, or is it "just a tree"?

        XML is just text. Now, once you *parse* the text, one valid way to view
        the text is as a tree. But it isn't the only way (for example, SAX views
        XML as a series of events- the events being SAX recognizing various tags).
        To be pendatic, it's DOM (or it's equivelents) which are trees, XML is
        just formatted text.

        >
        > 2) What of the known tree structures is its DOM? Is its
        > structure a footprint of teh XML's one, or does it come from some
        > other tradition or school of thought?

        DOM, the tree representation of an XML file, is a tree with different
        types of data of the nodes. You could implement it something like:

        type domData = Tag of string * (string * string) list * ...
        (* tag id, attributes, etc. *)
        | Cdata of string
        | ...

        type domNode = Leaf
        | Node of domData * domNode * domNode

        But note the similiarities between this definition and the definition from
        my previous post:

        type 'a tree = Leaf
        | Node of 'a * 'a tree * 'a tree

        I could have defined domNode just as:

        type domNode = domData tree

        The computer scientists just sort of assume you understand that
        generalization is going on. This allows them to think about trees in the
        abstract (and over the last 50 years, a lot of thinking about trees has
        gone on) and assume the programming will ungeneralize it as needed.

        >
        > 3) What of the known tree structures is a XML file with extended
        > links? Is it just a set of interwoven trees or is there a theory for
        > that, such for matrixes?

        Actually, how it works depends upon how the programmer decides to
        implement it. You can do it one of three obvious ways. First off, you
        can simply replace the link node with the root node of the tree the link
        node references. Or, the link node can simply contain a pointer to the
        root node of the tree it references, but still exist. Or the tree it
        references may or may not even exist in the program's memory, and instead
        the link node could contain data on how to fetch the tree (the URI/URL,
        for example). Depends upon what you want to do.

        >
        > 4) Trees can be ordered sets and for instance distinguish
        > left-branching and right-branching. Can they also implement n
        > dimension apart from ordered parenting and ordered sibling?

        Yes. A common data structure in graphs is the quad tree. Each node of a
        quad tree has four children. So the whole screeen (or window) would be
        the root node, and it's four children are the upper left, upper right,
        lower left, and lower right quadrants of the screen. Each quadrant in
        turn has four subquadrants, and so on. Where a particular graphics object
        is on the screen controls where it is in the quad tree. A variant of the
        quad tree for 3D is the oct tree. So these are examples of
        two-dimensional (or three-dimensional) partitioning of a tree.

        >
        > 5) Does linking between separate trees always require that a
        > supertree exists or can be thought of? (something like PXP's
        > superroots)

        No. See above.

        Brian
      • Stalkern 2
        ... You have been enlighting, Brian. I m touched and I have been thinking about your words until late night. Thank you so much... I m very curious about this
        Message 3 of 7 , May 11, 2003
          Il Friday 09 May 2003 17:50, Brian Hurt ha scritto:
          > On Fri, 9 May 2003, Stalkern 2 wrote:
          > > In Computer Science, trees are everywhere. Filesystems, LDAP, Document
          > > Bases, XML, and so on and so forth.
          > > Maybe it's one of the easiest recursive structures around.
          >
          > 1) They're easy to implement
          >
          > 2) They have very nice properties. In addition to fast searching, they
          > also have reasonably fast inserts and deletes.
          >
          > 3) Trees are a special case of the more general concept of graphs- which
          > have their own branch of mathematical theory (called, coincidentally
          > enough, graph theory) dedicated to them.

          You have been enlighting, Brian. I'm touched and I have been thinking about
          your words until late night.

          Thank you so much... I'm very curious about this knowledge.

          Ernesto
        • Stalkern 2
          ... [...] Following the words Nodes are doubly linked trees at http://www.ocaml-programming.de/packages/documentation/pxp/manual/x941.html I ve tried out
          Message 4 of 7 , May 13, 2003
            Il Monday 12 May 2003 00:59, Stalkern 2 ha scritto:
            > <html><body>
            >
            [...]

            Following the words "Nodes are doubly linked trees" at
            http://www.ocaml-programming.de/packages/documentation/pxp/manual/x941.html
            I've tried out trees as doubled-linked lists.
            I've been starting from
            http://caml.inria.fr/oreilly-book/html/book-ora031.html#toc43

            and wrote a type like

            type 'a node = {
            name : string ;
            mutable data : 'a ;
            mutable parent : 'a dll ;
            mutable level : int;
            mutable rank : int;
            mutable children : 'a dll list }
            and 'a dll = EmptyElement | Leaf of 'a | Node of 'a node ;;

            and a few functions:


            let add_child_by_name childName childData aNode =
            match aNode with
            EmptyElement ->
            (* when adding to a empty node, no parent and no children*)
            Node {
            name=childName ;
            data=childData ;
            parent=EmptyElement ;
            level=0 ;
            rank=0 ;
            children=[] }
            | Leaf s -> assert false
            | Node n as targetParent ->
            (* when adding a new Cell before a addressee Cell, we take as prev of
            the new Cell the former prev of addressee Cell and as next of the new Cell
            the addressee Cell itself.
            Then, we set the new Cell as previous of the adressee Cell and
            if the prev of the new Cell is not Empty, that could happen if the previous
            of the addressee cell was empty, we set as the new Cell as next of it*)
            let newNodeRecord = {
            name=childName ;
            data=childData ;
            parent=targetParent;
            level=(n.level + 1);
            rank=(List.length n.children) ;
            children=[] } in
            let newNode = Node newNodeRecord in
            n.children <- (n.children@[newNode]);
            targetParent;;



            let rec walk_vertically func aNode =

            let rec treat_nodeList aNodeList =
            match aNodeList with
            [] -> []
            | hd::tail ->
            let newHd =
            match hd with
            EmptyElement -> hd
            | Leaf s -> hd
            | Node n as aNode ->
            Node {
            name = n.name ;
            data = func n.data ;
            parent=n.parent ;
            level=n.level ;
            rank=n.rank ;
            children= (treat_nodeList n.children)
            } in

            newHd::(treat_nodeList tail)
            in
            treat_nodeList [aNode];;


            let rec show_structure data_to_string_function aNode =
            match aNode with
            EmptyElement -> ()
            | Leaf s -> ()
            | Node n ->
            let () = Printf.printf "%d %d %s %s\n" n.level n.rank n.name
            (data_to_string_function n.data) in
            let () = flush Pervasives.stdout in

            (List.iter (fun n -> show_structure data_to_string_function n)
            n.children);;


            And so:

            # let aT1 = add_child_by_name "twin child" 500 (
            add_child_by_name "twin child" 400 (
            add_child_by_name "second child" 300 (
            add_child_by_name "first child" 200 (
            add_child_by_name "root" 100 EmptyElement))));;

            # show_structure string_of_int aT1;;
            0 0 root 100
            1 0 first child 200
            1 1 second child 300
            1 2 twin child 400
            1 3 twin child 500
            - : unit = ()


            Apparently, this is very simple and can be switched to OOP very easily and so
            become as easy to use as PXP DOM.

            Is there any main drawback of this approach that I should know? For instance,
            I can't "see" this structure, the printout of ocaml loops; I can only set it.

            T I A
            Ernesto
          • Stalkern 2
            Il Tuesday 13 May 2003 12:04, Stalkern 2 ha scritto: [...] ... [...] About some kind of bad relationship between FP and circular structures... I have just
            Message 5 of 7 , May 13, 2003
              Il Tuesday 13 May 2003 12:04, Stalkern 2 ha scritto:
              [...]
              > Is there any main drawback of this approach that I should know? For
              > instance, I can't "see" this structure, the printout of ocaml loops; I can
              > only set it.
              [...]

              About some kind of bad relationship between FP and circular structures...

              I have just noticed that walking HORIZONTALLY a VERTICALLY double-linked tree
              is a kind of big mess if one goes the functional way (at least it was for me
              and I stopped my search) . In-place modification is quite easy.

              Cheers
              Ernesto
            • Stalkern 2
              ... Just to say that this function doesn t work. Now, I ve been converting the tree type to type a node = { name : string ; mutable data : a ; mutable parent
              Message 6 of 7 , May 14, 2003
                Il Tuesday 13 May 2003 12:04, Stalkern 2 ha scritto:
                > let add_child_by_name childName childData aNode =
                > match aNode with
                > EmptyElement ->
                > (* when adding to a empty node, no parent and no children*)
                > Node {
                > name=childName ;
                > data=childData ;
                > parent=EmptyElement ;
                > level=0 ;
                > rank=0 ;
                > children=[] }
                > | Leaf s -> assert false
                > | Node n as targetParent ->
                > (* when adding a new Cell before a addressee Cell, we take as prev
                > of the new Cell the former prev of addressee Cell and as next of the new
                > Cell the addressee Cell itself.
                > Then, we set the new Cell as previous of the adressee Cell and
                > if the prev of the new Cell is not Empty, that could happen if the
                > previous of the addressee cell was empty, we set as the new Cell as next of
                > it*) let newNodeRecord = {
                > name=childName ;
                > data=childData ;
                > parent=targetParent;
                > level=(n.level + 1);
                > rank=(List.length n.children) ;
                > children=[] } in
                > let newNode = Node newNodeRecord in
                > n.children <- (n.children@[newNode]);
                > targetParent;;

                Just to say that this function doesn't work.
                Now, I've been converting the tree type to

                type 'a node = {
                name : string ;
                mutable data : 'a ;
                mutable parent : 'a dll ;
                mutable level : int;
                mutable path : int list;
                mutable rank : int;
                mutable children : 'a dll list }
                and 'a dll = EmptyElement | Leaf of 'a | Node of 'a node ;;

                exception Inconsistent of string;;

                but I'm stuck in a sad problem, i.e. reparenting of added child nodes.
                The problem is that the tree is contained (in maximum n=depth passages) in
                every node (one can take any node and ask for parent, then for children of
                parent, then for parents of children, then for parents of parents of children
                and so on until all the tree has been walked) but the record structure
                implies that every node records its tree at least (1 = number of this node
                +(0 or 1 = number of parent nodes) +(number of children nodes)) times. The
                problem is every recording changes the tree itself! =8-O

                One technique that come sto my mind would be to have not a parent record, but
                something like a parent method, retracing the parent by its path (if I'm at
                node 0-4-2-1, parent will of course be node 0-4-2 and I could retrieve it
                when needed). But I don't know enough samples and I don't understand PXP
                enough to see what did its creators call "double-linked-trees" .

                Does anybody know the solution?

                T I A
                Ernesto
              • Stalkern 2
                ... Yes, somebody knew it! I have just been told that the mutable fields in a record pass values by reference, so if I duplicate something, it s my fault. I
                Message 7 of 7 , May 15, 2003
                  Il Wednesday 14 May 2003 13:30, Stalkern 2 ha scritto:
                  > but I'm stuck in a sad problem, i.e. reparenting of added child nodes.
                  > The problem is that the tree is contained (in maximum n=depth passages) in
                  > every node (one can take any node and ask for parent, then for children of
                  > parent, then for parents of children, then for parents of parents of
                  > children and so on until all the tree has been walked) but the record
                  > structure implies that every node records its tree at least (1 = number of
                  > this node +(0 or 1 = number of parent nodes) +(number of children nodes))
                  > times. The problem is every recording changes the tree itself! =8-O
                  >
                  > One technique that come sto my mind would be to have not a parent record,
                  > but something like a parent method, retracing the parent by its path (if
                  > I'm at node 0-4-2-1, parent will of course be node 0-4-2 and I could
                  > retrieve it when needed). But I don't know enough samples and I don't
                  > understand PXP enough to see what did its creators call
                  > "double-linked-trees" .
                  >
                  > Does anybody know the solution?

                  Yes, somebody knew it! I have just been told that the mutable fields in a
                  record pass values by reference, so if I duplicate something, it's my fault.
                  I should have passed records as such, without opening and rebuilding them.

                  Cheers
                  Ernesto
                Your message has been successfully submitted and would be delivered to recipients shortly.