!standard A.18.18(1) 09-02-04 AI05-0136-1/01 !class Amendment 09-02-04 !status work item 09-02-04 !status received 09-01-27 !priority Medium !difficulty Medium !subject Multiway tree container !summary Add a multiway tree container. !problem One of the most important advantages of using predefined containers over hand-building data structures is that the memory management is handled by the container. This is especially an advantage when the elements are indefinite types such as class-wide types. However, Ada 2005 does not provide any way to create a tree structure out of existing containers without requiring memory management. Consider representing a multiway tree using only the existing containers. Clearly, the list of child nodes of an element can easily be represented as a list container. That list would have to be part of the element as inserted in the tree - which of course is also the type of the elements in the list. Thus, the container has to appear in its own elements; implementing that necesarily includes an incomplete type and an access type (which could be anonymous). (If Ada did not require that, then the implementation of the containers would be significantly constrained; in particular, bounded forms would be incompatible with direct inclusion in the elements). The use of an access type here necessarily requires the use of allocators and thus hand-written memory management. Indeed, it probably harder to use one of the existing container in this way than it is to write the operations directly. This situation cries out for a solution, and the easiest one is to provide one or more appropriate containers. !proposal Add a multiway tree container. A multiway tree allows multiple child elements for each parent element. (Such a tree is sometimes known as parent-child-sibling tree.) A multiway tree is the most general tree; many problems map directly into it (see examples for a number of such cases). !wording The language-defined generic package Containers.Multiway_Trees provides private types List and Cursor, and a set of operations for each type. A multiway tree container is well-suited to represent nested structures. A multiway tree container object manages a tree of internal nodes, each of which contains an element and pointers to the parent, first child, last child, next (successor) sibling, and previous (predecessor) sibling internal nodes. A cursor designates a particular node within a tree (and by extension the element contained in that node). A cursor keeps designating the same node (and element) as long as the node is part of the container, even if the node is moved in the container. Static Semantics The generic library package Containers.Multiway_Trees has the following declaration: generic type Element_Type is private; with function "=" (Left, Right : Element_Type) return Boolean is <>; package Ada.Containers.Multiway_Trees is pragma Preelaborate(Multiway_Trees); pragma Remote_Types(Multiway_Trees); type Tree is tagged private; pragma Preelaborable_Initialization(Tree); type Cursor is private; pragma Preelaborable_Initialization(Cursor); Empty_Tree : constant Tree; No_Element : constant Cursor; function "=" (Left, Right : Tree) return Boolean; function Is_Empty (Container : Tree) return Boolean; procedure Clear (Container : in out Tree); function Element (Position : Cursor) return Element_Type; procedure Replace_Element (Container : in out Tree; Position : in Cursor; New_Item : in Element_Type); procedure Query_Element (Position : in Cursor; Process : not null access procedure (Element : in Element_Type)); procedure Update_Element (Container : in out Tree; Position : in Cursor; Process : not null access procedure (Element : in out Element_Type)); procedure Move (Target : in out Tree; Source : in out Tree); procedure Delete (Container : in out Tree; Position : in out Cursor); function Find (Container : Tree; Item : Element_Type; Position : Cursor := No_Element) return Cursor; function Reverse_Find (Container : Tree; Item : Element_Type; Position : Cursor := No_Element) return Cursor; function Contains (Container : Tree; Item : Element_Type) return Boolean; function Has_Element (Position : Cursor) return Boolean; procedure Iterate (Container : in Tree; Process : not null access procedure (Position : in Cursor)); function Child_Length (Container : Tree; Parent : Cursor) return Count_Type; function Child_Depth (Parent, Child : Cursor) return Count_Type; procedure Insert_Child (Container : in out Tree; Parent : in Cursor; Before : in Cursor; New_Item : in Element_Type; Count : in Count_Type := 1); procedure Insert_Child (Container : in out Tree; Parent : in Cursor; Before : in Cursor; New_Item : in Element_Type; Position : out Cursor; Count : in Count_Type := 1); procedure Insert_Child (Container : in out Tree; Parent : in Cursor; Before : in Cursor; Position : out Cursor; Count : in Count_Type := 1); procedure Prepend_Child (Container : in out Tree; Parent : in Cursor; New_Item : in Element_Type; Count : in Count_Type := 1); procedure Append_Child (Container : in out Tree; Parent : in Cursor; New_Item : in Element_Type; Count : in Count_Type := 1); procedure Delete_Child (Container : in out Tree; Parent : in Cursor; Position : in out Cursor; Count : in Count_Type := 1); procedure Delete_First_Child (Container : in out Tree; Parent : in Cursor; Count : in Count_Type := 1); procedure Delete_Last_Child (Container : in out Tree; Parent : in Cursor; Count : in Count_Type := 1); procedure Splice (Target : in out Tree; Parent : in Cursor; Before : in Cursor; Source : in out Tree); procedure Splice (Target : in out Tree; Parent : in Cursor; Before : in Cursor; Source : in out Tree; Position : in out Cursor); procedure Splice (Container: in out Tree; Parent : in Cursor; Before : in Cursor; Position : in Cursor); procedure Splice_Children (Container: in out Tree; Parent : in Cursor; Before : in Cursor; Position : in Cursor); function Parent (Position : Cursor) return Cursor; function First_Child (Container : Tree; Parent : Cursor) return Cursor; function First_Child_Element (Container : Tree; Parent : Cursor) return Element_Type; function Last_Child (Container : Tree; Parent : Cursor) return Cursor; function Last_Child_Element (Container : Tree; Parent : Cursor) return Element_Type; function Next_Sibling (Position : Cursor) return Cursor; function Previous_Sibling (Position : Cursor) return Cursor; procedure Next_Sibling (Position : in out Cursor); procedure Previous_Sibling (Position : in out Cursor); procedure Iterate_Child (Container : in Tree; Parent : in Cursor; Process : not null access procedure (Position : in Cursor)); procedure Reverse_Iterate_Child (Container : in Tree; Parent : in Cursor; Process : not null access procedure (Position : in Cursor)); private ... -- not specified by the language end Ada.Containers.Multiway_Trees; [** Rest of the wording TBD. See the discussion for a discussion of the model and some notes. **] [An indefinite version would be provided with similar differences as the existing indefinite containers.] !discussion The author has found that almost all of his large projects contained a least one multiway tree, as it is an ideal structure for representing nesting. Flatter structures such as the existing containers do not handle nesting well. The name "multiway tree" seems to be the most common one for this data structure in the literature. (It sometimes is written "multi-way tree".) We do not provide an indexing method in this tree; this is purely a structuring data structure. If indexing is needed, a set or map container (already provided by the Ada Containers library) is probably more appropriate. Model: The model is that every element in the tree has an associated doubly-linked list of children. Thus we provide almost all of the operations of the list container, each with an added Parent parameter. We did not change the names of the operations for familiarity. We added "_Child" to each name in order to describe clearly what is being accessed. We considered leaving the names the same (without the "_Child" suffix), but that seemed confusing, as it would be easy to believe that First always represents the first node of the root of the tree. Leaving off "_Child" also allows some operations to be defined over the entire tree, such as Find, and some versions of iterate. Note that the root of the tree is also a list of elements. (Thus the tree is multirooted as well as multiway.) We considered allowing only a single root node, but it seemed inconsistent for no important reason, and moreover, allowing additional roots allows bottom-up tree creation where subtrees are stashed in the root until their parent is created (at which time Slice_Child would be used to make it subtree of the parent). Note that AI05-0001-1 proposes to add some new routines to all of the existing containers; these weren't included in this proposal but presumably would be included. Similarly, we expect that a bounded form would be provided. For all of the subprograms that take a Parent parameter, if the parameter is No_Element, the node is inserted at the root of the tree. This is consistent with the meaning of the Before parameter in the Insert routines in the existing list containers. Notes on specific subprograms: Operations that were dropped are: Swap and Swap_List; Reverse_Elements; and the sorting operations. Delete is provided to delete an arbitrary node from the tree. We don't provide a Count parameter on that routine, because it is not obvious what is being counted (sibling nodes only? child nodes only? both sibling and child nodes?) Iterate walks the entire tree starting with the root list of nodes. Each node in every child list is visited in order; after each node is visited, the child list of that of that node are visited. This is done recursively. Child_Length returns the number of Children nodes of Parent. Child_Depth returns the number of ancestor nodes of Child, up to (but not including Parent). If Parent is No_Element, returns the number of ancestor nodes of Child. Raises Program_Error if Parent is not No_Element, and is not an ancestor of Child. For Insert_Child, if the Before parameter is not a child of Parent, Program_Error is propagated. Delete_Child is similar to Delete, but (a) the node to delete must be a child of the Parent; (b) a Count is provided to delete several children. We didn't add "_Child" to Splice, as it usually will be splicing entire subtrees. As with Insert_Child, if Before is not a child of Parent, Program_Error is propagated. Splice moves the node(s) and all child nodes of the node to the new position in the (new) tree. Splice_Children moves only the child nodes of the position to the new position. Selector function Parent returns the parent of the node represented by the given cursor. Next and Previous were renamed Next_Sibling and Previous_Sibling to make the meaning clearer (they work just like the list versions). Note that we left the Container parameter in routines like First_Child, because the Parent parameter can be No_Element to represent the root, and we need to know where to look for that root. An alternative design would be to have a separate set of routines for accessing the root, but that would clutter the package and complicate use (a lot of insertion code would have to explicitly check to see if it is inserting at the root: a scourge of hand-written code that we don't want to duplicate in the containers!). Iterate_Child and Iterate_Reverse_Child just iterate over the child nodes of Parent (and no others). Further ideas: * More iterators could be useful. One could imagine wanting to visit the child nodes before their parents, as well as a straight breadth-first iteration. One possible way to (re)structure the overall iterator is to have two process routines, one that is called when a node is first visited, and the second after the children are visited. (That would allow a more direct mapping of the HTML example below.) This would look something like: procedure Iterate (Container : in Tree; Process_Before_Children : access procedure (Position : in Cursor); Process_After_Children : access procedure (Position : in Cursor)); (Note that these subprograms are allowed to be null, so that if one or the other isn't needed, it can be skipped.) The biggest objection to such an iterator is that it can't be mapped into for-loop-like syntax (one hopes that some future version of Ada will allow that). Advantages of the multiway tree container over hand-created trees: (1) Complete memory management, including for indefinite elements. (2) Avoidance of linking mistakes during operations like insertions and deletions. (3) More safety than hand-built structures -- tampering checks and possible dangling pointer (cursor) detection. (4) Completely linked (doubly-linked sibling lists; complete parent and child pointers), so walking in any direction in the tree is always O(1). (5) Built-in iterators. (6) Container users get streaming for free; streaming of a tree is fairly hard to do correctly. Disadvantages of the multiway tree container: (1) Not as familiar as some other containers; may not be used as much for that reason. (Authors can help aleviate this situation somewhat.) (2) Not as general as a general graph structure. But a general graph structure is much harder to describe, it is harder to manage memory for, and is not as commonly needed as a tree. (3) More memory management options are available for hand-created code. But that also provides a lot more opportunities for bugs. !examples (1) Representing HTML: (XML is similar) type Root_HTML_Element is tagged private; function Init_Element return Root_HTML_Element; procedure Output_Prefix (Element : Root_Element_Type); procedure Output_Suffix (Element : Root_Element_Type); type Text is Root_HTML_Element with record The_Text : Unbounded_String; end record; procedure Output_Prefix (Element : Bold_Element_Type); -- Writes The_Text procedure Output_Suffix (Element : Bold_Element_Type); -- Writes nothing. type Bold_HTML_Element is new Root_HTML_Element with null record; procedure Output_Prefix (Element : Bold_Element_Type); -- Writes procedure Output_Suffix (Element : Bold_Element_Type); -- Writes -- Similar overridings of Output_Prefix and Output_Suffix for each type; -- we won't show them here. type Italic_HTML_Element is new Root_HTML_Element with null record; type Div_HTML_Element is new Root_HTML_Element with record Class : Class_Name; Style : Style_String; -- other attributes. end record; type Span_HTML_Element is new Root_HTML_Element with record Class : Class_Name; Style : Style_String; -- other attributes. end record; type Header_HTML_Element is new Root_HTML_Element with null record; type Body_HTML_Element is new Root_HTML_Element with null record; -- and so on. package HTML_Tree is new Ada.Containers.Multiway_Trees (Root_HTML_Element'Class); -- The following HTML code: