!standard A.18.18(1) 09-02-04 AI05-0136-1/01 !class Amendment 09-02-04 !status work item 09-02-04 !status received 09-01-27 !priority Medium !difficulty Medium !subject Multiway tree container !summary Add a multiway tree container. !problem One of the most important advantages of using predefined containers over hand-building data structures is that the memory management is handled by the container. This is especially an advantage when the elements are indefinite types such as class-wide types. However, Ada 2005 does not provide any way to create a tree structure out of existing containers without requiring memory management. Consider representing a multiway tree using only the existing containers. Clearly, the list of child nodes of an element can easily be represented as a list container. That list would have to be part of the element as inserted in the tree - which of course is also the type of the elements in the list. Thus, the container has to appear in its own elements; implementing that necesarily includes an incomplete type and an access type (which could be anonymous). (If Ada did not require that, then the implementation of the containers would be significantly constrained; in particular, bounded forms would be incompatible with direct inclusion in the elements). The use of an access type here necessarily requires the use of allocators and thus hand-written memory management. Indeed, it probably harder to use one of the existing container in this way than it is to write the operations directly. This situation cries out for a solution, and the easiest one is to provide one or more appropriate containers. !proposal Add a multiway tree container. A multiway tree allows multiple child elements for each parent element. (Such a tree is sometimes known as parent-child-sibling tree.) A multiway tree is the most general tree; many problems map directly into it (see examples for a number of such cases). !wording The language-defined generic package Containers.Multiway_Trees provides private types List and Cursor, and a set of operations for each type. A multiway tree container is well-suited to represent nested structures. A multiway tree container object manages a tree of internal nodes, each of which contains an element and pointers to the parent, first child, last child, next (successor) sibling, and previous (predecessor) sibling internal nodes. A cursor designates a particular node within a tree (and by extension the element contained in that node). A cursor keeps designating the same node (and element) as long as the node is part of the container, even if the node is moved in the container. Static Semantics The generic library package Containers.Multiway_Trees has the following declaration: generic type Element_Type is private; with function "=" (Left, Right : Element_Type) return Boolean is <>; package Ada.Containers.Multiway_Trees is pragma Preelaborate(Multiway_Trees); pragma Remote_Types(Multiway_Trees); type Tree is tagged private; pragma Preelaborable_Initialization(Tree); type Cursor is private; pragma Preelaborable_Initialization(Cursor); Empty_Tree : constant Tree; No_Element : constant Cursor; function "=" (Left, Right : Tree) return Boolean; function Is_Empty (Container : Tree) return Boolean; procedure Clear (Container : in out Tree); function Element (Position : Cursor) return Element_Type; procedure Replace_Element (Container : in out Tree; Position : in Cursor; New_Item : in Element_Type); procedure Query_Element (Position : in Cursor; Process : not null access procedure (Element : in Element_Type)); procedure Update_Element (Container : in out Tree; Position : in Cursor; Process : not null access procedure (Element : in out Element_Type)); procedure Move (Target : in out Tree; Source : in out Tree); procedure Delete (Container : in out Tree; Position : in out Cursor); function Find (Container : Tree; Item : Element_Type; Position : Cursor := No_Element) return Cursor; function Reverse_Find (Container : Tree; Item : Element_Type; Position : Cursor := No_Element) return Cursor; function Contains (Container : Tree; Item : Element_Type) return Boolean; function Has_Element (Position : Cursor) return Boolean; procedure Iterate (Container : in Tree; Process : not null access procedure (Position : in Cursor)); function Child_Length (Container : Tree; Parent : Cursor) return Count_Type; function Child_Depth (Parent, Child : Cursor) return Count_Type; procedure Insert_Child (Container : in out Tree; Parent : in Cursor; Before : in Cursor; New_Item : in Element_Type; Count : in Count_Type := 1); procedure Insert_Child (Container : in out Tree; Parent : in Cursor; Before : in Cursor; New_Item : in Element_Type; Position : out Cursor; Count : in Count_Type := 1); procedure Insert_Child (Container : in out Tree; Parent : in Cursor; Before : in Cursor; Position : out Cursor; Count : in Count_Type := 1); procedure Prepend_Child (Container : in out Tree; Parent : in Cursor; New_Item : in Element_Type; Count : in Count_Type := 1); procedure Append_Child (Container : in out Tree; Parent : in Cursor; New_Item : in Element_Type; Count : in Count_Type := 1); procedure Delete_Child (Container : in out Tree; Parent : in Cursor; Position : in out Cursor; Count : in Count_Type := 1); procedure Delete_First_Child (Container : in out Tree; Parent : in Cursor; Count : in Count_Type := 1); procedure Delete_Last_Child (Container : in out Tree; Parent : in Cursor; Count : in Count_Type := 1); procedure Splice (Target : in out Tree; Parent : in Cursor; Before : in Cursor; Source : in out Tree); procedure Splice (Target : in out Tree; Parent : in Cursor; Before : in Cursor; Source : in out Tree; Position : in out Cursor); procedure Splice (Container: in out Tree; Parent : in Cursor; Before : in Cursor; Position : in Cursor); procedure Splice_Children (Container: in out Tree; Parent : in Cursor; Before : in Cursor; Position : in Cursor); function Parent (Position : Cursor) return Cursor; function First_Child (Container : Tree; Parent : Cursor) return Cursor; function First_Child_Element (Container : Tree; Parent : Cursor) return Element_Type; function Last_Child (Container : Tree; Parent : Cursor) return Cursor; function Last_Child_Element (Container : Tree; Parent : Cursor) return Element_Type; function Next_Sibling (Position : Cursor) return Cursor; function Previous_Sibling (Position : Cursor) return Cursor; procedure Next_Sibling (Position : in out Cursor); procedure Previous_Sibling (Position : in out Cursor); procedure Iterate_Child (Container : in Tree; Parent : in Cursor; Process : not null access procedure (Position : in Cursor)); procedure Reverse_Iterate_Child (Container : in Tree; Parent : in Cursor; Process : not null access procedure (Position : in Cursor)); private ... -- not specified by the language end Ada.Containers.Multiway_Trees; [** Rest of the wording TBD. See the discussion for a discussion of the model and some notes. **] [An indefinite version would be provided with similar differences as the existing indefinite containers.] !discussion The author has found that almost all of his large projects contained a least one multiway tree, as it is an ideal structure for representing nesting. Flatter structures such as the existing containers do not handle nesting well. The name "multiway tree" seems to be the most common one for this data structure in the literature. (It sometimes is written "multi-way tree".) We do not provide an indexing method in this tree; this is purely a structuring data structure. If indexing is needed, a set or map container (already provided by the Ada Containers library) is probably more appropriate. Model: The model is that every element in the tree has an associated doubly-linked list of children. Thus we provide almost all of the operations of the list container, each with an added Parent parameter. We did not change the names of the operations for familiarity. We added "_Child" to each name in order to describe clearly what is being accessed. We considered leaving the names the same (without the "_Child" suffix), but that seemed confusing, as it would be easy to believe that First always represents the first node of the root of the tree. Leaving off "_Child" also allows some operations to be defined over the entire tree, such as Find, and some versions of iterate. Note that the root of the tree is also a list of elements. (Thus the tree is multirooted as well as multiway.) We considered allowing only a single root node, but it seemed inconsistent for no important reason, and moreover, allowing additional roots allows bottom-up tree creation where subtrees are stashed in the root until their parent is created (at which time Slice_Child would be used to make it subtree of the parent). Note that AI05-0001-1 proposes to add some new routines to all of the existing containers; these weren't included in this proposal but presumably would be included. Similarly, we expect that a bounded form would be provided. For all of the subprograms that take a Parent parameter, if the parameter is No_Element, the node is inserted at the root of the tree. This is consistent with the meaning of the Before parameter in the Insert routines in the existing list containers. Notes on specific subprograms: Operations that were dropped are: Swap and Swap_List; Reverse_Elements; and the sorting operations. Delete is provided to delete an arbitrary node from the tree. We don't provide a Count parameter on that routine, because it is not obvious what is being counted (sibling nodes only? child nodes only? both sibling and child nodes?) Iterate walks the entire tree starting with the root list of nodes. Each node in every child list is visited in order; after each node is visited, the child list of that of that node are visited. This is done recursively. Child_Length returns the number of Children nodes of Parent. Child_Depth returns the number of ancestor nodes of Child, up to (but not including Parent). If Parent is No_Element, returns the number of ancestor nodes of Child. Raises Program_Error if Parent is not No_Element, and is not an ancestor of Child. For Insert_Child, if the Before parameter is not a child of Parent, Program_Error is propagated. Delete_Child is similar to Delete, but (a) the node to delete must be a child of the Parent; (b) a Count is provided to delete several children. We didn't add "_Child" to Splice, as it usually will be splicing entire subtrees. As with Insert_Child, if Before is not a child of Parent, Program_Error is propagated. Splice moves the node(s) and all child nodes of the node to the new position in the (new) tree. Splice_Children moves only the child nodes of the position to the new position. Selector function Parent returns the parent of the node represented by the given cursor. Next and Previous were renamed Next_Sibling and Previous_Sibling to make the meaning clearer (they work just like the list versions). Note that we left the Container parameter in routines like First_Child, because the Parent parameter can be No_Element to represent the root, and we need to know where to look for that root. An alternative design would be to have a separate set of routines for accessing the root, but that would clutter the package and complicate use (a lot of insertion code would have to explicitly check to see if it is inserting at the root: a scourge of hand-written code that we don't want to duplicate in the containers!). Iterate_Child and Iterate_Reverse_Child just iterate over the child nodes of Parent (and no others). Further ideas: * More iterators could be useful. One could imagine wanting to visit the child nodes before their parents, as well as a straight breadth-first iteration. One possible way to (re)structure the overall iterator is to have two process routines, one that is called when a node is first visited, and the second after the children are visited. (That would allow a more direct mapping of the HTML example below.) This would look something like: procedure Iterate (Container : in Tree; Process_Before_Children : access procedure (Position : in Cursor); Process_After_Children : access procedure (Position : in Cursor)); (Note that these subprograms are allowed to be null, so that if one or the other isn't needed, it can be skipped.) The biggest objection to such an iterator is that it can't be mapped into for-loop-like syntax (one hopes that some future version of Ada will allow that). Advantages of the multiway tree container over hand-created trees: (1) Complete memory management, including for indefinite elements. (2) Avoidance of linking mistakes during operations like insertions and deletions. (3) More safety than hand-built structures -- tampering checks and possible dangling pointer (cursor) detection. (4) Completely linked (doubly-linked sibling lists; complete parent and child pointers), so walking in any direction in the tree is always O(1). (5) Built-in iterators. (6) Container users get streaming for free; streaming of a tree is fairly hard to do correctly. Disadvantages of the multiway tree container: (1) Not as familiar as some other containers; may not be used as much for that reason. (Authors can help aleviate this situation somewhat.) (2) Not as general as a general graph structure. But a general graph structure is much harder to describe, it is harder to manage memory for, and is not as commonly needed as a tree. (3) More memory management options are available for hand-created code. But that also provides a lot more opportunities for bugs. !examples (1) Representing HTML: (XML is similar) type Root_HTML_Element is tagged private; function Init_Element return Root_HTML_Element; procedure Output_Prefix (Element : Root_Element_Type); procedure Output_Suffix (Element : Root_Element_Type); type Text is Root_HTML_Element with record The_Text : Unbounded_String; end record; procedure Output_Prefix (Element : Bold_Element_Type); -- Writes The_Text procedure Output_Suffix (Element : Bold_Element_Type); -- Writes nothing. type Bold_HTML_Element is new Root_HTML_Element with null record; procedure Output_Prefix (Element : Bold_Element_Type); -- Writes procedure Output_Suffix (Element : Bold_Element_Type); -- Writes -- Similar overridings of Output_Prefix and Output_Suffix for each type; -- we won't show them here. type Italic_HTML_Element is new Root_HTML_Element with null record; type Div_HTML_Element is new Root_HTML_Element with record Class : Class_Name; Style : Style_String; -- other attributes. end record; type Span_HTML_Element is new Root_HTML_Element with record Class : Class_Name; Style : Style_String; -- other attributes. end record; type Header_HTML_Element is new Root_HTML_Element with null record; type Body_HTML_Element is new Root_HTML_Element with null record; -- and so on. package HTML_Tree is new Ada.Containers.Multiway_Trees (Root_HTML_Element'Class); -- The following HTML code:
Sample HTML
This is normal text; this is bold text; and this is italic text
-- would be represented as the following operations: Sample : HTML_Tree.Tree; Header, HTML_Body, Paragraph, Working: HTML_Tree.Cursor; HTML_Tree.Insert_Child (Sample, Parent => HTML_Tree.No_Element, Position => Header, New_Element => Header_HTML_Element'(Init_Element with null record)); HTML_Tree.Insert_Child (Sample, Parent => Header, Position => HTML_Body, New_Element => Body_HTML_Element'(Init_Element with null record)); HTML_Tree.Insert_Child (Sample, Parent => HTML_Body, Position => Paragraph, New_Element => Div_HTML_Element'(Init_Element with Class => "Text-Header", Style => "")); HTML_Tree.Insert_Child (Sample, Parent => Paragraph, New_Element => Text'(Init_Element with The_Text => "Sample HTML")); HTML_Tree.Insert_Child (Sample, Parent => HTML_Body, Position => Working, New_Element => Div_HTML_Element'(Init_Element with Class => "Text-Body", Style => "")); Paragraph := Working; HTML_Tree.Insert_Child (Sample, Parent => Paragraph, Position => Working, New_Element => Text'(Init_Element with The_Text => "This is normal text; ")); HTML_Tree.Insert_Child (Sample, Parent => Working, Position => Working, New_Element => Bold_HTML_Element'(Init_Element with null record)); HTML_Tree.Insert_Child (Sample, Parent => Working, New_Element => Text'(Init_Element with The_Text => "this is bold text")); HTML_Tree.Insert_Child (Sample, Parent => Paragraph, Position => Working, New_Element => Text'(Init_Element with The_Text => "; and ")); HTML_Tree.Insert_Child (Sample, Parent => Paragraph, Position => Working, New_Element => Italic_HTML_Element'(Init_Element with null record)); HTML_Tree.Insert_Child (Sample, Parent => Working, New_Element => Text'(Init_Element with The_Text => "this is italic text")); Outputing the original HTML could be accomplished as follows: procedure Output_Document (Document : in HTML_Tree.Tree) is procedure Output_Node (Node : HTML_Tree.Cursor) is Node_Element : Root_HTML_Element'Class := HTML_Tree.Element (Node); begin Output_Prefix (Node_Element); -- Dispatches. Iterate_Children (Document, Parent => Node, Process => Output_Node'Access); Output_Suffix (Node_Element); -- Dispatches. end Output_Node; begin Output_Node (HTML_Tree.First (Parent => No_Element)); end Output_Document; (2) Windows in a GUI. Windows in a GUI typically are structured as a multiway tree: each window can have a (theroetically) unlimited number of child windows. For instance, in Microsoft Windows, controls like buttons, check boxes, and edit controls are child windows. A program that manipulates a GUI in some way could represent the set of windows in an application as a multiway tree container, putting the burden of memory management on the container. [Author's note: sure would have liked that for the Claw GUI builder...] (3) E-mail messages. The MIME RFCs that define how attachments are structured in an e-mail message define that a message is divided into a series of sections, each with their own headers. One possible content of an e-mail message section is an attached e-mail message -- which has its own set of headers and sections. Thus, an e-mail parser needs to be able to structure the message so that it can look into and process these nested messages. (That's especially important for anti-spam and anti-malware processing -- it would not do to allow a bad message through simply because of nesting.) An e-mail processor could be structured to represent each section as one node in a multiway tree. The top-level sections would be root nodes, but any nested messages would be child nodes of their parent. Here again, we would put the memory management burden on the container, rather than the programmer of the processor. (4) Compiler symboltable. The symboltable in a compiler is also a multiway tree. Entities declared in a single declarative part are a list (including order). Of course, many kinds of entities have their own declarations (in Ada, records have components and discriminants; subprograms have parameters; blocks have declarative parts; etc.) This maps naturally into a multiway tree, where the entities that belong to a particular entity are represented as a child list of the tree. If a compiler needed to save a symboltable for some reason, it could easily be done using the streaming operations of the container. [Author's note: sure would have saved a lot of work for Janus/Ada...but perhaps wouldn't be fast enough.] A debugging symboltable dump could be done as a simple iterator: procedure Dump_Symboltable (Table : Symbol.Tree) is procedure Process (Node : Symbol_Cursor) is begin Put ((1 .. Child_Depth(Node)*2) => ' '); -- Indent based on the depth. Dump_Symbol (Element (Node)); end Process; begin Iterate (Tree, Process'Access); end Process; !ACATS test !appendix From: Randy Brukardt Date: Tuesday, January 27, 2009 12:39 AM Back in November, Tucker and I had the following exchange: Tucker: Tree structures are painful to work with without access types. Tagged types, access types, and heterogeneous trees are natural bedfellows. In this era of XML everything, heterogeneous trees are not just for compiler writers anymore. Randy: I surely agree that tagged types and heterogeneous trees are natural bedfellows. I'm not so certain about the access types, though. I thought that the Set containers were supposed to cover a significant portion of the need for trees. If they don't, then I think we ought to discuss what containers will. (Let's work on tomorrow's ways to program, not yesterday's.) Tucker: You can use a tree to implement a set, but you can't really use a set to implement a tree. You could implement a tree using a map, but that's pretty painful. If I want to represent something that is tree like (e.g. an abstract syntax tree, an XML file, a CAD drawing, etc.) I really need pointers of some sort. Randy (today): Having thought about this a lot more, I tend to agree that you can't use any of the existing containers to create a tree structure. But that means giving up the benefits of containers (storage management is done for you; correctness - links get made correctly; streaming is provided) in favor of doing things the same old way (with the same old set of bugs). In the past, Tucker has contended that it would be harder to use such a container than to write it yourself. I originally bought that argument, but I'm now more dubious, since memory management (especially of indefinite types) is hard to get right. So I decided to work out a definition of a multiway tree container. Having done that, and written some example code for an HTML processor, I've pretty decided that Tucker's position here is a load of MBE(*). But decide for yourself - I've attached trial ballon #4. (*) MBE => Male Bovine Excrement. :-) [Attached was version /01 of this AI.] **************************************************************** From: Randy Brukardt Date: Thursday, January 29, 2009 9:10 PM A couple of additions to my proposal for a Multiway_Tree container: ---- Add after the definition of No_Element: Root : Cursor renames No_Element; This will make adding to the root more readable (it doesn't change any semantics). For instance, in the example: HTML_Tree.Insert_Child (Sample, Parent => HTML_Tree.Root, Position => Header, New_Element => Header_HTML_Element'(Init_Element with null record)); ---- The other two forms of Slice also should have Splice_Children variants: procedure Splice_Children (Target : in out Tree; Parent : in Cursor; Before : in Cursor; Source : in out Tree); procedure Splice_Children (Target : in out Tree; Parent : in Cursor; Before : in Cursor; Source : in out Tree; Position : in out Cursor); Obviously, these operations should be consistent: it would be odd to find some possibilities missing for no obvious reason. These could come up quite a bit in practice. For instance, imagine an HTML editor based on the example code in the proposal. If you were to remove bold facing (for example), you would have to move the children of the bold node to up to the position of the bold node, then delete the bold node. That would use the original Splice_Children routine. If you have multiple HTML documents to operate on, it would make sense to move parts from one to another (cut-and-paste, for instance), and that could need moving the children. **************************************************************** From: Niklas Holsti Date: Friday, January 30, 2009 2:30 AM > Root : Cursor renames No_Element; > > This will make adding to the root more readable (it doesn't change any > semantics). I find the name "Root" here quite confusing. More below. > For instance, in the example: > > HTML_Tree.Insert_Child ( > Sample, > Parent => HTML_Tree.Root, > Position => Header, > New_Element => Header_HTML_Element'( > Init_Element with null record)); If I read the above, without remembering the suggested definition of "Root" as "No_Element" and the special meaning of Parent => No_Element, it clearly seems to mean that the new child is inserted below the root, not as the (or a) new root. It is quite likely that application code that builds a (single-rooted) tree will have a variable called Root, and the nearly identical call > HTML_Tree.Insert_Child ( > Sample, > Parent => Root, -- CHANGED > Position => Header, > New_Element => Header_HTML_Element'( > Init_Element with null record)); then has a very different effect. I think that defining Multiway_Tree.Root in this proposed way invites mistakes, and I find the original form "Parent => No_Element" clearer. Another comment: In the call to Insert_Child in the above example, and in fact in all Insert_Child calls in the HTML example in the proposal, there is no actual for the "Before" parameter, which is required in all three forms of Insert_Child as defined in the proposal. I assume that "Before => No_Element" is intended. I do welcome this Multiway_Tree suggestion. I think it fills a gap and will be useful. **************************************************************** From: Randy Brukardt Date: Friday, January 30, 2009 9:34 PM ... > > For instance, in the example: > > > > HTML_Tree.Insert_Child ( > > Sample, > > Parent => HTML_Tree.Root, > > Position => Header, > > New_Element => Header_HTML_Element'( > > Init_Element with null record)); > > If I read the above, without remembering the suggested definition of > "Root" as "No_Element" and the special meaning of Parent => > No_Element, it clearly seems to mean that the new child is inserted > below the root, not as the (or a) new root. Humm, I see your point. Perhaps it would be better named "As_Root" or something like that? I don't think that the use of Parent => No_Element is particularly understandable. Originally, I had planned a second set of functions for accessing the root: HTML_Tree.Insert_Root ( Sample, Position => Header, New_Element => Header_HTML_Element'( Init_Element with null record)); But a lot of the time, you would end up making the code conditional on whether you're inserting at the root or in a child, which doesn't seem helpful. So I was looking for a way to make it clearer that you are inserting the root of the tree. ... > Another comment: In the call to Insert_Child in the above example, and > in fact in all Insert_Child calls in the HTML example in the proposal, > there is no actual for the "Before" > parameter, which is required in all three forms of Insert_Child as > defined in the proposal. I assume that "Before => No_Element" is > intended. Yes, it is. I had really intended to use Append_Child, but since that doesn't return the position of the inserted element, it can't be used when it is intended to insert things on children. > I do welcome this Multiway_Tree suggestion. I think it fills a gap and > will be useful. Thanks, good to know I'm not alone on this one. **************************************************************** From: Niklas Holsti Date: Saturday, January 31, 2009 3:17 AM > Humm, I see your point. Perhaps it would be better named "As_Root" or > something like that? I also thought about "As_Root", and that would be OK in a positional call: Insert_Child (Sample, As_Root, ...). The named form, "Parent => As_Root", is better than "Parent => Root", but I'm still not happy with it. > I don't think that the use of Parent => No_Element is particularly > understandable. I do. If you insert a node in a tree and specify that it shall have no parent node, obviously (to me) the node becomes a root. I would not object to defining "As_Root : Cursor renames No_Element", but I don't think it is very beneficial. Logically one would then expect also definitions like As_Last : Cursor renames No_Element; for use as the Before parameter in Insert operations, but again I prefer "Before => No_Element" to "Before => As_Last". It is again obvious (to me) that inserting an element in a list with the specification that it shall be before no other element means that the element is appended as the new last element in the list. > Originally, I had planned a second set of functions for accessing the root: > > HTML_Tree.Insert_Root ( > Sample, > Position => Header, > New_Element => Header_HTML_Element'( > Init_Element with null record)); > > But a lot of the time, you would end up making the code conditional on > whether you're inserting at the root or in a child, which doesn't seem > helpful. I agree. I think the "Parent => No_Element" mechanism is a good solution. Perhaps it could be augmented by defining some root-specific operations, such as Insert_Root, analogously with the operations Prepend and Append and defined as equivalent to Insert_Child (.. Parent => No_Element ...). These could be used to increase clarity when it is statically known that the operation is on a root. To avoid cluttering the main package with these root-specific operations, why not define a child package, say Multiway_Trees.Roots, with operations such as Multiway_Trees.Roots.Insert and so on. **************************************************************** From: Jean-Pierre Rosen Date: Saturday, January 31, 2009 7:38 PM > Humm, I see your point. Perhaps it would be better named "As_Root" or > something like that? I don't think that the use of Parent => > No_Element is particularly understandable. Well, anybody using a tree should know that the only node without a parent is the root, so I doesn't seem illogical to say that you want to insert at the root by saying that you want to insert at the place that has no parent. Not a big deal anyway. ****************************************************************