!standard A.17                                   04-02-13  AI95-00302-04/00
!class amendment 04-02-13
!status work item 04-02-13
!status received 04-02-13
!priority Medium
!difficulty Hard
!subject Container library (mail container)

!summary

This is a dummy AI created solely to hold the voluminous mail on this topic.
See AI-302-03 for the actual proposal.

!problem

!proposal

!wording

!example

--!corrigendum A.17

!ACATS Test

!appendix

[Editor's note: For mail earlier than February 6, 2004, see AI-302-3.]

****************************************************************

From: Randy Brukardt
Sent: Thursday, February 5, 2004 3:48 PM

Jeffrey Carter wrote:

...

> > I personally consider an extensible array (i.e. a vector) a useful and
> > important standard container. I don't feel the same way about a linked
> > list, because it is so easy to implement what you want, and there
> > are so many options when it comes to how to link the objects
> > together that having a standard container for that hardly
> > seems worthwhile (IMHO).
>
> I have no objections to an extensible array, provided it's clearly
> identified as such. I think it should look different from the proposal,
> but that's mainly a taste issue. I'd want direct analogs to indexing,
> both LHS and RHS (Put and Get?); slices, both LHS and RHS (Replace_Slice
> and Slice?); and 'First, 'Last, and 'Length (though 'First is a constant
> for an EA). An equivalent to 'range would be nice, but impossible. The
> only difference to a normal array would be that Put and Replace_Slice
> can accept indices not in First .. Last. I haven't given it a great deal
> of thought, so I'm sure I'm missing some subtleties, but I don't see a
> need for Front, Back, Insert, Delete, and so on.

Let's see:
- direct analogs to indexing, both LHS and RHS (Element, Replace_Element);
- slices (nope);
- 'First (First), 'Last (Last), 'Length (Length);

Looks like pretty much everything is in there. And slicing will be expensive
if the implementation is not a straight array, so it's somewhat dubious.
Insert and Delete provide easier ways of adding or removing items than
slices - and how often do you use a slice of a non-string type for something
other than inserting or deleting elements anyway??

Ada doesn't (and isn't going to) support user-defined indexing or
user-defined attributes, so this is about the best you can do. So what's the
complaint (other than the name)??

> The proposal says that containers "that are relatively easy to code,
> redundant, or rarely used are omitted". It also says that lists are
> difficult to implement correctly.

I think that's a mistake; only very rare operations are difficult to code.
We didn't update every piece of the original text, and that one is
misleading.

> Given a list, structures such as
> deques, stacks, and especially queues are easy to implement. Since
> queues are common structures and not redundant (none of the proposed
> containers provides an efficient implementation of a queue), the
> proposal itself seems to argue that lists should be provided, since they
> are not easy to code correctly, and provide a basis for the user to
> easily code queues.

The user can easily code a queue in terms of a Vector (that's one of the
uses of Insert!). We dropped the list component because it had an identical
interface to the Vector component, but was less flexible (no computed O(1)
access). In any case, efficiency is not a goal of the standard containers.
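[Editor's note: a minimal sketch of the "queue in terms of a Vector" idiom
mentioned above. The vector operations are taken as generic formals here so
that nothing depends on the exact profiles in the draft; the names Append,
Delete, Element, First and Length follow the discussion but are assumptions,
not quotations from the proposal.]

   generic
      type Element_Type is private;
      type Index_Type is (<>);
      type Vector_Type is limited private;
      with procedure Append (Vector : in out Vector_Type; New_Item : in Element_Type) is <>;
      with procedure Delete (Vector : in out Vector_Type; Index : in Index_Type) is <>;
      with function Element (Vector : Vector_Type; Index : Index_Type) return Element_Type is <>;
      with function First (Vector : Vector_Type) return Index_Type is <>;
      with function Length (Vector : Vector_Type) return Natural is <>;
   package Vector_Queues is
      procedure Enqueue (Queue : in out Vector_Type; Item : in Element_Type);
      procedure Dequeue (Queue : in out Vector_Type; Item : out Element_Type);
      function Is_Empty (Queue : Vector_Type) return Boolean;
   end Vector_Queues;

   package body Vector_Queues is

      procedure Enqueue (Queue : in out Vector_Type; Item : in Element_Type) is
      begin
         Append (Queue, New_Item => Item);        -- add at the back
      end Enqueue;

      procedure Dequeue (Queue : in out Vector_Type; Item : out Element_Type) is
      begin
         Item := Element (Queue, First (Queue));  -- take the front element
         Delete (Queue, Index => First (Queue));  -- and remove it
      end Dequeue;

      function Is_Empty (Queue : Vector_Type) return Boolean is
      begin
         return Length (Queue) = 0;
      end Is_Empty;

   end Vector_Queues;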
It would be incorrect for the standard to specify performance to the point
that only a single implementation would be possible. Moreover, we anticipate
a secondary standard that *does* try to provide more control over
performance (by adding lists, bounded forms, etc.).

In my view, it is a mistake for projects to depend on standard containers
where there are critical performance requirements (not just time, but also
space as well). In that case, you really have to have control of the
implementation -- you really need *all* of the source code. You can't trust
something provided by the standard (or your compiler vendor) in those cases.

In any case, the purpose of these containers is to provide a seed and a
standard direction. I would hope that they would reduce the tower of Babel
that Ada containers are nowadays - by providing a style for other containers
to follow. No one is suggesting that these are sufficient to solve all
programming problems - just 80% of them, especially in prototypes and in
Q&D programs.

****************************************************************

From: Martin Dowie
Sent: Thursday, February 5, 2004 5:50 PM

> Dowie, Martin (UK) wrote:
> > I could but wasn't part of the purpose of the library to allow us to
> > do common things more easily? And I'd have to say I'd use a 'Quit'
> > version a _lot_ more than the current process everything,
> > every time one.
>
> It would be helpful if you could be specific about what kind of
> container you were using.

I was thinking, primarily, of a project that used single (bounded) lists to
hold commands (a basic, domain-specific, scripting language I guess), one of
which was 'stop this sequence of commands'.

This pattern has since shown itself to be quite common in embedded systems -
for either domain-specific scripting languages or graphics.

There is the other idiom where one is processing an iteration of items and
an external event occurs that stops the processing - e.g. the 'stop' button
is pushed on a GUI-search window, but it could equally be a 50Hz message
over a 1553.

****************************************************************

From: Randy Brukardt
Sent: Thursday, February 5, 2004 6:14 PM

> I was thinking, primarily, of a project that used single (bounded) lists to
> hold commands (a basic, domain-specific, scripting language I guess),
> one of which was 'stop this sequence of commands'.

My understanding of the model is that passive iterators are only for cases
where you want to iterate over the entire container. Thus, this is clearly a
use for an active iterator.

Indeed, given the iteration model of packages, there's hardly any reason to
use a passive iterator. They're harder to write (a subprogram and
instantiation are required), and (especially if a Quit parameter is
provided), harder to understand.

We dropped the passive iterator from the Ada.Directories package precisely
because even ARG members were confused about how it worked. Even though it
was a classic passive iterator with a Quit parameter. Perhaps the confusion
really was the Quit parameter (I thought it was the whole idea), but in any
case, you've got to keep them simple.

> This pattern has since shown itself to be quite common in embedded
> systems - for either domain-specific scripting languages or graphics.
>
> There is the other idiom where one is processing an iteration of items
> and an external event occurs that stops the processing - e.g. the 'stop'
> button is pushed on a GUI-search window, but it could equally be a
> 50Hz message over a 1553.
It seems to me that an abort situation is best handled by propagating an
exception. Otherwise, you end up distributing termination code/flags
everywhere in the application. But YMMV.

****************************************************************

From: Jeffery Carter
Sent: Thursday, February 5, 2004 6:39 PM

Matthew Heaney wrote:
> Alexandre K. and Jeff C. are both incorrect. The container is not a
> list, not a Java_Vector, not an Extensible_Array, and not a
> Flexible_Array.

Matthew H. is incorrect. The data structure is not a vector. I am at least
as qualified as Matthew H. to make such pronouncements.

****************************************************************

From: Jeffery Carter
Sent: Friday, February 6, 2004 1:05 PM

A comment on type names.

Ada 83, with the unfortunate* exception of File_Type, did not use "_Type" on
the end of predefined type names. We have Address and Count, not
Address_Type and Count_Type. Ada 95 adhered to this principle, so we have
Storage_Element and Unbounded_String, not Storage_Element_Type and
Unbounded_String_Type.

For consistency, I think the Ada-0X process should also adhere to this
principle. The use of "_Type" on type names in the proposal should be
eliminated. This takes some time and thought to do well; I am willing to
volunteer for the effort if the Committee cannot spare the time and cannot
find anyone preferable.

This is a matter of consistency. While it is not my style, and not
recommended by the Quality and Style Guide, I have used libraries that use
the "_Type" convention without problem. I am far more concerned that the ARM
be consistent than I am about which convention it uses.

*"Unfortunate" because it is inconsistent.

****************************************************************

From: Matthew Heaney
Sent: Friday, February 6, 2004 9:33 AM

I have updated the reference implementation, which now has the sorted set
container, too. There's also a test_sets.adb, so you have something to run.
You can pass a seed on the command line.

I'll take care of the hashed map containers this weekend, and post Mon AM.

****************************************************************

From: Matthew Heaney
Sent: Friday, February 6, 2004 3:36 PM

Martin Dowie wrote:
> I was thinking, primarily, of a project that used single (bounded) lists to
> hold commands (a basic, domain-specific, scripting language I guess),
> one of which was 'stop this sequence of commands'.

It sounds like you have a sequence container, that you traverse from front
to back. The only sequence container in the proposal is a vector, which
doesn't have a passive iterator. Again, I recommend just using a loop:

   for Index in First (V) .. Last (V) loop
      declare
         Command : Command_Type := Element (V, Index);
      begin
         exit when Is_Stop (Command);
         -- process command
      end;
   end loop;

If these are commands that have an order (say, each command has a timestamp,
and commands are executed in timestamp order), then you can use the sorted
set. Again, an explicit loop is appropriate:

   declare
      I : Cursor_Type := First (S);
      J : constant Cursor_Type := Back (S);
   begin
      while I /= J loop
         declare
            Command : Command_Type := Element (I);
         begin
            exit when Is_Stop (Command);
            -- process command
         end;

         Increment (I);
      end loop;
   end;

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Friday, February 6, 2004 4:24 PM

> The only sequence container in the proposal is a vector,

Ah, yes, it's Sequence - quite right name for that container (and not
Vector).
****************************************************************

From: Jeffrey Carter
Sent: Friday, February 6, 2004 7:17 PM

Randy Brukardt wrote:
> Let's see:
> - direct analogs to indexing, both LHS and RHS (Element, Replace_Element);
> - slices (nope);
> - 'First (First), 'Last (Last), 'Length (Length);
>
> Looks like pretty much everything is in there. And slicing will be expensive
> if the implementation is not a straight array, so it's somewhat dubious.
> Insert and Delete provide easier ways of adding or removing items than
> slices - and how often do you use a slice of a non-string type for something
> other than inserting or deleting elements anyway??

Slicing isn't included because C++ doesn't have slices, so it's a foreign
concept to its library and users. If we want to attract users of inferior
languages to Ada, it should be because Ada is better. Ada's slices are a way
that Ada is better; Ada's standard extensible array component should be
better than its competition by also offering them. I do not see mimicking
C++'s shortcomings as advisable.

Insertion and deletion are basic operations of lists, but not of arrays.
That's why the list and vector components had the same set of operations:
they both specify lists with different implementations.

Since String is an array, and [Un]Bounded_String is an extensible array, and
we're now told the correct name is Vector, shouldn't these be renamed to
something like Character_Vector?

> Ada doesn't (and isn't going to) support user-defined indexing or
> user-defined attributes, so this is about the best you can do. So what's the
> complaint (other than the name)??

I don't expect user-defined indexing, slices, or attributes, which is why I
talked about "analogs" to them. Missing slices is one complaint. And, yes,
the name is unarguably wrong.

In the C family of languages, users are accustomed to having to look at
implementations in order to understand how to use something. Subprogram
"prototypes" (yet another misused term to add to the collection) are
generally insufficient, and appropriate comments are often lacking. So it
comes as no surprise to me that C++ expects newcomers to its library,
looking for an extensible array and not finding anything with an appropriate
name, to have to look at the operations of the components to find that the
inappropriately named "vector" is really an extensible array. However, this
is not the Ada way, and I think it completely inappropriate to mimic this
mistake. Looking at other languages' libraries to select useful components
is fine; insisting that an Ada version must be identical to that of another
language, including mistakes, is not.

> The user can easily code a queue in terms of a Vector (that's one of the
> uses of Insert!). We dropped the list component because it had an identical
> interface to the Vector component, but was less flexible (no computed O(1)
> access).

The performance of a queue based on an extensible array is likely to be just
as objectionable as extracting an element from an extensible array based on
a list. That the vector and list components both had the same interface is
further evidence that mimicking the STL is a bad idea. Insert and delete are
as foreign to an extensible array as indexing and slicing should be to a
list.

> In my view, it is a mistake for projects to depend on standard containers
> where there are critical performance requirements (not just time, but also
> space as well).
> In that case, you really have to have control of the
> implementation -- you really need *all* of the source code. You can't trust
> something provided by the standard (or your compiler vendor) in those cases.

I agree. That doesn't mean that the standard shouldn't provide a basis for
queues with performance characteristics suitable for performance
non-critical applications, which an extensible array does not provide.

****************************************************************

From: Randy Brukardt
Sent: Friday, February 6, 2004 8:24 PM

Jeff Carter wrote:

...

> I agree. That doesn't mean that the standard shouldn't provide a basis
> for queues with performance characteristics suitable for performance
> non-critical applications, which an extensible array does not provide.

Huh? You've said, in effect, that the performance isn't good enough for
applications where the performance doesn't matter. That's a pretty goofy
statement!

My opinion has not changed: if you care about performance *at all*, you
*cannot* depend on *any* standard containers. But usually the performance
does not matter at all (or so little as to be equivalent to not at all): the
number of elements in the container is small (which would be true for
virtually all queues), and/or it is used infrequently, and/or the
application is a throw-away.

Otherwise, if you are writing portable code, you shouldn't use a predefined
container library at all -- the performance is likely to vary much more
across implementations than code you write yourself. For instance, on
Janus/Ada, any generic list container is going to run 2-5 times slower than
the same list created yourself -- that's just the effect of the extra call
overhead and the shared body (which means the elements will be dynamically
allocated - separately - in any case - at least doubling the allocation
overhead). I'd expect that effect to be much less on GNAT, for example,
because they don't share generic bodies and thus don't have the double
allocation overhead.

If your application doesn't care about the component being 5 times slower,
then it is highly unlikely that it is going to care about whether the
Vector/Sequence/List component is implemented as an array, as a list, as a
tree, as a hash table, or as something else. My preference with these
components would be to say absolutely nothing about performance or
implementation (because anything said is as meaningless as real-time metrics
are). But others believe that that would cause real portability problems,
and I'm willing to go along with that.

The problem I see is a lot of people are looking far too closely at tiny
pieces of abstractions. You might have a queue or a list as part of a large
abstraction, but they're pretty much useless by themselves. And given that
creating a queue or stack (both of which have only two operations, both
trivial!) would take 3 minutes max, it makes no sense to use a complex (and
necessarily slow) container library for just that -- indeed, it probably
would be more work to use a container than the 3 minutes.

I much prefer the vision of this containers library, where the only
containers included are those that are large, complex, multi-purpose, and
have a clear abstraction.

****************************************************************

From: Jeffrey Carter
Sent: Friday, February 6, 2004 7:39 PM

Matthew Heaney wrote:
> No. Vector iterators are fragile, and hence very error prone.

Modifying a structure from an iterator should be a bounded error.
> They are fragile because the (logical) internal array gets thrown away
> during expansion, which invalidates the iterator. It's too hard to keep
> track of whether a vector iterator is still valid, and most of the time
> you end up with a dangling reference.

You can only talk about what happens internally during an operation if a
specific implementation is required, which Randy assures us is not the case.

> A "set" is really any sorted sequence of items. If you want set
> intersection, symmetric difference, etc, then just use a generic
> algorithm. See the Charles library for such algorithms.

I've used sets for decades, in discrete math, in specification languages
such as Z, and in programming. A set is an unordered collection of elements
from a universe that provides operations such as membership, union,
intersection, and the like, represented by mathematical symbols that I can't
reliably represent in an e-mail. An implementation of a set may be sorted to
speed up operations, but that's a feature of the implementation, not of the
concept implemented. That's a distinction that many users of C-family
languages seem unable to make, but that I expect from those who embrace Ada.

> The name for Delete_Sans_Increment comes from Emacs lisp, which has the
> functions file-name-sans-extension and file-name-sans-versions.

Yet another case of mimicking others' errors.

> It was also in homage to Ada's French history, given that her original
> designer was French, and worked for a French company.
>
> Why do you think "rendevous" was named that way?

"Rendezvous" is not a predefined identifier in the ARM. It was chosen
because no English word has the precise meaning intended, and Ada's
designers understood the importance of precise terminology.

> If you don't immediately grok how vectors and sets and maps work, then I
> suggest familiarizing yourself with the STL. There are lots of tutorials
> on the WWW.

I've been using arrays, including extensible arrays, sets, and maps for
decades. I've also been using vectors for decades, having done a lot of
scientific programming that required matrix math. I doubt that a study of
C++ mistakes would have any effect besides raising my blood pressure.

****************************************************************

From: Jeffrey Carter
Sent: Friday, February 6, 2004 7:22 PM

Randy Brukardt wrote:
> Precisely my point. That is intended to say that there is a logical array in
> the container, but not necessarily an actual one. Matt's descriptions were
> too implementation-specific, and we moved most of that. But I'm not
> surprised that some was missed.

On closer inspection, the Size and Resize operations certainly imply an
array implementation; they are meaningless otherwise.

****************************************************************

From: Randy Brukardt
Sent: Friday, February 6, 2004 9:09 PM

Huh? Resize tells the container a reasonable size to use; what the container
does with that information is up to it. Size simply returns that
information.

That's no different than many of the attributes in Ada, which (if set)
always return the values that they were set to. But what the compiler does
with those values is (almost) completely implementation-defined.

The only real requirement here is O(1) element access (which prevents the
use of a straight linked list).
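[Editor's note: a small usage sketch of Resize read as a capacity hint, as
described above. V, Append, New_Item and Next_Item are placeholders; only
the Size parameter name follows the draft's description.]

   Resize (V, Size => 10_000);              -- hint: roughly 10_000 elements expected;
                                            -- what the container does with it is up to it
   for I in 1 .. 10_000 loop
      Append (V, New_Item => Next_Item (I));
   end loop;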
Janus/Ada will probably use an array of pointers (or possibly an array of
arrays of pointers); we're going to be (implicitly) allocating the elements
anyway, so we might as well do it explicitly and take advantage of that to
make Insert/Delete/Sort (and any expansions) much cheaper (presuming the
elements are bigger than scalar types). An array of arrays of pointers is
even better, because insertion cost is bounded by the maximum size of an
array chunk -- but there is more overhead and complexity, so I'd like to see
some real uses before deciding on an implementation.

Note that a pure list component has no real opportunity for "better"
implementations, and indeed, any implementation on Janus/Ada would suffer
from "double" allocation.

****************************************************************

From: Martin Dowie
Sent: Saturday, February 7, 2004 4:02 AM

> We dropped the passive iterator from the Ada.Directories package precisely
> because even ARG members were confused about how it worked. Even though it
> was a classic passive iterator with a Quit parameter. Perhaps the confusion
> really was the Quit parameter (I thought it was the whole idea), but in any
> case, you've got to keep them simple.

I didn't find it confusing so I provided an extra child
Ada.Directories.Iterate - and I've used it repeatedly!

> > This pattern has since shown itself to be quite common in embedded
> > systems - for either domain-specific scripting languages or graphics.
> >
> > There is the other idiom where one is processing an iteration of items
> > and an external event occurs that stops the processing - e.g. the 'stop'
> > button is pushed on a GUI-search window, but it could equally be a
> > 50Hz message over a 1553.
>
> It seems to me that an abort situation is best handled by propagating an
> exception. Otherwise, you end up distributing termination code/flags
> everywhere in the application. But YMMV.

I have tended to work in deeply embedded systems, where exceptions (in any
language!) are at best frowned upon and quite often forbidden! :-(

****************************************************************

From: Martin Dowie
Sent: Saturday, February 7, 2004 4:25 AM

> > I was thinking, primarily, of a project that used single (bounded) lists to
> > hold commands (a basic, domain-specific, scripting language I guess),
> > one of which was 'stop this sequence of commands'.
>
> It sounds like you have a sequence container, that you traverse from
> front to back.

Pretty much, although we also read in where each 'First' is, as the whole
thing contained many 'subroutines'.

> The only sequence container in the proposal is a vector, which doesn't
> have a passive iterator. Again, I recommend just using a loop:

I suspect the first thing I will do is add an extra child generic
subprogram Ada.Containers.Vectors.Iterate! :-)

****************************************************************

From: Martin Krischik
Sent: Saturday, February 7, 2004 6:16 AM

> I suspect the first thing I will do is add an extra child generic
> subprogram Ada.Containers.Vectors.Iterate! :-)

Well, I guess you don't use GNAT. GNAT gets quite upset if you try to add
something to the Ada packages.
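[Editor's note: a sketch of the kind of user-written passive iterator being
discussed. It does not have to be a child of Ada.Containers.Vectors; an
ordinary library-level generic built on the First/Last/Element operations
shown earlier will do, and the Quit parameter follows the Ada.Directories
style mentioned above. The formal-package profile and the name
Vector_Iterate are assumptions, not part of the proposal.]

   with Ada.Containers.Vectors;
   generic
      with package Vectors is new Ada.Containers.Vectors (<>);
      with procedure Process (Item : in Vectors.Element_Type;
                              Quit : in out Boolean);
   procedure Vector_Iterate (V : in Vectors.Vector_Type);

   procedure Vector_Iterate (V : in Vectors.Vector_Type) is
      use Vectors;
      Quit : Boolean := False;
   begin
      for Index in First (V) .. Last (V) loop   -- a null range when the vector is empty
         Process (Element (V, Index), Quit);
         exit when Quit;                        -- stop early, e.g. on a 'stop' command
      end loop;
   end Vector_Iterate;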
****************************************************************

From: Marius Amado Alves
Sent: Saturday, February 7, 2004 7:45 PM

I'd expect *any* compiler to get really upset with this ;-)

****************************************************************

From: Martin Dowie
Sent: Sunday, February 8, 2004 2:08 AM

"gcc -gnatg" or "gnatmake -a" will stop any warnings :-)

****************************************************************

From: Martin Krischik
Sent: Saturday, February 7, 2004 5:09 AM

> Jeffrey Carter wrote:
> > Given a list, structures such as
> > deques, stacks, and especially queues are easy to implement. Since
> > queues are common structures and not redundant (none of the proposed
> > containers provides an efficient implementation of a queue), the
> > proposal itself seems to argue that lists should be provided, since they
> > are not easy to code correctly, and provide a basis for the user to
> > easily code queues.
>
> The user can easily code a queue in terms of a Vector (that's one of the
> uses of Insert!). We dropped the list component because it had an identical
> interface to the Vector component, but was less flexible (no computed O(1)
> access).

True enough. But if you wanted to build a generic queue on top of the
vector, the tag should not be hidden from view. Otherwise one needs to
repeat all the access methods instead of just renaming the ones provided by
the parent package.

In fact, the hidden tag is the one feature which I really dislike in
Charles.

****************************************************************

From: Stephen Leake
Sent: Saturday, February 7, 2004 8:40 AM

"Randy Brukardt" writes:
> Report of the ARG Select Committee on Containers
> February 3, 2004

Thanks for the committee's hard work on this.

What is the rationale for making the Map Key_Type definite, as opposed to
indefinite? Since an indefinite Key_Type is required for
Containers.Maps.Strings, why not make that capability available to the
users? I don't see a discussion of this in AI-302-03/01.

Another point: Containers.Vectors.Size should return Index_Type'Base, and
the Size parameter in Resize should also be Index_Type'Base. It's confusing
to have different types for Size and Index. There's also a problem if
Natural'Last < Index_Type'Last; you can't have a vector that contains every
index!

****************************************************************

From: Randy Brukardt
Sent: Saturday, February 7, 2004 6:03 PM

> What is the rationale for making the Map Key_Type definite, as opposed
> to indefinite?

The 'committee' primarily adopted the existing proposal submitted by Matt
Heaney. We decided not to change any of the major design decisions of that
proposal - because no package will suit everyone or every need, and we felt
it was more important to standardize something coherently designed for most
needs than to fiddle endlessly with it and risk introducing serious bugs.

Which is to say, I don't know. :-)

> Since an indefinite Key_Type is required for
> Containers.Maps.Strings, why not make that capability available to the
> users?

We definitely expect that the strings container will use a purpose-built
data structure for storing strings, not some general indefinite item
capability. Ways to compactly and efficiently store sets of varying size
strings are well known and commonly used. Such algorithms could be extended
to a general "unconstrained array of elementary", but that hardly seems to
be a worthwhile definition for keys.

...
> Another point: Containers.Vectors.Size should return Index_Type'Base,
> and the Size parameter in Resize should also be Index_Type'Base. It's
> confusing to have different types for Size and Index.
>
> There's also a problem if Natural'Last < Index_Type'Last; you
> can't have a vector that contains every index!

Yes, that's a serious problem on Janus/Ada (Integer is 16-bit). However, you
want the Size and Resize operations to take a numeric type that contains
zero -- and certainly Index_Type is not that. Index_Type could be a subtype
of an enumeration type or a subtype of a modular type (neither of which can
contain zero) or a subtype of an integer type not containing zero.

We had a short, inconclusive discussion about whether the index type ought
to be range <> rather than (<>) (because enumeration and modular types fail
the assertion and thus aren't directly usable), but that still doesn't
guarantee a zero. Moreover, if the integer type has negative numbers, then
the Length of the vector could be larger than Index_Type'Last. So I don't
see a great solution.

I wondered about using "Hash_Type" here (it has the correct properties), but
that seems like a misuse of the type (and a bad idea in a library that most
Ada programmers will read - you want to show them good style in standard
libraries).

****************************************************************

From: Martin Krischik
Sent: Saturday, February 7, 2004 5:15 AM

> The performance of a queue based on an extensible array is likely to be
> just as objectionable as extracting an element from an extensible array
> based on a list. That the vector and list components both had the same
> interface is further evidence that mimicking the STL is a bad idea.
> Insert and delete are as foreign to an extensible array as indexing and
> slicing should be to a list.

Well, it depends. Most queues are not supposed to grow indefinitely, so
using a vector with a modular type as index will give you good performance.
Every Ada tutorial contains an example of how to do it.

****************************************************************

From: Martin Krischik
Sent: Saturday, February 7, 2004 6:14 AM

> The committee selected the second proposal as a starting point for a
> standard containers library, with a number of simple changes. The
> changes were simple enough that we produced a version of the library with
> the changes made (AI-00302-3/01).

Any place where I can actually read the draft?

Anyway, looking at the reference implementation from Matthew Heaney (thanks
for the quick response) I have an improvement to suggest:

   type Element_Type is private;

I said this before: that is too limiting. With that signature you can't even
store strings. And more importantly, you can't store Element'Class. In fact
I predict that with that signature 80% of all data stored will be "access to
something".

I have often heard that Ada does not need garbage collection since a good
container library should take care of memory management - and now I am
ready to follow that point. But taking that argument, vector is not a good
container. Since vector will need heap storage anyway and performance is
only a minor issue, I suggest:

   type Element_Type (<>) is private;

****************************************************************

From: Randy Brukardt
Sent: Saturday, February 7, 2004 6:05 PM

> Any place where I can actually read the draft?

The same place that you can read any other AI: www.ada-auth.org.
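[Editor's note: a minimal sketch of what the "type Element_Type (<>) is
private" suggestion above implies: with an indefinite formal, the container
itself allocates and frees copies of the elements, which is exactly the
memory management the user is spared. All names below are illustrative and
not part of any proposal.]

   generic
      type Element_Type (<>) is private;  -- indefinite: String, T'Class, ... are legal actuals
   package Element_Holders is
      type Holder_Type is limited private;
      procedure Set (Holder : in out Holder_Type; Item : in Element_Type);
      function Get (Holder : Holder_Type) return Element_Type;
   private
      type Element_Access is access Element_Type;
      type Holder_Type is record
         Ref : Element_Access;            -- null until Set is called
      end record;
   end Element_Holders;

   with Ada.Unchecked_Deallocation;
   package body Element_Holders is

      procedure Free is new Ada.Unchecked_Deallocation (Element_Type, Element_Access);

      procedure Set (Holder : in out Holder_Type; Item : in Element_Type) is
      begin
         Free (Holder.Ref);                      -- drop any previous value
         Holder.Ref := new Element_Type'(Item);  -- the container, not the user, allocates
      end Set;

      function Get (Holder : Holder_Type) return Element_Type is
      begin
         return Holder.Ref.all;                  -- raises Constraint_Error if never Set
      end Get;

   end Element_Holders;

   --  e.g.  package String_Holders is new Element_Holders (Element_Type => String);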
****************************************************************

From: Martin Krischik
Sent: Sunday, February 8, 2004 4:58 AM

I looked there, but I only found a very long discussion, not the actual
concluding decision.

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004 6:03 PM

Don't know what you're looking for, but certainly the entire AI is posted
there. As with all AIs, the !wording section is what goes into the standard.

****************************************************************

From: Martin Krischik
Sent: Saturday, February 7, 2004 6:24 AM

> > The only sequence container in the proposal is a vector,
>
> Ah, yes, it's Sequence - quite right name for that container (and not
> Vector).

No, in my book elements in a Sequence have only relative positions, or at
least the relative position is the primary position and the absolute
position is only secondary. That is: Get_Next (V); is faster than, or as
fast as, Get (V, 5);

****************************************************************

From: Martin Krischik
Sent: Saturday, February 7, 2004 6:32 AM

> My understanding of the model is that passive iterators are only for cases
> where you want to iterate over the entire container.

Yes.

> Indeed, given the iteration model of packages,
> there's hardly any reason to use a passive iterator.

Passive iterators should always provide the fastest means to iterate over
the whole container. They should do so by knowing the internals of the
container. Of course it only matters in advanced containers with B-Trees or
AVL-Trees as internal structure. But I have only seen those in IBM's Open
Class Library (which is far better than the STL). But there are no advanced
containers in AI-302.

****************************************************************

From: Randy Brukardt
Sent: Saturday, February 7, 2004 6:21 PM

> Passive iterators should always provide the fastest means to iterate over
> the whole container. They should do so by knowing the internals of the
> container.

That might be true in a language with a built-in iterator construct, but it
is certainly not true in Ada because of the overhead of calling the generic
formal subprogram for each element. In Janus/Ada, the overhead of calling a
formal subprogram is at least double that of a normal subprogram (we have to
save and restore display information, because you could be calling into a
more nested scope than the generic body -- something that normally isn't
possible in Ada).

Other compilers may not have that overhead, but they'll certainly have call
overhead. Whereas, the explicit loop iterator for Vectors only needs to call
Element. So the call overhead is at best a wash, and at worst much worse for
the passive iterator. Moreover, the compiler is a lot more likely to be able
to in-line the call to Element (which likely has a pretty simple
implementation and thus will meet the in-lining qualifications), than the
bunch of arbitrary code in the Process formal routine.

So, a passive iterator will only be faster in complex containers (where you
have to separate the Element and Successor functions). For a Vector (where
the language already has the needed iteration mechanism built-in), it's
going to be slower (or, if you're really lucky, the same speed) and it
certainly is a lot harder to write. So I think having it on Vector would
simply be for consistency; you'd never actually use it if you know you're
dealing with a Vector.
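[Editor's note: for comparison, the two forms being weighed above, written
against the hypothetical Vector_Iterate generic sketched in an earlier note;
Command_Type, Is_Stop, Process_Command, Command_Vectors and V are
placeholders. The active form is just a loop; the passive form needs a named
subprogram plus an instantiation.]

   --  Active iterator: the loop itself.
   for Index in First (V) .. Last (V) loop
      exit when Is_Stop (Element (V, Index));
      Process_Command (Element (V, Index));
   end loop;

   --  Passive iterator: a Process subprogram plus an instantiation.
   declare
      procedure Process (Item : in Command_Type; Quit : in out Boolean) is
      begin
         Quit := Is_Stop (Item);
         if not Quit then
            Process_Command (Item);
         end if;
      end Process;

      procedure Iterate is new Vector_Iterate (Command_Vectors, Process);
   begin
      Iterate (V);
   end;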
****************************************************************

From: Robert A. Duff
Sent: Saturday, February 7, 2004 7:22 PM

> Other compilers may not have that overhead, but they'll certainly have call
> overhead. Whereas, the explicit loop iterator for Vectors only needs to call
> Element. So the call overhead is at best a wash, and at worst much worse for
> the passive iterator. Moreover, the compiler is a lot more likely to be able
> to in-line the call to Element (which likely has a pretty simple
> implementation and thus will meet the in-lining qualifications), than the
> bunch of arbitrary code in the Process formal routine.

I don't see why the compiler shouldn't inline the Process routine, assuming
the compiler isn't doing shared generics. They're usually small, but anyway,
the Process routine is typically called exactly once, so it shouldn't matter
how big it is.

****************************************************************

From: Randy Brukardt
Sent: Saturday, February 7, 2004 7:33 PM

Most compilers have limitations on what can be inlined; Process (which
contains arbitrary code) is far more likely to violate one of those
limitations than Element (which never changes and is likely to be very
simple). In addition, many compilers only inline when you give pragma
Inline, and you can't do that on a generic formal.

****************************************************************

From: Robert A. Duff
Sent: Saturday, February 7, 2004 7:43 PM

If Process violates whatever these arbitrary restrictions are, then sure,
you can't get it inlined. But typically Process is very simple -- often just
one line of code that calls some other procedure to do the real work,
passing some additional parameters. Process isn't a "real" procedure,
conceptually -- it's just the body of a loop.

In my current project, we make heavy use of the generic iterator pattern,
and I think that in many many cases, Process is just a line or two of code.
(And if it's more, inlining is relatively less important.)

>... In addition, many compilers only inline when you give pragma
> Inline, and you can't do that on a generic formal.

You give the inline on the actual. In non-sharing implementations, that
should apply inside the instance. And the iterator procedure itself can be
inlined, too.

****************************************************************

From: Randy Brukardt
Sent: Saturday, February 7, 2004 8:04 PM

Certainly it's not real (which is one thing I dislike about passive
iterators in Ada - but we've discussed that before), but if it is very short
(or the bodies of your loops are typically very short), then your
programming style must be very different from mine. The only loops that I
write that are very short are those that I probably shouldn't have written
in the first place (like the one finding the last '.' in a string) --
there's a routine somewhere in Ada.Strings that will do the job, but looking
it up is more work than writing the loop. (And a lot of them would be
replaced by a Vector/List/Sequence container if I had one.)

But just looking at the spam filter I'm working on at this moment: the
average loop length is about 25 lines, the median is around 8 lines. (There
are more short loops than I would have guessed. But most of them wouldn't
exist if I had a container to use instead - most of them are insert-at-end
or delete-specific-item from a list.)

...

> You give the inline on the actual. In non-sharing implementations,
> that should apply inside the instance.
> And the iterator procedure
> itself can be inlined, too.

At which point, you *equal* the performance of the active iterator. And only
if *everything* goes right. The OP claimed that the passive iterator would
always have better performance, and that's certainly not true for the vector
container. I doubt that it would be true for the Map container, either. It
could be true for a complex container, but those aren't commonly used.

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Saturday, February 7, 2004 7:55 PM

Martin Krischik wrote:
> > > The only sequence container in the proposal is a vector,
> >
> > Ah, yes, it's Sequence - quite right name for that container (and not Vector).
>
> No, in my book elements in a Sequence have only relative positions, or at
> least the relative position is the primary position and the absolute
> position is only secondary.

I don't know in which domain your book grew up, but I can assure you that in
mathematics (and by extension in physics and other natural sciences as they
use mathematical apparatus) elements of a sequence are commonly indexed, and
those indices are always treated as absolute position (which may be zero or
even negative). By the way, your book is also certainly not from
Biology/Genetics, where term "sequence" is used heavily, and they often
speak about both absolute and relative positions in sequences.

We have clearly different usage of terms "vector" and "sequence":
substantial part of today's software engineering (tools and books) use them
one way, while mathematics (and all natural sciences that use it heavily)
always use them another way. So all the argument about Vector/Sequence here
is about Ada's choice of preference: will Ada choose software engineering
(effectively, Java and C++ libraries) side or mathematical/scientific side
on this issue. I suppose (or hope) that the thesis "Ada is for problem
space, not for solution space" implies the latter.

****************************************************************

From: Martin Krischik
Sent: Sunday, February 8, 2004 11:40 AM

> I don't know in which domain your book grew up, but I can assure you

It's the English dictionary: "Aufeinanderfolge, Reihenfolge, Szene,
Zeitfolge". Ah, you don't speak German. Well, let's look for "Reihenfolge"
in a Russian dictionary (and have a fight with my wife's Russian keyboard):
"???????????". Asking my wife what it means, she said "one after the other,
queue".

> that in mathematics (and by extension in physics and other natural sciences
> as they use mathematical apparatus) elements of a sequence are commonly
> indexed, and those indices are always treated as absolute position (which
> may be zero or even negative). By the way, your book is also certainly not
> from Biology/Genetics, where term "sequence" is used heavily, and they
> often speak about both absolute and relative positions in sequences.

I have spent 4 years in Great Britain. I am sure that if I ask anyone on the
street there "what is a sequence" he or she will answer something like "one
after the other" - and that is relative positioning.

> We have clearly different usage of terms "vector" and "sequence":
> substantial part of today's software engineering (tools and books) use them
> one way, while mathematics (and all natural sciences that use it heavily)
> always use them another way.

Even when it comes down to software engineering: IBM's Open Class Library
has a Sequence - for relative positioning: getFirst, getNext, insertAfter.
Usually used to fill listboxes.

> So all the argument about Vector/Sequence here is about Ada's choice
> of preference: will Ada choose software engineering (effectively, Java and
> C++ libraries) side or mathematical/scientific side on this issue.

I don't like the STL that much. So I am not really defending "vector".

> I suppose (or hope) that the thesis "Ada is for problem space, not for
> solution space" implies the latter.

I agree with you on that too. But I think we are off topic here.

****************************************************************

From: Marius Amado Alves
Sent: Saturday, February 7, 2004 8:41 PM

Randy Brukardt wrote:
>The 'committee' primarily adopted the existing proposal submitted by Matt
>Heaney. We decided not to change any of the major design decisions of that
>proposal - because no package will suit everyone or every need, and we felt
>it was more important to standardize something coherently designed for most
>needs than to fiddle endlessly with it and risk introducing serious bugs.
>
>Which is to say, I don't know. :-)

I do: there is none (except perhaps the implicit one: ease of
implementation). On the other hand, there is a rationale for indefinite
elements. This requirement has been widely felt and voiced for a long time,
and I included it in my Bases document (I think stored in alternative 1),
and even formulated it as an Annex (stored in alternative 2 but applicable
to any alternative). But I've always seemed to feel some resistance from
Matt and the ARG. Which resistance I find inexplicable.

I really don't see how making the element type indefinite may "compromise
coherence" or "introduce bugs". Sure it complicates the implementation. But
the increase in power for the user is a quantum leap, as it frees him from
doing tricky memory management in many situations.

In my proposed Annex I included this passage from someone who should be dear
to at least one person in that group--perhaps in the hope of making those
strange walls of resistance just shiver a bit:

<> -- Christopher Alexander, Foreword to [Gabriel 1996]

****************************************************************

From: Randy Brukardt
Sent: Saturday, February 7, 2004 9:20 PM

> I do: there is none (except perhaps the implicit one: ease of
> implementation). On the other hand, there is a rationale for indefinite
> elements.

Perhaps. But that wasn't the question. The question was why aren't there
indefinite *keys*.

...

> But I've always seemed to feel some
> resistance from Matt and the ARG.

Given that the "ARG" (other than the subcommittee) has not yet looked at
these proposals, that's a pretty bizarre statement.

...

> I really don't see how making the element type indefinite may
> "compromise coherence" or "introduce bugs". Sure it complicates the
> implementation.

And, on most implementations, I would expect it to make it *many* times
slower. (It wouldn't have any effect on Janus/Ada, I don't think, because we
already have to allocate an element at a time anyway.) I would guess that it
is that efficiency concern that Matt is responding to. But I'll let him
respond himself...

****************************************************************

From: Marius Amado Alves
Sent: Sunday, February 8, 2004 6:26 AM

>... that wasn't the question. The question was why aren't there
>indefinite *keys*.

Oops... sorry. Curiously enough if you have indefinite elements the
requirement for indefinite keys loses strength: you can then use elementary
containers or indefinite element positions as keys.

>...
>
>>But I've always seemed to feel some
>>resistance from Matt and the ARG.
>
>Given that the "ARG" (other than the subcommittee) has not yet looked at
>these proposals, that's a pretty bizarre statement.

Just a feeling. The proposals are there in the AI, and there was some
discussion.

>>I really don't see how making the element type indefinite may
>>"compromise coherence" or "introduce bugs". Sure it complicates the
>>implementation.
>
>And, on most implementations, I would expect it to make it *many* times
>slower....

No. The system should choose at compile time a specific body according to
the 'Definite attribute of the actual element type.

Aside. Of course there is still no standard means to do this, but it would
be a nice extension. Conditional compilation of generic bodies based on
instantiation properties. Variant units :-)

   generic
      type T is private;
      ...
   package G is
      when T'Definite => ...;
      when others => ...;
   end;

(On the subject of conditional compilation, see also the recent Ada
Preprocessor thread on CLA.)

In the meanwhile, there is no requirement that Ada.Containers be implemented
strictly in Ada, is there? I doubt any Ada 95 container (arrays, files) is.
End of aside.

So no coherence problem, nor bugs, nor efficiency problem :-)

****************************************************************

From: Tucker Taft
Sent: Sunday, February 8, 2004 7:33 AM

I suggest the use of controlled types if you want implicit levels of
indirection in the keys or the elements. Having the container worry about
storage management issues relating to elements or keys significantly
increases their complexity. We very much want these containers to be
straightforward to define and use. They are definitely not the final answer,
but more the initial answer -- the 20% that can handle 80% of the problems.

****************************************************************

From: Marius Amado Alves
Sent: Sunday, February 8, 2004 12:23 PM

>I suggest the use of controlled types if you want implicit
>levels of indirection in the keys or the elements.

That is exactly the problem. The user is forced to control. Waste of time.
And bug prone. The right controlled behaviour is very hard to get. How many
times is Finalize called?

> Having the
>container worry about storage management issues relating to elements
>or keys significantly increases their complexity.

If you mean inefficiency, no, at least not significantly: see the variant
unit solution. If you mean source code complexity, sure, a bit, but so what?

> We very much
>want these containers to be straightforward to define and use.
>They are definitely not the final answer, but more the initial
>answer -- the 20% that can handle 80% of the problems.

With only definite elements I don't believe in the 80% figure. Just think:
don't you need heterogeneous arrays all the time? For class-wide programming
for example? And logical records? And words, texts, pictures, all sorts of
variable length stuff?

BTW this is the kind of "resistance" I was talking about. No technical
arguments really. Just a vague downsizing wish. The pointer tradition maybe.

****************************************************************

From: Marius Amado Alves
Sent: Sunday, February 8, 2004 12:41 PM

Just to make some things clear. I began championing indefinite elements long
ago. Wrote the proposals. They met the "resistance". I let it be. I assumed
the proposals had been viewed and were rejected. The recent discussion made
me wonder if the proposals had really been seen.
So I stepped in just to make sure. I don't want to discuss the issue itself.
That has been done. See the proposals (my Bases document stored in
alternative 1, my proposed Annexes in alternative 2, discussions in ASCLWG,
CLA and here).

When I say I won't rediscuss the issue it doesn't mean I won't give focused
explanations here. I'll be glad to do it.

Thanks a lot.

****************************************************************

From: Tucker Taft
Sent: Sunday, February 8, 2004 4:25 PM

> ...
> > We very much
> >want these containers to be straightforward to define and use.
> >They are definitely not the final answer, but more the initial
> >answer -- the 20% that can handle 80% of the problems.
>
> With only definite elements I don't believe in the 80% figure. Just
> think: don't you need heterogeneous arrays all the time? For class-wide
> programming for example? And logical records? And words, texts,
> pictures, all sorts of variable length stuff?

But in almost all of these cases, I would not want to be copying these large
objects around. I (as a user of the abstraction) would want to control
storage allocation of the objects. That would imply I would be using access
types explicitly, or defining an abstraction which uses a controlled type,
with perhaps reference counting of a pointed-to part.

> BTW this is the kind of "resistance" I was talking about. No technical
> arguments really. Just a vague downsizing wish. The pointer tradition maybe.

Sorry if my arguments seem vague. I would be happy to engage in a long
discussion about this design choice.

I would want the container to take over storage allocation only in the case
where it is "uniquifying" the objects, and I expect to "leave" the objects
in the container indefinitely, and pass around keys (essentially pointers or
ids) for the objects. The example of the "string table" comes to mind, where
in a word or language processing tool, the first thing you do is uniquify
all the strings, and then only deal with indices into the string table
thereafter. This sort of table generally never goes away, and just grows
slowly as new unique strings occur. The string mapping was included
precisely for this application, as it seems important and common.

However, for other cases, we felt it was better to let the programmer
control storage allocation, so that the amount of allocation, copying, and
deallocation of large, variable-sized objects could be minimized, and most
importantly, under control of the user.

Please don't confuse "resistance" with simply a difference of opinion. We
spend long hours debating incredible minutiae in the ARG meetings. We rarely
take the "easy" route. We may not document our discussions publicly as well
as we should, but rest assured we have a vigorous debate. The minutes of ARG
meetings, which tend to be very good relative to most minutes I have seen,
are nevertheless able to document only the "tip of the iceberg" of the
discussion.

****************************************************************

From: Marius Amado Alves
Sent: Sunday, February 8, 2004 6:55 PM

Thanks for taking the trouble to review this issue. I'll try to summarize:

You feel the user wants to control allocation himself. Sometimes, yes. In
those cases, he just does it. The indefinite element feature won't stand in
the way.

I feel most of the time the user does NOT want to bother with memory
management. He will love to have indefinite elements. I think this is the
principal difference between us.
You think all or most users prefer to control allocation themselves. I'm
convinced they don't, and they'd be really happy not to have to.

You fear loss of efficiency due to copying. Containers are by-reference, so
you must be referring to copying of elements. But doesn't that happen just
exactly when it has to, be it in the library or in the user code? Assuming a
well designed library, one which moves only references, not the things, as
you yourself notice.

I've done proof-of-concept implementations of this for alternative 2. The
process and associated discussion with Matt were recorded on the ASCLWG
forum. The code is still online I think, but needs cleansing.

****************************************************************

From: Jeffrey Carter
Sent: Monday, February 9, 2004 1:01 AM

Randy Brukardt wrote:
>
> Huh? You've said, in effect, that the performance isn't good enough
> for applications where the performance doesn't matter. That's a
> pretty goofy statement!

Actually, you originally said something like that. You have said

1. That the vector component should only be used by applications where
performance doesn't matter.

2. That the difference in performance between possible implementations of
vector may be critical to applications that use it.

If performance doesn't matter to these applications, then the restriction on
implementations should be removed. However, I agree with you that even
applications that are suitable for the use of standard components may find
the performance difference between different implementations critical.

> The problem I see is a lot of people are looking far too closely at
> tiny pieces of abstractions. You might have a queue or a list as
> part of a large abstraction, but they're pretty much useless by
> themselves. And given that creating a queue or stack (both of which
> have only two operations, both trivial!) would take 3 minutes max, it
> makes no sense to use a complex (and necessarily slow) container
> library for just that -- indeed, it probably would be more work to
> use a container than the 3 minutes.

I have seen a number of these "3-min" structures, and many of them have
subtle errors. These are not beginner mistakes, either; handling dynamic
structures seems to be something that a segment of developers have
difficulty understanding. That these structures are not as easy to implement
as they seem is part of the reason why I think a list component should be
part of a standard library.

Regarding Size and Resize, you wrote:

> That's no different than many of the attributes in Ada, which (if set)
> always return the values that they were set to. But what the compiler does
> with those values is (almost) completely implementation-defined.

There is a difference between a compiler directive and an operation of a
package. The latter must have well defined behavior that is not
implementation defined.

> Huh? Resize tells the container a reasonable size to use; what the container
> does with that information is up to it. Size simply returns that
> information.

What does Size return if Resize has not been called? This description does
not agree with the specification in the proposal. Size "Returns the length
of the internal array." Clearly the implementation must have something that
has a length, independent of the logical length of the value stored in the
vector, for Size to return. Resize "allocates a new internal array whose
length is at least the value Size". Clearly the implementation must allocate
a new something with a new length.
What the container does with the new size is not up to it; it is specified
fairly clearly.

The operations, as specified, are pretty meaningless except for an array
implementation. If the intention is as you described, then the operations
appear to be useless, and should be eliminated. If the intention is as
specified, then these operations are too tied to the implementation, and
should be eliminated.

> I much prefer the vision of this containers library, where the only
> containers included are those that are large, complex, multi-purpose,
> and have a clear abstraction.

The vision I see seems to be muddied. The containers are poorly named,
poorly specified, and confuse abstractions with their implementations.

My intention is to help assure that Ada has as good a container library as
possible in the time available. I assume that the purpose of presenting the
proposal to the Ada-Comment list is to attract comments on how it could be
improved, and there is time to make such comments and have them considered.
I have invested most of this weekend in describing specific ways I think
they could be improved. In many cases I have provided concrete suggestions
for alternative wording, which I present here. I hope the result will be
useful to the committee.

I have already presented my thoughts on changing the type names used to be
consistent with the rest of the standard. I will use the type names from the
proposal here, however, to avoid confusion.

Vectors

The introductory text to Vectors does not make it clear that this is an
extensible array (EA). After reading the package spec, I initially thought
this was a list, perhaps with an unusual implementation. I doubt if I am
special, so I expect such an interpretation from many readers. After reading
the entire section, I encountered the Implementation Advice that a vector is
similar to an array and realized that this was an EA.

An EA is a useful component that I will be happy to see in the standard.
However, I think it is a disservice to Ada for readers to have to read the
entire section to know what they're looking at. Borrowing from the
introductory text for Strings.Unbounded, which is a special case of an
extensible array, I suggest something along the lines of:

"An object of type Vector_Type represents an array, indexed by Index_Type
with components of Element_Type, whose low bound is Index_Type'First and
whose length can vary conceptually between 0 and the number of values in
Index_Type."

The wording used by Strings.Unbounded should serve as a guide to how to word
the text here. Operations in Strings.Unbounded are defined by analogy to
String operations; operations in Vectors should be defined by analogy to
array operations.

Even with such wording changes, however, it is still going to be difficult
for the reader to find what he wants. Someone looking for vectors is going
to be disappointed to find EAs, and someone looking for an EA is unlikely to
look at something named Vectors. Ada should be able to do better than that.
Extensible_Arrays, Flexible_Arrays, and Unbounded_Arrays have already been
suggested by various people here; given that we already have
Unbounded_Strings, Unbounded_Arrays may be the best choice.

I am not the first to note that Annex A is one of the most accessible parts
of the ARM, and is frequently read by those using the standard library. It
makes sense to recognize this and word these sections as a users' guide
where possible.
So, if the ARM gains a mathematical library of matrices and vectors, we should add to it a comment that those looking for the kind of vector provided by the STL of C++ or Java's library should look at package Ada.Containers.Unbounded_Arrays (A.17.2). In the introductory text to the section, we should mention that an Unbounded_Array is equivalent to the container called Vector in the STL of C++ or Java's library (similar to the comment about pointers in 3.10). Index_Subtype is never used, so it should be eliminated. Size and Resize were discussed above. First (Vector) is always Index_Type'First, so it should be a constant. We iterate over an array A by for I in A'range loop -- use A (I) end loop; By analogy, we should iterate over an EA by for I in First .. Last (EA) loop -- use Element and Replace_Element at I end loop; Front and Back, therefore, seem to be unnecessary, and may be deleted. This has the additional advantage that it eliminates concern about Index_Type'Base needing a greater range than Index_type, and we could remove the assertion. Writing prematurely when I thought this was a list, I suggested an iterator for vectors. I retract that suggestion. It could be useful to provide an operation to add an item at an index > Index_Type'Succ (Last (Vector) ) without assigning to the intervening positions. The component doesn't currently allow this. Possible wording: procedure Append (Vector : in out Vector_Type; Index : in Index_Type; New_Item : in Element_Type); If Index <= Last (Vector), this procedure has the same effect as Replace_Element (Vector, Index, New_Item). Otherwise, the length of Vector is extended so that Last (Vector) = Index, and New_Item is assigned to the element at Index. No value is assigned to the elements at the new positions with indices in Index_Type'Succ (Last (Vector) ) .. Index_Type'Pred (Index). There should be some way to indicate that this last use of "Last (Vector)" refers to the value before the call. I don't see an easy way to do that and welcome suggestions. This leaves the problem that Natural is used for the length of a vector and the counts of inserted or deleted elements, meaning that index types with more values than Natural cannot use some index values. This is avoided in Ada.Text_IO, for example, with a type specific for that purpose. However, this is really a general problem, and a general solution might be advisable. There are no predefined modular types in Standard, so we might want to add type Maximal_Count is mod implementation-defined; Maximal_Count'Modulus is the largest power of 2 that may be used as the modulus of a modular type. We could add a note that this means Maximal_Count'Modulus = System.Max_Binary_Modulus, for clarity. I presume it would be inappropriate to reference System in Standard. If that's not acceptable, we could add somewhere in the hierarchy, perhaps in package Ada itself type Maximal_Count is mod System.Max_Binary_Modulus; [Would we also like subtype Positive_Maximal_Count?] New packages could then use Maximal_Count rather than Natural for this sort of thing. Existing packages could be augmented with parallel operations that use Maximal_Count. Maps Maps is fairly well specified. I think the introductory wording should again be modified: "The user can insert key/value pairs into a map, and then search for and delete values by specifying the key. An object of type Map_Type allows searching for a key in less than linear time." This is a hashed map and specifies an implementation based on a hash table. 
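[Editor's note: To make the hash requirement concrete, an instantiation of such a
hashed map might look roughly like this. This is a sketch only; the generic formal
names are assumptions based on this discussion, and any key-equality formal is
omitted.]

   function Name_Hash (Key : Unbounded_String)
     return Ada.Containers.Hash_Type;
   --  User-supplied hash function; the hashed form cannot work without it.

   package Name_Maps is new Ada.Containers.Maps
     (Key_Type     => Unbounded_String,
      Element_Type => Natural,
      Hash         => Name_Hash);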
This is appropriate, since a hashed map requires the user to provide a hash function that is not needed by other implementations. However, I think the name should reflect this (Hashed_Maps) so that we don't unnecessarily restrain the existence of other forms of Maps. Since the exact nature of the underlying hash table is implementation defined, the user doesn't have the information needed to choose an appropriate size for it. Size and Resize therefore seem inappropriate. I can hope that users will realize they lack the information to use them meaningfully, and never call them. The initial text after the spec seems unnecessarily restrictive of the implementation. Since the implementation knows best the details of the hash table, it should determine the initial size of the table. I agree with the open issue on Swap. I see little use for this operation on any of the components. It seems inappropriate to require Insert to resize the hash table. The implementation should know best when and how to resize the table. While it's appropriate to discuss nodes as containers of key/value pairs, it unnecessarily restricts the implementation to talk of nodes being allocated and deallocated. It should be adequate to say such things as "Insert adds a new node, initialized to Key and New_Item, to Map" and "Delete deletes the node from Map". I don't understand why the string-keyed maps exist, since they are equivalent to a map with an unbounded string key. The implementation would have to store the provided key in an appropriate unbounded string, or duplicate the functionality of unbounded strings. Duplicated functionality is a bad idea. Moving the conversions to and from unbounded strings into a special component doesn't seem worth the added complexity. Sorted_Sets The wording here is similar to that for vectors. The introductory text does not describe the abstraction that the package implements. Proceeding to the package spec, the reader will probably be puzzled by the lack of basic set operations such as union and intersection. The description of the operations that follows does nothing to alleviate the confusion. A newcomer to the language may very well wonder what's wrong with these Ada people. Only at the very end of section do we discover that this is a structure that provides searching in O(log N) time. Clearly the choice of Set as the name is confusing and misleading, but I'm not sure what to suggest as an alternative. Something like Fast_Search seems to imply that it is an algorithm, not a structure. Perhaps Sorted_Searchable_Structure would work, but I'm not very happy with it. Suggestions are welcome. The introductory text needs to identify what the component is: "An object of type Searchable_Structure represents a data structure that can be searched in less than linear time." Given that this is a searchable structure, the operations seem reasonable. The descriptions of the operations clearly require an implementation that performs dynamic allocation and deallocation. This is an unnecessary constraint on the implementation. A binary search is O(log N), but is not allowed by the current specification. These descriptions should be modified along similar lines to the suggestions for maps. If the package does not use "=" for elements, why does it import it? Why doesn't the package use "="? It's not clear why it should use "equivalence" rather then equality. The package Ger_Keys turns a searchable structure into a map. A searchable structure is a common implementation of a map. 
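[Editor's note: The layering described here - a map obtained from a sorted,
searchable structure - can be sketched as follows. All names are invented for
illustration, and the generic formals shown are assumptions.]

   type Pair_Type is record
      Key     : Key_Type;
      Element : Element_Type;
   end record;

   function "<" (Left, Right : Pair_Type) return Boolean;
   --  Compares keys only, so the structure is ordered and searched by key.

   package Pair_Sets is new Ada.Containers.Sorted_Sets
     (Element_Type => Pair_Type,
      "<"          => "<");
   --  Insert, Find and Delete on Pair_Sets then behave as map operations.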
Providing an alternative implementation of a map is fine, provided that the
name indicates that it is a map. Sorted_Map might be a better name. It's
quite easy to implement a map with a searchable structure component, so it
would be better if the map was another component at the same level as the
hashed map. I would have no objection to the standard specifying that this
map be implemented with an instantiation of the searchable structure
component; it would make the specification of the map easy.

The primary justifications for this change are that it allows the user who
wants a map based on a searchable structure to obtain it with a single
instantiation, rather than the two required as it stands, and it allows
both maps to have similar interfaces, which they do not have with the
existing proposal. I'm glad the proposal recognizes that both searchable
structures and maps based on them are useful components, even if they go to
great efforts to disguise what they are.

This discussion of the searchable structure and the map based on it seems
to indicate a basic design problem with the hashed map component. A hash
table is not trivial to implement correctly. There are uses for hash tables
other than maps. As it stands, the user who wants a hash table must create
one, duplicating the effort performed for the map, and increasing the
likelihood of errors.

Just as both a searchable structure and a map based on it are desirable, so
both a hash table and a map based on it would be a good idea. The user who
requires a hash table but not a map could use one that has been tested by
many users, reducing both effort and likelihood of errors. Thus I suggest
that the hash table be turned into a component. As with the map based on a
searchable structure, I would have no problem with the standard specifying
that the hashed map be implemented using the hash table component. If we
can only have one of the hash table or the hashed map components, I would
argue for the hash table, since it is easy to implement a map given a hash
table, but difficult to implement a hash table given a map.

Providing maps based on other packages allows the standard to demonstrate a
layered approach to creating abstractions. Since creating useful
abstractions is a basic process in software engineering, perhaps the idea
might rub off on some readers.

If this suggestion is accepted, the library would increase from three to
five components: an extensible array, a hash table, a searchable structure,
a map based on the hash table, and a map based on the searchable structure.
That still seems a fairly minimal library, provides the same functionality
as the proposal, and adds some additional useful functionality without
significant extra effort.

****************************************************************

From: Martin Krischik
Sent: Monday, February 9, 2004 5:40 AM

> And, on most implementations, I would expect it to make it *many* times
> slower. (It wouldn't have any effect on Janus/Ada, I don't think, because we
> already have to allocate an element at a time anyway.) I would guess that it
> is that efficiency concern that Matt is responding to. But I'll let him
> respond himself...

Actually, some operations will become faster, like insert in the middle.
Append operations that need to extend the internal storage also become
faster. At least when the stored data is larger than an access value -
which should be 80% of the cases.
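[Editor's note: The indirection being described - storing access values rather than
the elements themselves, so that inserting in the middle or extending the storage
moves only pointers - can be sketched like this. This is one possible
representation, not something the proposal requires.]

   type Element_Access is access Element_Type;
   type Element_Array is array (Positive range <>) of Element_Access;

   type Vector_Rep (Capacity : Natural) is record
      Elements : Element_Array (1 .. Capacity);
      --  Shifting a slice of Elements copies access values only; the
      --  designated elements are never moved or copied.
      Last     : Natural := 0;
   end record;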
**************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 8:39 AM Randy Brukardt wrote: > Huh? Resize tells the container a reasonable size to use; what the container > does with that information is up to it. Size simply returns that > information. It returns the value chosen by the implementation, which can be at least the size specified. > The only real requirement here is O(1) element access (which prevents the > use of a straight linked list). Yes, that is correct: you cannot use a linked list to implement a vector. Indeed, if a vector container were implemented as a linked list then it wouldn't be named "vector"; it would be named "linked list" instead. My original proposal had 3 kinds of sequence containers: vectors, deques, and (linked) lists. There were 3 because each has different time and space properties. I would have liked having a list container in the final committee report, since that's the most natural container for use as a queue. (I probably use lists more often than any other container, for exactly that reason.) But the size of the proposal had to be reduced somehow. > Janus/Ada will probably use an array of pointers (or possibly array of > arrays of pointers); we're going to be (implicitly) allocating the elements > anyway, we might as well do it explicitly and take advantage of that to make > Insert/Delete/Sort (and any expansions) much cheaper (presuming the elements > are bigger than scalar types). An array of arrays of pointers is even > better, because insertion cost is bounded by the maximum size of an array > chunk -- but there is more overhead and complexity, so I'd like to see some > real uses before deciding on an implementation. My reference implementation just uses an unbounded array internally. It sounds like you have some other implementation ideas. I have the maps done, and I'll host the new reference implementation this morning (Mon, 9 Feb). > Note that a pure list component has no real opportunity for "better" > implementations, and indeed, any implementation on Janus/Ada would suffer > from "double" allocation. But a list component has O(1) insertion and deletion at any position. A vector is O(1) only at the back end. **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 9:09 AM Martin Dowie wrote: >>The only sequence container in the proposal is a vector, which doesn't >>have a passive iterator. Again, I recommend just using a loop: > > I suspect the first thing I will do is add an extra child generic subprogram > Ada.Containers.Vectors.Iterate! :-) You might not have to. Since there seems to be interest, I added the following two declarations to the reference implementation: generic with procedure Process (Element : in Element_Type) is <>; procedure Generic_Constant_Iteration (Vector : in Vector_Type); generic with procedure Process (Element : in out Element_Type) is <>; procedure Generic_Iteration (Vector : in Vector_Type); The latest version of the reference implementation is available at my home page: **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 9:14 AM Martin Krischik wrote: >>The user can easily code a queue in terms of a Vector (that's one of the >>uses of Insert!). We dropped the list component because it had an identical >>interface to the Vector component, but was less flexible (no computed O(1) >>access). > > True enough. 
> But if you wanted to build a generic queue on top of the vector, the
> tag should not be hidden from view. Otherwise one needs to repeat all the
> access methods instead of just renaming the ones provided by the parent
> package.
>
> In fact the hidden tag is the one feature which I really dislike in Charles.

You mean the type tag? The components are tagged because I needed
controlledness for automatic memory management. They are tagged for no
other reason, and Charles is specifically designed using static, not
dynamic, polymorphism.

For the record I don't think it's realistic to use a vector as a queue
anyway, since deletion from the front end of a vector is O(n).

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004 9:22 AM

Martin Krischik wrote:

> Passive iterators should always provide the fastest means to iterate over the
> whole container. They should do so by knowing the internals of the container.

That is correct. A passive iterator will usually beat an active iterator.
But for a vector it probably doesn't make any difference.

However, the latest reference implementation does have passive iterators
for the vector, which look like this:

   generic
      with procedure Process (Element : in Element_Type) is <>;
   procedure Generic_Constant_Iteration (Vector : in Vector_Type);

   generic
      with procedure Process (Element : in out Element_Type) is <>;
   procedure Generic_Iteration (Vector : in Vector_Type);

The latest version of the reference implementation is available at my home
page:

> Of course it only matters in advanced containers with B-Trees or AVL-Trees
> as the internal structure. But I have only seen those in IBM's Open Class
> Library (which is far better than the STL).
>
> But there are no advanced containers in AI 302.

The sorted set is implemented using a balanced tree. The reference
implementation uses a red-black tree, but I suppose an AVL tree would work
too. The maps are implemented using a hash table.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004 9:30 AM

Stephen Leake wrote:

> What is the rationale for making the Map Key_Type definite, as opposed
> to indefinite? Since an indefinite Key_Type is required for
> Containers.Maps.Strings, why not make that capability available to the
> users?

Because that would punish users that have definite key types.

Also, type String isn't just any indefinite type. It's an array.

The reference implementation for String_Maps looks like this:

   type Node_Type;
   type Node_Access is access Node_Type;

   type Node_Type (Key_Length : Natural) is record
      Key     : String (1 .. Key_Length);
      Element : aliased Element_Type;
      Next    : Node_Access;
   end record;

> I don't see a discussion of this in AI-302-03/01.

There is a paragraph in there explaining why we have a dedicated map whose
key type is String.

> Another point: Containers.Vectors.Size should return Index_Type'Base,
> and the Size parameter in Resize should also be Index_Type'Base. It's
> confusing to have different types for Size and Index.

No. The parameter of the Resize operation specifies a hint about the future
length of the container, which is subtype Natural.

> There's also a problem if Natural'Last < Index_Type'Last; you
> can't have a vector that contains every index!

The assumption is that a container will always have fewer than Integer'Last
elements. (On a 32 bit machine that's roughly 2.1 billion values...)
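[Editor's note: As a usage note, the Resize hint described above would typically be
applied before a long series of appends, along these lines. Integer_Vectors is an
assumed instantiation of the proposed vector package, made use-visible.]

   V : Integer_Vectors.Vector_Type;
   ...
   Resize (V, Size => 1_000);      --  hint: about 1000 elements expected
   for I in 1 .. 1_000 loop
      Append (V, New_Item => I);   --  should not force repeated reallocation
   end loop;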
**************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 9:34 AM Randy Brukardt wrote: > We definitely expect that the strings container will use a purpose-built > data structure for storing strings, not some general indefinite item > capability. Ways to compactly and efficiently store sets of varying size > strings are well known and commonly used. I didn't do anything special here. The internal node declaration for String_Maps looks like this: type Node_Type; type Node_Access is access Node_Type; type Node_Type (Key_Length : Natural) is record Key : String (1 .. Key_Length); Element : aliased Element_Type; Next : Node_Access; end record; I have hosted the latest version of the reference implementation at my home page: **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 9:49 AM Randy Brukardt wrote: >>There's also a problem if Natural'Last < Index_Type'Last; you >>can't have a vector that contains every index! > > Yes, that's a serious problem on Janus/Ada (Integer is 16-bit). However, you > want the Size and Resize operations to take a numeric type that contains > zero -- and certainly Index_Type is not that. Index_Type could be a subtype > of an enumeration type or a subtype of a modular type (neither of which can > contain zero) or a subtype of an integer type not containing zero. > > We had a short, inconclusive discussion about whether the index type ought > to be range <> rather than (<>) (because enumeration and modular types fail > the assertion and thus aren't directly usable), but that still doesn't > guarantee a zero. Moreover, if the integer type has negative numbers, then > the Length of the vector could be larger than Index_Type'Last. Clearly, if the container is empty, and Index_Type'Base'First = Index_Type'First, then evaluation of function Last will raise Constraint_Error. The issue is whether elaboration of a vector container object can raise CE if the Index_Type'Base'First = Index_Type'First. There's no reason why we should punish users whose generic actual index subtype has Index_Type'Base'First = Index_Type'First, since they can always defend against CE like this: if not Is_Empty (V) and then Last (V) = X then In fact my reference implementation doesn't require that Index_Type'Base'First < Index_Type'First, so the assertion in the spec is somewhat spurious. I would prefer to weaken the precondition and allow Index_Type'Base'First = Index_Type'First, but it's really up to implementors, because allowing that condition will constrain implementation choices. > So I don't see a great solution. I wondered about using "Hash_Type" here (it > has the correct properties), but that seems like a misuse of the type (and a > bad idea in a library that most Ada programmers will read - you want to show > them good style in standard libraries). As I mentioned in my previous message, Resize specifies a hint about the future number of elements in --that is, the length of-- the container. My assumption is that no container will ever have more than Integer'Last number of elements. If that assumption is incorrect, then maybe the container can be allowed to grow internally to more than Integer'Last number of elements, but can only report a maximum value of Integer'Last. Subtype Natural is the correct choice for the vector Resize operation. I think the ARG wants to use Hash_Type for Resize for the maps. My reference implementation still uses Natural. 
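[Editor's note: In context, the guard mentioned above looks like this; Integer_Vectors
is again an assumed instantiation, and Process_Element is a hypothetical user
subprogram.]

   if not Is_Empty (V) and then Last (V) = I then
      --  Last (V) is only evaluated once the vector is known to be
      --  non-empty, so no Constraint_Error can arise even when
      --  Index_Type'First = Index_Type'Base'First.
      Process_Element (Element (V, I));
   end if;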
**************************************************************** From: Robert A. Duff Sent: Monday, February 9, 2004 4:40 PM > Clearly, if the container is empty, and Index_Type'Base'First = > Index_Type'First, then evaluation of function Last will raise > Constraint_Error. Well, some might think it's clear, but some might think Last returns First-1, which for a modular type is 'Last. I'm in favor of making the Index_Type be "range <>", and also requiring that elaboration of an instance raise an exception if 'First = 'Base'First. That would avoid all these anomalies. **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 6:24 PM That seems reasonable. It was questionable whether we really needed type Index_Type is (<>); so maybe these issues will require that type Index_Type is range (<>); This is probably good enough. **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 9:53 AM Randy Brukardt wrote: > So, a passive iterator will only be faster in complex containers (where you > have to separate the Element and Successor functions). For a Vector (where > the language already has the needed iteration mechanism built-in), it's > going to be slower (or, if you're really lucky, the same speed) and it > certainly is a lot harder to write. > > So I think having it on Vector would simply be for consistency; you'd never > actually use it if you know you're dealing with a Vector. As I mentioned in one of my previous messages, the reference implementation now has a passive iterator like this: generic with procedure Process (Element : in Element_Type) is <>; procedure Generic_Constant_Iteration (Vector : in Vector_Type); generic with procedure Process (Element : in out Element_Type) is <>; procedure Generic_Iteration (Vector : in Vector_Type); There seems to be interest in a passive iterators for vectors, so we might as well include it. **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 10:00 AM Randy Brukardt wrote: > At which point, you *equal* the performance of the active iterator. And only > if *everything* goes right. The OP claimed that the passive iterator would > always have better performance, and that's certainly not true for the vector > container. I doubt that it would be true for the Map container, either. It > could be true for a complex container, but those aren't commonly used. The vector is arguably a borderline case, but we should just include a passive iterator. The latest version of the reference implementation has them for vectors, too. For both a (hashed) map and (sorted) set, a passive iterator is likely to beat an active iterator (other things being equal, of course). For a map, the reason is that you can just use a loop internally, to keep track of which bucket you're visiting. In an active iterator, you have to compute the hash value again to find the next bucket. **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 10:15 AM >I suspect the first thing I will do is add an extra child generic >subprogram Ada.Containers.Vectors.Iterate! :-) This probably won't be necessary. I added passive iterators to the vector reference implementation. 
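[Editor's note: A minimal usage sketch for these passive iterators, assuming an
instantiation named Integer_Vectors that contains the declarations quoted above.]

   V : Integer_Vectors.Vector_Type;

   procedure Print (Element : in Integer);
   --  Prints a single element (body not shown).

   procedure Print_All is
      new Integer_Vectors.Generic_Constant_Iteration (Process => Print);
   ...
   Print_All (V);   --  calls Print once per element, in index order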
**************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 10:13 AM > And, on most implementations, I would expect it to make it *many* times > slower. (It wouldn't have any effect on Janus/Ada, I don't think, because we > already have to allocate an element at a time anyway.) I would guess that it > is that efficiency concern that Matt is responding to. But I'll let him > respond himself... The reason is that (in what I imagine is a typical implementation) allowing the key to be indefinite would have drastic performance implications. The internal node of the map reference implementation looks like this: type Node_Type; type Node_Access is access Node_Type; type Node_Type is record Key : aliased Key_Type; Element : aliased Element_Type; Next : Node_Access; end record; I can declare the key as a record component directly, because the formal key type is definite. Were we to allow indefinite key types, then we would have to do something like: type Node_Type; type Node_Access is access Node_Type; type Key_Access is access Key_Type; type Node_Type is record Key : Key_Access; Element : aliased Element_Type; Next : Node_Access; end record; which implies allocating the key object separately from allocation of the node itself. This would unfairly punish users that have a definite actual key type (as Integer or whatever). If you want an indefinite key type, then allocate the key object yourself and instantiate the component using the key access type. This shouldn't be a problem since the map object is typically part of some higher-level abstraction anyway, so you can hide the allocation and map manipulation from the users of that higher-level abstraction. See the !examples section of the proposal for more details. **************************************************************** From: Simon J. Wright Sent: Monday, February 9, 2004 11:37 AM > The internal node of the map reference implementation looks like this: Does the aliasing of Element carry any implications for Element_Type? I am thinking of the use of discriminated types, even with defaulted discriminants, where aliasing forces the object to be constrained. **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 11:48 AM It means you can't instantiate the container using a default-discriminated element type. This is the same problem you have when trying to declare a default-discriminated record on the heap, or as aliased on the stack. The solution in all cases is to use a wrapper type sans discriminant, and instantiate the component using the wrapper type as the element type. **************************************************************** From: Robert A. Duff Sent: Monday, February 9, 2004 2:40 PM This seems like a real issue. Either the AI needs to specify that default-discriminated record "don't work", as it were, or the implementation needs to do the record-wrapping. Tucker and I have run into this issue in our current project (I think I wrote a container package, and Tucker instantiated it like that!), and it wasn't entirely obvious what the best solution was. **************************************************************** From: Gary Dismukes Sent: Monday, February 9, 2004 2:49 PM > It means you can't instantiate the container using a > default-discriminated element type. Not stated quite right -- you can instantiate the container with such a type, but it might not work right. 
You might get mysterious exceptions propagating out of operations if the implementation reassigns to an Element component in a node. > This is the same problem you have when trying to declare a > default-discriminated record on the heap, or as aliased on the stack. > > The solution in all cases is to use a wrapper type sans discriminant, > and instantiate the component using the wrapper type as the element type. I think that's not an acceptable answer in this case. These aliased element components are part of the implementation. The user shouldn't need to know about them and it's an abstraction violation in my opinion if the user is forced to wrap his element type. Instead it would seem that the implementation has to do that wrapping. Ugly, but at least it keeps the ugliness internal to the container implementation. **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 2:57 PM >>The solution in all cases is to use a wrapper type sans discriminant, >>and instantiate the component using the wrapper type as the element type. > > This seems like a real issue. Either the AI needs to specify that > default-discriminated record "don't work", as it were, or the > implementation needs to do the record-wrapping. The problem is that the element type is aliased. Wrapping it internally won't work because Generic_Element returns an access object that designates the element, not the wrapper. You can't satisfy both conditions simultaneously. Personally I find in-place modification of elements much more useful than being able to store (unwrapped) default-descriminated elements. One compromise solution is to only disallow instantiation of Generic_Element, rather than the whole package, if the element type has a default-discriminant. But I don't know whether this is possible within the language. **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 9:31 AM >>The solution in all cases is to use a wrapper type sans discriminant, >>and instantiate the component using the wrapper type as the element type. > > I think that's not an acceptable answer in this case. These aliased > element components are part of the implementation. The user shouldn't > need to know about them and it's an abstraction violation in my opinion > if the user is forced to wrap his element type. Instead it would seem > that the implementation has to do that wrapping. Ugly, but at least > it keeps the ugliness internal to the container implementation. That won't work. Generic_Element returns an access value that designates an object of type Element_Type, not the internal wrapper type. The problem is that objects of (default-discriminated) Element_Type can't be aliased, so I'm not allowed to say Element'Access. Perhaps there is some other solution. I'm not really sure... **************************************************************** From: Gary Dismukes Sent: Monday, February 9, 2004 3:47 PM Matt Heaney wrote: > > That won't work. Generic_Element returns an access value that > designates an object of type Element_Type, not the internal wrapper > type. The problem is that objects of (default-discriminated) > Element_Type can't be aliased, so I'm not allowed to say Element'Access. True, that's a problem. > Perhaps there is some other solution. I'm not really sure... Another solution is to use 'Address and unchecked conversion to the access type, and forget the aliased component. 
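[Editor's note: For reference, that technique looks roughly like this, using
System.Address_To_Access_Conversions; Node_Type here is the internal node shown
earlier, with its Element component no longer aliased.]

   package Element_Conversions is
      new System.Address_To_Access_Conversions (Element_Type);

   function To_Element_Pointer (Node : Node_Access)
     return Element_Conversions.Object_Pointer is
   begin
      --  A further unchecked conversion to the user's named access type
      --  may still be needed by Generic_Element.
      return Element_Conversions.To_Pointer (Node.Element'Address);
   end To_Element_Pointer;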
This is starting to look unpleasant though :-( What we really need is something like Tucker's proposal in AI-363 (eliminating access subtype problems), which would prevent this pesky aliased problem altogether... **************************************************************** From: Randy Brukardt Sent: Monday, February 9, 2004 4:01 PM Right. And that's still on the table, so there may ultimately be no problem here for Ada 200Y. **************************************************************** From: Simon J. Wright Sent: Tuesday, February 9, 2004 3:16 AM The Booch Components use Address_To_Access_Conversions for this precise purpose. **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 4:07 PM Indeed. What I was trying to do with Generic_Element is something similar to what you have in C++: { std::vector v; v.push_back(42); int& i = v.back(); ++i; // i becomes 43 } The problem is that we don't have references in Ada. But even so you can do something like this: type Integer_Access is access all Integer; function To_Access is new Integer_Vectors.Generic_Element (Integer_Access); declare V : Integer_Vectors.Vector_Type; begin Append (V, New_Item => 42); declare I : Integer renames To_Access (V, Last (V)).all; begin I := I + 1; -- I becomes 43 end; end; This works but the model breaks if the element type has a default discriminant. In the case of Integer it is perhaps not necessary to use this mechanism, but consider if the element of the container is another container. You need a variable view of the container element in order to manipulate it. I wish there some other way, something like: function Element (V : VT) return Element_Type'Reference; --in the pseudo vectors pkg declare V : Integer_Vectors.Vector_Type; begin Append (V, New_Item => 42); declare I : Integer renames Element (V, Last (V)); begin I := I + 1; end; end; Here Element_Type'Reference is some kind of virtual type that is limited and indefinite. The only thing you're allowed to do with the the value returned by a function that returns T'Reference is to rename it. But perhaps the ARG has some other, more elegant technique. Just food for thought... **************************************************************** From: Tucker Taft Sent: Monday, February 9, 2004 5:50 PM Gary Dismukes wrote: > ... > I think that's not an acceptable answer in this case. These aliased > element components are part of the implementation. The user shouldn't > need to know about them and it's an abstraction violation in my opinion > if the user is forced to wrap his element type. Instead it would seem > that the implementation has to do that wrapping. Ugly, but at least > it keeps the ugliness internal to the container implementation. I agree. Just declare a local record type that wraps the user's type. And/or hope that the AI that solves this problem gets accepted. **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 10:18 AM Tucker Taft wrote: > I suggest the use of controlled types if you want implicit > levels of indirection in the keys or the elements. Having the > container worry about storage management issues relating to elements > or keys significantly increases their complexity. We very much > want these containers to be straightforward to define and use. > They are definitely not the final answer, but more the initial > answer -- the 20% that can handle 80% of the problems. Ahhhh, the voice of reason. 
This is exactly right. If you want indefinite key types, then you pay for
that privilege by having to do the memory management of indefinite keys
yourself. This is how it should be.

****************************************************************

From: Martin Krischik
Sent: Monday, February 9, 2004 12:40 PM

But you could not even store a collection of strings. OK, there are
unbounded strings. But storing 'Class - that's the killer feature. If
Ada.Containers can't do it I am not interested. There will be no 20%/80%
split. It's 0% - I won't use them.

****************************************************************

From: Marius Amado Alves
Sent: Monday, February 9, 2004 12:36 PM

Sounds more like the voice of the Devil, or at least De Sade, to me. "Want
indefinite? Go do memory management!" Too much pointer programming in your
minds, dudes. No doubt from much systems programming in your resumés, but
you forget not everybody is a systems programmer. For an application
programmer that 80% figure is just so wrong.

(Matt, "this is exactly right", "this is how it should be"? Assertive is
good but now you're sounding like some God (or Devil). I thought you were
an atheist ;-)

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004 12:58 PM

Ada is a low-level systems programming language. It gives you the tools to
build higher-level abstractions. If you need to store elements whose type
is indefinite, then you have to build that abstraction yourself, perhaps
using the low-level containers as a substrate.

As Tucker stated, the containers are the starting point, not the ending
point. Certainly, building the higher-level abstraction is much easier with
the low-level containers than without.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004 12:53 PM

> But storing 'Class - that's the killer feature. If Ada.Containers can't
> do it I am not interested. There will be no 20%/80% split. It's 0% - I won't
> use them.

The library is designed around the common case, which means definite key
and element types. If you want to store elements of type T'Class, then you
have to use an access type to instantiate the component, and then do the
memory management of elements yourself. This is how it should be.

****************************************************************

From: Pascal Obry
Sent: Monday, February 9, 2004 1:15 PM

> Ada is a low-level systems programming language. It gives you the tools
> to build higher-level abstractions.

As you seem to like strong arguments, let me try this: This is plain wrong
:) Ada is not low-level and certainly not a systems programming language.
Ada is a high-level language without a specific domain; this is my point of
view.

I find it really strange that only Vector is being considered, for example.
It would be really useful to have queue, list and stack. Now limiting the
containers to definite types is another restriction...

The idea behind the Ada containers was to have a common set of useful
components for Ada to avoid reinventing the wheel... So the argument "If
you need to store elements whose type is indefinite, then you have to build
that abstraction yourself" sounds bogus to me ;)

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004 1:26 PM

You can use a vector as a stack. The library doesn't need to provide a
stack directly.

The library does not provide a list.
I wish it had a list, but the subcommittee had to reduce the scope of the
library and so the list didn't make the cut.

You can use a list as a queue. The library doesn't need to provide a queue
directly. However, the library doesn't provide a list, so it doesn't
provide a queue either.

Note that if you need a priority queue, you can use the sorted set. The
library doesn't need to provide a priority queue directly.

> The idea behind the Ada containers was to have a common set of useful
> components for Ada to avoid reinventing the wheel... So the argument
> "If you need to store elements whose type is indefinite, then you have to
> build that abstraction yourself" sounds bogus to me ;)

I didn't mean that you have to build the component from scratch. I meant
only that you have to do the memory management of indefinite elements
yourself. The higher-level component that you build can be implemented
using the low-level containers.

Real systems are built from the bottom up. All we did was to provide the
lowest level in the abstraction hierarchy.

****************************************************************

From: Pascal Obry
Sent: Monday, February 9, 2004 2:00 PM

Matthew,

> You can use a vector as a stack. The library doesn't need to provide a
> stack directly.

Except that a stack should have a far more limited set of operations. This
ensures that the stack abstraction cannot be worked around.

> The library does not provide a list. I wish it had a list, but the
> subcommittee had to reduce the scope of the library and so list didn't
> make the cut.

I really think that this should be reconsidered. A list is the most used
abstraction in much of the software I have built or seen.

> You can use a list as a queue.

Of course, but again this is wrong in my view. The abstraction should be
constrained to the set of operations for a queue. In that case why not
remove the vector? It can be implemented easily with a map; the key is the
index of the item in the array :)

> Note that if you need a priority queue, you can use the sorted set. The

This is a more high-level component; I agree that it is ok not to include
it.

If we miss some important components in the standard container library,
what will we do? Use another component library like Charles or PragmArc...
and not use the standard container library... so what's the point?

The most important point in a container library is *completeness* I would
say. This is exactly what the STL has done.

****************************************************************

From: Martin Krischik
Sent: Monday, February 9, 2004 12:16 PM

> If you want an indefinite key type, then allocate the key object
> yourself and instantiate the component using the key access type. This
> shouldn't be a problem since the map object is typically part of some
> higher-level abstraction anyway, so you can hide the allocation and map
> manipulation from the users of that higher-level abstraction.

But Ada hasn't got a garbage collector, so there is the deallocation
problem. Especially when the container is copied or passed around.

And Ada (unlike C++) can do better! With Ada you can have a container with
indefinite types where with C++ you can't. We should not give away that
advantage.

****************************************************************

From: Marius Amado Alves
Sent: Monday, February 9, 2004 1:07 PM

> Ada is a low-level systems programming language. It gives you the
> tools to build higher-level abstractions.

Ok. Thanks for recentring the argument.
So your position is that the standard should not give high-level facilities. Personally I see Ada's doom in that position. A stillborn Ada 2005. **************************************************************** From: Pascal Obry Sent: Monday, February 9, 2004 2:03 PM Sadly, I feel alike :( **************************************************************** From: Stephen Leake Sent: Monday, February 9, 2004 1:56 PM > If you want indefinite key types, then you pay that privilege, by > having to do the memory management of indefinite keys yourself. This > is how it should be. Ok. I'd like to see that rationale documented in the final version of the AI, so people understand why Ada.Containers.String_Map isn't simply an instantiation of Ada.Containers.Map. One more argument for indefinite keys; if a C++ person looks at this, they can say "Ada generics are so weak they can't even allow a String as a key!". Not good for the "let's attract more users" goal. And I will continue to use SAL, where the containers do the memory management, because I like that design point better :). **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 2:13 PM Stephen Leake wrote: > One more argument for indefinite keys; if a C++ person looks at this, > they can say "Ada generics are so weak they can't even allow a String > as a key!". Not good for the "let's attract more users" goal. But you can't do that in C++, either. Indeed, C++ doesn't have indefinite types so it's unlikely a C++ programmer would even think to ask that question. > And I will continue to use SAL, where the containers do the memory > management, because I like that design point better :). Real systems are built from the bottom up. All we did was to provide the lowest-level in the abstraction hierarchy. **************************************************************** From: Stephen Leake Sent: Monday, February 9, 2004 4:20 PM > But you can't do that in C++, either. Indeed, C++ doesn't have > indefinite types so it's unlikely a C++ programmer would even think to > ask that question. Hmm. To be specific; can a C++ STL Map be instantiated with a C++ STL String as the Key? I'll have to check, but I bet the answer is "yes". **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 6:27 PM Yes of course an STL map can be instantiated with type std::string as the key, but that type is analogous to Ada's Unbounded_String, not String. > I'll have to check, but I bet the answer is "yes". Yes it can, but you're comparing apples and oranges. **************************************************************** From: Stephen Leake Sent: Monday, February 9, 2004 8:36 PM Ok. And Ada.Containers.Map can be instantiated with Unbounded_String as the Key. Good enough. **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 2:22 PM Pascal Obry wrote: > Except that a stack should have a far more limited set of operations. This > ensure that the stack abstraction is not worked-around. Fine. Then you can implement that stack abstraction yourself, using a vector as the implementation. > I really think that this should be reconsidered. A list is the most used > abstraction in many software I have built/seen. I think so too, but the subcommittee had to reduce the scope of the proposal and so lists didn't make the cut. If you ask for too much then you might not get anything. 
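[Editor's note: The "stack over a vector" idea mentioned above reads roughly like
this. This is a sketch only; the vector's generic formal names and operation
profiles are assumptions based on this thread.]

   generic
      type Element_Type is private;
   package Stacks is
      type Stack_Type is limited private;
      procedure Push (Stack : in out Stack_Type; Item : in  Element_Type);
      procedure Pop  (Stack : in out Stack_Type; Item : out Element_Type);
      --  Push appends at the back of the vector; Pop reads and then deletes
      --  the last element.  Nothing else is re-exported, so the stack
      --  abstraction cannot be worked around.
   private
      package Vectors is new Ada.Containers.Vectors
        (Index_Type => Positive, Element_Type => Element_Type);
      type Stack_Type is limited record
         Rep : Vectors.Vector_Type;
      end record;
   end Stacks;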
> Of course but again this is wrong in my view. The abstraction should be > constrained to the set of operations for a queue. Fine. Then you can implement that queue abstraction yourself, using a list as the implementation. >In that case why not remove > the vector, it can be implemented easily with a map, the key is the index of > the item in the array :) That would be an example of "abstraction inversion": using a higher-level abstraction to implement a more low-level one. This is the mistake they made in Ada83, requiring that high-level tasks be used to implement low-level synchronization constructs as semaphores and monitors. Ada is a low-level systems programming language. It is not Perl. > If we miss some important components in the standard container library what we > will do ? Use another component library like Charles or PragmArc... an not use > the standard container library... so what the point ???? Do whatever you're doing now. The intent of the committee is that this small, modest set of containers will provide the impetus for a secondary standard. > The most important point in a container library is *completeness* I would > say. This is exactly what STL has done. Well, my original proposal included all the containers in the STL and then some. So don't blame me! **************************************************************** From: Pascal Obry Sent: Monday, February 9, 2004 2:52 PM > Fine. Then you can implement that stack abstraction yourself, using a > vector as the implementation. Of course, I also can implement every thing myself :) > Fine. Then you can implement that queue abstraction yourself, using a > list as the implementation. Of course, I also can implement every thing myself :) > That would be an example of "abstraction inversion": using a > higher-level abstraction to implement a more low-level one. As it is to implement a stack over a vector abstraction. > Ada is a low-level systems programming language. It is not Perl. It is not Perl, but it is not either a low-level systems programming language :) And yes I'll keep repeating this :) > Do whatever you're doing now. But I don't !!! That's the whole point of the container library. > The intent of the committee is that this small, modest set of containers > will provide the impetus for a secondary standard. Ok. That's a point. > > The most important point in a container library is *completeness* I would > > say. This is exactly what STL has done. > > Well, my original proposal included all the containers in the STL and > then some. So don't blame me! I know Matthew and I want to thanks you for the hard work. I just expected a bit more so I'm frustrated :) **************************************************************** From: Matthew Heaney Sent: Monday, February 9, 2004 2:00 PM Martin Krischik wrote: > But Ada hasn't got a garbage collector so there is the deallocation problem. > Especialy when the container copied or passed around. You are responsible for memory management of the indefinite elements. Implement your high-level abstraction using the low-level container, instantiated with an access type. > And Ada (unlike C++) can to better! With Ada you can have a container with > indefinite types where with C++ you can't. We should not give away that > advantage. There is only a slight difference here between Ada95 and C++. In Ada95 you can do this: procedure Insert (C : in out CT; E : in ET) is EA : constant ET_Access := new ET'(E); begin ... This will work even if ET is indefinite. 
In C++ the type has to have a clone operator or whatever: void insert(const e_t& e) { e_t* const pe = e.clone(); ... } Internally the components wouldn't be any different. **************************************************************** From: Stephen Leake Sent: Monday, February 9, 2004 2:04 PM Matthew Heaney writes: > Stephen Leake wrote: > > > What is the rationale for making the Map Key_Type definite, as opposed > > to indefinite? Since an indefinite Key_Type is required for > > Containers.Maps.Strings, why not make that capability available to the > > users? > > Because that would punish users that have definite key types. Can you elaborate on this? I don't see it. > Also, type String isn't just any indefinite type. It's an array. > > The reference implementation for String_Maps looks like this: > > type Node_Type; > type Node_Access is access Node_Type; > > type Node_Type (Key_Length : Natural) is > record > Key : String (1 .. Key_Length); > Element : aliased Element_Type; > Next : Node_Access; > end record; Obviously you can optimize a container if you know the specific types involved. But the standard containers aren't supposed to be about highly optimized code; they are supposed to be about generally useful code. > > I don't see a discussion of this in AI-302-03/01. > > There is a paragraph in there explaining why we have a dedicated maps > whose key type is String. Yes. It does _not_ say why Ada.Containers.Maps.Key_Type is _not_ indefinite. That's what I'd like to see. > > Another point: Containers.Vectors.Size should return > > Index_Type'Base, and the Size parameter in Resize should also be > > Index_Type'Base. It's confusing to have different types for Size > > and Index. > > No. The parameter of the Resize operation specifies a hint about the > future length of the container, which is subtype Natural. Why is it Natural? Randy pointed out that Index_Type'Base might not include 0, or even be an enumeral. I'd rather see Index_Type be specified as a signed integer, including 0, rather than have Size return a type that is not Index_Type. (SAL makes this choice). > > There's also a problem if Natural'Last < Index_Type'Last; you > > can't have a vector that contains every index! > > The assumption is that a container will always have fewer the > Integer'Last number of elements. (On a 32 bit machine that's 4.2 > billion values...) And that assumption is precisely the problem. On systems where Integer'Last is 2**15, you can't have large containers. Ada must not make such assumptions! **************************************************************** From: Stephen Leake Sent: Monday, February 9, 2004 2:14 PM > The internal node of the map reference implementation looks like this: Ok. That makes sense. I suggest this level of detail be kept in the Rationale for the Ada.Containers package. I address this issue in SAL (http://www.toadmail.com/~ada_wizard/ada/sal.html) by allowing the user to specify both the Key_Type and the Key_Node_Type, and provide a function To_Key_Node to go from one to the other. For definite keys, the types are the same, and To_Key_Node is an inlined null function, so there is no overhead. For indefinite keys, that function does the allocation. Hm. In shared code generics, I guess the "inlined null function" does not get optimized away. So perhaps this would not be an appropriate approach for a standard Ada package. Actually, in SAL, keys are always stored in the Items, so you'll only see Item_Type, Key_Type, and Item_Node_Type, not Key_Node_Type. 
But the principle is the same. It is more complex to instantiate SAL containers than the proposed Ada.Containers.Map. But I would argue that it is worth it. > If you want an indefinite key type, then allocate the key object > yourself and instantiate the component using the key access type. > This shouldn't be a problem since the map object is typically part of > some higher-level abstraction anyway, so you can hide the allocation > and map manipulation from the users of that higher-level > abstraction. Ok. In SAL, I don't have two layers. And I agree with others who say that Ada should provide a useful container that does "typical" memory management tasks for you. But any container is better than none :). **************************************************************** From: Alexandre E. Kopilovitch Sent: Monday, February 9, 2004 3:05 PM Pascal Obry wrote: > Ada is not low-level and certainly not a system > programming language. Ada is an high level language without a specific > domain, this is my point of view. Self-contradictory viewpoint, though - because high level language without a specific domain and low-level system programming language are roughly the same thing -:) > The idea behind the Ada containers was to have a common set of useful > components for Ada to avoid reinventing the wheel... So the argument > "If you need to store elements whose type is indefinite, then you have to > build that abstraction yourself" sounds boggus to me ;) If we call them "containers" then they should, in some substantial sense, *contain* things, not just refer to them, So, in this case, they should do all associated memory management. Otherwise, they aren't Containers, they are Inventories. It is improper name that confuses the matter and creates heated argument. Also, it seems that the library is planned without looking at new features in Ada2005, particularly, interfaces. I think that this (if true) may be a serious mistake. Interfaces may provide a way for reconciling different requirements. **************************************************************** From: Ehud Lamm Sent: Tuesday, February 10, 2004 1:04 AM I would be very happy to see an Ada.Container.Interfaces (or Ada.Container.Signatures) package/hierarchy, specifying APIs, which could then be used to achieve (static) polymorphism. I think this is the palce to provide Stack, Queue interfaces etc. as well. I think that's a good way to encourage the building block approach. As far as I recall the workshop we had in Vienna (right?), not many shared my enthusiasm, alas. **************************************************************** From: Randy Brukardt Sent: Tuesday, February 10, 2004 6:53 PM Alexandre E. Kopilovitch wrote: ... > Also, it seems that the library is planned without looking at new features > in Ada2005, particularly, interfaces. I think that this (if true) may be a > serious mistake. Interfaces may provide a way for reconciling different > requirements. I wondered how long it would be before someone asked that question. I did in fact do some (idle) thinking on that question, and I concluded that interfaces wouldn't be useful for the containers library. What you'd like is to be able to write interfaces that describe iteration, for example, and be able to use those without knowing anything about the underlying container. Similarly, you could have a sequence interface that worked with any sequence container. However, that doesn't really work. 
The primary problem is that the profiles of the operations of an interface
are fixed other than the object itself. But, for a container, the
operations contain a generic formal type (the element type), as well as the
object type. That means that general interfaces (like the ones described
above, for example) can't be written that would match any possible element
type, only a specific element type (which is pretty useless).

One way to get around that would be to put the interfaces into the generic
units. But then, the interfaces would only be usable with that container --
hardly a useful interface! You might as well just use the container
directly.

A better way would be to make the element type an interface itself. Then
you could write useful non-generic interfaces. But that would limit the
contained objects to types that can have an interface: tagged types, and
perhaps task and protected types (and of course have the required
interface). That sort of limitation isn't going to fly for the primary
container library - a container of access values is just too common and
important. (I could imagine an O-O offshoot that worked that way - in a
secondary standard.)

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Tuesday, February 10, 2004 9:45 PM

Randy Brukardt wrote:

> I did in fact do some (idle) thinking on that question, and I concluded that
> interfaces wouldn't be useful for the containers library.
>
> What you'd like is to be able to write interfaces that describe iteration,
> for example, and be able to use those without knowing anything about the
> underlying container. Similarly, you could have a sequence interface that
> worked with any sequence container.

Yes.

> However, that doesn't really work. The primary problem is that the profiles
> of the operations of an interface are fixed other than the object itself.
> But, for a container, the operations contain a generic formal type (the
> element type), as well as the object type. That means that general
> interfaces (like the ones described above, for example) can't be written
> that would match any possible element type, only a specific element type
> (which is pretty useless).

This shows an unpleasant incompatibility of interfaces with generics. Well,
perhaps "incompatibility" is too strong a word for that, but anyway there
is some inconsistency; these notions do not collaborate smoothly. And this
is a general issue, regardless of the container library.

> One way to get around that would be to put the interfaces into the generic
> units. But then, the interfaces would only be usable with that container --
> hardly a useful interface! You might as well just use the container
> directly.

Yes, this is clearly a poor way.

> A better way would be to make the element type an interface itself. Then you
> could write useful non-generic interfaces. But that would limit the
> contained objects to types that can have an interface: tagged types, and
> perhaps task and protected types (and of course have the required
> interface). That sort of limitation isn't going to fly for the primary
> container library - a container of access values is just too common and
> important.

I don't understand the latter sentence - I thought that access to
interfaces is permitted...
****************************************************************

From: Alexandre E. Kopilovitch
Sent: Tuesday, February 10, 2004 9:45 PM

Randy Brukardt wrote:

> I did in fact do some (idle) thinking on that question, and I concluded that
> interfaces wouldn't be useful for the containers library.
>
> What you'd like is to be able to write interfaces that describe iteration,
> for example, and be able to use those without knowing anything about the
> underlying container. Similarly, you could have a sequence interface that
> worked with any sequence container.

Yes.

> However, that doesn't really work. The primary problem is that the profiles
> of the operations of an interface are fixed other than the object itself.
> But, for a container, the operations contain a generic formal type (the
> element type), as well as the object type. That means that general
> interfaces (like the ones described above, for example) can't be written
> that would match any possible element type, only a specific element type
> (which is pretty useless).

This shows an unpleasant incompatibility of interfaces with generics. Well, perhaps "incompatibility" is too strong a word for that, but anyway there is some inconsistency; these notions do not collaborate smoothly. And this is a general issue, regardless of the container library.

> One way to get around that would be to put the interfaces into the generic
> units. But then, the interfaces would only be usable with that container --
> hardly a useful interface! You might as well just use the container
> directly.

Yes, this is clearly a poor way.

> A better way would be to make the element type an interface itself. Then you
> could write useful non-generic interfaces. But that would limit the
> contained objects to types that can have an interface: tagged types, and
> perhaps task and protected types (and of course have the required
> interface). That sort of limitation isn't going to fly for the primary
> container library - a container of access values is just too common and
> important.

I don't understand the latter sentence - I thought that access to interfaces is permitted... I'm looking at the last example in AI-251 (under the line "A somewhat less artificial example") - there is type Object_Reference, which is access to interface type Monitored_Object'Class, and this Object_Reference is used for parameters of procedures Register and Unregister.

And if you meant that those access values may point to untagged types then I think that "boxing" those untagged types will not significantly annoy a programmer.

But anyway I don't think that this way is generally better. It artificially pushes a container into the position of "controlling object", which isn't a good thing. And it often convolutes thinking... it seems no better than typical C++ puzzles, a maintainer's hell.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 10, 2004 11:03 PM

> I don't understand the latter sentence - I thought that access to interfaces
> is permitted... I'm looking at the last example in AI-251 (under the line
> "A somewhat less artificial example") - there is type Object_Reference, which
> is access to interface type Monitored_Object'Class, and this Object_Reference
> is used for parameters of procedures Register and Unregister.

Yes, but access types themselves are not tagged. What they point at is irrelevant. If you have a formal "type T is tagged private;" no access type will match that; it's the same for interfaces.

You could of course wrap the access type in a tagged record, and give the interface to that, and then the element type could be that. But then you have an extra component name in every use, which is annoying. For lower-level uses, having a vector/sequence of pointers or a map of pointers certainly sounds useful and common; forcing wrapping is not going to win any style points.

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004 4:28 PM

Regarding support for indefinite keys, Martin Krischik said:

> But you could not even store a collection of strings. Ok, there are
> unbounded strings. But storing 'Class, that's the killer feature. If
> Ada.Containers can't do it I am not interested. There will be no 20%/80%
> split. It's 0% - I won't use them.

How about this: you write a package that supports the indefinite case, and you build it on top of the (currently proposed) standard package that supports only definite? The definite-only package takes care of the hashing or whatever, and your package takes care of memory management for the indefinite keys. Maybe you try to get your package to be a de facto standard, or a secondary standard.

The point is, you *can* use the definite-only package, but only indirectly, via a wrapper package. The definite-only package isn't useless; it does *part* of the job you desire.

This seems like a better design than making a single package that supports both, and somehow magically optimizing the definite cases. If the RM supports indefinite, I claim it should do so by providing two separate packages. But we're trying to minimize the size of all this, so we choose just the lower-level one of those. Yeah, it would be nice if the RM provided both...

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004 5:36 PM

These seem like an ideal candidate for the hoped-for containers secondary standard.
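[Editor's note: For illustration only, here is a minimal sketch of the kind of wrapper discussed above: a definite, controlled "holder" that owns one indefinite object (a String, a 'Class value, ...), so that the holder itself can be stored by a definite-only container. All names are invented and error handling is omitted.

   with Ada.Finalization;
   generic
      type Item_Type (<>) is private;   --  e.g. String, or T'Class
   package Holders is
      type Holder is new Ada.Finalization.Controlled with private;
      procedure Set (H : in out Holder; Item : in Item_Type);
      function Get (H : Holder) return Item_Type;
      function Is_Empty (H : Holder) return Boolean;
   private
      type Item_Access is access Item_Type;
      type Holder is new Ada.Finalization.Controlled with record
         Ref : Item_Access;
      end record;
      procedure Adjust (H : in out Holder);
      procedure Finalize (H : in out Holder);
   end Holders;

   with Ada.Unchecked_Deallocation;
   package body Holders is

      procedure Free is
         new Ada.Unchecked_Deallocation (Item_Type, Item_Access);

      procedure Set (H : in out Holder; Item : in Item_Type) is
      begin
         Free (H.Ref);                           --  no-op if already empty
         H.Ref := new Item_Type'(Item);
      end Set;

      function Get (H : Holder) return Item_Type is
      begin
         return H.Ref.all;                       --  Constraint_Error if empty
      end Get;

      function Is_Empty (H : Holder) return Boolean is
      begin
         return H.Ref = null;
      end Is_Empty;

      procedure Adjust (H : in out Holder) is
      begin
         if H.Ref /= null then
            H.Ref := new Item_Type'(H.Ref.all);  --  deep copy on assignment
         end if;
      end Adjust;

      procedure Finalize (H : in out Holder) is
      begin
         Free (H.Ref);
      end Finalize;

   end Holders;

An instantiation such as "package String_Holders is new Holders (String);" yields a definite, assignable type that a definite-only container can store directly, so the wrapper, not the container, does the memory management. (If the container needs "=", one comparing the designated values would have to be supplied as well.) The cost is exactly the extra level that Randy and Bob describe: every access goes through Set/Get, or through an extra component name. End of editor's note.]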
****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004 11:09 AM

Randy Brukardt wrote:

> If we want an array sort, we should declare one:
>
>    generic
>       type Index_Type is (<>);
>       type Element_Type is private;
>       with function "<" (Left, Right : Element_Type) return Boolean is <>;
>       type Array_Type is array (Index_Type) of Element_Type;
>    procedure Ada.Generic_Sort (Arr : in out Array_Type);
>
> (We'd need an unconstrained version, too.) But keep it separate from the
> Vector one (or any List one, for that matter).

I added the following library-level declarations to the latest reference implementation:

   AI302.Containers.Generic_Sort_Constrained_Array
   AI302.Containers.Generic_Sort_Unconstrained_Array
   AI302.Containers.Generic_Sort

The latter works for any sequence having a random-access iterator, um, I mean cursor.

They're all basically the same: a simple quicksort using a median-of-3 to choose a pivot. The Generic_Sort for the vector is implemented as an instantiation of the generic sort for arrays.

****************************************************************

From: Robert A. Duff
Sent: Sunday, February 8, 2004 12:09 PM

Marius Amado Alves wrote:

> In the meanwhile, there is no requirement that Ada.Containers be
> implemented strictly in Ada, is there?

No. However, there is a "meta requirement" that Ada.Containers be implementABLE in Ada, and I expect all implementations will be in plain vanilla Ada without compiler-specific tricks.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004 1:41 PM

The proposal can be implemented in Ada today. In fact it already is:

****************************************************************

From: Ehud Lamm
Sent: Tuesday, February 10, 2004 12:58 AM

I agree. I think the meta requirement is the way to go. If there is some good reason to resort to non-Ada code, it should be allowed, so long as the API is maintained. BUT, it would reflect badly on the language if the only way to implement this sort of library efficiently would require going outside the scope of the language. Remember Ada is a general-purpose, reuse-oriented language.

One of the reasons I wanted this discussion (and I pushed for a standard container library back when practically no one wanted to hear...) is that I think that by working on standard libraries it is easier to focus on areas where the language needs improvement. I think this is in fact what's happening right now...

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004 2:37 PM

Right, and my point was that I want to keep it that way. I suggest the AI mention this "meta requirement" in its discussion. Some folks have suggested some sort of compiler-specific "magic" going on behind the scenes. I don't want that.

>... In fact it already is:
>
>

I thank you for your hard work on this. I haven't had a chance to look at it yet, though. What sort of copyright does it have? Can the various implementers just take your code and use it as their implementation of this AI?

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004 2:46 PM

Yes. That was the intent. We can attach any copyright necessary to allow implementors or anyone else to use it. Will the GMGPL work? I'm not an expert on these matters.

****************************************************************

From: Robert A.
Duff Sent: Monday, February 9, 2004 4:36 PM I suspect the GMGPL would work, but I'm not an expert on these matters, either. I suggest you ask Robert Dewar. **************************************************************** From: Pascal Leroy Sent: Monday, February 9, 2004 10:41 AM > I've just posted the report of the containers committee on > Ada-Comment. The executive summary follows. You can read the > whole report in the !appendix to AI-00302-3/01, which you can > find at: http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-20302.TXT > or you can download the ZIP or tar files from: > http://www.ada-auth.org/ais.html Good job. A few comments after a first perusal: 1 - Insisting on O(N log N) complexity for the sorting algorithm excludes Shellsort. This is misguided in my opinion, as Shellsort often behaves better in practice that Quicksort (in particular, if the input file is nearly in order). 2 - I would really like it if the definition of containers were written without a particular implementation in mind. It's OK to explain that a Vector is logically an array, but _requiring_ that insertion at the beginning should take time O(N) is nonsensical! This is preventing possibly better implementations. I have also seen in a mail by Randy that element access has to be in O(1) (somehow I can't find this in the AI). Again, I believe that this is overspecification. A skip list would be in my opinion a perfectly good implementation of a Vector, as in most practical situations the difference between O(1) and O(Log N) doesn't matter. But the O(1) requirement precludes a skip list implementation... 3 - Similarly, I don't understand why the definition of Maps insists on a hash-based implementation. I have no problem with the notion that this generic takes a hash-function, as this can be generally useful whatever the implementation strategy. But I don't see why it's necessary to insist on or expose the details of a hash-based implementation. For large maps, a tree-based implementation makes probably more sense. We should not prevent such an implementation. Furthermore, the description seems to require a hash-based implementation that tries to keep the collision lists reasonably short (by increasing the number of buckets) and that can lead to very expensive deallocation/reallocation. 4 - Like others, I don't like the type names ending in _Type (but I realize that's a matter of taste). More seriously, I don't like the usage of the word Vector, as this word is already used by AI 296. Since it might make perfect sense to have a vector-302 of vectors-296 (e.g. successive positions of a mobile) the terminology is only going to cause confusion among users. Of all the proposals that I have seen, Sequence has my preference. And I don't give a damn what the terminology is in Java or C++. **************************************************************** From: Robert Dewar Sent: Monday, February 9, 2004 11:02 AM > 1 - Insisting on O(N log N) complexity for the sorting algorithm > excludes Shellsort. This is misguided in my opinion, as Shellsort often > behaves better in practice that Quicksort (in particular, if the input > file is nearly in order). Or what about linear sorts like address calculation :-) > 2 - I would really like it if the definition of containers were written > without a particular implementation in mind. It's OK to explain that a > Vector is logically an array, but _requiring_ that insertion at the > beginning should take time O(N) is nonsensical! 
This is preventing > possibly better implementations. I have also seen in a mail by Randy > that element access has to be in O(1) (somehow I can't find this in the > AI). Again, I believe that this is overspecification. A skip list > would be in my opinion a perfectly good implementation of a Vector, as > in most practical situations the difference between O(1) and O(Log N) > doesn't matter. But the O(1) requirement precludes a skip list > implementation... I agree this is over specified. Also, O(1) is a bit bogus given caches anyway. > 3 - Similarly, I don't understand why the definition of Maps insists on > a hash-based implementation. I have no problem with the notion that > this generic takes a hash-function, as this can be generally useful > whatever the implementation strategy. But I don't see why it's > necessary to insist on or expose the details of a hash-based > implementation. For large maps, a tree-based implementation makes > probably more sense. We should not prevent such an implementation. > Furthermore, the description seems to require a hash-based > implementation that tries to keep the collision lists reasonably short > (by increasing the number of buckets) and that can lead to very > expensive deallocation/reallocation. I agree with Pascal here entirely > 4 - Like others, I don't like the type names ending in _Type (but I > realize that's a matter of taste). More seriously, I don't like the > usage of the word Vector, as this word is already used by AI 296. Since > it might make perfect sense to have a vector-302 of vectors-296 (e.g. > successive positions of a mobile) the terminology is only going to cause > confusion among users. Of all the proposals that I have seen, Sequence > has my preference. And I don't give a damn what the terminology is in > Java or C++. I really think the _Type suffix should be avoided, with few exceptions it is not at all RM style. **************************************************************** From: Tucker Taft Sent: Monday, February 9, 2004 12:37 PM > 1 - Insisting on O(N log N) complexity for the sorting algorithm > excludes Shellsort. This is misguided in my opinion, as Shellsort often > behaves better in practice that Quicksort (in particular, if the input > file is nearly in order). The complexity specifications are intended to set expectations, without being overly prescriptive. If there are no shared expectations, then the containers can end up being frustrating to use. As usual, the requirement associated with a complexity specification is that as N => infinity, there is some upper bound on the ratio between the actual time and the given formula. We should also make it clear whether this is for the average case, or the worst case. > 2 - I would really like it if the definition of containers were written > without a particular implementation in mind. It's OK to explain that a > Vector is logically an array, but _requiring_ that insertion at the > beginning should take time O(N) is nonsensical! Clearly we should say "no worse than O(N)". > ... This is preventing > possibly better implementations. I have also seen in a mail by Randy > that element access has to be in O(1) (somehow I can't find this in the > AI). Again, I believe that this is overspecification. A skip list > would be in my opinion a perfectly good implementation of a Vector, as > in most practical situations the difference between O(1) and O(Log N) > doesn't matter. But the O(1) requirement precludes a skip list > implementation... 
I am not an expert on skip lists, but it seems critical to appropriate use that any element of a vector is "directly addressible". Random access is a fundamental part of the abstraction, and if that is not efficient, it will be very hard to create applications that work reasonably across implementations. There needs to be some kind of bound on random access. If you believe O(Log N) is acceptable, we can consider that. For a vector, I personally expect O(1), where the constant factor is *very* small, and the per-component space overhead ratio is no worse than 100%, even for byte-sized components. > 3 - Similarly, I don't understand why the definition of Maps insists on > a hash-based implementation. I have no problem with the notion that > this generic takes a hash-function, as this can be generally useful > whatever the implementation strategy. But I don't see why it's > necessary to insist on or expose the details of a hash-based > implementation. For large maps, a tree-based implementation makes > probably more sense. Why? I would have thought just the opposite. A hashed map can provide an average case of O(1), and there is nothing precluding using trees for the few hash buckets that get big. > ... We should not prevent such an implementation. > Furthermore, the description seems to require a hash-based > implementation that tries to keep the collision lists reasonably short > (by increasing the number of buckets) and that can lead to very > expensive deallocation/reallocation. I feel like you are arguing both sides of the coin here. You are objecting to the behavior while at the same time saying we shouldn't specify it. If it is clear that this is an abstraction whose performance is no worse than an extensible hash table, then it is more likely it will be used appropriately. By doubling on each expansion, the number of reallocations can be kept relatively small, and the pieces left behind are generally just the right size for other growing hash tables. I suppose you could say that you will implement it in a way that makes your particular customers happy, but I don't think that is a way to create a standard. The goal is portability, not only in terms of correct execution, but also in terms of reasonable, relatively predictable performance. I agree we shouldn't overspecify, but nor should we underspecify. We need to specify enough to establish useful, reasonable expectations for implementors and users, so the container library is not just a toy, but is actually a useful part of the professional Ada programmer's toolkit. We certainly should never discourage implementors from doing better than the minimal requirements, but nor should we encourage them to deviate so much from the minimal requirements that they have effectively created a different abstraction, interfering with portability. I see the error bounds specified for the elementary functions as a similar exercise. They establish expectations, which reduces confusion and frustration, and helps make it clear when the language-defined functions can be used appropriately, and when they can't. > 4 - Like others, I don't like the type names ending in _Type (but I > realize that's a matter of taste). More seriously, I don't like the > usage of the word Vector, as this word is already used by AI 296. Since > it might make perfect sense to have a vector-302 of vectors-296 (e.g. > successive positions of a mobile) the terminology is only going to cause > confusion among users. 
Of all the proposals that I have seen, Sequence > has my preference. And I don't give a damn what the terminology is in > Java or C++. Of course you don't give a damn. But the question is whether other users who do write significant amounts of code in other languages will appreciate the effort to be part of the mainstream, rather than always trying to swim in our own creek, elegant and pure as it may be. **************************************************************** From: Robert Dewar Sent: Monday, February 9, 2004 1:10 PM Tucker Taft wrote: > The complexity specifications are intended to set expectations, > without being overly prescriptive. If there are no shared expectations, > then the containers can end up being frustrating to use. > As usual, the requirement associated with a complexity specification > is that as N => infinity, there is some upper bound on the ratio > between the actual time and the given formula. We should also > make it clear whether this is for the average case, or the worst case. Big O is not an upper bound, it is a description of asymptotic behavior. As written, this spec would prohibit a sort whose behavior was asymptotically linear. > I am not an expert on skip lists, but it seems critical to appropriate use > that any element of a vector is "directly addressible". Random access > is a fundamental part of the abstraction, and if that is not efficient, > it will be very hard to create applications that work reasonably across > implementations. There needs to be some kind of bound on random > access. If you believe O(Log N) is acceptable, we can consider that. > For a vector, I personally expect O(1), where the constant factor is *very* > small, and the per-component space overhead ratio is no worse than 100%, > even for byte-sized components. What does constant factor mean here? A typical implementation of arrays will have extreme variable behavior depending on caching. A naive model in which all access is constant time is unrealistic in any case. > Why? I would have thought just the opposite. A hashed map can provide > an average case of O(1), and there is nothing precluding using trees > for the few hash buckets that get big. I personally think that any comments about performance should be implementation advice, not requirements. You will get into all kinds of formal mess if you try to make them requirements, but as IA they are fine and comprehensible. > Of course you don't give a damn. But the question is whether other users > who do write significant amounts of code in other languages will appreciate > the effort to be part of the mainstream, rather than always trying to > swim in our own creek, elegant and pure as it may be. To me, sequence *is* more mainstream than vector. The latter phrase comes with far too much baggage :-) **************************************************************** From: Stephane Barbey Sent: Monday, February 9, 2004 1:54 PM Both IDL and UML (OCL) use "Sequence" for unbounded collections of ordered elements that allow the same element more than once. OCL offers Set, Bag, Sequence and Collection. The Ada mapping to IDL offers a Corba.Sequences.Unbounded (and Bounded) package that are similar in spirit (and in specification) to what the Ada.Strings.Bounded and Unbounded packages provide. **************************************************************** From: Randy Brukardt Sent: Monday, February 9, 2004 2:07 PM > I personally think that any comments about performance should be > implementation advice, not requirements. 
You will get into all kinds > of formal mess if you try to make them requirements, but as IA they > are fine and comprehensible. All of the performance "requirements" *are* written as Implementation Advice. There isn't any way that I can think of to make them normative, and in any case, that would be overspecification. So, if Pascal wants to ignore them, he can -- he just has to document that fact. **************************************************************** From: Robert Dewar Sent: Monday, February 9, 2004 3:23 PM OK, sorry, missed this, then I have no objection to any of the statements, though there is still a bit of over-specification I would say :-) **************************************************************** From: Robert A. Duff Sent: Monday, February 9, 2004 4:27 PM By the way, I find this discussion somewhat frustrating, because there are discussions going on in ada-comment, and also on arg. People are raising some of the same points on both. It seems like the ARG should pay a lot of attention to real users on this issue, but I fear some key ARG members are not currently listening to ada-comment, and many ada-comment folks are not seeing the arg mailing list. Sigh. Anyway, Pascal Leroy said: > 2 - I would really like it if the definition of containers were written > without a particular implementation in mind. It's OK to explain that a > Vector is logically an array, but _requiring_ that insertion at the > beginning should take time O(N) is nonsensical! I'm responding to Pascal's message, because it makes the point so clearly, but this is really a more general comment. This is the *usual* view of language design, and the usual view in the Ada RM -- we specify the high-level semantics, and not the efficiency of things. However, I think for a container library, efficiency properties are the key issue. Consider "sequences" -- an ordered sequence of items, which can in principle be numbered from 1 to N (or 0 to N-1, if the programmer prefers). There are many possible implementations of "sequence" -- singly-linked lists, doubly-linked lists with dummy header, growable arrays, fixed-size arrays, etc. Programmers choose among those primarily for efficiency reasons. Therefore, I think we should be thinking about a secondary standard that contains a variety of "sequence" packages. Each should be named according to the intended implementation, so the programmer can choose wisely. We're saying "vector" (meaning "array-based" or "contiguous hunk of storage") should be the one in the next RM -- but we expect others, like linked lists. So I disagree with Pascal above -- I think the container packages *should* have a particular implementation in mind. I'll even go further than Randy, and say that instead of "O(1) access" I really want "a vector/array-based implementation". Now, you may say that's overspecification. Why shouldn't the implementer choose a "better" implementation? Well, for containers, there is no "better" -- they just have different efficiency properties (better for some uses, worse for others). As a programmer, I need to know the underlying implementation. The language designer cannot know which implementation of sequences is "better". Nor can the implementer. Only the programmer can know. Therefore, we should not let implementers choose, here. If one implementer chooses "arrays, deallocated and reallocated when growing" and the other implementer chooses "skip lists", it's a disaster -- the programmer has no idea which package to choose. 
I say, the vectors package should say (as Implementation Advice) "the intended implementation is as an array", rather than saying something about O(1) access. As others have pointed out, there's really no such thing as O(1) random access -- if you make the vector big enough, you will get O(log N) because of cache or paging effects. Then a secondary standard can define 17 other varieties of "sequence" that have different efficiency properties. None is "best" for all purposes. However, it is desirable that they all have interfaces that are as similar as possible. SUMMARY: Don't let implementers choose the one be-all end-all sequence package. We choose a particular sequence implementation (vectors/arrays) that is useful, and let a secondary standard build all the others. Let the programmer choose among them. **************************************************************** From: Randy Brukardt Sent: Monday, February 9, 2004 6:01 PM Robert Duff: > By the way, I find this discussion somewhat frustrating, because there > are discussions going on in ada-comment, and also on arg. Besides Bob's "real user" concerns, I am faced with the aggrevating task of filing two unrelated threads on the same topic going on at the same time into the same AI. I fear no one is going to be able to make sense out of the !appendix section... ... > So I disagree with Pascal above -- I think the container packages > *should* have a particular implementation in mind. I'll even go further > than Randy, and say that instead of "O(1) access" I really want "a > vector/array-based implementation". That's actually what the Implementation Advice says. But of course it is Implementation Advice, so it has no force: Pascal can use a skip list if he wants. Going further than that would be useless and bad for at least some implementations. For instance, because of generic code sharing, the implementation of the Vector type will essentially be an array of pointers. Because of that, I'll probably implement this as an array of pointers, and use that to eliminate copying in insert/delete/sort operations. Technically, that would still be a correct implementation (insert would still be O(N), just the constant would be a lot lower). But clearly, the ratio of execution times between the various operations would be quite different for this package than for the "canonical" implementation. To avoid that, you'd pretty much have to specify the body of the package. But even that doesn't really help. Again, looking at Janus/Ada, you're going to get (implicit) allocations of the elements. So, for a Vector of elementary, the cost of an Insert operation could be 20 times more than for a non-sharing implementation. (While for a Vector of a type with an expensive assignment, it might only be a few percent more.) For most uses of the container, this difference in performance (which appears because the unit is generic) is likely to matter more than the O(N) performance. Of course, this is an extreme example, but it shows that the actual performance of the container is going to depend heavily on the implementation no matter what is specified in the standard. So going beyond O(N) type specifications for key operations doesn't help, and could be actively harmful (by preventing innovative implementations). **************************************************************** From: Randy Brukardt Sent: Monday, February 9, 2004 5:30 PM Pascal wrote: ... 
> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind. It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical! This is preventing
> possibly better implementations. I have also seen in a mail by Randy
> that element access has to be in O(1) (somehow I can't find this in the
> AI).

For the record, here's the wording from the AI. (I wrote this, Matt wanted it, but didn't know how to express it. I'm not sure I do either - but I knew I didn't want to define the O(N) notation...)

   Implementation Advice

   Containers.Vectors should be implemented similarly to an array. In
   particular, the time taken by Append and Element should not depend on
   the number of items in the Vector, and the time taken by Insert or
   Delete at First of the vector should take time roughly proportional
   to the number of elements in the vector.

And you are correct, the last part of the sentence should say "no worse than" or something like that. (Although I can't think of any implementation that meets the first part that doesn't also meet the second part exactly - you can reduce the constant arbitrarily, but it still is proportional to N.)

> 4 - Like others, I don't like the type names ending in _Type (but I
> realize that's a matter of taste).

Our original idea was to avoid the "_Type". However, when I tried to do that, there were a lot of conflicts with package, subprogram, and parameter names. In the interests of getting a report done on time, we wanted to avoid major surgery to the proposal. (Especially updating the examples would be painful.) So we stuck with "_Type".

If there is a majority opinion that it is worth going forward with these packages, and that changing the names would be preferred, then I can spend the time to do it. But I don't want to spend the ARG's limited resources doing major changes if all we're going to do is kill the proposal anyway. (I would hope that no one votes against the proposal solely because they don't like the names - although such a result wouldn't surprise me.)

****************************************************************

From: Robert Dewar
Sent: Monday, February 9, 2004 6:01 PM

Randy Brukardt wrote:

> If there is a majority opinion that it is worth going forward with these
> packages, and that changing the names would be preferred, then I can spend
> the time to do it. But I don't want to spend the ARG's limited resources
> doing major changes if all we're going to do is kill the proposal anyway. (I
> would hope that no one votes against the proposal solely because they don't
> like the names - although such a result wouldn't surprise me.)

It's always risky to vote for something that is flawed with the expectation of fixing it. On the other hand, at least one delegation in Salem that was strongly in favor of adding the keyword CLASS to the language voted against JDI's proposal because they did not like the prefix notation (I told Jean not to mix up the issues, but he did not listen to me). They were quite dismayed that the proposal failed. So you never know...

(It is interesting to wonder what would have happened at Salem if that delegation had understood how the vote worked and voted its actual interests: the vote would have been 3-2 in favor of "class X is ...", the US was poised to follow the winning side, so the eventual vote would have been 4-2 - and who knows what would have happened?)
(sorry to digress, but it's an interesting little piece of Ada trivia history :-) **************************************************************** From: Robert Dewar Sent: Monday, February 9, 2004 6:04 PM Robert A Duff wrote: > I'm responding to Pascal's message, because it makes the point so > clearly, but this is really a more general comment. > > lot's of sensible stuff deleted here > > SUMMARY: Don't let implementers choose the one be-all end-all sequence > package. We choose a particular sequence implementation > (vectors/arrays) that is useful, and let a secondary standard build all > the others. Let the programmer choose among them. I find Bob's comments here to make a lot of sense, and I agree with all of them (yes I know that's a change in position, but I think the fact that this is IA, and Bob's useful perspective make the difference). **************************************************************** From: Pascal Leroy Sent: Monday, February 9, 2004 4:58 AM Bob chided me: > By the way, I find this discussion somewhat frustrating, > because there are discussions going on in ada-comment, and > also on arg. People are raising some of the same points on > both. It seems like the ARG should pay a lot of attention to > real users on this issue, but I fear some key ARG members are > not currently listening to ada-comment, and many ada-comment > folks are not seeing the arg mailing list. Sorry, the signal/noise ratio on Ada-Comment is too poor, I admit that I don't have the patience to read all that stuff, and I didn't want to get 50 replies to my initial message. Anyway, to avoid confusion, I promise I will shut up until this topic is discussed face-to-face in Phoenix. Randy pointed out: > All of the performance "requirements" *are* written as > Implementation Advice. There isn't any way that I can think > of to make them normative, and in any case, that would be > overspecification. I realize that they are IA, and that's fine. I am just arguing that the advices as written are excluding perfectly good implementations. Of course I can ignore them, but that's not a satisfactory answer to me: if we put them in the RM they should be useful. Tuck commented: > I agree we shouldn't overspecify, but nor should we > underspecify. We need to specify enough to establish useful, > reasonable expectations for implementors and users, so the > container library is not just a toy, but is actually a useful > part of the professional Ada programmer's toolkit. I completely agree with this principle. The performance advices are only there to prevent "bad" implementation. They should not constrain "good" implementations. For instance, using a bubble sort is a no-no, but we want an implementer to be able to use heapsort, quicksort or shellsort (or a combination of the three). Similarly, a Vector should not be implemented using a simple linked list, but an array or a skip list are both valid implementations. > If you believe O(Log N) is acceptable, we can consider that. As others have pointed out, O(1) and O(Log N) are hardly distinguishable in practice, it's only the multiplicative factor that counts, so yes, I believe that we should allow O(Log N) access for vectors. Back to Bob: > This is the *usual* view of language design, and the usual > view in the Ada RM -- we specify the high-level semantics, > and not the efficiency of things. > > However, I think for a container library, efficiency > properties are the key issue. I don't see what makes a container library so different from all the rest. 
Let me draw your attention to the fact that we don't specify efficiency properties for the string packages, or for the numerics (including the matrix operations of AI 296). I know that Bob doesn't do numerics, but for people who do, the performance of these libraries are likely to be more critical than that of containers. In practice what happens is that they run benchmarks, and talk sternly to their vendor if they don't like the results. > Therefore, I think we should be thinking about a secondary > standard that contains a variety of "sequence" packages. > Each should be named according to the intended > implementation, so the programmer can choose wisely. We're > saying "vector" (meaning "array-based" or "contiguous hunk of > storage") should be the one in the next RM -- but we expect > others, like linked lists. You are on the right track to kill this proposal with kindness ;-) > So I disagree with Pascal above -- I think the container packages > *should* have a particular implementation in mind. I'll even > go further than Randy, and say that instead of "O(1) access" > I really want "a vector/array-based implementation". But what do you gain if you don't specify the multiplicative factor? I have this wonderful implementation of vectors, I swear it's O(1), but for some reason the multiplicative constant is such that it takes 1 sec on average to access an element. This is a Duff-compliant implementation, but hardly a good one. Surely you don't want to get into the business of specifying the factor, right? Unless of course your target is a MIX computer ;-) > Now, you may say that's overspecification. Why shouldn't the > implementer choose a "better" implementation? Well, for > containers, there is no "better" -- they just have different > efficiency properties (better for some uses, worse for > others). The same is true for everything. For the elementary functions, you have a trade-off between speed and accuracy. Which is best? Depends on the application. For the random numbers, there is a trade-off between speed, size of the generator, and quality of the random numbers. Again, there is no better implementation. > If one implementer chooses "arrays, deallocated and > reallocated when growing" and the other implementer chooses > "skip lists", it's a disaster > -- the programmer has no idea which package to choose. Either the programmer doesn't care, for instance because they only put a few elements in the vector, and both implementations are fine (that's Randy's viewpoint, I think). Or the programmer does care, and he better run a simple benchmark with, say, a 10-million-element vector, and see what happens. > SUMMARY: Don't let implementers choose the one be-all end-all > sequence package. We choose a particular sequence implementation > (vectors/arrays) that is useful, and let a secondary standard > build all the others. Let the programmer choose among them. SUMMARY: For once, I disagree with just about everything that Bob wrote. **************************************************************** From: Robert A. Duff Sent: Tuesday, February 10, 2004 8:35 AM > Bob chided me: I didn't mean to chide you in particular. In fact, I didn't mean to chide anybody. I was merely lamenting the fact that there is no forum where the public (i.e. ada-comment folks) and the arg can discuss the issue of containers. Sorry. > > However, I think for a container library, efficiency > > properties are the key issue. > > I don't see what makes a container library so different from all the > rest. 
Let me draw your attention to the fact that we don't specify > efficiency properties for the string packages, or for the numerics > (including the matrix operations of AI 296). I know that Bob doesn't do > numerics, but for people who do, the performance of these libraries are > likely to be more critical than that of containers. In practice what > happens is that they run benchmarks, and talk sternly to their vendor if > they don't like the results. It seems to me that for most features of the language, either efficiency doesn't matter all that much, or else it's fairly obvious what the efficiency properties will be. I *know* (roughly) how compilers represent integers, arrays, records, etc. But there are many wildly different ways to represent sequences. I don't want Vectors represented as skip lists any more than I want built-in arrays implemented as skip lists. There are a few cases like this is Ada already. One example is size-changing records (i.e. defaulted discriminants). Some compilers choose an allocate-the-max-size strategy, and others choose a heap-based strategy. The former is unacceptable when the max size is 2**33 bytes. The latter is unacceptable in real-time systems that don't want heap allocation, or whenever the extra level of indirection is too costly. It's not obvious which implementation choice is "better". If Ada 83 had specified (as a NOTE or whatever) which choice the language designers expected, use of this feature would have been much more portable. As to numerics, I don't know what I'm talking about, but I know that the numerics annex is full of accuracy requirements. Isn't the implementer's goal simply "as fast as possible, given the accuracy requirements"? Are there wildly different implementation strategies? I was under the impression that it's more like, "spend more money, make the algorithms incrementally faster". As to matrices, I don't know what I'm talking about there, either, but don't we want all vendors to use a two-dimensional array? An implementer that chose a sparse representation wouldn't be doing any favors, right? ... > But what do you gain if you don't specify the multiplicative factor? I > have this wonderful implementation of vectors, I swear it's O(1), but > for some reason the multiplicative constant is such that it takes 1 sec > on average to access an element. This is a Duff-compliant > implementation, but hardly a good one. I trust implementers not to *deliberately* sabottage their products. But implementers need to understand what's expected of them. I want to say that for Vectors, an array-based implementation is expected -- we're not asking for the world's greatest all-purpose sequence package here; we're asking for growable arrays. > Surely you don't want to get into the business of specifying the factor, > right? Unless of course your target is a MIX computer ;-) Agreed. > SUMMARY: For once, I disagree with just about everything that Bob wrote. Oh, well. :-( **************************************************************** From: Pascal Leroy Sent: Tuesday, February 10, 2004 10:14 AM > I don't want Vectors represented as skip lists > any more than I want built-in arrays implemented as skip lists. But why? You have to explain why, you cannot just say "I don't want". When I look at the specification of Vectors, the first implementation that comes to my mind is to use an array if the vector is not too large, and dense enough. 
If it becomes too large I would probably want to switch to a skip list implementation: this would avoid the unreasonable O(N) cost on insertion/deletion. Similarly if the vector becomes very sparse (not many active elements), I would switch to a skip list implementation to save space (and indexing would become a bit more costly). Of course the skip list would not store individual elements, but probably chunks that have sufficiently high density.

Surely there are a number of parameters/thresholds to be selected to do the switch from array to skip list, but they should be easy to select by graphing the space/time characteristics of each algorithm and looking for the point where these characteristics intersect.

Incidentally we do this for string search: for small strings we use the naive algorithm, and when the string becomes large we switch to Boyer-Moore. You could call this overengineering, but as a user I don't see why you would complain.

Now you've got to explain to me what is wrong with this approach. Let me say it again: this is the first thought that comes to my mind when I read the specification of Vectors, so I'd like to be educated.

****************************************************************

From: Robert Dewar
Sent: Tuesday, February 10, 2004 10:23 AM

Surely everyone would prefer a skip list if using a contiguous vector would force page faults for every access to the vector. I assume the real interest here is speed, not O(1) at the cost of any constant.

Anyway, this is only IA :-)

****************************************************************

From: Robert A. Duff
Sent: Tuesday, February 10, 2004 11:11 AM

Pascal wrote:

> But why? You have to explain why, you cannot just say "I don't want".

I've got nothing against skip lists. What I really want is uniformity of efficiency across implementations. The only way I know how to achieve that is for the programmer to choose among basic implementation strategies.

> When I look at the specification of Vectors, the first implementation
> that comes to my mind is to use an array if the vector is not too large,
> and dense enough. If it becomes too large I would probably want to
> switch to a skip list implementation: this would avoid the unreasonable
> O(N) cost on insertion/deletion. Similarly if the vector becomes very
> sparse (not many active elements), I would switch to a skip list
> implementation to save space (and indexing would become a bit more
> costly).

OK, now you're talking about a hybrid strategy. I don't see how the implementation could know about sizes and densities at compile time, so I assume what you mean is that the Vector implementation gathers statistics at run time, and switches among different strategies based on that information.

The overhead of gathering statistics and checking them at relevant times is worth it in some cases, and not in others. All I'm saying is that only the programmer can make that choice. In my current project, we use growable arrays that are almost always quite small. The above "fancy" implementation would be inappropriate.

Now if you say the "fancy" implementation is a good one, fine, then the RM should encourage *all* implementers to use it. Then I, as a programmer, can know that I don't want to use the language-defined Vectors package. In other cases, I can decide that the language-defined package is appropriate.
But if I have no idea what the underlying implementation is, I can *never* use the language-defined (except perhaps in toy programs that don't care about efficiency, or portability thereof). > Of course the skip list would not store individual elements, but > probably chunks that have sufficiently high density. > > Surely there are a number of parameters/threshold to be selected to do > the switch from array to skip list, but they should be easy to select by > graphing the space/time characteristics of each algorithm and looking > for the point where these characteristics intersect. > > Incidentally we do this for string search: for small strings we use the > na‹ve algorithm, and when the string becomes large we switch to > Boyer-Moore. You could call this overengineering, but as a user I don't > see why you would complain. Well, I suppose I wouldn't complain about that. > Now you've got to explain to me what is wrong with this approach. Let > me say it again: this is the first thought that comes to my mind when I > read the specification of Vectors, so I'd like to be educated. There's nothing wrong with that approach (I assume we're talking about the Vector case, not the string-search case). But if you choose that approach, and some other compiler-writer chooses a wildly different approach, the programmer will be lost. **************************************************************** From: Randy Brukardt Sent: Monday, February 9, 2004 6:54 PM Jeffrey Carter: > Randy Brukardt wrote: > > > > Huh? You've said, in effect, that the performance isn't good enough > > for applications where the performance doesn't matter. That's a > > pretty goofy statement! > > Actually, you originally said something like that. You have said > > 1. That the vector component should only be used by applications where > performance doesn't matter. > > 2. That the difference in performance between possible implementations > of vector may be critical to applications that use it. > > If performance doesn't matter to these applications, then the > restriction on implementations should be removed. However, I agree with > you that even applications that are suitable for the use of standard > components may find the performance difference between different > implementations critical. That's what I get for trying to argue a position that I don't believe in. My position is that performance does not matter for these components. Period. However, that's a minority position, and I understand the other argument. The trouble with including performance is that then you must have enough container forms to handle the most common performance profiles - that means at least 4 sequence containers (and probably more - at least bounded and unbounded forms, and list and vector forms) and similarly at least 8 associative containers, and we simply don't have the manpower to properly specify such a library. But in any case, I'm obviously not good at arguing it the current position, and I'm not going to try anymore. --- That said, my opinion is that the only container worth having (with no performance requirements) is a map. The Set isn't sufficiently different, and no sequence container is worth the effort. And such a container probably ought to hold indefinite elements. (Performance doesn't matter, remember.) But that position is a minority position (of one!), and I'm not going to argue that, either. **************************************************************** From: Robert A. 
Duff Sent: Monday, February 9, 2004 7:13 PM > That's what I get for trying to argue a position that I don't believe in. ;-) > My position is that performance does not matter for these components. > Period. > > However, that's a minority position, and I understand the other argument. I'm afraid that I take the opposite position: efficiency is the key issue. I'll take the liberty of reposting my response on arg here: [Editor's note: This word-for-word repeat of 50+ lines is removed to keep these comments manageable. You can find it about 600 lines back; look for "This is the *usual* view of language design..." in a message from Bob on Monday at 4:27 PM] > The trouble with including performance is that then you must have enough > container forms to handle the most common performance profiles - that means > at least 4 sequence containers (and probably more - at least bounded and > unbounded forms, and list and vector forms) and similarly at least 8 > associative containers, and we simply don't have the manpower to properly > specify such a library. I'm saying we should lead the way toward those 4 or 8, as opposed to trying to be the last word on "sequences" or "mappings" or etc. **************************************************************** From: Randy Brukardt Sent: Monday, February 9, 2004 7:18 PM Jeffrey Carter wrote: > Regarding Size and Resize, you wrote: > > > That's no different than many of the attributes in Ada, which (if set), > > always return the values that they were set to. But what the compiler does > > with those values is (almost) completely implementation-defined. > > There is a difference between a compiler directive and an operation of a > package. The latter must have well defined behavior that is not > implementation defined. That's a goofy statement. There are lots of package operations in Ada that have implementation-defined behavior. Try any file Open or anything in Ada.Command_Line, for instance. > > Huh? Resize tells the container a reasonable size to use; what the container > > does with that information is up to it. Size simply returns that information. > > What does Size return if Resize has not been called? The implementation-defined initial size of the container. Note that there is still quite a bit of overspecification in some of the wording. I didn't have the time or energy to rewrite every second line of Matt's proposal, and it wasn't clear that I had the support of the committee to do so, either. > If the intention is as you described, then the operations appear to be > useless, and should be eliminated. Why? Giving a container an idea of how many elements it will contain can be a great efficiency help. But there shouldn't be any specification of what it will mean. > The introductory text to Vectors does not make it clear that this is an > extensible array (EA). Probably because no one uses such a term! The first time I can recall anyone talking about extensible arrays in my 25+ years of programming (including my college courses) was last week. I of course know what is meant because the words have their conventional meanings, but I doubt that there are many people out there looking up "extensible array" in an index! ... > So, if the ARM gains a mathematical library of matrices and vectors, It already did. See AI-296, already approved. (Note that this is an old Ada 83 standard that has not been widely used - but the fact remains that Ada has had vectors in the mathematical sense for a long time.) 
> However, this is really a general problem, and a general solution might > be advisable. There are no predefined modular types in Standard, so we > might want to add > > type Maximal_Count is mod implementation-defined; Adding types to Standard is dangerous, because they hide ones visible via a use-clause. We're not planning to add anything named to Standard for this reason. Adding it to Ada could cause trouble if there is a use clause for Ada in a program. So, I'd suggest such a type be added to Ada.Containers (next to Hash_Type). > I don't understand why the string-keyed maps exist, since they are > equivalent to a map with an unbounded string key. The implementation > would have to store the provided key in an appropriate unbounded string, > or duplicate the functionality of unbounded strings. No, a stringspace implementation would be much better than Unbounded_String for storing large numbers of strings of unknown length. That's precisely the idea of this component (and the reason it exists separately). Unbounded_Strings require many tiny allocations, while a stringspace implementation requires just one (or a few) larger ones. ... > This discussion of the searchable structure and the map based on it > seems to indicate a basic design problem with the hashed map component. > A hash table is not trivial to implement correctly. There are uses for > hash tables other than maps. As it stands, the user who wants a hash > table must create one, duplicating the effort performed for the map, and > increasing the likelihood of errors. Huh? What could you do with a separate hash table that you couldn't do with a map? The hash "buckets" contain *something*, and that something is (or can be) the same as the map elements. I suspect that if you try to develop this separate container, you'll end up with pretty much the same interface as map - so there is no reason for a separate version. **************************************************************** From: Jeffrey Carter Sent: Tuesday, February 10, 2004 12:52 PM Randy Brukardt wrote: > That's a goofy statement. There are lots of package operations in Ada > that have implementation-defined behavior. Try any file Open or > anything in Ada.Command_Line, for instance. At least I'm consistent :) I agree. In retrospect, I worded that badly, using general terms when referring to specifics. >>> Huh? Resize tells the container a reasonable size to use; what the container >>> does with that information is up to it. Size simply returns that information. > >> What does Size return if Resize has not been called? > > The implementation-defined initial size of the container. OK. Let's see if I understand your position correctly. Resize gives the implementation a hint about a reasonable size to use, but the implementation may do whatever it wants, including nothing. Size returns the actual size of something if Resize has not been called, but the last size given to Resize if Resize has been called, regardless of what the implementation does (or doesn't) do with that size. So it appears that you are saying the implementation is required to keep track of whether Resize has been called, and to store the size passed to Resize. That doesn't seem like a very useful requirement to me. It's fun to argue this kind of thing, but we're really wasting time. My concern is not really what you think should be required, but what the proposal actually requires. > Giving a container an idea of how many elements it will contain can > be a great efficiency help. 
But there shouldn't be any specification > of what it will mean. That's fine. But the specification of Resize requires that it perform an allocation. That's primarily why these operations concern me. Allowing the user to know the current size doesn't seem very useful to me, but I don't see how it can hurt. Allowing the user to force a resize does seem unwise. Resize is an appropriate name for the operation as specified. I expect an operation named Resize to cause resizing. If we're really talking about giving the implementation a hint about an appropriate size, then not only does the specification need to be changed, the name also needs to be different (perhaps Size_Hint?). > Note that there is still quite a bit of overspecification in some of > the wording. I didn't have the time or energy to rewrite every second > line of Matt's proposal, and it wasn't clear that I had the support > of the committee to do so, either. Right, and most of my comments were identifying such areas and presenting alternative wording. I hope, as such, they are useful. I can understand you not being able to correct all of these, but if they are not corrected, the current proposal is unacceptable. Normally, I would think the original author would be the best person to make such changes. However, Heaney's response to suggestions that the proposal could be improved has uniformly been that the proposal is correct as it stands (although I see that after saying that the vector doesn't need an iterator, he has now added an iterator to his reference implementation). Perhaps he is more amenable to requests for modifications from the select committee. I do have time at the moment, and am willing to make the effort if that is desired. The committee needs to ask, since I'm unwilling to waste the effort. > Probably because no one uses such a term! The first time I can recall > anyone talking about extensible arrays in my 25+ years of programming > (including my college courses) was last week. I of course know what > is meant because the words have their conventional meanings, but I > doubt that there are many people out there looking up "extensible > array" in an index! Surely I didn't invent the term! I agree with you, though. This is a case where I'm familiar with the concept and have used versions of it for decades, but I've never encountered a general name for it, except I know it's not a vector. By analog to unbounded strings, perhaps unbounded array is best. > It already did. See AI-296, already approved. (Note that this is an > old Ada 83 standard that has not been widely used - but the fact > remains that Ada has had vectors in the mathematical sense for a long > time.) Good. I was not aware of this standard. However, this simply reinforces my opposition to calling unbounded arrays "vectors". > Adding types to Standard is dangerous, because they hide ones visible > via a use-clause. We're not planning to add anything named to > Standard for this reason. Adding it to Ada could cause trouble if > there is a use clause for Ada in a program. So, I'd suggest such a > type be added to Ada.Containers (next to Hash_Type). OK. This should be useful for more than containers, so I'd like to see it somewhere higher in the hierarchy, though the most important thing is to avoid defining such types all over the place, like the Count types in the IO packages. The odds of a conflict if it's in Ada are small, so I wouldn't think that would be a problem. 
If the ARG/committee objects to putting it in Ada, perhaps there should be a special child package for such things. > No, a stringspace implementation would be much better than > Unbounded_String for storing large numbers of strings of unknown > length. That's precisely the idea of this component (and the reason > it exists separately). Unbounded_Strings require many tiny > allocations, while a stringspace implementation requires just one (or > a few) larger ones. In general, a key is added to a map only once, and never modified. Using Unbounded_String would, therefore, only need one allocation per key, so I don't see that many tiny allocations are needed. However, you probably know more about this sort of thing, since compilers need to do this kind of thing a lot, so I may well be mistaken. > Huh? What could you do with a separate hash table that you couldn't > do with a map? The hash "buckets" contain *something*, and that > something is (or can be) the same as the map elements. Suppose I want to store Integers in a hash table so I can determine if I've seen one before. There is no mapping from Integers to anything else. Yes, I can do that with a map, by providing a dummy type for the element type, and a dummy value for the element parameters, but that's an ugly kludge. Defining a map in terms of a hash table is neither ugly nor a kludge. > I suspect that if you try to develop this separate container, you'll > end up with pretty much the same interface as map - so there is no > reason for a separate version. A hash table doesn't have an operation to obtain an element given a key, for example. I could agree that there's no reason for a separate map given a hash table, since maps are trivial to implement with a hash table, but ideally I'd like to see both. Ada can do better than expecting the use of ugly kludges. **************************************************************** From: Matthew Heaney Sent: Tuesday, February 10, 2004 1:08 PM Jeffrey Carter wrote: > Normally, I would think the original author would be the best person to > make such changes. However, Heaney's response to suggestions that the > proposal could be improved has uniformly been that the proposal is > correct as it stands (although I see that after saying that the vector > doesn't need an iterator, he has now added an iterator to his reference > implementation). Perhaps he is more amenable to requests for > modifications from the select committee. I said a vector doesn't need an *active* iterator. My opinion on that matter hasn't changed: active iterators (aka "cursors") are too error-prone for (array-based) vectors. I wasn't sure whether we needed *passive* iterators for a vector, since Ada already provides a built-in for loop. However, there has been interest, and so *passive* iterators were added. > Suppose I want to store Integers in a hash table so I can determine if > I've seen one before. Use a set, not a map. The latest version of the reference implementation now supports the stream attributes for containers. **************************************************************** From: Jeffrey Carter Sent: Tuesday, February 10, 2004 6:35 PM Matthew Heaney wrote: > I said a vector doesn't need an *active* iterator. My opinion on > that matter hasn't changed: active iterators (aka "cursors") are too > error-prone for (array-based) vectors. > > I wasn't sure whether we needed *passive* iterators for a vector, > since Ada already provides a built-in for loop. 
However, there has > been interest, and so *passive* iterators were added. The actual things said were: >> Vector should have an iterator, in addition to allowing the user to >> explicitly iterate over the structure. > > No. Vector iterators are fragile, and hence very error prone. > > They are fragile because the (logical) internal array gets thrown > away during expansion, which invalidates the iterator. It's too hard > to keep track of whether a vector iterator is still valid, and most > of the time you end up with a dangling reference. I was discussing the proposal in AI-302-03, so of course I used its terminology. I did not mention cursors, nor did you. You should also look "active" and "passive" up in a good dictionary. Then perhaps you would discover what they mean, and realize that cursors are passive and procedures are active. Precision in terminology is important. >> Suppose I want to store Integers in a hash table so I can determine >> if I've seen one before. > > Use a set, not a map. A typical answer: the proposal is perfect, therefore any problem a user has with it must be with the user, not the library. Yes, I want a set, but I want a hashed set, not one based on an O(log N) search, perhaps because I know that with my hash function and expected distribution of values, I can expect O(1) from a hash table. **************************************************************** From: Matthew Heaney Sent: Wednesday, February 11, 2004 9:33 AM The terms "active iterator" and "passive iterator" are discussed in section 7.3, Variations on a Theme: Iterators, in his book describing the original Booch Components library: Software Components With Ada Grady Booch Benjamin/Cummings Publishing Company 1987 p. 157-8: "Basically, there are two approaches to iteration, called active and passive. In the active approach, we expose the iterator as a collection of primitive operations, but, in the passive approach, we export only a single operation." p. 158: "We shall first discuss the active iterator. The iterator can be considered an object of an abstract data type, characterized by the following operations: Initialize, Get_Next, Value_Of, Is_Done." p. 159: "With the passive iterator, rather than exporting the type Iterator and its associated operations, we instead export a single generic procedure that is nested in the specification of the queue component." The Iterator design pattern (aka "Cursor") is described in: Design Patterns: Elements of Reusable Object-Oriented Software Erich Gamma et al Addison-Wesley Publishing Company 1995 p. 260: "Who controls the iteration? A fundamental issue is deciding which party controls the iteration, the iterator or the client that uses the iterator. When the client controls the iteration, the iterator is called an external iterator, and when the iterator controls it, the iterator is an internal iterator. [footnote on p.260: Booch refers to external and internal iterators as active and passive iterators, respectively. The terms "active" and "passive" describe the role of the client, not the level of activity of the iterator.] Clients that use an external iterator must advance the traversal and request the next element explicitly from the iterator. In contrast, the client hands an internal iterator an operation to perform, and the iterator applies that operation to every element in the aggregate." The footnote in Gamma was referring to the information in the section Iteration in Chap. 
9 (Frameworks) of: Object-Oriented Analysis and Design with Applications, 2nd ed Grady Booch Benjamin/Cummings Publishing Company 1994 p. 356: "For each structure, we provide two forms of iteration. Specifically, an active iterator requires that clients explicitly advance the iterator; in one logical expression, a passive iterator applies a client-supplied function, and so requires less collaboration on the part of the client. [footnote on p. 356: Passive iterators implement an "apply" function, an idiom commonly used in functional programming languages.]" Section 8.3.6 (Iterators) of the Ada95 Quality and Style Guide explains the difference between active iterators and passive iterators as follows: "The terms active and passive are used to differentiate whether the iteration mechanism (i.e., the way in which the complex data structure is traversed) is exposed or hidden. A passive iterator hides the traversal (e.g., looping mechanism) and consists of a single operation, iterate, that is parameterized by the processing you do on each element of the data structure. By contrast, an active iterator exposes the primitive operations by which you traverse the data structure (Booch 1987)." My article at adapower.com, "Iterator and Factory Method Patterns Combined," describes the difference between an active and passive iterator as follows: "There are two kinds of iterators: passive and active. A passive iterator controls the actual movement within the data structure, and all a client has to do is supply a procedure to receive each item in turn. "An active iterator moves the responsibility for movement onto the client. Unlike a passive iterator, which is essentially just a generic subprogram, an active iterator is an actual type, with primitive operations for retrieving the current item and for moving to the next item in the sequence." The "Algorithms and Data Structures I" (CS 131) course at the Dept of Computer Science of The George Washington University has this to say about the distinction between passive and active iterators: "The linked-list package introduced in Section 8.2 provides an operation called Traverse, which moves through the list, from beginning to end, one element at a time, until each element has been "visited" exactly once. "Formally, this Traverse operation is an example of a passive iterator operation. An iterator is any operation that iterates through a data structure one element at a time; we call it passive because the client program simply calls it once and "stands back" passively while the iterator roams through the entire structure. In this note, we use the terms traversal and iteration interchangeably. "Sometimes an application requires iterating through a structure, touching each element once, but allowing the client program the flexibility to decide just when to proceed to the next element. Moving through a structure in this fashion is called active iteration, because the client program is actively involved in the process at every step. Active Iterator Operations: To be actively involved in the iteration, the client program must execute a loop. We know that any loop must contain statements for loop initialization, termination, and incrementation; to support active iteration, the data structure package must provide these operations, and also one for retrieval of the current element in the traversal." 
The "Advanced Object-Oriented Design & Programming" (CS 635) at San Diego State University says this about passive iterators: "Neither Java nor C++ support passive iterators. Smalltalk does support them. In a passive iterator, you pass a method or function to the composite object, and the object then applies the method to all elements in the object." In the topic "Generic Programming: Iterators" in the CS 412/512 course at Old Dominion University, section 1.1 defines passive and active iterators this way: "Iterators can be: o passive: we pass a function to the iterator and tell it to apply the function to each item in the collection o active: we ask the iterator to give us items, and each time it does, we apply the desired function to it." In his description of the Bedrock framework for Macintosh apps, Scott L. Taylor describes the iterators of the C++ Booch Components as follows: "Each structure comes with its own form of an iterator that allows traversal of items within a structure. Two types of iterators are provided for each structure, passive and active. Passive iterators require much less interaction on the part of the client. A passive iterator is instantiated and used by calling the iterator's apply() method with a function pointer to the function to apply to all the elements within the structure. Active iterators allow much more flexibility but require more interaction from the client. Active iterators must be told to go on to the next item, and the iterator object returns a reference to each item in the structure for the client to process or use. Active iterators are very similar to MacApp style iterators." The iterators of the container classes in the ET++ framework are described like this: "There are two types of iterators - passive and active iterators. The latter provide methods for iterating to be called directly by the client while with passive iterators the client provides a method to be called on each element in the container." In "An Overview of the Booch Components for Ada95," the iterators are described this way: "There are two forms: active and passive. Active iteration requires the client explicitly advance the iterator. For passive, the client supplies a single function "Apply" to work across the structure." > Precision in terminology is important. Indeed, and my use of the terms "active iterator" and "passive iterator" is consistent with the references cited above. > Yes, I want a set, but I want a hashed set, not one based on an O(log N) > search, perhaps because I know that with my hash function and expected > distribution of values, I can expect O(1) from a hash table. My original proposal had hashed sets. (It also had sorted maps.) However, in order to reduce the scope of the change to the language standard the size of the proposal was reduced, and hashed sets didn't make the cut. No one got every container they wanted, not even me. If you need a hashed set right now, then just grab the hash table from the reference implementation and assemble it yourself. **************************************************************** From: Jeffrey Carter Sent: Wednesday, February 11, 2004 12:56 PM Matthew Heaney wrote: > The terms "active iterator" and "passive iterator" are discussed in > section 7.3, Variations on a Theme: Iterators, in his book describing > the original Booch Components library: > > Software Components With Ada > Grady Booch > Benjamin/Cummings Publishing Company 1987 I'm familiar with Booch and the many errors he made in this book. 
I'm also aware that many others are unable to think for themselves and have slavishly followed his lead. I see that you have not looked up "active" and "passive" and thought about what the phrase "active iterator" actually means in English. You have simply quoted the errors of others. Argument by authority is always suspect. We now have a situation where the terms are actively confusing, and no one who wants to communicate effectively uses them. **************************************************************** From: Matthew Heaney Sent: Wednesday, February 11, 2004 1:41 PM I don't know what you mean by "glory,"' Alice said. Humpty Dumpty smiled contemptuously. `Of course you don't -- till I tell you. I meant "there's a nice knock-down argument for you!"' `But "glory" doesn't mean "a nice knock-down argument,"' Alice objected. `When _I_ use a word,' Humpty Dumpty said in rather a scornful tone, `it means just what I choose it to mean -- neither more nor less.' `The question is,' said Alice, `whether you CAN make words mean so many different things.' `The question is,' said Humpty Dumpty, `which is to be master - - that's all.' **************************************************************** From: Marius Amado Alves Sent: Wednesday, February 11, 2004 2:07 PM > We now have a situation where the terms are actively confusing, and no > one who wants to communicate effectively uses them. Please let's define then: - active iterator: use of a Cursor_Type object - passive iterator: use of a generic iteration procedure I hope that's right... /* Personally I don't find the active/passive metaphor the most appropriate. Manual/automatic would be more fitting. Active/passive for me is more suggestive of program/data and read-and-write/read-only. Also a confusion was that before alternative 3 "iterator" meant two different things, namely the cursor and the abstract procedure (use of...). But now it only means the latter. */ **************************************************************** From: Jeffrey Carter Sent: Wednesday, February 11, 2004 5:46 PM Marius Amado Alves wrote: > > Please let's define then: > - active iterator: use of a Cursor_Type object > - passive iterator: use of a generic iteration procedure > I hope that's right... No, the proposal has this right: Cursor : a value that indicates a specific element in a container. Iterator: a procedure that applies an action to each element in a container in turn. **************************************************************** From: Matthew Heaney Sent: Thursday, February 12, 2004 9:38 AM > - active iterator: use of a Cursor_Type object Yes. > - passive iterator: use of a Generic_Iteration procedure Yes. > I hope that's right... Yes, that's correct. > Also a confusion was that before alternative 3 "iterator" meant two different > things, namely the cursor and the abstract procedure (use of...). But now it > only means the latter. An iterator is a mechanism for visiting elements in a container. There are two kinds of iterators: "active" iterators and "passive" iterators. **************************************************************** From: Stephen Leake Sent: Tuesday, February 10, 2004 2:45 PM Jeffrey Carter writes: > Allowing the user to know the current size doesn't seem very useful to > me, but I don't see how it can hurt. Allowing the user to force a resize > does seem unwise. The user could run her application for a while, then query the current size of the map and store it in a config file. 
Then, when the application starts the next time, it reads the required size from the config file, and calls Map.Resize. The intent is that this allows the application to avoid all the resizes on the second run. **************************************************************** From: Matthew Heaney Sent: Tuesday, February 10, 2004 3:00 PM That's one (clever) application of Resize. The intent is that if you know a priori what the ultimate number of elements will be, then this avoids any expansion during insertion. Insertion behavior is thus more uniform. See the examples in ai302/hash and ai302/hash2 in the reference implementation for more ideas. **************************************************************** From: Jeffrey Carter Sent: Tuesday, February 10, 2004 6:40 PM I think I was talking about vectors. Length is sufficient for this. The main problem is that the specification prohibits some implementations: Resize is specified as requiring an allocation, which may not be appropriate for some implementations. Size_Hint, with no requirement what the implementation does with the value, is more appropriate. **************************************************************** From: Randy Brukardt Sent: Wednesday, February 11, 2004 10:37 PM Jeffrey Carter wrote: (Sorry, I missed this yesterday.) ... > OK. Let's see if I understand your position correctly. Resize gives the > implementation a hint about a reasonable size to use, but the > implementation may do whatever it wants, including nothing. Size returns > the actual size of something if Resize has not been called, but the last > size given to Resize if Resize has been called, regardless of what the > implementation does (or doesn't) do with that size. > > So it appears that you are saying the implementation is required to keep > track of whether Resize has been called, and to store the size passed to > Resize. That doesn't seem like a very useful requirement to me. Yup. That's precisely how Type'Size works in Ada; it has a fairly weak effect on Obj'Size, but in any case, if you set it, you have to return the same value (even if that value has nothing to do with how objects are actually stored). ... > Resize is an appropriate name for the operation as specified. I expect > an operation named Resize to cause resizing. If we're really talking > about giving the implementation a hint about an appropriate size, then > not only does the specification need to be changed, the name also needs > to be different (perhaps Size_Hint?). I don't see a strong need to change the name, but I do agree with you that there shouldn't be a *requirement* to do some allocation. ... > > No, a stringspace implementation would be much better than > > Unbounded_String for storing large numbers of strings of unknown > > length. That's precisely the idea of this component (and the reason > > it exists separately). Unbounded_Strings require many tiny > > allocations, while a stringspace implementation requires just one (or > > a few) larger ones. > > In general, a key is added to a map only once, and never modified. Using > Unbounded_String would, therefore, only need one allocation per key, so > I don't see that many tiny allocations are needed. However, you probably > know more about this sort of thing, since compilers need to do this kind > of thing a lot, so I may well be mistaken. One allocation per key is a lot more than one allocation per *map*, which is what a stringspace implementation takes. (Well, it might have to expand if it gets full, but that should be rare. 
It could degrade to one allocation per key if the keys are very, very long, but some care in the implementation should prevent that degradation.)

> > Huh? What could you do with a separate hash table that you couldn't
> > do with a map? The hash "buckets" contain *something*, and that
> > something is (or can be) the same as the map elements.
>
> Suppose I want to store Integers in a hash table so I can determine if
> I've seen one before. There is no mapping from Integers to anything
> else. Yes, I can do that with a map, by providing a dummy type for the
> element type, and a dummy value for the element parameters, but that's
> an ugly kludge. Defining a map in terms of a hash table is neither ugly
> nor a kludge.

I have a component like that (it's actually Tom Moran's), but in practice, I've *never* used it without using the index values it provides to manage some other data in a separate table (at least statistics and/or debugging). Even the 'known words' list in the spam filter uses the indexes (handles) for debugging. If that's the case, why bother having to use a separate component (causing another chance of error)?

So I would guess that the "dummy type" would gain some real data in 95% of the applications. And that such uses are less than 10% of the uses of a map anyway. Since this is a minimal library, we're not trying to cover that remaining 0.5%.

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004 7:41 PM

Matt Heaney said:

...

> As I mentioned in my previous message, Resize specifies a hint about the
> future number of elements in --that is, the length of-- the container.
> My assumption is that no container will ever have more than Integer'Last
> number of elements.

Ada only requires that Integer'Last is 2**15-1. That's 32767. Do you want to assume that no container ever has more than 32767 elements??

> If that assumption is incorrect, then maybe the container can be allowed
> to grow internally to more than Integer'Last number of elements, but can
> only report a maximum value of Integer'Last.
>
> Subtype Natural is the correct choice for the vector Resize operation.
>
> I think the ARG wants to use Hash_Type for Resize for the maps. My
> reference implementation still uses Natural.

Wow! I've been promoted to be the entire ARG! :-)

No, I think we should use a purpose-built type for this, just like we did for hashing (and for the same reasons). I hope we don't repeat the mistake of Ada.Strings.Unbounded (which, at least, has a justification for making that mistake).

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 10, 2004 9:19 AM

> Ada only requires that Integer'Last is 2**15-1. That's 32767. Do you want to
> assume that no container ever has more than 32767 elements??

I assumed that type Integer corresponded to the "natural" word size of the machine, and that if Integer were only 16 bits that this portended other, more invasive resource issues, which precluded very large numbers of container elements. But it just goes to show you I don't know very much...

> Wow! I've been promoted to be the entire ARG! :-)

Sorry about that, I should have said "ARG select committee on containers" but laziness got the better of me. I'll try to be more clear in the future.

> No, I think we should use a purpose-built type for this, just like we did
> for hashing (and for the same reasons).
> I hope we don't repeat the mistake of Ada.Strings.Unbounded (which, at
> least, has a justification for making that mistake).

OK. But it would be nice if the operators of the length/count/size type were directly visible at the point where the container instance is declared, without having to "with" Ada.Containers too.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004 9:53 PM

Matt Heaney wrote:
> Randy Brukardt wrote:
>
> > Ada only requires that Integer'Last is 2**15-1. That's 32767. Do you want
> > to assume that no container ever has more than 32767 elements??
>
> I assumed that type Integer corresponded to the "natural" word size of
> the machine, and that if Integer were only 16 bits that this portended
> other, more invasive resource issues, which precluded very large numbers
> of container elements.

Never assume about the Standard. :-) Janus/Ada made the choice of leaving Integer at 16 bits to ease porting of our many 16-bit customers to our 32-bit compilers. That probably was a bad choice (because it harms portability of other Ada code to Janus/Ada), but in any case we're pretty much stuck with it. (Changing would break too much existing code and especially files.)

3.5.4(21) is the only requirement on the range of Integer; there isn't anything else, not even Implementation Advice, about going further. If you want something specific, declare your own.

> > Wow! I've been promoted to be the entire ARG! :-)
>
> Sorry about that, I should have said "ARG select committee on
> containers" but laziness got the better of me. I'll try to be more clear
> in the future.

No, this idea was one that I idly mentioned (and dismissed) a couple of days ago. I'm pretty sure no one else has talked about it (in either direction).

****************************************************************

From: Jeff Cousins
Sent: Tuesday, February 10, 2004 9:45 AM

Given that the Booch components are now available for free from AdaPower, is there a pressing need for other containers? Though having said that, we paid for the Booch components but only found list_single_bounded_managed, list_utilities_single, heap_sort and quick_sort to be of much use.

****************************************************************

From: Ehud Lamm
Sent: Tuesday, February 10, 2004 12:44 AM

> If you want to store elements of type T'Class, then you have to use an
> access type to instantiate the component, and then do the memory
> management of elements yourself.
>
> This is how it should be.

I agree with Matt on this one, especially as regards 'Class. However, I think that strings should be treated as a special case. It seems to me that the easiest approach is to provide a special version of the packages for this case (a wrapper), which accepts String parameters (and returns them from functions), and uses unbounded strings internally. This wrapper can be implemented on top of the basic library (instantiate with Unbounded_String, and let the wrapper routines simply do the string<->unbounded string conversions).

One of the good things about having a standard library is that the restricted component I described, and others like it, are going to be easy to create and share, seeing as they are based on packages all Ada users are likely to have available. It is not mandatory that they themselves be part of the standard (though in this case I think it would be a valuable addition).
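[Editor's note: A minimal sketch of the kind of wrapper Ehud describes, storing Unbounded_String internally and exposing a String-only interface. The generic name Ada.Containers.Vectors, the type name Vector, and the operations Append, Element and Length are assumed to match the proposal; everything else (String_Lists, Inner) is invented purely for illustration.

with Ada.Strings.Unbounded;  use Ada.Strings.Unbounded;
with Ada.Containers.Vectors;  -- assumed: the proposed vector generic

package String_Lists is
   type List is private;
   procedure Append  (Container : in out List; Item : in String);
   function  Element (Container : List; Index : Positive) return String;
   function  Length  (Container : List) return Natural;
private
   package Inner is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Unbounded_String);
   type List is record
      Data : Inner.Vector;
   end record;
end String_Lists;

package body String_Lists is
   procedure Append (Container : in out List; Item : in String) is
   begin
      -- Convert on the way in; the container itself stores Unbounded_String.
      Inner.Append (Container.Data, To_Unbounded_String (Item));
   end Append;

   function Element (Container : List; Index : Positive) return String is
   begin
      -- Convert on the way out, so clients never see Unbounded_String.
      return To_String (Inner.Element (Container.Data, Index));
   end Element;

   function Length (Container : List) return Natural is
   begin
      return Natural (Inner.Length (Container.Data));
   end Length;
end String_Lists;
]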
****************************************************************

From: Ehud Lamm
Sent: Tuesday, February 10, 2004 12:52 AM

> The most important point in a container library is *completeness* I would
> say. This is exactly what STL has done.

This is a good point, and keep in mind that I firmly belong to the 80/20 camp.

The reason why this point is well taken is that no one is likely to want to use 2 (or 3) different container libraries inside one application. So the feature-rich library is likely to win over restricted (even standard) ones. At least when building the _second_ application using such a library...

However, I don't think this means adding more stuff to Ada.Containers at this point. Let's be practical here. Time is short etc. etc. What should be done, however, is for the community to provide more components based on the same style (and based on the simple building blocks that are part of the standard lib). Some of these will be adopted into the core later on, and some will simply coexist nicely with the standard lib while remaining independent.

****************************************************************

From: Martin Krischik
Sent: Tuesday, February 10, 2004 2:07 PM

On Monday, 9 February 2004, 23:28, Robert A Duff wrote:

> Regarding support for indefinite keys,
>
> Martin Krischik said:
> > But you could not even store a collection of strings. Ok, there are
> > unbounded strings. But storing 'Class, that's the killer feature. If
> > Ada.Containers can't do it I am not interested. There will be no 20%/80%
> > split. It's 0% - I won't use them.
>
> How about this: you write a package that supports the indefinite case,
> and you build it on top of the (currently proposed) standard package
> that supports only definite?

Did that already - but it's based on the Booch components.

> The point is, you *can* use the definite-only package, but only
> indirectly, via a wrapper package. The definite-only package isn't
> useless; it does *part* of the job you desire. This seems like a better
> design than making a single package that supports both, and somehow
> magically optimize the definite cases.

Agreed, two packages are better than one. And currently I do the same with the Booch components - only I create one from the other with the help of a text filter instead of using a wrapper.

> If the RM supports indefinite, I claim it should do so by providing two
> separate packages. But we're trying to minimize the size of all this,
> so we choose just the lower-level one of those.

Maybe the RM should suggest names for extended containers.

****************************************************************

From: Martin Krischik
Sent: Tuesday, February 10, 2004 2:16 PM

On Monday, 9 February 2004, 19:52, Matthew Heaney wrote:

> The library is designed around the common case, which means definite key
> and element types.
>
> If you want to store elements of type T'Class, then you have to use an
> access type to instantiate the component, and then do the memory
> management of elements yourself.
>
> This is how it should be.

If a garbage collector was provided as well: Yes. Otherwise NO!!

There is something which upsets me greatly: half the Ada community says: No garbage collector please! - the container library should do memory management. The other half says: No, container libraries should not provide memory management. It would be better for Ada if the Ada community made up its mind.

Well, since in AdaCL I have both, I made my mind up: container libraries with memory management are more useful.
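[Editor's note: For reference, the access-type approach Matt describes (quoted above) looks roughly like the sketch below. Shape, Circle, Shape_Access and the instantiation names are invented, and the vector operations (Append, Element, Last) are assumed to be those of the proposal; the point is only that, without a garbage collector, the client frees each element explicitly.

with Ada.Unchecked_Deallocation;
with Ada.Containers.Vectors;  -- assumed: the proposed vector generic

procedure Class_Wide_Demo is
   type Shape is tagged null record;
   type Circle is new Shape with record
      Radius : Float;
   end record;

   type Shape_Access is access Shape'Class;
   procedure Free is
     new Ada.Unchecked_Deallocation (Shape'Class, Shape_Access);

   package Shape_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Shape_Access);

   V : Shape_Vectors.Vector;
begin
   -- Each element is allocated by the client, not by the container.
   Shape_Vectors.Append (V, new Shape);
   Shape_Vectors.Append (V, new Circle'(Radius => 1.0));

   -- ... use the container ...

   -- The client is also responsible for reclaiming the elements
   -- before the vector itself goes away.
   for I in 1 .. Shape_Vectors.Last (V) loop   -- "Last" assumed per the proposal
      declare
         E : Shape_Access := Shape_Vectors.Element (V, I);
      begin
         Free (E);
      end;
   end loop;
end Class_Wide_Demo;
]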
****************************************************************

From: Marius Amado Alves
Sent: Tuesday, February 10, 2004 2:55 PM

I just wrote the thing excerpted below. The whole is available at
http://www.liacc.up.pt/~maa/containers
Thanks.

-- TRUC : TRUE CONTAINERS
-- by Marius Amado Alves
--
-- Truc is a proof-of-concept implementation of AI-302/3
-- for indefinite elements, i.e. indefinite generic formal
-- element types (reorder the 4 adjectives at will).
--
-- Truc automatically chooses the appropriate implementation
-- for the actual type. Definite actuals select a Charles-like
-- body, whereas indefinite ones select a SCOPE-like one.
--
-- Truc is 100% written in Ada. Some optimizations could be
-- done by going a bit outside the language. This is
-- discussed elsewhere.
--
-- Only the vector variety is implemented.
-- Only a subset of the interface is implemented.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, February 10, 2004 6:25 PM

I'm going back and filing all of these messages about this AI, and I'm continually seeing statements like: "If the containers don't have , I'm not going to use them."

I realize hyperbole is common on mailing lists, but you have to keep in mind the current situation. In order to meet the schedule, the ARG needs to complete proposals by the end of the June meeting, or they're not going to be in the standard. That reality means that there is not time to develop a significantly different proposal. (Wordsmithing is different; I expect there to be plenty of wordsmithing done on this proposal. I certainly hope that some of the problems noted by Jeff Carter (for instance) are fixed.)

The strategy proposed by the committee was to standardize something like AI-302-3, and encourage the development of a secondary standard (at a more leisurely pace!) to handle creating additional containers to provide additional functionality: performance related (bounded forms, lists, etc.), functional (sorted_maps, unsorted_sets, etc.), and operational (indefinite keys, indefinite elements, limited elements). We hope that providing a standard root will channel future developments in a common direction, rather than the scattershot approach that's currently prevalent.

The ARG is going to have to decide either to follow that strategy, or essentially give up (because there is no time to develop an alternative). Now, when you say "I won't use it.", you're putting the ARG members into a spot:

1) Either the ARG has to standardize over the objections of users, "because we know better"; or
2) Decide that there is insufficient consensus, and forget the proposal.

My feeling about the brief discussion at the San Diego meeting is that some ARG members view this as an unsolvable problem, and would just as soon forget it (tossing it to some undefined International Workshop Agreement process). It took a lot of persuading by Tucker (and to a lesser extent, by me and a couple of others) to set up the committee rather than just tossing it at that meeting. I fully expect to revisit that at our next meeting.

If the discussion here gives the opponents too much ammunition, there probably won't be a standard container library in Ada now (and I personally think *ever*). If that is your true opinion, feel free to express it - I'd rather spend my time working on something that will likely be in the standard in that case! But otherwise, I'd suggest cutting down the hyperbole in your messages.
****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004 4:03 AM

They're not hyperboles. Please don't patronize. The wanted features missing from the proposal have been expressed since long ago, even formally, and repeatedly till now, and probably people saw this discussion as a last chance to win the "resistance". It is clear now we must abandon all hope. Thanks for making that clear at last.

So it's an incomplete library or none at all. I only fear an incomplete standard can do more harm than good, principally in respect to attracting new programmers to the language--by creating a bad first impression. (For what it's worth, I'd say toss it. Those Front and Back things looked terrible anyway.)

****************************************************************

From: Pascal Obry
Sent: Wednesday, February 11, 2004 4:25 AM

> So it's an incomplete library or none at all. I only fear an incomplete
> standard can do more harm than good, principally in respect to
> attracting new programmers to the language--by creating a bad first
> impression. (For what it's worth, I'd say toss it. Those Front and Back
> things looked terrible anyway.)

I tend to agree with Marius here. Especially if the next change is 5 or 10 years from now! A good programming language needs to provide a decent container library today. As I have already said, if the library is not broad enough it will just not be used; Charles, PragmArc or the Booch components will be used instead... In this case it is not even necessary to add a set of standard containers...

Just my 2 cents of course.

****************************************************************

From: Marc A. Criley
Sent: Wednesday, February 11, 2004 7:57 AM

It looks like the frequent situation of no lack of devil's advocates (who are only trying to make things better), and too few championing angels :-)

The Ada software I develop can be split into two broad categories: performance-critical, and non-performance-critical. When writing the latter, NIH (Not-Invented-Here) is a dirty word to me. I want to write software fast and right, and I'll happily reuse standard components, my own stuff that I've got lying around, and whatever utilities and libraries have been posted on the Internet for free use. For example, while Unbounded_Strings gets a lot of abuse in Ada discussions, it and standard strings have pretty much provided all of the string processing I've ever needed.

I fully expect as well that I'll be extensively employing Ada.Containers just as soon as they're standardized. I don't care if there are some purported conceptual weaknesses or omissions, or if the implementation could be improved--if it provides the functionality I need, is effectively bug-free, performs "good enough", and better yet is part of the Ada standard, it gets used. I don't want to write infrastructure if there are already packages that provide it. I don't want to have to select among the pros and cons of six different home-grown container package collections and then have to concern myself with whether the developer is going to maintain them, or if I have to take on the responsibility for that as well. (I've never concerned myself with who's maintaining the Ada.Strings hierarchy, but I do now have to maintain my own version of a particular container collection.)

I want Ada.Containers, and I will use them. Make them as good and powerful as you can, and then shut off the discussion and release them.
****************************************************************

From: Martin Dowie
Sent: Wednesday, February 11, 2004 8:22 AM

> I want Ada.Containers, and I will use them. Make them as good and powerful
> as you can, and then shut off the discussion and release them.

I'd second that. What is currently proposed is admittedly limited but it would be useful.

If Matt could adapt "Charles" into the core of the secondary standard that would be great too.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004 9:36 AM

That is indeed the plan. The current proposal has only a modest set of containers but we have to start somewhere.

If you need something right away, there is a reference implementation available at my home page.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004 9:44 AM

I think you'll find that in spite of its modest size, the containers in the current proposal are indeed very, very useful. See in particular the !examples section in the AI itself. The reference implementation contains several examples, too.

****************************************************************

From: Martin Krischik
Sent: Wednesday, February 11, 2004 12:38 PM

> The strategy proposed by the committee was to standardize something like
> AI-302-3, and encourage the development of a secondary standard (at a more
> leisurely pace!) to handle creating additional containers to provide
> additional functionality: performance related (bounded forms, lists,
> etc.), functional (sorted_maps, unsorted_sets, etc.), and operational
> (indefinite keys, indefinite elements, limited elements). We hope that
> providing a standard root will channel future developments in a common
> direction, rather than the scattershot approach that's currently prevalent.

Ok, you are right there. I can easily live with "indefinite later" - to name my pet feature - however some expressed an "indefinite never" stance and I can't live with that.

****************************************************************

From: Marc A. Criley
Sent: Wednesday, February 11, 2004 2:23 PM

I fear the participants on this list are rather detached from the "average Ada programmer" experience. Of all the dozens of Ada programming _coworkers_ I've worked with over the years, I could count on one hand (and not even need all the fingers) the number that would know or care what Charles or PragmArc are (much less something called an "ARG"), and those few who'd heard of Booch would recall it as just something that had been used in the early days.

Where are the journeyman programmers, for whom Ada is just the language they write code in, going to find data structures? If it doesn't show up in the reference manual, it'll be borrowed from some home- or project-grown thing that was done before, get ginned up yet again from scratch, or maybe get copied out of a dog-eared Ada textbook.

Meanwhile, the C++ programmers have got the STL handed to them on a platter, and the Java programmers have got their big JDK posters and Javadocs with all those containers documented and ready to use. But for the Ada programmer who just clocks in, codes, and goes home to their family, nothing.

****************************************************************

From: Pascal Obry
Sent: Wednesday, February 11, 2004 2:59 PM

> I fear the participants on this list are rather detached from the "average
> Ada programmer" experience.
This is not about being average or not. Just that I'm using Ada in the Information Technology domain. I don't really care(1) about size or performance; these are not hard real-time or embedded applications. In the IS field we need a decent container library to speed up development.

What I'm saying is that if the container library is not complete I'll use something else. And since people in the embedded or real-time field are certainly not going to use the standard containers, but most probably some simpler version hand-coded for the application, I'm a bit concerned about the current path... Ada is *not only* an embedded real-time programming language!

Pascal.

(1) I did not say that I want quick and dirty code :)
(2) BTW, I'm not sure I am an average Ada programmer :)

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004 3:06 PM

> Ok, you are right there. I can easily live with "indefinite later" -
> to name my pet feature - however some expressed an "indefinite never"
> stance and I can't live with that.

I don't remember anybody saying "indefinite never", but anyway, *my* opinion is that a secondary standard containing a rich variety of stuff, including all the bells and whistles that various folks have asked for, including indefinite component types, would be a Good Thing. But somebody has to take charge and push such a secondary standard through. I'm not volunteering. ;-)

****************************************************************

From: Jeffrey Carter
Sent: Wednesday, February 11, 2004 6:02 PM

The only problem is that there doesn't seem to be any mechanism for such a secondary standard. Indeed, the initial call for proposals for the standard container library indicated that it was for a secondary standard, but it is intended to become part of the ARM now.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004 10:42 PM

The intent is to use a new ISO procedure called an "International Workshop Agreement". These get published essentially immediately (no lengthy approvals), and then can later be turned into real standards if that proves to be a good idea. But, as Bob mentioned, there have to be people to drive that "Workshop" (which doesn't need to be an actual workshop per se).

****************************************************************

From: Jean-Pierre Rosen
Sent: Thursday, February 12, 2004 2:57 AM

There is such a mechanism: it is called an International Workshop Agreement. It is a relatively new ISO procedure, giving official status (though not formally Standard) to a specification for which there is consensus. Such an IWA may later become a full-fledged standard.

Since it is new, nobody really knows how this works, and whether vendors would feel bound to provide packages defined by an IWA. But the mechanism is here.

****************************************************************

From: Jeffrey Carter
Sent: Thursday, February 12, 2004 12:49 PM

OK. How do we get such a "workshop" set up? I hope it's obvious that I'm willing to participate.

****************************************************************

From: Jean-Pierre Rosen
Sent: Friday, February 13, 2004 4:44 AM

It is an ISO process, therefore you should get in touch with Jim Moore. Of course, the first thing is to have a convenor. If you step forward...
****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004 9:02 AM

As an example of the approach Bob is advocating here, I have included two examples in the latest reference implementation. The two new examples are for a vector of indefinite elements and a set of indefinite elements. Both were implemented as a thin layer on top of the vector and set containers provided by the library itself. Neither one took very long to write (in fact I did it while watching an episode of The Simpsons).

In the indefinite set example, I use the nested generic package Generic_Keys, and its nested generic package Generic_Insertion. In the indefinite vector example, I use the library-level Generic_Sort generic algorithm.

In the test code, I instantiate each component with type String (an indefinite type). Note that if you want to instantiate the component with a class-wide tagged type T'Class, then you'll probably have to declare these class-wide operations somewhere:

   function Is_Equal (L, R : T'Class) return Boolean is
   begin
      return L = R;
   end Is_Equal;

   function Is_Less (L, R : T'Class) return Boolean is
   begin
      return L < R;
   end Is_Less;

and then use these as the generic actuals for "<" and "=".

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004 8:59 AM

> But Ada hasn't got a garbage collector so there is the deallocation problem.
> Especially when the container is copied or passed around.

The latest version of the reference implementation has examples of a vector of indefinite elements and a set of indefinite elements. Internally both instantiate the underlying container with a simple controlled type that manages an access object designating the element type of the higher-level container.

See the Insert_N and Replace_Element operations in the indefinite vector package for a brief discussion of the various tradeoffs involved. See also Generic_Sort2.

See also the indefinite sets package for an example of how to use the Generic_Keys nested generic package.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004 9:24 AM

> -- Truc is a proof-of-concept implementation of AI-302/3
> -- for indefinite elements, i.e. indefinite generic formal
> -- element types (reorder the 4 adjectives at will).

The latest version of the reference implementation has two new examples: one for a vector of indefinite elements and another for a set of indefinite elements.

There is no "automatic" selection of a package. The programmer chooses the correct package himself, at the time of instantiation. If he needs to store indefinite elements, then he instantiates the package for indefinite elements. If his element type is definite, then he has the choice of using either the definite or indefinite packages. The package for definite elements will be more efficient, of course.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004 9:55 AM

> There is no "automatic" selection of a package. The programmer chooses
> the correct package himself, at the time of instantiation.

I know. I saw your code. It's fine. So one last try: how about configuring these indefinite-element versions as the one-page specialized needs annex below? Matt's manual choice approach has the virtue of fitting right in.

ANNEX
Containers of Indefinite Elements

This Annex provides support for containers of indefinite elements.
[Implementation Requirements]

An implementation conforming to this Annex shall provide the package Ada.Indefinite_Elements and the descendants of it defined in this Annex.

[Static Semantics]

The specifications of the descendants of Ada.Indefinite_Elements are a copy of the specifications of the descendants of Ada.Containers specified in A.17, with the unique difference that, for each generic descendant of Ada.Containers that has a definite element formal type, the corresponding descendant of Ada.Indefinite_Elements has an indefinite formal type in its place.

[Dynamic Semantics]

The behaviour associated with each container of Ada.Indefinite_Elements is exactly like that defined in A.17 for the corresponding container of Ada.Containers.

[Examples]

Specification of Ada.Indefinite_Elements.Vectors:

generic
   type Index_Type is (<>);
   type Element_Type (<>) is private;
   with function "=" (L, R : Element_Type) return Boolean is <>;
package Ada.Indefinite_Elements.Vectors is
   -- remainder of this package exactly like that of
   -- Ada.Containers.Vectors

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004 11:02 AM

> I just wrote the thing excerpted below.
> The whole is available at
> http://www.liacc.up.pt/~maa/containers
> Thanks.

It seems inefficient to store *two* vectors for each vector, and to select between them at run time, when 'Definite is generally known at compile time. Why not let the programmer choose to instantiate one or the other package?

Also, this code uses 'Unrestricted_Access, which is not Ada.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004 9:22 AM

> The latest version of the reference implementation has examples of
> indefinite vectors and indefinite sets, both of which can be used to
> instantiate elements of type T'Class.

Good news! BTW, Truc (www.liacc.up.pt/~maa/containers/truc.ada) has been updated also, with:

- a test for class-wide element types too (passed :-)
- cosmetics

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004 12:51 PM

> It seems inefficient to store *two* vectors for each vector,
> and to select between them at run time, when 'Definite is generally
> known at compile time.

As I say in the Truc spec, going outside the language would make it optimized. I can think of a number of ways to do so, and eliminate those inefficiencies.

Aside.
The two vectors problem could perhaps be eliminated inside the language using tagged types (it would still be dynamic dispatching though, i.e. a runtime choice). I tried that but Ada got in my way and so I solved the problem quickly and dirtily.

Anyway, it is not a big inefficiency in practice, because only one vector is used, and the other vector, never used and not even initialized, has negligible space and zero time impact.
End of aside.

> Why not let the programmer choose to instantiate
> one or the other package?

Staying within Ada, yes, that is better, and supports my suggestion to put the indefinite variants in a separate package branch defined in a specialized needs annex. And using the already existing reference implementations by Matt (released today).

> Also, this code uses 'Unrestricted_Access, which is not Ada.

You've got me, I'll have to change the 100% Ada claim to 99% :-)

I used 'Unrestricted_Access instead of the Rosen trick because element types must be non-limited. Maybe there's another way, but I couldn't think of it.
Aside. I used AI302.vectors of stream elements to avoid doing memory management, in one more experiment in pointerless programming. And for other reasons, for example programming for persistency: I can easily get a persistent container just by changing the stream operations.

I needed write access to an in-mode container parameter because of this stream approach. Now I'm curious if Matt's implementation has this and how he did it. End of aside.

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004 3:16 PM

Marius Amado Alves wrote:

> I know. I saw your code. It's fine. So one last try: how about configuring
> these indefinite-element versions as the one-page specialized needs annex
> below? Matt's manual choice approach has the virtue of fitting right in.

I like this idea, but I don't think it should be in a Specialized Needs Annex (i.e. optional for implementers to support it). The problem is not that it's hard to support, but that it adds extra verbiage to the RM. You've shown, I think, that the extra verbiage could be pretty small.

We compiler writers can probably even get Matt to code up the implementation for us. ;-)

This idea is much better than a magic package that supports both definite and indefinite efficiently.

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004 3:35 PM

True. However, I think going outside the language is a bad idea. I say: An efficient implementation should be possible in pure Ada. As an implementer, I have no intention of adding compiler magic for this stuff -- I want to be able to just write pure Ada code (or, better yet, take advantage of Matt's work). Even Address_To_Access_Conversions makes me nervous -- yeah, it's Ada, but it's rather ill-specified.

> I can think of a number of ways to do so, and eliminate those inefficiencies.
>
> Aside.
> The two vectors problem could perhaps be eliminated inside the language
> using tagged types (it would still be dynamic dispatching though, i.e. a
> runtime choice). I tried that but Ada got in my way and so I solved the
> problem quickly and dirtily.
>
> Anyway, it is not a big inefficiency in practice, because only one vector
> is used, and the other vector, never used and not even initialized, has
> negligible space and zero time impact.
> End of aside.

I don't want users of definite types to pay *any* penalty caused by supporting indefinite types.

> > Why not let the programmer choose to instantiate
> > one or the other package?
>
> Staying within Ada, yes, that is better, and supports my suggestion to
> put the indefinite variants in a separate package branch defined in a
> specialized needs annex. And using the already existing reference
> implementations by Matt (released today).

As I said in my previous message, that suggestion seems reasonable, except for the "specialized needs annex" part. For portability, we don't need more optionally-supported features of Ada. On the other hand, maybe support for indefinite is just too much (for the Ada RM -- of course a secondary standard should support all bells and whistles).

> > Also, this code uses 'Unrestricted_Access, which is not Ada.
>
> You've got me, I'll have to change the 100% Ada claim to 99% :-)

OK, but 99% isn't good enough. I want these packages to be implementable in 100% pure Ada. If that's not possible (as in the defaulted-discrims case somebody mentioned) we need to change the language to *make* it possible.
> I used 'Unrestricted_Access instead of the Rosen trick because element > types must be non-limited. Maybe there's another way, but I couldn't > think of it. Well, I can think of ways involving "for X'Address use.." or Address_To_Access_Conversions, and I might be willing to live with that, but I don't like it. I didn't read your code carefully enough to understand whether 'Unrestricted_Access was really needed. Why not declare the thing aliased, and use 'Unchecked_Access? > Aside. > I used AI302.vectors of stream elements to avoid doing memory management, in > one more experiment in pointerless programming. And for other reasons. For > example programming for persistency: I can easily get a persistent container > just by changing the stream operations. > > I needed write access to an in container because of this stream > approach. Now I'm curious if Matt's implementation has this and how he > did it. End of aside. Is it not possible to allocate the indefinite thing in the heap, and still know when it needs to be freed? I don't like memory leaks... **************************************************************** From: Randy Brukardt Sent: Wednesday, February 11, 2004 4:24 PM > I like this idea, but I don't think it should be in a Specialized Needs > Annex (i.e. optional for implementers to support it). The problem is > not that it's hard to support, but that it adds extra verbiage to the > RM. You've shown, I think, that the extra verbiage could be pretty > small. I actually was going to suggest the same thing, given that the wording needed is roughly the same supporting a "wide_string" version of something given a "string" version. I'll put it as an Open Issue in the "bug fix" update of the AI. (I don't want to make major changes to the AI, because I don't want to present a moving target to the ARG members who are supposed to be studying it for the upcoming meeting...) **************************************************************** From: Matthew Heaney Sent: Wednesday, February 11, 2004 4:57 PM I have several places in the reference implementation that I've notated with "NOTE", places in the AI where the semantics aren't exactly specified, where there's disagreement, where there can be improvement, etc. Should I send you a list or something? When would you like me to do that? **************************************************************** From: Randy Brukardt Sent: Wednesday, February 11, 2004 5:21 PM Sure, do that. Any time is fine, but no later than the start of next week. **************************************************************** From: Jeffrey Carter Sent: Wednesday, February 11, 2004 5:56 PM >>The specifications of the descendants of Ada.Indefinite_Elements are a copy of >>the specifications of the descendants of Ada.Containers specified in A.17, >>with the unique difference that, for each generic descendant of >>Ada.Containers that has a definite element formal type, the corresponding >>descendant of Ada.Indefinite_Elements has an indefinite formal type in its >>place. This doesn't seem quite right. The containers all have an Element_Type formal, so it can specify that type. Maps have a Key_Type that should also be indefinite, so it should specify it as well. However, this seems like a good way to add support for indefinite elements to the proposal. If only adding additional containers could be this easy! 
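[Editor's note: The wording Jeff suggests would give the indefinite map a formal part along these lines. This fragment is purely illustrative and follows the pattern of Marius's vector example above; the hash profile and the child name Maps are assumed, not taken from the proposal.

with Ada.Containers;  use Ada.Containers;

generic
   type Key_Type (<>) is private;       -- indefinite key, as Jeff notes
   type Element_Type (<>) is private;   -- indefinite element
   with function Hash (Key : Key_Type) return Hash_Type is <>;
   with function "=" (Left, Right : Element_Type) return Boolean is <>;
package Ada.Indefinite_Elements.Maps is
   -- remainder of this package exactly like that of the corresponding
   -- map package in Ada.Containers
end Ada.Indefinite_Elements.Maps;
]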
**************************************************************** From: Marius Amado Alves Sent: Wednesday, February 11, 2004 5:46 PM >I'll put it [Annex ] as an Open Issue in the "bug fix" update of the AI. Great! Annex is all it takes to make the proposal "complete". It is already fairly complete with respect to structural varieties (vector, set, map). What it is really missing is element type varieties (definite, indefinite). The group (definite, indefinite) has *exactly* the same properties as (vector, set, map). Primitive (in the good sense of course), concise, complete, useful. (Definite, indefinite) as opposed to (definite, indefinite, tagged, limited, abstract...), like (vectors, set, map) vs. (vector, set, map, queue, list...), these two oppositions are in perfect alignment. The extra things in each latter group can be realised with the ones in the former. The proposal is complete only if it is complete along at least these two axes (structural variety, element type). The other axes--size, persistence--are of lesser impact. It does not offend me at all to have them set to a fixed point in the standard--unbounded, core memory--, and extend them in secondary standards--(fixed, bounded, unbounded...), (core, cache, file...) A nice simetry. Container space has 4 axis (structure, element type, size, persistence). Aternative 3 with Annex ranges over 2, and fixes a point in the other 2. The ranges and points defining the most primitive region. The standard region. I promise this is my last motivational rambling for indefinite elements. I needed to have a view of the whole, evidently I used my "system of coordinates", and I thought I might share it with you. Talking of Open Issues: the range vs. discrete index issue. I'd say range. It solves the problem of Assert failing on enumerations. And the use of an enumeration for the index of a *variable* length vector does not make much sense. Ditto for modular types. **************************************************************** From: Robert A. Duff Sent: Wednesday, February 11, 2004 3:40 PM One thing that disappoints me about the current containers proposal is that there's no way to control memory allocation. The C++ STL allows the client to define which storage pool should be used. Would it be possible for us to do the same, without burdening users who just want to use "the regular heap"? **************************************************************** From: Randy Brukardt Sent: Wednesday, February 11, 2004 4:17 AM I don't think so. There were proposals offered for naming the standard storage pool(s) and allowing defaults for generic formal parameters, but both of those died an early death. (See AI-299 and AI-300.) Those were aimed at solving this problem. Since we're not reintroducing existing, killed proposals (and certainly the need for them in containers libraries was well explained when first considered - there's no "new information" here), it would have to be done without them. That means that about the only way to do it would be with an access type (with "null" meaning use the default pool). That seems very ugly to me, especially as you would have to make the pool that you want to pass in "aliased". And there doesn't seem to be a good place for that access type to live. In any case, that strikes me as creeping featurism. 
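[Editor's note: For concreteness, a sketch of the access-type scheme described above; it is not part of the proposal and the names are invented. The need to declare the client's pool object aliased, and to find a home for the named access type, are the objections Randy raises.]

   with System.Storage_Pools;
   package Pool_Handles is
      type Pool_Access is access all System.Storage_Pools.Root_Storage_Pool'Class;
   end Pool_Handles;

   with Pool_Handles;
   generic
      type Element_Type is private;
      Pool : in Pool_Handles.Pool_Access := null;  -- null means: use the default pool
      with function "=" (Left, Right : Element_Type) return Boolean is <>;
   package Sketch_Vectors is
      --  ...operations as in the proposed Ada.Containers.Vectors...
   end Sketch_Vectors;

   --  A client wanting a specific pool would have to write something like:
   --     My_Pool : aliased My_Pool_Type;  -- derived from Root_Storage_Pool
   --     package My_Vectors is new Sketch_Vectors (Float, Pool => My_Pool'Access);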
**************************************************************** From: Matthew Heaney Sent: Wednesday, February 11, 2004 4:34 PM You could do something like this: generic type Element_Type is private; Pool : in out Root_Storage_Pool'Class; with function "=" (L, R : ET) return Bool is <>; package Ada.Container.Vectors is ...; //for ex. There are several problems: (1) The language standard doesn't specify any storage pool objects. I suppose that the standard library could define a few default pool objects, though. (2) Even if you do have a pool then you run into problems with static matching rules, since the generic formal pool type is T'Class, which doesn't match a specific type NT in T'Class. So you have to resort to hacks like: package My_Pools is My_Pool : My_Pool_Type; --derives from RSP My_Pool_View : Root_Storage_Pool'Class renames Root_Storage_Pool (My_Pool); end; and then use My_Pool_View as the generic actual pool object. (3) It's in conflict with our design principle that components be easy to instantiate and use. I would love to have a generic formal pool object default a la "is <>" or "is ", but the language doesn't let you specify defaults for generic formal objects. (4) You might be able to get around (2) by declaring a generic formal derived type: generic type ET is private; type Pool_Type is new Root_Storage_Pool with private; Pool : in out Pool_Type; package Ada.Containers.Vectors is ...; but then this is in conflict with (3), because now there's another formal type (which cannot be defaulted). In C++ generic formal pool objects ("allocators") are allowed to have a default, by constructing an allocator on-the-fly. But then there's some rule about the STL that requires allocator objects be shared (or something like that)??? And then it complicates things for implementors because you have to use the "empty virtual base class" trick to avoid allocating padding for otherwise empty classes. Realize that adding custom allocator support to the STL complicated the semantics somewhat (do objects have the same or different allocators? -- affects assignment rules, etc). An early version of Charles allowed you to pass in a storage pool, but I eventually gave it up because it was too many headaches for casual users who didn't care about supplying their own pool. If you've studied my reference implementation then you might have noticed that the substrate package (e.g. charles.red_black_trees) used to implement the higher-level container is written so that all the allocation and deallocation is done by the higher-level package. The substrate package is completely agnostic about how storage allocation gets done. This allows the user of the instantiation of the red-black tree (say) to use a pool if he wants, or indeed even statically allocate the nodes. In fact the container elements can even be limited. The substrate package doesn't care. All the ugliness is hidden from the container user by the wrapper container package. So I reached the conclusion that if someone (like, um, Bob Duff, who has written lots of custom storage pools) needs a special sorted set that uses some fancy storage pool, then it's not too hard to do that using the substrate package directly and building his own wrapper class. Perhaps there is a way to do this. It may be that there's some slick language trick that I haven't figured out that would allow the user to pass in his own pool without too much pain at instantiation time. There is also the multi-threading issue. 
Clearly the user has the responsibility to not allow concurrent access to the same container object, but what about different threads each manipulating their own container object, so we have multiple container objects (and hence multiple threads) sharing a common pool object? But I suppose you could use the same synchronization mechanism you use for alligator new. Maybe you could make some other (non-limited) abstraction, and pass that in as the default, e.g. generic type ET is private; Pool : Pool_Handle := Default_Pool; --from somewhere package Generic_Containers is ...; but the language doesn't give you anything like the placement new construct in C++, which allows you to construct an object in-place, at a location you specify. There is a sort of hack you can do by declaring a pool object on-the-fly, that binds to an object (in some raw form e.g. storage elements) to be constructed using an access discriminant. Then you make a dummy call to new, and internally the pool object specifies the address of the object (to which the pool object is bound) as the address return value. The run-time system will then call Initialize on that object. Placement new in Ada95! But that's kind of a trick and I don't really know if it will work. **************************************************************** From: Tucker Taft Sent: Wednesday, February 11, 2004 9:31 AM I believe the intent is that all of these containers use controlled types to avoid storage leakage, analogous to what unbounded strings do. (In fact, I could imagine that a vector and an unbounded string would have a lot in common under the covers.) So I'm not sure how a user-defined pool would interact with that (and I fear based on our own experience that putting finalizable things in user-defined pools can be tricky). Note that this will give more incentive for implementors to "sharpen up" their implementation of controlled types. I think that is a good thing, so we are spending energy improving existing features of the language, rather than dissipating energy on lots of different ways of skinning the same cat. **************************************************************** From: Tucker Taft Sent: Wednesday, February 11, 2004 5:13 PM Matthew Heaney wrote: > ... > (3) It's in conflict with our design principle that components be easy > to instantiate and use. I would love to have a generic formal pool > object default a la "is <>" or "is ", but the language doesn't let > you specify defaults for generic formal objects. I generally agree with your reasoning, but this particular statement is false, unless you say "formal IN OUT objects." Formal IN objects certainly can have defaults, and the way to pass in a storage pool would be as Randy suggested, via an access value. The default could be implementation-defined, with the semantics that it implies the standard storage pool, if allowed to default. But I still believe my earlier response, that mixing user-defined storage pools and controlled types is asking for complexity, and doesn't seem to buy enough to justify itself. One reason to have a user-defined storage pool is to do some kind of garbage collection, or to do mark/release. Both of those could easily interfere with the implementation of controlled types, unless the user was very careful, and had a pretty good idea about how controlled types were implemented. **************************************************************** From: Simon J. Wright Sent: Thursday, February 12, 2004 3:12 AM Marc A. 
Criley wrote: > Where are the journeyman programmers for whom Ada is just the > language they write code in going to find data structures? If it > doesn't show up in the reference manual, it'll be borrowed from some > home- or project-grown thing that was done before, get ginned up yet > again from scratch, or maybe get copied out of a dog-eared Ada > textbook. For a project of any size I would not expect journeyman programmers to be making this sort of choice; it should be a matter of policy set by the software architect(s), along with "how we use tasks", "how we deal with exceptions" etc. So the question is, where do the architects find stuff? and clearly the ARM is a very good start (though I have to admit there are parts of Annex A that I'm not at all familiar with and should be! Strings.Maps, for example). I started maintaining the BCs because I needed containers for a demo project. Although it has been fun, I would never have done so if the proposed library had been available, and I strongly support it. **************************************************************** From: Marc A. Criley Sent: Thursday, February 12, 2004 7:46 AM Speaking as a software architect for both Ada and C++ projects, your characterization of this aspect of the architect's job is quite correct. One of my sub-tasks was ensuring that only the authorized container classes were being used, even to the point of once having to threaten to reject a developer's code if he didn't start using STL instead of coding up his own comparable classes. However, I've had plenty of experience with other architects and leads who aren't on this mailing list, don't visit comp.lang.ada (or comp.lang.c++), aren't on the Team-Ada mailing list, don't subscribe to any technial magazines or journals, aren't in ACM, don't go home and code at night or on weekends, and own only one book covering each programming language they have to deal with, whether it be Ada, C++, Java, Perl, etc. When they need container classes, they check the project's code base or they go to their books, which for Ada does usually include the reference manual. That's where container availability needs to be publicized, because if they do find something on the Web (like Booch or Charles or PragmArc), they need to first expend the effort to convince themselves that that "home-grown" collection is something that would be useful, and then overcome most management's resistance to using "free", unsupported software. Making Ada.Containers part of the standard language distribution gives it a cachet of legitimacy that means that architects/leads don't have to fight that fight. And they end up with a container collection that will handle the needs of most non performance critical projects. **************************************************************** From: Stephen Leake Sent: Thursday, February 12, 2004 11:48 AM Just to state my position for the record: I like AI-302-3. I think the rationale for the design needs to be more clearly stated (particularly why the key and element types are definite). Examples of how to build packages supporting indefinite types would be good; an actual standard for that (layered on top of the current one) would be better, but I can wait for that. I like the term "cursor" instead of "iterator"; "iterator" is clearly overloaded, while "cursor" matches the usage in SQL. Since these packages are intended to be low-level building blocks, I'd rather see them called "unbounded_array", "hashed_map", and "sorted_tree". But that's a small issue. 
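[Editor's note: Pending such examples, the crudest do-it-yourself layering is to instantiate the definite container with an access type and let the client allocate; a sketch, assuming the formal names (Index_Type, Element_Type) and the Vector_Type/Append profile used elsewhere in this thread.]

   with Ada.Containers.Vectors;
   procedure Layering_Sketch is
      type String_Access is access String;
      package String_Vectors is new Ada.Containers.Vectors
        (Index_Type => Positive, Element_Type => String_Access);
      use String_Vectors;
      V : Vector_Type;
   begin
      Append (V, new String'("hello"));
      --  The price: allocation, deallocation, and deep copying are now
      --  entirely the client's problem, which is exactly what the
      --  indefinite forms under discussion would take care of.
   end Layering_Sketch;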
**************************************************************** From: Matthew Heaney Sent: Thursday, February 12, 2004 3:41 PM See the latest reference implementation (Thu, 12 Feb 2004) for examples of using the canonical containers to implement indefinite vectors and indefinite sets. I'll have at least one example of a map of indefinite elements tomorrow. > I like the term "cursor" instead of "iterator"; "iterator" is clearly > overloaded, while "cursor" matches the usage in SQL. The Iterator design pattern described in the Gamma book says that "Cursor" is an alias for the term "Iterator", so you seem to be in good company. > Since these packages are intended to be low-level building blocks, I'd > rather see them called "unbounded_array", "hashed_map", and > "sorted_tree". But that's a small issue. Low-level is a point of view. It's a vector implemented as an unbounded array, not an unbounded array per se. It's a map, implemented using a hash table, but not a hash table per se. It's a sorted set, implemented using a balanced (red-black) tree, but not a tree per se. Yes, they're building blocks. Yes, they're low level. But they're not as low-level as unbounded arrays, hash tables, and red-black trees. **************************************************************** From: Stephen Leake Sent: Friday, February 13, 2004 8:19 AM > > "sorted_tree". But that's a small issue. > > Low-level is a point of view. > > It's a vector implemented as an unbounded array, not an unbounded > array per se. Hm. Let's compare Ada.Containers.Vectors to SAL.Poly.Unbounded_Array. Vectors has Insert in the middle, Sort, and Element_Access. SAL allows indefinite and limited items. Otherwise they are the same. Sort is a reasonable operation for any container; I would put it in a child package, since many applications won't need it. I guess that means SAL.Poly.Unbounded_Array is actually a "vector"? What would a true low-level unbounded_array look like? > It's a map, implemented using a hash table, but not a hash table per se. I need to see your definition of "hash table"; this looks like one to me. > It's a sorted set, implemented using a balanced (red-black) tree, but > not a tree per se. This one I'll grant you; it is more complex than just a tree. > Yes, they're building blocks. Yes, they're low level. But they're > not as low-level as unbounded arrays, hash tables, and red-black trees. As long as the names are sufficiently clear, and it is clear how to name new components that complement these, I'm happy. As I see it, all of these names have loose enough definitions that this issue is _not_ a show stopper. **************************************************************** From: Matthew Heaney Sent: Thursday, February 12, 2004 9:33 AM > Yup. That's precisely how Type'Size works in Ada; it has a fairly weak > effect on Obj'Size, but in any case, if you set it, you have to return the > same value (even if that value has nothing to do with how objects are > actually stored). The Size function is analogous to the capacity() member function in the STL vector class. The Resize procedure is analogous to the reserve() member function. A vector container is implemented internally as a contiguous array, that expands as items are inserted into the container. The Size function returns the length of the internal array. The Length function returns the number of elements in the array that are "active," that have actually been inserted into the vector. 
At all times a vector satisfies the invariant that Length (V) <= Size (V) The procedure Resize tells the vector to expand to at least the size specified in the call. If the current size is equal to or greater than the value specified, then Resize does nothing. If the current size is less than the value specified, then the internal array is expanded. The standard does not specify the exact algorithm for expansion, and only requires that the Size function return at least the value specified. There's nothing special an implementation needs to do to keep track of the current value of the size, since it has that information already: it's just the result of the 'Length attribute for the internal array. >>Resize is an appropriate name for the operation as specified. I expect >>an operation named Resize to cause resizing. If we're really talking >>about giving the implementation a hint about an appropriate size, then >>not only does the specification need to be changed, the name also needs >>to be different (perhaps Size_Hint?). The semantics for Resize are described above. > I don't see a strong need to change the name, but I do agree with you that > there shouldn't be a *requirement* to do some allocation. There is a requirement for allocation only if the current size is less than the size specified in the call to Resize. **************************************************************** From: Matthew Heaney Sent: Thursday, February 12, 2004 9:58 AM > We compiler writers can probably even get Matt to code up the > implementation for us. ;-) Ask, and you shall receive... The latest version (12 Feb 2004) of the reference implementation has an example of a sorted map, implemented using the sorted set and by instantiating its nested generic package Generic_Keys. There are also two examples of hashed sets, for both definite and indefinite elements. This standard doesn't have a hashed set but if it did then this is what it would look like. **************************************************************** From: Stephen Leake Sent: Thursday, February 12, 2004 11:55 AM > Yup. That's precisely how Type'Size works in Ada; it has a fairly weak > effect on Obj'Size, but in any case, if you set it, you have to return the > same value (even if that value has nothing to do with how objects are > actually stored). I think that's a bad idea. It means my scenario (preserve the max size of a container in a config file, and set it next time on startup) won't work. If I set the max size from yesterday on startup today, but then the container grows larger today, when I exit and query the size, I won't get the larger correct size, but just the one I set at the begining. Setting the size should be a hint, but only for a starting point. Querying the size should always return the current size. **************************************************************** From: Jeffrey Carter Sent: Thursday, February 12, 2004 12:47 PM Randy Brukardt wrote: > Yup. That's precisely how Type'Size works in Ada; it has a fairly > weak effect on Obj'Size, but in any case, if you set it, you have to > return the same value (even if that value has nothing to do with how > objects are actually stored). Not quite precisely. There are cases where a compiler is required to use the specified 'Size. > One allocation per key is a lot more than one allocation per *map*, > which is what a stringspace implementation takes. (Well, it might > have to expand if it gets full, but that should be rare. 
It could > degrade to one allocation per key if the keys are very, very long, > but some care in implementation should prevent degrading.) OK. I misunderstood. > I have a component like that (it's actually Tom Moran's), but in > practice, I've *never* used it without using the index values it > provides to manage some other data in a separate table (at least > statistics and/or debugging). Even the 'known words' list in the spam > filter uses the indexes (handles) for debugging. If that's the case, > why bother having to use a separate component (causing another chance > of error)? > > So I would guess that the "dummy type" would gain some real data in > 95% of the applications. And that such uses are less than 10% of the > uses of a map anyway. Since this is a minimal library, we're not > trying to cover that remaining 0.5%. It's not "another component"; it's the underlying implementation of the hashed map component. My point is that we're requiring the implementation of a hash table, which is a useful component, but not requiring that it be provided to users. That's like requiring that a compiler be able to convert strings into numbers, but not having 'Value in the language. It doesn't require any additional work by implementors, nor introduce an additional opportunity for errors, but it does increase the utility of the library. To me it's a no brainer, as is converting the map part of the "sorted set" component (Generic_Keys) into its own component: it's no additional work for implementors, and allows the user to obtain a sorted map with a single instantiation, instead of 2. Put another way, the library has 2 different approaches to defining a map. In one, we have a map component and hide the underlying implementation (the hash table). In the other, we have the "sorted set" component, and then a map component implemented in terms of it. We should at least be consistent, and I argue that our consistency take the form of providing both the underlying implementation and the map implemented in terms of it. **************************************************************** From: Matthew Heaney Sent: Thursday, February 12, 2004 3:21 PM > It's not "another component"; it's the underlying implementation of the > hashed map component. My point is that we're requiring the > implementation of a hash table, which is a useful component, but not > requiring that it be provided to users. A hash table might be at the wrong level of abstraction (too low). The hashed map actually takes the level of abstraction up a notch. In my original proposal, I allowed the user to query each bucket of the underlying hash table array, but the subcommittee rejected that approach as too low level, in favor of higher-level First and Succ active iterator operations. **************************************************************** From: Matthew Heaney Sent: Thursday, February 12, 2004 3:43 PM > Setting the size should be a hint, but only for a starting point. > Querying the size should always return the current size. Yes, querying the size should always return length of the internal array. If the value specified in the call to Resize is larger than the current length of the internal array, then the internal array is expanded to at least the length specified. **************************************************************** From: Simon J. Wright Sent: Thursday, February 12, 2004 10:42 AM > The Size function is analogous to the capacity() member function in the > STL vector class. 
> > The Resize procedure is analogous to the reserve() member function. ... Do we really need these operations? I presume that they support optimisation by allocating extra space ahead of time -- do our users really need that? (assuming of course that the vector will resize itself if it finds it needs to).
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 12, 2004 11:10 AM

Yes, it makes the optimization you describe -- the Resize preallocates an internal array large enough to contain all future insertions. The optimization is especially important for very large numbers of elements. But don't take my word for it. Measure the performance of this procedure:

   procedure Not_Optimized (V : in out Vector_Type) is
   begin
      for I in 1 .. 1_000_000 loop
         Append (V, New_Item);
      end loop;
   end;

and then compare it to this one:

   procedure Optimized (V : in out Vector_Type) is
   begin
      Resize (V, Size => 1_000_000);
      for I in 1 .. 1_000_000 loop
         Append (V, New_Item);
      end loop;
   end;

If you really want to see a difference then use a complex element type, perhaps one that is controlled and does lots of internal allocation. I know it makes a difference because I've actually had the problem. In my streaming media server, when a file is requested I must load large indexes comprising several hundred thousand elements that describe the frames in the file (these are 2 hour movies). When I first wrote the server there was a huge spike in the CPU monitor whenever I loaded a file, and this tended to disrupt existing streaming clients. (This is a real-time streaming media server, and I have to service several hundred clients simultaneously.) I did some analysis and realized it was the population of the index vector that was the cause of my problem. So I just figured out my total number of indexes before inserting and then did a Resize. And now all is well. So performance matters, and therefore we should keep Size and Resize. Of course, if your vector objects are small, or you don't have any special performance needs, then you can just ignore Resize and the vector will work fine.
****************************************************************
From: Robert A. Duff
Sent: Thursday, February 12, 2004 12:01 PM

I would expect the former to do about lg(1_000_000) = 20 allocations, and the latter to do 1 allocation, presuming the growth is exponential, which it should be. (E.g. double the size each time you run out of space.)

> So performance matters, and therefore we should keep Size and Resize.

I agree. I use a similar growable array abstraction quite heavily in my current project, and there are cases where the code knows the size ahead of time (or can guess), and I care enough about speed to do the Resize.
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Thursday, February 12, 2004 3:29 PM

> Yes, but access types themselves are not tagged. What they point at is irrelevant.
> If you have a formal "type T is tagged private;" no access type will match
> that; it's the same for interfaces.

Still don't understand: if we can do something useful with, say, an array (or Unbounded_Array) of interface objects then why can't we do the same with an array of accesses to interface objects - just dereferencing them before calling a member of the interface?
**************************************************************** From: Randy Brukardt Sent: Thursday, February 12, 2004 4:58 PM Because you can't create a container (a map, say) of access types in this model. Remember, an interface has no implementation, so at some point you have to have a concrete implementation. Let me try to give a very simple example: (* Warning *) This is not a serious proposal! (* End Warning *) package Ada.Containers is type Element_Interface is interface; -- Any element operations here (I don't think there need to be any). type Cursor_Interface is interface; -- Any common cursor operations here. end Ada.Containers; package Ada.Containers.Interfaces is type Forward_Iterator_Container_Interface is interface; function Null_Cursor (Container : Forward_Iterator_Container_Interface) return Cursor_Interface'Class is abstract; function Front (Container : Forward_Iterator_Container_Interface) return Cursor_Interface'Class is abstract; procedure Increment (Container : Forward_Iterator_Container_Interface; Cursor : in out Cursor_Interface'Class) is abstract; function Element (Container : Forward_Iterator_Container_Interface; Cursor: Cursor_Interface'Class) return Element_Interface'Class is abstract; ... -- (It might make more sense to put the "iterator" operations on the Cursor_Interface. -- But then you'd need a separate interface just for element access through a cursor.) end Ada.Containers.Interfaces; with Ada.Containers.Interfaces; generic type Key_Type is private; type Element_Type is new Element_Interface; ... -- As before package Ada.Containers.Maps is type Map_Type is new Ada.Containers.Interfaces.Forward_Iterator_Container_Interface with private; -- Of course, other useful interfaces also would be included here. -- Probably including a "map" one. type Cursor_Type is new Cursor_Interface with private; ... -- As before. (With appropriate Null_Cursor and Increment routines). end Ada.Containers.Maps; Now, to use this, the element type has to 'have' the Element_Interface interface: type My_Element_Type is new Ada.Containers.Element_Interface ... with ...; You can't instantiate the container with a scalar type or an access type or an array type or any record that doesn't have the Element_Interface interface. Now, the point of all of this is that you now can write an iteration routine that will work for any container having the Forward_Iterator_Container_Interface. For instance, to create a passive iterator, you could do (this of course isn't useful, but the ability to write such things is): generic with procedure Process (Element : in Element_Interface'Class); procedure Iterator (Container : Forward_Iterator_Container_Interface'Class); procedure Iterator (Container : Forward_Iterator_Container_Interface'Class) is Current : Cursor_Interface'Class := Front (Container); begin while Current /= Null_Cursor (Container) loop Process (Element (Container, Current)); Increment (Container, Current); end loop; end Iterator; Moreover, the instantiations are pretty much the same as the current proposal. But the element types are limited to tagged types. **************************************************************** From: Ehud Lamm Sent: Thursday, February 12, 2004 2:43 AM But signature packages would work ok, wouldn't they? **************************************************************** From: Randy Brukardt Sent: Thursday, February 12, 2004 5:06 PM Signature packages violate the meta-rule about ease of instantiation: as few instantiations as possible to get a usable container. 
(That's one instantiation, of course.) As far as I can tell, to use them like interfaces, they'd have to be a parameter to the generic container package. But perhaps you had something else in mind. In any case, I don't like signature packages. They add layers of overhead on a generic sharing implementation (every generic package has a cost, the more you use, the more that cost is), turning the performance of pretty much anything into that of bad Java code. (That's not a problem if the signature doesn't contain anything "expensive", but trying to define that - and work around it - is a fool's game.) **************************************************************** From: Randy Brukardt Sent: Friday, February 13, 2004 12:13 AM Jeffrey Carter: > > Yup. That's precisely how Type'Size works in Ada; it has a fairly > > weak effect on Obj'Size, but in any case, if you set it, you have to > > return the same value (even if that value has nothing to do with how > > objects are actually stored). > > Not quite precisely. There are cases where a compiler is required to use > the specified 'Size. Not for a (sub)type. 13.3(48) says that an object's size is *at least* as large as the specified size. Anything else said is "advice". ... > It's not "another component"; it's the underlying implementation of the > hashed map component. My point is that we're requiring the > implementation of a hash table, which is a useful component, but not > requiring that it be provided to users. That's like requiring that a > compiler be able to convert strings into numbers, but not having 'Value > in the language. It doesn't require any additional work by implementors, > nor introduce an additional opportunity for errors, but it does increase > the utility of the library. Not true at all. Building a separate hash table component and then building a map on top of that would be a horrible implementation performance-wise. Lots of extra call and generic overhead. So, in practice, they'd have completely separate implementations -- thus, you'd be doubling the work. Moreover, the component you're describing (a hash table without elements) wouldn't have any place to *put* elements. So I don't see how you could even use it to implement the map. (The hash table component you're suggesting would return a Cursor object to represent each key, but that item isn't an index that you could use in a sequence. So how would you associate a key from the hash table with an element? A linear list would work, but would essentially make the hash table useless.) What I suspect would happen in practice is that the relatively useless hash table component would be implemented in terms of a map with a null record element type. What's the point in that - the user can do that themselves if they need it? > To me it's a no brainer, as is converting the map part of the "sorted > set" component (Generic_Keys) into its own component: it's no additional > work for implementors, and allows the user to obtain a sorted map with a > single instantiation, instead of 2. Matt will tell you that the difference between a Sorted_Set using Generic_Keys and a Map (any kind) is that the key doesn't have a separate existence in the Sorted_Set; it's part of the element. Whereas in a Map, it is separate from the element. There's obviously a significant space advantage to avoiding duplicate keys. I originally deleted the Generic_Keys component as redundant (because I too thought it was a Map), then put it back after a discussion on C.L.A. showed how important it is. 
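[Editor's note: A small illustration of the storage point being made; the node types are hypothetical and the Generic_Keys formal part is not reproduced here.]

   generic
      type Key_Type is private;
      type Element_Type is private;
   package Key_Storage_Sketch is
      --  Map node: the key is stored alongside the element, so a key that
      --  also appears inside the element is held twice per entry.
      type Map_Node is record
         Key     : Key_Type;
         Element : Element_Type;
      end record;
      --  Sorted-set node: only the element is stored.  Instantiating the
      --  set's nested Generic_Keys with a function that extracts the key
      --  from the element gives key-based searches without the extra copy.
      type Set_Node is record
         Element : Element_Type;
      end record;
   end Key_Storage_Sketch;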
Matt will also tell you that he'd prefer both a Sorted_Map and a Hashed_Set, and Tucker would tell you that he'd prefer an Unsorted_Set. And dozens of people have asked that the List be put back. But that would quickly balloon the proposal to double its size, and in any case smacks of "feeping creaturism". :-)
****************************************************************
From: Matthew Heaney
Sent: Friday, February 13, 2004 8:32 AM

> Not true at all. Building a separate hash table component and then building
> a map on top of that would be a horrible implementation performance-wise.

Gulp! I guess Randy hasn't looked at the reference implementation yet...

> Lots of extra call and generic overhead. So, in practice, they'd have
> completely separate implementations -- thus, you'd be doubling the work.

Jeff may have assumed (perhaps by looking at the reference implementation) that implementors would implement the (hashed) map as a layer on top of a separate generic hash table component. But as Randy notes, implementors won't necessarily implement the map container that way, and so Jeff is basically advocating that another component (specifically, a low-level hash table data structure) be added to the standard library.

> Matt will tell you that the difference between a Sorted_Set using
> Generic_Keys and a Map (any kind) is that the key doesn't have a separate
> existence in the Sorted_Set; it's part of the element. Whereas in a Map, it
> is separate from the element. There's obviously a significant space
> advantage to avoiding duplicate keys.

What Randy told you Matt would tell you is correct...

> I originally deleted the Generic_Keys component as redundant (because I too
> thought it was a Map), then put it back after a discussion on C.L.A. showed
> how important it is.

Yes. It allows the instantiator to take advantage of properties of the generic actual set element type that the generic set itself isn't privy to. See for example the Indefinite_Sets example in the reference implementation.

> Matt will also tell you that he'd prefer both a Sorted_Map and a Hashed_Set,
> and Tucker would tell you that he'd prefer an Unsorted_Set. And dozens of
> people have asked that the List be put back. But that would quickly balloon
> the proposal to double its size, and in any case smacks of "feeping
> creaturism". :-)

What Randy told you Matt would tell you is once again correct...
****************************************************************
From: Marius Amado Alves
Sent: Friday, February 13, 2004 12:54 PM

I've updated Truc: the "100% Ada" claim is now true. The URL is the same (www.liacc.up.pt/~maa/containers/truc.ada)

Truc features an implementation of indefinite elements using streams, as an alternative to Matt's approach using controlled deallocation. This could be of interest to implementors. But remember Truc was a proof-of-concept and is missing many standard functions.

The other principal feature of Truc is now merely academic, namely that it chooses automatically the most appropriate implementation for the actual element type w.r.t. definiteness. It is now settled that the choice will be manual (done by the user).
****************************************************************
From: Dan Eilers
Sent: Friday, February 13, 2004 6:44 PM

I think it's a little too soon to say that manual choice is settled. Certainly it is agreed that there should not be any overhead from support of indefinite types forced onto users of definite types.
But a user really probably prefers not to have to worry about which flavor of each container to instantiate, just like users of generic_elementary_functions currently don't have to explicitly select between single and double precision versions. You earlier proposed a language extension as an aside: > Aside. Of course there is still no standard means to do this, but it > would be a nice extension. Conditional compilation of generic bodies > based on instantiation properties. Variant units :-) > generic > type T is private; > ... > package G is > when T'Definite => > ...; > when others => > ...; > end; > (On the subject of conditional compilation, see also the recent Ada > Preprocessor thread on CLA.) This looks like too large of a change for the benefit, but there may be a simpler change that would work. For example, by extending the syntax for renames to allow a conditional expression, as in: generic package p1 is end p1; generic package p2 is end p2; with p1, p2; generic package p3 renames (if condition then p1 else p2); **************************************************************** From: Alexandre E. Kopilovitch Sent: Friday, February 13, 2004 9:38 PM > Because you can't create a container (a map, say) of access types in this > model. Remember, an interface has no implementation, so at some point you > have to have a concrete implementation. I remember that, but I still can't get how it may be possible that 1) we can create a container of interfaces and 2) we can create a container of accesses and 3) we have accesses to interfaces but at the same time we cannot create a container of accesses to interfaces. I don't understand how the delayed implementation of interfaces may create this situation. Let me follow your example: > Let me try to give a very simple example: > > (* Warning *) This is not a serious proposal! (* End Warning *) > > package Ada.Containers is > type Element_Interface is interface; Let's change the above line to: type Item_Interface is interface; type Element_Access is access all Item_Interface; > -- Any element operations here (I don't think there need to be any). > > type Cursor_Interface is interface; > -- Any common cursor operations here. > end Ada.Containers; > > package Ada.Containers.Interfaces is > type Forward_Iterator_Container_Interface is interface; > function Null_Cursor (Container : Forward_Iterator_Container_Interface) return > Cursor_Interface'Class is abstract; > function Front (Container : Forward_Iterator_Container_Interface) return > Cursor_Interface'Class is abstract; > procedure Increment (Container : Forward_Iterator_Container_Interface; > Cursor : in out Cursor_Interface'Class) is abstract; > function Element (Container : Forward_Iterator_Container_Interface; > Cursor: Cursor_Interface'Class) return Element_Interface'Class is abstract; and the above function to: function Element (Container : Forward_Iterator_Container_Interface; Cursor: Cursor_Interface'Class) return Element_Access is abstract; function Item (Container : Forward_Iterator_Container_Interface; Cursor: Cursor_Interface'Class) return Item_Interface'Class is abstract; > ... > -- (It might make more sense to put the "iterator" operations on the Cursor_Interface. > -- But then you'd need a separate interface just for element access through a cursor.) 
> end Ada.Containers.Interfaces; > > with Ada.Containers.Interfaces; > generic > type Key_Type is private; > type Element_Type is new Element_Interface; change above line to type Item_Type is new Item_Interface; type Element_Type is access all Item_Type; > ... -- As before > package Ada.Containers.Maps is > type Map_Type is new > Ada.Containers.Interfaces.Forward_Iterator_Container_Interface with private; > -- Of course, other useful interfaces also would be included here. Probably > -- including a "map" one. > type Cursor_Type is new Cursor_Interface with private; > > ... -- As before. (With appropriate Null_Cursor and Increment routines). > end Ada.Containers.Maps; > > Now, to use this, the element type has to 'have' the Element_Interface interface: > type My_Element_Type is new Ada.Containers.Element_Interface ... with > ...; correspondily: Now, to use this, the element type must be access to a type that has to 'have' the Item_Interface interface: type My_Item_Type is new Ada.Containers.Item_Interface ... with ...; type My_Element_Type is access all My_Item_Type; > You can't instantiate the container with a scalar type or an access type or > an array type or > any record that doesn't have the Element_Interface interface. But now, with the above changes we can instantiate the containter with access to a tagged type that has Item_Interface interface. Where I am wrong here - in which point/step? **************************************************************** From: Randy Brukardt Sent: Friday, February 13, 2004 9:47 PM ... > correspondily: > > Now, to use this, the element type must be access to a type that has to 'have' the > Item_Interface interface: > type My_Item_Type is new Ada.Containers.Item_Interface ... with ...; > type My_Element_Type is access all My_Item_Type; > > > You can't instantiate the container with a scalar type or an access type or > > an array type or any record that doesn't have the Element_Interface interface. > > But now, with the above changes we can instantiate the containter with access > to a tagged type that has Item_Interface interface. > > Where I am wrong here - in which point/step? This works, of course, but now you can only instantiate with access-to-interfaces. That's even more limiting than just interfaces - because you have to do all of the memory management yourself. If you've been following along here, I'm sure you've noticed that that won't do. You could of course support this as an alternative implementation with both sets of stuff around. But then you've instantly doubled the size of the library -- and you still can't have a container of floats or of arrays (especially of unconstrained arrays). Wrappers are very space-inefficient in the first case, and barely possible for unconstrained arrays (the code to use them will be very ugly). **************************************************************** From: Matthew Heaney Sent: Friday, February 13, 2004 9:08 AM The current version of the reference implementation has examples of indefinite sets, maps, and vectors. However, I have discovered a potential anomaly in indefinite containers that I wanted to make users aware of. An indefinite container is implemented by storing a pointer to the (indefinite) element, and doing the allocation and deallocation of the element behind the scenes during insertion and deletion. The issue comes up in the item-less forms of insertion. In that case, there is a null pointer for the element. This has several consequences. Consider the vector. 
When we do an item-less insert, does that mean we copy the internal pointer up to the next position, and leave a null pointer at the insertion position? Or do we leave the original element there and make a copy of the element to slide up?

When we delete vector elements, do we move pointers down, and leave null element pointers behind? Or are we required to make a copy to slide down?

What should the passive iterator do when it hits a null element pointer? Skip that position or just raise Constraint_Error?

Should we generalize Replace_Element, to allow a "null element" as the replacement value?

Does Generic_Element return a null pointer if the element pointer is null, or does it raise CE?

What should sort do with null elements? Assume that a null element is always less than a non-null element?

This affects streaming of elements too, because you have to stream out an extra bit to indicate whether the element is null or not.

My tentative assumption is that we'll have to omit the item-less insertion operations in the indefinite containers. This mostly applies only to vector and map, but I still have to analyze the behavior of indefinite set Generic_Keys nested package.

The reference implementation doesn't do anything special for indefinite vectors. The indefinite map handles null elements. I will fix both this weekend, as I prepare an errata list for Randy.

If the developers who want indefinite containers have an opinion about these matters, then please speak up.
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Friday, February 13, 2004 10:19 AM

Matthew Heaney wrote:
> I have discovered a potential anomaly in indefinite containers
> that I wanted to make users aware of.
>
> An indefinite container is implemented by storing a pointer to the
> (indefinite) element, and doing the allocation and deallocation of the
> element behind the scenes during insertion and deletion.
>
> The issue comes up in the item-less forms of insertion. In that case,
> there is a null pointer for the element. This has several consequences.
>
>...
>
> Should we generalize Replace_Element, to allow a "null element" as the
> replacement value?

No.

>...
> If the developers who want indefinite containers have an opinion about
> these matters than please speak up.

Yes, I think that this is the right way to go - exclude item-less elements from containers for indefinite types altogether. This is a clear and well-justified restriction, and it will not harm the usefulness of those containers significantly.

From the user's viewpoint, I believe that this restriction is a fair price for admission of indefinite types in containers (with basic memory management), at least in a basic library.
****************************************************************
From: Marius Amado Alves
Sent: Friday, February 13, 2004 10:45 AM

First, what is item-less insertion? Checking the AI I guess it is this:

<<
   procedure Insert_N (Vector : in out Vector_Type;
                       Before : in Index_Type'Base;
                       Count  : in Natural);

   Equivalent to Insert_N (Vector, Before, Count, New_Item), with the
   difference that the elements in the Count positions starting at Before
   are not assigned.
>>

That is (correct me if I'm wrong), the inserted elements hold garbage. Garbage is garbage, definite or indefinite. Any attempt to read garbage should raise an exception (I'm checking now if the AI has this provision; it should).

Sorry, I'm not following strictly your questions, but I think I'm answering them.
Another 'problem' is what a proper multiple insertion (Insert_N/4 with Count > 1) does for indefinite elements: repeat the same pointer or create N copies of the item? Value semantics, man. Create N copies. **************************************************************** From: Marius Amado Alves Sent: Friday, February 13, 2004 11:18 AM Matt, I'm rechecking you questions one by one now, against the 'philosophy' expressed in my previous post. ... > Consider the vector. When we do an item-less insert, does that mean we > copy the internal pointer up to the next position, yes > and leave a null > pointer at the insertion position? A null or another internal sign of garbage. > Or do we leave the original element > there and make a copy of the element to slide up? This does not make sense. The user is inserting garbage. > When we delete vector elements, do we move pointers down, and leave null > element pointers behind? Or are we required to make a copy to slide down? Move pointers. And leave *nothing* behind. Shrink the vector, as per the spec. > What should the passive iterator do when it hits a null element pointer? Whatever it does when it hits an unassigned (garbage) element. Definite or indefinite. > Skip that position or just raise Constraint_Error? Definitely raise something. But in definite elements too. And maybe a more specific exception. Value_Error. Data_Error. > Should we generalize Replace_Element, to allow a "null element" as the > replacement value? I'm not sure I understand. Please do not create another special entity. Definitely not Null_Element, which the user would have to define. > Does Generic_Element return a null pointer if the element pointer is > null, or does it raise CE? See above. > What should sort do with null elements? Assume that a null element is > always less than a non-null element? I say raise something. > This affects streaming of elements too, because you have to stream out > an extra bit to indicate whether the element is null or not. Again, raise. That is, in practice, forbid sort or (container-wide) streaming of a container with (yet) unassigned elements. Let the user create its one 'null' element value, if he needs to process it. > My tentative assumption is that we'll have to omit the item-less > insertion operations in the indefinite containers. No. Or, omiting it, omit in definite too. > This mostly applies > only to vector and map, but I still have to analyze the behavior of > indefinite set Generic_Keys nested package. With the philosophy subsumed in my replies, that analysis should be clear ;-) Please note my solution implies containers have a 'validity' state. Namely, if they contain unassigned elements they are invalid w.r.t. some operations e.g. sort. Maybe a Valid predicate should be added to the spec. Alternatively, we can simply remove the creation of unassigned elements i.e. omit the item-less insertion. > The reference implementation doesn't do anything special for indefinite > vectors. The indefinite map handles null elements. I will fix both > this weekend, as I prepare an errata list for Randy. > > If the developers who want indefinite containers have an opinion about > these matters than please speak up. 
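[Editor's note: To make the questions above concrete, a sketch of the representation Matt describes, with invented names; the null access values are exactly the unassigned "garbage" Marius refers to.]

   generic
      type Index_Type is range <>;
      type Element_Type (<>) is private;
   package Indefinite_Vector_Rep_Sketch is
      --  Visible operations elided; only the representation matters here.
   private
      type Element_Access is access Element_Type;
      type Element_Array is array (Index_Type range <>) of Element_Access;
      type Element_Array_Access is access Element_Array;
      --  Insert with an item:  allocate a copy of the item and store its
      --                        access value at the given position.
      --  Item-less Insert_N:   slide the access values up and leave null
      --                        in the new positions.
      --  Delete:               slide the access values down; what remains
      --                        past the logical end is irrelevant.
      --  Element, sorting, streaming, etc. must then decide whether a null
      --  access value raises an exception or is skipped.
   end Indefinite_Vector_Rep_Sketch;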
**************************************************************** From: Stephen Leake Sent: Friday, February 13, 2004 11:48 AM Matthew Heaney writes: > An indefinite container is implemented by storing a pointer to the > (indefinite) element, and doing the allocation and deallocation of the > element behind the scenes during insertion and deletion. ok, good. > The issue comes up in the item-less forms of insertion. In that > case, there is a null pointer for the element. Why would I want to do this? Seems bogus to me. Just remove this operation, all the problems go away! I had not noticed these versions of Insert before. Do you have an example of when they are useful? Note that for definite Item_Type, you can still get Constraint_Error from an itemless Insert, unless the element is initialized to some valid value. > Consider the vector. When we do an item-less insert, does that mean > we copy the internal pointer up to the next position, and leave a null > pointer at the insertion position? Yes. > Or do we leave the original element there and make a copy of the > element to slide up? Why should the null pointer case be any different than the non-null case? > When we delete vector elements, do we move pointers down, and leave > null element pointers behind? Or are we required to make a copy to > slide down? I guess you mean what do you leave in vector (last + 1). I would move pointers down, and leave a null pointer (again, this is the same whether we have null inserts or not). > What should the passive iterator do when it hits a null element > pointer? Skip that position or just raise Constraint_Error? Raise Constraint_Error. The user asked for it. > Should we generalize Replace_Element, to allow a "null element" as > the replacement value? no. Unless you have an example of when that would be useful. > Does Generic_Element return a null pointer if the element pointer is > null, or does it raise CE? Raise Constraint_Error. It might be nice to have a version of Generic_Element that returns the pointer, rather than the element. As Maps.Generic_Element does. > What should sort do with null elements? Assume that a null element is > always less than a non-null element? Raise Constraint_Error. > This affects streaming of elements too, because you have to stream > out an extra bit to indicate whether the element is null or not. Raise Constraint_Error. > My tentative assumption is that we'll have to omit the item-less > insertion operations in the indefinite containers. This mostly > applies only to vector and map, but I still have to analyze the > behavior of indefinite set Generic_Keys nested package. Ok by me. > The reference implementation doesn't do anything special for > indefinite vectors. The indefinite map handles null elements. I will > fix both this weekend, as I prepare an errata list for Randy. > > If the developers who want indefinite containers have an opinion about > these matters than please speak up. I have :). **************************************************************** From: Matthew Heaney Sent: Friday, February 13, 2004 5:33 PM >>The issue comes up in the item-less forms of insertion. In that >>case, there is a null pointer for the element. > > Why would I want to do this? Seems bogus to me. Just remove this > operation, all the problems go away! That's what I'll do. > I had not noticed these versions of Insert before. Do you have an > example of when they are useful? Because you don't always have a value to assign immediately. 
What you want to do is make space in the vector for all the items, and then do the assignment. For example, suppose you want to copy a list into a vector:

   V : Vector_Type;

   procedure Copy (List : List_Type; I : Index_Type) is
      C : Cursor_Type := First (List);
      J : Index_Type := I;
   begin
      Insert_N (V, Before => I, Count => Length (List));
      for K in 1 .. Length (List) loop
         Replace_Element (V, Index => J, By => Element (C));
         Increment (C);  -- advance the list cursor
         J := Index_Type'Succ (J);
      end loop;
   end Copy;

If you don't do it this way, then your time complexity is O(n*m) instead of O(n+m).

>>My tentative assumption is that we'll have to omit the item-less
>>insertion operations in the indefinite containers. This mostly
>>applies only to vector and map, but I still have to analyze the
>>behavior of indefinite set Generic_Keys nested package.
>
> Ok by me.

This simplifies the model. Let's do it this way.
****************************************************************
From: Randy Brukardt
Sent: Friday, February 13, 2004 10:54 PM

Matt Heaney wrote:
> An indefinite container is implemented by storing a pointer to the
> (indefinite) element, and doing the allocation and deallocation of the
> element behind the scenes during insertion and deletion.
>
> The issue comes up in the item-less forms of insertion. In that case,
> there is a null pointer for the element. This has several consequences.

Well, you have to decide precisely what containers you are creating. (That's the designer's job, I think.) Consider the Sequence. (Aside: I don't think the name "Vector" is going to make it, given that AI-296 has about 10 years' dibs on that name. And I don't think we want two different things with the same name in the standard...)

If your container supports sparse sequences, then you need to decide what it means to not have an element at a position. And whatever that decision is, it probably ought to be the same for both forms. I tend to agree that referencing an empty element should cause an exception in that case (it's better than returning garbage). (Which means that Sorting and [passive] Iteration would raise that exception when the first empty element was reached.)

OTOH, if your container does not support sparse sequences, then I don't see why you ought to have item-less forms of insertion in the first place. Inserting nothing is a mistake if you can't have undefined elements.

In either case, it is clear that deletion should shrink the (virtual) length of the sequence. To do anything else would mean that you couldn't reliably iterate on a sequence that has ever been deleted from. That seems goofy. Of course, that doesn't mean that you need to change the length of the internal array. And doing so means that it is irrelevant how items past the logical end of the array are represented.

I do think that if you support sparse sequences, you need to be able to stream them in and out. They seem to be potentially useful (imagine a histogram vector; values that never occurred would not need any value at all), and if they are legitimate at all, they have to be streamable. Of course, if you don't support sparse sequences and you get one anyway, that's a bug. Crashing is fine. :-)

I know that at least some readers have thought that sparse sequences are supported. So a definitive decision on that is needed.
****************************************************************
From: Robert A.
Duff Sent: Saturday, February 14, 2004 9:56 AM > (Aside: I don't think the name "Vector" is going to make it, given that > AI-296 has about 10 years dibs on that name. And I don't think we want two > different things with the same name in the standard...) I don't really agree. They are widely-separated enough that confusion can be avoided. We already have "dispatching", which means an indirect call when you're talking about tagged types, but means choosing which task to run when you're talking about tasks. "Pragma Controlled" and "Finalization.Controlled" are totally unrelated. A "stub" in the DS Annex has something to do with inter-process communication; a "stub" in the core language is a syntactic placeholder for a body. Probably more... So there's precedent for using confusing terminology when convenient. ;-) "Vector" is good because it matches what other languages call the thing, and it's short, unlike "Growable_Array" and the like. [snipped stuff I agree with] > I know that at least some readers have thought that sparse sequences are > supported. So a definitive decision on that is needed. Yes, this is another case where I think the programmer needs to know (via impl advice or whatever) what's going on under the hood. **************************************************************** From: Nick Roberts Sent: Saturday, February 14, 2004 3:51 PM Randy Brukardt wrote: >>Since an indefinite Key_Type is required for >>Containers.Maps.Strings, why not make that capability available to the >>users? > > We definitely expect that the strings container will use a purpose-built > data structure for storing strings, not some general indefinite item > capability. Ways to compactly and efficiently store sets of varying size > strings are well known and commonly used. > > Such algorithms could be extended to a general "unconstrained array of > elementary", but that hardly seems to be a worthwhile definition for keys. The key value of each element stored in a map (implemented as a hashed array) must also be stored. Since the Element_Type is definite, making the Key_Type definite as well makes it possible for the key values (as well as the element values) to be stored in a fixed array. This has the advantage of making the implementation simpler, but the disadvantage of not supporting indefinite key types (which I reckon would be useful in a significant minority of cases). Simplifying the implementation has two benefits: implementation costs are reduced and the risk of failure (bugs) reduced; executional efficiency (speed more than memory use in this situation) is likely to be increased. I understand Randy is arguing that executional efficiency should be considered of relatively low importance for these containers, and I agree. On the other hand, implementation simplification is, I suspect, going to be considered quite important by the ARG (and WG9?). I would, on balance, prefer an indefinite key type, but I've set out the reasons why a definite key type would be preferred, and I would guess these reasons would prevail. >>Another point: Containers.Vectors.Size should return Index_Type'Base, >>and the Size parameter in Resize should also be Index_Type'Base. It's >>confusing to have different types for Size and Index. >> >>There's also a problem if Natural'Last < Index_Type'Last; you >>can't have a vector that contains every index! > ... > So I don't see a great solution. 
I wondered about using "Hash_Type" here (it > has the correct properties), but that seems like a misuse of the type (and a > bad idea in a library that most Ada programmers will read - you want to show > them good style in standard libraries). My preferred solution would be to remove the Index_Type generic parameter altogether, and make the index type Standard.Positive. I believe this would have the advantage of simplifying the package from the user's point of view, it would solve at a stroke the problems mentioned above, and I believe that no-one in practice will ever need to use a different index type. **************************************************************** From: Robert A. Duff Sent: Sunday, February 15, 2004 11:57 AM I disagree. Using different index types for different kinds of arrays is a very useful way to catch bugs, even when all those index types are basically just 1..2**31-1. This is true for the normal built-in array types, and also for growable ones (Vectors). I have a growable-array generic in my current project that is instantiated dozens of times, and it has a "range <>" parameter for the index type. Some instantiations share the same index type, but most have their own, and I think that's a Good Thing. Furthermore, using Positive doesn't solve Randy's problem -- he's got a compiler where Positive'Last = 2**15-1, but the machine has a 32-bit address space, so you very well might want Vectors longer than Positive'Last. Furthermore, if the Index_Type is "range <>" (which I think it should be), then the Size can reasonably be of a subtype declared like this: subtype Size_Type is Index_Type'Base range 1..Index_Type'Base'Last; As I said before, allowing Index_Type to be modular or enumeration is not useful, and introduces anomalies. **************************************************************** From: Matthew Heaney Sent: Sunday, February 15, 2004 1:08 PM Bob Duff wrote: > Furthermore, if the Index_Type is "range <>" (which I think it should > be), then the Size can reasonably be of a subtype declared like this: > > subtype Size_Type is Index_Type'Base range 1..Index_Type'Base'Last; Bob you have my latest API for the vector container. What I did as a replacement for Natural is this: type Element_Count is 0 .. ; There's also a Positive_Element_Count subtype. I don't know if this is the way you want to go but at least it's a start. I like your idea above, too. One issue is that the T'Last of the size/length type (Size_Type'Last in your example) needs to be at least the value of Index_Type'Last - Index_Type'First + 1. I'm not sure your scheme above will work since Index_Type'Base might not have all those values. Consider using subtype Natural as the generic actual index type, which means you have one too many values to represent. There's always going to be some type that's too big. Suppose I instantiate the vector with Long_Long_Integer? In that case I don't have any integer type that can fit the number of values that are theoretically possible. I don't think there's any real issue for generic actual index types with a large range, since you're not going to put that many elements in the vector container anyway. The problem cases are when you use a type with a smaller range, e.g. type My_Index_Type is range -128 .. 127; The number of possible container elements is 256, but T'Base'Last might only be 127 (indeed that's all it's required to be). 
Of course we could require that users declare their type to have the required properties:

   Last  : constant := 127;
   First : constant := -128;
   N     : constant := Last - First + 1;

   type My_Index_Type_Base is range First .. N;
   type My_Index_Type is new My_Index_Type_Base range First .. Last;

But this is probably too subtle for typical language users. In the new reference implementation I sent you I use System.Max_Int to declare the Element_Count type, which means casual users would only have an issue for a generic actual index type such as Long_Long_Integer (whose use as an index type I would expect to be rare). > As I said before, allowing Index_Type to be modular or enumeration is > not useful, and introduces anomalies. The generic formal index type was also changed as you suggested, to use the stronger form "range <>" instead of the weaker form "(<>)". **************************************************************** From: Ehud Lamm Sent: Sunday, February 15, 2004 4:34 AM > Ehud Lamm wrote: > > But signature packages would work ok, wouldn't they? > > Signature packages violate the meta-rule about ease of > instantiation: as few > instantiations as possible to get a usable container. (That's one > instantiation, of course.) As far as I can tell, to use them like > interfaces, they'd have to be a parameter to the generic > container package. > > But perhaps you had something else in mind. > I agree with the rationale behind the meta-rule: simple things should be simple. The signatures will not be required in order to use the containers. They will only be required once you try to write code that should work across containers AND across libraries. Since this isn't going to be the most common scenario, this probably falls outside the 80/20 guideline, so I'll leave it at that. Personally, I like signature packages and interface-oriented programming, and I would have liked the library to encourage this style even more than it currently does. But what's now on the table is still a big step forward. **************************************************************** From: Jeffrey Carter Sent: Sunday, February 15, 2004 9:29 PM Randy Brukardt wrote: > Moreover, the component you're describing (a hash table without elements) > wouldn't have any place to *put* elements. So I don't see how you could even > use it to implement the map. (The hash table component you're suggesting > would return a Cursor object to represent each key, but that item isn't an > index that you could use in a sequence. So how would you associate a key > from the hash table with an element? A linear list would work, but would > essentially make the hash table useless.) Apparently I'm not making myself clear. Consider:

   generic -- Hash_Tables
      type Element is private;
      with function "=" (Left, Right : Element) return Boolean is <>;
      with function Hash (Item : Element) return Hash_Value is <>;
   package Hash_Tables is
      type Hash_Table is private;

      procedure Insert (Into : in out Hash_Table; Item : in Element);
      -- Inserts Item into Into. If Into contains an Element X such that
      -- Item = X, replaces X with Item.

      procedure Delete (From : in out Hash_Table; Item : in Element);
      -- If From contains an Element X such that Item = X, deletes X
      -- from From. Otherwise, has no effect.

      function Is_In (Item : Element; Table : Hash_Table) return Boolean;
      -- If Table contains an Element X such that Item = X, returns True;
      -- Otherwise, returns False.

      function Get (Item : Element; From : Hash_Table) return Element;
      -- If From contains an Element X such that Item = X, returns X.
      -- Otherwise, raises Constraint_Error.
   private -- Hash_Tables
      ...
   end Hash_Tables;

   generic -- Hashed_Maps
      type Key_Info is private;
      type Element is private;
      with function "=" (Left, Right : Key_Info) return Boolean is <>;
      with function Hash (Item : Key_Info) return Hash_Value is <>;
   package Hashed_Maps is
      type Hashed_Map is private;

      procedure Insert (Into : in out Hashed_Map; Key : in Key_Info; Item : in Element);
      -- Inserts Key/Item into Into. If Into contains a key X such that
      -- Key = X, replaces the Element associated with X with Item.

      procedure Delete (From : in out Hashed_Map; Key : in Key_Info);
      -- If From contains a key X such that Key = X, deletes X and the
      -- Element associated with it from From. Otherwise, has no effect.

      function Is_In (Key : Key_Info; Map : Hashed_Map) return Boolean;
      -- If Map contains a key X such that Key = X, returns True.
      -- Otherwise, returns False.

      function Get (Key : Key_Info; Map : Hashed_Map) return Element;
      -- If Map contains a key X such that Key = X, returns the Element
      -- associated with X. Otherwise, raises Constraint_Error.
   private -- Hashed_Maps
      type Hash_Node is record
         Key  : Key_Info;
         Item : Element;
      end record;

      function "=" (Left, Right : Hash_Node) return Boolean;
      -- Performs Left.Key = Right.Key.

      function Hash (Item : Hash_Node) return Hash_Value;
      -- Performs Hash (Item.Key).

      package Implementation is new Hash_Tables (Element => Hash_Node);

      type Hashed_Map is record
         Table : Implementation.Hash_Table;
      end record;
   end Hashed_Maps;

Insert, Delete, and Is_In should be obvious. Get would be implemented as

   Dummy : Hash_Node;
begin -- Get
   Dummy.Key := Key;
   return Implementation.Get (Dummy, Map.Table).Item;

Obviously a lot of functionality is missing from this simple example, but it clearly demonstrates how a hash table can be used to implement a map, while leaving the hash table available for those who are not storing key/value pairs. Yes, I know these won't compile :) **************************************************************** From: Randy Brukardt Sent: Monday, February 16, 2004 10:19 PM > Apparently I'm not making myself clear. Consider: Definitely. :-) ... > Obviously a lot of functionality is missing from this simple example, > but it clearly demonstrates how a hash table can be used to implement a > map, while leaving the hash table available for those who are not > storing key/value pairs. OK, what you're calling a Hash Table is what Matt called a Hashed Set. To me, a hash table is an index without any elements at all - it's used as part of the implementation of some larger component. In any case, as I said earlier, that implementation (which is very similar to Matt's) would be horrible on our compiler. You'd end up with 3 separate allocations per element, plus a bunch of call overhead. Other compilers mileage may vary (although I'd expect most would generate better code without the extra generic). So, you cannot assume that there is "no extra cost" here; it would be another entire component. It would, of course, be very similar to the "Sorted_Set" component, so it's hard to see that there is enough value to having a separate container for The Standard, but I'd expect it to appear in the secondary standard (along with List and Sorted_Map). **************************************************************** From: Nick Roberts Sent: Monday, February 16, 2004 5:47 PM Robert A Duff wrote: > Nick Roberts wrote: > >> My preferred solution would be to remove the Index_Type generic >> parameter altogether, and make the index type Standard.Positive.
I >> believe this would have the advantage of simplifying the package from >> the user's point of view, it would solve at a stroke the problems >> mentioned above, and I believe that no-one in practice will ever need >> to use a different index type. > > I disagree. Using different index types for different kinds of arrays > is a very useful way to catch bugs, even when all those index types are > basically just 1..2**31-1. This is true for the normal built-in array > types, and also for growable ones (Vectors). I think you are fundamentally wrong on this point, Bob. And I mean 'fundamentally', as I am looking at it from a very purist point of view (perhaps too purist, I'm not sure). I'll try to explain. I think arrays (in Ada and similar languages) are used for two fundamentally different purposes: (a) as a mapping, from the index subtype to the element subtype; (b) as a sequence of elements. What marks out the difference between (a) and (b) is that for a sequence, it is the order of the elements that is of primary importance. A good example of usage (a) is the array type Schedule in RM95 3.6 (28), which maps from Day to Boolean. A good example of usage (b) is a String. In usage (b), the index type is merely used to indicate the relative positions of the elements of the sequence, and it has long become common and programming (at least in Ada!) convention to call the first element number 1, the second number 2, and so on. In mathematics, the set N of natural (not Natural in the Ada sense!) numbers {1, 2, 3, ...} is almost always used for this purpose. In Ada, the subtype Positive is almost always used (it is used for String), and I think it makes logical sense to use the same subtype for this single purpose. I believe that, in practice, an extensible array will only ever have usage (b). Therefore, logically, I think the index type should always be Positive. I think this argument is reinforced by the tangle that using a generic Index_Type has obviously got you into. If you simply use Positive, the problems all go away. Isn't that a bit of a hint? > I have a growable-array generic in my current project that is > instantiated dozens of times, and it has a "range <>" parameter for the > index type. Some instantiations share the same index type, but most > have their own, and I think that's a Good Thing. Then ask yourself the question: how difficult would it be to remove the "range <>" parameter and use Positive instead throughout? I suspect you would find this quite easy to do, and that the result would be easier to read and understand. > Furthermore, using Positive doesn't solve Randy's problem -- he's got a > compiler where Positive'Last = 2**15-1, but the machine has a 32-bit > address space, so you very well might want Vectors longer than > Positive'Last. I doubt that very much (that you very well might want Vectors longer than Positive'Last). Presumably this decision was made having being satisfied that users would not want any String to be longer than 2**15-1 characters. Surely it would be silly to expect users to be happy with this constraint on strings, but rebel against it applying to extensible arrays? Surely, if users of this implementation really required bigger extensible arrays, they would almost certainly also demand bigger strings, in which case the right solution would be to make Integer 32-bit based? 
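[Editor's note: for readers skimming, the usage (a) / usage (b) distinction Nick draws above fits in two declarations; Schedule is the RM 3.6 example he cites, while Line and Line_Index are invented names.]

   type Day is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
   type Schedule is array (Day) of Boolean;
   --  usage (a): a mapping from Day to Boolean; the index values mean
   --  something in themselves

   type Line_Index is range 1 .. 2_000;
   type Line is array (Line_Index range <>) of Character;
   --  usage (b): a sequence, like String; the index only records the
   --  relative position of an element, and by convention starts at 1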
> Furthermore, if the Index_Type is "range <>" (which I think it should > be), then the Size can reasonably be of a subtype declared like this: > > subtype Size_Type is Index_Type'Base range 1..Index_Type'Base'Last; This might be considered a reasonable solution, but it could go wrong. If Index_Type'First < 1, it might be possible for an extensible array to reach a length greater than Index_Type'Base'Last. [I think the term 'length' is more appropriate than 'size'.] This solution imposes another subtype (or maybe type) upon the user; one for each instantiation of the extensible array package, in effect. A user would be annoyed when, for example, trying to compare the length of one extensible array to that of another (with a different Index_Type'Base), to find the compiler complaining:

   type Apple_Count  is range 0..100;    -- maximum of 100 apples
   type Orange_Count is range 0..20000;  -- maximum of 20000 oranges

   subtype Apple_Index  is Apple_Count  range 1..Apple_Count'Last;
   subtype Orange_Index is Orange_Count range 1..Orange_Count'Last;

   package Apple_Baskets  is new Ada.Containers.Vectors(Apple_Index,Apple);
   package Orange_Baskets is new Ada.Containers.Vectors(Orange_Index,Orange);

   Apple_Basket:  Apple_Baskets.Vector_Type;
   Orange_Basket: Orange_Baskets.Vector_Type;
   ...
   if Size(Apple_Basket) < Size(Orange_Basket) then

This comparison might not work on some implementations. Worse, it might work on other implementations, and the user could be pretty mystified as to why.

   if Size(Apple_Basket) < Apple_Baskets.Size_Type(Size(Orange_Basket)) then

seems ugly, and could raise Constraint_Error, and

   if Natural(Size(Apple_Basket)) < Natural(Size(Orange_Basket)) then

seems to defeat the purpose (of not simply using Positive as the index type), and could also raise Constraint_Error (with Randy's compiler, for example). I think my way is simpler and better: instantiations of the package do not require an Index_Type; there is no need for a separate size/length (sub)type. It is easier to understand and there is less to go wrong. As an aside, I would reiterate that I think the name 'Size' for the ordinality function is confusing, and ought to be 'Length', to accord with the meaning of the Length attribute. > As I said before, allowing Index_Type to be modular or enumeration is > not useful, and introduces anomalies. And I think replacing Index_Type with Positive would reduce the anomalies still further. **************************************************************** From: Randy Brukardt Sent: Monday, February 16, 2004 10:56 PM > I believe that, in practice, an extensible array will only ever have usage > (b). Therefore, logically, I think the index type should always > be Positive. That's only true if we're not supporting sparse sequences. (And perhaps not even then.) I disagree with Bob and Matt that modular indexes aren't useful, and can even imagine uses for enumeration index types (although that would be rare enough not to worry about). > I think this argument is reinforced by the tangle that using a generic > Index_Type has obviously got you into. If you simply use Positive, the > problems all go away. Isn't that a bit of a hint? Yeah, and if we got rid of the generic and just made the elements void * we'd have less problems still. :-) Seriously, Ada is about strong typing, and you're suggesting to deny the programmer the power of strong typing in this package. That's a non-starter in my view. ...
> > Furthermore, using Positive doesn't solve Randy's problem -- he's got a > > compiler where Positive'Last = 2**15-1, but the machine has a 32-bit > > address space, so you very well might want Vectors longer than > > Positive'Last. > > I doubt that very much (that you very well might want Vectors longer than > Positive'Last). Presumably this decision was made having being satisfied > that users would not want any String to be longer than 2**15-1 characters. That's a complete fallacy. The reason this decision was made (in 1987!) was that we wanted to be able to migrate users from our 16-bit MS-DOS compilers to our 32-bit compilers with as little incompatibility as possible. The intent was that if a program was recompiled on a 32-bit compiler, it would run and work, including being able to read and write files in the same format. > Surely it would be silly to expect users to be happy with this constraint > on strings, but rebel against it applying to extensible arrays? Surely, if > users of this implementation really required bigger extensible arrays, they > would almost certainly also demand bigger strings, in which case the right > solution would be to make Integer 32-bit based? If someone wants a 32-bit string, all they have to do is write:

   type Long_Natural is range 0 .. 2**31-1;
   subtype Long_Positive is Long_Natural range 1 .. Long_Natural'Last;
   type Long_String is array (Long_Positive range <>) of Character;

which works fine (except for the language-defined packages). Moreover, this will work on essentially any Ada compiler (including our 16-bit MS-DOS compilers) without any dependence on the definitions of predefined types. OTOH, making Integer 32-bit would use more data memory (potentially a lot more), and could make existing files unreadable. The amount of pain for a programmer to change from 16-bit Integer to 32-bit Integer depends on the code of course, but it can be worse than moving to another compiler altogether. We don't want to be encouraging our customers to move to another vendor! The only real option would be to have a compiler switch of some sort to select which is used, but that would require lots of work in the compiler - everything assumes a single definition for Standard. (Yes, we've studied it seriously, as the choice of 16-bit for Integer is a significant portability issue - far too many people assume the range of that type, where if they really care about the range, they should declare their own type.) There are many other things of more value to our customers at this time. No Ada program should depend on predefined elementary types. Period. Unfortunately, type String drags in Natural, leaving no real chance to enforce a decent Ada style (you can't easily tell when a use of Natural is for indexing String, or when it is being abused). That's a bug in the Ada design, but one we're going to have to live with. > This solution imposes another subtype (or maybe type) upon the user; one > for each instantiation of the extensible array package, in effect. A user > would be annoyed when, for example, trying to compare the length of one > extensible array to that of another (with a different Index_Type'Base), to > find the compiler complaining: I agree, but not with your solution. Clearly, there should be a Size_Type next to Hash_Type in Ada.Containers. If you actually need to do math on it (which should be very rare), you'd need a "use type Ada.Containers.Size_Type;", but with any decent style, you'll need that no matter what the type is or where it is declared.
You don't want it in the generic unit (for the reasons you stated), Natural is clearly bad (use predefined scalar types only for String in new code - we want to show readers of the standard good style), so a type is needed somewhere fairly high up in the hierarchy. **************************************************************** From: Matthew Heaney Sent: Monday, February 16, 2004 11:54 PM I got rid of the subtype Natural in the container packages, per Randy's request. I modified the proposal and the reference implementation so that each generic package declares its own modular Element_Count type. In the case of the map it just derives from Hash_Type; in the vector and set it's its own declaration. My issue with Randy's solution is that the operators for the size type aren't visible where the instantiation is visible, so you have to with Ada.Containers specially. (But is that really true? I still have to check that.) By declaring the type right in the generic package, the user has immediate access to the size type. Perhaps it's not such a big deal to have to make a special with of Ada.Containers. I don't really know. One advantage of Randy's solution is that the packages can share the size type. So for example you can pass the result of the Length function of one container to the Resize operation for some other container, and no type conversion is necessary. On the other hand, doing that across different container instantiations might be rare. So where the size type lives, what its name is, etc, is still very tentative. The next release will merely show one way to do it. **************************************************************** From: Robert A. Duff Sent: Tuesday, February 17, 2004 8:42 AM ... > In usage (b), the index type is merely used to indicate the relative > positions of the elements of the sequence, and it has long become common > and programming (at least in Ada!) convention to call the first element > number 1, the second number 2, and so on. In mathematics, the set N of > natural (not Natural in the Ada sense!) numbers {1, 2, 3, ...} is almost > always used for this purpose. In Ada, the subtype Positive is almost always > used (it is used for String), and I think it makes logical sense to use the > same subtype for this single purpose. Positive is rarely used in well-written Ada code, except when using String. It was a language-design mistake to use Positive for String; there should have been a separate String_Index type. It was also a language design mistake to put non-standard stuff like Integer and Long_Integer in Standard. > I believe that, in practice, an extensible array will only ever have usage > (b). Therefore, logically, I think the index type should always be Positive. I agree with the above philosophy (mappings vs sequences). However, it does not follow that sequences should always be indexed by Positive. It should usually be indexed by a type whose range is 1... There are good reasons why the programmer might want different upper bounds. There are also some cases where 0.. makes more sense for a sequence. Therefore, we should leave this choice to the programmer. Furthermore, it is important to allow the programmer to use different index types for unrelated sequences, in order to prevent bugs. For the same reason, when I declare a sequence-like array type, I usually declare a new index type for it. If two array types are related so that I want to say things like: for I in ... loop ... A(I) ... ... B(I) ... then I use the same index type for both. 
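[Editor's note: a sketch of the bug-catching Bob describes, using the proposal's Vectors generic as it is instantiated elsewhere in this thread; Employee_Info, Room_Info and the two index types are invented for the example.]

   type Employee_Index is range 1 .. 10_000;
   type Room_Index     is range 1 .. 10_000;

   package Employee_Vectors is new Ada.Containers.Vectors (Employee_Index, Employee_Info);
   package Room_Vectors     is new Ada.Containers.Vectors (Room_Index, Room_Info);

   Employees : Employee_Vectors.Vector_Type;
   Rooms     : Room_Vectors.Vector_Type;
   E         : Employee_Index := 42;

   --  Employee_Vectors.Element (Employees, E) is fine, but
   --  Room_Vectors.Element (Rooms, E) is rejected at compile time:
   --  it needs a Room_Index, even though both index types are
   --  "basically just" 1 .. 10_000.

If the two sequences really are related, the programmer instantiates both packages with the same index type and cross-indexing is allowed again -- which is exactly the choice Bob wants left to the programmer.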
> I think this argument is reinforced by the tangle that using a generic > Index_Type has obviously got you into. If you simply use Positive, the > problems all go away. Isn't that a bit of a hint? My proposal has no "tangles" that I can see. All the tangles are caused by using modular or enumeration types for the index, which I don't recommend. > > I have a growable-array generic in my current project that is > > instantiated dozens of times, and it has a "range <>" parameter for the > > index type. Some instantiations share the same index type, but most > > have their own, and I think that's a Good Thing. > > Then ask yourself the question: how difficult would it be to remove the > "range <>" parameter and use Positive instead throughout? I suspect you > would find this quite easy to do, and that the result would be easier to > read and understand. It would of course be trivial to remove that capability, but that's not the issue. It would damage the type checking, so I wouldn't do that. By the way, my growable arrays generic says: pragma Assert(Index_Type'First = 1); I did run into one case where that was inconvenient, and I wanted sequences starting at 100_000_000, 200_000_000, etc. I decided not to remove that assertion, though. > > Furthermore, using Positive doesn't solve Randy's problem -- he's got a > > compiler where Positive'Last = 2**15-1, but the machine has a 32-bit > > address space, so you very well might want Vectors longer than > > Positive'Last. > > I doubt that very much (that you very well might want Vectors longer than > Positive'Last). Presumably this decision was made having being satisfied > that users would not want any String to be longer than 2**15-1 characters. > > Surely it would be silly to expect users to be happy with this constraint > on strings, but rebel against it applying to extensible arrays? Surely, if > users of this implementation really required bigger extensible arrays, they > would almost certainly also demand bigger strings, in which case the right > solution would be to make Integer 32-bit based? Well, the machine in question is a 32-bit machine, so Integer really *should* be 32 bits. But Randy chose 16 bits for compatibility reasons, which makes perfect sense. Perhaps if Randy's customers had followed good coding practise, he wouldn't have been forced into that decision. ;-) > > Furthermore, if the Index_Type is "range <>" (which I think it should > > be), then the Size can reasonably be of a subtype declared like this: > > > > subtype Size_Type is Index_Type'Base range 1..Index_Type'Base'Last; > > This might be considered a reasonable solution, but it could go wrong. If > Index_Type'First < 1, it might be possible for an extensible array to reach > a length greater than Index_Type'Base'Last. So don't do that. You and I already agreed that Index_Type'First = 1, usually. Even if it's 0, you can't create a Vector that big, presuming the upper bound is 2**31-1 on a 32-bit machine. >... [I think the term 'length' is > more appropriate than 'size'.] I agree that size is not ideal. But we're not talking about the *current* length, we're talking about the maximum length we can grow to without doing more allocation. How about Buffer_Length, which appropriately indicates that we're talking about the internal buffer. > This solution imposes another subtype (or maybe type) upon the user; one ^^^^^^^^^^^^^ I said subtype, not type. We're measuring number of components, here, not bytes. 
So it makes perfect sense to use the same type for indexing as for this size measurement (but obviously a different subtype). ... > ... > > if Size(Apple_Basket) < Size(Orange_Basket) then > > This comparison might not work on some implementations. Worse, it might > work on other implementations, and the user could be pretty mystified as to > why. Heh? First of all, given my proposal, the above comparison would be illegal on *all* implementations. That's what I want -- if Apple_Baskets and Orange_Baskets are unrelated, then I *want* that comparison to be illegal. On the other hand, if the two abstractions are related in such a way that indexes into one make sense for the other, then the programmer should say so -- use the same index type for both instantiations. This should be the programmer's choice. **************************************************************** From: Robert A. Duff Sent: Tuesday, February 17, 2004 8:38 AM > No Ada program should depend on predefined elementary types. Period. So you don't use Boolean in your programs? Maybe it's "(False, Maybe, True)" on some implementations? ;-) Sorry, I couldn't resist -- I of course know what you meant. **************************************************************** From: Robert A. Duff Sent: Tuesday, February 17, 2004 8:53 AM > I got rid of the subtype Natural in the container packages, per Randy's > request. Maybe you should wait for the whole ARG to come to a decision before you make further changes in this area. > I modified the proposal and the reference implementation so that each > generic package declares its own modular Element_Count type. In the case > of the map it just derives from Hash_Type; in the vector and set it's its > own declaration. In the map and set, it should probably be a *signed* type: "type Element_Count range 0..implementation-defined". It's got nothing to do with Hash_Type. For Vector, it is related to the Index_Type, and should therefore be a subtype of the same type: subtype Element_Count is Index_Type'Base range 0..Index_Type'Base'Last; You might, for example, want to set the size to twice the current length of the vector. Both types are in the same "units", as it were -- number of components, so they should be the same type. (The above assumes that you agree with me that Index_Type should be "range <>"; I know Randy, and perhaps others, don't agree with that.) Furthermore, whether two different vectors should have the same Index_Type and Element_Count type should be the programmer's choice. Note that sets/maps are different from vectors -- in the former case, the implementation controls the maximum size (it's related to available memory), whereas in the vector case, the programmer controls the max size by choosing the value of Index_Type'Last. > My issue with Randy's solution is that the operators for the size type > aren't visible where the instantiation is visible, so you have to with > Ada.Containers specially. (But is that really true? I still have to > check that.) You don't need an extra with_clause, but you would need an extra use_clause. I agree that's slightly annoying. >...By declaring the type right in the generic package, the user > has immediate access to the size type. But by making it a subtype of the type of Index_Type, all the operators will be visible wherever the instance is visible. 
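[Editor's note: a sketch of the subtype Bob is proposing, with invented scaffolding around it; only the Element_Count declaration and its use are the point.]

   generic
      type Index_Type is range <>;
      type Element_Type is private;
   package Vectors_Sketch is
      --  the count/size type piggybacks on the index type's base type
      subtype Element_Count is Index_Type'Base range 0 .. Index_Type'Base'Last;

      type Vector_Type is private;
      function Length (Vector : Vector_Type) return Element_Count;
      procedure Resize (Vector : in out Vector_Type; Size : in Element_Count);
   private
      type Vector_Type is record      --  not a real representation; just
         Last : Element_Count := 0;   --  enough to make the sketch legal
      end record;
   end Vectors_Sketch;

Because Element_Count is just a subtype of the actual index type's base, its "+", "*", and "<" are the operators declared with that index type, so they are already directly visible at the place where the user declared the type and the instantiation; Bob's "twice the current length" example needs no extra use clause:

   Resize (V, Size => 2 * Length (V));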
**************************************************************** From: Matthew Heaney Sent: Tuesday, February 17, 2004 9:39 AM > Maybe you should wait for the whole ARG to come to a decision before you > make further changes in this area. OK. Randy wanted an errata list early this week, and I wasn't sure whether I was responsible for coming up with the version that didn't use the Natural subtype. It sounds like you guys already have some other ideas. > In the map and set, it should probably be a *signed* type: "type > Element_Count range 0..implementation-defined". It's got nothing to do > with Hash_Type. OK. That's the kind of feedback I was looking for. I also wasn't sure whether you wanted signed or unsigned types as the size/count/length type. I guess I assumed you'd want unsigned, since that gives you a bigger range. ... > (The above assumes that you agree with me that Index_Type should be > "range <>"; I know Randy, and perhaps others, don't agree with that.) My tentative conclusion was to do as you suggested, and restrict the vector to use only integer index types. However, it appears that there is still debate among the subcommittee, so I guess it's still an open issue. The only problem with your scheme above is that Index_Type'Base doesn't necessarily include all the values you need. For example: type Index_Type is -10 .. 5; Index_Type'Base'Last might only be 5, but we need it to be at least 16. However, since this is supposed to be an expandable array, then maybe the index type above doesn't make any sense. Note that I'm not married to the name Element_Count; it was just an idea. I was using the container analog of type Storage_Count as the model. The name Size_Type might be better, which is the closer to the style of name Hash_Type, and to the style of the actual container names. > Furthermore, whether two different vectors should have the same > Index_Type and Element_Count type should be the programmer's choice. > > Note that sets/maps are different from vectors -- in the former case, > the implementation controls the maximum size (it's related to available > memory), whereas in the vector case, the programmer controls the max > size by choosing the value of Index_Type'Last. OK. I was assuming the model was the same for all containers (max elements is controlled by available memory). >>My issue with Randy's solution is that the operators for the size type >>aren't visible where the instantiation is visible, so you have to with >>Ada.Containers specially. (But is that really true? I still have to >>check that.) > > You don't need an extra with_clause, but you would need an extra > use_clause. I agree that's slightly annoying. I wasn't sure about that. I was thinking that in order to say "use type Ada.Containers.Size_Type", you had to with Ada.Containers too. But it sounds like I was wrong. > But by making it a subtype of the type of Index_Type, all the operators > will be visible wherever the instance is visible. Yes. I like using Index_Type'Base, but wasn't sure whether we would run into snags wrt the base range of the type being large enough. It sounds like that's not really an issue. **************************************************************** From: Robert A. Duff Sent: Tuesday, February 17, 2004 10:20 AM > I also wasn't sure whether you wanted signed or unsigned types as the > size/count/length type. I guess I assumed you'd want unsigned, since > that gives you a bigger range. This is why I hate modular types. 
One is tempted to use them when wraparound arithmetic is inappropriate, just to get one extra bit. (IMHO, "type T is range 1..2**32-1;" should be legal on all implementations -- for that matter, so should "range 1..10**100". But I realize that's a pretty radical notion!) Anyway, in this case, the extra bit probably isn't necessary. You can't create a vector of 2 billion integers on a 32-bit machine -- you'll run out of address space first. Even if the component type is Character, you're unlikely to want to do that. I believe many operating systems steal half the address space for their own use, so no single process can use more than 2 billion bytes anyway. On a 64-bit machine, a vector of 2**62 components is unthinkable anytime soon. As I said, "1.." will be the most common index range, in which case 'Length can't be more than 'Last.. If that's not enough, buy a compiler that supports bigger signed integers. I want overflow/constraint checking on that type. So I suggest signed integer rather than modular. > The only problem with your scheme above is that Index_Type'Base doesn't > necessarily include all the values you need. For example: > > type Index_Type is -10 .. 5; > > Index_Type'Base'Last might only be 5, but we need it to be at least 16. Yes, it is possible to shoot yourself in the foot. So don't do that. ;-) This is already an issue in Ada -- the programmer must take care to make sure base ranges are wide enough. Nothing new here. > However, since this is supposed to be an expandable array, then maybe > the index type above doesn't make any sense. It would be rare, I'd say. ... > OK. I was assuming the model was the same for all containers (max > elements is controlled by available memory). Well, I suppose it *usually* will be -- the programmer will use an Index_Type that goes up to the roughly size of the address space. But the programmer can choose a smaller Index_Type, and there are sometimes good reasons to do so. ... > I wasn't sure about that. I was thinking that in order to say "use type > Ada.Containers.Size_Type", you had to with Ada.Containers too. But it > sounds like I was wrong. If you say "with A.B.C;", it causes all of A, A.B, and A.B.C to be visible. Look at the definition of "mentioned in a with_clause". This is because compilers might have trouble dealing with holes in the visibility -- cases where something is in scope, but the thing it's declared inside of is not. Use clauses don't work like that. **************************************************************** From: Matthew Heaney Sent: Tuesday, February 17, 2004 10:56 AM ... > If you say "with A.B.C;", it causes all of A, A.B, and A.B.C to be > visible. Look at the definition of "mentioned in a with_clause". > This is because compilers might have trouble dealing with holes in > the visibility -- cases where something is in scope, but the thing it's > declared inside of is not. > > Use clauses don't work like that. I guess I'm still confused. I just tried this: with Character_Vectors; use Character_Vectors; procedure Test is use type Ada.Containers.Hash_Type; begin null; end Test; but GNAT is telling me that I'm "missing with for Ada.Containers" I put a subtype declaration in the vectors package, like this: subtype Hash_Type is Containers.Hash_Type; and then I could say: use type Character_Vectors.Hash_Type; But that's different from what Randy said to Nick: >I agree, but not with your solution. Clearly, there should >be a Size_Type next to Hash_Type in Ada.Containers. 
If you >actually need to do math on it (which should be very rare), >you'd need a "use type Ada.Containers.Size_Type;", but with >any decent style, you'll need that no matter what the type >is or where it is declared. I didn't know how to get "use type Ada.Containers.Size_Type;" to work without also with'ing Ada.Containers. But perhaps Randy meant something else? I'm not sure. If you want to declare type Size_Type is range 0 .. ; in Ada.Containers, I assumed you'd have to also declare a Size_Subtype in Ada.Containers.Sorted_Sets and Ada.Containers.Maps, like this: subtype Size_Subtype is Size_Type; and then the user would have to say: with Instantiation; use type Instantiation.Size_Subtype; But that's different from saying "use type Ada.Containers.Size_Type;". **************************************************************** From: Jeffrey Carter Sent: Tuesday, February 17, 2004 11:35 AM > OK, what you're calling a Hash Table is what Matt called a Hashed Set. To > me, a hash table is an index without any elements at all - it's used as part > of the implementation of some larger component. We've already established that what Matt calls a "set" isn't. I'm afraid you're not making yourself clear now. With rare exceptions, hash functions can produce the same hash value for different elements. This results in "collisions". Therefore, hash tables store the elements so a lookup can determine if a specific element is actually in the table, or just hashes to the same value as another element. Since an element can contain information not used in calculating the hash or for "=", it seems that a hash table has to have an interface something like the one I presented. In other words, without seeing something more specific (like a spec), I can't tell how your idea of a hash table would work. > In any case, as I said earlier, that implementation (which is very similar > to Matt's) would be horrible on our compiler. You'd end up with 3 separate > allocations per element, plus a bunch of call overhead. Other compilers > mileage may vary (although I'd expect most would generate better code > without the extra generic). The solution is simple: don't use your compiler :) For most applications that will be willing to use a standard component, I doubt the performance will be unacceptable on any compiler. > So, you cannot assume that there is "no extra cost" here; it would be > another entire component. It would, of course, be very similar to the > "Sorted_Set" component, so it's hard to see that there is enough value to > having a separate container for The Standard, but I'd expect it to appear in > the secondary standard (along with List and Sorted_Map). The component would have to be specified, of course. I'm sure Matt or I would be able and willing to do that, and it wouldn't take very long. There is no extra implementation cost. Implementors are going to have to implement a hash table in order to implement hashed maps anyway. Let's be good software engineers and allow the reuse of that effort. **************************************************************** From: Randy Brukardt Sent: Wednesday, February 18, 2004 5:12 PM > I'm afraid you're not making yourself clear now. With rare exceptions, > hash functions can produce the same hash value for different elements. > This results in "collisions". Of course. But to me, a hash table is just a table (array); collision handling is not part of it. It's a necessary part of a component, of course, which is why it's impossible to have a hash table component. 
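[Editor's note: a sketch, reusing the Element and Hash_Value names from Jeff's earlier spec, of the two things being called a "hash table" in this exchange; Node, Node_Access and Bucket_Array are invented.]

   type Node;
   type Node_Access is access Node;

   type Node is record
      Item : Element;       --  the element itself is stored, so that two
      Next : Node_Access;   --  elements with equal hash values (a
   end record;              --  "collision") can still be told apart by "="

   --  the bare table Randy has in mind: an array of buckets indexed by
   --  (reduced) hash values, with no elements of its own
   type Bucket_Array is array (Hash_Value range <>) of Node_Access;

Jeff's component is the bucket array plus the nodes, the Hash and "=" formals, and the collision handling around them; Randy's "hash table" is essentially just Bucket_Array.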
But arguing over terminology is pointless. You're arguing in favor of Matt's Hashed_Set (even if you don't want to call it that). It's better to stick to a common set of terminology, even if you don't like it. ... > > In any case, as I said earlier, that implementation (which is very similar > > to Matt's) would be horrible on our compiler. You'd end up with 3 separate > > allocations per element, plus a bunch of call overhead. Other compilers > > mileage may vary (although I'd expect most would generate better code > > without the extra generic). > > The solution is simple: don't use your compiler :) Them's fighting words, even with the smiley. Being intolerant of the diversity of Ada implementations (and uses) is a good way to get yourself tuned out of ARG deliberations. > For most applications that will be willing to use a standard component, > I doubt the performance will be unacceptable on any compiler. Which of course is exactly the argument I've been making all along. Of course, then the Sorted_Set and the Vector are also good enough -- which is quite contrary to your position. ... > The component would have to be specified, of course. I'm sure Matt or I > would be able and willing to do that, and it wouldn't take very long. > There is no extra implementation cost. Implementors are going to have to > implement a hash table in order to implement hashed maps anyway. Let's > be good software engineers and allow the reuse of that effort. I've already said multiple times that there would be a significant extra implementation cost. Even though some of the implementation could be reused, there would still be a lot of unique work. In any case, repeating a falsehood doesn't make it true. But imagine for a moment that you're right, and there is not a line of extra code that needs to be written. You're still doubling the documentation, debugging, and testing costs for implementers. Clearly, this component will need a unique set of tests, and while there is a bit of sharing available, most of it will need to be different. And even if there are no bugs in the implementation at all, you still have to do the testing. So the cost will be a lot more than zero. **************************************************************** From: Stephen Leake Sent: Tuesday, February 17, 2004 12:43 PM ... > Because you don't always have a value to assign immediately. What you > want to do is make space in the vector for all the items, and then do > the assignment. For example, suppose you want to copy a list into a > vector: ... > If you don't do it this way, then your time complexity is O(n*m) > instead of O(n+m). Ok. I actually ran across a similar situation in Real Code this weekend :). If you were doing Insert (at end) rather than Insert (in the middle), your time complexity would be O(m), right? (n is the size of the vector, m is the size of the list). In general, Insert (in the middle) is an O(n) operation. So Insert_N (in the middle, no elements) is an optimization to work around that in some common cases. I think if you are really doing code like this, and you want the optimization, you should make the Vector Item_Type be an access type, and manage the memory yourself. Optimized code is always harder to write. So I'm affirming that deleting the itemless insertion from the indefinite map is ok. **************************************************************** From: Matthew Heaney Sent: Tuesday, February 17, 2004 1:10 PM > Ok. I actually ran across a similar situation in Real Code this > weekend :). 
This happens all the time: you know in advance how many items you want to insert, so you tell the vector, allowing it to preallocate, and then you do the insert. > If you were doing Insert (at end) rather than Insert (in the middle), > your time complexity would be O(m), right? (n is the size of the > vector, m is the size of the list). Yes, that's correct. The n part reduces to 0, because you're not sliding elements already in the vector container. > In general, Insert (in the middle) is an O(n) operation. So Insert_N > (in the middle, no elements) is an optimization to work around that in > some common cases. Yes. It is specifically designed for inserting in the middle of a vector. In the case of the STL, what happens is that you specify an iterator pair designating the half-open range of the source container. The vector probably computes the distance() first, then does the internal expansion, and then walks the source range constructing each new vector element in place. For a std::vector, the distance() function is specialized so that it computes the distance in constant time (because vector iterators are random access iterators, and therefore distance() can be implemented for a vector by simple subtraction). We can't get this sophisticated in Ada, but we can be almost as efficient. Instead of the vector itself calling distance(), it's the vector user who computes the distance (by whatever method makes sense), and then calls Insert_N to do the preallocation. So in this particular case (inserting multiple elements in the middle of a vector), in Ada the complete insertion operation actually comprises two separate calls. > I think if you are really doing code like this, and you want the > optimization, you should make the Vector Item_Type be an access type, > and manage the memory yourself. Optimized code is always harder to > write. No. Doesn't that argument undermine the case for indefinite forms? The Insert_N operation provides important and useful functionality, just like Resize does. There's nothing special about indefinite vectors, and the same techniques for optimized insertions apply as for the definite form. > So I'm affirming that deleting the itemless insertion from the > indefinite map is ok. I think they need to stay. If nothing else the definite and indefinite forms require a more or less identical interface. **************************************************************** From: Alexandre E. Kopilovitch Sent: Tuesday, February 17, 2004 2:25 PM This is a return to the topic of interfaces in conjunction with the Container Library, to the starting point of the recent brief discussion - now I'm taking another branch of argumentation, which addresses the topic in the most direct way. ... > One way to get around that would be to put the interfaces into the generic > units. But then, the interfaces would only be usable with that container -- > hardly a useful interface! You might as well just use the container > directly. I'm not 100% sure exactly what you see as a problem with generic interfaces in the Container Library, but guessing that you mean massive duplication of declarations of operations, I came up with an idea of how to overcome this problem with generics and make the use of interfaces in the library rather smooth.
Let's introduce a new form of interface declaration:

   type IT is interface of T;  -- where T is a type, possibly a generic one

This will mean that IT is an interface consisting of declarations of all the public primitive operations of T, in which all occurrences of the type T are replaced by the interface IT. The type T automatically implements IT. If T in the above declaration is a generic type, then IT is a generic interface. In that case an instantiation (perhaps partial) may be made inside the declaration, if needed:

   type IT is interface of T;  -- for generic T

I think that this form of interface declaration will solve the problem mentioned above. [Also, this form may be extended even further - by not requiring T to be a tagged type (but the interface type IT will still be tagged) - with the same definition, that is, the interface IT consists of all the primitive operations (which are all public in this case) of T. But this probably isn't directly related to the Container Library.] **************************************************************** From: Robert A. Duff Sent: Tuesday, February 17, 2004 6:03 PM ... > > If you say "with A.B.C;", it causes all of A, A.B, and A.B.C to be > > visible. Look at the definition of "mentioned in a with_clause". > > This is because compilers might have trouble dealing with holes in > > the visibility -- cases where something is in scope, but the thing it's > > declared inside of is not. > > > > Use clauses don't work like that. > > I guess I'm still confused. I don't think you're confused. I think I wrote something confusing above. Sorry about that. >... I just tried this: > > with Character_Vectors; use Character_Vectors; > > procedure Test is > use type Ada.Containers.Hash_Type; > begin > null; > end Test; > > but GNAT is telling me that I'm > > "missing with for Ada.Containers" Correct. If you want to refer to Ada.Containers.Hash_Type, you need to say "with Ada.Containers;". I was assuming you would have said "with Ada.Containers.Something;" already, but that's not necessarily true. I should probably admonish you to use the RM as the definition of the language, rather than what one compiler happens to do. ;-) Chapters 8 and 10 explain all this -- but chapter 8 is pretty tough going. > I put a subtype declaration in the vectors package, like this: > > subtype Hash_Type is Containers.Hash_Type; > > and then I could say: > > use type Character_Vectors.Hash_Type; Yes, that could work. However, that will make use-package clauses less useful, because if you say "use Character_Vectors, Integer_Vectors;", then the two Hash_Type's will conflict, and cancel each other out. > But that's different from what Randy said to Nick: > > >I agree, but not with your solution. Clearly, there should > >be a Size_Type next to Hash_Type in Ada.Containers. If you > >actually need to do math on it (which should be very rare), > >you'd need a "use type Ada.Containers.Size_Type;", but with > >any decent style, you'll need that no matter what the type > >is or where it is declared. > > I didn't know how to get "use type Ada.Containers.Size_Type;" to work > without also with'ing Ada.Containers. You're right. >... But perhaps Randy meant something > else? I'm not sure. > > If you want to declare > > type Size_Type is range 0 ..
; > > in Ada.Containers, I assumed you'd have to also declare a Size_Subtype > in Ada.Containers.Sorted_Sets and Ada.Containers.Maps, like this: > > subtype Size_Subtype is Size_Type; > > and then the user would have to say: > > with Instantiation; use type Instantiation.Size_Subtype; > > But that's different from saying "use type Ada.Containers.Size_Type;". You're right. I suggest that if Size_Type is declared in Containers, let the programmer write "with Ada.Containers; use type Ada.Containers.Size_Type;". Declaring Size_Subtype causes the "cancelling out" problem I mentioned above. But I don't feel strongly about this. I do think my suggestion for Vectors solves the problems better -- but not for sets/maps (unless you pass in the Size_Type as a generic formal to those). During the Ada 9X project, we considered a rule that if there are 17 potentially directly visible things called X, and they're all essentially renamings of the same thing, then the compiler picks one at random. But the rules would be pretty tricky, and the idea got dropped. **************************************************************** From: Nick Roberts Sent: Wednesday, February 18, 2004 12:23 PM Apologies for this not being in response to anything anyone has specifically said, but the containers topic has generated such a spout of messages, it's difficult! I would repeat (I'm sure I've said it before many times) that the container packages /do not need/ indefinite forms, now or in the future. The reason is simple: (a) if you want to contain an indefinite type, and you want to abstract away such low-level mechanics as memory management (quite rightly), all you do is write a package that exports a definite private type, with the required operations and other accoutrements (constants, support types and subtypes), and encapsulates the underlying indefinite type inside that definite type (almost certainly by using dynamic allocation); (b) to support class-wide types or any indefinite types whose objects are not dynamically allocated (so that memory management is not an issue), you can contain an access type that designates them. For strings, Ada.Strings.Unbounded is a perfect example of (a). You can use definite containers on unbounded strings without problems. End of story, and hopefully end of argument. Randy suggested a semi-global Size_Type declared in Ada.Containers. Bob D reckoned this was good for maps and sets, but not vectors. I still disagree with Bob about the vector package having its own Index_Type generic parameter. I think that the practical advantages of having a pre-supplied universal index type would greatly outweigh the advantages of the way it currently is. Furthermore, I think Randy's idea has the merit of echoing the approach taken by the existing *_IO packages. Why don't we have something like this: type Count is range 0 ..
[imp def]; subtype Positive_Count is Count range 1..Count'Last; declared in Ada.Containers, and then: generic type Element_Type is private; with function "=" (Left, Right : Element_Type) return Boolean is <>; package Ada.Containers.Vectors is pragma Preelaborate; type Vector_Type is private; function "=" (Left, Right : Vector_Type) return Boolean; function Max_Length (Vector : Vector_Type) return Count; -- was Length function Is_Empty (Vector : Vector_Type) return Boolean; procedure Clear (Vector : in out Vector_Type); procedure Swap (Left, Right : in out Vector_Type); procedure Append (Vector : in out Vector_Type; New_Item : in Element_Type); procedure Insert (Vector : in out Vector_Type; Before : in Positive_Count; New_Item : in Element_Type); procedure Insert (Vector : in out Vector_Type; Before : in Positive_Count); procedure Insert_N (Vector : in out Vector_Type; Before : in Positive_Count; How_Many : in Count; New_Item : in Element_Type); ... function Length (Vector : Vector_Type) return Natural; -- was Size procedure Resize (Vector : in out Vector_Type; New_Length : in Count); -- function Front, Back ? function First (Vector : Vector_Type) return Positive_Count; ... If the user felt it was important to have index type safety, or an index base other than 1 -- and I don't think it will be often -- she could always wrap an instantiation of Ada.Containers.Vectors in a package that provided it. I could suggest a few more useful operations for vectors. How about vector concatenation? Slicing? I might suggest a constant Null_Vector, obviating the need for the Is_Empty function and Clear procedure, but I must admit one disadvantage of such constants is that they are not inherited. I've found this a small pain occasionally. On the other hand, the test V = Foo.Null_Vector might be considered better (more natural, more readable) than Is_Empty(V) and V := Foo.Null_Vector than Clear(V). But personally I'm not sure. I'm none too keen on the generic type Element_Access is access all Element_Type; function Generic_Element (Vector : Vector_Type; Index : Index_Type'Base) return Element_Access; sub-package. It will surely constrain the implementation to declaring its internal storage array(s) with aliased components. This could have some pretty unfortunate effects on efficiency. I really like the Generic_Sort. That would certainly be very handy. By the way, I wonder if anyone has thought about a likely implementation of this package. I know Matt's done a sample imp (which I haven't had time to look at, sorry), but it seems to me that a reasonably efficient implementation would not be very simple. Are we saying that implementations are not expected to be very efficient, or that implementations are expected to be sophisticated? Another suggestion that I feel you should think about is a package that has almost the same interface as A.C.Vectors, but whose container objects are capable of being metamorphosed (perhaps implicitly, perhaps explicitly, or perhaps both) between the array form (with fast random access) and the linked-list form (with efficient appendage). This would fit very neatly with typical usage: building by successively appending elements, followed by usage that requires random access (sorting being the classic example). In the light of this idea, might not a List (linked list) package actually be more fundamentally useful, that simply had an operation to convert the list to an array? 
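[Editor's note: Below is a minimal sketch of the wrapping idiom suggested
earlier in this message ("she could always wrap an instantiation of
Ada.Containers.Vectors in a package that provided it"). It is written against
the Count-indexed interface proposed in this message, not against the final
AI-302 wording, and it assumes an Element selector like the one in the
AI-302/3 draft; the package name Line_Vectors, the index type Line_Number,
and the Integer element type are invented purely for illustration.

   with Ada.Containers.Vectors;
   package Line_Vectors is
      type Line_Number is range 1 .. 10_000;  -- the client's own index type
      type Vector_Type is private;
      procedure Append  (Vector : in out Vector_Type; New_Item : in Integer);
      function  Element (Vector : Vector_Type; Index : Line_Number)
         return Integer;
   private
      package Raw is new Ada.Containers.Vectors (Element_Type => Integer);
      type Vector_Type is record
         Data : Raw.Vector_Type;
      end record;
   end Line_Vectors;

   package body Line_Vectors is
      procedure Append (Vector : in out Vector_Type; New_Item : in Integer) is
      begin
         Raw.Append (Vector.Data, New_Item);
      end Append;

      function Element (Vector : Vector_Type; Index : Line_Number)
         return Integer is
      begin
         -- Map the client's index onto the pre-supplied Positive_Count index.
         return Raw.Element (Vector.Data,
                             Ada.Containers.Positive_Count (Index));
      end Element;
   end Line_Vectors;

The cost of the wrapper is a handful of forwarding subprograms, which is the
price of regaining index type safety from a package with a fixed Count index.]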
**************************************************************** From: Matthew Heaney Sent: Wednesday, February 18, 2004 1:21 PM > If the user felt it was important to have index type safety, or an index > base other than 1 -- and I don't think it will be often -- she could > always wrap an instantiation of Ada.Containers.Vectors in a package that > provided it. The vector package will import a generic formal index type. > I could suggest a few more useful operations for vectors. How about > vector concatenation? Slicing? This is an open issue, and I mentioned this in the errata list I sent Randy this morning. > I might suggest a constant Null_Vector, obviating the need for the > Is_Empty function and Clear procedure, but I must admit one disadvantage > of such constants is that they are not inherited. I've found this a > small pain occasionally. On the other hand, the test V = Foo.Null_Vector > might be considered better (more natural, more readable) than > Is_Empty(V) and V := Foo.Null_Vector than Clear(V). But personally I'm > not sure. The vector will have Is_Empty and Clear operations. > I'm none too keen on the > > generic > type Element_Access is access all Element_Type; > function Generic_Element (Vector : Vector_Type; > Index : Index_Type'Base) > return Element_Access; > > sub-package. It will surely constrain the implementation to declaring > its internal storage array(s) with aliased components. This could have > some pretty unfortunate effects on efficiency. The aliasing of elements is an open issue (for other reasons), and was included in the errata list I sent Randy this morning. > I really like the Generic_Sort. That would certainly be very handy. > > By the way, I wonder if anyone has thought about a likely implementation > of this package. I know Matt's done a sample imp (which I haven't had > time to look at, sorry), but it seems to me that a reasonably efficient > implementation would not be very simple. Are we saying that > implementations are not expected to be very efficient, or that > implementations are expected to be sophisticated? It's implemented using an unconstrained array (that's why the container is named "vector"). The implementation is as complicated as array manipulation is. The Generic_Sort in the reference implementation is implemented using a quicksort algorithm, augmented with a median-of-3 to find the pivot. > Another suggestion that I feel you should think about is a package that > has almost the same interface as A.C.Vectors, but whose container > objects are capable of being metamorphosed (perhaps implicitly, perhaps > explicitly, or perhaps both) between the array form (with fast random > access) and the linked-list form (with efficient appendage). The vector is optimized for inserting at the back end of the container. Append for a vector is O(1), just like a list is. (The only difference is that appending to a vector is "amortized" constant time.) > This would > fit very neatly with typical usage: building by successively appending > elements, followed by usage that requires random access (sorting being > the classic example). That's exactly how a vector is intended to be used. You do not need a list to do what you have described. > In the light of this idea, might not a List > (linked list) package actually be more fundamentally useful, that simply > had an operation to convert the list to an array? There is no list container is this version of the standard container library. 
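[Editor's note: A rough sketch of the build-by-append, sort-afterwards usage
pattern described above, written against the draft AI-302/3 vector interface
as discussed in this thread (generic formals Index_Type and Element_Type,
Append, Element, and a nested Generic_Sort taking "<"). The exact names and
profiles may differ in the final wording, so treat this as an illustration of
the intended usage rather than the definitive API.

   with Ada.Containers.Vectors;
   with Ada.Text_IO;
   procedure Build_Then_Sort is
      package Integer_Vectors is
         new Ada.Containers.Vectors (Index_Type   => Positive,
                                     Element_Type => Integer);
      use Integer_Vectors;

      procedure Sort is new Generic_Sort ("<" => "<");

      V : Vector_Type;
   begin
      -- Phase 1: build the sequence by successive appends
      -- (amortized O(1) per element).
      for I in reverse 1 .. 10 loop
         Append (V, New_Item => I);
      end loop;

      -- Phase 2: random access; sort in place, no conversion to an array.
      Sort (V);

      for I in 1 .. 10 loop
         Ada.Text_IO.Put_Line (Integer'Image (Element (V, I)));
      end loop;
   end Build_Then_Sort;

No separate list container is needed for this build-then-access pattern: the
vector's amortized-constant Append covers the building phase, and its array
representation covers the random-access phase.]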
**************************************************************** From: Marius Amado Alves Sent: Wednesday, February 18, 2004 2:44 PM On Wednesday 18 February 2004 18:22, Nick Roberts wrote: > I would repeat (I'm sure I've said it before many times) that the container > packages /do not need/ indefinite forms, now or in the future. > > The reason is simple: > > (a) if you want to contain an indefinite type, and you want to abstract > away such low-level mechanics as memory management (quite rightly), all you > do is write a package that exports a definite private type, with the > required operations and other accoutrements (constants, support types and > subtypes), and encapsulates the underlying indefinite type indide that > definite type (almost certainly by using dynamic allocation); Beaten argument. And self contradictory: dynamic allocation *is* memory management. The "level" of it does not matter. The user does not want to do *any* memory management. > (b) to support class-wide types or any indefinite types whose objects are > not dynamically allocated (so that memory management is not an issue), you > can contain an access type that designates them. Sure. > For strings, Ada.Strings.Unbounded is a perfect example of (a). You can use > definite containers on unbounded strings without problems. Unbounded_String is in fact a wonderful container. And a paradigmatic example of what the user expects from any container. So more ammo against (a). > End of story, and hopefully end of argument. Unfortunately no. > Randy suggested a semi-global Size_Type declared in Ada.Containers. Bob D > reckoned this was good for maps and sets, but not vectors. I still disagree > with Bob about the vector package having its own Index_Type generic > parameter. I think that the practical advantages of having a pre-supplied > universal index type would greatly outweigh the advantages of having the > way it currently is. I agree. I'm taking the chance to express myself on this issue. For me the index type could be simply Positive, like in Unbounded_Arrays (a package I presented at the ASCL Workshop, echoes of which can still be heard in the current proposal e.g. Resize and unassigned elements). /* > Furthermore, I think Randy's idea has the merit of > echoing the approach taken by the existing *_IO packages. I never liked this _Count business but ok. */ > I might suggest a constant Null_Vector... No please. > . . . the test V = Foo.Null_Vector might be > considered better (more natural, more readable) than Is_Empty(V) and V := > Foo.Null_Vector than Clear(V). Not to me, no. > I'm none too keen on the > > generic > type Element_Access is access all Element_Type; > function Generic_Element (Vector : Vector_Type; > Index : Index_Type'Base) > return Element_Access; > > sub-package. It will surely constrain the implementation to declaring its > internal storage array(s) with aliased components. This could have some > pretty unfortunate effects on efficiency. And it's not terribly useful either. If the user wants to do pointer programming he can do that him self with containers of pointers, no? > . . . > Another suggestion that I feel you should think about is a package that has > almost the same interface as A.C.Vectors, but whose container objects are > capable of being metamorphosed If it's another package with a similar interface then just make it a list, don't complicate it with transmorphing. I tried a similar stunt with Truc but then I saw the light :-) > ... 
> linked-list form (with efficient appendage)

You mean insertion. Appendage can be efficient with vectors.

> In the light of this idea, might not a List (linked list) package actually
> be more fundamentally useful, that simply had an operation to convert the
> list to an array?

Maybe. Personally I wouldn't mind at all seeing a list package there.
Paralleled by a reduction of the vectors interface. Once you have lists, you
don't need the (inefficient) insertion and deletion in the middle of vectors
anymore. And as said above, remove pointer programming support--in all
structural varieties (vectors, lists, maps, sets). The total reduction would
make plenty of space for the so much wanted--and rightfully so--list.

* Indefinite elements revisited : an alternative : elementary containers *

I think we all agree that the main rationale for having indefinite elements
is freeing the user from doing memory management. Many people do not like,
want, or know how, to dance with pointers.

I and Matt have already shown how indefinite elements can be added to the
proposal, with packages paralleling the ones for definite elements, defined
in a one-page annex.

An alternative is to provide a minimal package of 'elementary containers'
that does the required encapsulation of an indefinite inside a definite that
the user can then use to instantiate 'normal' containers. This alternative
has the virtue of focusing on the main requirement (freeing the user of doing
memory management).

   generic
      type Element_Type (<>) is private;
   package Ada.Containers.Elementary is
      type Container_Type is private;
      function Put (Item : Element_Type) return Container_Type;
      function Get (Container : Container_Type) return Element_Type;
   end;

   package Boxes is new Elementary (My_Indef_Type);
   package My_Vectors is new Vectors (Boxes.Container_Type);
   use Boxes, My_Vectors;

   V : My_Vectors.Vector_Type;

   Append (V, Put (My_Indef_Object));
   My_Op_Upon_The_Indef_Type (Get (Element (V, 1)));

For a 'real' example see the implementation of Truc
(www.liacc.up.pt/~maa/containers).

This breaks the only-one-instantiation requirement but it is for a good
cause :-)

Personally I'd be quite happy with this solution. And I'm a REALLY BIG fan of
indefinite elements, so we can safely assume all the others will be happy
too, and the standard will be embraced by ALL :-)

Note the minimal container is useful also for other situations, e.g. for
making an (core language) array of indefinite elements:

   A : array (1 .. 10) of Boxes.Container_Type;

   A (1) := Put (My_Indef_Object);

And remember you have memory magic, i.e. when you write

   A (1) := Put (Another_Indef_Object);

the previous value is cleanly disposed of. Compare this with all the stuff
you have to write (and review, and debug, and test, and...) to get the same
effect with core language devices. (Well, this is just backing up the
rationale above.)

****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 18, 2004 5:55 PM

> > I'm none too keen on the
> >
> >    generic
> >       type Element_Access is access all Element_Type;
> >    function Generic_Element (Vector : Vector_Type;
> >                              Index  : Index_Type'Base)
> >       return Element_Access;
> >
> > sub-package. It will surely constrain the implementation to declaring its
> > internal storage array(s) with aliased components. This could have some
> > pretty unfortunate effects on efficiency.
>
> And it's not terribly useful either. If the user wants to do pointer
> programming he can do that him self with containers of pointers, no?
I think the idea is to allow update-in-place of elements (which matters if the elements are large or indefinite). It's likely to be more necessary with Maps than with Vectors, but it's better to have the same operations for all of the containers. It wouldn't be necessary to use a generic formal for this purpose, of course, just put an access type in here: type Element_Access is access all Element_Type; function Writable_Element (Vector : Vector_Type; Index : Index_Type'Base) return Element_Access; That's a bit less flexible, but probably flexible enough if the primary purpose is a reference. ... > An alternative is to provide a minimal package of 'elementary containers' that > does the required encapsulation of an indefinite inside a definite that the > user can then use to instantiate 'normal' containers. This alternative has > the virtue of focusing on the main requirement (freeing the user of doing > memory management). I tend to prefer the two packages mechanism. That's because having the local memory management also makes the proportionality constant for Inserts and Sorts much less, and I'd not want to lose that. Indeed, if the proposal was adopted with both Definite and Indefinite element types, I'd suggest using the Indefinite version for large/expensive-to-copy element types even if the type is definite and any amount of Insert/Delete/Sorting will be done. (For Janus/Ada, the two implementations would be identical, but that would be unusual, and I wouldn't recommend anyone depend on that.) The Definite version would be best for small element types (like access types), because it would have a lot less overhead for adding an item and destroying the container. **************************************************************** From: Nick Roberts Sent: Wednesday, February 18, 2004 4:35 PM Marius Amado Alves wrote: > Personally I wouldn't mind at all seeing a list package there. Indeed, and I feel the argument for a list package is really stronger than for a vectors one. With a list container, you can do all the insertion and deletion you like perfectly efficiently, and then just convert it to an array for random access. What's wrong with that? Why then would vectors be needed at all? > Many people do not like, want, or know how, to dance with pointers. I completely agree with this. > I and Matt have already shown how indefinite elements can be added to the > proposal, with packages paralleling the ones for definite elements, defined > in a one-page annex. Yuk. > An alternative is to provide a minimal package of 'elementary containers' that > does the required encapsulation of an indefinite inside a definite that the > user can then use to instantiate 'normal' containers. This alternative has > the virtue of focusing on the main requirement (freeing the user of doing > memory management). Brilliant! I think this is a superb idea. Maybe we could term a container of this kind a 'keeper'; I'm sure someone can come up with a better one. 
with Ada.Finalization; -- for private part only generic type Element_Type (<>) is private; package Ada.Containers.Keepers is type Keeper is private; function To_Keeper (Item : Element_Type) return Keeper; function Empty_Keeper return Keeper; function Value (Source : Keeper) return Element_Type; function Is_Empty (Source : Keeper) return Boolean; procedure Clear (Source : in out Keeper); procedure Replace (Source : in out Keeper; By : in Element_Type); private type Element_Access is access Element_Type; type Keeper is new Ada.Finalization.Controlled with record Ref: Element_Access; -- null for empty end record; end; package My_Keepers is new Ada.Containers.Keepers(My_Indef_Type); package My_Vectors is Ada.Containers.Vectors(My_Keepers.Keeper); use My_Keepers, My_Vectors; V : My_Vectors.Vector_Type; Append( V, To_Keeper(My_Indef_Object) ); My_Op_Upon_The_Indef_Type( Value( Element(V,1) ) ); Possibly 'To_Keeper' should be named 'Make_Keeper' or 'New_Keeper'. I've shown the likely implementation of the Keeper type. > Personally I'd be quite happy with this solution. And I'm a REALLY BIG fan of > indefinite elements, so we can safely assume all the others will be happy > too, and the standard will be embraced by ALL :-) I really REALLY like Marius' idea here. Yes please! > Note the minimal container is useful also for other situations, e.g. for > making an (core language) array of indefinite elements: > > A : array (1 .. 10) of Boxes.Container_Type; > A (1) := Put (My_Indef_Object); or alternatively: A : array (1 .. 10) of My_Keepers.Keeper; Replace( A(1), My_Indef_Object ); which might be slightly more efficient. **************************************************************** From: Randy Brukardt Sent: Wednesday, February 18, 2004 7:32 PM ... > Indeed, and I feel the argument for a list package is really stronger than > for a vectors one. With a list container, you can do all the insertion and > deletion you like perfectly efficiently, and then just convert it to an > array for random access. What's wrong with that? Why then would vectors be > needed at all? That's going to be very expensive if the length of the list is very long and/or copying the elements is expensive. Matt's design tries to avoid copying elements as much as possible, and he's particularly concerned with the containers being able to 'scale-up' to large numbers of elements. If the sequence (I'm using the general term here) doesn't have very big elements and can't get very long, you don't need any fancy container to hold it. Just declare an array of the maximum size and use it. The value of any container is when one or both of those things is true, and you do need the memory management implied by a container. And, if you can only have one sequence container, the vector container (which allows computed access to elements) is more flexible than the list container (which doesn't). Besides, a useful list is a lot easier to write than a useful growable array. **************************************************************** From: Marius Amado Alves Sent: Thursday, February 19, 2004 6:51 AM On Wednesday 18 February 2004 22:34, Nick Roberts wrote: [Lists and Vectors] > Marius Amado Alves wrote: > > Personally I wouldn't mind at all seeing a list package there. > > Indeed, and I feel the argument for a list package is really stronger than > for a vectors one. I don't feel that way. 
> With a list container, you can do all the insertion and
> deletion you like perfectly efficiently, and then just convert it to an
> array for random access. What's wrong with that?

Efficiency. Surely you cannot convert a list of a zillion elements just like
that.

> Why then would vectors be
> needed at all?

See above. And also, you often need the precise vector abstraction. Let it be
there ready for use. Just add the precise list abstraction. They will live
there happily side by side.

[Elementary Containers]

> generic
>    type Element_Type (<>) is private;
> package Ada.Containers.Keepers is
>    type Keeper is private;
>    function To_Keeper (Item : Element_Type) return Keeper;
>    function Empty_Keeper return Keeper;
>    function Value (Source : Keeper) return Element_Type;
>    function Is_Empty (Source : Keeper) return Boolean;
>    procedure Clear (Source : in out Keeper);
>    procedure Replace (Source : in out Keeper;
>                       By     : in     Element_Type);
> private...

Looks good. Compare with this 'real code' example from AI302/2:

   generic
      type Element (<>) is private;
      type Element_Ptr is access all Element;
      type Container is private;
      with procedure Put (C : in out Container; E : Element) is <>;
      with function Put (E : Element) return Container is <>;
      with function Get (C : Container) return Element is <>;
      with procedure Delete (C : in out Container) is <>;
      with function Access_Of (C : Container) return Element_Ptr is <>;
      with function "=" (L, R : Container) return Boolean is <>;
      with procedure Overwrite (C : Container; E : Element) is <>;
      with function Img (C : Container) return String is <>;
   package Signature is end;

Operations side by side:

   Minimal     AI302/2          Nick              Remark
   ------------------------------------------------------------------
   Put(E)->C   Put(E)->C        To_Keeper(E)->C   yes, Insert
   Get(C)->E   Get(C)->E        Value(C)->E       yes, Element
               Put(ioE,C)       Replace(ioC,E)    yes, Replace(C,E)?
               Delete(ioC)      Clear(C)          yes, Clear
               Access_Of(C)->P                    for update-in-place
               "="(C,C)->B                        no
               Overwrite(C,E)                     for update-in-place
               Img(C)->S                          no
                                Empty_Keeper->C   no
   ------------------------------------------------------------------

   Abbreviations:
      E  = element type
      C  = container type
      -> = returns
      io = in out
      B  = Boolean
      P  = pointer to element

In the remarks: a "yes" means the operation is definitely a go, with the
indicated name for consistency with AI302/3.

The remark "Replace(C,E)?" is associated with the fact that in AI302/3 the
container parameter of the Replace_Element operation for vectors is just in,
not in out. But in the corresponding operation for maps the container
parameter is in out. Only the ARG and/or Matt can explain this.

The two "for update-in-place" operations: Access_Of is like the
Generic_Element (terrible name) of AI302/3 vectors. Overwrite(C,E) is
logically equivalent to Access_Of (C).all := E. Overwrite is the
update-in-place operation distilled. So if Access_Of (or Generic_Element) is
there just for update-in-place it can be dropped from the interface.

In C++ Overwrite is dangerous if the new element is bigger than the previous.
I hope Ada can avert this, or at least detect it and raise an exception.
Whatever you do, leave a means for update-in-place in the interface. Albeit
dangerous (?), it is very useful for efficient replacement when the user
knows that the sizes are equal.

* Names. Finalising a proposal *

"Keeper" is too colloquial, no? And has a connotation to football. "Cell"
would be a better metaphor. Of course the container type name and the other
names must get along with each other, e.g.:
   package      container type    element type
   -------------------------------------------
   Elementary   Container_Type    Element_Type
   Cells        Cell_Type         Element_Type
   Cells        Cell_Type         Value_Type
   -------------------------------------------

If there are no essential disagreements with this proposal, I and Nick (?)
will try to formalise a proposal, with the options indicated above.

****************************************************************
From: Marius Amado Alves
Sent: Thursday, February 19, 2004 8:25 AM

On Wednesday 18 February 2004 23:48, Randy Brukardt wrote:

[Operations for update-in-place]

> Marius Amado Alves wrote (responding to Nick Roberts):
> > > I'm none too keen on the
> > >
> > >    generic
> > >       type Element_Access is access all Element_Type;
> > >    function Generic_Element (Vector : Vector_Type;
> > >                              Index  : Index_Type'Base)
> > >       return Element_Access;
> > >
> > > sub-package. It will surely constrain the implementation to declaring
> > > its internal storage array(s) with aliased components. This could have
> > > some pretty unfortunate effects on efficiency.
> >
> > And it's not terribly useful either. If the user wants to do pointer
> > programming he can do that him self with containers of pointers, no?
>
> I think the idea is to allow update-in-place of elements (which matters if
> the elements are large or indefinite).

If large yes. If indefinite not quite. You have to deal with possibly
different sizes. See my previous message in reply to Nick.

> It's likely to be more necessary
> with Maps than with Vectors,

I don't see why, but ok.

> but it's better to have the same operations
> for all of the containers.

Ok.

> It wouldn't be necessary to use a generic formal for this purpose, of
> course, just put an access type in here:
>     type Element_Access is access all Element_Type;

Yes, please do that. The generic breaches the only-one-instantiation
requirement.

[Indefinite elements]

> I tend to prefer the two packages mechanism. That's because having the
> local memory management also makes the proportionality constant for Inserts
> and Sorts much less

If I understand correctly, not quite. Not the "much" anyway. See the
provisions for update-in-place for elementary containers in my previous
message (in reply to Nick).

> Indeed, if the proposal was adopted with both Definite and Indefinite
> element types, I'd suggest using the Indefinite version for
> large/expensive-to-copy element types even if the type is definite and any
> amount of Insert/Delete/Sorting will be done. (For Janus/Ada, the two
> implementations would be identical, but that would be unusual, and I
> wouldn't recommend anyone depend on that.) The Definite version would be
> best for small element types (like access types), because it would have a
> lot less overhead for adding an item and destroying the container.

Note this only applies to *inherently* inefficient operations, e.g.
inserting/deleting in vectors. And, again, provisions for update-in-place for
elementary containers minimize the 'problem'. And shouldn't we avoid mingling
definiteness and largeness? They are independent factors.

Personally, as a user, I'm happy with either solution (Annex or elementary
containers). I can easily construct either one from the other. But as an
implementer I would prefer the elementary containers solution, because it is
so much less trouble. I'm surprised that the real compiler writer Randy feels
the contrary.

And it seems much less work for conformance testing also. And it probably
eases the specification also.
Annex is a bit strange and bug-prone, because it is assuming that a lot about definite elements transposes to indefinite. We already found some "anomalies". Elementary containers is just a 'normal' spec. It does not require any *combined* testing with the other containers. The user can easily derive by himself any theorems about a container of elementary containers from the two independent specs. And I think everybody prefers a standard that just shows a package spec--over one that defines one in English. **************************************************************** From: Randy Brukardt Sent: Thursday, February 19, 2004 6:21 PM Marius Amado Alves: > > I think the idea is to allow update-in-place of elements (which matters if > > the elements are large or indefinite). > > If large yes. If indefinite not quite. You have to deal with possibly > different sizes. Well, usually it would be used to update parts (components) of elements, not the entire thing. If you're going to update the whole thing, use the safer Replace_Element. Indefinite elements have components, too. > [Indefinite elements] > > > I tend to prefer the two packages mechanism. That's because having the > > local memory management also makes the proportionality constant for Inserts > > and Sorts much less > > If I understand correctly, not quite. Not the "much" anyway. See the > provisions for update-in-place for elementary containers in my previous > message (in reply to Nick). For most implementations, it will make them much less. The canonical implementation of a definite element is: type Internal_Array is array (Index_Type range <>) of aliased Element_Type; while for indefinite element is: type Element_Access is access all Element_Type; type Internal_Array is array (Index_Type range <>) of Element_Access; so, when you're moving buckets for an insert, you're copying whole elements in the definite case, and just pointers in the indefinite case. If element copy is expensive (lots of controlled components, for instance), that can make a huge difference. > Note this only applies to *inerently* inefficient operations e.g. > inserting/deleting in vectors. Of course. But if you're using them a lot, it matters. > And, again, provisions for update-in-place for > elementary containers minimize the 'problem'. I have no idea what you mean. When you have to copy an element, you have to copy it. If "elementary containers" (BTW, that name is horrible, because "elementary" means scalar and access types in Ada, and that is not what you mean here) uses controlled types and does reference counted shallow copies, it could avoid some overhead -- but at the cost of a lot of complexity. > But as an implementer I would prefer the elementary containers solution, > because it is so less trouble. I'm surprised that the real compiler writer > Randy feels the contrary. For us (because of generic sharing), there is no difference between definite and indefinite elements. The compiler will internally transform "Element_Type" into "Element_Access" (because the size and contents of the actual type are unknown). Which is why I'm completely opposed to any semantics differences between them. And, because of that, your proposed solution would mean that both containers would end up doing memory management. So everything would end up allocated twice (the actual element, and then the "elementary container". That would cause serious heap fragmentation problems (Windows is not good at handling that), and I fear that the combination would be effectively unusable. 
At which point we're out of business (changing the implementation of generics is not an option). For me, all of the elements should be indefinite, period. We don't need definite versions. (That would make Janus/Ada look good, our implementation would be competitive. :-) But I understand why no one else thinks that. > And it seems much less work for conformance testing also. Since the semantics are identical for the two packages, use the same tests (with different types). Much less work than writing two sets of tests from scratch. > And it probably eases the specification also. Annex is a bit strange and > bug-prone, because it is assuming that a lot about definite elements > transposes to indefinite. We already found some "anomalies". Yes, but those are bugs in the design of the container. Do we really want to be able to put random junk into containers? I don't think so. There would be a problem if we were to decide to add array operations (since indefinite can't be a component), but that's far from decided. ... > And I think everybody prefers a standard that just shows a package spec--over > one that defines one in English. That is precisely how all of the Wide_String packages work, and they haven't caused a lot of problems. Indeed, the advantage of the indefinite packages is that they are *very* small in terms of standard wording and "weight" (that is, there is no new concepts to learn and understand with them). That's not true of "elementary containers". **************************************************************** From: Jeffrey Carter Sent: Thursday, February 19, 2004 7:07 PM Randy Brukardt wrote: > Of course. But to me, a hash table is just a table (array); collision > handling is not part of it. It's a necessary part of a component, of > course, which is why it's impossible to have a hash table component. OK. That's not the definition of a hash table that I learned, but we're not really in disagreement. I'm curious, though: if a hash table is just an array, what are the index and component types? > Which of course is exactly the argument I've been making all along. > Of course, then the Sorted_Set and the Vector are also good enough -- > which is quite contrary to your position. I'd be perfectly happy to not have a hash table or anything based on one. If they exist, though, I might choose to use a hash table based on expected performance for a specific application, and I would want to be able to use it without an ugly kludge. If they exist, I think the implementation should be available as well as the higher-level components. **************************************************************** From: Stephen Leake Sent: Friday, February 20, 2004 3:34 AM Matthew Heaney writes: > In the case of the STL, what happens it that you specify an iterator > pair designating the half-open range of the source container. The > vector probably computes the distance() first, then does the internal > expansion, and then walks the source range constructing each new > vector element in place. > > For a std::vector, the distance() function is specialized so that it > computes the distance in constant time (because vector iterators are > random access iterators, and therefore distance() can be implementing > for a vector by simple subtraction). > > We can't get this sophisticated in Ada, but we can be almost as > efficient. Instead of the vector itself calling distance(), it's the > vector user who computes the distance (by whatever method makes > sense), and then calls Insert_N to do the preallocation. 
Hmm. We could require a source container signature package, that includes cursors and Distance; that should give the same efficiency as C++ STL. We probably don't want that for ai302. ... > > So I'm affirming that deleting the itemless insertion from the > > indefinite map is ok. > > I think they need to stay. If nothing else the definite and > indefinite forms require a more or less identical interface. Ok, I agree with you; itemless insert is useful and should be in the indefinite containers. However, the intended use is that they be immediatly followed by a Replace operation, which specifies the item for each element. So itemless insert should just insert null pointers in the underlying container, and any operation that accesses an itemless element should raise Constraint_Error, since it indicates a user error. I've looked thru your indefinite_vectors package. Why do you have both type VT and type Vector_Type? **************************************************************** From: Matthew Heaney Sent: Friday, February 20, 2004 1:50 PM > Hmm. We could require a source container signature package, that > includes cursors and Distance; that should give the same efficiency as > C++ STL. We probably don't want that for ai302. It's not necessary. In the current design it just means you have to supply the count yourself and do the vector pre-insert, then use your favorite iteration method (over the target, over the source, active, passive, etc, etc) to do the actual vector insert. > Ok, I agree with you; itemless insert is useful and should be in the > indefinite containers. > > However, the intended use is that they be immediatly followed by a > Replace operation, which specifies the item for each element. Yes, that is correct. The state of the container immediately following the pre-insert (what I've been calling "Insert_N") is intended only as a temporary state, as a prelude to some form of replacement of the elements in the newly-allocated slots. > So itemless insert should just insert null pointers in the underlying > container, and any operation that accesses an itemless element should > raise Constraint_Error, since it indicates a user error. Yes, for the indefinite form, an item-less insert would give each new slot the value null (the original non-null values in those positions would slide up), in anticipation of its replacement by a non-null value. > I've looked thru your indefinite_vectors package. Why do you have both > type VT and type Vector_Type? It's a bit of a trick. I used transitivity of visibility to make the operations of the type directly visible. **************************************************************** From: Marius Amado Alves Sent: Friday, February 20, 2004 5:51 AM On Friday 20 February 2004 00:20, Randy Brukardt wrote: > ... When you have to copy an element, you have to > copy it. If "elementary containers" (BTW, that name is horrible... The correct name would be "uni-elementary containers". For some reason I lost the "uni-". I'm considering changing to "cells". > ... everything would end up > allocated twice (the actual element, and then the "elementary container". > That would cause serious heap fragmentation problems (Windows is not good > at handling that), and I fear that the combination would be effectively > unusable. Serious problems? Effectively unusable? Are you sure? Just because of one more level of allocation? For such small things as pointers? Forgot high performance is not required? > ... For me, all of the elements should be indefinite, period. 
For me too! > ... But I understand why no one else thinks that. I don't (understand)! > > ... one that defines one in English. > > That is precisely how all of the Wide_String packages work, and they > haven't caused a lot of problems. I know, but String to Wide_String is not a quantum leap like definite to indefinite. > Indeed, the advantage of the indefinite > packages is that they are *very* small in terms of standard wording and > "weight" (that is, there is no new concepts to learn and understand with > them). Only if the transposition is exceptionless. That is no "anomalies". Can we assure that? I fear a flood of Ada Questions (?) beginning 2005. > That's not true of "elementary containers". Yes, but the new concept (the cell:-) is minimal, useful, "brilliant", natural to every programmer. In sum, we have three solutions to choose from, with pros and cons: Only Def.+ Def.+ indef. indef. cells ------------------------------------------------------ Changes to AI-302/3 many few few Reference implementation no yes yes One more useful structure no no yes Janus issues no no yes ... **************************************************************** From: Randy Brukardt Sent: Friday, February 20, 2004 7:32 PM Marius Amado Alves wrote: ... > The correct name would be "uni-elementary containers". For some reason I lost > the "uni-". I'm considering changing to "cells". "Cells" seems better to me. Short is always good! > > ... everything would end up > > allocated twice (the actual element, and then the "elementary container". > > That would cause serious heap fragmentation problems (Windows is not good > > at handling that), and I fear that the combination would be effectively > > unusable. > > Serious problems? Effectively unusable? Are you sure? Just because of one more > level of allocation? For such small things as pointers? Forgot high > performance is not required? Well, a "Cell" (which is doing memory management) is not a pointer, it's a controlled object containing a pointer. (Otherwise, the memory wouldn't be recovered on scope exit, which is a clear no-no.) So that means its size is more like 20 bytes for Janus/Ada. So I think I was wrong about fragmentation problems (it is big enough to avoid those). But it certainly would be a potential problem for memory use (if there are lot of them), and a lot more overhead when items are copied (calls to Finalize and Adjust for each item, which the directly indefinite version would not have - it wouldn't need controlled elements as the container itself is controlled). Of course, this doesn't matter in truly low performance applications, but there are a lot of middle ground applications in which that could matter. > > ... But I understand why no one else thinks that. > > I don't (understand)! Bounded forms need to have definite components (the reason for bounded forms is to have little or no dynamic memory management; it defeats the purpose to then dynamically allocate the elements). We need to leave room for future enhancements. Similarly, there is a lot less dynamic memory management in with definite elements. Most implementers claim that's important to their customers (they want repeatability). (It better not be important to Janus/Ada customers, because we allocate a lot of things dynamically and non-contiguously.) I have to trust their judgement. ... 
> > Indeed, the advantage of the indefinite > > packages is that they are *very* small in terms of standard wording and > > "weight" (that is, there is no new concepts to learn and understand with them). > > Only if the transposition is exceptionless. That is no "anomalies". Can we > assure that? I fear a flood of Ada Questions (?) beginning 2005. Well, I'm not worrying that the ARG is going to run out of work no matter what ends up in the Amendment. I fully expect a flood of questions on the containers. Almost all of the packages in Ada 95 (except the ones defined in previous standards) generated a lot of questions. Why would this Amendment be different?? > Yes, but the new concept (the cell:-) is minimal, useful, "brilliant", natural > to every programmer. One more minor advantage to indefinite element containers: they only require one instantiation to use. The "cell" solution requires two. **************************************************************** From: Nick Roberts Sent: Saturday, February 21, 2004 12:54 PM > "Cells" seems better to me. Short is always good! I like that name too. Splendid. > Well, a "Cell" (which is doing memory management) is not a pointer, it's > a controlled object containing a pointer. (Otherwise, the memory > wouldn't be recovered on scope exit, which is a clear no-no.) So that > means its size is more like 20 bytes for Janus/Ada. So I think I was > wrong about fragmentation problems (it is big enough to avoid those). > But it certainly would be a potential problem for memory use (if there > are lot of them), and a lot more overhead when items are copied (calls > to Finalize and Adjust for each item, which the directly indefinite > version would not have - it wouldn't need controlled elements as the > container itself is controlled). Of course, this doesn't matter in truly > low performance applications, but there are a lot of middle ground > applications in which that could matter. It may or may not be a problem for memory use. The size of one 'cell' object would, as you say, comprise in the ball park of 20 bytes (a tag and a linked-list 'next' access value, in addition the access value referring to the contained indefinite object). I reckon the overhead (the tag and the next pointer) is likely to be 8 bytes in most cases, although it could be quite a lot more. However, if the average size of each contained object is significantly more than this overhead, it is unlikely to be really significant (it may be a little annoying). If the inidefinite objects are relatively small on average, it matters. I'm not really sure, myself, which scenario will be prevalent in practice. > One more minor advantage to indefinite element containers: they only > require one instantiation to use. The "cell" solution requires two. The extra instantiation could be somewhat amortised away in some (perhaps many) realistic situations. type Fragment_Count is range 0..2000; subtype Fragment_Number is Fragment_Count range 1..Fragment_Count'Last; package Gene_Fragments is new Ada.Containers.Cells(Gene_Array); subtype Gene_Fragment is Gene_Fragments.Cell; use Gene_Fragments; package Fragment_Gangs is new Ada.Containers.Vectors(Fragment_Number,Gene_Fragment); subtype Fragment_Gang is Fragment_Gangs.Vector; use Fragment_Gangs; type Fixed_Gang is array (Fragment_Number range <>) of Gene_Fragment; Sample: constant Fixed_Gang := Ref_Samp_1 & Ref_Samp_2; Here, the instantiation of a cell package permits us to declare an array of cells in addition to a vector of them. 
I feel that cells would quite often be useful for purposes other than allowing a (definite) container to contain indefinite objects. An advantage of definite containers over their indefinite counterparts is that they permit conversion to and from arrays (including the slicing of a linear container). The cell technique would have the extra advantage that, since only definite containers are used, these array operations would remain available. I feel that in itself could be quite a compelling argument. In my ignorance, could I ask please what the presumed (proper) implementation of Vectors is? In my mind forms a picture of a tree structure with the leaves containing (or pointing to) actual arrays which form fragments of the whole conceptual array. Each fragment would have a counter saying how many of its elements are actually used. Appending an element would require adding a leaf node if there was no more space in the end fragment. Random selection of an element would require descending the tree. Am I way off the mark? If I'm not way off the mark, I would contend that building a linked list and converting to an array (for subsequent random access) would be likely to be superior (to building a vector and selecting randomly from it by tree descent) in a majority of cases in practice. **************************************************************** From: Matthew Heaney Sent: Monday, February 23, 2004 6:23 PM > In my ignorance, could I ask please what the presumed (proper) > implementation of Vectors is? See the files ai302-containers-vectors.ad? ai302-containers-indefinite_vectors.ad? in the latest reference implementation for the details. This implementation has a few more examples, some of which use the new indefinite vector container. Look thru the anagram examples for some ideas. > In my mind forms a picture of a tree structure with the leaves containing > (or pointing to) actual arrays which form fragments of the whole conceptual > array. Each fragment would have a counter saying how many of its elements > are actually used. Appending an element would require adding a leaf node if > there was no more space in the end fragment. Random selection of an element > would require descending the tree. Am I way off the mark? A vector is implemented as an unconstrained array. > If I'm not way off the mark, I would contend that building a linked list > and converting to an array (for subsequent random access) would be likely > to be superior (to building a vector and selecting randomly from it by tree > descent) in a majority of cases in practice. To convert between container types, just use one of the iterators. The container library takes pains to give the library user easy and efficient access to the container elements (that means the actual objects). It is never the case that a container needs, say, an operation to convert itself to an array specifically. A container iterator allows the library user himself to choose the target type, whatever makes the most sense for him. **************************************************************** From: Matthew Heaney Sent: Tuesday, February 24, 2004 10:27 AM Nick Roberts wrote: > > I might suggest a constant Null_Vector, obviating the need for the > Is_Empty function and Clear procedure, but I must admit one disadvantage > of such constants is that they are not inherited. I've found this a > small pain occasionally. 
On the other hand, the test V = Foo.Null_Vector > might be considered better (more natural, more readable) than > Is_Empty(V) and V := Foo.Null_Vector than Clear(V). But personally I'm > not sure. This won't work, because the vector type privately derives from Controlled, and therefore you can't declare a constant of the type in a package with preelaborate categorization. However, a constructor function would work. Here are some ideas: function Null_Vector return Vector_Type; function Empty_Vector return Vector_Type; function New_Vector return Vector_Type; function To_Vector (Length : Size_Type) return Vector_Type; function To_Vector (New_Item : Element_Type; Count : Size_Type) return Vector_Type; I actually had a need for something like this in one of my examples. It's kind of a pain that the language doesn't give you a default constructor for a type that you can pass as a parameter. For example, in C++ I can say: container.insert(T()); where T() invokes the default ctor for the element type T. Ada does let you do something like this, when constructing an aggregate: type NT is new T with record I : Integer; end record; Object : constant NT := (T with I => 42); Here we're allowed to use T as the value of the parent part of NT, when constructing an aggregate of type NT. But I can't use the type name as the value of a parameter: Insert (Container, New_Item => T); -- not legal Ada I have to say something like: Insert (Container, New_Item => New_T); where New_T is a function that returns type T. **************************************************************** From: Randy Brukardt Sent: Tuesday, February 24, 2004 1:39 PM > It's kind of a pain that the language doesn't give you a default > constructor for a type that you can pass as a parameter. .. > Here we're allowed to use T as the value of the parent part of NT, when > constructing an aggregate of type NT. But I can't use the type name as > the value of a parameter: > > Insert (Container, New_Item => T); -- not legal Ada True, but Ada 200Y lets you say: Insert (Container, New_Item => (<>)); which is a default-initialized aggregate. Which is what you want, right?? (See AI-287.) We originally tried to use the type name here, but it led to all kinds of problems, and it isn't providing any actual information, so we decided to use the box "<>" instead. So all you really want is an Ada 200Y compiler. :-) **************************************************************** From: Gary Dismukes Sent: Tuesday, February 24, 2004 3:13 PM > This won't work, because the vector type privately derives from > Controlled, and therefore you can't declare a constant of the type in a > package with preelaborate categorization. Not completely true. In Ada 200Y you can make a private type have preelaborable initialization, in which case constants of the type can be declared in preelaborable packages (see AI-161). Type Ada.Finalization.Controlled (and Limited_Controlled) are defined to have preelaborable initialization, though there's a restriction that if a user-defined controlled type overrides Initialize then the type doesn't have preelaborable initialization. **************************************************************** From: Matthew Heaney Sent: Tuesday, February 24, 2004 4:15 PM OK. Thanks for the info. The vector and (hashed) map containers don't override the Initialize operation. The (sorted) set does override Initialize. Let me see if I can get rid of that. 
It might not matter anyway, since we can use the new "(<>)" notation to construct an anonymous instance of the type. **************************************************************** From: Matthew Heaney Sent: Tuesday, February 24, 2004 4:27 PM I just got rid of the override of Initialize for the set. The full view of Set_Type now looks like: function New_Back return Node_Access; type Set_Type is new Controlled with record Tree : Tree_Type := (Back => New_Back, Length => 0); end record; The function New_Back does the allocation and initialization that I was doing in Initialize. I'll fold this change into the next release of the reference implementation. **************************************************************** From: Matthew Heaney Sent: Friday, February 27, 2004 12:29 PM I just uploaded the latest version of the reference implementation: This version includes indefinite forms for all containers. There are also two more anagram examples, and a new genealogy example. **************************************************************** From: Tucker Taft Sent: Friday, February 27, 2004 4:07 PM I had a couple of problems compiling this. One problem is that you have two versions of package "String_Vectors", one in the top-level dir, and one in the indefinite_vectors subdirectory. You might want to delete the indefinite_vectors subdirectory, since it is redundant with the ai302-containers-indefinite_vectors stuff, and it is confusing because one uses "Natural" where the other uses "Size_Type." The other problem I had was with your "Control_Type" in the private part of indefinite_vectors/indefinite_vectors.ads. Again, this is largely redundant with ai302-containers-indefinite_vectors. But for what it is worth, the former one doesn't compile with our latest compiler, because the type declaration: type VT is new Rep_Types.Vector_Type; fails with complaints about trying to add primitive operations after a type is frozen. It is a bit subtle, but this type declaration is in fact implicitly declaring additional operations on "Control_Type" *after* Control_Type has been passed to a generic. The solution I came up with was putting the declaration of Control_Type into a nested package ("Inner") starting at the declaration of Control_Type and ending after the generic instantiation producing Rep_Types. Then the declaration of VT is outside the (inner) package, meaning that the additional operations it implicitly declares with parameters of type Control_Type don't end up as primitives of Control_Type. A corresponding change is needed in the body of Indefinite_Vectors. In any case, ai302-containers-indefinite_vectors.ad? doesn't have this problem -- you use a different approach. I'll let you know about any other problems I encounter. Very nice work, in any case! **************************************************************** From: Matthew Heaney Sent: Friday, February 27, 2004 4:43 PM I wasn't sure whether I still needed the old indefinite_xxx subdirectories. Those were originally created to show how to implement an indefinite container as a thin layer on top of the official definite containers. However, after I did that Randy suggested that having indefinite forms as an official part of the library might be acceptable, so I went ahead and implemented them, up in the parent directory. I can either remove them entirely from the release, or move them off into a deprecated subdirectory. I suppose a README couldn't hurt, either... 
> The other problem I had was with your "Control_Type" in the > private part of indefinite_vectors/indefinite_vectors.ads. ... OK. That's easy enough to fix. (I don't really need that derived type. It was only declared to effect transitivity of visibility.) > The solution I came up with was putting the declaration of > Control_Type into a nested package ("Inner") starting at the > declaration of Control_Type and ending after the generic > instantiation producing Rep_Types. Then the declaration > of VT is outside the (inner) package, meaning that the additional > operations it implicitly declares with parameters of type > Control_Type don't end up as primitives of Control_Type. > A corresponding change is needed in the body of Indefinite_Vectors. OK. Thanks for the tip. > In any case, ai302-containers-indefinite_vectors.ad? doesn't > have this problem -- you use a different approach. Indeed. That version is implemented natively, not as a thin layer. The versions in the parent directory are the only ones you really care about. I can move those other versions to somewhere less confusing. > I'll let you know about any other problems I encounter. OK, thanks. I can fold any changes into the next release. I'll be at the meeting in Phoenix, so we can discuss any other issues you have. > Very nice work, in any case! Thanks. I was able to build the reference implementation from the spare parts I had lying around for Charles, so it was a big job but not that big. I was just thinking today that it would be nice to have a functional insertion operation, like this: --see wordcount.adb declare N : Natural renames Insert (Map'Access, Word, 0).all; begin N := N + 1; end; or like this: --see genealogy.adb declare Roots : Set_Type renames Insert (Map'Access, Key => "---").all; begin ... This simulates what I can do in C++ using operator[](). One way to declare it is: function Insert (Map : access Map_Type; Key : String) return access Element_Type; I was thinking the cursor selectors could be declared like this: function To_Element (Cursor : Cursor_Type) return access Element_Type; function To_Key (Cursor : Cursor_Type) return access constant Key_Type; If functions could return an anonymous access type this would allow me to get rid of the Generic_Element and Generic_Key functions. Just some ideas... **************************************************************** From: Dan Eilers Sent: Saturday, February 28, 2004 2:14 PM In ai302/test_sets.adb, on line 91, there is a call to "find" that appears to be ambiguous, matching the find declared in test_sets.adb on line 51, and the find declared in integer_vectors. **************************************************************** From: Adam Beneschan Sent: Monday, March 1, 2004 6:33 PM ... > fails with complaints about trying to add primitive operations > after a type is frozen. It is a bit subtle, but this type > declaration is in fact implicitly declaring additional operations > on "Control_Type" *after* Control_Type has been passed to a generic. Can this be right? Essentially the source is equivalent to: generic ... package Indefinite_Vectors is private type Control_Type is new Controlled with record ... end record; package Rep_Types is type Vector_Type is private; procedure Append (Vector : in out Vector_Type; New_Item : in Control_Type); private ... 
end Rep_Types; type VT is new Rep_Types.Vector_Type; end Indefinite_Vectors; The derived type declaration causes a new inherited subprogram to be declared implicitly: procedure Append (Vector : in out VT; New_Item : in Control_Type); But as I read RM 3.2.3 and particularly 3.2.3(4), the derived subprogram Append is a primitive subprogram of type VT, but *not* a primitive subprogram of type Control_Type. So there shouldn't be an error message about primitive subprograms being added after Control_Type is frozen (even if there were some declaration that froze Control_Type before the declaration of VT, which there isn't in my reduced example). Also, 3.9.2(13) makes "the explicit declaration of a primitive subprogram of a tagged type" illegal after the type is frozen, but this is not an explicit subprogram declaration. So what did I miss? **************************************************************** From: Randy Brukardt Sent: Monday, March 1, 2004 6:57 PM ... > But as I read RM 3.2.3 and particularly 3.2.3(4), the derived > subprogram Append is a primitive subprogram of type VT, but *not* a > primitive subprogram of type Control_Type. Humm. This looks messy. Primitive subprograms have to be explicitly declared for initial types. But 3.2.3(4) says that inherited routines are primitive for derived types. It doesn't say that routines inherited *from the parent type* are primitive. In this case, Control_Type is derived, so inherited routines are primitive -- and this routine is certainly inherited. Of course, that seems to be a nonsense interpretation of the language. I think that 3.2.3(4) was intended to apply only to routines inherited from the parent. So the question is whether that can be derived from other language (in which case Tucker's compiler has a bug), or if there is actually a language hole. **************************************************************** From: Adam Beneschan Sent: Monday, March 1, 2004 7:26 PM > Humm. This looks messy. Primitive subprograms have to be explicitly declared > for initial types. But 3.2.3(4) says that inherited routines are primitive > for derived types. It doesn't say that routines inherited *from the parent > type* are primitive. In this case, Control_Type is derived, so inherited > routines are primitive -- and this routine is certainly inherited. The exact language of 3.2.3(2,4) is: The primitive subprograms of a specific type are defined as follows: For a derived type, the inherited (see 3.4) user-defined subprograms; So we refer to 3.4 to see what it says about "inherited user-defined subprograms". 3.4(17) says, "For each user-defined primitive subprogram... of the parent type that already exists at the place of the derived_type_definition, there exists a corresponding _inherited_ primitive subprogram of the derived type with the same defining name". The primitive subprograms of the parent type that exist at the time Control_Type is defined are those that exist for Control_Type's parent type, Ada.Finalization.Controlled, namely Initialize, Finalize, Adjust. So to me, those are "the inherited user-defined subprograms" to which 3.2.3(4) refers. I've always interpreted it that way, just from the language of those two sections, independently of any other language in the RM or of any conclusion that a different interpretation would be nonsense. > Of course, that seems to be a nonsense interpretation of the language. I > think that 3.2.3(4) was intended to apply only to routines inherited from > the parent. I agree. 
I personally think the intent is already clear from the RM.

****************************************************************

From: Randy Brukardt
Sent: Thursday, April 29, 2004  9:59 PM

I've just posted the updated Container library AI. [This is version /03.]

This was updated to reflect the conclusions of the six hours of discussion
(which was a record for a single AI) at the Phoenix meeting.

I'm happy to say that most of the suggestions made here were implemented in
some way. Indefinite element containers were added, as well as a list
container. Set operations were added to the set package. Iteration was
changed somewhat to be more familiar to Ada programmers. The operations and
their semantics were made more regular.

Comments are welcome. (But please remember that I have to read and file all
of them for the permanent record, so try to take the long-winded discussions
of philosophy to comp.lang.ada. :-)

****************************************************************

From: Pascal Obry
Sent: Friday, April 30, 2004  1:26 AM

That's great news! Congratulations to all for the hard work on this issue.

****************************************************************

From: Marius Amado Alves
Sent: Friday, April 30, 2004  3:04 PM

> I've just posted the updated Container library AI...

Excellent! Just a tiny comment at this time: the names Indefinite_Vectors,
etc. do not sound right to me, because the element type is indefinite, not
the containers. Alternatives:

1. Containers.Indefinite_Elements.Vectors
2. Containers.Vectors_Of_Indefinite_Elements
3. Containers_Of_Indefinite_Elements.Vectors

("Indefinite_Elements" is not literally correct either because the type, not
the elements, is indefinite. But it is a common idiom to say "things" in
place of "thing type".)

I think I like 3.

****************************************************************

From: Jean-Pierre Rosen
Sent: Friday, April 30, 2004  8:20 AM

Everybody talks about a real vector, or a complex matrix. Doesn't seem to
hurt the mathematicians...

****************************************************************

From: Marius Amado Alves
Sent: Friday, April 30, 2004 11:21 AM

Vector of real numbers... real vector. Vector of elements of indefinite
type... vectors of indefinite elements... indefinite vector. Ok, I think the
ears will get accustomed.

****************************************************************

From: Jeffrey Carter
Sent: Friday, April 30, 2004  5:27 PM

> Comments are welcome. (But please remember that I have to read and file all
> of them for the permanent record, so try to take the long-winded discussions
> of philosophy to comp.lang.ada. :-)

Perhaps I'm missing something, but I don't see why the vector component needs
the assertion anymore. If it's not needed, it would be nice to eliminate it.

****************************************************************

From: Dan Eilers
Sent: Friday, April 30, 2004  6:25 PM

Some typos:

> All containers are non-limited, and hence allow ordinary assignment. In
> the unique case of a vector, there is a separate assignment procedure:
>
>     Assert (Target => V1, Source => V2);
      ^^^^^^
> The reason is that the model for a vector is that it's implemented using
> an unconstrained array. During ordinary assignment, the internal array
> is deallocated (during controlled finalization), and then a new internal
> [array] is allocated (during controlled adjustment) to store a copy of the
  ^^^^^^^

"is may not"
hat the average bucket
caching *effects
arbitary
conbined
evalution
exmples
Generic_Revserse_Find
heirarchy
Indefinited_Hashed_Maps
insuffiently
machinary
simplied
stratgies
sucessful

****************************************************************

From: Christoph Grein
Sent: Thursday, May 6, 2004  4:07 AM

A few more typos:

specify precisely where this will happen (it will happen no lat{t}er than the
                                                               ^^^

AARM Note: Replace_Element, Generic_Update, and Generic_Update_by_Index are
[the] only ways that an element can change from empty to non-empty.
^^^^^

Any exceptions raising during element assignment raised (as everywhere else)

cursor designates with a[n] index value (or a cursor designating an element at
                         ^^^

declared in Containers.Vectors with a[n] ambiguous (but not invalid, see below)
                                       ^^^

but it is {is} a *different* element
          ^^^^

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 6, 2004  3:07 PM

What happened to the Lower_Bound, Upper_Bound and "insert with hint"
operations for sets? They were very useful. Is there a way to make the same
kind of searches/updates with the new spec?

Furthermore, often the user already has a cursor value for an element that he
knows is a bound for another search he wants to make. It should be possible
to use this information to improve the search.

Previous versions (e.g. 1.1) of the spec had an "insert with hint" operation
providing something similar, albeit more restrictive (the known cursor had to
be adjacent). The current version does not have even this.

/* I found these requirements in real world situations, namely writing a
database system that uses large sets to store some things. */

At least implementation permissions/advice should exist allowing/encouraging
implementations to provide these optimized search/update operations. Namely
via operations with the standard profiles except having additional "hint"
parameters. Better yet make a number of these optimized profiles standard,
permitting the actual optimization to be null. To assure portability.

****************************************************************

From: Matthew Heaney
Sent: Monday, May 10, 2004  6:48 PM

> What happened to the Lower_Bound, Upper_Bound and "insert with hint"
> operations for sets? They were very useful. Is there a way to make the same
> kind of searches/updates with the new spec?

I tried to keep them, but I argued badly and hence lost that vote.

I discussed restoring these operations (Lower_Bound and Upper_Bound) with
Randy, and he said I'd have to post a message on ada-comment justifying why
those operations are needed. (You have to do it that way since the entire ARG
voted in the last meeting, and you have to give them the opportunity to
reconsider their decision during the next meeting.)

It's good that you're asking about this, since that's evidence that there is
interest in these operations from someone other than me. It would be helpful
if you could post a follow-up message on ada-comment giving a specific
example of why you need LB and UB. The ARG can then put this on the agenda
for the ARG meeting in Palma.

> Furthermore, often the user already has a cursor value for an element that he
> knows is a bound for another search he wants to make.
> It should be possible to use this information to improve the search.

That's similar to insert-with-hint. However, the ARG members weren't
persuaded by my defense of optimized insertion.

> Previous versions (e.g. 1.1) of the spec had an "insert with hint" operation
> providing something similar, albeit more restrictive (the known cursor had to
> be adjacent). The current version does not have even this.

Yes, that is correct.

Personally I can live without the insert-with-hint operations (because you
have insert-sans-hint), but I think removing the Lower_Bound and Upper_Bound
operations was a mistake, since that leaves no way to find the set element
nearest some key. All you have now is basically a membership test, which is
too coarse a granularity.

For example, someone on CLA had an ordered set of integers, and he wanted to
iterate over the values in [0, 1000), then from [1000, 2000), etc. Without
Lower_Bound there's no way to do that.

> /* I found these requirements in real world situations, namely writing a
> database system that uses large sets to store some things. */

Please post an example of what you're trying to do, and show how it can't be
done without Lower_Bound and Upper_Bound.

> At least implementation permissions/advice should exist allowing/encouraging
> implementations to provide these optimized search/update operations. Namely
> via operations with the standard profiles except having additional "hint"
> parameters. Better yet make a number of these optimized profiles standard,
> permitting the actual optimization to be null. To assure portability.

Give an example of why you need Lower_Bound and Upper_Bound, and request that
the ARG put it on the agenda for Palma.

Some other possibilities are:

   procedure Find
     (Container : in     Set;
      Key       : in     Key_Type;
      Position  :    out Cursor;
      Success   :    out Boolean);

If the Key matches, then Success=True and Position.Key = Key. Otherwise,
Success=False and Key < Position.Key.

Technically you don't need that, since you can test the result of
Lower_Bound:

   Position : Cursor := Lower_Bound (Set, Key);
begin
   if Key < Position then
      null;  --Position denotes next (successor) neighbor
   else
      null;  --Position denotes node containing Key
   end if;
end;

Another possibility is to name it something like "Ceiling" or whatever.

An additional possibility is something (STL) like:

   procedure Equal_Range
     (Container   : in     Set;
      Key         : in     Key_Type;
      Lower_Bound :    out Cursor;
      Upper_Bound :    out Cursor);

Then you can test:

   Lower_Bound  = Upper_Bound  =>  key not found
   Lower_Bound /= Upper_Bound  =>  found

This latter operation has the benefit of working with multisets too.

****************************************************************

From: Marius Amado Alves
Sent: Tuesday, May 11, 2004  6:55 AM

Upon Heaney's advice, I'll detail the case for optimized operations for sets.

I use Lower_Bound in the implementation of Mneson, in at least four
subprograms, excerpted below. For the entire code see
www.liacc.up.pt/~maa/mneson.

Mneson is a database system based on a directed graph implemented as a set of
links. Link_Sets is an instantiation of AI302.Containers.Ordered_Sets for
Link_Type, which is an array (1 .. 2) of vertices. Link_Set and Inv_Link_Set
are Link_Sets.Set_Type objects. Links are ordered by the 1st component, then
by the 2nd. Front_Vertex is an unlinked vertex value lower than any other.
procedure Delete_All_In_Range (Link_Set : in out Link_Sets.Set_Type; From, To : Link_Type) is use Link_Sets; I : Cursor_Type := Lower_Bound (Link_Set, From); begin while I /= Back (Link_Set) loop exit when Element (I) > To; Delete (Link_Set, I); end loop; end; procedure For_Each_Link_In_Range (Set : Link_Sets.Set_Type; From, To : Link_Type) is use Link_Sets; I : Cursor_Type := Lower_Bound (Set, From); E : Link_Type; begin I := Lower_Bound (Set, From); while I /= Back (Set) loop E := Element (I); exit when E > To; Process (E); Increment (I); end loop; end; function Connected (Source : Vertex) return Boolean is use Link_Sets; begin return Lower_Bound (Links, (Source, Front_Vertex)) /= Null_Cursor; end; function Inv_Connected (Target : Vertex) return Boolean is use Link_Sets; begin return Lower_Bound (Inv_Links, (Target, Front_Vertex)) /= Null_Cursor; end; I'm also developing optimized algorithms for set intersection that require not only Lower_Bound but also search with hint (known bounds), and eventually Upper_Bound. These are still on the drawing board, but I already know at this point that they require those operations. Soon I'll have some code, but it's rather complicated, because Mneson sets can be of various kinds, extensional and intensional, the basic extensional being a designated vertex whose targets are the elements, the intensional being a dedicated "selection" structure, designed for lazy evaluation, with elements being represented in several ways, and materialized only upon certain operations like iteration and extraction. My interest is databases. At least here, ordered sets are an incredibly useful thing. Pretty much every interesting database function can be defined in terms of them. In a graph-based implementation like Mneson, set intersection is crucial. The spec now has the full set algebra (union, intersection, differences, etc.) That is good, and if their performance were ideal for all purposes, I'd be silent. But I know their performance cannot be ideal in many situations, because I know optimization techniques that require more than what the spec now offers. Namely they require search with hint and/or Lower_Bound. And anyway the spec does not specify performance for them (only for Insert, Find, Element). Also note that the Find operations for Vectors and Hashed_Maps are kind of hintful, so it's only fair that Ordered_Sets have these versions too. For databases, performance is paramount. Even apparently small gains matter. "Apparently" because many database functions scale worse than lineary, e.g. cross products. Optimization is these cases is a must. In many cases the optimization makes all the difference (between feasible and unfeasible). Optimization is invariably based on knowledge the system prepares about the sets in the expression queried for computation. The preparation time is usually negligible. In a system implemented with Ada.Containers, great part of the prepared knowledge is ultimately expressed as cursor values for known element value bounds for the sought element ranges. Ordered_Sets implementations are likely to be able to take advantage of this knowledge for improving time performance (the previous AI302 "insert with hint" is an example). Therefore it is required that this knowledge can be passed to the basic operations. Immodestely assuming I've made a convincing case, I can inform that Heaney and myself have solid ideas on how the operations should look like and we are ready to prepare a pretty open-shut proposal for Palma. 
I myself will be there from Sunday to Sunday, and happily available for discussion. To me the most promising format is *prescribing* hintful versions of Find et al. but with only *advised* performance, i.e. allowing null optimization. **************************************************************** From: Matthew Heaney Sent: Tuesday, May 11, 2004 12:37 PM > Link_Sets is an instantiation of AI302.Containers.Ordered_Sets for > Link_Type, which is an array (1 .. 2) of vertices. Link_Set and Inv_Link_Set > are Link_Sets.Set_Type objects. Links are ordered by the 1st component, then > by the 2nd. Front_Vertex is an unlinked vertex value lower that any other. > > procedure Delete_All_In_Range > (Link_Set : in out Link_Sets.Set_Type; From, To : Link_Type) > is > use Link_Sets; > I : Cursor_Type := Lower_Bound (Link_Set, From); > begin > while I /= Back (Link_Set) loop > exit when Element (I) > To; > Delete (Link_Set, I); > end loop; > end; You might want to vet From and To, to assert that they're in order. It also looks like you mean to delete the node designated by To (this is apparently a closed range), which means you could use Upper_Bound to find the endpoint of the range: procedure Delete_All_In_Range (Link_Set : in out Link_Sets.Set_Type; From, To : Link_Type) is pragma Assert (From <= To); use Link_Sets; I : Cursor_Type := Lower_Bound (Link_Set, From); J : constant Cursor_Type := Upper_Bound (Link_Set, To); begin while I /= J loop Delete (Link_Set, I); end loop; end; > procedure For_Each_Link_In_Range > (Set : Link_Sets.Set_Type; From, To : Link_Type) > is > use Link_Sets; > I : Cursor_Type := Lower_Bound (Set, From); > E : Link_Type; > begin > I := Lower_Bound (Set, From); --??? > while I /= Back (Set) loop > E := Element (I); > exit when E > To; > Process (E); > Increment (I); > end loop; > end; This again appears to be a closed range, so I recommend using Upper_Bound to find the endpoint: procedure For_Each_Link_In_Range (Set : Link_Sets.Set_Type; From, To : Link_Type) is pragma Assert (From <= To); use Link_Sets; I : Cursor_Type := Lower_Bound (Set, From); J : constant Cursor_Type := Upper_Bound (Set, To); begin while I /= J loop Process (Element (I)); Increment (I); end loop; end; Alternatively, you could use the new Generic_Update procedure: procedure For_Each_Link_In_Range (Set : Link_Sets.Set_Type; From, To : Link_Type) is pragma Assert (From <= To); use Link_Sets; procedure Process (E : in out Link_Type) is begin ...; --whatever end; procedure Update is new Generic_Update; I : Cursor_Type := Lower_Bound (Set, From); J : constant Cursor_Type := Upper_Bound (Set, To); begin while I /= J loop Update (I); Increment (I); end loop; end; (Note that I only have the vectors done in the reference implementation.) > function Connected (Source : Vertex) return Boolean is > use Link_Sets; > begin > return > Lower_Bound (Links, (Source, Front_Vertex)) /= Null_Cursor; > end; Lower_Bound will only return Null_Cursor if the value is greater than every element in the set. So it looks like you're testing whether the value is less than or equal to an element in the set. 
There are probably other ways to implement this predicate function, for example: function Connected (Source : Vertex) return Boolean is use Link_Sets; begin if Is_Empty (Source) then return False; end if; return Link_Type'(Source, Front_Vector) <= Last_Element (Links); end; > function Inv_Connected (Target : Vertex) return Boolean is > use Link_Sets; > begin > return > Lower_Bound (Inv_Links, (Target, Front_Vertex)) /= Null_Cursor; > end; Ditto for this function. The moral here is you don't need Lower_Bound if all you do is throw away its result. However, it looks like in the first two examples, you have a legitimate need for Lower_Bound (and arguably Upper_Bound, too). **************************************************************** From: Marius Amado Alves Sent: Tuesday, May 11, 2004 1:20 PM > ... > > function Connected (Source : Vertex) return Boolean is > > use Link_Sets; > > begin > > return > > Lower_Bound (Links, (Source, Front_Vertex)) /= Null_Cursor; > > end; > > > Lower_Bound will only return Null_Cursor if the value is greater than > every element in the set. Oops, this was a bug. Thanks a lot for catching it. What I must have meant is: X := Lower_Bound (Links, (Source, Front_Vertex)); return X /= Null_Cursor and then Element (X) (1) = Source; Thanks a lot for the other suggestions too. I won't be applying them yet because if-it-works-dont-fix-it, but I've certainly queued them in the Mneson "to do" list. > ...it looks like in the first two examples, you have a legitimate > need for Lower_Bound (and arguably Upper_Bound, too). Yes. And these, unlike the specific version of Connect above, are used and tested. (It seems the specific version of Connect above had not been used yet. Which accounts for it's fault not being detected. It's there in the library because when I wrote the libary it looked like it would be necessary. Thanks to you, if and when it does, it will be flawless. Thanks again.) **************************************************************** From: Tucker Taft Sent: Tuesday, May 11, 2004 2:17 PM I think I missed the beginning of this discussion, but I would agree with the suggestion for using Floor and Ceiling rather than Lower_Bound and Upper_Bound, to find the nearest element of the set no greater (or no less, respectively) than a given value. And I agree they would be useful operations on an ordered set. Lower_Bound and Upper_Bound seem more likely to refer to the minimum and maximum elements of the entire set. **************************************************************** From: Marius Amado Alves Sent: Tuesday, May 11, 2004 3:31 PM > ... I would agree with the suggestion for using > Floor and Ceiling... Good. One of the proposals I'm discussing with Matt has indeed function Floor (Item : Element_Type) return Cursor; function Ceiling (Item : Element_Type) return Cursor; where Ceiling = Lower_Bound, but Floor /= Upper_Bound, Floor = Reverse_Lower_Bound. (Here Lower_Bound and Upper_Bound are the functions defined in version 1.1 of the spec, and that were dropped in the current. Reverse_Lower_Bound is a fictitious function like Lower_Bound but in reverse order.) The proposal also has function Slice (Container : Set; Low, High : Cursor) return Set; function Open_Bound (Position : Cursor) return Cursor; The four functions provide a complete search optimization framework. The main idea is that a slice can be used to convey range and/or optimization information to any operation. 
Slice returns the subset of Set consisting of the elements of Set that are in
the specified interval. Open_Bound returns a cursor marked as an open bound
when used in Slice. An unmarked cursor represents a closed bound.

The integer set example, namely to iterate over the values in [0, 1000), then
[1000, 2000), etc., becomes:

   procedure Iterate is new Generic_Iteration;
begin
   Iterate (Slice (Integer_Set, Floor (0), Open_Bound (Ceiling (1000))));
   Iterate (Slice (Integer_Set, Floor (1000), Open_Bound (Ceiling (2000))));

Also, with this framework, Upper_Bound (Set, Item) can be realised
functionally as:

   First (Slice (Set, Open_Bound (Ceiling (Set, Item)), Last (Set)))

So no need for Upper_Bound.

The only sensitive aspect of this framework is the use of a slice as an
object of update operations (Insert, etc.) A slice is likely to be best
represented as a 'virtual' set, i.e. only a 'view' to the corresponding
subset of its 'ground' container. We are currently checking whether and how
and which update operations can process this virtual object properly.

****************************************************************

From: Matthew Heaney
Sent: Tuesday, May 11, 2004  5:13 PM

> Lower_Bound and Upper_Bound seem more likely to refer to the
> minimum and maximum elements of the entire set.

As Mario has pointed out, Ceiling is equivalent to Lower_Bound.

There is no function that corresponds to a Floor function in the STL,
Charles, or earlier releases of the AI-302 draft. I did discuss how to
implement a floor function in the examples section of earlier drafts, as
follows:

   Floor (S, K) = Previous (Upper_Bound (S, K))

Here Floor is derived from Upper_Bound. In the most recent draft, for subtle
reasons you have to implement Floor as:

   function Floor (S, K) return Cursor is
      C : Cursor := Upper_Bound (S, K);
   begin
      if C = No_Element then
         return Last (S);
      else
         return Previous (C);
      end if;
   end;

To derive Upper_Bound from Floor, I think it would be:

   function Upper_Bound (S, K) return Cursor is
      C : Cursor := Floor (S, K);
   begin
      if C = No_Element then
         return First (S);
      else
         return Next (C);
      end if;
   end;

To iterate over the half-open range [K1, K2), where K1 <= K2, I think you
would have to write:

   declare
      I : Cursor := Ceiling (S, K1);
      J : Cursor := Floor (S, K2);
   begin
      if J = No_Element then
         J := First (S);
      end if;

      while I /= J loop
         ...
         Next (I);
      end loop;
   end;

However, this seems a little awkward. (Assuming my analysis is correct. I
have to think about whether [K1, K2) is a closed range or a half-open range.
Mario's example was a closed range.)

What we really need is something to complement Ceiling, something like
"Strict_Ceiling" or "Proper_Ceiling", e.g.

   declare
      I : Cursor := Ceiling (S, K1);          -- K1 <= I.Key
      J : Cursor := Proper_Ceiling (S, K2);   -- K2 <  J.Key
   begin
      while I /= J loop
         ...;
      end loop;
   end;

Is there a technical term for "proper ceiling"? I want a function that, given
a key, returns the smallest key greater than the key. (That's what function
Upper_Bound returns, but that name seems to be confusing to people unfamiliar
with the STL.)

****************************************************************

From: Marius Amado Alves
Sent: Tuesday, May 11, 2004  5:19 PM

Two corrections:

Slice returns the subset of *Container* consisting of the elements of
*Container* that are in the specified interval (not Set, that's the type).

Upper_Bound (S, Item) =
   First (Slice (S, Open_Bound (*Floor* (S, Item)), Last (S)))

(S instead of Set because that's the type name, and Floor, not Ceiling)

Sorry.
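[Editor's note: For reference, here is a small self-contained version of the comp.lang.ada example mentioned earlier -- iterating an ordered set of Integers in blocks [0, 1000), [1000, 2000), and so on -- written against the Ceiling operation being discussed in this thread. Ceiling is not in the current draft, and the instantiation, the four-parameter Insert, and procedure Next are assumptions based on the profiles used in these messages, not proposed wording.]

   with AI302.Containers.Ordered_Sets;
   with Ada.Text_IO;

   procedure Blocks_Demo is

      package Integer_Sets is
         new AI302.Containers.Ordered_Sets (Element_Type => Integer);
      use Integer_Sets;

      S : Set;

      procedure Process_Block (Low : Integer) is
         --  Process the elements in the half-open range [Low, Low + 1000).
         --  Ceiling (S, Low) is the first element of the block;
         --  Ceiling (S, Low + 1000) is the first element past the block
         --  (No_Element if the block runs to the end of the set).
         C    : Cursor := Ceiling (S, Low);
         Stop : constant Cursor := Ceiling (S, Low + 1000);
      begin
         while C /= Stop loop
            Ada.Text_IO.Put_Line (Integer'Image (Element (C)));
            Next (C);
         end loop;
      end Process_Block;

      Position : Cursor;
      Inserted : Boolean;

   begin
      for I in 1 .. 50 loop
         Insert (S, 137 * I, Position, Inserted);  --  arbitrary sample data
      end loop;

      Process_Block (0);
      Process_Block (1000);
      Process_Block (2000);
   end Blocks_Demo;

[The same block could also be expressed as a slice, as Marius shows above; the point of this sketch is only that the half-open case needs nothing beyond Ceiling.]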
BTW, Reverse_Upper_Bound (S, Item) = Last (Slice (S, First (S), Open_Bound (Ceiling (S, Item)))). Also, S = Slice (S, First (S), Last (S)) should always hold. Currently thinking about the special cases, namely those with occurrences of No_Element. And about the slice-for-update problem: easily solved with a specification similar to the current one for invalid cursors, given that a slice is expressed as cursor values. **************************************************************** From: Marius Amado Alves Sent: Wednesday, May 12, 2004 3:15 AM > Here Floor is derived from Upper_Bound. In the most recent draft, for > subtle reasons you have to implement Floor as: > > function Floor (S, K) return Cursor is > C : Cursor := Upper_Bound (S, K); > begin... But you don't have Upper_Bound in the most recent draft! > What we really need is something to compliment Ceiling, something like > "Strict_Ceiling" or "Proper_Ceiling", e.g. I'm against strange things in the spec. Give the user only well known concepts. A complete set of primitive well known concepts. Ceiling, Floor, Slice, Open_Bound. Then he can derive whatever Strange_Ceiling he wants. > Is there a technical term for "proper ceiling"? Smallest_Greater_Than :-) But then to be complete you need also Greatest_Smaller_Than. But then again, don't give strange things. **************************************************************** From: Marius Amado Alves Sent: Wednesday, May 12, 2004 3:44 AM > procedure Delete_All_In_Range > (Link_Set : in out Link_Sets.Set_Type; From, To : Link_Type) > is > pragma Assert (From <= To); > > use Link_Sets; > I : Cursor_Type := Lower_Bound (Link_Set, From); > J : constant Cursor_Type := Upper_Bound (Link_Set, To); > begin > while I /= J loop > Delete (Link_Set, I); > end loop; > end; My impression is that the original version is more efficient, because it only calls a search function once (Lower_Bound). Your version makes two calls (Lower_Bound, Upper_Bound). I assume these operations have O(log n) time performance, and the others (Back, Element, Delete) constant time. But my version calls these more times. So I'd have to check the absolute times. This also provides an example for optimized search with Slice. Because I know the upper bound must be above the lower, I could pass this information to Upper_Bound: J : constant Cursor_Type := Upper_Bound (Slice (Link_Set, I, Last (Link_Set)), To); **************************************************************** From: Matthew Heaney Sent: Wednesday, May 12, 2004 9:33 AM >>What we really need is something to compliment Ceiling, something like >>"Strict_Ceiling" or "Proper_Ceiling", e.g. > > I'm against strange things in the spec. Give the user only well known > concepts. A complete set of primitive well known concepts. Ceiling, Floor, > Slice, Open_Bound. Then he can derive whatever Strange_Ceiling he wants. Does Upper_Bound qualify as a "well known concept"? All I'm trying to do is come up with another name for Upper_Bound. >>Is there a technical term for "proper ceiling"? > > Smallest_Greater_Than :-) But then to be complete you need also > Greatest_Smaller_Than. But then again, don't give strange things. But Upper_Bound isn't a strange thing. I suspect Stepanov was motivated by the set-theoretic terms "upper bound," "least upper bound", etc. But I think it's that conflation to which Tucker objects. Other names for Upper_Bound are: limit, supremum, supremum limit, etc. 
procedure Op (K1, K2 : Key_Type) is I : Cursor := Ceiling (Set, K1); J : constant Cursor := Limit (Set, K2); begin while I /= J loop ...; end; **************************************************************** From: Matthew Heaney Sent: Wednesday, May 12, 2004 10:16 AM > Lower_Bound and Upper_Bound seem more likely to refer to the > minimum and maximum elements of the entire set. One counter-argument is that both Lower_Bound and Upper_Bound accept a key. Maybe we could provide these: Lower_Limit Floor Ceiling (AKA Lower_Bound) Upper_Limit (AKA Upper_Bound) with the following semantics: Key (Lower_Limit (S, K)) < K Key (Floor (S, K)) <= K Key (Ceiling (S, K)) >= K Key (Upper_Limit (S, K)) > K **************************************************************** From: Tucker Taft Sent: Wednesday, May 12, 2004 11:41 AM I don't find the names Lower_Limit and Upper_Limit a whole lot better than Lower_Bound/Upper_Bound. I don't see why you need them. It seems Lower_Limit(S,K) = Previous(Ceiling(S,K)) and Upper_Limit(S,K) = Next(Floor(S,K)) Or am I confused? **************************************************************** From: Matthew Heaney Sent: Wednesday, May 12, 2004 1:46 PM No, you got it right, except for the endpoints; see my last message. For example, if Ceiling(S,K) returns No_Element (because K is large), then Previous(Ceiling(S,K)) returns No_Element, whereas Lower_Limit returns Last(S). We can define the abstraction to have the semantics you describe above, but I think that requires that (1) the set has an internal sentinel and (2) type Cursor is privately tagged. **************************************************************** From: Marius Amado Alves Sent: Wednesday, May 12, 2004 10:36 AM > Lower_Limit > Floor > Ceiling (AKA Lower_Bound) > Upper_Limit (AKA Upper_Bound) Better to keep a consistent metaphor: Ground, Floor, Ceiling, Roof. "Limit" is too abstract. Alternatives for Ground: Basement, Base... Underworld :-) **************************************************************** From: Marius Amado Alves Sent: Wednesday, May 12, 2004 11:47 AM > Lower_Limit > Floor > Ceiling (AKA Lower_Bound) > Upper_Limit (AKA Upper_Bound) In mathematics "lower limit" applies to a sequence of values (e.g. the values of sin (x) with x from zero to infinity), and means the least value of the sequence. So it's really more similar to First. [My main source for checking this stuff has been the Wikipedia (en.wikipedia.org)] Have you considered my Slice, Open_Bound proposal yet? Recapitulating: Ground, Floor, Ceiling, Roof, do not solve the problem of providing search optimization information to the other operations. Slice, Open_Bound do. And Ground, Roof can be derived from Ceiling, Floor, Slice, Open_Bound, First, Last. So my proposal is adding Ceiling, Floor, Slice, Open_Bound. And eventually Ground, Roof defined as "equivalent to..." **************************************************************** From: Marius Amado Alves Sent: Wednesday, May 12, 2004 8:22 AM Connected *is* a legitimate example of the need for Lower_Bound. The fixed Connected body is X : Cursor_Type := Lower_Bound (Links, (Source, Front_Vertex)); begin return X /= Null_Cursor and then Element (X) (1) = Source; Matt, your suggestion, > function Connected (Source : Vertex) return Boolean is > use Link_Sets; > begin > if Is_Empty (Source) then > return False; > end if; > > return Link_Type'(Source, Front_Vector) <= Last_Element (Links); > end; won't work. 
Apart from the obvious bugs Is_Empty (Source) which should be Is_Empty
(Links), and Front_Vector which should be Front_Vertex, the return expression

   Link_Type'(Source, Front_Vertex) <= Last_Element (Links)

does not yield as desired. Front_Vertex is a value that is never connected
and is lower than any other. Let's say Front_Vertex = 0 and Links = ((2, 3),
(2, 4)). Then Connected (1) would (erroneously) return True, because
(1, 0) <= (2, 4). You're not checking for actual membership in Links. Maybe
you had something else in mind.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, May 12, 2004 10:30 AM

<>

Not terribly, no.

<>

Actually the term "least upper bound" in mathematics is what we have been
calling *Lower_Bound*, or Ceiling. And "greatest lower bound" is Floor. I
don't know a mathematical term for what we have been calling Upper_Bound.
Which to me indicates a bit of strangeness.

Also, Upper_Bound (let's keep calling it that):

- does not seem to be so useful as Ceiling, if at all
- can be derived with First, Last, and Slice

My previous examples demonstrate this.

But the term you're looking for might be: Roof.

****************************************************************

From: Tucker Taft
Sent: Wednesday, May 12, 2004 12:40 PM

> Have you considered my Slice, Open_Bound proposal yet?

I don't see the need for "Slice" or "Open_Bound". These seem to be
introducing a layer of "virtual" set on top, which you could do with a new
abstraction. Is there a real efficiency need here, or just a desire for the
additional abstraction level?

For example, it seems using an Open_Bound as the high bound of an iteration
is equivalent to iterating up to Previous(Ceiling()). You can easily create a
"real" slice by iterating from the low bound to the high bound and insert the
result in a new set. If you want a "virtual" slice, then to me that is an
additional layer on top, and not something appropriate for the basic
Ordered_Sets abstraction.

...

> So my proposal is adding Ceiling, Floor, Slice, Open_Bound.
>
> And eventually Ground, Roof defined as "equivalent to..."

I don't see the need to go beyond Floor and Ceiling. They seem to provide all
the primitives needed to enable the efficient construction of any operations
you might want, and I believe their meaning is more intuitive than the others
you have suggested.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, May 12, 2004  1:25 PM

> For example, it seems using an Open_Bound as the high
> bound of an iteration is equivalent to iterating up to
> Previous(Ceiling()).

This requires care, since Ceiling can return No_Element if the key is greater
than every key in the set. To make your algorithm fully general I think you'd
have to say:

   declare
      C : Cursor := Ceiling (S, K);
   begin
      if Has_Element (C) then
         Previous (C);
      else
         C := Last (S);
      end if;
      ...
   end;

> I don't see the need to go beyond Floor and Ceiling. They
> seem to provide all the primitives needed to enable the
> efficient construction of any operations you might want,
> and I believe their meaning is more intuitive than the others
> you have suggested.

As above, the problem case is when Floor returns No_Element, because the key
is less than every key in the set. To implement an equivalent of Upper_Bound,
it's not good enough to say Next (Floor (S, K)); you have to say instead:

   declare
      C : Cursor := Floor (S, K);
   begin
      if Has_Element (C) then
         Next (C);
      else
         C := First (S);
      end if;
      ...
end; I don't know whether this is really a problem, but I just wanted to bring it up. Having to handle the endpoints as a special case is a consequence of the fact that we got rid of the internal sentinel node. Another possibility is to restore the sentinel, and then define rules for how it compares to the deferred constant No_Element. Assuming type Cursor is defined as: type Node_Type is record -- red-black tree node Color : Color_Type; ... end record; type Cursor is record Node : Node_Access; end record; No_Element : constant Cursor := (Node => null); function Has_Element (C : Cursor) return Boolean is begin if C.Node = null then return False; end if; if C.Node.Color = White then -- sentinel has special color return False; end if; return True; end; function "=" (L, R : Cursor) return Boolean is begin if L.Node = null or else L.Node.Color = White then return R.Node = null or else R.Node.Color = White; end if; if R.Node = null or else R.Node.Color = White then return False; end if; return L.Node = R.Node; end; The problem of course is that "=" for type Cursor overrides predefined "=", which means predefined "=" re-emerges when type Cursor is a record or array component, or when type Cursor is a generic actual type. I suppose we could privately tag type Cursor, to guarantee that predefined "=" never re-emerges. I was trying to avoid that, however. **************************************************************** From: Marius Amado Alves Sent: Wednesday, May 12, 2004 1:47 PM "Lower_Limit(S,K) = Previous(Ceiling(S,K))" You mean Lower_Limit (S, K) = Previous (Floor (S, K)). But this fails when Floor (S, K) < K. **************************************************************** From: Matthew Heaney Sent: Wednesday, May 12, 2004 1:56 PM No. Tucker was correct. **************************************************************** From: Tucker Taft Sent: Wednesday, May 12, 2004 2:21 PM No, I meant what I wrote, based on Matt's specification that Key(Lower_Limit(S,K)) < K. I'm not sure you and Matt have the same definition in mind for all these functions. In particularly I get the sense that your definition of Lower_Bound is the opposite of his. I understand the notion of Greatest_Lower_Bound on a lattice, but I have never quite understand how that relates to Lower_Bound. In any case, I was focusing on the specifications that Matt gave for Lower_Limit and Upper_Limit, and based my equivalence on those. And I realize my equivalence fails at the end points, but I suspect that some special handling may be required for those in any case, and it is easy enough for the user to define a function that does what is desired (e.g. Previous_Or_Last() which returns Last when given No_Element). > But this fails when Floor (S, K) < K. That's why I wrote Previous(Ceiling(S,K)). **************************************************************** From: Marius Amado Alves Sent: Thursday, May 13, 2004 4:30 PM Oops, sorry. *I* was confused. 
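[Editor's note: A sketch of the user-written helpers Tucker suggests above, built only from the Floor and Ceiling operations under discussion plus First, Last, Next, and Previous. The instantiation Int_Sets and the exact profiles are assumptions for illustration, not draft wording.]

   with AI302.Containers.Ordered_Sets;

   package Int_Sets is
      new AI302.Containers.Ordered_Sets (Element_Type => Integer);

   with Int_Sets; use Int_Sets;

   package Set_Helpers is

      function Lower_Limit (S : Set; K : Integer) return Cursor;
      --  Largest element strictly less than K (No_Element if there is none).

      function Upper_Limit (S : Set; K : Integer) return Cursor;
      --  Smallest element strictly greater than K (No_Element if there is none).

   end Set_Helpers;

   package body Set_Helpers is

      function Lower_Limit (S : Set; K : Integer) return Cursor is
         C : constant Cursor := Ceiling (S, K);  --  smallest element >= K
      begin
         if C = No_Element then
            return Last (S);      --  every element (if any) is < K
         else
            return Previous (C);  --  No_Element when every element is >= K
         end if;
      end Lower_Limit;

      function Upper_Limit (S : Set; K : Integer) return Cursor is
         C : constant Cursor := Floor (S, K);    --  largest element <= K
      begin
         if C = No_Element then
            return First (S);     --  every element (if any) is > K
         else
            return Next (C);      --  No_Element when every element is <= K
         end if;
      end Upper_Limit;

   end Set_Helpers;

[Whether such helpers belong in the standard package or in user code is exactly the question being debated here.]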
By the way, I checked the names so far, and they are (aligned, but in no
specific order):

   Version 1.1           Mathematics             My names    Matthew's    AKAs, other
   ----------------------------------------------------------------------------------
   Lower_Bound           least upper bound       Ceiling     Ceiling
   Upper_Bound                                   Roof        Upper_Limit
                         greatest lower bound    Floor       Floor
   Reverse_Lower_Bound                           Ground      Lower_Limit
   Reverse_Upper_Bound
                         lower limit             First
                         upper limit             Last
   ----------------------------------------------------------------------------------

****************************************************************

From: Matthew Heaney
Sent: Wednesday, May 12, 2004  2:18 PM

> I don't see why you need them.  It seems
>    Lower_Limit(S,K) = Previous(Ceiling(S,K)) and
>    Upper_Limit(S,K) = Next(Floor(S,K))

Thinking about this issue some more, there might be a way to create these
semantics without a sentinel. If a cursor is implemented this way:

   type Cursor is record
      Container : Set_Access;
      Node      : Node_Access;
   end record;

In which case you could implement Previous as:

   function Previous (C : Cursor) return Cursor is
   begin
      if C.Container = null then  --No_Element
         return C;
      end if;

      if C.Node = null then  --pseudo-sentinel
         return C;  --or: Last (C.Container)
      end if;

      if C = First (C.Container) then
         return (C.Container, null);  --pseudo-sentinel
      end if;

      return Previous (C.Container.Tree, C.Node);
   end;

Next would be implemented similarly.

The only issue here is that Previous (First (S)) /= No_Element (the LHS has a
non-null set pointer, the RHS has a null set pointer). I don't know if this
is an issue.

****************************************************************

From: Tucker Taft
Sent: Wednesday, May 12, 2004  2:30 PM

I don't think we need to change "Previous" to make these equivalences work
for endpoints. Just let the user write a "Previous_Or_Last" if they really
want to, which would need to take both a cursor and a set. Or more directly,
write Lower_Limit or Upper_Limit if you want them, since these already have
enough information with the set and the key.

Providing Ceiling and Floor still seems adequate to me, as they provide the
needed primitives for all other operations mentioned thus far.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, May 12, 2004  2:36 PM

OK. That seems reasonable. I just wanted to make sure we were on the same
page w.r.t the behavior at the endpoints.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, May 12, 2004  1:28 PM

<>

Efficiency. Slice is a simple way of passing known bounds to *any* operation.

As an example consider the usual scenario from accounting where you have
invoices, and each invoice has a variable number of items. The relational
representation of this database includes a set Items of (Invoice_Id, Item_Id)
pairs, ordered by (Invoice_Id, Item_Id). You want to insert a new invoice X
with items A and B. Without Slice you do:

   Insert (Items, (X, A), Point_XA, Ok);
   Insert (Items, (X, B), Point_XB, Ok);

Each time Insert will have to search for the insertion point from the start
(e.g. from the root of a binary tree). But clearly Point_XA is close to
Point_XB, so if there was a way of telling Insert that we are inserting
(X, B) next to Point_XA, Insert could start looking from there to great
advantage. Slice provides that way.
Insert (Slice (Items, Point_XA, Last (Items)), (X, B), Ok); You could even save some extra micro-seconds writing: Insert (Slice (Items, Open_Bound (Point_XA), Last (Items)), (X, B), Ok); [Of course there are other ways, not relational, of representing the data. For example, Items could be a set of pairs (Invoice_Id, Item_Set), where Item_Set is a set of items. But there are a number of reasons why you might want the relational scheme. One is that with this scheme you can search Items by properties of Item_Id. For example you might want to know which invoices sold part number 12345. One more subtle reason--not applicable to this example, but occuring in other common situations--has to do with the unfortunate fact that it is not possible to have recursive containers without resorting to a pointer idiom. There are other reasons.] **************************************************************** From: Tucker Taft Sent: Wednesday, May 12, 2004 3:07 AM > Efficiency. Slice is a simple way of passing known > bounds to *any* operation.... If I understand you, Slice is not a copy, but a by-reference subset of a set, created for the purpose of improving performance. I don't find this example sufficiently compelling to include it in a basic capability like Ordered_Sets. It requires significant set up by the user, and it seems possible that in some implementations, it would be a waste of energy. I like "Ceiling" and "Floor" because they address the common notion of "nearest" element or approximate match, something which makes sense to ask in a set. Slice and Open_Bound seem to only serve some more obscure performance concern, which I don't see of being of wide or general usefulness. All these things involve subtle tradeoffs, and I accept you might make different choices, but we are looking to provide the 20% of all possible set operations that together meet the needs of 80% of the typical users of sets. **************************************************************** From: Marius Amado Alves Sent: Thursday, May 13, 2004 6:34 AM > > Efficiency. Slice is a simple way of passing known > > bounds to *any* operation.... > > If I understand you, Slice is not a copy, but a by-reference > subset of a set, created for the purpose of improving performance. Exactly. It *must not* be a copy. > I don't find this example sufficiently compelling to include it > in a basic capability like Ordered_Sets. It requires significant > set up by the user, and it seems possible that in some implementations, > it would be a waste of energy. The setup is not significant because the user can always ignore the slice idiom, and/or only use it when the known bounds have been acquired naturally from previous operations required by the application logic, as was the case in the invoices example. The implementation is easy, especially if null optimization is allowed, as I proposed (a slice obviously knows about its base container, so an unoptimized operation can just call itself with the base). But in most implementations, namely using trees or skip lists, the implementation of non-null optimization is also easy, because usually the internal search primitives are recursive operations accepting bounds expressed as a node or nodes, and the Cursor type is likely to have node information, as in the previous Matt's study. So not a waste of energy. The implementation is already there. And here's one more real life example. Website access analysis. You want to identify sessions from a HTTP requests log file. 
A session is a sequence of requests from the same IP such that the time
between each consecutive request does not exceed 30 minutes (this is a common
criterion). You want to update each request with the corresponding (computed)
session id.

You key the access log by (IP, Time), and traverse the entire file to effect
this logic. You will have naturally collected bounds for the fine search and
update operations. Rather strict bounds, giving you (in a non-null
optimization implementation) orders of magnitude gains in time.

The usual application of website analysis is for huge log files, of tens of
millions of accesses. The gains could mean a difference from hours to minutes
or seconds. I've done this stuff using database systems (Postgres, MySQL),
and the scripts ran four hours. I wasn't able to optimize more because of the
same reasons we're discussing here: lack of ways to pass known bounds to the
core data engine. I've done this kind of work in several real life
applications, including

   http://soleunet.ijs.si/website/other/final_report/html/WP5-s9.html

Note that with the increasing availability of large RAM, the tendency is
towards *prevalent* systems, where all data required for search and retrieval
is held in RAM during work. An optimized Ada.Containers library could mean a
great plus for Ada in this area. Databases have been identified as a
promising area for Ada. Perhaps the DADAISM project would not have stalled if
there had been optimized Ada.Containers around then.

Open_Bound is not strictly required for optimization, but together with Slice
it provides a means to express any kind of interval.

I understand the 20/80 rule. It's just that in my perception the addition of
Slice configures 20.5/90 or so. Say 21/95 further adding Open_Bound.

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 13, 2004  7:36 AM

Note that Slice is also useful for non-optimization purposes. For example,
currently to process a "range" you must use the "active" iterator idiom:

   I : Cursor := From;
begin
   while I <= To loop
      Process (I);
      Next (I);
   end loop;

With Slice you have access to the "passive" idiom right out of the box:

   procedure Iterate is new Generic_Iteration;
begin
   Iterate (Slice (S, From, To));

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 13, 2004  8:08 AM

> ...you could use the new Generic_Update procedure:
>
>    procedure For_Each_Link_In_Range
>      (Set : Link_Sets.Set_Type; From, To : Link_Type)
>    is
>       pragma Assert (From <= To);
>
>       use Link_Sets;
>
>       procedure Process (E : in out Link_Type) is
>       begin
>          ...;  --whatever
>       end;
>
>       procedure Update is new Generic_Update;
>
>       I : Cursor_Type := Lower_Bound (Set, From);
>       J : constant Cursor_Type := Upper_Bound (Set, To);
>    begin
>       while I /= J loop
>          Update (I);
>          Increment (I);
>       end loop;
>    end;

Generic_Update is excellent stuff. It does not apply in this particular case
though, because Mneson links are immutable by design (can only be created or
deleted, never changed). But there are a lot of element update situations in
other applications, and so having Generic_Update is a great improvement from
version 1.1 of the spec and corresponding reference implementation (that is
the one currently used by Mneson, and that does not have Generic_Update --
good thing that Mneson does not need it :-).
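[Editor's note: A rough, self-contained sketch of the log-analysis pattern described above, again written against the Ceiling operation proposed in this thread (the current draft has no such operation). The types Time_Stamp, IP_Number, and Access_Record, the constant Session_Gap, and the package names are all illustrative.]

   with AI302.Containers.Ordered_Sets;

   package Log_Analysis is

      type Time_Stamp is new Natural;   --  e.g. seconds since some epoch
      type IP_Number  is new Natural;   --  stand-in for a real IP address type

      type Access_Record is record
         IP   : IP_Number;
         Time : Time_Stamp;
      end record;

      function "<" (L, R : Access_Record) return Boolean;
      --  Order by IP, then by Time: the (IP, Time) key described above.

      package Access_Sets is
         new AI302.Containers.Ordered_Sets (Element_Type => Access_Record);

      Session_Gap : constant Time_Stamp := 30 * 60;   --  30 minutes

      function Session_Count
        (Accesses : Access_Sets.Set;
         IP       : IP_Number) return Natural;
      --  Number of sessions for IP, splitting whenever the gap between
      --  consecutive requests exceeds Session_Gap.

   end Log_Analysis;

   package body Log_Analysis is

      function "<" (L, R : Access_Record) return Boolean is
      begin
         return L.IP < R.IP
           or else (L.IP = R.IP and then L.Time < R.Time);
      end "<";

      function Session_Count
        (Accesses : Access_Sets.Set;
         IP       : IP_Number) return Natural
      is
         use Access_Sets;

         --  Ceiling jumps straight to this IP's first request: the smallest
         --  element not less than (IP, Time_Stamp'First).
         C      : Cursor := Ceiling (Accesses, (IP, Time_Stamp'First));
         Prev   : Time_Stamp := Time_Stamp'First;
         Result : Natural := 0;

      begin
         while C /= No_Element and then Element (C).IP = IP loop
            if Result = 0 or else Element (C).Time - Prev > Session_Gap then
               Result := Result + 1;   --  a new session starts here
            end if;
            Prev := Element (C).Time;
            Next (C);
         end loop;
         return Result;
      end Session_Count;

   end Log_Analysis;

[Only the initial positioning benefits from Ceiling; the walk itself is the ordinary active-iterator idiom.]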
****************************************************************

From: Matthew Heaney
Sent: Thursday, May 13, 2004 10:24 AM

This will have to be written as

   I : Cursor := Ceiling (Set, From);
   J : Cursor := Floor (Set, To);
begin
   if J = No_Element then  --To is small key
      pragma Assert (I = First (Set));
      return;
   end if;

   Next (J);  --now has value of Upper_Bound

   while I /= J loop
      Update (I);
      Next (I);
   end loop;
end;

> Generic_Update is excellent stuff. It does not apply in this particular case
> though, because Mneson links are immutable by design (can only be created or
> deleted, never changed). But there are a lot of element update situations in
> other applications, and so having Generic_Update is a great improvement from
> version 1.1 of the spec and corresponding reference implementation (that is
> the one currently used by Mneson, and that does not have
> Generic_Update--good thing that Mneson does not need them :-).

Generic_Update is equivalent to Generic_Element. The only difference is that
Generic_Update doesn't require the element to be aliased. It provides no new
functionality relative to the 1.1 spec.

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 13, 2004 12:13 PM

I see. I didn't need it for Mneson so it was there in the book but not in my
mind. Anyway Generic_Update is better because it's pointerless :-)

****************************************************************

From: Tucker Taft
Sent: Thursday, May 13, 2004 10:33 AM

I guess I am still not convinced. If you use a binary tree, having a cursor
pointing into the tree is not always terribly useful when you are trying to
search for some subsequent element with a given key. You will often have to
go back "up" several levels before being able to go back down. With "Slice"
you are forcing every operation to support a "virtual" subset as well as a
real set. This is going to inevitably introduce some distributed overhead. I
would be surprised if on balance, this is a net savings. I'm sure you could
construct a case where it would be a savings, but overall, I would expect the
mix of uses would favor keeping the abstraction simpler.

An alternative is to have additional versions of operations like "Find" and
"Delete" which take a "Starting_With" cursor parameter. (This may be
something that was there to begin with, I have forgotten.)

****************************************************************

From: Matthew Heaney
Sent: Thursday, May 13, 2004 10:59 AM

> An alternative is to have additional versions of operations
> like "Find" and "Delete" which take a "Starting_With" cursor
> parameter.  (This may be something that was there to begin
> with, I have forgotten.)
> Those might be useful, but still
> they seem like operations that might sometimes be slower
> than starting at the "top" of the binary tree, depending
> on exactly where in the tree the Starting_With cursor points.

That's pretty much my feeling. It's hard to know a priori whether it's faster
to find an item by starting at the top and then searching the tree, or
starting from some point in the tree and then searching linearly.

Starting from the top does have the benefit that we can definitely say that
the time complexity is O(log n) even in the worst case, which is why I
re-wrote Mario's example to use a top-down search.

I agree that Slice, Open_Range, etc, aren't necessary. However, you could
make an argument for including an Upper_Bound style function, since it's more
efficient than the expression Next(Floor(S,K)), and because it handles the
endpoint issue automatically.

In fact I think the issue of endpoint is the more compelling argument for
including Upper_Bound (with some other name, of course), since even trying to
write Mario's example sans Upper_Bound required a bit of mental effort.

Maybe call it Next_Ceiling or Upper_Ceiling or whatever.

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 13, 2004 12:48 PM

Slice does not complicate the abstraction, on the contrary, cf. my example
about iterating a range.

I agree tree implementations might have trouble optimizing certain cases. But
in those cases they can just start at the root as for a no-slice. But, yes,
there is a slight overhead even then, namely for detecting the kind of case.
Skiplist and hashtable implementations might do better though.

But remember it's not just about optimization, it's also about expressing
ranges declaratively.

Just some final thoughts. By now I think I've made the case for Slice.
Personally as a user I'd like it there. But I might be too biased a user
(towards databases). I'm confident you'll make the right choice. And as you
point out there is always space for an Ada.Containers.Optimized_Sets package
(*), which can mean business for independent Ada tool developers :-)

(*) Is there? I was under the impression that the RM ruled out extensions to
package Ada. But the Ada.Containers spec talks about them as if they were
legal. Sorry for the newbie question.

****************************************************************

From: Martin Dowie
Sent: Thursday, May 13, 2004  2:08 PM

> (*) Is there? I was under the impression that the RM ruled out extensions to
> package Ada. But the Ada.Containers spec talks about them as if they were
> legal. Sorry for the newbie question.

I believe the rule is you can't add Child packages to package "Ada" but you
can add Grand-Child packages and extend existing Child packages.

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 13, 2004  1:25 PM

(Damn I said I was done but you keep asking for it :-)

> > An alternative is to have additional versions of operations
> > like "Find" and "Delete" which take a "Starting_With" cursor
> > parameter.

I fail to see how duplicating Insert, Delete, Is_In, and Find complicates the
interface less than simply adding Slice.

> > (This may be something that was there to begin
> > with, I have forgotten.)

There was, but only for Insert, and the known position had to be adjacent to
the new one.
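[Editor's note: For concreteness, the "Starting_With" alternative Tucker mentions might look like the profile below; it is purely illustrative and is not part of the draft.]

   function Find
     (Container     : Set;
      Item          : Element_Type;
      Starting_With : Cursor) return Cursor;
   --  Same result as Find (Container, Item); Starting_With is only a hint
   --  of where the search may profitably begin, and an implementation is
   --  free to ignore it.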
> > Those might be useful, but still > > they seem like operations that might sometimes be slower > > than starting at the "top" of the binary tree, depending > > on exactly where in the tree the Starting_With cursor points. > > That's pretty much my feeling. It's hard to know apriori whether it's > faster to find an item by starting at the top and then searching the > tree, or starting from some point in the tree and then searching linearly. This happens for either Slice or Starting_With. Actually Slice has more information (the upper bound), which can help make a better decision. > Starting from the top does have the benefit that we can definitely say > that the time complexity is O(log n) even in the worst case, which is > why I re-wrote Mario's example to use a top-down search. Can you pinpoint please? (Not pressing.) > I agree that Slice, Open_Range, etc, aren't necessary. However, you > could make an argument for including an Upper_Bound style function, > since it's more efficient than the expression Next(Floor(S,K)), and > because it handles the endpoint issue automatically. > > In fact I think the issue of endpoint is the more compelling argument > for including Upper_Bound (with some other name, of course), since even > trying to write Mario's example sans Upper_Bound required a bit of > mental effort. Again, can you pinpoint please? (Not pressing.) > Maybe call it Next_Ceiling or Upper_Ceiling or whatever. I take it you don't like Roof :-( **************************************************************** From: Randy Brukardt Sent: Thursday, May 13, 2004 11:45 AM > I guess I am still not convinced. If you use a binary > tree, having a cursor pointing into the tree is not > always terribly useful when you are trying to search > for some subsequent element with a given key. You will > often have to go back "up" several levels before being > able to go back down. With "Slice" you are forcing > every operation to support a "virtual" subset as well > as a real set. This is going to inevitably introduce > some distributed overhead. I would be surprised if on > balance, this is a net savings. I'm sure you could > construct a case where it would be a savings, but overall, > I would expect the mix of uses would favor keeping the > abstraction simpler. I totally agree. Moreover, there is overhead from requiring every implementation of Sets to support by-reference, not copied set objects. (That is, the result of Slice). Moreover, you're introducing even more erroneous cases into the library. Matt will be happy to tell you how hard I tried to eliminate *all* erroneousness from the containers library. He eventually convinced me that some cases of dangling cursors cannot be detected (that is, those that point into container objects that no longer exist). So some erroneousness is inevitable; but I'm very opposed to having it where it is not required. (Note that the erroneous cases come from the non-OOP design of the library. If the container object was a parameter to all operations [as it ought to be, IMHO], then there would be no need for erroneous cases. But that's water under the dam. :-) **************************************************************** From: Nick Roberts Sent: Friday, May 14, 2004 2:43 PM I am generally delighted by this amendment, and I hope it goes in. 
I think it shows how the knocking together of many wise heads generally produces a good result (even if it is only after an awful lot of argument :-) It does seem clear to me that a comprehensive set of packages could easily have numbered in the hundreds, when one considers the combinations of different structures and the selection between bounded and unbounded, definite and indefinite, and so on. I haven't counted, but Booch is over a hundred isn't it? I have a few queries. My profuse apologies if any of these have already been addressed (and I've missed them). [1] The vectors and maps are intended to automatically expand when required. This is fine, but the interface seems to provide no control over this expansion at all. Would it perhaps be a good idea to add a generic parameter such as below? Expansion_Size: Size_Type := [implementation defined]; The idea is that automatic expansion is done in multiples of Expansion_Size. It has a default value, so that it can be conveniently ignored by the user. A possible alternative is: Expansion_Factor: Float := [implementation defined]; The idea here is that automatic expansion of a map or vector X is by Size_Type(Expansion_Factor*Float(Size(X))). Again there is a convenient default. Alternatively, ExpansionSize/Factor could be made a visible discriminant of the container types, or an invisible attribute (with appropriate get and set operations). [2] What was the reason for not permitting Resize to make a container smaller, please? [3] I'd quite like the amendment to add a paragraph near the top clarifying the idea that every container has a set of 'slots', and that each slot can be either empty or contain the (valid?) value of one element. The following descriptions could, I think, be made slightly clearer and more succinct by referring to these slots. (Would you like specific wording?) [4] Regarding the optimisation of operations, I suggest it may be possible for an implementation to keep enough extra internal information (in a Set object) to enable it to detect and optimise various scenarios (judged to be typical). For example, assuming a tree structure, a pointer to the node above the (terminal) node most recently inserted could be retained; the implementation could test each insertion to see if it falls under this node; if a sequence of insertions of (as it turns out) adjacent values occurs, this trick could yield a very good speed improvement. [5] Probably already mentioned, but in line 3364 'Assert (Target => V1, Source => V2);' should be 'Assign (Target => V1, Source => V2);'. Finally, is there a sample implementation of any these packages yet? **************************************************************** From: Matthew Heaney Sent: Thursday, May 13, 2004 3:05 PM Nick Roberts wrote: > I am generally delighted by this amendment, and I hope it goes in. I think > it shows how the knocking together of many wise heads generally produces a > good result (even if it is only after an awful lot of argument :-) Most of the argument you didn't even see... > It does seem clear to me that a comprehensive set of packages could easily > have numbered in the hundreds, when one considers the combinations of > different structures and the selection between bounded and unbounded, > definite and indefinite, and so on. I haven't counted, but Booch is over a > hundred isn't it? Booch is large. 
But my original AI-302 proposal was large too: I think were something like 25 containers (some of them had bounded and unbounded forms, etc), and the proposal itself was about 150 pgs. > I have a few queries. My profuse apologies if any of these have already been > addressed (and I've missed them). > > [1] The vectors and maps are intended to automatically expand when required. Yes. > This is fine, but the interface seems to provide no control over this > expansion at all. No. That's what Resize is for. > Would it perhaps be a good idea to add a generic parameter > such as below? > > Expansion_Size: Size_Type := [implementation defined]; Use Resize. > The idea is that automatic expansion is done in multiples of Expansion_Size. > It has a default value, so that it can be conveniently ignored by the user. > A possible alternative is: > > Expansion_Factor: Float := [implementation defined]; Use Resize to supply a hint about intended maximum length. The implementation then resizes the container according to the algorithm the vendor has chosen. > The idea here is that automatic expansion of a map or vector X is by > Size_Type(Expansion_Factor*Float(Size(X))). Again there is a convenient > default. In the AI-302 reference implementation, the array is automatically expanded to twice its current size. > Alternatively, ExpansionSize/Factor could be made a visible discriminant of > the container types, or an invisible attribute (with appropriate get and set > operations). The container types do not have discriminants. > [2] What was the reason for not permitting Resize to make a container > smaller, please? Make a copy of the container, Clear the original, and then Move the copy to the original. (Wasn't this in the examples section?) > [4] Regarding the optimisation of operations, I suggest it may be possible > for an implementation to keep enough extra internal information (in a Set > object) to enable it to detect and optimise various scenarios (judged to be > typical). > > For example, assuming a tree structure, a pointer to the node above the > (terminal) node most recently inserted could be retained; the implementation > could test each insertion to see if it falls under this node; if a sequence > of insertions of (as it turns out) adjacent values occurs, this trick could > yield a very good speed improvement. Earlier releases of the AI-302 draft had overloadings of Insert that had a hint parameter, which, if it were successfully used to perform the insertion, then the time complexity would be O(1) instead of O(log n). However, the insert-with-hint operations were removed from the API at the ARG meeting in Phoenix. > Finally, is there a sample implementation of any these packages yet? See the ai302 subdirectory. The vector containers in the ai302 subdirectory conform to the most recent AI-302 draft (dated 2004/04/29). Look for updates to the remaining containers this weekend. (I recommend simply joining the charles project mailing lists, so you get notified automatically.) **************************************************************** From: Randy Brukardt Sent: Friday, May 14, 2004 10:09 PM > [1] The vectors and maps are intended to automatically expand when required. > This is fine, but the interface seems to provide no control over this > expansion at all. That's intentional. The implementation is allowed to choose the expansion algorithm that makes the most sense for it's architecture. 
Resize can be used to tell the implementation the ultimate size; there is an AARM note to mention to implementors that it is intended that this do the allocations needed. Matt claims that Resize often can be used in practice (I'm skeptical), but when it can't be used, you really don't have enough information to choose at all. > [2] What was the reason for not permitting Resize to make a container > smaller, please? The same reason that deleting an element doesn't necessarily destroy the element. We wanted to give the implementation flexibility in using blocking, caching, etc. The only operation that is guaranteed to recover space is the destruction of the container. Matt shows that it can be done by jumping through hoops, so there is a way to do it in the rare case that it is needed. > [3] I'd quite like the amendment to add a paragraph near the top clarifying > the idea that every container has a set of 'slots', and that each slot can > be either empty or contain the (valid?) value of one element. The following > descriptions could, I think, be made slightly clearer and more succinct by > referring to these slots. (Would you like specific wording?) It's not necessary, and makes things read more like a description of a specific implementation. We want as abstract a description as possible. We spent quite a bit of effort getting rid of such wording from the vector and maps containers (there should be no further reference to "nodes" in those containers). I would have done the same to the other containers if I would have had more time and energy. > [5] Probably already mentioned, but in line 3364 'Assert (Target => V1, > Source => V2);' should be 'Assign (Target => V1, Source => V2);'. Yes, and I've fixed all of the typos noted by Dan and Christoph in the working version -- so the ARG won't need to consider them in Palma. **************************************************************** From: Matthew Heaney Sent: Friday, May 14, 2004 11:25 PM > Matt shows that it can be done by jumping through hoops, so > there is a way to do it in the rare case that it is needed. Just to add to what Randy said: the point of Resize is to prevent automatic expansion that would otherwise occur as items are inserted into the container. It's not influencing the size that's important per se; rather, it's disabling expansion. If you ever need to shrink a vector (say), then just do this: Shrink: declare Temp : Vector := V; begin Clear (V); Move (Target => V, Source => Temp); end Shrink; Note that I've been an STL user for 4 years now, and I've never actually had a need to shrink a vector. Most of the time I use a vector to store a large index or whatever, and usually I can determine prior to insertion how many items I'm going to insert, so I call Resize first. **************************************************************** From: Randy Brukardt Sent: Friday, May 14, 2004 11:24 PM I think you meant "Assign" rather than "Move", as Move just copies the existing internal contents (thus preserving the size). "Assign" would make the target only as large as necessary. **************************************************************** From: Matthew Heaney Sent: Saturday, May 15, 2004 1:54 AM No, you've got it backwards. Move does indeed preserve the size -- of the source. Here, Temp has the minimum size necessary to store the Length (V) elements of V (although the API doesn't actually specify this). Note that Move doesn't copy any elements. The copying happened during assignment of V to Temp. 
Assign copies the active elements Source of onto the existing internal array of Target, so it doesn't modify the size unless Length (Source) > Size (Target). **************************************************************** From: Nick Roberts Sent: Saturday, May 15, 2004 8:39 AM > > [2] What was the reason for not permitting Resize to make a container > > smaller, please? > > The same reason that deleting an element doesn't necessarily destroy the > element. We wanted to give the implementation flexibility in using blocking, > caching, etc. The only operation that is guaranteed to recover space is the > destruction of the container. Well, it may seem like nitpicking, but that seems to be a reason to /allow/ the implementation /not/ to (actually) shrink a container. It doesn't seem like a reason to /disallow/ the implementation from shrinking it. Surely allowing an implementation to shrink if it wishes would be provide the greatest flexibility? I suspect, with respect, that you are being a bit hopeful if you expect implementations to use blocking, caching, or other optimisations. I doubt that many will, in practice. And with an implementation close to the model, there would be no difficulty in shrinking (by reallocation and copying, as for enlargement). Actually, I think shrinking would probably be feasible for most implementations, maybe all. Again, I guess that's arguing the case as strongly as it can be. > > [3] I'd quite like the amendment to add a paragraph near the top > > clarifying the idea that every container has a set of 'slots', and that > > each slot can be either empty or contain the (valid?) value of one > > element. The following descriptions could, I think, be made slightly > > clearer and more succinct by referring to these slots. (Would you > > like specific wording?) > > It's not necessary, and makes things read more like a description of a > specific implementation. We want as abstract a description as possible. We > spent quite a bit of effort getting rid of such wording from the vector and > maps containers (there should be no further reference to "nodes" in those > containers). I would have done the same to the other containers if I would > have had more time and energy. Hmm. Well, I intended the 'slot' to be an abstract (model) concept, and you could even say that in the description. I do really think it could significantly clarify the descriptions. I could do some actual wording, if you wish. **************************************************************** From: Nick Roberts Sent: Saturday, May 15, 2004 8:39 AM > > [1] The vectors and maps are intended to automatically expand when required. > > This is fine, but the interface seems to provide no control over this > > expansion at all. > > Would it perhaps be a good idea to add a generic parameter > > such as below? > > > > Expansion_Size: Size_Type := [implementation defined]; > > Expansion_Factor: Float := [implementation defined]; > > Use Resize to supply a hint about intended maximum length. The > implementation then resizes the container according to the algorithm the > vendor has chosen. > In the AI-302 reference implementation, the array is automatically > expanded to twice its current size. This seems to correspond with the idea of having something like: Expansion_Factor: Float := 2.0; as a generic parameter. Such a parameter would not interfere with the use of Resize, wherever the user could or wanted to use it (and which would certainly be superior where it could be used). 
However, it would provide a small extra measure of control for the user. An implementation could partially or entirely ignore the value of Expansion_Factor, if there were better criteria for it to base the decision on. Since it has a default value, it does not get in the way of the user who doesn't want to use it. I don't think its addition would add much complexity to the specifications, or much burden to implementations. It would actually simplify some implementations, wouldn't it?. I seem to remember that, back in the days when computers (operating systems) had fixed-length files on their hard disks, you could usually specify an expansion size for a file. A file would be automatically reallocated, expanded by it expansion size, when necessary (just like a vector in the AI, curiously). Okay, I think I've argued the case for this feature as strongly as possible now :-) > > [2] What was the reason for not permitting Resize to make a container > > smaller, please? > > Make a copy of the container, Clear the original, and then Move the copy > to the original. (Wasn't this in the examples section?) Yes, but that doesn't answer my question, Matt! > > Finally, is there a sample implementation of any these packages yet? > > > > See the ai302 subdirectory. > > The vector containers in the ai302 subdirectory conform to the most > recent AI-302 draft (dated 2004/04/29). Look for updates to the > remaining containers this weekend. (I recommend simply joining the > charles project mailing lists, so you get notified automatically.) Great. Thanks. **************************************************************** From: Ehud Lamm Sent: Sunday, May 16, 2004 4:59 AM > An implementation could partially or entirely ignore the value of > Expansion_Factor, if there were better criteria for it to base the decision > on. Since it has a default value, it does not get in the way of the user who > doesn't want to use it. This makes sense to me. That's the way I usually do it. **************************************************************** From: Nick Roberts Sent: Saturday, May 15, 2004 8:48 AM > > Matt shows that it can be done by jumping through hoops, so > > there is a way to do it in the rare case that it is needed. > > Just to add to what Randy said: the point of Resize is to prevent > automatic expansion that would otherwise occur as items are inserted > into the container. It's not influencing the size that's important per > se; rather, it's disabling expansion. > ... Okay, but it would be way easier to be able to use one call to Resize instead! > Note that I've been an STL user for 4 years now, and I've never actually > had a need to shrink a vector. Most of the time I use a vector to store > a large index or whatever, and usually I can determine prior to > insertion how many items I'm going to insert, so I call Resize first. Hmm. I think perhaps what you're missing is the case where: (a) you don't know in advance what size is going to be required; (b) you want to Resize the vector to something big, so as to minimise (eliminate) reallocations. I think this is a fairly common scenario. In this kind of case, the user knows the length of the vector after it has been populated, and would probably like to be able to issue a simple Resize afterwards to change the size of the vector to its length (eliminating wasted space). E.g.: Open(File,...); Resize(Vector,100_000); while not End_of_File(File) loop Read(File,X); Append(Vector,X); end loop; Close(File); Resize(Vector,Length(Vector)); Does this not make sense? 
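[Editor's note: the reply that follows suggests reading into a temporary vector and then assigning it to V, rather than resizing V downward afterwards. Below is a minimal sketch of that idiom, assuming an instance Int_Vectors of the draft Ada.Containers.Vectors over Integer, an open Ada.Text_IO file named File, a vector V, and appropriate use clauses; as discussed later in the thread, the draft does not actually promise that the assignment yields a minimally sized vector.

   declare
      Temp : Vector;              --  scratch vector local to the load
      X    : Integer;
   begin
      Resize (Temp, 100_000);     --  generous hint, so Append never expands
      while not Ada.Text_IO.End_Of_File (File) loop
         Ada.Integer_Text_IO.Get (File, X);
         Append (Temp, X);
      end loop;
      V := Temp;                  --  V receives just the elements actually read
   end;                           --  Temp, and its spare capacity, go away here
]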
**************************************************************** From: Matthew Heaney Sent: Saturday, May 15, 2004 11:40 AM > Okay, but it would be way easier to be able to use one call > to Resize instead! Right now Resize has the same semantics as reserve() does in the STL. You might want to post a note on comp.lang.c++ asking about reserve() (and its associated function capacity()). You might want also want to send your question to Musser, Plauger, or Scott Meyers to get their opinion. > Hmm. I think perhaps what you're missing is the case where: > (a) you don't know in advance what size is going to be > required; (b) you want to Resize the vector to something big, > so as to minimise (eliminate) reallocations. I think this is > a fairly common scenario. In that case I would use a std::deque, not a std::vector, if the number of elements is large and I need population of the container to be as fast as possible. (I had included a deque container in my original proposal, but removed it after the ARG asked me to reduce its size. We should revisit this if there's ever a secondary container library standard.) >In this kind of case, the user > knows the length of the vector after it has been populated, > and would probably like to be able to issue a simple Resize > afterwards to change the size of the vector to its length > (eliminating wasted space). E.g.: > > Open(File,...); > Resize(V,100_000); > while not End_of_File(File) loop > Read(File,X); > Append(V,X); > end loop; > Close(File); > Resize(V,Length(V)); > > Does this not make sense? Read the file into a temporary vector (which has been resized as above), and then assign it to the real vector V. The moral of the story is that you can shrink a vector. We're only disagreeing about the syntax. (Note that what I do mostly involves .avi files, which have a header describing how many frames are in the file. So in my case I read in the avi stream header first, and then resize the vector based on the information in the header.) **************************************************************** From: Nick Roberts Sent: Monday, May 17, 2004 2:47 PM "Matthew Heaney" wrote: > > Okay, but it would be way easier to be able to use one call > > to Resize instead! > > Right now Resize has the same semantics as reserve() does in the STL. > You might want to post a note on comp.lang.c++ asking about reserve() > (and its associated function capacity()). You might want also want to > send your question to Musser, Plauger, or Scott Meyers to get their > opinion. I must say, that seems like a very evasive answer. Can you not give a direct answer to the question "Why not permit Resize to reduce the size of a vector?" Why does the Ada standard need to do what the STL does? > > Hmm. I think perhaps what you're missing is the case where: > > (a) you don't know in advance what size is going to be > > required; (b) you want to Resize the vector to something big, > > so as to minimise (eliminate) reallocations. I think this is > > a fairly common scenario. > > In that case I would use a std::deque, not a std::vector, if the number > of elements is large and I need population of the container to be as > fast as possible. > > (I had included a deque container in my original proposal, but removed > it after the ARG asked me to reduce its size. We should revisit this if > there's ever a secondary container library standard.) In which case, I must ask what is the point of providing the vector abstraction at all? 
What does it provide that is not bettered, in practice, either by Ada's instrinsic arrays or by the list abstraction? > ... > The moral of the story is that you can shrink a vector. We're only > disagreeing about the syntax. Yes, we are disagreeing about the syntax. I am suggesting that the syntax: Resize(V,Length(V)); is a big improvement upon: declare Temp : Vector := V; begin Clear (V); Move (Target => V, Source => Temp); end Shrink; and I do not see -- and I have not been given -- any reason why the former should not be permitted. > (Note that what I do mostly involves .avi files, which have a header > describing how many frames are in the file. So in my case I read in the > avi stream header first, and then resize the vector based on the > information in the header.) In which case, why do you not simply use an array? **************************************************************** From: Randy Brukardt Sent: Monday, May 17, 2004 5:07 PM > In which case, I must ask what is the point of providing the vector > abstraction at all? What does it provide that is not bettered, in practice, > either by Ada's intrinsic arrays or by the list abstraction? Because Matt is very tied (in his mind) to a particular implementation. The containers as described in AI-302-03 are much more abstract, and do not have a prescribed implementation. Janus/Ada will probably use a two-level implementation for vector (which is more like what Matt calls a "Deque"), because the extra cost of such an implementation is quite low in return for the benefits that will be available. (It also maps much better to the code-shared generics of Janus/Ada). ... > > (Note that what I do mostly involves .avi files, which have a header > > describing how many frames are in the file. So in my case I read in the > > avi stream header first, and then resize the vector based on the > > information in the header.) > > In which case, why do you not simply use an array? I've made this point many times, and you're never going to get a satisfactory answer. It's best to let it go. (Otherwise, we'll use up the entire budget for Ada 2005 discussing trivialities, and there will not be any money to build the RM...) Earlier, Nick wrote: > This seems to correspond with the idea of having something like: > Expansion_Factor: Float := 2.0; > as a generic parameter. This is very specific to a particular implementation. We don't want that much specification of the implementation. ... > An implementation could partially or entirely ignore the value of > Expansion_Factor, if there were better criteria for it to base the decision > on. Since it has a default value, it does not get in the way of the user who > doesn't want to use it. We don't want a parameter whose value can be ignored. Resize itself is bad enough. In any event, micro-managing memory use is not what containers are about. You use them when you want the system to manage memory for you. If you care deeply about memory use, you need to build your own abstractions. If you don't, a compiler update could completely destroy your system's performance. (You can't rely on predefined stuff for critical time/space performance.) ... > I suspect, with respect, that you are being a bit hopeful if you expect > implementations to use blocking, caching, or other optimisations. I doubt > that many will, in practice. And with an implementation close to the model, > there would be no difficulty in shrinking (by reallocation and copying, as > for enlargement). 
Actually, I think shrinking would probably be feasible for > most implementations, maybe all. IBM Rational insisted on weakening some of the requirements so that they could use alternative implementations. Similarly, I've been very concerned about specifying an implementation, simply because Matt's implementations would be outrageously slow if compiled for Janus/Ada (due to generic code sharing). I fully intend to use a two-level scheme for vectors. All of the containers will use limited free lists to avoid excess allocation. I've considered allocation blocking for lists (but it wouldn't work for Janus/Ada, so we won't do that). Now, some vendors may simply use Matt's implementations, but it's pretty clear that at least some vendors are not planning to do so. > Hmm. Well, I intended the 'slot' to be an abstract (model) concept, and you > could even say that in the description. I do really think it could > significantly clarify the descriptions. I could do some actual wording, if > you wish. But we don't need it! Containers just hold a number of elements; all else is specific to particular implementations, and does not really belong in the standard. We made a number of specific exceptions to that to allow inserting of empty elements into vectors for performance reasons (similar to the reason that Resize exists). Those do not need any wrapper concept. As I said, Matt's original text had "nodes" in many places, and I took them out as much as possible. It generally shortened the wording; there were no cases where it helped anything. (It's more useful in the List container, but even there, it would be best to remove it. Just no more energy or budget.) And no thanks, I don't have any energy or budget to spend training someone how to write Standard wording. Especially when it isn't necessary. I don't doubt that there exist paragraphs that need wordsmithing, but I think the overall wording is about on the right level. **************************************************************** From: Pascal Obry Sent: Tuesday, May 18, 2004 12:55 AM > Because Matt is very tied (in his mind) to a particular implementation. The > containers as described in AI-302-03 are much more abstract, and do not have > a prescribed implementation. This is not true for the map. The name is Indefinite_Hashed_Maps. This states clearly that the implementation uses a hash table. I have found that while a hash table is very fast for a small set of data (< 10_000), it is quite a bit slower than an AVL tree for a very large set of data (> 100_000). Maybe this is specific to the current reference implementation, but that's what I have experienced. FYI, the AVL implementation I'm talking about is Table_Of_*_And_Dynamic_Data_G from the LGL. **************************************************************** From: Randy Brukardt Sent: Friday, May 19, 2004 7:14 PM The "Hashed_Maps" and "Ordered_Sets" cases are special. I think everyone would have preferred to avoid specifying an implementation there as well. But that's impossible, because of the vastly different generic parameters needed. That is, a "Hashed_Map" takes a hash function as a generic parameter, while an "Ordered_Set" (implemented as a tree) takes generic ordering operators as generic parameters. So, that exposes the basic implementation, as do any ordering requirements (hash tables aren't ordered by definition). Given these basic properties differ, a container where the hash vs. tree implementation isn't specified doesn't make sense. I do have to wonder about your results.
Since an AVL tree is going to be log N access by key, it should be quite a bit slower in large collections. The only reason for a hash table to slow down is a bad hash function (which then could make long chains in a few buckets) - essentially turning lookups into brute force searches. Are you sure that your hash function is good enough for "large sets of data"? An ideal function would put one item into each bucket. **************************************************************** From: Pascal Obry Sent: Saturday, May 22, 2004 2:04 AM The hash routine was not good at all. We have discussed this with Matthew, using a standard one (close to a hash routine used to implement associative arrays in Tcl or Gawk) the hash table is now 2 times faster. **************************************************************** From: Michael F. Yoder Sent: Saturday, May 22, 2004 11:40 AM I've seen bad behavior with hashing many times, both in personal and professional contexts. The basic reason is: if you use a fixed table size and linear chaining within a bucket, hashing is linear (albeit with a small constant) and large datasets can perform very badly even if the hash function is good. I don't recall the problem ever being a bad hash function, though it could have occurred and I've forgotten. My own solution was to expand the table size when it becomes 3/4 full or so (using internal rather than external chaining); it might be better to make each bucket be a tree. The latter solution has a security benefit: it mitigates DOS attacks based on causing collisions deliberately. This consideration occurred at my last job, but admittedly isn't a common one. For what it's worth, the use of an expanding table has always solved the problem. **************************************************************** From: Tucker Taft Sent: Saturday, May 22, 2004 3:42 PM The Hash_Maps are intended to be expandable hash tables. That's what Resize() is all about. And yes, I expect the only reason AVLs might start to outperform a hash table is if the hash table has a fixed number of buckets. **************************************************************** From: Randy Brukardt Sent: Saturday, May 22, 2004 3:46 PM The containers library uses an expanding hash table. The only way the behavior can get bad is if the hash function isn't good enough to use most of the buckets. **************************************************************** From: Ehud Lamm Sent: Sunday, May 16, 2004 4:54 AM > Tucker wrote: > > > I guess I am still not convinced. If you use a binary > > tree, having a cursor pointing into the tree is not > > always terribly useful when you are trying to search > > for some subsequent element with a given key. You will > > often have to go back "up" several levels before being > > able to go back down. With "Slice" you are forcing > > every operation to support a "virtual" subset as well > > as a real set. This is going to inevitably introduce > > some distributed overhead. I would be surprised if on > > balance, this is a net savings. I'm sure you could > > construct a case where it would be a savings, but overall, > > I would expect the mix of uses would favor keeping the > > abstraction simpler. > > I totally agree. Moreover, there is overhead from requiring every > implementation of Sets to support by-reference, not copied > set objects. (That is, the result of Slice). This is also the way I see it. 
Perhaps I missed something, so let me put it bluntly: are we talking ADT interfaces here, or are we working solely for a specific implementation? As you know from our Ada-Europe workshop a couple of years ago, I am firmly in the ADT camp myself, so I prefer interfaces that don't impose too many implementation restrictions. They can then be extended at will -- much easier than removing operations that are hard or inefficient to support. **************************************************************** From: Marius Amado Alves Sent: Monday, May 17, 2004 1:01 PM [Slice et al.] > Perhaps I missed something, so let me put it bluntly: are we talking ADT > interfaces here, or are we working solely for a specific implementation? Both. Slice provides a way to express ranges declaratively (interface) and a way to pass information to operations that can use it to optimize (implementation, but not specific). (Just clarifying. The cases have been made, the tendency of the ARG is to leave Slice out, so it's only academic now.) **************************************************************** From: Ehud Lamm Sent: Sunday, May 16, 2004 5:03 AM > From: Matthew Heaney [mailto:mheaney@on2.com] > > Tucker Taft wrote: > > > I don't think we need to change > > "Previous" to make these equivalences work for > > endpoints. Just let the user write a > > "Previous_Or_Last" if they really want to, > > which would need to take both a cursor and a set. > > Or more directly, write Lower_Limit or Upper_Limit > > if you want them, since these already have enough > > information with the set and the key. > > > > Providing Ceiling and Floor still seems adequate to me, > > as they provide the needed primitives for all other > > operations mentioned thus far. > > > > OK. That seems reasonable. I just wanted to make sure we > were on the > same page w.r.t the behavior at the endpoints. It does seem reasonable, and since I never used this sort of operations, my opinion shouldn't count as much, so take this with a grain of salt... It looks like the equivalences help understand what's going on. The special cases make code less readable and the logic a bit less clear. How important this is, is hard to judge. I wager many students will forget about the special case. Why not provide Lower_Limit or Upper_Limit? The cost seems tiny. **************************************************************** From: Matthew Heaney Sent: Monday, May 17, 2004 1:15 PM I am in favor of providing the following four operations: Lower_Limit (S, K) < K (AKA "Ground", "Previous_Floor") Floor (S, K) <= K Ceiling (S, K) >= K (AKA Lower_Bound) Upper_Limit (S, K) > K (AKA Upper_Bound, "Roof", "Next_Ceiling") I think Tucker only wants the middle two. If I had to pick only two, then I'd pick the last two (Ceiling and Upper_Limit). (This is what the STL & Charles do, and what was in the API prior to the ARG meeting in Phoenix.) Note that there are really two separate issues: (1) What is the value of the expression: Previous (Next (C)) We got rid of the internal sentinel node in Phoenix, which means once a cursor assumes the value No_Element, then it keeps that value. This is what Tucker and I were discussing in the earlier message quoted above, about letting a user define a Previous_or_Last function if he needs to back up onto the actual sequence. (2) Restoring the functionality of the two operations formerly known as "Lower_Bound" and "Upper_Bound". There seems to be agreement that this functionality is useful.
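[Editor's note: for symmetry with the Upper_Limit sketch given earlier, the remaining operation in the table above, Lower_Limit, can likewise be written by a user in terms of Ceiling and Previous. Again this is only a sketch, assuming an instance Int_Sets of the draft Ada.Containers.Ordered_Sets over Integer; the function Last is assumed to exist by analogy with First, Previous is used in its procedural form by analogy with Next, and the name Lower_Limit is not part of the draft.

   function Lower_Limit (S : Int_Sets.Set; K : Integer) return Int_Sets.Cursor is
      use Int_Sets;
      C : Cursor := Ceiling (S, K);
   begin
      if C = No_Element then
         --  No element is >= K, so the greatest element (if any) is the answer.
         return Last (S);
      end if;
      Previous (C);  --  yields No_Element when Ceiling (S, K) is the first element
      return C;
   end Lower_Limit;

Because a cursor that becomes No_Element stays No_Element, as noted above, the endpoint really does have to be handled explicitly; that is the cost of leaving these operations out of the standard interface.]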
One of the issues is that several of the ARG reviewers were confused by the names "Lower_Bound" and "Upper_Bound". **************************************************************** From: Tucker Taft Sent: Monday, May 17, 2004 1:49 PM Will this never end? ;-) My *major* complaint with Upper_Limit, Lower_Limit, Upper_Bound, Lower_Bound, etc. is that the names make no intuitive sense. If you could come up with some reasonable names, I might support the inclusion. I do not find any of the ones that have been proposed thus far acceptable. Predecessor and Successor might make it, where they are allowed to take a key that might or might not appear in the set, and return the cursor for the item in the set next preceding or following the given key. **************************************************************** From: Michael F. Yoder Sent: Wednesday, May 19, 2004 11:48 AM Whether 2 or 4 operations are included, it would be pleasant if the names came from a consistent scheme. For example: Lt_Item (S, K) < K Le_Item (S, K) <= K Gt_Item (S, K) > K Ge_Item (S, K) >= K This is easier to do if the "Lt" and "Gt" operations are the only two provided. For example, 'Predecessor' and 'Successor' would be fine. Floor for Le_Item and Ceiling for Ge_Item, together with Predecessor and Successor, would be acceptable. **************************************************************** From: Christoph Grein Sent: Sunday, May 23, 2004 11:37 PM I do think the names at the right intuitively describe the meaning: Gt_Item (S, K) > K Roof Ge_Item (S, K) >= K Ceiling Le_Item (S, K) <= K Floor Lt_Item (S, K) < K Ground, Basement It's like a building, you're in a room, which has a floor and a ceiling; above is the roof (or the attic), below the basement or ground. **************************************************************** From: Marius Amado Alves Sent: Monday, May 24, 2004 5:26 AM ":=" for containers clones the source (as opposed to passing a reference to). Do I understand correctly that this behaviour is specified solely by the fact that containers are non-limited? In that case, wouldn't a small clarifying Note by useful, specially for new users coming e.g. from... uh... Java... And should't the behaviour of ":=" be documented for any controlled type anyway? **************************************************************** From: Matthew Heaney Sent: Wednesday, June 9, 2004 9:40 AM I have a few comments on the Phoenix release of AI-302 (2004-04-29 AI95-00302-03/03). Each comment is bracketed with "MJH:" and "ENDMJH." pairs, and immediately follows the item to which it refers. -Matt A.17.2 The Package Containers.Vectors generic ... package Ada.Containers.Vectors is ... function To_Vector (Count : Size_Type) return Vector; MJH: I wasn't absolutely sure whether the formal param should be named "Count" or "Length". The term "count" is used elsewhere in this spec, but here it actually specifies the length of the vector object returned by the function. ENDMJH. function To_Vector (New_Item : Element_Type; Count : Size_Type) return Vector; MJH: This is formatted inconsistently. It should be: function To_Vector (New_Item : Element_Type; Count : Size_Type) return Vector; ENDMJH. procedure Set_Length (Container : in out Vector; Length : in Size_Type); MJH: Should we include following operation too? procedure Set_Length (Container : in out Container_Type; Length : in Size_Type; New_Item : in Element_Type); This would allow the user to specify an actual value for the new elements, if the length of the vector is increased. ENDMJH. 
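[Editor's note: the Set_Length-with-value operation proposed in the comment just above can be approximated as a user-level helper using only draft operations. A sketch, assuming a context where Ada.Containers and an instance Int_Vectors of the draft Ada.Containers.Vectors over Integer are visible; the three-parameter Set_Length below is hypothetical and not part of the draft.

   procedure Set_Length
     (Container : in out Int_Vectors.Vector;
      Length    : in     Ada.Containers.Size_Type;
      New_Item  : in     Integer)
   is
      use type Ada.Containers.Size_Type;
   begin
      if Length <= Int_Vectors.Length (Container) then
         Int_Vectors.Set_Length (Container, Length);   --  truncate
      else
         --  Extend with copies of New_Item, so no empty elements are created.
         for I in Int_Vectors.Length (Container) + 1 .. Length loop
            Int_Vectors.Append (Container, New_Item);
         end loop;
      end if;
   end Set_Length;
]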
procedure Swap (Container : in out Vector; I, J : in Cursor); MJH: Should be weaken the precondition, allowing the case in which both I and J have the value No_Element? In that case Swap would be a no-op. (Right now I think it's an error.) ENDMJH. function To_Index (Position : Cursor) return Index_Type'Base; If Position is No_Element, Constraint_Error is propagated. Otherwise, the index (within its containing vector) of the element designated by Cursor is returned. MJH: Should this be reworded to say "If Has_Element (Position) is False..."? ENDMJH. MJH: Also, note that if Position may only designate an active element in the container, then we don't need to return Index_Type'Base. We can strengthen the post-condition by returning Index_Type. ENDMJH. AARM Note: This implies that the index is determinable from a bare cursor alone. The basic model is that a vector cursor is implemented as a record containing an access to the vector container and a index value. This does constrain implementations, but it also allows all of the cursor operations to be defined in terms of the corresponding index operation (which should be primary for a vector). MJH: It's not clear if CE is supposed to be propagated if Position does not specify a value within the range of currently active elements of Container. For example: declare V : Vector; C : Cursor; I : Index_Type'Base; begin Append (V, E); C := First (V); Delete_First (V); I := To_Index (C); --valid? end; ENDMJH. generic with procedure Process (Element : in out Element_Type) is <>; procedure Generic_Update_by_Index (Container : in Vector; Index : in Index_Type'Base); If Index is not in the range First_Index (Container) .. Last_Index (Container), then Constraint_Error is propagated. Otherwise, it calls the generic actual bound to Process with the element at position Index as the parameter. Any exceptions raised by Process are propagated. If Element_Type is unconstrained and definite, then the Element parameter shall be unconstrained. AARM Note: This means that the elements cannot be aliased nor directly allocated from the heap; it must be possible to change the discriminants of the element in place. The element at position Index is not an empty element after successful completion of this operation. AARM Note: Since reading an empty element is a bounded error, attempting to use this procedure to replace empty elements may fail. Use Replace_Element to do that reliably. MJH: What did we conclude about this? I thought using Generic_Update to initialize a space element was ok? (Or was that only for a list?) Is this AARM Note in conflict with the note below? ENDMJH. procedure Replace_Element (Position : in Cursor; By : in Element_Type); This function assigns the value By to the element designated by Position. If Position equals No_Element, then Constraint_Error is propagated. Any exceptions raised during the assignment are propagated. The element designated by Position is not an empty element after successful completion of this operation. AARM Note: Replace_Element, Generic_Update, and Generic_Update_by_Index are only ways that an element can change from empty to non-empty. MJH: Is this AARM Note in conflict with the note above? ENDMJH. 
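[Editor's note: the AARM note just quoted says to use Replace_Element to give empty elements their first real value. A minimal sketch of that pattern, assuming a vector V from an instance of the draft Vectors package over Integer, the Insert_Space and Replace_Element profiles quoted in this draft, a cursor-form Next procedure analogous to the one used for sets earlier in this thread, and appropriate use clauses.

   declare
      C : Cursor;
   begin
      --  Add three empty elements at the end of V (Before => No_Element) ...
      Insert_Space (V, Before => No_Element, Position => C, Count => 3);
      --  ... and give each one a real value.  Replace_Element never reads the
      --  old value, so this avoids the bounded error of reading an empty
      --  element that Generic_Update would run into.
      while Has_Element (C) loop
         Replace_Element (C, By => 0);
         Next (C);
      end loop;
   end;

The loop visits exactly the three new elements because the space was inserted at the end of the vector.]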
procedure Insert (Container : in out Vector; Before : in Cursor; New_Item : in Vector); If Before is No_Element, then is equivalent to Insert (Container, Index_Type'Succ (Last_Index (Container)), New_Item); otherwise is equivalent to Insert (Container, To_Index (Before), New_Item); MJH: Should this be reworded to say "Has_Element (Before) = False..." instead? ENDMJH. MJH: We probably need to say here that if New_Item is empty, then then operation has no effect. Otherwise there's a constraint check if Before=No_Element (IT'Succ (Cont.Last)) can fail, when Cont.Last=IT'Last). ENDMJH. MJH: Here and elsewhere the equivalence is in terms of To_Index, but this might be too restrictive. Before is allowed to be IT'Succ (Cont.Last), but I think To_Index raises an exception if it has that value. ENDMJH. procedure Insert (Container : in out Vector; Before : in Cursor; New_Item : in Vector; Position : out Cursor); Create a temporary (call it Temp_Index) and set it to Index_Type'Succ (Last_Index (Container)) if Before equals No_Element, and To_Index (Before) otherwise. Then Insert (Container, Before, New_Item) is called, and finally Position is set to To_Cursor (Container, Temp_Index). AARM Note: The messy wording because Before is invalidated by Insert, and we don't want Position to be invalid after this call. An implementation probably only needs to copy Before to Position. MJH: See note above. ENDMJH. procedure Insert (Container : in out Vector; Before : in Cursor; New_Item : in Element_Type; Count : in Size_Type := 1); Equivalent to Insert (Container, Before, To_Vector (New_Item, Count)); MJH: See note above when Count = 0. (We should state explicitly that if Count=0, then the operation is a no-op, and there are no constraint checks or any other exceptions. The value or state of cursor Before is not checked or otherwise considered, when Count=0.) ENDMJH. procedure Insert (Container : in out Vector; Before : in Cursor; New_Item : in Element_Type; Position : out Cursor; Count : in Size_Type := 1); Equivalent to Insert (Container, Before, To_Vector (New_Item, Count), Position); MJH: See not above re count=0. ENDMJH. procedure Prepend (Container : in out Vector; New_Item : in Vector; Count : in Size_Type := 1); Equivalent to Insert (Container, Index_Type'First, New_Item). MJH: Typo: this declaration should look like this: procedure Prepend (Container : in out Vector; New_Item : in Vector); ENDMJH. procedure Insert_Space (Container : in out Vector; Before : in Cursor; Position : out Cursor; Count : in Size_Type := 1); Create a temporary (call it Temp_Index) and set it to Index_Type'Succ (Last_Index (Container)) if Before equals No_Element, and To_Index (Before) otherwise. Then Insert_Space (Container, Temp_Index, Count) is called, and finally Position is set to To_Cursor (Container, Temp_Index). MJH: See note above re count=0. ENDMJH. procedure Delete (Container : in out Vector; Position : in out Cursor; Count : in Size_Type := 1); If Count is 0, the operation has no effect. Otherwise is equivalent to Delete (Container, To_Index (Position), Count). MJH: Here and elsewhere when Count is 0, I think we need to specify what value for Position is returned. ENDMJH. MJH: If Count is non-zero, then how should we handle a Position that does not designate an active element. Above, we raise CE. Is this correct? ENDMJH. 
MJH: We probably need to say here that Position is set to Position.Index if Index continues to designate an element in the container, or No_Element if Position was part of the entire tail that was deleted. ENDMJH. procedure Delete_Last (Container : in out Vector; Count : in Size_Type := 1); If Length (Container) < Count then is equivalent to Delete (Container, Index_Type'First, Count); otherwise is equivalent to Delete (Container, Index_Type'Val(Index_Type'Pos(Last_Index(Container)) - Count + 1), Count). MJH: If Length (C) >= Count, then isn't it easier to simply say that it's the same as Clear (C)? ENDMJH. Returns the value Index_Type'First. MJH: What operation does this description refer to? I assume it's First_Index. ENDMJH. procedure Swap (Container : in Vector; I, J : in Cursor); Equivalent to Swap (Container, To_Index (I), To_Index (J)). MJH: I mentioned this above. We might want to weaken the precondition of Swap, to allow cursors both of which Has_Element returns False to be swapped; that is, if both are No_Element, then Swap should be a no-op. ENDMJH. function Find (Container : Vector; Item : Element_Type; Index : Index_Type'Base := Index_Type'First) return Index_Type'Base; Searches the elements of Container for an element equal to Item, starting at position Index. If Index is less than Index_Type'First, then Constraint_Error is propagated. If there are no elements in the range Index .. Last_Index (Container) equal to Item, then Find returns Index_Type'Succ (Last_Index (Container)). Otherwise, it returns the index of he matching element. MJH: Here and in the other find ops we should probably weaken the precondition, such that if the container is empty, we return failure status immediately, without vetting or otherwise interrogating the value of Index. ENDMJH. function Find (Container : Vector; Item : Element_Type; Position : Cursor := No_Element) return Cursor; Searches the elements of Container for an element equal to Item, starting at the first element if Cursor equals No_Element, and at the element designated by Cursor otherwise, and searching to the last element in Container. If an item equal to Item is found, Find returns a cursor designating the first element found equal to Item. If no such item is found, it returns No_Element. MJH: Suppose Has_ELement (Position) = False, is this an error (raise CE), or does it count as No_ELement (start from IT'First)? NDMJH. A.17.3 The Package Containers.Doubly_Linked_Lists procedure Delete (Container : in out List; Position : in out Cursor; Count : in Size_Type := 1); If Position equals No_Element, the operation has no effect. Otherwise Delete removes Count nodes starting at the node designated by Position from Container (or all of the nodes if there are less than Count nodes starting at Position). Any exceptions raised during deallocation of internal storage are propagated. MJH: Is this inconsistent with vector? I think we made it an error if Size > 0 and Position = No_Element. (I don't know which way we should go, I just wanted to bring it up.) ENDMJH. procedure Swap (Container : in out List; I, J : in Cursor); Swap exchanges the nodes designated by I and J. MJH: Allow I and J to both assume the value No_Element? ENDMJH. MJH: Does this swap nodes (by exchanging pointers, or does it eave the nodes in their relative positions, and merely exchange the values of the elements on those nodes? ENDMJH. A.17.5 The Package Containers.Ordered_Sets generic ... package Ada.Containers.Ordered_Sets is ... 
procedure Insert (Container : in out Set; New_Item : in Element_Type; Position : out Cursor; Success : out Boolean); --MJH: --A nice function might be: --procedure Insert (Container : in out Set; -- New_Item : in Element_Type); --This is a convenience function that omits the last two params. --ENDMJH. function Is_Subset (Item : Set; Container : Set) return Boolean; MJH: Clarify the results when one or both of the params are empty sets. (I assume that in set theory, the subset operation is defined on a pair ull sets, but I don't remember offhand what the value is.) ENDMJH. function Is_Disjoint (Item : Set; Container : Set) return Boolean; MJH: As above, clarify the results when one or both of the params are empty sets. ENDMJH. **************************************************************** From: Randy Brukardt Sent: Wednesday, June 9, 2004 11:59 PM A couple of comments on Matt's comments (I'm not going to comment on typos and the like, it's too late to fix them before the meeting, and they're recorded). > function To_Index (Position : Cursor) return Index_Type'Base; > > If Position is No_Element, Constraint_Error is propagated. Otherwise, the > index (within its containing vector) of the element designated by > Cursor is > returned. > > MJH: > Should this be reworded to say "If Has_Element (Position) is False..."? > ENDMJH. I don't think so. It's usually a bounded error to use a cursor that doesn't point at an active element. That allows either raising Constraint_Error or doing something else. You explain why below... ... > MJH: > It's not clear if CE is supposed to be propagated if Position does not > specify a value within the range of currently active elements of > Container. For example: > > declare > V : Vector; > C : Cursor; > I : Index_Type'Base; > begin > Append (V, E); > C := First (V); > Delete_First (V); > I := To_Index (C); --valid? > end; > ENDMJH. It's very clear that this is a bounded error, and we're *not* requiring implementations to detect this case (in this specific example, because Delete is called on an element to the left). But we *allow* it to be detected. I thought we had agreed that we didn't want the overhead of detecting these kinds of errors. The organization of the standard requires us to put the bounded error text far away from this subprogram (which is unfortunate), but since it is a general rule, that isn't too bad. The bounded error rules apply to *all* uses of cursors except Has_Element, so the answer is the same for all other routines. > generic > with procedure Process (Element : in out Element_Type) is <>; > procedure Generic_Update_by_Index (Container : in Vector; > Index : in Index_Type'Base); ... > MJH: > What did we conclude about this? I thought using Generic_Update to > initialize a space element was ok? (Or was that only for a list?) It's also in the bounded error section. I think we concluded that we couldn't allow Generic_Update, because it implies a read of the element. I tried to find a way to avoid that, but if we did, then it wouldn't be "Update" any more. ... > AARM Note: Replace_Element, Generic_Update, and > Generic_Update_by_Index are > only ways that an element can change from empty to non-empty. > > MJH: > Is this AARM Note in conflict with the note above? > ENDMJH. Someone asked that in April. Sheesh. Generic_Update is in the list because it's a bounded error to call it, and *if* it doesn't raise an exception, *then* it changes the element to non-empty. But you can't depend that it doesn't raise an exception. ... 
> procedure Delete (Container : in out List; > Position : in out Cursor; > Count : in Size_Type := 1); > > If Position equals No_Element, the operation has no effect. Otherwise > Delete removes Count nodes starting at the node designated by Position > from Container (or all of the nodes if there are less than Count nodes > starting at Position). Any exceptions raised during deallocation > of internal storage are propagated. > > MJH: > Is this inconsistent with vector? I think we made it an error if > Size > 0 and Position = No_Element. (I don't know which way we should > go, I just wanted to bring it up.) > ENDMJH. Yes, it seems to be inconsistent with Vector. Vector raises C_E for indexes out of range (of course), and the cursor version mimics that behavior, because it really can't do anything else. So I'd say this probably out to raise C_E as well. **************************************************************** From: Matthew Heaney Sent: Thursday, June 10, 2004 10:32 AM > It's very clear that this is a bounded error, and we're *not* requiring > implementations to detect this case (in this specific example, because > Delete is called on an element to the left). But we *allow* it to be > detected. I thought we had agreed that we didn't want the overhead of > detecting these kinds of errors. OK, I just wanted to make sure. The other thing I forget to mention is that the following operations are in the list package but not the vector package: procedure Delete (Container : in out List; Item : in Element_Type); generic with function Predicate (Element : Element_Type) return Boolean is <>; procedure Generic_Delete (Container : in out List); procedure Reverse_List (Container : in out List); generic with function Predicate (Element : Element_Type) return Boolean is <>; function Generic_Find (Container : List; Position : Cursor := No_Element) return Cursor; generic with function Predicate (Element : Element_Type) return Boolean is <>; function Generic_Reverse_Find (Container : List; Position : Cursor := No_Element) return Cursor; There's no technical reason they should be in the list but not the vector. Either we can add them to vector, or get rid of them for list. Here's another idea. We already have a Generic_Update, but another useful operation might be some kind of query operation, that either returns Boolean or a type you pass in as a generic formal. Something like: generic type Result_Type (<>) is limited private; function Process (E : ET) return Result_Type is <>; function Generic_Query (Position : Cursor) return Result_Type; Of course, a user could implement this as (here, for a Boolean Result_Type): function Query (P : C) return Boolean is Result : Boolean; procedure Process (E : in out ET) is begin Result := Predicate (E); -- some algorithm end; procedure Update is new Generic_Update; begin Update (P); return Result; end; The awkward case is when the Result_Type actual type is indefinite. For example, were it type String you would have to use an unbounded_string or whatever as the temporary (but maybe that's not such a big deal). Clearly you can implement a query-style function from the update modifier operation, but I wasn't sure whether that's possible in all cases for all possible return types, and if so whether this warrants the introduction of a dedicated operation. **************************************************************** From: Randy Brukardt Sent: Thursday, June 10, 2004 6:50 PM ... > There's no technical reason they should be in the list but not the > vector. 
Either we can add them to vector, or get rid of them for list. I'd be wary of adding too many rarely used routines to these containers. Those just make the containers harder to learn and harder to implement with little additional benefit. Unbounded_Strings has a large number of rarely used routines, and yet it never seems to have the odd routine I actually need. So, that actually increases the frustration level, because you'd think that in so many routines, every plausible need would be met. When there are fewer routines, the expectation level is lower, too, and you wouldn't feel quite so ripped-off. In the routines you mentioned, I think that the generic routines are too specialized - it would be rare that you both could match their usage pattern *and* would remember that they exist. Delete by item seems error-prone if there are multiple identical items in the container (does it delete just one or all of them? Explain your choice, and why the user would expect that over the other possibility.) Reverse_List (which probably should just be called "Reverse") doesn't seem that useful, and is masking a lot of work. So I'd probably dump the whole lot. But I do agree that List and Vector should be the same, whatever is decided. > Here's another idea. We already have a Generic_Update, but another > useful operation might be some kind of query operation, that either > returns Boolean or a type you pass in as a generic formal. > Something like: > > generic > type Result_Type (<>) is limited private; > function Process (E : ET) return Result_Type is <>; > function Generic_Query (Position : Cursor) return Result_Type; This seems too specialized to me. Most of the time, it would make just as much sense to write a function of the Element. Besides, this seems like it would be illegal if AI-318 is passed as currently planned, since limited unconstrained types will not be allowed to be returned. So there is a contract issue here (having a function that has to be able to both build-in-place and return-by-copy seems like a very nasty case for generic sharing implementations). In any case, we need to avoid "feeping creaturism" here. KISS definitely applies! **************************************************************** From: Pascal Obry Sent: Wednesday, June 9, 2004 10:43 AM One feedback after migrating AWS to the AI302 reference implementation. The procedure Size and Length are really too confusing. I have at least 2 times used the wrong one (using Size instead of Length). Length is ok, maybe Size should be renamed Hash_Size or something like that. For the record: function Size (Container : Vector) return Size_Type; -> returns the size of the hash table (number of buckets) function Length (Container : Vector) return Size_Type; -> returns the number of item in the vector Also, as Size and Resize are low-level stuff I would put those routines at the end of the package. Another solution would be to put such routines into a child package. Thoughts ? **************************************************************** From: Matthew Heaney Sent: Wednesday, June 9, 2004 2:55 PM Pascal Obry wrote: > One feedback after migrating AWS to the AI302 reference implementation. The > procedure Size and Length are really too confusing. I have at least 2 times > used the wrong one (using Size instead of Length). Length is ok, maybe Size > should be renamed Hash_Size or something like that. It's not unlike for an array, which has both 'Length and 'Size attributes. 
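[Editor's note: for readers who find the array analogy obscure: 'Length and
'Size measure entirely different things, which is exactly the kind of mixup
Pascal reports with the container operations. A trivial, standard-Ada
illustration:

   declare
      A : array (1 .. 10) of Integer := (others => 0);
   begin
      pragma Assert (A'Length = 10);  --  'Length counts components
      --  A'Size, by contrast, is the size of the object in bits: the same
      --  family of names, but a completely different unit and meaning.
      null;
   end;

The container situation is analogous: Length counts elements, while the
operation currently called Size reports an internal allocation property.]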
> For the record: > > function Size (Container : Vector) return Size_Type; > -> returns the size of the hash table (number of buckets) No. The Size of a hashed map container specifies the maximum length (number of items) before which automatic expansion of the internal hash table occurs. It does *not* specify the number of buckets in the hash table. (It is indeed the case that in the AI-302 reference implementation, function Size happens to return the number of hash table buckets, but that is a characteristic of that particular implementation. It is not guaranteed to be the case for all implementations.) > function Length (Container : Vector) return Size_Type; > -> returns the number of items in the vector Technically it's the "number of active elements," but let's not quibble. > Also, as Size and Resize are low-level stuff I would put those routines at the > end of the package. Another solution would be to put such routines into a > child package. Thoughts ? It's a bad idea. **************************************************************** From: Pascal Obry Sent: Wednesday, June 9, 2004 3:26 PM What is a bad idea ? I have proposed 3 things : - rename Size and keep Length - move the Size and Resize to the end of the API - move the Size and Resize routines into a child package I hope that you at least see that Size/Length having the same prototype is dangerous. It is even more dangerous that using Size instead of Length can stay undetected for some time... **************************************************************** From: Matthew Heaney Sent: Wednesday, June 9, 2004 3:44 PM I was referring to the suggestion in your last paragraph to make Size and Resize child subprograms. **************************************************************** From: Pascal Obry Sent: Wednesday, June 9, 2004 3:53 PM Ok, I also think it is bad idea, was there for completeness :) **************************************************************** From: Tucker Taft Sent: Wednesday, June 9, 2004 3:34 PM How about "Maximum_Length" and "Set_Maximum_Length" in place of Size and Resize? **************************************************************** From: Pascal Obry Sent: Wednesday, June 9, 2004 3:42 PM Fine with me. **************************************************************** From: Robert A Duff Sent: Wednesday, June 9, 2004 7:23 PM > What is a bad idea ? I have proposed 3 things : I don't know Matt's opinion, but here's mine: > - rename Size and keep Length Good idea. I think this is fairly important. > - move the Size and Resize to the end of the API Good idea. Not important. > - move the Size and Resize routines into a child package Bad idea. > I hope that you at least see that Size/Length having the same prototype > is dangerous. It is even more dangerous that using Size instead of Length > can stay undetected for some time... Yes, I agree. The name Size should be changed to something else, something nobody would mistake for Length. **************************************************************** From: Nick Roberts Sent: Wednesday, June 9, 2004 9:06 PM > How about "Maximum_Length" and "Set_Maximum_Length" in place > of Size and Resize? I endorse this suggestion. Specifically, I suggest: (1) In package Ada.Containers, change: type Size_Type is range 0 .. ; to: type Count_Type is range 0 .. ; and all subsequent uses of Size_Type be renamed to Count_Type. 
(2) In packages Ada.Containers.Vectors, Ada.Containers.Hashed_Maps, (and Ada.Containers.Indefinite_Hashed_Maps,) change: function Size (Container : Vector|Map) return Size_Type; to: function Maximum_Length (Container : Vector|Map) return Count_Type; and change: procedure Resize (Container : in out Vector|Map; Size : in Size_Type); to: procedure Set_Maximum_Length (Container : in out Vector|Map; To : in Count_Type); (3) Change all references to the term 'size' to 'maximum length'. For example, change the second paragraph of the proposed A.17.2 from: A vector container object manages an unconstrained internal array, which expands as necessary as items are inserted. The *size* of a vector corresponds to the total length of the internal array, and the *length* of a vector corresponds to the number of active elements in the internal array. to: A vector container object conceptually manages an unconstrained internal array, which expands as necessary as items are inserted. The *maximum length* of a vector corresponds to the total length of this conceptual internal array, and the *length* of a vector corresponds to the number of active elements within this array. An alternative to 'maximum length' and [Set_]Maximum_Length throughout all the above could be 'allocated length' and [Set_]Allocated_Length. This issue has been argued about before. Some said that the term 'size' clashed with the predominant existing usage of the term in connection with the number of storage units used up by objects and program units. Others said that many terms are 'overloaded' in the RM, and the term 'size' is already used to mean other things in some places. However, I quite strongly feel that an alternative term could easily be chosen, and it would be very desirable to do so, to avoid just the kind of confusion Pascal reported. I must also add that I still think it is unjustified that the size/maximum length of a vector or map is not permitted to be reduced by any implementation. Specifically, I advocate that Resize/Set_Maximum_Length be allowed (by the standard) to reduce the size/maximum length of a vector or map, but that implementations are permitted to ignore such reductions if they wish. In fact, I would suggest that the current wording (forbidding such reductions) is silly in a way, because I doubt very much that there will ever be an ACATS test for it. On that basis, I also question the wording "Resize sets the size of Container to a value which is at least the value Size", which could more sensibly be changed to "Resize sets the size of Container to approximately the value Size". (4) I suggest the paragraph: If Size (Container) is equal to or greater than Size, the operation does nothing. Otherwise Resize sets the size of Container to a value which is at least the value Size, expanding the internal array to hold Size elements. Expansion will require allocation, and possibly copying and deallocation of elements. Any exceptions raised by these operations are propagated, leaving the container with at least the original Size, Length, and elements. be changed to: Set_Maximum_Length sets the maximum length of Container to approximately the value To, expanding or contracting the internal array as required. Expansion or contraction may require allocation, and possibly copying and deallocation of elements. Any exceptions raised by these operations are propagated, leaving the length and active elements of the container unchanged. 
and that the following AARM notes be changed appropriately, and that this implementation permission is added: Implementations are not required to support the [changing|reduction] of the maximum size of a container by Set_Maximum_Length, in which case calls of this procedure should do nothing. I favour the word 'changing', on the basis that Set_Maximum_Length is probably never going to be ACATS tested for its effect on the size (maximum length) of a vector or map. (4) I also suggest that the concept of an 'expansion factor' is added to vectors and maps. Each vector or map has its own expansion factor associated with it, which is a value of the subtype Ada.Containers.Expansion_Factor_Type, declared as follows: subtype Expansion_Factor_Type is Float range 1.0 .. [impl def]; Whenever a vector or map is expanded automatically, the value of its expansion factor at the time may be used (but does not have to be) by the implementation to determine the new maximum length of the container, nominally by multiplying the current maximum length by the current expansion factor. The initial (default) value of the expansion factor of a container is implementation defined, but its value may be retrieved and set by the following subprograms: function Expansion_Factor (Container : Vector|Map) return Expansion_Factor_Type; procedure Set_Expansion_Factor (Container : in out Vector|Map; To : in Expansion_Factor_Type); **************************************************************** From: Robert A. Duff Sent: Thursday, June 10, 2004 7:45 AM Tuck says: > How about "Maximum_Length" and "Set_Maximum_Length" in place > of Size and Resize? I don't really like "Maximum_Length", because there actually *is* no max length -- the whole point is these things can grow arbitrarily large. I believe STL calls them "capacity" and "reserve". Pretty much anything would be better than "Size", for the reasons Pascal stated. **************************************************************** From: Matthew Heaney Sent: Thursday, June 10, 2004 10:10 AM Well, it does describe when expansion happens. How about: function Expansion_Length (Container : in Map) return Size_Type; procedure Set_Expansion_Length (Container : in out Map; Length : in Size_Type); **************************************************************** From: Alexander E. Kopilovich Sent: Thursday, June 10, 2004 11:31 AM Another proposition: function Extent -- or Current_Extent and procedure Set_Extent -- correspondily, Set_Current_Extent But perhaps the best would be to say straight: function Reserved_Length -- or Reserved_Size and procedure Set_Reserved_Length -- correspondily, Set_Reserved_Size **************************************************************** From: Tucker Taft Sent: Thursday, June 10, 2004 1:42 PM > But perhaps the best would be to say straight: > > function Reserved_Length > > and > > procedure Set_Reserved_Length I like these. "Capacity" is pretty much a synonym for "Maximum_Length". Both need the word "Current" added to make it clear these are expandable. "Reserved" has just the right connotation. By the way, I agree that there seems no reason not to allow Set_Reserved_Length to specify a smaller length, though we then want Reserved_Length to be allowed to return a value larger than the value most recently set by Set_Reserved_Length. Which might argue for changing the "set" procedure's name to "Set_Minimum_Reserved_Length" and the function's name to "Actual_Reserved_Length" to be crystal clear. 
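[Editor's note: whichever names win, the intended use of this pair of
operations is the same as the STL capacity/reserve pair mentioned above: ask
for room up front so that a burst of insertions need not repeatedly grow the
internal array. A sketch using the draft names Size and Resize from the API
under discussion; Integer_Vectors is a hypothetical instantiation with
Element_Type => Integer:

   declare
      use Integer_Vectors;  --  hypothetical instantiation
      V : Vector;
   begin
      Resize (V, Size => 1_000);          --  request room for the elements to come
      for K in 1 .. 1_000 loop
         Append (V, K);                   --  ideally no internal reallocation here
      end loop;
      pragma Assert (Size (V) >= 1_000);  --  Size returns at least the requested value
   end;

The assertion reflects the only hard requirement under discussion; how much
is actually allocated, and whether a later request may shrink it, is exactly
what this subthread is debating.]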
**************************************************************** From: Alexander E. Kopilovich Sent: Thursday, June 10, 2004 5:13 PM Perhaps for this purpose "Provide_Reserved_Length" would be even better (again, more straigt) than "Set_Minimum_Reserved_Length". And additionally, as "provide" (unlike "set") is somehow uncertain about the upper limit, there will be less need for the prefix "Actual_" before "Reserved_Length". **************************************************************** From: Nick Roberts Sent: Friday, June 11, 2004 9:13 PM I like these suggestions. I quite like 'Actual_Reserved_Length', but I think it's not really necessary, since there is no other function ('Requested_Reserved_Length' or some such) for it to be contrasted with. Perhaps a consensus is coming close to: - rename the term 'size' as 'reserved length'; - rename the 'Size' functions as 'Reserved_Length'; - rename the 'Resize' procedures as 'Request_Reserved_Length'. I would also like to suggest: - rename the type 'Size_Type' as 'Count' or 'Count_Type'. My justification for this is that the term 'size' is mainly used in connection with storage units, so some potential for confusion would be easily avoided by a different name, and a type 'Count' fulfilling a very similar role is declared in the Ada.*_IO packages. I would like to reiterate my suggestions that: - the Request_Reserved_Length (Resize) procedures are permitted to reduce the reserved length (size) of a container, but that in any case (reduction or expansion) any actual change to the reserved length (size) remains implementation defined; - an explicit expansion factor is supported, as in my previous post. My justification for the first is that it would not be sensible to formally test whether an implementation obeyed a more stringent definition. The Reserved_Length (Size) functions should return the actual reserved length (size), but again, it would probably not be sensible to try to formally test this (and it may not be possible). My justification for the second is that there will often be situations where the user (of the proposed container packages) knows better than the implementation what the expansion factor should be, and in such cases the implementation default for deciding by how much to expand a container (whether by a simple factor or some other method) is likely to be very inappropriate. [Sorry about numbering this point '(4)' in my previous post; it should have been '(5)'.] **************************************************************** From: Michael F. Yoder Sent: Saturday, June 12, 2004 9:18 AM >Perhaps a consensus is coming close to: > >- rename the term 'size' as 'reserved length'; > >- rename the 'Size' functions as 'Reserved_Length'; > >- rename the 'Resize' procedures as 'Request_Reserved_Length'. If there is such a consensus, I'll add my support to it. These seem like good ideas. >I would also like to suggest: > >- rename the type 'Size_Type' as 'Count' or 'Count_Type'. > >My justification for this is that the term 'size' is mainly used in >connection with storage units, so some potential for confusion would be >easily avoided by a different name, and a type 'Count' fulfilling a very >similar role is declared in the Ada.*_IO packages. I agree. 
>I would like to reiterate my suggestions that: > >- the Request_Reserved_Length (Resize) procedures are permitted to reduce >the reserved length (size) of a container, but that in any case (reduction >or expansion) any actual change to the reserved length (size) remains >implementation defined; I strongly agree. Requiring that the user write the size reduction code via a copy forecloses even the possibility of a reduction that avoids copying. >- an explicit expansion factor is supported, as in my previous post. > >My justification for the first is that it would not be sensible to formally >test whether an implementation obeyed a more stringent definition. The >Reserved_Length (Size) functions should return the actual reserved length >(size), but again, it would probably not be sensible to try to formally test >this (and it may not be possible). > >My justification for the second is that there will often be situations where >the user (of the proposed container packages) knows better than the >implementation what the expansion factor should be, and in such cases the >implementation default for deciding by how much to expand a container >(whether by a simple factor or some other method) is likely to be very >inappropriate. [Sorry about numbering this point '(4)' in my previous post; >it should have been '(5)'. I'm less enthusiastic about the expansion factor, but I don't oppose it. **************************************************************** From: Robert A. Duff Sent: Monday, June 14, 2004 8:55 AM Mike Yoder wrote: > Nick Roberts wrote: > > >- the Request_Reserved_Length (Resize) procedures are permitted to reduce > >the reserved length (size) of a container, but that in any case (reduction > >or expansion) any actual change to the reserved length (size) remains > >implementation defined; > > > I strongly agree. Requiring that the user write the size reduction code > via a copy forecloses even the possibility of a reduction that avoids > copying. The STL guarantees that the reserved size is at least that requested. This is important because it means that cursors/iterators that point into the data structure do not become invalid while appending (up to that reserved size). **************************************************************** From: Nick Roberts Sent: Sunday, June 20, 2004 1:53 PM However, the semantics required by the current AI-302 is clearly different. The relevant wording is: A Cursor value is *ambiguous* if any of the following have occurred since it was created: * Insert or Delete has been called on the vector that contains the element the cursor designates with an index value (or a cursor designating an element at such an index value) less than or equal to the index value of the element designated by the cursor; * The vector that contains the element it designates has been passed to an instance of Generic_Sort. and: A Cursor value is *invalid* if any of the following have occurred since it was created: * The vector that contains the element it designates has been finalized; * The vector that contains the element it designates has been used as the Source or Target of a call to Move; * The element it designates has been deleted. The result of "=" or Has_Element is unspecified if it is called with an invalid cursor parameter. Execution is erroneous if any other subprogram declared in Containers.Vectors is called with an invalid cursor parameter, or if the cursor designates an element in a different vector object than the appropriate one specified in the call. 
AARM Notes: The list above (combined with the bounded error cases) is intended to be exhaustive. In other cases, a cursor value continues to designate its original element. For instance, cursor values survive the appending of new elements. End AARM Notes. Cursors are not permitted to become ambiguous or invalid solely because of internal copying (as a result of automatic extension). **************************************************************** From: Randy Brukardt Sent: Wednesday, June 23, 2004 9:43 PM Right. That's an important property: cursors do not become invalid because of an action that is outside of the user's control. And memory management in a container is outside of the user's control. Resize (I forget the new name we settled on, so I'll use the old one for now) is purely a performance enhancing routine. The only requirement is that Size (ditto on the name) returns the value most recently passed into Resize, or something larger. There's an AARM note suggesting to implementors that Resize allocate at least the specified memory, but of course that is untestable and cannot be specified in normative language of the standard. **************************************************************** From: Simon Wright Sent: Friday, June 11, 2004 3:03 AM > possibility.) Reverse_List (which probably should just be called > "Reverse") If it wasn't a reserved word! **************************************************************** From: Matthew Heaney Sent: Sunday, June 27, 2004 5:35 PM Randy: I have a few comments about the Palma API release. The actual text of my comments are bracketed by "MJH:" and "ENDMJH." pairs, and immediately follows the operation(s) to which they refer. vector: MJH: The partial view of type Vector is now tagged, like this: type Vector is tagged private; Ditto for the other containers. ENDMJH. function To_Vector (Count : Count_Type) return Vector; function To_Vector (New_Item : Element_Type; Count : Count_Type) return Vector; MJH: We need to affirm whether the parameter should be named "Count" or "Length". ENDMJH. function Capacity (Container : Vector) return Count_Type; procedure Set_Capacity (Container : in out Vector; Capacity : in Count_Type); MJH: I declared the operations formerly named "Size" and "Resize" as above. ENDMJH. generic with procedure Process (Element : in out Element_Type); procedure Generic_Update_Element_By_Index (Container : in Vector; Index : in Index_Type'Base); generic with procedure Process (Element : in out Element_Type); procedure Generic_Update_Element (Position : in Cursor); MJH: We don't need different names for these operations anymore, since they're not generic and hence we can overload the names as follows (verify my syntax is correct): procedure Update_Element (Container : in Vector; Index : in Index_Type'Base; Process : access procedure (Element : in out Element_Type)); procedure Update_Element (Position : in Cursor; Process : access procedure (Element : in out Element_Type)); ENDMJH. procedure Set_Length (Container : in out Vector; Length : in Size_Type); MJH: Is this vector operation missing? procedure Set_Length (Container : in out Container_Type; Length : in Size_Type; New_Item : in Element_Type); This would allow you to specify a value for elements that become active when Length > Length (Container). ENDMJH. 
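[Editor's note: a small sketch of the access-to-subprogram style MJH proposes
above, in place of the earlier generic formulation; it assumes the
Update_Element profile shown in the comment and a hypothetical
Integer_Vectors instantiation with Element_Type => Integer. What used to
require a generic instantiation becomes an ordinary call taking a locally
declared Process procedure:

   declare
      use Integer_Vectors;  --  hypothetical instantiation

      V : Vector;

      procedure Double (Element : in out Integer) is
      begin
         Element := Element * 2;
      end Double;
   begin
      Append (V, 21);
      Update_Element (Position => First (V), Process => Double'Access);
      pragma Assert (Element (First (V)) = 42);
   end;

Passing Double'Access to an anonymous access-to-procedure parameter is the
Ada 0Y "downward closure" facility that makes the non-generic formulation
possible.]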
procedure Swap (Container : in Vector; I, J : in Index_Type'Base); procedure Swap (Container : in out Vector; I, J : in Cursor); MJH: The declaration of the second Swap operation (the one for which I and J have type Cursor) appears to be incorrect. Firstly, the Container parameter is inout. (This was probably for symmetry with the cursor-based Swap for the List container -- see below.) It would only need to be in-mode. However, I think the real error is that there is a container parameter at all. It is never that case that you need to pass a container when all you're doing is manipulating an element through a cursor (e.g. E := Element (C)). I think the cursor-based swap operation should be declared this way: procedure Swap (I, J : in Cursor); The comment also applies to the (cursor-based) swap operation for the List containers. (This change is really a consequence of clarifying the semantics of swap for list containers during the ARG meeting in Palma.) ENDMJH. MJH: Should be weaken the precondition, allowing I and J to have the value No_Element, in which case Swap is a no-op? ENDMJH. function Is_In (Item : Element_Type; Container : Vector) return Boolean; MJH: As a consequence of making the vector type tagged, the parameters should be put in the opposite order, like this: function Is_In (Container : Vector; Item : Element_Type) return Boolean; ENDMJH. generic with procedure Process (Position : in Cursor); procedure Generic_Iteration (Container : in Vector); generic with procedure Process (Position : in Cursor); procedure Generic_Reverse_Iteration (Container : in Vector); MJH: These operations aren't generic anymore. Also, the name should probably be changed to use verb-style instead of noun-style (the existing name is consistent with the style for generic operations as Unchecked_Deallocation, etc): procedure Iterate (Container : in Vector; Process : access procedure (Position : in Cursor)); procedure Reverse_Iterate (Container : in Vector; Process : access procedure (Position : in Cursor)); ENDMJH. list: procedure Swap (Container : in out List; I, J : in Cursor); MJH: The semantics of Swap were clarified in Palma so that only elements are swapped, and not nodes. Hence there is no need for a container parameter, and the swap operation should be declared like this: procedure Swap (I, J : in Cursor); This cursor-based swap operation for Vector should be declared similarly. ENDMJH. MJH: As for the Vector, the parameters for Is_In should be reordered so that the container parameter is the first parameter. ENDMJH. map: generic type Key_Type is private; type Element_Type is private; with function Hash (Key : Key_Type) return Hash_Type is <>; with function Is_Equal_Key (Left, Right : Key_Type) return Boolean is "="; with function "=" (Left, Right : Element_Type) return Boolean is <>; package AI302.Containers.Hashed_Maps is ...; MJH: There was a lot of talk between Tucker and Pascal about the declaration of generic formal region for the hashed map. I think Tucker wanted it to look like this: generic type Key_Type is private; type Element_Type is private; with function Hash (Key : Key_Type) return Hash_Type; --VERIFY WHETHER THERE'S NO DEFAULT with function "=" (Left, Right : Key_Type) return Boolean is <>; with function Equivalent (Left, Right : Key_Type) return Boolean is "="; with function "=" (Left, Right : Element_Type) return Boolean is <>; package AI302.Containers.Hashed_Maps is ...; We agreed that the map container would use keys to compute map container equality. 
My question is exactly how this should be done.

Firstly, what is the purpose of passing in key equality as a generic formal
parameter? Is it merely to supply a default for Equivalent? Or is it also
used for some other purpose (perhaps to compute map equality)?

To compute map equality, we do something like this:

(1) Compare lengths; if they're different, then return false.

(2a) For each key in the left (say) map, see if it's in the right map. If
it's not found, then return false.

(2b) If the key is found, then compare the elements. If they're not equal,
then return false.

My question is really about step (2a), about what it means to "compare
keys." When we check to see if the key of the left map is in the right map,
we do what we already do during insertion and deletion: compute the hash
value and then call Equivalent (formerly called "Is_Equal_Key").

So what is the purpose of key equality "="? Do we use key equality for some
purpose other than providing a default for Equivalent? Or do we somehow
incorporate an explicit call to key "=" when we "compare keys" during
computation of map equality?
ENDMJH.

MJH:
Another point: the name "Equivalent" is also inconsistent with the cursor
operations named "Is_Equal_Key". Was this intended? Should we leave the
formal operation named "Is_Equal_Key" as is, or change the cursor operations
to use the name "Equivalent"?
ENDMJH.

procedure Insert (Container : in out Map;
                  Key : in Key_Type;
                  New_Item : in Element_Type;
                  Position : out Cursor;
                  Success : out Boolean);

procedure Replace (Container : in out Map;
                   Key : in Key_Type;
                   New_Item : in Element_Type);

procedure Insert (Container : in out Map;
                  Key : in Key_Type;
                  Position : out Cursor;
                  Success : out Boolean);

MJH:
We overloaded the insertion operations for list to include overloadings that
omit a cursor parameter. Should we provide similar overloadings for maps
(and sets -- see below) too? Something like this:

procedure Insert (Container : in out Map;
                  Key : in Key_Type;
                  New_Item : in Element_Type);

(I have omitted an overloading that just accepts a key, since in general we
need a cursor in order to give the element a value following the insertion
proper.)
ENDMJH.

sets:

procedure Insert (Container : in out Set;
                  New_Item : in Element_Type;
                  Position : out Cursor;
                  Success : out Boolean);

MJH:
The following overloading that omits the cursor parameter would be useful:

procedure Insert (Container : in out Set;
                  New_Item : in Element_Type);

I've had a need for this operation, as has Georg Bauhaus (per CLA). It would
also be consistent with list, which has an overloading that omits the cursor
parameter.
ENDMJH.

MJH:
We have a Replace operation for maps, but nothing similar for sets. It might
make sense to include this set operation too:

procedure Replace (Container : in out Set;
                   New_Item : in Element_Type);
ENDMJH.

function Is_Subset (Item : Set; Container : Set) return Boolean;

function Is_Disjoint (Item : Set; Container : Set) return Boolean;

function Is_In (Item : Element_Type; Container : Set) return Boolean;

MJH:
Since the container type is tagged, all of these operations need to reorder
the parameters so that the container is first:

function Is_Subset (Container : Set; Item : Set) return Boolean;

function Is_Disjoint (Container : Set; Item : Set) return Boolean;

function Is_In (Container : Set; Item : Element_Type) return Boolean;
ENDMJH.

MJH:
For both Is_Subset and Is_Disjoint, we should clarify the results when one
or both of the params are empty sets.
ENDMJH.
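[Editor's note: written out in code, the key-based map equality sketched in
steps (1)-(2b) above looks roughly like the function below. This is only an
illustration of the algorithm being discussed, not proposed wording, and it
assumes the usual draft cursor operations (First, Next, Has_Element, Key,
Element, Find). Note that the formal key "=" never appears: step (2a) goes
through Find, which uses Hash and Equivalent, which is precisely the
question being asked.

   function "=" (Left, Right : Map) return Boolean is
      L : Cursor := First (Left);
      R : Cursor;
   begin
      if Length (Left) /= Length (Right) then        --  step (1)
         return False;
      end if;

      while Has_Element (L) loop
         R := Find (Right, Key (L));                 --  step (2a): equivalent key present?
         if not Has_Element (R)
           or else not (Element (R) = Element (L))   --  step (2b): compare the elements
         then
            return False;
         end if;

         Next (L);
      end loop;

      return True;
   end "=";

However the question about key "=" is resolved, only the comparisons inside
the loop would change, not the overall shape.]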
function Find (Container : Set; Item : Element_Type) return Cursor; MJH: Immediately following the declaration of the Find operation, the following two operations are declared: function Ceiling (Container : Set; Item : Element_Type) return Cursor; function Floor (Container : Set; Item : Element_Type) return Cursor; ENDMJH. **************************************************************** From: Matthew Heaney Sent: Sunday, June 27, 2004 6:34 PM > MJH: > For both Is_Subset and Is_Disjoint, we should clarify the > results when one or both of the params are empty sets. ENDMJH. > > function Find (Container : Set; > Item : Element_Type) > return Cursor; > > MJH: > Immediately following the declaration of the Find operation, > the following two operations are declared: > > function Ceiling (Container : Set; > Item : Element_Type) > return Cursor; > > function Floor (Container : Set; > Item : Element_Type) > return Cursor; > ENDMJH. MJH: I forget to mention here that Ceiling and Floor should also be added to the set nested package Generic_Keys: function Is_In (Container : Set; Key : Key_Type) return Boolean; function Find (Container : Set; Key : Key_Type) return Cursor; function Ceiling (Container : Set; Key : Key_Type) return Cursor; function Floor (Container : Set; Key : Key_Type) return Cursor; ENDMJH. MJH: Note also that the generic operation Generic_Keys.Generic_Insertion has been removed. ENDMJH. **************************************************************** From: Nick Roberts Sent: Sunday, June 27, 2004 5:50 PM [My comments are between NJR and ENDNJR.] "Matthew Heaney" wrote: function To_Vector (Count : Count_Type) return Vector; function To_Vector (New_Item : Element_Type; Count : Count_Type) return Vector; MJH: We need to affirm whether the parameter should be named "Count" or "Length". ENDMJH. NJR: The name 'Length' seems very approrpiate to me. ENDNJR MJH: Is this vector operation missing? procedure Set_Length (Container : in out Container_Type; Length : in Size_Type; New_Item : in Element_Type); This would allow you to specify a value for elements that become active when Length > Length (Container). ENDMJH. NJR I think this procedure might make sense (but it might be considered overkill). I guess 'Count_Type' was meant instead of 'Size_Type'. ENDNJR procedure Swap (I, J : in Cursor); NJR Might it be slightly clearer for the name to be 'Swap_Elements'? ENDNJR MJH: Should be [we] weaken the precondition, allowing I and J to have the value No_Element, in which case Swap is a no-op? ENDMJH. NJR All the algorithms that I can think of which swap elements in an array (of some kind) necessarily have a pre-test for validity before doing the swap. I therefore think it would be a useful bug-catcher to specify that an exception is raised if I or J is No_Element. By analogy to an Ada array, doing T := A(I); A(I) := A(J); A(J) := T; would raise Constraint_Error if I or J were out of the range of A. ENDNJR function Is_In (Item : Element_Type; Container : Vector) return Boolean; MJH: As a consequence of making the vector type tagged, the parameters should be put in the opposite order, like this: function Is_In (Container : Vector; Item : Element_Type) return Boolean; ENDMJH. NJR The obvious objection to this is that it would lack consistency with the Ada.Character.Maps packages. But I think I tentatively agree with Matt, in which case the name should probably be changed to something like 'Contains', 'Includes', or 'Has'. 
ENDNJR procedure Swap (I, J : in Cursor); NJR Again, maybe 'Swap_Elements' would be a slightly clearer name. ENDNJR MJH: As for the Vector, the parameters for Is_In [for lists] should be reordered so that the container parameter is the first parameter. ENDMJH. NJR Again, in which case the name should be something like 'Contains', 'Includes', or 'Has'. ENDNJR MJH: Since the container type is tagged, all of these operations need to reorder the parameters so that the container is first: function Is_Subset (Container : Set; Item : Set) return Boolean; function Is_Disjoint (Container : Set; Item : Set) return Boolean; function Is_In (Container : Set; Item : Element_Type) return Boolean; ENDMJH. NJR Again, the objection is that there would be a lack of consistency with Ada.Strings.Maps. If the order of the parameters were to be changed, it would seem that alternative names ought to be chosen for Is_Subset and Is_In. For example: function Contains_All (Container : Set; Item : Set) return Boolean; function Contains (Container : Set; Item : Element_Type) return Boolean; ENDNJR **************************************************************** From: Matthew Heaney Sent: Sunday, June 27, 2004 10:45 PM > MJH: > Is this vector operation missing? > > procedure Set_Length (Container : in out Container_Type; > Length : in Size_Type; > New_Item : in Element_Type); > > This would allow you to specify a value for elements that > become active when Length > Length (Container). ENDMJH. > > NJR > I think this procedure might make sense (but it might be > considered overkill). I guess 'Count_Type' was meant instead > of 'Size_Type'. > ENDNJR. Yes, that was a cut and paste error. The Length parameter should have type Count_Type. (Size_Type is gone.) **************************************************************** From: Matthew Heaney Sent: Sunday, June 27, 2004 10:52 PM > map: > > package AI302.Containers.Hashed_Maps is ...; I forgot to mention the changes to these map two operations: function Size (Container : Map) return Size_Type; procedure Resize (Container : in out Map; Size : in Size_Type); MJH: I made the name changes here the same as for the vector: function Capacity (Container : Map) return Count_Type; procedure Set_Capacity (Container : in out Map; Capacity : in Count_Type); ENDMJH. **************************************************************** From: Pascal Leroy Sent: Monday, June 28, 2004 2:17 AM Matt wrote: > function Is_In (Item : Element_Type; > Container : Vector) > return Boolean; > > MJH: > As a consequence of making the vector type tagged, the > parameters should be put in the opposite order, like this: > > function Is_In (Container : Vector; > Item : Element_Type) > return Boolean; > ENDMJH. I disagree. The reason why we care which parameter comes first is of course the Object.Operation notation introduced by AI 252. However, in this case we want to actively prevent the use of this notation. With your proposed change a call to Is_In could be written: My_Vector.Is_In (My_Element) which reads exactly backwards. This operation is a case where the parameters have a "natural" order and we don't want to change it. (I wish we could redefine "in", but that's a different topic.) **************************************************************** From: Cyrille Comar Sent: Monday, June 28, 2004 3:38 AM Maybe "Is_In" should be renamed "Contains" which has the opposite "natural" order: My_Vector.Contains (My_Element) looks better... 
****************************************************************

From: Pascal Leroy
Sent: Monday, June 28, 2004 4:28 AM

I like the idea. I have never been very happy with the name Is_In anyway.

****************************************************************

From: Matthew Heaney
Sent: Monday, June 28, 2004 8:54 AM

As Nick pointed out, the name Is_In comes from Ada.Strings.Maps (RM95 A.4.2
(13)). (That package also has an Is_Subset operation, with parameters in the
same order as Is_In.)

So I guess it's a choice between consistency with other parts of RM95 and
trying to take advantage of the new syntax allowed by Ada 0Y.

****************************************************************

From: Georg Bauhaus
Sent: Wednesday, June 30, 2004 7:25 AM

!topic unchecked Insert for Sets and Maps
!reference RM95-A.17 [AI95-00302-03]
!from Author Georg Bauhaus 04-06-30
!discussion

There are similar Insert procedures for both Ordered_Sets and Hashed_Maps
with highly useful Position and Success parameters. Sometimes, however, it
is somewhat disturbing to see declarations of variables for Position and
Success that are not read, because it is considered safe to ignore them. It
might be known that Insert will succeed without surprises (ceteris paribus).
Examples include adding initial values to a library-level container, like 69
keywords, or adding known border values to ordered containers.

A workaround is to have wrapper procedures providing the variables necessary
for Insert. But does this not incur more verbiage and/or withing than is
desirable? Adding a convenient procedure to the containers seems easy. In a
sense, this might also make Ordered_Sets and Hashed_Maps correspond more
closely to Vectors and Doubly_Linked_Lists with regard to Insert procedures.

(When using maps, I sometimes think of them as sparse arrays. With arrays, I
can just write ary(key) := value; and be done.)

****************************************************************

From: Matthew Heaney
Sent: Wednesday, June 30, 2004 3:07 PM

You have that operation already; it's called Replace:

   Replace (Map, Key, New_Item => Value);

However, there's nothing like that for the ordered sets. It would appear
useful as an adjunct to Update_Element.

I agree that having overloadings of Insert for sets and maps that omit the
Position and Success parameters would be handy. It's often the case that you
know an insertion will succeed, so having to declare a Boolean object that
you don't bother inspecting is kind of a pain.

****************************************************************

From: Pascal Leroy
Sent: Wednesday, June 30, 2004 3:30 PM

These are sensible suggestions. However, I have to remind you that if this
AI is not approved at the Madison ARG meeting it won't be in the Amendment.
You should be cautious when considering the addition of new features: at
this point we should really be crossing the t's and dotting the i's. This is
especially important given that some countries expressed concerns regarding
the maturity of this particular AI at the WG9 level.

****************************************************************

From: Marius Amado Alves
Sent: Thursday, July 1, 2004 5:43 AM

I think Matthew is 'more right' than Jeffrey. But please note I tend to NOT
advise changing the spec now, for the reasons Pascal gave. So this is mainly
academic now, and sorry for being slightly OT, but I love real code
examples. Two follow, from Mneson.Base (AI302 version 20040227). In (1) I
know in advance that the element is new.
In (2) I want the set semantics (unique elements) guaranteed by the container, so I don't care about the success value. You'll note I tried hard to name the dummy variables in accordance with the circumstances. (1) -- ... use String_Maps; use Short_String_IO; Found, Dont_Need : String_Maps.Cursor_Type; Expected_True : Boolean; begin Found := Find (String_Table, Value); if Found /= Null_Cursor then X := Element (Found); else -- new string -- code here to store the value -- in a Short_String_IO file Insert (Map => String_Table, Key => Value, New_Item => X, Cursor => Dont_Need, Success => Expected_True); end if; -- ... (2) procedure Connect (Source, Target : Vertex) is use Link_Sets; Dont_Need : Cursor_Type; Dont_Care : Boolean; begin Insert (Set => Links, New_Item => (Source, Target), Cursor => Dont_Need, Success => Dont_Care); Insert (Set => Inv_Links, New_Item => (Target, Source), Cursor => Dont_Need, Success => Dont_Care); end; **************************************************************** From: Matthew Heaney Sent: Thursday, July 1, 2004 9:31 AM Note that (1) isn't the most efficient way to do this, since Insert duplicates the effort of Find. I recommend doing it this way instead: C : Cursor; Not_Already_In_Map : Boolean; begin Insert (String_Table, Value, C, Not_Already_In_Map); if Not_Already_In_Map then Replace_Element (C, By => X); else X := Element (C); end if; end; Here, we use the item-less version of Insert to perform an insertion attempt, and then give the item a proper value if the key was actually inserted during this attempt. If the attempt fails, it's because the key was already in the map, so you can then interrogate the item associated with that key. **************************************************************** From: Marius Amado Alves Sent: Thursday, July 1, 2004 10:40 AM Yes I think it applies. Thanks. This kind of optimizations are in the Mneson to do list. Everyone is welcome to join Mneson development ;-) **************************************************************** From: Jeffrey Carter Sent: Wednesday, June 30, 2004 7:59 PM These are no doubt useful operations. However, as any student of defensive programming knows, operations that you know will succeed don't, often enough to be a problem. Since these components must ensure their internal consistency whenever possible, these operations would still have to check that they succeed, and raise an appropriate exception if they don't. **************************************************************** From: Matthew Heaney Sent: Thursday, July 1, 2004 9:32 AM > These are no doubt useful operations. However, as any student of > defensive programming knows, operations that you know will succeed > don't, often enough to be a problem. Since these components must ensure > their internal consistency whenever possible, these operations would > still have to check that they succeed, and raise an appropriate > exception if they don't. There may be some misunderstanding here. The Success parameter merely indicates whether the key was inserted during *this* insertion, not whether the key was inserted into the container. If Success returns False, this simply means that the key was already in the container. So no matter what Success returns, you still have a guarantee that the key is in the map. If this is a map, and it's important that the element associated with the key is always stored in the map (even if the key is already in the map), then use Replace. 
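[Editor's note: the distinction Matt draws here is easy to miss, so a
concrete sketch may help. Int_Maps is a hypothetical instantiation of the
draft map package with Key_Type => Integer and Element_Type => Integer; the
operation names and profiles are the draft ones used in this thread:

   declare
      use Int_Maps;  --  hypothetical instantiation
      M : Map;
      C : Cursor;
      B : Boolean;
   begin
      Insert (M, Key => 1, New_Item => 100, Position => C, Success => B);
      --  B = True; the pair (1, 100) is now in the map.

      Insert (M, Key => 1, New_Item => 999, Position => C, Success => B);
      --  B = False; the map still holds (1, 100).  The insertion attempt
      --  "fails", but the post-condition (key 1 is in the map) holds, so
      --  no exception is raised.

      Replace (M, Key => 1, New_Item => 999);
      --  Now the map holds (1, 999): use Replace when the new element
      --  must win, as Matt says above.
   end;

In neither case is the map left in an inconsistent state, which is the point
of Matt's reply.]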
**************************************************************** From: Jeffrey Carter Sent: Thursday, July 1, 2004 7:54 PM This is an un-Ada-like way to do things, and I'm sorry I didn't realize this earlier. Insert might perform a replacement if the key already exists, or it might consider it an error and raise an exception, but to return a Boolean flag is too C-like for my tastes. **************************************************************** From: Pascal Leroy Sent: Friday, July 2, 2004 2:33 AM I am starting to feel that way too, and I wish I had noticed this earlier. The notion of an out parameter that you can drop on the floor if you like looks like a real safety issue to me. It is all too easy to forget to test this parameter. I know that Matt's philosophy is to trust the programmer, but he has been repeatedly chided by the ARG for this. My preference would be to change the specification of Insert as follows: procedure Insert (Container : in out Map; Key : in Key_Type; New_Item : in Element_Type; Allow_Replacement : in Boolean; Position : out Cursor); If Allow_Replacement is True, Insert will replace any existing entry in the map with the given key/element pair. If Allow_Replacement is False, Insert will raise an exception if the key is already in the map. In the absence of an exception, Position will denote the newly inserted/replaced entry. I realize that I advocated avoiding changes to the specification, but this AI is going to be shot down by WG9 if it contains safety holes. **************************************************************** From: Marius Amado Alves Sent: Friday, July 2, 2004 5:33 AM I don't see a safety hole here, just a different style of doing the same thing. Remember the "Success" parameter does not reflect some kind of 'total' success of the operation, just that the element was already there. Other, really abnormal, conditions raise exceptions as expected. Personally I'm even OK with the occasional Dont_Care dummy variable. But the solution to this is trivial, just add the operation variants without the out parameters as 'proxies' with the expected implementation e.g. procedure Insert (Container, Item) is Dont_Care : Boolean; Dont_Need : Cursor; begin Insert (Container, Item, Dont_Care, Dont_Need); end; **************************************************************** From: Matthew Heaney Sent: Friday, July 2, 2004 8:03 AM > The notion of an out parameter that you can drop on the floor > if you like looks like a real safety issue to me. There are no safety issues. If I have a set of integers, and I do this: Insert (S, 42, C, B); Insert (S, 42, C, B); Then in the first call, B returns True, and in the second case, B returns False. There are no errors. What people have been asking for is the ability to say this: Insert (S, 42); Insert (S, 42); Which everybody seems to agree is perfectly reasonable. > It is all too easy to forget to test this parameter. The Boolean parameter indicates whether the key was already in the container. There are many times for which there is no reason to test the return value, and that is why people are asking for an overloading of insert that doesn't have the extra parameters. > I know that > Matt's philosophy is to trust the programmer, but he has been > repeatedly chided by the ARG for this. True, but there is no safety issue here. 
> My preference would be to change the specification of Insert > as follows: > > procedure Insert (Container : in out Map; > Key : in Key_Type; > New_Item : in Element_Type; > Allow_Replacement : in Boolean; > Position : out Cursor); > > If Allow_Replacement is True, Insert will replace any > existing entry in the map with the given key/element pair. We have a Replace operation already for maps that has that semantics. (I was toying with the idea that it might be nice to have a Replace for sets too.) > If Allow_Replacement is False, Insert will raise an exception > if the key is already in the map. In the absence of an > exception, Position will denote the newly inserted/replaced entry. The Allow_Replacement parameter is an example of "control coupling." If you want replacement behavior, just call Replace! > I realize that I advocated avoiding changes to the > specification, but this AI is going to be shot down by WG9 if > it contains safety holes. There are no safety holes. The issue we had in Palma was with Update_Element for sets (it would be possible to change the order relation of the element), and Tucker suggested a change to remove any possible erroneous behavior. There is nothing wrong with Insert, except for the fact that we didn't overload Insert to omit the position and status parameters. Mario gave some examples of when that operation would be useful. **************************************************************** From: Pascal Leroy Sent: Friday, July 2, 2004 8:15 AM > I don't see a safety hole here, just a different style of > doing the same thing. Remember the "Success" parameter does > not reflect some kind of 'total' success of the operation, > just that the element was already there. Other, really > abnormal, conditions raise exceptions as expected. After calling Insert with some key/element pair, if Success is set to False, the key/element pair is not really in the map. Instead, a pair key/some-other-element is in the map. I see this as a violation of an invariant of the map. > But the solution to this is trivial, just add the > operation variants without the out parameters as 'proxies' This is trivial from the perspective of the implementers or of the language description. It is _not_ trivial from the perspective of the user of the container. More operations make the container more complicated to use, you have to go back to the documentation to find out the meaning of all these operations, and at the end of the day you are less likely to use the container. The string packages are like that at the moment: they contain so much stuff that I can never remember what's there and what's not, so quite often I end up not using them. **************************************************************** From: Robert A. Duff Sent: Friday, July 2, 2004 8:37 AM > procedure Insert (Container : in out Map; > Key : in Key_Type; > New_Item : in Element_Type; > Allow_Replacement : in Boolean; > Position : out Cursor); The way I did this in my container packages is to have two routines: one raises an exception if the key is not there, and the other replaces with a new key=>value pair. You could call them Insert and Replace. I don't see the point of a routine that leaves the key associated with the *old* value. **************************************************************** From: Robert A. Duff Sent: Friday, July 2, 2004 8:46 AM > The way I did this in my container packages is to have two routines: > one raises an exception if the key is not there, and the other ^^^^^^^^^ Oops! 
I meant it raises if the key *is* already there. Sorry. > replaces with a new key=>value pair. You could call them > Insert and Replace. Roughly the same thing works for mappings and for sets. > I don't see the point of a routine that leaves the key associated with > the *old* value. **************************************************************** From: Marc Criley Sent: Friday, July 2, 2004 9:07 AM > This is trivial from the perspective of the implementers or of the > language description. It is _not_ trivial from the perspective of the > user of the container. More operations make the container more > complicated to use, you have to go back to the documentation to find out > the meaning of all these operations, and at the end of the day you are > less likely to use the container. The string packages are like that at > the moment: they contain so much stuff that I can never remember what's > there and what's not, so quite often I end up not using them. The abundance of operations has certainly not hindered the adoption of the C++ STL or the JDK libraries, whose content and complexity far exceed that of the existing and proposed Ada libraries. And this despite those collections often having their functionality supplied not just by a single class, but a whole inheritance hierarchy of classes. Whether it's strings or containers, I expect myself and other programmers to have a general familiarity with the services available, and then to look at the docs (whether in a separate document or embedded as comments associated with the declaration--my preference) for the details. My response to being uncertain about the contents of a library is not to forgo its use, but to go scan through it to see what's available so I can determine if it's useful to me and then take advantage of it if it is! **************************************************************** From: Matthew Heaney Sent: Friday, July 2, 2004 9:31 AM > After calling Insert with some key/element pair, if Success is set to > False, the key/element pair is not really in the map. Instead, a pair > key/some-other-element is in the map. I see this as a violation of an > invariant of the map. First of all, it is *not* a violation of any map invariant, and the behavior you describe is often exactly what we want. I gave an example yesterday, when I re-wrote Mario's example. A histogram is another example (see the !examples of the AI): Frequency_Histogram : Word_Count_Histograms.Map; ... procedure Log_Word (Word : in String) is C : Cursor; B : Boolean; begin Frequency_Histogram.Insert (Key => Word, New_Item => 0, --YES Position => C, Success => B); declare procedure Increment (Count : in out Integer) is begin Count := Count + 1; end; begin Update_Element (C, Increment'Access); end; end Log_Word; This example illustrates why Pascal is wrong. In the example, we attempt to insert the key value Word and the element value 0. This locution is quite deliberate. If the word is already in the map, then this insertion returns False without touching the word count. We then increment the existing count, which is exactly what we want. If the word is not already in the map, then this insertion returns True and the value 0 is associated with that key. We then increment the count, which gives it the value 1, which is exactly what we want. Notice that in neither case was it necessary to interrogate the Boolean return value. It can return True or False, but either value is correct. 
The value returned simply reflects the state of the map for this insertion, but this is state information we don't care about. **************************************************************** From: Matthew Heaney Sent: Friday, July 2, 2004 9:52 AM > I don't see the point of a routine that leaves the key associated with > the *old* value. See my last post containing the word count histogram for an example of why you'd want to preserve the old value. The only change we need to make to the API here is to add an overloading for Insert that omits the position and success parameters. At a minimum this overloading should be added to the sets. (The map has a Replace operation, so we can probably leave the map alone.) **************************************************************** From: Nick Roberts Sent: Friday, July 2, 2004 11:05 AM Blimey people, please listen to a guy who has spent a lot of time designing these kinds of things. You /must/ have an insertion operation which returns (in an out parameter) a boolean (or some other kind of) flag, where the flag indicates what happened (e.g. whether the key already existed or not), but does not raise an exception either way. This is because there are lots of well-used algorithms that rely (for their efficiency) on being able to quickly tell whether a key exists, and to insert a new value (only) if it doesn't (and to know if it did). This rules out indicating non-existing by raising an exception (way too slow), and doing a separate check in advance means searching the tree or hash table twice, which is too inefficient. In Ada at least, it is certainly /nice/ to have another procedure which has no flag, and which simply raises an exception if the key already exists. This is because there are many situations and algorithms that expect never to insert the same key twice (and if it happens, this indicates a problem with code or data). Obviously, this procedure could be written by the user in terms of the former procedure, but it is so often used it seems justified (to me) to provide it. You also need a replacement operation that returns a flag and a deletion operation that returns a flag, for the same reasons. **************************************************************** From: Matthew Heaney Sent: Friday, July 2, 2004 12:10 AM > You /must/ have an insertion operation which returns (in an out parameter) a > boolean (or some other kind of) flag, where the flag indicates what happened > (e.g. whether the key already existed or not), but does not raise an > exception either way. This is because there are lots of well-used algorithms > that rely (for their efficiency) on being able to quickly tell whether a key > exists, and to insert a new value (only) if it doesn't (and to know if it > did). This rules out indicating non-existing by raising an exception (way > too slow), and doing a separate check in advance means searching the tree or > hash table twice, which is too inefficient. I agree with all of this. The API supports all of this behavior. > In Ada at least, it is certainly /nice/ to have another procedure which has > no flag, and which simply raises an exception if the key already exists. > This is because there are many situations and algorithms that expect never > to insert the same key twice (and if it happens, this indicates a problem > with code or data). Obviously, this procedure could be written by the user > in terms of the former procedure, but it is so often used it seems justified > (to me) to provide it. 
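[Editor's note: Nick's efficiency point, in code. Integer_Sets is a
hypothetical instantiation of the draft set package with Element_Type =>
Integer; the two fragments are alternatives, shown back to back only for
brevity. Both leave E in the set and tell the caller whether it was newly
added, but the first searches the underlying structure twice and the second
only once:

   declare
      use Integer_Sets;  --  hypothetical instantiation
      S : Set;
      C : Cursor;
      B, Newly_Added : Boolean;
      E : constant Integer := 42;
   begin
      --  Two searches: a membership test followed by an insertion.
      if not Is_In (E, S) then
         Insert (S, E, C, B);   --  searches the structure a second time
         Newly_Added := True;
      else
         Newly_Added := False;
      end if;

      --  One search: a single insertion attempt that also reports what
      --  happened.
      Insert (S, E, C, B);
      Newly_Added := B;
   end;

Raising an exception instead of setting the flag would, as Nick says, make
this kind of algorithm impractical when "already present" is a normal,
frequent outcome.]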
The user can handle this very easily using the existing API: declare C : Cursor; B : Boolean; begin Insert (S, E, C, B); pragma Assert (B); end; or for a map: declare C : Cursor; B : Boolean; begin Insert (M, K, E, C, B); pragma Assert (B); end; I will state again that if Insert returns False, whether this is an error depends on the application. I have given many !examples of why a status of False is *not* an error. > You also need a replacement operation that returns a flag and a deletion > operation that returns a flag, for the same reasons. You certainly don't need another replacement operation that passes back a flag, since the flag-based insert provides a superset of the functionality of replace. (Look at the implementation of map Replace, which is simply a convenience function that is implemented in terms of Insert). Replace is just the same as: declare C : Cursor; B : Boolean; begin Insert (M, K, E, C, B); if not B then Replace_Element (C, By => E); end if; end; If you want a flag for replacement, then just use the algorithm above. Delete is a borderline case. I probably wouldn't bother passing back a flag since: (1) in the cursor-based delete, the delete must succeed; (2) in the key-based delete, you can test the value returned by Length before and after the call to determine whether the key was deleted, so no flag is necessary; (3) instead of using the key-based delete, you can use Find and the cursor-based delete as follows: declare C : Cursor := Find (M, K); begin if Has_Element (C) then Delete (M, C); end if; end; I will repeat my position on this matter: the only change we need to make here is to add an Insert for sets that omits the position and success parameters, and possibly add a Replace operation for sets. The map is adequate as is. **************************************************************** From: Robert A. Duff Sent: Friday, July 2, 2004 12:29 PM Nick Roberts says: > You /must/ have an insertion operation which returns (in an out parameter) a > boolean (or some other kind of) flag, where the flag indicates what happened > (e.g. whether the key already existed or not), but does not raise an > exception either way. This is because there are lots of well-used algorithms > that rely (for their efficiency) on being able to quickly tell whether a key > exists, and to insert a new value (only) if it doesn't (and to know if it > did). ... I find that convincing. **************************************************************** From: Adam Beneschan Sent: Friday, July 2, 2004 1:19 PM Marius Amado Alves wrote: > > I realize that I advocated avoiding changes to the specification, but this > > AI is going to be shot down by WG9 if it contains safety holes. > > I don't see a safety hole here, just a different style of doing the same > thing. Remember the "Success" parameter does not reflect some kind of > 'total' success of the operation, just that the element was already > there. Other, really abnormal, conditions raise exceptions as expected. and Matthew Heaney wrote: > If I have a set of integers, and I do this: > > Insert (S, 42, C, B); > Insert (S, 42, C, B); > > Then in the first call, B returns True, and in the second case, B returns > False. There are no errors. I agree that this shouldn't necessarily be considered an error (it depends on the application), but doesn't this indicate that the "Success" parameter is misnamed? The opposite of "Success" is "Failure", which does (to me) carry the connotation of "something going WRONG", i.e. an error. 
And sorry, I don't have a better suggestion. Something like We_Were_Able_To_Do_The_Insertion_Because_It_Wasnt_Already_There would be a more descriptive name but suffers from other flaws, such as being about three times too long :) I haven't been following this thread religiously, so my apologies if this ground has been covered already...... **************************************************************** From: Marius Amado Alves Sent: Friday, July 2, 2004 4:22 PM > I agree that this shouldn't necessarily be considered an error (it > depends on the application), but doesn't this indicate that the > "Success" parameter is misnamed? Yes. I had thought about this before. I almost sent an illustration similar to yours, I think it was Insert_If_Not_Already_There_Otherwise_Let_It_Be (... It_Was_Not_There_So_I_Have_Inserted : Boolean) and in the meanwhile I thought of Proper_Insertion or New_Element instead of Success. And the bloody operation is really Ensure_Inserted, no? And to this day I'm still a big fan of "Put" and "Get" (instead of Insert and Element). But I still don't advise any changes now, for the reasons Pascal gave... and took back :-) **************************************************************** From: Matthew Heaney Sent: Friday, July 2, 2004 5:35 PM > I agree that this shouldn't necessarily be considered an > error (it depends on the application), but doesn't this > indicate that the "Success" parameter is misnamed? The > opposite of "Success" is "Failure", which does (to me) carry > the connotation of "something going WRONG", i.e. an error. The boolean parameter simply conveys information about this insertion. My model is that the call is really an insertion attempt, and depending on the current state of the container the attempt can succeed or the attempt can fail. The fact that an insertion attempt "fails" does not imply that there's an error. There would only be an error if the post-condition could not be satisfied, in which case an exception would be raised. But as we have seen, the post-condition is satisfied (because the element is already in the set), so there is no error and hence no exception either. This behavior is similar to atomically grabbing a lock, and immediately returning if the resource is already locked. See for example the Win32 API function TryEnterCriticalSection. **************************************************************** From: Nick Roberts Sent: Friday, July 2, 2004 1:29 PM > The user can handle this very easily using the existing API: ... Yes, but it would be quite a bit neater and easier to be able to write the one line: Insert (S, E); for a set, or: Insert (M, K, E); for a map, and this functionality is quite often required in practice. >> You also need a replacement operation that returns a flag and a >> deletion operation that returns a flag, for the same reasons. > > You certainly don't need another replacement operation that passes > back a flag, since the flag-based insert provides a superset of the > functionality of replace. This isn't the functionality I was thinking of. > Replace is just the same as: > > declare > C : Cursor; > B : Boolean; > begin > Insert (M, K, E, C, B); > > if not B then > Replace_Element (C, By => E); > end if; > end; The replacement operation that I was thinking of would never insert a new key-value pair, it would either: replace the value for a given key, and return 'true' for a flag 'exists'; do nothing and return 'false' for the flag. The above code is not equivalent to this. 
The following code would, I think, be equivalent to what I intended: declare C : Cursor := Find (M, K); begin if Has_Element (C) then Replace_Element (C, By => E); end if; end; However, it would be slightly neater and easier to be able to write something like: Replace_When_Exists (M, K, E, B); > Delete is a borderline case. I probably wouldn't bother passing > back a flag since: > ... > (2) in the key-based delete, you can test the value returned by > Length before and after the call to determine whether the key was > deleted, so no flag is necessary; That's fine, since the Length is then acting as a kind of flag. Maybe not the neastest solution (there seems the danger of somewhat obfuscated code). > (3) instead of using the key-based delete, you can use Find and > the cursor-based delete as follows: > > declare > C : Cursor := Find (M, K); > begin > if Has_Element (C) then > Delete (M, C); > end if; > end; But again it would be slightly neater and easier to write something like: Delete_When_Exists (S, E, B); for a set, or: Delete_When_Exists (M, K, B); for a map. > I will repeat my position on this matter: the only change we need > to make here is to add an Insert for sets that omits the position > and success parameters, and possibly add a Replace operation for > sets. The map is adequate as is. I agree with this statement, but I think there is a marginal argument for the addition of replacement and deletion operations with a flag. **************************************************************** From: Matthew Heaney Sent: Friday, July 2, 2004 5:22 PM > Yes, but it would be quite a bit neater and easier to be able > to write the > one line: > > Insert (S, E); > for a set, or: Yes, but you want this statement to raise an exception if E is already in set S. This is *not* what I want. The reason we disagree is because we have different pre- and post-conditions for set insertion. Matt's pre- and post-conditions are: procedure Insert (S : in out Set; E : in ET); --pre: True --post: Is_In (S, E) That's why in Matt's universe there are no exceptions: the precondition is as weak as possible, and the post-condition guarantees that the element is in the set. If E is already in S, then the post-condition is satisfied, and when the post-condition is satisfied then there's no reason to raise an exception. However, in Nick's universe the pre-condition is: procedure Insert (S : in out Set; E : in ET); --pre: not Is_In (S, E) --post: Is_In (S, E) If the element is already in the set, then the precondition has been violated, and so an exception is raised. However, this behavior doesn't make a lot of sense, since the invariants of the set abstraction are preserved even if we were to weaken the pre-condition (as in Matt's universe). A note on exception behavior: In general, if a pre-condition is violated (and the operation detects this), then it is appropriate to raise an exception, in order to preserve the integrity of the abstraction, and to signal the fact that the post-condition cannot be satisfied. The strange thing about Nick's semantics is that the post-condition is satisfied even if the pre-condition isn't, so what's the point of having an exception? The effect of the call is the same either way. 
However, because we want to be good citizens, we're not supposed to violate pre-conditions (the exception is there to remind us to change our bad behavior), so insertion into a set would have to be written like this:

   declare
      C : Cursor := Find (S, E);
   begin
      if not Has_Element (C) then
         Insert (S, E);
      end if;
   end;

But now of course we have doubled the amount of work, since the work done by Insert simply duplicates the work of Find, which is precisely what we were trying to avoid! At the end of the day, an insertion operation that omits the cursor and boolean parameters should have the same behavior as the insertion operation that includes those parameters. If you want insertion that omits the parameters to have a different behavior, then the operation should have a different name. This is precisely why the map operation sans cursor and boolean is named "Replace" instead of "Insert". > Insert (M, K, E); > > for a map, and this functionality is quite often required in practice. I have *never* needed an insertion operation to raise an exception, especially when I have an insertion operation that reports whether the insertion succeeded. I don't find the argument "often required in practice" very convincing, especially since we have had many actual examples of code from me and others that specifically don't need or want the exception. Instead of calling Insert as above (and getting an exception), just call Replace:

   Replace (M, K, E);

That does everything that the insertion operation above does, but without the exception. **************************************************************** From: Matthew Heaney Sent: Friday, July 2, 2004 8:56 PM BTW: The duplication of search overhead could be avoided in the code fragment above if the API had an insert with hint:

   declare
      C : Cursor := Ceiling (S, E);
   begin
      if not Has_Element (C) or else E < C then
         S.Insert (Hint => C, New_Item => E);
      end if;
   end;

The hint form of insertion guarantees that insertion is O(1) if the hint is useful. This property is satisfied by the result of the Ceiling function. **************************************************************** From: Nick Roberts Sent: Friday, July 2, 2004 9:28 PM > ... > Yes, but you want this statement to raise an exception if E is > already in set S. This is *not* what I want. Heh. But it /is/ what I want! Actually, for set insertion, I think both operations -- raise exception if already there, do nothing if already there -- would be nice. I would kinda expect the latter to be named something like 'union', but that's pretty cosmetic. The reason I would like the insertion that would raise an exception is, for example, if I were reading a list of values from a file (or any serial source), and I wanted to check that there were no duplicates. On the other hand, of course, if I just wanted to build up a set and I didn't care about duplicates, I'd want the other kind of insertion. >> Insert (M, K, E); >> >> for a map, and this functionality is quite often required in >> practice. > I have *never* needed an insertion operation to raise an > exception, especially when I have an insertion operation that > reports whether the insertion succeeded. I don't find the > argument "often required in practice" very convincing, > especially since we have had many actual examples of code > from me and others that specifically don't need or want the > exception.
Well, all I can say is that I have been writing real application programs for business, science, and the military, for many, many years, and it has often been a requirement for me. The typical scenario is that I've got a file (or other serial source) to read into a map (or equivalent), and I must check that there are no duplicates (of key). **************************************************************** From: Matthew Heaney Sent: Saturday, July 3, 2004 1:05 AM > The reason I would like the insertion that would raise an exception is, > for example, if I were reading a list of values from a file (or any serial > source), and I wanted to check that there were no duplicates. Fine, then use the Success parameter to check that there are no duplicates. > Well, all I can say is that I have been writing real application programs > for business, science, and the military, for many, many years, and it > has often been a requirement for me. The typical scenario is that I've got > a file (or other serial source) to read into a map (or equivalent), and I > must check that there are no duplicates (of key). Fine, then use the Success parameter to check that there are no duplicates. **************************************************************** From: Nick Roberts Sent: Saturday, July 3, 2004 4:37 AM > Fine, then use the Success parameter to check that there > are no duplicates. [x2] Hehe. Yes, but that's not the point. The whole point of the original suggestion (an insertion without a flag) was not that it would do something that /cannot/ be done by the version with a flag, but that it would be a bit more convenient in many typical cases. It's merely the difference between:

   Account : Account_Record;
   ...
   while not End_of_File (F) loop
      Read (F, Account);
      Insert (M, Account.ID, Account.Balance);
   end loop;

and:

   Account : Account_Record;
   Okay    : Boolean;
   ...
   while not End_of_File (F) loop
      Read (F, Account);
      Insert (M, Account.ID, Account.Balance, Okay);
      if not Okay then
         raise Duplicate_Account;
      end if;
   end loop;

so I would have to admit that it's almost a trivial convenience. But this /is/ a scenario that occurs very often in practice, and I think that actually justifies the inclusion of the non-flagged insertion. **************************************************************** From: Matthew Heaney Sent: Saturday, July 3, 2004 11:41 AM The point I was trying to make is that the library doesn't know whether the statement: Insert (M, K, E); is an error, meaning that it should propagate an exception if key K is already in map M. Only the library user can know whether a duplicate key is an error. (We have seen examples of both interpretations.) > It's merely the difference between: [example snipped] I would have handled a duplicate key by simply ignoring it. Or I would have used Replace. My argument is that the library should be neutral wrt duplicate key behavior. > so I would have to admit that it's almost a trivial > convenience. But this > /is/ a scenario that occurs very often in practice, and I think that > actually justifies the inclusion of the non-flagged insertion. It's helpful to differentiate maps and sets here. Everyone seems to agree that the set statement: Insert (S, E); makes sense, since if E is already in the set then the post-condition is satisfied. The debate is about how to interpret the map statement: Insert (M, K, E); There are two possible interpretations if K is already in M:

   (1) This is an error, and a duplicate key exception is propagated.
   (2) This is not an error, and there are no exceptions.
You could justify (1) on the grounds that since value E is not entered into the map, then the caller should be alerted to this fact. However, a reason to reject interpretation (1) is that you can simply call Replace to get that behavior. That leaves (2). This has the benefit of symmetry with the corresponding insertion operation for sets. The meaning of these operations is "if the key is already in the container, then do nothing." Another meaning is "this is the same as the canonical insertion operation, except that the cursor and boolean parameters are omitted." Obviously I favor interpretation (2). **************************************************************** From: Matthew Heaney Sent: Saturday, July 3, 2004 1:12 PM I think I've got to confess that I didn't expect Replace to have these semantics, and I didn't read the AI carefully enough. Sorry. I think perhaps, in the light of the lateness in the day, and the fact that these container abstractions were always intended to be quite low-level, upon which users would generally build higher-level abstractions, it's not worth arguing about a few extra convenience procedures too much. **************************************************************** From: Randy Brukardt Sent: Saturday, July 3, 2004 3:59 PM ... > The debate is about how to interpret the map statement: > > Insert (M, K, E); > > There are two possible interpretations if K is already in M: > > (1) This is an error, and a duplicate key exception is propagated. > (2) This is not an error, and there are no exceptions. > > You could justify (1) on the grounds that since value E is not > entered into the map, then the caller should be alerted to this fact. > However, a reason to reject interpretation (1) is that you can simply > call Replace to get that behavior. Ugh. Something called "Replace" should not have insertion semantics; that is, replacing something that doesn't exist is an error in my view. Probably the primary cause of confusion in this discussion is that "Replace" might do an insert, and "Insert" might not do an insert. Both of these seem goofy to me. But, as Nick said, it's probably more important that these are consistent and stable than that they match a particular world view (mine :-). **************************************************************** From: Matthew Heaney Sent: Saturday, July 3, 2004 4:47 PM I realized after I composed my last message that another possibility is to interpret the statement: Insert (M, K, E); as having the same behavior as the operation we're calling "Replace." That would allow us to either get rid of the existing Replace operation, or keep it but give it the (slightly different) semantics Nick described in his earlier post. (It sounds like Randy's leaning that way already.) **************************************************************** From: Pascal Leroy Sent: Monday, July 5, 2004 4:03 AM Randy wrote: > Ugh. Something called "Replace" should not have insertion > semantics; that is, replacing something that doesn't exist is > an error in my view. Probably the primary cause of confusion > in this discussion is that "Replace" might do an insert, and > "Insert" might not do an insert. Both of these seem goofy to me. Agreed. This is hopelessly confusing. > But, as Nick said, it's probably more important that these > are consistent and stable than that they match a particular > world view (mine :-). None of this sounds "consistent" or "stable" to me.
At any rate the only world view that matters is that of the Heads of Delegations who will vote at the next WG9 meeting. In light of the discussion at the last meeting, I see trouble ahead. But that may only be my inexperience... **************************************************************** From: Marius Amado Alves Sent: Monday, July 5, 2004 9:32 AM Now you're scaring me! You mean the proposal seriously risks not passing simply because of this? Would a Note help? Operations Replace and Insert have a slightly more complex semantics than a direct interpretation of their names. Namely, the effect is conditioned by the prior existence of the specified key:

                            Replace          Insert
   ------------------------------------------------------
   Key is already there     replace item     no change
   Key is not there yet     add key, item    add key, item
   ------------------------------------------------------

**************************************************************** From: Pascal Leroy Sent: Monday, July 5, 2004 10:38 AM I am saying (repeating, actually) that some countries have expressed concerns regarding the safety of the containers library as it stood at the time of the last WG9 meeting. I suspect that these countries will ultimately oppose this AI if they think that there are safety issues. Whether that will be the majority (which would effectively kill the AI) or not is an interesting question. Whether the said countries would go so far as to oppose the entire Amendment is another interesting question. As you can imagine, we can just go forward and count the votes, but the outcome may be unpleasant. It's much better to find a consensus before the vote. In the case at hand I am uncomfortable with the notion that Replace sometimes has an insertion semantics and Insert sometimes has a no-op semantics. My opinion doesn't count, however, as I am not voting at WG9. Furthermore, I may just be overreacting. However, I have a feeling that this dodgy semantics is going to be hard to swallow in some quarters. Back to the technical discussion. As far as I can tell we have identified five different behaviors, all of which make sense depending on the application needs:

   1 - Insert and fail if key is already in the map.
   2 - Insert and replace element if key is already in the map.
   3 - Insert and do nothing if key is already in the map.
   4 - Replace and fail if key is not in the map.
   5 - Replace and insert if key is not in the map.

My advice would be to provide all five behaviors using five subprograms with clearly distinct names. The notion of an out parameter that you can drop on the floor is sure to make people nervous. On the other hand, no-one is going to argue that there is a safety problem if you called Foo when you really wanted to call Bar. Just my two cents... **************************************************************** From: Jeffrey Carter Sent: Monday, July 5, 2004 12:02 PM There's also 6 - Replace and do nothing if key is not in the map. > My advice would be to provide all five behaviors using five subprograms > with clearly distinct names. The notion of an out parameter that you can > drop on the floor is sure to make people nervous. On the other hand, > no-one is going to argue that there is a safety problem if you called Foo > when you really wanted to call Bar. I'm not sure providing 6 different insert and replace operations is a good idea, either. One of each, with behaviors that don't overlap, combined with query operations that allow building the other 4, may be the clearest approach.
That would argue for the operations that fail. **************************************************************** From: Jean-Pierre Rosen Sent: Monday, July 5, 2004 12:13 PM I'm a bit afraid of having too many subprograms... Why not follow the example of the "Drop" parameter in Ada.Strings.Fixed, i.e. having an enumeration specifying behaviour? **************************************************************** From: Nick Roberts Sent: Monday, July 5, 2004 12:16 PM I can suggest an emergency alteration to the AI. The changes required don't seem to be drastic. There are four cases covered here (cases 2 and 5 that Pascal suggested seem to be the same). > 3 - Insert and do nothing if key is already in the map.

   procedure Insert (Container : in out Map;
                     Key       : in     Key_Type;
                     New_Item  : in     Element_Type;
                     Position  :    out Cursor;
                     Success   :    out Boolean);

If Length (Container) equals Size (Container), then Insert calls Resize to resize Container to some larger value. Insert then uses Hash and Is_Equal_Key to check if Key is already present in Container. If a key matches, Success returns False and Position designates the element with the matching key. Otherwise, Insert allocates a new node, initializes it to Key and New_Item, and adds it to Container. Success returns True and Position designates the newly-inserted node. Any exceptions raised during allocation are propagated and Container is not modified. [This is exactly as in the current AI.] > 1 - Insert and fail if key is already in the map.

   procedure Insert (Container : in out Map;
                     Key       : in     Key_Type;
                     New_Item  : in     Element_Type;
                     Position  :    out Cursor);

[One possible wording is:] Insert without a Success parameter is equivalent to Insert with a Success parameter, with the difference that if Success would have been False then this operation propagates the exception Insertion_Error. [or another possible wording is:] If Length (Container) equals Size (Container), then Insert calls Resize to resize Container to some larger value. Insert then uses Hash and Is_Equal_Key to check if Key is already present in Container. If a key matches, Insert propagates the exception Insertion_Error, and Position designates the element with the matching key. Otherwise, Insert allocates a new node, initializes it to Key and New_Item, and adds it to Container. Position designates the newly-inserted node. Any exceptions raised during allocation are propagated and Container is not modified. > 2 - Insert and replace element if key is already in the map. > 5 - Replace and insert if key is not in the map.

   procedure Insert_or_Replace (Container : in out Map;
                                Key       : in     Key_Type;
                                New_Item  : in     Element_Type);

Insert_or_Replace inserts Key and New_Item as per Insert, with the difference that if Key is already in the map, then this operation assigns New_Item to the element associated with Key. Any exceptions raised during assignment are propagated. [This procedure is named Replace in the current AI.] > 4 - Replace and fail if key is not in the map.

   procedure Replace (Container : in out Map;
                      Key       : in     Key_Type;
                      New_Item  : in     Element_Type);

Replace assigns New_Item to the element associated with Key. If Key is not already in the map, then this operation propagates the exception Replacement_Error, and does not perform any assignment. Any exceptions raised during assignment are propagated. [This procedure has the same profile as Replace in the current AI, but the wording is changed to provide the exception raising semantics.]
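To make the intent of case 1 concrete, here is a rough sketch (not part of the proposal wording) of how its call site might look, reusing the account-loading scenario from earlier in the thread; M, F, Account, and Position stand for declarations assumed to exist, and Insertion_Error is the exception named in the wording above:

   --  Sketch only: load a file into a map, treating a duplicate key as an error.
   while not End_Of_File (F) loop
      Read (F, Account);
      Insert (M, Account.ID, Account.Balance, Position);
      --  propagates Insertion_Error if Account.ID is already in M
   end loop;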
Instead of the name 'Insert_or_Replace' I have suggested, a name such as 'Emplace' might be considered a little more succinct. We need to add the two exceptions Insertion_Error and Replacement_Error to the base containers package: ~~~ The library package Containers has the following declaration: package Ada.Containers is pragma Pure; type Hash_Type is mod ; type Size_Type is range 0 .. ; Insertion_Error, Replacement_Error: exception; end Ada.Containers; Hash_Type represents the range of the result of a hash function. Size_Type represents the (potential or actual) size (number of elements) of a container. Insertion_Error is raised when insertion into a container fails. Replacement_Error is raised when replacement of a value in a container fails. ~~~ I actually think we should refrain from trying to add any further operations to the AI now, since there could be a combinatorial explosion due to the other permutations (e.g. supplied value versus default value for insertion, fail silently versus raise exception versus return a flag for deletion). I guess it'll be difficult to make any changes at all. **************************************************************** From: Nick Roberts Sent: Monday, July 5, 2004 12:37 PM Possibly the following procedure should also be added: ~~~ procedure Replace (Container : in out Map; Key : in Key_Type; New_Item : in Element_Type; Success : out Boolean); ~~~ I'll write a wording later. The idea is that it assigns New_Item if Key exists and sets Success to True, otherwise it sets Success to False. It could be argued that this procedure would not be much better than finding the key into a cursor, and then testing the cursor (and doing a Replace_Element if it is not No_Element). > I'm not sure providing 6 different insert and replace operations > is a good idea, either. One of each, with behaviors that don't > overlap, combined with query operations that allow building the > other 4, may be the clearest approach. That would argue for the > operations that fail. I think this is basically right, and if we think of adding any operations at all at this stage, it should be as few as possible. I think it's probably okay for the Delete to silently do nothing if the given Key does not exist, although this behaviour might be surprising to some programmers. **************************************************************** From: Marius Amado Alves Sent: Monday, July 5, 2004 12:35 PM > I'm not sure providing 6 different insert and replace operations is a > good idea, either. One of each, with behaviors that don't overlap, > combined with query operations that allow building the other 4, may be > the clearest approach. That would argue for the operations that fail. Whatever you do, remember that 'algebraic' behavior is not the only factor in the design, there is also the fact that these operations perform search, combined with the fact that many idioms can profit on this fact for efficiency if that result is made known or used directly by the 'strange' semantics (e.g. Replace doing an insertion), usually to avoid searching twice, as already well exemplified. The current operations represent well a set of primitives that takes this 'unpure' but required factors into consideration. Rename them Extended_Replace and Extended_Insert and provide the 'pure' operations with the 'unextended' names perhaps. 
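As a rough illustration of the search-once point, using only operations that appear in the draft profiles quoted in this thread (M, K, E, C, and B stand for assumed declarations of a map, a key, an element, a cursor, and a Boolean):

   --  Two traversals: Find searches the container, and Insert searches it again.
   C := Find (M, K);
   if not Has_Element (C) then
      Insert (M, K, E, C, B);
   end if;

   --  One traversal: the conditional Insert searches once and reports,
   --  via B, whether K was already present.
   Insert (M, K, E, C, B);

The second form is the reason the flag-returning primitives are kept even though their semantics look "strange" at first sight.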
**************************************************************** From: Nick Roberts Sent: Monday, July 5, 2004 12:56 PM On Mon, 5 Jul 2004 19:13:00 +0200, Jean-Pierre Rosen wrote: > I'm a bit afraid of having too many subprograms... Yes, I think we all are! > Why not follow the example of the "Drop" parameter in > Ada.Strings.Fixed, i.e. having an enumeration specifying > behaviour? I think that's a reasonable idea. We could add:

   type Failure_Action is (Ignore, Error);

to the base package Ada.Containers and then add a parameter such as:

   On_Failure : Failure_Action := Error

to the Insert, Replace, and Delete operations that didn't have a Success parameter. This parameter could be named 'When_Exists' or 'When_Absent' as appropriate. **************************************************************** From: Marius Amado Alves Sent: Monday, July 5, 2004 2:22 PM Someone had already proposed a 'control coupled' design, and with only Boolean types, which is better than this because it has fewer special types to manage. (Those special types in the standard String operations are a pain: each time I need them here, I go to the RM to find out exactly which package I have to "with", and then I have to use "use" to keep my sanity when writing the calls.) And some say control coupling is a bad thing, whatever types you use. **************************************************************** From: Nick Roberts Sent: Monday, July 5, 2004 3:50 PM I too think control-coupled procedures are often a bad idea, generally because they can make the implementations of those procedures a logical tangle (of deeply nested ifs and cases), which can be bad for correctness and maintenance, and sometimes significantly bad for performance. On the other hand, a proliferation of procedures, where you have many different variations on a theme, is also a bad idea. Indeed, in this case, I'd say a worse idea. As for using Booleans, I've tried that myself in the past and gradually come to the conclusion that using enumerated types with (more) meaningful names is usually preferable. The name availability problem can be ameliorated by techniques such as replication within the generic specification. For example, one could add the declarations:

   subtype Failure_Action is Ada.Containers.Failure_Action;
   Ignore : constant Failure_Action := Ada.Containers.Ignore;
   Error  : constant Failure_Action := Ada.Containers.Error;

to the specification of Ada.Containers.Hashed_Maps. **************************************************************** From: Matthew Heaney Sent: Monday, July 5, 2004 9:11 PM > I am saying (repeating, actually) that some countries have > expressed concerns regarding the safety of the containers > library as it stood at the time of the last WG9 meeting. What "safety" issue? The only one I can think of is the behavior of Update_Element for sets. Tucker suggested a change to remove the erroneous behavior and now all is well. Were there some others? > In the case at hand I am uncomfortable with the notion that > Replace sometimes has an insertion semantics and Insert > sometimes has a no-op semantics. Then why not just rename "Replace" to "Insert" instead? We would then have:

   (1) Insert (Container : in out Map;
               Key       : in     Key_Type;
               New_Item  : in     Element_Type;
               Position  :    out Cursor;
               Success   :    out Boolean);

If the Key is not in the map Container, then insert a new value pair Key and New_Item in the map, and set Success to True. If the Key is already in the map, then set Success to False and don't do anything else.
This is what we have right now. This has the same semantics as the similarly-named operation in the STL. (2) Insert (Container : in out Map; Key : in Key_Type; New_Item : in Element_Type); If Key is not in the map, then insert a new value pair Key and New_Item in the map. If Key is already in the map, then assign New_Item to the element associated with Key. We have this operation in the API already, but it's called "Replace". We could also add another operation, with behavior not in the current API: (3) Replace (Container : in out Map; Key : in Key_Type; New_Item : in Element_Type); If Key is not in the map, then do nothing. If Key is already in the map, then assign New_Item to the element associated with Key. > My opinion doesn't count, > however, as I am not voting at WG9. Furthermore, I may just > be overreacting. However, I have a feeling that this dodgy > semantics is going to be hard to swallow in some quarters. I don't know which semantics are "dodgy," since these are the same semantics (for Insert, anyway) as for the STL, which is already an ISO standard. > Back to the technical discussion. As far as I can tell we > have identified five different behaviors, all of which make > sense depending on the application needs: > > 1 - Insert and fail if key is already in the map. See (1) above. > 2 - Insert and replace element if key is already in the map. See (2) above. > 3 - Insert and do nothing if key is already in the map. Hmmm. What's the difference between "Insert and fail" and "Insert and do nothing"? Same as (1) above? > 4 - Replace and fail if key is not in the map. If by "fail" you mean "do nothing," then see (3) above. > 5 - Replace and insert if key is not in the map. I don't know what this means. Sounds like (2) above. > My advice would be to provide all five behaviors using five > subprograms with clearly distinct names. The consensus seems to be it's a problem that the operation we're now calling "Replace" can insert a new key if it doesn't already exist in the map. That's an easy problem to solve: just name the operation "Insert" instead. > The notion of an > out parameter that you can drop on the floor is sure to make > people nervous. I can't imagine why. Nick gave a nice summary of the rationale for such an operation, and I have given several examples of why conditional insertion is both useful and necessary. Note again that this is exactly what the STL does. > On the other hand, no-one is going to argue > that there is a safety problem if you called Foo when you > really wanted to call Bar. As far as I can tell, some people have objected to the fact that the operation named "Replace" can insert a new key. So either rename it "Insert," or just get rid of it. This is a very minor change. **************************************************************** From: Matthew Heaney Sent: Monday, July 5, 2004 9:20 PM > I'm a bit afraid of having to many subprograms... > Why not follow the example of the "Drop" parameter in > Ada.Strings.Fixed, i.e. having an enumeration specifying behaviour? As others have already pointed out, this is an example of "control coupling." This is my least favorite aspect of the Ada.Strings.* API. With the Insert we have now (the map operation that has 5 parameters), you can build any of the other behaviors. The operation named "Replace" is merely a convenience function, to either insert a new key if it isn't in the map, or replace the current element value if the key is already in the map. 
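For reference, a rough sketch of how each of the behaviors under discussion can be built from the five-parameter Insert, using only operations that appear elsewhere in this thread; M, K, E, C, and B are assumed declarations, and the exception names are placeholders rather than part of any proposal:

   --  1: insert, raise if the key is already present
   Insert (M, K, E, C, B);
   if not B then
      raise Duplicate_Key;            --  placeholder exception
   end if;

   --  2/5: insert, or replace the element if the key is already present
   Insert (M, K, E, C, B);
   if not B then
      Replace_Element (C, By => E);
   end if;

   --  3: insert, do nothing if the key is already present
   Insert (M, K, E, C, B);            --  B is simply ignored

   --  4: replace only, raise if the key is absent
   C := Find (M, K);
   if not Has_Element (C) then
      raise Missing_Key;              --  placeholder exception
   end if;
   Replace_Element (C, By => E);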
The issue seems to be that an operation named "Replace" can insert a new key. All we need to do is rename the operation "Insert". **************************************************************** From: Matthew Heaney Sent: Monday, July 5, 2004 9:24 PM > I can suggest an emergency alteration to the AI. The changes > required > don't seem to be drastic. Well, adding new exceptions to the API is a very drastic change. It's also completely unnecessary. All we need to do is change the name of the operation "Replace" to "Insert", and all is well. **************************************************************** From: Matthew Heaney Sent: Monday, July 5, 2004 9:30 PM > I think this is basically right, and if we think of adding > any operations > at all at this stage, it should be as few as possible. We don't need to add any new operations. Just rename "Replace" to "Insert". (Note that I would be in favor of creating a new replacement-style operation that has the semantics you mentioned in your post a couple of days ago, but even that can be easily built from the Find and Replace_Element primitives.) > I think it's > probably okay for the Delete to silently do nothing if the > given Key does > not exist, although this behaviour might be surprising to > some programmers. I can't imagine why. The post-condition is that the key isn't in the map. If the key isn't in the map before the call, then the post-condition is satisfied. What's surprising about that? **************************************************************** From: Pascal Leroy Sent: Tuesday, July 6, 2004 2:15 AM > > I am saying (repeating, actually) that some countries have > expressed > > concerns regarding the safety of the containers library as > it stood at > > the time of the last WG9 meeting. > > What "safety" issue? The only one I can think of is the > behavior of Update_Element for sets. Tucker suggested a > change to remove the erroneous behavior and now all is well. > Were there some others? If you don't mind, I am not going to disclose private discussions on a public forum. I was only trying to wave a red flag. "Safety" is not merely "erroneousness". Consider two changes that were made by the ARG recently: (1) the definition of map equality was changed to compare the key/element pairs, instead of only the elements; and (2) functions Lower_Bound and Upper_Bound were made symmetrical. There were no erroneousness issues in these cases; still, without these changes the AI was sure to be dead-on-arrival, take my word for it. The reason is that semantics that are "surprising" can very easily lead to programming errors, so it is best to make the semantics as pure and "natural" as possible, given the other constraints. Of course, what counts as "surprising" is in the eye of the beholder to some extent. But the fact that there has been so much discussion on Insert and Replace recently is probably an indication that these operations are not exactly WYSIWYG. > I don't know which semantics are "dodgy," since these are the > same semantics (for Insert, anyway) as for the STL, which is > already an ISO standard. This is completely bogus. The fact that STL is an ISO standard is irrelevant. AI-302 will be judged based on how well it preserves and extends the "good properties" of Ada: safety, readability, portability, etc. If it can be compatible with STL, so much the better, but if it cannot, too bad. In particular, if the semantics of some operations are felt to be inadequate, repeating the mantra "it's the same as STL" won't help.
**************************************************************** From: Pascal Leroy Sent: Tuesday, July 6, 2004 2:31 AM Nick wrote: > I too think control-coupled procedures are often a bad idea, > generally because they can make the implementations of those > procedures a logical tangle (of deeply nested ifs and cases), > which can be bad for correctness and maintenance, and > sometimes significantly bad for performance. Note that we should really be designing this library for the user, not for the implementer. After all, there will only be a handful of implementations of these units (the compiler vendors, plus a few Matts here and there). These implementations will hopefully be extensively tested by the ACATS. So correctness and maintenance of the library itself is only a secondary concern. The real issue is correctness and maintenance of the code on the client side. Here I don't necessarily see control coupling as bad. Syntactically, there is very little difference between:

   Insert_And_Replace_If_Present (...);

and:

   Insert (Replace_If_Present => True, ...);

And in fact, judicious use of defaulted parameters can improve the readability of the calls (you demonstrated how a defaulted parameter could be used to force detection of errors by default). Furthermore, control coupling makes it possible to dynamically/globally switch some options (for instance, detect errors based on the value of an environment variable), something which is hard to do with multiple entry points. > On the other hand, a proliferation of procedures, where you > have many different variations on a theme, is also a bad > idea. Indeed, in this case, I'd say a worse idea. > > As for using Booleans, I've tried that myself in the past and > gradually come to the conclusion that using enumerated types > with (more) meaningful names is usually preferable. I agree with Nick here. **************************************************************** From: Cyrille Comar Sent: Tuesday, July 6, 2004 4:42 AM Pascal Leroy writes: > It's much better to find a consensus before the vote. I strongly agree. Better to have a consensus now, and some official basis for Ada containers as part of the standard, rather than wait for a de facto standard that has not been widely discussed to emerge. **************************************************************** From: Nick Roberts Sent: Tuesday, July 6, 2004 6:33 AM I will play devil's advocate, and make a proposal for changes to the AI, based on the idea of having a parameter to control behaviour when certain pre-conditions are not met. Change the base package: ~~~ The library package Containers has the following declaration:

   package Ada.Containers is
      pragma Pure;
      type Hash_Type is mod <implementation-defined>;
      type Size_Type is range 0 .. <implementation-defined>;
      type Error_Action is (Ignore, Error);
      Key_Error : exception;
   end Ada.Containers;

Hash_Type represents the range of the result of a hash function. Size_Type represents the (potential or actual) size (number of elements) of a container. Key_Error and Error_Action are used in conjunction with certain container operations, for handling the situation when a key does or does not exist, as described below. ~~~ All of the following changes are for the Ada.Containers.Hashed_Maps generic package.
Add the following declarations into the package listing: ~~~

   subtype Error_Action is Ada.Containers.Error_Action;
   Ignore : constant Error_Action := Ada.Containers.Ignore;
   Error  : constant Error_Action := Ada.Containers.Error;

~~~ Add the following wording after that for the existing Insert: ~~~

   procedure Insert (Container  : in out Map;
                     Key        : in     Key_Type;
                     New_Item   : in     Element_Type;
                     Position   :    out Cursor;
                     Key_Exists : in     Error_Action := Error);

Insert without a Success parameter is equivalent to Insert with a Success parameter, with the difference that if Success would have been False then this operation does: nothing, if Key_Exists is Ignore; propagates the exception Key_Error, if Key_Exists is Error. ~~~ Add the specification of this procedure into the package listing. Rename the procedure 'Replace' as 'Insert_Or_Replace' in the package listing and in the wording. Add the following wording after that for Insert_Or_Replace: ~~~

   procedure Replace (Container  : in out Map;
                      Key        : in     Key_Type;
                      New_Item   : in     Element_Type;
                      Key_Absent : in     Error_Action := Error);

Replace assigns New_Item to the element associated with Key. If Key is not already in the map, then this operation does not perform any assignment and does: nothing, if Key_Absent is Ignore; propagates the exception Key_Error, if Key_Absent is Error. Any exceptions raised during assignment are propagated. ~~~ Add the specification of this procedure into the package listing. Replace the current wording for the Delete procedure with Key with: ~~~

   procedure Delete (Container  : in out Map;
                     Key        : in     Key_Type;
                     Key_Absent : in     Error_Action := Error);

Delete uses Hash and Is_Equal_Key to check if Key is present in Container. If Key matches the key of a node, Delete removes the node from the map and then deallocates the node. If Key is not already in the map, then this operation does: nothing, if Key_Absent is Ignore; propagates the exception Key_Error, if Key_Absent is Error. AARM Notes: Delete should only compare elements that hash to the same bucket in the hash table. Delete with Key_Absent=Ignore should work on an empty map; nothing happens in that case. ~~~ Replace the current specification of this procedure in the package listing with the specification above. I think this idea raises an issue which needs consideration. Many other operations raise Constraint_Error (e.g. if a key does not exist, or a cursor is No_Element). One possibility is for Key_Error to be removed from this change, and for Constraint_Error to be raised in its place. Another possibility is to raise Key_Error instead of Constraint_Error in many further places in the AI; in this case a more generalised name, such as Container_Error, might be more appropriate. I feel the latter option would probably be helpful to the user for debugging. I'll happily suggest a set of changes to Ordered_Sets if the above seem at all acceptable. **************************************************************** From: Marius Amado Alves Sent: Tuesday, July 6, 2004 8:23 AM >>As for using Booleans, I've tried that myself in the past and >>gradually come to the conclusion that using enumerated types >>with (more) meaningful names is usually preferable. > > I agree with Nick here. You guys need to take a deep breath or something :-) For the case at hand an enumeration with only two values instead of a Boolean is just useless baggage. But I'm against *any* control coupling here.
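For comparison only, a sketch of what the two styles being debated look like at the call site; Key_Absent, Ignore, Error, and Key_Error follow the draft wording above, while Delete_If_Present is a hypothetical name for the separately-named alternative:

   --  Control-coupled: one Delete, behaviour selected by a parameter.
   Delete (M, K, Key_Absent => Ignore);   --  silently does nothing if K is absent
   Delete (M, K, Key_Absent => Error);    --  propagates Key_Error if K is absent

   --  Separately named operations: the behaviour is carried by the name.
   Delete_If_Present (M, K);              --  hypothetical operation
   Delete (M, K);                         --  raises if K is absent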
FWIW I agree with Matt's set of proposals: simply rename Replace to Insert, maybe rename Success to Proper_Insert or something, maybe add the flagless variant, keep the rest of the map spec as is, and adjust the sets spec. **************************************************************** From: Michael Yoder Sent: Tuesday, July 6, 2004 11:35 AM On the semantics of Insert: speaking mathematically, the notion that "Insert might not insert" doesn't bother me. Insertion of an element ought to be equivalent to finding the union with a singleton set, and yes, that means if the element is already there the set doesn't change. Still, I can accept that programmers' intuitions might not match those of mathematicians in general or mine in particular. If I insert an ace of spades into a poker hand already containing one, this is a situation which (expressed delicately) indicates an error condition. I don't find such analogies appropriate, but others might. For sets, I suggest adding a procedure Insert_New. Speaking coarsely, it acts just like Insert, but raises an exception if the inserted element is already present. For maps, I suggest these names for the five cases enumerated by Pascal: On Jul 5, 2004, at 11:38 AM, Pascal Leroy wrote: > > Back to the technical discussion. As far as I can tell we have identified > five different behaviors, all of which make sense depending on the > application needs: > > 1 - Insert and fail if key is already in the map. Insert_New > 2 - Insert and replace element if key is already in the map. Insert > 3 - Insert and do nothing if key is already in the map. Insert_If_New > 4 - Replace and fail if key is not in the map. Replace_Old > 5 - Replace and insert if key is not in the map. Replace > > My advice would be to provide all five behaviors using five subprograms > with clearly distinct names. The notion of an out parameter that you can > drop on the floor is sure to make people nervous. On the other hand, > no-one is going to argue that there is a safety problem if you called Foo > when you really wanted to call Bar. **************************************************************** From: Matthew Heaney Sent: Sunday, July 11, 2004 9:03 PM > There are similar Insert procedures for both Ordered_Sets > and Hashed_Maps with highly useful Position and Success > parameters. Sometimes however, it seems somewhat disturbing > to see declarations of variables for Position and Success > that are not read because it is considered safe to ignore > them. It might be known that Insert will succeed without > surprises (ceteris paribus). Examples include adding initial > values to a library level container, like 69 keywords, or > adding known border values to ordered containers. In the case of a set, the postcondition is satisfied no matter what value is returned, so yes there should be convenience operation that omits the cursor and boolean parameters, since they only add syntactic noise. In the case of a map, you already have a convenience function called "Replace" that omits the cursor and boolean parameters, and it is defined to replace the element associated with the key if the key already exists in the map. I had originally named the three-parameter insertion operation for maps "Insert", but was concerned that developers would think that it has the same behavior as the five-parameter Insert, but with the cursor and boolean parameters omitted. 
Since the behavior of the three-parameter operation was kind of like Replace_Element (the difference is that it can insert a new key), I named the operation "Replace". However, others have argued that the fact that it can insert a new key means it should be named Insert, which is OK with me. So ultimately your request is for adding a convenience operation to the set. There is already a convenience operation for maps, but it is currently named Replace. > In a > sense, this might also make Ordered_Sets and Hashed_Maps > correspond more closely to Vectors and Doubly_Linkes_List > with regard to Insert procedures. Yes. Here's some history. My original proposal was quite long, and the ARG asked me to cut it down. As I was re-writing the proposal to make it shorter, I looked for operations that were possible candidates for removal, and one of the operations I removed was the two-parameter insert for sets. This appears to have been a mistake, but it's no big deal to add it back in. We deliberately deployed a reference implementation to discover API bugs early, and this looks like one. (If you have an Ada 2005 compiler that lets you compile children of package Ada, then you can use the a-c*.ad[sb] version of the library. It's at the tigris site.) > (When using maps, I sometimes think of them as sparse > arrays. With arrays, I can just write > > ary(key) := value; > > and be done.) This is exactly how the STL works: ary[key] = value; You can even do stuff like: ++ary[key]; and it's guaranteed to work, since the standard says that if key isn't in the map then it is inserted, and its element is constructed with its default value. The index operation then returns a reference to the element (newly-inserted or not), which the increment operator then works on. In the Ada case, you have to pass in the default value explicitly (see my histogram example), but it works the same as the index operation above (the syntax is different, of course): Ary.Insert (Key, 0, C, B); declare procedure Increment (I : in out Integer) is begin I := I + 1; end; begin Update_Element (C, Increment'Access); end; This is equivalent to the C++ statement above. Note that the algorithm above works no matter what value is returned for B. The only thing we require is that the insertion be conditional; that is, if the key is already in the map, then leave the associated element alone. It is certainly not an "error" if B is false and the existing element value is left unmodified, and in fact the algorithm above depends on this behavior. **************************************************************** From: Matthew Heaney Sent: Wednesday, July 7, 2004 6:52 AM Randy: I moved the update_element procedure into the generic_keys nested package and declared it this way: generic with function Key (Element : in Element_Type) return Key_Type; procedure Generic_Update_Element (Container : in out Set; Position : in Cursor; Process : not null access procedure (Element : in out Element_Type)); I made the Key function a generic formal subprogram for that operation directly instead of for the entire nested generic package, since that's the only operation that needs a key selector. The basic algorithm is something like: procedure Generic_Update_Element (...) 
is Old_Key : Key_Type renames Key (Position.Element); begin Process (Position.Element); if Old_Key < Position.Element or else Old_Key > Position.Element then if then raise Constraint_Error; end if; end if; end Generic_Update_Element; Note that you don't need to pass an "=" operator for keys, since you already have "<" and ">" for comparing keys to elements. One question I had was whether update element is allowed to change the key (and hence change the relative position of the key's node). In the code fragment above, the node is moved (deleted then immediately re-inserted) when there's been a change in the value of the key, and there's an error only if there's already a key with the new value. Another way to have handled this is to not allow the re-insertion at all: if Old_Key < Position.Element or else Old_Key > Position.Element then raise Constraint_Error; end if; This handles the key-change more pessimistically. It wasn't clear from your notes which behavior was intended. You can check out the latest packages here: ordered sets spec: ordered sets body: I'll have the indefinite ordered sets (a-ciorse.ad[sb]) done in a day or two. **************************************************************** From: Randy Brukardt Sent: Wednesday, July 7, 2004 6:43 PM ... > I made the Key function a generic formal subprogram for that operation > directly instead of for the entire nested generic package, since that's the > only operation that needs a key selector. I think I prefer it to be on Generic_Keys, simply because that avoids the need to use two instantiations. ... > Note that you don't need to pass an "=" operator for keys, since you already > have "<" and ">" for comparing keys to elements. Right. I had already noted that in the draft minutes. I don't know if I wrote Tucker's proposal down wrong, or if he made that mistake, but it's clearly wrong when you read the whole discussion. > One question I had was whether update element is allowed to change the key > (and hence change the relative position of the key's node). In the code > fragment above, the node is moved (deleted then immediately re-inserted) > when there's been a change in the value of the key, and there's an error > only if there's already a key with the new value. I don't know; Tucker didn't cover that issue. My gut feeling is that the way you have it is better, because dropping elements on the floor (even with an exception) is nasty. If there is an alternative, it would be preferred. (Of course, doing that could silently cause the routine to be expensive, which is also annoying. But it wouldn't be much more expensive than Replace, so it probably is OK.) **************************************************************** From: Matthew Heaney Sent: Wednesday, July 7, 2004 9:31 AM > I think I prefer it to be on Generic_Keys, simply because > that avoids the need to use two instantiations. OK. 
I declared the nested package like this: generic type Key_Type (<>) is limited private; with function Key (Element : in Element_Type) return Key_Type; with function "<" (Left : Key_Type; Right : Element_Type) return Boolean is <>; with function ">" (Left : Key_Type; Right : Element_Type) return Boolean is <>; package Generic_Keys is ...; And I declared the operation like this: procedure Update_Element (Container : in out Set; Position : in Cursor; Process : not null access procedure (Element : in out Element_Type)); > > One question I had was whether update element is allowed to change the > > key (and hence change the relative position of the key's node). In > > the code fragment above, the node is moved (deleted then immediately > > re-inserted) when there's been a change in the value of the key, and > > there's an error only if there's already a key with the new value. > > I don't know; Tucker didn't cover that issue. My gut feeling > is that the way you have it is better, because dropping > elements on the floor (even with an > exception) is nasty. If there is an alternative, it would be > preferred. (Of course, doing that could silently cause the > routine to be expensive, which is also annoying. But it > wouldn't be much more expensive than Replace, so it probably is OK.) Yes, but it's probably still going to beat any kind of normal insertion, since the node isn't destroyed and the element isn't copied. Another thing I realized is that if we do raise an exception (because the new key duplicates an existing key), then the cursor that was passed in effectively becomes a dangling reference. When we delete a key thru a cursor, we set the cursor value to No_Element on return. Another possibility for the declaration of Update_Element, that is analogous to the Delete operation, is: procedure Update_Element (Container : in out Set; Position : in out Cursor; --NOTE MODE Process : not null access procedure (Element : in out Element_Type)); We could define the semantics as follows: if the key hasn't been modified, then Position retains its value. Otherwise, the node is removed from the tree and then re-inserted back into the tree. If the insertion was successful, then Position retains its value. If the insertion was not successful, then the node is deallocated and Position is set to No_Element. (Note that there aren't any exceptions.) It's just an idea, so I figured I'd bring it up. Since Update_Element now has deletion semantics, maybe that should be more obvious. **************************************************************** From: Randy Brukardt Sent: Wednesday, July 7, 2004 7:13 PM At the recent ARG meeting we discussed the meaning of Swap for Lists. When working on the minutes of the meeting, I noticed this discussion, and I think that the resolution of the issue was nonsense (which I failed to realize at the time). So I'm bringing it up here for further discussion. --- Here's the meeting minutes on the topic: Swap for list: After a swap, do the cursors still designate the same elements, or do they designate the swapped elements? The semantics should be similar to that for an array: much like an index designate a “box” in the array, a cursor designates a “box” holding an element, so after a swap, the elements in the boxes should be changed. So we swap the elements, not the nodes. --- There are a couple of problems with this semantics: * This cannot be implemented without an extra level of indirection for indefinite containers. 
While an Ada standard implementation would necessarily have such a level of indirection, it is easy to imagine an implementation supporting an extension to Ada that would eliminate that indirection and the costs thereof. That is, an implementation could support a single indefinite component at the end of the node record (with the requirement that such a record is initialized on creation); that would get rid of extra allocation costs. I don't think we want to prevent such an implementation solely for Swap. * This completely eliminates any performance advantage for a special swap routine for lists. Since the objects will be copied (at least in the definite case), it would always be better for the programmer to code their own routine that swapped the nodes. As such, I don't see the point of defining this routine for Lists. Indeed, since this routine only provides a performance advantage for indefinite vectors (in all other cases the performance is the same or worse than something built out of primitives), I have to wonder if it is worth having this routine at all. In any case, I think that the thinking about cursors reflected in the minutes is flawed. A cursor designates an element - how that is accomplished isn't supposed to be visible in the specification of the containers. We specifically dropped a lot of wording about "nodes" for this reason - it's not necessary to reason about the containers. That said, it's obvious that the Swap for list should just swap the positions of the elements in the list, and cursors should continue to designate the same elements. (Logically, what they designate should be unspecified, but that seems to be going too far for lists.) The vector one is different, simply because cursors are weird -- we have all of the Bounded Error rules to explain that they don't necessarily designate the same element afterwards. Thus, for the vector one, it should be a Bounded Error to use the cursors afterwards, just like for Insert and Delete. (Indexes don't have that sort of issue, and usually using indexes would be preferred.) Another alternative is to recognize that these operations are fundamentally different, and thus give them different names. But that seems to be rather confusing. All in all, I don't think (and I have *never* thought) that a swap operation is worth having for any of the containers, because it is just too complicated to define sensibly. **************************************************************** From: Nick Roberts Sent: Wednesday, July 7, 2004 7:31 PM I have to say, I am annoyed that you didn't follow my suggested model of internal cursors. If you had done so, you simply wouldn't have these problems. **************************************************************** From: Randy Brukardt Sent: Wednesday, July 7, 2004 8:19 PM I haven't the foggiest idea what you are talking about; there aren't any messages from you in Ada Comment with both "internal" and "cursor" that remotely discuss a "model". (If you want to post a reference, *please* just give the date of the message, don't quote the whole thing. There is far too much quoting going on around here, which makes it redundant for later readers of the mail.) And, in any case, the model of cursors really has nothing to do with this. Swap is a stupid operation; it's used mostly in student assignments and sub-optimal sorting algorithms. I don't think I've used Swap for anything else in my nearly 30 years of programming -- simply because it always requires three copies of something, and that's 1/3 too many.
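[Editor's note: a minimal sketch of the kind of element swap "built out of primitives" referred to above, with the three copies visible. Int_Vectors is a hypothetical instantiation of the draft Vectors package with Positive as the index type and Integer as the element type; only Element and Replace_Element are taken from the draft, and their exact parameter profiles here are assumptions.]

   procedure My_Swap (V : in out Int_Vectors.Vector; I, J : in Positive) is
      use Int_Vectors;
      Temp : constant Integer := Element (V, I);   -- copy 1
   begin
      Replace_Element (V, I, Element (V, J));      -- copy 2
      Replace_Element (V, J, Temp);                -- copy 3
   end My_Swap;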
Since the only real use of Swap is in the sorting algorithms, making it user-visible is stupid. If it makes some people feel good, fine, as long as it doesn't screw up the model. It isn't worth that. **************************************************************** From: Nick Roberts Sent: Thursday, July 8, 2004 7:30 AM > I haven't the foggiest idea what you are talking about; there aren't > any messages from you in Ada Comment with both "internal" and > "cursor" that remotely discuss a "model". It was something that was lengthily discussed in comp.lang.ada, the ASCL mailing list, and other places. I'm surprised you don't remember anything about it, Randy, I thought everyone knew about it. My proposals were posted on the web for a long time. Swing your browser to: http://www.adapower.net/ascl/ It's all still there. > And, in any case, the model of cursors really has nothing to do > with this. I think it does. You were arguing that Swap should be dropped for doubly-linked lists (and maybe also for vectors) because, with the current cursor model, it would be difficult (impossible?) to implement the same semantics for both the lists and the vectors. With my model, there would be no such difficulty. > Swap is a stupid operation; it's used mostly in student > assignments and sub-optimal sorting algorithms. Sub-optimal? Such as Quicksort and Mergesort? > Since the only real use of Swap is in the sorting algorithms, > making it user-visible is stupid. I don't know about 'stupid', but less than vitally important perhaps. > If it makes some people feel good, fine, as long as > it doesn't screw up the model. It isn't worth that. Well, I agree, at this stage. I just think it's a pity my model wasn't used from the outset. **************************************************************** From: Matthew Heaney Sent: Thursday, July 8, 2004 9:35 AM > That said, it's obvious that the Swap for list should just swap the > positions of the elements in the list, and cursors should continue to > designate the same elements. (Logically, what they designate should be > unspecified, but that seems to be going too far for lists.) The vector one is > different, simply because cursors are weird -- we have all of the Bounded > Error rules to explain that they don't necessarily designate the same > element afterwards. Thus, for the vector one, it should be a Bounded Error > to use the cursors afterwards, just like for Insert and Delete. (Indexes > don't have that sort of issue, and usually using indexes would be > preferred.) So if you were in favor of retaining Swap for lists, then does this mean that you want the original, pre-Palma semantics? (Swap "exchanges nodes," to use Matt's term.) **************************************************************** From: Randy Brukardt Sent: Thursday, July 8, 2004 1:36 PM Yes. The Palma semantics don't do anything useful, and certainly are not what you would write yourself if you had a need to swap two nodes in a hand-programmed list. **************************************************************** From: Randy Brukardt Sent: Friday, July 2, 2004 2:28 PM > It was something that was lengthily discussed in comp.lang.ada, the ASCL > mailing list, and other places. I'm surprised you don't remember > anything about it, Randy, I thought everyone knew about it. Oh, I didn't realize that you were talking about (ancient) pre-history.
Since such things are not recorded with the AI (meaning that it's impossible to go back and look them up), references to them are going to be more confusing than enlightening. In this case, I thought that you were referring to some specific suggestion for how the wording should be written for *this* proposal -- thus the confusion. In any case, I think virtually everyone here has an idea of how they would design this library differently if they were doing it. Whether those ideas are better or not is pretty much irrelevant, as we've decided to use Matt's basic design - in large part because Matt has done more thinking on this topic (including usage issues) than nearly anyone else, *and* he submitted a complete proposal as a starting point. I certainly would not have implemented cursors as Matt has; but our job at this point is to ensure that the description of those cursors is correct, and that the operation set is complete and described correctly -- not to gripe about some other design being better. (We all know that "best is the enemy of good enough", and that certainly applies here.) **************************************************************** From: Matthew Heaney Sent: Friday, July 9, 2004 9:33 AM Given Randy's analysis, I think the vector and list swap operations look like this: For vectors: procedure Swap (Container : in out Vector; I, J : in Index_Type'Base); procedure Swap (I, J : in Cursor); For lists: procedure Swap (Container : in out List; I, J : in Cursor); The semantics are as follows. For the vector swap operations, the elements in the vector are swapped. One question we need to answer is whether cursors in the cursor-based swap for vectors remain valid. I know that our working model is that cursors designate elements (rather than positions within the vector), so if the element moves then the cursor is supposed to follow the element. (In the reference implementation the cursors do remain valid, since internally they're just index values. But this breaks the model above, since I and J continue to designate the same relative positions as before the swap, and hence I and J deliver different element values following the swap. Compare this to the list behavior below.) For the list swap operation, the nodes designated by I and J are relinked. (I know I'm not supposed to say "nodes" or "relinked," but just bear with me.) I and J continue to designate the same nodes before and after the swap (and hence, I and J return the same element values as existed prior to the swap). Following the swap, I designates the node in J's former relative position in the list, and J designates the node in I's former relative position. (These are just the pre-Palma semantics.) Note that the behaviors for vector swap and list swap are different. I think that's OK, since the reason this operation exists is to allow you to take advantage of the particular representation of the container, in a way that allows swap to be more efficient. In the case of the definite vector, the implementation will most likely exchange element values by creating at least one temporary. So in this one case, swap doesn't confer any great advantage besides convenience (and we need it anyway, since the spec must be identical to the indefinite vector). For the indefinite vector, swap really does confer an advantage, since some form of element indirection is implied, and so the implementor can swap internal pointers instead of elements. (This was our original motivation for introducing a swap operation.)
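[Editor's note: a small sketch of the observable difference just described, assuming the cursor-based vector Swap proposed above and the draft's Element selector for cursors; V, I and J are hypothetical, with V holding the two elements 10 and 20 and I and J designating the first and second elements respectively.]

   Swap (I, J);
   --  With index-valued cursors (the reference implementation), I keeps its
   --  relative position and the elements move, so Element (I) now yields 20.
   --  Under the "a cursor designates an element" model, I would follow its
   --  element, and Element (I) would still yield 10.
   --  The list Swap discussed next takes the latter view: cursors keep
   --  designating the same elements; only their relative positions change.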
The swap for the definite and indefinite lists works the same, by relinking internal nodes. (In the pre-Palma reference implementation, I implemented swap for lists using Splice.) Note that each of the three operations declared above has a different signature. That means there's no possibility that switching from a vector to a list (say) will suddenly introduce different swap semantics (presumably the application was depending on the original semantics), since the signatures are different and hence the change will be caught by the compiler. (You need to pass the list to the list swap operation, since the list caches pointers to the first and last nodes, and so if you're moving nodes then the cache values might change. In the vector case, you're moving elements not nodes, so you don't need to pass the vector object.) **************************************************************** From: Matthew Heaney Sent: Saturday, July 10, 2004 11:44 AM Now that container types are publicly tagged, it might make sense to declare some container type parameters as class-wide. For example, the generic_sort and generic_merge for lists are generic operations, which means these operations aren't primitive for the list type. Should they be declared this way: generic with function "<" (Left, Right : Element_Type) return Boolean is <>; procedure Generic_Sort (Container : in out List'Class); generic with function "<" (Left, Right : Element_Type) return Boolean is <>; procedure Generic_Merge (Target : in out List'Class; Source : in out List'Class); If a user does derive from List (say), then he would have to convert back to the parent type in order to call an instantiation of generic_sort or generic_merge, but that's kind of a pain. In the case of generic_merge, another possibility is to pass the type as a generic formal: generic type List_Type is new List with private; with function "<" (Left, Right : Element_Type) return Boolean is <>; procedure Generic_Merge (Target : in out List_Type; Source : in out List_Type); This would force target and source to have the same type. But of course it's more work to instantiate. For the sets, there are a couple of predicate functions that accept two set objects. The declarations would look like: function Is_Subset (Item : Set; Container : Set'Class) return Boolean; function Is_Disjoint (Item : Set; Container : Set'Class) return Boolean; (Note that we still haven't decided what the parameter names should be -- should "Container" come first?) The sets package also has the nested generic Generic_Keys. None of its operations are primitive for type Set either. That might be an argument for declaring all of the set container parameters as type Set'Class. (Another possibility is to pass in the set type as a generic formal.) Note that the sets have the union, intersection, etc. operations, and the lists have the splice operations, but I think those can stay as is. **************************************************************** From: Nick Roberts Sent: Monday, August 2, 2004 4:23 PM Randy's done a great job, as usual, updating this AI. For most (maybe all) of the changes, I say "Hooray!" As always, I have some comments. Happily, these are all very minor issues. Just dismiss any question already answered or issue already dealt with (and accept my apologies).
* In line 205, swap two items (to accord with the order of presentation of the four container kinds), from: The following major non-limited containers are provided: * (Expandable) Vectors of any non-limited type; * Doubly-linked Lists of any non-limited type; * Ordered Sets of any non-limited type; * Hashed Maps keyed by any non-limited type containing any non-limited type. to: The following major non-limited containers are provided: * (Expandable) Vectors of any non-limited type; * Doubly-linked Lists of any non-limited type; * Hashed Maps keyed by any non-limited type containing any non-limited type. * Ordered Sets of any non-limited type; * Rephrase: Separate versions for definite element types are provided, as those can be implemented more efficiently. as: Separate versions for definite and indefinite element types are provided, as those for definite types can be implemented more efficiently. * Typo in line 253: specify precisely where this will happen (it will happen no latter than the Change 'latter' to 'later'. * Add a requirement (or Imp Adv) for Hash_Type'Modulus to be a power of two? (Line 299.) * In line 307, clarify what the 'back end' of a vector is :-) Maybe: The language-defined package Containers.Vectors provides a private type Vector and a set of operations. A vector container allows insertion and deletion at any position, but it is specifically optimized for insertion and deletion at the back end of the container (the end with the highest index). A vector container also provides random access to its elements. * I suggest to rephrase: A vector container object manages an unconstrained internal array, which expands as necessary as items are inserted. The *capacity* of a vector corresponds to the total length of the internal array, and the *length* of a vector corresponds to the number of active elements in the internal array. A vector container may contain *empty elements*. Empty elements do not have a specified value. as: A vector has a *length*, of the type Containers.Count_Type, which varies dynamically and is the number of elements the vector contains. These elements are the *active elements* of the vector. Their indices are of the subtype Index_Type, and occupy the range f .. f+n-1, where f is Index_Type'First and n is the length of the vector. A vector container object manages an internal array, which expands as necessary. The *capacity* of a vector corresponds to the number of elements which can be stored in the internal array, and will always be no less than the length of the vector. An active element may be *empty*. An empty element does not have a specified value, and it is an unbounded error to read an empty element. I know this is a little bit more long-winded, but I think it is a bit clearer. * Maybe the AARM note at line 329: The internal array is a virtual structure. There is no requirement for the implementation to be a single contiguous array. could be better phrased: The internal array is a conceptual model for the purposes of defining the semantics of vectors. There is no requirement for an implementation to actually use a single contiguous array. * Maybe add a AARM note: pragma Assert (Index_Type'Base'First < Index_Type'First); It is essential that Last_Index is always able to return a valid value, including for an empty vector. It cannot do this if Index_Type'Base'First = Index_Type'First, so it is a requirement that Index_Type'Base'First < Index_Type'First for any instantiation of Containers.Vectors, and this pragma enforces the requirement. 
* What is the purpose of Index_Subtype? (This question seems to have been raised but not answered.) * For vectors, lists, maps, and sets the procedures named 'Iteration' and 'Reverse_Iteration', which were generic but are now procedures taking an access-to-subprogram parameter, would perhaps now be more appropriately named 'Iterate' and 'Reverse_Iterate'? (Or 'Traverse' and 'Reverse_Traverse'? :-) * Should the same thing (change from generic procedure to procedure with access-to-subprogram parameter) be done for Generic_Sort? (I guess the answer is to do with efficiency.) * Should procedure 'Assign' be called 'Copy', to emphasise its distinction from normal assignment? Should it have start and end parameters? (I see idea this already got suggested.) * The description for Last_Index line 987: Returns the position of the last element in Container. could be better expressed as: If the length of Container is 0, Last_Index returns First_Index(Container) - 1, otherwise it returns the index of the last active element in Container. * Typo in line 1065: function Reverse_Find (Container : Vector; Item : Element_Type; -> Index : Index_Type'Base := Index_Type'Las)) return Index_Type'Base; The 't' is missed off 'Last' and there's an extra ')'. * There may be a call ambiguity problem with Find and Reverse_Find. These are provided for both an index and cursor starting point: function Find (Container : Vector; Item : Element_Type; Index : Index_Type'Base := Index_Type'First) return Index_Type'Base; function Find (Container : Vector; Item : Element_Type; Position : Cursor := No_Element) return Cursor; A call Find(Cont,Item) in a context which could require either an index or a cursor will be ambiguous. I'm just pointing this one out at the moment; maybe it's too trivial to worry about. * In the description for Splice, line 1635: last node of Container. The length of Target is incremented, and the length of Source is decremented. 'Container' needs to be changed to 'Target'. * In line 1788: AARM Note: The name is "Hashed_Maps" to allow for a secondary standard to include "Sorted_Maps". Maybe the suggested name should be "Ordered_Maps", by analogy to the package "Ordered_Sets"? * For the map 'Replace' procedure: procedure Replace (Container : in out Map; Key : in Key_Type; New_Item : in Element_Type); Replace inserts Key and New_Item as per Insert, with the difference that if Key is already in the map, then this operation assigns New_Item to the element associated with Key. Any exceptions raised during assignment are propagated. I think this procedure should be called 'Replace_or_Insert', to avoid potential confusion about its semantics to anyone reading code using it. This point has already been argued about, so maybe it should rest as is now. * For the Delete at cursor procedure: procedure Delete (Container : in out Map; Position : in out Cursor); If Position equals No_Element, this operation has no effect. Otherwise, Delete removes the node from the map and deallocates the node. Position is set to No_Element on return. Is it such a good idea for Position to have mode 'in out'? This prevents constructions such as (concocting an example): Delete(M,Next(C)); which might be handy sometimes. * Minor typo in line 2118: If Length (Container) > Count, then Constraint_Error is propogated. Change 'propogated' to 'propagated'. Also, on the following line, perhaps the phraseology should be: Otherwise, Set_Capacity allocates ... I'd suggest the same for the wording for the following First and Next functions. 
* Is it really appropriate for the Key_Type formal parameter of generic package Ordered_Sets.Generic_Keys to be limited? I don't think it matters a great deal, but it just might assist some implementations by making it non-limited (since it would allow them to make internal copies). I doubt that a limited type would ever be required in practice: if the key is directly extracted from an element, it cannot be limited (because the element cannot); if it is indirectly derived, it must be via an access value, in which case the access type can be used as the key type. That Key_Type is limited doesn't seem to match its conceptual abstraction, to my mind. * In line 2473 and 2788, within package Generic_Keys: procedure Update_Element -> (Position : in Cursor; Process : not null access procedure (Element : in out Element_Type)); Would it be better to make the mode of Position 'in out', so that if re-insertion occurs the cursor 'tracks' the new position? * In those places where "Index_Type'Succ(X)" is used, couldn't X + 1 be used instead? Index_Type is an integer type (isn't it?). * In various places appears a phrase like: Any exceptions raised ... are propagated. I think these should be rephrased in the singular: Any exception raised ... is propagated. since only one exception can be propagated (the first to occur). * As an example of counting the number of angels dancing on the tip of a needle, I wonder if the word 'expansile' should be used instead of 'expandable'. I think there is a tiny difference of meaning, in that the latter suggests something that can be expanded deliberately (like a set of modular shelves) whereas the former suggests something that inherently tends to expand (as one's bladder, on a visit to the pub). *** A few side-notes: The semantics expressed by: While Set_Capacity can be used to reduce the capacity of a map, we do not specify whether an implementation actually supports reduction of the capacity. Since the actual capacity can be anything greater than or equal to Count, an implementation never has to reduce the capacity. is exactly what I wanted. Cool. Some of the AARM notes are brilliant. E.g. "The implementation needs to take care so that aliasing effects do not make the result trash; Union (S, S); must work." This is totally cool; it's just the kind of gotcha that causes me to have bad nights. Finally I'd like to express -- and I very much hope this doesn't sound trite or ingratiating -- my admiration for Matt and Randy for all the perspiration and inspiration (in whichever proportion) they have put into this Amendment. I hope it gets passed. In fact, I just want to add a note that I hope nothing I have posted to this mailing list comes across as being deliberately antagonistic or offensive (towards Randy or anyone else). Although I may often be blunt in the way I put things, and I may often be strident in my criticisms, there has never been on my part any element of personal animosity in any of my comments. On the contrary, I may disagree vehemently with someone on a particular issue, without having any less respect for them. I feel that it is in the nature of the best technical forums that the participants retain a sense of personal respect for the others, however passionate their disagreement may be. Nuff said. **************************************************************** From: Matthew Heaney Sent: Monday, August 9, 2004 4:36 PM Nick Roberts wrote: > > * What is the purpose of Index_Subtype? (This question seems to have > been raised but not answered.) 
It's there for composability. You want to be able to write operations and declare variables in terms of the index type used to instantiate the package. > * For vectors, lists, maps, and sets the procedures named 'Iteration' > and 'Reverse_Iteration', which were generic but are now procedures > taking an access-to-subprogram parameter, would perhaps now be more > appropriately named 'Iterate' and 'Reverse_Iterate'? (Or 'Traverse' > and 'Reverse_Traverse'? :-) I have made the same comment in my review of the AI. (The name should be the verb phrase "Iterate", not the noun phrase "Iteration".) > * Should the same thing (change from generic procedure to procedure > with access-to-subprogram parameter) be done for Generic_Sort? (I > guess the answer is to do with efficiency.) Yes, it has to do with efficiency. Predefined relational operators for a type are intrinsic, so you can't take their address and hence cannot pass the operation as the value of an anonymous access parameter. > * Should procedure 'Assign' be called 'Copy', to emphasise its > distinction from normal assignment? Should it have start and end > parameters? (I see idea this already got suggested.) No. It's called Assign because it's an assignment operation. (In this case, more efficient than operator ":=".) The lexical rule is: Assign takes the Target as the first parameter and Source as the second parameter, while Copy takes the Source as the first parameter and target as the second parameter, like this: procedure Assign (Target : in out T; Source : in T); procedure Copy (Source : in T; Target : in out T); So even if you were to change the name, you'd have to change the parameter order too. But you have have to put the target as the first parameter, in order to use distinguished receiver syntax: Target.Assign (Source); Is the same as: Target := Source; but the former is more efficient than the latter (for a vector). > * There may be a call ambiguity problem with Find and Reverse_Find. > These are provided for both an index and cursor starting point: > > function Find (Container : Vector; > Item : Element_Type; > Index : Index_Type'Base := Index_Type'First) > return Index_Type'Base; > > function Find (Container : Vector; > Item : Element_Type; > Position : Cursor := No_Element) > return Cursor; > > A call Find(Cont,Item) in a context which could require either an > index or a cursor will be ambiguous. I'm just pointing this one out > at the moment; maybe it's too trivial to worry about. I discussed this with Randy, and he reasoned that ambiguity wouldn't be a problem because in the typical case you name the return value anyway. However, we do have First_Index and Last_Index, and so I would be in favor of renaming the first operation Find_Index. > * For the map 'Replace' procedure: > > procedure Replace (Container : in out Map; > Key : in Key_Type; > New_Item : in Element_Type); > > Replace inserts Key and New_Item as per Insert, with the difference that > if Key is already in the map, then this operation assigns New_Item to > the element associated with Key. Any exceptions raised during assignment > are propagated. > > I think this procedure should be called 'Replace_or_Insert', to avoid > potential confusion about its semantics to anyone reading code using > it. This point has already been argued about, so maybe it should rest > as is now. Too much verbosity. Let's just call it "Insert". 
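[Editor's note: a minimal sketch of the overload resolution being relied on in the Find/Reverse_Find point above. Int_Vectors is a hypothetical instantiation of the draft Vectors package; the two Find profiles are the ones quoted earlier, and Index_Subtype is the composability subtype mentioned at the start of this message (its exact declaration is an assumption).]

   procedure Demo (V : in Int_Vectors.Vector; Item : in Integer) is
      use Int_Vectors;
      C : constant Cursor := Find (V, Item);
      --  the target type Cursor selects the cursor-returning overload
      N : constant Index_Subtype'Base := Find (V, Item);
      --  the target type Index_Type'Base selects the index-returning overload
   begin
      null;
      --  A bare Find (V, Item) in a context that accepts either an index or a
      --  cursor would be ambiguous; renaming the index version Find_Index, as
      --  suggested above, removes the collision.
   end Demo;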
> * For the Delete at cursor procedure: > > procedure Delete (Container : in out Map; > Position : in out Cursor); > > If Position equals No_Element, this operation has no effect. > Otherwise, Delete removes the node from the map and deallocates the > node. > Position is set to No_Element on return. > > Is it such a good idea for Position to have mode 'in out'? This > prevents constructions such as (concocting an example): > > Delete(M,Next(C)); > > which might be handy sometimes. But you're deallocating the node designated by Position, so you have to set the cursor to No_Element. I haven't had a need to do as in your example, but I have had a need to iterate through the map and delete each element in turn. I can do this in C++ by saying my_map.erase(my_iter++); but I don't have that option in Ada. But it's no big deal, just do this: declare I : Cursor := First (M); J : Cursor; begin while Has_Element (I) loop Update_Element (I, Finalize'Access); J := I; Next (I); M.Delete (J); end loop; end; > * Is it really appropriate for the Key_Type formal parameter of > generic package Ordered_Sets.Generic_Keys to be limited? I don't think > it matters a great deal, but it just might assist some implementations > by making it non-limited (since it would allow them to make internal > copies). But you can make an internal copy (in fact, the reference implementation does), you just have to use renames instead of assignment: declare Copy_Of_Key : Key_Type renames Key (E); begin > I doubt that a limited type would ever be required in practice: > if the key is directly extracted from an element, it cannot be limited > (because the element cannot); if it is indirectly derived, it must be > via an access value, in which case the access type can be used as the > key type. That Key_Type is limited doesn't seem to match its conceptual > abstraction, to my mind. But Generic_Keys is written in terms of what Generic_Keys requires. It only needs function Key to implement Update_Element, and it doesn't need assignment (since it can use renames). > * In line 2473 and 2788, within package Generic_Keys: > > procedure Update_Element > -> (Position : in Cursor; > Process : not null access procedure (Element : in out > Element_Type)); > > Would it be better to make the mode of Position 'in out', so that if > re-insertion occurs the cursor 'tracks' the new position? Hmmmm. Not sure what you mean here, since Position would seem to already "track" the new position. The cursor continues to designate the same internal storage node before and after the call (even if the relative position of the node changes), and so it doesn't need to be inout. The more substantive issue in my mind is what happens if the key value changes and it matches another key in the set. In that case we deallocate the storage node and then raise Constraint_Error. I don't really like that, since the cursor is now designating a node which has been deallocated. I'd rather say: procedure Update_Element_Or_Delete (Position : in out Cursor; Process : ...); and then not raise any exception if there's a match. Rather, we deallocate the node and then set Position to No_Element. **************************************************************** From: Matthew Heaney Sent: Thursday, July 12, 2004 6:42 PM One of the things we did in Palma was to move the Update_Element operation for ordered sets into its nested package Generic_Keys, to allow it to check whether the key was modified. 
One of the consequences of moving that operation is that we no longer have an operation to pass the set element as a parameter (without also instantiating the nested generic). For example: procedure Print (S : Set) is procedure Print (E : in out ET) is begin -- we only need to query E, not modify it end; procedure Process (C : Cursor) is begin Update_Element (C, Print'Access); end; begin S.Iterate (Process'Access); end Print; We can do this only if we also instantiate the nested generic. Of course, we could also use the selector function Element for cursors, to return a copy of the element, but this isn't very attractive for large elements. I think what's missing is an operation like Update_Element, but with the difference that the access procedure parameter accepts the element with in mode. Something like: procedure Query_Element (Position : in Cursor; Process : not null access procedure (E : in ET)); This would be declared outside of the Generic_Keys nested package. In fact, I think it makes sense to add this operation for all containers. This would allow us to write the example above like this: procedure Print (S : Set) is procedure Print (E : in ET) is begin -- we only need to query E, not modify it end; procedure Process (C : Cursor) is begin Query_Element (C, Print'Access); end; begin S.Iterate (Process'Access); end Print; Also, in Palma we added a Key selector function to the generic formal region of Generic_Keys. Now that we have that, I think it makes sense for Generic_Keys to provide the following additional operation: function Key (Position : Cursor) return Key_Type; This allows Generic_Keys to more closely mimic the spec of the (hashed) map. **************************************************************** From: Matthew Heaney Sent: Thursday, September 9, 2004 9:14 PM > -----Original Message----- > From: Randy Brukardt [mailto:randy@rrsoftware.com] > Sent: Tuesday, September 07, 2004 1:13 PM > > Of course, we don't have the Hashed_Set and Ordered_Map > containers, and they will come up in practice. I'm now > convinced that their omission was a mistake. I recently had a > case where I could have used a map, but creating a decent > hash function was going to be substantial work not justified > by the number of elements to be used. Clearly, an Ordered_Map > would be much easier to use in such a case. The examples > given by Tucker are good examples of the use of a Hashed_Set. As a sort of thought experiment, I implemented a hashed set. It's up at the CVS repository. It looks just like the ordered set, except for the generic formal region, and the addition of Set_Capacity, etc. **************************************************************** From: Randy Brukardt Sent: Thursday, September 9, 2004 9:56 PM The problem, of course, is that the description of the operations in the standard would be different, and it's very late to be adding anything significant. I would hope that most implementers would provide a Hashed_Set patterned on your package. (This is the sort of thing that an IWA would be very good for.) **************************************************************** From: Matthew Heaney Sent: Friday, September 11, 2004 3:09 AM I just uploaded an ordered map.
http://charles.tigris.org/source/browse/charles/src/ai302/a-coorma.ads?rev=HEAD&sortby=date&only_with_tag=HEAD&content-type=text/vnd.viewcvs-markup http://charles.tigris.org/source/browse/charles/src/ai302/a-coorma.ads **************************************************************** From: Nick Roberts Sent: Sunday, September 12, 2004 10:18 AM I think this proposal is looking pretty polished now. I hope it won't seem wrong for me to put my own responses to Randy's questions (to the ARG, I guess) at the end of the latest update to this AI. Apologies if the quoting seems heavy. > Q1) Find_Index returns Last_Index (Container) + 1 if the element is not > found. This seems consistent to me (it's past the end of the container in > a forward search), but Matt worries that First_Index (Container) - 1 > might be thought of as better. The trouble with First_Index (Container) - > 1 is that you can't put it into an object: > declare > I : Index_Type := Index_Type'First; > begin > I := Find_Index (Vect, Item, I); > while I <= Last_Index (Vect) loop > -- Do something to the element I. > I := Find_Index (Vect, Item, I+1); > end loop; > end; >If Find_Index returned Index_Type'First - 1, saving the result of > Find_Index would raise Constraint_Error if the item is not found. That's > not what we want, I think. The problem with Last_Index (Container) + 1 is that it may not exist, because Last_Index (Container) might be Index_Type'Last. On the other hand, we conveniently have a requirement that Index_Type'First > Index_Type'Base'First, which guarantees that First_Index (Container) - 1 does always exist (as a value of Index_Type'Base). I think that swings it. I suggest Find_Index returns First_Index (Container) - 1 when it does not find what it is looking for. It might be convenient to declare two extra things in the Containers.Vectors package: subtype Find_Index_Result is Index_Type'Base range Index_Type'First-1 .. Index_Type'Last; Not_Found: constant Find_Index_Result := Find_Index_Result'First; Obviously the Find_[Reverse_]Index functions can then have Find_Index_Result as their return types. The example can then be reformulated as: declare I : Find_Index_Result := Index_Subtype'First; begin I := Find_Index (Vect, Item, I); while I /= Not_Found loop -- Do something to the element I. I := Find_Index (Vect, Item, I+1); end loop; end; or alternatively: declare I : Find_Index_Result := Index_Subtype'First - 1; begin loop I := Find_Index (Vect, Item, I+1); exit when I = Not_Found; -- Do something to the element I. end loop; end; > Q2) The parameters to Generic_Merge have not been made class-wide (even > though the comments about non-primitive operations with specific tagged > parameters mentioned for Generic_Sort hold here, too). That's because > both parameters need to be the same type. An alternative would be to make > them class-wide, and then have a runtime check (of the actual tags) that > they actually are the same type. But that is not very O-O. A third > possibility would be to repeat the type in the generic spec: > generic > type List_Type is new List with private; > with function "<" (Left, Right : Element_Type) > return Boolean is <>; > procedure Generic_Merge (Target : in out List_Type; > Source : in out List_Type); >But that is not very consistent with the rest of the specification. Some > guidance would be helpful here.
I'm uncomfortable with Generic_Sort having a classwide parameter: generic with function "<" (Left, Right : Element_Type) return Boolean is <>; procedure Generic_Sort (Container : in out List'Class); This is because the actual sorting operation can only be on objects of the root type, both conceptually and actually. The implementation would have to typecast the parameter Container to List anyway. I think it would make much more sense for Generic_Sort to be declared: generic with function "<" (Left, Right : Element_Type) return Boolean is <>; procedure Generic_Sort (Container : in out List); and the typecast to be done in the call instead. For example: package Float_Lists is new Ada.Containers.Doubly_Linked_Lists(Float); procedure Sort is new Float_Lists.Generic_Sort; type My_List is new Float_Lists.List with ...; ... L1: My_List; ... Sort( Float_Lists.List(L1) ); I prefer this because it makes it explicit that the list L1 is being sorted /as/ a Float_Lists.List (not as a My_List, as such). To extend the example a bit further: procedure Merge is new Float_Lists.Generic_Merge; ... type Freds_List is new Float_Lists.List with ...; ... L2: Freds_List; ... Merge( Float_Lists.List(L2), Float_Lists.List(L1) ); I think this is the right formulation, since again it makes it explicit that you are merging the two lists /as/ Float_Lists.Lists (which is what makes them compatible enough to merge anyway). As is rightly pointed out, the implementations of Generic_Sort and Generic_Merge cannot use dispatching operations of their parameters (lists), so they must both be specific to their root types anyway. For this reason, I would suggest neither should have a classwide parameter. I hope I haven't totally missed the point here! Note also that there is a typo (lines 1418-1419): in the package specification itself the parameters to Generic_Merge are both still cited as List'Class. > Q6) Tucker has mentioned that he often has components in the key of a map > beyond the actual key participating ones. (This is similar to the > behavior of a set; if we had a Hashed_Set this probably would be less of > an issue.) For that to be effective, it would be necessary to change a > key that is already in a map. Currently, neither Replace_Element nor > Insert_or_Replace change the value of a key that is in the map; only the > element is changed. >In order to get the sort of semantics that Tucker seems to be suggesting, > we'd need a way to change the value of a key. But such an operation would > potentially change the location of the element, so it could be fairly > expensive. Moreover, it would likely require allocation even if the hash > didn't change for the indefinite form of the container. >Finally, whether or not the key is replaced would seem to be another > (orthogonal) option for the Insert routine "6) Insert replaces the key > and the element when the key is already in the map; 7) Insert replaces > the key, leaving the element unchanged when the key is already in the > map". >This complication doesn't seem worth it to me, but as it came up very > late, the entire ARG needs to discuss the issue. I think this is a case of "Don't do that!" Isn't the basic idea that key values are there specifically to provide fast indexed access to the elements? I don't think they are intended to be used to carry ancillary information.
It may sometimes be convenient to use an existing type (which has non-participating components) as a key type, but I think, in these cases, the non-participating components of the key should be moved into the element. If you have type T1 (with non-participating components) you want to use as a key for type T2, I think the proper design is to declare a new type T3, with only participating components, and a new type T4 which has the remaining components of type T1 and all those of T2 (or alternatively two components, of type T1 and T2). You then use T3 to index T4 (from which you extract the components of T1 or T2 as required). So, I agree that a key replacement operation should not be added. > Q3) The generic formal part for maps has: > with function "=" (Left, Right : Key_Type) > return Boolean is <>; > with function Is_Equal_Key (Left, Right : Key_Type) > return Boolean is "="; >Matt wonders why both operations are needed; [etc.] I hate this difference. I think it is counter-intuitive that keys have two different kinds of equality, and the reason why relates to my answer to Q6: I think the principle we should stick to is that the purpose of key values is solely to provide fast indexed access to a set of element values. On this priciple, I think it is intuitively the case that the equality that is implied by the ordering operation on the keys: (not A; procedure Generic_Merge (Target : in out List'Class; Source : in out List'Class); MJH: We need to resolve the declaration of the parameter types in Madison. We can either declare the two types as class-wide (as above), or import the list as a generic formal type, like this: generic type List_Type is new List with private; with function "<" (Left, Right : Element_Type) return Boolean is <>; procedure Generic_Merge (Target : in out List_Type; Source : in out List_Type); Note also that the description of this operation uses type List (probably just a typo). Other possibilities for List_Type are: type List_Type (<>) is new List with private; or maybe: type List_Type (<>) is abstract new List with private; ENDMJH. procedure Insert (Container : in out List; Before : in Cursor; New_Item : in Element_Type; Count : in Count_Type := 1); Insert allocates Count new nodes whose element is initialized to the value New_Item, and inserts them prior to the node designated by Before. If Before equals No_Element, the new nodes are inserted immediately following the last node (if any). Any exception raised during allocation of internal storage is propagated, and Container is not modified. MJH: We have to decide whether partial success is allowed. (The last sentence above looks vestigial, and might have been written prior to the introduction of the Count parameter.) For example, if Count is 10, and we're only able to allocate, say, 7 nodes, then can the list be modified such that its length only grew by 7 nodes? ENDMJH. procedure Swap (Container : in out List; I, J : in Cursor); Swap exchanges the nodes designated by I and J. AARM Note: Unlike Swap_Elements for vectors, this exchanges the nodes, not the elements. No copying is performed. I and J designate the same elements after this call as they did before it. This is important, as this operation is provided as it can provide better performance than a straight copying swap. The programmer can writing a copying swap if they need one. This difference in semantics is the reason that this operations have different names in the List and Vector containers. 
MJH: The penultimate sentence should say: "The programmer can write a copying swap if he needs one." We also need to specify the behavior when one or both of the parameters equal No_Element. It probably wouldn't hurt anything to add a Swap_Elements operation, too. If an implementor uses pointers to elements for the indefinite form, then at least that would confer a performance benefit (the same as for the indefinite vector). ENDMJH. Hashed maps: generic type Key_Type is private; type Element_Type is private; with function Hash (Key : Key_Type) return Hash_Type; with function "=" (Left, Right : Key_Type) return Boolean is <>; with function Is_Equal_Key (Left, Right : Key_Type) return Boolean is "="; with function "=" (Left, Right : Element_Type) return Boolean is <>; package Ada.Containers.Hashed_Maps is MJH: We need to resolve the characteristics of this generic formal region in Madison. Note that in Palma we decided to use the name "Equivalent" instead of "Is_Equal_Key". ENDMJH. procedure Query_Element (Position : in Cursor; Process : not null access procedure (Element : in Element_Type)); procedure Update_Element (Position : in Cursor; Process : not null access procedure (Element : in out Element_Type)); MJH: We might need some other operations, if we're serious about manipulating (and possibly modifying) keys. Here are some ideas: procedure Query_Key (Position : in Cursor; Process : not null access procedure (Key : in Key_Type)); procedure Query_Key_And_Element (Position : in Cursor; Process : not null access procedure (Key : in Key_Type; Element : in Element_Type)); procedure Query_Key_And_Update_Element (Position : in Cursor; Process : not null access procedure (Key : in Key_Type; Element : in out Element_Type)); procedure Checked_Update_Key (Container : in out Map; Position : in Cursor; Process : not null access procedure (Key : in out Key_Type)); ENDMJH. procedure Insert_Or_Replace (Container : in out Map; Key : in Key_Type; New_Item : in Element_Type); MJH: Tucker thought that this operation replaced the value of the key too, so we need to confirm the exact semantics of this operation. ENDMJH. function Is_Equal_Key (Left, Right : Cursor) return Boolean; function Is_Equal_Key (Left : Cursor; Right : Key_Type) return Boolean; function Is_Equal_Key (Left : Key_Type; Right : Cursor) return Boolean; MJH: We need to decide about these ops in Madison: either change the name, or keep the name, or get rid of them. ENDMJH. procedure Set_Capacity (Container : in out Map; Capacity : in Count_Type); If Length (Container) > Capacity, then Constraint_Error is propagated. Otherwise, Set_Capacity allocates a new hash table such that the length of the resulting map can become at least the value Capacity without requiring an additional Set_Capacity operation. If the allocation fails, the exception is propagated and Container is not modified. It then rehashes the nodes in Container onto the new hash table. It replaces the old hash table with the new hash table, and then deallocates the old hash table. MJH: (This comment applies to vector, too.) We have already discussed the fact that I don't think raising CE is appropriate for this operation. The purpose of an exception is to indicate that the postcondition cannot be satisfied. If, for example, a very large value for Capacity were requested, and the implementation were unable to allocate the requisite storage, then an exception would be appropriate.
However, the postcondition here is "Capacity(Container) >= Capacity", and an invariant is "Capacity(Container) >= Length(Container)". If Capacity < Length(Container), then the implementation will allocate a capacity that is at least the container's length, thus satisfying both the postcondition and the invariant. Therefore an exception is not appropriate. Note that the STL member function reserve() (which is equivalent to Set_Capacity) does *not* raise an exception. ENDMJH. Ordered Sets: function Is_Disjoint (Left, Right : Set) return Boolean; MJH: I had originally chosen the name Is_Disjoint to be consistent with Is_In. This was also the name favored by John Barnes. However, Is_In has since been renamed to Contains. Should we consider a similar name change for Is_Disjoint? function Overlaps (Left, Right : Set) return Boolean; This is the name Tucker prefers. ENDMJH. generic type Key_Type (<>) is limited private; with function Key (Element : Element_Type) return Key_Type; with function "<" (Left : Key_Type; Right : Element_Type) return Boolean is <>; with function ">" (Left : Key_Type; Right : Element_Type) return Boolean is <>; package Generic_Keys is MJH: We have to make a decision about whether we want to pass the set type as a generic actual type, like this: generic type Set_Type is new Set with private; type Key_Type (<>) is limited private; .. and then use Set_Type everywhere for operations. Or instead declare the container parameters as type Set'Class. Actually, another declaration might be: type Set_Type (<>) is new Set with private; or possibly: type Set_Type (<>) is abstract new Set with private; (However we declare Set_Type, we should do the same for List_Type of Generic_Merge.) ENDMJH. procedure Checked_Update_Element (Container : in Set; Position : in Cursor; Process : not null access procedure (Element : in out Element_Type)); MJH: The Container param should be inout, not in. (I think you already fixed this.) ENDMJH. !examples Ordered Sets: Another technique would be to use an active iterator, like this: procedure Shutdown_Connections is I : Cursor; X : Connection_Access; begin while not Is_Empty (Connection_Set) loop I := First (Connection_Set); X := Element (I); Delete (Connection_Set, Position => I); Free (X); end loop; end Shutdown_Connections; Here we use the cursor-form of Delete. This is probably more efficient than using the item-form of Delete, since the cursor-form doesn't have to search for the item. MJH: We might also want to say that this example can be simplified as follows: START EXAMPLE TEXT The example can be simplified by using the set operations that manipulate the first element specifically: procedure Shutdown_Connections is X : Connection_Access; begin while not Is_Empty (Connection_Set) loop X := First_Element (Connection_Set); Delete_First (Connection_Set); Free (X); end loop; end Shutdown_Connections; END EXAMPLE TEXT ENDMJH. To actually change the employee's address in the example above, we use the special element modifier operation: procedure Change_Address (SSN : SSN_Type; New_Home : Home_Address_Type) is procedure Set_Home (Employee : in out Employee_Type) is begin Employee.Home := New_Home; end; Position : Cursor := Find (Employees, Key => SSN); begin if Has_Element (Position) then SSN_Keys.Checked_Update_Element (Position => Position, Process => Set_Home'Access); ...
end if; end Change_Address; MJH: The call to Checked_Update_Element needs to pass the set, too: SSN_Keys.Checked_Update_Element (Container => Employees, Position => Position, Process => Set_Home'Access); ENDMJH. Another technique is to use Checked_Update_Element, which allows the element's key to be modified, and then moves the element to its new relative location in the set: procedure Change_SSN (Old_SSN : SSN_Type; New_SSN : SSN_Type) is Old_Position, New_Position : Cursor; Inserted : Boolean; begin if New_SSN = Old_SSN then return; end if; Old_Position := Find (Employees, Key => Old_SSN); if not Has_Element (Old_Position) then return; end if; New_Position := Find (Employees, Key => New_SSN); if Has_Element (New_Position) then raise Duplicate_SSN; end if; declare procedure Set_SSN (Employee : in out Employee_Type) is begin Employee.SSN := New_SSN; end; begin SSN_Keys.Checked_Update_Element (Position => Old_Position, Process => Set_SSN'Access); end; end Change_SSN; MJH: Same thing here: we need to pass the set as a parameter: SSN_Keys.Checked_Update_Element (Container => Employees, Position => Old_Position, Process => Set_SSN'Access); ENDMJH. Suppose now we want a list of all the employees in the firm. One way to do it is like this: procedure Display is procedure Print (I : in Employee_Sets.Cursor) is procedure Do_Print (E : in out Employee_Type) is begin Put ("Name: "); Put (E.Name); Put ("SSN: "); Put (E.SSN); ...; end; begin Query_Element (Position => I, Process => Do_Print'Access); end; begin Iterate (Employees, Print'Access); end; MJH: The mode for the (generic) actual Do_Print should be just in, not inout. ENDMJH. begin Sort (Cursors); for Index in Cursors'Range loop C := Cursors (Index); Query_Element (Position => C, Process => Do_Print'Access); end loop; end Display_Employees_In_Name_Order; MJH: All of the process procedures above need to change the mode from in to inout. ENDMJH. This lets us perform session lookups based on the session identifier: procedure Play (Session_Id : in String; NPT_Range : in NPT_Range_Type; RTSP_Status : out RTSP_Status_Type) is Position : constant Session_Set_Types.Cursor := Find (Session_Set, Key => Session_Id); MJH: We might want to use name qualification here: Position : constant Session_Set_Types.Cursor := Id_Keys.Find (Session_Set, Key => Session_Id); ENDMJH. **************************************************************** From: Matthew Heaney Sent: Tuesday, September 14, 2004 2:13 AM My comments are inline. --MJH > > Q1) Find_Index returns Last_Index (Container) + 1 if the element is > >not found. This seems consistent to me (it's past the end of the > >container in a forward search), but Matt worries that First_Index > >(Container) - 1 might be thought of as better. The trouble with > >First_Index (Container) - 1 is that you can't put it into an object: ... > The problem with Last_Index (Container) + 1 is that it may > not exist, because Last_Index (Container) might be > Index_Type'Last. But Find_Index returns Index_Type'Base. There's an implied requirement that Last_Index(V) < Index_Type'Base'Last, so it doesn't matter if Last_Index(V) = Index_Type'Last. Remember the purpose of a vector is to expand. That's what influenced our decision to use an integer type as the index instead of any discrete type. Your argument that Last_Index(V) might equal Index_Type'Last is tantamount to saying that an enumeration type should be able to be used as the index type, but this matter has already been debated and settled.
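[Editor's note: a hypothetical instantiation that makes the "implied requirement" above concrete; the package and formal names follow the draft as quoted in this discussion, everything else is illustrative.]

   package Int_Vectors is
      new Ada.Containers.Vectors (Index_Type   => Positive,
                                  Element_Type => Integer);
   --  Here Index_Type'Base is Integer, so the Last_Index of an empty vector
   --  (0, one below Index_Type'First) is representable, and Last_Index + 1 is
   --  representable as long as the vector never actually has an element at
   --  index Integer'Last -- which is exactly the implied requirement
   --  Last_Index (V) < Index_Type'Base'Last.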
Allowing an Index_Type to be passed as a generic formal is really only intended to allow the user to specify the starting point of the index subtype range. The ending point should be large relative to the number of elements typically stored in the container. I would expect that most users would either use subtypes Natural or Positive as the generic actual index type. If the starting point of your range has some negative values, then you could do something like: type Index_Type is range -42 .. Integer'Pos (Integer'Last); or type Index_Type is range -2001 .. Count_Type'Pos (Count_Type'Last); > On the other hand, we convenienty have a > requirement that Index_Type'First > Index_Type'Base'First, > which guarantees that First_Index (Container) - 1 does always > exist (as a value of Index_Type'Base). > > I think that swings it. I suggest Find_Index returns First_Index > (Container) - 1 when it does not find what it is looking for. I find Randy's argument more persuasive, and hence agree with him that we don't need any change here. We need to affirm the semantics of Reverse_Find_Index too, since you could make the argument that for symmetry with Find_Index, it should return Index_Type'Pred (Index_Type'First). On that other hand, you could argue that for consistency with Find_Index that it should return the same value. > > Q2) The parameters to Generic_Merge have not been made class-wide > >(even though the comments about non-primitive operations > with specific > >tagged parameters mentioned for Generic_Sort hold here, > too). That's > >because both parameters need to be the same type. An > alternative would > >be to make them class-wide, and then have a runtime check (of the > >actual tags) that they actually are the same type. But that is not > >very O-O. A third possibility would be to repeat the type > in the generic spec: > > generic > > type List_Type is new List with private; > > with function "<" (Left, Right : Element_Type) > > return Boolean is <>; > > procedure Generic_Merge (Target : in out List_Type; > > Source : in out List_Type); > >But that is not very consistent with the rest of the specification. > >Some guidance would be helpful here. The generic formal type should probably be declared as: type List_Type (<>) is new List with private; or possibly type List_Type (<>) is abstract new List with private; > I'm uncomfortable with Generic_Sort having a classwide parameter: ... > I prefer this because it makes it explicit that the list L1 > is being sorted /as/ a Float_Lists.List (not as a My_List, as such). But this argument applies to any operation. We shouldn't have to call sort differently, just because it happens to be generic. You could pass the list type as a generic actual, but that seems like overkill when the operation only has a single parameter. We also need to affirm the declaration of Ordered_Sets.Generic_Keys. > > Q6) Tucker has mentioned that he often has components in the key of a > >map beyond the actual key participating ones. (This is similar to the ... > I think this is a case of "Don't do that!" > > Isn't the basic idea that key values are there specifically > to provide fast indexed access to the elements? I don't think > they are not intended to be used to carry ancillary information. Well, that's the debate. We can use either model, but we have to chose one. Essentially, the API as it stands now has not been designed to facilitate replacement of key values. 
The problem is that to support key modification, there will most likely be some kind of penality, either static (more complex interface) or dynamic (less efficient execution). The decision is whether the added generality is worth the penalty. > > Q4) Set_Capacity is defined to raise Constraint_Error if Length > >(Container) > Count. Matt would prefer that this case is not > >separately handled. He would like > > Set_Capacity (M, 0) > >to be a shorthand for setting the Map or Vector to the smallest > >reasonable size. (I find this a bit odd, as Matt never wanted this > >routine to even allow smaller values. But whatever.) Note that just > >dropping the check would not be enough; we'd have to redo the > >description of the operation to say that the capacity is set to at > >least Count_Type'Max (Count, Length(Container)) -- because we don't > >want this operation to drop elements. I'm unsure that the benefit is > >worth the change, and it seems like a bug to me to try to set the > >capacity of a container to be smaller than the number of elements it > >holds. > > I think this is a case where readability is more important > than conciseness of code, and I think using an explicit > length will usually be more readable. At least, I think its > meaning is a bit more intuitively obvious. The meaning of an operation is its postcondition. (See Meyer's description of Hoare triples in OOSC, read David Gries' book, and read EWD's books and technical notes. For more information about the proper use of exceptions, see the paper at Barne Stroustrup's home page, or just read Appendix E of TC++PL.) The postcondition of Set_Capacity is: Capacity(V) >= Capacity A vector (or hashed map) also satisfies the invariant that: Capacity(V) >= Length(V) Set_Capacity does whatever is necessary to satisfy both of these predicates. Whether the requested capacity is less than the current length is irrelevant, since the representation invariant already handles that case. An exception would only be proper if Set_Capacity were unable to satisfy the postcondition or the invariant. > ~~~ > > Q5 was about using an extra parameter of an enumerated type > in insertion and replacement procedures to indicate what > should happen if the key exists > (insertion) or doesn't exist (replacement), with a default > indicating that an exception should be raised. > > I rather prefer this idea. I think it is the least of three > evils (no user choice, a plethora of procedures, or control-coupling). I have already stated that I am not in favor of this change. (It's at the wrong level of abstraction, for one thing.) > ~~~ > > Another question that arises in my mind is, for any of the > functions which returns a container: what is the capacity of > the result? In particular, should this be defined by the > standard, or remain implementation defined? [Call this Q7?] All this standard says is that Capacity(C) >= Length(C). If you want more control of the capacity, then use the procedures, not the functions. **************************************************************** From: Nick Roberts Sent: Tuesday, September 14, 2004 10:06 AM Matthew Heaney wrote: >>The problem with Last_Index (Container) + 1 is that it may >>not exist, because Last_Index (Container) might be >>Index_Type'Last. > > But Find_Index returns Index_Type'Base. There's an implied requirement that > Last_Index(V) < Index_Type'Base'Last, so it doesn't matther if Last_Index(V) > = Index_Type'Last. If there is such a requirement, it should be explicit. 
But I would be alarmed if such a requirement really was imposed: it would be a classic potential source of obscure bugs. > Remember the purpose of a vector is to expand. That's what influenced our > decision to use an integer type as the index instead of any discrete type. > Your argument that Last_Index(V) might equal Index_Type'Last is tantamount > to saying that an enumeration type should be able to be used as the index > type, but this matter has already been debated and settled. I don't think so at all (that Last_Index(V) might equal Index_Type'Last is tantamount to saying that an enumeration type should be able to be used as the index type). On an implementation that has an 16-bit Integer'Base type, and where V is of a package instantiated with Index_Type Positive, it is quite feasible that Last_Index(V) becomes 32767. > Allowing an Index_Type to be passed as a generic formal is really only > intended to allow the user to specify the starting point of the index > subtype range. The ending point should be large relative to the number of > elements typically stored in the container. Why? Why should it (Index_Type'Last) not be equal to the maximum number of elements to be stored? It would be prudent to be sure that the maximum was never exceeded, in such a case. Let me give you an example. Supposing we are programming a card game, and we need a container that can contain a 'hand', which is a list of cards. The rules of this game do not limit the number of cards in a hand, as such, but since the cards come from a pack of 52, we can be absolutely certain that the limit of 52 will never be exceeded. It would therefore be quite reasonable (wouldn't it?) to write: type Card_Count is range 0..52; subtype Card_Index is Card_in_Hand_Count range 1..52; package Hand_Vectors is new Ada.Containers.Vectors (Card_ID, Card_Index); ... Dealer: Hand_Vectors.Vector := Random_52; Hands: array (Player_ID) of Hand_Vectors.Vector; We could move cards between Dealer and Hands throughout the program, safe in the knowledge that although a length of 52 is possible (certain, for Dealer), that length can never be exceeded. >>I think it would make much more sense for Generic_Sort to be declared: >>[not classwide] >>and the typecast to be done in the call instead. For example: >>... >> Sort( Float_Lists.List(L1) ); >> >>I prefer this because it makes it explicit that the list L1 >>is being sorted /as/ a Float_Lists.List (not as a My_List, as such). > > But this argument applies to any operation. We shouldn't have to call sort > differently, just because it happens to be generic. We have to call Sort differently because it's argument isn't of type Float_Lists.List. This fact applies to any non-inherited operation of List, generic or not. That is my point. > You could pass the list type as a generic actual, but that seems like > overkill when the operation only has a single parameter. I quite agree, and I'd argue the same for Generic_Merge (which only has two parameters). I'm also saying that making them classwide would be overkill. > We also need to affirm the declaration of Ordered_Sets.Generic_Keys. What are the alternatives, please? >>Isn't the basic idea that key values are there specifically >>to provide fast indexed access to the elements? I don't think >>they are not intended to be used to carry ancillary information. > > Well, that's the debate. We can use either model, but we have to chose one. Okay. > Essentially, the API as it stands now has not been designed to facilitate > replacement of key values. 
The problem is that to support key modification,
> there will most likely be some kind of penality, either static (more complex
> interface) or dynamic (less efficient execution). The decision is whether
> the added generality is worth the penalty.

I don't think it is. I'm saying that I do indeed think we should stick to the model that key values are only for the purpose of indexing.

>>> [re: Set_Capacity(M,n) where n < Length(M)]
>>
>>I think this is a case where readability is more important
>>than conciseness of code, and I think using an explicit
>>length will usually be more readable. At least, I think its
>>meaning is a bit more intuitively obvious.
>
> The meaning of an operation is its postcondition.

I'm not arguing about the semantic meaning of the operation, Matt! How can I put it more plainly? I'm saying that this is a matter of readability.

The semantic meanings of your suggested Set_Capacity(M,0) and of Set_Capacity(M,Length(M)) are identical, so this is simply not an issue.

I am arguing that Set_Capacity(M,Length(M)) is more readable, and for that reason we should disallow Set_Capacity(M,0). That is all.

In fact, I would accept the procedure being renamed (again ;-) to Set_Minimum_Capacity with the semantics you suggest. Or possibly declare the procedure as:

   procedure Set_Capacity (Container: in out Vector|List|etc; Minimum: in Count_Type);

so that the call can be made:

   Set_Capacity (M, Minimum => 0);

>>Q5 was about using an extra parameter of an enumerated type
>>in insertion and replacement procedures to indicate what
>>should happen if the key exists
>>(insertion) or doesn't exist (replacement), with a default
>>indicating that an exception should be raised.
>>
>>I rather prefer this idea. I think it is the least of three
>>evils (no user choice, a plethora of procedures, or control-coupling).
>
> I have already stated that I am not in favor of this change. (It's at the
> wrong level of abstraction, for one thing.)

Okay, but I feel that arguments about the level of abstraction are a bit of a nicety here. Am I wrong that we are faced with three choices, none of which is plainly ideal?

>>Another question that arises in my mind is, for any of the
>>functions which returns a container: what is the capacity of
>>the result? In particular, should this be defined by the
>>standard, or remain implementation defined? [Call this Q7?]
>
> All this standard says is that Capacity(C) >= Length(C). If you want more
> control of the capacity, then use the procedures, not the functions.

So, are you saying that the standard /should/ leave it implementation defined what the capacity is?

****************************************************************

From: Matthew Heaney
Sent: Tuesday, September 14, 2004 1:02 PM

>> But Find_Index returns Index_Type'Base. There's an implied
>> requirement that
>> Last_Index(V) < Index_Type'Base'Last, so it doesn't matther if
>> Last_Index(V)
>> = Index_Type'Last.
>
> If there is such a requirement, it should be explicit. But I would be
> alarmed if such a requirement really was imposed: it would be a classic
> potential source of obscure bugs.

Not at all. If Last_Index(V) = Index_Type'Base'Last, and the element isn't in the vector, then Find will raise Constraint_Error. Nothing obscure about that...

>> Remember the purpose of a vector is to expand. That's what influenced our
>> decision to use an integer type as the index instead of any discrete type.
>> Your argument that Last_Index(V) might equal Index_Type'Last is tantamount >> to saying that an enumeration type should be able to be used as the index >> type, but this matter has already been debated and settled. > > > I don't think so at all (that Last_Index(V) might equal Index_Type'Last > is tantamount to saying that an enumeration type should be able to be > used as the index type). On an implementation that has an 16-bit > Integer'Base type, and where V is of a package instantiated with > Index_Type Positive, it is quite feasible that Last_Index(V) becomes 32767. Then you used the wrong index type. Say this instead: type Index_Type is new Count_Type range 1 .. Count_Type'Last; Remember also that Find must perform a linear scan of the vector. In your vector that's over 32000 elements. Hmmm... Perhaps you should consider using a different container. >> Allowing an Index_Type to be passed as a generic formal is really only >> intended to allow the user to specify the starting point of the index >> subtype range. The ending point should be large relative to the >> number of >> elements typically stored in the container. > > > Why? Why should it (Index_Type'Last) not be equal to the maximum number > of elements to be stored? It would be prudent to be sure that the > maximum was never exceeded, in such a case. But we're not arguing about Index_Type'Last, we're arguing about Index_Type'Base'Last. If you want to define Index_Type'Last such that it equals the maximum number of elements, then go right ahead. > Let me give you an example. Supposing we are programming a card game, > and we need a container that can contain a 'hand', which is a list of > cards. The rules of this game do not limit the number of cards in a > hand, as such, but since the cards come from a pack of 52, we can be > absolutely certain that the limit of 52 will never be exceeded. It would > therefore be quite reasonable (wouldn't it?) to write: > > type Card_Count is range 0..52; > subtype Card_Index is Card_in_Hand_Count range 1..52; > package Hand_Vectors is > new Ada.Containers.Vectors (Card_ID, Card_Index); > ... > Dealer: Hand_Vectors.Vector := Random_52; > Hands: array (Player_ID) of Hand_Vectors.Vector; All you need to do to fix this is: type Card_Count is range 0..53; subtype Card_Index is Card_in_Hand_Count range 1..52; package Hand_Vectors is new Ada.Containers.Vectors (Card_ID, Card_Index); and now all is well. > We could move cards between Dealer and Hands throughout the program, > safe in the knowledge that although a length of 52 is possible (certain, > for Dealer), that length can never be exceeded. Fine. See above. >>> I think it would make much more sense for Generic_Sort to be declared: >>> [not classwide] >>> and the typecast to be done in the call instead. For example: >>> ... >>> Sort( Float_Lists.List(L1) ); >>> >>> I prefer this because it makes it explicit that the list L1 is being >>> sorted /as/ a Float_Lists.List (not as a My_List, as such). >> >> >> But this argument applies to any operation. We shouldn't have to call >> sort >> differently, just because it happens to be generic. > > > We have to call Sort differently because it's argument isn't of type > Float_Lists.List. This fact applies to any non-inherited operation of > List, generic or not. That is my point. No, we don't have to call it differently. That's my point. >> You could pass the list type as a generic actual, but that seems like >> overkill when the operation only has a single parameter. 
>
>
> I quite agree, and I'd argue the same for Generic_Merge (which only has
> two parameters). I'm also saying that making them classwide would be
> overkill.

But we want to *statically* check that both parameters have the same specific type, something we can't do if the parameters have type List'Class.

>> We also need to affirm the declaration of Ordered_Sets.Generic_Keys.
>
>
> What are the alternatives, please?

Same as for Generic_Sort or Generic_Merge.

>>>> [re: Set_Capacity(M,n) where n < Length(M)]
>>>
>>> I think this is a case where readability is more important than
>>> conciseness of code, and I think using an explicit length will
>>> usually be more readable. At least, I think its meaning is a bit more
>>> intuitively obvious.
>>
>> The meaning of an operation is its postcondition.
>
> I'm not arguing about the semantic meaning of the operation, Matt! How
> can I put it more plainly? I'm saying that this is a matter of readability.

The problem is that users are probably going to have to check, prior to making the call, in order to avoid raising Constraint_Error. But this unnecessary check is only crying wolf, since nothing bad happens when Capacity < Length(V).

> The semantic meanings of your suggested Set_Capacity(M,0) and of
> Set_Capacity(M,Length(M)) are identical, so this is simply not an issue.
>
> I am arguing that Set_Capacity(M,Length(M)) is more readable, and for
> that reason we should disallow Set_Capacity(M,0). That is all.

The fact that you don't like this locution is not a compelling reason to disallow it. This API specifies a postcondition, the operation satisfies the postcondition, and therefore there is no exception. It's very simple. Exceptions are not a mechanism for social engineering of software.

>>> Q5 was about using an extra parameter of an enumerated type in
>>> insertion and replacement procedures to indicate what should happen
>>> if the key exists
>>> (insertion) or doesn't exist (replacement), with a default indicating
>>> that an exception should be raised.
>>>
>>> I rather prefer this idea. I think it is the least of three evils (no
>>> user choice, a plethora of procedures, or control-coupling).
>>
>> I have already stated that I am not in favor of this change. (It's at the
>> wrong level of abstraction, for one thing.)
>
> Okay, but I feel that arguments about the level of abstraction are a bit
> of a nicety here. Am I wrong that we are faced with three choices, none
> of which is plainly ideal?

The 5-parameter insert is already ideal, since it's completely general. Any other behavior you want can be written in terms of the canonical insert.

>>> Another question that arises in my mind is, for any of the functions
>>> which returns a container: what is the capacity of the result? In
>>> particular, should this be defined by the standard, or remain
>>> implementation defined? [Call this Q7?]
>>
>> All this standard says is that Capacity(C) >= Length(C). If you want
>> more
>> control of the capacity, then use the procedures, not the functions.
>
> So, are you saying that the standard /should/ leave it implementation
> defined what the capacity is?

Yes, of course. The standard needs to get out of the vendors' way, too.

****************************************************************

From: Nick Roberts
Sent: Tuesday, September 14, 2004 9:21 PM

Matthew Heaney wrote:

> Not at all. If Last_Index(V) = Index_Type'Base'Last, and the element
> isn't in the vector, then Find will raise Constraint_Error. Nothing
> obscure about that...
Okay, but is that really the behaviour we want?

> Then you used the wrong index type. Say this instead:
>
> type Index_Type is new Count_Type range 1 .. Count_Type'Last;

Okay, but I think this point might not be obvious to many programmers.

> Remember also that Find must perform a linear scan of the vector. In
> your vector that's over 32000 elements. Hmmm... Perhaps you should
> consider using a different container.

I think this kind of scan will be appropriate for some applications. The vector might contain over 32000 elements, but the scan might not start at the beginning of the vector. Anyway, I don't think this point is very relevant.

> But we're not arguing about Index_Type'Last, we're arguing about
> Index_Type'Base'Last. If you want to define Index_Type'Last such that
> it equals the maximum number of elements, then go right ahead.

But in general, if I declare:

   type T is range A..B;

I cannot know that T'B < T'Base'B.

> All you need to do to fix this is:
>
> type Card_Count is range 0..53;
> subtype Card_Index is Card_in_Hand_Count range 1..52;
> package Hand_Vectors is
> new Ada.Containers.Vectors (Card_ID, Card_Index);
>
> and now all is well.

Except that Card_Count having a range of 0..53 just for the convenience of the Find function is surely poor programming?

Incidentally, I made a boob:

   subtype Card_Index is Card_in_Hand_Count range 1..52;

was meant to be:

   subtype Card_Index is Card_Count range 1..52;

Sorry.

>> We have to call Sort differently because it's argument isn't of type
>> Float_Lists.List. This fact applies to any non-inherited operation of
>> List, generic or not. That is my point.
>
> No, we don't have to call it differently. That's my point.

I'm pretty sure that we do. For example, if I declare:

   package Thing_Vectors is new Ada.Containers.Vectors(Thing,Positive);
   procedure Sort is new Thing_Vectors.Generic_Sort; -- non-inherited op
   ...
   procedure Foo (Them: in out Thing_Vectors.Vector); -- non-inherited op
   ...
   type Other_Vector is new Thing_Vectors.Vector with null record;
   ...
   V2: Other_Vector;
   ...
   Sort (Thing_Vectors.Vector(V2));
   Foo (Thing_Vectors.Vector(V2));

we must typecast V2 for the calls to both Sort and Foo, because they are both non-inherited operations (and not specially declared for Other_Vector).

>> I quite agree, and I'd argue the same for Generic_Merge (which only
>> has two parameters). I'm also saying that making them classwide would
>> be overkill.
>
> But we want to *statically* check that both parameters have the same
> specific type, something we can't do if the parameters have type
> List'Class.

You've got hold of the wrong end of the stick, Matt! I am saying myself that the parameters to both Generic_Sort and Generic_Merge should /not/ be classwide. I am agreeing with you!

I /agree/ that we want to statically check that both parameters (to any call of any instantiation of Generic_Merge) are of the same type.

I was saying that I think Generic_Sort should not have a classwide parameter, because it would be better style for derived types (derived from Vector or List) to be explicitly typecast in a call to an instantiation of Generic_Sort, than to be opaquely typecast within the implementation.

>>> We also need to affirm the declaration of Ordered_Sets.Generic_Keys.
>>
>> What are the alternatives, please?
>
> Same as for Generic_Sort or Generic_Merge.

I don't quite understand this, since Generic_Sort and Generic_Merge are generic procedures, and Ordered_Sets.Generic_Keys is a generic package.
However, if you're suggesting that parameters of type Set in this package be made classwide (Set'Class), I would prefer not, for similar reasons as above. > That fact that you don't like this locution is not a compelling reason > to disallow it. I wasn't trying to argue that it is a /compelling/ reason, only that it is a reason. Anyway, what do you think of the idea of changing the name of Set_Capacity to Set_Minimum_Capacity, or its Capacity parameter to Minimum? >> Okay, but I feel that arguments about the level of abstraction are a >> bit of a nicety here. Am I wrong that we are faced with three choices, >> none of which is plainly ideal? > > The 5-parameter insert is already ideal, since it's completely general. > Any other behavior you want can be written in terms of the canonical > insert. Fair enough. I still think the suggested versions of Insert (with an If_Exists parameter) and Replace (with an If_Nonexistent parameter) should be added, on the grounds of significant convenience. >> So, are you saying that the standard /should/ leave it implementation >> defined what the capacity is? > > Yes, of course. The standard needs needs to get out of the vendors' > way, too. Yes, I agree with this. **************************************************************** From: Randy Brukardt Sent: Tuesday, September 14, 2004 9:40 PM ... > we must typecast V2 for the calls to both Sort and Foo, because they are > both non-inherited operations (and not specially declared for > Other_Vector). You never have to use a type conversion to call a routine with a class-wide parameter of a root class, which is the case here. Our "standard" for Claw was that a specific typed non-primitive parameter represented a bug, as it would just restrict the uses of the library for no benefit. We in fact wrote checking for that into our help file generator, and later removed most of the ones found. (There were a couple of cases where extensions would have been real problems, functions that returned objects or for procedures with 'out' parameters. Neither applies to Generic_Sort.) The same appears to be true for the Containers libraries. **************************************************************** From: Matthew Heaney Sent: Wednesday, September 15, 2004 2:28 AM > > But we're not arguing about Index_Type'Last, we're arguing about > > Index_Type'Base'Last. If you want to define Index_Type'Last such that > > it equals the maximum number of elements, then go right ahead. > > But in general, if I declare: > > type T is range A..B; > > I cannot know that T'B < T'Base'B. That's because you declared T incorrectly: type T_Base is A .. B + 1; type T is new T_Base range A .. B; > > All you need to do to fix this is: > > > > type Card_Count is range 0..53; > > subtype Card_Index is Card_in_Hand_Count range 1..52; > > package Hand_Vectors is > > new Ada.Containers.Vectors (Card_ID, Card_Index); > > > > and now all is well. > > Except that Card_Count having a rnage of 0..53 just for the convenience of > the Find function is surely poor programming? This simply reflects Ada95 idioms for type declarations. You have to declare a pseudo base type whose (base) range has the range required, and then declare the real type as a subtype or derived type that restricts the range. See the "sum of 3 numbers" thread on CLA from a few months ago. > >>> We also need to affirm the declaration of Ordered_Sets.Generic_Keys. > >> > >> What are the alternatives, please? > > > > Same as for Generic_Sort or Generic_Merge. 
> > I don't quite understand this, since Generic_Sort and Generic_Merge are
> generic procedures, and Ordered_Sets.Generic_Keys is a generic package.
>
> However, if you're suggesting that parameters of type Set in this package
> be made classwide (Set'Class), I would prefer not, for similar reasons as
> above.

I'm leaning myself towards importing the set type as a generic formal. We'll have to see how the ARG weighs in this weekend.

> Anyway, what do you think of the idea of changing the name of Set_Capacity
> to Set_Minimum_Capacity, or its Capacity parameter to Minimum?

Too much verbosity. (That's the same reason I'm not crazy about the name "Insert_Or_Replace".)

****************************************************************

From: Robert A. Duff
Sent: Wednesday, September 15, 2004 6:55 AM

> That's because you declared T incorrectly:
>
> type T_Base is A .. B + 1;
> type T is new T_Base range A .. B;

I think you want:

   subtype T is T_Base range A .. B;

****************************************************************

From: Nick Roberts
Sent: Wednesday, September 15, 2004 9:19 AM

>>But in general, if I declare:
>>
>> type T is range A..B;
>>
>>I cannot know that T'B < T'Base'B.
>
> That's because you declared T incorrectly:
>
> type T_Base is A .. B + 1;
> type T is new T_Base range A .. B;

True, but it might not be obvious to a user that this must be done for an instantiation of Ada.Containers.Vectors.

> This simply reflects Ada95 idioms for type declarations. You have to
> declare a pseudo base type whose (base) range has the range required, and
> then declare the real type as a subtype or derived type that restricts the
> range. See the "sum of 3 numbers" thread on CLA from a few months ago.

Yes, you are quite right. However, does this suggest that we need to add a second assertion?

   pragma Assert (Index_Type'Last < Index_Type'Base'Last);

I don't really see why we don't just return Index_Type'First-1; it's what the string packages do.

> I'm leaning myself towards importing the set type as a generic formal.
> We'll have to see how the ARG weighs in this weekend.

I think I see the logic of this idea, but I feel there is a danger that it might be a bit confusing for some users.

>>Anyway, what do you think of the idea of changing the name of
>>Set_Capacity
>>to Set_Minimum_Capacity, or its Capacity parameter to Minimum?
>
> Too much verbosity. (That's the same reason I'm not crazy about the name
> "Insert_Or_Replace".)

Okay, but isn't verbosity a lesser sin than the potential for misinterpretation? Up to a point, certainly. I'm not suggesting a name such as Set_Capacity_to_This_Unless_It_Is_Less_Than_The_Length_In_Which_Case_...

Anyway, changing the parameter name from 'Capacity' to 'Minimum' isn't increasing the verbosity (it would actually reduce it by one letter :-)

****************************************************************

From: Matthew Heaney
Sent: Wednesday, September 15, 2004 9:28 AM

Randy wrote:

>> Q1) Find_Index returns Last_Index (Container) + 1 if the element is not
>> found.
This seems consistent to me (it's past the end of the container in >> a forward search), but Matt worries that First_Index (Container) - 1 >> might be thought of as better. The trouble with First_Index (Container) - >> 1 is that you can't put it into an object: >> declare >> I : Index_Type := Index_Type'First; >> begin >> I := Find_Index (Vect, Item, I); >> while I <= Last_Index (Vect) loop >> -- Do something to the element I. >> I := Find_Index (Vect, Item, I+1); >> end loop; >> end; >> If Find_Index returned Index_Type'First - 1, saving the result of >> Find_Index would raise Constraint_Error if the item is not found. That's >> not what we want, I think. Actually, I think this example is wrong anyway. Object I should have type Index_Type'Base. You could write instead (see below): declare I : Index_Type'Base := Find_Index (Vect, Item); begin while I /= No_Index loop -- Do something to the element I. I := Find_Index (Vect, Item, I+1); end loop; end; One issue with Randy's original formulation is that there's a constraint check every time object I is assigned the value returned by Find. We could always liberalize what is acceptable as the value of the Index parameter of Find. Right now, we raise CE if Index < Index_Type'First, but it might make sense to allow No_Index (actually, any value less then Index_Type'First) as the value, and interpret that to mean begin the search at Index_Type'First. Nick Roberts responded: > The problem with Last_Index (Container) + 1 is that it may not exist, > because Last_Index (Container) might be Index_Type'Last. On the other hand, > we convenienty have a requirement that Index_Type'First > > Index_Type'Base'First, which guarantees that First_Index (Container) - 1 > does always exist (as a value of Index_Type'Base). > > I think that swings it. I suggest Find_Index returns First_Index > (Container) - 1 when it does not find what it is looking for. ... This is actually closer to how std::string works. If a search fails, it returns std::string::npos, which is defined as string::size_type(-1). It would be similar to what you have above: No_Index : constant Index_Type'Base := Index_Type'Pred (Index_Type'First); We would then have to affirm whether To_Index should raise CE if given No_Element as the argument, or return the value No_Index instead. (Our original motivation for raising CE was that we didn't know what index value to return, but if we declare No_Index as a distinguished value, then we really do have a value to return that makes sense.) **************************************************************** From: Matthew Heaney Sent: Wednesday, September 15, 2004 9:31 AM > However, does this suggest that we need to add a second assertion? > > pragma Assert (Index_Type'Last < Index_Type'Base'Last); > > I don't really see why we don't just return Index_Type'First-1; it's > what the string packages do. I would be in favor of declaring an object of type Index_Type'Base, named No_Index (or whatever), whose value is Index_Type'First - 1. Find_Index and Reverse_Find_Index would return No_Index when the search fails. I would also be in favor of function To_Index returning No_Index when the parameter has the cursor value No_Element (instead of raising CE). See my previous post for the details. Note that you can't write that assertion, because it would fail for generic actual types like Natural or Positive. 
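[Editor's note: To spell out why that assertion cannot be required: with a typical compiler Integer'Base'Last = Integer'Last, so for an actual of Positive the check "Positive'Last < Positive'Base'Last" is False, even though the proposed No_Index value is harmless there. A sketch:

   pragma Assert (Positive'Last < Positive'Base'Last);
   --  typically fails: Positive'Last = Integer'Last = Integer'Base'Last

   No_Index : constant Integer'Base := Positive'First - 1;
   --  = 0, which is always within Integer'Base
]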
**************************************************************** From: Nick Roberts Sent: Wednesday, September 15, 2004 5:25 PM > I would be in favor of declaring an object of type Index_Type'Base, > named No_Index (or whatever), whose value is Index_Type'First - 1. > Find_Index and Reverse_Find_Index would return No_Index when the search > fails. Specifically, I suggest the following declaration is inserted into the Ada.Containers.Vectors package specification: No_Index: constant Index_Type'Base := Index_Type'First - 1; > I would also be in favor of function To_Index returning No_Index when > the parameter has the cursor value No_Element (instead of raising CE). > > See my previous post for the details. I suggest the wording for To_Index in this package is changed to: function To_Index (Position : Cursor) return Index_Type'Base; If Position is No_Element, To_Index returns No_Index. Otherwise, it returns the index (within its containing vector) of the element designated by Cursor. I suggest the wording for the Find_Index and Reverse_Find_Index functions is changed to: function Find_Index (Container : Vector; Item : Element_Type; Index : Index_Type'Base := Index_Type'First) return Index_Type'Base; Searches the elements of Container for an element equal to Item, starting at position Index. If Index is less than Index_Type'First, then the search begins at Index_Type'First. If there are no elements in the range Index .. Last_Index (Container) equal to Item, then Find_Index returns No_Index. Otherwise, it returns the index of the matching element with the lowest index. function Reverse_Find_Index (Container : Vector; Item : Element_Type; Index : Index_Type'Base := Index_Type'Last) return Index_Type'Base; Searches the elements of Container in reverse for an element equal to Item, starting at position Index. If Index is greater than Last_Index (Container), then the search begins at position Last_Index (Container). If there are no elements in the range Index_Type'First .. Index equal to Item, then Reverse_Find_Index returns No_Index. Otherwise, it returns the index of the matching element with the highest index. I've also added the wording "with the lowest index" and "with the highest index" in this suggestion, in the hope it increases clarity slightly. **************************************************************** From: Randy Brukardt Sent: Wednesday, September 15, 2004 6:19 PM ... > > Our "standard" for Claw was that a specific typed non-primitive parameter > > represented a bug, as it would just restrict the uses of the library for no > > benefit. > > But it wouldn't actually /retrict/ uses of the subprogram, would it? It > would simply mean that the user would have to write an explicit type > conversion for an object or expression of a derived type, wouldn't it? Writing unnecessary type conversions *is* a bug. The reason for wanting explicit type is to indicate the possibility of a problem (for conversions that can fail or lose precision). Neither is true here. It's the same reason that Ada 05 expands the use of anonymous access types -- if you have conversions that can't fail, they're not interesting conversions -- they just clutter the code. > On the other hand, doesn't making the parameter class-wide restrict new > overloadings of the subprogram (for a derived type)? I think that could be > quite inconvenient occasionally. "Occasionally", perhaps. If you use a ton of use clauses. Otherwise, it's a non-problem, because the new routine would necessarily be in a different package. 
> > We in fact wrote checking for that into our help file generator, > > and later removed most of the ones found. (There were a couple of cases > > where extensions would have been real problems, functions that returned > > objects or for procedures with 'out' parameters. Neither applies to > > Generic_Sort.) > > I think this is the kind of problem I'm worried about. For a container type > T1, Generic_Sort may be instantiated to a procedure named Sort (or > whatever) in one phase of software construction, which may then get frozen, > and then a type T2 may be derived from T1 in a later phase. It might be an > annoyance not to be able to declare a new overloaded Sort (or whatever) for > T2. I suppose you'd have to name it something like Sort_T2. Not a disaster, > but an annoyance. (Of course, you /could/ declare the overloaded Sort, but > you couldn't call it unambiguously.) No problem, just prefix it to call it unambiguously. Moreover, if the Sort is class-wide, you don't even need to do this new routine; just call the original one. (If the instantiation is in the package with the type declaration, you can even do that with the prefix notation without any extra with or use clauses.) > > The same appears to be true for the Containers libraries. > > It does seem to be a similar situation. I have often found difficulty in > deciding on these kinds of details of the design, for packages which export > tagged types. I generally find that it's a mistake to try to be > too 'clever'. I agree. My rule is that all operations are either primitive or class-wide unless they are creating a new object of the type. > A class-wide parameter implies a dispatching implementation, doesn't it? No, not to me. It implies an operation that is *meaningful* to all members of a type. How it's implemented is not relevant. > However, the implementation of Generic_Sort would not do any dispatching; > it would simply have to internally typecast the parameter to the root type. That would be open to debate. It certainly would be easier to define it this way, but it would make sense for it to dispatch to the primitive operations of the type. (But that probably would be a mistake for performance reasons.) > I feel that this fact indicates a bug. I think, for this reason, it would > not be the right design to make the parameter(s) class-wide for > Generic_Sort or for Generic_Merge. Nope, it's irrelevant in my view. Claw has an entire package of operations on Root_Window_Type'Class, and there isn't a single dispatching operation in the bunch. They are all operations that make sense on any window; the implementation ought to be irrelevant. **************************************************************** From: Randy Brukardt Sent: Wednesday, September 15, 2004 7:01 PM > I would be in favor of declaring an object of type Index_Type'Base, > named No_Index (or whatever), whose value is Index_Type'First - 1. > Find_Index and Reverse_Find_Index would return No_Index when the search > fails. That would OK, except that you are assuming that users will somehow magically use Index_Type'Base when using Find, but use Index_Type for everything else. That's very impractical; I often will put the result of operations like Find directly into the resulting data structure (sometimes in an aggregate), and it would be awful to have to use 'Base all over. Moreover, how would anyone remember to use 'Base? I don't think I've ever written 'Base outside of a generic unit; it certainly wouldn't be the first thing I'd think of. 
So I view this proposal as being the same as saying that Find raises Constraint_Error whenever the object is not found. Moreover, this isn't documented, so most users will have to find it out by trial and error. Unless they use the container very frequently, this will be a major gotcha. > I would also be in favor of function To_Index returning No_Index when > the parameter has the cursor value No_Element (instead of raising CE). That means that we have to carefully study every operation to see if the behavior for No_Index is proper. Since most of the cursor operations are defined in terms of To_Index, that is going to be a big job. (And most likely, some of the index operations should *not* raise an exception if given No_Index - Find_Index for example.) While I'm sure we can come up with a consistent semantics, such a major rewrite will most likely prevent the AI from being approved at this meeting. (I will be completely opposed to approving any AIs with major changes; most of the ones approved in the past were too full of holes to complete.) You'd also be giving more ammunition to those who claim that this container library isn't mature enough to standardize. Even though this is arguably a corner-case, it gives the impression of continuing flux in the interface. So, all in all, I'd rather leave the whole thing alone; it's insufficiently broken -- the only problem is that a failed Find would raise Constraint_Error if the array is full. That isn't a major issue - I could argue that failed Find always should raise an exception (that's what Claw does in most cases) - and in any case, lots of operations are going to fail with a full array. Don't do it. :-) **************************************************************** From: Matthew Heaney Sent: Wednesday, September 15, 2004 11:25 PM > > I would be in favor of declaring an object of type Index_Type'Base, > > named No_Index (or whatever), whose value is Index_Type'First - 1. > > Find_Index and Reverse_Find_Index would return No_Index when the > > search fails. > > That would OK, except that you are assuming that users will > somehow magically use Index_Type'Base when using Find, but > use Index_Type for everything else. That's very impractical; > I often will put the result of operations like Find directly > into the resulting data structure (sometimes in an > aggregate), and it would be awful to have to use 'Base all > over. Moreover, how would anyone remember to use 'Base? I > don't think I've ever written 'Base outside of a generic > unit; it certainly wouldn't be the first thing I'd think of. This is the most compelling argument (for retaining the current semantics). I often assume most Ada users are like Bob Duff, and sometimes need reminding that there are very few Bob Duffs... > So I view this proposal as being the same as saying that Find > raises Constraint_Error whenever the object is not found. > Moreover, this isn't documented, so most users will have to > find it out by trial and error. Unless they use the container > very frequently, this will be a major gotcha. Agreed. ... > So, all in all, I'd rather leave the whole thing alone; it's > insufficiently broken -- the only problem is that a failed > Find would raise Constraint_Error if the array is full. That > isn't a major issue - I could argue that failed Find always > should raise an exception (that's what Claw does in most > cases) - and in any case, lots of operations are going to > fail with a full array. Don't do it. :-) Agreed. 
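[Editor's note: A sketch of the gotcha Randy describes, with hypothetical client declarations; Index_Type, V, and Item are assumed to come from an instantiation of the proposed Vectors package:

   type Search_Result is record
      Where : Index_Type;   -- the declaration most users would write
   end record;

   R : constant Search_Result := (Where => Find_Index (V, Item));
   --  Under the No_Index proposal, a failed search yields
   --  Index_Type'First - 1, which is outside Index_Type, so this
   --  aggregate raises Constraint_Error unless Where is declared with
   --  subtype Index_Type'Base (or some wider subtype).
]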
****************************************************************

From: Robert A. Duff
Sent: Thursday, September 16, 2004 10:27 AM

> This is the most compelling argument (for retaining the current semantics).
> I often assume most Ada users are like Bob Duff, and sometimes need
> reminding that there are very few Bob Duffs...

Well, Bob Duff doesn't want to use 'Base all over the place, either.

It seems to me that if there's a "special" value returned in the "not found" case, it is entirely Good and Right to declare a constant called Not_Found or some such. And there should be a subtype that includes that value plus all the normal index values. Whether you're putting the result of Find functions in data structures or local variables, you should use that subtype if the result might be Not_Found -- or you can assert that it *will* be found by using the normal index subtype.

None of this requires using 'Base "all over" -- perhaps once in the generic, and none in client code.

I don't think this is some sort of bobduffian arcanity -- it's no different from declaring a subtype 0..N to count the number of Things, and another subtype 1..N to index them.

Aside: this is the same reason why Ada desperately needs a "not null" constraint on access subtypes. Otherwise, there's no way to express the difference between "X points at a Thing" versus "X either points at a Thing or has a special null value". This is the source of numerous bugs, IME.

****************************************************************

From: Matthew Heaney
Sent: Thursday, September 16, 2004 12:21 PM

The function Last_Index returns Index_Type'Base too, to handle the case of an empty vector. Indeed, handling the result of Last_Index was the reason we settled on an integer index type. You could argue for declaring a No_Index value on the basis of Last_Index alone.

Note also that Get_Line returns Line'First - 1 (or is it 0?) when the line is empty. And Nick has pointed out that the search functions in Ada.Strings.* return 0 to indicate not found. So there is precedent for needing to handle a "special" value for index-based functions.

We already have an Index_Subtype in the spec, that simply renames the generic formal type. One way to avoid having to say IT'Base is to define that subtype as:

   subtype Index_Subtype is Index_Type'Base
      range Index_Type'First - 1 .. Index_Type'Last;

   No_Index : constant Index_Subtype := Index_Subtype'First;

****************************************************************

From: Randy Brukardt
Sent: Thursday, September 16, 2004 1:15 PM

It would need a different name if you were to do that; "Index_Subtype" doesn't imply the correct semantics (it implies a constraint, not the lack of one). Moreover, as Bob points out, you really need both, and it would seem weird to define only one or the other.

We usually name these subtypes _Count and _Index (or _Indices); the first because of the strong precedent in Streams, Storage_Elements, and Direct_IO, and the second to make it clear what the purpose is. (Direct_IO uses "Positive_Count", which is more confusing than anything.)

The problem with that naming is that we already have a separate type for Counts. So I can't think of an appropriate name.

And, in any case, my concerns about making a significant change here still apply. This is such a minor issue that it just doesn't seem worth the effort of checking and possibly changing the definition of every routine in the vector package.
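[Editor's note: The naming precedent cited above, paraphrased from the language-defined packages:

   --  Ada.Streams:
   type Stream_Element_Offset is range implementation-defined;
   subtype Stream_Element_Count is
      Stream_Element_Offset range 0 .. Stream_Element_Offset'Last;

   --  System.Storage_Elements:
   type Storage_Offset is range implementation-defined;
   subtype Storage_Count is
      Storage_Offset range 0 .. Storage_Offset'Last;

   --  Ada.Direct_IO:
   type Count is range 0 .. implementation-defined;
   subtype Positive_Count is Count range 1 .. Count'Last;
]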
****************************************************************

From: Nick Roberts
Sent: Thursday, September 16, 2004 5:51 PM

> The problem with that naming is that we already have a separate type for
> Counts. So I can't think of an appropriate name.

I suggest:

   subtype Index_Subtype is Index_Type;

   subtype Extended_Index is Index_Type'Base
      range Index_Type'First - 1 .. Index_Type'Last;

   No_Index: constant Extended_Index := Index_Type'First - 1;

   function To_Index (Position : Cursor) return Extended_Index;

   function Last_Index (Container : Vector) return Extended_Index;

   function Find_Index (Container : Vector;
                        Item      : Element_Type;
                        Index     : Index_Type'Base := Index_Type'First)
      return Extended_Index;

   function Reverse_Find_Index (Container : Vector;
                                Item      : Element_Type;
                                Index     : Index_Type'Base := Index_Type'Last)
      return Extended_Index;

and change the wording for To_Index and Last_Index to:

function To_Index (Position : Cursor) return Extended_Index;

   If Position is No_Element, To_Index returns No_Index. Otherwise, it returns the index (within its containing vector) of the element designated by Cursor.

function Last_Index (Container : Vector) return Extended_Index;

   If Container is empty, Last_Index returns No_Index; otherwise, it returns the position of the last element in Container.

Some other wordings may need to be changed:

Except for the wording of To_Cursor, change every occurrence of "If Index is not in the range First_Index (Container) .. Last_Index (Container)" or "If Index does not specify a value in the range First_Index (Container) .. Last_Index (Container)" to "If Container is empty or Index is not in the range First_Index (Container) .. Last_Index (Container)".

The wording for the To_Cursor function appears to work without change.

The wordings for all the Insert and Insert_Space procedures appear to work without change (as a side-benefit of No_Index = Index_Type'First - 1). Incidentally, the wordings for the Insert and Insert_Space procedures which have a Position out-parameter appear to suggest that Position is not set to anything if Length (New_Item) = 0. Is this correct?

The definitions of Append all appear to work without change. Likewise for Delete, Delete_First, and Delete_Last. (The wording for Delete_Last does actually work.)

As a very minor point, the wording for the Delete procedure (with Count) has "Any exceptions raised during element assignment are propagated." Should this be "Any exception propagated by an element assignment is propagated by Delete."?

Also add the wording (after the package spec):

   No_Index represents a position that does not correspond to any element. The subtype Extended_Index covers the indices covered by Index_Subtype plus the value No_Index.

I've already suggested wordings for Find_Index and Reverse_Find_Index. I don't think anything else is affected.

It seems quite neat to me, in a way, that the declaration of Extended_Index would supplant the assertion (pragma), as it would itself cause any instantiation of Ada.Containers.Vectors with Index_Type'First = Index_Type'Base'First to fail.

> And, in any case, my concerns about making a significant change here still
> apply. This is such a minor issue that it just doesn't seem worth the effort
> of checking and possibly changing the definition of every routine in the
> vector package.

For what it's worth, I would like to see these changes made. I don't feel that it would be a major change, and I do feel it would be worth it. But I guess things are getting close to the bone now.
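[Editor's note: A sketch of how client code would read under the Extended_Index proposal above, assuming an instantiation of the proposed Vectors package with its declarations directly visible (Delete's parameter names are assumed):

   procedure Delete_All (V : in out Vector; Item : in Element_Type) is
      I : Extended_Index := Find_Index (V, Item);
   begin
      while I /= No_Index loop
         Delete (V, Index => I);                  --  later elements slide down
         I := Find_Index (V, Item, Index => I);   --  resume at the same index
      end loop;
   end Delete_All;

   --  The client never needs to write Index_Type'Base; Extended_Index and
   --  No_Index carry the "not found" case.
]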
**************************************************************** From: Jeffrey Carter Sent: Friday, September 17, 2004 1:49 AM Robert A Duff wrote: > > It seems to me that if there's a "special" value returned in the "not > found" case, it is entirely Good and Right to declare a constant called > Not_Found or some such. And there should be a subtype that includes > that value plus all the normal index values. Whether you're putting the > result of Find functions in data strucutures or local variables, you > should use that subtype if the result might be Not_Found -- or you can > assert that it *will* be found by using the normal index subtype. There's a simple solution to the question of what to return if a Find operation doesn't find the specified value. There is a software engineering principle that a value should have one and only one interpretation. Returning an index from a Find operation and having a special value for the not-found case violates this principle. The solution is to use a single value to indicate if the operation found the value, and a 2nd value to indicate the index at which it was found if it was found: type Find_Result (Found : Boolean) is record case Found is when False => null; when True => Index : Index_Value; end case; end record; Now there is no requirement that the range of index values be smaller than its base type (at least for the Find operation). **************************************************************** From: Robert A. Duff Sent: Friday, September 17, 2004 9:02 AM I agree with the principle, but I think Ada doesn't have what it takes to do it cleanly and efficiently. If this were, say, ML then we might want to do something like this. **************************************************************** From: Randy Brukardt Sent: Friday, September 24, 2004 8:57 PM The Update_Element routine is available in all of the containers. It has the profile of: procedure Update_Element (Position : in Cursor; Process : not null access procedure (Element : in out Element_Type)); A recent discussion on comp.lang.ada points out that this requires defining a subprogram to make any in-place modification. That's clunky, and for larger objects, there really isn't any option. Matt originally had returned an access type here, but he needed a generic to work with user defined types, which made the solution rather clunky. It also required elements to be constrained (so that they could be aliased), which was considered ugly for definite types. With the recent approval of AI-363 and AI-318, both of these objections seem to have disappeared. AI-363 repeals 3.6(11), so an aliased component can be unconstrained. And AI-318 defines anonymous access for function returns, which avoids the need to introduce a named access type. So, should we reconsider a specification of: function Update_Element (Position : in Cursor) return not null access Element_Type; with Query_Element changed to: function Query_Element (Position : in Cursor) return not null access constant Element_Type; These would make most simple updates easy; the only lose is the "free" binding of the element to a name. Instead of: declare procedure Increment (Count : in out Natural) is begin Count := Count + 1; end Increment; begin Update_Element (A_Cursor, Increment'Access); end; we'd have: declare Item : Natural renames Update_Element(A_Cursor).all; begin Item := Item + 1; end; I'm not sure that putting access types into the specification is a good idea, but it certainly would be easier to use. 
The returned access would need rules similar to those for cursors (so that it would be erroneous to use it after certain operations). - so it would complicate the wording of the standard a bit. In the absence of a lot of support for this idea, we probably should stay with the current specification (we need to freeze this thing soon). **************************************************************** From: Tucker Taft Sent: Friday, September 24, 2004 9:18 PM Randy Brukardt wrote: > The Update_Element routine is available in all of the containers. It has the > profile of: > > procedure Update_Element > (Position : in Cursor; > Process : not null access procedure (Element : in out Element_Type)); > > A recent discussion on comp.lang.ada points out that this requires defining > a subprogram to make any in-place modification. That's clunky, and for > larger objects, there really isn't any option. I can't parse this last sentence. What do you mean "and for larger objects, there really isn't any option"? > ... > In the absence of a lot of support for this idea, we probably should stay > with the current specification (we need to freeze this thing soon). Leave it as is, in my view. Safety over convenience, and defining a procedure locally isn't that inconvenient, once you get used to the idea. **************************************************************** From: Randy Brukardt Sent: Friday, September 24, 2004 9:35 PM > I can't parse this last sentence. What do you mean "and for > larger objects, there really isn't any option"? Yuck. Let's try again: That's clunky, and for larger objects, there really isn't any alternative to using in-place modification. > Leave it as is, in my view. Safety over convenience, and > defining a procedure locally isn't that inconvenient, once > you get used to the idea. Kinda my position, too, but I thought it was good to ask the wider community before freezing this for all time. **************************************************************** From: Matthew Heaney Sent: Saturday, September 25, 2004 12:23 AM Actually, having a function like this would simplify the API a little. You could get rid of Update_Element and Query_Element, and change the Element function like this: function Element (C : Cursor) return access Element_Type; This does everything Query_Element, Update_Element, and Element do. You could then say: E : ET renames Element (C).all; or Op (Element (C).all); This is exactly analogous to the STL: void f(vect_t::iterator i) { E& e = *i; g(*i); //... } For the set, you'd want to pass a constant view, so Element would be declared this way: function Element (C : Cursor) return access constant Element_Type; and then a renaming of E, like this: E : ET renames Element (C).all; would be a constant view. This is analogous to the C++ statement: void f(set_t::const_iterator i) { const E& e = *i; } In the case of a map, the key selector returns a constant view, and the element selector returns a variable view: function Key (C : Cursor) return access constant Key_Type; function Element (C : Cursor) return access Element_Type; so you could say: K : KT renames Key (C).all; E : ET renames Element (C).all; where K is a constant view, and E is a variable view. 
This is analogous to the C++ statements: void f(map_t::iterator i) { const K& k = i->first; E& e = i->second; } **************************************************************** From: Nick Roberts Sent: Saturday, September 25, 2004 9:19 AM This seems like such a better interface design that I say it is worth making the change. **************************************************************** From: Robert A. Duff Sent: Saturday, September 25, 2004 11:04 AM > Leave it as is, in my view. Safety over convenience, and > defining a procedure locally isn't that inconvenient, once > you get used to the idea. Well, I don't like dangling pointers, either, but I lean the other way on this one: I think being forced to move a small hunk of code away from where I want it, wrap it in several lines of syntax, and clutter the namespace with a meaningless procedure name really IS a big pain. (That is, forcing the programmer to make an abstraction boundary at a place where it's inappropriate to do so.) I view this as "safety versus readability", not "safety versus convenience" -- and that makes the choice not so obvious. If you give me Lisp lambdas, I wouldn't mind the procedure approach. OTOH, if you give me C++ references, the ref approach could be safe (I think?). **************************************************************** From: Pascal Obry Sent: Saturday, September 25, 2004 12:57 AM > So, should we reconsider a specification of: > > function Update_Element (Position : in Cursor) > return not null access Element_Type; Why not keep both ? **************************************************************** From: Matthew Heaney Sent: Sunday, September 26, 2004 3:13 AM You could, but the only reason we needed the procedures is because we needed a reference to the in-place object. The function gives you that, which obviates the need for the procedure. For the vectors and lists, you would have this: function Element (C : Cursor) return not null access Element_Type; For the sets, you would have this: function Element (C : Cursor) return not null access constant Element_Type; For the maps, you would have these: function Key (C : Cursor) return not null access constant Key_Type; function Element (C : Cursor) return not null access Element_Type; The (in)famous wordcount program would look like: declare C : Cursor; B : Boolean; begin Insert (M, Word, 0, C, B); declare N : Natural renames Element (C).all; begin N := N + 1; end; end; **************************************************************** From: Martin Dowie Sent: Monday, September 27, 2004 3:10 AM This looks very elegant, readable and more comprehensible to me... **************************************************************** From: Pascal Leroy Sent: Monday, September 27, 2004 3:51 AM > Leave it as is, in my view. Safety over convenience, and > defining a procedure locally isn't that inconvenient, once > you get used to the idea. I agree. It seems to me that returning access values is opening the door to all sorts of dangling pointer bugs. Consider the case of a vector, which is probably implemented using one or several arrays. Returning an access value designating an element means that if/when an array is reallocated, the access becomes dangling. And that can happen at the drop of a hat. So it would be very hard indeed for the client to prevent nasty bugs from happening. I believe that safety should be of paramount importance when making decisions about the design of the containers: we don't want to add cases of erroneousness unless we absolutely have to.
The alternative is to say that access values never become dangling, but that would unnecessarily constrain the implementation. For instance, it would not be legitimate for the implementation of vectors to reallocate an array. Bob wrote: > Well, I don't like dangling pointers, either, but I lean the > other way on this one: I think being forced to move a small > hunk of code away from where I want it, wrap it in several > lines of syntax, and clutter the namespace with a meaningless > procedure name really IS a big pain. > (That is, forcing the programmer to make an abstraction > boundary at a place where it's inappropriate to do so.) I suspect that readability is in the eye of the beholder, to some extent. I'd rather see a crisp, 10-line subprogram gathering all the processing that pertain to an element (or a key), than a 500-line procedure squirreling away a pointer at the beginning and using it in random places throughout the code. Furthermore, I am not convinced that the programmer would be forced to create "inappropriate abstractions". It seems to me that the operations that are being performed on an element are good candidates for encapsulation and/or reuse (remember, they don't have to be local) so most of the time they are exactly the abstraction you want to create. And sorry, I don't care if you have to type a few extra lines of syntax. **************************************************************** From: Pascal Obry Sent: Monday, September 27, 2004 3:41 AM Ok, but this is only a workcount program ! In some cases the changes that need to be done on the element could be quite more complex. In such cases it would certainly better to have the procedure "callback". But well it is true that it is always possible to pass the result of the function to a procedure... Looks like the function is more versatile after all! **************************************************************** From: Robert A. Duff Sent: Monday, September 27, 2004 4:02 PM Before I start ranting, let me say this first: I agree with whoever said we should provide both. I don't think we need to be super-minimalist here. Pascal wrote: > Tuck wrote: > > > Leave it as is, in my view. Safety over convenience, and > > defining a procedure locally isn't that inconvenient, once > > you get used to the idea. > > I agree. It seems to me that returning access values is opening the door > to all sorts of dangling pointers bugs. Consider the case of a vector, > which is probably implemented using one or several arrays. Returning an > access value designating an element means that if/when an array is > reallocated, the access becomes dangling. And that can happen at the drop > of a hat. I haven't read the entire latest version, but in the C++ STL, it doesn't happen at the drop of a hat. It can happen at fairly well-defined places. I hope that's still true in the Ada proposal. >... So it would be very hard indeed for the client to prevent nasty > bugs from happening. Actually, it's not so hard, I think: If you call the pointer-returning function, immediately do .all of that, and rename the result, you get essentially what the pass-a-procedure interface gives you, with somewhat less syntactic cruft and namespace pollution. The renaming can't dangle, unless you modify the data structure in the scope of the renaming. (Here, by "modify the data structure" I mean things like adding and deleting elements -- as opposed to modifying the particular element we've got our hands on.) 
But in the pass-a-procedure interface, the same is true: if you modify the data structure within that procedure, the parameter becomes a dangling pointer (at least, if passed by reference, which would usually be true in the cases we're talking about). >... I believe that safety should be of paramount > importance when making decisions about the design of the containers: we > don't want to add cases of erroneousness unless we absolutely have to. > > The alternative is to say that access values never become dangling, but > that would unnecessarily constrain the implementation. For instance, it > would not be legitimate for the implementation of vectors to reallocate an > array. > > Bob wrote: > > > Well, I don't like dangling pointers, either, but I lean the > > other way on this one: I think being forced to move a small > > hunk of code away from where I want it, wrap it in several > > lines of syntax, and clutter the namespace with a meaningless > > procedure name really IS a big pain. > > (That is, forcing the programmer to make an abstraction > > boundary at a place where it's inappropriate to do so.) > > I suspect that readability is in the eye of the beholder, to some extent. > I'd rather see a crisp, 10-line subprogram gathering all the processing > that pertain to an element (or a key), than a 500-line procedure > squirreling away a pointer at the beginning and using it in random places > throughout the code. > > Furthermore, I am not convinced that the programmer would be forced to > create "inappropriate abstractions". It seems to me that the operations > that are being performed on an element are good candidates for > encapsulation and/or reuse (remember, they don't have to be local) so most > of the time they are exactly the abstraction you want to create. Well, I strongly disagree with the above paragraph. First of all, a philisophical point: it is not our place, as language designers, to decide that certain things are "good candidates" for encapsulation, and then *force* programmers to encapsulate on exactly those boundaries. Instead, we should be providing tools for encapsulation, and let programmers choose where to use them. I don't usually like "500-line squirreling" procedures either, but it's not our job to tell people how many lines of code are appropriate in any given procedure. Second, the procedures in question *do* have to be local, in nearly all cases, because they need more information than just the Element parameter. That is, the useful (perhaps reusable) abstraction is probably a procedure with *two* parameters, so we would need a local wrapper procedure with one parameter. Consider this example: procedure Grind_Upon_String(S: String) is begin for I in S'Range loop Insert_In_Table(Key => S(I), Value => I); end loop; end Grind_Upon_String; Kind of silly: we're inserting Key,Value pair into a table, consisting of the character and its index in the table. Instead of the index in the table, it might well be some other local variable of Grind_Upon_String; you get the idea. The point is, the programmer has chosen Insert_In_Table as the appropriate abstraction. 
Wrapping it in another abstraction gains nothing: procedure Grind_Upon_String(S: String) is begin for I in S'Range loop declare procedure Insert_In_Table_With_I_As_Value (X: Character) is begin Insert_In_Table(Key => X, Value => I); end Insert_In_Table_With_I_As_Value; begin Insert_In_Table_With_I_As_Value(S(I)); end; end loop; end Grind_Upon_String; I'd be tempted to call Insert_In_Table_With_I_As_Value "Process_Element", which is a meaningless name -- which is appropriate, because it's a meaningless [non]abstraction. The fact that the above has a loop is not relevant to my point -- we've found an element by some means, and we want to do something with it (or perhaps modify it). I think if you inspect your own code, and look for cases where you're processing one element of some data structure (either in a loop, or based on a lookup, or based on some other info), you will find few cases where the code to process one element is exactly one call to a procedure with exactly one parameter (the element). Consider a simple algorithm for reversing a sequence, by moving two indices (or cursors!) inward from both ends. Surely the "swap the two current items" code doesn't deserve its own procedure (although "swap two items" with two parameters probably does). > And sorry, I don't care if you have to type a few extra lines of syntax. Come on, Pascal! Surely you know me better than that! When I complain about verbosity, I'm complaining about having to read useless junk -- not about having to type it in. See the second Grind_Upon_String above, which has a lot of "noise" compared to the amount of code conveying useful information to the reader. By the way, whether we use an accessor-returning-pointer or pass-a-procedure, it seems like we need two versions: one for read-only access, and one for read/write access. **************************************************************** From: Randy Brukardt Sent: Monday, September 27, 2004 5:07 PM Bob Duff wrote: > Before I start ranting, let me say this first: I agree with whoever said > we should provide both. I don't think we need to be super-minimalist > here. I don't see the point. No one is going to write "Process" subprograms if they don't have to. Once we've defined access versions, we're done there. (It would be especially good if we could write the language rules to avoid dangling in most cases.) I do agree in one sense though; I see no reason to drop the convenient value-returning function and value-replacing procedure. Doing so would just clutter up the code with .alls; and you really need the procedure for indefinite types (because you can't change constraints via .all there - the objects can be constrained and would need to be reallocated). > Actually, it's not so hard, I think: If you call the pointer-returning > function, immediately do .all of that, and rename the result, you get > essentially what the pass-a-procedure interface gives you, with somewhat > less syntactic cruft and namespace pollution. The renaming can't > dangle, unless you modify the data structure in the scope of the renaming. But there isn't a way to enforce this usage (unless the infinite accessibility idea flies). There would be a lot more cases of erroneous usage, which I know will make some reviewers nervous. > But in the pass-a-procedure interface, > the same is true: if you modify the data structure within that > procedure, the parameter becomes a dangling pointer (at least, if passed > by reference, which would usually be true in the cases we're talking about). 
Humm, sounds like a case that needs to be enumerated in the "Erroneous Execution" part of the standard. "If Element_Type is not a by-copy type, ...." > By the way, whether we use an accessor-returning-pointer or > pass-a-procedure, it seems like we need two versions: one for read-only > access, and one for read/write access. We have Query_Element (read-only) and Update_Element (read-write) currently. I would expect that we'd only change to anon access returns for those, with no other changes to the spec. (as I noted above). **************************************************************** From: Robert A. Duff Sent: Monday, September 27, 2004 5:18 PM I wrote: > > Before I start ranting, let me say this first: I agree with whoever said > > we should provide both. I don't think we need to be super-minimalist > > here. Randy replied: > I don't see the point. No one is going to write "Process" subprograms if > they don't have to. ... That may well be true. If so, it's good evidence that the Process subprogram is a bitter pill (for curing an admittedly nasty disease). > > Actually, it's not so hard, I think: If you call the pointer-returning > > function, immediately do .all of that, and rename the result, you get > > essentially what the pass-a-procedure interface gives you, with somewhat > > less syntactic cruft and namespace pollution. The renaming can't > > dangle, unless you modify the data structure in the scope of the renaming. > > But there isn't a way to enforce this usage (unless the infinite > accessibility idea flies). There would be a lot more cases of erroneous > usage, which I know will make some reviewers nervous. Correct. We could put in a NOTE recommending renaming. Whenever I write these sorts of return-pointer things, I put in a comment saying "Beware dangling pointers", and recommending renames. **************************************************************** From: Ehud Lamm Sent: Tuesday, September 28, 2004 2:27 AM For what it's worth I am with Bob as regards this issue. The style of programming implied by the collection interface (as inspired by the STL and other collection interfaces etc.) encourages the creation of small and rather meaningless element processing procedures, ones that are often quite tightly dependent on local scope. Without "anonymous functions" this style of programming can have a bad impact on readability and reliability. Notice that C#, for example, added "anonymous delegates" to better support this style of programming. I guess that's out of the question for us at this point... We should keep in mind that the renaming "trick" requires deep understanding of the language, and is quite subtle for beginners to understand. A NOTE is a good idea, as well as a style guide (is there going to be a new AQ&S guide?) **************************************************************** From: Matthew Heaney Sent: Tuesday, September 28, 2004 8:59 AM I had this problem today (see ai302/examples/shapes). I needed to sort an array, which looks like this: Rect : aliased Rectangle_Type; Line : aliased Line_Type; Face : aliased Face_Type; type Shape_Array is array (Positive range <>) of access Shape_Type'Class; V : Shape_Array := (Rect'Access, Line'Access, Face'Access); I decided to try out the fancy new array declaration syntax, which allows an anonymous access type as the array element subtype. procedure Sort is new Ada.Containers.Generic_Array_Sort (Positive, ???, Shape_Array); I have no actual type to match the Generic_Array_Sort.Element_Type formal.
To solve this problem, I came up with another kind of sorting procedure: generic type Index_Type is (<>); with function Less (Left, Right : Index_Type) return Boolean is <>; with procedure Swap (Left, Right : Index_Type) is <>; procedure Generic_Sort (First, Last : in Index_Type'Base); That allows me to say: Sort_V: declare function Less (I, J : Positive) return Boolean is IW : constant Point_Type := West (V (I).all); JW : constant Point_Type := West (V (J).all); begin return IW.X < JW.X; end; procedure Swap (I, J : Positive) is E : Shape_Type'Class renames V (I).all; begin V (I) := V (J); V (J) := E'Access; end; procedure Sort is new Generic_Sort (Positive); begin Sort (V'First, V'Last); end Sort_V; I don't know if this will be a problem or not, but I thought I'd bring it up... **************************************************************** From: Matthew Heaney Sent: Tuesday, September 28, 2004 12:11 PM Actually, I just realized that this is an issue even in Ada today, since you can declare an array object whose type is anonymous: V : array (1 .. 3) of Shape_Class_Access; We don't have a generic actual array type to match the Array_Type formal. However, the Generic_Sort declared below works for this array object declaration, too. **************************************************************** From: Tucker Taft Sent: Tuesday, September 28, 2004 12:35 PM I would like to see a generic sort like that as well. It is probably too late to standardize it, but getting it into your "reference implementation" would certainly be a start. I have never been completely happy with the array sorts we provide, since they seem unnecessarily "concrete." So long as the user provides the compare and the swap, we really don't care what is the type of the "array" or array-like thing. **************************************************************** From: Randy Brukardt Sent: Tuesday, September 28, 2004 2:34 PM That's true as long as you don't care about the usability and performance of the result. But Sort is an expensive and common operation, and specializing it enough to make it perform reasonably is valuable. I'd be very opposed to a compare and swap sort as the only one provided. To expand on that a bit, the usability issue should be obvious: you usually would have to write a swap routine. While the compare often already exists for other reasons, there almost never is a reason to have a swap. The performance issue is simply that a swap routine injects an extra subprogram call into the mix. Moreover, it prevents any optimization of element movement - you have to use a straight swap even if something better is available. (You can't take advantage of relinking elements or the fact that list sorts are stable for free this way.) For small elements, that overhead is substantial. (And on a generic sharing implementation, it is even worse. Ours has to save/restore displays on formal subprogram calls.) **************************************************************** From: Tucker Taft Sent: Friday, September 2, 2004 4:23 PM I was definitely not suggesting we drop the others. I was saying that for my personal use, I have found the existing ones overly "concrete." Your mileage obviously varies, and I accept that. I think Matt's very-generic sort would be nice to have, but I don't think it is worth standardizing at this point... 
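[Editor's note: For readers following the discussion, here is one possible body for the compare-and-swap Generic_Sort that Matt sketches above. It is only an illustration of the interface -- a simple insertion sort by adjacent swaps, so O(N**2) swaps, which is exactly the kind of cost Randy objects to above -- and not a reference implementation.]

   generic
      type Index_Type is (<>);
      with function Less (Left, Right : Index_Type) return Boolean is <>;
      with procedure Swap (Left, Right : Index_Type) is <>;
   procedure Generic_Sort (First, Last : in Index_Type'Base);

   procedure Generic_Sort (First, Last : in Index_Type'Base) is
   begin
      if First >= Last then
         return;  -- zero or one element: nothing to do
      end if;
      for I in Index_Type'Base range Index_Type'Base'Succ (First) .. Last loop
         declare
            J : Index_Type'Base := I;
         begin
            --  Sift the element at position I down toward First, one Swap at a
            --  time, until it is no longer Less than its predecessor.
            while J > First and then Less (J, Index_Type'Base'Pred (J)) loop
               Swap (J, Index_Type'Base'Pred (J));
               J := Index_Type'Base'Pred (J);
            end loop;
         end;
      end loop;
   end Generic_Sort;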
**************************************************************** From: Matthew Heaney Sent: Tuesday, September 28, 2004 4:16 PM Has the language been modified to allow an anonymous access type as the generic formal array element subtype? generic type IT is (<>); type ET (<>) is limited private; type Array_T is array (IT) of access ET; --legal? procedure GP (A : Array_T); Can the type (with access constant element subtype): type Array_T is array (1 .. 3) of access constant T; be passed as the actual type for generic formal array type GP.Array_T? Are there any other combinations? **************************************************************** From: Randy Brukardt Sent: Tuesday, September 28, 2004 4:28 PM > Has the language been modified to allow an anonymous access type as the > generic formal array element subtype? > > generic > type IT is (<>); > type ET (<>) is limited private; > type Array_T is array (IT) of access ET; --legal? > procedure GP (A : Array_T); No. > Can the type (with access constant element subtype): > > type Array_T is array (1 .. 3) of access constant T; > > be passed as the actual type for generic formal array type GP.Array_T? No. > Are there any other combinations? Who cares? There are a lot of cases in Ada where you can't use an anonymous type to do something. (You can't write a type conversion or qualified expression, for instance.) If it hurts, don't do that. :-) Anonymous types are supposed to be a convenience feature, not a cornerstone of design. Use them sparingly. **************************************************************** From: Matthew Heaney Sent: Friday, September 28, 2004 4:40 PM OK, but I think I gave a reasonable example. If I declare my array this way: declare type Array_T is array (Positive range <>) of T_Access; E1 : aliased T; E2 : aliased T; E3 : aliased T; A : Array_T := (E1'Unchecked_Access, E2'Unchecked_Access, E3'Unchecked_Access); begin ... end; The issue is that type T_Access is declared in an outer scope, and so the language requires the use of 'Unchecked_Access. However, here that's simply crying wolf. I'd rather say: A : Array_T := (E1'Access, E2'Access, E3'Access); but to do that I need to either declare a local access type, or declare the array element subtype as an anonymous access type. I was trying to be sparing, so I chose the latter, but then that created the problem instantiating the generic... **************************************************************** From: Tucker Taft Sent: Tuesday, September 28, 2004 4:56 PM I will admit I never noticed that these anonymous access types didn't make it into generic formals. It seems they should, presuming we have a good definition for "statically matching subtypes." That is the requirement, in general, for component subtypes. I believe we allow them in the discriminant part of formal discriminated types, so I don't see why we shouldn't allow them in the component-subtype definition for a formal array type. I think this was an oversight rather than intentional. **************************************************************** From: Tucker Taft Sent: Tuesday, September 28, 2004 5:17 PM Actually, the syntax for formal_array_type_definition simply says array_type_definition, so anonymous access types are permitted as the component type in a generic formal. So I think Randy was wrong in saying they weren't permitted.
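[Editor's note: To make Matt's question concrete, here is the kind of formal/actual pairing at issue, reusing the Shape_Type idea from his shapes example (the stub type below is just a stand-in). Whether the instantiation is legal hinges on the static matching rules taken up in the next messages; the sketch is untested.]

   type Shape_Type is tagged null record;  -- stand-in for the shapes example

   generic
      type Index_Type is (<>);
      type Element_Type (<>) is limited private;
      type Array_Type is array (Index_Type range <>) of access Element_Type;
   procedure Generic_Process (A : in Array_Type);

   procedure Generic_Process (A : in Array_Type) is
   begin
      null;  -- the body is irrelevant to the matching question
   end Generic_Process;

   type Shape_Array is array (Positive range <>) of access Shape_Type'Class;

   procedure Process_Shapes is new Generic_Process
     (Index_Type   => Positive,
      Element_Type => Shape_Type'Class,
      Array_Type   => Shape_Array);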
**************************************************************** From: Randy Brukardt Sent: Tuesday, September 28, 2004 5:24 PM OK, but then we haven't defined a matching rule. Or does that somehow fall out? **************************************************************** From: Tucker Taft Sent: Tuesday, September 28, 2004 7:35 PM It requires that the component subtypes match statically. That is defined for anonymous access types in 4.9.1(2) to require that the designated subtypes match statically. This was updated in AI-231 to also require that null-exclusiveness and access-to-constantness match. **************************************************************** From: Pascal Leroy Sent: Wednesday, September 29, 2004 4:20 AM > I think Randy and I were worried about two different kinds of > dangling references. I was worried about the one that would > occur if you left the scope where a vector object was > declared. Randy was worrying about dangling references that > would occur if you altered the vector object, with the side > effect of some part of it being deallocated and then > reallocated elsewhere. Thanks for the example and the clarification. I have been watching this thread with total bewilderment because I had no idea what problem you were trying to solve. My feeling is that the restrictions that you have to impose on the function, in particular in the case where it returns a call to another function, are so drastic as to seriously cripple functions-returning-anonymous-accesses. > This latter problem could happen with the access-procedure > approach as well, and I believe it is not possible to create > a reference that is so short-lived that you can completely > eliminate that problem. Well, to be honest, in the access-to-procedure approach, you could at least "lock" the container while you are calling the access-to-procedure, thereby detecting the situation where deallocation/reallocation would happen. I am not saying that we should require that, but it would be a viable implementation option for situations where safety is a prime concern. On the other hand, if you return an access to a part of the container, there is no way that you can prevent erroneousness. **************************************************************** From: Randy Brukardt Sent: Wednesday, September 29, 2004 6:03 PM > Well, to be honest, in the access-to-procedure approach, you could at > least "lock" the container while you are calling the access-to-procedure, > thereby detecting the situation where deallocation/reallocation would > happen. That's an excellent idea, and one that proves a decisive advantage for the current approach. > I am not saying that we should require that, but it would be a > viable implementation option for situations where safety is a prime > concern. I'm not sure why we *shouldn't* require that. The check appears to be cheaper than the check for cursor abuse, which we've mandated. And we have to write the text about abuse in any case - we might as well use that to make it safe. In particular, while in Update_Element's Process routine, calling an operation on the same container for the same element that could modify the element would raise Program_Error. For vectors, we'd also want that to happen for any operation that could expand the vector or make the cursor ambiguous (see the Bounded Error for the definition of ambiguous). 
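[Editor's note: a small self-contained illustration of the "lock the container" idea Pascal mentions above. This is not the AI-302 interface; the fixed-capacity representation and all the names are invented to keep the sketch short. The point is only the Busy count, which lets the container detect a structural change attempted from inside a Process callback.]

   generic
      type Element_Type is private;
   package Busy_Checked_Container is
      type Holder is limited private;
      procedure Append (C : in out Holder; New_Item : in Element_Type);
      procedure Update_Element
        (C       : in out Holder;
         Index   : in     Positive;
         Process : not null access procedure (Element : in out Element_Type));
   private
      type Element_Array is array (1 .. 10) of Element_Type;
      type Holder is limited record
         Items : Element_Array;
         Last  : Natural := 0;
         Busy  : Natural := 0;  -- number of Process callbacks currently executing
      end record;
   end Busy_Checked_Container;

   package body Busy_Checked_Container is

      procedure Append (C : in out Holder; New_Item : in Element_Type) is
      begin
         if C.Busy > 0 then
            raise Program_Error;  -- structural change while an element is exposed
         end if;
         C.Last := C.Last + 1;
         C.Items (C.Last) := New_Item;
      end Append;

      procedure Update_Element
        (C       : in out Holder;
         Index   : in     Positive;
         Process : not null access procedure (Element : in out Element_Type)) is
      begin
         C.Busy := C.Busy + 1;  -- "lock" the container for the duration of the call
         begin
            Process (C.Items (Index));
         exception
            when others =>
               C.Busy := C.Busy - 1;
               raise;
         end;
         C.Busy := C.Busy - 1;
      end Update_Element;

   end Busy_Checked_Container;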
There's actually not a problem if the element is by-copy for most of those cases, but I'd rather that we didn't define the semantics based on privacy-breaking properties of the element type. And this routine will usually be used only on large, pass-by-reference objects. The only case where the check could get complex is if a Process routine called Update_Element on the same container, but a different element. We could outlaw that to make the check easy, or we could allow it and pay a price in a slightly more complex check. We need also need rules for Iterate. Checking for Iterate is more expensive or overly broad. Alternatively to mandating checks, we can make these cases bounded errors; either Program_Error is raised, or (some) element is modified. That would reduce the need to make checks to just deletions of nodes (and reallocations of vectors). I don't see any need for erroneousness. (Which should help selling this to the safety-first folks.) > On the other hand, if you return an access to a part of the > container, there is no way that you can prevent erroneousness. Right. I think that clearly states it should be left as is. **************************************************************** From: Pascal Leroy Sent: Thursday, September 29, 2004 2:06 AM > I'm not sure why we *shouldn't* require that. The check > appears to be cheaper than the check for cursor abuse, which > we've mandated. And we have to write the text about abuse in > any case - we might as well use that to make it safe. Fine with me. I see safety as of critical importance for these packages anyway. > There's actually not a problem if the element is by-copy for > most of those cases, but I'd rather that we didn't define the > semantics based on privacy-breaking properties of the element > type. Agreed. > The only case where the check could get complex is if a > Process routine called Update_Element on the same container, > but a different element. We could outlaw that to make the > check easy, or we could allow it and pay a price in a > slightly more complex check. I could go either way. Element-level locking is going to require an extra integer for each element. No big deal for big elements, but it might be a significant overhead for small elements. (You need an integer, not a boolean, to do the locking because of recursive calls.) > Alternatively to mandating checks, we can make these cases > bounded errors; Right, but I would be slightly in favor of making the behavior deterministic here. After all, the whole point of this library is that you can port your code more easily. Bounded errors can cause nasty porting problems. **************************************************************** From: Randy Brukardt Sent: Thursday, September 30, 2004 12:25 PM > I could go either way. Element-level locking is going to require an extra > integer for each element. No big deal for big elements, but it might be a > significant overhead for small elements. (You need an integer, not a > boolean, to do the locking because of recursive calls.) I was thinking of a list of locked elements in the container; much less space overhead. Calling Update_Element a second time on the same element surely should be detected, so I wouldn't try to allow recursive calls. (But even if that is allowed, a list of locked elements still would work. The list would usually be small - that is 0 or 1 elements.) Note that this is another case where an "unchecked" set of containers could be defined in a secondary standard if the overhead really matters. 
> > Alternatively to mandating checks, we can make these cases > > bounded errors; > > Right, but I would be slightly in favor of making the behavior > deterministic here. After all, the whole point of this library is that > you can port your code more easily. Bounded errors can cause nasty > porting problems. I agree. **************************************************************** From: Tucker Taft Sent: Thursday, September 30, 2004 12:37 PM I think mandating this is a mistake at this point. I think it requires too much careful analysis of the implementation implications. I think we should clearly define what is "evil" but forcing all implementations to catch all evil behavior is overkill, I believe. We want the user to know what behavior is portable, and we can rely on implementors to try to catch most evil behaviors, but give up when it gets too hard. By saying that the behavior is "unspecified" in the evil cases, we are hopefully making a clear indication to the user that it is non portable. **************************************************************** From: Randy Brukardt Sent: Thursday, September 30, 2004 1:04 PM > I think mandating this is a mistake at this point. > I think it requires too much careful analysis of > the implementation implications. I've already done that, and it is a fairly simple check. That, of course, depends on exactly what is prohibited. > I think we should clearly define what is "evil" but > forcing all implementations to catch all evil behavior > is overkill, I believe. I agree, but this particular case (at least the vast majority of it) is easy to check, without much space overhead. We can't reliably detect dangling cursors (I've tried, and have concluded it has to be erroneous); but this can be detected. My intent is for our implementation to detect all of the bounded error cases and most dangling cursors. Which demonstrates Pascal's point: moving from some other implementation to ours could very well cause problems, because we're detecting problems that the other implementation ignores. > We want the user to know > what behavior is portable, and we can rely on implementors > to try to catch most evil behaviors, but give up when > it gets too hard. By saying that the behavior is > "unspecified" in the evil cases, we are hopefully > making a clear indication to the user that it is > non portable. Someone (perhaps it was you) told me that "unspecified" was worse than erroneous. When I've said "unspecified" in the containers text, I really mean that some result is returned, or some exception is raised. But not that unrelated memory is overwritten, or that the command to launch the missles is sent. That implies that the containers are not compiled with checks suppressed, for instance. I wonder if we need to be a bit tighter than a blanket "unspecified". One way to do it would be to define a "corrupted container", and then to say that any operation on a corrupted container either raises some exception, never returns, or returns with any function result or "out" parameters having unspecified values. Is this worth doing? (It would improve the safety a bit.) **************************************************************** From: Matthew Heaney Sent: Thursday, September 30, 2004 1:21 PM > Is this worth doing? (It would improve the safety a bit.) Functions that return anonymous access types = yes. Adding extra safety checks = no. 
**************************************************************** From: Matthew Heaney Sent: Thursday, September 30, 2004 1:22 PM Obviously, I'm with Tucker. Manipulating the container (specifically, changing its cardinality) during (passive) iteration is a Bad Thing to do, but a container should not be required to detect this. I am in favor of functions returning anonymous access types that designate container elements. This is what the STL does, and this is what Charles (sort of) does. I'm not worried about dangling references, since it's no different from: declare X : Integer_Array_Access := new Integer_Array (1 .. 1); I : Integer renames X (X'First); begin Free (X); I := 42; -- dangling reference end; The guideline for programmers is to always rename the result of the function, and to declare the object in the most inner scope possible: procedure Op (L : in out List) is -- for example C : Cursor := First (L); begin if Has_Element (C) then declare I : Integer renames Query_Element (C).all; begin if I = 42 then Delete (L, C); --reference to I would be dangling here, but... end if; end; --here there is no I to reference end if; end Op; I do this all the time. For example, I have an app that uses a list as a queue. Each list element has a reference count that indicates how many objects are referring to that queue element (each object has its own list cursor). When I append a new item to the queue, or when a client decrements its own contribution to the count, I inspect the front-most item and delete it if the reference count is 0. Something like: procedure Unjoin (My_Cursor : in out Cursor) is E : Entry_Type renames To_Access (My_Cursor).all; begin E.Ref_Count := E.Ref_Count - 1; if E.Ref_Count = 0 and then My_Cursor = First (Q) then Delete (Q, My_Cursor); --any reference here to E would be dangling, but... else My_Cursor := No_Element; end if; end Unjoin; --here there is no E to reference This behavior is simply a consequence of the nature of containers, which are merely a mechanism for storing and accessing elements. It's the job of the container to stay out of the element's way. Worrying about a single element is the least of your problems, since the entire container can go away: declare C : Cursor; begin declare L : List; begin -- ... populate L C := First (L); end; Replace_Element (C, By => 42); -- oops! end; As you can see, it's quite easy to have a dangling reference, even without functions that return an anonymous access type. There are debug versions of the STL. Something like that could easily be done for the AI-302 containers. A vendor could provide a specialized version of the library (hey, I'd even write it) that detects errors such as danging cursor references, etc, but without regard for performance. When the application developer is satisfied, he can simply adjust his include path to get the performance-optimized version. **************************************************************** From: Tucker Taft Sent: Thursday, September 30, 2004 1:17 PM I agree if the implementation is manageable, and the definition is relatively short, safety is worth the effort. However, Ada doesn't always detect dangling references, though it makes an effort to minimize them. I think we need to put this AI to bed very soon, so I am reluctant to keep fiddling with it. I'm sure Randy feels the same way, so I'll trust Randy to make only "appropriate" changes at this point. I still see the decision about Update_Element vs. some kind of Element_Ptr function as up in the air. 
Do you feel a decision has been made one way or the other? Independent of that decision, should I put in some energy to define the accessibility level for anon access function results to at least enable safe definition of functions like Element_Ptr, if not for Containers, perhaps for other similar interfaces? As defined now, the anon-access function returns don't really provide much of any added power or safety to the language. If we can come up with a definition that allows them to be used for things *like* Element_Ptr, that would seem to give them some real added value. Guidance welcomed! **************************************************************** From: Randy Brukardt Sent: Thursday, September 30, 2004 1:42 PM > I agree if the implementation is manageable, and > the definition is relatively short, safety is > worth the effort. However, Ada doesn't always > detect dangling references, though it makes an > effort to minimize them. I think we need to > put this AI to bed very soon, so I am reluctant > to keep fiddling with it. I'm sure Randy feels > the same way, so I'll trust Randy to make only > "appropriate" changes at this point. I agree, although I'm trying to get a feeling for what appropriate is. I'd appreciate some comments on whether an unqualified "unspecified" is too broad. > I still see the decision about Update_Element vs. > some kind of Element_Ptr function as up in the air. > Do you feel a decision has been made one way or > the other? Personally, I agree with Pascal. The fact that the *possibility* exists to avoid problems with the callback versions is a significant advantage that does not exist for the version that returns an access. Moreover, at this late date, we need a strong consensus to make a change. Given that Pascal and I are against a change in this area, I don't think we have that. > Independent of that decision, should I put in some energy > to define the accessibility level for anon access > function results to at least enable safe definition > of functions like Element_Ptr, if not for Containers, > perhaps for other similar interfaces? As defined now, > the anon-access function returns don't really provide > much of any added power or safety to the language. > If we can come up with a definition that allows them > to be used for things *like* Element_Ptr, that would > seem to give them some real added value. I personally don't think it is worth it. I've been convinced that you can't completely eliminate dangling pointers, and there is no such thing as a little bit of erroneousness. :-) **************************************************************** From: Randy Brukardt Sent: Thursday, September 30, 2004 1:51 PM > I am in favor of functions returning anonymous access types that > designate container elements. This is what the STL does, and this is > what Charles (sort of) does. I'm not worried about dangling references, > since it's no different from: > > declare > X : Integer_Array_Access := new Integer_Array (1 .. 1); > I : Integer renames X (X'First); > begin > Free (X); > I := 42; -- dangling reference > end; That's the problem. This is too unsafe for many of us; Unchecked_Deallocation has that name for a reason! ... > As you can see, it's quite easy to have a dangling reference, even > without functions that return an anonymous access type. Sure, but these are also much easier to detect than those on an access type. > There are debug versions of the STL. Something like that could easily > be done for the AI-302 containers. 
A vendor could provide a specialized > version of the library (hey, I'd even write it) that detects errors such > as dangling cursor references, etc, but without regard for performance. > When the application developer is satisfied, he can simply adjust his > include path to get the performance-optimized version. We agreed at the Madison meeting that the default for the containers would be safe, and that implementers could provide "unchecked" versions for greater performance. You have it somewhat backwards. In any case, I don't think that these checks will have much impact on performance (the main cost is a bit of additional memory per element). If you are willing to have a 99.5% detection (which I think is the best you can do anyway), just comparing a pair of integer serial numbers will detect virtually all dangling cursors. It doesn't quite catch all problems (if the memory has been turned back to the OS, you might get a fault that you can't handle; and it's possible that some other use of the memory might happen to "fake" the serial number). I intend that to be our primary implementation. If it turns out that the hit matters for some application (and that will be proven by profiling, not speculation!), it would be easy enough to provide an "unchecked" version. Because we can't detect *all* such accesses, we can't require detection in general (thus the erroneous cases for dangling cursors). But it seems silly to use that to say that we shouldn't detect the easy cases (like deleting an element that we're actively modifying). **************************************************************** From: Tucker Taft Sent: Thursday, September 30, 2004 2:22 PM I think it is fine to say "unspecified" for clearly "evil" situations. Trying to specify exactly what happens will just allow users to try to depend on the specified behavior. Randy Brukardt wrote: >>I agree if the implementation is manageable, and >>the definition is relatively short, safety is >>worth the effort. However, Ada doesn't always >>detect dangling references, though it makes an >>effort to minimize them. I think we need to >>put this AI to bed very soon, so I am reluctant >>to keep fiddling with it. I'm sure Randy feels >>the same way, so I'll trust Randy to make only >>"appropriate" changes at this point. > > > I agree, although I'm trying to get a feeling for what appropriate is. I'd > appreciate some comments on whether an unqualified "unspecified" is too > broad. **************************************************************** From: Randy Brukardt Sent: Thursday, September 30, 2004 2:44 PM I think you missed the point. The wording currently says that the "behavior is unspecified". Someone privately made the claim that that allows *anything*, including overwriting unrelated objects or launching the missile. There is no need to allow *that*. So my question was whether we needed to tighten up the wording so that exactly what is unspecified is more clear: The operation raises some exception; or Never returns; or Returns with unspecified values for any function results and in out and out parameters. Note that I don't want to specify the results, only that any corruption be limited to the container and parameters to operations. **************************************************************** From: Tucker Taft Sent: Thursday, September 30, 2004 3:24 PM I think any time we say "unspecified" it is possible for implementors to do something truly stupid. I don't see why we need to go out of our way to prevent that here.
**************************************************************** From: Pascal Leroy Sent: Friday, October 1, 2004 7:56 AM > I think we need to > put this AI to bed very soon, so I am reluctant > to keep fiddling with it. I'm sure Randy feels > the same way, so I'll trust Randy to make only > "appropriate" changes at this point. Exactly. Based on all the traffic that I have read lately, and given the tight schedule constraints that we have to obey, here is my preference: 1 - Keep Update_Element the way it is (with access-to-subprogram Process), and don't provide a version exposing pointers. This has been discussed extensively in Phoenix, and the group disliked the pointer version (one Tucker Taft in particular was quite vocal). Granted, some language limitations have been lifted by other AIs, but I don't see that it significantly affects the Phoenix decision. Plus, the access-to-subprogram version can be used to build a "safe" container, the other cannot. 2 - Don't require the container to be safe in the face of updates occurring during a call to Update_Element. It's OK to let implementations compete on the level of checking they do. 3 - Don't try to specify what is a "corrupt" container or what happens when you operate on such a container. Just use "unspecified" as in the current write-up. 4 - Give up on the idea of infinite accessibility depth for function results. It's just too late for such a change. We don't have the time necessary to work out the implications of that change. In particular, the assume-the-worst rules for access parameters could significantly reduce the usefulness of functions returning an anonymous access type. AI 318 has been quite contentious in the past; don't rock the boat. I am going to ask Randy to update AI 302 according to 1, 2, and 3 above. Also, AI 318 was approved with changes at the last meeting, so I am going to send it to WG9 in November (after editorial review), unless someone asks for a letter ballot. Sorry, folks, but we have to draw the line at some point. **************************************************************** From: Robert A. Duff Sent: Friday, October 1, 2004 1:01 PM Pascal said: > 1 - Keep Update_Element the way it is (with access-to-subprogram > Process), and don't provide a version exposing pointers. This has been > discussed extensively in Phoenix, and the group disliked the pointer > version (one Tucker Taft in particular was quite vocal). Granted, some > language limitations have been lifted by other AIs, but I don't see that > it significantly affects the Phoenix decision. Plus, the > access-to-subprogram version can be used to build a "safe" container, > the other cannot. If we use the pass-a-procedure approach (which I still don't like, for reasons already stated), then we need to decide whether it's OK to modify the container during that procedure. Pascal is saying here, "No, but implementations need not check." But note that gave an example that would suggest otherwise. He was using the return-a-pointer method, but the issue is the same. Basically, his example was to do a lookup, returning a pointer, and then delete that element (under some circumstances). The deletion happens (just) before the pointer goes out of scope, but the deletion is the last reference to that pointer. We need to decide whether that's a reasonable thing to do (I think so). If so, we shouldn't say (in the pass-a-proc method) it's an error to modify the container during the passed procedure. 
Or (in the return-a-ref method) that it's an error to modify the container while that pointer still exists. > Sorry, folks, but we have to draw the line at some point. True. **************************************************************** From: Randy Brukardt Sent: Friday, October 1, 2004 2:55 PM ... > If we use the pass-a-procedure approach (which I still don't like, for > reasons already stated), then we need to decide whether it's OK to > modify the container during that procedure. Pascal is saying here, "No, > but implementations need not check." But note that gave an example that > would suggest otherwise. He was using the return-a-pointer method, but > the issue is the same. Basically, his example was to do a lookup, > returning a pointer, and then delete that element (under some > circumstances). The deletion happens (just) before the pointer goes out > of scope, but the deletion is the last reference to that pointer. > > We need to decide whether that's a reasonable thing to do (I think so). > If so, we shouldn't say (in the pass-a-proc method) it's an error to > modify the container during the passed procedure. Or (in the > return-a-ref method) that it's an error to modify the container while > that pointer still exists. It's absolutely unreasonable in the pass-a-proc situation, because you have the object, not a pointer to it. You can't even get access to the cursor in order to do a delete without standing on your head (you'd have to use an uplevel access to it). That's very different than the return-a-pointer method, where you're in the same scope. Update_Element is intended for updates to the element. Period. Doing anything else in the Process procedure means that you are on very thin ice. If you want to delete the element, do that after you leave Update_Element (that's what the return-a-pointer version is doing after all). That's why the cases are very different, and why the pass-a-proc is preferred. Moreover, we do not want to add any erroneous cases here, and there is no need to do so. (On this point, I disagree with Pascal's resolution; it will take *more* text to make these cases erroneous, and it will save very little in terms of implementation. It also takes the rules out of line of the Update_Element definition, which isn't good either.) Note that none of this applies to Iterate, which passes a cursor. We have extensive rules about cursors (dangling and otherwise), and moreover it makes perfect sense to delete some records while iterating over them. But we do have to say the order of iteration is unspecified if the container is modified by the Process routine. **************************************************************** From: Pascal Leroy Sent: Saturday, October 2, 2004 3:58 AM > Moreover, we do not want to add any erroneous cases here, and > there is no need to do so. (On this point, I disagree with > Pascal's resolution; it will take *more* text to make these > cases erroneous, and it will save very little in terms of > implementation. It also takes the rules out of line of the > Update_Element definition, which isn't good either.) Technically I agree with you. I was under the impression that there was no consensus on this topic, however, and surely we cannot afford to make sizeable semantic changes to this AI in Atlanta. It would be good to hear what other people feel. I know that some members have expressed misgivings about the safety of the containers. Now would be a good time to speak up...
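[Editor's note: a sketch of the usage pattern Randy describes above (update inside Update_Element, structural changes afterwards), recast from Matt's earlier Unjoin example. The list operations are assumed to follow the draft profiles quoted in this thread; Entry_Type, Q and Ref_Count are carried over from Matt's example and are illustrative only.]

   procedure Unjoin (My_Cursor : in out Cursor) is
      Now_Unused : Boolean := False;
      procedure Decrement (E : in out Entry_Type) is
      begin
         E.Ref_Count := E.Ref_Count - 1;
         Now_Unused := E.Ref_Count = 0;
      end Decrement;
   begin
      Update_Element (My_Cursor, Decrement'Access);  -- touches only the element
      if Now_Unused and then My_Cursor = First (Q) then
         Delete (Q, My_Cursor);                      -- structural change done afterwards
      else
         My_Cursor := No_Element;
      end if;
   end Unjoin;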
**************************************************************** From: Randy Brukardt Sent: Saturday, October 2, 2004 7:37 PM Now having actually tried to write the wording, I've concluded my statement above is incorrect. We have to have an erroneous wording in any case, to cover for the possibility of someone calling Finalize or Unchecked_Deallocation on the container. While that is as likely as Pam Anderson marrying me, we have to cover it, and if it happened, the parameter to Process would be unstable as the memory would be possibly reused. OTOH, detecting deletions of the element itself can be done fairly cheaply, and would catch virtually all of the real problems. I'll include a complete write-up in the "list of changes" document for the update. **************************************************************** From: Matthew Heaney Sent: Tuesday, October 5, 2004 9:44 AM A question about the exact semantics of the cursor-based Swap operation for vectors and lists came up during my review of the Madison API. The operation looks like this: procedure Swap (I, J : Cursor); The question was whether cursors I and J are allowed to swap elements from different containers. I think the answer is no, it's not allowed, and Program_Error is raised if you try. But I'm not sure, so I wanted to ask for clarification. The reason the question came up is that the semantics of Swap are defined as follows: procedure Swap (I, J : Cursor) is EI : constant Element_Type := Element (I); begin Replace_Element (I, Element (J)); Replace_Element (J, EI); end; There's nothing in this algorithm that would prohibit I and J from designating elements in different containers. **************************************************************** From: Pascal Leroy Sent: Tuesday, October 5, 2004 10:19 AM The minutes are very clear, P_E is raised. **************************************************************** From: Matthew Heaney Sent: Tuesday, October 5, 2004 9:28 AM I just finished reviewing the changes to the AI-302 draft that we made in Madison, and I had a question that Randy suggested I post to the ARG discussion list. I didn't think of it during the meeting, but the C++ STL (on which AI-302 is largely based) uses the name "reserve" for the operation we named Ensure_Capacity. We already have a function "Capacity" that has the same name and semantics as the "capacity" vector member function in the STL. The operation "Ensure_Capacity" has the same semantics as the "reserve" member function in the STL, but it has a different name. My question was whether the name "Reserve" would have been a better name than "Ensure_Capacity", if only to avoid any unnecessary differences between to the two APIs. Any opinions? **************************************************************** From: Robert Dewar Sent: Tuesday, October 5, 2004 9:30 AM I am in favor of changing the name to Reserve. Ensure_Capacity is a bit odd anyway. I certainly don't call a hotel to ensure capacity for my upcoming stay :-) **************************************************************** From: Pascal Leroy Sent: Tuesday, October 5, 2004 10:09 AM But with this analogy Ensure_Capacity is better, because the semantics are really that the hotel builds new room if you come with many friends. I could go either way, but I have a preference for having the word capacity somewhere in the name of this operation. Otherwise it is not obvious that Reserve and Capacity are related. 
****************************************************************

From: Matthew Heaney
Sent: Tuesday, October 5, 2004 10:24 AM

But you can always say at the point of call:

   Reserve (V, Capacity => N);

****************************************************************

From: Randy Brukardt
Sent: Tuesday, October 5, 2004 10:47 AM

When Matt suggested this to me, I thought he meant Reserve_Capacity which
seemed better (at least on first thought) than Ensure_Capacity. But I don't
like "Reserve" by itself, either.

****************************************************************

From: Robert I. Eachus
Sent: Tuesday, October 5, 2004 4:00 PM

I think that makes more sense. Reserve can either be a noun or a verb.
Reserve_Capacity isn't completely unambiguous, but it is unlikely anyone
will misunderstand the intent.

****************************************************************

From: Pascal Leroy
Sent: Wednesday, October 6, 2004 1:55 AM

I agree that Reserve_Capacity is better than Ensure_Capacity.

****************************************************************

From: Matthew Heaney
Sent: Monday, October 4, 2004 7:03 PM

The vectors package has operations like:

   function To_Vector (Count : Count_Type) return Vector;
   function To_Vector (Item : Element_Type; Count : Count_Type) return Vector;
   function "&" (L, R : Vector) return Vector;
   ...etc

The set package has operations like:

   function Union (L, R : Set) return Set;
   function Intersection (L, R : Set) return Set;
   ...etc

That is, each of these tagged types has primitive operations that return
the type. (I refer to functions that return the type as "constructors", or
"ctors" for short.)

This means that during a derivation, the derived type must either be
declared as abstract, or the constructors must be overridden.

This is fine if your derivation is public, and the derived type is intended
to be used as a member of that class:

   with Integer_Vectors;
   package P is
      type T is new Integer_Vectors.Vector with private;
      ...
   end P;

In this case, you probably wouldn't object too much to overriding the
ctors.

However, what I often do is to implement the full view of a type as a
private derivation, like this:

   package Q is
      type T is private;
      ...
   private
      package Integer_Vectors is new Vectors (Integer);

      type T is new Integer_Vectors.Vector with null record;
   end Q;

The issue here is that I am forced to override the vector ctors that were
inherited by T. But here I don't really care about those operations, which
aren't used to implement T, so no one will call them anyway.

I don't know whether this is an issue. After all, you can implement the
full view of T as a record, and declare V as a component.

One possibility is to arrange for the ctors to be non-primitive, something
like:

   generic
      ...
   package Vectors is
      type Vector is tagged private;

      package Constructors is
         function To_Vector (Count : CT) return Vector;
         function To_Vector (Item : ET; Count : CT) return Vector;
         ...
      end Constructors;
      ...
   end Vectors;

In the declaration above, the constructor operations aren't primitive for
type Vector, and so aren't inherited. However, this does mean that to use a
constructor, you have to make those operations visible:

   declare
      V1 : Vector := Constructors.To_Vector (N);

      use Constructors;
      V2 : Vector := To_Vector (N);
   begin
      ...
   end;

The issue was on my mind, and so I just wanted to see whether anyone else
had an opinion on the matter.
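[Editor's note: for concreteness, a hedged sketch of the overriding burden described above in the private-derivation case. The profiles follow the draft Vectors spec, the bodies simply call the parent operations and re-wrap the result, and the package names are invented; the bodies are mechanical, which is exactly the "pain" under discussion:]

   with Ada.Containers.Vectors;
   package Q is
      type T is private;
   private
      package Integer_Vectors is
         new Ada.Containers.Vectors (Positive, Integer);

      type T is new Integer_Vectors.Vector with null record;

      --  Every inherited function whose result is of type T must be
      --  overridden, even though clients of Q can never call it:
      function To_Vector (Length : Ada.Containers.Count_Type) return T;
      function "&" (Left, Right : T) return T;
      --  ...and so on for the remaining constructors.
   end Q;

   package body Q is
      function To_Vector (Length : Ada.Containers.Count_Type) return T is
      begin
         return (Integer_Vectors.To_Vector (Length) with null record);
      end To_Vector;

      function "&" (Left, Right : T) return T is
      begin
         return (Integer_Vectors."&" (Integer_Vectors.Vector (Left),
                                      Integer_Vectors.Vector (Right))
                 with null record);
      end "&";
   end Q;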
**************************************************************** From: Tucker Taft Sent: Tuesday, October 4, 2004 7:58 PM I agree that sometimes it makes more sense to make constructors non-primitive. Putting the constructor operations in a child package called "Factory" or equivalent is something we do here pretty frequently. However, I wouldn't consider "&" and Union/Intersection to be constructors. They are operators, and they have operands of the type. I think the kind of constructors that belong in a factory are typically very closely tied to the underlying implementation of the type, rather than to its abstract interface. To_Vector is somewhat a borderline case, since it makes sense almost independently of the way the vector is implemented. On the other hand, if there were a constructor that took the initial capacity, and perhaps a specification of how much to expand on each expansion, etc., that would seem very closely tied to a particular implementation, and would belong in a factory child, or equivalent. **************************************************************** From: Pascal Leroy Sent: Tuesday, october 5, 2004 2:47 AM > In this case, you probably wouldn't object too much to > overriding the ctors. The constructors are fine as they are, i.e., they should be primitive. I think the example above is the important one, and we want it to work right. If we didn't expect users to extend the containers, we wouldn't have made them visibly tagged, right? If To_Vector is not primitive, you have no way to create a T with Count elements. In particular, you cannot use an extension aggregate, because T is a private extension. If you believe that To_Vector is generally useful (and you do) then there is no reason why it wouldn't be useful for T, too. And yes, you'll have to override it, but that's no big deal. > However, what I often do is to implement the full view of a > type as a private derivation, like this: > > package Q is > type T is private; > ... > private > > package Integer_Vectors is new Vectors (Integer); > > type T is new Integer_Vectors.Vector with null record; end Q; > > The issue here is that I am forced to override the vector > ctors that were inherited by T. But here I don't really care > about those operations, which aren't used to implement T, so > no one will call them anyway. You tell me that the constructors aren't used to implement T, but I have no reason to believe you. I can imagine many reasons why you would want to call some of the constructors for T in the body of Q. For instance, to implement "&", you could first call To_Vector to create an empty vector to contain the result of the catenation. If you know that you will never call a given constructor (including by dispatching) then you can probably write a 2-line body that returns a dummy value. No big deal. Also remember that in general we want controlling results to work right. It would be very confusing if (as suggested by Tuck) some functions were primitives and others were not. This would be likely to cause mysterious Tag_Errors. **************************************************************** From: Florian Weimer Sent: Tuesday, October 5, 2004 4:32 AM > I think the example above is the important one, and we want it to work > right. If we didn't expect users to extend the containers, we wouldn't > have made them visibly tagged, right? I thought the main motivation for making them tagged was to enable the industry-standard method invocation syntax for them. 
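[Editor's note: a hedged illustration of the "method invocation" (prefix) notation Florian refers to, assuming the prefix-call syntax proposed for Ada 2005 and an instantiation Integer_Vectors of the draft Vectors package:]

   declare
      V : Integer_Vectors.Vector;
   begin
      V.Append (42);           -- equivalent to Append (V, 42); possible because Vector is tagged
      if not V.Is_Empty then   -- equivalent to not Is_Empty (V)
         V.Clear;              -- equivalent to Clear (V)
      end if;
   end;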
**************************************************************** From: Pascal Leroy Sent: Wednesday, October 6, 2004 1:55 AM This was not the "main motivation", although this was certainly one aspect discussed when this decision was made in Palma. The motivation was that tagged types are more flexible in many respects (in particular, guess what, you can extend them) and since the implementation has to be a controlled type anyway, we might as well expose the tagged-ness to the user. With the addition of interfaces, I would actually expect that mixins involving containers and user-defined interfaces would be quite common in programs making heavy use of the OOP paradigm. **************************************************************** From: Matthew Heaney Sent: Wednesday, October 6, 2004 12:25 PM The reasoning was that since these are tagged anyway (because the type must privately derive from Controlled), then we might as well make them publicly tagged. This has benefits besides allowing type extension, for example, tagged type subprogram parameters are implicitly aliased, and you can use distinguished-receiver syntax. However, there is a cost, and that is when a derivation occurs, you must override all the primitive functions that return the type. This is a pain, when you simply want the convenience of implementing some other type as a private derivation from vector (say): package P is type T is private; ... private type T is new Vector_Types.Vector with null record; --must override To_Vector, etc end P; The locution above is a common Ada idiom (in fact, it's how all the containers are implemented). In this case, however, the convenience of deriving from Vector is outweighed by having to override 6 vector functions, none of which are needed to implement T. It is in this sense that making the vector type tagged isn't free. I support the decision to make the containers tagged (for the two benefits I list above), but my concern is the cost of derivation. But maybe container derivation isn't common enough to worry about. You can eliminate this cost by making functions that return the type non-primitive, but that of course has its own costs. If you're doing any kind of polymorphic programming, you wouldn't be able to dispatch on the tag of the function result. But then again, polymorphic programming of containers seems a little bizarre, so maybe it isn't common enough to worry about. > With the addition of interfaces, I would actually expect that mixins > involving containers and user-defined interfaces would be quite common in > programs making heavy use of the OOP paradigm. I am skeptical. In the vast majority of cases, the type of the container is known statically. I can't imagine why anyone would ever need a polymorphic class of, say, integer vectors. Any mixing of containers can be done entirely using static mechanisms (cursors and iterators). **************************************************************** From: Randy Brukardt Sent: Wednesday, October 6, 2004 12:40 PM > I support the decision to make the containers tagged (for the two > benefits I list above), but my concern is the cost of derivation. But > maybe container derivation isn't common enough to worry about. Well, if the containers weren't tagged, then there wouldn't be any (useful) derivation. So this is only saying that it isn't quite as easy to derive as we might like. The choice really seems to be between not allowing derivation at all or having it be more painful that we'd like. 
But moving various operators into a child package (which would require a separate instantiation) or a nested package seems weird. That's especially true for sets; do we really want "Union" to be non-primitive? (And then again you couldn't use the prefix notation for calls.) **************************************************************** From: Matthew Heaney Sent: Wednesday, October 6, 2004 1:02 PM > Well, if the containers weren't tagged, then there wouldn't be any (useful) > derivation. I gave an example of what I consider to be a very useful derivation: package P is type T is private; ... private type T is new Vector_Types.Vector; --no extension if not tagged end P; I bring this up because I actually attempted to declare a type as above, but ended up abandoning that approach when I was forced to override the primitive functions. But it's no big deal, since I was able to solve the problem another way. **************************************************************** From: Nick Roberts Sent: Tuesday, October 5, 2004 11:58 AM Would it make sense now to allow conversion towards an extension type? A conversion: TT(X) where X was an object of tagged type T, and T was an ancestor type of TT, could be defined as being equivalent to: TT'(X with others => <>) thus requiring any extension components to have default intialisations. It would then be possible to rescind the rule that a primitive function with a controlling result is abstract or must be overridden [RM2K 3.9.3 (4-6)]. Well, I guess so, anyway. This was always a bit of an awkward rule, wasn't it? **************************************************************** From: Robert A. Duff Sent: Tuesday, October 5, 2004 1:15 PM > Would it make sense now to allow conversion towards an extension type? Sounds dangerous, to me. > A conversion: > > TT(X) > > where X was an object of tagged type T, and T was an ancestor type of TT, > could be defined as being equivalent to: > > TT'(X with others => <>) > > thus requiring any extension components to have default intialisations. There's no such requirement, as far as I know. That is, "with others => <>" means "use the default, if any". For integer components with no explicit ":=...", it means "default initialize it to any old garbage". So there's some value in requiring an explicit "<>" when that is what is wanted. Besides, downward conversions are view conversions, and are allowed only for class-wide operands, and there's a tag check. Making TT(X) equivalent to an extension aggregate doesn't fit in well with that. > It would then be possible to rescind the rule that a primitive function > with a controlling result is abstract or must be overridden [RM2K 3.9.3 > (4-6)]. Well, I guess so, anyway. This was always a bit of an awkward rule, > wasn't it? I wouldn't say it's awkward -- it's necessary, to make sure the extension components are not forgotten. I suppose it might make sense to rescind that rule when the extension is "with null record". I think we considered that during the Ada 9X design, but decided it wasn't worthwhile to have such a special case. **************************************************************** From: Nick Roberts Sent: Wednesday, October 6, 2004 5:50 AM Robert A Duff wrote: >>Would it make sense now to allow conversion towards an extension type? > > Sounds dangerous, to me. > ... > Besides, downward conversions are view conversions, and are allowed only > for class-wide operands, and there's a tag check. 
Making TT(X) > equivalent to an extension aggregate doesn't fit in well with that. I actually agree, on reflection. > I wouldn't say it's awkward -- it's necessary, to make sure the > extension components are not forgotten. I suppose it might make sense > to rescind that rule when the extension is "with null record". I think > we considered that during the Ada 9X design, but decided it wasn't > worthwhile to have such a special case. I think maybe it would be worthwhile, on the grounds that it isn't, in fact, such a special case. Perhaps the ARG should consider this again? **************************************************************** From: Tucker Taft Sent: Wednesday, October 6, 2004 10:16 AM > I think maybe it would be worthwhile, on the grounds that it isn't, in > fact, such a special case. Perhaps the ARG should consider this again? Even for a null extension, there might be additional operations which are not present in the dispatch table of the type of the object. This really doesn't work. What you want is an extension aggregate. **************************************************************** From: Randy Brukardt Sent: Wednesday, October 6, 2004 12:58 PM I've posted the updated AI-302-3 (Containers) to the web site. Find it through http://www.ada-auth.org/ais.html (the file name is AI-20302.TXT) [This is version /07 of the AI - ED] There is a list of changes beyond those discussed in Madison at the end of the !appendix section (the very end of the AI file). Comments are welcome, but keep in mind that we're planning to approve the AI at the meeting next month (else it may not make the Amendment). So major overhauls aren't practical. We're just looking to improve the details at this point. **************************************************************** From: Jeff Carter Sent: Wednesday, October 6, 2004 1:49 PM I have a minor complaint about the ordering of operations in the package specifications. When using a container, I generally want to know how to create one, how to put things in it, how to access things that are in it, and how to delete things from it. The next most common thing is to make it empty. Therefore, I think these operations should come first in the specs. Right now they tend to be scattered around, separated by less common operations. For example, looking at vectors, is "&" really more common than Insert, Update, Replace (these 2 seem to be the same, so I don't know why the names differ), and Delete? One also needs to be able to determine locations within containers (cursors and indices for vectors) in order to do these common operations, so operations that provide locations should also be among the first in the spec. The container library proposed seems to be significantly different from the STL, not to mention much smaller. The references to the STL in the AI do not seem to add anything, and should probably be eliminated. **************************************************************** From: Tucker Taft Sent: Thursday, October 7, 2004 5:59 AM I hate to weigh in on this now, but... It is a fairly common paradigm to instantiate a package and then use derivation to bring the type into the current scope. E.g: package T_Vecs is new Vectors(T); type T_Vec is new T_Vecs.Vector; Making the type tagged and giving it various operations that are functions returning the type does defeat this approach (since "with null record;" wouldn't work without having to override all of the functions). 
There seem to be a few alternatives to deal with this: a) live with it as is b) define "type NT is new T with null record;" to provide default implementations of such functions by implicitly providing an extension aggregate at the point of call, e.g.: "Union(X,Y)" for NT is equiv to "NT'(Union(T(X),T(Y)) with null record)" c) make types untagged, and support object.op syntax on untagged record and private types. I kind of like option (b) as we don't want to force the use of untagged types to enable this paradigm. [ASIDE: This issue also reminds me of the problem we never fixed: type T is private; function "+"(X, Y: T) return T; private type T is new Integer; function "+"(X, Y: T) return T renames <>; -- inventing here It is pretty often that you want a private type to expose some but not all of the operations of the full type. Some kind of renaming would be great. Right now, you have to write wrappers for each such operation, which is a bit of a pain. End of ASIDE.] **************************************************************** From: Nick Roberts Sent: Thursday, October 7, 2004 8:57 AM This does all seem to be suggesting the introduction of a new form of declaration, the 'default completion'. default_completion ::= subprogram_specification [IS subprogram_default] ; obviously similar to a formal subprogram declaration. This declaration would be allowed anywhere a subprogram body is allowed, and would form the completion of a subprogram, declared in the visible part of a package, which is a primitive operation of a private type T. The type declaration in the private part of the package must be a type derivation declaration; let the type from which it is derived be called P. The subprogram's body would be formed from the body of the corresponding operation of P, with every parameter of type P converted to T, and each other occurrance of P replaced by T. The conversion would be a view conversion for non-tagged types, and a conversion of the form Tuck suggested for a tagged type which added no extra components. Default completions would be disallowed for a tagged type which did add components. The idea is that we could derive from a container thus: package Foo is package T_Vecs is new Ada.Containers.Vectors(T); type T_Vec is private; function Length (Container : T_Vec) return Count_Type; function Is_Empty (Container : T_Vec) return Boolean; procedure Clear (Container : in out T_Vec); procedure Append (Container : in out T_Vec; New_Item : in T_Vec); ... -- other vector operations we want to expose private type T_Vec is new T_Vecs.Vector with null record; ... end; package body Foo is ... function Length (Container : T_Vec) return Count_Type is <>; function Is_Empty (Container : T_Vec) return Boolean is <>; procedure Clear (Container : in out T_Vec) is <>; procedure Append (Container : in out T_Vec; New_Item : in T_Vec) is <>; ... end Foo; We still have to explicitly declare the operations we wish to inherit, and their completions (in the package body), but at least the form of the completions is succinct and clear (making it explicit that the operations are direct copies). We could also have: package Bar is type T is private; function "+"(X, Y: T) return T; ... private type T is new Integer; ... end; package body Bar is function "+"(X, Y: T) return T is <>; ... 
end Bar;

****************************************************************

From: Dan Eilers
Sent: Thursday, October 7, 2004 4:35 PM

> [ASIDE: This issue also reminds me of the problem we never fixed:
>
>     type T is private;
>     function "+"(X, Y: T) return T;
> private
>     type T is new Integer;
>     function "+"(X, Y: T) return T renames <>;  -- inventing here
>
> It is pretty often that you want a private type to expose
> some but not all of the operations of the full type.
> Some kind of renaming would be great. Right now,
> you have to write wrappers for each such operation,
> which is a bit of a pain. End of ASIDE.]

I am in favor of fixing this problem, having seen real customer code that
did this.

Note that there is a hazard in trying to write the wrapper workaround, in
that it is easy to accidentally infinitely recurse.

****************************************************************

From: Dan Eilers
Sent: Thursday, October 7, 2004 4:57 PM

> Making the type tagged and giving it various operations
> that are functions returning the type does defeat this
> approach (since "with null record;" wouldn't work without
> having to override all of the functions).
>
> There seem to be a few alternatives to deal with this: ...

Another alternative might be to allow type renaming:

   package T_Vecs is new Vectors(T);
   type T_Vec renames T_Vecs.Vector;

with the semantics that you want.

****************************************************************

From: Martin Krischik
Sent: Friday, October 8, 2004 4:01 AM

I always thought that type renaming was not added because subtypes do the
same thing:

   subtype T_Vec is T_Vecs.Vector;

Mind you: in my first Ada month I did actually try to rename types, and
wondered why it was not possible. Then the textbook told me that "subtype"
is the thing to do.

> with the semantics that you want.

Given that subtypes and type renaming are supposed to be semantically the
same, I wonder whether type renaming should not be allowed as a syntax
option, since it is often closer to what the programmer wants to express.

****************************************************************

From: Tucker Taft
Sent: Friday, October 8, 2004 10:24 PM

Type renaming creates a realm of thorny issues, relating, for example, to
where primitives are (re)declared. I don't believe this is a time to open
up discussion of a completely new syntactic feature.

On the other hand, I think we do need to be sensitive to whether there are
some minor "tweaks" of the various proposals that will make them work
better together.

****************************************************************

From: Christoph Grein
Sent: Thursday, October 7, 2004 6:24 AM

> b) define "type NT is new T with null record;" to
>    provide default implementations of such functions
>    by implicitly providing an extension aggregate at the
>    point of call, e.g.: "Union(X,Y)" for NT is equiv to
>    "NT'(Union(T(X),T(Y)) with null record)"

I like this proposal.

> [ASIDE: This issue also reminds me of the problem we never fixed:
>
>     type T is private;
>     function "+"(X, Y: T) return T;
> private
>     type T is new Integer;
>     function "+"(X, Y: T) return T renames <>;  -- inventing here
>
> It is pretty often that you want a private type to expose
> some but not all of the operations of the full type.
> Some kind of renaming would be great. Right now,
> you have to write wrappers for each such operation,
> which is a bit of a pain. End of ASIDE.]

And I have often grumbled about this, too, and like Tuck's invention, which
is fully upward compatible. But I fear it's too late for Ada0Y.
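[Editor's note: a hedged sketch of the wrapper workaround mentioned in the ASIDE, including the accidental-recursion trap Dan points out; the package name Bar is reused from the example above purely for illustration, and this is the wrapper (and the trap) that a "renames <>" or "is <>" completion would make unnecessary:]

   package Bar is
      type T is private;
      function "+" (X, Y : T) return T;
   private
      type T is new Integer;
   end Bar;

   package body Bar is
      function "+" (X, Y : T) return T is
      begin
         --  Writing "return X + Y;" here would call this same function
         --  again and recurse forever, because the visible "+" hides the
         --  inherited one.  The predefined operation must be reached
         --  through the parent type:
         return T (Integer (X) + Integer (Y));
      end "+";
   end Bar;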
****************************************************************

From: Nick Roberts
Sent: Thursday, October 7, 2004 10:55 AM

I wrote:

>    default_completion ::=
>       subprogram_specification [IS subprogram_default] ;

Obviously I should have written:

   default_completion ::=
      subprogram_specification IS subprogram_default ;

****************************************************************

From: Matthew Heaney
Sent: Monday, October 25, 2004 11:36 AM

Tucker Taft wrote:
>
> It is a fairly common paradigm to instantiate a package
> and then use derivation to bring the type into the
> current scope. E.g:
>
>    package T_Vecs is new Vectors(T);
>    type T_Vec is new T_Vecs.Vector;
>
> Making the type tagged and giving it various operations
> that are functions returning the type does defeat this
> approach (since "with null record;" wouldn't work without
> having to override all of the functions).

I have been corresponding with someone who is teaching a class in data
structures. He was trying to use the vector container to implement a
stack:

   generic
      type ET is private;
      with function "=" (L, R : ET) return Boolean is <>;
   package Stacks is
      type Stack is private;

      procedure Push (Container : in out Stack; New_Item : in ET);
      procedure Pop (Container : in out Stack);
   private
      package ET_Vectors is
         new Ada.Containers.Vectors (Positive, ET, "=");

      type Stack is new ET_Vectors.Vector with null record;
      --won't compile as is
   end Stacks;

He was confused about the compiler error messages, stating that he had to
override the function To_Vector, etc. (He knows Ada83, but he's still
learning Ada95.)

I bring this up as a real-life example of the fact that private derivation
is a very natural Ada idiom, since to him that was the most obvious
solution.

****************************************************************

From: Ehud Lamm
Sent: Monday, October 25, 2004 11:56 AM

When I first started doing this sort of thing in Ada I was confused myself,
as I am pretty sure my students will be.

Alas, I don't see a good solution. The child package approach seems to be
just as confusing, if not more so.

****************************************************************

From: Dan Eilers
Sent: Monday, October 25, 2004 12:21 PM

Isn't "type renaming" exactly the solution you're looking for?
It seems you don't really want to make Stack a new type derived from
ET_Vectors.Vector, instead you want to say that Stack is implemented by,
or in other words, renames ET_Vectors.Vector.

****************************************************************

From: Matthew Heaney
Sent: Saturday, October 9, 2004 12:47 AM

I just had a quick(?) question about dope vectors for arrays. Suppose I
have this declaration:

   declare
      type String_Access is access all String;
      S : aliased String (1 .. 10);
      X : String_Access := S'Access;  --not legal Ada95
   begin

The declaration of X is illegal, since S doesn't have a dope vector. I can
do this:

   declare
      type String_Access is access all String;
      S : aliased String := String'(1 .. 10 => ' ');
      X : String_Access := S'Access;  --OK
   begin

Was this behavior liberalized in Ada 2005? Is there a way for the
programmer to say: "give me a dope vector for this array object", so that
the former declaration would be legal?

What about a record component:

   type RT (N : Natural) is record
      S : aliased String (1 .. N);
   end record;

   R : RT;
   X : String_Access := R.S'Access;

I was thinking about the functions-that-return-anonymous-access-types, to
handle this case:

   type T is limited private;

   function S (O : access T) return access String;

private

   type T is limited record
      S : aliased String (1 .. 10);  --or maybe this is a discriminant
   end record;

   function S (O : access T) return access String is
   begin
      return O.S'Access;
   end;

   ...

   declare
      O : aliased T;
      SS : String renames S (O'Access).all;
   begin

Just curious...

****************************************************************

From: Tucker Taft
Sent: Saturday, October 9, 2004 11:19 AM

> Was this behavior liberalized in Ada 2005?

No.

> ... Is there a way for the
> programmer to say: "give me a dope vector for this array object", so that
> the former declaration would be legal?

No.

>
> What about a record component:
>
>    type RT (N : Natural) is record
>       S : aliased String (1 .. N);
>    end record;
>
>    R : RT;
>    X : String_Access := R.S'Access;

No. You can't get there from here.

> ...
> Just curious...

Good question, but we didn't make fixing this a priority. There is no
obvious fix other than to say that all aliased arrays must have dope
vectors pre-allocated in a way that would allow an access-to-unconstrained
pointer to point at them. That probably would have been the right answer
in Ada 95, in retrospect, but changing it now could break some working
code in bizarre ways, as it would require a change in representation for
existing data types.

****************************************************************

From: Matthew Heaney
Sent: Thursday, October 21, 2004 11:28 PM

!standard A.17                                 04-10-04  AI95-00302-03/07
!subject Container library

My review of the post-Madison AI-302 draft follows. As usual, each comment
is bracketed with "MJH:" and "ENDMJH." pairs, and immediately follows the
text to which it refers.

I haven't copied the entire AI draft here. Rather, I give just enough
context to determine the relevant section.

I can summarize most of my comments as:

(1) If we decide to keep the new cursor-based replace operation for sets,
then it should be named "Replace_Element", not "Replace".

(2) We need to get rid of the key-based replace operation for sets, the
operation named "Replace" in the nested package Generic_Keys.

(3) The set operation named "Checked_Update_Element" declared in the
nested package Generic_Keys should be named just "Update_Element".

(4) We need to get rid of the requirement that an implementation detect
container modification while passive iteration is in progress. This
requirement is unnecessary since we already have a meta-rule that says
container behavior is unspecified if a container object is simultaneously
read from and written to. This rule of course applies even if it's the
same task doing the reading and writing.

(5) This API needs to state unambiguously that a container implementation
must support multiple tasks simultaneously reading from a container
object. In particular it is perfectly legal for multiple tasks to
simultaneously perform passive iteration over a container.

(6) This API needs(?) to state that there are no container operations that
are "potentially blocking."

A.17 Containers

...

Note that the language already includes several requirements that are
important to the use of containers. First, library packages must be
reentrant - multiple tasks can use the packages as long as they operate on
separate containers.
Thus, it is only necessary for a user to protect a container if a single container needs to be used by multiple tasks. MJH: We need to be clear here about multithreading issues, since that last sentence is wrong. The only problem case is when there are multiple writers, or a single writer and one or more readers. (The reader and writer can also be the same task.) It is definitely *not* an error for multiple readers to access the same container all simultaneously. In particular, it is perfectly acceptable (in fact, the API is designed to facilitate this) for multiple tasks to be iterating over a same container object, using either cursors or the passive iterator. We already have a rule that says container behavior is not defined when a container is simultaneously written to and read from. The rule applies whether it is one task or more than one task. ENDMJH. Second, the language requires that language-defined types stream "properly". That means that the stream attributes can be used to implement persistence of containers when necessary, and containers can be passed between partitions of a program. ... A.17.2 The Package Containers.Vectors ... package Ada.Containers.Vectors is ... function To_Vector (Length : Count_Type) return Vector; function To_Vector (New_Item : Element_Type; Length : Count_Type) return Vector; function "&" (Left, Right : Vector) return Vector; function "&" (Left : Vector; Right : Element_Type) return Vector; function "&" (Left : Element_Type; Right : Vector) return Vector; function "&" (Left, Right : Element_Type) return Vector; MJH: We have already discussed the fact making these functions primitive means that they must be overridden during a derivation, since the function return type is Vector. (There is a similar issue for sets.) This is kind of a pain, since it's very common to implement the full view of a private type as a null extension of some other (tagged) type, or to derive from a type in order to bring its primitive operations into local scope. We can either live with this feature, arrange to make these operations non-primitive, or modify the language such that, say, a null extension inherits a default implementation of these functions. ENDMJH. ... procedure Delete (Container : in out Vector; Index : in Extended_Index; Count : in Count_Type := 1); MJH: See my comments below about the subtype and semantics of parameter Index for the index-based Delete operation. ENDMJH. ... generic with function "<" (Left, Right : Element_Type) return Boolean is <>; procedure Generic_Sort (Container : in Vector); MJH: Another operation that might be useful is a binary search over a sorted vector: generic with function "<" (Left, Right : Element_Type) return Boolean is <>; function Generic_Binary_Search (Container : Vector; Item : Element_Type) return Extended_Index; (It's just an idea...) ENDMJH. ... procedure Delete (Container : in out Vector; Index : in Extended_Index; Count : in Count_Type := 1); If Count is 0, the operation has no effect. If Index does not specify a value in the range First_Index (Container) .. Last_Index (Container), then Constraint_Error is propagated. Otherwise Delete slides the active elements (if any) starting Index plus Count down to Index. Any exceptions raised during element assignment are propagated. MJH: The semantics wrt the Index parameter are arguably inconsistent with the semantics of the cursor-based delete. Any index value outside of the range IT'First .. 
C.Last is technically the same as "not Has_Element", so you could make an argument that it should be treated the same as the cursor-based delete (meaning that it should be a no-op). ENDMJH. ... procedure Swap (I, J : in Cursor); If either I or J is No_Element, then Constraint_Error is propagated. If I and J designate elements in different containers, then Program_Error is propagated. Otherwise Swap exchanges the values of the elements designated by I and J. MJH: The ARG needs to confirm whether the second sentence is really correct. The semantics of Swap are equivalent to: procedure Swap (I, J : Cursor) is EI : constant ET := Element (I); begin Replace_Element (I, By => Element (J)); Replace_Element (J, By => EI); end; There's nothing here that would preclude I and J from designating elements in different containers, so it's not clear why this an error. ENDMJH. ... procedure Iterate (Container : in Vector; Process : not null access procedure (Position : in Cursor)); Invokes Process.all with a cursor that designates each element in Container, in index order. Any exception raised by Process is propagated. Program_Error is propagated if: * Process.all attempts to insert or delete elements from Container; or * Process.all finalizes Container; or * Process.all calls Move with Container as a parameter. AARM Note: This check takes place when the operations that insert or delete elements, etc. are called. There is no check needed if an attempt is made to insert or delete nothing (that is, Count = 0 or Length(Item) = 0). The check is easy to implement: each container needs a counter. The counter is incremented when Iterate is called, and decremented when Iterate completes. If the counter is nonzero when an operation that inserts or deletes is called, Finalize is called, or one of the other operations in the list occurs, Program_Error is raised. Swap and Generic_Sort are not included here, as they only copy elements. End AARM Notes. MJH: Au contraire: the counter-based check described above isn't adequate for detecting container modification during passive iteration, since it won't work in the presence of multiple reader tasks. We already have a rule that says container behavior is undefined if a container is simultaneously read from and written to. This rule applies even if it's the same task doing the simultaneous reading and writing. Hence the requirement above is entirely superfluous, and therefore it should be removed. A cursor does not need to confer any safety benefits beyond what an access type provides. (This is especially true for a vector, which implements a cursor as a wrapper around an index.) ENDMJH. ... A.17.3 The Package Containers.Doubly_Linked_Lists ... procedure Swap (I, J : in Cursor); If either I or J is No_Element, then Constraint_Error is propagated. If I and J designate elements in different containers, then Program_Error is propagated. Otherwise Swap exchanges the values of the elements designated by I and J. AARM Notes: After a call to Swap, I designates the element value previously designated by J, and J designates the element value previously designated by I. The cursors do not become ambiguous from this operation. AARM Notes: To Be Honest: The implementation is not required to actually copy the elements if it can do the swap some other way. But it is allowed to copy the elements if needed. MJH: The ARG needs to confirm the behavior when I and J designate elements in different containers. See my comment above for vectors. ENDMJH. ... 
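[Editor's note: a hedged sketch of how the cross-container check could be implemented inside the package body. It assumes a cursor representation with Container and Node components, which is an implementation detail and not part of the proposal; the declare block at the end is the equivalence Matt quotes, and the Program_Error case has to be checked explicitly, since that equivalence alone would happily swap across containers:]

   procedure Swap (I, J : in Cursor) is
   begin
      if I = No_Element or else J = No_Element then
         raise Constraint_Error;  -- either cursor is No_Element
      end if;

      if I.Container /= J.Container then
         raise Program_Error;     -- cursors designate elements in different containers
      end if;

      declare
         EI : constant Element_Type := Element (I);
      begin
         Replace_Element (I, By => Element (J));
         Replace_Element (J, By => EI);
      end;
   end Swap;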
procedure Iterate (Container : in List; Process : not null access procedure (Position : in Cursor)); Invokes Process.all with a cursor that designates each node in Container. Any exceptions raised during Process are propagated. Program_Error is propagated if: * Process.all attempts to insert or delete elements from Container; or * Process.all calls a routine that reorders the elements of Container (Swap_Links, Splice, Generic_Sort, or Generic_Merge); or * Process.all finalizes Container; or * Process.all calls Move with Container as a parameter. AARM Note: This check takes place when the operations that insert or delete elements, etc. are called. There is no check needed if an attempt is made to insert or delete nothing (that is, Count = 0). See Iterate for vectors for a suggested implementation of the check. Swap is not included here, as it only copies elements. End AARM Notes. MJH: This requirement is redundant, since we already have a meta-rule that says container behavior isn't specified if the container object is simultaneously read from and written to. This rule applies even if it's the same task doing the reading and writing (as would be the case when a container is modified during passive iteration). Note that the suggested implementation of the check doesn't work when there are multiple reader tasks. A container must support simultaneous reading by multiple tasks. At the end of the day, it doesn't really matter whether the container is modified during iteration, as long as next node can be reached safely, and the iteration eventually terminates. This is especially true for a list. If a user decides to sort the list during iteration, and then iterate delivers items in some different order, then what's the problem? (It's certainly not a problem in the reference implementation.) The problem case is deleting nodes during iteration. If the user deletes the current node, then the iterator might not be able to find the next node (since the next node pointer was stored on the node deleted). You can handle that by keeping a few nodes in cache, so that a node retains its value after it has been deleted. (You could do this using a special storage pool too, of course.) Here's an example I presented to Randy to illustrate some of these issues. Suppose the list contains items that are equivalent in some way, and all the equivalent items are grouped together in sequence, something like: a a b b b c d d d d e e Now we want to write a filter program, that removes all but the first member of each like sequence: a b c d e You could implement such a filter this way: procedure Filter (L : in out List) is procedure Process (C : Cursor) is D : Cursor; begin loop D := Next (C); if Has_Element (D) and then Element (D) = Element (C) then Delete (L, D); --NOTE: DELETION else return; end if; end loop; end Process; begin Iterate (L, Process'Access); end Filter; Even though this algorithm deletes nodes during passive iteration, it works if Iterate is implemented this way: procedure Iterate (Container : in List; Process : not null access procedure (Position : in Cursor)) is Node : Node_Access := Container.First; begin while Node /= null loop Process (Cursor'(Container'Unchecked_Access, Node)); Node := Node.Next; end loop; end Iterate; It's perfectly safe here, since it only deletes nodes that follow the current node. 
The real issue is portability, since it won't work if Iterate is implemented this way: procedure Iterate (Container : in List; Process : not null access procedure (Position : in Cursor)) is Node : Node_Access := Container.First; begin while Node /= null loop declare Next : constant Node_Access := Node.Next; begin Process (Cursor'(Container'Unchecked_Access, Node)); end; Node := Next; end loop; end Iterate; The filter algorithm above would be erroneous in this case, since the node designated by the Next pointer has been deleted. Now I'm not saying that this API should support container modification during passive iteration, since the filter algorithm above could be implemented just as easily using the active iterator. But there is a big difference between saying that we don't support it, and making a requirement that the implementation must detect modification during Iterate and raise an exception. The moral of the story is that the only thing this API should say about modification during iteration is that this API doesn't say what happens when modification during iteration occurs. ENDMJH. ... A.17.4 The Package Containers.Hashed_Maps ... generic ... package Ada.Containers.Hashed_Maps is ... function "=" (Left, Right : Map) return Boolean; MJH: We have been in discussion about adding another operation, called Equivalent, for sets and maps. I'm not sure what such an operation would mean for a map, but I just wanted to write down somewhere that it's up for discussion in Atlanta. ENDMJH. ... procedure Replace_Element (Position : in Cursor; By : in Element_Type); ... procedure Replace (Container : in out Map; Key : in Key_Type; New_Item : in Element_Type); MJH: The introduction of a new set operation has alerted me to the fact that we have two operations similarly named. (See the set spec for more comments.) So far we have named the cursor-based replace operation "Replace_Element" and its element parameter "By" (this came from Ada.Strings.*), and named the key-based replace operation "Replace" and its element parameter "New_Item". The ARG should confirm whether this difference in naming of the cursor-based vs. key-based replace operations is intended. ENDMJH. ... procedure Iterate (Container : in Map; Process : not null access procedure (Position : in Cursor)); Iterate calls Process.all with a cursor that designates each node in the Container. Any exception raised by Process is propagated. Program_Error is propagated if: * Process.all attempts to insert or delete elements from Container; or * Process.all calls Reserve_Capacity; or * Process.all finalizes Container; or * Process.all calls Move with Container as a parameter. AARM Note: This check takes place when the operations that insert or delete elements, etc. are called. See Iterate for vectors for a suggested implementation of the check. We have to include Reserve_Capacity here, as rehashing probably will change the order that elements are stored in the map. End AARM Notes. MJH: We already have a rule that says behavior is unspecified when the container is simultaneously read from and written to. That includes the case of a single task that is both the reader and writer (as is the case with Iterate). Note that for a hashed container (as for a list) the only real problem case is when the current node is deleted, since it contains the pointer to the next node. 
(Note that if the deleted node retains its value after it has been
deallocated or put in cache, then deleting the current node isn't really a
problem, since you would then be able to reach the next node.)

It doesn't really matter if Move or Reserve_Capacity is called either,
since the worst thing that happens is you run off the end of the buckets
array, in which case Constraint_Error is propagated.

A simple assertion to check that the buckets index hasn't changed is all
you would need to detect calls to Reserve_Capacity, and a simple assertion
check is all you would need to detect whether the buckets array (pointer)
has changed as a result of Move or Reserve_Capacity.

ENDMJH.

...

A.17.5 The Package Containers.Ordered_Sets

...

generic
   ...
package Ada.Containers.Ordered_Sets is
   ...
   function "=" (Left, Right : Set) return Boolean;

MJH: As I mentioned above, we have been in discussion about adding an
operation called Equivalent, that is similar to "=" except that it
compares each element for equivalence (using "<") instead of element "=".

The ARG should also decide whether to bring back the lexicographical
comparison operators for sets ("<", ">", etc), since Equivalent is defined
as "not (L < R) and not (R < L)".
ENDMJH.

...

   procedure Replace (Container : in out Set; New_Item : in Element_Type);
   ...
   procedure Replace (Container : in Set; Position : in Cursor; By : in Element_Type);

MJH: This new cursor-based replace operation for sets is named in a manner
inconsistent with the rest of this API. It should be named Replace_Element.

(Note that the element parameter is named By, not New_Item. The parameter
name By is always used as the element parameter of the cursor-based
operation named Replace_Element.)
ENDMJH.

...

generic
   ...
package Generic_Keys is
   ...
   procedure Replace (Container : in out Set; Key : in Key_Type; New_Item : in Element_Type);

MJH: This operation needs to be removed from this API.

First of all, there's no guarantee that parameter Key matches the key-part
of parameter New_Item. This is a set and so the ultimate arbiter of
position within the set is the key-part of the element itself. In that
case, you might as well just say:

   Replace (Container => S, New_Item => E);

A key-based replace doesn't buy you anything, since the New_Item parameter
already has a key.

But let's suppose someone really wants to use a key-based replace
operation. That means there are two possibilities: either the Key matches
the key-part of New_Item, or it doesn't.

If it matches the key-part of New_Item (and Randy has stated this is the
normal case), then passing the key as a separate parameter is entirely
redundant.

If the key doesn't match, then you must search for the node that matches
Key, remove it from the set, assign the value of New_Item to the element
on that node, and then reinsert that node. But of course that reinsertion
might fail (since the key-part of New_Item might already be in the set),
and if so P_E is raised.

If you really want to change the key-part of an element, then you can do
that using the existing key-based Find and cursor-based Replace_Element
operations:

   Replace_Element (S, Keys.Find (S, K), By => E);

However, even that's kind of dubious. The only thing Replace_Element
really saves is that the node doesn't have to be deallocated and then
reallocated. You might as well just say:

   Delete (S, K);
   Insert (S, E);

The bottom line is that Generic_Keys.Replace must be removed from this
API. It provides no new functionality, and only adds unnecessary clutter.
ENDMJH.

...
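[Editor's note: a hedged sketch of the usage pattern behind the argument above: the element type carries its own key part, so the element-based Replace already identifies the node to be replaced. The record type, ordering function, object, and instantiation names below are invented; the Replace profile is the one shown above:]

   type Employee is record
      Id     : Positive;   -- the "key part" of the element
      Salary : Natural;
   end record;

   function "<" (Left, Right : Employee) return Boolean is
   begin
      return Left.Id < Right.Id;  -- ordering (and hence equivalence) depends only on the key part
   end "<";

   package Employee_Sets is new Ada.Containers.Ordered_Sets (Employee);

   S : Employee_Sets.Set;

   ...

   --  The element itself already says which node is meant, so no separate
   --  Key parameter is needed:
   Employee_Sets.Replace (S, New_Item => (Id => 42, Salary => 50_000));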
procedure Checked_Update_Element (Container : in out Set; Position : in Cursor; Process : not null access procedure (Element : in out Element_Type)); MJH: This operation should be named just "Update_Element", not "Checked_Update_Element". There's no need to say that an operation is "checked," since that's implied. All operations are checked in some way or another. It also doesn't need to be named "Checked...", since unlike the Update_Element for other containers, this set operation has an extra parameter for the container. Checking is implied by the extra parameter. Yet another clue that this Update_Element for sets is special is that it is declared in a special place, inside Generic_Keys. So saying that that this operation is "checked" doesn't tell you anything that you don't already know. It just adds unnecessary syntactic overhead. Another issue is that it's inconsistent with Replace_Element. That operation carries an extra container parameter too, yet it's not named Checked_Replace_Element (or Checked_Replace). If it's obvious that Replace_Element is checked, then it should also be obvious that Update_Element is checked. Please change the name of this operation to "Update_Element". ENDMJH. end Generic_Keys; private ... -- not specified by the language end Ada.Containers.Ordered_Sets; ... procedure Iterate (Container : in Set; Process : not null access procedure (Position : in Cursor)); Invokes Process.all with a cursor that designates each element in Container. Program_Error is propagated if: * Process.all attempts to insert or delete elements from Container; or * Process.all finalizes Container; or * Process.all calls Move with Container as a parameter. AARM Note: This check takes place when the operations that insert or delete elements, etc. are called. See Iterate for vectors for a suggested implementation of the check. End AARM Notes. MJH: See my previous comments. There should be no requirement for a check to determine whether a set has been modified during passive iteration. As usual, if checks are desired then they should be enabled by an assertion. The ordered set is interesting because (at least in the reference implementation) it uses both looping and recursion to implement passive iteration: procedure Generic_Iteration (Tree : in Tree_Type) is procedure Iterate (P : Node_Access) is X : Node_Access := P; begin while X /= Null_Node loop Iterate (Left (X)); Process (X); X := Right (X); end loop; end Iterate; begin Iterate (Tree.Root); end Generic_Iteration; Again, it really doesn't matter much if the tree is modified, since eventually the iteration terminates when it reaches the bottom of the tree. 
However, if tree modification is a concern then you can insert some assertion checks (as many as the vendor feels is necessary): procedure Iterate (P : Node_Access) is X : Node_Access := P; begin while X /= Null_Node loop pragma Assert (Is_Valid (Tree, X)); Iterate (Left (X)); pragma Assert (Is_Valid (Tree, X)); Process (X); pragma Assert (Is_Valid (Tree, X)); X := Right (X); end loop; end Iterate; The Is_Valid function would be defined something like this: function Is_Valid (Tree : Tree_Type; Node : Node_Access) return Boolean is begin if Tree.Length = 0 then return False; end if; if Tree.Root = Null_Node then return False; end if; if Tree.First = Null_Node then return False; end if; if Tree.Last = Null_Node then return False; end if; if Parent (Tree.Root) /= Null_Node then return False; end if; if Tree.Length > 1 then null; elsif Tree.First /= Tree.Last then return False; elsif Tree.First /= Tree.Root then return False; end if; if Left (Node) = Null_Node then null; elsif Parent (Left (Node)) /= Node then return False; end if; if Right (Node) = Null_Node then null; elsif Parent (Right (Node)) /= Node then return False; end if; if Parent (Node) = Null_Node then if Tree.Root /= Node then return False; end if; elsif Left (Parent (Node)) = Node then if Right (Parent (Node)) = Node then return False; end if; elsif Right (Parent (Node)) /= Node then return False; end if; return True; end Is_Valid; You get the idea. You can use this same technique for all the containers, to check the validity of the cursor passed to any cursor-based operation. ENDMJH. **************************************************************** From: Randy Brukardt Sent: Monday, October 3, 2004 xx:xx PM Here is a listing of the AI-302 updates that I made beyond those discussed at the meeting. These mostly have come up in more recent e-mail discussions. 3) Replace and Exclude operations matching the ones in the Ordered_Sets were added to the Generic_Keys generic, as it is odd that Delete was in there and not the others. MJH: See my previous comments that the key-based Replace operation should not be part of this API. ENDMJH. (The intent is that this package closely match Hashed_Maps [and Ordered_Maps, if it ever is defined] - as many operations on keys in Hashed_Maps should be represented here as possible.) MJH: The intent of Sets.Generic_Keys is *not* to make a set look like a map (we have maps for that), but rather to allow key-based manipulation of a set. It is always the case that the identity of the container as a *set* is preserved, even when using the operations in Generic_Keys. The nested package is there to allow users to take advantage of composite element types which have a distinctive key component. ENDMJH. Insert and Include were omitted, as there could be no guarentee that the Key passed in matches the one in the Element passed in. (We could check, of course, but that seems like going too far; moreover, it's hard to imagine how these could be used.) MJH: But that's true of Replace as well. ENDMJH. Replace simply doesn't worry about it; it is defined in terms of Replace (see below), replacing the element referred to by the Key. Thus it works similarly to Checked_Update_Element. MJH: See my previous analysis. There is no reason to have a key-based replace operation, since the addition of a cursor-based replace operation obviates its need. ENDMJH. Replace (Container, Cursor, New_Item) also has been added to the Set itself, as there is not a Replace_Element for a set. 
This tries to replace in place, but will do an insert/delete if necessary. MJH: This operation should be named "Replace_Element", not "Replace". ENDMJH. 5) The index forms of Element, Replace_Element, Query_Element, and Update_Element took Index_Type'Base for some reason, but passing No_Index raises Constraint_Error. So I changed these to Index_Type, so that the specification doesn't allow No_Index to be passed. MJH: I originally used IT'Base because the implementation must check that the index parameter satisfies the constraint that Index <= Last_Index (V), so passing in a constrained subtype didn't really buy you anything. ENDMJH. 6) Added an erroneous case for abuse of the Process procedure of Query_Element and Update_Element. This usually looks like: Execution also is erroneous if the called Process procedure of a call to Query_Element or Update_Element executes an operation that causes the Position cursor of Query_Element or Update_Element to become invalid. For lists, maps, and sets, the only problem occurs if the element is deleted directly, or if the container is finalized (via Unchecked_Deallocation). Insertions and other Deletions don't matter, as the nodes are logically separate. For vectors, the rule also includes ambiguous cursors. An insert or delete to the left of the cursor will move the elements; if the element is passed by reference, that will clobber the element being operated on with unknown effects. We don't want to require that optimization is off in Process subprograms! The vector version also requires wording to cover the index version of the routines. I'd like to suggest that we consider adding a check that the element being processed is not deleted by the Process procedure. This check requires only a bit per node (or a short list of elements in process), and covers all of the new dangerous cases for most of the containers. MJH: See my previous analysis. This suggested implementation won't work in the presence of multiple reader taaks, which a container must support. We already have a rule that says simultaneous reading from and writing to a container has undefined behavior. This rule applies whether it is one task or multiple tasks doing the simultaneous reading and writing. Hence there should be no requirement to perform any such checks. ENDMJH. (Bad use of Unchecked_Deallocation is hardly new to the containers, and Move will not actually cause problems in practice, as the nodes are not changed, just the container that they belong to.) Deleting yourself requires contortions (the Process routine does not have a cursor to use for this operation), and, since it damages the element parameter, the effects could be widespread. The check also would prevent calling Update_Element on the same element again, which would have different results depending on the parameter passing mode (and which makes the check cheaper). The overhead of the check would only apply to the various Deletes and Update_Element; no other routines would need to check. The text would be: If the Process procedure deletes the element designated by Cursor, or calls Update_Element on Cursor, Program_Error is raised. AARM Note: This check has to be done in the code for Delete and Update_Element, of course. Making vector Update_Element safe would also require checking for any operations that would make the cursor ambigious. (That's a bounded error in other cases.) MJH: See my previous comments. The checks described above are not implementable. ENDMJH. 
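[Editor's note: one possible reading of the "bit per node" check Randy describes above, sketched for the list container. The node and cursor representations are implementation details and not part of the proposal, and, as Matt argues, a flag like this only works when a single task is involved:]

   type Node_Type;
   type Node_Access is access Node_Type;

   type Node_Type is record
      Element : aliased Element_Type;
      Next    : Node_Access;
      Prev    : Node_Access;
      Busy    : Boolean := False;  -- set while Update_Element's Process is running
   end record;

   procedure Update_Element
     (Position : in Cursor;
      Process  : not null access procedure (Element : in out Element_Type))
   is
      N : constant Node_Access := Position.Node;
   begin
      N.Busy := True;
      begin
         Process (N.Element);
      exception
         when others =>
            N.Busy := False;
            raise;
      end;
      N.Busy := False;
   end Update_Element;

   --  Delete (and Update_Element itself) would then start with:
   --     if Position.Node.Busy then
   --        raise Program_Error;
   --     end if;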
8) Delete for cursors does nothing if the cursor is No_Element for Lists, Maps, and Sets. (Matt says this was intended to model the effect of Unchecked_Deallocation.) Delete for cursors in Vectors, on the other hand, raised Constraint_Error in this case. I changed the wording for Delete for cursors in Vectors to be consistent with the other three.

MJH: The reasons are historical. The index-based form of delete came first, and then (as now), specifying an index value outside of the active range of elements raised Constraint_Error. (The model is that a vector is roughly the same as an array that can expand or contract.) When we added the cursor-based operations in Phoenix, I probably defined the semantics of the cursor-based delete operation for vector to match the semantics of the index-based delete operation, rather than matching the semantics of the cursor-based delete for other containers. Note that I have a comment above (in the vectors section) requesting the ARG to confirm the semantics of the index-based delete. (That operation raises C_E if the index is outside of the active range of elements.) ENDMJH.

10) Added an AARM note to the effect that when we say "unspecified" in this clause (A.17), we don't mean "erroneous". If we meant "erroneous", we said that. And included some ramifications of that (checking must not be suppressed; don't create dangling pointers by assuming behavior of generic formals).

MJH: Maybe we should say that modifying a container during passive iteration, or during Update_Element, etc., has "unspecified" behavior? I am not in favor of requiring any sort of check (especially since the suggested implementation won't work if there are multiple reader tasks). ENDMJH.

15) Wording was added to Iterate for each container to say that Program_Error is raised if the Process routine calls an operation that will modify or reorder the container. Each container needs slightly different wording for various reasons (nodes can be reordered in Lists; rehashing in a Map would change the order).

MJH: This requirement should be removed. There is no way to make such a check that works when there are multiple reader tasks. This requirement is redundant anyway, since we already have a meta-rule that says behavior is unspecified if a container is simultaneously read from and written to. This rule applies irrespective of the number of tasks (including the case of the same task). ENDMJH.

This decision grew out of a discussion between Matt and me as to what exactly the passive iterator should allow. We both agreed that trying to implement a passive iterator that could stand insertions and deletions of elements was hard.

MJH: It varies by container. My comments above highlight some of the differences among the containers. The problem case for a list, for example, is deleting the current node; other than that, pretty much anything goes. At best, container modification during passive iteration is non-portable. This API should not require that container modification during passive iteration be allowed, but it should not require that modification during passive iteration be prevented, either. In particular, there should be no requirement to perform any check during passive iteration to detect modification. ENDMJH.

Moreover, if the user needs to do that, they can use an active iterator (that is, a loop with explicit cursors) to do so. So, we agreed that inserting or deleting elements from within a passive iterator was bad, and there is no need or intent to support it.
MJH: Whether or not it's "bad" depends on the implementation. If you delete the current node during passive iteration over a list, for example, then very likely the implementation will read some deallocated memory to get the pointer to the next node. That would clearly be bad, unless the implementation stores deleted nodes in a cache, or arranges for deallocated memory to retain its state (say, by using a special storage pool), in which case there would be no problem. On the other hand, if nodes other than the current node are deleted, then there isn't any problem. (But again, it depends on the implementation.) So no, this API should not require implementations to support container modification during passive iteration, but that's a far different thing from requiring that an implementation prevent such modification. There is a meta-rule that says the container must support manipulation by multiple reader tasks. There is no way (that I know of, at least) to check that a container isn't modified during passive iteration in a way that doesn't violate this meta-rule. That's why there should be no requirement for such a check, since it's impossible to implement. ENDMJH.

The main undecided issue is what to do if the user does indeed make a mistake and insert or delete an element from the container during a passive iterator. There seem to be 4 possibilities:

1) Specified results (it works in some specified way);

MJH: For the vector and list, you could probably do that without too much implementation burden. (Well, I take that back. Randy and Pascal are implementing their vectors using a two-tier structure and a skip list, respectively, so I can't say what that burden would entail.) In any event, it's probably not worth the bother of specifying allowed modifications during passive iteration, since the user can just manually use a cursor and an explicit loop. ENDMJH.

2) Unspecified results (it works, but what it does isn't specified);

MJH: I'm not sure we can even guarantee that "it works," since this might overly constrain an implementor (by requiring that he cache nodes, for example). ENDMJH.

3) Erroneous (anything goes);

MJH: If a node gets deallocated, and you refer to that node in order to navigate to some other node, then you have a dangling reference, so I assume that falls under the heading of "erroneous." ENDMJH.

4) Check for bad cases and raise an exception.

MJH: This won't work if there are multiple reader tasks. ENDMJH.

(1) is clearly too burdensome on the implementation, and besides, we don't want it. (2) would ensure that the program wouldn't crash, but otherwise the results wouldn't be portable. (3) would allow anything; implementers could ignore the possibility. (4) would be the most portable, but there are concerns about overhead.

MJH: It's not even clear to me how (4) could be implemented, since there's no way to perform a check that works in the presence of multiple reader tasks. ENDMJH.

I originally wrote (2) using the wording: "Which cursors are presented to Process is unspecified if..." But that seems to be a burden on implementations for little benefit. I object to (3), because users *will* make this mistake, and likely implementations of the iterators would have very bad effects. If the node that the iterator was holding onto was deleted, it probably would be Unchecked_Deallocated, the memory might be reused, and when the pointers are walked, just about anything could happen.

MJH: Not necessarily.
You yourself have already stated you plan on using a node cache, which means nodes would retain their state. You could arrange to put a newly-deleted node at the end of the queue, such that it retains its state for as long as possible. You could even use a generic formal constant (or some other mechanism) to control how large the cache is. Note that I have already implemented some of the validity checking described in an earlier comment, and I was able to successfully detect a dangling reference without doing anything special. The validity checking would be even more robust were I to use GNAT's special Debug_Pool storage pool. ENDMJH.

(4) seemed to have too much overhead, but once we stopped trying to support any insertion or deletion into the container, the cost became quite reasonable. All the implementation of the check would need is a counter (8 bits probably is enough) in each container. When an Iterate starts, the counter is incremented; when it completes, the counter is decremented. Each of the operations on the list of problem operations checks that the counter is zero, raising Program_Error if the counter is nonzero.

MJH: This won't work in the presence of multiple reader tasks, which a container must support. This API shouldn't be tied to a particular implementation technique anyway. ENDMJH.

(We don't have to worry about tasking issues, as the container object is inside of the Iterate call the entire time. If some other task makes a call during that time, we have bad use of shared variables, and we don't care what happens. In fact, what will happen is that Program_Error will be raised, which is probably a good thing.)

MJH: We certainly do have to worry about tasking issues! It is certainly *not* an error if multiple tasks all call Iterate simultaneously. ENDMJH.

That has very little overhead, because virtually all of the operations in question allocate or deallocate memory, and thus are expensive anyway; an additional compare and branch will have no visible impact on performance. (Sorting and Merging are also expensive; Swap_Links and Splice are the only exceptions.) Operations that don't modify the container don't need to make any check.

MJH: This technique doesn't work if there are multiple reader tasks. ENDMJH.

This has the advantage of making passive iterators completely safe against problems caused by what container operations are invoked in Process. (Yes, calling Unchecked_Deallocation on the container could still cause problems, but that is covered by other rules of the language -- and even it would raise Program_Error.) It also means that uses of passive iterators are safely portable (whereas active iterators could have problems if a dangling cursor was used) -- which gives them a clear advantage.

MJH: Well, if we're going to invoke "other rules of the language," then we should just invoke the rule that says simultaneously reading from and writing to a container is undefined, irrespective of whether this is one task or multiple tasks. ENDMJH.

This check is another one that could be dropped in an "unchecked" container.

MJH: This check doesn't belong in this API. At a minimum the implementation of the check described above doesn't work when there are multiple reader tasks. ENDMJH.

Thus, I've worded this check into all of the passive iterators. The wording enumerates the reasons that a check is needed:

   "if Process attempts to insert or delete elements into Container; or"

"Modifies Container" would be too broad, as it could include replacing the value of an element.
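[Editor's note: a minimal single-task sketch of the per-container counter described above, again with invented names and a fixed Integer element type. Note that the counter has to be reachable through Iterate's in-mode Container parameter, which is done here with one level of indirection; MJH returns to exactly this point (and to the multiple-reader-task objection, which this sketch ignores) later in the thread.]

   package Counted_Lists is
      type List is limited private;

      procedure Append       (Container : in out List; New_Item : in Integer);
      procedure Delete_First (Container : in out List);
      procedure Iterate
        (Container : in List;
         Process   : not null access procedure (Element : in Integer));
   private
      type Node;
      type Node_Access is access Node;
      type Node is record
         Element : Integer;
         Next    : Node_Access;
      end record;

      type Counter_Access is access Natural;

      type List is limited record
         First : Node_Access;
         Last  : Node_Access;
         --  Allocated so that Iterate, whose Container parameter has mode
         --  "in", can still increment and decrement it.
         Busy  : Counter_Access := new Natural'(0);
      end record;
   end Counted_Lists;

   package body Counted_Lists is

      procedure Check_Not_Busy (Container : in List) is
      begin
         if Container.Busy.all > 0 then
            raise Program_Error;  --  structural change attempted during Iterate
         end if;
      end Check_Not_Busy;

      procedure Append (Container : in out List; New_Item : in Integer) is
         N : constant Node_Access :=
               new Node'(Element => New_Item, Next => null);
      begin
         Check_Not_Busy (Container);
         if Container.First = null then
            Container.First := N;
         else
            Container.Last.Next := N;
         end if;
         Container.Last := N;
      end Append;

      procedure Delete_First (Container : in out List) is
      begin
         Check_Not_Busy (Container);
         if Container.First /= null then
            --  The old first node is leaked here; a real body would free
            --  (or cache) it.
            Container.First := Container.First.Next;
            if Container.First = null then
               Container.Last := null;
            end if;
         end if;
      end Delete_First;

      procedure Iterate
        (Container : in List;
         Process   : not null access procedure (Element : in Integer))
      is
         N : Node_Access := Container.First;
      begin
         Container.Busy.all := Container.Busy.all + 1;
         begin
            while N /= null loop
               Process (N.Element);
               N := N.Next;
            end loop;
         exception
            when others =>
               Container.Busy.all := Container.Busy.all - 1;
               raise;
         end;
         Container.Busy.all := Container.Busy.all - 1;
      end Iterate;

   end Counted_Lists;

With this arrangement, a call to Append or Delete_First made from inside Process raises Program_Error, while read-only operations need no check at all.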
MJH: I tend to think of the container and its elements as separate entities. I often use the term "change the cardinality of the container" to emphasize the modification of the container itself. ENDMJH.

We also need to talk about finalization and about calling Move, as the current wording only talks about cursors being passed to operations, not something that happens *during* an operation. Moreover, once we decide to have a check, including that check in the body of Finalize and Move is not difficult.

MJH: I am against mandating any such check. ENDMJH.

****************************************************************

MJH: The following comments apply to Pascal's new set API: AI-20302-07-addendum-set-crlf.txt ENDMJH.

A.17.6 Sets

The language-defined packages Containers.Hashed_Sets and Containers.Ordered_Sets provide private types Set and Cursor, and a set of operations for each type. A hashed set container allows an arbitrary type to be stored in a set. An ordered set container orders its elements per a specified relation.

MJH: Well, technically both kinds of set (indeed, all kinds of containers) allow an "arbitrary type" to be stored, not just the hashed set. ENDMJH.

This section describes the declarations that are common to both kinds of sets. See A.17.7 for a description of the semantics specific to Containers.Hashed_Sets and A.17.8 for a description of the semantics specific to Containers.Ordered_Sets.

The type Set is used to represent sets. The type Set needs finalization (see 7.6). A set contains elements. Set cursors designate elements. There exists an equivalence relation on elements, whose definition is different for hashed sets and ordered sets. A set never contains two or more equivalent elements. The *length* of a set is the number of elements it contains.

Each nonempty set has two particular elements called the *first element* and the *last element* (which may be the same). Each element except for the last element has a *successor element*. If there are no other intervening operations, starting with the first element and repeatedly going to the successor element will visit each element in the set exactly once until the last element is reached. The exact definition of these terms is different for hashed sets and ordered sets.

MJH: But do realize that only the ordered set has a Last_Element selector and a Delete_Last modifier. I'm not sure any discussion of last element is even relevant, since the successor of last is well-defined (the cursor has the value No_Element). ENDMJH.

Empty_Set represents the empty Set object. It has a length of 0. If an object of type Set is not otherwise initialized, it is initialized to the same value as Empty_Set.

No_Element represents a cursor that designates no element. If an object of type Cursor is not otherwise initialized, it is initialized to the same value as No_Element.

   function "=" (Left, Right : Set) return Boolean;

If Left and Right denote the same set object, then the function returns True. If Left and Right have different lengths, then the function returns False. Otherwise, for each element E in Left, the function returns False if an element equivalent to E is not present in Right. If the function has not returned a result after checking all of the elements, it returns True. Any exception raised during evaluation of element equivalence is propagated.

MJH: As I have already mentioned, equality for sets is (er, should be) defined in terms of element equality, not equivalence.
This is true for all containers, so sets shouldn't be any different. ENDMJH.

...

   procedure Replace (Container : in out Set;
                      New_Item  : in     Element_Type);

If Length (Container) equals 0, then Constraint_Error is propagated. Otherwise, Replace checks if an element equivalent to New_Item is already in the set. If a match is found, that element is replaced with New_Item; otherwise, Constraint_Error is propagated.

MJH: I'm not sure why that first sentence is necessary, since the last sentence includes the case of Length(C) = 0. ENDMJH.

...

   procedure Iterate (Container : in Set;
                      Process   : not null access procedure (Position : in Cursor));

Iterate calls Process.all with a cursor that designates each element in Container, starting with the first node and moving the cursor according to the successor relation. Any exception raised by Process.all is propagated. Program_Error is propagated if:
 * Process.all attempts to insert or delete elements from Container; or
 * Process.all finalizes Container; or
 * Process.all calls Move with Container as a parameter.

AARM Note: This check takes place when the operations that insert or delete elements, etc. are called. See Iterate for vectors for a suggested implementation of the check. End AARM Notes.

MJH: I don't know how you expect to implement such a requirement. No, it's not good enough to modify the state of the container to indicate that iteration is in progress, since the container must work in the presence of multiple reader tasks. (Mixing readers and writers is of course a no-no.) Get rid of the requirement for a check, and say that modifying the container (that is, changing its cardinality) during iteration is erroneous (or unspecified, or whatever). We already have a meta-rule that says behavior isn't specified if the container is simultaneously queried and modified. This rule applies even if the reader and writer are the same task (as would be the case of deleting an element from the container while iteration is in progress).

The best way to handle this "problem" is to use assertion checks (or perhaps some kind of preprocessor) that can be controlled by the user. The assertions can use knowledge of the representation of the internal storage node and the characteristics of the storage pool. For example, here's a set of assertions that detect an attempt to delete a node that has already been deleted:

   pragma Assert (Tree.Length > 0);
   pragma Assert (Tree.Root /= Null_Node);
   pragma Assert (Tree.First /= Null_Node);
   pragma Assert (Tree.Last /= Null_Node);
   pragma Assert (Parent (Tree.Root) = Null_Node);
   pragma Assert ((Tree.Length > 1)
                    or else (Tree.First = Tree.Last
                               and then Tree.First = Tree.Root));
   pragma Assert ((Left (Node) = Null_Node)
                    or else (Parent (Left (Node)) = Node));
   pragma Assert ((Right (Node) = Null_Node)
                    or else (Parent (Right (Node)) = Node));
   pragma Assert (((Parent (Node) = Null_Node)
                     and then (Tree.Root = Node))
                    or else ((Parent (Node) /= Null_Node)
                               and then ((Left (Parent (Node)) = Node)
                                           or else (Right (Parent (Node)) = Node))));

See Delete_Node_Sans_Free in a-crbtgo.adb. Similar checks immediately following the call to Process by Iterate should be adequate to detect most problems. ENDMJH.

...

   procedure Replace (Container : in out Set;
                      Key       : in     Key_Type;
                      New_Item  : in     Element_Type);

Equivalent to Replace (Container, Find (Container, Key), New_Item).

MJH: This is a useless operation, and it should be removed from this API. I'll include my analysis in my review of the post-Madison final draft. [MJH -- See my earlier comments.]
Note also that the cursor-based Replace operation by which this key-based replace is implemented hasn't been mentioned anywhere above. ENDMJH.

...

A.17.7 The Package Containers.Hashed_Sets

Static Semantics

The package Containers.Hashed_Sets has the following declaration:

   generic
      type Element_Type is private;
      with function Hash (Element : Element_Type) return Hash_Type;
      with function Equivalent_Elements (Left, Right : Element_Type) return Boolean;

MJH: We have already discussed the fact that you need to pass elem_t "=" too, in order to implement set container "=". We have also discussed the fact that I think the equivalence function should be named Equivalent_Keys, not Equivalent_Elements. ENDMJH.

...

   function Equivalent_Elements (Left, Right : Cursor) return Boolean;
   function Equivalent_Elements (Left : Cursor; Right : Element_Type) return Boolean;
   function Equivalent_Elements (Left : Element_Type; Right : Cursor) return Boolean;

MJH: See my comment above. These should all be named Equivalent_Keys. The model is that an element is either a composite type with a key-part component; or, the element doesn't have a key-part, meaning that the element is all-key. But in either case there is a key, even if the "key" is the element itself. (Another argument is that a set is basically the same as a map -- the only difference is where the key lives. You should be able to switch between a set and a map without too much pain, and having the set use the name Equivalent_Elements while the map uses Equivalent_Keys is a gratuitous difference.) ENDMJH.

...

   generic
      type Key_Type (<>) is limited private;
      with function Key (Element : in Element_Type) return Key_Type;
      with function Hash (Key : Key_Type) return Hash_Type;
      with function Equivalent (Left : Key_Type; Right : Element_Type) return Boolean;

MJH: Now you have changed this too! It should be named Equivalent_Keys. The whole point of this nested package is to allow key-based set manipulation for (composite) element types that have a key-part. ENDMJH.

   package Generic_Keys is
   ...
      procedure Replace (Container : in out Set;
                         Key       : in     Key_Type;
                         New_Item  : in     Element_Type);

MJH: See my comments above. This operation should be removed from this API. ENDMJH.

   ...
      function Equivalent_Keys (Left : Cursor; Right : Key_Type) return Boolean;
      function Equivalent_Keys (Left : Key_Type; Right : Cursor) return Boolean;

MJH: Here's all the proof you need that the formal operation should be named Equivalent_Keys, too. ENDMJH.

   end Generic_Keys;
private
   ... -- not specified by the language
end Ada.Containers.Hashed_Sets;

...

   procedure Iterate (Container : in Set;
                      Process   : not null access procedure (Position : in Cursor));

In addition to the semantics described in A.17.6, Program_Error is propagated if Process.all calls Reserve_Capacity.

MJH: See my earlier comments about the fact that this requirement is unnecessary and unimplementable. We already have a meta-rule that says the container cannot be read-from and written-to simultaneously. The rule applies whether this is one task or more than one task. In this particular case, you can detect whether a node has been moved (as a result of rehashing) like this:

   for I in Container.Buckets'Length loop
      ...
      Process (Cursor'(Container'UC, Node));
      pragma Assert (Hash (Node.Element) mod Bucket'Len = I);
      ...
   end loop;

ENDMJH.

...

   function Equivalent_Keys (Left : Cursor; Right : Key_Type) return Boolean;

Equivalent to Equivalent_Keys (Key (Left), Right).
   function Equivalent_Keys (Left : Key_Type; Right : Cursor) return Boolean;

Equivalent to Equivalent_Keys (Left, Key (Right)).

MJH: This is inconsistent with the name for the generic formal function. ENDMJH.

...

A.17.8 The Package Containers.Ordered_Sets

Static Semantics

The package Containers.Ordered_Sets has the following declaration:

   generic
   ...
   package Ada.Containers.Ordered_Sets is
   ...
      generic
      ...
      package Generic_Keys is
      ...
         procedure Replace (Container : in out Set;
                            Key       : in     Key_Type;
                            New_Item  : in     Element_Type);

MJH: I have already stated this Replace operation should be removed from this API. ENDMJH.

      ...
      end Generic_Keys;
   private
      ... -- not specified by the language
   end Ada.Containers.Ordered_Sets;

****************************************************************

From: Randy Brukardt
Sent: Wednesday, October 27, 2004 10:58 PM

I'm going to only answer a few of these points, because my opinions are already noted in the AI and the attached e-mail. But several of the things Matt says are completely new and, for the most part, wrong, and they need to be addressed. I'm going to answer this in several smaller messages so that we can make reasonable threads.

... > A.17 Containers > > ... > > Note that the language already includes several requirements that are > important to the use of containers. First, library packages must be > reentrant - multiple tasks can use the packages as long as they operate on > separate containers. Thus, it is only necessary for a user to protect a > container if a single container needs to be used by multiple tasks. > > > MJH: > > We need to be clear here about multithreading issues, since that last > sentence is wrong.

No, the last sentence exactly matches the language of A(3). And this paragraph has been here forever, as has A(3).

> The only problem case is when there are multiple writers, or a single > writer and one or more readers. (The reader and writer can also be the > same task.) > > It is definitely *not* an error for multiple readers to access the same > container all simultaneously.

Yes it is, because it violates A(3).

> In particular, it is perfectly acceptable (in fact, the API is designed > to facilitate this) for multiple tasks to be iterating over a same > container object, using either cursors or the passive iterator.

Maybe, but it violates A(3).

I would be very opposed to trying to repeal A(3) for these packages. There are a number of reasons for that:

1) It would be different from all other Ada-defined packages. That would add to user confusion.

2) It would prevent implementations from doing any sort of modifications on reading. For instance, the common technique of making the most recently referenced elements more accessible in a hash table couldn't be used. Nor could caches or reference counts. We had similar restrictions in some operations in Claw, and it proved to be very constraining.

3) It would require a lot of wording to implement. We'd have to define precisely which operations are reading and which are writing; that would be complex to do.

4) It would make it far more likely for users to try to access containers from multiple tasks and would lead to errors. For instance, your proposed semantics wouldn't allow one task to write a container and another to read it, but it's likely that users would try to do so.

The current rule is quite simple: if you want to use multiple tasks on a single container (or any other entity of a predefined type), wrap it in a protected object.
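[Editor's note: a minimal sketch of the protected-object wrapping Randy describes, written against the Vectors interface essentially as it was later standardized; the instantiation and the particular set of wrapped operations are purely illustrative.]

   with Ada.Containers.Vectors;

   package Shared_Vector is

      package Integer_Vectors is new Ada.Containers.Vectors
        (Index_Type => Positive, Element_Type => Integer);

      --  One protected object guarding one container.  Protected functions
      --  permit concurrent readers; anything that modifies the vector goes
      --  through a protected procedure.
      protected Store is
         procedure Append  (Item : in Integer);
         procedure Replace (Index : in Positive; Item : in Integer);
         function  Element (Index : Positive) return Integer;
         function  Length return Ada.Containers.Count_Type;
      private
         V : Integer_Vectors.Vector;
      end Store;

   end Shared_Vector;

   package body Shared_Vector is

      protected body Store is

         procedure Append (Item : in Integer) is
         begin
            Integer_Vectors.Append (V, Item);
         end Append;

         procedure Replace (Index : in Positive; Item : in Integer) is
         begin
            Integer_Vectors.Replace_Element (V, Index, Item);
         end Replace;

         function Element (Index : Positive) return Integer is
         begin
            return Integer_Vectors.Element (V, Index);
         end Element;

         function Length return Ada.Containers.Count_Type is
         begin
            return Integer_Vectors.Length (V);
         end Length;

      end Store;

   end Shared_Vector;

Note that each operation has to be wrapped explicitly, and that a query which updates internal state (a caching Find, say) would have to be exposed as a protected procedure or entry rather than a protected function -- the very concern Nick Roberts raises next.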
Any other rule is going to be far more complex, both to use and to understand.

We've always expected that a secondary standard would consider task-safe "protected" containers. But the locking needed for that is quite expensive, and it certainly shouldn't be mandated.

****************************************************************

From: Nick Roberts
Sent: Thursday, October 28, 2004 10:45 AM

> The current rule is quite simple: if you want to use multiple tasks on a > single container (or any other entity of a predefined type), wrap it in a > protected object. Any other rule is going to be far more complex, both to > use and to understand.

I agree with Randy on this issue. In particular, calling protected functions to read a wrapped container object should be something that multiple tasks can do in parallel (provided no protected procedures are in execution at the same time). I can find no functions in the latest AI (v1.10) which would themselves be likely to be implemented in such an impure manner that parallel calls could interfere with each other or the integrity of the container's state, but I suspect there is a possibility that the Find (plus Contains and Has_Element) functions /might/ be implemented like this, as well as Floor and Ceiling (they might modify the container's state in an effort to increase the speed of subsequent searches).

How easy would it be to wrap a container in Ada 2005? I guess each operation would have to be explicitly wrapped (and in many cases this might be quite right, too). Could this be done by renaming?

****************************************************************

From: Randy Brukardt
Sent: Wednesday, October 27, 2004 11:58 PM

Another response to Matt's review:

> procedure Iterate > (Container : in List; > Process : not null access procedure (Position : in Cursor)); > > Invokes Process.all with a cursor that designates each node in Container. > Any > exceptions raised during Process are propagated. > Program_Error is propagated if: > * Process.all attempts to insert or delete elements from Container; or > * Process.all calls a routine that reorders the elements of Container > (Swap_Links, Splice, Generic_Sort, or Generic_Merge); or > * Process.all finalizes Container; or > * Process.all calls Move with Container as a parameter. > > AARM Note: > This check takes place when the operations that insert or delete elements, > etc. > are called. There is no check needed if an attempt is made to insert or > delete > nothing (that is, Count = 0). > > See Iterate for vectors for a suggested implementation of the check. > > Swap is not included here, as it only copies elements. > End AARM Notes. > > > MJH: > > This requirement is redundant, since we already have a meta-rule that > says container behavior isn't specified if the container object is > simultaneously read from and written to. This rule applies even if it's > the same task doing the reading and writing (as would be the case when a > container is modified during passive iteration).

There is no such rule that I know of. A(3) of course covers it for multiple tasks, but for a single task, the only rules are the ones I just added to Query_Element, Update_Element, and Iterate.

Moreover, such a rule would require careful definition of exactly which operations are reading and which are writing. (You can't assume that that is "obvious" in the Standard!) There certainly is nothing that complex in the current AI.
(BTW, please refer to AIs by the version numbers when writing public messages, so that future readers can easily find out what is meant -- "post-Madison" won't mean anything to anyone in six months.)

> Note that the suggested implementation of the check doesn't work when > there are multiple reader tasks. A container must support simultaneous > reading by multiple tasks.

A(3) only requires that multiple tasks work for disjoint objects. I'm very opposed to going further (see my other note).

Since virtually everything that Matt has to say is based on the two above fallacies, I'll just reiterate the important points:

The Ada philosophy is that unspecified (resp. erroneous execution and bounded error) behavior is bad. Even if there was a "meta-rule" (whatever that is), it would be a bad thing. We only allow less than completely specified behavior when it is too expensive to check.

If we do allow less than completely specified behavior, we try to specify it as tightly as possible. (That is a bounded error if we can.) We don't rely on "meta-rules" that say anything goes; that's for the C and C++ folks.

In this case, we want to define what the passive iterator does in all cases. That helps code be portable and safe. The only real advantage of the passive iterator is that it is safe -- you'll only get valid cursors from it, and you'll get each one exactly once, and so on.

If we don't protect against deletion of elements, there is no chance for the iterator to be implemented safely (certainly not if you use one node per element, as is likely for most of the containers, like lists and maps). Indeed, it would be hard for it to be implemented without being erroneous in some case, as Matt's own examples illustrate. I find "erroneous" to be completely unacceptable for the passive iterator. It should *never* give the Process routine a dangling pointer or run over random memory. If it can do that, we shouldn't have it at all, because it then offers nothing at all over the active case (it's harder to use, and *less* safe?).

I tried to define this as a bounded error, with it being unspecified what cursors are returned, but Matt convinced me that that was too hard to implement, and that Insert and Delete in a passive iterator is a mistake anyway. Fine; then simply checking that these don't happen is the better option.

In any case, we should err on the side of safe and predictable. The containers have far too much "unspecified" behavior for my taste already. (I'd prefer checks on Query_Element and Update_Element, but these would have some space overhead on every element, which might be too much - which is why I didn't write them in.)

Matt seems to have little interest in portable, predictable behavior. I don't understand that, unless he thinks everyone is going to use his implementations anyway (so it doesn't matter what the standard says). I can say with certainty that that will not be the case; the standard has to provide the predictability.

****************************************************************

From: Nick Roberts
Sent: Thursday, October 28, 2004 10:58 AM

> ... > I tried to define this as a bounded error, with it being unspecified what > cursors are returned, but Matt convinced me that that was too hard to > implement, and that Insert and Delete in a passive iterator is a mistake > anyway. Fine; then simply checking that these don't happen is the better > option.

I agree with this.
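[Editor's note: an illustration of the active-iterator alternative mentioned earlier -- a loop with explicit cursors -- which can delete elements during iteration precisely because the next cursor is captured before each deletion. The list interface is used essentially as it was later standardized; the example itself is purely illustrative.]

   with Ada.Containers.Doubly_Linked_Lists;

   procedure Active_Iteration_Demo is

      package Integer_Lists is new Ada.Containers.Doubly_Linked_Lists (Integer);
      use Integer_Lists;

      L : List;
      C : Cursor;

   begin
      for I in 1 .. 10 loop
         Append (L, I);
      end loop;

      --  Delete the even elements while walking the list.  The cursor for
      --  the next node is saved before Delete is called, so it never
      --  becomes a dangling reference -- the kind of safety the passive
      --  Iterate cannot provide if Process were allowed to delete elements.
      C := First (L);
      while Has_Element (C) loop
         declare
            N : constant Cursor := Next (C);
         begin
            if Element (C) mod 2 = 0 then
               Delete (L, C);  --  C becomes No_Element
            end if;
            C := N;
         end;
      end loop;
   end Active_Iteration_Demo;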
****************************************************************

From: Randy Brukardt
Sent: Thursday, October 28, 2004 12:03 AM

Another response to Matt:

> ... > procedure Replace_Element (Position : in Cursor; > By : in Element_Type); > ... > procedure Replace (Container : in out Map; > Key : in Key_Type; > New_Item : in Element_Type); > > MJH: > > The introduction of a new set operation has alerted me to the fact that > we have two operations similarly named. (See the set spec for more > comments.) > > So far we have named the cursor-based replace operation > "Replace_Element" and its element parameter "By" (this came from > Ada.Strings.*), and named the key-based replace operation "Replace" and > its element parameter "New_Item". > > The ARG should confirm whether this difference in naming of the > cursor-based vs. key-based replace operations is intended.

Yes, it's intentional. Replace_Element only replaces the element (duh!) but Replace replaces both the element and the key. If you called it Replace_Element, it shouldn't replace the key, which would violate the semantics that we agreed on in Madison.

****************************************************************

From: Randy Brukardt
Sent: Thursday, October 28, 2004 12:27 AM

> ... > > procedure Replace (Container : in out Set; > New_Item : in Element_Type); > ... > > procedure Replace (Container : in Set; > Position : in Cursor; > By : in Element_Type); > > MJH: > > This new cursor-based replace operation for sets is named in a manner > inconsistent with the rest of this API. It should be named > Replace_Element. (Note that the element parameter is named By, not > New_Item. The parameter name By is always used as the element parameter > of the cursor-based operation named Replace_Element.) > > ENDMJH.

I think I agree with that; I was trying to draw a parallel with the Maps, but that only applies to the Generic_Keys package, not the root container.

> ... > > generic > ... > package Generic_Keys is > > ... > procedure Replace (Container : in out Set; > Key : in Key_Type; > New_Item : in Element_Type); > > MJH: > > This operation needs to be removed from this API. ... (followed by a lot of irrelevant discussion) > The bottom line is that Generic_Keys.Replace must be removed from this > API. It provides no new functionality, and only adds unnecessary > clutter.

For someone who is always proposing unnecessary operations, this is quite a stretch! I don't understand why you feel so strongly about this operation.

Certainly, this isn't the most useful routine, except in one case: when a project decides to move from a Map to a Set representation. One of the most important criteria for the containers, I think, is that it be (relatively) easy to move between related containers as needs change. Certainly changing from an ordered to a hashed map should be simple. But I think this also applies to changing the location of the key. One of the most valuable uses of containers is in prototypes, and one of the most obvious properties of a prototype is that it is likely to change. It's not unreasonable that a project would find the need to include the key in the element at some point during development. Such a change shouldn't require wholesale rewriting of the application; and that is especially true for those of us who are use-averse and will not use a use-clause. (If a library requires the use of use-clauses, it doesn't belong in the standard. Period.)
Therefore, it should be the case that as many operations as possible from the Map be available in the Set, and in particular in the Generic_Keys (since that is the direct counterpart of the Map). Forcing half the operations to be rewritten or moved is simply not acceptable. Thus, the Replace operation is declared here simply because a similar operation is declared in the Maps. I would have preferred to declare an Insert and Include as well, but these require goofy checks that the key is compatible with the element. Note that I would not expect any of these operations to be used in new (hand-written) code; they'd only be used when a Map is converted into a Set. So the fact that their semantics is a little weird is irrelevant -- they're simply an aid for changing between containers. Matt seems to expect that rewriting a large percentage of the calls when making such a change is acceptable. I don't think that is true, unless you think that moving a key from a separate entity to a component is going to be rare (I certainly don't believe that). Anyway, I want Generic_Keys to be as similar as possible to Maps (and indeed, our style guide will recommend avoiding operations that are not common, like Update_Element - even if it is renamed, it won't have the same parameter list or semantics). That means that there will be a few operations with odd semantics. So be it. **************************************************************** From: Nick Roberts Sent: Thursday, October 28, 2004 11:07 AM Regarding AI95-00302-03/07 (v1.10), a few minor typos by line number: * 816, 823 [function]{procedure|operation} * 2335 [hastle]{hassle} * 3021 [Positiond oes]{Position does} * 3098, 3107, 8420, 8421 [communative]{commutative} I'll keep looking. **************************************************************** From: Randy Brukardt Sent: Tuesday, November 2, 2004 7:04 AM This update (AI-302-3/08) mainly reorganizes the text to add Hashed_Sets and Ordered_Maps to the containers. Hopefully, this will be more consistent than the previous definition. Most of the work on this was done by Pascal Leroy, although I did make some editorial fixes to it. Note that the version to be considered at the next ARG meeting in Atlanta needs to be finished before November 15th, so try to make comments soon. (But no need to repeat existing ones.) This AI is in the usual place at http://www.ada-auth.org/ais.html. **************************************************************** From: Matthew Heaney Sent: Saturday, November 13, 2004 12:31 PM In the interest of fully exploring all the issues at hand, I offer a brief rebuttal to Randy's points below. --MJH > -----Original Message----- > From: Randy Brukardt [mailto:randy@rrsoftware.com] > Sent: Wednesday, October 27, 2004 11:58 PM > To: Ada Comment > Subject: [Ada-Comment] Matt's review of ai-302-3/07 draft (resp. 1) > > > ... > > A.17 Containers > > > > ... > > > Matt said: > > In particular, it is perfectly acceptable (in fact, the API is > > designed to facilitate this) for multiple tasks to be > iterating over a > > same container object, using either cursors or the passive iterator. Randy replied: > Maybe, but it violates A(3). MJH: Does A(3) apply when the calls denote the same container ("overlapping objects")? ENDMJH: > I would be very opposed to trying to repeal A(3) for these > packages. There's a number of reasons for that: > > 1) It would be different than all other Ada-defined > packages. That would add to user confusion. 
MJH: Well, I could argue that what is confusing is the fact that a container can be modified during a read-only call. What exactly does an in-mode parameter even mean? ENDMJH. > 2) It would prevent implementations from doing any sort of > modifications on reading. For instance, the common technique > of making most recently referenced elements to be more > accessible in a hash table couldn't be used. Nor could caches > or reference counts. We had similar restrictions in some > operations in Claw, and it proved to be very constraining. MJH: Fair enough, but realize that that permission will more than likely violate users' expectations, if the container is passed as an in-mode parameter. At a minimum, you're going to have to use some kind of indirection to do this (a la function Random). The problem with the internal counter technique you want to use to keep track of invocations of Iterate is that Iterate passes the container as an in-mode parameter. Either some sort of trickery will be involved, or you'll have to allocate the counter object (yech!). ENDMJH. > 3) It would require a lot of wording to implement. We'd > have to define precisely which operations are reading and > which are writing; that would be complex to do. MJH: But defining those operations is required in any case, since if you want to require that an implementation raise Program_Error when an operation is called during Iterate, you must define which specific operations are required to raise PE. ENDMJH. > 4) It would make it far more likely for users to try to > access containers from multiple tasks and would lead to > errors. For instance, your proposed semantics wouldn't allow > one task to write a container and another to read it, but > it's likely that users would try to do so. MJH: My counter-argument is that changing some state behind the scenes (meaning that there are state changes even when the container is passed as an in-mode parameter) is going to cause you even greater problems, since most users will assume (in spite of A(3)) that since the container is passed as an in-mode parameter, that there are no state changes and therefore it's safe for multiple tasks to call in-mode operations. Passing the container as in-mode and making a state change is like false advertising. ENDMJH. > The current rule is quite simple: if you want to use multiple > tasks on a single container (or any other entity of a > predefined type), wrap it in a protected object. Any other > rule is going to be far more complex, both to use and to understand. MJH: As I argue above, prohibiting simultaneous reads by multiple tasks is already hard to understand. Most developers will assume that multiple readers are allowed, and they will therefore most likely corrupt the internal state that we don't advertise can change. ENDMJH. > We've always expected that a secondary standard would > consider task-safe "protected" containers. But the locking > needed for that is quite expensive, and it certainly > shouldn't be mandated. MJH: My main argument is that if there is "physical" state that can change (even if the "logical" state does not change), then we need to advertise this fact. ENDMJH. **************************************************************** From: Matthew Heaney Sent: Saturday, November 13, 2004 6:39 PM > Another response to Matt's review: > > > procedure Iterate > > (Container : in List; > > Process : not null access procedure (Position : in Cursor)); ... 
> > MJH: > > > > This requirement is redundant, since we already have a > meta-rule that > > says container behavior isn't specified if the container object is > > simultaneously read from and written to. This rule applies even if > > it's the same task doing the reading and writing (as would > be the case > > when a container is modified during passive iteration). > > There is no such rule that I know of. A(3) of course covers > it for multiple tasks, but for a single task, the only rules > are the ones I just added to Query_Element, Update_Element, > and Iterate. > > Moreover, such a rule would require careful definition of > exactly which operations are reading and which are writing. > (You can't assume that that is "obvious" in the Standard!) > There certainly is nothing that complex in the current AI. MJH: My assumption was that it's perfectly clear which operations are reading and which are writing, since that's what the parameter mode indicates. Any time the container is modified, we pass the container as the first parameter, with parameter mode inout. If you do add the check to determine whether passive iteration is in progress, then you're still going to have to specify which operations are supposed to raise PE. For example, is calling function Length allowed during passive iteration? What about First or Last? ENDMJH. > > Note that the suggested implementation of the check doesn't > work when > > there are multiple reader tasks. A container must support > > simultaneous reading by multiple tasks. > > A(3) only requires that multiple tasks work for disjoint > objects. I'm very opposed to going further (see my other note). MJH: That's fine, but how do you intend to implement the counter? The container is passed as an in-mode parameter. An in-mode parameter is telling me that there is no (logical) state change. You want to allow a physical state change, even when the parameter is in-mode. Fair enough, I'm all in favor of maximizing vendor freedom, but only to a point, since otherwise the problem becomes intractable, as you attempt to divine all the myriad things a vendor *might* do, were he allowed to change the (physical) state for any operation. One potential human-engineering issue is that if a user isn't aware that a container can always change state (even when it's passed as an in-mode parameter), he may inadvertently corrupt the container, even if he calls only in-mode operations (using multiple tasks). But that might just mean we'll have to educate him. ENDMJH. > Since virtually everything that Matt has to say is based on > the two above fallacies, I'll just reiterate the important points: > > The Ada philosophy is that unspecified (resp. erroneous > execution and bounded error) behavior is bad. Even if there > was a "meta-rule" (whatever that is), it would be a bad > thing. We only allow less than completely specified behavior > when it is too expensive to check. > > If we do allow less than completely specified behavior, we > try to specify it as tightly as possible. (That is a bounded > error if we can.) We don't rely on "meta-rules" that say > anything goes; that's for the C and C++ folks. > > In this case, we want to define what the passive iterator > does in all cases. That helps code be portable and safe. The > only real advantage of the passive iterator is that it is > safe -- you'll only get valid cursors from it, and you'll get > each one exactly once, and so on. 
MJH: But we do define portable and safe behavior, but only when the user promises not to modify the container while iterating. If he modifies the container while iterating, then he can suffer the consequences. This is just basic economics. ENDMJH. > If we don't protect against deletion of elements, there is no > chance for the iterator to be implemented safely (certainly > not if you use one node per element, as is likely for most of > the containers, like lists and maps). Indeed, it would be > hard for it to be implemented without being erroneous in some > case, as Matt gives examples to describe. I find "erroneous" > to be completely unacceptable for the passive iterator. It > should *never* give the Process routine a dangling pointer or > run over random memory. If it can do that, we shouldn't have > it at all, because it them offers nothing at all over the > active case (it's harder to use, and *less* safe?). > > I tried to define this as a bounded error, with it being > unspecified what cursors are returned, but Matt convinced me > that that was too hard to implement, and that Insert and > Delete in a passive iterator is a mistake anyway. Fine; then > simply checking that these don't happen is the better option. MJH: I'd love it if we could find a way for this to be a bounded error. (Hey, I'd accept erroneous behavior, but I appear to be losing that battle.) ENDMJH. > In any case, we should err on the side of safe and > predictable. The containers have far too much "unspecified" > behavior for my taste already. (I'd prefer checks on > Query_Element and Update_Element, but these would have some > space overhead on every element, which might be too much - > which is why I didn't write them in.) > > Matt seems to have little interest in portable, predicable > behavior. I don't understand that, unless he thinks everyone > is going to use his implementations anyway (so it doesn't > matter what the standard says). I can say with certainty that > that will not be the case; the standard has to provide the > predictability. MJH: Similar to what Dijkstra said, my brain can only tolerate a certain threshold of complexity, so I have to accept my limitations, and find some way to solve hard problems. Yes, I tend to view this specification through the lens of one particular implementation, but that's the only way I know how to make this problem tractable. It is certainly not because I don't care about portability and predictability. Our disagreement might simply reflect our difference in problem solving approaches: I tend to be a bottom-up designer, and you tend to be a top-down designer. Vive la difference... ENDMJH. **************************************************************** From: Matthew Heaney Sent: Saturday, November 13, 2004 7:05 PM > > ... > > > > generic > > ... > > package Generic_Keys is > > > > ... > > procedure Replace (Container : in out Set; > > Key : in Key_Type; > > New_Item : in Element_Type); > > > > MJH: > > > > This operation needs to be removed from this API. > ... > (followed by a lot of irrelevant discussion) > > The bottom line is that Generic_Keys.Replace must be > removed from this > > API. It provides no new functionality, and only adds unnecessary > > clutter. > > For someone who is always proposing unnecessary operations, > this is quite a stretch! I don't understand why you feel so > strongly about this operation. MJH: Because it has goofy semantics, and no sensible person will use it. ENDMJH. 
> Certainly, this isn't the most useful routine, except in one > case: when a project decides to move from a Map to a Set > representation.

MJH: The purpose of Generic_Keys is to allow key-based manipulation of a set, not to make it easy to change from a map to a set. Even if a project does use Generic_Keys to facilitate such a change, they still have work to do: at a minimum they'll have to instantiate Generic_Keys. If they need a key-based Replace too, then they can easily write it themselves as a local procedure. ENDMJH.

> Therefore, it should be the case that as many operations as > possible from the Map be available in the Set, and in > particular in the Generic_Keys (since that is the direct > counterpart of the Map). Forcing half the operations to be > rewritten or moved is simply not acceptable.

MJH: Fine, they can use Generic_Keys, and if they choose, write their own Replace. ENDMJH.

> Thus, the Replace operation is declared here simply because a > similar operation is declared in the Maps. I would have > preferred to declare an Insert and Include as well, but these > require goofy checks that the key is compatible with the > element. Note that I would not expect any of these operations > to be used in new (hand-written) code; they'd only be used > when a Map is converted into a Set. So the fact that their > semantics is a little weird is irrelevant -- they're simply > an aid for changing between containers.

MJH: But this particular aid doesn't need to be provided by us. It can be implemented using other container primitives. ENDMJH.

> Anyway, I want Generic_Keys to be as similar as possible to > Maps (and indeed, our style guide will recommend avoiding > operations that are not common, like Update_Element - even if > it is renamed, it won't have the same parameter list or > semantics). That means that there will be a few operations > with odd semantics. So be it.

MJH: That the key-based Replace operation has "odd semantics" hardly bolsters your case. It needs to go... ENDMJH.

****************************************************************

From: Randy Brukardt
Sent: Saturday, November 13, 2004 11:55 PM

General remark: I wish you'd say something only once. Repeating yourself just bulks up our mail files, and makes me repeat myself, ad nauseam.

Matt wrote:

> > Certainly, this isn't the most useful routine, except in one > > case: when a project decides to move from a Map to a Set > > representation. > > MJH: > > The purpose of Generic_Keys is to allow key-based manipulation of a set, not > to make it easy to change from a map to a set. Even if a project does use > Generic_Keys to facilitate such a change, they still have work to do: at a > minimum they'll have to instantiate Generic_Keys.

Of course. And change the names of the types. But that's all that they should have to do.

> If they need a key-based > Replace too, then they can easily write it themselves as a local > procedure.

That's not much of an argument, for two reasons: (a) it applies to virtually everything in Generic_Keys, and (b) such a subprogram would be in the wrong package, and you'd still have to change every call. (At least those of us who never use use clauses would.)

... > > Thus, the Replace operation is declared here simply because a > > similar operation is declared in the Maps. I would have > > preferred to declare an Insert and Include as well, but these > > require goofy checks that the key is compatible with the > > element.
Note that I would not expect any of these operations > > to be used in new (hand-written) code; they'd only be used > > when a Map is converted into a Set. So the fact that their > > semantics is a little weird is irrelevant -- they're simply > > an aid for changing between containers. > > But this particular aid doesn't need to be provided by us. It can be > implemented using other container primitives.

Sure, as can virtually everything else in Generic_Keys. That argument could be used to suggest that most of the operations in Generic_Keys be dropped. Certainly the Delete, Exclude, and Equivalent_Keys operations should be dropped: they're easy to write yourself and hardly ever would be used. Indeed, the only operations that would be a problem to write yourself are Checked_Update_Element and Find (both for performance reasons). The rest of them are unnecessary (if you take the view you have above).

Let me say it again. I want to make moving between a Map and Set as easy as possible. Rewriting nearly every call doesn't work! If we can't have that (and your view holds sway), then let's get rid of all of the junk operations that exist mainly to make it easier to port (or are just junk operations): the ones named above, "Swap" in lists, "Equivalent_Elements" in Hashed_Sets, "Equivalent_Keys" everywhere, ordering operators in ordered containers, etc. And let's name all of the operations that have different semantics/profiles differently.

****************************************************************

From: Randy Brukardt
Sent: Sunday, November 14, 2004 12:31 AM

Matt said:

> Matt said: > > > > In particular, it is perfectly acceptable (in fact, the API is > > > designed to facilitate this) for multiple tasks to be > > iterating over a > > > same container object, using either cursors or the passive iterator. > > > Randy replied: > > > Maybe, but it violates A(3). > > Does A(3) apply when the calls denote the same container ("overlapping > objects")?

Yes, of course. Note that the parameter mode isn't mentioned in A(3): it *always* applies to by-reference parameters.

... > > The current rule is quite simple: if you want to use multiple > > tasks on a single container (or any other entity of a > > predefined type), wrap it in a protected object. Any other > > rule is going to be far more complex, both to use and to understand. > > As I argue above, prohibiting simultaneous reads by multiple tasks is > already hard to understand. Most developers will assume that multiple > readers are allowed, and they will therefore most likely corrupt the > internal state that we don't advertise can change.

I would presume that these "most developers" aren't Ada programmers used to using tasking, because experienced Ada programmers know about avoiding multiple access to the same object. Most of them stumble over it the first time they write a tasking program (using Text_IO), and ought not to be surprised afterwards.

> > We've always expected that a secondary standard would > > consider task-safe "protected" containers. But the locking > > needed for that is quite expensive, and it certainly > > shouldn't be mandated. > > My main argument is that if there is "physical" state that can change (even > if the "logical" state does not change), then we need to advertise this fact.

I'd have no problem making the container parameters to Iterate, Query_Elements, and Update_Elements in out. (Indeed, our style guide here says to do that always for primitive operations of extensible types, in order to not fence in extensions.
It's unfortunate that that cannot be done for functions...)

But I disagree with your fundamental point. An 'in' parameter of a composite type is really advertising that the logical properties of the type don't change. The programmer should never be concerned about what actually happens behind the scenes of the implementation. Writing code that depends on the implementation not modifying something is simply wrong, IMHO.

After all, 'in' parameters that aren't fully constant have a long history in Ada. Text_IO doesn't advertise physical (or even logical) state changes. Claw certainly does not. Ada.Numerics.Float_Random certainly has state changes in its generator parameters. Even Ada.Strings.Unbounded could change its 'in' parameters (I don't think any of the vendor implementations actually do that, but I've seen a couple of replacement packages that do.) The Rosen trick is widely used.

So you shouldn't put too much meaning into 'in' parameters. I'd certainly not trust any tasking behavior unless the parameter was by-copy or the routine was explicitly documented to be task-safe. And predefined Ada packages are not required to be task-safe in this way.

****************************************************************

From: Matthew Heaney
Sent: Tuesday, November 16, 2004 12:34 AM

!standard A.17                                 04-11-01  AI95-00302-03/08
!class amendment 04-01-14
!status work item 04-01-14
!status received 04-01-14
!priority Medium
!difficulty Hard
!subject Container library

My review of the AI95-00302-03/08 (pre-Atlanta) draft follows. For issues that I have already discussed in previous reviews, I have only made a brief comment. Each comment is bracketed with "MJH:" and "ENDMJH." pairs, and immediately follows the text to which it refers.

-Matt

!proposal

The hashed map associative containers scatter keys in the container according to a hash function. The size of the hash table is automatically increased when the length of the container equals its capacity (which specifies the length before which it is guaranteed that no automatic hash table resizing occurs), thus preserving constant time complexity for insertions.

MJH: We might want to generalize this paragraph, since we now have a hashed set, too. ENDMJH.

The set associative container maintains elements in sort order. Insertions and searches have O(log N) time complexity even in the worst case.

MJH: Similar to my previous comment, we might want to generalize this, since we now have ordered maps, too. ENDMJH.

All of the containers have alternate forms that accept an element type that is indefinite. The indefinite hashed maps also accept an indefinite key type, allowing (for example) type String to be used as the generic actual key type.

MJH: We can probably just say "indefinite maps" here. ENDMJH.

A.17 Containers

The following major non-limited containers are provided:
 * (Expandable) Vectors of any non-limited type;
 * Doubly-linked Lists of any non-limited type;
 * Hashed Maps keyed by any non-limited type containing any non-limited type;
 * Ordered Sets of any non-limited type.

MJH: As above, we should generalize what is said about maps and sets, since each of those containers now has both hashed and ordered forms. ENDMJH.

Note that the language already includes several requirements that are important to the use of containers. First, library packages must be reentrant - multiple tasks can use the packages as long as they operate on separate containers. Thus, it is only necessary for a user to protect a container if a single container needs to be used by multiple tasks.
MJH: I have already stated my objection to this feature (an objection perhaps based on a misunderstanding of A(3)). ENDMJH.

A.17.2 The Package Containers.Vectors

package Ada.Containers.Vectors is
...
   type Vector is tagged private;

MJH: As has been discussed on Ada-Comment, I have mixed feelings about making the container types publicly tagged. My tentative opinion is that (as the language stands now) we should get rid of the tag.

One justification for making the container types publicly tagged is that they're already privately tagged, since you need to derive from type Controlled to make memory management work, so you might as well expose the tag. However, it turns out that it is not necessary for the container type itself to derive from Controlled, since the full view of the type can be implemented like this:

   type Control_Type is new Controlled with record
      Elements : Element_Array_Access;
      Last     : Extended_Index := No_Index;
   end record;

   type Vector is record
      Control : Control_Type;
   end record;

This implementation has the benefit that, since the container type isn't tagged, there's no vtable, which I assume means that there's less run-time baggage.

Another reason for making the type publicly tagged is that you can extend the type, and use class-wide types for polymorphic programming. However, in practice using container types this way would probably be extremely rare. Anything you'd be tempted to do to a container using dynamic mechanisms you can do more easily via static mechanisms such as cursors or iterators.

I had given another reason for public taggedness during the Palma meeting, and that was that tagged types are implicitly aliased. But needing this functionality is probably rare, since this API has been designed (as much as possible) to give you unhindered access to elements. Assuming we state that the container type is a by-reference type, then in the rare case that aliasing of a container parameter were necessary, the developer could use 'Address and Address_To_Access_Conversions to obtain a pointer to the container object. (Or perhaps the language has been changed in this area, to add something like the 'Unrestricted_Access attribute in GNAT.)

I bring all this up because I and at least one other developer have run into the scenario of wanting to implement the full view of a private type as a private derivation from Vector, like this:

   generic
      type ET is private;
   package P is
      type T is private;
      procedure Op (O : T);
   private
      package ET_Vectors is new A.C.Vectors (Positive, ET);

      type T is new ET_Vectors.Vector with null record;
      --lots of tedious overridings are req'd here
   end P;

The problem is that several of the primitive operations return type Vector, which means they must be overridden. This is too painful, so in practice no one will bother, and in the example above you'll probably have to implement type T as a record with a vector component. But this feels unnatural. If the container type weren't tagged, no extension or overridings would be necessary.

Another common technique is to derive from a type in order to bring its operations into local scope.

In general the design of this API has been guided by the philosophy that you should design around the common case. Declaring one type as a derivation of another type is a nearly ubiquitous idiom in Ada. Since type derivation is common, and polymorphic container programming is rare, we should design around the derivation case.
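To make the cost concrete, here is a minimal sketch of the composition fallback just described, written against the vector interface as it was eventually standardized (the package name and the instantiation are purely illustrative, not part of the AI). No overridings are needed, but every vector operation that clients of T require has to be re-exported by hand:

   with Ada.Containers.Vectors;

   generic
      type ET is private;
   package Composition_Sketch is
      type T is private;
   private
      package ET_Vectors is new Ada.Containers.Vectors
        (Index_Type => Positive, Element_Type => ET);

      type T is record
         --  Composition instead of derivation: no functions returning the
         --  container type need to be overridden, at the price of writing
         --  forwarding subprograms for each operation clients need.
         Items : ET_Vectors.Vector;
      end record;
   end Composition_Sketch;

Either way, the extra boilerplate falls on the user; an untagged container type would avoid it entirely.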
Designing around the derivation case probably means getting rid of the public tag, unless the language is modified somehow to allow null type extensions to inherit a default version of functions that return the type. ENDMJH.

...

procedure Iterate (Container : in Vector;
                   Process   : not null access procedure (Position : in Cursor));

Invokes Process.all with a cursor that designates each element in Container, in index order. Any exception raised by Process is propagated. Program_Error is propagated if:

 * Process.all attempts to insert or delete elements from Container; or
 * Process.all finalizes Container; or
 * Process.all calls Move with Container as a parameter.

AARM Note: This check takes place when the operations that insert or delete elements, etc. are called. There is no check needed if an attempt is made to insert or delete nothing (that is, Count = 0 or Length(Item) = 0). The check is easy to implement: each container needs a counter. The counter is incremented when Iterate is called, and decremented when Iterate completes. If the counter is nonzero when an operation that inserts or deletes is called, Finalize is called, or one of the other operations in the list occurs, Program_Error is raised. Swap and Generic_Sort are not included here, as they only copy elements. End AARM Notes.

MJH: I have already stated my objection to this requirement. Basically, we already have rules to handle cursors that are "invalid", so I don't see why those rules shouldn't apply to Iterate too. ENDMJH.

A.17.7 Sets

function "=" (Left, Right : Set) return Boolean;

If Left and Right denote the same set object, then the function returns True. If Left and Right have different lengths, then the function returns False. Otherwise, for each element E in Left, the function returns False if an element equivalent to E is not present in Right. If the function has not returned a result after checking all of the elements, it returns True. Any exception raised during evaluation of element equivalence is propagated.

MJH: Do you want to move this into each subsection separately? Ordered sets and hashed sets implement set equality a little differently:

For ordered sets, the elements are in order, so you only need to compare each element sequentially using element "=".

For hashed sets, given an element of left, you compute the index of the bucket in right that corresponds to the left element's hash value, and then you compare the left element to each element in that bucket using element "=".

I bring this up because the language above about returning False if there is no "equivalent" element is at best ambiguous (and at worst wrong), since for either container Equivalent_Elements isn't guaranteed to return the same result as "=" for elements. This is especially the case since Tucker wants to add an Equivalent function for sets, that is implemented in terms of element equivalence instead of element equality. ENDMJH.

procedure Replace_Element (Container : in Set;
                           Position  : in Cursor;
                           By        : in Element_Type);

If Position equals No_Element, then Constraint_Error is propagated. If Position does not designate an element in Container, then Program_Error is propagated. Otherwise, the element designated by Position is tested for equivalence to By; if equivalent, Replace assigns By to the element designated by Position. Otherwise, the element designated by Position is removed from the container, then By is inserted into the container. If the insertion fails, Program_Error is propagated.
MJH: If we decide to keep this operation, then it needs to pass the container as an in out parameter, not in-mode. ENDMJH.

procedure Replace (Container : in out Set;
                   Key       : in     Key_Type;
                   New_Item  : in     Element_Type);

Equivalent to Replace (Container, Find (Container, Key), New_Item).

MJH: I think you meant to say that it's equivalent to Replace_Element. I have stated elsewhere that I don't agree that Generic_Keys.Replace should be included in this API. ENDMJH.

procedure Checked_Update_Element (Container : in out Set;
                                  Position  : in     Cursor;
                                  Process   : not null access procedure (Element : in out Element_Type));

MJH: I have stated elsewhere that I think this operation should be named just Update_Element, not Checked_Update_Element. ENDMJH.

A.17.8 The Package Containers.Hashed_Sets

Static Semantics

The package Containers.Hashed_Sets has the following declaration:

generic
   type Element_Type is private;

   with function Hash (Element : Element_Type) return Hash_Type;

   with function Equivalent_Elements (Left, Right : Element_Type) return Boolean;

MJH: I have stated elsewhere that I think this formal operation should be named Equivalent_Keys, not Equivalent_Elements. Note that Randy's argument for including the operation Generic_Keys.Replace was to facilitate changing from a map to a set. That goal is undermined if you use a different name from Equivalent_Keys.

This generic formal part also needs to pass element "=", since that's how set "=" is implemented. See my note above. ENDMJH.

Function Equivalent_Elements is expected to return the same value each time it is called with a particular pair of element values. For any two elements E1 and E2, the boolean values Equivalent_Keys (E1, E2) and Equivalent_Key (E2, E1) are expected to be equal. If Equivalent_Elements behaves in some other manner, the behavior of this package is unspecified. Which subprograms of this package call Equivalent_Elements, and how many times they call it, is unspecified.

MJH: In the middle sentence you refer to "Equivalent_Keys". Is this a typo? (But see my comments above.) ENDMJH.

A.17.9 The Package Containers.Ordered_Sets

Static Semantics

The package Containers.Ordered_Sets has the following declaration:

generic
   type Element_Type is private;

   with function "<" (Left, Right : Element_Type) return Boolean is <>;
   with function "=" (Left, Right : Element_Type) return Boolean is <>;

package Ada.Containers.Ordered_Sets is
   pragma Preelaborate (Ordered_Sets);

   type Set is tagged private;
   type Cursor is private;

   Empty_Set  : constant Set;
   No_Element : constant Cursor;

   function "=" (Left, Right : Set) return Boolean;

MJH: Tucker wants an Equivalent function for sets, implemented in terms of element equivalence. (This is different from set equality, which is implemented in terms of element equality.) ENDMJH.

****************************************************************

From: Matthew Heaney
Sent: Sunday, April 3, 2005 10:30 PM

Georg Bauhaus forwarded this comment to me. Several lines of text say:

   Program_Error is propagated unless Before is equal to No_Element or
   designated an element in Target.

He was asking about the mixed use of tense. It looks like it should say "designates an element...".

****************************************************************

From: Matthew Heaney
Sent: Wednesday, March 23, 2005 11:14 PM

I've been reading the API described here:

The first line of that file says:

05-02-02 AI95-00302-03/10

I don't know if this is the most up-to-date API, but I wanted to get this in the queue anyway.
Each comment is bracketed by "MJH:" and "ENDMJH." pairs, and immediately follows the API text to which it refers.

A little after the declaration of the spec of the vector in A.18.2, the API says:

 * it inserts or deletes elements of V, that is, it calls the Insert, Insert_Space, Clear, Delete, Delete_Last, or Set_Length procedures with V as a parameter; or

MJH: Do you want to add Delete_First to this list? ENDMJH.

And a little after that, the API says:

Some operations are assumed to not change elements. For such an operation, a subprogram is said to *tamper with elements* of a vector object V if:

 * it tampers with cursors of V; or
 * it modifies one or more elements of V, that is, it calls the Replace_Element, Update_Element, or Swap procedures or an instance of Generic_Sort with V as a parameter.

AARM Note:
Swap and Generic_Sort copy elements rather than reordering them, so they can be allowed for Iterate.

MJH: I don't see then how you can implement a tamper-resistant vector using a single internal counter (see the AARM note that follows the description of the Iterate operation). You're saying that calling Generic_Sort is allowed during Iterate, but not allowed during (say) Query_Element. So how can you use a single counter to handle both cases? (But maybe there is a way that I just haven't figured out.)

Perhaps it is the case that you can't call Swap during Iterate. I think it's true that in our model, a cursor designates an element. If you sort during Iterate (inside the call to Process.all), then the cursor parameter won't continue to designate the same element after it gets moved into its new sort position (since a cursor is merely an array index). ENDMJH.

procedure Set_Length (Container : in out Vector;
                      Length    : in     Count_Type);

MJH: If Set_Length is called by Process.all during Iterate, and the requested length is the same as the current length, then is this an error? Or is the implementation still required to raise PE? ENDMJH.

procedure Delete (Container : in out Vector;
                  Index     : in     Extended_Index;
                  Count     : in     Count_Type := 1);

If Index is not in the range First_Index (Container) .. Last_Index (Container), then Constraint_Error is propagated. If Count is 0, Delete has no effect.

MJH: I think these two sentences are backwards. If Count is 0, then it doesn't really matter what value Index has. (Right?) ENDMJH.

procedure Delete (Container : in out Vector;
                  Position  : in out Cursor;
                  Count     : in     Count_Type := 1);

If Position equals No_Element, then Constraint_Error is propagated. If Position does not designate an element in Container, then Program_Error is propagated. Otherwise, Delete (Container, To_Index (Position), Count) is called.

MJH: Is that first sentence true even when Count = 0? Shouldn't Count=0 be a no-op? ENDMJH.

****************************************************************

From: Matthew Heaney
Sent: Thursday, March 24, 2005 12:51 PM

Here's the section of the vectors spec for Query_Element:

procedure Query_Element (Container : in Vector;
                         Index     : in Index_Type;
                         Process   : not null access procedure (Element : in Element_Type));

...Program_Error is propagated if Process.all tampers with the elements of Container. Any exception raised by Process.all is propagated.

procedure Query_Element (Position : in Cursor;
                         Process  : not null access procedure (Element : in Element_Type));

...Program_Error is propagated if Process.all tampers with the elements of Container. Any exception raised by Process.all is propagated.
AARM Note: The tamper with the elements check is intended to prevent the Element parameter of Process from being modified or deleted outside of Process. The check prevents data loss (if Element_Type is passed by copy) or erroneous execution (if Element_Type is an unconstrained type in an indefinite container).

MJH: There are two issues: changing the cardinality of the vector container, and changing the value of an element.

It's relatively cheap to check for changes of cardinality during iteration, using the counter as described in the AARM Note near the description of Iterate.

However, the API fragment I have cited above wants to make a stronger check: that a specific element isn't updated while Query_Elem is in progress. How can you make this check without having a separate counter for each vector element?

What is the exact intent of the "tampers with elements" check? To monitor whether any element is being manipulated, or to monitor specific elements?

If the former (any element), then do you intend to restrict the user's ability to manipulate other elements while a query of an element is in progress? If so, then it would appear that the description of the implementation of the check (using a single internal counter) is incorrect, since two internal counters are necessary.

If the latter (specific elements), then how can you monitor specific elements without using a per-element counter?

You have the AARM Note cited above that gives a rationale for the tampers-with-element check, but do we really want to do this? If a user is querying an element, and then manipulates that same element by calling other operations inside Process.all, then isn't this just a case of aliasing? Why is this the container's problem? ENDMJH.

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 24, 2005 6:09 PM

Matt wrote:

> I've been reading the API described here:
>
>
>
> The first line of that file says:
>
> 05-02-02 AI95-00302-03/10
>
> I don't know if this is the most up-to-date API, but I wanted to get this in
> the queue anyway.

There is a slightly newer version with some typos fixed, but it is essentially correct.

> Each comment is bracketed by "MJH:" and "ENDMJH." pairs, and immediately
> follows the API text to which it refers.
>
> A little after the declaration of the spec of the vector in
> A.18.2, the API
> says:
>
> * it inserts or deletes elements of V, that is, it calls the Insert,
> Insert_Space, Clear, Delete, Delete_Last, or Set_Length
> procedures with V
> as a parameter; or
>
> MJH:
>
> Do you want to add Delete_First to this list?
>
> ENDMJH.

No, Delete_Last was included by mistake. Since Delete_Last is declared in terms of Clear and Delete, it is covered by the list already. (That's not true for Lists, for instance, which is probably why the error happened.) I note that it is not included in the Amendment or AARM text, so it probably was noticed and incompletely fixed.

> And a little after that, the API says:
>
> Some operations are assumed to not change elements. For such an
> operation, a
> subprogram is said to *tamper with elements* of a vector object V if:
>
> * it tampers with cursors of V; or
> * it modifies one or more elements of V, that is, it calls the
> Replace_Element, Update_Element, or Swap procedures or an
> instance of Generic_Sort with V as a parameter.
>
> AARM Note:
> Swap and Generic_Sort copy elements rather than reordering them, so
> they can be allowed for Iterate.
>
> MJH:
>
> I don't see then how you can implement a tamper-resistant vector using a
> single internal counter (see the AARM note that follows the description of
> the Iterate operation). You're saying that calling Generic_Sort is allowed
> during Iterate, but not allowed during (say) Query_Element. So how can you
> use a single counter to handle both cases? (But maybe there is a way that I
> just haven't figured out.)

It's clear that there have to be separate indicators for "tamper with cursors" and "tamper with elements". They were split at the last moment to fix a serious bug, so it's possible that not all of the notes were updated. But I can't find anything that says you can implement both with a single counter...

> Perhaps it is the case that you can't call Swap during Iterate. I think
> it's true that in our model, a cursor designates an element. If you sort
> during Iterate (inside the call to Process.all), then the cursor parameter
> won't continue to designate the same element after it gets moved into its
> new sort position (since a cursor is merely an array index).
>
> ENDMJH.

It's subtle, but the cursor continues to designate the same element (say the 10th) when the vector is sorted or elements are swapped. The *contents* of the element are changed, but that's because they were copied. So that's OK, it won't break the iterator. However, for a list, the *order* of the cursors is changed for a sort or a Swap_Links, which would confuse the iteration. That's why the difference.

We wouldn't have to be that subtle if we just disallowed Swap and sorting all of the time. But we figured more capability was better.

> procedure Set_Length (Container : in out Vector;
>                       Length    : in     Count_Type);
>
> MJH:
>
> If Set_Length is called by Process.all during Iterate, and the
> requested length is the same as the current length, then is this an
> error? Or is the implementation still required to raise PE?
>
> ENDMJH.

It's still an error. It gets much too messy and error-prone to make it depend on the exact values of the parameters and the state of the Vector. There are some cases (like this one) where it would be relatively easy to allow more, but it doesn't seem worthwhile to do so (why would you want to Set_Length to the same length, when you couldn't set it to any other length?)

> procedure Delete (Container : in out Vector;
>                   Index     : in     Extended_Index;
>                   Count     : in     Count_Type := 1);
>
> If Index is not in the range First_Index (Container) .. Last_Index (Container),
> then Constraint_Error is propagated. If Count is 0, Delete has no effect.
>
> MJH:
>
> I think these two sentences are backwards. If Count is 0, then it
> doesn't really matter what value Index has. (Right?)
>
> ENDMJH.

I think one of the reviewers insisted that we check the bounds first. Calling Delete with a Count of 0 is rather pathological anyway, so it doesn't make much difference in practice. I realize that it is different from slices, but I think there is a consensus that the handling of slices in Ada is a mistake in this way (because it has a fairly significant impact on generated code). Obviously, we're not going to change slices at this late date, but there isn't much point in repeating the mistake.

> procedure Delete (Container : in out Vector;
>                   Position  : in out Cursor;
>                   Count     : in     Count_Type := 1);
>
> If Position equals No_Element, then Constraint_Error is propagated. If Position
> does not designate an element in Container, then Program_Error is propagated.
> Otherwise, Delete (Container, To_Index (Position), Count) is called.
>
> MJH:
>
> Is that first sentence true even when Count = 0? Shouldn't Count=0 be a
> no-op?
>
> ENDMJH.

Same comments as above.

****************************************************************

From: Matthew Heaney
Sent: Friday, March 25, 2005 9:23 AM

>> I think these two sentences are backwards. If Count is 0, then it
>> doesn't really matter what value Index has. (Right?)
>
> I think one of the reviewers insisted that we check the bounds first.
> Calling Delete with a Count of 0 is rather pathological anyway, so it
> doesn't make much difference in practice. I realize that it is different
> from slices, but I think there is a consensus that the handling of slices in
> Ada is a mistake in this way (because it has a fairly significant impact on
> generated code). Obviously, we're not going to change slices at this late
> date, but there isn't much point in repeating the mistake.

But this does mean that you can't delete 0 elements from an empty vector. Delete will raise CE because the index check will always fail. That is just a limiting case of deleting 0 elements off the end of a vector, empty or not. Are you sure this is what you want?

One of the nice things about Ada arrays is that you don't have to think too hard about 0-length arrays, since the language always does the right thing. You shouldn't have to worry about calling an operation for which the Count parameter has the value 0. The language should do the right thing (that is, do nothing).

****************************************************************

From: Matthew Heaney
Sent: Friday, March 25, 2005 7:03 PM

The index-based Delete operation is declared this way:

procedure Delete (Container : in out Vector;
                  Index     : in     Extended_Index;
                  Count     : in     Count_Type := 1);

Why does parameter Index have subtype Extended_Index? That subtype includes the values Index_Type'First - 1 and Index_Type'Last + 1, yet both of these values will fail the index check that happens as the first processing step.

The subtype Extended_Index appears to be vestigial. It would only make sense if the check for Count=0 happened prior to the index check. Since the semantics were changed, and the index check now happens first, you might as well strengthen the precondition and declare parameter Index with (sub)type Index_Type.

****************************************************************

From: Matthew Heaney
Sent: Sunday, March 27, 2005 10:52 PM

> But this does mean that you can't delete 0 elements from an empty
> vector. Delete will raise CE because the index check will
> always fail.

The description of Delete_First says:

START OF QUOTE:

procedure Delete_First (Container : in out Vector;
                        Count     : in     Count_Type := 1);

{AI95-00302-03} Equivalent to Delete (Container, Index_Type'First, Count).

END OF QUOTE.

What is the behavior when the vector is empty? Does Delete_First raise CE or not?

The description of Delete says:

START OF QUOTE:

procedure Delete (Container : in out Vector;
                  Index     : in     Extended_Index;
                  Count     : in     Count_Type := 1);

{AI95-00302-03} If Index is not in the range First_Index (Container) .. Last_Index (Container), then Constraint_Error is propagated...

END OF QUOTE.

For an empty vector, Index_Type'First is not in the indicated range, so I guess the answer is that CE is raised.
However, the description of Delete_Last says:

START OF QUOTE:

procedure Delete_Last (Container : in out Vector;
                       Count     : in     Count_Type := 1);

{AI95-00302-03} If Length (Container) <= Count then Delete_Last is equivalent to Clear (Container)...

END OF QUOTE.

For an empty vector, Delete_Last does nothing. This would mean that Delete_First and Delete_Last behave differently for an empty vector. Was this the intent?

****************************************************************

From: Matt Heaney
Sent: Tuesday, March 29, 2005 10:05 PM

> > Why, for example, is inserting 0 elements at the end of a
> > vector a no-op, while deleting 0 elements from the end is an error?
>
> That's a different issue, and I agree this seems inconsistent.

The guiding principle was always "inserting or deleting 0 items is never an error." The old rule handles the corner case of inserting or deleting 0 items in a vector that is full (such that Last_Index (V) = Index_Type'Last).

The new rule will raise CE, since the API specifies that the index check must be performed prior to the count=0 check. But this doesn't make any sense, since count=0 doesn't change the state of the container.

The purpose of exceptions is to prevent damage to the container, or to indicate that the postcondition cannot be satisfied. The new rule is wrong, since no damage can occur, and the postcondition is satisfied, when count=0.

****************************************************************

From: Pascal Leroy
Sent: Wednesday, March 30, 2005 1:57 AM

> > That's a different issue, and I agree this seems inconsistent.
>
> The guiding principle was always "inserting or deleting 0
> items is never an error." The old rule handles the corner
> case of inserting or deleting 0 items in a vector that is
> full (such that Last_Index (V) = Index_Type'Last).
>
> The new rule will raise CE, since the API specifies that the
> index check must be performed prior to the count=0 check.
> But this doesn't make any sense, since count=0 doesn't change
> the state of the container.
>
> The purpose of exceptions is to prevent damage to the
> container, or to indicate that the postcondition cannot be
> satisfied. The new rule is wrong, since no damage can occur,
> and the postcondition is satisfied, when count=0.

I am worried by the inconsistency that you pointed out, but on the other hand I am unconvinced by your argument.

One purpose of exceptions is to preserve the integrity of the container. But another, equally important, purpose is to ensure that usages that are so outlandish that they probably indicate bugs are flagged. If I have a vector with index range 1..10, and I am deleting elements at position 20, surely something is fishy. Now you are telling me that if the number of elements that I am deleting happens to be 0, it's OK to hide the fishiness. That bothers me.

I see the situation (Position => 20, Count => 0) as a special case of (Position => 20, Count => N), rather than of (Position => N, Count => 0).

****************************************************************

From: Matt Heaney
Sent: Wednesday, March 30, 2005 7:57 AM

For a vector with index range 1..10, we're really only talking about deleting 0 elements at position 11; that is, the value Last_Index (V) + 1. This is the same range we allow for Insert.
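To pin down the corner case, here is a small self-contained sketch (the procedure and instantiation names are illustrative only); the comments describe the behavior prescribed by the draft wording quoted above, which is exactly the Delete_First/Delete_Last asymmetry being questioned:

   with Ada.Containers.Vectors;

   procedure Count_Zero_Sketch is
      package Int_Vectors is new Ada.Containers.Vectors
        (Index_Type => Positive, Element_Type => Integer);
      use Int_Vectors;

      V : Vector;  --  empty: First_Index (V) = 1, Last_Index (V) = 0
   begin
      --  Length (V) = 0 <= Count, so per the quoted wording this is
      --  equivalent to Clear (V): a harmless no-op.
      Delete_Last (V, Count => 0);

      --  Defined as Delete (V, Index_Type'First, Count).  Under the draft
      --  rule the index check (is 1 in 1 .. 0?) is made before Count is
      --  examined, so this raises Constraint_Error even though nothing
      --  would be deleted.
      Delete_First (V, Count => 0);
   end Count_Zero_Sketch;

The liberalized rule discussed below (allowing Index to be Last_Index (V) + 1) would make both calls no-ops.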
****************************************************************

From: Matt Heaney
Sent: Thursday, April 1, 2005 11:40 PM

Actually, we could liberalize the rule for the index-based delete such that we allow the Index parameter to have the value Last_Index (V) + 1, no matter what the value of the Count parameter. With these semantics we wouldn't have to have a special case for Count=0.

The reason is that Delete always computes the minimum of:

   N1 = Count
   N2 = Last_Index (V) - Index + 1

If Index = Last_Index (V) + 1, then N2 has the value 0. This is equivalent to our special case, which says Count=0 does nothing.

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 24, 2005 6:37 PM

...

> It's relatively cheap to check for changes of cardinality during
> iteration, using the counter as described in the AARM Note near the
> description of Iterate.

Right.

> However, the API fragment I have cited above wants to make a stronger
> check: that a specific element isn't updated while Query_Elem is in
> progress. How can you make this check without having a separate counter
> for each vector element?

The *problem* is with specific elements; but the check isn't phrased that way.

> What is the exact intent of the "tampers with elements" check? To
> monitor whether any element is being manipulated, or to monitor specific
> elements?

It's as written.

> If the former (any element), then do you intend to restrict the user's
> ability to manipulate other elements while a query of an element is in
> progress? If so, then it would appear that the description of the
> implementation of the check (using a single internal counter) is
> incorrect, since two internal counters are necessary.

There is clearly one counter for each "tamper with" check. Actually, the *tamper with elements* only needs a single bit, because it can't be recursive, and there isn't any requirement to worry about multiple tasks.

> If the latter (specific elements), then how can you monitor specific
> elements without using a per-element counter?

You can't, but that isn't the intent.

> You have the AARM Note cited above that gives a rationale for the
> tampers-with-element check, but do we really want to do this?

Yes. It was what was agreed to in Atlanta. I was still concerned that this was too limiting, so I wrote up some examples of what would be disallowed and presented them in Paris. (I was in favor of a per-element check.) The group was very firm that they wanted to stick to the overall check. Since it's already been explicitly discussed twice, an attempt to bring it up again would be way out of bounds.

> If a user is querying an element, and then manipulates that same element by
> calling other operations inside Process.all, then isn't this just a case
> of aliasing? Why is this the container's problem?

There are two problems with this:

(1) Since we're not specifying the implementation of the container, we can't lean on the existing language rules here. We have to state in some fashion what happens. So it certainly is the container's problem. One possibility considered was to allow it to be aliased, but that doesn't always work. Which brings us to

(2) The container might allocate different memory for the element if it is modified. Depending on the implementation, that could happen for pretty much any composite type. (Not to mention reallocation of vectors.)
But the fact that we're defining indefinite containers means that it is pretty much required in some cases (say, if a string of a different length was inserted). That would leave the parameter to Process.all pointing to deallocated memory (presuming it was passed by reference, which is most likely, and what we generally want in this case anyway) -- which is the worst kind of erroneousness.

I think there was a fairly strong feeling that programming with the containers should be as safe as (and preferably safer than) programming with "bare" access types and the like. One of the big advantages of using a predefined container is that it can make the kinds of checks that you would never bother to program yourself. (That's one reason why C++ programmers seem so enamored of Std::Vector -- it makes the checks that they would never write themselves.)

****************************************************************

From: Matthew Heaney
Sent: Thursday, March 24, 2005 9:21 PM

> There is clearly one counter for each "tamper with" check.
> Actually, the *tamper with elements* only needs a single bit,
> because it can't be recursive, and there isn't any
> requirement to worry about multiple tasks.

Now I'm really confused. Here is a list example from the !examples section:

START OF QUOTE:

For the definite form (the case here), this will make a copy of the element in order to perform the swap. However, we would prefer not to make a copy of this element, which is a list and hence potentially large (or otherwise expensive to copy). To avoid making a copy, we can use Move and nested processing routines:

procedure Swap (V    : in out Vector_Of_Lists.Vector;
                I, J : in     Extended_Index) is

   procedure Process_I (IE : in out List) is

      procedure Process_J (JE : in out List) is
         IE_Temp : List;
      begin
         Move (Target => IE_Temp, Source => IE);
         Move (Target => IE, Source => JE);
         Move (Target => JE, Source => IE_Temp);
      end;

   begin
      Update_Element (V, Index => J, Process => Process_J'Access);
   end;

begin
   Update_Element (V, Index => I, Process => Process_I'Access);
end Swap;

END OF QUOTE.

Is this example legal or illegal? Does this raise PE or not? This certainly looks like a kind of recursion, so I don't understand how you can implement this using "a single bit."

****************************************************************

From: Matthew Heaney
Sent: Thursday, March 24, 2005 10:06 PM

> It's clear that there have to be separate indicators for
> "tamper with cursors" and "tamper with elements". They were
> split at the last moment to fix a serious bug, so it's
> possible that not all of the notes were updated.

To what "serious bug" are you referring?

> But I can't
> find anything that says you can implement both with a single
> counter...

The AARM Note that follows the description of Iterate says:

START OF QUOTE:

The check is easy to implement: each container needs a counter. The counter is incremented when Iterate is called, and decremented when Iterate completes. If the counter is nonzero when an operation that inserts or deletes is called, Finalize is called, or one of the other operations in the list occurs, Program_Error is raised.

END OF QUOTE.

This wording makes it seem as if only a single counter is necessary.

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 24, 2005 10:41 PM

...

> Now I'm really confused. Here is a list example from the
> !examples section:
...
> Is this example legal or illegal?
It's legal, of course (in that you can compile it, which is the meaning of "legal" in the Standard).

> Does this raise PE or not?

Yes, it does raise PE, because the inner Update_Element certainly "tampers with elements" (read the definition carefully). This is one of the examples I showed at Paris (although I didn't realize it was in the AI; that will have to be fixed!).

> This certainly looks like a kind of recursion, so I don't
> understand how you
> can implement this using "a single bit."

The example will raise Program_Error, so a single bit is sufficient.

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 24, 2005 10:45 PM

> > It's clear that there have to be separate indicators for
> > "tamper with cursors" and "tamper with elements". They were
> > split at the last moment to fix a serious bug, so it's
> > possible that not all of the notes were updated.
>
> To what "serious bug" are you referring?

The erroneousness I mentioned in an earlier message today. The whole point of the check for Update_Element and Query_Element was to eliminate that problem, and we'd missed the most likely cause.

...

> This wording makes it seem as if only a single counter is necessary.

Sure, you only need a single counter to implement the check for Iterate. Why would you think this would be talking about some other, unrelated check??

You need an extra bit to implement the check for Update_Element and Query_Element; I don't think that there is any note discussing that, but it seems obvious (since recursion is not allowed).

****************************************************************

From: Matthew Heaney
Sent: Thursday, March 24, 2005 10:57 PM

> Sure, you only need a single counter to implement the check
> for Iterate. Why would you think this would be talking about
> some other, unrelated check??

Because it says "each container needs a counter."

> You need an extra bit to implement the check for
> Update_Element and Query_Element; I don't think that there is
> any note discussing that, but it seems obvious (since
> recursion is not allowed).

So you don't even allow Query_Element to recurse? What is the harm in allowing nested invocations of Query_Element (as in the example from my previous message)?

How does a developer get a constant view of two elements simultaneously? In one of my examples, I compare a pair of elements in a vector like this:

function Is_Equal (I, J : Positive) return Boolean is

   Result : Boolean;

   procedure Process_I (IE : in Pair_Type) is

      procedure Process_J (JE : in Pair_Type) is
      begin
         Result := Equal_Case_Insensitive (IE.Sorted, JE.Sorted);
      end;

   begin
      Query_Element (V, J, Process_J'Access);
   end;

begin
   Query_Element (V, I, Process_I'Access);
   return Result;
end Is_Equal;

Are you saying this will raise PE?

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 24, 2005 11:19 PM

Matthew Heaney wrote:

> > Sure, you only need a single counter to implement the check
> > for Iterate. Why would you think this would be talking about
> > some other, unrelated check??
>
> Because it says "each container needs a counter."

Of course it needs a counter. It needs lots of other stuff, too (length, pointer to elements, etc.), but we're not talking about those things here. This doesn't say that it *only* needs a counter to implement every check in the universe!
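For concreteness, one possible shape for that per-container state, with purely illustrative names (this is a sketch of an implementation strategy, not wording from the AI):

   package Tamper_State_Sketch is
      type Vector is limited private;
      --  The visible vector operations are omitted in this sketch.
   private
      type Element_Array is array (Positive range <>) of Integer;
      type Element_Array_Access is access Element_Array;

      type Vector is limited record
         Elements : Element_Array_Access;
         Last     : Natural := 0;

         Busy : Natural := 0;
         --  Incremented when Iterate starts and decremented when it
         --  completes.  Operations that insert or delete elements
         --  ("tamper with cursors") raise Program_Error while Busy > 0.

         Lock : Natural := 0;
         --  Incremented while Query_Element or Update_Element is executing
         --  Process.all, and decremented afterward.  Operations that modify
         --  elements ("tamper with elements"), such as Replace_Element,
         --  Swap, or an instance of Generic_Sort - as well as the
         --  insert/delete operations above - raise Program_Error while
         --  Lock > 0.  It is shown as a counter here; whether a single bit
         --  suffices is the question raised in the next message.
      end record;
   end Tamper_State_Sketch;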
> > You need an extra bit to implement the check for
> > Update_Element and Query_Element; I don't think that there is
> > any note discussing that, but it seems obvious (since
> > recursion is not allowed).
>
> So you don't even allow Query_Element to recurse? What is the harm in
> allowing nested invocations of Query_Element (as in the example from my
> previous message)?

What gave you that idea? The check is defined to be "tamper with elements"; read the wording if you want to know *exactly* what is required.

Query_Element doesn't "tamper with elements". So it of course can be recursively called. But it doesn't participate in the check, it just needs the check made. So I don't see the issue.

****************************************************************

From: Matthew Heaney
Sent: Thursday, March 24, 2005 11:44 PM

> Query_Element doesn't "tamper with elements". So it of course
> can be recursively called. But it doesn't participate in the
> check, it just needs the check made. So I don't see the issue.

Well, I still don't understand the RM. How can you implement nested calls to Query_Element without using a counter?

I have repeated my example from my last message, and marked the calls to Query_Element "call #1" and "call #2". You're telling me that an invocation must raise PE if it tampers with elements. Fine. That means there needs to be some state to indicate that a call to Query_Element is in progress, and that state is checked by an operation that tampers with elements.

So QE sets the bit when called at point #1. The invocation of QE at point #2 also sets the bit. When QE #2 finishes executing, it sets the bit to false. But that wipes out the indication that QE #1 is still executing. Right?

function Is_Equal (I, J : Positive) return Boolean is