!standard A.18.2(97.1/3) 16-10-02 AI12-0111-1/04 !class Amendment 16-06-06 !status work item 14-05-15 !status received 14-02-08 !priority Medium !difficulty Hard !qualifier Omission !subject Tampering considered too expensive !summary Add a nested package Stable within each Container package, which provides a *stabilized view* of an underlying regular (unstable) container. Such stable containers have no operations that change their shape, and the stabilization "lock" on the underlying regular container need only be set once when the view is created, and released when the view goes away. Restrict "tampering with elements" checks to containers of elements of an indefinite type. !problem The performance of the standard containers is way less efficient than the C++ containers. The primary cause of this is tampering checks, and specifically, the cost of setting up and tearing down the tampering state. In a loop, this overhead can be considerable. The standard should be improved to reduce this overhead. !proposal We propose to eliminate the requirement for per-reference tampering checks by providing a mechanism to "stabilize" a container against tampering. This can be used by the iterator expansion and elsewhere. When a container is stabilized, we could provide a different mechanism for indexing which would fail, or not be available, if the iterator were not stabilized. We propose the notion of a Stable_XXX handle to the container as a whole, which would be a separate type with cheaper indexing operations. This can also be a way to get read-only references which could be passed to multiple tasks. That is you could safely pass a read-only stable reference to multiple tasks. We propose to define a nested package "Stable" within each of the various Container packages, removing operations that tamper with cursors. Below we provide a Stable subpackage for Vectors. The stable vector is of a limited type, and has a non-null access discriminant called "Base," with no default, which refers to an underlying unstable vector which is automatically stabilized as long as the stable vector exists (hence the stabilized vector needs to be controlled so that when it goes away, it automatically releases the stabilization "lock" on the underlying vector). Two essential properties of stable containers is that they have no tampering checks, and can be iterated by multiple tasks in parallel. Note that we propose to reclassify Merge as tampering with cursors, and to restrict "tampering with elements" to apply only to containers with indefinite elements. !wording Modify 18.2(92/2) (and similar paragraphs elsewhere): * it inserts or deletes elements of V, that is, it calls the Insert, Insert_Space, Clear, Delete, or Set_Length procedures{,or Merge procedure of an instance of Generic_Sorting,} with V as a parameter; or Delete 18.2(95/2-97/2) (and similar paragraphs except for Indefinite_*) talking about tampering with elements. Insert new section after A.18.2 (or in a separate area at the end of A.18 to avoid using 4-component numbers): A.18.2.1 The Nested Package Containers.Vectors.Stable The nested package Vectors.Stable provides a type Stable.Vector that represents a /stable/ vector, which is one that cannot grow and shrink. Such a vector can be created by calling the To_Vector or Copy functions, or by establishing a /stabilized view/ of a regular Vector. While a stabilized view exists, no operations that tamper with cursors are permitted on the underlying regular Vector. The operations of this package are equivalent to those for regular Vectors, except that no tampering checks are performed. If a stable vector is declared with the Base discriminant designating a pre-existing regular vector, the stable vector represents a stabilized view of the underlying regular vector, and any operation on the stable vector is reflected on the underlying regular vector. While a stabilized view exists, any operation that tampers with cursors performed on the underlying vector is prohibited. The finalization of a stable vector that provides such a view removes this restriction on the underlying regular vector Redundant[(though some other restriction might exist due to other concurrent iterations or stabilized views)]. If a stable vector is declared without specifying Base, the object must be initialized. The initializing expression of the stable vector, Redundant[(typically a call on To_Vector or Copy)], determines the Length of the vector. By default the Length is zero. The Length of a stable vector never changes after initialization. [Author's note: Below, we have left in the original paragraph numbers from the Vector container because we thought they might be useful, and mostly because we are too lazy to remove them.] ... -- enclosing package Ada.Containers.Vectors package Stable is 8/3 type Vector (Base : not null access Vectors.Vector) is tagged limited private with Constant_Indexing => Constant_Reference, Variable_Indexing => Reference, Default_Iterator => Iterate, Iterator_Element => Element_Type; pragma Preelaborable_Initialization(Vector); 9/2 type Cursor is private; pragma Preelaborable_Initialization(Cursor); 10/2 Empty_Vector : constant Vector; 11/2 No_Element : constant Cursor; 11.1/3 function Has_Element (Position : Cursor) return Boolean; 11.2/3 package Vector_Iterator_Interfaces is new Ada.Iterator_Interfaces (Cursor, Has_Element); 12/2 function "=" (Left, Right : Vector) return Boolean; 13/2 function To_Vector (Length : Count_Type) return Vector; 14/2 function To_Vector (New_Item : Element_Type; Length : Count_Type) return Vector; 15/2 function "&" (Left, Right : Vector) return Vector; 16/2 function "&" (Left : Vector; Right : Element_Type) return Vector; 17/2 function "&" (Left : Element_Type; Right : Vector) return Vector; 18/2 function "&" (Left, Right : Element_Type) return Vector; 19/2 function Capacity (Container : Vector) return Count_Type; 21/2 function Length (Container : Vector) return Count_Type; 23/2 function Is_Empty (Container : Vector) return Boolean; 25/2 function To_Cursor (Container : Vector; Index : Extended_Index) return Cursor; 26/2 function To_Index (Position : Cursor) return Extended_Index; 27/2 function Element (Container : Vector; Index : Index_Type) return Element_Type; 28/2 function Element (Position : Cursor) return Element_Type; 29/2 procedure Replace_Element (Container : in out Vector; Index : in Index_Type; New_Item : in Element_Type); 30/2 procedure Replace_Element (Container : in out Vector; Position : in Cursor; New_item : in Element_Type); 31/2 procedure Query_Element (Container : in Vector; Index : in Index_Type; Process : not null access procedure (Element : in Element_Type)); 32/2 procedure Query_Element (Position : in Cursor; Process : not null access procedure (Element : in Element_Type)); 33/2 procedure Update_Element (Container : in out Vector; Index : in Index_Type; Process : not null access procedure (Element : in out Element_Type)); 34/2 procedure Update_Element (Container : in out Vector; Position : in Cursor; Process : not null access procedure (Element : in out Element_Type)); 34.1/3 type Constant_Reference_Type (Element : not null access constant Element_Type) is private with Implicit_Dereference => Element; 34.2/3 type Reference_Type (Element : not null access Element_Type) is private with Implicit_Dereference => Element; 34.3/3 function Constant_Reference (Container : aliased in Vector; Index : in Index_Type) return Constant_Reference_Type; 34.4/3 function Reference (Container : aliased in out Vector; Index : in Index_Type) return Reference_Type; 34.5/3 function Constant_Reference (Container : aliased in Vector; Position : in Cursor) return Constant_Reference_Type; 34.6/3 function Reference (Container : aliased in out Vector; Position : in Cursor) return Reference_Type; 34.7/3 procedure Assign (Target : in out Vector; Source : in Vector) with Pre => Length (Target) = Length (Source); procedure Assign (Target : in out Vectors.Vector; Source : in Vector); 34.8/3 function Copy (Source : Vector) return Vector; function Copy (Source : Vectors.Vector) return Vector; 54/2 procedure Reverse_Elements (Container : in out Vector); 55/2 procedure Swap (Container : in out Vector; I, J : in Index_Type); 56/2 procedure Swap (Container : in out Vector; I, J : in Cursor); 57/2 function First_Index (Container : Vector) return Index_Type; 58/2 function First (Container : Vector) return Cursor; 59/2 function First_Element (Container : Vector) return Element_Type; 60/2 function Last_Index (Container : Vector) return Extended_Index; 61/2 function Last (Container : Vector) return Cursor; 62/2 function Last_Element (Container : Vector) return Element_Type; 63/2 function Next (Position : Cursor) return Cursor; 64/2 procedure Next (Position : in out Cursor); 65/2 function Previous (Position : Cursor) return Cursor; 66/2 procedure Previous (Position : in out Cursor); 67/2 function Find_Index (Container : Vector; Item : Element_Type; Index : Index_Type := Index_Type'First) return Extended_Index; 68/2 function Find (Container : Vector; Item : Element_Type; Position : Cursor := No_Element) return Cursor; 69/2 function Reverse_Find_Index (Container : Vector; Item : Element_Type; Index : Index_Type := Index_Type'Last) return Extended_Index; 70/2 function Reverse_Find (Container : Vector; Item : Element_Type; Position : Cursor := No_Element) return Cursor; 71/2 function Contains (Container : Vector; Item : Element_Type) return Boolean; 73/2 procedure Iterate (Container : in Vector; Process : not null access procedure (Position : in Cursor)); 74/2 procedure Reverse_Iterate (Container : in Vector; Process : not null access procedure (Position : in Cursor)); 74.1/3 function Iterate (Container : in Vector) return Vector_Iterator_Interfaces.Reversible_Iterator'Class; 74.2/3 function Iterate (Container : in Vector; Start : in Cursor) return Vector_Iterator_Interfaces.Reversible_Iterator'Class; 75/2 generic with function "<" (Left, Right : Element_Type) return Boolean is <>; package Generic_Sorting is 76/2 function Is_Sorted (Container : Vector) return Boolean; 77/2 procedure Sort (Container : in out Vector); 79/2 end Generic_Sorting; 80/2 private 81/2 ... -- not specified by the language 82/2 end Stable; private ... -- not specified by the language end Ada.Containers.Vectors; Upon creation, an object of type Stable.Vector marks the vector designated by the Base discriminant (the /base/ vector) to disallow tampering with cursors as long as the stable view of the base vector exists. The operations of the Stable subpackage correspond to the operations of the enclosing generic package Vectors that do not tamper with cursors. For functions that create a new vector, such as To_Vector or Copy, the Stable version of these first create a base object of the Vectors.Vector type, and then create a stable vector designating that base object. [Redundant: If one of these functions is used to initialize a stable vector object, the return object is built in place.] The other operations of the Stable subpackage perform the corresponding operation on the vector designated by the Base discriminant. !discussion Tampering was primarily intended to prevent erroneous/unspecified behavior. More specifically, tampering with cursors was intended to prevent loops from going off the rails because an element was deleted (which might cause an iterator to run into non-existent memory) or inserted (which might cause an iterator to go into an infinite loop). Tampering with elements was intended to prevent an element that is actively being used from disappearing or being replaced (which might change the "shape" of the element). We are proposing to restrict this concern to cases where the formal element type is indefinite, since for definite types, normal assignment can be used to replace objects, and access types will work safely across assignment to their designated objects. We make the stable vector type a limited type because normal assignment would almost certainly fail due to a mismatch of the access discriminant. !ASIS No ASIS effect. !ACATS test ** TBD. !appendix [This thread diverged from the one in AI12-0110-1; there was some cross-pollination.] From: Robert Dewar Sent: Monday, February 10, 2014 7:23 AM BTW, while on the subject of tampering checks, we are becoming more worried that the design of the containers in Ada is fundamentally flawed. It is a serious embarrassment, one noted by several of our customers, that the performance of the Ada containers is way less efficient than the C++ containers, and we fear that a lot of this inefficiency is mandated by the Ada design. In the case of bounded containers, we have a set of "formal" containers that are much simpler and much more efficient. I don't know if in practice we will end up replacing the official containers entirely, perhaps so. *************************************************************** From: Brad Moore Sent: Monday, February 10, 2014 9:45 AM It strikes me that Tampering should be a check that can be suppressed with pragma Suppress. Maybe it was an oversight that it isnt already a suppressable check? Once programmers are satisfied that their code does not tamper with the containers, they might want to choose to suppress the tampering checks, to obtain better performance. I'd much rather see that than "replace the official containers entirely". *************************************************************** From: Robert Dewar Sent: Monday, February 10, 2014 9:51 AM > It strikes me that Tampering should be a check that can be suppressed > with pragma Suppress. > Maybe it was an oversight that it isnt already a suppressable check? Well the notion of suppressible checks, which can be turned on and off at will, does not work well with run-time units where the checks are deeply embedded into the run-time code and its design. > Once programmers are satisfied that their code does not tamper with > the containers, they might want to choose to suppress the tampering > checks, to obtain better performance. > > I'd much rather see that than "replace the official containers > entirely". Well yes, but it won't help at all I am afraid. *************************************************************** From: Jean-Pierre Rosen Sent: Monday, February 10, 2014 10:21 AM > It strikes me that Tampering should be a check that can be suppressed > with pragma Suppress. > Maybe it was an oversight that it isnt already a suppressable check? > > Once programmers are satisfied that their code does not tamper with > the containers, they might want to choose to suppress the tampering > checks, to obtain better performance. > > I'd much rather see that than "replace the official containers > entirely". I agree with you, but how could you do that if the containers are written in Ada, like regular packages? Suppress can remove only compiler generated checks. Tampering checks could be suppressed only if fully implemented as pre/post conditions, but I doubt this is doable (disclaimer: I never actually looked into the implementation of containers, please correct me if I'm wrong) *************************************************************** From: Brad Moore Sent: Monday, February 10, 2014 10:20 AM > Well the notion of suppressible checks, which can be turned on and off > at will, does not work well with run-time units where the checks are > deeply embedded into the run-time code and its design. I see, but I would have thought that something could be done however. Maybe if the suppression is globally specified as a configuration pragma, the tampering could be removed by the compiler as dead code, similar to when a constant boolean expression is encountered in GNAT? i.e. Do_Tampering_Checks : constant Boolean := False; if Do_Tampering_Checks then ... -- Tampering check code end if; *************************************************************** From: Robert Dewar Sent: Monday, February 10, 2014 10:33 AM > I see, but I would have thought that something could be done however. > Maybe if the suppression is globally specified as a configuration > pragma, the tampering could be removed by the compiler as dead code, > similar to when a constant boolean expression is encountered in GNAT? How can that help, the run-time does not get recompiled in response to setting a configuration pragma! *************************************************************** From: Brad Moore Sent: Monday, February 10, 2014 10:59 AM Could it be used as a switch to select a different container library to link with, then? *************************************************************** From: Robert Dewar Sent: Monday, February 10, 2014 11:01 AM > Could it be used as a switch to select a different container library to link > with, then? Conceivably a configuration pragma could be used in this way, but it is more attractive to just fix the existing containers IMO. This is damaging the language, and what I would be inclined to do is to make a new set of efficient containers, and retain the old ones just to pass ACATS tests but not expect many other people to use them. I don't see why we should have a default that is unusable. *************************************************************** From: Brad Moore Sent: Monday, February 10, 2014 11:14 AM I would say that not everyone cares about performance, when using containers. Having safety as the default seems like the right choice to me for Ada. *************************************************************** From: Robert Dewar Sent: Monday, February 10, 2014 11:20 AM > I would say that not everyone cares about performance, when using > containers. Having safety as the default seems like the right choice to > me for Ada. Ada is supposed to be about getting safety WITHOUT paying a huge performance hit. We are finding a lot of users who are very disappointed to find out that the containers are a) SO much worse than C++ b) as a consequence pretty much useless We found that ourselves internally, so some of our attempts to use the standard containers have just been abandoned in favor of spinning our own, and that is what will happen with customers. Believe me "we know that the containers in Ada are five times slower than those in C++, but they are so much safer" won't fly. Besides we have no real basis for saying that they are safer, we don't know in practice that all this checking will actually be helpful, we are just guessing that this is the case without real data. We don't expect critical applications to use the containers at all. Far too much difficult stuff. Instead they will typically use our formal bounded containers which are very usable, but much simplified AND much safer, because they are much more friendly to certification and formal proof. One thing that would have helped is to design all the safety stuff so that it was in the form of preconditions and postconditions, which could be easily suppressed. *************************************************************** From: Brad Moore Sent: Monday, February 10, 2014 11:38 AM >> I would say that not everyone cares about performance, when using >> containers.Having safety as the default seems like the right choice >> to me for Ada. > >Ada is supposed to be about getting safety WITHOUT paying a huge >performance hit. We are finding a lot of users who are very >disappointed to find out that the containers are > >a) SO much worse than C++ I could see performance as being the default, but I can also imagine that there are projects that would want to have a way to turn on the extra checking, when performance is not an issue, and the possibility of tampering hasn't been completely ruled out. *************************************************************** From: Gary Dismukes Sent: Monday, February 10, 2014 1:18 PM > > I see, but I would have thought that something could be done however. > > Maybe if the suppression is globally specified as a configuration > > pragma, the tampering could be removed by the compiler as dead code, > > similar to when a constant boolean expression is encountered in GNAT? > > How can that help, the run-time does not get recompiled in response to > setting a configuration pragma! In fact it does effectively get recompiled, since these are generics getting expanded in the user's code. So in principle a configuration pragma could be used to control this, though would still be a little tricky to control tampering checks this way. *************************************************************** From: Robert Dewar Sent: Monday, February 10, 2014 1:26 PM Surely we don't make a copy of the ENTIRE body of the generic, sounds horrible not to be sharing code as much as possible, certainly the interface does not demand such behavior. *************************************************************** From: Randy Brukardt Sent: Monday, February 10, 2014 2:10 PM > BTW, while on the subject of tampering checks, we are becoming more > worried that the design of the containers in Ada is fundamentally > flawed. It is a serious embarrassment, one noted by several of our > customers, that the performance of the Ada containers is way less > efficient than the C++ containers, and we fear that a lot of this > inefficiency is mandated by the Ada design. I can believe this, but I find it hard to believe that tampering (alone) is the cause of this inefficiency. The tampering *check* should be a simple comparison against an integer value, which shouldn't add significant overhead compared to the "real" operation in most cases. I'd expect dangling cursor checks (if you make them) to be much more expensive (and a drag on performance). But those are optional. It would be nice if someone could quantify the difference between the Ada and C++ libraries and pinpoint the exact issues. My guess is that you'd still see similar problems even with the tampering check eliminated. (I'd do this myself, but I don't know C++ well enough to make an accurate comparison.) > One thing that would have helped is to design all the safety stuff so > that it was in the form of preconditions and postconditions, which > could be easily suppressed. I've suggested that before (including on Friday). It's still possible, of course, it would be compatible to do so (we couldn't use predicates, but preconditions would work). (But it would be a lot of work for the Standard, and some for implementers.) Of course, we couldn't have done this for Ada 2005, as we didn't have preconditions then. In particular, if we added the routine Is_Tampering_with_Cursors (Container : in Vector) return Boolean, then the bounded vector Insert could be: procedure Insert (Container : in out Vector; Before : in Extended_Index; New_Item : in Vector) with Pre => (if Container.Is_Tampering_with_Cursors then raise Program_Error with "Tampering check") and then (if Before not in Container.First_Index .. Container.Last_Index + 1 then raise Constraint_Error) and then (if Container.Length = Container.Capacity then raise Capacity_Error), Post => (Container.Is_Tampering_with_Cursors'Old = Container.Is_Tampering_with_Cursors) and Container.Length'Old + 1 = Container.Length); The postcondition is intended to show provers that a call of Insert doesn't change the tampering state. Having these as preconditions and postconditions would allow easy suppression of the checks (and also would allow the compiler to combine and/or eliminate them, so it wouldn't be as necessary to suppress them). This seems to me to be the best way to fix the performance problem caused by the checking. But that presumes that we're sure that the performance problem is caused solely by the checking. *************************************************************** From: Robert Dewar Sent: Monday, February 10, 2014 2:15 PM > This seems to me to be the best way to fix the performance problem > caused by the checking. But that presumes that we're sure that the > performance problem is caused solely by the checking. The main issue for performance is in loops, and I believe Ed has some insight into the issues there, Ed? *************************************************************** From: Ed Schonberg Sent: Monday, February 10, 2014 2:58 PM >> BTW, while on the subject of tampering checks, we are becoming more >> worried that the design of the containers in Ada is fundamentally >> flawed. It is a serious embarrassment, one noted by several of our >> customers, that the performance of the Ada containers is way less >> efficient than the C++ containers, and we fear that a lot of this >> inefficiency is mandated by the Ada design. > > I can believe this, but I find it hard to believe that tampering > (alone) is the cause of this inefficiency. The tampering *check* > should be a simple comparison against an integer value, which > shouldn't add significant overhead compared to the "real" operation in most > cases. the cost is not the comparison, but the setting and reseting of the tampering counter. Its management requires that references be somehow controlled, so in an loop you are forced to create. initialize, and finalize an object with a controlled component. This makes simple loops (add the elements of a vector, say) more than an order of magnitude slower that the naive array code. The finalization machinery cannot be inlined, so there is no hope for the code generator to optimize any of it. There are possible front-end optimizations for some simple loops, and we will experiment with them in coming weeks, but in general the controlled overhead is the killer. If there are better suggestions on how to manage tampering bits in the presence of arbitrary references that may be floating around, I’d be delighted to hear it. *************************************************************** From: Ed Schonberg Sent: Monday, February 10, 2014 3:00 PM > Surely we don't make a copy of the ENTIRE body of the generic, sounds > horrible not to be sharing code as much as possible, certainly the > interface does not demand such behavior. Surely you jest. There is very little that can be shared between different kinds of containers. *************************************************************** From: Gary Dismukes Sent: Monday, February 10, 2014 3:09 PM And what is shared is either unrelated to tampering checks or is itself generic. *************************************************************** From: Robert Dewar Sent: Monday, February 10, 2014 3:06 PM Perhaps the idea of a completely separate set of containers, which completely remove the tampering checks, and a configuration pragma saying whether tampering checks are required to choose which set of containers are needed. Then probably for GNAT, we would default to the efficient set, and have a compiler switch for strictly conforming behavior. The configuration pragma could be pragma Container_Tampering_Checks (On | Off); with a requirement that this be set consistently for all units withing containers. Or perhaps, since these are all generic, it would work to instantiate two separate versions depending on the setting of Tampering_Check??? *************************************************************** From: Tucker Taft Sent: Monday, February 10, 2014 3:09 PM >> Surely we don't make a copy of the ENTIRE body of the generic, sounds >> horrible not to be sharing code as much as possible, certainly the >> interface does not demand such behavior. > > Surely you jest. There is very little that can be shared between different > kinds of containers. I think Robert meant that you might be able to share code between different instantiations of the same generic, by carving out some of the code into a non-generic package. I think we already do that for the unbounded containers. It might be harder to do that with the bounded containers. *************************************************************** From: Robert Dewar Sent: Monday, February 10, 2014 3:11 PM > Surely you jest. There is very little that can be shared between different > kinds of containers. I do not jest, I am talking about sharing code between multiple instantiations of the same container, as we do for Text_IO or Bounded_String. We minimize the amount of generic code in such packages, so that e.g. multiple instantiations of Bounded_Strings for different lengths share 95% of the code between them (they are actually uses of a Superbounded package that keeps the max length around). Similarly, 95% of the work of the generic subpackages in Text_IO is done by non-generic helper routines. *************************************************************** From: Randy Brukardt Sent: Monday, February 10, 2014 3:22 PM > the cost is not the comparison, but the setting and reseting of the > tampering counter. Its management requires that references be somehow > controlled, so in an loop you are forced to create. initialize, and > finalize an object with a controlled component. This makes simple > loops (add the elements of a vector, say) more than an order of > magnitude slower that the naive array code. The finalization > machinery cannot be inlined, so there is no hope for the code > generator to optimize any of it. There are possible front-end > optimizations for some simple loops, and we will experiment with them > in coming weeks, but in general the controlled overhead is the killer. > If there are better suggestions on how to manage tampering bits in the > presence of arbitrary references that may be floating around, I'd be > delighted to hear it. Thanks, I was starting to think about that since Robert's message pointed me the right way. It's especially annoying because there isn't any sensible way to suppress such overhead (it's not with the check per-se). I would have expected worse problems with Reference/Constant_Reference, but perhaps they're not used as much in critical situations. Anyway, I think it is possible to optimize the overhead away in simple cases. I know I hoped that the containers would get used enough to make such optimizations worthwhile. But I don't see a general way to do that (it would be specific to the containers), and I admit I'm not sure the substantial efforts required would help enough. Specifically, if the compiler "knows" that it is dealing with a controlled object used solely for a tampering check, then it can determine if there are any possible occurrences of a tampering check within the scope of that object. (That necessarily would have to be pretty conservative, probably would assume that any call to a non-container routine contains tampering.) If there are no tampering checks in that scope, then the controlled object itself (setup and removal) can be completely removed from the program, which of course would completely eliminate the overhead. (This probably would require in-lining of the setup and finalization of the object.) I would expect that the majority of Reference/Constant_Reference calls would meet these requirements and could have the overhead removed. It would be rarer for loops, but then again loops that have lots of contained calls will be less sensitive to the loop overhead (it would be a much smaller part of the whole). This isn't ideal because it requires knowledge about the containers themselves. Perhaps if the tampering check was a precondition, it would require a bit less magic, but you'd also want to know about routines that could not possibly have a tampering check (other container routines that are pure reads, in particular). [Because blocking the optimization by *any* call would leave few loops that could be optimized.] I can imagine defining a set of [implementation-defined] aspects for the purpose, but of course that increases the work needed. *************************************************************** From: Robert Dewar Sent: Monday, February 10, 2014 3:28 PM > Specifically, if the compiler "knows" that it is dealing with a > controlled object used solely for a tampering check, then it can > determine if there are any possible occurrences of a tampering check > within the scope of that object. (That necessarily would have to be > pretty conservative, probably would assume that any call to a > non-container routine contains tampering.) If there are no tampering > checks in that scope, then the controlled object itself (setup and > removal) can be completely removed from the program, which of course > would completely eliminate the overhead. (This probably would require > in-lining of the setup and finalization of the object.) I don't see this kind of highly specialized flow analysis as being even vaguely practical, at least in our environment. It would require a complete extra complicated pass in the compiler, and I can't see that happening. *************************************************************** From: Randy Brukardt Sent: Monday, February 10, 2014 3:36 PM ... > Specifically, if the compiler "knows" that it is dealing with a > controlled object used solely for a tampering check, then it can > determine if there are any possible occurrences of a tampering check > within the scope of that object. (That necessarily would have to be > pretty conservative, probably would assume that any call to a > non-container routine contains tampering.) If there are no tampering > checks in that scope, then the controlled object itself (setup and > removal) can be completely removed from the program, which of course > would completely eliminate the overhead. (This probably would require > in-lining of the setup and finalization of the object.) Another way to do this would be to have unsafe versions of the routines somewhere (which don't create the controlled object) and switch to them anytime that the scope of the object doesn't contain any possible tampering operations. More generally, since the controlled object exists solely for the tampering check, it isn't needed at all if it is known that there isn't any possible tampering in the actual loop body (or scope of Reference/Constant_Reference). For simple cases, this is determinable by the compiler, and then some method has to be used to ensure that no useless controlled object is created and managed. *************************************************************** From: Robert Dewar Sent: Monday, February 10, 2014 3:45 PM > More generally, since the controlled object exists solely for the > tampering check, it isn't needed at all if it is known that there > isn't any possible tampering in the actual loop body (or scope of > Reference/Constant_Reference). For simple cases, this is determinable > by the compiler, and then some method has to be used to ensure that no > useless controlled object is created and managed. It's really uncomfortable to have the front end of the compiler have to know this much about containers. I think I prefer the approach where you just indicate at instantiation time whether or not you want tampering checks. Note that another advantage of having the tampering checks be stated as preconditions and the status stated as pre/post conditions is that you have a chance of verifying formally that no tampering is possible. *************************************************************** From: Randy Brukardt Sent: Monday, February 10, 2014 3:55 PM > I don't see this kind of highly specialized flow analysis as being > even vaguely practical, at least in our environment. It would require > a complete extra complicated pass in the compiler, and I can't see > that happening. Yeah, it's probably too hard to do for loops. I have to wonder, though, if something like it would be worth it for Reference/Constant_Reference, which probably are a lot of the overhead in many cases. Most of the time, their lifetime is very short (within a single expression), and there probably is an expression pass that such an optimization could be added to. For instance, in Sum := 0; for C in Container.Iterator loop -- (1) Sum := Sum + Container(C).Component; -- (2) end loop; Let's assume that there are 100 elements in Container. In that case, the tampering loop setup at (1) is executed once; the tampering setup for Constant_Reference (implicit in 2) happens 100 times. Moreover, that second, very local object gets finalized as soon as the assignment statement is completed (it might even be sooner than that). So that is very localized, it's clear that there is no tampering check (because there is no call of any kind) in the scope of that tampering check. And it's clear that removing (2) would save 100 times the overhead that removing (1) would. Of course, I haven't tried this in practice, so I don't know how hard it would really be. *************************************************************** From: Randy Brukardt Sent: Monday, February 10, 2014 4:00 PM ... > Note that another advantage of having the tampering checks be stated > as preconditions and the status stated as pre/post conditions is that > you have a chance of verifying formally that no tampering is possible. I agree with this in any case, that's one reason I've been interested in doing this for quite a while. But I don't think it would help much for the purpose of eliminating the setup cost (other than a SPARK-like environment), because real code has just too many external subprogram calls to be able to tell. Instrumenting them all with no tampering pre/postconditions is likely to be prohibitive. *************************************************************** From: Ed Schonberg Sent: Monday, February 10, 2014 4:08 PM > And it's clear that removing (2) would save 100 times the overhead > that removing (1) would. > > Of course, I haven't tried this in practice, so I don't know how hard > it would really be. this is the kind of transformation i have been experimenting with. There are cases where the expansion of the generalized indexing following by an explicit dereference could be replaced with a call to Element directly, which does not involve the creation of a reference, and therefore generates no controlled calls. The question is to determine precisely when the body of the loop makes this transformation safe. *************************************************************** From: Randy Brukardt Sent: Monday, February 10, 2014 4:24 PM Ah, great minds think alike! :-) Unless the generalized indexing is used in a renames or as an actual parameter, I would expect that the determination whether tampering could occur would be purely local to the subexpression in question. So that seems practical. I probably would look into the idea of having a "unsafe" Reference/Constant_Reference routine somehow hidden in the container, to use in place of the standard call. The problem with Element is that it logically copies the element, while Reference/Constant_Reference is a pointer to it. So there are other considerations with replacing with Element that don't come up if an unsafe version (without the controlled object) of Reference/Constant_Reference is available. (It's also possible that copying a large element is more expensive than the tampering setup. That can't happen for a no-tampering Reference/Constant_Reference.) I suppose it makes sense to use an Element replacement as a proof-of-concept, as no funny business in the containers is needed. If it looks promising, then try a container modification to go further. *************************************************************** From: Randy Brukardt Sent: Friday, February 14, 2014 12:55 PM At the risk of restarting a thread which has died down, I have some additional thoughts on this issue today. I was thinking a bit about this while getting ready for work, and I was wondering if the bounded containers are actually different than the other containers in terms of needing some tampering checks. This thought inspired me to go back and look at the original problems that inspired the checks. This note (essentially an LSN) explains what I found out. ---- Executive summary: The tampering with elements checks associated with Reference/Constant_Reference for bounded containers are probably not worth their overhead, as most of the problems that they're intended to prevent aren't possible if the implementation advice is followed. The checks do prevent some problems, however. The tampering with elements checks associated with Reference/Constant_Reference could be eliminated with little loss of safety if some additional constraints are placed on the implementation of the containers. For a bounded container, these constraints are already part of the implementation advice, so the effect would be minimal (additional advice would reduce risk even further). For an unbounded container, these constraints could be lived with, but would make some likely implementation approaches impossible, especially for vectors. For indefinite containers, the constraints would be impossible. The tampering with elements checks associated with the Query_Element/Update_Element callbacks could be eliminated with the same constraints as mentioned above. There would be a small additional loss of safety for Update_Element, mainly for elements that could be passed by-copy. The tampering with cursors checks associated with iterators (both the callback form and the for loop form) can't be eliminated without a significant loss of safety. Eliminating tampering checks from some kinds of containers and some operations (but not other containers and operations) would reduce consistency between the containers, but that could only be a problem when moving from a container without such checks to one that has them, and even then a problem seems unlikely. ---- Let's start with some background. The tampering checks were originated to ensure that call-back iterators (and now for loops) did not have to be constructed defensively. Specifically, inserting or deleting an element near the current one in an iterator can cause trouble, as the iterator may have already saved a cursor (pointer) to the "next" value. If that next value was to change by some operation, the iterator could malfunction (possibly missing an element, or visiting an element twice, or worst of all, visiting some memory that *used* to be an element but has since been returned to the system). The cost of these check was thought to be acceptable, because the setup only happens once per loop, and the checks are cheap (an integer compare, typically). Moreover, the setup is cheap for the call-back iterators, as the body of the iterator can clear the tampering flag when it completes its actions. Later, we ran into some similar issues with Query_Element/Update_Element, and we decided to use the already defined tampering mechanism to handle them. Specifically, if the element that Query_Element/Update_Element is currently working on is deleted or replaced, then if the parameter was passed by reference, it may no longer reference allocated memory (for container implementations that allocate elements individually, especially likely for the indefinite containers). For Update_Element, there is also the potential for data loss -- if the element is passed by copy, any modifications made to it via operations calls from Update_Element (rather than to the parameter) will be lost when Update_Element returns and copies back the parameter to the element. The tampering checks for elements weren't restricted to the element being operated on specifically, for two reasons. First, tracking the element in question would make the checks significantly more expensive, especially as several nested operations could be used to operate on a number of elements at once. Secondly, some reference container designs (especially for vectors) move elements around when new ones are inserted or deleted. That means that element might change if something is inserted in front of it, with all of the same issues happening. In order to give implementers maximum flexibility for implementations, we simply banned any insertions or deletions, which also has the effect of making the check relatively simple. For Ada 2012, we have new operations that also use these checks. The iterators and the associated for loop syntax has exactly the same need for tampering checks as the Ada 2005 call-back iterators. The only difference is that clearing the tampering state is harder as there is no easy place to put that code. The usual solution is to use a controlled object, but of course that increases the overhead of the setup and removal of the tampering state. Again, this isn't too significant in most uses, as it occurs only once per loop, and the other overhead of setting up iteration can also be fairly significant. Ada 2012 also adds Reference/Constant_Reference. The design of these functions uses accessibility checks on the container (via it being an explicitly aliased parameter) and on the result to ensure that the returned access value cannot outlive the container. This eliminates the majority of dangling pointer issues. However, we still have the same problem as with Query_Element/Update_Element, in that a replacement or deletion of the element could leave the pointer accessing nothing. That might appear to be hard to do (given the generally short life of the result), but parameter passing and renames both can extend that lifetime arbitrarily. There is an example of this causing erroneous execution in AI05-0212-1. Thus, we determined that a tampering check is needed here, too. It's implementable using a controlled object that is finalized at the end of the lifetime of the access value. When the object is finalized, the tampering state is cleared. The problem, of course, is that this setup and tear-down of the tampering state is rather expensive, as it involves creating and destroying a controlled object. In addition, this is likely to be very frequent in Ada code, especially that using the new indexing syntax. Finally, a lot of these cases cannot have any tampering operations occur during their very short lifetime, so the check is just arbitrary overhead. In a typical loop, there will be one or more indexed accesses per iteration. So while the for loop will have one tampering setup/teardown per execution of the entire loop statement, similar overhead will be incurred per iteration by Reference/Constant_Reference. Thus the Reference/Constant_Reference overhead is likely to be several orders of magnitude more important than the loop overhead. If we change anything, this is the most important thing to change. --- We wanted the containers to be as similar as possible. Thus, we used the same tampering checks on all of the containers. However, the *need* for the tampering checks is not the same for each kind of container. In particular, the indefinite containers have to, by definition, allocate each element separately (as the size and shape of the element is not known until it is inserted). These containers also will have to delete and then reinsert an element that is replaced (at least in general), as again the size and shape could change. In these cases, the element memory will most likely be returned to the storage pool (often the standard storage pool) and reused (maybe by something completely unrelated). Later references to that memory will cause havoc. OTOH, the bounded containers will never return memory to a storage pool; the maximum size is fixed throughout the life of the container. Thus an element will continue to exist during the entire life of the container, and a reference to it will continue to point somewhere so long as the container itself hasn't been destroyed. As you might guess, the situation with the unbounded containers is somewhere in between. Some containers implementations might move an element when a container is expanded to allow more elements. For instance, the typical vector implementation expands its capacity by allocating a new, larger data area, copying all of the existing elements to the new data area, and then freeing the old data area. (Vectors don't have to work this way, a point we'll come back to later.) In such an implementation, an element insertion has a non-zero risk of making any existing references dangling. (This is an important reason for the tampering check covering any insertions or deletions, not just of the element itself.) Another interesting difference is between Update_Element and Reference. The data loss situation that can occur for elements passed by copy in Update_Element can't occur for Reference, because all access to the element is via the reference -- essentially by-reference. This means that for Reference and Constant_Reference, the only concern is about elements being inserted/deleted/replaced. --- There are two levels of the concern about elements being inserted/deleted/replaced for functions Reference and Constant_Reference. The primary concern is any case where the memory of the element could be deallocated and potentially reused, not just for another element of the same type, but possibly in some unrelated data structure. This is the classic erroneous execution problem, and it is well-known to be a major cause of security bugs and other pestilence. Solutions that eliminate these problems are highly desired, and we're certainly willing to pay some overhead to eliminate this. The secondary concern are cases where the element is deleted and the memory is reused for another element. (A closely related concern, which only happens for vectors, is cases where the insertion or deletion of an element causes the referenced element to move such that the reference no longer points at the same element.) In these cases, the reference still points at a valid element, but it's the *wrong* element. Such cases certainly can cause problems (say by writing components of the wrong element), but they can't cause erroneous execution by themselves. Moreover, these cases are essentially the same as mistakes that can happen with a fixed array of elements. If there is a way to cause erroneous execution from such a case, it's also possible to create the same erroneous execution from similar misuse of array elements (such examples usually involve renaming or parameter passing, both of which can be done with array elements just like container elements). As such, we don't want to have any extra cost in trapping these errors, but if we can detect them for free, that would be good. --- If you've still awake at this point, you're probably wondering where I'm going with all this. The key here is that bounded containers are not supposed to allocate or deallocate elements on the fly. They're supposed to be all allocated when the container is created. (There is implementation advice to this effect for each bounded container.) Thus the primary concern about functions Reference and Constant_Reference is impossible for a bounded container: the returned reference can only become dangling if the entire container is destroyed while the reference exists. (This can only happen if the container is explicitly deallocated with Unchecked_Deallocation or the similar subpool facilities while the reference exists. Tampering checks do not provide any help for this extreme circumstance.) To reiterate, for a bounded container (only), dereferencing the result of Reference cannot cause erroneous execution unless the container is itself deallocated. Otherwise, it will always continue to point to the memory of an element (it just might not be the right element). Moreover, for containers other than vectors, implementations can further reduce trouble this by waiting to reallocate elements as long as possible. That is, when inserting new elements, a previously used element should be used only if no other elements are available, and then the least recently used element should be reused. That makes it much more likely that any dangling references point at deleted elements (to which modifications are not going to cause any problems by themselves) as opposed to pointing at some other in-use element (to which modifications are likely to cause problems). Note that one can get erroneous execution in such cases by changing discriminants of known-to-be-constrained objects (such as formal parameters). But this is possible without involving any containers -- there is nothing new with this. --- As such, the tampering check on Reference and especially Constant_Reference for bounded containers is not doing a lot of good. For containers with individual nodes (all but vectors), we can give implementation advice to use least-recently used allocation of nodes. For an implementation that follows such advice, the reuse of nodes while a reference exists is unlikely (it can happen of course if the container is near capacity), so the danger of modifying the wrong element is minimal. The danger mainly comes from data loss, where code continues to update an element that has been deleted. It's probably not worth the overhead for such cases. That's especially true for Constant_Reference (where no updating is allowed anyway). The case with the bounded vector is not as clear-cut. Deleting or inserting an element usually causes the others to "slide" over, invalidating all cursors and references. These now all point at the wrong element. Thus, while a reference will still access data, it will always be the wrong data. There is an argument that this is not significant, as the same thing can happen with a fixed array. That argument however seems to mainly be that fixed arrays are buggy, so bounded vectors should also be buggy. :-) As such, it is not only reasonable to suppress the tampering check for Reference and Constant_Reference for bounded containers, but for most containers, it may be sensible to eliminate it altogether. That would make bounded containers slightly less safe than the unbounded form, but they also would be considerably faster. --- If we were to make some or all of the bounded containers different by eliminating their tampering check for Reference/Constant_Reference, there would clearly be an interoperability concern. Moving from a working bounded container to an unbounded one could cause some failures from tampering. (Going in the other direction would not cause any problems, of course.) It's hard to say how significant this is; most Reference/Constant_Reference calls are very-shortlived and couldn't tamper in any way. But there probably would be some longer-lived ones (via renaming or parameter passing) that could cause issues. --- Could we go further? Yes, but there are downsides. First, consider Update_Element. It has the additional data loss issue of by-copy elements. Moreover, it's tampering state setup is cheaper than Reference as no controlled object is needed. So a bit less safety in exchange for much less overhead gain (and in a lesser-used routine) is probably not worth a change. Query_Element is essentially the same as Constant_Reference, so it could be eliminated, but again there this isn't used much and the overhead is much less than Constant_Reference. We also could remove the tampering checks in the same way for the Unbounded forms if we were willing to require that implementations refrain from deallocating (as opposed to saving and reusing) elements when they are deleted/inserted/replaced. The problem with this is that it would prevent the contiguous implementation of vectors; an implementation would have to be chunked so that a capacity expansion did not move any existing elements to new memory. That's probably a too much of an imposition. --- On the other side of the tampering check, we could make the tampering checks (along with some of the other checks for containers) an explicit precondition on the containers routines. This is compatible as renaming/'Access does not require matching preconditions. This would allow dropping some of the English text in the RM, as a pleasant side-effect. (A lot of that text is really messy as it is trying to write something better described in pseudo-code in English.) That might not help the performance problem too much, at least not directly, but it would allow a compiler to remove redundant tampering checks (once a routine has made the check, it doesn't need to be repeated on future calls unless something that might change the tampering state intervenes), and it also would allow a compiler to know when no tampering failure is possible (which could be used to eliminate tampering overhead, particularly for Reference and Constant_Reference). --- Concluding. When I started writing this, I was pretty sure that the tampering check on Reference/Constant_Reference for bounded containers had no purpose at all. I now realize that I was wrong about that, as it does prevent reading or modifying the wrong element after a tampering operation. However, it is unlikely that such reading or modifying would cause erroneous execution (more specifically, the reading or writing of memory that does not currently belong to the container); it could only do so for discriminated records that already have such problems in many other non-container cases. Moreover, with care, most containers could ensure that such reading or modifying is only of an unused element, which is even less likely to cause problems. (Not for the bounded vector, unfortunately.) The main negative here is making the bounded containers more different (and somewhat less safe) than the other kinds of containers. I'm not sure how important we think having all of the containers be as similar as possible is. I do think we need to talk about this in a meeting setting, and perhaps I need to write up some examples of the cases that would slip through without a tampering check for bounded containers. So we probably need to make an AI for further consideration. *************************************************************** From: Geert Bosch Sent: Friday, February 14, 2014 1:22 PM (Let me preface this with noting that I read Randy's excellent write up from this morning, but this is not a direct reply.) > the cost is not the comparison, but the setting and reseting of the > tampering counter. Its management requires that references be somehow > controlled, so in an loop you are forced to create. initialize, and finalize an object with a controlled component. This makes simple loops (add the elements of a vector, say) more than an order of magnitude slower that the naive array code. It seems clear to me that anything involving controlled types for the cursor is going to be unusable. In particular, it is a really bad idea to require that creating a reference to a container must modify that container. Argument 1: Multiple tasks must be able to concurrently read the same thing =========================================================================== I shouldn't have to know whether some type is implemented with a container or otherwise to be able to concurrently read objects of that type from multiple tasks. As it is currently in GNAT, referencing a container requires updating a reference count in the container and the update is not protected and will cause erroneous execution if done concurrently from multiple tasks. Argument 2: References are fleeting, but have an unclear life time ================================================================== As references often are very local and live only for a single use (as in indexing), they should not require updating the container, which typically is a far more global data structure. Writing global data does not only require more care (see above, consider locking), especially on multiprocessors writing to non-local memory can be far slower. Along the jest of Randy's mail, we don't really need tampering checks all that much. What do we care about? What are our requirements? I see one big one. Requirement: Avoid erroneous execution ====================================== We don't want a program that attempts to update a container while holding references to it cause erroneous execution when using the reference after the update. Current situation for GNAT ========================== As shown above, today tampering checks not only slow things down, they CAUSE erroneous execution in multitasking programs concurrently reading a container, as concurrent references cause concurrent writes. Cause of the problem ==================== The problem is that we require the tampering checks on the operations modifying the container. This in turn requires updates to *know* about references, resulting in all issues above. Proposed solution ================= Instead of requiring the update to fail the tampering check, allow subsequent use of the now-stale references to fail the tampering check. A conceptual implementation would be as follows: - any operations that would currently require a tampering check will instead increase the version number (modification counter) of the container - any reference (such as a cursor) contains the version number of the container at the time the reference was created - any dereference checks the container for its version number, and raises Program_Error if it detects tampering Benefits ======== 1. References only need to read the container, so are fast. 2. Concurrent reads of containers are safe, even when iterating 3. It is OK to hold a reference and then perform an update while still holding the reference, as long as the reference is not used subsequently. 4. This approach could be easily extended to allow efficient concurrent updates, or detect concurrent usage, with only locking overhead for writes. My hope is that we can slightly rework the concept of tampering checks, so that we can get containers to be efficient. *************************************************************** From: Randy Brukardt Sent: Friday, February 14, 2014 2:43 PM ... > It seems clear to me that anything involving controlled types for the > cursor is going to be unusable. In particular, it is a really bad idea > to require that creating a reference to a container must modify that > container. You're confusing cursors and references here. There are no (mandated) tampering checks for cursors. Your counter solution for cursors is a good one for detecting cursor problems. My container implementations uses a similar solution for that, although mine uses per-element rather than per-container ids. (That solution was designed in 2006 or 2007; this is not a new idea.) A reference, however, is an access discriminant. It's lifetime is strictly known to be that of the enclosing object, and the (usually static) accessibility checks required ensure that it cannot be used after that lifetime without heroic efforts. We also (usually statically) ensure that the container lives as long as the reference. The tampering check is used to ensure that the *element* lives as long as the reference. [Note that tampering is only in effect during the (usually short) lifetime of the reference; it's not related in any way to the cursor.] Without that, you can get erroneous execution from a dangling pointer -- and that's one of the worst kinds of erroneousness as it means potentially writing someone else's memory. It's not worthwhile to detect that after the fact, because once some other data structure is corrupt, it will stay that way. It's too late to recover. Your proposal: > - any dereference checks the container for its version number, and raises > Program_Error if it detects tampering makes no sense for references, because they are just a normal access-to-object dereference. Thus the element modification takes place through a raw pointer. There is no opportunity to communicate that modification back to the container. I suppose that one could *assume-the-worst* that all references are modifications, but that does not help because you find out too late: once you've overwritten random memory, the program is unrecoverable. You'd usually know that the program is corrupt (unless of course the corruption was such that it made the counters look OK), but all you can do then is terminate the program. One would not want to use such a container in a rocket controller! > As shown above, today tampering checks not only slow things down, they CAUSE erroneous execution > in multitasking programs concurrently reading a container, as concurrent references cause > concurrent writes. Right, but no one is required to use a reference to read an element. Calling function Element involves no tampering check, the "problem" is that it requires a copy of the element. Ergo, if you want concurrent reads, just use Element rather than Constant_Reference. Of course, Ada does not guarantee concurrent reads in any case in the standard library. Of course, we could add such a guarantee (presumably for a very limited set of container operations). I suspect that it wouldn't be too hard to make a read guarantee work for the bounded containers, but for the much more general indefinite containers (the big advance, IMHO), I suspect that it would be problematical. One reason being that we have to have tampering checks to prevent erroneous execution of references there (as any replacement of an element causes reallocation), whereas their value for bounded containers is much more doubtful. (As described in detail in my previous message.) I'm somewhat dubious of the value of supporting concurrent reads (only), you're always going to need some sort of locking to handle the modifications, and it isn't going to add any additional overhead to use that for reads as well. (My spam filter has this problem, blacklists and whitelists and other data are rarely modified compared to the runtime of the filter, but one still has to get a lock before reading from them in order to ensure that the data isn't in the process of being updated.) What is the use case that you are considering?? *************************************************************** From: Tucker Taft Sent: Friday, February 14, 2014 3:37 PM I have had trouble following all of the ins and outs of this thread. But I definitely agree with this idea, though Randy implies it may be solving a different problem. The container library Bob implemented many years ago for our static analysis tool (CodePeer) uses a very similar approach, where you have a "modification count" or equivalent, and checks are performed that this modification count doesn't change, to detect the equivalent of tampering. I also wonder whether references might be short-lived enough that we could do something at compile-time to improve performance. In partial answer to something Randy mentioned, I think it would be reasonable to bump a "modification count" whenever you return a reference that supports update, even before any actual modification takes place. We use that approach for a software virtual-memory mechanism, also in our static analysis tool. *************************************************************** From: Bob Duff Sent: Friday, February 14, 2014 4:21 PM > I have had trouble following all of the ins and outs of this thread. I have not read all of it. I think the way to make containers fast is for someone (AdaCore?) to hack on them, and do measurements and profiling. Perhaps then language change suggestions might be in order. > But I definitely agree with this idea, though Randy implies it may be > solving a different problem. The container library Bob implemented > many years ago for our static analysis tool (CodePeer) uses a very > similar approach, where you have a "modification count" or equivalent, > and checks are performed that this modification count doesn't change, > to detect the equivalent of tampering. That's not how I remember it. As I recall, for something like Vectors, there were two types, one growable (call it G), and one fixed length (call it F). You create a G object, and append things onto it. You then call Freeze, which takes a G, and returns an F containing the same sequence of items. You now process the F. Freeze is implemented so it didn't have to copy the data. Once you call Freeze, the G object is destroyed, and any attempt to touch it raises an exception. It is often used like this: function Build_F_Obj(...) return F is G_Obj: G; -- [*] begin ...build G_Obj... return Freeze (G_Obj); end Build_F_Obj; so there's little opportunity for tripping over that exception, because G_Obj vanishes right after the Freeze. And once you call Freeze, you've got an F, whose length cannot change. So there's no need for tampering checks on F. This supports programs where the first phase builds up the data structure, and then the second phase processes it without modifying it (or, at least, without changing its size/shape). Pretty common. [*] As I recall, G is limited, and F is nonlimited. So at [*], we default init to empty. Today, I would use a build-in-place function Empty, and avoid use of defaults. That was many years ago, so I might be misremembering, and someone might have changed it in the meantime. *************************************************************** From: Randy Brukardt Sent: Friday, February 14, 2014 4:28 PM > I have had trouble following all of the ins and outs of this thread. That doesn't surprise me, there's a lot of misinformation out there. And I'm not good at explaining things concisely -- I hope that you manage to do that. :-) > But I definitely agree with this idea, though Randy implies it may be > solving a different problem. Yes, it's solving a different problem. Read the "problem" part of my message for an explanation of the problems that tampering is intended to solve. Geert's solution is really about detecting "invalid" cursors, which the containers have never required. We didn't require that because solutions like Geert's don't provide 100% detection, which is necessary in order to meet a requirement, or they have a lot of false positives. The hope was that implementations would implement a cheaper 95% solution, but one also has to be careful not to introduce too many false positives. Tampering of elements is mainly about preventing erroneous execution by reading or writing a no longer existing element -- either one that was passed as a parameter to the call-back of Query_Element or Update_Element, or one that we have a reference to from Reference/Constant_Reference. I've always considered these more dangerous than the cursor cases, because a container implementation can decide just how much erroneousness from invalid cursors it was willing to tolerate (which is of course a trade-off with performance). There is no possibility of detecting such problems in the reference or call-back cases, at least without a large amount of complication (because no false positives would be tolerated, just as false positives are not tolerated in the detection of ). The tampering check gives us well-defined bounds on what false positives are tolerated. It does make sense to allow an implementation to "suppress" the tampering check somehow, if it in fact would prefer maximum performance, erroneousness be damned. That would be consistent with the handling of the cursor cases, where an implementation can allow erroneousness for maximum performance if desired. (My understanding is that there is a debug implementation of the containers for GNAT that detects invalid cursors and a performance implementation that does not. It makes sense to treat tampering with elements the same way, because the problem is fairly similar.) *************************************************************** From: Geert Bosch Sent: Friday, February 14, 2014 11:56 PM > ... >> It seems clear to me that anything involving controlled types for the >> cursor is going to be unusable. In particular, it is a really bad >> idea to require that creating a reference to a container must modify >> that container. > > You're confusing cursors and references here. There are no (mandated) > tampering checks for cursors. I'm referring to this (A.18.2(97.1/3)): > When tampering with cursors is prohibited for a particular vector > object V, Program_Error is propagated by a call of any > language-defined subprogram that is defined to tamper with the cursors > of V, leaving V unmodified. Isn't this a mandated tampering check for cursors? > Your counter solution for cursors is a good one for detecting cursor > problems. My container implementations uses a similar solution for > that, although mine uses per-element rather than per-container ids. > (That solution was designed in 2006 or 2007; this is not a new idea.) The purpose is not to invent new ideas, but to fix the horrible inefficiencies (and lack of safety) that we have today. Today, a cursor has to be controlled as the container has to know about it in order to raise exceptions on operations that are defined to tamper with cursors. Containers having to know about references to them, whether cursors or otherwise, is the root of all evil. > A reference, however, is an access discriminant. It's lifetime is > strictly known to be that of the enclosing object, and the (usually > static) accessibility checks required ensure that it cannot be used > after that lifetime without heroic efforts. We also (usually > statically) ensure that the container lives as long as the reference. Right, that is helpful. Unfortunately cursors don't have that property, but there are ways to solve that with an extra level of indirection. > The tampering check is used to ensure that the *element* lives as long > as the reference. [Note that tampering is only in effect during the > (usually > short) lifetime of the reference; it's not related in any way to the > cursor.] Without that, you can get erroneous execution from a dangling > pointer -- and that's one of the worst kinds of erroneousness as it > means potentially writing someone else's memory. It's not worthwhile > to detect that after the fact, because once some other data structure > is corrupt, it will stay that way. It's too late to recover. For simplicity, I only recognize a single kind of erroneousness, and it always is the worst kind, and it always is too late to recover. Fortunately, access types are abstract in Ada. We can invent whatever implementation we want for them, including ones that look like: type Version_Number_Access is access Version_Number; type Element_Access is record Address : System.Address; Reference_Version : Version_Number; Current_Version : Version_Number_Access; end record; The compiler can implement the dereference of this fat pointer as: function Dereference (Ptr : Element_Access) return Element is begin if Ptr.Current_Version.all /= Ptr.Reference_Version then raise Program_Error with "failed tampering check"; end if; declare Item : Element; for Item'Address use Ptr.Address; begin return Element; -- return-by-reference end; end Dereference; > Your proposal: >> - any dereference checks the container for its version number, and >> raises Program_Error >> if it detects tampering > > makes no sense for references, because they are just a normal > access-to-object dereference. Thus the element modification takes > place through a raw pointer. There is no opportunity to communicate > that modification back to the container. AFAIK, we don't have to. Tampering with elements means replacing elements, not modifying them. > I suppose that one could > *assume-the-worst* that all references are modifications, but that > does not help because you find out too late: once you've overwritten > random memory, the program is unrecoverable. You'd usually know that > the program is corrupt (unless of course the corruption was such that > it made the counters look OK), but all you can do then is terminate > the program. One would not want to use such a container in a rocket controller! No, the point is that containers just manage elements. So containers care about tampering with cursors (adding/removing elements) or tampering with elements (replacing elements with different elements), not so much with modifying the elements managed by the container. The Cursor, Constant_Reference_Type and Reference_Type types in the end are all just references. Informally, cursors can become invalid because adding/removing elements might involve reorganization or reallocation of the datastructure causing any pointers to elements to become invalid. References can become invalid because the referenced element is no longer in the container and could be freed. >> As shown above, today tampering checks not only slow things down, they >> CAUSE erroneous execution >> in multitasking programs concurrently reading a container, as >> concurrent references cause concurrent writes. > > Right, but no one is required to use a reference to read an element. > Calling function Element involves no tampering check, the "problem" is > that it requires a copy of the element. Ergo, if you want concurrent > reads, just use Element rather than Constant_Reference. I don't see how this should make a difference. As long as we're only reading it shouldn't matter what kind of references or indexing we use. Remember, vectors should be able to be substitutes for arrays. We can concurrently read from multiple references to the same array element. The same should be true for vectors. > Of course, Ada does not guarantee concurrent reads in any case in the > standard library. Of course, we could add such a guarantee (presumably > for a very limited set of container operations). Repeating the "of course" twice doesn't really make it a stronger argument. If we'd violate the principle that concurrent reads are OK, that makes containers not a viable substitute for arrays. Vectors should have the same semantics as arrays. > I suspect that it wouldn't be too > hard to make a read guarantee work for the bounded containers, but for > the much more general indefinite containers (the big advance, IMHO), I > suspect that it would be problematical. It shouldn't be, or the implementation is flawed. In essence, if reading from a data-structure involves writing that data-structure, we're lost. > One reason being that we have to have > tampering checks to prevent erroneous execution of references there > (as any replacement of an element causes reallocation), whereas their > value for bounded containers is much more doubtful. (As described in > detail in my previous message.) Again, access types are not just memory addresses. We could implement access types as pairs of (integer index, version number), where the index points into an array of pointers and version numbers. That way we could detect dereferencing of access types whose target has disappeared or changed. > I'm somewhat dubious of the value of supporting concurrent reads > (only), you're always going to need some sort of locking to handle the > modifications, and it isn't going to add any additional overhead to > use that for reads as well. This is completely wrong. If we're not protecting against concurrent access, as is the case of regular data types such as arrays, no locking is needed for concurrent reads. They just work. A program wishing to update a container needs to ensure there are no concurrent readers, just as for any other variable. Updating the container concurrently with reading it is erroneous, just as for any other variable. With the version numbering scheme, it is easy to detect this. Reading never needs locking. > (My spam filter has this problem, blacklists and whitelists and other > data are rarely modified compared to the runtime of the filter, but > one still has to get a lock before reading from them in order to > ensure that the data isn't in the process of being updated.) What is > the use case that you are considering?? The general method is: read the version number, copy the data, read the version number again. If the number didn't change the data is good. Otherwise, repeat. This scheme ensures consistent data, as well as lock-free behavior: the system will always make progress, even though an individual task might starve. *************************************************************** From: Randy Brukardt Sent: Saturday, February 15, 2014 12:57 AM > > ... > >> It seems clear to me that anything involving controlled types for > >> the cursor is going to be unusable. In particular, it is a really > >> bad idea to require that creating a reference to a container must > >> modify that container. > > > > You're confusing cursors and references here. There are no > > (mandated) tampering checks for cursors. > > I'm referring to this (A.18.2(97.1/3)): > > When tampering with cursors is prohibited for a particular vector > > object V, Program_Error is propagated by a call of any > > language-defined subprogram that is defined to tamper with the > > cursors of V, leaving V unmodified. > Isn't this a mandated tampering check for cursors? No. This is a mandated tampering check (used in for loops and iterators) that no operations occur on the containers that would damage cursors. There is no mandated check on cursors themselves, and it's certainly not required that they are controlled. (They could be if the implementation wants that.) > > Your counter solution for cursors is a good one for detecting cursor > > problems. My container implementations uses a similar solution for > > that, although mine uses per-element rather than per-container ids. > > (That solution was designed in 2006 or 2007; this is not a > new idea.) > The purpose is not to invent new ideas, but to fix the horrible > inefficiencies (and lack of safety) that we have today. Today, a > cursor has to be controlled as the container has to know about it in > order to raise exceptions on operations that are defined to tamper > with cursors. This is definitely not true. There is little relationship between a cursor and a reference: a *reference* has to be controlled, a cursor does not. I think you need to read the container model much more closely before discussing it, because you have so much confused it takes an amazing amount of time to refute it. > Containers having to know about references to them, whether cursors or > otherwise, is the root of all evil. Then we have nothing that we could ever agree on vis-a-vis safety of containers. > > A reference, however, is an access discriminant. It's lifetime is > > strictly known to be that of the enclosing object, and the (usually > > static) accessibility checks required ensure that it cannot be used > > after that lifetime without heroic efforts. We also (usually > > statically) ensure that the container lives as long as the > > reference. > > Right, that is helpful. Unfortunately cursors don't have that > property, but there are ways to solve that with an extra level of > indirection. Cursors have nothing (directly) to do with tampering. "Tampering with cursors" is just a name. ... > Fortunately, access types are abstract in Ada. We can invent whatever > implementation we want for them, including ones that look like: > > type Version_Number_Access is access Version_Number; > type Element_Access is record > Address : System.Address; > Reference_Version : Version_Number; > Current_Version : Version_Number_Access; > end record; > > The compiler can implement the dereference of this fat pointer as: > > function Dereference (Ptr : Element_Access) return Element is > begin > if Ptr.Current_Version.all /= Ptr.Reference_Version then > raise Program_Error with "failed tampering check"; > end if; > > declare > Item : Element; > for Item'Address use Ptr.Address; > begin > return Element; -- return-by-reference > end; > end Dereference; Since Ada allows anonymous access types to be converted to other access types, you are saying that you want *all* general access types in Ada to have this representation. The odds of that happening are nil (it wouldn't be compatible with C, it would have huge overhead for people that never use the cursors, etc.). Besides, you're directly trying the same false starts we had when we tried to solve the user-defined dereference problem. We ended up with the access discriminant solution because it had the best overall properties. I doubt much has changed about that. Besides, this doesn't actually work. If the pointer is dereferenced and then passed as a reference parameter, and then tampering happens within the called subprogram, your check would pass, but use of the reference parameter is still erroneous. The existing tampering check catches that case (indeed, that case is the primary reason for this specific tampering check). > > Your proposal: > >> - any dereference checks the container for its version number, and > >> raises Program_Error > >> if it detects tampering > > > > makes no sense for references, because they are just a normal > > access-to-object dereference. Thus the element modification takes > > place through a raw pointer. There is no opportunity to communicate > > that modification back to the container. > > AFAIK, we don't have to. Tampering with elements means replacing > elements, not modifying them. Correct. In which case I have no idea what your proposal is intended to detect, because it won't detect any tampering at all. > The Cursor, Constant_Reference_Type and Reference_Type types in the > end are all just references. Informally, cursors can become invalid > because adding/removing elements might involve reorganization or > reallocation of the datastructure causing any pointers to elements to > become invalid. > References can become invalid because the referenced element is no > longer in the container and could be freed. Right. But these are handled very differently, because the consequences are very different. With a cursor, you always have to use some other operation for it to have any effect. That other operation can make any checks needed. With a reference, since you have a bare pointer, we can't make the checks that way, and we have to have tampering. > >> As shown above, today tampering checks not only slow things down, > >> they CAUSE erroneous execution in multitasking programs > >> concurrently reading a container, as concurrent references cause > >> concurrent writes. > > > > Right, but no one is required to use a reference to read an element. > > Calling function Element involves no tampering check, the "problem" > > is that it requires a copy of the element. Ergo, if you want > > concurrent reads, just use Element rather than Constant_Reference. > > I don't see how this should make a difference. It makes a difference because when an element is copied, it no longer matters what happens to the original element afterwards. It's the same reason that one uses temporaries rather than calculating values in place. > As long as > we're only reading it shouldn't matter what kind of references or > indexing we use. Remember, vectors should be able to be substitutes > for arrays. We can concurrently read from multiple references to the > same array element. The same should be true for vectors. If we had ragged arrays (like the indefinite containers), you'd have all of the same problems. Only a bounded container is anything like an array, and I could see making an exception for those. But really, you are asking for more than Ada guarantees about anything. Unless you have atomic or protected objects, having multiple tasks access (or call!) anything is always risky. That's a flaw in Ada, but it is pervasive. One hopes that the parallel subgroup will come up with some proposals that will help that. You can get away with multiple access to an array so long as the components are disjoint. But that works simply because there is no control structure associated with it; if there is, as in the case of the containers, access becomes much more problematical. > > Of course, Ada does not guarantee concurrent reads in any case in > > the standard library. Of course, we could add such a guarantee > > (presumably for a very limited set of container operations). > Repeating the "of course" twice doesn't really make it a stronger > argument. > If we'd violate the principle that concurrent reads are OK, that makes > containers not a viable substitute for arrays. > Vectors should have the same semantics as arrays. Only protected (and atomic) objects are truly safe when used from tasks. So in this way, they *are* the same as arrays. :-) More seriously, if you're willing to restrict your view to what can be accomplished with arrays (as in the bounded vectors), then I can see your point. But a vector (in general) has many capabilities that an array does not, especially in allowing elements of different sizes and shapes. And those compatibilities require complex implementations that would be hard to make task safe AND memory safe. > > I suspect that it wouldn't be too > > hard to make a read guarantee work for the bounded containers, but > > for the much more general indefinite containers (the big advance, > > IMHO), I suspect that it would be problematical. > > It shouldn't be, or the implementation is flawed. In essence, if > reading from a data-structure involves writing that data-structure, > we're lost. No, you're lost. The whole advantage of the containers to me is that they manage memory and eliminate almost all erroneous execution from dangling pointers. (Iterators also have a termination guarantee.) To eliminate those guarantees in order to have a dubious read-only tasking guarantee is going in the wrong direction. There would be value to a fully task-safe container implementation, but it would necessarily be quite expensive. > > One reason being that we have to have tampering checks to prevent > > erroneous execution of references there (as any replacement of an > > element causes reallocation), whereas their value for bounded > > containers is much more doubtful. (As described in detail in my > > previous message.) > Again, access types are not just memory addresses. We could implement > access types as pairs of (integer index, version number), where the > index points into an array of pointers and version numbers. That way > we could detect dereferencing of access types whose target has > disappeared or changed. That's never going to happen. It's been proposed before but the overhead would be crippling. Indeed, one of the main reasons cursors are private types is so that they can have an implementation like the one you're talking about. But it doesn't work for references (both because the check is too soon, and because you can't change the representation of an Ada general access type, since they're all interconvertable modulo accessibility checks). > > I'm somewhat dubious of the value of supporting concurrent reads > > (only), you're always going to need some sort of locking to handle > > the modifications, and it isn't going to add any additional overhead > > to use that for reads as well. > This is completely wrong. If we're not protecting against concurrent > access, as is the case of regular data types such as arrays, no > locking is needed for concurrent reads. They just work. A program > wishing to update a container needs to ensure there are no concurrent > readers, just as for any other variable. Exactly. And the only way I know (that doesn't pervert the program structure) to prevent concurrent readers is to have them test an "updating" lock before they do any reading, and free it afterwards. You could have the updater call an entry in the readers to tell them to pause, but that really puts irrelevant stuff into the readers (and has at least has as much overhead as a lock). It makes much more sense to have the entire data structure protected by an appropriate locking system. The only time your suggestion would make sense is in the case of a data structure that is *never* modified after initialization. But those structures are much rarer in practice than initially seems to be the case. (Almost every structure that I've ever designed that way had to be changed at some future point to allow modifications.) > Updating the container concurrently with reading it is erroneous, just > as for any other variable. With the version numbering scheme, it is > easy to detect this. Reading never needs locking. > > > > (My spam filter has this problem, blacklists and whitelists and other > > data are rarely modified compared to the runtime of the filter, but > > one still has to get a lock before reading from them in order to > > ensure that the data isn't in the process of being > > updated.) What is the use case that you are considering?? > The general method is: read the version number, copy the > data, read the version number again. If the number didn't > change the data is good. Otherwise, repeat. > This scheme ensures consistent data, as well as lock-free > behavior: the system will always make progress, even though > an individual task might starve. Such a system is likely to miss deadlines. Hard to say if that is acceptable or not. For my spam filter, that would require a version number on every individual item (since reads have a very small granularity), or copying all of the data. Neither seems like a good option. Remember that the containers, like an array, are designed to allow using of elements in place. You *never* have to copy an element. I don't see how this scheme would work when updating parts of elements in place (which is the normal case). Anyway, please explain a scheme that actually detects the bugs that tampering prevents, and then it will make much more sense to talk about this. It's frustrating when someone is solving different problems than the ones the checks actually prevent. *************************************************************** From: Randy Brukardt Sent: Saturday, February 15, 2014 8:19 PM The problem with quickly answering messages on the way out of the office is that you don't organize your thoughts very well. Thus I want to clarify a point that's somewhat off-topic here, hopefully so we don't discuss it more (certainly not in this thread). I said: > Unless you have atomic or protected objects, having multiple tasks > access (or call!) anything is always risky. That's a flaw in Ada, but > it is pervasive. By this, I mean that multiple task access to raw Ada data structures like an array is risky. It's risky because safe access always requires some sort of protocol, be it "no access after initialization", or "check version number and retry if changed", or "lock before use", or whatever. Just accessing an array from multiple tasks without any sort of protocol is likely going to cause a latent bug of some sort. And that is a very hard bug to find, as it is likely to occur rarely, and also it is an error of omission (which are usually the hardest ones to find). The Ada language provides very little help for this, in that it doesn't have many ways to prevent access that avoids the protocol. The best thing one can do is to completely encapsulate the data structure (array in this example) in a package, but once that's done, the "arrayness" has disappeared -- we just have another ADT, essentially another container. Essentially, if you want a multi-tasking guarantee in Ada, you need to write a package that provides it. The standard containers are no different in this regard. Whether Ada should have some purpose-built multi-tasking safe containers is another question, but they clearly would need a different design (or abandonment of safety). *************************************************************** From: Randy Brukardt Sent: Saturday, February 15, 2014 9:06 PM ... > Besides, this doesn't actually work. If the pointer is dereferenced > and then passed as a reference parameter, and then tampering happens > within the called subprogram, your check would pass, but use of the > reference parameter is still erroneous. The existing tampering check > catches that case (indeed, that case is the primary reason for this > specific tampering check). In case someone actually wants to see some of the examples that the reference tampering check is intended to prevent, and to have them ready for the eventual AI, I've created some examples: package Pkg is type Item is record C : Character := 'R'; ... end record; package Item_Vector is new Ada.Containers.Vectors (Element_Type => Item, Index_Type => Positive); Item_List : Item_Vector.Vector; procedure Process_Item (Obj : in out Item); end Pkg; package body Pkg is procedure Process_Item (Obj : in out Item) is begin ... -- Split Item: Item_List.Append (Item'(C => Character'Pred(Obj.C), ...)); -- [1] Obj.C := Character'Succ(Obj.C); -- [2] ... end Process_Item; end Pkg; with Pkg; procedure Main is begin Pkg.Item_List.Append (Item'(C => 'M', ...)); Pkg.Item_List.Append (Item'(C => 'T', ...)); Pkg.Process_Item (Pkg.Item_List(2).all); -- [3] end Main; The call at [3] uses the Variable_Indexing aspect and expands into Pkg.Process_Item (Pkg.Item_List.Reference(2).Element.all); In this case, tampering with elements is prohibited until the call Process_Item returns. This means that the call at [1] fails a tampering check. This is important; if the Append call required an expansion of the vector, and the vectors' implementation uses a reallocation strategy (this was Matt's original reference implementation of unbounded vectors), then the element Item_List(2) will be copied to the new (larger) vector data area, and the original one will be freed. If the parameter Obj is passed by reference (as it must be if Item is a by-reference type, for example if it is tagged), then the parameter object no longer exists at the location referenced by the parameter, and the modification at [2] will occur into memory no longer owned by the container. This is classic use of a dangling pointer. The parameter Obj being passed by copy does not help. In that case, the modification at [2] will be OK, but the copy-back after the call will be to non-existent memory. Moreover, the problem occurs in the by-copy case even if the object is never touched after the tampering call. Note that at the point of the dereference, there is no problem. The problem happens later, AFTER the dereference, but while the reference object exists. Also note that this is a new problem to the containers; it does not happen with arrays. That's because arrays have no way to create or delete or replace elements. The set of elements is fixed for the entire life of an array object. We don't want the implicit use of Unchecked_Deallocation inside of the containers to leak out and cause new erroneous cases. Even when the parameters in question have mode "in", a version of the problem still exists, since no modification of the container is necessary to cause problems. While reading non-existent memory is less dangerous than writing it (as it can't corrupt the program directly), it still can cause the program to fault (for instance, if the deallocated memory was returned to the target OS/kernel and it was unmapped from the process's address space). So the read at [2] has a risk of causing a program fault and thus it has to be formally erroneous in this case. A similar problem can happen if the reference is passed as an access parameter: package Pkg2 is type Item is record C : Character := 'R'; ... end record; package Item_Vector is new Ada.Containers.Vectors (Element_Type => Item, Index_Type => Positive); Item_List : Item_Vector.Vector; procedure Process_Item (Obj : access Item); end Pkg2; package body Pkg is procedure Process_Item (Obj : access Item) is begin ... -- Split Item: Item_List.Append (Item'(C => Character'Pred(Obj.all.C), ...)); -- [4] Obj.all.C := Character'Succ(Obj.all.C); -- [5] ... end Process_Item; end Pkg; with Pkg; procedure Main is begin Pkg.Item_List.Append (Item'(C => 'M', ...)); Pkg.Item_List.Append (Item'(C => 'T', ...)); Pkg.Process_Item (Pkg.Item_List(2)); -- [6] end Main; In this case, the parameter acts as if it is passed by-reference, so the by-copy issues can't happen, but the remaining issues still are possible. Note that the reference (an anonymous access discriminant) is implicitly converted to the anonymous access parameter. The accessibility rules are still in place; an attempt to assign the access parameter to a long-lived access type should raise Program_Error. In this example, the dereference occurs after the tampering, but it's possible to use a renaming to move that before the tampering: My_Obj : Item renames Obj.all; In any case, passing objects as parameters is fundamental to Ada; having them be unsafe is not acceptable. As I've previously noted, these particular examples can't cause erroneous execution this way for the bounded containers (as the underlying memory is never allocated or freed on the fly, only when the container is created or destroyed), so it might make sense to eliminate the tampering check there. But there is another way for tampering with a container to cause erroneous execution. Specifically, AI05-0212-1 gives an example of how vector sliding can cause erroneous execution for a reference. (It's rather long so I won't repeat it here.) In this case, the erroneous execution is caused by changing the discriminants of a constrained object. Argubly, we could tolerate this erroneousness, since we do so for other kinds of calls (see 3.7.2(4)). We never decided if that was OK as we prevented it with no extra cost with the same check that prevents the first problem. Hope this sheds some light on the purpose of the tampering checks for Reference/Constant_Reference. *************************************************************** From: Bob Duff Sent: Sunday, February 16, 2014 8:35 AM > Even when the parameters in question have mode "in", a version of the > problem still exists, since no modification of the container is > necessary to cause problems. While reading non-existent memory is less > dangerous than writing it (as it can't corrupt the program directly), > it still can cause the program to fault (for instance, if the > deallocated memory was returned to the target OS/kernel and it was > unmapped from the process's address space). So the read at [2] has a > risk of causing a program fault and thus it has to be formally erroneous in > this case. Well, you don't need unlikely events like "unmapped from address space" to see bad behavior. If X is a dangling pointer, then: Y := X.all.Next; Y.all := ...; will overwrite arbitrary memory locations. So yes, the fact that X is an in-mode parameter doesn't help. > Hope this sheds some light on the purpose of the tampering checks for > Reference/Constant_Reference. Yes, thank you. I think I sort-of knew all that, but it was helpful to have it clearly laid out. *************************************************************** From: Bob Duff Sent: Wednesday, June 4, 2014 2:16 PM [From an administrative thread:] >...AI 111 is on performance (rather than strictly real-time), of >Containers, but I get the impression that Randy isn’t too keen on >relaxing the safety checks, and I think it’s too early to discuss >until there’s at least one implementation of the Containers that >passes the ACATS tests (get it functionally correct then think about >tuning it). Good point. But even then it will be premature to discuss in ARG. The way to attack that problem is to hack on (some implementation of) containers, profile them, and experiment with possible language changes, possible compiler changes, etc. Making language changes first, in the hopes that they might make containers efficient, won't work -- too much guesswork. The moral of the story is that the folks who say standards committees should be standardizing existing practice, rather than inventing, are right. *************************************************************** From: Robert Dewar Sent: Wednesday, June 4, 2014 2:23 PM > Good point. But even then it will be premature to discuss in ARG. > The way to attack that problem is to hack on (some implementation of) > containers, profile them, and experiment with possible language > changes, possible compiler changes, etc. Making language changes > first, in the hopes that they might make containers efficient, won't > work -- too much guesswork. I agree. Let GNAT figure out how to make containers usable, we have an enormous incentive to do that, many of our customers are seriously worried by the horrible performance compared to C++ and we have to do something about it. > The moral of the story is that the folks who say standards committees > should be standardizing existing practice, rather than inventing, are > right. Especially at this stage. *************************************************************** From: Brad Moore Sent: Thursday, October 16, 2014 9:32 AM The discussion section of this AI shows an example of tampering with elements that the tampering checks detect, basically involving obtaining a reference to an element of a vector, and then using that reference to call a procedure which appends an item to container, which could cause the array to be increased in size, leaving the original reference pointing to garbage. I'm wondering if it would be possible to catch this sort of error at compile time rather than at run time. This is the approach taken by the Rust programming language, for a similar example. See http://doc.rust-lang.org/nightly/intro.html fn main() { let mut v = vec![]; v.push("Hello"); let x = &v[0]; v.push("world"); println!("{}", x); } This fails to compile because the second push call it attempting to "borrow" a mutable reference to vector v, while an immutable "borrow" exists (the declaration of x) Maybe Ada could have a compiler mode where it does this sort of checking, or maybe the Global aspects proposal of AI12-0079 could help here. If we had a way to rely on these sorts of errors being caught at compile time, then maybe the need for expensive run-time checks could be eliminated. *************************************************************** From: Robert Dewar Sent: Saturday, October 18, 2014 8:25 AM > The discussion section of this AI shows an example of tampering with > elements that the tampering checks detect, basically involving > obtaining a reference to an element of a vector, and then using that > reference to call a procedure which appends an item to container, > which could cause the array to be increased in size, leaving the > original reference pointing to garbage. > > I'm wondering if it would be possible to catch this sort of error at > compile time rather than at run time. VERY messy to have the compiler have to know this much about container packages, so best would be if some pragmas could be devised that makes the relevant check more general. *************************************************************** From: Randy Brukardt Sent: Tuesday, May 26, 2015 3:55 PM > The discussion section of this AI shows an example of tampering with > elements that the tampering checks detect, basically involving > obtaining a reference to an element of a vector, and then using that > reference to call a procedure which appends an item to container, > which could cause the array to be increased in size, leaving the > original reference pointing to garbage. > > I'm wondering if it would be possible to catch this sort of error at > compile time rather than at run time. I keep thinking about this, and wanted to write something up so I can stop thinking about it and do something actually on my work list. The answer to Brad's question is yes, and in fact the main thing we need is the proposal for the Global aspect. But the result would be wildly incompatible (because of course no existing subprogram has a global aspect). Specifically, tampering with elements effectively is the same as saying that the container cannot be written (other than by the reference itself, of course). So, when tampering with elements is prohibited, any attempt to write the container other than by the reference could be an error. (We'd probably want to somehow define an aspect to mean this, so the rules aren't completely aribtrary.) That's easy for direct uses (as when the result of Reference is renamed), but it's messier when calls are involved (the most important such case being when the result of Reference is passed as a parameter). Here the (proposed) Global aspect is a huge help: the rule could be that the call is illegal if the subprogram's Global aspect allows writing of the container. Since the vast majority of calls would not even know about the container in question, most calls would work fine. The problem here being that the default meaning if there is no Global aspect is "writes all", which clearly would have to be illegal. We could mitigate that somewhat by making the error suppressible (assuming that idea gets implemented), but it still would be too incompatible (in my view, anyway) to use it unconditionally on the existing containers. Thus, the idea would be best used on new containers (specifically the task-safe ones designed for use in parallel loops and the like). Tucker had noted that the "of" form of iterator is a combination of an iterator and a reference, so the same compile-time tampering check would be OK there as well (the reference needs the strong check in any case, and it's stronger than the check for the iterator). However, it would be way too fierce for tampering with cursors situations for regular iterators (it would prevent modifying the element in an iterator, which of course would be nonsense). So it would seem that would need to remain a run-time check. That's probably OK, the number of iterators in a program is far less than the number of calls to Reference. *************************************************************** From: Tucker Taft Sent: Saturday, June 4, 2016 9:59 PM Here is a new version of AI-111 [Version /03 - Ed] that proposes defining a child package "Stable" for each container package, in which a "stable" container type is defined. A stable container object has an access discriminant which can designate an underlying regular container, to provide a "stabilized view" of that container. While the stabilized view exists, no tampering of the underlying container is permitted. A stable container type includes essentially all of the operations of the "regular" container type that do not change the "shape" of the container. This AI also proposes to eliminate the notion of "tampering with elements" for all but containers with elements having an indefinite type. *************************************************************** From: Tucker Taft Sent: Sunday, June 5, 2016 9:41 AM After more thought, it seems we might consider making the Stable.Vector type limited, and require the Base discriminant to be non-null. It is going to be build-in-place anyway, and ":=" won't be reliable so the Assign procedure will be needed in most cases, so there really isn't much benefit in making it non-limited and having the special case for Base=>null. If we do this, then stable vectors created by functions like Copy or To_Vector will become the perfect candidates for use of co-extensions. ;-) For what it is worth, the "&" operators are other ways to create values of the Stable.Vector type, which wasn't mentioned in the write-up. *************************************************************** From: Randy Brukardt Sent: Tuesday, June 7, 2016 5:31 PM > Here is a new version of AI-111 that proposes defining a child package > "Stable" for each container package, in which a "stable" container > type is defined. A stable container object has an access discriminant > which can designate an underlying regular container, to provide a > "stabilized view" > of that container. While the stabilized view exists, no tampering of > the underlying container is permitted. A stable container type > includes essentially all of the operations of the "regular" container > type that do not change the "shape" > of the container. For the record, I changed this AI into the form of an Amendment, since adding dozens of Stable subpackages and removing a check seems way beyond a Binding Interpretation. > This AI also proposes to eliminate the notion of "tampering with > elements" for all but containers with elements having an indefinite > type. The way you did this is as follows: Delete 18.2(95/2-97/2) (and similar paragraphs except for Indefinite_*) talking about tampering with elements. The problem with this is that the Indefinite_* containers never talk about tampering with elements at all (the word "tampering" does not appear in A.18.11 for Indefinite_Vectors, for instance). You have to *move* the existing wording to the indefinite clauses, you can't just delete it in the regular containers. [I also assume that you are replacing "tampering with elements" with "tampering with cursors" in the places where it currently says "tampering with elements", lest reallocation on growth not be allowed as an implementation strategy (that's the "canonical" implementation of a vector). If you're doing that, you need to say so somewhere.] Overall, I can't figure out how to get from your suggested change to your intended result. *************************************************************** From: Tucker Taft Sent: Tuesday, June 7, 2016 9:54 PM > ... Overall, I can't figure out how to get from your suggested change > to your intended result. Actually, you seemed to have figured out my intent. Hopefully it will also be clear to the rest of the ARG. We can fix up the wording after the meeting, if there is support for the approach. *************************************************************** From: Erhard Ploedereder Sent: Sunday, June 12, 2016 8:53 AM In response to questions raised... It was the reading from Containers that was exceptionally slow. Writing was reasonably competitive with other languages. Hashed sets were called out as culprits (this may well be an "e.g." rather than pointing out the only sinner); the difference to other languages was 4x to 8x execution time. Other containers used were Vectors, Doubly_linked_lists and Hash maps. (Orthogonally: Use of indefinite containers in contrast to the "normal" ones added a factor of ~2.) *************************************************************** From: Randy Brukardt Sent: Monday, July 11, 2016 9:17 PM Could you find out *which* operations that were used to read from Containers? I'd expect the performance of Element, Query_Element, and Constant_Reference (the three main ways to read) to be very different (with Query_Element being the fastest and Element being the slowest for most element types). (Of course, regular indexed notation expands into calls on Constant_Reference.) After all, Element essentially makes a copy of an element, whereas the other two use a direct reference into the element. *************************************************************** From: Jeff Cousins Sent: Tuesday, August 16, 2016 6:21 AM Erhard - do you know specifically which reads were slow? Presumably you've already done the legwork of taking the measurements, but which did you measure? *************************************************************** From: Erhard Ploedereder Sent: Wednesday, August 17, 2016 7:18 AM A while ago, I cleaned the experiment from my disk, as changes to the containers made the results obsolete anyway. As such, I have to see whether I can retrieve this from backup somehow and I have not got around to doing the archeology. It was not my experiment, but one of my assistants, so mmy memory alone does not serve. *************************************************************** From: Jeff Cousins Sent: Wednesday, August 17, 2016 8:01 AM Ok, thanks Erhard. *************************************************************** From: Erhard Ploedereder Sent: Saturday, August 20, 2016 7:31 PM I did ask my assistant to look at the problem again. First, I barked up the wrong tree in my cited message. Apologies. The slowness comes from inserting entities into hashed sets after reading them from external files. So it is the writing to hashed sets that is slow. Simply invert the quoted statements below wrt reading/writing. The time for reading/writing these files inclusive of storing/retrieving the entities from containers was measured. These measurements were compared among multiple languages using their respective container packages. Hence my inverted statement about reading/writing. Times for reading the files were really bad for Ada in comparison to the others. Culprit was/is the writing to the container. The slow writing to hashed sets is done by Ada.Containers.Hashed_Sets procedure Include (Container : in out Set; New_Item : Element_Type); The comparably much faster reading is done by cursor over the hash set while writing the retrieved entities to a file. These possible causes were suspected (but not verified as the true causes): - unnecessary additional allocations - seemingly irrelevant computations (maybe tampering checks?) - generality to deal with Element_Type of size not known at compile time - exception handling preventing inlining of calls *************************************************************** From: Jeff Cousins Sent: Sunday, August 21, 2016 1:49 AM Thanks for looking that up Erhard. Good to get the info. *************************************************************** From: Tucker Taft Sent: Sunday, August 21, 2016 12:19 PM For what it is worth, AdaCore has re-implemented most of the containers, and has added the ability to suppress either just tampering checks, or all checks, on container operations. Might be worth trying with the latest GNAT container implementations. *************************************************************** From: Tucker Taft Sent: Sunday, October 2, 2016 9:10 PM Here is an update on the stable containers AI. [This is version /04 - Editor.] I don't believe Raphael had time to prototype this. Major changes -- made this into a subpackage, and made the Base discriminant "not null" with no default. Began describing the semantics of the operations of the package, perhaps most importantly, To_Vector and Copy. Also added a version of Copy that takes an unstable vector as its parameter. Eliminated the Capacity parameter to Copy, as that doesn't make much sense for a stable vector. *************************************************************** From: Raphael Amiard Sent: Monday, October 3, 2016 3:21 PM I was in holidays for two weeks - I'll try not to take vacations just before an ARG meeting in the future ;) - but I'm well on my way prototyping stable vectors, I hope to have some results for the ARG ! I do have some questions though: type Vector (Base : not null access Vectors.Vector) is tagged limited private with Constant_Indexing => Constant_Reference, Variable_Indexing => Reference, Default_Iterator => Iterate, Iterator_Element => Element_Type; Why do you need the Base to be an access discriminant ? The rationale may be obvious, it is not mentioned in the AI, but to my mind it mostly causes problems in the implementation. How do you plan to define Empty_Vector, knowing that Base can't be null, and that you cannot use Vectors.Empty_Vector because it will appear later ? *************************************************************** From: Tucker Taft Sent: Monday, October 3, 2016 3:35 PM > Why do you need the Base to be an access discriminant ? The rationale > may be obvious, it is not mentioned in the AI, but to my mind it > mostly causes problems in the implementation. Without an access discriminant, I can't imagine how the user will specify which unstable vector the stable vector will control. > ... How do you plan to define Empty_Vector, knowing that Base can't be null, > and that you cannot use Vectors.Empty_Vector because it will appear later ? Good point. We probably have to make Empty_Vector into a parameterless function, now that this is a nested package. E.g.: function Empty_Vector return Vector is begin return Result : Vector(Base => new Vectors.Vector'(Empty_Vector)); end Empty_Vector; The result would be "built in place" at the point of call. *************************************************************** From: Raphael Amiard Sent: Monday, October 3, 2016 3:56 PM > Without an access discriminant, I can't imagine how the user will specify > which unstable vector the stable vector will control. Via a constructor function ? That would be a better option I think, make the type completely private, and have one/several constructor functions if you need to enable the user to construct them directly. Also what do you mean by user ? To my mind a stable container can only be constructed via functions on the container itself, something like: function Stable_View (Self : Vector) return Stable.Vector; >>... How do you plan to define Empty_Vector, knowing that Base can't be null, >>and that you cannot use Vectors.Empty_Vector because it will appear later ? >Good point. We probably have to make Empty_Vector into a parameterless >function, now that this is a nested package. E.g.: > > function Empty_Vector return Vector is > begin > return Result : Vector(Base => new Vectors.Vector'(Empty_Vector)); > end Empty_Vector; > >The result would be "built in place" at the point of call. I think this is a bit kludgey, I'd like to be sure that the access discriminant is needed first. *************************************************************** From: Tucker Taft Sent: Monday, October 3, 2016 4:18 PM > Via a constructor function ? That would be a better option I think, > make the type completely private, and have one/several constructor > functions if you need to enable the user to construct them directly. Not quite sure how that would work from an accessibility point of view. Remember that we are taking an existing object, and creating a wrapper around it. Using an access discriminant the only requirement is that the existing object be aliased, and since Vector is a tagged type, when passed as a parameter it is always aliased. > Also what do you mean by user ? To my mind a stable container can only > be constructed via functions on the container itself, something like: > > function Stable_View (Self : Vector) return Stable.Vector; That is not the model proposed by this AI. Instead, you write: X : aliased My_Vectors.Vector; ... declare Stable_X : My_Vectors.Stable.Vector(Base => X'Access); ... >>> ... How do you plan to define Empty_Vector, knowing that Base can't >>> be null, and that you cannot use Vectors.Empty_Vector because it will appear later ? >> >> Good point. We probably have to make Empty_Vector into a >> parameterless function, now that this is a nested package. E.g.: >> >> function Empty_Vector return Vector is >> begin >> return Result : Vector(Base => new Vectors.Vector'(Empty_Vector)); >> end Empty_Vector; >> >> The result would be "built in place" at the point of call. > > I think this is a bit kludgey, I'd like to be sure that the access > discriminant is needed first. This situation is in some sense exactly what access discriminants were designed for. It seems odd not to use them since they are the most natural solution for this sort of wrapper. *************************************************************** From: Bob Duff Sent: Monday, October 3, 2016 4:24 PM > Good point. We probably have to make Empty_Vector into a > parameterless function, now that this is a nested package. E.g.: > > function Empty_Vector return Vector is Functions are better anyway, because they can be overloaded. It's annoying when the compiler complains, "Which Empty_Vector did you mean?" "The one of the right type, of course, YA BIG DUMMY!!" ;-) *************************************************************** From: Raphael Amiard Sent: Monday, October 3, 2016 4:29 PM Ok, thank you for the precisions, that makes sense. I'll go for the Empty_Vector function then. *************************************************************** From: Edmond Schonberg Sent: Monday, October 3, 2016 5:05 PM > Ok, thank you for the precisions, that makes sense. I'll go for the > Empty_Vector function then. Not the precisions again :-)! *************************************************************** From: Randy Brukardt Sent: Monday, October 3, 2016 4:42 PM ... > Functions are better anyway, because they can be overloaded. > It's annoying when the compiler complains, "Which Empty_Vector did you > mean?" "The one of the right type, of course, YA BIG DUMMY!!" ;-) There'd be no major problem in allowing constants to be overloaded (the profile is obvious, just like it is for enumeration literals). (Having variables overloaded would be a bit weird, but not a difficult problem either.) It's unfortunate that Ichbiah didn't take the model to that obvious point, because most of the issues with maintenance problems come from non-overloadable entities. I researched this once in the past (I think in the context of the "integrated package" proposal) and concluded that it wouldn't cause real issues (mostly, it would make illegal programs legal, which hardly could be a problem). But there is a lot of FUD when it comes to visibility, and I fully understand. Plus it would be fairly complex in terms of language wording, so I don't suppose there would be enough support to try again. (Similarly, there is no problem with allowing untagged types in the private part of a protected object [no other unit has visibility, so the usual problems of where operators become visible in the visible view are moot]. Another thing that there is no will to change. I'd rather fix these annoyances, which would make Ada better for everyone, rather than spend time on limited-use features. YMMV.) *************************************************************** From: Randy Brukardt Sent: Monday, October 3, 2016 5:41 PM > Here is an update on the stable containers AI. In the third paragraph of A.18.2.1, there is a Redundant section which ends (but does not start) with a paren. I put a starting paren at the start of the Redundant text (that's what was in the previous version). Perhaps you meant to delete both parens instead? (A closing paren with no opening paren does not work.) ***************************************************************