Version 1.1 of ai05s/ai05-0141-1.txt

Unformatted version of ai05s/ai05-0141-1.txt version 1.1
Other versions for file ai05s/ai05-0141-1.txt

!standard 13.11          09-02-13 AI05-0141-1/01
!class Amendment 09-02-13
!status work item 09-02-13
!status received 09-01-19
!priority Medium
!difficulty Medium
!subject User-dereferencing in Storage Pools
!summary
(See proposal.)
!problem
Dangling access values are a scourge of Ada programming. There are various circumstances (especially for debugging) when it would be valuable to detect the use of dangling access values (created by Unchecked_Deallocation). There are a variety of ways to create a storage pool which could detect such accesses, but the Ada storage pool definition does not have the routines necessary for such detection.
It is proposed to add those routines compatibility to the Root_Storage_Pool definition.
The proposed changes can also be used to build a persistent pool and to build a pool for limited-lifetime access values -- see the examples.
!proposal
Add three additional routines to Root_Storage_Pool so that pools can return and manage access values as handles to memory as opposed to just raw memory addresses. One of the new routines converts an access value into a raw memory address for use (which also provides opportunities for additional runtime checking).
!wording
Add the following routines to Root_Storage_Pool:
procedure Dereference (Pool : in out Root_Storage_Pool; Storage_Address : in out Address) is null;
procedure Export_from_Pool (Pool : in out Root_Storage_Pool; Storage_Address : in out Address) is null;
procedure Add_to_Pool (Pool : in out Root_Storage_Pool; Storage_Address : in out Address) is null;
[Editor's note: These rules (including the existing ones) are all under "Static Semantics", but they are all really dynamic rules. This is traditional for language-defined library packages - I'm not quite sure why - so I didn't make any effort to change that.]
Replace the second last sentence of 13.11(16): (Note that AI05-0107-1 also changes this paragraph, I haven't reconciled these AIs.)
The result returned in the Storage_Address parameter is used by the allocator to represents the memory of the allocated object[Redundant:; procedure Dereference will be used to get the address of the actual memory.]
Add after 13.11(17):
Before any use of a value of a storage pool (via a dereference or other operation that needs to access the allocated memory) procedure Dereference is called. It returns in parameter Storage_Address the address of the allocated storage, which is a contiguous block of memory of the Size_In_Storage_Elements of the corresponding call to Allocate for the input value of Storage_Address storage elements.
Whenever a value of an access type is converted to another access type with a different (or unknown) storage pool, procedure Export_from_Pool is called for the source access type, and procedure Add_to_Pool is called for the target access type. Export_from_Pool takes a value returned by a call to Allocate or Add_to_Pool for the pool and returns the address of the associated memory. Add_to_Pool takes the address of memory of an object and returns an appropriate value. Add_to_Pool is also called for an access value created by an Access or Unchecked_Access attribute. Any exceptions propagated by these routines are propagated by the conversion or attribute.
[Perhaps we should say that a standard storage pool has Dereference, Add_to_Pool, and Export_from_Pool procedures that do not propagate any exceptions. Preferably, they would be null procedures, but we don't need to guarantee that.]
Replace 13.11(21):
If a Storage_Pool is specified for an access type, then a call to Dereference should return the address of the first storage element of a contiguous block of memory. The memory should contain Size_In_Storage_Elements storage elements, and should be aligned according to Alignment of the Allocate call that created the access value. The storage should not be used for any other purpose until the master of the call to Dereference exits, and the contents of the storage must remain available (via another call to Dereference so long as the pool element remains in existence. A call to Allocate for such an access type should, if it can satisfy the request, reserve an appropriate amount of storage with the properties required by Dereference. If the request cannot be satisfied, then Allocate should propagate an exception (such as Storage_Error). If Allocate behaves in any other manner, then the program execution is erroneous.
Add after 13.11(31) [a new Note]:
A user-defined storage pool can directly return the address of the memory to be used for an object from Allocate; in that case, procedure Dereference can be left null. Alternatively, a user-defined pool can return a handle to the memory from Allocate, and use procedure Dereference to return the actual address of the memory.
!discussion
The basic idea of these changes to Root_Storage_Pool is that Allocate my return a handle to memory rather than strictly the address of memory. Procedure Dereference converts that handle into the address of actual memory. This allows the pool to move the memory, so long as it preserves the contents of that memory. It also allows the pool to check the validity of the handle (which is often hard to do for raw addresses).
The two conversion routines are provided so that these handles do not escape to access types (such as anonymous access types) that are not prepared to handle them. They can simply reject conversions (by raising Program_Error, for instance), or, if the pool permits, pass the address of the object out to (or in from) another pool.
The routines are defined as null procedures such that for a pool that simply has Allocate return the address of the memory, the routines can do nothing at all. This preserves compatibility: all existing pools will continue to work with this new model. In addition, using the existing concept of null procedures minimizes the work for compilers; moreover, they make it easier for compilers to determine when the routines are not needed and thus avoid generating the code needed to support them.
This mechanism is not trying to solve storage management issues like region-based allocation. That requires at least lifetime management of access values and the results of calls to Dereference. That is considered out-of-scope for this proposal; see Tucker's various ideas on that subject (and the discussion on those ideas).
Advantages:
(1) Fully compatible with existing pools; no changes are needed to them.
(2) All dereferences (including ones implicitly generated by the compiler) are
included.
(3) A pool can prevent conversions to and from the type if necessary.
(4) Minimal change to the language definition and to compilers.
(5) With some additional routines, a pool can save and restore objects to a
backing store and thus implement persistence and pools larger than physical memory.
Disadvantages:
(1) Rather low-level.
(2) No way for a pool to tell when the memory returned by a call to Dereference
can be reused (after backing up).
(3) No way for the pool to tell of the memory has been modified (a simple
checksum could be used for that purpose if writing to a backing store is expensive).
(4) The use of procedures is clunky. (See alternatives below.)
(5) Having handles being of type System.Address is annoying; it would be better
if they could be a separate type. (See alternatives below.)
(6) Preventing conversions dynamically is inferior to doing so statically. (But
static rules would cause generic contract problems.)
(7) Users of Unchecked_Conversion might be surprised. They should be using
Access_to_Address_Conversions instead, but it doesn't take a type with a storage pool.
Alternatives:
[A] A simplification would be to just provide procedure Dereference, and call it
only for dereferences (and not for other implicit operations). In this case, the meaning of Allocate could be unchanged in the wording of the Standard; the access value as a handle would be dropped. However, a pool using this idea cannot do much other than detect dangling pointers.
[B] Another simplification would keep all of the routines, but drop just the
access value as a handle idea. This would change somewhat less wording, but it would prevent writing a pool where memory is moved or a pool which allows more objects than there is physical memory. The author thinks the additional capabilities are worth the small additional effort.
[C] A version of Address_to_Access_Conversions could be added that would take a
formal type rather than defining the access type directly. This would be defined to call Export_from_Pool and Add_to_Pool appropriately.
[D] An additional routine End_Dereference
procedure End_Dereference (Pool : in out Root_Storage_Pool; Access_Value : in Address);
was seriously considered. This routine would be called as part of the termination of the master that called Dereference. It could be used to determine when it is safe to save the value of an object to backing store, and when it is safe to reuse the memory for something else.
The routine was eventually rejected because of the implementation cost. The other routines are all called at points where the access value is naturally available (it is being dereferenced, converted, etc.) The value would have to be saved to be used in this routine. While this would be simpler than the full finalization (the value returned by dereference can't be a subcomponent of an object or changed by assignment), it still be fairly complicated.
[E] It would be better if these routines were defined as functions. For
instance, Dereference would make more sense if it was defined as:
function Dereference (Pool : in Root_Storage_Pool;
Access_Value : in Address) return Address;
But note that the pool needs to be modifiable (so a restriction to in parameters only would be very annoying, especially for Add_to_Pool which certainly needs to add the address to a table if it is making handles for all values). Using anonymous access parameters would be theorically cause infinite regression, as these routines are defining the dynamic semantics of access values.
One could imagine the "in out" parameter problem being fixed elsewhere. (There will be a proposal to that effect.) But making these functions would require defining a body which compilers would have to provide for the root storage pool. A compiler would have a lot more trouble determining whether such a function call could be avoided; doing so would require new compiler mechanisms.
If we wanted to go this way, fixing the annoying asymmetry that we have null procedures, but nothing similar for functions would make the most sense. The closest analog to null procedures is identity functions. Indeed, the default versions of these functions are identity functions.
Thus, one could imagine defining a way to define identity functions. These are unfortunately more complex than null procedures, because of the need to identify the parameter that is the one returned. So, we've have to add something like:
function Dereference (Pool : in out Root_Storage_Pool; Access_Value : in Address) return Address is identity (2);
which means that the function is an identity function on the second parameter. In this case, the identified parameter's subtype would have to statically match the return subtype.
If we had this feature, we could also use a subtype to make the difference between an access value and a storage address clearer. (Unfortunately, we can't change the parameter names for Allocate and Deallocate.) Thus Root_Storage_Pool would become:
with Ada.Finalization; with System.Storage_Elements; package System.Storage_Pools is pragma Preelaborate(System.Storage_Pools);
subtype Access_Value is Address;
type Root_Storage_Pool is abstract new Ada.Finalization.Limited_Controlled with private; pragma Preelaborable_Initialization(Root_Storage_Pool);
procedure Allocate( Pool : in out Root_Storage_Pool; Storage_Address : out Access_Value; Size_In_Storage_Elements : in Storage_Elements.Storage_Count; Alignment : in Storage_Elements.Storage_Count) is abstract;
procedure Deallocate( Pool : in out Root_Storage_Pool; Storage_Address : in Access_Value; Size_In_Storage_Elements : in Storage_Elements.Storage_Count; Alignment : in Storage_Elements.Storage_Count) is abstract;
function Storage_Size(Pool : Root_Storage_Pool) return Storage_Elements.Storage_Count is abstract;
function Dereference (Pool : in out Root_Storage_Pool; Value : Access_Value) return Address is identity (2);
function Export_from_Pool (Pool : in out Root_Storage_Pool; Value : Access_Value) return Address is identity (2);
function Add_to_Pool (Pool : in out Root_Storage_Pool; Storage_Address : Address) return Access_Value is identity (2);
private ... -- not specified by the language end System.Storage_Pools;
It's probably easier to see the intent this way, but it requires a lot of new mechanism.
[F] Another way to change the pool would be to have a separate (derived)
enhanced storage pool type that supports the extra operations. That would mean that compilers would be easily able to tell if the extra operations need to be called. However, that also would mean that dispatching through Root_Storage_Pool for the extended types would get handles and use them without the corresponding Dereference calls. That wouldn't work. One could put the new type below Root_Storage_Pool, but then what do we call it? What's below the root? Dirt? Bedrock? Hell?? :-)
The added complication doesn't seem to buy anything, so long as the new routines are easily determined if they are not overridden (and null procedures do that easily). Also note that changing the type from Address to something more appropriate isn't going to work because of dispatching, so that reason for separating the types isn't possible.
[G] The whole idea of pools is rather low-level. A higher level idea would be to
provide user-defined dereferencing for other types (presumably private types). It is easy to imagine defining an operator "all" for this purpose. But what would the routine look like?
function "all" (Ptr : in Priv_Type) return Designated_Type;
looks cool, but it returns a constant object. The most important feature of dereferencing is that it can be a variable. So this doesn't work.
function "all" (Ptr : in Priv_Type) return access Designated_Type;
seems like it would work, but wait! Where's the DEreference? This is returning an access value, and it would have to be dereferenced afterwards. Moreover, we don't want anyone assigning the result of this function into an access value:
Something : Priv_Type; Acc_Obj : access Designated_Type := Something.all;
makes no sense at all.
Thus, a new kind of function is needed to provide this capability. But if we're willing to do that, we really don't need this capability at all:
function Deref (Ptr : in Priv_Type) magic_return Designated_Type; can be called as: Ptr.Deref := <something>;
which is probably good enough.
So this idea will be explored separately.
!examples
(1) Detecting dangling references.
There are many ways to do this, depending on how expensive a lookup that can be considered and how much memory can be wasted. I'll show one simple one (and I won't show the specification, which should be obvious).
package body Checked_Pool is
Allocated_Addresses : array (1..1000) of Address := (others => Null_Address); Last_Allocated : Natural;
procedure Allocate ( Pool : in out Checked_Storage_Pool; Storage_Address : out Address; Size_In_Storage_Elements : in Storage_Elements.Storage_Count; Alignment : in Storage_Elements.Storage_Count) is My_Address : Address; begin if Last_Allocated = Allocated_Addresses'Last then raise Storage_Error; end if; Last_Allocated := Last_Allocated + 1; -- Allocate memory from the standard storage pool into My_Address. Allocated_Addresses(Last_Allocated) := My_Address; Storage_Address := Address(Last_Allocated); end Allocate;
procedure Deallocate ( Pool : in out Checked_Storage_Pool; Storage_Address : in Address; Size_In_Storage_Elements : in Storage_Elements.Storage_Count; Alignment : in Storage_Elements.Storage_Count) is begin if Natural(Storage_Address) not in Allocated_Addresses'range then raise Program_Error with "Bad handle"; elsif Allocated_Addresses(Natural(Storage_Address)) = Null_Address then raise Program_Error with "Dangling pointer"; end if; -- Deallocate memory at Allocated_Addresses(Natural(Storage_Address)) -- to the standard pool. Allocated_Addresses(Natural(Storage_Address)) := Null_Address; end Deallocate;
function Storage_Size (Pool : Checked_Storage_Pool) return Storage_Elements.Storage_Count is begin -- Call the standard pool for this. end Storage_Size;
procedure Dereference (Pool : in out Checked_Storage_Pool; Storage_Address : in out Address) is begin if Natural(Storage_Address) not in Allocated_Addresses'range then raise Program_Error with "Bad handle"; elsif Allocated_Addresses(Natural(Storage_Address)) = Null_Address then raise Program_Error with "Dangling pointer"; end if; Storage_Address := Allocated_Addresses(Natural(Storage_Address)); end Dereference;
procedure Export_from_Pool (Pool : in out Checked_Storage_Pool; Storage_Address : in out Address) is begin Dereference (Pool, Storage_Address); -- Nothing different here. end Export_from_Pool;
procedure Add_to_Pool (Pool : in out Checked_Storage_Pool; Storage_Address : in out Address) is begin if Last_Allocated = Allocated_Addresses'Last then raise Storage_Error; end if; Last_Allocated := Last_Allocated + 1; Allocated_Addresses(Last_Allocated) := Storage_Address; Storage_Address := Address(Last_Allocated); end Add_to_Pool;
end Checked_Pool;
Hopefully, a real pool would have a way to allocate more than 1000 elements - expanding the array or something. It could also check for deallocation of items not allocated from the pool by adding a flag to each element that would be set by Add_to_Pool, and raising an exception if that flag is set for the address that is deallocated. The global objects also should be inside of the pool object (so that there can be multiple such objects).
(2) Persistent pool objects.
A persistent pool could be built by adding a few additional routines:
package Persistent_Pools is
type Persistent_Pool is new Root_Storage_Pool with private;
procedure Initialize ( Pool : in out Persistent_Pool; Backing_Store_Name : in String); -- Initialize a persistent pool for use, proving the name of the backing store (a file name).
procedure Allocate ( Pool : in out Persistent_Pool; Storage_Address : out Address; Size_In_Storage_Elements : in Storage_Elements.Storage_Count; Alignment : in Storage_Elements.Storage_Count); -- Allocate a NEW persistent object, returning a handle to it.
procedure Deallocate ( Pool : in out Persistent_Pool; Storage_Address : in Address; Size_In_Storage_Elements : in Storage_Elements.Storage_Count; Alignment : in Storage_Elements.Storage_Count); -- Deallocate a persistent object.
function Storage_Size (Pool : Persisent_Pool) return Storage_Elements.Storage_Count;
procedure Dereference (Pool : in out Persistent_Pool; Storage_Address : in out Address); -- Allocate appropriate memory for the object whose handle is -- in Storage_Address, and read it from the backing store into -- that memory.
procedure Export_from_Pool (Pool : in out Persistent_Pool; Storage_Address : in out Address); -- Raise Program_Error (the memory given to other access types -- would have to be locked for the duration of the program - that -- could be done instead).
procedure Add_to_Pool (Pool : in out Persisent_Pool; Storage_Address : in out Address); -- Raise Program_Error (unless we want to make ordinary access -- objects persistent).
procedure Flush (Pool : in out Persistent_Pool); -- Flush all pool objects. -- Writes the all objects to the backing store (if necessary), and -- then frees the memory used to hold the objects. -- Don't call within the scope of a renames of a dereference of any -- access value of this pool.
procedure Finalize (Pool : in out Persistent_Pool); -- Calls Flush, then closes the backing store.
end Persistent_Pool;
The access values returned by this pool can be streamed, so they can be saved for long periods.
An alternative to having the backing store name passed in to a separate routine would be to use an access discriminant to define it, or to just include it in the body of the pool package.
(3) Managing the lifetime of access values.
For the Claw GUI binding, returning an access to an object is problematic, as objects can cease to exist at any time (that happens in response to mouse clicks of the user of the GUI program, which is asynchronous with respect to the operations of the program). Without some sort of protection, it would be impossible to do any operation without it potentially becoming erroneous (and only if some clicked at exactly the wrong time, making finding the problem virtually impossible). Still, there are cases where it is necessary to modify an object in place (such as doing operations on a parent window). Claw solved this problem by returning an object which contained a lock and the desired access value, and noting that copying the access value is not supported. Not being able to get the compiler to help enforce proper usage was a major annoyance. (The lock type is limited so it can't be copied, of course).
Note that a form of user-defined dereferencing alone is not enough to solve the Claw problem. Some sort of lock is needed to prevent operations on the underlying object which would make the dereference invalid, so the lock/access value pair is still needed. But the new pool operations could be used to make it impossible to use the access value after its lock has been freed.
The pool would use handles to the objects. The Add_to_Pool operation would register an object and return a useful handle. The Export_from_Pool operation would unconditionally raise an exception, preventing conversion to other access types. Dereference would return the address of the object The pool also would define an End_Dereference operation that would be called by the finalization routine of the returned lock/access object. This operation would mark the handle as invalid, so that any subsequent Dereference operation would raise an exception. The pool would not actually do any allocation or deallocation, so those routines could be simply raise Storage_Error.
!ACATS test
!appendix

From: Randy Brukardt
Date: Monday, January 19, 2009  9:15 PM

Following is one of a number of "trial-balloon" proposals for future
enhancements to Ada. I've been thinking about these more than my real work
lately, so by writing them up I can stop thinking about them. (And written up,
problems should be more evident than they would be in a simple question format.)
Note that I haven't discussed these with anyone else yet, so they are pretty
much solely from my fertile brain: don't expect to be using these features next
year... :-)


[This is version /01 of the AI - ED.]

****************************************************************

From: Cyrille Comar
Date: Tuesday, January 20, 2009  10:34 AM

Just for Information, GNAT has a similar notion that we call Checked_Pools
(s-chepoo.ads) and implements part of the functionality you suggest (the
Dereference one). It has been available for 12 years now ;-)

****************************************************************

From: Steve Baird
Date: Tuesday, January 20, 2009  9:12 PM

Does comparing two handles for a given pool provides a valid implementation of
4.5.2(12)'s definition of access type equality?

If two handles can map to the same address, then this wouldn't work and
comparison would become more expensive.

It might be reasonable to add a requirement that the handle-to-address mapping
must be one-to-one.

****************************************************************

From: Randy Brukardt
Date: Tuesday, January 20, 2009  9:36 PM

Not sure how we could enforce such a requirement. Or just make the program
erroneous if the mapping is not one-to-one? I suppose that would be consistent
with the way pools work (if a storage pool allocator doesn't actually allocate
appropriate memory, the program is erroneous).

An alternative would be to rewrite 4.5.2(12) to say that it compares two "access
values" rather than leaning on the designated object. Do any implementations
really check that the object is the same anyway?? That would allow multiple
handles designating the same memory; I'm sure someone would find a use for that
as a capability (not that I can think of one off-hand).

But I don't feel strongly either way.

****************************************************************

From: Adam Beneschan
Date: Wednesday, January 21, 2009  11:55 AM

Would adding an equality function to System.Storage_Pools solve the problem?
The compiler would generate a call to this function whenever comparing two
non-null access values belonging to the storage pool.

I notice that in Randy's first example, Add_To_Pool looks like:

    procedure Add_to_Pool (Pool : in out Checked_Storage_Pool;
                           Storage_Address : in out Address) is
    begin
      if Last_Allocated = Allocated_Addresses'Last then
        raise Storage_Error;
      end if;
      Last_Allocated := Last_Allocated + 1;
      Allocated_Addresses(Last_Allocated) := Storage_Address;
      Storage_Address := Address(Last_Allocated);
    end Add_to_Pool;

If the same address were given as input to Add_To_Pool multiple times, the
resulting handle would be different each time, and the result is that access
objects could not be compared for equality.  An equality function would help:

    function Equal (Pool : Checked_Storage_Pool;
                    Left, Right : Address) return Boolean is
    begin
      -- checks to make sure Left and Right are valid, maybe
      return Allocated_Addresses(Natural(Left)) =
               Allocated_Addresses(Natural(Right));
    end Equal;

Also, the proposed semantics state that Add_To_Pool is called whenever 'Access
is used.  This would cause a problem in a case where, say, we have an access
type where most uses of the access type are created by allocators, but there is
some "common" value that access values might often point to.  So it might be
tempting to write something like:

    if Employee_Name /= "" then
       Emp := new Employee_Record' (...);
    else
       Emp := Unknown_Employee'Access;
    end if;

If this happens a lot, and if Emp's type has Randy's example as its storage
pool, then every time the 'Access is used, a new entry in his array is
allocated.  I don't know just what this means---whether Randy's example is just
too simple and a more sophisticated package would be needed (in which case
Steve's suggestion to require a one-to-one mapping may make some sense), or
what?  I don't think this issue indicates that a change would need to be made in
the proposal, but I'm not sure.

One other thing I noticed while writing this is that access values still have to
be able to have the value "null".  If I understand correctly, this proposal uses
System.Address as something that is a "handle" and not an "address".  But when
an access object is assigned the value null, presumably the compiler will copy
System.Null_Address into it.  Thus, if a pool implementation is using handles
that aren't addresses, it will need to work around System.Null_Address, which
could be a bit challenging if the code is to be portable.  For example, Randy's
example won't work if, for some reason, an implementation represents null access
values as the integer 1 instead of 0.  No, I don't know of any implementation
that does this.  But if there were, then the first time Allocate is called, it
would return Allocated_Addresses'First, which would be 1, which would look just
like a null access value.  It's certainly possible to write portable code that
works around whatever Null_Address migh happen to be; but an alternative might
be to include some sort of Null_Access function that would return the value that
the pool wants a "null" to look like, so that if a pool uses "handles" instead
of actual addresses for access values, it can also dictate what it wants the
null "handle" to be.  (Presumably this function would be inlined.)

Anyway, those are just some thoughts based on the Randy's Checked_Pool example;
maybe it's a mistake to base too much on that one perhaps-too-simple example,
though.

****************************************************************

Questions? Ask the ACAA Technical Agent