Version 1.4 of ais/ai-00350.txt

Unformatted version of ais/ai-00350.txt version 1.4
Other versions for file ais/ai-00350.txt

!standard 4.05.2 (12)          03-09-16 AI95-00350/01
!standard 13.11 (7)
!class binding interpretation 03-09-16
!status No Action (5-3-1) 04-06-14
!status work item 03-09-16
!status received 03-08-21
!qualifier Omission
!priority Low
!difficulty Medium
!subject Allocating and comparing zero-size objects
!summary
[Editor's note: This is an unfinished draft. Private conversations have made it clear that no acceptable rule will get consensus, so I will not waste any more time on this one.]
The Allocate routine in System.Storage_Pools can be called with the Size_In_Storage_Elements equal to zero. Pools do not need need to insure that the value they return is unique in that case.
Equality of access values ...
!question
The Allocate routine in System.Storage_Pools declares the Size_In_Storage_Elements parameter to be of type System.Storage_Elements.Storage_Count, which is a subtype whose lower bound is zero.
When defining a new type derived from Root_Storage_Pool, and providing a Allocate routine for that type, is the Allocate routine expected to correctly handle the case when this parameter is zero? (Yes.)
If so, do the resulting access values need to be unique? (No.) If not, it isn't possible to meet the requirements of 4.5.2(12) which imply that access to different objects must compare unequal.
!recommendation
(See wording.)
!wording
Add after 3.10(17):
If the designated subtype is constrained, the elaboration of an access_to_object_definition and access_definition includes a check that the designated subtype has a size greater than zero. Program_Error is raised if this check fails.
Change 4.5.2(12) to say:
Two access-to-object values are equal if they designate the same object, or if both are the result of evaluating the literal NULL. Two access-to-object values (of the same type) are unequal if one is the result of evaluating the literal NULL, and the other is the result of evaluating an allocator, 'Access, or 'Unchecked_Access, or if both are the result of evaluating an allocator, 'Access, or 'Unchecked_Access, and the designated objects are not the same existing object. For other cases it is unspecified whether two access values are equal or unequal.
!discussion
It is temping to fix this problem by forbidding compilers to call Allocate with a size of 0. However, that really doesn't work.
First, we cannot change the specification of Allocate to eliminate zero as a legal value. (Doing so would be incompatible with all programs that define pools, which is completely unacceptable.) Thus, we have to adopt a dynamic semantics rule which states that compilers never call with size zero. But, since we cannot prevent such a call from a user-written call, any well-written pool still will have to detect the case and handle it (or raise an exception). So there is no real savings from having such a rule.
Second, doing so does not really fix the problems with the equality rule. We have similar problems with equality of access values created by Address_to_Access_Conversions, which also need to be addressed.
Thus, we simply address the equality issues and do not change the meaning of Allocate.
...
--!corrigendum B.03(69/1)
!ACATS test
This change only affects pathological programs, and then only makes their results unspecified. That is not testable.
!appendix

!topic Calling allocate routine with zero size
!reference RM95 13.11(7), 13.7.1(4)
!from Adam Beneschan 08-21-03
!discussion

The Allocate routine in System.Storage_Pools declares the
Size_In_Storage_Elements parameter to be of type
System.Storage_Elements.Storage_Count, which is a subtype whose lower
bound is zero.

My question: When defining a new type derived from Root_Storage_Pool,
and providing a Allocate routine for that type, is the Allocate
routine expected to correctly handle the case when this parameter is
zero?

Consider this:

    procedure Try (Pool : in System.Storage_Pools.Root_Storage_Pool'Class;
                   N    : in Integer) is

        type Arr is array (Natural range <>) of Character;
        type Arr_P is access Arr;
        for Arr_P'Storage_Pool use Pool;

        P  : Arr_P;
        P2 : Arr_P;
    begin
        P  := new Arr (1 .. N);
        P2 := new Arr (1 .. N);
    end Try;

Suppose we call Try(0).

We need to "allocate" a block of memory of size zero.  Of course, P
and P2 have to be set to something that is not null, and they can't be
equal to each other.

Which of the following is true?  In order for things to be portable,
one or the other of these has to be true (and perhaps ought to be
stated in the RM or AARM):

(1) Allocate routines are not expected to deal with
    Size_In_Storage_Elements = 0.  Compilers that generate (possibly
    dispatching) calls to Allocate must be careful to make sure the
    parameter is not zero.  (They might, for instance, use a parameter
    of 1 in a case such as the above, or they might invoke a separate
    mechanism to deal with zero-length allocations.)

(2) Compilers are allowed to generate calls to Allocate with
    Size_In_Storage_Elements = 0; Allocate routines are expected to
    handle them appropriately.

****************************************************************

From: Tucker Taft
Sent: Friday, August 22, 2003  5:32 AM

I see nothing that implies an allocate routine
may assume the Size_In_Storage_Elements is
greater than zero, so it needs to handle zero
"properly."

The more important question I would
ask is whether two distinct evaluations of an
allocator must return access values that compare
not equal.  4.5.2(12) seems to imply that, though
it doesn't state it in language that is as explicit as,
for example, the discussion in 4.5.2(13).

Personally, I see no important reason for an allocate
routine to go out of its way to make sure allocations
of zero-length objects return distinct access values,
but I believe I recall others saying that this is
somehow very important.  As you imply, there is
some distributed overhead in accomplishing that,
and, again personnally, the overhead seems like a waste
of energy.  If a user wants to use access values as
unique tokens in some way, they can jolly-well allocate
non-zero-length objects.

I might go even further and allow two allocators to return
the same access value even for non-zero-size objects, if
the implementation can determine that they can safely share the same
memory space.  Certainly we allow access values to "reappear"
after an unchecked deallocation has been performed.  So we know
that an access value designating a non-existent object (due
to unchecked deallocation) and an access value designating a
newly allocated object can be equal.  Note that it is *not*
erroneous to copy or compare an access value that designates a
non-existent object, though it is erroneous to evaluate "acc_val.all" if
acc_val designates a non-existent object.

[I have always found C's concern about zero-length objects
to be misplaced.  It is often natural to have zero-length
arrays, and even records, but in C you have to go out
of your way sometimes to be sure you don't try to define
a zero-length array, or else the compiler, if ANSI/ISO
compliant, will slap your wrists.  I have heard the argument
relates to pointer equality, but again, what is the big
deal if pointers to two logically distinct objects
happen to be equal?]

So I would probably recommend that 4.5.2(12) explicitly
allow for the possibility that access values might be
equal even if they designate logically distinct objects,
so long as the objects have no updatable components.

****************************************************************

From: Pascal Leroy
Sent: Friday, August 22, 2003  6:58 AM

A related question is whether two distinct evaluations of the 'Access
attribute with different prefixes must necessarily return distinct
access values.  Consider:

	type R is null record;
	type A is array (1..10) of aliased R;
	type T is access all R;
	X : T := A(1)'Access;
	Y : T := A(2)'Access;

I read 4.5.2(12) as requiring that X and Y be distinct.  We deal with
this requirement by ensuring that each array component occupies at least
one byte, but this has surprised users in the past, and it is really
stupid: if you happen to have objects that don't need any storage at
all, why not take advantage of this?

****************************************************************

From: Nick Roberts
Sent: Monday, August 25, 2003  9:14 PM

> A related question is whether two distinct evaluations of
> the 'Access attribute with different prefixes must ne-
> cessarily return distinct access values.  Consider:

> type R is null record;
> type A is array (1..10) of aliased R;

I think Pascal meant:

   A: array (1..10) of aliased R;

> type T is access all R;
> X : T := A(1)'Access;
> Y : T := A(2)'Access;

> I read 4.5.2(12) as requiring that X and Y be distinct.

I think they should be, since it is possible that certain algorithms depend
on this property for their correctness (indeed completability). I think
4.5.2(12) needs to have "Otherwise they are unequal." added.

> We deal with this requirement by ensuring that each array
> component occupies at least one byte, but this has surprised
> users in the past,

I'll bet it has, and I'm pretty sure it's unnecessary. My basic observation
is that access values do not have to inidicate different addresses in order
to have different values. An access value is, in Ada, pointedly (sorry ;-)
not the same as an address.

I'm not certain of the nitty gritty details, but I suspect you could use the
following trick for access subtypes that happen to reference a null object
(subtype): return an arbitrary distinct value for each distinct object; any
instance of  dereferencing becomes a null action (so no machine instruction
will ever be executed trying to use the access value as an address). If the
object subtype is not statically null, I suppose this requires some kind of
dynamic test, but then Ada often requires dynamic tests (e.g. for violations
that must raise an exception).

So, to use the above example, even if an access value is, say, one machine
word in size, and normally contains a linear address, the result of
A(1)'Access could be, say, 1, and of A(2)'Access could be 2 (and so on).
These values (1, 2, etc.) are not addresses, but they don't need to be. The
access values remain distinct. Any attempt to dereference them must not
cause the execution of actual dereferencing code. Array A need take up no
storage (memory) space, nor have any associated (real) address.

In fact, perhaps the definition of the Address attribute in the RM should
have a clause added explictly permitting the result of the attribute, when
it is applied to a null object, to be a value which is not a real (or valid)
address.

I think this same principle can be applied to the earlier question involving
the System.Storage_Pools.Allocate procedure. An allocator (as in the 'new'
reserved word) should always return a distinct access value, even for an
object of zero size. However, the evaluation of the allocator, in such a
case, should /not/ call System.Storage_Pools.Allocate; instead it can return
a value which has nothing to do with addresses (but maintains the
distinctness property).

If the RM needs changing to reflect this, then it should be. While we're
there (RM 13.11(16)), I also think it should be made clear that an
implementation may call Allocate less than once per allocation (by 'new'),
if it wishes (to cluster allocations of small objects).

> and it is really stupid: if you happen to have objects that don't
> need any storage at all, why not take advantage of this?

I couldn't agree more. In fact, I have a suspicion that it is possible there
will be a few algorithms which /require/ this property in order to work (on
a machine with finite memory).

****************************************************************

From: Tucker Taft
Sent: Monday, August 25, 2003  10:16 PM

I am mystified by Nick Roberts comments, as
they seem to be contradictory.  Perhaps he is
saying that by some magic, you can have two
aliased objects with the same address, but
different access values.  I can't imagine how.

In any case, I agree with Pascal that it is
stupid to go out of your way to make all
aliased components have non-zero size.
If there was a realistic example of an
algorithm that fundamentally depended on
access values designating distinct "empty" objects
having different values, I would be interested.

Otherwise, this seems like distributed overhead
serving no purpose.

****************************************************************

From: Pascal Leroy
Sent: Tuesday, August 26, 2003  1:23 AM

Right.  First, I can't imagine how Address_To_Access_Conversion could be made
to work.  Second, I can't imagine that the user community would tolerate the
additional cost of testing for "invalid" access values before each
dereferencing; considering that there are millions of SLOCs out there that do
heavy use of dynamic data structures (using access types), the proposed
performance incompatibility sounds ludicrous to me.

Anyone thinking that access values are not in a one-to-one correspondence with
addresses is seriously misguided.  This is true even for thick pointers,
although in this case the address alone is obviously not sufficient to
reconstruct the entire access value.

****************************************************************

From: Nick Roberts
Sent: Tuesday, August 26, 2003  7:43 AM

I am simply saying that an access value which accesses an object of zero
size does not have to have any correspondence with any (real) address at
all.  I feel this is a simple enough concept, and I'm surprised that you are
mystified, Tuck.

You can indeed have two aliased objects with the same address, but different
access values. It's very simple. First consider two aliased objects, A and
B, of size 1 byte (= 1 storage unit) each. They might be laid out in memory
like this:

001A 5670  A  def 1
001A 5671  B  def 1

They might have the access values (in binary) 16#001A_5670# for A'Access and
16#001A_5671# for B'Access. Now consider two more aliased objects, C and D,
of size 0 each. They might be 'laid out' in memory like this:

001A 5672  C  def 0
001A 5672  D  def 0

They might have the access values (in binary) 16#8000_0001# for C'Access and
16#8000_0002# for D'Access. Allakazzam!  Hey presto!  We have two aliased
objects with the same address (16#001A_5672#) but with different access
values (16#8000_0001# and 16#8000_0002#). Can you imagine that?

Seriously, Tuck, I don't understand what the problem is.

Obviously the implementation must ensure it doesn't try to dereference
C'Access or D'Access in the normal way. But in any event it must ensure it
never attempts to load or store anything from or to C'Address or D'Address.
The chances are it's got to make a special test (for a designated subtype of
size 0) anyway.

> In any case, I agree with Pascal that it is
> stupid to go out of your way to make all
> aliased components have non-zero size.

Well, that's a point of agreement, at least.

> If there was a realistic example of an
> algorithm that fundamentally depended on
> access values designating distinct "empty" objects
> having different values, I would be interested.

My point is that such an algorithm is possible. I was not, and am not,
trying to make an assertion about the likelihood of such an algorithm being
used in practice. I would simply prefer it that Ada compilers did not break
such algorithms, even if they are unlikely.

> Otherwise, this seems like distributed overhead
> serving no purpose.

I'm not sure, but I suspect that what I'm proposing would add no appreciable
overhead (speed or memory) to remote access types. I would suggest, for one
thing, that remote access types tend very rarely in practice to have a
designated subtype capable of size 0.

****************************************************************

From: Nick Roberts
Sent: Tuesday, August 26, 2003  7:43 AM

> Right.  First, I can't imagine how Address_To_Access_Conversion
> could be made to work.

I did, in my original post, specifically suggest that X'Address be permitted
to result in an unreal or invalid address if X has size 0. In this case, I
suggest To_Pointer(X'Address) could be permitted to return either X'Access
or null, and To_Address(X'Access) could be permitted to return any address
(which might be X'Address, or some other, possibly unreal or invalid,
address).

> Second, I can't imagine that the user community would tolerate the
> additional cost of testing for "invalid" access values before each
> dereferencing;

I think the additional cost of these tests would be an issue of some
complexity. I suspect that in many cases it would turn out to be very little
or none, in fact. Conceptually, the test would not have to be performed at
every dereferencing, as such, but at every elaboration of an access subtype
which could dynamically have a designated subtype of size 0.

(I use 'dynamic' here in the sense of 'at run time', as in something that
cannot be computed at compile time.)

> considering that there are millions of SLOCs out there that do heavy
> use of dynamic data structures (using access types), the proposed
> performance incompatibility sounds ludicrous to me.

Performance incompatibility? What is that? (Perhaps the use of a high-level
language is incompatible with performance, and we should all use assembly?
:-)

Remember that the dynamic test in question is only relevant to access types
whose designated subtype is capable of dynamically having size 0 (as well as
other sizes). I suspect that, of those millions of SLOCs, the vast majority
would not fall into this category anyway.

> Anyone thinking that access values are not in a one-to-one
> correspondence with addresses is seriously misguided.

Surely there are many implementations which add flags and other
paraphernalia to an access value.  Are these implementations all seriously
misguided?

> This is true even for thick pointers, although in this case the
> address alone is obviously not sufficient to reconstruct the
> entire access value.

A thick pointer does not have a one-to-one correspondence with addresses.
You seem to have contradicted your own assertion.

****************************************************************

From: Pascal Leroy
Sent: Tuesday, August 26, 2003  9:29 AM

> I think the additional cost of these tests would be an issue
> of some complexity. I suspect that in many cases it would
> turn out to be very little or none, in fact. Conceptually,
> the test would not have to be performed at every
> dereferencing, as such, but at every elaboration of an access
> subtype which could dynamically have a designated subtype of size 0.

You've lost me.  I fail to see why the check would relate to the
elaboration of a subtype.  Consider:

	type A is access String;
	subtype S is A (1..N);

If N happens to be 0 (at runtime) then objects of subtype S could occupy
0 bytes, and therefore they would be denoted by "magic access values".
But of course the declarations of A and S could be in totally unrelated
units, so any dereferencing of a value of type A should be prepared to
get one of these magic values.  To me, that's one check per
dereferencing.

> Performance incompatibility? What is that?

When we do any change to that language, we try very hard to ensure that
the performance of existing programs is not affected.  This is essential
to support smooth transition from a version of Ada to the next.  That's
what I mean by "performance incompatibility".  If real-time programs
started missing cycles, people could be quite displeased.

> Surely there are many implementations which add flags and
> other paraphernalia to an access value.  Are these
> implementations all seriously misguided?

I'd be curious to hear about real-life examples of such implementations
(I'm not talking of remote access types here).

****************************************************************

From: Nick Roberts
Sent: Tuesday, August 26, 2003  11:43 AM

..
> If N happens to be 0 (at runtime) then objects of subtype S
> could occupy 0 bytes, and therefore they would be denoted by
> "magic access values". But of course the declarations of A and
> S could be in totally unrelated units, so any dereferencing of a
> value of type A should be prepared to get one of these magic
> values.  To me, that's one check per dereferencing.

But you cite only one particular scenario, Pascal (and not in sufficient
detail). There are many others. Supposing the declarations of A and S (and
presumably various associated declarations and statements) do not occur in
totally unrelated units, as will often be the case in practice.

In particular, consider the dereferencing of access values that are known to
be of the same subtype. If the compiler can determine that they are of the
same subtype, it only has to check that subtype once (for having a
designated subtype of zero size), and then it knows whether to treat all the
dereferences as normal or not.

Consider:

   type A is access String;
   ...
   subtype S is A (1..N);
   P1, P2: S := new String'( N*'#' );
   ...
   Do_Something(P1.all,P2.all);

Here the compiler can know that P1 and P2 must either both be dereferenced
normally or both be dereferenced as of 'magic' access values. Only one check
is required (conceptually at the elaboration of S).

Furthermore -- if it's of any interest to anyone -- if my suggestion were
adhered to by the compiler, we can be certain that P1 /= P2 for all values
of N (including 0). I believe it is reasonably conceivable (I don't say
'likely') that an algorithm could depend on this property for its
correctness.

> > Performance incompatibility? What is that?

> When we do any change to that language, we try very hard
> to ensure that the performance of existing programs is not
> affected.  This is essential to support smooth transition
> from a version of Ada to the next.

Well, I accept that principle, broadly.

> That's what I mean by "performance incompatibility".

Okay, I understand now.

> If real-time programs started missing cycles, people could
> be quite displeased.

They might indeed. They might even be displeased enough to rewrite their
time-critical code more efficiently (for example, using an array instead of
a pool).

Actually, I salute your point, Pascal. It occurs to me that an
implementation could provide ways (pragmas, perhaps) to slicken the
operation of a particular access type or subtype, perhaps at the expense of
certain semantic properties (that are generally desirable). This would be
similar to the way in which pragma Suppress can be used to remove run-time
checks where they cause too much speed degradation.

> > Surely there are many implementations which add flags
> > and  other paraphernalia to an access value.  Are these
> > implementations all seriously misguided?

> I'd be curious to hear about real-life examples of such
> implementations (I'm not talking of remote access types
> here).

I really need someone to come galloping to my rescue at this point. I don't
have access to the implementational details of a large number of Ada
compilers.

I would point this out, however: it's not possible for any access type to
have a totally one-to-one relationship with real (meaningful) addresses,
since every access type must have a 'null' value. This value /could/ be
associated with a real address, of course, but I'll bet this isn't actually
done in /any/ implementation.  Maybe I'm arguing about minutiae.

Would you please expand on your entreaty that access values should have a
one-to-one relation to addresses, Pascal. Why should this be?

****************************************************************

From: Randy Brukardt
Sent: Tuesday, August 26, 2003  2:18 PM

> I'd be curious to hear about real-life examples of such implementations
> (I'm not talking of remote access types here).

I prefer not to get involved here (mainly because I see no value to the
supposed requirement that pointers to the different objects are unequal, and
I see no requirement in the standard that this is true - there is no "only
if" in 4.5.2(12)).

However, Janus/Ada does have bit pointers (that is, a machine address and
bit offset). Usually, these only show up in renames of packed components and
the like. But, on the U2200, they also were used for (some) access types.
Since that machine had a 36-bit word, and we needed to be able to access
characters (for compatibility with C), we had to allow them as access types.
(Luckily for us, the Unisys code generator already supported those as a
native type, so we just used that rather than something constructed.)

On the U2200, convention C on an access type would force the use of bit
pointers (we also allowed conversions between the derived access types with
different representation, with a check that the bit offset was zero, in
order to provide better code on the Ada side).

****************************************************************

From: Simon Wright
Sent: Tuesday, August 26, 2003  2:41 PM

I thought that, but what about 4.5.2(25)? (I too feel this isn't a
very urgent change.)

****************************************************************

From: Adam Beneschan
Sent: Tuesday, August 26, 2003  3:09 PM

I don't see how that makes a difference.  4.5.2(25) just says that
"/="(X,Y) [if it returns Boolean] is equivalent to "not"("="(X,Y)) in
all cases, whatever X and Y are and whatever "=" returns.  It doesn't
say anything about being evaluated according to a complementary *rule*
that parallels the rule for "=", or anything like that.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, August 26, 2003  3:17 PM

> I thought that, but what about 4.5.2(25)? (I too feel this isn't a
> very urgent change.)

What about it? It just says that "/=" is the same as (not "="). Certainly,
if A'Access = B'Access, then not (A'Access /= B'Access). That is still
consistent with the RM.

I don't see any reason for the bending over backwards that Pascal indicated
that Rational does - the RM doesn't require it, and access to 0-sized
objects is not very useful anyway. (A zero-sized object for a non-array is
usually a bug, and for arrays, static matching prevents any interesting
access to null slices.)

****************************************************************

From: Simon Wright
Sent: Wednesday, August 27, 2003  3:25 AM

My misunderstanding (I guess I didn't see (25) as introducing an
overarching concept. Brain damage.)

****************************************************************

From: Pascal Leroy
Sent: Tuesday, August 26, 2003  3:43 PM

> I prefer not to get involved here (mainly because I see no value to the
> supposed requirement that pointers to the different objects are unequal, and
> I see no requirement in the standard that this is true - there is no "only
> if" in 4.5.2(12)).

Wrong argument, Randy.  AARM 1.1.3(14.b) says "we often use 'if' to mean 'if
and only if'...".  So I am certainly going to call for an AI to clarify
4.5.2(12).  (I would certainly love to get rid of the ridiculous distributed
overhead that this paragraph seems to impose.)

> However, Janus/Ada does have bit pointers (that is, a machine address and
> bit offset).

A perfectly reasonable implementation, which amounts to saying that bits are
addressable (albeit using a complicated form of address).  However, it is still
the case that access values in this case have a direct correspondence to (bit)
addresses.  And therefore 4.5.2(12) is still a nuisance.

****************************************************************

From: Nick Roberts
Sent: Tuesday, August 26, 2003  4:57 PM

> If there was a realistic example of an
> algorithm that fundamentally depended on
> access values designating distinct "empty" objects
> having different values, I would be interested.

It is quite reasonable that a procedure which takes two parameters, both of
a certain access type, tests to ensure they are not equal (because the
procedure's algorithm is based on the assumption of no aliasing). The
procedure might raise an exception if it detects that they are equal.

Another piece of code might call the procedure as part of an algorithm that
diminishes the size of the objects of the access type, and it may make a
call in a limiting case where the size is zero. It may be difficult or ugly
to eliminate the limiting case. Moreover, it might not be obvious that it is
necessary to eliminate the limiting case.

Consider:

   type R is array (Positive range <>) of Float;
   type A is access R;

   procedure Foo (N: in Natural; ...) is

      subtype S is A(1..N);

      procedure Bar (P1, P2: in S) is
         ...
      begin
         if P1 = P2 then raise Constraint_Error; end if;
         ...
      end;

      Q1, Q2: S := new S'(...);
   begin
      ...
      Bar(Q1,Q2);
      ...
   end Foo;

Obviously this isn't a real-life example, but it shows how a call Foo(0,...)
could raise Constraint_Error if the allocator initialising Q1 and Q2 returns
the same access value to both. It might not be obvious to the programmer, in
a situation like this, what is going wrong.

Might this not be considered one of those nasty 'gotchas' that Ada is
supposed to avoid?

> Otherwise, this seems like distributed overhead
> serving no purpose.

As I suggest to Pascal, it might be reasonable for implementations to
mitigate the overhead by providing a pragma which allows the behaviour I
suggest to be removed from a specific access type. (I'm not sure whether the
standard should specify such a pragma.)

****************************************************************

From: Randy Brukardt
Sent: Tuesday, August 26, 2003  7:03 PM

> Obviously this isn't a real-life example, but it shows how a call Foo(0,...)
> could raise Constraint_Error if the allocator initialising Q1 and Q2 returns
> the same access value to both. It might not be obvious to the programmer, in
> a situation like this, what is going wrong.

Fine, but I can't imagine an implementation where an access-to-unconstrained
would ever point at a 0-sized object (in practice, as opposed to the
language definition). There has to be an array descriptor somewhere (to give
the bounds), and that by definition is not of zero size.

Therefore, I claim that the "problem" cannot occur with
access-to-unconstrained. (It certainly cannot in Janus/Ada.) The static
matching rules mean it cannot happen with an arbitrary slice of an array. It
could only happen with a access to a subtype of size zero (either a null
array or an null record). What possible use is such a type? It is a pointer
to nothing. Why should we have distributed overhead to support such an
access type? (Rational's solution makes objects larger simply in case
someone tries to do this compare on two access values that are known to
point at nothing. That seems ridiculous.)

So, I would be in favor of Pascal's AI saying something along the lines of
"the result of equality is unspecified if the designated subtype is
constrained and has size 0." Implementors would still be required to make
access-to-unconstrained work (because I can agree that that could be
dangerous; null slices are not uncommon), but that would happen 'naturally'
in the vast majority of implementations.

****************************************************************

From: Nick Roberts
Sent: Tuesday, August 26, 2003  9:12 PM

> Fine, but I can't imagine an implementation where an access-
> to-unconstrained would ever point at a 0-sized object (in
> practice, as opposed to the language definition).

Such implementations are indeed (reasonably) possible.

> There has to be an array descriptor somewhere (to give
> the bounds), and that by definition is not of zero size.

Quite true, but the descriptor, or 'dope' data, does not have to accompany
the object's value in memory. It can be elsewhere (and it can be shared by
many access values). Notionally, dope data is associated with (constrained)
subtypes of the access type, rather than with objects per se.

At bottom, the danger, to my mind, is that some implementations may produce
access values for Q1 and Q2 such that the test Q1=Q2 returns True (even if
the binary patterns of the values of Q1 and Q2 are different). I am
exhorting that they should not (be allowed to) do so; I don't actually care
what mechanism they use to ensure this. The mechanism I suggested was only a
suggestion.

> Therefore, I claim that the "problem" cannot occur with
> access-to-unconstrained. (It certainly cannot in Janus/Ada.)

I claim that it can, for implementations different to Janus/Ada.

...
> So, I would be in favor of Pascal's AI saying something along the
> lines of "the result of equality is unspecified if the designated
> subtype is constrained and has size 0."

I continue my opposition to this idea.

It still seems clear to me that an access value is not just a representative
of a storage space (as an address is), but rather it is also a token
representing a (conceptual, as well as actual) object. Possibly this point
is only properly relevant to access-to-limited-type types, but I feel that
when you say "new My_Limited_Type", you don't just mean "allocate some space
on the heap, please", you also mean "create a new token to represent a
distinct object". That second meaning should hold true even if the object
happens to have zero size.

****************************************************************

From: Tucker Taft
Sent: Wednesday, August 27, 2003  9:43 AM

In re-reading 4.5.2(12), in comparison with 4.5.2(13),
it certainly seems "reasonable" to conclude there is
no requirement that access-to-object values be *un*equal if
they designate different objects.  It would also be
reasonable to read "if" as "if and only if" so we
clearly have an ambiguity which deserves clarification by
the ARG.

One way Randy and others have helped illuminate such
issues in the past has been by writing up an example
and seeing what various compilers do with it.  If all
compilers agree, then that is an argument for solidifying
the interpretation that way.  If the compilers don't
agree, then the general approach is to disambiguate in
favor of least implementation disruption, unless there
is a compelling user requirement to the contrary.

Here, we have heard some user arguments in favor of
requiring inequality if the designated objects are
different, but it has not reached the level of
"compelling" in most readers' minds, apparently.

So...

Here is a simple example to illustrate the issue.
Please give this a try and see what your favorite
compiler produces.  One of ours produces False,False,False.
Another produces False, True, True.

----------- access_equality.ada ------------


with Ada.Text_IO; use Ada.Text_IO;
procedure Access_Equality is
     type Int_Arr is array(Positive range <>) of Integer;
     subtype Int_Arr_Zero is Int_Arr(1..Integer'Value("0"));
     type Int_Arr_Ptr is access Int_Arr_Zero;
     for Int_Arr_Ptr'Storage_Size use 500;
       -- Use a sized heap to indicate reclamation may not be necessary

     type Int_Arr_Const_Ptr is access constant Int_Arr_Zero;
       -- Use an access-to-constant to indicate reclamation won't happen

     type Empty_Rec is null record;
     type Empty_Rec_Ptr is access all Empty_Rec;
     type Empty_Rec_Arr is array(Positive range <>) of aliased Empty_Rec;

     A, B : Int_Arr_Ptr;
     CA, CB : Int_Arr_Const_Ptr;
     C, D : Empty_Rec_Ptr;

     ERA : Empty_Rec_Arr(1..10);
begin
     A := new Int_Arr_Zero;
     B := new Int_Arr_Zero;
     Put_Line("A = B? " & Boolean'Image(A = B));

     CA := new Int_Arr_Zero'(others => 0);
     CB := new Int_Arr_Zero'(others => 0);
     Put_Line("CA = CB? " & Boolean'Image(CA = CB));

     C := ERA(1)'Access;
     D := ERA(2)'Access;
     Put_Line("C = D? " & Boolean'Image(C = D));
end Access_Equality;

****************************************************************

From: Pascal Leroy
Sent: Wednesday, August 27, 2003  10:04 AM

My favorite compiler produces False, False, False.  It's my development
version, not anything that any customer has seen or will see.  On the
other hand, we gave special attention to this issue, so I would expect
to see the same result regardless of the target or version.

****************************************************************

From: Nick Roberts
Sent: Wednesday, August 27, 2003  11:10 AM

> In re-reading 4.5.2(12), in comparison with 4.5.2(13),
> it certainly seems "reasonable" to conclude there is
> no requirement that access-to-object values be *un*equal if
> they designate different objects.  It would also be
> reasonable to read "if" as "if and only if" so we
> clearly have an ambiguity which deserves clarification by
> the ARG.

I think everyone agrees with this (I certainly do).

> One way Randy and others have helped illuminate such
> issues in the past has been by writing up an example
> and seeing what various compilers do with it.  If all
> compilers agree, then that is an argument for solidifying
> the interpretation that way.  If the compilers don't
> agree, then the general approach is to disambiguate in
> favor of least implementation disruption, unless there
> is a compelling user requirement to the contrary.

I agree with this approach, so long as it never becomes absolute dogma.

> Here, we have heard some user arguments in favor of
> requiring inequality if the designated objects are
> different, but it has not reached the level of
> "compelling" in most readers' minds, apparently.

Fair enough. If the ARG (WG9?) decides not to require inequality, then I
would like to see that this point is made clear in (the next revision of)
the RM, possibly with a little example drawing attention to the point.

> Here is a simple example to illustrate the issue.
> Please give this a try and see what your favorite
> compiler produces.  One of ours produces False,False,False.
> Another produces False, True, True.

GNAT 3.15p (CygWin build of gcc 2.8.1 on Windows 95) produces False, False,
True.

****************************************************************

From: Robert A. Duff
Sent: Thursday, August 28, 2003  8:37 AM

Tucker wrote:

> > Otherwise, this seems like distributed overhead
> > serving no purpose.

And Nick replied:

> I'm not sure, but I suspect that what I'm proposing would add no appreciable
> overhead (speed or memory) to remote access types. I would suggest, for one
> thing, that remote access types tend very rarely in practice to have a
> designated subtype capable of size 0.

Nick, I think you misunderstand what Tucker means by "distributed overhead".
He's not talking about distributed systems, the DS annex, or remote access
types, or anything like that.

"Distributed overhead" means that you pay an efficiency cost for the
mere existence of a feature, even when you don't use that feature.  For
example, if the run-time system is continually checking "has this task
been aborted?", then that's distributed overhead, because programs that
don't use abort at all have to pay for the mere existence of abort in
the language.  The language designer can eliminate distributed overhead
(in this case by not putting abort in the language).  An implementer can
eliminate distributed overhead (by cleverly optimizing the run-time
system in the no-abort case).

****************************************************************

From: Nick Roberts
Sent: Thursday, August 28, 2003  9:54 AM

> Nick, I think you misunderstand what Tucker means by "distributed
> overhead". He's not talking about distributed systems, the DS
> annex, or remote access types, or anything like that.

Thanks for the clarification, Bob; I did misunderstand.

> "Distributed overhead" means that you pay an efficiency cost for
> the mere existence of a feature, even when you don't use that
> feature.  For example, if the run-time system is continually checking
> "has this task been aborted?", then that's distributed overhead,
> because programs that don't use abort at all have to pay for the
> mere existence of abort in the language.

Well the concept is familiar to me, but not the term "distributed overhead".
A term such as "universal overhead" would seem more appropriate. But then
that's computer science for you. (Witness the term "artificial inteligence"
:-)

> The language designer can eliminate distributed overhead (in this
> case by not putting abort in the language).  An implementer can
> eliminate distributed overhead (by cleverly optimizing the run-time
> system in the no-abort case).

I like to think of Ada as being the Rolls Royce (perhaps I should say
Lincoln Continental :-) (or Zob ;-) of programming languages. And we all
know that one of the things you pay for, when you dole out your wedges of
dosh for a car of that kind, is the fact that all the luxury gadgets are
/built into/ the car, and thoroughly integrated (with the car and all the
other gadgets). Inevitably, you end up with a heavyweight car, but that's in
the nature of the car.

In the pursuit of the software engineering goals that are associated with
the fact that big software is written by teams of programmers who are
fallible humans, Ada is (rightly) designed to have as many facilities built
in which reasonably can be. There are many things built into Ada that are
add-ons to most other programming languages, and there are some significant
practical advantages to this fact.

So when saying something like "Oh, but consider the distributed overhead,"
think about what kind of car we're trying to design here.

I know how tricky it is building optimisations into a compiler, especially
when you're trying very hard not to reduce the quality (reliability) of its
code generation. But I think the time has come and gone when optimisation
was an optional feature of compilers. Expectations have moved on.

To return to the issue in this thread, Tucker was presumably saying that the
feature I suggested -- that access values referencing empty (zero size)
objects should have distinct values for different objects -- would add a
speed or memory penalty to all implementations, on the basis that access
values would have two kinds of value, rather than just one (an address).

I said, more than once, that any compiler could eliminate the overhead
(associated with having two kinds of value) for any access type that it
could (statically) deduce could never reference an empty object. I hesitate
to use the word 'trivial' (perhaps it's fair to say that no optimisation is
trivial), but it doesn't seem overly complicated. And I'm fairly sure that
the vast majority of access types would turn out to be readily optimisable
in this way.

I don't want Ada to become a hopelessly overweight language, but it is
already a heavyweight language, compared to the typical alternatives, and we
should be proud of it. I think that shrinking from adding even small extra
complexities to the language at this point would be falling between two
stools.

****************************************************************

From: Robert A. Duff
Sent: Thursday, August 28, 2003  8:38 AM

Tucker wrote:

> In re-reading 4.5.2(12), in comparison with 4.5.2(13),
> it certainly seems "reasonable" to conclude there is
> no requirement that access-to-object values be *un*equal if
> they designate different objects.

This is a very "creative" reading indeed.  ;-)  Methinks you're trying to
make it say what you *want* it to say.

Pascal points out (in response to Randy) that "if" often means "if and
only if" in the RM.  How one can tell in general whether a given "if"
means just "if" is an interesting question, but in *this* case it
clearly means "if and only if".  I rely on Robert's rule that the RM
never says anything silly: If "if" meant just "if" here, then an
implementation that always returned True for "=" on *all* access types
would be correct, which is patently ridiculous (and by Robert's rule,
language designers never say ridiculous things.  ;-))

I am very much opposed to an "=" operator that returns True in some
cases, but in other cases, returns True or False at the whim of the
implementer.  Here's a typical singly-linked list walk:

    while L /= null loop
        ...
        L := L.all.Next;
    end loop;

Your interpretation of 4.5.2(12) means that L /= null can be False when
L is not null, which is clearly nonsensical, and would break many
programs.  Therefore, this particular "if" must mean "if and only if",
and we don't need an AI to say so, because it's obvious.

You say, "in comparison with 4.5.2(13)", but I think you read too much
into that comparison.  Para 13 is written that way because of the
special permission to return the wrong answer in some cases, for
access-to-procedure.  As I recall, this special permission was granted
at the request of one particular vendor (a vendor that no longer
supports Ada, in fact).  It's poor language design, and I said so at the
time, and you agreed, but we all gave in to the vendor.  In any case,
para 12 has no such permission, and is therefore written in plain
English.

>...It would also be
> reasonable to read "if" as "if and only if" so we
> clearly have an ambiguity which deserves clarification by
> the ARG.

No.  "If and only if" is *clearly* the intended meaning.
So any relaxation of the requirements on "=" would be a language
change -- an inadvisable language change, in my opinion.

> One way Randy and others have helped illuminate such
> issues in the past has been by writing up an example
> and seeing what various compilers do with it.  If all
> compilers agree, then that is an argument for solidifying
> the interpretation that way.  If the compilers don't
> agree, then the general approach is to disambiguate in
> favor of least implementation disruption, unless there
> is a compelling user requirement to the contrary.

That approach isn't much help when you're proposing that it be
implementation dependent whether "=" returns True or False in certain
cases.  Whatever the results of the experiment, they support the
contention that "it is implementation dependent whether blah blah
blah..."

> Here, we have heard some user arguments in favor of
> requiring inequality if the designated objects are
> different, but it has not reached the level of
> "compelling" in most readers' minds, apparently.

It is compelling in my mind.

> So...
>
> Here is a simple example to illustrate the issue.
> Please give this a try and see what your favorite
> compiler produces.  One of ours produces False,False,False.
> Another produces False, True, True.

And you find this kind of implementation variability acceptable?
I don't.  I think one of our implementations has a bug.  Pascal's
favorite compiler does not.  Nick reports that GNAT does.

****************************************************************

From: Robert A. Duff
Sent: Thursday, August 28, 2003  8:38 AM

I disagree here with some of the points made by Tucker, Pascal, and
Randy.  I don't like the idea of changing the meaning of "=" on access
values, and I don't buy the claim that the semantics is currently ill
specified.  And I see no distributed overhead.

Anyway, here's my answer to Adam Beneschan's original question:

> The Allocate routine in System.Storage_Pools declares the
> Size_In_Storage_Elements parameter to be of type
> System.Storage_Elements.Storage_Count, which is a subtype whose lower
> bound is zero.
>
> My question: When defining a new type derived from Root_Storage_Pool,
> and providing a Allocate routine for that type, is the Allocate
> routine expected to correctly handle the case when this parameter is
> zero?

No.  First of all, a user-defined allocator can do anything it likes.
For example, it is legitimate to say:

    procedure Allocate(...) is
        pragma Assert(Size_In_Storage_Elements = 12);

or:

    procedure Allocate(...);
    -- If Size_In_Storage_Elements is not 12, this will behave in an
    -- unpredictable manner, ... demons ... noses ...

In fact, the whole point of having this feature in the language is so
that users can take advantage of special properties of the objects being
allocated (for efficiency reasons).  I have written a number of storage
pool types of this nature.  They have documented restrictions on what
the client can do with them (all objects must be the same size, all
object must be small, all objects must have a particular alignment, etc).

Two access values that designate distinct objects must be unequal.
The most sensible way for an implementation to do this, in my opinion,
is to check for zero size at the call site.  If size = 0, pass 1 instead.

This is more efficient than having Allocate deal with it, because in
99.99% of all cases, the size is known at compile time to be zero or
nonzero.  (OK, I admit it, I made up that 99.99% statistic.  But I think
it's not too far wrong.)  Thus, the compiler can do the check at compile
time, so there is no cost at run time.  Note that the size need not be
known at compile time.  For example, if a dynamic array is being
allocated with run-time-known size, and if dope is stored with the array
(the usual case), we know at compile time that more than zero bytes are
being allocated, even though we don't know how many.  This optimization
is trivial, so I'm surprised to hear the various compiler writers
whining about it.

I was going to say that we should change the RM to *require* this
behavior.  That is, require that 0 never be passed to Allocate, so that
it must be dealt with at the call site (either by turning size=0 into
size=1, or by more esoteric tricks).

However, I now believe that this is *already* required, though the
argument is somewhat subtle.  Here's the contract for Allocate
(AARM-13.11(21)):

 21    {erroneous execution (cause) [partial]} If Storage_Pool is specified for
 an access type, then if Allocate can satisfy the request, it should allocate a
 contiguous block of memory, and return the address of the first storage
 element in Storage_Address. The block should contain Size_In_Storage_Elements
 storage elements, and should be aligned according to Alignment. The allocated
 storage should not be used for any other purpose while the pool element
 remains in existence. If the request cannot be satisfied, then Allocate should
 propagate an exception [(such as Storage_Error)]. If Allocate behaves in any
 other manner, then the program execution is erroneous.

If Allocate obeys this contract, then the implementation must ensure
that the semantics of access types is obeyed, including "=" on
zero-sized things.  The contract does *not* require that Allocate return
unique addresses; it just requires that the block of storage not overlap
anything else, which is trivially true for a zero-sized block.

Therefore, the implementation must arrange for "=" to work on
access-to-zero-sized things, and the simplest way to do that would be to
never pass in 0 (pass 1 instead).  I think we should change the RM to
require that more clearly.

> Consider this:
>
>     procedure Try (Pool : in System.Storage_Pools.Root_Storage_Pool'Class;
>                    N    : in Integer) is
>
>         type Arr is array (Natural range <>) of Character;
>         type Arr_P is access Arr;
>         for Arr_P'Storage_Pool use Pool;
>
>         P  : Arr_P;
>         P2 : Arr_P;
>     begin
>         P  := new Arr (1 .. N);
>         P2 := new Arr (1 .. N);
>     end Try;
>
> Suppose we call Try(0).
>
> We need to "allocate" a block of memory of size zero.  Of course, P
> and P2 have to be set to something that is not null, and they can't be
> equal to each other.

Correct.  However, I claim this is the responsibility of the
implementation, not of the user-defined Allocate procedure.

> Which of the following is true?  In order for things to be portable,
> one or the other of these has to be true (and perhaps ought to be
> stated in the RM or AARM):
>
> (1) Allocate routines are not expected to deal with
>     Size_In_Storage_Elements = 0.  Compilers that generate (possibly
>     dispatching) calls to Allocate must be careful to make sure the
>     parameter is not zero.  (They might, for instance, use a parameter
>     of 1 in a case such as the above, or they might invoke a separate
>     mechanism to deal with zero-length allocations.)

True.

> (2) Compilers are allowed to generate calls to Allocate with
>     Size_In_Storage_Elements = 0;

Technically true, but inadvisable, because then the implementation would
have to stand on its ear and spit nickles in order to implement "="
correctly.

>... Allocate routines are expected to
>     handle them appropriately.

False; that is, according to RM-13.11(21), Allocate of 0 size can return
the same address as some other allocate (in the past, or in the future,
zero-sized or not).

****************************************************************

From: Nick Roberts
Sent: Thursday, August 28, 2003  10:47 AM

> I disagree here with some of the points made by Tucker, Pascal,
> and Randy.  I don't like the idea of changing the meaning of "=" on
> access values, and I don't buy the claim that the semantics is
> currently ill specified.  And I see no distributed overhead.

I would add that it is still my contention that it is /important/ that "="
always returns False for access values that reference different objects,
particularly for a designated type that is limited, but possibly also for
other designated types. I agree with Bob's interpretation of the 'if' in
4.5.2(12), but I think adherence to this interpretation matters (rather than
just being a bureaucratic nicety).

> Two access values that designate distinct objects must be
> unequal. The most sensible way for an implementation to do this
> in my opinion, is to check for zero size at the call site.  If size = 0,
> pass 1 instead.

I'm not too keen on this idea. I have a suspicion that it could possibly
break certain algorithms in the absence of either explicit deallocation or
garbage collection. I'd prefer a technique that allocated no (heap) memory
at all.

What algorithms? I'm thinking of a generic unit that gets passed two integer
formal parameters, let's say N1 and N2. It may be a legitimate combination,
in a limiting case, for say N1=0 and N2=huge, and it may be that the
algorithm involves allocating an object of size N1*k in a loop that iterates
N2 times. You may scoff, but I think it would be a shame for such an
algorithm to fail because it runs out of heap space. It would also be a
shame if the algorithm was too slow in this case because of time spent
allocating or deallocating memory (that would be obviated by using a
technique that did not allocate any heap at all).

I might invoke the Principle of Least Surprise. Is it not going to be
surprising to find that the access values referencing two different objects
compare equal? Is it not going to be surprising to find that allocating an
object of zero size uses up some heap space?

> > (2) Compilers are allowed to generate calls to Allocate with
> >     Size_In_Storage_Elements = 0;
>
> Technically true, but inadvisable, because then the implementation
> would have to stand on its ear and spit nickles in order to
> implement "=" correctly.

Point taken, but I'm suggesting that this is a case where it's better for
compilers to (have to) perform the aforementioned feats of aural specie
progenesis than to (be allowed to) surprise the user.

Although admitedly, a compiler standing on its ear and spitting nickles
might be a /little/ bit surprising.

****************************************************************

From: Adam Beneschan
Sent: Thursday, August 28, 2003  11:25 AM

Bob Duff wrote:

> I was going to say that we should change the RM to *require* this
> behavior.  That is, require that 0 never be passed to Allocate, so that
> it must be dealt with at the call site (either by turning size=0 into
> size=1, or by more esoteric tricks).
>
> However, I now believe that this is *already* required, though the
> argument is somewhat subtle.  Here's the contract for Allocate
> (AARM-13.11(21)):

Tucker didn't see it that way.  And to me, having two heavyweights
disagreeing on the correct interpretation of an RM paragraph is, in
itself, compelling evidence that the RM needs to be "changed", or at
least reworded to clarify the meaning.

...
> > (2) Compilers are allowed to generate calls to Allocate with
> >     Size_In_Storage_Elements = 0;
>
> Technically true, but inadvisable, because then the implementation would
> have to stand on its ear and spit nickles in order to implement "="
> correctly.
>
> >... Allocate routines are expected to
> >     handle them appropriately.
>
> False; that is, according to RM-13.11(21), Allocate of 0 size can return
> the same address as some other allocate (in the past, or in the future,
> zero-sized or not).

I don't see how (2) can be partly true (even "technically true") and
partly false.  If the compiler is allowed to generate code passing
zero to the allocate routine when a zero-size object is allocated, and
Allocate is allowed to raise Storage_Error or do any fool thing it
pleases when it gets a zero size parameter, then how can we expect
things to work?  You might as well say that program execution is
erroneous if you call an allocator on a zero-sized object.

****************************************************************

From: Robert A. Duff
Sent: Thursday, August 28, 2003  11:44 AM

Tucker wrote:

> [I have always found C's concern about zero-length objects
> to be misplaced.  It is often natural to have zero-length
> arrays, and even records, but in C you have to go out
> of your way sometimes to be sure you don't try to define
> a zero-length array, or else the compiler, if ANSI/ISO
> compliant, will slap your wrists.

I agree with all that, but it seems to argue in favor of making
zero-sized things work properly -- the opposite of what you're arguing.
A common mistake in language design is to fail to allow for the zero
case.  For example, Ada's 'while' and 'for' loops are *better* than
Fortran's DO loop, because the DO loop doesn't properly deal with
zero-trip loops.  (I don't know modern Fortran well; I'm talking about
FORTRAN 66.)  Another example: it is a flaw in Ada that you can't write
a zero-length (or one-length!) positional array aggregate.

>...  I have heard the argument
> relates to pointer equality, but again, what is the big
> deal if pointers to two logically distinct objects
> happen to be equal?]

Well, C is different, because everything is aliased.

But anyway, I think you and Pascal and Randy are falling into the trap
of designing the language to make the compiler-writer happy.  Please
everybody take off their compiler-writer hats, and put on their
language-designer hats, and pay attention to what user's need.

Users need a mathematically clean semantics.  Not little hacks like
"equality might not work on some implementations under certain
circumstances".

Equality of access types is all about object identity.  This is why
X=Y is different from X.all=Y.all.  It seems very important to me that
X=Y returns true if AND ONLY IF the designated objects are the same
object (or both null).  It doesn't matter where the access values came
from ("new" or "X'Access") -- if they designate distinct objects, then
the access values should be unequal.

Nick Roberts wrote:

> It still seems clear to me that an access value is not just a representative
> of a storage space (as an address is), but rather it is also a token
> representing a (conceptual, as well as actual) object. Possibly this point
> is only properly relevant to access-to-limited-type types, but I feel that
> when you say "new My_Limited_Type", you don't just mean "allocate some space
> on the heap, please", you also mean "create a new token to represent a
> distinct object". That second meaning should hold true even if the object
> happens to have zero size.

Well said.

The argument seems to be that access-to-zero-size is rare, so who cares
what "=" does in that case.  If that were true, "=" should raise an
exception, rather than returning a wrong answer.  It would be worse to
return a correct answer on some implementations, and a wrong answer on
others.

I understand that zero-sized things are rare, and we don't want to pay
distributed overhead for their existence, but I see no reason to believe
that "=" is rarer on access-to-zero-size than it is on other access
values.  On the contrary: there's no data at the end of that pointer to
fiddle with, so just about the only thing you want to do is "=".

In any case, proving that some feature is rare is not a proof that it's
pathological.  That's what causes nasty bugs: rare cases that do
something unusual.  We don't need proof that this feature is useful
(although Nick Roberts made a pretty good argument to that effect).
To change the rules, we need a proof that this feature is useless,
of which I've seen none.

Back to Tuck:

> So I would probably recommend that 4.5.2(12) explicitly
> allow for the possibility that access values might be
> equal even if they designate logically distinct objects,
> so long as the objects have no updatable components.

Sorry, Tuck, but this seems even crazier than the zero-size idea, to me.
I thought the problem was that you don't want to have the overhead of
saying "if size=0 then allocate 1 byte instead".  But without that
check, access-to-zero-size can end up "=" to access-to-nonzero-size,
which the above rule doesn't cover.  And the above rule violates the
principle that object identity is different from equality of the
designated values.  It seems crazy to allow an implementation to make a
hash table and return equal access values for new objects whose
designated values happen to be equal (a pessimization!).

Nick Roberts wrote:

> I really need someone to come galloping to my rescue at this point.

At your service.

I see no reason for any distributed overhead.  I see two potentials for
distributed overhead.  (1) "new" needs to somehow check for the
zero-size case, and do something different.  Elsewhere, I claimed that
this is trivially optimized away in 99.99% of cases.  (2) All aliased
objects need to be allocated non-zero space.  I think it's silly to
worry about this one, since zero-sized objects are rare, but if you want
to worry about it, fairly simple compiler optimizations can eliminate
this overhead.  Nick outlined how.  Tuck said he didn't understand, and
Pascal said it involved overhead on every dereference, and then perhaps
Nick confused the issue by talking about all manner of complicated
optimizations, which are unnecessary, I think.

In fact, I think Nick's idea is just fine, and need not have any
overhead.  An address is some (say) 32-bit value.  Clearly, there are
many bit patterns that do not represent actual storage locations for
objects of the type in question.  A "new" of a zero-sized object just
needs to return a unique one of those.  Surely that's trivial.  For
example, "new (zero-sized-thing)" could return an address pointing into
the code segment.  A unique one.

Pascal and Nick were worried about extra instructions executed on every
dereference.  But I think that's unnecessary, and it falls out easily
without fancy compiler work.  Three cases:

    - The designated object is known at compile time to be zero size.
      Deref does nothing.

    - The designated object is known at compile time to *not* be zero
      size.  This is the usual case, as I have argued elsewhere.
      Clearly, nothing special need be done in this case.

    - The size of the designated object is dynamic, and might be zero.
      Here, dereferencing involves either a no-op (in case of
      pass-by-ref), or a copy of N bytes or N words or whatever,
      where N is dynamic.  Well, copying N bytes/words where N happens
      to be 0 is a no-op, and does not touch the designated memory.
      No special check for zero is needed -- just the usual loop,
      checking whether we're done copying N bytes.

So Nick's optimization is trivial, I claim.  I also claim that even this
trivial work is a waste of energy, given the rarity of zero-sized
*aliased* objects.

So where's this horrible distributed overhead that makes everybody want
to define "=" on access values to do something other than compare object
identities?!

****************************************************************

From: Robert A. Duff
Sent: Thursday, August 28, 2003  12:00 PM

Robert A Duff wrote:
>
> Tucker wrote:
>
> > In re-reading 4.5.2(12), in comparison with 4.5.2(13),
> > it certainly seems "reasonable" to conclude there is
> > no requirement that access-to-object values be *un*equal if
> > they designate different objects.
>
> This is a very "creative" reading indeed.  ;-)  Methinks you're trying to
> make it say what you *want* it to say.

No, I truly wasn't.  It seems careful to say that *if* they designate
the same object, then they are equal.

> I am very much opposed to an "=" operator that returns True in some
> cases, but in other cases, returns True or False at the whim of the
> implementer.  Here's a typical singly-linked list walk:
>
>     while L /= null loop
>         ...
>         L := L.all.Next;
>     end loop;
>
> Your interpretation of 4.5.2(12) means that L /= null can be False when
> L is not null, which is clearly nonsensical, and would break many
> programs...

That's not actually what I said.  I said 4.5.2(12) was ambiguous,
and I wasn't certain how it should be interpreted.  I never in my
wildest dreams interpreted it as meaning that a null access value could
ever be equal to an access value that designated an object, even
a zero-size object.

> ...
> >...It would also be
> > reasonable to read "if" as "if and only if" so we
> > clearly have an ambiguity which deserves clarification by
> > the ARG.
>
> No.  "If and only if" is *clearly* the intended meaning.
> So any relaxation of the requirements on "=" would be a language
> change -- an inadvisable language change, in my opinion.

Well here I don't agree.  The intended meaning is not completely
clear (to me) in the context of neighboring paragraphs, and zero-sized
objects and access-to-constant values are unusual enough
that I suspect they were not carefully considered for the
definition of access-value equality.  (I mention access-to-constant
values here because I could imagine an implementation that shares
storage between two access-to-constant allocators that create
constants with identical values.  This kind of sharing is quite common
for string literals in C, which are essentially like access-to-constant-Strings
in Ada.)

> > One way Randy and others have helped illuminate such
> > issues in the past has been by writing up an example
> > and seeing what various compilers do with it.  If all
> > compilers agree, then that is an argument for solidifying
> > the interpretation that way.  If the compilers don't
> > agree, then the general approach is to disambiguate in
> > favor of least implementation disruption, unless there
> > is a compelling user requirement to the contrary.
>
> That approach isn't much help when you're proposing that it be
> implementation dependent whether "=" returns True or False in certain
> cases.  Whatever the results of the experiment, they support the
> contention that "it is implementation dependent whether blah blah
> blah..."

Alternatively, the results support the contention that code that has
been ported to several compilers must not be critically dependent
on the answer, and so it is overspecification to say what happens
for zero-sized objects.

>
> > Here, we have heard some user arguments in favor of
> > requiring inequality if the designated objects are
> > different, but it has not reached the level of
> > "compelling" in most readers' minds, apparently.
>
> It is compelling in my mind.

What is "compelling"?  That the "if" was intended to mean "if and only if"
in this paragraph, or that there is useful code that depends on access values
designating distinct zero-sized objects having unequal values?

I find the case of aliased components that happen to be of zero-size a
pretty convincing argument for not requiring such objects to have unique
access values.

>
> > So...
> >
> > Here is a simple example to illustrate the issue.
> > Please give this a try and see what your favorite
> > compiler produces.  One of ours produces False,False,False.
> > Another produces False, True, True.
>
> And you find this kind of implementation variability acceptable?

Yes.  Comparing addresses of objects is fundamentally implementation-dependent,
especially zero-sized objects.  I find the C rule that no object
can be of zero size a big waste of energy, and forcing the implementation
to do it for aliased objects in Ada seems a similar waste of energy.

You and I have generally had different comfort levels with implementation
variation.  I believe the language should not specify things that
users shouldn't care about to begin with, such as order of evaluation
of parameters, precision of intermediates, etc.  Java made an effort
to specify all of these things exactly, but that seems misguided since
Java is inherently a multi-threaded languages, and relative rates of
progress of threads can't be specified.  Furthermore, their efforts to
specify precision of floating point intermediates ultimately ended
up causing serious performance problems for X86 implementations, and
so they have pretty much dropped it.

I think I was particularly influenced by Djikstra's book "A Discipline
of Programming" on the value of not overspecifying semantics that
shouldn't matter to the user.  In the language in that book, a multi-arm
"if" or "do"-loop allowed overlapping conditions, in which case it
was implementation-dependent which "arm" would be chosen.  He shows
that this actually simplifies proofs in some cases.

[An interesting counter example to our general preferences is that
I tend to always use short-circuit logical operations, whereas you
seem to prefer symmetrical, order-unspecified, "and"s and "or"s.]

> I don't.  I think one of our implementations has a bug.  Pascal's
> favorite compiler does not.

Though Pascal seems to regret having to do the funny business with
aliased components to make it work the way it does.

> Nick reports that GNAT does.

Given that we supply front ends to Aonix and Green Hills, this means
that the "big 4" vendors, ACT, Aonix, Green Hills, and Rational all
might have different behavior with respect to access values designating
zero-sized objects.  This can't be something that is too critical
to a user who cares about portability across compilers, which after
all, is what standards are all about.  Standards should specify enough
to provide useful portability, but without overspecifying things that
are of little or no interest to users.

****************************************************************

From: Robert A. Duff
Sent: Thursday, August 28, 2003  1:05 PM

Then you need to make an argument that it is naughty to depend on the
result of "=" on access-to-zero-sized things.  I have seen no such
argument -- just complaining from compiler vendors about distributed
overhead (which I argued in another message is not a real issue; I'd
like to hear your response).

Not an argument that it's *rare* to do such a thing.  An argument that
it is *bad practise* in the same sense as depending on order of eval and
the like.  In other words, please explain what is special about the
zero-size case that makes object identity irrelevant or evil.

****************************************************************

From: Tucker Taft
Sent: Thursday, August 28, 2003  2:04 PM

> Then you need to make an argument that it is naughty to depend on the
> result of "=" on access-to-zero-sized things.

Well "naughty" is hard to define.  It seems pointless.  A user
could depend on all kinds of things.  Why should they expect
that zero-sized objects occupy non-zero space?  It seems especially
mysterious that an array of zero-sized objects is not itself
of zero size, if the components are declared "aliased."

And how "big" should zero-sized aliased components be?
Can users rely on zero-sized aliased components being exactly
one addressible unit in size, or one word in size, or one
gigabyte in size?

> ... I have seen no such
> argument -- just complaining from compiler vendors about distributed
> overhead (which I argued in another message is not a real issue; I'd
> like to hear your response).

It seems that if you have two consecutive aliased components in a record,
or an array of aliased components, whose size is not known at compile-time,
then there will have to be code at run-time to calculate the size, and if zero,
bump it up to something non-zero.  Similarly, you will need to worry about the
gaps that doing this creates, so that equality comparison will
know not to use block-compare, or you will have to initialize the
gaps.  Every allocator for an array which might be of zero size will
need code to check for this special case.  This seems like distributed
overhead.  We can argue about whether it is significant, but it seems
clear that it will affect all array allocations where the high bound is dynamic
and might be less than the low bound.

> Not an argument that it's *rare* to do such a thing.  An argument that
> it is *bad practise* in the same sense as depending on order of eval and
> the like.  In other words, please explain what is special about the
> zero-size case that makes object identity irrelevant or evil.

It's bad practice to depend on things which are not necessarily true ;-).

When it comes to zero-sized objects, it seems bad practice to assume
they actually take up heap space.  Or alternatively, it seems like bad
practice to assume that logically distinct objects must have distinct
access values, if it would be in all other ways OK for them to share the same
address (e.g., because they are of zero size, or they are both acc-to-constant
values and they designate constants with identical values).

Your argument seems to rest on the fact that it is mathematically cleaner
to be able to say that if two objects are distinct, then their access
values are not equal.  But that seems like a tautology.  Suppose I claim it is
mathematically cleaner to make no such assumption.  Neither of us really
have an argument here.  Ultimately I think we have to recognize that access
value equality is testing exactly whether two objects have the same address,
no more, and no less.  I am willing to agree that a given object has one
and only one address, and so all access values designating a given object
must be equal, but I don't see any mathematical requirement that two different
objects not share the same address.  It is sort of whether we choose the
Fermion or Boson statistics ;-).

****************************************************************

From: Randy Brukardt
Sent: Thursday, August 28, 2003  3:01 PM

Bob said:
> I see no reason for any distributed overhead.  I see two potentials for
> distributed overhead.  (1) "new" needs to somehow check for the
> zero-size case, and do something different.  Elsewhere, I claimed that
> this is trivially optimized away in 99.99% of cases.

This is false. It was true in Ada 83, but since the pool interface is
exposed to the user in Ada 95, you cannot depend on the calls having the
correct form. (Janus/Ada 83 not only prevented 0 as a size [which crashed
the assembly code that implemented the heap manager], but also rounded sizes
up to the next alignment boundary [usually 4]. But for Ada 95, we had to
move that code into the pool, because a user-written call could violate the
assumptions, and we didn't want the system to crash in that case.

The only way to do anything at the compiler level would be to have a
separate interface for the standard heap, but that seems like overkill (why
have multiple ways of allocating memory?) and is certainly distributed
overhead in the compiler (you'd have to decide which method to use on every
allocation and deallocation).

> (2) All aliased
> objects need to be allocated non-zero space.  I think it's silly to
> worry about this one, since zero-sized objects are rare, but if you want
> to worry about it, fairly simple compiler optimizations can eliminate
> this overhead.

The trouble is that you don't know at compile-time that the objects have
zero size. The usual problem is that you have an array subtype that is
dynamically-sized and might be zero. You then need distributed overhead to
handle check whether it is zero and somehow allocate more than the normal
memory. But, as we all agree, this case will be rare, yet it will occur on
every dynamic component.

> ...  Nick outlined how.  Tuck said he didn't understand, and
> Pascal said it involved overhead on every dereference, and then perhaps
> Nick confused the issue by talking about all manner of complicated
> optimizations, which are unnecessary, I think.
>...
>     - The size of the designated object is dynamic, and might be zero.
>       Here, dereferencing involves either a no-op (in case of
>       pass-by-ref), or a copy of N bytes or N words or whatever,
>       where N is dynamic.  Well, copying N bytes/words where N happens
>       to be 0 is a no-op, and does not touch the designated memory.
>       No special check for zero is needed -- just the usual loop,
>       checking whether we're done copying N bytes.

This is also false. Simply setting up the addressing for a block move could
cause an address fault on common processors. I do agree that if you make up
the move as a loop of very simple instructions, it wouldn't have that
property, but that alone is a distributed (size) overhead -- such loops are
quite a bit larger than the single instruction that they are replacing.

I don't know that I care enough about this issue -- no matter what is
decided, I'm not going to change how Janus/Ada deals with these cases, and I
rather doubt that anyone else is, either. (There is no user complaint here,
only hot air.) I'm not going to force objects to be larger or bloat program
size just so Nick and Bob can have a "mathematically pure" language.

****************************************************************

From: Nick Roberts
Sent: Thursday, August 28, 2003  4:49 PM

"Randy Brukardt" <randy@rrsoftware.com> wrote:

> I don't know that I care enough about this issue -- no matter
> what is decided, I'm not going to change how Janus/Ada deals
> with these cases, and I rather doubt that anyone else is, either.
> (There is no user complaint here, only hot air.) I'm not going to
> force objects to be larger or bloat program size just so Nick
> and Bob can have a "mathematically pure" language.

I going to try one more time to convince people that issue is not just hot
air.

Consider the following example, which I hope doesn't seem too far fetched.

   type A_String is access String;

   function "+" (S: String) return A_String is
   begin
      return new String'(S);
   end;

   type A_String_Array is array (Positive range <>) of String;

   Zoo: constant A_String_Array := (+"aardvark", +"baboon", ... );

This much is taken directly from Barnes [Programming in Ada 95 4th ed. p
184]. Later on he sets an exercise for the reader to print out this ragged
array.

   procedure Print (A: in A_String_Array) is
      Sentinel: constant A_String := A(A'Last);
      Current: Positive := A'First;
   begin
      loop
         Put_Line( A(Current).all );
         exit when A(Current) = Sentinel;
         Current := Current + 1;
      end loop;
   end;

Of course, this is not the most obvious implementation, but it appears
correct (provided it is not passed an empty array), and differences of
detail could prompt this kind of implementation in a real case. I suggest it
might puzzle, and perhaps dismay, many a programmer when:

   Print( (+"foo",+"bar",+"",+"") );

prints only three lines, or, worse, that:

   Print( (+"foo",+"",+"bar",+"") );

prints only two lines.

I appreciate that Janus/Ada would never commit either of these sins, but I
think we've established that some other compilers probably would commit the
first one, and possibly the second. Out of interest, I wonder what Janus/Ada
(and other compilers) would do if the first declaration were changed to:

   type A_String is access constant String;

especially if the following declarations and call were made:

   X1: constant aliased String := "foo";
   X2: constant aliased String := "bar";
   X3: constant aliased String := "";
   X4: constant aliased String := "";

   Print( (X1'Access,X2'Access,X3'Access,X4'Access) );

As an aside, I suggest it might be even more puzzling (and dismaying) for:

   Print( (+"foo",+"bar",+"hum",+"foo") );

to print just one line.

****************************************************************

From: Robert A. Duff
Sent: Thursday, August 28, 2003 12:23 PM

> Tucker didn't see it that way.  And to me, having two heavyweights
> disagreeing on the correct interpretation of an RM paragraph is, in
> itself, compelling evidence that the RM needs to be "changed", or at
> least reworded to clarify the meaning.

Well, maybe.  I'm hoping to convince Tuck (and Pascal and Randy)
that they were wrong on this point.

> > > (2) Compilers are allowed to generate calls to Allocate with
> > >     Size_In_Storage_Elements = 0;
> >
> > Technically true, but inadvisable, because then the implementation would
> > have to stand on its ear and spit nickles in order to implement "="
> > correctly.
> >
> > >... Allocate routines are expected to
> > >     handle them appropriately.
> >
> > False; that is, according to RM-13.11(21), Allocate of 0 size can return
> > the same address as some other allocate (in the past, or in the future,
> > zero-sized or not).
>
> I don't see how (2) can be partly true (even "technically true") and
> partly false.  If the compiler is allowed to generate code passing
> zero to the allocate routine when a zero-size object is allocated, and
> Allocate is allowed to raise Storage_Error or do any fool thing it
> pleases when it gets a zero size parameter, then how can we expect
> things to work?  You might as well say that program execution is
> erroneous if you call an allocator on a zero-sized object.

(2) can be partly true because that's what the RM says (clearly, I would
claim -- once I studied the relevant paragraphs for hours. ;-))
The RM does not forbid the implementation from passing zero.
The RM does not require Allocate to return unique addresses.
The RM does require equality to work.
Therefore, the implementation must use some means of making equality
work (in the zero case) that does not rely on the Allocate routine
returning unique addresses.

That's all kind of silly; I think the RM should explicitly forbid the
implementation to pass 0 to Allocate.

The contract for Allocate says it can do one of three things,
if the size is 0:

    1. Raise an exception.  In this case, the "new" propagates the same
       exception.

    2. Return the address of a block of memory that does not overlap
       anything else.  Since a zero-sized block does not overlap
       anything else, any address will do, including the same address
       returned for some other Allocate call, whether zero or not.
       In this case, the implementation is required to make equality
       work.

    3. Anything else.  In this case, the program is erroneous if zero is
       passed in.  But that's OK -- there is presumably a comment
       telling clients "don't do that".

****************************************************************

From: Adam Beneschan
Sent: Thursday, August 28, 2003  9:44 PM

...
> Well, maybe.  I'm hoping to convince Tuck (and Pascal and Randy)
> that they were wrong on this point.

I don't see how that helps.  If Tuck did get it wrong, then how can we
expect a poor less-elevated user, who's writing his own storage pool
manager and trying to read RM to figure out whether they need to write
their Allocate routine to handle Size_In_Storage values of zero, to
get it right?  I still maintain that whatever the result of this
discussion, the RM needs to address this issue.

...
> (2) can be partly true because that's what the RM says (clearly, I would
> claim -- once I studied the relevant paragraphs for hours. ;-))

Perhaps I should have made it clear that I was asking about what the
RM ought to say, rather than what it does say.  If it actually says
anything to the effect that (2) is partly true and partly false, it's
a bug.

****************************************************************

From: Robert Duff
Sent: Saturday, August 30, 2003  8:48 AM

I feel like this argument is going around in circles, and I find it
confusing.  I'm not sure, for example, exactly what Tuck, Pascal, and
Randy believe the semantics of "=" on access values ought to be.  So let
me start by summarizing *my* position:

    1. I think it is essential that access "=" reflect object identity.
       That is, X = Y if and only if both are null or else X.all and
       Y.all are the same object.  I think object identity is clearly
       defined: each elaboration of an object declaration creates an
       object (and perhaps some component objects), each evaluation of
       "new" creates a new object, etc.

    2. I think the RM already says that, quite clearly.

    3. I don't care if this requires distributed overhead, since this is
       the only possible sensible semantics for "=".

    4. However, I don't *like* distributed overhead, and would like to
       eliminate it.  The appropriate way to do that is through compiler
       optimizations, not by damaging the language definition by putting
       in error-prone rules.  I claim that simple compiler optimizations
       can eliminate the distributed overhead in all but a vanishingly
       small number of cases.  Nick outlined the basic scheme, but he
       made it sound a lot more complicated, and a lot less efficient,
       than it needs to be.

Now, please, all of you summarize your positions.  I think we all agree
that "=" should return True if the two objects are the same object.
(Do we at least agree on what "same object" means -- see above?)
I think it should return False if they are not the same object.
You three seem to think it should be allowed to return either True or
False (impl-dep) in some circumstances; please precisely describe those
circumstances.  Your description cannot refer to "the size allocated",
since that's ill defined.  You can refer to 'Size, or 'Length of an
array, or "null record", or "range 10..10", or other well defined
concepts.

----------------

It seems to me that if "=" can return either True or False under certain
circumstances, then the programmer must avoid calling "=" under those
circumstances.  Do you agree?  If so, then "=" really ought to raise an
exception under those circumstances, since calling it is a bug.  If not,
please produce an example of a program that calls "=" and doesn't care
about the answer under those circumstances.

----------------

A point you seem to be missing is that to avoid *all* distributed
overhead, you need to allow an access-to-zero-sized thing to be equal to
an access-to-nonzero-sized thing (which happens to be allocated just
before, or just after the access-to-zero-sized thing).  This would mean
that X=Y does not imply X.all=Y.all, a consequence I find unacceptable.
Please clarify whether the "circumstances" include this case.

Tuck wrote:

> Robert A Duff wrote:
> >
> > Tuck wrote:
> >
> > > I believe the language should not specify things that
> > > users shouldn't care about to begin with, such as order of evaluation
> > > of parameters, precision of intermediates, etc.
> >
> > Then you need to make an argument that it is naughty to depend on the
> > result of "=" on access-to-zero-sized things.
>
> Well "naughty" is hard to define.

That's my point.  Your whole argument is based on the premise that it is
naughty to depend on the result of "=" (in some circumstances that I
don't fully understand).  If you can't explain why that's true (other
than it being slightly less efficient), then your argument doesn't hold
water.

My point is that the burden of proof is on you.  If you can't make a
strong argument that object identity is an evil thing to depend on
(under some circumstances), then you have no business making rules that
will cause bugs if the programmer *does* ask about object identity
(under those circumstances).

You also need to counter Nick's examples.  If those are totally bogus,
you need to explain why, and also give some realistic example of
zero-sized objects of any sort.

I could claim that 1_298_484_999 is a very unusual number, and that
therefore multiplying it by 2 should have impl-def semantics.  But the
burden of proof would be on me, to prove that 1_298_484_999 is totally
useless, and in fact pathological.  I'm not allowed to challenge folks
to prove that some useful program actually uses that number.

>...It seems pointless.

Not sure what "it" refers to here.  Defining "naughty"?  Depending on
the result of "="?  If the latter, please explain why.

>...  A user
> could depend on all kinds of things.

Yes, of course.  The RM is all about defining those things!

>...  Why should they expect
> that zero-sized objects occupy non-zero space?

They should not.  Nor should they expect that zero-sized objects occupy
zero space.  As you well know, the RM does not define how much space
various object will occupy.

As you also well know, most aliased zero-sized arrays do *not* occupy
zero space, because the subtype is usually unconstrained, and most
compilers allocate dope with the object in that case.

Consider access-to-Character.  Some folks might be surprised that each
character takes up 8 bytes, because the default allocator assumes max
alignment.  That seems like a reasonably efficiency trade-off, to me,
since access-to-Character and the like is rare.  Likewise,
access-to-String will generally waste a little bit, by aligning to 8
bytes.  (Talking about the heap here.)

>...It seems especially
> mysterious that an array of zero-sized objects is not itself
> of zero size, if the components are declared "aliased."

A user might also find it mysterious that a packed array of 32 Booleans
takes more than 32 bits.  That's another possibly-surprising effect of
"aliased".  Shrug.  Being surprised at minor efficiency properties seems
much less dangerous than being surprised by the "=".

> And how "big" should zero-sized aliased components be?
> Can users rely on zero-sized aliased components being exactly
> one addressible unit in size, or one word in size, or one
> gigabyte in size?

No, of course not.  The RM says nothing about how much space is occupied
by objects.  If X is of type Integer, X could occupy a gigabyte
(according to the RM), whether aliased or not.  If you don't like that
(I don't!), then make your compiler allocate 32 bits for X.  But that's
not an issue for the RM.

> > ... I have seen no such
> > argument -- just complaining from compiler vendors about distributed
> > overhead (which I argued in another message is not a real issue; I'd
> > like to hear your response).
>
> It seems that if you have two consecutive aliased components in a record,
> or an array of aliased components, whose size is not known at compile-time,
> then there will have to be code at run-time to calculate the size, and if zero,
> bump it up to something non-zero.  Similarly, you will need to worry about the
> gaps that doing this creates, so that equality comparison will
> know not to use block-compare, or you will have to initialize the
> gaps.  Every allocator for an array which might be of zero size will
> need code to check for this special case.  This seems like distributed
> overhead.

Yes, there is some overhead in some cases.  But those cases are pretty
rare, I think.

>...We can argue about whether it is significant, but it seems
> clear that it will affect all array allocations where the high bound is dynamic
> and might be less than the low bound.

Nowhere near that many cases.  The component subtype of the array has to
be of dynamic size, which usually implies some dope or discriminants, so
the size is known at compile time to be non-zero.  Write up an example
-- I think it's not nearly as common as you imply.

> > Not an argument that it's *rare* to do such a thing.  An argument that
> > it is *bad practise* in the same sense as depending on order of eval and
> > the like.  In other words, please explain what is special about the
> > zero-size case that makes object identity irrelevant or evil.
>
> It's bad practice to depend on things which are not necessarily true ;-).

I see the smiley, and I hope it means "I realize this argument is
totally bogus".  ;-)

This argument says it's OK for the language to define that 2+2 is either
4 or 5, depending on the whim of the compiler writer, and it is
therefore naughty to depend on the result of 2+2.  It's true that if
there were such a language, it would be "bad practice" to add 2 and 2,
but that's hardly a defense of that language!

This argument says that any time a user finds a bug in a compiler, they
should avoid that feature.  No!  I hope they report the bug, and I hope
the compiler vendor fixes it.

> When it comes to zero-sized objects, it seems bad practice to assume
> they actually take up heap space.

Agreed.

>...Or alternatively, it seems like bad
> practice to assume that logically distinct objects must have distinct
> access values, if it would be in all other ways OK for them to share the same
> address (e.g., because they are of zero size, or they are both acc-to-constant
> values and they designate constants with identical values).

I presume you also intend to include the case where the only data in the
objects is discriminants or bounds dope.

Anyway, I disagree.  And I claim that you must offer some *reason* for
this belief, before you damage the semantics of "=".  It doesn't "seem
like bad practice" to me, at all.  It seems perfectly reasonable.

> Your argument seems to rest on the fact that it is mathematically cleaner
> to be able to say that if two objects are distinct, then their access
> values are not equal.

Yes.

>...But that seems like a tautology.

Tautologies are obviously true, so I'm not sure what you mean, here.

>...Suppose I claim it is
> mathematically cleaner to make no such assumption.

Then I would be confused as to why we have "=" on access values in the
language in the first place.  Of what use is "=" if one can't make
assumptions about it?!

>...Neither of us really
> have an argument here.

I think I do, but more importantly, I claim the burden of proof is on
you.

>...  Ultimately I think we have to recognize that access
> value equality is testing exactly whether two objects have the same address,
> no more, and no less.

No, that is a completely wrong view of access types.  They are *not*
simply addresses of memory locations.  And should not be.  An
implementation can of course take that point of view, but it needs to
make sure the high-level semantics, which does not mention addresses,
is obeyed.

Analogy: enumeration types are implemented as machine integers, but they
are *not* integers.  Multiplication of them makes no sense, for
example.

>...  I am willing to agree that a given object has one
> and only one address, and so all access values designating a given object
> must be equal, but I don't see any mathematical requirement that two different
> objects not share the same address.

Nor do I, but I see a requirement that those two objects have different
*identity*, which is what access-"=" is all about.  This is not some
mathematical blathering -- I'm being very practical here, in insisting
that object identity is paramount.  I simply can't imagine how write
anything but bugs, if "=" is impl def, presuming I use "=".  And I can't
imagine why I *shouldn't* use "=", since it seems to provide useful
information.

Clearly, two different unaliased objects can have the same address, if
one of them has zero size.  And an aliased object can have the same
address as an object of an unrelated type, or an address in the code
segment, because then "=" would not apply.  This is what Nick's idea is
all about.

>...It is sort of whether we choose the
> Fermion or Boson statistics ;-).

You seem to have mistaken me for a physicist.  ;-)  Sorry, but I haven't
the foggiest idea what "Fermion or Boson statistics" means.  Would you
care to post a lesson in physics?  Sounds interesting.

****************************************************************

From: Robert Duff
Sent: Saturday, August 30, 2003  8:50 AM

Adam says:

> Bob Duff wrote:
>
> > > Tucker didn't see it that way.  And to me, having two heavyweights
> > > disagreeing on the correct interpretation of an RM paragraph is, in
> > > itself, compelling evidence that the RM needs to be "changed", or at
> > > least reworded to clarify the meaning.
> >
> > Well, maybe.  I'm hoping to convince Tuck (and Pascal and Randy)
                                                   ^^^^^^
> > that they were wrong on this point.

I should not have included Pascal there.  Pascal seems to agree with me
on what the language definition currently *says*.  He just doesn't like
it.  Tuck and Randy are the ones claiming the RM says what they like,
or at least is ambiguous and could sensibly be interpreted that way.

> I don't see how that helps.  If Tuck did get it wrong, then how can we
> expect a poor less-elevated user, who's writing his own storage pool
> manager and trying to read RM to figure out whether they need to write
> their Allocate routine to handle Size_In_Storage values of zero, to
> get it right?  I still maintain that whatever the result of this
> discussion, the RM needs to address this issue.

Fine -- I have no objection to clarifying the RM.  I do object to Tuck's
claim that the language is currently unclear on the semantics of "=" and
therefore it's fair game to make up some semantics out of whole cloth.

> > (2) can be partly true because that's what the RM says (clearly, I would
> > claim -- once I studied the relevant paragraphs for hours. ;-))
>
> Perhaps I should have made it clear that I was asking about what the
> RM ought to say, rather than what it does say.  If it actually says
> anything to the effect that (2) is partly true and partly false, it's
> a bug.

I agree.  I think the RM *should* say that zero-size must not be passed
to Allocate.  That would make it crystal clear that user-defined
Allocate functions need not do anything special for zero size.

I claim that the RM *currently* says that zero can be passed, but that
the implementation cannot then rely on unique addresses being returned,
which is confusing at best.

****************************************************************

From: Robert Duff
Sent: Saturday, August 30, 2003  9:10 AM

I feel like I'm using giant cannons to attack a mosquito, here.
After all, zero-sized objects are pretty rare, and we're talking
about *aliased* zero-sized objects, which are even rarer.
Sorry about that, I hope this doesn't offend, but I find this mosquito to
be annoying, so here is another giant cannon attack.  ;-)

Tuck wrote:

> Robert A Duff wrote:
> >
> > Tucker wrote:
> >
> > > In re-reading 4.5.2(12), in comparison with 4.5.2(13),
> > > it certainly seems "reasonable" to conclude there is
> > > no requirement that access-to-object values be *un*equal if
> > > they designate different objects.
> >
> > This is a very "creative" reading indeed.  ;-)  Methinks you're trying to
> > make it say what you *want* it to say.
>
> No, I truly wasn't.  It seems careful to say that *if* they designate
> the same object, then they are equal.

...and makes no other restriction on the "only if" case.

> > I am very much opposed to an "=" operator that returns True in some
> > cases, but in other cases, returns True or False at the whim of the
> > implementer.  Here's a typical singly-linked list walk:
> >
> >     while L /= null loop
> >         ...
> >         L := L.all.Next;
> >     end loop;
> >
> > Your interpretation of 4.5.2(12) means that L /= null can be False when
> > L is not null, which is clearly nonsensical, and would break many
> > programs...
>
> That's not actually what I said.

Sorry, I thought you were saying that "if" means "if but not necessarily
only if" in the paragraph in question.  The above example is a logical
consequence of that interpretation.

> In re-reading 4.5.2(12), in comparison with 4.5.2(13),
> it certainly seems "reasonable" to conclude there is
> no requirement that access-to-object values be *un*equal if
> they designate different objects.

OK, I see you didn't mention null (though I don't know why -- the "if"
we're arguing about applies to that case just as much).  So basically,
you're saying that this paragraph could be interpreted to mean that if X
and Y are both nonnull, and designate distinct objects, then the result
of X=Y is implementation defined?

That interpretation can't be right either, since it leads to equally
ridiculous consequences.  It directly implies that "=" can return True
so long as both operands are nonnull.

>...  I said 4.5.2(12) was ambiguous,
> and I wasn't certain how it should be interpreted.  I never in my
> wildest dreams interpreted it as meaning that a null access value could
> ever be equal to an access value that designated an object, even
> a zero-size object.

But if you don't believe "if" means "if and only if" here, then that's
exactly what it says, despite the lack of wild dreams.

> > >...It would also be
> > > reasonable to read "if" as "if and only if" so we
> > > clearly have an ambiguity which deserves clarification by
> > > the ARG.
> >
> > No.  "If and only if" is *clearly* the intended meaning.
> > So any relaxation of the requirements on "=" would be a language
> > change -- an inadvisable language change, in my opinion.
>
> Well here I don't agree.  The intended meaning is not completely
> clear (to me) in the context of neighboring paragraphs, and zero-sized
> objects and access-to-constant values are unusual enough
> that I suspect they were not carefully considered for the
> definition of access-value equality.  (I mention access-to-constant
> values here because I could imagine an implementation that shares
> storage between two access-to-constant allocators that create
> constants with identical values.  This kind of sharing is quite common
> for string literals in C, which are essentially like access-to-constant-Strings
> in Ada.)

Obviously, I can't contradict you if you claim that you don't understand
that paragraph; I must take you at your word.  But I see no occurrences
of "zero" or "constant" in the words, so I must assume you're basing
your interpretation on something other than the words as written.  The
words as written seem clear to me.  The word "if" could mean "if and
only if" or "if (but not necessarily only if)".  The latter leads to
ridiculous consequences, so it must mean "if and only if".  If you don't
like the implementation consequences, why don't you just say so, instead
of claiming that "if" means "if but not if blah blah zero blah blah
access-to-constant"?  The author of this paragraph (which was almost
certainly either you, Tuck, or me, I don't remember) did not intend it
to mean anything different for the zero-or-const case, or he would have
said so.  Any "interpretation" that mentions zero-sized objects and
acc-to-const and the like is sheer invention, since there are no such
words in there.

I agree that the author, you or me, might not have been thinking
specifically of the implementation-oriented cases you're worried about.

> > > One way Randy and others have helped illuminate such
> > > issues in the past has been by writing up an example
> > > and seeing what various compilers do with it.  If all
> > > compilers agree, then that is an argument for solidifying
> > > the interpretation that way.  If the compilers don't
> > > agree, then the general approach is to disambiguate in
> > > favor of least implementation disruption, unless there
> > > is a compelling user requirement to the contrary.
> >
> > That approach isn't much help when you're proposing that it be
> > implementation dependent whether "=" returns True or False in certain
> > cases.  Whatever the results of the experiment, they support the
> > contention that "it is implementation dependent whether blah blah
> > blah..."
>
> Alternatively, the results support the contention that code that has
> been ported to several compilers must not be critically dependent
> on the answer, and so it is overspecification to say what happens
> for zero-sized objects.

This makes no sense to me.  The fact that compilers behave differently
is in no way evidence that compilers ought to be allowed to behave
differently.

> > > Here, we have heard some user arguments in favor of
> > > requiring inequality if the designated objects are
> > > different, but it has not reached the level of
> > > "compelling" in most readers' minds, apparently.
> >
> > It is compelling in my mind.
>
> What is "compelling"?  That the "if" was intended to mean "if and only if"
> in this paragraph, or that there is useful code that depends on access values
> designating distinct zero-sized objects having unequal values?

Sorry.  I meant that "some user arguments in favor of requiring
inequality if the designated objects are different" are compelling (to
me).  Namely, Nick's arguments, which you have yet to refute.

> I find the case of aliased components that happen to be of zero-size a
> pretty convincing argument for not requiring such objects to have unique
> access values.
>
> >
> > > So...
> > >
> > > Here is a simple example to illustrate the issue.
> > > Please give this a try and see what your favorite
> > > compiler produces.  One of ours produces False,False,False.
> > > Another produces False, True, True.
> >
> > And you find this kind of implementation variability acceptable?
>
> Yes.  Comparing addresses of objects is fundamentally implementation-dependent,
> especially zero-sized objects.

I agree that comparing addresses is "fundamentally implementation-dependent".
But we're not talking about comparing addresses.  We're talking about
comparing access values, i.e. object identities.  You continue to
confuse the two levels of abstraction.

And you have yet to explain why zero-sized objects are special (in user
terms, please).

>...  I find the C rule that no object
> can be of zero size a big waste of energy, and forcing the implementation
> to do it for aliased objects in Ada seems a similar waste of energy.

I agree that it's a pain that C doesn't allow zero-length arrays.
But I fail to see how this is relevant: Ada allows zero-length
arrays.  In C, everything is aliased.  In Ada, if you don't want the
space overhead, you can just avoid making the thing aliased.

> You and I have generally had different comfort levels with implementation
> variation.  I believe the language should not specify things that
> users shouldn't care about to begin with, such as order of evaluation
> of parameters, precision of intermediates, etc.

True -- we have somewhat different philosophies with respect to
nondeterminism.  But I don't think that difference of opinion applies
here.  We both agree that a program should not rely on things like order
of eval in:

     <expression> + <other expression>

and we both agree that it would be desirable to *prove* at compile time
that the effect of the program does not depend on such order of eval
(and we're working on a tool that can do just that!)
We disagree on what to do about it, in cases where such a proof is
infeasible.  But that seems like a minor disagreement.  In *this* case,
we are disagreeing on whether a reasonable program should rely on
object-identity semantics for "=".  That's more fundamental.

>...Java made an effort
> to specify all of these things exactly, but that seems misguided since
> Java is inherently a multi-threaded languages, and relative rates of
> progress of threads can't be specified.  Furthermore, their efforts to
> specify precision of floating point intermediates ultimately ended
> up causing serious performance problems for X86 implementations, and
> so they have pretty much dropped it.

Granted -- Java went overboard.  But that doesn't mean the result of 2+2
should be nondeterministic.  Nor does it mean that object identity
should be nondeterministic.

> I think I was particularly influenced by Djikstra's book "A Discipline
> of Programming" on the value of not overspecifying semantics that
> shouldn't matter to the user.  In the language in that book, a multi-arm
> "if" or "do"-loop allowed overlapping conditions, in which case it
> was implementation-dependent which "arm" would be chosen.  He shows
> that this actually simplifies proofs in some cases.

I think you are misapplying Djikstra's ideas here.  In such a case, as I
recall, Djikstra would *prove* that the different arms of the "if" don't
matter to the end result.  But you are *assuming* that the result of "="
doesn't matter in certain cases, without proof.

> [An interesting counter example to our general preferences is that
> I tend to always use short-circuit logical operations, whereas you
> seem to prefer symmetrical, order-unspecified, "and"s and "or"s.]

Interesting to a psychologist.  Maybe we can get one to study us?  ;-)

I guess I view "Computer Science" as a branch of psychology, more than
as a branch of mathematics.

For the record, I use "and then" only when the second half makes no
sense when the first half is False, otherwise "and".  I think it should
be the compiler's job to figure out which order is more efficient,
and whether one or the other can be avoided.
(And, I realize it can't always, so I sometimes use "and then" when I
know the compiler has no clue, and I grumble about it.)

I'm waxing overly philosophical here...

> > I don't.  I think one of our implementations has a bug.  Pascal's
> > favorite compiler does not.
>
> Though Pascal seems to regret having to do the funny business with
> aliased components to make it work the way it does.

Right.  Perhaps Pascal agrees with you on what the RM *should* say, but
he agrees with me on what it *does* say.

> > Nick reports that GNAT does.
>
> Given that we supply front ends to Aonix and Green Hills, this means
> that the "big 4" vendors, ACT, Aonix, Green Hills, and Rational all
> might have different behavior with respect to access values designating
> zero-sized objects.  This can't be something that is too critical
> to a user who cares about portability across compilers, which after
> all, is what standards are all about.  Standards should specify enough
> to provide useful portability, but without overspecifying things that
> are of little or no interest to users.

Again, you're arguing that current variability in implementations is
proof that it's good to have variability in implementations.

****************************************************************

From: Pascal Leroy
Sent: Sunday, August 31, 2003  3:56 AM

Bob wrote:
>I should not have included Pascal there.  Pascal seems to agree with me
>on what the language definition currently *says*.  He just doesn't like
>it.

Well, Pascal has actually changed his mind after reading the hundreds of
messages that you wrote on this topic, Bob.

I now agree that an access value ought to be an object identity, and that two
distinct objects ought to be designated by two distinct object values (I also
agree that the identity of objects, and the phrase "distinct objects" are
actually well-defined by the RM).

I have always been convinced that "if" meant "if and only if" in this wretched
paragraph.  As you point out, any other interpretation leads to nonsense,
plague and pestilence.  What I am now saying is that this paragraph is right as
it is and there is no reason to change it (although evidently a
confirmation/ramification AI could be useful).

I find Nick's example particularly compelling, by the way.  While John's
programming style is a bit peculiar here, the use of sentinels is a very common
programming practice, and I can imagine that, for slightly more elaborate data
structures, there would be no other way to program such a piece of code.  And
at any rate, peculiar or not, it is entirely legitimate for a programmer to
assume that this style has to "work right".

Because access types are at a higher level of abstractions than addresses, they
should be free of the idiosyncrasies that may affect addresses depending on a
variety of implementation-dependent choices.

Incidentally, I think that this interpretation is already implied by the RM as
it is: 3.10(1) says that "A value of an access type provides indirect access to
the object or subprogram it designates".  Note the singular in the second part
of the sentence: it doesn't say "to one of the 50 objects it might designate".

Implementation-wise, I am not sure that your model is as simple as you think,
and anyway we are not going to change our model unless some paying customer
makes a ruckus.  We'll continue to pay the distributed price of one extra byte
in some cases.  Shrug.  The thing that bothers me a bit more is that composites
that contain potentially zero-sized aliased objects will need to be compared
component-by-component (an alternative would be to initialize the extra byte,
but we don't want to go that way).  But I am willing to pay that price to have
access types work right.

Incidentally our compiler is not bug-free, and tests a bit more complex than
the one that Tuck sent show situations where distinct objects end up at the
same address (and therefore, are designated by the same access value).  This
will have to be fixed.

****************************************************************

From: Alexander E. Kopilovich
Sent: Sunday, August 31, 2003  11:34 AM

Robert A Duff wrote:

> I agree that comparing addresses is "fundamentally implementation-dependent".
> But we're not talking about comparing addresses.  We're talking about
> comparing access values, i.e. object identities.  You continue to
> confuse the two levels of abstraction.
>
> And you have yet to explain why zero-sized objects are special (in user
> terms, please).

From the mathematical viewpoint there is clear opportunity for special treatment
of object identity in the case of zero-sized objects: while non-empty sets
easily may be isomorphic but non-identical, two empty sets usually are regarded
as identical, even if they are results of entirely different construction
process. The same for other categories (topological spaces, algebras etc.).
So, a zero-sized object may be considered as analog of empty set (that is,
a zero-sized object of a type corresponds to the "null" universal object in
a category) and that is one possible explanation of the kind you requested.

****************************************************************

From: Pascal Leroy
Sent: Monday, September 1, 2003  2:59 AM

I fail to see the analogy.  Zero-sized objects often arise from sets
that have exactly one value, as in:

	type T1 is (Foo); -- The one value is Foo.
	subtype T2 is String (1..0); -- The one value is "".

so the comparison with empty sets sounds entirely irrelevant to me.

(True, zero-sized objects also arise from empty sets, as in:

	type T3 is range 10..9;

but surely we don't want the semantics of the language to be different
in these two cases.)

****************************************************************

From: Alexander E. Kopilovich
Sent: Monday, September 1, 2003  10:40 AM

> I fail to see the analogy.  Zero-sized objects often arise from sets
> that have exactly one value, as in:
>
>	type T1 is (Foo); -- The one value is Foo.

This is just because you represent the elements of a set by subsets of another
set (bit string representation is exactly the predicate function for a subset),
and an empty set has exactly one subset (itself). Therefore you take an empty
set (zero-length bit string) when you need only one subset,

>	subtype T2 is String (1..0); -- The one value is "".

This is quite straighforward: a string is array = ordered set of characters,
and null (empty) string is naturally an empty (ordered -:) set of characters.

****************************************************************

From: Robert A. Duff
Sent: Monday, September 1, 2003  9:24 AM

Well, maybe you could use such reasoning to say that an access to a
zero-sized value is equal to all other access to zero-sized values (even
if the types don't match?!), and no others.  But what's been proposed is
that "=" is *nondeterministic* if both access values point at zero-sized
objects (and perhaps other cases).  I don't know of any branch of
mathematics where it is "implementation dependent" whether or not two
empty sets are equal.  ;-)

****************************************************************

From: Alexander E. Kopilovich
Sent: Tuesday, September 2, 2003  2:39 PM

> Well, maybe you could use such reasoning to say that an access to a
> zero-sized value is equal to all other access to zero-sized values (even
> if the types don't match?!),

No, the types must match within that logic, otherwise we are comparing objects
from different categories, which is meaningless. Well, you may map both objects
by the erasing functors to objects of underlying category and compare the
images, but it will be very bad practice to do that implicitly. So, at most
you may permit explicit canonical conversion of zero-sized object to zero-sized
object of another type.

> and no others.  But what's been proposed is
>that "=" is *nondeterministic* if both access values point at zero-sized
>objects (and perhaps other cases).

I think that nondeternism is caused (quite common case) by the presence of
several different logics (that is, more than one), and each of them is somehow
justified. This usually means that there are several different prototype
real-world situations, which are similar in external view, but require different
approach if you want to be adequate enough. In other words, the general notion
used for these situations is not established enough (and perhaps cannot be
established without paying too much complexity cost for that generality).
I guess that a pragma can select a logic for a particular application... and
perhaps it will be good to prevent comparsions between accesses to constant
objects (I mean the case when both objects for comparison are constant) in the
absense of that pragma.

>  I don't know of any branch of
> mathematics where it is "implementation dependent" whether or not two
> empty sets are equal.  ;-)

In mathematics it should be said "depends upon representation", I think -:) .
Well, we are discussing not comparisons between objects - (empty) sets, but
comparisons between *references* to them, So, it can easily be two different
paths in a diagram to the same object - (empty) set, and all depends upon
what exactly we want to compare, how we understand identity for our purposes.

****************************************************************

From: Robert A. Duff
Sent: Monday, September 1, 2003  9:58 AM

> Well, Pascal has actually changed his mind after reading the hundreds of
> messages that you wrote on this topic, Bob.

Sorry to beat you over the head with all that verbiage.  ;-)
Here's a little more...

I'd like to hear comment from you and others on my suggestion that we
change the RM to say that the implementation must not pass 0 as the size
to a user-defined Allocate.  The implementation model I'm imagining is
that it checks for zero at the call site of Allocate (which has a good
chance of being optimized away), and either passes 1, or does something
like what Nick suggested.  This is responsive to the original question
-- it clarifies whose responsibility it is to deal with the zero-size
issue.

> Implementation-wise, I am not sure that your model is as simple as you
> think, and anyway we are not going to change our model unless some
> paying customer makes a ruckus.

I think it's pretty trivial in the allocator case, and can easily be
optimized for most allocators.  The aliased component case is a little
less easily optimized.

>...  We'll continue to pay the distributed
> price of one extra byte in some cases.  Shrug.  The thing that bothers
> me a bit more is that composites that contain potentially zero-sized
> aliased objects will need to be compared component-by-component (an
> alternative would be to initialize the extra byte, but we don't want to
> go that way).

If you have a record containing, say, a byte-sized thing followed by a
word-sized thing, so that alignment considerations cause your compiler
to insert 3 extra bytes in between, do you zero those 3 bytes?  It seems
to me that the simplest implementation would be to consider the extra
bytes introduced by zero-sized things as "gaps" just like those
introduced by alignment issues, and treat them the same way.

I know of one compiler that allows components of a record extension to
be allocated in that 3-byte gap.

> Incidentally our compiler is not bug-free, ...

I'm shocked!

----

By the way, another issue occurs to me: zero-origin arrays (or "virtual
origin").  This means that an access-to-array is represented as the dope
(bounds) plus the address of the zero'th component of the array.  If
that component does not exist, then it points to where A[0] *would* be.
This perhaps makes array indexing more efficient because you don't have
to subtract off the lower bound.  (It causes all manner of trouble for a
conservative garbage collector, I would imagine!)

Consider:

    X: aliased String(1..10);
    Y: aliased String(11..99);

How do we know that X[0] and Y[0] are not at the same address?
Can X = Y be True in this case?  Are virtual-origin arrays an
incorrect implementation of Ada, in general?

I was under the impression that GNAT used these techniques (at least,
I've heard Robert singing their praises), but experiments fail to
confirm that.  Perhaps someone from ACT could clarify.

****************************************************************

From: Pascal Leroy
Sent: Tuesday, September 2, 2003  10:28 AM

Bob said:

> I'd like to hear comment from you and others on my suggestion
> that we change the RM to say that the implementation must not
> pass 0 as the size to a user-defined Allocate.  The
> implementation model I'm imagining is that it checks for zero
> at the call site of Allocate (which has a good chance of
> being optimized away), and either passes 1, or does something
> like what Nick suggested.

I believe that's what we do, just so that we don't end up with two
objects at the same address (and that's done in the generated code, not
in the called Allocate routine).

> If you have a record containing, say, a byte-sized thing
> followed by a word-sized thing, so that alignment
> considerations cause your compiler to insert 3 extra bytes in
> between, do you zero those 3 bytes?  It seems to me that the
> simplest implementation would be to consider the extra bytes
> introduced by zero-sized things as "gaps" just like those
> introduced by alignment issues, and treat them the same way.

Yes, I think the analogy works.  If we have gaps, we do not zero them,
we just go to component-by-component comparison.  So we would do the
same for the extra byte introduced by zero-size component.  (On the
other hand we try hard not to have gaps, so in the example you describe
we would put the word-size component _before_ the byte-sized component.)

****************************************************************

From: Michael F. Yoder
Sent: Tuesday, September 2, 2003  1:03 PM

>From the mathematical viewpoint there is clear opportunity for special treatment
>of object identity in the case of zero-sized objects: while non-empty sets
>easily may be isomorphic but non-identical, two empty sets usually are regarded
>as identical, even if they are results of entirely different construction
>process. The same for other categories (topological spaces, algebras etc.).
>So, a zero-sized object may be considered as analog of empty set (that is,
>a zero-sized object of a type corresponds to the "null" universal object in
>a category) and that is one possible explanation of the kind you requested.

This argument  only applies if access values to distinct objects are
allowed to be the same when the objects contain identical values.
Consider two constant objects with identical values; if this logic for
zero-sized objects were followed, why not let these two objects also be
at the same address?

My counterclaim amounts to saying that zero size *isn't* special here.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, September 2, 2003  1:16 PM

In fact, Tucker suggested just that.  And I think he said that C
compilers typically do that for (static) string literals.  It's what we
used to call "literal pooling", except literal pooling is supposed
to be invisible.

****************************************************************

From: Alexander E. Kopilovich
Sent: Tuesday, September 2, 2003  1:20 PM

> This argument  only applies if access values to distinct objects are
> allowed to be the same when the objects contain identical values.

For constant objects only. And moreover, those object must be constant at the
implementation level. There must be true constants, without any tricks with
views etc.

>  Consider two constant objects with identical values; if this logic for
> zero-sized objects were followed, why not let these two objects also be
> at the same address?

I suspect that this actually happens sometimes (for example for string literals
declared in separate compilation units) at the linker level.

>My counterclaim amounts to saying that zero size *isn't* special here.

Zero size is special at least in that it automatically implies constant object,
even if the object was not declared as constant.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, September 2, 2003  1:26 PM

> (On the other hand we try hard not to have gaps, so in the example you
> describe we would put the word-size component _before_ the byte-sized
> component.)

OK; makes sense.  If you want, you can try even harder.  For example:

    type Zero_Size is (Red);

    type T is
        record
            X, Y: Integer;
            A, B: aliased Zero_Size;
        end record;

You could allocate A at the same place as X, and B one byte past that
(in the middle of X).  This would guarantee A'Access and B'Access are
unique, since X'Access is illegal (and even if it were legal, it would
be of the wrong type to compare "=" with A'Access).  And it would mean
zero wasted space for A and B.  You would need to make sure that writes
on A and B don't really do anything.

I'm not saying this is a worthwhile optimization.  I would go to extra
trouble to do it if and only if a paying customer actually wanted it.

I must admit that I can't think of any zero-cost way to allocate zero
storage to all arrays of zero-sized component subtype, in general, which
is perhaps what Tuck wants.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, September 2, 2003  1:52 PM

> > I see no reason for any distributed overhead.  I see two potentials for
> > distributed overhead.  (1) "new" needs to somehow check for the
> > zero-size case, and do something different.  Elsewhere, I claimed that
> > this is trivially optimized away in 99.99% of cases.
>
> This is false. It was true in Ada 83, but since the pool interface is
> exposed to the user in Ada 95, you cannot depend on the calls having the
> correct form.

You mean if the program calls Allocate directly, not via "new"?
I don't see any issue there -- it does what it does, and there's
nothing in the RM saying Allocate has to do anything in particular.
The RM just says that *if* Allocate does so-and-so, then "new" does
such-and-such.

>... (Janus/Ada 83 not only prevented 0 as a size [which crashed
> the assembly code that implemented the heap manager], but also rounded sizes
> up to the next alignment boundary [usually 4]. But for Ada 95, we had to
> move that code into the pool, because a user-written call could violate the
> assumptions, and we didn't want the system to crash in that case.

The system can crash in that case (according to the RM).  So you're
going above and beyond the call of duty, here.  No problem, there, but
you can't complain about the cost, if it's optional.

> > (2) All aliased
> > objects need to be allocated non-zero space.  I think it's silly to
> > worry about this one, since zero-sized objects are rare, but if you want
> > to worry about it, fairly simple compiler optimizations can eliminate
> > this overhead.
>
> The trouble is that you don't know at compile-time that the objects have
> zero size.

Most compilers usually know at compile time that the thing does *not*
have zero size (even when you don't know the size at compile time).

>...The usual problem is that you have an array subtype that is
> dynamically-sized and might be zero. You then need distributed overhead to
> handle check whether it is zero and somehow allocate more than the normal
> memory. But, as we all agree, this case will be rare, yet it will occur on
> every dynamic component.

I don't fully understand your run-time model, but I do admit there are
some cases with some distributed overhead, in any run-time model.

> > ...  Nick outlined how.  Tuck said he didn't understand, and
> > Pascal said it involved overhead on every dereference, and then perhaps
> > Nick confused the issue by talking about all manner of complicated
> > optimizations, which are unnecessary, I think.
> >...
> >     - The size of the designated object is dynamic, and might be zero.
> >       Here, dereferencing involves either a no-op (in case of
> >       pass-by-ref), or a copy of N bytes or N words or whatever,
> >       where N is dynamic.  Well, copying N bytes/words where N happens
> >       to be 0 is a no-op, and does not touch the designated memory.
> >       No special check for zero is needed -- just the usual loop,
> >       checking whether we're done copying N bytes.
>
> This is also false. Simply setting up the addressing for a block move could
> cause an address fault on common processors.

I don't know which processors you're talking about.  But surely there is
*some* set of addresses on every processor that can be safely used in
the manner I suggest (actually, Nick suggested it originally).

P.S. If this issue is decided in favor of my opinion, I think there
really needs to be an ACATS test for it.  Otherwise, it is a totally
pointless and hollow victory.  I would rather say "so-and-so is impl
def" rather than "so-and-so is well defined in the RM, but the ACATS
don't test it, and the vendors scoff at the idea of implementing it
properly".

****************************************************************

From: Randy Brukardt
Sent: Tuesday, September 2, 2003  6:37 PM

Bob said:

> Now, please, all of you summarize your positions.

I think this whole discussion is a massive waste of time, especially as
we're answering a question that no one really asked.

I started to try to respond to Bob's many messages, since they are filled
with misrepresentations and half-truths (and bad policy) -- to the extent
that I cannot believe that they were written by an Ada implementor -- but I
realize that I simply cannot afford the time on such a pointless discussion.
We have lots of important issues to discuss, AIs to write, etc.

But, I'll give a brief summary which I hope will at least make clear why any
tightening of the rules here is actively harmful.

To summarize Bob's position in very brief terms:
   1) "=" of access objects means equality of objects, exactly, with no
variation allowed. (That is exactly the case if you insert the missing "only
if".) The only exception would be in an erroneous program.

   2) We need a rule to make it illegal for a compiler to pass 0 to a
storage pool's Allocate routine.

Let's look at the second rule first. It is very ugly. We cannot change the
specification of Allocate (otherwise, all existing pools would become
illegal - a gratious change for a extremely minor issue). Thus, we'd have to
have some goofy rule like: "An implementation shall not pass zero to
Allocate." This would be a permanent wart on the language; essentially it is
an admission that the spec is wrong.

In addition, it would require doubling the needed checks for a pool; since
direct calls to a pool can be written, a well-written pool still would have
to make the check for zero internal to itself. For the 25-50% of allocations
where the size is not statically known (array allocations and allocations of
generic formal objects are the most likely), this would require checks both
before the call and then again in the pool. Since no optimization could ever
remove the checks inside the pool, this is just adding overhead with almost
no benefit.

---

Moving to the first interpretation. This means that we (the ARG) have to
decide what "object" is pointed to in any case where the access value is
created by a means other than an allocator or 'Access. It would particularly
cause trouble for Address_to_Access_Conversions.

Consider:
     type Null_Record is null record;
     package AAC is System.Address_to_Access_Conversions (Null_Record);
     subtype Acc is AAC.Object_Pointer;
     type A_Rec is
        Comp1, Comp2 : aliased Null_Record;
     end record;
     Obj : A_Rec;
     Comp1_Addr : System.Address := Obj.Comp1'Address;
     C1 : Acc := Obj.Comp1'Access;
     C2 : Acc := Obj.Comp2'Access;
     AC1 : Acc := AAC.To_Pointer(Comp1_Addr);

     if C1 = AC1 then -- ??
     if C2 = AC1 then -- ??

If you follow Bob's rule to the letter, then C1 must equal AC1 and C2 must
not equal AC1. In that case the Nick/Bob implementation using phony values
becomes incorrect. (And everything said about it and it's overhead are
irrelevant.) That's because if could have no way to determine whether C1 or
C2 is meant, so no amount of overhead could get the correct answer. [Thus,
you're insisting on non-null allocation - which means that A_Rec'Size is a
lie. If we had 'Object_Size, we could solve this usefully, but since we
cannot have that, we are screwed.]

If you say that both of these are always unequal, then you are forcing
compilers to use an implementation like Nick's. That clearly is harmful.

If you try to say that this result is somehow not well-defined, you'll have
to come up with a reason for it. Certainly, the definition of "=" as
espoused by Bob offers no such leeway. You could say that all uses of AAC
are erroneous - but that would simply prevent the use of the feature (and
put us back to the bad old days of Unchecked_Conversion). And it certainly
doesn't help anyone for this to be well-defined some of the time.

---

Bob also asked:

>By the way, another issue occurs to me: zero-origin arrays (or "virtual
>origin").  This means that an access-to-array is represented as the dope
>(bounds) plus the address of the zero'th component of the array.  If
>that component does not exist, then it points to where A[0] *would* be.
>This perhaps makes array indexing more efficient because you don't have
>to subtract off the lower bound.  (It causes all manner of trouble for a
>conservative garbage collector, I would imagine!)

I don't see a problem here. For access-to-array, you compare the addresses
of the descriptors, not of the data. The descriptors are unique to the
objects (the data may not be, in the case of slices or of
constrained-by-discriminant arrays). Then you can use any representation for
the pointer to data that you like.

The only time that you use the data itself is for an array without a
descriptor. In that case, it (if non-null) has to be unique.

Indeed, we considered an implementation like that, but abandoned it because
you don't always get a legal address that way. Since we insist on
no-wraparound math, even on addresses (because the front-end does not know
the details of addressing on the target; wrap-around math doesn't work on a
8086, for instance, where you'll just go to another segment), we would have
had to use several possible mechanisms. Which conflicted with our desire to
keep the code size small. So we always point at A[A'First].

****************************************************************

From: Nick Roberts
Sent: Tuesday, September 2, 2003  9:40 PM

"Randy Brukardt" <randy@rrsoftware.com> wrote:

> I think this whole discussion is a massive waste of time, especially as
> we're answering a question that no one really asked.

But it is a question that arose nevertheless, and I don't think it is a
waste of time.

To be frank, what I think is a waste of time is the intransigence of you,
Randy, and Tucker which is apparently based on a reluctance to make the
effort to improve the Ada standard, rather than genuinely on technical
arguments. This is the impression I am getting; I hope it is wrong.

> I started to try to respond to Bob's many messages, since they are
> filled with misrepresentations and half-truths (and bad policy)

I think that you, Randy, should enumerate exactly all the instances of
misrepresentations and half-truths you accuse Bob of. If not, then I think
you are honour bound to retract your accusations. This mailing list is a
public forum.

> but I realize that I simply cannot afford the time on such a pointless
> discussion. We have lots of important issues to discuss, AIs to write,
> etc.

I would find it acceptable for an officer of the ARG to say that he feels
discussion of a certain issue has to be deprioritised, in favour of other
more important issues. If that is what you mean, Randy, then I wish you
would say it directly, instead of apparently inventing a lot of specious
technical arguments in a hopeless effort to dismiss the issue altogether.

> We cannot change the specification of Allocate (otherwise, all
> existing pools would become illegal - a gratious change for a extremely
> minor issue). Thus, we'd have to have some goofy rule like: "An
> implementation shall not pass zero to Allocate."

I disagree. The following declaration could be harmlessly added to
System.Storage_Elements:

   subtype Positive_Storage_Count is
      Storage_Count range 1..Storage_Count'Last;

and the declaration of Allocate in System.Storage_Pools could be changed to:

   procedure Allocate(
      ...
      Size_In_Storage_Elements :
         in Storage_Elements.Positive_Storage_Count;
      ... ) is abstract;

This new declaration could be used for all existing implementations of
storage pools, with the subtype of the Size_In_Storage_Elements parameter
being the only thing that needed to be changed. All existing calls to
Allocate which did not pass 0 in this parameter would work correctly, and
all calls which did pass 0 would be caught (perhaps quite efficiently).

> This would be a permanent wart on the language; essentially
> it is an admission that the spec is wrong.

I think it is a very ugly suggestion that mistakes in the language
(standard) should be covered up rather than openly corrected. Is that how a
standards review process is supposed to work?

> In addition, it would require doubling the needed checks for a pool;
> since direct calls to a pool can be written, a well-written pool still
> would have to make the check for zero internal to itself. For the
> 25-50% of allocations where the size is not statically known (array
> allocations and allocations of generic formal objects are the most
> likely), this would require checks both before the call and then
> again in the pool.

I think my suggestion (changing the subtype of the Size_In_Storage_Elements
parameter) would generally avoid the two checks you suggest here.

> Moving to the first interpretation. This means that we (the ARG)
> have to decide what "object" is pointed to in any case where the
> access value is created by a means other than an allocator or
> 'Access. It would particularly cause trouble for
> Address_to_Access_Conversions.

It would only cause trouble for Address_to_Access_Conversions. It seems
highly unconvinving to me to base a refutation of (the standard prescribing)
the strict behaviour of a major element of the language (access types) on a
problem with a very minor part of the language
(Address_to_Access_Conversions).

In the case of an access value being generated by the To_Pointer function of
an instantiation of System.Address_to_Access_Conversions, I would be
perfectly happy for the definition of object identity to be based on
address. That is, in the case of such a conversion, equal addresses are
considered to reference the same object (and the resulting access values
must be equal), and unequal addresses are considered to reference different
objects (and the resulting access values must be unequal).

In the other, much more common, cases of the generation of access values, I
maintain that it is important that the equality of access values always
matches the differentiation of objects.

> If you try to say that this result is somehow not well-defined,
> you'll have to come up with a reason for it

The reason is as follows. Nearly always, the reason why
Address_to_Access_Conversions needs to be used is for interfacing with C (or
C++) code. As has been noted, C (and C++) has parity between address
equality and object identity, because it doesn't allow null objects. Thus we
can use the same characteristic for Address_to_Access_Conversions. I doubt
there will ever be any case where the semantics I suggest would fail.

Neither you, Randy, nor Tucker has even attempted to produce a refutation of
the examples I gave, illustrating, to my mind, why strict access equality is
important.

I still believe that the lack of this strictness could cause a real Ada
program to fail. Although I admit I have failed to produce an actual example
of this, that is hardly proof that it could not happen. I think my examples
do show that the possibility of this situation cannot be lightly dismissed.

I suggest that, in cases where there is a conflict poised between strict
correctness and implementation expediency, the decision should be in favour
of strict correctness. Isn't it better for a program to work slowly than to
fail quickly?

In fact, I'm saying that I think it is better (although regrettable) for
thousands of programs to work slightly slower, than for one deployed program
to fail; Sod's law dictates that the program would be flying an aeroplane,
or running somebody's heart monitor.

Finally, I hope that my comments on this issue will be taken in the good
humour they are intended.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, September 3, 2003  12:31 AM

Nick said:

> > I started to try to respond to Bob's many messages, since they are
> > filled with misrepresentations and half-truths (and bad policy)
>
> I think that you, Randy, should enumerate exactly all the instances of
> misrepresentations and half-truths you accuse Bob of. If not, then I think
> you are honour bound to retract your accusations. This mailing list is a
> public forum.

It would take at least 4 hours of my time to properly respond to Bob's
messages (and the messages of others) and probably twice that to respond to
responses. (It took 2.5 hours to write the message I did send, and another
45 minutes on this one.) I simply don't have the time. I covered the most
important problems in the message I sent, but there are a number more. Some
of it is simply that we disagree on the basic ideas (about what is
statically determinable, for instance), and there is no real hope of a
resolution.

> > but I realize that I simply cannot afford the time on such a pointless
> > discussion. We have lots of important issues to discuss, AIs to write,
> > etc.
>
> I would find it acceptable for an officer of the ARG to say that he feels
> discussion of a certain issue has to be deprioritised, in favour of other
> more important issues. If that is what you mean, Randy, then I wish you
> would say it directly, instead of apparently inventing a lot of specious
> technical arguments in a hopeless effort to dismiss the issue altogether.

Well, nothing I've said is "specious". If you think so, fine, you're welcome
to your opinion.

> > We cannot change the specification of Allocate (otherwise, all
> > existing pools would become illegal - a gratuitous change for a
extremely
> > minor issue). Thus, we'd have to have some goofy rule like: "An
> > implementation shall not pass zero to Allocate."
>
> I disagree. The following declaration could be harmlessly added to
> System.Storage_Elements:
>
>    subtype Positive_Storage_Count is
>       Storage_Count range 1..Storage_Count'Last;
>
> and the declaration of Allocate in System.Storage_Pools could be
> changed to:
>
>    procedure Allocate(
>       ...
>       Size_In_Storage_Elements :
>          in Storage_Elements.Positive_Storage_Count;
>       ... ) is abstract;

Changing the specification of Allocate would be incompatible with all
existing storage pool code. (The specifications would fail the subtype
conformance check.) Worse, any code changed to use the new definition would
be incompatible with Ada 95 compilers (for the same reason). That's not
going to happen, certainly not for a minor issue such as this.

If it wasn't for compatibility, I would be in favor of the change you
suggest - precisely because it don't have the problems that Bob's solution
would. But compatibility is not a negotiable item.

> > If you try to say that this result is somehow not well-defined,
> > you'll have to come up with a reason for it
>
> The reason is as follows. Nearly always, the reason why
> Address_to_Access_Conversions needs to be used is for interfacing with C (or
> C++) code. As has been noted, C (and C++) has parity between address
> equality and object identity, because it doesn't allow null objects. Thus we
> can use the same characteristic for Address_to_Access_Conversions. I doubt
> there will ever be any case where the semantics I suggest would fail.

This doesn't make any sense from an standards perspective. Please try to
explain it in terms of the standard.

And, in any case, only lousy C interfacing uses AAC. Ada 95 provides decent
ways to avoid using addresses in interface code. I would think it is more
useful in accessing hardware.

> Neither you, Randy, nor Tucker has even attempted to produce a refutation of
> the examples I gave, illustrating, to my mind, why strict access equality is
> important.

There is no point in doing so. My argument is that the implementation
overhead of the strict rule is excessive. I've provided numerous examples
and reasons for that. Clearly, some people seem to think that they would
rather pay that cost. Fine. Get your implementer to do it.

In any case, I was swayed by your example to the point that rule I proposed
only gave the permission to types that the designated type had size 0. I
agree we don't want it happening for access-to-unconstrained. But I think
that types with a designated size 0 are going to be vanishingly small (and
really represent a bug). I'd prefer to outlaw them altogether, but that
would bring up the compatibility issue again. (Not a real one in this case.)

...
> In fact, I'm saying that I think it is better (although regrettable) for
> thousands of programs to work slightly slower, than for one deployed program
> to fail; Sod's law dictates that the program would be flying an aeroplane,
> or running somebody's heart monitor.

That's understandable, but also potentially a violation of "performance
compatibility". We've spent a lot of effort on more important issues
insuring that Ada 200Y changes do not impact the performance of existing
code. But note that this is not as absolute.

In any case, I've come around to the position that no change here is best. I
don't for a moment think that any real user will ever care what the
resolution of this discussion is -- which is the classic definition of a
waste of time.

So this is the last I'll say on this topic until and unless it comes up on
the ARG agenda. (Other than I'll give a reply to Bob if he wants one.)

****************************************************************

From: Tucker Taft
Sent: Thursday, September 4, 2003  11:59 AM

Robert A Duff wrote:

> I feel like this argument is going around in circles, and I find it
> confusing.  I'm not sure, for example, exactly what Tuck, Pascal, and
> Randy believe the semantics of "=" on access values ought to be.  So let
> me start by summarizing *my* position:
>
>     1. I think it is essential that access "=" reflect object identity.
>        That is, X = Y if and only if both are null or else X.all and
>        Y.all are the same object.  I think object identity is clearly
>        defined: each elaboration of an object declaration creates an
>        object (and perhaps some component objects), each evaluation of
>        "new" creates a new object, etc.
>
>     2. I think the RM already says that, quite clearly.

I don't happen to agree with this.  I believe that
access values are primarily a way to reference objects
indirectly.  I don't feel that there is a requirement that they
represent the "identity" of an object, especially if the
designated object has 'Size of 0.  I accept that others could
interpret 4.5.2(12) that way, but I think there are enough
underlying problems with a strict interpretation of "access-value
as identity" that we have to take an "if and only if" interpretation
of 4.5.2(12) more of as an indication of what is true in the "typical"
case, with some thorny unmentioned details to get it exactly right.

I also don't see any requirement in 4.8 or 3.3.1 that when I create an
object, I allocate it its own unique address, nor its own unique
access value, nor even its own unique memory cells.  Clearly an access
value must identify its designated object.  I see no
requirement that it doesn't happen to equal some other
access value that designates some other object created
at some other time.

For technical arguments, the existence of address-to-access-value
conversion seems to make it clear that access values can
be created (by conversion from an address) that work just
fine, but which happen to equal another access value,
if the object whose 'address was taken happens to be
of zero size, or is a slice of some larger object.
In addition, it is clear that it is possible
that the result of an allocator can equal an access value
that designates an object to which Unchecked_Deallocation
has been applied.  There is no restriction in 13.11.2 on using "="
with an access value that designates a freed object.
The only restriction is that you cannot dereference it.

As far as what I would recommend, if we make any attempt
to "firm up" the definition in this area, I would replace 4.5.2(12) with:

   Two access-to-object values are equal if they
   designate the same object, or if both are the result
   of evaluating the literal NULL.  Two access-to-object
   values (of the same type) are unequal if one is the result of
   evaluating the literal NULL, and the other is the result of evaluating
   an allocator, 'Access, or 'Unchecked_Access, or if both
   are the result of evaluating an allocator, 'Access, or
   'Unchecked_Access, and the designated objects are not the
   same object, provided each designated object is an existing variable
   object of Size greater than zero. For other cases it is unspecified
   whether two access values are equal or unequal.

To put it another way, an implementation should make no special
effort to make "access-value = identity."  This paragraph sets
a limit on what a user may assume, with the intent of imposing
no restriction on a "natural, efficient" implementation of access
values.

This means that access-to-constants may be equal even if
the designated constants are created by distinct allocators, and access
values produced by address-to-access-value conversion do
not have to obey any special rules.  It means that if
a user algorithm depends on access values being *unequal,*
then they must be access-to-variable with designated object's Size
greater than zero.  This is easy enough to arrange, and
so the user should easily be able to accommodate it if
this use of access value equality is important.

 From a "mathematical" point of view, if two aliased objects are
the same, then they have the same access value, but the
contrapositive is not necessarily true; two distinct
objects may have the same access value.

I would claim that to correctly handle access values
produced by address-to-access-conversion and access
values that designate freed objects, some additional
wording is required in 4.5.2(12) no matter what rule
we adopt.  Also, 4.5.2(12) uses "equality" in the definition
of equality ("... if both are *equal* to the null access value").

But I also agree with Randy that the best
thing would probably be to drop this whole effort, and
rule ACATS tests involving equality of access values
with zero-sized designated objects, or with potentially
sharable access-to-constants, as being "pathological" and
not worth an iota of implementor effort.


>>...It is sort of whether we choose the
>>Fermion or Boson statistics ;-).
>
>
> You seem to have mistaken me for a physicist.  ;-)  Sorry, but I haven't
> the foggiest idea what "Fermion or Boson statistics" means.  Would you
> care to post a lesson in physics?  Sounds interesting.

In particle physics, all subatomic particles are either
fermions (spin 1/2, 3/2, ...) or bosons (spin 0, 1, 2, ...).
The Pauli exclusion principle says that two Fermions
(e.g. electrons) cannot have exactly the same state (they obey
"Fermi" statistics), whereas two Bosons (e.g photons) can
(they obey "Bose-Einstein" statistics).

Hence, if we think of designated objects as Bosons, then they
can have the same access value, whereas if they are Fermions, then
they must have distinct access values.  ;-)

****************************************************************

From: Nick Roberts
Sent: Tuesday, September 9, 2003  5:34 PM

"Tucker Taft" <stt@sofcheck.com> wrote:

> As far as what I would recommend, if we make any
> attempt to "firm up" the definition in this area, I
> would replace 4.5.2(12) with:
>
>    Two access-to-object values are equal if they
>    designate the same object, or if both are the result
>    of evaluating the literal NULL.  Two access-to-object
>    values (of the same type) are unequal if one is the result of
>    evaluating the literal NULL, and the other is the result of evaluating
>    an allocator, 'Access, or 'Unchecked_Access, or if both
>    are the result of evaluating an allocator, 'Access, or
>    'Unchecked_Access, and the designated objects are not the
>    same object, provided each designated object is an existing variable
>    object of Size greater than zero. For other cases it is unspecified
>    whether two access values are equal or unequal.

Yes, please.

****************************************************************

From: Pascal Leroy
Sent: Wednesday, September 10, 2003  2:30 AM

I am surprised to see that Nick is satisfied with this wording, but I will
adamantly oppose the last part, starting at "provided...".

****************************************************************

From: Nick Roberts
Sent: Wednesday, September 10, 2003  9:03 AM

I am still opposed to allowing equality of access values that refer to
different objects, but I would rather see this much more explicit form of
wording in the RM200X than what already exists. At least it would serve as
some warning (of a potential nasty surprise).

I am not opposed to non-existing objects (which have been de-allocated
explicitly using Unchecked_Deallocation) being excused. I am happy that the
semantics of 'dangling' access values (esp. how they compare for equality)
remain implementation defined. [I suspect that Pascal did not intend
otherwise.]

I ran a little quiz on comp.lang.ada to get people's reaction to the idea
that Print( (...,...,...,...) ); [see my examples in previous post] might
print something other than four lines. Of course there were only a few
replies, but the obvious conclusion was that the newcomers to Ada were
surprised but the old hands were not.

Incidentally, purely out of interest, I had a look in the ARM 83, and it [at
4.5.2(4)] contains almost exactly the same wording as RM95 4.5.2 (12). It
would seem the RM95 inherited the wording.

****************************************************************

From: Pascal Leroy
Sent: Wednesday, September 10, 2003  9:49 AM

Right, but this is already wildly erroneous (RM95 13.11.2(16)) so of
course we don't mandate "good behavior" in this case.

****************************************************************

From: Tucker Taft
Sent: Wednesday, September 10, 2003 12:26 PM

It is not erroneous (and definitely not "wildly" erroneous ;-)
to use an access value that designates a freed object in an
equality operator.  13.11.2(16) only disallows using names
that denote freed objects, which would require a dereference
of the access value (implicit or explicit ".all").

You also have to decide what to do about access values produced
by address-to-access-value conversion.  What are the rules
for equality involving one of them?

So I believe to be precise, this paragraph needs to at least
provide exemptions for these two cases.  I don't see sufficient
benefit in zero-'size objects to justify any implementation
expense in worrying about them w.r.t. equality, so *if* we
want to make this paragraph more precise, I am arguing for
exempting zero-'size objects as well.

****************************************************************

From: Robert A. Duff
Sent: Sunday, October 19, 2003  3:48 PM

Sorry to reraise this issue, which some folks may be tired of talking
about.  But a few weeks ago Randy told me he was filing an AI on the
subject, and he asked me to address a couple of issues I had failed to
address in my previous diatribes.  Since it's now an AI, I'm including
the arg mailing list -- I *hope* that's the right thing to do.

[For those who haven't seen this long thread, somebody asked whether a
user-defined Allocate routine needs to ensure that every call returns a
distinct Address, even if the size to be allocated is zero.  Tucker then
suggested that X = Y should be (or is) allowed to return either True or
False in some cases.  Several different suggestions for *which* cases
were proposed.  I claimed that equality on access types is defined in
terms of object identity, not addresses, and that it would damage the
language to change that fact.]

First of all, Tucker *seemed* to be claiming that 4.5.2(12) *could* be
interpreted to mean that zero-sized objects are a special case.  It says:

    Two access-to-object values are equal if they designate the same
    object, or if both are equal to the null value of the access type.

Since that paragraph does not contain the word "zero" or "size" or
anything else related to this issue, it seems to be sheer fantasy to
claim that it could be interpreted as saying anything special about
zero-sized objects.  Surely, if the author had intended to have a kludgy
special case for zero-sized objects, there would some words to that
effect in the paragraph.  (Note that in the very *next* paragraph, there
*is* a kludgy special case for implementation reasons, and as expected,
this is stated explicitly.  In fact it's stated twice (4.5.2(13), and
3.10.2(39)).)

I strongly object to the idea that this paragraph is totally ambiguous,
and that therefore the ARG is free to fiddle with it arbitrarily.

If Tucker is not making that claim, he should clarify what he *is*
claiming.  Otherwise, he should buy me a new pair of eyeglasses, because
I can't see the word "zero" that he discerns in para 12.  ;-)

Seriously, this paragraph seems to be one of the clearest in the RM.
It means:

    Two access-to-object values are equal if AND ONLY IF they designate
    the same object, or if both are equal to the null value of the
    access type.  That is, IF AND ONLY IF the objects have the same
    *identity* (in the nonnull case).

We briefly discussed that "if" might mean "if but not necessarily only
if", but I believe I refuted that idea, using Robert's rule.  So "if"
here must mean "if and only if" (as stated in Chapter 1).  It cannot
possibly be "interpreted" to mean "if, but not blah blah zero-sized blah
blah discriminants...".  I can understand that Tucker (and Randy, and an
earlier not-yet-debugged version of Pascal ;-)) don't *like* what it
says.  But to claim that it could be interpreted to say something other
than what it obviously says seems like an unfair debating tactic -- an
attempt to pretend that a proposed language change isn't really a
change.

If Tucker or somebody wants to claim that this paragraph could be
interpreted in some way, they have to point to some actual words in the
RM -- it's not good enough to say, "Well, *I* don't understand that
paragraph -- it seems utterly ambiguous to me."

Perhaps you meant "Perhaps the original author didn't think about the
zero-sized case when writing this paragraph."  Now *that* I could
believe (although I know *I* thought about it -- I've known for 20 years
that Ada requires a special zero check in some allocators!).

----

Now Tucker *did* point out one hole: If one or both access values being
compared designate an object that no longer exists (because of
Unchecked_Deallocation, or leaving the scope of an aliased stack
object), then that paragraph, if it means anything at all, means "="
returns False (because they don't designate the same object -- it's been
freed, so they don't designated *any* object).  That's clearly not what
we want; we want to allow True in this case.

I agree that this is a hole.  However, I object to pretending that the
hole is wider than it is.  Patching this hole in no way requires adding
new special cases about zero-sized objects.  The latter would be a
language change, not a patch to an existing hole.

Tucker also pointed out that this case is *not* declared erroneous, nor
is it declared "wildly erroneous".  ;-)  This is true.  However, any
program that compares pointers to deallocated objects necessarily has a
bug, because it is impossible to get any useful information from such a
comparison.  Usually, in Ada, we like compilers to catch bugs (either at
compile time or at run time).  However in this and a few other cases, we
do not wish to require that, because it would greatly harm efficiency.

It would seem useful if an implementation chose to detect such bugs.
Most implementations can't reasonably do that, but I see no reason to
*forbid* such detection.  It would be feasible to detect such bugs in an
implementation that has garbage-collection support, for example.
Therefore, I suggest that the best fix would be to say that "=" when one
or both access values designates a nonexistent (deallocated) object is a
bounded error: it might return either True or False, or it might raise
Program_Error.

In any case, the wording belongs in Chap 13, not Chap 4.

----

The other point Randy asked me to address is
Address_To_Access_Conversions.  If there is a hole in the semantics of
Address_To_Access_Conversions, then it should be patched in Chap 13.
Using some weird issue related to Address_To_Access_Conversions to
drive the high-level semantics of access equality is a case of the tail
wagging the dog.

I don't think there *is* a hole in Address_To_Access_Conversions; please
correct me if I'm wrong.  Address_To_Access_Conversions does whatever it
does, and that depends on the run-time model chosen by the
implementation.  For example, if a garbage collector is moving things
around in memory, Address_To_Access_Conversions might have some pretty
weird behavior -- that's the programmer's problem.  Contrast this with
access equality, which must return the right answer even if the GC is
moving things around.  People who implement GC's know how to accomplish
this (even in an incremental GC).

If there *is* a hole in Address_To_Access_Conversions, please explain
what it is.  And we can fix it in Chap 13.

Whether or not there is such a hole, there is certainly no need for
4.5.2(12) to mention Address_To_Access_Conversions.  Chapter-13-ish
features (Address_To_Access_Conversions, Unchecked_Conversion, Address
clauses, etc) can violate *any* run-time rule in the whole manual.  The
entire RM is written with that assumption.  For example, we say that
discriminants can't change.  Of course, they *can* change if you use
Chapter-13-ish features, but we don't pollute the section on
discriminants by chatting about that.

----

Now I finally get to the *real* issue: what *ought* the RM say about "="
on access-to-zero-sized objects?

Tucker proposes (as I understand it), that "=" should be allowed to
return either True or False if *both* of the access values designate
distinct zero-sized objects.

I think Tucker has failed to address some points I made earlier.

The claim is that the above permission eliminates some distributed
overhead.  For example, an allocator might maintain a pointer to the
next-to-be-allocated location, and increment that by the size of the
object, which might be zero.  Thus two successive Allocates might return
the same Address.  But this implementation is wrong, even with the above
permission, because it would cause "=" to return True when just *one* of
the objects is zero-sized; the next-allocated one might not be zero
sized.

Do you propose to allow either True or False when just *one* of the two
objects is zero-sized?  That would be even worse than breaking object
identity!  It would break the principle that "X = Y" implies
"X.all = Y.all" (if non-null).  I strongly object to a language in which
X.all = "Hello, world", and Y.all = "", yet X = Y!  (Worse, "...yet X
*might* equal Y.")

----

Another point Tucker failed to address is that if you claim that
equality of zero-sized objects is somehow evil programming practise, the
burden of proof is on you.  You've given no reason why a programmer
*shouldn't* use "=" in this case.

Suppose I say, if X contains the integer number 1_259_792_008, and Y
contains the number 7_992_829, then X = Y can return either True or
False.  After all, this is an extremely rare thing to do; I've never run
across these numbers in all my life of programming.  It's an unimportant
case -- I'll bet if a compiler had this bug, nobody would notice.

Well, that's not good enough.  I can't just say it's rare.  I can't just
say it's an unimportant case.  I can't challenge someone to produce a
useful program containing comparison of those numbers.  I have to give
some logical argument that this is poor programming practise; that
there's something *special* about those particular numbers that should
logically make them incomparable.

Tucker has said he doesn't think the zero-sized case is important.
But to change the language in this way, I claim he has to show what's
special about that case that makes it fundamentally wrong to use "=".
(In fact, Nick showed just the opposite in his example.)

----

And, oh by the way, we should address the original question.  I think
the appropriate change would be to forbid implementations from passing
zero as the size to Allocate.  This would make it clear that the zero
check needs to happen at the call site of Allocate (where it can often
be optimized away), and that programmers can portably assume that
user-defined Allocate need not check for zero.  [I think the RM already
says this, but the argument is subtle.]

Randy made a point related to this, which I have so far failed to
address: he said something about direct calls to Allocate (not via
"new").  Well, I think that's a red herring.  The RM doesn't say
anything about what Allocate must do.  If you call it directly, it does
whatever it does.  The RM just says that *if* Allocate obeys a certain
contract, and *if* it is called via "new", then certain things happen.
There is nothing in the RM that forbids doing the zero check as part of
"new" rather than as part of Allocate, and in fact that implementation
choice is preferable, which is why I advocate requiring it.

----

These things should be enforced by ACATS!

****************************************************************

From: Robert A. Duff
Sent: Sunday, October 19, 2003  4:46 PM

I wrote:

> But a few weeks ago Randy told me he was filing an AI on the
> subject, and he asked me to address a couple of issues I had failed to
> address in my previous diatribes.

How's that for passing the blame?  I refuse to take proper
responsibility for my own actions, and blame Randy.  Sorry,
Randy.   ;-)

Randy did say that he might be "sticking a branch into a beehive".
;-)

****************************************************************

From: Robert Dewar
Sent: Sunday, October 19, 2003  7:05 PM

[For those who haven't seen this long thread, somebody asked whether a
user-defined Allocate routine needs to ensure that every call returns a
distinct Address, even if the size to be allocated is zero.  Tucker then
suggested that X = Y should be (or is) allowed to return either True or
False in some cases.  Several different suggestions for *which* cases
were proposed.  I claimed that equality on access types is defined in
terms of object identity, not addresses, and that it would damage the
language to change that fact.]

I strongly agree with this, trying to kludge up equality for some implementor
convenience here for zero sized objects is an unjustified change in the
language.

****************************************************************

From: Robert I. Eachus
Sent: Sunday, October 19, 2003  8:50 PM

I also agree that there is nothing currently broken here, and trying to find
wiggle room via appeals to cases where Unchecked_Deallocation has been called
is wrong.  If the access value was set to null by Unchecked_Deallocation, that
case is clearly covered by the current wording.  If a user maintains a copy of
an access value X that is not cleared by Unchecked_Deallocation, of course
tests for equality of X and Y may return true or false when Y is a different
access value.

In fact, this reminds me of a discussion I just had on c.l.a. The actual
subtype of the Object parameter in instantiations of Unchecked_Deallocation is
really useless.  (Except that it forbids Unchecked_Deallocation for
access-to-constant types.)  Should the compiler be allowed to ignore calls to
an instance of Unchecked_Deallocation if the subtype I used in the
instantiation was of zero length?  Of course not!  The fact that the subtype in
the allocation or deallocation was of zero length should have no effect on the
language rules.

Same applies here.  If I want to use access types as laundary tickets and
ignore the data in the designated type, the rules for access types should still
prevent me from accidentally giving out two identical laundary tickets.  (Yes,
I know, if I deallocate something and there are still live access values
around, I am headed for trouble.  But I would be anyway if I didn't make the
laundry tickets limited so users couldn't duplicate them. ;-)

****************************************************************

From: Randy Brukardt
Sent: Monday, October 20, 2003 12:11 PM

> Sorry to reraise this issue, which some folks may be tired of talking
> about.  But a few weeks ago Randy told me he was filing an AI on the
> subject, and he asked me to address a couple of issues I had failed to
> address in my previous diatribes.  Since it's now an AI, I'm including
> the arg mailing list -- I *hope* that's the right thing to do.

No, you should never cross-post between ARG and Ada-Comment. There are a
number of reasons for this:
-- You're making a private e-mail list's address public by so doing. That
could lead to people trying to post to ARG that aren't members;
-- Most of us are on both lists. By cross-posting, we get the messages
twice. Those of us who aren't on both lists have decided to stay out of
public technical discussions, and thus have decided to avoid the discussion
until it comes up on the ARG agenda.
-- By bringing another group into the middle of the discussion, we end up
having to rehash all of the arguments again (which is especially annoying
given the fact that Bob seems to want to win the argument by sheer volume of
messages. :-)

As happened the last time, I can't answer this whole message. But a quick
scan showed a couple of disagreements:

> Tucker also pointed out that this case is *not* declared erroneous, nor
> is it declared "wildly erroneous".  ;-)  This is true.  However, any
> program that compares pointers to deallocated objects necessarily has a
> bug, because it is impossible to get any useful information from such a
> comparison.

That's not true. It is useful to compare such objects against null, and
indeed that ought to work properly. We use that when streaming in the
symboltable in the Janus/Ada compiler -- if a pointer is not null, we read
another object from the stream. If it is null, there is no object to read
in. This seems to be a generally useful technique, and we don't want to lose
it.

> Usually, in Ada, we like compilers to catch bugs (either at
> compile time or at run time).  However in this and a few other cases, we
> do not wish to require that, because it would greatly harm efficiency.

> It would seem useful if an implementation chose to detect such bugs.
> Most implementations can't reasonably do that, but I see no reason to
> *forbid* such detection.  It would be feasible to detect such bugs in an
> implementation that has garbage-collection support, for example.
> Therefore, I suggest that the best fix would be to say that "=" when one
> or both access values designates a nonexistent (deallocated) object is a
> bounded error: it might return either True or False, or it might raise
> Program_Error.
>
> In any case, the wording belongs in Chap 13, not Chap 4.

Thus, I totally object to this wording. I'm not against making this a
bounded error in general, but the compare against null case should not be an
error (or undefined, for that matter).

> ----
>
> The other point Randy asked me to address is
> Address_To_Access_Conversions.  If there is a hole in the semantics of
> Address_To_Access_Conversions, then it should be patched in Chap 13.
> Using some weird issue related to Address_To_Access_Conversions to
> drive the high-level semantics of access equality is a case of the tail
> wagging the dog.

There is no hole in Address_to_Access_Conversions (AAC). The problem is your
insistence on an object-based model for equality compares, with no wiggle
room.

If you have such a comparison, you then have to say what object the result
of an AAC is. Otherwise, you have no basis to decide what the answer from
such a comparison is. And, as I showed earlier, any answer to this question
essentially requires that all objects (not just aliased ones) have to have a
unique id. The only way to do that is to disallowing zero-sized objects -
and that has a runtime cost, especially for array slices.

...
> Randy made a point related to this, which I have so far failed to
> address: he said something about direct calls to Allocate (not via
> "new").  Well, I think that's a red herring.  The RM doesn't say
> anything about what Allocate must do.  If you call it directly, it does
> whatever it does.  The RM just says that *if* Allocate obeys a certain
> contract, and *if* it is called via "new", then certain things happen.
> There is nothing in the RM that forbids doing the zero check as part of
> "new" rather than as part of Allocate, and in fact that implementation
> choice is preferable, which is why I advocate requiring it.

I very much dislike having the specification of a routine fail to indicate
its contract. Moreover, any good Ada programmer will include a check for any
invariants in a subprogram's implementation; the only check that they would
ever omit would be ones implied by the subprogram's specification. You in
fact said that you did this yourself. So the net effect is to require a
double check, both of which are often dynamic and thus will require code.
Allocations are common, so requiring the check at the point of the call will
add a substantial amount of code to the system. (Our experience is that
around 50% of the allocations in the program text are dynamically sized -
there are a lot of allocations of generic formal parameters and of
dynamically sized arrays in the programs that we look at.)

****************************************************************

From: Robert A. Duff
Sent: Monday, October 20, 2003  3:02 PM

> No, you should never cross-post between ARG and Ada-Comment.

OK, sorry.

But doesn't it seem like a shame that the ARG will now debate the issue
in its ivory tower, and the original questioner doesn't even get to
listen in, or say anything?  The original question, by the way, is
getting lost in all the rhetoric generated after Tuck weighed in.

>... There are a
> number of reasons for this:
> -- You're making a private e-mail list's address public by so doing. That
> could lead to people trying to post to ARG that aren't members;
> -- Most of us are on both lists. By cross-posting, we get the messages
> twice. Those of us who aren't on both lists have decided to stay out of
> public technical discussions, and thus have decided to avoid the discussion
> until it comes up on the ARG agenda.
> -- By bringing another group into the middle of the discussion, we end up
> having to rehash all of the arguments again (which is especially annoying
> given the fact that Bob seems to want to win the argument by sheer volume of
> messages. :-)

Whereas Randy and Tucker are happy with the status quo (no ACATS test
for it), so they want to win the argument by ignoring my points,
and declaring the whole issue unimportant.  :-)

OK, I put a smiley, just like Randy above.  Seriously, I tried reduce my
verbiage since we discussed it earlier.

> As happened the last time, I can't answer this whole message. But a quick
> scan showed a couple of disagreements:
>
> > Tucker also pointed out that this case is *not* declared erroneous, nor
> > is it declared "wildly erroneous".  ;-)  This is true.  However, any
> > program that compares pointers to deallocated objects necessarily has a
> > bug, because it is impossible to get any useful information from such a
> > comparison.
>
> That's not true. It is useful to compare such objects against null, and
> indeed that ought to work properly. We use that when streaming in the
> symboltable in the Janus/Ada compiler -- if a pointer is not null, we read
> another object from the stream. If it is null, there is no object to read
> in. This seems to be a generally useful technique, and we don't want to lose
> it.

OK, I have no objection to that.  I don't think it changes anything
about the zero-size issue, though.

> > Usually, in Ada, we like compilers to catch bugs (either at
> > compile time or at run time).  However in this and a few other cases, we
> > do not wish to require that, because it would greatly harm efficiency.
>
> > It would seem useful if an implementation chose to detect such bugs.
> > Most implementations can't reasonably do that, but I see no reason to
> > *forbid* such detection.  It would be feasible to detect such bugs in an
> > implementation that has garbage-collection support, for example.
> > Therefore, I suggest that the best fix would be to say that "=" when one
> > or both access values designates a nonexistent (deallocated) object is a
> > bounded error: it might return either True or False, or it might raise
> > Program_Error.
> >
> > In any case, the wording belongs in Chap 13, not Chap 4.
>
> Thus, I totally object to this wording. I'm not against making this a
> bounded error in general, but the compare against null case should not be an
> error (or undefined, for that matter).

OK with me.  I suggest we make it a bounded error in the non-null case
(both operands non-null, and at least one operand fails to point at any
object, because the object has been freed).

> > ----
> >
> > The other point Randy asked me to address is
> > Address_To_Access_Conversions.  If there is a hole in the semantics of
> > Address_To_Access_Conversions, then it should be patched in Chap 13.
> > Using some weird issue related to Address_To_Access_Conversions to
> > drive the high-level semantics of access equality is a case of the tail
> > wagging the dog.
>
> There is no hole in Address_to_Access_Conversions (AAC). The problem is your
> insistence on an object-based model for equality compares, with no wiggle
> room.
>
> If you have such a comparison, you then have to say what object the result
> of an AAC is.

No, I don't.  The semantics of AAC in those weird cases is entirely up
to the implementation.  In practise, if the user wants to take the
'Address of an unaliased object, or an address that comes from C,
etc, and convert it to an access value, then the user has to understand
the implementation's run-time model, and make sure the bits look "as if"
the thing were a properly allocated aliased object.  If the user doesn't
ensure that, then "=" might not work; that's true of many other
operations, like .all.  And chapter 4 doesn't need to say that!

When considering the chap 4 semantics of "=", all I care about is
properly allocated aliased objects.  Chap 13 can deal with anything else
(usually by saying "implementation dependent").  Otherwise, the tail
wags the dog.

I'm not sure why you pick on AAC.  The same issue arises for
machine-code inserts, pragma Suppress, and all the other chap 13 stuff.
(OK, pragma Suppress isn't in chap 13, but it should be.)

>... Otherwise, you have no basis to decide what the answer from
> such a comparison is. And, as I showed earlier, any answer to this question
> essentially requires that all objects (not just aliased ones) have to have a
> unique id. The only way to do that is to disallowing zero-sized objects -
> and that has a runtime cost, especially for array slices.

No, with my suggested "interpretation" of 4.5.2(12), there is nothing
forbidding zero-sized unaliased objects.  This is because *everything*
in chap 4 (and most others) must be understood to have an implicit
"unless you do some weird chap-13-ish thing".

> ...
> > Randy made a point related to this, which I have so far failed to
> > address: he said something about direct calls to Allocate (not via
> > "new").  Well, I think that's a red herring.  The RM doesn't say
> > anything about what Allocate must do.  If you call it directly, it does
> > whatever it does.  The RM just says that *if* Allocate obeys a certain
> > contract, and *if* it is called via "new", then certain things happen.
> > There is nothing in the RM that forbids doing the zero check as part of
> > "new" rather than as part of Allocate, and in fact that implementation
> > choice is preferable, which is why I advocate requiring it.
>
> I very much dislike having the specification of a routine fail to indicate
> its contract.

Me, too.  But there are lots of preconditions that we can't express in
Ada.  This one we *could* express, but for compatibility reasons.
Oh, well.  No big deal.

>... Moreover, any good Ada programmer will include a check for any
> invariants in a subprogram's implementation; the only check that they would
> ever omit would be ones implied by the subprogram's specification. You in
> fact said that you did this yourself. So the net effect is to require a
> double check, both of which are often dynamic and thus will require code.
> Allocations are common, so requiring the check at the point of the call will
> add a substantial amount of code to the system.

My user-defined allocators do have lots of pragmas Assert in them,
including Assert(size /= 0).  So I suppress these in production mode.
(Don't give me the old chestnut about lifejackets on land.  You can't
have it both ways -- if it's too expensive, you have to suppress it.
Only the engineer building that software can decide.)

>... (Our experience is that
> around 50% of the allocations in the program text are dynamically sized -
> there are a lot of allocations of generic formal parameters and of
> dynamically sized arrays in the programs that we look at.)

But you have to admit that this is a special case for your run-time
model.  If you're not sharing generic bodies, then the size of a generic
formal type can be known at compile time.  I am neither surprised nor
concerned that generic sharing has a run-time cost; it presumably
benefits by a compile-time gain (and memory savings).

For the kinds of programs *I* write, I would guess that at least 90% of
all allocations are for small records of compile-time-known size.
The rest are dynamic arrays.  And yes, Allocate *is* an efficiency
bottleneck, so I don't want distributed overhead.

Here's a question for compiler vendors: Are there any compilers that are
capable of inlining the Allocate routine that is implicitly called when
you do a "new"?  It seems silly to complain about the extra zero check
(I repeat: can often be optimized away), when the call overhead is
surely more costly.

****************************************************************

From: Robert Dewar
Sent: Monday, October 20, 2003  8:49 PM

> > Thus, I totally object to this wording. I'm not against making this a
> > bounded error in general, but the compare against null case should not be an
> > error (or undefined, for that matter).

Why not? I disagree, it is clearly an error to reference a non-existant
(freed) access value, and it is just fine if an implementation raises
program_error in response.

****************************************************************

From: Randy Brukardt
Sent: Monday, October 20, 2003  9:04 PM

> > > Thus, I totally object to this wording. I'm not against making this a
> > > bounded error in general, but the compare against null case
> should not be an
> > > error (or undefined, for that matter).
>
> Why not? I disagree, it is clearly an error to reference a non-existant
> (freed) access value, and it is just fine if an implementation raises
> program_error in response.

Sigh, I explained it in the original message, which Bob even quoted.

<<That's not true. It is useful to compare such objects against null, and
indeed that ought to work properly. We use that when streaming in the
symboltable in the Janus/Ada compiler -- if a pointer is not null, we read
another object from the stream. If it is null, there is no object to read
in. This seems to be a generally useful technique, and we don't want to lose
it.>>

The objects in question are not only freed, they're from a previous program
run. But there still is useful information there. Comparing an access value
against null should always work, no matter where the value came from.

****************************************************************

From: Robert Dewar
Sent: Monday, October 20, 2003  10:17 PM

> The objects in question are not only freed, they're from a previous program
> run. But there still is useful information there. Comparing an access value
> against null should always work, no matter where the value came from.

The fact that something is useful is not of itself a justification for semantically
peculiar behavior. Who says that a pointer to a freed access object must compare
non-null. In a garbage collected environment, it would make eminent sense to set
such pointers to null forcibly. Furthermore for a pool specific access type,
there is no requirement that null be some special value, just that it be
distinguishable from any other object and equal to itself (it would for example
be fine to have null be a pointer to a special dummy object).

****************************************************************


Questions? Ask the ACAA Technical Agent