!standard 4.05.2 (12)                                03-09-16  AI95-00350/01
!standard 13.11 (7)
!class binding interpretation 03-09-16
!status work item 03-09-16
!status received 03-08-21
!qualifier Omission
!priority Low
!difficulty Medium
!subject Allocating and comparing zero-size objects

!summary

[Editor's note: This is an unfinished draft. Private conversations have made
it clear that no acceptable rule will get consensus, so I will not waste any
more time on this one.]

The Allocate routine in System.Storage_Pools can be called with
Size_In_Storage_Elements equal to zero. Pools do not need to ensure that the
value they return is unique in that case.

Equality of access values ...

!question

The Allocate routine in System.Storage_Pools declares the
Size_In_Storage_Elements parameter to be of type
System.Storage_Elements.Storage_Count, which is a subtype whose lower bound
is zero. When defining a new type derived from Root_Storage_Pool, and
providing an Allocate routine for that type, is the Allocate routine
expected to correctly handle the case when this parameter is zero? (Yes.)

If so, do the resulting access values need to be unique? (No.) If not, it
isn't possible to meet the requirements of 4.5.2(12), which imply that
access values designating different objects must compare unequal.

!recommendation

(See wording.)

!wording

Add after 3.10(17):

If the designated subtype is constrained, the elaboration of an
access_to_object_definition and access_definition includes a check that the
designated subtype has a size greater than zero. Program_Error is raised if
this check fails.

Change 4.5.2(12) to say:

Two access-to-object values are equal if they designate the same object, or
if both are the result of evaluating the literal NULL.
Two access-to-object values (of the same type) are unequal if one is the
result of evaluating the literal NULL, and the other is the result of
evaluating an allocator, 'Access, or 'Unchecked_Access, or if both are the
result of evaluating an allocator, 'Access, or 'Unchecked_Access, and the
designated objects are not the same existing object. For other cases it is
unspecified whether two access values are equal or unequal.

!discussion

It is tempting to fix this problem by forbidding compilers to call Allocate
with a size of 0. However, that really doesn't work.

First, we cannot change the specification of Allocate to eliminate zero as a
legal value. (Doing so would be incompatible with all programs that define
pools, which is completely unacceptable.) Thus, we would have to adopt a
dynamic semantics rule which states that compilers never call Allocate with
size zero. But, since we cannot prevent such a call from user-written code,
any well-written pool still will have to detect the case and handle it (or
raise an exception). So there is no real savings from having such a rule.

Second, doing so does not really fix the problems with the equality rule. We
have similar problems with equality of access values created by
Address_To_Access_Conversions, which also need to be addressed.

Thus, we simply address the equality issues and do not change the meaning of
Allocate.

...

--!corrigendum B.03(69/1)

!ACATS test

This change only affects pathological programs, and then only makes their
results unspecified. That is not testable.

!appendix

!topic Calling allocate routine with zero size
!reference RM95 13.11(7), 13.7.1(4)
!from Adam Beneschan 08-21-03
!discussion

The Allocate routine in System.Storage_Pools declares the
Size_In_Storage_Elements parameter to be of type
System.Storage_Elements.Storage_Count, which is a subtype whose lower bound
is zero.
My question: When defining a new type derived from Root_Storage_Pool, and
providing an Allocate routine for that type, is the Allocate routine
expected to correctly handle the case when this parameter is zero?

Consider this:

   procedure Try (Pool : in out System.Storage_Pools.Root_Storage_Pool'Class;
                  N    : in Integer) is
      type Arr is array (Natural range <>) of Character;
      type Arr_P is access Arr;
      for Arr_P'Storage_Pool use Pool;
      P  : Arr_P;
      P2 : Arr_P;
   begin
      P  := new Arr (1 .. N);
      P2 := new Arr (1 .. N);
   end Try;

Suppose we call Try(0). We need to "allocate" a block of memory of size
zero. Of course, P and P2 have to be set to something that is not null, and
they can't be equal to each other.

Which of the following is true? In order for things to be portable, one or
the other of these has to be true (and perhaps ought to be stated in the RM
or AARM):

(1) Allocate routines are not expected to deal with
Size_In_Storage_Elements = 0. Compilers that generate (possibly dispatching)
calls to Allocate must be careful to make sure the parameter is not zero.
(They might, for instance, use a parameter of 1 in a case such as the above,
or they might invoke a separate mechanism to deal with zero-length
allocations.)

(2) Compilers are allowed to generate calls to Allocate with
Size_In_Storage_Elements = 0; Allocate routines are expected to handle them
appropriately.

****************************************************************

From: Tucker Taft
Sent: Friday, August 22, 2003  5:32 AM

I see nothing that implies an Allocate routine may assume
Size_In_Storage_Elements is greater than zero, so it needs to handle zero
"properly."

The more important question I would ask is whether two distinct evaluations
of an allocator must return access values that compare not equal. 4.5.2(12)
seems to imply that, though it doesn't state it in language that is as
explicit as, for example, the discussion in 4.5.2(13).
Personally, I see no important reason for an Allocate routine to go out of
its way to make sure allocations of zero-length objects return distinct
access values, but I believe I recall others saying that this is somehow
very important. As you imply, there is some distributed overhead in
accomplishing that, and, again personally, the overhead seems like a waste
of energy. If a user wants to use access values as unique tokens in some
way, they can jolly well allocate non-zero-length objects.

I might go even further and allow two allocators to return the same access
value even for non-zero-size objects, if the implementation can determine
that they can safely share the same memory space. Certainly we allow access
values to "reappear" after an unchecked deallocation has been performed. So
we know that an access value designating a non-existent object (due to
unchecked deallocation) and an access value designating a newly allocated
object can be equal. Note that it is *not* erroneous to copy or compare an
access value that designates a non-existent object, though it is erroneous
to evaluate "acc_val.all" if acc_val designates a non-existent object.

[I have always found C's concern about zero-length objects to be misplaced.
It is often natural to have zero-length arrays, and even records, but in C
you have to go out of your way sometimes to be sure you don't try to define
a zero-length array, or else the compiler, if ANSI/ISO compliant, will slap
your wrists. I have heard the argument relates to pointer equality, but
again, what is the big deal if pointers to two logically distinct objects
happen to be equal?]

So I would probably recommend that 4.5.2(12) explicitly allow for the
possibility that access values might be equal even if they designate
logically distinct objects, so long as the objects have no updatable
components.
****************************************************************

From: Pascal Leroy
Sent: Friday, August 22, 2003  6:58 AM

A related question is whether two distinct evaluations of the 'Access
attribute with different prefixes must necessarily return distinct access
values. Consider:

   type R is null record;
   type A is array (1..10) of aliased R;
   type T is access all R;
   X : T := A(1)'Access;
   Y : T := A(2)'Access;

I read 4.5.2(12) as requiring that X and Y be distinct. We deal with this
requirement by ensuring that each array component occupies at least one
byte, but this has surprised users in the past, and it is really stupid: if
you happen to have objects that don't need any storage at all, why not take
advantage of this?

****************************************************************

From: Nick Roberts
Sent: Monday, August 25, 2003  9:14 PM

> A related question is whether two distinct evaluations of
> the 'Access attribute with different prefixes must necessarily
> return distinct access values. Consider:
>    type R is null record;
>    type A is array (1..10) of aliased R;

I think Pascal meant:

   A: array (1..10) of aliased R;

>    type T is access all R;
>    X : T := A(1)'Access;
>    Y : T := A(2)'Access;
> I read 4.5.2(12) as requiring that X and Y be distinct.

I think they should be, since it is possible that certain algorithms depend
on this property for their correctness (indeed completability). I think
4.5.2(12) needs to have "Otherwise they are unequal." added.

> We deal with this requirement by ensuring that each array
> component occupies at least one byte, but this has surprised
> users in the past,

I'll bet it has, and I'm pretty sure it's unnecessary. My basic observation
is that access values do not have to indicate different addresses in order
to have different values. An access value is, in Ada, pointedly (sorry ;-)
not the same as an address.
I'm not certain of the nitty-gritty details, but I suspect you could use the
following trick for access subtypes that happen to reference a null object
(subtype): return an arbitrary distinct value for each distinct object; any
instance of dereferencing becomes a null action (so no machine instruction
will ever be executed trying to use the access value as an address). If the
object subtype is not statically null, I suppose this requires some kind of
dynamic test, but then Ada often requires dynamic tests (e.g. for violations
that must raise an exception).

So, to use the above example, even if an access value is, say, one machine
word in size, and normally contains a linear address, the result of
A(1)'Access could be, say, 1, and of A(2)'Access could be 2 (and so on).
These values (1, 2, etc.) are not addresses, but they don't need to be. The
access values remain distinct. Any attempt to dereference them must not
cause the execution of actual dereferencing code. Array A need take up no
storage (memory) space, nor have any associated (real) address.

In fact, perhaps the definition of the Address attribute in the RM should
have a clause added explicitly permitting the result of the attribute, when
it is applied to a null object, to be a value which is not a real (or valid)
address.

I think this same principle can be applied to the earlier question involving
the System.Storage_Pools.Allocate procedure. An allocator (as in the 'new'
reserved word) should always return a distinct access value, even for an
object of zero size. However, the evaluation of the allocator, in such a
case, should /not/ call System.Storage_Pools.Allocate; instead it can
return a value which has nothing to do with addresses (but maintains the
distinctness property). If the RM needs changing to reflect this, then it
should be.
While we're there (RM 13.11(16)), I also think it should be made clear that
an implementation may call Allocate less than once per allocation (by
'new'), if it wishes (to cluster allocations of small objects).

> and it is really stupid: if you happen to have objects that don't
> need any storage at all, why not take advantage of this?

I couldn't agree more. In fact, I have a suspicion that it is possible there
will be a few algorithms which /require/ this property in order to work (on
a machine with finite memory).

****************************************************************

From: Tucker Taft
Sent: Monday, August 25, 2003 10:16 PM

I am mystified by Nick Roberts' comments, as they seem to be contradictory.
Perhaps he is saying that by some magic, you can have two aliased objects
with the same address, but different access values. I can't imagine how.

In any case, I agree with Pascal that it is stupid to go out of your way to
make all aliased components have non-zero size.

If there was a realistic example of an algorithm that fundamentally depended
on access values designating distinct "empty" objects having different
values, I would be interested. Otherwise, this seems like distributed
overhead serving no purpose.

****************************************************************

From: Pascal Leroy
Sent: Tuesday, August 26, 2003  1:23 AM

Right. First, I can't imagine how Address_To_Access_Conversions could be
made to work. Second, I can't imagine that the user community would tolerate
the additional cost of testing for "invalid" access values before each
dereferencing; considering that there are millions of SLOCs out there that
make heavy use of dynamic data structures (using access types), the proposed
performance incompatibility sounds ludicrous to me.

Anyone thinking that access values are not in a one-to-one correspondence
with addresses is seriously misguided.
This is true even for thick pointers, although in this case the address
alone is obviously not sufficient to reconstruct the entire access value.

****************************************************************

From: Nick Roberts
Sent: Tuesday, August 26, 2003  7:43 AM

I am simply saying that an access value which accesses an object of zero
size does not have to have any correspondence with any (real) address at
all. I feel this is a simple enough concept, and I'm surprised that you are
mystified, Tuck.

You can indeed have two aliased objects with the same address, but different
access values. It's very simple.

First consider two aliased objects, A and B, of size 1 byte (= 1 storage
unit) each. They might be laid out in memory like this:

   001A 5670   A   def 1
   001A 5671   B   def 1

They might have the access values (in hexadecimal) 16#001A_5670# for
A'Access and 16#001A_5671# for B'Access.

Now consider two more aliased objects, C and D, of size 0 each. They might
be 'laid out' in memory like this:

   001A 5672   C   def 0
   001A 5672   D   def 0

They might have the access values (in hexadecimal) 16#8000_0001# for
C'Access and 16#8000_0002# for D'Access.

Allakazzam! Hey presto! We have two aliased objects with the same address
(16#001A_5672#) but with different access values (16#8000_0001# and
16#8000_0002#). Can you imagine that?

Seriously, Tuck, I don't understand what the problem is. Obviously the
implementation must ensure it doesn't try to dereference C'Access or
D'Access in the normal way. But in any event it must ensure it never
attempts to load or store anything from or to C'Address or D'Address. The
chances are it's got to make a special test (for a designated subtype of
size 0) anyway.

> In any case, I agree with Pascal that it is
> stupid to go out of your way to make all
> aliased components have non-zero size.

Well, that's a point of agreement, at least.
> If there was a realistic example of an
> algorithm that fundamentally depended on
> access values designating distinct "empty" objects
> having different values, I would be interested.

My point is that such an algorithm is possible. I was not, and am not,
trying to make an assertion about the likelihood of such an algorithm being
used in practice. I would simply prefer it that Ada compilers did not break
such algorithms, even if they are unlikely.

> Otherwise, this seems like distributed overhead
> serving no purpose.

I'm not sure, but I suspect that what I'm proposing would add no appreciable
overhead (speed or memory) to remote access types. I would suggest, for one
thing, that remote access types tend very rarely in practice to have a
designated subtype capable of size 0.

****************************************************************

From: Nick Roberts
Sent: Tuesday, August 26, 2003  7:43 AM

> Right. First, I can't imagine how Address_To_Access_Conversions
> could be made to work.

I did, in my original post, specifically suggest that X'Address be permitted
to result in an unreal or invalid address if X has size 0. In this case, I
suggest To_Pointer(X'Address) could be permitted to return either X'Access
or null, and To_Address(X'Access) could be permitted to return any address
(which might be X'Address, or some other, possibly unreal or invalid,
address).

> Second, I can't imagine that the user community would tolerate the
> additional cost of testing for "invalid" access values before each
> dereferencing;

I think the additional cost of these tests would be an issue of some
complexity. I suspect that in many cases it would turn out to be very little
or none, in fact. Conceptually, the test would not have to be performed at
every dereferencing, as such, but at every elaboration of an access subtype
which could dynamically have a designated subtype of size 0.
(I use 'dynamic' here in the sense of 'at run time', as in something that
cannot be computed at compile time.)

> considering that there are millions of SLOCs out there that make heavy
> use of dynamic data structures (using access types), the proposed
> performance incompatibility sounds ludicrous to me.

Performance incompatibility? What is that? (Perhaps the use of a high-level
language is incompatible with performance, and we should all use
assembly? :-)

Remember that the dynamic test in question is only relevant to access types
whose designated subtype is capable of dynamically having size 0 (as well as
other sizes). I suspect that, of those millions of SLOCs, the vast majority
would not fall into this category anyway.

> Anyone thinking that access values are not in a one-to-one
> correspondence with addresses is seriously misguided.

Surely there are many implementations which add flags and other
paraphernalia to an access value. Are these implementations all seriously
misguided?

> This is true even for thick pointers, although in this case the
> address alone is obviously not sufficient to reconstruct the
> entire access value.

A thick pointer does not have a one-to-one correspondence with addresses.
You seem to have contradicted your own assertion.

****************************************************************

From: Pascal Leroy
Sent: Tuesday, August 26, 2003  9:29 AM

> I think the additional cost of these tests would be an issue
> of some complexity. I suspect that in many cases it would
> turn out to be very little or none, in fact. Conceptually,
> the test would not have to be performed at every
> dereferencing, as such, but at every elaboration of an access
> subtype which could dynamically have a designated subtype of size 0.

You've lost me. I fail to see why the check would relate to the elaboration
of a subtype.
Consider:

   type A is access String;
   subtype S is A (1..N);

If N happens to be 0 (at runtime) then objects of subtype S could occupy 0
bytes, and therefore they would be denoted by "magic access values". But of
course the declarations of A and S could be in totally unrelated units, so
any dereferencing of a value of type A should be prepared to get one of
these magic values. To me, that's one check per dereferencing.

> Performance incompatibility? What is that?

When we do any change to the language, we try very hard to ensure that the
performance of existing programs is not affected. This is essential to
support a smooth transition from one version of Ada to the next. That's what
I mean by "performance incompatibility". If real-time programs started
missing cycles, people could be quite displeased.

> Surely there are many implementations which add flags and
> other paraphernalia to an access value. Are these
> implementations all seriously misguided?

I'd be curious to hear about real-life examples of such implementations (I'm
not talking of remote access types here).

****************************************************************

From: Nick Roberts
Sent: Tuesday, August 26, 2003 11:43 AM

..

> If N happens to be 0 (at runtime) then objects of subtype S
> could occupy 0 bytes, and therefore they would be denoted by
> "magic access values". But of course the declarations of A and
> S could be in totally unrelated units, so any dereferencing of a
> value of type A should be prepared to get one of these magic
> values. To me, that's one check per dereferencing.

But you cite only one particular scenario, Pascal (and not in sufficient
detail). There are many others. Suppose the declarations of A and S (and
presumably various associated declarations and statements) do not occur in
totally unrelated units, as will often be the case in practice. In
particular, consider the dereferencing of access values that are known to be
of the same subtype.
If the compiler can determine that they are of the same subtype, it only has
to check that subtype once (for having a designated subtype of zero size),
and then it knows whether to treat all the dereferences as normal or not.
Consider:

   type A is access String;
   ...
   subtype S is A (1..N);
   P1, P2: S := new String'(1..N => '#');
   ...
   Do_Something(P1.all,P2.all);

Here the compiler can know that P1 and P2 must either both be dereferenced
normally or both be dereferenced as of 'magic' access values. Only one check
is required (conceptually at the elaboration of S).

Furthermore -- if it's of any interest to anyone -- if my suggestion were
adhered to by the compiler, we can be certain that P1 /= P2 for all values
of N (including 0). I believe it is reasonably conceivable (I don't say
'likely') that an algorithm could depend on this property for its
correctness.

> > Performance incompatibility? What is that?

> When we do any change to the language, we try very hard
> to ensure that the performance of existing programs is not
> affected. This is essential to support a smooth transition
> from one version of Ada to the next.

Well, I accept that principle, broadly.

> That's what I mean by "performance incompatibility".

Okay, I understand now.

> If real-time programs started missing cycles, people could
> be quite displeased.

They might indeed. They might even be displeased enough to rewrite their
time-critical code more efficiently (for example, using an array instead of
a pool).

Actually, I salute your point, Pascal. It occurs to me that an
implementation could provide ways (pragmas, perhaps) to slicken the
operation of a particular access type or subtype, perhaps at the expense of
certain semantic properties (that are generally desirable). This would be
similar to the way in which pragma Suppress can be used to remove run-time
checks where they cause too much speed degradation.

> > Surely there are many implementations which add flags
> > and other paraphernalia to an access value.
> > Are these implementations all seriously misguided?

> I'd be curious to hear about real-life examples of such
> implementations (I'm not talking of remote access types
> here).

I really need someone to come galloping to my rescue at this point. I don't
have access to the implementational details of a large number of Ada
compilers.

I would point this out, however: it's not possible for any access type to
have a totally one-to-one relationship with real (meaningful) addresses,
since every access type must have a 'null' value. This value /could/ be
associated with a real address, of course, but I'll bet this isn't actually
done in /any/ implementation. Maybe I'm arguing about minutiae.

Would you please expand on your entreaty that access values should have a
one-to-one relation to addresses, Pascal. Why should this be?

****************************************************************

From: Randy Brukardt
Sent: Tuesday, August 26, 2003  2:18 PM

> I'd be curious to hear about real-life examples of such implementations
> (I'm not talking of remote access types here).

I prefer not to get involved here (mainly because I see no value to the
supposed requirement that pointers to the different objects are unequal, and
I see no requirement in the standard that this is true - there is no "only
if" in 4.5.2(12)).

However, Janus/Ada does have bit pointers (that is, a machine address and
bit offset). Usually, these only show up in renames of packed components and
the like. But, on the U2200, they also were used for (some) access types.
Since that machine had a 36-bit word, and we needed to be able to access
characters (for compatibility with C), we had to allow them as access types.
(Luckily for us, the Unisys code generator already supported those as a
native type, so we just used that rather than something constructed.)
On the U2200, convention C on an access type would force the use of bit
pointers (we also allowed conversions between the derived access types with
different representation, with a check that the bit offset was zero, in
order to provide better code on the Ada side).

****************************************************************

From: Simon Wright
Sent: Tuesday, August 26, 2003  2:41 PM

I thought that, but what about 4.5.2(25)? (I too feel this isn't a very
urgent change.)

****************************************************************

From: Adam Beneschan
Sent: Tuesday, August 26, 2003  3:09 PM

I don't see how that makes a difference. 4.5.2(25) just says that "/="(X,Y)
[if it returns Boolean] is equivalent to "not"("="(X,Y)) in all cases,
whatever X and Y are and whatever "=" returns. It doesn't say anything about
being evaluated according to a complementary *rule* that parallels the rule
for "=", or anything like that.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, August 26, 2003  3:17 PM

> I thought that, but what about 4.5.2(25)? (I too feel this isn't a
> very urgent change.)

What about it? It just says that "/=" is the same as (not "="). Certainly,
if A'Access = B'Access, then not (A'Access /= B'Access). That is still
consistent with the RM.

I don't see any reason for the bending over backwards that Pascal indicated
that Rational does - the RM doesn't require it, and access to 0-sized
objects is not very useful anyway. (A zero-sized object for a non-array is
usually a bug, and for arrays, static matching prevents any interesting
access to null slices.)

****************************************************************

From: Simon Wright
Sent: Wednesday, August 27, 2003  3:25 AM

My misunderstanding (I guess I didn't see (25) as introducing an overarching
concept. Brain damage.)
****************************************************************

From: Pascal Leroy
Sent: Tuesday, August 26, 2003  3:43 PM

> I prefer not to get involved here (mainly because I see no value to the
> supposed requirement that pointers to the different objects are unequal, and
> I see no requirement in the standard that this is true - there is no "only
> if" in 4.5.2(12)).

Wrong argument, Randy. AARM 1.1.3(14.b) says "we often use 'if' to mean 'if
and only if'...". So I am certainly going to call for an AI to clarify
4.5.2(12). (I would certainly love to get rid of the ridiculous distributed
overhead that this paragraph seems to impose.)

> However, Janus/Ada does have bit pointers (that is, a machine address and
> bit offset).

A perfectly reasonable implementation, which amounts to saying that bits are
addressable (albeit using a complicated form of address). However, it is
still the case that access values in this case have a direct correspondence
to (bit) addresses. And therefore 4.5.2(12) is still a nuisance.

****************************************************************

From: Nick Roberts
Sent: Tuesday, August 26, 2003  4:57 PM

> If there was a realistic example of an
> algorithm that fundamentally depended on
> access values designating distinct "empty" objects
> having different values, I would be interested.

It is quite reasonable that a procedure which takes two parameters, both of
a certain access type, tests to ensure they are not equal (because the
procedure's algorithm is based on the assumption of no aliasing). The
procedure might raise an exception if it detects that they are equal.

Another piece of code might call the procedure as part of an algorithm that
diminishes the size of the objects of the access type, and it may make a
call in a limiting case where the size is zero. It may be difficult or ugly
to eliminate the limiting case. Moreover, it might not be obvious that it is
necessary to eliminate the limiting case.
Consider:

   type R is array (Positive range <>) of Float;
   type A is access R;

   procedure Foo (N: in Natural; ...) is
      subtype S is A(1..N);
      procedure Bar (P1, P2: in S) is
         ...
      begin
         if P1 = P2 then
            raise Constraint_Error;
         end if;
         ...
      end;
      Q1, Q2: S := new R'(...);
   begin
      ...
      Bar(Q1,Q2);
      ...
   end Foo;

Obviously this isn't a real-life example, but it shows how a call Foo(0,...)
could raise Constraint_Error if the allocator initialising Q1 and Q2 returns
the same access value to both. It might not be obvious to the programmer, in
a situation like this, what is going wrong. Might this not be considered one
of those nasty 'gotchas' that Ada is supposed to avoid?

> Otherwise, this seems like distributed overhead
> serving no purpose.

As I suggest to Pascal, it might be reasonable for implementations to
mitigate the overhead by providing a pragma which allows the behaviour I
suggest to be removed from a specific access type. (I'm not sure whether the
standard should specify such a pragma.)

****************************************************************

From: Randy Brukardt
Sent: Tuesday, August 26, 2003  7:03 PM

> Obviously this isn't a real-life example, but it shows how a call Foo(0,...)
> could raise Constraint_Error if the allocator initialising Q1 and Q2 returns
> the same access value to both. It might not be obvious to the programmer, in
> a situation like this, what is going wrong.

Fine, but I can't imagine an implementation where an access-to-unconstrained
would ever point at a 0-sized object (in practice, as opposed to the
language definition). There has to be an array descriptor somewhere (to give
the bounds), and that by definition is not of zero size. Therefore, I claim
that the "problem" cannot occur with access-to-unconstrained. (It certainly
cannot in Janus/Ada.) The static matching rules mean it cannot happen with
an arbitrary slice of an array. It could only happen with an access to a
subtype of size zero (either a null array or a null record).
What possible use is such a type? It is a pointer to nothing. Why should we
have distributed overhead to support such an access type? (Rational's
solution makes objects larger simply in case someone tries to do this
compare on two access values that are known to point at nothing. That seems
ridiculous.)

So, I would be in favor of Pascal's AI saying something along the lines of
"the result of equality is unspecified if the designated subtype is
constrained and has size 0." Implementors would still be required to make
access-to-unconstrained work (because I can agree that that could be
dangerous; null slices are not uncommon), but that would happen 'naturally'
in the vast majority of implementations.

****************************************************************

From: Nick Roberts
Sent: Tuesday, August 26, 2003  9:12 PM

> Fine, but I can't imagine an implementation where an access-
> to-unconstrained would ever point at a 0-sized object (in
> practice, as opposed to the language definition).

Such implementations are indeed (reasonably) possible.

> There has to be an array descriptor somewhere (to give
> the bounds), and that by definition is not of zero size.

Quite true, but the descriptor, or 'dope' data, does not have to accompany
the object's value in memory. It can be elsewhere (and it can be shared by
many access values). Notionally, dope data is associated with (constrained)
subtypes of the access type, rather than with objects per se.

At bottom, the danger, to my mind, is that some implementations may produce
access values for Q1 and Q2 such that the test Q1=Q2 returns True (even if
the binary patterns of the values of Q1 and Q2 are different). I am
exhorting that they should not (be allowed to) do so; I don't actually care
what mechanism they use to ensure this. The mechanism I suggested was only a
suggestion.

> Therefore, I claim that the "problem" cannot occur with
> access-to-unconstrained. (It certainly cannot in Janus/Ada.)
I claim that it can, for implementations different to Janus/Ada.

...

> So, I would be in favor of Pascal's AI saying something along the
> lines of "the result of equality is unspecified if the designated
> subtype is constrained and has size 0."

I continue my opposition to this idea. It still seems clear to me that an access value is not just a representative of a storage space (as an address is), but rather it is also a token representing a (conceptual, as well as actual) object. Possibly this point is only properly relevant to access-to-limited-type types, but I feel that when you say "new My_Limited_Type", you don't just mean "allocate some space on the heap, please", you also mean "create a new token to represent a distinct object". That second meaning should hold true even if the object happens to have zero size.

****************************************************************

From: Tucker Taft
Sent: Wednesday, August 27, 2003 9:43 AM

In re-reading 4.5.2(12), in comparison with 4.5.2(13), it certainly seems "reasonable" to conclude there is no requirement that access-to-object values be *un*equal if they designate different objects. It would also be reasonable to read "if" as "if and only if", so we clearly have an ambiguity which deserves clarification by the ARG.

One way Randy and others have helped illuminate such issues in the past has been by writing up an example and seeing what various compilers do with it. If all compilers agree, then that is an argument for solidifying the interpretation that way. If the compilers don't agree, then the general approach is to disambiguate in favor of least implementation disruption, unless there is a compelling user requirement to the contrary. Here, we have heard some user arguments in favor of requiring inequality if the designated objects are different, but it has not reached the level of "compelling" in most readers' minds, apparently.

So... Here is a simple example to illustrate the issue.
Please give this a try and see what your favorite compiler produces. One of ours produces False,False,False. Another produces False, True, True.

----------- access_equality.ada ------------

with Ada.Text_IO; use Ada.Text_IO;
procedure Access_Equality is
   type Int_Arr is array(Positive range <>) of Integer;
   subtype Int_Arr_Zero is Int_Arr(1..Integer'Value("0"));

   type Int_Arr_Ptr is access Int_Arr_Zero;
   for Int_Arr_Ptr'Storage_Size use 500;
     -- Use a sized heap to indicate reclamation may not be necessary

   type Int_Arr_Const_Ptr is access constant Int_Arr_Zero;
     -- Use an access-to-constant to indicate reclamation won't happen

   type Empty_Rec is null record;
   type Empty_Rec_Ptr is access all Empty_Rec;
   type Empty_Rec_Arr is array(Positive range <>) of aliased Empty_Rec;

   A, B : Int_Arr_Ptr;
   CA, CB : Int_Arr_Const_Ptr;
   C, D : Empty_Rec_Ptr;
   ERA : Empty_Rec_Arr(1..10);
begin
   A := new Int_Arr_Zero;
   B := new Int_Arr_Zero;
   Put_Line("A = B? " & Boolean'Image(A = B));

   CA := new Int_Arr_Zero'(others => 0);
   CB := new Int_Arr_Zero'(others => 0);
   Put_Line("CA = CB? " & Boolean'Image(CA = CB));

   C := ERA(1)'Access;
   D := ERA(2)'Access;
   Put_Line("C = D? " & Boolean'Image(C = D));
end Access_Equality;

****************************************************************

From: Pascal Leroy
Sent: Wednesday, August 27, 2003 10:04 AM

My favorite compiler produces False, False, False. It's my development version, not anything that any customer has seen or will see. On the other hand, we gave special attention to this issue, so I would expect to see the same result regardless of the target or version.

****************************************************************

From: Nick Roberts
Sent: Wednesday, August 27, 2003 11:10 AM

> In re-reading 4.5.2(12), in comparison with 4.5.2(13),
> it certainly seems "reasonable" to conclude there is
> no requirement that access-to-object values be *un*equal if
> they designate different objects.
> It would also be
> reasonable to read "if" as "if and only if" so we
> clearly have an ambiguity which deserves clarification by
> the ARG.

I think everyone agrees with this (I certainly do).

> One way Randy and others have helped illuminate such
> issues in the past has been by writing up an example
> and seeing what various compilers do with it. If all
> compilers agree, then that is an argument for solidifying
> the interpretation that way. If the compilers don't
> agree, then the general approach is to disambiguate in
> favor of least implementation disruption, unless there
> is a compelling user requirement to the contrary.

I agree with this approach, so long as it never becomes absolute dogma.

> Here, we have heard some user arguments in favor of
> requiring inequality if the designated objects are
> different, but it has not reached the level of
> "compelling" in most readers' minds, apparently.

Fair enough. If the ARG (WG9?) decides not to require inequality, then I would like to see that this point is made clear in (the next revision of) the RM, possibly with a little example drawing attention to the point.

> Here is a simple example to illustrate the issue.
> Please give this a try and see what your favorite
> compiler produces. One of ours produces False,False,False.
> Another produces False, True, True.

GNAT 3.15p (CygWin build of gcc 2.8.1 on Windows 95) produces False, False, True.

****************************************************************

From: Robert A. Duff
Sent: Thursday, August 28, 2003 8:37 AM

Tucker wrote:

> > Otherwise, this seems like distributed overhead
> > serving no purpose.

And Nick replied:

> I'm not sure, but I suspect that what I'm proposing would add no appreciable
> overhead (speed or memory) to remote access types. I would suggest, for one
> thing, that remote access types tend very rarely in practice to have a
> designated subtype capable of size 0.

Nick, I think you misunderstand what Tucker means by "distributed overhead".
He's not talking about distributed systems, the DS annex, or remote access types, or anything like that. "Distributed overhead" means that you pay an efficiency cost for the mere existence of a feature, even when you don't use that feature. For example, if the run-time system is continually checking "has this task been aborted?", then that's distributed overhead, because programs that don't use abort at all have to pay for the mere existence of abort in the language. The language designer can eliminate distributed overhead (in this case by not putting abort in the language). An implementer can eliminate distributed overhead (by cleverly optimizing the run-time system in the no-abort case).

****************************************************************

From: Nick Roberts
Sent: Thursday, August 28, 2003 9:54 AM

> Nick, I think you misunderstand what Tucker means by "distributed
> overhead". He's not talking about distributed systems, the DS
> annex, or remote access types, or anything like that.

Thanks for the clarification, Bob; I did misunderstand.

> "Distributed overhead" means that you pay an efficiency cost for
> the mere existence of a feature, even when you don't use that
> feature. For example, if the run-time system is continually checking
> "has this task been aborted?", then that's distributed overhead,
> because programs that don't use abort at all have to pay for the
> mere existence of abort in the language.

Well, the concept is familiar to me, but not the term "distributed overhead". A term such as "universal overhead" would seem more appropriate. But then that's computer science for you. (Witness the term "artificial intelligence" :-)

> The language designer can eliminate distributed overhead (in this
> case by not putting abort in the language). An implementer can
> eliminate distributed overhead (by cleverly optimizing the run-time
> system in the no-abort case).
I like to think of Ada as being the Rolls Royce (perhaps I should say Lincoln Continental :-) (or Zob ;-) of programming languages. And we all know that one of the things you pay for, when you dole out your wedges of dosh for a car of that kind, is the fact that all the luxury gadgets are /built into/ the car, and thoroughly integrated (with the car and all the other gadgets). Inevitably, you end up with a heavyweight car, but that's in the nature of the car.

In the pursuit of the software engineering goals that are associated with the fact that big software is written by teams of programmers who are fallible humans, Ada is (rightly) designed to have as many facilities built in as reasonably can be. There are many things built into Ada that are add-ons to most other programming languages, and there are some significant practical advantages to this fact. So when saying something like "Oh, but consider the distributed overhead," think about what kind of car we're trying to design here.

I know how tricky it is building optimisations into a compiler, especially when you're trying very hard not to reduce the quality (reliability) of its code generation. But I think the time has come and gone when optimisation was an optional feature of compilers. Expectations have moved on.

To return to the issue in this thread, Tucker was presumably saying that the feature I suggested -- that access values referencing empty (zero size) objects should have distinct values for different objects -- would add a speed or memory penalty to all implementations, on the basis that access values would have two kinds of value, rather than just one (an address). I said, more than once, that any compiler could eliminate the overhead (associated with having two kinds of value) for any access type that it could (statically) deduce could never reference an empty object.
I hesitate to use the word 'trivial' (perhaps it's fair to say that no optimisation is trivial), but it doesn't seem overly complicated. And I'm fairly sure that the vast majority of access types would turn out to be readily optimisable in this way.

I don't want Ada to become a hopelessly overweight language, but it is already a heavyweight language, compared to the typical alternatives, and we should be proud of it. I think that shrinking from adding even small extra complexities to the language at this point would be falling between two stools.

****************************************************************

From: Robert A. Duff
Sent: Thursday, August 28, 2003 8:38 AM

Tucker wrote:

> In re-reading 4.5.2(12), in comparison with 4.5.2(13),
> it certainly seems "reasonable" to conclude there is
> no requirement that access-to-object values be *un*equal if
> they designate different objects.

This is a very "creative" reading indeed. ;-) Methinks you're trying to make it say what you *want* it to say.

Pascal points out (in response to Randy) that "if" often means "if and only if" in the RM. How one can tell in general whether a given "if" means just "if" is an interesting question, but in *this* case it clearly means "if and only if". I rely on Robert's rule that the RM never says anything silly: If "if" meant just "if" here, then an implementation that always returned True for "=" on *all* access types would be correct, which is patently ridiculous (and by Robert's rule, language designers never say ridiculous things. ;-))

I am very much opposed to an "=" operator that returns True in some cases, but in other cases, returns True or False at the whim of the implementer. Here's a typical singly-linked list walk:

   while L /= null loop
      ...
      L := L.all.Next;
   end loop;

Your interpretation of 4.5.2(12) means that L /= null can be False when L is not null, which is clearly nonsensical, and would break many programs.
Therefore, this particular "if" must mean "if and only if", and we don't need an AI to say so, because it's obvious.

You say, "in comparison with 4.5.2(13)", but I think you read too much into that comparison. Para 13 is written that way because of the special permission to return the wrong answer in some cases, for access-to-procedure. As I recall, this special permission was granted at the request of one particular vendor (a vendor that no longer supports Ada, in fact). It's poor language design, and I said so at the time, and you agreed, but we all gave in to the vendor. In any case, para 12 has no such permission, and is therefore written in plain English.

>...It would also be
> reasonable to read "if" as "if and only if" so we
> clearly have an ambiguity which deserves clarification by
> the ARG.

No. "If and only if" is *clearly* the intended meaning. So any relaxation of the requirements on "=" would be a language change -- an inadvisable language change, in my opinion.

> One way Randy and others have helped illuminate such
> issues in the past has been by writing up an example
> and seeing what various compilers do with it. If all
> compilers agree, then that is an argument for solidifying
> the interpretation that way. If the compilers don't
> agree, then the general approach is to disambiguate in
> favor of least implementation disruption, unless there
> is a compelling user requirement to the contrary.

That approach isn't much help when you're proposing that it be implementation dependent whether "=" returns True or False in certain cases. Whatever the results of the experiment, they support the contention that "it is implementation dependent whether blah blah blah..."

> Here, we have heard some user arguments in favor of
> requiring inequality if the designated objects are
> different, but it has not reached the level of
> "compelling" in most readers' minds, apparently.

It is compelling in my mind.

> So...
>
> Here is a simple example to illustrate the issue.
> Please give this a try and see what your favorite
> compiler produces. One of ours produces False,False,False.
> Another produces False, True, True.

And you find this kind of implementation variability acceptable? I don't. I think one of our implementations has a bug. Pascal's favorite compiler does not. Nick reports that GNAT does.

****************************************************************

From: Robert A. Duff
Sent: Thursday, August 28, 2003 8:38 AM

I disagree here with some of the points made by Tucker, Pascal, and Randy. I don't like the idea of changing the meaning of "=" on access values, and I don't buy the claim that the semantics is currently ill specified. And I see no distributed overhead.

Anyway, here's my answer to Adam Beneschan's original question:

> The Allocate routine in System.Storage_Pools declares the
> Size_In_Storage_Elements parameter to be of type
> System.Storage_Elements.Storage_Count, which is a subtype whose lower
> bound is zero.
>
> My question: When defining a new type derived from Root_Storage_Pool,
> and providing an Allocate routine for that type, is the Allocate
> routine expected to correctly handle the case when this parameter is
> zero?

No. First of all, a user-defined allocator can do anything it likes. For example, it is legitimate to say:

   procedure Allocate(...) is
      pragma Assert(Size_In_Storage_Elements = 12);

or:

   procedure Allocate(...);
   -- If Size_In_Storage_Elements is not 12, this will behave in an
   -- unpredictable manner, ... demons ... noses ...

In fact, the whole point of having this feature in the language is so that users can take advantage of special properties of the objects being allocated (for efficiency reasons). I have written a number of storage pool types of this nature.
They have documented restrictions on what the client can do with them (all objects must be the same size, all objects must be small, all objects must have a particular alignment, etc).

Two access values that designate distinct objects must be unequal. The most sensible way for an implementation to do this, in my opinion, is to check for zero size at the call site. If size = 0, pass 1 instead. This is more efficient than having Allocate deal with it, because in 99.99% of all cases, the size is known at compile time to be zero or nonzero. (OK, I admit it, I made up that 99.99% statistic. But I think it's not too far wrong.) Thus, the compiler can do the check at compile time, so there is no cost at run time.

Note that the size need not be known at compile time. For example, if a dynamic array is being allocated with run-time-known size, and if dope is stored with the array (the usual case), we know at compile time that more than zero bytes are being allocated, even though we don't know how many. This optimization is trivial, so I'm surprised to hear the various compiler writers whining about it.

I was going to say that we should change the RM to *require* this behavior. That is, require that 0 never be passed to Allocate, so that it must be dealt with at the call site (either by turning size=0 into size=1, or by more esoteric tricks). However, I now believe that this is *already* required, though the argument is somewhat subtle. Here's the contract for Allocate (AARM-13.11(21)):

   21  {erroneous execution (cause) [partial]} If Storage_Pool is
   specified for an access type, then if Allocate can satisfy the
   request, it should allocate a contiguous block of memory, and
   return the address of the first storage element in Storage_Address.
   The block should contain Size_In_Storage_Elements storage elements,
   and should be aligned according to Alignment. The allocated storage
   should not be used for any other purpose while the pool element
   remains in existence.
   If the request cannot be satisfied, then Allocate should propagate
   an exception [(such as Storage_Error)]. If Allocate behaves in any
   other manner, then the program execution is erroneous.

If Allocate obeys this contract, then the implementation must ensure that the semantics of access types is obeyed, including "=" on zero-sized things. The contract does *not* require that Allocate return unique addresses; it just requires that the block of storage not overlap anything else, which is trivially true for a zero-sized block. Therefore, the implementation must arrange for "=" to work on access-to-zero-sized things, and the simplest way to do that would be to never pass in 0 (pass 1 instead). I think we should change the RM to require that more clearly.

> Consider this:
>
>    procedure Try (Pool : in System.Storage_Pools.Root_Storage_Pool'Class;
>                   N : in Integer) is
>
>       type Arr is array (Natural range <>) of Character;
>       type Arr_P is access Arr;
>       for Arr_P'Storage_Pool use Pool;
>
>       P : Arr_P;
>       P2 : Arr_P;
>    begin
>       P := new Arr (1 .. N);
>       P2 := new Arr (1 .. N);
>    end Try;
>
> Suppose we call Try(0).
>
> We need to "allocate" a block of memory of size zero. Of course, P
> and P2 have to be set to something that is not null, and they can't be
> equal to each other.

Correct. However, I claim this is the responsibility of the implementation, not of the user-defined Allocate procedure.

> Which of the following is true? In order for things to be portable,
> one or the other of these has to be true (and perhaps ought to be
> stated in the RM or AARM):
>
> (1) Allocate routines are not expected to deal with
>     Size_In_Storage_Elements = 0. Compilers that generate (possibly
>     dispatching) calls to Allocate must be careful to make sure the
>     parameter is not zero. (They might, for instance, use a parameter
>     of 1 in a case such as the above, or they might invoke a separate
>     mechanism to deal with zero-length allocations.)

True.
> (2) Compilers are allowed to generate calls to Allocate with
>     Size_In_Storage_Elements = 0;

Technically true, but inadvisable, because then the implementation would have to stand on its ear and spit nickels in order to implement "=" correctly.

>... Allocate routines are expected to
> handle them appropriately.

False; that is, according to RM-13.11(21), Allocate of 0 size can return the same address as some other allocate (in the past, or in the future, zero-sized or not).

****************************************************************

From: Nick Roberts
Sent: Thursday, August 28, 2003 10:47 AM

> I disagree here with some of the points made by Tucker, Pascal,
> and Randy. I don't like the idea of changing the meaning of "=" on
> access values, and I don't buy the claim that the semantics is
> currently ill specified. And I see no distributed overhead.

I would add that it is still my contention that it is /important/ that "=" always returns False for access values that reference different objects, particularly for a designated type that is limited, but possibly also for other designated types. I agree with Bob's interpretation of the 'if' in 4.5.2(12), but I think adherence to this interpretation matters (rather than just being a bureaucratic nicety).

> Two access values that designate distinct objects must be
> unequal. The most sensible way for an implementation to do this,
> in my opinion, is to check for zero size at the call site. If size = 0,
> pass 1 instead.

I'm not too keen on this idea. I have a suspicion that it could possibly break certain algorithms in the absence of either explicit deallocation or garbage collection. I'd prefer a technique that allocated no (heap) memory at all.

What algorithms? I'm thinking of a generic unit that gets passed two integer formal parameters, let's say N1 and N2.
It may be a legitimate combination, in a limiting case, for say N1=0 and N2=huge, and it may be that the algorithm involves allocating an object of size N1*k in a loop that iterates N2 times. You may scoff, but I think it would be a shame for such an algorithm to fail because it runs out of heap space. It would also be a shame if the algorithm was too slow in this case because of time spent allocating or deallocating memory (that would be obviated by using a technique that did not allocate any heap at all).

I might invoke the Principle of Least Surprise. Is it not going to be surprising to find that the access values referencing two different objects compare equal? Is it not going to be surprising to find that allocating an object of zero size uses up some heap space?

> > (2) Compilers are allowed to generate calls to Allocate with
> >     Size_In_Storage_Elements = 0;
>
> Technically true, but inadvisable, because then the implementation
> would have to stand on its ear and spit nickels in order to
> implement "=" correctly.

Point taken, but I'm suggesting that this is a case where it's better for compilers to (have to) perform the aforementioned feats of aural specie progenesis than to (be allowed to) surprise the user. Although admittedly, a compiler standing on its ear and spitting nickels might be a /little/ bit surprising.

****************************************************************

From: Adam Beneschan
Sent: Thursday, August 28, 2003 11:25 AM

Bob Duff wrote:

> I was going to say that we should change the RM to *require* this
> behavior. That is, require that 0 never be passed to Allocate, so that
> it must be dealt with at the call site (either by turning size=0 into
> size=1, or by more esoteric tricks).
>
> However, I now believe that this is *already* required, though the
> argument is somewhat subtle. Here's the contract for Allocate
> (AARM-13.11(21)):

Tucker didn't see it that way.
And to me, having two heavyweights disagreeing on the correct interpretation of an RM paragraph is, in itself, compelling evidence that the RM needs to be "changed", or at least reworded to clarify the meaning.

...

> > (2) Compilers are allowed to generate calls to Allocate with
> >     Size_In_Storage_Elements = 0;
>
> Technically true, but inadvisable, because then the implementation would
> have to stand on its ear and spit nickels in order to implement "="
> correctly.
>
> >... Allocate routines are expected to
> > handle them appropriately.
>
> False; that is, according to RM-13.11(21), Allocate of 0 size can return
> the same address as some other allocate (in the past, or in the future,
> zero-sized or not).

I don't see how (2) can be partly true (even "technically true") and partly false. If the compiler is allowed to generate code passing zero to the allocate routine when a zero-size object is allocated, and Allocate is allowed to raise Storage_Error or do any fool thing it pleases when it gets a zero size parameter, then how can we expect things to work? You might as well say that program execution is erroneous if you call an allocator on a zero-sized object.

****************************************************************

From: Robert A. Duff
Sent: Thursday, August 28, 2003 11:44 AM

Tucker wrote:

> [I have always found C's concern about zero-length objects
> to be misplaced. It is often natural to have zero-length
> arrays, and even records, but in C you have to go out
> of your way sometimes to be sure you don't try to define
> a zero-length array, or else the compiler, if ANSI/ISO
> compliant, will slap your wrists.

I agree with all that, but it seems to argue in favor of making zero-sized things work properly -- the opposite of what you're arguing. A common mistake in language design is to fail to allow for the zero case.
For example, Ada's 'while' and 'for' loops are *better* than Fortran's DO loop, because the DO loop doesn't properly deal with zero-trip loops. (I don't know modern Fortran well; I'm talking about FORTRAN 66.) Another example: it is a flaw in Ada that you can't write a zero-length (or one-length!) positional array aggregate.

>... I have heard the argument
> relates to pointer equality, but again, what is the big
> deal if pointers to two logically distinct objects
> happen to be equal?]

Well, C is different, because everything is aliased.

But anyway, I think you and Pascal and Randy are falling into the trap of designing the language to make the compiler-writer happy. Please everybody take off their compiler-writer hats, and put on their language-designer hats, and pay attention to what users need. Users need a mathematically clean semantics. Not little hacks like "equality might not work on some implementations under certain circumstances".

Equality of access types is all about object identity. This is why X=Y is different from X.all=Y.all. It seems very important to me that X=Y returns true if AND ONLY IF the designated objects are the same object (or both null). It doesn't matter where the access values came from ("new" or "X'Access") -- if they designate distinct objects, then the access values should be unequal.

Nick Roberts wrote:

> It still seems clear to me that an access value is not just a representative
> of a storage space (as an address is), but rather it is also a token
> representing a (conceptual, as well as actual) object. Possibly this point
> is only properly relevant to access-to-limited-type types, but I feel that
> when you say "new My_Limited_Type", you don't just mean "allocate some space
> on the heap, please", you also mean "create a new token to represent a
> distinct object". That second meaning should hold true even if the object
> happens to have zero size.

Well said.
The argument seems to be that access-to-zero-size is rare, so who cares what "=" does in that case. If that were true, "=" should raise an exception, rather than returning a wrong answer. It would be worse to return a correct answer on some implementations, and a wrong answer on others.

I understand that zero-sized things are rare, and we don't want to pay distributed overhead for their existence, but I see no reason to believe that "=" is rarer on access-to-zero-size than it is on other access values. On the contrary: there's no data at the end of that pointer to fiddle with, so just about the only thing you want to do is "=".

In any case, proving that some feature is rare is not a proof that it's pathological. That's what causes nasty bugs: rare cases that do something unusual. We don't need proof that this feature is useful (although Nick Roberts made a pretty good argument to that effect). To change the rules, we need a proof that this feature is useless, of which I've seen none.

Back to Tuck:

> So I would probably recommend that 4.5.2(12) explicitly
> allow for the possibility that access values might be
> equal even if they designate logically distinct objects,
> so long as the objects have no updatable components.

Sorry, Tuck, but this seems even crazier than the zero-size idea, to me. I thought the problem was that you don't want to have the overhead of saying "if size=0 then allocate 1 byte instead". But without that check, access-to-zero-size can end up "=" to access-to-nonzero-size, which the above rule doesn't cover. And the above rule violates the principle that object identity is different from equality of the designated values. It seems crazy to allow an implementation to make a hash table and return equal access values for new objects whose designated values happen to be equal (a pessimization!).

Nick Roberts wrote:

> I really need someone to come galloping to my rescue at this point.

At your service. I see no reason for any distributed overhead.
I see two potentials for distributed overhead.

(1) "new" needs to somehow check for the zero-size case, and do something different. Elsewhere, I claimed that this is trivially optimized away in 99.99% of cases.

(2) All aliased objects need to be allocated non-zero space. I think it's silly to worry about this one, since zero-sized objects are rare, but if you want to worry about it, fairly simple compiler optimizations can eliminate this overhead. Nick outlined how. Tuck said he didn't understand, and Pascal said it involved overhead on every dereference, and then perhaps Nick confused the issue by talking about all manner of complicated optimizations, which are unnecessary, I think.

In fact, I think Nick's idea is just fine, and need not have any overhead. An address is some (say) 32-bit value. Clearly, there are many bit patterns that do not represent actual storage locations for objects of the type in question. A "new" of a zero-sized object just needs to return a unique one of those. Surely that's trivial. For example, "new (zero-sized-thing)" could return an address pointing into the code segment. A unique one.

Pascal and Nick were worried about extra instructions executed on every dereference. But I think that's unnecessary, and it falls out easily without fancy compiler work. Three cases:

 - The designated object is known at compile time to be zero size.
   Deref does nothing.

 - The designated object is known at compile time to *not* be zero
   size. This is the usual case, as I have argued elsewhere. Clearly,
   nothing special need be done in this case.

 - The size of the designated object is dynamic, and might be zero.
   Here, dereferencing involves either a no-op (in case of
   pass-by-ref), or a copy of N bytes or N words or whatever, where N
   is dynamic. Well, copying N bytes/words where N happens to be 0 is
   a no-op, and does not touch the designated memory. No special check
   for zero is needed -- just the usual loop, checking whether we're
   done copying N bytes.
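[Editor's sketch: the third case above can be simulated concretely. This is an illustration in Python rather than Ada; the dictionary-as-memory model and the unmapped address 999 are invented for the example.]

```python
# A dereference that copies a dynamically sized value: the ordinary
# copy loop already does nothing when the element count is zero, so a
# sentinel address that maps to no real storage is never actually read.

def deref_copy(memory, addr, n):
    """Copy n storage elements starting at addr (the 'usual loop')."""
    return [memory[addr + i] for i in range(n)]

memory = {0: 10, 1: 20}   # two storage elements at addresses 0 and 1

print(deref_copy(memory, 0, 2))    # normal dereference: [10, 20]
print(deref_copy(memory, 999, 0))  # zero size: address 999 is mapped
                                   # nowhere, yet nothing is touched: []
```

Note that no explicit "is it zero?" test appears anywhere; the loop bound alone guarantees the sentinel address is never dereferenced.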
So Nick's optimization is trivial, I claim. I also claim that even this trivial work is a waste of energy, given the rarity of zero-sized *aliased* objects.

So where's this horrible distributed overhead that makes everybody want to define "=" on access values to do something other than compare object identities?!

****************************************************************

From: Tucker Taft
Sent: Thursday, August 28, 2003 12:00 PM

Robert A Duff wrote:

> Tucker wrote:
>
> > In re-reading 4.5.2(12), in comparison with 4.5.2(13),
> > it certainly seems "reasonable" to conclude there is
> > no requirement that access-to-object values be *un*equal if
> > they designate different objects.
>
> This is a very "creative" reading indeed. ;-) Methinks you're trying to
> make it say what you *want* it to say.

No, I truly wasn't. It seems careful to say that *if* they designate the same object, then they are equal.

> I am very much opposed to an "=" operator that returns True in some
> cases, but in other cases, returns True or False at the whim of the
> implementer. Here's a typical singly-linked list walk:
>
>    while L /= null loop
>       ...
>       L := L.all.Next;
>    end loop;
>
> Your interpretation of 4.5.2(12) means that L /= null can be False when
> L is not null, which is clearly nonsensical, and would break many
> programs...

That's not actually what I said. I said 4.5.2(12) was ambiguous, and I wasn't certain how it should be interpreted. I never in my wildest dreams interpreted it as meaning that a null access value could ever be equal to an access value that designated an object, even a zero-size object.

> ...
> >...It would also be
> > reasonable to read "if" as "if and only if" so we
> > clearly have an ambiguity which deserves clarification by
> > the ARG.
>
> No. "If and only if" is *clearly* the intended meaning.
> So any relaxation of the requirements on "=" would be a language
> change -- an inadvisable language change, in my opinion.

Well here I don't agree.
The intended meaning is not completely clear (to me) in the context of neighboring paragraphs, and zero-sized objects and access-to-constant values are unusual enough that I suspect they were not carefully considered for the definition of access-value equality. (I mention access-to-constant values here because I could imagine an implementation that shares storage between two access-to-constant allocators that create constants with identical values. This kind of sharing is quite common for string literals in C, which are essentially like access-to-constant-Strings in Ada.) > > One way Randy and others have helped illuminate such > > issues in the past has been by writing up an example > > and seeing what various compilers do with it. If all > > compilers agree, then that is an argument for solidifying > > the interpretation that way. If the compilers don't > > agree, then the general approach is to disambiguate in > > favor of least implementation disruption, unless there > > is a compelling user requirement to the contrary. > > That approach isn't much help when you're proposing that it be > implementation dependent whether "=" returns True or False in certain > cases. Whatever the results of the experiment, they support the > contention that "it is implementation dependent whether blah blah > blah..." Alternatively, the results support the contention that code that has been ported to several compilers must not be critically dependent on the answer, and so it is overspecification to say what happens for zero-sized objects. > > > Here, we have heard some user arguments in favor of > > requiring inequality if the designated objects are > > different, but it has not reached the level of > > "compelling" in most readers' minds, apparently. > > It is compelling in my mind. What is "compelling"? 
That the "if" was intended to mean "if and only if" in this paragraph,
or that there is useful code that depends on access values designating
distinct zero-sized objects having unequal values?  I find the case of
aliased components that happen to be of zero size a pretty convincing
argument for not requiring such objects to have unique access values.

>
> > So...
> >
> > Here is a simple example to illustrate the issue.
> > Please give this a try and see what your favorite
> > compiler produces.  One of ours produces False,False,False.
> > Another produces False, True, True.
>
> And you find this kind of implementation variability acceptable?

Yes.  Comparing addresses of objects is fundamentally
implementation-dependent, especially zero-sized objects.  I find the C
rule that no object can be of zero size a big waste of energy, and
forcing the implementation to do it for aliased objects in Ada seems a
similar waste of energy.

You and I have generally had different comfort levels with
implementation variation.  I believe the language should not specify
things that users shouldn't care about to begin with, such as order of
evaluation of parameters, precision of intermediates, etc.  Java made an
effort to specify all of these things exactly, but that seems misguided
since Java is inherently a multi-threaded language, and relative rates
of progress of threads can't be specified.  Furthermore, their efforts
to specify the precision of floating point intermediates ultimately
ended up causing serious performance problems for X86 implementations,
and so they have pretty much dropped it.

I think I was particularly influenced by Dijkstra's book "A Discipline
of Programming" on the value of not overspecifying semantics that
shouldn't matter to the user.  In the language in that book, a multi-arm
"if" or "do"-loop allowed overlapping conditions, in which case it was
implementation-dependent which "arm" would be chosen.  He shows that
this actually simplifies proofs in some cases.
[An interesting counter example to our general preferences is that I tend to always use short-circuit logical operations, whereas you seem to prefer symmetrical, order-unspecified, "and"s and "or"s.] > I don't. I think one of our implementations has a bug. Pascal's > favorite compiler does not. Though Pascal seems to regret having to do the funny business with aliased components to make it work the way it does. > Nick reports that GNAT does. Given that we supply front ends to Aonix and Green Hills, this means that the "big 4" vendors, ACT, Aonix, Green Hills, and Rational all might have different behavior with respect to access values designating zero-sized objects. This can't be something that is too critical to a user who cares about portability across compilers, which after all, is what standards are all about. Standards should specify enough to provide useful portability, but without overspecifying things that are of little or no interest to users. **************************************************************** From: Robert A. Duff Sent: Thursday, August 28, 2003 1:05 PM Then you need to make an argument that it is naughty to depend on the result of "=" on access-to-zero-sized things. I have seen no such argument -- just complaining from compiler vendors about distributed overhead (which I argued in another message is not a real issue; I'd like to hear your response). Not an argument that it's *rare* to do such a thing. An argument that it is *bad practise* in the same sense as depending on order of eval and the like. In other words, please explain what is special about the zero-size case that makes object identity irrelevant or evil. **************************************************************** From: Tucker Taft Sent: Thursday, August 28, 2003 2:04 PM > Then you need to make an argument that it is naughty to depend on the > result of "=" on access-to-zero-sized things. Well "naughty" is hard to define. It seems pointless. 
A user could depend on all kinds of things.  Why should they expect that
zero-sized objects occupy non-zero space?  It seems especially
mysterious that an array of zero-sized objects is not itself of zero
size, if the components are declared "aliased."  And how "big" should
zero-sized aliased components be?  Can users rely on zero-sized aliased
components being exactly one addressable unit in size, or one word in
size, or one gigabyte in size?

> ... I have seen no such
> argument -- just complaining from compiler vendors about distributed
> overhead (which I argued in another message is not a real issue; I'd
> like to hear your response).

It seems that if you have two consecutive aliased components in a
record, or an array of aliased components, whose size is not known at
compile-time, then there will have to be code at run-time to calculate
the size, and if zero, bump it up to something non-zero.  Similarly, you
will need to worry about the gaps that doing this creates, so that
equality comparison will know not to use block-compare, or you will have
to initialize the gaps.  Every allocator for an array which might be of
zero size will need code to check for this special case.  This seems
like distributed overhead.  We can argue about whether it is
significant, but it seems clear that it will affect all array
allocations where the high bound is dynamic and might be less than the
low bound.

> Not an argument that it's *rare* to do such a thing.  An argument that
> it is *bad practise* in the same sense as depending on order of eval and
> the like.  In other words, please explain what is special about the
> zero-size case that makes object identity irrelevant or evil.

It's bad practice to depend on things which are not necessarily true
;-).  When it comes to zero-sized objects, it seems bad practice to
assume they actually take up heap space.
Or alternatively, it seems like bad practice to assume that logically distinct objects must have distinct access values, if it would be in all other ways OK for them to share the same address (e.g., because they are of zero size, or they are both acc-to-constant values and they designate constants with identical values). Your argument seems to rest on the fact that it is mathematically cleaner to be able to say that if two objects are distinct, then their access values are not equal. But that seems like a tautology. Suppose I claim it is mathematically cleaner to make no such assumption. Neither of us really have an argument here. Ultimately I think we have to recognize that access value equality is testing exactly whether two objects have the same address, no more, and no less. I am willing to agree that a given object has one and only one address, and so all access values designating a given object must be equal, but I don't see any mathematical requirement that two different objects not share the same address. It is sort of whether we choose the Fermion or Boson statistics ;-). **************************************************************** From: Randy Brukardt Sent: Thursday, August 28, 2003 3:01 PM Bob said: > I see no reason for any distributed overhead. I see two potentials for > distributed overhead. (1) "new" needs to somehow check for the > zero-size case, and do something different. Elsewhere, I claimed that > this is trivially optimized away in 99.99% of cases. This is false. It was true in Ada 83, but since the pool interface is exposed to the user in Ada 95, you cannot depend on the calls having the correct form. (Janus/Ada 83 not only prevented 0 as a size [which crashed the assembly code that implemented the heap manager], but also rounded sizes up to the next alignment boundary [usually 4]. 
But for Ada 95, we had to move that code into the pool, because a
user-written call could violate the assumptions, and we didn't want the
system to crash in that case.  The only way to do anything at the
compiler level would be to have a separate interface for the standard
heap, but that seems like overkill (why have multiple ways of allocating
memory?) and is certainly distributed overhead in the compiler (you'd
have to decide which method to use on every allocation and
deallocation).

> (2) All aliased
> objects need to be allocated non-zero space.  I think it's silly to
> worry about this one, since zero-sized objects are rare, but if you want
> to worry about it, fairly simple compiler optimizations can eliminate
> this overhead.

The trouble is that you don't know at compile-time that the objects have
zero size.  The usual problem is that you have an array subtype that is
dynamically-sized and might be zero.  You then need distributed overhead
to check whether it is zero and somehow allocate more than the normal
amount of memory.  But, as we all agree, this case will be rare, yet it
will occur on every dynamic component.

> ... Nick outlined how.  Tuck said he didn't understand, and
> Pascal said it involved overhead on every dereference, and then perhaps
> Nick confused the issue by talking about all manner of complicated
> optimizations, which are unnecessary, I think.
>...
> - The size of the designated object is dynamic, and might be zero.
>   Here, dereferencing involves either a no-op (in case of
>   pass-by-ref), or a copy of N bytes or N words or whatever,
>   where N is dynamic.  Well, copying N bytes/words where N happens
>   to be 0 is a no-op, and does not touch the designated memory.
>   No special check for zero is needed -- just the usual loop,
>   checking whether we're done copying N bytes.

This is also false.  Simply setting up the addressing for a block move
could cause an address fault on common processors.
I do agree that if you make up the move as a loop of very simple
instructions, it wouldn't have that property, but that alone is a
distributed (size) overhead -- such loops are quite a bit larger than
the single instruction that they are replacing.

I don't know that I care enough about this issue -- no matter what is
decided, I'm not going to change how Janus/Ada deals with these cases,
and I rather doubt that anyone else is, either.  (There is no user
complaint here, only hot air.)  I'm not going to force objects to be
larger or bloat program size just so Nick and Bob can have a
"mathematically pure" language.

****************************************************************

From: Nick Roberts
Sent: Thursday, August 28, 2003 4:49 PM

"Randy Brukardt" wrote:

> I don't know that I care enough about this issue -- no matter
> what is decided, I'm not going to change how Janus/Ada deals
> with these cases, and I rather doubt that anyone else is, either.
> (There is no user complaint here, only hot air.)  I'm not going to
> force objects to be larger or bloat program size just so Nick
> and Bob can have a "mathematically pure" language.

I'm going to try one more time to convince people that this issue is not
just hot air.  Consider the following example, which I hope doesn't seem
too far-fetched.

   type A_String is access String;

   function "+" (S: String) return A_String is
   begin
      return new String'(S);
   end;

   type A_String_Array is array (Positive range <>) of A_String;

   Zoo: constant A_String_Array := (+"aardvark", +"baboon", ... );

This much is taken directly from Barnes [Programming in Ada 95, 4th ed.,
p. 184].  Later on he sets an exercise for the reader to print out this
ragged array.
   procedure Print (A: in A_String_Array) is
      Sentinel: constant A_String := A(A'Last);
      Current: Positive := A'First;
   begin
      loop
         Put_Line( A(Current).all );
         exit when A(Current) = Sentinel;
         Current := Current + 1;
      end loop;
   end;

Of course, this is not the most obvious implementation, but it appears
correct (provided it is not passed an empty array), and differences of
detail could prompt this kind of implementation in a real case.  I
suggest it might puzzle, and perhaps dismay, many a programmer when:

   Print( (+"foo",+"bar",+"",+"") );

prints only three lines, or, worse, that:

   Print( (+"foo",+"",+"bar",+"") );

prints only two lines.  I appreciate that Janus/Ada would never commit
either of these sins, but I think we've established that some other
compilers probably would commit the first one, and possibly the second.

Out of interest, I wonder what Janus/Ada (and other compilers) would do
if the first declaration were changed to:

   type A_String is access constant String;

especially if the following declarations and call were made:

   X1: aliased constant String := "foo";
   X2: aliased constant String := "bar";
   X3: aliased constant String := "";
   X4: aliased constant String := "";

   Print( (X1'Access,X2'Access,X3'Access,X4'Access) );

As an aside, I suggest it might be even more puzzling (and dismaying)
for:

   Print( (+"foo",+"bar",+"hum",+"foo") );

to print just one line.

****************************************************************

From: Robert A. Duff
Sent: Thursday, August 28, 2003 12:23 PM

> Tucker didn't see it that way.  And to me, having two heavyweights
> disagreeing on the correct interpretation of an RM paragraph is, in
> itself, compelling evidence that the RM needs to be "changed", or at
> least reworded to clarify the meaning.

Well, maybe.  I'm hoping to convince Tuck (and Pascal and Randy) that
they were wrong on this point.
> > > (2) Compilers are allowed to generate calls to Allocate with
> > > Size_In_Storage_Elements = 0;
> >
> > Technically true, but inadvisable, because then the implementation would
> > have to stand on its ear and spit nickels in order to implement "="
> > correctly.
> >
> > >... Allocate routines are expected to
> > > handle them appropriately.
> >
> > False; that is, according to RM-13.11(21), Allocate of 0 size can return
> > the same address as some other allocate (in the past, or in the future,
> > zero-sized or not).
>
> I don't see how (2) can be partly true (even "technically true") and
> partly false.  If the compiler is allowed to generate code passing
> zero to the allocate routine when a zero-size object is allocated, and
> Allocate is allowed to raise Storage_Error or do any fool thing it
> pleases when it gets a zero size parameter, then how can we expect
> things to work?  You might as well say that program execution is
> erroneous if you call an allocator on a zero-sized object.

(2) can be partly true because that's what the RM says (clearly, I would
claim -- once I studied the relevant paragraphs for hours. ;-))  The RM
does not forbid the implementation from passing zero.  The RM does not
require Allocate to return unique addresses.  The RM does require
equality to work.  Therefore, the implementation must use some means of
making equality work (in the zero case) that does not rely on the
Allocate routine returning unique addresses.

That's all kind of silly; I think the RM should explicitly forbid the
implementation to pass 0 to Allocate.

The contract for Allocate says it can do one of three things, if the
size is 0:

1. Raise an exception.  In this case, the "new" propagates the same
   exception.

2. Return the address of a block of memory that does not overlap
   anything else.  Since a zero-sized block does not overlap anything
   else, any address will do, including the same address returned for
   some other Allocate call, whether zero or not.
In this case, the implementation is required to make equality work.

3. Anything else.  In this case, the program is erroneous if zero is
   passed in.  But that's OK -- there is presumably a comment telling
   clients "don't do that".

****************************************************************

From: Adam Beneschan
Sent: Thursday, August 28, 2003 9:44 PM

...
> Well, maybe.  I'm hoping to convince Tuck (and Pascal and Randy)
> that they were wrong on this point.

I don't see how that helps.  If Tuck did get it wrong, then how can we
expect a poor less-elevated user, who's writing his own storage pool
manager and trying to read the RM to figure out whether they need to
write their Allocate routine to handle Size_In_Storage_Elements values
of zero, to get it right?  I still maintain that whatever the result of
this discussion, the RM needs to address this issue.

...
> (2) can be partly true because that's what the RM says (clearly, I would
> claim -- once I studied the relevant paragraphs for hours. ;-))

Perhaps I should have made it clear that I was asking about what the RM
ought to say, rather than what it does say.  If it actually says
anything to the effect that (2) is partly true and partly false, it's a
bug.

****************************************************************

From: Robert Duff
Sent: Saturday, August 30, 2003 8:48 AM

I feel like this argument is going around in circles, and I find it
confusing.  I'm not sure, for example, exactly what Tuck, Pascal, and
Randy believe the semantics of "=" on access values ought to be.  So let
me start by summarizing *my* position:

1. I think it is essential that access "=" reflect object identity.
   That is, X = Y if and only if both are null or else X.all and Y.all
   are the same object.  I think object identity is clearly defined:
   each elaboration of an object declaration creates an object (and
   perhaps some component objects), each evaluation of "new" creates a
   new object, etc.

2. I think the RM already says that, quite clearly.

3.
I don't care if this requires distributed overhead, since this is the only possible sensible semantics for "=". 4. However, I don't *like* distributed overhead, and would like to eliminate it. The appropriate way to do that is through compiler optimizations, not by damaging the language definition by putting in error-prone rules. I claim that simple compiler optimizations can eliminate the distributed overhead in all but a vanishingly small number of cases. Nick outlined the basic scheme, but he made it sound a lot more complicated, and a lot less efficient, than it needs to be. Now, please, all of you summarize your positions. I think we all agree that "=" should return True if the two objects are the same object. (Do we at least agree on what "same object" means -- see above?) I think it should return False if they are not the same object. You three seem to think it should be allowed to return either True or False (impl-dep) in some circumstances; please precisely describe those circumstances. Your description cannot refer to "the size allocated", since that's ill defined. You can refer to 'Size, or 'Length of an array, or "null record", or "range 10..10", or other well defined concepts. ---------------- It seems to me that if "=" can return either True or False under certain circumstances, then the programmer must avoid calling "=" under those circumstances. Do you agree? If so, then "=" really ought to raise an exception under those circumstances, since calling it is a bug. If not, please produce an example of a program that calls "=" and doesn't care about the answer under those circumstances. ---------------- A point you seem to be missing is that to avoid *all* distributed overhead, you need to allow an access-to-zero-sized thing to be equal to an access-to-nonzero-sized thing (which happens to be allocated just before, or just after the access-to-zero-sized thing). This would mean that X=Y does not imply X.all=Y.all, a consequence I find unacceptable. 
Please clarify whether the "circumstances" include this case. Tuck wrote: > Robert A Duff wrote: > > > > Tuck wrote: > > > > > I believe the language should not specify things that > > > users shouldn't care about to begin with, such as order of evaluation > > > of parameters, precision of intermediates, etc. > > > > Then you need to make an argument that it is naughty to depend on the > > result of "=" on access-to-zero-sized things. > > Well "naughty" is hard to define. That's my point. Your whole argument is based on the premise that it is naughty to depend on the result of "=" (in some circumstances that I don't fully understand). If you can't explain why that's true (other than it being slightly less efficient), then your argument doesn't hold water. My point is that the burden of proof is on you. If you can't make a strong argument that object identity is an evil thing to depend on (under some circumstances), then you have no business making rules that will cause bugs if the programmer *does* ask about object identity (under those circumstances). You also need to counter Nick's examples. If those are totally bogus, you need to explain why, and also give some realistic example of zero-sized objects of any sort. I could claim that 1_298_484_999 is a very unusual number, and that therefore multiplying it by 2 should have impl-def semantics. But the burden of proof would be on me, to prove that 1_298_484_999 is totally useless, and in fact pathological. I'm not allowed to challenge folks to prove that some useful program actually uses that number. >...It seems pointless. Not sure what "it" refers to here. Defining "naughty"? Depending on the result of "="? If the latter, please explain why. >... A user > could depend on all kinds of things. Yes, of course. The RM is all about defining those things! >... Why should they expect > that zero-sized objects occupy non-zero space? They should not. Nor should they expect that zero-sized objects occupy zero space. 
As you well know, the RM does not define how much space various object will occupy. As you also well know, most aliased zero-sized arrays do *not* occupy zero space, because the subtype is usually unconstrained, and most compilers allocate dope with the object in that case. Consider access-to-Character. Some folks might be surprised that each character takes up 8 bytes, because the default allocator assumes max alignment. That seems like a reasonably efficiency trade-off, to me, since access-to-Character and the like is rare. Likewise, access-to-String will generally waste a little bit, by aligning to 8 bytes. (Talking about the heap here.) >...It seems especially > mysterious that an array of zero-sized objects is not itself > of zero size, if the components are declared "aliased." A user might also find it mysterious that a packed array of 32 Booleans takes more than 32 bits. That's another possibly-surprising effect of "aliased". Shrug. Being surprised at minor efficiency properties seems much less dangerous than being surprised by the "=". > And how "big" should zero-sized aliased components be? > Can users rely on zero-sized aliased components being exactly > one addressible unit in size, or one word in size, or one > gigabyte in size? No, of course not. The RM says nothing about how much space is occupied by objects. If X is of type Integer, X could occupy a gigabyte (according to the RM), whether aliased or not. If you don't like that (I don't!), then make your compiler allocate 32 bits for X. But that's not an issue for the RM. > > ... I have seen no such > > argument -- just complaining from compiler vendors about distributed > > overhead (which I argued in another message is not a real issue; I'd > > like to hear your response). 
> > It seems that if you have two consecutive aliased components in a record, > or an array of aliased components, whose size is not known at compile-time, > then there will have to be code at run-time to calculate the size, and if zero, > bump it up to something non-zero. Similarly, you will need to worry about the > gaps that doing this creates, so that equality comparison will > know not to use block-compare, or you will have to initialize the > gaps. Every allocator for an array which might be of zero size will > need code to check for this special case. This seems like distributed > overhead. Yes, there is some overhead in some cases. But those cases are pretty rare, I think. >...We can argue about whether it is significant, but it seems > clear that it will affect all array allocations where the high bound is dynamic > and might be less than the low bound. Nowhere near that many cases. The component subtype of the array has to be of dynamic size, which usually implies some dope or discriminants, so the size is known at compile time to be non-zero. Write up an example -- I think it's not nearly as common as you imply. > > Not an argument that it's *rare* to do such a thing. An argument that > > it is *bad practise* in the same sense as depending on order of eval and > > the like. In other words, please explain what is special about the > > zero-size case that makes object identity irrelevant or evil. > > It's bad practice to depend on things which are not necessarily true ;-). I see the smiley, and I hope it means "I realize this argument is totally bogus". ;-) This argument says it's OK for the language to define that 2+2 is either 4 or 5, depending on the whim of the compiler writer, and it is therefore naughty to depend on the result of 2+2. It's true that if there were such a language, it would be "bad practice" to add 2 and 2, but that's hardly a defense of that language! 
This argument says that any time a user finds a bug in a compiler, they should avoid that feature. No! I hope they report the bug, and I hope the compiler vendor fixes it. > When it comes to zero-sized objects, it seems bad practice to assume > they actually take up heap space. Agreed. >...Or alternatively, it seems like bad > practice to assume that logically distinct objects must have distinct > access values, if it would be in all other ways OK for them to share the same > address (e.g., because they are of zero size, or they are both acc-to-constant > values and they designate constants with identical values). I presume you also intend to include the case where the only data in the objects is discriminants or bounds dope. Anyway, I disagree. And I claim that you must offer some *reason* for this belief, before you damage the semantics of "=". It doesn't "seem like bad practice" to me, at all. It seems perfectly reasonable. > Your argument seems to rest on the fact that it is mathematically cleaner > to be able to say that if two objects are distinct, then their access > values are not equal. Yes. >...But that seems like a tautology. Tautologies are obviously true, so I'm not sure what you mean, here. >...Suppose I claim it is > mathematically cleaner to make no such assumption. Then I would be confused as to why we have "=" on access values in the language in the first place. Of what use is "=" if one can't make assumptions about it?! >...Neither of us really > have an argument here. I think I do, but more importantly, I claim the burden of proof is on you. >... Ultimately I think we have to recognize that access > value equality is testing exactly whether two objects have the same address, > no more, and no less. No, that is a completely wrong view of access types. They are *not* simply addresses of memory locations. And should not be. 
An implementation can of course take that point of view, but it needs to
make sure the high-level semantics, which does not mention addresses, is
obeyed.  Analogy: enumeration types are implemented as machine integers,
but they are *not* integers.  Multiplication of them makes no sense, for
example.

>... I am willing to agree that a given object has one
> and only one address, and so all access values designating a given object
> must be equal, but I don't see any mathematical requirement that two different
> objects not share the same address.

Nor do I, but I see a requirement that those two objects have different
*identity*, which is what access-"=" is all about.  This is not some
mathematical blathering -- I'm being very practical here, in insisting
that object identity is paramount.  I simply can't imagine how to write
anything but bugs, if "=" is impl def, presuming I use "=".  And I can't
imagine why I *shouldn't* use "=", since it seems to provide useful
information.

Clearly, two different unaliased objects can have the same address, if
one of them has zero size.  And an aliased object can have the same
address as an object of an unrelated type, or an address in the code
segment, because then "=" would not apply.  This is what Nick's idea is
all about.

>...It is sort of whether we choose the
> Fermion or Boson statistics ;-).

You seem to have mistaken me for a physicist. ;-) Sorry, but I haven't
the foggiest idea what "Fermion or Boson statistics" means.  Would you
care to post a lesson in physics?  Sounds interesting.

****************************************************************

From: Robert Duff
Sent: Saturday, August 30, 2003 8:50 AM

Adam says:

> Bob Duff wrote:
>
> > > Tucker didn't see it that way.  And to me, having two heavyweights
> > > disagreeing on the correct interpretation of an RM paragraph is, in
> > > itself, compelling evidence that the RM needs to be "changed", or at
> > > least reworded to clarify the meaning.
> >
> > Well, maybe.
> > I'm hoping to convince Tuck (and Pascal and Randy)
                                     ^^^^^^
> > that they were wrong on this point.

I should not have included Pascal there.  Pascal seems to agree with me
on what the language definition currently *says*.  He just doesn't like
it.  Tuck and Randy are the ones claiming the RM says what they like, or
at least is ambiguous and could sensibly be interpreted that way.

> I don't see how that helps.  If Tuck did get it wrong, then how can we
> expect a poor less-elevated user, who's writing his own storage pool
> manager and trying to read the RM to figure out whether they need to
> write their Allocate routine to handle Size_In_Storage_Elements values
> of zero, to get it right?  I still maintain that whatever the result of
> this discussion, the RM needs to address this issue.

Fine -- I have no objection to clarifying the RM.  I do object to Tuck's
claim that the language is currently unclear on the semantics of "=" and
therefore it's fair game to make up some semantics out of whole cloth.

> > (2) can be partly true because that's what the RM says (clearly, I would
> > claim -- once I studied the relevant paragraphs for hours. ;-))
>
> Perhaps I should have made it clear that I was asking about what the
> RM ought to say, rather than what it does say.  If it actually says
> anything to the effect that (2) is partly true and partly false, it's
> a bug.

I agree.  I think the RM *should* say that zero-size must not be passed
to Allocate.  That would make it crystal clear that user-defined
Allocate functions need not do anything special for zero size.  I claim
that the RM *currently* says that zero can be passed, but that the
implementation cannot then rely on unique addresses being returned,
which is confusing at best.

****************************************************************

From: Robert Duff
Sent: Saturday, August 30, 2003 9:10 AM

I feel like I'm using giant cannons to attack a mosquito, here.
After all, zero-sized objects are pretty rare, and we're talking about *aliased* zero-sized objects, which are even rarer. Sorry about that, I hope this doesn't offend, but I find this mosquito to be annoying, so here is another giant cannon attack. ;-)

Tuck wrote:

> Robert A Duff wrote:
> >
> > Tucker wrote:
> >
> > > In re-reading 4.5.2(12), in comparison with 4.5.2(13),
> > > it certainly seems "reasonable" to conclude there is
> > > no requirement that access-to-object values be *un*equal if
> > > they designate different objects.
> >
> > This is a very "creative" reading indeed. ;-) Methinks you're trying to
> > make it say what you *want* it to say.
>
> No, I truly wasn't. It seems careful to say that *if* they designate
> the same object, then they are equal.

...and makes no other restriction on the "only if" case.

> > I am very much opposed to an "=" operator that returns True in some
> > cases, but in other cases, returns True or False at the whim of the
> > implementer. Here's a typical singly-linked list walk:
> >
> >     while L /= null loop
> >        ...
> >        L := L.all.Next;
> >     end loop;
> >
> > Your interpretation of 4.5.2(12) means that L /= null can be False when
> > L is not null, which is clearly nonsensical, and would break many
> > programs...
>
> That's not actually what I said.

Sorry, I thought you were saying that "if" means "if but not necessarily only if" in the paragraph in question. The above example is a logical consequence of that interpretation.

> In re-reading 4.5.2(12), in comparison with 4.5.2(13),
> it certainly seems "reasonable" to conclude there is
> no requirement that access-to-object values be *un*equal if
> they designate different objects.

OK, I see you didn't mention null (though I don't know why -- the "if" we're arguing about applies to that case just as much).
So basically, you're saying that this paragraph could be interpreted to mean that if X and Y are both nonnull, and designate distinct objects, then the result of X=Y is implementation defined? That interpretation can't be right either, since it leads to equally ridiculous consequences. It directly implies that "=" can return True so long as both operands are nonnull.

>... I said 4.5.2(12) was ambiguous,
> and I wasn't certain how it should be interpreted. I never in my
> wildest dreams interpreted it as meaning that a null access value could
> ever be equal to an access value that designated an object, even
> a zero-size object.

But if you don't believe "if" means "if and only if" here, then that's exactly what it says, despite the lack of wild dreams.

> > >...It would also be
> > > reasonable to read "if" as "if and only if" so we
> > > clearly have an ambiguity which deserves clarification by
> > > the ARG.
> >
> > No. "If and only if" is *clearly* the intended meaning.
> > So any relaxation of the requirements on "=" would be a language
> > change -- an inadvisable language change, in my opinion.
>
> Well here I don't agree. The intended meaning is not completely
> clear (to me) in the context of neighboring paragraphs, and zero-sized
> objects and access-to-constant values are unusual enough
> that I suspect they were not carefully considered for the
> definition of access-value equality. (I mention access-to-constant
> values here because I could imagine an implementation that shares
> storage between two access-to-constant allocators that create
> constants with identical values. This kind of sharing is quite common
> for string literals in C, which are essentially like access-to-constant-Strings
> in Ada.)

Obviously, I can't contradict you if you claim that you don't understand that paragraph; I must take you at your word.
But I see no occurrences of "zero" or "constant" in the words, so I must assume you're basing your interpretation on something other than the words as written.

The words as written seem clear to me. The word "if" could mean "if and only if" or "if (but not necessarily only if)". The latter leads to ridiculous consequences, so it must mean "if and only if". If you don't like the implementation consequences, why don't you just say so, instead of claiming that "if" means "if but not if blah blah zero blah blah access-to-constant"?

The author of this paragraph (which was almost certainly either you, Tuck, or me, I don't remember) did not intend it to mean anything different for the zero-or-const case, or he would have said so. Any "interpretation" that mentions zero-sized objects and acc-to-const and the like is sheer invention, since there are no such words in there. I agree that the author, you or me, might not have been thinking specifically of the implementation-oriented cases you're worried about.

> > > One way Randy and others have helped illuminate such
> > > issues in the past has been by writing up an example
> > > and seeing what various compilers do with it. If all
> > > compilers agree, then that is an argument for solidifying
> > > the interpretation that way. If the compilers don't
> > > agree, then the general approach is to disambiguate in
> > > favor of least implementation disruption, unless there
> > > is a compelling user requirement to the contrary.
> >
> > That approach isn't much help when you're proposing that it be
> > implementation dependent whether "=" returns True or False in certain
> > cases. Whatever the results of the experiment, they support the
> > contention that "it is implementation dependent whether blah blah
> > blah..."
>
> Alternatively, the results support the contention that code that has
> been ported to several compilers must not be critically dependent
> on the answer, and so it is overspecification to say what happens
> for zero-sized objects.

This makes no sense to me. The fact that compilers behave differently is in no way evidence that compilers ought to be allowed to behave differently.

> > > Here, we have heard some user arguments in favor of
> > > requiring inequality if the designated objects are
> > > different, but it has not reached the level of
> > > "compelling" in most readers' minds, apparently.
> >
> > It is compelling in my mind.
>
> What is "compelling"? That the "if" was intended to mean "if and only if"
> in this paragraph, or that there is useful code that depends on access values
> designating distinct zero-sized objects having unequal values?

Sorry. I meant that "some user arguments in favor of requiring inequality if the designated objects are different" are compelling (to me). Namely, Nick's arguments, which you have yet to refute.

> I find the case of aliased components that happen to be of zero-size a
> pretty convincing argument for not requiring such objects to have unique
> access values.
>
> > > So...
> > >
> > > Here is a simple example to illustrate the issue.
> > > Please give this a try and see what your favorite
> > > compiler produces. One of ours produces False,False,False.
> > > Another produces False, True, True.
> >
> > And you find this kind of implementation variability acceptable?
>
> Yes. Comparing addresses of objects is fundamentally implementation-dependent,
> especially zero-sized objects.

I agree that comparing addresses is "fundamentally implementation-dependent". But we're not talking about comparing addresses. We're talking about comparing access values, i.e. object identities. You continue to confuse the two levels of abstraction. And you have yet to explain why zero-sized objects are special (in user terms, please).
>... I find the C rule that no object
> can be of zero size a big waste of energy, and forcing the implementation
> to do it for aliased objects in Ada seems a similar waste of energy.

I agree that it's a pain that C doesn't allow zero-length arrays. But I fail to see how this is relevant: Ada allows zero-length arrays. In C, everything is aliased. In Ada, if you don't want the space overhead, you can just avoid making the thing aliased.

> You and I have generally had different comfort levels with implementation
> variation. I believe the language should not specify things that
> users shouldn't care about to begin with, such as order of evaluation
> of parameters, precision of intermediates, etc.

True -- we have somewhat different philosophies with respect to nondeterminism. But I don't think that difference of opinion applies here. We both agree that a program should not rely on things like the order of evaluation of the operands of "+", and we both agree that it would be desirable to *prove* at compile time that the effect of the program does not depend on such order of eval (and we're working on a tool that can do just that!) We disagree on what to do about it, in cases where such a proof is infeasible. But that seems like a minor disagreement.

In *this* case, we are disagreeing on whether a reasonable program should rely on object-identity semantics for "=". That's more fundamental.

>...Java made an effort
> to specify all of these things exactly, but that seems misguided since
> Java is inherently a multi-threaded language, and relative rates of
> progress of threads can't be specified. Furthermore, their efforts to
> specify precision of floating point intermediates ultimately ended
> up causing serious performance problems for X86 implementations, and
> so they have pretty much dropped it.

Granted -- Java went overboard. But that doesn't mean the result of 2+2 should be nondeterministic. Nor does it mean that object identity should be nondeterministic.
> I think I was particularly influenced by Dijkstra's book "A Discipline
> of Programming" on the value of not overspecifying semantics that
> shouldn't matter to the user. In the language in that book, a multi-arm
> "if" or "do"-loop allowed overlapping conditions, in which case it
> was implementation-dependent which "arm" would be chosen. He shows
> that this actually simplifies proofs in some cases.

I think you are misapplying Dijkstra's ideas here. In such a case, as I recall, Dijkstra would *prove* that the different arms of the "if" don't matter to the end result. But you are *assuming* that the result of "=" doesn't matter in certain cases, without proof.

> [An interesting counter example to our general preferences is that
> I tend to always use short-circuit logical operations, whereas you
> seem to prefer symmetrical, order-unspecified, "and"s and "or"s.]

Interesting to a psychologist. Maybe we can get one to study us? ;-) I guess I view "Computer Science" as a branch of psychology, more than as a branch of mathematics.

For the record, I use "and then" only when the second half makes no sense when the first half is False, otherwise "and". I think it should be the compiler's job to figure out which order is more efficient, and whether one or the other can be avoided. (And, I realize it can't always, so I sometimes use "and then" when I know the compiler has no clue, and I grumble about it.) I'm waxing overly philosophical here...

> > I don't. I think one of our implementations has a bug. Pascal's
> > favorite compiler does not.
>
> Though Pascal seems to regret having to do the funny business with
> aliased components to make it work the way it does.

Right. Perhaps Pascal agrees with you on what the RM *should* say, but he agrees with me on what it *does* say.

> > Nick reports that GNAT does.
>
> Given that we supply front ends to Aonix and Green Hills, this means
> that the "big 4" vendors, ACT, Aonix, Green Hills, and Rational all
> might have different behavior with respect to access values designating
> zero-sized objects. This can't be something that is too critical
> to a user who cares about portability across compilers, which after
> all, is what standards are all about. Standards should specify enough
> to provide useful portability, but without overspecifying things that
> are of little or no interest to users.

Again, you're arguing that current variability in implementations is proof that it's good to have variability in implementations.

****************************************************************

From: Pascal Leroy
Sent: Sunday, August 31, 2003 3:56 AM

Bob wrote:

> I should not have included Pascal there. Pascal seems to agree with me
> on what the language definition currently *says*. He just doesn't like
> it.

Well, Pascal has actually changed his mind after reading the hundreds of messages that you wrote on this topic, Bob. I now agree that an access value ought to be an object identity, and that two distinct objects ought to be designated by two distinct access values (I also agree that the identity of objects, and the phrase "distinct objects", are actually well-defined by the RM). I have always been convinced that "if" meant "if and only if" in this wretched paragraph. As you point out, any other interpretation leads to nonsense, plague and pestilence.

What I am now saying is that this paragraph is right as it is and there is no reason to change it (although evidently a confirmation/ramification AI could be useful).

I find Nick's example particularly compelling, by the way. While John's programming style is a bit peculiar here, the use of sentinels is a very common programming practice, and I can imagine that, for slightly more elaborate data structures, there would be no other way to program such a piece of code.
And at any rate, peculiar or not, it is entirely legitimate for a programmer to assume that this style has to "work right". Because access types are at a higher level of abstraction than addresses, they should be free of the idiosyncrasies that may affect addresses depending on a variety of implementation-dependent choices.

Incidentally, I think that this interpretation is already implied by the RM as it is: 3.10(1) says that "A value of an access type provides indirect access to the object or subprogram it designates". Note the singular in the second part of the sentence: it doesn't say "to one of the 50 objects it might designate".

Implementation-wise, I am not sure that your model is as simple as you think, and anyway we are not going to change our model unless some paying customer makes a ruckus. We'll continue to pay the distributed price of one extra byte in some cases. Shrug. The thing that bothers me a bit more is that composites that contain potentially zero-sized aliased objects will need to be compared component-by-component (an alternative would be to initialize the extra byte, but we don't want to go that way). But I am willing to pay that price to have access types work right.

Incidentally, our compiler is not bug-free, and tests a bit more complex than the one that Tuck sent show situations where distinct objects end up at the same address (and therefore are designated by the same access value). This will have to be fixed.

****************************************************************

From: Alexander E. Kopilovich
Sent: Sunday, August 31, 2003 11:34 AM

Robert A Duff wrote:

> I agree that comparing addresses is "fundamentally implementation-dependent".
> But we're not talking about comparing addresses. We're talking about
> comparing access values, i.e. object identities. You continue to
> confuse the two levels of abstraction.
>
> And you have yet to explain why zero-sized objects are special (in user
> terms, please).
From the mathematical viewpoint there is a clear opportunity for special treatment of object identity in the case of zero-sized objects: while non-empty sets easily may be isomorphic but non-identical, two empty sets usually are regarded as identical, even if they are the results of entirely different construction processes. The same holds for other categories (topological spaces, algebras, etc.). So, a zero-sized object may be considered as an analog of the empty set (that is, a zero-sized object of a type corresponds to the "null" universal object in a category), and that is one possible explanation of the kind you requested.

****************************************************************

From: Pascal Leroy
Sent: Monday, September 1, 2003 2:59 AM

I fail to see the analogy. Zero-sized objects often arise from sets that have exactly one value, as in:

   type T1 is (Foo); -- The one value is Foo.
   subtype T2 is String (1..0); -- The one value is "".

so the comparison with empty sets sounds entirely irrelevant to me. (True, zero-sized objects also arise from empty sets, as in:

   type T3 is range 10..9;

but surely we don't want the semantics of the language to be different in these two cases.)

****************************************************************

From: Alexander E. Kopilovich
Sent: Monday, September 1, 2003 10:40 AM

> I fail to see the analogy. Zero-sized objects often arise from sets
> that have exactly one value, as in:
>
> type T1 is (Foo); -- The one value is Foo.

This is just because you represent the elements of a set by subsets of another set (a bit string representation is exactly the predicate function for a subset), and an empty set has exactly one subset (itself). Therefore you take an empty set (zero-length bit string) when you need only one subset.

> subtype T2 is String (1..0); -- The one value is "".

This is quite straightforward: a string is an array, that is, an ordered set of characters, and a null (empty) string is naturally an empty (ordered -:) set of characters.
****************************************************************

From: Robert A. Duff
Sent: Monday, September 1, 2003 9:24 AM

Well, maybe you could use such reasoning to say that an access to a zero-sized value is equal to all other accesses to zero-sized values (even if the types don't match?!), and no others. But what's been proposed is that "=" is *nondeterministic* if both access values point at zero-sized objects (and perhaps other cases). I don't know of any branch of mathematics where it is "implementation dependent" whether or not two empty sets are equal. ;-)

****************************************************************

From: Alexander E. Kopilovich
Sent: Tuesday, September 2, 2003 2:39 PM

> Well, maybe you could use such reasoning to say that an access to a
> zero-sized value is equal to all other accesses to zero-sized values (even
> if the types don't match?!),

No, the types must match within that logic, otherwise we are comparing objects from different categories, which is meaningless. Well, you may map both objects by the erasing functors to objects of an underlying category and compare the images, but it would be very bad practice to do that implicitly. So, at most you may permit explicit canonical conversion of a zero-sized object to a zero-sized object of another type.

> and no others. But what's been proposed is
> that "=" is *nondeterministic* if both access values point at zero-sized
> objects (and perhaps other cases).

I think that nondeterminism is caused (quite a common case) by the presence of several different logics (that is, more than one), each of them somehow justified. This usually means that there are several different prototype real-world situations, which are similar in external view, but require different approaches if you want to be adequate enough. In other words, the general notion used for these situations is not established enough (and perhaps cannot be established without paying too much complexity cost for that generality).
I guess that a pragma could select a logic for a particular application... and perhaps it would be good to prevent comparisons between accesses to constant objects (I mean the case when both objects in the comparison are constant) in the absence of that pragma.

> I don't know of any branch of
> mathematics where it is "implementation dependent" whether or not two
> empty sets are equal. ;-)

In mathematics it should be said "depends upon representation", I think -:) . Well, we are discussing not comparisons between objects - (empty) sets, but comparisons between *references* to them. So, there can easily be two different paths in a diagram to the same object - (empty) set, and all depends upon what exactly we want to compare, how we understand identity for our purposes.

****************************************************************

From: Robert A. Duff
Sent: Monday, September 1, 2003 9:58 AM

> Well, Pascal has actually changed his mind after reading the hundreds of
> messages that you wrote on this topic, Bob.

Sorry to beat you over the head with all that verbiage. ;-) Here's a little more...

I'd like to hear comment from you and others on my suggestion that we change the RM to say that the implementation must not pass 0 as the size to a user-defined Allocate. The implementation model I'm imagining is that it checks for zero at the call site of Allocate (which has a good chance of being optimized away), and either passes 1, or does something like what Nick suggested. This is responsive to the original question -- it clarifies whose responsibility it is to deal with the zero-size issue.

> Implementation-wise, I am not sure that your model is as simple as you
> think, and anyway we are not going to change our model unless some
> paying customer makes a ruckus.

I think it's pretty trivial in the allocator case, and can easily be optimized for most allocators. The aliased component case is a little less easily optimized.

>...
> We'll continue to pay the distributed
> price of one extra byte in some cases. Shrug. The thing that bothers
> me a bit more is that composites that contain potentially zero-sized
> aliased objects will need to be compared component-by-component (an
> alternative would be to initialize the extra byte, but we don't want to
> go that way).

If you have a record containing, say, a byte-sized thing followed by a word-sized thing, so that alignment considerations cause your compiler to insert 3 extra bytes in between, do you zero those 3 bytes? It seems to me that the simplest implementation would be to consider the extra bytes introduced by zero-sized things as "gaps" just like those introduced by alignment issues, and treat them the same way. I know of one compiler that allows components of a record extension to be allocated in that 3-byte gap.

> Incidentally our compiler is not bug-free, ...

I'm shocked!

----

By the way, another issue occurs to me: zero-origin arrays (or "virtual origin"). This means that an access-to-array is represented as the dope (bounds) plus the address of the zero'th component of the array. If that component does not exist, then it points to where A[0] *would* be. This perhaps makes array indexing more efficient because you don't have to subtract off the lower bound. (It causes all manner of trouble for a conservative garbage collector, I would imagine!) Consider:

   X: aliased String(1..10);
   Y: aliased String(11..99);

How do we know that X[0] and Y[0] are not at the same address? Can X = Y be True in this case? Are virtual-origin arrays an incorrect implementation of Ada, in general?

I was under the impression that GNAT used these techniques (at least, I've heard Robert singing their praises), but experiments fail to confirm that. Perhaps someone from ACT could clarify.
****************************************************************

From: Pascal Leroy
Sent: Tuesday, September 2, 2003 10:28 AM

Bob said:

> I'd like to hear comment from you and others on my suggestion
> that we change the RM to say that the implementation must not
> pass 0 as the size to a user-defined Allocate. The
> implementation model I'm imagining is that it checks for zero
> at the call site of Allocate (which has a good chance of
> being optimized away), and either passes 1, or does something
> like what Nick suggested.

I believe that's what we do, just so that we don't end up with two objects at the same address (and that's done in the generated code, not in the called Allocate routine).

> If you have a record containing, say, a byte-sized thing
> followed by a word-sized thing, so that alignment
> considerations cause your compiler to insert 3 extra bytes in
> between, do you zero those 3 bytes? It seems to me that the
> simplest implementation would be to consider the extra bytes
> introduced by zero-sized things as "gaps" just like those
> introduced by alignment issues, and treat them the same way.

Yes, I think the analogy works. If we have gaps, we do not zero them, we just go to component-by-component comparison. So we would do the same for the extra byte introduced by a zero-size component. (On the other hand, we try hard not to have gaps, so in the example you describe we would put the word-size component _before_ the byte-sized component.)

****************************************************************

From: Michael F. Yoder
Sent: Tuesday, September 2, 2003 1:03 PM

> From the mathematical viewpoint there is clear opportunity for special treatment
> of object identity in the case of zero-sized objects: while non-empty sets
> easily may be isomorphic but non-identical, two empty sets usually are regarded
> as identical, even if they are results of entirely different construction
> process. The same for other categories (topological spaces, algebras etc.).
> So, a zero-sized object may be considered as analog of empty set (that is,
> a zero-sized object of a type corresponds to the "null" universal object in
> a category) and that is one possible explanation of the kind you requested.

This argument only applies if access values to distinct objects are allowed to be the same when the objects contain identical values. Consider two constant objects with identical values; if this logic for zero-sized objects were followed, why not let these two objects also be at the same address? My counterclaim amounts to saying that zero size *isn't* special here.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, September 2, 2003 1:16 PM

In fact, Tucker suggested just that. And I think he said that C compilers typically do that for (static) string literals. It's what we used to call "literal pooling", except literal pooling is supposed to be invisible.

****************************************************************

From: Alexander E. Kopilovich
Sent: Tuesday, September 2, 2003 1:20 PM

> This argument only applies if access values to distinct objects are
> allowed to be the same when the objects contain identical values.

For constant objects only. And moreover, those objects must be constant at the implementation level. There must be true constants, without any tricks with views etc.

> Consider two constant objects with identical values; if this logic for
> zero-sized objects were followed, why not let these two objects also be
> at the same address?

I suspect that this actually happens sometimes (for example for string literals declared in separate compilation units) at the linker level.

> My counterclaim amounts to saying that zero size *isn't* special here.

Zero size is special at least in that it automatically implies a constant object, even if the object was not declared as constant.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, September 2, 2003 1:26 PM

> (On the other hand we try hard not to have gaps, so in the example you
> describe we would put the word-size component _before_ the byte-sized
> component.)

OK; makes sense. If you want, you can try even harder. For example:

   type Zero_Size is (Red);

   type T is record
      X, Y: Integer;
      A, B: aliased Zero_Size;
   end record;

You could allocate A at the same place as X, and B one byte past that (in the middle of X). This would guarantee A'Access and B'Access are unique, since X'Access is illegal (and even if it were legal, it would be of the wrong type to compare "=" with A'Access). And it would mean zero wasted space for A and B. You would need to make sure that writes on A and B don't really do anything.

I'm not saying this is a worthwhile optimization. I would go to extra trouble to do it if and only if a paying customer actually wanted it. I must admit that I can't think of any zero-cost way to allocate zero storage to all arrays of a zero-sized component subtype, in general, which is perhaps what Tuck wants.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, September 2, 2003 1:52 PM

> > I see no reason for any distributed overhead. I see two potentials for
> > distributed overhead. (1) "new" needs to somehow check for the
> > zero-size case, and do something different. Elsewhere, I claimed that
> > this is trivially optimized away in 99.99% of cases.
>
> This is false. It was true in Ada 83, but since the pool interface is
> exposed to the user in Ada 95, you cannot depend on the calls having the
> correct form.

You mean if the program calls Allocate directly, not via "new"? I don't see any issue there -- it does what it does, and there's nothing in the RM saying Allocate has to do anything in particular. The RM just says that *if* Allocate does so-and-so, then "new" does such-and-such.

>...
> (Janus/Ada 83 not only prevented 0 as a size [which crashed
> the assembly code that implemented the heap manager], but also rounded sizes
> up to the next alignment boundary [usually 4]. But for Ada 95, we had to
> move that code into the pool, because a user-written call could violate the
> assumptions, and we didn't want the system to crash in that case.

The system can crash in that case (according to the RM). So you're going above and beyond the call of duty, here. No problem, there, but you can't complain about the cost, if it's optional.

> > (2) All aliased
> > objects need to be allocated non-zero space. I think it's silly to
> > worry about this one, since zero-sized objects are rare, but if you want
> > to worry about it, fairly simple compiler optimizations can eliminate
> > this overhead.
>
> The trouble is that you don't know at compile-time that the objects have
> zero size.

Most compilers usually know at compile time that the thing does *not* have zero size (even when you don't know the size at compile time).

>...The usual problem is that you have an array subtype that is
> dynamically-sized and might be zero. You then need distributed overhead to
> check whether it is zero and somehow allocate more than the normal
> memory. But, as we all agree, this case will be rare, yet it will occur on
> every dynamic component.

I don't fully understand your run-time model, but I do admit there are some cases with some distributed overhead, in any run-time model.

> > ... Nick outlined how. Tuck said he didn't understand, and
> > Pascal said it involved overhead on every dereference, and then perhaps
> > Nick confused the issue by talking about all manner of complicated
> > optimizations, which are unnecessary, I think.
> >...
> > - The size of the designated object is dynamic, and might be zero.
> >   Here, dereferencing involves either a no-op (in case of
> >   pass-by-ref), or a copy of N bytes or N words or whatever,
> >   where N is dynamic.
> > Well, copying N bytes/words where N happens
> > to be 0 is a no-op, and does not touch the designated memory.
> > No special check for zero is needed -- just the usual loop,
> > checking whether we're done copying N bytes.
>
> This is also false. Simply setting up the addressing for a block move could
> cause an address fault on common processors.

I don't know which processors you're talking about. But surely there is *some* set of addresses on every processor that can be safely used in the manner I suggest (actually, Nick suggested it originally).

P.S. If this issue is decided in favor of my opinion, I think there really needs to be an ACATS test for it. Otherwise, it is a totally pointless and hollow victory. I would rather say "so-and-so is impl def" than "so-and-so is well defined in the RM, but the ACATS don't test it, and the vendors scoff at the idea of implementing it properly".

****************************************************************

From: Randy Brukardt
Sent: Tuesday, September 2, 2003 6:37 PM

Bob said:

> Now, please, all of you summarize your positions.

I think this whole discussion is a massive waste of time, especially as we're answering a question that no one really asked. I started to try to respond to Bob's many messages, since they are filled with misrepresentations and half-truths (and bad policy) -- to the extent that I cannot believe that they were written by an Ada implementor -- but I realize that I simply cannot afford the time for such a pointless discussion. We have lots of important issues to discuss, AIs to write, etc. But I'll give a brief summary which I hope will at least make clear why any tightening of the rules here is actively harmful.

To summarize Bob's position in very brief terms:

1) "=" of access objects means equality of objects, exactly, with no variation allowed. (That is exactly the case if you insert the missing "only if".) The only exception would be in an erroneous program.
2) We need a rule to make it illegal for a compiler to pass 0 to a storage pool's Allocate routine. Let's look at the second rule first. It is very ugly. We cannot change the specification of Allocate (otherwise, all existing pools would become illegal - a gratuitous change for an extremely minor issue). Thus, we'd have to have some goofy rule like: "An implementation shall not pass zero to Allocate." This would be a permanent wart on the language; essentially it is an admission that the spec is wrong. In addition, it would require doubling the needed checks for a pool; since direct calls to a pool can be written, a well-written pool still would have to make the check for zero internal to itself. For the 25-50% of allocations where the size is not statically known (array allocations and allocations of generic formal objects are the most likely), this would require checks both before the call and then again in the pool. Since no optimization could ever remove the checks inside the pool, this is just adding overhead with almost no benefit. --- Moving to the first interpretation. This means that we (the ARG) have to decide what "object" is pointed to in any case where the access value is created by a means other than an allocator or 'Access. It would particularly cause trouble for Address_to_Access_Conversions. Consider:

   type Null_Record is null record;
   package AAC is new System.Address_to_Access_Conversions (Null_Record);
   subtype Acc is AAC.Object_Pointer;
   type A_Rec is record
      Comp1, Comp2 : aliased Null_Record;
   end record;
   Obj : A_Rec;
   Comp1_Addr : System.Address := Obj.Comp1'Address;
   C1 : Acc := Obj.Comp1'Access;
   C2 : Acc := Obj.Comp2'Access;
   AC1 : Acc := AAC.To_Pointer(Comp1_Addr);

   if C1 = AC1 then -- ??
   if C2 = AC1 then -- ??

If you follow Bob's rule to the letter, then C1 must equal AC1 and C2 must not equal AC1. In that case the Nick/Bob implementation using phony values becomes incorrect. (And everything said about it and its overhead is irrelevant.)
That's because it could have no way to determine whether C1 or C2 is meant, so no amount of overhead could get the correct answer. [Thus, you're insisting on non-null allocation - which means that A_Rec'Size is a lie. If we had 'Object_Size, we could solve this usefully, but since we cannot have that, we are screwed.] If you say that both of these are always unequal, then you are forcing compilers to use an implementation like Nick's. That clearly is harmful. If you try to say that this result is somehow not well-defined, you'll have to come up with a reason for it. Certainly, the definition of "=" as espoused by Bob offers no such leeway. You could say that all uses of AAC are erroneous - but that would simply prevent the use of the feature (and put us back to the bad old days of Unchecked_Conversion). And it certainly doesn't help anyone for this to be well-defined some of the time. --- Bob also asked: >By the way, another issue occurs to me: zero-origin arrays (or "virtual >origin"). This means that an access-to-array is represented as the dope >(bounds) plus the address of the zero'th component of the array. If >that component does not exist, then it points to where A[0] *would* be. >This perhaps makes array indexing more efficient because you don't have >to subtract off the lower bound. (It causes all manner of trouble for a >conservative garbage collector, I would imagine!) I don't see a problem here. For access-to-array, you compare the addresses of the descriptors, not of the data. The descriptors are unique to the objects (the data may not be, in the case of slices or of constrained-by-discriminant arrays). Then you can use any representation for the pointer to data that you like. The only time that you use the data itself is for an array without a descriptor. In that case, it (if non-null) has to be unique. Indeed, we considered an implementation like that, but abandoned it because you don't always get a legal address that way.
Since we insist on no-wraparound math, even on addresses (because the front-end does not know the details of addressing on the target; wrap-around math doesn't work on an 8086, for instance, where you'll just go to another segment), we would have had to use several possible mechanisms, which conflicted with our desire to keep the code size small. So we always point at A[A'First]. **************************************************************** From: Nick Roberts Sent: Tuesday, September 2, 2003 9:40 PM "Randy Brukardt" wrote: > I think this whole discussion is a massive waste of time, especially as > we're answering a question that no one really asked. But it is a question that arose nevertheless, and I don't think it is a waste of time. To be frank, what I think is a waste of time is the intransigence of you, Randy, and Tucker, which is apparently based on a reluctance to make the effort to improve the Ada standard, rather than genuinely on technical arguments. This is the impression I am getting; I hope it is wrong. > I started to try to respond to Bob's many messages, since they are > filled with misrepresentations and half-truths (and bad policy) I think that you, Randy, should enumerate exactly all the instances of misrepresentations and half-truths you accuse Bob of. If not, then I think you are honour bound to retract your accusations. This mailing list is a public forum. > but I realize that I simply cannot afford the time on such a pointless > discussion. We have lots of important issues to discuss, AIs to write, > etc. I would find it acceptable for an officer of the ARG to say that he feels discussion of a certain issue has to be deprioritised, in favour of other more important issues. If that is what you mean, Randy, then I wish you would say it directly, instead of apparently inventing a lot of specious technical arguments in a hopeless effort to dismiss the issue altogether.
> We cannot change the specification of Allocate (otherwise, all > existing pools would become illegal - a gratuitous change for an extremely > minor issue). Thus, we'd have to have some goofy rule like: "An > implementation shall not pass zero to Allocate." I disagree. The following declaration could be harmlessly added to System.Storage_Elements:

   subtype Positive_Storage_Count is
      Storage_Count range 1..Storage_Count'Last;

and the declaration of Allocate in System.Storage_Pools could be changed to:

   procedure Allocate(
      ...
      Size_In_Storage_Elements :
         in Storage_Elements.Positive_Storage_Count;
      ... ) is abstract;

This new declaration could be used for all existing implementations of storage pools, with the subtype of the Size_In_Storage_Elements parameter being the only thing that needed to be changed. All existing calls to Allocate which did not pass 0 in this parameter would work correctly, and all calls which did pass 0 would be caught (perhaps quite efficiently). > This would be a permanent wart on the language; essentially > it is an admission that the spec is wrong. I think it is a very ugly suggestion that mistakes in the language (standard) should be covered up rather than openly corrected. Is that how a standards review process is supposed to work? > In addition, it would require doubling the needed checks for a pool; > since direct calls to a pool can be written, a well-written pool still > would have to make the check for zero internal to itself. For the > 25-50% of allocations where the size is not statically known (array > allocations and allocations of generic formal objects are the most > likely), this would require checks both before the call and then > again in the pool. I think my suggestion (changing the subtype of the Size_In_Storage_Elements parameter) would generally avoid the two checks you suggest here. > Moving to the first interpretation.
This means that we (the ARG) > have to decide what "object" is pointed to in any case where the > access value is created by a means other than an allocator or > 'Access. It would particularly cause trouble for > Address_to_Access_Conversions. It would only cause trouble for Address_to_Access_Conversions. It seems highly unconvincing to me to base a refutation of (the standard prescribing) the strict behaviour of a major element of the language (access types) on a problem with a very minor part of the language (Address_to_Access_Conversions). In the case of an access value being generated by the To_Pointer function of an instantiation of System.Address_to_Access_Conversions, I would be perfectly happy for the definition of object identity to be based on address. That is, in the case of such a conversion, equal addresses are considered to reference the same object (and the resulting access values must be equal), and unequal addresses are considered to reference different objects (and the resulting access values must be unequal). In the other, much more common, cases of the generation of access values, I maintain that it is important that the equality of access values always matches the differentiation of objects. > If you try to say that this result is somehow not well-defined, > you'll have to come up with a reason for it. The reason is as follows. Nearly always, the reason why Address_to_Access_Conversions needs to be used is for interfacing with C (or C++) code. As has been noted, C (and C++) has parity between address equality and object identity, because it doesn't allow null objects. Thus we can use the same characteristic for Address_to_Access_Conversions. I doubt there will ever be any case where the semantics I suggest would fail. Neither you, Randy, nor Tucker has even attempted to produce a refutation of the examples I gave, illustrating, to my mind, why strict access equality is important.
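[Editor's sketch: the address-based semantics Nick proposes can be illustrated with the same zero-sized-component example Randy gave earlier in the thread. The unit name AAC_Sketch is hypothetical; the behavior noted in the comments is what the proposed rule would require, not what the RM currently guarantees.]

```ada
with System;
with System.Address_to_Access_Conversions;

procedure AAC_Sketch is
   type Null_Record is null record;
   package AAC is new System.Address_to_Access_Conversions (Null_Record);
   use type AAC.Object_Pointer;

   type A_Rec is record
      Comp1, Comp2 : aliased Null_Record;
   end record;

   Obj : A_Rec;
   --  Two access values produced by To_Pointer from component addresses.
   P1 : constant AAC.Object_Pointer := AAC.To_Pointer (Obj.Comp1'Address);
   P2 : constant AAC.Object_Pointer := AAC.To_Pointer (Obj.Comp2'Address);
begin
   --  Under the proposed rule, P1 = P2 exactly when
   --  Obj.Comp1'Address = Obj.Comp2'Address -- which an implementation
   --  may well make true, since both components are zero-sized.
   if P1 = P2 then
      null;  --  both components happen to share an address
   end if;
end AAC_Sketch;
```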
I still believe that the lack of this strictness could cause a real Ada program to fail. Although I admit I have failed to produce an actual example of this, that is hardly proof that it could not happen. I think my examples do show that the possibility of this situation cannot be lightly dismissed. I suggest that, in cases where there is a conflict poised between strict correctness and implementation expediency, the decision should be in favour of strict correctness. Isn't it better for a program to work slowly than to fail quickly? In fact, I'm saying that I think it is better (although regrettable) for thousands of programs to work slightly slower, than for one deployed program to fail; Sod's law dictates that the program would be flying an aeroplane, or running somebody's heart monitor. Finally, I hope that my comments on this issue will be taken in the good humour they are intended. **************************************************************** From: Randy Brukardt Sent: Wednesday, September 3, 2003 12:31 AM Nick said: > > I started to try to respond to Bob's many messages, since they are > > filled with misrepresentations and half-truths (and bad policy) > > I think that you, Randy, should enumerate exactly all the instances of > misrepresentations and half-truths you accuse Bob of. If not, then I think > you are honour bound to retract your accusations. This mailing list is a > public forum. It would take at least 4 hours of my time to properly respond to Bob's messages (and the messages of others) and probably twice that to respond to responses. (It took 2.5 hours to write the message I did send, and another 45 minutes on this one.) I simply don't have the time. I covered the most important problems in the message I sent, but there are a number more. Some of it is simply that we disagree on the basic ideas (about what is statically determinable, for instance), and there is no real hope of a resolution. 
> > but I realize that I simply cannot afford the time on such a pointless > > discussion. We have lots of important issues to discuss, AIs to write, > > etc. > > I would find it acceptable for an officer of the ARG to say that he feels > discussion of a certain issue has to be deprioritised, in favour of other > more important issues. If that is what you mean, Randy, then I wish you > would say it directly, instead of apparently inventing a lot of specious > technical arguments in a hopeless effort to dismiss the issue altogether. Well, nothing I've said is "specious". If you think so, fine, you're welcome to your opinion. > > We cannot change the specification of Allocate (otherwise, all > > existing pools would become illegal - a gratuitous change for an extremely > > minor issue). Thus, we'd have to have some goofy rule like: "An > > implementation shall not pass zero to Allocate." > > I disagree. The following declaration could be harmlessly added to > System.Storage_Elements: > > subtype Positive_Storage_Count is > Storage_Count range 1..Storage_Count'Last; > > and the declaration of Allocate in System.Storage_Pools could be > changed to: > > procedure Allocate( > ... > Size_In_Storage_Elements : > in Storage_Elements.Positive_Storage_Count; > ... ) is abstract; Changing the specification of Allocate would be incompatible with all existing storage pool code. (The specifications would fail the subtype conformance check.) Worse, any code changed to use the new definition would be incompatible with Ada 95 compilers (for the same reason). That's not going to happen, certainly not for a minor issue such as this. If it wasn't for compatibility, I would be in favor of the change you suggest - precisely because it doesn't have the problems that Bob's solution would. But compatibility is not a negotiable item. > > If you try to say that this result is somehow not well-defined, > > you'll have to come up with a reason for it > > The reason is as follows.
Nearly always, the reason why > Address_to_Access_Conversions needs to be used is for interfacing with C (or > C++) code. As has been noted, C (and C++) has parity between address > equality and object identity, because it doesn't allow null objects. Thus we > can use the same characteristic for Address_to_Access_Conversions. I doubt > there will ever be any case where the semantics I suggest would fail. This doesn't make any sense from a standards perspective. Please try to explain it in terms of the standard. And, in any case, only lousy C interfacing uses AAC. Ada 95 provides decent ways to avoid using addresses in interface code. I would think it is more useful in accessing hardware. > Neither you, Randy, nor Tucker has even attempted to produce a refutation of > the examples I gave, illustrating, to my mind, why strict access equality is > important. There is no point in doing so. My argument is that the implementation overhead of the strict rule is excessive. I've provided numerous examples and reasons for that. Clearly, some people seem to think that they would rather pay that cost. Fine. Get your implementer to do it. In any case, I was swayed by your example to the point that the rule I proposed only gave the permission when the designated type had size 0. I agree we don't want it happening for access-to-unconstrained. But I think that types with a designated size 0 are going to be vanishingly rare (and really represent a bug). I'd prefer to outlaw them altogether, but that would bring up the compatibility issue again. (Not a real one in this case.) ... > In fact, I'm saying that I think it is better (although regrettable) for > thousands of programs to work slightly slower, than for one deployed program > to fail; Sod's law dictates that the program would be flying an aeroplane, > or running somebody's heart monitor. That's understandable, but also potentially a violation of "performance compatibility".
We've spent a lot of effort on more important issues ensuring that Ada 200Y changes do not impact the performance of existing code. But note that this principle is not absolute. In any case, I've come around to the position that no change here is best. I don't for a moment think that any real user will ever care what the resolution of this discussion is -- which is the classic definition of a waste of time. So this is the last I'll say on this topic until and unless it comes up on the ARG agenda. (Other than I'll give a reply to Bob if he wants one.) **************************************************************** From: Tucker Taft Sent: Thursday, September 4, 2003 11:59 AM Robert A Duff wrote: > I feel like this argument is going around in circles, and I find it > confusing. I'm not sure, for example, exactly what Tuck, Pascal, and > Randy believe the semantics of "=" on access values ought to be. So let > me start by summarizing *my* position: > > 1. I think it is essential that access "=" reflect object identity. > That is, X = Y if and only if both are null or else X.all and > Y.all are the same object. I think object identity is clearly > defined: each elaboration of an object declaration creates an > object (and perhaps some component objects), each evaluation of > "new" creates a new object, etc. > > 2. I think the RM already says that, quite clearly. I don't happen to agree with this. I believe that access values are primarily a way to reference objects indirectly. I don't feel that there is a requirement that they represent the "identity" of an object, especially if the designated object has 'Size of 0. I accept that others could interpret 4.5.2(12) that way, but I think there are enough underlying problems with a strict interpretation of "access-value as identity" that we have to take an "if and only if" interpretation of 4.5.2(12) more as an indication of what is true in the "typical" case, with some thorny unmentioned details to get it exactly right.
I also don't see any requirement in 4.8 or 3.3.1 that when I create an object, I allocate it its own unique address, nor its own unique access value, nor even its own unique memory cells. Clearly an access value must identify its designated object. I see no requirement that it doesn't happen to equal some other access value that designates some other object created at some other time. For technical arguments, the existence of address-to-access-value conversion seems to make it clear that access values can be created (by conversion from an address) that work just fine, but which happen to equal another access value, if the object whose 'address was taken happens to be of zero size, or is a slice of some larger object. In addition, it is clear that it is possible that the result of an allocator can equal an access value that designates an object to which Unchecked_Deallocation has been applied. There is no restriction in 13.11.2 on using "=" with an access value that designates a freed object. The only restriction is that you cannot dereference it. As far as what I would recommend, if we make any attempt to "firm up" the definition in this area, I would replace 4.5.2(12) with: Two access-to-object values are equal if they designate the same object, or if both are the result of evaluating the literal NULL. Two access-to-object values (of the same type) are unequal if one is the result of evaluating the literal NULL, and the other is the result of evaluating an allocator, 'Access, or 'Unchecked_Access, or if both are the result of evaluating an allocator, 'Access, or 'Unchecked_Access, and the designated objects are not the same object, provided each designated object is an existing variable object of Size greater than zero. For other cases it is unspecified whether two access values are equal or unequal. To put it another way, an implementation should make no special effort to make "access-value = identity." 
This paragraph sets a limit on what a user may assume, with the intent of imposing no restriction on a "natural, efficient" implementation of access values. This means that access-to-constants may be equal even if the designated constants are created by distinct allocators, and access values produced by address-to-access-value conversion do not have to obey any special rules. It means that if a user algorithm depends on access values being *unequal,* then they must be access-to-variable with designated object's Size greater than zero. This is easy enough to arrange, and so the user should easily be able to accommodate it if this use of access value equality is important. From a "mathematical" point of view, if two aliased objects are the same, then they have the same access value, but the contrapositive is not necessarily true; two distinct objects may have the same access value. I would claim that to correctly handle access values produced by address-to-access-conversion and access values that designate freed objects, some additional wording is required in 4.5.2(12) no matter what rule we adopt. Also, 4.5.2(12) uses "equality" in the definition of equality ("... if both are *equal* to the null access value"). But I also agree with Randy that the best thing would probably be to drop this whole effort, and rule ACATS tests involving equality of access values with zero-sized designated objects, or with potentially sharable access-to-constants, as being "pathological" and not worth an iota of implementor effort. >>...It is sort of whether we choose the >>Fermion or Boson statistics ;-). > > > You seem to have mistaken me for a physicist. ;-) Sorry, but I haven't > the foggiest idea what "Fermion or Boson statistics" means. Would you > care to post a lesson in physics? Sounds interesting. In particle physics, all subatomic particles are either fermions (spin 1/2, 3/2, ...) or bosons (spin 0, 1, 2, ...). The Pauli exclusion principle says that two Fermions (e.g. 
electrons) cannot have exactly the same state (they obey "Fermi" statistics), whereas two Bosons (e.g. photons) can (they obey "Bose-Einstein" statistics). Hence, if we think of designated objects as Bosons, then they can have the same access value, whereas if they are Fermions, then they must have distinct access values. ;-) **************************************************************** From: Nick Roberts Sent: Tuesday, September 9, 2003 5:34 PM "Tucker Taft" wrote: > As far as what I would recommend, if we make any > attempt to "firm up" the definition in this area, I > would replace 4.5.2(12) with: > > Two access-to-object values are equal if they > designate the same object, or if both are the result > of evaluating the literal NULL. Two access-to-object > values (of the same type) are unequal if one is the result of > evaluating the literal NULL, and the other is the result of evaluating > an allocator, 'Access, or 'Unchecked_Access, or if both > are the result of evaluating an allocator, 'Access, or > 'Unchecked_Access, and the designated objects are not the > same object, provided each designated object is an existing variable > object of Size greater than zero. For other cases it is unspecified > whether two access values are equal or unequal. Yes, please. **************************************************************** From: Pascal Leroy Sent: Wednesday, September 10, 2003 2:30 AM I am surprised to see that Nick is satisfied with this wording, but I will adamantly oppose the last part, starting at "provided...". **************************************************************** From: Nick Roberts Sent: Wednesday, September 10, 2003 9:03 AM I am still opposed to allowing equality of access values that refer to different objects, but I would rather see this much more explicit form of wording in the RM200X than what already exists. At least it would serve as some warning (of a potential nasty surprise).
I am not opposed to non-existing objects (which have been de-allocated explicitly using Unchecked_Deallocation) being excused. I am happy that the semantics of 'dangling' access values (esp. how they compare for equality) remain implementation defined. [I suspect that Pascal did not intend otherwise.] I ran a little quiz on comp.lang.ada to get people's reaction to the idea that Print( (...,...,...,...) ); [see my examples in previous post] might print something other than four lines. Of course there were only a few replies, but the obvious conclusion was that the newcomers to Ada were surprised but the old hands were not. Incidentally, purely out of interest, I had a look in the ARM 83, and it [at 4.5.2(4)] contains almost exactly the same wording as RM95 4.5.2 (12). It would seem the RM95 inherited the wording. **************************************************************** From: Pascal Leroy Sent: Wednesday, September 10, 2003 9:49 AM Right, but this is already wildly erroneous (RM95 13.11.2(16)) so of course we don't mandate "good behavior" in this case. **************************************************************** From: Tucker Taft Sent: Wednesday, September 10, 2003 12:26 PM It is not erroneous (and definitely not "wildly" erroneous ;-) to use an access value that designates a freed object in an equality operator. 13.11.2(16) only disallows using names that denote freed objects, which would require a dereference of the access value (implicit or explicit ".all"). You also have to decide what to do about access values produced by address-to-access-value conversion. What are the rules for equality involving one of them? So I believe to be precise, this paragraph needs to at least provide exemptions for these two cases. I don't see sufficient benefit in zero-'size objects to justify any implementation expense in worrying about them w.r.t. equality, so *if* we want to make this paragraph more precise, I am arguing for exempting zero-'size objects as well. 
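[Editor's sketch: the case Tucker describes -- comparing, but not dereferencing, an access value that designates a freed object -- looks like this. 13.11.2(16) makes a dereference of the dangling value erroneous, but the comparison itself involves no dereference. The unit name is hypothetical.]

```ada
with Ada.Unchecked_Deallocation;

procedure Dangling_Compare is
   type Int_Ptr is access Integer;
   procedure Free is new Ada.Unchecked_Deallocation (Integer, Int_Ptr);

   P : Int_Ptr := new Integer'(42);
   Q : constant Int_Ptr := P;  --  a second value designating the same object
begin
   Free (P);      --  frees the object and sets P to null; Q now dangles
   if Q = P then  --  legal: "=" evaluates the access values themselves,
      null;       --  not Q.all, so 13.11.2(16) does not apply
   end if;
   --  Q.all here WOULD be erroneous: it names the nonexistent object.
end Dangling_Compare;
```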
**************************************************************** From: Robert A. Duff Sent: Sunday, October 19, 2003 3:48 PM Sorry to reraise this issue, which some folks may be tired of talking about. But a few weeks ago Randy told me he was filing an AI on the subject, and he asked me to address a couple of issues I had failed to address in my previous diatribes. Since it's now an AI, I'm including the arg mailing list -- I *hope* that's the right thing to do. [For those who haven't seen this long thread, somebody asked whether a user-defined Allocate routine needs to ensure that every call returns a distinct Address, even if the size to be allocated is zero. Tucker then suggested that X = Y should be (or is) allowed to return either True or False in some cases. Several different suggestions for *which* cases were proposed. I claimed that equality on access types is defined in terms of object identity, not addresses, and that it would damage the language to change that fact.] First of all, Tucker *seemed* to be claiming that 4.5.2(12) *could* be interpreted to mean that zero-sized objects are a special case. It says: Two access-to-object values are equal if they designate the same object, or if both are equal to the null value of the access type. Since that paragraph does not contain the word "zero" or "size" or anything else related to this issue, it seems to be sheer fantasy to claim that it could be interpreted as saying anything special about zero-sized objects. Surely, if the author had intended to have a kludgy special case for zero-sized objects, there would be some words to that effect in the paragraph. (Note that in the very *next* paragraph, there *is* a kludgy special case for implementation reasons, and as expected, this is stated explicitly. In fact it's stated twice (4.5.2(13), and 3.10.2(39)).) I strongly object to the idea that this paragraph is totally ambiguous, and that therefore the ARG is free to fiddle with it arbitrarily.
If Tucker is not making that claim, he should clarify what he *is* claiming. Otherwise, he should buy me a new pair of eyeglasses, because I can't see the word "zero" that he discerns in para 12. ;-) Seriously, this paragraph seems to be one of the clearest in the RM. It means: Two access-to-object values are equal if AND ONLY IF they designate the same object, or if both are equal to the null value of the access type. That is, IF AND ONLY IF the objects have the same *identity* (in the nonnull case). We briefly discussed that "if" might mean "if but not necessarily only if", but I believe I refuted that idea, using Robert's rule. So "if" here must mean "if and only if" (as stated in Chapter 1). It cannot possibly be "interpreted" to mean "if, but not blah blah zero-sized blah blah discriminants...". I can understand that Tucker (and Randy, and an earlier not-yet-debugged version of Pascal ;-)) don't *like* what it says. But to claim that it could be interpreted to say something other than what it obviously says seems like an unfair debating tactic -- an attempt to pretend that a proposed language change isn't really a change. If Tucker or somebody wants to claim that this paragraph could be interpreted in some way, they have to point to some actual words in the RM -- it's not good enough to say, "Well, *I* don't understand that paragraph -- it seems utterly ambiguous to me." Perhaps you meant "Perhaps the original author didn't think about the zero-sized case when writing this paragraph." Now *that* I could believe (although I know *I* thought about it -- I've known for 20 years that Ada requires a special zero check in some allocators!). 
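[Editor's sketch: the "special zero check" Bob alludes to, shown in a minimal bump-pointer pool. All names are hypothetical; alignment handling is omitted. Rounding a zero-sized request up to one storage element is what keeps successive allocations at distinct addresses -- exactly what Janus/Ada 83 did inside its heap manager, per Randy's earlier message.]

```ada
with System;                  use System;
with System.Storage_Elements; use System.Storage_Elements;
with System.Storage_Pools;

package Bump_Pools is

   type Bump_Pool (Capacity : Storage_Count) is new
     System.Storage_Pools.Root_Storage_Pool with record
      Data : Storage_Array (1 .. Capacity);
      Next : Storage_Count := 1;  --  index of the next free element
   end record;

   procedure Allocate
     (Pool                     : in out Bump_Pool;
      Storage_Address          : out System.Address;
      Size_In_Storage_Elements : in Storage_Count;
      Alignment                : in Storage_Count);

   procedure Deallocate
     (Pool                     : in out Bump_Pool;
      Storage_Address          : in System.Address;
      Size_In_Storage_Elements : in Storage_Count;
      Alignment                : in Storage_Count);

   function Storage_Size (Pool : Bump_Pool) return Storage_Count;

end Bump_Pools;

package body Bump_Pools is

   procedure Allocate
     (Pool                     : in out Bump_Pool;
      Storage_Address          : out System.Address;
      Size_In_Storage_Elements : in Storage_Count;
      Alignment                : in Storage_Count)
   is
      --  The zero check: round a zero-sized request up to one storage
      --  element so that successive allocations get distinct addresses.
      Size : constant Storage_Count :=
        Storage_Count'Max (Size_In_Storage_Elements, 1);
   begin
      if Pool.Next + Size - 1 > Pool.Capacity then
         raise Storage_Error;
      end if;
      Storage_Address := Pool.Data (Pool.Next)'Address;
      Pool.Next := Pool.Next + Size;
   end Allocate;

   procedure Deallocate
     (Pool                     : in out Bump_Pool;
      Storage_Address          : in System.Address;
      Size_In_Storage_Elements : in Storage_Count;
      Alignment                : in Storage_Count) is
   begin
      null;  --  a bump pool never releases individual allocations
   end Deallocate;

   function Storage_Size (Pool : Bump_Pool) return Storage_Count is
   begin
      return Pool.Capacity;
   end Storage_Size;

end Bump_Pools;
```

Without the 'Max, allocating two zero-sized objects would return the same address twice, which is the collision the thread is arguing about; Bob's position is that the check belongs at the call site of "new" instead, so user-written pools need not make it.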
---- Now Tucker *did* point out one hole: If one or both access values being compared designate an object that no longer exists (because of Unchecked_Deallocation, or leaving the scope of an aliased stack object), then that paragraph, if it means anything at all, means "=" returns False (because they don't designate the same object -- it's been freed, so they don't designate *any* object). That's clearly not what we want; we want to allow True in this case. I agree that this is a hole. However, I object to pretending that the hole is wider than it is. Patching this hole in no way requires adding new special cases about zero-sized objects. The latter would be a language change, not a patch to an existing hole. Tucker also pointed out that this case is *not* declared erroneous, nor is it declared "wildly erroneous". ;-) This is true. However, any program that compares pointers to deallocated objects necessarily has a bug, because it is impossible to get any useful information from such a comparison. Usually, in Ada, we like compilers to catch bugs (either at compile time or at run time). However in this and a few other cases, we do not wish to require that, because it would greatly harm efficiency. It would seem useful if an implementation chose to detect such bugs. Most implementations can't reasonably do that, but I see no reason to *forbid* such detection. It would be feasible to detect such bugs in an implementation that has garbage-collection support, for example. Therefore, I suggest that the best fix would be to say that "=" when one or both access values designates a nonexistent (deallocated) object is a bounded error: it might return either True or False, or it might raise Program_Error. In any case, the wording belongs in Chap 13, not Chap 4. ---- The other point Randy asked me to address is Address_To_Access_Conversions. If there is a hole in the semantics of Address_To_Access_Conversions, then it should be patched in Chap 13.
Using some weird issue related to Address_To_Access_Conversions to drive the high-level semantics of access equality is a case of the tail wagging the dog. I don't think there *is* a hole in Address_To_Access_Conversions; please correct me if I'm wrong. Address_To_Access_Conversions does whatever it does, and that depends on the run-time model chosen by the implementation. For example, if a garbage collector is moving things around in memory, Address_To_Access_Conversions might have some pretty weird behavior -- that's the programmer's problem. Contrast this with access equality, which must return the right answer even if the GC is moving things around. People who implement GC's know how to accomplish this (even in an incremental GC). If there *is* a hole in Address_To_Access_Conversions, please explain what it is. And we can fix it in Chap 13. Whether or not there is such a hole, there is certainly no need for 4.5.2(12) to mention Address_To_Access_Conversions. Chapter-13-ish features (Address_To_Access_Conversions, Unchecked_Conversion, Address clauses, etc) can violate *any* run-time rule in the whole manual. The entire RM is written with that assumption. For example, we say that discriminants can't change. Of course, they *can* change if you use Chapter-13-ish features, but we don't pollute the section on discriminants by chatting about that. ---- Now I finally get to the *real* issue: what *ought* the RM say about "=" on access-to-zero-sized objects? Tucker proposes (as I understand it), that "=" should be allowed to return either True or False if *both* of the access values designate distinct zero-sized objects. I think Tucker has failed to address some points I made earlier. The claim is that the above permission eliminates some distributed overhead. For example, an allocator might maintain a pointer to the next-to-be-allocated location, and increment that by the size of the object, which might be zero. 
Thus two successive Allocates might return the same Address. But this implementation is wrong, even with the above permission, because it would cause "=" to return True when just *one* of the objects is zero-sized; the next-allocated one might not be zero-sized.

Do you propose to allow either True or False when just *one* of the two objects is zero-sized? That would be even worse than breaking object identity! It would break the principle that "X = Y" implies "X.all = Y.all" (if non-null). I strongly object to a language in which X.all = "Hello, world", and Y.all = "", yet X = Y! (Worse, "...yet X *might* equal Y.")

----

Another point Tucker failed to address is that if you claim that equality of zero-sized objects is somehow evil programming practice, the burden of proof is on you. You've given no reason why a programmer *shouldn't* use "=" in this case.

Suppose I say: if X contains the integer number 1_259_792_008, and Y contains the number 7_992_829, then X = Y can return either True or False. After all, this is an extremely rare thing to do; I've never run across these numbers in all my life of programming. It's an unimportant case -- I'll bet if a compiler had this bug, nobody would notice.

Well, that's not good enough. I can't just say it's rare. I can't just say it's an unimportant case. I can't challenge someone to produce a useful program containing comparison of those numbers. I have to give some logical argument that this is poor programming practice; that there's something *special* about those particular numbers that should logically make them incomparable.

Tucker has said he doesn't think the zero-sized case is important. But to change the language in this way, I claim he has to show what's special about that case that makes it fundamentally wrong to use "=". (In fact, Nick showed just the opposite in his example.)

----

And, oh by the way, we should address the original question.
I think the appropriate change would be to forbid implementations from passing zero as the size to Allocate. This would make it clear that the zero check needs to happen at the call site of Allocate (where it can often be optimized away), and that programmers can portably assume that user-defined Allocate need not check for zero. [I think the RM already says this, but the argument is subtle.]

Randy made a point related to this, which I have so far failed to address: he said something about direct calls to Allocate (not via "new"). Well, I think that's a red herring. The RM doesn't say anything about what Allocate must do. If you call it directly, it does whatever it does. The RM just says that *if* Allocate obeys a certain contract, and *if* it is called via "new", then certain things happen. There is nothing in the RM that forbids doing the zero check as part of "new" rather than as part of Allocate, and in fact that implementation choice is preferable, which is why I advocate requiring it.

----

These things should be enforced by ACATS!

****************************************************************

From: Robert A. Duff
Sent: Sunday, October 19, 2003  4:46 PM

I wrote:

> But a few weeks ago Randy told me he was filing an AI on the
> subject, and he asked me to address a couple of issues I had failed to
> address in my previous diatribes.

How's that for passing the blame? I refuse to take proper responsibility for my own actions, and blame Randy. Sorry, Randy. ;-)

Randy did say that he might be "sticking a branch into a beehive". ;-)

****************************************************************

From: Robert Dewar
Sent: Sunday, October 19, 2003  7:05 PM

[For those who haven't seen this long thread, somebody asked whether a user-defined Allocate routine needs to ensure that every call returns a distinct Address, even if the size to be allocated is zero. Tucker then suggested that X = Y should be (or is) allowed to return either True or False in some cases.
Several different suggestions for *which* cases were proposed. I claimed that equality on access types is defined in terms of object identity, not addresses, and that it would damage the language to change that fact.]

I strongly agree with this; trying to kludge up equality for some implementor convenience here for zero-sized objects is an unjustified change in the language.

****************************************************************

From: Robert I. Eachus
Sent: Sunday, October 19, 2003  8:50 PM

I also agree that there is nothing currently broken here, and trying to find wiggle room via appeals to cases where Unchecked_Deallocation has been called is wrong. If the access value was set to null by Unchecked_Deallocation, that case is clearly covered by the current wording. If a user maintains a copy of an access value X that is not cleared by Unchecked_Deallocation, of course tests for equality of X and Y may return True or False when Y is a different access value.

In fact, this reminds me of a discussion I just had on c.l.a. The actual subtype of the Object parameter in instantiations of Unchecked_Deallocation is really useless. (Except that it forbids Unchecked_Deallocation for access-to-constant types.) Should the compiler be allowed to ignore calls to an instance of Unchecked_Deallocation if the subtype I used in the instantiation was of zero length? Of course not! The fact that the subtype in the allocation or deallocation was of zero length should have no effect on the language rules.

The same applies here. If I want to use access types as laundry tickets and ignore the data in the designated type, the rules for access types should still prevent me from accidentally giving out two identical laundry tickets. (Yes, I know, if I deallocate something and there are still live access values around, I am headed for trouble. But I would be anyway if I didn't make the laundry tickets limited so users couldn't duplicate them.
;-)

****************************************************************

From: Randy Brukardt
Sent: Monday, October 19, 2003 12:11 PM

> Sorry to reraise this issue, which some folks may be tired of talking
> about. But a few weeks ago Randy told me he was filing an AI on the
> subject, and he asked me to address a couple of issues I had failed to
> address in my previous diatribes. Since it's now an AI, I'm including
> the arg mailing list -- I *hope* that's the right thing to do.

No, you should never cross-post between ARG and Ada-Comment. There are a number of reasons for this:

-- You're making a private e-mail list's address public by so doing. That could lead to people trying to post to ARG that aren't members;

-- Most of us are on both lists. By cross-posting, we get the messages twice. Those of us who aren't on both lists have decided to stay out of public technical discussions, and thus have decided to avoid the discussion until it comes up on the ARG agenda.

-- By bringing another group into the middle of the discussion, we end up having to rehash all of the arguments again (which is especially annoying given the fact that Bob seems to want to win the argument by sheer volume of messages. :-)

As happened the last time, I can't answer this whole message. But a quick scan showed a couple of disagreements:

> Tucker also pointed out that this case is *not* declared erroneous, nor
> is it declared "wildly erroneous". ;-) This is true. However, any
> program that compares pointers to deallocated objects necessarily has a
> bug, because it is impossible to get any useful information from such a
> comparison.

That's not true. It is useful to compare such objects against null, and indeed that ought to work properly. We use that when streaming in the symbol table in the Janus/Ada compiler -- if a pointer is not null, we read another object from the stream. If it is null, there is no object to read in.
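[Editor's note: a hedged sketch of the idiom Randy describes. The type Symbol_Access and the helper Read_Symbol_From are invented for illustration; the point is that the access value read back was written by a previous run, so it designates nothing in this run, and the only meaningful operation on it is the comparison against null.]

```ada
--  Assumes: type Symbol_Access is access Symbol;  (hypothetical)
procedure Read_Symbol_Link
  (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
   Result : out Symbol_Access)
is
   Old_Link : Symbol_Access;
begin
   --  Stream back the access value written by the previous run.
   --  It is effectively dangling here; only the null test is useful.
   Symbol_Access'Read (Stream, Old_Link);
   if Old_Link /= null then
      Result := Read_Symbol_From (Stream);  --  hypothetical helper
   else
      Result := null;   --  no object follows in the stream
   end if;
end Read_Symbol_Link;
```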
This seems to be a generally useful technique, and we don't want to lose it.

> Usually, in Ada, we like compilers to catch bugs (either at
> compile time or at run time). However in this and a few other cases, we
> do not wish to require that, because it would greatly harm efficiency.
> It would seem useful if an implementation chose to detect such bugs.
> Most implementations can't reasonably do that, but I see no reason to
> *forbid* such detection. It would be feasible to detect such bugs in an
> implementation that has garbage-collection support, for example.
> Therefore, I suggest that the best fix would be to say that "=" when one
> or both access values designates a nonexistent (deallocated) object is a
> bounded error: it might return either True or False, or it might raise
> Program_Error.
>
> In any case, the wording belongs in Chap 13, not Chap 4.

Thus, I totally object to this wording. I'm not against making this a bounded error in general, but the compare against null case should not be an error (or undefined, for that matter).

> ----
>
> The other point Randy asked me to address is
> Address_To_Access_Conversions. If there is a hole in the semantics of
> Address_To_Access_Conversions, then it should be patched in Chap 13.
> Using some weird issue related to Address_To_Access_Conversions to
> drive the high-level semantics of access equality is a case of the tail
> wagging the dog.

There is no hole in Address_to_Access_Conversions (AAC). The problem is your insistence on an object-based model for equality compares, with no wiggle room. If you have such a comparison, you then have to say what object the result of an AAC is. Otherwise, you have no basis to decide what the answer from such a comparison is. And, as I showed earlier, any answer to this question essentially requires that all objects (not just aliased ones) have to have a unique id.
The only way to do that is to disallow zero-sized objects -- and that has a runtime cost, especially for array slices.

...

> Randy made a point related to this, which I have so far failed to
> address: he said something about direct calls to Allocate (not via
> "new"). Well, I think that's a red herring. The RM doesn't say
> anything about what Allocate must do. If you call it directly, it does
> whatever it does. The RM just says that *if* Allocate obeys a certain
> contract, and *if* it is called via "new", then certain things happen.
> There is nothing in the RM that forbids doing the zero check as part of
> "new" rather than as part of Allocate, and in fact that implementation
> choice is preferable, which is why I advocate requiring it.

I very much dislike having the specification of a routine fail to indicate its contract. Moreover, any good Ada programmer will include a check for any invariants in a subprogram's implementation; the only checks that they would ever omit would be ones implied by the subprogram's specification. You in fact said that you did this yourself.

So the net effect is to require a double check, both of which are often dynamic and thus will require code. Allocations are common, so requiring the check at the point of the call will add a substantial amount of code to the system. (Our experience is that around 50% of the allocations in the program text are dynamically sized -- there are a lot of allocations of generic formal parameters and of dynamically sized arrays in the programs that we look at.)

****************************************************************