!standard 3.10 (09) 03-09-12 AC95-00075/01 !class amendment 03-09-12 !status received no action 03-09-12 !status received 03-08-22 !subject Special dope vector for pointers to unconstrained arrays !summary !appendix !topic special dope vector for pointers to unconstrained arrays !from Matthew Heaney 08-22-03 !discussion I have a map container, with type String as the key. The map is implemented as a tree of nodes, with each node comprising a key and the element associated with that key. The node looks something like this: type Node_Type (Key_Size : Natural) is limited record Key : String (1 .. Key_Size); Element : aliased Element_Type; ... end record; A map has an iterator, which is just a thin wrapper around an access object that designates a node. The iterator allows one to obtain the key and its element in a way that hides representation, like this: procedure Op (I : Iterator_Type) is K : String := Key (I); E : Element_Type := Element (I); begin One thing I can do for elements is to convert from an iterator to an element access, because the Element component of the node is declared as aliased. This allows me to modify the element, or to simply avoid making a copy: procedure Op (I : Iterator_Type) is E : Element_Type renames To_Element_Access (I).all; begin I would like to be able to do something similar for keys. Something like: procedure Op (I : Iterator_Type) is K : String renames To_String_Access (I).all; begin However, if I have an access type like this: type String_Access is access all String; then declaring the Key component of Node as aliased still won't help, because of static matching rules. Basically, the Key string object has no dope vector. I was wondering if any thought has been given to having a special access type that designates an unconstrained array subtype, in which the array objects have fixed O'First, and an O'Last that is specified by some other means, for example the discriminant of an enclosing record. For example, in the Node_Type above, the Key'First of the Key component always has the value 1, and the Key'Last is determined from the Key_Size discriminant. In a sense we're using the discriminant as the dope vector. The syntax would look something like: type String_Access is access all String (1 .. <>); type Node_Type (Key_Size : Natural) is limited record Key : aliased String (1 .. Key_Size); ... end record; for Node_Type.Key'Dope_Vector use Node_Type.Key_Size; for String_Access'Dope_Vector use Node_Type.Key'Dope_Vector; or something like that. Alternatively, the language could provide a way to force dope to be generated for the Key component, but this wouldn't be very space efficient, because the index range is specified already, via other mechanisms besides an explicit dope vector. The problem is being able to declare an access subtype that understands these other mechanisms for constraining the array objects. **************************************************************** From: Adam Beneschan Sent: Monday, August 25, 2003 7:59 PM > I was wondering if any thought has been given to having a special > access type that designates an unconstrained array subtype, in which > the array objects have fixed O'First, and an O'Last that is > specified by some other means, for example the discriminant of an > enclosing record. I'm not exactly sure what you're trying to do, since I haven't seen the definitions of To_Element_Access or To_String_Access. I assume To_Element_Access(I) is a function that returns I.Element'Access or something equivalent, and that you'd like to do the same for To_String_Access? If I understand everything correctly (and I make no guarantees whatsoever about that), it appears to me that you don't really need a special access type, but that a special attribute might be good enough. Suppose we defined an attribute, 'Unconstrained_Access, which works exactly like 'Access except the prefix must be an array, and instead of 3.10.2(27), the rule for 'Unconstrained_Access would be The type of the view shall be the same as that of the A's designated type, and A's designated subtype shall be unconstrained. Thus 'Unconstrained_Access would, in essence, create a dope vector using the bounds of whatever prefix you give it. Would this solve the problem? (Yes, if we added this, we'd probably need 'Unchecked_Unconstrained_Access too.) **************************************************************** From: Matthew Heaney Sent: Tuesday, August 26, 2003 9:45 AM > I'm not exactly sure what you're trying to do, since I haven't seen > the definitions of To_Element_Access or To_String_Access. I assume > To_Element_Access(I) is a function that returns I.Element'Access or > something equivalent, and that you'd like to do the same for > To_String_Access? Yes, that's about right. In my case To_Element_Access is an instantiation of a generic, but you can think of it this way: type Element_Access is access all Element_Type; function To_Element_Access (I : Iterator_Type) return Element_Access is begin return I.Element'Access; end; I wanted something similar for keys, having type String. Something like: type Key_Access is access constant String; function To_Key_Access (I : Iterator_Type) return Key_Access is begin return I.Key'Access; end; However, this won't work, because of static matching rules. > If I understand everything correctly (and I make no guarantees > whatsoever about that), it appears to me that you don't really need a > special access type, but that a special attribute might be good > enough. Suppose we defined an attribute, 'Unconstrained_Access, which > works exactly like 'Access except the prefix must be an array, and > instead of 3.10.2(27), the rule for 'Unconstrained_Access would be > > The type of the view shall be the same as that of the A's > designated type, and A's designated subtype shall be > unconstrained. > > Thus 'Unconstrained_Access would, in essence, create a dope vector > using the bounds of whatever prefix you give it. > > Would this solve the problem? > > (Yes, if we added this, we'd probably need > 'Unchecked_Unconstrained_Access too.) What I was trying to do is avoid things like this: procedure Print (I : Iterator_Type) is begin Put ("key="); Put (Key (I)); New_Line; end; In the case of a selector function like this: function Key (I : Iterator_Type) return String is begin return I.Key; end; I believe it's true that this will make a copy of the key, so that invokations like this (as in the example above): ... Key (I) ... will create a temporary. I was trying to avoid creation of the temporary, by returning an access object instead. That way I could say: Put (To_Key_Access (I).all); which refers to the original string object, and no temporary is created. The old VADS compiler (and I think GNAT works this way too, when you use a size clause for the access type) would implement an access type that designates an unconstrained array by pointing to the first element of the array itself, and allocating the dope vector in the 8 bytes that immediately preceded the start of the array object. So given an access object, you can find the dope vector. I was trying to do something similar for arrays that are record components, in which the array bounds come from the record discriminant(s). So given an access object that designates this record component, you can find the dope vector: it's just the discriminant itself. There are perhaps other ways to solve this problem. We could avoid the access type entirely if the function return type were anonymous, like this: function To_Key_Access (I : Iterator_Type) return access constant String is begin return I.Key'Access; end; That would also allow me to say: Put_Line (To_Key_Access (I).all); as above. All of these mechanisms are basically a roundabout way of returning a reference. In C++ I can say: const string& key() const; A reference is what you'd get if you have visibility to the node type, and you renamed the key component: procedure Op (I : Iterator_Type) is Key : String renames I.Key; begin Instead of just a reference that is an object declaration, I'd like to have a reference as the return type of a function. A reference is nice because it avoids having to say ".all". However, if I have to say ".all" to avoid creation of a temporary I don't mind too much. Hence the ideas for anonymous and named access types using special dope vectors. **************************************************************** From: Matthew Heaney Sent: Tuesday, August 26, 2003 10:17 AM > I wanted something similar for keys, having type String. > Something like: > > type Key_Access is access constant String; > > function To_Key_Access > (I : Iterator_Type) return Key_Access is > begin > return I.Key'Access; > end; > > However, this won't work, because of static matching rules. Note that access types of the form type String_Access is access all String; are completely general, and can point to string objects anywhere, as long as the string object has dope. I think I'm asking for a less-general access type, like this: type Node_Type (Length : Natural) is limited record Key : aliased String (1 .. Length); ... end record; type Key_Access is access all Node_Type.Key; The problem is a little more tricky, however, because Node_Type is in a private part, so I would need to do something like: generic ... package GP is type Iterator_Type is private; type Key_Access is access all String'Deferred; --? function To_Key_Access (I : Iterator_Type) return Key_Access; private type Node_Type ...; type Key_Access is access all Node_Type.Key; end GP; **************************************************************** From: Nick Roberts Sent: Tuesday, August 26, 2003 3:26 PM I think you could achieve pretty much the effect you require as follows. Declare in the visible part of your package: type Key_Element_Association (Key_Length: Positive) is record Key: String(1..Key_Length); Element: Element_Type; end record; type KEA_Access is access all Key_Element_Association; Declare in the private part: type Node_Type (Key_Length: Positive) is record KEA: aliased Key_Element_Association(Key_Length); -- possibly other private components end record; Provide (in the visible part) a function such as: function Key_and_Element (I: in Iterator_Type) return KEA_Access; Obviously this function gives the user (read and write) access to both the key and element at the same time. It may not beautiful, but I think it would work. **************************************************************** From: Matthew Heaney Sent: Tuesday, August 26, 2003 3:38 PM Yes, I had that already. The problem is that both the node_type and the kea_type each have a discriminant, so I'm back up to 8 bytes of overhead per node, which is the problem I was trying to solve. The other problem is that you don't get an access to a string. You get an access to something else, that contains a string. So yes, I can avoid creating temporaries using that technique, but the cost is 4-byte-per-node penalty. **************************************************************** From: Randy Brukardt Sent: Tuesday, August 26, 2003 4:17 PM > Yes, I had that already. The problem is that both the node_type > and the kea_type each have a discriminant, so I'm back up to 8 > bytes of overhead per node, which is the problem I was trying to solve. Humm, that's highly implementation dependent. In Janus/Ada, the structure you're describing (because of generic sharing, array descriptors, and the fact that we allocate all discriminated records to size) probably would have an overhead of at least 24 bytes per node. But the better question is, why woory about it? It's premature optimization; worry about the overhead if and only if it actually matters in your application. I realize that you've been working on canned components, but the rule holds there as well. If you have critical time and/or space requirements for a data structure, you have no business using canned components in the first place. Canned components value comes from being able to more rapidly prototype an application (so that solving the problem can be the primary initial focus, and that the real, not imagined, bottlenecks can be found). But that means that there is no reason for them to be perfect in any dimension. In practice, just about anything will be good enough 90% of the time; and the other 10%, the developers ought to earn their money. :-) **************************************************************** From: Matthew Heaney Sent: Tuesday, August 26, 2003 4:50 PM I can have an API like this: type Iterator_Type is private; function Key (I : Iterator_Type) return String; or I can have an API that looks like this: type Iterator_Type is private; function Key (I : Iterator_Type) return String; type Key_Access is access constant String; function To_Key_Access (I : Iterator_Type) return Key_Access; In order to implement that last function, static matching rules will mean you can't declare the key as an array component of the node_type record. Unless you use compiler magic to force dope to be generated for the component. Or you declare the component as an access object that designates a string allocated on the heap. In either case there is a space penalty. If we get rid of the To_Key_Access function, this means (I think) that the component can be more space efficient. The issue is whether the space penalty is real, and whether being able to access key values more efficiently outweighs the putative space penalty. I wish there were a way to access key values efficiently, without any storage cost. Hence the idea for special access types. I was trying to avoid the inefficiencies of the Ada.Strings.Unbounded component, but perhaps the comparison isn't valid. After all, to query the file name using Text_IO we've been saying "Name (File)" forever, and no one has complained. **************************************************************** From: Randy Brukardt Sent: Tuesday, August 26, 2003 5:45 PM Right. Even getting rid of copying overhead could be considered premature optimization. The question is how often will the Key function be called? If it will be called hundreds of thousands of times in typical use, then it might make sense to try to eliminate the overhead. But I usually just use a Get_Line-like procedure in such cases: procedure Key (I : Iterator_Type; Str : out String; Len : out Natural); These strings will be short in general, so the copying cost of the string itself isn't an issue. The cost is with the dynamic memory management of the return (Janus/Ada uses a special pool in the heap; other implementations use special stacks or other tricks) of an unconstrained string. Then the user has a choice of easy (the function) or faster (the procedure), and no head standing is required. If these things were going to be long enough that the copying expense would be a problem, I'd probably simply make them visibly access-to-unconstrained constant strings and both pass in and return the access objects. (That would eliminate any need for 'Access in the implementation.) But I would avoid the complication and not worry about it. But to respond directly to your original question: > I wish there were a way to access key values efficiently, without > any storage cost. Hence the idea for special access types. I think a new kind of type is far too heavy for this problem. Adam's suggestion is much simpler and no implementor should have trouble with it (adding and removing descriptors is a common operation in parameter passing and subtype conversions). For Janus/Ada, it would have no runtime cost at all (there always is a full array descriptor -- which IS the component -- for a depends-on-discriminant array component. The data itself is always dynamically allocated to the correct size) -- it would actually be cheaper than 'Access, which has to do a check and a descriptor access to find the actual data. You seemed to reject this idea, but you never actually explained why. With Adam's suggestion, your routine would look like: type Const_String_Access is access constant String; function To_Key_Access (I : Iterator) return Const_String_Access is begin return I.Key'Unconstrained_Access; end To_Key_Access; which certainly seems to do the trick. (Note that the reference-returning version you suggested: function To_Key_Access (I : Iterator) return access constant String; has a lot of runtime overhead for accessibility checks and passing in pools, which is *exactly* what you don't want here. Which is why I think this whole idea, as outlined in AI-325 and AI-318 [Tucker put a complete implementation model in 318; its important to see how complex and costly it is], is dubious.) (Also note that you need Adam's solution whether or not AI-318/325 is included in Ada 200Y. So it should be considered separately.) **************************************************************** From: Adam Beneschan Sent: Wednesday, August 27, 2003 12:50 PM I wrote: > If I understand everything correctly (and I make no guarantees > whatsoever about that), it appears to me that you don't really need a > special access type, but that a special attribute might be good > enough. Suppose we defined an attribute, 'Unconstrained_Access, which > works exactly like 'Access except the prefix must be an array, and > instead of 3.10.2(27), the rule for 'Unconstrained_Access would be > > The type of the view shall be the same as that of the A's > designated type, and A's designated subtype shall be > unconstrained. > > Thus 'Unconstrained_Access would, in essence, create a dope vector > using the bounds of whatever prefix you give it. It's been pointed out to me that another option, instead of defining a new 'Unconstrained_Access attribute, would be to relax 3.10.2(27) so the expected type for 'Access can be an access type whose designated type is an unconstrained array type even if the nominal subtype of the view is constrained. I do not believe that doing so would make any legal programs illegal (by making any constructs ambiguous that were previously unambiguous), even taking AI95-00235 into account. **************************************************************** From: Randy Brukardt Sent: Wednesday, August 27, 2003 3:53 PM The problem with either proposal (which I forgot until I started answering this note) is that when you create a descriptor for a constrained object in response to 'Unconstrained_Access attribute, you have no obvious place to free it. Essentially, it has to remain around as long as the access type (in the absence of fancy reference counts or the like). That could look like a storage leak in a case like the example. (It wouldn't on Janus/Ada, because the full descriptor is part of the component anyway. But if the component had not been constrained by a discriminant, no such descriptor would be available, so we'd allocate one from the heap. But we'd never be able to recover it; the access can last as long as the access type. Other implementations are possible (allocating the descriptor in the scope of the object would work), but they all would have to allocate memory in a scope other than the current one -- which is likely to be difficult. Any of the solutions suggested to this 'problem' would have this property (or they would require the generation of descriptors for essentially all array objects, which is equally unacceptable). Without the storage leak problem, the static matching rule is unnecessary at all (for Janus/Ada, at least), as it eliminates no (other) implementation problems and gets in the way of perfectly reasonable uses. I recall that the Ada 9X team found that the problem was much worse that storage leakage for some implementations - it was essentially impossible to implement. But I don't recall those details. So, I have to withdraw my support for attempting to solve this problem. **************************************************************** From: Matthew Heaney Sent: Wednesday, August 27, 2003 9:57 AM Randy said: > But I usually just use a Get_Line-like procedure in such cases: > > procedure Key (I : Iterator_Type; > Str : out String; > Len : out Natural); Yes, I have something like that already. But it still makes a copy, which I was trying to avoid. > Then the user has a choice of easy (the function) or faster (the procedure), > and no head standing is required. Yes, the API has both. > But to respond directly to your original question: > > > I wish there were a way to access key values efficiently, without > > any storage cost. Hence the idea for special access types. > > I think a new kind of type is far too heavy for this problem. Adam's > suggestion is much simpler and no implementor should have trouble with it > (adding and removing descriptors is a common operation in parameter passing > and subtype conversions). For Janus/Ada, it would have no runtime cost at > all (there always is a full array descriptor -- which IS the component -- > for a depends-on-discriminant array component. The data itself is always > dynamically allocated to the correct size) -- it would actually be cheaper > than 'Access, which has to do a check and a descriptor access to find the > actual data. > > You seemed to reject this idea, but you never actually explained why. I didn't fully understand it. My response to Adam was mostly intended to ensure that we were both on the same page. > With Adam's suggestion, your routine would look like: > > type Const_String_Access is access constant String; > function To_Key_Access (I : Iterator) return Const_String_Access is > begin > return I.Key'Unconstrained_Access; > end To_Key_Access; > > which certainly seems to do the trick. Yes, this is perfect. Just what I wanted. > (Also note that you need Adam's solution whether or not AI-318/325 is > included in Ada 200Y. So it should be considered separately.) Is there an AI already for a language change a la 'Unconstrained_Access? Does an AI need to be created? **************************************************************** From: Randy Brukardt Sent: Wednesday, August 27, 2003 5:34 PM > Is there an AI already for a language change a la 'Unconstrained_Access? > Does an AI need to be created? There is not. As to whether one is a good idea, that is why we're having this discussion. :-) **************************************************************** From: Tucker Taft Sent: Wednesday, August 27, 2003 4:49 PM I am afraid I have not really been following this discussion, but it reminds me that when we looked into supporting the GNAT attribute "'Unrestricted_Access" we concluded we needed a pragma on the *access type* as well, to make sure it could properly hold the values created by unrestricted access. For example, a "pragma Unrestricted_Access(acc_to_subp_type);" in our (hypothetical) implementation would make values of the type be twice as big, so they always had both an address and a static link. For an implementation that used displays, it might have to be even bigger, to accommodate a copy of the display. I believe we were also imagining requiring that the subprogram itself would need a pragma Unrestricted_Access on it, to prepare it to be usable with Unrestricted_Access (though I forget why we wanted to require that). For the example here, where you want to be able to point at, for example, a slice of some array, or an aliased array that was not declared in the "special" way it needs to be to be pointed to by an access-to-unconstrained type, you would need a "pragma Unrestricted_Access" on the access-to-unconstrained-type. This would cause the compiler to pre-allocate with all values of the access type room to hold the array dope. Then when you used 'Unrestricted_Access on some slice or some other "inappropriate" sort of array, the dope could be carried with the access value, rather than somewhere in the heap or wherever. If you wanted to allow 'unrestricted_access on a slice of a packed array, then the dope carried with the access value might need to contain a "physical" 'First as well as a "logical" 'First, so the address wouldn't have to be a bit address. Hence, I think if you want to "standardize" something like 'Unrestricted_Access, you will need to have a pragma (or equivalent) on the access type, as well as perhaps on the possibly designated subprograms, for an access-to-subprogram Unrestricted_Access. (For subprograms, Unrestricted_Access is almost like a distinct language "convention".) **************************************************************** From: Randy Brukardt Sent: Wednesday, August 27, 2003 5:38 PM > For the example here, where you want to be able to point at, for > example, a slice of some array, or an aliased array that was not declared > in the "special" way it needs to be to be pointed to by an > access-to-unconstrained type, you would need a "pragma Unrestricted_Access" > on the access-to-unconstrained-type. This would cause the compiler to > pre-allocate with all values of the access type room to hold the array > dope. Then when you used 'Unrestricted_Access on some slice or > some other "inappropriate" sort of array, the dope could be carried > with the access value, rather than somewhere in the heap or wherever. I understand, but I fail to see why the native implementation of access-to-unconstrained would not be able to do that. An implementation has to be prepared to pass slices as unconstrained objects, so any implementation requiring adjacent bounds is impossible for that. Why would an implementor use a different representation for access-to-unconstrained? It would just complicate the compiler for little or no gain. The Janus/Ada solution of access-to-unconstrained pointing at a descriptor (which points at the data) has the side advantage of being far more optimizable -- the descriptor cannot change after creation, so it is not necessary to worry about aliasing effects on the descriptor. Thus, it can be put into registers and used in common subexpressions without extra analysis. In any case, we're NOT talking about "Unrestricted_Access" (which has a whole can of worms of its own, especially for subprograms). We're talking about a very specific case of allowing most array objects to be pointed at by an access-to-unconstrained. The suggestion "Unconstrained_Access" was simply to allow creation of an access-to-unconstrained without triggering 3.10.2(27). If you have to use a "bloated pointer" to represent the access-to-unconstrained, I think it doesn't help Matthew's problem (which is to increase efficiency). Moreover, I'd forgotten about the storage leak problem in Janus/Ada that triggered the adoption of 3.10.2(27) in the first place. So I don't think that this can be solved in a way that would really be efficient enough to meet Matthew's needs. As such, it isn't clear that we should try to solve it at all. **************************************************************** From: Tucker Taft Sent: Wednesday, August 27, 2003 8:39 PM > I understand, but I fail to see why the native implementation of > access-to-unconstrained would not be able to do that. An implementation has > to be prepared to pass slices as unconstrained objects, so any > implementation requiring adjacent bounds is impossible for that. Why would > an implementor use a different representation for access-to-unconstrained? > It would just complicate the compiler for little or no gain. This kind of argument doesn't work, as you know. Implementors make plenty of choices that make no sense to other implementors. We pass dope and data as two separate parameters, but for an access-to-unconstrained type, we require that the dope immediately precede the data in memory. For access-to-constrained, there is no per-object dope, even if the bounds are not compile-time known. (When passing a slice, if it is not aligned on an addressible boundary, we make a copy of it.) When someone declares an aliased array object, if the nominal subtype is unconstrained, we precede it with the dope that an access-to-unconstrained type would expect. If the aliased array object has a constrained nominal subtype, then we don't precede it with dope. There is essentially no way we could implement 'Unconstrained_Access with this model, unless the access type has some kind of pragma on it which indicated that we would need to use some different approach for dope. And as you point out, any approach has a problem with avoiding a storage leak on the dope, unless the dope is allocated a) when the designated object is allocated, or b) carried around with the access value as part of a "bloated" access type. > The Janus/Ada solution of access-to-unconstrained pointing at a descriptor > (which points at the data) has the side advantage of being far more > optimizable ... This is all fine and good, but irrelevant given the existence of many other compilers with different implementation models. > ... As such, it isn't clear that we should try to solve it at all. That's fine with me. But if a solution is desired, I believe it will need some additional pragmas, etc., to avoid seriously breaking existing implementations. **************************************************************** From: Pascal Leroy Sent: Thursday, August 28, 2003 5:31 AM I may be thick, but I fail to understand what problem we are trying to solve. Matt is trying to avoid copying a string in one particular usage pattern. Turns out you can't. Tough luck. I mean, any Ada program is going to make gazillions of copies, and I don't see why we would want to optimize that particular one. As Randy said, if the string is short the cost of the copy is going to be negligible, and if the string is so long that the copy takes a measurable time, you are probably using the wrong data structure anyway. **************************************************************** From: Matthew Heaney Sent: Thursday, August 28, 2003 11:17 AM It is indeed the case that the strings we're talking about here are just keys in a map, strings which are going to be relatively small. I was thinking that that you don't really need dope, because the "dope" is just the enclosing record. But you would need some special access type to take advantage of this case, e.g. package P is ... type Key_Access is access constant String with private; function To_Key_Access (Iterator : Iterator_Type) return Key_Access; private type Node_Type (Key_Length : Natural) is record Key : aliased String (1 .. Key_Length); ... end record; type Key_Access is access constant Node_Type.Key; end P; But this is probably too much language change for too little gain. ****************************************************************