!standard 13.2(6.1/2)                              12-07-05  AI12-0001-1/02
!standard 13.2(7)
!standard C.6(10)
!standard C.6(11)
!standard C.6(21)
!class binding interpretation 06-03-31
!status work item 06-03-31
!status received 06-03-30
!priority Medium
!difficulty Medium
!qualifier Omission
!subject Independence and Representation clauses for atomic objects

!summary

[Editor's note: This AI was carried over from Ada 2005.]

This action item resolves the difference in recommended level of support for atomic and volatile objects by making the alignment implementation advice recommended support and adding a rejection statement for those array objects that are packed to a different alignment than that of the component's subtype.

!question

The Recommended Level of Support implies that it is required to support pragma Pack on types that have Atomic_Components, even to the bit level. Is this the intent? (No.)

!recommendation

Resolve the difference by eliminating C.6 (21) and changing 13.2 (6.1/2) to be a recommended level of support where by-reference, aliased, atomic and volatile objects must be aligned according to subtype. Change 13.2(9) to reject packed arrays that require independent addressability, but are packed to a different or no alignment. In C.6(10-11), add "and Independent" after indivisible. Delete C.6 (21) as it is no longer required.

!wording

13.2 (6.1/2) is moved after 13.2 (7) and changed to:

For a packed type that has a component that is aliased, volatile, atomic, or is of a by-reference type, the component shall be aligned according to the alignment of its subtype; in particular it shall be aligned on a storage element boundary.

13.2 (9) append:

If the array component is required to be aligned according to its subtype and the results of packing are not so aligned, the pack aspect should be rejected.

C.6 (10-11)

Add "and independent" after indivisible.

C.6 (21, AARM 21.a)

Delete.

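[Editor's note: The following is a minimal sketch, not part of the proposed wording, of the declarations that the wording above distinguishes. The package and type names are hypothetical. The last declaration is left commented out because, under the 13.2(9) addition, the Pack would be expected to be rejected on targets that cannot address individual bits independently; the sketch does not assert what any particular compiler does with it.]

```ada
package Packing_Sketch is

   --  Packed Boolean array with no other requirements on the components:
   --  the Recommended Level of Support calls for a component size of 1 bit.
   type Bit_Vector is array (1 .. 8) of Boolean;
   pragma Pack (Bit_Vector);

   --  Atomic components without Pack: each component keeps the alignment
   --  of its subtype, so it starts on a storage element boundary.
   type Flag_Vector is array (1 .. 8) of Boolean;
   pragma Atomic_Components (Flag_Vector);

   --  The combination this AI is about. Bit-level packing would leave the
   --  atomic components misaligned, so the proposed 13.2(9) wording says
   --  the pack aspect should be rejected (assumption: a target that is
   --  not bit addressable).
   --
   --  type Packed_Flags is array (1 .. 8) of Boolean;
   --  pragma Pack (Packed_Flags);
   --  pragma Atomic_Components (Packed_Flags);

end Packing_Sketch;
```

Keeping the two aspects on separate types, as in the first two declarations, avoids any dependence on how the combination is resolved.
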
!discussion

Addition of atomic (and volatile) to 13.1 (24-26) was discarded because neither aspect is confirming.

Making 13.2 (6.1/2) a Recommended Level of Support makes it a requirement when Annex C is supported. This covers volatile and atomic and eliminates the conflict between the Recommended Level of Support and this rule.

Similarly, C.6(21) conflicts with the Recommended Level of Support. We don't want the representation of a packed array of Boolean to depend on other keywords (like aliased) or pragmas/aspects that apply to the type. (That could cause silent representation changes during maintenance.) Thus, this rule is deleted.

!corrigendum 13.2(6.1/2)

@ddel
If a packed type has a component that is not of a by-reference type and has no aliased part, then such a component need not be aligned according to the Alignment of its subtype; in particular it need not be allocated on a storage element boundary.

!corrigendum 13.2(7)

@dinst
The recommended level of support for pragma Pack is:
@dinsa
@xindent

!corrigendum 13.2(9)

@drepl
@xbullet
@dby
@xbullet

!corrigendum C.6(10)

@drepl
It is illegal to apply either an Atomic or Atomic_Components pragma to an object or type if the implementation cannot support the indivisible reads and updates required by the pragma (see below).
@dby
It is illegal to apply either an Atomic or Atomic_Components pragma to an object or type if the implementation cannot support the indivisible and independent reads and updates required by the pragma (see below).

!corrigendum C.6(11)

@drepl
It is illegal to specify the Size attribute of an atomic object, the Component_Size attribute for an array type with atomic components, or the layout attributes of an atomic component, in a way that prevents the implementation from performing the required indivisible reads and updates.
@dby
It is illegal to specify the Size attribute of an atomic object, the Component_Size attribute for an array type with atomic components, or the layout attributes of an atomic component, in a way that prevents the implementation from performing the required indivisible and independent reads and updates.

!corrigendum C.6(21)

@ddel
If a pragma Pack applies to a type any of whose subcomponents are atomic, the implementation shall not pack the atomic subcomponents more tightly than that for which it can support indivisible reads and updates.

!ACATS test

ACATS tests confirming rejection of aspect Pack combined with Atomic_Components for small types like Boolean on all targets but bit addressable targets should be implemented. (Test CXC6003 included such a case; this case has been removed from the test.)

!appendix

From: Jean-Pierre Rosen
Sent: Friday, February 17, 2006  6:34 AM

A question that arose while designing a rule for AdaControl about shared variables.

If a variable is subject to a pragma Atomic_Components, is it safe for two tasks to update *different* components without synchronization? C.6 talks only about indivisibility, not independent addressing. Of course, you have to throw 9.10 in...

The whole issue is with the "(or of a neighboring object if the two are not independently addressable)" in 9.10(11), while C.6 (17) says that "Two actions are sequential (see 9.10) if each is the read or update of the same atomic object", but doesn't mention neighboring objects.

In a sense, indivisibility guarantees only that there cannot be temporary incorrect values in a variable due to the fact that the variable is written by more than one memory cycle. The issue *is* different from independent addressability. OTOH, Atomic_Components without independent addressability seems pretty much useless...

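[Editor's note: The situation asked about above can be sketched as follows. This is an illustration only; all names are hypothetical, and the component type is assumed to be one byte so that most targets can read and update a component indivisibly. No packing, record layout, or Component_Size is specified, which is the case in which 9.10(1) makes distinct components independently addressable. Each task reads and updates only its own component.]

```ada
with Ada.Text_IO;

procedure Independent_Updates is

   --  Hypothetical types: byte-sized modular components of an array
   --  whose components are atomic. No Pack, no Component_Size.
   type Counter is mod 2 ** 8;
   type Counter_Array is array (1 .. 2) of Counter;
   pragma Atomic_Components (Counter_Array);

   Shared : Counter_Array := (others => 0);

   --  Each Writer is bound to exactly one component of Shared.
   task type Writer (Index : Positive);

   task body Writer is
   begin
      for Step in 1 .. 100 loop
         --  Read and update of this task's own component only; the
         --  neighboring component is never touched by this task.
         Shared (Index) := Shared (Index) + 1;
      end loop;
   end Writer;

begin
   declare
      First  : Writer (Index => 1);
      Second : Writer (Index => 2);
   begin
      null;  --  Leaving this block waits for both tasks to terminate.
   end;

   Ada.Text_IO.Put_Line
     (Counter'Image (Shared (1)) & Counter'Image (Shared (2)));
end Independent_Updates;
```

If a Pack or Component_Size clause forced the two components into the same storage element, the same program would raise exactly the question of this AI: the updates would no longer be to independently addressable objects.
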
**************************************************************** From: Robert Dewar Sent: Thursday, March 30, 2006 5:55 AM Answer seems clear, yes it is safe, provided that independence is assured, which means that there is no rep clause that would disturb the independence. If you are suggesting that Atomic Components should guarantee such independence, and result in the rejection of rep clauses that would compromise it, that seems reasonable, e.g. you have a packed array of bits with atomic components, that's definitely peculiar, and it seems reasonable to reject it. **************************************************************** From: Pascal Leroy Sent: Thursday, March 30, 2006 6:07 AM > If a variable is subject to a pragma Atomic_Components, is it safe for > two tasks to update *different* components without synchronization? I think that 9.10(1) is quite clear: distinct objects are independently addressable unless "packing, record layout or Component_Size is specified". So regardless of atomicity, it is always safe to read/update two distinct components of an object (in the absence of packing, etc.). What Atomic_Component buys you is that reads/updates of the same component are sequential. **************************************************************** From: Jean-Pierre Rosen Sent: Thursday, March 30, 2006 6:17 AM Of course, my question was in the case of the presence of packing etc. The answer seems to be no, there is no *additional* implication on addressability due to atomic_components. Correct? **************************************************************** From: Pascal Leroy Sent: Thursday, March 30, 2006 6:25 AM > Of course, my question was in the case of the presence of packing etc. In the presence of packing, 9.10(1) says that independent addressability is "implementation defined", which is not too helpful. (This topic was discussed a few weeks ago as part of another thread, btw.) 
> The answer seems to be no, there is no *additional* implication on > addressability due to atomic_components. Correct? Right. **************************************************************** From: Tucker Taft Sent: Thursday, March 30, 2006 6:57 AM The ARG recently disallowed combining a pair of atomic operations on distinct objects into a single operation, I believe. I would certainly support saying that array-of-aliased and array-of-atomic would ensure independence between components, even in the presence of other rep-clauses. That seems like a reasonable interpretation of what atomic means, and "aliased" implies that you can have multiple access paths that make no visible use of indexing, and hence you would certainly want independence. **************************************************************** From: Robert Dewar Sent: Thursday, March 30, 2006 7:58 AM > The ARG recently disallowed combining a pair of atomic operations > on distinct objects into a single operation, I believe. > I would certainly support saying that array-of-aliased > and array-of-atomic would ensure independence between > components, even in the presence of other rep-clauses. Wait a moment, then you have to give permission to reject these "other rep clauses", you can't insist that they be recognized and independence be preserved! **************************************************************** From: Robert Dewar Sent: Thursday, March 30, 2006 8:02 AM > In the presence of packing, 9.10(1) says that independent addressability > is "implementation defined", which is not too helpful. (This topic was > discussed a few weeks ago as part of another thread, btw.) It seems *really* nasty to make this implementation defined, I hate erroneousness being imp defined. Is this a new change, I missed it. 
****************************************************************

From: Robert Dewar
Sent: Thursday, March 30, 2006  8:08 AM

> So regardless of atomicity, it is always safe to read/update two distinct
> components of an object (in the absence of packing, etc.). What
> Atomic_Component buys you is that reads/updates of the same component are
> sequential.

.. and atomic!

But there is still the issue of something like this

   type X is array (1 .. 8) of Boolean;
   pragma Pack (X);
   pragma Atomic_Components (X);

Should one of the two pragmas be ignored, or should one of them be rejected, or what?

In GNAT we get:

   a.ads:4:30: warning: Pack canceled, cannot pack atomic components

is that behavior OK? forbidden? mandated?
(not clear to me at any rate)

****************************************************************

From: Pascal Leroy
Sent: Thursday, March 30, 2006  8:17 AM

> It seems *really* nasty to make this implementation defined,
> I hate erroneousness being imp defined. Is this a new change,
> I missed it.

This is not new, it has been like that since Ada 95, and the last time this was discussed (around Feb. 24th, thread titled "Independence and confirming rep. clauses"), the two of us (at least) agreed that it was poor language design.

****************************************************************

From: Robert Dewar
Sent: Thursday, March 30, 2006  8:25 AM

OK, so I just misremembered here, sorry!

****************************************************************

From: Pascal Leroy
Sent: Thursday, March 30, 2006  8:25 AM

> is that behavior OK? forbidden? mandated?
> (not clear to me at any rate)

It's certainly OK to reject any representation item that you don't like. However, it appears that the implementation advice about pragma Pack does not mention atomicity, so you are not following the advice, and you don't comply with Annex C.

On a machine that could independently address bits, the two pragmas could well coexist, so there is some amount of implementation dependence here.

For the record Apex also ignores Pack in this example, although it doesn't emit a warning. **************************************************************** From: Robert Dewar Sent: Thursday, March 30, 2006 8:41 AM > It's certainly OK to reject any representation item that you don't like. > However, it appears that the implementation advice about pragma Pack does > not mention atomicity, so you are not following the advice, and you don't > comply with Annex C. Yes, but it is impossible to comply on virtually all machines > On a machine that could independently address bits, the two pragmas could > well coexist, so there is some amount of implementation dependence here. There are almost no such machines! **************************************************************** From: Tucker Taft Sent: Thursday, March 30, 2006 8:21 AM > Wait a moment, then you have to give permission to reject > these "other rep clauses", you can't insist that they be > recognized and independence be preserved! I believe there are already rules that effectively allow that, once we make it clear that being atomic also implies being independent of neighboring objects. E.g. C.6(10-11): It is illegal to apply either an Atomic or Atomic_Components pragma to an object or type if the implementation cannot support the indivisible reads and updates required by the pragma (see below). It is illegal to specify the Size attribute of an atomic object, the Component_Size attribute for an array type with atomic components, or the layout attributes of an atomic component, in a way that prevents the implementation from performing the required indivisible reads and updates. Probably would want to change "indivisible" to "indivisible and independent" in both of the above paragraphs. **************************************************************** From: Robert Dewar Sent: Thursday, March 30, 2006 1:30 PM SO I guess you would consider my packed example illegal, and the warning should be a real illegality? 
**************************************************************** From: Tucker Taft Sent: Thursday, March 30, 2006 2:24 PM > SO I guess you would consider my packed example illegal, and the > warning should be a real illegality? Pragma Pack is a little different. It says "pack as tightly as you can, subject to all the other requirements imposed on the type." So you never need to reject a pragma Pack. I could imagine that in the absence of a pragma Pack, some implementations might make the following array 32-bits/element: type Very_Short is new Integer range 0..7; type VS_Array is array(Positive range <>) of Very_Short; pragma Atomic_Components(VS_Array); but if we add a pragma Pack(VS_Array), I would expect it to be shrunk down to 8 bits per component on machines that allow atomic reference to bytes. In the absence of the pragma Atomic_Components, I would expect it to be shrunk down to 3 or 4 bits/component. **************************************************************** From: Gary Dismukes Sent: Thursday, March 30, 2006 3:05 PM > Pragma Pack is a little different. It says "pack as > tightly as you can, subject to all the other requirements > imposed on the type." So you never need to reject a > pragma Pack. I could imagine that in the absence of > a pragma Pack, some implementations might make the following > array 32-bits/element: But in the case of Annex C compliance you have to follow the recommended level of support, which requires tight packing of things like Boolean arrays as I understand it. There's nothing about "subject to other requirements", so it seems that one of the pragmas would have to be rejected. **************************************************************** From: Tucker Taft Sent: Thursday, March 30, 2006 3:31 PM > But in the case of Annex C compliance you have to follow the > recommended level of support, which requires tight packing > of things like Boolean arrays as I understand it. 
There's
> nothing about "subject to other requirements", so it seems
> that one of the pragmas would have to be rejected.

Good point. But an existing AARM note implies there is some interplay between a component being aliased and the "size of the component subtype":

   Ramification: If a component subtype is aliased, its Size will generally be a multiple of Storage_Unit, so it probably won't get packed very tightly.

This AARM ramification seems totally unjustified, unless we presumed that there was some kind of implicit "widening" that was occurring on the Size of a component subtype if necessary to satisfy other requirements, such as "aliased," "atomic," etc. But that really doesn't fit with the model, since the *subtype* is not aliased, nor is the component *subtype* atomic in the case of an Atomic_Components pragma.

So I think we will definitely need to change the words here if that is what we want, namely the "tight" packing is not required if the components are aliased, by-reference, or atomic.

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 30, 2006  3:37 PM

> But in the case of Annex C compliance you have to follow the
> recommended level of support, which requires tight packing
> of things like Boolean arrays as I understand it. There's
> nothing about "subject to other requirements", so it seems
> that one of the pragmas would have to be rejected.

As much as I hate to, I agree with Gary. Indeed, I don't see anything about "subject to other requirements" anywhere in 13.2. Here's what the definition of Pack is (this has nothing to do with recommended level of support):

"If a type is packed, then the implementation should try to minimize storage allocated to objects of the type, possibly at the expense of speed of accessing components, subject to reasonable complexity in addressing calculations."

I don't see that "reasonable complexity" has anything whatsoever to do with "other requirements".
And then the Recommended Level of Support pretty much defines what "reasonable complexity" means (by allowing rounding up to avoid crossing boundaries). So I agree that one of the pragmas has to be rejected. (I don't think that any language change is needed to make that a requirement, either, although it would make sense to clarify this so there is no doubt.) A warning (as GNAT gives) is wrong for a compiler following Annex C, and unfriendly otherwise. Silently doing nothing...I better not go there. :-) **************************************************************** From: Randy Brukardt Sent: Thursday, March 30, 2006 3:50 PM > So I think we will definitely need to change the words here > if that is what we want, namely the "tight" packing is not > required if the components are aliased, by-reference, or > atomic. The note was unjustified in Ada 95, but in Ada 2005, we added a blanket permission to reject rep. clauses for components of by-reference and aliased types unless they are confirming. See 13.1(26/2). Remember that pragma Pack is never confirming, so this is the same as saying that it can be rejected (but not required to be rejected) for any aliased or by-reference type. There is even an AARM note (carried over from Ada 95) which notes that Atomic_Components has similar restrictions. But it doesn't look like we ever considered the interaction of Atomic_Components and other rep. clauses. Perhaps it should be included in 13.1(26/2)? (That is, it shouldn't be required to support any non-confirming rep. clauses on such a type, but of course you can if you want.) **************************************************************** From: Tucker Taft Sent: Thursday, March 30, 2006 4:00 PM > As much as I hate to, I agree with Gary. Indeed, I don't see anything about > "subject to other requirements" anywhere in 13.2.... 
The new paragraph 13.2(6.1) says: If a packed type has a component that is not of a by-reference type and has no aliased part, then such a component need not be aligned according to the Alignment of its subtype; in particular it need not be allocated on a storage element boundary. This is the part that implies that packing is "subject to other requirements." If we changed "aliased" to "aliased or atomic" in the above, I think it would accomplish roughly what I was suggesting. I think you will agree that the above paragraph, combined with 13.3(26.3): For an object X of subtype S, if S'Alignment is not zero, then X'Alignment is a nonzero integral multiple of S'Alignment unless specified otherwise by a representation item. implies that in: type Aliased_Bit_Vector is array (Positive range <>) of aliased Boolean; pragma Pack(Boolean); the components should be aligned on Boolean'Alignment boundaries. I would think the same thing should apply if Atomic_Components is applied to a boolean array. I admit that these paragraphs seem to contradict the recommended level of support, but I think the bug is there, not in the above two paragraphs. > ... > So I agree that one of the pragmas has to be rejected. (I don't think that > any language change is needed to make that a requirement, either, although > it would make sense to clarify this so there is no doubt.) A warning (as > GNAT gives) is wrong for a compiler following Annex C, and unfriendly > otherwise. Silently doing nothing...I better not go there. :-) I suppose it depends on your interpretation of "Pack." I have always taken it as "do as well as you can." If you really have a specific size you need, then specify that with Component_Size, or be sure that there is nothing inhibiting the packing, such as aliased, by-reference, or atomic components. 
I agree it is friendly to inform the user if the pack has *no* effect, but I wouldn't want to disallow pragma Pack completely in the above example, because array of Boolean might use 32-bits/component in its absence, if byte-at-a-time access is significantly slower than word-at-a-time access on the given hardware. **************************************************************** From: Robert Dewar Sent: Thursday, March 30, 2006 4:32 PM > I suppose it depends on your interpretation of "Pack." I have > always taken it as "do as well as you can." If you really have > a specific size you need, then specify that with Component_Size, > or be sure that there is nothing inhibiting the packing, such > as aliased, by-reference, or atomic components. Well you can interpret it that way if you like, but it is not the definition in the language, which says that for arrays with 1,2,4 bit components, pragma Pack works as expected! > I agree it is friendly to inform the user if the pack has *no* > effect, but I wouldn't want to disallow pragma Pack completely > in the above example, because array of Boolean might use > 32-bits/component in its absence, if byte-at-a-time access is > significantly slower than word-at-a-time access on the given > hardware. I think that is wrong in this case, since pragma Pack for Boolean has precise well defined semantics, and must make the component size 1, it does not mean, do-as-well-as-you-can. **************************************************************** From: Randy Brukardt Sent: Thursday, March 30, 2006 5:18 PM > type Aliased_Bit_Vector is > array (Positive range <>) of aliased Boolean; > pragma Pack(Boolean); > > the components should be aligned on Boolean'Alignment boundaries. > I would think the same thing should apply if Atomic_Components > is applied to a boolean array. Well, in your example, the pragma should be rejected because the type isn't local. But I presume you meant "pragma Pack(Aliased_Bit_Vector);". 
I see your point, but all it says to me is that the new paragraph shouldn't be conditional. The needed escape is provided by 13.1(26/2) anyway. 13.1(26/2) says that there is no requirement to even support pragma Pack for such a type. > I admit that these paragraphs seem to contradict the recommended > level of support, but I think the bug is there, not in the above > two paragraphs. And I disagree; I think the RLS is correct and the above should simply read: The component of a packed type need not be aligned according to the Alignment of its subtype; in particular it need not be allocated on a storage element boundary. This doesn't require misalignment, it just allows it. The RLS requires it in some cases, but in those cases there is no requirement to support pragma Pack. ... > I suppose it depends on your interpretation of "Pack." I have > always taken it as "do as well as you can." If you really have > a specific size you need, then specify that with Component_Size, > or be sure that there is nothing inhibiting the packing, such > as aliased, by-reference, or atomic components. Pack is defined to "minimize storage, within reason". No exceptions for goofy component types; for those you can't minimize storage. > I agree it is friendly to inform the user if the pack has *no* > effect, but I wouldn't want to disallow pragma Pack completely > in the above example, because array of Boolean might use > 32-bits/component in its absence, if byte-at-a-time access is > significantly slower than word-at-a-time access on the given > hardware. Such hardware is possible, I suppose, but it seems unlikely since it would perform poorly on C code and thus on standard benchmarks. Moreover, there is more to overall performance than just the byte access time; all of the wasted space would cause extra cache pressure and usually would cause the overall run time to be longer. After all, the default representation should be best for "typical" conditions. 
If your use of a particular type is atypical (you need storage minimization or performance maximization), then you need to declare the type appropriately. For storage minimization, that's pragma Pack. For performance maximization, you have to noodle with 'Alignment and/or 'Component_Size, which is difficult; it would be useful if Ada had a pragma Fastest (...) that worked like Pack in reverse (sort of like Pascal unpack) -- space be damned, give me the fastest possible access to these components.

So, I don't see any value to pragma Pack in your example; if anything, it is misleading because it does nothing.

One of our goals with this amendment, after all, was to reduce the effects of adding or removing "aliased". I don't think that adding or removing "aliased" should change representation if there are rep. clauses (although it might make the rep. clauses illegal) -- otherwise, a simple maintenance change can introduce hard-to-find bugs.

Specifically, you're saying that changing:

   type Bit_Vector is array (Positive range <>) of Boolean;
   pragma Pack(Bit_Vector);

to

   type Bit_Vector is array (Positive range <>) of aliased Boolean;
   pragma Pack(Bit_Vector);

will *silently* change the representation. Yuk. I'm pretty sure that we'll never do that in our compiler...

****************************************************************

From: Robert Dewar
Sent: Thursday, March 30, 2006  5:31 PM

> will *silently* change the representation. Yuk. I'm pretty sure that we'll
> never do that in our compiler...

So how *will* your compiler handle these two cases?

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 30, 2006  5:47 PM

> So how *will* your compiler handle these two cases?

I presume you're asking about the Ada 2005 update, not the current practice (without the new 13.1(26/2), we just give warnings that nothing will happen).

Anyway, in Ada 2005, the first will be accepted, and the second rejected (based on 13.1(26/2) - this is not confirming). The rejection of the second one will make the maintenance programmer remove the pragma, and that will make the change of representation crystal clear. **************************************************************** From: Tucker Taft Sent: Thursday, March 30, 2006 6:00 PM > Anyway, in Ada 2005, the first will be accepted, and the second rejected > (based on 13.1(26/2) - this is not confirming). The rejection of the second > one will make the maintenance programmer remove the pragma, and that will > make the change of representation crystal clear. I'm convinced. And I think pragma Atomic_Components ought to work very much like adding "aliased". So perhaps the only real change is needed in 13.1(24/2): An implementation need not support a nonconfirming representation item if it could cause an aliased object or an object of a by-reference type to be allocated at a nonaddressable location or, when the alignment attribute of the subtype of such an object is nonzero, at an address that is not an integral multiple of that alignment. We should probably change "aliased" above to "aliased or atomic." **************************************************************** From: Robert Dewar Sent: Thursday, March 30, 2006 6:15 PM > We should probably change "aliased" above to "aliased or atomic." or volatile, you don't want extra reads/writes there either. **************************************************************** From: Randy Brukardt Sent: Thursday, March 30, 2006 6:21 PM > We should probably change "aliased" above to "aliased or atomic." I think we'd want to make that change to 13.1(25/2) and 13.1(26/2), too. We don't want to force compilers to handle 4-bit atomic record components, either. (Those could be aligned correctly and still have a size that's too small.) **************************************************************** From: Robert I. 
Eachus
Sent: Thursday, March 30, 2006  7:27 PM

>> On a machine that could independently address bits, the two pragmas
>> could
>> well coexist, so there is some amount of implementation dependence here.
>
>
> There are almost no such machines!

I totally agree with the language part of this discussion, but many hardware ISAs allow read-modify-write access. If you can do an AND or an OR as an RMW instruction, then ORing 16#10# sets the fourth bit of the byte, and ANDing with 16#EF# resets it. (There are often advantages to doing 32 or 64-bit wide operations instead of byte wide operations, especially with modern CPUs, but that is a detail.)

Is the RMW instruction atomic? The most interesting case is in the x86 case. If you have a single CPU (or today CPU core) the retirement rules make the instructions atomic from the CPU's point of view. (If an interrupt occurs, either the write has completed, or the instruction will be restarted.) What if you have multiple CPUs, multiple cores, or are interfacing with an I/O device? Better mark the memory as UC (uncacheable) and use the LOCK prefix on the AND or OR instruction, but then it is guaranteed to work.

So I would say that the majority of computers in use do support bit-addressable atomic access support--as long as the component values don't cross quad-word boundaries. (There are lots of other CISC CPU designs where this works as well. The first microprocessor I used it on was the M68000, but I had used this trick on many mainframes before then.)

****************************************************************

From: Robert I. Eachus
Sent: Thursday, March 30, 2006  7:53 PM

> So I would say that the majority of computers in use do support
> bit-addressable atomic access support--as long as the component values
> don't cross quad-word boundaries.

Whoops! I got a bit carried away. In the x86 ISA you can only do atomic loads and stores of a set of all one bits or all zero bits.
Some other ISAs do allow arbitrary bit patterns to be substituted. You can always use a locked XOR iff each entry in an array is 'owned' by a different thread. So the changes being discussed are needed for the non-boolean cases. However, I would hope that at least the AARM should explain the special nature of atomic bit arrays.

****************************************************************

From: Bibb Latting
Sent: Thursday, March 30, 2006  11:46 PM

> So I would say that the majority of computers in use do support
> bit-addressable atomic access support--as long as the component values
> don't cross quad-word boundaries. (There are lots of other CISC CPU
> designs where this works as well. The first microprocessor I used it on
> was the M68000, but I had used this trick on many mainframes before then.)

This is a molecular operation, not an atomic operation for:

   type packed_bits is array (1..N) of boolean;
   pragma pack (packed_bits);
   pragma atomic_components (packed_bits);

1) RMW assumes that the contents on read are the same as write. When dealing with I/O interfaces, this is not always true.

2) Without a data source for the other bits, the operation is not atomic.

> Probably would want to change "indivisible" to
> "indivisible and independent" in both of the above paragraphs.

I think this change is worth considering.

****************************************************************

From: Jean-Pierre Rosen
Sent: Friday, March 31, 2006  2:07 AM

Just to spread a little more oil on the fire...

What happens here?

   type Tab is array (positive range <>) of boolean;
   pragma pack (Tab);

   X : Tab (1 .. 32);
   pragma Atomic_Components (X);

i.e. when a *type* is packed, but an individual *variable* has atomic components?

****************************************************************

From: Robert Dewar
Sent: Thursday, March 30, 2006 5:05 AM

An error message I trust:

> The array_local_name in an Atomic_Components or
> Volatile_Components pragma shall resolve to denote the declaration of an
> array type or an array object of an anonymous type.

Tab don't look anonymous to me :-)

****************************************************************

From: Robert I. Eachus
Sent: Friday, March 31, 2006 11:27 AM

> This is a molecular operation, not an atomic operation for:
>
> type packed_bits is array (1..N) of boolean;
> pragma pack (packed_bits);
> pragma atomic_components (packed_bits);
>
> 1) RMW assumes that the contents on read are the same as write. When
> dealing with I/O interfaces, this is not always true.

No, you have to follow the prescription exactly. And although it is possible that some chipsets get this wrong, the ISA specifies what is done exactly because it is used in interfacing between multiple CPUs and CPUs and I/O devices. Oh, and it is about 50 times faster on a Hammer (AMD Athlon64, Turion, or Opteron) CPU because all memory access goes through CPU caches. So if the memory is local to the CPU, it just has to do the RMW in cache, and any other writes to the location can't interrupt. Technically the cache line containing the array is Owned by the thread that executes the locked RMW instruction. This means that the data migrates to the local cache, and the CPU connected to the memory has a Shared copy in cache. (Reads are not an issue, they either see the previous state of the array, or the final state.)

To repeat, on x86, you must use an AND or OR instruction where the first argument is the bit array you want treated as atomic. (The second argument--the mask--can be a register or an immediate constant.) You must use the LOCK prefix byte, and the page containing the array must be marked as uncacheable. (Yes, Hammer chips cache them anyway, but enforce the atomicity rules.
In fact they go a bit further, and don't even allow other reads during the few CPU clocks the cycle takes. If you read a Shared cache line, the read causes a cache snoop that can invalidate the read, and cause the instruction to be retried.)

> 2) Without a data source for the other bits, the operation is not
> atomic.

Did you miss the fact that you have to use an AND or OR instruction with a memory address as the first argument to use the LOCK prefix? This ensures that the read and write are seen as atomic by the CPU. Marking the memory as uncacheable is necessary if there are other CPUs and/or I/O devices involved. This ensures that the memory line is locked with Intel CPUs and must be locally Owned by AMD CPUs.

If you really think this doesn't work, look at some driver code. I've avoided giving example programs, because I'd also need to supply hardware to test the code.

****************************************************************

From: Bibb Latting
Sent: Friday, March 31, 2006 4:44 PM

> If you really think this doesn't work, look at some driver code. I've
> avoided giving example programs, because I'd also need to supply hardware
> to test the code.

I *really* think that this doesn't *always* work. I understand the mechanization of memory access that you describe: indeed today there are usually adequate means to obtain exclusive access to a memory element, which when combined with suitable cache management allows implementation of volatile/atomic accesses.

However, the underlying assumption is that the address referenced returns the last value written. I'm saying that this isn't always true for memory mapped I/O. An example I encountered was the SCC2692 a number of years ago. It was a really *cheap* chip with 16 bytes of address space. The problem is that the chip doesn't have enough address space to provide both read-back of control registers and adequate status.
To work around the problem, the Read/Write line was multiplexed: when you write to the chip you're accessing one register; when you read, you're accessing a different register. So, there are two objects, one for write and another for read, at the *same address*.

In terms of C.6, I'm treating (perhaps incorrectly) every addressable element as a variable, which becomes "shared" by application of volatile/atomic.

****************************************************************

From: Robert I. Eachus
Sent: Friday, March 31, 2006 7:59 PM

Ah! I guess I mixed you up by going from the general to the specific case. The Intel 8086, 8088, and 80186 were not designed to support (demand paged) virtual memory, although it could be done. The Intel 80286 was designed to do so, but to call the support a kludge is an insult to most kludges. Since the 80386, and in chip years that is a long time ago, the mechanism I described has been supported as part of the ISA. Right now the AMD and Intel implementations are very different, but the same code will work on all PC compatible CPUs.

There may be non-x86 compatible hardware out there that is not capable of correctly doing the (single) bit flipping. But I think that from a language design point of view, we should realize that most CPUs out there will support the packed array of Boolean special case. I would rather have the RM require it for Real-Time Annex support, and allow compilers for non-conforming hardware to document that. For example, there is an erratum for the Itanium2 IA-32 execution layer (#14 on page 67 of http://download.intel.com/design/Itanium2/specupdt/25114140.pdf). But that just means you shouldn't try to run real-time code in IA-32 emulation mode on an Itanium2 CPU. ;-)

Incidentally, notice that there is a lot of magic that goes on in operating systems that may prevent a program from doing this bit-twiddling. That's fine. If a program that uses the Real-Time Annex needs special permissions, document them and move on.
I personally think that there is no reason for an OS not to satisfy a user request for an uncacheable (UC) page. It is necessary for real-time code, and harmless otherwise. Especially on the AMD Hammer CPUs, there is no reason to restrict user access to UC pages and/or the LOCK prefix. The actual locking lasts a few nanoseconds. (The memory location will be read, ownership, if necessary, transferred to the correct CPU and process. Then the locked RMW cycle takes place in the L1 data cache. Unlocked writes to the bit array can occur during the change of ownership, but the copy used in the RMW cycle is the latest version.)

****************************************************************

From: Randy Brukardt
Sent: Friday, March 31, 2006 8:36 PM

> Ah! I guess I mixed you up by going from the general to the specific
> case.

No, you missed his point altogether. It doesn't have anything to do with the CPU! The point is that memory-mapped hardware often doesn't act like memory at all; in particular a location may not be readable or writable or (worse) may return something different when read after writing. You can't make bit-mapped atomic writing work at all in such circumstances, no matter what CPU locking is provided. You are suggesting using

   Lock Or [Mem],16#10#

to set just the fifth bit atomically, but this cannot work on memory-mapped hardware that doesn't allow reading! You'll set the other bits to whatever random junk, not the correct values.

Now, the question is what this has to do with the language. You seem to want to insist that compilers support this. But compiler vendors have no control over what hardware their customers build/use. If your rule was adopted, about all vendors could do is put "don't use Atomic_Components with memory-mapped hardware that can only be written" in their manual.
But this is nasty; Atomic and Atomic_Components exist in large part because of memory-mapped hardware, and here you're trying to tell people to not use one of them exactly when they are most likely to do so. That doesn't seem to be a good policy.

It seems better to me to require users to read/write full storage units in this case, using an appropriate record or array type. There's much less risk of problems in that case. Funny hardware seems to be quite prevalent (remember that we had a long discussion on whether an atomic read/write could read two bytes instead of one word); we have to recognize that.

****************************************************************

From: Robert Dewar
Sent: Saturday, April 1, 2006 3:05 AM

> There may be non-x86 compatible hardware out there that is not capable
> of correctly doing the (single) bit flipping. But I think that from a
> language design point of view, we should realize that most CPUs out
> there will support the packed array of Boolean special case.

I must say I am puzzled, what code do you have in mind for supporting

   type x is array (1 .. 8) of Boolean;
   pragma Pack (x);
   pragma Atomic_Components (x);
   ...
   ...
   x (j) := k;

this seems really messy to me.

****************************************************************

From: Robert A. Duff
Sent: Saturday, April 1, 2006 8:41 AM

> I *really* think that this doesn't *always* work. I understand the
> mechanization of memory access that you describe: indeed today there are
> usually adequate means to obtain exclusive access to a memory element, which
> when combined with suitable cache management allows implementation of
> volatile/atomic accesses.

That makes sense. I never thought packed bitfields could be atomic.

But I'm confused. Atomic implies volatile, by C.6(8), "...In addition, every atomic type or object is also defined to be volatile."
Then C.6(20) says:

   20 {external effect (volatile/atomic objects) [partial]} The external
   effect of a program (see 1.1.3) is defined to include each read and update
   of a volatile or atomic object. The implementation shall not generate any
   memory reads or updates of atomic or volatile objects other than those
   specified by the program.

(where "volatile or atomic" means "volatile [or atomic]").

Packed bitfields CAN be volatile. But if we want to write upon a packed bitfield, we must read a whole word first, on most hardware (whether by an explicit load into a register, or an implicit read like the "LOCK OR" instruction Robert Eachus mentioned). Right? So how can one implement volatile bitfields in the way required by C.6(20)?

Then C.6(22/2) says:

   Implementation Advice

   22/2 {AI95-00259-01} A load or store of a volatile object whose size is a
   multiple of System.Storage_Unit and whose alignment is nonzero, should be
   implemented by accessing exactly the bits of the object and no others.

(where "volatile" means "volatile [or atomic]", this time ;-)).

Is this not implied by C.6(20)? Obviously, I misunderstand what C.6(20) is (intended to) mean.

****************************************************************

From: Robert Dewar
Sent: Saturday, April 1, 2006 8:49 AM

> Packed bitfields CAN be volatile. But if we want to write upon a packed
> bitfield, we must read a whole word first, on most hardware (whether by an
> explicit load into a register, or an implicit read like the "LOCK OR"
> instruction Robert Eachus mentioned). Right? So how can one implement
> volatile bitfields in the way required by C.6(20)?

If C.6(20) requires volatile bit-fields, it is just junk.
Implementors don't pay attention to junk :-)

****************************************************************

From: Tucker Taft
Sent: Saturday, April 1, 2006 9:38 AM

My interpretation of C.6(20) would be: If the program includes an update to a bit field, and that requires a read/modify/write sequence on the given hardware, then that is not a violation of the requirement that:

   The implementation shall not generate any memory reads
   or updates of atomic or volatile objects other than
   those specified by the program.

The read/modify/write sequence has been "specified" by the program. If the bit fields were atomic, then that would require that the read/modify/write sequence be "indivisible."

To take advantage of C.6(20) to deal with "active" memory locations, I think the programmer has to know whether the hardware requires a read/modify/write sequence for the given size of object. If so, then they had better be sure that that sequence works for their memory-mapped device.

It is not clear how you can say enough in the reference manual to make all of this portable. Hardware differs enough that this will require some issues that can't realistically be addressed without hardware-specific documentation.

****************************************************************

From: Robert A. Duff
Sent: Saturday, April 1, 2006 9:43 AM

> If C.6(20) requires volatile bit-fields, it is just junk. Implementors
> don't pay attention to junk :-)

Well, it's apparently the intent, given this AARM annotation:

   22.b/2 Reason: Since any object can be a volatile object, including packed
   array components and bit-mapped record components, we require the
   above only when it is reasonable to assume that the machine can
   avoid accessing bits outside of the object.

I also just noticed:

   21 If a pragma Pack applies to a type any of whose subcomponents are
   atomic, the implementation shall not pack the atomic subcomponents more
   tightly than that for which it can support indivisible reads and updates.
which seems to answer the original question. (Sorry if somebody already pointed this out, and I missed it.) Note that (21) is for atomic, not volatile.

****************************************************************

From: Robert Dewar
Sent: Saturday, April 1, 2006 10:12 AM

> The read/modify/write sequence has been "specified" by the
> program. If the bit fields were atomic, then that would
> require that the read/modify/write sequence be "indivisible."

I really think that's strange, to me if you have a volatile variable, then reads should be reads and writes should be writes.

****************************************************************

From: Robert Dewar
Sent: Saturday, April 1, 2006 10:13 AM

>> If C.6(20) requires volatile bit-fields, it is just junk. Implementors
>> don't pay attention to junk :-)
>
> Well, it's apparently the intent, given this AARM annotation:
>
> 22.b/2 Reason: Since any object can be a volatile object, including packed
> array components and bit-mapped record components, we require the
> above only when it is reasonable to assume that the machine can
> avoid accessing bits outside of the object.

How does this compare with the C rules, for interest? It seems obvious to me that volatile in Ada should mean the same as volatile in C.

****************************************************************

From: Robert A. Duff
Sent: Saturday, April 1, 2006 10:55 AM

I agree.

C doesn't have packed arrays, but it does have arrays of bytes (char), which might require a read to write deep down in the hardware. It has bitfields in structs.

I'm not sure what the rules are for "volatile", but I have heard people claim that whatever they are, even the C language lawyers can't understand them and/or don't agree on what they mean, neither formally nor informally.
;-)

****************************************************************

From: Tucker Taft
Sent: Saturday, April 1, 2006 11:57 AM

Here's what the GNU C reference manual says about volatile:

   The volatile qualifier tells the compiler to not optimize use of the
   variable by storing its value in a cache, but rather to fetch its value
   afresh each time it is used. Depending on the application, volatile
   variables may be modified autonomously by external hardware devices.

So they are focusing on requiring that no caching is performed. They make no mention of reading or writing *more* than is specified by the program. They want to be sure you don't read any *less* than specified.

As far as atomic, some versions of C have sig_atomic_t, which is an integer type that is atomic with respect to asynchronous interrupts (i.e. signals). As far as I know, there is no such thing as an atomic bit field in C.

****************************************************************

From: Robert A. Duff
Sent: Saturday, April 1, 2006 3:34 PM

Thanks for looking that up. Interesting.

Of course "GNU C" is not "the C standard". And of course, there are different versions of the C standard that might be relevant. I'm too lazy to look it up, and anyway, I suppose I'd have to fork over hundreds of dollars to ISO to do so?

I agree with your earlier comment, that given all the myriad hardware out there, we cannot hope to nail down every detail in the definitions of atomic and volatile.

****************************************************************

From: Robert Dewar
Sent: Saturday, April 1, 2006 4:25 PM

I disagree, we can have a clear semantic model (especially critical for atomic), and if hardware cannot accommodate this model, then the pragma must be rejected. So I think that is far too pessimistic.

****************************************************************

From: Randy Brukardt
Sent: Saturday, April 1, 2006 6:01 PM

That's certainly true for Atomic.
But Volatile must always be accepted (there is no rule that it can be rejected based on the characteristics of the type), and the model is that compilers do their best to implement it, whatever that is.

We added Implementation Advice (in Ada 2005) to avoid reading/writing extra bits, so any cases where that happens have to be documented. That should be enough encouragement to avoid it when possible. But we still want to allow any object to be volatile. (This was all discussed extensively with AI-259.) Indeed, this is the only significant difference between Atomic and Volatile -- otherwise there wouldn't be a need for both.

****************************************************************

From: Robert Dewar
Sent: Saturday, April 1, 2006 7:32 PM

> That's certainly true for Atomic. But Volatile must always be accepted (there is
> no rule that it can be rejected based on the characteristics of the type),

well then that's an obvious mistake, and sure there is such a rule, you don't have to do anything if it's not practical to do so.

> and the model is that compilers do their best to implement it, whatever that is.

I find that model absurd.

> We added Implementation Advice (in Ada 2005) to avoid reading/writing extra
> bits,

well of course this should be a fundamental requirement of volatile to me.

> so any cases where that happens have to be documented. That should be enough
> encouragement to avoid it when possible. But we still want to allow any object
> to be volatile. (This was all discussed extensively with AI-259.) Indeed, this is
> the only significant difference between Atomic and Volatile -- otherwise there
> wouldn't be a need for both.

I cannot believe you just said that!! Of course there is a need for both, they serve totally different functions.

The point of atomic is that the read or write can be done in a single instruction. *That's* what distinguishes volatile from atomic. This allows various synchronization algorithms based on shared variables.
See Norm Shulman's PhD thesis for a very thorough treatment of this subject.

So for example, an array of ten integers can be volatile, but it takes ten reads to read it, so it cannot be atomic.

Or for a concrete example, if you have a bounded buffer with a reader and a writer not explicitly synchronized, then the buffer must be volatile, otherwise the algorithm obviously fails, but the head and tail pointers must be atomic (otherwise the algorithm fails because of race conditions). These two needs are quite quite different.

The idea that a single bit in a bit array that has to be assigned with a read/mask/store sequence can be called volatile seems completely silly to me. Fortunately, as far as I can tell, this nonsense language lawyering has zero effect on an implementation.

****************************************************************

From: Robert I. Eachus
Sent: Sunday, April 2, 2006 3:32 AM

> The point of atomic is that the read or write can be done in a single
> instruction. *That's* what distinguishes volatile from atomic. This
> allows various synchronization algorithms based on shared variables.
> See Norm Shulman's PhD thesis for a very thorough treatment of
> this subject.

I seem to have missed a day of Sturm und Drang. But Robert Dewar put his finger on the semantic disconnects. With modern hardware bit vectors can be *atomic*--updated with a single, uninterruptible CPU instruction, that is also atomic from the point of view of the memory system. Note that the cache manipulations that go on to cause this to occur may be complex, but from our language lawyer point of view, all that matters is the result. On modern hardware, a read may result in 256 bytes being loaded into cache. Not an issue for atomic, as long as changes to the object are atomic from the point of view of the programmer.

That means that the meaning of atomic may be different in a compiler that supports Annex D.
Of course, now that dual-core CPUs are becoming more common, all compilers may have to ensure that atomic works in the presence of multiple CPUs or CPU cores. (And I/O devices as well.)

I may have started the confusion by saying that to get atomic behavior in any x86 multiple core environment, you have to ensure that the bit array is stored in UC (uncacheable) memory. But in this case, that has nothing to do with volatile--and on the AMD Hammer processors nothing to do with whether or not the bit array can be cached! It is just that the ISA only requires uninterruptible semantics for UC memory. Or to turn that around, not all memory need support atomic updates, but memory must be marked UC for the LOCK prefix to have the expected semantics. (Well, there are circumstances where the OS will handle the exception and provide the expected semantics, but that is more likely to involve server virtualization than memory that is actually unlockable.)

> So for example, an array of ten integers can be volatile, but it
> takes ten reads to read it, so it cannot be atomic.
>
> Or for a concrete example, if you have a bounded buffer with a
> reader and a writer not explicitly synchronized, then the buffer
> must be volatile, otherwise the algorithm obviously fails, but
> the head and tail pointers must be atomic (otherwise the
> algorithm fails because of race conditions). These two needs
> are quite quite different.

I hope everyone now understands atomic, because this example shows how complex volatile has become! There is the type of hardware volatile memory that Bibb Latting was talking about. However, modern hardware doesn't do single-bit reads and writes. Hardware switches and status bits are collected into registers. A particular register may have bits that are not writeable, and when you write a (32-bit?) word to that location, only the settable bits are changed. Where these registers are internal to the CPU, they usually require special instructions to read or write them.
With I/O devices, the registers will be addressable as memory, but again the semantics of reading from and/or writing to those locations is going to be hardware specific. In a perfect world, these operations will all be provided as well documented code-inserts or intrinsic functions.

In the case above, volatile has a much different--but also necessary--meaning. Whether or not the data is cached is not important--well it is important if you need speed. What is important is that all CPU cores, (Ada tasks, and) hardware processes see the same data. At this point I really need to talk about cache coherency strategies. AMD uses MOESI (Modified, Owned, Exclusive, Shared, Invalid), while Intel uses MESI (skip the Owner state). What Robert Dewar's example above needs (in the MESI case) is that the bounded buffer *and* the head and tail pointers must be marked as Shared, or as Modified in one cache, and Invalid in all others.

The MOESI protocol allows one copy to be marked as Owned, and the others to be either Shared or Invalid. In the AMD MOESI implementation, updating the owner's copy causes any other copies to first be marked Invalid before the write to the Owned copy completes, then the new value will be broadcast to the other chips and cores. Those that have a (now Invalid) cached copy will update it and mark it again Shared. What if you want to write to a Shared copy? You must first take Ownership. MESI is faster if the next CPU to update the Shared data is random, MOESI Owner state is much, much faster if most updates are localized. (In other words, the CPU (core) that last updated the object is most likely to be the next updater.)

Maybe we need to resurrect pragma Shared for this case, and use Volatile to imply the hardware case. Notice that with modern hardware, if all you need is the Shared cache state, then you will often get much better performance, if you write the code that way.
(Using Volatile where Shared is appropriate will generate correct but pessimistic code.) This is a case where the hardware is evolving and we need the language to evolve to match. Right now, you need an AMD Hammer CPU to get major speedups, but Intel's Conroe will have a shared L2 cache between cores, and each core will be able to access data in the L1 data cache of the other core. In fact, it may be worthwhile to create real code for Robert Dewar's example, and time it in various hardware configurations. The difference can be a factor of thirty or more.

And by the way, since modern CPUs manage data in cache lines, it is worth knowing the sizes of those lines. Intel uses 256 byte lines in their L2 and L3 caches, but some Intel CPUs have 64-byte L1 data cache lines. AMD uses 64 byte cache lines throughout. However, in practice there is little if any difference. AMD's CPUs typically request two cache lines (128 bytes) and only terminate the request after the first line if there is another pending request. Intel requests 256 bytes, but will stop after 128 bytes if there is a pending request. (Intel's L2 cache lines can store a half-line, with the other half empty.)

Both AMD and Intel support 'uncached' reads and writes intended to avoid cache pollution. But the smallest guaranteed read or write amount is 128 bits (16 bytes). So any x86 compiler that allows pragma Volatile for in-memory objects smaller than 16 bytes is probably living in a state of sin. ;-)

****************************************************************

From: Jean-Pierre Rosen
Sent: Sunday, April 2, 2006 4:25 PM

> I also just noticed:
>
> 21 If a pragma Pack applies to a type any of whose subcomponents are
> atomic, the implementation shall not pack the atomic subcomponents more
> tightly than that for which it can support indivisible reads and updates.
>
> which seems to answer the original question.

Not really. The question was about independent addressability.
You can have indivisible updates without independent addressability.

****************************************************************

From: Tucker Taft
Sent: Monday, May 21, 2007 8:11 AM

You must not use "must" in an ISO standard. You shall use "shall" instead... ;-)

(Although you didn't violate this one, you may not use "may not" either. You shall use "shall not" or you might use "might not" instead.)

> ...
> !wording
>
> 13.2 (6.1/2) is renumbered 13.2 (7.1/3) and reads:
>
> For a packed type that has a component that is of a by-reference type,
> aliased, volatile or atomic, the component must be aligned according to

Please fully "comma-ize" lists of more than two elements. Hence, "... volatile, or atomic, ..."

> the alignment of its subtype; in particular it must be aligned on a
> storage element boundary.

Why does this last part follow? Can't a subtype have an alignment of zero?

>
> 13.2 (9) append:
>
> If the array component must be aligned according to its subtype and the
> results of packing are not so aligned, pragma pack should be rejected.

This is worded somewhat ambiguously, here using "must" when probably some other word would make more sense.

[Editor's note: These editorial changes were made in version /02 of the AI05; this is version /01 of the AI12.]

****************************************************************