CVS difference for ai05s/ai05-0012-1.txt

Differences between 1.1 and version 1.2

--- ai05s/ai05-0012-1.txt	2006/04/01 06:59:17	1.1
+++ ai05s/ai05-0012-1.txt	2006/04/20 05:30:39	1.2
@@ -2,7 +2,8 @@
 !standard 13.1(25/2)
 !standard 13.1(26/2)
 !standard 13.2(6.1/2)
-!standard C.7(10)
+!standard C.6(10)
+!standard C.6(21)
 !class binding interpretation 06-03-31
 !status work item 06-03-31
 !status received 06-03-30
@@ -41,8 +42,10 @@
     Alignment of its subtype; in particular it need not be allocated on
     a storage element boundary.
 
-Add "and independent" after "indivisible" in C.7(10).
+Add "and independent" after "indivisible" in C.6(10).
 
+Delete C.6(21).
+
 !discussion
 
 13.2(6.1/2) conflicts with the Recommended Level of Support. Changing the
@@ -50,10 +53,15 @@
 of types with representation clauses to change silently when "aliased" is added
 or deleted. The pragma Pack can be rejected anyway based on 13.1(24-26); there is
 no reason to duplicate those permissions here.
+
+Similarly, C.6(21) conflicts with the Recommended Level of Support. We don't want
+the representation of a packed array of Boolean to depend on other keywords
+(like aliased) or pragmas that apply to the type. (That could cause silent
+representation changes during maintenance.)
 
-This change makes C.7(11) redundant. [Should it be deleted? - RLB]
+This change makes C.6(11) redundant. [Should it be deleted? - RLB]
 
-[Potentially, the change to C.7(10) could be removed if the resolution of AI05-0009
+[Potentially, the change to C.6(10) could be removed if the resolution of AI05-0009
 covers it. - RLB.]
 
 !corrigendum 13.1(24/2)
@@ -108,7 +116,7 @@
 Alignment of its subtype; in particular it need not be allocated on
 a storage element boundary.
 
-!corrigendum C.7(10)
+!corrigendum C.6(10)
 
 @drepl
 It is illegal to apply either an Atomic or Atomic_Components pragma to an object or type
@@ -119,6 +127,13 @@
 if the implementation cannot support the indivisible and independent reads and updates
 required by the pragma (see below).
 
+!corrigendum C.6(21)
+
+@ddel
+If a pragma Pack applies to a type any of whose subcomponents are atomic, the
+implementation shall not pack the atomic subcomponents more tightly than that
+for which it can support indivisible reads and updates.
+
 !ACATS test
 
 Since this only allows (rather than requires) an implementation to reject something,
@@ -1010,4 +1025,428 @@
 
 ****************************************************************
 
+From: Robert Dewar
+Sent: Saturday, April  1, 2006  3:05 AM
+
+> There may be non-x86 compatible hardware out there that is not capable 
+> of correctly doing the (single) bit flipping.  But I think that from a 
+> language design point of view, we should realize that most CPUs out 
+> there will support the packed array of Boolean special case.
+
+I must say I am puzzled, what code do you have in mind for
+supporting
+
+    type x is array (1 .. 8) of Boolean;
+    pragma Pack (x);
+    pragma Atomic_Components (x);
+    ...
+    ...
+    x (j) := k;
+
+this seems really messy to me
+
+****************************************************************
+
+From: Robert A. Duff
+Sent: Saturday, April  1, 2006  8:41 AM
+
+> I *really* think that this doesn't *always* work.  I understand the
+> mechanization of memory access that you describe: indeed today there are
+> usually adequate means to obtain exclusive access to a memory element, which
+> when combined with suitable cache management allows implementation of
+> volatile/atomic accesses.
+
+That makes sense.  I never thought packed bitfields could be atomic.
+
+But I'm confused.
+
+Atomic implies volatile, by C.6(8), "...In addition, every atomic type or
+object is also defined to be volatile."  Then C.6(20) says:
+
+  20    {external effect (volatile/atomic objects) [partial]} The external
+  effect of a program (see 1.1.3) is defined to include each read and update of
+  a volatile or atomic object. The implementation shall not generate any memory
+  reads or updates of atomic or volatile objects other than those specified by
+  the program.
+
+(where "volatile or atomic" means "volatile [or atomic]").
+
+Packed bitfields CAN be volatile.  But if we want to write upon a packed
+bitfield, we must read a whole word first, on most hardware (whether by an
+explicit load into a register, or an implicit read like the "LOCK OR"
+instruction Robert Eachus mentioned).  Right?  So how can one implement
+volatile bitfields in the way required by C.6(20)?
+
+Then C.6(22/2) says:
+
+                            Implementation Advice
+
+  22/2  {AI95-00259-01} A load or store of a volatile object whose size is a
+  multiple of System.Storage_Unit and whose alignment is nonzero, should be
+  implemented by accessing exactly the bits of the object and no others.
+
+(where "volatile" means "volatile [or atomic]", this time ;-)).
+
+Is this not implied by C.6(20)?  Obviously, I misunderstand what C.6(20) is
+(intended to) mean.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Saturday, April  1, 2006  8:49 AM
+
+> Packed bitfields CAN be volatile.  But if we want to write upon a packed
+> bitfield, we must read a whole word first, on most hardware (whether by an
+> explicit load into a register, or an implicit read like the "LOCK OR"
+> instruction Robert Eachus mentioned).  Right?  So how can one implement
+> volatile bitfields in the way required by C.6(20)?
+
+If C.6(20) requires volatile bit-fields, it is just junk. Implementors
+don't pay attention to junk :-)
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Saturday, April  1, 2006  9:38 AM
+
+My interpretation of C.6(20) would be:
+
+   If the program includes an update to a bit field, and
+   that requires a read/modify/write sequence on the given
+   hardware, then that is not a violation of the requirement
+   that:
+
+       The implementation shall not generate any memory
+       reads or updates of atomic or volatile objects other
+       than those specified by the program.
+
+   The read/modify/write sequence has been "specified" by the
+   program.  If the bit fields were atomic, then that would
+   require that the read/modify/write sequence be "indivisible."
+
+To take advantage of C.6(20) to deal with "active" memory
+locations, I think the programmer has to know whether the
+hardware requires a read/modify/write sequence for the given
+size of object.  If so, then they better be sure that that
+sequence works for their memory-mapped device.  It is not
+clear how you can say enough in the reference manual to make
+all of this portable.  Hardware differs enough that this
+will raise some issues that can't realistically be addressed
+without hardware-specific documentation.
+
+****************************************************************
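[Editor's sketch, not part of the original thread. The read/modify/write sequence discussed above can be illustrated in C, the language the thread later compares against. The type names `packed_flags` and `atomic_flags`, the one-bit-per-Boolean layout, and both function names are assumptions made for illustration only. The first function shows why updating one packed bit touches the whole byte. The second shows one way to make that sequence indivisible with C11 atomics, which is only possible where the target supports it.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Eight Booleans packed one per bit into a single byte (assumed layout). */
typedef struct { volatile uint8_t bits; } packed_flags;

/* Plain update of bit j: the smallest store is a byte, so the code has to
   read the byte, change one bit, and write the byte back. Another writer
   can slip in between the read and the write, so this is not indivisible. */
void set_bit_rmw(packed_flags *p, unsigned j, bool k)
{
    assert(j < 8);
    uint8_t mask = (uint8_t)(1u << j);
    uint8_t word = p->bits;          /* read   */
    if (k)
        word |= mask;                /* modify */
    else
        word &= (uint8_t)~mask;
    p->bits = word;                  /* write  */
}

/* Same packing, but the byte is a C11 atomic object. */
typedef struct { _Atomic uint8_t bits; } atomic_flags;

/* Indivisible update of bit j: the whole read/modify/write is one atomic
   operation on the containing byte. The other seven bits are still read
   and rewritten, so the components are not independently addressable. */
void set_bit_atomic(atomic_flags *p, unsigned j, bool k)
{
    assert(j < 8);
    uint8_t mask = (uint8_t)(1u << j);
    if (k)
        atomic_fetch_or(&p->bits, mask);
    else
        atomic_fetch_and(&p->bits, (uint8_t)~mask);
}
```

[In both versions the neighbouring bits are read and rewritten. That is the sense in which a memory-mapped device has to tolerate the sequence, whether or not it is indivisible.]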
+
+From: Robert A. Duff
+Sent: Saturday, April  1, 2006  9:43 AM
+
+> If C.6(20) requires volatile bit-fields, it is just junk. Implementors
+> don't pay attention to junk :-)
+
+Well, it's apparently the intent, given this AARM annotation:
+
+    22.b/2 Reason: Since any object can be a volatile object, including packed
+          array components and bit-mapped record components, we require the
+          above only when it is reasonable to assume that the machine can
+          avoid accessing bits outside of the object.
+
+I also just noticed:
+
+21    If a pragma Pack applies to a type any of whose subcomponents are
+atomic, the implementation shall not pack the atomic subcomponents more
+tightly than that for which it can support indivisible reads and updates.
+
+which seems to answer the original question.  (Sorry if somebody already
+pointed this out, and I missed it.)  Note that (21) is for atomic, not
+volatile.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Saturday, April  1, 2006  10:12 AM
+
+>   The read/modify/write sequence has been "specified" by the
+>   program.  If the bit fields were atomic, then that would
+>   require that the read/modify/write sequence be "indivisible."
+
+I really think that's strange, to me if you have a volatile
+variable, then reads should be reads and writes should be
+writes.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Saturday, April  1, 2006  10:13 AM
+
+>> If C.6(20) requires volatile bit-fields, it is just junk. Implementors
+>> don't pay attention to junk :-)
+> 
+> Well, it's apparently the intent, given this AARM annotation:
+> 
+>     22.b/2 Reason: Since any object can be a volatile object, including packed
+>           array components and bit-mapped record components, we require the
+>           above only when it is reasonable to assume that the machine can
+>           avoid accessing bits outside of the object.
+
+How does this compare with the C rules, out of interest? It seems obvious
+to me that volatile in Ada should mean the same as volatile in C.
+
+****************************************************************
+
+From: Robert A. Duff
+Sent: Saturday, April  1, 2006 10:55 AM
+
+I agree.  C doesn't have packed arrays, but it does have arrays of bytes
+(char), which might require a read to write deep down in the hardware.
+It has bitfields in structs.  I'm not sure what the rules are for "volatile",
+but I have heard people claim that whatever they are, even the C language
+lawyers can't understand them and/or don't agree on what they mean, neither
+formally nor informally.  ;-)
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Saturday, April  1, 2006 11:57 AM
+
+Here's what the GNU C reference manual says about volatile:
+
+     The volatile qualifier tells the compiler to not optimize
+     use of the variable by storing its value in a cache, but
+     rather to fetch its value afresh each time it is used.
+     Depending on the application, volatile variables may be
+     modified autonomously by external hardware devices.
+
+So they are focusing on requiring that no caching is performed.
+They make no mention of reading or writing *more* than is specified
+by the program.  They want to be sure you don't read any *less*
+than specified.
+
+As far as atomic, some versions of C have sig_atomic_t, which is
+an integer type that is atomic with respect to asynchronous
+interrupts (i.e. signals).  As far as I know, there is no
+such thing as an atomic bit field in C.
+
+****************************************************************
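[Editor's sketch, not part of the original thread. The `sig_atomic_t` type mentioned above is typically used as a flag shared between a signal handler and the main program. The names `got_signal`, `on_signal` and `flag_demo` are hypothetical. The example relies only on standard C: `signal`, `raise`, and the rule that `raise` does not return until the handler has returned.]

```c
#include <assert.h>
#include <signal.h>

/* volatile: the compiler must re-read the flag on every use.
   sig_atomic_t: a read or write cannot be interrupted halfway by a signal. */
static volatile sig_atomic_t got_signal = 0;

/* Handler: only sets the flag, which is all a portable handler may do
   with an object of static storage duration. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Installs the handler, raises SIGINT synchronously, and returns the flag.
   Returns -1 if the handler could not be installed. */
int flag_demo(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    int rc = raise(SIGINT);   /* the handler runs before raise returns */
    assert(rc == 0);
    return (int)got_signal;
}
```

[The flag is a whole integer object. C offers nothing comparable for a single bit field, which matches the observation above.]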
+
+From: Robert A. Duff
+Sent: Saturday, April  1, 2006  3:34 PM
+
+Thanks for looking that up.  Interesting.
+
+Of course "GNU C" is not "the C standard".  And of course, there are different
+versions of the C standard that might be relevant.  I'm too lazy to look it
+up, and anyway, I suppose I'd have to fork over hundreds of dollars to ISO to
+do so?
+
+I agree with your earlier comment, that given all the myriad hardware out
+there, we cannot hope to nail down every detail in the definitions of atomic
+and volatile.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Saturday, April  1, 2006  4:25 PM
+
+I disagree, we can have a clear semantic model (especially critical for
+atomic), and if hardware cannot accommodate this model, then the pragma
+must be rejected. So I think that is far too pessimistic.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Saturday, April  1, 2006  6:01 PM
+
+That's certainly true for Atomic. But Volatile must always be accepted (there is
+no rule that it can be rejected based on the characteristics of the type), and
+the model is that compilers do their best to implement it, whatever that is.
+
+We added Implementation Advice (in Ada 2005) to avoid reading/writing extra bits,
+so any cases where that happens have to be documented. That should be enough
+encouragement to avoid it when possible. But we still want to allow any object
+to be volatile. (This was all discussed extensively with AI-259.) Indeed, this is
+the only significant difference between Atomic and Volatile -- otherwise there
+wouldn't be a need for both.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Saturday, April  1, 2006  7:32 PM
+
+> That's certainly true for Atomic. But Volatile must always be accepted (there is
+> no rule that it can be rejected based on the characteristics of the type),
+
+well then that's an obvious mistake, and sure there is such a rule,
+you don't have to do anything if it's not practical to do so.
+
+> and the model is that compilers do their best to implement it, whatever that is.
+
+I find that model absurd
+ 
+> We added Implementation Advice (in Ada 2005) to avoid reading/writing extra
+> bits,
+
+well of course this should be a fundamental requirement of volatile to me
+
+> so any cases where that happens have to be documented. That should be enough
+> encouragement to avoid it when possible. But we still want to allow any object
+> to be volatile. (This was all discussed extensively with AI-259.) Indeed, this is
+> the only significant difference between Atomic and Volatile -- otherwise there
+> wouldn't be a need for both.
+
+I cannot believe you just said that!! Of course there is a need for
+both, they serve totally different functions.
+
+The point of atomic is that the read or write can be done in a single
+instruction. *That's* what distinguishes volatile from atomic. This
+allows various synchronization algorithms based on shared variables.
+See Norm Shulman's PhD thesis for a very thorough treatment of
+this subject.
+
+So for example, an array of ten integers can be volatile, but it
+takes ten reads to read it, so it cannot be atomic.
+
+Or for a concrete example, if you have a bounded buffer with a
+reader and a writer not explicitly synchronized, then the buffer
+must be volatile, otherwise the algorithm obviously fails, but
+the head and tail pointers must be atomic (otherwise the
+algorithm fails because of race conditions). These two needs
+are quite quite different.
+
+The idea that a single bit in a bit array that has to be assigned
+with a read/mask/store sequence can be called volatile seems
+completely silly to me.
+
+Fortunately, as far as I can tell, this nonsense language lawyering
+has zero effect on an implementation.
+
+****************************************************************
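[Editor's sketch, not part of the original thread. The bounded buffer described above, with one reader and one writer and no explicit locking, might look as follows in C11. The names `ring`, `ring_put`, `ring_get` and the capacity are hypothetical. Note one difference from the Ada model under discussion: in C11 the slots need not be volatile, because the release store on an index and the matching acquire load order the data accesses. The indices themselves must be atomic, as the message argues.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CAPACITY 8u   /* assumed size; one slot always stays empty */

typedef struct {
    int slots[CAPACITY];   /* buffer contents                             */
    atomic_size_t head;    /* next slot to read; written by reader only   */
    atomic_size_t tail;    /* next slot to write; written by writer only  */
} ring;

void ring_init(ring *r)
{
    assert(r != NULL);
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Writer side: returns false when the buffer is full. */
bool ring_put(ring *r, int v)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t next = (tail + 1) % CAPACITY;
    if (next == atomic_load_explicit(&r->head, memory_order_acquire))
        return false;
    r->slots[tail] = v;
    /* Publish the slot: the reader sees the data before the new tail. */
    atomic_store_explicit(&r->tail, next, memory_order_release);
    return true;
}

/* Reader side: returns false when the buffer is empty. */
bool ring_get(ring *r, int *out)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    if (head == atomic_load_explicit(&r->tail, memory_order_acquire))
        return false;
    *out = r->slots[head];
    /* Release the slot back to the writer. */
    atomic_store_explicit(&r->head, (head + 1) % CAPACITY,
                          memory_order_release);
    return true;
}
```

[If `head` or `tail` could be read half-updated, the full and empty tests would be unreliable. That is the race the message refers to.]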
+
+From: Robert I. Eachus
+Sent: Sunday, April  2, 2006  3:32 AM
+
+> The point of atomic is that the read or write can be done in a single
+> instruction. *That's* what distinguishes volatile from atomic. This
+> allows various synchronization algorithms based on shared variables.
+> See Norm Shulman's PhD thesis for a very thorough treatment of
+> this subject.
+
+I seem to have missed a day of Sturm und Drang.  But Robert Dewar put 
+his finger on the semantic disconnects.  With modern hardware, bit 
+vectors can be *atomic*--updated with a single, uninterruptible CPU 
+instruction that is also atomic from the point of view of the memory 
+system.   Note that the cache manipulations that go on to cause this to 
+occur may be complex, but from our language lawyer point of view, all 
+that matters is the result.  On modern hardware, a read may result in 
+256 bytes being loaded into cache.  Not an issue for atomic, as long as 
+changes to the object are atomic from the point of view of the 
+programmer.  That means that the meaning of atomic may be different in 
+a compiler that supports Annex D.  Of course, now that dual-core CPUs 
+are becoming more common, all compilers may have to ensure that atomic 
+works in the presence of multiple CPUs or CPU cores.  (And I/O devices 
+as well.)
+
+I may have started the confusion by saying that to get atomic behavior 
+in any x86 multiple core environment, you have to ensure that the bit 
+array is stored in UC (uncacheable) memory.  But in this case, that has 
+nothing to do with volatile--and on the AMD Hammer processors nothing to 
+do with whether or not the bit array can be cached!  It is just that the 
+ISA only requires uninterruptible semantics for UC memory.  Or to turn 
+that around, not all memory need support atomic updates, but memory must 
+be marked UC for the LOCK prefix to have the expected semantics.  (Well, 
+there are circumstances where the OS will handle the exception and 
+provide the expected semantics, but that is more likely to involve server 
+virtualization than memory that is actually unlockable.)
+
+> So for example, an array of ten integers can be volatile, but it
+> takes ten reads to read it, so it cannot be atomic.
+>
+> Or for a concrete example, if you have a bounded buffer with a
+> reader and a writer not explicitly synchronized, then the buffer
+> must be volatile, otherwise the algorithm obviously fails, but
+> the head and tail pointers must be atomic (otherwise the
+> algorithm fails because of race conditions). These two needs
+> are quite quite different.
+
+I hope everyone now understands atomic, because this example shows how 
+complex volatile has become!  There is the type of hardware volatile 
+memory that Bibb Latting was talking about.  However, modern hardware 
+doesn't do single-bit reads and writes.  Hardware switches and status 
+bits are collected into registers.  A particular register may have bits 
+that are not writeable, and when you write a (32-bit?) word to that 
+location, only the settable bits are changed.  Where these registers are 
+internal to the CPU, they usually require special instructions to read 
+or write them. With I/O devices, the registers will be addressable as 
+memory, but again the semantics of reading from and/or writing to those 
+locations is going to be hardware specific.  In a perfect world, these 
+operations will all be provided as well documented code-inserts or 
+intrinsic functions.
+
+In the case above, volatile has a much different--but also 
+necessary--meaning.  Whether or not the data is cached is not 
+important--well, it is important if you need speed.  What is important is 
+that all CPU cores, (Ada tasks, and) hardware processes see the same 
+data.  At this point I really need to talk about cache coherency 
+strategies.  AMD uses MOESI (Modified, Owned, Exclusive, Shared, 
+Invalid), while Intel uses MESI (skip the Owned state).  What Robert 
+Dewar's example above needs (in the MESI case) is that the bounded 
+buffer *and* the head and tail pointers must be marked as Shared, or as 
+Modified in one cache, and Invalid in all others.  The MOESI protocol 
+allows one copy to be marked as Owned, and the others to be either 
+Shared or Invalid.
+
+In the AMD MOESI implementation, updating the owner's copy causes any 
+other copies to first be marked Invalid before the write to the Owned 
+copy completes, then the new value will be broadcast to the other chips 
+and cores.  Those that have a (now Invalid) cached copy will update it 
+and mark it again Shared.  What if you want to write to a Shared copy?  
+You must first take Ownership. MESI is faster if the next CPU to update 
+the Shared data is random; the MOESI Owned state is much, much faster if 
+most updates are localized.  (In other words, the CPU (core) that last 
+updated the object is most likely to be the next updater.)
+
+Maybe we need to resurrect pragma Shared for this case, and use Volatile 
+to imply the hardware case.  Notice that with modern hardware, if all 
+you need is the Shared cache state, then you will often get much better 
+performance if you write the code that way.  (Using Volatile where 
+Shared is appropriate will generate correct but pessimistic code.)  This 
+is a case where the hardware is evolving and we need the language to 
+evolve to match.  Right now, you need an AMD Hammer CPU to get major 
+speedups, but Intel's Conroe will have a shared L2 cache between cores, 
+and each core will be able to access data in the L1 data cache of the 
+other core.  In fact, it may be worthwhile to create real code for 
+Robert Dewar's example, and time it in various hardware configurations.  
+The difference can be a factor of thirty or more.
+
+And by the way, since modern CPUs manage data in cache lines, it is 
+worth knowing the sizes of those lines.  Intel uses 256 byte lines in 
+their L2 and L3 caches, but some Intel CPUs have 64-byte L1 data cache 
+lines.  AMD uses 64 byte cache lines throughout.  However, in practice 
+there is little if any difference.  AMD's CPUs typically request two 
+cache lines (128 bytes) and only terminate the request after the first 
+line if there is another pending request.  Intel requests 256 bytes, but 
+will stop after 128 bytes if there is a pending request.  (Intel's L2 
+cache lines can store a half-line, with the other half empty.)
+
+Both AMD and Intel support 'uncached' reads and writes intended to avoid 
+cache pollution.  But the smallest guaranteed read or write amount is 
+128 bits (16 bytes). So any x86 compiler that allows pragma Volatile for 
+in-memory objects smaller than 16 bytes is probably living in a state of 
+sin. ;-)
+
+****************************************************************
+
+From: Jean-Pierre Rosen
+Sent: Sunday, April  2, 2006  4:25 PM
+
+> I also just noticed:
+> 
+> 21    If a pragma Pack applies to a type any of whose subcomponents are
+> atomic, the implementation shall not pack the atomic subcomponents more
+> tightly than that for which it can support indivisible reads and updates.
+> 
+> which seems to answer the original question.  
+Not really.
+The question was about independent addressability. You can have 
+indivisible updates without independent addressability.
+
+****************************************************************
 
