CVS difference for ais/ai-00259.txt

Differences between 1.4 and version 1.5

--- ais/ai-00259.txt	2003/01/16 02:15:26	1.4
+++ ais/ai-00259.txt	2003/01/24 04:14:27	1.5
@@ -944,3 +944,234 @@
 
 ***********************************************************
 
+From: Robert Eachus
+Sent: Thursday, January 16, 2003  11:12 PM
+
+> I've been working on my homework. At the bottom of this note, you'll find my
+> rewrite of AI-259, based on the minutes of the Bedford meeting.
+>
+> I have two problems with the revised AI:
+> ...
+
+May I say that the whole AI as written is as phony as a three-dollar bill?
+
+Actually I probably shouldn't as that considerably understates the problem.
+There are some processors in embedded and signal-processing applications
+without multi-level caches.  But the reality today is that you had better
+assume that any processor of interest:
+
+1) Has no way of reading less than a cache line of data.  (Yeah, I know.  Some
+processors like the Pentium 4 allow reading half a line into a cache line and
+marking it as such.  But whether or not this happens is usually dependent on
+memory access patterns instead of the specific read issued.  Also, it is
+possible to use PREFETCHNTA and MOVNTQ to move data into and out of the MMX
+registers in some x86 processors bypassing the cache.  But even if those
+instructions are available, they do not promise not to read data into cache,
+only to minimize cache pollution.)
+
+2) Even if the compiler can use draconian means to bypass the caches, do you
+really want that, or get what you want?  The most interesting answer to this is
+the new AMD Opteron (and Athlon 64).  The main memory controller is part of the
+CPU, and all memory accesses go through the cross-bar switch also on the CPU
+chip.  In other words, not only requests from other CPUs, but DMA and other
+memory reads and writes from the video card and I/O devices will be filled from
+L1 or L2 caches if the data is there.  In such a situation, do you even care if
+the flush to main memory ever occurs?
+
+We all know what the AI is trying to say.  But we shouldn't get into the
+problem of overspecification.  If the memory system is coherent, all we care is
+that stale data is not used in I/O or written over more recent data.  How the
+processor and compiler accomplish this doesn't belong in the RM.
+
+***********************************************************
+
+From: Robert I. Eachus
+Sent: Friday, January 17, 2003  8:45 AM
+
+> May I say that the whole AI as written is as phony as a three-dollar bill?
+
+I meant to save this message to work on more today, and apparently sent
+it instead.  The comment above comes across as way too strong,
+but it should have been directed, even if appropriate, at only:
+
+For a volatile object all reads and updates of the object as a whole are
+performed directly to memory, and shall not be combined with reads or
+updates of other objects.
+
+The problem in this case is that it is a completely naive assumption that two
+separate assembler instructions will not be combined in the actual machine code
+(Itanium) or during execution by OoO processors with multiple execution pipes.
+To go into the gory details of what this means, if you use two separate
+"machine instructions" to write values from different registers to memory, the
+processor will treat the instructions as independent and executable by
+different pipes.  In an x86 processor, retirement of instructions is required
+to be "in order" but processors can, and do, retire multiple instructions
+simultaneously.
+
+At this point the write pipe takes over.  Even if you wrote two separate
+instructions, write combining will combine two writes to the same memory
+location.  The 'hypothetical' example assumes writes to two separate bytes.  If
+these bytes are in the same 64-bit, 128-bit word or whatever the actual memory
+access granularity is, the write will eventually be combined, if not by the
+CPU, by the memory controller (Northbridge).
+
+So is there something that should be said here?  Sure, every write to a
+particular volatile location should result in a write instruction, and
+successive writes to the same location cannot be optimized away.  This of
+course is Implementation Advice at best, as I showed with the Opteron example.
+(It is possible for writes to the screen to be only to cache, where the AGP
+card will see them, and we couldn't care less about whether a write to main memory
+ever occurs.)
+
+In another area:
+
+> While I can't see a single byte access taking multiple instructions, it
+> certainly is possible with word registers. In the similar example:
+
+>    X : Word;
+>    for X'Address use .. some mem mapped address
+>    for X'Size use 16;
+>    pragma Atomic (X);
+
+> we want to discourage using a pair of byte writes here.
+
+Correct.  A much better example I think would be an array of Long_Float with
+Volatile_Components.  In a signal processing environment, I certainly don't
+want these writes to be done with a move that uses 32-bit registers, whether or
+not the move is a "single machine instruction".
+
+***********************************************************
+
+From: Tucker Taft
+Sent: Friday, January 17, 2003  10:48 AM
+
+I'm sure it is true that on some machines, the
+concepts discussed in this AI really don't apply.
+However, that doesn't mean the AI isn't useful.
+We aren't just worried about "stale" data.  We
+are worried about the unit in which data is sent
+to the I/O registers.  I have trouble believing
+that many machines which support memory-mapped
+I/O treat reads/writes to I/O in the same way
+they treat read/writes to RAM.  And this whole
+AI is about memory-mapped I/O.  Perhaps that should
+be more explicit.
+
+If you aren't worried about memory-mapped I/O, then
+of course it is safe to gang together atomic accesses.
+The more the merrier (unless of course there is some
+implicit locking going on, and the sequence of the atomic
+accesses affects the lock sequence, and hence might
+affect whether deadlock occurs in the presence of multiple
+tasks).  We are presuming a compiler isn't smart enough
+to know whether a given piece of volatile memory is I/O
+space or not.  Of course, if the compiler knows everything,
+then it can apply the usual "as if" rules, and ignore the
+details of the AI so long as an indistinguishable effect
+is accomplished.
+
+***********************************************************
+
+From: Robert Dewar
+Sent: Friday, January 17, 2003  2:08 PM
+
+<<How the processor and compiler accomplish this doesn't belong in the RM.>>
+
+This is exactly why I prefer the implementation advice approach rather than
+a bogus attempt at a formal requirement.
+
+***********************************************************
+
+From: Robert Dewar
+Sent: Sunday, January 19, 2003  9:43 AM
+
+>Correct.  A much better example I think would be an array of Long_Float with
+>Volatile_Components.  In a signal processing environment, I certainly don't
+>want these writes to be done with a move that uses 32-bit registers, whether
+>or not the move is a "single machine instruction".
+
+
+Please explain more clearly
+
+a) why would you not want this to be done
+
+b) by what possible reading of the RM or possibly modification to the RM,
+given as-if semantics, could you expect the RM to make sure the compiler
+adheres to your wishes.
+
+Volatile is just about ensuring that stuff gets written or read, not how it
+gets written or read.
+
+***********************************************************
+
+From: Robert Dewar
+Sent: Sunday, January 19, 2003  10:00 AM
+
+> I'm sure it is true that on some machines, the
+> concepts discussed in this AI really don't apply.
+> However, that doesn't mean the AI isn't useful.
+> We aren't just worried about "stale" data.  We
+> are worried about the unit in which data is sent
+> to the I/O registers.  I have trouble believing
+> that many machines which support memory-mapped
+> I/O treat reads/writes to I/O in the same way
+> they treat read/writes to RAM.  And this whole
+> AI is about memory-mapped I/O.  Perhaps that should
+> be more explicit.
+
+I think the whole AI is misdirected if it is concerned with memory mapped
+I/O only. You may have trouble believing that "many machines which support .."
+but in fact it is the normal case that there is nothing special about
+memory-mapped I/O. What may be necessary on some machines is to use special
+instructions to get to mmio or to disable caches etc. But it would be
+quite wrong to have pragma Atomic or Volatile do this kind of mmio required
+special stuff by default, since the utility of Atomic and Volatile extends
+far beyond mmio.
+
+Yes, it is true that in practice the use of pragma Atomic with appropriate
+chosen datatypes may on a specific architecture have the right result but
+it is very difficult to mandate what the right result should be at the
+Ada semantic level in a target independent manner.
+
+***********************************************************
+
+From: Robert Eachus
+Sent: Monday, January 20, 2003  9:05 PM
+
+> Yes, it is true that in practice the use of pragma Atomic with appropriate
+> chosen datatypes may on a specific architecture have the right result but
+> it is very difficult to mandate what the right result should be at the
+> Ada semantic level in a target independent manner.
+
+I think that the sentence would scan better with a comma before "but".
+However, this 48! word sentence sums up what I was trying to say
+perfectly.*  I know what the RM should mean by Atomic and Volatile, but
+there is no possible way to state the correct rules for UltraSPARC,
+PowerPC, and x86 without enumerating cases.  In fact, as I pointed out,
+with the new AMD Hammer architecture, we will have a significantly
+different situation from most x86 processors, even when it is running in
+legacy x86 mode.  (Hammer will run x86 operating systems with the x86-64
+extensions disabled, but it will still cache "uncacheable" memory pages.)
+
+*Flame retardant, I hope unnecessary.  I am complimenting Robert Dewar
+on expressing the key concept so succinctly, not criticizing. ;-)
+
+***********************************************************
+
+From: Robert Dewar
+Sent: Monday, January 20, 2003 11:14 PM
+
+<<*Flame retardant, I hope unnecessary.  I am complimenting Robert Dewar
+on expressing the key concept so succinctly, not criticizing. ;-)  >>
+
+Definitely no flame retardant required; compliment accepted, thank you :-)
+
+Now, the interesting question is, who is there who knows modern
+architectures well who disagrees with that 48 word sentence? Please
+speak up and explain your position.
+
+Far too much has been written on this with a view of architectures
+that disappeared 20 years ago :-)
+
+***********************************************************
+
