CVS difference for ai12s/ai12-0234-1.txt

Differences between 1.2 and version 1.3
Log of other versions for file ai12s/ai12-0234-1.txt

--- ai12s/ai12-0234-1.txt	2017/06/10 04:24:23	1.2
+++ ai12s/ai12-0234-1.txt	2018/03/30 07:55:08	1.3
@@ -470,3 +470,444 @@
 
 ****************************************************************
 
+From: Brad Moore
+Sent: Wednesday, March 28, 2018  12:39 AM
+
+I noticed that there was a fair amount of interest in this AI in the result of
+Randy's straw poll.
+
+I was wondering if my thinking on this, was on track.
+
+Basically, I was thinking that GNAT has implemented a pragma/aspect called
+Lock_Free.
+
+It is documented in the reference manual as follows:
+
+"Syntax: This pragma may be specified for protected types or objects. It
+specifies that the implementation of protected operations must be implemented
+without locks. Compilation fails if the compiler cannot generate lock-free code
+for the operations."
+
+
+I recall that Geert Bosch from Adacore had worked on this a few years ago, and
+that he had written a paper that was presented at one of the conferences that
+explained this feature quite nicely.
+
+My thoughts are that for this AI, we should simply capture the design that
+Adacore currently has implemented and write it up.
+
+Does that make sense?
+
+I know I have at times wanted to use that aspect in my code, but refrained from
+doing so, because it wasn't in the standard, but noticed that there were
+performance benefits when I did experiment with this feature.
+
+My understanding is that many hardware targets support some sort of basic
+compare and swap instruction.
+
+Also, I am the program chair for IRTAW 2018, which is happening in a couple
+weeks. I think this AI might be of interest for discussion with the IRTAW
+people.
+
+****************************************************************
+
+From: Erhard Ploedereder
+Sent: Wednesday, March 28, 2018  7:48 AM
+
+In general, users may need access to the atomic check instruction of an ISA.
+However, these are different on different ISA architectures.
+
+There is compare-and-swap. There is also test-and-set (few ISAs), and there is a
+third one more recently introduced, whose nature I do not recall right now but
+could look up.
+
+Usually, a given ISA supports only one of these ops.
+
+How can you possibly standardize on this without becoming ISA-specific?
+Much better: Target-specific package provided by implementer.
+
+Aside: Because self-made locks are really very dangerous, I would not make it
+easy for users to get at these atomic ops.
+
+****************************************************************
+
+From: Tullio Vardanega
+Sent: Wednesday, March 28, 2018  8:34 AM
+
+I very much concur with Erhard's aside.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Wednesday, March 28, 2018  10:04 AM
+
+> How can you possibly standardize on this without becoming ISA-specific?
+> Much better: Target-specific package provided by implementer.
+
+I agree this is a problem if one tries to provide an interface at too low a
+level. I think Adacore's approach with a boolean Lock_Free aspect that can be
+applied to a protected object is a higher level of abstraction that could
+accommodate mappings to different architectures. This includes providing the
+lock free capability by scheduling or other means, such as could be the case for
+Ravenscar programs if the protected object is known to be only accessed by tasks
+executing on the same processor. To this end, Adacore also added a CPU aspect
+that can be applied to a protected object to specify that the protected object
+can only be accessed tasks running on a specific CPU. This extension to the CPU
+aspect should probably also be a part of this AI.
+
+Paraphrasing from Geert's paper, the goal they started with was to target the
+widest range of systems possible, based on compare and swap or
+load-link/store-conditional primitives.
+
+Their claim is that with the restrictions described below, this feature "can be
+supported by the large majority of current multiprocessor hardware".
+
+To achieve this, they had to apply some restrictions to the use of the Lock_Free
+aspect.
+
+The restrictions include;
+
+- A lock-free protected action cannot have loop, goto, procedure call
+  statements, quantified expressions, or calls to non-static functions.
+
+The reason for this restriction is to bound the time it can take via retries to
+obtain access to the protected object.
+
+- A protected action can only refer to a single component of the protected
+  object, and the size of the component cannot exceed the maximum size for which
+  lock free atomic primitives are supported.
+
+- The protected object cannot access non-local variables. Constants are OK.
+- The parameters to the protected actions must be of elementary types
+- Access values are not allowed.
+- Address clauses are not allowed
+- Imported or exported entities are not allowed
+- Allocators are not allowed.
+- Volatile variables are not allowed
+
+I believe Adacore's implementation also does not currently support protected
+objects that have entries.
+
+If the users code does not meet all of these restrictions, the compilation
+fails. Then either the programmer would try to address the restrictions being
+violated, or remove the Lock_Free aspect and fall back to the default
+implementation for protected objects.
+
+> Aside: Because self-made locks are really very dangerous, I would not
+> make it easy for users to get at these atomic ops.
+
+By building this into the protected object abstraction, I think we'd be
+eliminating the danger, or at least making it a lot safer, while also being a
+lot more portable.
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Wednesday, March 28, 2018  11:22 AM
+
+I do not think we should rely on the Lock_Free pragma on protected objects.  It
+has too many special restrictions.  I think we should provide direct access to
+compare-and-swap on atomic objects.  I believe there is a standard API for this
+which is supported by gcc for C, and I would suggest we provide something
+similar.  I believe the interface to the API is not dependent on the ISA, but of
+course the implementation is.  Atomic compare-and-swap is the most fundamental
+operation when building lock-free data structures, and it is quite annoying that
+this cannot be done portably in Ada.  And yes, this operation is for experts
+only, but that doesn't mean such experts don't want to write in Ada, while still
+achieving some level of portability.
+
+Below is the gcc API, extracted from:
+   https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
+
+I think we don't need all the options.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Wednesday, March 28, 2018  12:23 PM
+
+...
+> Basically, I was thinking that GNAT has implemented a pragma/aspect
+> called Lock_Free.
+...
+> My thoughts are that for this AI, we should simply capture the design
+> that Adacore currently has implemented and write it up.
+>
+> Does that make sense?
+
+That would be an interesting thing to do, but it doesn't have anything to do
+with making updates of atomic objects safe. (Protected object /= atomic object
+:-). I don't think a directive of "don't use atomic objects, use lock_free POs
+instead" would fly -- if it did fly, totally banning atomic objects from "safe"
+tasking code would solve the problem [with or without other features].
+
+Currently, if you have an atomic object A,
+
+    A := A + 1;
+
+is a race condition (as the read and write are atomic, but not the entire
+update). Ada doesn't have atomic updates in any form.
+
+There aren't many algorithms that you can write with atomic objects that don't
+run afoul of this issue. Most implementations have a generic library designed to
+provide the missing update operations, and we wanted to standardize some such
+library so that portable, race condition-free code could be constructed.
+
+****************************************************************
+
+From: Jeff Cousins
+Sent: Thursday, March 29, 2018  5:21 AM
+
+> Currently, if you have an atomic object A,
+>
+>    A := A + 1;
+>
+> is a race condition (as the read and write are atomic, but not the entire update). Ada doesn't have atomic updates in any form.
+
+The x86 family’s LOCK prefix can be applied to increment instructions, as well
+as compare and swap, for just such a common case.  It “would be nice” to have
+usage of such instructions without having to drop into machine code insertions.
+But any such facility would have to be heavily caveated with “if supported by
+the platform” – though we already have something like this for Long_Float and
+Long_Integer, and even within the x86 family it can vary from chip type to chip
+type as to whether your particular lump of silicon actually does the bus
+locking, so it may not be portable.
+
+****************************************************************
+
+From: Erhard Ploedereder
+Sent: Thursday, March 29, 2018  6:26 AM
+
+So, how does the compiler builder implement this atomically on an ISA that does
+not have the CAS-instruction, but instead the test-and-set (or a fetch-and-add)
+instruction?
+
+(Presumably, on such architectures the C++ built-in CASes are simply not
+supported; but CAS does cover many ISA, particularly the x86, so C++ is happy.
+In Ada we shied away from implementation-defined subsetting of the standard
+language.)
+
+****************************************************************
+
+From: Bob Duff
+Sent: Thursday, March 29, 2018   7:46 AM
+
+> The x86 family’s LOCK prefix can be applied to increment instructions,
+> as well as compare and swap, for just such a common case.  It “would
+> be nice” to have usage of such instructions without having to drop
+> into machine code insertions.  But any such facility would have to be
+> heavily caveated with “if supported by the platform” ...
+
+No, all of these features (compare-and-swap, atomic increment, etc) can be
+implemented on all machines.  If the hardware doesn't support it, it can be done
+in software.  AFAIK, that's what gcc does for C -- the primitives are supported
+on all machines.
+
+Of course, users will often want to investigate how efficient the implementation
+is on their particular hardware.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Thursday, March 29, 2018  6:48 PM
+
+> Aside: Because self-made locks are really very dangerous, I would not
+> make it easy for users to get at these atomic ops.
+
+One way to do that would be to not trust them as a synchronization method for
+"safe" parallelism. I've been advocating that for a while as the "natural" use
+of atomics is a built-in race condition. (That is, "A := A + 1;" is inherently
+unsafe.) In that case, only users running in "user beware" mode could use
+atomics for synchronization - that would keep them mostly out of the hands of
+the unwashed masses that want the compiler to help with the safe application of
+parallelism.
+
+An aspect Lock_Free might have some utility in such a scheme - but this AI was
+about giving experts portable access to such operations (at least across
+compilers for a single target -- which is valuable by itself).
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Thursday, March 29, 2018  7:02 PM
+
+> So, how does the compiler builder implement this atomically on an ISA
+> that does not have the CAS-instruction, but instead the test-and-set
+> (or a fetch-and-add) instruction?
+
+Protected objects always have to be implemented in Ada, and it's always possible
+to do the needed operations in a (virtual) protected operation, so it's hard to
+imagine that there is any operation that can't be implemented on any ISA that
+can support Ada.
+
+We would probably want Implementation Advice that these are implemented in a
+lock-free manner, which would implicitly include a documentation requirement to
+tell the user any that are not implemented that way. (All Implementation Advice
+is an implicit documentation requirement.)
+
+So, the basic operations would be portable anywhere. If the expert (!) user of
+these things needed a lock-free implementation, that would be portable for a
+given ISA (modulo clueless implementers like me :-), and user can always find
+out which operations are lock-free on a particular ISA.
+
+This seems to check all of the boxes (the operations are portably available, are
+lock-free as much as possible, the user can always use the target-specific
+subset if they care about lock-free, etc.).
+
+What's wrong with this plan? (As always, when it comes to this sort of stuff, I
+know just enough to be dangerous. :-)
+
+****************************************************************
+
+From: Erhard Ploedereder
+Sent: Thursday, March 29, 2018  9:34 PM
+
+No doubt. We always can implement everything in software, but is that what the 
+user wants? Surely, if he is trying to write a lock-free programm, he is going
+to be pissed to find that compare-and-swap was implementended with a lock or
+semaphore. This is not just a performance question. It also impacts issues
+like risk of priority inversion and anything that relates to blocking.
+
+His usual route is to provide a target-dependent solution. It will look a bit 
+different, depending on which operation is available that is synchronized by
+the memory bus. He sure would hate not to be told that his compare-and-swap 
+is not mapped to such an operation. If the compiler is mum about this, there
+is a good chance that the issue is not discovered during porting of the
+software.
+
+So, I would have less problems with a set of procedures that encapsulated the
+particular primitives, combined with a mandate to support all that match a
+bus-synchronized operation and a mandate to REJECT calls to any for which said
+operation does not exist.
+
+A software emulation of these instructions that is not lock- and wait-free ist
+the worst of all solutions for embedded systems with deadlines.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Thursday, March 29, 2018  11:05 PM
+
+> No doubt. We always can implement everything in software, but is that 
+> what the user wants?
+
+It clearly depends on the user. If the user is just using atomics as the
+easiest way to meet the requirements for parallel synchronization, I doubt
+that they care how it is implemented. For the expert...
+
+> Surely, if he is trying to write
+> a lock-free programm, he is going to be pissed to find that 
+> compare-and-swap was implementended with a lock or semaphore.
+> This is not just a performance question. It also impacts issues like 
+> risk of priority inversion and anything that relates to blocking.
+
+Understood. But as Bob notes, that's how C does it. Not that that is the best
+example :-), but it at least seems to give a data point that it's acceptable 
+to some part of the community.
+
+> His usual route is to provide a target-dependent solution. It will 
+> look a bit different, depending on which operation is available that 
+> is synchronized by the memory bus.
+> He sure would hate not to be told that his compare-and-swap is not 
+> mapped to such an operation. If the compiler is mum about this, there 
+> is a good chance that the issue is not discovered during porting of 
+> the software.
+
+Compilers that are mum violate the Standard. Unfortunately, there's no good 
+way to enforce that rule.
+ 
+> So, I would have less problems with a set of procedures that 
+> encapsulated the particular primitives, combined with a mandate to 
+> support all that match a bus-synchronized operation and a mandate to 
+> REJECT calls to any for which said operation does not exist.
+
+The problem is I don't think we can do that in the Standard, since there is 
+zero chance that we could formally define what a "bus-synchronized operation" 
+is. And any Legality Rule requires that. Moreover, such a rule isn't testable 
+by the ACATS (it does not allow looking at generated code), so it would be 
+effectively meaningless. (And given my experience, probably not enforced by a
+large number of compilers.)
+
+We could (as Robert Dewar used to note) write Implementation Advice to that 
+effect. But that hardly seems better than the original option.
+ 
+> A software emulation of these instructions that is not lock- and 
+> wait-free ist the worst of all solutions for embedded systems with 
+> deadlines.
+
+The only other option that makes sense to me is to forget the atomic library
+and then declare that atomics are not "safe" for the purposes of parallelism.
+(When checking, of course.) That would keep the novices from trying to use
+them, especially to make existing code parallel-safe. And then, as Brad
+suggests, an implementation of aspect Lock_Free seems like the best way to get
+portability of lock-free code. Compilers probably would screw that up, too,
+but at least it could be a bug report then. :-)
+
+****************************************************************
+
+From: Edward Fish
+Sent: Thursday, March 29, 2018  11:32 PM
+
+> No doubt. We always can implement everything in software, but is that 
+> what the user wants? [...] His usual route is to provide a target-dependent
+> solution.
+
+I think the user would typically want general, portable code rather than 
+target-dependent without some indicator/specifier that some dependent form is
+needed; as an example records and representations. If there is not
+representation, then I generally assume that the form sought structure-wise is
+more in the ideal plane than not, whereas with a representation-specification 
+the particulars matter to the point where "ideal" isn't possible. (e.g. When 
+you're interfacing with foreign programs/protocols and have to have field A of
+x-bits, field B of y-bits, 2-bits padding, and field C of 14-bits.)
+
+> It will look a bit different, depending on which operation is 
+> available that is synchronized by the memory bus.
+> He sure would hate not to be told that his compare-and-swap is not 
+> mapped to such an operation. If the compiler is mum about this, there 
+> is a good chance that the issue is not discovered during porting of 
+> the software.
+
+But there's the crux of the issue: should a compare-and-swap be in error in an
+ISA which doesn't have a compare-and-swap instruction? or should the compiler 
+construct the proper SW-construct? -- If such low-level controls are required,
+Ada *DOES* allow for instruction-insertion, albeit implementation-defined.
+
+> So, I would have less problems with a set of procedures that 
+> encapsulated the particular primitives, combined with a mandate to 
+> support all that match a bus-synchronized operation and a mandate to 
+> REJECT calls to any for which said operation does not exist.
+>
+> A software emulation of these instructions that is not lock- and 
+> wait-free ist the worst of all solutions for embedded systems with 
+> deadlines.
+
+It looks like there's a bit of tension between (a) general purpose and
+(b) low-level here... I can't offer much comment regarding embedded
+programming, since my experience there is virtually nil.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Thursday, March 29, 2018  11:55 PM
+
+...
+> No doubt. We always can implement everything in software, but is that 
+> what the user wants? Surely, if he is trying to write a lock-free 
+> programm, he is going to be pissed to find that compare-and-swap was 
+> implementended with a lock or semaphore.
+> This is not just a performance question. It also impacts issues like 
+> risk of priority inversion and anything that relates to blocking.
+
+BTW, could you provide a reference that a relative (but not complete) newby
+understand this topic? I keep reading statements like the above, and they seem
+like utter nonsense to me. The form of the implementation shouldn't matter, so
+long as it correctly implements the semantics. (After all, there is no such
+thing as "lock-free"; there's always a lock -- lock-free algorithms are just
+using the data as the lock.)
+
+I read this sort of stuff often enough that it's clear that smart people that
+know more that I do in this area think this is true, but I've never been able
+to find anything that explains why.
+
+****************************************************************

Questions? Ask the ACAA Technical Agent