!standard C.6(14/3)                                   17-06-09  AI12-0234-1/01
!class Amendment 17-06-09
!status work item 17-06-09
!status received 17-05-19
!priority Low
!difficulty Easy
!subject Compare-and-swap for atomic objects

!summary

!problem

Lock-free structures are all the rage these days. If one wanted to construct
such a structure in Ada, one might use Atomic objects. But Ada does not
provide any compare-and-swap operation (or other atomic read-modify-write
operations, like atomic increment).

The RM has Implementation Advice that package Machine_Code include such
operations - C.1(11-12) - but that is the absolute opposite of a portable
operation.

!proposal

(See Summary.)

!wording

** TBD.

The author is not going to try to propose anything here, as people more
knowledgeable about common machine architectures should do that. He notes
that he mainly is aware of test-and-set operations and never heard of
compare-and-swap until this discussion. :-)

!discussion

It seems that the solution will need to be generic, in order to work with
any atomic type. For that to make sense, it seems necessary to allow aspect
Atomic to be given on a formal type, in order to require that any actual
type is indeed atomic. The alternative of just saying that if the type is
not atomic, then the operation isn't either, seems error-prone.

-- Author's note: One thing I learned from this discussion is that at least
some of the literature uses "Lock-Free" to mean "free of deadlocks and
livelocks" rather than the obvious meaning of "has no locks". Those are two
wildly different things! Surely a tool could determine if a protected
object is deadlock-free; whether the implementation contains locks is
obviously out of the hands of tools. Moreover, almost all synchronized data
has something that is acting like a lock (that is, some synchronization
data); the exact form of that data should be irrelevant outside of
performance concerns.

!ASIS

No ASIS effect.

!ACATS test

An ACATS C-Test is needed to check that the new capabilities are supported.

!appendix

!topic Lock-free data structures
!reference Ada 202x RM9
!from Steven Stewart-Gallus 17-05-19
!keywords lock-freedom concurrency tasking hard real-time
!discussion

Will Ada 202x have support for lock-free data structures? An API along the
style of the GNAT Lock_Free pragma, or a generic package like I made at
https://sstewartgallus.com/git?p=linted.git;a=blob;f=src/ada-core/src/linted-gcc_atomics.ads;h=a87061e74aa3a7badcfcd7cd0f0f5c0f2abe1908;hb=HEAD
that mirrors C++'s support, might be useful. Also, a function for x86's
pause instruction or similar would be useful. This would all be useful for
hard real-time platforms that need very strict timing guarantees. The
Lock_Free pragma would probably be easiest for formalizing in SPARK and
such.

****************************************************************

From: Randy Brukardt
Sent: Friday, May 19, 2017 5:15 PM

> Will Ada 202x have support for lock-free data structures?

No idea; no one has formally asked until 30 minutes ago. In particular, the
real-time folks at IRTAW have not (yet?) forwarded any proposals on this
idea. Generally, we let them take the lead on real-time issues.

And in any case, it is the wrong question. A lock-free data structure is a
specific solution, not a problem or capability. Ada already provides a wide
variety of ways to write data structures for multitasking, from the very
low level (aspects Atomic and Volatile) to the nicely abstract (protected
objects and rendezvous). We need to know what cannot be done with the
existing features.
No one seems interested in explaining what they cannot do in Ada, but
rather seems interested in following the herd to use solutions cooked up
for languages that don't have the abstract multitasking capabilities of
Ada. In many cases, there are better ways to do in Ada what you might have
to do in some low-level manner in some other language. (And that's often
true in other areas of Ada as well -- OOP in particular.)

In particular, I'd like to know the following. (Aside: I always tell people
that I know just enough about multitasking to be dangerous, so please humor
me if you think these things are obvious.)

What capabilities (precisely) are missing from Ada in order for it to
directly support low-level lock-free data structures? We most certainly
would not want to add a bunch of new low-level capabilities, but rather
would want to extend the existing facilities to better support low-level
lock-free data structures. It seems obvious to me that Ada needs a portable
atomic test-and-set operation, but I don't know if that is enough nor
whether that is really practical. Nor is the best form to define it obvious
(it would have to be implemented as a built-in in order to get the intended
special semantics, which complicates implementation).

And the even more important question: in general use, what can you do with
a lock-free data structure that you cannot do with a protected object?
After all, if you can use a PO to accomplish your task, you should do that,
as it is far more abstract and portable than any particular implementation
could be. And by using a PO, you are letting the compiler choose the most
efficient way to implement your data structure for your target rather than
making an assumption that very well may be wrong. (Programmers are
notoriously bad at determining the efficiency of code and the importance of
efficiency of particular code.)

Your thoughts (and anyone's, for that matter) on these topics would help
guide our thinking.

****************************************************************

From: Tucker Taft
Sent: Friday, May 19, 2017 5:30 PM

Note that AdaCore supports a pragma Lock_Free that can be applied to
protected types to cause them to use lock-free primitives (or complain if
not possible):

http://docs.adacore.com/gnat_rm-docs/html/gnat_rm/gnat_rm/implementation_defined_pragmas.html#pragma-lock-free

Be that as it may, I sympathize with having access to a Compare_And_Swap
primitive applicable to atomic objects, since most hardware supports that
at this point, and from it you can implement essentially any lock-free
structure.

****************************************************************

From: Randy Brukardt
Sent: Friday, May 19, 2017 6:34 PM

Thanks Tuck; I was aware of that. That just seems to me to be an inversion
-- the compiler ought to select the best implementation, not make the user
guess what implementation is best on their target. (I realize this sort of
inversion is ingrained in computing; aspect Inline is another example of
this sort of inversion -- and I don't like it much either.)

It seems clear that there is a hole when creating low-level algorithms (no
test-and-set); it's much less obvious that there is such a hole for
protected objects (after all, most code in a program is not performance
critical, so restructuring your data structures to fit some lock-free
profile just makes your code harder to understand in most cases).
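To be concrete about the alternative: the portable way to get an atomic
increment in Ada today is a trivial protected object. A minimal sketch,
using only standard Ada:

   protected Counter is
      procedure Increment;
      function Value return Natural;
   private
      Count : Natural := 0;
   end Counter;

   protected body Counter is
      procedure Increment is
      begin
         Count := Count + 1;  -- runs as a protected action, so no data race
      end Increment;

      function Value return Natural is
      begin
         return Count;
      end Value;
   end Counter;

Whether this is implemented with a lock or with a hardware atomic
instruction is, today, entirely up to the implementation.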
And one assumes that compilers do the best they can for a data structure
and don't just fall back on some general algorithm. (What compiler vendor
wants to generate slower-than-necessary code???)

****************************************************************

From: Steven Stewart-Gallus
Sent: Saturday, May 20, 2017 3:11 PM

>> Will Ada 202x have support for lock-free data structures?
>
> No idea; no one has formally asked until 30 minutes ago. In particular,
> the real-time folks at IRTAW have not (yet?) forwarded any proposals on
> this idea. Generally, we let them take the lead on real-time issues.

Does anyone know how I would get into contact with them?

> And in any case, it is the wrong question. A lock-free data structure
> is a specific solution, not a problem or capability.

I'm sorry, I meant capabilities for implementing lock-free data structures.
Standard library support for common data structures is an entirely separate
question. Under current standard Ada it is not possible to implement
certain data structures that guarantee forward progress among all threads
without delving into low-level assembly.

> Ada already provides a wide variety of ways to write data structures
> for multitasking, from the very low level (aspects Atomic and
> Volatile) to the nicely abstract (protected objects and rendezvous).

Unfortunately, current Atomics support does not provide for
compare_and_exchange primitives and similar, and so cannot support
user-written lock-free data structures.

> We need to know what cannot be done with the existing features.

Atomic swaps, compare_and_swaps, and processor-specific thread yield
instructions. Also, less strongly ordered atomics that are less expensive.

> No one seems interested in explaining what they cannot do in Ada, but
> rather seems interested in following the herd to use solutions cooked
> up for languages that don't have the abstract multitasking capabilities
> of Ada. In many cases, there are better ways to do in Ada what you
> might have to do in some low-level manner in some other language. (And
> that's often true in other areas of Ada as well -- OOP in particular.)

You realize the standard library has to be implemented somewhere, right?
On some platforms there is no Ada standard library or a very reduced set of
capabilities, and people have to implement such capability themselves. And
as stated previously, there is NO way to implement timing guarantees such
as lock-freedom or wait-freedom in standard Ada. Also, these capabilities
cannot be used inside interrupts or signal handlers.

> In particular, I'd like to know the following. (Aside: I always tell
> people that I know just enough about multitasking to be dangerous, so
> please humor me if you think these things are obvious.)
>
> What capabilities (precisely) are missing from Ada in order for it to
> directly support low-level lock-free data structures? We most certainly
> would not want to add a bunch of new low-level capabilities, but rather
> would want to extend the existing facilities to better support
> low-level lock-free data structures. It seems obvious to me that Ada
> needs a portable atomic test-and-set operation, but I don't know if
> that is enough nor whether that is really practical. Nor is the best
> form to define it obvious (it would have to be implemented as a
> built-in in order to get the intended special semantics, which
> complicates implementation).

The C++ and C standards have a somewhat reasonable API. I gave an example
of two possible APIs.
The Lock_Free pragma of GNAT, and the generic package wrapper over
intrinsics provided by GCC. I think the Lock_Free pragma approach might be
easiest for static analysers such as SPARK to analyze, and so would be most
suited for hard real-time support. On the other hand, it wouldn't work for
wait-freedom requirements.

The barest minimal requirements are probably atomic exchange, atomic
compare-and-exchange, and a nop or yield instruction. All else can be
implemented using them. However, certain atomics cannot be implemented in a
wait-free way without being built in, and instead must be lock-free.

> And the even more important question: in general use, what can you do
> with a lock-free data structure that you cannot do with a protected
> object? After all, if you can use a PO to accomplish your task, you
> should do that, as it is far more abstract and portable than any
> particular implementation could be. And by using a PO, you are letting
> the compiler choose the most efficient way to implement your data
> structure for your target rather than making an assumption that very
> well may be wrong. (Programmers are notoriously bad at determining the
> efficiency of code and the importance of efficiency of particular
> code.)

You cannot guarantee special properties such as wait-freedom or
lock-freedom.

****************************************************************

From: Florian Weimer
Sent: Monday, May 22, 2017 6:27 AM

> That just seems to me to be an inversion -- the compiler ought to
> select the best implementation, not make the user guess what
> implementation is best on their target.

In some cases, lock-free implementations are required for correctness. For
example, in glibc, we allegedly cannot use the GCC atomic built-ins because
they could be implemented with locks, and that would give us the wrong
behavior.

With that background, this GNAT-specific assertion that the compiler must
use lock-free operations is useful.

****************************************************************

From: Randy Brukardt
Sent: Monday, May 22, 2017 1:31 PM

Could you explain this further? Why would it matter how synchronization is
accomplished? I would have expected that would be an implementation detail.

****************************************************************

From: Florian Weimer
Sent: Tuesday, May 23, 2017 2:16 PM

It's very visible with POSIX shared mappings (either anonymous or
file-backed). It would matter with I/O memory mappings, too. It also
affects the program semantics when signals are involved. Lock-based
implementations of atomics can deadlock too easily.

The problems are so numerous that it's been concluded (at least on the
glibc side) that if the architecture does not support certain atomics, the
only way to emulate them, if required, is with a kernel assist.

****************************************************************

From: Tucker Taft
Sent: Wednesday, May 24, 2017 1:33 PM

One interesting side note about lock-free vs. wait-free -- a relatively
recent paper documented some research which showed that the effort to
produce a "wait-free" structure was rarely worthwhile. Most "lock-free"
structures were nearly wait-free, without incurring all the complexity of a
true "wait-free" algorithm.
See the paper by Alistarh, Censor-Hillel, and Shavit, "Are Lock-Free
Concurrent Algorithms Practically Wait-Free?":

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/paper-18.pdf

In any case, I support coming up with a simple standard Ada package for
compiler implementors to provide access to Compare-and-Swap and perhaps a
few additional operations.

****************************************************************

From: Hadrien Grasland
Sent: Wednesday, May 24, 2017 4:04 PM

If you go down the route of exposing hardware compare-and-swap operations,
I would recommend also exposing other widely supported read-modify-write
operations, such as swap or fetch-increment/decrement. They are less
general than CAS, but where applicable, they enable better algorithms (no
ABA issues, no expensive fail-and-retry loop).

One example of an application for atomic swap is triple buffering (the
producer transactionally updates a shared RAM value whenever it feels like
it, and the consumer can access the latest update at any point in time),
and one example of an application for fetch-increment/decrement is
reference counting.

****************************************************************

From: Stephen Michell
Sent: Wednesday, May 24, 2017 5:51 PM

Some of the IRTAW members participate in ARG. In particular, Alan Burns,
Brad Moore and myself participate. IRTAW holds a workshop approximately
every 18-24 months. The last one was held in Benicassim, Spain in April
2016. The next one will be either in the fall of 2017 or the spring of
2018.

IRTAW has been resistant to putting such mechanisms into the Ada language,
preferring instead to permit Ada to use such mechanisms to make constructs
such as protected types/objects, tasks, etc. more efficient.

The way to interface with IRTAW is to watch for the announcement of the
next workshop, submit a position paper to attempt to get the issue on the
agenda, and come and participate. If you want, I will communicate with the
organizer of the next workshop and ensure that interested people receive a
direct email announcing the workshop when it is issued.

****************************************************************

From: Dirk Craeynest
Sent: Thursday, May 25, 2017 4:24 AM

And if you would like to interact even sooner with IRTAW members, and the
Ada community in general, you should consider attending the annual
Ada-Europe conference.

The 22nd International Conference on Reliable Software Technologies -
Ada-Europe 2017 - will be held in less than a month (12-16 June 2017) in
Vienna, Austria. For more info, see: .

****************************************************************

From: Florian Weimer
Sent: Friday, May 26, 2017 2:51 PM

> If you go down the route of exposing hardware compare-and-swap
> operations, I would recommend also exposing other widely supported
> read-modify-write operations, such as swap or fetch-increment/decrement.
> They are less general than CAS, but where applicable, they enable
> better algorithms (no ABA issues, no expensive fail-and-retry loop).

And you also need a memory model which goes beyond the current signaling
concept.

****************************************************************

From: Randy Brukardt
Sent: Friday, May 26, 2017 4:00 PM

?? Atomic objects themselves force sequential actions; there's no
"signalling" involved. And C.6 makes it clear that this may require
barriers or fences. What else makes sense (whatever the rules are, they
have to make sense in a wide variety of contexts)?
****************************************************************

From: Randy Brukardt
Sent: Friday, May 26, 2017 4:57 PM

A couple of quick thoughts on this thread...

...
>> Ada already provides a wide variety of ways to write data structures
>> for multitasking, from the very low level (aspects Atomic and
>> Volatile) to the nicely abstract (protected objects and rendezvous).
>
> Unfortunately, current Atomics support does not provide for
> compare_and_exchange primitives and similar, and so cannot support
> user-written lock-free data structures.
>
>> We need to know what cannot be done with the existing features.
>
> Atomic swaps, compare_and_swaps, and processor-specific thread yield
> instructions. Also, less strongly ordered atomics that are less
> expensive.

Thanks, that's in part what I was looking for (and expected as well).

>> No one seems interested in explaining what they cannot do in Ada, but
>> rather seems interested in following the herd to use solutions cooked
>> up for languages that don't have the abstract multitasking
>> capabilities of Ada. In many cases, there are better ways to do in
>> Ada what you might have to do in some low-level manner in some other
>> language. (And that's often true in other areas of Ada as well -- OOP
>> in particular.)
>
> You realize the standard library has to be implemented somewhere,
> right?

The standard library is always going to be implementation-specific (and
often target-specific as well); there is very little of the GNAT standard
library code that would work on Janus/Ada, for one example. Thus, there (A)
is no need for portability in the implementation of the standard library
(meaning that the Ada Standard isn't particularly relevant there) and (B)
depending on implementation-specific features is fine. Which means that is
a terrible example!

> On some platforms there is no Ada standard library or a very reduced
> set of capabilities, and people have to implement such capability
> themselves.

Which brings one back to the original question: why isn't a protected
object good enough for that?

> And as stated previously, there is NO way to implement timing
> guarantees such as lock-freedom or wait-freedom in standard Ada.

I'm very skeptical that any useful timing guarantees could ever be made in
a high-level language. I can't tell how long a piece of sequential code
produced by my compiler will take to run even on the various Intel machines
I have here (it seems to vary as much as a factor of 4 from what I would
estimate). Truly portable code that runs on many different processors would
be beyond prediction. Mix that in with the different costs of
synchronization on different targets, and nothing but the most trivial code
can have any guarantee.

But I'm at the limits of my knowledge here. While it seems to me that the
actual implementation of a protected object should not matter (if it has
the correct form, it cannot by itself cause deadlock or livelock, and that
form can be easily checked by a tool), lots of other people seem to differ.
Perhaps they're all focused on the wrong thing (I certainly think so), but
obviously I may be missing some key information.

...
> You cannot guarantee special properties such as wait-freedom or
> lock-freedom.

I'd argue that it's not the job of a programming language to guarantee
anything, especially when it comes to timing. (These properties seem
misnamed, as they generally are meant to make timing guarantees rather than
an implementation guarantee.
I can see no reason that a properly implemented lock could not make the
same guarantee.) And the implementation of a protected object doesn't
really matter in terms of whether those properties hold -- it's just the
form of the contents that matters. That seems easy to check with tools like
SPARK.

Anyway, I've said enough; I'll let others more knowledgeable weigh in if
they want.

****************************************************************

From: Brad Moore
Sent: Wednesday, March 28, 2018 12:39 AM

I noticed that there was a fair amount of interest in this AI in the result
of Randy's straw poll. I was wondering if my thinking on this was on track.

Basically, I was thinking that GNAT has implemented a pragma/aspect called
Lock_Free. It is documented in the reference manual as follows:

"Syntax: This pragma may be specified for protected types or objects. It
specifies that the implementation of protected operations must be
implemented without locks. Compilation fails if the compiler cannot
generate lock-free code for the operations."

I recall that Geert Bosch from AdaCore had worked on this a few years ago,
and that he had written a paper, presented at one of the conferences, that
explained this feature quite nicely.

My thoughts are that for this AI, we should simply capture the design that
AdaCore currently has implemented and write it up.

Does that make sense?

I know I have at times wanted to use that aspect in my code, but refrained
from doing so because it wasn't in the standard; I noticed that there were
performance benefits when I did experiment with this feature.

My understanding is that many hardware targets support some sort of basic
compare-and-swap instruction.

Also, I am the program chair for IRTAW 2018, which is happening in a couple
of weeks. I think this AI might be of interest for discussion with the
IRTAW people.

****************************************************************

From: Erhard Ploedereder
Sent: Wednesday, March 28, 2018 7:48 AM

In general, users may need access to the atomic check instructions of an
ISA. However, these are different on different ISA architectures. There is
compare-and-swap. There is also test-and-set (few ISAs), and there is a
third one more recently introduced, whose nature I do not recall right now
but could look up. Usually, a given ISA supports only one of these ops.

How can you possibly standardize on this without becoming ISA-specific?
Much better: a target-specific package provided by the implementer.

Aside: Because self-made locks are really very dangerous, I would not make
it easy for users to get at these atomic ops.

****************************************************************

From: Tullio Vardanega
Sent: Wednesday, March 28, 2018 8:34 AM

I very much concur with Erhard's aside.

****************************************************************

From: Brad Moore
Sent: Wednesday, March 28, 2018 10:04 AM

> How can you possibly standardize on this without becoming ISA-specific?
> Much better: a target-specific package provided by the implementer.

I agree this is a problem if one tries to provide an interface at too low a
level. I think AdaCore's approach, with a boolean Lock_Free aspect that can
be applied to a protected object, is a higher level of abstraction that
could accommodate mappings to different architectures. This includes
providing the lock-free capability by scheduling or other means, such as
could be the case for Ravenscar programs if the protected object is known
to be only accessed by tasks executing on the same processor.
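For reference, in source form this is just an aspect on the protected type.
A minimal sketch (assuming GNAT's implementation-defined aspect, as
documented above; the body is written as for any protected type):

   protected type Atomic_Counter with Lock_Free is
      procedure Increment;
      function Value return Natural;
   private
      Count : Natural := 0;  -- the single component a lock-free action
                             -- is allowed to refer to
   end Atomic_Counter;

If the compiler cannot map the protected operations onto the target's
atomic primitives, compilation fails rather than silently falling back to a
lock.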
To this end, AdaCore also added a CPU aspect that can be applied to a
protected object, to specify that the protected object can only be accessed
by tasks running on a specific CPU. This extension to the CPU aspect should
probably also be a part of this AI.

Paraphrasing from Geert's paper, the goal they started with was to target
the widest range of systems possible, based on compare-and-swap or
load-link/store-conditional primitives. Their claim is that with the
restrictions described below, this feature "can be supported by the large
majority of current multiprocessor hardware".

To achieve this, they had to apply some restrictions to the use of the
Lock_Free aspect. The restrictions include:

- A lock-free protected action cannot have loop, goto, or procedure call
  statements, quantified expressions, or calls to non-static functions.
  The reason for this restriction is to bound the time it can take, via
  retries, to obtain access to the protected object.
- A protected action can only refer to a single component of the protected
  object, and the size of the component cannot exceed the maximum size for
  which lock-free atomic primitives are supported.
- The protected object cannot access non-local variables. Constants are OK.
- The parameters to the protected actions must be of elementary types.
- Access values are not allowed.
- Address clauses are not allowed.
- Imported or exported entities are not allowed.
- Allocators are not allowed.
- Volatile variables are not allowed.

I believe AdaCore's implementation also does not currently support
protected objects that have entries.

If the user's code does not meet all of these restrictions, the compilation
fails. Then either the programmer would try to address the restrictions
being violated, or remove the Lock_Free aspect and fall back to the default
implementation for protected objects.

> Aside: Because self-made locks are really very dangerous, I would not
> make it easy for users to get at these atomic ops.

By building this into the protected object abstraction, I think we'd be
eliminating the danger, or at least making it a lot safer, while also being
a lot more portable.

****************************************************************

From: Tucker Taft
Sent: Wednesday, March 28, 2018 11:22 AM

I do not think we should rely on the Lock_Free pragma on protected objects.
It has too many special restrictions. I think we should provide direct
access to compare-and-swap on atomic objects. I believe there is a standard
API for this which is supported by gcc for C, and I would suggest we
provide something similar. I believe the interface to the API is not
dependent on the ISA, but of course the implementation is.

Atomic compare-and-swap is the most fundamental operation when building
lock-free data structures, and it is quite annoying that this cannot be
done portably in Ada. And yes, this operation is for experts only, but that
doesn't mean such experts don't want to write in Ada, while still achieving
some level of portability.

Below is the gcc API, extracted from:

https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

I think we don't need all the options.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, March 28, 2018 12:23 PM

...
> Basically, I was thinking that GNAT has implemented a pragma/aspect
> called Lock_Free.
...
> My thoughts are that for this AI, we should simply capture the design
> that AdaCore currently has implemented and write it up.
>
> Does that make sense?
That would be an interesting thing to do, but it doesn't have anything to
do with making updates of atomic objects safe. (Protected object /= atomic
object. :-) I don't think a directive of "don't use atomic objects, use
lock-free POs instead" would fly -- and if it did fly, totally banning
atomic objects from "safe" tasking code would solve the problem [with or
without other features].

Currently, if you have an atomic object A,

   A := A + 1;

is a race condition (as the read and write are atomic, but not the entire
update). Ada doesn't have atomic updates in any form. There aren't many
algorithms that you can write with atomic objects that don't run afoul of
this issue.

Most implementations have a generic library designed to provide the missing
update operations, and we wanted to standardize some such library so that
portable, race-condition-free code could be constructed.

****************************************************************

From: Jeff Cousins
Sent: Thursday, March 29, 2018 5:21 AM

> Currently, if you have an atomic object A,
>
>    A := A + 1;
>
> is a race condition (as the read and write are atomic, but not the
> entire update). Ada doesn't have atomic updates in any form.

The x86 family's LOCK prefix can be applied to increment instructions, as
well as compare-and-swap, for just such a common case. It "would be nice"
to have usage of such instructions without having to drop into machine code
insertions. But any such facility would have to be heavily caveated with
"if supported by the platform" -- though we already have something like
this for Long_Float and Long_Integer, and even within the x86 family it can
vary from chip type to chip type as to whether your particular lump of
silicon actually does the bus locking, so it may not be portable.

****************************************************************

From: Erhard Ploedereder
Sent: Thursday, March 29, 2018 6:26 AM

So, how does the compiler builder implement this atomically on an ISA that
does not have the CAS instruction, but instead the test-and-set (or a
fetch-and-add) instruction?

(Presumably, on such architectures the C++ built-in CASes are simply not
supported; but CAS does cover many ISAs, particularly the x86, so C++ is
happy. In Ada we shied away from implementation-defined subsetting of the
standard language.)

****************************************************************

From: Bob Duff
Sent: Thursday, March 29, 2018 7:46 AM

> The x86 family's LOCK prefix can be applied to increment instructions,
> as well as compare-and-swap, for just such a common case. It "would
> be nice" to have usage of such instructions without having to drop
> into machine code insertions. But any such facility would have to be
> heavily caveated with "if supported by the platform" ...

No, all of these features (compare-and-swap, atomic increment, etc.) can be
implemented on all machines. If the hardware doesn't support it, it can be
done in software. AFAIK, that's what gcc does for C -- the primitives are
supported on all machines.

Of course, users will often want to investigate how efficient the
implementation is on their particular hardware.

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 29, 2018 6:48 PM

> Aside: Because self-made locks are really very dangerous, I would not
> make it easy for users to get at these atomic ops.

One way to do that would be to not trust them as a synchronization method
for "safe" parallelism.
I've been advocating that for a while, as the "natural" use of atomics is a
built-in race condition. (That is, "A := A + 1;" is inherently unsafe.) In
that case, only users running in "user beware" mode could use atomics for
synchronization - that would keep them mostly out of the hands of the
unwashed masses that want the compiler to help with the safe application of
parallelism.

An aspect Lock_Free might have some utility in such a scheme - but this AI
was about giving experts portable access to such operations (at least
across compilers for a single target -- which is valuable by itself).

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 29, 2018 7:02 PM

> So, how does the compiler builder implement this atomically on an ISA
> that does not have the CAS instruction, but instead the test-and-set
> (or a fetch-and-add) instruction?

Protected objects always have to be implemented in Ada, and it's always
possible to do the needed operations in a (virtual) protected operation, so
it's hard to imagine that there is any operation that can't be implemented
on any ISA that can support Ada.

We would probably want Implementation Advice that these are implemented in
a lock-free manner, which would implicitly include a documentation
requirement to tell the user about any that are not implemented that way.
(All Implementation Advice is an implicit documentation requirement.)

So, the basic operations would be portable anywhere. If the expert (!) user
of these things needed a lock-free implementation, that would be portable
for a given ISA (modulo clueless implementers like me :-), and the user can
always find out which operations are lock-free on a particular ISA.

This seems to check all of the boxes (the operations are portably
available, are lock-free as much as possible, the user can always use the
target-specific subset if they care about lock-freedom, etc.). What's wrong
with this plan? (As always, when it comes to this sort of stuff, I know
just enough to be dangerous. :-)

****************************************************************

From: Erhard Ploedereder
Sent: Thursday, March 29, 2018 9:34 PM

No doubt. We always can implement everything in software, but is that what
the user wants? Surely, if he is trying to write a lock-free program, he is
going to be pissed to find that compare-and-swap was implemented with a
lock or semaphore. This is not just a performance question. It also impacts
issues like the risk of priority inversion and anything that relates to
blocking.

His usual route is to provide a target-dependent solution. It will look a
bit different, depending on which operation is available that is
synchronized by the memory bus. He sure would hate not to be told that his
compare-and-swap is not mapped to such an operation. If the compiler is mum
about this, there is a good chance that the issue is not discovered during
porting of the software.

So, I would have fewer problems with a set of procedures that encapsulated
the particular primitives, combined with a mandate to support all that
match a bus-synchronized operation and a mandate to REJECT calls to any for
which said operation does not exist.

A software emulation of these instructions that is not lock- and wait-free
is the worst of all solutions for embedded systems with deadlines.

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 29, 2018 11:05 PM

> No doubt. We always can implement everything in software, but is that
> what the user wants?
It clearly depends on the user. If the user is just using atomics as the
easiest way to meet the requirements for parallel synchronization, I doubt
that they care how it is implemented. For the expert...

> Surely, if he is trying to write a lock-free program, he is going to
> be pissed to find that compare-and-swap was implemented with a lock or
> semaphore. This is not just a performance question. It also impacts
> issues like the risk of priority inversion and anything that relates
> to blocking.

Understood. But as Bob notes, that's how C does it. Not that that is the
best example :-), but it at least seems to give a data point that it's
acceptable to some part of the community.

> His usual route is to provide a target-dependent solution. It will
> look a bit different, depending on which operation is available that
> is synchronized by the memory bus.
> He sure would hate not to be told that his compare-and-swap is not
> mapped to such an operation. If the compiler is mum about this, there
> is a good chance that the issue is not discovered during porting of
> the software.

Compilers that are mum violate the Standard. Unfortunately, there's no good
way to enforce that rule.

> So, I would have fewer problems with a set of procedures that
> encapsulated the particular primitives, combined with a mandate to
> support all that match a bus-synchronized operation and a mandate to
> REJECT calls to any for which said operation does not exist.

The problem is I don't think we can do that in the Standard, since there is
zero chance that we could formally define what a "bus-synchronized
operation" is. And any Legality Rule requires that. Moreover, such a rule
isn't testable by the ACATS (it does not allow looking at generated code),
so it would be effectively meaningless. (And given my experience, probably
not enforced by a large number of compilers.)

We could (as Robert Dewar used to note) write Implementation Advice to that
effect. But that hardly seems better than the original option.

> A software emulation of these instructions that is not lock- and
> wait-free is the worst of all solutions for embedded systems with
> deadlines.

The only other option that makes sense to me is to forget the atomic
library and then declare that atomics are not "safe" for the purposes of
parallelism. (When checking, of course.) That would keep the novices from
trying to use them, especially to make existing code parallel-safe. And
then, as Brad suggests, an implementation of aspect Lock_Free seems like
the best way to get portability of lock-free code. Compilers probably would
screw that up, too, but at least it could be a bug report then. :-)

****************************************************************

From: Edward Fish
Sent: Thursday, March 29, 2018 11:32 PM

> No doubt. We always can implement everything in software, but is that
> what the user wants? [...] His usual route is to provide a
> target-dependent solution.

I think the user would typically want general, portable code rather than
target-dependent code, without some indicator/specifier that some dependent
form is needed; as an example, records and representations. If there is no
representation clause, then I generally assume that the form sought,
structure-wise, is more in the ideal plane than not, whereas with a
representation specification the particulars matter to the point where
"ideal" isn't possible. (E.g.,
when you're interfacing with foreign programs/protocols and have to have
field A of x bits, field B of y bits, 2 bits of padding, and field C of 14
bits.)

> It will look a bit different, depending on which operation is
> available that is synchronized by the memory bus.
> He sure would hate not to be told that his compare-and-swap is not
> mapped to such an operation. If the compiler is mum about this, there
> is a good chance that the issue is not discovered during porting of
> the software.

But there's the crux of the issue: should a compare-and-swap be in error on
an ISA which doesn't have a compare-and-swap instruction? Or should the
compiler construct the proper SW construct? -- If such low-level controls
are required, Ada *DOES* allow for instruction insertion, albeit
implementation-defined.

> So, I would have fewer problems with a set of procedures that
> encapsulated the particular primitives, combined with a mandate to
> support all that match a bus-synchronized operation and a mandate to
> REJECT calls to any for which said operation does not exist.
>
> A software emulation of these instructions that is not lock- and
> wait-free is the worst of all solutions for embedded systems with
> deadlines.

It looks like there's a bit of tension between (a) general purpose and (b)
low-level here... I can't offer much comment regarding embedded
programming, since my experience there is virtually nil.

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 29, 2018 11:55 PM

...
> No doubt. We always can implement everything in software, but is that
> what the user wants? Surely, if he is trying to write a lock-free
> program, he is going to be pissed to find that compare-and-swap was
> implemented with a lock or semaphore.
> This is not just a performance question. It also impacts issues like
> the risk of priority inversion and anything that relates to blocking.

BTW, could you provide a reference that would help a relative (but not
complete) newbie understand this topic? I keep reading statements like the
above, and they seem like utter nonsense to me. The form of the
implementation shouldn't matter, so long as it correctly implements the
semantics. (After all, there is no such thing as "lock-free"; there's
always a lock -- lock-free algorithms are just using the data as the lock.)
I read this sort of stuff often enough that it's clear that smart people
that know more than I do in this area think this is true, but I've never
been able to find anything that explains why.

****************************************************************

From: Erhard Ploedereder
Sent: Friday, March 30, 2018 5:11 PM

One of the best references is

Alan Burns, Andy Wellings: Real-Time Systems and Programming Languages,
Addison Wesley Longman, ISBN: 978-0-321-41745-9

Here is a canonical scenario for a priority inversion:

4 tasks T1 - T4; the numbers indicate priority. High number = high
priority. The tasks are periodic, i.e., released at some time intervals.

T1 runs and grabs said lock to perform the software-emulated
compare-and-swap, but before it is done with it, T4 is released by
time-trigger and preempts T1; it runs a bit - meanwhile T2 and T3 are
released, too.

Then T4 asks for the same lock, but can't get it. Shucks. It needs to wait
until T1 finishes with the CAS. But T2 and T3 have higher priority than T1.
So, T1 needs to wait until T2 and T3 have finished their work. Then, much
later, T1 gets to run, completes the CAS, and releases the lock. Then,
finally, T4 gets to do its CAS.
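(For contrast, a minimal sketch of a lock the scheduler *does* understand -
the same compare-and-swap wrapped in a protected object with a ceiling
priority, so that under ceiling locking T2 and T3 cannot preempt T1 while
it is inside the protected action. The names here are made up for
illustration:

   with System; with Interfaces; use Interfaces;
   ...
   protected CAS_Cell with Priority => System.Priority'Last is
      procedure Compare_And_Swap (Expected : in out Unsigned_32;
                                  Desired  : Unsigned_32;
                                  Swapped  : out Boolean);
   private
      Value : Unsigned_32 := 0;
   end CAS_Cell;

   protected body CAS_Cell is
      procedure Compare_And_Swap (Expected : in out Unsigned_32;
                                  Desired  : Unsigned_32;
                                  Swapped  : out Boolean) is
      begin
         Swapped := Value = Expected;
         if Swapped then
            Value := Desired;   -- exchange succeeds
         else
            Expected := Value;  -- report the value actually seen
         end if;
      end Compare_And_Swap;
   end CAS_Cell;

The body is ordinary; it is the ceiling protocol, known to the scheduler,
that prevents the inversion.)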
Now, there was a reason for T4 having high priority, namely the fact that
it has the tightest deadline (a general principle in fixed-priority
embedded scheduling, known to be optimal) - which is likely to be long past
in the scenario above.

If T4 controls the brakes in your car, you no longer perceive this as being
merely a performance issue. Dead people do not reflect on such things any
more.

You just saw a priority inversion in action (named so because T4 behaves
for a while as if it had the lowest priority, 1).

There are scheduling schemes that avoid priority inversion, but only if the
locks are a concept provided by the run-time system and well understood by
the scheduler (ICPP, OCPP, Priority Inheritance, ... the Deadline-Floor
Protocol, etc.)

You can't find these buggers by testing, because they are highly
intermittent, i.e., things need to happen at just the right time to cause
the priority inversion.

CAS and friends in the ISA use the synchronization of the memory bus over
each memory access instruction to avoid the need for a lock to make the
operation atomic, even in the case of multicore.

What makes them dangerous is when users apply them to build their own locks
to protect some data, because these locks are then unknown to the
scheduler. => severe risk of priority inversions, if these locks cause
waits.

****************************************************************

From: Randy Brukardt
Sent: Friday, March 30, 2018 5:58 PM

> Here is a canonical scenario for a priority inversion:
>
> 4 tasks T1 - T4; the numbers indicate priority. High number = high
> priority. The tasks are periodic, i.e., released at some time
> intervals.
>
> T1 runs and grabs said lock to perform the software-emulated
> compare-and-swap, but before it is done with it, T4 is released by
> time-trigger and preempts T1; it runs a bit - meanwhile T2 and T3 are
> released, too.
>
> Then T4 asks for the same lock, but can't get it. Shucks.
> It needs to wait until T1 finishes with the CAS. But T2 and T3 have
> higher priority than T1. So, T1 needs to wait until T2 and T3 have
> finished their work. Then, much later, T1 gets to run, completes the
> CAS, and releases the lock. Then, finally, T4 gets to do its CAS.

Thanks. It's clear the problem here is the fact that T1 gets preempted (I
knew there was a reason I dislike preemption :-). I also note that this
doesn't happen if the lock is part of a protected object, as a protected
action can't be preempted (caused via ceiling priority or whatever) unless
no higher-priority task can use it.

> Now, there was a reason for T4 having high priority, namely the fact
> that it has the tightest deadline (a general principle in
> fixed-priority embedded scheduling, known to be optimal) - which is
> likely to be long past in the scenario above.
>
> If T4 controls the brakes in your car, you no longer perceive this as
> being merely a performance issue. Dead people do not reflect on such
> things any more.

This is of course why I want checking on the introduction of parallel
execution. Mere mortals cannot see these sorts of issues; the easier it is
to introduce parallelism, the more likely it is for these sorts of effects
to occur. (I'm happy to have such checking turned off by experts; it
necessarily has to be quite conservative, and it wouldn't do to force many
things to be written as tasks -- which are even less structured.)

> You just saw a priority inversion in action (named so because T4
> behaves for a while as if it had the lowest priority, 1).
> There are scheduling schemes that avoid priority inversion, but only
> if the locks are a concept provided by the run-time system and well
> understood by the scheduler (ICPP, OCPP, Priority Inheritance, ...
> the Deadline-Floor Protocol, etc.)
>
> You can't find these buggers by testing, because they are highly
> intermittent, i.e., things need to happen at just the right time to
> cause the priority inversion.

Right. Tasking issues in general are impossible to find because of that
fact -- even if you get them to happen, you can't reproduce them. I
seriously have no idea how people do that -- even debugging Janus/Ada's
cooperative multitasking is very difficult -- and it's repeatable if you
can get rid of any timing effects.

> CAS and friends in the ISA use the synchronization of the memory bus
> over each memory access instruction to avoid the need for a lock to
> make the operation atomic, even in the case of multicore.
>
> What makes them dangerous is when users apply them to build their own
> locks to protect some data, because these locks are then unknown to
> the scheduler. => severe risk of priority inversions, if these locks
> cause waits.

Makes sense. This suggests that you would prefer that anyone that needs
portable synchronization avoid atomic objects altogether (one presumes that
the compiler has selected an implementation [of protected objects - ED]
known to the scheduler and/or truly atomic -- if the compiler implementer
is clueless about these issues you have no hope anyway). Is that a fair
conclusion?

I'm primarily interested here in the effect on "checked" synchronization
for parallel execution. That needs to be defined so that any moderately
competent Ada programmer can do the right thing. Since "parallel" is often
used as an optimization, it will often be introduced after the fact, so the
only thing preventing problems is the checking.

****************************************************************

From: Erhard Ploedereder
Sent: Friday, March 30, 2018 6:59 PM

> I also note that this doesn't happen if the lock is part of a
> protected object, as a protected action can't be preempted (caused via
> ceiling priority or whatever) unless no higher-priority task can use
> it.

True only under a scheduling based on ceiling protocols. Under "plain"
fixed-priority preemptive scheduling, or even with priority inheritance,
the preemption can happen.

****************************************************************

From: Edward Fish
Sent: Thursday, March 29, 2018 10:16 PM

> This is of course why I want checking on the introduction of parallel
> execution.

But the issue here (preemption of execution) is purely a sequential issue:
this is to say, if you have Task-1 and Task-2, where Task-2 is executing
and there's a free processor for Task-1, there is no issue. (This issue
with locks is something different, at least how I learned it [preemption
having to do strictly with execution].)

> Mere mortals cannot see these sorts of issues; the easier it is
> to introduce parallelism, the more likely it is for these sorts of
> effects to occur. (I'm happy to have such checking turned off by
> experts; it necessarily has to be quite conservative, and it wouldn't
> do to force many things to be written as tasks -- which are even less
> structured.)
I was really impressed by the thesis that I referenced in an earlier email
-- "Reducing the cost of real-time software through a cyclic task
abstraction for Ada" -- I thought it did a great job with increasing the
accuracy of schedulability and analysis, at least in the theoretical.

> Right. Tasking issues in general are impossible to find because of
> that fact -- even if you get them to happen, you can't reproduce them.
> I seriously have no idea how people do that -- even debugging
> Janus/Ada's cooperative multitasking is very difficult -- and it's
> repeatable if you can get rid of any timing effects.

Are they? In the very 'generalest' it may be like the halting problem and
thus impossible... but I don't know that that necessarily translates into
some usable subset. Just like how just because Ada's generics are not
Turing-complete doesn't mean they're unusable. (Indeed, I'd argue that
Turing-completeness in a generic- or template-system hurts usability.)

> CAS and friends in the ISA use the synchronization of the memory bus
> over each memory access instruction to avoid the need for a lock to
> make the operation atomic, even in the case of multicore.
> What makes them dangerous is when users apply them to build their own
> locks to protect some data, because these locks are then unknown to
> the scheduler. => severe risk of priority inversions, if these locks
> cause waits.

> Makes sense. This suggests that you would prefer that anyone that
> needs portable synchronization avoid atomic objects altogether (one
> presumes that the compiler has selected an implementation known to
> the scheduler and/or truly atomic -- if the compiler implementer is
> clueless about these issues you have no hope anyway). Is that a fair
> conclusion?

Seems a fair conclusion to me, but the reverse may be interesting: when the
synchronization constructs present enough information to the scheduler to
make such guarantees -- this honestly seems right up Ada's alley or, if
not, certainly SPARK's.

****************************************************************

From: Jean-Pierre Rosen
Sent: Friday, March 30, 2018 11:50 PM

>> I also note that this doesn't happen if the lock is part of a
>> protected object, as a protected action can't be preempted (caused
>> via ceiling priority or whatever) unless no higher-priority task can
>> use it.
>
> True only under a scheduling based on ceiling protocols. Under "plain"
> fixed-priority preemptive scheduling, or even with priority
> inheritance, the preemption can happen.

More precisely: preemption can always happen on tasks in protected actions,
but in the case of the priority ceiling protocol, a task can be preempted
only by a task that is not allowed to call the same PO, thus preventing
priority inversion.

****************************************************************

From: Erhard Ploedereder
Sent: Saturday, March 31, 2018 11:57 AM

>> This is of course why I want checking on the introduction of parallel
>> execution.
>
> But the issue here (preemption of execution) is purely a sequential
> issue: this is to say, if you have Task-1 and Task-2, where Task-2 is
> executing and there's a free processor for Task-1, there is no issue.
> (This issue with locks is something different, at least how I learned
> it [preemption having to do strictly with execution].)

Yes and no. In your scenario, the inversion goes away. But what about T2
and T3? They would preempt T1 if run on the same core. Welcome back to
priority inversion for T4.
You can get rid of it only if you have as many cores as tasks (not likely),
or forgo preemptive scheduling (not real-time), or use the scheduling/lock
protocol schemes talked about in the earlier mail.

****************************************************************

From: Brad Moore
Sent: Saturday, March 31, 2018 6:58 PM

>> He sure would hate not to be told that his compare-and-swap is not
>> mapped to such an operation. If the compiler is mum about this, there
>> is a good chance that the issue is not discovered during porting of
>> the software.
>
> Compilers that are mum violate the Standard. Unfortunately, there's no
> good way to enforce that rule.

This sounds like maybe another case where having a suppressible error might
be useful. A language-mandated "soft" error sounds like a better way to
force a compiler to not be mum.

>> So, I would have fewer problems with a set of procedures that
>> encapsulated the particular primitives, combined with a mandate to
>> support all that match a bus-synchronized operation and a mandate to
>> REJECT calls to any for which said operation does not exist.
>
> The problem is I don't think we can do that in the Standard, since
> there is zero chance that we could formally define what a
> "bus-synchronized operation" is. And any Legality Rule requires that.

If it were a soft error, would it still be considered a legality rule? If
not, maybe it could be added without requiring definitions for such things
as "bus-synchronized", and perhaps not require an ACATS test.

****************************************************************

From: Brad Moore
Sent: Saturday, March 31, 2018 7:28 PM

> ... Atomic compare-and-swap is the most fundamental
> operation when building lock-free data structures, and it is quite
> annoying that this cannot be done portably in Ada. And yes, this
> operation is for experts only, but that doesn't mean such experts
> don't want to write in Ada, while still achieving some level of
> portability.
>
> Below is the gcc API, extracted from:
> https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

I have been investigating this suggestion to create some sort of Ada
interface to the gcc API.

> I think we don't need all the options.

By "options", I'm not sure if you were referring to memory-order options or
to primitives. Here's a quick summary of what I've found so far.

The gcc interface appears to map to the C++11 memory model. There are
basically 3 main flavors of memory ordering supported.

The most restrictive mode is Sequentially Consistent, which means updates
are synchronised across all threads, and all threads see the same order of
updates to atomic variables. I believe this corresponds closely to Ada's
definition of Volatile and Atomic objects. This is the safest mode, but
also the least efficient, since it requires a higher level of
synchronisation. We could probably support primitives that are in
sequentially consistent mode most easily, since we mostly already have
this, and it is the safest mode with the least amount of unexpected
behaviours and pitfalls.

The next level down in terms of synchronisation requirements is a mode
where only the threads involved in the access to a particular atomic
variable are synchronised with each other. This capability, however,
affects which optimisations can be applied to the code. For example,
hoisting or sinking of code across boundaries of access to atomic variables
is disabled. To support this, the compiler needs to be intimately aware of
and involved with where this is used when it is applying optimisations.
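To illustrate that flavor, here is a sketch of the standard C++11-style
release/acquire message-passing idiom, using the names from the package
attached at the end of this message (Flag being an object of the
instantiated atomic type, Data ordinary non-atomic data):

   -- Producer task:
   Data := 42;                            -- ordinary (non-atomic) write
   Atomic_Store (Into => Flag, Value => 1, Memory_Order => Release);

   -- Consumer task:
   if Atomic_Load (From => Flag, Memory_Order => Acquire) = 1 then
      -- The Release store to Flag happens-before this Acquire load,
      -- so the write to Data is guaranteed to be visible here.
      Put_Line (Integer'Image (Data));
   end if;

Only the threads touching Flag are ordered with respect to each other;
unrelated threads are not synchronised, which is what makes this cheaper
than sequentially consistent operations.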
I am only guessing, but I suspect there may not be enough appetite to
introduce such a memory mode if it requires the compiler to mesh well with
it.

Below that is a relaxed mode where there is no synchronisation between
threads; the only guarantee is that a thread won't see previous values of a
variable if it has seen a newer value. This is the most efficient mode, but
also the most dangerous. Someone who knows what they are doing could in
theory use this to write more efficient code. This mode might be easier to
implement than the previous mode, since it doesn't enforce "happens-before"
constraints. Maybe we could create a new type of Volatile aspect, such as
Relaxed_Volatile, for this purpose?

The other threads of this email chain seem to be suggesting, however, that
the use of atomic primitives in user code to create lock-free abstractions
should be discouraged, to avoid problems such as priority inversions when
the implementation is in software instead of hardware.

This has me thinking that the best option for Ada to add capability in this
area may be to go back to the Lock_Free aspect idea. That way, the
implementation of the lock is provided by the compiler implementation, and
fits into the synchronisation model we already have for protected objects.
The implementation can choose between multiple implementation techniques,
such as transactional memory. A protected object also guides the programmer
to write code inside the lock that is better formed, such as by disallowing
potentially blocking operations.

Here is a quote from the introduction of Geert's paper that seems relevant:

"The use of atomic primitives, memory barriers or transactional memory are
implementation details after all, that should not appear in actual user
code [1]."

Where the reference is:

H.-J. Boehm. Transactional memory should be an implementation technique,
not a programming interface. In Proceedings of the First USENIX conference
on Hot topics in parallelism, HotPar'09, pages 15-15, Berkeley, CA, USA,
2009. USENIX Association.

I did go through the exercise of creating an Ada package spec for the
functions described in Tucker's link, and came up with a generic package,
which I've attached if anyone is interested. Perhaps many of the primitives
would not be needed, such as the arithmetic ones and the basic load and
store routines.

---- atomic_operations.ads ----

with Interfaces;

generic
   type Atomic_Type is mod <>;
package Atomic_Operations is

   -- The gcc built-ins operate on objects of 1, 2, 4, or 8 bytes
   -- ('Size is in bits).
   pragma Assert (Atomic_Type'Size = 8 or else
                  Atomic_Type'Size = 16 or else
                  Atomic_Type'Size = 32 or else
                  Atomic_Type'Size = 64);

   type Memory_Orders is
     (Relaxed,
      -- Implies no inter-thread ordering constraints.

      Consume,
      -- This is currently implemented using the stronger Acquire memory
      -- order because of a deficiency in C++11's semantics for
      -- memory_order_consume.

      Acquire,
      -- Creates an inter-thread happens-before constraint from the Release
      -- (or stronger) semantic store to this acquire load. Can prevent
      -- hoisting of code to before the operation.
      -- Note: This implies the compiler needs to be pretty aware of this
      -- setting, as it affects optimisation; i.e. calls that use this
      -- order are not just regular library calls.

      Release,
      -- Creates an inter-thread happens-before constraint to acquire (or
      -- stronger) semantic loads that read from this release store. Can
      -- prevent sinking of code to after the operation.
      -- Note: This implies the compiler needs to be pretty aware of this
      -- setting, as it affects optimisation; i.e.
   function Atomic_Load
     (From         : aliased Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic,
          Pre => Memory_Order = Relaxed or else
                 Memory_Order = Acquire or else
                 Memory_Order = Consume or else
                 Memory_Order = Sequentially_Consistent;
   --  Returns the value of From.

   procedure Atomic_Load
     (From         : aliased Atomic_Type;
      Result       : out Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
     with Convention => Intrinsic,
          Pre => Memory_Order = Relaxed or else
                 Memory_Order = Acquire or else
                 Memory_Order = Consume or else
                 Memory_Order = Sequentially_Consistent;
   --  Reads the value of From into Result.

   procedure Atomic_Store
     (Into         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
     with Convention => Intrinsic,
          Pre => Memory_Order = Relaxed or else
                 Memory_Order = Release or else
                 Memory_Order = Sequentially_Consistent;
   --  Writes Value into Into.

   function Atomic_Exchange
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic,
          Pre => Memory_Order = Relaxed or else
                 Memory_Order = Acquire or else
                 Memory_Order = Release or else
                 Memory_Order = Acquire_Release or else
                 Memory_Order = Sequentially_Consistent;
   --  Writes Value into Item, and returns the previous value of Item.

   procedure Atomic_Exchange
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Result       : out Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
     with Convention => Intrinsic,
          Pre => Memory_Order = Relaxed or else
                 Memory_Order = Acquire or else
                 Memory_Order = Release or else
                 Memory_Order = Acquire_Release or else
                 Memory_Order = Sequentially_Consistent;
   --  Stores the value of Value into Item. The original value of Item is
   --  copied into Result.

   function Atomic_Compare_And_Exchange
     (Item                 : aliased in out Atomic_Type;
      Expected             : aliased in out Atomic_Type;
      Desired              : Atomic_Type;
      Weak                 : Boolean;
      Success_Memory_Order : Memory_Orders;
      Failure_Memory_Order : Memory_Orders)
      return Boolean
     with Convention => Intrinsic,
          Pre => Failure_Memory_Order /= Release and then
                 Failure_Memory_Order /= Acquire_Release and then
                 Failure_Memory_Order <= Success_Memory_Order;
   --  Compares the value of Item with the value of Expected. If equal, the
   --  operation is a read-modify-write operation that writes Desired into
   --  Item. If they are not equal, the operation is a read, and the current
   --  contents of Item are written into Expected (hence the "in out" mode
   --  on Expected).
   --  Weak is True for weak compare-and-exchange, which may fail
   --  spuriously, and False for the strong variation, which never fails
   --  spuriously. Many targets only offer the strong variation and ignore
   --  the parameter. When in doubt, use the strong variation.
   --  If Desired is written into Item, then True is returned and memory is
   --  affected according to the memory order specified by
   --  Success_Memory_Order. There are no restrictions on what memory order
   --  can be used here.
   --  Otherwise, False is returned and memory is affected according to
   --  Failure_Memory_Order.

   ------------------------------------------------------------------------
   --  Arithmetic Operations
   --
   --  The following functions perform the operation suggested by the name,
   --  and return the result of the operation.
   --
   --  i.e. Item := Item op Value; return Item;
   --
   --  All memory orders are valid.
   ------------------------------------------------------------------------

   function Atomic_Add_And_Fetch
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Subtract_And_Fetch
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Bitwise_And_And_Fetch
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Bitwise_Or_And_Fetch
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Xor_And_Fetch
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Nand_And_Fetch
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   ------------------------------------------------------------------------
   --  The following functions perform the operation suggested by the name,
   --  and return the value that had previously been in Item.
   --
   --  i.e. Tmp := Item; Item := Item op Value; return Tmp;
   --
   --  All memory orders are valid.
   ------------------------------------------------------------------------

   function Atomic_Fetch_And_Add
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Fetch_And_Subtract
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Fetch_And_Bitwise_And
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Fetch_And_Bitwise_Or
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Fetch_And_Xor
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Fetch_And_Nand
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Test_And_Set
     (Item         : aliased in out Interfaces.Unsigned_8;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Boolean
     with Convention => Intrinsic;
   --  Performs an atomic test-and-set operation on Item. Item is set to
   --  some implementation-defined nonzero "set" value, and the return
   --  value is True if and only if the previous contents were "set".
   --  All memory orders are valid.
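   --  Informal usage note (an addition, not part of the original
   --  attachment): together with Atomic_Clear below, this is the classic
   --  spin-lock pair. For an aliased Interfaces.Unsigned_8 object Lock:
   --
   --     while Atomic_Test_And_Set (Lock, Acquire) loop
   --        null;  --  busy-wait; the flag was already "set"
   --     end loop;
   --     ...  --  critical section
   --     Atomic_Clear (Lock, Release);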
   procedure Atomic_Clear
     (Item         : aliased in out Interfaces.Unsigned_8;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
     with Convention => Intrinsic,
          Pre => Memory_Order = Relaxed or else
                 Memory_Order = Release or else
                 Memory_Order = Sequentially_Consistent;
   --  Performs an atomic clear operation on Item. After the operation,
   --  Item contains 0. This call should be used in conjunction with
   --  Atomic_Test_And_Set.

   procedure Atomic_Thread_Fence
     (Memory_Order : Memory_Orders := Sequentially_Consistent)
     with Convention => Intrinsic;
   --  This procedure acts as a synchronization fence between threads based
   --  on the specified memory order. All memory orders are valid.

   procedure Atomic_Signal_Fence
     (Memory_Order : Memory_Orders := Sequentially_Consistent)
     with Convention => Intrinsic;
   --  This procedure acts as a synchronization fence between a thread and
   --  signal handlers in the same thread. All memory orders are valid.

   function Atomic_Always_Lock_Free return Boolean
     with Convention => Intrinsic;
   --  Returns True if objects of Atomic_Type always generate lock-free
   --  atomic instructions for the target architecture.

   function Atomic_Always_Lock_Free
     (Item : aliased Atomic_Type) return Boolean
     with Convention => Intrinsic;
   --  Returns True if objects of Atomic_Type always generate lock-free
   --  atomic instructions for the target architecture. Item may be used to
   --  determine alignment. The compiler may also ignore this parameter.

   function Atomic_Is_Lock_Free
     (Item : aliased Atomic_Type) return Boolean
     with Convention => Intrinsic;
   --  Returns True if objects of Atomic_Type always generate lock-free
   --  atomic instructions for the target architecture. If this is not
   --  known to be lock-free, a call is made to a runtime routine to
   --  determine the answer. Item may be used to determine alignment. The
   --  compiler may also ignore this parameter.

end Atomic_Operations;
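As a quick illustration of how this generic might be used (a hypothetical
sketch; the instantiation and names are assumptions, not part of the
attachment), here is a lock-free saturating increment built on the
compare-and-exchange primitive:

   with Atomic_Operations;
   procedure Demo is
      type Counter_Type is mod 2**32;
      package Counter_Ops is new Atomic_Operations (Counter_Type);
      use Counter_Ops;

      Counter : aliased Counter_Type := 0 with Atomic;

      --  Atomically add one to Counter, but never wrap around to zero.
      procedure Saturating_Increment is
         Old : aliased Counter_Type;
      begin
         Atomic_Load (Counter, Old);  --  read the current value once
         loop
            exit when Old = Counter_Type'Last;  --  already saturated
            exit when Atomic_Compare_And_Exchange
                        (Counter, Old, Old + 1,
                         Weak                 => True,
                         Success_Memory_Order => Sequentially_Consistent,
                         Failure_Memory_Order => Sequentially_Consistent);
            --  On failure, Old has been updated to the value actually seen
            --  in Counter, so just retry with that value.
         end loop;
      end Saturating_Increment;
   begin
      Saturating_Increment;
   end Demo;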
****************************************************************

From: Erhard Ploedereder
Sent: Saturday, March 31, 2018  7:43 PM

>> So, I would have less problems with a set of procedures that
>> encapsulated the particular primitives, combined with a mandate to
>> support all that match a bus-synchronized operation and a mandate to
>> REJECT calls to any for which said operation does not exist.
>
> The problem is I don't think we can do that in the Standard, since
> there is zero chance that we could formally define what a
> "bus-synchronized operation" is. And any Legality Rule requires that.

Not so difficult: "A call on xyz is illegal if the compiler does not map it
to a single atomic instruction of the target architecture." (Or rewrite
this as an Implementation Requirement, with an "otherwise, produce an
error/warning/diagnostic message" clause.)

Incidentally, C.1(11) already has the recommendation that intrinsic
subprograms for the set of these atomic memory operations be provided.
(Atomic increment is one of them - this was mentioned in another mail as
being useful when available.) The way C.1 is written, one would expect the
respective target-specific subset of atomic memory operations to be
provided. This would be a good place to standardize their signatures and be
done with the AI.

****************************************************************

From: Tucker Taft
Sent: Sunday, April 1, 2018  4:39 PM

I don't think we should delve into the various memory ordering options. It
just seems like overkill given how rarely these will be used. Remember that
C++ doesn't have an equivalent to the protected object (at least not yet --
never say never in C++ ;-). So we just need to address the main usage, I
believe.

So I would go with Sequentially Consistent, and those who are desperate for
some looser synchronization can write their own primitives, or pay their
vendor to provide them.

Standardizing the Lock_Free aspect might be worth doing, but I don't
believe that addresses the main goal here, which is to provide atomic
operations on atomic objects.

****************************************************************
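As a point of reference for the Lock_Free direction discussed above, here
is a sketch (GNAT-specific, not proposed wording; the names are
illustrative) of a protected counter using GNAT's implementation-defined
pragma Lock_Free:

   protected type Atomic_Counter is
      pragma Lock_Free;
      --  Ask GNAT to implement this type without a lock, using atomic
      --  primitives; the compiler rejects the type if it cannot.
      procedure Increment;
      function Value return Natural;
   private
      Count : Natural := 0;
   end Atomic_Counter;

   protected body Atomic_Counter is
      procedure Increment is
      begin
         Count := Count + 1;  --  becomes an atomic read-modify-write
      end Increment;

      function Value return Natural is
      begin
         return Count;
      end Value;
   end Atomic_Counter;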