Version 1.13 of ai12s/ai12-0234-1.txt

Unformatted version of ai12s/ai12-0234-1.txt version 1.13
Other versions for file ai12s/ai12-0234-1.txt

!standard C.6.1(0)          19-03-09 AI12-0234-1/05
!standard C.6.2(0)
!standard C.6.3(0)
!class Amendment 17-06-09
!status work item 17-06-09
!status received 17-05-19
!priority Low
!difficulty Easy
!subject Compare-and-swap for atomic objects
!summary
!problem
On multiprocessor platforms, relying on locks for synchronization can be problematic. One issue is that while a task is blocked, it can be difficult or impossible to offer guarantees for progress. Other problems associated with locks includes deadlock, livelock, and priority inversion. Locking can also be a significant detriment to performance, and reduce opportunities for parallelism.
Lock-free data structures guarantee system-wide progress, while wait-free data structures, in addition to being lock-free, also guarantee per-thread progress.
Lock-free data structures can also be used to improve performance by increasing the amount of time spent executing in parallel since access to the data structure does not need to be serialised.
If one wanted to construct such a structure in Ada, one might use Atomic objects. But Ada does not provide any compare-and-swap operation (or other read-write locked operations, like atomic increment). The RM has Implementation Advice that package Machine_Code include such operations - C.1(11-12), but that is the absolute opposite of a portable operation. One could write such a library in another language such as C and call that library from Ada, but then that involves at least one subprogram call in addition to a single instruction insertion, which is significantly higher overhead than what might be desired, and having to rely on the existence of other languages for a given target is also undesirable and less portable.
Compare-and-swap in particular is likely one of the most important capabilities needed for atomic operations. This is because compare-and-swap instructions can be used as a building block to create all the other atomic operations one might want to use. Ada should provide a way to more portably insert Compare-And-Swap instructions into an application.
Another related capability is to simply swap the value of an atomic variable with another value. That is, write to an atomic variable, and atomically retrieve the original value of the variable before the write operation. Such a capability can be useful for creating Spin-Locks, for example. Ada should provide a way to more portably insert Swap instructions into an application?
!proposal
The solution is to provide a couple of standard library calls that map to commonly available atomic hardware instructions; One of which that performs atomic compare and swap operations, and the other of which that performs atomic swap operations. These subprograms are to be intrinsic calls. If the target architecture does not support these operations with simple machine instructions, then the subprograms are implemented to produce the same result using proper synchronization for concurrent access.
The library is a generic function in order to support operations on types of different sizes, and we require that the actual types for the generics be atomic types, so that Ada's semantics of atomic types can be associated with this primitive operation.
This proposal depends on the ability to specify the Atomic aspect for a generic formal type (AI12-0282-1). This is needed to ensure that the actual types used to instantiate the generic libraries defined by this AI are atomic types.
!wording
C.6.1 Atomic Operations
This clause presents the specifications of the package Atomic_Operations and its child packages, which provide facilities for manipulating object of atomic types and for creating lock-free data types. The subprograms of these packages are Intrinsic subprograms (see 6.3.1) in order to provide convenient access to machine operations that can provide these capabilities if they are available in the target environment.
C.6.2 The Package System.Atomic_Operations
The library package System.Atomic_Operations is the parent of a set of child units that provide facilities for manipulating objects of atomic types.
Static Semantics
The following language-defined library package exists:
package System.Atomic_Operations with Pure, Nonblocking is end System.Atomic_Operations;
System.Atomic_Operations serves as the parent of other language-defined library units that manipulate atomic objects; its declaration is empty.
A call to a subprogram is said to be lock-free if the subprogram is guaranteed to return from the call while keeping the processor of logical thread of control busy for the duration of the call.
In each child package, a function Is_Lock_Free(...) is provided to check whether the operations of the child package can all be provided lock-free for a given object. Is_Lock_Free returns True if operations defined in the child package are lock-free when applied to the object denoted by Item, otherwise Is_LocK_Free returns False.
C.6.3 The Generic Package System.Atomic_Operations.Exchange
The language-defined generic package System.Atomic_Operations.Exchange provides the following operations:
- To atomically compare the value of two atomic objects, and update the first
atomic object with a desired value if both objects were found to be equal, or otherwise update the second object with the value of the first object.
- To atomically update the value of an atomic object, and then
return the value that the atomic object had just prior to the update.
Static Semantics
The library function Ada.Atomic_Operations.Atomic_Compare_And_Exchange has the following declaration:
The library package System.Atomic_Operations.Exchange has the following declaration:
generic type Atomic_Type is private with Atomic; package System.Atomic_Operations.Exchange with Pure, Nonblocking is
function Atomic_Exchange (Item : aliased in out Atomic_Type; Value : Atomic_Type) return Atomic_Type with Convention => Intrinsic;
function Atomic_Compare_And_Exchange (Item : aliased in out Atomic_Type; Expected : aliased in out Atomic_Type; Desired : Atomic_Type) return Boolean with Convention => Intrinsic;
function Is_Lock_Free (Item : aliased Atomic_Type) return Boolean with Convention => Intrinsic;
end Atomic_Operations.Exchange;
Atomic_Exchange writes the value denoted by Value into Item, and returns the previous value denoted by Item.
Atomic_Compare_And_Exchange compares the value denoted by Item with the value denoted by Expected. If equal, the operation is a read-modify-write operation that writes the value denoted by Desired into Item. If they are not equal, the operation is a read and the value denoted by Item is written into Expected. If Desired is written into Item then True is returned. Otherwise, False is returned.
Examples
Example of a spinlock using Atomic_Exchange
type Atomic_Boolean is new Boolean with Atomic; package Exchange is new Atomic_Operations.Exchange (Atomic_Type => Atomic_Boolean);
Lock : aliased Atomic_Boolean := False;
...
begin -- Some critical section, trying to get the lock:
-- Obtain the Spinlock while (Exchange.Atomic_Exchange (Item => Lock, Value => Atomic_Boolean(True)) loop null; end loop
... -- do stuff
lock := False; -- Release the lock end;
!discussion
The solution needs to be generic in order to work with different atomic types. For that to make sense, it is necessary to allow aspect Atomic to be given on the formal type, in order to require any actual type is indeed atomic. The alternative of just saying that if the type is not atomic, then the operation isn't either, seems error-prone.
Many lock free algorithms involve atomic primitives such as compare and swap instructions. Most target platforms provide some form of machine instructions in this category.
However, most target architectures have limits on the size of objects that can be mapped to hardware machine instructions. Furthermore, some target architectures have sub-architectures that do not always provide such machine instructions. For example, an executable compiled for one target might support 64 bit atomic instructions, but only 32 bit atomic instructions on older versions of the same target family. It is desirable that such machine instructions be used when they are available, but that the software should still work and provide the same results when executing on another target that does not provide such instructions.
Similarly, the alignment of the object can sometimes determine whether lock-free instructions may be used on some architectures.
To allow the programmer to query the implementation to determine if the operations declared in child units of System.Atomic_Operations can be executed as lock-free calls, an additional function is provided in each of the child packages: Is_Lock_Free.
The gcc intrinsic libraries also provide a call to determine if the target environment always provides lock-free versions of the calls, for a given size, and alignment. We considered whether we should similarly define a call Alway_Lock_Free, with similar semantics, but decided that there would not be enough need to provide such a call. We could add such a call later, if the need arises.
The approach taken to improving support for lock free data structures is to provide a set of libraries of low level atomic primitives similar to the library that is provided by gcc for C and C++.
The C and C++ memory model support several different memory orders. The most restrictive memory order is called sequentially consistent, where all threads are synchronised such that updates to shared variables are seen by all threads, and the order seen by all threads is the same. Due to the higher level of synchronisation, this is also the most inefficient memory order, but it is the default memory order because it is the safest, and produces the least number of surprising results.
Moving towards lower level of synchronisation, there are memory orders called Acquire and Release, where essentially the synchronisation is limited to the threads involved in the atomic operations for a particular shared variable.
Relaxing the synchronisation even further is a memory order called Relaxed, where this synchronisation is not present. There are guarantees that a given thread will not see older values of variables once it has seen an update to a variable, but that is about the limit of the guarantees, other than that updates are seen in some sort of time bounded manner. Programmers using this memory order in theory should be able to write more efficient code, though it is can be much trickier to get code to work properly, since there are more unexpected behaviours due to the lack of synchronisation between threads. One other point that should be mentioned is that these atomic primitives would need to be have the Intrinsic convention, because they can require the disabling of certain compiler optimisations, such as hoisting or sinking code across boundaries where atomic primitives are being used. For instance the Acquire/Release memory order has this requirement in particular.
For the approach of this AI, it probably would be best for now to not bother with supporting the lower synchronisation memory orders, and instead provide a library which implicitly assumes the sequentially consistent memory order, which is both safer to use, and easier to understand.
The library of intrinsic primitives might be of interest to those wishing to implement specific lock free algorithms, particularly if porting those applications from other languages.
We considered whether these libraries should be "future proofed" by providing an extra parameter for the memory model. The type of that parameter could be an enumeration which for now only contains Sequentially_Consistent, but could be extended in the future to name other memory models. We decided that there is not likely to be backwards compatibility issues since these are intrinsic suprograms. For example, they cannot be passed as actual subprograms to generic libraries. Also, if needed, in the future we could add overloaded calls to these libraries that accept a memory order parameter, without introducing backward compatibility issues.
The gcc version of the Atomic_Compare_And_Exchange call has an extra boolean parameter, named "Weak". According to the documentation, Weak is true for weak compare_and_exchange, which may fail spuriously, and false for the strong variation, which never fails spuriously. Many targets only offer the strong variation and ignore the parameter. The benefit for having the "Weak" version is that it can provide additional performance benefits, if the programmer knows how to use it properly. For Ada, we thought it was probably not a good idea to provide a parameter that can cause "spurious failures", so for now we eliminated that parameter, and expect Weak to be passed as False, if needed by the implementation. We could add a separate call in the future that exposes the Weak parameter, if there is enough need for such a capability.
!ASIS
No ASIS effect.
!ACATS test
An ACATS C-Test is needed to check that the new capabilities are supported.
!appendix

!topic Lock-free data structures
!reference Ada 202x RM9
!from Steven-Stewart-Gallus 17-05-19
!keywords lock-freedom concurrency tasking hard-real time !discussion

Will Ada 202x have support for lock-free data structures?  An API along the
style of the GNAT Lock_Free pragma or a generic package like I made at
https://sstewartgallus.com/git?p=linted.git;a=blob;f=src/ada-core/src/linted-gcc
_atomics.ads;h=a87061e74aa3a7badcfcd7cd0f0f5c0f2abe1908;hb=HEAD that mirrors
C++'s support might be useful.  Also, a function for x86's pause instruction or
similar would be useful.

This would all be useful for hard-real time platforms that need very strict
timing guarantees.

The Lock_Free pragma would probably be easiest for formalizing in SPARK and
such.

****************************************************************

From: Randy Brukardt
Sent: Friday, May 19, 2017  5:15 PM

> Will Ada 202x have support for lock-free data structures?

No idea; no one has formally asked until 30 minutes ago. In particular, the
real-time folks at IRTAW have not (yet?) forwarded any proposals in this idea.
Generally, we let them take the lead on real-time issues.

And in any case, it is the wrong question. A lock-free data structure is a
specific solution, not a problem or capability. Ada already provides a wide
variety of ways to write data structures for multitasking, from the very low
level (aspects Atomic and Volatile) to the nicely abstract (protected objects
and rendezvous).

We need to know what cannot be done with the existing features.

No one seems interested in explaining what they cannot do in Ada, but rather
seem interested in following the herd to use solutions cooked up for languages
that don't have the abstract multitasking capabilities of Ada. In many cases,
there are better ways to do in Ada what you might have to do in some low-level
manner in some other language. (And that's often true in other areas of Ada as
well -- OOP in particular.)

In particular, I'd like to know the following. (Aside: I always tell people that
I know just enough about multitasking to be dangerous, so please humor me if you
think these things are obvious.)

What capabilities (precisely) are missing from Ada in order for it to directly
support low-level lock-free data structures? We most certainly would not want to
add a bunch of new low-level capabilities, but rather would want to extend the
existing facilities to better support low-level lock-free data structures. It
seems obvious to me that Ada needs a portable atomic test-and-set operation, but
I don't know if that is enough nor whether that is really practical. Nor is the
best form to define that obvious (it would have to be implemented as a built-in
in order to get the intended special semantics, which complicates
implementation).

And the even more important question: in general use, what can you do with a
lock-free data structure that you cannot do with a protected object? After all,
if you can use a PO to accomplish your task, you should do that as it is far
more abstract and portable than any particular implementation could be. And by
using a PO, you are letting the compiler chose the most efficient way to
implement your data structure for your target rather than making an assumption
that very well may be wrong. (Programmers are notoriously bad at determining the
efficiency of code and the importance of efficiency of particular code.)

Your thoughts (and anyone's, for that matter) on this topics would help guide
thinking on these topics.

****************************************************************

From: Tucker Taft
Sent: Friday, May 19, 2017  5:30 PM

Note that AdaCore supports a pragma Lock_Free that can be applied to protected
types to cause them to use lock-free primitives (or complain if not possible):

  http://docs.adacore.com/gnat_rm-docs/html/gnat_rm/gnat_rm/implementation_defined_pragmas.html#pragma-lock-free

Be that as it may, I sympathize with having access to a Compare_And_Swap
primitive applicable to atomic objects, since most hardware supports that at
this point, and from it you can implement essentially any lock-free structure.

****************************************************************

From: Randy Brukardt
Sent: Friday, May 19, 2017  6:34 PM

Thanks Tuck; I was aware of that.

That just seems to me to be an inversion -- the compiler ought to select the
best implementation, not make the user guess what implementation is best on
their target. (I realize this sort of inversion is ingrained in computing;
aspect Inline is another example of this sort of inversion -- and I don't like
it much either.)

It seems clear that there is a hole when creating low-level algorithms (no
test-and-set), it's much less obvious that there is such a hole for protected
objects (after all, most of anything in a program is not performance critical,
so restructuring your data structures to fit some lock-free profile just makes
your code harder to understand in most cases). And one assumes that compilers do
the best they can for a data structure and don't just fall back on some general
algorithm. (What compiler vendor wants to generate slower than necessary
code???)

****************************************************************

From: Steven Stewart-Gallus
Sent: Saturday, May 20, 2017  3:11 PM

>> Will Ada 202x have support for lock-free data structures?
>
> No idea; no one has formally asked until 30 minutes ago. In particular, the
> real-time folks at IRTAW have not (yet?) forwarded any proposals in this
> idea. Generally, we let them take the lead on real-time issues.

Does anyone know how I would get into contact with them?

> And in any case, it is the wrong question. A lock-free data structure is a
> specific solution, not a problem or capability.

I'm sorry I meant capabilities for implementing lock-free data  structures.
Standard library support for common data structures is an entirely separate
question.  Under current standard Ada it is not possible to implement certain
data structures that guarantee forward progress among all threads without
delving into low-level assembly.

> Ada already provides a wide variety of ways to write data structures
> for multitasking, from the very low level (aspects Atomic and
> Volatile) to the nicely abstract (protected objects and rendezvous).

Unfortunately, current Atomics support does not provide for
compare_and_exchange primitives and similar and so cannot support  user-written
lock-free data structures.

> We need to know what cannot be done with the existing features.

Atomic swaps, compare_and_swaps and processor specific thread yield
instructions.  Also, less strongly ordered atomics that are less expensive.

> No one seems interested in explaining what they cannot do in Ada, but rather
> seem interested in following the herd to use solutions cooked up for
> languages that don't have the abstract multitasking capabilities of Ada. In
> many cases, there are better ways to do in Ada what you might have to do in
> some low-level manner in some other language. (And that's often true in
> other areas of Ada as well -- OOP in particular.)

You realize the standard library has to be implemented somewhere right?
On some platforms there is no Ada standard library or a very reduced set  of
capabilities and people have to implement such capability themselves. And as
stated previously, there is NO way to implement the timing  guarantees such as
lock-freedom or wait-freedom in standard Ada.  Also,  these capabilities cannot
be used inside interrupts or signal handlers.

> In particular, I'd like to know the following. (Aside: I always tell people
> that I know just enough about multitasking to be dangerous, so please humor
> me if you think these things are obvious.)
>
> What capabilities (precisely) are missing from Ada in order for it to
> directly support low-level lock-free data structures? We most certainly
> would not want to add a bunch of new low-level capabilities, but rather
> would want to extend the existing facilities to better support low-level
> lock-free data structures. It seems obvious to me that Ada needs a portable
> atomic test-and-set operation, but I don't know if that is enough nor
> whether that is really practical. Nor is the best form to define that
> obvious (it would have to be implemented as auilt-in in order to get the
> intended special semantics, which complicates implementation).

The C++ and C standards have a somewhat reasonable API.  I gave an  example of
two possible APIs.  The Lock_Free pragma of GNAT and the  generic package
wrapper over intrinsics provided by GCC.

I think the Lock_Free pragma approach might be easiest for static  analysers
such as SPARK to analyze and so would be most suited for  hard-real time
support.  On the other hand, it wouldn't work for  wait-freedom requirements.

The barest mimimal requirements are probably atomic exchange, atomic  compare
and exchange and nop or yield instruction.  All else can be  implemented using
them.  However, certain atomics cannot be implemented  in a wait-free way
without being built in and instead must be lock-free.

> And the even more important question: in general use, what can you do with a
> lock-free data structure that you cannot do with a protected object? After
> all, if you can use a PO to accomplish your task, you should do that as it
> is far more abstract and portable than any particular implementation could
> be. And by using a PO, you are letting the compiler chose the most efficient
> way to implement your data structure for your target rather than making an
> assumption that very well may be wrong. (Programmers are notoriously bad at
> determining the efficiency of code and the importance of efficiency of
> particular code.)

You cannot guarantee special properties such as wait-freedom, or  lock-freedom.

****************************************************************

From: Florian Weimer
Sent: Monday, May 22, 2017  6:27 AM

> That just seems to me to be an inversion -- the compiler ought to
> select the best implementation, not make the user guess what
> implementation is best on their target.

In some cases, lock-free implementations are required for correctness. For
example, in glibc, we allegedly cannot use the GCC atomic built-ins because they
could be implemented with locks, and that wouldn't give us the wrong behavior.

With that background, this GNAT-specific assertion that the compiler must use
lock-free operations is useful.

****************************************************************

From: Randy Brukardt
Sent: Monday, May 22, 2017  1:31 PM

Could you explain this further? Why would it matter how synchronization is
accomplished? I would have expected that would be an implementation detail.

****************************************************************

From: Florian Weimer
Sent: Tuesday, May 23 2017  2:16 PM

It's very visible with POSIX shared mappings (either anonymous or file-backed).
It would matter with I/O memory mappings, too.

It also affects the program semantics when signals are involved.
Lock-based implementations of atomics can deadlock too easily.

The problems are so numerous that it's been concluded (at least on the glibc
side) that if the architecture does not support certain atomics, the only way to
emulate them if required is with a kernel assist.

****************************************************************

From: Tucker Taft
Sent: Wednesday, May 24, 2017  1:33 PM

One interesting side note about lock-free vs. wait-free — a relatively recent
paper documented some research which showed that the effort to produce a
“wait-free” structure was rarely worthwhile.  Most “lock-free” structures were
nearly wait free, without incurring all the complexity of a true “wait-free”
algorithm.  See the paper by Alistarh, Censor-Hillel, and Shavit:

   "Are Lock-Free Concurrent Algorithms Practically Wait-Free?”

   https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/paper-18.pdf

In any case, I support coming up with a simple standard Ada package for compiler
implementors to provide access to Compare-and-Swap and perhaps a few additional
operations.

****************************************************************

From: Hadrien Grasland
Sent: Wednesday, May 24, 2017  4:04 PM

If you go down the route of exposing hardware compare-and-swap operations, I
would recommend also exposing other widely supported read-modify-write
operations, such as swap or fetch-increment/decrement. They are less general
than CAS, but where applicable, they enable better algorithms (no ABA issues, no
expensive fail-and-retry loop).

One example of application for atomic swap is triple buffering (producer
transactionally updates a shared RAM value whenever it feels like it, consumer
can access the latest update at any point in time), and one example of
application for fetch-increment/decrement is reference counting.

****************************************************************

From: Stephen Michell
Sent: Wednesday, May 24, 2017  5:51 PM

Some of the IRTAW members participate in ARG. In particular, Alan Burns, Brad
Moore and myself participate. IRTAW holds a workshop approximately every 18-24
months. The last one was held in Benicassim, Spain in April 2016. The next one
will either be in the fall of 2017 or spring of 2018.

IRTAW has been resistant to putting such mechanisms into the Ada language,
preferring instead to permit Ada to use such mechanisms to make mechanisms such
as protected types/objects, tasks, etc. more efficient.

The mechanism to interface with IRTAW is to watch for the announcement of the
next workshop and submit a position paper to attempt to get the issue on the
agenda and to come and participate. If you want, I will communicate with the
organizer of the next workshop and ensure that interested people receive a
direct email announcing the workshop when it is issued.

****************************************************************

From: Dirk Craeynest
Sent: Thursday, May 25, 2017  4:24 AM

And if you would like to interact even sooner with IRTAW members, and the Ada
community in general, you should consider to attend the annual Ada-Europe
conference.

The 22nd International Conference on Reliable Software Technologies - Ada-Europe
2017 - will be held in less than a month (12-16 June 2017) in Vienna, Austria.
For more info, see: <http://www.ada-europe.org/conference2017>.

****************************************************************

From: Florian Weimer
Sent: Friday, May 26, 2017  2:51 PM

> If you go down the route of exposing hardware compare-and-swap
> operations, I would recommend also exposing other widely supported
> read-modify-write operations, such as swap or fetch-increment/decrement.
> They are less general than CAS, but where applicable, they enable
> better algorithms (no ABA issues, no expensive fail-and-retry loop).

And you also need a memory model which goes beyond the current signaling
concept.

****************************************************************

From: Randy Brukardt
Sent: Friday, May 26, 2017  4:00 PM

?? Atomic objects themselves force sequential actions; there's no "signalling"
involved. And C.6 makes it clear that may require barriers or fences. What else
makes sense (whatever the rules are have to make sense in a wide variety of
contexts)?

****************************************************************

From: Randy Brukardt
Sent: Friday, May 26, 2017  4:57 PM

A couple of quick thoughts on this thread...

..
>  > Ada already provides a wide variety of ways to write data
> structures  > for multitasking, from the very low level (aspects
> Atomic and  > Volatile) to the nicely abstract (protected objects and
> rendezvous).
>
>  Unfortunately, current Atomics support does not provide for
> compare_and_exchange primitives and similar and so cannot support
> user-written lock-free data structures.
>
>  > We need to know what cannot be done with the existing features.
>
>  Atomic swaps, compare_and_swaps and processor specific thread yield
> instructions.  Also, less strongly ordered atomics that are less
> expensive.

Thanks, that's in part what I was looking for (and expected as well).

>  > No one seems interested in explaining what they cannot do in Ada,
> but rather  > seem interested in following the herd to use solutions
> cooked up for  > languages that don't have the abstract multitasking
> capabilities of Ada. In  > many cases, there are better ways to do in
> Ada what you might have to do in  > some low-level manner in some
> other language.
> (And that's often true in  > other areas of Ada as well -- OOP in
> particular.)
>
>  You realize the standard library has to be implemented somewhere
> right?

The standard library is always going to be implementation-specific (and often
target-specific as well); there is very little of the GNAT standard library code
that would work on Janus/Ada, for one example.

Thus, there (A) is no need for portability in the implementation of the standard
library (meaning that the Ada Standard isn't particularly relevant there) and
(B) depending on implementation-specific features is fine. Which means that is a
terrible example!

>  On some platforms there is no Ada standard library or a very reduced
> set  of capabilities and people have to implement such capability
> themselves.

Which brings one back to the original question, why isn't a protected object
good enough for that?

>  And as stated previously, there is NO way to implement the timing
> guarantees such as lock-freedom or wait-freedom in standard Ada.

I'm very skeptical that any useful timing guarantees could ever be made in a
high-level language. I can't tell how long a piece of sequential code produced
by my compiler will take to run even on the various Intel machines I have here
(it seems to vary as much as a factor of 4 from what I would estimate). Truly
portable code that runs on many different processors would be beyond
predictable. Mixing that in with the different costs of synchronization on
different targets and nothing but the most trivial can have any guarantee.

But I'm at the limits of my knowledge here. While it seems to me that the actual
implementation of a protected object should not matter (if it has the correct
form, it cannot by itself cause deadlock or livelock, and that form can be
easily checked by a tool), lots of other people seem to differ. Perhaps they're
all focused on the wrong thing (I certainly think so), but obviously I may be
missing some key information.

...
>  You cannot guarantee special properties such as wait-freedom, or
> lock-freedom.

I'd argue that it's not the job of a programming language to guarantee anything,
especially when it comes to timing. (These properties seem misnamed as they
generally are meant to make timing guarantees rather than an implementation
guarantee. I can see no reason that a properly implemented lock could not make
the same guarantee.)  And the implementation of a protected object doesn't
really matter in terms of whether those properties hold -- it's just the form of
the contents that matters. That seems easy to check with tools like SPARK.

Anyway, I've said enough, I'll let others more knowledgeable weigh in if they
want.

****************************************************************

From: Brad Moore
Sent: Wednesday, March 28, 2018  12:39 AM

I noticed that there was a fair amount of interest in this AI in the result of
Randy's straw poll.

I was wondering if my thinking on this, was on track.

Basically, I was thinking that GNAT has implemented a pragma/aspect called
Lock_Free.

It is documented in the reference manual as follows:

"Syntax: This pragma may be specified for protected types or objects. It
specifies that the implementation of protected operations must be implemented
without locks. Compilation fails if the compiler cannot generate lock-free code
for the operations."


I recall that Geert Bosch from Adacore had worked on this a few years ago, and
that he had written a paper that was presented at one of the conferences that
explained this feature quite nicely.

My thoughts are that for this AI, we should simply capture the design that
Adacore currently has implemented and write it up.

Does that make sense?

I know I have at times wanted to use that aspect in my code, but refrained from
doing so, because it wasn't in the standard, but noticed that there were
performance benefits when I did experiment with this feature.

My understanding is that many hardware targets support some sort of basic
compare and swap instruction.

Also, I am the program chair for IRTAW 2018, which is happening in a couple
weeks. I think this AI might be of interest for discussion with the IRTAW
people.

****************************************************************

From: Erhard Ploedereder
Sent: Wednesday, March 28, 2018  7:48 AM

In general, users may need access to the atomic check instruction of an ISA.
However, these are different on different ISA architectures.

There is compare-and-swap. There is also test-and-set (few ISAs), and there is a
third one more recently introduced, whose nature I do not recall right now but
could look up.

Usually, a given ISA supports only one of these ops.

How can you possibly standardize on this without becoming ISA-specific?
Much better: Target-specific package provided by implementer.

Aside: Because self-made locks are really very dangerous, I would not make it
easy for users to get at these atomic ops.

****************************************************************

From: Tullio Vardanega
Sent: Wednesday, March 28, 2018  8:34 AM

I very much concur with Erhard's aside.

****************************************************************

From: Brad Moore
Sent: Wednesday, March 28, 2018  10:04 AM

> How can you possibly standardize on this without becoming ISA-specific?
> Much better: Target-specific package provided by implementer.

I agree this is a problem if one tries to provide an interface at too low a
level. I think Adacore's approach with a boolean Lock_Free aspect that can be
applied to a protected object is a higher level of abstraction that could
accommodate mappings to different architectures. This includes providing the
lock free capability by scheduling or other means, such as could be the case for
Ravenscar programs if the protected object is known to be only accessed by tasks
executing on the same processor. To this end, Adacore also added a CPU aspect
that can be applied to a protected object to specify that the protected object
can only be accessed tasks running on a specific CPU. This extension to the CPU
aspect should probably also be a part of this AI.

Paraphrasing from Geert's paper, the goal they started with was to target the
widest range of systems possible, based on compare and swap or
load-link/store-conditional primitives.

Their claim is that with the restrictions described below, this feature "can be
supported by the large majority of current multiprocessor hardware".

To achieve this, they had to apply some restrictions to the use of the Lock_Free
aspect.

The restrictions include;

- A lock-free protected action cannot have loop, goto, procedure call
  statements, quantified expressions, or calls to non-static functions.

The reason for this restriction is to bound the time it can take via retries to
obtain access to the protected object.

- A protected action can only refer to a single component of the protected
  object, and the size of the component cannot exceed the maximum size for which
  lock free atomic primitives are supported.

- The protected object cannot access non-local variables. Constants are OK.
- The parameters to the protected actions must be of elementary types
- Access values are not allowed.
- Address clauses are not allowed
- Imported or exported entities are not allowed
- Allocators are not allowed.
- Volatile variables are not allowed

I believe Adacore's implementation also does not currently support protected
objects that have entries.

If the users code does not meet all of these restrictions, the compilation
fails. Then either the programmer would try to address the restrictions being
violated, or remove the Lock_Free aspect and fall back to the default
implementation for protected objects.

> Aside: Because self-made locks are really very dangerous, I would not
> make it easy for users to get at these atomic ops.

By building this into the protected object abstraction, I think we'd be
eliminating the danger, or at least making it a lot safer, while also being a
lot more portable.

****************************************************************

From: Tucker Taft
Sent: Wednesday, March 28, 2018  11:22 AM

I do not think we should rely on the Lock_Free pragma on protected objects.  It
has too many special restrictions.  I think we should provide direct access to
compare-and-swap on atomic objects.  I believe there is a standard API for this
which is supported by gcc for C, and I would suggest we provide something
similar.  I believe the interface to the API is not dependent on the ISA, but of
course the implementation is.  Atomic compare-and-swap is the most fundamental
operation when building lock-free data structures, and it is quite annoying that
this cannot be done portably in Ada.  And yes, this operation is for experts
only, but that doesn't mean such experts don't want to write in Ada, while still
achieving some level of portability.

Below is the gcc API, extracted from:
   https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

I think we don't need all the options.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, March 28, 2018  12:23 PM

...
> Basically, I was thinking that GNAT has implemented a pragma/aspect
> called Lock_Free.
...
> My thoughts are that for this AI, we should simply capture the design
> that Adacore currently has implemented and write it up.
>
> Does that make sense?

That would be an interesting thing to do, but it doesn't have anything to do
with making updates of atomic objects safe. (Protected object /= atomic object
:-). I don't think a directive of "don't use atomic objects, use lock_free POs
instead" would fly -- if it did fly, totally banning atomic objects from "safe"
tasking code would solve the problem [with or without other features].

Currently, if you have an atomic object A,

    A := A + 1;

is a race condition (as the read and write are atomic, but not the entire
update). Ada doesn't have atomic updates in any form.

There aren't many algorithms that you can write with atomic objects that don't
run afoul of this issue. Most implementations have a generic library designed to
provide the missing update operations, and we wanted to standardize some such
library so that portable, race condition-free code could be constructed.

****************************************************************

From: Jeff Cousins
Sent: Thursday, March 29, 2018  5:21 AM

> Currently, if you have an atomic object A,
>
>    A := A + 1;
>
> is a race condition (as the read and write are atomic, but not the entire update). Ada doesn't have atomic updates in any form.

The x86 family’s LOCK prefix can be applied to increment instructions, as well
as compare and swap, for just such a common case.  It “would be nice” to have
usage of such instructions without having to drop into machine code insertions.
But any such facility would have to be heavily caveated with “if supported by
the platform” – though we already have something like this for Long_Float and
Long_Integer, and even within the x86 family it can vary from chip type to chip
type as to whether your particular lump of silicon actually does the bus
locking, so it may not be portable.

****************************************************************

From: Erhard Ploedereder
Sent: Thursday, March 29, 2018  6:26 AM

So, how does the compiler builder implement this atomically on an ISA that does
not have the CAS-instruction, but instead the test-and-set (or a fetch-and-add)
instruction?

(Presumably, on such architectures the C++ built-in CASes are simply not
supported; but CAS does cover many ISA, particularly the x86, so C++ is happy.
In Ada we shied away from implementation-defined subsetting of the standard
language.)

****************************************************************

From: Bob Duff
Sent: Thursday, March 29, 2018   7:46 AM

> The x86 family’s LOCK prefix can be applied to increment instructions,
> as well as compare and swap, for just such a common case.  It “would
> be nice” to have usage of such instructions without having to drop
> into machine code insertions.  But any such facility would have to be
> heavily caveated with “if supported by the platform” ...

No, all of these features (compare-and-swap, atomic increment, etc) can be
implemented on all machines.  If the hardware doesn't support it, it can be done
in software.  AFAIK, that's what gcc does for C -- the primitives are supported
on all machines.

Of course, users will often want to investigate how efficient the implementation
is on their particular hardware.

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 29, 2018  6:48 PM

> Aside: Because self-made locks are really very dangerous, I would not
> make it easy for users to get at these atomic ops.

One way to do that would be to not trust them as a synchronization method for
"safe" parallelism. I've been advocating that for a while as the "natural" use
of atomics is a built-in race condition. (That is, "A := A + 1;" is inherently
unsafe.) In that case, only users running in "user beware" mode could use
atomics for synchronization - that would keep them mostly out of the hands of
the unwashed masses that want the compiler to help with the safe application of
parallelism.

An aspect Lock_Free might have some utility in such a scheme - but this AI was
about giving experts portable access to such operations (at least across
compilers for a single target -- which is valuable by itself).

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 29, 2018  7:02 PM

> So, how does the compiler builder implement this atomically on an ISA
> that does not have the CAS-instruction, but instead the test-and-set
> (or a fetch-and-add) instruction?

Protected objects always have to be implemented in Ada, and it's always possible
to do the needed operations in a (virtual) protected operation, so it's hard to
imagine that there is any operation that can't be implemented on any ISA that
can support Ada.

We would probably want Implementation Advice that these are implemented in a
lock-free manner, which would implicitly include a documentation requirement to
tell the user any that are not implemented that way. (All Implementation Advice
is an implicit documentation requirement.)

So, the basic operations would be portable anywhere. If the expert (!) user of
these things needed a lock-free implementation, that would be portable for a
given ISA (modulo clueless implementers like me :-), and user can always find
out which operations are lock-free on a particular ISA.

This seems to check all of the boxes (the operations are portably available, are
lock-free as much as possible, the user can always use the target-specific
subset if they care about lock-free, etc.).

What's wrong with this plan? (As always, when it comes to this sort of stuff, I
know just enough to be dangerous. :-)

****************************************************************

From: Erhard Ploedereder
Sent: Thursday, March 29, 2018  9:34 PM

No doubt. We always can implement everything in software, but is that what the
user wants? Surely, if he is trying to write a lock-free programm, he is going
to be pissed to find that compare-and-swap was implementended with a lock or
semaphore. This is not just a performance question. It also impacts issues
like risk of priority inversion and anything that relates to blocking.

His usual route is to provide a target-dependent solution. It will look a bit
different, depending on which operation is available that is synchronized by
the memory bus. He sure would hate not to be told that his compare-and-swap
is not mapped to such an operation. If the compiler is mum about this, there
is a good chance that the issue is not discovered during porting of the
software.

So, I would have less problems with a set of procedures that encapsulated the
particular primitives, combined with a mandate to support all that match a
bus-synchronized operation and a mandate to REJECT calls to any for which said
operation does not exist.

A software emulation of these instructions that is not lock- and wait-free ist
the worst of all solutions for embedded systems with deadlines.

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 29, 2018  11:05 PM

> No doubt. We always can implement everything in software, but is that
> what the user wants?

It clearly depends on the user. If the user is just using atomics as the
easiest way to meet the requirements for parallel synchronization, I doubt
that they care how it is implemented. For the expert...

> Surely, if he is trying to write
> a lock-free programm, he is going to be pissed to find that
> compare-and-swap was implementended with a lock or semaphore.
> This is not just a performance question. It also impacts issues like
> risk of priority inversion and anything that relates to blocking.

Understood. But as Bob notes, that's how C does it. Not that that is the best
example :-), but it at least seems to give a data point that it's acceptable
to some part of the community.

> His usual route is to provide a target-dependent solution. It will
> look a bit different, depending on which operation is available that
> is synchronized by the memory bus.
> He sure would hate not to be told that his compare-and-swap is not
> mapped to such an operation. If the compiler is mum about this, there
> is a good chance that the issue is not discovered during porting of
> the software.

Compilers that are mum violate the Standard. Unfortunately, there's no good
way to enforce that rule.

> So, I would have less problems with a set of procedures that
> encapsulated the particular primitives, combined with a mandate to
> support all that match a bus-synchronized operation and a mandate to
> REJECT calls to any for which said operation does not exist.

The problem is I don't think we can do that in the Standard, since there is
zero chance that we could formally define what a "bus-synchronized operation"
is. And any Legality Rule requires that. Moreover, such a rule isn't testable
by the ACATS (it does not allow looking at generated code), so it would be
effectively meaningless. (And given my experience, probably not enforced by a
large number of compilers.)

We could (as Robert Dewar used to note) write Implementation Advice to that
effect. But that hardly seems better than the original option.

> A software emulation of these instructions that is not lock- and
> wait-free ist the worst of all solutions for embedded systems with
> deadlines.

The only other option that makes sense to me is to forget the atomic library
and then declare that atomics are not "safe" for the purposes of parallelism.
(When checking, of course.) That would keep the novices from trying to use
them, especially to make existing code parallel-safe. And then, as Brad
suggests, an implementation of aspect Lock_Free seems like the best way to get
portability of lock-free code. Compilers probably would screw that up, too,
but at least it could be a bug report then. :-)

****************************************************************

From: Edward Fish
Sent: Thursday, March 29, 2018  11:32 PM

> No doubt. We always can implement everything in software, but is that
> what the user wants? [...] His usual route is to provide a target-dependent
> solution.

I think the user would typically want general, portable code rather than
target-dependent without some indicator/specifier that some dependent form is
needed; as an example records and representations. If there is not
representation, then I generally assume that the form sought structure-wise is
more in the ideal plane than not, whereas with a representation-specification
the particulars matter to the point where "ideal" isn't possible. (e.g. When
you're interfacing with foreign programs/protocols and have to have field A of
x-bits, field B of y-bits, 2-bits padding, and field C of 14-bits.)

> It will look a bit different, depending on which operation is
> available that is synchronized by the memory bus.
> He sure would hate not to be told that his compare-and-swap is not
> mapped to such an operation. If the compiler is mum about this, there
> is a good chance that the issue is not discovered during porting of
> the software.

But there's the crux of the issue: should a compare-and-swap be in error in an
ISA which doesn't have a compare-and-swap instruction? or should the compiler
construct the proper SW-construct? -- If such low-level controls are required,
Ada *DOES* allow for instruction-insertion, albeit implementation-defined.

> So, I would have less problems with a set of procedures that
> encapsulated the particular primitives, combined with a mandate to
> support all that match a bus-synchronized operation and a mandate to
> REJECT calls to any for which said operation does not exist.
>
> A software emulation of these instructions that is not lock- and
> wait-free ist the worst of all solutions for embedded systems with
> deadlines.

It looks like there's a bit of tension between (a) general purpose and
(b) low-level here... I can't offer much comment regarding embedded
programming, since my experience there is virtually nil.

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 29, 2018  11:55 PM

...
> No doubt. We always can implement everything in software, but is that
> what the user wants? Surely, if he is trying to write a lock-free
> programm, he is going to be pissed to find that compare-and-swap was
> implementended with a lock or semaphore.
> This is not just a performance question. It also impacts issues like
> risk of priority inversion and anything that relates to blocking.

BTW, could you provide a reference that a relative (but not complete) newby
understand this topic? I keep reading statements like the above, and they seem
like utter nonsense to me. The form of the implementation shouldn't matter, so
long as it correctly implements the semantics. (After all, there is no such
thing as "lock-free"; there's always a lock -- lock-free algorithms are just
using the data as the lock.)

I read this sort of stuff often enough that it's clear that smart people that
know more that I do in this area think this is true, but I've never been able
to find anything that explains why.

****************************************************************

From: Erhard Ploedereder
Sent: Friday, March 30, 2018  5:11 PM

One of the best references is
Alan Burns, Andy Wellings: Real-Time Systems and Programming Languages,
Addison Wesley Longmain, ISBN: 978-0-321-41745-9


Here is a canonical scenario for a priority inversion:

4 tasks T1 - T4, numbers indicate priority. High number = high priority.
The tasks are periodic, i.e., released at some time intervals.

T1 runs, grabs said lock to perform the software-emulated compare-and-swap,
but before it is done with it, T4 is released by time-tigger, preempts T1,
runs a bit - meanwhile T2 and T3 are released, too.

Then T4 asks for the same lock, but can't get it. Shucks. Needs to wait until
T1 finishes with the CAS. But T2 and T3 have higher priority than T1. So, T1
needs to wait until T2 and T3 have finished their work. Then, much later, T1
gets to run, completes the CAS and releases the lock.
Then, finally, T4 gets to do its CAS.

Now, there was a reason for T4 having high priority, namely, the fact that it
has the tightest deadline (a general principle in fixed-priority embedded
scheduling, known to be optimal). Which is likely to be long past in the
scenario above.

It T4 controls the brakes in your car, you no longer perceive this as being
merely a performance issue. Dead people do not reflect on such things any
more.

You just saw a priority inversion in action (named so, because T4 behaves for
a while as if it had lowest priority 1). There are scheduling schemes that
avoid priority inversion, but only if the locks are a concept provided by the
run-time system and well understood by the scheduler  (ICPP, OCPP, Priority
Inheritance, ... Deadline-floor protocol, etc.)

You can't find these buggers by testing, because they are highly intermittent,
i.e., things need to happen at just the right time to cause the prioity
inversion.

CAS and friends in the ISA use the synchronization of the memory bus over each
memory access instruction to avoid the need for a lock to make the operation
atomic, even in the case of multicore.

What makes them dangerous is when users apply them to build their own locks to
protect some data, because these locks are then unknown to the scheduler. =>
severe risk of priority inversions, if these löcks cause waits.

****************************************************************

From: Randy Brukardt
Sent: Friday, March 30, 2018  5:58 PM

> Here is a canonical scenario for a priority inversion:
>
> 4 tasks T1 - T4, numbers indicate priority. High number = high
> priority.
> The tasks are periodic, i.e., released at some time intervals.
>
> T1 runs, grabs said lock to perform the software-emulated
> compare-and-swap, but before it is done with it, T4 is released by
> time-tigger, preempts T1, runs a bit - meanwhile
> T2 and T3 are released, too.
>
> Then T4 asks for the same lock, but can't get it. Shucks.
> Needs to wait until T1 finishes with the CAS. But T2 and T3 have
> higher priority than T1. So, T1 needs to wait until T2 and T3 have
> finished their work. Then, much later, T1 gets to run, completes the
> CAS and releases the lock.
> Then, finally, T4 gets to do its CAS.

Thanks. It's clear the problem here is the fact that T1 gets preempted (I knew
there was a reason I dislike preemption :-).

I also note that this doesn't happen if the lock is part of a protected object,
is a protected action can't be preempted (caused via ceiling priority or
whatever) unless no higher priority task can use it.

> Now, there was a reason for T4 having high priority, namely, the fact
> that it has the tightest deadline (a general principle in
> fixed-priority embedded scheduling, known to be optimal). Which is
> likely to be long past in the scenario above.
>
> It T4 controls the brakes in your car, you no longer perceive this as
> being merely a performance issue. Dead people do not reflect on such
> things any more.

This is of course why I want checking on the introduction of parallel
execution. Mere mortals cannot see these sorts of issues; the easier it is to
introduce parallelism, the more likely it is for these sorts of effects to
occur. (I'm happy to have such checking turned off by experts; it necessarily
has to be quite conservative and it wouldn't do to force many things to be
written as tasks -- which are even less structured.)

> You just saw a priority inversion in action (named so, because T4
> behaves for a while as if it had lowest priority 1).
> There are scheduling schemes that avoid priority inversion, but only
> if the locks are a concept provided by the run-time system and well
> understood by the scheduler  (ICPP, OCPP, Priority Inheritance, ...
> Deadline-floor protocol, etc.)
>
> You can't find these buggers by testing, because they are highly
> intermittent, i.e., things need to happen at just the right time to
> cause the prioity inversion.

Right. Tasking issues in general are impossible to find, because of that fact
-- even if you get them to happen, you can't reproduce them. I seriously have
no idea how people do that -- even debugging Janus/Ada's cooperative
multitasking is very difficult -- and it's repeatable if you can get rid of
any timing effects.

> CAS and friends in the ISA use the synchronization of the memory bus
> over each memory access instruction to avoid the need for a lock to
> make the operation atomic, even in the case of multicore.
>
> What makes them dangerous is when users apply them to build their own
> locks to protect some data, because these locks are then unknown to
> the scheduler. => severe risk of priority inversions, if these löcks
> cause waits.

Makes sense. This suggests that you would prefer that anyone that needs
portable synchronization avoid atomic objects altogether (one presumes that
the compiler has selected an implementation [of protected objects - ED] known
to the scheduler and/or truly atomic -- if the compiler implementer is
clueless to these issues you  have no hope anyway). Is that a fair conclusion??

I'm primarily interested here in the effect on "checked" synchronization for
parallel execution. That needs to be defined so that a ny moderately
competent Ada programmer can do the right thing. Since "parallel" is often
used as an optimization, it will often be introduced after the fact, so the
only thing preventing problems is the checking.

****************************************************************

From: Erhard Ploedereder
Sent: Friday, March 30, 2018  6:59 PM

> I also note that this doesn't happen if the lock is part of a
> protected object, as a protected action can't be preempted (caused via
> ceiling priority or whatever) unless no higher priority task can use it.

True only under a scheduling based on ceiling protocols. Under "plain"
fixed-priority preemptive scheduling or even with priority inheritance, the
preemption can happen.

****************************************************************

From: Edward Fish
Sent: Thursday, March 29, 2018  10:16 PM

> This is of course why I want checking on the introduction of parallel
> execution.

But the issue here (preemption of execution) is purely a sequential issue:
this is to say, if you have Task-1 and Task-2 where Task-1 where Task-2 is
executing and there's a free processor for Task-1 there is no issue. (This
issue w/ locks is something different, at least how I learned it [preemption
having to do strictly with execution].)

> Mere mortals cannot see these sorts of issues; the easier it is
> to introduce parallelism, the more likely it is for these sorts of effects
> to occur. (I'm happy to have such checking turned off by experts; it
> necessarily has to be quite conservative and it wouldn't do to force many
> things to be written as tasks -- which are even less structured.)

I was really impressed by the Thesis that I referenced in an earlier email
-- "Reducing the cost of real-time software through a cyclic task
abstraction for Ada" -- I thought it did a great job with increasing the
accuracy of schedulability and analysis, at least in the theoretical.

> Right. Tasking issues in general are impossible to find, because of that
> fact -- even if you get them to happen, you can't reproduce them. I
> seriously have no idea how people do that -- even debugging Janus/Ada's
> cooperative multitasking is very difficult -- and it's repeatable if you can
> get rid of any timing effects.

Are they? In the very 'generalest' it may be like the halting-problem and thus
impossible... but I don't know that that necessarily translates into some
usable subset. Just like how just because Ada's generics are not
turing-complete doesn't mean they're unusable. (Indeed, I'd argue that
turing-complete in a generic- or template-system hurts usability.)

>> CAS and friends in the ISA use the synchronization of the
>> memory bus over each memory access instruction to avoid the
>> need for a lock to make the operation atomic, even in the
>> case of multicore.

>> What makes them dangerous is when users apply them to build
>> their own locks to protect some data, because these locks are
>> then unknown to the scheduler. => severe risk of priority
>> inversions, if these löcks cause waits.

> Makes sense. This suggests that you would prefer that anyone that needs
> portable synchronization avoid atomic objects altogether (one presumes that
> the compiler has selected an implementation known to the scheduler and/or
> truly atomic -- if the compiler implementer is clueless to these issues you
> have no hope anyway). Is that a fair conclusion??

Seems a fair conclusion to me, but the reverse may be interesting: when the
synchronization constructs present enough information to the scheduler make
such guarantees -- this honestly seems right up Ada's alley or, if not,
certainly SPARK's.

****************************************************************

From: Jean-Pierre Rosen
Sent: Friday, March 30, 2018  11:50 PM

>> I also note that this doesn't happen if the lock is part of a
>> protected object, as a protected action can't be preempted (caused
>> via ceiling priority or whatever) unless no higher priority task can use it.
>
> True only under a scheduling based on ceiling protocols. Under "plain"
> fixed-priority preemptive scheduling or even with priority
> inheritance, the preemption can happen.
>
More precisely: preemption can always happen on tasks in protected actions,
but in the case of the priority ceiling protocol, a task can be preempted
only by a task that is not allowed to call the same PO, thus preventing
priority inversion.

****************************************************************

From: Erhard Ploedereder
Sent: Saturday, March 31, 2018  11:57 AM

>> This is of course why I want checking on the introduction of parallel
>> execution.

> But the issue here (preemption of execution) is purely a sequential
> issue: this is to say, if you have Task-1 and Task-2 where Task-1 where
> Task-2 is executing and there's a free processor for Task-1 there is
> no issue. (This issue w/ locks is something different, at least how I
> learned it [preemption having to do strictly with execution].)

Yes and No. In your scenario, the inversion goes away. But what about
T2 and T3?. They would preempt T1 if run on the same core. Welcome back
to Priority inversion for T4. You can get rid of it only if you have as
many cores as tasks (not likely), or not do preemptive scheduling (not
real-time), or use the scheduling/lock protocol schemes talked about in
the earlier mail.

****************************************************************

From: Brad Moore
Sent: Saturday, March 31, 2018  6:58 PM

>> He sure would hate not to be told that his compare-and-swap is not
>> mapped to such an operation. If the compiler is mum about this, there
>> is a good chance that the issue is not discovered during porting of
>> the software.
>
> Compilers that are mum violate the Standard. Unfortunately, there's no
> good way to enforce that rule.

This maybe sounds like another case where having a suppressable error might
useful. A language mandated "soft" error sounds like a better way to enforce a
compiler to not be mum.

>> So, I would have less problems with a set of procedures that
>> encapsulated the particular primitives, combined with a mandate to
>> support all that match a bus-synchronized operation and a mandate to
>> REJECT calls to any for which said operation does not exist.
>
> The problem is I don't think we can do that in the Standard, since
> there is zero chance that we could formally define what a
> "bus-synchronized operation" is. And any Legality Rule requires that.

If it were a soft error, would it still be considered a legality rule? If
not, maybe it could be added without requiring definitions for such things
as "bus-synchronized", and perhaps not require an ACATS test.

****************************************************************

From: Brad Moore
Sent: Saturday, March 31, 2018  7:28 PM

> operation when building lock-free data structures, and it is quite
> annoying that this cannot be done portably in Ada.  And yes, this
> operation is for experts only, but that doesn't mean such experts don't
> want to write in Ada, while still achieving some level of portability.
>
> Below is the gcc API, extracted from:
>   https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

I have been investigating this suggestion to create some sort of Ada interface
to the gcc API.

> I think we don't need all the options.

By options, I'm not sure if you were referring to memory order options, or
primitives.

Here's a quick summary of what I've found so far;

The gcc interface appears to map to the C++11 memory model.

There are basically 3 main flavors of memory ordering supported.

The most restrictive mode is Sequentially Consistent, which means updates are
synchronised across all threads, and all threads see the same order of update
to atomic variables. I believe this corresponds closely to Ada's definition of
Volatile and Atomic objects. This is the safest mode, but also the least
efficient since it requires a higher level of synchronisation. We could
probably support primitives that are in sequentially consistent mode most
easily, since we mostly already have this, and it is the safest mode with the
least amount of unexpected behaviours and pitfalls.

The next level down in terms of synchronisation requirements is a mode where
only the threads involved in the access to a particular atomic variable are
synchronised with each other. This capability however effects which
optimisations can be applied to the code.

For example, hoisting or sinking of code across boundaries of access to atomic
variables is disabled. To support this, the compiler needs to be intimately
aware and involved where this is used when it is applying optimisations.
I am only guessing, but I suspect there may not be enough appetite to
introduce such a memory mode if it requires the compiler to mesh well with it.

Below that is a relaxed mode where there is no synchronisation between
threads, the only guarantee is that a thread wont see previous values of a
variable if it has seen a newer value. This is the most efficient mode, but
also the most dangerous.

Someone who knows what they are doing could in theory use this to write more
efficient code. This mode might be easier to implement than the previous mode,
since it doesn't enforce "happens-before" constraints.  Maybe we could create
a new type of Volatile aspect, such as Relaxed_Volatile for this purpose?

The other threads of this email chain seem to be suggesting however that the
use of atomic primitives in user code to create lock free abstractions should
be discouraged, to avoid problems such as priority inversions when the
implementation is in software instead of hardware.

This has me thinking that the best option for Ada to add capability in this
area may be to go back to the Lock_Free aspect idea.

That way, the implementation of the lock is provided by the compiler
implementation, and fits into the synchronisation model we already have for
protected objects. The implementation can choose between multiple
implementation techniques, such as transactional memory. A protected object
also guides the programmer to write code inside the lock that is better formed
such by disallowing potentially blocking operations.

Here is a quote from the introduction of Geert's paper that seems relevant.

"The use of atomic primitives, memory barriers or transactional memory are
implementation details after all, that should not appear in actual user
code [1]."

Where the reference is;

H.-J. Boehm. Transactional memory should be an implementation technique, not
a programming interface.
In Proceedings of the First USENIX conference on Hot topics in parallelism,
HotPar’09, pages 15–15, Berkeley, CA, USA, 2009. USENIX Association.


I did go through the exercise of creating an Ada package spec for the functions
described in Tucker's link and came up with a generic package which I've
attached if anyone is interested.

Perhaps many of the primitives would not be needed, such as the arithmetic
ones, and the basic load and store routines.

----

atomic_operations.ads

----

with Interfaces;

generic
   type Atomic_Type is mod <>;
package Atomic_Operations is

   pragma Assert (Atomic_Type'Size = 1 or else Atomic_Type'Size = 2
                  or else Atomic_Type'SIze = 4 or else Atomic_Type'Size = 8);

   type Memory_Orders is
     (Relaxed,
      --  Implies no inter-thread ordering constraints.

      Consume,
      --  This is currently implemented using the stronger Acquire memory
      --  order because of a deficiency in C++11's semantics for
      --  memory_order_consume.

      Acquire,
      --  Creates an inter-thread happens-before constraint from the Release
      --  (or stronger) semantic store to this acquire load. Can prevent
      --  hoisting of code to before the operation.
      --  Note: This implies the compiler needs to be pretty aware of this
      --  setting, as it affects optimisation. i.e. Calls that use this order
      --  are not just regular library calls

      Release,
      --  Creates an inter-thread happens-before-constraint to acquire (or
      --  stronger) semantic loads that read from this release store. Can
      --  prevent sinking of code to after the operation.
      --  Note: This implies the compiler needs to be pretty aware of this
      --  setting, as it affects optimisation. i.e. Calls that use this order
      --  are not just regular library calls

      Acquire_Release,
      --  Combines the effects of both Acquire and Release.
      --  Note: This implies the compiler needs to be pretty aware of this
      --  setting, as it affects optimisation. i.e. Calls that use this order
      --  are not just regular library calls

      Sequentially_Consistent
      --  Enforces total ordering with all other Sequentially_Consistent
      --  operations.  This is basically equivalent to Ada's Volatile and
      --  Atomic semantics
     );

   function Atomic_Load
     (From         : aliased Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic,
          Pre        => Memory_Order = Relaxed or else
                        Memory_Order = Acquire or else
                        Memory_Order = Consume or else
                        Memory_Order = Sequentially_Consistent;
   --
   --  Returns the value of From

   procedure Atomic_Load
     (From         : aliased Atomic_Type;
      Result       : out Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
     with Convention => Intrinsic,
          Pre        => Memory_Order = Relaxed or else
                        Memory_Order = Acquire or else
                        Memory_Order = Consume or else
                        Memory_Order = Sequentially_Consistent;
   --
   --  Returns the value of From into Result.

   procedure Atomic_Store
     (Into         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
       with Convention => Intrinsic,
            Pre        => Memory_Order = Relaxed or else
                          Memory_Order = Release or else
                          Memory_Order = Sequentially_Consistent;
   --
   --  Writes Value into Into

   function Atomic_Exchange
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
     return Atomic_Type
     with Convention => Intrinsic,
          Pre        => Memory_Order = Relaxed or else
                        Memory_Order = Acquire or else
                        Memory_Order = Release or else
                        Memory_Order = Acquire_Release or else
                        Memory_Order = Sequentially_Consistent;
   --
   --  Writes Value into Item, and returns the previous value of Item.

   procedure Atomic_Exchange
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Result       : out Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
     with Convention => Intrinsic,
          Pre        => Memory_Order = Relaxed or else
                        Memory_Order = Acquire or else
                        Memory_Order = Release or else
                        Memory_Order = Acquire_Release or else
                        Memory_Order = Sequentially_Consistent;
   --
   --  Stores the value of Value into Item. The original value of Item is
   --  copied into Result.

   function Atomic_Compare_And_Exchange
     (Item                 : aliased in out Atomic_Type;
      Expected             : aliased Atomic_Type;
      Desired              : Atomic_Type;
      Weak                 : Boolean;
      Success_Memory_Order : Memory_Orders;
      Failure_Memory_Order : Memory_Orders) return Boolean
     with Convention => Intrinsic,
          Pre        => Failure_Memory_Order /= Release and then
                        Failure_Memory_Order /= Acquire_Release and then
                        Failure_Memory_Order <= Success_Memory_Order;
   --
   --  Compares the value of Item with the value of Expected. If equal, the
   --  operation is a read-modify-write operation that writes Desired into
   --  Item. If they are not equal, the operation is a read and the current
   --  contents of Item are written into Expected.
   --  Weak is true for weak compare_and_exchange, which may fail spuriously,
   --  and false for the strong variation, which never fails spuriously. Many
   --  targets only offer the strong variation and ignore the parameter.
   --  When in doubt, use the strong variation.
   --
   --  If desired is written into Item then True is returned and memory is
   --  affected according to the memory order specified by
   --  Success_Memory_Order. There are no restrictions on what memory order can
   --  be used here.
   --
   --  Otherwise, False is returned and memory is affected according to
   --  Failure_Memory_Order.

   --------------------------------------------------------------------

   --------------------------
   --  Arithmetic Operations
   --
   --  The following functions perform the operation suggested by the name,
   --  and return the result of the operation.
   --
   -- i.e. Item := Item op Value; return Item;
   --
   --  All memory orders are valid.

   function Atomic_Add_And_Fetch
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Subtract_And_Fetch
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Bitwise_And_And_Fetch
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Bitwise_Or_And_Fetch
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Xor_And_Fetch
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Nand_And_Fetch
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   -------------------------------------------------------------------------
   --  The following functions perform the operation suggested by the name,
   --  and return the value that had previously been in Item.
   --
   --  i.e.  Tmp := Item; Item := Item op Value; return Tmp;
   --
   --  All memory orders are valid.
   ------------------------------------------------------------------------

   function Atomic_Fetch_And_Add
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Fetch_And_Subtract
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Fetch_And_Bitwise_And
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Fetch_And_Bitwise_Or
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Fetch_And_Xor
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Fetch_And_Nand
     (Item         : aliased in out Atomic_Type;
      Value        : Atomic_Type;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Atomic_Type
     with Convention => Intrinsic;

   function Atomic_Test_And_Set
     (Item         : aliased in out Interfaces.Unsigned_8;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
      return Boolean
     with Convention => Intrinsic;
   --
   --  Performs an atomic test-and-set operation on Item. Item is set to some
   --  implementation defined nonzero "set" value and the return value is true
   --  if and only if the previous contents were "set".
   --  All memory orders are valid.

   procedure Atomic_Clear
     (Item         : aliased in out Interfaces.Unsigned_8;
      Memory_Order : Memory_Orders := Sequentially_Consistent)
     with Convention => Intrinsic,
          Pre        => Memory_Order = Relaxed or else
                        Memory_Order = Release or else
                        Memory_Order = Sequentially_Consistent;
   --  Performs an atomic clear operation on Item. After the operation, Item
   --  contains 0. This call should be used in conjunction with
   --  Atomic_Test_And_Set.

   procedure Atomic_Thread_Fence
     (Memory_Order : Memory_Orders := Sequentially_Consistent)
     with Convention => Intrinsic;
   --  This procedure acts as a synchronization fence between threads based on
   --  the specified memory order. All memory orders are valid.

   procedure Atomic_Signal_Fence
     (Memory_Order : Memory_Orders := Sequentially_Consistent)
     with Convention => Intrinsic;
   --  This procedure acts as a synchronization fence between a thread and
   --  signal handlers int he same thead. All memory orders are valid.

   function Atomic_Always_Lock_Free return Boolean
     with Convention => Intrinsic;
   --  Returns True if objects always generate lock-free atomic instructions
   --  for the target architecture.

   function Atomic_Always_Lock_Free
     (Item : aliased Atomic_Type) return Boolean
     with Convention => Intrinsic;
   --  Returns True if objects always generate lock-free atomic instructions
   --  for the target architecture. Item may be used ot determine alignment.
   --  The compiler may also ignore this parameter.

   function Atomic_Is_Lock_Free
     (Item : aliased Atomic_Type) return Boolean
     with Convention => Intrinsic;
   --
   --  This function returns True if objects always generate lock-free atomic
   --  instructions for the target architecture. If the intrinsic function is
   --  not know to be lock-free, a call is made to a runtime routine to
   --  determine the answer. Item may be used ot determine alignment.
   --  The compiler may also ignore this parameter.

end Atomic_Operations;

****************************************************************

From: Erhard Ploedereder
Sent: Saturday, March 31, 2018  7:43 PM

>> So, I would have less problems with a set of procedures that
>> encapsulated the particular primitives, combined with a mandate to
>> support all that match a bus-synchronized operation and a mandate to
>> REJECT calls to any for which said operation does not exist.
> The problem is I don't think we can do that in the Standard, since
> there is zero chance that we could formally define what a
> "bus-synchronized operation" is. And any Legality Rule requires that.

Not so difficult:

"A call on xyz is illegal if the compiler does not map it to a single
atomic instruction of the target architecture." (or rewrite this an
implementation requirement, with an "otherwise  produce an
error/warning/diagnostic message ).

Incidentally, C.1(11) already has the recommendation that intrinsic
subprograms for the set of these atomic memory operations be provided.
(Atomic increment is one of them - this was mentioned in another mail as
being useful when available.) The way C.1 is written, one would expect the
respective target-specific subset of atomic memory operations. This would be
a good place to standardize their signatures and be done with the AI.

****************************************************************

From: Tucker Taft
Sent: Sunday, April 1, 2018  4:39 PM

I don't think we should delve into the various memory ordering options.  It
just seems like overkill given how rarely these will be used.  Remember C++
doesn't have an equivalent to the protected object (at least not yet! -- never
say never in C++ ;-).  So we just need to address the main usage, I believe.
So I would go with Sequentially Consistent, and those who are desperate for
some looser synchronization can write their own primitives, or pay their
vendor to provide it.

Standardizing the Lock_Free aspect might be worth doing, but I don't believe
that addresses the main goal here, which is to provide atomic operations on
atomic objects.

****************************************************************

From: Brad Moore
Sent: Saturday, May 5, 2018  5:30 PM

Here is my first cut at AI12-0234-1, which provides better support for
lock-free programming in Ada. [This is version /02 of the AI - Editor.]

The original title of this AI was Compare and Swap for Atomic objects, which
was generalized to also provide other atomic operations, such as being able
to add or subtract values to integer types, and apply bitwise operations to
modular types atomically.

The proposal has 4 new libraries.

Atomic_Operations,
Atomic_Operations.Storage,
Atomic_Operations.Signed_Arithmetic,
Atomic_Operations.Modular_Arithmetic

The parent library is non-generic and includes some test and set based
operations, which do not require any user types.

The other 3 libraries are generic libraries.

The Storage library has a generic formal discrete type, and provide the
Compare and Swap, Load, Store, and Exchange capabilities.

The Signed_Arithmetic has a generic formal signed integer type, and provides
variants of add and subtract. A Constraint error can be raised as expected if
the operation overflows.

The Modular_Arithmetic has a generic formal modular type, and in addition to
Add and subtract, also provides various bit manipulation calls that one might
expect to use with a modular type. Overflow has wrap around semantics, and
does not raise constraint_error, as one might expect.

To specify that the generic formals are atomic types, I needed to change the
rules to allow the Atomic aspect to be specified on a generic formal, which
previously was not allowed.

Finally, in addition, since protected objects on a single processor are
considered to be lock-free and deadlock free with appropriate use of the
ceiling priority protocol, this AI also extends the rules for the CPU aspect,
so that it can be applied to protected type declarations. The compiler can
simplify the implementation for such protected types, since locking is not
needed, plus there is the added benefit of being deadlock free, which was
considered to be important by the attendees at the recent IRTAW, which
unanimously felt it was worth adding to the standard.

****************************************************************

From: Randy Brukardt
Sent: Friday, May 11, 2018  12:06 AM

> The parent library is non-generic and includes some test and set based
> operations, which do not require any user types.

Humm.

>   type Test_And_Set_Type is mod 2**8;

(1) Shouldn't this type be declared atomic (or at least volatile)? It's
beyond weird to do atomic operations on non-atomic objects. (It's also a
substantial optimizer problem - an Ada optimizer knows to leave atomic objects
alone, but that certainly isn't the case for other objects.)

(2) 2**8 would be a very expensive type on the U2200 and various other unusual
processors. Ada has never tied itself to 8-bit machines like Java. Shouldn't
this type be based on the storage unit in System?

...
> To specify that the generic formals are atomic types, I needed to
> change the rules to allow the Atomic aspect to be specified on a
> generic formal, which previously was not allowed.

I wonder if this should be done in a separate AI. It could get complex if
Steve discovers any more contract problems.

> Modify C.6 (6.1/3)  to allow aspect Atomic to be applied to a generic
> formal type
>
> For an object_declaration, a component_declaration, or a {type
> (including a formal type)}[full_type_declaration], the following
> representation aspects may be specified:

(1) This implies that Atomic can be specified in private types, extensions,
interfaces, incomplete types, yadda yadda yadda. The Legality Rules in 13.1
for representation aspects would prevent that, but those same rules also
would prevent use on formal types "unless otherwise specified". Seems
confused. I'd suggest just adding "formal_type_declaration" to the original
list and then there is no confusion nor any need for a parenthetical remark.

  For an object_declaration, a component_declaration, {full_type_declartion},
  or a {formal}[full]_type_declaration, the following representation aspects
  may be specified:

(2) This means all 6 aspects are getting allowed on formal types, even though
we only have need (and have rules!) for one. Is that really what we want? Do
we really want to mess around with Independent_Components in generics? Etc.

>Modify C.6 (12/3)
>If an atomic object is passed as a parameter, then the formal parameter
>shall either have an atomic type or allow pass by copy. If an atomic
>object is used as an actual for a generic formal object of mode in out,
>then the type of the generic formal object shall be atomic. If the
>prefix of an attribute_reference for an Access attribute denotes an
>atomic object [(including a component)], then the designated type of
>the resulting access type shall be atomic. {If a generic formal type is
>atomic, then the actual shall be atomic.} If an atomic type is used as
>an actual for a generic formal derived type, then the ancestor of the formal
>type shall be atomic. Corresponding rules apply to volatile objects and
>types.
>
>In a generic instantiation the actual type corresponding to an atomic
>formal scalar, private, derived, array, or access-to-object type shall
>be atomic;

Could you explain why you are saying this twice, once with an added sentence
in the original paragraph, and a second time with an added paragraph. It seems
like the first added rule would be enough, but perhaps I missed some subtle
issue.

> Finally, in addition, since protected objects on a single processor
> are considered to be lock-free and deadlock free with appropriate use
> of the ceiling priority protocol, this AI also extends the rules for
> the CPU aspect, so that it can be applied to protected type
> declarations. The compiler can simplify the implementation for such
> protected types, since locking is not needed, plus there is the added
> benefit of being deadlock free, which was considered to be important
> by the attendees at the recent IRTAW, which unanimously felt it was
> worth adding to the standard.

This also seems like it ought to have been a separate AI, even more so than
the formal Atomic types. As you know, when AIs get too big, it's hard to make
progress on them. And this topic is completely separate from the other issues.

****************************************************************

From: Brad Moore
Sent: Friday, May 11, 2018  6:25 PM

>>   type Test_And_Set_Type is mod 2**8;
>
> (1) Shouldn't this type be declared atomic (or at least volatile)?
> It's beyond weird to do atomic operations on non-atomic objects. (It's
> also a substantial optimizer problem - an Ada optimizer knows to leave
> atomic objects alone, but that certainly isn't the case for other
> objects.)

I agree, it should be atomic. When I originally started out, I was trying to
do this with ordinary integer types, since the atomic operations provided by
the GCC API, can be applied to ordinary integers. Even in C, there is no need
to apply a volatile pragma or anything special to the declarations. All the
operations just take an access to a memory location, and all updates to the
memory location are via the API calls, so it works.

However, I then realized that trying to describe this in the RM was going to
be messy, and that it would be better to say everything is atomic, then the
definitions for atomic get applied for free, and found it fit much better to
do that.

I then added Atomic to all all the calls in the specs, but missed this one.

> (2) 2**8 would be a very expensive type on the U2200 and various other
> unusual processors. Ada has never tied itself to 8-bit machines like Java.
> Shouldn't this type be based on the storage unit in System?

I agree that storage unit would be a more flexible definition. The GCC api
just has a void pointer, but the documentation says it updates a byte of
storage. I suppose one could alternatively say 2**implementation defined, like
we do for Hash_Type in Ada.Containers, but in this case I think basing it on
storage unit would be a better choice. I wonder if it could be a private type?
I think I tried that, but found it harder to describe in the wording. This
way, one can say that Clear sets the value to a zero value.

> ...
>> To specify that the generic formals are atomic types, I needed to
>> change the rules to allow the Atomic aspect to be specified on a
>> generic formal, which previously was not allowed.
>
> I wonder if this should be done in a separate AI. It could get complex
> if Steve discovers any more contract problems.

I think a separate AI for this would be better also to separate concerns.

>> Modify C.6 (6.1/3)  to allow aspect Atomic to be applied to a generic formal type
>>
>> For an object_declaration, a component_declaration, or a {type
>> (including a formal type)}[full_type_declaration], the following
>> representation aspects may be specified:
>
> (1) This implies that Atomic can be specified in private types,
> extensions, interfaces, incomplete types, yadda yadda yadda. The
> Legality Rules in 13.1 for representation aspects would prevent that,
> but those same rules also would prevent use on formal types "unless
> otherwise specified". Seems confused. I'd suggest just adding
> "formal_type_declaration" to the original list and then there is no
> confusion nor any need for a parenthetical remark.
>
>  For an object_declaration, a component_declaration,
> {full_type_declartion}, or a  {formal}[full]_type_declaration, the
> following representation aspects may be
>  specified:

I agree with your suggestion. I think I got the original wording idea by
looking at what was done for the Nonblocking aspect, since it can be
applied to generic formals.

> (2) This means all 6 aspects are getting allowed on formal types, even
> though we only have need (and rules!) for one. Is that really what we want?
> Do we really want to mess around with Independent_Components in generics?
> Etc.

I think I was under the impression that I was letting Volatile ride in on the
coattails of Atomic for this purpose, but didn't think the other ones were
being included. The thought was that it probably makes sense to allow Volatile
on generic formal types, or weird not to, if we allow Atomic. But only if it
fits better with the wording. Otherwise restricting it to just Atomic is fine
by me.

>>Modify C.6 (12/3)
>>If an atomic object is passed as a parameter, then the formal parameter
>>shall either have an atomic type or allow pass by copy. If an atomic object
>>is used as an actual for a generic formal object of mode in out, then the
>>type of the generic formal object shall be atomic. If the prefix of an
>>attribute_reference for an Access attribute denotes an atomic object
>>[(including a component)], then
>>the designated type of the resulting access type shall be atomic. {If a
>>generic formal type is atomic, then the actual shall be atomic.} If an
>>atomic type is used as an actual for a generic formal derived type,
>>then the ancestor of the formal type shall be atomic. Corresponding rules
>>apply to volatile objects and types.

>>In a generic instantiation the actual type corresponding to an atomic
>>formal scalar, private, derived, array, or access-to-object type shall be
>>atomic;
>
> Could you explain why you are saying this twice, once with an added
> sentence in the original paragraph, and a second time with an added
> paragraph. It seems like the first added rule would be enough, but
> perhaps I missed some subtle issue.

Sure. I said it the first time because that is what I thought was needed. Then
I saw somewhere (probably nonblocking aspect) that needed to describe generic
matching rules for instances and looked up an example of that (again, likely
from nonblocking aspect), which is where I got the second version of the
statement.  The intent was that we could get rid of one or the other, or merge
into one. I wasn't sure if the second was more of a boilerplate that was
needed for generic instances, and also if it had some subtlety not captured
in the first time. Otherwise I dont have a real reason for stating it the
second time. If you don't think it is needed, we can get rid of it.

>> Finally, in addition, since protected objects on a single processor
>> are considered to be lock-free and deadlock free with appropriate use
>> of the ceiling priority protocol, this AI also extends the rules for
>> the CPU aspect, so that it can be applied to protected type
>> declarations. The compiler can simplify the implementation for such
>> protected types, since locking is not needed, plus there is the added
>> benefit of being deadlock free, which was considered to be important
>> by the attendees at the recent IRTAW, which unanimously felt it was
>> worth adding to the standard.
>
> This also seems like it ought to have been a separate AI, even more so
> than the formal Atomic types. As you know, when AIs get too big, it's
> hard to make progress on them. And this topic is completely separate
> from the other issues.

I agree that this really should be a separate AI. I think I was thinking at
the time that it would be more likely to be considered if part of an existing
AI, and it is related to the topic of lock-free, so it kind of fits somewhat.
I thought that adding a new AI at this stage might be been disallowed, but if
that can be done, given that IRTAW was behind it, then I agree that is a
better way to go.

Would you like me to resubmit this as 3 separate AI's?

****************************************************************

From: Randy Brukardt
Sent: Friday, May 11, 2018  9:03 PM

...
> >>In a generic instantiation the actual type corresponding to an
> >>atomic  formal scalar, private, derived, array, or access-to-object
> >>type shall be atomic;
> >
> > Could you explain why you are saying this twice, once with an added
> > sentence in the original paragraph, and a second time with an added
> > paragraph. It seems like the first added rule would be enough, but
> > perhaps I missed some subtle issue.
>
> Sure. I said it the first time because that is what I thought was
> needed. Then I saw somewhere (probably nonblocking
> aspect) that needed to describe generic matching rules for instances
> and looked up an example of that (again, likely from nonblocking
> aspect), which is where I got the second version of the statement.
> The intent was that we could get rid of one or the other, or merge
> into one. I wasn't sure if the second was more of a boilerplate that
> was needed for generic instances, and also if it had some subtlety not
> captured in the first time.
> Otherwise I dont have a real reason for stating it the second time. If
> you don't think it is needed, we can get rid of it.

I believe the Nonblocking rule was written the way it was because it does
*not* apply to generic discrete (or is it scalar?) types. So I had to mention
what it *did* apply to, as opposed to just saying it always applied.
If your rule applies to all types, it's just really long-winded for no reason.

...
> I agree that this really should be a separate AI. I think I was
> thinking at the time that it would be more likely to be considered if
> part of an existing AI, and it is related to the topic of lock-free,
> so it kind of fits somewhat. I thought that adding a new AI at this
> stage might be been disallowed, but if that can be done, given that
> IRTAW was behind it, then I agree that is a better way to go.

I haven't sent the list to WG 9 yet, so IRTAW can add new things until I get
caught up posting AIs. (Almost there, but not quite.) And you could probably
make a case for it to be related to our instructions, but even if not, a split
of an already started AI is always in-bounds.

> Would you like me to resubmit this as 3 separate AI's?

Yes please. Remember to mention that in the existing AI12-0234-1 part (the
actual packages) that you're depending on AI12-028x-1. (I just finished 277,
and I have four more to post/write, so the number should be in the 280s.)

****************************************************************

From: Jeff Cousins
Sent: Monday, May 14, 2018  1:34 PM

> Would you like me to resubmit this as 3 separate AI's?

Yes please, three separate AIs would seem sensible to me too.

****************************************************************

From: Randy Brukardt
Sent: Monday, May 14, 2018  10:09 PM

I didn't say anything about the first two items here, and I probably should have.

...
> > (2) 2**8 would be a very expensive type on the U2200 and various
> > other unusual processors. Ada has never tied itself to 8-bit
> > machines like Java.
> > Shouldn't this type be based on the storage unit in System?
>
> I agree that storage unit would be a more flexible definition. The GCC
> api just has a void pointer, but the documentation says it updates a
> byte of storage.
> I suppose one could alternatively say 2**implementation defined, like
> we do for Hash_Type in Ada.Containers, but in this case I think basing
> it on storage unit would be a better choice. I wonder if it could be a
> private type? I think I tried that, but found it harder to describe in
> the wording.
> This way, one can say that Clear sets the value to a zero value.

Perhaps you just want to copy (literally) the definition of Storage_Element
(which is modular):

   type Test_And_Set_Type is mod System.Storage_Elements.Storage_Element'Modulus
      with Atomic;

(We can't use Storage_Element directly because of the need to declare this
atomic, but we can copy the modulus. Storage_Unit is defined to be
Storage_Element'Size.)

...
> > (2) This means all 6 aspects are getting allowed on formal types,
> > even though we only have need (and rules!) for one. Is that really
> > what we want?
> > Do we really want to mess around with Independent_Components in generics?
> > Etc.
>
> I think I was under the impression that I was letting Volatile ride in
> on the coattails of Atomic for this purpose, but didn't think the
> other ones were being included.
> The thought was that it probably makes sense to allow Volatile on
> generic formal types, or weird not to, if we allow Atomic. But only if
> it fits better with the wording.
> Otherwise restricting it to just Atomic is fine by me.

That seemed OK to me, but what about Atomic_Components and Volatile_Components
on formal array types? That seems like work :-) and I don't know if we have
any use for that work. And similarly for the Independent cases.

>Would you like me to resubmit this as 3 separate AI's?

Repeating myself, Yes. AI12-0234-1 would be the generic packages (since that
is the original question), the formal type one is needed to support that
original one, and the third is just a good idea on its own. Those will get
numbers when I get them. All three will appear at the place of AI12-0234-1
in our priority order (at least initially), since there isn't anything else
to use to give them an order.

****************************************************************

From: Randy Brukardt
Sent: Monday, May 14, 2018  11:40 PM

> ...
> > > (2) This means all 6 aspects are getting allowed on formal types,
> > > even though we only have need (and rules!) for one. Is that really
> > > what we want?
> > > Do we really want to mess around with Independent_Components in generics?
> > > Etc.
> >
> > I think I was under the impression that I was letting Volatile ride
> > in on the coattails of Atomic for this purpose, but didn't think the
> > other ones were being included.
> > The thought was that it probably makes sense to allow Volatile on
> > generic formal types, or weird not to, if we allow Atomic. But only
> > if it fits better with the wording.
> > Otherwise restricting it to just Atomic is fine by me.
>
> That seemed OK to me, but what about Atomic_Components and
> Volatile_Components on formal array types? That seems like work :-)
> and I don't know if we have any use for that work.
> And similarly for the Independent cases.

Humm, I see that the actual wording only uses that heading for the three
aspects Atomic, Volatile, and Independent. But the proposed wording literally
only has rules for Atomic and Volatile. There's no wording for what it meaning
to put Independent into a formal type, but the proposed wording allows it. We
need at a minimum to reconcile this (either have wording for generic matching
for Independent, or only allow on formals for Atomic and Volatile).

****************************************************************

From: Brad Moore
Sent: Friday, May 18, 2018  4:36 PM

> ...
>> > (2) 2**8 would be a very expensive type on the U2200 and various
>> > other unusual processors. Ada has never tied itself to 8-bit
>> > machines like Java.
>> > Shouldn't this type be based on the storage unit in System?
>>
>> I agree that storage unit would be a more flexible definition. The
>> GCC api just has a void pointer, but the documentation says it
>> updates a byte of storage.
>> I suppose one could alternatively say 2**implementation defined, like
>> we do for Hash_Type in Ada.Containers, but in this case I think
>> basing it on storage unit would be a better choice. I wonder if it
>> could be a private type? I think I tried that, but found it harder to
>> describe in the wording.
>> This way, one can say that Clear sets the value to a zero value.
>
> Perhaps you just want to copy (literally) the definition of
> Storage_Element (which is modular):
>
>   type Test_And_Set_Type is
>      mod System.Storage_Elements.Storage_Element'Modulus
>      with Atomic;
>
> (We can't use Storage_Element directly because of the need to declare
> this atomic, but we can copy the modulus. Storage_Unit is defined to be
> Storage_Element'Size.)

I believe we *can* use Storage_Element directly if we use derivation. And in
either case, I think we'd also want to specify the Size aspect. So either;


   type Test_And_Set_Type is mod
     System.Storage_Elements.Storage_Element'Modulus
     with Atomic, Size => System.Storage_Elements.Storage_Element'Size;

as you suggested, or;

   type Test_And_Set_Type is new System.Storage_Elements.Storage_Element
     with Atomic, Size => System.Storage_Elements.Storage_Element'Size;

It's not clear to me which of these is better, or if it matters.
Maybe the Size aspect for the derivation case isn't needed if Size is
inherited.

I suppose it depends on whether we want to think of this type as a type that
happens to look a lot like a Storage_Element, or instead as a special type of
Storage_Element.

I think I am leaning towards the latter, using derivation.

Another question I have is with regard to future-proofing these libraries.

The real gcc library calls have an extra parameter for memory model, which
defaults to the sequentially consistent memory model if not specified. We are
saying right now that we want to go with a simpler approach and only support
the Sequentially_Consistent memory model, and therefore no parameter is
needed.

I am thinking it might be better future-proofing if we defined an enumeration
for Memory_Model that for now only has one literal, Sequentially_Consistent,
and that parameter be added to the calls, defaulting to the
Sequentially_Consistent value.

That way, in the future, if we decide to support other memory models, adding
items to the Memory_Model enumeration is less likely to introduce backwards
compatibility than adding a new parameter.

Is this suggestion worth worrying about? or is backwards compatibility not a
significant problem here? I'm was originally thinking of cases such as
instantiating generics using these subprograms, which might expect a certain
profile signature. On the other hand, since these are intrinsic subprograms,
one cant instantiate generics with these as actuals, and maybe there are not
any other real backwards compatibility concerns?

****************************************************************

From: Randy Brukardt
Sent: Friday, May 18, 2018  5:36 PM

...
> I suppose it depends on whether we want to think of this type as a
> type that happens to look a lot like a Storage_Element, or instead as
> a special type of Storage_Element.
>
> I think I am leaning towards the latter, using derivation.

I was thinking it was the former, thus the formulation I suggested. Doesn't
really matter much either way, though. At least not until Steve finds some
interesting way to use a formal derived type to cause a mess. ;-)

...
> Is this suggestion worth worrying about? or is backwards compatibility
> not a significant problem here? I'm was originally thinking of cases
> such as instantiating generics using these subprograms, which might
> expect a certain profile signature. On the other hand, since these are
> intrinsic subprograms, one cant instantiate generics with these as
> actuals, and maybe there are not any other real backwards
> compatibility concerns?

I tend to not want to worry about it. The C++ memory models seem to correspond
roughly to "normal"/Volatile/Atomic objects -- Ada sets these models
per-object rather than per program. If we wanted (say) a volatile version of
these libraries, I'd expect that it would have to be separately defined (given
the way you defined aspect matching). And I can't imagine adopting a C++
memory model that essentially strips the guarantees of Atomic while still
allowing objects to be declared that way.

****************************************************************

From: Brad Moore
Sent: Friday, May 18, 2018  7:04 PM

> ...
>> Is this suggestion worth worrying about? or is backwards
>> compatibility not a significant problem here? I'm was originally
>> thinking of cases such as instantiating generics using these
>> subprograms, which might expect a certain profile signature. On the
>> other hand, since these are intrinsic subprograms, one cant
>> instantiate generics with these as actuals, and maybe there are not
>> any other real backwards compatibility concerns?
>
> I tend to not want to worry about it. The C++ memory models seem to
> correspond roughly to "normal"/Volatile/Atomic objects -- Ada sets
> these models per-object rather than per program. If we wanted (say) a
> volatile version of these libraries, I'd expect that it would have to
> be separately defined (given the way you defined aspect matching). And
> I can't imagine adopting a C++ memory model that essentially strips
> the guarantees of Atomic while still allowing objects to be declared that
> way.

I think Ada's per object approach is a sensible one, but I think the memory
model concept is someone independent of the volatile/atomic/normal semantics.

Here is an example I found today, that actually looks very similar to the
approach I was suggested for implementing the FIFO_Spinning Admission policy
in AI12-0276.
The idea there was a ticket abstraction where someone wanting a lock takes the
next ticket number and then busy waits until that number comes up.

The full source code example can be found at;

   https://github.com/pramalhe/ConcurrencyFreaks/blob/master/C11/locks/ticket_mutex.c

The example uses the same primitive calls that we are considering for Ada.
To obtain the lock, they do an atomic increment to get the next ticket number
using atomic_fetch_add, then keep calling atomic_load until they see their
ticket number come up. For these calls, I believe the default memory order,
Sequentially_Consistent is applied, which makes sense since there is
communication between threads.

void ticket_mutex_lock(ticket_mutex_t * self) {
    long lingress = atomic_fetch_add(&self->ingress, 1);
    while (lingress != atomic_load(&self->egress)) {
        sched_yield();  // Replace this with thrd_yield() if you use <threads.h>
    }
    // This thread has acquired the lock on the mutex }

When unlocking the mutex, they load the current ticket number locally, using
the relaxed memory order, because since the lock was held it is known that the
value of the current ticket number would not have changed. This is more
efficient than loading using the sequentially consistent memory model, as that
requires synchronisations across all the threads. The store back to update the
current ticket number uses sequentially consistent again since other threads
need to see the update.


/*
 * Unlocks the mutex
 * Progress Condition: Wait-Free Population Oblivious
 *
 * We could do a simple atomic_fetch_add(egress, 1) but it is faster to do
 * the relaxed load followed by the store with release barrier.
 * Notice that the load can be relaxed because the thread did an acquire
 * barrier when it read the "ingress" with the atomic_fetch_add() back in
 * ticket_mutex_lock() (or the acquire on reading "egress" at a second try),
 * and we have the guarantee that "egress" has not changed since then.
 */
void ticket_mutex_unlock(ticket_mutex_t * self)
{
    long legress = atomic_load_explicit(&self->egress, memory_order_relaxed);
    atomic_store(&self->egress, legress+1);
}


So, the memory order seems somewhat orthogonal to the atomicity class of the
object. In some cases, different orders might ideally be wanted to be applied
to the same object at different times to get better performance.

Given this, do you still think that we'd not want to someday be able to
specify different memory orders using the same generic instantiation?

I suppose we could add new overloaded calls to the libraries in the future
that accept a memory order parameter, and leave the current calls as is,
without a parameter.

Then there is no backwards compatibility issue. I think I just answered my own
question... (with your help :))

So I think we're OK not worrying about this.

****************************************************************

From: Brad Moore
Sent: Friday, May 18, 2018  9:37 PM

>> > > (2) This means all 6 aspects are getting allowed on formal types,
>> > > even though we only have need (and rules!) for one. Is that
>> > > really what we want?
>> > > Do we really want to mess around with Independent_Components in
> generics?
>> > > Etc.
>> >
>> > I think I was under the impression that I was letting Volatile ride
>> > in on the coattails of Atomic for this purpose, but didn't think
>> > the other ones were being included.
>> > The thought was that it probably makes sense to allow Volatile on
>> > generic formal types, or weird not to, if we allow Atomic. But only
>> > if it fits better with the wording.
>> > Otherwise restricting it to just Atomic is fine by me.
>>
>> That seemed OK to me, but what about Atomic_Components and
>> Volatile_Components on formal array types? That seems like work :-)
>> and I don't know if we have any use for that work.
>> And similarly for the Independent cases.
>
> Humm, I see that the actual wording only uses that heading for the
> three aspects Atomic, Volatile, and Independent. But the proposed
> wording literally only has rules for Atomic and Volatile. There's no
> wording for what it meaning to put Independent into a formal type, but
> the proposed wording allows it. We need at a minimum to reconcile this
> (either have wording for generic matching for Independent, or only
> allow on formals for Atomic and Volatile).

Humm indeed. I originally said it was weird to allow Atomic without Volatile,
but now it feels even weirder to have those two and not have Independent,
especially since C.6 (8.1/4) says that "All atomic objects and aliased objects
are considered to be specified as independently addressable."

And then more weird to be able to specify Atomic or Volatile on a formal array
type, but not be able to specify Atomic_Components or Volatile_Components.

Also, if there's a need to have Atomic on generic formals, it is not too hard
to imagine there being a need for Atomic_Components on a generic formal array
type. An array of counters that gets atomically incremented is the first thing
that comes to mind.

I see quite a bit less of a use case for Volatile and Volatile components than
Atomic and Atomic_Components, and even lesser a case for Independent and
Independent_Components.

So I am now thinking the way to go for now is to allow Atomic and
Atomic_Components only, and not allow any of the others.

If we ever come up with a more concrete need for the others, they can be added
in the future.

Unless I hear otherwise, I will write the AI up this way....

****************************************************************

From: Tucker Taft
Sent: Friday, May 18, 2018  8:48 PM

...
> So I am now thinking the way to go for now is to allow Atomic and
> Atomic_Components only, and not allow any of the others.
>
> If we ever come up with a more concrete need for the others, they can be
> added in the future.
>
> Unless I hear otherwise, I will write the AI up this way....

Seems a reasonable starting point.  In the long run, supporting all of them
makes the most sense, but I can see starting with just Atomic.

****************************************************************

From: Brad Moore
Sent: Friday, May 18, 2018  7:38 PM

Here is my updated AI for the lock free libraries. [This is version /03 of the
AI - Editor.]

Other than some minor changes addressing comments received so far, the big
difference is that I split out the two other parts which will be appearing as
new separate AI's.

Specifically, I removed that part about pragma CPU being specifiable on
protected object declarations, and I removed that part about allowing pragma
Atomic to be specifiable on generic formal types, which this AI depends on.

The discussion is simplified a lot, and the problem I think provides clearer
and better motivation.

I think the calls for Atomic_Thread_Fence and Atomic_Signal_Fence will
definitely need more wording to describe those concepts if we want to support
those calls. I left those loosely and simply defined, in case we decide to drop
those calls.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, June 6, 2018  5:16 PM

> Here is my updated AI for the lock free libraries.

For the record, I made a handful of editorial corrections to this AI when I
posted it.

(1) Corrected the spelling of "synchronization" from "syncronisation" (Ada
uses American English spellings, thus a 'z' rather than an 's').

(2) Inserted the actual number of the AI for atomic formal types
(AI12-0282-1) and moved it's paragraph last in the !proposal section. (It
reads best giving the general solution first, then why the libraries are
generic, and finally that the libraries depend on another new AI.)

(3) You had two A.19.2 subclauses in your proposal; I renumbered the
subclauses to eliminate this.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, June 6, 2018  5:31 PM

...
> I think the calls for Atomic_Thread_Fence and Atomic_Signal_Fence will
> definitely need more wording to describe those concepts if we want to
> support those calls. I left those loosely and simply defined, in case
> we decide to drop those calls.

A bit too loosely to even quite understand what they do! It's hard to decide
if these are useful without figuring out where they fit into the Ada
synchronization model.

Specifically:

"This procedure acts as a synchronization fence between threads."

Ada doesn't have "threads", per se; it would be better in this AI to use the
proper Ada terminology (even in the discussion). Probably you need to use the
terminology that Tucker is coming up with for AI12-0119-1 (I forget the
details now; not sure if "tasklet" or something else is appropriate).

And of course "synchronization fence" is also an undefined term. You need to
tie this to 9.10 in order to explain how this fits in with the Ada
synchronization model. (That also would be true when discussing the C "memory
models"; it seems to me that one can only use weaker/stronger models on
individual operations lest the result violate the 9.10 model. And I doubt that
9.10 requires completely sequentially consistent operations; it only requires
that of objects that you can see [anything being allowed for things that can't
change the execution of the program]. Not sure how one would reconcile that -
to be discussed -- or possibly not.)

The second one is worse:

"This procedure acts as a synchronization fence between a thread and signal
handlers in the same thread."

Ada has neither "threads" nor "signal handlers". It's unclear to me why a
protected object would need special synchronization in this case -- 9.10 seems
to imply all protected actions need to be fenced (at least if the compiler
can't prove the absence of volatile/atomic objects in the action).
(Alternatively, all atomic writes need to be fenced. Gotta have one or the
other.)

In any case, 9.10 implies that fencing is done implicitly, and it's not
obvious to me whether explicit fencing buys enough to try to define it (the
AARM suggests defining an atomic object for synchronization, and that would
carry a fence when needed).

Again, to be discussed.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, June 6, 2018  5:49 PM

(Following up a bit on the previous "fence" comment...)

...
> > I tend to not want to worry about it. The C++ memory models seem to
> > correspond roughly to "normal"/Volatile/Atomic objects -- Ada sets
> > these models per-object rather than per program. If we wanted (say)
> > a volatile version of these libraries, I'd expect that it would have
> > to be separately defined (given the way you defined aspect
> > matching). And I can't imagine adopting a C++ memory model that
> > essentially strips the guarantees of Atomic while still allowing
> > objects to be declared that way.
>
> I think Ada's per object approach is a sensible one, but I think the
> memory model concept is someone independent of the
> volatile/atomic/normal semantics.

...
> So, the memory order seems somewhat orthogonal to the atomicity class
> of the object.
> In some cases, different orders might ideally be wanted to be applied
> to the same object at different times to get better performance.

The C++ library seems to be applying synchronization per-operation rather
than per-object as Ada does (this example shows that I was confused on that
point in the past).

I still contend that it is a bad idea to have operations that don't follow
the usual guarentees provided by the language. In particular, Ada requires
that atomic operations are handled "sequentially consistent" (at least WRT
to the tasks that use those atomics). I think it would be a bad idea to
define operations that violated that expectation.

OTOH, it would make perfect sense to allow *stronger* operations to be applied
to objects with a *weaker* categorization. Specifically, it would make perfect
sense to have a volatile version of the library that provided some sequentially
consistent operations on volatile-not-atomic objects. (Which implicitly answers
your question: a weaker memory model would require a different package than
this one, which is only for atomic types - specifically, the formal parameter
would have to allow volatile types.)

If we had such a library (atomic operations for volatile-not-atomic), then the
example above could be constructed, using normal volatile operations other
than when sequential consistency is required. That would give the intended
semantics.

Of course, as a tool that only synchronization Jedis should consider using,
it's not clear to me whether it should appear in the Standard. (You have to
be an expert, but not necessarily a Jedi, to use atomic update operations to
build a lock-free structure. I could see personally attempting that, but never
the sort of optimization represented by this example.)

****************************************************************

From: Brad Moore
Sent: Sunday, June 10, 2018  12:51 PM

> The C++ library seems to be applying synchronization per-operation
> rather than per-object as Ada does (this example shows that I was
> confused on that point in the past).
>
> I still contend that it is a bad idea to have operations that don't
> follow the usual guarentees provided by the language. In particular,
> Ada requires that atomic operations are handled "sequentially
> consistent" (at least WRT to the tasks that use those atomics). I
> think it would be a bad idea to define operations that violated that expectation.
>
> OTOH, it would make perfect sense to allow *stronger* operations to be
> applied to objects with a *weaker* categorization. Specifically, it
> would make perfect sense to have a volatile version of the library
> that provided some sequentially consistent operations on volatile-not-atomic objects.
> (Which implicitly answers your question: a weaker memory model would
> require a different package than this one, which is only for atomic
> types - specifically, the formal parameter would have to allow
> volatile types.)
>
> If we had such a library (atomic operations for volatile-not-atomic),
> then the example above could be constructed, using normal volatile
> operations other than when sequential consistency is required. That
> would give the intended semantics.

Except that I believe our definition of Volatile was intended to match C's
definition of Volatile, and from what I can gather, the volatile keyword in
C is not useful for sharing data between threads.

https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming?page=4

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html

The three apparent users are;

- to mark local variables in the same scope as a setjmp whose value should
  be preserved across a longjmp. (Not of interest to Ada, since setjmp and
  longjmp do not work well with Ada.)

- when variables may be "externally modified", but the modification in fact
  is triggered synchronously by the thread itself (such as memory mapped I/O)

- to communicate with a signal handler in the same thread, in a restricted
  manner.

Otherwise, volatile provides no other benefits other than slow code down by
not eliminating redundant store and load operations, at least for portability
considerations.

I believe one of the main issues is with regard to reordering of instructions
either by the compiler or by the hardware. Reads and writes of the same
volatile variable would be seen in the same order by different threads, but
read/writes of other volatile variables could be moved past or before
read/writes of other volatile variables. With Atomic, the memory fences or
barriers prevent these sorts of moves.

As such, I believe the volatile keyword in C (and presumably in Ada) have no
mapping to the types of memory ordering semantics defined by C/C++.

You might get the intended behaviour on certain targets such as Intel
processors, but when you try to port the code to other processors, you may get
very subtle difficult to debug incorrect behaviour.

So, I think if we wanted to someday support memory models other than the
sequentially consistent model, we'd probably need to end up defining new
keywords or aspects (e.g. atomic_acquire_release, atomic_relaxed) and either
as you say have multiple versions of these libraries that can also work with
stronger memory models. Or alternatively, I suppose expand the definition of
atomic to encompass all of the memory models, and have the default flavour of
atomic be sequentially consistent, but perhaps allow aspects to weaken the
memory model to one of the other flavours of atomic if desired. Then maybe
you'd only need one version of the library. You could instantiate with say an
atomic_acquire_release type, but could override a call with a stronger memory
ordering if desired, perhaps by supplying an extra parameter to the call,
like they do in C/C++.

On an aside, I currently use Volatile in my paraffin parallelism libraries to
set a boolean flag that idle workers are interested in being assigned more
work by other threads. The code is written so that if another thread sees the
flag set, it attempts use hand off work using a protected object to provide
save synchronization. If multiple tasks see the same flag being set, only one
will win, and others get told that the attempt to hand off via the protected
object failed.

It is not clear if this very limited use of Volatile might be portably safe
enough to keep.

This appears to work on the hardware I have tried including Intel/Android/ and
ARM, and I have done a lot of testing and not seen any failures.
I have tried replacing Volatile with Atomic, and seen significant slow downs.

My suspicion is that if there is a problem, the optimal fix would be to
replace with atomic calls using lower memory ordering models, if they were
available in Ada. But I dont think calling out to a C library which has these
calls would be worthwhile, since from Ada it likely involves at least 1 if
not two procedure calls to insert the instructions needed for the memory
barrier/fences, which can introduce enough overhead to eliminate benefits of
making the call.

****************************************************************

From: Tucker Taft
Sent: Sunday, June 10, 2018  3:27 PM

> when variables may be "externally modified", but the modification in fact is
> triggered synchronously by the thread itself (such as memory mapped I/O)


This is the main usage for Ada.  It means the content of the object might
change at any time due to activity in other threads or hardware devices.

...
> Otherwise, volatile provides no other benefits other than slow code down by
> not eliminating redundant store and load operations, at least for
> portability considerations.

Volatile is not useful for synchronization, but it can be useful to indicate
that data is visible to other threads or other hardware devices.

****************************************************************

From: Randy Brukardt
Sent: Sunday, June 10, 2018  4:48 PM

...
> Otherwise, volatile provides no other benefits other than slow code
> down by not eliminating redundant store and load operations, at least
> for portability considerations.

As Tucker noted, that's not really true. Volatile is very useful when
*combined* with some other synchronization method, because it guarantees that
the other tasks can see the changes as expected at a synchronization point.
Even that is not necessarily true for ordinary variables; there is no
guarantee as to what another task sees at any point for ordinary objects.

Steve has noted that Volatile used for external devices really is subtly
different than Volatile used for data to be safely usable as synchronization
points. But that doesn't seem different enough for the complication.

As I noted, there might be value to adding synchronization to Volatile objects
on a per-operation basis. The rest of the C memory model stuff seems confused
and unnecessary. (I'm not enough of an expert in these ideas to be certain.)
It appears that they don't trust the compiler to do the right thing, so
they're trying to do it manually. Ada usually takes the other approach of
trusting the compiler more than the programmer.

****************************************************************

From: Erhard Ploedereder
Sent: Sunday, June 10, 2018  7:35 PM

> Except that I believe our definition of Volatile was intended to match
> C's definition of Volatile, and from what I can gather, the volatile
> keyword in C is not useful for sharing data between threads.

I can confirm that for C++, based on a discussion that we (OWG) had with
WG21/SG12 last week. "Atomic" is the magic for making cross-core communication
work in C++. "Volatile" merely guarantees that all reads and writes happen in
the sequential order, but there does not seem to be a guarantee for write-thru
to and read-thru from memory. Strange but folks seemed to be sure about it.

****************************************************************

From: Randy Brukardt
Sent: Monday, June 11, 2018  9:14 PM

That's my understand of how it works for Ada these days (Ada 2012) as well.
The discussion and wording of AI05-0275-1 says as much. One needs a
synchronization point in order to be able to reason about volatile objects
in multiple tasks.

But "not useful for synchronization of threads" is not the same as "not
useful for sharing data between threads". It just has to be shared using
some other method (including atomics) for synchronization.

I note in passing that implies that a fence/barrier needs to be associated
with every synchronization event, whether or not atomic objects are involved
(unless of course, the compiler can prove that nothing volatile could have
been read/written since the last such fence -- not very likely). That means
(for instance) that a protected action needs a fence (somewhere) even in a
lock-free environment, otherwise any volatile objects won't be properly
synchronized afterwards. That seems necessary to keep the high-level view
tractable, especially for high-level constructs like protected actions and
rendezvous.

****************************************************************

From: Brad Moore
Sent: Wednesday, June 13, 2018  7:28 AM

> But "not useful for synchronization of threads" is not the same as
> "not useful for sharing data between threads". It just has to be
> shared using some other method (including atomics) for synchronization.

I believe that is how I am using the volatile keyword. Basically, I use it as
a lightweight method to determine whether synchronization is needed, and if
so, then I use a protected object to do the synchronization.

****************************************************************

From: Randy Brukardt
Sent: Thursday, June 14, 2018  6:34 PM

This sounds backwards to me (although I'm at the limits of my knowledge
here!). You can only trust a read of a volatile object *after* a
synchronization point, as to prevent erroneousness, the actions (the write
and the read) have to be sequential. The means either a protected action,
an atomic operation, or one of the "signaling" operations (rendezvous et.
al.).

If the read and write are not separated by something making then sequential,
then it isn't trustworthy (although practically I'd expect that eventually
the change would get reflected, but not necessarily before several writes
happened).

I remember Robert saying that there are things that are formally erroneous,
but in practice will work fine, and your use might in fact be one of those.
You have no guarantees as to how long it will take for the write to show up
in other tasks, but it's hard to imagine that it won't happen eventually
(especially as it is highly likely that some synchronization will happen
sooner or later).

****************************************************************

From: Brad Moore
Sent: Sunday, March 3, 2019  12:10 PM

IN relation to the discussion of AI 234, Compare and Swap, I think the following
two sentences are confusing because they imply that two unrelated parts of the
discussion are related.

"The type Test_And_Set seems over-specified. Brad says that the C library has
three such types, 1, 2, and 4 bytes."

  and

"It is suggested that test-and-set should be declared in one of the generics,
not using a built-in-type. C doesn't have generics so it had to provide fixed
size operations."

When I was talking about 1, 2, and 4 bytes, I was attempting to explain why some
of the packages were generic, and why the Test_And_Set subprograms were not
generic. I was not saying that the test_and_set type can be 1, 2, or 4 bytes.
The gcc documentation says this must always be 1 byte (a bool or char)

In other words, the first sentence about being over-specified is unrelated to
the second sentence about the number of bytes. So, I think it would help clarify
if these are separated better in the minutes. Or alternatively delete the second
sentence about the number of bytes.

I'd suggest however that I'd like to see that the second sentence is moved
after;

"type Test_And_Set_Type is mod <implementation-defined>

   with Atomic;"

And then preceding the second sentence with something like, "In explaining why
the other functions are in generic units, Brad says...."

I think the sentence, "C doesn't have generics so it had to provide fixed size
operations.", should probably be deleted, or moved to follow the "Brad says ..."
sentence.

For the record, the relevant gcc documentation for the test_and_set function is;

  "Built-in Function: bool __atomic_test_and_set (void *ptr, int memorder) This
  built-in function performs an atomic test-and-set operation on the byte at
  *ptr. The byte is set to some implementation defined nonzero “set” value and
  the return value is true if and only if the previous contents were “set”. It
  should be only used for operands of type bool or char. "

And for the record, the following quote from the manual shows in fact that in
addition to 1,2, and 4 bytes, 8, and 16 bytes can be supported for the other
atomic functions if the hardware supports it.

"The ‘__atomic’ builtins can be used with any integral scalar or pointer type
that is 1, 2, 4, or 8 bytes in length. 16-byte integral types are also allowed
if ‘__int128’ (see __int128) is supported by the architecture."

But I agree that the Test_And_Set subprograms could and should be moved into the
generic packages, which would simplify things. But I don't see now the built-in
type can by avoided, as the generic formal type used in all the other functions
is not used in the test_and_set call.

I think the suggestion to make it implementation defined however, helps.

****************************************************************

From: Tucker Taft
Sent: Sunday, March 3, 2019  3:09 PM

If test-and-set really is only supported on single-byte objects, then that is
useful to know.  I would make it a separate generic, with an assertion that
the formal type is a single storage unit.  This limitation to a single byte
was not clear, as least to me, in the AI.

****************************************************************

From: Randy Brukardt
Sent: Sunday, March 3, 2019  7:43 PM

> If test-and-set really is only supported on single-byte objects, then
> that is useful to know.  I would make it a separate generic, with an
> assertion that the formal type is a single storage unit.  This
> limitation to a single byte was not clear, as least to me, in the AI.

He had a size clause to require a single byte representation in the AI, that's what we were discussing at the point of the minutes. And when someone asked him what the C library supported in that case, he answered "1, 2, or 4 bytes". I am certain of this p
oint, because I remember being quite surprised by that answer. If the facts are confused in the minutes, that's because they were presented that way.

Thus, I think the best solution here for the minutes is to add some sort of square bracketed clarification. I'd prefer the minutes reflect what actually was discussed rather than what we *meant* to discuss. (I realize that I sometimes write what I meant to
 say, but that's because it's not separate in my mind. Hopefully reviewers will note such errors.)

****************************************************************

From: Erhard Ploedereder
Sent: Monday, March 4, 2019  5:29 AM

>> "The type Test_And_Set seems over-specified. Brad says that
> the C library has three such types, 1, 2, and 4 bytes."

Which library is that? I locked it up in the C11 standard and all I
could find was section 7.17.8, which (very resonably) expects one
implementation-dependently sized type:

---- copied from the pre-standard ---
7.17.8 Atomic flag type and operations

1 The atomic_flag type provides the classic test-and-set functionality.
It has two states, set and clear.

2 Operations on an object of type atomic_flag shall be lock free.

3 NOTE Hence the operations should also be address-free. No other type
requires lock-free operations, so
the atomic_flag type is the minimum hardware-implemented type needed to
conform to this
International standard. The remaining types can be emulated with
atomic_flag, though with less than ideal properties.
...

7.17.8.1 The atomic_flag_test_and_set functions
Synopsis
1 #include <stdatomic.h>
bool atomic_flag_test_and_set(
volatile atomic_flag *object);
bool atomic_flag_test_and_set_explicit(
volatile atomic_flag *object, memory_order order);

****************************************************************

From: Brad Moore
Sent: Monday, March 4, 2019  10:15 AM

>> If test-and-set really is only supported on single-byte objects, then
>> that is useful to know.  I would make it a separate generic, with an
>> assertion that the formal type is a single storage unit.  This
>> limitation to a single byte was not clear, as least to me, in the AI.

What I'm thinking is to just have a single generic package for everything. I
believe the test and set type can be a private type declared in that package,
which solves the problem of trying to define its size in the visible part of the
spec. I am thinking also of getting rid of the generic package with the generic
formal;

    type Atomic_Type is range <>;

and just have one package with the formal;

     type Atomic_Type is mod <>;

for modular arithmetic.

because I think perhaps that is the best fit for the gcc instrinsic functions.
They dont actually provide signed and modular functions, everything is just
integer types.

in C, I believe you can use unsigned or signed arithmetic types, but they don't
handle arithmetic overflow, which we would need to implement these in Ada.

In my package implementation, for the signed arithmetic functions I had to do
funny things like apply the atomic operation, but then also do do the
calculation a second time using the previous value of the atomic, and the formal
parameter value, so that the arithmetic overflow could be caught.

With the modular arithmetic version of the package, it is much simpler, I just
simply call the gcc function without having to jump through any hoops.

Since there appears to be an appetite for simplifying these packages, I think it
makes sense to drop the signed arithmetic support.

So in summary, my plan is to create two AI's. One just has the compare and swap,
and the second adds the other functions to the same package.

Does that sound like a good approach?

> He had a size clause to require a single byte representation in the
> AI, that's what we were discussing at the point of the minutes. And
> when someone asked him what the C library supported in that case, he
> answered "1, 2, or 4 bytes". I am certain of this point, because I
> remember being quite surprised by that answer. If the facts are
> confused in the minutes, that's because they were presented that way.

My recollection is that something funny happened at that point in the
discussion. Possibly there were two different questions asked in quick
succession, and while providing and answer for the first, it was taken as the
answer for the second, or perhaps someone interrupted me before I could finish
my explanation, or possibly both. I just remember being unsatisfied that somehow
what I wanted to say wasn't communicated correctly, but the discussion moved on,
from there, so I didn't say anything more on that. I tried to find the recorded
audio session for the meeting to confirm my recollection/suspicion, but I guess
it wasn't sent around. In any case, I think your suggestion for square bracketed
clarification sounds fine by me.

***************************************************************

From: Brad Moore
Sent: Monday, March 4, 2019  10:30 AM

>>> "The type Test_And_Set seems over-specified. Brad says that
>> the C library has three such types, 1, 2, and 4 bytes."
>
> Which library is that? I locked it up in the C11 standard and all I
> could find was section 7.17.8, which (very resonably) expects one
> implementation-dependently sized type:

There is only one type. There are some other emails about my miscommunication leading to statements in the minutes, which do not reflect what I was intending to say.

I believe Randy is planning to add some sort of clarification to the minutes.

> ---- copied from the pre-standard ---
> 7.17.8 Atomic flag type and operations
>
> 1 The atomic_flag type provides the classic test-and-set functionality.
> It has two states, set and clear.
>
> 2 Operations on an object of type atomic_flag shall be lock free.
>
> 3 NOTE Hence the operations should also be address-free. No other type
> requires lock-free operations, so
> the atomic_flag type is the minimum hardware-implemented type needed to
> conform to this
> International standard. The remaining types can be emulated with
> atomic_flag, though with less than ideal properties.
> ...
>
> 7.17.8.1 The atomic_flag_test_and_set functions
> Synopsis
> 1 #include <stdatomic.h>
> bool atomic_flag_test_and_set(
> volatile atomic_flag *object);
> bool atomic_flag_test_and_set_explicit(
> volatile atomic_flag *object, memory_order order);

Interesting that C11 has functions for this. I was looking at the gcc instrinsic
functions

see

https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/_005f_005fatomic-Builtins.html#g_t_005f_005fatomic-Builtins

Which are extensions to their support for C, and also supports the C++ memory
models.

Built-in Function: bool __atomic_test_and_set (void *ptr, int memorder)

which essentially has the same profile. We are not planning to support the C++
memory models, so the memorder parameter will not be present in the Ada calls.

The ptr parameter is expected to be the address of a byte of storage, as only
the one byte pointed to by that address is modified by the call.

****************************************************************

From: Brad Moore
Sent: Thursday, March 7, 2019  5:33 PM

Here is an update to AI12-0234-1, Compare-and-swap for atomic objects

This represents a significant change over the last version.

This version only attempts to provide a Compare-And-Swap capability.

The other atomic functions will be split off into another AI.

Compare-and-swap is defined as an intrinsic generic function.

Previously it was a generic package, but the only other function that could
possibly be grouped into such a package is Atomic_Exchange, because it is the
only other atomic function defined by gcc that can operate on any atomic types.

Most of the other ones are arithmetic functions, more suitable for a
generic-formal modular type.

The only other similar function is Test_And_Set, which should not be a generic
package, as the type it operates on is defined in the package.

So, it seemed to make sense to keep the three capabilities,
Atomic_Compare_And_Swap, Atomic_Exchange, and Atomic_Test_And_Set in separate
child units.

For all atomic functions, gcc provides a method to determine if the call is lock
free or not. That call needs a pointer to the object to be operated on, as well
as the size of the object.

It seems important to provide the user with the ability to check the
lock-free-ness of the call, so I provided another generic child function,
Is_Lock_Free, that can be applied to any atomic type to determine if a specific
object of that type can be used with all of the operations declared in the child
units of Atomic_Operations, in a lock-free manner.

This required me defining what lock-free means.

****************************************************************

From: Brad Moore
Sent: Thursday, March 7, 2019  5:33 PM

Oops, forgot to include the actual AI text.....

Here it is....

[This is version /04 of the AI - Editor.]

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 7, 2019  7:29 PM

...
> A.19 Atomic Operations

I haven't mentioned this before, but A.19 is Ada.Locales in Ada 2012. It's not a
very good choice for the number of a new clause. :-)

More seriously...

> The library package Ada.Atomic_Operations is the parent of a set of
> child units that provide facilities for manipulating objects of atomic
> types.

Shouldn't this package be a child of System? And in Annex C (since that's where
Atomic is defined, after all).

So I'd suggest defining this in C.6.1 and later subclauses (since it's related
to atomic), and have it be a child of System (it's only useful for rather
low-level programming).

****************************************************************

From: Randy Brukardt
Sent: Thursday, March 7, 2019  8:08 PM

An editorial fix:

> Static Semantics
>
> The library function
> Ada.Atomic_Operations.Atomic_Compare_And_Exchange has the following
> declaration:
>
> generic
>    type Atomic_Type is private with Atomic; function
> Ada.Atomic_Operations.Atomic_Compare_And_Exchange
>      (Item     : aliased in out Atomic_Type;
>       Expected : aliased in out Atomic_Type;
>       Desired  :                Atomic_Type) return Boolean
>    with Nonblocking, Global => null, Convention => Intrinsic;
>
> function Atomic_Compare_And_Exchange (Item     : aliased in
> out Atomic_Type;
>                                       Expected : aliased in out
> Atomic_Type;
>                                       Desired  : Atomic_Type) return
> Boolean with Nonblocking, Global => null, Convention => Intrinsic;


The function specification is given twice here. I deleted the second copy.

There also were a bunch of missing spaces, lines too long, and the like, which I
fixed and will never speak of again. :-)

****************************************************************

From: Brad Moore
Sent: Friday, March 8, 2019  12:55 AM

>
> ...
>> A.19 Atomic Operations
>
> I haven't mentioned this before, but A.19 is Ada.Locales in Ada 2012.
> It's not a very good choice for the number of a new clause. :-)
>
> More seriously...
>
>> The library package Ada.Atomic_Operations is the parent of a set of
>> child units that provide facilities for manipulating objects of
>> atomic types.
>
> Shouldn't this package be a child of System? And in Annex C (since
> that's where Atomic is defined, after all).
>
> So I'd suggest defining this in C.6.1 and later subclauses (since it's
> related to atomic), and have it be a child of System (it's only useful
> for rather low-level programming).

That sounds good to me Randy. Definitely a better locale for this stuff, so
to speak.... :-)

****************************************************************

From: Brad Moore
Sent: Saturday, March 9, 2019  12:47 PM

Here is an update to AI12-0234-1. [This is version /05 of the AI - Editor.]

The main differences are;

- As per Randy's suggestions, moved these packages an child packages under
package System, rather than package Ada, and moved the wording sections to Annex
C, just after the existing section C.6, Shared Variable Control, which seems
much more appropriate place for this stuff.

- Changed the generic function Atomic_Compare_And_Exchange into a generic
package that contains two functions,

    Atomic_Compare_And_Exchange, and Atomic_Exchange.

Atomic_Exchange was previously split off into the new AI, but it seems more
appropriate to include it here in this AI, since it is in the same package as
Atomic_Compare_And_Exchange, and they have similar capabilities.

- Instead of a generic Is_Lock_Free function that needs to be instantiated
separately, there is an Is_Lock_Free function that will be in each of the child
packages of System.Atomic_Operations. This allows a programmer to query if the
target environment allows the functions to be implemented lock-free.

- Added an example of a spinlock using Atomic_Exchange

****************************************************************


Questions? Ask the ACAA Technical Agent