CVS difference for ai12s/ai12-0234-1.txt

Differences between 1.6 and version 1.7
Log of other versions for file ai12s/ai12-0234-1.txt

--- ai12s/ai12-0234-1.txt	2018/05/18 01:58:47	1.6
+++ ai12s/ai12-0234-1.txt	2018/06/06 22:53:39	1.7
@@ -1,4 +1,4 @@
-!standard C.6(14/3)                                  18-05-05  AI12-0234-1/02
+!standard A.19(0)                                      18-06-06  AI12-0234-1/03
 !class Amendment 17-06-09
 !status work item 17-06-09
 !status received 17-05-19
@@ -9,122 +9,51 @@
 
 !problem
 
-A very desirable property of an Ada program on a single core computer is that
-it can be guaranteed to be deadlock free, with no unbounded priority inversions
-when the priority ceiling protocol is applied. Unfortunately, this property
-can not be easily proven for multi-tasking Ada programs executing on a multicore 
-processor. This is because a lock must be obtained prior to executing a protected
-action. For instance, consider two protected objects that have protected
-procedures that in turn call a protected procedure of the other object. If task 
-A calls protected object P1, which calls a protected procedure of P2, while 
-task B calls protected object P2, which calls a protected procedure of P1, 
-deadlock is a possibility, since both tasks will have acquired locks to one of 
-the protected objects, while waiting endlessly for the lock of the other 
-protected object. If the protected objects were implemented with lock free 
-algorithms, or if it could be guaranteed that all tasks that interact with a
-protected object execute on the same processor, then this deadlocking could be 
-avoided. Should Ada provide mechanisms to guarantee that deadlocking will not 
-occur when a program is executing on a multicore processor?
-
-Further, lock-free structures are all the rage these days. If one wanted to construct
-such a structure in Ada, one might use Atomic objects. But Ada does not
-provide any compare-and-swap operation (or other read-write locked operations,
-like atomic increment). The RM has Implementation Advice that package
+On multiprocessor platforms, relying on locks for synchronization can be problematic.
+One issue is that while a task is blocked, it can be difficult or impossible to 
+offer guarantees for progress.  Other problems associated with locks includes
+deadlock, livelock, and priority inversion. Locking can also be a significant
+detriment to performance, and reduce opportunities for parallelism.
+
+Lock-free data structures guarantee system-wide progress, while wait-free
+data structures, in addition to being lock-free, also guarantee per-thread progresss.
+
+Lock-free data structures can also be used to improve performance by increasing 
+the amount of time spent executing in parallel since access to the data structure 
+does not need to be serialised.
+
+If one wanted to construct such a structure in Ada, one might use Atomic objects. 
+But Ada does not provide any compare-and-swap operation (or other read-write locked 
+operations, like atomic increment). The RM has Implementation Advice that package
 Machine_Code include such operations - C.1(11-12), but that is the absolute
-opposite of a portable operation. Similarly, arithmetic operations on variables
-of atomic types cannot be expected to work properly if updates are occurring
-concurrently to the same variable. Something as simple as A := A + 1; cannot
-be trusted because after reading the value of A, another task might store a 
-different value into A, and storing the incremented value could overwrite the
-update performed by the other task. Should Ada provide some simple primitives
+opposite of a portable operation. One could write such a library in another 
+language such as C and call that library from Ada, but then that involves at
+least one subprogram call in addition to a single instruction insertion, 
+which is significantly higher overhead than what might be desired, and having
+to rely on the existence of other languages for a given target is also undesirable
+and less portable. Should Ada provide some simple primitives
 that can be mapped to hardware instructions that allow such updates to perform
 as expected? -- Yes
 
 !proposal
 
-One part of this proposal is to allow the CPU aspect to be specified on a declaration
-of a protected type, or a stand alone protected object. This ensures that
-all tasks that invoke protected actions of a protected object are executing
-on the same CPU, which implies that the runtime can be simpler without needing
-locks to be acquired, thus avoiding deadlock.
-
-Another part of the solution is to provide a set of standard library calls that
-maps to commonly available atomic hardware instructions such as compare and swap,
-and test and set. These subprograms are to be intrinsic calls, and the generic
-libraries will assert an error if a particular target platform does not support
-such atomic operations.
-
-We would like the libraries to be generic to support operations on discrete types
-of different sizes, and we would like to require that the actual types for the
-generics be atomic types, so that the semantics of atomic types can be associated
-with these primitive operations. However, Ada currently does not allow the Atomic
-aspect to be specified on generic formal types, so one part of this proposal is
-to change the rules to allow that.
+The solution is to provide a set of standard library calls that map to commonly 
+available atomic hardware instructions such as compare and swap,
+test and set, and atomic increment. These subprograms are to be intrinsic calls, 
+and the generic libraries will assert an error if a particular target platform 
+does not support such atomic operations.
+
+The libraries are generic libraries in order to support operations on discrete 
+types of different sizes, and we require that the actual types for the generics 
+be  atomic types, so that Ada's semantics of atomic types can be associated
+with these primitive operations.
+
+This proposal depends on the ability to specify the Atomic aspect for a generic 
+formal type (AI12-0282-1). This is needed to ensure that the actual types used
+to instantiate the generic libraries defined by this AI are atomic types.
 
 !wording
 
-Modify D.16 (7/3) to allow aspect CPU to be applied to a protected type
- 
-For a task type (including the anonymous type of a single_task_declaration)
-{, protected type (including the anonymous type of a single_protected_declaration),} 
-or subprogram, the following language-defined representation aspect may be specified:
-
-Modify D.16 (8.a/3)
-Aspect Description for CPU: Processor on which a given task{, or calling task 
-for a protected type} should run.
-
-Modify D.16 (10/3)
-The CPU aspect shall not be specified on a task { or protected }interface type.
-
-Modify D.16 (11/4)
-The expression specified for the CPU aspect of a task { or protected }type is 
-evaluated each time an object of the [task] type is created (see 9.1 {and 9.4}). 
-The CPU value is then associated with the [task] object.
-
-Modify D.16 (14/3)
-{For a task, the}[The] CPU value determines the processor on which the task will 
-activate and execute; the task is said to be assigned to that processor. If 
-the CPU value is Not_A_Specific_CPU, then the task is not assigned to a processor. 
-A task without a CPU aspect specified will activate and execute on the same 
-processor as its activating task if the activating task is assigned a processor. 
-If the CPU value is not in the range of System.Multiprocessors.CPU_Range or is 
-greater than Number_Of_CPUs the task is defined to have failed, and it becomes a 
-completed task (see 9.2).
-
-{For a protected type, the CPU value determines the processor on which calling tasks
-will execute; the protected object is said to be assigned to that processor. If 
-the CPU value is Not_A_Specific_CPU, then the protected object is not assigned to 
-a processor. A call to a protected object that is assigned to a processor from a
-task that is not assigned a processor or is assigned a different processor 
-raises Program_Error.}
-
-D.16 Implementation Advice
-
-Starting a protected action on a protected object assigned to a processor should 
-be implemented without busy-waiting.
-
-AARM Reason: Busy-waiting is a form of lock and can be a source of deadlock.
-Busy-waiting is typically needed for starting protected actions on multiprocessors, 
-but if all tasks calling a protected object execute on the same CPU, this locking 
-isn't needed and the source of deadlock and associated overhead can be eliminated. 
-
-Modify J.15.9 (4/3)
-A CPU pragma is allowed only immediately within a task_definition, {protected_definition, }
-or the declarative_part of a subprogram_body.
-
-Modify J.15.9 (6/3)
-For an implementation that supports Annex D, a pragma CPU specifies the value of 
-the CPU aspect (see D.16). If the pragma appears in a task_definition, the 
-expression is associated with the aspect for the task type or single_task_declaration 
-that contains the pragma{. If the pragma appears in a protected_definition, the 
-expression is associated with the aspect for the protected type or 
-single_protected_declaration that contains the pragma. Otherwise}[; otherwise], 
-the expression is associated with the aspect for the subprogram that contains the pragma.
-
-Modify K.1 (15/3)
-CPU   Processor on which a given task {, or calling task 
-for a protected type} should run. See D.16.
-
 A.19 Atomic Operations
 
 This clause presents the specifications of the package Atomic_Operations and 
@@ -139,11 +68,19 @@
 
 The library package Atomic_Operations has the following declaration: 
 
+with System.Storage_Elements;
+
 package Ada.Atomic_Operations is
 
    pragma Pure;
 
-   type Test_And_Set_Type is mod 2**8;
+   type Test_And_Set_Type is mod System.Storage_Elements.Storage_Element'Modulus
+      with Atomic, Size => System.Storage_Elements.Storage_Element'Size;
+
+   --  Or alternatively we could go with
+   --
+   --  type Test_And_Set_Type is new System.Storage_Elements.Storage_Element
+   --    with Atomic, Size => System.Storage_Elements.Storage_Element'Size;
 
    Atomic_Support_Error : exception;
    
@@ -163,6 +100,7 @@
    
 end Ada.Atomic_Operations;
 
+
 Test_And_Set_Type represents the resulting state of a call to Atomic_Test_And_Set.
 
 Atomic_Support_Error is raised during elaboration of a child package of 
@@ -235,13 +173,13 @@
 function Atomic_Load (From : aliased Atomic_Type) return Atomic_Type
 with Convention => Intrinsic;
 
-Returns the value of From
+Returns the value of From.
 
 procedure Atomic_Store (Into  : aliased in out Atomic_Type;
                         Value : Atomic_Type)
 with Convention => Intrinsic;
 
-Writes Value into Into
+Writes Value into Into.
 
 function Atomic_Exchange (Item  : aliased in out Atomic_Type;
                           Value : Atomic_Type) return Atomic_Type
@@ -259,15 +197,12 @@
 operation is a read-modify-write operation that writes Desired into
 Item. If they are not equal, the operation is a read and the current
 contents of Item are written into Expected.
-Weak is true for weak compare_and_exchange, which may fail spuriously,
+Weak is True for weak compare_and_exchange, which may fail spuriously,
 and false for the strong variation, which never fails spuriously. Many
 targets only offer the strong variation and ignore the parameter.
 When in doubt, use the strong variation.
 If Desired is written into Item then True is returned. Otherwise, False is returned.
 
-function Atomic_Always_Lock_Free return Boolean
-with Convention => Intrinsic;
-
 Returns True if objects always generate lock-free atomic instructions for the 
 target architecture.
 
@@ -283,7 +218,7 @@
 for the target architecture. Item may be used ot determine alignment.
 The compiler may also ignore this parameter.
 
-A.19.2  The Generic Package Atomic_Operations.Signed_Arithmetic
+A.19.3  The Generic Package Atomic_Operations.Signed_Arithmetic
 
 The language-defined generic package Atomic_Operations.Signed_Arithmetic 
 provides a set of operations for adding and subtracting values atomically to
@@ -338,7 +273,7 @@
 function Atomic_Fetch_And_Subtract (Item  : aliased in out Atomic_Type;
                                     Value : Atomic_Type) return Atomic_Type;
 
-A.19.3  The Generic Package Atomic_Operations.Modular_Arithmetic
+A.19.4  The Generic Package Atomic_Operations.Modular_Arithmetic
 
 The language-defined generic package Atomic_Operations.Modular_Arithmetic 
 provides a set of operations for atomically adding, subtracting, and  
@@ -456,155 +391,22 @@
                                    Value : Atomic_Type) return Atomic_Type
     with Intrinsic;
 
-Modify C.6 (6.1/3)  to allow aspect Atomic to be applied to a generic formal type
-
- For an object_declaration, a component_declaration, or a 
- {type (including a formal type)}[full_type_declaration], the following 
- representation aspects may be specified:
-
-Modify C.6 (12/3)
-If an atomic object is passed as a parameter, then the formal parameter shall 
-either have an atomic type or allow pass by copy. If an atomic object is used as 
-an actual for a generic formal object of mode in out, then the type of the 
-generic formal object shall be atomic. If the prefix of an attribute_reference 
-for an Access attribute denotes an atomic object [(including a component)], then 
-the designated type of the resulting access type shall be atomic. {If a generic
-formal type is atomic, then the actual shall be atomic.} If an atomic type is 
-used as an actual for a generic formal derived type, then the ancestor of the 
-formal type shall be atomic. Corresponding rules apply to volatile objects and types. 
-
-In a generic instantiation the actual type corresponding to an atomic formal 
-scalar, private, derived, array, or access-to-object type shall be atomic;
-
-In addition to the places where Legality Rules normally apply
-(see 12.3), the above rule also apply in the private part of an
-instance of a generic unit.
-
-AARM Ramification: For a generic formal parameter to be atomic
-(thus, for this rule to apply), it has to explicitly specify aspect
-Atomic to be True. 
-
 !discussion
 
-It seems that the solution will need to be generic, in order to work with any
-atomic type. For that to make sense, it seems necessary to allow aspect Atomic
+It seems that the solution will need to be generic, in order to work with different
+atomic types. For that to make sense, it is necessary to allow aspect Atomic
 to be given on a formal type, in order to require any actual type is indeed
 atomic. The alternative of just saying that if the type is not atomic, then the
 operation isn't either, seems error-prone.
 
-There are at least three approaches to improving Ada's support for lock free 
-structures, all of which are compatible with each other, and provide their own
-benefits.
-
-From a real-time perspective, very efficient lock free data structures can in 
-theory be obtained in a straight forward manner when the Ravenscar Profile is 
-being applied, and when it is known that all use of a protected object is by 
-tasks that execute on the same processor. This is because each task is locked
-to execute on a specific CPU, and because the ceiling priority protocol is in
-place. If all use of a protected object is by tasks
-executing on the same CPU, then any task that is executing a protected action 
-cannot be preempted by another task wishing to call into the same protected object
-and therefore no locking is needed. But locking is needed to implement
-a protected object if a protected object can be accessed from tasks executing on
-multiple cores. Ada allows aspect CPU to be applied to a task desclaration to
-indicate which CPU a task of that type will execute on. The CPU aspect may also 
-be applied to a subprogram declaration, which could be applied to protected
-procedures, or protected functions, but not protected entries.
-
-It would be helpful if the CPU aspect could be applied to a protected type 
-declaration, or single protected object, which would imply that all calls to the
-protected object are via tasks that are executing on the same CPU.
-
-With such a specification in place, this would provide an indication to the
-implementation that locking is not needed, and that any overhead associated with
-locking, including space for the lock in memory and initialisation and 
-finalisation of the locks can be eliminated.
-
-Program_Error would be raised if a task executing on a CPU other than the one
-specified for the protected object attempts to execute a protected action of that
-protected object.
-
-We considered defining a Lock_Free aspect which could also be applied to a declaration 
-of the protected type or single protected object. The type of the Lock_Free aspect would 
-be Boolean, and it would be illegal to specify the Lock_Free aspect for an object or type
-if the compiler cannot guarantee indivisible updates.
-
-With CPU aspect and Lock_Free aspect being applicable to protected type 
-declarations, we could then have a new restriction, No_Locking, which specifies
-that each protected object that is not lock free is associated with the CPU of
-the task that created it. For Ravenscar, this would have the effect of assigning
-CPU to these declarations as that being the CPU associated with the environmental
-task.
-
-Having all protected objects assigned to specific CPU's would ensure that the
-program is free from Deadlock.
-
-To further support this notion, a new attribute, 'Lock_Free could be made available
-to query if the actual implementation is lock free.
-
-If a protected object is bound to a specific CPU, then the implementation of that
-protected object could be lock free, regardless whether the program has the 
-Ravenscar profile specified or not. Any tasks calling into that protected object
-would need to have the CPU aspect specified with a matching CPU value.
-
-If a protected object does not have the CPU aspect specified, but has the Lock_Free
-aspect applied, then there would need to be additional restrictions applied to that
-protected object to allow for a lock free implementation, to support needs of
-real time systems.
-
-Since lock free data structures involve retries when there is contention for the
-object, the number of retries needs to be bounded. To bound the execution time
-of a protected action we would want to disallow loop, goto, and procedure call
-statements in the protected actions of a lock free protected object. We would
-also want to disallow calls to non-static functions, and disallow quantified
-expressions, which are a form of looping construct.
-
 Many lock free algorithms involve atomic primitives such as compare and swap
 instructions or load and store instuctions. Most target platforms provide some
 form of instruction of these category. A limitation of these instructions is 
 that they typically can only be applied to a single elementary integer type of
 1, 2, 4, or 8 bytes in size.
-
-To restrict a protected object so that it can fit to work with these atomic
-primitives, there would need to be further restrictions to contain the operation
-to a single memory location. In particular the following would not be allowed in
-a lock free protected operation;
-
-- Access to non-local variables. (Note: constants are OK)
-- Non Elementary parameters to the protected operations
-- Dereferencing of access values (i.e. Pointers)
-- Address clauses
-- Imported or exported entities
-- Allocators
-
-And lastly, to prevent instructions with external effect, we would want to
-disallow the use of Volatile variables within a lock free protected operation.
-
-It is worth noting that exceptions and exception handling would be allowable
-within lock free protected operations.
-
-One of the benefits of the Lock_Free aspect is that it provides flexibility of
-implementation. The implementation may choose to implement the Lock_Free aspect
-via the use of atomic primitives, but it may also decide to implement instead
-via the use of transactional memory approaches. Static analysis could be applied
-to determine which approach is better. For real time, this may involve determining
-the worst case execution time for the transaction. Whether the implementation
-decides to implement with a transactional memory approach or with atomic
-primitives, the choice is transparent to the client. In bounding the
-execution time, the worst case could be limited to the number of writes, N x
-the transaction time.
-
-This was discussed at a recent IRTAW, but it was noted that there are issues with
-transaction based approaches for use with real time, and the conclusion was that
-transaction based approach would likely not be wanted for use with Ada.
-
-It was felt that at the current time, defining the Lock_Free aspect
-would be premature, in part because of all the restrictions that would be needed. 
-While at least one compiler vendor has implemented this aspect,
-it was felt that more experience would be needed before standardising such a feature.
 
-A third approach to improving support for lock free data structures would be
-to provide a library of low level atomic primitives similar to the library
+The approach taken to improving support for lock free data structures is to
+provide a set of libraries of low level atomic primitives similar to the library
 that is provided by gcc for C and C++.
 
 The C and C++ memory model support several different memory orders. The most
@@ -629,32 +431,26 @@
 compiler optimisations, such as hoisting or sinking code across boundaries where
 atomic primitives are being used. For instance the Acquire/Release memory order
 has this requirement in particular.
-
-For the approach of this AI, if we were to go forward with providing a library of
-atomic primitives, it probably would be best to not bother with supporting the
-lower synchronisation memory orders, and instead provide a library which 
-implicitly assumes the sequentially consistent memory order, which is both safer
-to use, and easier to understand.
-
-As some final notes, the three approaches to lock freedom,
-
- 1) Applying Aspect CPU to protected type declarations
- 2) Allowing Lock_Free aspect to be applied to protected type declarations
- 3) Providing a library of low level atomic primitives
- 
- Are all compatibile with each other, and in theory could all be supported. 
-
-For the real time community, the aspect CPU and related No_Locking restriction
-would likely be of most interest, and possibly the easiest to implement.
 
-The Lock_Free aspect allows implementations to choose between atomic primitives
-and transactional memory, which providing a safer interface that integerates
-better into the existing task synchronisation support.
+For the approach of this AI, it probably would be best for now to not bother 
+with supporting the lower synchronisation memory orders, and instead provide a 
+library which implicitly assumes the sequentially consistent memory order, 
+which is both safer to use, and easier to understand.
 
 The library of intrinsic primitives might be of interest to those wishing to
 implement specific lock free algoriths, particularly if porting those applications
 from other languages.
 
+We considered whether these libraries should be "future proofed" by providing
+an extra parameter for the memory model. The type of that parameter could be
+an enumeration which for now only contains Sequentially_Consistent, but could 
+be extended in the future to name other memory models. We decided that 
+there is not likely to be backwards compatibility issues since these are 
+intrinsic suprograms. For example, they cannot be passed as actual subprograms
+to generic libraries. Also, if needed, in the future we could add overloaded
+calls to these libraries that accept a memory order parameter, without introducing
+backward compatibility issues.
+
 !ASIS
 
 No ASIS effect.
@@ -663,7 +459,6 @@
 
 An ACATS C-Test is needed to check that the new capabilities are supported.
 
-
 !appendix
 
 !topic Lock-free data structures
@@ -2672,3 +2467,436 @@
 
 ****************************************************************
 
+From: Brad Moore
+Sent: Friday, May 18, 2018  4:36 PM
+
+> ...
+>> > (2) 2**8 would be a very expensive type on the U2200 and various 
+>> > other unusual processors. Ada has never tied itself to 8-bit 
+>> > machines like Java.
+>> > Shouldn't this type be based on the storage unit in System?
+>> 
+>> I agree that storage unit would be a more flexible definition. The 
+>> GCC api just has a void pointer, but the documentation says it 
+>> updates a byte of storage.
+>> I suppose one could alternatively say 2**implementation defined, like 
+>> we do for Hash_Type in Ada.Containers, but in this case I think 
+>> basing it on storage unit would be a better choice. I wonder if it 
+>> could be a private type? I think I tried that, but found it harder to 
+>> describe in the wording.
+>> This way, one can say that Clear sets the value to a zero value.
+> 
+> Perhaps you just want to copy (literally) the definition of 
+> Storage_Element (which is modular):
+> 
+>   type Test_And_Set_Type is
+>      mod System.Storage_Elements.Storage_Element'Modulus
+>      with Atomic;
+> 
+> (We can't use Storage_Element directly because of the need to declare 
+> this atomic, but we can copy the modulus. Storage_Unit is defined to be
+> Storage_Element'Size.)
+
+I believe we *can* use Storage_Element directly if we use derivation. And in
+either case, I think we'd also want to specify the Size aspect. So either;
+
+
+   type Test_And_Set_Type is mod
+     System.Storage_Elements.Storage_Element'Modulus
+     with Atomic, Size => System.Storage_Elements.Storage_Element'Size;
+
+as you suggested, or;
+
+   type Test_And_Set_Type is new System.Storage_Elements.Storage_Element
+     with Atomic, Size => System.Storage_Elements.Storage_Element'Size;
+
+It's not clear to me which of these is better, or if it matters. 
+Maybe the Size aspect for the derivation case isn't needed if Size is
+inherited.
+
+I suppose it depends on whether we want to think of this type as a type that
+happens to look a lot like a Storage_Element, or instead as a special type of
+Storage_Element.
+
+I think I am leaning towards the latter, using derivation.
+
+Another question I have is with regard to future-proofing these libraries.
+
+The real gcc library calls have an extra parameter for memory model, which 
+defaults to the sequentially consistent memory model if not specified. We are
+saying right now that we want to go with a simpler approach and only support 
+the Sequentially_Consistent memory model, and therefore no parameter is 
+needed.
+
+I am thinking it might be better future-proofing if we defined an enumeration
+for Memory_Model that for now only has one literal, Sequentially_Consistent, 
+and that parameter be added to the calls, defaulting to the 
+Sequentially_Consistent value.
+
+That way, in the future, if we decide to support other memory models, adding 
+items to the Memory_Model enumeration is less likely to introduce backwards 
+compatibility than adding a new parameter.
+
+Is this suggestion worth worrying about? or is backwards compatibility not a
+significant problem here? I'm was originally thinking of cases such as 
+instantiating generics using these subprograms, which might expect a certain 
+profile signature. On the other hand, since these are intrinsic subprograms, 
+one cant instantiate generics with these as actuals, and maybe there are not 
+any other real backwards compatibility concerns?
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Friday, May 18, 2018  5:36 PM
+
+...
+> I suppose it depends on whether we want to think of this type as a 
+> type that happens to look a lot like a Storage_Element, or instead as 
+> a special type of Storage_Element.
+> 
+> I think I am leaning towards the latter, using derivation.
+
+I was thinking it was the former, thus the formulation I suggested. Doesn't 
+really matter much either way, though. At least not until Steve finds some 
+interesting way to use a formal derived type to cause a mess. ;-)
+
+...
+> Is this suggestion worth worrying about? or is backwards compatibility 
+> not a significant problem here? I'm was originally thinking of cases 
+> such as instantiating generics using these subprograms, which might 
+> expect a certain profile signature. On the other hand, since these are 
+> intrinsic subprograms, one cant instantiate generics with these as 
+> actuals, and maybe there are not any other real backwards 
+> compatibility concerns?
+
+I tend to not want to worry about it. The C++ memory models seem to correspond
+roughly to "normal"/Volatile/Atomic objects -- Ada sets these models 
+per-object rather than per program. If we wanted (say) a volatile version of 
+these libraries, I'd expect that it would have to be separately defined (given
+the way you defined aspect matching). And I can't imagine adopting a C++ 
+memory model that essentially strips the guarantees of Atomic while still
+allowing objects to be declared that way.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Friday, May 18, 2018  7:04 PM
+
+> ...
+>> Is this suggestion worth worrying about? or is backwards 
+>> compatibility not a significant problem here? I'm was originally 
+>> thinking of cases such as instantiating generics using these 
+>> subprograms, which might expect a certain profile signature. On the 
+>> other hand, since these are intrinsic subprograms, one cant 
+>> instantiate generics with these as actuals, and maybe there are not 
+>> any other real backwards compatibility concerns?
+> 
+> I tend to not want to worry about it. The C++ memory models seem to 
+> correspond roughly to "normal"/Volatile/Atomic objects -- Ada sets 
+> these models per-object rather than per program. If we wanted (say) a 
+> volatile version of these libraries, I'd expect that it would have to 
+> be separately defined (given the way you defined aspect matching). And 
+> I can't imagine adopting a C++ memory model that essentially strips 
+> the guarantees of Atomic while still allowing objects to be declared that
+> way.
+
+I think Ada's per object approach is a sensible one, but I think the memory 
+model concept is someone independent of the volatile/atomic/normal semantics.
+
+Here is an example I found today, that actually looks very similar to the 
+approach I was suggested for implementing the FIFO_Spinning Admission policy
+in AI12-0276.
+The idea there was a ticket abstraction where someone wanting a lock takes the
+next ticket number and then busy waits until that number comes up.
+
+The full source code example can be found at;
+
+   https://github.com/pramalhe/ConcurrencyFreaks/blob/master/C11/locks/ticket_mutex.c
+
+The example uses the same primitive calls that we are considering for Ada.
+To obtain the lock, they do an atomic increment to get the next ticket number 
+using atomic_fetch_add, then keep calling atomic_load until they see their 
+ticket number come up. For these calls, I believe the default memory order, 
+Sequentially_Consistent is applied, which makes sense since there is 
+communication between threads.
+
+void ticket_mutex_lock(ticket_mutex_t * self) {
+    long lingress = atomic_fetch_add(&self->ingress, 1);
+    while (lingress != atomic_load(&self->egress)) {
+        sched_yield();  // Replace this with thrd_yield() if you use <threads.h>
+    }
+    // This thread has acquired the lock on the mutex }
+
+When unlocking the mutex, they load the current ticket number locally, using 
+the relaxed memory order, because since the lock was held it is known that the
+value of the current ticket number would not have changed. This is more 
+efficient than loading using the sequentially consistent memory model, as that
+requires synchronisations across all the threads. The store back to update the
+current ticket number uses sequentially consistent again since other threads 
+need to see the update.
+
+
+/*
+ * Unlocks the mutex
+ * Progress Condition: Wait-Free Population Oblivious
+ *
+ * We could do a simple atomic_fetch_add(egress, 1) but it is faster to do
+ * the relaxed load followed by the store with release barrier.
+ * Notice that the load can be relaxed because the thread did an acquire
+ * barrier when it read the "ingress" with the atomic_fetch_add() back in
+ * ticket_mutex_lock() (or the acquire on reading "egress" at a second try),
+ * and we have the guarantee that "egress" has not changed since then.
+ */
+void ticket_mutex_unlock(ticket_mutex_t * self)
+{
+    long legress = atomic_load_explicit(&self->egress, memory_order_relaxed);
+    atomic_store(&self->egress, legress+1);
+}
+
+
+So, the memory order seems somewhat orthogonal to the atomicity class of the 
+object. In some cases, different orders might ideally be wanted to be applied 
+to the same object at different times to get better performance.
+
+Given this, do you still think that we'd not want to someday be able to 
+specify different memory orders using the same generic instantiation?
+
+I suppose we could add new overloaded calls to the libraries in the future 
+that accept a memory order parameter, and leave the current calls as is, 
+without a parameter.
+
+Then there is no backwards compatibility issue. I think I just answered my own 
+question... (with your help :))
+
+So I think we're OK not worrying about this.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Friday, May 18, 2018  9:37 PM
+
+>> > > (2) This means all 6 aspects are getting allowed on formal types, 
+>> > > even though we only have need (and rules!) for one. Is that 
+>> > > really what we want?
+>> > > Do we really want to mess around with Independent_Components in
+> generics?
+>> > > Etc.
+>> > 
+>> > I think I was under the impression that I was letting Volatile ride 
+>> > in on the coattails of Atomic for this purpose, but didn't think 
+>> > the other ones were being included.
+>> > The thought was that it probably makes sense to allow Volatile on 
+>> > generic formal types, or weird not to, if we allow Atomic. But only 
+>> > if it fits better with the wording.
+>> > Otherwise restricting it to just Atomic is fine by me.
+>> 
+>> That seemed OK to me, but what about Atomic_Components and 
+>> Volatile_Components on formal array types? That seems like work :-) 
+>> and I don't know if we have any use for that work.
+>> And similarly for the Independent cases.
+> 
+> Humm, I see that the actual wording only uses that heading for the 
+> three aspects Atomic, Volatile, and Independent. But the proposed 
+> wording literally only has rules for Atomic and Volatile. There's no 
+> wording for what it meaning to put Independent into a formal type, but 
+> the proposed wording allows it. We need at a minimum to reconcile this 
+> (either have wording for generic matching for Independent, or only 
+> allow on formals for Atomic and Volatile).
+
+Humm indeed. I originally said it was weird to allow Atomic without Volatile, 
+but now it feels even weirder to have those two and not have Independent, 
+especially since C.6 (8.1/4) says that "All atomic objects and aliased objects
+are considered to be specified as independently addressable."
+
+And then more weird to be able to specify Atomic or Volatile on a formal array
+type, but not be able to specify Atomic_Components or Volatile_Components.
+
+Also, if there's a need to have Atomic on generic formals, it is not too hard 
+to imagine there being a need for Atomic_Components on a generic formal array 
+type. An array of counters that gets atomically incremented is the first thing
+that comes to mind.
+
+I see quite a bit less of a use case for Volatile and Volatile components than
+Atomic and Atomic_Components, and even lesser a case for Independent and 
+Independent_Components.
+
+So I am now thinking the way to go for now is to allow Atomic and 
+Atomic_Components only, and not allow any of the others.
+
+If we ever come up with a more concrete need for the others, they can be added 
+in the future.
+
+Unless I hear otherwise, I will write the AI up this way....
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Friday, May 18, 2018  8:48 PM
+
+...
+> So I am now thinking the way to go for now is to allow Atomic and 
+> Atomic_Components only, and not allow any of the others.
+> 
+> If we ever come up with a more concrete need for the others, they can be 
+> added in the future.
+> 
+> Unless I hear otherwise, I will write the AI up this way....
+
+Seems a reasonable starting point.  In the long run, supporting all of them 
+makes the most sense, but I can see starting with just Atomic.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Friday, May 18, 2018  7:38 PM
+
+Here is my updated AI for the lock free libraries. [This is version /03 of the
+AI - Editor.]
+
+Other than some minor changes addressing comments received so far, the big 
+difference is that I split out the two other parts which will be appearing as
+new separate AI's.
+
+Specifically, I removed that part about pragma CPU being specifiable on 
+protected object declarations, and I removed that part about allowing pragma
+Atomic to be specifiable on generic formal types, which this AI depends on.
+
+The discussion is simplified a lot, and the problem I think provides clearer 
+and better motivation.
+
+I think the calls for Atomic_Thread_Fence and Atomic_Signal_Fence will 
+definitely need more wording to describe those concepts if we want to support 
+those calls. I left those loosely and simply defined, in case we decide to drop
+those calls.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Wednesday, June 6, 2018  5:16 PM
+
+> Here is my updated AI for the lock free libraries.
+
+For the record, I made a handful of editorial corrections to this AI when I 
+posted it.
+
+(1) Corrected the spelling of "synchronization" from "syncronisation" (Ada 
+uses American English spellings, thus a 'z' rather than an 's').
+
+(2) Inserted the actual number of the AI for atomic formal types
+(AI12-0282-1) and moved it's paragraph last in the !proposal section. (It 
+reads best giving the general solution first, then why the libraries are 
+generic, and finally that the libraries depend on another new AI.)
+
+(3) You had two A.19.2 subclauses in your proposal; I renumbered the 
+subclauses to eliminate this.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Wednesday, June 6, 2018  5:31 PM
+
+...
+> I think the calls for Atomic_Thread_Fence and Atomic_Signal_Fence will 
+> definitely need more wording to describe those concepts if we want to 
+> support those calls. I left those loosely and simply defined, in case 
+> we decide to drop those calls.
+
+A bit too loosely to even quite understand what they do! It's hard to decide
+if these are useful without figuring out where they fit into the Ada 
+synchronization model.
+
+Specifically:
+
+"This procedure acts as a synchronization fence between threads."
+
+Ada doesn't have "threads", per se; it would be better in this AI to use the
+proper Ada terminology (even in the discussion). Probably you need to use the
+terminology that Tucker is coming up with for AI12-0119-1 (I forget the 
+details now; not sure if "tasklet" or something else is appropriate).
+
+And of course "synchronization fence" is also an undefined term. You need to 
+tie this to 9.10 in order to explain how this fits in with the Ada 
+synchronization model. (That also would be true when discussing the C "memory
+models"; it seems to me that one can only use weaker/stronger models on 
+individual operations lest the result violate the 9.10 model. And I doubt that
+9.10 requires completely sequentially consistent operations; it only requires 
+that of objects that you can see [anything being allowed for things that can't
+change the execution of the program]. Not sure how one would reconcile that - 
+to be discussed -- or possibly not.)
+
+The second one is worse:
+
+"This procedure acts as a synchronization fence between a thread and signal 
+handlers in the same thread."
+
+Ada has neither "threads" nor "signal handlers". It's unclear to me why a 
+protected object would need special synchronization in this case -- 9.10 seems
+to imply all protected actions need to be fenced (at least if the compiler 
+can't prove the absence of volatile/atomic objects in the action).
+(Alternatively, all atomic writes need to be fenced. Gotta have one or the
+other.)
+
+In any case, 9.10 implies that fencing is done implicitly, and it's not 
+obvious to me whether explicit fencing buys enough to try to define it (the
+AARM suggests defining an atomic object for synchronization, and that would
+carry a fence when needed).
+
+Again, to be discussed.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Wednesday, June 6, 2018  5:49 PM
+
+(Following up a bit on the previous "fence" comment...)
+
+...
+> > I tend to not want to worry about it. The C++ memory models seem to 
+> > correspond roughly to "normal"/Volatile/Atomic objects -- Ada sets 
+> > these models per-object rather than per program. If we wanted (say) 
+> > a volatile version of these libraries, I'd expect that it would have 
+> > to be separately defined (given the way you defined aspect 
+> > matching). And I can't imagine adopting a C++ memory model that 
+> > essentially strips the guarantees of Atomic while still allowing 
+> > objects to be declared that way.
+> 
+> I think Ada's per object approach is a sensible one, but I think the 
+> memory model concept is someone independent of the 
+> volatile/atomic/normal semantics.
+
+...
+> So, the memory order seems somewhat orthogonal to the atomicity class 
+> of the object.
+> In some cases, different orders might ideally be wanted to be applied 
+> to the same object at different times to get better performance.
+
+The C++ library seems to be applying synchronization per-operation rather 
+than per-object as Ada does (this example shows that I was confused on that
+point in the past).
+
+I still contend that it is a bad idea to have operations that don't follow
+the usual guarentees provided by the language. In particular, Ada requires
+that atomic operations are handled "sequentially consistent" (at least WRT
+to the tasks that use those atomics). I think it would be a bad idea to 
+define operations that violated that expectation.
+
+OTOH, it would make perfect sense to allow *stronger* operations to be applied
+to objects with a *weaker* categorization. Specifically, it would make perfect
+sense to have a volatile version of the library that provided some sequentially
+consistent operations on volatile-not-atomic objects. (Which implicitly answers
+your question: a weaker memory model would require a different package than 
+this one, which is only for atomic types - specifically, the formal parameter
+would have to allow volatile types.)
+
+If we had such a library (atomic operations for volatile-not-atomic), then the
+example above could be constructed, using normal volatile operations other 
+than when sequential consistency is required. That would give the intended 
+semantics.
+
+Of course, as a tool that only synchronization Jedis should consider using,
+it's not clear to me whether it should appear in the Standard. (You have to
+be an expert, but not necessarily a Jedi, to use atomic update operations to
+build a lock-free structure. I could see personally attempting that, but never
+the sort of optimization represented by this example.)
+
+****************************************************************

Questions? Ask the ACAA Technical Agent