CVS difference for ai12s/ai12-0346-1.txt

Differences between version 1.3 and version 1.4

--- ai12s/ai12-0346-1.txt	2020/02/20 05:16:11	1.3
+++ ai12s/ai12-0346-1.txt	2020/04/28 03:42:10	1.4
@@ -1,4 +1,4 @@
-!standard 5.5(2/3)                                   20-01-12  AI12-0346-1/01
+!standard 5.5(2/3)                                   20-04-27  AI12-0346-1/02
 !standard 5.5.2(5/4)
 !standard 5.5.2(7/3)
 !class Amendment 19-10-11
@@ -9,13 +9,18 @@
 !subject Ada and OpenMP
 !summary
 
-** TBD.
+We propose to provide a technical report outlining the "plug-in"
+approach to supporting lightweight threading.  The notion of
+user-defined "generalized" aspects has been proposed (in another AI) as
+a way to provide tuning parameters to a plug-in, without the compiler
+having to be updated every time a new tuning parameter is supported by
+the underlying plug-in.
 
 !problem
 
 Systems like OpenMP are commonly used to provide the framework for
 lightweight threading, GPU usage, and similar capabilities. (OpenMP directly
-supports C and Fortran.) Since these frameworks are already in industrial 
+supports C and Fortran.) Since these frameworks are already in industrial
 usage, it would be useful if Ada (particularly the new parallelism features)
 could use them.
 
@@ -36,38 +41,38 @@
 For the mapping to OpenMP, we are recommending a layered mapping
 to OpenMP (or other light-weight-threading (LWT) scheduler), where upon
 seeing the syntax for a parallel construct, the compiler generates calls
-on a top layer (dubbed “System.Parallelism” for now). This layer is
+on a top layer (dubbed "System.Parallelism" for now). This layer is
 independent of the particular LWT scheduler that will be controlling the
 light-weight threads that are spawned as a result of the parallel
 constructs.  Below System.Parallelism will be a package dubbed
-“System.LWT”, which provides the LWT-scheduler-independent API, and
-implements it using a “plug-in” architecture.  Specific LWT schedulers
-would be children of this package, for example “System.LWT.OpenMP”, and
-one of them would be “plugged in” to System.LWT and handle the various
+"System.LWT", which provides the LWT-scheduler-independent API, and
+implements it using a "plug-in" architecture.  Specific LWT schedulers
+would be children of this package, for example "System.LWT.OpenMP", and
+one of them would be "plugged in" to System.LWT and handle the various
 calls through the API.
 
 The user will determine which particular LWT scheduler gets linked into
-the program by “with”ing a package (e.g. Interfaces.OpenMP) and
-declaring a “control” object of a type declared in that package (e.g.
+the program by "with"ing a package (e.g. Interfaces.OpenMP) and
+declaring a "control" object of a type declared in that package (e.g.
 Interfaces.OpenMP.OMP_Parallel), in the task body for each Ada task (or
 the main subprogram for the environment task) where it is desired to
 have multiple threads of control.  Discriminants of the control object
-can be used to control the level of parallelism desired (e.g. “Control :
-OMP_Parallel (Num_Threads => 5);”), as well as potentially other options
+can be used to control the level of parallelism desired (e.g. "Control :
+OMP_Parallel (Num_Threads => 5);"), as well as potentially other options
 that should apply by default across all parallel constructs in the given
-Ada task.  This approach is modeled on the “#pragma omp parallel” of
-OpenMP which creates a “parallel region” in which work-sharing or other
+Ada task.  This approach is modeled on the "#pragma omp parallel" of
+OpenMP which creates a "parallel region" in which work-sharing or other
 forms of parallelism can be used.  In the absence of any LWT scheduler
 linked into the program, the System.LWT package would fall back to a
 purely sequential implementation.  The Interfaces.OpenMP package might
 have other subprograms intended to be called directly by the user, in
-particular those that are part of the “official” OpenMP API, such as
-“omp_get_thread_num” or “omp_get_team_size” (for the full API, see the
+particular those that are part of the "official" OpenMP API, such as
+"omp_get_thread_num" or "omp_get_team_size" (for the full API, see the
 Runtime Library Routines section in
 https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.
-pdf).  We also have proposed that record extensions of “Ada.Aspects”
+pdf).  We also have proposed that record extensions of "Ada.Aspects"
 (suggested name) could be used to pass in options to parallel
-constructs, in a kind of generalized “aspect specification.”
+constructs, in a kind of generalized "aspect specification." (See AI12-0355-1.)
 
 A prototype implementation of this structure is available, and
 can be provided as a zip archive.
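
 As a concrete sketch of the usage just described (Worker and Process
 are placeholders; OMP_Parallel and its Num_Threads discriminant are
 from the prototype Interfaces.OpenMP package described above):

    task body Worker is
       Control : Interfaces.OpenMP.OMP_Parallel (Num_Threads => 5);
       --  Requests up to 5 server threads for parallel constructs
       --  executed within this Ada task.
    begin
       parallel for I in 1 .. 1_000 loop
          Process (I);  --  iterations run as light-weight threads
       end loop;
    end Worker;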
@@ -81,31 +86,31 @@
 system, at a particular real-time priority.  For each Ada task that does
 want internal parallelism, the expectation is that the underlying LWT
 scheduler (e.g. OpenMP) will start up (or re-use) additional
-(heavy-weight) “kernel threads” to act as servers for the LWTs that will
+(heavy-weight) "kernel threads" to act as servers for the LWTs that will
 be spawned somewhere within the Ada task.  Each of these servers will
 run at the same priority as the enclosing Ada task, and will share the
-Ada “identity” and “attributes” of that Ada task.  The LWTs served by
-these server threads will in turn get their Ada task “identity” and
-“attributes” from these servers.
+Ada "identity" and "attributes" of that Ada task.  The LWTs served by
+these server threads will in turn get their Ada task "identity" and
+"attributes" from these servers.
 
-Each light-weight thread is run in the context of an “LWT group,” which
-maps quite directly to an OpenMP “taskgroup.”  All of the LWTs spawned
+Each light-weight thread is run in the context of an "LWT group," which
+maps quite directly to an OpenMP "taskgroup."  All of the LWTs spawned
 during the scope of an LWT group are automatically awaited at the end of
 the LWT group.  This ensures that the scope where an LWT is defined
-isn’t exited while such LWTs are still running and potentially making
+isn't exited while such LWTs are still running and potentially making
 up-level references to objects from the enclosing scope.
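
 As a purely hypothetical sketch of how an LWT group might appear at
 the System.LWT level (the names here are illustrative, not the
 prototype API):

    declare
       Group : System.LWT.LWT_Group;  --  analogous to an OpenMP taskgroup
    begin
       System.LWT.Spawn (Group, First_Chunk'Access);
       System.LWT.Spawn (Group, Second_Chunk'Access);
       --  Finalization of Group awaits both LWTs here, so they cannot
       --  outlive the scope whose objects they reference up-level.
    end;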
 
 In past discussions in the ARG and elsewhere we had presumed that
 vendor-specific pragmas or aspects could be used to communicate
 additional options to the underlying LWT scheduler.  However, more
-recently we have discussed including a kind of “generalized aspect”
+recently we have discussed including a kind of "generalized aspect"
 syntax as part of the syntax for parallel constructs.  So far we have
 discussed this in detail only on Ada-Comment, but at the upcoming ARG
 meeting we will probably discuss it further.  The general idea is to
 allow for a variant of the aspect-specification syntax after the
-“parallel” reserved word, and allow it to be based on any record
+"parallel" reserved word, and allow it to be based on any record
 extension of a special Ada.Aspects.Root_Aspect tagged type.  A parameter
-of type “access Root_Aspect’Class” could then be included in the various
+of type "access Root_Aspect'Class" could then be included in the various
 System.LWT and System.Parallelism APIs, allowing the underlying LWT
 scheduler to receive the options and use them as it sees fit (see below
 for an example of how such a parameter might appear).  The advantage of
@@ -122,22 +127,22 @@
 accommodate evolution of the OpenMP standard as well as the ability to
 use other LWT schedulers which might come from, say, an RTOS vendor.  If
 the user chooses to use no LWT scheduler, a sequential fallback will be
-part of System.LWT whenever there is no LWT scheduler “plugged in.”
+part of System.LWT whenever there is no LWT scheduler "plugged in."
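
 As an illustration of such a record extension (the type and its
 components are purely hypothetical; see the Aspects parameter in the
 System.Parallelism API below for where a value of this type would be
 passed to the plug-in):

    type OMP_Sched_Options is new Ada.Aspects.Root_Aspect with record
       Dynamic    : Boolean := False;  --  hypothetical tuning knobs
       Chunk_Size : Natural := 0;      --  0 means "let the plug-in decide"
    end record;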
 
 Handling secondary stack, exceptions, and transfers of control out of
-the code for a light-weight thread 
- 
+the code for a light-weight thread
+
 Independent of which particular LWT scheduler is present (if any), the
 code for a particular light-weight thread is defined by a function
 pointer and a data object.  For Ada, the data object will typically be a
 tagged data object, and the function will be a simple wrapper that
-merely invokes a special “LWT_Body” dispatching operation on the object,
+merely invokes a special "LWT_Body" dispatching operation on the object,
 and handles all exceptions propagated by the body (similar to the way a
 wrapper around an Ada task body handles all exceptions).  Normal
 secondary stack and exception raising and handling can be performed
 inside the LWT_Body, because light-weight threads run on a given server
-until completion or cancelation.  They aren’t swapped back and forth, so
-there is no added complexity in stack or exception management. 
+until completion or cancelation.  They aren't swapped back and forth, so
+there is no added complexity in stack or exception management.
 Effectively, exceptions are being raised and handled on the stack of the
 server, in the usual way.
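
 In outline (the names below follow the description above but are
 otherwise illustrative):

    type Loop_Chunk is new LWT_Data with record
       Low, High : Longest_Integer;  --  bounds for this chunk
    end record;

    overriding procedure LWT_Body (Data : Loop_Chunk);
       --  contains the user's code for one light-weight thread

    procedure LWT_Wrapper (Data : LWT_Data'Class) is
    begin
       LWT_Body (Data);  --  dispatch to the chunk-specific body
    exception
       when others =>
          null;  --  record the failure, as a task-body wrapper would
    end LWT_Wrapper;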
 
@@ -145,11 +150,11 @@
 an LWT_Body has an explicit transfer of control out of the code for the
 light-weight thread, an attempt is made to cancel the other threads in
 the LWT group.  The first LWT to attempt cancelation receives an
-indication of “success” from this attempt.  Later LWTs of the same group
-making such an attempt will not receive the “success” indicator. 
+indication of "success" from this attempt.  Later LWTs of the same group
+making such an attempt will not receive the "success" indicator.
 Both OpenMP and Ada 202X allow cancelation to be
 implemented using a polling approach, where there are various
-well-defined “cancelation points.”  When code within LWT_Body detects
+well-defined "cancelation points."  When code within LWT_Body detects
 that the enclosing LWT group has been canceled, it generally just
 returns from the LWT_Body.  The LWT that successfully initiated the
 cancelation records in a variable visible at the point where the LWT
@@ -164,7 +169,7 @@
 
 The OpenMP recommended approach to supporting a sequence of blocks to be
 (potentially) run in parallel is to create a loop around a switch/case
-statement.  The “GOMP” implementation indicates the same approach in:
+statement.  The "GOMP" implementation indicates the same approach in:
 
    https://gcc.gnu.org/onlinedocs/libgomp/Implementing-SECTIONS-construct.html
 
@@ -214,14 +219,14 @@
 creation of only one out-of-line procedure independent of the number of
 arms in the parallel block statement.
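
 For instance, a two-arm parallel block might expand into a single
 out-of-line procedure along these lines (sketch only; Do_First_Arm
 and Do_Second_Arm stand for the code of the two arms):

    procedure Block_Body (Index : Positive) is
    begin
       case Index is
          when 1      => Do_First_Arm;
          when 2      => Do_Second_Arm;
          when others => null;
       end case;
    end Block_Body;

 The runtime then invokes Block_Body once per arm, potentially in
 parallel, mirroring the loop-around-switch approach used by GOMP.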
 
-Expansions for Ada 202X parallel loop: 
+Expansions for Ada 202X parallel loop:
 
 In this description, we show an optional intermediate step where GNAT
 might use a pragma Par_Loop so that parallel loops could be specified
 while remaining compilable by older Ada compilers, analogous to the way
-the "Pre" aspect expands into pragma Precondition in GNAT. 
+the "Pre" aspect expands into pragma Precondition in GNAT.
 
-Ada 202X defines the notion of a “chunk specification” which can give a
+Ada 202X defines the notion of a "chunk specification" which can give a
 user-specified name to the index used to identify a chunk.  When using a
 pragma instead of syntax, there would be no way to specify the
 chunk-index name, so the value of the chunk index can be referenced when
@@ -257,7 +262,7 @@
 expands into:
 
 declare
-    procedure I__Loop_Body 
+    procedure I__Loop_Body
       (I__Low, I__High : Longest_Integer; I__Chunk_Index : Positive) is
     begin
         for I in S'Val (I__Low) .. S'Val (I__High) loop
@@ -346,15 +351,15 @@
 
 package System.Parallelism is
    type Longest_Integer is range System.Min_Int .. System.Max_Int;
-      --  Not worrying about unsigned ranges 
+      --  Not worrying about unsigned ranges
       --  with upper bound > System.Max_Int for now.
-      --  Could be handled by having a version of Par_Range_Loop 
+      --  Could be handled by having a version of Par_Range_Loop
       --  that operates on unsigned integers.
 
-   procedure Par_Range_Loop 
+   procedure Par_Range_Loop
      (Low, High : Longest_Integer;
       Num_Chunks : Positive;
-      Aspects : access Ada.Aspects.Root_Aspect’Class := null; 
+      Aspects : access Ada.Aspects.Root_Aspect'Class := null;
       Loop_Body : access procedure
         (Low, High : Longest_Integer; Chunk_Index : Positive));
 
@@ -373,7 +378,7 @@
     (Iterator : Parallel_Iterator'Class;
      Num_Chunks : Positive;
     Aspects : access Ada.Aspects.Root_Aspect'Class := null;
-     Loop_Body : access procedure 
+     Loop_Body : access procedure
         (Iterator : Parallel_Iterator'Class; Chunk_Index : Positive));
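
 As a usage sketch of Par_Range_Loop (the array A, assumed declared in
 an enclosing scope as an array of Longest_Integer indexed from 1, and
 the per-chunk partial sums are illustrative):

    declare
       Partial : array (1 .. 8) of Longest_Integer := (others => 0);

       procedure Sum_Chunk
         (Low, High : Longest_Integer; Chunk_Index : Positive) is
       begin
          for I in Low .. High loop
             Partial (Chunk_Index) :=
               Partial (Chunk_Index) + A (Integer (I));
          end loop;
       end Sum_Chunk;
    begin
       Par_Range_Loop (Low => 1, High => A'Length,
                       Num_Chunks => 8, Loop_Body => Sum_Chunk'Access);
       --  the total is now the sum of the eight partial sums
    end;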
 
 Questions and Answers about Ada 202X parallelism and mapping to OpenMP
@@ -431,39 +436,39 @@
 "Conflict_Check_Policy":
 
      http://www.ada-auth.org/standards/2xaarm/html/AA-9-10-1.html
-     
-There are three levels: 
+
+There are three levels:
 
-No checks (“No_Conflict_Checks”);
+No checks ("No_Conflict_Checks");
 
-Shared-object checks (“Known_Conflict_Checks”);
-   This is based on the “known to denote the same object” check
+Shared-object checks ("Known_Conflict_Checks");
+   This is based on the "known to denote the same object" check
    performed on OUT parameters of functions.
 
-Synchronized-object checks (“All_Conflict_Checks”).
+Synchronized-object checks ("All_Conflict_Checks").
    This check is based on disallowing any up-level references to
    non-synchronized objects.
 
-Also, there are separate policies for tasking (which defaults to “No
-checks”) and for parallel constructs (which defaults to
-“Synchronized-object checks”).
+Also, there are separate policies for tasking (which defaults to "No
+checks") and for parallel constructs (which defaults to
+"Synchronized-object checks").
 
-One suggestion is to move the more complex “shared-object” checks to an
+One suggestion is to move the more complex "shared-object" checks to an
 annex, and perhaps the tasking-related checks as well.  The goal is to
 preserve some basic, default, safety checks in the core, so that Ada
-programs will by default not become less safe when the word “parallel”
+programs will by default not become less safe when the word "parallel"
 is inserted in front of a loop.  A sophisticated user could decide to
-turn off the checks, or choose the subtler “shared-object” checks, but
-the early user of Ada 202X LWP will not be introducing “silent” bugs by
-adding “parallel.”
+turn off the checks, or choose the subtler "shared-object" checks, but
+the early user of Ada 202X LWP will not be introducing "silent" bugs by
+adding "parallel."
 
 Question: Does OpenMP provide a light-weight-thread scheduler?
 
 Answer: Yes, OpenMP provides a light-weight-thread scheduler, and that
 is the one we are talking about in this document. OpenMP generally
 relies on the underlying operating system to schedule the heavier-weight
-"server threads," sometimes called "kernel threads” that actually
-execute the lighter-weight OpenMP “tasks.”
+"server threads," sometimes called "kernel threads" that actually
+execute the lighter-weight OpenMP "tasks."
 
 Question: Do Ada 202X LWP features easily map to OpenMP without
 requiring a complex executive?
@@ -501,20 +506,20 @@
 have an appendix with more detail, but first is a summary of the LWP
 features of these languages.  Some languages have only (chunked)
 data-parallel features (parallel loop over an array or other homogeneous
-data structure), some have only “fork/join” parallelism, which is good
+data structure), some have only "fork/join" parallelism, which is good
 for irregular computations involving recursive divide-and-conquer
 algorithms, often over tree-like structures.  Several languages have
 both loop/stream-oriented and divide-and-conquer-oriented parallelism.
 
 Go
 
-Since the beginning, Go has had “goroutines” which are syntactic
+Since the beginning, Go has had "goroutines" which are syntactic
 constructs that represent light-weight threads which can be spawned at
-any point by simply writing “go <function call>;”.  This makes an
+any point by simply writing "go <function call>;".  This makes an
 asynchronous function call.  Typically goroutines communicate using
-“channels” which are essentially FIFO queues, with a user-specified
+"channels" which are essentially FIFO queues, with a user-specified
 level of buffering.  Goroutines can be called from a loop, but they
-would typically be considered a variant of “fork/join” parallelism.
+would typically be considered a variant of "fork/join" parallelism.
 
 Go does no safety checking, but using channels for communication is
 always safe. Goroutines are implicitly awaited at certain points, such
@@ -523,14 +528,14 @@
 Rust
 
 Rust originally focused on safe multi-threading, where the threads were
-heavy-weight “kernel” threads.
+heavy-weight "kernel" threads.
 
 More recently Rust introduced a light-weight parallelism library called
-“Rayon” and that seems to be growing in popularity relative to the
+"Rayon" and that seems to be growing in popularity relative to the
 heavy-weight threading.
 
 Rust's safety is based on having no global variables, and a somewhat
-complicated “borrowing” mechanism plus the notion of lifetimes, but the
+complicated "borrowing" mechanism plus the notion of lifetimes, but the
 net result is an ownership-based model with no globals, allowing
 compile-time checks that prevent threads (light- or heavy-weight) from
 having conflicts due to concurrent unsynchronized access to shared data.
@@ -554,7 +559,7 @@
 Java has specific parallelism operations on arrays
 
 Java also has a fork/join library that provides light-weight parallelism
-for “irregular” divide-and-conquer style computations.
+for "irregular" divide-and-conquer style computations.
 
 C#
 
@@ -568,13 +573,14 @@
 
 !wording
 
-** TBD.
+[None proposed, except perhaps for an AARM Implementation note
+referring to the technical report we would produce.]
 
 !discussion
 
 The OpenMP session at ARG meeting #62 showed that implementing the proposed
 parallel constructs of Ada on top of OpenMP is feasible. In many of the
-semantic choices, OpenMP arrived at the same solution that the ARG did 
+semantic choices, OpenMP arrived at the same solution that the ARG did
 (particularly in the case of termination semantics), so the amount of extra
 mapping needed seems fairly minimal.
 
@@ -585,21 +591,21 @@
 OpenMP-specific mechanisms have to be implementation-defined.
 
 Ada does need additional mechanisms for tuning (many of the capabilities of
-OpenMP are aimed at tuning particular parallel code). In the case of 
+OpenMP are aimed at tuning particular parallel code). In the case of
 declarations and overall configuration, aspects and pragmas have the correct
 semantics (including the ability to define implementation-defined aspects
 and pragmas). It is possible that some of these aspects and pragmas can be
-generalized enough to put them into the standard (for instance, the target 
+generalized enough to put them into the standard (for instance, the target
 aspect to specify where a data object is stored could be language-defined
 if we decide to have a standard parallelism library which would necessarily
 include types/objects for defining target capabilities. The details of
 the supported targets would, of course, remain implementation-defined).
 
-We do need a way to specify tuning mechanisms for individual parallel 
+We do need a way to specify tuning mechanisms for individual parallel
 constructs. Chunk_specifications provide this for one such parameter, but
 there are many others that are relevant. We could use pragmas for this
 purpose, but that is not ideal for the same reasons that we have moved away
-from entity-specific pragmas for declarations in favor of aspect 
+from entity-specific pragmas for declarations in favor of aspect
 specifications.
 
 [There's lots more to say here, but your editor is not the person to say it.]
@@ -626,11 +632,11 @@
 
 https://drive.google.com/file/d/156vq44aK2FF60cbd_I8hIOXmqEGjl1H7/view?usp=sharing_eil&amp;ts=5d97f3fe
 
-Here are some slides to help organize our discussion of Ada 202X and OpenMP 
-on Sunday. They are based on slides I presented at Ada-Europe in June, plus 
-some new slides talking about a possible "layered" approach to supporting 
+Here are some slides to help organize our discussion of Ada 202X and OpenMP
+on Sunday. They are based on slides I presented at Ada-Europe in June, plus
+some new slides talking about a possible "layered" approach to supporting
 light-weight parallelism. Note that we are in the middle of a comprehensive
-review of all Ada 202X features, but hopefully our OpenMP discussion can be 
+review of all Ada 202X features, but hopefully our OpenMP discussion can be
 largely independent of the details of that. Looking forward to our discussion
 on Sunday!
 
@@ -642,7 +648,7 @@
 I believe these are the ones I found:
 
 
-Safe Parallelism: Compiler Analysis Techniques for Ada and OpenMP 
+Safe Parallelism: Compiler Analysis Techniques for Ada and OpenMP
 -- http://www.cister.isep.ipp.pt/docs/safe_parallelism__compiler_analysis_techniques_for_ada_and_openmp/1378/view.pdf
 
 
@@ -652,7 +658,7 @@
 
 OpenMP tasking model for Ada: safety and correctness
 
--- ac.at/~blieb/AE2017/presentations/AE_17_v3.pdf 
+-- https://www.auto.tuwien.ac.at/~blieb/AE2017/presentations/AE_17_v3.pdf
 
 
 ****************************************************************
@@ -660,13 +666,13 @@
 From: Tucker Taft
 Sent: Sunday, January 12, 2020  8:35 PM
 
-Here is an extract from a document developed internally to AdaCore as a 
-mapping from Ada 202X to OpenMP.  It has been reviewed by various folks 
-involved with OpenMP, and seems to be consistent with what those attending 
+Here is an extract from a document developed internally to AdaCore as a
+mapping from Ada 202X to OpenMP.  It has been reviewed by various folks
+involved with OpenMP, and seems to be consistent with what those attending
 the International Real-Time Ada Workshop are recommending as an approach.
 
-We need to think about how we might want to refer to this sort of a mapping 
-in the Ada RM, or some other technical specification.  For now, this is 
+We need to think about how we might want to refer to this sort of a mapping
+in the Ada RM, or some other technical specification.  For now, this is
 hopefully useful background reading for such a discussion.
 
 [This is version /01 of the AI - Editor.]
@@ -685,7 +691,7 @@
 seems incomplete.
 
 
-(Also missing full stops/periods after "beginning" and "structures" in the 
+(Also missing full stops/periods after "beginning" and "structures" in the
 Java section).
 
 ****************************************************************
@@ -701,7 +707,7 @@
 
 That is actually intended to be a section heading, not a paragraph... ;-)
 
->(Also missing full stops/periods after "beginning" and "structures" in the 
+>(Also missing full stops/periods after "beginning" and "structures" in the
 >Java section).
 
 OK, thanks.
