CVS difference for ai12s/ai12-0242-1.txt

Differences between 1.3 and version 1.4
Log of other versions for file ai12s/ai12-0242-1.txt

--- ai12s/ai12-0242-1.txt	2018/01/25 07:57:46	1.3
+++ ai12s/ai12-0242-1.txt	2018/01/27 04:54:51	1.4
@@ -5107,7 +5107,9 @@
 > As the complexity of a loop decreases, it is likely better and easier
 > to use a 'Reduce call.
 
-Makes sense, but I at least rarely write such loops (outside of string processing, which wouldn't have enough iterations to be worth parallelizing, and which generally aren't reductions anyway). YMMV.
+Makes sense, but I at least rarely write such loops (outside of string
+processing, which wouldn't have enough iterations to be worth parallelizing,
+and which generally aren't reductions anyway). YMMV.
 
 ****************************************************************
 
@@ -5149,7 +5151,9 @@
 > issues separately. (That would at least let us make some progress.)
 
 I think that's a good idea for now, for that AI.
-I'd probably still try to create a separate AI on Associative, Reducer aspects, as that would be needed for the 'Reduce AI, but would be better to separate that part out.
+I'd probably still try to create a separate AI on Associative, Reducer aspects,
+as that would be needed for the 'Reduce AI, but would be better to separate
+that part out.
 
 > I would like to have a way to
 > tell the compiler whether to parallelize (or not) a reduction
@@ -5247,6 +5251,176 @@
 but note that while this is not commutative, it is associative, and thus is a
 candidate for parallelisation. The intent of the proposal is to support both
 commutative and non-commutative reducer functions.
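 
 As a concrete illustration (not part of the proposal text): concatenation
 with "&" is exactly such a reducer -- associative but not commutative --
 so chunks can be reduced independently and the partial results then
 joined in order:
 
    --  Hypothetical sketch; the name Name_List is invented here.
    --  ("A" & "B") & "C" = "A" & ("B" & "C"), but "A" & "B" /= "B" & "A".
    Full_Name : constant String := Name_List'Reduce ("&", "");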
+
+****************************************************************
+
+From: Brad Moore
+Sent: Tuesday, January 16, 2018  9:44 AM
+
+> ...
+>> As far as explicit parallelism, I now believe the right way to handle 
+>> reductions in that context is to provide a chunk index to a parallel 
+>> loop, vaguely analogous to an entry family index. ...
+[See AI12-0251-1 for this thread.]
+
+I'm also thinking we might want to provide explicit parallelism via an attribute.
+
+We could have:
+
+   'Reduce
+
+  which defaults to sequential semantics, but could implicitly
+  inject parallelism if the global checks and nonblocking checks
+  pass, appropriate reducers are identified, and the implementation
+  can tell that this would benefit performance. An implementation
+  could opt to skip all these checks and provide a simpler sequential
+  implementation, particularly as in many cases it is not easy to determine
+  whether the overhead of introducing parallelism is worthwhile.
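+
+  A hypothetical use might look like the following (a sketch only; the
+  exact syntax and the name Data are illustrative, not settled):
+
+     --  Sum an array; the compiler may run this sequentially, or may
+     --  inject parallelism if the checks pass and it judges it worthwhile.
+     Total : constant Integer := Data'Reduce ("+", 0);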
+
+and
+
+   'Parallel_Reduce
+
+   which is the explicit form: it applies the global and nonblocking
+   checks, and ensures that reducers are defined. The compiler rejects
+   the compilation if these checks do not all pass.
+
+   This way, the code documents the intent for parallelism, and also
+   makes it clearer to the programmer that parallelism is being applied.
+   And if the programmer wants to compare with sequential performance,
+   or cannot make the checks pass, all he needs to do is take off the
+   "Parallel_" from the attribute reference.
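+
+   For example (an illustrative sketch only; Data is an invented name):
+
+      --  Explicit form: rejected at compile time unless the global and
+      --  nonblocking checks pass and a reducer is defined.
+      Total : constant Integer := Data'Parallel_Reduce ("+", 0);
+
+      --  Dropping "Parallel_" falls back to sequential semantics:
+      Total_Seq : constant Integer := Data'Reduce ("+", 0);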
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Tuesday, January 16, 2018  10:51 AM
+
+We talked about this in Vienna (or was it Lexington), and decided to leave out
+an explicit "parallel" from the reduction expression.  If the compiler has to
+be smart enough to check whether parallelism would be legal, it is smart enough
+to insert it.  The same thing would apply to quantified expressions as well,
+and eventually container aggregates, etc.  For these kinds of "self-contained"
+expressions, I don't think we should create explicitly parallel versions, as
+it just adds complexity -- they should be seen as inherently parallel, with
+the compiler deciding when it is worthwhile to make them actually parallel.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Tuesday, January 16, 2018  4:19 PM
+
+> We talked about this in Vienna (or was it Lexington), and decided to 
+> leave out an explicit "parallel" from the reduction expression.
+
+My recollection of this discussion (in Lexington) was a bit different. :-)
+
+You pretty much announced that we don't need "parallel" in expression contexts.
+Brad objected, but you and Raphael argued that tuning is not portable.
+
+Somewhat later, Raphael complained that there isn't any way for the user to
+tell if a reduction expression is executed in parallel. No one responded to
+that.
+
+For myself, this was a complete surprise and I didn't object at least in part
+because I hadn't even thought about the possibility. Now that I have, I
+strongly agree with Brad and Raphael's objections, and have more on top.
+
+> If the compiler has to be smart enough to check whether parallelism 
+> would be legal, it is smart enough to insert it. The same thing would 
+> apply to quantified expressions as well, and eventually container 
+> aggregates, etc.  For these kinds of "self-contained"
+> expressions, I don't think we should create explicitly parallel 
+> versions, as it just adds complexity -- they should be seen as 
+> inherently parallel, with the compiler deciding when it is worth to 
+> make them actually parallel.
+
+Ada always allows as-if optimizations, and surely if the compiler can prove
+that executing in parallel has no impact on the canonical results, then it can
+execute anything in parallel. (That is enabled because the order of evaluation
+of many things is unspecified. There is nothing special about a reduction in
+this sense -- it applies to any aggregate, evaluation of parameters, and most
+other expression parts.)
+
+The problem, though, with as-if optimizations is that one has to be
+conservative with them. One never wants an as-if optimization to run slower
+than the original code.
+
+That's a major issue for parallel execution, as the overhead of parallel
+execution is going to be high on common desktop operating systems. That's
+especially true since the Meltdown flaw, as the mitigation is for all system
+calls to remap the address space. At least one system call will be needed to
+start and stop a tasklet, as the bare machine approach of letting it busy wait
+would be very unfriendly (burning CPU at all times).
+
+Combining these two things, parallelization can only be automatically applied
+when it is known that there are enough iterations and cost to each iteration
+to make the savings be more than the overhead. But that means that only
+reducers with static bounds and known bodies (either fully visible expressions
+or known expression functions) and that can beat the overhead should be
+parallelized. If one doesn't know the number of iterations, or the amount of
+code, then one can't parallelize, as there is a significant risk that the
+performance would be far worse. Obviously, interpreters and even just-in-time
+compilers can do better, but Ada is usually implemented with code generation
+long before execution (especially necessary for embedded systems).
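+
+(To sketch the break-even arithmetic, with invented names: splitting the
+work into Num_Chunks tasklets pays off only when
+
+   Iterations * Cost_Per_Iteration
+     > (Iterations * Cost_Per_Iteration) / Num_Chunks
+         + Tasklet_Overhead * Num_Chunks
+
+and a compiler can rarely bound Iterations or Cost_Per_Iteration
+statically at code-generation time.)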
+
+As such, automatic parallelization will be available only very rarely; probably
+rarely enough to not make it worth the effort to even implement.
+(Interestingly, it is much easier to tell when parallelization will do no good,
+as simple short loops are quite common.) However, the programmer knows (or at
+least suspects) when parallelization would improve the performance. They need
+to be able to specify that they want parallelization for every construct for
+which it makes sense (and that surely includes reductions).
+
+****************************************************************
+
+From: Jean-Pierre Rosen
+Sent: Tuesday, January 16, 2018  11:41 PM
+
+> Combining these two things, parallelization can only be automatically 
+> applied when it is known that there are enough iterations and cost to 
+> each iteration to make the savings be more than the overhead.
+I would add:
+
+and that the gain is significantly higher than what can be achieved with
+regular tasks.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Wednesday, January 17, 2018  12:07 AM
+
+> For myself, this was a complete surprise and I didn't object at least 
+> in part because I hadn't even thought about the possibility. Now that 
+> I have, I strongly agree with Brad and Raphael's objections, and have more
+> on top.
+
+I'd like to add to what Randy said, which happens to very much echo my
+own views.
+
+Back in Vienna and Lexington, though, the situation was quite different from
+what we have now.
+
+Back then, we were talking about reduction expressions, which looked a lot like
+quantified expressions. We were talking about adding the parallel keyword to
+that syntax, which did not fit in very well in my view and looked quite messy.
+So, I was more able to understand why we didn't want to have the explicit
+parallel keyword with reduction expressions, even if I was reluctant to go
+that way.
+
+Since then, we came up with the 'Reduce attribute idea. To have another
+attribute such as 'Parallel_Reduce fits quite a lot more nicely into this
+scheme. There is no additional keyword needed. It's just a substitution of a
+different attribute name. 
+
+I note also that the explicit-parallelism proposal you made the
+other day relies on using 'Reduce calls, which is implicit parallelism.
+If the goal of that proposal is to provide a way to indicate explicit
+parallelism, shouldn't there also be a way to explicitly indicate that the
+'Reduce calls associated with that syntax are also to be executed with
+parallelism? In a lot of cases, parallelism isn't wanted, but other times it
+would be.
 
 ****************************************************************
 

Questions? Ask the ACAA Technical Agent