Version 1.10 of ai12s/ai12-0171-1.txt

Unformatted version of ai12s/ai12-0171-1.txt version 1.10
Other versions for file ai12s/ai12-0171-1.txt

!standard D.10(11)          16-12-20 AI12-0171-1/03
!class binding interpretation 15-10-05
!status Amendment 1-2012 16-11-10
!status ARG Approved 8-0-1 16-10-09
!status work item 15-10-05
!status received 15-07-10
!priority Low
!difficulty Medium
!qualifier Omission
!subject Ambiguity in Synchronous_Task_Control semantics
!summary
For a given suspension object, it is a bounded error if more than one task calls Suspend_Until_True concurrently.
!question
The following example manages to have two tasks call Suspend_Until_True when the state is certain to be "True."
The RM does not require that Suspend_Until_True is atomic with respect to itself, whereas it does require that for Set_True and Set_False. And the usage scenario that was envisioned when suspension objects were designed was that only a single task would ever call Suspend_Until_True.
Is this example reasonable? (Editor: No.)
Should we tighten up the semantics of suspension objects? (Editor: Probably, but that will eliminate any need for them.)
with Ada.Synchronous_Task_Control; use Ada.Synchronous_Task_Control; with Ada.text_IO; use Ada.text_IO; procedure ATC is thesem: Suspension_Object; task type subworker is end subworker;
task body subworker is begin for I in 1..1000 loop -- seize Suspend_Until_True(thesem);
Put_line("Sub:" & Integer'Image(I));
-- release Set_True(thesem); end loop; end subworker;
type Wref is access subworker; WW: Wref;
begin Set_True(thesem);
WW := new subworker; for I in 1..1000 loop -- seize Suspend_Until_True(thesem);
Put_line("Sub:" & Integer'Image(I));
-- release Set_True(thesem); end loop;
end ATC;
!recommendation
(See Summary.)
!wording
Add the following section after paragraph D.10(11):
Bounded (Run-Time) Errors
It is a bounded error for two or more tasks to call Suspend_Until_True on the same Suspension_Object concurrently. For each task, Program_Error might be raised, the task might proceed without suspending, or the task might suspend, potentially indefinitely. The state of the suspension object might end up either True or False.
!discussion
It is clear that existing implementations take advantage of the lack of atomicity between concurrent calls on Suspend_Until_True. After some discussion at the most recent International Real Time Ada Workshop, it was felt that supporting concurrent calls could add some inefficiency to the Suspension_Object construct, which is primarily of interest because of its simplicity and potential for high-performance implementation.
Here are some comments on the example:
(1) AARM D.10(6.a) implies that design intent was that only a single task ever calls Suspend_Until_True.
(2) The technique shown in the example will work only for exactly two tasks. But exactly two of anything happens very, very rarely in practice -- after all, the fundamental rule of programming language design is that a language feature should be usable zero, one, or many times. Something that is usable exactly twice violates that.
(3) With the introduction of the Max_Entry_Queue_Length aspect (see AI12-0164-1), it is trivial to declare a protected object without a queue. One hopes that implementations will take advantage of this aspect to create lightweight POs when Max_Entry_Queue_Length is set to 1 (after all, this is the primary purpose of the aspect). This means that it ought to be possible to declare a lightweight PO that allows multiple simultaneous calls to Suspend_Until_True. Such a PO could be
protected type SO with Max_Entry_Queue_Length => 1 is procedure Set_True; procedure Set_False; function Current_State return Boolean; entry Suspend_Until_True; private State : Boolean := False; end SO;
protected body SO is procedure Set_True is begin State := True; end Set_True;
procedure Set_False is begin State := False; end Set_False;
function Current_State return Boolean is
begin return State; end Current_State;
entry Suspend_Until_True when State is
begin State := False; end Suspend_Until_True; end SO;
!corrigendum D.10(11)
Insert after the paragraph:
The procedure Suspend_Until_True_And_Set_Deadline blocks the calling task until the state of the object S is True; at that point the task becomes ready with a deadline of Ada.Real_Time.Clock + TS, and the state of the object becomes False. Program_Error is raised upon calling Suspend_Until_True_And_Set_Deadline if another task is already waiting on that suspension object. Suspend_Until_True_And_Set_Deadline is a potentially blocking operation.
the new paragraph:
Bounded (Run-Time) Errors
It is a bounded error for two or more tasks to call Suspend_Until_True on the same Suspension_Object concurrently. For each task, Program_Error might be raised, the task might proceed without suspending, or the task might suspend, potentially indefinitely. The state of the suspension object might end up either True or False.
!ASIS
No ASIS effect.
!ACATS test
The bounded error is not practically testable in an ACATS test (almost anything short of crashing is allowed).
!appendix

From: Tucker Taft
Sent: Friday, July 10, 2015  10:06 AM

AdaCore received the comment shown below from a "GAP" (GNAT Academic Program)
user.  This results in deadlock in at least some GNAT implementations, even
though as written it satisfies the requirements that no two tasks are suspended
at the same time.  The problem seems to be that two tasks are concurrently
calling "Suspend_Until_True."

The RM does *not* require that Suspend_Until_True is atomic with respect to
itself, whereas it does require that for Set_True and Set_False.  And the usage
scenario that was envisioned when suspension objects were designed was that only
a single task would ever call Suspend_Until_True, whereas this example has two
tasks calling Suspend_Until_True, though only when the state is certain to be
"True."

It seems we should clarify the semantics in this case, and either declare it a
bounded error for multiple tasks to call Suspend_Until_True concurrently, or
specify that this scenario should be supported.  I would hope we do *not* choose
to make it erroneous.

----  Comment from GAP user:

It seems to me that package Synchronous_Task_Control should allow to build
simple semaphores like in following example:

with Ada.Synchronous_Task_Control;
use Ada.Synchronous_Task_Control;
with Ada.text_IO; use Ada.text_IO;
procedure ATC is
   thesem: Suspension_Object;
   task type subworker is end subworker;

   task body subworker is
   begin
     for I in 1..1000 loop
        -- seize
        Suspend_Until_True(thesem);

        Put_line("Sub:" & Integer'Image(I));

        -- release
        Set_True(thesem);
     end loop;
   end subworker;

   type Wref is access subworker;
   WW: Wref;

begin
   Set_True(thesem);

   WW := new subworker;
   for I in 1..1000 loop
      -- seize
      Suspend_Until_True(thesem);

      Put_line("Sub:" & Integer'Image(I));

      -- release
      Set_True(thesem);
   end loop;

end ATC;

****************************************************************

From: Randy Brukardt
Sent: Friday, July 10, 2015  4:31 PM

D.10(10) says that "Program_Error is raised upon calling Suspend_Until_True if
another task is already waiting on that suspension object." So if multiple tasks
are calling Suspend_Until_True, one of them is going to fail (have Program_Error
raised) in the general case.

If two tasks happen to call Suspend_Until_True simulateously, it's not clear
what will happen, but that's obviously not something that a programmer can
control, so one has to assume that Program_Error is likely to be raised in that
case.

It would be safe for multiple tasks to call Suspend_Until_True if they somehow
know the value is already True, but what would the point of that be? You might
as well write "null;" in that case.

The OP's program seems to be in this category. It depends on a very precisely
controlled sequence of operations that would be very hard to generate in
practice (that is, with external inputs varying the timing of the worker tasks).
As such, I don't think it is a realistic use of the feature.

So, I suppose that it would make some sense to say that it is a Bounded_Error if
two tasks call Suspend_Until_True simulateously, in case someone is trying to be
overly clever. I don't think it makes sense to make implementers work hard in
this case, since it appears to be a nonsense situation, and the extra locking
needed to make this work would (potentially) defeat the purpose of a suspension
object being as light-weight as possible.

****************************************************************

From: Tucker Taft
Sent: Friday, July 10, 2015  9:32 PM

> ...
> It would be safe for multiple tasks to call Suspend_Until_True if they
> somehow know the value is already True, but what would the point of that be?
> You might as well write "null;" in that case.

The customer's code was using the suspension object as a simple semaphore where
you know there are exactly two tasks contending for the resource.  That doesn't
seem so unreasonable.  The intent was that from the Suspend_Until_True until the
following Set_True the task had exclusive access to some resource.  It seems
reasonable, given the semantics of suspension objects, to imagine this will work
presuming only two tasks are contending for the resource.

> The OP's program seems to be in this category. It depends on a very
> precisely controlled sequence of operations that would be very hard to
> generate in practice (that is, with external inputs varying the timing
> of the worker tasks). As such, I don't think it is a realistic use of
> the feature.

I don't agree.  Once you understand the customer's intent, it seems reasonable.

> So, I suppose that it would make some sense to say that it is a
> Bounded_Error if two tasks call Suspend_Until_True simulateously, in
> case someone is trying to be overly clever. I don't think it makes
> sense to make implementers work hard in this case, since it appears to
> be a nonsense situation, and the extra locking needed to make this
> work would
> (potentially) defeat the purpose of a suspension object being as
> light-weight as possible.

I agree we should probably declare this to be a bounded error, though I am not
completely sure.  I am a little unclear why GNAT has the problem it does with
this code.

****************************************************************

From: Erhard Ploedereder
Sent: Saturday, July 11, 2015  10:21 AM

> It seems we should clarify the semantics in this case, and either
> declare it a bounded error for multiple tasks to call
> Suspend_Until_True concurrently, or specify that this scenario should
> be supported.  I would hope we do *not* choose to make it erroneous.

What bounds on the bounded error, though?
  Program_Error and null semantics are o.k., but undetected deadlock seems
  decidedly unfriendly.

****************************************************************

From: Jean-Pierre Rosen
Sent: Sunday, July 12, 2015  1:15 AM

I don't see any ambiguity here. Moreover, I don't know what "simultaneously"
means...

The only possible interpretation is: one of the tasks passes, and the other one
gets blocked.

The fact that the RM does not explicitely say that it's atomic does not imply
that it is allowed to behave differently from what is specified. This would
require an implementation permission.

****************************************************************

From: Tullio Vardanega
Sent: Sunday, July 12, 2015  9:53 AM

Argh.
How an implementation could get the two tasks in the example to end up behind a
SO set to False is difficult for me to fathom. If preemption kicks in at the
wrong time (or parallel access on a multicore processor) and the
Suspend_Until_True operation is not atomic (which it is not required to be) it
is more likely that the two tasks pass than that they don't. D.10(10) does not
seem to apply to that case, because it talks about "waiting", which technically
is not what happens in the above situation. In fact, D.10(10) suggests very
vigorously that SO are for suspending one task at a time (obviously because SO
involve no queuing), whose request for raising Program_Error would arguably be
made easier if Suspend_Until_True were required to be an atomic operation, in
D.10(7/2).

****************************************************************

From: Tucker Taft
Sent: Sunday, July 12, 2015  3:30 PM

Sounds like we should tighten up the semantics here!

I'm not convinced implementations need all the flexibility that might be implied
by the lack of atomicity of Suspend_Until_True with itself.

****************************************************************

From: Stephen Michell
Sent: Sunday, July 12, 2015  8:44 PM

I agree completely with Jean Pierre here. In fact, the RM (D.10) says "The
operations Set_True and Set_False are atomic with respect to each other and with
respect to Suspend_Until_True”. This means that a call on Suspend_Until_True is
single-threaded with respect to the setting of the suspension variable to True
or False. The fact that an implementation does not do this is a knock on the
implementation, not the RM.

****************************************************************

From: Tucker Taft
Sent: Sunday, July 12, 2015  9:05 PM

The problem is not a race between Set_True/Set_False and Suspend_Until_True.
The problem is a "race" between two different tasks both calling
Suspend_Until_True at the same moment, when the state is know to be True.  The
manual doesn't seem to require that to work, but I am coming to the conclusion
that it probably should.

****************************************************************

From: Randy Brukardt
Sent: Sunday, July 12, 2015  10:38 PM

> I'm not convinced implementations need all the flexibility that might
> be implied by the lack of atomicity of Suspend_Until_True with itself.

Maybe not. But then a suspension object has precisely the same semantics as a
protected object (using the new aspect to eliminate the queue):

    protected type SO with Max_Entry_Queue_Length => 1 is
        procedure Set_True;
        procedure Set_False;
        function Current_State return Boolean;
        entry Suspend_Until_True;
    private
        State : Boolean := False;
    end SO;

The body of Suspend_Until_True has a barrier of "State", of course.

If this is what a suspension object does, then why do we need it at all? Anyone
can write the above (or something more appropriate for their usage). My
understanding was always that a suspension object was more lightweight than a
protected object; thus it needs less specified semantics so that an
implementation can implement it as cheaply as possible.

So, if we're requiring the above (to be implemented without a queue, no big deal
for compilers if it is declared that way), then we should simply say that and
essentially stick Suspension_Objects into Annex J as old, unnecessary junk.

****************************************************************

From: Tucker Taft
Sent: Monday, July 13, 2015  8:06 AM

...
> If this is what a suspension object does, then why do we need it at all?

Because it can be a far simpler implementation, and because it would be horribly
upward incompatible to drop the now.

> Anyone can write the above (or something more appropriate for their usage).
> My understanding was always that a suspension object was more
> lightweight than a protected object; thus it needs less specified
> semantics so that an implementation can implement it as cheaply as possible.

Correct.  The question is whether this particular semantic "liberalization"
actually simplifies the implementation.  I am having trouble understanding how
it helps implementations.

> So, if we're requiring the above (to be implemented without a queue,
> no big deal for compilers if it is declared that way), then we should
> simply say that and essentially stick Suspension_Objects into Annex J
> as old, unnecessary junk.

Let's not go overboard. ;-)

****************************************************************

From: Randy Brukardt
Sent: Sunday, July 12, 2015  1:33 PM

> > If this is what a suspension object does, then why do we need it at all?
>
> Because it can be a far simpler implementation, and because it would
> be horribly upward incompatible to drop the now.

You're not supposed to answer rhetorical questions. :-)

But since you did:

I strongly disagree about the "far simpler implementation". There is no reason
for an Ada compiler to use only a single implementation for all protected
objects. No one does that for arrays, or aggregates, or function returns. So why
would one pessimize the protected object implementation?

In this case, we have a protected object with a queue length of one, and a
single component that could be atomic. There is obviously a very simple
implementation for such an object.

And, of course, we never actually delete anything, we just stick it into Annex J
and forget about it. That's what we should do with Suspension objects. And
replace them by an Implementation Requirement to use a simple implementation for
protected objects that have a single atomic component and a maximum queue length
of one. (And probably an example of the protected object I suggested previously,
so the replacement is obvious.) Indeed, I'd go so far as to junk the current
description of suspension objects and just describe them in terms of the
equivalent protected object (that would answer all questions).

> > Anyone can write the above (or something more appropriate for their usage).
> > My understanding was always that a suspension object was more
> > lightweight than a protected object; thus it needs less specified
> > semantics so that an implementation can implement it as cheaply as possible.
>
> Correct.  The question is whether this particular semantic
> "liberalization" actually simplifies the implementation.  I am having
> trouble understanding how it helps implementations.

You could be right about that. But in that case, we no longer need it at all
(presuming that Max_Entry_Queue_Length gets implemented), because it would be
relatively easy for a compiler to implement all protected objects of that form
with the "much simpler implementation". (You would need an atomic object and a
task ptr for each such object -- for either possible description.) And a
compiler writer (and the programmer as well) is much better off with the effort
spent on that rather than spending any time on a custom implementation of
suspension objects.

> > So, if we're requiring the above (to be implemented without a queue,
> > no big deal for compilers if it is declared that way), then we
> > should simply say that and essentially stick Suspension_Objects into
> > Annex J as old, unnecessary junk.
>
> Let's not go overboard. ;-)

It's not overboard, IMHO. (I recall Geert telling us that GNAT does
optimizations like this for protected objects, and that was without the queue
aspect that makes them trivial.) There would be a lot more value in implementers
improving the implementation of protected objects than in any special case
gizmo. I recall someone telling us that Ada was about a building block approach.
So lets put any effort into the building blocks, not the gizmos.

****************************************************************

From: Jeff Cousins
Sent: Tuesday, January 12, 2016  9:54 AM

Re AI12-0171: "Jeff will talk to Alan about this. Bob will investigate the
effect in GNAT with help from Tucker. Bob announces that there are 15 versions
of the implementation for GNAT for different targets."

I've talked to Alan (who discussed with Andy Wellings).  Their preference would
be that it should handle multiple suspending tasks, as Ravenscar users are
likely to expect it, but it's not a strong preference and they might reconsider
if it had a serious impact on performance.

****************************************************************

From: Tucker Taft
Sent: Tuesday, January 12, 2016  10:00 AM

I have yet to understand an implementation that cannot handle multiple
suspending tasks and still properly implement the rest of the semantics.

****************************************************************

From: Bob Duff
Sent: Tuesday, January 12, 2016  10:24 AM

> Re AI12-0171: "Jeff will talk to Alan about this. Bob will investigate
> the effect in GNAT with help from Tucker. Bob announces that there are
> 15 versions of the implementation for GNAT for different targets."

As I recall, I had no idea how to implement it in GNAT, but Tucker said it's
trivial.  So I think the ball is in Tucker's court, to prove his point.  I'm
not planning to do any more investigating any time soon.

I think it failed on Linux (with tasking built on top of Posix threads), and
we don't know if it fails on any other platforms.

The minutes say, "Bob is concerned about the possibility harming efficiency."
Yes, I probably was -- after all, this feature is intended to be super-simple
in order to be super-efficient.  But I think I was even more concerned about
being able to implement it at all.

> I've talked to Alan (who discussed with Andy Wellings).  Their
> preference would be that it should handle multiple suspending tasks,

I'm not surprised.  ;-)

>... as Ravenscar users are likely to expect it,

But we know that to be false.  Exactly one customer expected it, in the 20
years this bug (if it is a bug) has existed in GNAT.
(I'm pretty sure the it's been that way from the beginning.)

****************************************************************

From: Alan Burns
Sent: Wednesday, January 13, 2016  3:51 AM

>> >... as Ravenscar users are likely to expect it,
> But we know that to be false.  Exactly one customer expected it, in
> the 20 years this bug (if it is a bug) has existed in GNAT.
> (I'm pretty sure the it's been that way from the beginning.)
>

If the 'bug' would only manifest itself if there were parallel (not just
concurrent) calls to Suspend_Until_True then it could easily not be unnoticed
for 20 years!

****************************************************************

From: Randy Brukardt
Sent: Wednesday, January 13, 2016  4:51 PM

> As I recall, I had no idea how to implement it in GNAT, but Tucker
> said it's trivial.  So I think the ball is in Tucker's court, to prove
> his point.  I'm not planning to do any more investigating any time
> soon.
>
> I think it failed on Linux (with tasking built on top of Posix
> threads), and we don't know if it fails on any other platforms.

Tucker could not believe that this implementation (or any implementation)
could fail without having some major bug, such that it doesn't meet one of
the existing requirements. (He's probably right for a bare-machine
implementation, but that's as far as I can say anything useful.) The question
of exactly why the implementation failed was raised, and I think we're going
to need to know that in order to progress this AI.

I'd do that investigation myself, but I know so little about GNAT's runtime
and Posix threads that I'd have no chance of figuring any of it out. It would
take me weeks. I would hope that you or someone else at AdaCore would do that.
The odds that Tucker would do it, are nil, however, because he won't even look
at his homework until he's on the plane to Italy, and I suspect this problem
is going to need a better debugging facility than one would have on a
small-screened laptop.

> The minutes say, "Bob is concerned about the possibility harming
> efficiency."  Yes, I probably was -- after all, this feature is
> intended to be super-simple in order to be super-efficient.  But I
> think I was even more concerned about being able to implement it at
> all.

That seems non-sensical to me. There is an obvious and simple implementation
in terms of protected types (which I showed in my earlier messages on this
topic). Such an implementation is likely to be heavier weight than is wanted
for this purpose, but surely it would do to meet the requirements of the
Standard. So an implementation exists. The only question is whether an "super
lightweight" implementation exists (and the Standard doesn't and can't
*require* that).

Of course, simply *because* this semantics can be implemented with a protected
type, there is no reason to risk making existing implementations more
expensive by adding new requirements. If someone *really* needs suspension
objects that can work with multiple tasks, they can write a protected object
that works that way.

> > I've talked to Alan (who discussed with Andy Wellings).  Their
> > preference would be that it should handle multiple suspending tasks,
>
> I'm not surprised.  ;-)
>
> >... as Ravenscar users are likely to expect it,
>
> But we know that to be false.  Exactly one customer expected it, in
> the 20 years this bug (if it is a bug) has existed in GNAT.
> (I'm pretty sure the it's been that way from the beginning.)

Absolutely. Precisely one person tried to end run the clear intent of the rules
(as explained in the AARM notes). I don't see any reason to change that intent
at this point, even if someone cleverly took advantage of an unclear part of
the rules to do something unexpected. But how do we get Tucker and others to
stand down on this point?? We (the ARG) need your help, I think (at least to
find some other expert at AdaCore to figure out what's going on). Otherwise,
this AI will be permanently on our agenda, which is a situation I'd like to
avoid.

****************************************************************

From: Tucker Taft
Sent: Wednesday, January 13, 2016  5:26 PM

> ...  Precisely one person tried to end run the clear intent of the
> rules (as explained in the AARM notes). ...

We actually don't know that.  What we know is that one person using a
multiprocessor, running on an O/S and with a version of GNAT where this didn't
work, bumped into this problem.  Clearly on an O/S where it works fine or on a
monoprocessor, no one will have bumped into it.  As more embedded-computing
folks move to multiprocessors, this problem may become more common.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, January 13, 2016  6:01 PM

We certainly know that people who try exploit mistakes in the Standard get
what they deserve, and this is a case where the intent is crystal-clear in the
AARM and pretty obvious in the RM. I doubt that there are many such people (at
least intentionally), and I'd suspect that the actual number is one. It's
always possible that someone mistakenly does this on a monoprocessor, but I
would guess that they'd realize it was a bug as soon as it was pointed out to
them.

****************************************************************

From: Tucker Taft
Sent: Tuesday, January 12, 2016  7:04 PM

But Alan doesn't see this as a bug, nor do I.  To me it is a clever and
reasonable use of the primitive.

****************************************************************

From: Bob Duff
Sent: Monday, January 18, 2016  8:42 AM

> If the 'bug' would only manifest itself if there were parallel (not just
> concurrent) calls to Suspend_Until_True then it could easily not be
> unnoticed for 20 years!

Well, multiprocessors have been in common use for at least 10 of those 20
years.

I don't know if the 'bug' manifests on a uniprocessor, and unfortunately I
don't really have time to investigate that right now.

****************************************************************

From: Bob Duff
Sent: Monday, January 18, 2016  8:55 AM

> Tucker could not believe that this implementation (or any
> implementation) could fail without having some major bug, such that it
> doesn't meet one of the existing requirements.

Well, despite what Tucker can or cannot believe, we have an implementation
that DOES fail the test in question without having some (other) major bug.

I looked at the code months ago, and (if I recall correctly) I didn't
understand why it doesn't work.  Tucker's opinion that's it's trivial to
implement efficiently doesn't change that fact.

> > The minutes say, "Bob is concerned about the possibility harming
> > efficiency."  Yes, I probably was -- after all, this feature is
> > intended to be super-simple in order to be super-efficient.  But I
> > think I was even more concerned about being able to implement it at
> > all.
>
> That seems non-sensical to me. There is an obvious and simple
> implementation in terms of protected types...

OK, if that's true (and I'll believe you, because I don't have time to think
about it right now), then it's "just" an efficiency issue. But I'll note that
this feature wouldn't even exist if efficiency weren't the concern -- we'd
just let programmers implement it themselves.

>...We (the ARG) need your help,
> I think (at least to find some other expert at AdaCore to figure out
>what's  going on).

Sorry, but I don't have time to work on this any time soon, and I'm not really
in a position to tell someone else at AdaCore to work on it.

****************************************************************

From: Tucker Taft
Sent: Monday, January 18, 2016  11:24 AM

> Well, despite what Tucker can or cannot believe, we have an
> implementation that DOES fail the test in question without having some
> (other) major bug.
>
> I looked at the code months ago, and (if I recall correctly) I didn't
> understand why it doesn't work.  Tucker's opinion that's it's trivial
> to implement efficiently doesn't change that fact. ...

What I should have said is that I couldn't imagine how a correct
implementation could fail this. That apparently reflects a lack of imagination
or understanding on my part, since as you point out, we have such an
implementation.  But the strange thing is that even when looking at the
implementation it is not obvious how it could fail.  It was certainly not
designed to *intentionally* fail such a test.  Presumably this is some sort of
very subtle caching or out-of-order execution problem.  Which is usually a
safe statement when dealing with multiprocessors!

****************************************************************

From: Randy Brukardt
Sent: Monday, January 18, 2016  3:53 PM

> What I should have said is that I couldn't imagine how a correct
> implementation could fail this.

As far as I know, that's what you said.

> That apparently
> reflects a lack of imagination or understanding on my part, since as
> you point out, we have such an implementation.  But the strange thing
> is that even when looking at the implementation it is not obvious how
> it could fail.  It was certainly not designed to *intentionally* fail
> such a test.
> Presumably this is some sort of very subtle caching or out-of-order
> execution problem.  Which is usually a safe statement when dealing
> with multiprocessors!

And this is the thing we need to understand. If this implementation is in
fact correct, then it is clear evidence that requiring the case in question
requires extra overhead, and that is something we don't want to require in
this case.

But we don't really know if the implementation is correct on the margins.
For that, we need to understand why it fails, so that we can figure out
whether a similar failure could occur when used as intended. (That wouldn't
surprise me, nor would it surprise me if the weird case in fact is a problem
by itself.)

I doubt that we can progress the AI without having that information (since we
have a supposedly correct implementation that fails, and fixing that failure
almost certainly would have a performance cost, and we simply don't want to
pay *any* unnecessary cost in this case). Surely I wouldn't support
"confirming" that multiple tasks can indeed call a suspension object in one
unusual case, unless I was convinced that there was no cost to doing so.

We seem to have an equal number of people who think this weird case should
work (unless the performance cost is unacceptable), and seem to believe that
any correct implementation would work (so there isn't any performance cost).
That way will never lead to a conclusion unless we have some additional
evidence one way or the other.

So how do we progress the AI? Bob seems to flat out refuse to find the answer;
not that I blame him, this is likely going to be hard work. But who is going
to do that relatively hard work??

****************************************************************

From: Bob Duff
Sent: Monday, January 18, 2016  4:32 PM

> So how do we progress the AI?

It's not clear that we can.

>...Bob seems to flat out refuse to find the  answer;

Right.  At least, not during 2016.  I just have way too much other work to do,
that AdaCore deems more important.  And AdaCore is paying me; ARG is not.  ;-)

>... not that I blame him, this is likely going to be hard work.

Right, I already worked on it a little bit, and the failure was mysterious to
me.

>...But who is going to do that relatively hard work??

I fear nobody qualified has the time.  Sorry.  :-(

****************************************************************

From: Tucker Taft
Sent: Monday, January 18, 2016  9:32 PM

> ... So how do we progress the AI? Bob seems to flat out refuse to find
> the answer; not that I blame him, this is likely going to be hard
> work. But who is going to do that relatively hard work??

I took another look at one of our implementations, and I think I understand the
problem. In at least some of our implementations (e.g. VxWorks) we use both a
binary semaphore and a mutex (actually implemented using a binary semaphore) in
the following way:

    type Suspension_Object is record
       Sema  : System.OS_Interface.SEM_ID :=
                 System.OS_Interface.semBCreate
                   (System.OS_Interface.SEM_Q_FIFO,
                    System.OS_Interface.SEM_EMPTY);
       Mutex : System.OS_Interface.SEM_ID :=
                 System.OS_Interface.semBCreate
                   (System.OS_Interface.SEM_Q_FIFO,
                    System.OS_Interface.SEM_FULL);
    end record;

    ...

    procedure Set_False (S : in out Suspension_Object) is
       St : STATUS;
       pragma Unreferenced (St);
    begin
       --  Need to get the semaphore into the "empty" state.
       --  On return, this task will have made the semaphore
       --  empty (St = OK) or have left it empty.

       St := semTake (S.Sema, NO_WAIT);
    end Set_False;

    --------------
    -- Set_True --
    --------------

    procedure Set_True (S : in out Suspension_Object) is
       St : STATUS;
       pragma Unreferenced (St);
    begin
       St := semGive (S.Sema);
    end Set_True;

    ------------------------
    -- Suspend_Until_True --
    ------------------------

    procedure Suspend_Until_True (S : in out Suspension_Object) is
       St : STATUS;
    begin
       --  Determine whether another task is pending on the suspension
       --  object. Should never be called from an ISR. Therefore semTake can
       --  be called on the mutex

       St := semTake (S.Mutex, NO_WAIT);

       if St = OK then
          --  Wait for suspension object

          St := semTake (S.Sema, WAIT_FOREVER);
          St := semGive (S.Mutex);

       else
          --  Another task is pending on the suspension object

          raise Program_Error;
       end if;
    end Suspend_Until_True;

---

Now remember that the way the user was trying to use Suspension_Objects was to
call Set_True at the beginning, and then have two tasks coordinate by using
"Suspend_Until_True" as getting a Lock and "Set_True" as releasing the Lock.

Now if we create the following sequence we can see the above implementation can
produce a "raise Program_Error" with this pattern of use, if there is any
interleaving in the execution of the two Suspend_Until_True's:

   Default_init: S.Sema = 0 (aka empty), S.Mutex count = 1 (aka full)

   Either task: Set_True --  semGive(S.Sema) -- S.Sema count = 1

   Task1: Suspend_Until_True: semTake(S.Mutex, No_Wait), error if fails
                                 -- S.Mutex count = 0
                              semTake(S.Sema, Wait_Forever)
                                 -- S.Sema count = 1, no suspend
                              semGive(S.Mutex)
                                 -- S.Mutex count = 1

   Task2: Suspend_Until_True: semTake(S.Mutex, No_Wait), error if fails
                                 -- S.Mutex count = 0
                              semTake(S.Sema, Wait_Forever)
                                 -- S.Sema count = 1, no suspend
                              semGive(S.Mutex)
                                 -- S.Mutex count = 1


Clearly if Task2 tries to do a semTake on S.Mutex while Task1 holds the S.Mutex,
we will get a Program_Error, even if Task1 does not actually need to suspend.

So it isn't that hard to create this situation after all.

An alternative approach such as the following doesn't always work, because it
uses the Mutex with WAIT_FOREVER in Set_True (Set_False need not change), and
that isn't allowed because Set_True can be called during a protected action,
including from the interrupt level (D.10(11)).  To make this work, the mutex
would have to use a spin lock on a multiprocessor and have a ceiling priority --
i.e. be the kind of mutex used to implement a protected object -- and VxWorks in
particular doesn't provide one of those (at least I couldn't find one).  You
would end up having to play the same sort of tricks we presumably play with
protected objects on VxWorks, where interrupt-level protected objects use a
different mechanism (e.g. hardware priorities) than non-interrupt level
protected objects (which can use VxWorks mutexes).

In any case, here is a mechanism that would work, provided that the mutexes used
spin locks and ceiling priorities.

    procedure Set_True (S : in out Suspension_Object) is
       St : STATUS;
       pragma Unreferenced (St);
    begin
       St := semTake (S.Mutex, WAIT_FOREVER);
       S.Is_Waiting := False;   -- This is a No-Op if no waiting task
       St := semGive (S.Sema);  -- Wake up any waiting task
       St := semGive (S.Mutex);
    end Set_True;

    ...

    procedure Suspend_Until_True (S : in out Suspension_Object) is
       St : STATUS;
    begin
       --  Determine whether another task is pending on the suspension
       --  object. Should never be called from an ISR. Therefore semTake can
       --  be called on the mutex

       St := semTake (S.Mutex, WAIT_FOREVER);
       St := semTake (S.Sema, No_Wait);
       if St = OK then
          --  State was "True" already
          --  Just return after releasing the mutex
          St := semGive (S.Mutex);
          return;
       end if;
       --  State is "False," check if anyone is waiting
       if S.Is_Waiting then
          --  Someone was already waiting, error
          St := semGive (S.Mutex);
          raise Program_Error;
       end if;

       --  Remember that a task is waiting
       S.Is_Waiting := True;
       St := semGive (S.Mutex);

       --  Suspend until a semGive on Sema
       St := semTake (S.Sema, Wait_Forever);

    end Suspend_Until_True;


So hopefully this gives us some idea of the tradeoffs -- perhaps enough to
allow us to consider the AI again.

****************************************************************

From: Randy Brukardt
Sent: Monday, January 18, 2016  10:18 PM

...
> I took another look at one of our implementations, and I think I
> understand the problem.

Thanks Tuck!!!

<<Lengthy description of the problem>>

...
> Clearly if Task2 tries to do a semTake on S.Mutex while Task1 holds
> the S.Mutex, we will get a Program_Error, even if Task1 does not
> actually need to suspend.
>
> So it isn't that hard to create this situation after all.

And it's pretty clear that this implementation does meet D.10(7./2), since the
Set_False and Set_True operations just use a single mutex operation (which I
presume is atomic, else nothing makes sense).

So there clearly is some value to the rules as intended; one can use a simpler
implementation for Suspend_Until_True if one doesn't require it to be atomic
itself and assume more than one task calling it is an error.

It's of course possible to implement a Suspension_Object in terms of a protected
object (or, in as you suggest, terms of primitives that act like a protected
object), but that seems like a form of abstraction inversion -- especially if
the target doesn't directly support those primitives (as on VxWorks).

> So hopefully this gives us some idea of the tradeoffs -- perhaps
> enough to allow us to consider the AI again.

Yup. It seems I was right all along. ;-)

I guess that leaves us with the problem of properly defining what happens when
multiple tasks call Suspend_Until_True. It seems we don't want to allow it, but
we wouldn't want any overhead to check that. A Bounded Error seems right, but
what are the bounds? I'd hate to make it erroneous.

****************************************************************

From: Tucker Taft
Sent: Monday, January 18, 2016  11:10 PM

> ... I guess that leaves us with the problem of properly defining what
> happens when multiple tasks call Suspend_Until_True. It seems we don't
> want to allow it, but we wouldn't want any overhead to check that. A
> Bounded Error seems right, but what are the bounds? I'd hate to make it
> erroneous.

The bounds are either it raises Program_Error, or it works as our poor user
hoped it would.

By the way, we could make it much more *likely* that it would work as our user
had hoped by inserting the following at the beginning of Suspend_Until_True:

     St := semTake(S.Sema, No_Wait);
     if St = OK then
        return;
     end if;

This would handle the case where the State is already True, without involving
the mutex. The only way things would fail would be if the "Set_True" happened
after this first call on SemTake, but before the second call.  In fact I believe
this change would always correctly handle the particular use-case identified in
the AI, because the Set_True is being performed by one of the two tasks involved
in the race, so you would never get a Program_Error due to both tasks going
after the same mutex at the same time.

Hmmm..., that might allow us to narrow the restriction ... e.g. two tasks must
not be calling Suspend_Until_True concurrently if a third task could be calling
Set_True.

****************************************************************

From: Tucker Taft
Sent: Monday, January 18, 2016  11:27 PM

> Hmmm..., that might allow us to narrow the restriction ... e.g. two
> tasks must not be calling Suspend_Until_True concurrently if a third task
> could be calling Set_True.

A better way of saying it might be:  It is a bounded error if the State is False
and two tasks are calling Suspend_Until_True concurrently, even if the State
becomes True before either of them blocks.  Program_Error might be raised in
this case even though only one ultimately would have blocked.

This means that so long as you know the State is True on entry to
Suspend_Until_True for at least one of the tasks, you are fine.

****************************************************************

From: Bob Duff
Sent: Tuesday, January 19, 2016  6:39 AM

> I took another look at one of our implementations, and I think I understand
> the problem.

Thanks.

> In at least some of our implementations (e.g. VxWorks) ...

What about Linux?  I think the original bug report was on Linux.

****************************************************************

From: Tucker Taft
Sent: Tuesday, January 19, 2016  9:45 AM

I believe our default implementation on Linux uses mutexes in Set_True, and
doesn't seem to have any obvious race conditions.  The original bug report was
actually on Mac OSX (O708-005), but you indicated you were able to reproduce it
on Linux.  Pat Rogers said that we didn't have full support for Symmetric Multi
Processing (SMP) in our critical sections yet.  Not quite sure what that means
in the context of Linux or Mac OSX, which both have had SMP for many years.

Here is the platform info for the original bug report:

Platform: x86-64 Mac OS X (64 bits)
Version: GNATMAKE GPL 2015 (20150428-49)

****************************************************************

From: Bob Duff
Sent: Tuesday, January 19, 2016  10:00 PM

> I believe our default implementation on Linux uses mutexes in
> Set_True, and doesn't seem to have any obvious race conditions.

That's the only one I looked at carefully (IIRC), and I didn't see any obvious
race conditions, either.

>...The original bug report was actually on Mac OSX  (O708-005), but you
>indicated you were able to reproduce it on Linux.

Ah, OK, I had forgotten that.  I always try to reproduce bugs on Linux, because
that's what I have at hand.

****************************************************************

From: Tucker Taft
Sent: Tuesday, January 19, 2016  10:11 AM

> That's the only one I looked at carefully (IIRC), and I didn't see any
> obvious race conditions, either. ...

Pat Rogers mentioning something about incomplete SMP implementation makes me
think that perhaps the mutexes we are using are not guaranteed to work in a
multiprocessor environment, or we aren't doing the appropriate amount of cache
flushing and "fencing," so that perhaps the S.Waiting flag isn't being properly
updated across processors.

****************************************************************

From: Arnaud Charlet
Sent: Tuesday, January 19, 2016  10:22 PM

Pat was talking about our bare board implementation, not our POSIX/Linux/Darwin
implementation, so that's unrelated to this discussion as far as I know.

****************************************************************

From: Stephen Michell
Sent: Tuesday, January 19, 2016  9:25 AM

>I took another look at one of our implementations, and I think I understand
>the problem. ...

That is because the algorithm is wrong. What has been constructed is 1/2 of a
monitor. The Suspend_Until_True sets the monitor lock and suspends itself, but
the monitor lock then is not opened once it is suspended. The Set_True function
and the Set_False functions ignore the monitor lock, which is not allowed. You
either must use only a single semaphore or build a complete functioning monitor.

****************************************************************

From: Tucker Taft
Sent: Tuesday, February 9, 2016  10:56 AM

> A better way of saying it might be:  It is a bounded error if the
> State is False and two tasks are calling Suspend_Until_True
> concurrently, even if the State becomes True before either of them
> blocks.  Program_Error might be raised in this case even though only one
> ultimately would have blocked.

Or both might be blocked, I suppose.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, February 9, 2016  6:32 PM

> Another item for the Pisa agenda?

Of course, it's an open AI. It would be on the agenda even if neither you nor
Bob had provided any additional information.

Do you have a specific wording proposal (in particular, where does the wording
go?). Perhaps you should just spend 10 minutes and update the AI (it's assigned
to you and Bob anyway). [As always, it's easier to have a wording proposal and
then decide not to use it rather than not having a wording proposal and then
having to decide use something nebulous.]

****************************************************************

From: Alan Burns
Sent: Wednesday, February 10, 2016  2:51 AM

Would ARG like the IRTAW (which meets in April) to discuss both Synchronous Task
Control and Synchronous Barriers?

****************************************************************

From: Tucker Taft
Sent: Wednesday, February 10, 2016  7:02 AM

Yes, please!

****************************************************************

From: Jeff Cousins
Sent: Wednesday, February 10, 2016  6:56 AM

Yes please Alan, we do seem to spend some time trying to second guess exactly
what the IRTAW would like.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 10, 2016  2:05 PM

I agree here as well. I'd also like some idea of how IRTAW plans to extend
Ravenscar in the future. It seems to me that the two goals ("small, simple
runtime" and "analyzability") are diverging, since with more powerful computers
and techniques, more can be analyzed all the time, but what can be put into a
small, simple runtime changes little over time. That's especially true for bare
machine systems (I always think of our MS-DOS tasking system as a bare machine,
since MS-DOS provided little other than I/O - we did everything else ourselves
-- so I consider my expertise to be in the bare machine area). Clearly, in a
system where the runtime sits on top of some other OS (be it Windows or Posix or
some real time kernel), there's little advantage to the runtime to excluding
features that are provided by the underlying system anyway. So if that exclusion
isn't needed for analysis reasons, it's just in the way. OTOH, on a bare
machine, every new capability means more work and complication (and often
verification effort). I suspect that these diverging goals need separate
profiles going forward, but of course I know just enough to cause trouble. :-)

****************************************************************

From: Jeff Cousins
Sent: Wednesday, May 18, 2016  1:58 PM

Alan - what was the outcome of the IRTAW discussions on this?

****************************************************************

From: Alan Burns
Sent: Friday, May 20, 2016  2:33 AM

Yes IRTAW did discuss these issue at length, sorry for being slow in
reporting.

The view of the IRTAW was that

1) The definition of Synchronous task control should not be changed;
concurrent calls to \texttt{Suspend\_Until\_True} are not defined to be
atomic and hence suspension objects cannot be shared between tasks

2) Synchronous Barriers should not be included in the Ravenscar profile.
The resistance to changes to Ravenscar is high, and there are work rounds.

IRTAW also discussed an AI for more effective EDF task scheduling, and an
AI for a profile that is more expressive than Ravenscar. These will be
forwarded to the ARG.

****************************************************************

From: Tucker Taft
Sent: Saturday, June 4, 2016  8:12 PM

> The view of the IRTAW was that
>
> 1) The definition of Synchronous task control should not be changed; 
> concurrent calls to \texttt{Suspend\_Until\_True} are not defined to 
> be atomic and hence suspension objects cannot be shared between tasks

I would recommend we emphasize this with some additional wording, because as
written it is easy to become confused about whether a suspension object can
be used as a simple mutex.

Bob, are you planning to update the wording to AI12-0171?  It might be nice
to make it a bounded error to have two tasks contending on Suspend_Until_True,
and allow implementations to raise Program_Error if they detect the situation.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, June 7, 2016 12:15 AM

...
> > The view of the IRTAW was that
> >
> > 1) The definition of Synchronous task control should not be changed; 
> > concurrent calls to \texttt{Suspend\_Until\_True} are not defined to 
> > be atomic and hence suspension objects cannot be shared between 
> > tasks
> 
> I would recommend we emphasize this with some additional wording, 
> because as written it is easy to become confused about whether a 
> suspension object can be used as a simple mutex.

I agree, it should be clear whatever we decide.

> Bob, are you planning to update the wording to AI12-0171?  It might be 
> nice to make it a bounded error to have two tasks contending on 
> Suspend_Until_True, and allow implementations to raise Program_Error 
> if they detect the situation.

I'd be careful about that. The usual "works or raises Program_Error" bounded
error would seem to be adding overhead here, precisely what we don't want to
do. (We might as well just require it to work in that case.) If we say it is
erroneous (it probably is already - bad use of shared variables), then an
implementation can do anything they want, which surely includes raising an
exception. And there surely is no such thing as a bounded error where the
choices are "erroneous or Program_Error", 'cause that's the same as erroneous
by itself.

****************************************************************

From: Tucker Taft
Sent: Monday, October 3, 2016  3:21 PM

Here is some proposed wording about what happens if two or more tasks attempt
to call Suspend_Until_True on the same suspension object.  I put it in a
Bounded Errors section, and tried to allow for pretty much anything happening.
[This is version /01 of the AI - Editor.]

****************************************************************

From: Randy Brukardt
Sent: Monday, October 3, 2016  5:12 PM

Thanks. I found it interesting that the "Editor's ramblings" became "some
comments". :-) I updated them a bit, adding a new item to mention that AARM
D.10(6.a) seems to indicate that the original intent is that only one task
call Suspend_Until_True, and simplifying the last one a bit to make it more
relevant. (I.e. one could use a PO like this if you wanted to have a
suspension-like object that did allow multiple callers.)

I also added some text to the ACATS test section (to say it's not testable).

****************************************************************

From: Jean-Pierre Rosen
Sent: Monday, October 3, 2016 11:50 PM

...
>       entry Suspend_Until_True when State is
> 	begin
> 	    null;
            State := False;
> 	end Suspend_Until_True;
>    end SO;

****************************************************************


Questions? Ask the ACAA Technical Agent