!standard D.15(15/2)                                08-07-07  AI05-0094-1/02
!class binding interpretation 08-05-16
!status work item 08-05-16
!status received 06-03-28
!priority Medium
!difficulty Medium
!qualifier Error
!subject Timing_Events should not require deadlock

!summary

D.15(15/2) should only require that the handler is executed as soon as
possible.

!question

There seems to be a nasty bug in the rules for Ada.Real_Time.Timing_Events.
D.15(15/2) says:

    15/2 {AI95-00297-01} If a procedure Set_Handler is called with zero or
    negative In_Time or with At_Time indicating a time in the past then the
    handler is executed immediately by the task executing the call of
    Set_Handler. The timing event Event is cleared.

"The handler is executed" means that Handler.all is called in the normal way
for a protected procedure, locking the protected object. The problem occurs if
the task in question is already inside that same protected object. Deadlock or
other bad behavior is required by the above paragraph. The "task executing the
call of Set_Handler" is exactly the _wrong_ task to be calling the handler.

The scenario is:

    T : constant Time_Span := Zilliseconds (10); -- some small amount of time

    protected body ... is

        procedure Handler (Event : in out Timing_Event) is
        begin
            Set_Handler (Event, At_Time => Clock + T, Handler => Handler'Access);
            ...
        end Handler;

In this example, we get a flaky deadlock. Perhaps it works most of the time,
but if some unrelated process steals a little time, such that Clock + T has
passed by the time Set_Handler does its thing, it deadlocks.

How should this be fixed?

!wording

Modify D.15(15/2) as follows:

    If a procedure Set_Handler is called with zero or negative In_Time or with
    At_Time indicating a time in the past then the handler is executed {as soon
    as possible after the completion of}[immediately by the task executing] the
    call of Set_Handler. [The timing event Event is cleared.]

AARM Ramification: The handler will still be executed.
Under no circumstances is a scheduled call of a handler lost.

AARM discussion: We say "as soon as possible" so that we do not deadlock if we
are executing the handler when Set_Handler is called. In that case, the current
invocation of the handler must complete before the new handler can start
executing.

!discussion

Avoiding the loss of events is important to eliminate the need to program
around race conditions. Otherwise, if something takes longer than expected, it
might set an event with a time in the past, and then it would be necessary to
be able to continue working even if some events are lost. That would complicate
programs for no good reason.

We delete "The timing event is cleared." from D.15(15/2), as the execution of
the handler already clears the event (D.15(13/2)). And we don't want the event
cleared before we start executing the handler (otherwise we again would be at
risk of losing events).

!corrigendum D.15(15/2)

@drepl
If a procedure Set_Handler is called with zero or negative In_Time or with
At_Time indicating a time in the past then the handler is executed immediately
by the task executing the call of Set_Handler. The timing event Event is
cleared.
@dby
If a procedure Set_Handler is called with zero or negative In_Time or with
At_Time indicating a time in the past then the handler is executed as soon as
possible after the completion of the call of Set_Handler.

!ACATS Test

The ACATS tests for this feature should be adjusted to follow this semantics
change; no further tests should be needed.

!appendix

From: Robert A. Duff
Sent: Friday, March 28, 2008  8:02 PM

There seems to be a nasty bug in the rules for Ada.Real_Time.Timing_Events.
D.15(15/2) says:

    15/2 {AI95-00297-01} If a procedure Set_Handler is called with zero or
    negative In_Time or with At_Time indicating a time in the past then the
    handler is executed immediately by the task executing the call of
    Set_Handler. The timing event Event is cleared.

I presume "the handler is executed" means that Handler.all is called in the
normal way for a protected procedure, locking the protected object. The problem
occurs if the task in question is already inside that same protected object.
Deadlock or other bad behavior is required by the above paragraph. The "task
executing the call of Set_Handler" is exactly the _wrong_ task to be calling
the handler.

A NOTE indicates that such a scenario makes sense:

    26/2 45 {AI95-00297-01} Since a call of Set_Handler is not a potentially
    blocking operation, it can be called from within a handler.

...and in particular it can be called within the _same_ handler.
AI95-00297-01 has a couple of examples that call Set_Handler from within the
protected object of the handler.

The scenario is:

    T : constant Time_Span := Zilliseconds (10); -- some small amount of time

    protected body ... is

        procedure Handler (Event : in out Timing_Event) is
        begin
            Set_Handler (Event, At_Time => Clock + T, Handler => Handler'Access);
            ...
        end Handler;

We get a flaky deadlock. Perhaps it works most of the time, but if some
unrelated process steals a little time, such that Clock + T has passed by the
time Set_Handler does its thing, it deadlocks.

I think the solution is to delete the D.15(15/2), and rely on:

    13/2 {AI95-00297-01} As soon as possible after the time set for the event,
    the handler is executed, passing the event as parameter. ...

in all cases.

By the way, this issue comes from real customer code. It took me most of the
day to debug it. The customer reported that it ran fine for hours on an
unloaded system, but on a heavily loaded system, it would sometimes hang.
Yikes!

****************************************************************

From: Robert A. Duff
Sent: Sunday, March 30, 2008  2:26 PM

Robert A. Duff writes:
> There seems to be a nasty bug in the rules for Ada.Real_Time.Timing_Events.

I had a discussion with Ed about this, and he asked me to forward it here.
Here are the relevant excerpts:

From: Bob Duff

Edmond Schonberg wrote:

> Can't we recognize in that case that this is an internal call? We
> must be able to query the identity of the current protected object and
> compare it with the target. We would have to generate code for this,
> one branch calling the body of the unprotected operation, and the
> other making the standard external call. ????

I thought about that, but it seems inappropriate. First, this would be the only
place where Ada does "nested locking" (i.e. lock-if-not-already-locked).
Second, it would require searching a set of PO's currently locked (we could be
inside more than one). And we might be in some procedure called from a PO -- we
don't statically know whether we're inside a PO.

It all seems way too heavy for a language feature described like this:

    1/2 {AI95-00297-01} This clause describes a language-defined package to
    allow user-defined protected procedures to be executed at a specified time
    WITHOUT the need for a TASK OR A DELAY statement.
    ...
                             Implementation Advice

    25/2 {AI95-00297-01} The protected handler procedure should be executed
    DIRECTLY by the real-time clock INTERRUPT mechanism.

(emphasis added). I mean, if you want "heavy", something like "loop ... delay
...; ... end loop" seems more appropriate.

Anyway, what's the point of the requirement to do it "immediately" and "by the
same task"? I say, erase that requirement, since we already have a requirement
"as soon as possible".

From schonberg@adacore.com Sun Mar 30 11:22:49 2008

On Mar 30, 2008, at 11:11 AM, Bob Duff wrote:

> Edmond Schonberg wrote:
...
>                          Implementation Advice
> 25/2 {AI95-00297-01} The protected handler procedure should be
> executed
> DIRECTLY by the real-time clock INTERRUPT mechanism.

But this would indicate that there is no locking involved, precisely:
just go and do it, this is urgent, no?

...
> P.S. Did you see my message to ARG about it? I'm not sure it got
> through...

I saw it, and it seemed reasonable at the time, but rereading your message I
had the impression that if we can determine that it is an internal call there
is no additional locking. If it is an external call it's potentially blocking
and we certainly don't have to do anything special given that it's wrong.

From duff@adacore.com Sun Mar 30 12:25:35 2008

Edmond Schonberg wrote:

> But this would indicate that there is no locking involved, precisely:
> just go and do it, this is urgent, no?

No, not if by "interrupt" we mean the model described in the SP Annex
(attaching protected procedures to interrupts and so forth). That analogy seems
apt.

If an interrupt occurs while the interrupt handler is running, then it does not
immediately cause the handler to start running again reentrantly. Instead,
either the interrupt is lost, or the handler is triggered when the current
invocation of the handler finishes. It's just like a normal protected object,
except this level of "locking" happens in hardware (and the hardware is allowed
to lose interrupts).

Seems like we want the same semantics for the timing event handlers.

By the way, the "bug" (or feature?) that I "fixed" is on our Linux version (and
the same is used on most non-embedded systems). It makes no attempt to properly
implement the intended real-time semantics, and is far from "directly" attached
to interrupts. I suppose the MaRTE version may try to do it "right".

> I saw it, ...

OK, good.

>...and it seemed reasonable at the time, but rereading your message I
>had the impression that if we can determine that it is an internal
>call there is no additional locking. If it is an external call it's
>potentially blocking and we certainly don't have to do anything
>special given that it's wrong.

But Ada always distinguishes internal vs. external statically. I object to
making that distinction at run time in one corner of the language, while the
rest of the language is unchanged.

Formally, the call to the handler (the one I deleted) is always an external
call, since it is indirect -- Handler.all(Event) -- and indirect calls clearly
need to be considered external, in general.

P.S. Perhaps we should have this discussion on the arg list?

From schonberg@adacore.com Sun Mar 30 13:09:54 2008

On Mar 30, 2008, at 12:25 PM, Bob Duff wrote:

> Edmond Schonberg wrote:
>
>> But this would indicate that there is no locking involved, precisely:
>> just go and do it, this is urgent, no?
>
> No, not if by "interrupt" we mean the model described in the SP Annex
> (attaching protected procedures to interrupts and so forth). That
> analogy seems apt.
>
> If an interrupt occurs while the interrupt handler is running, then it
> does not immediately cause the handler to start running again
> reentrantly. Instead, either the interrupt is lost, or the handler is
> triggered when the current invocation of the handler finishes. It's
> just like a normal protected object, except this level of "locking"
> happens in hardware (and the hardware is allowed to lose interrupts).
>
> Seems like we want the same semantics for the timing event handlers.

Not so sure. The description is that the completion of the Attach action is
the invocation of the handler. It's not two separate actions (if I understand
the description properly, which I guess is the issue!).

>> ...and it seemed reasonable at the time, but rereading your message
>> I had the impression that if we can determine that it is an internal
>> call there is no additional locking. If it is an external call it's
>> potentially blocking and we certainly don't have to do anything
>> special given that it's wrong.
>
> But Ada always distinguishes internal vs. external statically.
> I object to making that distinction at run time in one corner of the
> language, while the rest of the language is unchanged.

Agreed, unless it's simple to do. And annex D is a corner of the language with
lots of special characteristics.

...
> P.S. Perhaps we should have this discussion on the arg list?

Yes, B&W's opinion would be very useful here. Can you forward?

****************************************************************

From: Robert A. Duff
Sent: Sunday, March 30, 2008  2:36 PM

If we have:

    protected body ... is

        procedure Handler (Event : in out Timing_Event) is
        begin
            Do_This;
            Set_Handler (Event, At_Time => TT, Handler => Handler'Access);
            Do_That;
        end Handler;

If time TT is during Do_This, or during Do_That, or during Set_Handler, I claim
we want the same behavior in all these cases: as soon as Handler is done,
Handler should be executed again. But certainly not in the middle of Handler --
that would defeat the purpose of "protected" objects.

Furthermore, the task performing the next call to Handler need not be the same
task that calls it this time. I don't understand why the RM talks about which
task, in a particular case, but leaves the task (if any!) unspecified in other
cases.

Another possibility, given that the Impl Advice suggests that the handler
should be "directly" attached to the timer interrupt mechanism (whatever that
means), is that we should allow interrupts to be lost. (Allow, not require.)
Because that's how normal interrupts work in the SP annex.

****************************************************************

From: Pascal Leroy
Sent: Monday, March 31, 2008  2:06 AM

>I think the solution is to delete the D.15(15/2), and rely on:
>
>    13/2 {AI95-00297-01} As soon as possible after the time set for the event,
>    the handler is executed, passing the event as parameter. ...
>
>in all cases.

I am uncomfortable with this. I think it's a good idea to precisely specify
what happens when the given time is in the past. After all, one possible
definition would be that nothing happens in this case (the event is not set,
the handler is not called).

My preference would be to keep D.15(15/2), but to replace "immediately by the
task executing the call of Set_Handler" with "as soon as possible". This would
remove the overspecification, and the deadlock.

****************************************************************

From: Tucker Taft
Sent: Monday, March 31, 2008  1:00 PM

I basically agree with Pascal, though I think you also need to address the last
sentence of D.15(15/2):

    ... The timing event Event is cleared.

Do we believe that on return from Set_Handler the timing event is cleared? I
would say not. My model of how this would be handled is that a
"pseudo-interrupt" would be triggered, whose handling would potentially be
deferred if the caller is already "inside" an interrupt, and once the caller
returns to the non-interrupt level, the timing event's handler would be called,
and that is when the timing event would be cleared.

****************************************************************

From: Robert A. Duff
Sent: Monday, March 31, 2008  1:37 PM

What Pascal says, with Tuck's suggested modification, seems fine to me.

Question: Should it be guaranteed that no timing events are lost? Or is it like
the interrupts in section C.3, where if an interrupt is "generated" during the
handler, it is impl-def whether the new one is lost?

****************************************************************

From: Tucker Taft
Sent: Monday, March 31, 2008  2:04 PM

That's a tough question. I'll leave it to Alan, Andy, and friends to answer
that one.

****************************************************************

From: Alan Burns
Sent: Tuesday, April 1, 2008  3:11 AM

I'm just catching up with emails after being away for a few days. I'll read
through these and get back - but perhaps not for a few days.

****************************************************************

From: Robert I. Eachus
Sent: Saturday, April 5, 2008  7:22 PM

>What Pascal says, with Tuck's suggested modification, seems fine to me.
>
>Question: Should it be guaranteed that no timing events are lost?
>Or is it like the interrupts in section C.3, where if an interrupt is
>"generated" during the handler, it is impl-def whether the new one is
>lost?

You have real customer code, so check it out. My feeling is that losing the
event would be just as bad, from the user's point of view, as a deadlock. Think
of it this way, how should the user write the code so that there is no
deadlock, and no scheduled events are lost? We want that to be the easiest
case. If the user wants to skip past events, it is not difficult to write code
to do so. But we don't want to define timing races where the effect of a call
to Set_Handler is unpredictable.

****************************************************************

From: Alan Burns
Sent: Wednesday, April 9, 2008  6:29 AM

The Timing Events issue has been discussed over the last few days by the IRTAW
group. It concludes

1) yes there does seem to be a problem with the current definition! (which I
   think I wrote - sorry)

2) timing events should not be lost - ie even if the time specified is in the
   past the handler should still be executed - if this were not the case then
   it would be difficult/impossible to write code that was not subject to race
   conditions.

3) the solution to just say that 'the execution of the handler should occur at
   the earliest opportunity' seems the right approach (ie just keep with the
   current overall requirement of 'immediately'.)

We had some discussion about returning an exception or a flag to indicate that
the 'time' specified was in the past, but these ideas seemed to produce more
problems than they solved!

So, to conclude, agreement with what was proposed by ARG

PS I am now out of the office for a couple of weeks, so sorry if anyone needs
clarification on this.

****************************************************************

From: Alan Burns
Sent: Tuesday, April 22, 2008  4:13 AM

Just to add a (final) point, the two implementations MaRTE and ORK, both do
what we now think is the right thing to do:

> In MaRTE we are in the same situation as in ORK (and for the same
> reason: I didn't realize about that point in the RM).
>
> We set the hardware timer to expire "now" but, since interrupts are
> disabled while inside the PO (Interrupt_Priority ceiling), interrupt
> is not served until the protected action finishes.
>
> Consequently, and just by chance, we are already using the "now+delta"
> approach

****************************************************************

From: Robert A. Duff
Sent: Tuesday, April 22, 2008  9:40 AM

> Just to add a (final) point, the two implementations MaRTE and ORK,
> both do what we now think is the right thing to do:

Oh, good. Thanks for letting us know. I fixed the problem in our normal
run-time system, but I wasn't sure about MaRTE. I have an item on my to-do list
somewhere to worry about whether MaRTE is doing the right thing...

****************************************************************
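
The pattern at issue throughout this thread, a protected handler that re-arms
its own timing event, can be sketched as a complete program. This is only an
illustrative sketch under stated assumptions, not code from the AI, from the
customer report, or from any implementation mentioned above. The names Ticker,
Periodic, Start, Stop, Tick, Count, Timer, Period and Main are all
hypothetical, and Milliseconds stands in for the "Zilliseconds" of the
original scenario. Each expiry is computed from the previous one rather than
from Clock, so a late handler may pass Set_Handler an At_Time that is already
in the past. Under the revised D.15(15/2), the next call of Tick is then
expected to run as soon as possible after the current protected action
completes, not inside it.

```ada
with System;
with Ada.Real_Time;               use Ada.Real_Time;
with Ada.Real_Time.Timing_Events; use Ada.Real_Time.Timing_Events;

package Ticker is  --  hypothetical name; library level so 'Access is legal

   Period : constant Time_Span := Milliseconds (10);

   protected Periodic is
      --  D.15 checks for this ceiling when Ceiling_Locking is in effect.
      pragma Interrupt_Priority (System.Interrupt_Priority'Last);
      procedure Start;
      procedure Stop;
      procedure Tick (Event : in out Timing_Event);
      function Count return Natural;
   private
      Ticks : Natural := 0;
      Next  : Time    := Time_First;
   end Periodic;

end Ticker;

package body Ticker is

   Timer : Timing_Event;

   protected body Periodic is

      procedure Start is
      begin
         Next := Clock + Period;
         Set_Handler (Timer, At_Time => Next, Handler => Tick'Access);
      end Start;

      procedure Stop is
         Cancelled : Boolean;
      begin
         Cancel_Handler (Timer, Cancelled);
      end Stop;

      procedure Tick (Event : in out Timing_Event) is
      begin
         Ticks := Ticks + 1;
         --  Next may already be in the past if this invocation ran late.
         --  This call is made while the protected object is still locked.
         Next := Next + Period;
         Set_Handler (Event, At_Time => Next, Handler => Tick'Access);
      end Tick;

      function Count return Natural is
      begin
         return Ticks;
      end Count;

   end Periodic;

end Ticker;

with Ada.Text_IO;
with Ticker;

procedure Main is
begin
   Ticker.Periodic.Start;
   delay 0.5;
   Ticker.Periodic.Stop;
   Ada.Text_IO.Put_Line
     ("Ticks so far:" & Natural'Image (Ticker.Periodic.Count));
end Main;
```

The three units are shown in one block for brevity; with GNAT they can be
split into separate source files using gnatchop before building Main. No
expected output is given, because the tick count depends on the platform and
its load. Under the original wording of D.15(15/2), the Set_Handler call
inside Tick is where a late invocation would have been required to call Tick
again from within the same protected action.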