Version 1.5 of ais/ai-00259.txt


!standard C.6 (16)          03-01-15 AI95-00259/03
!standard C.6 (21)
!class binding interpretation 03-01-10
!status work item 02-04-19
!status received 01-02-12
!qualifier Omission
!priority Medium
!difficulty Medium
!subject Can accesses to volatile objects be combined?
!summary
Implementations may not combine accesses to volatile objects with accesses to other objects.
!question
Consider the following program:
    X : Byte;
    pragma Atomic (X);
    Y : Byte;
    pragma Atomic (Y);

    X := ...;
    Y := ...;
Can the memory writes to atomic objects X and Y be combined into a single store operation? (No.)
!recommendation
(See summary.)
!wording
(See corrigendum.)
!discussion
Volatile objects are intended (among other uses) to support communication between Ada programs and non-Ada software or hardware devices. Some hardware devices require access in particular ways. For example, some devices must be accessed with a byte operation; a word operation does not work.
If memory reads and writes of volatile objects are combined into a single operation, it is not possible to write Ada code which can safely access such a device. Even if it works on the current version of a compiler, a newer compiler with better optimizations may break the program.
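As a minimal sketch of the situation described above (an illustration, not part of the wording), consider two adjacent byte-wide device registers. The names Device_Sketch, Data and Control, the declaration of Byte, and both addresses are assumptions made for this example only. The code is meant for a bare-metal target where those addresses map to a device; it is not expected to do anything useful on a hosted system.

```ada
with System.Storage_Elements;

--  Sketch only: meant for a bare-metal target; both addresses are hypothetical.
procedure Device_Sketch is
   type Byte is mod 2**8;
   for Byte'Size use 8;

   --  Hypothetical device data register.
   Data : Byte;
   for Data'Address use System.Storage_Elements.To_Address (16#4000_0000#);
   pragma Atomic (Data);

   --  Hypothetical device control register, one byte later.
   Control : Byte;
   for Control'Address use System.Storage_Elements.To_Address (16#4000_0001#);
   pragma Atomic (Control);
begin
   Data    := 16#5A#;  --  intended as one byte-wide store
   Control := 1;       --  intended as a second, separate byte-wide store
end Device_Sketch;
```

If a compiler merged these two assignments into one 16-bit store, a device that only decodes byte accesses might not see either write as the programmer intended.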
Therefore, we extend the definition of volatile to ensure that accesses are not combined. Since Atomic implies Volatile, this means that the writes to X and Y in the question cannot be combined.
But this does not go far enough. We cannot talk about bits that do not belong to any Ada object at the Ada semantic level. What we really want is to say that accesses to a volatile object can access only the bits belonging to that object, and no others. That requires implementation advice.
We have to be careful, however, as Volatile may be applied to any Ada object. If the size of the object is not a multiple of the Storage_Unit size, it probably is not possible to access it without accessing other bits. Since subcomponents of a Volatile object are also Volatile, components smaller than a storage unit are going to occur frequently. We don't want to adopt advice that is impossible to follow, so we limit the rule to objects that have a size which is a multiple of System.Storage_Unit.
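The size condition can be illustrated with a small sketch. The names Size_Sketch, Flag_Field, Status and Whole_Units are hypothetical, and the record layout assumes a target where System.Storage_Unit is 8. The sketch only reports which components occupy a whole number of storage units; it says nothing about the code a particular compiler generates.

```ada
with Ada.Text_IO;
with System;

procedure Size_Sketch is
   type Byte is mod 2**8;
   for Byte'Size use 8;

   type Flag_Field is mod 2**3;

   type Status is record
      Flags : Flag_Field;
      Count : Byte;
   end record;

   --  Layout assumes System.Storage_Unit = 8.
   for Status use record
      Flags at 0 range 0 .. 2;  --  three bits: not a whole storage unit
      Count at 1 range 0 .. 7;  --  eight bits: one whole storage unit
   end record;
   pragma Volatile (Status);    --  subcomponents are volatile as well

   S : Status := (Flags => 0, Count => 0);

   --  True when a size in bits is a multiple of System.Storage_Unit.
   function Whole_Units (Size_In_Bits : Natural) return Boolean is
   begin
      return Size_In_Bits mod System.Storage_Unit = 0;
   end Whole_Units;
begin
   Ada.Text_IO.Put_Line
     ("Flags: " & Boolean'Image (Whole_Units (S.Flags'Size)));
   Ada.Text_IO.Put_Line
     ("Count: " & Boolean'Image (Whole_Units (S.Count'Size)));
end Size_Sketch;
```

Under the stated assumption, the advice would typically apply to S.Count but not to S.Flags, since a load or store of a three-bit field generally has to touch neighboring bits.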
An atomic object is used when the object must be accessed indivisibly. Generally, that should be done with a single instruction, so we've added implementation advice to say that as well.
Note that an implementation that does not follow implementation advice is required to document that, so users will know if an implementation is suitable for accessing hardware directly.
!corrigendum C.06(16)
Replace the paragraph:
For a volatile object all reads and updates of the object as a whole are performed directly to memory.
by:
For a volatile object all reads and updates of the object as a whole are performed directly to memory, and shall not be combined with reads or updates of other objects.
!corrigendum C.06(21)
Insert after the paragraph:
If a pragma Pack applies to a type any of whose subcomponents are atomic, the implementation shall not pack the atomic subcomponents more tightly than that for which it can support indivisible reads and updates.
the new paragraphs:
Implementation Advice
A load or store of a volatile object whose size is a multiple of System.Storage_Unit should be implemented by accessing exactly the bits of the object and no others.
A load or store of an atomic object should, where possible, be implemented by a single load or store instruction.
!ACATS test
This is not testable without examining the generated code (which is prohibited by the ACATS charter).
!appendix

!topic Clarification of pragma atomic is requested
!reference RM95-C.6(20)
!from Stephen Doiel 01-01-31
!keywords pragma atomic atomic_components
!discussion
A thread titled "Help with Atomic_Components" on the newsgroup comp.lang.ada
revealed a disagreement between experts in the field (Robert Dewar and
Tucker Taft) over the behavior described by "pragma atomic" and "pragma
atomic_components".

In this discussion addition of the following implementation advice to RM
Section C.6 was suggested:

   Implementation Advice
   ---------------------

  A load or store of an atomic object should, where possible, be implemented
by a single load or store instruction which accesses exactly the bits of the
object and no others. The implementation should document those instances in
which it is not possible to follow this advice.


According to Tucker Taft this was the original intent.  According to Robert
Dewar this is unclear in the RM.

***********************************************************

From: dewar@gnat.com
Sent: Wednesday, January 31, 2001 11:30 PM

That's misleading, Tuck says it was the original intent that the
*requirements* specify this behavior. Robert says it is not even
possible to have a normative requirement, and for sure Tuck and Robert
agree that the RM does not somehow contain this unmentioned implementation
advice.

The IA was simply my suggestion of how practically to get some of what
Tuck says he intended by the requirement. I don't even know if Tuck
agrees with this approach, since he has not said so :-)

***********************************************************

From: Tucker Taft
Sent: Wednesday, January 31, 2001 3:37 PM

Robert and I have been debating the appropriate interpretation
of pragma atomic_components on comp.lang.ada.

At this point it seems like an ARG-relevant discussion.

See the attachment for Robert's response to one of my notes.

Robert's response to someone else's comment follows inline.

Any comments from ARG-ers?

-Tuck
--------------------

In article
<95Nd6.343422$U46.10481049@news1.sttls1.wa.home.com>,
  "DuckE" <nospam_steved94@home.com> wrote:
> I find the difference in interpretation of AARM C.6(20)
> interesting.

Remember that the AARM is not an official document, and not
part of the official standard, so you can use it to try to
understand the motivation behind the standard, but it never
adds anything.


> My interpretation of this statement is: if my program
> contains two separate assignments to two distinct variables
> for which pragma atomic applies, these assignments will be
> performed as two distinct operations.  Since the two
> assignments appear as separate assignments in the code, if
> these assignments are combined into a single
> operation, an update is being performed that my program did
> not specify.

First, this statement is not part of the standard, so you
cannot use it in interpreting what conformance means.

But just for the moment, suppose this statement *were* part
of the standard.

Any *semantic* rule in the standard is always an "as-if" rule.
This is fundamental to the nature of semantic specification.
This means that if two possible translations have the same
semantic effect, then they are equivalent.

So I ask you the same question I asked Tuck, namely please
provide the program that will (at least conceptually) show
that the translation you claim is incorrect is semantically
non-equivalent to separate stores.

The trouble is that the semantic domain of the RM is not
at the right level of abstraction to talk about machine
instructions.

That's why it is often better and more precise to make
requirements of this kind into implementation advice.

In this particular case, the issue of whether to make
this a requirement or IA did not arise, since it is not
stated as a requirement in any case, and (at least speaking
for myself as a reviewer) I had no idea that the design
team intended this requirement. I thought of pragma Atomic
basically as a renaming of the (confusingly named) pragma
Shared in Ada 83, and it was certainly presented this way.

> Since there is obviously some confusion over this issue
> perhaps the AARM should be revised?

That's irrelevant, since the AARM is not an official document.
The only way to resolve confusion on this issue is to send a
comment following the RM procedures, and have the ARG address
the issue.

I would think that the appropriate approach would be to
introduce implementation advice, something to the effect

Implementation Advice
---------------------

A load or store of an atomic object should, where possible,
be implemented by a single load or store instruction which
accesses exactly the bits of the object and no others. The
implementation should document those instances in which
it is not possible to follow this advice.

-----------
The reason this should be IA is that in IA, we are allowed
to talk about things like load and store instructions, and
we can interpret a statement like this in a helpful pragmatic
manner, whereas if it appeared as a formal requirement, it
would be meaningless (since it contains many undefined terms,
and would be susceptible to the as-if semantic interpretation
which we specifically do NOT want in this case).

I think it is quite reasonable to consider adding some
IA of this kind. I suspect that most implementations can
follow this easily enough -- what is missing is documentation
of when it is not possible.

***********************************************************

From: Tucker Taft
Sent: Wednesday, January 31, 2001 4:16 PM

The attachment got screwed up somehow.  Here it is inline:

In article <3A76E3B9.BD806841@averstar.com>,
  Tucker Taft <stt@averstar.com> wrote:
> I don't agree with Robert's reading of this.  It seems very
> clear that you cannot combine multiple updates of separate
> atomic objects into a single write.

I disagree with Tuck. Please provide a test program where it
would be theoretically possible to determine that combining
writes had a semantic effect.

> One of the whole points of atomic
> is to make each write separate and indivisible.

Indivisible, yes. Separate no. You may have intended to say
this, and you may have thought it, but I don't see that the
RM says, or even implies this.

> It was designed specifically to address issues relating to
> "active" memory, where breaking the write down into multiple
> writes, or combining multiple writes into a single write,
> would confuse the device.

This is a total surprise to me, and certainly there is nothing
in the RM that would reveal this hidden design intent. For
example, an implementation which locks the bus and then does
a whole sequence of writes for an atomic variable, and then
unlocks the bus, CLEARLY meets the semantics of the RM.

This is a case of the designer reading more into what's there
than what was written. My own view is that it is almost
impossible to achieve the design goal that Tuck quotes, except
on an implementation advice basis. I had no idea that the
design team had this view of pragma Atomic (because nothing
they ever wrote in the RM implied this viewpoint).

Note that the detailed list in AARM C.6(20) of disallowed
transformations has no hint at all of the "separate" part of
Tuck's claim, and conspicuously does not list combining writes
as an improper transformation.

Tuck, having something in your mind is not good enough if you
do not write it down :-)

> In general, that means each read/update is a separate
> instruction.

No, there is absolutely NO implication of this

> Which particular instruction is not specified, though of
> course in most cases it will be a load or a store.

> I don't agree with the advice of slipping into machine code
> to accomplish all of these kinds of things, if atomic can do
> the job.

But it can't! In particular, suppose you have four atomic
byte variables next to one another. It is perfectly fine to
do a word load followed by a shift and mask, but likely this
would play havoc with memory mapped I/O.


> I would agree that with the original code, which used an
> aggregate assignment, there is no requirement to perform that
> as separate byte assignments.  However, if the code is
> written as a sequence (or a loop) of separate assignments,
> and the objects are atomic, it is a bug (in my view) if the
> compiler combines these assignments  into a single
> assignment.

You repeat this, but the RM does not support this view point.

> Similarly, reordering such assignments
> would be a bug, since it would change the external effect of
> the program in an impermissible way.

No, this is not similar at all. Everyone agrees that reordering
is not permitted, because you can see external effects not
corresponding to the canonical order. The RM is quite explicit
in this case, and indeed AARM C.6(20.c) gives an explicit
example of this not being allowed, but the AARM does not list
combining.

Too bad you did not list the combining case in the AARM Tuck,
then we would at least have known what was in your mind, and
could have discussed the point that the RM wording does not
happen to capture this intent (and as I say, I think it would
be almost impossible to capture this intent).

For all the examples in AARM C.6(20), you can devise simple
pure Ada tests that can conceptually malfunction because of
a task switch at a bad point (it may be tricky to actually
do the run that fails, since it depends on very precise
timing). But for the combining case, you cannot devise a
pure Ada test that even conceptually fails.

The idea that pragma Atomic solves this problem is an old
confusion, going way back to Ada 83 days, I thought we had
put that to rest in Ada 95, because we never discussed the
idea that Atomic *did* solve this problem. I am somewhat
amazed to see the claim that it was addressed :-)

***********************************************************

From: Norman H Cohen
Sent: Thursday, February 01, 2001 8:57 AM

Robert is right.

Let a1,...,an be the stores (to atomic variables) that a compiler performs
as a block.  One of the many allowable execution orders for the program is
one in which these stores occurred consecutively, with no intervening
loads.  That execution order is semantically indistinguishable from one in
which the stores are performed simultaneously.

An implementation is always permitted to generate code whose effect is that
of one of the allowable execution orders.

***********************************************************

From: dewar@GNAT.COM
Sent: Thursday, February 01, 2001 9:19 AM

And this statement by "formal Norman" is exactly right, and I use this
term in a quite intended manner, because the point is that you CAN only
take a formal view of formal requirements, and the position that Norman
takes, while it may be irritating to the pragmatists (a group that this
time includes Tuck), it is the ONLY possible position that makes sense
with respect to the actual requirements.

That is why I suggest the use of IA, as I argued in CLA, IA is often
much more powerful than formal requirements, precisely because in IA
we can say things informally and pragmatically, and they are NOT
subject to the "as-if" formalism that we are forced to apply to
normative requirements.

Hello Norman, we have not heard from you for a while :-) It is good to
have you back in the discussion!

***********************************************************

From: Tucker Taft
Sent: Thursday, February 01, 2001 1:58 PM

Norman H Cohen wrote:
>
> Robert is right.
>
> Let a1,...,an be the stores (to atomic variables) that a compiler performs
> as a block.  One of the many allowable execution orders for the program is
> one in which these stores occurred consecutively, with no intervening
> loads.  That execution order is semantically indistinguishable from one in
> which the stores are performed simultaneously.

I don't see how you can say that.  Each store is considered a
separate external effect.  Since these objects are atomic (and
hence volatile as well), we must presume that there is some
other external entity viewing the stores as they occur.  Clearly
4 single-byte writes are not necessarily the same thing as
one 4-byte write to this external entity.

>
> An implementation is always permitted to generate code whose effect is that
> of one of the allowable execution orders.

That is true, but I don't see how you can justify turning 4
separate externally visible actions into 1 action.  If this were
"passive" memory with no other external entities viewing the data,
then presumably the equivalence might be valid.  But with the
external (potentially non-Ada) entities in the picture, combining
the actions is changing the external effect in an impermissible way.

***********************************************************

From: dewar@GNAT.COM
Sent: Thursday, February 01, 2001 2:29 PM

No, we must not "presume" this at all. Pragma atomic at the Ada semantic
level has to do with task interactions, and it has meaningful semantics
there.

Nothing is clear about what is or what is not clear to a "this external
entity". For example, "this external entity" may require 145 nanoseconds
to elapse between the stores to recognize them, or anything else. You
can't just invoke deus ex machina arguments that support your particular
view of what you want the meaning of the language to be.

<<That is true, but I don't see how you can justify turning 4
separate externally visible actions into 1 action.  If this were
"passive" memory with no other external entities viewing the data,
then presumably the equivalence might be valid.  But with the
external (potentially non-Ada) entities in the picture, combining
the actions is changing the external effect in an impermissible way.
>>

You say this, but note that you do not refer to any supporting arguments
in the RM, and there simply are none.

Pragma Shared in Ada 83 was all about task interactions. When we discussed
pragma Atomic, the basis of the discussion was simply that we were changing
the name to prevent confusion. I don't recall any introduction of a completely
new set of semantic pseudo-requirements (along the lines of my proposed IA),
and certainly there is no record of any such requirement in the RM. The notion
of external effect is simply too vague to hang your interpretation on in
this way.

Note that the word separate does NOT appear in paragraph 20:

20   The external effect of a program (see 1.1.3) is defined to include each
read and update of a volatile or atomic object.  The implementation shall not
generate any memory reads or updates of atomic or volatile objects other than
those specified by the program.

And Norman and I would maintain that combining the stores meets the requirement
that the effect of these stores is an external effect.

Furthermore, the list of disallowed optimizations in the AARM (it has no
force, but it shows state of mind) does NOT include combining the stores,
or in any way hint that the author of this section of the AARM had this
in mind.

At no point is there any support for Tuck's viewpoint in what is said, and
it is interesting to note that we do not see a detailed argument from the
RM here, just a repetition of what seems pragmatically reasonable.

Most certainly we have always accepted that a legitimate implementation of
pragma Atomic is to allow it on large objects and do specific lock and
unlock operations (indeed I seem to remember that at one point, the design
team specifically talked about this implementation, and it was me who noted
that this was not what pragma Atomic was about in practice). Most certainly
an implementation that did do such locking would be conforming, but would
not, presumably, meet Tuck's viewpoint, since the one external effect would
be broken down into a series of separately visible (by the mysterious non-Ada
oracle) operations.

Also, think about caching? Is there any requirement that writes to atomic
variables be done with write-through cache operations on machines where
this makes sense? For controlling external stuff, this might be necessary,
but it is way outside the scope of the formal language definition.

For instance, an implementation in which the four atomic writes were done
to a cache (let's suppose we are talking coherent caches on an MP machine,
so there is no issue of incoherent caching between tasks running on
separate processors, which of course would violate the semantics of
pragma Atomic). The writes happen to the cache, and the actual store
to memory is of a whole cache line.

Such an implementation is clearly valid from the RM, as long as other Ada
tasks see the atomic writes, but again would violate the oracle that Tuck
calls on to interpret the semantics here (the mysterious [because undefined]
external entity)

I don't think it is ever going to fly to try to argue that the current RM
clearly requires Tuck's semantics, that's just too hard an argument.

At best, we can decide this is something that needs ARG resolution.

And there, I *strongly* argue in favor of doing this with implementation
advice, because I think the effect of IA is much stronger than bogus
requirements in this area.

***********************************************************

From: Randy Brukardt
Sent: Thursday, February 01, 2001 5:52 PM

While I don't feel strongly on this issue, I do think that Robert's argument
certainly matches the formal wording of the standard. I do think that this
is worth the ARG discussing (but I'm not volunteering to write the AI!).

What does the rest of the ARG think? Should this be an AI?

***********************************************************

From: Jean-Pierre Rosen
Sent: Friday, February 02, 2001 2:32 AM

> No, we must not "presume" this at all. Pragma atomic at the Ada semantic
> level has to do with task interactions, and it has meaningful semantics
> there.

Certainly, but you can't say this of Pragma Volatile. And Atomic implies
Volatile. It would maybe be more logical that not combining stores be a
property of Volatile, not Atomic, but then it would apply to Atomic as well.

***********************************************************

From: Pascal Leroy
Sent: Friday, February 02, 2001 2:34 AM

> Clearly 4 single-byte writes are not necessarily the same thing as
> one 4-byte write to this external entity.

Clearly 4 single-byte writes separated by 0 ns are indistinguishable from one
4-byte write.  Since timing considerations do not play a role in external
effects, I don't think your argument holds water.  I side with Robert.

Do we need an AI?  Well, I cannot get too excited about this since it is the
type of problem that is best resolved by market pressure.  Because the
difference between Robert and Tuck's interpretation is not testable, we don't
run the risk that this issue will cause someone to fail validation.  So I'd
rather not lose precious ARG time on angels-on-a-pinhead discussions.

***********************************************************

From: dewar@GNAT.COM
Sent: Friday, February 02, 2001 9:16 AM

I strongly agree, and I think that the discussion of what the RM does or
should *require* is precisely an AOAP argument.

But why not issue an AI with the IA I suggested? I fail to see how that
can be controversial, or cause any implementation burdens, and it is
definitely helpful (it prevents other things than simply combining
writes, as I have noted in previous messages).

There is a real point here which is the following. Is this or is this
not a recommended style of Ada programming:

    X : Byte;
    for X'Address use .. some mem mapped address
    pragma Atomic (X);

where the programmer now wants to assume that byte read/write instructions
will be issued when addressing X.

The conventional wisdom for some of us has been to say, NO, this is not
recommended practice, because there is no guarantee, or even a hint of
an implication, that it can be expected to work.

With Tuck's interpretation (or Dewar's if the implementation advice I
suggest is added), then the recommendation is that this IS an appropriate
style of Ada programming for low level systems stuff.

Do we really need a lot of argument over putting in the IA? I really
think it would be very helpful.

***********************************************************

From: dewar@GNAT.COM
Sent: Friday, February 02, 2001 9:12 AM

<<Certainly, but you can't say this of Pragma Volatile. And Atomic implies
Volatile. It would maybe be more logical that not combining stores be a
property of Volatile, not Atomic, but then it would apply to Atomic as
well.>>

Right, but where do you find anything about not combining stores for
Volatile variables in the RM? Answer you do not. Here is all the RM
has to say:

16   For a volatile object all reads and updates of the object as a whole are
performed directly to memory.

Combining of writes certainly meets this rule.

20   The external effect of a program (see 1.1.3) is defined to include each
read and update of a volatile or atomic object.  The implementation shall not
generate any memory reads or updates of atomic or volatile objects other than
those specified by the program.

Combining of writes certainly does not violate this rule, since a write
of two variables at once certainly "includes" the update of each of them.

Indeed para 20 applies equally to volatile and atomic, so we do not even
need to consider it to appraise your argument that the rules for volatile
are different from atomic, but atomic includes volatile, therefore ...

The only rules for volatile that are separate from those for atomic are
in para 16.

Once again, you cannot simply say what you would LIKE to be the case, you
must argue from what is in the RM.

***********************************************************

From: Tucker Taft
Sent: Friday, February 02, 2001 9:26 AM

Pascal Leroy wrote:
>
> > Clearly 4 single-byte writes are not necessarily the same thing as
> > one 4-byte write to this external entity.
>
> Clearly 4 single-byte writes separated by 0 ns are indistinguishable from one
> 4-byte write.

They are distinguishable because the bytes are being potentially sent out to
memory in a different sequence. The 1-byte writes come out in the
order specified in the program. The single 4-byte write may come out
in parallel, or in reverse sequence, or in random sequence.
Remember we are talking about "volatile" memory locations.

> ... Since timing considerations do not play a role in external
> effects, I don't think your argument holds water.

I don't understand this. First, timing certainly can be relevant
to external effects (e.g. if a delay separates the two actions).
But in this case we are talking about a sequence of
1-byte writes. When dealing with volatile memory the sequence
of writes matters, even if there is no significant time between them.

> ... I side with Robert.
>
> Do we need an AI?  Well, I cannot get too excited about this since it is the
> type of problem that is best resolved by market pressure.  Because the
> difference between Robert and Tuck's interpretation is not testable, we don't
> run the risk that this issue will cause someone to fail validation.  So I'd
> rather not lose precious ARG time on angels-on-a-pinhead discussions.

This is definitely not an angels-on-a-pinhead discussion if we are
trying to make it possible to interact predictably with "active,"
volatile memory as is used in memory-mapped I/O.

***********************************************************

From: Tucker Taft
Sent: Friday, February 02, 2001 9:45 AM

Randy Brukardt wrote:
>
> While I don't feel strongly on this issue, I do think that Robert's argument
> certainly matches the formal wording of the standard. I do think that this
> is worth the ARG discussing (but I'm not volunteering to write the AI!).
>
> What does the rest of the ARG think? Should this be an AI?

Certainly I have always presumed that pragmas Volatile and Atomic
are at least in part designed to support interacting with memory-mapped
I/O devices.  With such devices, 4 1-byte writes can be significantly
different from 1 4-byte write.  We should decide one way or the other on this.

If you read the Ada 95 Rationale, section C.5, it seems quite
clear that atomic and volatile are intended to support communication
between Ada programs and non-Ada software or hardware devices.
The example shows a case where several atomic and volatile
components are sequential in memory, and each one serves a different
purpose.  "Ganging" updates to them could be very dangerous,
and might not work at all on some hardware.

The important debate here is what are the underlying goals of
pragma Atomic/Volatile.  Once we agree on that, we can see whether
the RM words need to be modified to make sure that vendors support
those goals, and that users aren't misled into presuming the
wrong thing.

In my view, since I believe atomic/volatile were intended to support
communicating with hardware devices, and that hardware
devices can react quite differently to 4 single byte writes vs.
one 4-byte write, this implies that the RM should have
words to prevent the compiler from ganging sequential updates.
I interpret the words in C.6(16) and C.6(20) to imply that.  Apparently
others find it less clear.

If we agree that atomic/volatile are intended
to support communication with hardware devices,
and we agree that typical device hardware will often distinguish
4 1-byte writes from 1 4-byte write, then we should add words
to prevent ganging of updates.  If we don't agree about the
intent to support interactions with hardware, then we should
figure out what we do agree on.

***********************************************************

From: dewar@GNAT.COM
Sent: Friday, February 02, 2001 10:13 AM

OK, my position is this. If Tuck wants to continue with the dubious argument
that the RM requirements currently require this, then he is in favor of
no action.

In which case, I concur, no action, and no AI. It is a silly argument to
pursue. You have two people saying

1. I think the RM is clear and does not need change

The other says

1. I think the RM is quite unclear, and if you want to make your interpretation
a change is desirable

The first person says

1. I think the RM is clear and does not need change

OK, if that is the game, I side with Pascal, no action, no AI, and no further
discussion is worth while.

I had hoped we could get quick agreement that the IA I proposed will be
helpful.

Since this is not the case, let's just abandon it I would say (and I will
continue to advise people that it is improper non-portable Ada to make
any assumptions here).

***********************************************************

From: Tucker Taft
Sent: Friday, February 02, 2001 10:43 AM

dewar@gnat.com wrote:
> I had hoped we could get quick agreement that the IA I proposed will be
> helpful.

I think your IA would be helpful, sorry if I implied otherwise.
I believe your IA expresses the intent of atomic/volatile.  Here it
is again for those who missed it:

  Implementation Advice
  ---------------------

  A load or store of an atomic object should, where possible,
  be implemented by a single load or store instruction which
  accesses exactly the bits of the object and no others. The
  implementation should document those instances in which
  it is not possible to follow this advice.

If we can agree on this IA, then I'm more than satisfied.

I also wouldn't mind making it an implementation requirement, and
that pragma Atomic should be rejected if it can't be met,
though I could live with it just being advice.

***********************************************************

From: dewar@GNAT.COM
Sent: Friday, February 02, 2001 10:50 AM

I would object to this for two reasons

1. It is over-restrictive, pragma Atomic may be useful even when this
requirement cannot be met, e.g. on an i860, it is perfectly reasonable
to issue a

lockbus
move 32 bits
move 32 bits
unlockbus

sequence of four instructions to achieve an atomic read/write of 64-bits
and I see no reason to prevent an implementation from doing this. It is
quite enough to require documentation of the cases where a single load/store
cannot be used.

2. Making this an implementation requirement weakens it. Why? Because it is
not well defined as a semantic requirement (what is a load or store
instruction anyway, especially if you consider situations like conversion
to C or implementations on the JVM). Furthermore, even if it WERE well
defined, it is subject to the as-if rule, and it is definitely arguable,
because of the lack of definition of load/store, how as-if would play
out here. Normally the criterion is that an *Ada* program could be written
to show the semantic difference.

That is why it is so much MORE forceful to have something like this as
implementation advice. For advice, we interpret things much more informally,
using our normal pragmatic knowledge, and when it comes to issues like
interaction with typical external devices, on typical machines, then
informal pragmatic language is just the right domain.

In other words, we can say things in IA that are clear, unambiguous, and
non-controversial, which if we try to make them into formal requirements,
raise all sorts of difficulties.

***********************************************************

From: Tucker Taft
Sent: Friday, February 02, 2001 12:46 PM

dewar@GNAT.COM wrote:
>
> I would object to this for two reasons
> ...
> In other words, we can say things in IA that are clear, unambiguous, and
> non-controversial, which if we try to make them into formal requirements,
> raise all sorts of difficulties.

Fair enough.

***********************************************************

From: Randy Brukardt
Sent: Friday, January 10, 2003  9:34 PM

I've been working on my homework. At the bottom of this note, you'll find my
rewrite of AI-259 [this was version /02 - ED], based on the minutes of the
Bedford meeting.

I have two problems with the revised AI:

1) I have no idea why we've extended this to cover volatile. The minutes are no
   help (not surprisingly). I vaguely recall Mike Yoder saying something about
   multiprocessors. I need a good example of why combining writes for a
   volatile object is a bad thing (beyond people using the pragmas for purposes
   for which they were not intended).

2) I don't think the fix solves the original problem. The *question* is
   answered, but the core e-mail point was not. To steal a question and
   example from Robert Dewar: Is this or is this not a recommended style of Ada
   programming:

    X : Byte;
    for X'Address use .. some mem mapped address
    for X'Size use 8;
    pragma Atomic (X);

where the programmer needs only byte read/write instructions to be used to
access X.

For this to work, the object and *only* the object in question can be accessed.
The proposed rule:

"For a volatile object all reads and updates of the object as a whole are
performed directly to memory, and shall not be combined with reads or updates
of other objects."

That would prevent the compiler from writing other Ada objects. But it doesn't
say anything about bits that aren't part of any (Ada) object. The compiler
still could write to unused bits near X if it wanted to. But of course that
doesn't have the right effect.

We've also lost the value of the documentation that came with the (rejected)
Implementation Advice. The IA suggested the use of a single instruction for
accesses to Atomic objects, and required documentation when that wasn't
possible to meet. While I can't see a single byte access taking multiple
instructions, it certainly is possible with word registers. In the similar
example:

    X : Word;
    for X'Address use .. some mem mapped address
    for X'Size use 16;
    pragma Atomic (X);

we want to discourage using a pair of byte writes here.

So I think that we still need the implementation advice, for Atomic only:
"A load or store of an atomic object should, where possible, be implemented
by a single load or store instruction which accesses exactly the bits of the
object and no others. The implementation should document those instances in
which it is not possible to follow this advice."
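
[Editor's sketch of the two examples above as one compilable unit, using
Ada 95 pragmas. The procedure name and the Byte_Backing and Word_Backing
variables are hypothetical: they stand in for the elided memory-mapped
addresses so that the unit can run on a hosted system. Whether each
assignment becomes a single byte or word store is exactly the point under
discussion and is not guaranteed by this code. - ED]

```ada
with Ada.Text_IO;

procedure Mmio_Sketch is

   type Byte is mod 2**8;
   for Byte'Size use 8;

   type Word is mod 2**16;
   for Word'Size use 16;

   --  Hypothetical stand-ins for device registers. A real program would
   --  give an actual device address in the address clauses below.
   Byte_Backing : aliased Byte := 0;
   pragma Volatile (Byte_Backing);

   Word_Backing : aliased Word := 0;
   pragma Volatile (Word_Backing);

   --  The byte-wide register from the first example.
   X : Byte;
   pragma Atomic (X);
   for X'Address use Byte_Backing'Address;
   for X'Size use 8;

   --  The word-wide register from the second example.
   W : Word;
   pragma Atomic (W);
   for W'Address use Word_Backing'Address;
   for W'Size use 16;

begin
   --  The intent is one byte store and one word store, touching exactly
   --  the bits of X and W and no others.
   X := 16#5A#;
   W := 16#BEEF#;

   Ada.Text_IO.Put_Line ("X register:" & Byte'Image (Byte_Backing));
   Ada.Text_IO.Put_Line ("W register:" & Word'Image (Word_Backing));
end Mmio_Sketch;
```

[Inspecting the generated code for the two assignments is the only way to
see what a given compiler does; the program's output cannot reveal
whether W was written as one word or as two bytes. - ED]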

***********************************************************

From: Tucker Taft
Sent: Saturday, January 11, 2003  9:04 AM

I agree you should indicate that any memory outside of
the volatile object (including memory that is part of other
objects) should not be read or written.

I agree that we should keep the advice about single
instructions for atomic objects.

As far as why extend this to "volatile" -- the reasoning
I believe was that everyone agreed that volatile was
useful for communicating with memory-mapped I/O registers,
whereas there were a number of people who felt "atomic"
was not for that purpose (though the Ada 9X design team
would not have agreed with them).  It seemed less confusing
for these people to associate these rules with volatile
rather than with atomic.  And since volatile covers atomic,
those who preferred the focus on atomic didn't think it
was worth arguing about further ;-).

***********************************************************

From: Robert Dewar
Sent: Sunday, January 12, 2003  7:32 PM

> "For a volatile object all reads and updates of the object as a whole are
> performed directly to memory, and shall not be combined with reads or
> updates of other objects."

I think this should be implementation advice. It is really impossible to
interpret the above sentence formally. What does it mean "directly to
memory", what does "combine" mean. The RM is simply not in the business
of specifying machine language translation sequences, but rather semantic
effects.

***********************************************************

From: Robert Dewar
Sent: Sunday, January 12, 2003  7:34 PM

Just to be clear here, I think that IA would be much stronger in practice
than junk meaningless non-formally interpretable requirements.

***********************************************************

From: Robert Eachus
Sent: Thursday, January 16, 2003  11:12 PM

> I've been working on my homework. At the bottom of this note, you'll find my
> rewrite of AI-259, based on the minutes of the Bedford meeting.
>
> I have two problems with the revised AI:
> ...

May I say that the whole AI as written is as phony as a three-dollar bill?

Actually I probably shouldn't as that considerably understates the problem.
There are some processors in embedded and signal-processing applications
without multi-level caches.  But the reality today is that you had better
assume that any processor of interest:

1) Has no way of reading less than a cache line of data.  (Yeah, I know.  Some
processors like the Pentium 4 allow reading half a line into a cache line and
marking it as such.  But whether or not this happens is usually dependent on
memory access patterns instead of the specific read issued.  Also, it is
possible to use PREFETCHNTA and MOVNTQ to move data into and out of the MMX
registers in some x86 processors bypassing the cache.  But even if those
instructions are available, they do not promise not to read data into cache,
only to minimize cache pollution.)

2) Even if the compiler can use draconian means to bypass the caches, do you
really want that, or get what you want?  The most interesting answer to this is
the new AMD Opteron (and Athlon 64).  The main memory controller is part of the
CPU, and all memory accesses go through the cross-bar switch also on the CPU
chip.  In other words, not only requests from other CPUs, but DMA and other
memory reads and writes from the video card and I/O devices will be filled from
L1 or L2 caches if the data is there.  In such a situation, do you even care if
the flush to main memory ever occurs?

We all know what the AI is trying to say.  But we shouldn't get into the
problem of overspecification.  If the memory system is coherent, all we care is
that stale data is not used in I/O or written over more recent data.  How the
processor and compiler accomplish this doesn't belong in the RM.

***********************************************************

From: Robert I. Eachus
Sent: Friday, January 17, 2003  8:45 AM

> May I say that the whole AI as written is as phony as a three-dollar bill?

I meant to save this message to work on more today, and apparently sent
it instead.  The comment above comes across as way too strong,
but it should have been directed, even if appropriate, at only:

For a volatile object all reads and updates of the object as a whole are
performed directly to memory, and shall not be combined with reads or
updates of other objects.

The problem in this case is that it is a completely naïve assumption that two
separate assembler instructions will not be combined in the actual machine code
(Itanium) or during execution by OoO processors with multiple execution pipes.
To go into the gory details of what this means, if you use two separate
"machine instructions" to write values from different registers to memory, the
processor will treat the instructions as independent and executable by
different pipes.  In an x86 processor, retirement of instructions is required
to be "in order" but processors can, and do retire multiple instructions
simultaneously.

At this point the write pipe takes over.  Even if you wrote two separate
instructions, write combining will combine two writes to the same memory
location.  The 'hypothetical' example assumes writes to two separate bytes.  If
these bytes are in the same 64-bit, 128-bit word or whatever the actual memory
access granularity is, the write will eventually be combined, if not by the
CPU, by the memory controller (Northbridge).

So is there something that should be said here?  Sure every write to a
particular volatile location should result in a write instruction, and
successive writes to the same location cannot be optimized away.  This of
course is Implementation Advice at best, as I showed with the Opteron example.
(It is possible for writes to the screen to be only to cache, where the AGP
card will see them, and we could care less about whether a write to main memory
ever occurs.)

In another area:

> While I can't see a single byte access taking multiple instructions, it
> certainly is possible with word registers. In the similar example:

>    X : Word;
>    for X'Address use .. some mem mapped address
>    for X'Size use 16;
>    pragma Atomic (X);

> we want to discourage using a pair of byte writes here.

Correct.  A much better example I think would be an array of Long_Float with
Volatile_Components.  In a signal processing environment, I certainly don't
want these writes to be done with a move that uses 32-bit registers, whether or
not the move is a "single machine instruction".
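
[Editor's sketch of the array example just described, assuming a
four-element buffer. The names Sample_Index, Sample_Buffer and Buffer are
hypothetical. The pragma makes every component volatile, but nothing in
this code controls the width of the registers used for each store, which
is the concern raised above. - ED]

```ada
with Ada.Text_IO;

procedure Volatile_Components_Sketch is

   type Sample_Index is range 1 .. 4;

   --  Each Long_Float component is volatile, so every assignment below
   --  must result in a write to the component.
   type Sample_Buffer is array (Sample_Index) of Long_Float;
   pragma Volatile_Components (Sample_Buffer);

   Buffer : Sample_Buffer := (others => 0.0);

begin
   for I in Buffer'Range loop
      --  Whether this store is one 64-bit move or two 32-bit moves is
      --  left to the implementation.
      Buffer (I) := Long_Float (I) * 0.5;
   end loop;

   Ada.Text_IO.Put_Line
     ("Last sample:" & Long_Float'Image (Buffer (Buffer'Last)));
end Volatile_Components_Sketch;
```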

***********************************************************

From: Tucker Taft
Sent: Friday, January 17, 2003  10:48 AM

I'm sure it is true that on some machines, the
concepts discussed in this AI really don't apply.
However, that doesn't mean the AI isn't useful.
We aren't just worried about "stale" data.  We
are worried about the unit in which data is sent
to the I/O registers.  I have trouble believing
that many machines which support memory-mapped
I/O treat reads/writes to I/O in the same way
they treat read/writes to RAM.  And this whole
AI is about memory-mapped I/O.  Perhaps that should
be more explicit.

If you aren't worried about memory-mapped I/O, then
of course it is safe to gang together atomic accesses.
The more the merrier (unless of course there is some
implicit locking going on, and the sequence of the atomic
accesses affects the lock sequence, and hence might
affect whether deadlock occurs in the presence of multiple
tasks).  We are presuming a compiler isn't smart enough
to know whether a given piece of volatile memory is I/O
space or not.  Of course, if the compiler knows everything,
then it can apply the usual "as if" rules, and ignore the
details of the AI so long as an indistinguishable effect
is accomplished.

***********************************************************

From: Robert Dewar
Sent: Friday, January 17, 2003  2:08 PM

<<How the processor and compiler accomplish this doesn't belong in the RM.>>

This is exactly why I prefer the implementation advice approach rather than
a bogus attempt at a formal requirement.

***********************************************************

From: Robert Dewar
Sent: Sunday, January 19, 2003  9:43 AM

>Correct.  A much better example I think would be an array of Long_Float with
>Volatile_Components.  In a signal processing environment, I certainly don't
>want these writes to be done with a move that uses 32-bit registers, whether
>or not the move is a "single machine instruction".


Please explain more clearly

a) why would you not want this to be done

b) by what possible reading of the RM or possibly modification to the RM,
given as-if semantics, could you expect the RM to make sure the compiler
adheres to your wishes.

Volatile is just about ensuring that stuff gets written or read, not how it
gets written or read.

***********************************************************

From: Robert Dewar
Sent: Sunday, January 19, 2003  10:00 AM

> I'm sure it is true that on some machines, the
> concepts discussed in this AI really don't apply.
> However, that doesn't mean the AI isn't useful.
> We aren't just worried about "stale" data.  We
> are worried about the unit in which data is sent
> to the I/O registers.  I have trouble believing
> that many machines which support memory-mapped
> I/O treat reads/writes to I/O in the same way
> they treat read/writes to RAM.  And this whole
> AI is about memory-mapped I/O.  Perhaps that should
> be more explicit.

I think the whole AI is misdirected if it is concerned with memory mapped
I/O only. You may have trouble believing that "many machines which support .."
but in fact it is the normal case that there is nothing special about
memory-mapped I/O. What may be necessary on some machines is to use special
instructions to get to mmio or to disable caches etc. But it would be
quite wrong to have pragma Atomic or Volatile do this kind of mmio required
special stuff by default, since the utility of Atomic and Volatile extends
far beyond mmio.

Yes, it is true that in practice the use of pragma Atomic with appropriate
chosen datatypes may on a specific architecture have the right result but
it is very difficult to mandate what the right result should be at the
Ada semantic level in a target independent manner.

***********************************************************

From: Robert Eachus
Sent: Monday, January 20, 2003  9:05 PM

> Yes, it is true that in practice the use of pragma Atomic with appropriate
> chosen datatypes may on a specific architecture have the right result but
> it is very difficult to mandate what the right result should be at the
> Ada semantic level in a target independent manner.

I think that the sentence would scan better with a comma before "but".
However, this 48! word sentence sums up what I was trying to say
perfectly.*  I know what the RM should mean by Atomic and Volatile, but
there is no possible way to state the correct rules for UltraSPARC,
PowerPC, and x86 without enumerating cases.  In fact, as I pointed out,
with the new AMD Hammer architecture, we will have a significantly
different situation from most x86 processors, even when it is running in
legacy x86 mode.  (Hammer will run x86 operating systems with the x86-64
extensions disabled, but it will still cache "uncacheable" memory pages.)

*Flame retardant, I hope unnecessary.  I am complimenting Robert Dewar
on expressing the key concept so succinctly, not criticizing. ;-)

***********************************************************

From: Robert Dewar
Sent: Monday, January 20, 2003 11:14 PM

<<*Flame retardant, I hope unnecessary.  I am complimenting Robert Dewar
on expressing the key concept so succinctly, not criticizing. ;-)  >>

Definitely no flame retardant required, compliment accepted, thank you :-)

Now, the interesting question is, who is there who knows modern
architectures well who disagrees with that 48 word sentence? Please
speak up and explain your position.

Far too much has been written on this with a view of architectures
that disappeared 20 years ago :-)

***********************************************************

