Version 1.1 of acs/ac-00064.txt

!standard 7.1(01)          03-07-31 AC95-00064/01
!class amendment 03-07-31
!status received no action 03-07-31
!status received 03-01-10
!subject Converting application types to Storage_Array
!summary
!appendix

From: Jeffrey Carter
Sent: Friday, January 10, 2003  8:01 PM

This seems to be an area where I would like to see an AI. However, it is
possible that it is already addressed by the language and I've missed
it. I did not find anything about this in the AIs.

There is a need for a language-defined mechanism to efficiently
(unchecked) convert an arbitrary value of any type to (an appropriate
subtype of) Storage_Array without copying the value and in such a way
that attributes such as 'First, 'Last, 'Length, and 'range are valid for
the result.

Especially in applications that deal with custom hardware, it is
desirable to be able to write operations such as

procedure P (Data : in Storage_Array);

and use attributes of Data to perform the operation.

The actual values that will be passed to P may be quite large. Hard
real-time systems often have both time and memory constraints that
disallow copying of the value in order to attach bounds information to
it. Depending on an implementation's scheme for connecting bounds
information to an array value, Unchecked_Conversion or an overlay using
an address clause may not provide valid bounds information, preventing
the use of normal array attributes.

This leads to systems that use implementation-dependent characteristics,
which are not portable to other compilers, or that define such
operations using an address and size:

procedure P (Addr : in Address; Size : in Positive);

with all the opportunities for error that this approach is infamous for.

We need a mechanism which is portable (therefore language defined), and
that guarantees both that the value is not copied and that normal array
attributes may be safely used.

If the mechanism requires that a subtype of Storage_Array with
appropriate bounds be declared, an attribute that gives the actual size
of an object or parameter in storage elements would be very useful, as
the calculation of this value from 'Size is repetitive and error prone,
and 'Max_Size_In_Storage_Elements may not be the correct value.

Does the language define such a mechanism, or should I prepare an AI?

If it already exists, the ARM should explicitly state it in 13.7.1. This
is essential for Ada's intended application area, and it is something
that a lot of people have missed.

****************************************************************

From: Randy Brukardt
Sent: Friday, January 10, 2003  9:01 PM

> There is a need for a language-defined mechanism to efficiently
> (unchecked) convert an arbitrary value of any type to (an appropriate
> subtype of) Storage_Array without copying the value and in such a way
> that attributes such as 'First, 'Last, 'Length, and 'range
> are valid for the result.

I'd say:
"Use the slice, Jeff, use the slice!"

I presume your point is that using Storage_IO to do this implies a copy,
which can be a problem in some circumstances.

When we've had to do stuff like this in Claw (although there we're usually
trying to get at a C string of some sort), we've generally used a
combination of unchecked conversions on accesses to the objects in question,
along with slices of the result. This would look something like:

    use System.Storage_Elements;
    type Blob ... -- The source type of interest. Better have a
                  -- contiguous representation.

    -- Convert to storage array:
    type Blob_Access is access all Blob;
    subtype Max_Storage_Array is Storage_Array (1 .. Storage_Offset'Last);
    type SA_Access is access all Max_Storage_Array;
    -- We use a maximum sized statically constrained Storage_Array here,
    -- because not all implementations use 'bare' pointers for dynamic arrays.
    function Convert_Access is new Unchecked_Conversion
       (Source => Blob_Access, Target => SA_Access);
    -- A "safe" Unchecked_Conversion; it has to work without making your
    -- program erroneous.

    -- Here's the conversion:
    ... Convert_Access(Blob_Object'Unchecked_Access).all(1 ..
          (Blob'Size+Storage_Element'Size-1)/Storage_Element'Size) ...

This conversion will copy nothing but one pointer and (possibly) a slice
descriptor; the generated code is very little, even though there is a lot of
text.

Unfortunately, you can't write a function to do this, because its parameters
must be of mode in, and thus you can only have access-to-constants of them.
(And, I suppose, the function would be at risk of copying its result
anyway).

This is unsafe only if you calculate the size of the type incorrectly (and
thus access beyond the end of the object). That's not likely (outside of
compiler bugs).

The real danger is if Blob is a type that can change size (that is, it is
a mutable type), or if the compiler has represented it in a non-contiguous
fashion. For a compiler like Janus/Ada, getting a contiguous object requires
being pretty careful. If the object is non-contiguous, there isn't any way
to get a simple Storage_Array from it without copying, simply because pieces
of the object are in different places. That's why Storage_IO implies
copying.

The point is that what you want isn't possible in general. And it's easy to
get when it is possible (although I don't see why you'd want to go to
Storage_Array -- generally it makes more sense to go to some
application-specific type or to Stream_Array). There might be some value in
having an attribute which would tell you if a particular object is
contiguously represented, because then at least you could bullet-proof the
program:

      if not Blob'Contiguous then
          raise Program_Error; -- Hey, we assume that!!
      end if;

but that's about all that I can see that is possible. (Janus/Ada has such an
attribute, called 'Is_Simple, but we've never documented it. That's odd,
because we do document the distinction between simple and non-simple types.
We use it a lot in the runtime, especially in generic units like
Sequential_IO.)
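
For reference, the fragments above can be assembled into a complete unit
along the following lines. This is only a minimal sketch: the record type
Blob, the procedure P and the names Slice_Demo and Count are hypothetical
stand-ins, Blob is assumed to be contiguous and not mutable, and the
element count is taken from the object's 'Size and rounded up. Whether the
access-to-access conversion behaves as intended remains
implementation-dependent.

```ada
with Ada.Text_IO;
with Ada.Unchecked_Conversion;
with System.Storage_Elements; use System.Storage_Elements;

procedure Slice_Demo is
   --  Hypothetical source type; assumed contiguous and not mutable.
   type Blob is record
      A : Integer;
      B : Integer;
   end record;

   type Blob_Access is access all Blob;

   --  Statically constrained, maximum-sized array, so that the access
   --  value does not need to carry bounds information.
   subtype Max_Storage_Array is Storage_Array (1 .. Storage_Offset'Last);
   type SA_Access is access all Max_Storage_Array;

   function Convert_Access is new Ada.Unchecked_Conversion
     (Source => Blob_Access, Target => SA_Access);

   --  Hypothetical consumer that relies on the normal array attributes.
   procedure P (Data : in Storage_Array) is
   begin
      Ada.Text_IO.Put_Line
        ("First =" & Storage_Offset'Image (Data'First) &
         ", Last =" & Storage_Offset'Image (Data'Last) &
         ", Length =" & Storage_Offset'Image (Data'Length));
   end P;

   --  The object must be aliased for 'Unchecked_Access to be legal.
   Blob_Object : aliased Blob := (A => 1, B => 2);

   --  Number of storage elements occupied by the object, rounded up.
   Count : constant Storage_Offset :=
     (Blob_Object'Size + System.Storage_Unit - 1) / System.Storage_Unit;
begin
   --  Only the access value (and possibly a slice descriptor) is copied.
   P (Convert_Access (Blob_Object'Unchecked_Access).all (1 .. Count));
end Slice_Demo;
```

The slice supplies the bounds that P sees, so 'First, 'Last and 'Length
are meaningful inside P without the object itself being copied.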

****************************************************************

From: Pascal Leroy
Sent: Saturday, January 11, 2003  2:59 AM

I fully agree with Randy's analysis.

****************************************************************

From: Jeffrey Carter
Sent: Saturday, January 11, 2003  6:03 PM

 ...
 > I presume your point is that using Storage_IO to do this implies a
 > copy, which can be a problem in some circumstances.

Yes.

 > -- Here's the conversion: ...
 > Convert_Access(Blob_Object'Unchecked_Access).all(1 ..
 > (Blob'Size+Storage_Element'Size-1)/Storage_Element'Size) ...
 >
 > This conversion will copy nothing but one pointer and (possibly) a
 > slice descriptor; the generated code is very little, even though
 > there is a lot of text.

OK. It seems like a lot of work for something that is so often needed in
Ada's intended application area. The object must be aliased, which seems
like an unnecessary restriction on the rest of the system. The number of
times I've seen people (including me) make errors calculating the number
of storage elements makes me want an attribute that provides this value.

In fact, I'd say your example should use Blob_Object'Size to calculate
this value, rather than Blob'Size, since the two can differ. I'd also
use Storage_Unit rather than Storage_Element'Size, since the two are
defined to be the same and the former is shorter. Being able to say
something like

(1 .. Blob_Object'Size_In_Storage_Elements)

would eliminate this source of errors.

 > This is unsafe only if you calculate the size of the type incorrectly
 >  (and thus access beyond the end of the object). That's not likely
 > (outside of compiler bugs).

In practice, I've found it fairly common.

 >
 > The real danger is if Blob is a type that can change size (that is,
 > it is a mutable type), or if the compiler has represented it in a
 > non-contiguous fashion. For a compiler like Janus/Ada, getting a
 > contiguous object requires being pretty careful. If the object is
 > non-contiguous, there isn't any way to get a simple Storage_Array
 > from it without copying, simply because pieces of the object are in
 > different places. That's why Storage_IO implies copying.
 >
 > The point is that what you want isn't possible in general. And it's
 > easy to get when it is possible (although I don't see why you'd want
 > to go to Storage_Array -- generally it makes more sense to go to some
 >  application-specific type or to Stream_Array). There might be some
 > value in having an attribute which would tell you if a particular
 > object is contiguously represented, because then at least you could
 > bullet-proof the program:

I've seen lots of projects where this kind of thing was needed, and
copying was not acceptable. One area is sticking bits into (or
getting them from) some hardware device, and letting the rest of the
application deal with the data as one of a large number of large types.
There are also applications that need to add error-correction bits and
interleave bits from multiple bytes to data that internally may be one
of a number of different types. Copying is usually more acceptable in
such cases, but the need to deal with lots of different types as a
sequence of bytes remains.

These internal types are usually carefully laid out at the bit level to
be contiguous, so the problem of contiguousitude doesn't arise.

I've always wondered about having both Stream_Element and
Storage_Element; I'd think that systems where stream and storage
elements were different would be fairly rare. If an application-specific
type were used, it would generally be equivalent to Storage_Array, so
using Storage_Array seems like the right thing to do.

At least the language allows a way to achieve this, even if it's not
very obvious. Is there any way to put an example in the ARM to point out
this approach to all the dummies like me and the people I've worked with?

Does anyone else prefer a different approach?

****************************************************************

From: Randy Brukardt
Sent: Saturday, January 11, 2003  8:57 PM

>...
> In fact, I'd say your example should use Blob_Object'Size to calculate
> this value, rather than Blob'Size, since the two can differ.

Yes, that's right, and what I meant.

> I'd also use Storage_Unit rather than Storage_Element'Size, since the two
> are defined to be the same and the former is shorter.

It's defined in a different place, and using it makes the reader figure out
that they're the same. Not a big deal; I much prefer either to simply using
'8' which a lot of people do (and they leave out the rounding, too); that
will usually work, until someone wants to compile the program on the U2200
or something like that.

> Being able to say something like
>
> (1 .. Blob_Object'Size_In_Storage_Elements)
>
> would eliminate this source of errors.

Does any compiler provide this now? It's simple enough to do that if there
was any demand, it must have been implemented by now. (Geez, I'm starting to
sound like Robert! :-)

>...
> I've always wondered about having both Stream_Element and
> Storage_Element; I'd think that systems where stream and storage
> elements were different would be fairly rare.

Janus/Ada 95 for the U2200 has these different. The machine has direct
accessibility to 36-bit words only, while I/O is done in 9-bit bytes. So
Storage_Unit = 36 and Stream_Element_Size = 9.

> If an application-specific type were used, it would generally be equivalent
> to Storage_Array, so using Storage_Array seems like the right thing to do.

Not really. Every time I've had to do this, the target type was
Interfaces.C.Char_Array. In practice, this is the same as Storage_Array, but
of course Ada is strongly typed, and we needed to use To_Ada to get a real
Ada string out of it in the end.

> At least the language allows a way to achieve this, even if it's not
> very obvious. Is there any way to put an example in the ARM to point out
> this approach to all the dummies like me and the people I've worked with?

This really doesn't have much to do with the ARM (which is a language
definition, not a language usage guide). There are a lot of Ada idioms that
aren't in the ARM. Wasn't somebody making a web page to collect Ada idioms??

****************************************************************

From: Jeffrey Carter
Sent: Sunday, January 12, 2003  8:27 PM

...
>>(1 .. Blob_Object'Size_In_Storage_Elements)
>>
>>would eliminate this source of errors.
>
> Does any compiler provide this now? It's simple enough to do that if there
> was any demand, it must have been implemented by now. (Geez, I'm starting to
> sound like Robert! :-)

Not that I know of, though I've wanted it on many occasions. To sound
like Robert: GNAT has 'Object_Size, but it's in bits, and applies to a type.

>>If an application-specific type were used, it would generally be equivalent
>>to Storage_Array, so using Storage_Array seems like the right thing to do.
>
> Not really. Every time I've had to do this, the target type was
> Interfaces.C.Char_Array. In practice, this is the same as Storage_Array, but
> of course Ada is strongly typed, and we needed to use To_Ada to get a real
> Ada string out of it in the end.

I was referring to the embedded applications I've been involved with.
Apparently your experience has been different.

****************************************************************

From: Robert Dewar
Sent: Sunday, January 12, 2003  8:49 PM

(1 .. Blob_Object'Size_In_Storage_Elements)

Seems silly when Blob_Object'Size / Storage_Unit will give the same result.

****************************************************************

From: Jeffrey Carter
Sent: Monday, January 13, 2003  12:46 PM

Randy and I both seem to feel that

(Blob_Object'Size + Storage_Unit - 1) / Storage_Unit

is needed to round up to the next multiple of Storage_Unit if
Blob_Object'Size is not such a multiple. It would be nice if your
version were correct for all platforms and compilers. Is that the case?
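
To make the difference concrete, here is a small sketch of the two
calculations on plain integers. The helper Elements_For and the sample
bit counts are hypothetical and chosen only for illustration; Unit stands
in for System.Storage_Unit so that a 36-bit machine can be shown
alongside the usual 8-bit case.

```ada
with Ada.Text_IO;

procedure Rounding_Demo is
   --  Hypothetical helper: number of storage elements needed to hold
   --  Bits bits when each storage element is Unit bits wide.
   function Elements_For (Bits : Natural; Unit : Positive) return Natural is
   begin
      return (Bits + Unit - 1) / Unit;
   end Elements_For;
begin
   --  Exact multiple: rounding changes nothing, (64 + 7) / 8 = 8.
   Ada.Text_IO.Put_Line (Natural'Image (Elements_For (64, 8)));

   --  One bit over: rounding up gives (65 + 7) / 8 = 9 ...
   Ada.Text_IO.Put_Line (Natural'Image (Elements_For (65, 8)));

   --  ... whereas plain division truncates, 65 / 8 = 8.
   Ada.Text_IO.Put_Line (Natural'Image (65 / 8));

   --  36-bit storage elements: (40 + 35) / 36 = 2.
   Ada.Text_IO.Put_Line (Natural'Image (Elements_For (40, 36)));
end Rounding_Demo;
```

The two forms agree whenever the size is an exact multiple of the storage
unit, so the question is only whether an object's 'Size can ever be
anything else.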

****************************************************************

From: Robert Dewar
Sent: Monday, January 13, 2003  3:35 PM

Not for an object size. I can't imagine any sensible compiler allocating
a stand-alone object in memory that does not occupy an integral number of
storage units. That's certainly the case with GNAT (nothing else would
make sense).

****************************************************************

From: Robert A. Duff
Sent: Monday, January 13, 2003  2:34 PM

In my experience, if I want to know the number of storage units,
then I'm doing something that requires the size to be an integer number
of storage units.  So I usually write something like this:

    pragma Assert(Blah'Size mod System.Storage_Unit = 0);
    ... Blah'Size / System.Storage_Unit ...

In many cases, there's no point in rounding up -- that answer is just as
wrong as any other.

****************************************************************

From: Pascal Leroy
Sent: Monday, January 13, 2003  4:03 AM

Agreed.  In fact the rounding approach is weird because it assumes that the
object starts on a storage unit boundary but might end in the middle of a
storage unit.  But then the object might as well start and end in the middle of
a storage unit, in which case the rounding would be entirely wrong -- think of
a 2-bit object straddling two storage units.

This leads me to the conclusion that Bob's assertion should really be:

    pragma Assert(Blah'Alignment /= 0 and
                  Blah'Size mod System.Storage_Unit = 0);
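
Put together, the check and the division might look like the sketch
below. The type Sample, the object Blah and the name Count are
hypothetical; note also that pragma Assert may need to be enabled by a
compiler option before it has any effect (with GNAT, for example,
assertions are ignored unless they are switched on).

```ada
with Ada.Text_IO;
with System;
with System.Storage_Elements; use System.Storage_Elements;

procedure Assert_Demo is
   --  Hypothetical type, used only for illustration.
   type Sample is record
      X, Y : Integer;
   end record;

   Blah : constant Sample := (X => 1, Y => 2);
begin
   --  The object must start on a storage element boundary and occupy
   --  a whole number of storage elements.
   pragma Assert (Blah'Alignment /= 0 and
                  Blah'Size mod System.Storage_Unit = 0);

   declare
      --  Safe to divide without rounding once the assertion holds.
      Count : constant Storage_Count := Blah'Size / System.Storage_Unit;
   begin
      Ada.Text_IO.Put_Line
        ("Storage elements:" & Storage_Count'Image (Count));
   end;
end Assert_Demo;
```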

****************************************************************
