Version 1.3 of ais/ai-00258.txt


!standard B.03 (50)          01-09-11 AI95-00258/03
!class binding interpretation 01-02-12
!status ARG approved 7-0-1 01-05-20
!status work item 01-02-12
!status received 01-02-12
!qualifier Omission
!priority Medium
!difficulty Medium
!subject Behavior of Interfaces.C.To_C when the result is null
!summary
A call to Interfaces.C.To_C with a null string parameter and Append_Nul False:
To_C(Item => X, Append_Nul => False)
raises Constraint_Error.
!question
B.3(50) defines the lower bound of the result of To_C (returning a char_array) to be 0. However, in the case where the result is a null char_array this doesn't make sense because char_array's index type is the modular type size_t, and so a null range must have a lower bound greater than zero. What should the bounds of the result array be in this case? (Constraint_Error is raised.)
!recommendation
(See summary.)
!wording
(See corrigendum.)
!discussion
Clearly, it is impossible to create a null char_array with a lower bound of 0: the index type size_t is modular, so no upper bound is less than 0.
There are two options here:
-- Return a null array with other bounds (such as 1..0).
-- Raise Constraint_Error.
Most Ada compilers return a char_array with the bounds 0 .. 4294967295 in this case (the result of the obvious implementation combined with the wraparound semantics of modular types, here with a 32-bit size_t), and that is obviously wrong. Either solution would be better than that.
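The wraparound can be reproduced with a minimal sketch that uses only Interfaces.C. The procedure name Wraparound_Demo and its local names are illustrative, not taken from any implementation. It computes the upper bound as Length - 1 in size_t, the way the "obvious implementation" would:

```ada
with Ada.Text_IO;
with Interfaces.C; use Interfaces.C;

procedure Wraparound_Demo is
   --  A null Ada string, as passed to To_C with Append_Nul => False.
   Item : constant String := "";

   --  Length of the would-be result.
   Length : constant size_t := size_t (Item'Length);

   --  size_t is modular, so 0 - 1 wraps around to size_t'Last
   --  instead of yielding a value below the lower bound.
   Upper : constant size_t := Length - 1;
begin
   Ada.Text_IO.Put_Line ("Upper bound:" & size_t'Image (Upper));
   Ada.Text_IO.Put_Line
     ("Equal to size_t'Last: " & Boolean'Image (Upper = size_t'Last));
end Wraparound_Demo;
```

The printed upper bound depends on the width of size_t on the target; the comparison with size_t'Last holds on any target.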
Raising Constraint_Error has the following advantages:
-- It is consistent with the resolution in Defect Report 8652/0062 of a similar issue in Interfaces.C.Strings.Value.
-- It preserves the invariant that the lower bound of the result is 0.
Thus, we select raising Constraint_Error as the resolution.
!corrigendum B.3(50)
Replace the paragraph:
The result of To_C is a char_array value of length Item'Length (if Append_Nul is False) or Item'Length+1 (if Append_Nul is True). The lower bound is 0. For each component Item(I), the corresponding component in the result is To_C applied to Item(I). The value nul is appended if Append_Nul is True.
by:
The result of To_C is a char_array value of length Item'Length (if Append_Nul is False) or Item'Length+1 (if Append_Nul is True). The lower bound is 0. For each component Item(I), the corresponding component in the result is To_C applied to Item(I). The value nul is appended if Append_Nul is True. If Append_Nul is False and Item'Length is 0, then To_C propagates Constraint_Error.
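As an illustration only, the revised rule could be implemented along the following lines. Sketch_To_C is a hypothetical stand-in for the String version of Interfaces.C.To_C, written for this note; it is not the code of any actual compiler, and To_C_Sketch_Demo is likewise an assumed name.

```ada
with Ada.Text_IO;
with Interfaces.C; use Interfaces.C;

procedure To_C_Sketch_Demo is

   --  Hypothetical stand-in for Interfaces.C.To_C (String version).
   function Sketch_To_C
     (Item       : String;
      Append_Nul : Boolean := True) return char_array is
   begin
      --  New rule: no null char_array can have a lower bound of 0.
      if Item'Length = 0 and then not Append_Nul then
         raise Constraint_Error;
      end if;

      declare
         --  Length is at least 1 here, so Length - 1 cannot wrap around.
         Length : constant size_t :=
           size_t (Item'Length) + Boolean'Pos (Append_Nul);
         Result : char_array (0 .. Length - 1);
      begin
         for I in Item'Range loop
            Result (size_t (I - Item'First)) := To_C (Item (I));
         end loop;
         if Append_Nul then
            Result (Result'Last) := nul;
         end if;
         return Result;
      end;
   end Sketch_To_C;

   Empty : constant String := "";
begin
   declare
      Result : constant char_array := Sketch_To_C (Empty, Append_Nul => False);
   begin
      Ada.Text_IO.Put_Line ("Length:" & size_t'Image (Result'Length));
   end;
exception
   when Constraint_Error =>
      Ada.Text_IO.Put_Line ("Constraint_Error raised for null result");
end To_C_Sketch_Demo;
```

Checking for the null case before computing the bounds keeps the modular subtraction away from 0 - 1, which is where the wraparound would otherwise occur.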
!ACATS test
A test case for this should be added to ACATS tests CXB3005 and CXB3007.
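A check of this kind might look like the following sketch. It is not the text of CXB3005 or CXB3007 and does not use the ACATS Report package; the procedure name and the PASSED/FAILED messages are assumptions made for illustration.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Interfaces.C; use Interfaces.C;

procedure Check_Null_To_C is
   Non_Null : constant String := "abc";
   Empty    : constant String := "";
begin
   --  A non-null result must keep the lower bound of 0.
   declare
      Result : constant char_array := To_C (Non_Null, Append_Nul => False);
   begin
      if Result'First /= 0 or else Result'Length /= 3 then
         Put_Line ("FAILED: wrong bounds for non-null result");
      end if;
   end;

   --  A null Item with Append_Nul => True yields a single nul.
   declare
      Result : constant char_array := To_C (Empty, Append_Nul => True);
   begin
      if Result'First /= 0 or else Result'Length /= 1 then
         Put_Line ("FAILED: wrong bounds for nul-only result");
      end if;
   end;

   --  A null Item with Append_Nul => False must raise Constraint_Error.
   --  The extra block is needed because an exception raised in a
   --  declarative part is not handled by that block's own handler.
   begin
      declare
         Result : constant char_array := To_C (Empty, Append_Nul => False);
      begin
         Put_Line ("FAILED: no exception; bounds" &
                   size_t'Image (Result'First) & " .." &
                   size_t'Image (Result'Last));
      end;
   exception
      when Constraint_Error =>
         Put_Line ("PASSED: Constraint_Error raised");
   end;
end Check_Null_To_C;
```

Inspecting the bounds directly, rather than assigning the result to another object, avoids passing by accident when a length check elsewhere raises the exception.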
!appendix

!topic Bounds for null result from function Interfaces.C.To_C
!reference RM95-B.3(50)
!from Gary Dismukes 01-01-24
!keywords interfacing char_array
!discussion

B.3(50) defines the lower bound of the result of To_C (returning
a char_array) to be 0.  However, in the case where the result is
a null char_array this doesn't make sense because char_array's
index type is the modular type size_t, and so a null range
must have a lower bound greater than zero.  What should the
bounds of the result array be in this case?  (One possibility
is to define the bounds to be 1..0 for the null result case,
which is what we've done for now in the GNAT implementation.)

****************************************************************

From: Pascal Leroy
Sent: Wednesday, January 24, 2001 4:37 AM

This question looks quite similar to AI 139 (aka DR 0062), and I think it
should be answered similarly, i.e. Constraint_Error is raised.  An empty
char_array is a weird thing in C anyway: strings in C are typically
nul-terminated, so the char_array won't be empty, it will contain one nul.

****************************************************************

From: dewar@gnat.com
Sent: Wednesday, January 24, 2001 8:27 AM

Well strings in C do NOT have to be nul-terminated, and the interface
reflects this fact. I would be dubious about introducing a CE here ...

****************************************************************

From: Tucker Taft
Sent: Wednesday, January 24, 2001 10:13 AM

This would be a doubly weird situation.  First, the caller would
have to override the default for Append_Nul, specifying False,
and then they would have to provide a null Ada string, producing
a zero-length C array, which is not permitted by the C standard
(try to declare a zero-length array in C -- the compiler will
complain).

So I would suggest you stick with a 0 low bound, and a Constraint_Error
if the high bound would be outside the base range of the index type.

Raising an exception here is probably doing the user a favor, because
they are creating an illegal C array object.

****************************************************************

From: Deller, Steve
Sent: Wednesday, January 24, 2001 3:14 PM
To: Ada-Comment List

There is nothing weird at all.  If one has a "nul" already in place, the
desire may be just to write the text at the address specified.  Also, in C
it is possible to have strings that are defined as an address and a count,
not depending on nul termination.  Or strings that are zero terminated only
if they are less than a specified length.

The string type in C does not, in any way, deal with "nul" termination,
except that it says literal strings have, for convenience, a nul appended so
they *may* be used with str* functions.

Here is the relevant section on string functions from the C manual:

  4.11.1 String function conventions
  memcpy, memset, memcmp, and memchr have been adopted from
  several existing implementations.  The general goal was to
  provide equivalent capabilities for three types of byte
  sequences:

      o   null-terminated strings (str-),
      o   null-terminated strings with a maximum length (strn-), and
      o   transparent data of specified length (mem-).

In short, I believe the function should NOT raise an exception just because
the input is of zero length.  Also, as a matter of good definition, I
believe the To_C and To_Ada functions should be invertible.

I wrote a small test that is appended.

For Rational Apex, the program returned:
  C_string lower bound: 0
  C_string upper bound: 4294967295
  Exception: CONSTRAINT_ERROR raised in inner handler 2

Note that the exception occurred during "To_Ada" not during "To_C".

With GNAT the program returned:
  C_string lower bound: 1
  C_string upper bound: 0
  Result string ''

I believe I prefer the GNAT solution of returning 1..0 without any
exception.

Regards
Steve Deller
deller@rational.com
PLEASE NOTE: In spite of the email address, I am *NOT* a Rational employee
and *DO NOT* speak for Rational in any way.  These are my *PERSONAL*
opinions.


with Text_Io;
with Interfaces.C;
use Interfaces.C;
with Ada.Exceptions;
use Ada.Exceptions;
procedure Test_To_C is
begin
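    declare
        -- Null Ada string; with Append_Nul => False the converted
        -- char_array would have length 0.  An exception raised by To_C
        -- in this declarative part propagates to the outer handler,
        -- not to the handler of this block.
        Ada_String : String (1 .. 0);
        C_String : Char_Array := To_C (Ada_String, Append_Nul => False);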
    begin
        Text_Io.Put_Line ("C_string lower bound:" &
                          Size_T'Image (C_String'First));
        Text_Io.Put_Line ("C_string upper bound:" &
                          Size_T'Image (C_String'Last));
    exception
        when E: others =>
            Text_Io.Put_Line ("Exception: " & Exception_Name (E) &
                              " raised in inner handler 1");
    end;
    declare
        -- Same conversion; the result is converted back with To_Ada below.
        Ada_String : String (1 .. 0);
        C_String : Char_Array := To_C (Ada_String, Append_Nul => False);
    begin
        Text_Io.Put_Line ("Result string is '" &
                          To_Ada (C_String, Trim_Nul => False) & "'");

    exception
        when E: others =>
            Text_Io.Put_Line ("Exception: " & Exception_Name (E) &
                              " raised in inner handler 2");
    end;
exception
    when E: others =>
        Text_Io.Put_Line ("Exception: " & Exception_Name (E) &
                          " raised in outer handler");
end Test_To_C;

****************************************************************

From: Pascal Leroy
Sent: Thursday, January 25, 2001 2:40 AM

Here is another relevant section from the C manual:

<<7.1.1 Definition of terms

A string is a contiguous sequence of characters terminated by and including
the first null character.  A "pointer to" a string is a pointer to its
initial (lowest addressed) character.  The "length" of a string is the
number of characters preceding the null character and its "value" is the
sequence of the values of the contained characters, in order.>>

By this definition, in C strings that are not null-terminated are just not
strings, they are a pair address+length or some such.

Note that there are things that you cannot do with a char_array that are
perfectly legitimate in C, e.g. having a string with negative bounds.  You
can do that in C by using a char* pointing in the middle of the string, and
keeping some extra bookkeeping information.  You can't do that in Ada with
char_array.  This might be a design mistake, but it's water under the bridge
at this point.

****************************************************************

From: Robert A Duff
Sent: Thursday, January 25, 2001 9:57 AM

> Well strings in C do NOT have to be nul-terminated, and the interface
> reflects this fact. I would be dubious about introducing a CE here ...

But char[0] is (annoyingly) illegal in C.  You don't have to terminate
your char arrays with nul, but you have to have at least one component.

So I think C_E is the right answer here.

****************************************************************

From: dewar@gnat.com
Sent: Thursday, January 25, 2001 10:59 AM

<<But char[0] is (annoyingly) illegal in C.  You don't have to terminate
your char arrays with nul, but you have to have at least one component.

So I think C_E is the right answer here.>>

That is not true of dynamically allocated stuff, since there is no
control. I really think C_E is annoying here.

****************************************************************

From: Michael Yoder
Sent: Thursday, January 25, 2001 5:32 PM

I have moderately strong sympathy for Robert's finding Constraint_Error to
be annoying, but feel more pain at losing the postcondition that the
result's lower bound is guaranteed to be zero.  However, a user who needs
(or wants) a particular behavior can write their version in terms of the
other; so a decision *either* way is better than having implementers choose
independently.
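As a rough sketch of writing one behavior in terms of the other: the wrapper below returns a null array with bounds 1 .. 0 for the null case and otherwise defers to Interfaces.C.To_C. The names To_C_Or_Null and Wrapper_Demo are hypothetical and chosen only for this illustration.

```ada
with Ada.Text_IO;
with Interfaces.C; use Interfaces.C;

procedure Wrapper_Demo is

   --  Hypothetical wrapper: behaves like To_C, except that a null Item
   --  with Append_Nul => False yields a null array with bounds 1 .. 0,
   --  whatever the underlying To_C does in that case.
   function To_C_Or_Null
     (Item       : String;
      Append_Nul : Boolean := True) return char_array
   is
      Null_Result : constant char_array (1 .. 0) := (others => nul);
   begin
      if Item'Length = 0 and then not Append_Nul then
         return Null_Result;
      else
         return To_C (Item, Append_Nul);
      end if;
   end To_C_Or_Null;

   Empty  : constant String := "";
   Result : constant char_array :=
     To_C_Or_Null (Empty, Append_Nul => False);
begin
   Ada.Text_IO.Put_Line ("Length:" & size_t'Image (Result'Length));
   Ada.Text_IO.Put_Line ("First: " & size_t'Image (Result'First));
end Wrapper_Demo;
```

The opposite direction is just as short: a wrapper that tests for the null case and raises Constraint_Error before calling To_C. Either way the caller loses nothing, provided the language fixes one behavior.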

****************************************************************

From: Randy Brukardt
Sent: Thursday, January 25, 2001 6:25 PM

I certainly agree with Mike that some decision should be made. Pascal's
argument of consistency seems to hold some water as well. (Although the
argument certainly is stronger for Interfaces.C.Strings.Value than it is for
Interfaces.C.To_C.)

I suspect that most implementations simply get this wrong: the obvious
implementation of returning
    Result : Char_Array (0 .. Length - 1);
combined with the modular type gives very large bounds, not null.

Whatever we decide, we certainly need an ACATS test to check it. There is an
ACATS test (case) for Interfaces.C.Strings.Value, although I think it might
fail to detect the error in some cases. (It tries to assign the result into
a null slice, and that might raise the expected constraint error even if the
function returns some other junk).
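The weakness of a check based on assignment into a null slice can be illustrated with a small sketch. This is not the ACATS test itself: Junk is a hypothetical faulty function that returns a non-null array where an exception was expected, and the other names are likewise assumptions.

```ada
with Ada.Text_IO;
with Interfaces.C; use Interfaces.C;

procedure Null_Slice_Check_Demo is

   --  Hypothetical faulty function: returns four components of junk
   --  instead of raising Constraint_Error itself.
   function Junk return char_array is
   begin
      return (0 .. 3 => nul);
   end Junk;

   Buffer : char_array (0 .. 9) := (others => nul);
begin
   --  The length check on the assignment fails (4 components into 0),
   --  so Constraint_Error comes from the assignment, not from Junk.
   Buffer (1 .. 0) := Junk;
   Ada.Text_IO.Put_Line ("No exception");
exception
   when Constraint_Error =>
      Ada.Text_IO.Put_Line
        ("Constraint_Error, although Junk returned normally");
end Null_Slice_Check_Demo;
```

A test that only expects Constraint_Error around such an assignment would pass here, even though the function under test never raised anything.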

****************************************************************

From: Robert A Duff
Sent: Friday, January 26, 2001 12:03 PM

> That is not true of dynamically allocated stuff, since there is no
> control. I really think C_E is annoying here.

I don't have the C standard at hand, but I'm using Harbison and
Steele's "C -- A Reference Manual", which is usually pretty accurate.

They say that malloc(0) "will return either a null pointer or an
implementation-defined unique pointer."  Same for calloc and the rest.

I don't feel strongly about the issue either way, but Mike Yoder is
absolutely right that the language should define it one way or the
other.  Let's decide that either decision soon is better than the
"right" decision later on.

- Bob

P.S. Wasn't Malloc an ancient Babylonian god or something?

****************************************************************

From: Randy Brukardt
Sent: Friday, January 26, 2001 1:51 PM

> I don't feel strongly about the issue either way, but Mike Yoder is
> absolutely right that the language should define it one way or the
> other.  Let's decide that either decision soon is better than the
> "right" decision later on.

I agree with Mike and Bob that a decision here is important, particularly
considering that most compilers do neither: they return something with
bounds of 0 .. Size_T'last. Eliminating *that* is important, how that's done
is not important.

In order to move this along, I'll volunteer to write up a short AI on this,
which hopefully we can dispose of in Leuven. (Better have a discussion time
limit, though!)

Any objections?

****************************************************************

From: Robert A Duff
Sent: Tuesday, January 30, 2001 10:59 AM

I wrote:

> ...malloc(0) "will return either a null pointer or an
> implementation-defined unique pointer."

The info below may be helpful.  ;-)

- Bob

> Return-Path: <wsduff@starpower.net>
> From: "Bill and Sally Duff" <wsduff@starpower.net>
> To: "Robert A Duff" <bobduff@WORLD.STD.COM>
> References: <200101261904.OAA24156@world.std.com>
> Subject: Re: malloc
> Date: Sat, 27 Jan 2001 12:54:24 -0500
>
> Perhaps you are referring to Moloch, also spelled Molech, and sometimes
> called Milcom, a god of the Ammonites.  It is evidently not the name of the
> god, but a title derived from "melek," meaning king in Hebrew and Canaanite,
> with the vowels replaced by the vowels in the word "boshet," shame, by the
> pious Hebrews who wrote the Bible in order to show contempt for the god.  He
> was indeed a god worthy of contempt, as his worship was associated with
> human sacrifices, e.g., incinerating children.  This must have been fairly
> widespread practice, even in the Israelite community because we are told at
> least two kings of Israel had a son "pass through the fire" of Moloch and
> the practice is condemned by several prophets.  I think the story of the
> binding of Isaac may have been told as an attempt to ban the practice of
> sacrificing sons.
>     The Ammonites were a not very important tribe who lived just southeast
> of Israelite territory, next to the desert.  They frequently engaged in war
> with Israel.  But Solomon is said to have married several Ammonite wives,
> for whom he built a shrine of Moloch in Jerusalem.  It was not until the
> reign of Hezekiah, 250 years later, that the shrine was destroyed.  So
> Moloch worship must have been fairly common.  Moloch is also associated with
> Ba'al (which was originally also a title, not a name; it means lord or
> husband), the Canaanite god.
>     Substituting vowels of one word for the vowels of another is a way they
> had of giving a new meaning to an old word.  For example, the name of God in
> Hebrew is YHWH.  When this word is written, they add the vowels in the word
> "adonai" into YHWH, making it look like YaHoWaH, from which we get the
> English word Jehovah.  When the Hebrew text is read by a Jew the word is
> pronounced "Adonai," which means Lord.  Thus, Jews avoid saying the actual
> name of God, which is considered too holy to be spoken.
>     By now you are sorry you asked the question.

****************************************************************
