!standard B.03 (50)                                   01-09-11  AI95-00258/03
!class binding interpretation 01-02-12
!status Amendment 200Y 02-05-09
!status WG9 Approved 01-10-05
!status ARG Approved 7-0-1  01-05-20
!status work item 01-02-12
!status received 01-02-12
!qualifier Omission
!priority Medium
!difficulty Medium
!subject Behavior of Interfaces.C.To_C when the result is null

!summary

A call to Interfaces.C.To_C with a null string parameter and Append_Nul False:
    To_C(Item => X, Append_Nul => False)
raises Constraint_Error.

!question

B.3(50) defines the lower bound of the result of To_C (returning a char_array)
to be 0. However, in the case where the result is a null char_array this
doesn't make sense because char_array's index type is the modular type size_t,
and so a null range must have a lower bound greater than zero.

What should the bounds of the result array be in this case? (Constraint_Error
is raised.)

!recommendation

(See summary.)

!wording

(See corrigendum.)

!discussion

Clearly, it is impossible to create a null string with a lower bound of 0.
There are two options here:
    -- Return a null array with other bounds (such as 1..0).
    -- Raise Constraint_Error.

Most Ada compilers return strings with the bounds 0 .. 4294967295 in this case
(which is the result of the obvious implementation combined with the
wraparound semantics of modular types), and that is obviously wrong. Either
solution would be better than that.

Raising Constraint_Error has the following advantages:
    -- It is consistent with the resolution in Defect Report 8652/0062 of a
       similar issue in Interfaces.C.Strings.Value.
    -- It preserves the invariant that the lower bound of the result is 0.

Thus, we select raising Constraint_Error as the resolution.

!corrigendum B.3(50)

@drepl
@xindent
@dby
@xindent

!ACATS test

A test case for this should be added to ACATS tests CXB3005 and CXB3007.
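
The wraparound described in the discussion, and the kind of check the ACATS
note asks for, can be sketched as follows. This is only an illustrative
sketch, not the text of CXB3005 or CXB3007. The names Check_Null_To_C and
Naive_Last are hypothetical, and Naive_Last merely assumes the "obvious"
Length - 1 computation that the discussion refers to.

```ada
with Ada.Text_IO;
with Interfaces.C; use Interfaces.C;

procedure Check_Null_To_C is

   Empty : constant String (1 .. 0) := "";

   --  Hypothetical helper: the "obvious" upper bound computation.
   --  size_t is modular, so 0 - 1 wraps around to size_t'Last
   --  instead of raising an exception.
   function Naive_Last (Item : String) return size_t is
   begin
      return size_t (Item'Length) - 1;
   end Naive_Last;

begin
   Ada.Text_IO.Put_Line
     ("Naive upper bound:" & size_t'Image (Naive_Last (Empty)));

   --  Under the resolution of this AI, the call to To_C below should
   --  raise Constraint_Error rather than return any array.
   declare
      Result : constant char_array := To_C (Empty, Append_Nul => False);
   begin
      Ada.Text_IO.Put_Line
        ("FAILED: no exception, bounds" & size_t'Image (Result'First)
         & " .." & size_t'Image (Result'Last));
   end;
exception
   when Constraint_Error =>
      Ada.Text_IO.Put_Line ("PASSED: Constraint_Error raised");
end Check_Null_To_C;
```

On an implementation that follows this resolution, the program should report
PASSED. An implementation that still returns bounds of 0 .. size_t'Last may
instead fail in some other way (for example with Storage_Error while
elaborating Result), which this sketch does not attempt to handle.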
!appendix

!topic Bounds for null result from function Interfaces.C.To_C
!reference RM95-B.3(50)
!from Gary Dismukes 01-01-24
!keywords interfacing char_array
!discussion

B.3(50) defines the lower bound of the result of To_C (returning a char_array)
to be 0. However, in the case where the result is a null char_array this
doesn't make sense because char_array's index type is the modular type size_t,
and so a null range must have a lower bound greater than zero.

What should the bounds of the result array be in this case?

(One possibility is to define the bounds to be 1..0 for the null result case,
which is what we've done for now in the GNAT implementation.)

****************************************************************

From: Pascal Leroy
Sent: Wednesday, January 24, 2001 4:37 AM

This question looks quite similar to AI 139 (aka DR 0062), and I think it
should be answered similarly, i.e. Constraint_Error is raised.

An empty char_array is a weird thing in C anyway: strings in C are typically
nul-terminated, so the char_array won't be empty, it will contain one nul.

****************************************************************

From: dewar@gnat.com
Sent: Wednesday, January 24, 2001 8:27 AM

Well strings in C do NOT have to be nul-terminated, and the interface
reflects this fact. I would be dubious about introducing a CE here ...

****************************************************************

From: Tucker Taft
Sent: Wednesday, January 24, 2001 10:13 AM

This would be a doubly weird situation. First, the caller would have to
override the default for Append_Nul, specifying False, and then they would
have to provide a null Ada string, producing a zero-length C array, which is
not permitted by the C standard (try to declare a zero-length array in C --
the compiler will complain).

So I would suggest you stick with a 0 low bound, and a Constraint_Error if
the high bound would be outside the base range of the index type.
Raising an exception here is probably doing the user a favor, because they
are creating an illegal C array object.

****************************************************************

From: Deller, Steve
Sent: Wednesday, January 24, 2001 3:14 PM
To: Ada-Comment List

There is nothing weird at all. If one has a "nul" already in place, the
desire may be just to write the text at the address specified.

Also, in C it is possible to have strings that are defined as an address and
a count, not depending on nul termination. Or strings that are zero
terminated only if they are less than a specified length.

The string type in C does not, in any way, deal with "nul" termination,
except that it says literal strings have, for convenience, a nul appended so
they *may* be used with str* functions.

Here is the relevant section on string functions from the C manual:

   4.11.1 String function conventions

   memcpy, memset, memcmp, and memchr have been adopted from several
   existing implementations. The general goal was to provide equivalent
   capabilities for three types of byte sequences:

      o null-terminated strings (str-),
      o null-terminated strings with a maximum length (strn-), and
      o transparent data of specified length (mem-).

In short, I believe the function should NOT raise an exception just because
the input is of zero length.

Also, as a matter of good definition, I believe the To_C and To_Ada functions
should be invertible.

I wrote a small test that is appended. For Rational Apex, the program
returned:

   C_string lower bound: 0
   C_string upper bound: 4294967295
   Exception: CONSTRAINT_ERROR raised in inner handler 2

Note that the exception occurred during "To_Ada" not during "To_C".

With GNAT the program returned:

   C_string lower bound: 1
   C_string upper bound: 0
   Result string ''

I believe I prefer the GNAT solution of returning 1..0 without any exception.
Regards
Steve Deller
deller@rational.com

PLEASE NOTE: In spite of the email address, I am *NOT* a Rational employee
and *DO NOT* speak for Rational in any way. These are my *PERSONAL* opinions.

with Text_Io;
with Interfaces.C; use Interfaces.C;
with Ada.Exceptions; use Ada.Exceptions;
procedure Test_To_C is
begin
    declare
        Ada_String : String (1 .. 0);
        C_String : Char_Array := To_C (Ada_String, Append_Nul => False);
    begin
        Text_Io.Put_Line ("C_string lower bound:" &
                          Size_T'Image (C_String'First));
        Text_Io.Put_Line ("C_string upper bound:" &
                          Size_T'Image (C_String'Last));
    exception
        when E: others =>
            Text_Io.Put_Line ("Exception: " & Exception_Name (E) &
                              " raised in inner handler 1");
    end;
    declare
        Ada_String : String (1 .. 0);
        C_String : Char_Array := To_C (Ada_String, Append_Nul => False);
    begin
        Text_Io.Put_Line ("Result string is '" &
                          To_Ada (C_String, Trim_Nul => False) & "'");
    exception
        when E: others =>
            Text_Io.Put_Line ("Exception: " & Exception_Name (E) &
                              " raised in inner handler 2");
    end;
exception
    when E: others =>
        Text_Io.Put_Line ("Exception: " & Exception_Name (E) &
                          " raised in outer handler");
end Test_To_C;

****************************************************************

From: Pascal Leroy
Sent: Thursday, January 25, 2001 2:40 AM

Here is another relevant section from the C manual:

<<7.1.1 Definition of terms

A string is a contiguous sequence of characters terminated by and including
the first null character. A "pointer to" a string is a pointer to its initial
(lowest addressed) character. The "length" of a string is the number of
characters preceding the null character and its "value" is the sequence of
the values of the contained characters, in order.>>

By this definition, in C strings that are not null-terminated are just not
strings, they are a pair address+length or some such.

Note that there are things that you cannot do with a char_array that are
perfectly legitimate in C, e.g. having a string with negative bounds.
You can do that in C by using a char* pointing in the middle of the string,
and keeping some extra bookkeeping information. You can't do that in Ada with
char_array. This might be a design mistake, but it's water under the bridge
at this point.

****************************************************************

From: Robert A Duff
Sent: Thursday, January 25, 2001 9:57 AM

> Well strings in C do NOT have to be nul-terminated, and the interface
> reflects this fact. I would be dubious about introducing a CE here ...

But char[0] is (annoyingly) illegal in C. You don't have to terminate your
char arrays with nul, but you have to have at least one component.

So I think C_E is the right answer here.

****************************************************************

From: dewar@gnat.com
Sent: Thursday, January 25, 2001 10:59 AM

<>

That is not true of dynamically allocated stuff, since there is no
control. I really think C_E is annoying here.

****************************************************************

From: Michael Yoder
Sent: Thursday, January 25, 2001 5:32 PM

I have moderately strong sympathy for Robert's finding Constraint_Error to be
annoying, but feel more pain at losing the postcondition that the result's
lower bound is guaranteed to be zero. However, a user who needs (or wants) a
particular behavior can write their version in terms of the other; so a
decision *either* way is better than having implementers choose
independently.

****************************************************************

From: Randy Brukardt
Sent: Thursday, January 25, 2001 6:25 PM

I certainly agree with Mike that some decision should be made. Pascal's
argument of consistency seems to hold some water as well. (Although the
argument certainly is stronger for Interfaces.C.Strings.Value than it is for
Interfaces.C.To_C.)

I suspect that most implementations simply get this wrong: the obvious
implementation of returning

    Result : Char_Array (0 .. Length - 1);

combined with the modular type gives very large bounds, not null.

Whatever we decide, we certainly need an ACATS test to check it. There is an
ACATS test (case) for Interfaces.C.Strings.Value, although I think it might
fail to detect the error in some cases. (It tries to assign the result into a
null slice, and that might raise the expected Constraint_Error even if the
function returns some other junk.)

****************************************************************

From: Robert A Duff
Sent: Friday, January 26, 2001 12:03 PM

> That is not true of dynamically allocated stuff, since there is no
> control. I really think C_E is annoying here.

I don't have the C standard at hand, but I'm using Harbison and Steele's
"C -- A Reference Manual", which is usually pretty accurate. They say that
malloc(0) "will return either a null pointer or an implementation-defined
unique pointer." Same for calloc and the rest.

I don't feel strongly about the issue either way, but Mike Yoder is
absolutely right that the language should define it one way or the
other. Let's decide that either decision soon is better than the
"right" decision later on.

- Bob

P.S. Wasn't Malloc an ancient Babylonian god or something?

****************************************************************

From: Randy Brukardt
Sent: Friday, January 26, 2001 1:51 PM

> I don't feel strongly about the issue either way, but Mike Yoder is
> absolutely right that the language should define it one way or the
> other. Let's decide that either decision soon is better than the
> "right" decision later on.

I agree with Mike and Bob that a decision here is important, particularly
considering that most compilers do neither: they return something with bounds
of 0 .. Size_T'last. Eliminating *that* is important, how that's done is not
important.

In order to move this along, I'll volunteer to write up a short AI on this,
which hopefully we can dispose of in Leuven. (Better have a discussion time
limit, though!)
Any objections?

****************************************************************

From: Robert A Duff
Sent: Tuesday, January 30, 2001 10:59 AM

I wrote:

> ...malloc(0) "will return either a null pointer or an
> implementation-defined unique pointer."

The info below may be helpful. ;-)

- Bob

> Return-Path:
> From: "Bill and Sally Duff"
> To: "Robert A Duff"
> References: <200101261904.OAA24156@world.std.com>
> Subject: Re: malloc
> Date: Sat, 27 Jan 2001 12:54:24 -0500
>
> Perhaps you are referring to Moloch, also spelled Molech, and sometimes
> called Milcom, a god of the Ammonites. It is evidently not the name of the
> god, but a title derived from "melek," meaning king in Hebrew and Canaanite,
> with the vowels replaced by the vowels in the word "boshet," shame, by the
> pious Hebrews who wrote the Bible in order to show contempt for the god. He
> was indeed a god worthy of contempt, as his worship was associated with
> human sacrifices, e.g., incinerating children. This must have been fairly
> widespread practice, even in the Israelite community because we are told at
> least two kings of Israel had a son "pass through the fire" of Moloch and
> the practice is condemned by several prophets. I think the story of the
> binding of Isaac may have been told as an attempt to ban the practice of
> sacrificing sons.
> The Ammonites were a not very important tribe who lived just southeast
> of Israelite territory, next to the desert. They frequently engaged in war
> with Israel. But Solomon is said to have married several Ammonite wives,
> for whom he built a shrine of Moloch in Jerusalem. It was not until the
> reign of Hezekiah, 250 years later, that the shrine was destroyed. So
> Moloch worship must have been fairly common. Moloch is also associated with
> Ba'al (which was originally also a title, not a name; it means lord or
> husband), the Canaanite god.
> Substituting vowels of one word for the vowels of another is a way they
> had of giving a new meaning to an old word. For example, the name of God in
> Hebrew is YHWH. When this word is written, they add the vowels in the word
> "adonai" into YHWH, making it look like YaHoWaH, from which we get the
> English word Jehovah. When the Hebrew text is read by a Jew the word is
> pronounced "Adonai," which means Lord. Thus, Jews avoid saying the actual
> name of God, which is considered too holy to be spoken.
> By now you are sorry you asked the question.

****************************************************************