Version 1.1.1.1 of ais/ai-00037.txt
!standard B.3 (20) 97-03-19 AI95-00037/06
!standard B.3 (31)
!class binding interpretation 96-02-06
!status WG9 approved 96-12-07
!status ARG approved 12-0-0 96-10-07
!status ARG approved 8-0-0 (subject to editorial review) 96-06-17
!status work item 95-11-01
!status received 95-06-25
!priority High
!difficulty Medium
!subject In Interfaces.C, nul and wide_nul represent zero
!summary 95-06-25
In package Interfaces.C, the type wchar_t is a discrete type.
The constants nul and wide_nul have implementation defined
values, which should have a representation of zero.
Types char and wchar_t may use a signed or unsigned
representation.
!question 96-09-15
The following declarations appear in Interfaces.C:
19 type char is <implementation-defined character type>;
20 nul : constant char := char'First;
...
30 type wchar_t is <implementation-defined>;
31 wide_nul : constant wchar_t := wchar_t'First;
The declaration of wide_nul seems to imply that wchar_t
supports the attribute First. What is the intent?
If char and/or wchar_t are signed integer types in the interfaced C
implementation, may the Ada implementation reflect that fact by using a
signed representation for char and/or wchar_t? (Yes.)
Note that if char and wchar_t have a signed representation, then
char'First and wchar_t'First will not have a zero representation.
Are the constants nul and wide_nul intended to be represented as
zero? (Yes.)
!recommendation 95-06-25
(See summary.)
!wording 96-07-23
Modify paragraphs (19,20,30,31) as follows:
19 type char is <implementation-defined character type>;
20 nul : constant char := <implementation-defined>;
...
30 type wchar_t is <implementation-defined discrete type>;
31 wide_nul : constant wchar_t := <implementation-defined>;
Add the following Implementation Advice:
The constants nul and wide_nul should have a representation of zero.
!discussion 96-04-04
The intent is that wchar_t be discrete.
The type char may have a signed representation. For example,
the implementation might have:
for char use (-128, -127, ..., 127);
In that case, char'First is the wrong value to use for nul;
the intent is that nul be represented as zero.
Similarly, wchar_t could be an enumeration type with a signed
representation, as for char. Wchar_t could also be a signed integer
type. Either way, wchar_t'First is the wrong value to use for wide_nul.
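The effect described above can be seen in a minimal sketch. The types
Tiny_Char and Tiny_Wide are hypothetical stand-ins (four and 256 values)
and are not declarations from Interfaces.C; Unchecked_Conversion is used
only to inspect the internal code of an enumeration value. The position-based
'Val (0) proposed in the appendix is included for comparison.

```ada
with Ada.Text_IO;
with Ada.Unchecked_Conversion;

procedure Signed_Rep_Sketch is

   --  Hypothetical four-value stand-in for a char-like enumeration type
   --  with a signed representation.
   type Tiny_Char is (Minus_Two, Minus_One, Zero, Plus_One);
   for Tiny_Char use
     (Minus_Two => -2, Minus_One => -1, Zero => 0, Plus_One => 1);
   for Tiny_Char'Size use 8;

   --  Hypothetical stand-in for a wchar_t modeled as a signed integer type.
   type Tiny_Wide is range -128 .. 127;
   for Tiny_Wide'Size use 8;

   --  Used only to view the internal code of a Tiny_Char value.
   function Code is new Ada.Unchecked_Conversion (Tiny_Char, Tiny_Wide);

begin
   --  'First is the lowest value; its internal code is -2, not zero.
   Ada.Text_IO.Put_Line (Tiny_Wide'Image (Code (Tiny_Char'First)));

   --  'Val (0) selects position number 0, which is again 'First.
   Ada.Text_IO.Put_Line (Tiny_Wide'Image (Code (Tiny_Char'Val (0))));

   --  The value whose representation is zero has to be named explicitly.
   Ada.Text_IO.Put_Line (Tiny_Wide'Image (Code (Zero)));

   --  For a signed integer type, 'First is the most negative value.
   Ada.Text_IO.Put_Line (Tiny_Wide'Image (Tiny_Wide'First));
end Signed_Rep_Sketch;
```

Neither 'First nor 'Val (0) yields the zero-represented value here, which
is why the wording leaves the initial values implementation defined.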
It is important to allow signed representations of char and wchar_t,
in order to properly match what the C implementation does.
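A program can check whether a given implementation follows the advice.
The following is a sketch only: the modular types Char_Bits and Wide_Bits
and the functions Char_Code and Wide_Code are hypothetical helpers, and the
sketch assumes that char'Size and wchar_t'Size are small enough for a
modular type of that width to be supported.

```ada
with Ada.Text_IO;
with Ada.Unchecked_Conversion;
with Interfaces.C;

procedure Check_Nul_Rep is
   package C renames Interfaces.C;

   --  Hypothetical helper types: modular types exactly as wide as char
   --  and wchar_t, used only to view the underlying bits.
   type Char_Bits is mod 2 ** C.char'Size;
   type Wide_Bits is mod 2 ** C.wchar_t'Size;

   function Char_Code is
     new Ada.Unchecked_Conversion (C.char, Char_Bits);
   function Wide_Code is
     new Ada.Unchecked_Conversion (C.wchar_t, Wide_Bits);
begin
   if Char_Code (C.nul) = 0 and then Wide_Code (C.wide_nul) = 0 then
      Ada.Text_IO.Put_Line ("nul and wide_nul are represented as zero");
   else
      Ada.Text_IO.Put_Line ("advice not followed by this implementation");
   end if;
end Check_Nul_Rep;
```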
!appendix
!section B.3(31)
!subject Can wchar_t be signed?
!reference RM95-B.3(31)
!from Norman Cohen
!reference as: 95-5119.b Norman H. Cohen 95-4-7>>
!discussion
Paragraph 30 indicates that the definition of Interfaces.C.wchar_t is
implementation defined. However, since wide_nul is meant to have a
representation of zero, the initialization of wide_nul to wchar_t'First
in paragraph 31 suggests that wchar_t can be an enumeration type or a
modular type, but not a signed integer type. Is that the intent?
****************************************************************
!section B.3(31)
!subject Can wchar_t be signed?
!reference RM95-B.3(31)
!reference 95-5119.b Norman Cohen 95-04-07
!from Tucker Taft 95-04-12
!reference as: 95-5127.b Tucker Taft 95-4-12>>
!discussion
> Paragraph 30 indicates that the definition of Interfaces.C.wchar_t is
> implementation defined. However, since wide_nul is meant to have a
> representation of zero, the initialization of wide_nul to wchar_t'First
> in paragraph 31 suggests that wchar_t can be an enumeration type or a
> modular type, but not a signed integer type. Is that the intent?
I suppose. Although ANSI C only says "integral" I would imagine that
almost all C implementations use unsigned for wchar_t. There is probably
no harm in wchar_t being modeled as an unsigned type, even if it is
signed in the C implementation.
****************************************************************
!section B.3(31)
!subject Can wchar_t be signed
!reference RM95-B.3(31)
!reference AI95-00037/00
!from Bjorn Kallberg
!reference as: 95-5215.a Bjorn Kallberg 95-7-10>>
!discussion
In the proposed AI95, it is suggested that Interfaces.C.wchar_t is not a
signed integer type, even if the C implementation has a signed representation.
A better solution is to change (31) to say that
wide_nul : constant wchar_t := implementation-defined.
instead of the current
wide_nul : constant wchar_t := wchar_t'first;
The intent of the Interfaces.C is to give an Ada representation of the C
types. With the definition of nul as the wchar_t'first, additional
semantics which do not exist in C are added.
The following is one example where the added restriction is harmful.
There may be others.
We have a C implementation where wide_char is signed, and we also
have a C header file, defining some character values to
positive and negative values.
Automatic translation of this header file to Ada will not work.
The above problem is of course not a major one, but as there is no
argument presented except "not harmful" for the other view, it seems
much better to stay with the original intent, to mimic the C definition
as closely as possible.
Also, most other constants in this package are implementation defined.
/björn
****************************************************************
!section B.3(31)
!subject Can wchar_t be signed?
!reference RM95-B.3 (31)
!reference AI95-00037
!from Pascal Leroy 95-10-20
!reference as: 95-5359.b Pascal Leroy 95-10-20>>
!discussion
This AI states that "there seems to be no harm in requiring that wchar_t not
be signed." Making wchar_t unsigned when the corresponding C type is signed
has at least one unpleasant consequence: it makes it difficult to translate
algorithms written in C (some people do that...). If some algorithm
explicitly uses the fact that the C type wchar_t is signed, you need extensive
surgery to rewrite it in Ada.
Also, the appendix says "I would imagine that almost all C implementations use
unsigned for wchar_t." That's a nice story, but it is not true. At least one
widely available OS (Solaris) has wchar_t defined as long.
I would prefer to change the declaration of wide_nul to read:
wide_nul : constant wchar_t := wchar_t'val (0);
which would have the desired effect if wchar_t is enumerated, signed or
unsigned (and would allow a signed wchar_t).
****************************************************************
!section B.3(31)
!subject Can wchar_t be signed
!reference AI95-00037
!reference B.3(31)
!from Tucker Taft 95-10-30
!reference 95-5373.d Tucker Taft 95-10-30>>
!discussion
I don't agree with the recommendation of this AI.
The whole point of Interfaces.C is to match the characteristics
of the corresponding C types in so far as practical. It seems
straightforward to allow wchar_t to be signed or unsigned.
The suggestion that wide_nul be defined as wchar_t'Val(0) seems to
resolve any problems with it being signed.
****************************************************************
!section B.3(31)
!subject Can wchar_t be signed?
!reference RM95-B.3(31)
!reference AI95-00037/03
!from Keith Thompson 96-06-21
!reference 96-5605.a Keith Thompson 96-6-21>>
!discussion
Both Pascal Leroy and Tucker Taft suggest that wide_nul should be
defined as follows:
wide_nul : constant wchar_t := wchar_t'val (0);
This can be incorrect if wchar_t is an enumeration type with an
enumeration representation clause. Specifically:
type wchar_t is (...); -- 65536 values
for wchar_t use (-32768, -32767, ..., 0, ..., 32766, 32767);
Then wchar_t'Val(0) is the same as wchar_t'First, which is not the value
whose internal representation is 0.
Suggestion: leave the initialization for wide_nul implementation-defined,
with a comment indicating that it should (shall?) have an internal
representation of 0. An alternative would be to disallow the above
declaration of wchar_t, but I see little point in doing so.
****************************************************************
!section B.3(19)
!subject Interfaces.C.Char
!reference RM95-B.3(19)
!reference RM95-B.3(30)
!from Keith Thompson 96-12-05
!reference 96-5778.a Keith Thompson 96-12-5>>
!discussion
May the type Interfaces.C.Char be declared as
type char is new Character;
?
May Interfaces.C.wchar_t be declared as
type wchar_t is new Wide_Character;
?
If so, type conversions between Character and char, and between
Wide_Character and wchar_t, are legal under some implementations and
not under others.
Given the wording in the RM, I believe the answer is yes in both cases.
This is an interesting source of quietly non-portable programs.
****************************************************************
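The portability concern raised in the last message can be sidestepped by
using the conversion functions that Interfaces.C itself declares. A minimal
sketch follows; the procedure name and the object names are illustrative
only, and the commented-out declaration shows the form that is legal only
on some implementations.

```ada
with Ada.Text_IO;
with Interfaces.C;

procedure Portable_Char_Conversion is
   package C renames Interfaces.C;

   Ada_Char : constant Character := 'A';

   --  Not portable: legal only when char is derived from Character.
   --  C_Char : constant C.char := C.char (Ada_Char);

   --  Portable: To_C and To_Ada are declared in Interfaces.C.
   C_Char : constant C.char := C.To_C (Ada_Char);
begin
   Ada.Text_IO.Put (C.To_Ada (C_Char));
   Ada.Text_IO.New_Line;
end Portable_Char_Conversion;
```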