Version 1.2 of ai05s/ai05-0182-1.txt


!standard 3.5(56/2)          10-08-05 AI05-0182-1/02
!class binding interpretation 09-10-30
!status work item 09-10-30
!status received 09-10-22
!priority Low
!difficulty Easy
!qualifier Omission
!subject Preciseness of S'Value
!summary
S'Wide_Wide_Value, S'Wide_Value, and S'Value may allow additional representations for character values.
!question
What happens in the following cases:
(1) What should Character'Value ("'" & Character'Val(16) & "'") do?
Should it return Character'Val(16), or raise Constraint_Error?
(2) What should Character'Value ("HEX_00000041") do? Return 'A', or
raise Constraint_Error?
!response
(See summary.)
!wording
Add after 3.5(56/2) [Implementation Permissions]
An implementation may extend the Wide_Wide_Value, Wide_Value, and Value attributes of a character type to allow character literals for nongraphic characters, and strings starting with "Hex_" for graphic characters and for characters whose position number is less than 16#100#.
!discussion
The questioner goes on to report that both examples work (do not raise Constraint_Error) in GNAT.
It's clear from the Standard that both of these should raise Constraint_Error; the first is not an enumeration literal if the center character is a nongraphic character, and the second is not the 'Image of a nongraphic character.
But that implies extra code in 'Value to reject these cases. Since this is a runtime function, that is extra code that is added to every program that uses 'Value (and depending on the runtime model, possibly to every program whether or not 'Value is used). That extra code would need to include a table of graphic characters in Unicode, so it is not trivial in size. But this extra code is not helping the user any: both the user and the runtime know what the intended answer is -- the runtime is just not allowed to provide it.
That seems stupid. Therefore we relax the requirement to allow any value with the proper syntax (''' & <any character> & ''') and Hex_hhhhhhhh.
[Editor's note: Should we require or just allow this additional flexibility? I'd be in favor of requiring it, but perhaps that is too much.
If this extra flexibility is considered a bad idea, I recommend that we classify this question a pathology so that it is never tested - effectively allowing the GNAT implementation.]
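The relaxed "Hex_hhhhhhhh" form described above can be sketched as follows. This is only an illustration: Lenient_Hex_Value is a hypothetical name, not part of any Ada runtime, and the sketch assumes that Integer is at least 32 bits wide and that the prefix is matched without regard to case. Unlike 'Value, it does not strip leading or trailing blanks.

```ada
with Ada.Characters.Handling;

--  Hypothetical helper, not taken from any Ada runtime: maps a string of
--  the form "Hex_hhhhhhhh" (prefix matched without regard to case) to the
--  character at that position, with no graphic/nongraphic check.
function Lenient_Hex_Value (S : String) return Wide_Wide_Character is
   use Ada.Characters.Handling;
begin
   --  Require the 4-character prefix followed by exactly 8 characters.
   if S'Length /= 12
     or else To_Upper (S (S'First .. S'First + 3)) /= "HEX_"
   then
      raise Constraint_Error;
   end if;

   --  Reject anything but hexadecimal digits (for example underscores,
   --  which Integer'Value would otherwise accept inside a based literal).
   for I in S'First + 4 .. S'Last loop
      if not Is_Hexadecimal_Digit (S (I)) then
         raise Constraint_Error;
      end if;
   end loop;

   --  Let Integer'Value convert the based literal; values above
   --  Integer'Last raise Constraint_Error (assumes a 32-bit Integer).
   return Wide_Wide_Character'Val
     (Integer'Value ("16#" & S (S'First + 4 .. S'Last) & "#"));
end Lenient_Hex_Value;
```

Under these assumptions no table of graphic characters is needed: the conversion depends only on the syntax of the string.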
!ACATS Test
As an Implementation Permission, this is not usefully testable. (The ACATS does not try to determine whether a permission is used.)
!appendix

!topic 'Value on character types
!reference 3.5(39.4/2), 2.1
!from Adam Beneschan 09-10-22
!discussion

I think I know the answers to these, but I wanted to clarify (and bring the
issue up in case anyone thinks the behavior should be different):

(1) What should Character'Value ("'" & Character'Val(16) & "'") do?
    Should it return Character'Val(16), or raise Constraint_Error?

(2) What should Character'Value ("HEX_00000041") do?  Return 'A', or
    raise Constraint_Error?

The way I read the RM, both should raise Constraint_Error.  In the first case,
'<c>' is not an enumeration literal if <c> is a nongraphic character; in the
second case, "HEX_00000041" does not correspond to the 'Image of a nongraphic
character (neither would "HEX_00000010", for that matter, since the 'Image of
Character'Val(16) is "DLE" and not "HEX_00000010").

I'm just bringing this up because someone could argue that, since there's no
requirement that the argument of 'Value be exactly the same as the result of
'Image, for numeric types e.g., Character'Value should allow any string of the
form '<c>' and any string of the form HEX_dddddddd with a valid 8-digit hex
number, for consistency.  I don't have a particular preference (except that
doing it the latter way means less work for me :)).

(GNAT does seem to accept any 3-character string with quote marks.  It raises
Constraint_Error on a HEX_dddddddd string unless the first three characters have
the letter case "Hex", which I think is a bug; but it accepts any "Hex_dddddddd"
string with valid hex digits, regardless of whether the result is a graphic or
control character.)

****************************************************************

From: Randy Brukardt
Sent: Thursday, October 22, 2009  11:54 PM

> The way I read the RM, both should raise Constraint_Error.
> In the first case, '<c>' is not an enumeration literal if <c> is a
> nongraphic character; in the second case, "HEX_00000041"
> does not correspond to the 'Image of a nongraphic character (neither
> would "HEX_00000010", for that matter, since the 'Image of
> Character'Val(16) is "DLE" and not "HEX_00000010").

I agree with your reading. Not sure whether it is a good idea, though,
especially in the latter case.

> I'm just bringing this up because someone could argue that, since
> there's no requirement that the argument of 'Value be exactly the same
> as the result of 'Image, for numeric types e.g., Character'Value
> should allow any string of the form '<c>' and any string of the form
> HEX_dddddddd with a valid 8-digit hex number, for consistency.  I
> don't have a particular preference (except that doing it the latter
> way means less work for me :)).

'Value for enumeration types surely allows many strings that 'Image can't
produce. Besides the obvious case of the leading and trailing blanks, there is
also the fact that lower case versions of (identifier) literals are accepted.
('Image always returns literals in upper case.)
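
A minimal sketch of the leniency just described, using a hypothetical local type Color alongside the predefined Boolean. It relies only on the rules that 'Value ignores leading and trailing spaces and matches identifier literals without regard to case.

```ada
with Ada.Text_IO;

procedure Value_Leniency is
   type Color is (Red, Green, Blue);  --  hypothetical example type
begin
   --  Leading and trailing blanks are ignored by 'Value.
   Ada.Text_IO.Put_Line (Color'Image (Color'Value ("  Green ")));

   --  Identifier literals are matched without regard to case,
   --  but 'Image always produces the upper case form.
   Ada.Text_IO.Put_Line (Color'Image (Color'Value ("blue")));
   Ada.Text_IO.Put_Line (Boolean'Image (Boolean'Value (" true")));
end Value_Leniency;
```

None of the three argument strings can be produced by 'Image, yet each is accepted.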

What worries me here is the runtime overhead of checking the character class of
the middle character and of the result of the conversion of HEX_dddddddd. It
seems completely pointless to make such a check on the latter - it would be easy
to do for the Latin-1 part (it's not allowed there), but for the rest of Unicode
you'd need a character class chart. That's pretty big, and not something you'd
want to drag into programs. And it would be a lot of work to *avoid* dragging it
in if it is part of Character'Value.
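
For the Latin-1 part, a strict check of the three-character form is indeed small. The sketch below is an assumption about how such a check could look, not how any compiler does it; Is_Strict_Character_Literal is a hypothetical name, and blank stripping is left out.

```ada
with Ada.Characters.Handling;

--  Hypothetical strict check for type Character only: accepts '<c>' when
--  <c> is a graphic character, as the letter of the standard requires.
function Is_Strict_Character_Literal (S : String) return Boolean is
begin
   return S'Length = 3
     and then S (S'First) = '''
     and then S (S'Last) = '''
     and then Ada.Characters.Handling.Is_Graphic (S (S'First + 1));
end Is_Strict_Character_Literal;
```

Is_Graphic covers only Character; the same check for the wider character types is where the classification chart would come in.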

> (GNAT does seem to accept any 3-character string with quote marks.  It
> raises Constraint_Error on a HEX_dddddddd string unless the first
> three characters have the letter case "Hex", which I think is a bug;
> but it accepts any "Hex_dddddddd"
> string with valid hex digits, regardless of whether the result is a
> graphic or control character.)

I suspect that we want to rethink the rules for 'Value (well, technically for
'Wide_Wide_Value) in order that we aren't dragging along a big runtime overhead
that really doesn't help the user any. What possible advantage is there to
forcing the user to decide whether or not a character is a graphic character
before calling 'Value??

****************************************************************

From: Robert Dewar
Sent: Friday, July 9, 2010  1:47 PM

I really think it would be a mistake to insist on the letter of the standard
here. Not only (as the AI points out) would this introduce a lot of complexity
into the implementation, but it is actively unhelpful to have these
restrictions.

It's five minutes' work to implement these restrictions; I just drag in the giant
unit that does Unicode classifications and call the appropriate routine, but I
really dislike doing this since it damages the utility of the implementation for
no good reason.

Rather than regard this as a pathology, why not add implementation permission to
allow these additional forms. Or general permission to add implementation
defined additional forms (either is OK with me).

In the absence of this IP, I am inclined to change GNAT to cripple it as
suggested by the RM. It's not that important, and I don't like having this
discrepancy with the RM.

****************************************************************

From: Randy Brukardt
Sent: Friday, July 9, 2010  6:07 PM

> I really think it would be a mistake to insist on the letter of the
> standard here. Not only (as the AI points out) would this introduce a
> lot of complexity into the implementation, but it is actively
> unhelpful to have these restrictions.

We discussed this AI at the recent ARG meeting. My notes end with:

Tucker would prefer that we make this an implementation permission. We ought to
allow the easy implementation, but we don't want to force implementations to
change what they currently have as this attribute is nearly useless. We agree
with Tucker's suggestion.

which means I need to rewrite the AI in this form.

****************************************************************

From: Robert Dewar
Sent: Friday, July 9, 2010  6:19 PM

> Tucker would prefer that we make this an implementation permission. We
> ought to allow the easy implementation, but we don't want to force
> implementations to change what they currently have as this attribute
> is nearly useless. We agree with Tucker's suggestion.

OK, I agree too, and that means I don't have to change GNAT, since I think the
easy implementation is by far the more useful one!

****************************************************************

From: Robert Dewar
Sent: Friday, July 9, 2010  6:30 PM

Well that's interesting, turns out this discussion was being held under an
illusion, GNAT does NOT permit any of these forms, perhaps it used to, but it
does not any more! But this surprises me, so I will investigate further.

****************************************************************

From: Robert Dewar
Sent: Sunday, July 25, 2010  2:25 PM

AI05-0182-1 ugly "she" in summary - get rid of it!

****************************************************************

From: Randy Brukardt
Sent: Thursday, August 5, 2010  10:45 PM

> Well that's interesting, turns out this discussion was being held
> under an illusion, GNAT does NOT permit any of these forms, perhaps it
> used to, but it does not any more! But this surprises me, so I will
> investigate further.

I just wrote a test program to see what actually happens. Using GNAT 6.2.1 I get
different results for 'Value and 'Wide_Value (not completely surprising): 'Value
does not accept Hex forms, but does accept '<LF>' (where LF =
Character'Val(10)); while 'Wide_Value accepts both (but only with "Hex_" written
in mixed case).

Note that only the two cases with 'A' should work by the letter of the Ada 2005
standard; all of the rest ought to raise Constraint_Error. The test puts out
"!!" if a string is accepted that shouldn't work by the letter of the Ada 2005
standard (but will be OK with the new permission).


Following is my test program:

with Ada.Text_IO;
with Ada.Characters.Handling;
procedure Adam_182 is

    -- Created test from question of AI05-0182-1.

    Passed : Boolean := True;

    procedure Check_Value (S : in String; Expected : in Character;
                           Must_Work : in Boolean := False) is
    begin
       if Character'Value (S) = Expected then
          if Must_Work then
             Ada.Text_IO.Put_Line ("-- Value works for " & S);
          else
             Ada.Text_IO.Put_Line ("!! Value works for " & S);
               -- The language says this shouldn't work; the new AI05-0182-1
               -- permission allows it.
          end if;
       else
          Ada.Text_IO.Put_Line ("** Value gets wrong result for " & S);
          Passed := False;
       end if;
    exception
       when Constraint_Error =>
          if Must_Work then
             Ada.Text_IO.Put_Line ("** Value raises Constraint_Error for " & S);
             Passed := False;
          else
             Ada.Text_IO.Put_Line ("-- Value raises Constraint_Error for " & S);
          end if;
    end Check_Value;

    procedure Check_Wide_Value (S : in Wide_String;
                                Expected : in Wide_Character;
                                Must_Work : in Boolean := False) is
    begin
       if Wide_Character'Wide_Value (S) = Expected then
          if Must_Work then
             Ada.Text_IO.Put_Line ("-- Wide_Value works for " &
               Ada.Characters.Handling.To_String(S));
          else
             Ada.Text_IO.Put_Line ("!! Wide_Value works for " &
               Ada.Characters.Handling.To_String(S));
               -- The language says this shouldn't work; the new AI05-0182-1
               -- permission allows it.
          end if;
       else
          Ada.Text_IO.Put_Line ("** Wide_Value gets wrong result for " &
            Ada.Characters.Handling.To_String(S));
          Passed := False;
       end if;
    exception
       when Constraint_Error =>
          if Must_Work then
             Ada.Text_IO.Put_Line ("** Wide_Value raises Constraint_Error for " &
               Ada.Characters.Handling.To_String(S));
             Passed := False;
          else
             Ada.Text_IO.Put_Line ("-- Wide_Value raises Constraint_Error for " &
               Ada.Characters.Handling.To_String(S));
          end if;
    end Check_Wide_Value;

begin
    Ada.Text_IO.Put_Line ("--- Check what 'Value does with malformed strings");
    Check_Value ("'A'", 'A', Must_Work => True);
    Check_Value ("'" & Character'Val(10) & "'", Character'Val(10));
    Check_Value ("HEX_00000041", 'A');
    Check_Value ("HEX_00000010", Character'Val(10));
    Check_Value ("Hex_00000041", 'A');
    Check_Value ("Hex_00000010", Character'Val(10));
    Check_Value ("hex_00000041", 'A');
    Check_Value ("hex_00000010", Character'Val(10));

    Check_Wide_Value ("'A'", 'A', Must_Work => True);
    Check_Wide_Value ("'" & Wide_Character'Val(10) & "'",
                      Wide_Character'Val(10));
    Check_Wide_Value ("HEX_00000041", 'A');
    Check_Wide_Value ("HEX_00000010", Wide_Character'Val(10));
    Check_Wide_Value ("Hex_00000041", 'A');
    Check_Wide_Value ("Hex_00000010", Wide_Character'Val(10));
    Check_Wide_Value ("hex_00000041", 'A');
    Check_Wide_Value ("hex_00000010", Wide_Character'Val(10));
    Check_Wide_Value ("HEX_00000394", Wide_Character'Val(16#394#)); -- Delta
    Check_Wide_Value ("Hex_00000394", Wide_Character'Val(16#394#)); -- Delta
    Check_Wide_Value ("hex_00000394", Wide_Character'Val(16#394#)); -- Delta
    if Passed then
       Ada.Text_IO.Put_Line ("--- Adam_182 Passed");
    else
       Ada.Text_IO.Put_Line ("*** Adam_182 Failed");
    end if;
end Adam_182;

Here are the results with GNAT 6.2.1:

--- Check what 'Value does with malformed strings
-- Value works for 'A'
!! Value works for '<LF>'
-- Value raises Constraint_Error for HEX_00000041
-- Value raises Constraint_Error for HEX_00000010
-- Value raises Constraint_Error for Hex_00000041
-- Value raises Constraint_Error for Hex_00000010
-- Value raises Constraint_Error for hex_00000041
-- Value raises Constraint_Error for hex_00000010
-- Wide_Value works for 'A'
!! Wide_Value works for '<LF>'
-- Wide_Value raises Constraint_Error for HEX_00000041
-- Wide_Value raises Constraint_Error for HEX_00000010
!! Wide_Value works for Hex_00000041
** Wide_Value gets wrong result for Hex_00000010
-- Wide_Value raises Constraint_Error for hex_00000041
-- Wide_Value raises Constraint_Error for hex_00000010
-- Wide_Value raises Constraint_Error for HEX_00000394
!! Wide_Value works for Hex_00000394
-- Wide_Value raises Constraint_Error for hex_00000394
*** Adam_182 Failed

[Note: The test reports failure because "Hex_00000010" neither raised
Constraint_Error nor returned the character <LF> (Character'Val(10)) that the
test expects. That seems bad... - RLB]
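
One thing worth checking when reading this result: the eight digits in "Hex_00000010" are hexadecimal, so they denote position 16 (DLE), whereas Character'Val(10) is LF. A minimal sketch, assuming only the predefined library, that prints the positions involved:

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;

procedure Hex_10_Position is
   --  The digits of "Hex_00000010", read as a base-16 literal.
   Pos : constant Natural := Natural'Value ("16#00000010#");
begin
   Ada.Text_IO.Put_Line ("16#00000010# =" & Natural'Image (Pos));
   Ada.Text_IO.Put_Line
     ("DLE position =" &
      Natural'Image (Character'Pos (Ada.Characters.Latin_1.DLE)));
   Ada.Text_IO.Put_Line
     ("LF position  =" &
      Natural'Image (Character'Pos (Ada.Characters.Latin_1.LF)));
end Hex_10_Position;
```

The first two lines report 16 and the third reports 10, so a result of DLE for "Hex_00000010" would be consistent with the hexadecimal reading.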

****************************************************************
