!standard 3.5(56/2)                                 10-08-05  AI05-0182-1/02
!class binding interpretation 09-10-30
!status work item 09-10-30
!status received 09-10-22
!priority Low
!difficulty Easy
!qualifier Omission
!subject Preciseness of S'Value

!summary

S'Wide_Wide_Value, S'Wide_Value, and S'Value may allow additional
representations for character values.

!question

What happens in the following cases:

(1) What should Character'Value ("'" & Character'Val(16) & "'") do? Should it
    return Character'Val(16), or raise Constraint_Error?

(2) What should Character'Value ("HEX_00000041") do? Return 'A', or raise
    Constraint_Error?

!response

(See summary.)

!wording

Add after 3.5(56/2) [Implementation Permissions]

An implementation may extend the Wide_Wide_Value, Wide_Value, and Value
attributes of a character type to allow character literals for nongraphic
characters and strings starting with "Hex_" for graphic characters and those
with a character smaller than 16#100#.

!discussion

The questioner goes on to report that both examples work (do not raise
Constraint_Error) in GNAT.

It's clear from the Standard that both of these should raise Constraint_Error;
the first is not an enumeration literal if the center character is a
nongraphic character, and the second is not the 'Image of a nongraphic
character.

But that implies extra code in 'Value to reject these cases. Since this is a
runtime function, that is extra code that is added to every program that uses
'Value (and depending on the runtime model, possibly to every program whether
or not 'Value is used). That extra code would need to include a table of
graphic characters in Unicode, so it is not trivial in size.

But this extra code is not helping the user any: both the user and the runtime
know what the intended answer is -- the runtime is just not allowed to provide
it. That seems stupid.

Therefore we relax the requirement to allow any value with the proper syntax
(''' & <character> & ''') and Hex_hhhhhhhh.
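As an illustration only, the relaxed behavior described above might be sketched as follows. This is not the code of GNAT or of any other runtime; the names Permissive_Value_Sketch and Permissive_Value are hypothetical, and matching the "Hex_" prefix without regard to letter case is an assumption of the sketch, not something the wording states. The point is that neither extra form needs a table of graphic characters.

```ada
with Ada.Characters.Handling;
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Permissive_Value_Sketch is

   package Handling renames Ada.Characters.Handling;

   --  Hypothetical helper (not a language or runtime routine): accepts the
   --  two extra forms, then falls back to the standard attribute.
   function Permissive_Value (S : String) return Character is
      T : constant String := Ada.Strings.Fixed.Trim (S, Ada.Strings.Both);
   begin
      --  Form 1: quote, any character, quote; no graphic-character check.
      if T'Length = 3
        and then T (T'First) = '''
        and then T (T'Last) = '''
      then
         return T (T'First + 1);
      end if;

      --  Form 2: "Hex_" followed by exactly eight hexadecimal digits.
      --  Assumption: the prefix is matched without regard to letter case.
      if T'Length = 12
        and then Handling.To_Upper (T (T'First .. T'First + 3)) = "HEX_"
      then
         declare
            Digs : constant String := T (T'First + 4 .. T'Last);
         begin
            for I in Digs'Range loop
               if not Handling.Is_Hexadecimal_Digit (Digs (I)) then
                  raise Constraint_Error;
               end if;
            end loop;

            --  A code that does not fit raises Constraint_Error, either in
            --  Integer'Value or in Character'Val.
            return Character'Val (Integer'Value ("16#" & Digs & "#"));
         end;
      end if;

      --  Everything else follows the ordinary rules ("NUL", "DLE", ...).
      return Character'Value (T);
   end Permissive_Value;

   Case_1 : constant String := "'" & Character'Val (16) & "'";
   Case_2 : constant String := "Hex_00000041";

begin
   Ada.Text_IO.Put_Line
     (Integer'Image (Character'Pos (Permissive_Value (Case_1))));
   Ada.Text_IO.Put_Line
     (Integer'Image (Character'Pos (Permissive_Value (Case_2))));
end Permissive_Value_Sketch;
```

Under these assumptions both strings from the question are accepted, and any other string is still handled by Character'Value, so ordinary images such as "DLE" keep their usual meaning.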
[Editor's note: Should we require or just allow this additional flexibility?
I'd be in favor of requiring it, but perhaps that is too much. If this extra
flexibility is considered a bad idea, I recommend that we classify this
question a pathology so that it is never tested - effectively allowing the
GNAT implementation.]

!ACATS Test

As an Implementation Permission, this is not usefully testable. (The ACATS does
not try to determine whether a permission is used.)

!appendix

!topic 'Value on character types
!reference 3.5(39.4/2), 2.1
!from Adam Beneschan 09-10-22
!discussion

I think I know the answers to these, but I wanted to clarify (and bring the
issue up in case anyone thinks the behavior should be different):

(1) What should Character'Value ("'" & Character'Val(16) & "'") do? Should it
    return Character'Val(16), or raise Constraint_Error?

(2) What should Character'Value ("HEX_00000041") do? Return 'A', or raise
    Constraint_Error?

The way I read the RM, both should raise Constraint_Error. In the first case,
'<c>' is not an enumeration literal if <c> is a nongraphic character; in the
second case, "HEX_00000041" does not correspond to the 'Image of a nongraphic
character (neither would "HEX_00000010", for that matter, since the 'Image of
Character'Val(16) is "DLE" and not "HEX_00000010").

I'm just bringing this up because someone could argue that, since there's no
requirement that the argument of 'Value be exactly the same as the result of
'Image, for numeric types e.g., Character'Value should allow any string of the
form '<c>' and any string of the form HEX_dddddddd with a valid 8-digit hex
number, for consistency. I don't have a particular preference (except that
doing it the latter way means less work for me :)).

(GNAT does seem to accept any 3-character string with quote marks. It raises
Constraint_Error on a HEX_dddddddd string unless the first three characters
have the letter case "Hex", which I think is a bug; but it accepts any
"Hex_dddddddd" string with valid hex digits, regardless of whether the result
is a graphic or control character.)

****************************************************************

From: Randy Brukardt
Sent: Thursday, October 22, 2009 11:54 PM

> The way I read the RM, both should raise Constraint_Error.
> In the first case, '<c>' is not an enumeration literal if <c> is a
> nongraphic character; in the second case, "HEX_00000041"
> does not correspond to the 'Image of a nongraphic character (neither
> would "HEX_00000010", for that matter, since the 'Image of
> Character'Val(16) is "DLE" and not "HEX_00000010").

I agree with your reading. Not sure whether it is a good idea, though,
especially in the latter case.

> I'm just bringing this up because someone could argue that, since
> there's no requirement that the argument of 'Value be exactly the same
> as the result of 'Image, for numeric types e.g., Character'Value
> should allow any string of the form '<c>' and any string of the form
> HEX_dddddddd with a valid 8-digit hex number, for consistency. I
> don't have a particular preference (except that doing it the latter
> way means less work for me :)).

'Value for enumeration types surely allows many strings that 'Image can't
produce. Besides the obvious case of the leading and trailing blanks, there is
also the fact that lower case versions of (identifier) literals are accepted.
('Image always returns literals in upper case.)

What worries me here is the runtime overhead of checking the character class of
the middle character and of the result of the conversion of HEX_dddddddd. It
seems completely pointless to make such a check on the latter - it would be
easy to do for the Latin-1 part (it's not allowed there), but for the rest of
Unicode you'd need a character class chart. That's pretty big, and not
something you'd want to drag into programs. And it would be a lot of work to
*avoid* dragging it in if it is part of Character'Value.

> (GNAT does seem to accept any 3-character string with quote marks. It
> raises Constraint_Error on a HEX_dddddddd string unless the first
> three characters have the letter case "Hex", which I think is a bug;
> but it accepts any "Hex_dddddddd"
> string with valid hex digits, regardless of whether the result is a
> graphic or control character.)

I suspect that we want to rethink the rules for 'Value (well, technically for
'Wide_Wide_Value) in order that we aren't dragging along a big runtime overhead
that really doesn't help the user any. What possible advantage is there to
forcing the user to decide whether or not a character is a graphic character
before calling 'Value??

****************************************************************

From: Robert Dewar
Sent: Friday, July 9, 2010 1:47 PM

I really think it would be a mistake to insist on the letter of the standard
here. Not only (as the AI points out) would this introduce a lot of complexity
into the implementation, but it is actively unhelpful to have these
restrictions.

It's five minutes' work to implement these restrictions, I just drag in the
giant unit that does Unicode classifications and call the appropriate routine,
but I really dislike doing this since it damages the utility of the
implementation for no good reason.

Rather than regard this as a pathology, why not add implementation permission
to allow these additional forms. Or general permission to add implementation
defined additional forms (either is OK with me).

In the absence of this IP, I am inclined to change GNAT to cripple it as
suggested by the RM. It's not that important, and I don't like having this
discrepancy with the RM.

****************************************************************

From: Randy Brukardt
Sent: Friday, July 9, 2010 6:07 PM

> I really think it would be a mistake to insist on the letter of the
> standard here. Not only (as the AI points out) would this introduce a
> lot of complexity into the implementation, but it is actively
> unhelpful to have these restrictions.

We discussed this AI at the recent ARG meeting. My notes end with:

    Tucker would prefer that we make this an implementation permission. We
    ought to allow the easy implementation, but we don't want to force
    implementations to change what they currently have as this attribute is
    nearly useless. We agree with Tucker's suggestion.

which means I need to rewrite the AI in this form.

****************************************************************

From: Robert Dewar
Sent: Friday, July 9, 2010 6:19 PM

> Tucker would prefer that we make this an implementation permission. We
> ought to allow the easy implementation, but we don't want to force
> implementations to change what they currently have as this attribute
> is nearly useless. We agree with Tucker's suggestion.

OK, I agree too, and that means I don't have to change GNAT, since I think the
easy implementation is by far the more useful one!

****************************************************************

From: Robert Dewar
Sent: Friday, July 9, 2010 6:30 PM

Well that's interesting, turns out this discussion was being held under an
illusion, GNAT does NOT permit any of these forms, perhaps it used to, but it
does not any more!

But this surprises me, so I will investigate further.

****************************************************************

From: Robert Dewar
Sent: Sunday, July 25, 2010 2:25 PM

AI05-0182-1

ugly "she" in summary - get rid of it!

****************************************************************

From: Robert Dewar
Sent: Sunday, July 25, 2010 2:41 PM

For the record, the following program with GNAT:

    with Text_IO; use Text_IO;
    procedure ValueTest is
       A : Character;
       B : Character;
    begin
       begin
          A := Character'Value ("'" & Character'Val (16) & "'");
          Put_Line (Integer'Image (Character'Pos (A)));
       exception
          when Constraint_Error =>
             Put_Line ("Case 1 CE raised");
       end;

       begin
          B := Character'Value ("HEX_00000041");
          Put_Line (Integer'Image (Character'Pos (B)));
       exception
          when Constraint_Error =>
             Put_Line ("Case 2 CE raised");
       end;
    end;

outputs

     16
    Case 2 CE raised

So it is not the case that GNAT permits the second form, so this claim should
be corrected in the writeup. I don't know where anyone got this idea.
Character'Value has never accepted strings of this form!

****************************************************************

From: Randy Brukardt
Sent: Thursday, August 5, 2010 10:45 PM

> Well that's interesting, turns out this discussion was being held
> under an illusion, GNAT does NOT permit any of these forms, perhaps it
> used to, but it does not any more! But this surprises me, so I will
> investigate further.

I just wrote a test program to see what actually happens. Using GNAT 6.2.1 I
get different results for 'Value and 'Wide_Value (not completely surprising):
'Value does not accept Hex forms, but does accept '<LF>' (where LF =
Character'Val(10)); while 'Wide_Value accepts both (but only with "Hex_"
written in mixed case).

Note that only the two cases with 'A' should work by the letter of the Ada 2005
standard, all of the rest ought to raise Constraint_Error. The test puts out
"!!" if a string is accepted that shouldn't work by the letter of the Ada 2005
standard (but will be OK with the new permission).

Following is my test program:

    with Ada.Text_IO;
    with Ada.Characters.Handling;
    procedure Adam_182 is
       -- Created test from question of AI05-0182-1.

       Passed : Boolean := True;

       procedure Check_Value (S : in String;
                              Expected : in Character;
                              Must_Work : in Boolean := False) is
       begin
          if Character'Value (S) = Expected then
             if Must_Work then
                Ada.Text_IO.Put_Line ("-- Value works for " & S);
             else
                Ada.Text_IO.Put_Line ("!! Value works for " & S);
                -- The language says this shouldn't work; the new AI05-0182-1
                -- permission allows it.
             end if;
          else
             Ada.Text_IO.Put_Line ("** Value gets wrong result for " & S);
             Passed := False;
          end if;
       exception
          when Constraint_Error =>
             if Must_Work then
                Ada.Text_IO.Put_Line ("** Value raises Constraint_Error for " & S);
                Passed := False;
             else
                Ada.Text_IO.Put_Line ("-- Value raises Constraint_Error for " & S);
             end if;
       end Check_Value;

       procedure Check_Wide_Value (S : in Wide_String;
                                   Expected : in Wide_Character;
                                   Must_Work : in Boolean := False) is
       begin
          if Wide_Character'Wide_Value (S) = Expected then
             if Must_Work then
                Ada.Text_IO.Put_Line ("-- Wide_Value works for " &
                   Ada.Characters.Handling.To_String(S));
             else
                Ada.Text_IO.Put_Line ("!! Wide_Value works for " &
                   Ada.Characters.Handling.To_String(S));
                -- The language says this shouldn't work; the new AI05-0182-1
                -- permission allows it.
             end if;
          else
             Ada.Text_IO.Put_Line ("** Wide_Value gets wrong result for " &
                Ada.Characters.Handling.To_String(S));
             Passed := False;
          end if;
       exception
          when Constraint_Error =>
             if Must_Work then
                Ada.Text_IO.Put_Line ("** Wide_Value raises Constraint_Error for " &
                   Ada.Characters.Handling.To_String(S));
                Passed := False;
             else
                Ada.Text_IO.Put_Line ("-- Wide_Value raises Constraint_Error for " &
                   Ada.Characters.Handling.To_String(S));
             end if;
       end Check_Wide_Value;

    begin
       Ada.Text_IO.Put_Line ("--- Check what 'Value does with malformed strings");

       Check_Value ("'A'", 'A', Must_Work => True);
       Check_Value ("'" & Character'Val(10) & "'", Character'Val(10));
       Check_Value ("HEX_00000041", 'A');
       Check_Value ("HEX_00000010", Character'Val(10));
       Check_Value ("Hex_00000041", 'A');
       Check_Value ("Hex_00000010", Character'Val(10));
       Check_Value ("hex_00000041", 'A');
       Check_Value ("hex_00000010", Character'Val(10));

       Check_Wide_Value ("'A'", 'A', Must_Work => True);
       Check_Wide_Value ("'" & Wide_Character'Val(10) & "'", Wide_Character'Val(10));
       Check_Wide_Value ("HEX_00000041", 'A');
       Check_Wide_Value ("HEX_00000010", Wide_Character'Val(10));
       Check_Wide_Value ("Hex_00000041", 'A');
       Check_Wide_Value ("Hex_00000010", Wide_Character'Val(10));
       Check_Wide_Value ("hex_00000041", 'A');
       Check_Wide_Value ("hex_00000010", Wide_Character'Val(10));
       Check_Wide_Value ("HEX_00000394", Wide_Character'Val(16#394#)); -- Delta
       Check_Wide_Value ("Hex_00000394", Wide_Character'Val(16#394#)); -- Delta
       Check_Wide_Value ("hex_00000394", Wide_Character'Val(16#394#)); -- Delta

       if Passed then
          Ada.Text_IO.Put_Line ("--- Adam_182 Passed");
       else
          Ada.Text_IO.Put_Line ("*** Adam_182 Failed");
       end if;
    end Adam_182;

Here's the results with GNAT 6.2.1:

    --- Check what 'Value does with malformed strings
    -- Value works for 'A'
    !! Value works for '<LF>'
    -- Value raises Constraint_Error for HEX_00000041
    -- Value raises Constraint_Error for HEX_00000010
    -- Value raises Constraint_Error for Hex_00000041
    -- Value raises Constraint_Error for Hex_00000010
    -- Value raises Constraint_Error for hex_00000041
    -- Value raises Constraint_Error for hex_00000010
    -- Wide_Value works for 'A'
    !! Wide_Value works for '<LF>'
    -- Wide_Value raises Constraint_Error for HEX_00000041
    -- Wide_Value raises Constraint_Error for HEX_00000010
    !! Wide_Value works for Hex_00000041
    ** Wide_Value gets wrong result for Hex_00000010
    -- Wide_Value raises Constraint_Error for hex_00000041
    -- Wide_Value raises Constraint_Error for hex_00000010
    -- Wide_Value raises Constraint_Error for HEX_00000394
    !! Wide_Value works for Hex_00000394
    -- Wide_Value raises Constraint_Error for hex_00000394
    *** Adam_182 Failed

[Note: The test reports failed because "Hex_00000010" did not raise
Constraint_Error nor did it return character (16#0010#). That seems bad... -
RLB]

****************************************************************