Version 1.2 of ais/ai-00395.txt

!standard 2.1(14.2/2)          05-01-27 AI95-00395/02
!standard 1.1.4(14.1/2)
!standard 2.3(1.1/2)
!standard 2.3(5.2/2)
!standard 2.9(2)
!standard 3.5.2(3.2/2)
!standard 4.1.4(3)
!standard 4.1.4(5)
!class amendment 05-01-25
!status work item 05-01-25
!status received 05-01-25
!priority High
!difficulty Easy
!subject Various clarifications regarding 16- and 32-bit characters
!summary
(See Proposal.)
!problem
1 - The characters in category other_format are generally not displayed. The syntax rule for identifier would make it possible to have an identifier that includes two underlines separated by an other_format character, which would look like two consecutive underlines. The same applies to trailing underlines and to identifiers that would look like reserved words. Is this intended? (No.)
2 - The character at position 16#AD#, SOFT HYPHEN, is in category other_format. It was allowed in Ada 95 in literals, but the current wording means that it's no longer allowed, which introduces an incompatibility. Is this intended? (No.)
3 - Many places in normative text talk about "upper case" without qualification. This is somewhat ambiguous in the Unicode world.
4 - The definition of the image of non-graphic wide characters results in long strings like "Character_12345678". This increases the Width attribute for Wide_String and Wide_Wide_String for no good reason.
!proposal
1 - After removing the other_format characters, an identifier must not violate the "usual" rules about underlines. It must not be a reserved word, either. Also, other_format characters are allowed (but ignored) in reserved words, and in "special" attribute designators.
2 - The incompatibility doesn't seem justified. While Unicode recommends that other_format characters be ignored in identifiers, it doesn't say anything about other constructs. ECMA C#, which we used as a guideline in resolving some of the character issues, allows them in string literals. Hopefully decent program editors will provide a way to display these characters. Note that some languages allow any character in string literals. We do not want to go that far; in particular, we do not want to allow control characters. They have been disallowed for 20 years, and there is no indication that users have had any problem with that. We are just avoiding an incompatibility.
3 - We are not going to fix all these places. Currently we only have a rule in 2.3, but it surely doesn't cover all the occurrences of "upper case", so it would be better to have a blanket statement somewhere in section 1.
4 - Change the language-defined names to keep the current value of Width (which is 12).
!wording
1 -
Change 2.3(1.1/2-4) to read:
identifier ::= identifier_start {identifier_start | identifier_extend}
identifier_start ::= letter_uppercase | letter_lowercase | letter_titlecase | letter_modifier | letter_other | number_letter
identifier_extend ::= mark_non_spacing | mark_spacing_combining | number_decimal_digit | punctuation_connector | other_format
After eliminating the characters in category other_format, an identifier shall not contain two consecutive characters in category punctuation_connector, or end with a character in that category.
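The connector rule above can be illustrated with a minimal Python sketch. It assumes that other_format corresponds to Unicode general category Cf and punctuation_connector to Pc, as reported by Python's unicodedata module; the function names are hypothetical. The sketch checks only the underline rule, not the identifier syntax itself.

```python
import unicodedata


def strip_other_format(text: str) -> str:
    """Drop characters in general category Cf (assumed to be other_format)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def underlines_ok(identifier: str) -> bool:
    """Apply the connector rule to the identifier with Cf characters removed.

    Returns False if two connectors (category Pc) are adjacent or if the
    identifier ends with a connector. Does not check identifier_start.
    """
    visible = strip_other_format(identifier)
    if not visible:
        return False
    is_connector = [unicodedata.category(ch) == "Pc" for ch in visible]
    if is_connector[-1]:
        return False
    # Look for any adjacent pair of connectors.
    return not any(a and b for a, b in zip(is_connector, is_connector[1:]))
```

For instance, under these assumptions "a_b" passes, while "a_" followed by SOFT HYPHEN (16#AD#) and "_b" is rejected, because removing the soft hyphen leaves two adjacent underlines.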
Add after 2.3(5.2/2):
After applying these transformations, an identifier shall not be identical to [the upper case version of] a reserved word.
Replace 2.9(2/2) with:
reserved_word ::= identifier_start {identifier_start | other_format}
After eliminating the characters in category other_format and converting the remaining sequence of characters to upper case, a reserved word shall be identical to the upper case version of one of the following words:
Replace 4.1.4(3) with:
attribute_designator ::= identifier [(static_expression)] | reserved_word
Add after 4.1.4(5):
A reserved word used as an attribute_designator shall be one of Access, Delta, or Digits.
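A sketch of how a lexer might recognize reserved words under the rules above, written in Python. It assumes other_format maps to Unicode general category Cf, and it uses str.upper as a simple stand-in for the upper case conversion; the RESERVED set is a hypothetical subset chosen for illustration, not the full list from 2.9.

```python
import unicodedata

# Hypothetical subset of the reserved words, for illustration only.
RESERVED = {"ACCESS", "DELTA", "DIGITS", "BEGIN", "END"}

# Reserved words permitted as an attribute_designator per the rule above.
ATTRIBUTE_RESERVED = {"ACCESS", "DELTA", "DIGITS"}


def normalize(token: str) -> str:
    """Remove Cf characters, then convert what remains to upper case."""
    visible = "".join(ch for ch in token if unicodedata.category(ch) != "Cf")
    return visible.upper()  # stand-in for the standard's conversion


def is_reserved_word(token: str) -> bool:
    return normalize(token) in RESERVED


def is_attribute_reserved_word(token: str) -> bool:
    return normalize(token) in ATTRIBUTE_RESERVED
```

With this normalization, "begin" containing an embedded SOFT HYPHEN is still recognized as a reserved word, so it cannot be used as an identifier that merely looks like one.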
2 -
Change 2.1(14/2) to read:
graphic_character
Any character that is not in the categories other_control, other_private_use, other_surrogate, or format_effector, and whose code position is neither 16#FFFE# nor 16#FFFF#.
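The revised definition can be sketched in Python as follows. The mapping of other_control, other_private_use, and other_surrogate to Unicode general categories Cc, Co, and Cs is an assumption. The contents of format_effector are not given in this text; the set used here (code positions 16#09# to 16#0D#, 16#85#, and categories Zl and Zp) is likewise an assumption and should be checked against 2.1.

```python
import unicodedata

# Assumed mapping: other_control, other_private_use, other_surrogate.
EXCLUDED_CATEGORIES = {"Cc", "Co", "Cs"}

# Assumed contents of format_effector (not stated in this document).
FORMAT_EFFECTOR_CODES = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x85}
FORMAT_EFFECTOR_CATEGORIES = {"Zl", "Zp"}


def is_format_effector(ch: str) -> bool:
    return (ord(ch) in FORMAT_EFFECTOR_CODES
            or unicodedata.category(ch) in FORMAT_EFFECTOR_CATEGORIES)


def is_graphic_character(ch: str) -> bool:
    """Literal reading of the definition: everything not excluded is graphic."""
    if ord(ch) in (0xFFFE, 0xFFFF):
        return False
    if is_format_effector(ch):
        return False
    return unicodedata.category(ch) not in EXCLUDED_CATEGORIES
```

Because other_format (Cf) is not among the excluded categories, SOFT HYPHEN at 16#AD# counts as a graphic_character under this reading, which is what allows it in literals again.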
3 -
Add after 1.1.4(14.1/2):
When this International Standard mentions the upper case version of some character or sequence of characters, it means the character or sequence of characters obtained by using locale-independent full case folding, as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2003.
AARM Note: For sequences of characters, case folding is applied to the sequence, not to individual characters. It can make a difference, sometimes.
Change 2.3(5.2/2) to read:
o The remaining sequence of characters is converted to upper case.
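As a rough illustration of full case folding, the Python sketch below uses str.casefold, which applies Unicode full case folding without locale tailoring. Folded text is conventionally lower case, so the sketch compares folded forms rather than producing an upper case string. The function name is hypothetical.

```python
def same_ignoring_case(left: str, right: str) -> bool:
    """Compare two character sequences after full case folding."""
    return left.casefold() == right.casefold()


# Full folding can change the length of a sequence: LATIN SMALL LETTER
# SHARP S folds to the two-character sequence "ss".
folded = "Stra\u00dfe".casefold()
```

Here folded is "strasse", one character longer than the input, so same_ignoring_case("STRASSE", "Stra\u00dfe") holds even though the two spellings differ in length.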
4 -
In 3.5.2(3.2/2), replace:
... the string "Character_" ...
by:
... the string "Chr_" ...
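The effect on Width can be checked with a small Python sketch. It assumes the image of a non-graphic character is the prefix followed by eight hexadecimal digits of the code position, consistent with the "Character_12345678" example in the problem statement; the function name is hypothetical.

```python
def nongraphic_image(code_position: int, prefix: str) -> str:
    """Build the image as prefix plus eight upper case hexadecimal digits."""
    return f"{prefix}{code_position:08X}"


old_image = nongraphic_image(0x12345678, "Character_")
new_image = nongraphic_image(0x12345678, "Chr_")
```

Under that assumption the old image is 18 characters long and the new one is 12, which matches the Width value the proposal wants to keep.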
!discussion
See proposal.
!example
--!corrigendum
!ACATS test
!appendix

****************************************************************
