Version 1.1 of ai05s/ai05-0114-1.txt
!standard 3.9(12.1/2) 08-10-06 AI05-0114-1/01
!standard 3.9(25.3/2)
!standard 3.9(26.1/2)
!standard 13.3(76)
!class binding interpretation 08-10-06
!status work item 08-10-06
!status received 08-06-13
!priority Low
!difficulty Easy
!qualifier Omission
!subject Conflicting definition of Letter
!summary
!question
Ada 2005 has enhanced the set of characters allowed to compose identifiers. In particular,
2.3(2/2) specifies that an identifier is made up of items including identifier_start.
Then 2.3(3/2) specifies that any letter_lowercase can be a component of identifier_start.
Then 2.1(9/2) defines letter_lowercase to be "Any character whose General Category is
defined to be "Letter, Lowercase"" by ISO/IEC 10646:2003.
The Unicode Data File lists each of the following as category Ll, meaning Letter, Lowercase:
Code point u+00AA, named FEMININE ORDINAL INDICATOR;
Code point u+00B5, named MICRO SIGN;
Code point u+00BA, named MASCULINE ORDINAL INDICATOR.
Therefore, each of these three characters should be considered lowercase letters and allowed
in identifiers according to the various Clause 2.1 and 2.3 paragraphs mentioned above.
Since Ada considers these three characters as letters suitable for being part of identifiers,
the functions Is_Letter and Is_Lower in package Ada.Characters.Handling should now
correspondingly return True for the characters with code points 170, 181, and 186. Should
this change be made? (No.)
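The mismatch described in the question can be reproduced with a small model. The following is only a sketch in Python, not Ada library code: `ada_is_letter` and `ada_is_lower` are hypothetical helpers that transcribe the position ranges given for Is_Letter and Is_Lower in A.3.2(24,25), and the Unicode side uses whatever character database ships with the Python runtime. Because the exact subcategory reported for a character can depend on the Unicode version, the sketch checks only the major class "L" (letter).

```python
import unicodedata


def ada_is_letter(code: int) -> bool:
    """Model of Is_Letter for Character positions 0..255 (ranges from A.3.2)."""
    return (
        ord("A") <= code <= ord("Z")
        or ord("a") <= code <= ord("z")
        or 192 <= code <= 214
        or 216 <= code <= 246
        or 248 <= code <= 255
    )


def ada_is_lower(code: int) -> bool:
    """Model of Is_Lower for Character positions 0..255 (ranges from A.3.2)."""
    return ord("a") <= code <= ord("z") or 223 <= code <= 246 or 248 <= code <= 255


def unicode_is_letter(code: int) -> bool:
    """True if the General Category of the code point is in major class L."""
    return unicodedata.category(chr(code)).startswith("L")


def latin1_mismatches() -> list[int]:
    """Row 00 positions where the two notions of 'letter' disagree."""
    return [c for c in range(256) if ada_is_letter(c) != unicode_is_letter(c)]


if __name__ == "__main__":
    for code in latin1_mismatches():
        print(code, f"U+{code:04X}", unicodedata.name(chr(code)))
```

Under these assumptions, the only disagreements in Row 00 are the three code points named in the question.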
!recommendation
(See Summary.)
!wording
!discussion
Changing the definition of Ada.Characters.Handling has the potential of breaking existing
programs. Moreover, this is the worst kind of incompatibility: one where the behavior of
a program silently changes.
The questioner also forgets that these definitions are used in other places: specifically,
the constants of Ada.Strings.Maps and its relatives. This would spread the incompatibility
to the majority of programs that use the Ada.Strings packages.
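To illustrate the kind of silent change meant here, the following Python sketch models a letter-set scan over a string. It is an assumption-laden illustration, not Ada code: `letter_current` transcribes the Is_Letter ranges of A.3.2, `letter_proposed` is a hypothetical variant that also accepts the three disputed code points, and `first_letter_run` is a hypothetical helper loosely modeled on a token search using a letter set.

```python
DISPUTED = (0xAA, 0xB5, 0xBA)  # the three code points raised in the question


def letter_current(code: int) -> bool:
    """Model of the existing Is_Letter ranges (A.3.2) for positions 0..255."""
    return (
        ord("A") <= code <= ord("Z")
        or ord("a") <= code <= ord("z")
        or 192 <= code <= 214
        or 216 <= code <= 246
        or 248 <= code <= 255
    )


def letter_proposed(code: int) -> bool:
    """Hypothetical rule: existing letters plus the three disputed characters."""
    return letter_current(code) or code in DISPUTED


def first_letter_run(text: str, is_letter) -> str:
    """Return the first maximal run of characters accepted by is_letter."""
    start = None
    for i, ch in enumerate(text):
        if is_letter(ord(ch)):
            if start is None:
                start = i
        elif start is not None:
            return text[start:i]
    return text[start:] if start is not None else ""


if __name__ == "__main__":
    sample = "10\u00b5F"  # "10" followed by MICRO SIGN and "F"
    print(repr(first_letter_run(sample, letter_current)))
    print(repr(first_letter_run(sample, letter_proposed)))
```

The same source text yields a different token under the two rules, with no compile-time indication that anything changed.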
The questioner also seems to assume that there is some correlation between
Ada.Characters.Handling and identifiers. But this has never been true; the two concepts are
defined independently.
While it is likely that many programs will not use any characters in the changed range, the
potential incompatibility is so widespread that such a runtime change cannot be contemplated.
[More interesting question: Does the incompatibility of Ada 95 and Ada 2005 classifications
of these characters have any other unintended consequences?
There is an alternative way to resolve the difference, which would be to use the Ada 95
classification for Row 00 (that is, Latin-1). One way to do that would be to explicitly say
that these three characters are not letters in Ada, even though they would qualify via
Unicode. This is unlikely to be a major problem (all of the characters appear to have
counterparts elsewhere in the Unicode set), but it would be weird (and probably not necessary
unless a compatibility issue is uncovered here).]
--!corrigendum 13.3(76)
!ACATS Test
Create ACATS C-Tests and (if we disallow the three additional characters in identifiers) B-Tests
to check that whatever is decided is enforced.
!appendix
!topic Inconsistency in Ada 2005 definition of letter
!reference Ada 2005 A.3.2(24,25)
!from Howard W. Ludwig 08-06-26
!keywords identifier_start, letter_lowercase, Is_Letter, Is_Lower
!discussion
Ada 2005 has enhanced the set of characters allowed to compose identifiers. In particular,
2.3(2/2) specifies that an identifier is made up of items including identifier_start.
Then 2.3(3/2) specifies that any letter_lowercase can be a component of identifier_start.
Then 2.1(9/2) defines letter_lowercase to be "Any character whose General Category is
defined to be "Letter, Lowercase"" by ISO/IEC 10646:2003.
The Unicode Data File lists each of the following as category Ll, meaning Letter, Lowercase:
Code point u+00AA, named FEMININE ORDINAL INDICATOR;
Code point u+00B5, named MICRO SIGN;
Code point u+00BA, named MASCULINE ORDINAL INDICATOR.
Therefore, each of these three characters should be considered lowercase letters and allowed
in identifiers according to the various Clause 2.1 and 2.3 paragraphs mentioned above.
This is contrary to Ada 95, in which 2.1(7..9) allows only characters in Row 00 of ISO 10646 BMP
whose name begins "Latin Capital Letter" or "Latin Small Letter". The MICRO SIGN and
two ORDINAL INDICATORs did not qualify as identifier characters under this Ada 95 rule but
do satisfy the Unicode lowercase letter categorization requirement for Ada 2005 identifiers.
Thus, Ada 2005 not only added code points beyond Row 00 of the BMP as allowed identifier
characters but also changed the categorization of these three within Row 00.
Now that Ada considers these three characters as letters suitable for being part of identifiers,
the functions Is_Letter and Is_Lower in package Ada.Characters.Handling should now
correspondingly return True for the characters with code points 170, 181, and 186. I do not
have any strong opinion as to what Is_Basic should return as a value for these three
characters (first casual thought is True for 181 and still False for the other two). Now,
Is_Letter and Is_Lower do not have any relevance outside of type Character (that is,
beyond code point 255) for deciding what is acceptable (or not) for being part of an
identifier (though I think such functionality would be useful and should be included
for the broader category of characters, as Java does), but it should match for Row 00,
where both concepts meaningfully overlap.
I understand this would be a compatibility issue with respect to Ada 95 in that the same
program source code could yield different results under Ada 2005. However, the current letter
of the law is a conceptual incompatibility: in Ada 95, whether a character was regarded
as a suitable letter for an identifier matched whether Is_Letter returned True, but under
the current Ada 2005 wording the two no longer match.
****************************************************************