Reason: This rule was stated in the syntax
in Ada 95, but that has gotten too complex in Ada 2005.
An identifier shall only contain characters that
may be present in Normalization Form KC (as defined by Clause 21 of ISO/IEC
An implementation can usually detect this during lexical processing.
The code points not allowed are those for which Unicode property NFKC_QC
(Normalization Form KC Quick_Check) has the value No. We say "might
be allowed" so that characters for which the value is Maybe (really,
one of the possible values is Maybe) are allowed (these are mainly combining
marks). The necessary tables can be found in http://www.unicode.org/Public/UCD/latest/ucd/DerivedNormalizationProps.txt.
Tables for older Unicode versions can be found on this site as well;
start at http://www.unicode.org/Public/
and find the appropriate version number.
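As a rough illustration of the lexical check described above, the following Python sketch flags characters that cannot survive NFKC normalization. It is an approximation under stated assumptions, not the method any Ada compiler is known to use: Python's unicodedata module does not expose the NFKC_QC property directly, so the hypothetical helper nfkc_disallowed treats a character as disallowed when normalizing it in isolation changes it. Characters whose Quick_Check value is Maybe (such as combining marks) are unchanged in isolation and are therefore accepted. The results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def nfkc_disallowed(text):
    """Return the characters of text that NFKC rewrites when normalized alone.

    Assumption: this stands in for a lookup of NFKC_QC = No in
    DerivedNormalizationProps.txt, which Python does not expose directly.
    """
    return [ch for ch in text if unicodedata.normalize("NFKC", ch) != ch]


# U+00E9 is a precomposed letter that is stable under NFKC.
print(nfkc_disallowed("caf\u00e9"))
# U+FB01 (LATIN SMALL LIGATURE FI) has a compatibility decomposition.
print(nfkc_disallowed("\ufb01le"))
# U+0301 (COMBINING ACUTE ACCENT) is unchanged when normalized alone.
print(nfkc_disallowed("a\u0301"))
```

A real implementation would more likely build a table from the derived properties file once and consult it per code point during lexical analysis.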
Simple case folding is a mapping to lower case, so this is matching the
defining (lower case) version of a reserved word. We could have mentioned
case folding of the reserved words, but as that is an identity function,
it would have no effect.
We match the reserved
words after applying case folding so that the rules for identifiers
and reserved words are the same. Since a compiler usually will lexically
process identifiers and reserved words the same way (often with the same
code), this will prevent a lot of headaches.
The rules for reserved words differ in one way: they define case conversion
on letters rather than sequences. This means that it is possible that
there exist some unusual sequences that are neither identifiers
nor reserved words. We are not aware of any such sequences so long as
we use simple case folding (as opposed to full case folding), but we
have defined the rules in case any are introduced in future character
set standards. This originally was a problem when converting to upper
case: “ıf” and “acceß” have upper
case conversions of “IF” and “ACCESS” respectively.
We would not want these to be treated as reserved words. But neither
of these cases exists when using simple case folding.
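The upper case problem can be reproduced with a short Python sketch. Python's str.upper applies full Unicode case mappings, so it exhibits the conversions mentioned above. Python has no built-in simple case folding, so the hypothetical helper approx_simple_fold below only approximates it by lower-casing each character and keeping the original character whenever the mapping is not a single character; this is an assumption made for illustration and differs from true simple case folding for some characters. RESERVED is a two-word illustrative subset, not the full list of Ada reserved words.

```python
def approx_simple_fold(text):
    """Approximate simple case folding, one character at a time.

    Assumption: a single-character lower-case mapping stands in for the
    simple case folding tables; multi-character mappings are ignored.
    """
    folded = []
    for ch in text:
        low = ch.lower()
        folded.append(low if len(low) == 1 else ch)
    return "".join(folded)


RESERVED = {"if", "access"}  # illustrative subset only

for word in ("IF", "Access", "\u0131f", "acce\u00df"):
    # Show the upper-case form and whether the folded form is reserved.
    print(word, word.upper(), approx_simple_fold(word) in RESERVED)
```

Under upper case conversion the last two words collide with reserved words, while under the character-by-character folding they remain distinct from them.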
In a nonstandard mode, an implementation may support
other upper/lower case equivalence rules for identifiers,
to accommodate local conventions.
For instance, in most languages, the simple case folded equivalent of
LATIN CAPITAL LETTER I (an upper case letter without a dot above) is
LATIN SMALL LETTER I (a lower case letter with a dot above). In Turkish,
though, LATIN CAPITAL LETTER I and LATIN CAPITAL LETTER I WITH DOT ABOVE
are two distinct letters, so the case folded equivalent of LATIN CAPITAL
LETTER I is LATIN SMALL LETTER DOTLESS I, and the case folded equivalent
of LATIN CAPITAL LETTER I WITH DOT ABOVE is LATIN SMALL LETTER I. Take
for instance the following identifier (which is the name of a city on
the Tigris river in Eastern Anatolia):
DİYARBAKIR -- The first i is dotted, the second isn't.
A Turkish reader
would expect that the above identifier is equivalent to:
However, simple case folding (and thus Ada) maps this to:
which is different
from any of the following identifiers:
including the “correct” matching
identifier for Turkish. Upper case conversion (used in '[Wide_]Wide_Image)
introduces additional problems.
An implementation targeting the Turkish market
is allowed (in fact, expected) to provide a nonstandard mode where case
folding is appropriate for Turkish.
Lithuanian and Azeri are two other languages
that present similar idiosyncrasies.
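A nonstandard Turkish mode of the kind described above might be sketched as follows in Python. This is only an illustration: the names TURKISH_CASE_PAIRS and turkish_fold are hypothetical, and the comparison uses Python's locale-independent str.lower, which applies full case mappings and is therefore not the same operation as the simple case folding that Ada specifies.

```python
# Turkish pairs: dotted capital I maps to dotted small i,
# and dotless capital I maps to dotless small i.
TURKISH_CASE_PAIRS = str.maketrans({"\u0130": "i", "I": "\u0131"})


def turkish_fold(identifier):
    """Fold an identifier using the Turkish dotted/dotless I convention."""
    return identifier.translate(TURKISH_CASE_PAIRS).lower()


name = "D\u0130YARBAKIR"  # the first I is dotted, the second is not

# Turkish-aware result: dotted i first, dotless i second.
print(turkish_fold(name))
# Locale-independent lower-casing gives a different string.
print(name.lower() == turkish_fold(name))
```

The two results differ, which is the mismatch a Turkish reader would notice when the locale-independent rules are applied.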
Identifiers differing only in the use of corresponding upper and lower case letters
are considered the same.
Examples of identifiers:
Count X Get_Symbol Ethelyn Marion
Snobol_4 X1 Page_Count Store_Next_Item
Πλάτων -- Plato
Чайковский -- Tchaikovsky
θ φ -- Angles
Wording Changes from Ada 83
We no longer include reserved words as identifiers.
This is not a language change. In Ada 83, identifier
included reserved words. However, this complicated several other rules
(for example, regarding implementation-defined attributes and pragmas,
etc.). We now explicitly allow certain reserved words for attribute designators,
to make up for the loss.
Because syntax rules are
relevant to overload resolution, this means that anything that looks like
a reserved word is not an identifier.
As a side effect, implementations cannot use reserved words as implementation-defined
attributes or pragma names.
Extensions to Ada 95
An identifier can use any letter defined by ISO-10646:2003, along with several other
categories. This should ease programming in languages other than English.
Incompatibilities With Ada 2005
characters were removed from identifiers as the Unicode recommendations
have changed. This change can only affect programs written for the original
Ada 2005, so there should be few such programs.
We now specify simple case folding rather than full
case folding. That potentially could change identifier equivalence, although
it is more likely that identifiers that are considered the same in original
Ada 2005 will now be considered different. This change was made because
the original Ada 2005 definition was incompatible (and even inconsistent
in unusual cases) with the Ada 95 identifier equivalence rules. As such,
the Ada 2005 rules were rarely fully implemented, and in any case, only
Ada 2005 identifiers containing wide characters could be affected.
Incompatibilities With Ada 2012
Correction: An identifier
that contains any characters not permitted in Normalization Form KC is
now illegal. Ada 2012 allowed such identifiers, but their interpretation
was implementation-defined (so the use of such identifiers was not portable).
Identifiers that may be interpreted differently by different compilers
are a safety and security hazard, so we no longer allow them.
Ada 2005 and 2012 Editions sponsored in part by Ada-Europe