A.3.5 The Package Wide_Characters.Handling
{AI05-0185-1} The package Wide_Characters.Handling provides operations for
classifying Wide_Characters and case folding for Wide_Characters.
Static Semantics
{AI05-0185-1} The library package Wide_Characters.Handling has the following
declaration:
package Ada.Wide_Characters.Handling is
{AI05-0266-1}
   function Character_Set_Version return String;
   function Is_Control (Item : Wide_Character) return Boolean;
   function Is_Letter (Item : Wide_Character) return Boolean;
   function Is_Lower (Item : Wide_Character) return Boolean;
   function Is_Upper (Item : Wide_Character) return Boolean;
{AI12-0260-1}
   function Is_Basic (Item : Wide_Character) return Boolean;
   function Is_Digit (Item : Wide_Character) return Boolean;
   function Is_Decimal_Digit (Item : Wide_Character) return Boolean
      renames Is_Digit;
   function Is_Hexadecimal_Digit (Item : Wide_Character) return Boolean;
   function Is_Alphanumeric (Item : Wide_Character) return Boolean;
   function Is_Special (Item : Wide_Character) return Boolean;
   function Is_Line_Terminator (Item : Wide_Character) return Boolean;
   function Is_Mark (Item : Wide_Character) return Boolean;
   function Is_Other_Format (Item : Wide_Character) return Boolean;
   function Is_Punctuation_Connector (Item : Wide_Character) return Boolean;
   function Is_Space (Item : Wide_Character) return Boolean;
{AI12-0004-1}
   function Is_NFKC (Item : Wide_Character) return Boolean;
   function Is_Graphic (Item : Wide_Character) return Boolean;
   function To_Lower (Item : Wide_Character) return Wide_Character;
   function To_Upper (Item : Wide_Character) return Wide_Character;
{AI12-0260-1}
   function To_Basic (Item : Wide_Character) return Wide_Character;
   function To_Lower (Item : Wide_String) return Wide_String;
   function To_Upper (Item : Wide_String) return Wide_String;
{AI12-0260-1}
   function To_Basic (Item : Wide_String) return Wide_String;
end Ada.Wide_Characters.Handling;
{AI05-0185-1} The subprograms defined in Wide_Characters.Handling are locale
independent.
function Character_Set_Version return String;
{AI05-0266-1} Returns an implementation-defined identifier that identifies the
version of the character set standard that is used for categorizing characters
by the implementation.
function Is_Control (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as other_control; otherwise returns False.
function Is_Letter (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as letter_uppercase, letter_lowercase, letter_titlecase,
letter_modifier, letter_other, or number_letter; otherwise returns False.
function Is_Lower (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as letter_lowercase; otherwise returns False.
function Is_Upper (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as letter_uppercase; otherwise returns False.
function Is_Basic (Item : Wide_Character) return Boolean;
{AI12-0260-1} Returns True if the Wide_Character designated by Item has no
Decomposition Mapping in the code charts of ISO/IEC 10646:2017; otherwise
returns False.
Implementation Note: Decomposition Mapping is defined in Clause 33 of ISO/IEC
10646:2017. Machine-readable (and normative!) versions of this can be found as
Character Decomposition Mapping, described in file
http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt,
field 5 (which is the 6th item, as Unicode counts from zero).
function Is_Digit (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as number_decimal; otherwise returns False.
function Is_Hexadecimal_Digit (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as number_decimal, or is in the range 'A' .. 'F' or 'a' .. 'f';
otherwise returns False.
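The rule for Is_Hexadecimal_Digit can be restated in terms of Is_Digit. The
following is a minimal sketch, not part of the standard; the helper
Hex_By_Rule, the procedure name, and the sample characters are assumptions
chosen for illustration. It prints the library result next to the restated
rule so the two can be compared. Note that, as the rule is worded, any
character categorized as number_decimal qualifies, not only '0' .. '9'.

```ada
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Hex_Digit_Demo is
   use Ada.Wide_Characters.Handling;

   --  Hypothetical helper restating the rule above in terms of Is_Digit.
   function Hex_By_Rule (C : Wide_Character) return Boolean is
     (Is_Digit (C) or else C in 'A' .. 'F' or else C in 'a' .. 'f');

   --  U+0663 ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit.
   Arabic_Three : constant Wide_Character :=
     Wide_Character'Val (16#0663#);

   Samples : constant Wide_String := ('9', 'F', 'f', 'G', Arabic_Three);
begin
   --  Print the library answer and the restated rule side by side.
   for C of Samples loop
      Ada.Text_IO.Put_Line
        ("code point" & Integer'Image (Wide_Character'Pos (C))
         & ": library = " & Boolean'Image (Is_Hexadecimal_Digit (C))
         & ", rule = " & Boolean'Image (Hex_By_Rule (C)));
   end loop;
end Hex_Digit_Demo;
```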
function Is_Alphanumeric (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as letter_uppercase, letter_lowercase, letter_titlecase,
letter_modifier, letter_other, number_letter, or number_decimal; otherwise
returns False.
function Is_Special (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as graphic_character, but not categorized as letter_uppercase,
letter_lowercase, letter_titlecase, letter_modifier, letter_other,
number_letter, or number_decimal; otherwise returns False.
function Is_Line_Terminator (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as separator_line or separator_paragraph, or if Item is a
conventional line terminator character (Line_Feed, Line_Tabulation, Form_Feed,
Carriage_Return, Next_Line); otherwise returns False.
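As an illustration of the rule for Is_Line_Terminator, the sketch below counts
the terminator characters in a Wide_String. It is a minimal example under
stated assumptions: the helpers Wide and Count_Terminators, the procedure
name, and the sample text are hypothetical, and the control characters are
taken from Ada.Characters.Latin_1.

```ada
with Ada.Characters.Latin_1;
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Line_Terminator_Demo is
   use Ada.Wide_Characters.Handling;
   package L1 renames Ada.Characters.Latin_1;

   --  Hypothetical helper: the Wide_Character at the same code point
   --  as a Latin-1 character.
   function Wide (C : Character) return Wide_Character is
     (Wide_Character'Val (Character'Pos (C)));

   --  Hypothetical helper: counts the characters for which
   --  Is_Line_Terminator is True. A CR LF pair counts as two.
   function Count_Terminators (S : Wide_String) return Natural is
      N : Natural := 0;
   begin
      for C of S loop
         if Is_Line_Terminator (C) then
            N := N + 1;
         end if;
      end loop;
      return N;
   end Count_Terminators;

   --  U+2028 LINE SEPARATOR (category separator_line).
   Line_Separator : constant Wide_Character :=
     Wide_Character'Val (16#2028#);

   Text : constant Wide_String :=
     "one" & Wide (L1.CR) & Wide (L1.LF)
     & "two" & Wide (L1.NEL)
     & "three" & Line_Separator
     & "four";
begin
   Ada.Text_IO.Put_Line
     ("terminators found:" & Natural'Image (Count_Terminators (Text)));
end Line_Terminator_Demo;
```

Under the definition above, CR, LF, NEL, and U+2028 each qualify, so a
conforming implementation would be expected to count four terminators in Text.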
function Is_Mark (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as mark_non_spacing or mark_spacing_combining; otherwise returns
False.
function Is_Other_Format (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as other_format; otherwise returns False.
function Is_Punctuation_Connector (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as punctuation_connector; otherwise returns False.
function Is_Space (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as separator_space; otherwise returns False.
function Is_NFKC (Item : Wide_Character) return Boolean;
{AI12-0004-1} {AI12-0263-1} Returns True if the Wide_Character designated by
Item could be present in a string normalized to Normalization Form KC (as
defined by Clause 21 of ISO/IEC 10646:2017); otherwise returns False.
Reason: Wide_Characters for which this function returns False are not allowed
in identifiers (see 2.3) even if they are categorized as letters or digits.
Implementation Note: This function returns False if the Unicode property NFKC
Quick Check (NFKC_QC in the files) has the value No. See the Implementation
Notes in 2.3 for the source of this property.
Discussion: A string for which Is_NFKC is true for every character may still
not be in Normalization Form KC, as Is_NFKC returns true for characters whose
removal by normalization depends on the characters around them. Ada does not
provide a full normalization operation (it is complex and expensive).
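The per-character nature of Is_NFKC can be sketched as follows. This is an
illustrative example only: the helper Passes_Quick_Check and the procedure
name are hypothetical, and the code assumes an implementation that already
provides the Is_NFKC function added by AI12-0004-1. As the Discussion above
points out, a True result from the helper is a necessary condition, not proof
that the string is normalized.

```ada
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure NFKC_Demo is
   use Ada.Wide_Characters.Handling;

   --  Hypothetical helper: True if no character of S is excluded from
   --  Normalization Form KC. This is a necessary condition only; it
   --  does not show that S is actually normalized.
   function Passes_Quick_Check (S : Wide_String) return Boolean is
     (for all C of S => Is_NFKC (C));

   --  U+FB01 LATIN SMALL LIGATURE FI, which has a compatibility
   --  decomposition in the Unicode Character Database.
   Ligature_Fi : constant Wide_Character :=
     Wide_Character'Val (16#FB01#);
begin
   Ada.Text_IO.Put_Line
     ("plain:    " & Boolean'Image (Passes_Quick_Check ("file")));
   Ada.Text_IO.Put_Line
     ("ligature: "
      & Boolean'Image (Passes_Quick_Check (Ligature_Fi & "le")));
end NFKC_Demo;
```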
function Is_Graphic (Item : Wide_Character) return Boolean;
{AI05-0185-1} Returns True if the Wide_Character designated by Item is
categorized as graphic_character; otherwise returns False.
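The classification functions above can be combined to label a character. The
sketch below is one possible arrangement, not a prescribed one: the helper
Label, the order in which it tests the categories, the procedure name, and
the sample characters are all assumptions made for illustration.

```ada
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Classify_Demo is
   use Ada.Wide_Characters.Handling;

   --  Hypothetical helper: returns a label for the first predicate that
   --  matches. The test order is an arbitrary choice for this sketch.
   function Label (C : Wide_Character) return String is
   begin
      if Is_Control (C) then
         return "control";
      elsif Is_Digit (C) then
         return "decimal digit";
      elsif Is_Upper (C) then
         return "uppercase letter";
      elsif Is_Lower (C) then
         return "lowercase letter";
      elsif Is_Letter (C) then
         return "other letter";
      elsif Is_Space (C) then
         return "space separator";
      elsif Is_Mark (C) then
         return "mark";
      elsif Is_Special (C) then
         return "special graphic";
      else
         return "none of the categories tested here";
      end if;
   end Label;

   --  Samples include U+0301 COMBINING ACUTE ACCENT and U+000A.
   Samples : constant Wide_String :=
     ('A', 'z', '7', ' ', '_',
      Wide_Character'Val (16#0301#),
      Wide_Character'Val (16#000A#));
begin
   for C of Samples loop
      Ada.Text_IO.Put_Line
        ("code point" & Integer'Image (Wide_Character'Pos (C))
         & ": " & Label (C));
   end loop;
end Classify_Demo;
```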
function To_Lower (Item : Wide_Character) return Wide_Character;
{AI05-0185-1} {AI05-0266-1} {AI05-0299-1} {AI12-0263-1} Returns the Simple
Lowercase Mapping as defined by documents referenced in the note in Clause 2
of ISO/IEC 10646:2017 of the Wide_Character designated by Item. If the Simple
Lowercase Mapping does not exist for the Wide_Character designated by Item,
then the value of Item is returned.
Discussion: {AI12-0263-1} The “documents referenced” means Unicode, Chapter 4
(specifically, section 4.2, Case). The case mappings come from Unicode, as
ISO/IEC 10646:2017 does not include complete case mappings (but rather
references the Unicode ones as above). See the Implementation Notes in
subclause 1.1.4 for machine-readable versions of both Uppercase and Lowercase
mappings.
function To_Lower (Item : Wide_String) return Wide_String;
{AI05-0185-1} Returns the result of applying the To_Lower conversion to each
Wide_Character element of the Wide_String designated by Item. The result is
the null Wide_String if the value of the formal parameter is the null
Wide_String. The lower bound of the result Wide_String is 1.
function To_Upper (Item : Wide_Character) return Wide_Character;
{AI05-0185-1} {AI05-0266-1} {AI05-0299-1} {AI12-0263-1} Returns the Simple
Uppercase Mapping as defined by documents referenced in the note in Clause 2
of ISO/IEC 10646:2017 of the Wide_Character designated by Item. If the Simple
Uppercase Mapping does not exist for the Wide_Character designated by Item,
then the value of Item is returned.
function To_Upper (Item : Wide_String) return Wide_String;
{AI05-0185-1} Returns the result of applying the To_Upper conversion to each
Wide_Character element of the Wide_String designated by Item. The result is
the null Wide_String if the value of the formal parameter is the null
Wide_String. The lower bound of the result Wide_String is 1.
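Two consequences of the string conversions described above can be checked
directly: the result always has a lower bound of 1, and, because each element
is converted individually, the length is preserved. The sketch below is an
illustration only; the helper Same_Ignoring_Case, the procedure name, and the
sample values are hypothetical. With some compilers assertions must be
enabled explicitly for the pragma Assert checks to run.

```ada
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Case_Demo is
   use Ada.Wide_Characters.Handling;

   --  Hypothetical helper: case-insensitive comparison that relies on
   --  the simple (one-to-one) case mappings only.
   function Same_Ignoring_Case (L, R : Wide_String) return Boolean is
     (To_Lower (L) = To_Lower (R));

   --  A source string whose bounds deliberately do not start at 1.
   Source : constant Wide_String (11 .. 15) := "HeLLo";
   Lower  : constant Wide_String := To_Lower (Source);
begin
   --  The result starts at index 1, whatever the bounds of Item.
   pragma Assert (Lower'First = 1);
   --  Each element maps to exactly one element, so the length is kept.
   pragma Assert (Lower'Length = Source'Length);
   pragma Assert (Same_Ignoring_Case ("Ada", "ADA"));

   Ada.Text_IO.Put_Line
     ("bounds:" & Integer'Image (Lower'First) & " .."
      & Integer'Image (Lower'Last));
end Case_Demo;
```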
function To_Basic (Item : Wide_Character) return Wide_Character;
{AI12-0260-1} Returns the Wide_Character whose code point is given by the
first value of its Decomposition Mapping in the code charts of ISO/IEC
10646:2017 if any; returns Item otherwise.
function To_Basic (Item : Wide_String) return Wide_String;
{AI12-0260-1} Returns the result of applying the To_Basic conversion to each
Wide_Character element of the Wide_String designated by Item. The result is
the null Wide_String if the value of the formal parameter is the null
Wide_String. The lower bound of the result Wide_String is 1.
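To illustrate Is_Basic and To_Basic together, the sketch below applies them
to U+00E9, whose Decomposition Mapping in UnicodeData.txt is "0065 0301", so
the rule above selects U+0065 as the basic character. This is a minimal
example under stated assumptions: the procedure Show, the outer procedure
name, and the sample word are hypothetical, and the code assumes an
implementation that already provides the routines added by AI12-0260-1.

```ada
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Basic_Demo is
   use Ada.Wide_Characters.Handling;

   --  U+00E9 LATIN SMALL LETTER E WITH ACUTE.
   E_Acute : constant Wide_Character := Wide_Character'Val (16#00E9#);

   Word  : constant Wide_String := "caf" & E_Acute;
   Plain : constant Wide_String := To_Basic (Word);

   --  Hypothetical helper: prints the code point of C and whether the
   --  implementation considers it basic.
   procedure Show (Name : String; C : Wide_Character) is
   begin
      Ada.Text_IO.Put_Line
        (Name & ": code point" & Integer'Image (Wide_Character'Pos (C))
         & ", Is_Basic = " & Boolean'Image (Is_Basic (C)));
   end Show;
begin
   Show ("original", E_Acute);
   Show ("To_Basic", To_Basic (E_Acute));

   --  The string form converts element by element, so length is kept.
   Ada.Text_IO.Put_Line
     ("length kept: " & Boolean'Image (Plain'Length = Word'Length));
end Basic_Demo;
```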
Implementation Advice
{AI05-0266-1} The string returned by Character_Set_Version should include
either “10646:” or “Unicode”.
Implementation Advice: The string returned
by Wide_Characters.Handling.Character_Set_Version should include either
“10646:” or “Unicode”.
Discussion: {AI12-0263-1} The intent is that the returned string include the
year for 10646 (as in "10646:2017"), and the version number for Unicode (as in
"Unicode 10.0"). We don't try to specify that further so we don't need to
decide how to represent Corrigenda for 10646, nor which of these is preferred.
(Giving a Unicode version is more accurate, as the case folding and mapping
rules always come from a Unicode version [10646 just tells one to look at
Unicode to get those], and the character classifications ought to be the same
for equivalent versions, but we don't want to talk about non-ISO standards
in an ISO standard.)
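A program can inspect the version string and see whether it follows the
advice above. The following is a minimal sketch rather than a required usage
pattern; the procedure name and the constant Follows_Advice are hypothetical.
Because the string is implementation defined, no particular output is
assumed.

```ada
with Ada.Strings.Fixed;
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Version_Demo is
   Version : constant String :=
     Ada.Wide_Characters.Handling.Character_Set_Version;

   --  Hypothetical check for the two markers named in the advice.
   --  Index returns 0 when the pattern does not occur in the source.
   Follows_Advice : constant Boolean :=
     Ada.Strings.Fixed.Index (Version, "10646:") > 0
     or else Ada.Strings.Fixed.Index (Version, "Unicode") > 0;
begin
   Ada.Text_IO.Put_Line ("Character set version: " & Version);
   Ada.Text_IO.Put_Line
     ("Follows the advice: " & Boolean'Image (Follows_Advice));
end Version_Demo;
```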
8 {AI05-0266-1} The results returned by these functions may depend on which
particular version of the 10646 standard is supported by the implementation
(see 2.1).
Extensions to Ada 2005
Incompatibilities With Ada 2012
{AI12-0004-1} {AI12-0260-1} Added additional classification routines Is_Basic
and Is_NFKC, and additional conversion routine To_Basic. If
Wide_Characters.Handling is referenced in a use_clause, and an entity E with
one of these defining_identifiers is defined in a package that is also
referenced in a use_clause, the entity E may no longer be use-visible,
resulting in errors. This should be rare and is easily fixed if it does occur.
Ada 2005 and 2012 Editions sponsored in part by Ada-Europe