Version 1.1 of ai05s/ai05-0185-1.txt


!standard A.3.5 (0)          09-11-02 AI05-0185-1/01
!standard A.3.6 (0)
!class amendment 09-11-02
!status work item 09-11-02
!status received 09-11-02
!priority Medium
!difficulty Medium
!subject Wide_Character and Wide_Wide_Character classification and folding
!summary
Packages are needed to provide support for the classification and case folding of Wide_Character and Wide_Wide_Character values.
!problem
The package Ada.Characters.Handling provides functions to classify a Character, and functions to convert a Character to upper case and lower case. There are no such capabilities for Wide_Character and Wide_Wide_Character. Support for classification and case folding of the Wide_Character and Wide_Wide_Character types should be added to the language.
!proposal
The current version of the GNAT compiler has defined the following implementation-defined packages:
Ada.Wide_Characters.Unicode
Ada.Wide_Wide_Characters.Unicode
While Ada.Wide_Characters and Ada.Wide_Wide_Characters are standard Ada 2005 packages, the Unicode child packages are non-standard.
This proposal is to create two standard packages,
Ada.Wide_Characters.Handling and Ada.Wide_Wide_Characters.Handling,
based on the GNAT Unicode packages, but without the functions that accept Unicode Category parameters.
!wording
A.3.5 The Package Wide_Characters.Handling
The package Wide_Characters.Handling provides operations for classifying and case folding Wide_Characters.
Static Semantics
The library package Wide_Characters.Handling has the following declaration:
   package Ada.Wide_Characters.Handling is
      function Is_Control (Item : Wide_Character) return Boolean;
      function Is_Letter (Item : Wide_Character) return Boolean;
      pragma Inline (Is_Letter);
      function Is_Lower (Item : Wide_Character) return Boolean;
      function Is_Upper (Item : Wide_Character) return Boolean;
      function Is_Digit (Item : Wide_Character) return Boolean;
      pragma Inline (Is_Digit);
      function Is_Decimal_Digit (Item : Wide_Character) return Boolean;
      function Is_Hexadecimal_Digit (Item : Wide_Character) return Boolean;
      function Is_Alphanumeric (Item : Wide_Character) return Boolean;
      function Is_Special (Item : Wide_Character) return Boolean;
      function Is_Line_Terminator (Item : Wide_Character) return Boolean;
      pragma Inline (Is_Line_Terminator);
      function Is_Mark (Item : Wide_Character) return Boolean;
      function Is_Other (Item : Wide_Character) return Boolean;
      function Is_Punctuation (Item : Wide_Character) return Boolean;
      pragma Inline (Is_Punctuation);
      function Is_Space (Item : Wide_Character) return Boolean;
      pragma Inline (Is_Space);
      function Is_Graphic (Item : Wide_Character) return Boolean;
      pragma Inline (Is_Graphic);
      function To_Lower (Item : Wide_Character) return Wide_Character;
      function To_Upper (Item : Wide_Character) return Wide_Character;
      function To_Lower (Item : Wide_String) return Wide_String;
      function To_Upper (Item : Wide_String) return Wide_String;
   end Ada.Wide_Characters.Handling;
The subprograms defined in Ada.Wide_Characters.Handling are locale independent.
function Is_Control (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as other_control; otherwise returns False.
function Is_Letter (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as letter_uppercase, letter_lowercase, letter_titlecase, letter_modifier, letter_other, or number_letter; otherwise returns False.
function Is_Lower (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as letter_lowercase; otherwise returns False.
function Is_Upper (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as letter_uppercase; otherwise returns False.
function Is_Digit (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as number_decimal; otherwise returns False.
function Is_Decimal_Digit (Item : Wide_Character) return Boolean;
This function is a renaming of Is_Digit.
function Is_Hexadecimal_Digit (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as number_decimal, or is in the range 'A' .. 'F' or 'a' .. 'f'; otherwise returns False.
function Is_Alphanumeric (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as letter_uppercase, letter_lowercase, letter_titlecase, letter_modifier, letter_other, number_letter, or number_decimal; otherwise returns False.
function Is_Special (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as graphic_character, but not categorized as letter_uppercase, letter_lowercase, letter_titlecase, letter_modifier, letter_other, number_letter, or number_decimal; otherwise returns False.
function Is_Line_Terminator (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as separator_line or separator_paragraph, or if Item is a conventional line terminator character (CR, LF, VT, or FF); otherwise returns False.
function Is_Mark (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as mark_non_spacing or mark_spacing_combining; otherwise returns False.
function Is_Other (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as other_format; otherwise returns False.
function Is_Punctuation (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as punctuation_connector; otherwise returns False.
function Is_Space (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as separator_space; otherwise returns False.
function Is_Graphic (Item : Wide_Character) return Boolean;
Returns True if the Wide_Character designated by Item is categorized as graphic_character; otherwise returns False.
function To_Lower (Item : Wide_Character) return Wide_Character;
Returns the Simple Lowercase Mapping of the Wide_Character designated by Item. If the Simple Lowercase Mapping does not exist for the Wide_Character designated by Item, then the value of Item is returned.
function To_Lower (Item : Wide_String) return Wide_String;
Returns the result of applying the To_Lower conversion for Wide_Character to each element of the Wide_String designated by Item. The result is the null Wide_String if the value of the formal parameter is the null Wide_String.
function To_Upper (Item : Wide_Character) return Wide_Character;
Returns the Simple Uppercase Mapping of the Wide_Character designated by Item. If the Simple Uppercase Mapping does not exist for the Wide_Character designated by Item, then the value of Item is returned.
function To_Upper (Item : Wide_String) return Wide_String;
Returns the result of applying the To_Upper conversion for Wide_Character to each element of the Wide_String designated by Item. The result is the null Wide_String if the value of the formal parameter is the null Wide_String.
A.3.6 The Package Wide_Wide_Characters.Handling
The package Wide_Wide_Characters.Handling has the same contents as Wide_Characters.Handling except that each occurrence of Wide_Character is replaced by Wide_Wide_Character, and each occurrence of Wide_String is replaced by Wide_Wide_String.
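As an illustration only, the substitution described above might be exercised as in the following sketch. It assumes an implementation that already provides Ada.Wide_Wide_Characters.Handling as proposed; the procedure name Wide_Wide_Example and the choice of the Greek letter alpha are assumptions made for the example.

```ada
with Ada.Wide_Wide_Text_IO;
with Ada.Wide_Wide_Characters.Handling;

procedure Wide_Wide_Example is
   use Ada.Wide_Wide_Characters.Handling;

   --  GREEK SMALL LETTER ALPHA (U+03B1) and GREEK CAPITAL LETTER ALPHA
   --  (U+0391), built by code point so that the example does not depend
   --  on the encoding of the source file.
   Small_Alpha   : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#03B1#);
   Capital_Alpha : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#0391#);
begin
   --  Same profiles as the Wide_Character versions, with
   --  Wide_Wide_Character substituted for Wide_Character.
   if Is_Lower (Small_Alpha)
     and then To_Upper (Small_Alpha) = Capital_Alpha
   then
      Ada.Wide_Wide_Text_IO.Put_Line ("small alpha folds to capital alpha");
   end if;
end Wide_Wide_Example;
```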
!discussion
The GNAT Unicode packages define a Category type which maps to the Unicode standard. Second forms of most of the classification routines exist that operate on Category parameters instead of Wide_Character or Wide_Wide_Character. These routines exist because they are claimed to be more efficient when multiple classification tests are to be performed on a single Wide_Character or Wide_Wide_Character value; otherwise, the form of the call that accepts Wide_Character or Wide_Wide_Character is expected to be more efficient. The Category type, however, would tie the package more closely to the Unicode standard, whereas it is desirable to hide that abstraction. Furthermore, adding these routines would likely mean having to define a package like System.UTF_32, which is currently defined in GNAT. It seems that the categorization routines are not necessary for the standard, and might be better left as implementation-defined functionality.
The package Ada.Characters.Handling defines classification routines that are not present in the GNAT packages Ada.Wide_Characters.Unicode and Ada.Wide_Wide_Characters.Unicode. Specifically, Is_Control, Is_Lower, Is_Upper, Is_Basic, Is_Decimal_Digit, Is_Graphic, Is_Hexadecimal_Digit, Is_Alphanumeric, and Is_Special are absent. These should be provided to be consistent with Ada.Characters.Handling.
The Is_Non_Graphic routine was replaced with Is_Graphic, and the remaining functions were added, except for the Is_Basic and To_Basic functions. It is not clear whether these functions have any meaning in Wide_Character or Wide_Wide_Character contexts, as there do not appear to be any Unicode functions for stripping off diacritical marks, and it is not clear that doing so would result in a string that was meaningful.
Also, the ISO_646-related functions were not added; since they deal with 8-bit values, they were deemed inappropriate for Wide_Character and Wide_Wide_Character contexts.
Another question is whether some of the new classification functions should be added to Ada.Characters.Handling. The wording in the RM for that package describes the classification in terms of character ranges rather than the categories defined in 2.1. Should these be reworded in terms of these categories? [That question is tangentially covered by AI05-0114-1 - Editor.]
!example
(See discussion.)
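A minimal sketch of how the proposed Ada.Wide_Characters.Handling might be used, assuming an implementation that provides the package as worded above. The procedure name Folding_Example, the helper Count_Letters, and the sample text are illustrative assumptions and are not part of the proposal.

```ada
with Ada.Wide_Text_IO;
with Ada.Wide_Characters.Handling;

procedure Folding_Example is
   use Ada.Wide_Characters.Handling;

   --  Hypothetical helper: counts the elements of S classified as letters.
   function Count_Letters (S : Wide_String) return Natural is
      Count : Natural := 0;
   begin
      for I in S'Range loop
         if Is_Letter (S (I)) then
            Count := Count + 1;
         end if;
      end loop;
      return Count;
   end Count_Letters;

   --  LATIN SMALL LETTER E WITH ACUTE (U+00E9), built by code point so
   --  that the example does not depend on the source file encoding.
   E_Acute : constant Wide_Character := Wide_Character'Val (16#00E9#);

   Sample : constant Wide_String := "caf" & E_Acute & " 2005";
begin
   --  Case folding of a Wide_String is applied element by element.
   Ada.Wide_Text_IO.Put_Line (To_Upper (Sample));
   Ada.Wide_Text_IO.Put_Line (To_Lower (Sample));

   --  Classification of a whole string via the character-level function.
   Ada.Wide_Text_IO.Put_Line
     ("Letters:" & Natural'Wide_Image (Count_Letters (Sample)));
end Folding_Example;
```

Because the subprograms are locale independent, the folding of the accented letter here does not depend on the environment in which the program runs; how the result is displayed still depends on the encoding used by Wide_Text_IO.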
!ACATS test
!appendix

****************************************************************
