CVS difference for ais/ai-00285.txt

Differences between 1.13 and version 1.14

--- ais/ai-00285.txt	2003/09/19 01:42:27	1.13
+++ ais/ai-00285.txt	2003/11/27 02:01:14	1.14
@@ -1,4 +1,4 @@
-!standard A.3.2(49)                                    03-06-12  AI95-00285/04
+!standard A.3.2(49)                                    03-11-05  AI95-00285/05
 !class amendment 02-01-23
 !status work item 02-09-24
 !status received 02-01-15
@@ -9,21 +9,28 @@
 !summary
 
 Support is added for program text using the entire set of characters from
-ISO/IEC 10646, and for operating on characters outside of the BMP at run-time.
+ISO/IEC 10646:2003, and for operating on characters outside of the BMP at
+run-time.
 
 !problem
 
 SC22 directed its working groups to provide support for the ISO/IEC 10646
-character set:
+character set. Resolution 02-24 "Recommendation on Coded Character Sets Support"
+of the SC22 2002 plenary states:
 
 "JTC 1/SC 22 believes that programming languages should offer the appropriate
 support for ISO/IEC 10646, and the Unicode character set where appropriate."
 
-Moreover, the working draft of ISO/IEC 10646:2003 makes use of planes other
-than the BMP.
+Moreover, ISO/IEC 10646:2003 makes use of planes other than the BMP.
 
 !proposal
 
+[Author's note: This AI is based on the working draft of ISO/IEC 10646:2003
+dated 2003-02-13.  This standard is currently in the FDIS stage and is expected
+to be published in 2003.  While the !proposal of this AI contains numerous
+references to Unicode, the !wording section is carefully phrased to avoid such
+mentions.]
+
 The essence of this proposal is to allow the source of the program to be
 written using 16-bit characters (from the BMP) or 32-bit characters. Also,
 it makes it possible to operate on 32-bit characters at run-time
@@ -31,11 +38,11 @@
 The main difficulty in supporting characters beyond Row 00 of the BMP in the
 program text is to define how identifiers and literals are built (which
 characters are letters, digits, etc.) and to define the lower/upper case
-equivalence rules. Fortunately, the people developing ISO/IEC 10646 have
-already done most of the work for us, so it's only a matter of defining how we
-want to piggyback on their categorization and conversion rules.
+equivalence rules. Fortunately, the Unicode Consortium has already done most of
+the work for us, so it's only a matter of defining how we want to piggyback on
+their categorization and conversion rules.
 
-ISO/IEC defines a "character database" which describes all the properties of
+Unicode defines a "character database" which describes all the properties of
 each character. The most important property for our purposes is the "General
 Category". General categories are disjoint. The following categories are of
 interest for describing Ada program text:
@@ -70,13 +77,12 @@
 
 Throughout the syntax rules, we specify which characters are allowed for the
 lexical elements. For instance, the E in the exponent part of a numeric literal
-may not be a "Greek Capital Letter Epsilon", even though a capital E and a
+may not be a "GREEK CAPITAL LETTER EPSILON", even though a capital E and a
 capital epsilon look very much the same. Similar considerations apply to the
-extended digits, the point, etc. So this means that we are not changing which
-characters may be used to build numeric_literals, based_literals, and so on.
+extended digits, the point, etc.
 
-ISO/IEC 10646 proposes to define identifiers for programming languages as
-follows (see http://www.unicode.org/unicode/reports/tr15/tr15-
+Unicode proposes to define identifiers for programming languages as follows (see
+http://www.unicode.org/unicode/reports/tr15/tr15-
 22.html#Programming_Language_Identifiers):
 
    identifier ::= identifier_start {identifier_start | identifier_extend}
@@ -111,8 +117,8 @@
                          other_format
    identifier ::= identifier_start {[punctuation_connector] identifier_extend}
 
-ISO/IEC 10646 recommends that, before storing or comparing identifiers, the
-following transformations be applied:
+Unicode recommends that, before storing or comparing identifiers, the following
+transformations be applied:
 
 o   Characters in category other_format are filtered out.
 o   For languages which have case insensitive identifiers, Normalization Form
@@ -124,7 +130,7 @@
     http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt, is used to
     find the uppercase version of each character.
 
-ISO/IEC 10646 doesn't provide guidance for the composition of numeric literals,
+Unicode doesn't provide guidance for the composition of numeric literals,
 but it is apparent that we can use the character categories above. So we
 define:
 
@@ -134,7 +140,7 @@
 Again, characters in category other_format (and punctuation_connector) are
 ignored when computing the value of a decimal literal. The numerical value of
 each character that is a number_decimal_digit is defined by the field "Decimal
-digit value" of the ISO/IEC 10646 character database.
+digit value" of the Unicode character database.
 
 The definition and role of format_effectors is modified to include the
 characters at positions 16#85#, 16#2028# and 16#2029#. These characters may be
@@ -187,59 +193,108 @@
 provided. Their definition is similar to that of Wide_Image, Wide_Value
 and Wide_Width, respectively, with Wide_Character and Wide_String replaced by
 Wide_Wide_Character and Wide_Wide_String.
-
-
-<<Open Issue>>
-
-There is a specific problem with spaces, and it is unclear what is the right
-thing to do.
 
-The dynamic semantics of a number of operations (attribute Value, procedures
-Get in Text_IO, procedures Trim in the string packages, etc.) are defined in
-terms of "space" and "blank". A space is the character at position 16#20# and a
-blank is either a space or a horizontal tabulation.
-
-For the purposes of this AI, it would be more consistent to replace space by
-separator_space, and let Get skip any separator_space, and Value and Trim trim
-leading and trailing separator_space. For instance, in a program operating on
-ideographs, it would be nice to skip/trim any Ideographic Space. Unfortunately,
-this would be an incompatibility. In the case of Value and Get, the
-incompatibility would only show up in cases which currently raise
-Constraint_Error, so it is probably acceptable. But in the case of Trim, this
-would be a silent change of the dynamic semantics...
+Note that the dynamic semantics of a number of operations (attribute Value,
+procedures Get in Text_IO, procedures Trim in the string packages, etc.) are
+defined in terms of "space" and "blank". A space is the character at position
+16#20# and a blank is either a space or a horizontal tabulation. We are not
+changing the definition of space or blank, so characters like NO-BREAK SPACE or
+IDEOGRAPHIC SPACE are not considered to be space or blank in this context.
+
+SC22/WG14 is considering the inclusion of support for Unicode 16- and 32-bit
+characters in C. Their current proposal can be found at
+http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n1010.pdf. In order to provide
+compatibility with the upcoming C standard, new types are added to Interfaces.C
+that correspond to C char16_t and char32_t. It is recognized that adding new
+declarations to predefined units can cause incompatibilities, but it is thought
+that the new identifiers are unlikely to conflict with existing code.
+
+<<Open Issue #1>>
+
+Some countries have expressed concern about referencing Unicode in the Ada
+standard. It must be noted that ISO/IEC 10646:2003 *does* reference versions 3.2
+and 4.0 of the Unicode standard, including in normative text. In order to avoid
+direct references to Unicode in the wording below, I am referencing the various
+documents indirectly through ISO/IEC 10646:2003. The Unicode documents that are
+needed for this AI are (1) the character categorization database and (2) the
+case folding table. It would be possible to state the categorization and folding
+rules explicitly in the RM, but that would add a few pages of incomprehensible
+gibberish that would be hard to maintain.
+
+Another option would be to reference annex A of ISO/IEC TR 10176:2002. However,
+there are two issues with this approach. The first one is that this annex is
+based on Unicode 3.0, so there would probably be inconsistencies with ISO/IEC
+10646:2003. The second is that this TR (which contains guidelines for the
+preparation of language standards) is completely at odds with the way the Ada RM
+is written. If we reference one bit of it, it begs the question: why are we not
+complying with the entire document?
+
+<<Open Issue #2>>
+
+Unicode makes recommendations regarding the character repertoire to use for
+programming language identifiers, but it doesn't make any recommendation
+regarding the character repertoire to use for numeric literals. The AI is
+currently written in such a way that national variants of digits are allowed in
+numeric literals. Is this a good idea? The rationale is that if people from a
+non-occidental culture write code in their native language, it might be useful
+for them to use their native notation for numbers, assuming that occidental
+digits are not the most commonly used form of digits in their culture (for
+instance, Arabic-speaking countries seem to favor the Arabic form of the digits).
+
+<<Open Issue #3>>
+
+Unicode recommends using Normalization Form KC (which ISO/IEC 10646:2003
+defines by reference) for identifiers. The purpose of this normalization
+process is to deal with the case where the same logical character may be
+represented by several distinct sequences of physical characters. For instance,
+the sequence of two characters:
+
+   LATIN CAPITAL LETTER E -- 16#0045#
+   COMBINING ACUTE ACCENT -- 16#0301#
+
+is transformed by normalization form KC into:
+
+   LATIN CAPITAL LETTER E WITH ACUTE -- 16#00C9#
+
+The purpose of normalization is to ensure that two program texts that look the
+same to users will actually be processed identically by the compiler. For
+instance, it may be that two users on the same project are using different
+editors, and that one produces the above sequence of two characters when the
+other produces a single (accented) character. It would be nice if the difference
+were invisible to users.
+
+Is this a good idea? If we apply normalization to identifiers, should we apply
+it to string and character literals, too? Or should we base the language solely
+on the sequence of physical characters, and force the users to look for errors
+in their programs in a hex dump of their source?
 
-<<Open Issue>>
 
-Do we need types corresponding to Wide_Wide_Character and Wide_Wide_String in
-Interfaces.C?  What does C do about 32-bit characters?
-
-
 !wording
 
 In (32) change:
 
-... Character, [and Wide_Character]{Wide_Character and Wide_Wide_Character} ...
+... Character, [and Wide_Character]{Wide_Character, and Wide_Wide_Character} ...
 
 
 In (34) change:
 
-... String [and Wide_String]{, Wide_String and Wide_Wide_String} ...
+... String [and Wide_String]{, Wide_String, and Wide_Wide_String} ...
 
 
 Add after 1.1.4(14):
 
 The nonterminals of the grammar, including reserved words and components of
-lexical elements, are exclusively made of the characters whose Code Point is
+lexical elements, are exclusively made of the characters whose code position is
 between 16#20# and 16#7E#, inclusively. For example, the character E in the
-definition of exponent is the character whose name is "Latin Capital Letter E",
-not "Greek Capital Letter Epsilon".
+definition of exponent is the character whose name is "LATIN CAPITAL LETTER E",
+not "GREEK CAPITAL LETTER EPSILON".
 
 
 Replace 2.1(1) by:
 
-The characters whose Code Point is 16#FFFE# or 16#FFFF# are not allowed
+The characters whose code position is 16#FFFE# or 16#FFFF# are not allowed
 anywhere in the text of a program. The characters in categories other_control,
-other_private_use and other_surrogate are only allowed in comments.
+other_private_use, and other_surrogate are only allowed in comments.
 
 
 Delete 2.1(2-3).
@@ -248,112 +303,95 @@
 Replace 2.1(4-14) by:
 
 The character repertoire for the text of an Ada program consists of the
-collection of characters described by the ISO/IEC 10646 Universal
-Multiple-Octet Coded Character Set [Author's note: I am actually using Unicode
-3.2.0]. The coded representation for these characters is implementation defined
-(it need not be a representation defined within ISO/IEC 10646).
+collection of characters described by the ISO/IEC 10646:2003 Universal
+Multiple-Octet Coded Character Set. The coded representation for these
+characters is implementation defined (it need not be a representation defined
+within ISO/IEC 10646:2003).
 
 The description of the language definition in this International Standard uses
-the fields Code Point, Character Name, General Category, Decimal Digit Value
-and Unicode 1.0 Name of the character database defined by ISO/IEC 10646. The
-actual set of graphic symbols used by an implementation for the visual
-representation of the text of an Ada program is not specified.
+the character properties General Category and Decimal Digit Value of the
+documents referenced by the note in section 1 of ISO/IEC 10646:2003. The actual
+set of graphic symbols used by an implementation for the visual representation
+of the text of an Ada program is not specified.
+
+[Author's note: the above jargon is a polite way of saying Unicode without using
+the characters U, n, i, c, o, d and e.  ISO/IEC 10646:2003 references Unicode
+all over the place, including in normative text.  As a matter of fact, a number
+of Unicode technical reports are listed in the "Normative references" section of
+ISO/IEC 10646:2003.  So rather than directly referencing Unicode, which might be
+hard to swallow for WG9 or SC22, I am using an indirect reference through
+ISO/IEC 10646:2003, which hopefully will be considered kosher.]
 
 The categories of characters are defined as follows:
 
 letter_uppercase
-Any character whose General Category is defined by ISO/IEC 10646 to be "Letter,
-Uppercase".
+Any character whose General Category is defined to be "Letter, Uppercase".
 
 letter_lowercase
-Any character whose General Category is defined by ISO/IEC 10646 to be "Letter,
-Lowercase".
+Any character whose General Category is defined to be "Letter, Lowercase".
 
 letter_titlecase
-Any character whose General Category is defined by ISO/IEC 10646 to be "Letter,
-Titlecase".
+Any character whose General Category is defined to be "Letter, Titlecase".
 
 letter_modifier
-Any character whose General Category is defined by ISO/IEC 10646 to be "Letter,
-Modifier".
+Any character whose General Category is defined to be "Letter, Modifier".
 
 letter_other
-Any character whose General Category is defined by ISO/IEC 10646 to be "Letter,
-Other".
+Any character whose General Category is defined to be "Letter, Other".
 
 mark_non_spacing
-Any character whose General Category is defined by ISO/IEC 10646 to be "Mark,
-Non-Spacing".
+Any character whose General Category is defined to be "Mark, Non-Spacing".
 
 mark_spacing_combining
-Any character whose General Category is defined by ISO/IEC 10646 to be "Mark,
-Spacing Combining".
+Any character whose General Category is defined to be "Mark, Spacing Combining".
 
 number_decimal_digit
-Any character whose General Category is defined by ISO/IEC 10646 to be "Number,
-Decimal Digit".
+Any character whose General Category is defined to be "Number, Decimal Digit".
 
 number_letter
-Any character whose General Category is defined by ISO/IEC 10646 to be "Number,
-Letter".
+Any character whose General Category is defined to be "Number, Letter".
 
 other_control
-Any character whose General Category is defined by ISO/IEC 10646 to be "Other,
-Control".
+Any character whose General Category is defined to be "Other, Control".
 
 other_format
-Any character whose General Category is defined by ISO/IEC 10646 to be "Other,
-Format".
+Any character whose General Category is defined to be "Other, Format".
 
 other_private_use
-Any character whose General Category is defined by ISO/IEC 10646 to be "Other,
-Private Use".
+Any character whose General Category is defined to be "Other, Private Use".
 
 other_surrogate
-Any character whose General Category is defined by ISO/IEC 10646 to be "Other,
-Surrogate".
+Any character whose General Category is defined to be "Other, Surrogate".
 
 punctuation_connector
-Any character whose General Category is defined by ISO/IEC 10646 to be
-"Punctuation, Connector".
+Any character whose General Category is defined to be "Punctuation, Connector".
 
 separator_space
-Any character whose General Category is defined by ISO/IEC 10646 to be
-"Separator, Space".
+Any character whose General Category is defined to be "Separator, Space".
 
 separator_line
-Any character whose General Category is defined by ISO/IEC 10646 to be
-"Separator, Line".
+Any character whose General Category is defined to be "Separator, Line".
 
 separator_paragraph
-Any character whose General Category is defined by ISO/IEC 10646 to be
-"Separator, Paragraph".
+Any character whose General Category is defined to be "Separator, Paragraph".
 
 format_effector
-The characters whose Unicode 1.0 name is "Character Tabulation", "Line
-Tabulation", "Carriage Return (CR)", "Line Feed (LF)", "Form Feed (FF)" and
-"Next Line (NEL)", and the characters in categories separator_line and
-separator_paragraph.
+The characters whose code position is 16#09# (CHARACTER TABULATION), 16#0A#
+(LINE FEED (LF)), 16#0B# (LINE TABULATION), 16#0C# (FORM FEED (FF)), 16#0D#
+(CARRIAGE RETURN (CR)), 16#85# (NEXT LINE (NEL)), and the characters in categories
+separator_line and separator_paragraph.  The names mentioned in parentheses in
+this list are *not* defined by ISO/IEC 10646; they are only used for convenience
+in this International Standard.
 
 graphic_character
 Any character which is not in the categories other_control, other_private_use,
-other_surrogate, other_format, format_effector, and whose Code Point is neither
-16#FFFE# nor 16#FFFF#. (This includes all the characters that have not yet been
-classified by ISO/IEC 10646.)
+other_surrogate, other_format, format_effector, and whose code position is
+neither 16#FFFE# nor 16#FFFF#.
 
 
 Delete 2.1(15).
 
 
-Add after 2.1(16):
-
-Documentation Requirement
-
-As the ISO/IEC 10646 character set is constantly evolving (in particular by the
-addition of new languages), an implementation shall document to which version
-of ISO/IEC 10646 it conforms.
-
-
 Delete 2.1(17).
 
 
@@ -371,24 +409,24 @@
 
 Replace 2.2(8-9) by:
 
-A delimiter is either one of the characters whose Character Name is:
+A delimiter is either one of the characters whose name is:
 
-o   Ampersand
-o   Apostrophe
-o   Left Parenthesis
-o   Right Parenthesis
-o   Asterisk
-o   Plus Sign
-o   Comma
-o   Hyphen-Minus
-o   Full Stop
-o   Solidus
-o   Colon
-o   Semicolon
-o   Less-Than Sign
-o   Equals Sign
-o   Greater-Than Sign
-o   Vertical Line
+o   AMPERSAND
+o   APOSTROPHE
+o   LEFT PARENTHESIS
+o   RIGHT PARENTHESIS
+o   ASTERISK
+o   PLUS SIGN
+o   COMMA
+o   HYPHEN-MINUS
+o   FULL STOP
+o   SOLIDUS
+o   COLON
+o   SEMICOLON
+o   LESS-THAN SIGN
+o   EQUALS SIGN
+o   GREATER-THAN SIGN
+o   VERTICAL LINE
 
 
 Replace 2.3(2-3) by:
@@ -413,9 +451,11 @@
 characters after applying the following transformations (in this order):
 
 o   The characters in category other_format are eliminated.
-o   Normalization Form KC of ISO/IEC 10646 is applied to the identifier.
-o   Full case folding, as defined by ISO/IEC 10646, is applied to obtain the
-    uppercase version of each character.
+o   Normalization Form KC defined by section 24 of ISO/IEC 10646:2003 is applied
+    to the identifier.
+o   Full case folding, as defined by documents referenced in the note in
+    section 1 of ISO/IEC 10646:2003, is applied to obtain the uppercase version
+    of each character.
 
 
 Replace 2.4.1(3) by:
@@ -450,7 +490,7 @@
 
 Add after 2.6(7):
 
-No modification is performed on the sequence of characters in a string_literal. In particular, Normalization Form KC is _not_ applied. Therefore, two strings which look alike may not compare equal.
+No modification is performed on the sequence of characters in a string_literal.
 In particular, Normalization Form KC is _not_ applied. Therefore, two strings
 which look alike may not compare equal.
 
@@ -563,23 +603,23 @@
 In the middle of 3.5.2(2), change:
 
 ... the attributes [(Wide_)Image and (Wide_)Value]{Image, Wide_Image,
-Wide_Wide_Image, Value, Wide_Value and Wide_Wide_Value}
+Wide_Wide_Image, Value, Wide_Value, and Wide_Wide_Value}
 
 
 Add after 3.5.2(3):
 
 The predefined type Wide_Wide_Character is a character type whose values
-correspond to the 2147483648 code points of the ISO/IEC 10646 character set.
-Each of the graphic_characters has a corresponding character_literal in
+correspond to the 2147483648 code positions of the ISO/IEC 10646:2003 character
+set. Each of the graphic_characters has a corresponding character_literal in
 Wide_Wide_Character. The first 65536 values of Wide_Wide_Character have the
 same character_literal or language-defined name as defined for Wide_Character.
 
-In types Wide_Character and Wide_Wide_Characters, the characters whose Code
-Points are 16#FFFE# and 16#FFFF# are assigned the language-defined names FFFE
-and FFFF. The other characters whose Code Point is larger than 16#FF# and which
-are not graphic_characters have language-defined names which are formed by
-appending to the string "Character_" the representation of their Code Point in
-hexadecimal as four extended digits (in the case of Wide_Character) or eight
+In types Wide_Character and Wide_Wide_Character, the characters whose code
+positions are 16#FFFE# and 16#FFFF# are assigned the language-defined names FFFE
+and FFFF. The other characters whose code position is larger than 16#FF# and
+which are not graphic_characters have language-defined names which are formed by
+appending to the string "Character_" the representation of their code position
+in hexadecimal as four extended digits (in the case of Wide_Character) or eight
 extended digits (in the case of Wide_Wide_Character). As with other
 language-defined names, these names are usable only with the attributes
 (Wide_)Wide_Image and (Wide_)Wide_Value; they are not usable as enumeration
@@ -588,7 +628,7 @@
 
 In 3.5.2(4) change:
 
-... Character [and Wide_Character]{, Wide_Character and Wide_Wide_Character}
+... Character [and Wide_Character]{, Wide_Character, and Wide_Wide_Character}
 ...
 
 
@@ -597,7 +637,7 @@
 
 Replace 3.6.3(2) by:
 
-There are three predefined string types, String, Wide_String and
+There are three predefined string types, String, Wide_String, and
 Wide_Wide_String, each indexed by the value of the predefined subtype Positive;
 these are declared in the visible part of package Standard:
 
@@ -614,9 +654,9 @@
 
 Add in the middle of A.1(36)
 
-    -- The declaration of type Wide_Wide_Character is based on the full ISO/IEC
-    -- character set. The first 2 ** 16 positions have the same contents as type
-    -- Wide_Character. See 3.5.2.
+    -- The declaration of type Wide_Wide_Character is based on the full
+    -- ISO/IEC 10646:2003 character set. The first 2 ** 16 positions have the
+    -- same contents as type Wide_Character. See 3.5.2.
     type Wide_Wide_Character is (nul, soh, ..., FFFE, FFFF, ...);
 
 
@@ -630,7 +670,7 @@
 
 Replace the beginning of A.1(49) by:
 
-In each of the type Character [and Wide_Character]{, Wide_Character and
+In each of the types Character [and Wide_Character]{, Wide_Character, and
 Wide_Wide_Character} ...
 
 
@@ -682,7 +722,7 @@
 
 The following functions test Wide_Wide_Character or Wide_Character values for
 membership in Wide_Character or Character, or convert between corresponding
-characters of Wide_Wide_Character, Wide_Character and Character.
+characters of Wide_Wide_Character, Wide_Character, and Character.
 
 function Is_Character (Item : in Wide_Character) return Boolean;
 Returns True if Wide_Character'Pos(Item) <= Character'Pos(Character'Last).
@@ -757,7 +797,7 @@
 
 In A.4(1) change:
 
-... both String [and Wide_String]{, Wide_String and Wide_Wide_String} ...
+... both String [and Wide_String]{, Wide_String, and Wide_Wide_String} ...
 
 
 Add after A.4.1(4):
@@ -827,7 +867,8 @@
             return Wide_Wide_Character_Set;
       function To_Sequence (Set : in Wide_Wide_Character_Set)
             return Wide_Wide_Character_Sequence;
-      -- Representation for a Wide_Wide_Character to Wide_Wide_Character mapping:
+      -- Representation for a Wide_Wide_Character to Wide_Wide_Character
+      -- mapping:
       type Wide_Wide_Character_Mapping is private;
       function Value (Map : in Wide_Wide_Character_Mapping;
                       Element : in Wide_Wide_Character)
@@ -884,7 +925,7 @@
 
 In A.6(1) change:
 
-... packages Text_IO [and Wide_Text_IO]{, Wide_Text_IO and Wide_Wide_Text_IO}
+... packages Text_IO [and Wide_Text_IO]{, Wide_Text_IO, and Wide_Wide_Text_IO}
 ...
 
 
@@ -901,7 +942,7 @@
 
 In A.7(13) change:
 
-... Direct_IO, Text_IO [and Wide_Text_IO]{, Wide_Text_IO and Wide_Wide_Text_IO}
+... Direct_IO, Text_IO [and Wide_Text_IO]{, Wide_Text_IO, and Wide_Wide_Text_IO}
 ...
 
 
@@ -943,7 +984,7 @@
 In A.12(1) change:
 
 ... Text_IO.Text_Streams [and Wide_Text_IO.Text_Streams]{,
-Wide_Text_IO.Text_Streams and Wide_Wide_Text_IO.Text_Streams} ...
+Wide_Text_IO.Text_Streams, and Wide_Wide_Text_IO.Text_Streams} ...
 
 
 Add a new section after A.12.3:
@@ -968,30 +1009,173 @@
 Streams.Stream_IO.
 
 
+Add after B.3(39):
+
+   -- ISO/IEC 10646:2003 compatible types defined by SC22/WG14 document N1010.
+
+   type char16_t is <implementation-defined character type>;
+
+   char16_nul : constant char16_t := implementation-defined;
+
+   function To_C (Item : in Wide_Character) return char16_t;
+   function To_Ada (Item : in char16_t) return Wide_Character;
+
+   type char16_array is array (size_t range <>) of aliased char16_t;
+
+   pragma Pack(char16_array);
+
+   function Is_Nul_Terminated (Item : in char16_array) return Boolean;
+   function To_C (Item : in Wide_String;
+                  Append_Nul : in Boolean := True)
+      return char16_array;
+
+   function To_Ada (Item : in char16_array;
+                    Trim_Nul : in Boolean := True)
+      return Wide_String;
+
+   procedure To_C (Item : in Wide_String;
+                   Target : out char16_array;
+                   Count : out size_t;
+                   Append_Nul : in Boolean := True);
+
+   procedure To_Ada (Item : in char16_array;
+                     Target : out Wide_String;
+                     Count : out Natural;
+                     Trim_Nul : in Boolean := True);
+
+   type char32_t is <implementation-defined character type>;
+
+   char32_nul : constant char32_t := implementation-defined;
+
+   function To_C (Item : in Wide_Wide_Character) return char32_t;
+   function To_Ada (Item : in char32_t) return Wide_Wide_Character;
+
+   type char32_array is array (size_t range <>) of aliased char32_t;
+
+   pragma Pack(char32_array);
+
+   function Is_Nul_Terminated (Item : in char32_array) return Boolean;
+   function To_C (Item : in Wide_Wide_String;
+                  Append_Nul : in Boolean := True)
+      return char32_array;
+
+   function To_Ada (Item : in char32_array;
+                    Trim_Nul : in Boolean := True)
+      return Wide_Wide_String;
+
+   procedure To_C (Item : in Wide_Wide_String;
+                   Target : out char32_array;
+                   Count : out size_t;
+                   Append_Nul : in Boolean := True);
+
+   procedure To_Ada (Item : in char32_array;
+                     Target : out Wide_Wide_String;
+                     Count : out Natural;
+                     Trim_Nul : in Boolean := True);
+
+
+In B.3(43) change:
+
+The types int, short, long, unsigned, ptrdiff_t, size_t, double, char [, and
+wchar_t]{, wchar_t, char16_t, and char32_t} correspond respectively to the C
+types having the same names.
+
+
+Add after B.3(60):
+
+   function Is_Nul_Terminated (Item : in char16_array) return Boolean;
+
+      The result of Is_Nul_Terminated is True if Item contains char16_nul, and
+      is False otherwise.
+
+   function To_C (Item : in Wide_Character) return char16_t;
+   function To_Ada (Item : in char16_t) return Wide_Character;
+
+      To_C and To_Ada provide the mappings between the Ada and C 16-bit character
+      types.
+
+   function To_C (Item : in Wide_String;
+                  Append_Nul : in Boolean := True)
+      return char16_array;
+
+   function To_Ada (Item : in char16_array;
+                    Trim_Nul : in Boolean := True)
+      return Wide_String;
+
+   procedure To_C (Item : in Wide_String;
+                   Target : out char16_array;
+                   Count : out size_t;
+                   Append_Nul : in Boolean := True);
+
+   procedure To_Ada (Item : in char16_array;
+                     Target : out Wide_String;
+                     Count : out Natural;
+                     Trim_Nul : in Boolean := True);
+
+      The To_C and To_Ada subprograms that convert between Wide_String and
+      char16_array have analogous effects to the To_C and To_Ada subprograms
+      that convert between String and char_array, except that char16_nul is used
+      instead of nul.
+
+   function Is_Nul_Terminated (Item : in char32_array) return Boolean;
+
+      The result of Is_Nul_Terminated is True if Item contains char32_nul, and
+      is False otherwise.
+
+   function To_C (Item : in Wide_Wide_Character) return char32_t;
+   function To_Ada (Item : in char32_t) return Wide_Wide_Character;
+
+      To_C and To_Ada provide the mappings between the Ada and C 32-bit character
+      types.
+
+   function To_C (Item : in Wide_Wide_String;
+                  Append_Nul : in Boolean := True)
+      return char32_array;
+
+   function To_Ada (Item : in char32_array;
+                    Trim_Nul : in Boolean := True)
+      return Wide_Wide_String;
+
+   procedure To_C (Item : in Wide_Wide_String;
+                   Target : out char32_array;
+                   Count : out size_t;
+                   Append_Nul : in Boolean := True);
+
+   procedure To_Ada (Item : in char32_array;
+                     Target : out Wide_Wide_String;
+                     Count : out Natural;
+                     Trim_Nul : in Boolean := True);
+
+      The To_C and To_Ada subprograms that convert between Wide_Wide_String and
+      char32_array have analogous effects to the To_C and To_Ada subprograms
+      that convert between String and char_array, except that char32_nul is used
+      instead of nul.
+
+
 At the beginning of C.5(7) change:
 
 If the pragma applies to an enumeration type, then the semantics of the
 Wide_Wide_Image and Wide_Wide_Value attributes are implementation defined for
-that type; the semantics of Image, Wide_Image, Value and Wide_Value are still
+that type; the semantics of Image, Wide_Image, Value, and Wide_Value are still
 defined in terms of Wide_Wide_Image and Wide_Wide_Value...
 
 
 In F(4) change:
 
-... Text_IO.Editing [and Wide_Text_IO.Editing]{, Wide_Text_IO.Editing and
+... Text_IO.Editing [and Wide_Text_IO.Editing]{, Wide_Text_IO.Editing, and
 Wide_Wide_Text_IO.Editing} ...
 
 
 In F.3(1) change:
 
-... Text_IO.Editing [and Wide_Text_IO.Editing]{, Wide_Text_IO.Editing and
+... Text_IO.Editing [and Wide_Text_IO.Editing]{, Wide_Text_IO.Editing, and
 Wide_Wide_Text_IO.Editing} ...
 
 
 At the beginning of F.3(1) change:
 
 The child packages Text_IO.Editing [and Wide_Text_IO.Editing]{,
-Wide_Text_IO.Editing and Wide_Wide_Text_IO.Editing}...
+Wide_Text_IO.Editing, and Wide_Wide_Text_IO.Editing}...
 
 
 Add at the end of F.3(6):
@@ -1008,7 +1192,7 @@
 
 In F.3(20) change:
 
-... Text_IO.Editing [and Wide_Text_IO.Editing]{, Wide_Text_IO.Editing and
+... Text_IO.Editing [and Wide_Text_IO.Editing]{, Wide_Text_IO.Editing, and
 Wide_Wide_Text_IO.Editing} ...
 
 

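
As a rough illustration of the identifier rules in the diff above (eliminate
characters in category other_format, apply Normalization Form KC, then apply
full case folding), the following Python sketch uses the standard unicodedata
module. It is not part of the AI, and it rests on several assumptions: the
function name canonical_identifier is hypothetical; Python's Unicode tables
track a later Unicode version than the 3.2 and 4.0 versions that the AI
mentions; and str.casefold folds to lowercase, whereas the AI describes the
result of folding as the "uppercase version" of each character. For comparing
two identifiers the direction of the folding should not matter.

```python
import unicodedata


def canonical_identifier(identifier: str) -> str:
    """Hypothetical helper: reduce an identifier to a form suitable for
    comparison, following the three steps listed in the AI's wording."""
    # Step 1: drop characters whose General Category is "Other, Format" (Cf).
    without_format = "".join(
        ch for ch in identifier if unicodedata.category(ch) != "Cf"
    )
    # Step 2: apply Normalization Form KC.
    normalized = unicodedata.normalize("NFKC", without_format)
    # Step 3: apply full case folding (Python folds towards lowercase).
    return normalized.casefold()


# LATIN CAPITAL LETTER E followed by COMBINING ACUTE ACCENT ...
decomposed = "\u0045\u0301cole"
# ... versus the single character LATIN CAPITAL LETTER E WITH ACUTE.
precomposed = "\u00C9COLE"

# Both spellings denote the same identifier after the transformations.
same = canonical_identifier(decomposed) == canonical_identifier(precomposed)

# SOFT HYPHEN (16#00AD#) is in category other_format, so it is filtered out.
filtered = canonical_identifier("co\u00ADunt")

# The "Decimal digit value" property gives the value of a national digit,
# here ARABIC-INDIC DIGIT THREE (16#0663#).
digit_value = unicodedata.decimal("\u0663")
```

Without these transformations the two spellings differ as raw character
sequences, which is the situation that the AI's wording for string_literals
deliberately leaves alone.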