CVS difference for ais/ai-00285.txt

Differences between 1.15 and version 1.16

--- ais/ai-00285.txt	2004/03/02 04:44:58	1.15
+++ ais/ai-00285.txt	2004/06/10 05:39:56	1.16
@@ -1,4 +1,4 @@
-!standard A.3.2(49)                                    04-02-27  AI95-00285/06
+!standard A.3.2(49)                                    04-06-03  AI95-00285/07
 !class amendment 02-01-23
 !status work item 02-09-24
 !status received 02-01-15
@@ -26,10 +26,12 @@
 !proposal
 
 [Author's note: This AI is based on the working draft of ISO/IEC 10646:2003
-dated 2003-02-13.  This standard is currently in the FDIS stage and is expected
-to be published in 2003.  While the !proposal of this AI contains numerous
-references to Unicode, the !wording section is carefully phrased to avoid such
-mentions.]
+dated 2003-02-13.  This standard was published on 2003-12-15, but at the time of
+this writing I don't have access to a copy of the final standard.  While the
+!proposal of this AI contains numerous references to Unicode, the !wording
+section is carefully phrased to avoid such mentions.  It would be possible to
+remove references to Unicode from the !proposal, but it seems that having
+pointers to all the Unicode information would be useful for implementers.]
 
 The essence of this proposal is to allow the source of the program to be
 written using 16-bit characters (from the BMP) or 32-bit characters. Also,
@@ -65,7 +67,8 @@
    - Separator, Line         -- e.g., LINE SEPARATOR
    - Separator, Paragraph    -- e.g., PARAGRAPH SEPARATOR
 
-(See http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.html for
+(See http://www.unicode.org/Public/4.0-Update/UCD-
+4.0.0.html#General_Category_Values for
 details on the categorization.)
 
 In paragraph 2.1 we define a non-terminal of the grammar for each of the above
@@ -82,8 +85,8 @@
 extended digits, the point, etc.
 
 Unicode proposes to define identifiers for programming languages as follows (see
-http://www.unicode.org/unicode/reports/tr15/tr15-
-22.html#Programming_Language_Identifiers):
+annex 7 of UAX #15 at http://www.unicode.org/reports/tr15/tr15-
+23.html#Programming_Language_Identifiers):
 
    identifier ::= identifier_start {identifier_start | identifier_extend}
    identifier_start ::= letter_uppercase |
@@ -123,13 +126,21 @@
 o   Characters in category other_format are filtered out.
 o   For languages which have case insensitive identifiers, Normalization Form
     KC is applied (see
-    http://www.unicode.org/unicode/reports/tr15/tr15-22.html#Specification).
+    http://www.unicode.org/reports/tr15/tr15-23.html#Specification).
     This is to ensure that identifiers which look visually the same are
     considered as identical, even if they are composed of different characters.
 o   _Full_ case folding, as described in the table
-    http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt, is used to
+    http://www.unicode.org/Public/4.0-Update/CaseFolding-4.0.0.txt, is used to
     find the uppercase version of each character.
 
+We decided not to apply Normalization Form KC, as there seems to be insufficient
+experience on using normalization forms. This seems to be a lose-lose situation
+anyway: without normalization, texts that look alike don't have the same
+meaning; with normalization the widely available text tools like grep, awk, etc.
+don't work. We allow an implementation to provide a mode in which it performs
+normalization so that it can "do the right thing" if it turns out that usage of
+normalization becomes prevalent.
+
 Unicode doesn't provide guidance for the composition of numeric literals, so we
 don't change them. They are probably not very important from the
 internationalization standpoint anyway.
@@ -141,7 +152,8 @@
 
 The definition and role of format_effectors is modified to include the
 characters at positions 16#85#, 16#2028# and 16#2029#. These characters may be
-used to terminate lines, as recommended by http://www.unicode.org/reports/tr13.
+used to terminate lines, as recommended by section 5.8 of Unicode 4.0 (see
+http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf#G10213).
 
 Note that characters in category other_format are forbidden in
 character_literals and string_literals, because their sole purpose is to affect
@@ -156,7 +168,7 @@
 
 We are removing 3.5.2(5) since an implementation may want to provide a
 nonstandard mode where the set of graphic characters is not a proper subset of
-that defined in ISO/IEC 10646, for instance to deal with private use
+that defined in ISO/IEC 10646:2003, for instance to deal with private use
 characters. We don't want to prevent implementations from doing anything
 useful. This paragraph has no force anyway, since in a non-standard mode an
 implementation may do pretty much what it likes.
@@ -181,7 +193,7 @@
 
     Wide_Character_Set : constant Wide_Wide_Maps.Wide_Wide_Character_Set;
 
-It contains each Wide_Wide_Character value in the BMP of ISO/IEC 10646.
+It contains each Wide_Wide_Character value in the BMP of ISO/IEC 10646:2003.
 
 The attributes Wide_Wide_Image, Wide_Wide_Value and Wide_Wide_Width are also
 provided. Their definition is similar to that of Wide_Image, Wide_Value
@@ -196,8 +208,9 @@
 IDEOGRAPHIC SPACE are not considered to be space or blank in this context.
 
 SC22/WG14 is considering the inclusion of support for Unicode 16- and 32-bit
-characters in C. Their current proposal can be found at
-http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n1010.pdf. In order to provide
+characters in C. Their proposal is presented in ISO/IEC TR 19769
+(http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n1040.pdf). At the time of this
+writing, this technical report is in the FDIS ballot stage. In order to provide
 compatibility with the upcoming C standard, new types are added to Interfaces.C
 that correspond to C char16_t and char32_t. It is recognized that adding new
 declarations to predefined units can cause incompatibilities, but it is thought
@@ -206,9 +219,9 @@
 <<Open Issue #1>>
 
 Some countries have expressed concern about referencing Unicode in the Ada
-standard. It must be noted that ISO/IEC 10646:2003 *does* reference versions 3.2
-and 4.0 of the Unicode standard, including in normative text. In order to avoid
-direct references to Unicode in the wording below, I am referencing the various
+standard. It must be noted that ISO/IEC 10646:2003 *does* reference version 4.0
+of the Unicode standard, including in normative text. In order to avoid direct
+references to Unicode in the wording below, I am referencing the various
 documents indirectly through ISO/IEC 10646:2003. The Unicode documents that are
 needed for this AI are (1) the character categorization database and (2) the
 case folding table. It would be possible to state the categorization and folding
@@ -691,7 +704,8 @@
 Returns the Wide_Character corresponding to Item if Is_Wide_Character(Item),
 and returns the Substitute Wide_Character otherwise.
 
-function To_Wide_Wide_Character (Item : in Character) return Wide_Wide_Character;
+function To_Wide_Wide_Character (Item : in Character) return
+    Wide_Wide_Character;
 Returns the Wide_Wide_Character X such that Character'Pos(Item) =
 Wide_Wide_Character'Pos (X).
 
@@ -1180,6 +1194,8 @@
 !example
 
 !ACATS test
+
+ACATS tests need to be constructed for these facilities.
 
 !appendix
 