CVS difference for ais/ai-00285.txt

Differences between version 1.14 and version 1.15

--- ais/ai-00285.txt	2003/11/27 02:01:14	1.14
+++ ais/ai-00285.txt	2004/03/02 04:44:58	1.15
@@ -1,4 +1,4 @@
-!standard A.3.2(49)                                    03-11-05  AI95-00285/05
+!standard A.3.2(49)                                    04-02-27  AI95-00285/06
 !class amendment 02-01-23
 !status work item 02-09-24
 !status received 02-01-15
@@ -130,13 +130,10 @@
     http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt, is used to
     find the uppercase version of each character.
 
-Unicode doesn't provide guidance for the composition of numeric literals,
-but it is apparent that we can use the character categories above. So we
-define:
+Unicode doesn't provide guidance for the composition of numeric literals, so we
+don't change them. They are probably not very important from the
+internationalization standpoint anyway.
 
-   numeral ::= number_decimal_digit {[punctuation_connector] numeral_extend}
-   numeral_extend ::= number_decimal_digit | other_format
-
 Again, characters in category other_format (and punctuation_connector) are
 ignored when computing the value of a decimal literal. The numerical value of
 each character that is a number_decimal_digit is defined by the field "Decimal
@@ -146,10 +143,7 @@
 characters at positions 16#85#, 16#2028# and 16#2029#. These characters may be
 used to terminate lines, as recommended by http://www.unicode.org/reports/tr13.
 
-We are not changing the definition of character_literals and string_literals.
-In particular, we _do_not_ apply Normalization Form KC to such literals. This
-means in particular that two string literals which look alike may not compare
-equal. Also note that characters in category other_format are forbidden in
+Note that characters in category other_format are forbidden in
 character_literals and string_literals, because their sole purpose is to affect
 the presentation of characters. If a program needs to operate on these
 characters, it can do that by using Wide_Wide_Character'Val (...).
@@ -229,46 +223,7 @@
 is written. If we reference one bit of it, it begs the question: why are we not
 complying with the entire document?
 
-<<Open Issue #2>>
-
-Unicode makes recommendations regarding the character repertoire to use for
-programming language identifiers, but they don't make any recommendation
-regarding the character repertoire to use for numeric literals. The AI is
-currently written in such a way that national variants of digits are allowed in
-character literals. Is this a good idea? The rationale is that if people from a
-non-occidental culture write code in their native language, it might be useful
-for them to use their native notation for numbers, assuming that occidental
-digits are not the most commonly used form of digits used in their culture (for
-instance, arabic-speaking countries seem to favor the arabic form of the digits).
-
-<<Open Issue #3>>
-
-Unicode recommends to use Normalization Form KC (which is defined by reference
-by ISO/IEC 10646:2003) for identifiers. The purpose of this normalization
-process is to deal with the case where the same logical character may be
-represented by several distinct sequences of physical characters. For instance
-the sequence of two characters:
-
-   LATIN CAPITAL LETTER E -- 16#0045#
-   COMBINING ACUTE ACCENT -- 16#0301#
-
-is transformed by normalization form KC into:
-
-   LATIN CAPITAL LETTER E WITH ACUTE -- 16#00C9#
-
-The purpose of normalization is to ensure that two program texts that look the
-same to users will actually be processed identically by the compiler. For
-instance, it may be that two users on the same project are using different
-editors, and that one produces the above sequence of two characters when the
-other produces a single (accented) character. It would be nice if the difference
-were invisible to users.
-
-Is this a good idea? If we apply normalization to identifiers, should we apply
-it to string and character literals, too? Or should we base the language solely
-on the sequence of physical characters, and force the users to look for errors
-in their programs in a hex dump of their source?
 
-
 !wording
 
 In (32) change:
@@ -451,48 +406,26 @@
 characters after applying the following transformations (in this order):
 
 o   The characters in category other_format are eliminated.
-o   Normalization Form KC defined by section 24 of ISO/IEC 10646:2003 is applied
-    to the identifier.
 o   Full case folding, as defined by documents referenced in the note in
     section 1 of ISO/IEC 10646:2003, is applied to obtain the uppercase version
     of each character.
-
-
-Replace 2.4.1(3) by:
-
-   numeral ::= number_decimal_digit {[punctuation_connector] numeral_extend}
-   numeral_extend ::= number_decimal_digit | other_format
-
-
-Replace 2.4.1(6) by:
 
-In determining the meaning of a numeric_literal, the following transformations
-are applied:
+	Implementation Advice
 
-o   The characters in categories punctuation_connector and other_format are
-    eliminated.
-o   The numerical value of each character in category number_decimal_digit is
-    given by its Decimal Digit Value.
+If appropriate for the computing environment under consideration, an
+implementation should provide a mode where Normalization Form KC (as defined by
+section 24 of ISO/IEC 10646:2003) is applied to the identifier immediately
+before performing full case folding.
 
 
-Replace 2.4.2(8) by:
+Add after 2.6(6):
 
-In determining the meaning of a based_literal, the following transformations
-are applied:
-
-o   The characters in categories punctuation_connector and other_format are
-    eliminated.
-o   The numerical value of each character in category number_decimal_digit is
-    given by its its Decimal Digit Value.
-o   The numerical values of the letters A through F are 10 through 15,
-    respectively.
-
+No modification is performed on the sequence of characters in a string_literal.
 
-Add after 2.6(7):
+	Implementation Permission
 
-No modification is performed on the sequence of characters in a string_literal.
-In particular, Normalization Form KC is _not_ applied. Therefore, two strings
-which look alike may not compare equal.
+An implementation may provide a mode where Normalization Form KC (as defined by
+section 24 of ISO/IEC 10646:2003) is applied to the string literal.
 
 
 Replace 3.5(28-29) by:
@@ -1043,12 +976,12 @@
                      Count : out Natural;
                      Trim_Nul : in Boolean := True);
 
-   type char16_t is <implementation-defined character type>;
+   type char32_t is <implementation-defined character type>;
 
-   char16_nul : constant char16_t := implementation-defined;
+   char32_nul : constant char32_t := implementation-defined;
 
-   function To_C (Item : in Wide_Character) return char16_t;
-   function To_Ada (Item : in char16_t) return Wide_Character;
+   function To_C (Item : in Wide_Wide_Character) return char32_t;
+   function To_Ada (Item : in char32_t) return Wide_Wide_Character;
 
    type char32_array is array (size_t range <>) of aliased char32_t;
 
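Note (not part of the diff): version 1.15 moves Normalization Form KC out of the mandatory identifier transformations and into an optional mode. The effect of that change can be sketched as follows. This is a minimal illustration in Python, not Ada and not wording from the AI. The function name `identifier_key` and the `apply_nfkc` flag are hypothetical. The sketch also assumes that Python's `str.casefold()` is an acceptable stand-in for full case folding: it folds toward lowercase forms, whereas the AI speaks of obtaining the uppercase version, which makes no difference when the result is only used to compare identifiers.

```python
import unicodedata


def identifier_key(name: str, apply_nfkc: bool = False) -> str:
    """Return a comparison key for an identifier (hypothetical helper).

    Mirrors the order of transformations listed in the AI wording:
    1. eliminate characters in category other_format (Unicode "Cf");
    2. optionally apply Normalization Form KC (the optional mode);
    3. apply full case folding.
    """
    # Step 1: drop other_format characters such as SOFT HYPHEN (U+00AD).
    stripped = "".join(ch for ch in name if unicodedata.category(ch) != "Cf")

    # Step 2 (optional mode): NFKC, applied immediately before case folding.
    if apply_nfkc:
        stripped = unicodedata.normalize("NFKC", stripped)

    # Step 3: full case folding (assumption: casefold() stands in for it).
    return stripped.casefold()


# E + COMBINING ACUTE ACCENT versus the precomposed E WITH ACUTE.
decomposed = "E\u0301"
precomposed = "\u00C9"

# Without the optional mode the two spellings are distinct identifiers.
print(identifier_key(decomposed) == identifier_key(precomposed))
# With the optional mode they denote the same identifier.
print(identifier_key(decomposed, True) == identifier_key(precomposed, True))
```

Under these assumptions, two identifiers that look alike on screen are only guaranteed to match when the optional normalization mode is enabled, which is the trade-off that the removed Open Issue #3 discussed.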
