CVS difference for ai05s/ai05-0266-1.txt
--- ai05s/ai05-0266-1.txt 2011/11/02 03:52:24 1.1
+++ ai05s/ai05-0266-1.txt 2011/12/20 07:56:41 1.2
@@ -1,6 +1,21 @@
-!standard 1.2(8/2) 11-11-01 AI05-0266-1/01
+!standard 1.1.4(14.2/2) 11-12-19 AI05-0266-1/02
+!standard 1.2(7/2)
+!standard 1.2(8/2)
+!standard 1.2(9/2)
+!standard 2.1(1/2)
+!standard 2.1(3.1/2)
+!standard 2.1(4/2)
+!standard 2.1(4.1/2)
+!standard 2.1(15/2)
+!standard 2.1(16/2)
+!standard 2.3(5/2)
+!standard 3.5.2(2/2)
+!standard A.1(36.1/2)
+!standard A.1(36.2/2)
!standard A.3.5(0)
!class Amendment 11-11-01
+!status Amendment 2012 11-12-19
+!status ARG Approved 7-0-2 11-11-11
!status work item 11-11-01
!status received 11-09-29
!priority Low
@@ -8,10 +23,21 @@
!subject Use the latest version of ISO/IEC 10646
!summary
-(1) Ada 2012 should depend on the 2011 version of 10646.
+(1) Ada 2012 should reference the most recent version of other standards:
+ (A) the 2011 version of character sets (10646);
+ (B) the 2011 version of C;
+ (C) the 2011 version of C++.
+
+(2) An implementation permission is added to use any character set standard,
+so long as it is at least as new as the 2003 edition of 10646.
+
+(3) Ada.Wide_Characters.Handling (and Ada.Wide_Wide_Characters.Handling)
+have a new function that reports the character set standard. We also add a
+note that the results of the functions depend on the character set standard
+used.
-(2) Ada.Characters.Wide_Handling should have a statement that the behavior
-of the functions depends on the character set standard used.
+(4) Ada.Wide_Characters.Handling (and Ada.Wide_Wide_Characters.Handling)
+are Pure.
!proposal
@@ -19,35 +45,91 @@
10646:2011, was issued. Ada 2012 should use the most recent version of the
character set standard (as it does with other standards).
-[Editor's note: What about C and C++? I think there have been more recent
-versions of those standards, too.]
+There are also 2011 revisions of C and of C++, which should be used as
+well.
-Related to that is the runtime behavior of Ada.Wide_Characters.Handling. We do
-not want this package to be tied to a particular character set standard forever.
-So we should include a statement that the behavior for particular characters
-depends on the character set standard in use - future versions of Ada will use
-newer standards; programs that depend on the specific behavior for particular
-characters probably should not depend on this package (as well as
-Ada.Wide_Wide_Character.Handling).
+However, switching to a newer standard probably would introduce some
+incompatibilities in identifiers including unusual characters. Moreover, it is
+more important that Ada compilers support the character sets of the host and
+targets rather than any abstract standard. Finally, we don't want to make
+implementations wait until 2020 to support new characters (especially if those
+characters are important to some customer). So we propose that the character
+set standard actually used be implementation-defined, subject only to the
+requirement that it is at least 10646:2003.
+
+The runtime behavior of Ada.Wide_Characters.Handling will depend on the exact
+character set used. We suggest adding a function to the package so that it can
+report what standard it uses. Programs that require particular behavior ought to
+check that the standard used is the one expected.
+Ada.Wide_Characters.Handling has no categorization pragma. This package should
+be Pure (like Ada.Characters.Handling).
+
!wording
+Replace 1.2(7/2) with:
+
+ISO/IEC 9899:2011, Information Technology - Programming languages — C
+
+Delete 1.2(7.a/2). [The name is now in the usual form.]
+
In 1.2(8/2), change "2003" to "2011".
+Replace 1.2(9/2) with:
+
+ISO/IEC 14882:2011, Information Technology - Programming languages — C++
+
+Delete 1.2(9.a/2). [The name is now in the usual form.]
+
+Implementation Permission
+The categories defined above, as well as case mapping and folding, may be based
+on an implementation-defined version of ISO/IEC 10646 (2003 edition or later).
+
+AARM Ramification: The exact categories, case mapping, and case folding chosen
+affect identifiers, the result of '[[Wide_]Wide_]Image,
+and packages Wide_Characters.Handling and Wide_Wide_Characters.Handling.
+
+Add after A.3.5(4/3):
+
+pragma Pure(Handling);
+
+function Character_Set_Version return String;
+
+Add after A.3.5(23/3):
+
+function Character_Set_Version return String;
+
+ Returns an implementation-defined identifier that identifies the version of
+ the character set standard that is used for categorizing characters by the
+ implementation.
+
Add at the end of A.3.5:
-The results of these functions depends on the character set standard used
-by a particular version of Ada. Future Ada standards will typically use newer
-character set standards, and these functions will change their results to
-reflect those standards. If a program requires behavior specifically of a
-particular character set standard, this package should not be used.
+Implementation Advice
-[Editor's note: I don't have a great way to word this, hopefully someone will
-have a better idea.]
+The string returned by Character_Set_Version should include either “10646:” or “Unicode”.
+Note:
+The results returned by these functions may depend on which particular
+version of the 10646 standard is supported by the implementation (see 2.1).
+
+Change 10646:2003 to 10646:2011 wherever it appears.
+
!discussion
+
+A program that cannot tolerate changes in the behavior of the classification
+or case conversion functions of Ada.Wide_Characters.Handling should check the
+result of the Character_Set_Version function before proceeding. If it differs
+from the expected value, the program should take defensive measures.
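
The check described above might look like the following minimal sketch. It assumes an Ada 2012 implementation that provides Character_Set_Version as proposed here. The procedure name Check_Character_Set and the expected substring "10646:2011" are hypothetical, chosen only for illustration; the actual string is implementation-defined, and the Implementation Advice says only that it should include either "10646:" or "Unicode".

```ada
with Ada.Strings.Fixed;
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Check_Character_Set is

   --  Assumption: the version this (hypothetical) program was validated
   --  against. The real string is implementation-defined.
   Expected : constant String := "10646:2011";

   Version : constant String :=
     Ada.Wide_Characters.Handling.Character_Set_Version;

begin
   --  Index returns 0 when Expected does not occur anywhere in Version.
   if Ada.Strings.Fixed.Index (Version, Expected) = 0 then
      --  Defensive measure: here we only report the mismatch. A real
      --  program might instead fall back to its own classification tables.
      Ada.Text_IO.Put_Line
        ("Unexpected character set version: " & Version);
   else
      Ada.Text_IO.Put_Line ("Character set version as expected: " & Version);
   end if;
end Check_Character_Set;
```

A substring test is used rather than an exact comparison because the full format of the returned string is not specified.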
+
+Note that any Ada program can count on the support of the characters defined in
+10646:2003 except for the few characters whose classifications are changed in
+later standards. For commonly used character sets, like Greek and Cyrillic, the
+character set chosen by the implementation should not matter.
-10646:2011 adds and Annex U about identifier syntax. But all it says is to go
+----
+
+10646:2011 adds an Annex U about identifier syntax. But all it says is to go
read the Unicode documents! We need to reconsider exactly which characters
are allowed in identifiers in order to meet this standard, but we'll do that
in a separate AI (as this topic is not as clear-cut as the others here, and
@@ -70,17 +152,211 @@
also subtly change the characters allowed, with an early Binding Interpretation.
So we will not consider any such effects here.
+!corrigendum 1.1.4(14.2/2)
+!AI-0227-1
+!AI-0266-1
+
+@drepl
+When this International Standard mentions the conversion of some character or
+sequence of characters to upper case, it means the character or sequence of
+characters obtained by using locale-independent full case folding, as defined
+by documents referenced in the note in section 1 of ISO/IEC 10646:2003.
+@dby
+When this International Standard mentions the conversion of some character or sequence
+of characters to upper case, it means the character or sequence of characters obtained
+by using simple upper case mapping, as defined by documents referenced in the note
+in section 1 of ISO/IEC 10646:2011.
+
+!corrigendum 1.2(7/2)
+
+@drepl
+ISO/IEC 9899:1999, @i<Programming languages @emdash C>,
+supplemented by Technical Corrigendum 1:2001 and Technical Corrigendum 2:2004.
+@dby
+ISO/IEC 9899:2011, @i<Information technology @emdash Programming languages @emdash C>.
+
+!corrigendum 1.2(8/2)
+
+@drepl
+ISO/IEC 10646:2003, @i<Information technology @emdash Universal Multiple-Octet
+Coded Character Set (UCS)>.
+@dby
+ISO/IEC 10646:2011, @i<Information technology @emdash Universal Multiple-Octet
+Coded Character Set (UCS)>.
+
+!corrigendum 1.2(9/2)
+
+@drepl
+ISO/IEC 14882:2003, @i<Programming languages @emdash C++>.
+@dby
+ISO/IEC 14882:2011, @i<Information technology @emdash Programming languages @emdash C++>.
+
+!corrigendum 2.1(1/2)
+
+@drepl
+The character repertoire for the text of an Ada program consists of the entire coding space
+described by the ISO/IEC 10646:2003 Universal Multiple-Octet Coded Character Set. This
+coding space is organized in @i<planes>, each plane comprising 65536 characters.
+@dby
+The character repertoire for the text of an Ada program consists of the entire coding space
+described by the ISO/IEC 10646:2011 Universal Multiple-Octet Coded Character Set. This
+coding space is organized in @i<planes>, each plane comprising 65536 characters.
+
+!corrigendum 2.1(3.1/2)
+
+@drepl
+A @fa<character> is defined by this International Standard for each cell in the coding
+space described by ISO/IEC 10646:2003, regardless of whether or not ISO/IEC
+10646:2003 allocates a character to that cell.
+@dby
+A @fa<character> is defined by this International Standard for each cell in the coding
+space described by ISO/IEC 10646:2011, regardless of whether or not ISO/IEC
+10646:2011 allocates a character to that cell.
+
+!corrigendum 2.1(4/2)
+
+@drepl
+The coded representation for characters is implementation defined
+(it need not be a representation defined within ISO/IEC 10646:2003).
+A character whose relative code position in its plane is 16#FFFE# or 16#FFFF#
+is not allowed anywhere in the text of a program.
+@dby
+The coded representation for characters is implementation defined
+(it need not be a representation defined within ISO/IEC 10646:2011).
+A character whose relative code point in its plane is 16#FFFE# or 16#FFFF#
+is not allowed anywhere in the text of a program.
+The only characters allowed outside of comments are those in categories
+@fa<other_format>, @fa<format_effector>, and @fa<graphic_character>.
+
+!corrigendum 2.1(4.1/2)
+
+@drepl
+The semantics of an Ada program whose text is not in Normalization Form KC (as
+defined by section 24 of ISO/IEC 10646:2003) is implementation defined.
+@dby
+The semantics of an Ada program whose text is not in Normalization Form KC (as
+defined by section 21 of ISO/IEC 10646:2011) is implementation defined.
+
+!corrigendum 2.1(5/2)
+
+@drepl
+The description of the language definition in this International Standard uses
+the character properties General Category, Simple Uppercase Mapping, Uppercase
+Mapping, and Special Case Condition of the documents referenced by the note in
+section 1 of ISO/IEC 10646:2003. The actual set of graphic symbols used by an
+implementation for the visual representation of the text of an Ada program is
+not specified.
+@dby
+The description of the language definition in this International Standard uses
+the character properties General Category, Simple Uppercase Mapping, Uppercase
+Mapping, and Special Case Condition of the documents referenced by the note in
+section 1 of ISO/IEC 10646:2011. The actual set of graphic symbols used by an
+implementation for the visual representation of the text of an Ada program is
+not specified.
+
+!corrigendum 2.1(15/2)
+
+@drepl
+The following names are used when referring to certain characters (the first name is that given in ISO/IEC 10646:2003)
+@dby
+The following names are used when referring to certain characters (the first name is that given in ISO/IEC 10646:2011)
+
+!corrigendum 2.1(16/2)
+
+@drepl
+In a nonstandard mode, the implementation may support a different character
+repertoire; in particular, the set of characters that are considered
+@fa<identifier_letter>s can be extended or changed to conform to local
+conventions.
+@dby
+The categories defined above, as well as case mapping and folding, may be based
+on an implementation-defined version of ISO/IEC 10646 (2003 edition or later).
+
+!corrigendum 2.3(5/2)
+
+@drepl
+Two @fa<identifier>s are considered the same if they consist of the same
+sequence of characters after applying the following transformations (in
+this order):
+@dby
+Two @fa<identifier>s are considered the same if they consist of the same
+sequence of characters after applying locale-independent simple case folding,
+as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2011.
+
+!corrigendum 3.5.2(2/2)
+
+@drepl
+The predefined type Character is a character type whose values correspond to
+the 256 code positions of Row 00 (also known as Latin-1) of the ISO/IEC 10646:2003 Basic
+Multilingual Plane (BMP). Each of the graphic characters of Row 00 of the BMP
+has a corresponding @fa<character_literal> in Character. Each of the nongraphic
+positions of Row 00 (0000-001F and 007F-009F) has a corresponding
+language-defined name, which is not usable as an enumeration literal, but which
+is usable with the attributes Image, Wide_Image,
+Wide_Wide_Image, Value, Wide_Value, and Wide_Wide_Value; these names are
+given in the definition of type Character in A.1, "The Package Standard", but
+are set in @i<italics>.
+@dby
+The predefined type Character is a character type whose values correspond to
+the 256 code points of Row 00 (also known as Latin-1) of the ISO/IEC 10646:2011 Basic
+Multilingual Plane (BMP). Each of the graphic characters of Row 00 of the BMP
+has a corresponding @fa<character_literal> in Character. Each of the nongraphic
+characters of Row 00 has a corresponding
+language-defined name, which is not usable as an enumeration literal, but which
+is usable with the attributes Image, Wide_Image,
+Wide_Wide_Image, Value, Wide_Value, and Wide_Wide_Value; these names are
+given in the definition of type Character in A.1, "The Package Standard", but
+are set in @i<italics>.
+
+
+!corrigendum A.1(36.1/2)
+
+@drepl
+@xcode< --@ft<@i< The declaration of type Wide_Character is based on the standard ISO/IEC 10646:2003 BMP character>>
+ --@ft<@i< set. The first 256 positions have the same contents as type Character. See 3.5.2.>>
+
+ @b<type> Wide_Character @b<is> (@i<nul>, @i<soh> ... @i<Hex_0000FFFE>, @i<Hex_0000FFFF>);>
+@dby
+@xcode< --@ft<@i< The declaration of type Wide_Character is based on the standard ISO/IEC 10646:2011 BMP character>>
+ --@ft<@i< set. The first 256 positions have the same contents as type Character. See 3.5.2.>>
+
+ @b<type> Wide_Character @b<is> (@i<nul>, @i<soh> ... @i<Hex_0000FFFE>, @i<Hex_0000FFFF>);>
+
+!corrigendum A.1(36.2/2)
+
+@drepl
+@xcode< --@ft<@i< The declaration of type Wide_Wide_Character is based on the full>>
+ --@ft<@i< ISO/IEC 10646:2003 character set. The first 65536 positions have the>>
+ --@ft<@i< same contents as type Wide_Character. See 3.5.2.>>
+
+ @b<type> Wide_Wide_Character @b<is> (@i<nul>, @i<soh> ... @i<Hex_7FFFFFFE>, @i<Hex_7FFFFFFF>);
+ @b<for> Wide_Wide_Character'Size @b<use> 32;>
+@dby
+@xcode< --@ft<@i< The declaration of type Wide_Wide_Character is based on the full>>
+ --@ft<@i< ISO/IEC 10646:2011 character set. The first 65536 positions have the>>
+ --@ft<@i< same contents as type Wide_Character. See 3.5.2.>>
+
+ @b<type> Wide_Wide_Character @b<is> (@i<nul>, @i<soh> ... @i<Hex_7FFFFFFE>, @i<Hex_7FFFFFFF>);
+ @b<for> Wide_Wide_Character'Size @b<use> 32;>
+
+
+!corrigendum A.3.5(0)
+
+@dinsc
+
+Force a conflict; the real text is found in the conflict file.
+
!ACATS test
-An ACATS C-Test to check that some characters added by 10646:2011 are
-supported and properly categorized.
+No separate ACATS test is needed (since the exact character set supported is
+implementation-defined).
Any tests involving identifiers should be postponed until the AI on identifiers
is decided.
!ASIS
-No ASIS effect. (??)
+No ASIS effect.
!appendix
Questions? Ask the ACAA Technical Agent