Version 1.4 of ai12s/ai12-0263-1.txt


!standard 1.1.4(14.2/3)          18-08-31 AI12-0263-1/03
!standard 2.1(1/3)
!standard 2.1(3.1/3)
!standard 2.1(4/3)
!standard 2.1(4.1/5)
!standard 2.1(5/3)
!standard 2.1(15/3)
!standard 2.3(4.1/5)
!standard 2.3(5/3)
!standard 3.5.2(2/3)
!standard 3.5.2(3/3)
!standard 3.5.2(4/3)
!standard A.1(36.1/3)
!standard A.1(36.2/3)
!standard A.3.2(32.6/5)
!standard A.3.5(51.2/5)
!standard A.3.5(55/3)
!standard A.3.5(59/3)
!standard A.4.10(3/3)
!standard B.5(21/5)
!class Amendment 18-03-08
!status Amendment 1-2012 18-03-08
!status ARG Approved 10-0-1 18-06-23
!status work item 18-03-08
!status received 18-03-05
!priority Low
!difficulty Easy
!subject Update references to ISO/IEC 10646
!summary
The Ada Standard refers to ISO/IEC 10646:2017.
!problem
AI12-0260-1 changed the character set reference to ISO/IEC 10646:2017, the most recent version. However, there are a number of references in the Standard to the 2011 version of that Standard. These are jarring and should be updated.
!proposal
Change all relevant references to ISO/IEC 10646:2011 to ISO/IEC 10646:2017.
!wording
Modify 1.1.4(14.2/3):
When this International Standard mentions the conversion of some character or sequence of characters to upper case, it means the character or sequence of characters obtained by using simple upper case mapping, as defined by documents referenced in [the note in] Clause {2}[1] of ISO/IEC 10646:{2017}[2011].
{AARM Implementation Note: The "documents referenced" means Unicode, Chapter 4 (specifically, section 4.2 - Case). Machine-readable versions of Simple Uppercase Mapping and Simple Lowercase Mapping can be found in http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt. Data for older Unicode versions can be found on this site as well; start at http://www.unicode.org/Public/ and find the appropriate version number. Simple Uppercase Mapping is field 12 of each record in this file (the 13th element of the record, since Unicode numbers fields from 0); Simple Lowercase Mapping is field 13 (the 14th element). In both cases, if no character is present in the field, the character maps to itself.}
[Author's note: See the !discussion section for the reason for this and similar changes.]
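As a non-normative illustration (not part of the proposed wording), the field positions described in the AARM Implementation Note above can be sketched as follows. The function name and the three sample records are assumptions made for this sketch: the records are written in the UnicodeData.txt format and are believed to match the published file, but a real tool would read the whole file rather than an excerpt.

```python
# Illustrative sketch: extract Simple Uppercase/Lowercase Mapping from
# records in the UnicodeData.txt format. Fields are separated by ';' and
# numbered from 0, so the mappings sit at indexes 12 and 13.

SIMPLE_UPPERCASE_FIELD = 12
SIMPLE_LOWERCASE_FIELD = 13

# Hypothetical excerpt used only for this example.
SAMPLE_RECORDS = """\
0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
"""


def parse_simple_case_mappings(text):
    """Return {code_point: (simple_upper, simple_lower)} for each record."""
    mappings = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)
        upper = fields[SIMPLE_UPPERCASE_FIELD]
        lower = fields[SIMPLE_LOWERCASE_FIELD]
        # An empty field means the character maps to itself.
        mappings[code_point] = (
            int(upper, 16) if upper else code_point,
            int(lower, 16) if lower else code_point,
        )
    return mappings


if __name__ == "__main__":
    table = parse_simple_case_mappings(SAMPLE_RECORDS)
    for cp, (up, low) in sorted(table.items()):
        print(f"U+{cp:04X}: upper U+{up:04X}, lower U+{low:04X}")
```

The sketch treats every line as an ordinary record; it does nothing special for the range records in the real file, which carry no case mappings.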
Modify 2.1(1/3):
The character repertoire for the text of an Ada program consists of the entire coding space described by the ISO/IEC 10646:{2017}[2011] Universal [Multiple-Octet ]Coded Character Set. This coding space is organized in planes, each plane comprising 65536 characters.
Modify 2.1(3.1/3):
A character is defined by this International Standard for each cell in the coding space described by ISO/IEC 10646:{2017}[2011], regardless of whether or not ISO/IEC 10646:{2017}[2011] allocates a character to that cell.
Modify 2.1(4/3):
The coded representation for characters is implementation defined (it need not be a representation defined within ISO/IEC 10646:{2017}[2011]). A character whose relative code point in its plane is 16#FFFE# or 16#FFFF# is not allowed anywhere in the text of a program. The only characters allowed outside of comments are those in categories other_format, format_effector, and graphic_character.
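As a non-normative illustration (not part of the proposed wording), the plane arithmetic in 2.1(1/3) and the 16#FFFE#/16#FFFF# rule in 2.1(4/3) can be sketched as below. The function names are hypothetical, and the sketch covers only the relative-code-point rule; it does not attempt the category restriction in the last sentence.

```python
# Illustrative sketch of the plane arithmetic: each plane holds 65536
# code points, so the plane number and the position within the plane are
# the quotient and remainder of a division by 65536.

PLANE_SIZE = 65536


def plane_of(code_point):
    """Plane number containing the given code point."""
    return code_point // PLANE_SIZE


def relative_code_point(code_point):
    """Position of the code point within its plane (0 .. 16#FFFF#)."""
    return code_point % PLANE_SIZE


def is_excluded_from_program_text(code_point):
    """True if the position within the plane is 16#FFFE# or 16#FFFF#."""
    return relative_code_point(code_point) in (0xFFFE, 0xFFFF)


if __name__ == "__main__":
    for cp in (0x0041, 0xFFFE, 0x1FFFF, 0x1F600):
        print(f"16#{cp:X}#: plane {plane_of(cp)}, "
              f"excluded={is_excluded_from_program_text(cp)}")
```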
Modify 2.1(4.1/5): [as modified by AI12-0004-1]
The semantics of an Ada program whose text is not in Normalization Form C (as defined by Clause 21 of ISO/IEC 10646:{2017}[2011]) is implementation defined.
Modify 2.1(5/3):
The description of the language definition in this International Standard uses the character properties General Category, Simple Uppercase Mapping, Uppercase Mapping, and Special Case Condition of the documents referenced by [the note in] Clause {2}[1] of ISO/IEC 10646:{2017}[2011]. The actual set of graphic symbols used by an implementation for the visual representation of the text of an Ada program is not specified.
{AARM Discussion: The "documents referenced" means Unicode, Chapter 4. See the Discussion after the character categorization definition for a source for machine-readable definitions of these properties.}
Modify the reference in AARM 2.1(6.a/3).
Modify 2.1(15/3):
The following names are used when referring to certain characters (the first name is that given in ISO/IEC 10646:{2017}[2011]):
Modify the reference in AARM 2.1(15.a/3).
Modify 2.3(4.1/5): [as introduced by AI12-0004-1]
An identifier shall only contain characters that may be present in Normalization Form KC (as defined by Clause 21 of ISO/IEC 10646:{2017}[2011]).
Modify 2.3(5/3):
Two identifiers are considered the same if they consist of the same sequence of characters after applying locale-independent simple case folding, as defined by documents referenced in [the note in] Clause {2}[1] of ISO/IEC 10646:{2017}[2011].
Modify AARM 2.3(5.a.1/3):
The “documents referenced” means Unicode{, Chapter 4 (specifically, section 4.2 - Case)}. Note that simple case folding is supposed to be compatible between Unicode versions, so the Unicode version used doesn't matter. {A machine-readable version of the needed mapping can be found at: http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt.}
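As a non-normative illustration (not part of the proposed wording), identifier equivalence under simple case folding can be sketched from records in the CaseFolding.txt format. The sketch assumes that simple case folding uses the records with status C (common) and S (simple) and ignores those with status F (full) and T (Turkic special cases), which is how the header of that file describes it. The function names and the sample excerpt are hypothetical; the excerpt covers only a few characters, so a real tool would load the whole file.

```python
# Illustrative sketch: locale-independent simple case folding driven by
# records in the CaseFolding.txt format ("code; status; mapping; # name").

# Hypothetical excerpt used only for this example.
SAMPLE_CASE_FOLDING = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
0049; C; 0069; # LATIN CAPITAL LETTER I
0049; T; 0131; # LATIN CAPITAL LETTER I
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def load_simple_folding(text):
    """Return {code_point: folded_code_point} from status C and S records."""
    folding = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code, status, mapping = [part.strip() for part in data.split(";")[:3]]
        if status in ("C", "S"):  # F and T records are not simple folding
            folding[int(code, 16)] = int(mapping, 16)
    return folding


def simple_fold(text, folding):
    """Fold each character; characters without a record map to themselves."""
    return "".join(chr(folding.get(ord(ch), ord(ch))) for ch in text)


def same_identifier(left, right, folding):
    """True if the two character sequences are equal after folding."""
    return simple_fold(left, folding) == simple_fold(right, folding)


if __name__ == "__main__":
    table = load_simple_folding(SAMPLE_CASE_FOLDING)
    print(same_identifier("AI", "ai", table))
```

Because only C and S records are used, every character folds to exactly one character, so folding never changes the length of an identifier.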
In 3.5.2(2/3, 3/3, 4/3), replace 2011 with 2017.
In A.1(36.1/3) and A.1(36.2/3), replace 2011 with 2017.
Modify A.3.2(32.6/5) [as introduced by AI12-0004-1]:
True if Item could be present in a string normalized to Normalization Form KC (as defined by Clause 21 of ISO/IEC 10646:{2017}[2011]); this includes all characters except those with positions 160, 168, 170, 175, 178, 179, 180, 181, 184, 185, 186, 188, 189, and 190.
Modify the reference in AARM A.3.2(60.a/3).
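As a non-normative illustration (not part of the proposed wording), the list of excluded positions in A.3.2(32.6/5) can be cross-checked against Normalization Form KC using Python's standard unicodedata module. The sketch assumes that a character "could be present" in an NFKC-normalized string exactly when normalizing the one-character string leaves it unchanged; that assumption is believed to hold for the 256 positions of type Character, and the function name is hypothetical.

```python
# Illustrative sketch: find the positions of type Character (0 .. 255)
# that Normalization Form KC replaces with something else.
import unicodedata

# The positions listed in A.3.2(32.6/5).
EXCLUDED_POSITIONS = [160, 168, 170, 175, 178, 179, 180, 181,
                      184, 185, 186, 188, 189, 190]


def changed_by_nfkc(position):
    """True if NFKC normalization alters the one-character string."""
    ch = chr(position)
    return unicodedata.normalize("NFKC", ch) != ch


computed_positions = [p for p in range(256) if changed_by_nfkc(p)]

if __name__ == "__main__":
    print(computed_positions)
    print(computed_positions == EXCLUDED_POSITIONS)
```

The result depends on the Unicode data bundled with the Python installation, although the normalization properties of these 256 positions are expected to be stable across versions.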
Modify A.3.5(51.2/5) [as introduced by AI12-0004-1]:
Returns True if the Wide_Character designated by Item could be present in a string normalized to Normalization Form KC (as defined by Clause 21 of ISO/IEC 10646:{2017}[2011]), otherwise returns False.
Modify A.3.5(55/3):
Returns the Simple Lowercase Mapping as defined by documents referenced in [the note in] Clause {2}[1] of ISO/IEC 10646:{2017}[2011] of the Wide_Character designated by Item. If the Simple Lowercase Mapping does not exist for the Wide_Character designated by Item, then the value of Item is returned.
Modify AARM A.3.5(55.a/3):
{The “documents referenced” means Unicode, Chapter 4 (specifically, section 4.2 - Case). }The case mappings come from Unicode as ISO/IEC 10646:{2017}[2011] does not include {complete} case mappings[ (but rather references the Unicode ones as above)].{ See the Implementation Note in 1.1.4 for machine-readable versions of both Uppercase and Lowercase mappings.}
Modify A.3.5(59/3):
Returns the Simple Uppercase Mapping as defined by documents referenced in [the note in] Clause {2}[1] of ISO/IEC 10646:{2017}[2011] of the Wide_Character designated by Item. If the Simple Uppercase Mapping does not exist for the Wide_Character designated by Item, then the value of Item is returned.
Modify the references in AARM A.3.5(62.a/3) to 10646:2017 and Unicode 10.0 (current as of this writing).
Modify A.4.10(3/3):
Returns True if the strings consist of the same sequence of characters after applying locale-independent simple case folding, as defined by documents referenced in [the note in] Clause {2}[1] of ISO/IEC 10646:{2017}[2011]. Otherwise, returns False. This function uses the same method as is used to determine whether two identifiers are the same.
Modify B.5(21/5): [as modified by AI12-0058-1]
An implementation may add additional declarations to the Fortran interface packages. For example, declarations are permitted for the character types corresponding to Fortran character kinds 'ascii' and 'iso_10646', which in turn correspond to ISO/IEC 646:1991 and to UCS-4 as specified in ISO/IEC 10646:{2017}[2011].
!discussion
The standard should be consistent in referring to character set Standards, with the exception of a few historical AARM notes. No semantic change is intended by these changes; implementations are allowed to conform to older versions of 10646 instead of the current one.
However, there is a problem. We have several references to "documents referenced in the note in Clause 1 of ISO/IEC 10646:2011" as an indirect reference to Unicode. But 10646:2017 no longer contains that note.
Luckily, 10646:2017 does contain a normative reference to Chapter 4 of the Unicode standard. All of the information that we need (definitions of various kinds of case mapping) exists in section 4.2 ("Case") of that chapter. Thus, we replace "the note in Clause 1" with "Clause 2" in these references. (For this purpose, we ignore the fact that the normative reference in 10646 lists the sections of Chapter 4 that it relies on, and that section 4.2 is not among them.)
Because this reference is even more vague than the old one (there are a number of other normative references in Clause 2 of 10646), we also beef up the associated AARM notes so that there can be no doubt of what we mean for implementers.
!corrigendum 1.1.4(14.2/3)
Replace the paragraph:
When this International Standard mentions the conversion of some character or sequence of characters to upper case, it means the character or sequence of characters obtained by using simple upper case mapping, as defined by documents referenced in the note in Clause 1 of ISO/IEC 10646:2011.
by:
When this International Standard mentions the conversion of some character or sequence of characters to upper case, it means the character or sequence of characters obtained by using simple upper case mapping, as defined by documents referenced in Clause 2 of ISO/IEC 10646:2017.
!corrigendum 2.1(1/3)
Replace the paragraph:
The character repertoire for the text of an Ada program consists of the entire coding space described by the ISO/IEC 10646:2011 Universal Multiple-Octet Coded Character Set. This coding space is organized in planes, each plane comprising 65536 characters.
by:
The character repertoire for the text of an Ada program consists of the entire coding space described by the ISO/IEC 10646:2017 Universal Coded Character Set. This coding space is organized in planes, each plane comprising 65536 characters.
!corrigendum 2.1(3.1/3)
Replace the paragraph:
A character is defined by this International Standard for each cell in the coding space described by ISO/IEC 10646:2011, regardless of whether or not ISO/IEC 10646:2011 allocates a character to that cell.
by:
A character is defined by this International Standard for each cell in the coding space described by ISO/IEC 10646:2017, regardless of whether or not ISO/IEC 10646:2017 allocates a character to that cell.
!corrigendum 2.1(4/3)
Replace the paragraph:
The coded representation for characters is implementation defined (it need not be a representation defined within ISO/IEC 10646:2011). A character whose relative code point in its plane is 16#FFFE# or 16#FFFF# is not allowed anywhere in the text of a program. The only characters allowed outside of comments are those in categories other_format, format_effector, and graphic_character.
by:
The coded representation for characters is implementation defined (it need not be a representation defined within ISO/IEC 10646:2017). A character whose relative code point in its plane is 16#FFFE# or 16#FFFF# is not allowed anywhere in the text of a program. The only characters allowed outside of comments are those in categories other_format, format_effector, and graphic_character.
!corrigendum 2.1(4.1/5)
Replace the paragraph:
The semantics of an Ada program whose text is not in Normalization Form C (as defined by Clause 21 of ISO/IEC 10646:2011) is implementation defined.
by:
The semantics of an Ada program whose text is not in Normalization Form C (as defined by Clause 21 of ISO/IEC 10646:2017) is implementation defined.
!corrigendum 2.1(5/3)
Replace the paragraph:
The description of the language definition in this International Standard uses the character properties General Category, Simple Uppercase Mapping, Uppercase Mapping, and Special Case Condition of the documents referenced by the note in Clause 1 of ISO/IEC 10646:2011. The actual set of graphic symbols used by an implementation for the visual representation of the text of an Ada program is not specified.
by:
The description of the language definition in this International Standard uses the character properties General Category, Simple Uppercase Mapping, Uppercase Mapping, and Special Case Condition of the documents referenced by Clause 2 of ISO/IEC 10646:2017. The actual set of graphic symbols used by an implementation for the visual representation of the text of an Ada program is not specified.
!corrigendum 2.1(15/3)
Replace the paragraph:
The following names are used when referring to certain characters (the first name is that given in ISO/IEC 10646:2011):
by:
The following names are used when referring to certain characters (the first name is that given in ISO/IEC 10646:2017):
!comment use the overall paragraph to force a conflict.
!corrigendum 2.3(4/3)
Replace the paragraph:
An identifier shall only contain characters that may be present in Normalization Form KC (as defined by Clause 21 of ISO/IEC 10646:2011).
by:
An identifier shall only contain characters that may be present in Normalization Form KC (as defined by Clause 21 of ISO/IEC 10646:2017).
!corrigendum 2.3(5/3)
Replace the paragraph:
Two identifiers are considered the same if they consist of the same sequence of characters after applying locale-independent simple case folding, as defined by documents referenced in the note in Clause 1 of ISO/IEC 10646:2011.
by:
Two identifiers are considered the same if they consist of the same sequence of characters after applying locale-independent simple case folding, as defined by documents referenced in Clause 2 of ISO/IEC 10646:2017.
!corrigendum 3.5.2(2/3)
Replace the paragraph:
The predefined type Character is a character type whose values correspond to the 256 code points of Row 00 (also known as Latin-1) of the ISO/IEC 10646:2011 Basic Multilingual Plane (BMP). Each of the graphic characters of Row 00 of the BMP has a corresponding character_literal in Character. Each of the nongraphic characters of Row 00 has a corresponding language-defined name, which is not usable as an enumeration literal, but which is usable with the attributes Image, Wide_Image, Wide_Wide_Image, Value, Wide_Value, and Wide_Wide_Value; these names are given in the definition of type Character in A.1, "The Package Standard", but are set in italics.
by:
The predefined type Character is a character type whose values correspond to the 256 code points of Row 00 (also known as Latin-1) of the ISO/IEC 10646:2017 Basic Multilingual Plane (BMP). Each of the graphic characters of Row 00 of the BMP has a corresponding character_literal in Character. Each of the nongraphic characters of Row 00 has a corresponding language-defined name, which is not usable as an enumeration literal, but which is usable with the attributes Image, Wide_Image, Wide_Wide_Image, Value, Wide_Value, and Wide_Wide_Value; these names are given in the definition of type Character in A.1, "The Package Standard", but are set in italics.
!corrigendum 3.5.2(3/3)
Replace the paragraph:
The predefined type Wide_Character is a character type whose values correspond to the 65536 code points of the ISO/IEC 10646:2011 Basic Multilingual Plane (BMP). Each of the graphic characters of the BMP has a corresponding character_literal in Wide_Character. The first 256 values of Wide_Character have the same character_literal or language-defined name as defined for Character. Each of the graphic_characters has a corresponding character_literal.
by:
The predefined type Wide_Character is a character type whose values correspond to the 65536 code points of the ISO/IEC 10646:2017 Basic Multilingual Plane (BMP). Each of the graphic characters of the BMP has a corresponding character_literal in Wide_Character. The first 256 values of Wide_Character have the same character_literal or language-defined name as defined for Character. Each of the graphic_characters has a corresponding character_literal.
!corrigendum 3.5.2(4/3)
Replace the paragraph:
The predefined type Wide_Wide_Character is a character type whose values correspond to the 2147483648 code points of the ISO/IEC 10646:2011 character set. Each of the graphic_characters has a corresponding character_literal in Wide_Wide_Character. The first 65536 values of Wide_Wide_Character have the same character_literal or language-defined name as defined for Wide_Character.
by:
The predefined type Wide_Wide_Character is a character type whose values correspond to the 2147483648 code points of the ISO/IEC 10646:2017 character set. Each of the graphic_characters has a corresponding character_literal in Wide_Wide_Character. The first 65536 values of Wide_Wide_Character have the same character_literal or language-defined name as defined for Wide_Character.
!corrigendum A.1(36.1/2)
Replace the paragraph:
-- The declaration of type Wide_Character is based on the standard ISO/IEC 10646:2011 BMP character
-- set. The first 256 positions have the same contents as type Character. See 3.5.2.
type Wide_Character is (nul, soh ... Hex_0000FFFE, Hex_0000FFFF);
by:
-- The declaration of type Wide_Character is based on the standard ISO/IEC 10646:2017 BMP character
-- set. The first 256 positions have the same contents as type Character. See 3.5.2.
type Wide_Character is (nul, soh ... Hex_0000FFFE, Hex_0000FFFF);
!corrigendum A.1(36.2/2)
Replace the paragraph:
-- The declaration of type Wide_Wide_Character is based on the full
-- ISO/IEC 10646:2011 character set. The first 65536 positions have the
-- same contents as type Wide_Character. See 3.5.2.
type Wide_Wide_Character is (nul, soh ... Hex_7FFFFFFE, Hex_7FFFFFFF);
for Wide_Wide_Character'Size use 32;
by:
-- The declaration of type Wide_Wide_Character is based on the full
-- ISO/IEC 10646:2017 character set. The first 65536 positions have the
-- same contents as type Wide_Character. See 3.5.2.
type Wide_Wide_Character is (nul, soh ... Hex_7FFFFFFE, Hex_7FFFFFFF);
for Wide_Wide_Character'Size use 32;
!comment use the overall paragraph to force a conflict.
!corrigendum A.3.2(32.5/3)
Replace the paragraph:
True if Item could be present in a string normalized to Normalization Form KC (as defined by Clause 21 of ISO/IEC 10646:2011); this includes all characters except those with positions 160, 168, 170, 175, 178, 179, 180, 181, 184, 185, 186, 188, 189, and 190.
by:
True if Item could be present in a string normalized to Normalization Form KC (as defined by Clause 21 of ISO/IEC 10646:2017); this includes all characters except those with positions 160, 168, 170, 175, 178, 179, 180, 181, 184, 185, 186, 188, 189, and 190.
!comment use the overall paragraph to force a conflict.
!corrigendum A.3.5(51/3)
Replace the paragraph:
Returns True if the Wide_Character designated by Item could be present in a string normalized to Normalization Form KC (as defined by Clause 21 of ISO/IEC 10646:2011), otherwise returns False.
by:
Returns True if the Wide_Character designated by Item could be present in a string normalized to Normalization Form KC (as defined by Clause 21 of ISO/IEC 10646:2017), otherwise returns False.
!corrigendum A.3.5(55/3)
Replace the paragraph:
Returns the Simple Lowercase Mapping as defined by documents referenced in the note in Clause 1 of ISO/IEC 10646:2011 of the Wide_Character designated by Item. If the Simple Lowercase Mapping does not exist for the Wide_Character designated by Item, then the value of Item is returned.
by:
Returns the Simple Lowercase Mapping as defined by documents referenced in Clause 2 of ISO/IEC 10646:2017 of the Wide_Character designated by Item. If the Simple Lowercase Mapping does not exist for the Wide_Character designated by Item, then the value of Item is returned.
!corrigendum A.3.5(59/3)
Replace the paragraph:
Returns the Simple Uppercase Mapping as defined by documents referenced in the note in Clause 1 of ISO/IEC 10646:2011 of the Wide_Character designated by Item. If the Simple Uppercase Mapping does not exist for the Wide_Character designated by Item, then the value of Item is returned.
by:
Returns the Simple Uppercase Mapping as defined by documents referenced in Clause 2 of ISO/IEC 10646:2017 of the Wide_Character designated by Item. If the Simple Uppercase Mapping does not exist for the Wide_Character designated by Item, then the value of Item is returned.
!corrigendum A.4.10(3/3)
Replace the paragraph:
Returns True if the strings consist of the same sequence of characters after applying locale-independent simple case folding, as defined by documents referenced in the note in Clause 1 of ISO/IEC 10646:2011. Otherwise, returns False. This function uses the same method as is used to determine whether two identifiers are the same.
by:
Returns True if the strings consist of the same sequence of characters after applying locale-independent simple case folding, as defined by documents referenced in Clause 2 of ISO/IEC 10646:2017. Otherwise, returns False. This function uses the same method as is used to determine whether two identifiers are the same.
!corrigendum B.5(21/5)
Replace the paragraph:
An implementation may add additional declarations to the Fortran interface packages. For example, declarations are permitted for the character types corresponding to Fortran character kinds 'ascii' and 'iso_10646', which in turn correspond to ISO/IEC 646:1991 and to UCS-4 as specified in ISO/IEC 10646:2011.
by:
An implementation may add additional declarations to the Fortran interface packages. For example, declarations are permitted for the character types corresponding to Fortran character kinds 'ascii' and 'iso_10646', which in turn correspond to ISO/IEC 646:1991 and to UCS-4 as specified in ISO/IEC 10646:2017.
!ASIS
None needed.
!ACATS test
No semantic change is intended by this AI, so no additional tests are needed.
!appendix

****************************************************************

