Version 1.2 of ai05s/ai05-0266-1.txt


!standard 1.1.4(14.2/2)          11-12-19 AI05-0266-1/02
!standard 1.2(7/2)
!standard 1.2(8/2)
!standard 1.2(9/2)
!standard 2.1(1/2)
!standard 2.1(3.1/2)
!standard 2.1(4/2)
!standard 2.1(4.1/2)
!standard 2.1(15/2)
!standard 2.1(16/2)
!standard 2.3(5/2)
!standard 3.5.2(2/2)
!standard A.1(36.1/2)
!standard A.1(36.2/2)
!standard A.3.5(0)
!class Amendment 11-11-01
!status Amendment 2012 11-12-19
!status ARG Approved 7-0-2 11-11-11
!status work item 11-11-01
!status received 11-09-29
!priority Low
!difficulty Easy
!subject Use the latest version of ISO/IEC 10646
!summary
(1) Ada 2012 should reference the most recent version of other standards:
(A) the 2011 version of character sets (10646); (B) the 2011 version of C; (C) the 2011 version of C++.
(2) An implementation permission is added to use any character set standard, so long as it is at least as new as the 2003 edition of 10646.
(3) Ada.Wide_Characters.Handling (and Ada.Wide_Wide_Characters.Handling) have a new function that reports the character set standard. We also add a note that the results of the functions depend on the character set standard used.
(4) Ada.Wide_Characters.Handling (and Ada.Wide_Wide_Characters.Handling) are Pure.
!proposal
In March of 2011, a new version of the character set standard, ISO/IEC 10646:2011, was issued. Ada 2012 should use the most recent version of the character set standard (as it does with other standards).
There is a 2011 revision for C, and a 2011 revision of C++ which also should be used.
However, switching to a newer standard probably would introduce some incompatibilities in identifiers including unusual characters. Moreover, it is more important that Ada compilers support the character sets of the host and targets rather than any abstract standard. Finally, we don't want to make implementations wait until 2020 to support new characters (especially if those characters are important to some customer). So we propose that the character set standard actually used be implementation-defined, subject only to the requirement that it is at least 10646:2003.
The runtime behavior of Ada.Wide_Characters.Handling will depend on the exact character set used. We suggest adding a function to the package so that it can report what standard it uses. Programs that require particular behavior ought to check that the standard used is the one expected.
Ada.Wide_Characters.Handling has no categorization pragma. This package should be Pure (like Ada.Characters.Handling).
!wording
Replace 1.2(7/2) with:
ISO/IEC 9899:2011, Information technology - Programming languages - C
Delete 1.2(7.a/2). [The name is now in the usual form.]
In 1.2(8/2), change "2003" to "2011".
Replace 1.2(9/2) with:
ISO/IEC 14882:2011, Information technology - Programming languages - C++
Delete 1.2(9.a/2). [The name is now in the usual form.]
Implementation Permission
The categories defined above, as well as case mapping and folding, may be based on an implementation-defined version of ISO/IEC 10646 (2003 edition or later).
AARM Ramification: The exact categories, case mapping, and case folding chosen affect identifiers, the result of '[[Wide_]Wide_]Image, and packages Wide_Characters.Handling and Wide_Wide_Characters.Handling.
Add after A.3.5(4/3):
pragma Pure(Handling);
function Character_Set_Version return String;
Add after A.3.5(23/3):
function Character_Set_Version return String;
Returns an implementation-defined identifier that identifies the version of the character set standard that is used for categorizing characters by the implementation.
Add at the end of A.3.5:
Implementation Advice
The string returned by Character_Set_Version should include either "10646:" or "Unicode".
Note: The results returned by these functions may depend on which particular version of the 10646 standard is supported by the implementation (see 2.1).
Change 10646:2003 to 10646:2011 wherever it appears.
!discussion
A program that cannot tolerate changes in the behavior of the classification or case conversion functions of Ada.Wide_Characters.Handling should check the result of the Character_Set_Version function before proceeding. If it differs from the expected value, the program should take defensive measures.
Note that any Ada program can count on the support of the characters defined in 10646:2003 except for the few characters whose classifications are changed in later standards. For commonly used character sets, like Greek and Cyrillic, the character set chosen by the implementation should not matter.
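As an illustration only (not proposed wording), the check described above might be sketched as follows. The exact format of the string returned by Character_Set_Version is implementation-defined, so the expected substring "10646:2011" and the choice of defensive measure are assumptions of this sketch; an implementation might just as well return a string of the form "Unicode ...".

```ada
with Ada.Strings.Fixed;
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Check_Character_Set is
   --  Assumption: this program was validated against an implementation
   --  whose version string contains "10646:2011". The real format is
   --  implementation-defined; adjust Expected to match your compiler.
   Expected : constant String := "10646:2011";

   Version : constant String :=
     Ada.Wide_Characters.Handling.Character_Set_Version;
begin
   if Ada.Strings.Fixed.Index (Version, Expected) = 0 then
      --  Defensive measure (hypothetical): warn, and let the caller
      --  decide whether to restrict processing to well-known characters.
      Ada.Text_IO.Put_Line
        ("Warning: unexpected character set version: " & Version);
   else
      Ada.Text_IO.Put_Line ("Character set version as expected: " & Version);
   end if;
end Check_Character_Set;
```

A substring test is used rather than an equality test because the Implementation Advice only asks that the string include "10646:" or "Unicode"; anything else in the string is up to the implementation.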
----
10646:2011 adds an Annex U about identifier syntax. But all it says is to go read the Unicode documents! We need to reconsider exactly which characters are allowed in identifiers in order to meet this standard, but we'll do that in a separate AI (as this topic is not as clear-cut as the others here, and we already have such an AI to deal with another, related problem).
Using 10646:2011 changes the details of "Simple Locale-independent Case Folding", "Simple Uppercase Mapping", and "Simple Lowercase Mapping", used by various rules in the standard. The first of these is defined to be "stable" (always changed compatibly), so there should not be any incompatibilities or inconsistencies caused by its change. Changes to "Simple Uppercase Mapping" might change the 'Image of identifiers containing obscure characters, and could make an enumeration type containing such obscure characters illegal -- but as the changes are all in unusual characters, this is unlikely to be a problem in practice. (Following the Ada 2012 rules exactly is likely to have the same level of incompatibility.)
Using 10646:2011 will categorize more characters as letters, so that they would be allowed in identifiers. But we will consider adopting the Unicode 6.0 recommendations (as referenced in 10646:2011) for identifiers, which would also subtly change the characters allowed, with an early Binding Interpretation. So we will not consider any such effects here.
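To see how such differences would show up at run time, a program can print the classification of a particular character alongside the reported character set version. The following is a sketch only. The probe code point 16#2C60# is an arbitrary choice, picked on the assumption that it was assigned after the 2003 edition (worth verifying against the Unicode data files), and the procedure name is hypothetical.

```ada
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Probe_Classification is
   package WCH renames Ada.Wide_Characters.Handling;

   --  Assumption: a code point believed to be unassigned in 10646:2003
   --  and assigned as a letter in later editions.
   Probe : constant Wide_Character := Wide_Character'Val (16#2C60#);
begin
   Ada.Text_IO.Put_Line ("Character set: " & WCH.Character_Set_Version);

   --  The two results below may differ between implementations that
   --  base their tables on different editions of 10646.
   Ada.Text_IO.Put_Line
     ("Is_Letter:        " & Boolean'Image (WCH.Is_Letter (Probe)));
   Ada.Text_IO.Put_Line
     ("To_Lower differs: " & Boolean'Image (WCH.To_Lower (Probe) /= Probe));
end Probe_Classification;
```

No expected output is given, since the results depend on the implementation's chosen edition of the character set standard.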
!corrigendum 1.1.4(14.2/2)
!AI-0227-1
!AI-0266-1
Replace the paragraph:
When this International Standard mentions the conversion of some character or sequence of characters to upper case, it means the character or sequence of characters obtained by using locale-independent full case folding, as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2003.
by:
When this International Standard mentions the conversion of some character or sequence of characters to upper case, it means the character or sequence of characters obtained by using simple upper case mapping, as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2011.
!corrigendum 1.2(7/2)
Replace the paragraph:
ISO/IEC 9899:1999, Programming languages — C, supplemented by Technical Corrigendum 1:2001 and Technical Corrigendum 2:2004.
by:
ISO/IEC 9899:2011, Information technology — Programming languages — C.
!corrigendum 1.2(8/2)
Replace the paragraph:
ISO/IEC 10646:2003, Information technology — Universal Multiple-Octet Coded Character Set (UCS).
by:
ISO/IEC 10646:2011, Information technology — Universal Multiple-Octet Coded Character Set (UCS).
!corrigendum 1.2(9/2)
Replace the paragraph:
ISO/IEC 14882:2003, Programming languages — C++.
by:
ISO/IEC 14882:2011, Information technology — Programming languages — C++.
!corrigendum 2.1(1/2)
Replace the paragraph:
The character repertoire for the text of an Ada program consists of the entire coding space described by the ISO/IEC 10646:2003 Universal Multiple-Octet Coded Character Set. This coding space is organized in planes, each plane comprising 65536 characters.
by:
The character repertoire for the text of an Ada program consists of the entire coding space described by the ISO/IEC 10646:2011 Universal Multiple-Octet Coded Character Set. This coding space is organized in planes, each plane comprising 65536 characters.
!corrigendum 2.1(3.1/2)
Replace the paragraph:
A character is defined by this International Standard for each cell in the coding space described by ISO/IEC 10646:2003, regardless of whether or not ISO/IEC 10646:2003 allocates a character to that cell.
by:
A character is defined by this International Standard for each cell in the coding space described by ISO/IEC 10646:2011, regardless of whether or not ISO/IEC 10646:2011 allocates a character to that cell.
!corrigendum 2.1(4/2)
Replace the paragraph:
The coded representation for characters is implementation defined (it need not be a representation defined within ISO/IEC 10646:2003). A character whose relative code position in its plane is 16#FFFE# or 16#FFFF# is not allowed anywhere in the text of a program.
by:
The coded representation for characters is implementation defined (it need not be a representation defined within ISO/IEC 10646:2011). A character whose relative code point in its plane is 16#FFFE# or 16#FFFF# is not allowed anywhere in the text of a program. The only characters allowed outside of comments are those in categories other_format, format_effector, and graphic_character.
!corrigendum 2.1(4.1/2)
Replace the paragraph:
The semantics of an Ada program whose text is not in Normalization Form KC (as defined by section 24 of ISO/IEC 10646:2003) is implementation defined.
by:
The semantics of an Ada program whose text is not in Normalization Form KC (as defined by section 21 of ISO/IEC 10646:2011) is implementation defined.
!corrigendum 2.1(5/2)
Replace the paragraph:
The description of the language definition in this International Standard uses the character properties General Category, Simple Uppercase Mapping, Uppercase Mapping, and Special Case Condition of the documents referenced by the note in section 1 of ISO/IEC 10646:2003. The actual set of graphic symbols used by an implementation for the visual representation of the text of an Ada program is not specified.
by:
The description of the language definition in this International Standard uses the character properties General Category, Simple Uppercase Mapping, Uppercase Mapping, and Special Case Condition of the documents referenced by the note in section 1 of ISO/IEC 10646:2011. The actual set of graphic symbols used by an implementation for the visual representation of the text of an Ada program is not specified.
!corrigendum 2.1(15/2)
Replace the paragraph:
The following names are used when referring to certain characters (the first name is that given in ISO/IEC 10646:2003)
by:
The following names are used when referring to certain characters (the first name is that given in ISO/IEC 10646:2011)
!corrigendum 2.1(16/2)
Replace the paragraph:
In a nonstandard mode, the implementation may support a different character repertoire; in particular, the set of characters that are considered identifier_letters can be extended or changed to conform to local conventions.
by:
The categories defined above, as well as case mapping and folding, may be based on an implementation-defined version of ISO/IEC 10646 (2003 edition or later).
!corrigendum 2.3(5/2)
Replace the paragraph:
Two identifiers are considered the same if they consist of the same sequence of characters after applying the following transformations (in this order):
by:
Two identifiers are considered the same if they consist of the same sequence of characters after applying locale-independent simple case folding, as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2011.
!corrigendum 3.5.2(2/2)
Replace the paragraph:
The predefined type Character is a character type whose values correspond to the 256 code positions of Row 00 (also known as Latin-1) of the ISO/IEC 10646:2003 Basic Multilingual Plane (BMP). Each of the graphic characters of Row 00 of the BMP has a corresponding character_literal in Character. Each of the nongraphic positions of Row 00 (0000-001F and 007F-009F) has a corresponding language-defined name, which is not usable as an enumeration literal, but which is usable with the attributes Image, Wide_Image, Wide_Wide_Image, Value, Wide_Value, and Wide_Wide_Value; these names are given in the definition of type Character in A.1, "The Package Standard", but are set in italics.
by:
The predefined type Character is a character type whose values correspond to the 256 code points of Row 00 (also known as Latin-1) of the ISO/IEC 10646:2011 Basic Multilingual Plane (BMP). Each of the graphic characters of Row 00 of the BMP has a corresponding character_literal in Character. Each of the nongraphic characters of Row 00 has a corresponding language-defined name, which is not usable as an enumeration literal, but which is usable with the attributes Image, Wide_Image, Wide_Wide_Image, Value, Wide_Value, and Wide_Wide_Value; these names are given in the definition of type Character in A.1, "The Package Standard", but are set in italics.
!corrigendum A.1(36.1/2)
Replace the paragraph:
-- The declaration of type Wide_Character is based on the standard ISO/IEC 10646:2003 BMP character
-- set. The first 256 positions have the same contents as type Character. See 3.5.2.
type Wide_Character is (nul, soh ... Hex_0000FFFE, Hex_0000FFFF);
by:
-- The declaration of type Wide_Character is based on the standard ISO/IEC 10646:2011 BMP character
-- set. The first 256 positions have the same contents as type Character. See 3.5.2.
type Wide_Character is (nul, soh ... Hex_0000FFFE, Hex_0000FFFF);
!corrigendum A.1(36.2/2)
Replace the paragraph:
-- The declaration of type Wide_Wide_Character is based on the full
-- ISO/IEC 10646:2003 character set. The first 65536 positions have the
-- same contents as type Wide_Character. See 3.5.2.
type Wide_Wide_Character is (nul, soh ... Hex_7FFFFFFE, Hex_7FFFFFFF);
for Wide_Wide_Character'Size use 32;
by:
-- The declaration of type Wide_Wide_Character is based on the full
-- ISO/IEC 10646:2011 character set. The first 65536 positions have the
-- same contents as type Wide_Character. See 3.5.2.
type Wide_Wide_Character is (nul, soh ... Hex_7FFFFFFE, Hex_7FFFFFFF);
for Wide_Wide_Character'Size use 32;
!corrigendum A.3.5(0)
Insert new clause:
Force a conflict; the real text is found in the conflict file.
!ACATS test
No separate ACATS test is needed (since the exact character set supported is implementation-defined).
Any tests involving identifiers should be postponed until the AI on identifiers is decided.
!ASIS
No ASIS effect.
!appendix

From: Randy Brukardt
Sent: Thursday, September 29, 2011  11:20 PM

While researching a question from Erhard's review, I happened to notice that a
new edition of ISO 10646, the character set standard, was issued this year.
(It's dated March 15th.)

Ada 2005 relied on ISO 10646:2003, which corresponds to Unicode 4.0.

ISO 10646:2011 corresponds to Unicode 6.0 - which has nearly 2100 additional
characters over Unicode 5.2. (No idea how many have been added compared to
10646:2003, but it would seem to be a lot.)

Which set is used would affect the exact characters used in identifiers (many of
the new characters could be used in identifiers, and there are a few characters
which have been reclassified such that they would not be usable in identifiers).
It also would affect the results from the new packages
Ada.Wide_Characters.Handling and Ada.Wide_Wide_Characters.Handling. Presumably
(although I haven't checked this), there would be changes in case mapping as
well.

Generally, Ada has relied on the most recent version of other standards. If we
follow this, we should change to using 10646:2011. But note that doing so would
present a (very mild) incompatibility, in that there would exist identifiers
legal in Ada 2005 that would not be legal in Ada 2012. Given the fact that the
identifier rules in Ada 2005 were very screwed up, I suspect that this would be
unnoticeable outside of the incompatibility documentation in the Standard.

Should we make this change? Let's discuss this a bit, and then I'll send out a
Letter Ballot to get a definitive answer.

****************************************************************

From: Jean-Pierre Rosen
Sent: Thursday, September 29, 2011  11:36 PM

> Generally, Ada has relied on the most recent version of other
> standards. If we follow this, we should change to using 10646:2011.
> But note that doing so would present a (very mild) incompatibility, in
> that there would exist identifiers legal in Ada 2005 that would not be legal
> in Ada 2012.

Anybody who used those letters in identifiers will get the trouble they deserve.
Even in French, I always advise against using accented letters - which are
pretty stable.

> Given the
> fact that the identifier rules in Ada 2005 were very screwed up, I
> suspect that this would be unnoticable outside of the incompatibility
> documentation in the Standard.

Not doing the change would even involve the risk of ISO frowning at us - and
corresponding delay in the standard.

****************************************************************

From: Randy Brukardt
Sent: Thursday, September 29, 2011  11:59 PM

> ISO 10646:2011 corresponds to Unicode 6.0 - which has nearly 2100
> additional characters over Unicode 5.2. (No idea how many have been
> added compared to 10646:2003, but it would seem to be a lot.)

Looking over this new standard, some things jump out at me:

(1) There seem to be a lot more references to Unicode. Apparently, the aversion
    to that has worn off somewhat.

(2) There is now an Annex (U) discussing identifiers. But all it says is to go
    read the Unicode document on the subject (giving a link)!

(3) Aside: I did just skim the Unicode document on identifiers. They've added
    some additional character properties specifically for identifiers. These are
    supposedly stable, in that newer Unicode versions will never take characters
    out of these categories. These would probably be better to base Ada on,
    however this would allow quite a few additional characters in identifiers
    (and would require more rewriting of the Standard). But the win is that it
    would avoid future incompatibilities. (One also could imagine adding
    functions to Wide_Characters.Handling to return these properties, thus giving
    a decent way to process identifiers using those libraries.)  The document
    also suggests a different algorithm for applying normalization than Ada 2005
    does (probably because the Unicode document has changed a lot) -- we have an
    upcoming Ada 2012 BI on that issue [based on a question posted on
    Ada-Comment]. Probably should leave the question of changing the characters
    allowed until that BI.

(4) Annex C and D (referred to in our A.4.11) have been folded into the
    normative standard (although placeholders remain).

(5) Still no real information about case mapping or the like. We still have to
    reference the "documents mentioned in the note of Section 1".

****************************************************************

From: Robert Dewar
Sent: Friday, September 30, 2011  3:59 AM

> Generally, Ada has relied on the most recent version of other
> standards. If we follow this, we should change to using 10646:2011.
> But note that doing so would present a (very mild) incompatibility, in
> that there would exist identifiers legal in Ada 2005 that would not be
> legal in Ada 2012. Given the fact that the identifier rules in Ada
> 2005 were very screwed up, I suspect that this would be unnoticable
> outside of the incompatibility documentation in the Standard.
>
> Should we make this change? Let's discuss this a bit, and then I'll
> send out a Letter Ballot to get a definitive answer.

My first reaction was why not, go ahead with the change, no one uses this stuff
anyway.

Then I got to thinking that this will require several days work to research what
has changed, rerun the utilities to generate tables, rebuild the units using
these tables etc etc etc, all 100% totally useless work solely for the sake of a
reference that no one cares about.

Still I suppose we should make the change. Probably the best thing is to make
the change quietly, and then I don't think GNAT will even bother to do anything
about it till someone complains, which will be never.

****************************************************************

From: Robert Dewar
Sent: Friday, September 30, 2011  4:01 AM

> (5) Still no real information about case mapping or the like. We still
> have to reference the "documents mentioned in the note of Section 1".

I regard case mapping for extended characters as an abomination. It is not
possible to do it "right" in a locale independent way, and doing it at all is a
huge mistake.

****************************************************************

From: Tucker Taft
Sent: Friday, September 30, 2011  11:07 AM

> While researching a question from Erhard's review, I happened to
> notice that a new edition of ISO 10646, the character set standard,
> was issued this year. (It's dated March 15th.)

I would say go for the latest.
Better now than later, especially if there are already Ada 2012 changes in this
area.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, October 11, 2011  3:43 PM

As previously noted, we need to decide whether to change to the latest version of
the character set standard. For most purposes, this is not a problem, but there
is an incompatibility as some Ada 2005 identifiers would not be legal in Ada
2012 -- these would use *very* obscure characters. (But given that the rules for
identifiers are very screwed up in Ada 2005, this incompatibility is much
smaller than the potential one caused by applying the BI on identifiers). Also
note that this will have an effect on the results from the functions in
Wide_Characters.Handling for obscure characters.

Following is a Letter Ballot on this topic; please respond ASAP (but no later
than Monday, October 17th):


   The character set standard used in Ada 2012 should be:

        _____  ISO/IEC 10646:2003 (that is, no change - corresponds to Unicode
	       4.0).


        _____  ISO/IEC 10646:2011 (that is, the current standard - corresponds
	       to Unicode 6.0). If choosing this option, please select from one
	       of the following:

              _____  Keep the identifier rules as currently defined, with no
		     plans to change them.

              _____  Keep the identifier rules as currently defined, but plan to
		     issue a BI in the future [if this is appropriate after
		     study] to change to use the recommended XId_Start and
		     XId_Continue classes to define the characters that can be
		     used. (These are defined to be stable, like case folding,
		     but unlike the letter classes we currently use.) We'd
		     probably also want to add functions matching these
		     classifications to (Wide_)Wide_Characters.Handling so that
		     identifier processing can be usefully written in Ada code
		     (that's not possible now as the currently used classes
		     aren't stable and thus will change from Ada version to Ada
		     version).

              _____  Change the identifier rules now to use the Xid_Start and
		     Xid_Continue classes. (Probably would delay the Standard -
		     we'll need to consider the effect of potentially including
		     non-letters in identifiers on 'Image, among other things.)

****************************************************************

From: Randy Brukardt
Sent: Tuesday, October 11, 2011  11:56 PM

> Following is a Letter Ballot on this topic; please respond ASAP (but
> no later than Monday, October 17th):
>
>
>    The character set standard used in Ada 2012 should be:
>
>         _____  ISO/IEC 10646:2003 (that is, no change - corresponds to
>                Unicode 4.0).
>
>
>         __X___  ISO/IEC 10646:2011 (that is, the current standard -
>                 corresponds to Unicode 6.0).
>                       If choosing this option, please select from one
> of the following:
>
>               _____  Keep the identifier rules as currently defined,
> with no plans to change them.
>
>               ___Y__  Keep the identifier rules as currently defined,
> but plan to issue a BI in the future [if this is appropriate after
> study] to change to use the recommended XId_Start and XId_Continue
> classes to define the characters that can be used. (These are defined
> to be stable, like case folding, but unlike the letter classes we
> currently use.) We'd probably also want to add functions matching
> these classifications to (Wide_)Wide_Characters.Handling so that
> identifier processing can be usefully written in Ada code (that's not
> possible now as the currently used classes aren't stable and thus will
> change from Ada version to Ada version).
>
>               _____  Change the identifier rules now to use the
> Xid_Start and Xid_Contain classes. (Probably would delay the Standard
> - we'll need to consider the effect of potentially including
> non-letters in identifiers on 'Image, among other things.)

****************************************************************

From: Robert Dewar
Sent: Wednesday, October 12, 2011  5:25 AM

> Following is a Letter Ballot on this topic; please respond ASAP (but
> no later than Monday, October 17th):
>
>
>     The character set standard used in Ada 2012 should be:
>
>          __X___  ISO/IEC 10646:2003 (that is, no change - corresponds
>                  to Unicode 4.0).

just because I think this stuff is so little used (not sure it is used at all),
and it is not worth doing major implementation work to make a change that will
affect no one.

OTOH, if we do the change, I don't think GNAT will bother to follow unless some
real user complains, which will likely be never :-)

****************************************************************

From: Tucker Taft
Sent: Wednesday, October 12, 2011  7:27 AM

I'll go with Randy's recommendation (see below).

...
>          _X____  ISO/IEC 10646:2011 (that is, the current standard -
>                  corresponds to Unicode 6.0).
>                        If choosing this option, please select from one
>                        of the following:
>
>                _____  Keep the identifier rules as currently defined,
>                       with no plans to change them.
>
>                __X___  Keep the identifier rules as currently defined,
>                        but plan to issue a BI in the future [if this is
>                        appropriate ...

****************************************************************

From: Jean-Pierre Rosen
Sent: Wednesday, October 12, 2011  7:41 AM

> Following is a Letter Ballot on this topic; please respond ASAP (but
> no later than Monday, October 17th):
>
>
>    The character set standard used in Ada 2012 should be:
>
>         _____  ISO/IEC 10646:2003 (that is, no change - corresponds to
>                Unicode 4.0).
>
>
>         _X____  ISO/IEC 10646:2011 (that is, the current standard -
>                 corresponds to Unicode 6.0).
>                       If choosing this option, please select from one
>                       of the following:
>
>               _____  Keep the identifier rules as currently defined,
>                      with no plans to change them.
>
>               ___X__  Keep the identifier rules as currently defined,
> but plan to issue a BI in the future [if this is appropriate after
> study] to change to use the recommended XId_Start and XId_Continue
> classes to define the characters that can be used. (These are defined
> to be stable, like case folding, but unlike the letter classes we
> currently use.) We'd probably also want to add functions matching
> these classifications to (Wide_)Wide_Characters.Handling so that
> identifier processing can be usefully written in Ada code (that's not
> possible now as the currently used classes aren't stable and thus will change from Ada version to Ada version).
>
>               _____  Change the identifier rules now to use the
> Xid_Start and Xid_Contain classes. (Probably would delay the Standard
> - we'll need to consider the effect of potentially including
> non-letters in identifiers on 'Image, among other things.)

****************************************************************

From: Bob Duff
Sent: Wednesday, October 12, 2011  8:44 AM

> As previous noted, we need to decide whether to change to the latest
> version of the character set standard. For most purposes, this is not
> a problem, but there is an incompatibility as some Ada 2005
> identifiers would not be legal in Ada 2012 -- these would use *very*
> obscure characters. (But given that the rules for identifiers are very
> screwed up in Ada 2005, this incompatibility is much smaller than the
> potential one caused by applying the BI on identifiers).

I don't understand how the rules are "very screwed up"
(and I don't really want to -- I'm sure you've explained it before -- no need to
do so again).

But whatever the screwup is, I'm guessing implementations don't obey it, so when
talking about [in]compatibility, we should be talking about what implementations
actually do.

Anyway, my vote is for:

>         __X__  ISO/IEC 10646:2003 (that is, no change - corresponds to
>                Unicode 4.0).

because any change is going to require a lot of not-very-useful work for
implementations.

****************************************************************

From: Robert Dewar
Sent: Wednesday, October 12, 2011  8:51 AM

> But whatever the screwup is, I'm guessing implementations don't obey
> it, so when talking about [in]compatibility, we should be talking
> about what implementations actually do.

GNAT follows exactly the 2005 rules. I don't really agree they are "very screwed
up", but I know of no discrepancies between the 2005 standard and what GNAT
does. The issue is the categories and the way they are used.

> Anyway, my vote is for:
>
>>          __X__  ISO/IEC 10646:2003 (that is, no change - corresponds
>> to Unicode 4.0).
>
> because any change is going to require a lot of not-very-useful work
> for implementations.

Well, pretend to require, you can't really require implementations to do
anything :-)

****************************************************************

From: Bob Duff
Sent: Wednesday, October 12, 2011  9:12 AM

> Well, pretend to require, you can't really require implementations to
> do anything :-)

Very good point!  We (language designers) have a tendency to forget that.

****************************************************************

From: Robert Dewar
Sent: Wednesday, October 12, 2011  9:23 AM

And I think after some debacles (like the leap second nonsense) implementors are
less likely to automatically jump to implement everything :-)

****************************************************************

From: Erhard Ploedereder
Sent: Wednesday, October 12, 2011  12:07 PM

> Following is a Letter Ballot on this topic; please respond ASAP (but
> no later than Monday, October 17th):

I'll abstain on this ballot out of sheer ignorance of the issues.

****************************************************************

From: Tullio Vardanega
Sent: Wednesday, October 12, 2011  1:04 PM

So do I.

>> Following is a Letter Ballot on this topic; please respond ASAP (but
>> no later than Monday, October 17th):
> I'll abstain on this ballot out of sheer ignorance of the issues.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, October 12, 2011  7:10 PM

> > But whatever the screwup is, I'm guessing implementations don't obey
> > it, so when talking about [in]compatibility, we should be talking
> > about what implementations actually do.
>
> GNAT follows exactly the 2005 rules.

I very highly doubt this.

> I don't really agree
> they are "very screwed up", but I know of no discrepancies between the
> 2005 standard and what GNAT does. The issue is the categories and the
> way they are used.

The categories are only the tip of the iceberg. Does GNAT:

(1) allow "other-format" characters (like soft hyphens) in identifiers?
Original Ada 2005 did (later repealed for Ada 2012).

(2) use full case folding for identifier equivalence checks? That means that
"aß" is the same as "ass" and "ASS". (Also now changed for Ada 2012 for
compatibility reasons with Ada 95, but it was fully intended to be the case for
Ada 2005.)

(3) return a full case folded string from 'Image (as specified in the Ada 2005
standard), even when this would change the length and typically put the string
into lower case? (This was just a bug in Ada 2005, but there is no easy fix and
extensive changes were needed.)

My understanding from previous discussions is that GNAT does none of these.
That's probably a good thing [(3) is clearly a case of Robert's rule of the
standard saying something silly; (1) was repealed a long time ago; and (2)
caused an unintentional incompatibility], but it surely is not the same as
"follows exactly the Ada 2005 rules". It's much closer to "following the Ada
2005 rules as we wish they would be". ;-)
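
As an illustration of why points (2) and (3) matter, the following minimal
sketch uses Python (not Ada), chosen only because its built-in str.casefold and
str.upper apply the Unicode full case mappings. The sharp s (U+00DF) is an
assumed example character; nothing here reflects what GNAT or any other Ada
compiler actually does.

```python
# Illustration only: Python string methods stand in for the Unicode full
# case mappings discussed above. This is not Ada and not GNAT behavior.
sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S (assumed example character)

original = "a" + sharp_s
folded = original.casefold()   # full case folding, as used for equivalence
uppered = original.upper()     # full upper-case conversion

# Both conversions produce a string one character longer than the input.
print(len(original), len(folded), len(uppered))

# Under full case folding, all three spellings compare equal.
same_identifier = folded == "ass".casefold() == "ASS".casefold()
print(same_identifier)
```

A simple (one-to-one) case mapping never changes the length, which is why the
choice between simple and full folding is visible both in identifier
equivalence and in the result of an image function.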

Back to the topic: the only major change from using 10646:2011 instead of
10646:2003 would be that a few obscure characters would change category, and
presumably the equivalence ("case folding") and case conversion tables also have
some changes in obscure cases. Any other changes to identifiers would need to be
discussed in the future because we need to consider all of the impacts (and we
already have an open soon-to-be AI on "normalization", which probably will
demand more changes to the rules anyway).

It should be noted that the "official" Ada rules have been coming closer to what
you want, but that the ARG remains committed to following the Unicode
recommendations as closely as makes sense for Ada. That almost certainly means
that there are going to be some rules that you don't like.

I've said before, and I'll be happy to say again that I don't think characters
outside of Latin-1 should be allowed in identifiers, period, but we did not feel
that we had a choice in this matter given the directives on internationalization
of programming languages. As such, we have to make the best fit of those
recommendations with Ada.

****************************************************************

From: Robert Dewar
Sent: Wednesday, October 12, 2011  8:56 PM

> The categories are only the tip of the iceberg. Does GNAT:
>
> (1) allow "other-format" characters (like soft hyphens) in identifiers?
> Original Ada 2005 did (later repealed for Ada 2012).

yes, then changed

> (2) use full case folding for identifier equivalence checks? That
> means that "aß" is the same as "ass" and "ASS". (Also now changed for
> Ada 2012 for compatibility reasons with Ada 95, but it was fully
> intended to be the case for Ada 2005.)

No, but I always thought this was an absurd misreading of Ada 2005, no informed
person can have intended that reading

> (3) Returns a full case folded string from 'Image (as specified in the Ada
> 2005 standard), even when this would change the length and typically
> put the string into lower case? (This was just a bug in Ada 2005, but
> there is no easy fix and extensive changes were needed.)

It can't change the length

> My understanding from previous discussions is that GNAT does none of these.
> That's probably a good thing [(3) a clearly a case of Robert's rule of
> the standard saying something silly; (1) was repealed a long time ago;
> and (2) caused an unintentional incompatibility], but it surely is not
> the same as "follows exactly the Ada 2005 rules". It's much closer to
> "following the Ada 2005 as we wish they would be". ;-)

The rules were badly written, and have to be interpreted with lavish use of
Robert's rule

> It should be noted that the "official" Ada rules have been coming
> closer to what you want, but that the ARG remains committed to
> following the Unicode recommendations as closely as makes sense for
> Ada. That almost certainly means that are going to be some rules that you
> don't like.
>
> I've said before, and I'll be happy to say again that I don't think
> characters outside of Latin-1 should be allowed in identifiers,
> period, but we did not feel that we had a choice in this matter given
> the directives on internationalization of programming languages. As
> such, we have to make the best fit of those recommendations with Ada.

Fine, but why bother with changing them then if all you are doing is meeting
directives, rather than doing something useful. I don't see that there were any
directives mandating case folding, which remains a plain error in thinking.

****************************************************************

From: Brad Moore
Sent: Wednesday, October 12, 2011  9:42 PM

> > Well, pretend to require, you can't really require
> > implementations to do anything :-)
>
> Very good point!  We (language designers) have a tendency to
> forget that.

If that's the case then would it not be better to at least have the RM
mention the more up to date standard, so that implementations can
go with that version if they have the time and energy to implement it?

If instead they find nobody cares or notices that the newer standard isn't
implemented, then leaving their implementation as is isn't a problem either, is
it?

I'm trying to decide how to respond to the ballot. My feeling is that Randy's
response is the best response, but I also am sympathetic to implementation
burden, if real users aren't likely to notice one way or the other.

****************************************************************

From: John Barnes
Sent: Thursday, October 13, 2011  7:53 AM

I agree with Erhard. I am going to abstain as well.

> I'll abstain on this ballot out of sheer ignorance of the issues.

****************************************************************

From: Gary Dismukes
Sent: Thursday, October 13, 2011  1:34 PM

> I agree with Erhard. I am going to abstain as well.

Count me in the list of abstainers.  I don't understand the issues well enough.
(If forced to vote I'd go for no change.)

****************************************************************

From: Steve Baird
Sent: Thursday, October 13, 2011  1:57 PM

>> I agree with Erhard. I am going to abstain as well.
>
> Count me in the list of abstainers.  I don't understand the issues
> well enough.  (If forced to vote I'd go for no change.)
>

Ditto.

****************************************************************

From: Ed Schonberg
Sent: Thursday, October 13, 2011  2:19 PM

I abstain as well, and for the same reasons.

****************************************************************

From: Tucker Taft
Sent: Thursday, October 13, 2011  2:31 PM

You guys are a bunch of wimps... ;-)

****************************************************************

From: Jean-Pierre Rosen
Sent: Thursday, October 13, 2011  2:40 PM

Well, let me comment why I didn't abstain. If we believe in standards, and if we
believe that the guys who design 10646 know better than us, we have to follow.
The only freedom we have is in trying to do so in a manner that is not too
disruptive.

****************************************************************

From: Robert Dewar
Sent: Thursday, October 13, 2011  4:00 PM

perhaps we have to follow, but not to race, we have not had enough time to study
this change, let's leave it for Ada 2020, and perhaps issue a BI that allows
implementations to change before then, just as we did for 8-bit characters.

****************************************************************

From: Randy Brukardt
Sent: Thursday, October 13, 2011  4:26 PM

The only problem with that is that it would change the run-time behavior of
[Wide_]Wide_Characters.Handling (since a few characters change classifications).
It seems like a bad idea to have different implementations having different
interpretations of the correct behavior of these functions.

OTOH, the identifier syntax changes definitely need study before we adopt them
(or not). No one can reasonably implement what Ada 2005 actually says, so
implementations will inevitably differ subtly on this in any case, and there
seems to be little evidence that programmers are using this, so deferring the
change there is better.

ISO 10646:2011 has an Annex (annex U) that specifically says that identifiers in
programming languages should follow the Unicode recommendations (giving a link,
not including them). But there is a lot of wiggle room in those Unicode
recommendations.

One alternative to "fix" the run-time issue with [Wide_]Wide_Characters.Handling
would be to say that it is implementation-defined exactly which character set
standard it depends on. Or some other statement that programmers should expect
there will be changes in character classifications, case conversions, and the
like in future standards, so we can ignore the "compatibility" issue in the
future. (After all, for most applications, it won't make any difference, or it
would be *better* for the package to use the most recent character set standard
- or at least one that applies to the target system; tying it for all time to
any particular character set standard [which we know is going to change] is
rather silly.)

****************************************************************

From: Robert Dewar
Sent: Thursday, October 13, 2011  4:42 PM

> The only problem with that is that it would change the run-time
> behavior of [Wide_]Wide_Characters.Handling (since a few characters
> change classifications). It seems like a bad idea to have different
> implementations having different interpretations of the correct behavior of
> these functions.

This is more of a theoretical concern than an actual one. And changing the
standard is not going to have any immediate effect on GNAT in the immediate
future anyway (we have already frozen the feature set for the 2012 releases of
GNAT).

And of course your suggestion leads to different implementations having even
more different interpretations (how many other Ada 2012 compilers do you expect
to see in the near future?), since it is much more likely that the two different
implementations involved will be an Ada 2005 one and an Ada 2012 one.

Furthermore, we did a much bigger incompatible change with 7 to 8-bit characters
and it caused very little trouble.

> One alternative to "fix" the run-time issue with
> [Wide_]Wide_Characters.Handling would be to say that it is
> implementation-defined exactly which character set standard it depends on.

What on earth would that achieve

> Or some other statement that programmers should expect there will be
> changes in character classifications, case conversions, and the like
> in future standards, so we can ignore the "compatibility" issue in the
> future. (After all, for most applications, it won't make any
> difference, or it would be
> *better* for the package to use the most recent character set standard
> - or at least one that applies to the target system; tying it for all
> time to any particular character set standard [which we know is going
> to change] is rather silly.)

That's merely a formalistic argument, no programmer will change their behavior
on the basis of such a statement in the RM.

****************************************************************

From: Randy Brukardt
Sent: Thursday, October 13, 2011  7:28 PM

...
> > The only problem with that is that it would change the run-time
> > behavior of [Wide_]Wide_Characters.Handling (since a few characters
> > change classifications). It seems like a bad idea to have different
> > implementations having different interpretations of the correct
> > behavior of these functions.
>
> This is more of a theoretical concern than an actual one. And changing
> the standard is not going to have any immediate effect on GNAT in the
> > immediate future anyway (we have already frozen the feature set for
> > the 2012 releases of GNAT).

It's not that theoretical: Ada.Wide_Wide_Characters.Handling is easy to
implement and probably will be supported by a number of Ada compilers in the
near future. And it has nothing to do with "features": the package exists in any
case, the question is exactly what it should return.

> And of course your suggestion leads to different implementations
> > having even more different interpretations (how many other Ada 2012
> compilers do you expect to see in the near future?), since it is much
> more likely that the two different implementations involved will be an
> Ada 2005 one and an Ada 2012 one.

No Ada 2005 implementation has Ada.Wide_Wide_Characters.Handling -- it's an Ada
2012 package. If it does have it, it's formally an implementation-defined
package and thus irrelevant.

Let me say again, I am *not* talking in any way about identifiers or their
syntax. They have absolutely nothing to do with the package
Ada.Wide_Wide_Characters.Handling.

> Furthermore, we did a much bigger incompatible change with 7 to 8-bit
> characters and it caused very little trouble.

I don't see how this has anything whatsoever to do with the case in point.

...
> > Or some other statement that programmers should expect there will be
> > changes in character classifications, case conversions, and the like
> > in future standards, so we can ignore the "compatibility" issue in
> > the future. (After all, for most applications, it won't make any
> > difference, or it would be
> > *better* for the package to use the most recent character set
> > standard
> > - or at least one that applies to the target system; tying it for
> > all time to any particular character set standard [which we know is
> > going to change] is rather silly.)
>
> That's merely a formalistic argument, no programmer will change their
> behavior on the basis of such a statement in the RM.

Probably not, and that's OK -- the primary thing is to warn programmers that the
behavior of these functions on currently undefined code points is likely to
change in future versions of Ada. As with any case of a "bounded error", neither
the compiler implementer nor programmers are likely to pay much attention to the
rule -- but at least they were warned in print.

Anyway, let me ask you specifically what you think this package should do for
new/changed characters. I'm specifically talking about the behavior of functions
in Wide_Wide_Characters.Handling like Is_Letter and Is_Upper when they are
passed a character with a code position corresponding to a new character defined
in 10646:2011 (or some later version):

   (1) Wide_Wide_Characters.Handling returns values based on 10646:2003 forever.
       Very compatible, but also very out of date in the future (Ada 2012 is
       expected to last until 2020, at which point 10646:2003 will be 17 years
       old and probably will have been replaced at least one more time).

   (2) Wide_Wide_Characters.Handling returns values based on 10646:2003 for Ada
       2012, updated to use some newer standard down the road. Updating to use
       some newer standard will be run-time incompatible - a few characters that
       are letters in 2003 are not letters in 2011.

       (2a) Do the above, but indicate to users of the package that the results
            may change in the future as character sets evolve.

   (3) Wide_Wide_Characters.Handling returns values based on 10646:2011 forever.
       Also very compatible, but will also get out of date.

   (4) Wide_Wide_Characters.Handling returns values based on 10646:2003 for Ada
       2012, updated to use some newer standard down the road. Similar to (2)
       above.

       (4a) Do the above, and also something similar to (2a).

   (5) Wide_Wide_Characters.Handling returns values based on an
       implementation-defined character set standard. Lets Robert do whatever he
       wants. :-)

We have to make *some* choice of these options: users need to know what they can
count on, we need to know how far ACATS and implementer internal tests can go,
etc. Ignoring the question results in (1) or (5), depending on who's doing the
interpreting.

My personal preference is (4a), followed by (2a). But I think we need some
statement in the Standard so down the road we do not feel compelled to keep
exact run-time compatibility as we do for Ada.Characters.Handling. Else Ada will
be stuck sooner or later with an obsolete character set standard.

I agree with you that it's too late now to adopt the 10646:2011 identifier
recommendations, but that is a very separate issue from the one of run-time
character classifications. I'm primarily interested in the latter now.

****************************************************************

From: Robert Dewar
Sent: Thursday, October 13, 2011  9:03 PM

> Let me say again, I am *not* talking in any way about identifiers or
> their syntax. They have absolutely nothing to do with the package
> Ada.Wide_Wide_Characters.Handling.

OK, got it, was confused

>> Furthermore, we did a much bigger incompatible change with 7 to 8-bit
>> characters and it caused very little trouble.
>
> I don't see how this has anything whatsoever to do with the case in point.

it was a case where we made a big change between versions of the standard.

>     (4) Wide_Wide_Characters.Handling returns values based on
> 10646:2003 for Ada 2012, updated to use some newer standard down the
> road. Similar to (2) above.
>
>         (4a) Do the above, and also something similar to (2a).

This (4a) is the one I would choose

> My personal preference is (4a), followed by (2a). But I think we need
> some statement in the Standard so down the road we do not feel
> compelled to keep exact run-time compatibility as we do for
> Ada.Characters.Handling. Else Ada will be stuck sooner or later with an
> obsolete character set standard.

Well I chose 4a before reading it was your first choice.

> I agree with you that it's too late now to adopt the 10646:2011
> identifier recommendations, but that is a very separate issue from the
> one of run-time character classifications. I'm primarily interested in the
> latter now.

So it looks like 4a might be viable as a consensus decision here?

****************************************************************

From: Randy Brukardt
Sent: Thursday, October 13, 2011  9:19 PM

...
> >     (4) Wide_Wide_Characters.Handling returns values based on
> > 10646:2003 for Ada 2012, updated to use some newer standard down the
> > road. Similar to (2) above.
> >
> >         (4a) Do the above, and also something similar to (2a).
>
> This (4a) is the one I would choose

Sorry, I botched this item, I put the wrong year on the Standard. As written,
this is identical to (2) and (2a). I meant (4) to be the one that uses
10646:2011, (2) is the one that uses 10646:2003. I suspect that you meant (2a),
but I'd like a clarification from you.

...
> So it looks like 4a might be viable as a consensus decision here?

Except that I screwed up the choices. Please consider (4) as using 10646:2011,
and vote again.

****************************************************************

From: Robert Dewar
Sent: Friday, October 14, 2011  9:28 AM

>> So it looks like 4a might be viable as a consensus decision here?
>
> Except that I screwed up the choices. Please consider (4) as using
> 10646:2011, and vote again.

Now I am confused, can you send a new email with the newly updated choices
clear, so I am not trying to create a virtual result from synchronizing old
emails?

****************************************************************

From: Randy Brukardt
Sent: Friday, October 14, 2011  3:16 PM

Sorry about the confusion. I created (4) and (4a) with cut-and-paste and
insufficiently updated them. Here is the complete list:

What is the behavior of functions in Wide_Wide_Characters.Handling like
Is_Letter and Is_Upper when they are passed a character with a code position
corresponding to a new character defined in 10646:2011 (or some later version):

   (1) Wide_Wide_Characters.Handling returns values based on 10646:2003 forever.
       Very compatible, but also very out of date in the future (Ada 2012 is
       expected to last until 2020, at which point 10646:2003 will be 17 years
       old and probably will have been replaced at least one more time).

   (2) Wide_Wide_Characters.Handling returns values based on 10646:2003 for Ada
       2012, updated to use some newer standard down the road. Updating to use
       some newer standard will be run-time incompatible - a few characters that
       are letters in 2003 are not letters in 2011 (but these are unlikely
       corner cases, not the commonly used letters) and similarly for other
       classifications.

       (2a) Do the above, but indicate to users of the package that the results
            may change in the future as character sets evolve.

   (3) Wide_Wide_Characters.Handling returns values based on 10646:2011 forever.
       Also very compatible, but will also get out of date.

   (4) Wide_Wide_Characters.Handling returns values based on 10646:2011 for Ada
       2012, and will be updated to use newer standards down the road. Similar to
       (2) above, but using the 2011 character standard now. Future changes
       probably would be run-time incompatible, but most likely in unlikely
       corner cases.

       (4a) Do the above, and also something similar to (2a).

   (5) Wide_Wide_Characters.Handling returns values based on an
       implementation-defined character set standard. Lets Robert do whatever he
       wants. :-)

We have to make *some* choice of these options: users need to know what they can
count on, we need to know how far ACATS and implementer internal tests can go,
etc. Ignoring the question results in (1) or (5), depending on who's doing the
interpreting.

My personal preference is (4a) [because I can't think of any good reason not to
use the "current" classifications here - it is explicitly not necessarily the
same as used for identifiers], followed by (2a). But I think we need some
statement in the Standard so down the road we do not feel compelled to keep
exact run-time compatibility as we do for Ada.Characters.Handling. Else Ada will
be stuck sooner or later with an obsolete character set standard.
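
To make the choice among these options concrete, here is a hedged sketch in
Python (not Ada) of classification functions whose answers depend on which
version of the character database is consulted. The names
character_set_version, is_letter, and is_upper are hypothetical stand-ins, not
the Ada.Wide_Wide_Characters.Handling interface. Python's unicodedata module
and its frozen ucd_3_2_0 database play the roles of a newer and an older
character set standard, and U+1E9E is an assumed example of a code point that
was assigned after the older database was frozen.

```python
import unicodedata

# Two character databases: the one this interpreter ships with, and the
# frozen Unicode 3.2.0 database that the standard library also provides.
NEWER = unicodedata
OLDER = unicodedata.ucd_3_2_0


def character_set_version(ucd=NEWER) -> str:
    """Report which version of the character database is being consulted."""
    return ucd.unidata_version


def is_letter(ch: str, ucd=NEWER) -> bool:
    """True if the general category of ch is one of the letter categories."""
    return ucd.category(ch).startswith("L")


def is_upper(ch: str, ucd=NEWER) -> bool:
    """True if the general category of ch is Lu (uppercase letter)."""
    return ucd.category(ch) == "Lu"


# Assumed example: LATIN CAPITAL LETTER SHARP S, unassigned in Unicode 3.2.
late_addition = "\u1e9e"

for ucd in (OLDER, NEWER):
    print(character_set_version(ucd),
          is_letter(late_addition, ucd),
          is_upper(late_addition, ucd))
```

The same call gives different answers depending on the database, which is the
run-time difference at issue in options (2) and (4). A program that records
character_set_version() alongside its results at least makes that dependence
visible, in the spirit of the warning proposed in (2a) and (4a).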

****************************************************************

From: Robert Dewar
Sent: Friday, October 14, 2011  3:21 PM

Right, Robert's vote is for 2a, which is basically status quo with an indication
that updates may occur based on subsequent versions of the standard.

****************************************************************

From: Jean-Pierre Rosen
Sent: Saturday, October 15, 2011  12:39 AM

Since we restart from scratch, let me cast again my vote for 4a, precisely
because I don't fully understand the issue.

Character set issues are very complex. I assume that the people at 10646 are
very aware of compatibility issues, and that every non-upward compatible change
is the result of a carefully evaluated trade-off. At some point, you have to
trust the knowledge of other people. What would you say, if someone from the
10646 committee came to our meeting and told us that we got the accessibility
rules completely wrong ;-)?

****************************************************************

From: Robert Dewar
Sent: Saturday, October 15, 2011  7:22 AM

> Since we restart from scratch, let me cast again my vote for 4a,
> precisely because I don't fully understand the issue.

The reason I prefer 2a to 4a is that it will reflect reality. There is no way
that anyone at AdaCore will do other than 2a in the short term in the absence of
any customer demand. When there is customer demand to update to a new version,
we can do so at that point, which would then be totally consistent with the
idea of 2a.

To me it is just too late to be making the immediate change to 4a, when no one
has investigated the implications or the impact of any upward incompatibility.

BTW, the assumption that the appropriate standards committee has properly
considered the compatibility issues is dubious. In practice standards committees
often get more concerned with doing things right, than maintaining compatibility
(*)

(*) look at what we did in Ada with limited returns for Ada 2005, which severely
impacted the ability of many to move from Ada 95 to Ada 2005.

****************************************************************

From: Robert Dewar
Sent: Saturday, October 15, 2011  7:24 AM

By the way, I really think Randy's suggestion here of specifically allowing for
updating the standard used is an excellent one, MUCH better than just closing
our eyes and mandating the status quo till the next version.

****************************************************************

From: Tucker Taft
Sent: Saturday, October 15, 2011  10:39 AM

I'll go for 4a as well.  I understand that GNAT has already implemented
something else, but Ada 2012 doesn't even exist yet, so it is premature to have
its content depend on what has or has not already been implemented by particular
implementations.

For those looking at the Ada 2012 standard when it comes out in 2012 (or 2013?),
it makes no sense to me to tie it to an already out-of-date standard.

As usual, implementations will do what they do based on market demands.  If no
one cares about these details anyway, we might as well get the words of the
standard right, even if the reality is not going to match the words on day one.

****************************************************************

From: Robert Dewar
Sent: Saturday, October 15, 2011  3:42 PM

Well my argument was about what will or will not be implemented in GNAT, not
what already has been implemented in GNAT. From one point of view I don't care
too much between 2a and 4a, since it won't make any difference to implementation
plans in practice.

I still don't like that no one on the ARG has carefully examined the two
versions of the standard to understand what level of compatibility problems
arise. It seems unwise to just adopt a standard without carefully examining it.
If we want to adopt this new standard right away, we should have at least one
person carefully examine the two standards and write up a document describing
the differences from a programmer point of view.

Aren't we sort of obligated to be very careful when it comes to introducing
non-upwards compatible changes, and at least document these changes carefully?

****************************************************************

From: Randy Brukardt
Sent: Saturday, October 15, 2011  6243 PM

...
> I still don't like that no one on the ARG has carefully examined the
> two versions of the standard to understand what level of compatibility
> problems arise. It seems unwise to just adopt a standard without
> carefully examining it.

I'm not sure what level of care you are actually requiring; I spent more than an
hour reading it before I sent my original messages in this thread. And I
summarized the changes that I saw in the original messages. I didn't do a
character-by-character comparison, but that would be rather silly (I have to
presume that the summaries of changes are accurate). Note that this character
set standard is freely available, anyone can download it and read it as I did.

> ... If we
> want to adopt this new standard right away, we should have at least
> one person carefully examine the two standards and write up a document
> describing the differences from a programmer point of view.

I agree that we need a lot of care before adopting new identifier syntax, but
*NO ONE* has suggested that at this point. (That will be an open issue for
discussion at a near future ARG meeting - probably not the next one because I
doubt I'll have time to write it up.)

But for other things, this would be silly, because it is the same list that
happens for any character change in Ada: some characters change categories. Some
characters that were not previously graphic characters become them, and so on.
We already have all of those changes documented as incompatibilities in Ada 2005
(because it did change character set standards). This would cause the same sorts
of changes (only in obscure, rarely used characters). The Unicode change
documents describe the changed characters in detail -- are you saying that we
have to copy all of that and send it to you so that you can see those exact
details? It's all online for anyone that cares to look.

> Aren't we sort of obligated to be very careful when it comes to
> introducing non-upwards compatible changes, and at least document
> these changes carefully?

Currently, we're talking specifically about package
Wide_Wide_Characters.Handling, which is new in Ada 2012. There is no possibility
of creating a "non-upwards compatible" change in a new package! I realize that
GNAT has an implementation-defined equivalent, but no one is going to
accidentally change from the GNAT-only package to a language-defined one without
intending it.

If we adopt the new standard globally, there will be more changes, the main one
being that more characters would be considered graphic characters by 'Image
(changing their image from the "Hex00xxxx" form), that's technically
incompatible but hardly very interesting. (Especially as we now allow 'Value to
take the hex form for all characters, so there would be no incompatibility in
'Value unless an implementer wanted to introduce it.) Identifiers would allow
more characters (as some of the new characters are "letters") - which has no
compatibility issues, and a handful of previously allowed characters would be
banned (which would be incompatible, but again these are rarely used characters,
and probably should never have been allowed in the first place). I'd rather not
make any identifier changes, but that would be hard to do using the new standard
(which changes classifications of a few characters).

There would be more changes if we adopted the identifier recommendations, but as
I've said there is no way I would recommend that for Ada 2012 -- it's just too
much change at too late a date. The other changes are pretty minimal, however.

Since Ada 2012 is not going to be frozen until after the upcoming ARG meeting,
we can discuss this during the next meeting. And it seems pretty obvious that we
ought to.

****************************************************************

From: Robert Dewar
Sent: Saturday, October 15, 2011  6:35 PM

> Currently, we're talking specifically about package
> Wide_Wide_Characters.Handling, which is new in Ada 2012. There is no
> possibility of creating a "non-upwards compatible" change in a new package!
> I realize that GNAT has an implementation-defined equivalent, but no
> one is going to accidentally change from the GNAT-only package to a
> language-defined one without intending it.

OK, that's fair enough

> Since Ada 2012 is not going to be frozen until after the upcoming ARG
> meeting, we can discuss this during the next meeting. And it seems
> pretty obvious that we ought to.

Also fair enough. I don't actually think it makes too much difference if we
choose 2a or 4a; I don't think it will make any difference to anyone
(programmers, or implementors, or reviewers or anyone else :-)). So if it really
gives people a warmer feeling to have the standard say 4a, then no
problem as far as I am concerned.

****************************************************************

