!standard 1.1.4(14.2/2) 11-12-19 AI05-0266-1/02
!standard 1.2(7/2)
!standard 1.2(8/2)
!standard 1.2(9/2)
!standard 2.1(1/2)
!standard 2.1(3.1/2)
!standard 2.1(4/2)
!standard 2.1(4.1/2)
!standard 2.1(15/2)
!standard 2.1(16/2)
!standard 2.3(5/2)
!standard 3.5.2(2/2)
!standard A.1(36.1/2)
!standard A.1(36.2/2)
!standard A.3.5(0)
!class Amendment 11-11-01
!status Amendment 2012 11-12-19
!status ARG Approved 7-0-2 11-11-11
!status work item 11-11-01
!status received 11-09-29
!priority Low
!difficulty Easy
!subject Use the latest version of ISO/IEC 10646

!summary

(1) Ada 2012 should reference the most recent version of other standards: (A) the 2011 version of character sets (10646); (B) the 2011 version of C; (C) the 2011 version of C++.

(2) An implementation permission is added to use any character set standard, so long as it is at least as new as the 2003 edition of 10646.

(3) Ada.Wide_Characters.Handling (and Ada.Wide_Wide_Characters.Handling) have a new function that reports the character set standard. We also add a note that the results of the functions depend on the character set standard used.

(4) Ada.Wide_Characters.Handling (and Ada.Wide_Wide_Characters.Handling) are Pure.

!proposal

In March of 2011, a new version of the character set standard, ISO/IEC 10646:2011, was issued. Ada 2012 should use the most recent version of the character set standard (as it does with other standards). There are also 2011 revisions of C and of C++, which should be used as well.

However, switching to a newer standard probably would introduce some incompatibilities for identifiers that include unusual characters. Moreover, it is more important that Ada compilers support the character sets of the host and targets rather than any abstract standard. Finally, we don't want to make implementations wait until 2020 to support new characters (especially if those characters are important to some customer).
So we propose that the character set standard actually used be implementation-defined, subject only to the requirement that it is 10646:2003 or later.

The runtime behavior of Ada.Wide_Characters.Handling will depend on the exact character set used. We suggest adding a function to the package so that it can report what standard it uses. Programs that require particular behavior ought to check that the standard used is the one expected.

Ada.Wide_Characters.Handling has no categorization pragma. This package should be Pure (like Ada.Characters.Handling).

!wording

Replace 1.2(7/2) with:

    ISO/IEC 9899:2011, Information Technology - Programming languages - C

Delete 1.2(7.a/2). [The name is now in the usual form.]

In 1.2(8/2), change "2003" to "2011".

Replace 1.2(9/2) with:

    ISO/IEC 14882:2011, Information Technology - Programming languages - C++

Delete 1.2(9.a/2). [The name is now in the usual form.]

    Implementation Permission

    The categories defined above, as well as case mapping and folding, may be based on an implementation-defined version of ISO/IEC 10646 (2003 edition or later).

    AARM Ramification: The exact categories, case mapping, and case folding chosen affect identifiers, the result of '[[Wide_]Wide_]Image, and packages Wide_Characters.Handling and Wide_Wide_Characters.Handling.

Add after A.3.5(4/3):

    pragma Pure(Handling);

    function Character_Set_Version return String;

Add after A.3.5(23/3):

    function Character_Set_Version return String;

    Returns an implementation-defined identifier that identifies the version of the character set standard that is used for categorizing characters by the implementation.

Add at the end of A.3.5:

    Implementation Advice

    The string returned by Character_Set_Version should include either “10646:” or “Unicode”.

    Note: The results returned by these functions may depend on which particular version of the 10646 standard is supported by the implementation (see 2.1).

Change 10646:2003 to 10646:2011 wherever it appears.
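As an illustration only (not part of the proposed wording), the following sketch shows how a program might consult Character_Set_Version before relying on version-sensitive classification or case conversion. The procedure name Check_Character_Set, the helper Contains, and the value of Expected_Version are hypothetical; the string actually returned is implementation-defined, so the expected value would have to be taken from the implementation's documentation.

```ada
with Ada.Strings.Fixed;
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Check_Character_Set is

   --  Hypothetical value: the version string this program was validated
   --  against. The real string is implementation-defined.
   Expected_Version : constant String := "Unicode 6.0";

   --  Version the implementation reports for categorizing characters.
   Actual_Version : constant String :=
     Ada.Wide_Characters.Handling.Character_Set_Version;

   --  Illustrative helper: True if Pattern occurs anywhere in Source.
   function Contains (Source, Pattern : String) return Boolean is
     (Ada.Strings.Fixed.Index (Source, Pattern) > 0);

begin
   --  The Implementation Advice says the string should include either
   --  "10646:" or "Unicode"; report the case where it includes neither.
   if not (Contains (Actual_Version, "10646:")
           or else Contains (Actual_Version, "Unicode"))
   then
      Ada.Text_IO.Put_Line
        ("Unrecognized version string: " & Actual_Version);
   end if;

   if Actual_Version = Expected_Version then
      Ada.Text_IO.Put_Line ("Character set version is as expected.");
   else
      --  Placeholder for a defensive measure; a real program might
      --  restrict itself to characters whose classification is stable.
      Ada.Text_IO.Put_Line
        ("Warning: expected " & Expected_Version
         & ", found " & Actual_Version);
   end if;
end Check_Character_Set;
```

An exact string comparison is the strictest possible check. A program that only needs a minimum edition would have to parse the string instead, and since its format is implementation-defined, any such parsing is tied to one implementation.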
!discussion

A program that cannot tolerate changes in the behavior of the classification or case conversion functions of Ada.Wide_Characters.Handling should check the result of the Character_Set_Version function before proceeding. If it differs from the expected value, the program should take defensive measures.

Note that any Ada program can count on the support of the characters defined in 10646:2003 except for the few characters whose classifications are changed in later standards. For commonly used character sets, like Greek and Cyrillic, the character set chosen by the implementation should not matter.

----

10646:2011 adds an Annex U about identifier syntax. But all it says is to go read the Unicode documents! We need to reconsider exactly which characters are allowed in identifiers in order to meet this standard, but we'll do that in a separate AI (as this topic is not as clear-cut as the others here, and we already have such an AI to deal with another, related problem).

Using 10646:2011 changes the details of "Simple Locale-independent Case Folding", "Simple Uppercase Mapping", and "Simple Lowercase Mapping", used by various rules in the standard. The former is defined to be "stable" (always changed compatibly), so there should not be any incompatibilities or inconsistencies caused by its change. Changes to "Simple Uppercase Mapping" might change the 'Image of identifiers containing obscure characters, and could make an enumeration type containing such obscure characters illegal -- but as the changes are all in unusual characters, this is unlikely to be a problem in practice. (Following the Ada 2012 rules exactly is likely to have the same level of incompatibility.)

Using 10646:2011 will categorize more characters as letters, so that they would be allowed in identifiers.
But we will consider adopting the Unicode 6.0 recommendations (as referenced in 10646:2011) for identifiers, which would also subtly change the characters allowed, with an early Binding Interpretation. So we will not consider any such effects here.

!corrigendum 1.1.4(14.2/2)

@drepl
When this International Standard mentions the conversion of some character or sequence of characters to upper case, it means the character or sequence of characters obtained by using locale-independent full case folding, as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2003.
@dby
When this International Standard mentions the conversion of some character or sequence of characters to upper case, it means the character or sequence of characters obtained by using simple upper case mapping, as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2011.

!corrigendum 1.2(7/2)

@drepl
ISO/IEC 9899:1999, @i, supplemented by Technical Corrigendum 1:2001 and Technical Corrigendum 2:2004.
@dby
ISO/IEC 9899:2011, @i.

!corrigendum 1.2(8/2)

@drepl
ISO/IEC 10646:2003, @i.
@dby
ISO/IEC 10646:2011, @i.

!corrigendum 1.2(9/2)

@drepl
ISO/IEC 14882:2003, @i.
@dby
ISO/IEC 14882:2011, @i.

!corrigendum 2.1(1/2)

@drepl
The character repertoire for the text of an Ada program consists of the entire coding space described by the ISO/IEC 10646:2003 Universal Multiple-Octet Coded Character Set. This coding space is organized in @i, each plane comprising 65536 characters.
@dby
The character repertoire for the text of an Ada program consists of the entire coding space described by the ISO/IEC 10646:2011 Universal Multiple-Octet Coded Character Set. This coding space is organized in @i, each plane comprising 65536 characters.

!corrigendum 2.1(3.1/2)

@drepl
A @fa is defined by this International Standard for each cell in the coding space described by ISO/IEC 10646:2003, regardless of whether or not ISO/IEC 10646:2003 allocates a character to that cell.
@dby
A @fa is defined by this International Standard for each cell in the coding space described by ISO/IEC 10646:2011, regardless of whether or not ISO/IEC 10646:2011 allocates a character to that cell.

!corrigendum 2.1(4/2)

@drepl
The coded representation for characters is implementation defined (it need not be a representation defined within ISO/IEC 10646:2003). A character whose relative code position in its plane is 16#FFFE# or 16#FFFF# is not allowed anywhere in the text of a program.
@dby
The coded representation for characters is implementation defined (it need not be a representation defined within ISO/IEC 10646:2011). A character whose relative code point in its plane is 16#FFFE# or 16#FFFF# is not allowed anywhere in the text of a program. The only characters allowed outside of comments are those in categories @fa, @fa, and @fa.

!corrigendum 2.1(4.1/2)

@drepl
The semantics of an Ada program whose text is not in Normalization Form KC (as defined by section 24 of ISO/IEC 10646:2003) is implementation defined.
@dby
The semantics of an Ada program whose text is not in Normalization Form KC (as defined by section 21 of ISO/IEC 10646:2011) is implementation defined.

!corrigendum 2.1(5/2)

@drepl
The description of the language definition in this International Standard uses the character properties General Category, Simple Uppercase Mapping, Uppercase Mapping, and Special Case Condition of the documents referenced by the note in section 1 of ISO/IEC 10646:2003. The actual set of graphic symbols used by an implementation for the visual representation of the text of an Ada program is not specified.
@dby
The description of the language definition in this International Standard uses the character properties General Category, Simple Uppercase Mapping, Uppercase Mapping, and Special Case Condition of the documents referenced by the note in section 1 of ISO/IEC 10646:2011.
The actual set of graphic symbols used by an implementation for the visual representation of the text of an Ada program is not specified.

!corrigendum 2.1(15/2)

@drepl
The following names are used when referring to certain characters (the first name is that given in ISO/IEC 10646:2003)
@dby
The following names are used when referring to certain characters (the first name is that given in ISO/IEC 10646:2011)

!corrigendum 2.1(16/2)

@drepl
In a nonstandard mode, the implementation may support a different character repertoire; in particular, the set of characters that are considered @fas can be extended or changed to conform to local conventions.
@dby
The categories defined above, as well as case mapping and folding, may be based on an implementation-defined version of ISO/IEC 10646 (2003 edition or later).

!corrigendum 2.3(5/2)

@drepl
Two @fas are considered the same if they consist of the same sequence of characters after applying the following transformations (in this order):
@dby
Two @fas are considered the same if they consist of the same sequence of characters after applying locale-independent simple case folding, as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2011.

!corrigendum 3.5.2(2/2)

@drepl
The predefined type Character is a character type whose values correspond to the 256 code positions of Row 00 (also known as Latin-1) of the ISO/IEC 10646:2003 Basic Multilingual Plane (BMP). Each of the graphic characters of Row 00 of the BMP has a corresponding @fa in Character. Each of the nongraphic positions of Row 00 (0000-001F and 007F-009F) has a corresponding language-defined name, which is not usable as an enumeration literal, but which is usable with the attributes Image, Wide_Image, Wide_Wide_Image, Value, Wide_Value, and Wide_Wide_Value; these names are given in the definition of type Character in A.1, "The Package Standard", but are set in @i.
@dby
The predefined type Character is a character type whose values correspond to the 256 code points of Row 00 (also known as Latin-1) of the ISO/IEC 10646:2011 Basic Multilingual Plane (BMP). Each of the graphic characters of Row 00 of the BMP has a corresponding @fa in Character. Each of the nongraphic characters of Row 00 has a corresponding language-defined name, which is not usable as an enumeration literal, but which is usable with the attributes Image, Wide_Image, Wide_Wide_Image, Value, Wide_Value, and Wide_Wide_Value; these names are given in the definition of type Character in A.1, "The Package Standard", but are set in @i.

!corrigendum A.1(36.1/2)

@drepl
@xcode< --@ft<@i< The declaration of type Wide_Character is based on the standard ISO/IEC 10646:2003 BMP character>>
 --@ft<@i< set. The first 256 positions have the same contents as type Character. See 3.5.2.>>
 @b Wide_Character @b (@i, @i ... @i, @i);>
@dby
@xcode< --@ft<@i< The declaration of type Wide_Character is based on the standard ISO/IEC 10646:2011 BMP character>>
 --@ft<@i< set. The first 256 positions have the same contents as type Character. See 3.5.2.>>
 @b Wide_Character @b (@i, @i ... @i, @i);>

!corrigendum A.1(36.2/2)

@drepl
@xcode< --@ft<@i< The declaration of type Wide_Wide_Character is based on the full>>
 --@ft<@i< ISO/IEC 10646:2003 character set. The first 65536 positions have the>>
 --@ft<@i< same contents as type Wide_Character. See 3.5.2.>>
 @b Wide_Wide_Character @b (@i, @i ... @i, @i);
 @b Wide_Wide_Character'Size @b 32;>
@dby
@xcode< --@ft<@i< The declaration of type Wide_Wide_Character is based on the full>>
 --@ft<@i< ISO/IEC 10646:2011 character set. The first 65536 positions have the>>
 --@ft<@i< same contents as type Wide_Character. See 3.5.2.>>
 @b Wide_Wide_Character @b (@i, @i ... @i, @i);
 @b Wide_Wide_Character'Size @b 32;>

!corrigendum A.3.5(0)

@dinsc
Force a conflict; the real text is found in the conflict file.
!ACATS test No separate ACATS test is needed (since the exact character set supported is implementation-defined). Any tests involving identifiers should be postponed until the AI on identifiers is decided. !ASIS No ASIS effect. !appendix From: Randy Brukardt Sent: Thursday, September 29, 2011 11:20 PM While researching a question from Erhard's review, I happened to notice that a new edition of ISO 10646, the character set standard, was issued this year. (It's dated March 15th.) Ada 2005 relied on ISO 10646:2003, which corresponds to Unicode 4.0. ISO 10646:2011 corresponds to Unicode 6.0 - which has nearly 2100 additional characters over Unicode 5.2. (No idea how many have been added compared to 10646:2003, but it would seem to be a lot.) Which set is used would affect the exact characters used in identifiers (many of the new characters could be used in identifiers, and there are a few characters which have been reclassified such that they would not be usable in identifiers). It also would affect the results from the new packages Ada.Wide_Characters.Handling and Ada.Wide_Wide_Characters.Handling. Presumably (although I haven't checked this), there would be changes in case mapping as well. Generally, Ada has relied on the most recent version of other standards. If we follow this, we should change to using 10646:2011. But note that doing so would present a (very mild) incompatibility, in that there would exist identifiers legal in Ada 2005 that would not be legal in Ada 2012. Given the fact that the identifier rules in Ada 2005 were very screwed up, I suspect that this would be unnoticable outside of the incompatibility documentation in the Standard. Should we make this change? Let's discuss this a bit, and then I'll send out a Letter Ballot to get a definitive answer. 
**************************************************************** From: Jean-Pierre Rosen Sent: Thursday, September 29, 2011 11:36 PM > Generally, Ada has relied on the most recent version of other > standards. If we follow this, we should change to using 10646:2011. > But note that doing so would present a (very mild) incompatibility, in > that there would exist identifiers legal in Ada 2005 that would not be legal > in Ada 2012. Anybody who used those letters in identifiers will get the trouble they deserve. Even in French, I always advise against using accented letters - which are pretty stable. > Given the > fact that the identifier rules in Ada 2005 were very screwed up, I > suspect that this would be unnoticable outside of the incompatibility > documentation in the Standard. Not doing the change would even involve the risk of ISO frowning at us - and corresponding delay in the standard. **************************************************************** From: Randy Brukardt Sent: Thursday, September 29, 2011 11:59 PM > ISO 10646:2011 corresponds to Unicode 6.0 - which has nearly 2100 > additional characters over Unicode 5.2. (No idea how many have been > added compared to 10646:2003, but it would seem to be a lot.) Looking over this new standard, some things jump out at me: (1) There seem to be a lot more references to Unicode. Apparently, the aversion to that has worn off somewhat. (2) There is now an Annex (U) discussing identifiers. But all it says is to go read the Unicode document on the subject (giving a link)! (3) Aside: I did just skim the Unicode document on identifiers. They've added some additional character properties specifically for identifiers. These are supposedly stable, in that newer Unicode versions will never take characters out of these categories. These would probably be better to base Ada on, however this would allow quite a few additional characters in identifiers (and would require more rewriting of the Standard). 
But the win is that it would avoid future incompatibilities. (One also could imagine adding functions to Wide_Character.Handling to return these properties, thus giving a decent way to process identifiers using those libraries.) The document also suggests a different algorithm for applying normalization than Ada 2005 does (probably because the Unicode document has changed a lot) -- we have an upcoming Ada 2012 BI on that issue [based on a question posted on Ada-Comment]. Probably should leave the question of changing the characters allowed until that BI. (4) Annex C and D (referred to in our A.4.11) have been folded into the normative standard (although placeholders remain). (5) Still no real information about case mapping or the like. We still have to reference the "documents mentioned in the note of Section 1". **************************************************************** From: Robert Dewar Sent: Friday, September 30, 2011 3:59 AM > Generally, Ada has relied on the most recent version of other > standards. If we follow this, we should change to using 10646:2011. > But note that doing so would present a (very mild) incompatibility, in > that there would exist identifiers legal in Ada 2005 that would not be > legal in Ada 2012. Given the fact that the identifier rules in Ada > 2005 were very screwed up, I suspect that this would be unnoticable > outside of the incompatibility documentation in the Standard. > > Should we make this change? Let's discuss this a bit, and then I'll > send out a Letter Ballot to get a definitive answer. My first reaction was why not, go ahead with the change, no one uses this stuff anyway. Then I got to thinking that this will require several days work to research what has changed, rerun the utilities to generate tables, rebuild the units using these tables etc etc etc, all 100% totally useless work solely for the sake of a reference that no one cares about. Still I suppose we should make the change. 
Probably the best thing is to make the change quietly, and then I don't think GNAT will even bother to do anything about it till someone complains, which will be never. **************************************************************** From: Robert Dewar Sent: Friday, September 30, 2011 4:01 AM > (5) Still no real information about case mapping or the like. We still > have to reference the "documented mentioned in the note of Section 1". I regard case wrapping for extended characters as an abomination. It is not possible to do it "right" in a locale independent way, and doing it at all is a huge mistake. **************************************************************** From: Tucker Taft Sent: Friday, September 30, 2011 11:07 AM > While researching a question from Erhard's review, I happened to > notice that a new edition of ISO 10646, the character set standard, > was issued this year. (It's dated March 15th.) I would say go for the latest. Better now than later, especially if there are already Ada 2012 changes in this area. **************************************************************** From: Randy Brukardt Sent: Tuesday, October 11, 2011 3:43 PM As previous noted, we need to decide whether to change to the latest version of the character set standard. For most purposes, this is not a problem, but there is an incompatibility as some Ada 2005 identifiers would not be legal in Ada 2012 -- these would use *very* obscure characters. (But given that the rules for identifiers are very screwed up in Ada 2005, this incompatibility is much smaller than the potential one caused by applying the BI on identifiers). Also note that this will have an effect on the results from the functions in Wide_Character.Handling for obscure characters. Following is a Letter Ballot on this topic; please respond ASAP (but no later than Monday, October 17th): The character set standard used in Ada 2012 should be: _____ ISO/IEC 10646:2003 (that is, no change - corresponds to Unicode 4.0). 
_____ ISO/IEC 10646:2011 (that is, the current standard - corresponds to Unicode 6.0). If choosing this option, please select from one of the following: _____ Keep the identifier rules as currently defined, with no plans to change them. _____ Keep the identifier rules as currently defined, but plan to issue a BI in the future [if this is appropriate after study] to change to use the recommended XId_Start and XId_Continue classes to define the characters that can be used. (These are defined to be stable, like case folding, but unlike the letter classes we currently use.) We'd probably also want to add functions matching these classifications to (Wide_)Wide_Characters.Handling so that identifier processing can be usefully written in Ada code (that's not possible now as the currently used classes aren't stable and thus will change from Ada version to Ada version). _____ Change the identifier rules now to use the Xid_Start and Xid_Contain classes. (Probably would delay the Standard - we'll need to consider the effect of potentially including non-letters in identifiers on 'Image, among other things.) **************************************************************** From: Randy Brukardt Sent: Tuesday, October 11, 2011 11:56 PM > Following is a Letter Ballot on this topic; please respond ASAP (but > no later than Monday, October 17th): > > > The character set standard used in Ada 2012 should be: > > _____ ISO/IEC 10646:2003 (that is, no change - corresponds to > Unicode 4.0). > > > __X___ ISO/IEC 10646:2011 (that is, the current standard - > corresponds to Unicode 6.0). > If choosing this option, please select from one > of the following: > > _____ Keep the identifier rules as currently defined, > with no plans to change them. > > ___Y__ Keep the identifier rules as currently defined, > but plan to issue a BI in the future [if this is appropriate after > study] to change to use the recommended XId_Start and XId_Continue > classes to define the characters that can be used. 
(These are defined > to be stable, like case folding, but unlike the letter classes we > currently use.) We'd probably also want to add functions matching > these classifications to (Wide_)Wide_Characters.Handling so that > identifier processing can be usefully written in Ada code (that's not > possible now as the currently used classes aren't stable and thus will > change from Ada version to Ada version). > > _____ Change the identifier rules now to use the > Xid_Start and Xid_Contain classes. (Probably would delay the Standard > - we'll need to consider the effect of potentially including > non-letters in identifiers on 'Image, among other things.) **************************************************************** From: Robert Dewar Sent: Wednesday, October 12, 2011 5:25 AM > Following is a Letter Ballot on this topic; please respond ASAP (but > no later than Monday, October 17th): > > > The character set standard used in Ada 2012 should be: > > __X___ ISO/IEC 10646:2003 (that is, no change - corresponds > to Unicode 4.0). just because I think this stuff is so little used (not sure it is used at all), and it is not worth doing major implementation work to make a change that will affect no one. OTOH, if we do the change, I don't think GNAT will bother to follow unless some real user complains, which will likely be never :-) **************************************************************** From: Tucker Taft Sent: Wednesday, October 12, 2011 7:27 AM I'll go with Randy's recommendation (see below). ... > _X____ ISO/IEC 10646:2011 (that is, the current standard - > corresponds to Unicode 6.0). > If choosing this option, please select from one > of the following: > > _____ Keep the identifier rules as currently defined, > with no plans to change them. > > __X___ Keep the identifier rules as currently defined, > but plan to issue a BI in the future [if this is > appropriate ... 
**************************************************************** From: Jean-Pierre Rosen Sent: Wednesday, October 12, 2011 7:41 AM > Following is a Letter Ballot on this topic; please respond ASAP (but > no later than Monday, October 17th): > > > The character set standard used in Ada 2012 should be: > > _____ ISO/IEC 10646:2003 (that is, no change - corresponds to > Unicode 4.0). > > > _X____ ISO/IEC 10646:2011 (that is, the current standard - > corresponds to Unicode 6.0). > If choosing this option, please select from one > of the following: > > _____ Keep the identifier rules as currently defined, > with no plans to change them. > > ___X__ Keep the identifier rules as currently defined, > but plan to issue a BI in the future [if this is appropriate after > study] to change to use the recommended XId_Start and XId_Continue > classes to define the characters that can be used. (These are defined > to be stable, like case folding, but unlike the letter classes we > currently use.) We'd probably also want to add functions matching > these classifications to (Wide_)Wide_Characters.Handling so that > identifier processing can be usefully written in Ada code (that's not > possible now as the currently used classes aren't stable and thus will change from Ada version to Ada version). > > _____ Change the identifier rules now to use the > Xid_Start and Xid_Contain classes. (Probably would delay the Standard > - we'll need to consider the effect of potentially including > non-letters in identifiers on 'Image, among other things.) **************************************************************** From: Bob Duff Sent: Wednesday, October 12, 2011 8:44 AM > As previous noted, we need to decide whether to change to the latest > version of the character set standard. For most purposes, this is not > a problem, but there is an incompatibility as some Ada 2005 > identifiers would not be legal in Ada 2012 -- these would use *very* > obscure characters. 
(But given that the rules for identifiers are very > screwed up in Ada 2005, this incompatibility is much smaller than the > potential one caused by applying the BI on identifiers). I don't understand how the rules are "very screwed up" (and I don't really want to -- I'm sure you've explained it before -- no need to do so again). But whatever the screwup is, I'm guessing implementations don't obey it, so when talking about [in]compatibility, we should be talking about what implementations actually do. Anyway, my vote is for: > __X__ ISO/IEC 10646:2003 (that is, no change - corresponds to > Unicode 4.0). because any change is going to require a lot of not-very-useful work for implementations. **************************************************************** From: Robert Dewar Sent: Wednesday, October 12, 2011 8:51 AM > But whatever the screwup is, I'm guessing implementations don't obey > it, so when talking about [in]compatibility, we should be talking > about what implementations actually do. GNAT follows exactly the 2005 rules. I don't really agree they are "very screwed up", but I know of no discrepancies between the 2005 standard and what GNAT does. The issue is the categories and the way they are used. > Anyway, my vote is for: > >> __X__ ISO/IEC 10646:2003 (that is, no change - corresponds >> to Unicode 4.0). > > because any change is going to require a lot of not-very-useful work > for implementations. Well, pretend to require, you can't really require implementations to do anything :-) **************************************************************** From: Bob Duff Sent: Wednesday, October 12, 2011 9:12 AM > Well, pretend to require, you can't really require implementations to > do anything :-) Very good point! We (language designers) have a tendency to forget that. 
**************************************************************** From: Robert Dewar Sent: Wednesday, October 12, 2011 9:23 AM And I think after some debacles (like the leap second nonsense) implementors are less likely to automatically jump to implement everything :-) **************************************************************** From: Erhard Ploedereder Sent: Wednesday, October 12, 2011 12:07 PM > Following is a Letter Ballot on this topic; please respond ASAP (but > no later than Monday, October 17th): I'll abstain on this ballot out of sheer ignorance of the issues. **************************************************************** From: Tullio Vardanega Sent: Wednesday, October 12, 2011 1:04 PM So do I. >> Following is a Letter Ballot on this topic; please respond ASAP (but >> no later than Monday, October 17th): > I'll abstain on this ballot out of sheer ignorance of the issues. **************************************************************** From: Randy Brukardt Sent: Wednesday, October 12, 2011 7:10 PM > > But whatever the screwup is, I'm guessing implementations don't obey > > it, so when talking about [in]compatibility, we should be talking > > about what implementations actually do. > > GNAT follows exactly the 2005 rules. I very highly doubt this. > I don't really agree > they are "very screwed up", but I know of no discrepancies between the > 2005 standard and what GNAT does. The issue is the categories and the > way they are used. The categories are only the tip of the iceberg. Does GNAT: (1) allow "other-format" characters (like soft hyphens) in identifiers? Original Ada 2005 did (later repealed for Ada 2012). (2) use full case folding for identifier equivalence checks? That means that "aá" is the same as "ass" and "ASS". (Also now changed for Ada 2012 for compatibility reasons with Ada 95, but it was fully intended to be the case for Ada 2005.) 
(3) Returns a full case folded string from 'Image (as specified in the Ada 2005 standard), even when this would change the length and typically put the string into lower case? (This was just a bug in Ada 2005, but there is no easy fix and extensive changes were needed.) My understanding from previous discussions is that GNAT does none of these. That's probably a good thing [(3) a clearly a case of Robert's rule of the standard saying something silly; (1) was repealed a long time ago; and (2) caused an unintentional incompatibility], but it surely is not the same as "follows exactly the Ada 2005 rules". It's much closer to "following the Ada 2005 as we wish they would be". ;-) Back to the topic: the only major change from using 10646:2011 instead of 10646:2003 would be that a few obscure characters would change category, and presumably the equivalence ("case folding") and case conversion tables also have some changes in obscure cases. Any other changes to identifiers would need to be discussed in the future because we need to consider all of the impacts (and we already have an open soon-to-be AI on "normalization", which probably will demand more changes to the rules anyway). It should be noted that the "official" Ada rules have been coming closer to what you want, but that the ARG remains committed to following the Unicode recommendations as closely as makes sense for Ada. That almost certainly means that are going to be some rules that you don't like. I've said before, and I'll be happy to say again that I don't think characters outside of Latin-1 should be allowed in identifiers, period, but we did not feel that we had a choice in this matter given the directives on internationalization of programming languages. As such, we have to make the best fit of those recommendations with Ada. **************************************************************** From: Robert Dewar Sent: Wednesday, October 12, 2011 8:56 PM > The categories are only the tip of the iceberg. 
> Does GNAT:
>
> (1) allow "other-format" characters (like soft hyphens) in identifiers?
> Original Ada 2005 did (later repealed for Ada 2012).

Yes, then changed.

> (2) use full case folding for identifier equivalence checks? That
> means that "aß" is the same as "ass" and "ASS". (Also now changed for
> Ada 2012 for compatibility reasons with Ada 95, but it was fully
> intended to be the case for Ada 2005.)

No, but I always thought this was an absurd misreading of Ada 2005; no informed person can have intended that reading.

> (3) return a full case folded string from 'Image (as specified in the Ada
> 2005 standard), even when this would change the length and typically
> put the string into lower case? (This was just a bug in Ada 2005, but
> there is no easy fix and extensive changes were needed.)

It can't change the length.

> My understanding from previous discussions is that GNAT does none of these.
> That's probably a good thing [(3) is clearly a case of Robert's rule of
> the standard saying something silly; (1) was repealed a long time ago;
> and (2) caused an unintentional incompatibility], but it surely is not
> the same as "follows exactly the Ada 2005 rules". It's much closer to
> "following the Ada 2005 rules as we wish they would be". ;-)

The rules were badly written, and have to be interpreted with lavish use of Robert's rule.

> It should be noted that the "official" Ada rules have been coming
> closer to what you want, but that the ARG remains committed to
> following the Unicode recommendations as closely as makes sense for
> Ada. That almost certainly means that there are going to be some rules that you
> don't like.
>
> I've said before, and I'll be happy to say again that I don't think
> characters outside of Latin-1 should be allowed in identifiers,
> period, but we did not feel that we had a choice in this matter given
> the directives on internationalization of programming languages. As
> such, we have to make the best fit of those recommendations with Ada.

Fine, but why bother with changing them then, if all you are doing is meeting directives rather than doing something useful? I don't see that there were any directives mandating case folding, which remains a plain error in thinking.

****************************************************************

From: Brad Moore
Sent: Wednesday, October 12, 2011 9:42 PM

> > Well, pretend to require, you can't really require
> > implementations to do anything :-)
>
> Very good point! We (language designers) have a tendency to
> forget that.

If that's the case then would it not be better to at least have the RM mention the more up to date standard, so that implementations can go with that version if they have the time and energy to implement it? If instead they find nobody cares or notices that the newer standard isn't implemented, then leaving their implementation as is isn't a problem either, is it?

I'm trying to decide how to respond to the ballot. My feeling is that Randy's response is the best response, but I also am sympathetic to implementation burden, if real users aren't likely to notice one way or the other.

****************************************************************

From: John Barnes
Sent: Thursday, October 13, 2011 7:53 AM

I agree with Erhard. I am going to abstain as well.

> I'll abstain on this ballot out of sheer ignorance of the issues.

****************************************************************

From: Gary Dismukes
Sent: Thursday, October 13, 2011 1:34 PM

> I agree with Erhard. I am going to abstain as well.

Count me in the list of abstainers. I don't understand the issues well enough. (If forced to vote I'd go for no change.)

****************************************************************

From: Steve Baird
Sent: Thursday, October 13, 2011 1:57 PM

>> I agree with Erhard. I am going to abstain as well.
>
> Count me in the list of abstainers. I don't understand the issues
> well enough. (If forced to vote I'd go for no change.)
>

Ditto.

****************************************************************

From: Ed Schonberg
Sent: Thursday, October 13, 2011 2:19 PM

I abstain as well, and for the same reasons.

****************************************************************

From: Tucker Taft
Sent: Thursday, October 13, 2011 2:31 PM

You guys are a bunch of wimps... ;-)

****************************************************************

From: Jean-Pierre Rosen
Sent: Thursday, October 13, 2011 2:40 PM

Well, let me comment why I didn't abstain.

If we believe in standards, and if we believe that the guys who design 10646 know better than us, we have to follow. The only freedom we have is in trying to do so in a manner that is not too disruptive.

****************************************************************

From: Robert Dewar
Sent: Thursday, October 13, 2011 4:00 PM

perhaps we have to follow, but not to race, we have not had enough time to study this change, let's leave it for Ada 2020, and perhaps issue a BI that allows implementations to change before then, just as we did for 8-bit characters.

****************************************************************

From: Randy Brukardt
Sent: Thursday, October 13, 2011 4:26 PM

The only problem with that is that it would change the run-time behavior of [Wide_]Wide_Characters.Handling (since a few characters change classifications). It seems like a bad idea to have different implementations having different interpretations of the correct behavior of these functions.

OTOH, the identifier syntax changes definitely need study before we adopt them (or not), no one can reasonably implement what the Ada 2005 actually says, so implementations will inevitably differ subtly on this in any case, and there seems to be little evidence that programmers are using this, so deferring the change there is better.

ISO 10646:2011 has an Annex (annex U) that specifically says that identifiers in programming languages should follow the Unicode recommendations (giving a link, not including them). But there is a lot of wiggle room in those Unicode recommendations.

One alternative to "fix" the run-time issue with [Wide_]Wide_Characters.Handling would be to say that it is implementation-defined exactly which character set standard it depends on. Or some other statement that programmers should expect there will be changes in character classifications, case conversions, and the like in future standards, so we can ignore the "compatibility" issue in the future. (After all, for most applications, it won't make any difference, or it would be *better* for the package to use the most recent character set standard - or at least one that applies to the target system; tying it for all time to any particular character set standard [which we know is going to change] is rather silly.)

****************************************************************

From: Robert Dewar
Sent: Thursday, October 13, 2011 4:42 PM

> The only problem with that is that it would change the run-time
> behavior of [Wide_]Wide_Characters.Handling (since a few characters
> change classifications). It seems like a bad idea to have different
> implementations having different interpretations of the correct behavior of
> these functions.

This is more of a theoretical concern than an actual one. And changing the standard is not going to have any immediate effect on GNAT in the immediate future anyway (we have already frozen the feature set for the 2012 releases of GNAT).

And of course your suggestion leads to different implementations having even more different interpretations (how many other Ada 2012 compilers do you expect to see in the near future?), since it is much more likely that the two different implementations involved will be an Ada 2005 one and an Ada 2012 one.

Furthermore, we did a much bigger incompatible change with 7 to 8-bit characters and it caused very little trouble.

> One alternative to "fix" the run-time issue with
> [Wide_]Wide_Characters.Handling would be to say that it is
> implementation-defined exactly which character set standard it depends on.

What on earth would that achieve?

> Or some other statement that programmers should expect there will be
> changes in character classifications, case conversions, and the like
> in future standards, so we can ignore the "compatibility" issue in the
> future. (After all, for most applications, it won't make any
> difference, or it would be
> *better* for the package to use the most recent character set standard
> - or at least one that applies to the target system; tying it for all
> time to any particular character set standard [which we know is going
> to change] is rather silly.)

That's merely a formalistic argument, no programmer will change their behavior on the basis of such a statement in the RM.

****************************************************************

From: Randy Brukardt
Sent: Thursday, October 13, 2011 7:28 PM

...
> > The only problem with that is that it would change the run-time
> > behavior of [Wide_]Wide_Characters.Handling (since a few characters
> > change classifications). It seems like a bad idea to have different
> > implementations having different interpretations of the correct
> > behavior of these functions.
>
> This is more of a theoretical concern than an actual one. And changing
> the standard is not going to have any immediate effect on GNAT in the
> immediate future anyway (we have already frozen the feature set for
> the 2012 releases of GNAT).

It's not that theoretical: Ada.Wide_Wide_Characters.Handling is easy to implement and probably will be supported by a number of Ada compilers in the near future. And it has nothing to do with "features": the package exists in any case, the question is exactly what it should return.

> And of course your suggestion leads to different implementations
> having even more different interpretations (how many other Ada 2012
> compilers do you expect to see in the near future?), since it is much
> more likely that the two different implementations involved will be an
> Ada 2005 one and an Ada 2012 one.

No Ada 2005 implementation has Ada.Wide_Wide_Characters.Handling -- it's an Ada 2012 package. If it does have it, it's formally an implementation-defined package and thus irrelevant.

Let me say again, I am *not* talking in any way about identifiers or their syntax. They have absolutely nothing to do with the package Ada.Wide_Wide_Characters.Handling.

> Furthermore, we did a much bigger incompatible change with 7 to 8-bit
> characters and it caused very little trouble.

I don't see how this has anything whatsoever to do with the case in point.

...
> > Or some other statement that programmers should expect there will be
> > changes in character classifications, case conversions, and the like
> > in future standards, so we can ignore the "compatibility" issue in
> > the future. (After all, for most applications, it won't make any
> > difference, or it would be
> > *better* for the package to use the most recent character set
> > standard
> > - or at least one that applies to the target system; tying it for
> > all time to any particular character set standard [which we know is
> > going to change] is rather silly.)
>
> That's merely a formalistic argument, no programmer will change their
> behavior on the basis of such a statement in the RM.

Probably not, and that's OK -- the primary thing is to warn programmers that the behavior of these functions on currently undefined code points is likely to change in future versions of Ada. As with any case of a "bounded error", neither the compiler implementer nor programmers are likely to pay much attention to the rule -- but at least they were warned in print.

Anyway, let me ask you specifically what you think this package should do for new/changed characters. I'm specifically talking about the behavior of functions in Wide_Wide_Characters.Handling like Is_Letter and Is_Upper when they are passed a character with a code position corresponding to a new character defined in 10646:2011 (or some later version):

(1) Wide_Wide_Characters.Handling returns values based on 10646:2003 forever. Very compatible, but also very out of date in the future (Ada 2012 is expected to last until 2020, at which point 10646:2003 will be 17 years old and probably will have been replaced at least one more time).

(2) Wide_Wide_Characters.Handling returns values based on 10646:2003 for Ada 2012, updated to use some newer standard down the road. Updating to use some newer standard will be run-time incompatible - a few characters that are letters in 2003 are not letters in 2011.

(2a) Do the above, but indicate to users of the package that the results may change in the future as character sets evolve.

(3) Wide_Wide_Characters.Handling returns values based on 10646:2011 forever. Also very compatible, but will also get out of date.

(4) Wide_Wide_Characters.Handling returns values based on 10646:2003 for Ada 2012, updated to use some newer standard down the road. Similar to (2) above.

(4a) Do the above, and also something similar to (2a).

(5) Wide_Wide_Characters.Handling returns values based on an implementation-defined character set standard. Lets Robert do whatever he wants. :-)

We have to make *some* choice of these options: users need to know what they can count on, we need to know how far ACATS and implementer internal tests can go, etc. Ignoring the question results in (1) or (5), depending on who's doing the interpreting.

My personal preference is (4a), followed by (2a). But I think we need some statement in the Standard so down the road we do not feel compelled to keep exact run-time compatibility as we do for Ada.Characters.Handling.
Else Ada will be stuck sooner or later with an obsolete character set standard.

I agree with you that it's too late now to adopt the 10646:2011 identifier recommendations, but that is a very separate issue from the one of run-time character classifications. I'm primarily interested in the latter now.

****************************************************************

From: Robert Dewar
Sent: Thursday, October 13, 2011 9:03 PM

> Let me say again, I am *not* talking in any way about identifiers or
> their syntax. They have absolutely nothing to do with the package
> Ada.Wide_Wide_Characters.Handling.

OK, got it, was confused

>> Furthermore, we did a much bigger incompatible change with 7 to 8-bit
>> characters and it caused very little trouble.
>
> I don't see how this has anything whatsoever to do with the case in point.

it was a case where we made a big change between versions of the standard.

> (4) Wide_Wide_Characters.Handling returns values based on
> 10646:2003 for Ada 2012, updated to use some newer standard down the
> road. Similar to (2) above.
>
> (4a) Do the above, and also something similar to (2a).

This (4a) is the one I would choose

> My personal preference is (4a), followed by (2a). But I think we need
> some statement in the Standard so down the road we do not feel
> compelled to keep exact run-time compatibility as we do for
> Ada.Characters.Handling. Else Ada will be stuck sooner or later with an
> obsolete character set standard.

Well I chose 4a before reading it was your first choice.

> I agree with you that it's too late now to adopt the 10646:2011
> identifier recommendations, but that is a very separate issue from the
> one of run-time character classifications. I'm primarily interested in the
> latter now.

So it looks like 4a might be viable as a consensus decision here?

****************************************************************

From: Randy Brukardt
Sent: Thursday, October 13, 2011 9:19 PM

...
> > (4) Wide_Wide_Characters.Handling returns values based on
> > 10646:2003 for Ada 2012, updated to use some newer standard down the
> > road. Similar to (2) above.
> >
> > (4a) Do the above, and also something similar to (2a).
>
> This (4a) is the one I would choose

Sorry, I botched this item, I put the wrong year on the Standard. As written, this is identical to (2) and (2a). I meant (4) to be the one that uses 10646:2011, (2) is the one that uses 10646:2003. I suspect that you meant (2a), but I'd like a clarification from you.

...
> So it looks like 4a might be viable as a consensus decision here?

Except that I screwed up the choices. Please consider (4) as using 10646:2011, and vote again.

****************************************************************

From: Robert Dewar
Sent: Friday, October 14, 2011 9:28 AM

>> So it looks like 4a might be viable as a consensus decision here?
>
> Except that I screwed up the choices. Please consider (4) as using
> 10646:2011, and vote again.

Now I am confused, can you send a new email with the newly updated choices clear, so I am not trying to create a virtual result from synchronizing old emails?

****************************************************************

From: Randy Brukardt
Sent: Friday, October 14, 2011 3:16 PM

Sorry about the confusion. I created (4) and (4a) with cut-and-paste and insufficiently updated them. Here is the complete list:

What is the behavior of functions in Wide_Wide_Characters.Handling like Is_Letter and Is_Upper when they are passed a character with a code position corresponding to a new character defined in 10646:2011 (or some later version):

(1) Wide_Wide_Characters.Handling returns values based on 10646:2003 forever. Very compatible, but also very out of date in the future (Ada 2012 is expected to last until 2020, at which point 10646:2003 will be 17 years old and probably will have been replaced at least one more time).

(2) Wide_Wide_Characters.Handling returns values based on 10646:2003 for Ada 2012, updated to use some newer standard down the road. Updating to use some newer standard will be run-time incompatible - a few characters that are letters in 2003 are not letters in 2011 (but these are unlikely corner cases, not the commonly used letters) and similarly for other classifications.

(2a) Do the above, but indicate to users of the package that the results may change in the future as character sets evolve.

(3) Wide_Wide_Characters.Handling returns values based on 10646:2011 forever. Also very compatible, but will also get out of date.

(4) Wide_Wide_Characters.Handling returns values based on 10646:2011 for Ada 2012, and will be updated to use newer standards down the road. Similar to (2) above, but using the 2011 character standard now. Future changes probably would be run-time incompatible, but most likely in unlikely corner cases.

(4a) Do the above, and also something similar to (2a).

(5) Wide_Wide_Characters.Handling returns values based on an implementation-defined character set standard. Lets Robert do whatever he wants. :-)

We have to make *some* choice of these options: users need to know what they can count on, we need to know how far ACATS and implementer internal tests can go, etc. Ignoring the question results in (1) or (5), depending on who's doing the interpreting.

My personal preference is (4a) [because I can't think of any good reason not to use the "current" classifications here - it is explicitly not necessarily the same as used for identifiers], followed by (2a). But I think we need some statement in the Standard so down the road we do not feel compelled to keep exact run-time compatibility as we do for Ada.Characters.Handling. Else Ada will be stuck sooner or later with an obsolete character set standard.

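
[Editor's note: Whichever option is chosen, a program that cares about a particular classification can at least report which character set standard its implementation uses. The following is a minimal sketch, assuming the Character_Set_Version function that this AI adds to Ada.Wide_Wide_Characters.Handling. The procedure name, the helper Mentions, and the sample code point are hypothetical and chosen only for illustration. The returned string is implementation-defined, so the sketch looks only for the "10646:" and "Unicode" substrings named in the Implementation Advice.]

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;
with Ada.Wide_Wide_Characters.Handling;

--  Hypothetical example procedure; not part of any standard unit.
procedure Show_Character_Set is
   package WWH renames Ada.Wide_Wide_Characters.Handling;

   --  Implementation-defined string; its exact contents are not assumed.
   Version : constant String := WWH.Character_Set_Version;

   --  U+00E9, used only as an example of a long-established letter.
   Sample : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#E9#);

   --  True if Pattern occurs anywhere in the version string.
   function Mentions (Pattern : String) return Boolean is
     (Ada.Strings.Fixed.Index (Version, Pattern) > 0);
begin
   Ada.Text_IO.Put_Line ("Character set standard: " & Version);

   --  The advice is only advice, so treat a miss as a warning.
   if not (Mentions ("10646:") or else Mentions ("Unicode")) then
      Ada.Text_IO.Put_Line
        ("Unrecognized version string; classifications may differ.");
   end if;

   Ada.Text_IO.Put_Line
     ("Is_Letter (U+00E9) = " & Boolean'Image (WWH.Is_Letter (Sample)));
end Show_Character_Set;
```

[A program needing one specific edition would compare Version against the exact string documented by its compiler vendor; no such string is assumed here.]
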
****************************************************************

From: Robert Dewar
Sent: Friday, October 14, 2011 3:21 PM

Right, Robert's vote is for 2a, which is basically status quo with an indication that updates may occur based on subsequent versions of the standard.

****************************************************************

From: Jean-Pierre Rosen
Sent: Saturday, October 15, 2011 12:39 AM

Since we restart from scratch, let me cast again my vote for 4a, precisely because I don't fully understand the issue.

Character set issues are very complex. I assume that the people at 10646 are very aware of compatibility issues, and that every non-upward compatible change is the result of a carefully evaluated trade-off. At some point, you have to trust the knowledge of other people.

What would you say, if someone from the 10646 committee came to our meeting and told us that we got the accessibility rules completely wrong ;-)?

****************************************************************

From: Robert Dewar
Sent: Saturday, October 15, 2011 7:22 AM

> Since we restart from scratch, let me cast again my vote for 4a,
> precisely because I don't fully understand the issue.

The reason I prefer 2a to 4a is that it will reflect reality. There is no way that anyone at AdaCore will do other than 2a in the short term in the absence of any customer demand. When there is customer demand to update to a new version, we can do so at that point, which would then be totally consistent with the idea of 2a.

To me it is just too late to be making the immediate change to 4a, when no one has investigated the implications or the impact of any upward incompatibility.

BTW, the assumption that the appropriate standards committee has properly considered the compatibility issues is dubious.
In practice standards committees often get more concerned with doing things right, than maintaining compatibility (*)

(*) look at what we did in Ada with limited returns for Ada 2005, which severely impacted the ability of many to move from Ada 95 to Ada 2005.

****************************************************************

From: Robert Dewar
Sent: Saturday, October 15, 2011 7:24 AM

By the way, I really think Randy's suggestion here of specifically allowing for updating the standard used is an excellent one, MUCH better than just closing our eyes and mandating the status quo till the next version.

****************************************************************

From: Tucker Taft
Sent: Saturday, October 15, 2011 10:39 AM

I'll go for 4a as well. I understand that GNAT has already implemented something else, but Ada 2012 doesn't even exist yet, so it is premature to have its content depend on what has or has not already been implemented by particular implementations.

For those looking at the Ada 2012 standard when it comes out in 2012 (or 2013?), it makes no sense to me to tie it to an already out-of-date standard. As usual, implementations will do what they do based on market demands. If no one cares about these details anyway, we might as well get the words of the standard right, even if the reality is not going to match the words on day one.

****************************************************************

From: Robert Dewar
Sent: Saturday, October 15, 2011 3:42 PM

Well my argument was about what will or will not be implemented in GNAT, not what already has been implemented in GNAT. From one point of view I don't care too much between 2a and 4a, since it won't make any difference to implementation plans in practice.

I still don't like that no one on the ARG has carefully examined the two versions of the standard to understand what level of compatibility problems arise. It seems unwise to just adopt a standard without carefully examining it.
If we want to adopt this new standard right away, we should have at least one person carefully examine the two standards and write up a document describing the differences from a programmer point of view.

Aren't we sort of obligated to be very careful when it comes to introducing non-upwards compatible changes, and at least document these changes carefully?

****************************************************************

From: Randy Brukardt
Sent: Saturday, October 15, 2011 6243 PM

...
> I still don't like that no one on the ARG has carefully examined the
> two versions of the standard to understand what level of compatibility
> problems arise. It seems unwise to just adopt a standard without
> carefully examining it.

I'm not sure what level of care you are actually requiring; I spent more than an hour reading it before I sent my original messages in this thread. And I summarized the changes that I saw in the original messages. I didn't do a character-by-character comparison, but that would be rather silly (I have to presume that the summaries of changes are accurate).

Note that this character set standard is freely available, anyone can download it and read it as I did.

> ... If we
> want to adopt this new standard right away, we should have at least
> one person carefully examine the two standards and write up a document
> describing the differences from a programmer point of view.

I agree that we need a lot of care before adopting new identifier syntax, but *NO ONE* has suggested that at this point. (That will be an open issue for discussion at a near future ARG meeting - probably not the next one because I doubt I'll have time to write it up.)

But for other things, this would be silly, because it is the same list that happens for any character change in Ada: some characters change categories. Some characters that were not previously graphic characters become them, and so on.
We already have all of those changes documented as incompatibilities in Ada 2005 (because it did change character set standards). This would cause the same sorts of changes (only in obscure, rarely used characters). The Unicode change documents describe the changed characters in detail -- are you saying that we have to copy all of that and send it to you so that you can see those exact details? It's all online for anyone that cares to look.

> Aren't we sort of obligated to be very careful when it comes to
> introducing non-upwards compatible changes, and at least document
> these changes carefully?

Currently, we're talking specifically about package Wide_Wide_Characters.Handling, which is new in Ada 2012. There is no possibility of creating a "non-upwards compatible" change in a new package! I realize that GNAT has an implementation-defined equivalent, but no one is going to accidentally change from the GNAT-only package to a language-defined one without intending it.

If we adopt the new standard globally, there will be more changes, the main one being that more characters would be considered graphic characters by 'Image (changing their image from the "Hex00xxxx" form), that's technically incompatible but hardly very interesting. (Especially as we now allow 'Value to take the hex form for all characters, so there would be no incompatibility in 'Value unless an implementer wanted to introduce it.) Identifiers would allow more characters (as some of the new characters are "letters") - which has no compatibility issues, and a handful of previously allowed characters would be banned (which would be incompatible, but again these are rarely used characters, and probably should never have been allowed in the first place). I'd rather not make any identifier changes, but that would be hard to do using the new standard (which changes classifications of a few characters).

There would be more changes if we adopted the identifier recommendations, but as I've said there is no way I would recommend that for Ada 2012 -- it's just too much change at too late a date. The other changes are pretty minimal, however.

Since Ada 2012 is not going to be frozen until after the upcoming ARG meeting, we can discuss this during the next meeting. And it seems pretty obvious that we ought to.

****************************************************************

From: Robert Dewar
Sent: Saturday, October 15, 2011 6:35 PM

> Currently, we're talking specifically about package
> Wide_Wide_Characters.Handling, which is new in Ada 2012. There is no
> possibility of creating a "non-upwards compatible" change in a new package!
> I realize that GNAT has an implementation-defined equivalent, but no
> one is going to accidentally change from the GNAT-only package to a
> language-defined one without intending it.

OK, that's fair enough

> Since Ada 2012 is not going to be frozen until after the upcoming ARG
> meeting, we can discuss this during the next meeting. And it seems
> pretty obvious that we ought to.

Also fair enough, I don't actually think it makes too much difference if we choose 2a or 4a, I don't think it will make any difference to anyone (programmers, or implementors, or reviewers or anyone else :-)) So if it really makes people have a better warm feeling to have the standard say 4a, then no problem as far as I am concerned.

****************************************************************

From: Randy Brukardt
Sent: Monday, December 19, 2011 10:13 PM

I went to the ISO web site to check the exact spelling of the C standard name (to make sure that we have it right in 1.2), and what did I find? A brand new C standard!

ISO/IEC 9899:2011 Information Technology - Programming Languages - C, published 2011-12-08

Based on our discussion in Denver (and our decision to use the 2011 C++ standard), I'll use this in 1.2 instead of the old one.

****************************************************************

From: Randy Brukardt
Sent: Monday, December 19, 2011 10:07 PM

I just happened to notice that the new package Ada.Wide_Characters.Handling is not classified at all. That seems to me to be an oversight, as these are all pure functions. The Ada.Characters.Handling package that this is modeled on is classified as Pure.

I will add this to AI05-0266-1 (which is modifying this package anyway), barring any objections. (We've already voted that AI, so any objection will require putting it in a different [not necessarily separate] AI so that it can be voted.)

****************************************************************

From: Robert Dewar
Sent: Tuesday, December 20, 2011 6:18 AM

In GNAT, the pragmas for Ada.Characters.Handling are:

> pragma Preelaborate;
> pragma Pure_05;
> -- In accordance with Ada 2005 AI-362

But indeed Ada.[Wide_]Wide_Characters.Handling have no pragmas

****************************************************************

From: Randy Brukardt
Sent: Tuesday, December 20, 2011 1:19 PM

Is there any reason that they can't have pragma Pure, or was this just an oversight as I think it was??

****************************************************************

From: Robert Dewar
Sent: Tuesday, December 20, 2011 5:54 PM

I think an oversight, when did Ada.Wide_Characters.Handling enter the language?

****************************************************************

From: Randy Brukardt
Sent: Tuesday, December 20, 2011 6:25 PM

It came from AI05-0185-1, it was originally (formally) proposed in November 2009. I think it was talked about some before that, but we didn't have a proposal. No version of the AI ever had any categorization pragmas, so I think it was just missed.

****************************************************************

From: Bob Duff
Sent: Tuesday, December 20, 2011 6:39 PM

I have some vague memory that Wide_Chars. was deliberately impure, whereas the Chars.
version was pure, because it was thought that the Wide_ version needed some pointers or heap allocation or something.

But I studied that AI, and I agree with Randy and Robert: it was just an oversight to leave out Pure in this case.

****************************************************************

From: Robert Dewar
Sent: Tuesday, December 20, 2011 7:33 PM

OK, so it should be Pure, no question about it, I will edit the GNAT packages appropriately

****************************************************************

From: Randy Brukardt
Sent: Wednesday, December 21, 2011 9:38 PM

...
> I have some vague memory that Wide_Chars.
> was deliberately impure, whereas the Chars. version was pure,
> because it was thought that the Wide_ version needed some
> pointers or heap allocation or something.

You're thinking of Ada.Strings.Maps and Ada.Strings.Wide_Maps. The thinking was that Character_Set should be implemented as a simple bitmap, but requiring the use of 65536 bit set objects was over the top for Wide_Character_Set. (And imagine the fun for Wide_Wide_Character_Set!) So an implementation using (internal) access types was permitted, and that cannot be Pure, so Wide_Maps is preelaborated.

But Ada.Wide_Characters.Handling doesn't use anything from Ada.Strings, so this reason doesn't apply here. (And even if it did, this should be preelaborated, and that wasn't done either.)

> But I studied that AI, and I agree with Randy and Robert:
> it was just an oversight to leave out Pure in this case.

Thanks.

****************************************************************

From: Robert Dewar
Sent: Friday, December 23, 2011 10:22 AM

> I just happened to notice that the new package
> Ada.Wide_Characters.Handling is not classified at all. That seems to
> me to be an oversight, as these are all pure functions. The
> Ada.Characters.Handling package that this is modeled on is classified as Pure.

Interestingly, this was not quite trivial to implement in GNAT.
The package bodies depended on some other bodies which have what are in fact completely static array constants, but because of over-energetic rules are not technically static. Luckily we compile the run-time in a special mode where such categorization errors are warnings that can be suppressed :-)

****************************************************************
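
[Editor's note: For readers unfamiliar with the categorization being discussed, the following is a minimal sketch of a package declared Pure. Demo_Handling and Is_Basic_Digit are hypothetical names invented for illustration; they are not part of the language-defined packages or of GNAT. The sketch has no context clauses at all, which sidesteps the restriction that a pure unit may depend semantically only on other units that are declared pure.]

```ada
--  Hypothetical unit for illustration; not the language-defined package.
package Demo_Handling is
   pragma Pure (Demo_Handling);

   --  True only for the ten digits '0' .. '9'.
   function Is_Basic_Digit (Item : Wide_Character) return Boolean;
end Demo_Handling;

package body Demo_Handling is

   function Is_Basic_Digit (Item : Wide_Character) return Boolean is
   begin
      --  The literals resolve to Wide_Character from the type of Item.
      return Item in '0' .. '9';
   end Is_Basic_Digit;

end Demo_Handling;
```

[The spec and body are shown together for brevity; with GNAT they would normally go in separate files, or be split with gnatchop.]
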