CVS difference for ai05s/ai05-0114-1.txt

Differences between 1.1 and version 1.2

--- ai05s/ai05-0114-1.txt	2008/10/07 05:19:30	1.1
+++ ai05s/ai05-0114-1.txt	2010/08/12 05:57:49	1.2
@@ -1,8 +1,8 @@
-!standard 3.9(12.1/2)                                         08-10-06  AI05-0114-1/01
-!standard 3.9(25.3/2)
-!standard 3.9(26.1/2)
-!standard 13.3(76)
+!standard A.3.2(59)                                         10-08-12  AI05-0114-1/02
+!standard A.4.6(7)
 !class binding interpretation 08-10-06
+!status Amendment 2012 10-08-12
+!status ARG Approved  7-0-1  10-08-12
 !status work item 08-10-06
 !status received 08-06-13
 !priority Low
@@ -12,6 +12,8 @@
 
 !summary
 
+No change is suggested to Ada.Characters.Handling or Ada.Strings.Maps.Constants;
+however, we add a user note to each to point out the inconsistency.
 
 !question
 
@@ -21,10 +23,10 @@
 Then 2.1(9/2) defines letter_lowercase to be "Any character whose General Category is
 defined to be "Letter, Lowercase"" by ISO/IEC 10646:2003.
 
-The Unicode Data File lists each of the following as category Ll, meaning Letter, Lowercase: 
-    Code point u+00AA, named FEMININE ORDINAL INDICATOR; 
-    Code point u+00B5, named MICRO SIGN; 
-    Code point u+00BA, named MASCULINE ORDINAL INDICATOR. 
+The Unicode Data File lists each of the following as category Ll, meaning Letter, Lowercase:
+    Code point u+00AA, named FEMININE ORDINAL INDICATOR;
+    Code point u+00B5, named MICRO SIGN;
+    Code point u+00BA, named MASCULINE ORDINAL INDICATOR.
 
 Therefore, each of these three characters should be considered lowercase letters and allowed
 in identifiers according to the various Clause 2.1 and 2.3 paragraphs mentioned above.
@@ -40,7 +42,26 @@
 
 !wording
 
+Add after A.3.2(59):
 
+7 There are certain characters which are defined to be lower case letters by ISO
+  10646 and are therefore allowed in identifiers, but are not considered lower
+  case letters by Ada.Characters.Handling.
+
+AARM Reason: This is to maintain compatibility with the Ada 95 definitions of
+these functions.
+
+Add after A.4.6(7):
+
+NOTES
+1 There are certain characters which are defined to be lower case letters by ISO
+  10646 and are therefore allowed in identifiers, but are not considered lower
+  case letters by Ada.Strings.Maps.Constants.
+
+AARM Reason: This is to maintain compatibility with the Ada 95 definitions of
+these constants.
+
+
 !discussion
 
 Changing the definition of Ada.Characters.Handling has the potential of breaking existing
@@ -53,22 +74,40 @@
 
 The questioner also seems to assume that there is some correlation between
 Ada.Characters.Handling and identifiers. But this has never been true; both concepts are
-defined separately. 
+defined separately.
 
-While it is likely that many programs will not use any characters in the changed range, the 
+While it is likely that many programs will not use any characters in the changed range, the
 potential incompatibility is so widespread that such a runtime change cannot be contemplated.
 
-[More interesting question: Does the incompatibility of Ada 95 and Ada 2005 classifications
-of these characters have any other unintended consequences?
 
-There is an alternative way to resolve the difference, which would be to use the Ada 95
-classification for Row 00 (that is, Latin-1). One way to do that would be to explicitly say
-that these three characters are not letters in Ada, even though they would qualify via
-Unicode. This is unlikely to be a major problem (all of the characters appear to have
-counterparts elsewhere in the Unicode set), but it would be weird (and probably not necessary
-unless there is a compatibility issue uncovered here.]
+We could have resolved this difference by using the Ada 95 classification for Row 00 (that is,
+Latin-1). One way to do that would be to explicitly say that these three characters are not
+letters in Ada, even though they would qualify via Unicode. This is unlikely to be a major
+problem (all of the characters appear to have counterparts elsewhere in the Unicode set),
+but it would be unusual. And we would be saying that our good taste is more important than
+the carefully considered (we hope!) classifications of the character set standards.
+
+!corrigendum A.3.2(59)
+
+@dinsa
+@xindent<@emdash Special graphic characters>
+@dinst
+@xindent<@s9<
+7   There are certain characters which are defined to be lower case letters by
+    ISO 10646 and are therefore allowed in identifiers, but are not considered
+    lower case letters by Ada.Characters.Handling.>>
+
+!corrigendum A.4.6(7)
+
+@dinsa
+Each of these constants represents a correspondingly named set of characters
+or character mapping in Characters.Handling (see A.3.2).
+@dinst
+@xindent<@s9<NOTES@hr
+12  There are certain characters which are defined to be lower case letters by
+    ISO 10646 and are therefore allowed in identifiers, but are not considered
+    lower case letters by Ada.Strings.Maps.Constants.>>
 
---!corrigendum 13.3(76)
 
 !ACATS Test
 
@@ -77,11 +116,11 @@
 
 !appendix
 
-!topic Inconsistency in Ada 2005 definition of letter 
-!reference Ada 2005 A.3.2(24,25) 
-!from Howard W. Ludwig 08-06-26 
-!keywords identifier_start, letter_lowercase, Is_Letter, Is_Lower 
-!discussion 
+!topic Inconsistency in Ada 2005 definition of letter
+!reference Ada 2005 A.3.2(24,25)
+!from Howard W. Ludwig 08-06-26
+!keywords identifier_start, letter_lowercase, Is_Letter, Is_Lower
+!discussion
 
 Ada 2005 has enhanced the set of characters allowed to compose identifiers. In particular,
 2.3(2/2) specifies that an identifier is made up of items including identifier_start.
@@ -89,10 +128,10 @@
 Then 2.1(9/2) defines letter_lowercase to be "Any character whose General Category is
 defined to be "Letter, Lowercase"" by ISO/IEC 10646:2003.
 
-The Unicode Data File lists each of the following as category Ll, meaning Letter, Lowercase: 
-    Code point u+00AA, named FEMININE ORDINAL INDICATOR; 
-    Code point u+00B5, named MICRO SIGN; 
-    Code point u+00BA, named MASCULINE ORDINAL INDICATOR. 
+The Unicode Data File lists each of the following as category Ll, meaning Letter, Lowercase:
+    Code point u+00AA, named FEMININE ORDINAL INDICATOR;
+    Code point u+00B5, named MICRO SIGN;
+    Code point u+00BA, named MASCULINE ORDINAL INDICATOR.
 
 Therefore, each of these three characters should be considered lowercase letters and allowed
 in identifiers according to the various Clause 2.1 and 2.3 paragraphs mentioned above.
@@ -122,5 +161,231 @@
 in Ada 95 but do not with current Ada 2005 wording.
 
 ****************************************************************
+
+From: Robert Dewar
+Sent: Friday, July 9, 2010  4:28 PM
+
+For the record, GNAT does not permit any of the codes AA, B5, BA in identifiers.
+I have no intention of changing this unless someone thinks this incompatibility with
+Ada 95 is *really* important, it's of course an acceptable incompatibility (something
+that was illegal becoming legal), but to what purpose? None of these three symbols
+
+(MICRO SIGN, FEMININE ORDINAL INDICATOR, MASCULINE ORDINAL INDICATOR)
+
+are reasonable in identifiers.
+
+Yes, once we stray outside the Latin-1, all sorts of bizarre characters are valid in
+identifiers, but let's keep the basic 256 character set free of such oddity!!!
+
+****************************************************************
+
+From: Bob Duff
+Sent: Friday, July 9, 2010  6:00 PM
+
+> For the record, GNAT does not permit any of the codes AA, B5, BA in
+> identifiers.
+> I have no intention of changing this unless someone thinks this
+> incompatibility with Ada 95 is *really* important, it's of course an
+> acceptable incompatibility (something that was illegal becoming
+> legal), but to what purpose?
+
+Illegal-->legal is not called an "incompatibility";
+it's called a "language extension".
+
+I don't care one way or the other what we do, so long as you don't call it an
+incompatibility.  ;-)
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, July 9, 2010  6:07 PM
+
+> I don't care one way or the other what we do, so long as you don't
+> call it an incompatibility.  ;-)
+
+Fair enough, but I like to regard "language extension"
+as the implementation of something useful, not the accidental permitting of
+something silly :-)
+
+I would be embarrassed to call this a language extension
+
+>> (MICRO SIGN, FEMININE ORDINAL INDICATOR, MASCULINE ORDINAL INDICATOR)
+>>
+>> are reasonable in identifiers.
+>>
+>> Yes, once we stray outside the Latin-1, all sorts of bizarre
+>> characters are valid in identifiers, but let's keep the basic 256
+>> character set free of such oddity!!!
+
+Are you really neutral on this, do you think it is reasonable to allow these three
+symbols (I can't bring myself to call them letters) in identifiers.
+
+And wouldn't you find it a bit weird for a Latin-1 character that is classified
+neither as a letter nor a digit by Ada.Characters.Handling was allowed in
+identifiers???
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Friday, July 9, 2010  6:49 PM
+
+> Are you really neutral on this, do you think it is reasonable to allow
+> these three symbols (I can't bring myself to call them letters) in
+> identifiers.
+
+
+The problem here is determining what is a letter. Either we can depend on the
+definitions given by the character set standards (that is ISO/IEC 10646:2003) or
+we can invent our own. Depending on the character set standard means that we're
+going to have incompatibilities and extensions every time those standards change
+(add new characters, for instance). While we can stick with a particular version
+of a standard for a while, eventually we'll need to update the reference to it.
+
+Inventing our own definition is fraught with danger, and in particular works
+poorly when new characters are added to the character set standards. Neither
+works very well in my view.
+
+BTW, Unicode has a solution to prevent incompatibilities (but not
+extensions!) in identifiers: it has a special classification for characters that
+were once classified as letters and are not anymore. That classification is
+recommended to be included in identifiers. (We didn't follow this advice in Ada
+2005, as it was added after Unicode 4.0, and besides, you don't think we should
+be referring to Unicode at all, so it's probably not possible -- the
+classification isn't in 10646:2003.)
+
+As far as the oddity itself goes, I find any characters outside of A-Z and
+0-9 odd, so I don't really care (and it might come in handy for enumeration
+literals).
+
+> And wouldn't you find it a bit weird for a Latin-1 character that is
+> classified neither as a letter nor a digit by Ada.Characters.Handling
+> was allowed in identifiers???
+
+Ada.Characters.Handling is based on an obsolete standard. We can't change it
+because of the concern about silent changes in program behavior. This is the
+same problem that we have with the Upper_Case_Map for Wide_Characters; it's
+badly defined, but we don't want to change the behavior of existing programs.
+
+Note that we'll have the same problem with Ada.Wide_Characters.Handling down the
+road. It will be tied to some particular version of 10646, and we won't feel
+able to change the results to some other version (which presumably will have new
+kinds of letters with new upper and lower case mappings). Either that or we'll have
+to accept behavior changes. (Neither sound great to me.)
+
+It might be valuable to have a package whose behavior is defined to be exactly
+that of Ada identifiers (for whatever version of the language is being used).
+That package would intentionally change behaviors between versions as needed to
+match changes in the character sets. (I don't think that is the main purpose of
+Ada.Characters.Handling, and changing it to do that is nasty to programs that
+are not managing identifiers.)
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, July 9, 2010  7:00 PM
+
+> As far as the oddity itself goes, I find any characters outside of A-Z and
+> 0-9 odd, so I don't really care (and it might come in handy for
+> enumeration literals).
+
+Randy that is an absurd and unhelpful position to take. It is obvious that you
+have to allow common characters in European languages. MANY of our users take
+advantage of this (e.g. acute accents). I find your position even worse than
+Whitaker's 026 keypunch stuff :-) :-)
+
+There is a heck of a difference between E-acute and these three characters.
+
+> Ada.Characters.Handling is based on an obsolete standard. We can't
+> change it because of the concern about silent changes in program
+> behavior. This is the same problem that we have with the
+> Upper_Case_Map for Wide_Characters; it's badly defined, but we don't
+> want to change the behavior of existing programs.
+
+Never mind the standards! Ada.Characters.Handling.Is_Letter returns entirely sensible
+results. If there is no standard specifying this sensible result, too bad, we
+definitely SHOULD invent our own rules for the first 256 characters.
+
+And we should allow E acute as a letter, but not the MASCULINE and FEMININE
+symbols!
+
+> Note that we'll have the same problem with
+> Ada.Wide_Characters.Handling down the road. It will be tied to some
+> particular version of 10646, and we won't feel able to change the
+> results to some other version (which presumably will have new kinds
+> of letters with new upper and lower case mappings). Either that or we'll
+> have to accept behavior changes. (Neither sound great to me.)
+
+Easily handled at the compiler level with option flags (GNAT allows LOTS of
+character sets in identifiers, e.g. all the Latin sets).
+
+****************************************************************
+
+From: Bob Duff
+Sent: Friday, July 9, 2010  7:22 PM
+
+> Fair enough, but I like to regard "language extension"
+> as the implementation of something useful, not the accidental
+> permitting of something silly :-)
+>
+> I would be embarrassed to call this a language extension
+
+Then I suggest you use a term like "silly, stupid, useless, rubbage language
+extension".  ;-)
+
+I think it's important to reserve "incompatibility" for cases where we're
+potentially breaking someone's previously-working code. That's a serious charge,
+which I take seriously.
+
+Otherwise the discussion gets confused.  (During Ada 9X there were some cases
+where people used "incompatible" to mean "incompatible with my personal good
+taste", and that confused the discussion!)
+
+> >> None of these three symbols
+> >>
+> >> (MICRO SIGN, FEMININE ORDINAL INDICATOR, MASCULINE ORDINAL
+> >> INDICATOR)
+> >>
+> >> are reasonable in identifiers.
+> >>
+> >> Yes, once we stray outside the Latin-1, all sorts of bizarre
+> >> characters are valid in identifiers, but let's keep the basic 256
+> >> character set free of such oddity!!!
+>
+> Are you really neutral on this, do you think it is reasonable to allow
+> these three symbols (I can't bring myself to call them letters) in
+> identifiers.
+
+I don't know what they look like (though I can guess), nor how to type them in,
+nor will I ever be likely to use them.  I'm neutral because I don't have enough
+knowledge.  Like if you asked me what's the best restaurant in Moscow -- I've
+never been there.  If there's some ISO character-set standard that says those
+things ought to be "letters", or "identifier chars" or whatever, then who am I
+to say nay?
+
+> And wouldn't you find it a bit weird for a Latin-1 character that is
+> classified neither as a letter nor a digit by Ada.Characters.Handling
+> was allowed in identifiers???
+
+Yes, I suppose so.  But then somebody will say "Ada.Characters.Handling is for
+human-readable text and blah blah", and I'll reply "Yeah, OK with me."
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, July 9, 2010  4:28 PM
+
+>> I would be embarrassed to call this a language extension
+>
+> Then I suggest you use a term like "silly, stupid, useless, rubbage
+> language extension".  ;-)
+
+OK, henceforth, we shall call these SSURLE's, pronounced ssssurley :-)
+
+> I think it's important to reserve "incompatibility" for cases where
+> we're potentially breaking someone's previously-working code.
+> That's a serious charge, which I take seriously.
+
+yes of course
+
+****************************************************************
 
-    
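
The inconsistency this AI documents can be sketched in a few lines. The
following Python is only an illustrative model, not Ada and not part of the AI:
`ada95_is_lower` is a hypothetical helper that assumes the Latin-1 ranges given
for Is_Lower in A.3.2, and `CITED_AS_LL` simply records the three code points
that the question quotes as category Ll (later Unicode versions may classify
them differently).

```python
# Illustrative model only: compares the Ll listing quoted in the question
# with an assumed model of Ada.Characters.Handling.Is_Lower.

# Code points the question cites as General Category Ll. These are copied
# from the text above, not recomputed from any Unicode database.
CITED_AS_LL = {
    0x00AA: "FEMININE ORDINAL INDICATOR",
    0x00B5: "MICRO SIGN",
    0x00BA: "MASCULINE ORDINAL INDICATOR",
}


def ada95_is_lower(code):
    """Hypothetical model of Is_Lower for a Latin-1 code point.

    Assumes the ranges described in A.3.2: 'a'..'z',
    LC_German_Sharp_S..LC_O_Diaeresis, LC_O_Oblique_Stroke..LC_Y_Diaeresis.
    """
    return (
        0x61 <= code <= 0x7A      # 'a' .. 'z'
        or 0xDF <= code <= 0xF6   # LC_German_Sharp_S .. LC_O_Diaeresis
        or 0xF8 <= code <= 0xFF   # LC_O_Oblique_Stroke .. LC_Y_Diaeresis
    )


def mismatches():
    """Names of cited Ll characters that the Is_Lower model rejects."""
    return [
        name
        for code, name in sorted(CITED_AS_LL.items())
        if not ada95_is_lower(code)
    ]


if __name__ == "__main__":
    for name in mismatches():
        print(name + ": cited as Ll, but not lower case in the model")
```

Under these assumptions all three characters are reported as mismatches, while
an accented letter such as E-acute (code 16#E9#) is lower case in both views.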
