CVS difference for ais/ai-00395.txt

Differences between 1.12 and version 1.13
Log of other versions for file ais/ai-00395.txt

--- ais/ai-00395.txt	2005/10/31 05:18:40	1.12
+++ ais/ai-00395.txt	2005/11/16 06:51:15	1.13
@@ -1,4 +1,4 @@
-!standard 1.1.4(15)                                  05-10-13  AI95-00395/10
+!standard 1.1.4(15)                                  05-11-15  AI95-00395/11
 !standard 2.1(01)
 !standard 2.1(03)
 !standard 2.1(04)
@@ -190,7 +190,7 @@
 Change 2.1(14) (as modified by AI95-00285) to read:
 
 graphic_character
-   Any character which is not in the categories other_control,
+   Any character that is not in the categories other_control,
 other_private_use, other_surrogate, format_effector, and whose code
 position is neither 16#FFFE# nor 16#FFFF#.
 
@@ -298,7 +298,7 @@
 above):
 
 graphic_character
-   Any character which is not in the categories other_control,
+   Any character that is not in the categories other_control,
 other_private_use, other_surrogate, format_effector, and whose relative code
 position in its plane is neither 16#FFFE# nor 16#FFFF#.
 
@@ -500,7 +500,7 @@
 implementation defined.>
 @dby
 @xhang<@xterm<@fa<graphic_character>>
-Any character which is not in the categories @fa<other_control>,
+Any character that is not in the categories @fa<other_control>,
 @fa<other_private_use>, @fa<other_surrogate>, @fa<format_effector>, and whose
 relative code position in its plane is neither 16#FFFE# nor 16#FFFF#.>
 
@@ -4676,6 +4676,2103 @@
 > <CJK Ideograph Extension B, Last>
 >
 > There are similar modification to be done to the UTF_32_Non_Graphic table.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Wednesday, February 2, 2005 11:24 AM
+
+Seems like there are some obvious omissions
+
+Ada.Strings.Wide_Wide_Hash
+Ada.Strings.Wide_Wide_Unbounded.Hash
+
+I also find oddly missing the case insensitive stuff for Wide_Wide.
+Given that we took the decision to make this available for identifiers,
+shouldn't we give run time access to this facility? I certainly intend
+to do that in GNAT, but currently this is in a GNAT specific package,
+and I am not sure that seems right.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Wednesday, February 2, 2005  2:19 PM
+
+> Seems like there are some obvious omissions
+>
+> Ada.Strings.Wide_Wide_Hash
+> Ada.Strings.Wide_Wide_Unbounded.Hash
+
+That's an integration issue of course (it's the intersection of two AIs,
+AI-285 and AI-302). These packages don't belong in either AI. There are a
+number of features like that.
+
+But I admit that this one isn't on any of my lists, so it had just been
+overlooked. On the third hand, I haven't updated the section in question
+yet, so I probably would have caught it at that time.
+
+> I also find oddly missing the case insensitive stuff for Wide_Wide.
+> Given that we took the decision to make this available for identifiers,
+> shouldn't we give run time access to this facility? I certainly intend
+> to do that in GNAT, but currently this is in a GNAT specific package,
+> and I am not sure that seems right.
+
+Certainly the case conversion mappings are defined in
+Ada.Strings.Wide_Wide_Maps.Constants (i.e. Lower_Case_Map and
+Upper_Case_Map). But it does seem odd that there aren't any similar
+facilities in Ada.Characters.Handling.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Wednesday, February 2, 2005  2:48 PM
+
+> Certainly the case conversion mappings are defined in
+> Ada.Strings.Wide_Wide_Maps.Constants (i.e. Lower_Case_Map and
+> Upper_Case_Map). But it does seem odd that there aren't any similar
+> facilities in Ada.Characters.Handling.
+
+Since the spec in Ada.Characters.Handling is specifically
+thought out, rather than derived with the "similar" wand,
+I took this as a deliberate decision NOT to provide this
+facility here.
+
+For sure, it would be junky to have the wide wide case
+conversion tables present for programs that do no wide
+character or wide wide character stuff.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Wednesday, February 2, 2005  3:05 PM
+
+> For sure, it would be junky to have the wide wide case
+> conversion tables present for programs that do no wide
+> character or wide wide character stuff.
+
+Yes, it probably would make more sense for that to be a (different) child of
+Ada.Characters. But it certainly seems odd not to provide it at all. I
+wonder if that was deliberate or just an oversight? Pascal?
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Wednesday, February 2, 2005  2:47 PM
+
+> Certainly the case conversion mappings are defined in
+> Ada.Strings.Wide_Wide_Maps.Constants (i.e. Lower_Case_Map and
+> Upper_Case_Map). But it does seem odd that there aren't any similar
+> facilities in Ada.Characters.Handling.
+
+But in the Ada 95 RM, the statement is that the stuff in the
+package Ada.Strings.Wide_Maps.Wide_Constants provides "the
+same string handling operations". Are we really meant to
+interpolate case folding in Ada 95? Certainly we didn't in
+GNAT, and I know of no test to the contrary, so in GNAT
+in the Wide_Constants case, we just fold the Latin-1 set.
+
+All we have in AI285 is that the package Wide_Wide_Constants
+"is similar" to the Wide_Constants package.
+
+Now if you can deduce from this that the case mapping is
+supposed to be the same as the rather arbitrary rules for
+identifiers, I sure can't. In particular, I would say that
+if fancy case folding is done by these run-time routines,
+it should be done correctly (e.g. in Turkey) using appropriate
+locale information. I sure can't tell one way or another from
+the AI. In fact I just assumed that the functions in
+Wide_Wide_Constants were functionally the same as those
+in Constants.
+
+We really need more definition here. What is the intention
+in Ada 95? What is the reality in Ada 95 compilers? (for
+Wide_Constants). What is the intention in Ada 2005 (for
+Wide_Wide_Constants)?
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Wednesday, February 2, 2005  3:14 PM
+
+> We really need more definition here. What is the intention
+> in Ada 95? What is the reality in Ada 95 compilers? (for
+> Wide_Constants). What is the intention in Ada 2005 (for
+> Wide_Wide_Constants)?
+
+Good point. I'd naturally assumed that these tables did "the right thing",
+but, as you point out, that would be incompatible (well, "inconsistent")
+with Ada 95. (For the record, Janus/Ada just uses Latin-1 case conversion
+for Wide_Constants.)
+
+It certainly seems weird that the case folding that the compiler uses isn't
+available to the user (especially since the code clearly has been written).
+
+Another thing for AI-395, I guess.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Wednesday, February 2, 2005  3:53 PM
+
+> Good point. I'd naturally assumed that these tables did "the right thing",
+> but, as you point out, that would be incompatible (well, "inconsistent")
+> with Ada 95. (For the record, Janus/Ada just uses Latin-1 case conversion
+> for Wide_Constants.)
+
+I bet all other Ada 95 compilers do the same :-) Certainly GNAT does
+
+> It certainly seems weird that the case folding that the compiler uses isn't
+> available to the user (especially since the code clearly has been written).
+
+But it is not clear that you want to use the same case folding conventions
+as for identifiers, where it is critical to be non-locale dependent.
+
+I think I would just leave well enough alone.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Wednesday, February 2, 2005  4:51 PM
+
+> But it is not clear that you want to use the same case folding conventions
+> as for identifiers, where it is critical to be non-locale dependent.
+>
+> I think I would just leave well enough alone.
+
+Well, OK, but that's not what some guy named Robert Dewar said this morning
+when he brought up this topic:
+
+> I also find oddly missing the case insensitive stuff for Wide_Wide.
+> Given that we took the decision to make this available for identifiers,
+> shouldn't we give run time access to this facility? I certainly intend
+> to do that in GNAT, but currently this is in a GNAT specific package,
+> and I am not sure that seems right.
+
+Have you just changed your mind on this, or am I confused as to what you are
+proposing??
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Wednesday, February 2, 2005  5:19 PM
+
+> Have you just changed your mind on this, or am I confused as to what you are
+> proposing??
+
+What I would add is a new package, perhaps Ada.Wide_Wide_Characters_Handling
+or somesuch, which provides the categorization rules used in the compiler.
+Here is what GNAT provides:
+
+package GNAT.UTF_32 is
+
+    type UTF_32 is mod 2 ** 32;
+    --  The actual allowed range is 16#00_0000# .. 16#01_FFFF#
+
+    function Is_UTF_32_Letter (U : UTF_32) return Boolean;
+    pragma Inline (Is_UTF_32_Letter);
+    --  Returns true iff U is a letter that can be used to start an identifier.
+    --  This means that it is in one of the following categories:
+    --    Letter, Uppercase (Lu)
+    --    Letter, Lowercase (Ll)
+    --    Letter, Titlecase (Lt)
+    --    Letter, Modifier  (Lm)
+    --    Letter, Other     (Lo)
+    --    Number, Letter    (Nl)
+
+    function Is_UTF_32_Digit (U : UTF_32) return Boolean;
+    pragma Inline (Is_UTF_32_Digit);
+    --  Returns true iff U is a digit that can be used to extend an identifier,
+    --  which means it is in one of the following categories:
+    --    Number, Decimal_Digit (Nd)
+
+    function Is_UTF_32_Line_Terminator (U : UTF_32) return Boolean;
+    pragma Inline (Is_UTF_32_Line_Terminator);
+    --  Returns true iff U is an allowed line terminator for source programs,
+    --  which means it is in one of the following categories:
+    --    Separator, Line (Zl)
+    --    Separator, Paragraph (Zp)
+    --  or that it is a conventional line terminator (CR, LF, VT, FF)
+
+    function Is_UTF_32_Mark (U : UTF_32) return Boolean;
+    pragma Inline (Is_UTF_32_Mark);
+    --  Returns true iff U is a mark character which can be used to extend
+    --  an identifier. This means it is in one of the following categories:
+    --    Mark, Non-Spacing (Mn)
+    --    Mark, Spacing Combining (Mc)
+
+    function Is_UTF_32_Other (U : UTF_32) return Boolean;
+    pragma Inline (Is_UTF_32_Other);
+    --  Returns true iff U is an other format character, which means that it
+    --  can be used to extend an identifier, but is ignored for the purposes of
+    --  matching of identifiers. This means that it is in one of the following
+    --  categories:
+    --    Other, Format (Cf)
+
+    function Is_UTF_32_Punctuation (U : UTF_32) return Boolean;
+    pragma Inline (Is_UTF_32_Punctuation);
+    --  Returns true iff U is a punctuation character that can be used to
+    --  separate pieces of an identifier. This means that it is in one of the
+    --  following categories:
+    --    Punctuation, Connector (Pc)
+
+    function Is_UTF_32_Space (U : UTF_32) return Boolean;
+    pragma Inline (Is_UTF_32_Space);
+    --  Returns true iff U is considered a space to be ignored, which means
+    --  that it is in one of the following categories:
+    --    Separator, Space (Zs)
+
+    function Is_UTF_32_Non_Graphic (U : UTF_32) return Boolean;
+    pragma Inline (Is_UTF_32_Non_Graphic);
+    --  Returns true iff U is considered to be a non-graphic character,
+    --  which means that it is in one of the following categories:
+    --    Other, Control (Cc)
+    --    Other, Private Use (Co)
+    --    Other, Surrogate (Cs)
+    --    Other, Format (Cf)
+    --    Separator, Line (Zl)
+    --    Separator, Paragraph (Zp)
+    --
+    --  Note that the Ada category format effector is subsumed by the above
+    --  list of Unicode categories.
+
+    function UTF_32_To_Upper_Case (U : UTF_32) return UTF_32;
+    pragma Inline (UTF_32_To_Upper_Case);
+    --  If U represents a lower case letter, returns the corresponding upper
+    --  case letter, otherwise U is returned unchanged. The folding is locale
+    --  independent as defined by documents referenced in the note in section
+    --  1 of ISO/IEC 10646:2003
+
+end GNAT.UTF_32;
+
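+A minimal sketch of how the classification functions quoted above could be
+combined into a run-time identifier check. It assumes the GNAT.UTF_32 spec
+exactly as quoted; Looks_Like_Identifier is a hypothetical helper, not part
+of GNAT or of the AI, and it deliberately ignores the rules about
+consecutive or trailing connector punctuation.
+
+```ada
+with GNAT.UTF_32; use GNAT.UTF_32;
+
+--  Hypothetical helper: True if S has the general shape of an identifier,
+--  i.e. a letter followed by letters, digits, marks, format characters,
+--  or connector punctuation.
+function Looks_Like_Identifier (S : Wide_Wide_String) return Boolean is
+   U : UTF_32;
+begin
+   if S'Length = 0 then
+      return False;
+   end if;
+
+   --  The first character must be a letter (Lu, Ll, Lt, Lm, Lo, Nl).
+   U := UTF_32 (Wide_Wide_Character'Pos (S (S'First)));
+   if not Is_UTF_32_Letter (U) then
+      return False;
+   end if;
+
+   --  Remaining characters may be any identifier start or extend character.
+   for I in S'First + 1 .. S'Last loop
+      U := UTF_32 (Wide_Wide_Character'Pos (S (I)));
+      if not (Is_UTF_32_Letter (U)
+              or else Is_UTF_32_Digit (U)
+              or else Is_UTF_32_Mark (U)
+              or else Is_UTF_32_Other (U)
+              or else Is_UTF_32_Punctuation (U))
+      then
+         return False;
+      end if;
+   end loop;
+
+   return True;
+end Looks_Like_Identifier;
+```
+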
+****************************************************************
+
+From: Randy Brukardt
+Sent: Wednesday, February 2, 2005  5:39 PM
+
+> > Have you just changed your mind on this, or am I confused as to what you are
+> > proposing??
+>
+> What I would add is a new package, perhaps
+> Ada.Wide_Wide_Characters_Handling
+> or somesuch, which provides the categorization rules used in the compiler.
+
+OK, that's what I was proposing too. I would think the name should be
+Ada.Characters.Wide_Wide_Handling. Ada.Wide_Wide_Characters.Handling would
+work, too, but I don't see much point in adding another empty package to the
+hierarchy. And it would be inconsistent to have this directly as a child of Ada
+when all of the other similar packages are grandchildren. Topic for
+discussion, I guess.
+
+Upon reflection, I don't think that it would work to use the mapping
+features, because the conversion isn't necessarily 1-to-1. So I agree it is
+best to leave them as they are.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Wednesday, February 2, 2005  6:34 PM
+
+I agree with the name (Wide_Wide_Handling), and presumably you would
+want to add Wide_Handling as well. I would suggest exactly the
+categorizations in AI285, we don't want to spend more time on this.
+I thus withdraw my concern for Turkish I with dot :-)
+
+For sure, we need a clearer spec of the mapping units, to make sure that
+no one else makes the very natural assumption you did.
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Thursday, February 3, 2005  4:24 PM
+
+> We really need more definition here. What is the intention
+> in Ada 95?
+
+Dunno, ask Bob/Tuck.
+
+> What is the reality in Ada 95 compilers? (for
+> Wide_Constants).
+
+We only cover Latin-1.  In other words, only the lower case Latin-1
+letters appear in Lower_Set, and only the Latin-1 upper case letters are
+transformed by Lower_Case_Map (the other characters are unaffected).
+
+> What is the intention in Ada 2005 for Wide_Wide_Constants?
+
+My intention was that Wide_Wide_Constants would only cover the Latin-1 set,
+just like Wide_Constants (I agree that the current RM is unclear here, but
+it seems that implementations agree, so it's just a matter of fixing the
+wording).  The reason is that I wanted users to be able to move from
+Wide_Anything to Wide_Wide_Anything with minimum semantic changes.
+
+This being said, it looks to me like providing categorization and case
+mapping at run-time would be a good idea.  I'll try to add that to AI 395.
+A few comments:
+
+1 - I'd prefer to name the packages Ada.Wide_Characters.Handling and
+Ada.Wide_Wide_Characters.Handling.  This way, Ada.Wide_Characters and
+Ada.Wide_Wide_Characters would act like umbrellas under which implementers
+and/or users could add packages appropriate to extended character sets
+and/or locale (collation, encoding, etc.).
+
+2 - I see no reason to restrict categorization to the categories defined
+in 2.1.  We are talking program execution, and a program might be
+interested in knowing that a character is a digit or a symbol, for
+instance.  So I believe we need to cover all the Unicode categories.  This
+will require bigger tables, but if you don't want these tables in your
+closure, don't with these packages.
+
+3 - The three case mappings defined by Unicode are lower case, upper case
+and title case.  I think it makes sense to support all three, as they are
+non-trivial.  I don't want to support what RM95 calls basic mapping
+(dropping the diacriticals) because it's not a well-defined Unicode
+transformation.
+
+At this point I can hear Robert yelling that he cannot share the tables he
+has for the lexical analyzer, but come on, Robert, it's just another bunch
+of tables ;-)
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Thursday, February 3, 2005  7:57 AM
+
+> At this point I can hear Robert yelling that he cannot share the tables he
+> has for the lexical analyzer, but come on, Robert, it's just another bunch
+> of tables ;-)
+
+Actually my comment is of a different kind. This is a new feature, with a
+non-trivial implementation, and no user demand, being introduced much too
+late in the process. Let's leave well enough alone for this go around, and
+see if any demand develops.
+
+I agree with everything technical you said, I just think it is too late.
+
+****************************************************************
+
+From: Bob Duff
+Sent: Thursday, February 3, 2005  1:17 PM
+
+> > We really need more definition here. What is the intention
+> > in Ada 95?
+>
+> Dunno, ask Bob/Tuck.
+
+I'm certainly no expert on these matters, and I suspect Tuck isn't
+either.  I've no idea how to answer the above question.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Thursday, February 3, 2005  2:44 PM
+
+Bob wrote:
+
+> I'm certainly no expert on these matters, and I suspect Tuck isn't
+> either.  I've no idea how to answer the above question.
+
+The intent is crystal clear: see A.4.7(48). (I found that last night when
+working on the section - yes, I had missed it too.)
+
+Pascal said:
+
+> My intention was that Wide_Wide_Constants would only cover the Latin-1 set,
+> just like Wide_Constants (I agree that the current RM is unclear here, but
+> it seems that implementations agree, so it's just a matter of fixing the
+> wording).  The reason is that I wanted users to be able to move from
+> Wide_Anything to Wide_Wide_Anything with minimum semantic changes.
+
+That's fine, but then why did you drop the note A.4.7(48) when you created
+A.4.8? You copied everything else, so it would seem to have been an
+intentional change. Or, I suppose you could have just failed to copy all of
+the text. Anyway, an analog to the note A.4.7(48) should be included in
+A.4.8, and then all is well.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Thursday, February 3, 2005  6:27 PM
+
+This should not be a note, its content cannot be derived from the
+text in my opinion. I would retain the note but make it normative
+for both Wide_ and Wide_Wide_ cases.
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Friday, February 4, 2005  3:07 AM
+
+I must have forgotten to copy this piece of text.  I certainly don't
+remember any intentional change here.  Given the amount of wording in this
+AI, the cut-and-paste error is by far the most likely explanation.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, February 4, 2005  8:34 AM
+
+Do you agree that while we are at it, we should make that note normative?
+Otherwise we are depending on the meaning of "similar" and since we as
+Ada experts got confused on the intent before seeing this note, we should
+make sure that the normative text makes the intent clear. This is really
+even more true for Ada 2005, where case folding for wide characters has
+reared its ugly head, as well as character classification. Given that the
+RM goes to so much effort to define this stuff, it is not unnatural to
+expect that this would be available at run time, and consequently not
+at all unnatural to expect the opposite of the note for wide_wide character
+at least.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Thursday, February  3, 2005  9:08 PM
+
+This has:
+
+    type char16_array is array (size_t range <>) of aliased char16_t;
+    pragma Pack (char16_array);
+
+It seems meaningless to pragma Pack an array with aliased characters
+since the size of such components is fixed anyway.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Thursday, February  3, 2005  9:22 PM
+
+Certainly the only requirement of "aliased" is to make the components
+addressable. If a compiler typically aligns components on (32-bit) word
+boundaries, that would certainly have addressable components, but they
+wouldn't be packed (to 16-bits in this case). So the pragma is not quite
+unnecessary (and it certainly isn't "meaningless"). I agree that on most
+implementations, it wouldn't have any effect. The same holds for the 32-bit
+array (although that seems even harder to imagine how it could be unpacked).
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Thursday, February  3, 2005  9:31 PM
+
+No the sizes have to be the same as well, because pointers don't know
+what they are pointing to.
+
+Take your example of components being aligned to 32-bits. You can only
+imagine that making sense on a machine where it is easier to address
+32-bit aligned stuff. But on such a machine aliased stand alone components
+would be allocated 32-bits anyway, so you would have to have 32-bit
+components.
+
+Remember that packing things loses independence, so there is a real
+negative effect, and in practice zero positive effect. Furthermore,
+the real requirement should be that the array is laid out the way
+C would lay it out. So the appropriate notation would be
+pragma Convention (C, ...) rather than pragma Pack.
+
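+A minimal sketch of the alternative suggested above, with the layout
+requirement stated through pragma Convention rather than pragma Pack. The
+package and type names are hypothetical stand-ins, not the declarations
+proposed for Interfaces.C.
+
+```ada
+with Interfaces.C;
+
+package Char16_Sketch is
+
+   --  Hypothetical stand-in for the 16-bit C character type.
+   type Char16 is mod 2 ** 16;
+   for Char16'Size use 16;
+
+   --  Aliased components stay independently addressable; the convention
+   --  asks for the layout that the C compiler would use for the array.
+   type Char16_Array is
+     array (Interfaces.C.size_t range <>) of aliased Char16;
+   pragma Convention (C, Char16_Array);
+
+end Char16_Sketch;
+```
+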
+****************************************************************
+
+From: Randy Brukardt
+Sent: Thursday, February  3, 2005  9:52 PM
+
+I should know better than to comment on things I barely understand. :-)
+
+The real reason for the pragma Pack is that it is given on all of the
+existing arrays in Interfaces.C: Char_Array and WChar_Array, both of which
+also have aliased components. Char_Array also gives a 'Component_Size
+clause, which seems like duplication to me - it would override pack in any
+case.
+
+I agree that Convention (C,...) would make more sense, as would a
+'Component_Size clause. But we tend to copy what's already in the Standard,
+because if no one has complained, it must be right. :-)
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Thursday, February  3, 2005  9:57 PM
+
+Are you sure this was aliased in the first version of the Ada 95
+standard (I know we added aliased somewhere :-)
+
+You don't want a component_size clause, that's inappropriate if
+a convention (C, ...) is there. I really think a convention (C, ...)
+is worth adding in any case, since it really gets to the heart
+of the intention here!
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Thursday, February  3, 2005  10:19 PM
+
+> Are you sure this was aliased in the first version of the Ada 95
+> standard (I know we added aliased somewhere :-)
+
+Yes, it was Stream_Element_Array that it was added to. These had to be
+aliased from the beginning; it would be hard to map a char* (or is that
+*char? I use C just often enough to be dangerous :-) if they were not
+aliased.
+
+> You don't want a component_size clause, that's inappropriate if
+> a convention (C, ...) is there. I really think a convention (C, ...)
+> is worth adding in any case, since it really gets to the heart
+> of the intention here!
+
+Well, everything in Interfaces.C is defined to be C-compatible. I realize
+that's not *quite* the same, but it would seem that a Convention (C, ...)
+would be about as useful as Pack, and less obvious as to the intent. (It
+would be odd to put Convention only on a couple arrays, rather than
+everything in the package -- after all, the whole package is about C
+interfacing). I'd be more tempted to just delete all of the pragma Pack and
+call it a day. :-)
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Thursday, February  3, 2005  10:51 PM
+
+I agree with that temptation :-)
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Friday, February  4, 2005  3:23 AM
+
+I am opposed to adding convention C there.  It would have to be done for
+each and every type, it would make the spec harder to read, and everything
+in these packages is C-compatible anyway.
+
+I did put pragma Pack on the new arrays because the others had it.  I
+agree that all these pragmas are unnecessary, but this seems
+insufficiently broken to change.  Robert's argument about independent
+addressability is formally true, but let's get real, no implementer who
+cares about C is going to make these components non-independently
+addressable (see in particular AARM 9.10(1.c)).
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, February  4, 2005  8:54 AM
+
+In that case, let's get real and remove the Pack. The trouble is that
+to me, the Pack overrides the C-compatible requirement. Now in practice
+we may be rescued by the aliased, but this seems indirect.
+
+Actually we may also be rescued if Wide_Wide_Character'Size is 32, but
+is that in fact the case? I can't find anything that explicitly says this
+is the case, and the statement in package Standard:
+
+     -- The declaration of type Wide_Wide_Character is based on the full
+     -- ISO/IEC 10646:2003 character set. The first 65536 positions have the
+     -- same contents as type Wide_Character. See 3.5.2.
+     type Wide_Wide_Character is (nul, soh, ..., FFFE, FFFF, ...);
+
+is not entirely clear, but I guess the full character set is indeed the
+range 16#0000_0000# to 16#FFFF_FFFF# rather than the defined range of
+16#0000_0000# to 16#0010_FFFF#?
+
+I think we should clearly mandate the size to be 32 bits in package Standard.
+
+Well anyway a useful discussion, makes me realize that my implementation
+of UTF-8 is incomplete, since it handles only up to 16#10_FFFF#. To be
+fixed!
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Friday, February  4, 2005  10:51 AM
+
+> Actually we may also be rescued if Wide_Wide_Character'Size
+> is 32, but is that in fact the case? I can't find anything
+> that explicitly says this is the case...
+
+And there is no such thing.  3.5.2(3.1/2) has:
+
+"The predefined type Wide_Wide_Character is a character type whose values
+correspond to the 2147483648 code positions of the ISO/IEC 10646:2003
+character set."
+
+So this type only has 2**31 values, you have 128 groups of 256 planes of
+256 rows of 256 cells.
+
+In the absence of a size clause, I would therefore expect
+Wide_Wide_Character'Size to be 31.  We don't put size clauses for
+Character and Wide_Character, so I didn't put one for Wide_Wide_Character.
+
+But then Wide_Wide_String is a packed array of Wide_Wide_Character, so
+each character would occupy 31 bits!  I don't think this is exactly what
+we want.  Better put a size clause, then.
+
+> is not entirely clear, but I guess the full character set is
+> indeed the range 16#0000_0000# to 16#FFFF_FFFF# rather than
+> the defined range of 16#0000_0000# to 16#0010_FFFF#?
+
+The upper bound is actually 16#7FFF_FFFF#.  Get it right the first time,
+you don't want to have to revise your compiler when they encode Klingon
+:-)
+
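+The arithmetic behind the size question: 128 groups x 256 planes x 256 rows
+x 256 cells = 2**7 x 2**24 = 2**31 code positions, which need only 31 bits.
+A minimal sketch of the effect of a size clause, using hypothetical integer
+types with the same number of values as stand-ins for Wide_Wide_Character.
+On an implementation that picks the minimal Size by default, the first line
+printed would be expected to be 31 and the second 32.
+
+```ada
+with Ada.Text_IO; use Ada.Text_IO;
+
+procedure Size_Sketch is
+
+   --  Hypothetical stand-in: 2**31 values, no size clause.
+   type Code_Position is range 0 .. 16#7FFF_FFFF#;
+
+   --  Same range, but with the size fixed by an explicit clause.
+   type Code_Position_32 is range 0 .. 16#7FFF_FFFF#;
+   for Code_Position_32'Size use 32;
+
+begin
+   Put_Line (Integer'Image (Code_Position'Size));
+   Put_Line (Integer'Image (Code_Position_32'Size));
+end Size_Sketch;
+```
+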
+****************************************************************
+
+From: Bob Duff
+Sent: Friday, February  4, 2005  11:23 AM
+
+> I think we should clearly mandate the size to be 32 bits in package
+> Standard.
+
+Hmm.  Good point!
+
+Does every 32-bit bit pattern represent a valid value of type W_W_C?
+I would think the answer should be "yes".  For example, unchecked
+conversion of any 32-bit integer to W_W_C should not cause erroneous
+execution.  And 'Value should not raise Constraint_Error if that code
+point does not exist as a character.
+
+I don't think we want W_W_C to be one of those evil "holey" enumeration
+types.
+
+If you agree with that, this implies that the standard should say that
+W_W_C has exactly 2**32 enumeration literals, which would imply that
+'Size = 32.  And I suppose most of them are written in italics, and
+can't be typed in a program (just like nul of type Character).
+Is that right?
+
+I suppose ISO adds new characters from time to time?
+Does the Ada standard automatically track this?
+Do we have to publish a new AI every time it happens?
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, February  4, 2005  12:04 PM
+
+> But then Wide_Wide_String is a packed array of Wide_Wide_Character, so
+> each character would occupy 31 bits!  I don't think this is exactly what
+> we want.  Better put a size clause, then.
+
+Right, I think so (a size clause is appropriate indeed).
+
+>>is not entirely clear, but I guess the full character set is
+>>indeed the range 16#0000_0000# to 16#FFFF_FFFF# rather than
+>>the defined range of 16#0000_0000# to 16#0010_FFFF#?
+>
+> The upper bound is actually 16#7FFF_FFFF#.  Get it right the first time,
+> you don't want to have to revise your compiler when they encode Klingon
+> :-)
+
+I guess we should have tests in the suite for this range
+
+    X := Wide_Wide_Character'Val (16#7FFF_FFFF#); -- OK
+    X := Wide_Wide_Character'Val (16#FFFF_FFFF#); -- raise CE
+
+Actually I wonder whether we should not just make Wide_Wide_Character be
+32-bits and be done with it, and just say that the first 2**31 values,
+correspond to the 10646 type and the upper half is implementation
+defined. It seems useful to be able to interchange raw bytes and
+characters, so why not raw words and 4-byte characters.
+
+As for Klingon, who knows what the case folding rules are? :-)
+
+Note the above discussion is even more reason to remove the pragma
+Pack from interfaces.c :-)
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, February  4, 2005  12:05 PM
+
+> Does every 32-bit bit pattern represent a valid value of type W_W_C?
+> I would think the answer should be "yes".  For example, unchecked
+> conversion of any 32-bit integer to W_W_C should not cause erroneous
+> execution.  And 'Value should not raise Constraint_Error if that code
+> point does not exist as a character.
+
+I agree
+
+> I don't think we want W_W_C to be one of those evil "holey" enumeration
+> types.
+
+Right
+
+> If you agree with that, this implies that the standard should say that
+> W_W_C has exactly 2**32 enumeration literals, which would imply that
+> 'Size = 32.  And I suppose most of them are written in italics, and
+> can't be typed in a program (just like nul of type Character).
+> Is that right?
+
+Right
+
+> I suppose ISO adds new characters from time to time?
+> Does the Ada standard automatically track this?
+> Do we have to publish a new AI every time it happens?
+
+No, the AI is designed to avoid this
+
+The limitations on identifiers are a problem, but that's a self-created
+one. In GNAT we have two modes for identifiers, the old one we have always
+allowed (all wide chars allowed, no case equivalence), and the new complex
+rules in AI-285. So Klingon programmers can still use klingon letters in
+identifiers in GNAT using the old method.
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Friday, February  4, 2005  4:05 PM
+
+> I guess we should have tests in the suite for this range
+>
+>     X := Wide_Wide_Character'Val (16#7FFF_FFFF#); -- OK
+>     X := Wide_Wide_Character'Val (16#FFFF_FFFF#); -- raise CE
+
+Agreed.
+
+> Actually I wonder whether we should not just make
+> Wide_Wide_Character be 32-bits and be done with it, and just
+> say that the first 2**31 values, correspond to the 10646 type
+> and the upper half is implementation defined.
+
+No, we want to stick to the 10646 as closely as possible, if only for
+political reasons.  Plus, why invite non-portability in the upper half
+when we really don't need it.  By adding a size clause, we pretty much
+ensure that the right thing happens, but that the range complies with
+10646.
+
+Note that the type is not "holey", so it doesn't create nasty
+erroneousness.
+
+And if you want to interchange W_W_C, just do an unchecked conversion to
+Integer_32, it will work.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, February  4, 2005  4:38 PM
+
+> No, we want to stick to the 10646 as closely as possible, if only for
+> political reasons.  Plus, why invite non-portability in the upper half
+> when we really don't need it.  By adding a size clause, we pretty much
+> ensure that the right thing happens, but that the range complies with
+> 10646.
+
+I think this is overplaying the political card. There is nothing in C
+that would limit the range of the 32 bit char type, so why should there
+be in Ada? There is nothing in 10646 that requires this limitation. Please
+let's not get into a mode of letting Ada make things into a pain in the
+neck when it is not necessary.
+
+There is no real non-portability that results either (if you think there
+is, show me a program, remember that Image is non-portable in any case).
+>
+> Note that the type is not "holey", so it doesn't create nasty
+> erroneousness.
+
+Sure it does, if you do an unchecked conversion from a 32 bit type, then
+you get erroneousness if the value is out of range.
+
+I really think this is a significant mistake. I see nothing anywhere
+that would require us to introduce this obvious inconvenience.
+
+One thing to remember here is that although Wide_Wide_Character is
+nominally 10646 there is nothing much at run time that makes it so
+(at compile time we have the extremely irritating restrictions on
+character and string literals, which are also a mistake). In practice
+you should be able to use it for any 32-bit character coding, just
+as in the 8-bit case.
+
+Nominally the Ada Character type is ISO Latin-1, but almost no Ada
+users use it that way, instead they use it to represent whatever the
+native encoding of their environment is (almost never Latin-1 on a
+PC for example), and everything works out just fine.
+
+I must say these comments about political motivations sure explain
+what I regard as a bit of over-enthusiasm in trying to over-support
+10646 with things that no other programming languages are doing.
+I see no reason for Ada to shoot itself in the foot!
+
+> And if you want to interchange W_W_C, just do an unchecked conversion to
+> Integer_32, it will work.
+
+I have no idea what this is supposed to mean, but I don't think it is
+relevant.
+
+****************************************************************
+
+From: Bob Duff
+Sent: Friday, February  4, 2005  4:53 PM
+
+> I think this is overplaying the political card. There is nothing in C
+> that would limit the range of the 32 bit char type, so why should there
+> be in Ada?
+
+I'm with Robert on this.
+
+If W_W_C has 2**31 values, then I can't safely read a text file without
+tripping over erroneousness!  That seems pretty bad to me.
+
+If the file is encoded as a sequence of 32-bit values (that's all UTF-32
+is, right?), it can certainly contain bit patterns that are out of
+bounds.  I think UTF-8 and/or the "compressed" representation (the
+sliding-windows thing) allow to represent out-of-range values, too.
+(I'm not sure about that -- anybody know for sure?)
+
+We're not violating the unicode standard by having 2**32 values.  That's
+a perfectly reasonable way for Ada to represent the characters of that
+standard.  (Robert's analogy with C is apt.)  Note that all the
+out-of-bounds values will have untypable enumeration literals (italics),
+so there's no portability issue.  I was not suggesting that
+implementations be allowed to provide impl-def literals for those
+values.
+
+Note that if there are 2**32 values as I suggested, you can't easily
+*generate* files containing bad characters, but you can easily *process*
+such files.  That's the way it should be, right?
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, February  4, 2005  5:41 PM
+
+> If the file is encoded as a sequence of 32-bit values (that's all UTF-32
+> is, right?), it can certainly contain bit patterns that are out of
+> bounds.  I think UTF-8 and/or the "compressed" representation (the
+> sliding-windows thing) allow to represent out-of-range values, too.
+> (I'm not sure about that -- anybody know for sure?)
+
+Actually, this is a problem. UTF-8 has no way of representing values
+greater than 31-bits, so if we do allow "upper half" wide wide characters
+they cannot be encoded in UTF-8 form. However, I think this is not terrible
+it just means that if you try to output such a value in UTF-8 form, you get
+an exception.
+
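+[Editorial sketch, not part of the original message.] The behaviour
+described above can be illustrated roughly as follows, assuming the
+original six-byte UTF-8 scheme, which covers 0 .. 2**31 - 1. The names
+UTF_8_Demo, Encode and Encoding_Error are hypothetical and come from
+neither the Standard nor any implementation.
+
+   with Ada.Text_IO;
+   with Interfaces; use Interfaces;
+
+   procedure UTF_8_Demo is
+
+      Encoding_Error : exception;  --  hypothetical name
+
+      --  Encode Code as UTF-8; raise Encoding_Error for "upper half"
+      --  values (2**31 and above), which have no UTF-8 form.
+      function Encode (Code : Unsigned_32) return String is
+         Length : Positive;
+      begin
+         if Code < 16#80# then
+            return (1 => Character'Val (Code));
+         elsif Code < 16#800# then
+            Length := 2;
+         elsif Code < 16#1_0000# then
+            Length := 3;
+         elsif Code < 16#20_0000# then
+            Length := 4;
+         elsif Code < 16#400_0000# then
+            Length := 5;
+         elsif Code < 16#8000_0000# then
+            Length := 6;
+         else
+            raise Encoding_Error;
+         end if;
+
+         declare
+            Result : String (1 .. Length);
+            Rest   : Unsigned_32 := Code;
+            --  Lead byte marker: the top Length bits set, e.g. 16#C0#
+            --  for a two-byte sequence.
+            Lead   : constant Unsigned_32 :=
+              Shift_Left (16#FF#, 8 - Length) and 16#FF#;
+         begin
+            --  Continuation bytes carry six bits each, last byte first.
+            for I in reverse 2 .. Length loop
+               Result (I) := Character'Val (16#80# or (Rest and 16#3F#));
+               Rest := Shift_Right (Rest, 6);
+            end loop;
+            Result (1) := Character'Val (Lead or Rest);
+            return Result;
+         end;
+      end Encode;
+
+   begin
+      Ada.Text_IO.Put_Line
+        ("bytes used:" & Natural'Image (Encode (16#4E2D#)'Length));
+      begin
+         Ada.Text_IO.Put_Line (Encode (16#8000_0000#));
+      exception
+         when Encoding_Error =>
+            Ada.Text_IO.Put_Line ("value has no UTF-8 encoding");
+      end;
+   end UTF_8_Demo;
+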
+****************************************************************
+
+From: Randy Brukardt
+Sent: Friday, February  4, 2005  5:59 PM
+
+> Actually, this is a problem. UTF-8 has no way of representing values
+> greater than 31-bits, so if we do allow "upper half" wide wide characters
+> they cannot be encoded in UTF-8 form. However, I think this is not terrible
+> it just means that if you try to output such a value in UTF-8 form, you get
+> an exception.
+
+It could hardly be a problem for the Standard, which never even mentions
+UTF-8; implementation-defined stuff can do what it wants.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, February  4, 2005  6:06 PM
+
+Are you sure there is no connection between 10646 and UTF-8? That's not
+what I have read in several different places.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Friday, February  4, 2005  6:17 PM
+
+I meant our standard (Ada); it never mentions UTF-8, even in AARM notes.
+That means doing anything in UTF-8 is implementation-defined from the
+perspective of Ada. Certainly there is no requirement for
+Wide_Wide_Wide_Text_IO to write in UTF-8.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, February  4, 2005  7:19 PM
+
+What an extraordinary head-in-the-sand attitude :-) Reminds me of Algol-60
+booting on all I/O.
+
+These 32-bit characters are only viable if there is a reasonably standardized
+encoding mechanism. UTF-8 is really the only reasonable candidate. You can't
+just ignore this.
+
+It would be like designing the language with very nice graphic characters and
+then saying it is up to the implementations to find out how to represent
+programs, nothing to do with us. Hmm! come to think of it, the Algol-60
+folks did that as well :-)
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Friday, February  4, 2005  8:51 PM
+
+> It would be like designing the language with very nice graphic characters and
+> then saying it is up to the implementations to find out how to represent
+> programs, nothing to do with us. Hmm! come to think of it, the Algol-60
+> folks did that as well :-)
+
+Humm, this certainly applies to Ada 95, and applies as well to Ada 2005.
+Certainly, 2.1(18), and the representation sentences of 2.1(4/2) and 2.1(5/2)
+are unchanged in Ada 2005. It seems to me that this is exactly the issue that
+Dan was worried about: a backdoor requirement to support UTF-8 everywhere. If
+it really is a requirement, it should be specified in 2.1 (and debated as
+such).
+
+Anyway, I'm not sure which Robert Dewar this is. I know that the Robert
+Dewar I know was very opposed to any sort of runtime UTF-8 support back when
+that was discussed in October 2002. I think that's one of the reasons that
+we didn't wade into encoding. And the Robert Dewar I know always has been
+very vocal that Ada shouldn't require a source representation. Perhaps that
+Robert Dewar has been replaced by a newer model?
+
+Seriously, what do you think the Ada standard should say here that it
+doesn't currently say? Generally, we've avoided questions of
+representations.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, February  4, 2005 10:07 PM
+
+> Anyway, I'm not sure which Robert Dewar this is. I know that the Robert
+> Dewar I know was very opposed to any sort of runtime UTF-8 support back when
+> that was discussed in October 2002. I think that's one of the reasons that
+> we didn't wade into encoding. And the Robert Dewar I know always has been
+> very vocal that Ada shouldn't require a source representation. Perhaps that
+> Robert Dewar has been replaced by a newer model?
+
+I am more consistent than you think.
+
+I think it is fine for the standard to say nothing about representation.
+
+BUT, and it is a big but, there must be one or more fairly reasonable
+and obvious ways of handling the representation.
+
+Although theoretically it is not required, I can't imagine an Ada
+compiler not supporting stream files in ASCII, and likewise for Ada 2005
+I can't imagine a compiler not supporting UTF-8 (GNAT has supported
+UTF-8 for years).
+
+What representation do you have in mind for supporting full 32-bit
+characters?
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Friday, February  4, 2005 10:43 PM
+
+> Although theoretically it is not required, I can't imagine an Ada
+> compiler not supporting stream files in ASCII, and likewise for Ada 2005
+> I can't imagine a compiler not supporting UTF-8 (GNAT has supported
+> UTF-8 for years).
+
+If that's true (and I agree it is), then the Standard should stop the
+charade and require UTF-8 as a source representation.
+
+But that wasn't the issue that you raised. You asked what happened if
+someone *output* a character value > 2**31. That certainly has nothing to
+do with source representation; it's purely a runtime issue.
+
+> What representation do you have in mind for supporting full 32-bit
+> characters?
+
+Where? In memory, 32-bits, I would expect. (Although many applications need
+in-memory UTF-8 support, because the space used by rarely used 32-bit
+characters is too much. It's also needed for files as well, if it's
+impractical to read the file with Text_IO - as in one string component in a
+larger database record. But that doesn't map to Very_Wide_String.) In
+Double_Wide_Text_IO, probably the same unless there was a customer demand
+for something else.
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Saturday, February  5, 2005  3:40 AM
+
+> If W_W_C has 2**31 values, then I can't safely read a text
+> file without tripping over erroneousness!  That seems pretty
+> bad to me.
+
+I would expect Wide_Wide_Text_IO to raise Data_Error if it finds a 32-bit
+element that is not in W_W_C.  Similarly, I would expect Wide_Wide_Text_IO
+to raise Data_Error if it reads a UTF-8 file that is improperly encoded
+(assuming that you support UTF-8 files).  I am not sure if this can be
+deduced from A.10.6(10), so a clarification might help, but I don't see
+any erroneousness here.
+
+Now if you read a file of 32-bit elements using, say, Stream_IO, and you
+uncheck convert the result to W_W_C, sure, you can get erroneousness, but
+you don't have my sympathy: you should be doing some checking on the data
+you get, or you should be using Wide_Wide_Character'Val.  But then the
+situation is not different from, say, Boolean: if you read a raw byte and
+trust it to be a boolean, you have a problem.
+
+So I don't see what this erroneousness fuss is about.
+
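+[Editorial sketch, not part of the original message.] The "check the
+data, then use 'Val" approach suggested above might look roughly like
+this, assuming the raw element has been read as an
+Interfaces.Unsigned_32. Checked_WWC and Is_Valid are hypothetical names
+used only for illustration.
+
+   with Ada.Text_IO;
+   with Interfaces;
+
+   procedure Checked_WWC is
+      use type Interfaces.Unsigned_32;
+
+      --  True if Raw is the position number of some value of
+      --  Wide_Wide_Character, whatever upper bound the type ends up with.
+      function Is_Valid (Raw : Interfaces.Unsigned_32) return Boolean is
+      begin
+         return Raw <= Wide_Wide_Character'Pos (Wide_Wide_Character'Last);
+      end Is_Valid;
+
+      Raw : constant Interfaces.Unsigned_32 := 16#4E2D#;
+      C   : Wide_Wide_Character;
+   begin
+      if Is_Valid (Raw) then
+         --  'Val is a checked conversion, so no erroneousness can arise.
+         C := Wide_Wide_Character'Val (Raw);
+         Ada.Text_IO.Put_Line
+           ("accepted, position" &
+            Integer'Image (Wide_Wide_Character'Pos (C)));
+      else
+         Ada.Text_IO.Put_Line ("rejected: not a Wide_Wide_Character");
+      end if;
+   end Checked_WWC;
+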
+****************************************************************
+
+From: Robert Dewar
+Sent: Saturday, February  5, 2005  8:23 AM
+
+Another issue with only allowing the limited range for WWC
+is that it means there are various constraint checks at
+run time. These seem undesirable for three reasons.
+
+1. They waste time in the normal case
+
+2. They achieve no useful result
+
+3. They force the programmer to be prepared to handle CE's
+in situations where this is inconvenient and unexpected.
+
+We also have the unfinished business of whether the
+Interfaces.C routine To_Ada that translates from char32_t
+to WWC can raise CE. In my view, if it can, then that
+clearly points out a weakness that the two types do
+not properly correspond.
+
+Note that in Ada 83, the same mistake was made. Character
+was defined to be 7 bits, and some compilers did annoying
+checks to make sure that upper half characters were not
+used. This was a real pain.
+
+It is important to realize that this pain had nothing
+whatever to do with Latin-1. What people wanted was an
+8-bit character code where neither the compiler nor the
+run-time interfered with the representations. They know
+what graphic corresponds to 16#A5# in their environment
+and the RM cannot possibly know, the best the RM can do
+is not get in the way.
+
+Well for 8-bit Character, the RM has lots of details
+about Latin-1, but most programmers can completely
+ignore this.
+
+For example, in my windows environment, if I want an
+upper case enya (is that the spelling, I mean the
+upper case N with a tilde), then I output
+Character'Val (16#A5#) and it works fine. The fact
+that the formal model of Ada thinks I am putting out
+a Yen_Sign is of no earthly relevance to me.
+
+Let's not repeat the Ada 83 mistake with Ada 2005.
+We can make sure the RM supports 10646, but let's not
+have it get in the way of supporting arbitrary 32-bit
+character sets at run time.
+
+It's bad enough to restrict the contents of string
+and character literals, but at least that you can get
+around with WWC'Value etc.
+
+But restricting the range is fatal. Even suppressing
+checks won't work, since the run time will still be
+riddled with unwanted checks, and the optimizer is
+always likely to do unhelpful things.
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Saturday, February  5, 2005  3:47 AM
+
+> Actually, this is a problem. UTF-8 has no way of representing
+> values greater than 31-bits, so if we do allow "upper half"
+> wide wide characters they cannot be encoded in UTF-8 form.
+> However, I think this is not terrible it just means that if
+> you try to output such a value in UTF-8 form, you get an exception.
+
+But then why would this value exist in the W_W_C type in the first place?
+You cannot write it in a UTF-8 file, surely you cannot read it from a
+UTF-8 file.  The same is presumably true for the other "standard" formats:
+UTF-16 and UCS-4, which are all intended to cover only 31 bits: I don't
+see why Wide_Wide_Text_IO should allow the creation of an incorrect UCS-4
+file.  So the only advantage of having a 32-bit value internally is
+presumably to make Unchecked_Conversion safe.  But since this is a type
+that has attributes 'Pos and 'Val, I don't see why you would use
+Unchecked_Conversion at all.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Saturday, February  5, 2005  7:52 AM
+
+Let's put it another way round. Pascal, you are proposing a
+restriction that some of us find a pain in the neck which
+will cause trouble. We have given our reasons (you did not
+even respond to my point about arbitrary 32-bit encodings).
+
+How about you give some reason for putting *in* the
+restriction. So far, all we have is a (to my mind
+pretty bogus) "political" argument that the restriction
+makes us better conform to 10646.
+
+I have said why I think this is bogus
+
+  a) 10646 requires support of certain stuff, nowhere does
+     it say that languages cannot support more.
+
+  b) other languages do not implement this restriction
+
+Let me give a concrete example of why this will cause
+trouble. In C, the type char32_t can be used to store
+arbitrary 32-bit values. The routines in interfaces-c
+allow such values to be converted to wide_wide_character.
+Do you really want checks all over the place here and
+raising of CE if the "conversion" fails. I think not.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Saturday, February  5, 2005  7:57 AM
+
+> But then why would this value exist in the W_W_C type in the first place?
+> You cannot write it in a UTF-8 file, surely you cannot read it from a
+> UTF-8 file.  The same is presumably true for the other "standard" formats:
+> UTF-16 and UCS-4, which are all intended to cover only 31 bits: I don't
+> see why Wide_Wide_Text_IO should allow the creation of an incorrect UCS-4
+> file.  So the only advantage of having a 32-bit value internally is
+> presumably to make Unchecked_Conversion safe.  But since this is a type
+> that has attributes 'Pos and 'Val, I don't see why you would use
+> Unchecked_Conversion at all.
+
+First, I have a very realistic scenario for getting WWC values of
+this type, namely by conversion from C types char32_t.
+
+Second, you miss the point I think about unchecked conversion. It's
+quite a normal programming practice to represent arbitrary sequences
+of bytes as type Character. Suppose in C you have a sequence of bytes
+that mixes real character values, and arbitrary integer values. Since
+C makes no difference between these types, such a mixture is quite
+reasonable. How do you map that in Ada? You have two choices:
+
+   1. Treat as array of character, either by unchecked conversion
+      or simply by acquiring say an address of string from C. Then
+      when you want an integer value, use Character'Pos.
+
+   2. Treat as array of unsigned byte, again either by unchecked
+      conversion or simply by acquiring an address. Then when you
+      want a character value, use Character'Val.
+
+Neither of these is fully satisfactory as a mapping for the C type,
+but both are workable. In practice approach 1 is likely to be more
+convenient, since it retains string literals, and access to the
+string functions.
+
+Well in the 32-bit case, exactly the same situation will arise, but
+you are arbitrarily insisting that for the 32-bit case, only approach
+2 will be used. I find that annoying.
+
+That's what the "unchecked conversion" issue is all about. It's about
+low level mucking with foreign data among other things.
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Monday, February  7, 2005  8:56 AM
+
+> How about you give some reason for putting *in* the
+> restriction.
+
+Ok, let me try to articulate my reasoning better.
+
+We have an external definition of a character set that happens to have
+2**31 values.  Why it was chosen that way by the Unicode folks is
+irrelevant.  What we are trying to do is model this definition in Ada.
+
+We have of course many possible ways to do this.  For the sake of the
+argument, we could use a 4-field record representing the structuring of
+the Unicode character space in terms of group, plane, row and cell.  We
+would do that if we thought that this structuring was the most important
+property of the Unicode character set, and that applications would need to
+access it constantly.
+
+Evidently we believe that the structuring of the character space is of
+minor importance, and that operations like comparison, range, iteration,
+and of course string literals are essential.  So we choose a character
+type.
+
+Whether we choose a character type with 2**32 values or a character type
+with 2**31 values (and size 32) depends largely on what we think are the
+most important operations from a user's perspective.
+
+- Robert and Bob apparently say that low-level operations like raw I/O,
+Unchecked_Conversion and interfacing with C will be prevalent for WWC, and
+therefore want to avoid the erroneousness that might arise if an invalid
+bit pattern is assigned to an object of WWC.
+
+- Pascal thinks that the major reason why users might want to use WWC at
+runtime is internationalization of applications.  In this perspective,
+they may have to do rather fancy processing on WWC, such as
+encoding/decoding, sorting/collating, normalization, case conversion,
+locale-dependent text processing, etc.  For this kind of algorithm, having
+2**31 extraneous literals would be a nuisance.  Users would have to
+constantly check if some WWC is one of the extraneous values, and failure
+to do so might lead to subtle bugs (not technically erroneousness, but C_E
+that would be raised once in a blue moon).  To take but one example, on a
+32-bit architecture, computations using WWC'Pos in conjunction with
+operators of root_integer might raise C_E because WWC'Pos would exceed the
+range of signed arithmetic.
+
+I'll put this question in AI 395 and it will be discussed in Paris.
+
+> Let me give a concrete example of why this will cause
+> trouble. In C, the type char32_t can be used to store
+> arbitrary 32-bit values. The routines in interfaces-c
+> allow such values to be converted to wide_wide_character.
+> Do you really want checks all over the place here and
+> raising of CE if the "conversion" fails. I think not.
+
+On this specific point, if you read carefully the C document, you'll see
+that char32_t has *at least* 32 bits.  It is guaranteed to provide enough
+storage for a Unicode character, but I can imagine that on a 64-bit
+architecture where access to 32-bit quantities would be inefficient, this
+type might have 64 bits.  So strictly speaking, a type WWC with 2**32
+values does not necessarily eliminate the checks and/or erroneousness.  (I
+admit that on most architectures it will.)
+
+****************************************************************
+
+From: Robert I. Eachus
+Sent: Monday, February  7, 2005 11:00 AM
+
+>Whether we choose a character type with 2**32 values or a character type
+>with 2**31 values (and size 32) depends largely on what we think are the
+>most important operations from a user's perspective.
+
+For characters in Character and Wide_Character,
+Wide_Wide_Character'Val(Wide_Character'Pos(X)) should be the same
+character.  This requires that
+Wide_Wide_Character'Pos(Wide_Wide_Character'First) = 0.   This is
+guaranteed by  RM 3.5.1(7).
+
+So far so good. But what about Wide_Wide_Character'Pos?  It returns a
+_universal_integer_ so the fact that some values don't fit in Integer is
+acceptable, but could cause just as many potential conversion headaches
+if Wide_Wide_Character is defined to have 2**32 values as would
+Unchecked_Conversion if Wide_Wide_Character has only 2**31 values.
+
+I guess this means that I am slightly in favor of 2**31 values, but
+Wide_Wide_Character'Size = 32.  (I'd rather use 'Pos and 'Val for
+conversions than Unchecked_Conversion where possible.)
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Monday, February  7, 2005 11:41 AM
+
+> - Robert and Bob apparently say that low-level operations like raw I/O,
+> Unchecked_Conversion and interfacing with C will be prevalent for WWC, and
+> therefore want to avoid the erroneousness that might arise if an invalid
+> bit pattern is assigned to an object of WWC.
+
+No, we are not saying that this is the prevalent use, just one useful
+use, which can be accommodated perfectly fine (without compromising in
+any way the objectives of the AI).
+
+> - Pascal thinks that the major reason why users might want to use WWC at
+> runtime is internationalization of applications.  In this perspective,
+> they may have to do rather fancy processing on WWC, such as
+> encoding/decoding, sorting/collating, normalization, case conversion,
+> locale-dependent text processing, etc.  For this kind of algorithm, having
+> 2**31 extraneous literals would be a nuisance.  Users would have to
+> constantly check if some WWC is one of the extraneous values, and failure
+> to do so might lead to subtle bugs (not technically erroneousness, but C_E
+> that would be raised once in a blue moon).  To take but one example, on a
+> 32-bit architecture, computations using WWC'Pos in conjunction with
+> operators of root_integer might raise C_E because WWC'Pos would exceed the
+> range of signed arithmetic.
+
+Why would someone using WWC in a strictly 10646 manner ever have
+out of range values? Only by doing something peculiar or erroneous.
+
+> I'll put this question in AI 395 and it will be discussed in Paris.
+>
+>> Let me give a concrete example of why this will cause
+>> trouble. In C, the type char32_t can be used to store
+>> arbitrary 32-bit values. The routines in interfaces-c
+>> allow such values to be converted to wide_wide_character.
+>> Do you really want checks all over the place here and
+>> raising of CE if the "conversion" fails. I think not.
+>
+> On this specific point, if you read carefully the C document, you'll see
+> that char32_t has *at least* 32 bits.  It is guaranteed to provide enough
+> storage for a Unicode character, but I can imagine that on a 64-bit
+> architecture where access to 32-bit quantities would be inefficient, this
+> type might have 64 bits.
+
+Please don't imagine, this type will be 32-bits on all architectures,
+just like int (no one makes int 64 bits!) There are NO architectures
+on which access to 32-bit quantities is inefficient. Such an
+architecture would be an ill-designed unusable joke.
+
+>  So strictly speaking, a type WWC with 2**32
+> values does not necessarily eliminate the checks and/or erroneousness.  (I
+> admit that on most architectures it will.)
+
+It will on all architectures, unless you can produce one counterexample, it
+is really unreasonable to argue by appealing to an imaginary situation. The
+Interfaces.C unit is about the practical world.
+
+In your imaginary world, C pointers might be pairs consisting of base address
+and offset and be totally incompatible with Ada pointers, but in practice
+this is not the case, so we don't worry about it.
+
+****************************************************************
+
+From: Bob Duff
+Sent: Monday, February  7, 2005 12:02 PM
+
+> Ok, let me try to articulate my reasoning better.
+
+Thank you.
+
+> - Robert and Bob apparently say that low-level operations like raw I/O,
+> Unchecked_Conversion and interfacing with C will be prevalent for WWC, and
+> therefore want to avoid the erroneousness that might arise if an invalid
+> bit pattern is assigned to an object of WWC.
+
+My main concern is input.  (All input is "raw", I suppose.)  I write
+code that reads sequences of Characters.  I use Text_IO.  I use
+Direct_IO.  But mainly, I use my own concoction that is "lean and mean",
+and interfaces fairly directly (and portably!) to the OS.  (A couple of
+weeks ago, I more-than-doubled the speed of one application by switching
+from Direct_IO to the "lean and mean" thing.)  I want all of these
+things to be able to read arbitrary data without being erroneous or
+raising exceptions.
+
+It seems to me reasonable to want the same thing for
+Wide_Wide_Ever_So_Wide_Characters.
+
+You suggested Data_Error at one point.  That seems like a big change.
+For plain old Character, Text_IO uses Data_Error for things like
+malformed floats, not for inability to represent the basic character set
+being read from the file.
+
+> - Pascal thinks that the major reason why users might want to use WWC at
+> runtime is internationalization of applications.  In this perspective,
+> they may have to do rather fancy processing on WWC, such as
+> encoding/decoding, sorting/collating, normalization, case conversion,
+> locale-dependent text processing, etc.
+
+I agree that all that stuff is useful.  But two points: First, you've
+got to get data in from files, even in programs that do all of the
+above.  Second, I don't think Data_Error is making life easier for such
+applications.  Now they have to catch Data_Error (and do something about
+it, necessarily losing information -- you can't tell what the data
+*is*), and *also* look the thing up in some tables to see if it's a
+defined code point.  (Or more likely, introduce a bug by forgetting
+about the Data_Error.)  I think most of the processing you mention would
+be *easier* with the suggested 2**32 character literals.
+
+My lean and mean package certainly would not want to be raising
+Data_Error under any circumstances!  With 2**31 literals, I think I'd
+end up ignoring W_W_C, and using My_Wide_Wide_Character is mod 2**32
+instead.  (And yes, I'd be annoyed that I don't get string literals.)
+
+>... To take but one example, on a
+> 32-bit architecture, computations using WWC'Pos in conjunction with
+> operators of root_integer might raise C_E because WWC'Pos would exceed the
+> range of signed arithmetic.
+
+That's a very good point, and I don't have a good answer.  Ada is broken
+in this regard, and the only good way to fix it is to support
+arbitrary-range integers, which ain't gonna happen!  (I've always been
+annoyed that I'm not allowed to say "type T is range 1..10**100;".  On
+some implementations.)
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Monday, February  7, 2005 12:24 PM
+
+One point perhaps worth mentioning is that just about no-one
+will be doing I/O of 32-bit characters directly -- it
+is hopelessly space inefficient.  Some kind of variable-
+length encoding will be used (e.g. UTF-8).  So I wonder
+whether it will make as much difference.  I suspect
+when converting *to* UTF-8 it will be nice to know you
+have values in the range 0..2**31-1.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Monday, February  7, 2005 12:37 PM
+
+Well of course in the general case you won't know this, e.g.
+if you got values from char32_t, or other strangeness is going on.
+
+If your program is well behaved and keeps to 31-bit, then this would
+be true even if WWC actually used 32-bits.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Monday, February  7, 2005  1:04 PM
+
+My main concern here is not to repeat the mistakes of Ada 83 and Ada 95.
+In both cases, we tied the definitions of the character types far too
+closely to little used standards, and as a result ended up with a bunch
+of really annoying restrictions. Examples are
+
+1. only 7-bit characters in Ada 83, when all the world was using 8-bits
+for all sorts of things.
+
+2. only latin-1 in Ada 95, when in practice, the upper half is used for
+other things in almost all environments. It is for example an annoyance
+that you cannot conveniently put all the Windows graphic characters in
+a string literal.
+
+That's why I would prefer to be permissive, and accommodate the standard,
+but not restrict to it. In particular
+
+a) I would allow 32 bits in wide wide character
+
+b) I would allow arbitrary chars with codes > 16#FF# in string and
+character literals (with the possible exception of line/para terminators).
+
+For sure we do not want to introduce incompatibilities in programs that
+do not even use WWC, and the rejection of AD is a real mistake. Minimally
+we should correct that by allowing other,format characters in strings and
+characters, but that still leaves incompatibilities with wide_string
+compared to Ada 95.
+
+Who knows what use people will make of the available upper half in 32-bit
+character mode in the next ten years? It is sure to be conveniently
+available in C, since we have 32-bits there for sure.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Monday, February  7, 2005  7:36 PM
+
+> So far so good. But what about Wide_Wide_Character'Pos?  It returns a
+> _universal_integer_ so the fact that some values don't fit in Integer is
+> acceptable, but could cause just as many potential conversion headaches
+> if Wide_Wide_Character is defined to have 2**32 values as would
+> Unchecked_Conversion if Wide_Wide_Character has only 2**31 values.
+
+I don't see a problem, universal integer is going to be 64 bits in
+any reasonable implementation anyway.
+
+> I guess this means that I am slightly in favor of 2**31 values, but
+> Wide_Wide_Character'Size = 32.  (I'd rather use 'Pos and 'Val for
+> conversions than Unchecked_Conversion where possible.)
+
+I don't see what problem you are solving by this choice. Can you
+give sample code showing the "potential .. headaches"?
+
+****************************************************************
+
+From: Robert I. Eachus
+Sent: Tuesday, February  8, 2005  1:00 AM
+
+> I don't see what problem you are solving by this choice. Can you
+> give sample code showing the "potential .. headaches".
+
+Is it really that hard to imagine? Okay, say you have a file that claims
+to be ISO10646/Unicode text in some language, let's say Chinese, encoded
+in four octets per character.  So you read the file with an instance of
+Integer_IO into an array Data_In.  To protect against garbage values,
+you now go through with
+
+for I in Data_In'Range loop
+   if Data_In(I) not in 1..Wide_Wide_Character'Pos(Wide_Wide_Character'Last)
+   then
+      Report_Some_Error;
+   end if;
+end loop;
+
+What happens if Wide_Wide_Character'Pos(Wide_Wide_Character'Last) =
+2**32-1?  It is too late at night for me to figure out whether the
+current 11.6 and 4.6(28) allow compilers to raise *Constraint_Error* in
+the if statement. ;-)  So in practice that is the net effect of making
+Wide_Wide_Character'Last 2**32-1 instead of 2**31-1, the data type that
+a user should choose when converting to an integer type is
+Interfaces.Unsigned_32 instead of Integer.  Is this a big deal?  No.
+But that is what that word *slightly* means above, I think that being
+able to convert the result of Wide_Wide_Character'Pos to Integer without
+worrying about exceptions is a bit more elegant.  I certainly won't cry
+and moan if the choice is to use the larger range, but as I said, I have
+a slight preference.
+
+Oh, and notice that this has nothing to do with the data safety issue as
+such.  IMHO, it is much safer to put constants in programs in the most
+meaningful way.  I could have written if Data_In(I) < 1 then..., in
+fact, the compiler may change the code into exactly that.  But the way I
+wrote it is much more informative--and more likely to be right if my
+understanding of the range of Wide_Wide_Character is wrong, or if it
+changes.
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Tuesday, February  8, 2005  3:05 AM
+
+> I don't see a problem, universal integer is going to be 64
+> bits in any reasonable implementation anyway.
+
+I am not quite sure how to interpret this sentence, since
+*universal_integer* has no run-time representation, and covers the
+infinite set of integer numbers.
+
+I suppose that you mean *root_integer*, not universal_integer above.  If
+that's the case, then that's a ludicrously bogus assertion (do I sound
+like RBKD? I try ;-).  In our technology root_integer has 32 bits on
+32-bit platforms, so overflows involving root_integer are a very real
+problem.  And yes, I do claim that this is a perfectly reasonable
+implementation, and we are not going to change that.  (Of course, on
+64-bit platforms root_integer has 64 bits.)
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Tuesday, February  8, 2005  3:44 AM
+
+> My main concern is input.  (All input is "raw", I suppose.)
+> I write code that reads sequences of Characters.  I use
+> Text_IO.  I use Direct_IO.  But mainly, I use my own
+> concoction that is "lean and mean", and interfaces fairly
+> directly (and portably!) to the OS.  (A couple of weeks ago,
+> I more-than-doubled the speed of one application by switching
+> from Direct_IO to the "lean and mean" thing.)  I want all of
+> these things to be able to read arbitrary data without being
+> erroneous or raising exceptions.
+
+As Tuck pointed out, I don't think it works that way in real life.  The
+vast majority of files containing WWC are *not* going to be made of raw
+32-bit elements.  They will be encoded in one way or another (UTF-8 being
+apparently the most common format these days).  So Wide_Wide_Bob_IO will
+need to read *bytes* and feed them to some decoding state machine.  If
+that state machine discovers that the external file is not well formed, it
+could either report the error (by raising an exception, returning a
+status, etc) or try to recover (which could mean erroneousness).  At any
+rate, you cannot just ignore the problem, and I don't really see that
+having 2**32 literals would help.
+
+> It seems to me reasonable to want the same thing for
+> Wide_Wide_Ever_So_Wide_Characters.
+>
+> You suggested Data_Error at one point.  That seems like a big
+> change. For plain old Character, Text_IO uses Data_Error for
+> things like malformed floats, not for inability to represent
+> the basic character set being read from the file.
+
+Hmm.  I wonder how your implementation of Wide_Text_IO works, because this
+is not really a new problem.
+
+In our implementation of Wide_Text_IO, you can specify the encoding of the
+external file using the Form parameter.  Then the bytes read from the
+external file go through an appropriate decoder, and if the decoder
+discovers that the external file is malformed it raises Data_Error.
+
+Surely this is a case where we want to raise an exception, right?  Ada is
+not about reading random junk from external files.  When I wrote this, it
+seemed that Data_Error was the right exception, because it means "there is
+something rotten in the external file".  We could argue about that, but
+that's a detail.  The point is that you can get an exception when reading
+a file of Wide_Characters, even though that type has 2**16 elements.
+
+Even for Wide_Character, the raw file format that you imagine doesn't
+exist in practice.  The closest would be the UCS-2 format, but even this
+format starts with a signature that indicates the endianness of the
+machine used to produce the file, and our implementation raises Data_Error
+if the signature is not well-formed.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Tuesday, February  8, 2005  8:16 AM
+
+> for I in Data_In'Range loop
+>  if Data_In(I) not in 1..Wide_Wide_Character'Pos(Wide_Wide_Character'Last)
+>  then Report_Some_Error;
+> end if;
+> end loop;
+
+This code is plain wrong, it can be optimized away, presumably what you
+are trying to do is to do a 'Valid test, but if so, please use 'Valid.
+The above code, if executed will work.
+
+Compilers can always raise CE because of silly capacity limitations, but
+in practice all compilers support decent sized universal integer at
+run time.
+
+If we have 32 bit characters and you want to check they are in 31
+bit range, you can do things in a far simpler way.
+
+If you must read in integer values, which seems a clear mistake if
+you have 32-bit characters, then you can simply test for negative.
+Better is to read them into WWC and just do:
+
+   subtype ISO_WWC is WWC range 0 .. WWC'Val(16#7FFF_FFFF);
+
+note that for many purposes, you probably want 16#10_FFFF# rather
+than the full range anyway.
+
+Now you just do
+
+    if Char not in ISO_WWC then ...
+
+The above messing with Pos seems awkward to me.
+
+> Oh, and notice that this has nothing to do with the data safety issue as
+> such.  IMHO, it is much safer to put constants in programs in the most
+> meaningful way.  I could have written if Data_In(I) < 1 then..., in
+> fact, the compiler may change the code into exactly that.  But the way I
+> wrote it is much more informative--and more likely to be right if my
+> understanding of the range of Wide_Wide_Character is wrong, or if it
+> changes.
+
+But it is just plain wrong to read 32-bit unsigned values into Integer,
+why would any programmer make this mistake?
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Tuesday, February  8, 2005  8:19 AM
+
+> I suppose that you mean *root_integer*, not universal_integer above.  If
+> that's the case, then that's a ludicrously bogus assertion (do I sound
+> like RBKD? I try ;-).  In our technology root_integer has 32 bits on
+> 32-bit platforms, so overflows involving root_integer are a very real
+> problem.  And yes, I do claim that this is a perfectly reasonable
+> implementation, and we are not going to change that.  (Of course, on
+> 64-bit platforms root_integer has 64 bits.)
+
+OK, I am surprised, this means for instance that you get into trouble
+with 'Size on a 32-bit platform, since on such platforms objects can
+easily have sizes in bits greater than 2**31. To limit objects to
+sizes of 2**31 or less is a serious limitation in my view. Anyway,
+true, if you are working on a platform with this limitation, you
+will have to be careful using Pos.
+
+I certainly see this limitation as an argument for not wanting to
+implement 32-bit characters, too bad ...
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Tuesday, February  8, 2005 10:23 AM
+
+Thinking this over, the fact that a major important implementation
+does not allow the 'Pos attribute to be applied to WWC if we go 32
+bits, is a really serious problem. I really think that it is essential
+that 'Pos work fine with Wide_Wide_Character. Obviously it is unreasonable
+to require 64-bit universal integer just for this purpose (and in fact
+the implementation of 64-bit universal integer is non-trivial on 32-bit
+machines if you want to avoid unnecessary 64-bit inefficiency).
+
+So, given this information combined with the points made by Eachus, I
+have changed my mind and think that 31-bit is appropriate for
+Wide_Wide_Character.
+
+(less work for me, that's what I already implemented :-)
+
+We do need to decide what happens with out of range char32_t values. I
+would think CE must be raised. Annoying to do all those junk tests, but
+in practice this will be seldom used anyway.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Tuesday, February  8, 2005 10:25 AM
+
+Oh, and I do mean root integer. Sorry, can't get out of the
+habit of thinking of this as universal integer. I have
+grokked the to-me peculiar terminology in Ada 95, which made
+sense if you really had Integer'Class but otherwise seems
+peculiar, anyway, sorry for the confusion there.
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Tuesday, February  8, 2005 10:43 AM
+
+> Obviously it is unreasonable to require
+> 64-bit universal integer just for this purpose (and in fact
+> the implementation of 64-bit universal integer is non-trivial
+> on 32-bit machines if you want to avoid unnecessary 64-bit
+> inefficiency).
+
+Right.  We gave some thought to that matter at some point and concluded
+that it was not an afternoon project.
+
+> We do need to decide what happens with out of range char32_t
+> values. I would think CE must be raised. Annoying to do all
+> those junk tests, but in practice this will be seldom used anyway.
+
+I agree that C_E is the right choice.  And I believe that Data_Error is
+appropriate if you get one of these in a file.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Tuesday, February  8, 2005 10:53 AM
+
+> I agree that C_E is the right choice.  And I believe that Data_Error is
+> appropriate if you get one of these in a file.
+
+You mean for Text_IO. As always no checks are required for stream io,
+sequential io or direct io.
+
+****************************************************************
+
+From: Robert I. Eachus
+Sent: Tuesday, February  8, 2005  3:46 PM
+
+> This code is plain wrong, it can be optimized away, presumably what you
+> are trying to do is to do a 'Valid test, but if so, please use 'Valid.
+> The above code, if executed will work.
+
+But it is not wrong if compiled on a machine with say a 36-bit Integer
+type.  I see no reason not to go to the extra effort of writing what may
+be "junk" code, if it makes the code easier to port.
+
+> Better is to read them into WWC and just do:
+>
+>   subtype ISO_WWC is WWC range 0 .. WWC'Val(16#7FFF_FFFF);
+>
+> note that for many purposes, you probably want 16#10_FFFF# rather
+> than the full range anyway.
+>
+> Now you just do
+>
+>    if Char not in ISO_WWC then ...
+>
+> The above messing with Pos seems awkward to me.
+
+Actually that should be:
+
+subtype ISO_WWC is Wide_Wide_Character range
+Wide_Wide_Character'First..Wide_Wide_Character'Val(16#7FFF_FFFF#);
+
+or
+
+subtype Unicode is Wide_Wide_Character range
+Wide_Wide_Character'First..Wide_Wide_Character'Val(16#10_FFFF#);
+
+But the first case is unnecessary if Wide_Wide_Character has 2**31 - 1
+values.  Again a slight argument in favor of doing things that way.
+
+> But it is just plain wrong to read 32-bit unsigned values into Integer,
+> why would any programmer make this mistake?
+
+Robert, I think you are ignoring my comments to the effect that all this
+is minor.  As I said, I would use Interfaces.Unsigned_32, instead of
+Integer if Wide_Wide_Character had more than 2**31 - 1 values.  Since I
+would prefer to write Integer instead of Interfaces.Unsigned_32, that
+translates into a mild preference for the narrower range.
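+
+[Editor's sketch, not part of the original message: assuming the narrower
+Unicode range is the one wanted, the subtype test and the Constraint_Error
+behavior discussed above might be combined roughly as follows. Check_Code,
+Unicode_WWC and To_WWC are hypothetical names; Interfaces.Unsigned_32 is
+the type suggested above for holding raw 32-bit values.]
+
+   with Ada.Text_IO;
+   with Interfaces;
+
+   procedure Check_Code is
+      use type Interfaces.Unsigned_32;
+
+      --  Hypothetical subtype covering the 17 Unicode planes only.
+      subtype Unicode_WWC is Wide_Wide_Character range
+        Wide_Wide_Character'First .. Wide_Wide_Character'Val (16#10_FFFF#);
+
+      --  Convert a raw 32-bit code; raise Constraint_Error explicitly
+      --  when the value lies outside the chosen range.
+      function To_WWC (Raw : Interfaces.Unsigned_32) return Unicode_WWC is
+      begin
+         if Raw > Wide_Wide_Character'Pos (Unicode_WWC'Last) then
+            raise Constraint_Error;
+         end if;
+         return Wide_Wide_Character'Val (Raw);
+      end To_WWC;
+
+      Char : Unicode_WWC;
+   begin
+      Char := To_WWC (16#4E2D#);        --  within the range
+      Ada.Text_IO.Put_Line
+        ("accepted" & Integer'Image (Wide_Wide_Character'Pos (Char)));
+
+      Char := To_WWC (16#8000_0000#);   --  outside the range
+      Ada.Text_IO.Put_Line ("not reached");
+   exception
+      when Constraint_Error =>
+         Ada.Text_IO.Put_Line ("rejected");
+   end Check_Code;
+
+[Reading the raw value into a modular type keeps the comparison free of
+overflow whichever upper bound Wide_Wide_Character ends up with; that is
+the only design point the sketch is meant to show.]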
+
+****************************************************************
+
+From: Dan Eilers
+Sent: Friday, February  4, 2005  5:11 PM
+
+Is it intended that other-format characters not be allowed
+in string literals used as operator_symbols?
+
+  x: integer := abs(3);      -- other-format allowed in abs
+  y: integer := "abs"(3);    -- other-format not allowed in "abs"
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, February  4, 2005  6:04 PM
+
+Definitely we do NOT want to allow other-format characters in this
+context, since you cannot tell at lexical analysis time the
+difference between string literals and operator symbols, and
+you definitely do not want such lexical rules to have to be
+resolved later. There is nothing in the AI that suggests
+any such intention.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Friday, February  4, 2005  6:07 PM
+
+If we allow other_format characters in string literals (as AI-395 suggests),
+then clearly there is no issue here. If we don't allow them in string
+literals, I don't see how or why we should allow them in identifiers (of
+course, that's a recommendation that is out of our hands).
+
+It's most important to be consistent here. (Whatever we do will be "wrong"
+to someone.)
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Friday, February  4, 2005  7:17 PM
+
+> If we allow other_format characters in string literals (as AI-395 suggests),
+> then clearly there is no issue here.
+
+Yes, there is, we won't strip them out in string literals, but can we use
+"a[Mongolian goodness knows what other format char]bs" for the absolute
+value operator?
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Saturday, February  5, 2005  4:15 AM
+
+Surely a clarification is required, because 6.1(10) is unclear (what does
+"correspond" mean?).  I suppose that the clarification could go either
+way, but my view would be that you take the sequence of characters from
+the string literal verbatim, and see if that sequence is appropriate for a
+reserved word.  As an example, say that - is the infamous soft hyphen.
+Starting from the string literal "a-bs", you construct the token a-bs,
+you strip the other_format characters giving abs, and you end up with the
+reserved word abs.  In other words, the answer to your question would be
+yes in my view.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Saturday, February  5, 2005  8:05 AM
+
+My goodness, there seems no end to silliness here! What *possible*
+reasonable justification can you give for it being *useful* to have
+soft hyphens in the middle of the reserved word abs in a string.
+For reserved words in normal use, we do it because they are not
+distinguishable from identifiers, and though completely and
+totally useless to have soft hyphens in between the i and f
+of IF, it's presumably harmless, and we really need this rule
+to avoid implementation nonsense, given the (somewhat dubious)
+decision (which we follow because part of the standard) to allow
+them in identifiers.
+
+For reserved words in strings, the situation is exactly the
+opposite, strings normally do NOT ignore other-format characters.
+But when used as operators, you propose they do. This means a
+completely separate circuit in the compiler to do this stripping.
+
+Right, it's relatively easy to add this nonsense. Probably won't
+take more than ten minutes to implement, but it is annoying to
+even spend the ten minutes because:
+
+a) it is a completely useless feature
+
+b) it will waste time in the lexical analyzer for cases where
+wide characters are not used at all (since you still have to
+check for them in this special situation). So far, the wide
+wide character nonsense, though nastily expensive (binary
+searches on giant tables etc), only affects the programs
+(which won't exist in practice) that use wide wide characters
+in identifiers.
+
+Can't we get out of this mode of fascination with wide wide
+characters, and get into the mode of doing a reasonable
+minimal implementation of the standard.
+
+You are proposing a new feature here, without a shred of
+input that says it is useful to anyone at all.
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Saturday, February  5, 2005 10:59 AM
+
+> My goodness, there seems no end to silliness here! What
+> *possible* reasonable justification can you give for it being
+> *useful* to have soft hyphens in the middle of the reserved
+> word abs in a string.
+
+No need to get all excited, I don't care one way or another, I could flip
+a coin, as long as things are well-defined in the RM.
+
+My assumption was that a compiler would have somewhere a routine to clean
+up an identifier (by removing other_format) and convert it to upper case.
+It seemed simpler implementation-wise to say that for operators you would
+obtain the sequence of characters from the string literal and pass it to
+the clean-and-upper-case routine.  If you think that's misguided, fine.
+
+At any rate, there is no user benefit one way or another, so we should do
+what's easiest for implementations.
+
+****************************************************************
+
+From: Dan Eilers
+Sent: Saturday, February  5, 2005  1:36 PM
+
+> It seemed simpler implementation-wise to say that for operators you would
+> obtain the sequence of characters from the string literal and pass it to
+> the clean-and-upper-case routine.
+
+I think you would need to clarify exactly when the "cleaning"
+occurs, so it is clear what happens for "**".
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Saturday, February  5, 2005  4:19 PM
+
+> No need to get all excited, I don't care one way or another, I could flip
+> a coin, as long as things are well-defined in the RM.
+
+Sorry for getting excited (well you know me, it is not really excitement,
+more a matter of argument style -- once at an ARG meeting JDI made a
+proposal and I said "That's the most ludicrous junky proposal I have
+heard in quite a while". JDI then said "Ah, now that Robert has pointed
+out that this is ludicrous and junky, I see I must be wrong" :-) :-)
+Once I had a funny discussion with someone (can't remember who) on the
+ARG who assumed that JDI and I did not get along, when in fact we are
+close friends. I guess there is one way in which I am quite different
+from Jean. I argue a point of view energetically, but if I can't convince
+a reasonable majority, I figure it is either because my point of view
+is wrong, or I am incompetent to present it. Either way, no point in
+pursuing things :-)
+
+> My assumption was that a compiler would have somewhere a routine to clean
+> up an identifier (by removing other_format) and convert it to upper case.
+> It seemed simpler implementation-wise to say that for operators you would
+> obtain the sequence of characters from the string literal and pass it to
+> the clean-and-upper-case routine.  If you think that's misguided, fine.
+
+That's extra overhead, because the more natural way of doing things is
+to discard other-format characters and do the case folding as an identifier
+is scanned. This is important, since otherwise you are going to
+have extra overhead of copying and pay a price even for the case of
+identifiers with no wide characters.
+
+> At any rate, there is no user benefit one way or another, so we should do
+> what's easiest for implementations.
+
+Well surely what is easiest for implementations is to do what they do
+now for Ada 95. The rules for identifiers have changed, the rules for
+operator symbols do not need to!
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Saturday, February  5, 2005  4:54 PM
+
+> I think you would need to clarify exactly when the "cleaning"
+> occurs, so it is clear what happens for "**".
+
+The easiest clarification is to just say that other format
+characters are either disallowed in strings or significant
+period.
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Monday, February  7, 2005  4:15 AM
+
+> I think you would need to clarify exactly when the "cleaning"
+> occurs, so it is clear what happens for "**".
+
+You're right, "**" is problematic because the sequence of characters is
+not appropriate for an identifier.
+
+All the more justification for following Robert here: if you put an
+other_format in the string literal, it does not match one of the
+acceptable operator names.  So "*-*" or "a-bs" are not operator_symbols.
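+
+[Editor's sketch, not part of the original message: under the rule just
+stated, recognizing an operator_symbol needs only case folding and a
+verbatim comparison, with no stripping pass. Is_Operator_Symbol is a
+hypothetical name, and the sketch works on Latin-1 String values only,
+where the soft hyphen is Character'Val (16#AD#).]
+
+   with Ada.Characters.Handling;
+
+   function Is_Operator_Symbol (Image : String) return Boolean is
+      --  Fold case only; other_format characters are left in place,
+      --  so they make the comparison below fail.
+      Lower : constant String := Ada.Characters.Handling.To_Lower (Image);
+   begin
+      return Lower = "and"  or else Lower = "or"  or else Lower = "xor"
+        or else Lower = "=" or else Lower = "/="  or else Lower = "<"
+        or else Lower = "<=" or else Lower = ">"  or else Lower = ">="
+        or else Lower = "+" or else Lower = "-"   or else Lower = "&"
+        or else Lower = "*" or else Lower = "/"   or else Lower = "mod"
+        or else Lower = "rem" or else Lower = "**"
+        or else Lower = "abs" or else Lower = "not";
+   end Is_Operator_Symbol;
+
+[With this sketch, Is_Operator_Symbol ("ABS") is True, while
+Is_Operator_Symbol ("a" & Character'Val (16#AD#) & "bs") is False, since
+the four-character string matches none of the names.]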
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Saturday, February  5, 2005  4:48 PM
+
+I am a little worried about ambiguities introduced by new
+definitions in packages like Ada.Characters.Handling and
+Interfaces.C. Our ASIS builds got blown with a message:
+
+gnatelim-nodes.adb:252:17: ambiguous expression (cannot resolve "To_String")
+gnatelim-nodes.adb:252:17: possible interpretation at a-chahan.ads:116
+gnatelim-nodes.adb:252:17: possible interpretation at a-chahan.ads:112
+
+I did not look at the sources, but the declarations in question are:
+
+    function To_String
+      (Item       : Wide_Wide_String;
+       Substitute : Character := ' ')      return String;
+
+    function To_String
+      (Item       : Wide_String;
+       Substitute : Character := ' ')      return String;
+
+Now it is true that ASIS is one of the few applications to use
+wide_character and wide_string types. Furthermore, the code in
+this particular case is absurd:
+
+    return To_String ("EMPTY KEY!!!");
+
+but it is possible to imagine legitimate cases, e.g. something
+like To_String ("["A325"]");  -- get encoded version of wide char
+
+Wouldn't it be better to add these new declarations to a new
+child package Ada.Wide_Wide_Characters.Handling? In that case
+we could put all the wonderful 10646 categorization stuff there
+too either now or later.
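+
+[Editor's sketch, not part of the original message: the ambiguity arises
+because a string literal can be of either string type. The two local
+functions below are hypothetical stand-ins for the overloads quoted above
+(they only report which profile was chosen); a qualified expression is one
+way a caller could resolve the call, at the cost of editing existing code.]
+
+   with Ada.Text_IO;
+
+   procedure Overload_Demo is
+      --  Stand-in for the Wide_String overload.
+      function To_String (Item : Wide_String) return String is
+      begin
+         return "Wide_String, length" & Integer'Image (Item'Length);
+      end To_String;
+
+      --  Stand-in for the Wide_Wide_String overload.
+      function To_String (Item : Wide_Wide_String) return String is
+      begin
+         return "Wide_Wide_String, length" & Integer'Image (Item'Length);
+      end To_String;
+   begin
+      --  To_String ("EMPTY KEY!!!") would be rejected as ambiguous here,
+      --  because the literal fits both profiles.
+      Ada.Text_IO.Put_Line (To_String (Wide_String'("EMPTY KEY!!!")));
+   end Overload_Demo;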
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Monday, February  7, 2005  4:45 AM
+
+Well, it's clear to me that Ada.Characters.Handling was botched in Ada 95
+in the sense that all the operations that involve the Wide_ types should
+have been part of a child named Ada.Wide_Characters.Handling.  Then we
+could just have replicated the structure for the Wide_Wide_ types.
+
+We could follow your suggestion, and if we did we would have for
+consistency to create a child Ada.Wide_Characters.Handling, too.
+Unfortunately, the Wide_ operations would have to remain in
+Ada.Characters.Handling and be renamed in Ada.Wide_Characters.Handling.
+This would give the impression that the Wide_ types are actually "more
+important" than the Wide_Wide_ types.
+
+My view is that Wide_Character should probably not be used in new
+applications (ignoring ASIS).  In this area, there are two categories of
+applications: those (the vast majority) which do not care about i18n, and
+use Character and will continue to ignore the fancy character sets; and
+those which do care about i18n.  The latter category should really be
+using Wide_Wide_Character, not Wide_Character.  The reason is that the BMP
+is where the Unicode guys stuffed the most frequently used characters, but
+the choice of what went into the BMP was somewhat arbitrary.  If you want
+to do a good job of supporting Asian languages, you must also handle the
+characters in plane 2 (SIP), ie use Wide_Wide_Character.
+
+I don't feel strongly, but it seems rather insufficiently broken.
+
+> but it is possible to imagine legitimate cases, e.g.
+> something like To_String ("["A325"]");  -- get encoded
+> version of wide char
+
+I don't quite understand this example, btw.  It seems that this call would
+return " " anyway, so it's not exactly useful.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Monday, February  7, 2005  11:36 AM
+
+> Well, it's clear to me that Ada.Characters.Handling was botched in Ada 95
+> in the sense that all the operations that involve the Wide_ types should
+> have been part of a child named Ada.Wide_Characters.Handling.  Then we
+> could just have replicated the structure for the Wide_Wide_ types.
+
+Fine but it is not our mission to improve the Ada 95 standard by
+creating gratuitous incompatibilities. We only introduce
+upward-incompatible changes if there is a really good argument.
+
+Here you have no good technical argument other than a matter of
+consistency and taste. That's not good enough for causing troubles
+by introducing incompatibilities.
+>
+> We could follow your suggestion, and if we did we would have for
+> consistency to create a child Ada.Wide_Characters.Handling, too.
+
+If you like, it's harmless (though fairly useless) to do so.
+
+> Unfortunately, the Wide_ operations would have to remain in
+> Ada.Characters.Handling and be renamed in Ada.Wide_Characters.Handling.
+
+I agree, but given your willingness to introduce incompatible change
+I am a little surprised at your statement of this as obvious.
+
+> This would give the impression that the Wide_ types are actually "more
+> important" than the Wide_Wide_ types.
+
+So what? In any case it's true, Wide_ is more important because of
+compatibility issues and current usage. Wide_Wide is not being put
+in because of overriding demand from real users!
+>
+> My view is that Wide_Character should probably not be used in new
+> applications (ignoring ASIS).
+
+But compatibility is about worrying about new applications, and why
+ignore ASIS? The majority of our users using Wide_Character are doing
+so in an ASIS context.
+
+I really think a case has not been made for introducing incompatibilities.
 
 ****************************************************************
 
