CVS difference for ais/ai-00400.txt

Differences between version 1.5 and version 1.6

--- ais/ai-00400.txt	2005/10/31 05:18:41	1.5
+++ ais/ai-00400.txt	2006/01/10 22:17:50	1.6
@@ -261,4 +261,410 @@
 
 !appendix
 
+From: Gary Dismukes
+Sent: Monday, December  5, 2005  7:48 PM
+
+I'm posting this comment from Robert Dewar at his request...
+
+-----------------
+Comment on AI-400
+
+The recommendations of this AI seem seriously flawed.
+
+I have two major objections.
+
+First, the inclusion of Wide_Wide stuff in Ada.Exceptions and
+Ada.Tags seems a major mistake. This means that any program
+using tagged types or exception handling implicitly or
+explicitly will end up using Wide_Character and
+Wide_Wide_Character. That seems very unfortunate.
+Certainly it makes the No_Wide_Characters restriction
+that GNAT added quite useless, since in practice virtually
+every program will use wide wide characters. We have
+avoided this kind of entanglement up to now, and I
+think we should avoid it here.
+
+Note that the processing for Wide_Wide_Expanded_Name
+and Wide_Wide_Exception_Name is likely to include
+all the handling of the various complex formats for
+encoding wide [wide] character stuff. This is quite
+complex, and is quite a bit of code. We are very
+unhappy to see this code included in virtually every
+program.
+
+Second, this is really a major implementation pain, since
+the compiler itself is an Ada program which now unavoidably
+uses Ada 2005 features in the compiler itself, namely the
+Wide_Wide_Character support. This causes serious bootstrap
+problems.
+
+Yes, yes, we can work through this by having a compiler
+specific version of Ada.Exceptions but this introduces
+an enormous amount of complexity in the build process.
+It seems truly horrible to have to do this just for this
+rather obscure feature.
+
+Constructive suggestion: move the wide and wide_wide
+subprograms to child units. Nice names for these child
+units would be Wide and Wide_Wide, then you write
+
+   Ada.Exceptions.Wide.Exception_Name
+
+which reads quite as well as
+
+   Ada.Exceptions.Wide_Exception_Name
+
 ****************************************************************
+
+From: Pascal Leroy
+Sent: Tuesday, December  6, 2005  2:08 AM
+
+On the process:
+
+This comment comes sufficiently late that it certainly won't be taken into
+account in draft 15 which will go to WG9 later this week.  So if someone
+cares enough, it will have to be resubmitted through some official channel
+during the review period.  At any rate, this doesn't prevent the ARG from
+discussing it.
+
+On the substance:
+
+True, there is an implementation difficulty here, but I say tough luck.
+The language doesn't seem broken, and I don't see a reason to change it
+each time someone runs into a feature that is hard or inconvenient to
+implement.
+
+We are actually in the process of implementing this stuff, and by playing
+games with pragma Import (Ada) we believe that we will be able to only
+pull in the wide wide character encoding/decoding stuff when it's actually
+needed.  We also have a build process that knows how to handle three
+dialects with different predefined units (ran into this about 10 years
+ago).
+
+I realize that the proposed change is modest, but any change at this stage
+has to have a strong justification, and I just don't see the
+justification.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Tuesday, December  6, 2005  1:16 PM
+
+> True, there is an implementation difficulty here, but I say tough luck.
+> The language doesn't seem broken, and I don't see a reason to change it
+> each time someone runs into a feature that is hard or inconvenient to
+> implement.
+
+Just so things are clear, the implementation difficulties are NOT
+my major objection. We can get around this (our approach will be to
+introduce implicit child units, which we do for the generic packages
+of Text_IO) in any case.
+
+I just think it is conceptually wrong for all Ada programs to drag in
+wide wide character unconditionally.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Tuesday, December  6, 2005  1:48 PM
+
+> I just think it is conceptually wrong for all Ada programs to drag in
+> wide wide character unconditionally.
+
+One could argue that that is a very limited view; all programs ought to
+support international character sets because it is easy to do so. (I presume
+this is the reasoning behind the SC 22 mandate for international character
+support.)
+
+The problem is, of course, that Ada 2005 does not make it easy; there is far
+too much space waste to use Wide_Wide_Strings universally, and Ada has no
+language-defined support for UTF-8 strings (which, if universally supported,
+would meet all of the internationalization goals with little additional
+cost).
+
+Given that compilers probably will internally represent identifiers in UTF-8
+or a similar encoding, it's annoying that you have to convert that to
+Wide_Wide, and then the user probably has to convert it back to UTF-8.
+
+Anyway, way too late to do anything about that for this iteration.
+
+In any case, we're talking about a fairly small piece of code (certainly
+smaller than tasking, and probably smaller than exception handling); it's
+hard to imagine that it would matter other than in extreme cases. The
+'Image/'Value code is probably the largest part of the code (because of the
+data tables), and I'd be surprised if it exceeded 4K.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Tuesday, December  6, 2005  4:47 PM
+
+> One could argue that that is a very limited view; all programs ought to
+> support international character sets because it is easy to do so. (I presume
+> this is the reasoning behind the SC 22 mandate for international character
+> support.)
+
+That's a truly bizarre argument. Most of our customers' programs barely
+deal with strings at all, it's just not part of the embedded scene. I
+suspect you think purely in native terms :-)
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Tuesday, December  6, 2005  5:09 PM
+
+> That's a truly bizarre argument. Most of our customers' programs barely
+> deal with strings at all, it's just not part of the embedded scene. I
+> suspect you think purely in native terms :-)
+
+Surely, if you don't use any strings at all, you shouldn't load *any* string
+support. We're only talking about programs that use strings, and for which
+the additional overhead of supporting Wide_Wide_ strings is significant.
+That's a pretty small set, since the overhead is pretty small.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Tuesday, December  6, 2005  5:23 PM
+
+> Surely, if you don't use any strings at all, you shouldn't load *any* string
+> support. We're only talking about programs that use strings, and for which
+> the additional overhead of supporting Wide_Wide_ strings is significant.
+> That's a pretty small set, since the overhead is pretty small.
+
+Well you are perhaps assuming that the linker can eliminate unused
+subprograms. Unfortunately such linker elimination technology is
+by no means universally supported on all operating systems, so in
+practice if a unit WITH's stuff, you get that stuff whether or not
+it is used.
+
+The problem in this case is that the Ada.Exceptions body will end up
+withing the support for wide character encoding.
+
+P.S. I did not say "don't use any strings at all", please be careful
+not to distort my contributions :-), I said "barely deal with strings
+at all", which is rather different! Typically an embedded app may use
+strings for some messages to the external world, but I doubt me these
+have to be able to be in Chinese, especially when we are talking US
+weapons systems for instance :-)
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Tuesday, December  6, 2005  4:46 PM
+
+> In any case, we're talking about a fairly small piece of code (certainly
+> smaller than tasking, and probably smaller than exception handling); it's
+> hard to imagine that it would matter other than in extreme cases. The
+> 'Image/'Value code is probably the largest part of the code (because of the
+> data tables), and I'd be surprised if it exceeded 4K.
+
+I guess you live in a different world, but if we told our users that
+Ada 2005 would cost them 4K increase in size even if they did not use
+any of its features, some would be upset.
+
+Actually the code is much larger than 4K, since it is not just UTF-8 that
+has to be supported, but lots of other formats that unlike UTF-8 are in
+actual wide use (such as Shift-JIS in the Japanese market, and the two
+byte upper half coding used in China).
+
+Luckily there are (fairly strenuous) tricks for avoiding any penalty
+at all if the features are not used (as I mentioned before, we use
+similar techniques to ensure that Text_IO does not drag in junk like
+fpt conversions when you don't need it).
+
+P.S. I am puzzled by the reference to 'Image and 'Value. I cannot imagine
+this code being loaded unless the feature was actually used.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Tuesday, December  6, 2005  5:23 PM
+
+> I guess you live in a different world, but if we told our users that
+> Ada 2005 would cost them 4K increase in size even if they did not use
+> any of its features, some would be upset.
+
+Not at all; we've paid as much or more attention to space issues than any
+other Ada vendor. I wouldn't be surprised at all if there exist such
+customers. But every version of a compiler changes its space usage (because
+of bug fixes, new features, whatever), and some of those increase it. But
+most customers find that net size is smaller or irrelevantly bigger.
+Certainly 4K (and I doubt it would be that large) would fall into the noise
+for almost all customers.
+
+> Actually the code is much larger than 4K, since it is not just UTF-8 that
+> has to be supported, but lots of other formats that unlike UTF-8 are in
+> actual wide use (such as Shift-JIS in the Japanese market, and the two
+> byte upper half coding used in China).
+
+Huh? We're only talking about the runtime routines that return the names of
+Ada identifiers. A compiler is going to code these in only one way; the
+encoding is not visible to the users so it should be irrelevant to them. The
+runtime need support only that one decoding (and no encoding) for this
+purpose -- that keeps the size down. Indeed, you'd need no encoding at all
+if you simply stored identifiers in Wide_Wide_Strings (I think that's
+impractical for space reasons, but it is an option).
+
+Other encodings can be needed, of course, but they can be (and should be)
+limited to other parts of the runtime (Text_IO, implementation-defined
+libraries), and they thus are not required by all programs, only the
+programs that need them.
+
+> Luckily there are (fairly strenuous) tricks for avoiding any penalty
+> at all if the features are not used (as I mentioned before, we use
+> similar techniques to ensure that Text_IO does not drag in junk like
+> fpt conversions when you don't need it).
+>
+> P.S. I am puzzled by the reference to 'Image and 'Value. I cannot imagine
+> this code being loaded unless the feature was actually used.
+
+Well, I was thinking of our implementation. Because of shared generics,
+there is a single discrete Image routine (and a matching Value routine). The
+Image routine is used by the default exception handler's message code, so
+removing it is impossible. (Of course, an embedded user could recompile the
+runtime without the default exception handler, and some did, in which case
+Image could be removed.) Value is loaded only if it is used, but that is
+*any* discrete use (including Text_IO), so it usually is loaded. In any
+case, these routines are all quite small (only a couple of hundred bytes),
+so there isn't much advantage to not loading them. (The fixed and floating
+point versions are a different story, of course). YMMV.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Tuesday, December  6, 2005  5:32 PM
+
+>>Actually the code is much larger than 4K, since it is not just UTF-8 that
+>>has to be supported, but lots of other formats that unlike UTF-8 are in
+>>actual wide use (such as Shift-JIS in the Japanese market, and the two
+>>byte upper half coding used in China).
+>
+> Huh? We're only talking about the runtime routines that return the names of
+> Ada identifiers. A compiler is going to code these in only one way; the
+> encoding is not visible to the users so it should be irrelevant to them. The
+> runtime need support only that one decoding (and no encoding) for this
+> purpose -- that keeps the size down. Indeed, you'd need no encoding at all
+> if you simply stored identifiers in Wide_Wide_Strings (I think that's
+> impractical for space reasons, but it is an option).
+
+That's not the way things work for us. The name of an identifier, for
+use by Expanded_Name and Exception_Name is stored by the compiler using
+the encoding that was used in the source. That's what you want for
+Expanded_Name, since then if you display this expanded name as a normal
+string, it will look correct on the screen, the same way it looked
+when the source program was displayed. It would be quite useless if
+for example UTF-8 were always used, since if you are in a shift-JIS
+environment, the resulting string would display as gobbledygook. Most
+likely Expanded_Name is going to be used for display purposes.
+
+Wide_Expanded_Name and Wide_Wide_Expanded_Name could be specially handled
+by the compiler, but that's heavy for features that will likely never be
+used except in ACATS tests, so the easiest thing is simply to translate
+the String to Wide_String or Wide_Wide_String using the appropriate
+encoding. That's the least source code, and it hardly seems worth
+spending much effort to optimize these functions!
+
+> Well, I was thinking of our implementation. Because of shared generics,
+> there is a single discrete Image routine (and a matching Value routine). The
+> Image routine is used by the default exception handler's message code, so
+> removing it is impossible. (Of course, an embedded user could recompile the
+> runtime without the default exception handler, and some did, in which case
+> Image could be removed.) Value is loaded only if it is used, but that is
+> *any* discrete use (including Text_IO), so it usually is loaded. In any
+> case, these routines are all quite small (only a couple of hundred bytes),
+> so there isn't much advantage to not loading them. (The fixed and floating
+> point versions are a different story, of course). YMMV.
+
+OK, but that has nothing to do with the case at hand.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Tuesday, December  6, 2005  6:02 PM
+
+> That's not the way things work for us. The name of an identifier, for
+> use by Expanded_Name and Exception_Name is stored by the compiler using
+> the encoding that was used in the source. That's what you want for
+> Expanded_Name, since then if you display this expanded name as a normal
+> string, it will look correct on the screen, the same way it looked
+> when the source program was displayed. It would be quite useless if
+> for example UTF-8 were always used, since if you are in a shift-JIS
+> environment, the resulting string would display as gobbledygook. Most
+> likely Expanded_Name is going to be used for display purposes.
+
+That's fine, but that's certainly outside of the Ada standard. If a user
+wants to do that portably in Ada 2005, they'll use Wide_Wide_Expanded_Name
+to get the identifier, and then use some appropriate output routine to
+format it into the local encoding. That is, Expanded_Name is intended to be
+used only for the 8-bit version of the identifier; if you expect/want to get
+extended characters, use the Wide_ or Wide_Wide_ versions.
+
+Now, what usually happens when I make some argument like yours, is that
+someone says that "you're welcome to go beyond the Ada Standard, but then
+it's your obligation to work however hard is required to make that work".
+Followed by "don't cry to us, you *chose* to work harder than required by
+the standard". It's just the old joke "Patient: Doctor, it hurts when I do
+this. Doctor: Then don't do that!".
+
+So, I don't expect a lot of sympathy. :-)
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Wednesday, December  7, 2005  2:45 AM
+
+> That's not the way things work for us. The name of an
+> identifier, for use by Expanded_Name and Exception_Name is
+> stored by the compiler using the encoding that was used in
+> the source. That's what you want for Expanded_Name, since
+> then if you display this expanded name as a normal string, it
+> will look correct on the screen, the same way it looked when
+> the source program was displayed. It would be quite useless
+> if for example UTF-8 were always used, since if you are in a
+> shift-JIS environment, the resulting string would display as
+> gobbledygook.
+
+I find this approach curious.  On the face of it, it would seem that two
+users working side by side on environments with different configurations
+would not be able to share programs: the UTF-8 guy having created a
+program that includes a string literal encoded in UTF-8, this program is
+going to spit out gobbledygook on the shift-JIS guy's screen.
+
+But maybe you have options to accommodate this (arguably rare) situation?
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Wednesday, December  7, 2005  6:20 PM
+
+I can't imagine such a side by side situation in practice (we are
+talking about Japan here, where shift-JIS is standard). We don't
+accept multiple wide char representations in the same source (it
+is impossible to do this, the multiple representations are not
+mutually compatible) [though we always accept brackets notation].
+
+So if I am a Shift-JIS guy, and I get a program that is UTF-8 encoded,
+I have to know about this. I have two options:
+
+a) compile with utf-8 encoding
+
+But in this case, the source program itself looks like gobbledygook
+on my screen, and it seems quite right that Expanded_Name should
+faithfully reproduce this gobbledygook.
+
+b) more likely, translate sources to my native representation
+
+Now if the binary is being handed around, it seems a plain bad
+idea to output an Expanded_Name value. It is not reasonably possible
+to make this output something sensible in all environments. If
+you want this kind of portability, then you should do as Randy says.
+Use Wide_Wide_Expanded_Name, and output the result in a careful
+portable manner.
+
+In practice programs that output wide character stuff are just
+not easily portable between environments, real care has to be
+taken to achieve this.
+
+****************************************************************
+
