!standard 3.9(7)                                    05-06-01  AI95-00400/04
!standard 3.9(10)
!standard 11.4.1(2)
!standard 11.4.1(5)
!standard 11.4.1(12)
!standard C.5(7)
!class amendment 05-01-25
!status Amendment 200Y 05-03-03
!status ARG Approved 10-0-0 05-02-13
!status work item 05-01-25
!status received 05-01-25
!priority High
!difficulty Easy
!subject Wide_ and Wide_Wide_ images of identifiers

!summary

(See proposal.)

!problem

Now that identifiers can use 16- and 32-bit characters, the operations that construct the image of identifiers may end up having to create Wide_ or Wide_Wide_Strings. This was acknowledged by the introduction of attributes Wide_Wide_Image, Wide_Wide_Value, etc.

However, there are functions in packages Ada.Tags and Ada.Exceptions which return the image of the (full) name of a tag or an exception. At a minimum, we must specify what these functions do when they encounter an identifier with a character which is not in subtype Character.

Note that this AI is *not* proposing to introduce Wide_ or Wide_Wide_String variants of the numerous functions that operate on String in the predefined units (e.g., Ada.Text_IO.Open). This would be a much bigger change, and we haven't seen demand for this. What this AI is doing is fixing an inconsistency that was missed when 16- and 32-bit characters were introduced.

!proposal

(See wording.)

!wording

Change 3.9(7) to include the following declarations:

    function Expanded_Name(T : Tag) return String;
    function Wide_Expanded_Name(T : Tag) return Wide_String;
    function Wide_Wide_Expanded_Name(T : Tag) return Wide_Wide_String;

Change 3.9(10) to read:

The function Wide_Wide_Expanded_Name returns the full expanded name of the first subtype of the specific type identified by the tag, in upper case, starting with a root library unit. The result is implementation defined if the type is declared within an unnamed block_statement.
The function Expanded_Name (respectively, Wide_Expanded_Name) returns the same sequence of graphic characters as that defined for Wide_Wide_Expanded_Name, if all the graphic characters are defined in Character (respectively, Wide_Character); otherwise, the sequence of characters is implementation defined, but no shorter than that returned by Wide_Wide_Expanded_Name for the same value of the argument.

Change 11.4.1(2) to include the following declarations:

    function Exception_Name(Id : Exception_Id) return String;
    function Wide_Exception_Name(Id : Exception_Id) return Wide_String;
    function Wide_Wide_Exception_Name(Id : Exception_Id) return Wide_Wide_String;

Change 11.4.1(5) to include the following declarations:

    function Exception_Name(X : Exception_Occurrence) return String;
    function Wide_Exception_Name(X : Exception_Occurrence) return Wide_String;
    function Wide_Wide_Exception_Name(X : Exception_Occurrence) return Wide_Wide_String;

Change 11.4.1(12) to read:

The Wide_Wide_Exception_Name functions return the full expanded name of the exception, in upper case, starting with a root library unit. For an exception declared immediately within package Standard, the defining_identifier is returned. The result is implementation defined if the exception is declared within an unnamed block_statement.

The Exception_Name functions (respectively, Wide_Exception_Name) return the same sequence of graphic characters as that defined for Wide_Wide_Exception_Name, if all the graphic characters are defined in Character (respectively, Wide_Character); otherwise, the sequence of characters is implementation defined, but no shorter than that returned by Wide_Wide_Exception_Name for the same value of the argument.

Change C.5(7) to read:

If the pragma applies to an enumeration type, then the semantics of the Wide_Image and Wide_Value attributes are implementation defined for that type; the semantics of Image and Value are still defined in terms of Wide_Image and Wide_Value.
In addition, the semantics of Text_IO.Enumeration_IO are implementation defined. If the pragma applies to a tagged type, then the semantics of the Tags.Wide_Wide_Expanded_Name function are implementation defined for that type; the semantics of Tags.Expanded_Name and Tags.Wide_Expanded_Name are still defined in terms of Tags.Wide_Wide_Expanded_Name. If the pragma applies to an exception, then the semantics of the Exceptions.Wide_Wide_Exception_Name function are implementation defined for that exception; the semantics of Exceptions.Exception_Name and Exceptions.Wide_Exception_Name are still defined in terms of Exceptions.Wide_Wide_Exception_Name.

!discussion

The following approaches were considered:

1 - Don't change the specification of Ada.Tags and Ada.Exceptions; the functions return an implementation-defined string when the full name contains a wide or wide-wide character. Users who take advantage of the extended character set cannot count on any portability in this area.

2 - Change the existing functions to return Wide_Wide_String instead of String. Presumably this is the worst incompatibility, as many existing usages would have to be changed, including code that prints or stores these strings for logging purposes.

3 - Add extra overloads returning Wide_String and Wide_Wide_String. This introduces some incompatibilities, mostly in code that uses these functions in conjunction with string literals.

4 - Add new functions with distinct names. This still introduces incompatibilities, but only in code that has use clauses and declares identifiers that clash, and this is extremely unlikely.

This AI was written for option 4. The wording is then similar to that for the Wide_Wide_ attributes: the semantics is well defined for the function returning Wide_Wide_String, and for the others as long as the full name is only made up of Characters (resp. Wide_Characters). Otherwise, the semantics is implementation defined.
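
As an illustration of option 4, the following is a minimal sketch of how a caller might use the distinctly named functions next to the existing String ones. The names Show_Names, Shapes, Shape and Overflow are hypothetical and appear nowhere in this AI; the output calls from Ada.Text_IO and Ada.Wide_Wide_Text_IO are just one plausible way to display the results.

```ada
with Ada.Exceptions;
with Ada.Tags;
with Ada.Text_IO;
with Ada.Wide_Wide_Text_IO;

procedure Show_Names is

   --  Hypothetical declarations, used only for illustration.
   package Shapes is
      type Shape is tagged null record;
   end Shapes;

   Overflow : exception;

begin
   --  Existing function: well defined here because every character of
   --  the expanded name is in Character.
   Ada.Text_IO.Put_Line (Ada.Tags.Expanded_Name (Shapes.Shape'Tag));

   --  New function: well defined for any identifier, including one that
   --  uses characters outside Character or Wide_Character.
   Ada.Wide_Wide_Text_IO.Put_Line
     (Ada.Tags.Wide_Wide_Expanded_Name (Shapes.Shape'Tag));

   raise Overflow;
exception
   when E : others =>
      --  The same pattern applies to exception names.
      Ada.Wide_Wide_Text_IO.Put_Line
        (Ada.Exceptions.Wide_Wide_Exception_Name (E));
end Show_Names;
```

Because the new functions have their own names, existing calls such as Expanded_Name (T) & " raised" stay unambiguous, which is the incompatibility that option 3 would have introduced.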
!example

!corrigendum 3.9(7)

@drepl
@xcode<    @b<function> Expanded_Name(T : Tag) @b<return> String;
    @b<function> External_Tag(T : Tag) @b<return> String;
    @b<function> Internal_Tag(External : String) @b<return> Tag;>
@dby
@xcode<    @b<function> Expanded_Name(T : Tag) @b<return> String;
    @b<function> Wide_Expanded_Name(T : Tag) @b<return> Wide_String;
    @b<function> Wide_Wide_Expanded_Name(T : Tag) @b<return> Wide_Wide_String;
    @b<function> External_Tag(T : Tag) @b<return> String;
    @b<function> Internal_Tag(External : String) @b<return> Tag;>

!corrigendum 3.9(10)

@drepl
The function Expanded_Name returns the full expanded name of the first subtype of the specific type identified by the tag, in upper case, starting with a root library unit. The result is implementation defined if the type is declared within an unnamed @fa<block_statement>.
@dby
The function Wide_Wide_Expanded_Name returns the full expanded name of the first subtype of the specific type identified by the tag, in upper case, starting with a root library unit. The result is implementation defined if the type is declared within an unnamed @fa<block_statement>.

The function Expanded_Name (respectively, Wide_Expanded_Name) returns the same sequence of graphic characters as that defined for Wide_Wide_Expanded_Name, if all the graphic characters are defined in Character (respectively, Wide_Character); otherwise, the sequence of characters is implementation defined, but no shorter than that returned by Wide_Wide_Expanded_Name for the same value of the argument.
!corrigendum 11.4.1(2)

@drepl
@xcode<@b<package> Ada.Exceptions @b<is>
    @b<type> Exception_Id @b<is private>;
    Null_Id : @b<constant> Exception_Id;
    @b<function> Exception_Name(Id : Exception_Id) @b<return> String;>
@dby
@xcode<@b<package> Ada.Exceptions @b<is>
    @b<type> Exception_Id @b<is private>;
    Null_Id : @b<constant> Exception_Id;
    @b<function> Exception_Name(Id : Exception_Id) @b<return> String;
    @b<function> Wide_Exception_Name(Id : Exception_Id) @b<return> Wide_String;
    @b<function> Wide_Wide_Exception_Name(Id : Exception_Id) @b<return> Wide_Wide_String;>

!corrigendum 11.4.1(5)

@drepl
@xcode<    @b<function> Exception_Identity(X : Exception_Occurrence) @b<return> Exception_Id;
    @b<function> Exception_Name(X : Exception_Occurrence) @b<return> String;
        -- @ft<@i>
    @b<function> Exception_Information(X : Exception_Occurrence) @b<return> String;>
@dby
@xcode<    @b<function> Exception_Identity(X : Exception_Occurrence) @b<return> Exception_Id;
    @b<function> Exception_Name(X : Exception_Occurrence) @b<return> String;
        -- @ft<@i>
    @b<function> Wide_Exception_Name(X : Exception_Occurrence) @b<return> Wide_String;
        -- @ft<@i>
    @b<function> Wide_Wide_Exception_Name(X : Exception_Occurrence) @b<return> Wide_Wide_String;
        -- @ft<@i>
    @b<function> Exception_Information(X : Exception_Occurrence) @b<return> String;>

!corrigendum 11.4.1(12)

@drepl
The Exception_Name functions return the full expanded name of the exception, in upper case, starting with a root library unit. For an exception declared immediately within package Standard, the @fa<defining_identifier> is returned. The result is implementation defined if the exception is declared within an unnamed @fa<block_statement>.
@dby
The Wide_Wide_Exception_Name functions return the full expanded name of the exception, in upper case, starting with a root library unit. For an exception declared immediately within package Standard, the @fa<defining_identifier> is returned. The result is implementation defined if the exception is declared within an unnamed @fa<block_statement>.
The Exception_Name functions (respectively, Wide_Exception_Name) return the same sequence of graphic characters as that defined for Wide_Wide_Exception_Name, if all the graphic characters are defined in Character (respectively, Wide_Character); otherwise, the sequence of characters is implementation defined, but no shorter than that returned by Wide_Wide_Exception_Name for the same value of the argument.

!corrigendum C.5(7)

@drepl
If the pragma applies to an enumeration type, then the semantics of the Wide_Image and Wide_Value attributes are implementation defined for that type; the semantics of Image and Value are still defined in terms of Wide_Image and Wide_Value. In addition, the semantics of Text_IO.Enumeration_IO are implementation defined. If the pragma applies to a tagged type, then the semantics of the Tags.Expanded_Name function are implementation defined for that type. If the pragma applies to an exception, then the semantics of the Exceptions.Exception_Name function are implementation defined for that exception.
@dby
If the pragma applies to an enumeration type, then the semantics of the Wide_Image and Wide_Value attributes are implementation defined for that type; the semantics of Image and Value are still defined in terms of Wide_Image and Wide_Value. In addition, the semantics of Text_IO.Enumeration_IO are implementation defined. If the pragma applies to a tagged type, then the semantics of the Tags.Wide_Wide_Expanded_Name function are implementation defined for that type; the semantics of Tags.Expanded_Name and Tags.Wide_Expanded_Name are still defined in terms of Tags.Wide_Wide_Expanded_Name. If the pragma applies to an exception, then the semantics of the Exceptions.Wide_Wide_Exception_Name function are implementation defined for that exception; the semantics of Exceptions.Exception_Name and Exceptions.Wide_Exception_Name are still defined in terms of Exceptions.Wide_Wide_Exception_Name.
!ACATS test

ACATS C-Test(s) should be created to check that these functions exist and work as specified. (This will require Unicode source code for these tests.)

!appendix

From: Gary Dismukes
Sent: Monday, December 5, 2005 7:48 PM

I'm posting this comment from Robert Dewar at his request...

-----------------

Comment on AI-400

The recommendations of this AI seem seriously flawed. I have two major objections.

First, the inclusion of Wide_Wide stuff in Ada.Exceptions and Ada.Tags seems a major mistake. This means that any program using tagged types or exception handling implicitly or explicitly will end up using Wide_Character and Wide_Wide_Character. That seems very unfortunate. Certainly it makes the No_Wide_Characters restriction that GNAT added quite useless, since in practice virtually every program will use wide wide characters. We have avoided this kind of entanglement up to now, and I think we should avoid it here.

Note that the processing for Wide_Wide_Expanded_Name and Wide_Wide_Exception_Name is likely to include all the handling of the various complex formats for encoding wide [wide] character stuff. This is quite complex, and is quite a bit of code. We are very unhappy to see this code included in virtually every program.

Second, this is really a major implementation pain, since the compiler itself is an Ada program which now unavoidably uses Ada 2005 features in the compiler itself, namely the Wide_Wide_Character support. This causes serious bootstrap problems. Yes, yes, we can work through this by having a compiler specific version of Ada.Exceptions but this introduces an enormous amount of complexity in the build process. It seems truly horrible to have to do this just for this rather obscure feature.

Constructive suggestion: move the wide and wide_wide subprograms to child units.
Nice names for these child units would be Wide and Wide_Wide, then you write

   Ada.Exceptions.Wide.Exception_Name

which reads quite as well as

   Ada.Exceptions.Wide_Exception_Name

****************************************************************

From: Pascal Leroy
Sent: Tuesday, December 6, 2005 2:08 AM

On the process: This comment comes sufficiently late that it certainly won't be taken into account in draft 15 which will go to WG9 later this week. So if someone cares enough, it will have to be resubmitted through some official channel during the review period. At any rate, this doesn't prevent the ARG from discussing it.

On the substance: True, there is an implementation difficulty here, but I say tough luck. The language doesn't seem broken, and I don't see a reason to change it each time someone runs into a feature that is hard or inconvenient to implement.

We are actually in the process of implementing this stuff, and by playing games with pragma Import (Ada) we believe that we will be able to only pull in the wide wide character encoding/decoding stuff when it's actually needed. We also have a build process that knows how to handle three dialects with different predefined units (ran into this about 10 years ago).

I realize that the proposed change is modest, but any change at this stage has to have a strong justification, and I just don't see the justification.

****************************************************************

From: Robert Dewar
Sent: Tuesday, December 6, 2005 1:16 PM

> True, there is an implementation difficulty here, but I say tough luck.
> The language doesn't seem broken, and I don't see a reason to change it
> each time someone runs into a feature that is hard or inconvenient to
> implement.

Just so things are clear, the implementation difficulties are NOT my major objection. We can get around this (our approach will be to introduce implicit child units, which we do for the generic packages of Text_IO) in any case.
I just think it is conceptually wrong for all Ada programs to drag in wide wide character unconditionally.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, December 6, 2005 1:48 PM

> I just think it is conceptually wrong for all Ada programs to drag in
> wide wide character unconditionally.

One could argue that that is a very limited view; all programs ought to support international character sets because it is easy to do so. (I presume this is the reasoning behind the SC 22 mandate for international character support.)

The problem is, of course, that Ada 2005 does not make it easy; there is far too much space waste to use Wide_Wide_Strings universally, and Ada has no language-defined support for UTF-8 strings (which, if universally supported, would meet all of the internationalization goals with little additional cost). Given that compilers probably will internally represent identifiers in UTF-8 or a similar encoding, it's annoying that you have to convert that to Wide_Wide, and then the user probably has to convert it back to UTF-8.

Anyway, way too late to do anything about that for this iteration.

In any case, we're talking about a fairly small piece of code (certainly smaller than tasking, and probably smaller than exception handling); it's hard to imagine that it would matter other than in extreme cases. The 'Image/'Value code is probably the largest part of the code (because of the data tables), and I'd be surprised if it exceeded 4K.

****************************************************************

From: Robert Dewar
Sent: Tuesday, December 6, 2005 4:47 PM

> One could argue that that is a very limited view; all programs ought to
> support international character sets because it is easy to do so. (I presume
> this is the reasoning behind the SC 22 mandate for international character
> support.)

That's a truly bizarre argument.
Most of our customers' programs barely deal with strings at all, it's just not part of the embedded scene. I suspect you think purely in native terms :-)

****************************************************************

From: Randy Brukardt
Sent: Tuesday, December 6, 2005 5:09 PM

> That's a truly bizarre argument. Most of our customers' programs barely
> deal with strings at all, it's just not part of the embedded scene. I
> suspect you think purely in native terms :-)

Surely, if you don't use any strings at all, you shouldn't load *any* string support. We're only talking about programs that use strings, and for which the additional overhead of supporting Wide_Wide_ strings is significant. That's a pretty small set, since the overhead is pretty small.

****************************************************************

From: Robert Dewar
Sent: Tuesday, December 6, 2005 5:23 PM

> Surely, if you don't use any strings at all, you shouldn't load *any* string
> support. We're only talking about programs that use strings, and for which
> the additional overhead of supporting Wide_Wide_ strings is significant.
> That's a pretty small set, since the overhead is pretty small.

Well you are perhaps assuming that the linker can eliminate unused subprograms. Unfortunately such linker elimination technology is by no means universally supported on all operating systems, so in practice if a unit WITH's stuff, you get that stuff whether or not it is used. The problem in this case is that the Ada.Exceptions body will end up withing the support for wide character encoding.

P.S. I did not say "don't use any strings at all", please be careful not to distort my contributions :-), I said "barely deal with strings at all", which is rather different!
Typically an embedded app may use strings for some messages to the external world, but I doubt me these have to be able to be in Chinese, especially when we are talking US weapons systems for instance :-)

****************************************************************

From: Robert Dewar
Sent: Tuesday, December 6, 2005 4:46 PM

> In any case, we're talking about a fairly small piece of code (certainly
> smaller than tasking, and probably smaller than exception handling); it's
> hard to imagine that it would matter other than in extreme cases. The
> 'Image/'Value code is probably the largest part of the code (because of the
> data tables), and I'd be surprised if it exceeded 4K.

I guess you live in a different world, but if we told our users that Ada 2005 would cost them 4K increase in size even if they did not use any of its features, some would be upset.

Actually the code is much larger than 4K, since it is not just UTF-8 that has to be supported, but lots of other formats that unlike UTF-8 are in actual wide use (such as Shift-JIS in the Japanese market, and the two byte upper half coding used in China).

Luckily there are (fairly strenuous) tricks for avoiding any penalty at all if the features are not used (as I mentioned before, we use similar techniques to ensure that Text_IO does not drag in junk like fpt conversions when you don't need it).

P.S. I am puzzled by the reference to 'Image and 'Value. I cannot imagine this code being loaded unless the feature was actually used.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, December 6, 2005 5:23 PM

> I guess you live in a different world, but if we told our users that
> Ada 2005 would cost them 4K increase in size even if they did not use
> any of its features, some would be upset.

Not at all; we've paid as much or more attention to space issues than any other Ada vendor. I wouldn't be surprised at all if there exist such customers.
But every version of a compiler changes its space usage (because of bug fixes, new features, whatever), and some of those increase it. But most customers find that net size is smaller or irrelevantly bigger. Certainly 4K (and I doubt it would be that large) would fall into the noise for almost all customers.

> Actually the code is much larger than 4K, since it is not just UTF-8 that
> has to be supported, but lots of other formats that unlike UTF-8 are in
> actual wide use (such as Shift-JIS in the Japanese market, and the two
> byte upper half coding used in China).

Huh? We're only talking about the runtime routines that return the names of Ada identifiers. A compiler is going to code these in only one way; the encoding is not visible to the users so it should be irrelevant to them. The runtime need support only that one decoding (and no encoding) for this purpose -- that keeps the size down. Indeed, you'd need no encoding at all if you simply stored identifiers in Wide_Wide_Strings (I think that's impractical for space reasons, but it is an option).

Other encodings can be needed, of course, but they can be (and should be) limited to other parts of the runtime (Text_IO, implementation-defined libraries), and they thus are not required by all programs, only the programs that need them.

> Luckily there are (fairly strenuous) tricks for avoiding any penalty
> at all if the features are not used (as I mentioned before, we use
> similar techniques to ensure that Text_IO does not drag in junk like
> fpt conversions when you don't need it).
>
> P.S. I am puzzled by the reference to 'Image and 'Value. I cannot imagine
> this code being loaded unless the feature was actually used.

Well, I was thinking of our implementation. Because of shared generics, there is a single discrete Image routine (and a matching Value routine). The Image routine is used by the default exception handler's message code, so removing it is impossible.
(Of course, an embedded user could recompile the runtime without the default exception handler, and some did, in which case Image could be removed.) Value is loaded only if it is used, but that is *any* discrete use (including Text_IO), so it usually is loaded. In any case, these routines are all quite small (only a couple of hundred bytes), so there isn't much advantage to not loading them. (The fixed and floating point versions are a different story, of course). YMMV.

****************************************************************

From: Robert Dewar
Sent: Tuesday, December 6, 2005 5:32 PM

>>Actually the code is much larger than 4K, since it is not just UTF-8 that
>>has to be supported, but lots of other formats that unlike UTF-8 are in
>>actual wide use (such as Shift-JIS in the Japanese market, and the two
>>byte upper half coding used in China).
>
> Huh? We're only talking about the runtime routines that return the names of
> Ada identifiers. A compiler is going to code these in only one way; the
> encoding is not visible to the users so it should be irrelevant to them. The
> runtime need support only that one decoding (and no encoding) for this
> purpose -- that keeps the size down. Indeed, you'd need no encoding at all
> if you simply stored identifiers in Wide_Wide_Strings (I think that's
> impractical for space reasons, but it is an option).

That's not the way things work for us. The name of an identifier, for use by Expanded_Name and Exception_Name is stored by the compiler using the encoding that was used in the source. That's what you want for Expanded_Name, since then if you display this expanded name as a normal string, it will look correct on the screen, the same way it looked when the source program was displayed. It would be quite useless if for example UTF-8 were always used, since if you are in a shift-JIS environment, the resulting string would display as gobbledygook. Most likely Expanded_Name is going to be used for display purposes.
Wide_Expanded_Name and Wide_Wide_Expanded_Name could be specially handled by the compiler, but that's heavy for features that will likely never be used except in ACATS tests, so the easiest thing is simply to translate the String to Wide_String or Wide_Wide_String using the appropriate encoding. That's the least source code, and it hardly seems worth spending much effort to optimize these functions!

> Well, I was thinking of our implementation. Because of shared generics,
> there is a single discrete Image routine (and a matching Value routine). The
> Image routine is used by the default exception handler's message code, so
> removing it is impossible. (Of course, an embedded user could recompile the
> runtime without the default exception handler, and some did, in which case
> Image could be removed.) Value is loaded only if it is used, but that is
> *any* discrete use (including Text_IO), so it usually is loaded. In any
> case, these routines are all quite small (only a couple of hundred bytes),
> so there isn't much advantage to not loading them. (The fixed and floating
> point versions are a different story, of course). YMMV.

OK, but that has nothing to do with the case at hand.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, December 6, 2005 6:02 PM

> That's not the way things work for us. The name of an identifier, for
> use by Expanded_Name and Exception_Name is stored by the compiler using
> the encoding that was used in the source. That's what you want for
> Expanded_Name, since then if you display this expanded name as a normal
> string, it will look correct on the screen, the same way it looked
> when the source program was displayed. It would be quite useless if
> for example UTF-8 were always used, since if you are in a shift-JIS
> environment, the resulting string would display as gobbledygook. Most
> likely Expanded_Name is going to be used for display purposes.
That's fine, but that's certainly outside of the Ada standard. If a user wants to do that portably in Ada 2005, they'll use Wide_Wide_Expanded_Name to get the identifier, and then use some appropriate output routine to format it into the local encoding. That is, Expanded_Name is intended to be used only for the 8-bit version of the identifier; if you expect/want to get extended characters, use the Wide_ or Wide_Wide_ versions.

Now, what usually happens when I make some argument like yours, is that someone says that "you're welcome to go beyond the Ada Standard, but then it's your obligation to work however hard is required to make that work". Followed by "don't cry to us, you *chose* to work harder than required by the standard". It's just the old joke "Patient: Doctor, it hurts when I do this. Doctor: Then don't do that!". So, I don't expect a lot of sympathy. :-)

****************************************************************

From: Pascal Leroy
Sent: Wednesday, December 7, 2005 2:45 AM

> That's not the way things work for us. The name of an
> identifier, for use by Expanded_Name and Exception_Name is
> stored by the compiler using the encoding that was used in
> the source. That's what you want for Expanded_Name, since
> then if you display this expanded name as a normal string, it
> will look correct on the screen, the same way it looked when
> the source program was displayed. It would be quite useless
> if for example UTF-8 were always used, since if you are in a
> shift-JIS environment, the resulting string would display as
> gobbledygook.

I find this approach curious. On the face of it, it would seem that two users working side by side in environments with different configurations would not be able to share programs: the UTF-8 guy having created a program that includes a string literal encoded in UTF-8, this program is going to spit out gobbledygook on the shift-JIS guy's screen. But maybe you have options to accommodate this (arguably rare) situation?
****************************************************************

From: Robert Dewar
Sent: Wednesday, December 7, 2005 6:20 PM

I can't imagine such a side by side situation in practice (we are talking about Japan here, where shift-JIS is standard).

We don't accept multiple wide char representations in the same source (it is impossible to do this, the multiple representations are not mutually compatible) [though we always accept brackets notation].

So if I am a Shift-JIS guy, and I get a program that is UTF-8 encoded, I have to know about this. I have two options

a) compile with utf-8 encoding

   But in this case, the source program itself looks like gobbledygook on my screen, and it seems quite right that Expanded_Name should faithfully reproduce this gobbledygook.

b) more likely, translate sources to my native representation

Now if the binary is being handed around, it seems a plain bad idea to output an Expanded_Name value. It is not reasonably possible to make this output something sensible in all environments. If you want this kind of portability, then you should do as Randy says. Use Wide_Wide_Expanded_Name, and output the result in a careful portable manner.

In practice programs that output wide character stuff are just not easily portable between environments, real care has to be taken to achieve this.

****************************************************************