!standard A.10(5)                                  07-12-06  AC95-00151/01
!class confirmation 07-12-06
!status received no action 07-12-06
!status received 07-10-31
!subject Standard_Output of different Text_IO packages

!summary

!appendix

!topic Standard output of different Text_IO packages
!reference RM05 A.10, A.11
!from Adam Beneschan 07-10-31
!discussion

How distinct, and how interconnected, are the "standard output" files
that can be returned by the three different Text_IO packages
(Ada.Text_IO, Ada.Wide_Text_IO, Ada.Wide_Wide_Text_IO)? Are they really
the same file, although they have three different file types and thus
have to be different file objects? Or are they handled totally
separately?

Suppose this code appears in a program that does not call any
Set_Output routine, so that all these Put_Line's will go to the
corresponding standard output (Ada.Text_IO.Standard_Output or
Ada.Wide_Text_IO.Standard_Output). Suppose further that the runtime
buffers the output, so that at the point noted by the comment, no
actual output has been done to any external file. (Also, we can assume
that Wide_Text_IO outputs using UTF-8 or some such encoding so that
Wide_Text_IO.Put_Line("eee") has exactly the same external effect as
Text_IO.Put_Line("eee").)

    procedure Intersperse_Output is
    begin
        Ada.Wide_Text_IO.Put_Line ("First");
        Ada.Text_IO.Put_Line ("Second");
        Ada.Wide_Text_IO.Put_Line ("Third");
        --- ALL OUTPUT IS STILL BUFFERED, NO EXTERNAL OUTPUT HAS
        --- APPEARED
        Ada.Text_IO.Flush (Ada.Text_IO.Standard_Output);
    end Intersperse_Output;

Is the implementation correct or incorrect if, after Intersperse_Output
is done, only "Second" has been output, and not "First" or "Third"?

If all the output is buffered, so that no output is done before the
Flush, but Flush causes "First", "Second", and "Third" to be output in
that order, is the implementation incorrect because Text_IO.Flush isn't
supposed to have any effect on output written to
Wide_Text_IO.Standard_Output?

I've studied the sections involved but I'm not sure how to interpret
them. A.10(5) says that at the beginning of program execution there is
a standard output file. A.10(1), however, says that this clause
describes Text_IO, so it wouldn't necessarily apply to Wide_Text_IO or
Wide_Wide_Text_IO. A.11 describes how the *specifications* of
Wide_Text_IO and Wide_Wide_Text_IO are derived from that of Text_IO,
but it really doesn't address other aspects of the definition of
Text_IO---for instance, is A.10(5) replicated for the other two
packages; or is there still just one standard output file, and if so,
what are the ramifications of that fact?

What exactly is the intent?

(P.S. I realize that it's probably inadvisable to use Text_IO.Put_Line
and Wide_Text_IO.Put_Line in the same program in this way, particularly
if the Text_IO could be used to output characters in the
Character'Val(128)..Character'Val(255) range.)

****************************************************************

From: Tucker Taft
Sent: Wednesday, October 31, 2007  1:55 PM

Interesting questions. Given the growing use of UTF-8, I would
recommend that there be only one standard output (internal) File, and
that all of the *Text_IO packages use this same internal File, and you
could Flush the output using any one of the packages.

Of course there is nothing in the manual that says this explicitly, but
you could perhaps say it is implied by A.10(5) saying there is only one
standard output, and by deciding what would be most useful. Having each
package do its own buffering of standard output would not be
particularly helpful for anyone.

At least an AARM note is in order, but perhaps some wording changes to
clarify the desired rule.

****************************************************************

From: Adam Beneschan
Sent: Wednesday, October 31, 2007  2:37 PM

The encoding issue is an interesting one.
I'm not sure what impact it has on my questions, except possibly to say
that encoding issues mean that you just shouldn't use both Text_IO and
Wide_Text_IO on standard output so it really doesn't matter how an
implementation handles it.

Even though UTF-8 use may be growing, I still see lots of mail with
characters in the 160-255 (or 128-255) range, represented simply as
single bytes (rather than as multiple bytes the way UTF-8 would encode
them). Some mailing lists have a fair number of French or German or
Scandinavian posters whose names have characters with accents or
umlauts or the like. I also get a lot of mail using Russian/Cyrillic
characters, but again using an 8-bit character set like koi8 or
windows-1251. I don't read Russian, but I think most of this mail says
"You go our fine web site you want cheap Canadiansky prescription drug"
or something like that. (This last works better if you say it using a
cheesy Russian accent.)

But anyway, the point is that 8-bit character sets are still very much
in use, and it's reasonable to think someone might want to use Text_IO
in a way that writes upper-range characters simply as themselves, as
8-bit bytes. This becomes a problem when Wide_Text_IO is also used to
write to standard output, if (say) it uses UTF-8 when writing
Wide_Characters in the 128-65535 range; the result would be that bytes
in the 128-255 range in the standard output would not have a consistent
meaning. That's why I think using Text_IO and Wide_Text_IO on standard
output in the same program is probably just wrong.

Unless there's some way to specify the encoding of both Standard_Output
files---which raises another issue. Open and Create have an
implementation-defined Form parameter that can be used to specify
characteristics of a file---including, perhaps, what encoding is to be
used when writing text files. (Our compiler uses the Form parameter for
this, and I think GNAT's does, too.)
But there isn't any routine in any of the Text_IO packages that would
allow you to specify a characteristic of the standard
input/output/error files, even using an implementation-defined string
parameter. It seems that it would be useful to have a routine like
that, at least to provide the ability to specify the encoding
dynamically.

****************************************************************

From: Adam Beneschan
Sent: Wednesday, October 31, 2007  3:45 PM

I just thought of a possible objection to this: If all the packages
share the same buffering, what would the buffer contain? I presume it
would have to contain Wide_Wide_Characters even if only Text_IO is
used. Is this too big a hit on efficiency for a program that just wants
to output a simple text file? (You'd have to unpack the characters in a
String, then repack them so that you could use, say, _write() on the
buffer. Seems like a waste...)

****************************************************************

From: Tucker Taft
Sent: Wednesday, October 31, 2007  3:50 PM

I presume the buffer would contain a sequence of Stream_Elements, not a
sequence of any particular kind of Character. That is, it has already
been converted to the external representation. Otherwise, how would you
know when the buffer is full?

****************************************************************

From: Tucker Taft
Sent: Wednesday, October 31, 2007  3:52 PM

> Even though UTF-8 use may be growing, I still see lots of mail with
> characters in the 160-255 (or 128-255) range, represented simply as
> single bytes (rather than as multiple bytes the way UTF-8 would encode
> them). ...

I don't know what to suggest here. Your suggestion of having some way
to control encoding on standard output sounds like something that might
be provided in an implementation-specific child package of Text_IO or
Wide_Text_IO.
In any case, your original question should be addressed in an
implementation-independent way, I believe, and I believe the answer
should be that there is only one standard output (internal) file, and
Flush from any *Text_IO package may be used to "synchronize" this
internal file with the external standard output file.

****************************************************************

From: Pascal Leroy
Sent: Monday, November 5, 2007  2:52 PM

FWIW, this is what the IBM compiler does (I think). There is only one
buffer internally. When Text_IO writes to it, no encoding takes place.
When Wide...Text_IO writes to it, some encoding (UTF-8 or others) takes
place. Adam is right, if you mix the two you can get botched output,
but then, who knows, maybe the program that consumes the file knows how
to interpret mixed Latin-1 and UTF-8 in the same file.

Regarding the Form parameter for standard output et al.: there was a
way to do that with POSIX. POSIX did standardize a way to open a file
by Unix file descriptor. So you could say "open file 0 and use that
Form parameter". This was actually one of the most useful features of
the POSIX binding.

****************************************************************

From: Randy Brukardt
Sent: Thursday, December 6, 2007  6:25 PM

It's fairly clear that mixing Text I/O packages is going to get bizarre
output because of encoding issues. And as such, such mixing is a bug:
mixing is just plain wrong.

In that case, who cares what the answer to the original question is?
The difference seems to be to make bizarre output 10% more portable
(remember that the encoding is not specified by the language). Why
would we want to make implementers (potentially) change their
implementations to handle code that isn't going to be portable with or
without a rule change?

I think it is best to simply leave this unspecified (unless and until
we're ready to *require* UTF-8 and similar file formats).

****************************************************************
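
Since the thread leaves the sharing of standard output unspecified, a
program that mixes the packages cannot portably rely on either shared
or separate buffering. The following is only a sketch of one defensive
approach, not something the thread or the RM prescribes: flush each
package's own standard output right after writing to it, so that the
order of the lines does not depend on how the buffers are shared. The
procedure name Flush_Each_Package is hypothetical. The sketch assumes
that no Set_Output call has been made, that each Flush really pushes
pending output to the external file, and that only 7-bit characters are
written, so the encoding problem discussed above does not arise.

```ada
with Ada.Text_IO;
with Ada.Wide_Text_IO;

--  Hypothetical example: interleave output from two Text_IO packages,
--  flushing the package just used before switching to the other one.
procedure Flush_Each_Package is
begin
   Ada.Wide_Text_IO.Put_Line ("First");
   --  Parameterless Flush acts on the current default output file of
   --  Wide_Text_IO, which is its standard output here (assumption: no
   --  Set_Output has been called).
   Ada.Wide_Text_IO.Flush;

   Ada.Text_IO.Put_Line ("Second");
   Ada.Text_IO.Flush;

   Ada.Wide_Text_IO.Put_Line ("Third");
   Ada.Wide_Text_IO.Flush;
end Flush_Each_Package;
```

Flushing after every line gives up most of the benefit of buffering, so
this is likely reasonable only for low-volume output. It also does
nothing about the inconsistent meaning of bytes in the 128-255 range if
characters outside the 7-bit range are written through both packages.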