Version 1.1 of acs/ac-00151.txt


!standard A.10(5)          07-12-06 AC95-00151/01
!class confirmation 07-12-06
!status received no action 07-12-06
!status received 07-10-31
!subject Standard_Output of different Text_IO packages
!summary
!appendix

!topic Standard output of different Text_IO packages
!reference RM05 A.10, A.11
!from Adam Beneschan 07-10-31
!discussion


How distinct, and how interconnected, are the "standard output" files
that can be returned by the three different Text_IO packages
(Ada.Text_IO, Ada.Wide_Text_IO, Ada.Wide_Wide_Text_IO)?  Are they
really the same file, although they have three different file types
and thus have to be different file objects?  Or are they handled
totally separately?

Suppose this code appears in a program that does not call any
Set_Output routine, so that all these Put_Line's will go to the
corresponding standard output (Ada.Text_IO.Standard_Output or
Ada.Wide_Text_IO.Standard_Output).  Suppose further that the runtime
buffers the output, so that at the point noted by the comment, no
actual output has been done to any external file.  (Also, we can
assume that Wide_Text_IO outputs using UTF-8 or some such encoding so
that Wide_Text_IO.Put_Line("eee") has exactly the same external effect
as Text_IO.Put_Line("eee").)

    with Ada.Text_IO;
    with Ada.Wide_Text_IO;

    procedure Intersperse_Output is
    begin
        Ada.Wide_Text_IO.Put_Line ("First");
        Ada.Text_IO.Put_Line      ("Second");
        Ada.Wide_Text_IO.Put_Line ("Third");

        --- ALL OUTPUT IS STILL BUFFERED, NO EXTERNAL OUTPUT HAS
        --- APPEARED

        Ada.Text_IO.Flush (Ada.Text_IO.Standard_Output);
    end Intersperse_Output;

Is the implementation correct or incorrect if, after
Intersperse_Output is done, only "Second" has been output, and not
"First" or "Third"?

If all the output is buffered, so that no output is done before the
Flush, but Flush causes "First", "Second", and "Third" to be output in
that order, is the implementation incorrect because Text_IO.Flush
isn't supposed to have any effect on output written to
Wide_Text_IO.Standard_Output?
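
One defensive pattern sidesteps the question rather than answering it:
flush each package's standard output before switching to the other
package.  The following is only a sketch.  It assumes no Set_Output
call has been made, so the parameterless Flush of each package acts on
that package's Standard_Output, and it assumes both files end up on the
same external file.  The procedure name is made up for illustration.

```ada
with Ada.Text_IO;
with Ada.Wide_Text_IO;

procedure Intersperse_With_Flush is
begin
   Ada.Wide_Text_IO.Put_Line ("First");
   Ada.Wide_Text_IO.Flush;   --  push "First" out before Text_IO writes

   Ada.Text_IO.Put_Line ("Second");
   Ada.Text_IO.Flush;        --  push "Second" out before Wide_Text_IO writes

   Ada.Wide_Text_IO.Put_Line ("Third");
   Ada.Wide_Text_IO.Flush;   --  nothing is left in either package's buffer
end Intersperse_With_Flush;
```

With this pattern the order of the external output should not depend on
whether the runtime keeps one buffer or one per package, at the cost of
giving up most of the benefit of buffering.  It does nothing about the
encoding of characters above 127.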

I've studied the sections involved but I'm not sure how to interpret
them.  A.10(5) says that at the beginning of program execution there
is a standard output file.  A.10(1), however, says that this clause
describes Text_IO, so it wouldn't necessarily apply to Wide_Text_IO or
Wide_Wide_Text_IO.  A.11 describes how the *specifications* of
Wide_Text_IO and Wide_Wide_Text_IO are derived from that of Text_IO,
but it really doesn't address other aspects of the definition of
Text_IO---for instance, is A.10(5) replicated for the other two
packages; or is there still just one standard output file, and if so,
what are the ramifications of that fact?

What exactly is the intent?

(P.S. I realize that it's probably inadvisable to use Text_IO.Put_Line
and Wide_Text_IO.Put_Line in the same program in this way,
particularly if Text_IO could be used to output characters in the
Character'Val(128)..Character'Val(255) range.)

****************************************************************

From: Tucker Taft
Sent: Wednesday, October 31, 2007  1:55 PM

Interesting questions.  Given the growing use of UTF-8,
I would recommend that there be only one standard output
(internal) File, and that all of the *Text_IO packages
use this same internal File, and you could Flush
the output using any one of the packages.  Of course
there is nothing in the manual that says this explicitly,
but you could perhaps say it is implied by A.10(5) saying
there is only one standard output, and by deciding what would
be most useful.  Having each package do its own buffering
of standard output would not be particularly helpful
for anyone.

At least an AARM note is in order, and perhaps some
wording changes to clarify the desired rule.

****************************************************************

From: Adam Beneschan
Sent: Wednesday, October 31, 2007  2:37 PM

The encoding issue is an interesting one.  I'm not sure what impact it
has on my questions, except possibly to say that encoding issues mean
that you just shouldn't use both Text_IO and Wide_Text_IO on standard
output so it really doesn't matter how an implementation handles it.

Even though UTF-8 use may be growing, I still see lots of mail with
characters in the 160-255 (or 128-255) range, represented simply as
single bytes (rather than as multiple bytes the way UTF-8 would encode
them).  Some mailing lists have a fair number of French or German or
Scandinavian posters whose names have characters with accents or
umlauts or the like.  I also get a lot of mail using Russian/Cyrillic
characters, but again using an 8-bit character set like koi8 or
windows-1251.  I don't read Russian, but I think most of this mail
says "You go our fine web site you want cheap Canadiansky prescription
drug" or something like that.  (This last works better if you say it
using a cheesy Russian accent.)

But anyway, the point is that 8-bit character sets are still very much
in use, and it's reasonable to think someone might want to use Text_IO
in a way that writes upper-range characters simply as themselves, as
8-bit bytes.  This becomes a problem when Wide_Text_IO is also used to
write to standard output, if (say) it uses UTF-8 when writing
Wide_Characters in the 128-65535 range; the result would be that bytes
in the 128-255 range in the standard output would not have a
consistent meaning.  That's why I think using Text_IO and Wide_Text_IO
on standard output in the same program is probably just wrong.  Unless
there's some way to specify the encoding of both Standard_Output
files---which raises another issue.  Open and Create have an
implementation-defined Form parameter that can be used to specify
characteristics of a file---including, perhaps, what encoding is to be
used when writing text files.  (Our compiler uses the Form parameter
for this, and I think GNAT's does, too.)  But there isn't any routine
in any of the Text_IO packages that would allow you to specify a
characteristic of the standard input/output/error files, even using an
implementation-defined string parameter.  It seems that it would be
useful to have a routine like that, at least to provide the ability to
specify the encoding dynamically.
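
To illustrate the Form mechanism mentioned above, here is a minimal
sketch of selecting an encoding when a file is created.  The Form
string is implementation-defined: "WCEM=8" is the keyword GNAT
documents for UTF-8 in Wide_Text_IO, and other compilers may use a
different syntax, ignore it, or raise Use_Error.  The file name and
procedure name are made up for illustration.

```ada
with Ada.Wide_Text_IO;

procedure Create_With_Encoding is
   F : Ada.Wide_Text_IO.File_Type;
begin
   --  Form is implementation-defined; "WCEM=8" is GNAT's spelling for
   --  UTF-8 and is an assumption about the compiler in use.
   Ada.Wide_Text_IO.Create
     (File => F,
      Mode => Ada.Wide_Text_IO.Out_File,
      Name => "encoded.txt",
      Form => "WCEM=8");

   --  "Caf" followed by e-acute (16#E9#), a character above 127.
   Ada.Wide_Text_IO.Put_Line (F, "Caf" & Wide_Character'Val (16#E9#));
   Ada.Wide_Text_IO.Close (F);
end Create_With_Encoding;
```

Nothing comparable can be written for Standard_Output, since the
program never calls Open or Create on it; that is the gap described
above.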

****************************************************************

From: Adam Beneschan
Sent: Wednesday, October 31, 2007  3:45 PM

I just thought of a possible objection to this: If all the packages
share the same buffering, what would the buffer contain?  I presume it
would have to contain Wide_Wide_Characters even if only Text_IO is
used.  Is this too big a hit on efficiency for a program that just
wants to output a simple text file?  (You'd have to unpack the
characters in a String, then repack them so that you could use, say,
_write() on the buffer.  Seems like a waste...)

****************************************************************

From: Tucker Taft
Sent: Wednesday, October 31, 2007  3:50 PM

I presume the buffer would contain a sequence of
Stream_Elements, not a sequence of any particular
kind of Character.  That is, it has already been
converted to the external representation.
Otherwise, how would you know when the buffer
is full?
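
A rough sketch of that idea, not taken from any actual runtime: one
buffer of Stream_Elements shared by a narrow and a wide Put_Line, with
each routine converting to the external representation before
appending.  All names here (Shared_Buffer_Sketch, Append, Put_Narrow,
Put_Wide, the buffer size) are hypothetical, UTF-8 is assumed as the
wide encoding, and Ada.Strings.UTF_Encoding is an Ada 2012 package
used purely for illustration.

```ada
with Ada.Streams; use Ada.Streams;
with Ada.Strings.UTF_Encoding.Wide_Strings;
with Ada.Text_IO;

procedure Shared_Buffer_Sketch is

   Buffer : Stream_Element_Array (1 .. 64);
   Last   : Stream_Element_Offset := 0;   --  number of elements in use

   --  Stand-in for writing the buffer to the external file.
   procedure Flush is
   begin
      for I in 1 .. Last loop
         Ada.Text_IO.Put (Character'Val (Buffer (I)));
      end loop;
      Last := 0;
   end Flush;

   --  Append already-encoded bytes; fullness is counted in stream
   --  elements, not in characters of any particular kind.
   procedure Append (Bytes : String) is
   begin
      for C of Bytes loop
         if Last = Buffer'Last then
            Flush;
         end if;
         Last := Last + 1;
         Buffer (Last) := Stream_Element (Character'Pos (C));
      end loop;
   end Append;

   --  Narrow output: each Character goes in as one element, unencoded.
   procedure Put_Narrow (Item : String) is
   begin
      Append (Item & ASCII.LF);
   end Put_Narrow;

   --  Wide output: encode first (UTF-8 assumed), then append the bytes.
   procedure Put_Wide (Item : Wide_String) is
   begin
      Append (Ada.Strings.UTF_Encoding.Wide_Strings.Encode (Item) & ASCII.LF);
   end Put_Wide;

begin
   Put_Wide   ("First");
   Put_Narrow ("Second");
   Put_Wide   ("Third");
   Flush;   --  one flush empties everything, in call order
end Shared_Buffer_Sketch;
```

Because the buffer holds encoded bytes, the narrow path needs no
unpacking or repacking, and a single Flush covers output from both
routines.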

****************************************************************

From: Tucker Taft
Sent: Wednesday, October 31, 2007  3:52 PM

> Even though UTF-8 use may be growing, I still see lots of mail with
> characters in the 160-255 (or 128-255) range, represented simply as
> single bytes (rather than as multiple bytes the way UTF-8 would encode
> them).  ...

I don't know what to suggest here.  Your suggestion of having
some way to control encoding on standard output
sounds like something that might be provided in an
implementation-specific child package of
Text_IO or Wide_Text_IO.

In any case, your original question should
be addressed in an implementation-independent
way, I believe, and I believe the answer should be that
there is only one standard output (internal) file, and
Flush from any *Text_IO package may be used to
"synchronize" this internal file with the external
standard output file.

****************************************************************

From: Pascal Leroy
Sent: Monday, November 5, 2007  2:52 PM

FWIW, this is what the IBM compiler does (I think).  There is only one
buffer internally.  When Text_IO writes to it, no encoding takes
place.  When Wide...Text_IO writes to it, some encoding (UTF-8 or
others) takes place.  Adam is right: if you mix the two you can get
botched output, but then, who knows, maybe the program that consumes
the file knows how to interpret mixed Latin-1 and UTF-8 in the same
file.
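
To make the inconsistency concrete, here is a small sketch that prints
the byte values for e-acute (16#E9#) written as a plain Latin-1
Character and as UTF-8.  It assumes an Ada 2012 compiler, since it
uses Ada.Strings.UTF_Encoding and a "for ... of" loop; the names
Show_Bytes and Dump are made up for illustration.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure Show_Bytes is
   use Ada.Text_IO;

   E_Acute : constant := 16#E9#;

   --  The character as Text_IO would hold it: one 8-bit Character.
   Latin_1 : constant String := (1 => Character'Val (E_Acute));

   --  The same character encoded as UTF-8 from a Wide_String.
   UTF_8 : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Wide_Strings.Encode
       (Wide_String'(1 => Wide_Character'Val (E_Acute)));

   --  Print each byte of Bytes as a decimal value.
   procedure Dump (Label : String; Bytes : String) is
   begin
      Put (Label);
      for C of Bytes loop
         Put (Integer'Image (Character'Pos (C)));
      end loop;
      New_Line;
   end Dump;

begin
   Dump ("Latin-1:", Latin_1);
   Dump ("UTF-8:  ", UTF_8);
end Show_Bytes;
```

The Latin-1 form is the single byte 233, while the UTF-8 form is the
two bytes 195 and 169, so a reader of the mixed file cannot tell from
a byte in the 128-255 range which convention produced it.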

Regarding the Form parameter for standard output et al.: there was a
way to do that with POSIX.  POSIX did standardize a way to open a file
by Unix file descriptor.  So you could say "open file 0 and use that
Form parameter".  This was actually one of the most useful features of
the POSIX binding.

****************************************************************

From: Randy Brukardt
Sent: Thursday, December 6, 2007  6:25 PM

It's fairly clear that mixing Text I/O packages is going to produce bizarre
output because of encoding issues. As such, mixing is a bug:
it is just plain wrong.

In that case, who cares what the answer to the original question is?
The difference seems to be to make bizarre output 10% more portable
(remember that the encoding is not specified by the language). Why
would we want to make implementers (potentially) change their
implementations to handle code that isn't going to be portable with
or without a rule change?

I think it is best to simply leave this unspecified (unless and until
we're ready to *require* UTF-8 and similar file formats).

****************************************************************

