Version 1.7 of ai12s/ai12-0021-1.txt
!standard 11.4.1(19) 19-01-04 AI12-0021-1/06
!standard A.8.1(15)
!standard A.8.2(28.3/4)
!standard A.8.4(18)
!standard A.10.1(85)
!standard A.12.1(26)
!standard A.15.1(0)
!standard A.16.2(0)
!standard A.17.1(0)
!class Amendment 12-03-13
!status Amendment 1-2012 18-12-10
!status ARG Approved 10-0-2 18-12-10
!status work item 12-02-25
!status received 12-02-25
!priority High
!difficulty Hard
!subject Additional internationalization of Ada
!summary
Add support for the use of the entire set of characters from ISO/IEC 10646:2017
for file and directory names by the operations of the Annex A facilities.
!proposal
In addition to the facilities already provided,
(1) File and directory operations should support Unicode characters (presuming
that the target file system does so);
(2) Exception messages and exception information should support Unicode
characters;
(3) Command lines should support Unicode characters (presuming that the target
system allows these).
!wording
Add after 11.4.1(19):
NOTES
UTF-8 encoding (see A.4.11) can be used to represent non-ASCII characters in
exception messages.
Add after A.8.1(15):
... -- Enclosing package Ada.Sequential_IO
package Wide_File_Names is
-- File management
procedure Create(File : in out File_Type;
Mode : in File_Mode := Out_File;
Name : in Wide_String := "";
Form : in Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_String;
Form : in Wide_String := "");
function Name (File : in File_Type) return Wide_String;
function Form (File : in File_Type) return Wide_String;
end Wide_File_Names;
package Wide_Wide_File_Names is
-- File management
procedure Create(File : in out File_Type;
Mode : in File_Mode := Out_File;
Name : in Wide_Wide_String := "";
Form : in Wide_Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_Wide_String;
Form : in Wide_Wide_String := "");
function Name (File : in File_Type) return Wide_Wide_String;
function Form (File : in File_Type) return Wide_Wide_String;
end Wide_Wide_File_Names;
Add after A.8.2(28.3/4):
The nested package Wide_File_Names provides operations equivalent to the
operations of the same name of the outer package except that Wide_String is
used instead of String for the name and form of the external file.
The nested package Wide_Wide_File_Names provides operations equivalent to the
operations of the same name of the outer package except that Wide_Wide_String
is used instead of String for the name and form of the external file.
Add after A.8.4(18):
... -- Enclosing package Ada.Direct_IO
package Wide_File_Names is
-- File management
procedure Create(File : in out File_Type;
Mode : in File_Mode := Inout_File;
Name : in Wide_String := "";
Form : in Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_String;
Form : in Wide_String := "");
function Name (File : in File_Type) return Wide_String;
function Form (File : in File_Type) return Wide_String;
end Wide_File_Names;
package Wide_Wide_File_Names is
-- File management
procedure Create(File : in out File_Type;
Mode : in File_Mode := Inout_File;
Name : in Wide_Wide_String := "";
Form : in Wide_Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_Wide_String;
Form : in Wide_Wide_String := "");
function Name (File : in File_Type) return Wide_Wide_String;
function Form (File : in File_Type) return Wide_Wide_String;
end Wide_Wide_File_Names;
Add after A.10.1(85):
... -- Enclosing package Ada.Text_IO
package Wide_File_Names is
-- File management
procedure Create (File : in out File_Type;
Mode : in File_Mode := Out_File;
Name : in Wide_String := "";
Form : in Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_String;
Form : in Wide_String := "");
function Name (File : in File_Type) return Wide_String;
function Form (File : in File_Type) return Wide_String;
end Wide_File_Names;
package Wide_Wide_File_Names is
-- File management
procedure Create (File : in out File_Type;
Mode : in File_Mode := Out_File;
Name : in Wide_Wide_String := "";
Form : in Wide_Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_Wide_String;
Form : in Wide_Wide_String := "");
function Name (File : in File_Type) return Wide_Wide_String;
function Form (File : in File_Type) return Wide_Wide_String;
end Wide_Wide_File_Names;
Add after A.12.1(26):
... -- Enclosing package Ada.Streams.Stream_IO
package Wide_File_Names is
-- File management
procedure Create (File : in out File_Type;
Mode : in File_Mode := Out_File;
Name : in Wide_String := "";
Form : in Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_String;
Form : in Wide_String := "");
function Name (File : in File_Type) return Wide_String;
function Form (File : in File_Type) return Wide_String;
end Wide_File_Names;
package Wide_Wide_File_Names is
-- File management
procedure Create (File : in out File_Type;
Mode : in File_Mode := Out_File;
Name : in Wide_Wide_String := "";
Form : in Wide_Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_Wide_String;
Form : in Wide_Wide_String := "");
function Name (File : in File_Type) return Wide_Wide_String;
function Form (File : in File_Type) return Wide_Wide_String;
end Wide_Wide_File_Names;
Add section A.15.1:
A.15.1 The Packages Wide_Command_Line and Wide_Wide_Command_Line
The packages Wide_Command_Line and Wide_Wide_Command_Line allow a program to
obtain the values of its arguments and to set the exit status code to be
returned on normal termination.
Static Semantics
The specification of package Wide_Command_Line is the same as for Command_Line,
except that each occurrence of String is replaced by Wide_String.
The specification of package Wide_Wide_Command_Line is the same as for
Command_Line, except that each occurrence of String is replaced by
Wide_Wide_String.
Add section A.16.2:
A.16.2 The Packages Wide_Directories and Wide_Wide_Directories
The packages Wide_Directories and Wide_Wide_Directories provide operations for
manipulating files and directories, and their names.
Static Semantics
The specification of package Wide_Directories is the same as for Directories
(including its optional child packages Information and Hierarchical_File_Names),
except that each occurrence of String is replaced by Wide_String.
The specification of package Wide_Wide_Directories is the same as for
Directories (including its optional child packages Information and
Hierarchical_File_Names), except that each occurrence of String is replaced by
Wide_Wide_String.
Add section A.17.1:
A.17.1 The Packages Wide_Environment_Variables and Wide_Wide_Environment_Variables
The packages Wide_Environment_Variables and Wide_Wide_Environment_Variables
allow a program to read or modify environment variables.
Static Semantics
The specification of package Wide_Environment_Variables is the same as for
Environment_Variables, except that each occurrence of String is replaced by
Wide_String.
The specification of package Wide_Wide_Environment_Variables is the same as for
Environment_Variables, except that each occurrence of String is replaced by
Wide_Wide_String.
!discussion
These issues defy an easy solution. Changing the behavior of the existing
routines would break existing workarounds (which on some targets, like most
Linux systems, have no problems with directly using UTF-8 strings) and other
commonly used functionality (like encoding binary data in exception messages).
Adding even more Wide_Wide_ packages and routines leads to a combinatorial
explosion.
The crux of this problem is that the semantics and representation of strings
have become co-mingled. What we really need to do is to separate these; the
difficulty with that is mostly with retaining adequate performance.
The way-out solution would be to declare a semi-magic Root_String interface (or
perhaps an abstract type); string literals, "lvalue"s and indexing already can
be supported with existing Ada 2020 facilities. Something along the lines of:
package General_Strings is
type Root_String is interface with
Constant_Indexing => Get_Char,
Variable_Indexing => Set_Char,
String_Literal => Assign; --
function Get_Char (A : Root_String; I : Positive) return Wide_Wide_Character;
--
type LValue (D : access Wide_Wide_Character)
with Implicit_Dereferencing => D
is null record;
function Set_Char (A : in out Root_String; I : Positive) return LValue;
--
--
function Slice (A : Root_String; L : Positive; R : Natural) return Wide_Wide_String;
--
--
procedure Assign (Trg : in out Root_String; Src : in Wide_Wide_String);
--
function Value (Obj : Root_String) return Wide_Wide_String;
--
--
--
--
end General_Strings;
[Note: I didn't try to think of good names for these routines and parameters;
that would need to be done, of course.]
Then we'd have a bunch of concrete instances:
type Latin_1_String (L, R : Positive) is new General_Strings.Root_String with
Obj : String (L .. R);
--
type Bounded_UTF_8_String (Byte_Len : Natural) is new General_Strings.Root_String with
Obj : UTF_8_String (1 .. Byte_Len);
--
and so on for every interesting representation.
In addition, we'd have Ada.Strings.General (which would have approximately the
contents of Ada.Strings.Fixed, with all of the String parameters converted to
Root_String'Class). And most of the IO routines that take strings would have
versions that would take Root_String'Class (these would need different names or
packages, unfortunately, to avoid ambiguity). Similarly for exception messages,
and so on.
The real key here is that the string types would carry their representation
along when passed into routines (which have to be new for this reason). Once
that is available, then any problems can be dealt with by simply using whatever
representation is appropriate for the target system.
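To make the dispatching idea concrete, here is a minimal sketch that uses Ada
2012 facilities only. It is not the proposal above: it models just the Value
operation and leaves out the indexing and String_Literal aspects. The names
Sketch, UTF_8_Text and Char_Count are invented for illustration.

```ada
with Ada.Characters.Conversions;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Root_String_Sketch is

   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   package Sketch is
      --  Simplified stand-in for Root_String: only Value is modelled.
      type Root_String is interface;
      function Value (S : Root_String) return Wide_Wide_String is abstract;

      --  One character per element (Latin-1).
      type Latin_1_String (Len : Natural) is new Root_String with record
         Obj : String (1 .. Len);
      end record;
      overriding function Value
        (S : Latin_1_String) return Wide_Wide_String;

      --  One to four octets per character (UTF-8).
      type UTF_8_Text (Byte_Len : Natural) is new Root_String with record
         Obj : Ada.Strings.UTF_Encoding.UTF_8_String (1 .. Byte_Len);
      end record;
      overriding function Value
        (S : UTF_8_Text) return Wide_Wide_String;

      --  Class-wide routine: the representation travels with the object.
      function Char_Count (S : Root_String'Class) return Natural;
   end Sketch;

   package body Sketch is
      overriding function Value
        (S : Latin_1_String) return Wide_Wide_String is
      begin
         return Ada.Characters.Conversions.To_Wide_Wide_String (S.Obj);
      end Value;

      overriding function Value
        (S : UTF_8_Text) return Wide_Wide_String is
      begin
         return UTF.Decode (S.Obj);
      end Value;

      function Char_Count (S : Root_String'Class) return Natural is
      begin
         --  Dispatches to the Value of the actual representation.
         return Value (S)'Length;
      end Char_Count;
   end Sketch;

   use Sketch;

   E_Acute : constant Character := Character'Val (16#E9#);
   Plain   : constant String := "caf" & E_Acute;  --  4 Latin-1 characters
   Encoded : constant String :=
     UTF.Encode (Ada.Characters.Conversions.To_Wide_Wide_String (Plain));

   A : constant Latin_1_String := (Len => Plain'Length, Obj => Plain);
   B : constant UTF_8_Text :=
     (Byte_Len => Encoded'Length, Obj => Encoded);
begin
   --  The UTF-8 form needs five octets for the same four characters.
   pragma Assert (Encoded'Length = 5);
   pragma Assert (Char_Count (A) = 4 and then Char_Count (B) = 4);
   Ada.Text_IO.Put_Line
     (Natural'Image (Char_Count (A)) & Natural'Image (Char_Count (B)));
end Root_String_Sketch;
```

Char_Count never needs to know which representation it was given; that is the
property the Root_String'Class routines would rely on.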
====
The above was discussed at the Lisbon 2018 meeting and considered too ambitious
for the Ada 2020 timescales.
It was considered useful, though, to add nested packages Wide_File_Names and
Wide_Wide_File_Names to each I/O package, containing just those operations that
take or return a file name or form, and Wide_ and Wide_Wide_ versions of
Ada.Directories, Ada.Command_Line and Ada.Environment_Variables.
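As an illustration only, the following sketch shows one way such operations
could be layered on the existing String-based ones. It assumes a target whose
file system accepts UTF-8 encoded names passed through type String (as on most
Linux systems); an implementation is free to do something entirely different.
Names_Sketch is a hypothetical stand-in, not the package defined by this AI.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Wide_Name_Sketch is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
   use Ada.Text_IO;

   --  Hypothetical stand-in for Ada.Text_IO.Wide_Wide_File_Names.
   package Names_Sketch is
      procedure Create (File : in out File_Type;
                        Mode : in File_Mode := Out_File;
                        Name : in Wide_Wide_String := "";
                        Form : in Wide_Wide_String := "");
      function Name (File : in File_Type) return Wide_Wide_String;
   end Names_Sketch;

   package body Names_Sketch is
      procedure Create (File : in out File_Type;
                        Mode : in File_Mode := Out_File;
                        Name : in Wide_Wide_String := "";
                        Form : in Wide_Wide_String := "") is
      begin
         --  Assumption: the target accepts UTF-8 octets in a String name.
         Ada.Text_IO.Create
           (File, Mode, UTF.Encode (Name), UTF.Encode (Form));
      end Create;

      function Name (File : in File_Type) return Wide_Wide_String is
      begin
         --  Raises Encoding_Error if the full name is not valid UTF-8.
         return UTF.Decode (Ada.Text_IO.Name (File));
      end Name;
   end Names_Sketch;

   F         : File_Type;
   File_Name : constant Wide_Wide_String :=
     "caf" & Wide_Wide_Character'Val (16#E9#) & ".txt";
begin
   Names_Sketch.Create (F, Name => File_Name);
   Put_Line (F, "hello");
   Delete (F);  --  remove the file again so the sketch leaves no trace
end Wide_Name_Sketch;
```

The calling program sees only Wide_Wide_String; the conversion to whatever the
target wants stays inside the package body.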
We do not try to add wide versions of exception messages. We want existing
code to work unmodified. However, a wide exception message would either make
existing syntax incompatible by making it ambiguous, or would make it painful
to use wide messages by not having syntax as an option. Having implementations
use multiple formats for exception messages would break techniques where the
values of objects are streamed as part of the message (a common work-around to
attach values to a raised exception). Instead, we recommend that projects that
require Wide_Wide_Character messages use UTF-8 encoding.
Note that UTF-8 encoding needs to be applied by projects, not implementations;
if an implementation were to apply UTF-8 encoding to messages, streamed values
could be corrupted (as upper-128 characters are expanded into two octets).
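A minimal sketch of the recommended project-level technique follows, using the
existing A.4.11 packages. The exception Bad_Name and the sample text are
invented for the example. The Raw/Grown pair shows why an implementation must
not re-encode messages on its own: a single octet of streamed data above
position 127 would become two octets.

```ada
with Ada.Exceptions;
with Ada.Strings.UTF_Encoding.Strings;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Wide_Wide_Text_IO;

procedure UTF_8_Message_Sketch is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   Bad_Name : exception;  --  hypothetical, for this sketch only

   --  Greek alpha, beta, gamma: not representable in Latin-1.
   Text : constant Wide_Wide_String :=
     (Wide_Wide_Character'Val (16#03B1#),
      Wide_Wide_Character'Val (16#03B2#),
      Wide_Wide_Character'Val (16#03B3#));

   --  One octet of "binary" data, as a program might stream into a message.
   Raw   : constant String := (1 => Character'Val (16#C8#));
   Grown : constant String := Ada.Strings.UTF_Encoding.Strings.Encode (Raw);
begin
   --  If messages were re-encoded implicitly, Raw would become two octets.
   pragma Assert (Grown'Length = 2);

   --  The project applies the encoding explicitly when raising ...
   raise Bad_Name with UTF.Encode (Text);
exception
   when E : Bad_Name =>
      --  ... and decodes explicitly when handling.
      declare
         Decoded : constant Wide_Wide_String :=
           UTF.Decode (Ada.Exceptions.Exception_Message (E));
      begin
         pragma Assert (Decoded = Text);
         Ada.Wide_Wide_Text_IO.Put_Line (Decoded);
      end;
end UTF_8_Message_Sketch;
```

Because both ends agree on the encoding, the message survives the trip through
type String unchanged, and code that puts other data into messages is
unaffected.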
!corrigendum 11.4.1(19)
Insert after the paragraph:
Exception_Message (by default) and Exception_Information should produce
information useful for debugging. Exception_Message should be short (about one
line), whereas Exception_Information can be long. Exception_Message should not
include the Exception_Name. Exception_Information should include both the
Exception_Name and the Exception_Message.
the new paragraph:
NOTES
3 UTF-8 encoding (see A.4.11) can be used to represent non-ASCII characters in
exception messages.
!corrigendum A.8.1(15)
Insert after the paragraph:
Status_Error : exception renames IO_Exceptions.Status_Error;
Mode_Error : exception renames IO_Exceptions.Mode_Error;
Name_Error : exception renames IO_Exceptions.Name_Error;
Use_Error : exception renames IO_Exceptions.Use_Error;
Device_Error : exception renames IO_Exceptions.Device_Error;
End_Error : exception renames IO_Exceptions.End_Error;
Data_Error : exception renames IO_Exceptions.Data_Error;
the new paragraphs:
package Wide_File_Names is
-- File management
procedure Create(File : in out File_Type;
Mode : in File_Mode := Out_File;
Name : in Wide_String := "";
Form : in Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_String;
Form : in Wide_String := "");
function Name (File : in File_Type) return Wide_String;
function Form (File : in File_Type) return Wide_String;
end Wide_File_Names;
package Wide_Wide_File_Names is
-- File management
procedure Create(File : in out File_Type;
Mode : in File_Mode := Out_File;
Name : in Wide_Wide_String := "";
Form : in Wide_Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_Wide_String;
Form : in Wide_Wide_String := "");
function Name (File : in File_Type) return Wide_Wide_String;
function Form (File : in File_Type) return Wide_Wide_String;
end Wide_Wide_File_Names;
!corrigendum A.8.2(28.3/4)
Insert after the paragraph:
The exception Status_Error is propagated if the file is not open.
The exception Mode_Error is propagated if the mode of the file is In_File.
the new paragraphs:
The nested package Wide_File_Names provides operations equivalent to the
operations of the same name of the outer package except that Wide_String is
used instead of String for the name and form of the external file.
The nested package Wide_Wide_File_Names provides operations equivalent to the
operations of the same name of the outer package except that Wide_Wide_String
is used instead of String for the name and form of the external file.
!corrigendum A.8.4(18)
Insert after the paragraph:
Status_Error : exception renames IO_Exceptions.Status_Error;
Mode_Error : exception renames IO_Exceptions.Mode_Error;
Name_Error : exception renames IO_Exceptions.Name_Error;
Use_Error : exception renames IO_Exceptions.Use_Error;
Device_Error : exception renames IO_Exceptions.Device_Error;
End_Error : exception renames IO_Exceptions.End_Error;
Data_Error : exception renames IO_Exceptions.Data_Error;
the new paragraphs:
package Wide_File_Names is
-- File management
procedure Create(File : in out File_Type;
Mode : in File_Mode := Inout_File;
Name : in Wide_String := "";
Form : in Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_String;
Form : in Wide_String := "");
function Name (File : in File_Type) return Wide_String;
function Form (File : in File_Type) return Wide_String;
end Wide_File_Names;
package Wide_Wide_File_Names is
-- File management
procedure Create(File : in out File_Type;
Mode : in File_Mode := Inout_File;
Name : in Wide_Wide_String := "";
Form : in Wide_Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_Wide_String;
Form : in Wide_Wide_String := "");
function Name (File : in File_Type) return Wide_Wide_String;
function Form (File : in File_Type) return Wide_Wide_String;
end Wide_Wide_File_Names;
!corrigendum A.10.1(85)
Replace the paragraph:
Status_Error : exception renames IO_Exceptions.Status_Error;
Mode_Error : exception renames IO_Exceptions.Mode_Error;
Name_Error : exception renames IO_Exceptions.Name_Error;
Use_Error : exception renames IO_Exceptions.Use_Error;
Device_Error : exception renames IO_Exceptions.Device_Error;
End_Error : exception renames IO_Exceptions.End_Error;
Data_Error : exception renames IO_Exceptions.Data_Error;
Layout_Error : exception renames IO_Exceptions.Layout_Error;
private
... -- not specified by the language
end Ada.Text_IO;
by:
Status_Error : exception renames IO_Exceptions.Status_Error;
Mode_Error : exception renames IO_Exceptions.Mode_Error;
Name_Error : exception renames IO_Exceptions.Name_Error;
Use_Error : exception renames IO_Exceptions.Use_Error;
Device_Error : exception renames IO_Exceptions.Device_Error;
End_Error : exception renames IO_Exceptions.End_Error;
Data_Error : exception renames IO_Exceptions.Data_Error;
Layout_Error : exception renames IO_Exceptions.Layout_Error;
package Wide_File_Names is
-- File management
procedure Create (File : in out File_Type;
Mode : in File_Mode := Out_File;
Name : in Wide_String := "";
Form : in Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_String;
Form : in Wide_String := "");
function Name (File : in File_Type) return Wide_String;
function Form (File : in File_Type) return Wide_String;
end Wide_File_Names;
package Wide_Wide_File_Names is
-- File management
procedure Create (File : in out File_Type;
Mode : in File_Mode := Out_File;
Name : in Wide_Wide_String := "";
Form : in Wide_Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_Wide_String;
Form : in Wide_Wide_String := "");
function Name (File : in File_Type) return Wide_Wide_String;
function Form (File : in File_Type) return Wide_Wide_String;
end Wide_Wide_File_Names;
private
... -- not specified by the language
end Ada.Text_IO;
!corrigendum A.12.1(26)
Insert after the paragraph:
Status_Error : exception renames IO_Exceptions.Status_Error;
Mode_Error : exception renames IO_Exceptions.Mode_Error;
Name_Error : exception renames IO_Exceptions.Name_Error;
Use_Error : exception renames IO_Exceptions.Use_Error;
Device_Error : exception renames IO_Exceptions.Device_Error;
End_Error : exception renames IO_Exceptions.End_Error;
Data_Error : exception renames IO_Exceptions.Data_Error;
the new paragraphs:
package Wide_File_Names is
-- File management
procedure Create (File : in out File_Type;
Mode : in File_Mode := Out_File;
Name : in Wide_String := "";
Form : in Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_String;
Form : in Wide_String := "");
function Name (File : in File_Type) return Wide_String;
function Form (File : in File_Type) return Wide_String;
end Wide_File_Names;
package Wide_Wide_File_Names is
-- File management
procedure Create (File : in out File_Type;
Mode : in File_Mode := Out_File;
Name : in Wide_Wide_String := "";
Form : in Wide_Wide_String := "");
procedure Open (File : in out File_Type;
Mode : in File_Mode;
Name : in Wide_Wide_String;
Form : in Wide_Wide_String := "");
function Name (File : in File_Type) return Wide_Wide_String;
function Form (File : in File_Type) return Wide_Wide_String;
end Wide_Wide_File_Names;
!corrigendum A.15.1(0)
Insert new clause:
The packages Wide_Command_Line and Wide_Wide_Command_Line allow a program to
obtain the values of its arguments and to set the exit status code to be
returned on normal termination.
Static Semantics
The specification of package Wide_Command_Line is the same as for Command_Line,
except that each occurrence of String is replaced by Wide_String.
The specification of package Wide_Wide_Command_Line is the same as for
Command_Line, except that each occurrence of String is replaced by
Wide_Wide_String.
!corrigendum A.16.2(0)
Insert new clause:
The packages Wide_Directories and Wide_Wide_Directories provide operations for
manipulating files and directories, and their names.
Static Semantics
The specification of package Wide_Directories is the same as for Directories
(including its optional child packages Information and Hierarchical_File_Names),
except that each occurrence of String is replaced by Wide_String.
The specification of package Wide_Wide_Directories is the same as for
Directories (including its optional child packages Information and
Hierarchical_File_Names), except that each occurrence of String is replaced by
Wide_Wide_String.
!corrigendum A.17.1(0)
Insert new clause:
The packages Wide_Environment_Variables and Wide_Wide_Environment_Variables
allow a program to read or modify environment variables.
Static Semantics
The specification of package Wide_Environment_Variables is the same as for
Environment_Variables, except that each occurrence of String is replaced by
Wide_String.
The specification of package Wide_Wide_Environment_Variables is the same as for
Environment_Variables, except that each occurrence of String is replaced by
Wide_Wide_String.
!ACATS test
ACATS C-Tests are needed for the new packages (nested and otherwise).
!appendix
This AI was created from the ashes of AI05-0286-1 (that is, those portions
that defied an easy solution).
****************************************************************
From: Gautier de Montmollin
Sent: Thursday, March 14, 2013 5:51 AM
!topic Ada.Directories: Form parameter for all subprograms with file or directory names
!reference Ada 2012 RM A.16
!from Author Gautier de Montmollin 2013-03-14
!keywords directories
!discussion
In Ada.Directories, only a few subprograms of those having a String for file or
directory name provide a Form parameter. It prevents an implementation providing
the same as for Ada.Text_IO, for instance recognizing the "encoding=utf-8"
sub-string. As a result Ada.Directories becomes practically useless for software
meant to run on file systems with international character sets.
****************************************************************
From: Adam Beneschan
Sent: Thursday, March 14, 2013 10:23 AM
The Form parameter is implementation-dependent, so any solution based on adding
a Form parameter is going to be implementation-dependent. Because of this, it
might be better to request AdaCore or whoever your compiler vendor is to add
their own package Ada.Directories.Extensions to provide the functionality you
want, since it's going to be implementation-dependent anyway.
It appears that the ARG is starting to think about a more permanent solution, to
the general problem of allowing Wide_String and Wide_Wide_String in places where
only String is currently allowed. See AI12-0021. Personally, I think something
like this ought to be done. The current "solution", in which a String is used
to hold UTF-8 sequences in some situations, is an obnoxious hack. A String is
an array of characters; and to me, the idea that a String whose 'Length is (say)
23 can be used to represent a string that really has 19 characters in it, is an
abuse. It's been allowed as a temporary compromise, because something was
needed and a real solution is difficult. But it's still an abuse. If we're
going to entrench the idea of using String types to hold non-String data such as
UTF-8 bit encodings, we might as well give up and start programming in C.
So I'm not in favor of adding anything like this to the language standard.
****************************************************************
From: Jeff Cousins
Sent: Thursday, March 14, 2013 10:56 AM
Thanks for replying on this topic, Adam. It does seem to be something that is
going to be very hard to make rules about; it looks like it's going to be up to
whatever implementation is used to offer something sensible for the platform
used. The discussion of AI05-0286-1/02 in the minutes of the 46th ARG meeting,
publicly available at http://www.ada-auth.org/arg-minutes.html, might show that
it has been thought about, and how hard an issue it is to tackle.
****************************************************************
From: Gautier de Montmollin
Sent: Thursday, March 14, 2013 3:04 PM
Anyway, a Wide_String version with no rule is fundamentally better than the
String version with no rule! For Wide_String, implementors will follow the
UTF-16 de facto standard in place for 20 years at least and that's it. For the
String version Ada.Directories is just dysfunctional... So please don't wait too
long...
****************************************************************
From: Adam Beneschan
Sent: Thursday, March 14, 2013 6:33 PM
I think there's some widespread and fundamental confusion when it comes to UTF
and encodings. A Wide_String is just an array of Wide_Characters. A
Wide_Character is, fundamentally, just a number between 0 and 65535, where each
number represents a character that has been assigned that number in the Unicode
Basic Multilingual Plane. A Wide_String is an array of those numbers. There
should be no "encoding" involved, UTF-16 or otherwise. A Wide_String will
normally be represented as just an array of 16-bit integers that mean
themselves.
This means that a Wide_Character in a Wide_String can't represent a character in
a different plane, i.e. from U+10000 and up. But that's what Wide_Wide_String
is for. And I am certain that the ARG will not produce a solution that allows
Wide_Strings as file names that doesn't also allow Wide_Wide_Strings.
So UTF-16 has no place in this discussion. If Ada.Directories allows
Wide_Strings and Wide_Wide_Strings, the implementation may need to convert them
to UTF-8 in order to communicate with the OS, but the Ada program that uses
Ada.Directories doesn't need to know about this implementation detail.
I feel like there are a lot of people who aren't clear on the distinctions
between the concepts (and the use of things like charset="utf-8" in HTML files
just adds to the confusion, since UTF-8 is really an encoding algorithm and not
a character set). Hopefully I've helped clear up some confusion among a few
people, but I feel like this is a losing battle.
****************************************************************
From: Gautier de Montmollin
Sent: Thursday, March 14, 2013 7:10 PM
Could not agree more!
Go for Ada.Directories with String's, Wide_String's and Wide_Wide_String's !
****************************************************************
From: Randy Brukardt
Sent: Thursday, March 14, 2013 7:54 PM
That's easy, but it doesn't fix anything. That's because you have to be able to
Create and Open files with the result. And pass Forms that contain file names.
And retrieve Names and Forms. And on and on.
Note that this is completely orthogonal to what kind of I/O the package
supports: it has nothing to do with Wide_Text_IO, for instance.
To follow the Wide_xxx and Wide_Wide_xxx to its logical limit, you'd have to
add Wide_ and Wide_Wide_ versions of all of the file manipulation routines in
*every* existing I/O package. Which is insane (no, we do not want
"Wide_Wide_Open" and "Wide_Wide_Name").
Moreover, we would have to decide what happens if the name of a file that
contains 32-bit characters is retrieved via Name. And recall that Name is
required to return a "name that uniquely identifies the file", which usually
means including the full path. In which case, there could be 32-bit characters
in the result returned by Name even if the simple name only is ASCII (if for
instance the user's login name and thus home directory contained such
characters).
The obvious solution to this problem is to raise an exception -- but that would
be incompatible with existing practice on Linux (where UTF-8 can be used in type
String without any interpretation) as well as practice involving Form
parameters. And it would be incompatible for anyone unfortunate enough to run
their program from a directory named using characters above position 256. We
need to avoid run-time incompatibilities if at all possible (because there is no
automatic way to detect them); while this particular case mostly involves
implementation-defined behavior, the effect would be just as dangerous to
programs that depend on it.
It was cases like these that caused the ARG to discard the rough proposal that I
had made for Ada 2012 and decide to defer any change until the next version of
Ada (whenever that will be).
As far as I can tell, the only real solution is to blow it all up and start over
using a tagged root type (tentatively named Root_String'Class, although maybe
we'd use it only for file names in which case it would get an appropriate name
for that). If the tagged root type allowed string literals (the only real change
needed to the language), there wouldn't be much user-level change, and the
implementations could include properly typed UTF-8 and UTF-16 strings, along
with anything else that might make sense.
Note that there are similar problems with Ada.Command_Line,
Ada.Environment_Variables, Ada.Exceptions (the exception message part), and
probably other packages that we haven't thought about. This makes blowing it up
more attractive, because adding dozens of new routines that hardly anyone will
want to use, and adding incompatibilities as well, does not seem like a good
plan.
I don't know if there will be the will to "blow it up", but in any case, there
is nothing simple or easy about this problem, and it does everyone a disservice
to claim that there is an "easy" solution.
****************************************************************
From: Robert Leif
Sent: Friday, March 15, 2013 11:25 AM
I believe that an alternative solution to the problem is to proceed one step up
in abstraction. A linear array generic type could be made that included the
string operations of Text_IO. It could be instantiated with any type of
character including 4 bit characters or even 1 bit characters. Then it could be
the basis of Root_String'Class or whatever you want to call it.
If anyone is interested, I have spent the last several years writing XML schemas
(CytometryML.org) written in the XML Schema Definition Language (XSD). XSD
1.1 includes assertions and restriction (generics). I basically fake datatype
declarations in Ada specifications.
****************************************************************
From: Gautier de Montmollin
Sent: Saturday, March 16, 2013 8:51 AM
Another idea would be not to change the standard at all about this, and persuade
at least one major compiler vendor to use utf-8 for file or directory names in
Ada.Directories. For instance GNAT is applying the equivalent tactic for
arguments in Ada.Command_Line, since 2008. From the Development Log,
NF-62-HB07-027-gnat:
"Unicode characters on Windows command line On Windows Ada.Command_Line now
supports Unicode characters. Arguments are returned encoded in UTF-8 allowing
better handling of Unicode file names names as arguments."
****************************************************************
From: Vadim Godunko
Sent: Thursday, April 25, 2013 9:00 AM
> Another idea would be not to change the standard at all about this,
> and persuade at least one major compiler vendor to use utf-8 for file
> or directory names in Ada.Directories.
Use of some UTF-XX is fine for Windows and MacOSX, which are UTF-based. On POSIX
systems any encoding can be selected by user and it is important to use it
consistently for each call to imported libraries and to do input/output
operations.
****************************************************************
From: Florian Weimer
Sent: Sunday, April 28, 2013 9:28 AM
There's also an expectation that it's possible to access files whose names are
not in the encoding range of the current locale.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, May 1, 2013 1:13 PM
Huh? UTF-8 covers all locales as all possible characters are in it; there should
be no adjustment afterwards or there is something quite wrong going on. Locales
only apply to pure 8-bit encodings, and that's impossible to do anything
sensible with. There is an issue on Windows about file name equivalence, but
that's something that simply never, ever should be used, because it's impossible
to work out (in part because of the locale issue). Linux and Unix don't have
that problem.
Anyway, there are two problems here, and they're at cross-purposes.
One is the desire to let IO routines open and manipulate any file that could
exist on the system.
Second is the desire to portably be able to create and manipulate the *name* of
any file that Text_IO can *create*.
On a system with no file name rules, it's clearly not possible to do both
(you've got to have some rules in order to do portable manipulation).
The purpose of Ada.Directories is exclusively the second - *portable*
manipulation of files and names. That means that by definition it will have to
be more limited than "everything the system can do". Thus, using UTF-8
exclusively would be sufficient for it.
OTOH, we probably don't want such a restriction in Text_IO.Open (for example).
That seems OK to me, as those "old" interfaces aren't going anywhere even if a
new set is created using Root_String'Class or Wide_Wide_String. If you need
bizarre capabilities on Linux (such as EBCDIC file names), use the old
interfaces.
****************************************************************
From: Yannick Duchene
Sent: Wednesday, May 1, 2013 1:47 PM
While it is true that Unicode covers the character requirements of most
languages and locales, it does not cover everything possibly needed. There are
two main reasons. First, Unicode is always defining new characters as the
standard evolves (which implies that not all possible characters are
necessarily in it). Second, Unicode is not very welcome in some countries (like
Japan), where some lobbies (official or not) do all they can to preserve their
own encoding as the official encoding, arguing that Unicode is missing too many
specifics of their writing system. But Unicode also has private use areas,
which allow for additional local definitions (this requires a local agreement
between the parties involved, an issue the Ada standard does not have to bother
with).
Unicode is a good choice, but it will not make everyone happy for a long
time.
I would say well-formed UTF-8, with the requirement to be transparent to
code points from private use areas: no attempt to transform, interpret, or
decide whether such a code point is valid for a file name; always accept it as
valid.
(I hope I did not miss the point, as I have not read all the mails on this
issue.)
****************************************************************
From: Randy Brukardt
Sent: Wednesday, May 1, 2013 6:08 PM
...
> > Huh? UTF-8 covers all locales as all possible characters are in it;
>
> While it is true that Unicode covers the character requirements of most
> languages and locales, it does not cover everything possibly needed. There
> are two main reasons. First, Unicode is always defining new characters as
> the standard evolves (which implies that not all possible characters are
> necessarily in it).
If they're not in Unicode, they're not anywhere. In any case, added characters
are not an issue.
> ... Second, Unicode is not very welcome in some countries (like Japan),
> where some lobbies (official or not) do all they can to preserve their own
> encoding as the official encoding, arguing that Unicode is missing too many
> specifics of their writing system. But Unicode also has private use areas,
> which allow for additional local definitions (this requires a local
> agreement between the parties involved, an issue the Ada standard does not
> have to bother with).
>
> Unicode is a good choice, but it will not make everyone happy for a long
> time.
>
> I would say well-formed UTF-8, with the requirement to be transparent to
> code points from private use areas: no attempt to transform, interpret, or
> decide whether such a code point is valid for a file name; always accept it
> as valid.
What's a legal file name is implementation-defined, and I certainly don't see
that changing. Some characters are not allowed in Windows file names, for
example, and the Ada standard cannot try to insist that they're allowed. So I
find this irrelevant -- indeed, whether there is any support at all for UTF-8
file names will always be implementation-defined. The problem now is that we don't
have any sane way to *allow* it -- there will never be a *requirement* to
support it.
****************************************************************
From: Justin Squirek
Sent: Tuesday, November 13, 2018 1:29 PM
Hey Jeff, I made some very minor wording edits. [This is version /03 of the
AI -Editor.]
****************************************************************
From: Randy Brukardt
Sent: Wednesday, December 5, 2018 6:47 PM
Here are some editorial comments on this:
(1) I realize you're new here, Justin, but when wording says "modify" some
paragraph, the changes have to be marked with {} for insertions and [] for
deletions. This is the preferred form, because it makes my job easier and it is
easier to see the changes. Otherwise, you need to use "Replace".
In any case, you are not supposed to remove those marks from Jeff's version, nor
are you allowed to change existing wording without showing the changes.
Moreover, why would anyone change "declarations are repeated" to "declarations
get repeated"? There's nothing wrong with the first wording, and we don't change
existing wording just because someone would like a different verb.
I've ignored this change completely.
(2) You changed the spacing of the various packages. Jeff copied the spacing of
the original Sequential_IO and Direct_IO packages exactly. I'd agree that
the original spacing is suboptimal, but when putting new things directly next to
old things, we generally copy the original style, rather than invent a new one.
(With RM wording, context is important.) So I left the spacing as Jeff had it.
OTOH, Jeff got the spacing wrong for Text_IO and Stream_IO (these are the better
spacing). So I did use your changes there. (Don't you love consistency???)
(3) You changed the default mode for Create to "InOut_File" for most of these
packages. Jeff did have them wrong for Direct_IO and Stream_IO, but you have it
spelled wrong (the 'o' should be in lower case). And you added it to Text_IO,
but Text_IO doesn't even have that mode. Another change I ignored.
(4) In A.15.1, Jeff has "is the same as", which you changed to "identical". Jeff
is just copying the style of the existing similar wording, and it is not a good
idea to use a different form (recall the issues that I previously noted about
consistency?) We also try to avoid words like "identical" and "equivalent"
because they are rarely accurate. "identical except blah" is *not* identical, after
all! I left the wording Jeff had.
[Editor's note: These comments apply to version /03 as posted.]
****************************************************************
From: Jeff Cousins
Sent: Wednesday, November 14, 2018 12:13 PM
Thanks again Justin.
11.4.1.(19) “non-ASCII” seems a bit too colloquial, I still prefer “non-ASCII
characters”.
A.15.1, A.16.2 “is the same as” seems to be the more normal RM-speak than “is
identical to”, e.g. A.11.
Otherwise fine.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, November 14, 2018 6:07 PM
> 11.4.1.(19) "non-ASCII" seems a bit too colloquial, I still prefer
> "non-ASCII characters".
It doesn't matter, as the term "ASCII" isn't defined for a normative
description of characters as it is not an ISO standard, so you shouldn't
use it here (or anywhere in normative wording; it's OK in AARM notes).
You could tie this text to the contents of the package ASCII, something
like "characters not present in package ASCII". But since the package ASCII
is obsolescent, I'd recommend against that.
Ada.Characters.Handling uses "ISO_646" for this purpose (that being the ISO
standard in question), so you could say something like "characters not in ISO
646" or you could even reference the subtype directly "characters not in
Characters.Handling.ISO_646".
Finally, you could simply say what you really mean and talk about character
code points: "characters whose code point is greater than 127".
Moral: everything about characters is harder than it seems. :-)
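[Illustration, not part of the original mail: a minimal sketch of the two
formulations suggested above, "characters not in ISO 646" and "characters whose
code point is greater than 127". The sample strings are made up; Is_ISO_646 is
the language-defined function in Ada.Characters.Handling.]

```ada
with Ada.Characters.Handling;
with Ada.Text_IO;

procedure ISO_646_Demo is
   use Ada.Characters.Handling;

   --  Made-up sample values: one pure ISO 646 string, and one ending in
   --  the Latin-1 character at position 16#E9# (e with acute accent).
   Plain    : constant String := "file.txt";
   Accented : constant String := "caf" & Character'Val (16#E9#);
begin
   --  Every character of Plain is in ISO 646.
   Ada.Text_IO.Put_Line (Boolean'Image (Is_ISO_646 (Plain)));      --  TRUE

   --  The last character of Accented is outside ISO 646.
   Ada.Text_IO.Put_Line (Boolean'Image (Is_ISO_646 (Accented)));   --  FALSE

   --  The same test expressed directly on the code point.
   Ada.Text_IO.Put_Line
     (Boolean'Image (Character'Pos (Accented (Accented'Last)) > 127));  --  TRUE
end ISO_646_Demo;
```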
****************************************************************
From: Randy Brukardt
Sent: Wednesday, December 5, 2018 7:10 PM
Here are my technical comments on this AI:
[Aside: "you" in most of these cases was originally Jeff, but both authors
share responsibility.]
>Add after 11.4.1(19):
>
>It is recommended that exception messages requiring non-ASCII use UTF-8
>encoding.
You have this in an Implementation Advice section. But this is advice to the
programmer -- the implementation cannot and must not do this on its own. (How
would the user of Exception_Message know that it is encoded in UTF-8 if the
implementation did that itself? Moreover, what would that do to messages that
include a streamed [binary] portion? Only a project can decide to use UTF-8
encoding universally for messages.)
So this should be a user note. If it is a user note, we can be less rigorous,
so saying "non-ASCII" arguably would be better than "non ISO-646".
Additionally, the discussion needs an explanation of why this cannot be
changed. (Answer: it appears in existing syntax and pragmas that would become
ambiguous or illegal if the definition of the message changed. That would be a
substantial compatibility problem. Similarly, existing code that does not
assume UTF-8 encoding must continue to work unmodified, including code that
encodes values into the messages; silently breaking code would be an even
worse compatibility problem.)
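[Illustration, not part of the original mail: a minimal sketch of a project
that chooses to use UTF-8 for its own exception messages, as the user note
suggests. The exception Demo_Error and the message text are hypothetical. The
handler can decode the message only because this program knows it was encoded
as UTF-8, and how the decoded text is displayed depends on the implementation
and the terminal.]

```ada
with Ada.Exceptions;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Wide_Wide_Text_IO;

procedure UTF8_Message_Demo is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   Demo_Error : exception;  --  hypothetical exception for this sketch

   --  A message containing a character beyond code point 127
   --  (GREEK CAPITAL LETTER OMEGA, U+03A9).
   Text : constant Wide_Wide_String :=
     "bad name: " & Wide_Wide_Character'Val (16#03A9#);
begin
   --  The message parameter is an ordinary String, so the text is
   --  encoded as UTF-8 before being attached to the exception.
   raise Demo_Error with UTF.Encode (Text);
exception
   when E : Demo_Error =>
      --  By project convention the message is UTF-8, so decode it.
      Ada.Wide_Wide_Text_IO.Put_Line
        (UTF.Decode (Ada.Exceptions.Exception_Message (E)));
end UTF8_Message_Demo;
```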
---
>Add at the end of A.8.2:
You need to provide the paragraph number. If you had done that, you might
have seen that the end of this subclause is an Implementation Permissions
section, in which this text is totally inappropriate.
---
In A.16.2:
>The specification of package Wide_Directories is the same as for
>Directories (including its optional child package
>Hierarchical_File_Names), except that each occurrence of String is replaced
>by Wide_String.
I think you need to mention its non-optional child Information as well.
The contents of Information are not specified by the language, but it needs
to be present. (And suggested contents are given in the AARM for Windows and
Linux implementations.) See A.16(124/2). We don't want
Wide_Directories.Information to be any different than Directories.Information.
****************************************************************
From: Jeff Cousins
Sent: Thursday, December 6, 2018 10:46 AM
> (3) You changed the default mode for Create to "InOut_File" for most of
> these packages. Jeff did have them wrong for Direct_IO and Stream_IO, but you
> have it spelled wrong (the 'o' should be in lower case). And you added it to
> Text_IO, but Text_IO doesn't even have that mode. Another change I ignored.
?? The Modes I used were as per the parents, i.e. Out_File for Sequential_IO,
InOut_File (though, as Randy says, I should have said Inout_File) for
Direct_IO, and Out_File for Text_IO and Stream_IO.
****************************************************************
From: Randy Brukardt
Sent: Friday, December 7, 2018 12:49 AM
>(3) You changed the default mode for Create to "InOut_File" for most of
>these packages. Jeff did have them wrong for Direct_IO and Stream_IO, but you
>have it spelled wrong (the 'o' should be in lower case). And you added it to
>Text_IO, but Text_IO doesn't even have that mode. Another change I ignored.
>?? The Modes I used were as per the parents, i.e. Out_File for Sequential_IO,
>InOut_File (though, as Randy says, I should have said Inout_File) for
>Direct_IO, and Out_File for Text_IO and Stream_IO.
You're right, your version was originally correct. Justin's version confused
me enough to get them all messed up. I corrected them in a new AI version.
...
>> You have this in an Implementation Advice section ...
>> Additionally, the discussion needs an explanation ...
...
>Good points.
I've done both of these things.
>>>Add at the end of A.8.2:
>>You need to provide the paragraph number. If you had done that, you might
>>have seen that the end of this subclause is an Implementation Permissions
>>section, in which this text is totally inappropriate
>Agreed, I don’t know how I missed it.
I fixed this, too, in a new draft.
>>>The specification of package Wide_Directories is the same as for Directories
>>>(including its optional child package Hierarchical_File_Names), except that
>>>each occurrence of String is replaced by Wide_String.
>>I think you need to mention its non-optional child Information as well.
>>The contents of Information are not specified by the language, but it needs
>>to be present. (And suggested contents are given in the AARM for Windows and
>>Linux implementations.) See A.16(124/2). We don't want
>>Wide_Directories.Information to be any different than
>>Directories.Information.
>I must admit that the existence of child package Information had totally passed
>me by. Though if the underlying OS doesn’t provide any additional information,
>then won’t the child package not exist, or are you saying that it will exist
>but be empty? If the latter, then I think it should be shown in the Static
>Semantics section, even if it just contains a “ ... -- not specified by the
>language” type of comment.
Restriction No_Implementation_Identifiers treats this package as
language-defined with implementation-defined contents, just like Machine_Code.
That's probably the model that we should use here.
This package is hidden as much as it is because we can't mention Windows or
Linux in the normative Standard - if we could have, we would have required
minimum contents on those systems and allowed implementations to add to them.
Its presence had escaped Robert Dewar, too, probably because he famously
refused to look in the AARM. Not sure if AdaCore ever fixed this oversight.
(If I could think of a sane way to test this in the ACATS I would.)
Anyway, I added it to your existing text by mentioning "optional child packages
Information and Hierarchical_File_Names". It's in the index, people can find
it if they have to.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, December 12, 2018 1:33 AM
When I put these into the RM, I noticed a number of issues.
(1) I dropped the change to A.8.2(1) because it isn't relevant: the wide and
wide wide forms of text io are defined by equivalence, so they don't need
to be mentioned here (and it is bad precedent, we'd have to make a change
like that in other general places as well).
(2) The A.8.2(28.3/4) addition read:
The nested package Wide_File_Names provides operations equivalent to those in
regular Sequential_IO except that Wide_String is used instead of String for the
name and form of the external file.
The nested package Wide_Wide_File_Names provides operations equivalent to those
in regular Sequential_IO except that Wide_Wide_String is used instead of String
for the name and form of the external file.
But this subclause applies to all 4 of the IO packages: sequential io, direct io,
text io, and stream io. It's plain wrong to put some text that only applies to
sequential io into this clause (we have A.8.3 for that). Moreover, this rule is
plenty generic to apply to all of the packages. So I replaced it by:
The nested package Wide_File_Names provides operations equivalent to the
operations of the same name of the outer package except that Wide_String is
used instead of String for the name and form of the external file.
The nested package Wide_Wide_File_Names provides operations equivalent to the
operations of the same name of the outer package except that Wide_Wide_String
is used instead of String for the name and form of the external file.
(3) Now that there is a general description, we don't need to repeat it at
A.8.2(20), A.10.2(5), and A.12.1(33). So those changes were dropped as well.
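[Illustration, not part of the original mail: a minimal sketch of how the
nested package described in (2) would be used, assuming an implementation that
provides Wide_File_Names as declared in the !wording of this AI (not every
compiler does). The instantiation Int_IO and the file name are made up, and
whether the name is accepted depends on the implementation and the target file
system.]

```ada
with Ada.Sequential_IO;

procedure Wide_Name_Demo is
   --  Hypothetical instantiation for this sketch.
   package Int_IO is new Ada.Sequential_IO (Integer);

   F : Int_IO.File_Type;
begin
   --  Same operation name as the outer package's Create, but Name (and
   --  Form) are Wide_String. Mode defaults to Out_File, as in the parent.
   Int_IO.Wide_File_Names.Create
     (F, Name => "d" & Wide_Character'Val (16#E9#) & "mo.dat");

   --  Everything that does not involve a name or form still comes from
   --  the outer package.
   Int_IO.Write (F, 42);
   Int_IO.Close (F);
end Wide_Name_Demo;
```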
****************************************************************