Version 1.1 of ai12s/ai12-0021-1.txt
!standard A.16(0/3) 12-03-13 AI12-0021-1/01
!class Amendment 12-03-13
!status work item 12-02-25
!status received 12-02-25
!priority High
!difficulty Hard
!subject Additional internationalization of Ada
!summary
**TBD.
!proposal
In addition to the facilities already provided,
(1) File and directory operations should support Unicode characters (presuming
that the target file system does so);
(2) Exception messages and exception information should support Unicode
characters;
(3) Command lines should support Unicode characters (presuming that the target
system allows these).
!wording
** TBD.
!discussion
These issues defy an easy solution. Changing the behavior of the existing routines
would break existing workarounds (which on some targets, like most Linux systems,
have no problems with directly using UTF-8 strings) and other commonly used
functionality (like encoding binary data in exception messages).
Adding even more Wide_Wide_ packages and routines is madness, and may not even make
sense on many targets.
The crux of this problem is that the semantics and representation of strings have
become co-mingled. What we really need to do is to separate these; the problem with
that mostly with retaining adequate performance.
The way-out solution would be to declare a semi-magic Root_String interface (or
perhaps an abstract type); the
magic would be string literals (since "lvalue"s and indexing already can be
supported with existing Ada 2012 facilities). Something on the line of:
package General_Strings is
type Root_String is interface with
Constant_Indexing => Get_Char,
Variable_Indexing => Set_Char,
String_Literal => Assign; --
function Get_Char (A : Root_String; I : Positive) return Wide_Wide_Character;
--
type LValue (D : access Wide_Wide_Character)
with Implicit_Dereferencing => D
is null record;
function Set_Char (A : in out Root_String; I : Positive) return LValue;
--
--
function Slice (A : Root_String; L : Positive; R : Natural) return Wide_Wide_String;
--
--
procedure Assign (Trg : in out Root_String; Src : in Wide_Wide_String);
--
function Value (Obj : Root_String) return Wide_Wide_String;
--
--
--
--
end General_Strings;
[Note: I didn't try to think of good names for these routines and parameters; that would need to
done, of course.]
Then we'd have a bunch of concrete instances:
type Latin_1_String (L, R : Positive) is new General_Strings.Root_String with
Obj : String (L .. R);
--
type Bounded_UTF_8_String (Byte_Len : Natural) is new General_Strings.Root_String with
Obj : UTF_8_String (1 .. Byte_Len);
--
and so on for every interesting representation.
In addition, we'd have Ada.Strings.General (which would have approximately the contents of
Ada.Strings.Fixed, with all of the String parameters converted to Root_String'Class). And
most of the IO routines that take strings would have versions that would take Root_String'Class
(these would need different names or packages, unfortunately, to avoid ambiguity). Similarly
for exception messages, and so on.
The real key here is that the string types would carry their representation along when passed
into routines (which have to be new for this reason). Once that is available, then any problems
can be dealt with by simply using whatever representation is appropriate for the target system.
This clearly will need a lot of work...
!ACATS test
** TBD.
!appendix
This AI was created from the ashes of AI05-0286-1 (that is, those portions
that defied an easy solution).
****************************************************************
Questions? Ask the ACAA Technical Agent