Version 1.5 of ai05s/ai05-0127-1.txt

!standard A.19          09-10-31 AI05-0127-1/02
!standard A.19.1
!standard A.19.2
!standard A.19.3
!standard A.19.5
!standard A.19.6
!standard A.19.7
!standard A.19.8
!standard A.19.9
!standard A.19.10
!standard A.19.11
!standard A.19.12
!standard A.19.13
!class Amendment 09-10-31
!status No Action (9-0-0) 10-06-20
!status work item 08-10-22
!status received 08-10-20
!priority Low
!difficulty Medium
!subject Adding Locale Capabilities
!summary
Packages are needed to provide localization support for dates, times, numeric values, and string collating.
!problem
Ada does not provide a portable way to determine which locale is currently active in an environment. Knowing the active locale would facilitate the writing of applications that can display text using the best language text to suit the language and country of the user. Even if the locale can be determined, there is no way to present locale sensitive information in a portable manner. Dates, times, numeric values, and collating order for strings can be affected by locale. Certain applications may have a need to display locale sensitive data for locales other than the currently active locale in the external execution environment. For example, many applications do not assume the locale of the user until the user has specified which language he/she prefers. Should localization support be added to the language? (Yes.)
!proposal
An important locale-related capability that is absent from Ada is the ability to determine which locale is active in the external execution environment. Secondly, there is a need to format dates and times in the form best suited to the user, including month names and day names, in a locale-sensitive manner. Thirdly, there needs to be a way to format numeric values in a locale-sensitive manner, so that the most appropriate currency symbol, decimal point, and digit separator characters are used. The Ada.Text_IO.Editing package already provides a way to do this; however, it does not provide a means to obtain these symbols based on a locale. Finally, there needs to be a way to determine the collation order of characters and strings, for sorting purposes.
This proposal provides a new set of packages:
Ada.Locales; Ada.Locales.Calendar; Ada.Locales.Calendar.Formatting; Ada.Locales.Calendar.Wide_Formatting; Ada.Locales.Calendar.Wide_Wide_Formatting; Ada.Locales.Numeric; Ada.Locales.Numeric.Formatting; Ada.Locales.Numeric.Wide_Formatting; Ada.Locales.Numeric.Wide_Wide_Formatting; Ada.Locales.Strings; Ada.Locales.Strings.Collating; Ada.Locales.Strings.Wide_Collating; Ada.Locales.Strings.Wide_Wide_Collating.
Ada.Locales defines the locale object, which is a small record containing a two-character country code and a two-character language code. A short string called a variant is also defined, which can be used to further distinguish a locale when country and language alone are not enough. While the package allows one to construct a locale for any imaginable combination of country, language, and variant, if an attempt is made to use such a constructed locale in an environment that is not configured to support that locale, then Locale_Error is raised by the call.
A locale can be constructed without specifying values for all components of the locale. In such a case, a locale will match a configured locale based on the components that are specified. For example, a locale can be constructed for English, without specifying the Country code. This locale could match any locale in the external execution environment whether it be UK, Canada, or some other country, so long as an English locale of some sort has been configured in the environment.
If more than one configured locale matches the specified components, the selection of a matching locale is implementation defined.
The Locales package provides a function to retrieve the default locale (i.e., the currently active locale), as well as a means to set the default locale. Furthermore, it provides a function that returns the set of available locales configured in the external execution environment, and a function that returns the number of such locales.
The Locales package also provides some constant declarations for commonly used locales. These locales may or may not be configured in the environment. The function Is_Available may be used to query the external execution environment to see if a particular locale has been configured for the environment.
Ada.Locales.Calendar defines some types that are common to the child packages of Ada.Locales.Calendar. In particular, an enumeration of calendar fields is defined, as well as a bit array corresponding to this enumeration, which indicates the fields to be used when formatting a date/time value. Only sensible combinations of these bits may be used; otherwise, Program_Error is raised by the call. Applications typically use a small number of formats for displaying dates and times, and this structure makes it easy to declare such bit arrays as constants that can be stored efficiently in a single word.
Ada.Locales.Calendar.Formatting, Wide_Formatting, and Wide_Wide_Formatting provide functions that return localized representations of dates and times.
The Image function accepts a locale as a parameter, as well as the field set type defined in Ada.Locales.Calendar, and an optional time zone parameter. The result is formatted as appropriate for the locale, based on the fields selected in the field set parameter.
There are also functions that return the localized month name string corresponding to a month number, and the localized day name string corresponding to a day number.
Ada.Locales.Numeric is the root package for child packages that provide localized formatting of numeric values, accounting for locale-specific currency symbols, radix marks, and digit separators.
Ada.Locales.Numeric.Formatting, Wide_Formatting, and Wide_Wide_Formatting are generic packages that accept a floating point type as a formal type. These packages provide an Image function that accepts the Fore, Aft, and Exp parameters as defined for the similar Ada.Text_IO.Float_IO subprograms. In addition, there is a parameter for the locale, and a Boolean parameter indicating whether the number is to be formatted as a currency value.
Ada.Locales.Numeric also provides functions that return the currency string, radix character, and separator character corresponding to a particular locale. These functions provide a way to create localized images of decimal fixed point values, since their results can be passed as parameters to the subprograms defined in the existing Ada.Text_IO.Editing, Wide_Editing, and Wide_Wide_Editing packages.
To support localized string collating, a child package Ada.Locales.Strings.Collating is defined, which contains a generic String_Comparison package that accepts a locale as a formal object. This generic package defines a Localized_String type, which is derived from the standard String type. All subprograms in the instance are tied to the locale used in the instantiation. This makes it possible to use the "<" and ">" operators to perform comparisons on localized strings, since the locale, which would otherwise be a third parameter, is implied.
!wording
A.19 The Package Locales
The package Locales provides operations for querying and determining the locales associated with the environment.
Static Semantics
The library package Locales has the following declaration:
package Ada.Locales is
type Language_Code is array (1 .. 2) of Character range 'a' .. 'z';
type Country_Code is array (1 .. 2) of Character range 'A' .. 'Z';
Max_Variant_Length : constant Natural;
type Locale_Type is private;
type Locale_Set is array (Natural range <>) of Locale_Type;
Locale_Error : exception;
function To_Locale (Language : Language_Code) return Locale_Type;
function To_Locale (Language : Language_Code; Country : Country_Code) return Locale_Type;
function To_Locale (Language : Language_Code; Country : Country_Code; Variant : String) return Locale_Type;
function Country (Locale : Locale_Type) return Country_Code;
function Language (Locale : Locale_Type) return Language_Code;
function Variant (Locale : Locale_Type) return String;
procedure Set_Default (Default : Locale_Type);
function Default_Locale return Locale_Type;
function Is_Available (Locale : Locale_Type) return Boolean;
function Available_Locales_Count return Positive;
function Available_Locales (Start_Index : Positive := 1; Maximum_Count : Positive := Positive'Last) return Locale_Set;
-- Useful and common locales
English : constant Locale_Type;
US : constant Locale_Type;
UK : constant Locale_Type;
French : constant Locale_Type;
German : constant Locale_Type;
Canada : constant Locale_Type;
private
... -- not specified by the language
end Ada.Locales;
A locale identifies a geographic region or cultural identity. A locale may be used to select the appropriate written language text to be presented to a user of the system. The external execution environment typically supports the use of one or more locales.
The default locale is the locale that is used when a specific locale has not been identified.
A language code identifies the written language corresponding to a locale.
A country code identifies the geographic region corresponding to a locale.
A variant further identifies a unique locale when the country code and language code alone do not uniquely identify a geographic region or cultural identity.
A locale is available if it is supported by the external execution environment.
A locale must correspond to a language code.
A locale matches a locale supported by the external execution environment if the specified language code, country code, and variant of the locale match the language, country, and variant associated with a locale supported by the external execution environment.
If a locale does not identify a country or variant then these components of the locale are not used in the comparison between the locales supported by the external execution environment when looking for matching locales.
If a locale matches more than one locale supported by the external execution environment, the selection of a match from the external environment is implementation defined.
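The matching rules above can be illustrated with a small, self-contained Ada sketch. It is not part of the proposal: the record type Locale, its Has_Country flag, the functions Matches and First_Match, and the contents of Available are hypothetical stand-ins for the private Locale_Type and the external execution environment, and the variant component is omitted for brevity. Returning the first match is only one possible implementation-defined policy.

```ada
with Ada.Text_IO;

procedure Locale_Matching_Sketch is

   type Language_Code is array (1 .. 2) of Character range 'a' .. 'z';
   type Country_Code  is array (1 .. 2) of Character range 'A' .. 'Z';

   --  Hypothetical stand-in for the private Locale_Type. An unspecified
   --  country is modelled with Has_Country = False; variants are omitted.
   type Locale is record
      Language    : Language_Code;
      Has_Country : Boolean      := False;
      Country     : Country_Code := "ZZ";
   end record;

   type Locale_List is array (Positive range <>) of Locale;

   --  Hypothetical set of locales configured in the environment.
   Available : constant Locale_List :=
     ((Language => "en", Has_Country => True, Country => "GB"),
      (Language => "en", Has_Country => True, Country => "CA"),
      (Language => "fr", Has_Country => True, Country => "CA"));

   --  A requested locale matches a candidate when every component it
   --  specifies is equal; unspecified components are not compared.
   function Matches (Requested, Candidate : Locale) return Boolean is
   begin
      if Requested.Language /= Candidate.Language then
         return False;
      end if;
      if Requested.Has_Country
        and then (not Candidate.Has_Country
                  or else Requested.Country /= Candidate.Country)
      then
         return False;
      end if;
      return True;
   end Matches;

   --  Index of the first matching available locale, or 0 if none.
   function First_Match (Requested : Locale) return Natural is
   begin
      for I in Available'Range loop
         if Matches (Requested, Available (I)) then
            return I;
         end if;
      end loop;
      return 0;
   end First_Match;

   English       : constant Locale := (Language => "en", others => <>);
   English_US    : constant Locale :=
     (Language => "en", Has_Country => True, Country => "US");
   French_Canada : constant Locale :=
     (Language => "fr", Has_Country => True, Country => "CA");
begin
   Ada.Text_IO.Put_Line ("en    ->" & Natural'Image (First_Match (English)));
   Ada.Text_IO.Put_Line ("en_US ->" & Natural'Image (First_Match (English_US)));
   Ada.Text_IO.Put_Line ("fr_CA ->" & Natural'Image (First_Match (French_Canada)));
end Locale_Matching_Sketch;
```

Under these assumptions, the language-only locale English matches entry 1 (en_GB), en_US matches nothing because no US locale is configured, and fr_CA matches entry 3.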
The following locale operations are provided:
function To_Locale
(Language : Language_Code) return Locale_Type;
The locale corresponding to the language code is returned. The locale does not identify a geographic region or variant and will match all locales in the external execution environment that are associated with the same language.
function To_Locale
(Language : Language_Code;
Country : Country_Code) return Locale_Type;
The locale corresponding to the language code and country code is returned. The locale does not identify a variant and will match all locales in the external execution environment that are associated with the same language and country.
function To_Locale
(Language : Language_Code;
Country : Country_Code; Variant : String) return Locale_Type;
The locale corresponding to the language code, country code, and variant is returned. The locale will match all locales in the external execution environment that are associated with the same language, country, and variant.
function Language
(Locale : Locale_Type) return Language_Code;
The language code associated with the locale is returned.
function Country
(Locale : Locale_Type) return Country_Code;
The country code associated with the locale is returned. An empty string is returned if the locale is not associated with a country code.
function Variant
(Locale : Locale_Type) return String;
The variant name associated with the locale is returned. An empty string is returned if the locale is not associated with a variant name.
procedure Set_Default (Default : Locale_Type);
Changes the default locale to the specified locale. Locale_Error is raised if the locale is not available in the external execution environment.
function Default_Locale
return Locale_Type;
The default locale is returned.
function Is_Available (Locale : Locale_Type) return Boolean;
Indicates whether the specified locale is supported by the external execution environment.
function Available_Locales_Count
return Positive;
The number of locales supported by the external execution environment is returned.
function Available_Locales
(Start_Index : Positive := 1;
Maximum_Count : Positive := Positive'Last)
return Locale_Set;
The set of locales supported by the external execution environment is returned. Start_Index indicates the index of the first locale from the external execution environment to be returned. Maximum_Count indicates the maximum number of locales to be returned by the call.
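The Start_Index and Maximum_Count parameters allow the available locales to be retrieved one page at a time. The following sketch models that behaviour under stated assumptions: the five-character Locale_Name subtype and the Environment constant are hypothetical stand-ins for Locale_Type and the external execution environment, and returning an empty set when Start_Index is past the end is an assumption, since the wording does not say.

```ada
with Ada.Text_IO;

procedure Available_Locales_Sketch is

   --  Hypothetical stand-in for Locale_Type: a five-character name.
   subtype Locale_Name is String (1 .. 5);
   type Locale_Set is array (Natural range <>) of Locale_Name;

   --  Hypothetical locales configured in the external environment.
   Environment : constant Locale_Set (1 .. 5) :=
     ("en_GB", "en_CA", "fr_CA", "fr_FR", "de_DE");

   function Available_Locales_Count return Positive is
   begin
      return Environment'Length;
   end Available_Locales_Count;

   --  Returns at most Maximum_Count locales, beginning with the
   --  Start_Index'th one; an empty set if Start_Index is past the end
   --  (an assumption of this sketch).
   function Available_Locales
     (Start_Index   : Positive := 1;
      Maximum_Count : Positive := Positive'Last) return Locale_Set
   is
      Last : Natural;
   begin
      if Start_Index > Environment'Last then
         return Environment (1 .. 0);
      end if;
      --  Compare before adding so that Maximum_Count = Positive'Last
      --  cannot overflow.
      if Maximum_Count >= Environment'Last - Start_Index + 1 then
         Last := Environment'Last;
      else
         Last := Start_Index + Maximum_Count - 1;
      end if;
      return Environment (Start_Index .. Last);
   end Available_Locales;

   Page_Size : constant Positive := 2;
   Start     : Positive := 1;
begin
   Ada.Text_IO.Put_Line
     ("Available locales:" & Positive'Image (Available_Locales_Count));
   --  Fetch the set one page at a time.
   while Start <= Available_Locales_Count loop
      declare
         Page : constant Locale_Set := Available_Locales (Start, Page_Size);
      begin
         for I in Page'Range loop
            Ada.Text_IO.Put (Page (I) & ' ');
         end loop;
         Ada.Text_IO.New_Line;
         Start := Start + Page'Length;
      end;
   end loop;
end Available_Locales_Sketch;
```

With a page size of 2, the five hypothetical locales are returned as pages of two, two, and one.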
A.19.1 The Package Locales.Calendar
The package Locales.Calendar defines the fields that may be requested when converting time values into a localized textual form.
Static Semantics
The library package Locales.Calendar has the following declaration:
with Ada.Calendar.Time_Zones;
package Ada.Locales.Calendar is
package Time_Zones renames Ada.Calendar.Time_Zones;
subtype Time is Ada.Calendar.Time;
subtype Month_Number is Ada.Calendar.Month_Number;
subtype Day_Number is Ada.Calendar.Day_Number;
type Time_Field is (Year_Number, Short_Year_Number, Numeric_Month, Month_Name, Short_Month_Name, Day_Name, Short_Day_Name, Day_Of_Month, Hours, Minutes, Seconds, Am_Pm_Marker, Time_Fraction, Time_Zone);
subtype Year_Field is Time_Field range Year_Number .. Short_Year_Number;
subtype Month_Field is Time_Field range Numeric_Month .. Short_Month_Name;
subtype Day_Field is Time_Field range Day_Name .. Day_Of_Month;
type Field_Set is array (Time_Field) of Boolean;
for Field_Set'Component_Size use 1;
function Is_Valid (Fields : Field_Set) return Boolean;
Default_Field_Set : constant Field_Set := (Short_Year_Number | Numeric_Month | Day_Of_Month | Hours .. Seconds => True, others => False);
end Ada.Locales.Calendar;
A time field indicates the kinds of time elements that are to appear in a localized formatted text string.
A Year_Number represents the year in numeric form. e.g., 2005
A Short_Year_Number is an abbreviated form of the numeric year consisting of its last two digits, that is, the decade within the century and the year within the decade. e.g., 05
A Numeric_Month represents the month in numeric form. e.g., 05
A Month_Name represents the full name of the month in text form. e.g., January
A Day_Name represents the full name of the day in text form. e.g., Wednesday
A Short_Day_Name represents an abbreviated form of the day name. e.g., Wed
A Short_Month_Name represents an abbreviated form of the month name. e.g., Jan
A Day_Of_Month represents the day of the month in numeric form. e.g., 31
Hours represents the hour value of the time of day in numeric form. If the Am_Pm_Marker is requested, hours represents the hour from 1 to 12. Otherwise, the hour corresponds to a 24 hour clock ranging from 0 to 23.
Minutes represents the minutes within the hour in numeric form. e.g., 30
Seconds represents the seconds within the minute in numeric form. e.g., 30.
Am_Pm_Marker represents morning or afternoon for a 12 hour clock format.
Time_Fraction represents the fraction of a second in numeric form. e.g. 234
Time_Zone represents the name of the time zone associated with the time. e.g. PST
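Two of the numeric fields above involve a small computation: the two-digit Short_Year_Number, and the Hours value on a 12-hour clock when Am_Pm_Marker is requested. The sketch below shows one way to compute them; the names Short_Year_Image, To_12_Hour, and Is_Pm are hypothetical, and a real implementation would take the values from an Ada.Calendar.Time and combine them with locale-specific text.

```ada
with Ada.Text_IO;

procedure Field_Value_Sketch is

   --  Short_Year_Number: the last two digits of the year, zero-padded.
   function Short_Year_Image (Year : Positive) return String is
      --  Adding 100 guarantees a three-digit image such as " 105".
      Text : constant String := Natural'Image (Year mod 100 + 100);
   begin
      return Text (Text'Last - 1 .. Text'Last);
   end Short_Year_Image;

   subtype Hour_24 is Natural range 0 .. 23;
   subtype Hour_12 is Positive range 1 .. 12;

   --  Hours when Am_Pm_Marker is requested: 0 and 12 become 12, and
   --  13 .. 23 become 1 .. 11.
   function To_12_Hour (Hour : Hour_24) return Hour_12 is
   begin
      if Hour mod 12 = 0 then
         return 12;
      else
         return Hour mod 12;
      end if;
   end To_12_Hour;

   function Is_Pm (Hour : Hour_24) return Boolean is
   begin
      return Hour >= 12;
   end Is_Pm;

   Sample_Hours : constant array (1 .. 4) of Hour_24 := (0, 12, 13, 23);
begin
   Ada.Text_IO.Put_Line ("2005 -> " & Short_Year_Image (2005));
   for I in Sample_Hours'Range loop
      Ada.Text_IO.Put
        (Hour_24'Image (Sample_Hours (I)) & " ->"
         & Hour_12'Image (To_12_Hour (Sample_Hours (I))));
      if Is_Pm (Sample_Hours (I)) then
         Ada.Text_IO.Put_Line (" PM");
      else
         Ada.Text_IO.Put_Line (" AM");
      end if;
   end loop;
end Field_Value_Sketch;
```

Under these assumptions, 2005 becomes 05, hour 0 becomes 12 AM, hour 12 becomes 12 PM, and hour 13 becomes 1 PM.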
A field set is a combination of time fields to be used for formatting time values in a localized textual format.
The default field set is used when a field set has not been specified when formatting time values in localized textual form.
A valid field set may only contain certain combinations of time fields.
At most one Year_Field, at most one Month_Field, and at most one Day_Field may be present in a field set.
Time field units range in duration from years down to fractions of a second. A field set is not valid if it has a gap in units between the largest and the smallest unit it contains; for example, a field set containing Year_Number and Day_Of_Month but no Month_Field is not valid.
If the Am_Pm_Marker is present in a field set, then Hours must also be present.
If Time_Zone is present in a field set, then Hours must also be present.
function Is_Valid (Fields : Field_Set) return Boolean;
Indicates whether a field set contains a valid combination of time fields.
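The validity rules above can be expressed directly as code. The following is a minimal sketch of Is_Valid, not the proposal's implementation. It lists the month fields before the day fields so that Month_Field and Day_Field are contiguous ranges. The Unit type and Unit_Of function are hypothetical helpers, and rejecting a set that contains no date or time unit at all is an assumption, since the rules do not cover that case.

```ada
with Ada.Text_IO;

procedure Field_Set_Sketch is

   type Time_Field is
     (Year_Number, Short_Year_Number,
      Numeric_Month, Month_Name, Short_Month_Name,
      Day_Name, Short_Day_Name, Day_Of_Month,
      Hours, Minutes, Seconds, Am_Pm_Marker, Time_Fraction, Time_Zone);

   subtype Year_Field  is Time_Field range Year_Number .. Short_Year_Number;
   subtype Month_Field is Time_Field range Numeric_Month .. Short_Month_Name;
   subtype Day_Field   is Time_Field range Day_Name .. Day_Of_Month;

   type Field_Set is array (Time_Field) of Boolean;

   --  Units from largest to smallest; None marks the qualifiers
   --  Am_Pm_Marker and Time_Zone, which carry no unit of their own.
   type Unit is (Year, Month, Day, Hour, Minute, Second, Fraction, None);

   function Unit_Of (Field : Time_Field) return Unit is
   begin
      case Field is
         when Year_Field               => return Year;
         when Month_Field              => return Month;
         when Day_Field                => return Day;
         when Hours                    => return Hour;
         when Minutes                  => return Minute;
         when Seconds                  => return Second;
         when Time_Fraction            => return Fraction;
         when Am_Pm_Marker | Time_Zone => return None;
      end case;
   end Unit_Of;

   function Is_Valid (Fields : Field_Set) return Boolean is
      Count    : array (Unit) of Natural := (others => 0);
      Largest  : Unit := None;
      Smallest : Unit := None;
   begin
      for Field in Time_Field loop
         if Fields (Field) then
            Count (Unit_Of (Field)) := Count (Unit_Of (Field)) + 1;
         end if;
      end loop;
      --  At most one field each from Year_Field, Month_Field, Day_Field.
      if Count (Year) > 1 or else Count (Month) > 1 or else Count (Day) > 1 then
         return False;
      end if;
      --  Am_Pm_Marker and Time_Zone both require Hours.
      if (Fields (Am_Pm_Marker) or else Fields (Time_Zone))
        and then not Fields (Hours)
      then
         return False;
      end if;
      --  Find the largest and smallest units present.
      for U in Year .. Fraction loop
         if Count (U) > 0 then
            if Largest = None then
               Largest := U;
            end if;
            Smallest := U;
         end if;
      end loop;
      if Largest = None then
         return False;  --  no unit at all: invalid (an assumption)
      end if;
      --  Reject any gap between the largest and smallest units.
      for U in Largest .. Smallest loop
         if Count (U) = 0 then
            return False;
         end if;
      end loop;
      return True;
   end Is_Valid;

   Default : constant Field_Set :=
     (Short_Year_Number | Numeric_Month | Day_Of_Month | Hours .. Seconds => True,
      others => False);
   Gap : constant Field_Set :=
     (Year_Number | Day_Of_Month => True, others => False);
   Zone_Without_Hours : constant Field_Set :=
     (Minutes | Time_Zone => True, others => False);
begin
   Ada.Text_IO.Put_Line ("Default:            " & Boolean'Image (Is_Valid (Default)));
   Ada.Text_IO.Put_Line ("Year and day only:  " & Boolean'Image (Is_Valid (Gap)));
   Ada.Text_IO.Put_Line ("Zone without Hours: " & Boolean'Image (Is_Valid (Zone_Without_Hours)));
end Field_Set_Sketch;
```

Under these assumptions, the default field set is valid, while a set with a year and a day but no month, and a set with Time_Zone but no Hours, are both rejected.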
A.19.2 Locales.Calendar.Formatting
The package Locales.Calendar.Formatting is used to retrieve localized textual forms of time and date related values.
Static Semantics
The library package Locales.Calendar.Formatting has the following declaration:
package Ada.Locales.Calendar.Formatting is
function Image (Date : Time; Locale : Locale_Type := Default_Locale; Include_Fields : Field_Set := Default_Field_Set; Time_Zone : Time_Zones.Time_Offset := 0) return String;
function Month_Name (Locale : Locale_Type := Default_Locale; Month : Month_Number; Abbreviate : Boolean := False) return String;
function Day_Name (Locale : Locale_Type := Default_Locale; Day : Day_Number; Abbreviate : Boolean := False) return String;
end Ada.Locales.Calendar.Formatting;
The exception Locale_Error is propagated by an attempt to pass a locale in a call to any of the functions defined in Locales.Calendar.Formatting for a locale that is not available in the external execution environment.
function Image
(Date : Time;
Locale : Locale_Type := Default_Locale;
Include_Fields : Field_Set := Default_Field_Set;
Time_Zone : Time_Zones.Time_Offset := 0) return String;
A localized string representing the specified time, including the fields specified in the field set, is returned. Program_Error is propagated if Include_Fields is not a valid field set.
function Month_Name
(Locale : Locale_Type := Default_Locale;
Month : Month_Number; Abbreviate : Boolean := False) return String;
A localized string containing the name of a month corresponding to the specified month number is returned. If Abbreviate is true and the specified locale has an abbreviated form of the month name, then the abbreviated form is returned, otherwise the full name is returned.
function Day_Name
(Locale : Locale_Type := Default_Locale;
Day : Day_Number; Abbreviate : Boolean := False) return String;
A localized string containing the name of a day of the week corresponding to the specified day number is returned. If Abbreviate is true and the specified locale has an abbreviated form of the day name, then the abbreviated form is returned, otherwise the full name is returned.
A.19.3 Locales.Calendar.Wide_Formatting
Static Semantics
The child package Locales.Calendar.Wide_Formatting has the same contents as Locales.Calendar.Formatting except that each occurrence of String is replaced by Wide_String.
A.19.4 Locales.Calendar.Wide_Wide_Formatting
Static Semantics
The child package Locales.Calendar.Wide_Wide_Formatting has the same contents as Locales.Calendar.Formatting except that each occurrence of String is replaced by Wide_Wide_String.
A.19.5 Locales.Numeric
Static Semantics
The library package Locales.Numeric has the following declaration:
package Ada.Locales.Numeric is
subtype Field is Integer range 0 .. 255;
function Currency_Symbol (Locale : Locale_Type := Default_Locale) return String;
function Separator (Locale : Locale_Type := Default_Locale) return Character;
function Radix_Mark (Locale : Locale_Type := Default_Locale) return Character;
end Ada.Locales.Numeric;
The exception Locale_Error is propagated by an attempt to pass a locale in a call to any of the functions defined in Locales.Numeric for a locale that is not available in the external execution environment.
function Currency_Symbol
(Locale : Locale_Type := Default_Locale) return String;
A localized string representing the currency symbol for the specified locale is returned.
function Separator
(Locale : Locale_Type := Default_Locale) return Character;
The localized separator character used to group digits for the specified locale is returned.
function Radix_Mark
(Locale : Locale_Type := Default_Locale) return Character;
The localized radix mark character for the specified locale is returned.
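As the proposal notes, the results of Currency_Symbol, Separator, and Radix_Mark are meant to be passed to Ada.Text_IO.Editing. The sketch below uses the existing Decimal_Output generic from Ada.Text_IO.Editing (Annex F, Information Systems), with a picture string of the kind used in the Reference Manual's examples. The three locale values are hard-coded, hypothetical stand-ins, because Ada.Locales.Numeric exists only as a proposal.

```ada
with Ada.Text_IO;         use Ada.Text_IO;
with Ada.Text_IO.Editing; use Ada.Text_IO.Editing;

procedure Editing_Sketch is

   type Money is delta 0.01 digits 10;

   package Money_Output is new Decimal_Output (Money);

   --  Hypothetical results of Currency_Symbol, Separator and Radix_Mark
   --  for some locale; hard-coded because Ada.Locales.Numeric is only
   --  proposed.
   Locale_Currency   : constant String    := "FF";
   Locale_Separator  : constant Character := '.';
   Locale_Radix_Mark : constant Character := ',';

   Pic    : constant Picture := To_Picture ("-###**_***_**9.99");
   Amount : constant Money   := 123_456.78;
begin
   --  Editing's own defaults: "$" for currency, ',' and '.'.
   Put_Line (Money_Output.Image (Amount, Pic));

   --  The same picture with the locale-dependent symbols supplied.
   Put_Line (Money_Output.Image (Item       => Amount,
                                 Pic        => Pic,
                                 Currency   => Locale_Currency,
                                 Separator  => Locale_Separator,
                                 Radix_Mark => Locale_Radix_Mark));
end Editing_Sketch;
```

Alternatively, the locale values could be supplied once, as the Default_Currency, Default_Separator, and Default_Radix_Mark generic actuals of Decimal_Output, instead of on every call.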
A.19.6 Locales.Numeric.Formatting
The package Locales.Numeric.Formatting is used to retrieve localized textual forms of numeric values, including currency values.
Static Semantics
The generic package Locales.Numeric.Formatting has to be instantiated for the appropriate floating point type (indicated by Num in the specification).
The library package Locales.Numeric.Formatting has the following declaration:
generic
type Num is digits <>;
package Ada.Locales.Numeric.Formatting is
Default_Fore : Field := 2;
Default_Aft : Field := Num'Digits - 1;
Default_Exp : Field := 3;
function Image (Item : Num; Locale : Locale_Type; Currency : Boolean := False; Fore : Field := Default_Fore; Aft : Field := Default_Aft; Exp : Field := Default_Exp) return String;
end Ada.Locales.Numeric.Formatting;
function Image (Item : Num; Locale : Locale_Type; Currency : Boolean := False; Fore : Field := Default_Fore; Aft : Field := Default_Aft; Exp : Field := Default_Exp) return String;
The exception Locale_Error is propagated by an attempt to pass a locale that is not available in the external execution environment. A localized string representing the value designated by Item is returned. If Currency is True, then the value is formatted as a currency value using a localized currency symbol; otherwise, no currency symbol is included in the result. Fore indicates the minimum number of characters before the radix mark, and Aft indicates the maximum number of characters that follow the radix mark. The Fore field may include leading spaces and a minus sign for negative values. If Aft is zero, then the radix mark may be omitted from the result. A separator character may also be used to separate groupings of digits. The Aft field includes only decimal digits (possibly with trailing zeros). The Exp field includes the sign (plus or minus) and the exponent (possibly with leading zeros).
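One plausible way to build such an Image, sketched here only for the case Exp = 0 and without currency, is to let Ada.Text_IO.Float_IO produce the digits and then substitute the locale's radix mark and insert its separator between groups of three digits. Localized_Image and Group_Digits are hypothetical names, Fore is ignored, and the separator and radix characters are passed in directly rather than obtained from a locale.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded;

procedure Numeric_Image_Sketch is

   use type Ada.Strings.Unbounded.Unbounded_String;

   package Real_IO is new Ada.Text_IO.Float_IO (Long_Float);

   --  Insert Separator between each group of three digits, counting
   --  from the right.
   function Group_Digits (Digits_Only : String; Separator : Character)
     return String
   is
      Result : Ada.Strings.Unbounded.Unbounded_String;
      Count  : Natural := 0;
   begin
      for I in reverse Digits_Only'Range loop
         if Count > 0 and then Count mod 3 = 0 then
            Result := Separator & Result;
         end if;
         Result := Digits_Only (I) & Result;
         Count := Count + 1;
      end loop;
      return Ada.Strings.Unbounded.To_String (Result);
   end Group_Digits;

   --  Hypothetical Localized_Image: Exp = 0 only. Float_IO produces the
   --  digits with '.', and the locale's symbols are substituted.
   function Localized_Image
     (Item       : Long_Float;
      Aft        : Positive;
      Separator  : Character;
      Radix_Mark : Character) return String
   is
      Buffer : String (1 .. 64);
   begin
      Real_IO.Put (To => Buffer, Item => Item, Aft => Aft, Exp => 0);
      declare
         Plain : constant String :=
           Ada.Strings.Fixed.Trim (Buffer, Ada.Strings.Left);
         Dot   : constant Natural := Ada.Strings.Fixed.Index (Plain, ".");
         First : Positive := Plain'First;  --  first digit, after any sign
      begin
         if Plain (First) = '-' then
            First := First + 1;
         end if;
         return Plain (Plain'First .. First - 1)
           & Group_Digits (Plain (First .. Dot - 1), Separator)
           & Radix_Mark
           & Plain (Dot + 1 .. Plain'Last);
      end;
   end Localized_Image;
begin
   Ada.Text_IO.Put_Line (Localized_Image (1234567.891, 2, '.', ','));
   Ada.Text_IO.Put_Line (Localized_Image (-9876.5, 2, ',', '.'));
end Numeric_Image_Sketch;
```

With a '.' separator and a ',' radix mark, the first call prints 1.234.567,89; with the characters swapped, the second prints -9,876.50.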
A.19.7 Locales.Numeric.Wide_Formatting
The child package Locales.Numeric.Wide_Formatting has the same contents as Locales.Numeric.Formatting except that each occurrence of String is replaced by Wide_String, and each occurrence of Character is replaced by Wide_Character.
A.19.8 Locales.Numeric.Wide_Wide_Formatting
The child package Locales.Numeric.Wide_Wide_Formatting has the same contents as Locales.Numeric.Formatting except that each occurrence of String is replaced by Wide_Wide_String, and each occurrence of Character is replaced by Wide_Wide_Character.
A.19.9 Locales.Strings
Static Semantics
package Ada.Locales.Strings is end Ada.Locales.Strings;
A.19.10 Locales.Strings.Collating
The package Locales.Strings.Collating is used to determine the localized sort ordering for localized text strings.
Static Semantics
The generic package String_Comparison declared in Locales.Strings.Collating has to be instantiated for the appropriate locale.
The library package Locales.Strings.Collating has the following declaration:
package Ada.Locales.Strings.Collating is
generic
Locale : Locale_Type := Default_Locale;
package String_Comparison is
type Localized_String is new String;
function "=" (Left : Localized_String; Right : Localized_String) return Boolean;
function "<" (Left : Localized_String; Right : Localized_String) return Boolean;
function ">" (Left : Localized_String; Right : Localized_String) return Boolean;
function "<=" (Left : Localized_String; Right : Localized_String) return Boolean;
function ">=" (Left : Localized_String; Right : Localized_String) return Boolean;
end String_Comparison;
end Ada.Locales.Strings.Collating;
A localized string is a string that contains characters intended for presentation using a particular locale.
The exception Locale_Error is propagated by an attempt to call any of the functions defined in Locales.Strings.Collating for an instantiation for a locale that is not available in the external execution environment.
Each of the functions "=", "<", ">", "<=", and ">=" has the same semantics as the corresponding String operation applied to the string values given or represented by the two parameters, except that the collating order for the comparison of characters is determined by the locale.
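The design choice of making the locale a generic formal object can be illustrated with a self-contained sketch. The Locale_Kind type and its two "collation rules" (code point order and a case-insensitive order) are hypothetical stand-ins for real locales and their collation tables, and only "<" and ">" are shown. The point is that each instance fixes the ordering, so callers can still write L < R with two operands.

```ada
with Ada.Characters.Handling;
with Ada.Text_IO;

procedure Collating_Sketch is

   --  Hypothetical stand-in for Locale_Type: two collation rules chosen
   --  only to show that the instance, not the caller, fixes the order.
   type Locale_Kind is (Code_Point_Order, Case_Insensitive_Order);

   generic
      Locale : Locale_Kind := Code_Point_Order;
   package String_Comparison is
      type Localized_String is new String;
      --  These explicit declarations override the predefined operators.
      function "<" (Left, Right : Localized_String) return Boolean;
      function ">" (Left, Right : Localized_String) return Boolean;
   end String_Comparison;

   package body String_Comparison is

      --  The key on which the chosen "locale" collates.
      function Key (S : Localized_String) return String is
      begin
         case Locale is
            when Code_Point_Order =>
               return String (S);
            when Case_Insensitive_Order =>
               return Ada.Characters.Handling.To_Lower (String (S));
         end case;
      end Key;

      function "<" (Left, Right : Localized_String) return Boolean is
      begin
         --  Predefined String "<" applied to the collation keys.
         return Key (Left) < Key (Right);
      end "<";

      function ">" (Left, Right : Localized_String) return Boolean is
      begin
         return Right < Left;
      end ">";

   end String_Comparison;

   package Raw    is new String_Comparison (Locale => Code_Point_Order);
   package Folded is new String_Comparison (Locale => Case_Insensitive_Order);

   use type Raw.Localized_String;
   use type Folded.Localized_String;

   Raw_Result : constant Boolean :=
     Raw.Localized_String'("apple") < Raw.Localized_String'("Banana");
   Folded_Result : constant Boolean :=
     Folded.Localized_String'("apple") < Folded.Localized_String'("Banana");
begin
   Ada.Text_IO.Put_Line
     ("Code point order:       apple < Banana is " & Boolean'Image (Raw_Result));
   Ada.Text_IO.Put_Line
     ("Case-insensitive order: apple < Banana is " & Boolean'Image (Folded_Result));
end Collating_Sketch;
```

Because 'a' follows 'B' in code point order, the first comparison is FALSE, while the case-insensitive instance reports TRUE for the same operands.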
A.19.11 Locales.Strings.Wide_Collating
The child package Locales.Strings.Wide_Collating has the same contents as Locales.Strings.Collating except that each occurrence of String is replaced by Wide_String.
A.19.12 Locales.Strings.Wide_Wide_Collating
The child package Locales.Strings.Wide_Wide_Collating has the same contents as Locales.Strings.Collating except that each occurrence of String is replaced by Wide_Wide_String.
!discussion
Consideration was given to whether locale based case folding capabilities could be provided. This was ruled out because there are cases for certain characters that do not have round trip case folding. That is, a lower case character might map to multiple upper case characters, or vice versa.
The Localized_String type defined in Ada.Locales.Strings.Collating is derived from the standard String type. Because it is a distinct type, the comparison operators declared in the instance, rather than the predefined String operators, apply to its values. However, a Localized_String can be converted to and from String, which allows any of the standard libraries that operate on Strings to be used.
The use of the locale as a formal object in Ada.Locales.Strings.Collating is rather unusual. This approach was chosen because it allows the use of the comparison operators "=", "<", ">", "<=", and ">=". Otherwise, the locale would have to be passed as a third parameter; since an Ada binary operator must have exactly two parameters, the comparisons would then have to be ordinary functions, called in the form F (L, R, Locale) rather than L < R. Because the generic is nested within a non-generic package, however, an implementation can move most of the code out of the generic and into the enclosing package, minimizing potential code bloat.
A question was raised as to whether country codes should be 3 characters, since Chinese apparently has two written forms. Java, however, uses the 2-character scheme, where one form uses the country code for Taiwan and the other uses the country code for China. It is believed that the underlying operating system support on Unix and Windows uses two-character codes, which might be the reason for the two-character limit.
--!corrigendum A.19(0)
!ACATS test
!appendix


From: Brad Moore
Date: Tuesday, October 21, 2008  2:08 AM

At the last ARG meeting, the PRG had requested that an AI be created
that takes the locale into account for conversions between upper and
lower case.

Please see the attached for the initial writeup for this issue.

****************************************************************

From: Randy Brukardt
Date: Tuesday, October 21, 2008  1:16 PM

Now for the technical comments. Ignoring the massive incompatibility that
this represents, it is bad that with this change,
Ada.Characters.Handling.To_Upper would provide a different result than
using the functions and constants in Ada.Strings.Maps.

It is also bad that the results of To_Upper would change uncontrollably.
For instance, my spam scanning program needs to look at message headers
using the Internet character set; the locale of the receiving machine is
completely irrelevant to that task. But this proposal provides no way to
portably ensure that the standard interpretation (rather than
locale-dependent one) be used. The net effect would be that many programs
would have to avoid using Ada.Characters.Handling altogether. That doesn't
seem good.

****************************************************************

From: Brad Moore
Date: Tuesday, October 21, 2008  2:40 PM

Note that to address the issue with the Turkish language, the change would
not be affecting the set of characters that are considered upper case or
lower case. It is only affecting which lower case character maps to which
upper case character.

This might be viewed as a good thing. Instead of having two ways to do the
same thing, there would be a locale sensitive way, and a hard-coded mapping way.

The incompatibility however is probably not a good thing. Maybe this ties
into the other discussion and the user could select which "feature" they want
somehow.

Maybe a pragma could be used, or maybe a new subprogram could be used to
set a flag in the implementation that indicates whether or not to consider
locale.

e.g., something like:

Ada.Characters.Handling.Use_Current_Locale;

****************************************************************

From: Robert Dewar
Date: Tuesday, October 21, 2008  4:28 PM

> It is also bad that the results of To_Upper would change
> uncontrollably. For instance, my spam scanning program needs to look
> at message headers using the Internet character set; the locale of the
> receiving machine is completely irrelevant to that task. But this
> proposal provides no way to portably ensure that the standard
> interpretation (rather than locale-dependent one) be used. The net
> effect would be that many programs would have to avoid using
> Ada.Characters.Handling altogether. That doesn't seem good.

I am opposed to doing anything in this direction, and it is out of the
question to introduce any incompatibilities. This is definitely overkill
in terms of predefined capabilities in the language.

****************************************************************

From: Jean-Pierre Rosen
Date: Tuesday, October 21, 2008  5:23 PM

> Maybe a pragma could be used, or maybe a new subprogram could be used
> to set a flag in the implementation that indicates whether or not to
> consider locale.
>
> e.g., something like:
>
> Ada.Characters.Handling.Use_Current_Locale;

In that case, I'd rather leave To_Upper as is, and add a function
Localized_To_Upper. This would definitely put the choice in the hands
of the user.

****************************************************************

From: Robert Dewar
Date: Tuesday, October 21, 2008  5:42 PM

This seems a solution searching for a problem, has anyone encountered a
user (e.g. from Turkey) who has requested this feature?

We have many hundreds of enhancement requests filed, this is not among them!

****************************************************************

From: Robert I. Eachus
Date: Tuesday, October 21, 2008  5:56 PM

Worse, trying to define some locales would be extremely problematical.
We should either let the appropriate ISO body deal with it or stay away
from any specific localizations.  For example, Canadian French, last I
looked did upper case differently than France.

If anything must be done, define a generic template for creating a
localized version of the appropriate package or packages and leave it
at that.  Notice that this serves the goal of portability.  If a program
depends on localization, it will have the generic instantiation as
part of its source.  Changing locales may be problematic, but the
Turkish version should compile and run in Greece, albeit with Turkish
alphabet and localizations.

****************************************************************

From: Brad Moore
Date: Friday, October 24, 2008  10:56 AM

The origin of the problem that led to this "solution" is the PRG work,
which deals with updating the Ada bindings to POSIX.
There have been a lot of new calls added to POSIX since the bindings were
last updated. The thought was that there may be certain new POSIX calls
and capabilities that may be worth adding to the Ada standard libraries,
rather than create new packages in the POSIX bindings.
For example, Ada.Directories eliminates or greatly reduces the need to
add bindings to directory related calls.

If a package for managing locales is to be added to a standard,
I am guessing that users would prefer to see that as part of the Ada standard,
rather than part of the Ada POSIX bindings, if possible.
For one thing, the Ada standard is more portable than POSIX, since the Ada standard
exists on platforms that do not support POSIX. Also the bindings can get out of date
with the POSIX standard as well as the Ada standard, whereas that would not be an
issue if the calls were part of the Ada standard.

We were looking at locale related functionality, and thought this would
be a good trial balloon to float over to the ARG.

The early indications seem to be that the locale balloon is losing
altitude rapidly. That said, there are likely other locale-related issues that
are of more interest to users, such as currency, date/time formatting, etc.

Regardless of what happens with this AI, it would be good to get a feel
for the number of enhancement requests that come in for updates to the Ada POSIX bindings.

If there are any, it would be useful to know which areas of functionality people
are interested in. For example, if nobody is interested in accessing POSIX locale calls
from Ada, then maybe those calls should be left alone (in both the Ada standard and
the POSIX bindings).

We have limited resources, and should probably avoid spending time on things that aren't of
interest to anyone.

Robert, do you have any inputs on which POSIX bindings people are interested in seeing updated?

As a final note, even if To_Upper and To_Lower do not end up checking the locale, there is still
the point that the RM does not seem to clearly state how the conversion between upper and lower
case is supposed to happen. It is implied that this is an obvious transformation, which it may be.
It might at least be worth considering wording that clearly specifies how the current
conversion works.

****************************************************************

From: Stephen Michell
Date: Friday, October 24, 2008  1:15 PM

Excellent response.

****************************************************************

From: Randy Brukardt
Date: Friday, October 24, 2008  6:04 PM

Not true now (it was true in Ada 95). 2.1(5/2) specifies that the language
definition uses the uppercase mapping of Unicode. That applies everywhere
(not just in program text) - this paragraph says "the language definition",
not something about source files.

One could imagine adding an AARM note to clarify that a bit, but there isn't
anything undefined here. (The same sort of conversion wording is used for
identifiers and the Wide_Wide_Image attribute -- there is no discussion of
what it means to convert to upper case.)

****************************************************************

From: Robert Dewar
Date: Friday, October 24, 2008  6:11 PM

It was certainly perfectly clear to me when I implemented the case equivalence
for 10646.

P.S. I think it is a big mistake to have case equivalence for peculiar characters.
It just doesn't work right, given the locale dependence that should be there
but isn't, and you wouldn't want it for identifiers anyway.

****************************************************************

From: Randy Brukardt
Date: Friday, October 24, 2008  6:26 PM

> It was certainly perfectly clear to me when I implemented the case
> equivalence for 10646.

For me, too. Surely it is obvious for the Latin-1 characters (and those are
the only ones involved in Ada 95). But I couldn't find any Ada 95 wording
that defined the mapping between upper and lower case letters (which letters
belong to each character category IS defined). Admittedly, I didn't try that
hard to find such wording; it's not relevant to the topic at hand (which is
whether Ada 2005 has a hole).

****************************************************************

From: Robert Dewar
Date: Friday, October 24, 2008  6:12 PM

> The origin of the problem that led to this "solution" is the PRG work,
> which deals with updating the Ada bindings to POSIX.
> There have been a lot of new calls added to POSIX since the bindings
> were last updated. The thought was that there may be certain new POSIX
> calls and capabilities that may be worth adding to the Ada standard
> libraries, rather than create new packages in the POSIX bindings.
> For example, Ada.Directories eliminates or greatly reduces the need to
> add bindings to directory related calls.

The fact that the Posix bindings have some feature is not of itself a
reason for burdening Ada implementations with features that are not needed
by Ada users :-)

****************************************************************

From: Stephen Michell
Date: Friday, October 24, 2008  6:55 PM

I can see the point that such functionality should not be in
Ada.Characters.Handling, but it still makes sense to define locale-sensitive
conversions in Ada. Does it make sense to define this in Interfaces?

****************************************************************

From: Randy Brukardt
Date: Friday, October 24, 2008  7:16 PM

I think it would make sense for it to be a new child of Ada.Characters
(if we have it at all). After all, there are no Wide_Character or
Wide_Wide_Character case conversions (or classifications, for that matter)
currently, and those are where the Locale potentially would make a
significant difference. (After all, there are no Turkish characters in
Latin-1, so the classic example of this need is non-existent for the
Latin-1 routines in Ada.Characters.Handling.)

Indeed, I thought that you and Brad were planning to propose some sort of
Ada.Characters.Wide_Handling, which you could make a case for without any
appeal to locale issues.

****************************************************************

From: Brad Moore
Date: Friday, October 24, 2008  6:40 PM

> The fact that the Posix bindings have some feature is not of itself a
> reason for burdening Ada implementations with features that are not
> needed by Ada users :-)

Agreed, and if the feature is not needed by Ada users, then by association,
it seems safe to say that it is not needed by Ada POSIX binding users either.

This is the sort of information that is valuable and could help to
significantly reduce the workload for the PRG team.

****************************************************************

From: Robert Dewar
Date: Friday, October 24, 2008  7:43 PM

There I disagree. A binding to Posix should bind to all features; I don't
think it is the place of the implementors of a binding to make value judgments
overriding the designers of the library they are binding to.

On the other hand, as designers of the Ada language, we do most certainly
have the obligation to include only useful stuff.

****************************************************************

From: Stephen Michell
Date: Friday, October 24, 2008  8:50 PM

As a member of a NATO team that is implementing libraries for use in more
than a dozen countries, I can say we are dealing with the issues of internationalization
and localization. Ada is used as the calculation engine, but Java and C++
are dominating the user interface.
Users ask "who has what we need now?", not "please give me X".
Many of our member nations are implementing on Windows, and some on Unix.
A POSIX-only binding doesn't help much.

****************************************************************

From: Randy Brukardt
Date: Friday, October 24, 2008  9:17 PM

Trying to chase what users need now only means that you'll be giving them
old technology when it gets done. It's a fool's game; at best you have to
give them what they'll need in a few years.

And in that perspective, GUIs are going totally to the browser. Ada,
unfortunately, has no role to play in such UIs, unless you think that Ada
can replace Javascript in the browser. (If so, I have a hot new investment
opportunity for you). Ada is well suited to -talk- to such UIs, and that
is what we need to concentrate on.

****************************************************************

From: Robert Dewar
Date: Friday, October 24, 2008  9:32 PM

> I can see the point that such functionality should not be in
> Ada.Characters.Handling, but it still makes sense to define locale-sensitive
> conversions in Ada. Does it make sense to define this in Interfaces?

The fact that something makes sense is not sufficient justification for adding
a feature to the language. Seems a lot of complexity for little gain for a
feature which has not in our experience been requested by any users.

****************************************************************

From: Robert Dewar
Date: Saturday, October 25, 2008 10:11 AM

This feature reminds me of leap seconds :-)

****************************************************************

From: Stephen Michell
Date: Friday, October 24, 2008  9:42 PM

But it is very little complexity. There is a localization standard; POSIX
and Windows have implemented it. All we would be asking for is the interface glue.

****************************************************************

From: Randy Brukardt
Date: Friday, October 24, 2008  10:08 PM

I don't know how you can say that. You'd have to describe the interfaces in
Ada terms (reference to a particular implementation is not done currently in
the Ada standard outside of the AARM, and surely we're going to keep to that).
And you'd need a least-common denominator description, which is not going to
save you any POSIX interface work -- you'll still have to provide a binding
to the POSIX implementation for any advanced capabilities (and there are many,
I would suspect).

****************************************************************

From: Robert Dewar
Date: Saturday, October 25, 2008  10:12 AM

> But it is very little complexity. There is a localization standard; POSIX and
> Windows have implemented it. All we would be asking for is the interface glue.

There are many operating systems that are neither Windows nor Posix compliant.
VxWorks comes to mind immediately!

I suspect similar arguments were used to justify adding leap seconds.
One difference is that we will never repeat the mistake of putting in so much
effort to implement something so useless :-)

****************************************************************

From: Pascal Leroy
Date: Sunday, November 15, 2009  8:01 AM

> A.19 The Package Locales
>
> The package Locales provides operations for querying and determining the locales
> associated with the environment.
>
> Static Semantics
>
> The library package Locales has the following declaration:
>
> package Ada.Locales is
>
>   type Language_Code is array (1 .. 2) of Character range 'a' .. 'z';
>   type Country_Code is array (1 .. 2) of Character range 'A' .. 'Z';
>
>  Max_Variant_Length : constant Natural;

I couldn't find a normative reference in this AI, but it seems to be based on an
ancient standard.  FWIW, language codes can now have 3 characters (there are
more than 1000 of them), and you cannot in general define a locale without
specifying the script, as some languages (eg Uzbek) have changed script over
time, and the script impacts the collation rules.

This AI only scratches the surface of l10n/i18n issues, and if it made it into
the language as currently written I believe that it would be actively harmful,
as it doesn't even come close to addressing the complexity of the problem.  For
instance the Calendar package is specialized for the Gregorian calendar, which
is inappropriate for many/most locales.  For details about the state of the
practice in this area, look for BCP 47 on Wikipedia.

As others have pointed out, integrating i18n into the language would be a
gigantic effort, and I'm not sure that the ARG has the necessary know-how in
this area, so I would suggest dropping the AI entirely.

****************************************************************

From: Bob Duff
Date: Sunday, November 15, 2009  5:01 PM

> As others have pointed out, integrating i18n into the language would
> be a gigantic effort, and I'm not sure that the ARG has the necessary
> know-how in this area, so I would suggest dropping the AI entirely.

I agree.

I certainly have little know-how in this area.  From reading up on this area, it
seems like i18n should be left as an OS issue, not a language issue.

If we try to do something, we will do it badly, and that's worse than doing
nothing.

P.S. Hi, Pascal, good to hear from you!

****************************************************************

From: Robert Dewar
Date: Sunday, November 15, 2009  5:54 PM

I strongly agree with Pascal, we have no noticeable user interest in this, and
doing a half-baked job would be a step backward. Drop this AI entirely.

****************************************************************
