Version 1.1 of ai05s/ai05-0127-1.txt

!standard A.3.2(34)          08-10-22 AI05-0127-1/01
!standard A.3.2(35)
!class binding interpretation 08-10-22
!status work item 08-10-22
!status received 08-10-20
!priority Low
!difficulty Medium
!qualifier Omission
!subject Locale consideration for To_Upper and To_Lower
!summary
To_Upper and To_Lower in Ada.Characters.Handling must take the current locale into consideration.
!question
A.3.2(34) and A.3.2(35) state that To_Lower and To_Upper return the corresponding lower-case and upper-case value associated with the formal parameter value. The problem is that the corresponding value is not well defined because it can mean different things depending on the written language of the reader. For example, in Turkish, the upper case value for 'i' is a capital I with a dot above. The lower case value for 'I' is 'i' without the dot. Lithuanian and Azeri are two other languages that present similar idiosyncrasies. Should the implementation take the current locale into consideration to determine if a more appropriate value should be returned to the caller? (Yes.)
!wording
Add new paragraph A.3.2(32.1):
A locale is an external environment configuration that can specify the conventions to be used for matching cultural aspects such as case conversion and collating order, as well as the formatting to be applied in order to represent currency, numeric, and date values. {locale} The current locale is the locale that is currently in effect on the target environment. {current locale}
A.3.2(32.1.a) Implementation defined: The interpretation of the current locale.
Change A.3.2(34) as follows:
To_Lower Returns the corresponding lower-case value for Item if Is_Upper(Item),
and returns Item otherwise. {If the target environment specifies a case conversion mapping for the current locale, this mapping is used to determine the result value; otherwise the result is the character whose value is equivalent to:
Character'Val(Character'Pos(Item) + 16#20#)}
Change A.3.2(35) as follows:
To_Upper Returns the corresponding upper-case value for Item if Is_Lower(Item)
and Item has an upper-case form, and returns Item otherwise. The lower case letters 'ß' and 'ÿ' do not have upper case forms. {If the target environment specifies a case conversion mapping for the current locale, this mapping is used to determine the result value; otherwise the result is the character whose value is equivalent to:
Character'Val(Character'Pos(Item) - 16#20#)}
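The fixed-offset fallback described in the wording above can be sketched as follows. This is only an illustration under stated assumptions, not part of the proposed wording: the names Fallback_To_Lower, Fallback_To_Upper and Check_Fallback_Mapping are hypothetical, the offset is written with the 'Val attribute (which takes a position number), and the two Latin-1 lower-case letters without an upper-case form are excluded by position. The program compares the sketch with the existing Ada.Characters.Handling functions over all of Character.

```ada
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Text_IO;

procedure Check_Fallback_Mapping is

   --  Hypothetical helper: fixed-offset lower-case mapping.
   function Fallback_To_Lower (Item : Character) return Character is
   begin
      if Is_Upper (Item) then
         return Character'Val (Character'Pos (Item) + 16#20#);
      else
         return Item;
      end if;
   end Fallback_To_Lower;

   --  Hypothetical helper: fixed-offset upper-case mapping.
   --  Sharp s (16#DF#) and y-diaeresis (16#FF#) are lower-case letters
   --  with no upper-case form in Latin-1, so they are returned unchanged.
   function Fallback_To_Upper (Item : Character) return Character is
   begin
      if Is_Lower (Item)
        and then Character'Pos (Item) /= 16#DF#
        and then Character'Pos (Item) /= 16#FF#
      then
         return Character'Val (Character'Pos (Item) - 16#20#);
      else
         return Item;
      end if;
   end Fallback_To_Upper;

   Mismatches : Natural := 0;

begin
   --  Count the characters for which the sketch and the predefined
   --  functions disagree.
   for C in Character loop
      if Fallback_To_Lower (C) /= To_Lower (C)
        or else Fallback_To_Upper (C) /= To_Upper (C)
      then
         Mismatches := Mismatches + 1;
      end if;
   end loop;

   Ada.Text_IO.Put_Line ("Mismatches:" & Natural'Image (Mismatches));
end Check_Fallback_Mapping;
```

Since the upper-case and lower-case letters of Latin-1 sit exactly 16#20# positions apart, the sketch is expected to report no mismatches on an implementation that does not apply a locale-specific mapping.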
!discussion
Taking the current locale into account when determining the results of To_Lower and To_Upper is fairly straightforward. It may also be useful to be able to specify the locale as an input parameter to new forms of To_Lower and To_Upper, so that conversions can be performed for locales other than the current locale. This would likely mean defining a new type to represent a locale, however. Such a type would probably need to be defined in a new package such as Ada.Locales, since locales pertain to cultural aspects other than case conversion and so would not be appropriate to define in Ada.Characters.Handling. While such a package would be useful and might be worth considering for the next revision of the language, this AI deals only with clarifying the existing functions.
We also want the wording to be stronger than simply leaving the behaviour implementation defined. It is desirable that implementations consider the locale if one has been defined for the target. Microsoft Windows and POSIX both have similar concepts of locale. If the locale is not specified, or if the target does not support the concept of locales, then at least the behaviour is clearly defined by the new wording.
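A form of To_Upper that takes the locale as a parameter might look roughly like the following sketch. Everything in it is an assumption made for illustration: the type Locale_Kind, the function Localized_To_Upper and the procedure name are hypothetical and are not proposed by this AI. The sketch works on Wide_Character because the Turkish capital I with a dot above (16#0130#) and the dotless small i (16#0131#) lie outside Latin-1.

```ada
with Ada.Characters.Conversions;
with Ada.Characters.Handling;
with Ada.Text_IO;

procedure Locale_Parameter_Sketch is

   --  Hypothetical stand-in for a locale type.
   type Locale_Kind is (Default_Locale, Turkish_Locale);

   Dotted_Capital_I : constant Wide_Character :=
     Wide_Character'Val (16#0130#);
   Dotless_Small_I  : constant Wide_Character :=
     Wide_Character'Val (16#0131#);

   --  Hypothetical locale-parameterized upper-case conversion.
   function Localized_To_Upper
     (Item   : Wide_Character;
      Locale : Locale_Kind) return Wide_Character
   is
      use Ada.Characters.Conversions;
   begin
      if Locale = Turkish_Locale and then Item = 'i' then
         --  Turkish: 'i' maps to capital I with a dot above.
         return Dotted_Capital_I;
      elsif Locale = Turkish_Locale and then Item = Dotless_Small_I then
         --  Turkish: dotless small i maps to plain 'I'.
         return 'I';
      elsif Is_Character (Item) then
         --  Latin-1 range: defer to the existing, locale-independent
         --  mapping.
         return To_Wide_Character
                  (Ada.Characters.Handling.To_Upper (To_Character (Item)));
      else
         --  No mapping is attempted outside Latin-1 in this sketch.
         return Item;
      end if;
   end Localized_To_Upper;

begin
   --  Position of 'I' (73) under the default mapping.
   Ada.Text_IO.Put_Line
     (Integer'Image
        (Wide_Character'Pos (Localized_To_Upper ('i', Default_Locale))));

   --  Position 16#0130# (304) under the Turkish mapping.
   Ada.Text_IO.Put_Line
     (Integer'Image
        (Wide_Character'Pos (Localized_To_Upper ('i', Turkish_Locale))));
end Locale_Parameter_Sketch;
```

Passing the locale explicitly keeps the result independent of the environment in which the program runs, which is the main difference from consulting the current locale inside the existing functions.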
--!corrigendum A.3.2(34)
!ACATS test
!appendix

From: Brad Moore
Date: Tuesday, October 21, 2008  2:08 AM

At the last ARG meeting, the PRG had requested that an AI be created
that takes the locale into account for conversions between upper and
lower case.

Please see the attached for the initial writeup for this issue.

****************************************************************

From: Randy Brukardt
Date: Tuesday, October 21, 2008  1:16 PM

Now for the technical comments. Ignoring the massive incompatibility that
this represents, it is bad that with this change,
Ada.Characters.Handling.To_Upper would provide a different result than
using the functions and constants in Ada.Strings.Maps.

It is also bad that the results of To_Upper would change uncontrollably.
For instance, my spam scanning program needs to look at message headers
using the Internet character set; the locale of the receiving machine is
completely irrelevant to that task. But this proposal provides no way to
portably ensure that the standard interpretation (rather than
locale-dependent one) be used. The net effect would be that many programs
would have to avoid using Ada.Characters.Handling altogether. That doesn't
seem good.

****************************************************************

From: Brad Moore
Date: Tuesday, October 21, 2008  2:40 PM

Note that to address the issue with the Turkish language, the change would
not be affecting the set of characters that are considered upper case or
lower case. It is only affecting which lower case character maps to which
upper case character.

This might be viewed as a good thing. Instead of having two ways to do the
same thing, there would be a locale sensitive way, and a hard-coded mapping way.

The incompatibility however is probably not a good thing. Maybe this ties
into the other discussion and the user could select which "feature" they want
somehow.

Maybe a pragma could be used, or maybe a new subprogram could be used to
set a flag in the implementation that indicates whether or not to consider
locale.

E.g., something like:

Ada.Characters.Handling.Use_Current_Locale;

****************************************************************

From: Robert Dewar
Date: Tuesday, October 21, 2008  4:28 PM

> It is also bad that the results of To_Upper would change 
> uncontrollably. For instance, my spam scanning program needs to look 
> at message headers using the Internet character set; the locale of the 
> receiving machine is completely irrelevant to that task. But this 
> proposal provides no way to portably ensure that the standard 
> interpretation (rather than locale-dependent one) be used. The net 
> effect would be that many programs would have to avoid using 
> Ada.Characters.Handling altogether. That doesn't seem good.

I am opposed to doing anything in this direction, and it is out of the
question to introduce any incompatibilities. This is definitely overkill
in terms of predefined capabilities in the language.

****************************************************************

From: Jean-Pierre Rosen
Date: Tuesday, October 21, 2008  5:23 PM

> Maybe a pragma could be used, or maybe a new subprogram could be used 
> to set a flag in the implementation that indicates whether or not to 
> consider locale.
> 
> E.g., something like:
> 
> Ada.Characters.Handling.Use_Current_Locale;
 
In that case, I'd rather leave To_Upper as is, and add a function
Localized_To_Upper. This would definitely put the choice in the hands
of the user.

****************************************************************

From: Robert Dewar
Date: Tuesday, October 21, 2008  5:42 PM

This seems a solution searching for a problem. Has anyone encountered a
user (e.g. from Turkey) who has requested this feature?

We have many hundreds of enhancement requests filed, this is not among them!

****************************************************************

From: Robert I. Eachus
Date: Tuesday, October 21, 2008  5:56 PM

Worse, trying to define some locales would be extremely problematical.  
We should either let the appropriate ISO body deal with it or stay away
from any specific localizations.  For example, Canadian French, last I
looked did upper case differently than France.

If anything must be done, define a generic template for creating a
localized version of the appropriate package or packages and leave it
at that.  Notice that this serves the goal of portability.  If a program
depends on localization, it will have the generic instantiation as
part of its source.  Changing locales may be problematic, but the
Turkish version should compile and run in Greece, albeit with Turkish
alphabet and localizations.

****************************************************************

From: Brad Moore
Date: Friday, October 24, 2008  10:56 AM

The origin of the problem that led to this "solution" is the PRG work,
which deals with updating the Ada bindings to POSIX.
There have been a lot of new calls added to POSIX since the bindings were
last updated. The thought was that there may be certain new POSIX calls
and capabilities that may be worth adding to the Ada standard libraries,
rather than create new packages in the POSIX bindings.
For example, Ada.Directories eliminates or greatly reduces the need to
add bindings to directory related calls.

If a package for managing locales is to be added to a standard,
I am guessing that users would prefer to see that as part of the Ada standard,
rather than part of the Ada POSIX bindings, if possible.
For one thing, the Ada standard is more portable than POSIX, since the Ada standard
exists on platforms that do not support POSIX. Also the bindings can get out of date
with the POSIX standard as well as the Ada standard, whereas that would not be an
issue if the calls were part of the Ada standard.

We were looking at locale related functionality, and thought this would
be a good trial balloon to float over to the ARG.

The early indications seem to be that the locale balloon is losing
altitude rapidly. Though, there are likely other locale related issues that
are of more interest to users, such as currency, date/time formatting, etc.

Regardless of what happens with this AI, it would be good to get a feel
for the number of enhancement requests that come in for updates to the Ada POSIX bindings.

If there are any, it would be useful to know which areas of functionality people
are interested in. For example, if nobody is interested in accessing POSIX locale calls
with Ada, then maybe those calls should be left alone, (in both the Ada standard and
the POSIX bindings).

We have limited resources, and should probably avoid spending time on things that aren't of
interest to anyone.

Robert, do you have any inputs on which POSIX bindings people are interested in seeing updated?

As a final note, even if To_Upper and To_Lower do not end up checking the locale, there is still
the point that the RM does not seem to clearly state how the conversion between upper and lower
is supposed to happen. It is implied that this is an obvious transformation, which it may be.
It might at least be worth considering the wording portion that clearly specifies how the current
conversion works.

****************************************************************

From: Stephen Michell
Date: Friday, October 24, 2008  1:15 PM

Excellent response.

****************************************************************

From: Randy Brukardt
Date: Friday, October 24, 2008  6:04 PM

Not true now (it was true in Ada 95). 2.1(5/2) specifies that the language
definition uses the uppercase mapping of Unicode. That applies everywhere
(not just in program text) - this paragraph says "the language definition",
not something about source files.

One could imagine adding an AARM note to clarify that a bit, but there isn't
anything undefined here. (The same sort of conversion wording is used for
identifiers and the Wide_Wide_Image attribute -- there is no discussion of
what it means to convert to upper case.)

****************************************************************

From: Robert Dewar
Date: Friday, October 24, 2008  6:11 PM

It was certainly perfectly clear to me when I implemented the case equivalence
for 10646.

P.S. I think it is a big mistake to have case equivalence for peculiar characters,
it just doesn't work right, given the locale dependence that should be there,
but isn't, and you wouldn't want it for identifiers anyway.

****************************************************************

From: Randy Brukardt
Date: Friday, October 24, 2008  6:26 PM

> It was certainly perfectly clear to me when I implemented the case 
> equivalence for 10646.

For me, too. Surely it is obvious for the Latin-1 characters (and those are
the only ones involved in Ada 95). But I couldn't find any Ada 95 wording
that defined the mapping between upper and lower case letters (which letters
belong to each character category IS defined). Admittedly, I didn't try that
hard to find such wording; it's not relevant to the topic at hand (which is
whether Ada 2005 has a hole).

****************************************************************

From: Robert Dewar
Date: Friday, October 24, 2008  6:12 PM

> The origin of the problem that led to this "solution" is the PRG work, 
> which deals with updating the Ada bindings to POSIX.
> There have been a lot of new calls added to POSIX since the bindings 
> were last updated. The thought was that there may be certain new POSIX 
> calls and capabilities that may be worth adding to the Ada standard 
> libraries, rather than create new packages in the POSIX bindings.
> For example, Ada.Directories eliminates or greatly reduces the need to 
> add bindings to directory related calls.

The fact that the Posix bindings have some feature is not of itself a
reason for burdening Ada implementations with features that are not needed
by Ada users :-)

****************************************************************

From: Stephen Michell
Date: Friday, October 24, 2008  6:55 PM

I can see the point that such functionality should not be in
Ada.Characters.Handling, but it still makes sense to define locale-sensitive
conversions in Ada. Does it make sense to define this in Interfaces?

****************************************************************

From: Randy Brukardt
Date: Friday, October 24, 2008  7:16 PM

I think it would make sense for it to be a new child of Ada.Characters
(if we have it at all). After all, there are no Wide_Character or
Wide_Wide_Character case conversions (or classifications, for that matter)
currently, and those are where the Locale potentially would make a
significant difference. (After all, there are no Turkish characters in
Latin-1, so the classic example of this need is non-existent for the
Latin-1 routines in Ada.Characters.Handling.)

Indeed, I thought that you and Brad were planning to propose some sort of
Ada.Characters.Wide_Handling, which you could make a case for without any
appeal to locale issues.

****************************************************************

From: Brad Moore
Date: Friday, October 24, 2008  6:40 PM

> The fact that the Posix bindings have some feature is not of itself a 
> reason for burdening Ada implementations with features that are not 
> needed by Ada users :-)

Agreed, and if the feature is not needed by Ada users, then by association,
it seems safe to say that it is not needed by Ada POSIX binding users either.

This is the sort of information that is valuable and could help to
significantly reduce the workload for the PRG team. 

****************************************************************

From: Robert Dewar
Date: Friday, October 24, 2008  7:43 PM

There I disagree: a binding to Posix should bind to all features. I don't
think it is the place of the implementors of a binding to make value judgments
overriding the designers of the library they are binding to.

On the other hand, as designers of the Ada language, we do most certainly
have the obligation to include only useful stuff.

****************************************************************

