!standard 13.7.3(0)                                10-06-01  AI05-0127-2/01
!standard 1.2(2)
!standard 1.2(4/2)
!class Amendment 10-06-01
!status work item 10-06-01
!status received 10-06-01
!priority Low
!difficulty Medium
!subject Adding Locale Capabilities

!summary

A package is needed to identify the current locale.

!problem

Ada does not provide a portable way to determine the active locale in an environment. Knowing the active locale would facilitate writing applications that tailor the user's experience to match the user's expectations. The means to determine the current locale is operating-system specific and non-portable.

Should basic localization support be added to the language? (Yes.)

!proposal

Most modern operating systems provide capabilities that facilitate writing applications that tailor the user's experience with an application to match the user's expectations. The existing approaches vary considerably, however, and are non-portable. For example, POSIX provides library calls, whereas Microsoft Windows provides a completely different set of interfaces. A portable solution is desired for Ada.

There are many areas that are affected by locale settings, such as dates, times, currency, character collation orders, message text, and numeric formatting. The basic need, however, is to be able to determine the current locale (language and country). If an application has this capability, all locale-related differences can be programmed into the application in a portable manner.

This proposal provides a new package, System.Locale:

   package System.Locale is

      type Language_Code is new String;
      type Country_Code is new String;

      Country_Unknown : constant Country_Code := implementation-defined;
      Language_Unknown : constant Language_Code := implementation-defined;

      function Language return Language_Code;
      function Country return Country_Code;

   end System.Locale;

If the country associated with the current locale can be determined from the environment, the Country function returns a code as defined by ISO 3166-1; otherwise Country_Unknown is returned.

ISO 3166-1 defines three sets of codes: alpha-2, alpha-3, and numeric-3. All three sets cover the same countries. The alpha-2 code is a two-letter code, alpha-3 is a three-letter code, and numeric-3 is a three-digit numeric code.

e.g.

   Country          alpha-2   alpha-3   numeric-3
   ----------------------------------------------
   AFGHANISTAN      af        afg       004
   CANADA           ca        can       124
   FRANCE           fr        fra       250
   GERMANY          de        deu       276
   ITALY            it        ita       380
   SPAIN            es        esp       724
   UNITED KINGDOM   gb        gbr       826
   UNITED STATES    us        usa       840

Numeric codes are used mostly for countries where non-Latin scripts are used.

The Country function returns a lower-case string that represents the country of the current locale. Whether it returns an alpha-2, alpha-3, or numeric-3 code is implementation defined, though it is recommended that the returned value be the value most appropriate for the environment, which typically is the alpha-2 code. These are the same codes used on the Internet for top-level domain names, e.g. google.ca.

If the language associated with the current locale can be determined from the environment, the Language function returns a code as defined by ISO 639. Otherwise Language returns Language_Unknown.

ISO 639 has 5 code lists, three of which are relevant.
   Part 1, the alpha-2 code
   Part 2, the alpha-3 code
      ISO 639-2/T contains alpha-3 codes for the same languages as defined in ISO 639-1
      ISO 639-2/B contains alpha-3 codes that are mostly the same as ISO 639-2/T, but with some codes derived from English names rather than native names of the languages
   Part 3, the alpha-3 code for comprehensive coverage of languages.

e.g.

   Language   639-1   639-2/T   639-2/B   639-3
   ---------------------------------------------
   English    en      eng       eng       eng
   French     fr      fra       fre       fra
   German     de      deu       ger       deu
   Chinese    zh      zho       chi       zho + one of 13 subcodes
                                          (e.g., cmn for Mandarin)

The Language function returns a lower-case string that represents the language of the current locale. Whether it returns a 639-1, 639-2/T, 639-2/B, or 639-3 code is implementation defined, though it is recommended that the returned value be the value most appropriate for the environment, which typically is the alpha-2 code.

!wording

Add to normative references after 1.2(2):

   ISO/IEC 639-1:2002, Terminology and other language and content resources - Codes for the representation of names of languages - Part 1: Alpha-2 code

   ISO/IEC 639-2:1998, Terminology and other language and content resources - Codes for the representation of names of languages - Part 2: Alpha-3 code

   ISO/IEC 639-3:2007, Terminology and other language and content resources - Codes for the representation of names of languages - Part 3: Alpha-3 code for comprehensive coverage of languages

Add to normative references after 1.2(4/2):

   ISO/IEC 3166-1:2006, Information and documentation - Codes for the representation of names of countries and their subdivisions - Part 1: Country Codes

Add a new clause:

13.7.3 The Package System.Locale

Static Semantics

The following language-defined library package exists:

   package System.Locale is
      pragma Preelaborate;

      type Language_Code is new String;
      type Country_Code is new String;

      Country_Unknown : constant Country_Code := implementation-defined;
      Language_Unknown : constant Language_Code :=
         implementation-defined;

      function Language return Language_Code;
      function Country return Country_Code;

   end System.Locale;

A locale identifies a geopolitical place or region, its associated character sets, date and time formats, currency formats, and other internationalization-related characteristics. The active locale is the locale associated with the active partition.

Language_Code is a lower-case string representation of an ISO 639 code that identifies the name of a language associated with the active locale.

Country_Code is a lower-case string representation of an ISO 3166-1 code that identifies the name of a country associated with the active locale.

Dynamic Semantics

If the Country_Code associated with the active locale cannot be determined from the environment, then Country returns Country_Unknown.

If the Language_Code associated with the active locale cannot be determined from the environment, then Language returns Language_Unknown.

Implementation Advice

Codes returned should reflect the target environment semantics as closely as is reasonable. For example, in most environments it makes sense to return an alpha-2 code instead of an alpha-3 code as defined by ISO 639-1, ISO 639-2, ISO 639-3 and ISO 3166-1, since alpha-2 codes are commonly used, have the least variation, and have the highest portability for locale-based capabilities.

!discussion

Consideration was given to whether specific locale capabilities could be provided, such as accessing numeric formatting, date formatting, currency formatting, or collating sequence locale-specific information. This was ruled out because it would be difficult to get right and would require a high level of effort, when there does not seem to be a high level of demand for these capabilities. A simple capability of determining the locale is all that is needed to provide portability, as application programmers can program specific locale differences as needed once the current locale has been determined.

Consideration was also given to whether the returned codes should be specific lengths. For example, country codes are typically two-character codes in Windows and POSIX environments. There are cases, though, where a three-character code may be more appropriate. Numeric codes may be used for locales where non-Latin scripts are used. It was decided that the result types for these functions should be string types, to provide the greatest flexibility. If the world switches to three-character codes over time, it will not impact the specification of this package.

Originally the package was a child of Ada; however, it was decided that this package should be a child of System, because locale capabilities are system dependent.

The package name was originally plural, as in System.Locales. Since there is only one active locale, usage of this package reads better if the package name is singular.

e.g.,

   if System.Locale.Language = "ca" then
      ...
   end if;

!example

   with System.Locale;
   with Ada.Text_IO.Editing;

   procedure P is

      --  Defaults, used when the locale is not one of those handled below.
      Fill      : Character := ' ';
      Separator : Character := ',';
      Radix     : Character := '.';

      Currency : constant String := "$";
      Pic      : constant Ada.Text_IO.Editing.Picture :=
        Ada.Text_IO.Editing.To_Picture
          (Pic_String => "$ZZZZ_ZZ9.99", Blank_When_Zero => False);

      type Dollars is delta 0.01 digits 8 range 0.0 .. 999_999.99;

   begin
      --  Accept either the alpha-2 or the alpha-3 form of each code.
      if System.Locale.Country = "ca" or System.Locale.Country = "can" then
         if System.Locale.Language = "en"
           or System.Locale.Language = "eng"
         then
            Fill      := ' ';
            Separator := ',';
            Radix     := '.';
         elsif System.Locale.Language = "fr"
           or System.Locale.Language = "fra"
           or System.Locale.Language = "fre"
         then
            Fill      := ' ';
            Separator := '.';
            Radix     := ',';
         end if;
      end if;

      declare
         package Canadian_Cash is
           new Ada.Text_IO.Editing.Decimal_Output (Num => Dollars);

         Cost : constant Dollars := 256_778.99;
      begin
         Canadian_Cash.Put (Item       => Cost,
                            Pic        => Pic,
                            Currency   => Currency,
                            Fill       => Fill,
                            Separator  => Separator,
                            Radix_Mark => Radix);
      end;
   end P;

--!corrigendum 13.7.3(0)

!ACATS test

ACATS C-Tests are needed to test this package.

!appendix

From: Brad Moore
Date: Tuesday, June 1, 2010  1:15 AM

Here is a much simplified version of AI05-0127, my homework. [This is version /01 of this AI - Editor.]

I've eliminated all locale functionality other than a capability to determine the active language and country.

The idea is that once you have a portable means to determine the current locale, the application programmer can program all locale-related differences needed in a portable manner.

Rather than return String, I thought it was better to return Language_Code and Country_Code, which are types derived from String. My thinking was it is better to have distinct types for these rather than subtypes of String to provide stronger type safety.

In the ARG meeting notes from Burlington, the suggestion was to move the package Ada.Locales to System.Locales. I have moved the new package to be a child of System, but modified the package name from Locales to Locale. (Plural to singular.) It reads better in the code.

   if System.Locale.Language = "en" then
      ...
   end if;

****************************************************************

From: Jean-Pierre Rosen
Date: Tuesday, June 1, 2010  2:30 AM

Small nit: the specification says Country_Unknown and Language_Unknown, but the discussion talks about Unknown_Country and Unknown_Language.

****************************************************************

From: Brad Moore
Date: Tuesday, June 1, 2010  10:10 AM

Yes, the specification was my intent; it should be Country_Unknown and Language_Unknown throughout.

****************************************************************

From: Bob Duff
Date: Tuesday, June 1, 2010  10:56 AM

> The idea is that once you have a portable means to determine the
> current locale, the application programmer can program all locale
> related differences needed in a portable manner.

I don't really see the need for this AI.

For one thing, it doesn't really provide portability, since the country names and language names are impl-def.
Not totally impl-def; they have to follow one of several standards (two-letter names, three-letter names, etc).

    The nice thing about standards is that you have so many to choose
    from. -- Somebody Famous.
    (This saying is attributed to at least Andrew S. Tanenbaum,
    Admiral Grace Hopper, and Ken Olsen, by various web sites. And
    I seem to recall hearing some Comp Sci professor at CMU saying
    it, circa 1978. Which leads me to say, "The nice thing about
    the world wide web is that there's so much misinformation to
    choose from.")

And the supposed reason for using strings is to allow implementations to upgrade to new versions of the relevant locale standards. I'm not sure how to write portable code using such a moving target.

If this stuff really is properly standardized, then we can use an enumeration type. The fact that we're using strings seems to indicate otherwise.

I don't like Unknown_Country/lang being impl-def. Shouldn't we at least insist that it be distinct from defined country names? For that matter, why not nail it down (say it's "unknown country code" or something).

According to the ARG minutes from Burlington (Feb 2010), 2 people voted against keeping this alive. I don't really remember, but I suspect I was one of them. The last 3 messages in the !appendix show Pascal Leroy, Bob Duff, and Robert Dewar, all suggesting to drop this AI (but note that that was a previous much-more-ambitious version).

I haven't changed my mind -- I don't think even this much-simpler version is worth the trouble.

If I were writing a program that needs l10n / i18n, I think I'd ignore this package, and go straight to the O.S. facilities. There really aren't that many -- windows, plus misc versions of Unix that probably support Posix. Embedded real-time kernels can probably be ignored.

In the !example, variables are left uninitialized if you're not in Canada (or if the impl chooses the numeric encoding of that country).
I don't understand what "Dollars" are doing in a supposedly i18n app. I guess I don't really understand the example.

I think I prefer Locales over Locale (no big deal -- I'm just used to plurals for package names).

****************************************************************

From: Robert Dewar
Date: Tuesday, June 1, 2010  11:02 AM

I agree with everything Bob says, and I would recommend dropping this AI.

****************************************************************

From: Robert Dewar
Date: Tuesday, June 1, 2010  6:08 AM

...
> Rather than return string, I thought it was better to return
> Language_Code and Country_Code which are types derived from String.

I think that types derived from String tend to be a nuisance, because various utility functions do not apply without junk conversions.

> My thinking was it is better to have distinct types for these rather
> than subtypes of String to provide stronger type safety.

I disagree

****************************************************************

From: Bob Duff
Date: Tuesday, June 1, 2010  11:10 AM

> I think that types derived from String tend to be a nuisance, because
> various utility functions do not apply without junk conversions.

But there are cases where distinct types are helpful, and I think this is one of them. See here for another example:

http://www.adacore.com/2010/04/05/gem-83/

In C, you can say:

    printf (input_data); // a security hole, if privileged program!

when you should have said

    printf ("%s", input_data);

The idea of template-oriented formatting is a good one. In fact, we use the same idea in GNAT for error messages, and also in IAC (the CORBA IDL-to-Ada compiler). So does CodePeer (last time I checked). But it works best if the "template" type is distinct from the "string that could come from input data" type (namely String).

I recently fixed a bunch of bugs of this nature in IAC. And to make sure they STAY fixed, I changed the type from String to a template type derived from String.

> > My thinking was it is better to have distinct types for these rather
> > than subtypes of String to provide stronger type safety.
>
> I disagree

In this particular case, I agree with Brad's decision. As I said in my previous message, these types are really more like enums than strings. Having country codes as a separate type allows you to keep track of which strings have been verified to really be country codes, versus other strings that could contain arbitrary text.

Note: In Ada 2012, I might use subtype predicates instead! ;-)

Anyway, if we're going to have this AI, shouldn't there be an Is_Valid_Country function? And/or a conversion function String-->Country that checks?

****************************************************************

From: Brad Moore
Date: Tuesday, June 1, 2010  11:17 AM

I could go either way regarding derived types vs subtypes. On the one hand, I thought there might not be much need for applying utility functions on the return codes for these functions, and the stronger types might catch some errors (e.g., erroneously passing a language code into a function that accepts a country code to determine the currency symbol).

On the other hand, I agree that the junk conversions you mention can be an annoyance. I am happy to go with the consensus on this, but considering your comment, I am starting to think subtypes are the way to go.

I presume though that it is preferable to return the Language_Code and Country_Code subtypes rather than just return String?

****************************************************************

From: Bob Duff
Date: Tuesday, June 1, 2010  11:25 AM

> On the other hand, I agree that the junk conversions you mention can
> be an annoyance. I am happy to go with the consensus on this, but
> considering your comment, I am starting to think subtypes are the way
> to go.

Don't give in so easily. ;-)

But I suppose if we drop this AI, as Robert and I suggest, we can leave the type-vs-subtype question moot.

****************************************************************

From: Brad Moore
Date: Tuesday, June 1, 2010  11:17 AM

> For one thing, it doesn't really provide portability, since the
> country names and language names are impl-def. Not totally impl-def;
> they have to follow one of several standards (two-letter names,
> three-letter names, etc).
>
>     The nice thing about standards is that you have so many to choose
>     from. -- Somebody Famous.
>     (This saying is attributed to at least Andrew S. Tanenbaum,
>     Admiral Grace Hopper, and Ken Olsen, by various web sites. And
>     I seem to recall hearing some Comp Sci professor at CMU saying
>     it, circa 1978. Which leads me to say, "The nice thing about
>     the world wide web is that there's so much misinformation to
>     choose from.")

It's not quite that bad. Really, there is only one standard for country names, and one standard for language names (ISO 3166-1 and ISO 639). Each standard provides several formats for the codes. I think my mistake was to try to get away with not specifying which of the formats was used by the Ada package. I now think it would have been better to say that the alpha-2 formats are always returned, since those are the ones used by Microsoft, POSIX, and Java today w.r.t. locale identification. This would at least address your portability comment, I think, since the country names and language names would then not be implementation defined.

> And the supposed reason for using strings is to allow implementations
> to upgrade to new versions of the relevant locale standards. I'm not
> sure how to write portable code using such a moving target.
>
> If this stuff really is properly standardized, then we can use an
> enumeration type. The fact that we're using strings seems to indicate
> otherwise.

I originally considered defining an enumeration that mapped to the codes defined by ISO, but I came to the conclusion that two-character codes in the form of a string are better suited for this purpose.
Over time, as new countries form, and new languages evolve, the ISO country and language standards will need to be revised. Adding new values to an enumeration will cause incompatibilities that can be avoided if we stick to returning a string-based value that maps to the two-character codes.

If I am writing an application for my current locale, say in Canada where English and French are the official languages, it would be nice to know that introducing a new country name for some newly formed country on the other side of the planet will not break any enumeration case statements in my application.

> I don't like Unknown_Country/lang being impl-def. Shouldn't we at
> least insist that it be distinct from defined country names? For that
> matter, why not nail it down (say it's "unknown country code" or
> something).

My intent was to define these as a constant, such as "  " (two spaces), which does not (and would not) map to any character codes defined by ISO. Nailing it down to a constant value sounds good to me. The point is, these are the only cases where the values returned are not defined in the ISO standard.

> If I were writing a program that needs l10n / i18n, I think I'd ignore
> this package, and go straight to the O.S. facilities. There really
> aren't that many -- windows, plus misc versions of Unix that probably
> support Posix. Embedded real-time kernels can probably be ignored.

I do have some real experience with this issue. A major system we developed for our Canadian customer required that all text displayed in all applications running on the data terminal be displayed in either English or French depending on the locale settings of the terminal. The applications originally were developed for a Unix platform, but eventually were also ported to Windows. This is one of the few areas where the code was not portable, so our source tree ended up providing and maintaining multiple implementations of a package.
Admittedly, it was not a huge problem to work around, but it is messier than having one source. This complicates project make files. We were even considering bringing in some preprocessor solution for this one issue, which we ended up avoiding, thankfully. To those developers on our team coming from a C/C++ environment, it was difficult to convince them that Ada's approach of not providing a preprocessor was a good one, even though I believe that was a good choice, for other technical reasons. As an aside, I was just bitten last week by some C/C++ code where a system include file had redefined an enumeration literal I was trying to define to some other string. That's pretty scary stuff if you can't trust that the source code you see displayed in the editor is what the compiler sees.

In my experience, given the choice between an Ada standard package, and going straight to O.S. facilities, I would choose the Ada package almost always, unless the O.S. facilities provided features that were not present in the Ada package.

> In the !example, variables are left uninitialized if you're not in
> Canada (or if the impl chooses the numeric encoding of that country).
> I don't understand what "Dollars" are doing in a supposedly i18n app.
> I guess I don't really understand the example.

The example is not a comprehensive one. I was thinking of the application we provided for the military. The application is only going to be run in a Canadian context, which is why I didn't test for other countries. I should just have checked to ensure that the country is Canada, and raised Program_Error otherwise.

In Canada, both English and French use Dollars.

The example also shows how the locale capability can be used with Ada.Text_IO.Editing.Decimal_Output, which is an existing package that can be used to address locale formatting of numeric and currency values. It's rather odd that we never provided a means to facilitate using locale to select the radix, separator, and currency inputs.

I can probably come up with a better example. In fact, I think I would like to resubmit this AI, with a version that only uses alpha-2 codes. Before we decide to torch this AI, it would be good to have a version that at least addresses some of these comments. It shouldn't take me long to update.

****************************************************************

From: Bob Duff
Date: Tuesday, June 1, 2010  4:51 PM

> In my experience, given the choice between an Ada standard package,
> and going straight to O.S. facilities, I would choose the Ada package
> almost always, unless the O.S. facilities provided features that were
> not present in the Ada package.

I guess that "unless" is the key point.

There are approximately 3 operating systems to worry about: Windows, Linux/Unix/Posix, any other? There are hundreds of countries/languages. If I want portability across operating systems and portability across countries, I'm thinking I'd rather write 2 or 3 OS-dependent versions of things, rather than hundreds.

The current (thankfully simplified!) version of the AI gives a somewhat-portable way to query the country. But the OS gives much more -- for example, collating sequences.

Would you rather do:

    if Country = "xx" then
        collating order for xx goes here
    elsif Country = "yy" then
        collating order for yy goes here
    ... 100 more elsif's.

Or:

    if this is windows then
        use windows-specific stuff to get this locale's collating sequence
    elsif this is linux then
        use posix stuff
    else
        is there anything else?

I think I'm choosing 2 or 3 elsifs over 100 elsifs.

Of course, your example is different -- you had just 2 locales (English- and French-speaking parts of Canada), so I understand that's somewhat simpler.

> > In the !example, variables are left uninitialized if you're not in
> > Canada (or if the impl chooses the numeric encoding of that
> > country). I don't understand what "Dollars" are doing in a
> > supposedly i18n app. I guess I don't really understand the example.
>
> The example is not a comprehensive one. I was thinking of the
> application we provided for the military. The application is only
> going to be run in a Canadian context, which is why I didn't test for
> other countries. I should just have checked to ensure that the country
> is Canada, and raised program error otherwise.

Right. Or for a program that could run outside Canada, you'd default to some locale if it's not one of the ones you've specifically coded for.

> In Canada, both English and French use Dollars.

I know -- I've been to both English- and French-speaking parts. It looks like monopoly money, with all those colors, but hey, who am I to judge. ;-)

> The example also shows how the locale capability can be used with
> Ada.Text_IO.Editing.Decimal_Output which is an existing package that
> can be used to address locale formatting of numeric and currency values.
> It's rather odd that we never provided a means to facilitate using
> locale to select the radix, separator, and currency inputs.
>
> I can probably come up with a better example. In fact, I think I would
> like to resubmit this AI, with a version that only uses alpha-2 codes.
> Before we decide to torch this AI, it would be good to have a version
> that at least addresses some of these comments. It shouldn't take me
> long to update.

Well, maybe you should wait to see what others think.

I have never done any serious i18n work, so you should take what I say with a grain of salt. I read a book about it some years ago, and it seemed like operating systems had some fairly sophisticated stuff. Unfortunately not portable across operating systems. But portable across locales!

****************************************************************

From: Brad Moore
Date: Wednesday, June 2, 2010  12:53 AM

...
> I think I'm choosing 2 or 3 elsifs over 100 elsifs.

I agree that if one were to write an application that everyone in the world could use, you have serious locale needs and probably would want to glean all the OS capabilities you can by writing non-portable calls to the OS. I think you would be hard pressed though to find many real world examples of such an application however. By the way, if you can think of a new application that everyone in the world would want to use, please let me know. :-) (I suppose a web browser is an example of one such existing application.)

One of the significant development costs we encountered in the Canadian applications was in the area of language translation. It's fine having OS hooks to determine decimal points and currency symbols, but no amount of OS capabilities is going to do the hard work of determining which text to display to the user in GUIs, reports, help text, etc. Getting text from different languages to fit in the same places on a GUI window can be quite a challenge in itself.

Writing an application that displays the correct text in every language would be a monumental task. Good luck just finding the translators needed for all those languages.

I suspect in practice, a good majority of i18n applications involve a handful of languages at most, attempting to cover the main population of the users. For example, bank machines in my area mostly have two languages; some have 4 or 5 (e.g., Chinese, English, French, Spanish, Japanese). Instruction manuals for equipment I have seen may have 3 - 5 languages depending on where the equipment is sold internationally. German might be one of the languages added to the list. Any websites I have seen typically only support up to a handful of languages.
I don't recall ever encountering Swahili in my travels (not that I would know Swahili if I saw it). Someone writing an application would not bother translating to Swahili unless there was a reasonable chance that a speaker of that language would be a relatively common user of the application.

So I think maybe I'm choosing up to a handful of portable elsifs over your 3 non-portable elsifs.

Incidentally, collating sequence I suspect is one of the more esoteric of locale-based areas. I don't think we used any locale-based collating sequences in our applications that I can recall.

The most important use of the locale we found is to select which text we wanted to display to the user.

> > In fact, I think I
> > would like to resubmit this AI, with a version that only uses
> > alpha-2 codes.
>
> Well, maybe you should wait to see what others think.
>

Will do.

****************************************************************

From: Jean-Pierre Rosen
Date: Wednesday, June 2, 2010  2:49 AM

> I agree that if one were to write an application that everyone in the
> world could use, you have serious locale needs and probably would want
> to glean all the OS capabilities you can by writing non-portable calls
> to the OS. I think you would be hard pressed though to find many real
> world examples of such an application however.

Do not forget games. "Battle for Wesnoth" is available in 49 languages (see http://www.wesnoth.org/gettext/). According to one of the main developers of the game, Jeremy Rosen ;-), Gettext is the way to go to handle that many languages. But out of scope for us, I guess.

****************************************************************

From: Bob Duff
Date: Wednesday, June 2, 2010  7:34 AM

> I think you would be hard pressed though to find many real world
> examples of such an application however.

Indeed. I have never worked on any project that had anything to do with i18n, so I've got zero first-hand experience.
I designed the message-printing stuff in CodePeer to support it, but as far as I know, there's still only one version of the messages (in English). AdaCore has customers all over, but GNAT and everything else we sell gives messages only in English.

My bank's machine asks me whether I want to use English or Spanish.

****************************************************************