CVS difference for ai05s/ai05-0127-2.txt

Differences between 1.1 and version 1.2

--- ai05s/ai05-0127-2.txt	2010/06/13 02:16:13	1.1
+++ ai05s/ai05-0127-2.txt	2010/06/13 02:48:42	1.2
@@ -1,4 +1,4 @@
-!standard 13.7.3(0)                                10-06-01  AI05-0127-2/01
+!standard A.19(0)                                   10-06-05  AI05-0127-2/02
 !standard 1.2(2)
 !standard 1.2(4/2)
 !class Amendment 10-06-01
@@ -38,75 +38,10 @@
 locale related differences can be programmed into the application in a
 portable manner.
 
-This proposal provides a new package: System.Locale
+This proposal adds a new package, Ada.Locales, that provides functions
+to query the country and language associated with the current
+locale.
 
-package System.Locale is
-   type Language_Code is new String;
-   type Country_Code is new String;
-
-   Country_Unknown : constant Country_Code := implementation-defined
-   Language_Unknown : constant Language_Code := implementation-defined
-
-   function Language return Language_Code;
-   function Country return Country_Code;
-end System.Locale;
-
-If the country associated with the current locale can be determined from
-the environment, the Country function returns a code as defined by
-ISO 3166-1, otherwise Unknown_Country is returned. ISO 3166-1 defines
-three sets of codes; alpha-2, alpha-3, and numeric-3. These three codes
-cover an identical number of country names.
-
-The alpha-2 code is a two letter code, alpha-3 is a three letter code,
-and numeric-3 is a 3 digit numeric code.
-
-e.g.
-Country               alpha-2   alpha-3   number-3
---------------------------------------------------
-AFGHANISTAN             af       afg        004
-CANADA                  ca       can        124
-FRANCE                  fr       fra        250
-GERMANY                 de       deu        276
-ITALY                   it       ita        380
-SPAIN                   es       esp        724
-UNITED KINGDOM          gb       gbr        826
-UNITED STATES           us       usa        840
-
-Numeric codes are used mostly for countries where non-Latin scripts
-are used. The Country function returns a lower-case string that represents
-the country of the current locale. Whether it returns an alpha-2, alpha-3,
-or numeric-3 code is implementation defined, though it is recommended that
-the returned value be the value most appropriate for the environment, which
-typically is the alpha-2 code. These are the same codes used in the internet
-for top level domain names. E.g. google.ca
-
-If the language associated with the current locale can be determined from
-the environment, the Language function returns a code as defined by ISO 639.
-Otherwise Language returns Unknown_Language. ISO 639 has 5 code lists, three
-of which are relevant.
-
-   Part 1, the alpha-2 code
-   Part 2, the alpha-3 code
-      ISO 639-2/T contains alpha 3 codes for the same languages as defined in ISO 639-1
-      ISO 639-3/B contains alpha 3 codes that are mostly the same as ISO 639-2/T
-                  but with some codes derived from English names rather than native
-                  names of the languages
-   Part 3, the alpha-3 code for comprehensive coverage of languages.
-
-e.g.
-Language               639-1    639-2/T   639-2/B     639-3
-------------------------------------------------------------
-English                 en       eng        eng        eng
-French                  fr       fra        fre        fra
-German                  de       deu        ger        deu
-Chinese                 zh       zho        chi        zho+one of 13 subcodes
-                                                          (eg cmn for mandarin)
-
-The Language function returns a lower-case string that represents the language
-of the current locale. Whether it returns a 639-1, 639-2/T, 639-2/B, or 639-3
-code is implementation defined, though it is recommended that the returned value
-be the value most appropriate for the environment, which typically is the alpha-2 code.
-
 !wording
 
 Add to normative references after 1.2(2):
@@ -128,37 +63,38 @@
 
 Add a new clause:
 
-13.7.3 The Package System.Locale
+A.19 The Package Ada.Locales
 
 
 Static Semantics
 
 The following language-defined library package exists:
 
-package System.Locale is
-   pragma Preelaborate;
+package Ada.Locales is
+   pragma Preelaborate(Locales);
 
-   type Language_Code is new String;
-   type Country_Code is new String;
+   type Language_Code is array (1 .. 3) of Character range 'a' .. 'z';
+   type Country_Code is array (1 .. 2) of Character range 'A' .. 'Z';
 
-   Country_Unknown : constant Country_Code := implementation-defined
-   Language_Unknown : constant Language_Code := implementation-defined
+   Country_Unknown : constant Country_Code := "ZZ";
+   Language_Unknown : constant Language_Code := "und";
 
    function Language return Language_Code;
    function Country return Country_Code;
-end System.Locale;
 
+end Ada.Locales;
+
 A locale identifies a geopolitical place or region, its associated
 character sets, date and time formats, currency formats, and other
 internationalization related characteristics. The active locale is
 the locale associated with the active partition.
 
-Language_Code is a lower-case string representation of an ISO 639
-code that identifies the name of a language associated with the
-active partition.
+Language_Code is a lower-case string representation of an ISO 639-2/T
+alpha-3 code that identifies the language associated with the
+active locale.
 
-Country_Code is a lower-case string representation of an ISO 3166-1
-code that identifies the name of a country associated with the
+Country_Code is an upper-case string representation of an ISO 3166-1
+alpha-2 code that identifies the country associated with the
 active locale.
 
 Dynamic Semantics
@@ -169,96 +105,193 @@
 If the Language_Code associated with the active locale cannot
 be determined from the environment then Language returns Language_Unknown.
 
-Implementation Advice
+!discussion
 
-Codes returned should reflect the target environment semantics
-as closely as is reasonable.  For example, in most environments, it makes
-sense to return an alpha-2 code instead of an alpha-3 code as defined by
-ISO 639-1, ISO 639-2, ISO 639-3 and ISO-3166-1, since those are commonly
-used, have the least variation, and have the highest portability for
-locale based capabilities.
+ISO 3166-1 defines three sets of codes: alpha-2, alpha-3, and numeric-3.
+All three sets cover the same set of country names.
 
-!discussion
+The alpha-2 code is a two letter code, alpha-3 is a three letter code,
+and numeric-3 is a 3 digit numeric code.
+
+e.g.
+Country               alpha-2   alpha-3   numeric-3
+--------------------------------------------------
+AFGHANISTAN             AF       AFG        004
+CANADA                  CA       CAN        124
+FRANCE                  FR       FRA        250
+GERMANY                 DE       DEU        276
+ITALY                   IT       ITA        380
+SPAIN                   ES       ESP        724
+UNITED KINGDOM          GB       GBR        826
+UNITED STATES           US       USA        840
+
+Numeric codes are used mostly for countries where non-Latin scripts
+are used.
+
+The Country function returns an upper-case string that represents
+the country of the current locale. The ISO 3166-1 standard is
+case insensitive for country codes, but recommends upper case for
+code usage, which is why the Country function limits the return
+result to upper case only. This simplifies client usage if clients
+know they can expect the return values to be consistently in upper
+case. Alpha-2 codes were chosen instead of alpha-3 codes because
+existing locale capabilities in POSIX and Apple OSX follow
+BCP 47 (RFC 4646), which excludes the use of alpha-3 codes.
+Since Microsoft's locale id scheme does not follow ISO 3166-1,
+the Microsoft scheme does not impact this decision.
+Going with alpha-2 code format possibly allows for simpler
+implementations in POSIX and OSX environments since the
+alpha-2 code can be extracted directly from the environment
+without requiring a mapping.
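
As a rough illustration of the "extracted directly from the environment" point, the sketch below pulls the territory out of a POSIX-style locale name such as "en_CA.UTF-8". It is not part of the proposal: the type is declared locally to mirror the proposed Country_Code, Country_Of is a hypothetical helper, and reading only the LANG variable is a simplifying assumption (a real implementation would presumably honour the platform's full locale precedence rules).

```ada
with Ada.Environment_Variables;
with Ada.Text_IO;

procedure Show_Country is

   --  Local declarations that mirror the Country_Code proposed in this AI.
   type Country_Code is array (1 .. 2) of Character range 'A' .. 'Z';
   Country_Unknown : constant Country_Code := "ZZ";

   --  Hypothetical helper: return the territory part of a POSIX-style
   --  locale name, or Country_Unknown when the name carries no two-letter
   --  upper-case territory (for example "C", "POSIX" or "").
   function Country_Of (Locale_Name : String) return Country_Code is
   begin
      for I in Locale_Name'Range loop
         if Locale_Name (I) = '_' then
            if I <= Locale_Name'Last - 2
              and then Locale_Name (I + 1) in 'A' .. 'Z'
              and then Locale_Name (I + 2) in 'A' .. 'Z'
            then
               return (Locale_Name (I + 1), Locale_Name (I + 2));
            end if;
            exit;  --  only the first underscore is considered
         end if;
      end loop;
      return Country_Unknown;
   end Country_Of;

   Code : Country_Code := Country_Unknown;

begin
   --  Assumption: LANG alone is consulted in this sketch.
   if Ada.Environment_Variables.Exists ("LANG") then
      Code := Country_Of (Ada.Environment_Variables.Value ("LANG"));
   end if;

   --  Print component by component rather than via a String conversion.
   for I in Code'Range loop
      Ada.Text_IO.Put (Code (I));
   end loop;
   Ada.Text_IO.New_Line;
end Show_Country;
```

No table of countries is needed here, which is the simplification the paragraph above refers to; an alpha-3 result would require a mapping step.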
 
 Consideration was given to whether specific locale capabilities could
 be provided, such as accessing numeric formatting, date formatting,
 currency formatting, or collating sequence locale specific information.
 This was ruled out because it would be difficult to get this right,
 and would require a high level of effort, when there does not seem
-to be a high level of demand for these capabilities. A simple
-capability of determining the locale is all that is needed to provide
-portability, as application programmers can program specific locale
+to be a high level of demand for these capabilities.
+A simple capability of determining the locale is all that is needed to
+provide portability, as application programmers can program specific locale
 differences as needed once the current locale has been determined.
+
+A user application could relatively easily define a translation lookup
+facility that accepts the current locale and an application message
+id and looks up a locale-specific translation. Such a facility could
+also look up localization features such as those provided by the OS
+for numeric, date, and currency formatting and collating
+sequences.
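
A minimal sketch of such a lookup facility, assuming an Ada 2012 compiler that provides Ada.Locales as specified in this AI. The message ids, the Supported_Language type, the table contents, and the English fallback are all hypothetical choices made for illustration; a real application would more likely load its translations from a file or database.

```ada
with Ada.Locales;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Greet is
   use type Ada.Locales.Language_Code;

   --  Hypothetical application-defined message ids and languages.
   type Message_Id is (Welcome, Goodbye);
   type Supported_Language is (English, French);

   type Translation_Table is
     array (Supported_Language, Message_Id) of Unbounded_String;

   Translations : constant Translation_Table :=
     (English => (Welcome => To_Unbounded_String ("Welcome"),
                  Goodbye => To_Unbounded_String ("Goodbye")),
      French  => (Welcome => To_Unbounded_String ("Bienvenue"),
                  Goodbye => To_Unbounded_String ("Au revoir")));

   --  Map the ISO 639-2/T code to a table row. Any other code,
   --  including Language_Unknown, falls back to English.
   function Row_For
     (Code : Ada.Locales.Language_Code) return Supported_Language is
   begin
      if Code = "fra" then
         return French;
      else
         return English;
      end if;
   end Row_For;

   --  Look up the translation of Id for the active locale.
   function Lookup (Id : Message_Id) return String is
   begin
      return To_String (Translations (Row_For (Ada.Locales.Language), Id));
   end Lookup;

begin
   Ada.Text_IO.Put_Line (Lookup (Welcome));
end Greet;
```

Only Row_For depends on the locale query; supporting another language in this sketch means adding a table row and one more comparison, not touching the call sites.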
+
+ISO 639 has 5 code lists, three
+of which are relevant.
+
+   Part 1, the alpha-2 code
+   Part 2, the alpha-3 code
+      ISO 639-2/T contains alpha 3 codes for the same languages as defined in ISO 639-1
+      ISO 639-2/B contains alpha 3 codes that are mostly the same as ISO 639-2/T
+                  but with some codes derived from English names rather than native
+                  names of the languages
+   Part 3, the alpha-3 code for comprehensive coverage of languages.
+
+e.g.
+Language               639-1    639-2/T   639-2/B     639-3
+------------------------------------------------------------
+English                 en       eng        eng        eng
+French                  fr       fra        fre        fra
+German                  de       deu        ger        deu
+Chinese                 zh       zho        chi        zho+one of 13 subcodes
+                                                          (eg cmn for mandarin)
 
-Consideration was also given to whether the returned codes should
-be specific lengths. For example, country codes are typically two
-character codes in Windows and POSIX environments. There are cases
-though where a three character code may be more appropriate.
-Numeric codes may be used for locales where non Latin scripts are used.
-It was decided that the result types for these functions should be
-string types, to provide the greatest flexibility. If the world
-switches to 3 character codes over time, it will not impact the
-specification of this package.
-
-Originally the package was a child of Ada, however it was decided
-that this package should be a child of System, because locale
-capabilities are system dependent.
-
-The package name was originally plural, as in System.Locales.
-Since there is only one active locale, usage of this package reads
-better if the package name is singular.
-e.g.,
-    If System.Locale.Language = "ca" then
-        ...
-    end if;
+The Language function returns a lower-case 639-2/T alpha-3 string that
+represents the language of the current locale. The ISO 639-2 standard is
+case insensitive for language codes, but recommends lower case for
+code usage, which is why the Language function limits the return
+result to lower case only. This simplifies client usage if clients
+know they can expect the return values to be consistently in lower
+case.
+
+The decision to go with 639-2/T alpha-3 codes was driven by the fact that
+639-1 codes only cover the major languages in use. ISO 639-2 defines codes
+for many more languages than 639-1: it is intended to cover all languages
+with significant bodies of literature, and so covers most languages.
+
+The selection of 639-2/T over 639-2/B is driven by the fact that POSIX and
+Apple OSX follow BCP 47 (RFC 4646), which states that when there is a choice
+between the "T" code and the "B" code, the T code is the recommended choice.
+
+Language_Unknown is defined as "und" because ISO 639
+defines that code to be used in situations in which a language or languages
+must be indicated but the language cannot be identified.
+
+Country_Unknown is defined as "ZZ" because ISO 3166-1
+specifies that it is one of a set of user-assigned codes in the
+standard. User-assigned code elements are codes at the disposal of
+users who need to add further names of countries, territories, or other
+geographical entities to their in-house application of ISO 3166-1, and
+the ISO 3166/MA will never use these codes in the updating process of
+the standard. The following codes can be user-assigned:
+
+    * Alpha-2: AA, QM to QZ, XA to XZ, and ZZ
+    * Alpha-3: AAA to AAZ, QMA to QZZ, XAA to XZZ, and ZZA to ZZZ
+
+One such user-assigned coding is by the Unicode Common Locale Data Repository,
+which assigns ZZ to represent "Unknown or Invalid Territory".
+
+Since there is no specific code defined for an unknown country, there already
+are uses of this code for similar purposes, and
+this is the last user-assigned alpha-2 code and so less likely to be used for
+other purposes, "ZZ" seemed like the correct choice.
+
+Consideration was given to whether the package should deal with
+macro-geographic regions. The BCP 47 RFC indicates that country codes
+can be in numeric-3 format if the region identified is larger than a
+country, such as a continent. Microsoft does not have any locales based
+on macro-geographic regions. It is doubtful that these locales are used
+much if at all. The numeric-3 codes in this case are outside of ISO 3166-1
+because they do not represent countries. Trying to build support for this
+into the Ada package would be messy, and in these cases it is the Language
+that is the most important distinguisher rather than region. If the OS
+provides a numeric-3 format for macro-geographic region, it makes sense
+to return Country_Unknown for the Country function, since the Country
+truly is unknown.
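
From the client's side, the unknown cases can be handled with a plain comparison against the two constants. The sketch below assumes an Ada 2012 compiler that provides Ada.Locales as specified in this AI; the procedure name and the output wording are hypothetical.

```ada
with Ada.Locales;
with Ada.Text_IO; use Ada.Text_IO;

procedure Report_Locale is
   use type Ada.Locales.Country_Code;
   use type Ada.Locales.Language_Code;

   Lang : constant Ada.Locales.Language_Code := Ada.Locales.Language;
   Ctry : constant Ada.Locales.Country_Code  := Ada.Locales.Country;
begin
   Put ("Language: ");
   if Lang = Ada.Locales.Language_Unknown then
      Put_Line ("(could not be determined)");
   else
      --  Print component by component; this avoids relying on a
      --  conversion from Language_Code to String.
      for C of Lang loop
         Put (C);
      end loop;
      New_Line;
   end if;

   Put ("Country: ");
   if Ctry = Ada.Locales.Country_Unknown then
      --  Also the expected result when the OS reports only a
      --  macro-geographic region, per the discussion above.
      Put_Line ("(could not be determined)");
   else
      for C of Ctry loop
         Put (C);
      end loop;
      New_Line;
   end if;
end Report_Locale;
```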
+
+Consideration was given to whether this new package should be a child of
+System or a child of Ada. It was decided that this package should be a child
+of Ada, because the package does not provide any implementation-defined
+definitions, and provides a portable way to access operating system
+facilities similar to Ada.Directories.
 
 !example
 
-with System.Locale;
+package Canadian_Point_Of_Sale_System is
+   type Dollars is delta 0.01 digits 8 range 0.0 .. 999_999.99;
+
+   function To_String (Amount : Dollars) return String;
+end Canadian_Point_Of_Sale_System;
+
+with Ada.Locales;
 with Ada.Text_IO.Editing;
+with Ada.Text_IO;
 
-procedure P is
-   Fill, Separator, Radix : Character;
-   Currency : constant String := "$";
-   Pic : constant Ada.Text_IO.Editing.Picture
-     := Ada.Text_IO.Editing.To_Picture
-     (Pic_String => "$ZZZZ_ZZ9.99",
-      Blank_When_Zero => False);
+package body Canadian_Point_Of_Sale_System is
 
-   type Dollars is delta 0.01 digits 8 range 0.0 .. 999_999.99;
-begin
+   function To_String (Amount : Dollars) return String
+   is
+      package Canadian_Decimal_Output is new
+        Ada.Text_IO.Editing.Decimal_Output (Num => Dollars);
 
-   if System.Locale.Country = "ca" or System.Locale.Country = "can" then
+      Separator, Radix : Character;
+      use type Ada.Locales.Country_Code;
+      use type Ada.Locales.Language_Code;
+   begin
+
+      if Ada.Locales.Country /= "CA" then
+         raise Program_Error;
+      end if;
 
-      if System.Locale.Language = "en" or System.Locale.Language = "eng" then
-         Fill := ' ';
+      if Ada.Locales.Language = "eng" then
          Separator := ',';
          Radix := '.';
-      elsif System.Locale.Language = "fr" or System.Locale.Language = "fre" then
-         Fill := ' ';
+      elsif Ada.Locales.Language = "fra" then
          Separator := '.';
          Radix := ',';
+      else
+         raise Program_Error;
       end if;
-   end if;
 
-   declare
-      package Canadian_Cash is new
-        Ada.Text_IO.Editing.Decimal_Output (Num => Dollars);
-      Cost : constant Dollars := 256_778.99;
-   begin
-      Canadian_Cash.Put
-        (Item => Cost,
-         Pic  => Pic,
-         Currency => Currency,
-         Fill => Fill,
+      return Canadian_Decimal_Output.Image
+        (Item => Amount,
+         Pic => Ada.Text_IO.Editing.To_Picture
+          (Pic_String => "$ZZZZ_ZZ9.99"),
+         Currency => "$",
          Separator => Separator,
          Radix_Mark => Radix);
-   end;
 
-end P;
+   end To_String;
 
---!corrigendum 13.7.3(0)
+end Canadian_Point_Of_Sale_System;
 
+--!corrigendum A.19(0)
+
 !ACATS test
 
 ACATS C-Tests are needed to test this package.
@@ -760,6 +793,817 @@
 everything else we sell gives messages only in English.
 
 My bank's machine asks me whether I want to use English or Spanish.
+
+****************************************************************
+
+From: Brad Moore
+Date: Thursday, June 3, 2010   9:09 AM
+
+> Do not forget games. "Battle for Wesnoth" is available in 49 languages
+> (see http://www.wesnoth.org/gettext/). According to one of the main
+> developpers of the game, Jeremy Rosen ;-), Gettext is the way to go to
+> handle that many languages. But out of scope for us, I guess.
+
+Thinking back on the Canadian application I've been mentioning, it also was
+scalable to any number of languages.
+
+For that we employed a simple flat-file database, indexed by application defined
+message id strings.
+
+We used a function that looked something like this:
+
+     function Lookup
+        (Message_Id : String;
+         Locale_Id : Integer) return String;
+
+The message id hashed into a file to get the variable length translation record,
+and each translation record contained a set of translations for however many
+languages were supported in the system. For example we had two languages, so
+English might map to 0, and French would map to 1. If we wanted to add Japanese
+next, we would give it 2, and so on.
+
+We used this mechanism for all locale related differences. We even used it for
+strings returning field lengths for reports, since the spacing for reports and
+tabular display on a GUI depended on the length of the text of column headers
+and so on.
+
+Once you had taken the time to enter all the translations for all the help
+messages, GUI labels, report headers, and so on, adding the extra handful of
+translations that the OS provides (such as decimal radix point, currency symbol,
+probably even collating sequence) into this lookup database would be negligible
+compared to the work involved in doing all the other translations.
+
+Whenever we changed the translations for a new release of the application, we
+would run a utility to reindex the message file.
+
+In this process, we even incorporated an ASIS program that I had written to
+extract enumerations from the Ada source code. If a programmer changed an
+enumeration in the source code, this tool would detect whether or not the
+enumeration identifier mapped to the message identifier in the database file.
+This ensured that there was full coverage of translations for all enumeration
+values in the Ada code, as well as ordering of enumeration values matched, and
+the enumeration literal names matched the message id names in the translation
+file. The ASIS application would only be run during the reindexing process.
+
+This system worked really well, and was fast enough that lookups could be done
+on the fly as windows were being presented to the user.
+
+All this system needed was a way to figure out the current locale, which is what
+this AI is hoping to address. Everything else already was portable, and didn't
+require any changes when we ported from Unix to Windows.
+
+I had considered whether this AI should also provide a message lookup facility
+like the one I described above, but thought that would be too much. It is not
+that difficult to create a persistent hash table to implement the lookup
+function. In certain environments that have relational databases, this could be
+even easier. If the number of translations is small enough, someone could
+implement this using one of the standard containers such as
+Ada.Containers.Hashed_Maps. (Or create their own Persistent_Hashed_Maps
+container) Another alternative is to use the gettext utility that Jean-Pierre
+mentioned.
+
+So, Bob Duff's earlier comment about having to write too many if else
+statements to support too many languages doesn't seem to apply if you have
+such a message lookup facility.
+
+****************************************************************
+
+From: Tucker Taft
+Date: Thursday, June 3, 2010   9:24 AM
+
+> ...So, Bob Duff's earlier comment having to write too many if else
+> statements to support too many languages doesn't seem to apply if you
+> have such a message lookup facility.
+
+I agree that all programs that support internationalization use some kind of
+table lookup, rather than explicit "if...else" statements.  Having an easy way
+to get the current locale seems useful.
+
+I agree with the desire to make this as portable as possible, so we should
+choose one of the representations, and it sounds like you recommend the
+2-character one, which makes sense to me.  If that is true, we should probably
+use String(1..2) explicitly (or a suitably-named subtype or type) rather than
+simply String.  Having to manipulate arbitrary-length strings seems like
+unnecessary overhead if we are standardizing on 2-character locale names.
+
+****************************************************************
+
+From: Brad Moore
+Date: Thursday, June 3, 2010   9:40 AM
+
+> All this system needed was a way to figure out the current locale,
+> which is what this AI is hoping to address.
+
+But you could use the same argument about any platform-dependent
+issue:
+
+    - I once wrote a program that needed to query the virtual memory
+      page size.
+
+    - Lots of people want to write programs that spawn subprocesses.
+
+    - Just yesterday, we had an internal discussion at AdaCore,
+      where we decided we wanted a way to query the number of
+      processors on the current machine.
+
+    - Etc.
+
+Should we add portable ways to do the above in Ada?  Well, maybe.
+We're moving slowly in that direction (e.g. adding Ada.Directories).
+
+But why is querying the current locale more important than any other
+OS-dependent thing?  If you don't have it, it's no big deal -- you write a
+package with multiple bodies (one for windows, one for unix, ...).
+
+> So, Bob Duff's earlier comment having to write too many if else
+> statements to support too many languages doesn't seem to apply if you
+> have such a message lookup facility.
+
+Well, sure, you've moved the work.  It seems like the bulk of the work for the
+kind of project you described is in implementing the database and the ASIS tool,
+and hiring a native French (or whatever) speaker to translate the messages.
+Writing the query-locale primitive seems trivial by comparison.
+
+I'm not strongly opposed to this AI -- I just can't get too excited about having
+this feature, and of course any feature, even a small one like this, has a cost.
+But as I've admitted several times, I am biased by having spent my whole life
+writing English-only software.
+
+****************************************************************
+
+From: Tucker Taft
+Date: Thursday, June 3, 2010  10:07 AM
+
+> ... But why is querying the current locale more important than any
+> other OS-dependent thing?  If you don't have it, it's no big deal --
+> you write a package with multiple bodies (one for windows, one for
+> unix, ...)
+
+The argument for this kind of package is that without it, you can't easily share
+anything built on top of it.  So if someone builds a nice higher-level
+internationalization capability, each such capability needs to invent its own
+way to get the locale.  If you want to use two of these, say one that does nice
+error messages and one that handles currency well, you can't easily mix and
+match.
+
+As software becomes more globalized, this kind of thing seems increasingly
+important for an internationally-standardized language. Admittedly it is just a
+start, but if it can be made even simpler (e.g. by eliminating the three or four
+different formats), then it puts a useful "stake" in the ground for further
+portable packages to be built upon.
+
+If we can just agree on the package spec, we don't really care so much on
+whether the package is implemented by the vendor, because as we know the
+implementation of such a package is generally trivial on most existing O/Ss.  It
+is the common spec that provides the real value.
+
+****************************************************************
+
+From: Bob Duff
+Date: Thursday, June 3, 2010   4:01 PM
+
+...
+> But you could use the same argument about any platform-dependent
+> issue:
+>
+>     - I once wrote a program that needed to query the virtual memory
+>       page size.
+
+Me too. But I don't think this is a common need (and when you do it, the result
+isn't portable even to different versions of the same OS - I ended up tuning
+each such program to the machine that I intended to run it on).
+
+>     - Lots of people want to write programs that spawn subprocesses.
+
+We've tried previously to standardize that, but we couldn't even figure out a
+way to describe it portably. If anyone has a good idea, I think we would surely
+consider it again.
+
+>     - Just yesterday, we had an internal discussion at AdaCore,
+>       where we decided we wanted a way to query the number of
+>       processors on the current machine.
+
+That's part of the CPU proposal, of course, which is now in AI05-0171-1.
+Specifically, function Number_Of_CPUs. Should be part of Ada 2012.
+
+>     - Etc.
+
+Hard to comment on that.
+
+> Should we add portable ways to do the above in Ada?  Well, maybe.
+> We're moving slowly in that direction (e.g. adding Ada.Directories).
+
+The answer is yes, but it is hard enough in some cases (spawn) that we haven't
+done it.
+
+> But why is querying the current locale more important than any other
+> OS-dependent thing?  If you don't have it, it's no big deal -- you
+> write a package with multiple bodies (one for windows, one for unix,
+> ...).
+...
+> I'm not strongly opposed to this AI -- I just can't get too excited
+> about having this feature, and of course any feature, even a small one
+> like this, has a cost.  But as I've admitted several times, I am
+> biased by having spent my whole life writing English-only software.
+
+Well, for me, I think that all software should be written in English and people
+that don't know English should keep using their abacuses. :-) But I don't expect
+to get much support for *that* position.
+
+And this seems pretty trivial. The main issue is whether to insist on strings of
+a particular length or just make it String. Based on the various comments, I
+thought we had decided on String (because the most recent standards in this
+area use 3 character and longer strings sometimes), just to keep the future
+flexibility available. (If we insist on 2-character strings, what happens when
+Windows or Linux implements the 3-character names from i18n??)
+
+****************************************************************
+
+From: Brad Moore
+Date: Friday, June 4, 2010   1:05 PM
+
+> I agree with the desire to make this as portable as possible, so we
+> should choose one of the representations, and it sounds like you
+> recommend the 2-character one, which makes sense to me.  If that is
+> true, we should probably use String(1..2) explicitly (or a
+> suitably-named subtype or type) rather than simply String.  Having to
+> manipulate arbitrary-length strings seems like unnecessary overhead if
+> we are standardizing on 2-character locale names.
+
+I have been giving this more thought, and think that for language names, it
+makes more sense to go with the 3-character codes defined in ISO 639-2, rather than
+the 2-character codes defined in ISO 639-1.
+
+ISO 639-2 covers all the languages in ISO 639-1, but adds quite a number of
+other languages.
+
+ISO 639-1 is intended to cover all the major languages in the world. ISO 639-2 is
+intended to cover all the languages in the world with significant bodies of
+literature, and also includes codes for language groups (although those probably
+aren't relevant to this AI). It covers most of the languages of the world.
+
+ISO 639-3 adds to the ISO 639-2 code set. It is intended to be a comprehensive
+list of all languages, including extinct, ancient, historic, and constructed
+languages. It also is a 3 character code.
+
+See http://www.loc.gov/standards/iso639-2/php/code_list.php
+
+The following is also an excellent site where you can browse ISO 639-1, -2, and -3 codes:
+
+http://www.sil.org/iso639-3/codes.asp
+
+A quote from ISO's FAQ on the ISO 639 site.
+
+Q => "Why do some languages have both ISO 639-1 and 639-2 codes associated with
+them while others have only ISO 639-2 codes?"
+
+A => "  ...  However, because of the inadequacy of the alpha-two codes to
+represent all of the languages in the world (it can only accommodate 676 codes)
+and to assure backwards compatibility with existing usage compliant with RFC
+4646 (and its predecessors), new language codes may be considered for inclusion
+in both parts or in ISO 639-2 only."
+
+Assuming that we decide to go with ISO 639-2, The question then becomes should
+we use the T codes or the B codes of ISO 639-2?
+
+The B codes are the codes that match the English pronunciation of the language
+name, whereas the T codes match the native name of the language.
+
+e.g. For French, the B code is fre while the T code is fra.
+     For German, the B code is ger while the T code is deu.
+
+Most languages only have one code in ISO 639-2. For those languages you'd get
+the same code for both.
+
+The Wikipedia site below suggests that the T codes are generally preferred,
+http://en.wikipedia.org/wiki/ISO_639-2
+
+I suggest we go with ISO 639-2/T.
+
+For country names defined in ISO 3166-1, the set of countries of alpha-2 codes
+is identical to the set of countries with alpha-3 codes. In that case, there is
+not much reason for recommending the alpha-2 codes vs the alpha-3 codes. Alpha-2
+codes are used in domain name suffixes. Alpha-3 codes are used in places such as
+passport identification. They are a bit more readable than the 2 character
+codes. To be consistent, I suppose I could argue that we should use 3 character
+codes since we would want to use 3 character codes for language names.
+
+A further note of discussion.
+For locales, Microsoft uses its own concept called Locale Id.
+Microsoft defines a locale as either a language, or a language combined with a
+country.
+
+A Windows locale id is a 16 bit code.
+
+See
+http://www.science.co.il/language/Locale-Codes.asp?s=decimal
+
+Which shows a mapping from locale id to ISO 3166-1 country name (but not ISO 639
+language name).
+
+I believe ISO 639-2 would cover all the languages supported by Microsoft (and
+much more).
+
+I had a quick scan through the list and counted roughly 65 languages supported
+by Microsoft, some with multiple variants based on country, e.g. English (United
+States) and English (Canada).
+
+(You'd think we speak the same language, but we say our "Z"'s differently. Not
+to mention we also tend to favour British spellings on things, but sometimes
+prefer the American spelling just to keep things confusing)
+
+Microsoft's approach with locale id suggests to me that there is even more
+reason for providing a portable means in Ada to get the locale. General purpose
+translation lookup facilities such as mentioned in the previous email, would
+benefit from having a portable way to get locale names (language and country) on
+a Windows platform.
+
+An implementation of this AI on Windows could do the translation from Windows
+locale id into ISO 639-2 and ISO 3166-1, which I don't think would be hard to
+do. It should be a simple mapping.
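+
+As a rough illustration of such a mapping, the sketch below translates a few
+Windows locale ids into ISO 639-2/T and ISO 3166-1 alpha-3 codes. The names
+Locale_Id, Locale_Names and To_Names are hypothetical, only four well-known
+locale ids are covered, and the "und"/"ZZZ" fallback is a placeholder assumption
+rather than anything decided in this discussion.
+
+```ada
+with Ada.Text_IO; use Ada.Text_IO;
+
+procedure Lcid_Mapping_Sketch is
+   --  A Windows locale id is a 16 bit code.
+   type Locale_Id is mod 2 ** 16;
+
+   subtype Language_Code is String (1 .. 3);  --  ISO 639-2/T
+   subtype Country_Code  is String (1 .. 3);  --  ISO 3166-1 alpha-3
+
+   type Locale_Names is record
+      Language : Language_Code;
+      Country  : Country_Code;
+   end record;
+
+   --  Hypothetical lookup covering only four locale ids; a real
+   --  implementation would need the complete table.
+   function To_Names (Id : Locale_Id) return Locale_Names is
+   begin
+      case Id is
+         when 16#0407# => return ("deu", "DEU");  --  German (Germany)
+         when 16#0409# => return ("eng", "USA");  --  English (United States)
+         when 16#040C# => return ("fra", "FRA");  --  French (France)
+         when 16#1009# => return ("eng", "CAN");  --  English (Canada)
+         when others   => return ("und", "ZZZ");  --  placeholder fallback
+      end case;
+   end To_Names;
+
+   Names : constant Locale_Names := To_Names (16#1009#);
+begin
+   Put_Line (Names.Language & "_" & Names.Country);
+end Lcid_Mapping_Sketch;
+```
+
+A case statement keeps the sketch short; a full table would more likely be a
+constant array or generated data.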
+
+Assuming we decide to have the package return 3 character subtypes, rather than
+string,
+
+Which would be preferred?
+
+1)
+   type Language_Code is array (1 .. 3) of Character range 'a' .. 'z';
+   type Country_Code is array (1 .. 3) of Character range 'a' .. 'z';
+
+(or leave off the constraint. The ISO standards recommend lower case,  and the
+codes are case insensitive)
+
+2)
+   type Language_Code is new String (1 .. 3);
+   type Country_Code is new String (1 .. 3);
+
+3)
+   subtype Language_Code is String (1 .. 3);
+   subtype Country_Code is String (1 .. 3);
+
+
+4) Have the functions return
+        String (1 .. 3)
+
+Any other suggestions?
+
+I'm leaning toward either 1) or 2).
+
+****************************************************************
+
+From: Tucker Taft
+Date: Friday, June 4, 2010   2:20 PM
+
+All of this makes sense.  As far as T vs. B, what do most operating systems
+provide? If they only provide B, then we should go with that.  If they provide
+both, then "T" seems like the way to go.  If some provide only "T" and some
+provide only "B", then we would have to say it is implementation defined whether
+the "T" or "B" version is returned, and the user would have to have both as keys
+in their mapping from locale ID to message contents.
+
+Having to worry about both upper and lower case is a pain.  I would go with
+upper case only if we want to save people the trouble of doing case-insensitive
+lookups, since they seem to be used in upper case in many contexts, and Ada
+tends to favor all upper case for things like Enum'Image.
+
+By the way, are the characters used in the 3-character code guaranteed to be
+Latin-1, or do we need to use Wide_Character for the character codes?
+
+And I agree with making them a distinct type, if they are restricted to being
+exactly three characters.
+
+****************************************************************
+
+From: Brad Moore
+Date: Friday, June 4, 2010   5:50 PM
+
+In Mac OS X, locales are identified as per BCP 47 (RFC 4646)
+http://www.rfc-editor.org/rfc/bcp/bcp47.txt
+
+Locales in POSIX are identified by system environment variables such as the
+LANG environment variable.
+
+From Wikipedia,
+"On Unix, Linux and other POSIX-type platforms, locale identifiers are defined
+similar to the BCP 47 definition of language tags, but the locale variant
+modifier is defined differently, and the character set is included as a part of
+the identifier. It is defined in this format:
+[language[_territory][.codeset][@modifier]]
+
+(For example, Australian English using the UTF-8 encoding is
+en_AU.UTF-8.) "
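+
+A minimal sketch of pulling the language and territory out of an identifier in
+the format quoted above. The procedure names are hypothetical, the sample
+strings are just the examples mentioned in this thread, and no validation of the
+codes is attempted.
+
+```ada
+with Ada.Text_IO;       use Ada.Text_IO;
+with Ada.Strings.Fixed; use Ada.Strings.Fixed;
+with Ada.Strings.Maps;  use Ada.Strings.Maps;
+
+procedure Posix_Locale_Sketch is
+
+   --  Hypothetical helper: prints the language and territory parts of
+   --  an identifier of the form language[_territory][.codeset][@modifier].
+   procedure Show (Id : String) is
+      --  Position of the first '.' or '@', or 0 if neither occurs.
+      Stop : constant Natural := Index (Id, To_Set (".@"));
+      Last : Natural := Id'Last;
+   begin
+      if Stop /= 0 then
+         Last := Stop - 1;  --  drop the codeset and modifier
+      end if;
+
+      declare
+         Main  : constant String  := Id (Id'First .. Last);
+         Under : constant Natural := Index (Main, "_");
+      begin
+         if Under = 0 then
+            Put_Line ("language=" & Main & " territory=(none)");
+         else
+            Put_Line ("language=" & Main (Main'First .. Under - 1)
+                      & " territory=" & Main (Under + 1 .. Main'Last));
+         end if;
+      end;
+   end Show;
+
+begin
+   Show ("en_AU.UTF-8");
+   Show ("en_CA.utf8");
+   Show ("fr");
+end Posix_Locale_Sketch;
+```
+
+Note that the language part obtained this way may be an alpha-2 code, so a
+further mapping would still be needed if alpha-3 codes are to be returned.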
+
+BCP 47 (RFC 4646) identifies a format that starts with an ISO 639 code (either
+alpha-2 or alpha-3) followed by other optional parts separated by hyphens. The
+next optional part is the extended language tag, which is up to 3 alpha-3 codes
+(separated by hyphens) from ISO 639.
+
+This mostly is not used, but some locales do use this extended language tag.
+
+Examples of language tags including extlang subtags are:
+
+    * zh-yue (Cantonese Chinese)
+    * ar-afb (Gulf Arabic)
+
+With this possibility, I would say we need to go back to language codes as being
+defined as variable length string types.
+
+Following this is a 4 character ISO 15924 code identifying the script associated
+with the language.
+
+Then comes the region identifier, which is either an alpha-2 or numeric-3 ISO
+3166-1 code.
+
+Note that alpha-3 is not used here to identify the country. I believe this is
+how the syntax differentiates between optional extlang ISO 639 language codes
+and the region code. If it's alpha-3 it's an ISO 639 code, otherwise, it's an
+ISO 3166-1 code. This suggests that we should be using 3166-1 alpha-2 for the
+country codes.
+
+If we see a numeric-3 code, the implementation could convert that to an alpha-2
+code. Or, alternatively we could say that the Country code function can return
+either a 2 or 3 character field.
+
+Some Key excerpts from BCP 47...
+
+The RFC states "the language tags described in this document are sequences of
+characters from the US-ASCII [ISO646] (7 bit ASCII) repertoire."
+
+This answers your question about whether we need to worry about wide characters
+and so on. The answer is no.
+
+"At all times, language tags and their subtags, including private use
+   and extensions, are to be treated as case insensitive:"
+
+"This format generally corresponds to
+   the common conventions for the various ISO standards from which the
+   subtags are derived.
+
+   These conventions include:
+
+   o  [ISO639-1] recommends that language codes be written in lowercase
+      ('mn' Mongolian).
+
+   o  [ISO15924] recommends that script codes use lowercase with the
+      initial letter capitalized ('Cyrl' Cyrillic).
+
+   o  [ISO3166-1] recommends that country codes be capitalized ('MN'
+      Mongolia).
+"
+
+"When languages have both an ISO 639-1 two-character code and a three-
+   character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5), only
+   the ISO 639-1 two-character code is defined in the IANA registry.
+"
+
+This suggests ISO 639-1 (alpha-2) is used when it can be, otherwise use ISO
+639-2 (or higher) to return an alpha-3 code.
+
+On my Linux machine, the LANG variable is set to "en_CA.utf8".  -- utf8
+identifies the codeset (character encoding), not the script.
+
+"When a language has no ISO 639-1 two-character code and the ISO
+   639-2/T (Terminology) code and the ISO 639-2/B (Bibliographic) code
+   for that language differ, only the Terminology code is defined in the
+   IANA registry."
+
+This suggests the answer to your question about "T" vs "B" is that they tend to
+use "T".
+
+Users of the Ada.Locale.Language function would write their applications
+according to the spec we provide.
+
+Based on all this, if we wanted to define as precise a type as possible to
+describe the return values in Ada, we might end up with something like:
+
+   ISO_639_Max_Length : constant := 3;
+   subtype ISO_639_Index is
+     Positive range 1 .. 4 * ISO_639_Max_Length;
+
+   type Language_Code is
+      array (ISO_639_Index range <>) of Character range 'a' .. 'z';
+   -- Add an Ada 2012 invariant that says
+   -- Language_Code'Length > 1 and
+   -- (Language_Code'Length mod 3 = 0 or Language_Code'Length mod 3 = 2)
+   -- This allows for extended language subtags
+
+   type Country_Code is array (1 .. 2) of Character range 'A' .. 'Z';
+
+Though the ISO standards are case insensitive, we could force return values from
+our package to be only upper or lower case.
+
+Or we could eliminate the constraint and use simpler string types, even though
+we would always return upper or lower case.
+
+****************************************************************
+
+From: Brad Moore
+Date: Friday, June 4, 2010   6:57 PM
+
+- More on extended language subtags from BCP 47.
+
+"  Although the ABNF production 'extlang' permits up to three
+       extended language tags in the language tag, extended language
+       subtags MUST NOT include another extended language subtag in
+       their 'Prefix'.  That is, the second and third extended language
+       subtag positions in a language tag are permanently reserved and
+       tags that include those subtags in that position are, and will
+       always remain, invalid.
+
+   For example, the macrolanguage Chinese ('zh') encompasses a number of
+   languages.  For compatibility reasons, each of these languages has
+   both a primary and extended language subtag in the registry.  A few
+   selected examples of these include Gan Chinese ('gan'), Cantonese
+   Chinese ('yue'), and Mandarin Chinese ('cmn').  Each is encompassed
+   by the macrolanguage 'zh' (Chinese).  Therefore, they each have the
+   prefix "zh" in their registry records.  Thus, Gan Chinese is
+   represented with tags beginning "zh-gan" or "gan", Cantonese with
+   tags beginning either "yue" or "zh-yue", and Mandarin Chinese with
+   "zh-cmn" or "cmn".  The language subtag 'zh' can still be used
+   without an extended language subtag to label a resource as some
+   unspecified variety of Chinese, while the primary language subtag
+   ('gan', 'yue', 'cmn') is preferred to using the extended language
+   form ("zh-gan", "zh-yue", "zh-cmn")."
+
+This suggests we might be able to stick with just returning a single
+alpha-2 or alpha-3 code for Language. If the locale has an extended subtag,
+return that instead of the primary language subtag.
+
+- Regarding numeric region (country) codes.
+
+The numeric codes identify macro-geographic (continental) or sub regions. If the
+region has an ISO 3166-1 code defined for it, that is what must be registered.
+The numeric code is only used for bigger regions larger than a country.
+
+To give an idea of what these numeric codes are;
+
+I found the list of numeric regions at
+
+http://rishida.net/utils/subtags/index.php?list=7&submit=List
+
+001 World
+002 Africa
+005 South America
+009 Oceania
+011 Western Africa
+013 Central America
+014 Eastern Africa
+015 Northern Africa
+017 Middle Africa
+018 Southern Africa
+019 Americas
+021 Northern America
+029 Caribbean
+030 Eastern Asia
+034 Southern Asia
+035 South-Eastern Asia
+039 Southern Europe
+053 Australia and New Zealand
+054 Melanesia
+057 Micronesia
+061 Polynesia
+142 Asia
+143 Central Asia
+145 Western Asia
+150 Europe
+151 Eastern Europe
+154 Northern Europe
+155 Western Europe
+419 Latin America and the Caribbean
+
+The key thing is, the rules in BCP 47 are set up in such a way that for each
+locale there is only one way to define it according to the IANA registries. For
+a given locale, it basically uses the shortest codes defined in ISO 3166-1 and
+ISO 639. If we just return this minimal code then clients shouldn't have to
+worry about checking for all the variants between alpha-2, alpha-3, and
+numeric-3.
+
+So my once again revised view of return types now is that Country and Language
+would both return a single alpha-2 or alpha-3 code that uniquely identifies the
+locale.
+
+****************************************************************
+
+From: Tucker Taft
+Date: Saturday, June 5, 2010   10:42 AM
+
+> ... So my once again revised view of return types now is that Country
+> and Language would both return a single alpha-2 or
+> alpha-3 code that uniquely identifies the locale.
+
+TMI!
+
+We lose portability if we try to accommodate everything.  I think we should pick
+one, and have the body map to that, presuming that is possible.  Why allow both
+alpha-2 and alpha-3?  How is the "portable" program supposed to deal with that?
+
+****************************************************************
+
+From: Brad Moore
+Date: Saturday, June 5, 2010   12:53 PM
+
+> TMI!
+
+Sorry for the data overload. Once I started poking around in BCP 47, I ran into
+issues I hadn't considered, and was getting quite frazzled with the approach of
+having a package implementation that blindly returned whatever the OS gave us.
+
+> We lose portability if we try to accommodate everything.  I think we
+> should pick one, and have the body map to that, presuming that is
+> possible.  Why allow both alpha-2 and alpha-3?  How is the "portable"
+> program supposed to deal with that?
+
+Yes! If we have the implementation map the OS string to alpha-3, I think it
+makes things a whole lot simpler for us in the long run. I was worried that we
+wanted to allow returning the OS string value. While that might be trivial to
+implement as it doesn't require any mapping, you then have to deal with
+specifying what actually is returned, including special IANA rules regarding
+whether codes are registered or not, and how they are registered, and so on.
+
+As you mention, then clients also have a harder time trying to figure out
+whether to expect alpha-2 or alpha-3, or numeric-3.
+
+The BCP 47 description of macro-geographic regions also threw me for a loop,
+and I was worried about how to deal with that.
+
+Microsoft doesn't have locales based on macro-geographic regions; it's either a
+language, or a language specialized by a country.
+
+I suspect macro-geographic regions are seldom used, if at all.
+Since those aren't countries, if that's what the OS gives us, I think it would
+be fair to return Country_Unknown for those cases, if they exist. Likely it is
+the Language that is important, not the country/region name in those cases.
+
+So it seems clear to me that Language should return a 3 character ISO 639-2/T
+code.
+
+It is less clear whether we should go with alpha-2 or alpha-3 ISO 3166-1 codes
+for Country. We should pick one or the other and stick with it. Alpha-2 might
+require less implementation work for POSIX/OSX since BCP 47 does not allow
+alpha-3 country names. We could use a simple parsing of the OS value in that
+case. Windows would require a mapping either way.
+
+On the other hand, if we are using alpha-3 for Language, it might make sense to
+use alpha-3 for country also. Alpha-3 is generally more readable than alpha-2,
+and does not have the code space limitations of the 2 character coding scheme.
+Based on this, I think it makes sense to go with alpha-3 for Country name.
+
+I think we have converged a lot closer to a solution than the last writeup of
+AI-0127. I'm thinking I should submit an updated revision so people can get a
+better understanding of where we are at.
+
+****************************************************************
+
+From: Tucker Taft
+Date: Saturday, June 5, 2010   1:19 PM
+
+I would follow POSIX if it has
+already standardized this to some
+degree.  And yes, we need a simplified
+version of this incorporating your
+latest (and hopefully final ;-) thinking on this!
+
+****************************************************************
+
+From: Brad Moore
+Date: Saturday, June 5, 2010   2:44 PM
+
+OK, based on that I will go with alpha-2 for Country codes.
+
+One more question.
+
+I am thinking of going with;
+
+   type Language_Code is array (1 .. 3) of Character range 'a' .. 'z';
+   type Country_Code is array (1 .. 2) of Character range 'A' .. 'Z';
+
+rather than;
+
+   type Language_Code is new String (1 .. 3);
+   type Country_Code is new String (1 .. 2);
+
+since it is a more precise definition, and better portrays to the user what they
+can expect as a return value.
+
+That raises the question of how to define Country_Unknown and Language_Unknown,
+since defining as a 2 or 3 spaces code would not match this definition.
+
+For country codes, ISO 3166-1 does define some reserved codes in two categories;
+   - reserved codes
+   - user defined codes.
+
+Reserved codes are codes that have become obsolete. The ISO 3166/MA, when
+justified, reserves these codes which it undertakes not to use for other than
+specified purposes during a limited or indeterminate period of time.
+
+User-assigned code elements are codes at the disposal of users who need to add
+further names of countries, territories, or other geographical entities to their
+in-house application of ISO 3166-1, and the ISO 3166/MA will never use these
+codes in the updating process of the standard. The following codes can be
+user-assigned:
+
+    * Alpha-2: AA, QM to QZ, XA to XZ, and ZZ
+    * Alpha-3: AAA to AAZ, QMA to QZZ, XAA to XZZ, and ZZA to ZZZ
+
+Of these two categories, if we wanted to define a constant for Country_Unknown,
+I think we would want to select a code from the user-assigned code group.
+
+According to Wikipedia, one such user-assigned coding is by the Unicode Common
+Locale Data Repository, which assigns ZZ to represent "Unknown or Invalid
+Territory"
+
+Assuming this is the way we want to go,
+I would propose we use "ZZ" also for that purpose in the definition of
+Country_Unknown.
+
+I like the idea of using a value defined within the standard rather than
+defining our own constant such as "  ". We can then say that the values returned
+by the Country function are always ISO 3166-1 codes.
+
+For language codes, ISO-639 defines "und" (for undetermined) which is used in
+situations in which a language or languages must be indicated but the language
+cannot be identified.
+
+The Language_Unknown constant should be set as that code.
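+
+Put together, the types and constants discussed above could look roughly like
+the following sketch. It is a stand-alone procedure rather than the proposed
+package, and the function bodies are stand-ins that always report an unknown
+locale; a real body would query the operating system.
+
+```ada
+with Ada.Text_IO; use Ada.Text_IO;
+
+procedure Unknown_Codes_Sketch is
+   type Language_Code is array (1 .. 3) of Character range 'a' .. 'z';
+   type Country_Code  is array (1 .. 2) of Character range 'A' .. 'Z';
+
+   Language_Unknown : constant Language_Code := "und";  --  ISO 639-2
+   Country_Unknown  : constant Country_Code  := "ZZ";   --  user-assigned
+
+   --  Stand-ins for the proposed query functions (assumption: nothing
+   --  could be determined from the environment).
+   function Language return Language_Code is
+   begin
+      return Language_Unknown;
+   end Language;
+
+   function Country return Country_Code is
+   begin
+      return Country_Unknown;
+   end Country;
+begin
+   if Language = Language_Unknown and then Country = Country_Unknown then
+      Put_Line ("locale could not be determined");
+   end if;
+end Unknown_Codes_Sketch;
+```
+
+One consequence of the constrained component subtype is that these types
+cannot be converted directly to String, since array conversions require
+statically matching component subtypes; the sketch only compares values.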
+
+****************************************************************
+
+From: Tucker Taft
+Date: Saturday, June 5, 2010   4:12 PM
+
+XX and xxx would seem to be natural choices.
+Defining the arrays as limited to the specified range of characters makes sense,
+since just looking at the spec will eliminate a lot of questions.
+
+****************************************************************
+
+From: Brad Moore
+Date: Saturday, June 5, 2010   5:11 PM
+
+The reserved XXX code for alpha-3 only applies to Country codes and ISO 3166-1.
+It is not reserved as far as I know in ISO 639.
+
+ISO 639 specifically defines "und" which is not a special reserved code or user
+assigned code, it is a regular code just like all the others. We should
+definitely be using "und" for Language_Unknown.
+
+Since we wouldn't use xxx for language, there is no benefit for a matching code
+for country. Since ZZ is the last user defined Country code, and because it
+already has uses for the purpose of representing an unknown country, I think we
+should use the ZZ code.
+
+****************************************************************
+
+From: Brad Moore
+Date: Saturday, June 5, 2010   3:58 PM
+
+One other question.
+
+Does it really make sense for this package to be a child of System?
+
+System packages seem to be packages that define things such as implementation
+defined constants, such as the Storage_Element definition.
+
+This locale package does not have any implementation defined definitions, and
+feels to me more like a portable library such as Ada.Directories.
+
+I think it should be a child of Ada, rather than a child of System.
+
+****************************************************************
+
+From: Tucker Taft
+Date: Saturday, June 5, 2010   4:13 PM
+
+Agreed.  Make it a child of Ada,
+or conceivably a child of Interfaces.
+
+****************************************************************
+
+From: Brad Moore
+Date: Saturday, June 5, 2010   5:55 PM
+
+I have an updated version attached that [Version /02 - Editor.]
+  - eliminates implementation defined constants
+  - eliminates implementation defined return values
+  - Limits the codes returned to always be 3 characters for language
+    codes as defined by ISO 639-2/T and 2 characters for country codes
+    as defined by ISO 3166-1.
+  - Changes the types to specify lower case constraints for language
+    codes and upper case constraints for country codes.
+  - Moves the package from a child of System to a child of Ada.
+
+****************************************************************
+
+From: Randy Brukardt
+Date: Saturday, June 12, 2010   9:48 PM
+
+A couple of questions for the next version (*AFTER* the meeting):
+
+The package name changed from Locale to Locales in this version. Was that
+intended?
+
+The package requires ISO 639-2/T names. Why are the other parts of that standard
+referenced in the "Normative References" section? If we're not using them, they
+ought not be there. (BTW, I put those references in the required numeric order.)
 
 ****************************************************************
 
