CVS difference for ais/ai-00285.txt

Differences between versions 1.6 and 1.7

--- ais/ai-00285.txt	2003/01/15 00:33:52	1.6
+++ ais/ai-00285.txt	2003/05/24 00:51:38	1.7
@@ -2130,3 +2130,716 @@
 
 *************************************************************
 
+From: Robert I. Eachus
+Sent: Tuesday, March 18, 2003  12:56 AM
+
+I hate to reopen the character set can of worms, but I think we need to
+do it.  In effect, Latin 1 is being replaced by Latin 9 (ISO 8859-15).
+Latin 9 adds the Euro sign, the OE ligatures, S and Z with caron, and
+capital Y with diaeresis to Latin 1, removing the currency symbol, broken
+bar, some accents, and the vulgar fractions.  See
+http://www.cs.tut.fi/~jkorpela/latin9.html for a fuller explanation.
+
+Latin 9 is slowly being adopted.  Of course some countries in the Euro
+zone are already using a "localized" version of Latin 1 with the
+currency sign representation looking suspiciously like a Euro symbol.
+So we could decide to leave this issue to Ada-1Z or whatever.
+
+However, I think that at the least we should add a Latin_9 package to Ada
+with the correct character names.  What else should or could be done?
+
+One possibility would be to redefine Ada.Characters.Handling to
+correctly treat seven new codes as lower or upper case characters.  I
+would much prefer to go the whole nine yards so we never need to do
+this again.  Add an enumeration type Sets to Ada.Characters, or if you
+prefer Character_Sets.  It should enumerate all the ISO 8859 character
+sets.  (If you want to be clever, we could start with ISO 646 so that
+Sets'Pos(N) = ISO 8859-N.)  In any case we should allow implementations
+to extend the type.  This would allow both for new ISO 8859 character
+sets, and for Unicode, EBCDIC, IBM code pages, and so on.
+
+Now add a procedure Set_Default_Character_Set and a function
+Current_Character_Set to Ada.Characters.Handling.  (Or, if you prefer,
+to Ada.Characters.)  As far as I am concerned the only required behavior
+for Set_Default_Character_Set should be to accept an argument of Latin_1.
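+
+To make this concrete, here is a rough sketch of the proposed
+declarations.  All names are hypothetical and the enumeration is
+deliberately abbreviated; only the Latin_1 behavior would be required:
+
+    package Character_Set_Support is
+       --  Sketch of the proposed additions to Ada.Characters.
+       type Character_Sets is
+         (ISO_646,          --  first, so Character_Sets'Pos (S) gives
+                            --  the ISO 8859 part number of S
+          Latin_1, Latin_2, Latin_3, Latin_4,
+          Latin_Cyrillic, Latin_Arabic, Latin_Greek, Latin_Hebrew,
+          Latin_5, Latin_6);  --  ... and so on through the 8859 parts
+
+       function Current_Character_Set return Character_Sets;
+
+       --  Only the call with Latin_1 need be accepted; any other
+       --  argument may raise an exception.
+       procedure Set_Default_Character_Set (Set : in Character_Sets);
+    end Character_Set_Support;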
+
+It is probably a day or two of work to modify the functions in
+Ada.Characters.Handling to support all the current ISO 8859 mappings.  It
+is at least ten times harder to actually test all possible combinations
+of character set and Ada.Characters.Handling functions.  It could be
+another five to ten times that to add tests to the validation suite,
+with very little practical effect.  This is why I favor a minimalist
+approach to the requirements.  (National bodies can of course require
+support for other values of Character_Sets.  For example, the Japanese
+national body could require Shift-JIS support if they felt like it,
+without compilers that comply with the Japanese national standard
+becoming incompatible with ISO 8652, and without the ARG spending all of
+its time on character set issues.)
+
+What about names in Ada programs?  ARGH!  If your compiler is written in
+Ada and uses Ada.Characters.Handling, modifying the compiler is not a
+problem.  Defining what it means to compile a program written using a
+non-Latin-1 character set threatens to expand clause 2 (Lexical
+Elements) to the size of a small telephone directory.  I would prefer to
+just modify 2.1 to direct people to ISO 10646-1, which is the size of a
+large telephone directory plus, currently, five amendments, for the
+meaning of lexical elements in non-Latin-1 source representations, and
+let national bodies decide what they want to define locally.
+
+*************************************************************
+
+From: Pascal Leroy
+Sent: Tuesday, March 18, 2003  2:15 AM
+
+> I hate to reopen the character set can of worms, but I think we need to
+> do it.  In effect, Latin 1 is being replaced by Latin 9 (ISO 8859-15).
+> Latin 9 adds the Euro sign, the OE ligatures, S and Z with caron, and
+> capital Y with diaeresis to Latin 1, removing the currency symbol, broken
+> bar, some accents, and the vulgar fractions.  See
+> http://www.cs.tut.fi/~jkorpela/latin9.html for a fuller explanation.
+
+This issue was discussed at some length as part of AI 285/01 (of which I am
+the editor).  It is clear that adding support for Latin-9 in Ada.Characters
+(and children) is relatively straightforward.  However, there is the much
+nastier question of type Standard.Character (which pretty much has to
+remain Latin-1 if you don't want to introduce awful incompatibilities) and
+of the interactions between what happens at compile-time and what happens at
+run-time.  Consider for instance the call:
+
+    Ada.Characters.Latin_9.Handling.Is_Letter ('Š')
+
+It pretty much has to return True (that's an S-caron in Latin-9), but that's
+certainly surprising!  This amounts to breaking the Character abstraction
+and interpreting characters as bytes/code points, which is likely to lead to
+confusion in an Ada program that deals with character sets having
+different encodings.
+
+Another interesting example is mentioned in the minutes of the Bedford
+meeting (http://www.ada-auth.org/ai-files/minutes/min-0210.html#AI285):
+"Consider the enumeration identifier "˜" (latin small letter y diaeresis).
+E'Image(˜) = "˜" in Latin-1 (there is no upper case version), but "Y" in
+Latin-9 (there is an upper case version). So we would need the identifier
+semantics to be changed depending on the character set. Pascal claims that
+this is important to reading French."
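+
+To spell the example out (illustrative only; it assumes a compiler that
+accepts ÿ in identifiers, which Ada 95 allows since ÿ is a lower case
+letter in Latin-1):
+
+    type T is (ÿ);   --  enumeration literal containing y diaeresis
+    Name : constant String := T'Image (ÿ);
+    --  Under Latin-1 rules: Name = "ÿ" (code point 255 has no upper
+    --  case partner, so the literal is left as is).
+    --  Under Latin-9 rules: Name = "Ÿ" (code point 190 is its upper
+    --  case form), so the image of the very same literal changes.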
+
+After giving it more thought, I have come to the conclusion that the entire
+Latin-9 approach is misguided because:
+
+1 - There is relatively little support in software out there for this
+encoding (heck, I am even reading that some mail gateways bounce back
+messages that use Latin-9 as their character encoding).  Most of the editors
+that I have played with just go to Unicode when you type the Euro sign.
+That provides support for this new character without causing endless
+compatibility nightmares.
+
+2 - I have gone through a similar "code point shuffle" mess at the
+beginning of the 80s: at the time we only had 7 bits per character (as you
+probably remember, the 8th bit was often used for parity) and some genius
+had the idea of encoding the French accented characters using the code points
+normally assigned to [, ], \, and the like.  I have written thousands of
+lines of Pascal where an array indexing looked like Arr°I§ (instead of
+Arr[I]) just because of this silliness. What was painful-but-tolerable 20
+years ago is just not going to fly nowadays: I am ready to bet that the
+world will go Unicode before it goes Latin-9.
+
+Therefore, the latest version of AI 285 proposes to go to Unicode for the
+text representation of programs, relying on the categorization work done by
+the Unicode people so that we don't have to argue endlessly about which
+characters can appear in identifiers, etc.  And it entirely ignores Latin-9,
+or any other Latin-N for that matter.
+
+*************************************************************
+
+From: Robert I. Eachus
+Sent: Tuesday, March 18, 2003  2:15 AM
+
+Pascal Leroy wrote:
+
+> This issue was discussed at some length as part of AI 285/01 (of which I am
+> the editor).  It is clear that adding support for Latin-9 in Ada.Characters
+> (and children) is relatively straightforward.  However there is the much
+> nastier question of type Standard.Character, (which has pretty much to
+> remain Latin-1 if you don't want to introduce awful incompatibilities) and
+> of the interactions between what happens at compile-time and what happens at
+> run-time.
+
+I thought we had an AI on the subject, but searching for Latin in the
+title didn't find it.  I see what happened is that the name of the AI
+was changed.  (I don't want to make work for Randy, and this may be a
+rare occurrence or it may not.  Perhaps a set of links to "old" names
+somewhere would help.)
+
+So I guess that the title of the original post is correct, because as I
+see it, the issue of Latin 9 support is completely separate from the
+issues with 16 and 32 bit character sets.
+
+Now to pull some magic by quoting from rev 1.4 of AI 285:
+
+An implementation is allowed to provide a library package named
+Ada.Characters.Latin_9.  This package shall be identical to
+Ada.Characters.Latin_1, except for the following differences:
+
+- It doesn't declare the constants Currency_Sign, Broken_Bar, Diaeresis,
+Acute, Cedilla, Fraction_One_Quarter, Fraction_One_Half, and
+Fraction_Three_Quarters.
+
+- It declares the following constants:
+
+     Euro_Sign : constant Character := '€'; -- Character'Val (164)
+     UC_S_Caron : constant Character := 'Š'; -- Character'Val (166)
+     LC_S_Caron : constant Character := 'š'; -- Character'Val (168)
+     UC_Z_Caron : constant Character := 'Ž'; -- Character'Val (180)
+     LC_Z_Caron : constant Character := 'ž'; -- Character'Val (184)
+     UC_OE_Diphthong : constant Character := 'Œ'; -- Character'Val (188)
+     LC_OE_Diphthong : constant Character := 'œ'; -- Character'Val (189)
+     UC_Y_Diaeresis : constant Character := 'Ÿ'; -- Character'Val (190)
+
+In Netscape 7.01, with the encoding set to Latin-1, this displays
+(correctly) the Latin 9 representations!  As do OpenOffice.org,
+Notepad, and so on.  Now let me excerpt the corresponding declarations
+from Ada.Characters.Latin_1:
+
+     Currency_Sign : constant Character := '¤';  -- Character'Val (164)
+     Broken_Bar    : constant Character := '¦';  -- Character'Val (166)
+     Diaeresis     : constant Character := '¨';  -- Character'Val (168)
+     Acute         : constant Character := '´';  -- Character'Val (180)
+     Cedilla       : constant Character := '¸';  -- Character'Val (184)
+     Fraction_One_Quarter : constant Character := '¼'; -- Character'Val (188)
+     Fraction_One_Half : constant Character := '½'; -- Character'Val (189)
+     Fraction_Three_Quarters : constant Character := '¾'; -- Character'Val (190)
+
+How can this work?  Easy: other standards, in particular ISO/IEC 2022
+(http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=22747),
+specify control characters and escape sequences which can be used with
+Latin 1--or any ISO 8859 character set--to access characters from other
+sets.
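+
+For the curious, a sketch of what such a designation looks like in
+practice.  This is illustrative only: the final byte for the Latin-9
+right half is my reading of the ISO-IR registry (ISO-IR 203) and should
+be double-checked:
+
+    with Ada.Characters.Latin_1;
+    package ISO_2022_Sketch is
+       --  ISO 2022 designates a 96-character set into G1 with the
+       --  sequence ESC - F, where F is the registered final byte.
+       Designate_Latin_1_G1 : constant String :=
+         Ada.Characters.Latin_1.ESC & "-A";  --  ISO-IR 100 (8859-1)
+       Designate_Latin_9_G1 : constant String :=
+         Ada.Characters.Latin_1.ESC & "-b";  --  ISO-IR 203 (8859-15), assumed
+    end ISO_2022_Sketch;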
+
+> Consider for instance the call:
+>
+>     Ada.Characters.Latin_9.Handling.Is_Letter ('Š')
+>
+> It pretty much has to return True (that's an S-caron in Latin-9), but that's
+> certainly surprising!  This amounts to breaking the Character abstraction
+> and interpreting characters as bytes/code points, which is likely to lead to
+> confusion in an Ada program that deals with character sets having
+> different encodings.
+
+Why would you expect this call to work?  You could argue that a compiler
+"should" raise Program_Error or Constraint_Error, but I would expect any
+reasonable compiler to object at compile time to an invalid character
+literal.  Remember, notationally Ada is written in the Unicode/ISO 10646
+BMP, however it is represented.  In context, that call is illegal, while
+in a Latin-9 source representation
+
+     Ada.Characters.Latin_9.Handling.Is_Letter ('Š')
+
+is legal and should return True.  (Assuming we recommend having or
+allowing a package Ada.Characters.Latin_9.Handling.)  But this
+discussion has done a lot to convince me that the best solution is to
+add a function Current_Set to Ada.Characters.  The required work is
+trivial for compilers that want to stay in the Latin 1 only world, and
+compilers that do want to implement support for other 8-bit character
+sets really have to do most of the same work anyway.
+
+To repeat my proposal:
+
+Add an enumeration type Sets to Ada.Characters, or if you prefer
+Character_Sets.  It should enumerate all the ISO 8859 character sets.
+(If you want to be clever, we could start with ISO 646 so that
+Sets'Pos(N) = ISO 8859-N.)  In any case we should allow implementations
+to extend the type.  This would allow both for new ISO 8859 character
+sets, and for Unicode, EBCDIC, IBM code pages, and so on.
+
+Now add a procedure Set_Default_Character_Set and a function
+Current_Character_Set to Ada.Characters.Handling.  (Or, if you prefer,
+to Ada.Characters.)  As far as I am concerned the only required behavior
+for Set_Default_Character_Set should be to accept an argument of Latin_1.
+
+We should probably also add a library pragma to change the default
+mapping of Character.  (Compilers will probably accept command-line
+setting of character mappings, but I think that a standard pragma would
+help standardization.)
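+
+Purely as illustration (the pragma name and argument are invented, not
+proposed wording), such a pragma might be written:
+
+    pragma Default_Character_Set (Latin_9);
+       --  hypothetical; selects the run-time mapping of Character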
+
+If my proposal is accepted:
+
+     Ada.Characters.Handling.Is_Letter ('Š')
+
+should return False when Ada.Characters.Current_Set is Latin_1, and
+
+     Ada.Characters.Handling.Is_Letter ('Š')
+
+should return True when Ada.Characters.Current_Set is Latin_9.  The
+behavior of
+
+     Ada.Characters.Handling.Is_Letter (Character'Val(166))
+
+should depend on the current value of Ada.Characters.Current_Set.  What
+happens in the other cases will at best be implementation defined.  In
+other words, if your program contains a (Unicode/BMP or UTF-8) character
+literal that is not in a supported character set, I expect
+Program_Error; if a character in a literal is not a legal literal for
+Character, it is an error, just like any other misspelled literal.
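+
+A minimal sketch of the intended behavior, written against today's
+semantics (Current_Set is the hypothetical function above):
+
+    with Ada.Characters.Handling;
+    procedure Demo is
+       C : constant Character := Character'Val (166);
+       B : Boolean;
+    begin
+       B := Ada.Characters.Handling.Is_Letter (C);
+       --  Today (Latin-1 semantics): B = False, since 166 is Broken_Bar.
+       --  Under the proposal, with Current_Set = Latin_9, the same call
+       --  would yield True, since 166 is then capital S-caron.
+    end Demo;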
+
+> Another interesting example is mentioned in the minutes of the Bedford
+> meeting (http://www.ada-auth.org/ai-files/minutes/min-0210.html#AI285):
+> "Consider the enumeration identifier "˜" (latin small letter y diaeresis).
+> E'Image(˜) = "˜" in Latin-1 (there is no upper case version), but "Y" in
+> Latin-9 (there is an upper case version). So we would need the identifier
+> semantics to be changed depending on the character set. Pascal claims that
+> this is important to reading French."
+
+Exactly why I think a way is needed for the programmer to determine
+what the actual character set mapping is.  It is almost no burden
+for compilers that support Latin 1 only, and not much additional burden
+for compilers that do support other 8-bit mappings.  (Actually, I may be
+wrong, but I think all currently validated compilers accept source in
+non-Latin-1 character sets.)
+
+> After giving it more thought, I have come to the conclusion that the entire
+> Latin-9 approach is misguided because:
+>
+> 1 - There is relatively little support in software out there for this
+> encoding (heck, I am even reading that some mail gateways bounce back
+> messages that use Latin-9 as their character encoding).  Most of the editors
+> that I have played with just go to Unicode when you type the Euro sign.
+> That provides support for this new character without causing endless
+> compatibility nightmares.
+>
+> 2 - I have gone through a similar "code point shuffle" mess at the
+> beginning of the 80s: at the time we only had 7 bits per character (as you
+> probably remember, the 8th bit was often used for parity) and some genius
+> had the idea of encoding the French accented characters using the code points
+> normally assigned to [, ], \, and the like.  I have written thousands of
+> lines of Pascal where an array indexing looked like Arr°I§ (instead of
+> Arr[I]) just because of this silliness. What was painful-but-tolerable 20
+> years ago is just not going to fly nowadays: I am ready to bet that the
+> world will go Unicode before it goes Latin-9.
+>
+> Therefore, the latest version of AI 285 proposes to go to Unicode for the
+> text representation of programs, relying on the categorization work done by
+> the Unicode people so that we don't have to argue endlessly about which
+> characters can appear in identifiers, etc.  And it entirely ignores Latin-9,
+> or any other Latin-N for that matter.
+
+Couldn't agree more.  The right solution is not to switch from Latin 1
+to any other character set as a standard, but to supply a standard
+method for localization, and to keep the assumption of the current
+Unicode/BMP for Wide_Character and for (notational) source.  Does any
+implementor see a problem implementing the above recommendation?
+
+We could also go to the extreme of adding another optional annex dealing
+with character representation issues, but I think we all agree that the
+ARG should stay away from piecemeal character set bindings.  On the
+other hand, I can see having a standard Wide_Character categorization,
+and allowing other categorizations to fall out from that.  But let's
+keep that discussion in AI-285.
+
+*************************************************************
+
+From: Randy Brukardt
+Sent: Tuesday, March 18, 2003  6:04 PM
+
+> (Actually, I may be
+> wrong but I think all currently validated compilers accept source in
+> non-Latin 1 character sets.)
+
+Since the only currently validated compilers are from Rational and DDC-I,
+that isn't saying much at all. You have to at least talk about widely-used
+compilers, but then you get into definitional problems.
+
+*************************************************************
+
+From: Robert I. Eachus
+Sent: Tuesday, March 18, 2003  6:51 PM
+
+We are in the standards business.  I think that this is an area where a
+small extension to the standard will be very helpful in providing
+portability.  But we can't really worry about the cost of conformity for
+non-standardized compilers. ;-)
+
+That is why I think that a definition which names the various character
+sets should be standardized:
+
+type Character_Sets is (ISO_646, Latin_1, Latin_2, ... Latin_Greek, ...);
+
+This would help standardize the way that non-Latin-1 character sets are
+named, for compatibility.  But I think we should stay out of the business
+of defining which characters are which for Latin_Greek, etc.  That is
+ISO/IEC JTC1/SC2's job, and I think they do it pretty well.
+
+Now suppose my proposal is accepted and, say, GNAT chooses to support
+the function Ada.Characters.Current_Set in a useful manner, but ACT
+sees no demand for Ada.Characters.Set_Default_Character_Set to do
+anything useful, and therefore raises an exception if you try to change
+the value.  (In other words, Ada.Characters.Set_Default_Character_Set
+(Ada.Characters.Current_Set) does not raise an exception, but actually
+trying to change the value does.)
+
+Some other vendor may have a customer who requires Latin_Hebrew support,
+but couldn't care less about Latin 9.  Fine.  Assigning Ada names to the
+various 8859 character sets is in our area of competence.  Deciding
+which sets compiler vendors support should be left up to their customers.
+
+Is this useful progress towards standardization?  Sure.  Is arguing over
+whether there is demand for Linear B support far outside anything
+that the ARG wants to get involved in?  Obviously.  Or worse,
+whether a variable named with the Greek Alpha should match a Latin A?
+Arggh!  (If you think that is bad, what about CJK unification?  Do we want
+to get into political cat fights about whether or not a Japanese Kanji
+code point matches a (Korean) Hangul character with a different
+appearance?  Please!  Anything but that...)
+
+That is why I think we should be in the business of defining how to
+change character sets, but should stay well out of the politics of
+whether, say, compilers purchased by the Canadian government must
+support Latin 9.
+
+*************************************************************
+
+From: Pascal Leroy
+Sent: Wednesday, March 19, 2003  3:54 AM
+
+> In Netscape 7.01, with the encoding set to Latin-1, this displays
+> (correctly) the Latin 9 representations!  As do OpenOffice.org,
+> Notepad, and so on.  Now let me excerpt the corresponding declarations
+> from Ada.Characters.Latin_1:
+
+In the case of Notepad, it just goes to Unicode (encoded as UTF-8) as
+soon as you type a non-Latin-1 character.  So I am not sure what your
+point is.  (Didn't check the other software packages that you mention.)
+
+> Add an enumeration type Sets to Ada Characters, or if you prefer
+> Character_Sets.  It should enumerate all the ISO 8859 character sets.
+> ...
+> Now add procedure Set_Default_Character_Set, and function
+> Current_Character_Set to Ada.Characters.Handling.
+> ...
+> We should probably also add a library pragma to change the default
+> mapping of Character.  (Compilers will probably accept command line
+> setting of character mappings, but I think that a stanard pragma would
+> help standardization.)
+
+I understand the usefulness of a pragma, but I don't really understand
+what sense it makes to change the default character set (whatever that
+is) at run-time.  Consider the case where you compile a program in
+Latin-9 mode, and it has an enumeration literal with an S-caron in it.
+Then at run-time you switch to Latin-1.  Would the 'Image attribute now
+return a string including a broken bar?  That would be very strange.
+
+I can imagine why a program might want to juggle with different
+character encodings (by withing different Latin_N units) but it seems to
+me that the default character set has to be fixed at compilation time.
+
+Anyway none of this changes my opinion that the Latin-N sets are far too
+unimportant to spend precious ARG time on them.
+
+> Or worse,
+> whether a variable named with the Greek Alpha should match a Latin A?
+> Arggh!  (If you think that is bad, what about CJK unification?  Do we want
+> to get into political cat fights about whether or not a Japanese Kanji
+> code point matches a (Korean) Hangul character with a different
+> appearance?  Please!  Anything but that...)
+
+As a matter of fact, the current AI 285 does exactly that, and I don't
+see this as a political cat fight.  The idea is to just follow what the
+Unicode folks are doing (and I suppose _they_ do quite a bit of
+political cat fighting).  So to answer your questions, a Latin A is not the
+same thing as a Greek Alpha or a Cyrillic A.  And at this point the
+kanjis and hanguls are not letters, so they are not allowed in
+identifiers.  When the Unicode people decide that ideograms are letters,
+we will update the definition in Ada.
+
+*************************************************************
+
+From: Jean-Pierre Rosen
+Sent: Wednesday, March 19, 2003  4:19 AM
+
+> I understand the usefulness of a pragma, but I don't really understand
+> what sense it makes to change the default character set (whatever that
+> is) at run-time.  Consider the case where you compile a program in
+> Latin-9 mode, and it has an enumeration literal with an S-caron in it.
+> Then at run-time you switch to Latin-1.  Would the 'Image attribute now
+> return a string including a broken bar?  That would be very strange.
+>
+And if you go that way, you may want different tasks to use different
+encodings....  Did I hear "can of worms"?
+
+*************************************************************
+
+From: Robert I. Eachus
+Sent: Wednesday, March 19, 2003  4:43 PM
+
+First, let me get this out of the way.  I really like UTF-8, and for
+that matter UTF-16.  I would also love to put real Unicode/BMP support
+into Chapter (Clause) 2 and elsewhere in the RM.  I would like to see a
+(standard) Wide_Text_IO that supported UTF-8.  But it is a lot of work.
+
+However, even if users do eventually migrate toward 16-bit and 32-bit
+character standards, we currently have an 8-bit character type in the
+standard.  My reason for arguing for a minimal AI in this area is
+that I think it would "clear the decks" forever in the 8-bit area,
+and let us concentrate on enhancing 16-bit support in the future.
+
+Pascal Leroy wrote:
+
+> In the case of Notepad, it just goes to Unicode (encoded as UTF-8) as
+> soon as you type a non-Latin-1 character.  So I am not sure what your
+> point is.  (Didn't check the other software packages that you mention.)
+
+I guess you missed the point.  Windows actually uses a superset of Latin
+1 that contains all the Latin 9 characters with different code-points.
+Windows also has IANA-registered extended versions of some other Latin
+sets.  (These are Windows-1291 et. seq.)  See the MIME and HTML standards
+for more details.  Notepad and other applications may switch to Unicode
+internally when you enter non-Latin 1 (or non-Windows 1291) characters.
+But if you cut-and-paste into a text document from one with a
+different mapping, most PC software seems to use ISO 2022 control
+characters to avoid having to reprocess the entire document.  This can
+be done as long as you use at most three ISO 8859 (or Windows) font
+variants.
+
+> I understand the usefulness of a pragma, but I don't really understand
+> what sense it makes to change the default character set (whatever that
+> is) at run-time.
+
+> I can imagine why a program might want to juggle with different
+> character encodings (by withing different Latin_N units) but it seems to
+> me that the default character set has to be fixed at compilation time.
+
+You may be right, which is why I gave that hypothetical GNAT example.  I
+think it would be almost trivial for them to support a current character
+set enquiry function, but a procedure to change the character set at
+run-time might take a lot more work.
+
+Where you would want to change the default character set at run-time is
+in things like Character to UTF-8 encoders and decoders.
+
+> Consider the case where you compile a program in Latin-9 mode, and
+> it has an enumeration literal with an S-caron in it. Then at
+> run-time you switch to Latin-1.  Would the 'Image attribute now
+> return a string including a broken bar?  That would be very strange.
+
+Why?  The character or string literal gets translated from Latin 9 to
+Character at compile time.  Then you conceptually remap all Character
+and String values when you change the default character set at run-time.
+If you convert the literal from Latin 9 to UTF-8 or Unicode at compile
+time, then try to convert back with a default character set of Latin 1,
+you can and should expect a Constraint_Error.
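+
+A worked instance of that rule, using today's Ada.Characters.Handling
+(the run-time set switch itself is the hypothetical part):
+
+    with Ada.Characters.Handling;
+    procedure Round_Trip is
+       --  'Š' compiled under a Latin-9 mapping is Character'Val (166).
+       S_Caron : constant Character := Character'Val (166);
+       W : Wide_Character;
+    begin
+       --  Today this maps via Latin-1, so 166 comes out as Broken_Bar
+       --  (U+00A6).  Under the proposal, converting a value written as
+       --  S-caron back through a Latin-1 default set has no valid
+       --  result, and Constraint_Error is the natural outcome.
+       W := Ada.Characters.Handling.To_Wide_Character (S_Caron);
+    end Round_Trip;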
+
+> Anyway none of this changes my opinion that the Latin-N sets are far too
+> unimportant to spend precious ARG time on them.
+
+In one sense, as I said, I agree.  But I think that since we do have
+compilers around that support remapping of Character, a standard way of
+querying that setting is needed.  As I indicated, I
+can easily be convinced that a way of setting the default mapping at
+run-time is a bit too much.
+
+Certainly, though, the same issues will come up with respect to
+Wide_Character if and when compilers support different Wide_Character
+mappings.  In the Wide_Character case, determining at run-time what the
+actual mapping is may be important, but I certainly agree that requiring
+support for changing the Wide_Character mapping at run-time (say, from
+Shift-JIS to Unicode) would be extreme.
+
+Remember that all that my current proposal requires is that changing
+from Latin 1 to Latin 1 succeed. I agree that anything else should be
+left outside the scope of the (ISO) standard.  I have no trouble with
+leaving the procedure to change the default character set out
+altogether, or making it optional.
+
+> As a matter of fact, the current AI 285 does exactly that, and I don't
+> see this as a political cat fight.  The idea is to just follow what the
+> Unicode folks are doing (and I suppose _they_ do quite a bit of
+> political cat fighting).  So to answer your questions, a Latin A is not the
+> same thing as a Greek Alpha or a Cyrillic A.  And at this point the
+> kanjis and hanguls are not letters, so they are not allowed in
+> identifiers.  When the Unicode people decide that ideograms are letters,
+> we will update the definition in Ada.
+
+Exactly my point, except that I think we officially follow ISO 10646, not
+Unicode.  So in theory we should update to Unicode 3.2 compatibility
+when DIS 10646(2003) is accepted.  (Those battles come closer to
+vendettas than cat fights.  The major battles are Japanese vs. Korean,
+Chinese vs. Japanese, Russian vs. Georgian, Greeks vs. Macedonians, and
+francophones vs. everybody.  Did I miss anyone?)
+
+If any other ARG--or CRG--members really care about all this, you too
+can join the madness in Prague next week.
+(http://www.unicode.org/iuc/iuc23/ ;-)
+
+*************************************************************
+
+From: Randy Brukardt
+Sent: Wednesday, March 19, 2003  7:25 PM
+
+> Pascal Leroy wrote:
+>
+> > In the case of Notepad, it just goes to Unicode (encoded as UTF-8) as
+> > soon as you type a non-Latin-1 character.  So I am not sure what your
+> > point is.  (Didn't check the other software packages that you mention.)
+>
+> I guess you missed the point.  Windows actually uses a superset of Latin
+> 1 that contains all the Latin 9 characters with different code-points.
+> Windows also has IANA-registered extended versions of some other Latin
+> sets.  (These are Windows-1291 et. seq.) See the MIME and HTML standards
+> for more details.  Notepad and other applications may switch to Unicode
+> internally when you enter non-Latin 1 (or non-Windows 1291) characters.
+
+Humm, the messages you are sending are encoded as "Windows-1252", which is
+the standard Windows character set. That hardly proves anything at all
+(other than that Windows doesn't use Latin-1 itself). (I checked this out in
+the spam filter.)
+
+>   But if you cut-and-paste into a text document from one with a
+> different mapping, most PC software seems to use ISO 2022 control
+> characters to avoid having to reprocess the entire document. This can
+> be done as long as you use at most three ISO 8859 (or Windows) font
+> variants.
+
+Nope, it doesn't change the text at all (if it's in the standard Windows
+character set, which most everything is). And if you paste it into the DOS
+box (which uses the OEM character set - which is how I edit the AIs with my
+circa-1986 text editor), it just gets converted to the nearest equivalents.
+For instance, I get a capital Y for UC_Y_Diaeresis (which, BTW, is how your
+note will appear in the !appendix to AI-285).
+
+Generalizations about Windows are almost always wrong. :-)
+
+*************************************************************
+
+From: Robert I. Eachus
+Sent: Thursday, March 20, 2003  12:20 AM
+
+Randy Brukardt wrote:
+
+> Humm, the messages you are sending are encoded as "Windows-1252", which is
+> the standard Windows character set. That hardly proves anything at all
+> (other than that Windows doesn't use Latin-1 itself). (I checked this out in
+> the spam filter.)
+
+(Sorry 1291 et. seq. instead of 1251 et. seq. was a typo.)
+
+I guess I shouldn't be surprised that 1252 has succeeded 1251 as the
+"standard" Windows binding in the US, but I hadn't noticed.  But that
+more clearly makes my point.  Users might want to be able to use 8-bit
+bindings that the ARG as a group should have little or no interest in.
+But there is the IANA registry, and I think we can bind to a pointer to
+those names with little difficulty, and leave it to compiler vendors and
+others to do the "proper" binding to the character set they want to use.
+We should in no way require compilers to reject Š or œ (S-caron or the
+oe ligature) in a name.  But we should fix that through references to
+the Unicode & ISO/IEC 10646 standards, and let compiler vendors support
+the 8-bit sets their users want to use.  (Including 8-bit standards like
+Shift-JIS and UTF-8.)
+
+> Nope, it doesn't change the text at all (if it's in the standard Windows
+> character set, which most everything is).
+
+Oh, there are those who would make you pay dearly for those comments,
+unless you meant Unicode as the "standard" Windows character set.  But
+the reality is that there is NO standard 8-bit character set for
+Windows; versions for different countries use different character sets.
+
+> And if you paste it into the DOS box (which uses the OEM character set -
+> which is how I edit the AIs with my circa-1986 text editor), it just gets
+> converted to the nearest equivalents. For instance, I get a capital Y for
+> UC_Y_Diaeresis (which, BTW, is how your note will appear in the !appendix
+> to AI-285).
+
+Ouch, does that mean I should write the proposal up as a new draft AI,
+so people can read it?
+
+> Generalizations about Windows are almost always wrong. :-)
+
+I have learned the hard way that generalizations about preferred
+character sets are ALWAYS wrong.
+
+*************************************************************
+
+From: Randy Brukardt
+Sent: Thursday, March 20, 2003  5:51 PM
+
+> Randy Brukardt wrote:
+>
+> > Humm, the messages you are sending are encoded as "Windows-1252", which is
+> > the standard Windows character set. That hardly proves anything at all
+> > (other than that Windows doesn't use Latin-1 itself). (I checked this out in
+> > the spam filter.)
+>
+> (Sorry 1291 et. seq. instead of 1251 et. seq. was a typo.)
+>
+> I guess I shouldn't be surprised that 1252 has succeeded 1251 as the
+> "standard" Windows binding in the US, but I hadn't noticed.
+
+FYI, that's confused. 1251 is "Cyrillic", while 1252 is "Western European".
+
+...
+> We should in no way require compilers to reject Š or œ (S-caron or the
+> oe ligature) in a name.  But we should fix that through references to
+> the Unicode & ISO/IEC 10646 standards, and let compiler
+> vendors support the 8-bit sets their users want to use. (Including 8-bit
+> standards like Shift-JIS and UTF-8.)
+
+Which is exactly what Pascal has proposed.
+
+But it should be pointed out that this is a very pervasive change. It means
+that the representation for names at runtime (in things like the tables for
+'Image, for 'External_Tag, for exception information) has to be changed (at the
+very least to UTF-8). For Janus/Ada, where most of the runtime code that deals
+with those things is written in assembler, such a change will be very
+expensive. And that will be true to some extent or other for all compilers.
+
+> > Nope, it doesn't change the text at all (if it's in the standard Windows
+> > character set, which most everything is).
+>
+> Oh, there are those who would make you pay dearly for those comments,
+> unless you meant Unicode as the "standard" Windows character
+> set.  But
+> the reality is that there is NO standard 8-bit character set for
+> Windows, versions for different countries use different
+> character sets.
+
+Of course. I should have said "standard US Windows character set"; didn't mean
+to imply that it is the same for everyone.
+
+> > And if you paste it into the DOS box (which uses the OEM character set -
+> > which is how I edit the AIs with my circa-1986 text editor), it just gets
+> > converted to the nearest equivalents. For instance, I get a capital Y for
+> > UC_Y_Diaeresis (which, BTW, is how your note will appear in the !appendix
+> > to AI-285).
+>
+> Ouch, does that mean I should write the proposal up as a new draft AI,
+> so people can read it?
+
+Nope, AIs go through the same text editor. Using non-7-bit characters in AIs is
+strongly discouraged. (If we wanted to start using HTML for AIs, then perhaps a
+little more flexibility could be allowed.)
+
+> > Generalizations about Windows are almost always wrong. :-)
+>
+> I have learned the hard way that generalizations about preferred
+> character sets are ALWAYS wrong.
+
+Correct. The less the standard says about character sets, the better. Your
+proposal seems to require a lot of additional verbiage and support to solve a
+problem that doesn't seem to actually exist. The Unicode/ISO 10646 problem does
+exist, but once we support that fully, compilers can support anything they want
+without us getting in the way.
+
+(It would be nice to have a way to convert to and from UTF-8 in Ada
+programs.  But that's one of many things that are "easy enough to write
+yourself", so it's hard to say whether it's worth adding anything for it.)
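+
+For what it's worth, the Character-to-UTF-8 direction really is only a
+few lines.  A minimal sketch (illustrative only; the function name is
+invented, and only the one- and two-byte forms that Latin-1 needs are
+covered):
+
+    --  Encode one Latin-1 Character as UTF-8 (one or two bytes).
+    function To_UTF_8 (C : Character) return String is
+       P : constant Natural := Character'Pos (C);
+    begin
+       if P < 16#80# then
+          return (1 => C);                                --  7-bit: as is
+       else
+          return (1 => Character'Val (16#C0# + P / 64),   --  110xxxxx
+                  2 => Character'Val (16#80# + P mod 64)); --  10xxxxxx
+       end if;
+    end To_UTF_8;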
+
+*************************************************************
+
+From: Robert Dewar
+Sent: Saturday, March 22, 2003  11:38 AM
+
+I find all this discussion of character sets going way off target. All we are
+talking about here is some predefined names for some of the characters, nothing
+more and nothing less.
+
+*************************************************************
+
