!standard A.3.5 (0)                                  11-04-26  AI05-0185-1/05
!standard A.3.6 (0)
!standard A.3.1(7/2)
!standard A.3.2(4)
!standard A.3.2(32)
!class amendment 09-11-02
!status Amendment 2012 10-08-11
!status ARG Approved 5-0-3  10-10-31
!status work item 10-10-18
!status ARG Approved 8-0-0  10-06-20
!status work item 09-11-02
!status received 09-11-02
!priority Medium
!difficulty Medium
!subject Wide_Character and Wide_Wide_Character classification and folding

!summary

Packages are added to provide support for the classification and case folding of Wide_Character and Wide_Wide_Character values.

!problem

The package Ada.Characters.Handling provides functions to classify a Character, and provides functions to convert a Character to upper case and lower case. There are no such capabilities for Wide_Character and Wide_Wide_Character. Support for classification and case folding of the Wide_Character and Wide_Wide_Character types should be added to the language.

!proposal

The current version of the GNAT compiler has defined the following implementation-defined packages:

   Ada.Wide_Characters.Unicode
   Ada.Wide_Wide_Characters.Unicode

While Ada.Wide_Characters and Ada.Wide_Wide_Characters are standard Ada 2005 packages, the Unicode child packages are non-standard. This proposal creates two standard packages, Ada.Wide_Characters.Handling and Ada.Wide_Wide_Characters.Handling. They are based on the GNAT Unicode packages, but without the functions that accept Unicode Category parameters.

!wording

Modify A.3.1(7/2):

If an implementation chooses to provide implementation-defined operations on Wide_Character or Wide_String (such as [case mapping, classification, ]collating and sorting, etc.) it should do so by providing child units of Wide_Characters. Similarly if it chooses to provide implementation-defined operations on Wide_Wide_Character or Wide_Wide_String it should do so by providing child units of Wide_Wide_Characters.

Add to the end of A.3.2(4):

   function Is_Line_Terminator (Item : in Character) return Boolean;
   function Is_Mark (Item : in Character) return Boolean;
   function Is_Other_Format (Item : in Character) return Boolean;
   function Is_Punctuation_Connector (Item : in Character) return Boolean;
   function Is_Space (Item : in Character) return Boolean;

Add following A.3.2(32):

Is_Line_Terminator
   True if Item is a character with position 10 .. 13 (Line_Feed, Line_Tabulation, Form_Feed, Carriage_Return) or 133 (Next_Line).

Is_Mark
   Never True (no value of type Character has categories Mark, Non-Spacing or Mark, Spacing Combining).

Is_Other_Format
   True if Item is a character with position 173 (Soft_Hyphen).

Is_Punctuation_Connector
   True if Item is a character with position 95 ('_', known as Low_Line or Underscore).

Is_Space
   True if Item is a character with position 32 (' ') or 160 (No_Break_Space).

A.3.5 The Package Wide_Characters.Handling

The package Wide_Characters.Handling provides operations for classifying Wide_Characters and case folding for Wide_Characters.

Static Semantics

The library package Wide_Characters.Handling has the following declaration:

package Ada.Wide_Characters.Handling is

   function Is_Control (Item : Wide_Character) return Boolean;
   function Is_Letter (Item : Wide_Character) return Boolean;
   function Is_Lower (Item : Wide_Character) return Boolean;
   function Is_Upper (Item : Wide_Character) return Boolean;
   function Is_Digit (Item : Wide_Character) return Boolean;
   function Is_Decimal_Digit (Item : Wide_Character) return Boolean
      renames Is_Digit;
   function Is_Hexadecimal_Digit (Item : Wide_Character) return Boolean;
   function Is_Alphanumeric (Item : Wide_Character) return Boolean;
   function Is_Special (Item : Wide_Character) return Boolean;
   function Is_Line_Terminator (Item : Wide_Character) return Boolean;
   function Is_Mark (Item : Wide_Character) return Boolean;
   function Is_Other_Format (Item : Wide_Character) return Boolean;
   function Is_Punctuation_Connector (Item : Wide_Character) return Boolean;
   function Is_Space (Item : Wide_Character) return Boolean;
   function Is_Graphic (Item : Wide_Character) return Boolean;

   function To_Lower (Item : Wide_Character) return Wide_Character;
   function To_Upper (Item : Wide_Character) return Wide_Character;

   function To_Lower (Item : Wide_String) return Wide_String;
   function To_Upper (Item : Wide_String) return Wide_String;

end Ada.Wide_Characters.Handling;

The subprograms defined in Ada.Wide_Characters.Handling are locale independent.

function Is_Control (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as other_control, otherwise returns False.

function Is_Letter (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as letter_uppercase, letter_lowercase, letter_titlecase, letter_modifier, letter_other, or number_letter; otherwise returns False.

function Is_Lower (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as letter_lowercase, otherwise returns False.

function Is_Upper (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as letter_uppercase, otherwise returns False.

function Is_Digit (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as number_decimal, otherwise returns False.

function Is_Hexadecimal_Digit (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as number_decimal, or is in the range 'A' .. 'F' or 'a' .. 'f', otherwise returns False.

function Is_Alphanumeric (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as letter_uppercase, letter_lowercase, letter_titlecase, letter_modifier, letter_other, number_letter, or number_decimal; otherwise returns False.

function Is_Special (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as graphic_character, but not categorized as letter_uppercase, letter_lowercase, letter_titlecase, letter_modifier, letter_other, number_letter, or number_decimal; otherwise returns False.

function Is_Line_Terminator (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as separator_line or separator_paragraph, or if Item is a conventional line terminator character (Line_Feed, Line_Tabulation, Form_Feed, Carriage_Return, Next_Line); otherwise returns False.

function Is_Mark (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as mark_non_spacing or mark_spacing_combining, otherwise returns False.

function Is_Other_Format (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as other_format, otherwise returns False.

function Is_Punctuation_Connector (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as punctuation_connector, otherwise returns False.

function Is_Space (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as separator_space, otherwise returns False.

function Is_Graphic (Item : Wide_Character) return Boolean;

   Returns True if the Wide_Character designated by Item is categorized as graphic_character, otherwise returns False.

function To_Lower (Item : Wide_Character) return Wide_Character;

   Returns the Simple Lowercase Mapping as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2003 of the Wide_Character designated by Item. If the Simple Lowercase Mapping does not exist for the Wide_Character designated by Item, then the value of Item is returned.

function To_Lower (Item : Wide_String) return Wide_String;

   Returns the result of applying the To_Lower Wide_Character to Wide_Character conversion to each element of the Wide_String designated by Item. The result is the null Wide_String if the value of the formal parameter is the null Wide_String. The lower bound of the result Wide_String is 1.

function To_Upper (Item : Wide_Character) return Wide_Character;

   Returns the Simple Uppercase Mapping as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2003 of the Wide_Character designated by Item. If the Simple Uppercase Mapping does not exist for the Wide_Character designated by Item, then the value of Item is returned.

function To_Upper (Item : Wide_String) return Wide_String;

   Returns the result of applying the To_Upper Wide_Character to Wide_Character conversion to each element of the Wide_String designated by Item.
   The result is the null Wide_String if the value of the formal parameter is the null Wide_String. The lower bound of the result Wide_String is 1.

A.3.6 The Package Wide_Wide_Characters.Handling

The package Wide_Wide_Characters.Handling has the same contents as Wide_Characters.Handling except that each occurrence of Wide_Character is replaced by Wide_Wide_Character, and each occurrence of Wide_String is replaced by Wide_Wide_String.

!discussion

The GNAT Unicode packages define a Category type which maps to the Unicode standard. Second forms of most of the classification routines exist that operate on Category parameters instead of Wide_Character or Wide_Wide_Character. These routines are claimed to be more efficient when multiple classification tests are to be performed on one Wide_Character or Wide_Wide_Character value; for a single test, the form that accepts Wide_Character or Wide_Wide_Character is expected to be more efficient. The Category type, however, would tie the package more closely to the Unicode standard, whereas it is desirable to hide that abstraction. Furthermore, adding these routines would likely mean having to define a package like System.UTF_32, which is currently defined in GNAT. It seems that the categorization routines are not necessary for the standard, and might be better left as implementation-defined functionality.

The package Ada.Characters.Handling defines classification routines that are not present in the GNAT Ada.Wide_Characters.Unicode and Ada.Wide_Wide_Characters.Unicode packages. Specifically, Is_Control, Is_Lower, Is_Upper, Is_Basic, Is_Decimal_Digit, Is_Graphic, Is_Hexadecimal_Digit, Is_Alphanumeric, and Is_Special are absent. These should be provided to be consistent with Ada.Characters.Handling. The Is_Non_Graphic routine was replaced with Is_Graphic; the remaining functions were added, except for the Is_Basic function and the To_Basic functions.
It is not clear whether these functions have any meaning in Wide_Character or Wide_Wide_Character contexts, as there do not appear to be any Unicode functions for stripping off diacritical marks, and it is not clear that doing so would result in a string that was meaningful.

Also, the ISO_646 related functions were not added; since those deal with 8-bit values, they were deemed not appropriate for Wide_Character and Wide_Wide_Character contexts.

Another question is whether some of the new classification functions should be added to Ada.Characters.Handling. The wording in the RM for that package describes the classification in terms of character ranges rather than the categories defined in 2.1. Should these be reworded in terms of these categories? [That question is tangentially covered by AI05-0114-1 - Editor.]

!example

(See discussion.)

!corrigendum A.3.1(7/2)

@drepl
If an implementation chooses to provide implementation-defined operations on Wide_Character or Wide_String (such as case mapping, classification, collating and sorting, etc.) it should do so by providing child units of Wide_Characters. Similarly if it chooses to provide implementation-defined operations on Wide_Wide_Character or Wide_Wide_String it should do so by providing child units of Wide_Wide_Characters.
@dby
If an implementation chooses to provide implementation-defined operations on Wide_Character or Wide_String (such as collating and sorting, etc.) it should do so by providing child units of Wide_Characters. Similarly if it chooses to provide implementation-defined operations on Wide_Wide_Character or Wide_Wide_String it should do so by providing child units of Wide_Wide_Characters.

!corrigendum A.3.2(4)

@drepl
@xcode<   @b<function> Is_Control (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Graphic (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Letter (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Lower (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Upper (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Basic (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Digit (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Decimal_Digit (Item : @b<in> Character) @b<return> Boolean
      @b<renames> Is_Digit;
   @b<function> Is_Hexadecimal_Digit (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Alphanumeric (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Special (Item : @b<in> Character) @b<return> Boolean;>
@dby
@xcode<   @b<function> Is_Control (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Graphic (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Letter (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Lower (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Upper (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Basic (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Digit (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Decimal_Digit (Item : @b<in> Character) @b<return> Boolean
      @b<renames> Is_Digit;
   @b<function> Is_Hexadecimal_Digit (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Alphanumeric (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Special (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Line_Terminator (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Mark (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Other_Format (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Punctuation_Connector (Item : @b<in> Character) @b<return> Boolean;
   @b<function> Is_Space (Item : @b<in> Character) @b<return> Boolean;>

!corrigendum A.3.2(32)

@dinsa
@xhang<@xterm<Is_Special>True if Item is a special graphic character. A @i<special graphic character> is a graphic character that is not alphanumeric.>
@dinss
@xhang<@xterm<Is_Line_Terminator>True if Item is a character with position 10 .. 13 (Line_Feed, Line_Tabulation, Form_Feed, Carriage_Return) or 133 (Next_Line).>

@xhang<@xterm<Is_Mark>Never True (no value of type Character has categories Mark, Non-Spacing or Mark, Spacing Combining).>

@xhang<@xterm<Is_Other_Format>True if Item is a character with position 173 (Soft_Hyphen).>

@xhang<@xterm<Is_Punctuation_Connector>True if Item is a character with position 95 ('_', known as Low_Line or Underscore).>

@xhang<@xterm<Is_Space>True if Item is a character with position 32 (' ') or 160 (No_Break_Space).>

!corrigendum A.3.5(0)

@dinsc
The package Wide_Characters.Handling provides operations for classifying Wide_Characters and case folding for Wide_Characters.

@s8<@i<Static Semantics>>

The library package Wide_Characters.Handling has the following declaration:

@xcode<@b<package> Ada.Wide_Characters.Handling @b<is>

   @b<function> Is_Control (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Letter (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Lower (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Upper (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Digit (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Decimal_Digit (Item : Wide_Character) @b<return> Boolean
      @b<renames> Is_Digit;
   @b<function> Is_Hexadecimal_Digit (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Alphanumeric (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Special (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Line_Terminator (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Mark (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Other_Format (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Punctuation_Connector (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Space (Item : Wide_Character) @b<return> Boolean;
   @b<function> Is_Graphic (Item : Wide_Character) @b<return> Boolean;

   @b<function> To_Lower (Item : Wide_Character) @b<return> Wide_Character;
   @b<function> To_Upper (Item : Wide_Character) @b<return> Wide_Character;

   @b<function> To_Lower (Item : Wide_String) @b<return> Wide_String;
   @b<function> To_Upper (Item : Wide_String) @b<return> Wide_String;

@b<end> Ada.Wide_Characters.Handling;>

The subprograms defined in Ada.Wide_Characters.Handling are locale independent.
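
@xcode<@b<function> Is_Control (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<other_control>, otherwise returns False.>

@xcode<@b<function> Is_Letter (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<letter_uppercase>, @fa<letter_lowercase>, @fa<letter_titlecase>, @fa<letter_modifier>, @fa<letter_other>, or @fa<number_letter>; otherwise returns False.>

@xcode<@b<function> Is_Lower (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<letter_lowercase>, otherwise returns False.>

@xcode<@b<function> Is_Upper (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<letter_uppercase>, otherwise returns False.>

@xcode<@b<function> Is_Digit (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<number_decimal>, otherwise returns False.>

@xcode<@b<function> Is_Hexadecimal_Digit (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<number_decimal>, or is in the range 'A' .. 'F' or 'a' .. 'f', otherwise returns False.>

@xcode<@b<function> Is_Alphanumeric (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<letter_uppercase>, @fa<letter_lowercase>, @fa<letter_titlecase>, @fa<letter_modifier>, @fa<letter_other>, @fa<number_letter>, or @fa<number_decimal>; otherwise returns False.>

@xcode<@b<function> Is_Special (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<graphic_character>, but not categorized as @fa<letter_uppercase>, @fa<letter_lowercase>, @fa<letter_titlecase>, @fa<letter_modifier>, @fa<letter_other>, @fa<number_letter>, or number_decimal; otherwise returns False.>

@xcode<@b<function> Is_Line_Terminator (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<separator_line> or @fa<separator_paragraph>, or if Item is a conventional line terminator character (Line_Feed, Line_Tabulation, Form_Feed, Carriage_Return, Next_Line); otherwise returns False.>

@xcode<@b<function> Is_Mark (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<mark_non_spacing> or @fa<mark_spacing_combining>, otherwise returns False.>

@xcode<@b<function> Is_Other_Format (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<other_format>, otherwise returns False.>

@xcode<@b<function> Is_Punctuation_Connector (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<punctuation_connector>, otherwise returns False.>

@xcode<@b<function> Is_Space (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<separator_space>, otherwise returns False.>

@xcode<@b<function> Is_Graphic (Item : Wide_Character) @b<return> Boolean;>

@xindent<Returns True if the Wide_Character designated by Item is categorized as @fa<graphic_character>, otherwise returns False.>

@xcode<@b<function> To_Lower (Item : Wide_Character) @b<return> Wide_Character;>

@xindent<Returns the Simple Lowercase Mapping as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2003 of the Wide_Character designated by Item. If the Simple Lowercase Mapping does not exist for the Wide_Character designated by Item, then the value of Item is returned.>

@xcode<@b<function> To_Lower (Item : Wide_String) @b<return> Wide_String;>

@xindent<Returns the result of applying the To_Lower Wide_Character to Wide_Character conversion to each element of the Wide_String designated by Item. The result is the null Wide_String if the value of the formal parameter is the null Wide_String. The lower bound of the result Wide_String is 1.>

@xcode<@b<function> To_Upper (Item : Wide_Character) @b<return> Wide_Character;>

@xindent<Returns the Simple Uppercase Mapping as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2003 of the Wide_Character designated by Item. If the Simple Uppercase Mapping does not exist for the Wide_Character designated by Item, then the value of Item is returned.>

@xcode<@b<function> To_Upper (Item : Wide_String) @b<return> Wide_String;>

@xindent<Returns the result of applying the To_Upper Wide_Character to Wide_Character conversion to each element of the Wide_String designated by Item. The result is the null Wide_String if the value of the formal parameter is the null Wide_String. The lower bound of the result Wide_String is 1.>

!corrigendum A.3.6(0)

@dinsc
The package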
Wide_Wide_Characters.Handling has the same contents as Wide_Characters.Handling except that each occurrence of Wide_Character is replaced by Wide_Wide_Character, and each occurrence of Wide_String is replaced by Wide_Wide_String.

!ACATS test

ACATS C-Tests should be constructed for these packages.

!appendix

From: Robert Dewar
Sent: Saturday, July 3, 2010  3:29 PM

We forgot to say what the bounds of the result are for To_Lower and To_Upper. I suggest the same as the bounds of the input parameter (the alternative is always 1 as the low bound).

****************************************************************

From: Robert Dewar
Sent: Saturday, July 3, 2010  3:45 PM

The Inline pragma for Is_Graphic says Is_Non_Graphic.

****************************************************************

From: Robert Dewar
Sent: Saturday, July 3, 2010  4:05 PM

I object to the pragma Inlines; it's up to the implementation what makes sense to mark as inlined.

****************************************************************

From: Randy Brukardt
Sent: Saturday, July 3, 2010  5:28 PM

Robert, in the future, please indicate the AI and version (and Bob would like the title as well) that you are looking at, because it can be hard to find whatever is being referred to.

Anyway, once I figured out that you are talking about AI05-0185-1, the first note in the yet-to-be-published minutes says: "Drop all of the pragma Inline."

It's not that helpful to review AIs between the end of a meeting and the publishing of the minutes, because it is likely that you'll just comment on stuff that has already been decided -- and that just adds to my workload without any corresponding benefit. There will be an editorial review of all of the newly completed AIs that will start shortly, and that is the appropriate time for reviewing these.

****************************************************************
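
The first message in this appendix asks what bounds the string forms of To_Lower and To_Upper return. The following is a minimal sketch for checking that on a given compiler, assuming an Ada 2012 implementation that provides Ada.Wide_Characters.Handling as worded in this AI; the procedure name Show_Result_Bounds is hypothetical. Under the !wording above, Result'First should be 1; an implementation that instead keeps the bounds of its argument would report 5.

```ada
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

--  Hypothetical test driver, not part of the AI.
procedure Show_Result_Bounds is
   --  A Wide_String whose lower bound is deliberately not 1.
   Source : constant Wide_String (5 .. 9) := "HELLO";

   --  Result is declared without a constraint, so it takes whatever
   --  bounds the function result has.
   Result : constant Wide_String :=
     Ada.Wide_Characters.Handling.To_Lower (Source);
begin
   Ada.Text_IO.Put_Line ("Source'First  =" & Integer'Image (Source'First));
   Ada.Text_IO.Put_Line ("Result'First  =" & Integer'Image (Result'First));
   Ada.Text_IO.Put_Line ("Result'Length =" & Integer'Image (Result'Length));
end Show_Result_Bounds;
```

Code that indexes the result should use Result'Range rather than assume either convention.

****************************************************************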

From: Robert Dewar
Sent: Saturday, July 3, 2010  6:40 PM

No problem, I am just making comments as I implement things and notice them, but this stuff is hardly critical!

One thing that does concern me is To_Lower, I hope everyone realizes that To_Upper and To_Lower are not easily reversible.

For instance in identifiers, you definitely want lower case i with a dot to be equivalent to upper case I without a dot. Anything else would be a big surprise to anyone who is not Turkish.

But there are four characters: lower case i with and without a dot, and upper case I with and without a dot. The natural folding would be to keep the dot, but that's obviously not what you want.

So my current implementation of To_Upper folds lower case i with a dot to upper case I without a dot. But I am not sure what the To_Upper and To_Lower functions in these packages in the AI are supposed to do.

Who has studied the To_Upper/To_Lower issue carefully for the purpose of this AI? Someone I hope! Or were these routines just stuck in casually without thinking about the difficult problems behind them (I suspect this is the case, please tell me it isn't and that someone can tell me EXACTLY what they had in mind).

I follow the locale independent case folding discussed in note 1 of ISO/IEC 10646:2003 for To_Upper_Case currently.

And now I can't even find this standard to look at it again :-(

UGH! Case folding was one of the hardest things to deal with, and here it is in even greater glory in this package. Oh well I can always implement something or other. The RM certainly does not say what it means (though what *is* the reference to Simple_Lower_Case???)

****************************************************************

From: Robert Dewar
Sent: Saturday, July 3, 2010  6:44 PM

> For instance in identifiers, you definitely want lower case i with a
> dot to be equivalent to upper case I without a dot.
> Anything else
> would be a big surprise to anyone who is not Turkish.

To expand on this a bit, my current To_Upper function maps both lower case i with dot and lower case i with no dot to upper case I with no dot. I am sure this is what is wanted for identifier case equivalence (anything else would be an incompatible disaster). But that means that To_Upper is a many-to-one mapping, and thus is not reversible.

****************************************************************

From: Randy Brukardt
Sent: Sunday, July 4, 2010  6:07 PM

...
> One thing that does concern me is To_Lower, I hope everyone realizes
> that To_Upper and To_Lower are not easily reversible.

Those of us in the ARG who (sort of) understand the character stuff surely know that. But it probably would be a good idea to make it clear to regular end-users, so it would make sense to add a user note.

...
> So my current implementation of To_Upper folds lower case i with a dot
> to upper case I without a dot. But I am not sure what the To_Upper and
> To_Lower functions in these packages in the AI are supposed to do.

My understanding is that they are supposed to use the "Simple Uppercase Mapping" (and "Simple Lowercase Mapping") as defined by 10646. If there is no such thing, we have a problem! Probably the wording should make this clearer rather than just using Titlecase for the terms. That is, say something like "Simple Uppercase Mapping of ISO/IEC 10646:2003."

> Who has studied the To_Upper/To_Lower issue carefully for the purpose
> of this AI? Someone I hope! Or were these routines just stuck in
> casually without thinking about the difficult problems behind them (I
> suspect this is the case, please tell me it isn't and that someone can
> tell me EXACTLY what they had in mind).
>
> I follow the locale independent case folding discussed in note 1 of
> ISO/IEC 10646:2003 for To_Upper_Case currently.
>
> And now I can't even find this standard to look at it again :-(

I vaguely recall someone saying that this standard has free availability; presuming that is true there should be no problem getting a copy. (That said, I don't have a copy and should get one.)

> UGH! Case folding was one of the hardest things to deal with, and here
> it is in even greater glory in this package. Oh well I can always
> implement something or other. The RM certainly does not say what it
> means (though what *is* the reference to
> Simple_Lower_Case???)

It's "Simple Uppercase Mapping", and I presume there is something with that name in 10646. If not, we don't have a defined functionality, and that *surely* would be a problem.

I personally had thought that this was talking about the same mapping used for Ada Identifiers, but having read the definition again, I'm not so sure anymore. That's because To_Upper for strings is defined in terms of To_Upper for characters, and that surely doesn't work for the full character set (how can To_Upper for a character return the *three* characters needed in some extreme cases??). So I suspect that you are right that there is a definitional problem here.

****************************************************************

From: Robert Dewar
Sent: Sunday, July 4, 2010  6:30 PM

> My understanding is that they are supposed to use the "Simple
> Uppercase Mapping" (and "Simple Lowercase Mapping") as defined by
> 10646. If there is no such thing, we have a problem! Probably the
> wording should make this clearer rather than just using Titlecase for
> the terms. That is, say something like "Simple Uppercase Mapping of ISO/IEC
> 10646:2003."

I don't know what this refers to, can someone find a reference?

> I personally had thought that this was talking about the same mapping
> used for Ada Identifiers, but having read the definition again, I'm
> not so sure anymore.
> That's because To_Upper for strings is defined in
> terms of To_Upper for characters, and that surely doesn't work for the
> full character set (how can To_Upper for a character return the
> *three* characters needed in some extreme cases??). So I suspect that
> you are right that there is a definitional problem here.

To_Upper cannot return three characters for one, what are you talking about? 10646 has one code per point, we are not talking about UTF-8 strings here. For source it's up to you how the characters are represented, but conceptually identifiers are a sequence of wide_wide_characters.

[This thread is rapidly turning to talk about identifiers; as such it continues in AI05-0227-1.]

****************************************************************

From: Randy Brukardt
Sent: Wednesday, August 11, 2010  9:44 PM

The text in this AI says:

   function Is_Decimal_Digit (Item : Wide_Character) return Boolean;

   This function is a rename of Is_Digit.

We don't write this in English, we just do it when desired. That is, the specification ought to be:

   function Is_Decimal_Digit (Item : Wide_Character) return Boolean
      renames Is_Digit;

and the text description removed.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, August 11, 2010  9:55 PM

Should we change the Implementation Advice in A.3.1 since we are now providing some form of case mapping and classification? It says:

   If an implementation chooses to provide implementation-defined operations on Wide_Character or Wide_String (such as case mapping, classification, collating and sorting, etc.) it should do so by providing child units of Wide_Characters. Similarly if it chooses to provide implementation-defined operations on Wide_Wide_Character or Wide_Wide_String it should do so by providing child units of Wide_Wide_Characters.

Arguably it is still correct, since one could easily imagine further classification functions and "full case folding".
But it seems a bit misleading, especially as it originally was added because we were *not* adding Wide_Characters.Handling in Ada 2005; now that we decided to do that, it is not clear that it is as useful. (And it is a bit weird that it doesn't mention String; why not make the same statement for it?)

Thoughts??

****************************************************************

From: Randy Brukardt
Sent: Wednesday, August 11, 2010  10:38 PM

Having looked at this a bit more I wonder if the names of the Is_Other and Is_Punctuation routines are misleading.

Is_Other returns True for Other_Format characters, but not for other characters classified as "other, something". I think this routine would be better called Is_Other_Format.

I was going to ignore Is_Other, but then I saw that Is_Punctuation is very misleading. This returns True for characters in category punctuation_connector (that is, for underscore), but will return False for common punctuation like '.' and ','. Punctuation_connector is the only category used in the Ada grammar (in identifiers), so it is the only one our standard defines. As such, it probably is the only one we really want to support here, but clearly we need a name that isn't misleading. Is_Punctuation_Connector would be a much better name.

Thoughts??

****************************************************************

From: Robert Dewar
Sent: Wednesday, August 11, 2010  11:14 PM

No objections to these name changes, they seem minor and are easy enough to adjust in existing code.

****************************************************************

Editor's note: This AI was reopened to address the items mentioned above and others raised during Editorial Review. Specifically:

I had previously asked that the names Is_Other and Is_Punctuation be changed to Is_Other_Format and Is_Punctuation_Connector; the latter in particular is very misleading (it is true for underscore, but not for period or comma).

I had also noted that the implementation advice in A.3.1 is now dubious; no one commented on that. So I don't even know what to suggest there.

Robert noted that To_Lower and To_Upper don't define the bounds of the result. (It should be 1, to be consistent with Ada.Characters.Handling.)

John would prefer that Is_Line_Terminator, .. Is_Graphic be added to Ada.Characters.Handling.

Finally we need to clarify the definition of Simple Lowercase Mapping and Simple Uppercase Mapping. The first is a Unicode term; but we can't refer to Unicode normatively in the Standard. The second doesn't exist anywhere. Moreover, these are different than what identifiers use. Robert and I had an e-mail meltdown on this back in July. And the identifier definition is completely daft, as the "convert to uppercase" definition says use Unicode full case folding -- but *that* is a conversion to *lower* case! See AI05-0227-1. So we need to decide what we really want here.

****************************************************************

!topic Inconsistency in meaning of line terminator
!reference Ada 2012 RM (Draft 11) 2.1(13/2), 2.2(2/2), A.3.2(32.1/3), A.3.5(40/3)
!from Howard W. Ludwig 11-04-18
!discussion

Paragraph 2.1(13/2) explicitly includes 16#85# (NEXT LINE) character as a format effector.

Paragraph 2.2(2/2) states that a sequence of one or more format effectors except for CHARACTER TABULATIONs signifies at least one end of line. Therefore, from 2.1(13/2) I infer that the presence of a NEXT LINE character, which is not a CHARACTER TABULATION, would signify at least one end of line and constitute a line terminator.

Paragraph A.3.2(32.1/3) implies that the function Is_Line_Terminator would not return True if Item is the NEXT LINE character, with position 133 (16#85#), which is in the position range of type Character (unlike separator_line and separator_paragraph, which are appropriately excluded in this paragraph).

Paragraph A.3.5(40/3) also inappropriately excludes NEXT LINE characters in a Wide_Character context.

Otherwise, NEXT LINE needs to be removed from the format effector category in Paragraph 2.1(13/2), which would contradict Unicode and other current trends.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, April 19, 2011  12:27 AM

Well, there is one additional possibility, which would be to leave NEXT LINE as a format_effector but exclude it from being the end of a line (treating it like CHARACTER TABULATION).

But I think this is just a clear omission in A.3.2, since the Chapter 2 rules seem clear enough. I'll see if there is support for a change.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, April 19, 2011  12:33 AM

There is a comment from Howard Ludwig on Ada-Comment saying that Is_Line_Terminator should return True for "Next Line" (Pos(16#85#)).

His logic is that it is defined as a format_effector by 2.1(13/2); format_effectors other than TAB "signify end of line" according to 2.2(2/2). And everything that signifies an end of line is included in Is_Line_Terminator except "Next Line".

Other fixes would include removing Next Line from the definition of format_effector (but that would vary from Unicode and 10646); or defining Next Line similarly to Tab (but why?).

I think he is right, and it is a simple change at this point (these are new routines, so there is no compatibility concern).

Thoughts?

****************************************************************

From: Tucker Taft
Sent: Tuesday, April 19, 2011  7:08 AM

Given its name, it sounds like it makes sense to consider it a line terminator.

****************************************************************
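
The NEXT LINE question discussed above can be checked directly. The following is a minimal sketch, assuming an Ada 2012 implementation that provides both Ada.Characters.Handling.Is_Line_Terminator and Ada.Wide_Characters.Handling.Is_Line_Terminator as worded in this AI; the procedure name Show_Line_Terminators and the constant names are hypothetical. Under the !wording above (which lists Next_Line, position 133), all three calls are expected to yield True; an implementation following the earlier draft wording might report False for the first two.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;
with Ada.Wide_Characters.Handling;

--  Hypothetical test driver, not part of the AI.
procedure Show_Line_Terminators is
   package CH  renames Ada.Characters.Handling;
   package WCH renames Ada.Wide_Characters.Handling;

   --  NEXT LINE has position 16#85# (133) in Character and Wide_Character.
   NEL      : constant Character      := Character'Val (16#85#);
   Wide_NEL : constant Wide_Character := Wide_Character'Val (16#85#);

   --  LINE SEPARATOR (16#2028#) is categorized as separator_line.
   Line_Sep : constant Wide_Character := Wide_Character'Val (16#2028#);
begin
   Ada.Text_IO.Put_Line
     ("Character NEXT LINE           : "
      & Boolean'Image (CH.Is_Line_Terminator (NEL)));
   Ada.Text_IO.Put_Line
     ("Wide_Character NEXT LINE      : "
      & Boolean'Image (WCH.Is_Line_Terminator (Wide_NEL)));
   Ada.Text_IO.Put_Line
     ("Wide_Character LINE SEPARATOR : "
      & Boolean'Image (WCH.Is_Line_Terminator (Line_Sep)));
end Show_Line_Terminators;
```

****************************************************************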