!standard 1.1.4(15)                                  05-07-26  AI95-00395/09
!standard 2.1(01)
!standard 2.1(03)
!standard 2.1(04)
!standard 2.1(14)
!standard 2.3(02)
!standard 2.3(03)
!standard 2.3(04)
!standard 2.3(05)
!standard 2.3(06)
!standard 2.9(02)
!standard 3.5.2(03)
!standard 6.1(10)
!standard A.1(36)
!standard A.3.1(00)
!standard A.3.1(02)
!standard A.3.2(02)
!standard A.3.2(13-18)
!standard A.3.2(42-48)
!standard A.3.4(01)
!standard A.4.7(46)
!standard A.4.7(48)
!standard A.4.8(01)
!standard J.14(00)
!class amendment 05-01-25
!status Amendment 200Y 05-02-25
!status ARG Approved 10-0-1  05-04-17
!status work item 05-01-25
!status received 05-01-25
!priority High
!difficulty Easy
!subject Various clarifications regarding 16- and 32-bit characters

!summary

(See proposal.)

!problem

1 - The characters in category other_format are generally not displayed. The syntax rule for identifier would make it possible to have an identifier that includes two underlines separated by an other_format character, which would look like two consecutive underlines. Similarly for trailing underlines, or for identifiers that would look like reserved words. Is this intended? (No.)

2 - The character at position 16#AD#, SOFT HYPHEN, is in category other_format. It was allowed in character and string literals in Ada 95, but under the current wording it is no longer allowed, which introduces an incompatibility. Is this intended? (No.)

3 - Many places in normative text talk about "upper case" without qualification. This is somewhat ambiguous in the Unicode world, where case conversion can depend on the locale and can change the length of a string.

4 - The definition of the image of nongraphic wide characters results in long strings like "Character_12345678". This increases the Width attribute of Wide_Character and Wide_Wide_Character for no good reason.

5 - AI-302-3 defines Ada.Strings.Wide_Hash, Ada.Strings.Wide_Fixed.Wide_Hash, Ada.Strings.Wide_Bounded.Wide_Hash, and Ada.Strings.Wide_Unbounded.Wide_Hash; there should be Wide_Wide versions of these as well. Similarly, the change that AI-362 makes to A.4.7 needs to be made to A.4.8 as well.

6 - Ada.Strings.Wide_Maps.Wide_Constants and Ada.Strings.Wide_Wide_Maps.Wide_Wide_Constants define Upper_Case_Map and Lower_Case_Map. What is their effect on characters outside the Character range?

7 - Wide_Wide_Character has 2**31 values, so its Size is 31. Because Wide_Wide_String is a packed array, each of its components would occupy 31 bits. Is this intended? (No.)

8 - ISO/IEC 10646:2003 reserves positions 16#FFFE# and 16#FFFF# of *each plane*, but the AARM only mentions the BMP. Is this intended? (No.)

9 - Ada compilers will necessarily have a mechanism for locale-independent case folding and character classification, since they need one to process identifiers. It seems wrong not to let Ada users use these facilities.

!proposal

1 - After removing the other_format characters, an identifier must not violate the "usual" rules about underlines. It must not be a reserved word, either. Also, other_format characters are allowed (but ignored) in reserved words, and in "special" attribute designators. Note that we must phrase the wording to only allow ASCII characters in reserved words, to avoid oddities like "if" written with a Turkish dotless-i, or "access" written with a German sharp-s.

2 - The incompatibility doesn't seem justified. While Unicode recommends that other_format characters be ignored in identifiers, it doesn't say anything about other constructs. ECMA C#, which we used as a guideline in resolving some of the character issues, allows them in string literals. Hopefully decent program editors will provide a way to display these characters.

Note that some languages allow any character in string literals. We do not want to go that far; in particular, we do not want to allow control characters. They have been disallowed for 20 years, and there is no indication that users have had any problem with that. We are just avoiding an incompatibility.

We must also specify the effect of other_format characters in operator symbols. We follow the rule that other_format characters work in operator symbols just as in normal text: they are allowed (and ignored) in operators that are reserved words, and disallowed in other operators.

3 - We are not going to fix all these places. Currently we only have a rule in 2.3, and it surely doesn't cover all the occurrences of "upper case", so it is better to have a blanket statement somewhere in section 1.

4 - Change the language-defined names to keep the current value of Width (which is 12).

5 - Add Ada.Strings.Wide_Wide_Hash, Ada.Strings.Wide_Wide_Fixed.Wide_Wide_Hash, Ada.Strings.Wide_Wide_Bounded.Wide_Wide_Hash, and Ada.Strings.Wide_Wide_Unbounded.Wide_Wide_Hash to A.4.8's list of functions. Add A.4.7(46.1/2) to A.4.8.

6 - Ada.Strings.Wide_Maps.Wide_Constants defines Upper_Case_Map and Lower_Case_Map in terms of Ada.Strings.Maps.Constants. A.4.7(48) makes it clear that this is intended. Changing their definition would be inconsistent with Ada 95: programs would behave differently with no compile-time indication.

Ada.Strings.Wide_Wide_Maps.Wide_Wide_Constants is defined in terms of Ada.Strings.Wide_Maps.Wide_Constants, but the note A.4.7(48) was not carried over. So it is unclear whether these are just copies (which is what the normative wording implies) or whether they cover the full range. Covering the full range seems inconsistent with Wide_Maps. Moreover, case folding is not necessarily 1-to-1 (full case folding maps the German sharp-s to "ss"), so character-to-character mappings are inappropriate for 32-bit characters anyway.

Therefore, we stay consistent with Wide_Constants and add text to A.4.8. This text should be normative, and A.4.7 should be changed similarly.

7 - There are two ways to fix this issue: add a size clause for 32 bits, or add another 2**31 literals to Wide_Wide_Character. The former has the drawback that some operations involving low-level programming (Unchecked_Conversion, C interfacing) may become erroneous. The latter has the drawback that Wide_Wide_Character does not properly model the 10646 character set, so programmers who care about internationalization have to deal with 2**31 extra values; in particular, a signed integer type on a 32-bit machine cannot hold the Pos of a Wide_Wide_Character. This AI adopts the first option.

8 - Add wording to cover positions 16#FFFE# and 16#FFFF# of each plane. We are also removing the language-defined names FFFE and FFFF.

9 - Full case mapping and wide character categorization require hefty run-time tables, so it would be inappropriate to add them to Ada.Characters.Handling. Moreover, adding new operations dealing with Wide_Wide_Characters to Ada.Characters.Handling is problematic, as it makes some calls (those that use literals) ambiguous: a call such as Is_Character ('A') is legal today, but would become ambiguous once Is_Character is also declared for Wide_Wide_Character. So we are moving the conversion functions into a new child package, Ada.Characters.Conversions, and making the existing conversion functions in Ada.Characters.Handling obsolescent.

We are also adding packages Ada.Wide_Characters and Ada.Wide_Wide_Characters as umbrellas for implementation-defined (or user-defined) operations on Wide_ and Wide_Wide_Characters and Strings.

!wording

1 - Change 2.3(2-4) to read:

    identifier ::=
       identifier_start {identifier_start | identifier_extend}

    identifier_start ::=
         letter_uppercase
       | letter_lowercase
       | letter_titlecase
       | letter_modifier
       | letter_other
       | number_letter

    identifier_extend ::=
         mark_non_spacing
       | mark_spacing_combining
       | number_decimal_digit
       | punctuation_connector
       | other_format

    After eliminating the characters in category other_format, an identifier shall not contain two consecutive characters in category punctuation_connector, or end with a character in that category.

Add before 2.3(6):

    After applying these transformations, an identifier shall not be identical to a reserved word (in upper case).

Replace the introductory sentence of 2.9(2) with:

    The following are the reserved words.
    Within a program, some or all of the letters of a reserved word may be in upper case, and one or more characters in category other_format may be inserted within or at the end of the reserved word.

2 - Change 2.1(14) (as modified by AI95-00285) to read:

    graphic_character
        Any character which is not in the categories other_control, other_private_use, other_surrogate, format_effector, and whose code position is neither 16#FFFE# nor 16#FFFF#.

Change 6.1(10) to read:

    The sequence of characters in an operator_symbol shall form a reserved word, a delimiter, or compound delimiter that corresponds to an operator belonging to one of the six categories of operators defined in clause 4.5.

    AARM Note: The "sequence of characters" of the string literal of the operator is a technical term (see 2.6), and does not include the surrounding quote characters. As defined in 2.2, lexical elements are "formed" from a sequence of characters. Spaces are not allowed, and upper and lower case is not significant. See 2.2 and 2.9 for rules related to the use of other_format characters in delimiters and reserved words.

3 - Add before 1.1.4(15):

    When this International Standard mentions the conversion of some character or sequence of characters to upper case, it means the character or sequence of characters obtained by using locale-independent full case folding, as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2003.

    AARM Note: For sequences of characters, case folding is applied to the sequence, not to individual characters. It sometimes can make a difference.

Change the second paragraph added after 2.3(5) by AI95-00285 to read:

    o The remaining sequence of characters is converted to upper case.

4 - In the second paragraph added after 3.5.2(3) by AI95-00285, replace:

    ... the string "Character_" ...

by:

    ... the string "Hex_" ...
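The legality rules of item 1, together with the upper-case conversion of item 3, can be sketched in Ada. This is a hypothetical illustration, not proposed wording. The package name Identifier_Rules is invented, and only a few sample code points are treated as other_format or punctuation_connector. Only ASCII letters are upper-cased, whereas the wording requires the full Unicode categories and locale-independent full case folding. The sketch assumes the candidate already matches the syntax of 2.3(2-3) and checks only the rules this AI adds.

```ada
--  Hypothetical sketch (not proposed wording) of the identifier legality
--  rules of 2.3 as amended by this AI.  Category tests cover sample code
--  points only, and only ASCII letters are upper-cased.
package Identifier_Rules is

   --  Sample members of category other_format only (not the full category).
   function Is_Other_Format (C : Wide_Wide_Character) return Boolean;

   --  Sample members of category punctuation_connector only.
   function Is_Punctuation_Connector (C : Wide_Wide_Character) return Boolean;

   --  Removes every other_format character, the first transformation of 2.3.
   function Strip_Other_Format (S : Wide_Wide_String) return Wide_Wide_String;

   --  Assumes S already matches the syntax of 2.3(2-3); checks only the
   --  legality rules added by this AI.
   function Is_Legal_Identifier (S : Wide_Wide_String) return Boolean;

end Identifier_Rules;

with Ada.Characters.Handling;
with Ada.Strings.Fixed;

package body Identifier_Rules is

   --  The 72 Ada 2005 reserved words, in upper case, each delimited by spaces.
   Reserved : constant String :=
     " ABORT ABS ABSTRACT ACCEPT ACCESS ALIASED ALL AND ARRAY AT BEGIN BODY" &
     " CASE CONSTANT DECLARE DELAY DELTA DIGITS DO ELSE ELSIF END ENTRY" &
     " EXCEPTION EXIT FOR FUNCTION GENERIC GOTO IF IN INTERFACE IS LIMITED" &
     " LOOP MOD NEW NOT NULL OF OR OTHERS OUT OVERRIDING PACKAGE PRAGMA" &
     " PRIVATE PROCEDURE PROTECTED RAISE RANGE RECORD REM RENAMES REQUEUE" &
     " RETURN REVERSE SELECT SEPARATE SUBTYPE SYNCHRONIZED TAGGED TASK" &
     " TERMINATE THEN TYPE UNTIL USE WHEN WHILE WITH XOR ";

   function Is_Other_Format (C : Wide_Wide_Character) return Boolean is
      P : constant Natural := Wide_Wide_Character'Pos (C);
   begin
      --  SOFT HYPHEN, U+200B .. U+200F, ZERO WIDTH NO-BREAK SPACE.
      return P = 16#AD# or else P in 16#200B# .. 16#200F# or else P = 16#FEFF#;
   end Is_Other_Format;

   function Is_Punctuation_Connector (C : Wide_Wide_Character) return Boolean is
      P : constant Natural := Wide_Wide_Character'Pos (C);
   begin
      --  LOW LINE and FULLWIDTH LOW LINE.
      return P = 16#5F# or else P = 16#FF3F#;
   end Is_Punctuation_Connector;

   function Strip_Other_Format (S : Wide_Wide_String) return Wide_Wide_String is
      Result : Wide_Wide_String (1 .. S'Length);
      Last   : Natural := 0;
   begin
      for I in S'Range loop
         if not Is_Other_Format (S (I)) then
            Last := Last + 1;
            Result (Last) := S (I);
         end if;
      end loop;
      return Result (1 .. Last);
   end Strip_Other_Format;

   function Is_Legal_Identifier (S : Wide_Wide_String) return Boolean is
      T     : constant Wide_Wide_String := Strip_Other_Format (S);
      Upper : String (1 .. T'Length);
      P     : Natural;
   begin
      if T'Length = 0 then
         return False;
      end if;

      --  No two consecutive punctuation connectors, and none at the end.
      for I in T'First .. T'Last - 1 loop
         if Is_Punctuation_Connector (T (I))
           and then Is_Punctuation_Connector (T (I + 1))
         then
            return False;
         end if;
      end loop;
      if Is_Punctuation_Connector (T (T'Last)) then
         return False;
      end if;

      --  Not identical to a reserved word after conversion to upper case.
      --  Simplification: a non-ASCII character is taken to rule out a match.
      for I in T'Range loop
         P := Wide_Wide_Character'Pos (T (I));
         if P > 127 then
            return True;
         end if;
         Upper (I - T'First + 1) :=
           Ada.Characters.Handling.To_Upper (Character'Val (P));
      end loop;
      return Ada.Strings.Fixed.Index (Reserved, " " & Upper & " ") = 0;
   end Is_Legal_Identifier;

end Identifier_Rules;
```

Under these assumptions, "Foo_" & Wide_Wide_Character'Val (16#AD#) & "_Bar" is rejected because it collapses to Foo__Bar. "i" & Wide_Wide_Character'Val (16#200B#) & "f" is not an identifier at all, because removing the other_format character leaves the reserved word if.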
5 - In A.4.8(1) (introduced by AI95-00285), add Strings.Wide_Wide_Hash, Strings.Wide_Wide_Fixed.Wide_Wide_Hash, Strings.Wide_Wide_Bounded.Wide_Wide_Hash, and Strings.Wide_Wide_Unbounded.Wide_Wide_Hash functions.

In A.4.8(28) (introduced by AI95-00285), add Strings.Hash, Strings.Fixed.Hash, Strings.Bounded.Hash, and Strings.Unbounded.Hash functions.

Add after A.4.8(45) (introduced by AI95-00285):

    Pragma Pure is replaced by pragma Preelaborate in Strings.Wide_Wide_Maps.Wide_Wide_Constants.

6 - Change A.4.7(48) into normative text.

Add at the end of A.4.8 (introduced by AI95-00302):

    Each Wide_Wide_Character_Set constant in the package Strings.Wide_Wide_Maps.Wide_Wide_Constants contains no values outside the Character portion of Wide_Wide_Character. Similarly, each Wide_Wide_Character_Mapping constant in this package is the identity mapping when applied to any element outside the Character portion of Wide_Wide_Character.

7 - Add in A.1, after the declaration of Wide_Wide_Character:

    for Wide_Wide_Character'Size use 32;

8 - Change 2.1(1) (as modified by AI95-00285) to read:

    The character repertoire for the text of an Ada program consists of the collection of characters described by the ISO/IEC 10646:2003 Universal Multiple-Octet Coded Character Set. This collection is organized in *planes*, each plane comprising 65536 characters.

Change the paragraph inserted after 2.1(3) by AI95-00285 to read:

    A character is any character defined within ISO/IEC 10646:2003 other than those whose relative code position in their plane is 16#FFFE# or 16#FFFF#.

Change the second sentence of 2.1(4) (as modified by AI95-00285) to read:

    A character whose relative code position in its plane is 16#FFFE# or 16#FFFF# is not allowed anywhere in the text of a program.
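The plane-relative rule above is simple arithmetic on the code position: a position is excluded when it equals 16#FFFE# or 16#FFFF# modulo 65536, the size of a plane. The following hypothetical Ada sketch shows that test. The package name Code_Positions and its functions are invented for illustration. The sketch also builds the "Hex_" names from item 4. Their fixed length of 4 + 8 = 12 characters is what keeps Width at 12, compared with 10 + 8 = 18 for the "Character_" prefix.

```ada
--  Hypothetical sketch: the plane-relative exclusion of 16#FFFE# and
--  16#FFFF# (2.1) and the "Hex_" language-defined names of 3.5.2.
package Code_Positions is

   Plane_Size : constant := 65_536;  --  each plane comprises 65536 characters

   --  Relative code position of C within its plane.
   function Relative_Position (C : Wide_Wide_Character) return Natural;

   --  True for the two positions excluded in every plane.
   function Is_Excluded (C : Wide_Wide_Character) return Boolean;

   --  "Hex_" followed by eight upper-case hexadecimal digits.  The standard
   --  uses such names only for nongraphic characters above 16#FF#.
   function Hex_Name (C : Wide_Wide_Character) return String;

end Code_Positions;

package body Code_Positions is

   function Relative_Position (C : Wide_Wide_Character) return Natural is
   begin
      return Wide_Wide_Character'Pos (C) mod Plane_Size;
   end Relative_Position;

   function Is_Excluded (C : Wide_Wide_Character) return Boolean is
      R : constant Natural := Relative_Position (C);
   begin
      return R = 16#FFFE# or else R = 16#FFFF#;
   end Is_Excluded;

   function Hex_Name (C : Wide_Wide_Character) return String is
      Hex_Digits : constant String := "0123456789ABCDEF";
      Result     : String (1 .. 12) := "Hex_00000000";
      Value      : Natural := Wide_Wide_Character'Pos (C);
   begin
      --  Fill the eight digit positions from the right.
      for I in reverse 5 .. 12 loop
         Result (I) := Hex_Digits (Value mod 16 + 1);
         Value := Value / 16;
      end loop;
      return Result;
   end Hex_Name;

end Code_Positions;
```

For example, Hex_Name (Wide_Wide_Character'Val (16#FFFE#)) yields "Hex_0000FFFE". Is_Excluded is True for 16#1_FFFE# and 16#10_FFFF# as well as for the BMP positions.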
Change 2.1(14) to read (this is in addition to the change to this paragraph in item 2 above):

    graphic_character
        Any character which is not in the categories other_control, other_private_use, other_surrogate, format_effector, and whose relative code position in its plane is neither 16#FFFE# nor 16#FFFF#.

In 3.5.2(3), remove:

    "The last 2 values of Wide_Character correspond to the nongraphic positions FFFE and FFFF of the BMP, and are assigned the language-defined names FFFE and FFFF. As with the other language-defined names for nongraphic characters, the names FFFE and FFFF are usable only with the attributes (Wide_)Image and (Wide_)Value; they are not usable as enumeration literals."

In A.1(36) replace:

    type Wide_Character is (nul, soh, ..., Hex_0000FFFF);

9 - Change A.3.1(0) to read:

    A.3.1 The Packages Characters, Wide_Characters, and Wide_Wide_Characters

Insert after A.3.1(2):

    The library package Wide_Characters has the following declaration:

        package Ada.Wide_Characters is
           pragma Pure (Wide_Characters);
        end Ada.Wide_Characters;

    The library package Wide_Wide_Characters has the following declaration:

        package Ada.Wide_Wide_Characters is
           pragma Pure (Wide_Wide_Characters);
        end Ada.Wide_Wide_Characters;

                              Implementation Advice

    If an implementation chooses to provide implementation-defined operations on Wide_Character or Wide_String (such as case mapping, classification, collating and sorting, etc.) it should do so by providing child units of Wide_Characters. Similarly if it chooses to provide implementation-defined operations on Wide_Wide_Character or Wide_Wide_String it should do so by providing child units of Wide_Wide_Characters.

Add before A.3.2(2):

    with Ada.Characters.Conversions;

Replace A.3.2(13) by:

    -- The functions Is_Character, Is_String, To_Character, To_String, To_Wide_Character,
    -- and To_Wide_String are obsolescent; see J.14.

Delete A.3.2(14-18).

Delete A.3.2(42-48).
Add section J.14:

    J.14 Character and Wide_Character Conversion Functions

    The following declarations exist in the declaration of package Ada.Characters.Handling:

       function Is_Character (Item : in Wide_Character) return Boolean
          renames Conversions.Is_Character;
       function Is_String (Item : in Wide_String) return Boolean
          renames Conversions.Is_String;
       function To_Character (Item       : in Wide_Character;
                              Substitute : in Character := ' ') return Character
          renames Conversions.To_Character;
       function To_String (Item       : in Wide_String;
                           Substitute : in Character := ' ') return String
          renames Conversions.To_String;
       function To_Wide_Character (Item : in Character) return Wide_Character
          renames Conversions.To_Wide_Character;
       function To_Wide_String (Item : in String) return Wide_String
          renames Conversions.To_Wide_String;

Add section A.3.4:

    A.3.4 The Package Characters.Conversions

    The library package Ada.Characters.Conversions has the following declaration:

       package Ada.Characters.Conversions is
          pragma Pure (Conversions);

          function Is_Character (Item : in Wide_Character) return Boolean;
          function Is_String (Item : in Wide_String) return Boolean;
          function Is_Character (Item : in Wide_Wide_Character) return Boolean;
          function Is_String (Item : in Wide_Wide_String) return Boolean;
          function Is_Wide_Character (Item : in Wide_Wide_Character) return Boolean;
          function Is_Wide_String (Item : in Wide_Wide_String) return Boolean;

          function To_Wide_Character (Item : in Character) return Wide_Character;
          function To_Wide_String (Item : in String) return Wide_String;
          function To_Wide_Wide_Character (Item : in Character) return Wide_Wide_Character;
          function To_Wide_Wide_String (Item : in String) return Wide_Wide_String;
          function To_Wide_Wide_Character (Item : in Wide_Character) return Wide_Wide_Character;
          function To_Wide_Wide_String (Item : in Wide_String) return Wide_Wide_String;

          function To_Character (Item       : in Wide_Character;
                                 Substitute : in Character := ' ') return Character;
          function To_String (Item       : in Wide_String;
                              Substitute : in Character := ' ') return String;
          function To_Character (Item       : in Wide_Wide_Character;
                                 Substitute : in Character := ' ') return Character;
          function To_String (Item       : in Wide_Wide_String;
                              Substitute : in Character := ' ') return String;
          function To_Wide_Character (Item       : in Wide_Wide_Character;
                                      Substitute : in Wide_Character := ' ') return Wide_Character;
          function To_Wide_String (Item       : in Wide_Wide_String;
                                   Substitute : in Wide_Character := ' ') return Wide_String;

       end Ada.Characters.Conversions;

(The wording for the semantics of the operations declared in this package is identical to the one currently in AI95-00285.)

!discussion

(See proposal.)

!example

!corrigendum 1.1.4(15)

@dinsb
A @i is a nonterminal in the grammar defined in BNF under "Syntax." Names of syntactic categories are set in a different font, @fa.
@dinst
When this International Standard mentions the conversion of some character or sequence of characters to upper case, it means the character or sequence of characters obtained by using locale-independent full case folding, as defined by documents referenced in the note in section 1 of ISO/IEC 10646:2003.

!corrigendum 2.1(01)

@drepl
The only characters allowed outside of @fas are the @fas and @fas.
@dby
The character repertoire for the text of an Ada program consists of the collection of characters described by the ISO/IEC 10646:2003 Universal Multiple-Octet Coded Character Set. This collection is organized in @i, each plane comprising 65536 characters.
!corrigendum 2.1(03)

@drepl
@xcode<@fa>
@dby
@xindent is any character defined within ISO/IEC 10646:2003 other than those whose relative code position in their plane is 16#FFFE# or 16#FFFF#.>

!corrigendum 2.1(04)

@drepl
The character repertoire for the text of an Ada program consists of the collection of characters called the Basic Multilingual Plane (BMP) of the ISO 10646 Universal Multiple-Octet Coded Character Set, plus a set of @fas and, in comments only, a set of @fas; the coded representation for these characters is implementation defined (it need not be a representation defined within ISO-10646-1).
@dby
The coded representation for characters is implementation defined (it need not be a representation defined within ISO/IEC 10646:2003). A character whose relative code position in its plane is 16#FFFE# or 16#FFFF# is not allowed anywhere in the text of a program. The semantics of an Ada program whose text is not in Normalization Form KC (as defined by section 24 of ISO/IEC 10646:2003) is implementation defined.

!corrigendum 2.1(14)

@drepl
@xhang<@xterm<@fa> Any control function, other than a @fa, that is allowed in a comment; the set of @fas allowed in comments is implementation defined.>
@dby
@xhang<@xterm<@fa> Any character which is not in the categories @fa, @fa, @fa, @fa, and whose relative code position in its plane is neither 16#FFFE# nor 16#FFFF#.>

!corrigendum 2.3(02)

@drepl
@xcode<@fa>
@dby
@xcode<@fa>

!corrigendum 2.3(03)

@drepl
@xcode<@fa>
@dby
@xcode<@fa>

@xcode<@fa>

!corrigendum 2.3(04)

@drepl
An identifier shall not be a reserved word.
@dby
After eliminating the characters in category @fa, an @fa shall not contain two consecutive characters in category punctuation_connector, or end with a character in that category.

!corrigendum 2.3(05)

@drepl
All characters of an @fa are significant, including any underline character. @fas differing only in the use of corresponding upper and lower case letters are considered the same.
@dby
Two @fas are considered the same if they consist of the same sequence of characters after applying the following transformations (in this order):
@xbullet are eliminated.>
@xbullet

!corrigendum 2.3(06)

@dinsb
In a nonstandard mode, an implementation may support other upper/lower case equivalence rules for identifiers, to accommodate local conventions.
@dinst
After applying these transformations, an identifier shall not be identical to a reserved word (in upper case).

!corrigendum 2.9(02)

@dprepl
The following are the @i (ignoring upper/lower case distinctions):
@dby
The following are the @i. Within a program, some or all of the letters of a reserved word may be in upper case, and one or more characters in category @fa may be inserted within or at the end of the reserved word.

!corrigendum 3.5.2(03)

@drepl
The predefined type Wide_Character is a character type whose values correspond to the 65536 code positions of the ISO 10646 Basic Multilingual Plane (BMP). Each of the graphic characters of the BMP has a corresponding @fa in Wide_Character. The first 256 values of Wide_Character have the same @fa or language-defined name as defined for Character. The last 2 values of Wide_Character correspond to the nongraphic positions FFFE and FFFF of the BMP, and are assigned the language-defined names @i and @i. As with the other language-defined names for nongraphic characters, the names @i and @i are usable only with the attributes (Wide_)Image and (Wide_)Value; they are not usable as enumeration literals. All other values of Wide_Character are considered graphic characters, and have a corresponding @fa.
@dby
The predefined type Wide_Character is a character type whose values correspond to the 65536 code positions of the ISO/IEC 10646:2003 Basic Multilingual Plane (BMP). Each of the graphic characters of the BMP has a corresponding @fa in Wide_Character. The first 256 values of Wide_Character have the same @fa or language-defined name as defined for Character.
Each of the @fas has a corresponding @fa.

The predefined type Wide_Wide_Character is a character type whose values correspond to the 2147483648 code positions of the ISO/IEC 10646:2003 character set. Each of the @fas has a corresponding @fa in Wide_Wide_Character. The first 65536 values of Wide_Wide_Character have the same @fa or language-defined name as defined for Wide_Character. The characters whose code position is larger than 16#FF# and which are not @fas have language-defined names which are formed by appending to the string "Hex_" the representation of their code position in hexadecimal as eight extended digits. As with other language-defined names, these names are usable only with the attributes (Wide_)Wide_Image and (Wide_)Wide_Value; they are not usable as enumeration literals.

!corrigendum 6.1(10)

@drepl
The sequence of characters in an @fa shall correspond to an operator belonging to one of the six classes of operators defined in clause 4.5 (spaces are not allowed and the case of letters is not significant).
@dby
The sequence of characters in an @fa shall form a reserved word, a delimiter, or compound delimiter that corresponds to an operator belonging to one of the six categories of operators defined in clause 4.5.

!corrigendum A.1(36)

@drepl
@xcode<   --@ft<@i< The predefined operators for the type Character are the same as for>>
   --@ft<@i< any enumeration type.>>

   --@ft<@i< The declaration of type Wide_Character is based on the standard ISO 10646 BMP character set.>>
   --@ft<@i< The first 256 positions have the same contents as type Character. See 3.5.2.>>

   @b Wide_Character @b (@i, @i ... @i, @i);

   @b ASCII @b ... @b ASCII;  --@ft<@i>>
@dby
@xcode<   --@ft<@i< The predefined operators for the type Character are the same as for>>
   --@ft<@i< any enumeration type.>>

   --@ft<@i< The declaration of type Wide_Character is based on the standard ISO/IEC 10646:2003 BMP character>>
   --@ft<@i< set. The first 256 positions have the same contents as type Character. See 3.5.2.>>

   @b Wide_Character @b (@i, @i ... @i, @i);

   --@ft<@i< The declaration of type Wide_Wide_Character is based on the full>>
   --@ft<@i< ISO/IEC 10646:2003 character set. The first 65536 positions have the>>
   --@ft<@i< same contents as type Wide_Character. See 3.5.2.>>

   @b Wide_Wide_Character @b (@i, @i ... @i, @i);
   @b Wide_Wide_Character'Size @b 32;

   @b ASCII @b ... @b ASCII;  --@ft<@i>>

!corrigendum A.3.1(00)

@drepl
The Package Characters
@dby
The Packages Characters, Wide_Characters, and Wide_Wide_Characters

!corrigendum A.3.1(02)

@dinsa
@xcode<@b Ada.Characters @b
   @b Pure(Characters);
@b Ada.Characters;>
@dinss
The library package Wide_Characters has the following declaration:

@xcode<@b Ada.Wide_Characters @b
   @b Pure(Wide_Characters);
@b Ada.Wide_Characters;>

The library package Wide_Wide_Characters has the following declaration:

@xcode<@b Ada.Wide_Wide_Characters @b
   @b Pure(Wide_Wide_Characters);
@b Ada.Wide_Wide_Characters;>

@i<@s8>
If an implementation chooses to provide implementation-defined operations on Wide_Character or Wide_String (such as case mapping, classification, collating and sorting, etc.) it should do so by providing child units of Wide_Characters. Similarly if it chooses to provide implementation-defined operations on Wide_Wide_Character or Wide_Wide_String it should do so by providing child units of Wide_Wide_Characters.
!corrigendum A.3.2(02)

@drepl
@xcode<@b Ada.Characters.Handling @b
   @b Preelaborate(Handling);>
@dby
@xcode<@b Ada.Characters.Conversions;
@b Ada.Characters.Handling @b
   @b Pure(Handling);>

!corrigendum A.3.2(13)

@drepl
@xcode<   --@ft<@i>>
@dby
@xcode<   --@ft<@i>
   --@ft<@i>>

!corrigendum A.3.2(14)

@ddel
@xcode<   @b Is_Character (Item : @b Wide_Character) @b Boolean;
   @b Is_String (Item : @b Wide_String) @b Boolean;>

!corrigendum A.3.2(15)

@ddel
@xcode<   @b To_Character (Item : @b Wide_Character;
                    Substitute : @b Character := ' ') @b Character;>

!corrigendum A.3.2(16)

@ddel
@xcode<   @b To_String (Item : @b Wide_String;
                 Substitute : @b Character := ' ') @b String;>

!corrigendum A.3.2(17)

@ddel
@xcode<   @b To_Wide_Character (Item : @b Character) @b Wide_Character;>

!corrigendum A.3.2(18)

@ddel
@xcode<   @b To_Wide_String (Item : @b String) @b Wide_String;>

!corrigendum A.3.2(42)

@ddel
The following set of functions test Wide_Character values for membership in Character, or convert between corresponding characters of Wide_Character and Character.

!comment A.3.2(43-47) are deleted by the original AI-285, so no change is needed here.

!corrigendum A.3.2(48)

@ddel
@xhang<@xterm

!corrigendum A.3.4(01)

@dinsc
The library package Characters.Conversions has the following declaration:

@xcode<@b Ada.Characters.Conversions @b
   @b Pure(Conversions);

   @b Is_Character (Item : @b Wide_Character) @b Boolean;
   @b Is_String (Item : @b Wide_String) @b Boolean;
   @b Is_Character (Item : @b Wide_Wide_Character) @b Boolean;
   @b Is_String (Item : @b Wide_Wide_String) @b Boolean;
   @b Is_Wide_Character (Item : @b Wide_Wide_Character) @b Boolean;
   @b Is_Wide_String (Item : @b Wide_Wide_String) @b Boolean;

   @b To_Wide_Character (Item : @b Character) @b Wide_Character;
   @b To_Wide_String (Item : @b String) @b Wide_String;
   @b To_Wide_Wide_Character (Item : @b Character) @b Wide_Wide_Character;
   @b To_Wide_Wide_String (Item : @b String) @b Wide_Wide_String;
   @b To_Wide_Wide_Character (Item : @b Wide_Character) @b Wide_Wide_Character;
   @b To_Wide_Wide_String (Item : @b Wide_String) @b Wide_Wide_String;

   @b To_Character (Item : @b Wide_Character;
                    Substitute : @b Character := ' ') @b Character;
   @b To_String (Item : @b Wide_String;
                 Substitute : @b Character := ' ') @b String;
   @b To_Character (Item : @b Wide_Wide_Character;
                    Substitute : @b Character := ' ') @b Character;
   @b To_String (Item : @b Wide_Wide_String;
                 Substitute : @b Character := ' ') @b String;
   @b To_Wide_Character (Item : @b Wide_Wide_Character;
                         Substitute : @b Wide_Character := ' ') @b Wide_Character;
   @b To_Wide_String (Item : @b Wide_Wide_String;
                      Substitute : @b Wide_Character := ' ') @b Wide_String;

@b Ada.Characters.Conversions;>

The functions in package Characters.Conversions test Wide_Wide_Character or Wide_Character values for membership in Wide_Character or Character, or convert between corresponding characters of Wide_Wide_Character, Wide_Character, and Character.
@xcode<@b Is_Character (Item : @b Wide_Character) @b Boolean;>
@xindent

@xcode<@b Is_Character (Item : @b Wide_Wide_Character) @b Boolean;>
@xindent

@xcode<@b Is_Wide_Character (Item : @b Wide_Wide_Character) @b Boolean;>
@xindent

@xcode<@b Is_String (Item : @b Wide_String) @b Boolean;
@b Is_String (Item : @b Wide_Wide_String) @b Boolean;>
@xindent

@xcode<@b Is_Wide_String (Item : @b Wide_Wide_String) @b Boolean;>
@xindent

@xcode<@b To_Character (Item : @b Wide_Character;
                 Substitute : @b Character := ' ') @b Character;
@b To_Character (Item : @b Wide_Wide_Character;
                 Substitute : @b Character := ' ') @b Character;>
@xindent

@xcode<@b To_Wide_Character (Item : @b Character) @b Wide_Character;>
@xindent

@xcode<@b To_Wide_Character (Item : @b Wide_Wide_Character;
                      Substitute : @b Wide_Character := ' ') @b Wide_Character;>
@xindent

@xcode<@b To_Wide_Wide_Character (Item : @b Character) @b Wide_Wide_Character;>
@xindent

@xcode<@b To_Wide_Wide_Character (Item : @b Wide_Character) @b Wide_Wide_Character;>
@xindent

@xcode<@b To_String (Item : @b Wide_String;
              Substitute : @b Character := ' ') @b String;
@b To_String (Item : @b Wide_Wide_String;
              Substitute : @b Character := ' ') @b String;>
@xindent

@xcode<@b To_Wide_String (Item : @b String) @b Wide_String;>
@xindent

@xcode<@b To_Wide_String (Item : @b Wide_Wide_String;
                   Substitute : @b Wide_Character := ' ') @b Wide_String;>
@xindent

@xcode<@b To_Wide_Wide_String (Item : @b String) @b Wide_Wide_String;
@b To_Wide_Wide_String (Item : @b Wide_String) @b Wide_Wide_String;>
@xindent

!corrigendum A.4.7(46)

@drepl
@xcode<   Character_Set : @b Wide_Maps.Wide_Character_Set;
   --@ft<@i< Contains each Wide_Character value WC such that>>
   --@ft<@i< Characters.Is_Character(WC) is True>>>
@dby
@xcode<   Character_Set : @b Wide_Maps.Wide_Character_Set;
   --@ft<@i< Contains each Wide_Character value WC such that>>
   --@ft<@i< Characters.Conversions.Is_Character(WC) is True>>>

Each Wide_Character_Set constant in the package Strings.Wide_Maps.Wide_Constants contains no values outside the Character portion of Wide_Character. Similarly, each Wide_Character_Mapping constant in this package is the identity mapping when applied to any element outside the Character portion of Wide_Character.

!corrigendum A.4.7(48)

@ddel
@xindent<13 Each Wide_Character_Set constant in the package Strings.Wide_Maps.Wide_Constants contains no values outside the Character portion of Wide_Character. Similarly, each Wide_Character_Mapping constant in this package is the identity mapping when applied to any element outside the Character portion of Wide_Character.>

!corrigendum A.4.8(01)

@dinsc
Facilities for handling strings of Wide_Wide_Character elements are found in the packages Strings.Wide_Wide_Maps, Strings.Wide_Wide_Fixed, Strings.Wide_Wide_Bounded, Strings.Wide_Wide_Unbounded, and Strings.Wide_Wide_Maps.Wide_Wide_Constants, and in the functions Strings.Wide_Wide_Hash, Strings.Wide_Wide_Fixed.Wide_Wide_Hash, Strings.Wide_Wide_Bounded.Wide_Wide_Hash, and Strings.Wide_Wide_Unbounded.Wide_Wide_Hash. They provide the same string-handling operations as the corresponding packages for strings of Character elements.

@i<@s8>
The library package Strings.Wide_Wide_Maps has the following declaration.
@xcode<@b Ada.Strings.Wide_Wide_Maps @b
   @b Preelaborate(Wide_Wide_Maps);

   --@ft<@i< Representation for a set of Wide_Wide_Character values:>>
   @b Wide_Wide_Character_Set @b;
   @b Preelaborable_Initialization(Wide_Wide_Character_Set);

   Null_Set : @b Wide_Wide_Character_Set;

   @b Wide_Wide_Character_Range @b
      @b
         Low : Wide_Wide_Character;
         High : Wide_Wide_Character;
      @b;
   --@ft<@i< Represents Wide_Wide_Character range Low..High>>

   @b Wide_Wide_Character_Ranges @b (Positive @b <@>) @b Wide_Wide_Character_Range;

   @b To_Set (Ranges : @b Wide_Wide_Character_Ranges) @b Wide_Wide_Character_Set;
   @b To_Set (Span : @b Wide_Wide_Character_Range) @b Wide_Wide_Character_Set;
   @b To_Ranges (Set : @b Wide_Wide_Character_Set) @b Wide_Wide_Character_Ranges;

   @b "=" (Left, Right : @b Wide_Wide_Character_Set) @b Boolean;

   @b "@b" (Right : @b Wide_Wide_Character_Set) @b Wide_Wide_Character_Set;
   @b "@b" (Left, Right : @b Wide_Wide_Character_Set) @b Wide_Wide_Character_Set;
   @b "@b" (Left, Right : @b Wide_Wide_Character_Set) @b Wide_Wide_Character_Set;
   @b "@b" (Left, Right : @b Wide_Wide_Character_Set) @b Wide_Wide_Character_Set;
   @b "-" (Left, Right : @b Wide_Wide_Character_Set) @b Wide_Wide_Character_Set;

   @b Is_In (Element : @b Wide_Wide_Character;
                   Set : @b Wide_Wide_Character_Set) @b Boolean;

   @b Is_Subset (Elements : @b Wide_Wide_Character_Set;
                       Set : @b Wide_Wide_Character_Set) @b Boolean;

   @b "<=" (Left : @b Wide_Wide_Character_Set;
                 Right : @b Wide_Wide_Character_Set) @b Boolean @b Is_Subset;

   --@ft<@i< Alternative representation for a set of Wide_Wide_Character values:>>
   @b Wide_Wide_Character_Sequence @b Wide_Wide_String;

   @b To_Set (Sequence : @b Wide_Wide_Character_Sequence) @b Wide_Wide_Character_Set;
   @b To_Set (Singleton : @b Wide_Wide_Character) @b Wide_Wide_Character_Set;
   @b To_Sequence (Set : @b Wide_Wide_Character_Set) @b Wide_Wide_Character_Sequence;

   --@ft<@i< Representation for a Wide_Wide_Character to Wide_Wide_Character>>
   --@ft<@i< mapping:>>
   @b Wide_Wide_Character_Mapping @b;
   @b Preelaborable_Initialization(Wide_Wide_Character_Mapping);

   @b Value (Map : @b Wide_Wide_Character_Mapping;
                   Element : @b Wide_Wide_Character) @b Wide_Wide_Character;

   Identity : @b Wide_Wide_Character_Mapping;

   @b To_Mapping (From, To : @b Wide_Wide_Character_Sequence) @b Wide_Wide_Character_Mapping;
   @b To_Domain (Map : @b Wide_Wide_Character_Mapping) @b Wide_Wide_Character_Sequence;
   @b To_Range (Map : @b Wide_Wide_Character_Mapping) @b Wide_Wide_Character_Sequence;

   @b Wide_Wide_Character_Mapping_Function @b
      @b @b (From : @b Wide_Wide_Character) @b Wide_Wide_Character;

@b
   ... --@ft<@i< not specified by the language>>
@b Ada.Strings.Wide_Wide_Maps;>

The context clause for each of the packages Strings.Wide_Wide_Fixed, Strings.Wide_Wide_Bounded, and Strings.Wide_Wide_Unbounded identifies Strings.Wide_Wide_Maps instead of Strings.Maps.

For each of the packages Strings.Fixed, Strings.Bounded, Strings.Unbounded, and Strings.Maps.Constants, and for functions Strings.Hash, Strings.Fixed.Hash, Strings.Bounded.Hash, and Strings.Unbounded.Hash, the corresponding wide wide string package or function has the same contents except that
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet
@xbullet

The following additional declarations are present in Strings.Wide_Wide_Maps.Wide_Wide_Constants:

@xcode<   Character_Set : @b Wide_Wide_Maps.Wide_Wide_Character_Set;
   --@ft<@i< Contains each Wide_Wide_Character value WWC such that>>
   --@ft<@i< Characters.Conversions.Is_Character(WWC) is True>>
   Wide_Character_Set : @b Wide_Wide_Maps.Wide_Wide_Character_Set;
   --@ft<@i< Contains each Wide_Wide_Character value WWC such that>>
   --@ft<@i< Characters.Conversions.Is_Wide_Character(WWC) is True>>>

Each Wide_Wide_Character_Set constant in the package Strings.Wide_Wide_Maps.Wide_Wide_Constants contains no values outside the Character portion of Wide_Wide_Character.
Similarly, each Wide_Wide_Character_Mapping constant in this package is the identity mapping when applied to any element outside the Character portion of Wide_Wide_Character.

@fa<pragma> Pure is replaced by @fa<pragma> Preelaborate in Strings.Wide_Wide_Maps.Wide_Wide_Constants.

@xindent<@s9>

!corrigendum J.14(00)

@dinsc
The following declarations exist in the declaration of package Ada.Characters.Handling:

@xcode<   @b<function> Is_Character (Item : @b<in> Wide_Character) @b<return> Boolean
      @b<renames> Conversions.Is_Character;
   @b<function> Is_String (Item : @b<in> Wide_String) @b<return> Boolean
      @b<renames> Conversions.Is_String;

   @b<function> To_Character (Item       : @b<in> Wide_Character;
                          Substitute : @b<in> Character := ' ')
      @b<return> Character
      @b<renames> Conversions.To_Character;

   @b<function> To_String (Item       : @b<in> Wide_String;
                       Substitute : @b<in> Character := ' ')
      @b<return> String
      @b<renames> Conversions.To_String;

   @b<function> To_Wide_Character (Item : @b<in> Character) @b<return> Wide_Character
      @b<renames> Conversions.To_Wide_Character;

   @b<function> To_Wide_String (Item : @b<in> String) @b<return> Wide_String
      @b<renames> Conversions.To_Wide_String;>

!ACATS test

ACATS C-Test(s) should be created to test these rules.

!appendix

From: Robert Dewar
Sent: Sunday, January 23, 2005 12:24 PM

The grammar as it is now allows identifiers to contain the sequence

   underline other-format-character underline

Now the normal way of handling other-format-character internally would be to simply ignore it, but then we end up internally with an identifier with two underscores in it. That's a real pain, since we assume that two underscores is reserved.

I really think this is undesirable for other reasons, since other-format-character often corresponds to something not visible, such as formatting information, and you end up with an identifier that has two visible underscores in a row.

I would recommend we modify the grammar in AI195 to eliminate this unpleasant possibility.

Note that the current rules also allow an identifier to effectively end with an underscore (by ending with the sequence underscore other-format-character) but not to begin with an underscore.
I know the standard is written in terms of how to compare identifiers, but in fact I think many compilers will work as GNAT does, by canonicalizing identifiers as they are scanned.

P.S. for those who don't want to go rummaging in the AI, other-format characters include stuff like:

   invisible separator
   soft hyphen
   zero width non-joiner
   zero width no-break space
   tag space
   language tag

****************************************************************

From: Pascal Leroy
Sent: Monday, January 24, 2005 4:56 AM

(I suppose you mean AI 285, not AI 195, btw.)

I fully agree, I didn't realize that the syntax as written did allow for two (visibly) consecutive underscores, or for trailing underscores. It was never my intent to allow that. The other_format characters need to be integrated in the BNF for identifier so that they don't interrupt an identifier, but being typically invisible they should not be usable to circumvent the presentation rules that we know and love.

It might be possible to fix the BNF to account for this rule, but I think it would be clearer to add a syntax rule in English like:

"After eliminating the characters in category other_format, an identifier shall not contain two consecutive characters in category punctuation_connector, or end with a character in that category."

****************************************************************

From: Robert Dewar
Sent: Monday, January 25, 2005 7:29 AM

> (I suppose you mean AI 285, not AI 195, btw.)

Yes indeed, sorry about that misprint

> It might be possible to fix the BNF to account for this rule, but I think
> it would be clearer to add a syntax rule in English like:
>
> "After eliminating the characters in category other_format, an identifier
> shall not contain two consecutive characters in category
> punctuation_connector, or end with a character in that category."

I agree, it is a bit tricky (not impossible, but messy) to do this in BNF.
Note that once this sentence is added, you can simplify the grammar to:

   identifier ::= identifier_start {identifier_start | identifier_extend}

   identifier_start ::= letter_uppercase
                      | letter_lowercase
                      | letter_titlecase
                      | letter_modifier
                      | letter_other
                      | number_letter

   identifier_extend ::= mark_non_spacing
                       | mark_spacing_combining
                       | number_decimal_digit
                       | punctuation_connector
                       | other_format

which is exactly the grammar that annex 7 of UAX #15 recommends. So that's nice. We adopt exactly the Unicode recommendation, with an extra sentence giving the restriction that we decide to add.

****************************************************************

From: Pascal Leroy
Sent: Monday, January 24, 2005 7:45 AM

Excellent point! So this is even better, we don't look like we add our own inventions on top of Unicode.

****************************************************************

From: Dan Eilers
Sent: Monday, January 24, 2005 1:14 PM

AI 285 says:

> The characters in the category other_format are effectively ignored in most
> lexical elements, with the exception that they are illegal in string_literals
> and character_literals.

Is the intent that other-format characters will be allowed in other lexical elements, such as reserved words, numeric literals, and compound delimiters?

It seems a little strange to be gumming up the works of lexical analyzers by allowing certain formatting characters inside certain lexemes.

****************************************************************

From: Robert Dewar
Sent: Monday, January 24, 2005 1:22 PM

> Is the intent that other-format characters will be allowed in other lexical
> elements, such as reserved words, numeric literals, and compound delimiters?

I don't know the intent, but the rules are clear, other-format characters are allowed ONLY in identifiers.

>
> It seems a little strange to be gumming up the works of lexical analyzers
> by allowing certain formatting characters inside certain lexemes.
Well it's not that hard to implement, but it does seem odd.

****************************************************************

From: Dan Eilers
Sent: Monday, January 24, 2005 2:00 PM

Then the erroneous wording in AI 285 needs to be changed from:

   The characters in the category other_format are effectively ignored in most lexical elements, with the exception that they are illegal in string_literals and character_literals.

to:

   The characters in the category other_format are illegal in all lexical elements except identifiers (and maybe comments).

> > It seems a little strange to be gumming up the works of lexical analyzers
> > by allowing certain formatting characters inside certain lexemes.
>
> Well it's not that hard to implement, but it does seem odd.

Our lexical analyzer processes reserved words and identifiers together, so it will have more of an impact.

Are there any users chomping at the bit to put formatting characters in their identifiers? If not, it seems unwise to slow down lexical processing for everybody else on the off chance that someone might eventually find some use for this.

****************************************************************

From: Robert Dewar
Sent: Tuesday, January 25, 2005 2:23 PM

> Then the erroneous wording in AI 285 needs to be changed from:
>
> The characters in the category other_format are effectively ignored in most
> lexical elements, with the exception that they are illegal in string_literals
> and character_literals.

Well I see this wording, but I don't see anything else in the AI to back up this position. I really don't want to have to allow these junk characters in the middle of :=

> The characters in the category other_format are illegal in all lexical
> elements except identifiers (and maybe comments)

Let's add reserved words, and for sure absolutely anything should be allowed in a comment except an end of line, which terminates the comment.
> Our lexical analyzer processes reserved words and identifiers together,
> so it will have more of an impact.

Ah ha, you are right, my current implementation is ignoring these other format characters in reserved words, and it would be a huge pain to fix this. It also is bizarre to allow these in identifiers and not in reserved words. I also think it would be horrible (really an extension of my double underline point) to allow identifiers that are visually identical to reserved words, differing only in invisible format characters. I can even see programmers misusing this when they really really want to use a reserved word as an identifier, UGH!

> Are there any users chomping at the bit to put formatting characters
> in their identifiers? If not, it seems unwise to slow down lexical
> processing for everybody else on the off chance that someone might
> eventually find some use for this.

Well there is merit in following the recommendations of the standard.

****************************************************************

From: Randy Brukardt
Sent: Monday, January 24, 2005 2:22 PM

> Are there any users chomping at the bit to put formatting characters
> in their identifiers? If not, it seems unwise to slow down lexical
> processing for everybody else on the off chance that someone might
> eventually find some use for this.

My understanding of the intent is that we are trying to match (within reason) the Unicode recommendations for program identifiers. The presumption is that the Unicode people know more about character sets than we ever will, so it is best to follow their lead.

I personally don't think that there are many who are "chomping at the bit" to use any Unicode characters in identifiers. So it would be impossible to predict what the users that do want such identifiers will want.
Simply allowing Unicode characters in identifiers is going to slow down the lexing (as with Dan's implementation, Janus/Ada processes ids and reserved words together), and I doubt that the particulars of the allowed characters will make much difference.

****************************************************************

From: Robert Dewar
Sent: Monday, January 24, 2005 3:12 PM

I don't think allowing unicode characters in identifiers slows things down significantly in practice, at least not with the approach we take (which you can look at if you like :-)

I do think making a distinction between identifiers and keywords is a huge menace and we should fix this. This has nothing to do with the standard really, it is perfectly appropriate to apply the unicode recommendations for identifiers to keywords, regarding the notion in unicode of identifier to be more general and subsume keywords.

It's really so much easier to simply ignore the format effectors as you store the identifier in the first place.

****************************************************************

From: Randy Brukardt
Sent: Monday, January 24, 2005 4:08 PM

> I don't think allowing unicode characters in identifiers slows things
> down significantly in practice, at least not with the approach we take
> (which you can look at if you like :-)

I think that the table lookups (which can't be pure array indexing like it is now) will slow things down somewhat. But I don't think it will be a major issue.

> I do think making a distinction between identifiers and keywords is a
> huge menace and we should fix this. This has nothing to do with the
> standard really, it is perfectly appropriate to apply the unicode
> recommendations for identifiers to keywords, regarding the notion
> in unicode of identifier to be more general and subsume keywords.

I certainly agree with you here, and didn't mean to give the impression that I didn't.
> It's really so much easier to simply ignore the format effectors as
> you store the identifier in the first place.

Yes, if they don't change equality, they certainly would be ignored. In which case, they need to be allowed in reserved words. Especially because we don't want the abuse someone suggested about sticking invisible characters into a keyword to make it an identifier. I'm not quite sure what wording change is needed, however.

****************************************************************

From: Robert Dewar
Sent: Monday, January 24, 2005 9:35 PM

Randy Brukardt wrote:

> I think that the table lookups (which can't be pure array indexing like it
> is now) will slow things down somewhat. But I don't think it will be a major
> issue.

But you only look up in the tables if you have a wide character, so you can't say that this slows things down. It is true that having to check for letters etc is slower than the approach GNAT took before which was to allow any wide characters in identifiers, but neither approach slows things down for identifiers not containing wide characters.

...

>>It's really so much easier to simply ignore the format effectors as
>>you store the identifier in the first place.
>
> Yes, if they don't change equality, they certainly would be ignored. In
> which case, they need to be allowed in reserved words. Especially because we
> don't want the abuse someone suggested about sticking invisible characters
> into a keyword to make it an identifier. I'm not quite sure what wording
> change is needed, however.

The main thing is to agree that there will be no ACATS tests that test for this anomaly :-)

****************************************************************

From: Randy Brukardt
Sent: Monday, January 24, 2005 10:25 PM

You have to figure out the character class of every character somehow; that certainly includes the Latin-1 characters.
You can test for wide characters first, then do different lookups for wide and non-wide, or you can think of the lookup as a single operation, in which case the lookup is complicated by handling wide characters. The code is probably essentially the same either way, and it's clearly slower for handling wide characters outside of literals.

****************************************************************

From: Robert Dewar
Sent: Monday, January 24, 2005 11:37 PM

> You have to figure out the character class of every character somehow; that
> certainly includes the Latin-1 characters. You can test for wide characters
> first, then do different lookups for wide and non-wide, or you can think of
> the lookup as a single operation, in which case the lookup is complicated by
> handling wide characters.

That's a really bad idea to do it as a single lookup.

> The code is probably essentially the same either way, and it's clearly slower
> for handling wide characters outside of literals.

In GNAT, there really is zero penalty here. The way things are done is to have an identifier table of valid identifier characters. If wide characters are allowed, then depending on the encoding, all upper half characters are not in this table, triggering an exit from identifier scanning, at which point you do the appropriate tests for wide characters.

But in practice in the real world, 99.9% of all identifiers are in the lower half of ASCII anyway.

Programs with characters in the upper half are either UTF-8 encoded or they are not. If they are not, then the only triggering characters are ESC (e.g. for Shift-JIS) or '[' for brackets, but those are not valid identifier characters in any case.

If such programs are UTF-8 coded, then you have to decode anyway.

I really don't see *any* penalty *at all* here. I invite you to look at the GNAT code, and explain why there is any penalty whatever.
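The scanning strategy described in the message above can be combined with the identifier rule proposed earlier in this thread. That strategy uses a plain table lookup for 7-bit characters and consults Unicode categories only when a wide character appears. The proposed rule ignores other_format characters, then rejects doubled or trailing underlines. The Ada sketch below is illustrative only and is not GNAT's scanner. The names Fast_Table, Classify and Is_Identifier are hypothetical. Classification of wide characters assumes the Ada 2012 package Ada.Wide_Wide_Characters.Handling, which postdates this discussion. The "shall not be a reserved word" check is omitted. SHY stands for U+00AD SOFT HYPHEN, which is in category other_format.

```ada
--  Hypothetical sketch, not GNAT's scanner. Assumes Ada 2012 (or later) for
--  Ada.Wide_Wide_Characters.Handling. Reserved words are not checked.
with Ada.Text_IO;
with Ada.Wide_Wide_Characters.Handling;

procedure Identifier_Check is
   package H renames Ada.Wide_Wide_Characters.Handling;

   --  Classes taken from the identifier grammar discussed in this thread.
   type Class is (Start, Extend, Connector, Format, Invalid);

   subtype Seven_Bit is Wide_Wide_Character range
     Wide_Wide_Character'Val (0) .. Wide_Wide_Character'Val (127);
   type Fast_Table_Type is array (Seven_Bit) of Class;

   function Build_Table return Fast_Table_Type is
      T : Fast_Table_Type := (others => Invalid);
   begin
      for C in Seven_Bit loop
         case C is
            when 'A' .. 'Z' | 'a' .. 'z' => T (C) := Start;
            when '0' .. '9'              => T (C) := Extend;
            when '_'                     => T (C) := Connector;
            when others                  => null;
         end case;
      end loop;
      return T;
   end Build_Table;

   --  Fast path: one array lookup per 7-bit character.
   Fast_Table : constant Fast_Table_Type := Build_Table;

   --  Wide characters fall through to the (slower) category tests.
   function Classify (C : Wide_Wide_Character) return Class is
   begin
      if C in Seven_Bit then
         return Fast_Table (C);
      elsif H.Is_Other_Format (C) then
         return Format;
      elsif H.Is_Letter (C) then
         return Start;
      elsif H.Is_Punctuation_Connector (C) then
         return Connector;
      elsif H.Is_Digit (C) or else H.Is_Mark (C) then
         return Extend;
      else
         return Invalid;
      end if;
   end Classify;

   --  Proposed rule: ignore other_format characters, then reject a leading,
   --  doubled, or trailing punctuation_connector.
   function Is_Identifier (S : Wide_Wide_String) return Boolean is
      Prev : Class := Invalid;  --  class of the last non-format character
   begin
      if S'Length = 0 or else Classify (S (S'First)) /= Start then
         return False;          --  must begin with identifier_start
      end if;
      for I in S'Range loop
         declare
            K : constant Class := Classify (S (I));
         begin
            case K is
               when Invalid =>
                  return False;
               when Format =>
                  null;         --  treated as if it had been stripped out
               when Connector =>
                  if Prev = Connector then
                     return False;  --  visually "__" once SHY is invisible
                  end if;
                  Prev := K;
               when Start | Extend =>
                  Prev := K;
            end case;
         end;
      end loop;
      return Prev /= Connector;  --  no trailing underline, even before a SHY
   end Is_Identifier;

   SHY : constant Wide_Wide_Character := Wide_Wide_Character'Val (16#AD#);

   procedure Show (Label : String; S : Wide_Wide_String) is
   begin
      Ada.Text_IO.Put_Line (Label & " => " & Boolean'Image (Is_Identifier (S)));
   end Show;
begin
   Show ("Wide_Text_IO", "Wide_Text_IO");       --  TRUE
   Show ("A__B", "A__B");                       --  FALSE
   Show ("A_<SHY>_B", "A_" & SHY & "_B");       --  FALSE
   Show ("A_<SHY>", "A_" & SHY);                --  FALSE
   Show ("A<SHY>B", "A" & SHY & "B");           --  TRUE
end Identifier_Check;
```

In the sketch, a soft hyphen inside an identifier is accepted but does not change how the identifier is read. The same character next to an underline is rejected, because it would otherwise hide a doubled or trailing underline.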
****************************************************************

From: Randy Brukardt
Sent: Tuesday, January 25, 2005 12:15 AM

Interesting. Sounds to me like you traded off maintainable code for performance (certainly a justifiable trade-off in some cases, and this quite possibly is one of them). Given the number of places that would have to do special processing (not just identifiers, but white space, literals, and comments), it seems like a nightmare. In fact, AI-285 *is* a nightmare, any way you slice it. It affects *everything*, and little of it in simple ways. Sigh.

****************************************************************

From: Robert Dewar
Sent: Tuesday, January 25, 2005 6:54 AM

Randy Brukardt wrote:

> Interesting. Sounds to me like you traded off maintainable code for
> performance (certainly a justifiable trade-off in some cases, and this quite
> possibly is one of them).

Well I find the code very nicely maintainable, because it is mostly table driven (see csets in the GNAT sources for the tables for all the many character sets for identifiers supported by GNAT). Lexical analyzers are such a trivial part of a compiler anyway, and yes, it is very much worthwhile worrying about speed here :-)

> Given the number of places that would have to do
> special processing (not just identifiers, but white space, literals, and
> comments), it seems like a nightmare. In fact, AI-285 *is* a nightmare, any
> way you slice it. It affects *everything*, and little of it in simple ways.
> Sigh.

Nightmare seems a bit strong, I have taken about six days to do everything except the pretty mechanical Wide_Wide packages. True, that is longer than most other 2005 features :-)

I do agree that in terms of effort-to-value ratio, this one is vanishingly small. Particularly because of so many edge cases.
Quite a chunk of the time was taken in dealing with the very annoying case of Width/Wide_Width/Wide_Wide_Width applied to dynamic subtypes of String/Wide_String/Wide_Wide_String (nine cases now instead of only four before).

****************************************************************

From: Robert Dewar
Sent: Tuesday, January 25, 2005 12:03 AM

OK, here is another puzzle.

What is the status of Soft Hyphen?

The database entry is

   00AD;SOFT HYPHEN;Cf;0;ON;;;;;N;;;;;

Meaning that this is Other, Format, and therefore not a graphic character. So is this character *really* excluded from string literals? That seems like quite a surprising incompatibility, and indeed causes failure of an ACATS test:

   with Ada.Characters.Latin_1;
   package C250002_["C1"] is
      type Enum is ( Item, 'A', '["AD"]', AE_["C6"]["E6"]_ae, '["2D"]', '["FF"]' );
      task type C2_["C2"] is
         entry C2_["C3"];
      end C2_["C2"];
   end C250002_["C1"];

So ????

****************************************************************

From: Pascal Leroy
Sent: Tuesday, January 25, 2005 3:34 AM

My reply to Dan and Robert's comments.

Dan:

> Is the intent that other-format characters will be allowed in
> other lexical elements, such as reserved words, numeric
> literals, and compound delimiters?

Hmm, that's interesting. I originally wrote the AI by stating that other_format characters are first stripped out of the program text, and that the rest of section 2 applied to the "clean" text. However, this caused an incompatibility which was thought to be unpleasant: such a character appearing in a string literal would be OK in Ada 95, but would silently disappear in Ada 2005. So we decided to make other_format illegal in character and string literals, to detect the problem at compile time. But obviously I did a lousy job of fixing the rest of the AI.

Robert:

> I don't know the intent, but the rules are clear,
> other-format characters are allowed ONLY in identifiers.

And comments (I think that's covered by the current wording).
This could be changed (to allow them pretty much everywhere) but I am reluctant to do this at this point, as it would seem to hair up both the RM and the implementations. Plus, this AI is approved by SC22 (yes, I mean SC22, not WG9) so we should only fix serious problems, not do cosmetic changes. Finally, the main use of other_format is to control the presentation of text, so it's only in comments and identifiers that they make any sense.

Robert:

> Well I see this wording, but I don't see anything else in the
> AI to back up this position. I really don't want to have to
> allow these junk characters in the middle of :=

Agreed. This is non-normative text in the AI anyway, so we can safely ignore it ;-) Note that the Unicode recommendations are not entirely clear as to what should be done with other_format in programming languages (except for identifiers, where we strictly follow the recommendations).

Robert:

> It also is bizarre to allow these
> in identifiers and not in reserved words. I also think it
> would be horrible (really an extension of my double underline
> point) to allow identifiers that are visually identical to
> reserved words, differing only in invisible format
> characters. I can even see programmers misusing this when
> they really really want to use a reserved word as an identifier, UGH!

For sure we don't want this. I'll add wording to make sure that this is taken care of.

Robert:

> So this is annoying, we have introduced a non upwards
> compatibility by forcing compilers to go to a lot of effort
> to forbid a curious set of wide characters in string
> literals, just to cause people trouble who run into this
> silly rule in existing programs.

Wake up, this AI does create a myriad of little incompatibilities, just because many characters that were classified as graphic characters are not anymore.
This was well understood by the ARG during the discussion of the AI, and we tried to make it so that incompatibilities would be detected at compile-time most of the time.

Robert:

> That's just drawn from the database, but I am a little bit
> unsure of this table. What is the category of codes which
> simply have no definition at all in the table. I assume they
> are not excluded, since otherwise why are FFFE and FFFF
> specially treated.

Right, this was done on purpose: graphic_character is defined by exclusion (any character not in category...) so that the characters which are not classified yet (by Unicode) are considered graphic characters, and can therefore be used in string literals. On the other hand, they are neither letters nor digits, so they cannot be used in identifiers. This is essentially what Ada 95 did with respect to the wide characters.

Robert:

> What is the status of Soft Hyphen?
> The database entry is
>
> 00AD;SOFT HYPHEN;Cf;0;ON;;;;;N;;;;;
>
> Meaning that this is Other, Format, and therefore not a
> graphic character. So is this character *really* excluded
> from string literals? That seems like quite a surprising
> incompatibility...

Yes, it is excluded from string and character literals, and yes, this is an incompatibility, but as I explained above, one which is detected at compile time. Again, we were aware of this, and sorry, this is not the worst incompatibility that comes with Ada 2005.

****************************************************************

From: Robert Dewar
Sent: Tuesday, January 25, 2005 7:13 AM

Yes, but we introduce incompatibilities if there is a good reason to do so. Here there is no good reason at all that I can see except appeal to some notion of uniformity that has nothing to do with Ada.

I don't find this worth implementing, so this is a place where GNAT will quite deliberately not conform.
If anyone ever wants to validate (not clear that this will ever happen), we can put this under control of some silly pedantic switch. In fact there is an argument for putting the entire graphic-in-string stuff under such a switch.

Perhaps it would be nice to have a collected list of all incompatibilities especially since some of them are considered bad by Pascal (not sure what he is referring to, since in general we have had little trouble in that area). The A5 case is the first time I have seen tests fail so far.

****************************************************************

From: Robert Dewar
Sent: Tuesday, January 25, 2005 7:18 AM

oops, I mean AD case :-)

****************************************************************

From: Robert Dewar
Sent: Tuesday, January 25, 2005 7:31 AM

Let me be a little clearer on why I think it is such a bad mistake to exclude AD from string literals.

In practice, Ada programs are run in many different environments where the graphics associated with the upper half have nothing to do with international standards or with anything in the Ada standard (e.g. various windows character sets). Ada programs work just fine in such environments, provided that the compiler and rules do not get in the way.

Yes, 10646 thinks AD is a soft hyphen, but in my XP environment, it comes out as an upside down exclamation point. It really seems annoying to tell an Ada programmer working on XP that you can freely deal with all the upper half graphics in the range A0-FE, except for AD.

I don't mind so much the changes in wide character stuff, since no one uses this anyway (we know because of bug reports that show that no one ran into things which were pretty fundamental for many years). But Ada programs working with 8-bit chars in various character sets are all over the place.

Now a counter argument in the XP case is that 80-9F are also graphic characters in windows.
True, but this is a (somewhat annoying) restriction that Ada programmers are used to and have worked around, but the AD exclusion is new and annoying, and simply makes no sense whatever in many environments.

****************************************************************

From: Robert Dewar
Sent: Tuesday, January 25, 2005 7:38 AM

Actually, now that I think of it, once you get into the switch business, you might as well allow 80-9F in string and character literals when not in pedantic mode. This would make working under windows much easier.

****************************************************************

From: Pascal Leroy
Sent: Tuesday, January 25, 2005 7:39 AM

> Yes, but we introduce incompatibilities if there is a good
> reason to do so. Here there is no good reason at all that I
> can see except...

The reason is that we have a *mandate* from SC22 to support Unicode, er, I mean, ISO/IEC 10646:2003. There is no way that we could get the Amendment past SC22 without this.

> Perhaps it would be nice to have a collected list of all
> incompatibilities especially since some of them are
> considered bad by Pascal (not sure what he is referring to,
> since in general we have had little trouble in that area).
> The A5 case is the first time I have seen tests fail so far.

The AARM that is being prepared for Ada 2005 has a fairly extensive list of incompatibilities, much like the AARM for Ada 95. It turns out that this one is not mentioned, and I agree that it should be, but I still think it's a rather unimportant incompatibility.

At any rate, an implementation is free to have a nonstandard mode where it deviates from the syntax rules spelled out in the RM.

****************************************************************

From: Robert Dewar
Sent: Tuesday, January 25, 2005 8:17 AM

It's fine to support unicode. How can a mandate to support unicode be interpreted as a mandate to NOT support something?
We support all valid unicode stuff. Where does it say in the unicode standard that we are required to reject AD in string literals? I don't see it.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, January 25, 2005 12:32 PM

> Perhaps it would be nice to have a collected list of all
> incompatibilities especially since some of them are considered
> bad by Pascal (not sure what he is referring to, since in general
> we have had little trouble in that area). The A5 case is the first
> time I have seen tests fail so far.

I've tried to identify all incompatibilities and inconsistencies in the AARM, in the same way that it was done for Ada 95. It would be possible to extract those (mechanically or otherwise) to provide a short document. There is a similar list of "extensions" to Ada 95.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, January 25, 2005 12:45 PM

> It really seems annoying to tell an Ada programmer working
> on XP that you can freely deal with all the upper half
> graphics in the range A0-FE, except for AD.

Well, I sympathize, but can't get too excited about this.

But I'm more concerned about the basic idea: why can't soft hyphen be used in string literals? It's commonly used (the AARM is full of them) and it generally has a display representation (else you couldn't edit it).
A program that generated AARM text could have many soft hyphens in strings and character literals; it seems like another case of Nanny Ada:

   "Wide_[AD]Wide_[AD]Text_IO"

is a whole lot clearer than

   "Wide_" & Character'Val(16#AD#) & "Wide_" & Character'Val(16#AD#) & "Text_IO"

****************************************************************

From: Robert Dewar
Sent: Tuesday, January 25, 2005 1:44 PM

> "Wide_[AD]Wide_[AD]Text_IO"
>
> is a whole lot clearer than
>
> "Wide_" & Character'Val(16#AD#) & "Wide_" & Character'Val(16#AD#) &
> "Text_IO"

That comparison is not quite fair, it should be

> "Wide_["AD"]Wide_["AD"]Text_IO"
>
> compared to:
>
> "Wide_" & SH & "Wide_" & SH & "Text_IO"

And indeed I rather prefer the second one here I must say.

But I think it should be the programmer's choice.

The argument in favor of not allowing soft hyphens is presumably that if you type in Unicode (whatever that means), and display in unicode, then the soft hyphens will be invisible in a program listing, which seems a worry. Of course if you use brackets notation, all is well (another reason for not being so down on brackets notation :-)

****************************************************************

From: Randy Brukardt
Sent: Tuesday, January 25, 2005 2:51 PM

Well, that assumes that you use a use clause for package Latin_1; I would not do that personally because it isn't a package I use frequently enough. And the first one is more complex than it would be in practice: you'd just insert the proper character in your editor. I wrote it with the brackets notation only so I could send it in e-mail.

> But I think it should be the programmer's choice.
>
> The argument in favor of not allowing soft hyphens is
> presumably that if you type in Unicode (whatever that
> means), and display in unicode, then the soft hyphens
> will be invisible in a program listing, which seems
> a worry.
> Of course if you use brackets notation, all
> is well (another reason for not being so down on
> brackets notation :-)

I know, but that seems to me to be saying that we want the language to work with the crappiest possible tools. Any Unicode programming editor that didn't provide a way to show "hidden" characters would be pretty worthless. (Word does that - which is hardly a programming editor - and I generally leave the hidden characters displayed there.) These rules made sense in 1980, when everything was in 7-bit ASCII (if you were lucky); it's 25 years later now, and everything is done graphically with rich fonts. Prohibiting tabs and soft hyphens simply because some ancient editors can't display them is silly.

****************************************************************

From: Robert Dewar
Sent: Tuesday, January 25, 2005 3:56 PM

I couldn't agree more! For my taste, I would allow AD in string literals and characters, and also allow all wide characters in literals. It seems to me that 10646 is about supporting use of wide characters, not making it hard by introducing unnecessary restrictions.

If there are good and sufficient reasons to avoid some character in some particular environments, then please let's allow the programmer to make this decision and not try to second guess requirements.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, January 25, 2005 3:40 PM

> Prohibiting tabs and soft hyphens simply because some
> ancient editors can't display them is silly.

Well, I think tabs are an abomination that should never have been invented. I don't even think they should be allowed in *whitespace* in Ada programs, much less string literals! But...

But I tend to agree with Randy's sentiment, here. If some character has a reasonable use, as suggested by Randy at least for soft hyphens, it seems like a shame to forbid it in the language definition.
If you don't like tabs or soft hyphens or whatever, make it a project-wide coding convention, and enforce it using a script as part of your CM system or something like that. **************************************************************** From: Robert Dewar Sent: Tuesday, January 25, 2005 5:30 PM I agree with the Robert Duff who wrote the third, permissive, paragraph, and I disagree with the Robert Duff who wrote the second, non-permissive para :-) **************************************************************** From: Randy Brukardt Sent: Tuesday, January 25, 2005 5:47 PM > Well, I think tabs are an abomination that should never have been > invented. I don't even think they should be allowed in *whitespace* > in Ada programs, much less string literals! But... The people who designed HTML agreed with you; they left out tabs. Now, try to get free-form text to line up properly (Ada syntax productions come to mind). Luckily for us, we make printed copies of the AARM from PDF derived from RTF, which has no such restrictions. Programming certainly needs tabs (especially when it is using a readable, non-fixed width font). Now the implementation of tabs often sucks... Anyway, back to your regularly scheduled language feature debate, already in progress. :-) **************************************************************** From: Jean-Pierre Rosen Sent: Wednesday, January 26, 2005 2:32 AM > If there are good and sufficient reasons to avoid some character in > some particular environments, then please let's allow the programmer > to make this decision and not try to second-guess requirements. > Which seems to beg for a configuration pragma Restrictions (Basic_Character_Set_Only) ... **************************************************************** From: Pascal Leroy Sent: Wednesday, January 26, 2005 4:30 AM In reply to Bob, Randy and Robert: First, a political comment. Irrespective of the technical issues, the topic of character set is a very delicate one politically.
Jim and I were very concerned that it could cause a catfight at the SC22 level that would derail the Amendment process, with potentially devastating consequences. So at the Palma WG9 meeting we decided to send to SC22 a summary of AI 285 to get a stamp of approval well in advance of the vote on the entire Amendment, so as to avoid ending up in a quagmire. Thanks to the support of Kiyoshi and Steve M., our proposal was approved by SC22, so we are on pretty firm ground now. However, by following this process, we have pretty much committed to not making substantial changes to AI 285. Otherwise there will be someone in SC22 who will think that we are cheating, and the gates of hell will open. I wished Bob, Randy, Robert and others had read the AI at that time, because we should really have had this discussion before sending the AI to SC22. Note that I am *not* trying to use this argument to quench the discussion, but I think we should only be doing minimal changes to the AI at this point. It's OK to say "we discovered an unintended consequence of the write-up, we are fixing it"; it's another kettle of fish to say "well, we really changed our mind on this entire business". Now for the technical discussion. I do not feel very strongly about other_format in literals, but my intuition is to be conservative because we have so little experience with programming in Unicode. On the other hand, I am noticing that Java and C# allow anything (including control characters) in string literals. On the third hand I'm not sure these languages are models that we want to follow. Specific comments below. Robert: > That comparison is not quite fair, it should be > > "Wide_["AD"]Wide_["AD"]Text_IO" > > compared to: > > "Wide_" & SH & "Wide_" & SH & "Text_IO" > > And indeed I rather prefer the second one here I must say.
> > The argument in favor of not allowing soft hyphens is > presumably that if you type in Unicode (whatever that means), > and display in unicode, then the soft hyphens will be > invisible in a program listing, which seems a worry. Well surely the second idiom is preferable because the bracket notation is not defined by the language, and is therefore not a portable syntax ;-) The first piece of program text doesn't even parse with a compiler (like ours) that doesn't support the bracket notation, regardless of the soft hyphen issue. And yes, the rationale for not allowing other_format characters in literals is that they may or may not print (or be displayed), and they may alter the presentation of the literals in surprising ways (details below). Randy: > I know, but that seems to me to be saying that we want the > language to work with the crappiest possible tools. Any > Unicode programming editor that didn't provide a way to show > "hidden" characters would be pretty worthless. (Word does > that - which is hardly a programming editor - and I generally > leave the hidden characters displayed there.) > > These rules made sense in 1980, when everything was in 7-bit > ASCII (if you were lucky); it's 25 years later now, and > everything is done graphically with rich fonts. Prohibiting > tabs and soft hyphens simply because some ancient editors > can't display them is silly. You cannot have it both ways, Randy. A few days ago you agreed with Robert that we should disallow other_format characters that would be used to write an identifier that looks like a reserved word (e.g., pro-tected where the hyphen is a soft hyphen). Now you say that anything goes because surely people will be able to display all these funny characters. But then you should not be bothered by pro-tected. None of us has much experience with Unicode editors (whatever that means), so I think we should err on the side of caution. It is not a simple matter of displaying the characters, by the way. 
These characters typically have some semantics for displaying text. For instance a soft-hyphen indicates a place where the editor can fold the line. I for one would be annoyed if my editor folded the line in the middle of a string literal. Another case that gave me headaches is this: among the other_format characters are some that change the display direction. Even if you have an editor that displays these characters, it's unclear how you would interpret what you see on the glass. Compare for instance:
   "a" & Right2Left & "bc" & Left2Right & "d"   -- unambiguous, good old Ada
   "a[Right2Left]bc[Left2Right]d"               -- Is it bc or cb in the string?
Depends on whether the formatting characters are interpreted by the editor. I have read enough of the Unicode standard to realize that this is an extremely complicated area, and again, I'd rather be conservative. (Just out of curiosity, has anyone other than Robert looked at Unicode?) Bob: > But I tend to agree with Randy's sentiment, here. If some > character has a reasonable use, as suggested by Randy at > least for soft hyphens, it seems like a shame to forbid it in > the language definition. If you don't like tabs or soft > hyphens or whatever, make it a project-wide coding > convention, and enforce it using a script as part of your CM > system or something like that. This is the Bob I totally disagree with. Let's make the language very lax, and people will implement coding conventions on top of it if they like. This is really a flexibility vs. safety tradeoff. However, I for one have a hard time evaluating the safety impact, i.e., the confusion that may stem from having literals that are not wysiwyg. Perhaps I am overstating the problem. But I don't think you can just ignore it. **************************************************************** From: Robert A. Duff Sent: Wednesday, January 26, 2005 1:10 PM In reply to Pascal: > In reply to Bob, Randy and Robert: > > First, a political comment.
Irrespective of the technical issues, the > topic of character set is a very delicate one politically. I'll defer to you on the political issues. If you say we should leave it as is to avoid rocking the boat, that's fine with me. You ought to think about whether people motivated by political concerns can discern the difference between minor and major changes. For all I know, no change at all is politically acceptable. > I wished Bob, Randy, Robert and others had read the AI at that time, > because we should really have had this discussion before sending the AI to > SC22. Well, I did look at it at the time, but my eyes glazed over, and my review was therefore useless. ;-) I'm only picking up on Robert's comments, and Robert apparently didn't notice these issues until he started to implement the thing. >...(Just > out of curiosity, has anyone other than Robert looked at Unicode?) A little bit, but again, my eyes glaze over. > Bob: > > But I tend to agree with Randy's sentiment, here. If some > > character has a reasonable use, as suggested by Randy at > > least for soft hyphens, it seems like a shame to forbid it in > > the language definition. If you don't like tabs or soft > > hyphens or whatever, make it a project-wide coding > > convention, and enforce it using a script as part of your CM > > system or something like that. > > This is the Bob I totally disagree with. Let's make the language very > lax, and people will implement coding conventions on top of it if they > like. I didn't state that as a general principle -- it just seems reasonable in this case. But I'd be happy either way (I mainly stick to 7-bit ASCII for my own code!). **************************************************************** From: Randy Brukardt Sent: Wednesday, January 26, 2005 3:40 PM > I wished Bob, Randy, Robert and others had read the AI at that time, > because we should really have had this discussion before sending the AI to > SC22. 
Note that I am *not* trying to use this argument to quench the > discussion, but I think we should only be doing minimal changes to the AI > at this point. It's OK to say "we discovered an unintended consequence of > the write-up, we are fixing it"; it's another kettle of fish to say "well, > we really changed our mind on this entire business". I doubt very much that SC22 cares what characters are allowed vs. not allowed in string literals. That was never the point of the political discussion. In any case, the AI did not point out the incompatibility, and I for one didn't think of it (as you point out, it's not documented in the AARM). We should always look at incompatibilities carefully to see if they are justified. This one, IMHO, does not seem to be. ... > Randy: > > I know, but that seems to me to be saying that we want the > > language to work with the crappiest possible tools. Any > > Unicode programming editor that didn't provide a way to show > > "hidden" characters would be pretty worthless. (Word does > > that - which is hardly a programming editor - and I generally > > leave the hidden characters displayed there.) > > > > These rules made sense in 1980, when everything was in 7-bit > > ASCII (if you were lucky); it's 25 years later now, and > > everything is done graphically with rich fonts. Prohibiting > > tabs and soft hyphens simply because some ancient editors > > can't display them is silly. > > You cannot have it both ways, Randy. A few days ago you agreed with > Robert that we should disallow other_format characters that would be used > to write an identifier that looks like a reserved word (e.g., pro-tected > where the hyphen is a soft hyphen). I don't remember ever agreeing with any such thing. My understanding of Robert's position is that we have to allow other-format in reserved words, because otherwise the processing of identifiers (of which reserved words are a subset) is substantially complicated.
That's what I agreed with; you seem to be taking the opposite approach. > Now you say that anything goes > because surely people will be able to display all these funny characters. > But then you should not be bothered by pro-tected. I'm not, and never was. > None of us has much experience with Unicode editors (whatever that means), > so I think we should err on the side of caution. It is not a simple matter > of displaying the characters, by the way. These characters typically have > some semantics for displaying text. For instance a soft-hyphen indicates > a place where the editor can fold the line. I for one would be annoyed if > my editor folded the line in the middle of a string literal. A Unicode programming editor clearly will not do such things in hidden-text mode. A general-purpose word processor is not appropriate for editing programs now, and I very much doubt that will change. One of my objections to AI-388 is that it essentially forces implementations to create Unicode programming editors (since an editor that can't display the predefined packages is junk), and that is certainly a non-trivial task -- and one for which off-the-shelf support is quite scanty. Probably most would simply support a subset of Unicode (graphic characters and a few well-used other-formats, and little else). ... > I have read enough of the Unicode standard to realize that this is an > extremely complicated area, and again, I'd rather be conservative. (Just > out of curiosity, has anyone other than Robert looked at Unicode?) I don't doubt it. I'd be happy to simply except Soft-Hyphen and Tab from the existing rules, and stop there. I'm less concerned about other ones. Another option would be to allow them, and give an implementation-permission to not allow problematic other-formats, private-use, etc. in strings. We generally don't talk about source code formats, and it is there that there is a problem, not in the language definition.
Worrying about what editors might or might not do is purely a function of the source formats and the tools, and that is way out of bounds for the language. Having restrictions on strings because some editor somewhere might not work right is pretty silly. Even if we went all the way and specified that canonical Ada source be given in UTF-8, we could hardly force editors and tools to be able to handle all possible source. So the problem is the assumption that every tool can handle every possible Ada program. Once you realize that is impractical in a Unicode world, there really remains no important reason for restrictions *in the language*. The restrictions (if they need to exist) are *in the tools*. The standard needs to recognize that there must be the possibility of character restrictions in the tools; once it does so, there is no need to restrict character or string literals or comments *in the standard*. (Identifiers are a whole different kettle of fish, of course, and that was the area that was so contentious in SC22.) **************************************************************** From: Pascal Leroy Sent: Thursday, January 27, 2005 3:02 AM > We should always look at incompatibilities carefully to see > if they are justified. This one, IMHO, does not seem to be. Fine. At any rate I'll write an AI to discuss this in Paris. > I don't remember ever agreeing with any such thing. My > understanding of Robert's position is that we have to allow > other-format in reserved words, because otherwise the > processing of identifiers (of which reserved words are a > subset) is substantially complicated. That's what I agreed > with; you seem to be taking the opposite approach. I think we are actually in agreement. My view is that when the tokenizer reads a "word" it first removes all the other_format characters, and then checks to see if it's a reserved word (after conversion to upper case) or an identifier (in which case it checks for double underscores and the like).
So pro-tected would be a reserved word, not an identifier. I feel quite strongly about this, btw, because this seems to align with the Unicode recommendations, and to match what other languages are doing. > I don't doubt it. I'd be happy to simply except Soft-Hyphen > and Tab from the existing rules, and stop there. I'm less > concerned about other ones. I would be very much opposed to picking characters one by one. We should take entire categories of characters, as defined by Unicode, if only because we don't have a good understanding of the purpose of all these weird 16- and 32-bit characters. We trust that the Unicode folks got the categorization right. I could live with other_format in literals. I'd rather not include tabs (which would effectively mean allow everything except format_effectors) because we have all been bitten by tabs at one point or another in our life. Remember, other_format is the only category that creates an incompatibility. **************************************************************** From: Robert Dewar Sent: Thursday, January 27, 2005 3:58 PM Robert A Duff wrote: > Well, I did look at it at the time, but my eyes glazed over, and my > review was therefore useless. ;-) I'm only picking up on Robert's > comments, and Robert apparently didn't notice these issues until he > started to implement the thing. Exactly, you don't really dig into the details till you look at them. I think it is essential that we fix things to have the same basic syntax for keywords and identifiers. I think it would be nice to be more permissive in string and character literals given that: a) this makes the language far more convenient to use; b) other languages faced with the same decision have gone in that direction; c) it avoids a completely unnecessary non-upwards compatibility. If I had my way, I would also do in the case equivalence (I have fully implemented it, so this is not to ease the implementation burden in GNAT :-).
The reason is that proper case equivalence processing is unavoidably locale dependent. It is simply too peculiar that a Turkish Ada programmer finds that dotted i is folded incorrectly to capital I without a dot. This means that of the identifiers: Capital I with dot, Lower case I with dot, Lower case I without dot, Capital I without dot; the first is distinct from the last three, which is just weird. I am sure that there are other locale dependent weirdnesses like this. But we can live with this if necessary. Good Ada style is never to take advantage of the case equivalence in any case. P.S. Pascal's tables in the AI for letters and numbers are significantly wrong. If people are interested, I am happy to post GNAT's understanding of the unicode categorizations :-) **************************************************************** From: Robert Dewar Sent: Thursday, January 27, 2005 4:15 PM >>You cannot have it both ways, Randy. A few days ago you agreed with >>Robert that we should disallow other_format characters that would be used >>to write an identifier that looks like a reserved word (e.g., pro-tected >>where the hyphen is a soft hyphen). I never said anything of the kind. It is fine to allow other_format stuff in identifiers following the recommendations. What is not OK is allowing underline (some soft junk) underline. I think the prohibition against two underlines (more generally against two punctuation_connector class characters) should apply AFTER soft junk is stripped, not before. **************************************************************** From: Robert Dewar Sent: Thursday, January 27, 2005 4:20 PM > I think we are actually in agreement. My view is that when the tokenizer > reads a "word" it first removes all the other_format characters, and then > checks to see if it's a reserved word (after conversion to upper case) or > an identifier (in which case it checks for double underscores and the > like).
So pro-tected would be a reserved word, not an identifier. Yes, that's right, I agree also > > I feel quite strongly about this, btw, because this seems to align with > the Unicode recommendations, and to match what other languages are doing. Not sure about the strongly, since I think this is non-critical, but certainly I agree. >>I don't doubt it. I'd be happy to simply except Soft-Hyphen >>and Tab from the existing rules, and stop there. I'm less >>concerned about other ones. That's because you have not looked through them, and/or you simply don't know what they are. Here is the list:

   UTF_32_Other_Format : constant UTF_32_Ranges := (
     (16#000AD#, 16#000AD#),  -- SOFT HYPHEN .. SOFT HYPHEN
     (16#00600#, 16#00603#),  -- ARABIC NUMBER SIGN .. ARABIC SIGN SAFHA
     (16#006DD#, 16#006DD#),  -- ARABIC END OF AYAH .. ARABIC END OF AYAH
     (16#0070F#, 16#0070F#),  -- SYRIAC ABBREVIATION MARK .. SYRIAC ABBREVIATION MARK
     (16#017B4#, 16#017B5#),  -- KHMER VOWEL INHERENT AQ .. KHMER VOWEL INHERENT AA
     (16#0200C#, 16#0200F#),  -- ZERO WIDTH NON-JOINER .. RIGHT-TO-LEFT MARK
     (16#0202A#, 16#0202E#),  -- LEFT-TO-RIGHT EMBEDDING .. RIGHT-TO-LEFT OVERRIDE
     (16#02060#, 16#02063#),  -- WORD JOINER .. INVISIBLE SEPARATOR
     (16#0206A#, 16#0206F#),  -- INHIBIT SYMMETRIC SWAPPING .. NOMINAL DIGIT SHAPES
     (16#0FEFF#, 16#0FEFF#),  -- ZERO WIDTH NO-BREAK SPACE .. ZERO WIDTH NO-BREAK SPACE
     (16#0FFF9#, 16#0FFFB#),  -- INTERLINEAR ANNOTATION ANCHOR .. INTERLINEAR ANNOTATION TERMINATOR
     (16#1D173#, 16#1D17A#),  -- MUSICAL SYMBOL BEGIN BEAM .. MUSICAL SYMBOL END PHRASE
     (16#E0001#, 16#E0001#),  -- LANGUAGE TAG .. LANGUAGE TAG
     (16#E0020#, 16#E007F#)); -- TAG SPACE .. CANCEL TAG

Why on earth would you suggest treating zero width no-break space or invisible separator in a manner different from Soft Hyphen? (Not sure what tab has to do with this; it is not an other_format character.) > I would be very much opposed to picking characters one by one.
We should > take entire categories of characters, as defined by Unicode, if only > because we don't have a good understanding of the purpose of all these > weird 16- and 32-bit characters. We trust that the Unicode folks got the > categorization right. Exactly, I agree > I could live with other_format in literals. I'd rather not include tabs > (which would effectively mean allow everything except format_effectors) > because we have all been bitten by tabs at one point or another in our > life. Remember, other_format is the only category that creates an > incompatibility. Right, I would allow everything except format effectors; that's an old Ada tradition, and I think it is fine to extend it to separator_line and separator_paragraph. Think for a moment of an environment where Line Separator is used routinely to end lines; you really do NOT want to allow this in string literals as well. **************************************************************** From: Pascal Leroy Sent: Friday, January 28, 2005 3:29 AM > But we can live with this if necessary. Good Ada style is > never to take advantage of the case equivalence in any case. I noticed that issue while reviewing the AARM, and added an IA to say roughly "if you target a culture where some locale-dependent case folding rule is more appropriate, by all means, provide a nonstandard mode that supports this case folding". Being an IA, it doesn't impose anything on implementations, but it draws their attention to the fact that other case folding rules exist, and it also tells users that it's a legitimate request that they may want to bring up with their vendor (provided that they are willing to put a laaarge number of Turkish Lira on the table ;-) As far as I know only Turkish and Lithuanian have that problem (well, ancient Greek too, but I don't expect many people to program in ancient Greek). > P.S. Pascal's tables in the AI for letters and numbers are > significantly wrong.
If people are interested, I am happy to > post GNAT's understanding of the unicode categorizations :-) I know, I know, but please don't post your tables. If I read any piece of code from GNAT my brain gets polluted by public domain software, and IBM won't let me work on Apex anymore (it does sound silly, but it's absolutely true). **************************************************************** From: Robert Dewar Sent: Friday, January 28, 2005 4:21 AM > know, but please don't post your tables. If I read any piece of code from > GNAT my brain gets polluted by public domain software, and IBM won't let > me work on Apex anymore (it does sound silly, but it's absolutely true). That is indeed complete nonsense, given this is under the GMGPL, and you can perfectly well incorporate it into Apex with no legal problems whatever. I really think it would be better to have an agreed-on set of tables that we all use. Uniformity of implementations is more important than the junk rules we have agreed to implement! **************************************************************** From: Robert Dewar Sent: Friday, January 28, 2005 4:23 AM By the way, NO PART of gnat is public domain; please do not spread this seriously incorrect misconception. All our software is copyrighted and we object to people trying to dilute our copyright by claiming that our software is in the public domain. Thanks for being careful on this point in future. For sure, you have to check licensing conditions. In this case, the GNAT code is under the GMGPL precisely so that other proprietary implementations can share the tables if they wish. **************************************************************** From: Pascal Leroy Sent: Friday, January 28, 2005 5:12 AM I realize that, and "public domain" was just a convenient shorthand, although I should have been more precise/careful. Sorry about that. > For sure, you have to check licensing conditions.
In this > case, the GNAT code is under the GMGPL precisely so that > other proprietary implementations can share the tables if they wish. I understand that, but I was not actually kidding: we had to remove libraries covered by the LGPL from some of our products before the lawyers objected to it. Anyway, this is getting totally off-topic... **************************************************************** From: Robert Dewar Sent: Saturday, January 29, 2005 10:03 AM > These tables were posted when Rational was still an independent company, > and Rational had a very lax policy with respect to IP, so it would have > been fine at the time to incorporate them in a product. They didn't have > a copyright notice, and that was on purpose. a) copyright notices have zero legal significance (true since 1986) b) tables like this are not copyrightable anyway > > This is an irrelevant question. The only document you need to implement > Ada-related products is the AARM for Ada 2005, and certainly the copyright > situation for the AARM is very clear. That surely is NOT true here; the necessary information for implementing the unicode stuff is not incorporated in the AARM (should it be? I would think the answer should be yes, we should include the tables in the AARM). Is the copyright situation for the AARM clear? It is owned by everyone who has contributed text. Have they all signed waivers or assignments? **************************************************************** From: Robert Dewar Sent: Saturday, January 29, 2005 1:43 PM As it turns out, the tables are too long to post to the list anyway. If anyone wants a copy of the tables, send me some email and I will attach them to the reply. Note that the tables themselves are not copyrightable elements, since they are derived from external requirements (so there is no creative element, see Altai vs Computer Associates for a well worthwhile analysis of what is and what is not copyrightable when it comes to software).
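To make concrete the scanning rule that Pascal and Robert converged on earlier in this thread (strip the other_format characters first, then decide between reserved word and identifier, and apply the underline rules only to what remains), here is a minimal Ada sketch. It is an illustration under stated assumptions, not GNAT's, Apex's or any other vendor's implementation: the names Word_Scan_Sketch, Code_Range, Range_Table, Strip, Is_Reserved and Classify are hypothetical; only five of the UTF_32_Other_Format ranges quoted earlier are copied; the reserved-word list is a tiny hypothetical subset; case folding is limited to 7-bit characters; and apart from the underline rules no other identifier syntax (such as the leading-letter requirement) is checked.

```ada
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Text_IO;             use Ada.Text_IO;

procedure Word_Scan_Sketch is
   --  Soft hyphen (an other_format character), used to build test words.
   SH : constant Wide_Wide_Character := Wide_Wide_Character'Val (16#AD#);

   --  (Lo, Hi) code-position pairs shaped like UTF_32_Other_Format;
   --  only five of its ranges are copied here (hypothetical subset).
   type Code_Range is record
      Lo, Hi : Natural;
   end record;
   type Range_Table is array (Positive range <>) of Code_Range;
   Other_Format : constant Range_Table :=
     ((16#000AD#, 16#000AD#),   -- SOFT HYPHEN
      (16#0200C#, 16#0200F#),   -- ZERO WIDTH NON-JOINER .. RIGHT-TO-LEFT MARK
      (16#0202A#, 16#0202E#),   -- LEFT-TO-RIGHT EMBEDDING .. RIGHT-TO-LEFT OVERRIDE
      (16#02060#, 16#02063#),   -- WORD JOINER .. INVISIBLE SEPARATOR
      (16#0FEFF#, 16#0FEFF#));  -- ZERO WIDTH NO-BREAK SPACE

   type Word_Kind is (Reserved_Word, Identifier, Illegal);

   function Is_Other_Format (C : Wide_Wide_Character) return Boolean is
      P : constant Natural := Wide_Wide_Character'Pos (C);
   begin
      for I in Other_Format'Range loop
         if P in Other_Format (I).Lo .. Other_Format (I).Hi then
            return True;
         end if;
      end loop;
      return False;
   end Is_Other_Format;

   --  Step 1: drop every other_format character from the word.
   function Strip (Word : Wide_Wide_String) return Wide_Wide_String is
      Result : Wide_Wide_String (1 .. Word'Length);
      Last   : Natural := 0;
   begin
      for I in Word'Range loop
         if not Is_Other_Format (Word (I)) then
            Last := Last + 1;
            Result (Last) := Word (I);
         end if;
      end loop;
      return Result (1 .. Last);
   end Strip;

   --  Hypothetical, deliberately tiny subset of the reserved words.
   function Is_Reserved (Upper : String) return Boolean is
   begin
      return Upper = "ACCESS" or else Upper = "BEGIN"
        or else Upper = "IF" or else Upper = "PROTECTED";
   end Is_Reserved;

   --  Step 2: classify what is left after stripping.
   function Classify (Word : Wide_Wide_String) return Word_Kind is
      S         : constant Wide_Wide_String := Strip (Word);
      Seven_Bit : Boolean := True;
   begin
      if S'Length = 0 then
         return Illegal;
      end if;
      for I in S'Range loop
         if Wide_Wide_Character'Pos (S (I)) > 127 then
            Seven_Bit := False;
         end if;
      end loop;
      --  Only 7-bit words can be reserved words (no dotless-i "if").
      if Seven_Bit then
         declare
            Upper : String (1 .. S'Length);
         begin
            for I in S'Range loop
               Upper (I - S'First + 1) :=
                 To_Upper (Character'Val (Wide_Wide_Character'Pos (S (I))));
            end loop;
            if Is_Reserved (Upper) then
               return Reserved_Word;
            end if;
         end;
      end if;
      --  Underline rules, applied AFTER stripping, so an invisible
      --  character cannot hide a leading, trailing or doubled underline.
      if S (S'First) = '_' or else S (S'Last) = '_' then
         return Illegal;
      end if;
      for I in S'First .. S'Last - 1 loop
         if S (I) = '_' and then S (I + 1) = '_' then
            return Illegal;
         end if;
      end loop;
      return Identifier;
   end Classify;
begin
   Put_Line (Word_Kind'Image (Classify ("pro" & SH & "tected")));
   Put_Line (Word_Kind'Image (Classify ("A_" & SH & "_B")));
   Put_Line (Word_Kind'Image (Classify ("Wide" & SH & "_Text")));
end Word_Scan_Sketch;
```

Running it should print RESERVED_WORD, ILLEGAL and IDENTIFIER, one per line: the soft hyphen vanishes from pro-tected, leaving a reserved word; it cannot mask the doubled underline in A__B; and it is simply ignored in Wide_Text.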
**************************************************************** From: Robert Dewar Sent: Saturday, January 29, 2005 1:48 PM Pascal Leroy wrote: > I understand that, but I was not actually kidding: we had to remove > libraries covered by the LGPL from some of our products before the lawyers > objected to it. > > Anyway, this is getting totally off-topic... I know, but it is worth while making one more point. Indeed your lawyers understandably disliked the LGPL, since it places pretty severe restrictions on the distribution, namely that things must be distributed in object form in a manner that makes it possible to relink with modified versions of the LGPL'ed units, and if you change any of these units, you have to distribute the sources. The GMGPL (a similar license is used for the gnu c and g++ libraries) is quite different and much more permissive, it places no restriction whatever on the distribution of executable programs containing GMGPL'ed code. The whole point of this license is to remove any impediments to use of the related components in proprietary or classified code. **************************************************************** From: Dan Eilers Sent: Saturday, January 29, 2005 2:31 PM > IBM of course has rather stricter policies, so I suppose that today I > wouldn't post the tables anymore, at least not without a copyright notice > and/or approval by our legal department. This of course doesn't answer the question. Does IBM or any other company have any known IP claims on the contents of any AI's? 
I would not be the least bit surprised if IBM has current or pending patents or other IP claims on Unicode, and if so, the licensing rights should be clearly stated before we go adding Unicode tables and algorithms to our compilers, and especially before we go adding a gratuitous Unicode character to the spec of a predefined Ada package, which, as noted in the Atlanta ARG minutes, is intended to force Unicode algorithms not only on Ada compilers but also on Ada toolsets. I would also not be the least bit surprised if some of the AI's, especially those that standardize features from existing implementations, such as pragma assert, pragma no_return, etc., are subject to current or pending IP claims. > > If so, please let us know which AI's these are... > > This is an irrelevant question. I think it is highly relevant. There have been some very high visibility recent cases of companies implementing a standard getting sued because the standard is claimed to infringe the IP of one of the members of the standardization committee, witness Rambus, for example. There is certainly precedent for patenting Ada compiler implementation techniques, witness the DEC Generics patent. There is certainly precedent for patenting compiler implementation techniques in general, witness the patents on implementing multiple inheritance, and SSA, for example. I am not aware of any effort within the ARG or WG9 to head off such difficulties. Quite to the contrary, ARG members are claiming to be afraid to read public domain comments from other vendors, for fear of brain pollution. > The only document you need to implement > Ada-related products is the AARM for Ada 2005, ... Certainly you are not claiming that all the implementation information from the AI's is in the AARM? Certainly you are not claiming that someone implementing the AARM will be automatically free of IP claims? > and certainly the copyright > situation for the AARM is very clear. What is the copyright situation for the AARM??? 
Is the AARM being produced by the ARG? By Ada Europe? By AXE Consultants? By Mitre? By Springer Verlag? My understanding is that the ARG is limiting itself to producing only an Amendment document, and not a new RM or annotated RM. **************************************************************** From: Randy Brukardt Sent: Saturday, January 29, 2005 10:21 PM > What is the copyright situation for the AARM??? Is the AARM being > produced by the ARG? By Ada Europe? By AXE Consultants? By Mitre? > By Springer Verlag? It's quite clear from the copyright pages of the AARM; they were updated first, before any copies went out. Same with the Amendment document. I can't say if anyone could claim rights to some part of these documents. My guess is, sure someone *could* - even to parts of Ada 83. But all of those contributions were made to public mailing lists, where there is an implied intent to allow others to use the contributions unencumbered. (It probably would be prudent to post such a statement for both ARG and Ada-Comment. But that would be insufficient to avoid any *possibility* of problems. I doubt that there is *anything* that could be done that would avoid any *possibility* of problems, short of the Shakespeare solution.) Ada could use some publicity anyway. :-) **************************************************************** From: Robert Dewar Sent: Saturday, January 29, 2005 10:28 PM > This of course doesn't answer the question. Does IBM or any other > company have any known IP claims on the contents of any AI's? No one knows the status of comments made to public newsgroups. Obviously individual comments are copyrighted by their author unless copyright is specifically disclaimed. On the other hand, everyone assumes that quoting such comments (as I am doing here) is fair use. I think a jury would agree, but no one knows what a jury might or might not do, and you can't live your life worrying about jury opinions that have not been stated and might never be. 
> I would not be the least bit surprised if IBM has current or pending > patents or other IP claims on Unicode, and if so, the licensing rights > should be clearly stated before we go adding Unicode tables and algorithms > to our compilers, and especially before we go adding a gratuitous > Unicode character to the spec of a predefined Ada package, which, > as noted in the Atlanta ARG minutes, is intended to force Unicode > algorithms not only on Ada compilers but also on Ada toolsets. You can perfectly well look up the patents if you want. To say you think there may be such without bothering to find out definitely seems mere FUD to me. > I would also not be the least bit surprised if some of the AI's, > especially those that standardize features from existing implementations, > such as pragma assert, pragma no_return, etc., are subject to current > or pending IP claims. Well, pragma assert and pragma no_return come from GNAT, where there are definitely no such claims. I don't see the point in executing some document to that effect in this case, but it could be done. In any case, these are language features, and the issue is being compatible with other implementations of other languages. Dan, reread the Lotus-Borland case, and it should put your mind at rest that no such IP's can be asserted. > I think it is highly relevant. There have been some very high visibility > recent cases of companies implementing a standard getting sued because > the standard is claimed to infringe the IP of one of the members of the > standardization committee, witness Rambus, for example. True, but I don't see Unicode as a concern here, and would recommend (not as an attorney, but also not as a complete neophyte; I have been certified as an expert in copyright matters several times in federal court) that we not waste time on this issue. I do not regard it as a legitimate argument against pi, though I agree that this is a gratuitous feature.
I don't think it's so bad in practice, since I think Ada programmers have better taste than the ARG and will not use it :-) I would recommend that tools not bother with it. Either you make a tool completely wide wide aware (which is likely to be a huge job), or you just say that your tool only handles non wide stuff, and the (mis)use of pi is just a special case of programs that are not handled. > There is certainly precedent for patenting Ada compiler implementation > techniques, witness the DEC Generics patent. There is certainly > precedent for patenting compiler implementation techniques in general, > witness the patents on implementing multiple inheritance, and SSA, for > example. Yes, but that's really not an issue here. I think this is just FUD that can safely be ignored. Certainly making a judgment as the CEO of AdaCore (I have to make judgments like this all the time > I am not aware of any effort within the ARG or WG9 to head off such > difficulties. Quite to the contrary, ARG members are claiming to be > afraid to read public domain comments from other vendors, for fear of > brain pollution. Well the brain pollution problem is Pascal's own, and I don't think it should legitimately stop anyone from posting anything in this mailing list. Indeed companies who are REALLY serious about the brain pollution issue would not dream of letting an employee for whom this is a concern serve on a standards committee (I know of several such cases). I still don't think the ARG need spend any time worrying about this issue. If there is any action required it is at a higher level, and I would encourage Dan to make waves at those higher levels to see if he can attract any concern (I would expect not). > Certainly you are not claiming that someone implementing the AARM > will be automatically free of IP claims? No one is ever free of such claims. Anyone can sue anyone, anytime, about anything. 
If you write any code, you can be sued for patent violations concerning patents you do not know about, and could not know about. I would worry far more about that problem than the issue here with Ada language design and features. **************************************************************** From: Robert Dewar Sent: Saturday, January 29, 2005 11:15 PM The Unicode patent policy is stated in http://www.unicode.org/policies/patent_policy.html I have read this document in its entirety, and my judgment is that this completely lays to rest any concerns that Dan Eilers has expressed with regard to patents and Unicode, but I encourage others to read this document if they have any concerns. **************************************************************** From: Dan Eilers Sent: Saturday, January 29, 2005 9:34 PM > You can perfectly well look up the patents if you want. To > say you think there may be such without bothering to find > out definitely seems mere FUD to me. I was hoping I wouldn't have to, since rummaging through patent databases is nontrivial. The implications of a patent are not at all clear from the title, and the legalese used in the descriptions is often designed more for obfuscation than for clarity, in an attempt to be as widely applicable as possible. And of course a patent database doesn't help at all with patents that haven't yet been published, which are the ones that seem to have caused the most trouble in standardization efforts. Well, rummaging through the patent database at http://www.freepatentsonline.com turns up 603 patents mentioning the word Unicode, some of which are assigned to IBM. I don't have any idea how serious this is without taking a lot of time to examine each of these. ...[Editor's note: Part of this message is filed in AI-388, along with appropriate responses to it.] > The Unicode patent policy is stated in > > http://www.unicode.org/policies/patent_policy.html This requires non-discriminatory licensing terms.
It doesn't say what those terms are, or which companies have IP claims related to Unicode. **************************************************************** From: Robert Dewar Sent: Sunday, January 30, 2005 2:31 AM Can I suggest that any further messages about patents or copyright issues with Unicode identify themselves with something like the above, so that the thread is identified, and easily ignored by those who do not want to follow it further (which certainly includes me!) **************************************************************** From: Pascal Leroy Sent: Sunday, January 30, 2005 5:59 AM > This of course doesn't answer the question. Does IBM or any > other company have any known IP claims on the contents of any AI's? It surely doesn't answer the question, because I am neither competent nor allowed to make legal statements for IBM. If you are really worried about this, I suggest that your attorneys contact the IBM General Legal and Intellectual Property Law Department, in Armonk, NY, USA. **************************************************************** From: Robert Dewar Sent: Sunday, January 30, 2005 12:01 PM chuckle :-) If he follows your advice, the IBM lawyers can probably keep Dan occupied for months :-) I must say, Dan is an expert at undermining his own arguments by moving into unreasonable arguments. P.S. Sorry I will miss the ARG meeting again, but Gary will be there. **************************************************************** From: Robert Dewar Sent: Sunday, January 30, 2005 12:11 PM Oops, this was really not supposed to go to the list (the ARG list is one of these nasty lists where a specific reply goes to the list and not to the sender). Never mind, I think Dan knows my point of view at this stage! I actually think that Dan's original argument against AI-388 had quite a bit of merit. This really was a gratuitous change, not requested by any real Ada user as far as I know. Never mind, there are worse problems.
Actually these days, IBM is not the main target to worry about if you are worried about patent issues in contexts like this (I assume everyone is aware of IBM's recent welcome actions in this area). **************************************************************** From: Dan Eilers Sent: Monday, January 31, 2005 4:21 AM Anytime someone proposes including in an international standard some feature that their company has patented or has patents pending, I think it is reasonable to expect that such a proposal (and if the proposal is accepted, the new standard document itself) be accompanied by a statement clarifying the intended licensing conditions for such patents. **************************************************************** From: Robert Dewar Sent: Sunday, January 30, 2005 8:34 AM [The comment below is posted in AI-388.] > There is obviously no pressing issue in the RM or in other AIs for > a better looking Pi symbol. I suppose it could be argued that SC22's > resolution 02-24 for "appropriate support of Unicode" constitutes > a pressing need for certain Unicode characters _other_ than Pi. > For example, Unicode recommends that paired quotation marks use > U+201C and U+201D instead of the ASCII double-quote mark that > Ada currently uses. I am not suggesting following this recommendation, > but if there is a pressing need to follow Unicode recommendations as > closely as possible, then Pi is the wrong symbol to start with. Actually I think that supporting U+201C and U+201D for strings would make perfectly good sense in the context of AI285, and I would recommend adding it, since it is a negligible extra burden in the context of AI-285. **************************************************************** From: Dan Eilers Sent: Monday, January 31, 2005 3:21 AM Just to be clear.
Some of the reasons that I am not suggesting following this recommendation are: 1) Pascal has made it very clear that the deadline for submitting new proposals was more than a year ago, and for a new AI to stand a chance, it must address a pressing issue in the existing RM or in other AIs, and this does not come anywhere close to meeting that standard. 2) Even if the deadline had not passed, the language is insufficiently broke. No users are clamoring for this, and there are a host of features and libraries available in competing languages that users are clamoring for. 3) Consistency of language design is important. If Unicode directional quotes were allowed, there would soon be a call for Unicode "/=", and if that were allowed, there would soon be a call for Unicode "<=" and ">=". And if those were allowed, there would soon be a call for multiply and divide symbols, etc. The ARG would end up spending all its time debating each possible Unicode symbol, with no guiding principle. 4) Code portability is more important than the appearance of mathematical symbols. Users in a Unicode environment would be tempted to use these symbols, making their code non-portable to non-Unicode environments. **************************************************************** From: Pascal Leroy Sent: Monday, January 31, 2005 5:01 AM > Actually I think that supporting U+201C and U+201D for > strings would make perfectly good sense in the context of > AI285, and I would recommend adding it, since it is a > negligible extra burden in the context of AI-285. Don't go there! There was a clear opposition from some countries to allowing extended character sets in contexts other than identifiers (and string literals and comments, but that was already present in Ada 95). I originally had proposed to allow extended characters in numeric literals, and this was rejected. Furthermore, as explained by the Unicode standard, use of the quotation marks is extremely culture-dependent. 
The above characters would be appropriate for English, but not for French, and U+201C which is opening in English is closing in German. In fact I cannot find a recommendation to allow U+201C and U+201D to delimit string literals in the Unicode standard, the only thing I can find is an indication that these are the preferred characters to use in English text. Finally, ECMA C#, which is a good model from this perspective, because it's one of the few ISO languages that support Unicode, only allows U+0022 (the good ol' double quote) to delimit string literals. **************************************************************** From: Robert A. Duff Sent: Monday, January 31, 2005 7:14 AM > Don't go there! I wouldn't dare go there. ;-) But I'm curious: >... There was a clear opposition from some countries to > allowing extended character sets in contexts other than identifiers (and > string literals and comments, but that was already present in Ada 95). What was the basis of this opposition? Were they generally opposed, or was it particular characters or lexical elements, such as num lits mentioned below? >...I > originally had proposed to allow extended characters in numeric literals, > and this was rejected. ... > The above characters would be > appropriate for English, but not for French, ... We use French-style quotes for labels, right? **************************************************************** From: Pascal Leroy Sent: Monday, January 31, 2005 8:07 AM > What was the basis of this opposition? Were they generally > opposed, or was it particular characters or lexical elements, > such as num lits mentioned below? Numeric literals were discussed in great detail, and there was a feeling that the "usual" digits (I first wrote "arabic" but of course although Europe got these digits from the Arabs, they use distinct forms these days) are understood by programmers, and that allowing other digits in literals would lead to confusion. 
For instance, Unicode has special characters for the Roman digits, which look furiously like X, V, I, etc. Consider the assignment: A := VIII; Is the right-hand side an identifier, or a literal? You need good eyesight to distinguish the letter V from the Roman digit V. Furthermore, is the value of the literal 8 or 5111? Hmm. In general, you want to refrain from introducing locale-dependent issues in the lexical elements (as far as I can tell, the only locale-dependent issue we have at this point is the fact that the case folding rule is not ideal for Turkish and Lithuanian). Identifiers are much less problematic, and they do bring more value, as it makes sense for programmers, say, in Russia, to be able to use their native alphabet and not rely on some phony transliteration. > We use French-style quotes for labels, right? We use Jean's approximation of French quotes, built from ASCII characters. Of course, Unicode now has specific characters for the French quotes. **************************************************************** From: Robert Dewar Sent: Monday, January 31, 2005 11:20 AM >...I >originally had proposed to allow extended characters in numeric literals, >and this was rejected. I would hope so! I can't believe anyone would even suggest this :-) **************************************************************** From: Robert Dewar Sent: Monday, January 31, 2005 11:26 AM > Numeric literals were discussed in great detail, and there was a feeling > that the "usual" digits (I first wrote "arabic" but of course although > Europe got these digits from the Arabs, they use distinct forms these > days) Arabic digits are still used in West Arabic writing (e.g. Morocco). It is the East Arabic writing that uses the other set of digits. > are understood by programmers, and that allowing other digits in > literals would lead to confusion. For instance, Unicode has special > characters for the Roman digits, which look furiously like X, V, I, etc.
> Consider the assignment: > > A := VIII; > > Is the right-hand side an identifier, or a literal? You need > good eyesight to distinguish the letter V from the Roman digit V. > Furthermore, is the value of the literal 8 or 5111? Hmm. The fact that this kind of nonsense was even discussed gives a lot more insight into how things went a bit far. Are you seriously saying that someone suggested that we support Roman numerals for literals? Please tell me this is a joke :-) > > In general, you want to refrain from introducing locale-dependent issues > in the lexical elements (as far as I can tell, the only locale-dependent > issue we have at this point is the fact that the case folding rule is not > ideal for Turkish and Lithuanian). not ideal = plain wrong > Identifiers are much less problematic, and they do bring more value, as it > makes sense for programmers, say, in Russia, to be able to use their > native alphabet and not rely on some phony transliteration. Actually most programmers prefer to maintain the portability of sticking to lower half ASCII. GNAT has allowed wide characters in identifiers forever, but we have never seen any serious use of it, just a bit of academic experimentation. I doubt Ada 2005 will be different. **************************************************************** From: Robert A. Duff Sent: Monday, January 31, 2005 5:40 PM Hmm. Maybe by 2015 Unicode will be ubiquitous. Remember that Ada 83 was designed with the fact in mind that lowercase letters might not be available. Colonel Whitaker used to send us (Intermetrics) bug reports in ALL CAPS. In my own code, I'm pretty happy with 7-bit ASCII for identifiers, but I can understand why other folks might want more. But I've always thought things like <= are rather uncivilized compared to the notation I learned in grade school.
There's a program called a2ps that knows how to print various programming languages; for Ada, it boldfaces keywords, and prints a proper less-than-or-equal symbol, and so forth. I was once hired by a law firm on an intellectual property rights case. The code was not Ada. Most of the comments were in Kanji. Most of the identifiers were in English. I wonder what this Roman Numeral might mean: X := 16#VIII#; -- ;-) **************************************************************** From: Robert Dewar Sent: Saturday, January 29, 2005 1:53 PM In Robert's view, these tables which GNAT currently uses are not copyrightable elements, since they are derived in an obvious and mechanical way from the external requirements. All the tables except the case conversion one have been recomputed using a program I wrote from what I understand to be the latest Unicode categorization tables. Still no guarantees :-) The case conversion table is from the AI, and may thus be out of date. Frankly, I have a hard time getting up the energy to check it since, as I wrote in a previous note, it seems to me to be a piece of junk. Still, if some customer finds a bug, or the ACATS tests do, then we will fix it at that point :-) I have not included the (fairly trivial) code that goes with these tables that provides convenient interfaces for use by a client, including an efficient binary search through the tables. If anyone is interested, let me know, and I will email you privately the entire GMGPL'ed unit from the GNAT sources. (comments welcome)
-----------------------------------------------
-- Tables for UTF_32 Categorization Routines --
-----------------------------------------------
-- Note these tables are derived from those given in AI-285. For details
-- see //www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-00285.TXT?rev=1.22.
type UTF_32_Range is record
   Lo : Char_Code;
   Hi : Char_Code;
end record;
type UTF_32_Ranges is array (Positive range <>) of UTF_32_Range;
-- The following array includes all characters considered digits, i.e.
-- all characters from the Unicode table with categories:
--   Number, Decimal Digit (Nd)
UTF_32_Digits : constant UTF_32_Ranges := (
   (16#00030#, 16#00039#), -- DIGIT ZERO .. DIGIT NINE
   (16#00660#, 16#00669#), -- ARABIC-INDIC DIGIT ZERO .. ARABIC-INDIC DIGIT NINE
   (16#006F0#, 16#006F9#), -- EXTENDED ARABIC-INDIC DIGIT ZERO .. EXTENDED ARABIC-INDIC DIGIT NINE
   (16#00966#, 16#0096F#), -- DEVANAGARI DIGIT ZERO .. DEVANAGARI DIGIT NINE
   (16#009E6#, 16#009EF#), -- BENGALI DIGIT ZERO .. BENGALI DIGIT NINE
   (16#00A66#, 16#00A6F#), -- GURMUKHI DIGIT ZERO .. GURMUKHI DIGIT NINE
   (16#00AE6#, 16#00AEF#), -- GUJARATI DIGIT ZERO .. GUJARATI DIGIT NINE
   (16#00B66#, 16#00B6F#), -- ORIYA DIGIT ZERO .. ORIYA DIGIT NINE
   (16#00BE7#, 16#00BEF#), -- TAMIL DIGIT ONE .. TAMIL DIGIT NINE
   (16#00C66#, 16#00C6F#), -- TELUGU DIGIT ZERO .. TELUGU DIGIT NINE
   (16#00CE6#, 16#00CEF#), -- KANNADA DIGIT ZERO .. KANNADA DIGIT NINE
   (16#00D66#, 16#00D6F#), -- MALAYALAM DIGIT ZERO .. MALAYALAM DIGIT NINE
   (16#00E50#, 16#00E59#), -- THAI DIGIT ZERO .. THAI DIGIT NINE
   (16#00ED0#, 16#00ED9#), -- LAO DIGIT ZERO .. LAO DIGIT NINE
   (16#00F20#, 16#00F29#), -- TIBETAN DIGIT ZERO .. TIBETAN DIGIT NINE
   (16#01040#, 16#01049#), -- MYANMAR DIGIT ZERO .. MYANMAR DIGIT NINE
   (16#01369#, 16#01371#), -- ETHIOPIC DIGIT ONE .. ETHIOPIC DIGIT NINE
   (16#017E0#, 16#017E9#), -- KHMER DIGIT ZERO .. KHMER DIGIT NINE
   (16#01810#, 16#01819#), -- MONGOLIAN DIGIT ZERO .. MONGOLIAN DIGIT NINE
   (16#01946#, 16#0194F#), -- LIMBU DIGIT ZERO .. LIMBU DIGIT NINE
   (16#0FF10#, 16#0FF19#), -- FULLWIDTH DIGIT ZERO .. FULLWIDTH DIGIT NINE
   (16#104A0#, 16#104A9#), -- OSMANYA DIGIT ZERO .. OSMANYA DIGIT NINE
   (16#1D7CE#, 16#1D7FF#)); -- MATHEMATICAL BOLD DIGIT ZERO .. MATHEMATICAL MONOSPACE DIGIT NINE
-- The following table includes all characters considered letters, i.e.
-- all characters from the Unicode table with categories:
--   Letter, Uppercase (Lu)
--   Letter, Lowercase (Ll)
--   Letter, Titlecase (Lt)
--   Letter, Modifier (Lm)
--   Letter, Other (Lo)
--   Number, Letter (Nl)
UTF_32_Letters : constant UTF_32_Ranges := (
   (16#00041#, 16#0005A#), -- LATIN CAPITAL LETTER A .. LATIN CAPITAL LETTER Z
   (16#00061#, 16#0007A#), -- LATIN SMALL LETTER A .. LATIN SMALL LETTER Z
   (16#000AA#, 16#000AA#), -- FEMININE ORDINAL INDICATOR .. FEMININE ORDINAL INDICATOR
   (16#000B5#, 16#000B5#), -- MICRO SIGN .. MICRO SIGN
   (16#000BA#, 16#000BA#), -- MASCULINE ORDINAL INDICATOR .. MASCULINE ORDINAL INDICATOR
   (16#000C0#, 16#000D6#), -- LATIN CAPITAL LETTER A WITH GRAVE .. LATIN CAPITAL LETTER O WITH DIAERESIS
   (16#000D8#, 16#000F6#), -- LATIN CAPITAL LETTER O WITH STROKE .. LATIN SMALL LETTER O WITH DIAERESIS
   (16#000F8#, 16#00236#), -- LATIN SMALL LETTER O WITH STROKE .. LATIN SMALL LETTER T WITH CURL
   (16#00250#, 16#002C1#), -- LATIN SMALL LETTER TURNED A .. MODIFIER LETTER REVERSED GLOTTAL STOP
   (16#002C6#, 16#002D1#), -- MODIFIER LETTER CIRCUMFLEX ACCENT .. MODIFIER LETTER HALF TRIANGULAR COLON
   (16#002E0#, 16#002E4#), -- MODIFIER LETTER SMALL GAMMA .. MODIFIER LETTER SMALL REVERSED GLOTTAL STOP
   (16#002EE#, 16#002EE#), -- MODIFIER LETTER DOUBLE APOSTROPHE .. MODIFIER LETTER DOUBLE APOSTROPHE
   (16#0037A#, 16#0037A#), -- GREEK YPOGEGRAMMENI .. GREEK YPOGEGRAMMENI
   (16#00386#, 16#00386#), -- GREEK CAPITAL LETTER ALPHA WITH TONOS .. GREEK CAPITAL LETTER ALPHA WITH TONOS
   (16#00388#, 16#0038A#), -- GREEK CAPITAL LETTER EPSILON WITH TONOS .. GREEK CAPITAL LETTER IOTA WITH TONOS
   (16#0038C#, 16#0038C#), -- GREEK CAPITAL LETTER OMICRON WITH TONOS .. GREEK CAPITAL LETTER OMICRON WITH TONOS
   (16#0038E#, 16#003A1#), -- GREEK CAPITAL LETTER UPSILON WITH TONOS .. GREEK CAPITAL LETTER RHO
   (16#003A3#, 16#003CE#), -- GREEK CAPITAL LETTER SIGMA .. GREEK SMALL LETTER OMEGA WITH TONOS
   (16#003D0#, 16#003F5#), -- GREEK BETA SYMBOL .. GREEK LUNATE EPSILON SYMBOL
   (16#003F7#, 16#003FB#), -- GREEK CAPITAL LETTER SHO .. GREEK SMALL LETTER SAN
   (16#00400#, 16#00481#), -- CYRILLIC CAPITAL LETTER IE WITH GRAVE .. CYRILLIC SMALL LETTER KOPPA
   (16#0048A#, 16#004CE#), -- CYRILLIC CAPITAL LETTER SHORT I WITH TAIL .. CYRILLIC SMALL LETTER EM WITH TAIL
   (16#004D0#, 16#004F5#), -- CYRILLIC CAPITAL LETTER A WITH BREVE .. CYRILLIC SMALL LETTER CHE WITH DIAERESIS
   (16#004F8#, 16#004F9#), -- CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS .. CYRILLIC SMALL LETTER YERU WITH DIAERESIS
   (16#00500#, 16#0050F#), -- CYRILLIC CAPITAL LETTER KOMI DE .. CYRILLIC SMALL LETTER KOMI TJE
   (16#00531#, 16#00556#), -- ARMENIAN CAPITAL LETTER AYB .. ARMENIAN CAPITAL LETTER FEH
   (16#00559#, 16#00559#), -- ARMENIAN MODIFIER LETTER LEFT HALF RING .. ARMENIAN MODIFIER LETTER LEFT HALF RING
   (16#00561#, 16#00587#), -- ARMENIAN SMALL LETTER AYB .. ARMENIAN SMALL LIGATURE ECH YIWN
   (16#005D0#, 16#005EA#), -- HEBREW LETTER ALEF .. HEBREW LETTER TAV
   (16#005F0#, 16#005F2#), -- HEBREW LIGATURE YIDDISH DOUBLE VAV .. HEBREW LIGATURE YIDDISH DOUBLE YOD
   (16#00621#, 16#0063A#), -- ARABIC LETTER HAMZA .. ARABIC LETTER GHAIN
   (16#00640#, 16#0064A#), -- ARABIC TATWEEL .. ARABIC LETTER YEH
   (16#0066E#, 16#0066F#), -- ARABIC LETTER DOTLESS BEH .. ARABIC LETTER DOTLESS QAF
   (16#00671#, 16#006D3#), -- ARABIC LETTER ALEF WASLA .. ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
   (16#006D5#, 16#006D5#), -- ARABIC LETTER AE .. ARABIC LETTER AE
   (16#006E5#, 16#006E6#), -- ARABIC SMALL WAW .. ARABIC SMALL YEH
   (16#006EE#, 16#006EF#), -- ARABIC LETTER DAL WITH INVERTED V .. ARABIC LETTER REH WITH INVERTED V
   (16#006FA#, 16#006FC#), -- ARABIC LETTER SHEEN WITH DOT BELOW .. ARABIC LETTER GHAIN WITH DOT BELOW
   (16#006FF#, 16#006FF#), -- ARABIC LETTER HEH WITH INVERTED V .. ARABIC LETTER HEH WITH INVERTED V
   (16#00710#, 16#00710#), -- SYRIAC LETTER ALAPH .. SYRIAC LETTER ALAPH
   (16#00712#, 16#0072F#), -- SYRIAC LETTER BETH .. SYRIAC LETTER PERSIAN DHALATH
   (16#0074D#, 16#0074F#), -- SYRIAC LETTER SOGDIAN ZHAIN .. SYRIAC LETTER SOGDIAN FE
   (16#00780#, 16#007A5#), -- THAANA LETTER HAA .. THAANA LETTER WAAVU
   (16#007B1#, 16#007B1#), -- THAANA LETTER NAA .. THAANA LETTER NAA
   (16#00904#, 16#00939#), -- DEVANAGARI LETTER SHORT A .. DEVANAGARI LETTER HA
   (16#0093D#, 16#0093D#), -- DEVANAGARI SIGN AVAGRAHA .. DEVANAGARI SIGN AVAGRAHA
   (16#00950#, 16#00950#), -- DEVANAGARI OM .. DEVANAGARI OM
   (16#00958#, 16#00961#), -- DEVANAGARI LETTER QA .. DEVANAGARI LETTER VOCALIC LL
   (16#00985#, 16#0098C#), -- BENGALI LETTER A .. BENGALI LETTER VOCALIC L
   (16#0098F#, 16#00990#), -- BENGALI LETTER E .. BENGALI LETTER AI
   (16#00993#, 16#009A8#), -- BENGALI LETTER O .. BENGALI LETTER NA
   (16#009AA#, 16#009B0#), -- BENGALI LETTER PA .. BENGALI LETTER RA
   (16#009B2#, 16#009B2#), -- BENGALI LETTER LA .. BENGALI LETTER LA
   (16#009B6#, 16#009B9#), -- BENGALI LETTER SHA .. BENGALI LETTER HA
   (16#009BD#, 16#009BD#), -- BENGALI SIGN AVAGRAHA .. BENGALI SIGN AVAGRAHA
   (16#009DC#, 16#009DD#), -- BENGALI LETTER RRA .. BENGALI LETTER RHA
   (16#009DF#, 16#009E1#), -- BENGALI LETTER YYA .. BENGALI LETTER VOCALIC LL
   (16#009F0#, 16#009F1#), -- BENGALI LETTER RA WITH MIDDLE DIAGONAL .. BENGALI LETTER RA WITH LOWER DIAGONAL
   (16#00A05#, 16#00A0A#), -- GURMUKHI LETTER A .. GURMUKHI LETTER UU
   (16#00A0F#, 16#00A10#), -- GURMUKHI LETTER EE .. GURMUKHI LETTER AI
   (16#00A13#, 16#00A28#), -- GURMUKHI LETTER OO .. GURMUKHI LETTER NA
   (16#00A2A#, 16#00A30#), -- GURMUKHI LETTER PA .. GURMUKHI LETTER RA
   (16#00A32#, 16#00A33#), -- GURMUKHI LETTER LA .. GURMUKHI LETTER LLA
   (16#00A35#, 16#00A36#), -- GURMUKHI LETTER VA .. GURMUKHI LETTER SHA
   (16#00A38#, 16#00A39#), -- GURMUKHI LETTER SA .. GURMUKHI LETTER HA
   (16#00A59#, 16#00A5C#), -- GURMUKHI LETTER KHHA .. GURMUKHI LETTER RRA
   (16#00A5E#, 16#00A5E#), -- GURMUKHI LETTER FA .. GURMUKHI LETTER FA
   (16#00A72#, 16#00A74#), -- GURMUKHI IRI .. GURMUKHI EK ONKAR
   (16#00A85#, 16#00A8D#), -- GUJARATI LETTER A .. GUJARATI VOWEL CANDRA E
   (16#00A8F#, 16#00A91#), -- GUJARATI LETTER E .. GUJARATI VOWEL CANDRA O
   (16#00A93#, 16#00AA8#), -- GUJARATI LETTER O .. GUJARATI LETTER NA
   (16#00AAA#, 16#00AB0#), -- GUJARATI LETTER PA .. GUJARATI LETTER RA
   (16#00AB2#, 16#00AB3#), -- GUJARATI LETTER LA .. GUJARATI LETTER LLA
   (16#00AB5#, 16#00AB9#), -- GUJARATI LETTER VA .. GUJARATI LETTER HA
   (16#00ABD#, 16#00ABD#), -- GUJARATI SIGN AVAGRAHA .. GUJARATI SIGN AVAGRAHA
   (16#00AD0#, 16#00AD0#), -- GUJARATI OM .. GUJARATI OM
   (16#00AE0#, 16#00AE1#), -- GUJARATI LETTER VOCALIC RR .. GUJARATI LETTER VOCALIC LL
   (16#00B05#, 16#00B0C#), -- ORIYA LETTER A .. ORIYA LETTER VOCALIC L
   (16#00B0F#, 16#00B10#), -- ORIYA LETTER E .. ORIYA LETTER AI
   (16#00B13#, 16#00B28#), -- ORIYA LETTER O .. ORIYA LETTER NA
   (16#00B2A#, 16#00B30#), -- ORIYA LETTER PA .. ORIYA LETTER RA
   (16#00B32#, 16#00B33#), -- ORIYA LETTER LA .. ORIYA LETTER LLA
   (16#00B35#, 16#00B39#), -- ORIYA LETTER VA .. ORIYA LETTER HA
   (16#00B3D#, 16#00B3D#), -- ORIYA SIGN AVAGRAHA .. ORIYA SIGN AVAGRAHA
   (16#00B5C#, 16#00B5D#), -- ORIYA LETTER RRA .. ORIYA LETTER RHA
   (16#00B5F#, 16#00B61#), -- ORIYA LETTER YYA .. ORIYA LETTER VOCALIC LL
   (16#00B71#, 16#00B71#), -- ORIYA LETTER WA .. ORIYA LETTER WA
   (16#00B83#, 16#00B83#), -- TAMIL SIGN VISARGA .. TAMIL SIGN VISARGA
   (16#00B85#, 16#00B8A#), -- TAMIL LETTER A .. TAMIL LETTER UU
   (16#00B8E#, 16#00B90#), -- TAMIL LETTER E .. TAMIL LETTER AI
   (16#00B92#, 16#00B95#), -- TAMIL LETTER O .. TAMIL LETTER KA
   (16#00B99#, 16#00B9A#), -- TAMIL LETTER NGA .. TAMIL LETTER CA
   (16#00B9C#, 16#00B9C#), -- TAMIL LETTER JA .. TAMIL LETTER JA
   (16#00B9E#, 16#00B9F#), -- TAMIL LETTER NYA .. TAMIL LETTER TTA
   (16#00BA3#, 16#00BA4#), -- TAMIL LETTER NNA .. TAMIL LETTER TA
   (16#00BA8#, 16#00BAA#), -- TAMIL LETTER NA .. TAMIL LETTER PA
   (16#00BAE#, 16#00BB5#), -- TAMIL LETTER MA .. TAMIL LETTER VA
   (16#00BB7#, 16#00BB9#), -- TAMIL LETTER SSA .. TAMIL LETTER HA
   (16#00C05#, 16#00C0C#), -- TELUGU LETTER A .. TELUGU LETTER VOCALIC L
   (16#00C0E#, 16#00C10#), -- TELUGU LETTER E .. TELUGU LETTER AI
   (16#00C12#, 16#00C28#), -- TELUGU LETTER O .. TELUGU LETTER NA
   (16#00C2A#, 16#00C33#), -- TELUGU LETTER PA .. TELUGU LETTER LLA
   (16#00C35#, 16#00C39#), -- TELUGU LETTER VA .. TELUGU LETTER HA
   (16#00C60#, 16#00C61#), -- TELUGU LETTER VOCALIC RR .. TELUGU LETTER VOCALIC LL
   (16#00C85#, 16#00C8C#), -- KANNADA LETTER A .. KANNADA LETTER VOCALIC L
   (16#00C8E#, 16#00C90#), -- KANNADA LETTER E .. KANNADA LETTER AI
   (16#00C92#, 16#00CA8#), -- KANNADA LETTER O .. KANNADA LETTER NA
   (16#00CAA#, 16#00CB3#), -- KANNADA LETTER PA .. KANNADA LETTER LLA
   (16#00CB5#, 16#00CB9#), -- KANNADA LETTER VA .. KANNADA LETTER HA
   (16#00CBD#, 16#00CBD#), -- KANNADA SIGN AVAGRAHA .. KANNADA SIGN AVAGRAHA
   (16#00CDE#, 16#00CDE#), -- KANNADA LETTER FA .. KANNADA LETTER FA
   (16#00CE0#, 16#00CE1#), -- KANNADA LETTER VOCALIC RR .. KANNADA LETTER VOCALIC LL
   (16#00D05#, 16#00D0C#), -- MALAYALAM LETTER A .. MALAYALAM LETTER VOCALIC L
   (16#00D0E#, 16#00D10#), -- MALAYALAM LETTER E .. MALAYALAM LETTER AI
   (16#00D12#, 16#00D28#), -- MALAYALAM LETTER O .. MALAYALAM LETTER NA
   (16#00D2A#, 16#00D39#), -- MALAYALAM LETTER PA .. MALAYALAM LETTER HA
   (16#00D60#, 16#00D61#), -- MALAYALAM LETTER VOCALIC RR .. MALAYALAM LETTER VOCALIC LL
   (16#00D85#, 16#00D96#), -- SINHALA LETTER AYANNA .. SINHALA LETTER AUYANNA
   (16#00D9A#, 16#00DB1#), -- SINHALA LETTER ALPAPRAANA KAYANNA .. SINHALA LETTER DANTAJA NAYANNA
   (16#00DB3#, 16#00DBB#), -- SINHALA LETTER SANYAKA DAYANNA .. SINHALA LETTER RAYANNA
   (16#00DBD#, 16#00DBD#), -- SINHALA LETTER DANTAJA LAYANNA .. SINHALA LETTER DANTAJA LAYANNA
   (16#00DC0#, 16#00DC6#), -- SINHALA LETTER VAYANNA .. SINHALA LETTER FAYANNA
   (16#00E01#, 16#00E30#), -- THAI CHARACTER KO KAI .. THAI CHARACTER SARA A
   (16#00E32#, 16#00E33#), -- THAI CHARACTER SARA AA .. THAI CHARACTER SARA AM
   (16#00E40#, 16#00E46#), -- THAI CHARACTER SARA E .. THAI CHARACTER MAIYAMOK
   (16#00E81#, 16#00E82#), -- LAO LETTER KO .. LAO LETTER KHO SUNG
   (16#00E84#, 16#00E84#), -- LAO LETTER KHO TAM .. LAO LETTER KHO TAM
   (16#00E87#, 16#00E88#), -- LAO LETTER NGO .. LAO LETTER CO
   (16#00E8A#, 16#00E8A#), -- LAO LETTER SO TAM .. LAO LETTER SO TAM
   (16#00E8D#, 16#00E8D#), -- LAO LETTER NYO .. LAO LETTER NYO
   (16#00E94#, 16#00E97#), -- LAO LETTER DO .. LAO LETTER THO TAM
   (16#00E99#, 16#00E9F#), -- LAO LETTER NO .. LAO LETTER FO SUNG
   (16#00EA1#, 16#00EA3#), -- LAO LETTER MO .. LAO LETTER LO LING
   (16#00EA5#, 16#00EA5#), -- LAO LETTER LO LOOT .. LAO LETTER LO LOOT
   (16#00EA7#, 16#00EA7#), -- LAO LETTER WO .. LAO LETTER WO
   (16#00EAA#, 16#00EAB#), -- LAO LETTER SO SUNG .. LAO LETTER HO SUNG
   (16#00EAD#, 16#00EB0#), -- LAO LETTER O .. LAO VOWEL SIGN A
   (16#00EB2#, 16#00EB3#), -- LAO VOWEL SIGN AA .. LAO VOWEL SIGN AM
   (16#00EBD#, 16#00EBD#), -- LAO SEMIVOWEL SIGN NYO .. LAO SEMIVOWEL SIGN NYO
   (16#00EC0#, 16#00EC4#), -- LAO VOWEL SIGN E .. LAO VOWEL SIGN AI
   (16#00EC6#, 16#00EC6#), -- LAO KO LA .. LAO KO LA
   (16#00EDC#, 16#00EDD#), -- LAO HO NO .. LAO HO MO
   (16#00F00#, 16#00F00#), -- TIBETAN SYLLABLE OM .. TIBETAN SYLLABLE OM
   (16#00F40#, 16#00F47#), -- TIBETAN LETTER KA .. TIBETAN LETTER JA
   (16#00F49#, 16#00F6A#), -- TIBETAN LETTER NYA .. TIBETAN LETTER FIXED-FORM RA
   (16#00F88#, 16#00F8B#), -- TIBETAN SIGN LCE TSA CAN .. TIBETAN SIGN GRU MED RGYINGS
   (16#01000#, 16#01021#), -- MYANMAR LETTER KA .. MYANMAR LETTER A
   (16#01023#, 16#01027#), -- MYANMAR LETTER I .. MYANMAR LETTER E
   (16#01029#, 16#0102A#), -- MYANMAR LETTER O .. MYANMAR LETTER AU
   (16#01050#, 16#01055#), -- MYANMAR LETTER SHA .. MYANMAR LETTER VOCALIC LL
   (16#010A0#, 16#010C5#), -- GEORGIAN CAPITAL LETTER AN .. GEORGIAN CAPITAL LETTER HOE
   (16#010D0#, 16#010F8#), -- GEORGIAN LETTER AN .. GEORGIAN LETTER ELIFI
   (16#01100#, 16#01159#), -- HANGUL CHOSEONG KIYEOK .. HANGUL CHOSEONG YEORINHIEUH
   (16#0115F#, 16#011A2#), -- HANGUL CHOSEONG FILLER .. HANGUL JUNGSEONG SSANGARAEA
   (16#011A8#, 16#011F9#), -- HANGUL JONGSEONG KIYEOK .. HANGUL JONGSEONG YEORINHIEUH
   (16#01200#, 16#01206#), -- ETHIOPIC SYLLABLE HA .. ETHIOPIC SYLLABLE HO
   (16#01208#, 16#01246#), -- ETHIOPIC SYLLABLE LA .. ETHIOPIC SYLLABLE QO
   (16#01248#, 16#01248#), -- ETHIOPIC SYLLABLE QWA .. ETHIOPIC SYLLABLE QWA
   (16#0124A#, 16#0124D#), -- ETHIOPIC SYLLABLE QWI .. ETHIOPIC SYLLABLE QWE
   (16#01250#, 16#01256#), -- ETHIOPIC SYLLABLE QHA .. ETHIOPIC SYLLABLE QHO
   (16#01258#, 16#01258#), -- ETHIOPIC SYLLABLE QHWA .. ETHIOPIC SYLLABLE QHWA
   (16#0125A#, 16#0125D#), -- ETHIOPIC SYLLABLE QHWI .. ETHIOPIC SYLLABLE QHWE
   (16#01260#, 16#01286#), -- ETHIOPIC SYLLABLE BA .. ETHIOPIC SYLLABLE XO
   (16#01288#, 16#01288#), -- ETHIOPIC SYLLABLE XWA .. ETHIOPIC SYLLABLE XWA
   (16#0128A#, 16#0128D#), -- ETHIOPIC SYLLABLE XWI .. ETHIOPIC SYLLABLE XWE
   (16#01290#, 16#012AE#), -- ETHIOPIC SYLLABLE NA .. ETHIOPIC SYLLABLE KO
   (16#012B0#, 16#012B0#), -- ETHIOPIC SYLLABLE KWA .. ETHIOPIC SYLLABLE KWA
   (16#012B2#, 16#012B5#), -- ETHIOPIC SYLLABLE KWI .. ETHIOPIC SYLLABLE KWE
   (16#012B8#, 16#012BE#), -- ETHIOPIC SYLLABLE KXA .. ETHIOPIC SYLLABLE KXO
   (16#012C0#, 16#012C0#), -- ETHIOPIC SYLLABLE KXWA .. ETHIOPIC SYLLABLE KXWA
   (16#012C2#, 16#012C5#), -- ETHIOPIC SYLLABLE KXWI .. ETHIOPIC SYLLABLE KXWE
   (16#012C8#, 16#012CE#), -- ETHIOPIC SYLLABLE WA .. ETHIOPIC SYLLABLE WO
   (16#012D0#, 16#012D6#), -- ETHIOPIC SYLLABLE PHARYNGEAL A .. ETHIOPIC SYLLABLE PHARYNGEAL O
   (16#012D8#, 16#012EE#), -- ETHIOPIC SYLLABLE ZA .. ETHIOPIC SYLLABLE YO
   (16#012F0#, 16#0130E#), -- ETHIOPIC SYLLABLE DA .. ETHIOPIC SYLLABLE GO
   (16#01310#, 16#01310#), -- ETHIOPIC SYLLABLE GWA .. ETHIOPIC SYLLABLE GWA
   (16#01312#, 16#01315#), -- ETHIOPIC SYLLABLE GWI .. ETHIOPIC SYLLABLE GWE
   (16#01318#, 16#0131E#), -- ETHIOPIC SYLLABLE GGA .. ETHIOPIC SYLLABLE GGO
   (16#01320#, 16#01346#), -- ETHIOPIC SYLLABLE THA .. ETHIOPIC SYLLABLE TZO
   (16#01348#, 16#0135A#), -- ETHIOPIC SYLLABLE FA .. ETHIOPIC SYLLABLE FYA
   (16#013A0#, 16#013F4#), -- CHEROKEE LETTER A .. CHEROKEE LETTER YV
   (16#01401#, 16#0166C#), -- CANADIAN SYLLABICS E .. CANADIAN SYLLABICS CARRIER TTSA
   (16#0166F#, 16#01676#), -- CANADIAN SYLLABICS QAI .. CANADIAN SYLLABICS NNGAA
   (16#01681#, 16#0169A#), -- OGHAM LETTER BEITH .. OGHAM LETTER PEITH
   (16#016A0#, 16#016EA#), -- RUNIC LETTER FEHU FEOH FE F .. RUNIC LETTER X
   (16#016EE#, 16#016F0#), -- RUNIC ARLAUG SYMBOL .. RUNIC BELGTHOR SYMBOL
   (16#01700#, 16#0170C#), -- TAGALOG LETTER A .. TAGALOG LETTER YA
   (16#0170E#, 16#01711#), -- TAGALOG LETTER LA .. TAGALOG LETTER HA
   (16#01720#, 16#01731#), -- HANUNOO LETTER A .. HANUNOO LETTER HA
   (16#01740#, 16#01751#), -- BUHID LETTER A .. BUHID LETTER HA
   (16#01760#, 16#0176C#), -- TAGBANWA LETTER A .. TAGBANWA LETTER YA
   (16#0176E#, 16#01770#), -- TAGBANWA LETTER LA .. TAGBANWA LETTER SA
   (16#01780#, 16#017B3#), -- KHMER LETTER KA .. KHMER INDEPENDENT VOWEL QAU
   (16#017D7#, 16#017D7#), -- KHMER SIGN LEK TOO .. KHMER SIGN LEK TOO
   (16#017DC#, 16#017DC#), -- KHMER SIGN AVAKRAHASANYA .. KHMER SIGN AVAKRAHASANYA
   (16#01820#, 16#01877#), -- MONGOLIAN LETTER A .. MONGOLIAN LETTER MANCHU ZHA
   (16#01880#, 16#018A8#), -- MONGOLIAN LETTER ALI GALI ANUSVARA ONE .. MONGOLIAN LETTER MANCHU ALI GALI BHA
   (16#01900#, 16#0191C#), -- LIMBU VOWEL-CARRIER LETTER .. LIMBU LETTER HA
   (16#01950#, 16#0196D#), -- TAI LE LETTER KA .. TAI LE LETTER AI
   (16#01970#, 16#01974#), -- TAI LE LETTER TONE-2 .. TAI LE LETTER TONE-6
   (16#01D00#, 16#01D6B#), -- LATIN LETTER SMALL CAPITAL A .. LATIN SMALL LETTER UE
   (16#01E00#, 16#01E9B#), -- LATIN CAPITAL LETTER A WITH RING BELOW .. LATIN SMALL LETTER LONG S WITH DOT ABOVE
   (16#01EA0#, 16#01EF9#), -- LATIN CAPITAL LETTER A WITH DOT BELOW .. LATIN SMALL LETTER Y WITH TILDE
   (16#01F00#, 16#01F15#), -- GREEK SMALL LETTER ALPHA WITH PSILI .. GREEK SMALL LETTER EPSILON WITH DASIA AND OXIA
   (16#01F18#, 16#01F1D#), -- GREEK CAPITAL LETTER EPSILON WITH PSILI .. 
GREEK CAPITAL LETTER EPSILON WITH DASIA AND OXIA (16#01F20#, 16#01F45#), -- GREEK SMALL LETTER ETA WITH PSILI .. GREEK SMALL LETTER OMICRON WITH DASIA AND OXIA (16#01F48#, 16#01F4D#), -- GREEK CAPITAL LETTER OMICRON WITH PSILI .. GREEK CAPITAL LETTER OMICRON WITH DASIA AND OXIA (16#01F50#, 16#01F57#), -- GREEK SMALL LETTER UPSILON WITH PSILI .. GREEK SMALL LETTER UPSILON WITH DASIA AND PERISPOMENI (16#01F59#, 16#01F59#), -- GREEK CAPITAL LETTER UPSILON WITH DASIA .. GREEK CAPITAL LETTER UPSILON WITH DASIA (16#01F5B#, 16#01F5B#), -- GREEK CAPITAL LETTER UPSILON WITH DASIA AND VARIA .. GREEK CAPITAL LETTER UPSILON WITH DASIA AND VARIA (16#01F5D#, 16#01F5D#), -- GREEK CAPITAL LETTER UPSILON WITH DASIA AND OXIA .. GREEK CAPITAL LETTER UPSILON WITH DASIA AND OXIA (16#01F5F#, 16#01F7D#), -- GREEK CAPITAL LETTER UPSILON WITH DASIA AND PERISPOMENI .. GREEK SMALL LETTER OMEGA WITH OXIA (16#01F80#, 16#01FB4#), -- GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI .. GREEK SMALL LETTER ALPHA WITH OXIA AND YPOGEGRAMMENI (16#01FB6#, 16#01FBC#), -- GREEK SMALL LETTER ALPHA WITH PERISPOMENI .. GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI (16#01FBE#, 16#01FBE#), -- GREEK PROSGEGRAMMENI .. GREEK PROSGEGRAMMENI (16#01FC2#, 16#01FC4#), -- GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI .. GREEK SMALL LETTER ETA WITH OXIA AND YPOGEGRAMMENI (16#01FC6#, 16#01FCC#), -- GREEK SMALL LETTER ETA WITH PERISPOMENI .. GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI (16#01FD0#, 16#01FD3#), -- GREEK SMALL LETTER IOTA WITH VRACHY .. GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA (16#01FD6#, 16#01FDB#), -- GREEK SMALL LETTER IOTA WITH PERISPOMENI .. GREEK CAPITAL LETTER IOTA WITH OXIA (16#01FE0#, 16#01FEC#), -- GREEK SMALL LETTER UPSILON WITH VRACHY .. GREEK CAPITAL LETTER RHO WITH DASIA (16#01FF2#, 16#01FF4#), -- GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI .. 
GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI (16#01FF6#, 16#01FFC#), -- GREEK SMALL LETTER OMEGA WITH PERISPOMENI .. GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI (16#02071#, 16#02071#), -- SUPERSCRIPT LATIN SMALL LETTER I .. SUPERSCRIPT LATIN SMALL LETTER I (16#0207F#, 16#0207F#), -- SUPERSCRIPT LATIN SMALL LETTER N .. SUPERSCRIPT LATIN SMALL LETTER N (16#02102#, 16#02102#), -- DOUBLE-STRUCK CAPITAL C .. DOUBLE-STRUCK CAPITAL C (16#02107#, 16#02107#), -- EULER CONSTANT .. EULER CONSTANT (16#0210A#, 16#02113#), -- SCRIPT SMALL G .. SCRIPT SMALL L (16#02115#, 16#02115#), -- DOUBLE-STRUCK CAPITAL N .. DOUBLE-STRUCK CAPITAL N (16#02119#, 16#0211D#), -- DOUBLE-STRUCK CAPITAL P .. DOUBLE-STRUCK CAPITAL R (16#02124#, 16#02124#), -- DOUBLE-STRUCK CAPITAL Z .. DOUBLE-STRUCK CAPITAL Z (16#02126#, 16#02126#), -- OHM SIGN .. OHM SIGN (16#02128#, 16#02128#), -- BLACK-LETTER CAPITAL Z .. BLACK-LETTER CAPITAL Z (16#0212A#, 16#0212D#), -- KELVIN SIGN .. BLACK-LETTER CAPITAL C (16#0212F#, 16#02131#), -- SCRIPT SMALL E .. SCRIPT CAPITAL F (16#02133#, 16#02139#), -- SCRIPT CAPITAL M .. INFORMATION SOURCE (16#0213D#, 16#0213F#), -- DOUBLE-STRUCK SMALL GAMMA .. DOUBLE-STRUCK CAPITAL PI (16#02145#, 16#02149#), -- DOUBLE-STRUCK ITALIC CAPITAL D .. DOUBLE-STRUCK ITALIC SMALL J (16#02160#, 16#02183#), -- ROMAN NUMERAL ONE .. ROMAN NUMERAL REVERSED ONE HUNDRED (16#03005#, 16#03007#), -- IDEOGRAPHIC ITERATION MARK .. IDEOGRAPHIC NUMBER ZERO (16#03021#, 16#03029#), -- HANGZHOU NUMERAL ONE .. HANGZHOU NUMERAL NINE (16#03031#, 16#03035#), -- VERTICAL KANA REPEAT MARK .. VERTICAL KANA REPEAT MARK LOWER HALF (16#03038#, 16#0303C#), -- HANGZHOU NUMERAL TEN .. MASU MARK (16#03041#, 16#03096#), -- HIRAGANA LETTER SMALL A .. HIRAGANA LETTER SMALL KE (16#0309D#, 16#0309F#), -- HIRAGANA ITERATION MARK .. HIRAGANA DIGRAPH YORI (16#030A1#, 16#030FA#), -- KATAKANA LETTER SMALL A .. KATAKANA LETTER VO (16#030FC#, 16#030FF#), -- KATAKANA-HIRAGANA PROLONGED SOUND MARK .. 
KATAKANA DIGRAPH KOTO (16#03105#, 16#0312C#), -- BOPOMOFO LETTER B .. BOPOMOFO LETTER GN (16#03131#, 16#0318E#), -- HANGUL LETTER KIYEOK .. HANGUL LETTER ARAEAE (16#031A0#, 16#031B7#), -- BOPOMOFO LETTER BU .. BOPOMOFO FINAL LETTER H (16#031F0#, 16#031FF#), -- KATAKANA LETTER SMALL KU .. KATAKANA LETTER SMALL RO (16#03400#, 16#03400#), -- .. (16#04DB5#, 16#04DB5#), -- .. (16#04E00#, 16#04E00#), -- .. (16#09FA5#, 16#09FA5#), -- .. (16#0A000#, 16#0A48C#), -- YI SYLLABLE IT .. YI SYLLABLE YYR (16#0AC00#, 16#0AC00#), -- .. (16#0D7A3#, 16#0D7A3#), -- .. (16#0F900#, 16#0FA2D#), -- CJK COMPATIBILITY IDEOGRAPH-F900 .. CJK COMPATIBILITY IDEOGRAPH-FA2D (16#0FA30#, 16#0FA6A#), -- CJK COMPATIBILITY IDEOGRAPH-FA30 .. CJK COMPATIBILITY IDEOGRAPH-FA6A (16#0FB00#, 16#0FB06#), -- LATIN SMALL LIGATURE FF .. LATIN SMALL LIGATURE ST (16#0FB13#, 16#0FB17#), -- ARMENIAN SMALL LIGATURE MEN NOW .. ARMENIAN SMALL LIGATURE MEN XEH (16#0FB1D#, 16#0FB1D#), -- HEBREW LETTER YOD WITH HIRIQ .. HEBREW LETTER YOD WITH HIRIQ (16#0FB1F#, 16#0FB28#), -- HEBREW LIGATURE YIDDISH YOD YOD PATAH .. HEBREW LETTER WIDE TAV (16#0FB2A#, 16#0FB36#), -- HEBREW LETTER SHIN WITH SHIN DOT .. HEBREW LETTER ZAYIN WITH DAGESH (16#0FB38#, 16#0FB3C#), -- HEBREW LETTER TET WITH DAGESH .. HEBREW LETTER LAMED WITH DAGESH (16#0FB3E#, 16#0FB3E#), -- HEBREW LETTER MEM WITH DAGESH .. HEBREW LETTER MEM WITH DAGESH (16#0FB40#, 16#0FB41#), -- HEBREW LETTER NUN WITH DAGESH .. HEBREW LETTER SAMEKH WITH DAGESH (16#0FB43#, 16#0FB44#), -- HEBREW LETTER FINAL PE WITH DAGESH .. HEBREW LETTER PE WITH DAGESH (16#0FB46#, 16#0FBB1#), -- HEBREW LETTER TSADI WITH DAGESH .. ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM (16#0FBD3#, 16#0FD3D#), -- ARABIC LETTER NG ISOLATED FORM .. ARABIC LIGATURE ALEF WITH FATHATAN ISOLATED FORM (16#0FD50#, 16#0FD8F#), -- ARABIC LIGATURE TEH WITH JEEM WITH MEEM INITIAL FORM .. 
ARABIC LIGATURE MEEM WITH KHAH WITH MEEM INITIAL FORM (16#0FD92#, 16#0FDC7#), -- ARABIC LIGATURE MEEM WITH JEEM WITH KHAH INITIAL FORM .. ARABIC LIGATURE NOON WITH JEEM WITH YEH FINAL FORM (16#0FDF0#, 16#0FDFB#), -- ARABIC LIGATURE SALLA USED AS KORANIC STOP SIGN ISOLATED FORM .. ARABIC LIGATURE JALLAJALALOUHOU (16#0FE70#, 16#0FE74#), -- ARABIC FATHATAN ISOLATED FORM .. ARABIC KASRATAN ISOLATED FORM (16#0FE76#, 16#0FEFC#), -- ARABIC FATHA ISOLATED FORM .. ARABIC LIGATURE LAM WITH ALEF FINAL FORM (16#0FF21#, 16#0FF3A#), -- FULLWIDTH LATIN CAPITAL LETTER A .. FULLWIDTH LATIN CAPITAL LETTER Z (16#0FF41#, 16#0FF5A#), -- FULLWIDTH LATIN SMALL LETTER A .. FULLWIDTH LATIN SMALL LETTER Z (16#0FF66#, 16#0FFBE#), -- HALFWIDTH KATAKANA LETTER WO .. HALFWIDTH HANGUL LETTER HIEUH (16#0FFC2#, 16#0FFC7#), -- HALFWIDTH HANGUL LETTER A .. HALFWIDTH HANGUL LETTER E (16#0FFCA#, 16#0FFCF#), -- HALFWIDTH HANGUL LETTER YEO .. HALFWIDTH HANGUL LETTER OE (16#0FFD2#, 16#0FFD7#), -- HALFWIDTH HANGUL LETTER YO .. HALFWIDTH HANGUL LETTER YU (16#0FFDA#, 16#0FFDC#), -- HALFWIDTH HANGUL LETTER EU .. HALFWIDTH HANGUL LETTER I (16#10000#, 16#1000B#), -- LINEAR B SYLLABLE B008 A .. LINEAR B SYLLABLE B046 JE (16#1000D#, 16#10026#), -- LINEAR B SYLLABLE B036 JO .. LINEAR B SYLLABLE B032 QO (16#10028#, 16#1003A#), -- LINEAR B SYLLABLE B060 RA .. LINEAR B SYLLABLE B042 WO (16#1003C#, 16#1003D#), -- LINEAR B SYLLABLE B017 ZA .. LINEAR B SYLLABLE B074 ZE (16#1003F#, 16#1004D#), -- LINEAR B SYLLABLE B020 ZO .. LINEAR B SYLLABLE B091 TWO (16#10050#, 16#1005D#), -- LINEAR B SYMBOL B018 .. LINEAR B SYMBOL B089 (16#10080#, 16#100FA#), -- LINEAR B IDEOGRAM B100 MAN .. LINEAR B IDEOGRAM VESSEL B305 (16#10300#, 16#1031E#), -- OLD ITALIC LETTER A .. OLD ITALIC LETTER UU (16#10330#, 16#1034A#), -- GOTHIC LETTER AHSA .. GOTHIC LETTER NINE HUNDRED (16#10380#, 16#1039D#), -- UGARITIC LETTER ALPA .. UGARITIC LETTER SSU (16#10400#, 16#1049D#), -- DESERET CAPITAL LETTER LONG I .. 
OSMANYA LETTER OO (16#10800#, 16#10805#), -- CYPRIOT SYLLABLE A .. CYPRIOT SYLLABLE JA (16#10808#, 16#10808#), -- CYPRIOT SYLLABLE JO .. CYPRIOT SYLLABLE JO (16#1080A#, 16#10835#), -- CYPRIOT SYLLABLE KA .. CYPRIOT SYLLABLE WO (16#10837#, 16#10838#), -- CYPRIOT SYLLABLE XA .. CYPRIOT SYLLABLE XE (16#1083C#, 16#1083C#), -- CYPRIOT SYLLABLE ZA .. CYPRIOT SYLLABLE ZA (16#1083F#, 16#1083F#), -- CYPRIOT SYLLABLE ZO .. CYPRIOT SYLLABLE ZO (16#1D400#, 16#1D454#), -- MATHEMATICAL BOLD CAPITAL A .. MATHEMATICAL ITALIC SMALL G (16#1D456#, 16#1D49C#), -- MATHEMATICAL ITALIC SMALL I .. MATHEMATICAL SCRIPT CAPITAL A (16#1D49E#, 16#1D49F#), -- MATHEMATICAL SCRIPT CAPITAL C .. MATHEMATICAL SCRIPT CAPITAL D (16#1D4A2#, 16#1D4A2#), -- MATHEMATICAL SCRIPT CAPITAL G .. MATHEMATICAL SCRIPT CAPITAL G (16#1D4A5#, 16#1D4A6#), -- MATHEMATICAL SCRIPT CAPITAL J .. MATHEMATICAL SCRIPT CAPITAL K (16#1D4A9#, 16#1D4AC#), -- MATHEMATICAL SCRIPT CAPITAL N .. MATHEMATICAL SCRIPT CAPITAL Q (16#1D4AE#, 16#1D4B9#), -- MATHEMATICAL SCRIPT CAPITAL S .. MATHEMATICAL SCRIPT SMALL D (16#1D4BB#, 16#1D4BB#), -- MATHEMATICAL SCRIPT SMALL F .. MATHEMATICAL SCRIPT SMALL F (16#1D4BD#, 16#1D4C3#), -- MATHEMATICAL SCRIPT SMALL H .. MATHEMATICAL SCRIPT SMALL N (16#1D4C5#, 16#1D505#), -- MATHEMATICAL SCRIPT SMALL P .. MATHEMATICAL FRAKTUR CAPITAL B (16#1D507#, 16#1D50A#), -- MATHEMATICAL FRAKTUR CAPITAL D .. MATHEMATICAL FRAKTUR CAPITAL G (16#1D50D#, 16#1D514#), -- MATHEMATICAL FRAKTUR CAPITAL J .. MATHEMATICAL FRAKTUR CAPITAL Q (16#1D516#, 16#1D51C#), -- MATHEMATICAL FRAKTUR CAPITAL S .. MATHEMATICAL FRAKTUR CAPITAL Y (16#1D51E#, 16#1D539#), -- MATHEMATICAL FRAKTUR SMALL A .. MATHEMATICAL DOUBLE-STRUCK CAPITAL B (16#1D53B#, 16#1D53E#), -- MATHEMATICAL DOUBLE-STRUCK CAPITAL D .. MATHEMATICAL DOUBLE-STRUCK CAPITAL G (16#1D540#, 16#1D544#), -- MATHEMATICAL DOUBLE-STRUCK CAPITAL I .. MATHEMATICAL DOUBLE-STRUCK CAPITAL M (16#1D546#, 16#1D546#), -- MATHEMATICAL DOUBLE-STRUCK CAPITAL O .. 
MATHEMATICAL DOUBLE-STRUCK CAPITAL O (16#1D54A#, 16#1D550#), -- MATHEMATICAL DOUBLE-STRUCK CAPITAL S .. MATHEMATICAL DOUBLE-STRUCK CAPITAL Y (16#1D552#, 16#1D6A3#), -- MATHEMATICAL DOUBLE-STRUCK SMALL A .. MATHEMATICAL MONOSPACE SMALL Z (16#1D6A8#, 16#1D6C0#), -- MATHEMATICAL BOLD CAPITAL ALPHA .. MATHEMATICAL BOLD CAPITAL OMEGA (16#1D6C2#, 16#1D6DA#), -- MATHEMATICAL BOLD SMALL ALPHA .. MATHEMATICAL BOLD SMALL OMEGA (16#1D6DC#, 16#1D6FA#), -- MATHEMATICAL BOLD EPSILON SYMBOL .. MATHEMATICAL ITALIC CAPITAL OMEGA (16#1D6FC#, 16#1D714#), -- MATHEMATICAL ITALIC SMALL ALPHA .. MATHEMATICAL ITALIC SMALL OMEGA (16#1D716#, 16#1D734#), -- MATHEMATICAL ITALIC EPSILON SYMBOL .. MATHEMATICAL BOLD ITALIC CAPITAL OMEGA (16#1D736#, 16#1D74E#), -- MATHEMATICAL BOLD ITALIC SMALL ALPHA .. MATHEMATICAL BOLD ITALIC SMALL OMEGA (16#1D750#, 16#1D76E#), -- MATHEMATICAL BOLD ITALIC EPSILON SYMBOL .. MATHEMATICAL SANS-SERIF BOLD CAPITAL OMEGA (16#1D770#, 16#1D788#), -- MATHEMATICAL SANS-SERIF BOLD SMALL ALPHA .. MATHEMATICAL SANS-SERIF BOLD SMALL OMEGA (16#1D78A#, 16#1D7A8#), -- MATHEMATICAL SANS-SERIF BOLD EPSILON SYMBOL .. MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL OMEGA (16#1D7AA#, 16#1D7C2#), -- MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL ALPHA .. MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL OMEGA (16#1D7C4#, 16#1D7C9#), -- MATHEMATICAL SANS-SERIF BOLD ITALIC EPSILON SYMBOL .. MATHEMATICAL SANS-SERIF BOLD ITALIC PI SYMBOL (16#20000#, 16#20000#), -- .. (16#2A6D6#, 16#2A6D6#), -- .. (16#2F800#, 16#2FA1D#)); -- CJK COMPATIBILITY IDEOGRAPH-2F800 .. CJK COMPATIBILITY IDEOGRAPH-2FA1D -- The following table includes all characters considered spaces, i.e. -- all characters from the Unicode table with categories: -- Separator, Space (Zs) UTF_32_Spaces : constant UTF_32_Ranges := ( (16#00020#, 16#00020#), -- SPACE .. SPACE (16#000A0#, 16#000A0#), -- NO-BREAK SPACE .. NO-BREAK SPACE (16#01680#, 16#01680#), -- OGHAM SPACE MARK .. OGHAM SPACE MARK (16#02000#, 16#0200B#), -- EN QUAD .. 
ZERO WIDTH SPACE (16#0202F#, 16#0202F#), -- NARROW NO-BREAK SPACE .. NARROW NO-BREAK SPACE (16#0205F#, 16#0205F#), -- MEDIUM MATHEMATICAL SPACE .. MEDIUM MATHEMATICAL SPACE (16#03000#, 16#03000#)); -- IDEOGRAPHIC SPACE .. IDEOGRAPHIC SPACE -- The following table includes all characters considered punctuation, -- i.e. all characters from the Unicode table with categories: -- Punctuation, Connector (Pc) UTF_32_Punctuation : constant UTF_32_Ranges := ( (16#0005F#, 16#0005F#), -- LOW LINE .. LOW LINE (16#0203F#, 16#02040#), -- UNDERTIE .. CHARACTER TIE (16#02054#, 16#02054#), -- INVERTED UNDERTIE .. INVERTED UNDERTIE (16#030FB#, 16#030FB#), -- KATAKANA MIDDLE DOT .. KATAKANA MIDDLE DOT (16#0FE33#, 16#0FE34#), -- PRESENTATION FORM FOR VERTICAL LOW LINE .. PRESENTATION FORM FOR VERTICAL WAVY LOW LINE (16#0FE4D#, 16#0FE4F#), -- DASHED LOW LINE .. WAVY LOW LINE (16#0FF3F#, 16#0FF3F#), -- FULLWIDTH LOW LINE .. FULLWIDTH LOW LINE (16#0FF65#, 16#0FF65#)); -- HALFWIDTH KATAKANA MIDDLE DOT .. HALFWIDTH KATAKANA MIDDLE DOT -- The following table includes all characters considered as other format, -- i.e. all characters from the Unicode table with categories: -- Other, Format (Cf) UTF_32_Other_Format : constant UTF_32_Ranges := ( (16#000AD#, 16#000AD#), -- SOFT HYPHEN .. SOFT HYPHEN (16#00600#, 16#00603#), -- ARABIC NUMBER SIGN .. ARABIC SIGN SAFHA (16#006DD#, 16#006DD#), -- ARABIC END OF AYAH .. ARABIC END OF AYAH (16#0070F#, 16#0070F#), -- SYRIAC ABBREVIATION MARK .. SYRIAC ABBREVIATION MARK (16#017B4#, 16#017B5#), -- KHMER VOWEL INHERENT AQ .. KHMER VOWEL INHERENT AA (16#0200C#, 16#0200F#), -- ZERO WIDTH NON-JOINER .. RIGHT-TO-LEFT MARK (16#0202A#, 16#0202E#), -- LEFT-TO-RIGHT EMBEDDING .. RIGHT-TO-LEFT OVERRIDE (16#02060#, 16#02063#), -- WORD JOINER .. INVISIBLE SEPARATOR (16#0206A#, 16#0206F#), -- INHIBIT SYMMETRIC SWAPPING .. NOMINAL DIGIT SHAPES (16#0FEFF#, 16#0FEFF#), -- ZERO WIDTH NO-BREAK SPACE .. 
ZERO WIDTH NO-BREAK SPACE (16#0FFF9#, 16#0FFFB#), -- INTERLINEAR ANNOTATION ANCHOR .. INTERLINEAR ANNOTATION TERMINATOR (16#1D173#, 16#1D17A#), -- MUSICAL SYMBOL BEGIN BEAM .. MUSICAL SYMBOL END PHRASE (16#E0001#, 16#E0001#), -- LANGUAGE TAG .. LANGUAGE TAG (16#E0020#, 16#E007F#)); -- TAG SPACE .. CANCEL TAG -- The following table includes all characters considered marks i.e. -- all characters from the Unicode table with categories: -- Mark, Nonspacing (Mn) -- Mark, Spacing Combining (Mc) UTF_32_Marks : constant UTF_32_Ranges := ( (16#00300#, 16#00357#), -- COMBINING GRAVE ACCENT .. COMBINING RIGHT HALF RING ABOVE (16#0035D#, 16#0036F#), -- COMBINING DOUBLE BREVE .. COMBINING LATIN SMALL LETTER X (16#00483#, 16#00486#), -- COMBINING CYRILLIC TITLO .. COMBINING CYRILLIC PSILI PNEUMATA (16#00591#, 16#005A1#), -- HEBREW ACCENT ETNAHTA .. HEBREW ACCENT PAZER (16#005A3#, 16#005B9#), -- HEBREW ACCENT MUNAH .. HEBREW POINT HOLAM (16#005BB#, 16#005BD#), -- HEBREW POINT QUBUTS .. HEBREW POINT METEG (16#005BF#, 16#005BF#), -- HEBREW POINT RAFE .. HEBREW POINT RAFE (16#005C1#, 16#005C2#), -- HEBREW POINT SHIN DOT .. HEBREW POINT SIN DOT (16#005C4#, 16#005C4#), -- HEBREW MARK UPPER DOT .. HEBREW MARK UPPER DOT (16#00610#, 16#00615#), -- ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM .. ARABIC SMALL HIGH TAH (16#0064B#, 16#00658#), -- ARABIC FATHATAN .. ARABIC MARK NOON GHUNNA (16#00670#, 16#00670#), -- ARABIC LETTER SUPERSCRIPT ALEF .. ARABIC LETTER SUPERSCRIPT ALEF (16#006D6#, 16#006DC#), -- ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA .. ARABIC SMALL HIGH SEEN (16#006DF#, 16#006E4#), -- ARABIC SMALL HIGH ROUNDED ZERO .. ARABIC SMALL HIGH MADDA (16#006E7#, 16#006E8#), -- ARABIC SMALL HIGH YEH .. ARABIC SMALL HIGH NOON (16#006EA#, 16#006ED#), -- ARABIC EMPTY CENTRE LOW STOP .. ARABIC SMALL LOW MEEM (16#00711#, 16#00711#), -- SYRIAC LETTER SUPERSCRIPT ALAPH .. SYRIAC LETTER SUPERSCRIPT ALAPH (16#00730#, 16#0074A#), -- SYRIAC PTHAHA ABOVE .. 
SYRIAC BARREKH (16#007A6#, 16#007B0#), -- THAANA ABAFILI .. THAANA SUKUN (16#00901#, 16#00903#), -- DEVANAGARI SIGN CANDRABINDU .. DEVANAGARI SIGN VISARGA (16#0093C#, 16#0093C#), -- DEVANAGARI SIGN NUKTA .. DEVANAGARI SIGN NUKTA (16#0093E#, 16#0094D#), -- DEVANAGARI VOWEL SIGN AA .. DEVANAGARI SIGN VIRAMA (16#00951#, 16#00954#), -- DEVANAGARI STRESS SIGN UDATTA .. DEVANAGARI ACUTE ACCENT (16#00962#, 16#00963#), -- DEVANAGARI VOWEL SIGN VOCALIC L .. DEVANAGARI VOWEL SIGN VOCALIC LL (16#00981#, 16#00983#), -- BENGALI SIGN CANDRABINDU .. BENGALI SIGN VISARGA (16#009BC#, 16#009BC#), -- BENGALI SIGN NUKTA .. BENGALI SIGN NUKTA (16#009BE#, 16#009C4#), -- BENGALI VOWEL SIGN AA .. BENGALI VOWEL SIGN VOCALIC RR (16#009C7#, 16#009C8#), -- BENGALI VOWEL SIGN E .. BENGALI VOWEL SIGN AI (16#009CB#, 16#009CD#), -- BENGALI VOWEL SIGN O .. BENGALI SIGN VIRAMA (16#009D7#, 16#009D7#), -- BENGALI AU LENGTH MARK .. BENGALI AU LENGTH MARK (16#009E2#, 16#009E3#), -- BENGALI VOWEL SIGN VOCALIC L .. BENGALI VOWEL SIGN VOCALIC LL (16#00A01#, 16#00A03#), -- GURMUKHI SIGN ADAK BINDI .. GURMUKHI SIGN VISARGA (16#00A3C#, 16#00A3C#), -- GURMUKHI SIGN NUKTA .. GURMUKHI SIGN NUKTA (16#00A3E#, 16#00A42#), -- GURMUKHI VOWEL SIGN AA .. GURMUKHI VOWEL SIGN UU (16#00A47#, 16#00A48#), -- GURMUKHI VOWEL SIGN EE .. GURMUKHI VOWEL SIGN AI (16#00A4B#, 16#00A4D#), -- GURMUKHI VOWEL SIGN OO .. GURMUKHI SIGN VIRAMA (16#00A70#, 16#00A71#), -- GURMUKHI TIPPI .. GURMUKHI ADDAK (16#00A81#, 16#00A83#), -- GUJARATI SIGN CANDRABINDU .. GUJARATI SIGN VISARGA (16#00ABC#, 16#00ABC#), -- GUJARATI SIGN NUKTA .. GUJARATI SIGN NUKTA (16#00ABE#, 16#00AC5#), -- GUJARATI VOWEL SIGN AA .. GUJARATI VOWEL SIGN CANDRA E (16#00AC7#, 16#00AC9#), -- GUJARATI VOWEL SIGN E .. GUJARATI VOWEL SIGN CANDRA O (16#00ACB#, 16#00ACD#), -- GUJARATI VOWEL SIGN O .. GUJARATI SIGN VIRAMA (16#00AE2#, 16#00AE3#), -- GUJARATI VOWEL SIGN VOCALIC L .. GUJARATI VOWEL SIGN VOCALIC LL (16#00B01#, 16#00B03#), -- ORIYA SIGN CANDRABINDU .. 
ORIYA SIGN VISARGA (16#00B3C#, 16#00B3C#), -- ORIYA SIGN NUKTA .. ORIYA SIGN NUKTA (16#00B3E#, 16#00B43#), -- ORIYA VOWEL SIGN AA .. ORIYA VOWEL SIGN VOCALIC R (16#00B47#, 16#00B48#), -- ORIYA VOWEL SIGN E .. ORIYA VOWEL SIGN AI (16#00B4B#, 16#00B4D#), -- ORIYA VOWEL SIGN O .. ORIYA SIGN VIRAMA (16#00B56#, 16#00B57#), -- ORIYA AI LENGTH MARK .. ORIYA AU LENGTH MARK (16#00B82#, 16#00B82#), -- TAMIL SIGN ANUSVARA .. TAMIL SIGN ANUSVARA (16#00BBE#, 16#00BC2#), -- TAMIL VOWEL SIGN AA .. TAMIL VOWEL SIGN UU (16#00BC6#, 16#00BC8#), -- TAMIL VOWEL SIGN E .. TAMIL VOWEL SIGN AI (16#00BCA#, 16#00BCD#), -- TAMIL VOWEL SIGN O .. TAMIL SIGN VIRAMA (16#00BD7#, 16#00BD7#), -- TAMIL AU LENGTH MARK .. TAMIL AU LENGTH MARK (16#00C01#, 16#00C03#), -- TELUGU SIGN CANDRABINDU .. TELUGU SIGN VISARGA (16#00C3E#, 16#00C44#), -- TELUGU VOWEL SIGN AA .. TELUGU VOWEL SIGN VOCALIC RR (16#00C46#, 16#00C48#), -- TELUGU VOWEL SIGN E .. TELUGU VOWEL SIGN AI (16#00C4A#, 16#00C4D#), -- TELUGU VOWEL SIGN O .. TELUGU SIGN VIRAMA (16#00C55#, 16#00C56#), -- TELUGU LENGTH MARK .. TELUGU AI LENGTH MARK (16#00C82#, 16#00C83#), -- KANNADA SIGN ANUSVARA .. KANNADA SIGN VISARGA (16#00CBC#, 16#00CBC#), -- KANNADA SIGN NUKTA .. KANNADA SIGN NUKTA (16#00CBE#, 16#00CC4#), -- KANNADA VOWEL SIGN AA .. KANNADA VOWEL SIGN VOCALIC RR (16#00CC6#, 16#00CC8#), -- KANNADA VOWEL SIGN E .. KANNADA VOWEL SIGN AI (16#00CCA#, 16#00CCD#), -- KANNADA VOWEL SIGN O .. KANNADA SIGN VIRAMA (16#00CD5#, 16#00CD6#), -- KANNADA LENGTH MARK .. KANNADA AI LENGTH MARK (16#00D02#, 16#00D03#), -- MALAYALAM SIGN ANUSVARA .. MALAYALAM SIGN VISARGA (16#00D3E#, 16#00D43#), -- MALAYALAM VOWEL SIGN AA .. MALAYALAM VOWEL SIGN VOCALIC R (16#00D46#, 16#00D48#), -- MALAYALAM VOWEL SIGN E .. MALAYALAM VOWEL SIGN AI (16#00D4A#, 16#00D4D#), -- MALAYALAM VOWEL SIGN O .. MALAYALAM SIGN VIRAMA (16#00D57#, 16#00D57#), -- MALAYALAM AU LENGTH MARK .. MALAYALAM AU LENGTH MARK (16#00D82#, 16#00D83#), -- SINHALA SIGN ANUSVARAYA .. 
SINHALA SIGN VISARGAYA (16#00DCA#, 16#00DCA#), -- SINHALA SIGN AL-LAKUNA .. SINHALA SIGN AL-LAKUNA (16#00DCF#, 16#00DD4#), -- SINHALA VOWEL SIGN AELA-PILLA .. SINHALA VOWEL SIGN KETTI PAA-PILLA (16#00DD6#, 16#00DD6#), -- SINHALA VOWEL SIGN DIGA PAA-PILLA .. SINHALA VOWEL SIGN DIGA PAA-PILLA (16#00DD8#, 16#00DDF#), -- SINHALA VOWEL SIGN GAETTA-PILLA .. SINHALA VOWEL SIGN GAYANUKITTA (16#00DF2#, 16#00DF3#), -- SINHALA VOWEL SIGN DIGA GAETTA-PILLA .. SINHALA VOWEL SIGN DIGA GAYANUKITTA (16#00E31#, 16#00E31#), -- THAI CHARACTER MAI HAN-AKAT .. THAI CHARACTER MAI HAN-AKAT (16#00E34#, 16#00E3A#), -- THAI CHARACTER SARA I .. THAI CHARACTER PHINTHU (16#00E47#, 16#00E4E#), -- THAI CHARACTER MAITAIKHU .. THAI CHARACTER YAMAKKAN (16#00EB1#, 16#00EB1#), -- LAO VOWEL SIGN MAI KAN .. LAO VOWEL SIGN MAI KAN (16#00EB4#, 16#00EB9#), -- LAO VOWEL SIGN I .. LAO VOWEL SIGN UU (16#00EBB#, 16#00EBC#), -- LAO VOWEL SIGN MAI KON .. LAO SEMIVOWEL SIGN LO (16#00EC8#, 16#00ECD#), -- LAO TONE MAI EK .. LAO NIGGAHITA (16#00F18#, 16#00F19#), -- TIBETAN ASTROLOGICAL SIGN -KHYUD PA .. TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS (16#00F35#, 16#00F35#), -- TIBETAN MARK NGAS BZUNG NYI ZLA .. TIBETAN MARK NGAS BZUNG NYI ZLA (16#00F37#, 16#00F37#), -- TIBETAN MARK NGAS BZUNG SGOR RTAGS .. TIBETAN MARK NGAS BZUNG SGOR RTAGS (16#00F39#, 16#00F39#), -- TIBETAN MARK TSA -PHRU .. TIBETAN MARK TSA -PHRU (16#00F3E#, 16#00F3F#), -- TIBETAN SIGN YAR TSHES .. TIBETAN SIGN MAR TSHES (16#00F71#, 16#00F84#), -- TIBETAN VOWEL SIGN AA .. TIBETAN MARK HALANTA (16#00F86#, 16#00F87#), -- TIBETAN SIGN LCI RTAGS .. TIBETAN SIGN YANG RTAGS (16#00F90#, 16#00F97#), -- TIBETAN SUBJOINED LETTER KA .. TIBETAN SUBJOINED LETTER JA (16#00F99#, 16#00FBC#), -- TIBETAN SUBJOINED LETTER NYA .. TIBETAN SUBJOINED LETTER FIXED-FORM RA (16#00FC6#, 16#00FC6#), -- TIBETAN SYMBOL PADMA GDAN .. TIBETAN SYMBOL PADMA GDAN (16#0102C#, 16#01032#), -- MYANMAR VOWEL SIGN AA .. MYANMAR VOWEL SIGN AI (16#01036#, 16#01039#), -- MYANMAR SIGN ANUSVARA .. 
MYANMAR SIGN VIRAMA (16#01056#, 16#01059#), -- MYANMAR VOWEL SIGN VOCALIC R .. MYANMAR VOWEL SIGN VOCALIC LL (16#01712#, 16#01714#), -- TAGALOG VOWEL SIGN I .. TAGALOG SIGN VIRAMA (16#01732#, 16#01734#), -- HANUNOO VOWEL SIGN I .. HANUNOO SIGN PAMUDPOD (16#01752#, 16#01753#), -- BUHID VOWEL SIGN I .. BUHID VOWEL SIGN U (16#01772#, 16#01773#), -- TAGBANWA VOWEL SIGN I .. TAGBANWA VOWEL SIGN U (16#017B6#, 16#017D3#), -- KHMER VOWEL SIGN AA .. KHMER SIGN BATHAMASAT (16#017DD#, 16#017DD#), -- KHMER SIGN ATTHACAN .. KHMER SIGN ATTHACAN (16#0180B#, 16#0180D#), -- MONGOLIAN FREE VARIATION SELECTOR ONE .. MONGOLIAN FREE VARIATION SELECTOR THREE (16#018A9#, 16#018A9#), -- MONGOLIAN LETTER ALI GALI DAGALGA .. MONGOLIAN LETTER ALI GALI DAGALGA (16#01920#, 16#0192B#), -- LIMBU VOWEL SIGN A .. LIMBU SUBJOINED LETTER WA (16#01930#, 16#0193B#), -- LIMBU SMALL LETTER KA .. LIMBU SIGN SA-I (16#020D0#, 16#020DC#), -- COMBINING LEFT HARPOON ABOVE .. COMBINING FOUR DOTS ABOVE (16#020E1#, 16#020E1#), -- COMBINING LEFT RIGHT ARROW ABOVE .. COMBINING LEFT RIGHT ARROW ABOVE (16#020E5#, 16#020EA#), -- COMBINING REVERSE SOLIDUS OVERLAY .. COMBINING LEFTWARDS ARROW OVERLAY (16#0302A#, 16#0302F#), -- IDEOGRAPHIC LEVEL TONE MARK .. HANGUL DOUBLE DOT TONE MARK (16#03099#, 16#0309A#), -- COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK .. COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK (16#0FB1E#, 16#0FB1E#), -- HEBREW POINT JUDEO-SPANISH VARIKA .. HEBREW POINT JUDEO-SPANISH VARIKA (16#0FE00#, 16#0FE0F#), -- VARIATION SELECTOR-1 .. VARIATION SELECTOR-16 (16#0FE20#, 16#0FE23#), -- COMBINING LIGATURE LEFT HALF .. COMBINING DOUBLE TILDE RIGHT HALF (16#1D165#, 16#1D169#), -- MUSICAL SYMBOL COMBINING STEM .. MUSICAL SYMBOL COMBINING TREMOLO-3 (16#1D16D#, 16#1D172#), -- MUSICAL SYMBOL COMBINING AUGMENTATION DOT .. MUSICAL SYMBOL COMBINING FLAG-5 (16#1D17B#, 16#1D182#), -- MUSICAL SYMBOL COMBINING ACCENT .. MUSICAL SYMBOL COMBINING LOURE (16#1D185#, 16#1D18B#), -- MUSICAL SYMBOL COMBINING DOIT .. 
MUSICAL SYMBOL COMBINING TRIPLE TONGUE (16#1D1AA#, 16#1D1AD#), -- MUSICAL SYMBOL COMBINING DOWN BOW .. MUSICAL SYMBOL COMBINING SNAP PIZZICATO (16#E0100#, 16#E01EF#)); -- VARIATION SELECTOR-17 .. VARIATION SELECTOR-256 -- The following table includes all characters considered non-graphic, -- i.e. all characters from the Unicode table with categories: -- Other, Control (Cc) -- Other, Private Use (Co) -- Other, Surrogate (Cs) -- Other, Format (Cf) -- Separator, Line (Zl) -- Separator, Paragraph (Zp) -- In addition, the characters FFFE and FFFF are excluded. Note that the -- defined Ada category of format effector is subsumed by the above set -- of Unicode categories. UTF_32_Non_Graphic : constant UTF_32_Ranges := ( (16#00000#, 16#0001F#), -- .. (16#0007F#, 16#0009F#), -- .. (16#000AD#, 16#000AD#), -- SOFT HYPHEN .. SOFT HYPHEN (16#00600#, 16#00603#), -- ARABIC NUMBER SIGN .. ARABIC SIGN SAFHA (16#006DD#, 16#006DD#), -- ARABIC END OF AYAH .. ARABIC END OF AYAH (16#0070F#, 16#0070F#), -- SYRIAC ABBREVIATION MARK .. SYRIAC ABBREVIATION MARK (16#017B4#, 16#017B5#), -- KHMER VOWEL INHERENT AQ .. KHMER VOWEL INHERENT AA (16#0200C#, 16#0200F#), -- ZERO WIDTH NON-JOINER .. RIGHT-TO-LEFT MARK (16#02028#, 16#0202E#), -- LINE SEPARATOR .. RIGHT-TO-LEFT OVERRIDE (16#02060#, 16#02063#), -- WORD JOINER .. INVISIBLE SEPARATOR (16#0206A#, 16#0206F#), -- INHIBIT SYMMETRIC SWAPPING .. NOMINAL DIGIT SHAPES (16#0D800#, 16#0D800#), -- .. (16#0DB7F#, 16#0DB80#), -- .. (16#0DBFF#, 16#0DC00#), -- .. (16#0DFFF#, 16#0E000#), -- .. (16#0F8FF#, 16#0F8FF#), -- .. (16#0FEFF#, 16#0FEFF#), -- ZERO WIDTH NO-BREAK SPACE .. ZERO WIDTH NO-BREAK SPACE (16#0FFF9#, 16#0FFFB#), -- INTERLINEAR ANNOTATION ANCHOR .. INTERLINEAR ANNOTATION TERMINATOR (16#0FFFE#, 16#0FFFF#), -- excluded code positions (16#1D173#, 16#1D17A#), -- MUSICAL SYMBOL BEGIN BEAM .. MUSICAL SYMBOL END PHRASE (16#E0001#, 16#E0001#), -- LANGUAGE TAG .. LANGUAGE TAG (16#E0020#, 16#E007F#), -- TAG SPACE .. 
CANCEL TAG
     (16#F0000#, 16#FFFFD#), -- ..
     (16#100000#, 16#10FFFD#)); -- ..

   -- The following two tables define the mapping to upper case. The first
   -- table gives the ranges of lower case letters. The corresponding entry
   -- in Uppercase_Adjust shows the amount to be added to (or subtracted
   -- from) the code value to get the corresponding upper case letter.

   -- Note that this folding is not reversible: for example, lower case
   -- dotless i folds to normal upper case I, and that cannot be reversed.

   Lower_Case_Letters : constant UTF_32_Ranges := (
     (16#00061#, 16#0007A#), -- LATIN SMALL LETTER A .. LATIN SMALL LETTER Z
     (16#000B5#, 16#000B5#), -- MICRO SIGN .. MICRO SIGN
     (16#000E0#, 16#000F6#), -- LATIN SMALL LETTER A WITH GRAVE .. LATIN SMALL LETTER O WITH DIAERESIS
     (16#000F8#, 16#000FE#), -- LATIN SMALL LETTER O WITH STROKE .. LATIN SMALL LETTER THORN
     (16#000FF#, 16#000FF#), -- LATIN SMALL LETTER Y WITH DIAERESIS .. LATIN SMALL LETTER Y WITH DIAERESIS
     (16#00101#, 16#00101#), -- LATIN SMALL LETTER A WITH MACRON .. LATIN SMALL LETTER A WITH MACRON
     (16#00103#, 16#00103#), -- LATIN SMALL LETTER A WITH BREVE .. LATIN SMALL LETTER A WITH BREVE
     (16#00105#, 16#00105#), -- LATIN SMALL LETTER A WITH OGONEK .. LATIN SMALL LETTER A WITH OGONEK
     (16#00107#, 16#00107#), -- LATIN SMALL LETTER C WITH ACUTE .. LATIN SMALL LETTER C WITH ACUTE
     (16#00109#, 16#00109#), -- LATIN SMALL LETTER C WITH CIRCUMFLEX .. LATIN SMALL LETTER C WITH CIRCUMFLEX
     (16#0010B#, 16#0010B#), -- LATIN SMALL LETTER C WITH DOT ABOVE .. LATIN SMALL LETTER C WITH DOT ABOVE
     (16#0010D#, 16#0010D#), -- LATIN SMALL LETTER C WITH CARON .. LATIN SMALL LETTER C WITH CARON
     (16#0010F#, 16#0010F#), -- LATIN SMALL LETTER D WITH CARON .. LATIN SMALL LETTER D WITH CARON
     (16#00111#, 16#00111#), -- LATIN SMALL LETTER D WITH STROKE .. LATIN SMALL LETTER D WITH STROKE
     (16#00113#, 16#00113#), -- LATIN SMALL LETTER E WITH MACRON .. LATIN SMALL LETTER E WITH MACRON
     (16#00115#, 16#00115#), -- LATIN SMALL LETTER E WITH BREVE .. LATIN SMALL LETTER E WITH BREVE
     (16#00117#, 16#00117#), -- LATIN SMALL LETTER E WITH DOT ABOVE .. LATIN SMALL LETTER E WITH DOT ABOVE
     (16#00119#, 16#00119#), -- LATIN SMALL LETTER E WITH OGONEK .. LATIN SMALL LETTER E WITH OGONEK
     (16#0011B#, 16#0011B#), -- LATIN SMALL LETTER E WITH CARON .. LATIN SMALL LETTER E WITH CARON
     (16#0011D#, 16#0011D#), -- LATIN SMALL LETTER G WITH CIRCUMFLEX .. LATIN SMALL LETTER G WITH CIRCUMFLEX
     (16#0011F#, 16#0011F#), -- LATIN SMALL LETTER G WITH BREVE .. LATIN SMALL LETTER G WITH BREVE
     (16#00121#, 16#00121#), -- LATIN SMALL LETTER G WITH DOT ABOVE .. LATIN SMALL LETTER G WITH DOT ABOVE
     (16#00123#, 16#00123#), -- LATIN SMALL LETTER G WITH CEDILLA .. LATIN SMALL LETTER G WITH CEDILLA
     (16#00125#, 16#00125#), -- LATIN SMALL LETTER H WITH CIRCUMFLEX .. LATIN SMALL LETTER H WITH CIRCUMFLEX
     (16#00127#, 16#00127#), -- LATIN SMALL LETTER H WITH STROKE .. LATIN SMALL LETTER H WITH STROKE
     (16#00129#, 16#00129#), -- LATIN SMALL LETTER I WITH TILDE .. LATIN SMALL LETTER I WITH TILDE
     (16#0012B#, 16#0012B#), -- LATIN SMALL LETTER I WITH MACRON .. LATIN SMALL LETTER I WITH MACRON
     (16#0012D#, 16#0012D#), -- LATIN SMALL LETTER I WITH BREVE .. LATIN SMALL LETTER I WITH BREVE
     (16#0012F#, 16#0012F#), -- LATIN SMALL LETTER I WITH OGONEK .. LATIN SMALL LETTER I WITH OGONEK
     (16#00131#, 16#00131#), -- LATIN SMALL LETTER DOTLESS I .. LATIN SMALL LETTER DOTLESS I
     (16#00133#, 16#00133#), -- LATIN SMALL LIGATURE IJ .. LATIN SMALL LIGATURE IJ
     (16#00135#, 16#00135#), -- LATIN SMALL LETTER J WITH CIRCUMFLEX .. LATIN SMALL LETTER J WITH CIRCUMFLEX
     (16#00137#, 16#00137#), -- LATIN SMALL LETTER K WITH CEDILLA .. LATIN SMALL LETTER K WITH CEDILLA
     (16#0013A#, 16#0013A#), -- LATIN SMALL LETTER L WITH ACUTE .. LATIN SMALL LETTER L WITH ACUTE
     (16#0013C#, 16#0013C#), -- LATIN SMALL LETTER L WITH CEDILLA .. LATIN SMALL LETTER L WITH CEDILLA
     (16#0013E#, 16#0013E#), -- LATIN SMALL LETTER L WITH CARON .. LATIN SMALL LETTER L WITH CARON
     (16#00140#, 16#00140#), -- LATIN SMALL LETTER L WITH MIDDLE DOT .. LATIN SMALL LETTER L WITH MIDDLE DOT
     (16#00142#, 16#00142#), -- LATIN SMALL LETTER L WITH STROKE .. LATIN SMALL LETTER L WITH STROKE
     (16#00144#, 16#00144#), -- LATIN SMALL LETTER N WITH ACUTE .. LATIN SMALL LETTER N WITH ACUTE
     (16#00146#, 16#00146#), -- LATIN SMALL LETTER N WITH CEDILLA .. LATIN SMALL LETTER N WITH CEDILLA
     (16#00148#, 16#00148#), -- LATIN SMALL LETTER N WITH CARON .. LATIN SMALL LETTER N WITH CARON
     (16#0014B#, 16#0014B#), -- LATIN SMALL LETTER ENG .. LATIN SMALL LETTER ENG
     (16#0014D#, 16#0014D#), -- LATIN SMALL LETTER O WITH MACRON .. LATIN SMALL LETTER O WITH MACRON
     (16#0014F#, 16#0014F#), -- LATIN SMALL LETTER O WITH BREVE .. LATIN SMALL LETTER O WITH BREVE
     (16#00151#, 16#00151#), -- LATIN SMALL LETTER O WITH DOUBLE ACUTE .. LATIN SMALL LETTER O WITH DOUBLE ACUTE
     (16#00153#, 16#00153#), -- LATIN SMALL LIGATURE OE .. LATIN SMALL LIGATURE OE
     (16#00155#, 16#00155#), -- LATIN SMALL LETTER R WITH ACUTE .. LATIN SMALL LETTER R WITH ACUTE
     (16#00157#, 16#00157#), -- LATIN SMALL LETTER R WITH CEDILLA .. LATIN SMALL LETTER R WITH CEDILLA
     (16#00159#, 16#00159#), -- LATIN SMALL LETTER R WITH CARON .. LATIN SMALL LETTER R WITH CARON
     (16#0015B#, 16#0015B#), -- LATIN SMALL LETTER S WITH ACUTE .. LATIN SMALL LETTER S WITH ACUTE
     (16#0015D#, 16#0015D#), -- LATIN SMALL LETTER S WITH CIRCUMFLEX .. LATIN SMALL LETTER S WITH CIRCUMFLEX
     (16#0015F#, 16#0015F#), -- LATIN SMALL LETTER S WITH CEDILLA .. LATIN SMALL LETTER S WITH CEDILLA
     (16#00161#, 16#00161#), -- LATIN SMALL LETTER S WITH CARON .. LATIN SMALL LETTER S WITH CARON
     (16#00163#, 16#00163#), -- LATIN SMALL LETTER T WITH CEDILLA .. LATIN SMALL LETTER T WITH CEDILLA
     (16#00165#, 16#00165#), -- LATIN SMALL LETTER T WITH CARON .. LATIN SMALL LETTER T WITH CARON
     (16#00167#, 16#00167#), -- LATIN SMALL LETTER T WITH STROKE .. LATIN SMALL LETTER T WITH STROKE
     (16#00169#, 16#00169#), -- LATIN SMALL LETTER U WITH TILDE .. LATIN SMALL LETTER U WITH TILDE
     (16#0016B#, 16#0016B#), -- LATIN SMALL LETTER U WITH MACRON .. LATIN SMALL LETTER U WITH MACRON
     (16#0016D#, 16#0016D#), -- LATIN SMALL LETTER U WITH BREVE .. LATIN SMALL LETTER U WITH BREVE
     (16#0016F#, 16#0016F#), -- LATIN SMALL LETTER U WITH RING ABOVE .. LATIN SMALL LETTER U WITH RING ABOVE
     (16#00171#, 16#00171#), -- LATIN SMALL LETTER U WITH DOUBLE ACUTE .. LATIN SMALL LETTER U WITH DOUBLE ACUTE
     (16#00173#, 16#00173#), -- LATIN SMALL LETTER U WITH OGONEK .. LATIN SMALL LETTER U WITH OGONEK
     (16#00175#, 16#00175#), -- LATIN SMALL LETTER W WITH CIRCUMFLEX .. LATIN SMALL LETTER W WITH CIRCUMFLEX
     (16#00177#, 16#00177#), -- LATIN SMALL LETTER Y WITH CIRCUMFLEX .. LATIN SMALL LETTER Y WITH CIRCUMFLEX
     (16#0017A#, 16#0017A#), -- LATIN SMALL LETTER Z WITH ACUTE .. LATIN SMALL LETTER Z WITH ACUTE
     (16#0017C#, 16#0017C#), -- LATIN SMALL LETTER Z WITH DOT ABOVE .. LATIN SMALL LETTER Z WITH DOT ABOVE
     (16#0017E#, 16#0017E#), -- LATIN SMALL LETTER Z WITH CARON .. LATIN SMALL LETTER Z WITH CARON
     (16#0017F#, 16#0017F#), -- LATIN SMALL LETTER LONG S .. LATIN SMALL LETTER LONG S
     (16#00183#, 16#00183#), -- LATIN SMALL LETTER B WITH TOPBAR .. LATIN SMALL LETTER B WITH TOPBAR
     (16#00185#, 16#00185#), -- LATIN SMALL LETTER TONE SIX .. LATIN SMALL LETTER TONE SIX
     (16#00188#, 16#00188#), -- LATIN SMALL LETTER C WITH HOOK .. LATIN SMALL LETTER C WITH HOOK
     (16#0018C#, 16#0018C#), -- LATIN SMALL LETTER D WITH TOPBAR .. LATIN SMALL LETTER D WITH TOPBAR
     (16#00192#, 16#00192#), -- LATIN SMALL LETTER F WITH HOOK .. LATIN SMALL LETTER F WITH HOOK
     (16#00195#, 16#00195#), -- LATIN SMALL LETTER HV .. LATIN SMALL LETTER HV
     (16#00199#, 16#00199#), -- LATIN SMALL LETTER K WITH HOOK .. LATIN SMALL LETTER K WITH HOOK
     (16#0019E#, 16#0019E#), -- LATIN SMALL LETTER N WITH LONG RIGHT LEG .. LATIN SMALL LETTER N WITH LONG RIGHT LEG
     (16#001A1#, 16#001A1#), -- LATIN SMALL LETTER O WITH HORN .. LATIN SMALL LETTER O WITH HORN
     (16#001A3#, 16#001A3#), -- LATIN SMALL LETTER OI .. LATIN SMALL LETTER OI
     (16#001A5#, 16#001A5#), -- LATIN SMALL LETTER P WITH HOOK .. LATIN SMALL LETTER P WITH HOOK
     (16#001A8#, 16#001A8#), -- LATIN SMALL LETTER TONE TWO .. LATIN SMALL LETTER TONE TWO
     (16#001AD#, 16#001AD#), -- LATIN SMALL LETTER T WITH HOOK .. LATIN SMALL LETTER T WITH HOOK
     (16#001B0#, 16#001B0#), -- LATIN SMALL LETTER U WITH HORN .. LATIN SMALL LETTER U WITH HORN
     (16#001B4#, 16#001B4#), -- LATIN SMALL LETTER Y WITH HOOK .. LATIN SMALL LETTER Y WITH HOOK
     (16#001B6#, 16#001B6#), -- LATIN SMALL LETTER Z WITH STROKE .. LATIN SMALL LETTER Z WITH STROKE
     (16#001B9#, 16#001B9#), -- LATIN SMALL LETTER EZH REVERSED .. LATIN SMALL LETTER EZH REVERSED
     (16#001BD#, 16#001BD#), -- LATIN SMALL LETTER TONE FIVE .. LATIN SMALL LETTER TONE FIVE
     (16#001BF#, 16#001BF#), -- LATIN LETTER WYNN .. LATIN LETTER WYNN
     (16#001C5#, 16#001C5#), -- LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON .. LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
     (16#001C6#, 16#001C6#), -- LATIN SMALL LETTER DZ WITH CARON .. LATIN SMALL LETTER DZ WITH CARON
     (16#001C8#, 16#001C8#), -- LATIN CAPITAL LETTER L WITH SMALL LETTER J .. LATIN CAPITAL LETTER L WITH SMALL LETTER J
     (16#001C9#, 16#001C9#), -- LATIN SMALL LETTER LJ .. LATIN SMALL LETTER LJ
     (16#001CB#, 16#001CB#), -- LATIN CAPITAL LETTER N WITH SMALL LETTER J .. LATIN CAPITAL LETTER N WITH SMALL LETTER J
     (16#001CC#, 16#001CC#), -- LATIN SMALL LETTER NJ .. LATIN SMALL LETTER NJ
     (16#001CE#, 16#001CE#), -- LATIN SMALL LETTER A WITH CARON .. LATIN SMALL LETTER A WITH CARON
     (16#001D0#, 16#001D0#), -- LATIN SMALL LETTER I WITH CARON .. LATIN SMALL LETTER I WITH CARON
     (16#001D2#, 16#001D2#), -- LATIN SMALL LETTER O WITH CARON .. LATIN SMALL LETTER O WITH CARON
     (16#001D4#, 16#001D4#), -- LATIN SMALL LETTER U WITH CARON .. LATIN SMALL LETTER U WITH CARON
     (16#001D6#, 16#001D6#), -- LATIN SMALL LETTER U WITH DIAERESIS AND MACRON .. LATIN SMALL LETTER U WITH DIAERESIS AND MACRON
     (16#001D8#, 16#001D8#), -- LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE .. LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE
     (16#001DA#, 16#001DA#), -- LATIN SMALL LETTER U WITH DIAERESIS AND CARON .. LATIN SMALL LETTER U WITH DIAERESIS AND CARON
     (16#001DC#, 16#001DC#), -- LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE .. LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE
     (16#001DD#, 16#001DD#), -- LATIN SMALL LETTER TURNED E .. LATIN SMALL LETTER TURNED E
     (16#001DF#, 16#001DF#), -- LATIN SMALL LETTER A WITH DIAERESIS AND MACRON .. LATIN SMALL LETTER A WITH DIAERESIS AND MACRON
     (16#001E1#, 16#001E1#), -- LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON .. LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON
     (16#001E3#, 16#001E3#), -- LATIN SMALL LETTER AE WITH MACRON .. LATIN SMALL LETTER AE WITH MACRON
     (16#001E5#, 16#001E5#), -- LATIN SMALL LETTER G WITH STROKE .. LATIN SMALL LETTER G WITH STROKE
     (16#001E7#, 16#001E7#), -- LATIN SMALL LETTER G WITH CARON .. LATIN SMALL LETTER G WITH CARON
     (16#001E9#, 16#001E9#), -- LATIN SMALL LETTER K WITH CARON .. LATIN SMALL LETTER K WITH CARON
     (16#001EB#, 16#001EB#), -- LATIN SMALL LETTER O WITH OGONEK .. LATIN SMALL LETTER O WITH OGONEK
     (16#001ED#, 16#001ED#), -- LATIN SMALL LETTER O WITH OGONEK AND MACRON .. LATIN SMALL LETTER O WITH OGONEK AND MACRON
     (16#001EF#, 16#001EF#), -- LATIN SMALL LETTER EZH WITH CARON .. LATIN SMALL LETTER EZH WITH CARON
     (16#001F2#, 16#001F2#), -- LATIN CAPITAL LETTER D WITH SMALL LETTER Z .. LATIN CAPITAL LETTER D WITH SMALL LETTER Z
     (16#001F3#, 16#001F3#), -- LATIN SMALL LETTER DZ .. LATIN SMALL LETTER DZ
     (16#001F5#, 16#001F5#), -- LATIN SMALL LETTER G WITH ACUTE .. LATIN SMALL LETTER G WITH ACUTE
     (16#001F9#, 16#001F9#), -- LATIN SMALL LETTER N WITH GRAVE .. LATIN SMALL LETTER N WITH GRAVE
     (16#001FB#, 16#001FB#), -- LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE .. LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE
     (16#001FD#, 16#001FD#), -- LATIN SMALL LETTER AE WITH ACUTE .. LATIN SMALL LETTER AE WITH ACUTE
     (16#001FF#, 16#001FF#), -- LATIN SMALL LETTER O WITH STROKE AND ACUTE .. LATIN SMALL LETTER O WITH STROKE AND ACUTE
     (16#00201#, 16#00201#), -- LATIN SMALL LETTER A WITH DOUBLE GRAVE .. LATIN SMALL LETTER A WITH DOUBLE GRAVE
     (16#00203#, 16#00203#), -- LATIN SMALL LETTER A WITH INVERTED BREVE .. LATIN SMALL LETTER A WITH INVERTED BREVE
     (16#00205#, 16#00205#), -- LATIN SMALL LETTER E WITH DOUBLE GRAVE .. LATIN SMALL LETTER E WITH DOUBLE GRAVE
     (16#00207#, 16#00207#), -- LATIN SMALL LETTER E WITH INVERTED BREVE .. LATIN SMALL LETTER E WITH INVERTED BREVE
     (16#00209#, 16#00209#), -- LATIN SMALL LETTER I WITH DOUBLE GRAVE .. LATIN SMALL LETTER I WITH DOUBLE GRAVE
     (16#0020B#, 16#0020B#), -- LATIN SMALL LETTER I WITH INVERTED BREVE .. LATIN SMALL LETTER I WITH INVERTED BREVE
     (16#0020D#, 16#0020D#), -- LATIN SMALL LETTER O WITH DOUBLE GRAVE .. LATIN SMALL LETTER O WITH DOUBLE GRAVE
     (16#0020F#, 16#0020F#), -- LATIN SMALL LETTER O WITH INVERTED BREVE .. LATIN SMALL LETTER O WITH INVERTED BREVE
     (16#00211#, 16#00211#), -- LATIN SMALL LETTER R WITH DOUBLE GRAVE .. LATIN SMALL LETTER R WITH DOUBLE GRAVE
     (16#00213#, 16#00213#), -- LATIN SMALL LETTER R WITH INVERTED BREVE .. LATIN SMALL LETTER R WITH INVERTED BREVE
     (16#00215#, 16#00215#), -- LATIN SMALL LETTER U WITH DOUBLE GRAVE .. LATIN SMALL LETTER U WITH DOUBLE GRAVE
     (16#00217#, 16#00217#), -- LATIN SMALL LETTER U WITH INVERTED BREVE .. LATIN SMALL LETTER U WITH INVERTED BREVE
     (16#00219#, 16#00219#), -- LATIN SMALL LETTER S WITH COMMA BELOW .. LATIN SMALL LETTER S WITH COMMA BELOW
     (16#0021B#, 16#0021B#), -- LATIN SMALL LETTER T WITH COMMA BELOW .. LATIN SMALL LETTER T WITH COMMA BELOW
     (16#0021D#, 16#0021D#), -- LATIN SMALL LETTER YOGH .. LATIN SMALL LETTER YOGH
     (16#0021F#, 16#0021F#), -- LATIN SMALL LETTER H WITH CARON .. LATIN SMALL LETTER H WITH CARON
     (16#00223#, 16#00223#), -- LATIN SMALL LETTER OU .. LATIN SMALL LETTER OU
     (16#00225#, 16#00225#), -- LATIN SMALL LETTER Z WITH HOOK .. LATIN SMALL LETTER Z WITH HOOK
     (16#00227#, 16#00227#), -- LATIN SMALL LETTER A WITH DOT ABOVE .. LATIN SMALL LETTER A WITH DOT ABOVE
     (16#00229#, 16#00229#), -- LATIN SMALL LETTER E WITH CEDILLA .. LATIN SMALL LETTER E WITH CEDILLA
     (16#0022B#, 16#0022B#), -- LATIN SMALL LETTER O WITH DIAERESIS AND MACRON .. LATIN SMALL LETTER O WITH DIAERESIS AND MACRON
     (16#0022D#, 16#0022D#), -- LATIN SMALL LETTER O WITH TILDE AND MACRON .. LATIN SMALL LETTER O WITH TILDE AND MACRON
     (16#0022F#, 16#0022F#), -- LATIN SMALL LETTER O WITH DOT ABOVE .. LATIN SMALL LETTER O WITH DOT ABOVE
     (16#00231#, 16#00231#), -- LATIN SMALL LETTER O WITH DOT ABOVE AND MACRON .. LATIN SMALL LETTER O WITH DOT ABOVE AND MACRON
     (16#00233#, 16#00233#), -- LATIN SMALL LETTER Y WITH MACRON .. LATIN SMALL LETTER Y WITH MACRON
     (16#00253#, 16#00253#), -- LATIN SMALL LETTER B WITH HOOK .. LATIN SMALL LETTER B WITH HOOK
     (16#00254#, 16#00254#), -- LATIN SMALL LETTER OPEN O .. LATIN SMALL LETTER OPEN O
     (16#00256#, 16#00257#), -- LATIN SMALL LETTER D WITH TAIL .. LATIN SMALL LETTER D WITH HOOK
     (16#00259#, 16#00259#), -- LATIN SMALL LETTER SCHWA .. LATIN SMALL LETTER SCHWA
     (16#0025B#, 16#0025B#), -- LATIN SMALL LETTER OPEN E .. LATIN SMALL LETTER OPEN E
     (16#00260#, 16#00260#), -- LATIN SMALL LETTER G WITH HOOK .. LATIN SMALL LETTER G WITH HOOK
     (16#00263#, 16#00263#), -- LATIN SMALL LETTER GAMMA .. LATIN SMALL LETTER GAMMA
     (16#00268#, 16#00268#), -- LATIN SMALL LETTER I WITH STROKE .. LATIN SMALL LETTER I WITH STROKE
     (16#00269#, 16#00269#), -- LATIN SMALL LETTER IOTA .. LATIN SMALL LETTER IOTA
     (16#0026F#, 16#0026F#), -- LATIN SMALL LETTER TURNED M .. LATIN SMALL LETTER TURNED M
     (16#00272#, 16#00272#), -- LATIN SMALL LETTER N WITH LEFT HOOK .. LATIN SMALL LETTER N WITH LEFT HOOK
     (16#00275#, 16#00275#), -- LATIN SMALL LETTER BARRED O .. LATIN SMALL LETTER BARRED O
     (16#00280#, 16#00280#), -- LATIN LETTER SMALL CAPITAL R .. LATIN LETTER SMALL CAPITAL R
     (16#00283#, 16#00283#), -- LATIN SMALL LETTER ESH .. LATIN SMALL LETTER ESH
     (16#00288#, 16#00288#), -- LATIN SMALL LETTER T WITH RETROFLEX HOOK .. LATIN SMALL LETTER T WITH RETROFLEX HOOK
     (16#0028A#, 16#0028B#), -- LATIN SMALL LETTER UPSILON .. LATIN SMALL LETTER V WITH HOOK
     (16#00292#, 16#00292#), -- LATIN SMALL LETTER EZH .. LATIN SMALL LETTER EZH
     (16#003AC#, 16#003AC#), -- GREEK SMALL LETTER ALPHA WITH TONOS .. GREEK SMALL LETTER ALPHA WITH TONOS
     (16#003AD#, 16#003AF#), -- GREEK SMALL LETTER EPSILON WITH TONOS .. GREEK SMALL LETTER IOTA WITH TONOS
     (16#003B1#, 16#003C1#), -- GREEK SMALL LETTER ALPHA .. GREEK SMALL LETTER RHO
     (16#003C2#, 16#003C2#), -- GREEK SMALL LETTER FINAL SIGMA .. GREEK SMALL LETTER FINAL SIGMA
     (16#003C3#, 16#003CB#), -- GREEK SMALL LETTER SIGMA .. GREEK SMALL LETTER UPSILON WITH DIALYTIKA
     (16#003CC#, 16#003CC#), -- GREEK SMALL LETTER OMICRON WITH TONOS .. GREEK SMALL LETTER OMICRON WITH TONOS
     (16#003CD#, 16#003CE#), -- GREEK SMALL LETTER UPSILON WITH TONOS .. GREEK SMALL LETTER OMEGA WITH TONOS
     (16#003D0#, 16#003D0#), -- GREEK BETA SYMBOL .. GREEK BETA SYMBOL
     (16#003D1#, 16#003D1#), -- GREEK THETA SYMBOL .. GREEK THETA SYMBOL
     (16#003D5#, 16#003D5#), -- GREEK PHI SYMBOL .. GREEK PHI SYMBOL
     (16#003D6#, 16#003D6#), -- GREEK PI SYMBOL .. GREEK PI SYMBOL
     (16#003D9#, 16#003D9#), -- GREEK SMALL LETTER ARCHAIC KOPPA .. GREEK SMALL LETTER ARCHAIC KOPPA
     (16#003DB#, 16#003DB#), -- GREEK SMALL LETTER STIGMA .. GREEK SMALL LETTER STIGMA
     (16#003DD#, 16#003DD#), -- GREEK SMALL LETTER DIGAMMA .. GREEK SMALL LETTER DIGAMMA
     (16#003DF#, 16#003DF#), -- GREEK SMALL LETTER KOPPA .. GREEK SMALL LETTER KOPPA
     (16#003E1#, 16#003E1#), -- GREEK SMALL LETTER SAMPI .. GREEK SMALL LETTER SAMPI
     (16#003E3#, 16#003E3#), -- COPTIC SMALL LETTER SHEI .. COPTIC SMALL LETTER SHEI
     (16#003E5#, 16#003E5#), -- COPTIC SMALL LETTER FEI .. COPTIC SMALL LETTER FEI
     (16#003E7#, 16#003E7#), -- COPTIC SMALL LETTER KHEI .. COPTIC SMALL LETTER KHEI
     (16#003E9#, 16#003E9#), -- COPTIC SMALL LETTER HORI .. COPTIC SMALL LETTER HORI
     (16#003EB#, 16#003EB#), -- COPTIC SMALL LETTER GANGIA .. COPTIC SMALL LETTER GANGIA
     (16#003ED#, 16#003ED#), -- COPTIC SMALL LETTER SHIMA .. COPTIC SMALL LETTER SHIMA
     (16#003EF#, 16#003EF#), -- COPTIC SMALL LETTER DEI .. COPTIC SMALL LETTER DEI
     (16#003F0#, 16#003F0#), -- GREEK KAPPA SYMBOL .. GREEK KAPPA SYMBOL
     (16#003F1#, 16#003F1#), -- GREEK RHO SYMBOL .. GREEK RHO SYMBOL
     (16#003F2#, 16#003F2#), -- GREEK LUNATE SIGMA SYMBOL .. GREEK LUNATE SIGMA SYMBOL
     (16#003F5#, 16#003F5#), -- GREEK LUNATE EPSILON SYMBOL .. GREEK LUNATE EPSILON SYMBOL
     (16#00430#, 16#0044F#), -- CYRILLIC SMALL LETTER A .. CYRILLIC SMALL LETTER YA
     (16#00450#, 16#0045F#), -- CYRILLIC SMALL LETTER IE WITH GRAVE .. CYRILLIC SMALL LETTER DZHE
     (16#00461#, 16#00461#), -- CYRILLIC SMALL LETTER OMEGA .. CYRILLIC SMALL LETTER OMEGA
     (16#00463#, 16#00463#), -- CYRILLIC SMALL LETTER YAT .. CYRILLIC SMALL LETTER YAT
     (16#00465#, 16#00465#), -- CYRILLIC SMALL LETTER IOTIFIED E .. CYRILLIC SMALL LETTER IOTIFIED E
     (16#00467#, 16#00467#), -- CYRILLIC SMALL LETTER LITTLE YUS .. CYRILLIC SMALL LETTER LITTLE YUS
     (16#00469#, 16#00469#), -- CYRILLIC SMALL LETTER IOTIFIED LITTLE YUS .. CYRILLIC SMALL LETTER IOTIFIED LITTLE YUS
     (16#0046B#, 16#0046B#), -- CYRILLIC SMALL LETTER BIG YUS .. CYRILLIC SMALL LETTER BIG YUS
     (16#0046D#, 16#0046D#), -- CYRILLIC SMALL LETTER IOTIFIED BIG YUS .. CYRILLIC SMALL LETTER IOTIFIED BIG YUS
     (16#0046F#, 16#0046F#), -- CYRILLIC SMALL LETTER KSI .. CYRILLIC SMALL LETTER KSI
     (16#00471#, 16#00471#), -- CYRILLIC SMALL LETTER PSI .. CYRILLIC SMALL LETTER PSI
     (16#00473#, 16#00473#), -- CYRILLIC SMALL LETTER FITA .. CYRILLIC SMALL LETTER FITA
     (16#00475#, 16#00475#), -- CYRILLIC SMALL LETTER IZHITSA .. CYRILLIC SMALL LETTER IZHITSA
     (16#00477#, 16#00477#), -- CYRILLIC SMALL LETTER IZHITSA WITH DOUBLE GRAVE ACCENT .. CYRILLIC SMALL LETTER IZHITSA WITH DOUBLE GRAVE ACCENT
     (16#00479#, 16#00479#), -- CYRILLIC SMALL LETTER UK .. CYRILLIC SMALL LETTER UK
     (16#0047B#, 16#0047B#), -- CYRILLIC SMALL LETTER ROUND OMEGA .. CYRILLIC SMALL LETTER ROUND OMEGA
     (16#0047D#, 16#0047D#), -- CYRILLIC SMALL LETTER OMEGA WITH TITLO .. CYRILLIC SMALL LETTER OMEGA WITH TITLO
     (16#0047F#, 16#0047F#), -- CYRILLIC SMALL LETTER OT .. CYRILLIC SMALL LETTER OT
     (16#00481#, 16#00481#), -- CYRILLIC SMALL LETTER KOPPA .. CYRILLIC SMALL LETTER KOPPA
     (16#0048B#, 16#0048B#), -- CYRILLIC SMALL LETTER SHORT I WITH TAIL .. CYRILLIC SMALL LETTER SHORT I WITH TAIL
     (16#0048D#, 16#0048D#), -- CYRILLIC SMALL LETTER SEMISOFT SIGN .. CYRILLIC SMALL LETTER SEMISOFT SIGN
     (16#0048F#, 16#0048F#), -- CYRILLIC SMALL LETTER ER WITH TICK .. CYRILLIC SMALL LETTER ER WITH TICK
     (16#00491#, 16#00491#), -- CYRILLIC SMALL LETTER GHE WITH UPTURN .. CYRILLIC SMALL LETTER GHE WITH UPTURN
     (16#00493#, 16#00493#), -- CYRILLIC SMALL LETTER GHE WITH STROKE .. CYRILLIC SMALL LETTER GHE WITH STROKE
     (16#00495#, 16#00495#), -- CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK .. CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK
     (16#00497#, 16#00497#), -- CYRILLIC SMALL LETTER ZHE WITH DESCENDER .. CYRILLIC SMALL LETTER ZHE WITH DESCENDER
     (16#00499#, 16#00499#), -- CYRILLIC SMALL LETTER ZE WITH DESCENDER .. CYRILLIC SMALL LETTER ZE WITH DESCENDER
     (16#0049B#, 16#0049B#), -- CYRILLIC SMALL LETTER KA WITH DESCENDER .. CYRILLIC SMALL LETTER KA WITH DESCENDER
     (16#0049D#, 16#0049D#), -- CYRILLIC SMALL LETTER KA WITH VERTICAL STROKE .. CYRILLIC SMALL LETTER KA WITH VERTICAL STROKE
     (16#0049F#, 16#0049F#), -- CYRILLIC SMALL LETTER KA WITH STROKE .. CYRILLIC SMALL LETTER KA WITH STROKE
     (16#004A1#, 16#004A1#), -- CYRILLIC SMALL LETTER BASHKIR KA .. CYRILLIC SMALL LETTER BASHKIR KA
     (16#004A3#, 16#004A3#), -- CYRILLIC SMALL LETTER EN WITH DESCENDER .. CYRILLIC SMALL LETTER EN WITH DESCENDER
     (16#004A5#, 16#004A5#), -- CYRILLIC SMALL LIGATURE EN GHE .. CYRILLIC SMALL LIGATURE EN GHE
     (16#004A7#, 16#004A7#), -- CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK .. CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK
     (16#004A9#, 16#004A9#), -- CYRILLIC SMALL LETTER ABKHASIAN HA .. CYRILLIC SMALL LETTER ABKHASIAN HA
     (16#004AB#, 16#004AB#), -- CYRILLIC SMALL LETTER ES WITH DESCENDER .. CYRILLIC SMALL LETTER ES WITH DESCENDER
     (16#004AD#, 16#004AD#), -- CYRILLIC SMALL LETTER TE WITH DESCENDER .. CYRILLIC SMALL LETTER TE WITH DESCENDER
     (16#004AF#, 16#004AF#), -- CYRILLIC SMALL LETTER STRAIGHT U .. CYRILLIC SMALL LETTER STRAIGHT U
     (16#004B1#, 16#004B1#), -- CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE .. CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE
     (16#004B3#, 16#004B3#), -- CYRILLIC SMALL LETTER HA WITH DESCENDER .. CYRILLIC SMALL LETTER HA WITH DESCENDER
     (16#004B5#, 16#004B5#), -- CYRILLIC SMALL LIGATURE TE TSE .. CYRILLIC SMALL LIGATURE TE TSE
     (16#004B7#, 16#004B7#), -- CYRILLIC SMALL LETTER CHE WITH DESCENDER .. CYRILLIC SMALL LETTER CHE WITH DESCENDER
     (16#004B9#, 16#004B9#), -- CYRILLIC SMALL LETTER CHE WITH VERTICAL STROKE .. CYRILLIC SMALL LETTER CHE WITH VERTICAL STROKE
     (16#004BB#, 16#004BB#), -- CYRILLIC SMALL LETTER SHHA .. CYRILLIC SMALL LETTER SHHA
     (16#004BD#, 16#004BD#), -- CYRILLIC SMALL LETTER ABKHASIAN CHE .. CYRILLIC SMALL LETTER ABKHASIAN CHE
     (16#004BF#, 16#004BF#), -- CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER .. CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER
     (16#004C2#, 16#004C2#), -- CYRILLIC SMALL LETTER ZHE WITH BREVE .. CYRILLIC SMALL LETTER ZHE WITH BREVE
     (16#004C4#, 16#004C4#), -- CYRILLIC SMALL LETTER KA WITH HOOK .. CYRILLIC SMALL LETTER KA WITH HOOK
     (16#004C6#, 16#004C6#), -- CYRILLIC SMALL LETTER EL WITH TAIL .. CYRILLIC SMALL LETTER EL WITH TAIL
     (16#004C8#, 16#004C8#), -- CYRILLIC SMALL LETTER EN WITH HOOK .. CYRILLIC SMALL LETTER EN WITH HOOK
     (16#004CA#, 16#004CA#), -- CYRILLIC SMALL LETTER EN WITH TAIL .. CYRILLIC SMALL LETTER EN WITH TAIL
     (16#004CC#, 16#004CC#), -- CYRILLIC SMALL LETTER KHAKASSIAN CHE .. CYRILLIC SMALL LETTER KHAKASSIAN CHE
     (16#004CE#, 16#004CE#), -- CYRILLIC SMALL LETTER EM WITH TAIL .. CYRILLIC SMALL LETTER EM WITH TAIL
     (16#004D1#, 16#004D1#), -- CYRILLIC SMALL LETTER A WITH BREVE .. CYRILLIC SMALL LETTER A WITH BREVE
     (16#004D3#, 16#004D3#), -- CYRILLIC SMALL LETTER A WITH DIAERESIS .. CYRILLIC SMALL LETTER A WITH DIAERESIS
     (16#004D5#, 16#004D5#), -- CYRILLIC SMALL LIGATURE A IE .. CYRILLIC SMALL LIGATURE A IE
     (16#004D7#, 16#004D7#), -- CYRILLIC SMALL LETTER IE WITH BREVE .. CYRILLIC SMALL LETTER IE WITH BREVE
     (16#004D9#, 16#004D9#), -- CYRILLIC SMALL LETTER SCHWA .. CYRILLIC SMALL LETTER SCHWA
     (16#004DB#, 16#004DB#), -- CYRILLIC SMALL LETTER SCHWA WITH DIAERESIS .. CYRILLIC SMALL LETTER SCHWA WITH DIAERESIS
     (16#004DD#, 16#004DD#), -- CYRILLIC SMALL LETTER ZHE WITH DIAERESIS .. CYRILLIC SMALL LETTER ZHE WITH DIAERESIS
     (16#004DF#, 16#004DF#), -- CYRILLIC SMALL LETTER ZE WITH DIAERESIS .. CYRILLIC SMALL LETTER ZE WITH DIAERESIS
     (16#004E1#, 16#004E1#), -- CYRILLIC SMALL LETTER ABKHASIAN DZE .. CYRILLIC SMALL LETTER ABKHASIAN DZE
     (16#004E3#, 16#004E3#), -- CYRILLIC SMALL LETTER I WITH MACRON .. CYRILLIC SMALL LETTER I WITH MACRON
     (16#004E5#, 16#004E5#), -- CYRILLIC SMALL LETTER I WITH DIAERESIS .. CYRILLIC SMALL LETTER I WITH DIAERESIS
     (16#004E7#, 16#004E7#), -- CYRILLIC SMALL LETTER O WITH DIAERESIS .. CYRILLIC SMALL LETTER O WITH DIAERESIS
     (16#004E9#, 16#004E9#), -- CYRILLIC SMALL LETTER BARRED O .. CYRILLIC SMALL LETTER BARRED O
     (16#004EB#, 16#004EB#), -- CYRILLIC SMALL LETTER BARRED O WITH DIAERESIS .. CYRILLIC SMALL LETTER BARRED O WITH DIAERESIS
     (16#004ED#, 16#004ED#), -- CYRILLIC SMALL LETTER E WITH DIAERESIS .. CYRILLIC SMALL LETTER E WITH DIAERESIS
     (16#004EF#, 16#004EF#), -- CYRILLIC SMALL LETTER U WITH MACRON .. CYRILLIC SMALL LETTER U WITH MACRON
     (16#004F1#, 16#004F1#), -- CYRILLIC SMALL LETTER U WITH DIAERESIS .. CYRILLIC SMALL LETTER U WITH DIAERESIS
     (16#004F3#, 16#004F3#), -- CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE .. CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE
     (16#004F5#, 16#004F5#), -- CYRILLIC SMALL LETTER CHE WITH DIAERESIS .. CYRILLIC SMALL LETTER CHE WITH DIAERESIS
     (16#004F9#, 16#004F9#), -- CYRILLIC SMALL LETTER YERU WITH DIAERESIS .. CYRILLIC SMALL LETTER YERU WITH DIAERESIS
     (16#00501#, 16#00501#), -- CYRILLIC SMALL LETTER KOMI DE .. CYRILLIC SMALL LETTER KOMI DE
     (16#00503#, 16#00503#), -- CYRILLIC SMALL LETTER KOMI DJE .. CYRILLIC SMALL LETTER KOMI DJE
     (16#00505#, 16#00505#), -- CYRILLIC SMALL LETTER KOMI ZJE .. CYRILLIC SMALL LETTER KOMI ZJE
     (16#00507#, 16#00507#), -- CYRILLIC SMALL LETTER KOMI DZJE .. CYRILLIC SMALL LETTER KOMI DZJE
     (16#00509#, 16#00509#), -- CYRILLIC SMALL LETTER KOMI LJE .. CYRILLIC SMALL LETTER KOMI LJE
     (16#0050B#, 16#0050B#), -- CYRILLIC SMALL LETTER KOMI NJE .. CYRILLIC SMALL LETTER KOMI NJE
     (16#0050D#, 16#0050D#), -- CYRILLIC SMALL LETTER KOMI SJE .. CYRILLIC SMALL LETTER KOMI SJE
     (16#0050F#, 16#0050F#), -- CYRILLIC SMALL LETTER KOMI TJE .. CYRILLIC SMALL LETTER KOMI TJE
     (16#00561#, 16#00586#), -- ARMENIAN SMALL LETTER AYB .. ARMENIAN SMALL LETTER FEH
     (16#01E01#, 16#01E01#), -- LATIN SMALL LETTER A WITH RING BELOW .. LATIN SMALL LETTER A WITH RING BELOW
     (16#01E03#, 16#01E03#), -- LATIN SMALL LETTER B WITH DOT ABOVE .. LATIN SMALL LETTER B WITH DOT ABOVE
     (16#01E05#, 16#01E05#), -- LATIN SMALL LETTER B WITH DOT BELOW .. LATIN SMALL LETTER B WITH DOT BELOW
     (16#01E07#, 16#01E07#), -- LATIN SMALL LETTER B WITH LINE BELOW .. LATIN SMALL LETTER B WITH LINE BELOW
     (16#01E09#, 16#01E09#), -- LATIN SMALL LETTER C WITH CEDILLA AND ACUTE .. LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
     (16#01E0B#, 16#01E0B#), -- LATIN SMALL LETTER D WITH DOT ABOVE .. LATIN SMALL LETTER D WITH DOT ABOVE
     (16#01E0D#, 16#01E0D#), -- LATIN SMALL LETTER D WITH DOT BELOW .. LATIN SMALL LETTER D WITH DOT BELOW
     (16#01E0F#, 16#01E0F#), -- LATIN SMALL LETTER D WITH LINE BELOW .. LATIN SMALL LETTER D WITH LINE BELOW
     (16#01E11#, 16#01E11#), -- LATIN SMALL LETTER D WITH CEDILLA .. LATIN SMALL LETTER D WITH CEDILLA
     (16#01E13#, 16#01E13#), -- LATIN SMALL LETTER D WITH CIRCUMFLEX BELOW .. LATIN SMALL LETTER D WITH CIRCUMFLEX BELOW
     (16#01E15#, 16#01E15#), -- LATIN SMALL LETTER E WITH MACRON AND GRAVE .. LATIN SMALL LETTER E WITH MACRON AND GRAVE
     (16#01E17#, 16#01E17#), -- LATIN SMALL LETTER E WITH MACRON AND ACUTE .. LATIN SMALL LETTER E WITH MACRON AND ACUTE
     (16#01E19#, 16#01E19#), -- LATIN SMALL LETTER E WITH CIRCUMFLEX BELOW .. LATIN SMALL LETTER E WITH CIRCUMFLEX BELOW
     (16#01E1B#, 16#01E1B#), -- LATIN SMALL LETTER E WITH TILDE BELOW .. LATIN SMALL LETTER E WITH TILDE BELOW
     (16#01E1D#, 16#01E1D#), -- LATIN SMALL LETTER E WITH CEDILLA AND BREVE .. LATIN SMALL LETTER E WITH CEDILLA AND BREVE
     (16#01E1F#, 16#01E1F#), -- LATIN SMALL LETTER F WITH DOT ABOVE .. LATIN SMALL LETTER F WITH DOT ABOVE
     (16#01E21#, 16#01E21#), -- LATIN SMALL LETTER G WITH MACRON .. LATIN SMALL LETTER G WITH MACRON
     (16#01E23#, 16#01E23#), -- LATIN SMALL LETTER H WITH DOT ABOVE .. LATIN SMALL LETTER H WITH DOT ABOVE
     (16#01E25#, 16#01E25#), -- LATIN SMALL LETTER H WITH DOT BELOW .. LATIN SMALL LETTER H WITH DOT BELOW
     (16#01E27#, 16#01E27#), -- LATIN SMALL LETTER H WITH DIAERESIS .. LATIN SMALL LETTER H WITH DIAERESIS
     (16#01E29#, 16#01E29#), -- LATIN SMALL LETTER H WITH CEDILLA .. LATIN SMALL LETTER H WITH CEDILLA
     (16#01E2B#, 16#01E2B#), -- LATIN SMALL LETTER H WITH BREVE BELOW .. LATIN SMALL LETTER H WITH BREVE BELOW
     (16#01E2D#, 16#01E2D#), -- LATIN SMALL LETTER I WITH TILDE BELOW .. LATIN SMALL LETTER I WITH TILDE BELOW
     (16#01E2F#, 16#01E2F#), -- LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE .. LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE
     (16#01E31#, 16#01E31#), -- LATIN SMALL LETTER K WITH ACUTE .. LATIN SMALL LETTER K WITH ACUTE
     (16#01E33#, 16#01E33#), -- LATIN SMALL LETTER K WITH DOT BELOW .. LATIN SMALL LETTER K WITH DOT BELOW
     (16#01E35#, 16#01E35#), -- LATIN SMALL LETTER K WITH LINE BELOW .. LATIN SMALL LETTER K WITH LINE BELOW
     (16#01E37#, 16#01E37#), -- LATIN SMALL LETTER L WITH DOT BELOW .. LATIN SMALL LETTER L WITH DOT BELOW
     (16#01E39#, 16#01E39#), -- LATIN SMALL LETTER L WITH DOT BELOW AND MACRON .. LATIN SMALL LETTER L WITH DOT BELOW AND MACRON
     (16#01E3B#, 16#01E3B#), -- LATIN SMALL LETTER L WITH LINE BELOW .. LATIN SMALL LETTER L WITH LINE BELOW
     (16#01E3D#, 16#01E3D#), -- LATIN SMALL LETTER L WITH CIRCUMFLEX BELOW .. LATIN SMALL LETTER L WITH CIRCUMFLEX BELOW
     (16#01E3F#, 16#01E3F#), -- LATIN SMALL LETTER M WITH ACUTE .. LATIN SMALL LETTER M WITH ACUTE
     (16#01E41#, 16#01E41#), -- LATIN SMALL LETTER M WITH DOT ABOVE .. LATIN SMALL LETTER M WITH DOT ABOVE
     (16#01E43#, 16#01E43#), -- LATIN SMALL LETTER M WITH DOT BELOW .. LATIN SMALL LETTER M WITH DOT BELOW
     (16#01E45#, 16#01E45#), -- LATIN SMALL LETTER N WITH DOT ABOVE .. LATIN SMALL LETTER N WITH DOT ABOVE
     (16#01E47#, 16#01E47#), -- LATIN SMALL LETTER N WITH DOT BELOW .. LATIN SMALL LETTER N WITH DOT BELOW
     (16#01E49#, 16#01E49#), -- LATIN SMALL LETTER N WITH LINE BELOW .. LATIN SMALL LETTER N WITH LINE BELOW
     (16#01E4B#, 16#01E4B#), -- LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW .. LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW
     (16#01E4D#, 16#01E4D#), -- LATIN SMALL LETTER O WITH TILDE AND ACUTE .. LATIN SMALL LETTER O WITH TILDE AND ACUTE
     (16#01E4F#, 16#01E4F#), -- LATIN SMALL LETTER O WITH TILDE AND DIAERESIS .. LATIN SMALL LETTER O WITH TILDE AND DIAERESIS
     (16#01E51#, 16#01E51#), -- LATIN SMALL LETTER O WITH MACRON AND GRAVE .. LATIN SMALL LETTER O WITH MACRON AND GRAVE
     (16#01E53#, 16#01E53#), -- LATIN SMALL LETTER O WITH MACRON AND ACUTE .. LATIN SMALL LETTER O WITH MACRON AND ACUTE
     (16#01E55#, 16#01E55#), -- LATIN SMALL LETTER P WITH ACUTE .. LATIN SMALL LETTER P WITH ACUTE
     (16#01E57#, 16#01E57#), -- LATIN SMALL LETTER P WITH DOT ABOVE .. LATIN SMALL LETTER P WITH DOT ABOVE
     (16#01E59#, 16#01E59#), -- LATIN SMALL LETTER R WITH DOT ABOVE .. LATIN SMALL LETTER R WITH DOT ABOVE
     (16#01E5B#, 16#01E5B#), -- LATIN SMALL LETTER R WITH DOT BELOW .. LATIN SMALL LETTER R WITH DOT BELOW
     (16#01E5D#, 16#01E5D#), -- LATIN SMALL LETTER R WITH DOT BELOW AND MACRON .. LATIN SMALL LETTER R WITH DOT BELOW AND MACRON
     (16#01E5F#, 16#01E5F#), -- LATIN SMALL LETTER R WITH LINE BELOW .. LATIN SMALL LETTER R WITH LINE BELOW
     (16#01E61#, 16#01E61#), -- LATIN SMALL LETTER S WITH DOT ABOVE .. LATIN SMALL LETTER S WITH DOT ABOVE
     (16#01E63#, 16#01E63#), -- LATIN SMALL LETTER S WITH DOT BELOW .. LATIN SMALL LETTER S WITH DOT BELOW
     (16#01E65#, 16#01E65#), -- LATIN SMALL LETTER S WITH ACUTE AND DOT ABOVE .. LATIN SMALL LETTER S WITH ACUTE AND DOT ABOVE
     (16#01E67#, 16#01E67#), -- LATIN SMALL LETTER S WITH CARON AND DOT ABOVE .. LATIN SMALL LETTER S WITH CARON AND DOT ABOVE
     (16#01E69#, 16#01E69#), -- LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE .. LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE
     (16#01E6B#, 16#01E6B#), -- LATIN SMALL LETTER T WITH DOT ABOVE .. LATIN SMALL LETTER T WITH DOT ABOVE
     (16#01E6D#, 16#01E6D#), -- LATIN SMALL LETTER T WITH DOT BELOW .. LATIN SMALL LETTER T WITH DOT BELOW
     (16#01E6F#, 16#01E6F#), -- LATIN SMALL LETTER T WITH LINE BELOW .. LATIN SMALL LETTER T WITH LINE BELOW
     (16#01E71#, 16#01E71#), -- LATIN SMALL LETTER T WITH CIRCUMFLEX BELOW .. LATIN SMALL LETTER T WITH CIRCUMFLEX BELOW
     (16#01E73#, 16#01E73#), -- LATIN SMALL LETTER U WITH DIAERESIS BELOW .. LATIN SMALL LETTER U WITH DIAERESIS BELOW
     (16#01E75#, 16#01E75#), -- LATIN SMALL LETTER U WITH TILDE BELOW .. LATIN SMALL LETTER U WITH TILDE BELOW
     (16#01E77#, 16#01E77#), -- LATIN SMALL LETTER U WITH CIRCUMFLEX BELOW .. LATIN SMALL LETTER U WITH CIRCUMFLEX BELOW
     (16#01E79#, 16#01E79#), -- LATIN SMALL LETTER U WITH TILDE AND ACUTE .. LATIN SMALL LETTER U WITH TILDE AND ACUTE
     (16#01E7B#, 16#01E7B#), -- LATIN SMALL LETTER U WITH MACRON AND DIAERESIS .. LATIN SMALL LETTER U WITH MACRON AND DIAERESIS
     (16#01E7D#, 16#01E7D#), -- LATIN SMALL LETTER V WITH TILDE .. LATIN SMALL LETTER V WITH TILDE
     (16#01E7F#, 16#01E7F#), -- LATIN SMALL LETTER V WITH DOT BELOW .. LATIN SMALL LETTER V WITH DOT BELOW
     (16#01E81#, 16#01E81#), -- LATIN SMALL LETTER W WITH GRAVE .. LATIN SMALL LETTER W WITH GRAVE
     (16#01E83#, 16#01E83#), -- LATIN SMALL LETTER W WITH ACUTE .. LATIN SMALL LETTER W WITH ACUTE
     (16#01E85#, 16#01E85#), -- LATIN SMALL LETTER W WITH DIAERESIS .. LATIN SMALL LETTER W WITH DIAERESIS
     (16#01E87#, 16#01E87#), -- LATIN SMALL LETTER W WITH DOT ABOVE .. LATIN SMALL LETTER W WITH DOT ABOVE
     (16#01E89#, 16#01E89#), -- LATIN SMALL LETTER W WITH DOT BELOW .. LATIN SMALL LETTER W WITH DOT BELOW
     (16#01E8B#, 16#01E8B#), -- LATIN SMALL LETTER X WITH DOT ABOVE .. LATIN SMALL LETTER X WITH DOT ABOVE
     (16#01E8D#, 16#01E8D#), -- LATIN SMALL LETTER X WITH DIAERESIS .. LATIN SMALL LETTER X WITH DIAERESIS
     (16#01E8F#, 16#01E8F#), -- LATIN SMALL LETTER Y WITH DOT ABOVE .. LATIN SMALL LETTER Y WITH DOT ABOVE
     (16#01E91#, 16#01E91#), -- LATIN SMALL LETTER Z WITH CIRCUMFLEX .. LATIN SMALL LETTER Z WITH CIRCUMFLEX
     (16#01E93#, 16#01E93#), -- LATIN SMALL LETTER Z WITH DOT BELOW .. LATIN SMALL LETTER Z WITH DOT BELOW
     (16#01E95#, 16#01E95#), -- LATIN SMALL LETTER Z WITH LINE BELOW .. LATIN SMALL LETTER Z WITH LINE BELOW
     (16#01E9B#, 16#01E9B#), -- LATIN SMALL LETTER LONG S WITH DOT ABOVE .. LATIN SMALL LETTER LONG S WITH DOT ABOVE
     (16#01EA1#, 16#01EA1#), -- LATIN SMALL LETTER A WITH DOT BELOW .. LATIN SMALL LETTER A WITH DOT BELOW
     (16#01EA3#, 16#01EA3#), -- LATIN SMALL LETTER A WITH HOOK ABOVE .. LATIN SMALL LETTER A WITH HOOK ABOVE
     (16#01EA5#, 16#01EA5#), -- LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE .. LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE
     (16#01EA7#, 16#01EA7#), -- LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE .. LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE
     (16#01EA9#, 16#01EA9#), -- LATIN SMALL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE .. LATIN SMALL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE
     (16#01EAB#, 16#01EAB#), -- LATIN SMALL LETTER A WITH CIRCUMFLEX AND TILDE .. LATIN SMALL LETTER A WITH CIRCUMFLEX AND TILDE
     (16#01EAD#, 16#01EAD#), -- LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW .. LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW
     (16#01EAF#, 16#01EAF#), -- LATIN SMALL LETTER A WITH BREVE AND ACUTE .. LATIN SMALL LETTER A WITH BREVE AND ACUTE
     (16#01EB1#, 16#01EB1#), -- LATIN SMALL LETTER A WITH BREVE AND GRAVE .. LATIN SMALL LETTER A WITH BREVE AND GRAVE
     (16#01EB3#, 16#01EB3#), -- LATIN SMALL LETTER A WITH BREVE AND HOOK ABOVE .. LATIN SMALL LETTER A WITH BREVE AND HOOK ABOVE
     (16#01EB5#, 16#01EB5#), -- LATIN SMALL LETTER A WITH BREVE AND TILDE .. LATIN SMALL LETTER A WITH BREVE AND TILDE
     (16#01EB7#, 16#01EB7#), -- LATIN SMALL LETTER A WITH BREVE AND DOT BELOW .. LATIN SMALL LETTER A WITH BREVE AND DOT BELOW
     (16#01EB9#, 16#01EB9#), -- LATIN SMALL LETTER E WITH DOT BELOW .. LATIN SMALL LETTER E WITH DOT BELOW
     (16#01EBB#, 16#01EBB#), -- LATIN SMALL LETTER E WITH HOOK ABOVE .. LATIN SMALL LETTER E WITH HOOK ABOVE
     (16#01EBD#, 16#01EBD#), -- LATIN SMALL LETTER E WITH TILDE .. LATIN SMALL LETTER E WITH TILDE
     (16#01EBF#, 16#01EBF#), -- LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE .. LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE
     (16#01EC1#, 16#01EC1#), -- LATIN SMALL LETTER E WITH CIRCUMFLEX AND GRAVE .. LATIN SMALL LETTER E WITH CIRCUMFLEX AND GRAVE
     (16#01EC3#, 16#01EC3#), -- LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE .. LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE
     (16#01EC5#, 16#01EC5#), -- LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE .. LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE
     (16#01EC7#, 16#01EC7#), -- LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW .. LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW
     (16#01EC9#, 16#01EC9#), -- LATIN SMALL LETTER I WITH HOOK ABOVE .. LATIN SMALL LETTER I WITH HOOK ABOVE
     (16#01ECB#, 16#01ECB#), -- LATIN SMALL LETTER I WITH DOT BELOW .. LATIN SMALL LETTER I WITH DOT BELOW
     (16#01ECD#, 16#01ECD#), -- LATIN SMALL LETTER O WITH DOT BELOW .. LATIN SMALL LETTER O WITH DOT BELOW
     (16#01ECF#, 16#01ECF#), -- LATIN SMALL LETTER O WITH HOOK ABOVE .. LATIN SMALL LETTER O WITH HOOK ABOVE
     (16#01ED1#, 16#01ED1#), -- LATIN SMALL LETTER O WITH CIRCUMFLEX AND ACUTE .. LATIN SMALL LETTER O WITH CIRCUMFLEX AND ACUTE
     (16#01ED3#, 16#01ED3#), -- LATIN SMALL LETTER O WITH CIRCUMFLEX AND GRAVE .. LATIN SMALL LETTER O WITH CIRCUMFLEX AND GRAVE
     (16#01ED5#, 16#01ED5#), -- LATIN SMALL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE .. LATIN SMALL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE
     (16#01ED7#, 16#01ED7#), -- LATIN SMALL LETTER O WITH CIRCUMFLEX AND TILDE .. LATIN SMALL LETTER O WITH CIRCUMFLEX AND TILDE
     (16#01ED9#, 16#01ED9#), -- LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW .. LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW
     (16#01EDB#, 16#01EDB#), -- LATIN SMALL LETTER O WITH HORN AND ACUTE .. LATIN SMALL LETTER O WITH HORN AND ACUTE
     (16#01EDD#, 16#01EDD#), -- LATIN SMALL LETTER O WITH HORN AND GRAVE .. LATIN SMALL LETTER O WITH HORN AND GRAVE
     (16#01EDF#, 16#01EDF#), -- LATIN SMALL LETTER O WITH HORN AND HOOK ABOVE .. LATIN SMALL LETTER O WITH HORN AND HOOK ABOVE
     (16#01EE1#, 16#01EE1#), -- LATIN SMALL LETTER O WITH HORN AND TILDE .. LATIN SMALL LETTER O WITH HORN AND TILDE
     (16#01EE3#, 16#01EE3#), -- LATIN SMALL LETTER O WITH HORN AND DOT BELOW .. LATIN SMALL LETTER O WITH HORN AND DOT BELOW
     (16#01EE5#, 16#01EE5#), -- LATIN SMALL LETTER U WITH DOT BELOW .. LATIN SMALL LETTER U WITH DOT BELOW
     (16#01EE7#, 16#01EE7#), -- LATIN SMALL LETTER U WITH HOOK ABOVE .. LATIN SMALL LETTER U WITH HOOK ABOVE
     (16#01EE9#, 16#01EE9#), -- LATIN SMALL LETTER U WITH HORN AND ACUTE .. LATIN SMALL LETTER U WITH HORN AND ACUTE
     (16#01EEB#, 16#01EEB#), -- LATIN SMALL LETTER U WITH HORN AND GRAVE .. LATIN SMALL LETTER U WITH HORN AND GRAVE
     (16#01EED#, 16#01EED#), -- LATIN SMALL LETTER U WITH HORN AND HOOK ABOVE .. LATIN SMALL LETTER U WITH HORN AND HOOK ABOVE
     (16#01EEF#, 16#01EEF#), -- LATIN SMALL LETTER U WITH HORN AND TILDE .. LATIN SMALL LETTER U WITH HORN AND TILDE
     (16#01EF1#, 16#01EF1#), -- LATIN SMALL LETTER U WITH HORN AND DOT BELOW .. LATIN SMALL LETTER U WITH HORN AND DOT BELOW
     (16#01EF3#, 16#01EF3#), -- LATIN SMALL LETTER Y WITH GRAVE .. LATIN SMALL LETTER Y WITH GRAVE
     (16#01EF5#, 16#01EF5#), -- LATIN SMALL LETTER Y WITH DOT BELOW .. LATIN SMALL LETTER Y WITH DOT BELOW
     (16#01EF7#, 16#01EF7#), -- LATIN SMALL LETTER Y WITH HOOK ABOVE .. LATIN SMALL LETTER Y WITH HOOK ABOVE
     (16#01EF9#, 16#01EF9#), -- LATIN SMALL LETTER Y WITH TILDE .. LATIN SMALL LETTER Y WITH TILDE
     (16#01F00#, 16#01F07#), -- GREEK SMALL LETTER ALPHA WITH PSILI .. GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI
     (16#01F10#, 16#01F15#), -- GREEK SMALL LETTER EPSILON WITH PSILI .. GREEK SMALL LETTER EPSILON WITH DASIA AND OXIA
     (16#01F20#, 16#01F27#), -- GREEK SMALL LETTER ETA WITH PSILI .. GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI
     (16#01F30#, 16#01F37#), -- GREEK SMALL LETTER IOTA WITH PSILI .. GREEK SMALL LETTER IOTA WITH DASIA AND PERISPOMENI
     (16#01F40#, 16#01F45#), -- GREEK SMALL LETTER OMICRON WITH PSILI .. GREEK SMALL LETTER OMICRON WITH DASIA AND OXIA
     (16#01F51#, 16#01F51#), -- GREEK SMALL LETTER UPSILON WITH DASIA .. GREEK SMALL LETTER UPSILON WITH DASIA
     (16#01F53#, 16#01F53#), -- GREEK SMALL LETTER UPSILON WITH DASIA AND VARIA .. GREEK SMALL LETTER UPSILON WITH DASIA AND VARIA
     (16#01F55#, 16#01F55#), -- GREEK SMALL LETTER UPSILON WITH DASIA AND OXIA .. GREEK SMALL LETTER UPSILON WITH DASIA AND OXIA
     (16#01F57#, 16#01F57#), -- GREEK SMALL LETTER UPSILON WITH DASIA AND PERISPOMENI ..
GREEK SMALL LETTER UPSILON WITH DASIA AND PERISPOMENI (16#01F60#, 16#01F67#), -- GREEK SMALL LETTER OMEGA WITH PSILI .. GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI (16#01F70#, 16#01F71#), -- GREEK SMALL LETTER ALPHA WITH VARIA .. GREEK SMALL LETTER ALPHA WITH OXIA (16#01F72#, 16#01F75#), -- GREEK SMALL LETTER EPSILON WITH VARIA .. GREEK SMALL LETTER ETA WITH OXIA (16#01F76#, 16#01F77#), -- GREEK SMALL LETTER IOTA WITH VARIA .. GREEK SMALL LETTER IOTA WITH OXIA (16#01F78#, 16#01F79#), -- GREEK SMALL LETTER OMICRON WITH VARIA .. GREEK SMALL LETTER OMICRON WITH OXIA (16#01F7A#, 16#01F7B#), -- GREEK SMALL LETTER UPSILON WITH VARIA .. GREEK SMALL LETTER UPSILON WITH OXIA (16#01F7C#, 16#01F7D#), -- GREEK SMALL LETTER OMEGA WITH VARIA .. GREEK SMALL LETTER OMEGA WITH OXIA (16#01F80#, 16#01F87#), -- GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI .. GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI (16#01F90#, 16#01F97#), -- GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI .. GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI (16#01FA0#, 16#01FA7#), -- GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI .. GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI (16#01FB0#, 16#01FB1#), -- GREEK SMALL LETTER ALPHA WITH VRACHY .. GREEK SMALL LETTER ALPHA WITH MACRON (16#01FB3#, 16#01FB3#), -- GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI .. GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI (16#01FBE#, 16#01FBE#), -- GREEK PROSGEGRAMMENI .. GREEK PROSGEGRAMMENI (16#01FC3#, 16#01FC3#), -- GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI .. GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI (16#01FD0#, 16#01FD1#), -- GREEK SMALL LETTER IOTA WITH VRACHY .. GREEK SMALL LETTER IOTA WITH MACRON (16#01FE0#, 16#01FE1#), -- GREEK SMALL LETTER UPSILON WITH VRACHY .. GREEK SMALL LETTER UPSILON WITH MACRON (16#01FE5#, 16#01FE5#), -- GREEK SMALL LETTER RHO WITH DASIA .. 
GREEK SMALL LETTER RHO WITH DASIA (16#01FF3#, 16#01FF3#), -- GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI .. GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI (16#0FF41#, 16#0FF5A#), -- FULLWIDTH LATIN SMALL LETTER A .. FULLWIDTH LATIN SMALL LETTER Z (16#10428#, 16#1044D#)); -- DESERET SMALL LETTER LONG I .. DESERET SMALL LETTER ENG Upper_Case_Adjust : constant array (Lower_Case_Letters'Range) of Integer := ( -32, -- LATIN SMALL LETTER A .. LATIN SMALL LETTER Z 743, -- MICRO SIGN .. MICRO SIGN -32, -- LATIN SMALL LETTER A WITH GRAVE .. LATIN SMALL LETTER O WITH DIAERESIS -32, -- LATIN SMALL LETTER O WITH STROKE .. LATIN SMALL LETTER THORN 121, -- LATIN SMALL LETTER Y WITH DIAERESIS .. LATIN SMALL LETTER Y WITH DIAERESIS -1, -- LATIN SMALL LETTER A WITH MACRON .. LATIN SMALL LETTER A WITH MACRON -1, -- LATIN SMALL LETTER A WITH BREVE .. LATIN SMALL LETTER A WITH BREVE -1, -- LATIN SMALL LETTER A WITH OGONEK .. LATIN SMALL LETTER A WITH OGONEK -1, -- LATIN SMALL LETTER C WITH ACUTE .. LATIN SMALL LETTER C WITH ACUTE -1, -- LATIN SMALL LETTER C WITH CIRCUMFLEX .. LATIN SMALL LETTER C WITH CIRCUMFLEX -1, -- LATIN SMALL LETTER C WITH DOT ABOVE .. LATIN SMALL LETTER C WITH DOT ABOVE -1, -- LATIN SMALL LETTER C WITH CARON .. LATIN SMALL LETTER C WITH CARON -1, -- LATIN SMALL LETTER D WITH CARON .. LATIN SMALL LETTER D WITH CARON -1, -- LATIN SMALL LETTER D WITH STROKE .. LATIN SMALL LETTER D WITH STROKE -1, -- LATIN SMALL LETTER E WITH MACRON .. LATIN SMALL LETTER E WITH MACRON -1, -- LATIN SMALL LETTER E WITH BREVE .. LATIN SMALL LETTER E WITH BREVE -1, -- LATIN SMALL LETTER E WITH DOT ABOVE .. LATIN SMALL LETTER E WITH DOT ABOVE -1, -- LATIN SMALL LETTER E WITH OGONEK .. LATIN SMALL LETTER E WITH OGONEK -1, -- LATIN SMALL LETTER E WITH CARON .. LATIN SMALL LETTER E WITH CARON -1, -- LATIN SMALL LETTER G WITH CIRCUMFLEX .. LATIN SMALL LETTER G WITH CIRCUMFLEX -1, -- LATIN SMALL LETTER G WITH BREVE .. LATIN SMALL LETTER G WITH BREVE -1, -- LATIN SMALL LETTER G WITH DOT ABOVE .. 
LATIN SMALL LETTER G WITH DOT ABOVE -1, -- LATIN SMALL LETTER G WITH CEDILLA .. LATIN SMALL LETTER G WITH CEDILLA -1, -- LATIN SMALL LETTER H WITH CIRCUMFLEX .. LATIN SMALL LETTER H WITH CIRCUMFLEX -1, -- LATIN SMALL LETTER H WITH STROKE .. LATIN SMALL LETTER H WITH STROKE -1, -- LATIN SMALL LETTER I WITH TILDE .. LATIN SMALL LETTER I WITH TILDE -1, -- LATIN SMALL LETTER I WITH MACRON .. LATIN SMALL LETTER I WITH MACRON -1, -- LATIN SMALL LETTER I WITH BREVE .. LATIN SMALL LETTER I WITH BREVE -1, -- LATIN SMALL LETTER I WITH OGONEK .. LATIN SMALL LETTER I WITH OGONEK -232, -- LATIN SMALL LETTER DOTLESS I .. LATIN SMALL LETTER DOTLESS I -1, -- LATIN SMALL LIGATURE IJ .. LATIN SMALL LIGATURE IJ -1, -- LATIN SMALL LETTER J WITH CIRCUMFLEX .. LATIN SMALL LETTER J WITH CIRCUMFLEX -1, -- LATIN SMALL LETTER K WITH CEDILLA .. LATIN SMALL LETTER K WITH CEDILLA -1, -- LATIN SMALL LETTER L WITH ACUTE .. LATIN SMALL LETTER L WITH ACUTE -1, -- LATIN SMALL LETTER L WITH CEDILLA .. LATIN SMALL LETTER L WITH CEDILLA -1, -- LATIN SMALL LETTER L WITH CARON .. LATIN SMALL LETTER L WITH CARON -1, -- LATIN SMALL LETTER L WITH MIDDLE DOT .. LATIN SMALL LETTER L WITH MIDDLE DOT -1, -- LATIN SMALL LETTER L WITH STROKE .. LATIN SMALL LETTER L WITH STROKE -1, -- LATIN SMALL LETTER N WITH ACUTE .. LATIN SMALL LETTER N WITH ACUTE -1, -- LATIN SMALL LETTER N WITH CEDILLA .. LATIN SMALL LETTER N WITH CEDILLA -1, -- LATIN SMALL LETTER N WITH CARON .. LATIN SMALL LETTER N WITH CARON -1, -- LATIN SMALL LETTER ENG .. LATIN SMALL LETTER ENG -1, -- LATIN SMALL LETTER O WITH MACRON .. LATIN SMALL LETTER O WITH MACRON -1, -- LATIN SMALL LETTER O WITH BREVE .. LATIN SMALL LETTER O WITH BREVE -1, -- LATIN SMALL LETTER O WITH DOUBLE ACUTE .. LATIN SMALL LETTER O WITH DOUBLE ACUTE -1, -- LATIN SMALL LIGATURE OE .. LATIN SMALL LIGATURE OE -1, -- LATIN SMALL LETTER R WITH ACUTE .. LATIN SMALL LETTER R WITH ACUTE -1, -- LATIN SMALL LETTER R WITH CEDILLA .. 
LATIN SMALL LETTER R WITH CEDILLA -1, -- LATIN SMALL LETTER R WITH CARON .. LATIN SMALL LETTER R WITH CARON -1, -- LATIN SMALL LETTER S WITH ACUTE .. LATIN SMALL LETTER S WITH ACUTE -1, -- LATIN SMALL LETTER S WITH CIRCUMFLEX .. LATIN SMALL LETTER S WITH CIRCUMFLEX -1, -- LATIN SMALL LETTER S WITH CEDILLA .. LATIN SMALL LETTER S WITH CEDILLA -1, -- LATIN SMALL LETTER S WITH CARON .. LATIN SMALL LETTER S WITH CARON -1, -- LATIN SMALL LETTER T WITH CEDILLA .. LATIN SMALL LETTER T WITH CEDILLA -1, -- LATIN SMALL LETTER T WITH CARON .. LATIN SMALL LETTER T WITH CARON -1, -- LATIN SMALL LETTER T WITH STROKE .. LATIN SMALL LETTER T WITH STROKE -1, -- LATIN SMALL LETTER U WITH TILDE .. LATIN SMALL LETTER U WITH TILDE -1, -- LATIN SMALL LETTER U WITH MACRON .. LATIN SMALL LETTER U WITH MACRON -1, -- LATIN SMALL LETTER U WITH BREVE .. LATIN SMALL LETTER U WITH BREVE -1, -- LATIN SMALL LETTER U WITH RING ABOVE .. LATIN SMALL LETTER U WITH RING ABOVE -1, -- LATIN SMALL LETTER U WITH DOUBLE ACUTE .. LATIN SMALL LETTER U WITH DOUBLE ACUTE -1, -- LATIN SMALL LETTER U WITH OGONEK .. LATIN SMALL LETTER U WITH OGONEK -1, -- LATIN SMALL LETTER W WITH CIRCUMFLEX .. LATIN SMALL LETTER W WITH CIRCUMFLEX -1, -- LATIN SMALL LETTER Y WITH CIRCUMFLEX .. LATIN SMALL LETTER Y WITH CIRCUMFLEX -1, -- LATIN SMALL LETTER Z WITH ACUTE .. LATIN SMALL LETTER Z WITH ACUTE -1, -- LATIN SMALL LETTER Z WITH DOT ABOVE .. LATIN SMALL LETTER Z WITH DOT ABOVE -1, -- LATIN SMALL LETTER Z WITH CARON .. LATIN SMALL LETTER Z WITH CARON -300, -- LATIN SMALL LETTER LONG S .. LATIN SMALL LETTER LONG S -1, -- LATIN SMALL LETTER B WITH TOPBAR .. LATIN SMALL LETTER B WITH TOPBAR -1, -- LATIN SMALL LETTER TONE SIX .. LATIN SMALL LETTER TONE SIX -1, -- LATIN SMALL LETTER C WITH HOOK .. LATIN SMALL LETTER C WITH HOOK -1, -- LATIN SMALL LETTER D WITH TOPBAR .. LATIN SMALL LETTER D WITH TOPBAR -1, -- LATIN SMALL LETTER F WITH HOOK .. LATIN SMALL LETTER F WITH HOOK 97, -- LATIN SMALL LETTER HV .. 
LATIN SMALL LETTER HV -1, -- LATIN SMALL LETTER K WITH HOOK .. LATIN SMALL LETTER K WITH HOOK 130, -- LATIN SMALL LETTER N WITH LONG RIGHT LEG .. LATIN SMALL LETTER N WITH LONG RIGHT LEG -1, -- LATIN SMALL LETTER O WITH HORN .. LATIN SMALL LETTER O WITH HORN -1, -- LATIN SMALL LETTER OI .. LATIN SMALL LETTER OI -1, -- LATIN SMALL LETTER P WITH HOOK .. LATIN SMALL LETTER P WITH HOOK -1, -- LATIN SMALL LETTER TONE TWO .. LATIN SMALL LETTER TONE TWO -1, -- LATIN SMALL LETTER T WITH HOOK .. LATIN SMALL LETTER T WITH HOOK -1, -- LATIN SMALL LETTER U WITH HORN .. LATIN SMALL LETTER U WITH HORN -1, -- LATIN SMALL LETTER Y WITH HOOK .. LATIN SMALL LETTER Y WITH HOOK -1, -- LATIN SMALL LETTER Z WITH STROKE .. LATIN SMALL LETTER Z WITH STROKE -1, -- LATIN SMALL LETTER EZH REVERSED .. LATIN SMALL LETTER EZH REVERSED -1, -- LATIN SMALL LETTER TONE FIVE .. LATIN SMALL LETTER TONE FIVE 56, -- LATIN LETTER WYNN .. LATIN LETTER WYNN -1, -- LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON .. LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON -2, -- LATIN SMALL LETTER DZ WITH CARON .. LATIN SMALL LETTER DZ WITH CARON -1, -- LATIN CAPITAL LETTER L WITH SMALL LETTER J .. LATIN CAPITAL LETTER L WITH SMALL LETTER J -2, -- LATIN SMALL LETTER LJ .. LATIN SMALL LETTER LJ -1, -- LATIN CAPITAL LETTER N WITH SMALL LETTER J .. LATIN CAPITAL LETTER N WITH SMALL LETTER J -2, -- LATIN SMALL LETTER NJ .. LATIN SMALL LETTER NJ -1, -- LATIN SMALL LETTER A WITH CARON .. LATIN SMALL LETTER A WITH CARON -1, -- LATIN SMALL LETTER I WITH CARON .. LATIN SMALL LETTER I WITH CARON -1, -- LATIN SMALL LETTER O WITH CARON .. LATIN SMALL LETTER O WITH CARON -1, -- LATIN SMALL LETTER U WITH CARON .. LATIN SMALL LETTER U WITH CARON -1, -- LATIN SMALL LETTER U WITH DIAERESIS AND MACRON .. LATIN SMALL LETTER U WITH DIAERESIS AND MACRON -1, -- LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE .. LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE -1, -- LATIN SMALL LETTER U WITH DIAERESIS AND CARON .. 
LATIN SMALL LETTER U WITH DIAERESIS AND CARON -1, -- LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE .. LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE -79, -- LATIN SMALL LETTER TURNED E .. LATIN SMALL LETTER TURNED E -1, -- LATIN SMALL LETTER A WITH DIAERESIS AND MACRON .. LATIN SMALL LETTER A WITH DIAERESIS AND MACRON -1, -- LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON .. LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON -1, -- LATIN SMALL LETTER AE WITH MACRON .. LATIN SMALL LETTER AE WITH MACRON -1, -- LATIN SMALL LETTER G WITH STROKE .. LATIN SMALL LETTER G WITH STROKE -1, -- LATIN SMALL LETTER G WITH CARON .. LATIN SMALL LETTER G WITH CARON -1, -- LATIN SMALL LETTER K WITH CARON .. LATIN SMALL LETTER K WITH CARON -1, -- LATIN SMALL LETTER O WITH OGONEK .. LATIN SMALL LETTER O WITH OGONEK -1, -- LATIN SMALL LETTER O WITH OGONEK AND MACRON .. LATIN SMALL LETTER O WITH OGONEK AND MACRON -1, -- LATIN SMALL LETTER EZH WITH CARON .. LATIN SMALL LETTER EZH WITH CARON -1, -- LATIN CAPITAL LETTER D WITH SMALL LETTER Z .. LATIN CAPITAL LETTER D WITH SMALL LETTER Z -2, -- LATIN SMALL LETTER DZ .. LATIN SMALL LETTER DZ -1, -- LATIN SMALL LETTER G WITH ACUTE .. LATIN SMALL LETTER G WITH ACUTE -1, -- LATIN SMALL LETTER N WITH GRAVE .. LATIN SMALL LETTER N WITH GRAVE -1, -- LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE .. LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE -1, -- LATIN SMALL LETTER AE WITH ACUTE .. LATIN SMALL LETTER AE WITH ACUTE -1, -- LATIN SMALL LETTER O WITH STROKE AND ACUTE .. LATIN SMALL LETTER O WITH STROKE AND ACUTE -1, -- LATIN SMALL LETTER A WITH DOUBLE GRAVE .. LATIN SMALL LETTER A WITH DOUBLE GRAVE -1, -- LATIN SMALL LETTER A WITH INVERTED BREVE .. LATIN SMALL LETTER A WITH INVERTED BREVE -1, -- LATIN SMALL LETTER E WITH DOUBLE GRAVE .. LATIN SMALL LETTER E WITH DOUBLE GRAVE -1, -- LATIN SMALL LETTER E WITH INVERTED BREVE .. LATIN SMALL LETTER E WITH INVERTED BREVE -1, -- LATIN SMALL LETTER I WITH DOUBLE GRAVE .. 
LATIN SMALL LETTER I WITH DOUBLE GRAVE -1, -- LATIN SMALL LETTER I WITH INVERTED BREVE .. LATIN SMALL LETTER I WITH INVERTED BREVE -1, -- LATIN SMALL LETTER O WITH DOUBLE GRAVE .. LATIN SMALL LETTER O WITH DOUBLE GRAVE -1, -- LATIN SMALL LETTER O WITH INVERTED BREVE .. LATIN SMALL LETTER O WITH INVERTED BREVE -1, -- LATIN SMALL LETTER R WITH DOUBLE GRAVE .. LATIN SMALL LETTER R WITH DOUBLE GRAVE -1, -- LATIN SMALL LETTER R WITH INVERTED BREVE .. LATIN SMALL LETTER R WITH INVERTED BREVE -1, -- LATIN SMALL LETTER U WITH DOUBLE GRAVE .. LATIN SMALL LETTER U WITH DOUBLE GRAVE -1, -- LATIN SMALL LETTER U WITH INVERTED BREVE .. LATIN SMALL LETTER U WITH INVERTED BREVE -1, -- LATIN SMALL LETTER S WITH COMMA BELOW .. LATIN SMALL LETTER S WITH COMMA BELOW -1, -- LATIN SMALL LETTER T WITH COMMA BELOW .. LATIN SMALL LETTER T WITH COMMA BELOW -1, -- LATIN SMALL LETTER YOGH .. LATIN SMALL LETTER YOGH -1, -- LATIN SMALL LETTER H WITH CARON .. LATIN SMALL LETTER H WITH CARON -1, -- LATIN SMALL LETTER OU .. LATIN SMALL LETTER OU -1, -- LATIN SMALL LETTER Z WITH HOOK .. LATIN SMALL LETTER Z WITH HOOK -1, -- LATIN SMALL LETTER A WITH DOT ABOVE .. LATIN SMALL LETTER A WITH DOT ABOVE -1, -- LATIN SMALL LETTER E WITH CEDILLA .. LATIN SMALL LETTER E WITH CEDILLA -1, -- LATIN SMALL LETTER O WITH DIAERESIS AND MACRON .. LATIN SMALL LETTER O WITH DIAERESIS AND MACRON -1, -- LATIN SMALL LETTER O WITH TILDE AND MACRON .. LATIN SMALL LETTER O WITH TILDE AND MACRON -1, -- LATIN SMALL LETTER O WITH DOT ABOVE .. LATIN SMALL LETTER O WITH DOT ABOVE -1, -- LATIN SMALL LETTER O WITH DOT ABOVE AND MACRON .. LATIN SMALL LETTER O WITH DOT ABOVE AND MACRON -1, -- LATIN SMALL LETTER Y WITH MACRON .. LATIN SMALL LETTER Y WITH MACRON -210, -- LATIN SMALL LETTER B WITH HOOK .. LATIN SMALL LETTER B WITH HOOK -206, -- LATIN SMALL LETTER OPEN O .. LATIN SMALL LETTER OPEN O -205, -- LATIN SMALL LETTER D WITH TAIL .. LATIN SMALL LETTER D WITH HOOK -202, -- LATIN SMALL LETTER SCHWA .. 
LATIN SMALL LETTER SCHWA -203, -- LATIN SMALL LETTER OPEN E .. LATIN SMALL LETTER OPEN E -205, -- LATIN SMALL LETTER G WITH HOOK .. LATIN SMALL LETTER G WITH HOOK -207, -- LATIN SMALL LETTER GAMMA .. LATIN SMALL LETTER GAMMA -209, -- LATIN SMALL LETTER I WITH STROKE .. LATIN SMALL LETTER I WITH STROKE -211, -- LATIN SMALL LETTER IOTA .. LATIN SMALL LETTER IOTA -211, -- LATIN SMALL LETTER TURNED M .. LATIN SMALL LETTER TURNED M -213, -- LATIN SMALL LETTER N WITH LEFT HOOK .. LATIN SMALL LETTER N WITH LEFT HOOK -214, -- LATIN SMALL LETTER BARRED O .. LATIN SMALL LETTER BARRED O -218, -- LATIN LETTER SMALL CAPITAL R .. LATIN LETTER SMALL CAPITAL R -218, -- LATIN SMALL LETTER ESH .. LATIN SMALL LETTER ESH -218, -- LATIN SMALL LETTER T WITH RETROFLEX HOOK .. LATIN SMALL LETTER T WITH RETROFLEX HOOK -217, -- LATIN SMALL LETTER UPSILON .. LATIN SMALL LETTER V WITH HOOK -219, -- LATIN SMALL LETTER EZH .. LATIN SMALL LETTER EZH -38, -- GREEK SMALL LETTER ALPHA WITH TONOS .. GREEK SMALL LETTER ALPHA WITH TONOS -37, -- GREEK SMALL LETTER EPSILON WITH TONOS .. GREEK SMALL LETTER IOTA WITH TONOS -32, -- GREEK SMALL LETTER ALPHA .. GREEK SMALL LETTER RHO -31, -- GREEK SMALL LETTER FINAL SIGMA .. GREEK SMALL LETTER FINAL SIGMA -32, -- GREEK SMALL LETTER SIGMA .. GREEK SMALL LETTER UPSILON WITH DIALYTIKA -64, -- GREEK SMALL LETTER OMICRON WITH TONOS .. GREEK SMALL LETTER OMICRON WITH TONOS -63, -- GREEK SMALL LETTER UPSILON WITH TONOS .. GREEK SMALL LETTER OMEGA WITH TONOS -62, -- GREEK BETA SYMBOL .. GREEK BETA SYMBOL -57, -- GREEK THETA SYMBOL .. GREEK THETA SYMBOL -47, -- GREEK PHI SYMBOL .. GREEK PHI SYMBOL -54, -- GREEK PI SYMBOL .. GREEK PI SYMBOL -1, -- GREEK SMALL LETTER ARCHAIC KOPPA .. GREEK SMALL LETTER ARCHAIC KOPPA -1, -- GREEK SMALL LETTER STIGMA .. GREEK SMALL LETTER STIGMA -1, -- GREEK SMALL LETTER DIGAMMA .. GREEK SMALL LETTER DIGAMMA -1, -- GREEK SMALL LETTER KOPPA .. GREEK SMALL LETTER KOPPA -1, -- GREEK SMALL LETTER SAMPI .. 
GREEK SMALL LETTER SAMPI -1, -- COPTIC SMALL LETTER SHEI .. COPTIC SMALL LETTER SHEI -1, -- COPTIC SMALL LETTER FEI .. COPTIC SMALL LETTER FEI -1, -- COPTIC SMALL LETTER KHEI .. COPTIC SMALL LETTER KHEI -1, -- COPTIC SMALL LETTER HORI .. COPTIC SMALL LETTER HORI -1, -- COPTIC SMALL LETTER GANGIA .. COPTIC SMALL LETTER GANGIA -1, -- COPTIC SMALL LETTER SHIMA .. COPTIC SMALL LETTER SHIMA -1, -- COPTIC SMALL LETTER DEI .. COPTIC SMALL LETTER DEI -86, -- GREEK KAPPA SYMBOL .. GREEK KAPPA SYMBOL -80, -- GREEK RHO SYMBOL .. GREEK RHO SYMBOL -79, -- GREEK LUNATE SIGMA SYMBOL .. GREEK LUNATE SIGMA SYMBOL -96, -- GREEK LUNATE EPSILON SYMBOL .. GREEK LUNATE EPSILON SYMBOL -32, -- CYRILLIC SMALL LETTER A .. CYRILLIC SMALL LETTER YA -80, -- CYRILLIC SMALL LETTER IE WITH GRAVE .. CYRILLIC SMALL LETTER DZHE -1, -- CYRILLIC SMALL LETTER OMEGA .. CYRILLIC SMALL LETTER OMEGA -1, -- CYRILLIC SMALL LETTER YAT .. CYRILLIC SMALL LETTER YAT -1, -- CYRILLIC SMALL LETTER IOTIFIED E .. CYRILLIC SMALL LETTER IOTIFIED E -1, -- CYRILLIC SMALL LETTER LITTLE YUS .. CYRILLIC SMALL LETTER LITTLE YUS -1, -- CYRILLIC SMALL LETTER IOTIFIED LITTLE YUS .. CYRILLIC SMALL LETTER IOTIFIED LITTLE YUS -1, -- CYRILLIC SMALL LETTER BIG YUS .. CYRILLIC SMALL LETTER BIG YUS -1, -- CYRILLIC SMALL LETTER IOTIFIED BIG YUS .. CYRILLIC SMALL LETTER IOTIFIED BIG YUS -1, -- CYRILLIC SMALL LETTER KSI .. CYRILLIC SMALL LETTER KSI -1, -- CYRILLIC SMALL LETTER PSI .. CYRILLIC SMALL LETTER PSI -1, -- CYRILLIC SMALL LETTER FITA .. CYRILLIC SMALL LETTER FITA -1, -- CYRILLIC SMALL LETTER IZHITSA .. CYRILLIC SMALL LETTER IZHITSA -1, -- CYRILLIC SMALL LETTER IZHITSA WITH DOUBLE GRAVE ACCENT .. CYRILLIC SMALL LETTER IZHITSA WITH DOUBLE GRAVE ACCENT -1, -- CYRILLIC SMALL LETTER UK .. CYRILLIC SMALL LETTER UK -1, -- CYRILLIC SMALL LETTER ROUND OMEGA .. CYRILLIC SMALL LETTER ROUND OMEGA -1, -- CYRILLIC SMALL LETTER OMEGA WITH TITLO .. CYRILLIC SMALL LETTER OMEGA WITH TITLO -1, -- CYRILLIC SMALL LETTER OT .. 
CYRILLIC SMALL LETTER OT -1, -- CYRILLIC SMALL LETTER KOPPA .. CYRILLIC SMALL LETTER KOPPA -1, -- CYRILLIC SMALL LETTER SHORT I WITH TAIL .. CYRILLIC SMALL LETTER SHORT I WITH TAIL -1, -- CYRILLIC SMALL LETTER SEMISOFT SIGN .. CYRILLIC SMALL LETTER SEMISOFT SIGN -1, -- CYRILLIC SMALL LETTER ER WITH TICK .. CYRILLIC SMALL LETTER ER WITH TICK -1, -- CYRILLIC SMALL LETTER GHE WITH UPTURN .. CYRILLIC SMALL LETTER GHE WITH UPTURN -1, -- CYRILLIC SMALL LETTER GHE WITH STROKE .. CYRILLIC SMALL LETTER GHE WITH STROKE -1, -- CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK .. CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK -1, -- CYRILLIC SMALL LETTER ZHE WITH DESCENDER .. CYRILLIC SMALL LETTER ZHE WITH DESCENDER -1, -- CYRILLIC SMALL LETTER ZE WITH DESCENDER .. CYRILLIC SMALL LETTER ZE WITH DESCENDER -1, -- CYRILLIC SMALL LETTER KA WITH DESCENDER .. CYRILLIC SMALL LETTER KA WITH DESCENDER -1, -- CYRILLIC SMALL LETTER KA WITH VERTICAL STROKE .. CYRILLIC SMALL LETTER KA WITH VERTICAL STROKE -1, -- CYRILLIC SMALL LETTER KA WITH STROKE .. CYRILLIC SMALL LETTER KA WITH STROKE -1, -- CYRILLIC SMALL LETTER BASHKIR KA .. CYRILLIC SMALL LETTER BASHKIR KA -1, -- CYRILLIC SMALL LETTER EN WITH DESCENDER .. CYRILLIC SMALL LETTER EN WITH DESCENDER -1, -- CYRILLIC SMALL LIGATURE EN GHE .. CYRILLIC SMALL LIGATURE EN GHE -1, -- CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK .. CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK -1, -- CYRILLIC SMALL LETTER ABKHASIAN HA .. CYRILLIC SMALL LETTER ABKHASIAN HA -1, -- CYRILLIC SMALL LETTER ES WITH DESCENDER .. CYRILLIC SMALL LETTER ES WITH DESCENDER -1, -- CYRILLIC SMALL LETTER TE WITH DESCENDER .. CYRILLIC SMALL LETTER TE WITH DESCENDER -1, -- CYRILLIC SMALL LETTER STRAIGHT U .. CYRILLIC SMALL LETTER STRAIGHT U -1, -- CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE .. CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE -1, -- CYRILLIC SMALL LETTER HA WITH DESCENDER .. CYRILLIC SMALL LETTER HA WITH DESCENDER -1, -- CYRILLIC SMALL LIGATURE TE TSE .. 
CYRILLIC SMALL LIGATURE TE TSE -1, -- CYRILLIC SMALL LETTER CHE WITH DESCENDER .. CYRILLIC SMALL LETTER CHE WITH DESCENDER -1, -- CYRILLIC SMALL LETTER CHE WITH VERTICAL STROKE .. CYRILLIC SMALL LETTER CHE WITH VERTICAL STROKE -1, -- CYRILLIC SMALL LETTER SHHA .. CYRILLIC SMALL LETTER SHHA -1, -- CYRILLIC SMALL LETTER ABKHASIAN CHE .. CYRILLIC SMALL LETTER ABKHASIAN CHE -1, -- CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER .. CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER -1, -- CYRILLIC SMALL LETTER ZHE WITH BREVE .. CYRILLIC SMALL LETTER ZHE WITH BREVE -1, -- CYRILLIC SMALL LETTER KA WITH HOOK .. CYRILLIC SMALL LETTER KA WITH HOOK -1, -- CYRILLIC SMALL LETTER EL WITH TAIL .. CYRILLIC SMALL LETTER EL WITH TAIL -1, -- CYRILLIC SMALL LETTER EN WITH HOOK .. CYRILLIC SMALL LETTER EN WITH HOOK -1, -- CYRILLIC SMALL LETTER EN WITH TAIL .. CYRILLIC SMALL LETTER EN WITH TAIL -1, -- CYRILLIC SMALL LETTER KHAKASSIAN CHE .. CYRILLIC SMALL LETTER KHAKASSIAN CHE -1, -- CYRILLIC SMALL LETTER EM WITH TAIL .. CYRILLIC SMALL LETTER EM WITH TAIL -1, -- CYRILLIC SMALL LETTER A WITH BREVE .. CYRILLIC SMALL LETTER A WITH BREVE -1, -- CYRILLIC SMALL LETTER A WITH DIAERESIS .. CYRILLIC SMALL LETTER A WITH DIAERESIS -1, -- CYRILLIC SMALL LIGATURE A IE .. CYRILLIC SMALL LIGATURE A IE -1, -- CYRILLIC SMALL LETTER IE WITH BREVE .. CYRILLIC SMALL LETTER IE WITH BREVE -1, -- CYRILLIC SMALL LETTER SCHWA .. CYRILLIC SMALL LETTER SCHWA -1, -- CYRILLIC SMALL LETTER SCHWA WITH DIAERESIS .. CYRILLIC SMALL LETTER SCHWA WITH DIAERESIS -1, -- CYRILLIC SMALL LETTER ZHE WITH DIAERESIS .. CYRILLIC SMALL LETTER ZHE WITH DIAERESIS -1, -- CYRILLIC SMALL LETTER ZE WITH DIAERESIS .. CYRILLIC SMALL LETTER ZE WITH DIAERESIS -1, -- CYRILLIC SMALL LETTER ABKHASIAN DZE .. CYRILLIC SMALL LETTER ABKHASIAN DZE -1, -- CYRILLIC SMALL LETTER I WITH MACRON .. CYRILLIC SMALL LETTER I WITH MACRON -1, -- CYRILLIC SMALL LETTER I WITH DIAERESIS .. 
CYRILLIC SMALL LETTER I WITH DIAERESIS -1, -- CYRILLIC SMALL LETTER O WITH DIAERESIS .. CYRILLIC SMALL LETTER O WITH DIAERESIS -1, -- CYRILLIC SMALL LETTER BARRED O .. CYRILLIC SMALL LETTER BARRED O -1, -- CYRILLIC SMALL LETTER BARRED O WITH DIAERESIS .. CYRILLIC SMALL LETTER BARRED O WITH DIAERESIS -1, -- CYRILLIC SMALL LETTER E WITH DIAERESIS .. CYRILLIC SMALL LETTER E WITH DIAERESIS -1, -- CYRILLIC SMALL LETTER U WITH MACRON .. CYRILLIC SMALL LETTER U WITH MACRON -1, -- CYRILLIC SMALL LETTER U WITH DIAERESIS .. CYRILLIC SMALL LETTER U WITH DIAERESIS -1, -- CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE .. CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE -1, -- CYRILLIC SMALL LETTER CHE WITH DIAERESIS .. CYRILLIC SMALL LETTER CHE WITH DIAERESIS -1, -- CYRILLIC SMALL LETTER YERU WITH DIAERESIS .. CYRILLIC SMALL LETTER YERU WITH DIAERESIS -1, -- CYRILLIC SMALL LETTER KOMI DE .. CYRILLIC SMALL LETTER KOMI DE -1, -- CYRILLIC SMALL LETTER KOMI DJE .. CYRILLIC SMALL LETTER KOMI DJE -1, -- CYRILLIC SMALL LETTER KOMI ZJE .. CYRILLIC SMALL LETTER KOMI ZJE -1, -- CYRILLIC SMALL LETTER KOMI DZJE .. CYRILLIC SMALL LETTER KOMI DZJE -1, -- CYRILLIC SMALL LETTER KOMI LJE .. CYRILLIC SMALL LETTER KOMI LJE -1, -- CYRILLIC SMALL LETTER KOMI NJE .. CYRILLIC SMALL LETTER KOMI NJE -1, -- CYRILLIC SMALL LETTER KOMI SJE .. CYRILLIC SMALL LETTER KOMI SJE -1, -- CYRILLIC SMALL LETTER KOMI TJE .. CYRILLIC SMALL LETTER KOMI TJE -48, -- ARMENIAN SMALL LETTER AYB .. ARMENIAN SMALL LETTER FEH -1, -- LATIN SMALL LETTER A WITH RING BELOW .. LATIN SMALL LETTER A WITH RING BELOW -1, -- LATIN SMALL LETTER B WITH DOT ABOVE .. LATIN SMALL LETTER B WITH DOT ABOVE -1, -- LATIN SMALL LETTER B WITH DOT BELOW .. LATIN SMALL LETTER B WITH DOT BELOW -1, -- LATIN SMALL LETTER B WITH LINE BELOW .. LATIN SMALL LETTER B WITH LINE BELOW -1, -- LATIN SMALL LETTER C WITH CEDILLA AND ACUTE .. LATIN SMALL LETTER C WITH CEDILLA AND ACUTE -1, -- LATIN SMALL LETTER D WITH DOT ABOVE .. 
LATIN SMALL LETTER D WITH DOT ABOVE -1, -- LATIN SMALL LETTER D WITH DOT BELOW .. LATIN SMALL LETTER D WITH DOT BELOW -1, -- LATIN SMALL LETTER D WITH LINE BELOW .. LATIN SMALL LETTER D WITH LINE BELOW -1, -- LATIN SMALL LETTER D WITH CEDILLA .. LATIN SMALL LETTER D WITH CEDILLA -1, -- LATIN SMALL LETTER D WITH CIRCUMFLEX BELOW .. LATIN SMALL LETTER D WITH CIRCUMFLEX BELOW -1, -- LATIN SMALL LETTER E WITH MACRON AND GRAVE .. LATIN SMALL LETTER E WITH MACRON AND GRAVE -1, -- LATIN SMALL LETTER E WITH MACRON AND ACUTE .. LATIN SMALL LETTER E WITH MACRON AND ACUTE -1, -- LATIN SMALL LETTER E WITH CIRCUMFLEX BELOW .. LATIN SMALL LETTER E WITH CIRCUMFLEX BELOW -1, -- LATIN SMALL LETTER E WITH TILDE BELOW .. LATIN SMALL LETTER E WITH TILDE BELOW -1, -- LATIN SMALL LETTER E WITH CEDILLA AND BREVE .. LATIN SMALL LETTER E WITH CEDILLA AND BREVE -1, -- LATIN SMALL LETTER F WITH DOT ABOVE .. LATIN SMALL LETTER F WITH DOT ABOVE -1, -- LATIN SMALL LETTER G WITH MACRON .. LATIN SMALL LETTER G WITH MACRON -1, -- LATIN SMALL LETTER H WITH DOT ABOVE .. LATIN SMALL LETTER H WITH DOT ABOVE -1, -- LATIN SMALL LETTER H WITH DOT BELOW .. LATIN SMALL LETTER H WITH DOT BELOW -1, -- LATIN SMALL LETTER H WITH DIAERESIS .. LATIN SMALL LETTER H WITH DIAERESIS -1, -- LATIN SMALL LETTER H WITH CEDILLA .. LATIN SMALL LETTER H WITH CEDILLA -1, -- LATIN SMALL LETTER H WITH BREVE BELOW .. LATIN SMALL LETTER H WITH BREVE BELOW -1, -- LATIN SMALL LETTER I WITH TILDE BELOW .. LATIN SMALL LETTER I WITH TILDE BELOW -1, -- LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE .. LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE -1, -- LATIN SMALL LETTER K WITH ACUTE .. LATIN SMALL LETTER K WITH ACUTE -1, -- LATIN SMALL LETTER K WITH DOT BELOW .. LATIN SMALL LETTER K WITH DOT BELOW -1, -- LATIN SMALL LETTER K WITH LINE BELOW .. LATIN SMALL LETTER K WITH LINE BELOW -1, -- LATIN SMALL LETTER L WITH DOT BELOW .. LATIN SMALL LETTER L WITH DOT BELOW -1, -- LATIN SMALL LETTER L WITH DOT BELOW AND MACRON .. 
LATIN SMALL LETTER L WITH DOT BELOW AND MACRON
      -1, -- LATIN SMALL LETTER L WITH LINE BELOW .. LATIN SMALL LETTER L WITH LINE BELOW
      -1, -- LATIN SMALL LETTER L WITH CIRCUMFLEX BELOW .. LATIN SMALL LETTER L WITH CIRCUMFLEX BELOW
      -1, -- LATIN SMALL LETTER M WITH ACUTE .. LATIN SMALL LETTER M WITH ACUTE
      -1, -- LATIN SMALL LETTER M WITH DOT ABOVE .. LATIN SMALL LETTER M WITH DOT ABOVE
      -1, -- LATIN SMALL LETTER M WITH DOT BELOW .. LATIN SMALL LETTER M WITH DOT BELOW
      -1, -- LATIN SMALL LETTER N WITH DOT ABOVE .. LATIN SMALL LETTER N WITH DOT ABOVE
      -1, -- LATIN SMALL LETTER N WITH DOT BELOW .. LATIN SMALL LETTER N WITH DOT BELOW
      -1, -- LATIN SMALL LETTER N WITH LINE BELOW .. LATIN SMALL LETTER N WITH LINE BELOW
      -1, -- LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW .. LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW
      -1, -- LATIN SMALL LETTER O WITH TILDE AND ACUTE .. LATIN SMALL LETTER O WITH TILDE AND ACUTE
      -1, -- LATIN SMALL LETTER O WITH TILDE AND DIAERESIS .. LATIN SMALL LETTER O WITH TILDE AND DIAERESIS
      -1, -- LATIN SMALL LETTER O WITH MACRON AND GRAVE .. LATIN SMALL LETTER O WITH MACRON AND GRAVE
      -1, -- LATIN SMALL LETTER O WITH MACRON AND ACUTE .. LATIN SMALL LETTER O WITH MACRON AND ACUTE
      -1, -- LATIN SMALL LETTER P WITH ACUTE .. LATIN SMALL LETTER P WITH ACUTE
      -1, -- LATIN SMALL LETTER P WITH DOT ABOVE .. LATIN SMALL LETTER P WITH DOT ABOVE
      -1, -- LATIN SMALL LETTER R WITH DOT ABOVE .. LATIN SMALL LETTER R WITH DOT ABOVE
      -1, -- LATIN SMALL LETTER R WITH DOT BELOW .. LATIN SMALL LETTER R WITH DOT BELOW
      -1, -- LATIN SMALL LETTER R WITH DOT BELOW AND MACRON .. LATIN SMALL LETTER R WITH DOT BELOW AND MACRON
      -1, -- LATIN SMALL LETTER R WITH LINE BELOW .. LATIN SMALL LETTER R WITH LINE BELOW
      -1, -- LATIN SMALL LETTER S WITH DOT ABOVE .. LATIN SMALL LETTER S WITH DOT ABOVE
      -1, -- LATIN SMALL LETTER S WITH DOT BELOW .. LATIN SMALL LETTER S WITH DOT BELOW
      -1, -- LATIN SMALL LETTER S WITH ACUTE AND DOT ABOVE .. LATIN SMALL LETTER S WITH ACUTE AND DOT ABOVE
      -1, -- LATIN SMALL LETTER S WITH CARON AND DOT ABOVE .. LATIN SMALL LETTER S WITH CARON AND DOT ABOVE
      -1, -- LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE .. LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE
      -1, -- LATIN SMALL LETTER T WITH DOT ABOVE .. LATIN SMALL LETTER T WITH DOT ABOVE
      -1, -- LATIN SMALL LETTER T WITH DOT BELOW .. LATIN SMALL LETTER T WITH DOT BELOW
      -1, -- LATIN SMALL LETTER T WITH LINE BELOW .. LATIN SMALL LETTER T WITH LINE BELOW
      -1, -- LATIN SMALL LETTER T WITH CIRCUMFLEX BELOW .. LATIN SMALL LETTER T WITH CIRCUMFLEX BELOW
      -1, -- LATIN SMALL LETTER U WITH DIAERESIS BELOW .. LATIN SMALL LETTER U WITH DIAERESIS BELOW
      -1, -- LATIN SMALL LETTER U WITH TILDE BELOW .. LATIN SMALL LETTER U WITH TILDE BELOW
      -1, -- LATIN SMALL LETTER U WITH CIRCUMFLEX BELOW .. LATIN SMALL LETTER U WITH CIRCUMFLEX BELOW
      -1, -- LATIN SMALL LETTER U WITH TILDE AND ACUTE .. LATIN SMALL LETTER U WITH TILDE AND ACUTE
      -1, -- LATIN SMALL LETTER U WITH MACRON AND DIAERESIS .. LATIN SMALL LETTER U WITH MACRON AND DIAERESIS
      -1, -- LATIN SMALL LETTER V WITH TILDE .. LATIN SMALL LETTER V WITH TILDE
      -1, -- LATIN SMALL LETTER V WITH DOT BELOW .. LATIN SMALL LETTER V WITH DOT BELOW
      -1, -- LATIN SMALL LETTER W WITH GRAVE .. LATIN SMALL LETTER W WITH GRAVE
      -1, -- LATIN SMALL LETTER W WITH ACUTE .. LATIN SMALL LETTER W WITH ACUTE
      -1, -- LATIN SMALL LETTER W WITH DIAERESIS .. LATIN SMALL LETTER W WITH DIAERESIS
      -1, -- LATIN SMALL LETTER W WITH DOT ABOVE .. LATIN SMALL LETTER W WITH DOT ABOVE
      -1, -- LATIN SMALL LETTER W WITH DOT BELOW .. LATIN SMALL LETTER W WITH DOT BELOW
      -1, -- LATIN SMALL LETTER X WITH DOT ABOVE .. LATIN SMALL LETTER X WITH DOT ABOVE
      -1, -- LATIN SMALL LETTER X WITH DIAERESIS .. LATIN SMALL LETTER X WITH DIAERESIS
      -1, -- LATIN SMALL LETTER Y WITH DOT ABOVE .. LATIN SMALL LETTER Y WITH DOT ABOVE
      -1, -- LATIN SMALL LETTER Z WITH CIRCUMFLEX .. LATIN SMALL LETTER Z WITH CIRCUMFLEX
      -1, -- LATIN SMALL LETTER Z WITH DOT BELOW .. LATIN SMALL LETTER Z WITH DOT BELOW
      -1, -- LATIN SMALL LETTER Z WITH LINE BELOW .. LATIN SMALL LETTER Z WITH LINE BELOW
      -59, -- LATIN SMALL LETTER LONG S WITH DOT ABOVE .. LATIN SMALL LETTER LONG S WITH DOT ABOVE
      -1, -- LATIN SMALL LETTER A WITH DOT BELOW .. LATIN SMALL LETTER A WITH DOT BELOW
      -1, -- LATIN SMALL LETTER A WITH HOOK ABOVE .. LATIN SMALL LETTER A WITH HOOK ABOVE
      -1, -- LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE .. LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE
      -1, -- LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE .. LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE
      -1, -- LATIN SMALL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE .. LATIN SMALL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE
      -1, -- LATIN SMALL LETTER A WITH CIRCUMFLEX AND TILDE .. LATIN SMALL LETTER A WITH CIRCUMFLEX AND TILDE
      -1, -- LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW .. LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW
      -1, -- LATIN SMALL LETTER A WITH BREVE AND ACUTE .. LATIN SMALL LETTER A WITH BREVE AND ACUTE
      -1, -- LATIN SMALL LETTER A WITH BREVE AND GRAVE .. LATIN SMALL LETTER A WITH BREVE AND GRAVE
      -1, -- LATIN SMALL LETTER A WITH BREVE AND HOOK ABOVE .. LATIN SMALL LETTER A WITH BREVE AND HOOK ABOVE
      -1, -- LATIN SMALL LETTER A WITH BREVE AND TILDE .. LATIN SMALL LETTER A WITH BREVE AND TILDE
      -1, -- LATIN SMALL LETTER A WITH BREVE AND DOT BELOW .. LATIN SMALL LETTER A WITH BREVE AND DOT BELOW
      -1, -- LATIN SMALL LETTER E WITH DOT BELOW .. LATIN SMALL LETTER E WITH DOT BELOW
      -1, -- LATIN SMALL LETTER E WITH HOOK ABOVE .. LATIN SMALL LETTER E WITH HOOK ABOVE
      -1, -- LATIN SMALL LETTER E WITH TILDE .. LATIN SMALL LETTER E WITH TILDE
      -1, -- LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE .. LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE
      -1, -- LATIN SMALL LETTER E WITH CIRCUMFLEX AND GRAVE .. LATIN SMALL LETTER E WITH CIRCUMFLEX AND GRAVE
      -1, -- LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE .. LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE
      -1, -- LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE .. LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE
      -1, -- LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW .. LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW
      -1, -- LATIN SMALL LETTER I WITH HOOK ABOVE .. LATIN SMALL LETTER I WITH HOOK ABOVE
      -1, -- LATIN SMALL LETTER I WITH DOT BELOW .. LATIN SMALL LETTER I WITH DOT BELOW
      -1, -- LATIN SMALL LETTER O WITH DOT BELOW .. LATIN SMALL LETTER O WITH DOT BELOW
      -1, -- LATIN SMALL LETTER O WITH HOOK ABOVE .. LATIN SMALL LETTER O WITH HOOK ABOVE
      -1, -- LATIN SMALL LETTER O WITH CIRCUMFLEX AND ACUTE .. LATIN SMALL LETTER O WITH CIRCUMFLEX AND ACUTE
      -1, -- LATIN SMALL LETTER O WITH CIRCUMFLEX AND GRAVE .. LATIN SMALL LETTER O WITH CIRCUMFLEX AND GRAVE
      -1, -- LATIN SMALL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE .. LATIN SMALL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE
      -1, -- LATIN SMALL LETTER O WITH CIRCUMFLEX AND TILDE .. LATIN SMALL LETTER O WITH CIRCUMFLEX AND TILDE
      -1, -- LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW .. LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW
      -1, -- LATIN SMALL LETTER O WITH HORN AND ACUTE .. LATIN SMALL LETTER O WITH HORN AND ACUTE
      -1, -- LATIN SMALL LETTER O WITH HORN AND GRAVE .. LATIN SMALL LETTER O WITH HORN AND GRAVE
      -1, -- LATIN SMALL LETTER O WITH HORN AND HOOK ABOVE .. LATIN SMALL LETTER O WITH HORN AND HOOK ABOVE
      -1, -- LATIN SMALL LETTER O WITH HORN AND TILDE .. LATIN SMALL LETTER O WITH HORN AND TILDE
      -1, -- LATIN SMALL LETTER O WITH HORN AND DOT BELOW .. LATIN SMALL LETTER O WITH HORN AND DOT BELOW
      -1, -- LATIN SMALL LETTER U WITH DOT BELOW .. LATIN SMALL LETTER U WITH DOT BELOW
      -1, -- LATIN SMALL LETTER U WITH HOOK ABOVE .. LATIN SMALL LETTER U WITH HOOK ABOVE
      -1, -- LATIN SMALL LETTER U WITH HORN AND ACUTE .. LATIN SMALL LETTER U WITH HORN AND ACUTE
      -1, -- LATIN SMALL LETTER U WITH HORN AND GRAVE .. LATIN SMALL LETTER U WITH HORN AND GRAVE
      -1, -- LATIN SMALL LETTER U WITH HORN AND HOOK ABOVE .. LATIN SMALL LETTER U WITH HORN AND HOOK ABOVE
      -1, -- LATIN SMALL LETTER U WITH HORN AND TILDE .. LATIN SMALL LETTER U WITH HORN AND TILDE
      -1, -- LATIN SMALL LETTER U WITH HORN AND DOT BELOW .. LATIN SMALL LETTER U WITH HORN AND DOT BELOW
      -1, -- LATIN SMALL LETTER Y WITH GRAVE .. LATIN SMALL LETTER Y WITH GRAVE
      -1, -- LATIN SMALL LETTER Y WITH DOT BELOW .. LATIN SMALL LETTER Y WITH DOT BELOW
      -1, -- LATIN SMALL LETTER Y WITH HOOK ABOVE .. LATIN SMALL LETTER Y WITH HOOK ABOVE
      -1, -- LATIN SMALL LETTER Y WITH TILDE .. LATIN SMALL LETTER Y WITH TILDE
      8, -- GREEK SMALL LETTER ALPHA WITH PSILI .. GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI
      8, -- GREEK SMALL LETTER EPSILON WITH PSILI .. GREEK SMALL LETTER EPSILON WITH DASIA AND OXIA
      8, -- GREEK SMALL LETTER ETA WITH PSILI .. GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI
      8, -- GREEK SMALL LETTER IOTA WITH PSILI .. GREEK SMALL LETTER IOTA WITH DASIA AND PERISPOMENI
      8, -- GREEK SMALL LETTER OMICRON WITH PSILI .. GREEK SMALL LETTER OMICRON WITH DASIA AND OXIA
      8, -- GREEK SMALL LETTER UPSILON WITH DASIA .. GREEK SMALL LETTER UPSILON WITH DASIA
      8, -- GREEK SMALL LETTER UPSILON WITH DASIA AND VARIA .. GREEK SMALL LETTER UPSILON WITH DASIA AND VARIA
      8, -- GREEK SMALL LETTER UPSILON WITH DASIA AND OXIA .. GREEK SMALL LETTER UPSILON WITH DASIA AND OXIA
      8, -- GREEK SMALL LETTER UPSILON WITH DASIA AND PERISPOMENI .. GREEK SMALL LETTER UPSILON WITH DASIA AND PERISPOMENI
      8, -- GREEK SMALL LETTER OMEGA WITH PSILI .. GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI
      74, -- GREEK SMALL LETTER ALPHA WITH VARIA .. GREEK SMALL LETTER ALPHA WITH OXIA
      86, -- GREEK SMALL LETTER EPSILON WITH VARIA .. GREEK SMALL LETTER ETA WITH OXIA
      100, -- GREEK SMALL LETTER IOTA WITH VARIA .. GREEK SMALL LETTER IOTA WITH OXIA
      128, -- GREEK SMALL LETTER OMICRON WITH VARIA .. GREEK SMALL LETTER OMICRON WITH OXIA
      112, -- GREEK SMALL LETTER UPSILON WITH VARIA .. GREEK SMALL LETTER UPSILON WITH OXIA
      126, -- GREEK SMALL LETTER OMEGA WITH VARIA .. GREEK SMALL LETTER OMEGA WITH OXIA
      8, -- GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI .. GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
      8, -- GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI .. GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
      8, -- GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI .. GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
      8, -- GREEK SMALL LETTER ALPHA WITH VRACHY .. GREEK SMALL LETTER ALPHA WITH MACRON
      9, -- GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI .. GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI
      -7205, -- GREEK PROSGEGRAMMENI .. GREEK PROSGEGRAMMENI
      9, -- GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI .. GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI
      8, -- GREEK SMALL LETTER IOTA WITH VRACHY .. GREEK SMALL LETTER IOTA WITH MACRON
      8, -- GREEK SMALL LETTER UPSILON WITH VRACHY .. GREEK SMALL LETTER UPSILON WITH MACRON
      7, -- GREEK SMALL LETTER RHO WITH DASIA .. GREEK SMALL LETTER RHO WITH DASIA
      9, -- GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI .. GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI
      -32, -- FULLWIDTH LATIN SMALL LETTER A .. FULLWIDTH LATIN SMALL LETTER Z
      -40); -- DESERET SMALL LETTER LONG I .. DESERET SMALL LETTER ENG

****************************************************************

From: Dan Eilers
Sent: Saturday, January 29, 2005 8:51 PM

Since suggesting that you post unicode tables, I have read
http://www.unicode.org/copyright.html which states (paraphrased) that by
downloading, copying, installing or otherwise using Unicode Inc.'s data
files, you agree to include their copyright notice in any copies.

****************************************************************

From: Robert Dewar
Sent: Sunday, January 29, 2005 7:29 AM

Right, if anyone posts unicode documents as such here, they should follow
that process. In my opinion this does not apply to technical use of the
standard. Dan, if you think otherwise, feel free to follow your own
inclination.

****************************************************************

From: Robert Dewar
Sent: Saturday, January 29, 2005 5:09 PM

My colleague Vincent Celier has been keeping me honest on the Unicode tables
by running a completely independent test based on an independently written
program analyzing the Unicode database. This is very helpful and averted a
potential disaster that might have befallen hapless Mongolian Ada programmers
looking forward to Ada 2005.

Vincent reported:

   I have run my test program again, and it found only one problem:

   180E;MONGOLIAN VOWEL SEPARATOR;Zs;0;WS;;;;;N;;;;;
   Is_UTF_Space = FALSE

   MONGOLIAN VOWEL SEPARATOR should be in category Space, but it is not.

   -- Vincent

This error can be corrected by using the following updated version of the
space table. Apparently I had forgotten to regenerate this:

   --  The following table includes all characters considered spaces, i.e.
   --  all characters from the Unicode table with categories:

   --    Separator, Space (Zs)

   UTF_32_Spaces : constant UTF_32_Ranges := (
     (16#00020#, 16#00020#),  -- SPACE .. SPACE
     (16#000A0#, 16#000A0#),  -- NO-BREAK SPACE .. NO-BREAK SPACE
     (16#01680#, 16#01680#),  -- OGHAM SPACE MARK .. OGHAM SPACE MARK
     (16#0180E#, 16#0180E#),  -- MONGOLIAN VOWEL SEPARATOR .. MONGOLIAN VOWEL SEPARATOR
     (16#02000#, 16#0200B#),  -- EN QUAD .. ZERO WIDTH SPACE
     (16#0202F#, 16#0202F#),  -- NARROW NO-BREAK SPACE .. NARROW NO-BREAK SPACE
     (16#0205F#, 16#0205F#),  -- MEDIUM MATHEMATICAL SPACE .. MEDIUM MATHEMATICAL SPACE
     (16#03000#, 16#03000#)); -- IDEOGRAPHIC SPACE .. IDEOGRAPHIC SPACE

****************************************************************

From: Robert Dewar
Sent: Sunday, January 30, 2005 8:45 AM

Vincent Celier (who seems to be turning himself into a Unicode expert :-)
makes the following valid point:

In http://www.unicode.org/Public/4.0-Update/UCD-4.0.0.html:

   For backwards compatibility, in the file UnicodeData.txt a range is
   specified not by the form "X..Y", but by their start and end characters.
   In such cases, the names of characters in the range are algorithmically
   derivable. Surrogate code points and private use characters have no
   names.

In http://www.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html:

   There are six special ranges of characters that are represented only by
   their start and end characters, since the properties in the file are
   uniform, except for code values (which are all sequential and assigned).

This means that the code tables I posted need adjustment with respect to:

> It seems to me that, when in the Unicode database we have <... First>
> and <... Last> that are of some category, then the whole range First ..
> Last of characters are of this category.
>
> So, these entries in the table UTF_32_Letters:
>
>      (16#03400#, 16#03400#),  -- ..
>      (16#04DB5#, 16#04DB5#),  -- ..
>
> should be replaced with this single entry:
>
>      (16#03400#, 16#04DB5#),  -- ..
>
> Similarly:
>
>      (16#04E00#, 16#04E00#),  -- .. Ideograph, First>
>      (16#09FA5#, 16#09FA5#),  -- .. Ideograph, Last>
> =>
>      (16#04E00#, 16#09FA5#),  -- .. Ideograph, Last>
>
>      (16#0AC00#, 16#0AC00#),  -- .. Syllable, First>
>      (16#0D7A3#, 16#0D7A3#),  -- .. Syllable, Last>
> =>
>      (16#0AC00#, 16#0D7A3#),  -- .. Syllable, Last>
>
>      (16#20000#, 16#20000#),  -- ..
>      (16#2A6D6#, 16#2A6D6#),  -- ..
> =>
>      (16#20000#, 16#2A6D6#),  -- ..
>
> There are similar modifications to be done to the UTF_32_Non_Graphic table.
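
The range tables discussed in this thread (UTF_32_Spaces, the merged CJK and Hangul ranges, and the case-offset table) all support the same lookup: a binary search over sorted, non-overlapping (Lo, Hi) pairs. What follows is a minimal sketch in Ada under stated assumptions, not the GNAT implementation. The record component names Lo and Hi, the offset array type UTF_32_Offsets, and all procedure and function names are hypothetical. The lower-case excerpt uses only four entries from the offset table, where each offset is upper minus lower (for example, -7205 maps GREEK PROSGEGRAMMENI, U+1FBE, to U+0399). The sketch also shows why Vincent's merge matters: with separate First and Last point entries, a code point inside the block such as 16#03500# is not found.

```ada
with Ada.Text_IO;         use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;

procedure Range_Table_Demo is

   --  Assumed shape of the posted tables: sorted, non-overlapping,
   --  inclusive (Lo, Hi) ranges of code points (names are hypothetical).
   type UTF_32 is range 0 .. 16#7FFF_FFFF#;

   type UTF_32_Range is record
      Lo, Hi : UTF_32;
   end record;

   type UTF_32_Ranges is array (Positive range <>) of UTF_32_Range;

   --  Hypothetical offsets, parallel to a table of lower-case ranges:
   --  upper case = lower case + offset. Both arrays must share bounds.
   type UTF_32_Offsets is array (Positive range <>) of Integer;

   --  Binary search; returns the index of the range containing C, or 0.
   function Find (C : UTF_32; Table : UTF_32_Ranges) return Natural is
      Low  : Natural := Table'First;
      High : Natural := Table'Last;
      Mid  : Natural;
   begin
      while Low <= High loop
         Mid := (Low + High) / 2;
         if C < Table (Mid).Lo then
            High := Mid - 1;
         elsif C > Table (Mid).Hi then
            Low := Mid + 1;
         else
            return Mid;
         end if;
      end loop;
      return 0;
   end Find;

   function Is_In (C : UTF_32; Table : UTF_32_Ranges) return Boolean is
   begin
      return Find (C, Table) /= 0;
   end Is_In;

   --  Adds the offset of the matching lower-case range; otherwise C is
   --  returned unchanged.
   function To_Upper
     (C : UTF_32; Lower : UTF_32_Ranges; Adjust : UTF_32_Offsets)
      return UTF_32
   is
      I : constant Natural := Find (C, Lower);
   begin
      if I = 0 then
         return C;
      end if;
      return UTF_32 (Integer (C) + Adjust (I));
   end To_Upper;

   --  The space table as posted above (Unicode 4.0 category Zs).
   Spaces : constant UTF_32_Ranges :=
     ((16#00020#, 16#00020#), (16#000A0#, 16#000A0#),
      (16#01680#, 16#01680#), (16#0180E#, 16#0180E#),
      (16#02000#, 16#0200B#), (16#0202F#, 16#0202F#),
      (16#0205F#, 16#0205F#), (16#03000#, 16#03000#));

   --  CJK Extension A as two First/Last point entries vs. one range.
   Split  : constant UTF_32_Ranges :=
     ((16#03400#, 16#03400#), (16#04DB5#, 16#04DB5#));
   Merged : constant UTF_32_Ranges := (1 => (16#03400#, 16#04DB5#));

   --  Four-entry excerpt of the lower-case table and its offsets.
   Lower  : constant UTF_32_Ranges :=
     ((16#01EF9#, 16#01EF9#),   --  LATIN SMALL LETTER Y WITH TILDE
      (16#01F00#, 16#01F07#),   --  GREEK SMALL LETTER ALPHA WITH PSILI ..
      (16#01FBE#, 16#01FBE#),   --  GREEK PROSGEGRAMMENI
      (16#0FF41#, 16#0FF5A#));  --  FULLWIDTH LATIN SMALL LETTER A .. Z
   Adjust : constant UTF_32_Offsets := (-1, 8, -7205, -32);

   --  Prints a code point as an Ada based literal, e.g. 16#1F0B#.
   procedure Show (C : UTF_32) is
   begin
      Put (Integer (C), Width => 0, Base => 16);
      New_Line;
   end Show;

begin
   Put_Line (Boolean'Image (Is_In (16#0180E#, Spaces)));  --  TRUE
   Put_Line (Boolean'Image (Is_In (16#03500#, Split)));   --  FALSE
   Put_Line (Boolean'Image (Is_In (16#03500#, Merged)));  --  TRUE
   Show (To_Upper (16#01F03#, Lower, Adjust));            --  16#1F0B#
   Show (To_Upper (16#01FBE#, Lower, Adjust));            --  16#399#
   Show (To_Upper (16#00061#, Lower, Adjust));            --  16#61#
end Range_Table_Demo;
```

Keeping Lower and Adjust as parallel arrays with identical bounds lets one search serve both membership tests and case mapping; a single array of (Lo, Hi, Offset) records would work equally well.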

****************************************************************

From: Randy Brukardt
Sent: Wednesday, April 13, 2005 8:05 PM

The wording in 6.1(10) now says:

   The sequence of characters in an operator_symbol shall be identical,
   after conversion to upper case, to the sequence of characters for one of
   the six classes of operators defined in clause 4.5 (in upper case).
   Spaces are not allowed. One or more characters in category other_format
   may be inserted after any graphic_character in the operator_symbol if the
   operator_symbol is a reserved word.

This seems wrong, as an operator_symbol is a string literal. And how
anything with quotes around it could ever "be identical" to "the sequence of
characters for one" of "operators defined in clause 4.5" is beyond me. The
original wording isn't much better, as it talks about "correspond to" one of
the operators (whatever that means).

I suggest adding "and removing the surrounding quotation marks" after
"conversion to upper case". The last sentence needs fixing, too.

   The sequence of characters in an operator_symbol shall be identical,
   after conversion to upper case and removing the surrounding quotation
   marks, to the sequence of characters for one of the six classes of
   operators defined in clause 4.5 (in upper case). Spaces are not allowed.
   One or more characters in category other_format may be inserted after
   any graphic_character in the operator_symbol other than the surrounding
   quotation marks if the operator_symbol is a reserved word.

Maybe there is a better way to do it, but I can't think of it off-hand.

****************************************************************
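
Under Randy's proposed wording, checking an operator_symbol comes down to: strip the surrounding quotation marks, set aside any other_format characters (allowed only after a graphic character, and only when the operator is a reserved word such as "and" or "mod"), convert what remains to upper case, and compare it with the operators of 4.5. The sketch below is one hypothetical reading of that wording in Ada, not compiler code. All names are invented for illustration. The other_format test recognizes only four Cf characters (SOFT HYPHEN, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER and ZERO WIDTH NO-BREAK SPACE), where a compiler would consult the full category table. Upper-casing is ASCII-only, which suffices because every operator in 4.5 is spelled in ASCII.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Operator_Symbol_Demo is

   --  Illustrative subset of category other_format (Cf) only.
   function Is_Other_Format (C : Wide_Wide_Character) return Boolean is
      Code : constant Natural := Wide_Wide_Character'Pos (C);
   begin
      case Code is
         when 16#00AD# | 16#200C# | 16#200D# | 16#FEFF# =>
            return True;
         when others =>
            return False;
      end case;
   end Is_Other_Format;

   --  ASCII-only conversion to upper case.
   function Upper (S : Wide_Wide_String) return Wide_Wide_String is
      R : Wide_Wide_String := S;
   begin
      for I in R'Range loop
         if R (I) in 'a' .. 'z' then
            R (I) := Wide_Wide_Character'Val
              (Wide_Wide_Character'Pos (R (I)) - 32);
         end if;
      end loop;
      return R;
   end Upper;

   --  Operators of 4.5 that are reserved words.
   function Is_Reserved_Op (S : Wide_Wide_String) return Boolean is
   begin
      return S = "AND" or else S = "OR" or else S = "XOR"
        or else S = "MOD" or else S = "REM" or else S = "ABS"
        or else S = "NOT";
   end Is_Reserved_Op;

   --  Operators of 4.5 that are delimiters.
   function Is_Delimiter_Op (S : Wide_Wide_String) return Boolean is
   begin
      return S = "=" or else S = "/=" or else S = "<" or else S = "<="
        or else S = ">" or else S = ">=" or else S = "+" or else S = "-"
        or else S = "&" or else S = "*" or else S = "/" or else S = "**";
   end Is_Delimiter_Op;

   function Is_Valid_Operator_Symbol (Lit : Wide_Wide_String) return Boolean
   is
      Body_Text  : Wide_Wide_String (1 .. Lit'Length);
      Last       : Natural := 0;
      Saw_Format : Boolean := False;
   begin
      if Lit'Length < 3
        or else Lit (Lit'First) /= '"'
        or else Lit (Lit'Last) /= '"'
      then
         return False;
      end if;
      --  Copy the characters between the quotes, dropping other_format.
      for I in Lit'First + 1 .. Lit'Last - 1 loop
         if Is_Other_Format (Lit (I)) then
            --  Must follow a graphic character, not the opening quote.
            if I = Lit'First + 1 then
               return False;
            end if;
            Saw_Format := True;
         else
            Last := Last + 1;
            Body_Text (Last) := Lit (I);
         end if;
      end loop;
      declare
         Key : constant Wide_Wide_String := Upper (Body_Text (1 .. Last));
      begin
         --  other_format characters are tolerated only in reserved words.
         return Is_Reserved_Op (Key)
           or else (not Saw_Format and then Is_Delimiter_Op (Key));
      end;
   end Is_Valid_Operator_Symbol;

   SHY : constant Wide_Wide_Character := Wide_Wide_Character'Val (16#AD#);

begin
   Put_Line (Boolean'Image (Is_Valid_Operator_Symbol ("""And""")));
   Put_Line (Boolean'Image (Is_Valid_Operator_Symbol ("""an" & SHY & "d""")));
   Put_Line (Boolean'Image (Is_Valid_Operator_Symbol ("""<" & SHY & "=""")));
   Put_Line (Boolean'Image (Is_Valid_Operator_Symbol ("""" & SHY & "and""")));
   Put_Line (Boolean'Image (Is_Valid_Operator_Symbol (""" and""")));
end Operator_Symbol_Demo;
```

The program prints TRUE, TRUE, FALSE, FALSE, FALSE: a SOFT HYPHEN inside "and" is accepted, but not inside the delimiter "<=", not directly after the opening quotation mark, and a leading space still makes the symbol illegal.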