Version 1.3 of ais/ai-00038.txt


!standard B.3 (46)          99-09-14 AI95-00038/04
!class confirmation 95-06-25
!status WG9 approved 95-06-14
!status ARG approved 10-0-0 (by letter ballot) 96-06-05
!status ARG approved 9-0-1 95-11-01
!status received 95-06-25
!subject Mapping between Interfaces.C.char and Standard.Character
!summary
The To_C and To_Ada functions in Interfaces.C map between corresponding characters, not necessarily between characters with the same internal representation. Corresponding characters are characters defined by the same enumeration literal, if such exist; otherwise, the correspondence is not defined by the language.
!question
This paragraph states that To_Ada and To_C map between Character and char, but does not explain how. Presumably, Interfaces.C.char corresponds to the C type char, i.e., to the native character set of the target machine. Type Character, of course, always corresponds to Latin-1, regardless of the target machine. On an EBCDIC machine, does To_C('A') yield the Interfaces.C.char value corresponding to EBCDIC 'A', or does it yield the character whose EBCDIC code is Character'Pos('A')?
!response
The intent is that 'A' maps to 'A', even if the two 'A's have different representations.
The following definition is equivalent to the above summary:
To_C (Latin_1_Char) = char'Value(Character'Image(Latin_1_Char))
provided that char'Value does not raise an exception; otherwise the result is not defined by the language.
To_Ada (Native_C_Char) = Character'Value(char'Image(Native_C_Char))
provided that Character'Value does not raise an exception; otherwise the result is not defined by the language.
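For illustration, the definition above can be transcribed almost literally into Ada. This is a minimal sketch, not the required implementation of Interfaces.C: the names To_C_By_Literal and To_Ada_By_Literal are hypothetical, the sketch assumes the target's char has a literal 'A', and it simply lets the Constraint_Error raised by 'Value propagate in the case that the language leaves undefined.

```ada
with Ada.Text_IO;
with Interfaces.C;

procedure Literal_Mapping_Sketch is
   use Interfaces.C;

   --  Hypothetical literal-preserving mapping, following the !response
   --  formula. Character'Image yields the literal or language-defined
   --  name (for example "'A'" or "NUL"), and char'Value looks that name
   --  up in char. If char has no such name, 'Value raises
   --  Constraint_Error; the AI leaves that case undefined, and this
   --  sketch just lets the exception propagate.
   function To_C_By_Literal (Item : Character) return char is
   begin
      return char'Value (Character'Image (Item));
   end To_C_By_Literal;

   --  The reverse direction, following the same pattern.
   function To_Ada_By_Literal (Item : char) return Character is
   begin
      return Character'Value (char'Image (Item));
   end To_Ada_By_Literal;

begin
   --  'A' maps to the char literal 'A', whatever its code on this target.
   Ada.Text_IO.Put_Line
     (Boolean'Image (To_C_By_Literal ('A') = 'A'
                     and then To_Ada_By_Literal ('A') = 'A'));
end Literal_Mapping_Sketch;
```

Because the lookup goes through the literal rather than the code, the same source would map 'A' to EBCDIC 'A' on an EBCDIC target and to Latin-1 'A' on a Latin-1 target.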
!ACATS test
ACATS tests CXB3004 and CXB3006 check that characters converted via To_Ada and To_C have the same meaning in both Ada and C.
!appendix

!section B.3(46)
!subject Mapping between Interfaces.C.char and Standard.Character
!reference RM95-B.3(46)
!from Norman Cohen
!reference as: 95-5119.c Norman H. Cohen 95-4-7>>
!discussion

This paragraph states that To_Ada and To_C map between Character and
char, but does not explain how.  Presumably, Interfaces.C.char
corresponds to the C type char, i.e., to the native character set of the
target machine.  Type Character, of course, always corresponds to
Latin-1, regardless of the target machine.  On an EBCDIC machine, does
To_C('A') yield the Interfaces.C.char value corresponding to EBCDIC 'A',
or does it yield the character whose EBCDIC code is Character'Pos('A')?

****************************************************************

!section B.3(46)
!subject Mapping between Interfaces.C.char and Standard.Character
!reference RM95-B.3(46)
!reference 95-5119.c Norman Cohen 95-04-07
!from Tucker Taft 95-04-12
!reference as: 95-5127.c Tucker Taft 95-4-12>>
!discussion

> This paragraph states that To_Ada and To_C map between Character and
> char, but does not explain how.  Presumably, Interfaces.C.char
> corresponds to the C type char, i.e., to the native character set of the
> target machine.  Type Character, of course, always corresponds to
> Latin-1, regardless of the target machine.  On an EBCDIC machine, does
> To_C('A') yield the Interfaces.C.char value corresponding to EBCDIC 'A',
> or does it yield the character whose EBCDIC code is Character'Pos('A')?

The intent was definitely to map 'A' to the "corresponding" 'A',
not simply the character with the same position number.

****************************************************************

!section A.12.1(29)
!subject Call for letter ballot on AI95-00038
!reference AI95-00038/01
!from Norman Cohen 96-04-29
!reference 96-5520.a Norman H. Cohen 96-4-29>>
!discussion

I call for a letter ballot on AI-38, which, I believe, needs more work.

AI-38 talks about "corresponding" characters in types Character and
Interfaces.C.char, but doesn't define (except by one example) what that
correspondence is.  The example suggests that corresponding characters
are those named by the same character literal, but if that is the
intent it ought to be stated explicitly.

In addition, there may be some characters in one character set that have
no corresponding character in the other character set.  (For example, the
EBCDIC cent character has no counterpart in Latin-1; the Latin-1
square-bracket characters have no officially defined counterpart in
EBCDIC, but there are two or three translations in common use.)  The only
reasonable choice in such cases is to leave the behavior of To_C or
To_Ada implementation-defined or to raise Constraint_Error.  I believe
that leaving it implementation-defined is more likely to lead to
implementations that do what the programmer and end-user expect.

Here is one possible formulation:

   To_C (Latin_1_Char) = char'Value(Character'Image(Latin_1_Char))
       provided that char'Value does not raise an exception;
       otherwise the result is implementation-defined.

   To_Ada (Native_C_Char) = Character'Value(char'Image(Native_C_Char))
       provided that Character'Value does not raise an exception;
       otherwise the result is implementation-defined.

****************************************************************

!section A.12.1(29)
!subject Call for letter ballot on AI95-00038
!reference AI95-00038/01, NCohen's email of 96-04-29
!from David Emery 96-04-29
!reference 96-5522.a demery@CCGATE.HAC.COM 96-4-29>>
!discussion

>[Norm] call[s] for a letter ballot on AI-38, which, [he] believe[s],
>needs more work.

We spent a lot of time on this area in POSIX, and I strongly
suggest that the ARG take a good look at the character correspondence
rules in Chapter 8 (and the rationale) of the POSIX/Ada binding.
(This is very well covered in the POSIX/Ada tutorial we've given several
times at Tri-Ada, Ada-Europe and Ada-UK.  If you have a copy of the
TA tutorial proceedings from 92 or 93, it's in there.)

Jim Moore did most of the work on this (both the standard and the
tutorial :-).

                    dave

****************************************************************

!section B.3(46)
!subject Ada/C Character correspondence
!reference AI95-00038/02
!from David Emery 96-05-15
!reference 96-5556.a demery@CCGATE.HAC.COM 96-5-15>>
!discussion

The recommendation currently reads:
  Corresponding characters are characters defined by the
  same enumeration literal, if such exist; otherwise, the correspondence
  is not defined by the language.

I think we need to be a bit more specific in the "doesn't match" case.  In
particular, we can:
    a.  raise Constraint_Error
    b.  return an implementation-defined value in the target type
    c.  return a language (AI :-) defined value in the target type
    d.  consider this a bounded error

Option 'a' retains the "flavor" of checked conversions, and I tend to prefer
this approach.

If users don't want this behavior, then they can implement the other
behaviors by interpreting the 'Pos or even the underlying bit
representation of the value.  Of course, such conversions depend on the
representations of the two character sets, but that is as it should be.
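A minimal Ada sketch of such a user-written alternative, preserving position numbers rather than literals, might look like this. To_C_By_Pos and To_Ada_By_Pos are hypothetical names, not part of Interfaces.C, and the sketch assumes char has a value at every position it is asked for (char'Val raises Constraint_Error otherwise).

```ada
with Ada.Text_IO;
with Interfaces.C;

procedure Pos_Mapping_Sketch is
   use Interfaces.C;

   --  Hypothetical user-defined mapping that keeps the position number
   --  instead of the literal. Its meaning depends on the representations
   --  of both character sets, and char'Val raises Constraint_Error if
   --  char has no value at that position.
   function To_C_By_Pos (Item : Character) return char is
   begin
      return char'Val (Character'Pos (Item));
   end To_C_By_Pos;

   function To_Ada_By_Pos (Item : char) return Character is
   begin
      return Character'Val (char'Pos (Item));
   end To_Ada_By_Pos;

begin
   --  Compare with the predefined, literal-preserving To_C.
   Ada.Text_IO.Put_Line
     (Boolean'Image (To_C_By_Pos ('A') = To_C (Character'('A'))));
   --  Round trip through the position-based pair.
   Ada.Text_IO.Put_Line
     (Boolean'Image (To_Ada_By_Pos (To_C_By_Pos ('Z')) = 'Z'));
end Pos_Mapping_Sketch;
```

On a target whose char uses Latin-1 codes, the first comparison should be TRUE; on an EBCDIC target it would be expected to be FALSE, which is exactly the representation dependence described above.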

****************************************************************

!section B.3(46)
!subject Ada/C Character correspondence
!reference AI95-00038/02
!from David Emery 96-05-21
!reference 96-5566.a demery@CCGATE.HAC.COM 96-5-21>>
!discussion

The effect of making the mappings implementation-dependent is to remove their
utility for portability.   I do NOT think we want to do this.  Either
we have to provide some notion of portable/dependable semantics, or we
give up entirely.

My position is as follows:
    1.  The mapping function is defined so that character literals are
        preserved, i.e. to_c ('a') returns the C encoding of the character
        literal 'a'.  This is defined for all character literals that appear
        in both the Ada and C implementations.

    2.  Where a literal does not appear in both Ada and C, the options are:
            a.  implementation-defined (Norm's recommendation)
            b.  Constraint_Error (my recommendation)
            c.  undefined (if the standard doesn't say anything about this)
            d.  some 'specified value' is returned (specified by the ARG)

I can live with 2a, 2b or 2d, but not 2c.  I believe that 2b is the right
answer for this operation, since the user can define an implementation-specific
mapping function (based on 'pos, for instance), and the Ada predefined
function should "look like" other type conversion functions (which
generally raise Constraint_Error in similar circumstances.)

****************************************************************

!section B.3(46)
!subject Ada/C Character correspondence
!reference AI95-00038/02
!reference 96-5566.a demery@CCGATE.HAC.COM 96-5-21
!from Keith Thompson 96-05-21
!reference 96-5568.a Keith Thompson 96-5-22>>
!discussion

David Emery suggests that To_Ada and To_C should raise Constraint_Error
whenever a character literal does not appear in both Ada and C.  I suggest
that an implementation should at least be allowed to provide a meaningful
mapping for control characters.  In particular, if the C type char is
Latin-1, To_Ada and To_C should be identity functions.
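Whether an implementation behaves this way can be checked by comparing the predefined To_C and To_Ada with the position-preserving mapping for all 256 values of Character. A minimal sketch, assuming the implementation's char has at least 256 positions; for characters without a counterpart the predefined conversions might raise an exception or return some other value, and this sketch does not handle exceptions.

```ada
with Ada.Text_IO;
with Interfaces.C;

procedure Identity_Check_Sketch is
   use Interfaces.C;
   Identity : Boolean := True;
begin
   --  For every Character, check that To_C keeps the position number
   --  and that To_Ada undoes To_C. If char is Latin-1, both hold for
   --  all 256 values, i.e. To_C and To_Ada act as identity functions.
   for Ch in Character loop
      if char'Pos (To_C (Ch)) /= Character'Pos (Ch)
        or else To_Ada (To_C (Ch)) /= Ch
      then
         Identity := False;
      end if;
   end loop;
   Ada.Text_IO.Put_Line
     ("To_C/To_Ada identity: " & Boolean'Image (Identity));
end Identity_Check_Sketch;
```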

Also, I believe there are (several!) standard conversions between EBCDIC
and ASCII which (I think) define mappings between printable characters
whose literals aren't necessarily the same; see the Unix program "dd"
with arguments "conv=ascii", "conv=ebcdic", or "conv=ibm".  Perhaps
implementations should be allowed to use these "standard" mappings,
since that's what users are most likely to expect.

I'm not convinced that portability is much of an issue.  The whole point
of the To_Ada and To_C functions is that character sets are non-portable.


****************************************************************
