CVS difference for ai05s/ai05-0286-1.txt
--- ai05s/ai05-0286-1.txt 2012/02/11 07:36:20 1.1
+++ ai05s/ai05-0286-1.txt 2012/02/14 08:16:04 1.2
@@ -1,4 +1,4 @@
-!standard 2.1(16/2) 12-02-10 AI05-0286-1/01
+!standard 2.1(16/2) 12-02-14 AI05-0286-1/02
!standard 11.4.1(19)
!standard A.3.5(0)
!standard A.7(14)
@@ -44,11 +44,7 @@
[Editor's note: The Swiss comment ends here. See also the discussion section.]
-
-(5) The use of VT and FF characters to represent line endings is obsolescent
-(especially for VT). Something should be done here.
-
-(6) Simple case folding should be provided as an operation in
+(5) Simple case folding should be provided as an operation in
Ada.Characters.Handling, so that case insensitive comparisons (as opposed to
case conversions) of strings can be accomplished.
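[Editor's aside: the difference between case *conversion* and case *folding*
can be sketched outside Ada. The following Python illustration (not part of
the proposal; Python's str.casefold implements *full* case folding, whereas
the AI proposes *simple* folding) shows the comparison pattern:

```python
def equal_case_insensitive(left: str, right: str) -> bool:
    """Compare two strings after case folding, in the spirit of the
    proposed Equal_Case_Insensitive (folding, not conversion)."""
    # casefold is Python's full case folding; simple folding differs
    # for a handful of characters, but the pattern is identical.
    return left.casefold() == right.casefold()

# Folding maps the German sharp s to "ss", so these compare equal
# even though neither upper- nor lower-conversion makes them equal:
print(equal_case_insensitive("Straße", "STRASSE"))  # True
print("Straße".lower() == "STRASSE".lower())        # False
```

This is why a dedicated folding-based equality is proposed rather than
telling users to compare converted strings.]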
@@ -75,12 +71,30 @@
Delete AARM note 2.1(18.a.1/3).
+[Editor's note: I took the simplest approach here, and treated 7-bit ASCII
+input files as a subset of UTF-8. (The ACATS only uses 7-bit ASCII and UTF-8,
+never 8-bit Latin-1.) As such, only a single requirement was needed.
+Practically, however, it's likely that compilers will treat pure 8-bit input
+(with no BOM) differently than UTF-8 input (with a BOM). I wasn't sure whether
+it was worthwhile to describe both formats; it mostly seemed like more wording
+to me. But this is easy to change.]
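[Editor's aside: the "ASCII as a subset of UTF-8" approach described above
might look like the following sketch (a hypothetical helper, not wording from
the AI; the BOM is the three bytes EF BB BF):

```python
UTF8_BOM = b"\xef\xbb\xbf"

def read_source(raw: bytes) -> str:
    """Decode an input source file, treating 7-bit ASCII as a subset
    of UTF-8.  A leading BOM marks the file as explicitly UTF-8."""
    if raw.startswith(UTF8_BOM):
        return raw[len(UTF8_BOM):].decode("utf-8")
    # No BOM: pure 7-bit ASCII decodes identically under either
    # interpretation, so a single UTF-8 decode covers both cases.
    return raw.decode("utf-8")

print(read_source(b"\xef\xbb\xbfprocedure Test is"))  # BOM stripped
print(read_source(b"procedure Test is"))              # plain ASCII
```

A compiler distinguishing pure 8-bit Latin-1 input (no BOM) from UTF-8 input
(with a BOM) would add a Latin-1 branch to the no-BOM case.]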
+
[Editor's note: I do not know what to do with the Note 2.1(18) and the
associated AARM note. This is still *strictly* true, because the language only
*recommends* (as opposed to *specifies*) a format. OTOH, it seems misleading to
me. My preference is to delete it and move a modified version of the AARM note
-onto this new Implementation Advice.]
+onto this new Implementation Advice. Bob thinks it would be better to delete
+both.]
+[Editor's opinion: I would actually prefer that we make this a requirement,
+rather than Implementation Advice. Then there is no need for 2.1(18) and the
+associated notes. In this case, implementations could still appeal to the
+"impossible or impractical" exception. I've always thought that the lack of
+a standard source format decreased the portability of Ada source code
+unnecessarily. OTOH, we can be somewhat more informal in Implementation Advice,
+so that might help describe this better. And the documentation requirement
+hopefully will reduce the chances of implementers ignoring it.]
+
[Editor's note: "code point" is as defined in ISO/IEC 10646; we mention this fact in AARM 3.5.2(11.p/3)
but not normatively. Formally, this is not necessary (as we include the definitions of 10646
by reference), but some might find it confusing.]
@@ -137,13 +151,9 @@
[Q: Should we do this for Environment_Variables as well? I think not; it's not
necessary (you can always put a UTF-8 encoded string there and get it back out
without any language discussion).]
-
-For (5):
-*** TBD *** [Presumably an Implementation Permission in 2.2?]
+For (5), add after A.3.5(21/3):
-For (6), add after A.3.5(21/3):
-
function Equal_Case_Insensitive (Left, Right : Wide_String) return Boolean;
Add after A.3.5(61/3):
@@ -160,7 +170,8 @@
conversions are the same but this routine will report the strings as
different.
-[Editor's note: Should the last sentence be a user note or an AARM note instead?]
+[Editor's note: Should the last sentence be a user note or an AARM note
+instead?]
!discussion
@@ -547,77 +558,6 @@
****************************************************************
From: Randy Brukardt
-Sent: Friday, February 10, 2012 11:20 PM
-
-Robert Dewar wrote:
-> It's really pretty horrible to use VT in sources to end a line, this
-> is an ancient bow to old IBM line printers.
-> I think we should define the use of this format effector as
-> obsolescent, and catch it using No_Obsolescent_Features.
->
-> Not sure about FF, it's certainly horrible to use it as a terminator
-> for a source line, but I have seen people use it in place of pragma
-> Page. I think this should probably also be considered obsolescent, but
-> am not so concerned about that one.
->
-> This is certainly not a vital issue!
-
-Tucker replied:
-
-> I see no harm in treating these as white space.
-> I think the bizarreness is treating these as line terminators, since
-> no modern operating system treats them as such, causing line numbers
-> to mismatch between Ada's line counting and the line counting of other
-> tools.
-
-I would inject a mild note of caution in terms of FF. One could argue that it
-makes sense for the interpretation of sources to match the implementation's
-Text_IO (so that Ada programs can write source text). If the programmer calls
-Text_IO.New_Page, they're probably going to get an FF in their file (that
-happens with most of the Ada compilers that I've used). Similarly, reading an FF
-will cause the end of a line if it is not already ended (although Text_IO will
-probably not write such a file).
-
-I don't give a darn about VT, though, other than to note that there is a
-compatibility problem in making a change. (But it is minuscule...)
-
-Robert replied:
-
-> But you must treat them as line terminators in the logical sense, the
-> RM insists on this, that is, you must have SOME representation for VT
-> and FF, of course strictly it does not have to be the corresponding
-> ASCII characters.
-
-The notion that the Standard somehow requires having some representation for
-every possible character in every source form is laughable in my view. The
-implication that this is required only appears in the AARM and only in a single
-note. There is absolutely nothing normative about such a "requirement". It makes
-about as much sense as requiring that an Ada compiler only run on a machine with
-a two button mouse! A given source format will represent whatever characters it
-can (or desires), and that is it.
-
-However, with the proposed introduction of Implementation Advice that compilers
-accept UTF-8 encoded files, where every character is represented by its code
-point, this becomes more important. If such a UTF-8 file contains a VT
-character, then the standard requires it to be treated as a line terminator.
-Period. Treating it as white space would require a non-standard mode (where the
-"canonical representation" was interpreted other than as recommended by the
-standard), or of course ignoring the IA completely. That seems bad if existing
-compilers are doing something else with the character.
-
-I'm not sure what the right answer is here. We could add an Implementation
-Permission that VT and FF represent 0 line terminators, or just do that for VT
-(assuming FF is used in Text_IO files), or say something about Text_IO, or
-something else. (We don't need anything to allow <LF><FF> to be treated as a
-single line terminator - 2.2(2/3) already says this). For Janus/Ada, I'd
-probably not make any change here (the only time I've ever seen a VT in a text
-file is in the ACATS test for this character, so I think it is essentially
-irrelevant as to how it's handled, and for FF the same handling as Text_IO seems
-right), and I'd rather not be forced to do so.
-
-****************************************************************
-
-From: Randy Brukardt
Sent: Wednesday, January 11, 2012 11:23 PM
We have received the following comment from Switzerland. I'm posting it here so
@@ -885,5 +825,536 @@
For a blunter definition of "political correctness", see
http://www.urbandictionary.com/define.php?term=politically%20correct
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Friday, February 10, 2012 11:20 PM
+
+Robert Dewar wrote:
+> It's really pretty horrible to use VT in sources to end a line, this
+> is an ancient bow to old IBM line printers.
+> I think we should define the use of this format effector as
+> obsolescent, and catch it using No_Obsolescent_Features.
+>
+> Not sure about FF, it's certainly horrible to use it as a terminator
+> for a source line, but I have seen people use it in place of pragma
+> Page. I think this should probably also be considered obsolescent, but
+> am not so concerned about that one.
+>
+> This is certainly not a vital issue!
+
+Tucker replied:
+
+> I see no harm in treating these as white space.
+> I think the bizarreness is treating these as line terminators, since
+> no modern operating system treats them as such, causing line numbers
+> to mismatch between Ada's line counting and the line counting of other
+> tools.
+
+I would inject a mild note of caution in terms of FF. One could argue that it
+makes sense for the interpretation of sources to match the implementation's
+Text_IO (so that Ada programs can write source text). If the programmer calls
+Text_IO.New_Page, they're probably going to get an FF in their file (that
+happens with most of the Ada compilers that I've used). Similarly, reading an FF
+will cause the end of a line if it is not already ended (although Text_IO will
+probably not write such a file).
+
+I don't give a darn about VT, though, other than to note that there is a
+compatibility problem in making a change. (But it is minuscule...)
+
+Robert replied:
+
+> But you must treat them as line terminators in the logical sense, the
+> RM insists on this, that is, you must have SOME representation for VT
+> and FF, of course strictly it does not have to be the corresponding
+> ASCII characters.
+
+The notion that the Standard somehow requires having some representation for
+every possible character in every source form is laughable in my view. The
+implication that this is required only appears in the AARM and only in a single
+note. There is absolutely nothing normative about such a "requirement". It makes
+about as much sense as requiring that an Ada compiler only run on a machine with
+a two button mouse! A given source format will represent whatever characters it
+can (or desires), and that is it.
+
+However, with the proposed introduction of Implementation Advice that compilers
+accept UTF-8 encoded files, where every character is represented by its code
+point, this becomes more important. If such a UTF-8 file contains a VT
+character, then the standard requires it to be treated as a line terminator.
+Period. Treating it as white space would require a non-standard mode (where the
+"canonical representation" was interpreted other than as recommended by the
+standard), or of course ignoring the IA completely. That seems bad if existing
+compilers are doing something else with the character.
+
+I'm not sure what the right answer is here. We could add an Implementation
+Permission that VT and FF represent 0 line terminators, or just do that for VT
+(assuming FF is used in Text_IO files), or say something about Text_IO, or
+something else. (We don't need anything to allow <LF><FF> to be treated as a
+single line terminator - 2.2(2/3) already says this). For Janus/Ada, I'd
+probably not make any change here (the only time I've ever seen a VT in a text
+file is in the ACATS test for this character, so I think it is essentially
+irrelevant as to how it's handled, and for FF the same handling as Text_IO seems
+right), and I'd rather not be forced to do so.
+
+****************************************************************
+
+From: Bob Duff
+Sent: Saturday, February 11, 2012 9:03 AM
+
+> The notion that the Standard somehow requires having some
+> representation for every possible character in every source form is laughable in my view.
+
+Not sure what you mean by "laughable", but formally speaking, OF COURSE an
+implementation must support all the characters in the character set the standard
+requires. Refusing to compile programs containing VT would be just as
+nonconforming as refusing to compile programs containing the letter "A".
+
+Practically speaking, on the other hand, I agree with "I don't give a darn about
+VT". But if it never occurs in Ada programs (other than ACVC tests), then
+there's no reason to change the rules.
+
+>...The
+> implication that this is required only appears in the AARM and only in
+>a single note. There is absolutely nothing normative about such a
+>"requirement".
+
+I disagree. Implementations can't just make up legality rules.
+
+> I'm not sure what the right answer is here. We could add an
+> Implementation Permission ...
+
+See what I said the other day about Implementation Permissions.
+
+I say, insufficiently broken. And it introduces an incompatibility:
+if a source contains "-- blah<FF> X := X + 1;" the suggested change will
+comment-out the assignment statement. Not likely to occur, but pretty nasty if
+it does.
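+
+[Editor's aside: the incompatibility can be seen by splitting that source
+under the two interpretations. A sketch (illustrative only, not tied to any
+compiler's lexer; FF is 16#0C#, VT is 16#0B#):
+
```python
def split_lines(src: str, ff_ends_line: bool) -> list[str]:
    """Split source text into lines, optionally treating FF and VT
    as line terminators, as RM 2.2 currently requires."""
    terminators = "\n\x0c\x0b" if ff_ends_line else "\n"
    lines, current = [], ""
    for ch in src:
        if ch in terminators:
            lines.append(current)
            current = ""
        else:
            current += ch
    lines.append(current)
    return lines

src = "-- blah\x0c X := X + 1;"
# FF ends the comment line: the assignment survives on its own line.
print(split_lines(src, True))   # ['-- blah', ' X := X + 1;']
# FF as mere white space: the whole text is one comment line.
print(split_lines(src, False))  # ['-- blah\x0c X := X + 1;']
```
+]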
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Sunday, February 12, 2012 10:10 AM
+
+>> The notion that the Standard somehow requires having some
+>> representation for every possible character in every source form is laughable in my view.
+>
+> Not sure what you mean by "laughable", but formally speaking, OF
+> COURSE an implementation must support all the characters in the
+> character set the standard requires. Refusing to compile programs
+> containing VT would be just as nonconforming as refusing to compile
+> programs containing the letter "A".
+
+I 100% agree with Bob on this, and do not know where Randy is coming from.
+
+I agree we could have impl permission to ignore VT, but I really think any
+change to the handling of FF would generate gratuitous incompatibilities in
+existing programs, where the use of FF to get new pages in listings is not
+uncommon.
+
+> Practically speaking, on the other hand, I agree with "I don't give a
+> darn about VT". But if it never occurs in Ada programs (other than
+> ACVC tests), then there's no reason to change the rules.
+
+Right, changing the rules does not help existing implementations; after all, it
+makes extra work!
+
+>> ...The
+>> implication that this is required only appears in the AARM and only
+>> in a single note. There is absolutely nothing normative about such a
+>> "requirement".
+>
+> I disagree. Implementations can't just make up legality rules.
+
+Yes, exactly
+
+>> I'm not sure what the right answer is here. We could add an
+>> Implementation Permission ...
+>
+> See what I said the other day about Implementation Permissions.
+>
+> I say, insufficiently broken. And it introduces an incompatibility:
+> if a source contains "-- blah<FF> X := X + 1;" the suggested
+> change will comment-out the assignment statement. Not likely to
+> occur, but pretty nasty if it does.
+
+Yes, exactly
+
+Let's do nothing here, no reason to make a change, not sufficiently broken!
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Monday, February 13, 2012 2:19 PM
+
+> > The notion that the Standard somehow requires having some
+> > representation for every possible character in every source
+> form is laughable in my view.
+>
+> Not sure what you mean by "laughable", but formally speaking, OF
+> COURSE an implementation must support all the characters in the
+> character set the standard requires. Refusing to compile programs
+> containing VT would be just as nonconforming as refusing to compile
+> programs containing the letter "A".
+
+But this has *nothing* to do with the representation of the source. What I was
+saying is that a source representation does not necessarily have to have a
+representation for VT (or PI or euro sign or any other character). I think it is
+laughable to think that it ought to.
+
+I definitely agree that if the source representation *does* have a
+representation for VT, then it has to follow the standard associated with that
+character.
+
+> Practically speaking, on the other hand, I agree with "I don't give a
+> darn about VT". But if it never occurs in Ada programs (other than
+> ACVC tests), then there's no reason to change the rules.
+>
+> >...The
+> > implication that this is required only appears in the AARM and only
+> >in a single note. There is absolutely nothing normative about such a
+> >"requirement".
+>
+> I disagree. Implementations can't just make up legality rules.
+
+I never said anything about *making up legality rules*. And I surely was not
+considering *rejecting* programs containing VT. However, I think it would be
+perfectly OK if 16#0B# happened to be interpreted as a space in some source
+representation; there cannot be a requirement on *all* source representations.
+
+More generally, there is almost no requirement that any particular character be
+representable in a particular source form. Ada 83 made this clear, by trying to
+accommodate keypunch programs (which was archaic even back in 1980). Pretty much
+the only requirement is for the digits, letters, space, some line ending, and
+the delimiters defined in 2.2 (minus the allowed replacements). Anything else is
+optional. (One could imagine having some $ notation [square brackets not being
+in the 64 characters of the Unisys keypunches I used in the Paleozoic era of
+computing] for additional characters, but that is not helpful unless the tools
+also support it. Otherwise, they're just unintelligible gibberish in the text,
+making it much harder to read and understand.)
+
+The only indication to the contrary is the second sentence of 2.1(18.a/2), and
+it does not follow from any normative requirements (there is no requirement or
+need in Ada to translate *back* from the standard characters of an internal
+compiler representation to Ada source). IMHO, that sentence is complete fantasy.
+
+Anyway, this will become irrelevant if we adopt the Implementation Advice for a
+standard source form, since that form will contain all of the "standard"
+characters. It will still be optional (of course) to support this form, but
+implementers that don't support it will have to explain themselves. (Which is
+easy to do, at least in my case: no one has asked.)
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Monday, February 13, 2012 2:27 PM
+
+...
+> > I say, insufficiently broken. And it introduces an incompatibility:
+> > if a source contains "-- blah<FF> X := X + 1;" the suggested
+> > change will comment-out the assignment statement. Not likely to
+> > occur, but pretty nasty if it does.
+>
+> Yes, exactly
+>
+> Let's do nothing here, no reason to make a change, not sufficiently
+> broken!
+
+I personally agree with this, there is no important reason for a change.
+However, someone posting using "Robert Dewar"'s name back in November seemed to
+think otherwise (call this person Robert Dewar #1):
+
+> It's really pretty horrible to use VT in sources to end a line, this
+> is an ancient bow to old IBM line printers. I think we should define
+> the use of this
+> format effector as obsolescent, and catch it using No_Obsolescent_Features.
+>
+> Not sure about FF, it's certainly horrible to use it as a terminator
+> for a source line, but I have seen people use it in place of pragma
+> Page. I think this
+> should probably also be considered obsolescent, but am not so
+> concerned about that one.
+
+Tucker jumped in to agree (saying that these both should be interpreted as a
+space), and then the topic dropped.
+
+I was prepared to ignore this thought forever, but when we decided to put an
+Implementation Advice for a standard UTF-8 source format on the agenda for the
+upcoming meeting (as a partial response to the Swiss comment), this seemed to be
+more important. After all, in that standard format, every character represents
+itself (I included wording to say that, as pointed out by Robert in a different
+thread), and that surely includes VT.
+
+So Robert #1 wants a change to the handling of VT, and Robert #2 does not. Not
+sure which Robert to pay attention to! Note that this is pretty much our last
+chance to make any changes here; once the standard format is in use, changing
+its interpretation would be too incompatible to contemplate.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Monday, February 13, 2012 2:31 PM
+
+> Tucker jumped in to agree (saying that these both should be
+> interpreted as a space), and then the topic dropped.
+
+Well, there is nothing inconsistent between thinking something should be fixed
+or changed, and deciding that it is not worth the trouble!
+>
+> I was prepared to ignore this thought forever, but when we decided to
+> put an Implementation Advice for a standard UTF-8 source format on the
+> agenda for the upcoming meeting (as a partial response to the Swiss
+> comment), this seemed to be more important. After all, in that
+> standard format, every character represents itself (I included wording
+> to say that, as pointed out by Robert in a different thread), and that surely includes VT.
+>
+> So Robert #1 wants a change to the handling of VT, and Robert #2 does not.
+> Not sure which Robert to pay attention to! Note that this is pretty
+> much our last chance to make any changes here; once the standard
+> format is in use, changing its interpretation would be too incompatible to
+> contemplate.
+
+Leave VT as is, insufficiently broken to be worth fixing
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Monday, February 13, 2012 2:32 PM
+
+> But this has *nothing* to do with the representation of the source.
+> What I was saying is that a source representation does not necessarily
+> have to have a representation for VT (or PI or euro sign or any other
+> character). I think it is laughable to think that it ought to.
+
+I find this incomprehensible. Of course the source representation must allow all
+characters to be represented. As Bob Duff says, refusing to have a
+representation for VT would be equivalent to refusing to have a representation for
+'a'. There is no distinction.
+
+I am completely nonplussed by Randy's "laughable" view here ????
+
+> Anyway, this will become irrelevant if we adopt the Implementation
+> Advice for a standard source form, since that form will contain all of
+> the "standard" characters. It will still be optional (of course) to
+> support this form, but implementers that don't support it will have to
+> explain themselves. (Which is easy to do, at least in my case: no one
+> has asked.)
+
+Actually you have a positive requirement to document all failure to follow IA,
+whether you are asked or not.
+
+****************************************************************
+
+From: Bob Duff
+Sent: Monday, February 13, 2012 3:10 PM
+
+> The only indication to the contrary is the second sentence of
+> 2.1(18.a/2),
+
+I'm completely mystified -- I must be totally misunderstanding what you mean.
+
+> Anyway, this will become irrelevant if we adopt the Implementation
+> Advice
+
+OK, if it's irrelevant, I won't bother arguing about it. I object to "fixing"
+2.1(18.a/2), and I object to adding any normative text that tries to say what
+2.1(18.a/2) is saying. If you are not proposing to do either of those things,
+then I'll drop the matter. Otherwise, I'll answer in more detail.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Monday, February 13, 2012 3:16 PM
+
+> > But this has *nothing* to do with the representation of the source.
+> > What I was saying is that a source representation does not
+> > necessarily have to have a representation for VT (or PI or euro sign
+> > or any other character). I think it is laughable to think that it ought to.
+>
+> I find this incomprehensible. Of course the source representation must
+> allow all characters to be represented.
+> As Bob Duff says, refusing to have a representation for VT would be
+> equivalent to refusing to have a represention for 'a'. There is no
+> distinction.
+
+Why? There is nothing in the Standard that requires that. It requires an
+interpretation for each character that appears in the source, but it cannot say
+anything about which characters can appear in any particular source. How could
+it? So why do we care which characters can appear? It's actively harmful to
+include hacks like "brackets notation" in source to meet such a non-requirement
+in straight 8-bit formats -- to take one such example.
+
+And I agree with you: there is no distinction! It's perfectly OK to not allow
+'a' (so long as 'A' is allowed). And indeed, the only reason for saying that you
+need either 'a' or 'A' is one of practicality: you can't write useful Ada
+programs without the various reserved words including 'A'.
+
+> I am completely non-plussed by Randy's "laughable" view here ????
+
+I'm completely flabbergasted that anyone would think that there is any
+requirement or value to a requirement otherwise. Moreover, in the absence of a
+customer requirement, why should any Ada implementer spend time on this (in any
+way)?
+
+Anyway, this is probably going to be irrelevant down the line, so it probably
+does not need to be resolved.
+
+> > Anyway, this will become irrelevant if we adopt the Implementation
+> > Advice for a standard source form, since that form will contain all
+> > of the "standard" characters. It will still be optional (of course)
+> > to support this form, but implementers that don't support it will
+> > have to explain themselves. (Which is easy to do, at least in my
+> > case: no one has asked.)
+>
+> Actually you have a positive requirement to document all failure to
+> follow IA, whether you are asked or not.
+
+Sorry, you misunderstood: that is what my documentation would say: "We didn't
+implement UTF-8 formats, because no one has asked for support for identifiers
+and string literals with characters other than those in Latin-1."
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Monday, February 13, 2012 3:23 PM
+
+> > The only indication to the contrary is the second sentence of
+> > 2.1(18.a/2),
+>
+> I'm completely mystified -- I must be totally misunderstanding what
+> you mean.
+
+Example: If you have an Ada source in some 6-bit character format (say the old
+keypunch), does it have to have some mechanism to represent other characters
+than those naturally present in that format? I say no; it would be harmful, as
+the meaning would be inconsistent with what "normal" tools for that format
+would expect.
+
+> > Anyway, this will become irrelevant if we adopt the Implementation
+> > Advice
+>
+> OK, if it's irrelevant, I won't bother arguing about it.
+> I object to "fixing" 2.1(18.a/2), and I object to adding any normative
+> text that tries to say what 2.1(18.a/2) is saying.
+> If you are not proposing to do either of those things, then I'll drop
+> the matter. Otherwise, I'll answer in more detail.
+
+The Implementation Advice would require a UTF-8 format where every code point
+represents the associated character. Thus it renders 2.1(18.a/2) essentially
+irrelevant, as any implementation that follows the advice would trivially meet
+the requirement. And any implementation that doesn't would do so for good (and
+documented) reasons, and it would seem silly to care beyond that (let the market
+decide).
+
+I would suggest deleting that AARM note, along with the associated RM note, if
+the advice is added -- but it is not clear-cut and we'll have to discuss this in
+Houston.
+
+****************************************************************
+
+From: Bob Duff
+Sent: Monday, February 13, 2012 3:39 PM
+
+> I would suggest deleting that AARM note, along with the associated RM
+> note, if the advice is added -- but it is not clear-cut and we'll have
+> to discuss this in Houston.
+
+OK.
+
+I don't object to deleting 2.1(18). And if we do that, then I don't object to
+deleting the following AARM annotations.
+
+The purpose of 2.1(18.a/2) was to explain 2.1(18). People would say things
+like, "What stops an impl from saying the source rep is FORTRAN, and thereby
+passing off a FORTRAN compiler as a conforming Ada impl?" The answer is: you can
+only do that if you can explain the mapping FORTRAN<-->Ada, which ain't likely.
+;-)
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Monday, February 13, 2012 5:42 PM
+
+> Why? There is nothing in the Standard that requires that. It requires
+> an interpretation for each character that appears in the source, but
+> it cannot say anything about which characters can appear in any
+> particular source. How could it? So why do we care which characters
+> can appear? It's actively harmful to include hacks like "brackets
+> notation" in source to meet such a non-requirement in straight 8-bit formats -- to take one such example.
+
+This would say that you regard almost any string literal as non-portable. I find
+that ludicrous.
+
+> I'm completely flabbergasted that anyone would think that there is any
+> requirement or value to a requirement otherwise. Moreover, in the
+> absence of a customer requirement, why should any Ada implementer
+> spend time on this (in any way)?
+
+Because the standard specifies the abstract character set that must be accepted.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Monday, February 13, 2012 5:42 PM
+
+> Example: If you have an Ada source in some 6-bit character format (say
+> the old keypunch), does it have to have some mechanism to represent
+> other characters than those naturally present in that format? I say
+> no; it would be harmful, as the meaning would be inconsistent with
+> what "normal" tools for that format would expect.
+
+Yes, of COURSE it does!
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Monday, February 13, 2012 7:03 PM
+
+> > Why? There is nothing in the Standard that requires that. It
+> > requires an interpretation for each character that appears in the
+> > source, but it cannot say anything about which characters can appear
+> > in any particular source. How could it? So why do we care which
+> > characters can appear? It's actively harmful to include hacks like
+> > "brackets notation" in source to meet such a non-requirement in
+> > straight 8-bit formats -- to take one such example.
+>
+> This would say that you regard almost any string literal as
+> non-portable. I find that ludicrous.
+
+Yes, of course. More generally, Ada (83-95-2005) has nothing to say about source
+formats, so by definition there is no portability of Ada source. And there
+surely is no requirement *in the Standard* that you can convert from one source
+format to another. Indeed, I've always considered this a major hole in Ada's
+definition; I'd rather have the standard clearly define this one way or another.
+
+As a practical matter, of course, all Ada compilers support processing the
+ACATS, so there is in fact a common interchange format. But with a handful of
+exceptions, that only requires 7-bit ASCII support, so if you are using anything
+else, it's at least potentially non-portable. And if you use the conversion
+tools provided by the target system, you're probably going to lose information.
+
+> > I'm completely flabbergasted that anyone would think that there is
+> > any requirement or value to a requirement otherwise. Moreover, in
+> > the absence of a customer requirement, why should any Ada
+> > implementer spend time on this (in any way)?
+>
+> Because the standard specifies the abstract character set that must be
+> accepted.
+
+Not at all, it defines the handling of each character that *might* appear in Ada
+source. It never says anything *requiring* that you can actually write those
+characters (and I'm not sure that it can). Please find me *any* text that says
+the compiler *must* accept source containing the PI character (to take one
+example).
+
+
+Anyway, we can clearly defuse this question by simply putting in the Standard
+that processing UTF-8 is required. And even without *requiring* that, simply
+recommending it will definitely improve the situation (any implementation
+following the recommendation will have a clear, common format for Ada source
+code). I'd actually be in favor of requiring it, even though that would make
+Janus/Ada non-compliant in this area. The only reason for not doing that IMHO
+is to avoid making work for implementers for which they have no customer demand.
+(And if everyone agrees with you, then there cannot be much actual work involved
+for other implementations.)
****************************************************************
Questions? Ask the ACAA Technical Agent