Version 1.2 of ai05s/ai05-0226-1.txt


!standard 2.4.2(5)          10-06-29 AI05-0226-1/01
!standard 2.4.2(6)
!standard 2.4.2(8)
!class amendment 10-10-21
!status No Action (7-1-1) 10-10-29
!status work item 10-10-21
!status received 10-06-14
!priority Low
!difficulty Easy
!subject Extended digits extended
!summary
Extended digits are extended to embrace all 26 letters of the alphabet. This permits based literals to be given in any base up to 36.
!problem
Users have observed that it would be convenient to be able to give literals in a base such as 24 or 32. The conventional restriction that the base should not exceed 16 seems artificial.
!proposal
Allow the extended digits to use all 26 letters of the alphabet, thereby allowing bases up to and including 36.
!wording
Replace 2.4.2(5) by
extended_digit ::= digit | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
Modify 2.4.2(6) as follows
The base (the numeric value of the decimal numeral preceding the first #) shall be at least two and at most [sixteen]{thirty-six}. The extended digits A through [F]{Z} represent the digits ten through [fifteen]{thirty-five}, respectively. The value of each extended_digit of a based_literal shall be less than the base.
Modify 2.4.2(8) as follows
The extended_digits A through [F]{Z} can be written either in lower case or in upper case, with the same meaning.
!discussion
The extension to allow bases up to 36 is straightforward. However, the limit of 36 is itself artificial and it is interesting to explore how arbitrarily large bases could be accommodated.
The Babylonians used base 60 and we have inherited that for times and angles with 60 minutes in an hour (or degree) and 60 seconds in a minute. The Babylonian representation used a two tier system whereby each base 60 digit was written in their ordinary notation for numbers up to 59. This was like Roman notation with one mark for 1 and a different mark for 10. These marks were a bit like V and <. Thus 23 is written as <<VVV and 59 as <<<<<VVVVVVVVV.
The base 60 digits were separated by spaces, with a wide space representing zero.
We can and often do represent base 60 literals in a similar way. Thus 2 degrees 30 minutes and 10 seconds can be written as 2°30'10" and 2 hours 30 minutes and 10 seconds as simply 2:30:10.
So in Ada we could represent a base 60 numeric literal by
60#2:30:10# or maybe 60#2'30'10#
Real literals would be written as expected. Thus 37#2'55.10'17# is
2*37 + 55 + 10/37 + 17/(37*37)
The individual superdigits are written as base 10 numerals. Underlines could be permitted in the numerals as usual but there seems no need for any special punctuation to group or space superdigits.
The syntax might be
superbased_literal ::= base#superbased_numeral[.superbased_numeral]#[exponent]
superbased_numeral ::= numeral['numeral]
This notation is very easy to read and could be applied to bases such as 32. Indeed 32#31'19'0'0# is perhaps easier to appreciate than 32#VJ00# and probably less error prone when writing the literals.
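As a rough illustration of the values such literals would denote, the following sketch evaluates a list of decimal superdigits with an ordinary function, in the spirit of the Based_Literal call suggested in the appendix. The names Superdigit_Demo, Digit_List and Based_Value are hypothetical and are not part of the language or its predefined library. Only the integer part is handled; the fractional part and exponent are left out.

```ada
with Ada.Text_IO;

procedure Superdigit_Demo is

   --  Hypothetical helper type: superdigits, most significant first.
   type Digit_List is array (Positive range <>) of Natural;

   --  Evaluate the superdigits in Values in the given Base.
   --  Raises Constraint_Error if the base is less than two or if
   --  any superdigit is not less than the base.
   function Based_Value
     (Base : Positive; Values : Digit_List) return Long_Integer
   is
      Result : Long_Integer := 0;
   begin
      if Base < 2 then
         raise Constraint_Error with "base must be at least two";
      end if;
      for I in Values'Range loop
         if Values (I) >= Base then
            raise Constraint_Error with "superdigit not less than base";
         end if;
         Result := Result * Long_Integer (Base) + Long_Integer (Values (I));
      end loop;
      return Result;
   end Based_Value;

   --  What 60#2'30'10# would denote: 2*3600 + 30*60 + 10 = 9010.
   Seconds : constant Long_Integer := Based_Value (60, (2, 30, 10));

   --  What 32#31'19'0'0# would denote: 31*32768 + 19*1024 = 1035264.
   Word : constant Long_Integer := Based_Value (32, (31, 19, 0, 0));

begin
   Ada.Text_IO.Put_Line (Long_Integer'Image (Seconds));
   Ada.Text_IO.Put_Line (Long_Integer'Image (Word));
end Superdigit_Demo;
```

Both constants are computed once, when the declarative part is elaborated, so the cost compared with a true literal is a pair of function calls at elaboration time.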
!example
(See discussion.)
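The proposed literals cannot be compiled under the current rules, but the digit values they would carry can be illustrated with a small user-written function. The sketch below assumes the mapping in the proposed 2.4.2(6) and 2.4.2(8): A through Z, in either case, stand for ten through thirty-five. The names Extended_Digit_Demo, Digit_Value and Based_Value are hypothetical, and underlines within the digit string are not handled.

```ada
with Ada.Characters.Handling;
with Ada.Text_IO;

procedure Extended_Digit_Demo is

   --  Map one extended digit to its value: '0' .. '9' give 0 .. 9 and
   --  'A' .. 'Z' (either case) give 10 .. 35.
   function Digit_Value (C : Character) return Natural is
      U : constant Character := Ada.Characters.Handling.To_Upper (C);
   begin
      case U is
         when '0' .. '9' =>
            return Character'Pos (U) - Character'Pos ('0');
         when 'A' .. 'Z' =>
            return Character'Pos (U) - Character'Pos ('A') + 10;
         when others =>
            raise Constraint_Error with "not an extended digit";
      end case;
   end Digit_Value;

   --  Evaluate a string of extended digits in a base from 2 to 36.
   --  Raises Constraint_Error if a digit is not less than the base.
   function Based_Value
     (Base : Positive; Image : String) return Long_Integer
   is
      Result : Long_Integer := 0;
      D      : Natural;
   begin
      if Base not in 2 .. 36 then
         raise Constraint_Error with "base must be in 2 .. 36";
      end if;
      for I in Image'Range loop
         D := Digit_Value (Image (I));
         if D >= Base then
            raise Constraint_Error with "digit not less than base";
         end if;
         Result := Result * Long_Integer (Base) + Long_Integer (D);
      end loop;
      return Result;
   end Based_Value;

begin
   --  What 32#VJ00# would denote: 31*32768 + 19*1024 = 1035264.
   Ada.Text_IO.Put_Line (Long_Integer'Image (Based_Value (32, "VJ00")));
   --  What 36#zz# would denote: 35*36 + 35 = 1295.
   Ada.Text_IO.Put_Line (Long_Integer'Image (Based_Value (36, "zz")));
end Extended_Digit_Demo;
```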
--!corrigendum 02.04.02(5)
!ACATS test
!appendix

From: Peter Hermann
Sent: Monday, June 14, 2010  7:06 AM

do you think that
http://www.ada-auth.org/cgi-bin/cvsweb.cgi/acs/ac-00070.txt?rev=1.1
does have a chance to find consideration by WG9?
Apologies for asking for a pure mundane practical issue.

*************************************************************

From: John Barnes
Sent: Tuesday, June 29, 2010  4:18 AM

At the last meeting I was volunteered to write an AI permitting based literals
up to base 36 by extending the notion of extended digits to use the whole
alphabet. This had been requested by some customers.

However, it occurred to me that we could extend the base without limit by using
the technique of the ancient Babylonians. Accordingly I have written that as an
alternative in the discussion.

[This is version /01 of the AI - Editor.]

*************************************************************

From: Robert Dewar
Sent: Tuesday, June 29, 2010  6:04 AM

My reaction is that this is too specialized to be worth considering.
Bases greater than 16 are too rare in practice to accommodate at this level in
the language.

I have never seen any customer request for such a feature. I think any decision
to go ahead would need to be conditioned on a convincing case of need.

*************************************************************

From: Tucker Taft
Sent: Tuesday, June 29, 2010  7:54 AM

This request arose from a customer, not in the ivory towers of language
lawyerism. You might want to read the original ada-comment, which was
re-endorsed recently by the German delegation.  See the recent e-mail from
Erhard.

*************************************************************

From: Robert Dewar
Sent: Tuesday, June 29, 2010  8:35 AM

OK, well if it is something the German delegation wants, it is trivial enough to
add, a few minutes' work in the compiler, nothing more, so I don't seriously
object (though I still find it pretty dubious, sort of in the category of
enumerations with holes, or non-binary modular types).

I assume the original ada-comment will be in the AI (I don't have access to
ada-comment).

*************************************************************

From: Tucker Taft
Sent: Tuesday, June 29, 2010  8:58 AM

Here is the note from Peter Hermann:

   do you think that
   http://www.ada-auth.org/cgi-bin/cvsweb.cgi/acs/ac-00070.txt?rev=1.1
   does have a chance to find consideration by WG9?
   Apologies for asking for a pure mundane practical issue.

   Peter Hermann

You should have access to the above URL.

I'm not convinced we need to go for support beyond base 36, though I admit the
Babylonians were clever fellas.  Supporting up to base 36 is trivial. Going
beyond that is some amount of work, which involves Text_IO, 'Wide_Wide_Value,
etc. as well.

*************************************************************

From: Robert Dewar
Sent: Tuesday, June 29, 2010  9:23 AM

I am completely convinced we should NOT go beyond base 36, we really don't want
more unused features that require a lot of work to implement (think leap seconds
:-))

*************************************************************

From: John Barnes
Sent: Tuesday, June 29, 2010  11:02 AM

I didn't expect anyone to want to go beyond base 36 with its  horrors of
Wide_Wide_Gosh_Golly. But I thought it worth putting in the discussion simply as
a record of how it might be done.

But I do share concern regarding the use of I and O possibly confusing with 1
and 0.

*************************************************************

From: Robert Dewar
Sent: Tuesday, June 29, 2010  11:19 AM

I actually find the use of extended letters beyond F pretty horrible. I
accept A-F as 10-15 because I happen to know them well, but who knows that M
is 22?

If you see a number something like

     35#AQLXM23#

I am at a loss to understand what the heck this means. How about a separate
syntax that uses decimal digit groups separated by underscores or somesuch, so
we would write the above as

     35#10_26_21_33_22_2_3#

(if I got any of these digits wrong, it just shows how hard it is to read the
alphabetic nonsense)

For sure the latter form is a little bit more work to implement, but really not
so much, and I really object to the extended letters, never mind the I/1 and O/0
issues.

I am not wed to the particular syntax I suggested, just the idea that we give
decimal digit values in some reasonable syntax. I am sure that the original data
is more likely to be in this form than AQLXM23 form.

If we do want to get fancy, we could resurrect the 9X mapping document
suggestion for user defined literals :-) :-)

Note that apart from efficiency, a notation like

    Based_Literal (35,(10,26,21,33,22,2,3))

using an aggregate, can be used fine today. I really wonder whether extra syntax
is worthwhile: since all such constants can be evaluated at elaboration time,
how big a deal is a few calls to Based_Literal at elaboration time?

*************************************************************

From: John Barnes
Sent: Tuesday, June 29, 2010  11:39 AM

Well that is more or less what I suggested in the !discussion as an alternative
which would indeed have no limit and could be read easily. Actually I proposed

The syntax might be

superbased_literal ::=
base#superbased_numeral[.superbased_numeral]#[exponent]

superbased_numeral ::= numeral['numeral]

This notation is very easy to read and could be applied to bases such as
32. Indeed 32#31'19'0'0#  is perhaps easier to appreciate than 32#VJ00# and
probably less error prone when writing the literals.

which is more or less what you wrote, Robert, except for changing your
underscores into primes.

*************************************************************

From: Stephen Michell
Sent: Tuesday, June 29, 2010  12:47 PM

Underscores cannot be used as a separator because you can already write
12#1_2_3# = 12#12_3#

I personally find the "'" difficult, but find 32#26:0:14:31# quite readable.

*************************************************************

From: Tucker Taft
Sent: Tuesday, June 29, 2010  1:02 PM

The ":" is problematic because it is
allowed to take the place of "#" in based literals (courtesy of Col. Whitaker
and his 026 keypunch, I believe -- see J.2(3)).

Can you elaborate on what you find "difficult" about "'"?

*************************************************************

From: Tucker Taft
Sent: Tuesday, June 29, 2010  1:50 PM

Robert Dewar wrote:
> I don't think the # was on the O29 keypunch even.

Here is a JPEG of the 029 keypunch.  It has a "#":

   http://www.columbia.edu/acis/history/029-keys.jpg

The 026 actually had several variants, some of which were produced to better support
programming in FORTRAN, etc.  Here is one of the 026 character sets:

   PROG +-0123456789ABCDEFGHIJKLMNOPQR/STUVWXYZb=':>V?.)[<!$*];^,(v\

Notice there is a ':' but no '#'.  There were several other 026 character sets more
oriented toward business than programming, and they had a '#' but no '='.

Ahhh, a bit of history...

*************************************************************

From: Robert Dewar
Sent: Tuesday, June 29, 2010  1:08 PM

> Underscores cannot be used as a separator because you can already
> write 12#1_2_3# = 12#12_3#

When I proposed underscores I of course had the implicit (but admittedly
irregular) rule that this only applies for bases > 16
>
> I personally find the "'" difficult, but find 32#26:0:14:31# quite
> readable.

That's nasty to me, given that : is a legitimate replacement for # so you don't
know if the colon is terminating the constant without looking ahead which seems
ugly to me.

I think the quotes are OK, but really any separator would be fine, we could even
use the minus sign :-)

*************************************************************

From: Stephen Michell
Sent: Tuesday, June 29, 2010  1:21 PM

I had forgotten that : was a replacement for # - my goof.
You cannot use underscores because they are not a separator, and as I was
showing, even for numbers greater than 16, you may want to use them within a
single number just as you do today.

My problem with ' is just personal - my brain tends to skip over them. Of all of
the proposals that I have seen, (except : ) The use of ' seems to be best
because it already has a special place in Ada syntax today. Symbols such as -
are already used in arithmetic.

*************************************************************

From: Robert Dewar
Sent: Tuesday, June 29, 2010  2:04 PM

Of course we could take the position that the use of : is

a) unambiguous technically in the colon case

b) easy enough to scan, just look at the next character

c) no one uses the colon anyway, so what does it matter if it looks ugly

*************************************************************

From: Erhard Ploedereder
Sent: Wednesday, June 30, 2010  6:58 AM

> I have never seen any customer request for such a feature. I think any
> decision to go ahead would need to be conditioned on a convincing case
> of need.

The arguments why this capability was requested are contained in:
http://www.ada-auth.org/cgi-bin/cvsweb.cgi/acs/ac-00070.txt?rev=1.1

The original comment came from the head of the German WG9 delegation, who
complained to me recently that the comment had not been turned into an AI.

*************************************************************

From: Robert Dewar
Sent: Wednesday, June 30, 2010  7:06 AM

As per previous messages I find the use of weird letters totally unacceptable,
and I must say that I really don't see what's wrong with a function

     Based_Number (Base, (digit, digit, digit ....))

which would be called only during elaboration.

But if we must have syntactic mucking, it really has to use decimal values for
digits. Of course the Based_Number form above is more flexible, since it allows
using e.g. binary or hex values, or named constants.

I wonder if the original request here is really a requirement or just one of
those "neat idea" moments.

*************************************************************

From: Bob Duff
Sent: Monday, July  5, 2010  2:50 PM

Summary: I am opposed to adding bases beyond 16.

> The arguments why this capability was requested are contained in:
> http://www.ada-auth.org/cgi-bin/cvsweb.cgi/acs/ac-00070.txt?rev=1.1

Not any compelling arguments, as far as I can see.

> The original comment came from the head of the German WG9 delegation,

I assume that's Peter Hermann.

> who complained to me recently that the comment had not been turned
> into an AI.

Then why don't we turn it into an AI?  I think that AI should be rejected by the
ARG.  It sounds to me like Peter Hermann is annoyed that he is being ignored.
If we explicitly vote against such an AI, with proper reasoning, will he be
happy?

In the AC, he says:

> the coding of numbers into shorter strings by means of higher number
> bases is sometimes employed in practice (e.g. short filenames, etc.).

Sure, I've seen base 36 used to construct file names, in an attempt to pack a
lot of information into few characters, while avoiding oddball characters not
supported by all file systems.  Such names are totally unreadable, but that's OK
-- they're just unique Id's of some sort.  It's not clear what the "etc" above
is, but I can't see any use for integer literals with large bases in Ada source
code.

Common Lisp supports bases up to 36.

But this is hardly an important capability.  Anybody can implement their own
base 36 facility.

Note that the various suggestions in this thread using decimal syntax for the
individual digits of a large-base number are unresponsive to this requirement.
So I'd reject those out of hand (and anyway, as Robert pointed out, you can
program those yourself, too).

Note that the AI as written up by John doesn't really address the requirement,
either -- it talks about literals, but not the conversions to String in Text_IO.

> I see no compelling reason to prevent number systems beyond hexadecimal.

Not a strong argument, and somewhat refuted by the l vs. 1 and O vs 0 argument.

> the restriction to noncasesensitive latin abc may be a practical compromise.

I've no idea what that means.

In the same AC, Martin Dowie writes:

> This would be _very_ handy for things like digital maps systems and
> map preparation facilities.

Well, I can believe "handy", but "_very_ handy" seems bogus to me.
I can't imagine this Ada feature saving anybody a lot of work.

I really think "handy" is insufficient technical justification for this feature.
Of course, if the German WG9 delegation wants to pound their shoe on the table
about it, then maybe that's sufficient _political_ justification.  If so, then
please let's keep it simple (i.e. A-Z represents digits 10-35 (case
insensitive)).

P.S. My first reaction to John's message was "This must have been sent on April
1".  ;-)

*************************************************************

From: Robert Dewar
Sent: Tuesday, July 6, 2010  6:39 AM

I agree with all Bob's arguments here, and his summary position

*************************************************************

