Version 1.1 of ai12s/ai12-0337-1.txt


!standard A.16(74/2)          19-06-06 AI12-0337-1/01
!class binding interpretation 19-06-06
!status work item 19-06-06
!status received 19-06-03
!priority Low
!difficulty Easy
!qualifier Clarification
!subject Simple_Name("/") in Ada.Directories
!summary
Ada.Directories.Simple_Name of a root returns the root itself.
!question
What should the result of Ada.Directories.Simple_Name("/") be? ("/".)
!recommendation
(See Summary.)
!wording
Add after A.16(47/2):
A root directory is a directory whose name cannot be decomposed.
Implementation Note: For Unix and Unix-like systems, "/" is the root. For Windows, "C:\" and "\\Computer\Share" are roots.
[Editor's note: This was AARM A.16.1(20.a/3), we're moving it here with the definition.]
Modify A.16(74/2): [Simple_Name]
Returns the simple name portion of the file name specified by Name. {The simple name of a root directory is the root itself.} The exception Name_Error is propagated if the string given as Name does not allow the identification of an external file (including directories and special files).
Add after A.16(74/2):
AARM Discussion: The result of Simple_Name corresponds to the result of
the "basename" command on Linux and Unix. The "basename" command ignores a trailing '/' and then returns the part of the name following the last remaining '/'. It returns a root intact. The null string is never returned. Similar rules should be used for Windows filenames.
Add after AARM A.16(76.a/2):
AARM Ramification: Containing_Directory raises Use_Error when passed a
string representing a root directory. A root has no containing directory by definition.
Modify A.16(81/3):
Returns the name of the external file with the specified Containing_Directory, Name, and Extension. If Extension is the null string, then Name is interpreted as a simple name; otherwise, Name is interpreted as a base name. The exception Name_Error is propagated if{:}[ the string given as Containing_Directory is not null and does not allow the identification of a directory, or if the string given as Extension is not null and is not a possible extension, or if the string given as Name is not a possible simple name (if Extension is null) or base name (if Extension is nonnull).]
{* the string given as Containing_Directory is not null and does not
allow the identification of a directory;
* the string given as Extension is not null and is not a possible extension;
* the string given as Name is not a possible simple name (if Extension is null) or base name (if Extension is nonnull); or
* the string given as Name is a root directory and either of Containing_Directory or Extension is nonnull.}
[Editor's note: Rather than the second part of this rule, we could have
Base_Name passed a root raising Name_Error. I didn't do that as we'd still need the containing directory rule, and it doesn't seem to help enough.]
Add after A.16.1(17/3):
AARM Ramification: Root directories are considered simple names, so this function will return True if Name represents a root. Use Is_Root_Directory_Name if it is necessary to distinguish between roots and other simple names.
Delete AARM A.16.1(20.a/3). [It was moved above.]
Modify AARM A.16.1(28.a/3):
Ramification: Relative names include simple names {other than root directories} as a special case. This function returns False if the syntax of the name is incorrect.
Modify the last line of AARM A.16.1(35.c/3):
Rel(1) is equivalent to a simple name {that is not a root}; thus we don't have to describe that separately.
Modify the last line of AARM A.16.1(35.e/3):
Else if N = 1, raise {Use_Error}[Name_Error].
!discussion
After altogether too much discussion, it was pointed out that the Unix command "basename" has essentially this function. It makes sense to follow this model, as it means that the function Simple_Name already returns an answer for any sane file name (Unix calls these "pathnames").
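[Editor's note: For illustration only, the "basename"-like behavior can be modeled for Unix-style names as a pure string function. The names Simple_Name_Sketch and Unix_Simple_Name are hypothetical and are not part of Ada.Directories. The sketch assumes '/' is the only separator and performs no validity check other than rejecting the null string.]

```ada
with Ada.Assertions;
with Ada.IO_Exceptions;

procedure Simple_Name_Sketch is

   --  Hypothetical model of the "basename"-like rule, Unix-style names only.
   --  This is not the Ada.Directories implementation.
   function Unix_Simple_Name (Name : String) return String is
      Last : Natural := Name'Last;
   begin
      --  The null string never identifies an external file.
      if Name'Length = 0 then
         raise Ada.IO_Exceptions.Name_Error;
      end if;

      --  Ignore trailing '/' characters, keeping at least one character.
      while Last > Name'First and then Name (Last) = '/' loop
         Last := Last - 1;
      end loop;

      --  Only slashes were present: the name is a root, returned intact.
      if Name (Last) = '/' then
         return "/";
      end if;

      --  Otherwise return the part following the last remaining '/'.
      for I in reverse Name'First .. Last loop
         if Name (I) = '/' then
            return Name (I + 1 .. Last);
         end if;
      end loop;

      --  No '/' at all: the (trimmed) name is already simple.
      return Name (Name'First .. Last);
   end Unix_Simple_Name;

   use Ada.Assertions;
begin
   Assert (Unix_Simple_Name ("/") = "/");
   Assert (Unix_Simple_Name ("/usr/lib/") = "lib");
   Assert (Unix_Simple_Name ("/usr/randy/ada-directories.ads")
           = "ada-directories.ads");
   Assert (Unix_Simple_Name ("abc") = "abc");

   --  Applying the model twice gives the same answer as applying it once.
   Assert (Unix_Simple_Name (Unix_Simple_Name ("/usr/lib/")) = "lib");
   Assert (Unix_Simple_Name (Unix_Simple_Name ("/")) = "/");
end Simple_Name_Sketch;
```

[The checks use Ada.Assertions.Assert rather than pragma Assert so that they are evaluated regardless of the assertion policy in effect.]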
In our wording, we leave the description of "simple name" (the term) unchanged. The wording is vague enough to include roots. We clarify that Simple_Name returns a root unchanged. This keeps Simple_Name idempotent and keeps it from raising exceptions unnecessarily.
To allow this, we have to explicitly state that Compose of a Root will raise Name_Error unless the Containing_Directory is null.
We update a number of AARM notes to clarify the intent as well, in particular that Containing_Directory(<Some_Root>) raises Use_Error. This is necessary in order that Compose (Containing_Directory (A), Simple_Name (A)) always returns a name functionally the same as A in the absence of exceptions.
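[Editor's note: A minimal sketch of how a caller might rely on this round trip while still accepting roots. Rebuild and Round_Trip_Sketch are hypothetical names. The handler assumes the behavior described above, namely that Containing_Directory raises Use_Error for a root and that Simple_Name returns the root itself; existing implementations may differ, so no output is shown.]

```ada
with Ada.Directories;
with Ada.Text_IO;

procedure Round_Trip_Sketch is
   use Ada.Directories;

   --  Hypothetical helper, not part of Ada.Directories: decompose A and
   --  compose it again.
   function Rebuild (A : String) return String is
   begin
      return Compose (Containing_Directory (A), Simple_Name (A));
   exception
      when Use_Error =>
         --  Assumption: A is a root, so it has no containing directory
         --  and (under the proposed wording) is its own simple name.
         return Simple_Name (A);
   end Rebuild;
begin
   --  The printed names depend on the target and the implementation.
   Ada.Text_IO.Put_Line (Rebuild ("/usr/lib/libc.so"));
   Ada.Text_IO.Put_Line (Rebuild ("/"));
end Round_Trip_Sketch;
```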
!ASIS
No ASIS effect.
!ACATS test
!appendix

From: Tucker Taft
Sent: Monday, June  3, 2019  10:05 AM

A question came up recently at AdaCore as to what should be the result of 
Simple_Name("/"). The description of this function (A.16(73/2-74/3)) says:

  function Simple_Name (Name : in String) return String;

  Returns the simple name portion of the file name specified by Name. The 
  exception Name_Error is propagated if the string given as Name does not 
  allow the identification of an external file (including directories and 
  special files).

---

Later in A.16(127/2) it is made clear that Simple_Name does not require the 
passed-in string Name to identify an existing file.  It merely requires the 
Name string to have the potential for some day identifying a file -- i.e. 
needs to be a legal "pathname" in Unix parlance.

In any case, should Simple_Name("/") return "" or "."?  Because it says it 
returns the simple name "portion" of Name, that would be the empty string. 
On the other hand, the empty string is not a legal simple name in Unix, and
"." is understood implicitly when you open "/" in Unix.  GNAT currently 
returns the empty string.  A bug was reported that a new version was returning
"/" which is clearly wrong.  But is the empty string the right answer, or should 
the result be a "legal" simple name, which means "." on Unix?

*****************************************************************

From: Richard Wai
Sent: Monday, June  3, 2019  10:08 AM

> A question came up recently at AdaCore as to what should be the result 
> of Simple_Name("/"). The description of this function (A.16(73/2-74/3)) says:
> 
>   function Simple_Name (Name : in String) return String;
> 
>   Returns the simple name portion of the file name specified by Name. 
> The exception Name_Error is propagated if the string given as Name 
> does not allow the identification of an external file (including 
> directories and special files).

Shouldn't it raise Name_Error in this case? The string "/" does not allow the 
identification of an external file.

*****************************************************************

From: Tucker Taft
Sent: Monday, June  3, 2019  10:20 AM

Note that "external file" includes directories and special files (which is 
made explicit in parentheses).  So "/" clearly allows the identification of a
directory.  A directory is considered an "external file" as far as this part 
of the Ada standard goes, as stated pretty clearly in A.16(45/2):

  "External files may be classified as directories, special files, or ordinary 
  files. A directory is an external file that is a container for files on the 
  target system. A special file is an external file that cannot be created or 
  read by a predefined Ada input-output package. External files that are not 
  special files or directories are called ordinary files."

So "external file name" in this part of the standard is what Unix calls a 
"pathname."

*****************************************************************

From: Richard Wai
Sent: Monday, June  3, 2019  10:21 AM

I want to clarify a bit here..

I think it is a mischaracterization to consider '/' to be the name of the 
root directory in UNIX. '/' is a delimiter, and can never be a name. The 
"root" of the filesystem has no name (it is implicit). So since '/' itself is 
not a name, that's why I think Name_Error makes sense. And the root directory 
has an implicit name, we can't say it is a blank name such as a null string, 
because we don't "know" that (chroot might have a word for you here). Rather 
the root directory has some kind of unknown name that we wouldn't be able to 
return.

This also seems like the most portable approach, IMO.

*****************************************************************

From: Tucker Taft
Sent: Monday, June  3, 2019  10:38 AM

I suppose you can say that, but that is not actually the way Unix works.  Here
is the "official" definition of what is a legal "pathname" in Unix, according 
to the 12th paragraph of "Pathname Resolution" in:

  http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13

"A pathname consisting of a single <slash> shall resolve to the root directory 
of the process. A null pathname shall not be successfully resolved. If a 
pathname begins with two successive <slash> characters, the first component 
following the leading <slash> characters may be interpreted in an 
implementation-defined manner, although more than two leading <slash> 
characters shall be treated as a single <slash> character."

---

Certainly if you use one of the Unix system calls that takes a "pathname" to 
do something with a directory (such as set it as the default directory, or 
open it for reading), "/" is perfectly legal.

> This also seems like the most portable approach, IMO.

Not sure what portability has to do with it.  Filenames/pathnames are 
operating-system specific, and this package is about interpreting 
operating-system-specific pathnames in a reasonable way.  "/" is a legal Unix 
pathname (aka "external file name" in this section's terminology), so it is 
reasonable to pass it to Simple_Name, and get some sort of reasonable result.

*****************************************************************

From: Richard Wai
Sent: Monday, June  3, 2019  11:00 AM

> Certainly if you use one of the Unix system calls that takes a "pathname"
> to do something with a directory (such as set it as the default directory,
> or open it for reading), "/" is perfectly legal.

I definitely see your point here, but I'd say that this also agrees with what 
I was saying. The documentation says that '/' should "resolve to" the root 
directory of the process. This to me means the root directory is implicit, 
especially because of the reference to the "root directory of the process".
UNIX allows different processes to have different roots (chroot).
Which means that the meaning of '/' is dependent on the process, not the 
platform, and it also means there could be an actual "Simple_Name" behind it 
that the process cannot see. So to me the documentation you linked aligns 
with my argument.

I'm probably just arguing semantics. If I'm truly honest with myself, I'd 
expect that Simple_Name ("/") would return "/". 

Maybe a "to be honest" does the job?

*****************************************************************

From: Bob Duff
Sent: Monday, June  3, 2019  11:10 AM

> In any case, should Simple_Name("/") return "" or "."?

I don't know, but I think the way to think about this question
is to think about what invariants should be preserved.
E.g. if you pick apart the pieces of a name, and compose
them back together, do you get what you started with?
Or at least something that denotes the same external
file as what you started with?

*****************************************************************

From: Tucker Taft
Sent: Monday, June  3, 2019  1:02 PM

> I'm probably just arguing semantics. If I'm truly honest with myself, 
> I'd expect that Simple_Name ("/") would return "/".

It seems quite clear from the AARM notes of this section (e.g. 47.a/2) and the 
next (e.g. A.16.1(20.a/3) and (35.c/3)) that a simple name has no slashes in 
it for Unix-like systems.

> Maybe a "to be honest" does the job?

I was asking whether the answer should be "" or ".".  You have come up with a 
third option, which I believe is certainly inconsistent with the intent.  
Perhaps Randy can clarify the intent!  

Bob's question also makes good sense, but Simple_Name seems to be the only 
function (other than Extension and Base_Name) described in this section as 
returning a "portion" of the input.  In particular, according to (76.a/2), 
Containing_Directory should return "." if there is no path at all on the 
name.  Clearly that is producing something which is not part of the 
passed-in Name.  

As a minor complaint, there is nothing that explains what happens when the 
Containing_Directory parameter of Compose (82/3) is the empty string.  
Presumably on Unix-like systems, nothing is prepended to the (base) Name.
If Containing_Directory is explicitly ".", then I would presume a "./" is 
prepended, that is, the result of Compose(".", "ABC") would be "./ABC".

*****************************************************************

From: Squirek
Sent: Monday, June  3, 2019  1:23 PM

To me this portion business sounds like an oversight especially since 
Containing_Directory has a different associated behavior - and if an empty 
string is a valid result that should be added to the documentation for the 
function because it might be non-obvious to an end-user. My vote is for "." 
over "" and for Simple_Name and Containing_Directory to be more uniform.

*****************************************************************

From: Richard Wai
Sent: Monday, June  3, 2019  2:41 PM

> I was asking whether the answer should be "" or ".".  You have come up with
> a third option, which I believe is certainly inconsistent with the intent.
> Perhaps Randy can clarify the intent!

But both "." and "" are definitely wrong.

The ARM says that "The full name of an external file is a full specification 
of the name of the file.", and that a simple name is "The simple name of an 
external file is the name of the item, not including any containing directory 
names".

So in this case, the full name is "/", which identifies specifically the root 
directory of the filesystem. So it is a name for that external file.
Obviously the root directory is not contained by any other directory, meaning 
the simple name for the root directory "/" should be "the name of the item"
("/"), "not including any containing directory name" (there are none), and 
therefore the simple name should be "/". 

However, "" is not a valid name at all in any case. And "." is really bad 
because it is a name for what could be a totally different file. "." refers 
to the containing directory, so "/" will not refer to the same thing as "."
unless the program is executing with root as the working directory, which is 
rarely the case.

*****************************************************************

From: Tucker Taft
Sent: Monday, June  3, 2019  3:28 PM

> But both "." and "" are definitely wrong.

I don't follow your logic.  You seem to have decided that the root directory 
on Unix-like systems doesn't have a name, but that is a very unusual way of 
looking at it.  I think 99% of the Unix world would say that the name of the
root directory is "/".  The manual is also pretty clear about that, e.g., in
Is_Root_Directory_Name in A.16.1 (19-20.a/3).  

It turns out you can add a "." at the end of any Unix pathname that ends with 
"/" and still be talking about the same thing.  So we can expand "/" to "/.". 
Now we have a clear way of distinguishing Containing_Directory (i.e. "/") from
Simple_Name (i.e. ".").

I will agree that this RM section is somewhat confusing, because when one 
typically sees the word "file" you think "regular file" or "ordinary file" and 
don't include directories as a possibility.  But Unix has the same issue, where
it uses the term "file" more generally, and then you have to distinguish by 
specifying "regular file" or "directory" or "special file" (or "symbolic link",
which is not discussed in this part of the RM).

> The ARM says that "The full name of an external file is a full 
> specification of the name of the file.", and that a simple name is 
> "The simple name of an external file is the name of the item, not 
> including any containing directory names".
> 
> So in this case, the full name is "/", which identifies specifically 
> the root directory of the filesystem. So it is a name for that external file.
> Obviously the root directory is not contained by any other directory, 
> meaning the simple name for the root directory "/" should be "the name 
> of the item" ("/"), "not including any containing directory name" 
> (there are none), and therefore the simple name should be "/".

I think this has become a more general issue.  If a full name ends with "/" 
what should be the simple name?  The description of Is_Relative_Name in 
A.16.1(27/3-28.a/3) I believe resolves part of the issue:

27/3
function Is_Relative_Name (Name : in String) return Boolean;

28/3
Returns True if Name allows the identification of an external file (including 
directories and special files) but is not a full name, and returns False 
otherwise.

28.a/3
Ramification: Relative names include simple names as a special case. This 
function returns False if the syntax of the name is incorrect. 

---

That AARM note says that simple names are a special case of relative names.  
The prior paragraph say that a relative name must *not* be a full name.  A 
full name is defined, for the purposes of this section, by:

25/3
function Is_Full_Name (Name : in String) return Boolean;

26/3
Returns True if the leftmost directory part of Name is a root, and returns 
False otherwise.

--- 

and a "root", for Unix-like systems (per A.16.1(20.a/3)) is the external file 
name for the root directory, i.e. "/".

So given the above, we know that a simple name cannot start with "/" for a 
Unix-like system.  

A.16(47.a/2) goes a bit further:

47.a/2
Discussion: The full name on Unix is a complete path to the root. For 
Windows®, the full name includes a complete path, as well as a disk name 
("C:") or network share name. For both systems, the simple name is the part 
of the name following the last '/' (or '\' for Windows®). For example, in the 
name "/usr/randy/ada-directories.ads", "ada-directories.ads" is the simple 
name. 

---

This pretty clearly argues for Simple_Name("/") to be "". 

One last bit of information is that Relative_Name("/") is supposed to raise 
Name_Error, but presumably Relative_Name("/.") is supposed to return ".", 
according to:

31/3
function Relative_Name (Name : in String) return String;

32/3
Returns the entire file name except the Initial_Directory portion. The 
exception Name_Error is propagated if the string given as Name does not allow 
the identification of an external file (including directories and special 
files), or if Name has a single part (this includes if any of Is_Simple_Name, 
Is_Root_Directory_Name, Is_Parent_Directory_Name, or Is_Current_Directory_Name 
are True).

---

Unfortunately, it is hard to generalize from Relative_Name to Simple_Name.  In 
particular, Simple_Name is idempotent (Simple_Name(Simple_Name(X)) = 
Simple_Name(X)), while each call on Relative_Name strips one more level of 
directory (starting from the left), so not surprisingly, it complains when you 
get down to only a single name.

I think at this point we should make a "Binding Interpretation" for either 
"" or ".".

> However, "" is not a valid name at all in any case. And "." is really 
> bad because it is a name for what could be a totally different file.

Not really, because "." by definition is the "simple name" of the containing 
directory on Unix-like systems.

> "." refers
> to the containing directory,

I presume you meant "current" directory in this context.

> so "/" will not refer to the same thing as "."
> unless the program is executing with root as the working directory, 
> which is rarely the case.

But the definition of "Simple_Name" is essentially what you get after 
stripping off all of the directory context from the full name.  So of course 
if you just use the result of Simple_Name by itself, it won't mean the same 
thing as the full name, unless you happen to be in the directory where 
Simple_Name comes from.

I fear we aren't making progress here...

I suppose we could have a straw vote about what Simple_Name("/") returns, 
between "", ".", and "/".  

Personally, I can live with "" and "." but I believe "/" contradicts the 
manual in several places, as I have tried to illustrate above.

*****************************************************************

From: Tucker Taft
Sent: Monday, June  3, 2019  4:14 PM

If we want to define some desirable invariants, we might agree that:

  not (Is_Full_Name(X) and Is_Simple_Name(X))

  (if Is_Simple_Name(X) then Is_Relative_Name(X))

  (if Is_Full_Name(X) then Is_Root_Directory_Name(Initial_Directory(X)))

  (if Is_Simple_Name(X) then Is_Current_Directory_Name(Initial_Directory(X)))

  (if Is_Simple_Name(X) then Is_Current_Directory_Name(Containing_Directory(X)))

---

More controversial might be:

  (if Is_Root_Directory_Name(X) then Containing_Directory(X) = X)

---

Deciding in favor of "." as result of Simple_Name("/") would produce:

   (if Is_Root_Directory_Name(X) then Is_Current_Directory_Name(Simple_Name(X)))

which would then allow us to say Simple_Name is idempotent, because if it 
returns anything, you can pass that again to Simple_Name and get the same 
thing:

   Simple_Name(Simple_Name(X)) = Simple_Name(X)

---

Deciding in favor of "" as result of Simple_Name("/") would produce:

  (if Is_Root_Directory_Name(X) then Simple_Name(X) = "")

  (if Is_Root_Directory_Name(X) then Simple_Name(Simple_Name(X)) raises Name_Error)

---

The invariants look somewhat better if Simple_Name("/") returns "." I would say!
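
[Editor's note: The first two invariants above can be probed on a given implementation with a short program. Invariant_Probe and Probe are hypothetical names. The sketch assumes that Ada.Directories.Hierarchical_File_Names is available (Ada 2012). It prints the results rather than asserting them, because they depend on the target and on how the implementation classifies roots.]

```ada
with Ada.Directories.Hierarchical_File_Names;
with Ada.Text_IO;

procedure Invariant_Probe is
   use Ada.Directories.Hierarchical_File_Names;

   --  Hypothetical helper: report whether two of the suggested
   --  invariants hold for the name X on this implementation.
   procedure Probe (X : String) is
   begin
      Ada.Text_IO.Put_Line
        (X & ": not (full and simple) = "
         & Boolean'Image
             (not (Is_Full_Name (X) and Is_Simple_Name (X))));
      Ada.Text_IO.Put_Line
        (X & ": simple implies relative = "
         & Boolean'Image
             ((if Is_Simple_Name (X) then Is_Relative_Name (X))));
   end Probe;
begin
   Probe ("/");
   Probe ("usr");
   Probe ("/usr/lib");
end Invariant_Probe;
```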

*****************************************************************

From: Randy Brukardt
Sent: Monday, June  3, 2019   5:53 PM

...
> I fear we aren't making progress here...
> 
> I suppose we could have a straw vote about what
> Simple_Name("/") returns, between "", ".", and "/".  
> 
> Personally, I can live with "" and "." but I believe "/" 
> contradicts the manual in several places, as I have tried to 
> illustrate above.

The addition of Hierarchical_File_Names was supposed to answer all of these 
questions. This is a hard problem. In AI05-0049-1, I posted a lengthy mail 
message explaining the "algebra" behind these rules. (See !appendix, March 29, 
2007 8:32 PM). There's also a notation for explaining the details.

I believe the intent is that we follow this algebra closely, as that is the 
only way for Ada.Directories to be portable to different targets and in 
particular to reason about the results.

Unfortunately, the handling of roots is inconsistent. I tried to make 
something consistent in the Janus/Ada version of Ada.Directories, but looking 
at the implementation, it's terrible. Specifically:
    Simple_Name ("D:\") = "" -- A local root.
    Simple_Name ("\\Gatekeeper\Webroot") = "\\Gatekeeper\Webroot" -- A root on the network.

I tend to agree that returning a root from Simple_Name is unintended.

Simple_Name returning "." would be nonsense. "." represents the current 
directory, and that's not the answer here. It also would never be the name 
you would want to use (see next point).

Simple_Name returning "" can be justified, but I don't think it makes sense in 
practice, since often such names are used to name related things and naming 
something the null string isn't a good idea. :-). Thus, none of the suggested
solutions would work in that instance.

Ergo, I have to think that the best solution is for roots to raise Name_Error 
when passed to Simple_Name -- the entity does not have a simple name. 
Essentially, such a result would mean that a pretest for Root is required. I 
can't say if this is better than the alternative of returning ""
(since Janus/Ada does this, and I didn't even know it, it must not matter much 
in practice) - which also requires some sort of test before using it to name 
a constructed file.

Note that the companion decomposition function Containing_Directory does raise 
Name_Error when passed a root.

OTOH, the root test is in Hierarchical_File_Names, and thus we probably ought 
not depend on it for defining Simple_Name. That argues for saying that the 
simple name of a root is "", and probably the RM should mention that somehow. 
(And in that case, GNAT is already doing the right thing, which seems to be a 
bonus.)

The counterpart in Hierarchical_File_Names is "Initial_Directory", and it 
seems to be defined to return something in pretty much every case. So that
also argues for not raising an exception for Simple_Name.

I note that the definition of Compose (Ada.Directories version) seems to allow 
the Name to be null; it says it has to be a possible Simple_Name but makes no 
limitations on what that can be.

There's only two reasonable answers to me, either ""

*****************************************************************

From: Squirek
Sent: Monday, June  3, 2019  7:05 PM

> There's only two reasonable answers to me, either ""

I assume you meant to say either "" or raise an error ; )?

*****************************************************************

From: Randy Brukardt
Sent: Monday, June  3, 2019  7:45 PM

> I assume you meant to say either "" or raise an error ; )?

Sounds good. :-) I rewrote that part so many times that I hardly know what I 
think anymore. :-)

*****************************************************************

From: Squirek
Sent: Monday, June  3, 2019  10:04 PM

Well, it is possible to hold a straw poll even though this is getting into the
fuzzy area of OS specific stuff, but with my end-user hat on I would say 
raising an error is logical in cases of root directories or paths ending in 
directory separators.

What do you all think is the best course of action?

*****************************************************************

From: Randy Brukardt
Sent: Monday, June  3, 2019  10:50 PM

My eventual conclusion is that Simple_Name should not raise an exception, 
because the inverse function (Initial_Directory) is defined to not raise 
an exception in such cases. Specifically, if S is a simple name, 
Initial_Directory (S) = S. An exception is only raised if the parameter 
doesn't have legitimate path syntax.

It would make the most sense for Simple_Name to work similarly. (I expected 
Initial_Directory to work the same as Simple_Name, so this is a bit
recursive.) However, returning the root path would screw up the other 
invariants, so having it return "" in that case would work better. And I note 
that both GNAT and Janus/Ada do that (at least in some cases on some targets).
It probably would make sense to add some text suggesting/requiring that (if 
that's the decision).

Probably we ought to talk it over for a bit at a meeting and then take a poll. 
I'd suggest discussing just the two options (Name_Error or ""). Both require 
some work by the user if they are allowing arbitrary paths, but that seems 
necessary (and Ada.Hierarchical_File_Names.Is_Root_Directory_Name can make a 
precheck portable).

*****************************************************************

From: Richard Wai
Sent: Monday, June  3, 2019  10:55 PM

> There's only two reasonable answers to me, either ""

I also note that the wording of "Simple_Name" says it "Returns the simple name 
_portion_ of the file name specified by Name" (emphasis mine). This seems to me
to indicate that Simple_Name should not return anything that is not also a part
of Name. So this seems to be another technical strike against "." in this case.

From a user-weighted perspective, I'm not going to use Simple_Name to then 
append to some path string to get a full name, instead I'm likely going to 
use it with Set_Directory, via Containing_Directory (Name), and then open 
a file separately using the result of Simple_Name (Name). Obviously opening a
file with "" would not work anyways. I'd think it would be less surprising if
Simple_Name refused to give me a name I couldn't open (via Name_Error), than 
it would for me to think I had a name since Simple_Name "worked", only to get
a Name_Error on call to Open.  

For that reason, I'd also vote for Name_Error. Otherwise, "" makes the most 
sense to me - because unlike ".", it would never be totally wrong.

But if we go with Name_Error, what happens to the text of Simple_Name, which 
states that Name_Error is only raised when Name does not identify an external
file? In the case of "/", this does (arguably) identify a file. So again as 
an end-user, I might find getting Name_Error for "/" would be surprising, and
not what the standard seems to say should happen.

P.S.
My original position was that "/" doesn't _actually_ identify a file in the 
normal sense, since it is more of a concept than an actual file. I'll admit this 
was a shameless spin to try to justify Name_Error according to the standard 
as is. 

*****************************************************************

From: Jean-Pierre Rosen
Sent: Tuesday, June  4, 2019  12:33 AM

I think there are two possible positions:

1) these functions are purely string manipulation functions, i.e. they return 
parts of the initial string according to some system-dependent conventions. In 
that case, "" is clearly the answer.

2) these functions return useful strings, returning useful names designating 
the same file as described by the definition (but not necessarily a substring
of the original string). In that case, "." is clearly the answer.

I don't like too much the raising of the exception, because it would be quite 
hostile to the user. Typically, every user would have to handle the exception 
and use "." in that case.

Rather than looking for arguments in the RM, I think we should imagine 
typical use cases and agree on the solution that is most user-friendly.

*****************************************************************

From: Randy Brukardt
Sent: Tuesday, June  4, 2019  1:51 AM

> I think there are two possible positions:
> 
> 1) these functions are purely string manipulation functions, i.e. they 
> return parts of the initial string according to some system-dependent 
> conventions. In that case, "" is clearly the answer.

This is the intended design. But...

> 2) these functions return useful strings, returning useful names 
> designating the same file as described by the definition (but not 
> necessarily a substring of the original string). In that case, "." is 
> clearly the answer.

This is *not* the intended design. I don't buy the idea that this is useful,
anyway. You can't create a file with this name; if you pass it to Base_Name
and then append a different extension, you'll end up with an illegal file 
name, and so on.

This also completely destroys the algebra behind the 6 related functions:
Simple_Name, Containing_Directory, Initial_Directory, Relative_Name, and the 
two Composes. "." (or "Current_Directory_Name", as it is called in 
Ada.Directories.Hierarchical_File_Names), isn't supposed to appear out of the 
blue.

> I don't like too much the raising of the exception, because it would 
> be quite hostile to the user. Typically, every user would have to 
> handle the exception and use "." in that case.

Arguably, it doesn't make sense to ask the name of something that doesn't have 
a name.

Let's look at some use cases:

One use case is that the input string is coming from the outside (unverified 
input). In that case, you'd have to handle Name_Error anyway (to guard against 
bad syntax).

Another use case is that the input string is known to refer to a file (not a 
directory). [That's the most common in my usage.] In that case, the exception
would only happen if there is a bug. I don't handle such exceptions (I want 
to know about bugs!).

Another use case is that the input string is known to refer to a directory 
(which could include a root). In that case, a pretest with Is_Root would avoid 
the need to handle the exception.
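
[Editor's note: A minimal sketch of the pretest described above, assuming the
implementation provides Ada.Directories.Hierarchical_File_Names, where the
test is spelled Is_Root_Directory_Name. Display_Name is a hypothetical helper
invented for this illustration, and which strings count as roots is
target-dependent.]

```ada
with Ada.Directories;
with Ada.Directories.Hierarchical_File_Names;
with Ada.Text_IO;

procedure Root_Pretest is
   package HFN renames Ada.Directories.Hierarchical_File_Names;

   -- Hypothetical helper: a printable last component for Name that never
   -- calls Simple_Name on a root.
   function Display_Name (Name : String) return String is
   begin
      if HFN.Is_Root_Directory_Name (Name) then
         return Name;  -- A root cannot be decomposed; show it whole.
      else
         return Ada.Directories.Simple_Name (Name);
      end if;
   end Display_Name;

begin
   Ada.Text_IO.Put_Line (Display_Name ("/"));
   Ada.Text_IO.Put_Line (Display_Name ("/usr/lib"));
exception
   when Ada.Directories.Name_Error =>
      -- Still needed for unverified input with bad syntax.
      Ada.Text_IO.Put_Line ("Name_Error: malformed name");
end Root_Pretest;
```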

So I don't think it matters much what the result is (either "" or an
exception). If you actually want to allow a root, you're going to have to do a test 
somewhere (either before or after the call), but it is extremely rare that 
you even want to allow a root in this context. (These routines are about 
deconstructing and reconstructing file names -- you always have to deal with 
boundary conditions in doing so.)
 
> Rather that looking for arguments in the RM, I think we should imagine 
> typical use cases and agree on the solution that is most 
> user-friendly.

The reason for looking in the RM is that we've considered these issues (at
least) twice before, and it's worthwhile to figure out what we were thinking.

In any case, unconditional deconstruction of something is nonsense; there are 
always going to be names that can't be deconstructed. So there's always going 
to have to be some sort of pretest or posttest. And the problem with "." is 
that you can't get the original result back if you return it (and in some 
cases, you can't even get a legal result, at least not without bending over 
backwards). Returning that turns something with one part into two. (And it's 
also generally considered a security hazard, because it makes matching much 
harder.)

*****************************************************************

From: Erhard Ploedereder
Sent: Wednesday, June  5, 2019  12:57 PM

I agree with Bob, that there ought to be invariants preserved.

In all the discussion, I could not find an answer to the following:

What should be the result of
  Simple_Name("usr/ploedere/myfiles/")

Is that clear to everybody but me?

If one thinks invariants and orthogonal semantics, there shouldn't be a 
difference from the root case.

Maybe the result is/should be "myfiles/"? The consistent result for the root 
case then is "/". Or is it Name_Error? Then the consistent answer for the root 
case is Name_Error. Or is it ""? Then the consistent answer is "" (which, on 
many systems will cause surprising failures for both scenarios, when people 
try to open by simple_name after having cd'ed to the enclosing directory).

"." seems really bad to me. Is this supposed to be portable across arbitrary 
OS and kernels?

*****************************************************************

From: Randy Brukardt
Sent: Wednesday, June  5, 2019  6:44 PM

> In all the discussion, I could not find an answer to the following:
> 
> What should be the result of
>   Simple_Name("usr/ploedere/myfiles/")
> 
> Is that clear to everybody but me?

Umm, no. ;-) Never thought of it.

> If one thinks invariants and orthogonal semantics, there shouldn't be 
> a difference from the root case.

Agreed. This says that this is a wider problem than originally considered.

> Maybe the result is/should be "myfiles/"? The consistent result for 
> the root case then is "/".
> Or is it Name_Error? Then the consistent answer for the root case is 
> Name_Error.
> Or is it ""? Then the consistent answer is "" (which, on many systems 
> will cause surprising failures for both scenarios, when people try to 
> open by simple_name after having cd'ed to the enclosing directory).

Janus/Ada's code raises Name_Error if the string has more than two characters 
and ends with '/' or '\' (this being on Windows). I'm not sure why the two 
character exclusion; perhaps I was trying to allow ".\". The comment says 
"This name is not a root and ends with a path character". But roots on Windows 
would be three characters ("D:\"), so this doesn't make much sense.

If Name_Error is not raised, then "" is returned.

I'm pretty sure this is unplanned, so it doesn't tell us much.
 
> "." seems really bad to me. Is this supposed to be portable across 
> arbitrary OS and kernels?

I would hope that the return in that case would be defined to be 
"Current_Directory_Name", since that's what we called it in 
Hierarchical_File_Names. But I don't see any real justification for that, 
as it just doesn't fit into the model of these routines.

*****************************************************************

From: Randy Brukardt
Sent: Wednesday, June  5, 2019  7:33 PM

...
> > If one thinks invariants and orthogonal semantics, there shouldn't 
> > be a difference from the root case.
> 
> Agreed. This says that this is a wider problem than originally 
> considered.

I wrote the following program to find out what compilers actually do.
Results in following messages. Feel free to try this on your favorite compiler.

================================

with Ada.Directories;
with Ada.Exceptions;
with Ada.Text_IO;
procedure Simple_Names is
   -- Attempt to determine what existing compilers do with various
   -- boundary conditions for Simple_Name and Containing_Directory.

   procedure Test (Test_String, Test_Name : String) is
   begin
      begin
         declare
            SN : constant String := Ada.Directories.Simple_Name (Test_String);
         begin
            Ada.Text_IO.Put_Line ("Simple_Name result =""" & SN &
                """ for subtest [" & Test_Name & ']');
         end;
      exception
         when Ada.Directories.Name_Error =>
            Ada.Text_IO.Put_Line ("Simple_Name raises Name_Error for subtest ["
                & Test_Name & ']');
         when Ada.Directories.Use_Error =>
            Ada.Text_IO.Put_Line ("Simple_Name raises Use_Error for subtest ["
                & Test_Name & ']');
         when Huh1:others =>
            Ada.Text_IO.Put_Line ("Simple_Name raises " &
                Ada.Exceptions.Exception_Name (Huh1) & " for subtest ["
                & Test_Name & ']');
      end;
      begin
         declare
            CD : constant String :=
                Ada.Directories.Containing_Directory (Test_String);
         begin
            Ada.Text_IO.Put_Line ("Containing_Directory result =""" & CD &
                """ for subtest [" & Test_Name & ']');
         end;
      exception
         when Ada.Directories.Name_Error =>
            Ada.Text_IO.Put_Line ("Containing_Directory raises Name_Error for" &
                " subtest [" & Test_Name & ']');
         when Ada.Directories.Use_Error =>
            Ada.Text_IO.Put_Line ("Containing_Directory raises Use_Error for" &
                " subtest [" & Test_Name & ']');
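         when Huh2:others =>
            -- Note: this message says "Simple_Name" even though this handler
            -- covers the Containing_Directory call.
            Ada.Text_IO.Put_Line ("Simple_Name raises " &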
                Ada.Exceptions.Exception_Name (Huh2) & " for subtest ["
                & Test_Name & ']');
      end;
   end Test;

begin
   Ada.Text_IO.Put_Line ("Boundary cases for Ada.Directories");

   Test ("", "null string");
   Test ("/", "/");
   Test ("./", "./");
   Test ("../", "../");
   Test (".", ".");
   Test ("..", "..");
   Test ("bob", "bob"); -- Normal case.
   Test ("./tucker", "./tucker"); -- Normal case.
   Test ("./tucker/steve", "./tucker/steve"); -- Normal case.
   Test ("./tucker/erhard/", "./tucker/erhard/");

   -- Windows-specific:
   Test ("D:", "D:");
   Test ("D:\", "D:\");
   Test ("D:\tucker", "D:\tucker"); -- Normal case.
   Test ("D:\erhard\", "D:\erhard\");
   Test ("\\machine\share\", "\\machine\share\");
   Test ("\\machine\share", "\\machine\share");
   Test ("Con:", "Con:");

end Simple_Names;

*****************************************************************

From: Randy Brukardt
Sent: Wednesday, June  5, 2019  7:39 PM

For Janus/Ada, the results of the previous program are:

Boundary cases for Ada.Directories
Simple_Name raises Name_Error for subtest [null string]
Containing_Directory raises Use_Error for subtest [null string]
Simple_Name result ="" for subtest [/]
Containing_Directory result ="\" for subtest [/]
Simple_Name result ="" for subtest [./]
Containing_Directory result ="." for subtest [./]
Simple_Name raises Name_Error for subtest [../]
Containing_Directory raises Name_Error for subtest [../]
Simple_Name result ="." for subtest [.]
Containing_Directory raises Use_Error for subtest [.]
Simple_Name result =".." for subtest [..]
Containing_Directory raises Use_Error for subtest [..]
Simple_Name result ="bob" for subtest [bob]
Containing_Directory result ="." for subtest [bob]
Simple_Name result ="tucker" for subtest [./tucker]
Containing_Directory result ="." for subtest [./tucker]
Simple_Name result ="steve" for subtest [./tucker/steve]
Containing_Directory result ="./tucker" for subtest [./tucker/steve]
Simple_Name raises Name_Error for subtest [./tucker/erhard/]
Containing_Directory raises Name_Error for subtest [./tucker/erhard/]
Simple_Name result ="D:" for subtest [D:]
Containing_Directory raises Use_Error for subtest [D:]
Simple_Name raises Name_Error for subtest [D:\]
Containing_Directory raises Name_Error for subtest [D:\]
Simple_Name result ="tucker" for subtest [D:\tucker]
Containing_Directory result ="D:" for subtest [D:\tucker]
Simple_Name raises Name_Error for subtest [D:\erhard\]
Containing_Directory raises Name_Error for subtest [D:\erhard\]
Simple_Name raises Name_Error for subtest [\\machine\share\]
Containing_Directory raises Name_Error for subtest [\\machine\share\]
Simple_Name result ="\\machine\share" for subtest [\\machine\share]
Containing_Directory raises Use_Error for subtest [\\machine\share]
Simple_Name raises Name_Error for subtest [Con:]
Containing_Directory raises Name_Error for subtest [Con:]

=======================

Analysis: There are some oddities in here; the results aren't fully
self-consistent. In particular, producing results for "./" and raising
Name_Error for "../" makes no sense. In most cases, a path string ending in
"/" raises Name_Error, but short ones do not. Not sure why. Since the
checking code is shared amongst all of the routines in Ada.Directories, the
answer may lie in some other routine.

*****************************************************************

From: Randy Brukardt
Sent: Wednesday, June  5, 2019  7:46 PM

Here's the result of the program for GNAT (specifically, 18.1 on Windows):

Boundary cases for Ada.Directories
Simple_Name raises Name_Error for subtest [null string]
Containing_Directory raises Name_Error for subtest [null string]
Simple_Name result ="" for subtest [/]
Containing_Directory raises Use_Error for subtest [/]
Simple_Name result ="" for subtest [./]
Containing_Directory result ="." for subtest [./]
Simple_Name result ="" for subtest [../]
Containing_Directory result ="D:\Testing\Win" for subtest [../]
Simple_Name result ="" for subtest [.]
Containing_Directory result ="." for subtest [.]
Simple_Name result ="" for subtest [..]
Containing_Directory result ="." for subtest [..]
Simple_Name result ="bob" for subtest [bob]
Containing_Directory result ="." for subtest [bob]
Simple_Name result ="tucker" for subtest [./tucker]
Containing_Directory result ="." for subtest [./tucker]
Simple_Name result ="steve" for subtest [./tucker/steve]
Containing_Directory result ="./tucker" for subtest [./tucker/steve]
Simple_Name result ="" for subtest [./tucker/erhard/]
Containing_Directory result ="./tucker/erhard" for subtest [./tucker/erhard/]
Simple_Name raises Name_Error for subtest [D:]
Containing_Directory raises Name_Error for subtest [D:]
Simple_Name result ="" for subtest [D:\]
Containing_Directory raises Use_Error for subtest [D:\]
Simple_Name result ="tucker" for subtest [D:\tucker]
Containing_Directory result ="D:\" for subtest [D:\tucker]
Simple_Name result ="" for subtest [D:\erhard\]
Containing_Directory result ="D:\erhard" for subtest [D:\erhard\]
Simple_Name result ="" for subtest [\\machine\share\]
Containing_Directory result ="\\machine\share" for subtest [\\machine\share\]
Simple_Name result ="share" for subtest [\\machine\share]
Containing_Directory result ="\\machine" for subtest [\\machine\share]
Simple_Name raises Name_Error for subtest [Con:]
Containing_Directory raises Name_Error for subtest [Con:]

=======================

Analysis: GNAT mainly wants to return "" in these cases, which we already
knew (since it was the start of the discussion). There are a few anomalies
(and it has trouble with Windows-specific path names).

The containing directory of both "." and ".." is returned as ".". That would
be OK if the Simple_Name of ".." was "..", but it's "". I don't see how
calling Compose on these two results could possibly give a result that's
semantically the same as the original name.

And the Containing_Directory of "../" was "D:\Testing\Win", which is not the
intended string-manipulation only semantics. It probably should have been
"..". (Since these routines will be declared Nonblocking => True, Global =>
null in Ada 202x, this result will not be possible.)

*****************************************************************

From: Randy Brukardt
Sent: Wednesday, June  5, 2019  10:35 PM

Thanks to Shawn Fanning at PTC, here is the result for this program for
ObjectAda 10.1 on Windows:

Boundary cases for Ada.Directories
Simple_Name raises Name_Error for subtest [null string]
Containing_Directory raises Name_Error for subtest [null string]
Simple_Name result ="/" for subtest [/]
Containing_Directory raises Use_Error for subtest [/]
Simple_Name result ="./" for subtest [./]
Containing_Directory result ="." for subtest [./]
Simple_Name result ="../" for subtest [../]
Containing_Directory result =".." for subtest [../]
Simple_Name result ="." for subtest [.]
Containing_Directory raises Use_Error for subtest [.]
Simple_Name result =".." for subtest [..]
Containing_Directory raises Use_Error for subtest [..]
Simple_Name result ="bob" for subtest [bob]
Containing_Directory raises Use_Error for subtest [bob]
Simple_Name result ="tucker" for subtest [./tucker]
Containing_Directory result ="." for subtest [./tucker]
Simple_Name result ="steve" for subtest [./tucker/steve]
Containing_Directory result ="./tucker" for subtest [./tucker/steve]
Simple_Name result ="./tucker/erhard/" for subtest [./tucker/erhard/]
Containing_Directory result ="./tucker/erhard" for subtest [./tucker/erhard/]
Simple_Name raises Name_Error for subtest [D:]
Containing_Directory raises Name_Error for subtest [D:]
Simple_Name result ="D:\" for subtest [D:\]
Containing_Directory raises Use_Error for subtest [D:\]
Simple_Name result ="tucker" for subtest [D:\tucker]
Containing_Directory result ="D:" for subtest [D:\tucker]
Simple_Name result ="D:\erhard\" for subtest [D:\erhard\]
Containing_Directory result ="D:\erhard" for subtest [D:\erhard\]
Simple_Name result ="\\machine\share\" for subtest [\\machine\share\]
Containing_Directory raises Use_Error for subtest [\\machine\share\]
Simple_Name result ="share" for subtest [\\machine\share]
Containing_Directory raises Use_Error for subtest [\\machine\share]
Simple_Name raises Name_Error for subtest [Con:]
Containing_Directory raises Name_Error for subtest [Con:]


============================

Analysis: ObjectAda seems to punt on strings ending with '/', returning the
entire path in that case. It's hard to imagine that anyone would expect the
Simple_Name of "./tucker/erhard/" to be "./tucker/erhard/" -- which I guess
demonstrates that these are truly boundary conditions and no one has thought
too much about what they should return.

The "normal" case results are as expected.

*****************************************************************

From: Randy Brukardt
Sent: Wednesday, June  5, 2019  10:42 PM

CentOS Linux:

Boundary cases for Ada.Directories

Simple_Name raises CONSTRAINT_ERROR for subtest [null string]
Simple_Name raises CONSTRAINT_ERROR for subtest [null string]
Simple_Name result ="" for subtest [/]
Containing_Directory result ="/" for subtest [/]
Simple_Name result ="./" for subtest [./]
Containing_Directory result ="" for subtest [./]
Simple_Name result ="../" for subtest [../]
Containing_Directory result ="" for subtest [../]
Simple_Name result ="." for subtest [.]
Containing_Directory result ="" for subtest [.]
Simple_Name result =".." for subtest [..]
Containing_Directory result ="" for subtest [..]
Simple_Name result ="bob" for subtest [bob]
Containing_Directory result ="" for subtest [bob]
Simple_Name result ="tucker" for subtest [./tucker]
Containing_Directory result ="." for subtest [./tucker]
Simple_Name result ="steve" for subtest [./tucker/steve]
Containing_Directory result ="./tucker" for subtest [./tucker/steve]
Simple_Name result ="erhard" for subtest [./tucker/erhard/]
Containing_Directory result ="./tucker" for subtest [./tucker/erhard/]
Simple_Name result ="D:" for subtest [D:]
Containing_Directory result ="" for subtest [D:]
Simple_Name result ="D:\" for subtest [D:\]
Containing_Directory result ="" for subtest [D:\]
Simple_Name result ="D:\tucker" for subtest [D:\tucker]
Containing_Directory result ="" for subtest [D:\tucker]
Simple_Name result ="D:\erhard\" for subtest [D:\erhard\]
Containing_Directory result ="" for subtest [D:\erhard\]
Simple_Name result ="\\machine\share\" for subtest [\\machine\share\]
Containing_Directory result ="" for subtest [\\machine\share\]
Simple_Name result ="\\machine\share" for subtest [\\machine\share]
Containing_Directory result ="" for subtest [\\machine\share]
Simple_Name result ="Con:" for subtest [Con:]
Containing_Directory result ="" for subtest [Con:]

=======================

Analysis: This result is somewhat less consistent than the ObjectAda
results. Sometimes, strings ending in a / are returned outright, and in
other cases, the trailing '/' is completely ignored. Interestingly, this
compiler did better with the Windows strings than any of the Windows
compilers. Not sure what that means. :-)

*****************************************************************

From: Randy Brukardt
Sent: Wednesday, June  5, 2019  11:06 PM

So, to summarize our trip through the (language-defined package) looking
glass:

What is the result of Ada.Directories.Simple_Name ("/")?

   GNAT 18.1 (Windows): returns ""; 
   ObjectAda 10.1 (Windows): returns "/"; 
   Janus/Ada 3.2.1a (Windows): returns "";
   Apex v5.2 (Linux): returns "";  

What is the result of Ada.Directories.Containing_Directory ("/")?

   GNAT 18.1 (Windows): raises Use_Error;
   ObjectAda 10.1 (Windows): raises Use_Error;
   Janus/Ada 3.2.1a (Windows): returns "\";
   Apex v5.2 (Linux): returns "/"; 

Does the invariant Compose (Containing_Directory (A), Simple_Name (A)) =
A [semantically] hold for A = "/"? (Note: not tested directly).

   GNAT 18.1 (Windows): exception raised so no result;
   ObjectAda 10.1 (Windows): exception raised so no result;
   Janus/Ada 3.2.1a (Windows): Yes (but it would actually raise Name_Error 
                                    because of the null string);
   Apex v5.2 (Linux): Yes (if the null Simple_Name doesn't raise an
                           exception)

What is the result of Ada.Directories.Simple_Name ("./tucker/erhard/")?
[Erhard's question]

   GNAT 18.1 (Windows): returns "";
   ObjectAda 10.1 (Windows): returns "./tucker/erhard/";
   Janus/Ada 3.2.1a (Windows): raises Name_Error;
   Apex v5.2 (Linux): returns "erhard";

What is the result of Ada.Directories.Containing_Directory
("./tucker/erhard/")?

   GNAT 18.1 (Windows): returns "./tucker/erhard";
   ObjectAda 10.1 (Windows): returns "./tucker/erhard";
   Janus/Ada 3.2.1a (Windows): raises Name_Error;
   Apex v5.2 (Linux): returns "./tucker";

Does the invariant Compose (Containing_Directory (A), Simple_Name (A)) = A 
[semantically] hold for A = "./tucker/erhard/"? (Note: not tested directly).

   GNAT 18.1 (Windows): Yes (if the null Simple_Name doesn't raise an
                             exception)
   ObjectAda 10.1 (Windows): No way.
   Janus/Ada 3.2.1a (Windows): exceptions raised so no results;
   Apex v5.2 (Linux): Yes
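
[Editor's note: The invariant was not tested directly. A sketch of how it
could be checked follows; Round_Trip and Check are hypothetical names for this
illustration. The comparison is textual, so a result that is only semantically
the same name (for instance, one that lost a trailing '/') is reported as
"differs". Results will vary by compiler and target.]

```ada
with Ada.Directories;
with Ada.Exceptions;
with Ada.Text_IO;

procedure Round_Trip is
   use Ada.Directories;

   -- Reports whether Compose reassembles the two halves of A.
   procedure Check (A : String) is
   begin
      declare
         Rebuilt : constant String :=
             Compose (Containing_Directory (A), Simple_Name (A));
      begin
         if Rebuilt = A then
            Ada.Text_IO.Put_Line ("""" & A & """ round-trips unchanged");
         else
            Ada.Text_IO.Put_Line ("""" & A & """ differs: """ &
                Rebuilt & """");
         end if;
      end;
   exception
      -- Any of the three calls may reject a boundary case.
      when E : Name_Error | Use_Error =>
         Ada.Text_IO.Put_Line ("""" & A & """ raises " &
             Ada.Exceptions.Exception_Name (E));
   end Check;

begin
   Check ("/");
   Check ("./tucker/steve"); -- Normal case.
   Check ("./tucker/erhard/");
end Round_Trip;
```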

I'm afraid we didn't learn too much from this. The Apex results are probably 
the most consistent set, but even it gets confused with "./" and "../". And 
completely ignoring the trailing "/" is odd. The ObjectAda results are 
downright buggy in that the most basic invariant doesn't hold on many of the 
results.

All of the compilers other than ObjectAda return the null string on some 
inputs. I don't know if that is a consensus or accident, but it's fairly 
consistent. As such, I guess I'd lean slightly in the direction of expecting 
all of the examples ending with / to return a Simple_Name of ""; that means 
that Compose has to be able to deal with a Simple_Name parameter of "" as 
well.

I guess I'll write it up this way in the absence of other input (it has to be 
written up *somehow* for the meeting!)

*****************************************************************

From: Tucker Taft
Sent: Thursday, June  6, 2019  3:40 AM

> A bug was reported that a new version was returning "/" which is clearly 
> wrong.  But is empty string the right answer, or should the result be a 
> "legal" simple name, which means "." on Unix.

This is one of the reasons I think we're doing it wrong — IOW 
string-operations are not what we should be thinking about when dealing 
with directories, and thus Ada.Directories should not expose 
string-dependencies, but rather be all about manipulation in the 
conceptual-level [i.e. private type], perhaps with a sub-package dealing with 
the string-based issues.

When looking at it from that angle, it seems obvious that the Simple_Name 
function (a) should take the private-type representing a directory, and (b) 
return the conceptual-object's name. The "full name" would then be the 
recursive simple-names of the given directory and its parents separated with 
the appropriate delimiter.

We only invite pain and suffering by thinking of structure primarily as text 
— which I thought we would realize if we-as-a-discipline had any experience 
with RegEx.

*****************************************************************

From: Squirek
Sent: Thursday, June  6, 2019  4:07 AM

> So, to summarize our trip through the (language-defined package) 
> looking glass: ...

Thanks for all this testing - very helpful : )

*****************************************************************

From: Squirek
Sent: Thursday, June  6, 2019  4:46 AM

Randy, I guess the confusion began with your ACATS test for A.D.HFN which 
expects identity results or Name_Errors for simple_name on roots.
E.g. "/" must equal Simple_Name ("/") or raise a name error

...
          begin
              if Impdef.Equivalent_File_Names (Name,
                  ADH.Simple_Name (Name)) then
                 if TC_Trace then
                     Report.Comment ("  Simple_Name is identity on " &
                                       "single part");
                 end if;
              else
                 Report.Failed ("Simple_Name gets non-identity result - " &
                                 Subtest);
              end if;
          exception
             when Ada.Directories.Name_Error =>
                Report.Comment ("Name_Error from Simple_Name " &
                                 "of single part");
                -- We allow this as the result might not be Is_Simple_Name.
             when Ada.Directories.Use_Error =>
                Report.Failed ("Wrong exception from simple name " &
                                "of single part - " &
                                Subtest);
          end;
...

To which my modified version of GNAT outputs

      CXAG002 Check that package Ada.Directories.Hierarchical_File_Names
                 exists and the functions it contains work as expected.
    - CXAG002 Containing_Directory of simple name returns . - Simple.
    * CXAG002 Simple_Name gets non-identity result - Root.
**** CXAG002 FAILED ****************************.

Should the test be modified so that "" is acceptable in the root case?

*****************************************************************

From: Tucker Taft
Sent: Thursday, June  6, 2019  6:57 AM

> So, to summarize our trip through the (language-defined package) 
> looking glass: ...

I would agree at this point we should specify "" for 
Simple_Name("<whatever>/").  It is easy to test for the empty string, 
and so long as that is what is expected, the programmer can deal with it.

*****************************************************************

From: Bob Duff
Sent: Thursday, June  6, 2019  7:32 AM

I see a lot of invention of wheels in this thread, not necessarily round.

There are people who have thought this stuff through (not me!).
I don't normally take language-design advice from the likes of sh, bash, csh, 
ksh, etc, but in this case...

*****************************************************************

From: Arnaud Charlet
Sent: Thursday, June  6, 2019  7:42 AM

I fully agree with Bob.

Simple_Name = unix basename

So...

$ basename foo
foo

$ basename foo/
foo

$ basename /foo/bar/bii
bii

$ basename /foo/bar/bii/
bii

$ basename /
/

This is definitely what makes sense to me.
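
[Editor's note: The results above can be modelled by pure string manipulation.
The following is only a sketch for '/'-separated names. Basename is a
hypothetical helper written for this illustration; it is not how any compiler
implements Simple_Name, and it ignores Windows roots and Name_Error checking.]

```ada
with Ada.Text_IO;

procedure Basename_Sketch is

   -- Hypothetical helper modelling the "basename" results shown above.
   function Basename (Name : String) return String is
      Last : Natural := Name'Last;
   begin
      -- Ignore trailing separators, but keep one character so that a
      -- name made up only of '/' yields "/".
      while Last > Name'First and then Name (Last) = '/' loop
         Last := Last - 1;
      end loop;

      if Name'Length = 0 or else Name (Last) = '/' then
         return Name (Name'First .. Last); -- Null string, or the root.
      end if;

      -- Return what follows the last remaining separator, if any.
      for I in reverse Name'First .. Last loop
         if Name (I) = '/' then
            return Name (I + 1 .. Last);
         end if;
      end loop;
      return Name (Name'First .. Last);
   end Basename;

begin
   Ada.Text_IO.Put_Line (Basename ("foo"));           -- foo
   Ada.Text_IO.Put_Line (Basename ("foo/"));          -- foo
   Ada.Text_IO.Put_Line (Basename ("/foo/bar/bii"));  -- bii
   Ada.Text_IO.Put_Line (Basename ("/foo/bar/bii/")); -- bii
   Ada.Text_IO.Put_Line (Basename ("/"));             -- /
end Basename_Sketch;
```

In this sketch the only input that yields a null string is a null string.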

*****************************************************************

From: Tucker Taft
Sent: Thursday, June  6, 2019  9:08 AM

Note that this whole chain started when some work by Justin caused GNAT
to change from returning "" for Simple_Name("/") to returning "/", which NicoS
reported caused some regressions in GPS.  Personally I think "", ".", and "/" 
could all be justified.  What is important is that the RM is unambiguous, so 
programmers know what to expect.

*****************************************************************

From: Richard Wai
Sent: Thursday, June  6, 2019  10:13 AM

It occurs to me that in most OS's, '/' or '\' can appear any number of times 
in a sequence, to the same effect as a single one.

I.e., the Unix path "/foo//bar////" is the same as "/foo/bar". This works in 
Windows as well.

E.g., if we concatenate Containing_Directory ("/") with Simple_Name ("/") and 
get "//", this would also be valid, since it is the same (to the OS) as "/".

Of course "" works in this situation as well, but at least "/" is not a null 
string!

*****************************************************************

From: Tucker Taft
Sent: Thursday, June  6, 2019  10:22 AM

Actually, in Posix, "/" and "//" are *not* equivalent.  Weirdly, "/" and "///" 
*are* equivalent!?!  Two slashes are used for referring to remote systems in 
Posix.  One slash is the "local" root.  Two slashes are a kind of "global" 
root.  And three slashes you are back to the local root...

*****************************************************************

From: Richard Wai
Sent: Thursday, June  6, 2019  3:37 PM

That's pretty wild. I tested this on Linux, FreeBSD and Solaris and they all 
had no problem resolving "//" as "/". Even with NFS, the mounts are supposed 
to be transparent to the underlying file system. I always thought it was 
Windows that did that "\\servername\share\" thing.

Maybe we should return "///"? (kidding) 

It's come time for me to throw my hands up: ¯\_(ツ)_/¯

*****************************************************************

From: Tucker Taft
Sent: Thursday, June  6, 2019   4:18 PM

From the current standard 
(http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13):

"... A pathname consisting of a single <slash> shall resolve to the root 
directory of the process. A null pathname shall not be successfully resolved. 
If a pathname begins with two successive <slash> characters, the first 
component following the leading <slash> characters may be interpreted in 
an implementation-defined manner, although more than two leading <slash> 
characters shall be treated as a single <slash> character."
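
[Editor's note: A small sketch of the rule quoted above, looking only at the
leading <slash> characters. Root_Kind and Classify are hypothetical names for
this illustration; nothing like them exists in Ada.Directories.]

```ada
with Ada.Text_IO;

procedure Leading_Slashes is

   type Root_Kind is (Not_Absolute, Process_Root, Implementation_Defined);

   -- One leading <slash>, or more than two, denotes the root directory of
   -- the process; exactly two may be interpreted in an
   -- implementation-defined manner.
   function Classify (Name : String) return Root_Kind is
      Count : Natural := 0;
   begin
      for I in Name'Range loop
         exit when Name (I) /= '/';
         Count := Count + 1;
      end loop;

      case Count is
         when 0      => return Not_Absolute;
         when 2      => return Implementation_Defined;
         when others => return Process_Root;
      end case;
   end Classify;

begin
   Ada.Text_IO.Put_Line (Root_Kind'Image (Classify ("/")));
   -- PROCESS_ROOT
   Ada.Text_IO.Put_Line (Root_Kind'Image (Classify ("//host/share")));
   -- IMPLEMENTATION_DEFINED
   Ada.Text_IO.Put_Line (Root_Kind'Image (Classify ("///usr")));
   -- PROCESS_ROOT
   Ada.Text_IO.Put_Line (Root_Kind'Image (Classify ("usr/bin")));
   -- NOT_ABSOLUTE
end Leading_Slashes;
```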

It might have been a "sop" to the Windows folks, but we need something that 
works for Windows as well, in any case.

*****************************************************************

From: Squirek
Sent: Thursday, June  6, 2019  3:24 PM

>> I fully agree with Bob.
>>
>> Simple_Name = unix basename
...

Is it safe to assume then that Arno's preference is what should be done and 
that I can reapply my reverted patch?

*****************************************************************

From: Tucker Taft
Sent: Thursday, June  6, 2019  4:07 PM

No, I would wait until after the ARG meeting in Warsaw.  This clearly needs a 
"binding interpretation" so it is clearly in the ARG's purview.  I can't 
imagine there is a huge rush here...  And of course, NicoS was the person who 
saw the initial regressions, so hopefully he can weigh in on this as well 
(Justin, you might prompt him to do so).

*****************************************************************

From: Tucker Taft
Sent: Thursday, June  6, 2019   4:25 PM

Ok, no rush indeed. It's just annoying to sit on something if there is no need 
(as you say, though, further discussions are warranted).

*****************************************************************

From: Randy Brukardt
Sent: Thursday, June  6, 2019  10:21 PM

> It occurs to me that in most OS's, '/' or '\' can appear any number of 
> times in a sequence, to the same effect as a single one.

Perhaps, but that sort of gibberish is a security hazard.
 
> I.e, the unix path "/foo//bar////" is the same as "/foo/bar". 
> This works in Windows as well.

Really? It certainly didn't back in the old days. You'd probably end up with a
file named "/bar" which you couldn't delete. Janus/Ada has extensive file name
verification for this reason (I removed some of it for modern Windows, but 
mainly because Windows allows more characters).

> E.g, if we concatenate Containing_Directory ("/") with Simple_Name 
> ("/") and get "//", this would also be valid, since it is the same (to 
> the OS) as "/".

If you're "concatenating" anything Ada.Directories-related, you're doing it 
wrong. The issue is whatever Compose will do.

In any event, Containing_Directory ("/") is defined to raise Use_Error (there 
is no container for a root, see A.16(76/2)).

> Of course "" works in this situation as well, but at least "/" is not 
> a null string!

Which matters why? The only truly relevant thing is whether Compose allows the 
null string in these sorts of situations. And even if you are using &, the null 
string works fine.

The only use case I can think of for Simple_Name outside of Compose is in 
error messages. But in that case, the result is unlikely to matter:

     The Source.Ads file could not be opened.

Any result for a root is still going to result in nonsense here:

     The / file could not be opened.
     The  file could not be opened.
     The D:\ file could not be opened.

None of them even represent a file, so a decent error message would require 
some sort of predetection and display of the entire input string, not just a 
blind use of Simple_Name.

*****************************************************************

From: Randy Brukardt
Sent: Thursday, June  6, 2019  10:43 PM

> Simple_Name = unix basename

This is definitely *not* literally true. I was the primary author for the 
wording for Ada.Directories, and I never heard of "basename". (Perhaps I knew
of it in the 70s and 80s when I was working regularly on Unix, but I doubt I 
used it ever.)

We could decide to make this true, but is there a specification to copy?
The man page I found was vaguer than the description in the RM.

 
> So...
> 
> $ basename foo
> foo
> 
> $ basename foo/
> foo
> 
> $ basename /foo/bar/bii
> bii
> 
> $ basename /foo/bar/bii/
> bii
> 
> $ basename /
> /
> 
> This is definitely what makes sense to me.

I think, judging from the comments in the Janus/Ada Ada.Directories, that I
thought then (which was much closer to the design of Ada.Directories) that a
root would be returned, and that Simple_Name would never raise an exception
unless the name was so badly malformed that it couldn't ever be a name (for
Windows that's mainly the presence of a few characters on the excluded list;
for Linux, it probably is impossible).

This seems to conform to "basename", although I think that is likely to be 
accidental (at least on my part, not necessarily other reviewers).

One problem with this idea is trying to decide what Windows names should 
return and describing that appropriately. We also have to properly describe 
what happens with "." and ".." (almost everybody choked on those, especially
the versions with a trailing slash). And of course, we really need to write 
this generically enough that it works for other targets that are neither 
Linux nor Windows.

There's also the loss of semantic information (the trailing slash); it seems 
that can't be helped if we follow this model. But "/foo/" and "/foo" are not 
the same; unfortunately, they *can* be the same in some cases, and since 
these routines are supposed to be context-free, I suppose there's no other 
alternative.

In particular,
    Compose (Containing_Directory ("/foo/"), Simple_Name ("/foo/")) = "/foo"

Anyway, a minor glitch.
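
The "basename"-like behavior quoted above can be modeled roughly as follows.
This is a sketch in Python rather than Ada, for '/'-separated names only; the
function simple_name is a hypothetical stand-in, and raising an error for the
null string is an assumption standing in for Name_Error:

```python
def simple_name(name: str) -> str:
    """Rough model of basename-style behavior on '/'-separated names."""
    if name == "":
        # Assumption: the null string is rejected (stands in for Name_Error).
        raise ValueError("null string is not a file name")
    stripped = name.rstrip("/")
    if stripped == "":
        # The name consisted only of slashes: a root is returned intact.
        return "/"
    # Otherwise return the part after the last remaining '/'.
    return stripped.rsplit("/", 1)[-1]


for example in ["foo", "foo/", "/foo/bar/bii", "/foo/bar/bii/", "/"]:
    print(f"{example!r:>18} -> {simple_name(example)!r}")
```

Note that "foo/" and "foo" give the same result, which is the loss of the
trailing slash mentioned above.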

*****************************************************************

From: Arnaud Charlet
Sent: Friday, June  7, 2019  2:37 AM

> No, I would wait until after the ARG meeting in Warsaw.  This clearly needs 
> a "binding interpretation" so it is clearly in the ARG's purview.  I can't 
> imagine there is a huge rush here...  And of course, NicoS was the person 
> who saw the initial regressions, so hopefully he can weigh in on this as 
> well (Justin, you might prompt him to do so).

Nico was mainly reacting to the change of behavior (hence incompatibility), 
requiring code changes, and more complicated changes in order to accommodate 
two versions of GNAT. So he wanted to make sure that the change done would be
justified and not gratuitous (and would not change further), hence this 
discussion.

*****************************************************************

From: Randy Brukardt
Sent: Friday, June  7, 2019   2:42 AM

> Randy, I guess the confusion began with your ACATS test for A.D.HFN 
> which expects identity results or Name_Errors for simple_name on 
> roots.
> E.g. "/" must equal Simple_Name ("/") or raise a name error

Cool, I already knew this. Wonder when I forgot it??? ;-)
 
...
> To which my modified version of GNAT outputs
> 
>       CXAG002 Check that package 
> Ada.Directories.Hierarchical_File_Names
>                  exists and the functions it contains work as expected.
>     - CXAG002 Containing_Directory of simple name returns . - Simple.
>     * CXAG002 Simple_Name gets non-identity result - Root.
> **** CXAG002 FAILED ****************************.
> 
> Should the test be modified so that "" is acceptable in the root case?

Arno's solution, which is probably the easiest to justify (as it matches the
behavior of an ancient Unix tool that no one seems to know about), would say 
I had this right when I wrote this test. (Not sure why Janus/Ada doesn't do 
the right thing, other than a bug. Fixed now, though.)

*****************************************************************

From: Randy Brukardt
Sent: Thursday, June  6, 2019  11:14 PM

> No, I would wait until after the ARG meeting in Warsaw.  This clearly 
> needs a "binding interpretation" so it is clearly in the ARG's 
> purview.  I can't imagine there is a huge rush here...  And of course, 
> NicoS was the person who saw the initial regressions, so hopefully he 
> can weigh in on this as well (Justin, you might prompt him to do so).

There seems to be enough evidence that Unix "basename" (which Wikipedia says 
dates to 1979, even though I've never heard of it) is fairly close to the intent
here. And it does have the advantage that it always returns a result if the 
string is even close to reasonable.

Ergo, I suggest we proceed with that in mind. The question then becomes what 
to do here?

The definition of simple names is found in A.16(47/2). That's nicely vague:

  The simple name of an external file is the name of the item, not including 
  any containing directory names. 

And the definition of function Simple_Name is equally vague:

  Returns the simple name portion of the file name specified by Name. The exception
  Name_Error is propagated if the string given as Name does not allow the
  identification of an external file (including directories and special files).

I don't think we want to hair up the definition of "simple name", as that is 
supposed to work for any OS that exists or is yet to be imagined.

So we seem to have a few options:

(1) Add a bit of normative wording to clarify that Simple_Name should return a 
    non-null result for any non-null string that doesn't raise Name_Error.
    This would be something like:

    The simple name is a non-null string for any Name that isn't the null string.

(2) Add a bit of normative wording to say that the Simple_Name of a root is 
    the root itself. That would be something like:

    The simple name of a root (see A.16.1) is the root itself.

(3) Or we could just beef up the AARM note for Simple_Name with more information, 
    including the above and/or additional stuff. Probably some or all of the 
    following:

    The result of Simple_Name corresponds to the result of the "basename" 
    command on Linux and Unix. The "basename" command ignores a trailing '/' 
    and then returns the part of the name in front of a '/'. It returns a root 
    intact. The null string is never returned. Similar rules should be used 
    for Windows filenames.

We probably also want to clarify in the AARM note for Containing_Directory 
that it raises Use_Error for a root [we want that so that 
Compose (Containing_Directory(A), Simple_Name(A)) roughly equals A for all A; 
Windows roots will not work if duplicated even if Unix would work in that
case]:

    If Name represents a root (see A.16.1), Containing_Directory raises 
    Use_Error; there is no directory that surrounds a root.
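
A rough model of how these pieces would fit together, in Python rather than
Ada and for '/'-separated names only. All names here (UseError, simple_name,
containing_directory, compose) are hypothetical stand-ins, and returning "."
for a name with no directory part is an assumption borrowed from "dirname":

```python
class UseError(Exception):
    """Stand-in for Use_Error (hypothetical name for this sketch)."""


def simple_name(name: str) -> str:
    if name == "":
        raise ValueError("null string is not a file name")
    stripped = name.rstrip("/")
    # A root is returned intact; otherwise take the last component.
    return stripped.rsplit("/", 1)[-1] if stripped else "/"


def containing_directory(name: str) -> str:
    stripped = name.rstrip("/")
    if stripped == "":
        # There is no directory that surrounds a root.
        raise UseError("a root has no containing directory")
    head, sep, _ = stripped.rpartition("/")
    if sep == "":
        return "."  # assumption: no directory part means the current directory
    return head if head else "/"


def compose(directory: str, name: str) -> str:
    # Avoid doubling the separator when the directory is the root.
    return directory + name if directory.endswith("/") else directory + "/" + name


for a in ["/foo/bar", "/foo/", "/foo"]:
    print(f"{a!r} -> {compose(containing_directory(a), simple_name(a))!r}")
```

With this model the round trip reproduces "/foo/bar" and "/foo" exactly,
turns "/foo/" into "/foo", and is never attempted for "/" because
containing_directory raises first.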

Do you have any preference for which of these I should put into the AI for 
discussion at the next meeting? It matters somewhat as the AI classification 
is "Ramification" if we are only changing/adding AARM notes, and "Binding 
Interpretation" if we are adding normative wording. (I'm hoping to redo as
little work as possible. :-)

P.S. I note that this means I need to dive back into the algebra of file 
names to make sure this doesn't break anything in the definition of 
Ada.Directories.Hierarchical_File_Names. Not before Warsaw, though.

*****************************************************************
