!standard A.18 04-04-02 AI95-00370/02
!class amendment 03-12-20
!status work item 03-12-20
!status received 03-12-20
!priority Medium
!difficulty Medium
!subject Environment variables

!summary

Add a new standard package, Ada.Environment, to access environment variables.

!problem

A vast number of operating systems support environment variables, including the various Unixes, GNU/Linux, MS-DOS, Windows, and Mac OS X. It's possible to create the interfaces to the C libraries, but a standard interface would improve portability and usability of the language.

!proposal

(See wording.)

!discussion

This proposal provides a minimum set of capabilities that should be straightforward to implement on modern operating systems, most of the time by a simple skin on top of the C library.

We do not provide subprograms equivalent to all of the functions provided by C; we use a building block approach instead. For instance there is no equivalent to clearenv(), but there is a way to clear one particular environment variable.

This package can potentially result in race conditions, because the environment variables are really shared resources. For instance, calling Count, then Name with a Number less than Count, might well raise Constraint_Error if some environment variables have been undefined in the meantime. Users who want to do these things will need to call Environment from within a protected object.

Also, concurrent calls to operations of Environment are a dicey proposition, because the operating system interfaces for accessing environment variables may or may not be thread-safe. We don't want to force implementations to use a protected object to implement this package, as it is expected that most of its uses will not involve concurrent access. So we are making it erroneous to perform concurrent calls on the operations declared here.

Based on the normative description of Environment, this package might well be implemented as a mere string-to-string map.
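As a rough illustration of that last point, the sketch below models the name-based operations with an in-memory map. It assumes an Ada 2005 compiler (for Ada.Containers); the names Map_Sketch, String_Maps, and Env are hypothetical, and nothing here reads or changes the operating system's environment.

```ada
with Ada.Containers.Indefinite_Ordered_Maps;
with Ada.Text_IO;

procedure Map_Sketch is

   --  Hypothetical stand-in for the proposed package: an in-memory
   --  string-to-string map with no link to the operating system.
   package String_Maps is new Ada.Containers.Indefinite_Ordered_Maps
     (Key_Type => String, Element_Type => String);

   Env : String_Maps.Map;

   function Exists (Name : String) return Boolean is
   begin
      return Env.Contains (Name);
   end Exists;

   function Value (Name : String) return String is
   begin
      --  Element raises Constraint_Error when the key is absent,
      --  which matches the wording proposed for Value.
      return Env.Element (Name);
   end Value;

   procedure Set (Name : String; Value : String) is
   begin
      --  Include inserts a new pair or overwrites an existing one.
      Env.Include (Name, Value);
   end Set;

   procedure Clear (Name : String) is
   begin
      --  Exclude does nothing when the key is absent.
      Env.Exclude (Name);
   end Clear;

begin
   Set ("ADA", "05");
   Ada.Text_IO.Put_Line ("ADA=" & Value ("ADA"));
   Clear ("ADA");
   Ada.Text_IO.Put_Line
     ("Exists after Clear: " & Boolean'Image (Exists ("ADA")));
end Map_Sketch;
```

The positional operations (Count, and Name or Value by Number) are left out of the sketch on purpose; a map has no stable numbering of its elements once pairs are added or removed.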
We try to nail down the semantics of environment variables a bit more precisely in an Implementation Advice, where we can talk about subprocesses and the like.

If the operating system has a notion of environment variable then it must implement this package with the proper semantics. If however there is no such notion in the operating system (or if there is no operating system) we give a permission to provide a "dummy" implementation of Environment.

!wording

Add after A.15:

A.16 The Package Environment

The package Environment allows a program to read or modify the environment variables. Environment variables are name-value pairs, where both the name and value are strings, and where a variable is uniquely identified by its name. The definition of what constitutes an "environment variable", and the meaning of the name and value, are implementation defined.

Static Semantics

The library package Ada.Environment has the following declaration:

   package Ada.Environment is
      pragma Pure (Environment);

      function Count return Natural;
      function Name (Number : in Positive) return String;

      function Value (Name : in String) return String;
      function Value (Number : in Positive) return String;

      function Exists (Name : in String) return Boolean;

      procedure Set (Name : in String; Value : in String);

      procedure Clear (Name : in String);
      procedure Clear (Number : in Positive);
   end Ada.Environment;

Dynamic Semantics

   function Count return Natural;

If the external execution environment supports environment variables, Count returns the number of environment variables currently defined. Otherwise it returns 0.

   function Name (Number : in Positive) return String;

If the external execution environment supports environment variables, then Name returns the name of the environment variable at position Number. If Number is outside the range 1..Count, then Constraint_Error is propagated.
   function Value (Name : in String) return String;

If the external execution environment supports environment variables, then Value returns the value of the environment variable with name Name. If Name is not the name of an environment variable, then Constraint_Error is propagated.

   function Value (Number : in Positive) return String;

Equivalent to Value (Name (Number)).

   function Exists (Name : in String) return Boolean;

If the external execution environment supports environment variables, then Exists returns True if an environment variable with name Name is currently defined. Otherwise it returns False.

   procedure Set (Name : in String; Value : in String);

If the external execution environment supports environment variables, then Set defines an environment variable with name Name and value Value, possibly overwriting a preexisting environment variable with the same name. Otherwise Constraint_Error is propagated.

In the implementation defined circumstances where the external execution environment does not allow the definition of an environment variable with the given name and value, Constraint_Error is propagated.

It is implementation defined whether there exist values for which the call Set(Name, Value) has the same effect as Clear (Name).

Calling Set changes in an unspecified manner the association between names and numbers exposed by function Name.

   procedure Clear (Name : in String);

If the external execution environment supports environment variables, then Clear deletes the environment variable with name Name, if one existed. Otherwise Constraint_Error is propagated. Immediately after this procedure returns, the expression Exists(Name) is False.

Calling Clear changes in an unspecified manner the association between names and numbers exposed by function Name.

   procedure Clear (Number : in Positive);

Equivalent to Clear (Name (Number)).
Implementation Permissions

An implementation running on a system which has no concept corresponding to environment variables is permitted to define the operations of package Environment with the semantics corresponding to the case where "the external execution environment does not support environment variables".

An alternative declaration is allowed for package Environment if different functionality is appropriate for the external execution environment.

Implementation Advice

If it chooses not to take advantage of the first permission above, an implementation should provide a mechanism to initialize a non-empty set of environment variables at the beginning of the execution of a partition. Also, if the execution environment has a concept of subprocess, the currently defined environment variables should be used to initialize the environment variables of the subprocess.

Erroneous Execution

Making concurrent calls to operations of package Environment results in erroneous execution.

!example

The following program prints out the environment, clears it, and defines a new environment variable:

   while Ada.Environment.Count > 0 loop
      --  Clear changes the numbering of the remaining variables in an
      --  unspecified manner, so always work on position 1.
      Ada.Text_Io.Put_Line (Ada.Environment.Name (1) & "=" &
                            Ada.Environment.Value (1));
      Ada.Environment.Clear (1);
   end loop;
   Ada.Environment.Set (Name => "ADA", Value => "05");

--!corrigendum

!ACATS test

Create ACATS C-Tests for this facility.

!appendix

From: Pascal Orby
Sent: Sunday, December 21, 2003 2:12 AM

> !subject Add standard interface for environment variables

Did you check the AI database, I seem to remember that a proposal in this direction has already been made, no ?

****************************************************************

[Editor's note: AI-371 contains a different proposal for such a package. For the sake of making it easier to evaluate the two proposals, it is copied here.]
The following package exists: package Environment is procedure Set_Variable (Name : String; To : String); function Get_Variable (Name : String) return String; function Path return String; function Locate_Executable (Name : String) return String; function Locate_Regular_File (Name : String; Path : String := "") return String; end Environment; This package provides access to environment variables. An environment variable is a value external to the program, which has a name. If the operating system does not support the notion of environment variables, calls to Get_Variable will always return the null string, and calls to Set_Variable will raise Unsupported. The function Path returns an implementation defined string which defines the location of executable files. If the operating system does not support the notion of path, the null string is returned. The function Locate_Executable returns a string corresponding to the full name of the specified executable name, following the operating system's conventions for locating executables. The exception Ada.IO_Exceptions.Name_Error is raised if no executable with the given name can be identified. The function Locate_Regular_File returns a string corresponding to the full name of the specified file name, following the operating system's conventions for locating data files. The exception Ada.IO_Exceptions.Name_Error is raised if no file with the given name can be identified. **************************************************************** From: David A. Wheeler Sent: Monday, April 19, 2004 6:47 PM Currie Colket wrote: >Tom, Dave, and Clyde: > >Thought you might be interested in the status of the ada-comment Dave >submitted last December. Pascal Leroy is the Chair of the ARG. It looks >like it is scheduled for discussion at SIGAda 2004 in Atlanta. > >This might be encouragement to all to go to Atlanta. The ARG meetings >are typically by invitation only. 
But now, you have an AI number and a >URL that can provide you with the current status of the AI. > >http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-00370.TXT?rev=1.3 > >Cheers! Thanks for the mention! I'm not married to the interface I proposed for environments, so changes are of course fine. Indeed, I'm delighted that Pascal Leroy is working to ensure that there's a clean interface to this important functionality. A few comments, though. In particular, I fear that the currently proposed interface is awkward to use in common situations. In particular, the lack of a default for Value() will probably make this interface much harder to use than necessary. Also, the lack of a Clear_Environment function is a problem for writing secure programs. First point. The new "Value()" function will raise Constraint_Error if "Name" is not the name of an environment variable, and does not support a "Default" argument. This can be mildly annoying if Restrictions(No_Exceptions) has been invoked. Also, this means that the normal case of using environment variables will be complicated. You'd like to do this: if Exists("VARNAME") then setting := Ada.Environment.Value("VARNAME"); else setting := "DEFAULT"; end if; But a String can't be set to a possibly different length like this. Probably the simplest approach without a defaulting parameter is to switch it to an Unbounded_String, like this: if Exists("VARNAME") then setting := To_Unbounded_String(Ada.Environment.Value("VARNAME")); else setting := To_Unbounded_String("DEFAULT"); end if; But now you're doing type conversions simply to work around an Ada limitation. You CAN'T catch exceptions while initializing local variables, so you can't do this if VARNAME won't always exist: function blah(...) is setting := Ada.Environment.Value("VARNAME"); begin All this is unnecessary. In most cases, environment variables are used to provide information for non-default situations. 
If Value() has a default, then there needn't be special handling to avoid an exception, and the common case becomes: function blah(...) is setting := Ada.Environment.Value("VARNAME", "DEFAULT"); begin... With one environment variable, this isn't a big deal, but many programs have a large set of environment variables that control various special-purpose parameters. You can create a "defaulting" subprogram using the proposed interface. But if that's what most people will want to do, then failing to have a standard interface that does it means that there's more code (with inconsistent names) that people will have to read through. And some people may decide to use long, complicated approaches when simple approaches will do. It also creates a lot of unnecessary baggage; now I have to pass up unbounded strings through several routines, and either handle exceptions (but what about No_Exceptions?) or do the search twice (using Exists). Absolutely minimal interfaces are not always a good engineering approach; a Turing machine is minimal, but you wouldn't want to program one directly. Clearenv() functionality isn't included in the modified proposal, but it is needed by some programs with elevated privileges. C programs that wish to do that often just overwrite the environment count, rather than calling clearenv(), but if Ada.Environment doesn't allow direct write access to that count (which it shouldn't), it should provide some interface for erasing all environment variables. The problem is that environment variables propagate (that's the point!), but programs that obtain environment variables from untrusted sources need to extract the variables they need, erase EVERYTHING, and then re-set just the ones they wish to pass on. Thanks for working on this. I look forward to the results of the ARG (and Pascal's) hard work. 
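
[Editor's note: The "defaulting" subprogram mentioned in this message could look roughly like the sketch below. So that it can be compiled, it is written against Ada.Environment_Variables, the package found in later revisions of the Ada standard, rather than against the Ada.Environment package proposed in this AI; Value_Or_Default and Default_Sketch are hypothetical names.]

```ada
with Ada.Environment_Variables;
with Ada.Text_IO;

procedure Default_Sketch is

   --  Hypothetical helper, not part of any proposal in this AI.
   --  It searches the environment twice (Exists, then Value), which is
   --  the cost this message points out.
   function Value_Or_Default (Name, Default : String) return String is
   begin
      if Ada.Environment_Variables.Exists (Name) then
         return Ada.Environment_Variables.Value (Name);
      else
         return Default;
      end if;
   end Value_Or_Default;

   --  A constant of an unconstrained String subtype takes its bounds from
   --  the initial value, so no Unbounded_String conversion is needed and
   --  no exception can escape from the declarative part.
   Setting : constant String := Value_Or_Default ("VARNAME", "DEFAULT");

begin
   Ada.Text_IO.Put_Line ("setting = " & Setting);
end Default_Sketch;
```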
**************************************************************** From: Randy Brukardt Sent: Tuesday, April 20, 2004 6:04 PM I've added this comment to the appendix of AI-370, as requested. In the future, please make comments to Ada-Comment (not privately) so (a) other people can comment; and (b) I know that permission has been given to make the comment public. > >Thought you might be interested in the status of the ada-comment Dave > >submitted last December. Pascal Leroy is the Chair of the ARG. It looks > >like it is scheduled for discussion at SIGAda 2004 in Atlanta. I would expect that it will be finished by then; there are two meetings before SIGAda (at Ada-Europe in June and in Madison in September). By the time of SIGAda, we hopefully will be working only on integration issues. > First point. > The new "Value()" function will raise Constraint_Error if "Name" is not > the name of an environment variable, and does not support a "Default" > argument. This can be mildly annoying if Restrictions(No_Exceptions) has > been invoked. Most such functions return "" if the name is not found. That would seem to be most appropriate here. But I have a hard time imagining any application using environment variables and processes running with Restrictions(No_Exceptions)! Our experience with "default" arguments is that no one understands or uses them. We have such arguments in Claw's registry operations (which are of course similar to the environment), but experience shows that I'm virtually the only person who uses them. I suspect that's because they're very difficult to describe, and in many circumstances, very hard to set up to a meaningful value. (The default is rarely a simple constant, it's usually a function of other settings.) > Clearenv() functionality isn't included in the > modified proposal, but it is needed by some programs > with elevated privileges. 
C programs that wish to do that often just > overwrite the environment count, rather than calling clearenv(), but if > Ada.Environment doesn't allow direct write access to that count (which > it shouldn't), it should provide some interface for erasing all > environment variables. I don't think there is any such count in Windows, so certainly this cannot be done this way. > The problem is that environment variables > propagate (that's the point!), but programs that obtain environment > variables from untrusted sources need to extract the variables they > need, erase EVERYTHING, and then re-set just the ones they wish to pass on. That's not how this is supposed to be done (certainly in Windows, but I think the same was true in Unix). Such a program should create a new, empty environment block, and then populate it with the appropriate variables. I'm not sure that it is even possible to empty an environment in Windows via the API, and it would have to be done a string at a time if it can be done (which would be awful). But such an interface goes beyond what should be in the standard. Especially as we've decided not to standardize any Spawn-like functionality (and one of the main reasons for that is that we'd need to define environment and file inheritance to make it useful). > Thanks for working on this. I look forward to the > results of the ARG (and Pascal's) hard work. I worry about this interface, as it goes far beyond what is provided in the Windows API. The Windows API for the environment provides three functions: GetEnvironmentVariable, SetEnvironmentVariable, and GetEnvironmentStrings. The first can be used to provide Exists (but you can't have a variable with the null string); the second can be used to provide Clear (because setting to the null string deletes). The third makes a copy of the entire environment in an undefined format (some examples show it just as a list of null-terminated strings, which would match MS-DOS). 
That is a copy because the environment consists of two parts: the unchangeable system environment, and the process's own environment. So, even implementing Count would be very expensive, requiring a copy of the entire environment (which is huge, especially if Apex is installed), and then a scan of the entire thing. Moreover, this would have to be repeated on every operation, as this is a copy of the real environment. Even if you were willing to let any mods other than that using this package to be lost (which I'm not, especially in the early days of this package, when most old code will do something else), you'd still have to re-read the environment after anything that changed it, and refigure the count. Why does that API matter? Because whatever is done needs to be compatible with foreign code (and old Ada code directly using the API). If the package managed its own environment, the cost would be gone, but it would only work with operations that knew about this package (not with direct calls to CreateProcess, for example, which are very common in existing code). So I believe that only a direct interface to the API is appropriate for implementing these operations. What C provides here is irrelevant. Ada vendors cannot in general require someone to purchase a C compiler before they can use Ada! So I believe that the proposed interface goes way too far already. The Count interface is unstable (because a change caused by other code would change the top value). If the intent is to be able to read the entire environment, there should be an iterator for that purpose: generic with procedure Process (Name, Value : in String); procedure Generic_Traverse_Environment; which could walk the entire environment without any race conditions. The only alternative is to provide a stale snapshot of the environment from the start of the program, which would often be the wrong thing. 
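
[Editor's note: A minimal sketch of how the generic iterator suggested above might be declared and instantiated. The generic's profile is taken from this message; its body is an assumption, layered here on the Iterate procedure of Ada.Environment_Variables (from later revisions of the Ada standard) only so that the sketch can be compiled. Traverse_Sketch, Visit, Print, and Dump are hypothetical names.]

```ada
with Ada.Environment_Variables;
with Ada.Text_IO;

procedure Traverse_Sketch is

   --  Profile as suggested in the message above.
   generic
      with procedure Process (Name, Value : in String);
   procedure Generic_Traverse_Environment;

   --  Assumed body: delegates to an existing passive iterator. Whether the
   --  traversal sees a snapshot or the live environment depends on that
   --  underlying implementation.
   procedure Generic_Traverse_Environment is
      procedure Visit (Name, Value : in String) is
      begin
         Process (Name, Value);
      end Visit;
   begin
      Ada.Environment_Variables.Iterate (Visit'Access);
   end Generic_Traverse_Environment;

   --  Action applied to each name-value pair.
   procedure Print (Name, Value : in String) is
   begin
      Ada.Text_IO.Put_Line (Name & "=" & Value);
   end Print;

   procedure Dump is new Generic_Traverse_Environment (Process => Print);

begin
   Dump;
end Traverse_Sketch;
```

The caller never sees a position or a count, so there is no index that a concurrent change to the environment could invalidate between two calls.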
**************************************************************** From: Pascal Leroy Sent: Wednesday, April 21, 2004 4:05 PM > I worry about this interface, as it goes far beyond what is > provided in the Windows API. Note that this interface reflects what was discussed at the last ARG meeting (see the minutes). > So, even implementing Count would be very expensive, > requiring a copy of the entire environment (which is huge, > especially if Apex is installed), and then a scan of the > entire thing. Moreover, this would have to be repeated on > every operation, as this is a copy of the real environment. > Even if you were willing to let any mods other than that > using this package to be lost (which I'm not, especially in > the early days of this package, when most old code will do > something else), you'd still have to re-read the environment > after anything that changed it, and refigure the count. Even on Unix I think that you would want to cache the result of calling getenv and say that modifications not made through this package are lost. Otherwise the performance is going to be horrendous. It would be an easy matter to fixup the cache when the value of an environment variable changes (that's true regardless of the OS). > If the > intent is to be able to read the entire environment, there > should be an iterator for that purpose: > > generic > with procedure Process (Name, Value : in String); > procedure Generic_Traverse_Environment; > > which could walk the entire environment without any race > conditions. I have never seen a need for scanning the entire environment, but that may just be my inexperience. Note that there would still be a race condition if other tasks asynchronously call the C library to muck with the environment. All the other predefined units that access shared resources (I/O, directory operations, etc) go belly up if you use C code to change these resources behind Ada's back. This package is no different, and it should work the same. 
**************************************************************** From: Randy Brukardt Sent: Wednesday, April 21, 2004 1:28 PM > > I worry about this interface, as it goes far beyond what is > > provided in the Windows API. > > Note that this interface reflects what was discussed at the last ARG > meeting (see the minutes). If you're saying that I should have complained at the meeting, let me say that I wasn't in a position to check the difficulty of the implementation at the meeting. (I don't carry the Microsoft CDs with my laptop!). I suspected that it was a problem, but without any proof, I did not want to be spreading FUD on an interface that I'd like to see. (And note taking prevents deep thinking in any case.) I only did the implementation check last night, and I wrote up my conclusions immediately. ... > Even on Unix I think that you would want to cache the result of calling > getenv and say that modifications not made through this package are > lost. Otherwise the performance is going to be horrendous. It would be > an easy matter to fixup the cache when the value of an environment > variable changes (that's true regardless of the OS). Which would mean either that "Value"-by-Name would be wrong some of the time, or would differ from the rest of the package's results - either of which would be a huge mistake. (I don't think that using GetEnvironmentVariable is an "option"). > > If the > > intent is to be able to read the entire environment, there > > should be an iterator for that purpose: > > > > generic > > with procedure Process (Name, Value : in String); > > procedure Generic_Traverse_Environment; > > > > which could walk the entire environment without any race > > conditions. > > I have never seen a need for scanning the entire environment, but that > may just be my inexperience. I just don't see any other value to the "Count" option. (And I agree with you, I don't see any value to scanning the whole environment.) 
If you're not scanning the entire environment, then you're accessing items by name. There cannot be any useful purpose to accessing the third item in the environment, because you have no control over the order in which these things are stored. This is just a Map, after all, and Maps avoid this race condition by using "cursors" for iteration, not numbers. The cursor doesn't change if something is added or deleted to the map. > Note that there would still be a race > condition if other tasks asynchronously call the C library to muck with > the environment. Well, not really. The only way to implement this (on Windows) would be to make a copy of the environment when the iterator started. So later mucking would not be reflected. (I'm presuming that the OS will serialize access when making the copy.) Unix may differ (I don't remember clearly how that works). > All the other predefined units that access shared resources (I/O, > directory operations, etc) go belly up if you use C code to change these > resources behind Ada's back. This package is no different, and it > should work the same. Well, perhaps in language terms, but not in practical terms. Sure if you use C to forcibly close a file handle, things will die. But if you use C in a sensible way (say to write something to a stream), nothing bad will happen. The customers would crucify us if that wasn't true. Are you saying that changing the environment is not a sensible change? **************************************************************** From: Pascal Leroy Sent: Thursday, April 22, 2004 2:08 AM > I just don't see any other value to the "Count" option. (And I agree with > you, I don't see any value to scanning the whole environment.) If you're not > scanning the entire environment, then you're accessing items by name. There > cannot be any useful purpose to accessing the third item in the environment, > because you have no control over the order in which these things are > stored. 
Agreed, all this access by position has been irritating when I wrote the AI because I had to add all these rules that say "after you do so-and-so, access by position is not going to work". I would not be opposed to removing it altogether and providing a passive iterator for the odd case where someone really wants to scan the environment. > Well, perhaps in language terms, but not in practical terms. Sure if you > use C to forcibly close a file handle, things will die. But if you use C in a > sensible way (say to write something to a stream), nothing bad will > happen. If you do a truncate or an fseek on a file that is being accessed by Ada, you're screwed, especially if buffering takes place. I think the only safe rule is that there are some files that you access exclusively with Ada and some files that you access exclusively with C. If you access the same file with both C and Ada, all bets are off. > The customers would crucify us if that wasn't true. Are you saying that > changing the environment is not a sensible change? I am saying that if you access some variables exclusively with Ada, and some variables exclusively with C, everything is safe. If there are variables that you access with both C and Ada, all bets are off. **************************************************************** From: Randy Brukardt Sent: Thursday, April 22, 2004 3:49 PM ... > Agreed, all this access by position has been irritating when I wrote the > AI because I had to add all these rules that say "after you do so-and-so, > access by position is not going to work". I would not be opposed to removing > it altogether and providing a passive iterator for the odd case where > someone really wants to scan the environment. Sounds like a plan to me. ... > I am saying that if you access some variables exclusively with Ada, and > some variables exclusively with C, everything is safe. If there are > variables that you access with both C and Ada, all bets are off. 
I understand this position, but don't really agree with it (for files or for the environment). I realize that the language standard can't take a position here, but I believe the implementation should minimize problems (so, for instance, you shouldn't use buffering on I/O). I don't suppose further discussion on this point is going to get us anywhere, and it isn't critical anyway. **************************************************************** From: David A. Wheeler Sent: Thursday, April 22, 2004 4:10 PM Removing support for positional access & the Count is fine. The primary purpose for that is to allow positional iterators and to counter certain fancy attacks. A passive iterator handles the former, and the latter case is such a special case that a developer would probably be willing to interface directly to the underlying system anyway. BUT there needs to at least be a passive iterator (like the generic). For debugging & some other situations it's sometimes very important to know exactly what the environment is. Being able to walk through it & dump it all (or programmatically select parts of it) is extremely useful. E.G., "Show me all the environment variables beginning with LD_" is useful on GNU systems. I still think it's very important to provide an interface for erasing environment variables. I guess if it can't be done on the given platform, that call could raise an exception. But for many platforms, requesting that the environment be erased _IS_ a reasonable request. It's absolutely a minimum requirement for a setuid/setgid program on Unix/Linux that executes (exec() family) other programs. Ada's type-safety is useful for secure programs, let's give those developers the tools they need to write such programs. >>I am saying that if you access some variables exclusively with Ada, and >>some variables exclusively with C, everything is safe. If there are >>variables that you access with both C and Ada, all bets are off. 
> >I understand this position, but don't really agree with it (for files or for >the environment). I realize that the language standard can't take a position >here, but I believe the implementation should minimize problems (so, for >instance, you shouldn't use buffering on I/O). Personally, I think having Ada-unique buffers separate from the system data is a recipe for subtle, nasty bugs. **************************************************************** From: David A. Wheeler Sent: Thursday, April 22, 2004 4:22 PM I should've been clearer in my last message. There is a security attack where position matters, and I didn't describe the problem. Which is why I had a positional interface. But if there's an iterator and a way to erase all environment variables, that'll be fine for most real purposes. Here are the details. On Unix/Linux, and I suspect on Windows too, environment variables are SUPPOSED to be a simple map (given a name, there is one and only one value for it in the environment). But in actuality, on Unix/Linux at least, unprivileged users can create environments where this is NOT true: a single key can have more than one value! The user can then invoke a privileged program and pass the privileged program this ugly environment. So if there is more than one value, which one is picked? The answer is... it depends. Some attacks exploit this. E.G., the checker checked value #1, but the program actually used #2, so I can now sneak malicious data past the checker to attack the program. As long as there's a way to erase environments, this isn't usually a problem. A program that gains privileges (like setuid/setgid programs on Unix) should normally extract from the environment what it needs, check the values, completely ERASE the environment, and then reset only the values it has determined are safe. Basically, you can't trust data from users unless it's carefully verified. 
These attacks & solutions are well-known in the security community, though not enough developers know about them. For more info, see: http://www.dwheeler.com/secure-programs **************************************************************** From: Pascal Leroy Sent: Friday, April 23, 2004 3:55 AM > These attacks & solutions are well-known in the > security community, though not enough developers > know about them. For more info, see: > http://www.dwheeler.com/secure-programs I certainly find your analysis convincing enough to warrant inclusion of some clearenv capability. And I prefer a passive iterator over a positional interface. ****************************************************************
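
[Editor's note: A minimal sketch of the extract, check, erase, and re-set sequence described in the preceding messages. It is written against Ada.Environment_Variables (from later revisions of the Ada standard), whose parameterless Clear erases every variable; the Ada.Environment package proposed in this AI has no such operation. The variable TERM, the accepted value "xterm", the fallback "dumb", and the names Sanitize_Sketch and Checked_Term are hypothetical and stand in for whatever policy a real program would apply.]

```ada
with Ada.Environment_Variables;
with Ada.Text_IO;

procedure Sanitize_Sketch is

   package Env renames Ada.Environment_Variables;

   --  Step 1 and 2: extract the one variable we intend to pass on and
   --  check it against a (hypothetical) list of acceptable values.
   function Checked_Term return String is
   begin
      if Env.Exists ("TERM") and then Env.Value ("TERM") = "xterm" then
         return "xterm";
      else
         return "dumb";
      end if;
   end Checked_Term;

   Safe_Term : constant String := Checked_Term;

   Remaining : Natural := 0;

   --  Counts the variables still defined after the clean-up.
   procedure Count_One (Name, Value : in String) is
   begin
      Remaining := Remaining + 1;
   end Count_One;

begin
   --  Step 3: erase everything inherited from the caller.
   Env.Clear;

   --  Step 4: re-set only the value that passed the check.
   Env.Set ("TERM", Safe_Term);

   Env.Iterate (Count_One'Access);
   Ada.Text_IO.Put_Line ("Variables left:" & Natural'Image (Remaining));
end Sanitize_Sketch;
```

How duplicate names in an inherited environment are treated by the underlying system calls is outside what this sketch can show; it only illustrates the order of the steps.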