!standard D.16.1(9/3)                                13-06-15  AI12-0033-1/05
!standard D.16.1(20/3)
!standard D.16.1(23/3)
!standard D.16.1(24/3)
!standard D.16.1(26/3)
!class binding interpretation 12-11-28
!status work item 12-06-06
!status received 12-05-29
!priority Low
!difficulty Medium
!subject Sets of CPUs when defining dispatching domains

!summary

Discontiguous sets of CPU numbers may be used when specifying a dispatching domain. A dispatching domain may be empty, but it is an error to assign a task to an empty domain.

!question

It seems limiting that dispatching domains can only be defined by ranges of CPUs. For example, in our hardware architecture, we have 4 CPUs with 4 cores each, but the numbering is not straightforward. The first CPU has the cores 0, 4, 8, 12, and the second 1, 5, 9, 13. So here it would be better to have more flexibility in assigning domains to ranges and single cores, e.g., a domain (2,6,10,14) pointing to the 2nd CPU, or (2-3,6-7,10-11,14-15) defining a domain of the 2nd and 3rd CPU.

Should an additional Create routine be defined? (Yes.)

!recommendation

A more flexible specification is proposed, allowing sets of CPUs to be specified. This may in any case be necessary to describe the set of CPUs that remain in the System dispatching domain after the other domains have been "carved" out of it.

!wording

Add after D.16.1(9/3):

    type CPU_Set is array(CPU range <>) of Boolean;

    function Create (Set : CPU_Set) return Dispatching_Domain;

    function Get_CPU_Set (Domain : Dispatching_Domain) return CPU_Set;

Modify D.16.1(20/3):

The expression specified for the Dispatching_Domain aspect of a task {type} is evaluated for each [task] object {of the task type} (see 9.1). {If the identified Dispatching_Domain is empty, then Dispatching_Domain_Error is raised; otherwise the} [The] Dispatching_Domain value is [then] associated with the task object [whose task declaration specifies the aspect].

Modify D.16.1(23/3):

The function Create {with First and Last parameters} creates and returns a Dispatching_Domain containing all the processors in the range First .. Last. {The function Create with a Set parameter creates and returns a Dispatching_Domain containing the processors for which Set(I) is True.} These processors are removed from System_Dispatching_Domain. A call of Create will raise Dispatching_Domain_Error if any designated processor is not currently in System_Dispatching_Domain, or if the system cannot support a distinct domain over the processors identified, or if a processor has a task assigned to it, or if the allocation would leave System_Dispatching_Domain empty. A call of Create will raise Dispatching_Domain_Error if the calling task is not the environment task, or if Create is called after the call to the main subprogram.

Modify D.16.1(24/3):

The function Get_First_CPU returns the first CPU in Domain; Get_Last_CPU returns the last one. {The function Get_CPU_Set(D) returns an array whose low bound is Get_First_CPU(D), whose high bound is Get_Last_CPU(D), with True values in the Set corresponding to the CPUs that are in the given Domain. If Domain is empty, then Get_Last_CPU(D) returns one less than Get_First_CPU(D), but the values returned are otherwise unspecified.}

Modify D.16.1(26/3):

A call of the procedure Assign_Task assigns task T to the CPU within Dispatching_Domain Domain. Task T can now execute only on CPU {,} unless CPU designates Not_A_Specific_CPU[,] in which case it can execute on any processor within Domain. The exception Dispatching_Domain_Error is propagated if {Domain is empty,} T is already assigned to a Dispatching_Domain other than System_Dispatching_Domain, or if CPU is not one of the processors of Domain (and is not Not_A_Specific_CPU).
A call of Assign_Task is a task dispatching point for task T unless T is inside of a protected action, in which case the effect on task T is delayed until its next task dispatching point. If T is the Current_Task the effect is immediate if T is not inside a protected action, otherwise the effect is as soon as practical. Assigning a task to System_Dispatching_Domain that is already assigned to that domain has no effect.

!discussion

There is already a problem with the current mechanism for specifying dispatching domains, in that the System_Dispatching_Domain might have holes. For example, if the program is running on a system with 8 CPUs, and a domain is Created containing CPUs 3 through 6, the System_Dispatching_Domain holds the remaining CPUs, which in this case are CPUs 1, 2, 7, and 8. Then Get_First_CPU(System_Dispatching_Domain) = 1 and Get_Last_CPU(System_Dispatching_Domain) = 8, but this surely does not completely characterize the values in the System_Dispatching_Domain. Moreover, you can create many holes in the System_Dispatching_Domain this way. So we propose to add set operations here in order to properly represent any of these items.

We have chosen a straightforward bit-vector representation of the CPU set. A more sophisticated approach is possible, but seems to be overkill. The CPU_Set array type is unconstrained, so the overall length may be kept to the minimum necessary to include all of the "True" values. Aspect Pack might be applied to CPU_Set in the private part of this package, if the implementation so chooses.

Note that if the domain specified by Create is empty, Get_Last_CPU(D) will return one less than Get_First_CPU(D), but the values returned are otherwise unspecified.

We allow empty domains to be specified because the domains might be created using information from querying the environment, and in some environments there might be insufficient CPUs to make each possible domain non-empty.
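Both situations described above, a System_Dispatching_Domain left with holes and a set that turns out to be empty on a small machine, can be illustrated with a plain Boolean array. The following is only a minimal sketch and not part of the proposed wording: it redeclares CPU_Set locally so that it compiles against an unamended System.Multiprocessors, it hard-codes the illustrative 8-CPU configuration used above, and the helpers Show and Workers are hypothetical names introduced solely for this example.

```ada
with Ada.Text_IO;            use Ada.Text_IO;
with System.Multiprocessors; use System.Multiprocessors;

procedure CPU_Set_Sketch is

   --  Same declaration as in the proposed wording, repeated locally so the
   --  sketch compiles without the amended Dispatching_Domains package.
   type CPU_Set is array (CPU range <>) of Boolean;

   --  Hypothetical helper: print the bounds and the members of a set.
   procedure Show (Name : String; Set : CPU_Set) is
   begin
      Put (Name & ":" & CPU_Range'Image (Set'First)
           & " .." & CPU_Range'Image (Set'Last) & ", members:");
      for C in Set'Range loop
         if Set (C) then
            Put (CPU_Range'Image (C));
         end if;
      end loop;
      New_Line;
   end Show;

   --  Hypothetical helper: every CPU above the first, for a CPU count that
   --  a real program would obtain from Number_Of_CPUs. With Count = 1 the
   --  range 2 .. 1 is null, so the resulting set is empty.
   function Workers (Count : CPU) return CPU_Set is
      Result : constant CPU_Set (2 .. Count) := (others => True);
   begin
      return Result;
   end Workers;

   --  Assumed 8-CPU system, initially with every CPU in the system domain.
   Remaining : CPU_Set (1 .. 8) := (others => True);

begin
   --  Carve CPUs 3 .. 6 out of the system domain.
   Remaining (3 .. 6) := (others => False);

   Show ("Remaining", Remaining);      --  bounds 1 .. 8, members 1 2 7 8
   Show ("Workers (8)", Workers (8));  --  bounds 2 .. 8, all members
   Show ("Workers (1)", Workers (1));  --  null range 2 .. 1, no members
end CPU_Set_Sketch;
```

With the amended package, sets such as these would be passed to Create (Set) or obtained from Get_CPU_Set; the sketch deliberately stops short of calling either, since both must respect the run-time restrictions given in the wording.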
The code which assigns tasks can be conditional, but it is not easy to make the declarations of dispatching domains conditional, as they must be declared and initialized using library-level declarations.

!ACATS test

An ACATS C-Test should be created (or modified) to test these additional subprograms.

!appendix

From: Ed Schonberg
Sent: Thursday, May 31, 2012  7:48 AM

Worth discussing:

I have now tried out the System.Multiprocessor package. I found a restriction in the Dispatching_Domain, where you can only assigne a range of CPUs to such a domain as I understand, going from First to Last CPU. But, for example in our hardware Architecture, we have 4 CPU's with each 4 cores, but the numbering is not straight forward. The first CPU has the cores 0, 4, 8, 12 the second 1,5,9,13. So here it will be better to have a higher flexibility in Assigning Domains to ranges and single cores e.g a Domain (2,6,10,14) pointing the 2nd CPU or (2-3,6-7,10-11,14-15) defining a Domain of the 2nd and 3rd CPU.

****************************************************************

From: Robert Dewar
Sent: Thursday, May 31, 2012  7:54 AM

He has a real point here. We should fix this. I think if we don't fix it, GNAT will probably introduce another parallel feature to fix it, and that seems undesirable!

****************************************************************

From: Tucker Taft
Sent: Thursday, May 31, 2012  8:36 AM

An alternative approach would be to perform a mapping between the Ada notion of "CPU" and the hardware one. We already do that for priority, given that in some RTOS's the most urgent priority is the lowest numerically.

****************************************************************

From: Tullio Vardanega
Sent: Thursday, May 31, 2012  8:55 AM

That reads more natural to me.

****************************************************************

From: Ed Schonberg
Sent: Thursday, May 31, 2012  9:23 AM

> An alternative approach would be to perform a mapping between the Ada
> notion of "CPU" and the hardware one.
> We already do that for priority, given that in some RTOS's the most
> urgent priority is the lowest numerically.

One more application of the principle that everything can be solved with one additional level of indirection. Does the user do the mapping, or the run-time? Given the novelty of the feature, the easiest we can make its use, the better.

****************************************************************

From: Robert Dewar
Sent: Thursday, May 31, 2012  9:28 AM

The mapping could work, but is junky and unnecessarily implementation dependent. We have always had trouble with people understanding the priority mapping. I would at least like to consider a fix?

****************************************************************

From: Jean-Pierre Rosen
Sent: Thursday, May 31, 2012  10:01 AM

> One more application of the principle that everything can be solved
> with one additional level of indirection. Does the user do the mapping, or the
> run-time? Given the novelty of the feature, the easiest we can make its use,
> the better.

I just reread the chapter (sorry, clause) and realized that a dispatching domain is defined as a range of CPUs, while it should have been defined as a set of CPUs.

Of course, since a range defines a set, it should be easy to fix it compatibly by just adding a couple of procedures.

****************************************************************

From: Geert Bosch
Sent: Thursday, May 31, 2012  10:51 AM

> I have now tried out the System.Multiprocessor package. I found a
> restriction in the Dispatching_Domain, where you can only assigne a
> range of CPUs to such a domain as I understand, going from First to
> Last CPU.
But, for example in our hardware Architecture, we have 4 > CPU's with each 4 cores, but the numbering is not straight forward. > The first CPU has the cores 0, 4, 8, 12 the second 1,5,9,13. > So here it will be better to have a higher flexibility in Assigning > Domains to ranges and single cores e.g a Domain > (2,6,10,14) pointing the 2nd CPU or (2-3,6-7,10-11,14-15) defining a > Domain of the 2nd and 3rd CPU. There is nothing that states that the CPU numbering has to match some hardware or OS numbering. Note that the whole concept of CPU is very loose to start with. A CPU might refer to a hyperthread, core, die, socket or even node on a NUMA system. Also, the number of "online" CPUs and their assignment may even change dynamically as CPUs are suspended due to errors or, more commonly nowadays, to save power. Similarly, OS commands or even hypervisors may be used to dynamically change the set of processors available to a certain Ada program. That said, I agree that it makes sense for an implementation to provide an ordering that takes topology into account. **************************************************************** From: Randy Brukardt Sent: Monday, June 4, 2012 8:25 PM ... > Of course, since a range defines a set, it should be easy to fix it > compatibly by just adding a couple of procedures. But what would they look like? Ada doesn't have a convenient general notation for sets (I can't imagine quite how to use the membership notation as a parameter to one of these procedures). The closest thing is an array aggregate, but these aren't going to be portable since the range of CPU is implementation-defined, and in any case they are a very clunky representation. If someone is advocating a change here, I think they ought to suggest a specific change as opposed to simply calling for a "fix" or "adding a couple of procedures". Things always seem more reasonable in the abstract! 
****************************************************************

From: Jean-Pierre Rosen
Sent: Tuesday, June 5, 2012  4:47 AM

> If someone is advocating a change here, I think they ought to suggest
> a specific change as opposed to simply calling for a "fix" or "adding
> a couple of procedures". Things always seem more reasonable in the abstract!

There is currently only one "create" function. I was thinking of adding variants:

   type CPU_Set is array (CPU) of Boolean;
   function Create (Set : CPU_Set) return Dispatching_Domain;
   -- DD := Create ((1 | 3 | 5 => True, others => False));

and/or:

   type CPU_List is array (Positive range <>) of CPU;
   function Create (List : CPU_List) return Dispatching_Domain;
   -- Reads better:
   -- DD := Create ((1, 3, 5));

It would presumably be useful to have "Add" procedures too (with the same run-time constraints as Create):

   procedure Add (To: in out Dispatching_Domain; First, Last : CPU);
   procedure Add (To: in out Dispatching_Domain; Set : CPU_Set);
   procedure Add (To: in out Dispatching_Domain; List : CPU_List);

Get_First_CPU and Get_Last_CPU would have to be defined as the lower and upper bounds of assigned CPUs, /with possible holes/, maybe declared obsolescent, and replaced with:

   function Get_CPU_Set (Domain : Dispatching_Domain) return CPU_Set;

(it would be possible to define an iterator to retrieve all CPUs, but I don't think it's worth the trouble - I doubt we have enough CPUs to make arrays of booleans impractical before 2020).

****************************************************************

From: Robert Dewar
Sent: Tuesday, June 5, 2012  7:23 AM

> If someone is advocating a change here, I think they ought to suggest
> a specific change as opposed to simply calling for a "fix" or "adding
> a couple of procedures". Things always seem more reasonable in the abstract!

I agree with Randy, I don't see a nice solution, and I think having a mapping for specific implementations that makes sense is good enough!

**************************************************************** From: Cousins, Jeff Sent: Wednesday, June 6, 2012 4:03 AM Sorry for joining in late on this - I'd started out on holiday walking in Spain then had to come back due to my mother suffering a stroke. Anyway, I thought I'd already brought this up (Edinburgh??) and been told not to worry, there'd be a mapping if need be. **************************************************************** From: Ed Schonberg Sent: Wednesday, June 6, 2012 4:04 PM yes, "there will be a mapping!". The issue is whether the language itself has to say something about this, or whether it is all up to the implementation. My feeling is that it is the latter, and at most some implementation advice needs to be added to the RM. The mapping will have to be hardware- and OS-dependent, and we can trust the implementors to provide something usable, without guessing what that might be in any specific case. **************************************************************** From: Alan Burns Sent: Thursday, May 31, 2012 8:01 AM [But not actually posted until June 22nd.] The thinking here was, if I recall, that there is no way of knowing how the underlying platform will refer to its cores; and hence a mapping would always be needed. So contiguous numbers in the program seems to make sense > I have now tried out the System.Multiprocessor package. I found a > restriction in the Dispatching_Domain, where you can only assigne a > range of CPUs to such a domain as I understand, going from First to > Last CPU. But, for example in our hardware Architecture, we have 4 > CPU's with each 4 cores, but the numbering is not straight forward. > The first CPU has the cores 0, 4, 8, 12 the second 1,5,9,13. > So here it will be better to have a higher flexibility in Assigning > Domains to ranges and single cores e.g a Domain > (2,6,10,14) pointing the 2nd CPU or (2-3,6-7,10-11,14-15) defining a > Domain of the 2nd and 3rd CPU. 
**************************************************************** From: Randy Brukardt Sent: Friday, June 22, 2012 4:33 PM Apologies to Alan: I just found this [the above - Editor] in my inbox, waiting to be approved. It came while I was on vacation, and I must have missed it when I checked the inbox after I got back. **************************************************************** From: Robert Dewar Sent: Saturday, June 23, 2012 3:16 PM I don't really understand the mapping idea. However you map Ada numbers to the actual threads, you may want to specify a non-contiguous sequence of the mapped numbers. > But, for example in our hardware Architecture, we have 4 CPU's with > each 4 cores, but the numbering is not straight forward. > The first CPU has the cores 0, 4, 8, 12 the second 1,5,9,13. And you might want to specify all the cores on one processor or one core from each of the 4 separate CPU's (the latter quite likely if you want to be able to avoid multi-core usage completely, as is often the case in certified programs). **************************************************************** From: Geert Bosch Sent: Saturday, June 23, 2012 9:33 PM > And you might want to specify all the cores on one processor or one > core from each of the 4 separate CPU's (the latter quite likely if you > want to be able to avoid multi-core usage completely, as is often the > case in certified programs). This last situation seems strange to me, as you surely don't want to mix unrelated tasks on different cores of the same CPU. I can understand situations where you want 4 separate tasks pinned to 4 separate CPUs, but for that you'd would either assign a specific one to each task, which is of course always possible. Alternatively, you'd assign each of the tasks to an arbitrary core of a specific CPU, for which you'd use ranges. What situation would you have in mind where you don't care which CPU your task runs on, but you want to know the task runs on core1 of that CPU? 
This seems really uncommon. Anyway, CPU sets are really troublesome, as systems can have many CPUs. Typically we don't want to set a maximum number of CPUs in advance for our programs, but maintaining large sets is too costly in many situations. Also, when distributing binary versions of compiled Ada programs, as is common, you can't really know what the number of CPUs or topology will be of the system the program will run on. Using ranges will be more portable. For example, if you run one set of tasks on the first half of the processors and the other one on the second, would make sense on virtually any multiprocessor. **************************************************************** From: Randy Brukardt Sent: Wednesday, June 27, 2012 11:50 PM ... > Also, when distributing binary versions of compiled Ada programs, as > is common, you can't really know what the number of CPUs or topology > will be of the system the program will run on. Using ranges will be > more portable. For example, if you run one set of tasks on the first > half of the processors and the other one on the second, would make > sense on virtually any multiprocessor. Certainly we're going to continue to allow using ranges to specify dispatching domains. But we realized that ranges are insufficient to characterize the System_Dispatching_Domain even with the current definition of the package, so we really have to have an option to use sets.(*) Moreover, as you say, there is no way for the implementation to really know the number of CPUs or topology -- so that makes it impossible for the implementation to map the CPUs into a contiguous range (expecially in a binary compiled Ada program). Thus, there has to be a way for a program to use discontiguous CPU numbering if that is what the program determines to make sense for the particular topology in use. So we think we *have* to have some mechanism for defining and retrieving sets of CPUs. 
(Tucker volunteered to create a solution, so don't expect to see anything soon. ;-)

Interesting that you reach the exact opposite conclusion from the same facts.

(*) Consider a program running on a system with 8 CPUs. If a domain is Created containing CPUs 3 through 6, the System_Dispatching_Domain holds the remaining CPUs -- in this case, CPUs 1, 2, 7, and 8. Get_First_CPU(System_Dispatching_Domain) = 1 and Get_Last_CPU(System_Dispatching_Domain) = 8, but this surely does not completely characterize the values in the System_Dispatching_Domain. This is only using the existing operations in package Dispatching_Domains. I can believe that you would say that it doesn't make sense to Create such a domain, but the problem is that it is legal to do so and the operations in the package need to make sense in that case. (And the alternative of restricting Create even further makes little sense; we don't want to be guessing what future topologies will look like!)

****************************************************************

From: Tucker Taft
Sent: Thursday, June 28, 2012  5:09 AM

>... So we think we *have* to have some mechanism for defining and
>retrieving sets of CPUs. (Tucker volunteered to create a solution, so
>don't expect to see anything soon. ;-) Interesting that you reach the
>exact opposite conclusion from the same facts.

If you will send me whatever rough minutes you have on this particular topic, I will send out a draft proposal sooner rather than later. As I mentioned in the meeting, I was confused by J.P. Rosen's proposal to think the problem was more complex than it really is. I now believe a very simple addition to the package will provide all of the functionality we need.

****************************************************************

From: Robert Dewar
Sent: Thursday, June 28, 2012  7:12 AM

> So we think we *have* to have some mechanism for defining and
> retrieving sets of CPUs.
(Tucker volunteered to create a solution, so > don't expect to see anything soon. ;-) Interesting that you reach the > exact opposite conclusion from the same facts. I definitely agree with Randy (and Tucker) **************************************************************** From: Randy Brukardt Sent: Thursday, June 28, 2012 6:35 PM ... > If you will send me whatever rough minutes you have on this particular > topic, I will send out a draft proposal sooner rather than later. I presume that means on December 1st rather than the 7th. ;-) [Your reputation is pretty well solidified in this area!] > As I mentioned in the meeting, I was > confused by J.P. Rosen's proposal to think the problem was more > complex than it really is. I now believe a very simple addition to > the package will provide all of the functionality we need. I think J-P was trying to show two alternative ways of specifying the mapping, and that made the suggestions more complex than had he picked something and going with it. Anyway, I've already completed the preliminary draft minutes for this AI (that's why I was possessed to write about it last night). That means that these haven't been reviewed by anyone, but they are what I have recorded for the meeting. See them below. ------------ AI12-0033-1/01 Sets of CPUs when defining dispatching How the defined numbering corresponds to the underlying system is implementation-defined. It's not necessarily a direct mapping. Randy notes that an implementation has no practical way to find out the best numbering for a particular user, when it might be controlled by a particular implementation of Linux. The number and organization of CPUs is likely to be determined at run-time; it's likely that programs are compiled to run on any appropriate target, anything from a single core processor to sets of multithreaded multicore processors. The appropriate CPU mappings may vary wildly. The set implementation is looking more interesting. 
Randy notes that a lot of operations would need to be changed.

Tucker wonders if it makes more sense to support a function to let the users update the mapping between the target's underlying numbering and the numbers that the CPU functions use. Steve notes that if a task is running, then things would be weird. This remapping operation could only be executed when no other tasks (other than the environment task) are running.

What happens if you refer to CPUs beyond Number_Of_CPUs? It seems like such values shouldn't be allowed here (it's not possible to run a task using a CPU over that value - D.16(14/3)).

After a break, Tucker notes that there is already a problem in that the System_Dispatching_Domain might have holes. For example, if the program is running on a system with 8 CPUs, and a domain is Created containing CPUs 3 through 6, the System_Dispatching_Domain holds the remaining CPUs, which in this case are CPUs 1, 2, 7, and 8. Then Get_First_CPU(System_Dispatching_Domain) = 1 and Get_Last_CPU(System_Dispatching_Domain) = 8, but this surely does not completely characterize the values in the System_Dispatching_Domain. Moreover, you can create many holes in the System_Dispatching_Domain this way.

So he thinks that it makes more sense to add set operations here in order to properly represent any of these items. And that fixes the problem that was the original question. Tucker volunteers to write up a solution on this basis.

Steve notes that Assign_Task does not work on a child task that inherits a non-system domain from its parent. The aspect can be specified (even to System Dispatching Domain), but not Assign_Task. That seems weird.

Keep AI alive: 9-0-0.

****************************************************************

From: Bob Duff
Sent: Thursday, June 28, 2012  6:53 PM

This discussion makes me think it was a mistake to have this feature in the standard in the first place. It's just too platform specific.

****************************************************************

From: Randy Brukardt
Sent: Thursday, June 28, 2012  8:02 PM

I agree. Not to mention that it's useful to only a very small percentage of applications. (Almost all applications are better off letting the scheduler - controlled via priorities or deadlines - decide what cores to use. They'll get better utilization and be much more portable to other architectures.)

But I would argue that goes for a lot of what's found in Annex D -- and I can easily imagine that the real-time folks feel the same way about the various containers and the features added to enhance them. Gotta give every constituency some bones.

****************************************************************

From: Tucker Taft
Sent: Thursday, June 28, 2012  9:03 PM

Current package starts:

   package System.Multiprocessors.Dispatching_Domains is

      Dispatching_Domain_Error : exception;

      type Dispatching_Domain (<>) is limited private;

      System_Dispatching_Domain : constant Dispatching_Domain;

      function Create (First, Last : CPU) return Dispatching_Domain;

      function Get_First_CPU (Domain : Dispatching_Domain) return CPU;

      function Get_Last_CPU (Domain : Dispatching_Domain) return CPU;
      ...

I would suggest we add:

      type CPU_Set is array(CPU range <>) of Boolean;
        [This might be declared in the System.Multiprocessors package]

      function Create (Set : CPU_Set) return Dispatching_Domain;

      function Get_CPU_Set (Domain : Dispatching_Domain) return CPU_Set;

Comments? I'll do a full AI writeup if this is what we like.

****************************************************************

From: Randy Brukardt
Sent: Thursday, June 28, 2012  9:26 PM

One hopes that the CPU_Set is packed. CPU_Set could get pretty big if we start seeing machines with 10K cores -- but I suppose we'll all be using Parasail by then.
;-) **************************************************************** From: Alan Burns Sent: Friday, June 29, 2012 7:23 AM I think what Tuck suggests is sufficient, but I think it is worth noting why we have what we have. Multiprocessor platforms come in many different forms of course. The standard simplest architecture is a pure SMP (Symmetric Multiprocessor). Here all CPUs are identical and all memory accesses are uniform. All devices are accessible from all CPUs. Next class in CC-NUMA (cache coherent non-uniform multiprocessor architecture). Here CPUS are identical but 'time to main memory' is not uniform. After that there are many forms of heterogeneous platforms with/without cache coherence. What I'm not sure about is an architecture SMP where there are level 2 caches shared between only some of the cores. So, for example a 16 core machine in which there are four banks of four that share a cache. I would prefer to put these in the CC-NUMA class. The dispatching domain proposal was designed for pure SMP. Here you only really need to know how many CPUs are in the domain. It is useful to have numbered CPUs so that a program can designate a semi-partitioned approach within a domain. The current package definition is sufficient for this usage, it was always assumed there would be an implementation defined mapping between whatever numbers/names the hardware uses and the CPU numbers in the program. But for SMPs this is sufficient. If we want the language to give some support to CC-NUMA architectures then some knowledge of the platform needs to be available. This could concern which cores share a cache; and these numbers may not be contiguous - so the use of sets is a useful extension. But we did decide before that, at this time, Ada should not support a broader set of facilities - for example a dynamic number of available CPUs, moving CPUs between domains etc. If you are still reading this - perhaps a use case for dispatching domains is useful. 
Consider a 16 core chip. One could map all tasks to just one CPU (a fully partitioned systems). Or one could allow all tasks to run on all CPUs (one dispatching domain, a fully global system). Alternatively one could use four domains of four CPUs each. An implementation COULD provide distinct ready queues for each domain which would probably be more efficient that the fully global scheme. Note for a pure SMP it does not matter which 4 CPUs are in each domain - but for CC-NUMA it might if, for example numbers 1, 5, 9 and 13 share some resource. **************************************************************** From: Tucker Taft Sent: Friday, June 29, 2012 7:50 AM > ... What I'm not sure about is an architecture SMP where there are > level > 2 caches shared between only some of the cores. So, for example a 16 > core machine in which there are four banks of four that share a cache. > I would prefer to put these in the CC-NUMA class. > > The dispatching domain proposal was designed for pure SMP. ... Unfortunately, I don't think there really will be many such machines. CC-NUMA seems much more likely, and providing support which only makes sense for "pure SMP" will simply be frustrating. Yes, perhaps we would need to provide some kind of query about cache structure or other NUMA-related information, but that seems more naturally to be something that could be in some other package, or provided outside the language completely. But the basic partitioning into dispatching domains will ultimately need to be more flexible, and I think we know this today. On the other hand, I would be surprised if we knew today how best to provide the other information about cache/NUMA structure. **************************************************************** From: Jean-Pierre Rosen Sent: Friday, June 29, 2012 3:33 PM > But the basic partitioning into dispatching domains will ultimately > need to be more flexible, and I think we know this today. 
On the > other hand, I would be surprised if we knew today how best to provide > the other information about cache/NUMA structure. Moreover, I don't see any harm in providing that facility. Maybe, with an implementation permission not to support sets for implementations where it doesn't make sense (there is always the blank permission, but an explicit one may have some value to avoid questions) **************************************************************** From: Geert Bosch Sent: Tuesday, July 3, 2012 11:02 AM > Unfortunately, I don't think there really will be many such machines. > CC-NUMA seems much more likely, and providing support which only makes > sense for "pure SMP" will simply be frustrating. Yes, perhaps we > would need to provide some kind of query about cache structure or > other NUMA-related information, but that seems more naturally to be > something that could be in some other package, or provided outside the > language completely. IMO, it should be completely outside the language. There surely is nothing to standardize at this point. > But the basic partitioning into dispatching domains will ultimately > need to be more flexible, and I think we know this today. On the > other hand, I would be surprised if we knew today how best to provide > the other information about cache/NUMA structure. We right now have machines with N sockets, each with P dies, Q cores per die, and R simultaneous (hyper) threads per core. At any point in time, any of these may be on-line or off-line, especially for power-saving purposes. Add to this the prevalence of hypervisors, commonly on large machines, and even increasingly on embedded systems for partitioning purposes, and it becomes clear that even the notion of Number_Of_CPUs is not straightforward. A facility that may make sense for one system, may not make sense on another. So, I think there should first be more experience with actual implementations. 
Only when some approach turns out to be generally useful, should we consider standardizing it. **************************************************************** From: Randy Brukardt Sent: Tuesday, July 3, 2012 1:23 PM ... > A facility that may make sense for one system, may not make sense on > another. So, I think there should first be more experience with actual > implementations. Only when some approach turns out to be generally > useful, should we consider standardizing it. I totally agree with this. And that means that the entire dispatching domain mechanism should not be part of the language (there is no evidence that it is "generally useful" at this point). (Raw CPU affinity -- that is package System.Multiprocessors -- is much more obviously useful -- and even it is marginal. I think we ought to have let people build on that before offering any other facilities, and waited for more experience to determine what is *really* needed.) Be that as it may, dispatching domains *are* part of the language. And it's plenty obvious that they are too hard to use for their intended purpose. And on top of that, you can't reliably find out the contents of the System_Dispatching_Domain. So it should be obvious that the facility is inadequate for any use in its current form. The claim that dispatching domains are only for homogeneous machines is laughable. Dispatching domains could only be useful for machines with many CPUs, yet there are almost no homogeneous machines with more than 4 CPUs. Dispatching domains are complete overkill for a quadcore system like my desktop; basic CPU affinity is sufficient (if any CPU mapping is required at all, which is itself a dubious proposition). So if this feature is useful at all (and one has to presume it is, else it would not be part of the standard), it has to work on non-homogeneous systems. It's surely premature to go beyond that fix at this point. And no one has seriously suggested doing so.
But it doesn't matter anyway; we get these Annex D features almost full grown from IRTAW and we (the ARG) have almost always deferred to those experts as to whether to add them. The ARG just gets to clean up the mess afterwards. I very much doubt that the ARG would create any Annex D feature from air -- it's never happened the entire time I've been on the ARG. In any case, it's not clear what you're arguing for anymore -- are you arguing that we shouldn't even fix Dispatching_Domains so that they work for their intended purpose, or are you arguing against the straw man of additional operations? **************************************************************** From: Geert Bosch Sent: Tuesday, July 3, 2012 10:42 PM > In any case, it's not clear what you're arguing for anymore -- are you > arguing that we shouldn't even fix Dispatching_Domains so that they > work for their intended purpose, or are you arguing against the straw > man of additional operations? Clearly, what is currently in the standard is there to stay. With an appropriate implementation-defined mapping between the Ada notion of CPU and the system on which the program executes, the Dispatching Domains can be used as intended. Such a mapping provides the extra level of indirection that can be used to allow dispatching domains to map on arbitrary sets of physical processors. Further requests for standardization should wait until there is more experience with implementing and using the current standard. So yes, we shouldn't even "fix" Dispatching_Domains, as it is not clear what needs fixing and what the proper fix is. **************************************************************** From: Jean-Pierre Rosen Sent: Wednesday, July 4, 2012 12:08 AM > Further requests for standardization should wait until there is more > experience with implementing and using the current standard. > So yes, we shouldn't even "fix" Dispatching_Domains, as it is not > clear what needs fixing and what the proper fix is. 
In general, I would agree with such a position, but for the case at hand, it's just fixing an obvious omission. Let's face it: dispatching domains /are/ sets of processors, it's just that currently we can specify only sets made of consecutive CPU numbers (but we can indirectly create a set of non-consecutive CPU numbers). The proposal is just about adding trivial subprograms that don't change the notion of dispatching domains in any way. **************************************************************** From: Bob Duff Sent: Wednesday, July 4, 2012 9:38 AM >... So if this feature is useful at > all (and one has to presume it is, else it would not be part of the >standard), ... Please do not make such presumptions! **************************************************************** From: Robert Dewar Sent: Wednesday, July 4, 2012 10:03 AM From a formal point of view, it's a reasonable presumption :-) **************************************************************** From: Robert Dewar Sent: Wednesday, July 4, 2012 11:08 AM By which I mean, by including a feature the ARG has made a presumptive assumption that the feature is useful. Now it may turn out that it's not as useful as initially thought, but then the cure is to make it useful, not **************************************************************** From: Tucker Taft Sent: Thursday, July 5, 2012 9:04 AM > Clearly, what is currently in the standard is there to stay. > With an appropriate implementation-defined mapping between the Ada > notion of CPU and the system on which the program executes, the > Dispatching Domains can be used as intended. For me, the convincing argument for adding the more general set notion was that the System dispatching domain can end up with "holes" in it. Therefore, we either need to somehow prevent that from happening, or provide some way to represent a non-contiguous sequence of CPU ids. 
Since it is difficult for implementations to guess what sort of "level of indirection" would be ideal for all applications, it seemed simplest to just augment the basic capability, so it is possible to talk about "holey" sets. It really isn't that big a deal to add these two interfaces, and without it, or some other set of restrictions on how the system domain can be "carved up," the current capability is broken, in my view. **************************************************************** From: Robert Dewar Sent: Thursday, July 5, 2012 9:26 AM > For me, the convincing argument for adding the more general set notion > was that the System dispatching domain can end up with "holes" in it. > Therefore, we either need to somehow prevent that from happening, or > provide some way to represent a non-contiguous sequence of CPU ids. > Since it is difficult for implementations to guess what sort of "level > of indirection" would be ideal for all applications, it seemed > simplest to just augment the basic capability, so it is possible to > talk about "holey" sets. I agree with Tuck on this > It really isn't that big a deal to add these two interfaces, and > without it, or some other set of restrictions on how the system domain > can be "carved up," the current capability is broken, in my view. Indeed **************************************************************** From: Geert Bosch Sent: Thursday, July 5, 2012 9:42 AM > In general, I would agree with such a position, but for the case at > hand, it's just fixing an obvious omission. Let's face it: dispatching > domains /are/ sets of processors, it's just that currently we can > specify only sets made of consecutive CPU numbers (but we can > indirectly create a set of non-consecutive CPU numbers). How is that? A Dispatching_Domain is created as range and cannot change. Despite the fact that Assign_Task has the domain as "in out", one cannot use it to change the set of CPUs, see D.16.1(26). 
As far as I see from the standard, each CPU belongs to exactly one dispatching domain. So, by renumbering CPUs, one can always get each dispatching domain to be represented by a range which would make it pointless to use sets anyway. > The proposal is just about adding trivial subprograms that don't > change the notion of dispatching domains in any way. I don't quite see the notion in the Ada RM that dispatching domains are sets. For our compiler, we support 65536 processors. If we'd be required to implement Dispatching_Domain as set, each variable of the type would require 8 kB. On the other hand, if Dispatching_Domain is a range, it just requires 4 bytes and is far more efficient to deal with. **************************************************************** From: Robert Dewar Sent: Thursday, July 5, 2012 9:49 AM ... > How is that? A Dispatching_Domain is created as range and cannot change. > Despite the fact that Assign_Task has the domain as "in out", one > cannot use it to change the set of CPUs, see D.16.1(26). As far as I > see from the standard, each CPU belongs to exactly one dispatching domain. > So, by renumbering CPUs, one can always get each dispatching domain to > be represented by a range which would make it pointless to use sets > anyway. But who would do the renumbering? I really don't understand the point Geert is trying to make here. **************************************************************** From: Edmond Schonberg Sent: Thursday, July 5, 2012 10:12 AM ... > I don't quite see the notion in the Ada RM that dispatching domains are sets. > For our compiler, we support 65536 processors. If we'd be required to > implement Dispatching_Domain as set, each variable of the type would require 8 kB. > On the other hand, if Dispatching_Domain is a range, it just requires > 4 bytes and is far more efficient to deal with. The problem is the System_Dispatching_Domain, which originally includes all processors. 
If you carve out of it a specific domain with a range, the System_Dispatching_Domain becomes disjoint. So right away you need the notion of a set. You might as well allow it for all domains. **************************************************************** From: Geert Bosch Sent: Thursday, July 5, 2012 10:15 AM >> How is that? A Dispatching_Domain is created as range and cannot change. >> Despite the fact that Assign_Task has the domain as "in out", one >> cannot use it to change the set of CPUs, see D.16.1(26). As far as I >> see from the standard, each CPU belongs to exactly one dispatching domain. >> So, by renumbering CPUs, one can always get each dispatching domain >> to be represented by a range which would make it pointless to use >> sets anyway. > > But who would do the renumbering? I really don't understand the point > Geert is trying to make here. It has been asserted that a Dispatching_Domain refers to a set. In my reading of the language a Dispatching_Domain always refers to a contiguous range of CPUs. I only used the renumbering argument to show that it doesn't make sense to talk about sets if every CPU is contained in exactly one domain. Forget about that, if you wish. The current RM does not allow one to create a "holey" Dispatching_Domain. I don't want to add extra subprograms that suddenly enable that capability. **************************************************************** From: Jeff Cousins Sent: Thursday, July 5, 2012 10:22 AM > For our compiler, we support 65536 processors. If we'd be required to > implement Dispatching_Domain as set, each variable of the type would require 8 > kB. On the other hand, if Dispatching_Domain is a range, it just requires 4 > bytes and is far more efficient to deal with. How often would people be creating variables of the type? I would have thought that assigning tasks to domains would be controlled from the main procedure (running under the environment task) during start-up, rather than all over the place.
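[Editor's sketch: the following self-contained Ada program is only an illustration of two points made above, not proposed wording. It models Ed Schonberg's observation that carving a range out of the system domain leaves a non-contiguous remainder, and it reproduces the arithmetic behind the "8 kB" figure (65536 one-bit flags divided by 8 bits per byte). The names CPU, CPU_Set, System_Domain, Carve and Is_Contiguous are local stand-ins declared here for the demonstration; they are assumptions and are not the declarations of D.16.1.]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Carve_Demo is
   --  Hypothetical stand-ins for illustration only; these are NOT the
   --  declarations of System.Multiprocessors.Dispatching_Domains.
   type CPU is range 1 .. 16;
   type CPU_Set is array (CPU range <>) of Boolean;

   --  Initially every CPU belongs to the (modelled) system domain.
   System_Domain : CPU_Set (CPU) := (others => True);

   --  Size of a naive bit set covering 65536 processors, in bytes.
   Bits_Needed  : constant := 65_536;
   Bytes_Needed : constant := Bits_Needed / 8;

   --  Remove First .. Last from the system domain, as a range-based
   --  Create would do.
   procedure Carve (First, Last : CPU) is
   begin
      for I in First .. Last loop
         System_Domain (I) := False;
      end loop;
   end Carve;

   --  True if the members of S form at most one unbroken run of CPUs.
   function Is_Contiguous (S : CPU_Set) return Boolean is
      Runs     : Natural := 0;
      Previous : Boolean := False;
   begin
      for I in S'Range loop
         if S (I) and not Previous then
            Runs := Runs + 1;  --  a new run of members starts here
         end if;
         Previous := S (I);
      end loop;
      return Runs <= 1;
   end Is_Contiguous;
begin
   Carve (5, 8);
   Put_Line ("System domain contiguous after carving 5 .. 8: "
             & Boolean'Image (Is_Contiguous (System_Domain)));
   Put_Line ("Bytes for a 65536-bit set:"
             & Integer'Image (Bytes_Needed));
end Carve_Demo;
```

[After the call to Carve, CPUs 1 .. 4 and 9 .. 16 remain, so the remainder can no longer be described by a single First .. Last pair; that is the case a set-valued query such as the proposed Get_CPU_Set is meant to cover.]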
**************************************************************** From: Robert Dewar Sent: Thursday, July 5, 2012 10:59 AM > I only used the renumbering argument to show that it doesn't make > sense to talk about sets if every CPU is contained in exactly one domain. > Forget about that, if you wish. The current RM does not allow one to > create a "holey" Dispatching_Domain. I don't want to add extra > subprograms that suddenly enable that capability. OK, so that's the focus of disagreement, I do want to add such subprograms, because it seems easy to do and useful. **************************************************************** From: Geert Bosch Sent: Thursday, July 5, 2012 11:06 AM > The problem is the System_Dispatching_Domain, which originally includes all processors. > If you carve out of it a specific domain with a range, the System_Dispatching_Domain > becomes disjoint. So right away you need the notion of a set. You might as well allow > it for all domains. Ah, the "you might as well" argument :-) So, while I see now how one could leave the System_Dispatching_Domain with holes, an implementation does not need to support this. The simplest one is for an implementation to only support a domain that leaves the System_Dispatching_Domain contiguous. I believe the text in D.16.1(23) allows this. The other obvious option is to only allow a small number of dispatching domains (say 4), and just check each in order to check if the CPU is in that domain. This is still very efficient at run time and doesn't require a lot of space, even for large numbers of CPUs. Given that each processor belongs to exactly one dispatching domain, I honestly don't see what using sets would accomplish. **************************************************************** From: Jean-Pierre Rosen Sent: Thursday, July 5, 2012 11:08 AM > I only used the renumbering argument to show that it doesn't make > sense to talk about sets if every CPU is contained in exactly one domain.
> Forget about that, if you wish. The current RM does not allow one to > create a "holey" Dispatching_Domain. I don't want to add extra > subprograms that suddenly enable that capability. But that's the wrong hypothesis. The current RM DOES allow "holey" Dispatching_Domain. **************************************************************** From: Tucker Taft Sent: Thursday, July 5, 2012 11:31 AM Note that the proposed interface allows the set to be specified by an array that is no larger than necessary. That is, the bounds of the boolean array can be something other than 1..Num_CPUs. For example: My_Holey_Domain : Dispatching_Domain := Create((1|5|9|13 => True, 2..4|6..8|10..12 => False)); **************************************************************** From: Randy Brukardt Sent: Thursday, July 5, 2012 12:28 PM > > I only used the renumbering argument to show that it doesn't make > > sense to talk about sets if every CPU is contained in > exactly one domain. > > Forget about that, if you wish. The current RM does not > allow one to > > create a "holey" Dispatching_Domain. I don't want to add extra > > subprograms that suddenly enable that capability. > > But that's the wrong hypothesis. The current RM DOES allow "holey" > Dispatching_Domain. He's technically right. The definition of Create includes: "A call of Create will raise Dispatching_Domain_Error if... the system cannot support a distinct domain over the processors identified..." That allows the implementation to reject any call on Create that it likes (certainly including the holey System_Dispatching_Domain). The problem is that this permission opens a truck-sized hole in the definition; it essentially means that supporting this package is optional (the system never *has* to support creating a dispatching domain). I don't think the intent was that this package was optional -- if that was the intent, it should have been made explicit. 
As such, I think this sentence should be removed and replaced by something specific in order to explain what is and is not allowed. Because the entire reason to standardize on a facility is to provide portability between implementations. If the implementations aren't even required to support basic functionality, then using this facility is just another way to force vendor-lock-in. (Especially as using the facility is very unlikely, almost no programs have any reason to use it -- especially as it destroys all of the scheduling guarantees given elsewhere in the Standard.) Indeed, the permission of D.16.1(33/3) seems to be all that is required. So perhaps we don't need the sentence at all?? In any case, if basic use of this package isn't portable at all, then I don't think it even belongs in the language. What could possibly be the point? The Standard isn't about defining facilities that might be interesting to have sometimes!! **************************************************************** From: Randy Brukardt Sent: Thursday, July 5, 2012 12:49 PM ... > So, by renumbering CPUs, one can always get each dispatching domain to > be represented by a range which would make it pointless to use sets > anyway. The problem with that is who does the renumbering? You convinced us that it is impossible for the implementation to do such a renumbering -- it probably doesn't even know the architecture involved. Which leaves it to the user. If the user has to use some implementation-defined mechanism to do such a remapping, all portability is lost. So if such a facility is required, it has to be within the language. Indeed, we considered adding such a facility to the existing packages, but it seemed complex and it didn't seem any better than allowing sets in the first place. > > The proposal is just about adding trivial subprograms that don't > > change the notion of dispatching domains in any way. > > I don't quite see the notion in the Ada RM that dispatching domains are > sets. 
For our compiler, we support 65536 processors. Oh, the old "it hurts when I do this" argument. > If we'd be required to implement Dispatching_Domain as set, each > variable of the type would require 8 kB. Not at all; you can represent the set any way you want. We don't have a problem with Wide_Character_Set, even though it is logically the same size (64K bits), because we use more compact representations than that. And we even have Wide_Wide_Character_Set, which would be practically impossible to represent as an array of bits. > On the other hand, if Dispatching_Domain is a range, it just requires > 4 bytes and is far more efficient to deal with. Well, unless we remove the permission to reject from D.16.1(23) [which I think we should do, but I can believe that won't happen], I don't think there will be a real requirement to support a set. (I would expect that the set Create would have the same permissions.) OTOH, given that Robert and Ed both support adding these interfaces, as does the majority of the remainder of the ARG, I suspect you will be directed to implement it. So you need to figure out how to do it efficiently. :-) [Also note, so long as the permission to reject exists, you can easily represent this set as a fixed number of ranges; if someone tries to use a more complex set, you can just reject. Also, as these are limited types with no assignment, a small amount of dynamic allocation is unlikely to be a problem, since you don't have to support assignment nor streaming nor any changes after initialization (and thus not even deallocation is needed). And this can be "faked" with a permanent "pool" of ranges to "allocate" from (just like you would do in SPARC or Fortran 66).] **************************************************************** From: Bob Duff Sent: Thursday, July 5, 2012 12:42 PM > > I don't think the intent was that this package was optional I can't get too excited about that. Everything in every standard is optional, despite all the "shall"s.
The police won't arrest you if you disobey the Ada RM. ;-) We trust implementations not to abuse their right to disobey. >... (Especially as using the facility is > > very unlikely, almost no programs have any reason to use it Another reason not to be excited about this issue. > In any case, if basic use of this package isn't portable at all, then I > don't think it even belongs in the language. It probably doesn't belong. >... What could possibly be the > point? The Standard isn't about defining facilities that might be > interesting to have sometimes!! You mean like the _optional_ annex in which this facility lives? ;-) Having said all that, Tucker's proposal seems pretty simple and harmless, and trivial to implement, so I don't object to including it. Alternatively, we could consider removing this facility as a binding interpretation. A few years from now, that would be an unacceptable incompatibility, but at this point, with Ada 2012 not yet approved, and nobody using this package... **************************************************************** From: Robert Dewar Sent: Thursday, July 5, 2012 1:22 PM > In any case, if basic use of this package isn't portable at all, then > I don't think it even belongs in the language. What could possibly be > the point? The Standard isn't about defining facilities that might be > interesting to have sometimes!! Well one hopes that implementations try to be consistent, and the requirement means that *something* has to be provided. We have plenty of cases of things that aren't portable at all. package Machine_Code is the most obvious case. **************************************************************** From: Geert Bosch Sent: Thursday, July 26, 2012 1:14 PM As far as sets vs. ranges are concerned, the topology of typical shared memory systems can be given as a tree:

                      [System]
                      /      \
               [node1] . . . [nodeN]
               /     \
        [socket1] . . . [socketN]
         /     \
     [die1] . . . [dieN]
      /    \
   [core1] . . . [coreN]
    /     \
[thread1] . . . [threadN]

The threads at the leaves represent simultaneously executing threads (such as Intel's hyper threading), while higher levels represent multi-core up to inter-node shared memory connects. The term "processor" or CPU is ambiguous, and is used to refer to different levels in the hierarchy. While it may be reasonable to equate the Ada notion of processor or CPU to each of the nodes * sockets * dies * cores * threads execution leaves, that is not the only possible or reasonable one. In particular, it is an unreasonable restriction and expectation that there is a fixed 1-to-1 mapping between system threads and Ada processors. Even the Linux OS kernel does not define a specific mapping between its concept of CPU number and actual sockets or cores. With the widespread use of hypervisors and virtualization, even in the real-time embedded systems that Dispatching_Domains are meant for, there may not even be the possibility to guarantee a fixed mapping. If we accept that the Ada notion of processor/CPU only relates to some execution resource that has a single running task associated with it, and if we accept that a CPU can only belong to a single dispatching domain, it becomes clear that allowing users to specify sets doesn't achieve anything. The Ada implementation, or the OS or hypervisor it runs in, can always change the mapping in an arbitrary way. A good implementation (whether the Ada run time, or the OS kernel) would indeed do that in such a manner that each dispatching domain would have maximal locality by using execution resources on the same socket, die or core. Anything else would mean that a program tuned for one specific observed mapping between CPU numbers and execution resources might perform horribly on any other system, or even on the same system on a different day. This would mean writing portable code using Dispatching_Domains would be impossible.
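[Editor's sketch: to make the remapping argument concrete, the program below takes the layout described in the !question (4 sockets of 4 cores, with socket S owning hardware cores S, S + 4, S + 8 and S + 12) and applies one possible renumbering under which each socket becomes a contiguous range of logical CPU numbers. The function To_Logical and the subtypes Hardware_Core and Logical_CPU are hypothetical names invented for this illustration. Nothing in the RM, nor in any implementation discussed in this thread, is claimed to perform this particular mapping.]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Renumber_Demo is
   --  Assumed layout, taken from the !question: socket S (counting from 0)
   --  owns hardware cores S, S + 4, S + 8 and S + 12.
   Sockets          : constant := 4;
   Cores_Per_Socket : constant := 4;

   subtype Hardware_Core is Natural  range 0 .. Sockets * Cores_Per_Socket - 1;
   subtype Logical_CPU   is Positive range 1 .. Sockets * Cores_Per_Socket;

   --  Hypothetical mapping: number the cores socket by socket, so that
   --  each socket becomes one contiguous range of logical CPU numbers.
   function To_Logical (Core : Hardware_Core) return Logical_CPU is
      Socket : constant Natural := Core mod Sockets;  --  owning socket
      Index  : constant Natural := Core / Sockets;    --  position in socket
   begin
      return Socket * Cores_Per_Socket + Index + 1;
   end To_Logical;
begin
   for Socket in 0 .. Sockets - 1 loop
      Put ("Socket" & Integer'Image (Socket) & " -> logical CPUs");
      for Index in 0 .. Cores_Per_Socket - 1 loop
         --  Hardware core number of the Index-th core on this socket.
         Put (Integer'Image (To_Logical (Socket + Index * Sockets)));
      end loop;
      New_Line;
   end loop;
end Renumber_Demo;
```

[Under this assumed mapping socket 0 becomes logical CPUs 1 .. 4, socket 1 becomes 5 .. 8, and so on, so a range-based Create would suffice. The open question in the thread is who is in a position to supply such a mapping: the implementation, the OS, or the user.]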
**************************************************************** From: Randy Brukardt Sent: Thursday, July 26, 2012 2:27 PM ... > If we accept that the Ada notion of processor/CPU only relates to some > execution resource that has a single running task associated with it, > and if we accept that a CPU can only belong to a single dispatching > domain, it becomes clear that allowing users to specify sets doesn't > achieve anything. The Ada implementation, or the OS or hypervisor it > runs in, can always change the mapping in an arbitrary way. If it does, then the user can't use this for any sort of tuning, which seems to ensure that it can't be used for anything. And I think we all agree that existing OSes (and hypervisors) are incompatible with strict Annex D semantics, so we're really only talking about purpose-built Annex-D compliant kernels. You can't say anything interesting about programs that run on Windows or Linux or VMWare. > A good implementation (whether the Ada run time, or the OS > kernel) would indeed do that in such a manner that each dispatching > domain would have maximal locality by using execution resources on the > same socket, die or core. Anything else would mean that a program > tuned for one specific observed mapping between CPU numbers and > execution resources might perform horribly on any other system, or > even on the same system on a different day. This would mean writing > portable code using Dispatching_Domains would be impossible. I don't understand your thinking. Of course writing portable code using Dispatching_Domains is impossible. The entire point of using facilities like this is to tightly tune an application for a particular piece of hardware. Moving it to another piece of hardware requires retuning; the only thing that is "portable" is that the application will most likely still continue to run on another piece of hardware (but with substantially degraded performance).
If the underlying system is doing all of this remapping anyway, there is no real possibility of tuning (because as soon as you do, the system will remap and the work to tune will go out the window). Moreover, on such a system, 98% of programs are better off letting the system do the tuning on the fly -- it has far more information than the programmer could use. (This is very similar to optimizations, almost always the programmer is better off letting the compiler decide on optimizations rather than trying to force them by recoding.) Using dispatching domains in such a system would almost always make performance worse. D.16 facilities are clearly a very specialized need that ought to be only used in rare instances. And in those instances, performance will clearly trump portability, so little or none is required. (Which is why I don't think Dispatching_Domains should be in the language at all, but I digress.) And the sorts of systems on which it even makes sense to talk about it are very limited kernels - no fancy OSes or hypervisors in sight. **************************************************************** From: Robert Dewar Sent: Thursday, July 26, 2012 2:44 PM > I don't understand your thinking. Of course writing portable code > using Dispatching_Domains is impossible. The entire point of using > facilities like this is to tightly tune an application for a particular piece of hardware. > Moving it to another piece of hardware requires retuning; the only > thing that is "portable" is that the application will most likely > still continue to run on another piece of hardware (but with > substantially degraded performance). Or just incorrect performance. For instance we may assign one task to a single processor; if that does not map properly on the new hardware, that one task might be preempted, and the application may assume this is impossible. > D.16 facilities are clearly a very specialized need that ought to be > only used in rare instances.
And in those instances, performance will > clearly trump portability, so little or none is required. (Which is > why I don't think Dispatching_Domains should be in the language at > all, but I digress.) And the sorts of systems on which it even makes > sense to talk about it are very limited kernels - no fancy OSes or hypervisors in sight. The one thing about having dispatching domains in the language is that it means Ada programmers know about it and know to go and look for the feature and how it is implemented. If they are not in the language, you are counting on the programmer to read all the documentation and find out about the specialized facility for a particular implementation. That's unlikely, you would be amazed how many of our support questions are of the form: Is there a way to do xxx answer: yes, this is a standard part of the GNAT implementation, as follows .... P.S. this is documented in section xxxx of yyyy manual. **************************************************************** From: Geert Bosch Sent: Thursday, July 26, 2012 3:14 PM >> If we accept that the Ada notion of processor/CPU only relates to >> some execution resource that has a single running task associated >> with it, and if we accept that a CPU can only belong to a single >> dispatching domain, it becomes clear that allowing users to specify >> sets doesn't achieve anything. The Ada implementation, or the OS or >> hypervisor it runs in, can always change the mapping in an arbitrary >> way. > > If it does, then the user can't use this for any sort of tuning, which > seems to ensure that it can't be used for anything. And I think we all > agree that existing OSes (and hypervisors) are incompatible with > strict Annex D semantics, so we're really only talking about > purpose-built Annex-D compliant kernels. You can't say anything > interesting about programs that run on Windows or Linux or VMWare. Linux supports running real-time programs. Of course, this requires root privilege.
If I use Linux for driving my 3D printer, I may want to use one or two processors using FIFO scheduling for controlling the stepper motors, monitoring heaters and sensors, while I want to use the remaining processors for user-interface, model slicing, network communication and such. Using today's language that should be possible. The comment from the early adopter that started this thread just reflects that GNAT doesn't do a good job implementing dispatching domains yet. That will change with time. >> A good implementation (whether the Ada run time, or the OS >> kernel) would indeed do that in such a manner that each dispatching >> domain would have maximal locality by using execution resources on >> the same socket, die or core. Anything else would mean that a program >> tuned for one specific observed mapping between CPU numbers and >> execution resources might perform horribly on any other system, or >> even on the same system on a different day. This would mean writing >> portable code using Dispatching_Domains would be impossible. > > I don't understand your thinking. Of course writing portable code > using Dispatching_Domains is impossible. The entire point of using > facilities like this is to tightly tune an application for a particular piece of hardware. > Moving it to another piece of hardware requires retuning; the only > thing that is "portable" is that the application will most likely > still continue to run on another piece of hardware (but with > substantially degraded performance). I disagree. I think it is perfectly possible to write portable code using dispatching domains on top of a real time operating system such as Linux. This thread started with an early adopter trying to do just that. While indeed he didn't have a good experience yet, that was just because of a poor implementation.
> If the underlying system is doing all of this remapping anyway, there > is no real possibility of tuning (because as soon as you do, the > system will remap and the work to tune will go out the window). > Moreover, on such a system, 98% of programs are better off letting the > system do the tuning on the fly > -- it has far more information than the programmer could use. (This is > very similar to optimizations, almost always the programmer is better > off letting the compiler decide on optimizations rather than trying to > force them by > recoding.) Using dispatching domains in such a system would almost > always make performance worse. No, I think dispatching domains make sense on general purpose real time operating systems. As for specialized Ada implementations for very small custom kernels, they can always allow the user to specify a mapping between Ada processor numbers and hardware threads, though I think that a sensible default mapping (walking the CPU tree depth first) will always suffice. > D.16 facilities are clearly a very specialized need that ought to be > only used in rare instances. And in those instances, performance will > clearly trump portability, so little or none is required. (Which is > why I don't think Dispatching_Domains should be in the language at > all, but I digress.) And the sorts of systems on which it even makes > sense to talk about it are very limited kernels - no fancy OSes or > hypervisors in sight I'd reverse this argument: it only makes sense to have Dispatching_Domains in the language if we allow them to make sense on real-time operating systems such as Linux and VxWorks. **************************************************************** From: Randy Brukardt Sent: Thursday, July 26, 2012 3:50 PM ... > > If it does, then the user can't use this for any sort of tuning, > > which seems to ensure that it can't be used for anything. 
> > And I think we all agree that existing OSes (and hypervisors) are
> > incompatible with strict Annex D semantics, so we're really only
> > talking about purpose-built Annex-D compliant kernels. You can't say
> > anything interesting about programs that run on Windows or Linux or
> > VMWare.
>
> Linux supports running real-time programs.

Sure. But it doesn't support strict Annex D compliance. So it's not relevant
here.

> Of course, this requires root privilege. If I use Linux for driving my
> 3D printer, I may want to use one or two processors using FIFO
> scheduling for controlling the stepper motors, monitoring heaters and
> sensors, while I want to use the remaining processors for
> user-interface, model slicing, network communication and such. Using
> today's language that should be possible.

Sure, but that's just using Annex D to define (hopefully thin) interfaces to
whatever the underlying kernel provides. All of the semantics described in
Annex D are irrelevant in that case. (You can't possibly figure out whether
the OS actually supports everything correctly - it could change with every
OS update - so you have two choices: (A) lie, or (B) don't claim Annex D
compliance. Most implementations used to do (A), but I think more are now
leaning toward (B), which is much better IMHO, both for the implementer and
the user.)

...

> >> A good implementation (whether the Ada run time, or the OS
> >> kernel) would indeed do that in such a manner that each dispatching
> >> domain would have maximal locality by using execution resources on
> >> the same socket, die or core. Anything else would mean that a
> >> program tuned for one specific observed mapping between CPU numbers
> >> and execution resources might perform horribly on any other system,
> >> or even on the same system on a different day. This would mean
> >> writing portable code using Dispatching_Domains would be impossible.
> >
> > I don't understand your thinking.
> > Of course writing portable code using Dispatching_Domains is
> > impossible. The entire point of using facilities like this is to
> > tightly tune an application for a particular piece of hardware.
> > Moving it to another piece of hardware requires retuning; the only
> > thing that is "portable" is that the application will most likely
> > still continue to run on another piece of hardware (but with
> > substantially degraded performance).
>
> I disagree. I think it is perfectly possible to write portable code
> using dispatching domains on top of a real time operating system such
> as Linux.
> This thread started with an early adopter trying to do just that.
> While indeed he didn't have a good experience yet, that was just
> because of a poor implementation.

The implementation doesn't sound "poor" to me. Your imaginary "good"
implementation sounds harmful to me. If you're implementing on top of an OS,
you want packages like Dispatching_Domains to be as thin as possible a
binding to the underlying OS's facilities. (Much like Stream_IO and
Ada.Directories.) Because your users are going to want to use the features
of the kernel, not some weird Ada mapping of them. So any mapping beyond
what the OS does is much more likely to be harmful than helpful.

> > If the underlying system is doing all of this remapping anyway,
> > there is no real possibility of tuning (because as soon as you do,
> > the system will remap and the work to tune will go out the window).
> > Moreover, on such a system, 98% of programs are better off letting
> > the system do the tuning on the fly -- it has far more information
> > than the programmer could use. (This is very similar to
> > optimizations, almost always the programmer is better off letting
> > the compiler decide on optimizations rather than trying to force
> > them by recoding.) Using dispatching domains in such a system would
> > almost always make performance worse.
>
> No, I think dispatching domains make sense on general purpose real
> time operating systems. As for specialized Ada implementations for
> very small custom kernels, they can always allow the user to specify a
> mapping between Ada processor numbers and hardware threads, though I
> think that a sensible default mapping (walking the CPU tree depth
> first) will always suffice.

That sort of extra level of mapping is precisely what we *don't* want. You
end up with some implementation-defined control mechanism on top of
Dispatching_Domains, and you have to use both to do any tuning. How that
helps portability is beyond me.

> > D.16 facilities are clearly a very specialized need that ought to be
> > only used in rare instances. And in those instances, performance
> > will clearly trump portability, so little or none is required.
> > (Which is why I don't think Dispatching_Domains should be in the
> > language at all, but I digress.) And the sorts of systems on which
> > it even makes sense to talk about it are very limited kernels - no
> > fancy OSes or hypervisors in sight
>
> I'd reverse this argument: it only makes sense to have
> Dispatching_Domains in the language if we allow them to make sense on
> real-time operating systems such as Linux and VxWorks.

For that, only the interface matters (the semantics will be defined by the
underlying system, not anything written in the RM). And the more flexible
the interface, the better it can map to the underlying facilities.

P.S. Why are you still fighting this battle that's lost? The rest of us have
long since agreed and moved on. Sheesh.

****************************************************************

[Some possibly relevant discussion that happened in parallel with the above
can be found in AI12-0048-1 - Editor.]

****************************************************************

From: Tucker Taft
Sent: Wednesday, November 28, 2012 7:34 PM

Here is a rewrite of AI12-0033 to propose a set representation.
It is just a simple boolean array indexed by CPU. It is unconstrained so the
array can be as short as possible so as to only include the True values. It
could of course also be packed if the implementation so chose.

I considered a more complex representation but it seemed to be overkill.
Even if there were thousands of CPUs, the number of dispatching domains is
likely to be pretty small, so the total amount of space devoted to these
bit-vectors would never amount to much.

[This is version /02 of the AI - Editor.]

****************************************************************

From: Randy Brukardt
Sent: Wednesday, November 28, 2012 9:55 PM

...

> !question
> ...
> Should an additional Create routine be defined? (No.)

And then...

> !wording
>
> After 9/3:
>    type CPU_Set is array(CPU range <>) of Boolean;
>
>    function Create (Set : CPU_Set) return Dispatching_Domain;

Humm -- this looks like an additional Create routine. ;-) It appears that
the answer to the question should have been changed.

It also would have helped if you had converted this to the form of a Binding
Interpretation -- it surely isn't a Ramification any more.

I've made these corrections in the version I filed. (And sent this message
to make it clear that I made these changes to the version you sent.)

****************************************************************

From: Randy Brukardt
Sent: Wednesday, November 28, 2012 10:05 PM

...

> Presumably a pragma Pack(CPU_Set) might be in the private part of this
> package, if the implementation so chooses.

pragma Pack is obsolescent, it's better to talk about aspect Pack if
possible.

"Aspect Pack might be applied to CPU_Set in the private part of this
package, if the implementation so chooses."

(I dropped the "Presumably" as we don't need both "might" and "presumably"
in the same sentence.)
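
The boolean-array representation discussed in the messages above can be
sketched roughly as follows. This is only an illustration under stated
assumptions: CPU_Set is redeclared locally so the example does not depend on
an implementation already providing the proposed additions to
System.Multiprocessors.Dispatching_Domains, and the mapping of the hardware
cores 2, 6, 10 and 14 from the !question onto Ada CPU values 3, 7, 11 and 15
is hypothetical (type CPU starts at 1). The names CPU_Set_Sketch,
Second_Socket and Members are invented for the example.

```ada
with Ada.Text_IO;            use Ada.Text_IO;
with System.Multiprocessors; use System.Multiprocessors;

procedure CPU_Set_Sketch is

   --  Local copy of the type proposed in the !wording section
   --  (an assumption made so that the sketch is self-contained).
   type CPU_Set is array (CPU range <>) of Boolean;

   --  Hypothetical numbering: hardware cores 2, 6, 10, 14 are assumed
   --  to appear as Ada CPU values 3, 7, 11, 15.  The bounds span only
   --  the first and last member, so the array is as short as possible.
   Second_Socket : constant CPU_Set (3 .. 15) :=
     (3 | 7 | 11 | 15 => True, others => False);

   Members : Natural := 0;

begin
   --  Walk the set and report every CPU whose component is True.
   for I in Second_Socket'Range loop
      if Second_Socket (I) then
         Members := Members + 1;
         Put_Line ("CPU" & CPU'Image (I) & " is in the set");
      end if;
   end loop;

   --  Checked only when assertions are enabled.
   pragma Assert (Members = 4);
   pragma Assert (Second_Socket'First = 3 and Second_Socket'Last = 15);
end CPU_Set_Sketch;
```

A value built this way is what would be passed to the proposed Create (Set)
function; whether an implementation packs the array is left to it, as noted
above.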

****************************************************************

From: Alan Burns
Sent: Thursday, May 30, 2013 8:18 AM

There have been discussions on these AIs from within the IRTAW 'community'.
Our views are as follows (note that we also looked at AI-0048).

...

[Only relevant part here - Editor.]

AI-0033-1

We see no problem with giving a more expressive representation to CPUs.
Indeed the original proposal that came from IRTAW used sets.

There does seem to be an issue with the definition of Create that was noted
in AI-0055 discussions. The description for Create (to create a Dispatching
Domain) as contained in D.16.1 23/3 actually allows a domain to be created
with no CPUs in it. This is a bug. It was never the intention, and a quick
fix is to extend the circumstances in which the exception is raised. It
might be worth considering using an aspect clause to define an appropriate
pre-condition (First <= Last) to reinforce the point.

****************************************************************