The planned work spans several axes, split into work packages:
We aim to produce a scientific instrument directly usable by a large community. We work in a tight loop with end users to ensure that the tool is well adapted to their needs (WP6).
The kernel of the simulator relies on models that can be selected and composed at runtime. Currently, there is only one model for CPUs and four models for networks. More models are needed to take advantage of the modularity of the simulator, thus increasing the ability to study the same code at several levels of accuracy and at several scales.
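The runtime selection and composition of models can be sketched with a simple registry pattern. This is a hypothetical illustration, not SimGrid's actual API: the class and model names (`ConstantTimeNetwork`, `LatencyBandwidthNetwork`, `build_network_model`) are invented for the example.

```python
# Hypothetical sketch of runtime model selection (not SimGrid's actual API):
# each model is registered under a name and instantiated when the simulation
# is configured, so the same simulated code can run against different models.

class ConstantTimeNetwork:
    """Toy network model: every transfer takes a fixed latency."""
    def __init__(self, latency=1e-4):
        self.latency = latency

    def transfer_time(self, size_bytes):
        return self.latency

class LatencyBandwidthNetwork:
    """Toy network model: latency plus size over bandwidth."""
    def __init__(self, latency=1e-4, bandwidth=1e9):
        self.latency = latency
        self.bandwidth = bandwidth

    def transfer_time(self, size_bytes):
        return self.latency + size_bytes / self.bandwidth

NETWORK_MODELS = {
    "constant": ConstantTimeNetwork,
    "lat_bw": LatencyBandwidthNetwork,
}

def build_network_model(name, **params):
    """Select a registered model by name and instantiate it."""
    return NETWORK_MODELS[name](**params)

# Same simulated transfer, two accuracy levels:
for name in ("constant", "lat_bw"):
    model = build_network_model(name)
    print(name, model.transfer_time(1_000_000))
```

Swapping the model name is then enough to trade accuracy for speed without touching the simulated application itself.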
Realistic model instances are needed to feed the simulator. There are two main expected benefits:
Beyond simulation itself, model instantiation is also very useful for providing the data necessary to put optimization algorithms into practice.
The aim of this work package is to provide a visualization environment for the analysis of simulation traces produced by a SimGrid instrumentation. A first analysis is devoted to identifying specific patterns (sequences of events) in the trace, and consequently focuses on the causes of performance anomalies in the application (e.g., identification of bottlenecks). Then, statistical tools should establish the global performance profile of the simulated application (resource utilization, average latencies, waiting times, etc.) by applying performance functions to the trace. The main difficulty of this work package is managing the huge size of the traces (i.e., finding a reasonable trade-off between measurement accuracy and trace size) and providing the ability to browse these data efficiently.
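A minimal sketch of such a performance function, applied to a toy trace of busy intervals (the trace format and resource names are assumptions for the example, not the actual SimGrid trace format). A single-pass aggregation like this is also the kind of streaming computation needed to cope with very large traces:

```python
from collections import defaultdict

# Toy trace: (resource, start, end) busy intervals; purely illustrative.
trace = [
    ("cpu0", 0.0, 2.0),
    ("cpu0", 3.0, 4.0),
    ("link1", 0.5, 1.5),
    ("cpu1", 1.0, 5.0),
]

def utilization(trace, horizon):
    """Fraction of [0, horizon] each resource spends busy,
    assuming non-overlapping intervals per resource.
    Runs in one pass, so it never holds the full trace in memory
    if `trace` is a generator."""
    busy = defaultdict(float)
    for resource, start, end in trace:
        busy[resource] += end - start
    return {resource: t / horizon for resource, t in busy.items()}

print(utilization(trace, horizon=5.0))
```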
We aim, in this work package, to set up, control, and report on large simulation campaigns. The simulations to run are derived from parameter sweeps, i.e., from combinations of several parameters. Computations will be executed on a large-scale distributed grid architecture.
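The enumeration of a campaign from a parameter sweep can be sketched as follows (the factor names and values are invented for the example):

```python
import itertools

# Hypothetical campaign description: each factor lists the values to sweep.
factors = {
    "platform": ["cluster", "grid"],
    "nodes": [16, 64, 256],
    "network_model": ["constant", "lat_bw"],
}

def campaign(factors):
    """Yield one simulation configuration per combination of factor values."""
    names = list(factors)
    for values in itertools.product(*(factors[n] for n in names)):
        yield dict(zip(names, values))

configs = list(campaign(factors))
print(len(configs))  # 2 * 3 * 2 = 12 configurations
```

Each yielded configuration would then be dispatched as one independent simulation job on the grid, which is what makes such campaigns embarrassingly parallel.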
Note that this work package focuses on the management of a specific experiment campaign; it does not aim to manage the execution of a single simulation, which is devoted to WP3.
In the current version of the simulator, the system threads representing simulated processes run in a mutually exclusive manner. This is not really a design choice (although it does shield users from multi-threading difficulties), but mainly a technical consequence: only simulated processes that have completed all their previous simulated communications or computations can proceed. So, for two simulated processes to be ready to run at the same time, their previous simulated actions must finish at the very same instant. Because of the precision of floating-point arithmetic, it is very unlikely that many simulated processes become ready at exactly the same instant.
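The floating-point issue can be seen on a toy example: two completion times that are mathematically equal rarely compare equal once they are accumulated through different sequences of floating-point operations (the durations below are illustrative, not taken from the simulator):

```python
# Process A finishes two actions of 0.1s and 0.2s; process B one action of 0.3s.
# Mathematically both are ready at t = 0.3, but the accumulated times differ.
t_a = 0.1 + 0.2   # 0.30000000000000004
t_b = 0.3

print(t_a == t_b)  # False: the two processes are never "ready at the same time"
```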
We envision two ways of overcoming this limit and increasing the level of parallelism between simulated processes. First, the constant-time model developed in task 1.1 will be of great help here, since it implies that all simulated processes are always ready at the very same instant (every action lasts exactly the same time). The second approach is more general, at the price of increased difficulty. A process A ready at time t cannot disturb another process B that becomes ready at time t + δ if the network latency between the two machines is greater than δ. Both A and B can thus be started in parallel, even if the simulated clock has not yet reached the point where B is ready. Such speculative startups could greatly help when running precise models in parallel, since the network latency would absorb the floating-point equality issues.
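The latency-based criterion can be sketched as a scheduling check (a hypothetical helper, not the simulator's API): every process whose ready time falls within the minimal network latency of the earliest ready time can safely be started in parallel, since no message could reach it sooner.

```python
def parallel_batch(ready_times, min_latency):
    """Return the processes that can run in parallel with the earliest-ready
    one: those ready before t + min_latency cannot be disturbed by any
    message sent at time t, since that message arrives after min_latency."""
    t = min(ready_times.values())
    return sorted(p for p, r in ready_times.items() if r < t + min_latency)

# Illustrative ready times (seconds); min_latency is an assumed platform bound.
ready = {"A": 10.0, "B": 10.00003, "C": 10.2}
print(parallel_batch(ready, min_latency=1e-4))  # ['A', 'B']; C must wait
```

Note how B, although not ready at exactly the same floating-point instant as A, still falls inside the latency window, which is precisely how this approach smooths over floating-point equality issues.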
Part of the success of the USS SimGrid project will be to offer an efficient and validated simulation framework to academic and industrial end users working in other fields of Computer Science. This work package targets two complementary research communities: peer-to-peer and high-performance computing. More precisely, one task will address the problem of cluster dimensioning, in collaboration between AlGorille and Mescal, while another task will focus on peer-to-peer backup and will involve MASCOTTE.
The main objective is to demonstrate that, within a single application simulation framework, it is possible to break through the limitations of existing simulation solutions when confronted with the specific needs of the studied applications.