Early performance estimates for a new software system aid the design process by providing feedback when design decisions can be easily revised. Unfortunately, constructing a performance model of a distributed and concurrent software system can require significant effort. An automated performance model generation technique is described that reduces the model building effort by providing: easy specification of performance experiments, empirical estimates for model parameters, automated model generation, and support for different types of models. A prototype is used to describe a software system, from which causal traces (angio traces) are recorded during execution. These traces are then processed into sequences of resource demands (workthreads), aggregated into system execution descriptions (workthread classes), and combined to generate a performance model. The technique can also be applied at other stages of the development process, including the redesign of existing software.
Early performance estimates for a new software system aid the design process by providing feedback when design decisions can be easily revised. Unfortunately, constructing a performance model of a distributed and concurrent software system can require significant effort. We propose an approach that reduces the model building effort by providing easy specification of performance test cases, empirical estimates for model parameters, automated model generation, and support for different types of models. A prototype is used to describe an object-based system, for which causal traces (angio traces) are recorded during execution. The traces are then processed into sequences of resource demands (workthreads), aggregated into system execution descriptions (workthread classes), and combined to generate performance models. The technique can also be applied at other stages of the development process, including the redesign of existing software.
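The trace-processing pipeline described above can be illustrated with a small sketch. The trace format, function names, and operation labels here are hypothetical, not taken from the papers: each recorded trace is treated as a sequence of (operation, demand) pairs, traces with the same operation sequence are grouped into one workthread class, and the class's model parameters are estimated empirically as mean demands.

```python
from collections import defaultdict

# Hypothetical sketch: each angio trace is a sequence of (operation, cpu_demand)
# pairs; identical operation sequences form one workthread class, whose demand
# parameters are estimated as means over the observed traces.
def aggregate_traces(traces):
    classes = defaultdict(list)
    for trace in traces:
        signature = tuple(op for op, _ in trace)   # the workthread's operation sequence
        classes[signature].append([d for _, d in trace])
    # For each class, report how often it occurred and the mean demand per step.
    result = {}
    for signature, demand_lists in classes.items():
        n = len(demand_lists)
        means = [sum(step) / n for step in zip(*demand_lists)]
        result[signature] = {"count": n, "mean_demands": means}
    return result

traces = [
    [("recv", 1.0), ("db_read", 4.0), ("reply", 0.5)],
    [("recv", 1.2), ("db_read", 3.6), ("reply", 0.7)],
    [("recv", 0.9), ("log", 0.2)],
]
print(aggregate_traces(traces))
```

The resulting classes (operation sequence, occurrence count, mean demands) are the kind of aggregate description a model generator could consume.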
This paper introduces a new logical clock approach for distributed systems called proper time. In proper time, events are ordered causally using two reasonable assumptions as axioms. The axioms lead to a causal relation between events that we term a visibility relation. Informally, the visibility relation identifies whether one event was necessary for another event to occur. The visibility relation includes the well-known `happened before' temporal relation. Proper time has two logical clocks to characterize causality: (i) a local logical clock (called a task execution graph) for each concurrent software component and (ii) a system-level logical clock (called a distributed response graph, or DReG). The DReG is a linear-time, non-interleaving model of the execution of concurrent software components. It also includes semantic information about how the software components interact or communicate. The causal information of proper time has been used to automatically construct predictive performance models of a distributed system.
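For readers unfamiliar with logical clocks, the `happened before' relation that the visibility relation subsumes can be tracked with a conventional Lamport clock. The sketch below shows only that standard baseline, not the task execution graph or DReG of proper time itself:

```python
# A conventional Lamport logical clock: each component increments its counter
# on local events and sends, and a receive advances the clock past the
# timestamp carried on the message, so a send always precedes its receive.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        self.time += 1
        return self.time          # timestamp carried on the message

    def receive(self, msg_time):
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.local_event()          # a's clock: 1
t = a.send()             # a's clock: 2, message stamped 2
b.receive(t)             # b's clock: max(0, 2) + 1 = 3
```

Because the receiver's clock always exceeds the sender's timestamp, causally related events are ordered consistently; the richer structures of proper time add semantic information about how components interact that a scalar clock cannot express.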
In developing distributed applications and services, such as DCE, X.500, CORBA, ANSA, or SUN/ONC, there is a need to be able to set up and run tests on certain processes. The tests might be to obtain performance data, to test the processes' behaviour, or to evaluate a management strategy. Common requirements are:
- to load and run special versions of at least some of the software, often on multiple nodes of a network,
- to initialize the software in a well-controlled way, so the tests may be repeatable,
- to monitor execution and collect results for analysis.
DECALS is a system of half a dozen or so processes which support the testing of distributed applications consisting of many processes on a network. There may be any number of experiments; in each experiment a set of processes is loaded on specified workstations, and each process is initialized with data for the particular experiment using command line arguments. The configuration of the experiment, and the data state of the processes, can thus be controlled down to any level of detail, as desired. The nerve center of DECALS is an ``experiment controller'' process which communicates with each application process (or ``process under test'', POT) either directly or through an agent which it creates on each workstation.
DECALS provides global control over the running of the experiment and the collection of data. The data is collected both from probes installed in the application's source code, and from instrumented operating system primitives. The present system collects data in the form of a separate list of events for each workstation. Events are timestamped by clock values local to the workstation on which the list was made. The lists of events are adjusted for time, and merged, in a postprocessing step. A mechanism is provided for handling the often troublesome problem of tracking the time differences between workstation clocks.
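The postprocessing step can be sketched as follows. The event format and function name are assumptions for illustration: each workstation contributes a time-ordered list of (local_timestamp, event) pairs plus an estimated offset of its clock relative to a reference, and the lists are shifted by their offsets and merged into one global trace.

```python
import heapq

# Hypothetical sketch of the postprocessing step: shift each workstation's
# event list by its estimated clock offset, then merge the (already sorted)
# lists into a single globally time-ordered trace.
def merge_event_lists(lists_with_offsets):
    adjusted = [
        [(t + offset, host, ev) for t, ev in events]
        for host, offset, events in lists_with_offsets
    ]
    return list(heapq.merge(*adjusted))    # each per-host list is time-ordered

logs = [
    ("ws1", 0.0,  [(1.0, "send"), (5.0, "reply")]),
    ("ws2", -0.4, [(2.0, "recv"), (4.0, "done")]),
]
print(merge_event_lists(logs))
```

Estimating the per-workstation offsets is the hard part in practice; the sketch assumes they are already known, which is the role of the clock-difference tracking mechanism mentioned above.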
As an example of the usefulness of DECALS, we will describe how it can be used to create a synthetic workload of processes on a network, which can simulate a system for experiments on network management.
TINA is a distributed computing framework based on Open Distributed Processing which is targeted partly towards management of telecommunications services. This paper is partly a TINA case study and partly an examination of how performance can be addressed during design of a TINA system. The case study part examines a recently developed software suite which manages a telepresence system with video connections, video services and sources, and conference management services. The software was designed without using TINA, but here it is re-interpreted within TINA definitions and concepts. Then a performance assessment framework is described, with the results of performance tests, a model, and an analysis of system scale-up properties based on the model. From the example some lessons are drawn about how performance assessment should be done. Our long-term goal is a method for assessment of performance problems at an early stage in development of distributed software.
This paper presents NICE, a notation for extracting complex performance measurements of parallel software from an event log. NICE defines a ``complex interval'' as a sequence of events in the event log which match a rather general template, using an Interval Monitor Process (IMP). The IMP process model combines an extended FSM with parameterized event descriptors to provide a generalized description of a sequence of subintervals, each of which may be measured for duration. An interval matching algorithm provides a strategy for distributing the events in the log to multiple concurrent IMPs, thereby handling interval occurrences which overlap and/or share the same event. Practical issues of implementing and using NICE are addressed in a discussion of our performance monitoring tool called Finale.
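The idea of distributing log events to multiple concurrent matchers can be illustrated with a deliberately simplified sketch (the template here is a plain event-name sequence, not the full extended-FSM model with parameterized descriptors): each occurrence of the template's first event spawns a new matcher instance, so overlapping intervals that share later events are still measured independently.

```python
# Simplified sketch of IMP-style interval matching: a template is a sequence of
# event names; every occurrence of the first event starts a new matcher, and
# all active matchers advance on the events they expect, so overlapping
# occurrences that share an event are each measured for duration.
def match_intervals(template, log):
    active = []            # each matcher: [next_index_in_template, start_time]
    durations = []
    for time, name in log:
        for m in active:                   # advance matchers expecting this event
            if name == template[m[0]]:
                m[0] += 1
        if name == template[0]:            # this event also opens a new interval
            active.append([1, time])
        done = [m for m in active if m[0] == len(template)]
        for m in done:
            durations.append(time - m[1])
            active.remove(m)
    return durations

log = [(0.0, "start"), (1.0, "start"), (2.0, "stop")]
print(match_intervals(("start", "stop"), log))
```

Here both occurrences of "start" are matched against the single shared "stop" event, yielding two interval durations; the real algorithm must additionally handle parameterized events and template alternatives.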
Distributed or parallel software with synchronous communication via rendezvous is found in client-server systems and in proposed Open Distributed Systems, in implementation environments such as Ada, V, Remote Procedure Call systems, in Transputer systems, and in specification techniques such as CSP, CCS and LOTOS. The delays induced by rendezvous can cause serious performance problems, which are not easy to estimate using conventional models which focus on hardware contention, or on a restricted view of the parallelism which ignores implementation constraints. Stochastic Rendezvous Networks are queueing networks of a new type which have been proposed as a modelling framework for these systems. They incorporate the two key phenomena of included service and a second phase of service. This paper extends the model to also incorporate different services or entries associated with each task. Approximations to arrival-instant probabilities are employed within a Mean-Value Analysis framework, to give approximate performance estimates.
The method has been applied to moderately large industrial software systems.
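As background, the Mean-Value Analysis framework referred to above builds on the classic exact MVA recursion for closed product-form queueing networks. The sketch below shows only that conventional baseline for a single customer class; it does not model the rendezvous phases, included service, or multiple entries that the paper's approximations add.

```python
# Classic exact Mean-Value Analysis (Reiser-Lavenberg) for a single-class
# closed queueing network: iterate over population sizes, using the arrival
# theorem (an arriving customer sees the queue lengths of a network with one
# fewer customer) and Little's law to obtain the system throughput.
def mva(service_demands, n_customers):
    """service_demands[k]: total service demand at station k; returns throughput."""
    q = [0.0] * len(service_demands)         # mean queue lengths, population 0
    x = 0.0
    for n in range(1, n_customers + 1):
        # Residence time at each station seen by an arrival (arrival theorem).
        r = [d * (1.0 + q[k]) for k, d in enumerate(service_demands)]
        x = n / sum(r)                       # system throughput (Little's law)
        q = [x * rk for rk in r]             # queue lengths at population n
    return x

# Example: one CPU (demand 2) and one disk (demand 1), 3 circulating customers.
print(mva([2.0, 1.0], 3))
```

Throughput is bounded above by the reciprocal of the largest demand (here 1/2), and the recursion approaches that bound as the population grows; the rendezvous-network approximations replace the exact arrival-instant quantities with estimates suited to tasks that serve multiple entries in two phases.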