Annotated Bibliography of papers related to Layered Queueing

[1]: Greg Franks and Murray Woodside. Performance of multi-level client-server systems with parallel service operations. In Proceedings of the First International Workshop on Software and Performance (WOSP '98), pages 120–130, Santa Fe, NM, October 12–16 1998. Association for Computing Machinery. (doi:10.1145/287318.287346)
Parallel execution can enhance the performance of distributed client-server systems, but the enhancement may be less than expected. Evaluations of such designs must include the complex effects of overheads, heterogeneous parallel branches, contention by the parallel parts for servers in lower levels, and simultaneous resource possession effects. A ``compensated complementary delay'' approximation is described which exploits layered queueing approximations for layered resources which occur in client-server architectures, based on synchronization delay estimates and adjusted levels of contention. The new approximation uses the overlap of parallel branches and a new fast calculation of join delays. It gives acceptable errors (averaging about two percent), and has an enormously lower computational cost compared to the competing approach based on decomposition. The new approximation may moderate a conclusion made by Heidelberger and Trivedi, that decomposition gives greatly superior accuracy to the much cheaper complementary delay approach; these delay approximations are only a little less accurate.
[2]: Greg Franks and Murray Woodside. Effectiveness of early replies in client-server systems. Performance Evaluation, 36(1):165–184, August 1999. (doi:10.1016/S0166-5316(99)00034-6)
A common performance optimization for a Server process is to send the reply to each request as early as possible, before final operations that are not in the critical path (such as buffer cleanup, state updates, logging and file updates). The operations after the reply form a ``second phase'' of service. This does not delay the current request from the client, but may delay succeeding requests. The net performance improvement depends on the number of clients at a server, its utilization, and the proportion of the total work which is placed in the second phase. This dependence is explored using analytic models that include an improved special approximation for two phases service in queueing networks, and layered queueing networks The result is an approximate analysis for large and complex client-server systems, with second phases.
[3]: Greg Franks and Murray Woodside. Multiclass multiservers with deferred operations in layered queueing networks, with software system applications. In Doug DeGroot and Pete Harrison, editors, Proceedings of the Twelfth IEEE/ACM International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS 2004), pages 239–248, Volendam, The Netherlands, October 2004. (PDF) (doi:10.1109/MASCOT.2004.1348262)
Layered queueing networks describe the simultaneous-resource behaviour of servers that request lower-layer services and wait for them to complete. Layered software systems often follow this model, with messages to request service and receive the results. Their performance has been computed successfully using mean-value queueing approximations. Such systems also have multiservers (which model multi-threaded software processes), multiple classes of service, and what we call deferred operations or ``second phases'', which are executed after sending the reply message to the requester. In this paper, three established MVA approximations for multiclass multiservers were extended to include deferred service, and evaluated within the layered queueing context. Errors ranged from 1% up to about 15%. These servers were then used to model the Network File System, as implemented on Linux, to show that the method scales up and gives good accuracy on typical systems, with computation times of a few seconds to a few minutes. This is hundreds of times faster than simulation.
[4]: Greg Franks, Peter Maly, Murray Woodside, Dorina C. Petriu, and Alex Hubbard. Layered Queueing Network Solver and Simulator User Manual. Real-time and Distributed Systems Lab, Carleton University, Ottawa. (PDF)
[5]: Greg Franks, Shikharesh Majumdar, John Neilson, Dorina Petriu, Jerome Rolia, and Murray Woodside. Performance analysis of distributed server systems. In The Sixth International Conference on Software Quality (6ICSQ), pages 15–26, Ottawa, Ontario, Canada, October 1996. American Society for Quality Control (ASQC). (PDF)
It is generally accepted that performance characteristics, such as response time and throughput, are an integral part of the factors defining the quality of software products. The relationship between quality and system responsiveness is especially strong in the case of distributed application using various kind of software servers (name servers, application servers, database servers, etc.) In order to meet the performance requirements of such systems, the developers should be able to assess and understand the effect of various design decisions on system performance at an early stage, when changes can be made easily and effectively. Performance analysis should then continue throughout the whole life cycle, becoming one of the means of assuring the quality of software products. For this to become a practical reality, we need appropriate modeling techniques.This paper presents a new performance model named Layered Queueing Networks (LQN) for systems with distributed software servers. In such systems the servers are are frequently layered, so that lower level delays are included in the higher layer service time, which limits the system capacity. LQN represents these effects in a compact and understandable way, and provide analytical and simulation tools. The paper explains the model, and more importantly, shows how can be applied to identify performance trouble spots in a system and devise effective corrective measures.
[6]: Greg Franks, Dorina Petriu, Murray Woodside, Jing Xu, and Peter Tregunno. Layered bottlenecks and their mitigation. In Proceedings of the Third International Conference on the Quantative Evaluation of Systems (QEST'06), pages 103 – 114, Riverside, CA, USA, September 11–14 2006. (PDF) (doi:10.1109/QEST.2006.23)
Bottlenecks are a simple and well-understood phenomenon in servicesystems and queueing models. However in systems with layered resources bottlenecks are more complicated, because of simultaneous resource possession. Thus, the holding time of a higher-layerresource, such as a process thread, may include a small execution demand, but a large time to use other resources at a lower layer (such as a disk). A single saturation point may in fact saturate many otherresources by push-back, making diagnosis of the problemdifficult. This paper gives a new corrected definition of a layeredbottleneck, and develops a framework for systematic detection of thesource of a bottleneck, for applying improvements and for estimatingtheir effectiveness. Many of the techniques are specific to layered bottlenecks.
[7]: Greg Franks, Tariq Al-Omari, Murray Woodside, Olivia Das, and Salem Derisavi. Enhanced modeling and solution of layered queueing networks. IEEE Transactions on Software Engineering, 35(2):148–161, March–April 2009. (doi:10.1109/TSE.2008.74)
Layered queues are a canonical form of extended queueing network for systems with nested multiple resource possession, in which successive depths of nesting define the layers. The model has been applied to most modern distributed systems, which use different kinds of client-server and master-slave relationships, and scales up well. The Layered Queueing Network (LQN) model is described here in a unified fashion, including its many more extensions to match the semantics of sophisticated practical distributed and parallel systems. These include efficient representation of replicated services, parallel and quorum execution, and dependability analysis under failure and reconfiguration. The full LQN model is defined here and its solver is described. A substantial case study to an air traffic control system shows errors (compared to simulation) of a few percent. The LQN model is compared to other models and solutions, and is shown to cover all their features.
[8]: Greg Franks, Danny Lau, and Curtis Hrischuk. Performance measurements and modeling of a Java-based session initiation protocol (SIP) application server. In Proceedings of the Seventh International ACM Sigsoft Conference on the Quality of Software Architectures (QOSA '11), Boulder, CO, USA, June 20–24 2011. To appear. (PDF)
The Session Initiation Protocol (SIP) is an Internet protocol for establishing sessions between two or more parties. It is becoming ubiquitous in uses such as Voice over IP, instant messaging, Internet TV, and others. Performance is a chief concern with SIP because Quality of Service is important and SIP has internal timers that need to be honored or network efficiency suffers. The Java community has provided a standardized API so that SIP applications can now be built using Java application servers. These new capabilities also bring with them new performance engineering methods, tools, and benchmarking needs. This paper describes the experiences and processes for the performance engineering of SIP applications in a Java environment. In this paper, a Java 2 Enterprise Edition (J2EE) SIP application server's performance is analyzed in a standalone and cluster environment, with network traces used to build a performance model of each environment. This included gathering data from test runs and extracting performance parameters from packet traces to construct the performance models. The models are then calibrated to match the model prediction with real system test data. Using the calibrated models, some bottlenecks were identified and suggestions to improve the overall maximum throughput were developed and were subsequently implemented in the system.
[9]: Greg Franks. Traffic dependencies in client-server systems and their effect on performance prediction. In IEEE International Computer Performance and Dependability Symposium, pages 24–33, Erlangen, Germany, April 1995. (PostScript) (doi:10.1109/IPDS.1995.395840)
Client-server systems are becoming increasingly common in the world today as users move from centralized mainframe facilities to networks of distributed work stations. This form of work demands new performance models as the interactions in client-server systems are more complex than the types supported by classic queueing network solvers such as Mean Value Analysis. However, certain interaction patterns can arise in multi-level client-server systems that require special treatment. This paper describes these interactions (referred to as interlocking here) and how they affect the performance estimates of solution methods using surrogate delays to solve multi-level client-server models. It then describes a method to take interlocking into account when solving the performance models. These corrections often reduce the solution error to close to zero when compared to exact solutions for situations where interlocking is significant.
[10]: Greg Franks. Simulating layered queueing networks with passive resources. In Proceedings of the 2011 Spring Simulation Multiconference, volume 4 (TMS/DEVS), pages 8–15, Boston, MA, June 4–7 2011.
This paper describes an extension to Layered Queueing Networks (LQN), a form of an extended queueing network used to investigate performance problems, to model passive resources such as counting semaphores and buffers. Layered queueing networks can be constructed directly, or from UML design models which incorporate the MARTE profile, either directly or via the Core Scenario Model. Layered Queueing Networks scale well and can solve analytically systems with nested resource requests to active resources. However, passive resources cause problems which force the use of simulation. The layered queueing network simulator, lqsim, is also described here. Simulations are created by reading in an LQN model, constructing objects from pre-existing templates, then solving. The semaphore task extension was incorporated by modifying the existing template used to model multi-server tasks. Finally, the semaphore extension was used to solve a model of a building security system which has a pool of buffers to capture video images. The results here show that a lack of buffers is indeed a bottleneck, but other parts of the system ultimately limit the capacity of the system.
[11]: Lianhua Li and Greg Franks. Performance modeling of systems using fair share scheduling with layered queueing networks. In Proceedings of the Seventeenth IEEE/ACM International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS 2009), pages 1–10, Imperial College, London, September 21–23 2009. (doi:10.1109/MASCOT.2009.5366689)
[12]: Peter Maly and C. Murray Woodside. Layered modeling of hardware and software, with application to a LAN extension router. In 11th International Conference on Computer Performance Evaluation; Modelling Techniques and Tools TOOLS 2000, number 1786 in Lecture Notes in Computer Science, pages 10 – 24. Springer-Verlag, March 27–31 2000. (doi:10.1007/3-540-46429-8_2)
Understanding the interactions between hardware and software is important to performance in many systems found in data communications like routers. Responsibilities that traditionally were programmed in software are being transferred to intelligent devices, and special purpose hardware. With more functionality being transferred to these devices, it becomes increasingly important to capture them in performance models. Modeling hardware/software systems requires an extended queueing model like LQN. This paper describes a layered architecture model which represents hardware and software uniformly and which emphasizes resources and performance, called a Resource-based Model Architecture (RMA). The approach is demonstrated on a remote access or LAN extension router. The model is created by a systematic tracing of scenarios and is used to explore the router capacity for different workloads, and to analyze a re-design for scaleup.
[13]: Daniel A. Menascé. Two-level iterative queuing modeling of software contention. In Proceedings of the Tenth IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2002), Fort Worth, TX, October 12–16 2002. (doi:10.1109/MASCOT.2002.1167086)
Being able to model contention for software resources (e.g., a critical section or database lock) is paramount to building performance models that capture all aspects of the delay encountered by a process as it executes. Several methods have been offered for dealing with software contention and with message blocking in client-server systems. This paper presents a general, straightforward, easy to understand and implement, approach to modeling software contention using queuing networks. The approach, called SQN-HQN, consists of a two-level iterative process. Two queuing networks are considered: one represents software resources (SQN) and the other hardware resources (HQN). Multiclass models are allowed and any solution technique exact or approximate - can be used at any of the levels. This technique falls in the general category of fixed-point approximate models and is similar in nature to other approaches. The main difference lies in its simplicity. The process converges very fast in the examples examined. The results were validated against global balance equation solutions and are very accurate.
[14]: Martin Mroz and Greg Franks. A performance experiment system supporting fast mapping of system issues. In Fourth International Conference on Performance Evaluation Methodologies and Tools, Pisa, Italy, October20–22 2009. (PDF) (doi:10.4108/ICST.VALUETOOLS2009.7807)
The most fruitful use of a performance model is to study deep properties of the system, and hypothetical situations that might lead to improved configurations or designs. This requires executing experiments on the model, which evaluate systematic changes. Parameter estimation methods also exploit search in a parameter space to fit a model to performance data. Estimation, sensitivity and optimization experiments can require hundreds of evaluations, and the efficiency of the analytic model solver may become an issue. Analytic models usually provide fast solutions (compared to simulations) but repetitive solutions for near- models offer opportunities for further reducing the effort. This work describes an driver for a layered queueing solver which provides a factor of two improvement. It also raises the issue of domain-specific languages for model experiments, versus general languages with suitable libraries.
[15]: John E. Neilson, C. Murray Woodside, Dorina C. Petriu, and Shikharesh Majumdar. Software bottlenecking in client-server systems and rendezvous networks. IEEE Transactions on Software Engineering, 21(9):776–782, September 1995. (doi:10.1109/32.464543)
Software bottlenecks are performance constraints caused by slow execution of a software task. In typical client-server systems a client task must wait in a blocked state for the server task to respond to its requests, so a saturated server will slow down all its clients. A Rendezvous Network generalizes this relationship to multiple layers of servers with send-and-wait interactions (rendezvous), a two-phase model of task behavior, and to a unified model for hardware and software convention. software bottlenecks have different symptoms, different behavior when the system is altered, and a different cure from the conventional bottlenecks seen in queueing network models of computer systems, caused by hardware limits. The differences are due to the ``push-back'' effect of the rendezvous, which spreads the saturation of a server to its clients. The paper describes software bottlenecks by examples, gives a definition, shows bow they can be located and alleviated, and gives a method for estimating the performance benefit to be obtained. Ultimately, if all the software bottlenecks can be removed, the performance limit will be due to a conventional hardware bottleneck.
[16]: John E. Neilson. PARASOL: A simulator for distributed and/or parallel systems. Technical Report SCS TR-192, School of Computer Science, Carleton University, Ottawa, Ontario, Canada, May 1991. (PDF)
This paper describes the kernel of a distributed/parallel system simulator/emulator which provides the user with ``fine grained'' control over system components including the hardware, the system software, and the application software architectures. The PARASOL system is truly an umbrella system in that it is equally suitable for performance simulation studies, with various degrees of model abstraction, for implementing and testing distributed/parallel algorithms, and for software prototyping. The kernel provides features for defining: heterogeneous network topologies of arbitrary complexity; dynamic task structures with a variety of management tools including task migration; interprocess communication via Mach-like primitives using communication ports; and spinlocks for low-level task synchronization. For those who are interested in performance simulations, PARASOL also provides the customary simulation tools for random variate generation and for statistics collection. Finally. PARASOL includes a monitor/debugging facility for monitoring and/or debugging user applications. PARASOL may be viewed as both ``process-based'' and ``object-based''. Its programming paradigm requires users to define PARASOL tasks, typically written in C. These together with built-in PARASOL entities are used to construct system models. The PARASOL kernel is highly portable and will run on virtually any system supporting simple, single-threaded applications written in C.
[17]: Dorina C. Petriu and C. Murray Woodside. Approximate MVA for software client/server models by markov chain task-directed aggregation. In The Third IEEE Symposium on Parallel and Distributed Processing, Dallas, TX USA, December 2–5 1991. (doi:10.1109/SPDP.1991.218224)
Stochastic Rendezvous Networks (SRVN) are performance models for multitasking parallel software with intertask communication via rendezvous (RV). SRVN differ from Queueing Networks (QN) in two ways: nodes act as both clients and servers (allowing for nested service), and servers have two distinct phases of service – the first one ``in RV'' with the client, and the second ``after RV'', executed in parallel with the client. SRVN are very appropriate to model systems with software servers. Previous work on solving SRVN models has used a kind of approximate Mean Value Analysis based on heuristic ad hoc assumptions to determine the task queues properties at the instant of RV request arrivals. This paper describes a much better approximation for the arrival instant probabilities for a class of simple client/server SRVN, based on a rigorous analysis of the Markov Chain model describing the interference of different client tasks contending for a single server task with FIFO queueing discipline. The algorithm for a simple client/server SRVN is integrated into an iterative decomposition algorithm for complex SRVN models with any number of servers and multi-layered service. Finally, the SRVN model is used to study an example of software bottleneck.
[18]: Dorina C. Petriu. Approximate mean value analysis of client–server systems with multi-class requests. In Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems., pages 77–86, Nashville, TN, USA, May 1994. (doi:10.1145/183018.183027)
Stochastic Rendezvous Networks (SRVNs) are performance models for multitasking parallel software with intertask communication via rendezvous introduced in [1], which are very appropriate to model client-server systems. SRVNs differ from Queueing Networks (QNs) in two ways: nodes act as both clients and servers (allowing for nested service), and servers have two distinct phases of service – the first one `in RV' with the client, and the second `after RV', executed in parallel with the client. Early work on solving SRVN models has used a kind of approximate Mean Value Analysis based on heuristic ad hoc assumptions to determine the task queue properties at the instant of RV request arrivals. Approximation are necessary since SRVN violates product form. Recently, a more rigorous approach was proposed in [2] for the solution of SRVN models, based on a special aggregation (named `Task-Directed Aggregation' TDA) of the Markov chain model describing the interference of different clients that contend for a single server with FIFO queueing discipline and different service times. The algorithm derived in [2] has the limitation that each client may require only a single class of service. In general, a software server offers a range of services with different workloads and functionalities, and a client may need more than one service. The present paper uses the TDA approach to derive an extended algorithm which allows a client to require any number of services from a server by changing randomly the request class. The new algorithm is incorporated into a decomposition method for models with any number of servers. The SRVN modelling technique is applied to a large case study of a distributed database system, giving insight into the behaviour of the system and helping to identify performance problems such as software bottleneck.
[19]: S. Ramesh and H. G. Perros. A multi-layer client-server queueing network model with synchronous and asynchronous messages. In Proceedings of the First International Workshop on Software and Performance (WOSP '98), pages 107–119, Santa Fe, NM USA, October 12–16 1998. Association for Computing Machinery. (doi:10.1145/287318.287343)
We analyze a multi-layered queueing network that models a client-server system where clients and servers communicate via synchronous and asynchronous messages. The servers are organized in groups such that they form a multi-layered hierarchical structure. The queueing network is analyzed approximately using a decomposition algorithm. Numerical tests show that the approximation has good accuracy.
[20]: J. A. Rolia and K. A. Sevcik. The method of layers. IEEE Transactions on Software Engineering, 21(8):689–700, August 1995. (doi:10.1109/32.403785)
Distributed applications are being developed that contain one or more layers of software servers. Software processes within such systems suffer contention delays both for shared hardware and at the software servers. The responsiveness of these systems is affected by the software design, the threading level and number of instances of software processes, and the allocation of processes to processors. The Method Of Layers (MOL) is proposed to provide performance estimates for such systems. The MOL uses the mean value analysis (MVA) algorithm Linearizer as a subprogram to assist in predicting model performance measures.
[21]: C. M. Woodside, E. Neron, E. D.-S. Ho, and B. Mondoux. An ``Active Server'' model for the performance of parallel programs written using rendezvous. Journal of Systems and Software, pages 844–848, 1986. (doi:10.1016/0164-1212(86)90031-2)
A method is presented for analyzing the performance of multitasking multiprocessor software that uses rendezvous, possibly implemented by using message passing, for interprocess communication and synchronization. The rendezvous is a feature of several modern systems supporting concurrency and implies two phases of service that could be termed ``within-rendezvous'' and ``post-rendezvous'' service. The paper gives a notation for the pattern of rendezvous, a framework for translating a software/hardware system structure in to an active-server queueing network model, and an implicit decomposition algorithm for solving for the system performance. The active-server model has servers with a dual nature as servers and customers, so it is distinct in concept from a network of queues, although there are many points of similarity.
[22]: C. M. Woodside, J. E. Neilson, J. W. Miernik, D. C. Petriu, and Constantin R. Performance of concurrent rendezvous systems with complex pipeline structures. In Ramon Puigjaner and Dominique Potier, editors, Fourth International Conference on Modeling Techniques and Tools for Computer Performance Evaluation, pages 307–322, Palma, Spain, September 1988.
The term ``Complex Pipeline'' describes a set of tasks which process incoming data in a sequence, like a pipeline, but have various kinds of parallel execution steps coupled into the main stream of execution. Examples are, splitting off of parallel streams, and shared server tasks. Examples are found in processing to interpret radar data, static processor allocations and synchronous inter-task communications, which can cause potential performance problems. The growing importance of rendezvous-based environments, including Ada, require that we be able to predict these problems. Models such as Petri nets are often too expensive to solve; fast approximation techniques are needed. The approach of ``stochastic rendezvous networks'' is adopted here to deal with complex pipelines. This paper describes an algorithm and evaluates is accuracy; the algorithm is the major feature of the paper, including a ``conditional mean value analysis'' step. It included processor queueing, which was not modelled in the earlier work. The method is several orders of magnitude faster than Petri net analysis even on small examples. the accuracy obtained is generally better than 10%.
[23]: C. Murray Woodside, John E. Neilson, Dorina C. Petriu, and Shikharesh Majumdar. The stochastic rendezvous network model for performance of synchronous client-server-like distributed software. IEEE Transactions on Computers, 44(8):20–34, August 1995. (doi:10.1109/12.368012)
Distributed or parallel software with synchronous communication via rendezvous is found in client-server systems and in proposed Open Distributed Systems, in implementation environments such as Ada, V, Remote Procedure Call systems, in Transputer systems, and in specification techniques such as CSP, CCS and LOTOS. The delays induced by rendezvous can cause serious performance problems, which are not easy to estimate using conventional models which focus on hardware contention, or on a restricted view of the parallelism which ignores implementation constraints. Stochastic Rendezvous Networks are queueing networks of a new type which have been proposed as a modelling framework for these systems. They incorporate the two key phenomena of included service and a second phase of service. This paper extends the model to also incorporate different services or entries associated with each task. Approximations to arrival-instant probabilities are employed with a Mean-Value Analysis framework, to give approximate performance estimates. The method has been applied to moderately large industrial software systems.
[24]: C. Murray Woodside. Throughput calculation for basic stochastic rendezvous networks. Performance Evaluation, 9:143–160, 1989. (doi:10.1016/0166-5316(89)90039-4)
Distributed computer programs using rendezvous for intertask communication have performace effects due to tasks waiting for rendezvous. Tasks have a dual nature as being both servers of requests and customers in making requests of other tasks. In a previous paper, such tasks have been termed `active servers'. In this paper, a distributed program is modelled as a stochastic network of tasks related by their rendezvous requests. For purposes of defining the throughput, the tasks are assumed to have maximum concurrency as if each task were executed on its own processor. For small networksm exact throughput values van be found by translating them into equivalent Petri Nets, and the translation process is given as an algorithm. An approximate suitable for larger networks, which was found to be accurate to within a few percent in most of the cases tested, is also given.
[25]: Jing Xu, Murray Woodside, and Dorina Petriu. Performance analysis of a software design using the UML profile for schedulability, performance and time. In 13th International Conference on Computer Performance Evaluation; Modelling Techniques and Tools TOOLS 2003, number 2794 in Lecture Notes in Computer Science, pages 291–307, Urbana, IL, USA, September 2003. Springer-Verlag. (doi:10.1007/b12028)
As software development cycles become shorter, it is more important to evaluate non-functional properties of a design, such as its performance (in the sense of response times, capacity and scalability). To assist users of UML (the Unified Modeling Language), a language extension called Profile for Schedulability, Performance and Time has been adopted by OMG. This paper demonstrates the use of the profile to describe performance aspects of design, and to evaluate and evolve the design to deal with performance issues, based on a performance model in the form of a layered queueing network. The focus is on addressing different kinds of performance concerns, and interpreting the results into modifications to the design and to the planned run-time configuration.