Publications by Eric Eide
These documents are provided to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Most of my papers are available online.
Listed in order from newest to oldest.
Programs such as simulators and fuzz testers often use randomness to walk through a large state space in search of interesting paths or outcomes. These explorations can be made more efficient by employing heuristics that ``zero in'' on paths through the state space that are more likely to lead to interesting solutions. Given one path that exhibits a desired property, it may be beneficial to generate and explore similar paths to determine if they produce similarly interesting results. When the random decisions made during this path exploration can be manipulated in such a way that they correspond to discrete structural changes in the result, we call it parametric randomness. Many programming languages, including Racket, provide only simple randomness primitives, making the implementation of parametric randomness somewhat difficult. To address this deficiency, we present Clotho: a Racket library for parametric randomness, designed to be both easy to use and flexible. Clotho supports multiple strategies for using parametric randomness in Racket applications without hassle.
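By way of illustration, the core idea behind parametric randomness can be sketched in a few lines of Python (Clotho itself is a Racket library, and its actual API differs): a program's random draws are recorded as an explicit parameter, and mutating part of that parameter yields a structurally similar run.

    import random

    class ParametricRandom:
        """Record the sequence of random draws so a run can be replayed
        or partially mutated to explore structurally similar paths.
        (Illustrative sketch only, not Clotho's API.)"""

        def __init__(self, trace=None, seed=None):
            self.rng = random.Random(seed)
            self.trace = list(trace) if trace else []   # recorded decisions
            self.pos = 0                                # replay cursor

        def draw(self, n):
            # Return an integer in [0, n); replay a recorded decision if
            # one is available, otherwise draw fresh and record it.
            if self.pos < len(self.trace):
                value = self.trace[self.pos] % n
            else:
                value = self.rng.randrange(n)
                self.trace.append(value)
            self.pos += 1
            return value

    def walk(pr):
        return [pr.draw(10) for _ in range(6)]

    # Run once, then perturb one recorded decision to obtain a
    # "nearby" path through the same generator.
    base = ParametricRandom(seed=42)
    path = walk(base)
    mutated = list(base.trace)
    mutated[3] = (mutated[3] + 1) % 10    # one discrete structural change
    nearby = walk(ParametricRandom(trace=mutated))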
While distributed application-layer tracing is widely used for performance diagnosis in microservices, its coarse, service-level granularity limits its applicability to detecting more fine-grained, system-level issues. To address this problem, cross-layer stitching of tracing information has been proposed. However, all existing cross-layer stitching approaches either require modification of the kernel or need updates in the application-layer tracing library to propagate stitching information, both of which impose further complex modifications on existing tracing tools. This paper introduces Deepstitch, a deep-learning-based approach to stitching cross-layer tracing information that requires no changes to existing application-layer tracing tools. Deepstitch leverages a global view of a distributed application composed of multiple services and learns the global system call sequences across all services involved. This knowledge is then used to stitch system call sequences with service-level traces obtained from a deployed application. Our proof-of-concept experiments show that the proposed approach successfully maps application-level interactions onto system call sequences and can identify thread-level interactions.
This paper provides an overview of the Platform for Open Wireless Data-driven Experimental Research (POWDER). POWDER is a city-scale, remotely accessible, end-to-end software-defined platform to support mobile and wireless research. Compared to other mobile and wireless testbeds, POWDER provides advances in scale, realism, diversity, flexibility, and access.
Kernel-resident malware remains a significant threat. An effective way to detect such malware is to examine the kernel memory of many similar (virtual) machines, as one might find in an enterprise network or cloud, in search of anomalies: i.e., the relatively rare infected hosts within a large population of healthy hosts. It is challenging, however, to compare the kernel memories of different hosts against each other. Previous work has relied on knowledge of specific kernels—e.g., the locations of important variables and the layouts of key data structures—to cross the ``semantic gap'' and allow kernels to be compared. As a result, those previous systems work only with the kernels they were built for, and they make assumptions about the malware being searched for. We present a new approach to detecting kernel-resident malware within a ``herd'' of similar virtual machines. Our approach uses limited knowledge of the kernels under examination—e.g., the location of the page global directory and the processor's instruction set—to concisely fingerprint each kernel. It uses no kernel-specific semantics to compare the fingerprints and find those that represent anomalous hosts. We implement our method in a tool called Fluorescence and demonstrate its ability to identify Linux and Windows hosts infected with real-world, kernel-resident malware. Fluorescence can examine a herd of 200 virtual machines with Linux guests in about an hour.
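A loose Python sketch of the herd-comparison idea follows; the fingerprint construction and anomaly rule here are illustrative simplifications, not Fluorescence's actual design.

    import hashlib
    from collections import Counter

    def fingerprint(kernel_pages):
        # Concise fingerprint: the set of hashes of a kernel's memory
        # pages (a deliberate simplification for illustration).
        return {hashlib.sha256(bytes(p)).hexdigest() for p in kernel_pages}

    def find_anomalies(herd, threshold=0.05):
        """Flag hosts whose fingerprints contain an unusually large
        fraction of pages not shared by a majority of the herd.
        `herd` maps host name -> fingerprint set."""
        counts = Counter(h for fp in herd.values() for h in fp)
        common = {h for h, c in counts.items() if c > len(herd) // 2}
        flagged = []
        for host, fp in herd.items():
            unusual = len(fp - common) / max(len(fp), 1)
            if unusual > threshold:
                flagged.append((host, unusual))
        return sorted(flagged, key=lambda x: -x[1])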
Given the highly empirical nature of research in cloud computing, networked systems, and related fields, testbeds play an important role in the research ecosystem. In this paper, we cover one such facility, CloudLab, which supports systems research by providing raw access to programmable hardware, enabling research at large scales, and creating a shared platform for repeatable research. We present our experiences designing CloudLab and operating it for four years, serving nearly 4,000 users who have run over 79,000 experiments on 2,250 servers, switches, and other pieces of datacenter equipment. From this experience, we draw lessons organized around two themes. The first set comes from analysis of data regarding the use of CloudLab: how users interact with it, what they use it for, and the implications for facility design and operation. Our second set of lessons comes from looking at the ways that algorithms used ``under the hood,'' such as resource allocation, have important—and sometimes unexpected—effects on user experience and behavior. These lessons can be of value to the designers and operators of IaaS facilities in general, systems testbeds in particular, and users who have a stake in understanding how these systems are built.
Though there has been much study of information leakage channels exploiting shared hardware resources (memory, cache, and disk) in cloud environments, there has been less study of the exploitability of shared software resources. In this paper, we analyze the exploitability of cloud networking services (which are shared among cloud tenants) and introduce a practical method for building information leakage channels by monitoring workloads on the cloud networking services through the virtual firewall. We also demonstrate the practicality of this attack by implementing two different covert channels in OpenStack as well as a new class of side channels that can eavesdrop on infrastructure-level events. By utilizing a Long Short-Term Memory (LSTM) neural network model, our side-channel attack can detect infrastructure-level VM creation/termination events with 93.3% accuracy.
Researchers conduct experiments in a variety of computing environments, including dedicated testbeds and commercial clouds, and they need convenient mechanisms for deploying their software within these disparate platforms. To address this need, we have extended Emulab so that it can instantiate and configure container-based virtual devices using Docker images. Docker is a de facto standard for packaging and deploying software in cloud environments; now, researchers can use Docker to package and deploy software within Emulab-based testbeds as well. We describe how Emulab incorporates Docker and how it extends Docker images to support the interactivity that researchers expect within a testbed. We show that Emulab can use many popular Docker images to create testbed experiments. We expect that Emulab's support for Docker will make it easier for researchers to move their activities freely, both into the testbed and out into production.
Because cloud users do not have direct visibility into the cloud provider's infrastructure, they generally depend on information provided by the provider when they need to know the states of their virtual resources. In this document, we introduce an approach that lets cloud users monitor the update time of infrastructure-level virtual firewalls (called Security Groups), and we present technical details for practical deployment of this approach in an OpenStack environment.
We present CapNet, a capability-based network architecture designed to enable least authority and secure collaboration in the cloud. CapNet allows fine-grained management of rights, recursive delegation, hierarchical policies, and least privilege. To enable secure collaboration, CapNet extends a classical capability model with support for decentralized authority. We implement CapNet in the substrate of a software-defined network, integrate it with the OpenStack cloud, and develop protocols enabling secure multi-party collaboration.
Wingman is a run-time monitoring system that aims to detect and mitigate anomalies, including malware infections, within virtual appliances (VAs). It observes the kernel state of a VA and uses an expert system to determine when that state is anomalous. Wingman does not simply restart a compromised VA; instead, it attempts to repair the VA, thereby minimizing potential downtime and state loss. This paper describes Wingman and summarizes experiments in which it detected and mitigated three types of malware within a web-server VA. For each attack, Wingman was able to defend the VA by bringing it to an acceptable state.
We will demonstrate features and capabilities of the PhantomNet testbed. PhantomNet is a mobile testbed at the University of Utah, aimed at enabling a broad range of mobile-networking research. PhantomNet is remotely accessible and open to the mobile networking research community.
Efficient deterministic replay of whole operating systems is feasible and useful, so why isn't replay a default part of the software stack? While implementing deterministic replay is hard, we argue that the main reason is the lack of general abstractions for understanding and addressing the significant engineering challenges involved in the development of a replay engine for a modern VMM. We present a design blueprint—a set of abstractions, general principles, and low-level implementation details—for efficient deterministic replay in a modern hypervisor. We build and evaluate our architecture in Xen, a full-featured hypervisor. Our architecture can be readily followed and adopted, enabling replay as a ubiquitous part of a modern virtualization stack.
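For illustration, the record/replay abstraction at the heart of any such engine can be sketched as follows; this is a minimal Python skeleton under simplifying assumptions, not the paper's Xen-level architecture.

    import random

    class ReplayEngine:
        """While recording, log every value read from a nondeterministic
        source; while replaying, feed the log back so execution repeats
        exactly. A real VMM-level engine must also pin down interrupt
        delivery points, DMA, and instruction counts."""

        def __init__(self, log=None):
            self.recording = log is None
            self.log = [] if log is None else list(log)

        def nondet(self, source):
            # Wrap one nondeterministic input: record it once,
            # replay it later.
            if self.recording:
                value = source()
                self.log.append(value)
                return value
            return self.log.pop(0)

    recorder = ReplayEngine()
    run1 = [recorder.nondet(lambda: random.randrange(100)) for _ in range(3)]
    replayer = ReplayEngine(recorder.log)
    run2 = [replayer.nondet(lambda: random.randrange(100)) for _ in range(3)]
    assert run1 == run2   # replay reproduces the recorded run exactly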
Penetration testing—the process of probing a deployed system for security vulnerabilities—involves a fundamental tension. If one tests a production system, there is a real danger of collateral damage; this is particularly true for systems hosted in the cloud due to the presence of other tenants. If one tests against a separate system brought up to model the live one, the dynamic state of the production system is not captured, and the value of the test is reduced. This paper presents Potassium, which provides penetration testing as a service (PTaaS) and resolves this tension for system owners, penetration testers, and cloud providers. Potassium uses techniques originally developed for live migration of virtual machines to clone them instead, capturing their full disk, memory, and network state. Potassium isolates the cloned system from the rest of the cloud, providing confidence that side effects of the penetration test will not harm other tenants. The penetration tester effectively owns the cloned system, allowing testing to be more thorough, efficient, and automatable. Experiments with our Potassium prototype show that PTaaS can detect real-world vulnerabilities while having minimal impact on cloud-based production systems.
Many clouds and network testbeds use disk images to initialize local storage on their compute devices. Large facilities must manage thousands of images or more, requiring significant amounts of storage. At the same time, to provide a good user experience, they must be able to deploy those images quickly. Driven by our experience in operating the Emulab site at the University of Utah—a long-lived and heavily used testbed—we have created a new service for efficiently storing and deploying disk images. This service exploits the redundant data found in similar images, using deduplication to greatly reduce the amount of physical storage required. In addition to space savings, our system is also designed for highly efficient image deployment—it integrates with an existing highly-optimized disk image deployment system, Frisbee, without significantly increasing the time required to distribute and install images. In this paper, we explain the design of our system and discuss the trade-offs we made to strike a balance between efficient storage and fast disk image deployment. We also propose a new chunking algorithm, called AFC, which enables fixed-size chunking for deduplicating allocated disk sectors. Experimental results show that our system reduces storage requirements by a factor of up to three while imposing only a negligible runtime overhead on the end-to-end disk-deployment process.
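A minimal sketch of the storage side of this idea appears below; it shows hash-based deduplication over fixed-size chunks of allocated data only, and is not the AFC algorithm itself. The `read_sectors` helper and `chunk_store` dict are hypothetical stand-ins for a real image reader and chunk database.

    import hashlib

    CHUNK = 64 * 1024  # fixed chunk size; illustrative only

    def store_image(allocated_ranges, read_sectors, chunk_store):
        """Deduplicate a disk image: split only its *allocated* byte
        ranges into fixed-size chunks and store each unique chunk once,
        keyed by content hash."""
        recipe = []  # (offset, chunk_key) pairs; enough to rebuild the image
        for offset, length in allocated_ranges:
            data = read_sectors(offset, length)
            for i in range(0, len(data), CHUNK):
                chunk = data[i:i + CHUNK]
                key = hashlib.sha1(chunk).hexdigest()
                chunk_store.setdefault(key, chunk)  # stored once per unique chunk
                recipe.append((offset + i, key))
        return recipe

Images that share content (e.g., two snapshots of the same base OS) then share chunk storage, while each image keeps only its small recipe.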
The PhantomNet facility allows experimenters to combine mobile networking, cloud computing, and software-defined networking in a single environment. It is an end-to-end testbed, meaning that it supports experiments not just with mobile end-user devices but also with a cellular core network that can be configured and extended with new technologies. This article introduces PhantomNet and presents a road map for its future development. The current PhantomNet prototype is available at no cost to researchers and educational users.
Virtual machine introspection (VMI) allows users to debug software that executes within a virtual machine. To support rich, whole-system analyses, a VMI tool must inspect and control systems at multiple levels of the software stack. Traditional debuggers enable inspection and control, but they limit users to treating a whole system as just one kind of target: e.g., just a kernel, or just a process, but not both. We created Stackdb, a debugging library with VMI support that allows one to monitor and control a whole system through multiple, coordinated targets. A target corresponds to a particular level of the system's software stack; multiple targets allow a user to observe a VM guest at several levels of abstraction simultaneously. For example, with Stackdb, one can observe a PHP script running in a Linux process in a Xen VM via three coordinated targets at the language, process, and kernel levels. Within Stackdb, higher-level targets are components that utilize lower-level targets; a key contribution of Stackdb is its API that supports multi-level and flexible ``stacks'' of targets. This paper describes the challenges we faced in creating Stackdb, presents the solutions we devised, and evaluates Stackdb through its application to a security-focused, whole-system case study.
Test features are basic compositional units used to describe what a test does (and does not) involve. For example, in API-based testing, the most obvious features are function calls; in grammar-based testing, the obvious features are the elements of the grammar. The relationship between features, as abstractions of tests, and the behaviors produced by the tested program is surprisingly poorly understood. This paper shows how large-scale random testing modified to use diverse feature sets can uncover causal relationships between what a test contains and what the program being tested does. We introduce a general notion of observable behaviors as targets, where a target can be a detected fault, an executed branch or statement, or a complex coverage entity such as a state, predicate-valuation, or program path. While it is obvious that targets have triggers — features without which they cannot be hit by a test — the notion of suppressors — features which make a test less likely to hit a target — has received little attention despite having important implications for automated test generation and program understanding. For a set of subjects including C compilers, a flash file system, and JavaScript engines, we show that suppression is both common and important.
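For illustration, a simple way to look for triggers and suppressors is to compare a target's hit rate between random tests that contain a feature and tests that do not. This is a bare-bones Python sketch; the paper's analysis is statistically more careful.

    def feature_effect(tests, feature, target):
        """Classify a feature's effect on a target.
        `tests` is a list of (features: set, targets_hit: set) pairs
        gathered from many random tests."""
        with_f = [hit for feats, hit in tests if feature in feats]
        without_f = [hit for feats, hit in tests if feature not in feats]
        rate = lambda group: sum(target in hit for hit in group) / max(len(group), 1)
        r_with, r_without = rate(with_f), rate(without_f)
        if r_with > r_without:
            return "potential trigger", r_with, r_without
        if r_with < r_without:
            return "potential suppressor", r_with, r_without
        return "neutral", r_with, r_without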
For modern software systems, performance analysis can be a challenging task. The software stack can be a complex, multi-layer, multi-component, concurrent, and parallel environment with multiple contexts of execution and multiple sources of performance data. Although much performance data is available, because modern systems incorporate many mature data-collection mechanisms, analysis algorithms suffer from the lack of a unifying programming environment for processing the collected performance data, potentially from multiple sources, in a convenient and script-like manner. This paper presents Weir, a streaming language for systems performance analysis. Weir is based on the insight that performance-analysis algorithms can be naturally expressed as stream-processing pipelines. In Weir, an analysis algorithm is implemented as a graph composed of stages, where each stage operates on a stream of events that represent collected performance measurements. Weir is an imperative streaming language with a syntax designed for the convenient construction of stream pipelines that utilize composable and reusable analysis stages. To demonstrate practical application, this paper presents the authors' experience in using Weir to analyze performance in systems based on the Xen virtualization platform.
Reliable isolation of malicious application inputs is necessary for preventing the future success of an observed novel attack after the initial incident. In this paper we describe, measure, and analyze Input-Reduction, a technique that can quickly isolate malicious external inputs, which may embody unforeseen and potentially novel attacks, from other benign application inputs. The Input-Reduction technique is integrated into an advanced, security-focused, and adaptive execution environment that automates diagnosis and repair. In experiments we show that Input-Reduction is highly accurate and efficient in isolating attack inputs and determining causal relations between inputs. We also measure and show that the cost incurred by key services that support reliable reproduction and fast attack isolation is reasonable in the adaptive execution environment.
Aggressive random testing tools (``fuzzers'') are impressively effective at finding compiler bugs. For example, a single test-case generator has resulted in more than 1,700 bugs reported for a single JavaScript engine. However, fuzzers can be frustrating to use: they indiscriminately and repeatedly find bugs that may not be severe enough to fix right away. Currently, users filter out undesirable test cases using ad hoc methods such as disallowing problematic features in tests and grepping test results. This paper formulates and addresses the fuzzer taming problem: given a potentially large number of random test cases that trigger failures, order them such that diverse, interesting test cases are highly ranked. Our evaluation shows our ability to solve the fuzzer taming problem for 3,799 test cases triggering 46 bugs in a C compiler and 2,603 test cases triggering 28 bugs in a JavaScript engine.
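One plausible way to realize such a ranking is a ``furthest point first'' ordering over some distance metric on test cases, sketched below in Python. This is an illustrative approach under assumed inputs, not necessarily the paper's exact method; `distance` can be any metric on test cases (e.g., edit distance on reduced tests).

    def rank_diverse(test_cases, distance):
        """Order test cases so each successive case is as different as
        possible from all earlier ones, pushing diverse cases to the top."""
        remaining = list(test_cases)
        ranked = [remaining.pop(0)]          # seed with an arbitrary test
        dist_to_ranked = {id(t): distance(t, ranked[0]) for t in remaining}
        while remaining:
            # Pick the test farthest from everything already ranked.
            best = max(remaining, key=lambda t: dist_to_ranked[id(t)])
            remaining.remove(best)
            ranked.append(best)
            for t in remaining:              # maintain min-distance to ranked set
                d = distance(t, best)
                if d < dist_to_ranked[id(t)]:
                    dist_to_ranked[id(t)] = d
        return ranked

A developer can then examine the top of the ranked list and expect to see many distinct bugs early, rather than hundreds of duplicates of one frequent failure.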
A variability mechanism is a software implementation technique that realizes a choice in the features that are incorporated into a software system. Variability mechanisms are essential for the implementation of configurable software, but the nature of mechanisms as structuring abstractions is not well understood. Mechanisms are often equated with their stereotypical realizations. As a consequence, certain variability mechanisms are generally viewed as having limited applicability due to run-time performance overhead. We claim that it is feasible and useful to realize variability mechanisms in novel ways to improve the run-time performance of software systems. This claim is supported by the implementation and evaluation of three examples.
The first is the flexible generation of performance-optimized code from high-level specifications. As exemplified by Flick, an interface definition language (IDL) compiler kit, concepts from traditional programming language compilers can be applied to bring both flexibility and optimization to the domain of IDL compilation.
The second is a method for realizing design patterns within software that is neither object-oriented nor, for the most part, dynamically configured. By separating static software design from dynamic system behavior, this technique enables more effective, domain-specific detection of design errors, prediction of run-time behavior, and optimization. The method is demonstrated using a suite of operating-system components.
The third, middleware-based brokering of resources, can improve the ability of multi-agent, real-time applications to maintain quality of service (QoS) in the face of resource contention. A CPU broker, for instance, may use application feedback and other inputs to adjust CPU allocations at run time. This helps to ensure that applications continue to function, or at least degrade gracefully, under load.
This paper describes an ongoing research effort that aims to use adaptation to defend individual applications against novel attacks. Application-focused adaptive security spans the adaptive use of security mechanisms in both the host and the network. The work presented in this paper develops key infrastructure capabilities and supporting services, including mandatory mediation of application I/O, record and replay of channel interaction, and VMI-based monitoring and analysis of execution. These capabilities will facilitate replay-based diagnosis and patch derivation for attacks that succeed and go unnoticed until a known undesired condition manifests. After describing the basics, we present the results from our initial evaluation and outline the next steps.
Swarm testing is a novel and inexpensive way to improve the diversity of test cases generated during random testing. Increased diversity leads to improved coverage and fault detection. In swarm testing, the usual practice of potentially including all features in every test case is abandoned. Rather, a large ``swarm'' of randomly generated configurations, each of which omits some features, is used, with configurations receiving equal resources. We have identified two mechanisms by which feature omission leads to better exploration of a system's state space. First, some features actively prevent the system from executing interesting behaviors; e.g., ``pop'' calls may prevent a stack data structure from executing a bug in its overflow detection logic. Second, even when there is no active suppression of behaviors, test features compete for space in each test, limiting the depth to which logic driven by features can be explored. Experimental results show that swarm testing increases coverage and can improve fault detection dramatically; for example, in a week of testing it found 42% more distinct ways to crash a collection of C compilers than did the heavily hand-tuned default configuration of a random tester.
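The generation of swarm configurations themselves is simple; here is a sketch in Python (illustrative only; real swarm testing feeds such configurations to an actual test generator):

    import random

    def swarm_configs(features, n_configs, p_include=0.5):
        """Generate a "swarm" of random test configurations, each of
        which omits roughly half of the available features; a generator
        is then run under each configuration with equal resources."""
        return [{f for f in features if random.random() < p_include}
                for _ in range(n_configs)]

    # Example: stack-API testing, where configurations omitting "pop"
    # let stacks grow deep enough to reach overflow-handling code.
    features = {"push", "pop", "peek", "clear", "resize"}
    swarm = swarm_configs(features, n_configs=100)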
To report a compiler bug, one must often find a small test case that triggers the bug. The existing approach to automated test-case reduction, delta debugging, works by removing substrings of the original input; the result is a concatenation of substrings that delta cannot remove. We have found this approach less than ideal for reducing C programs because it typically yields test cases that are too large or even invalid (relying on undefined behavior). To obtain small and valid test cases consistently, we designed and implemented three new, domain-specific test-case reducers. The best of these is based on a novel framework in which a generic fixpoint computation invokes modular transformations that perform reduction operations. This reducer produces outputs that are, on average, more than 25 times smaller than those produced by our other reducers or by the existing reducer that is most commonly used by compiler developers. We conclude that effective program reduction requires more than straightforward delta debugging.
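The fixpoint framework can be sketched generically: each modular transformation proposes smaller variants, any variant that still triggers the bug becomes the new test case, and the loop repeats until no transformation makes progress. This is a shape-of-the-algorithm sketch in Python, not the actual reducer.

    def reduce_fixpoint(test_case, transforms, still_interesting):
        """Generic fixpoint reduction driver.
        `transforms` is a list of functions, each yielding candidate
        reduced variants of a test case; `still_interesting` checks
        whether a variant still triggers the bug."""
        changed = True
        while changed:                    # iterate until a fixpoint
            changed = False
            for transform in transforms:
                for variant in transform(test_case):
                    if still_interesting(variant):
                        test_case = variant   # keep the smaller variant
                        changed = True
                        break
        return test_case

Domain knowledge lives entirely in the transformations (e.g., ones that remove a function, inline a call, or simplify an expression while keeping the program valid), which is what lets such a reducer avoid the invalid, undefined-behavior-dependent outputs that plain delta debugging tends to produce.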
This report summarizes the Sixth Workshop on Programming Languages and Operating Systems (PLOS 2011), which was held in conjunction with the SOSP 2011 conference. It presents the motivation for the PLOS workshop series and describes the contributions of the PLOS 2011 event.
Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong-code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.
Network testbeds like Emulab allocate physical computers to users for the duration of an experiment. During an experiment, a user has nearly unfettered access to the devices under his or her control. Thus, at the end of an experiment, an allocated computer can be in an arbitrary state. A testbed must reclaim devices and ensure they are properly configured for future experiments. This is particularly important for security-related experiments: for example, a testbed must ensure that malware cannot persist on a device from one experiment to another. This paper presents the prototype trusted disk-loading system (TDLS) that we have implemented for Emulab. When Emulab allocates a PC to an experiment, the TDLS ensures that if experiment set-up succeeds, the PC is configured to boot the operating system specified by the user. The TDLS uses the Trusted Platform Module (TPM) of an allocated PC to securely communicate with Emulab's control infrastructure and attest about the PC's configuration. The TDLS prevents state from surviving from one experiment to another, and it prevents devices in the testbed from impersonating one another. The TDLS addresses the challenges of providing a scalable and flexible service, which allows large testbeds to support a wide range of systems research. We describe these challenges, detail our TDLS for Emulab, and present the lessons we have learned from its construction.
The increasing use of data repositories, testbeds, and experiment-management systems shows that the networking and systems research communities are moving in the direction of repeatability. We assert, however, that the goal of these communities should not be repeatable research, but “replayable” research. Beyond encapsulating the definition and history of an experiment, a replayable experiment is associated with a mechanism for actually re-executing a system under test. In this paper, we outline the challenges to be overcome in building an archive of replayable experiments in computer networking and systems research.
This report summarizes the Fifth Workshop on Programming Languages and Operating Systems (PLOS 2009), which was held in conjunction with the SOSP 2009 conference. This report presents the motivation for holding the workshop and summarizes the workshop contributions.
C's volatile qualifier is intended to provide a reliable link between operations at the source-code level and operations at the memory-system level. We tested thirteen production-quality C compilers and, for each, found situations in which the compiler generated incorrect code for accessing volatile variables. This result is disturbing because it implies that embedded software and operating systems—both typically coded in C, both being bases for many mission-critical and safety-critical applications, and both relying on the correct translation of volatiles—may be miscompiled. Our contribution is centered on a novel technique for finding volatile bugs and a novel technique for working around them. First, we present access summary testing: an efficient, practical, and automatic way to detect code-generation errors related to the volatile qualifier. We have found a number of compiler bugs by performing access summary testing on randomly generated C programs. Some of these bugs have been confirmed and fixed by compiler developers. Second, we present and evaluate a workaround for the compiler defects we discovered. In 96% of the cases in which one of our randomly generated programs is miscompiled, we can cause the faulty C compiler to produce correctly behaving code by applying a straightforward source-level transformation to the test program.
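For illustration, the comparison step of something like access summary testing can be sketched as follows, assuming access traces have already been collected from an instrumented execution platform. The trace representation here is hypothetical; the actual technique operates on compiled code.

    from collections import Counter

    def access_summary(trace):
        # Count loads and stores per volatile variable from a trace of
        # (op, variable) events, e.g., ("load", "v").
        return Counter(trace)

    def volatile_mismatches(trace_ref, trace_opt):
        """A program's volatile accesses must not change across
        optimization levels; any difference between the two summaries
        signals a likely miscompilation."""
        ref, opt = access_summary(trace_ref), access_summary(trace_opt)
        return {k: (ref[k], opt[k])
                for k in ref.keys() | opt.keys()
                if ref[k] != opt[k]}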
In a software product line, the binding time of a feature is the time at which one decides to include or exclude a feature from a product. Typical binding site implementations are intended to support a single binding time only, e.g., compile time or run time. Sometimes, however, a product line must support features with variable binding times. For instance, a product line may need to include both embedded system configurations, in which features are selected and optimized early, and desktop configurations, in which client programs choose features on demand. We present a new technique for implementing the binding sites of features that require flexible binding times. Our technique combines design patterns and aspect-oriented programming: a pattern encapsulates the variation point, and targeted aspects—called edicts—set the binding times of the pattern participants. We describe our approach and demonstrate its usefulness by creating a middleware product line capable of serving the desktop and embedded domains. Our product line is based on JacORB, a middleware platform with many dynamically configurable features. By using edicts to select features at compile time, we create a version of JacORB more suited to resource-constrained environments. By configuring four JacORB subsystems via edicts, we achieve a 32.2% reduction in code size. Our examples show that our technique effectively modularizes binding-time concerns, supporting both compile-time optimization and run-time flexibility as needed.
Reliable sensor network software is difficult to create: applications are concurrent and distributed, hardware-based memory protection is unavailable, and severe resource constraints necessitate the use of unsafe, low-level languages. Our work improves this situation by providing efficient memory and type safety for TinyOS 2 applications running on the Mica2, MicaZ, and TelosB platforms. Safe execution ensures that array and pointer errors are caught before they can corrupt RAM. Our contributions include showing that aggressive optimizations can make safe execution practical in terms of resource usage; developing a technique for efficiently enforcing safety under interrupt-driven concurrency; extending the nesC language and compiler to support safety annotations; finding previously unknown bugs in TinyOS; and, finally, showing that safety can be exploited to increase the availability of sensor network applications even when memory errors are left unfixed.
The software that runs on a typical wireless sensor network node must address a variety of constraints that are imposed by its purpose and implementation platform. Examples of such constraints include real-time behavior, highly limited RAM and ROM, and other scarce resources. These constraints lead to crosscutting concerns for the implementations of sensor network software: that is, all parts of the software must be carefully written to respect its resource constraints. Neither traditional languages (such as C) nor component-based languages (such as nesC) for implementing sensor network software allow programmers to deal with crosscutting resource constraints in a modular fashion. In this paper we describe Aspect nesC (ANesC), a language we are now implementing to help programmers modularize the implementations of crosscutting concerns within sensor network software. Aspect nesC extends nesC, a component-based dialect of C, with constructs for aspect-oriented programming. In addition to combining the ideas of components and aspects in a single language, ANesC will provide specific and novel constructs for resource-management concerns. For instance, pointcuts can identify program points at which the run-time stack is about to be exhausted or a real-time deadline has been missed. Corrective actions can be associated with these points via ``advice.'' A primary task of the Aspect nesC compiler is to implement such resource-focused aspects in an efficient manner.
The networked and distributed systems research communities have an increasing need for ``replayable'' research, but our current experimentation resources fall short of satisfying this need. Replayable activities are those that can be re-executed, either as-is or in modified form, yielding new results that can be compared to previous ones. Replayability requires complete records of experiment processes and data, of course, but it also requires facilities that allow those processes to actually be examined, repeated, modified, and reused. We are now evolving Emulab, our popular network testbed management system, to be the basis of a new experimentation workbench in support of realistic, large-scale, replayable research. We have implemented a new model of testbed-based experiments that allows people to move forward and backward through their experimentation processes. Integrated tools help researchers manage their activities (both planned and unplanned), software artifacts, data, and analyses. We present the workbench, describe its implementation, and report how it has been used by early adopters. Our initial case studies highlight both the utility of the current workbench and additional usability challenges that must be addressed.
The network and distributed systems research communities have an increasing need for ``replayable'' research, but our current experimentation resources fall short of reaching this goal. Replayable activities are those that can be re-executed, either as-is or in modified form, yielding new results that can be compared to previously obtained results. Replayability requires complete records of experiment processes and data, of course, but it also requires facilities that allow those processes to actually be examined, repeated, modified, and reused for new studies. We are now evolving Emulab, our popular network testbed management system, to be the basis of a new ``experimentation workbench'' in support of realistic, large-scale, replayable networking research. We have implemented a new model of testbed-based experiments, based on scientific workflow, that allows people to move forward and backward through their experimentation processes. Integrated tools in the workbench help researchers manage their activities (both planned and unplanned), software artifacts, data, and analyses. In this paper we present the workbench, describe its implementation, and report how it has been used by early adopters. Our initial case studies, using PlanetLab nodes, cluster PCs, 802.11 wireless, and software radio, have highlighted both the utility of the current workbench and additional usability challenges that must be addressed.
We report our experience in implementing type and memory safety in an efficient manner for sensor network nodes running TinyOS: tiny embedded systems running legacy, C-like code. A compiler for a safe language must often insert dynamic checks into the programs it produces; these generally make programs both larger and slower. In this paper, we describe our novel compiler toolchain, which uses a family of techniques to minimize or avoid these run-time costs. Our results show that safety can in fact be implemented cheaply on low-end 8-bit microcontrollers.
Sensor network applications should be reliable. However, TinyOS, the dominant sensor net OS, lacks basic building blocks for reliable software systems: memory protection, isolation, and safe termination. These features are typically found in general-purpose operating systems but are believed to be too expensive for tiny embedded systems with a few kilobytes of RAM. We dispel this notion and show that CCured, a safe dialect of C, can be leveraged to provide memory safety for largely unmodified TinyOS applications. We build upon safety to implement two very different environments for TinyOS applications. The first, Safe TinyOS, provides a minimal kernel for safely executing trusted applications. Safe execution traps and identifies bugs that would otherwise have silently corrupted RAM. The second environment, UTOS, implements a user-kernel boundary that supports isolation and safe termination of untrusted code. Existing TinyOS components can often be ported to UTOS with little effort. To create our environments, we substantially augmented the CCured toolchain to emit code that is safe under interrupt-driven concurrency, to reduce storage requirements by compressing error messages, to refactor direct hardware access into calls to trusted helper functions, and to make safe programs more efficient using whole-program optimization. A surprising result of our work is that a safe, optimized TinyOS program can be faster than the original unsafe, unoptimized application.
The main forces that shaped current network testbeds were the needs for realism and scale. Now that several testbeds support large and complex experiments, management of experimentation processes and results has become more difficult and a barrier to high-quality systems research. The popularity of network testbeds means that new tools for managing experiment workflows, addressing the ready-made base of testbed users, can have important and significant impacts. We are now evolving Emulab, our large and popular network testbed, to support experiments that are organized around scientific workflows. This paper summarizes the opportunities in this area, the new approaches we are taking, our implementation in progress, and the challenges in adapting scientific workflow concepts for testbed-based research. With our system, we expect to demonstrate that a network testbed with integrated scientific workflow management can be an important tool to aid research in networking and distributed systems.
The main forces that shaped current network testbeds were the needs for realism and scale. Now that several testbeds support large and complex experiments, management of experimentation processes and results has become more difficult and a barrier to high-quality systems research. The popularity of network testbeds means that new tools for managing experiment workflows, addressing the ready-made base of testbed users, can have important and significant impacts. We are now evolving Emulab, our large and popular network testbed, to support experiments that are organized around scientific workflows. This paper summarizes the opportunities in this area, the new approaches we are taking, our implementation in progress, and the challenges in adapting scientific workflow concepts for testbed-based research. With our system, we expect to demonstrate that network testbeds with integrated scientific workflow management can be an important tool to aid research in networking and distributed systems.
An implementation and deployment plan is critically important in the development of a software product or product line. A good plan results from an informed feature analysis that incorporates crosscutting concerns such as variability management and the project time-line. A poor feature analysis, on the other hand, can result in an ineffective project schedule and an implementation that hinders the delivery of new features. An early understanding of product features can help a project planner create an effective plan, and help a designer avoid unsupportable code. Feature typing is our emerging model that describes what features are in terms of two separate natures. The vertical nature of a feature generally follows a UML use-case path, describing what the feature does. The horizontal nature expresses a program characteristic, describing what the feature has. Because each nature affects feature interaction and product integration and testing, early identification and typing can increase predictability of project time-lines and reduce code complexity.
This paper details the feature typing model and its application in two separate case studies. The studies demonstrate the ways that feature typing can help with the design, planning, coding, and testing of feature implementations. The studies also illuminate the limitations of the approach.
Many real-world distributed, real-time, embedded (DRE) systems, such as multi-agent military applications, are built using commercially available operating systems, middleware, and collections of pre-existing software. The complexity of these systems makes it difficult to ensure that they maintain high quality of service (QoS). At design time, the challenge is to introduce coordinated QoS controls into multiple software elements in a non-invasive manner. At run time, the system must adapt dynamically to maintain high QoS in the face of both expected events, such as application mode changes, and unexpected events, such as resource demands from other applications. In this paper we describe the design and implementation of a CPU Broker for these types of DRE systems. The CPU Broker mediates between multiple real-time tasks and the facilities of a real-time operating system: using feedback and other inputs, it adjusts allocations over time to ensure that high application-level QoS is maintained. The broker connects to its monitored tasks in a non-invasive manner, is based on and integrated with industry-standard middleware, and implements an open architecture for new CPU management policies. Moreover, these features allow the broker to be easily combined with other QoS mechanisms and policies, as part of an overall end-to-end QoS management system. We describe our experience in applying the CPU Broker to a simulated DRE military system. Our results show that the broker connects to the system transparently and allows it to function in the face of run-time CPU resource contention.
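As an illustration of the feedback idea, a single rebalancing step of a broker-like policy might look like the Python sketch below. It shows a simple proportional policy under assumed inputs, not the CPU Broker's actual algorithms.

    def rebalance(tasks, capacity):
        """One feedback step: each task reports its recent CPU usage and
        a requested reservation (both as CPU fractions); scale requests
        to fit the available capacity so tasks degrade proportionally
        rather than failing outright under contention.
        `tasks` maps name -> (recent_usage, requested_share)."""
        requested = {name: max(usage, req)
                     for name, (usage, req) in tasks.items()}
        total = sum(requested.values())
        if total <= capacity:
            return requested               # everything fits as requested
        scale = capacity / total           # degrade gracefully under load
        return {name: share * scale for name, share in requested.items()}

    allocs = rebalance({"radar": (0.30, 0.40),
                        "comms": (0.25, 0.20),
                        "ui":    (0.15, 0.10)}, capacity=0.8)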
Feature-wise decomposition is an important approach to building configurable software systems. Although there has been research on the usefulness of particular tools for feature-wise decomposition, there are few informative comparisons of the relative effectiveness of different tools. In this paper, we compare AspectJ and Jiazzi, which are two different systems for decomposing Java programs. AspectJ is an aspect-oriented extension to Java, whereas Jiazzi is a component system for Java. To compare these systems, we reimplemented an AspectJ implementation of a highly configurable CORBA Event Service using Jiazzi. Our experience is that Jiazzi provides better support for structuring the system and manipulating features, while AspectJ is more suitable for manipulating existing Java code in non-invasive and unanticipated ways.
Mobile code makes it possible for users to define the processing and protocols used to communicate with a remote node, while still allowing the remote administrator to set the terms of interaction with that node. However, mobile code cannot do anything useful without a rich execution environment, and no administrator would install a rich environment that did not also provide strict controls over the resources consumed and accessed by the mobile code. Based on our experience with ANTS, we have developed Bees, an execution environment that provides better security, fine-grained control over capsule propagation, simple composition of active protocols, and a more flexible mechanism for interacting with end-user programs. Bees' security comes from a flexible authentication and authorization mechanism, capability-based access to privileged resources, and integration with our custom virtual machine that provides isolation, termination, and resource control. The enhancements to the mobile code environment make it possible to compose a protocol with a number of ``helper'' protocols. In addition, mobile code can now interact naturally with end-user programs, making it possible to communicate with legacy applications. We believe that these features offer significant improvements over the ANTS execution environment and create a more viable platform for active applications.
Design patterns are a valuable mechanism for emphasizing structure, capturing design expertise, and facilitating restructuring of software systems. Patterns are typically applied in the context of an object-oriented language and are implemented so that the pattern participants correspond to object instances that are created and connected at run-time. This paper describes a complementary realization of design patterns, in which many pattern participants correspond to statically instantiated and connected components. Our approach separates the static parts of the software design from the dynamic parts of the system behavior. This separation makes the software design more amenable to analysis, thus enabling more effective, domain-specific detection of system design errors, prediction of run-time behavior, and optimization. This technique is applicable to imperative, functional, and object-oriented languages: we have extended C, Scheme, and Java with our component model. In this paper, we illustrate our approach in the context of the OSKit, a collection of operating system components written in C.
Design patterns are a valuable mechanism for emphasizing structure, capturing design expertise, and facilitating restructuring of software systems. Patterns are typically applied in the context of an object-oriented language and are implemented so that the pattern participants correspond to object instances that are created and connected at run-time. This paper describes a complementary realization of design patterns, in which the pattern participants are statically instantiated and connected components. Our approach separates the static parts of the software design from the dynamic parts of the system behavior. This separation makes the software design more amenable to analysis, enabling more effective, domain-specific detection of system design errors, prediction of run-time behavior, and optimization. This technique is applicable to imperative, functional, and object-oriented languages: we have extended C, Scheme, and Java with our component model. In this paper, we illustrate this approach in the context of the OSKit, a collection of operating system components written in C.
Knit is a new component specification and linking language. It was initially designed for low-level systems software, which requires especially flexible components with especially well-defined interfaces. For example, threads and virtual memory are typically implemented by components within the system, instead of being supplied by some execution environment. Consequently, components used to construct the system must expose interactions with threads and memory. The component composition tool must then check the resulting system for correctness, and weave the components together to achieve reasonable performance. Component composition with Knit thus acts like aspect weaving: component interfaces determine the ``join points'' for weaving, while components (some of which may be automatically generated) implement aspects. Knit is not limited to the construction of low-level software, and to the degree that a set of components exposes fine-grained relationships, Knit provides the benefits of aspect-oriented programming within its component model.
Knit is a new component definition and linking language for systems code. Knit helps make C code more understandable and reusable by third parties, helps eliminate much of the performance overhead of componentization, detects subtle errors in component composition that cannot be caught with normal component type systems, and provides a foundation for developing future analyses over C-based components, such as cross-component optimization. The language is especially designed for use with component kits, where standard linking tools provide inadequate support for component configuration. In particular, we developed Knit for use with the OSKit, a large collection of components for building low-level systems. However, Knit is not OSKit-specific, and we have implemented parts of the Click modular router in terms of Knit components to illustrate the expressiveness and flexibility of our language. This paper provides an overview of the Knit language and its applications.
Distributed applications are complex by nature, so it is essential that there be effective software development tools to aid in the construction of these programs. Commonplace ``middleware'' tools, however, often impose a tradeoff between programmer productivity and application performance. For instance, many CORBA IDL compilers generate code that is too slow for high-performance systems. More importantly, these compilers provide inadequate support for sophisticated patterns of communication. We believe that these problems can be overcome, thus making IDL compilers and similar middleware tools useful for a broader range of systems. To this end we have implemented Flick, a flexible and optimizing IDL compiler, and are using it to produce specialized high-performance code for complex distributed applications. Flick can produce specially ``decomposed'' stubs that encapsulate different aspects of communication in separate functions, thus providing application programmers with fine-grain control over all messages. The design of our decomposed stubs was inspired by the requirements of a particular distributed application called Khazana, and in this paper we describe our experience to date in refitting Khazana with Flick-generated stubs. We believe that the special IDL compilation techniques developed for Khazana will be useful in other applications with similar communication requirements.
The author of a distributed system is often faced with a dilemma when writing the system's communication code. If the code is written by hand (e.g., using Active Messages) or partly by hand (e.g., using MPI) then the speed of the application may be maximized, but the human effort required to implement and maintain the system is greatly increased. On the other hand, if the code is generated using a high-level tool (e.g., a CORBA IDL compiler) then programmer effort will be reduced, but the performance of the application may be intolerably poor. The tradeoff between system performance and development effort arises because existing communication middleware is inefficient, imposes excessive presentation layer overhead, and therefore fails to expose much of the underlying network performance to application code. Moreover, there is often a mismatch between the desired communication style of the application (e.g., asynchronous message passing) and the communication style of the code produced by an IDL compiler (synchronous remote procedure call). We believe that this need not be the case, but that established optimizing compiler technology can be applied and extended to attack these domain-specific problems. We have implemented Flick, a flexible and optimizing IDL compiler, and are using it to explore techniques for producing high-performance code for distributed and parallel applications. Flick produces optimized code for marshaling and unmarshaling data; experiments show that Flick-generated stubs can marshal data between 2 and 17 times as fast as stubs produced by other IDL compilers. Further, because Flick is implemented as a ``kit'' of components, it is possible to extend the compiler to produce stylized code for many different application interfaces and underlying transport layers. In this paper we outline a novel approach for producing ``decomposed'' stubs for a distributed global memory service.
An interface definition language (IDL) is a nontraditional language for describing interfaces between software components. IDL compilers generate ``stubs'' that provide separate communicating processes with the abstraction of local object invocation or procedure call. High-quality stub generation is essential for applications to benefit from component-based designs, whether the components reside on a single computer or on multiple networked hosts. Typical IDL compilers, however, do little code optimization, incorrectly assuming that interprocess communication is always the primary bottleneck. More generally, typical IDL compilers are ``rigid'' and limited to supporting only a single IDL, a fixed mapping onto a target language, and a narrow range of data encodings and transport mechanisms. Flick, our new IDL compiler, is based on the insight that IDLs are true languages amenable to modern compilation techniques. Flick exploits concepts from traditional programming language compilers to bring both flexibility and optimization to the domain of IDL compilation. Through the use of carefully chosen intermediate representations, Flick supports multiple IDLs, diverse data encodings, and multiple transport mechanisms, and it applies numerous optimizations to all of the code it generates. Our experiments show that Flick-generated stubs marshal data between 2 and 17 times faster than stubs produced by traditional IDL compilers, and on today's generic operating systems, increase end-to-end throughput by factors between 1.2 and 3.7.
Many modern human-computer interfaces are difficult for people to use. This is often because these interfaces make no significant attempt to communicate with the people who use them. In other words, these interfaces are uncooperative: They do not adapt themselves to their users' needs and they are insensitive to human foibles. Ordinary command line interfaces such as that of the UNIX C shell (csh) are intolerant of even the most simple input errors, even when those errors have obvious corrections. An ``intelligent'' UNIX shell interface, on the other hand, would make use of knowledge and interaction context in order to interpret — and as necessary, correct — its users' commands. Valet is a prototype of such an ``intelligent'' interface to the UNIX C shell. Valet adds knowledge-based parsing and input correction to the shell by encapsulating an ordinary C shell process within a framework that allows Valet to control the shell's input and output. Valet intercepts shell commands and parses them, using its knowledge of the most popular UNIX shell commands, its built-in model of the file system, and data that describe the commands and files most often and recently referenced by individual users. Valet incorporates heuristics designed to detect and correct the kinds of mistakes that experienced users make most frequently: typographical errors, file location errors, and minor syntactic errors.
In order to evaluate the interface, eleven volunteers agreed to use Valet in the course of their normal work for approximately four weeks. The commands that those people entered, along with Valet's responses, were recorded and analyzed in order to measure the overall usefulness and effectiveness of the system. The data from the experiment suggest that knowledge-based, error-tolerant, ``intelligent'' command parsing can have very beneficial effects. The experiment also pointed to ways in which Valet could be improved.
Eric Eide <eeide@cs.utah.edu> | Last modified: Fri Jan 15 11:50:08 MST 2021