Bryan Ford Godmar Back Greg Benson Jay Lepreau Albert Lin Olin Shivers
University of Utah University of California, Davis Massachusetts Institute of Technology
http://www.cs.utah.edu/projects/flux/oskit/
Implementing new operating systems is tedious, costly, and often impractical except for large projects. The Flux OSKit addresses this problem in a novel way by providing clean, well-documented OS components designed to be reused in a wide variety of other environments, rather than defining a new OS structure. The OSKit uses unconventional techniques to maximize its usefulness, such as intentionally exposing implementation details and platform-specific facilities. Further, the OSKit demonstrates a technique that allows unmodified code from existing mature operating systems to be incorporated quickly and updated regularly, by wrapping it with a small amount of carefully designed ``glue'' code to isolate its dependencies and export well-defined interfaces. The OSKit uses this technique to incorporate over 230,000 lines of stable code including device drivers, file systems, and network protocols. Our experience demonstrates that this approach to component software structure and reuse has a surprisingly large impact in the OS implementation domain. Four real-world examples show how the OSKit is catalyzing research and development in operating systems and programming languages.
Appears in Proceedings of the 16th ACM Symposium on Operating Systems Principles, October 1997, Saint-Malo, France. Copyright ©1997 by ACM, Inc.
While it is possible to adapt existing systems, they are generally complicated and entwined with interdependencies. The OSKit, developed by the Flux research group at the University of Utah, addresses this problem by providing a framework and a set of modularized library code with straightforward and well-documented interfaces for the construction of operating system kernels, servers, and other core OS functionality. The OSKit provides functionality such as simple bootstrapping, a minimal POSIX environment usable in kernels, memory management suited for physical memory and its constraints, extensive debugging support, and higher-level subsystems such as protocol stacks and file systems. The OSKit gives a developer an immediate starting point for investigating ``real'' OS issues such as scheduling, virtual memory, IPC, file systems, or security. Developers can easily replace generic OSKit modules or functions with their own, guided by research interests or performance considerations. The OSKit can be used to bootstrap unconventional operating systems quickly, such as those for embedded systems and network computers.
The OSKit is heavily used in at least three OS kernels under ongoing development at different institutions. Our own microkernel-based OS, Fluke [17], puts almost all of the OSKit to use. Over half of the Fluke kernel is OSKit code, and many of the servers and user-level utilities that run on top of this kernel also rely heavily on parts of the OSKit. The OSKit has also enhanced and accelerated our OS research by allowing us to quickly create several prototype kernels in order to explore ideas before investing the effort necessary to incorporate these ideas into the much larger primary development system.
Research groups at MIT and U.C. Davis, represented by co-authors of this paper, have recently adopted the OSKit for systems-level language research. Traditional kernels distance the language from the hardware; even microkernels and other extensible kernels enforce some default policy which often conflicts with a particular language's semantics. The OSKit provides a valuable tool to allow advanced languages to be evaluated and experimented with at a low level, to explore novel OS structures enabled by such languages, and to make it possible to obtain accurate performance measurements without the interference of a full-scale OS. By implementing Standard ML [26] directly on the hardware using the OSKit, we are able to model hardware resources with the constructs of a functional programming language. Our port of SR (``Synchronizing Resources'') [3], a parallel language intended for systems-level programming but never extensively used for this purpose, allows us to investigate the effectiveness of using a communication-oriented language for implementing OS functionality. Finally, using a Java [19] virtual machine running on the OSKit, we have prototyped a small network computer supporting a Java-based web server and other applications, as well as an active network router that dynamically executes Java bytecode embedded in network packets.
The rest of this paper describes the OSKit and reports on our experience using it for research in operating systems and advanced language systems. Section 2 discusses related work. Section 3 outlines the major OSKit components, and Section 4 details the OSKit's design and implementation. Section 5 describes two example network-oriented OSKit-based applications. Section 6 presents our experience with the OSKit through several case studies. Finally, in sections 7 and 8 we present status, future work, and conclusions.
Recent research projects such as the exokernel [14], SPIN, and VINO focus on creating extensible systems which allow applications to modify the behavior of the core OS to suit their particular needs. However, these systems still define a particular, fixed set of ``core'' functionality and a set of policies by which the core can be used and extended. As the exokernel authors state, ``mechanism is policy, albeit with one less layer of indirection'' [14]. The OSKit, in contrast, makes no attempt to be a useful OS in itself and does not define any particular set of ``core'' functionality, but merely provides a suite of components from which real OS's can be built.
Many real-time, embedded operating systems, such as QNX [20] and VxWorks [35], are designed as a set of modular components that can be statically or dynamically linked with a small core kernel in various configurations. These systems apparently do provide a relatively hospitable kernel environment for a single POSIX-based application, such as a Java virtual machine, and indeed industry has recently constructed a Java-based system using VxWorks. However, the main purpose of these embedded systems' modularity is to allow them to be used in very small, tightly-constrained hardware environments as well as (or instead of) in fully-equipped workstations and PCs. All of the optional components still rely on the basic OS environment provided by the core kernel, and are neither designed with the intention of making them usable in other environments nor sufficiently documented to make it practical for users of these systems to do so. For example, the VxWorks execution environment always runs a special ``exception thread'' (in VxWorks terminology, a ``task''), whose purpose is to field messages from distressed threads, providing a safe execution environment in which to execute cleanup code. Other VxWorks components require this thread's presence, so cannot easily be used in other environments.
Several object-oriented operating systems have been created, such as Choices [12], which provides a full operating system, and the Taligent [27] system, which provides OS services above an underlying microkernel. Like the OSKit, these systems attempt to provide an extensible OS structure built from customizable, replaceable components. However, these are still operating systems in and of themselves: they still define a basic core OS structure and a framework within which OS components are to be extended, and make no attempt to allow their components to be easily separated out and used in other widely different OS environments. Thus, these object-oriented operating systems are comparable in their basic design goals to the extensible and scalable systems described above.
In this section we provide a brief overview of several of the most important OSKit components. For reference, Figure 1 illustrates the overall structure of the OSKit and the relationships between its components.
Figure 1: The structure of the OSKit. The shaded
components indicate off-the-shelf code that is encapsulated in the
OSKit using glue code. This figure shows only the approximate
relationships
between components; the sizes of the regions do not correspond to
the sizes of the components.
Most operating systems come with their own boot loading mechanisms which are largely incompatible with those used by other systems. This diversity of existing mechanisms is caused not so much by any fundamental difference in the bootstrap services required by each OS, but instead merely by the ad hoc way in which boot loaders are typically constructed. Because boot loaders are basically uninteresting from a research standpoint, OS developers generally just produce a minimal quick-and-dirty design, which results in each boot loader being unsuitable for the next OS due to slight differences in design philosophy or requirements. To solve this problem, the OSKit directly supports the MultiBoot standard [16] which was cooperatively designed by members of several OS projects to provide a simple but general interface between boot loaders and OS kernels, allowing any compliant boot loader to load any compliant OS. Using the OSKit, it is easy to create OS kernels that support a variety of existing boot loaders that adhere to the MultiBoot standard. In addition, the OSKit includes tools that allow these MultiBoot kernels to be loaded from older BSD and Linux boot loaders, and from MS-DOS.
A key feature of the MultiBoot standard that makes it highly useful to research systems is the ability of the boot loader to load additional files, or boot modules, at boot time along with the kernel itself. A boot module is simply an arbitrary ``flat'' file, which the boot loader does not interpret in any way, but instead merely loads into chunks of reserved physical memory along with the kernel image itself. Upon starting the kernel, the boot loader then provides the kernel with a list of the physical addresses and sizes of all the boot modules that were loaded, along with an arbitrary user-defined string associated with each boot module. These boot modules and their associated user-defined strings can then be interpreted by the kernel however it sees fit; typically their purpose is to ease the kernel's bootstrapping burden by providing arbitrary data that the kernel might need to get started, such as initialization programs, device drivers, and file system servers.
The primary purpose of the OSKit's kernel support library is to provide easy access to the raw hardware facilities without adding overhead or obscuring the underlying abstractions. It contains a large collection of useful functions and symbol definitions that are highly specific to supervisor-mode code. In contrast, most of the other libraries in the OSKit are often useful in user-mode code as well, even though they are designed primarily with kernels in mind. Also unlike the rest of the OSKit, much of the kernel support code is necessarily architecture-specific; no attempt has been made to hide machine-specific details that might be useful to the client OS. For instance, on the x86, the kernel support library includes functions to create and manipulate x86 page tables and segment registers. Other OSKit components can, and often do, provide higher-level architecture-neutral facilities built on these low-level mechanisms, but the architecture-specific interfaces always remain accessible in order to provide maximum flexibility.
The OSKit's kernel support library is especially important on the x86 architecture, whose OS-level programming environment is particularly complex and obscure. The kernel support library takes care of setting up a basic 32-bit execution environment (x86 processors normally start up in a 16-bit mode for compatibility with MS-DOS), initializing segmentation and page translation tables, installing an interrupt vector table, and providing default trap and interrupt handlers. Naturally, the client OS can modify or override any of this behavior; however, by default, the kernel support library automatically does everything necessary to get the processor into a convenient execution environment in which interrupts, traps, debugging, and other standard facilities work as expected. The library also by default automatically locates all of the boot modules loaded with the kernel and reserves the physical memory in which they are located so that the application can easily make use of them later on. The client OS need only provide a main function in the standard C style; after everything is set up, the kernel support library will call it with any arguments and environment variables passed by the boot loader. Thus, using the OSKit, a ``Hello World'' kernel is as simple as an ordinary ``Hello World'' application in C.
By contrast, the OSKit provides a minimal C library designed around the principle of minimizing dependencies rather than maximizing functionality and performance. For example, locales and floating-point are not supported, and the standard I/O calls don't do any buffering, instead relying directly on underlying read and write operations. Dependencies between C library functions are minimized, and those dependencies that do exist are documented so that individual functions can be replaced as necessary in order to adapt the minimal C library to arbitrary environments.
One of the OSKit's most important practical benefits is that, given an appropriate hardware setup, it immediately provides the OS developer with a full source-level kernel debugging environment. The OSKit's kernel support library includes a serial-line stub for the GNU debugger, GDB [32]. The stub is a small module that handles traps in the client OS environment and communicates over a serial line with GDB running on another machine, using GDB's standard remote debugging protocol. The OSKit's GDB stub can be used even if the client OS performs its own trap handling, and even supports multithreaded debugging if the client OS provides appropriate hooks. In the future, we plan to integrate a local debugger into the OSKit as well, which can be used when a separate machine running GDB is not available.
In addition to the basic debugging support, the OSKit also provides a memory allocation debugging library, which tracks memory allocations and detects common errors such as buffer overruns and freeing already-freed memory. This library provides similar functionality to many popular application debugging utilties, except that it runs in the minimal kernel environment provided by the OSKit.
One of the most expensive tasks in OS development and maintenance is supporting the wide variety of available I/O hardware. Devices are tricky and often have undocumented glitches and new hardware is constantly being released with incompatible software interfaces. For these reasons, the OSKit leverages the extensive set of stable, well-tested drivers developed for existing kernels such as Linux and BSD. To avoid divergence from these existing source bases and allow new and improved drivers to be easily integrated into the OSKit in the future, existing driver code is incorporated into the OSKit largely unmodified using an encapsulation technique described later in Section 4.7. These existing drivers are surrounded by a thin layer of OSKit glue code which allows them to be used in environments completely different from those for which the drivers were originally written. Currently, most of the Ethernet, SCSI, and IDE disk device drivers from Linux 2.0.29 are included--over fifty in all--as well as eight character device drivers imported from FreeBSD in the same way, supporting the standard PC console and serial port and various multi-serial port boards. Because of the OSKit's careful packaging of these drivers, the FreeBSD drivers work alongside the Linux drivers without a problem. In the future we expect to incorporate drivers from other sources as well, possibly even from popular commercial operating systems for which hardware vendor-supplied drivers are often available.
The OSKit provides a full TCP/IP network protocol stack; like the device drivers, the networking code is incorporated by encapsulation so that it can easily be kept up-to-date. However, whereas the OSKit currently takes its network device drivers from Linux, which is the largest source of freely available drivers for the PC platform, the OSKit's network components are instead drawn from the 4.4BSD-derived FreeBSD [24] system, which is generally considered to have much more mature network protocols. This demonstrates a secondary advantage of using encapsulation to package existing software into flexible components: with this approach, it is possible to pick the best components from different sources and use them together--in this case, Linux network drivers with BSD networking.
To complete our picture, the OSKit incorporates standard disk-based file system code, again using encapsulation, this time based on NetBSD's file systems. NetBSD was chosen in this case as the primary source base because its file system code is the most cleanly separated of the available systems; FreeBSD and Linux file systems are more tightly coupled with their virtual memory systems. We are currently incorporating Linux file systems as well, to support many diverse file system formats, such as those of Windows 95, OS/2, and System V.
Our development of a highly secure file server using the OSKit's file system provided an interesting experience with the use of such a component. The OSKit file system's exported COM interfaces are similar to the internal VFS interface [23] used by many Unix file systems. These interfaces are of sufficiently fine granularity that we were able to leave untouched the internals of the OSKit file system. For example, the OSKit interface accepts only single pathname components, allowing the security wrapping code to do appropriate permission checking. The fileserver itself, however, exports an interface accepting full pathnames, providing efficiency where it matters, between processes. Avoiding any modification of the main file system code greatly reduces our maintenance burden, allowing us easily to track NetBSD releases.
The most important goal of the OSKit is to be as convenient as possible for the developer to use. Although this goal has many ramifications throughout the OSKit's design, its first manifestation is in the basic layout and usage pattern of the OSKit as a whole. The OSKit is structured as a package that can be automatically built and installed, in most cases, as easily as an ordinary GNU-style application. It is self-sufficient in that it does not use or depend on any existing libraries or header files installed on the system; the only things the user must provide are the compiler, linker, and a few other development tools. Building and installing the OSKit causes a set of libraries to be created in a user-defined location (e.g., /usr/local/lib and /usr/local/include) from which they can then be linked into operating systems just like ordinary libraries are linked into user-level applications.
The OSKit is structured this way because developers are already familiar with libraries and know how to use them; although it is not common practice to link libraries into a kernel, this is simply because until now few libraries have been designed to be usable in kernels. Given a set of libraries designed for this purpose, it is much easier for a developer to link in a library and use it than to drop in a set of .c files, figure out what compiler options to compile them with, what header files they need, etc. Developers can define their own source tree layout and build environment rather than being required to integrate their sources into a predefined existing structure.
While modularity is a standard software design goal, in the OSKit it gains a new level of importance. A primary goal of the OSKit is to allow developers to use arbitrary components in a given situation without being forced to use other parts of the OSKit; this means that the OSKit's components must not only be modular, but also fully separable. For example, the client should be able to use the OSKit's device drivers without also having to use the OSKit's memory manager, even though device drivers necessarily require some kind of memory allocation service. A traditional kernel such as BSD may be extremely clean and modular, but is still not very separable because of extensive inter-module dependencies which make it difficult, for instance, to use BSD's device drivers without BSD's memory allocator, or BSD's file system code without BSD's VFS layer.
To provide full separability between components in the OSKit, it is often necessary to introduce thin intermediate ``glue'' layers to provide a level of indirection between a component and the services it requires. In many cases these layers take the form of library functions with trivial default implementations, whose sole purpose is to be overridden by the client OS when the need arises. In other cases, the layer of indirection is provided through the use of function pointers and dispatch tables which allow components to be dynamically bound together by the client OS at run time. The former method is generally used for services for which there is generally only one implementation in the system, such as putchar and malloc, whereas the latter method is used when multiple implementations of a service must coexist, such as the block I/O interfaces to different disk device drivers.
As an example of the first method, to allocate memory, all device driver components in the OSKit invoke a client-supplied function called fdev_mem_alloc. A default implementation of this function is provided which uses the OSKit's memory management library, but this default can easily be overridden by the client OS if it uses its own method of managing physical memory. This way, in simple situations where the client just uses the OSKit's defaults, everything ``just works'' without any special action on the client's part; however, the client OS can obtain full control over memory allocation and other services when needed.
As an example of the second method of ensuring separability, none of the OSKit's file system components have any link-time dependencies on the OSKit's device driver components, even though the file systems must invoke the block device drivers in order to access the underlying disk on which the file system resides. Instead, when the client OS initializes an OSKit device driver, the device driver returns a pointer to an interface to use to access the device. OSKit interfaces will be described in more detail later, but essentially are just opaque objects with dynamic dispatch tables similar to C++ virtual function tables. Later, when initializing the OSKit's file system component, the client OS passes the device driver's interface pointer and the file system henceforth uses that interface to invoke the driver's services. In this way, the client OS can bind at run time any file system to any device driver, and neither component needs to know how it is being used.
The OSKit's libraries each contain a number of logical components; the client OS incorporates these components by referencing symbols defined by the libraries, and the linker determines which specific object files to pull in. However, beyond this principle there is no single, standard definition of exactly what a ``component'' is or how it is used. By not attempting to force all components into a single fixed design methodology, the OSKit gains a degree of flexibility that we have found to be essential to its success. In particular, the most natural size and granularity for components vary widely in different parts of the OSKit, from tiny single-function ``components'' such as strcpy to large components each consisting of many modules such as the BSD file system. To cope with such large variations in granularity, there must be some corresponding variety in implementation and usage patterns. The OSKit's libraries are roughly divided into two main categories: function libraries and component libraries.
The OSKit's function libraries provide relatively simple, low-level services in a traditional C-language function-oriented style. They are designed for fine-grained use and control, generally on a function-by-function basis, allowing the client OS to use particular library functions while leaving out or individually overriding other functions. The dependencies between library functions are minimized, as are dependencies on other libraries; where these dependencies inevitably exist, they are well-defined and explicitly exposed to the OS developer.
For instance, the OSKit's minimal C library provides an implementation of the well-known printf function as well as other standard I/O services; however, these services are designed very differently from those of traditional C libraries. A standard I/O module traditionally acts as one big ``black box'' which implements a broad set of high-level services on top of a corresponding set of low-level services (read, write, etc.), and maintains private state to implement buffering and other common features. Making use of a single standard I/O function in an application pulls in various internal support routines, and with them many dependencies on other facilities such as memory allocation, terminal control, etc. The standard I/O services in the OSKit's minimal C library, on the other hand, minimize dependencies and internal state (e.g., they perform no buffering), and their implementations are documented so that the client OS can exercise full control over them. For instance, the OSKit's default printf function is implemented in terms of two other functions, puts and putchar; the default puts, in turn, is implemented only in terms of putchar. While this implementation would be a bug in a standard C library, in which overriding one function is not supposed to affect the behavior of another, in the OSKit's minimal C library it is extremely useful because it allows the client OS to obtain basic formatted console output simply by providing a putchar function and nothing else.
Whereas the function libraries are designed for maximum fine-grained flexibility and controllability, the component libraries are designed to provide large chunks of functionality in one shot, as quickly and conveniently as possible. They adopt a more coarse-grained, object-oriented ``black box'' design philosophy with relatively fewer public entrypoints. For example, in the OSKit's device driver component libraries, each device driver is represented by a single function entrypoint which is used to initialize and register the entire driver. Most of the internal details of the driver and the hardware it controls are hidden from the client OS, which generally interacts with these components only through common, well-defined, object-oriented interfaces, giving the OS developer ``plug and play'' control over the overall system structure. This design increases large-scale flexibility at the expense of fine-grained controllability: by using a particular OSKit device driver, the OS developer gives up direct control over the piece of hardware the driver is controlling, but gains the ability to drop a different driver in its place later on without changing anything else.
There is no clear-cut criterion defining the appropriate granularity for particular components; in fact, for some services it may be desirable to have alternative libraries available implemented at different granularities. For example, while the OSKit's minimal C library serves the needs of kernels and simple applications well by emphasizing simplicity and flexibility over functionality, in the future we may integrate a more traditional C library, such as the BSD C library, as an alternative OSKit component, to be used in situations where more complete functionality but less fine-grained control is needed by the application.
For usability, it is critical that OSKit components have clean, well-defined interfaces, designed along some coherent set of global conventions and principles. To provide this standardization, we adopted a subset of the Component Object Model [25] as a framework in which to define the OSKit's component interfaces. At its lowest level, COM is merely a language-independent protocol allowing software components within an address space to rendezvous and interact with each other efficiently, while retaining sufficient separation so that they can be developed and evolved independently. Besides the obvious advantages of making the OSKit's interfaces more consistent with each other and with those widely used in component-oriented applications, COM also brings several technical advantages described below.
COM is founded on the notion of interfaces, which are comparable to Java [19] interfaces: they define a set of methods that can be invoked on an object without providing any direct access to the object's internal state. COM interfaces are defined independently of the components that implement them, ensuring that implementation and interface remain well-separated; in practice many different implementations of a particular COM interface often coexist even within a single system. As represented in the C language, a COM interface is an opaque structure whose actual size and content is unknown to the client, except that its first member is a pointer to a table of function pointers, similar to a C++ virtual function table. For example, Figure 2 shows a slightly simplified but essentially complete definition of the OSKit's blkio interface, which is implemented by each of the OSKit's disk device drivers as well as by other components.
typedef struct blkio { struct blkio_ops *ops; } blkio_t; struct blkio_ops { error_t (*query)(blkio_t *io, const struct guid *iid, void **out_ihandle); unsigned (*addref)(blkio_t *io); unsigned (*release)(blkio_t *io); unsigned (*getblocksize)(blkio_t *io); error_t (*read)(blkio_t *io, void *buf, off_t offset, size_t amount, size_t *out_actual); error_t (*write)(blkio_t *io, const void *buf, off_t offset, size_t amount, size_t *out_actual); error_t (*getsize)(blkio_t *io, off_t *out_size); error_t (*setsize)(blkio_t *io, off_t new_size); }; /* Friendly macros */ #define oskit_blkio_read(io, buf, ofs, amount, out_actual) \ ((io)->ops->read((io), (buf), (ofs), (amount), (out_actual))) .... #define BLKIO_IID GUID(0x4aa7df81, 0x7c74, 0x11cf, \ 0xb5, 0x00, 0x08, 0x00, 0x09, 0x53, 0xad, 0xc2)Figure 2: The OSKit's COM Interface for Block I/O. The blkio_ops structure is the dynamic dispatch table for this interface, representing the methods that can be called. The last two lines define the Globally Unique Identifier (GUID) identifying the blkio interface.
Finally, one of the abstraction features in our use of COM that is most important for the purposes of the OSKit is that interfaces can be completely ``standalone'' and do not require any common infrastructure or support code that the client OS must use in order to make use of the interfaces. Contrast this, for example, with the BSD, Linux, and x-kernel network stacks, in which the protocols themselves are modular and interchangeable to some degree, but each of their interfaces depends on a particular buffer management abstraction with a particular concrete implementation (mbufs, skbuffs, and Msgs, respectively). In order to use any BSD networking code, one must also incorporate and ``design around'' the BSD mbuf code; it would be nontrivial at best to replace it with an alternative buffer management implementation that differs in more than minor details. The OSKit's corresponding interfaces, on the other hand, are purely behavioral contracts between modules that rely on no particular common implementation infrastructure.
To achieve full OSKit component separability, it is necessary to define and document not only the interface implemented by a component, but also all of the interfaces the component itself uses and the execution environment on which it depends: in other words, each component must be described not only ``in front'' but ``all around.'' For function libraries such as the minimal C library, this is mostly a matter of documenting each function's behavior and dependencies: for instance, the documented ``environment'' of the printf function consists of the puts and putchar functions on which it is based.
For larger components such as device drivers, however, issues such as concurrency and synchronization are important, and the reentrancy and interruptibility requirements of each component must be defined carefully. Naturally, the complexity of the execution environment required by a component varies depending on the size and complexity of the component itself in addition to other factors; however, in all cases the OSKit's design attempts to minimize the complexity of this expected execution environment. For instance, the OSKit does not require the OS to provide a notion of ``interrupt priority levels'' as is used in BSD, even though the OSKit incorporates BSD file system and networking code and can be made to use multiple IPLs if desired. The OSKit documentation specifies several basic execution models of varying complexity, ranging from an extremely simple concurrency model in which the component makes almost no assumptions about its environment, to the most complex model in which components must be aware of and have some control over various concurrency issues such as blocking, preemption, and interrupts. All of the OSKit's components conform to one of these documented execution models. Further, since using the OSKit in a given environment will invariably involve some adaptation to local requirements, we have also included in the documentation a number of ``recipes'' for using OSKit components in various common environments, such as preemptive, multiprocessor, or interrupt-model kernels.
Whereas hiding the implementation of a module is generally considered good software design practice, we take an approach in line with Kiczales' ``Open Implementation'' philosophy [22]. The OSKit often explicitly exposes the implementation of a component as part of its documented interface, in order to provide maximum power and flexibility to the client OS. For instance, the OSKit's basic memory management library exposes a number of functions that are fairly specific to its particular implementation, such as the ability to reserve particular regions of physical memory or walk through and examine the free list. The client OS is not obligated to use these low-level interfaces and in most cases can stick to the standard malloc-like interface, but the availability of the low-level interfaces is often important in meeting the needs of particular kernels.
The OSKit employs an open implementation philosophy even for the more coarse-grained component libraries in which it is usually desirable to hide most implementation details. However, in this case, the key point is that implementation details are hidden unless explicitly requested; they are not forced onto the client. For example, all of the OSKit's device drivers, whether derived from BSD or Linux, export a common set of basic interfaces which hide the nature and origin of each individual driver; however, each device driver can also export additional interfaces providing extended, driver-specific functionality. In fact, the COM interface extension mechanism (Section 4.4.2) provides an ideal basis for open implementation in the OSKit.
We have found it extremely useful to preserve not only the contents of source files imported from legacy systems, but also the directory structure they reside in. For instance, all of the encapsulated FreeBSD code is located in the OSKit subdirectory freebsd/src; this directory exactly mirrors the /usr/src tree in the actual FreeBSD distribution, except that it only contains the files the OSKit actually uses. The glue code that encapsulates the imported FreeBSD code is located in separate directories such as freebsd/net and freebsd/dev, keeping the glue well separated from the encapsulated code. This structure allows changes in a new release of the donor OS to be incorporated simply by applying an appropriate patch to the appropriate OSKit directory subtree and then fixing any resulting conflicts. Of course, if the changes in the donor OS were extensive, the conflict resolution and debugging process can take some time and thought, but it is still much simpler and quicker than updating heavily modified or restructured code.
The link-time namespace presents another problem: although the legacy header files are never used or seen by other components or by the client OS, any global functions or variables the legacy code defines may conflict with those defined by the client OS or by other components. For example, the NetBSD file system and FreeBSD networking components use many functions with the same names but incompatible definitions, which is not surprising given the common heritage of these systems. To solve this problem we used preprocessor magic to rename these symbols: e.g., the wakeup function used in the FreeBSD device drivers is named FDEV_FREEBSD_wakeup in the compiled object files comprising the library, preventing linker conflicts with other code.
When a Linux network driver receives a packet from the hardware, it reads it into a contiguous skbuff and then passes it up to higher-level networking code, which in this case is the OSKit's Linux glue code. This glue code must in turn export the packet from the component using the OSKit's common networking interfaces in which packets are represented by bufio interfaces (see Section 4.4.2). Because COM interfaces make essentially no restrictions on the implementation details of the objects themselves, the Linux glue code can export the skbuff directly as a COM bufio object without copying the data, merely by adding a bufio interface to the Linux skbuff structure itself. The COM interface is simply a one-word field in the skbuff structure in which the glue code places a pointer to a function table providing methods to access the skbuff's contents; the semantics of these functions are defined by the bufio interface, but the functions themselves are implemented by the glue code with full internal knowledge of Linux's skbuff implementation. After the skbuff leaves the component, external code only manipulates it through its bufio interface.
Packets submitted to the driver component for transmission are also represented by a COM bufio interface, but the Linux glue code cannot assume that the object is really an skbuff since, for example, the packet may have been manufactured in the FreeBSD TCP/IP code where packets are instead represented as mbufs. The Linux glue code can easily recognize ``foreign'' bufio objects by checking their function table pointer; when it receives one, it first calls its map method to obtain a direct pointer to the data in the buffer if possible. This call will only succeed if the implementor of the bufio object happens to store the requested range of data in contiguous local memory; if it does, the Linux glue code creates a ``fake'' skbuff pointing directly to this data. Otherwise, the glue code allocates a normal skbuff and calls the bufio interface's read method to copy the data into the buffer. In this way, copying is avoided whenever possible while presenting a clean, abstract interface to the client OS and other components.
Since all of the OSKit's encapsulated components currently come from systems that use the relatively simple and well-understood blocking model, the encapsulated OSKit components retain this same execution model as seen by the client OS. The model has two levels, ``process level'' and ``interrupt level.'' There can be many process-level threads of control using separate stacks, but only one can run at a time and context switches only occur at well-defined ``blocking'' points; interrupt-level activities can run any time interrupts are enabled and always run to completion without blocking. Unlike in conventional kernels, however, the ``process'' and ``interrupt'' abstractions in the OSKit components are generally only relevant for purposes of defining the concurrency model, and do not necessarily correspond to ``real'' processes or hardware interrupt handlers. As long as the client OS ensures that all calls it makes into the component follow the constraints defined by the concurrency model, it can use the component in practically any environment.
For example, although the encapsulated OSKit components are not inherently multiprocessor or thread safe, they can easily be used in multiprocessor or multithreaded environments by taking a component-wide lock just before entering the component, and releasing it after the component returns and during any ``blocking'' calls the component makes back to the client OS. Although this allows only relatively coarse-grained concurrency, this granularity is sufficient in many situations and is clearly the best we can do without heavily modifying the imported code. Furthermore, the client OS can run different OSKit components independently--for instance, a multiprocessor OS could place separate locks around the file system and network components, allowing them to execute concurrently. This medium-grained concurrency is possible because of the clean separation and independence of the OSKit components, and would be much harder to achieve if the BSD or Linux file system and networking code was imported into the client OS directly without any of the ``packaging'' done by the OSKit.
Even though the concurrency model presented to the client OS by the OSKit's encapsulated components is superficially similar to the models used in the donor OS environments, the OSKit's environment is much simpler and more limited in many ways, making as few demands as possible on the client OS in order to make the components as widely usable as possible. For example, as mentioned above, the OSKit's components generally can't assume that the client OS has any notion of a ``process'' in the traditional sense. However, the imported legacy code is generally riddled with code that makes assumptions about processes and often accesses the ``current process'' structure directly (e.g., through BSD's curproc or Linux's current pointer). This is done for many reasons, but the most common cases are permission checking and blocking on events.
To avoid having to make the large number of changes required to eliminate these dependencies, we instead structured the glue code to emulate the abstractions expected by the encapsulated code. For example, to emulate the current process, at every entrypoint into the component from the ``outside,'' the glue code creates and initializes a minimal temporary process structure on the stack, and initializes the global (component-wide) curproc pointer to point to it. This structure then represents the ``current process,'' as far as the encapsulated code is concerned, for the duration of this call, and automatically disappears when the call completes--in essence, the glue code manufactures processes on demand. Since other threads of control may execute in the component during any blocking calls the component makes back to the client OS, the glue code must also intercept these calls and save the curproc pointer on the local per-thread stack for their duration in order to prevent it from getting trashed by other concurrent activities. In this way, the glue code emulates the process abstraction expected by the legacy code while completely hiding it from the client OS.
Another related part of the donor OS environment used throughout the imported legacy code is the sleep/wakeup mechanism, which for example the interrupt handler in a device driver uses to wake up a blocked read or write request after it has completed. Naturally, the details of the sleep/wakeup mechanism vary widely even among similarly structured kernels such as BSD and Linux. The glue code in each encapsulated component emulates the native sleep/wakeup mechanism of the donor OS on top of a single extremely simple underlying abstraction designed to be as easy as possible for the client OS to implement: namely a ``sleep record,'' which is like a condition variable except that only one thread of control can wait on it at a time. The client OS can directly implement these ``sleep records'' in various ways, such as using conventional condition variables or event objects or whatever else the client OS uses internally for synchronization. In fact, in the OSKit's single-threaded example kernels, sleeping is implemented simply as a busy loop that spins on a one-bit field in the sleep record structure.
Given this minimal sleep/wakeup mechanism provided by the client OS, we found that the easiest way to emulate the more elaborate mechanisms expected by legacy OS code was to incorporate the actual sleep/wakeup code from the legacy OS and modify it slightly to use the OSKit's sleep record abstraction in place of the legacy OS's scheduler. For instance, the BSD sleep/wakeup mechanism uses a global hash table of ``events,'' where an event is just an arbitrary 32-bit value; when wakeup is called on a particular event, all processes waiting on that particular value are woken. In the encapsulated BSD-based OSKit components, we retain BSD's original event hash table management code; however, the hash table is now only used within that particular component rather than throughout the entire system, and instead of all the scheduling-related fields in the emulated proc structure there is now only a sleep record which the BSD glue code uses to block the current ``process'' as necessary.
Another tricky aspect of the legacy environment to be emulated is the memory allocation mechanism. BSD's in-kernel malloc package tries to be particularly clever in a number of respects: (1) all allocated blocks are naturally aligned according to their size (e.g., 65-128 byte blocks are aligned on a 128-byte boundary); (2) blocks with a size of exactly a power of two can be allocated efficiently without wasting space; and (3) the allocator automatically keeps track of the sizes of allocated blocks. Any two of these properties can be implemented easily, but it takes special tricks to provide all three at once--and several parts of the BSD kernel, such as the mbuf code for networking and the clist code for character I/O, depend on all three properties.
The BSD kernel allocator provides these three properties by reserving on system startup a special range of kernel virtual memory for this allocator, and creating a separate table with one entry for each page in this range. Each entry records the size of the blocks allocated from that page, so that on a call to free the allocator can quickly look up the size of the allocated block without having to reserve space in the memory block itself for this information, which would conflict with the first two properties above. Unfortunately, this solution is not acceptable in the OSKit because OSKit components do not have any knowledge of or control over the virtual or physical memory layout of the kernel or user-mode application in which they run. In this case, there appears to be no entirely satisfactory solution that does not involve heavy modifications to the imported legacy code; however, we have adopted an imperfect but practical solution which relies on the (generally true) heuristic assumption that most memory blocks returned by the client OS will be fairly densely packed within the address space. To provide the properties above in this context, our BSD glue code uses BSD's kernel malloc package unmodified, layering it on top of the memory allocation facility the client OS provides, except that our glue code watches the memory blocks returned by the client OS and dynamically re-allocates and grows the allocation table as necessary to ensure that it always covers all of the addresses that the allocator has ever ``seen.'' Naturally, this solution may become very inefficient or cease to work in unusual cases where the memory blocks returned by the client OS are very widely dispersed in the component's address space; however, it works fine in all of the situations we have so far encountered.
Another common problem with legacy kernel code, particularly device driver code, is the assumptions it makes about how it can access physical memory. For example, some of the Linux device drivers assume that all physical memory is direct-mapped into their address space starting at virtual address zero, and access memory-mapped devices, BIOS information, etc., simply by ``manufacturing'' pointers directly from physical addresses. This makes it impossible to use these particular drivers in a client OS that does not directly map physical memory in this way; In this case there appears to be no solution other than fixing the drivers themselves. Fortunately, FreeBSD drivers use a well-known symbolic constant when accessing physical memory in this way; this ``constant'' can simply be changed into a variable initialized by the BSD glue code on start-up to point to a region of mapped physical memory.
The minimal C library provided all but a few of the functions required by these examples. Since ttcp relies on the times reported by getrusage for its timing, we implemented a simple getrusage based on the timers kept by the FreeBSD-derived networking code. ttcp also uses signal and select which are currently not provided by the OSKit, but they are only used to handle exceptional conditions and can be implemented as null functions without affecting the results. Aside from these additions and the initialization of the device drivers and networking stack, we were able to compile the applications unchanged. The required initialization code looks like this:
fdev_linux_init_ethernet(); fdev_probe(); oskit_freebsd_net_init(&sf); posix_set_socketcreator(sf); fdev_device_lookup(&fdev_ethernet_iid, &dev); oskit_freebsd_net_open_ether_if(dev[0], &eif); oskit_freebsd_net_ifconfig(eif, IPADDR, NETMASK);
fdev_linux_init_ethernet initializes the Linux ethernet drivers, causing all supported drivers to be linked into the resulting application. (The client OS can alternatively link in only specific drivers if it chooses.) fdev_probe locates all devices for which a driver has been initialized. The FreeBSD networking stack is initialized with oskit_freebsd_net_init which returns a ``socket factory'' interface used to create new sockets; posix_set_socketcreator is then called to register that socket factory with the C library so that its socket function will work. The fdev_device_lookup call searches the device table constructed with fdev_probe, returning an array of handles for Ethernet devices. The first such device is then bound to the FreeBSD network stack by oskit_freebsd_net_open_ether_if, which returns a handle used by oskit_freebsd_net_ifconfig to perform BSD ifconfig-style configuration of the network interface.
Figure 3:
Structure of the ttcp and rtcp
example kernels.
Figure 3 shows the structure of the ttcp and rtcp kernels when compiled with the OSKit libraries. The application uses the familiar BSD socket functions. The OSKit's C library maps these functions directly to the methods of the oskit_socket COM interface implemented by the FreeBSD networking component, by associating file descriptors with references to COM objects. Since the C library's socket call uses a client-provided socket factory interface to create new sockets, this C library code can be used with any protocol stack that provides these socket and socket factory interfaces.
When the client OS binds the FreeBSD protocol stack to a Linux device driver during initialization, these components exchange callback functions which are subsequently used to pass packets back and forth asynchronously. When the driver receives a packet from the hardware, the driver calls the protocol stack's registered callback; similarly, when the protocol stack needs to transmit a packet, it calls the device driver's callback. Packets passed through these callbacks are represented as references to opaque objects implementing the oskit_bufio COM interface (see Section 4.7.3). In this system configuration, incoming packets are initially represented internally as skbuffs in the Linux network driver code; these skbuffs are passed directly to the FreeBSD TCP/IP component as COM bufio objects, which the FreeBSD glue code internally repackages as mbufs for the benefit of its imported FreeBSD code. Since skbuffs represents packets as contiguous buffers, the FreeBSD glue code is able to obtain a direct pointer to the packet data using the map method of the bufio interface, and therefore never has to copy the incoming data. Outgoing packets manufactured by the FreeBSD TCP/IP code, on the other hand, sometimes consist of multiple discontiguous buffers chained together; in this case, when the mbuf chain is passed to the Linux driver as a bufio object, the Linux glue code must read the data into its own contiguous buffer in order to present it to the encapsulated driver code as an skbuff. Thus, the mismatch between Linux's and BSD's internal packet representations sometimes requires extra copying on the send path, but never on the receive path.
|
|
Table 1 presents the bandwidth measurements obtained with ttcp. In each case ttcp transmitted 131072 blocks of 4096 data bytes (52MB total). This reasonably long run of the test compensates for the relatively low 10ms granularity of the clock used for timing. The results show that the OSKit implementation receives about as fast as FreeBSD--this is due to the fact that the Linux driver always hands contiguous buffers up which can be mapped to mbuf clusters without copying. On the other hand, when a packet is sent, an additional copy is necessary since non-contiguous protocol mbufs must be copied into contiguous device driver skbuffs, accounting for the decrease in send performance.
Table 2 shows the latency of a 1-byte round-trip. While we cannot interpret Linux's performance, the FreeBSD versus OSKit results indicate that the OSKit imposes significant overhead. Extra data copies are not part of the problem since this test involves small packet sizes that fit in a single protocol mbuf, enabling mapping into a device driver skbuff. Hence, the overhead is largely attributable to the additional glue code within the OSKit components: the price we pay for modularity and separability and for the ability to use existing driver and networking code unmodified in an environment for which they were not designed.
In this section we first briefly describe four major research projects that have recently taken advantage of the OSKit, and the overall experience of using the OSKit in each case; the next section will describe in more detail specific aspects of the OSKit that proved to be particularly useful in these research projects.
In 1996 we developed an entirely new microkernel-based system called Fluke [17] to explore numerous ideas in fundamental kernel structuring, including scheduling mechanisms, execution models, IPC, and virtual memory. We had been pursuing research using the Mach microkernel, but found that none of these directions could have been explored or effectively evaluated in the context of this existing system because it was too large, inflexible, and tightly bound together to be amenable to the fundamental changes we needed to make. Therefore, we started a new system, incorporating a few pieces of Mach and BSD code but otherwise written from scratch. To ensure that Fluke would not quickly become as tightly-bound and inflexible as its predecessor, we started developing the OSKit concurrently as a set of components to be used in Fluke and other projects. Therefore, Fluke acted as the primary driving application for the OSKit, but by also using the OSKit simultaneously for other purposes, we were able to prevent it from becoming specific to Fluke.
Fluke puts almost all of the OSKit to use, and in fact over half of the Fluke microkernel is OSKit code. Most of the basic servers and other utilities that run on Fluke also use the OSKit to provide their standard C library, memory allocation, address space management, and debugging facilities. These servers include a virtual memory manager, checkpointer, file and network servers, and a process manager. Although Fluke includes a complete standard C library based on FreeBSD's C library for the use of Unix applications running on Fluke, in many situations we have found that the OSKit's minimal C library provides all the functionality needed and is much smaller, simpler, and more flexible.
At MIT, ML/OS was developed over a semester by a Master's student with the part-time assistance of an undergraduate programmer. Neither student was previously familiar with OS internals or the low-level details of the x86 architecture. Much of their effort was spent in learning and dealing with the details of the SML/NJ implementation, which was far more complex than the OSKit code to which it was being mated. In contrast to this experience, strong groups of advanced programming language researchers have been intending, for years, to implement a sophisticated functional language directly on a raw hardware platform. For example, the Fox project at CMU [5, page 214,] and the Programming Principles group at Bell Labs have at different times begun efforts to port SML/NJ to run on bare hardware. But the details of doing so have been sufficiently forbidding as to prevent these efforts from ever being completed.
Implementing SR/OS with the OSKit was accomplished by one U.C. Davis student while visiting the University of Utah. The initial implementation took approximately one week, and adding network support using the x-kernel protocol stack required another week. In contrast, several earlier attempts to implement SR directly on the hardware proved very difficult or were stillborn. A very early version of SR was implemented directly on PDP-11 machines, but its development and debugging were extremely tedious [28]. Later, during the Saguaro distributed operating system project, an SR implementation on the bare hardware was again considered, but abandoned due to the lack of good development tools [2].
Building Java/PC atop the OSKit was remarkably easy: one Utah student, at that time not a major contributor to the OSKit, took just 14 hours to get the system to run a ``Hello, World'' Java application; large single-threaded applications, such as Sun's Java compiler, ran the next day. Less than three weeks later he had built a usable system that ran complex applications such as the Jigsaw Web Server [7], making extensive use of threads, timers, and file and network I/O. The resulting system is similar in function to Sun's JavaOS [33] but with a dramatically different implementation. Whereas almost all components in our system reuse existing C-based components provided by the OSKit, Sun's was primarily written anew in Java and took much longer to build.
We used an early version of the OSKit in a ``DOS extender'' [15], a small OS kernel that runs on MS-DOS and creates a more complete process environment for 32-bit applications; this DOS extender is now being used in commercial products. We have also used the OSKit in two small experimental kernels that we prototyped in order to test out new IPC, capability, and kernel execution environment concepts before committing to these ideas in the main Fluke kernel effort. Both of these kernels were developed in a matter of days to the point of being useful for measurement and analysis; these prototypes would not have been feasible without the OSKit. Finally, besides these experimental kernels, we have used the OSKit in several smaller utilities, such as specialized kernels to boot other kernels across the network or from existing non-MultiBoot boot loaders.
A few of the sites that have retrieved the OSKit have informed us of their use. Among these are researchers at the University of Carlos III in Spain who have built their ``Off'' distributed adaptable microkernel [8] on top of the OSKit, and the ``bits and pieces microkernel'' (bpmk), developed in Finland. A company, Network Storage Solutions, is using the OSKit to provide the base hardware support for a ``network appliance''-style server. In the wake of the successful language-based OS projects discussed above, another Utah student recently ported the GNU Smalltalk system to the bare hardware. He implemented a complete, functional multithreaded Smalltalk system in just over seven hours, starting with little experience with operating systems, the x86 PC, the OSKit, or the Smalltalk run-time system. This system has not yet, however, been used for serious research.
This section describes a few specific aspects of the OSKit that proved to be particularly useful in the above research projects, as well as OSKit features that caused problems or were commonly needed but not yet available.
All of the language implementations greatly benefited from the fairly complete POSIX environment provided by the OSKit's minimal C library, memory allocator, and kernel support library. This environment allowed the Unix versions of the languages to be quickly ported to the bare hardware, and then gradually specialized to take advantage of the new environment, extending their control to various hardware resources (e.g., registers, traps, interrupts) that are hidden by a normal operating system. Furthermore, the OSKit's minimal POSIX environment allowed the language research to focus on issues critical to the research, and let the POSIX environment pick up the rest: we could let unimportant code remain unimportant.
A particularly notable feature of the OSKit's minimal environment is its boot module support (see Section 3.1), which provides a simple RAM-disk file system accessible immediately upon bootstrap through POSIX's standard open/close/read/write interfaces. For example, in Fluke, this boot module file system includes the first user-mode executable to be loaded and run by the kernel, and subsequently acts as the root file system for this initial server, without requiring any ``real'' file system or device driver components to be embedded in the kernel. ML/OS uses a boot module to hold the precompiled ``initial heap image'' for the ML runtime, which is over 99% of the kernel (i.e., everything written in ML); similarly, Java/PC loads its Java bytecode (.class files) from the initial boot module file system. Other alternatives are available in each of these cases, such as manually embedding data into .o files linked into the kernel, or using the OSKit's device drivers and file system components to load this data from a ``real'' disk-based file system, but the boot module facility invariably proved to be by far the most simple, robust, and convenient for this purpose.
The absence of an OS-defined process or thread abstraction proved of great benefit to all three languages. ML/OS provides continuation-based Concurrent ML [29] threads as the machine's basic thread facility, complete with a preemptive scheduler, console I/O, and timer events. Whereas OS thread systems usually center on stacks, CML threads are entirely without stacks, running entirely in the heap. Interrupts and thread context switches use garbage collected continuations. Modeling concurrency in this way is central to our line of research; building on an infrastructure that imposed no thread abstraction meant that we could implement this model directly. It was similarly straightforward to port the built-in thread packages in Kaffe and SR to the OSKit. This contrasts with our experience porting Kaffe to a kernel providing its own thread abstraction--our Fluke microkernel. On Fluke, in order to avoid classic problems such as blocking for I/O, we needed to use native Fluke threads instead of Kaffe's built-in threads; minor mismatches between Java's and Fluke's thread semantics caused the Fluke port of Kaffe to take much longer.
In the ML port it was pleasant to discover how simple it can be to implement a high-performance functional programming language when one doesn't have to bend over backwards to accommodate the demands of an ill-suited operating system. The SML/NJ sources are larded with complex sequences of code designed to work around the limits of the Unix architecture. For example, SML/NJ allows heap allocation in signal handlers; to make this work, the run-time system must go through arcane gymnastics when it wishes to modify the processor's register state from a Unix signal handler. The trickery involved is considered sufficiently clever as to be worth reporting in the literature [4]. Since in our case the hardware is exposed in a documented manner, we just did it. No Unix; no trickery.
Java provided other examples of leveraging the exposed implementation and hardware. Kaffe relies on the delivery of SIGSEGV signals to detect null pointer accesses and throw the null pointer exception specified by Java semantics. The x86 architecture provides an efficient way to catch such accesses, by setting the processor's breakpoint registers appropriately. Besides allowing Java/PC direct access to this facility, the OSKit also provided a simple way for it to install its own custom trap handlers written in ordinary C, which can still fall back to the default handler for traps that are of no interest.
``Network computers'' seek to minimize memory footprint, and often do not need a disk or file system; using the OSKit it proved trivial to build a version of Java/PC that included networking but no file system. We have not yet made any effort to minimize memory footprint, but the inherent modularity of the OSKit keeps the resulting system to a modest size: the static (code+data) size of our executable is 412KB, including one ethernet driver, networking (121KB), the Kaffe virtual machine and native libraries (132KB), and various glue code.
Note that this system does not contain an implementation of Java's abstract windowing toolkit (AWT).
The networking performance of our Java/PC and SR/OS systems demonstrates the value of using mature, tuned components with flexible interfaces. Four weeks into the Java/PC project, using a measurement program written in Java, we measured a sustained TCP receive throughput of 78Mbps over a 100Mbps Ethernet, using the hardware platform described in Section 5. As expected, the TCP send throughput was lower at 59Mbps due to the extra copy. This relatively high performance is not surprising considering that the BSD network protocols have been tuned for over 15 years. In contrast, Sun's recent re-implementation of TCP/IP in Java [33] has been reported as being ``more than adequate for Web browsing,'' but by inference is probably as yet nowhere near the performance of traditional C implementations.
The simple, well-defined execution models used by the OSKit components enabled them to be used fairly easily in very different environments from those in which they were originally designed. For example, even though the Linux and FreeBSD-based components were designed for a traditional nonpreemptive uniprocessor kernel environment, they were easily incorporated into the fully multithreaded Java/PC and SR/OS environments by surrounding them with small amounts of locking and other glue code. The Fluke kernel uses these same components in an even more exotic environment. Since Fluke can be configured as an interrupt-model kernel, with only one kernel stack per processor rather than one per thread, it is impossible to run process-model OSKit components on these kernel stacks since they are not retained across context switches. However, we were easily able to solve this problem by running the process-level activities of these components on ordinary Fluke threads running in user mode but in the kernel's address space, while interrupt handlers in the components still run in supervisor mode. Using the OSKit components in these fundamentally foreign environments is only possible because their assumptions on the surrounding environment are minimized, and the few remaining assumptions are precisely defined.
SR/OS's use of the x-kernel [21] protocol framework demonstrated the value of the OSKit's simple structure as a set of libraries independent of the client system. The x-kernel has an extremely complex build environment which is entirely different from the OSKit's. However, we got the x-kernel working quickly with SR/OS by working completely within the x-kernel's build environment and using the OSKit as an independent, previously-installed ``package.'' Since all of the OSKit's functionality is provided by libraries, we just needed to add the appropriate `-L' and `-l' parameters to the main x-kernel makefile, and to point the main makefile to the OSKit include directories. Because the OSKit's exported structure is simple, it is easy to use it from within more complicated environments.
Although mundane from a research perspective, the practical importance of quality debugging tools and documentation should not be underestimated; this is particularly important to those whose primary interest and expertise lie elsewhere than in operating systems. The OSKit's robust source-level debugging support provided an environment familiar to application developers, contrasting sharply with the methods used during the early phases of the ML/OS project before GDB support was available: we could debug only by inserting ``halt'' instructions into the kernel, rebuilding, rebooting, and observing whether the kernel would quietly lock up (``success'') or exhibit random behavior (``failure'').
Finally, the documentation provided by the OSKit developers at Utah was a great help to the ML/OS hackers at MIT, even when it was much less complete than it is today. Had the ML project proceeded with its original intent to cannibalize Linux, we would likely still be puzzling out the code and interfaces of the kernel internals. Another more informal form of documentation were the twenty-line kernels E-mailed from Utah to MIT in answer to questions. These tiny (in source) but complete kernels were enabled by many features of the OSKit, all working together: the bootstrap/kernel support, the C/POSIX environment, the boot modules, and the component separability.
These research projects also revealed several deficiencies in the OSKit; some of these have already been addressed, whereas others remain for future work. They include:
In Table 3 we present a breakdown of the source sizes of the various OSKit components. The OSKit currently consists of over 260,000 lines of code. Of that, 32,000 lines of native and assimilated code and interfaces provide access to remaining 230,000 lines of entirely encapsulated imported code. Over 500 pages of documentation are provided; most of this documentation is in Unix man-page format, one module or interface per page, so it is not as forbidding as it sounds. While the OSKit currently only runs on x86 PCs, most of it is designed to be easily portable to other platforms, and two thirds of the OSKit's exported interfaces are architecture-neutral. Most of the heavily architecture-specific aspects of the OSKit are isolated in the low-level kernel support library and the bootstrap code.
Our first public release of the OSKit in July 1996 was an alpha version; two months later we made one public update, primarily adding initial device driver support. For the seven months in which at least one of those versions was available, inspection of our FTP logs shows that, excluding obvious mirror sites, 537 different external sites obtained the OSKit, including 122 at U.S. companies. The currently released version of the OSKit is available at http://www.cs.utah.edu/projects/flux/.
Interface | Implementation | |||||
| Native | Encapsulated | ||||
Library | Description | MI | x86 | MI | x86 | |
boot | Bootstrap Support | -- | 67 | -- | 2829 | -- |
kern | Kernel Support | 325 | 1379 | 476 | 3890 | -- |
smp | Multiprocessor Support | 6 | 2 | -- | 868 | -- |
lmm | List Memory Manager | 33 | -- | 314 | -- | -- |
amm | Address Map Manager | 60 | -- | 349 | -- | -- |
c | Minimal C library | 588 | 4 | 4863 | 220 | -- |
memdebug | Malloc Debugging | 18 | -- | 398 | 13 | -- |
diskpart | Disk Partitioning | 205 | -- | 311 | -- | -- |
fsread | File System Reading | 13 | -- | 1581 | -- | -- |
exec | Program Loading | 133 | 5 | 61 | 125 | -- |
com | COM interfaces & support | 1514 | -- | 667 | -- | -- |
fdev | Device driver support | 35 | -- | 861 | 388 | -- |
linux_dev | Linux drivers & support | 77 | -- | 2801 | -- | 77023 |
FreeBSD common code | -- | -- | 524 | 116 | 8275 | |
freebsd_dev | FreeBSD drivers & support | 8 | 13 | 558 | 10 | 14755 |
freebsd_net | FreeBSD network stack | 44 | -- | 1318 | -- | 17241 |
freebsd_m | FreeBSD Math library | -- | -- | -- | -- | 7517 |
netbsd_fs | NetBSD file system | 8 | -- | 2465 | -- | 18968 |
x11 | X11-based windowing | -- | -- | 1909 | -- | 90182 |
3067 | 1470 | 19456 | 8459 | |||
Total | 4537 | 27915 | 233961 |
We are especially grateful to Mike Hibler for last minute writing help, and to the many people at the University of Utah who have contributed to the OSKit, including Steve Clawson, Mike Hibler, John McCorquodale, Bart Robinson, Steve Smalley, Pat Tullmann, Kevin Van Maren, and Jeff Turner. We thank Shantanu Goel for his important work, both at Columbia and at Utah, on the Linux device driver framework in Mach. Erich Boleyn co-authored the MultiBoot specification and developed a boot loader for it. We thank the OSKit's clients for their feedback, and especially our shepherd John Ousterhout, the anonymous reviewers, members of the Flux project, Bob Kessler, and Ron Olsson for their helpful comments. Finally, we are grateful to the thousands who have contributed to the systems which provided the base for many of our components.
This research was supported in part by the Defense Advanced Research Projects Agency, monitored by the Dept. of the Army under contract number DABT63-94-C-0058 and Rome Laboratory under grant F30602-96-1-0343, and by the National Science Foundation under grant CCR-9633438.
Copyright ©1997 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.