Excerpt from "MultiBoot Standard" Version 0.6 March 29, 1996 _________________________________________________________________ (This contains the essential MultiBoot specification, omitting background and related info found in ftp://flux.cs.utah.edu/flux/multiboot/.) Contents * Terminology * Scope and Requirements * Details * Authors The following items are not part of the standards document, but are included for prospective OS and bootloader writers. * Example OS Code * Example Bootloader Code _________________________________________________________________ Terminology Throughout this document, the term "boot loader" means whatever program or set of programs loads the image of the final operating system to be run on the machine. The boot loader may itself consist of several stages, but that is an implementation detail not relevant to this standard. Only the "final" stage of the boot loader - the stage that eventually transfers control to the OS - needs to follow the rules specified in this document in order to be "MultiBoot compliant"; earlier boot loader stages can be designed in whatever way is most convenient. The term "OS image" is used to refer to the initial binary image that the boot loader loads into memory and transfers control to to start the OS. The OS image is typically an executable containing the OS kernel. The term "boot module" refers to other auxiliary files that the boot loader loads into memory along with the OS image, but does not interpret in any way other than passing their locations to the OS when it is invoked. _________________________________________________________________ Scope and Requirements Architectures This standard is primarily targetted at PC's, since they are the most common and have the largest variety of OS's and boot loaders. However, to the extent that certain other architectures may need a boot standard and do not have one already, a variation of this standard, stripped of the x86-specific details, could be adopted for them as well. Operating systems This standard is targetted toward free 32-bit operating systems that can be fairly easily modified to support the standard without going through lots of bureaucratic rigmarole. The particular free OS's that this standard is being primarily designed for are Linux, FreeBSD, NetBSD, Mach, and VSTa. It is hoped that other emerging free OS's will adopt it from the start, and thus immediately be able to take advantage of existing boot loaders. It would be nice if commercial operating system vendors eventually adopted this standard as well, but that's probably a pipe dream. Boot sources It should be possible to write compliant boot loaders that load the OS image from a variety of sources, including floppy disk, hard disk, and across a network. Disk-based boot loaders may use a variety of techniques to find the relevant OS image and boot module data on disk, such as by interpretation of specific file systems (e.g. the BSD/Mach boot loader), using precalculated "block lists" (e.g. LILO), loading from a special "boot partition" (e.g. OS/2), or even loading from within another operating system (e.g. the VSTa boot code, which loads from DOS). Similarly, network-based boot loaders could use a variety of network hardware and protocols. It is hoped that boot loaders will be created that support multiple loading mechanisms, increasing their portability, robustness, and user-friendliness. Boot-time configuration It is often necessary for one reason or another for the user to be able to provide some configuration information to the OS dynamically at boot time. While this standard should not dictate how this configuration information is obtained by the boot loader, it should provide a standard means for the boot loader to pass such information to the OS. Convenience to the OS OS images should be easy to generate. Ideally, an OS image should simply be an ordinary 32-bit executable file in whatever file format the OS normally uses. It should be possible to 'nm' or disassemble OS images just like normal executables. Specialized tools should not be needed to create OS images in a "special" file format. If this means shifting some work from the OS to the boot loader, that is probably appropriate, because all the memory consumed by the boot loader will typically be made available again after the boot process is created, whereas every bit of code in the OS image typically has to remain in memory forever. The OS should not have to worry about getting into 32-bit mode initially, because mode switching code generally needs to be in the boot loader anyway in order to load OS data above the 1MB boundary, and forcing the OS to do this makes creation of OS images much more difficult. Unfortunately, there is a horrendous variety of executable file formats even among free Unix-like PC-based OS's - generally a different format for each OS. Most of the relevant free OS's use some variant of a.out format, but some are moving to ELF. It is highly desirable for boot loaders not to have to be able to interpret all the different types of executable file formats in existence in order to load the OS image - otherwise the boot loader effectively becomes OS-specific again. This standard adopts a compromise solution to this problem. MultiBoot compliant boot images always either (a) are in ELF format, or (b) contain a "magic MultiBoot header", described below, which allows the boot loader to load the image without having to understand numerous a.out variants or other executable formats. This magic header does not need to be at the very beginning of the executable file, so kernel images can still conform to the local a.out format variant in addition to being MultiBoot compliant. Boot modules Many modern operating system kernels, such as those of VSTa and Mach, do not by themselves contain enough mechanism to get the system fully operational: they require the presence of additional software modules at boot time in order to access devices, mount file systems, etc. While these additional modules could be embedded in the main OS image along with the kernel itself, and the resulting image be split apart manually by the OS when it receives control, it is often more flexible, more space-efficient, and more convenient to the OS and user if the boot loader can load these additional modules independently in the first place. Thus, this standard should provide a standard method for a boot loader to indicate to the OS what auxiliary boot modules were loaded, and where they can be found. Boot loaders don't have to support multiple boot modules, but they are strongly encouraged to, because some OS's will be unable to boot without them. _________________________________________________________________ Details There are three main aspects of the boot-loader/OS image interface this standard must specify: * The format of the OS image as seen by the boot loader. * The state of the machine when the boot loader starts the OS. * The format of the information passed by the boot loader to the OS. OS Image Format An OS image is generally just an ordinary 32-bit executable file in the standard format for that particular OS, except that it may be linked at a non-default load address to avoid loading on top of the PC's I/O region or other reserved areas, and of course it can't use shared libraries or other fancy features. Initially, only images in a.out format are supported; ELF support will probably later be specified in the standard. Unfortunately, the exact meaning of the text, data, bss, and entry fields of a.out headers tends to vary widely between different executable flavors, and it is sometimes very difficult to distinguish one flavor from another (e.g. Linux ZMAGIC executables and Mach ZMAGIC executables). Furthermore, there is no simple, reliable way of determining at what address in memory the text segment is supposed to start. Therefore, this standard requires that an additional header, known as a 'multiboot_header', appear somewhere near the beginning of the executable file. In general it should come "as early as possible", and is typically embedded in the beginning of the text segment after the "real" executable header. It _must_ be contained completely within the first 8192 bytes of the executable file, and must be longword (32-bit) aligned. These rules allow the boot loader to find and synchronize with the text segment in the a.out file without knowing beforehand the details of the a.out variant. The layout of the header is as follows: +-------------------+ 0 | magic: 0x1BADB002 | (required) 4 | flags | (required) 8 | checksum | (required) +-------------------+ 8 | header_addr | (present if flags[16] is set) 12 | load_addr | (present if flags[16] is set) 16 | load_end_addr | (present if flags[16] is set) 20 | bss_end_addr | (present if flags[16] is set) 24 | entry_addr | (present if flags[16] is set) +-------------------+ All fields are in little-endian byte order, of course. The first field is the magic number identifying the header, which must be the hex value 0x1BADB002. The flags field specifies features that the OS image requests or requires of the boot loader. Bits 0-15 indicate requirements; if the boot loader sees any of these bits set but doesn't understand the flag or can't fulfill the requirements it indicates for some reason, it must notify the user and fail to load the OS image. Bits 16-31 indicate optional features; if any bits in this range are set but the boot loader doesn't understand them, it can simply ignore them and proceed as usual. Naturally, all as-yet-undefined bits in the flags word must be set to zero in OS images. This way, the flags fields serves for version control as well as simple feature selection. If bit 0 in the flags word is set, then all boot modules loaded along with the OS must be aligned on page (4KB) boundaries. Some OS's expect to be able to map the pages containing boot modules directly into a paged address space during startup, and thus need the boot modules to be page-aligned. If bit 1 in the flags word is set, then information on available memory via at least the 'mem_*' fields of the multiboot_info structure defined below must be included. If the bootloader is capable of passing a memory map (the 'mmap_*' fields) and one exists, then it must be included as well. If bit 16 in the flags word is set, then the fields at offsets 8-24 in the multiboot_header are valid, and the boot loader should use them instead of the fields in the actual executable header to calculate where to load the OS image. This information does not need to be provided if the kernel image is in ELF format, but it should be provided if the images is in a.out format or in some other format. Compliant boot loaders must be able to load images that either are in ELF format or contain the load address information embedded in the multiboot_header; they may also directly support other executable formats, such as particular a.out variants, but are not required to. All of the address fields enabled by flag bit 16 are physical addresses. The meaning of each is as follows: * header_addr -- Contains the address corresponding to the beginning of the multiboot_header - the physical memory location at which the magic value is supposed to be loaded. This field serves to "synchronize" the mapping between OS image offsets and physical memory addresses. * load_addr -- Contains the physical address of the beginning of the text segment. The offset in the OS image file at which to start loading is defined by the offset at which the header was found, minus (header_addr - load_addr). load_addr must be less than or equal to header_addr. * load_end_addr -- Contains the physical address of the end of the data segment. (load_end_addr - load_addr) specifies how much data to load. This implies that the text and data segments must be consecutive in the OS image; this is true for existing a.out executable formats. * bss_end_addr -- Contains the physical address of the end of the bss segment. The boot loader initializes this area to zero, and reserves the memory it occupies to avoid placing boot modules and other data relevant to the OS in that area. * entry -- The physical address to which the boot loader should jump in order to start running the OS. The checksum is a 32-bit unsigned value which, when added to the other required fields, must have a 32-bit unsigned sum of zero. Machine State When the boot loader invokes the 32-bit operating system, the machine must have the following state: * CS must be a 32-bit read/execute code segment with an offset of 0 and a limit of 0xffffffff. * DS, ES, FS, GS, and SS must be a 32-bit read/write data segment with an offset of 0 and a limit of 0xffffffff. * The address 20 line must be usable for standard linear 32-bit addressing of memory (in standard PC hardware, it is wired to 0 at bootup, forcing addresses in the 1-2 MB range to be mapped to the 0-1 MB range, 3-4 is mapped to 2-3, etc.). * Paging must be turned off. * The processor interrupt flag must be turned off. * EAX must contain the magic value 0x2BADB002; the presence of this value indicates to the OS that it was loaded by a MultiBoot-compliant boot loader (e.g. as opposed to another type of boot loader that the OS can also be loaded from). * EBX must contain the 32-bit physical address of the multiboot_info structure provided by the boot loader (see below). All other processor registers and flag bits are undefined. This includes, in particular: * ESP: the 32-bit OS must create its own stack as soon as it needs one. * GDTR: Even though the segment registers are set up as described above, the GDTR may be invalid, so the OS must not load any segment registers (even just reloading the same values!) until it sets up its own GDT. * IDTR: The OS must leave interrupts disabled until it sets up its own IDT. However, other machine state should be left by the boot loader in "normal working order", i.e. as initialized by the BIOS (or DOS, if that's what the boot loader runs from). In other words, the OS should be able to make BIOS calls and such after being loaded, as long as it does not overwrite the BIOS data structures before doing so. Also, the boot loader must leave the PIC programmed with the normal BIOS/DOS values, even if it changed them during the switch to 32-bit mode. Boot Information Format Upon entry to the OS, the EBX register contains the physical address of a 'multiboot_info' data structure, through which the boot loader communicates vital information to the OS. The OS can use or ignore any parts of the structure as it chooses; all information passed by the boot loader is advisory only. The multiboot_info structure and its related substructures may be placed anywhere in memory by the boot loader (with the exception of the memory reserved for the kernel and boot modules, of course). It is the OS's responsibility to avoid overwriting this memory until it is done using it. The format of the multiboot_info structure (as defined so far) follows: +-------------------+ 0 | flags | (required) +-------------------+ 4 | mem_lower | (present if flags[0] is set) 8 | mem_upper | (present if flags[0] is set) +-------------------+ 12 | boot_device | (present if flags[1] is set) +-------------------+ 16 | cmdline | (present if flags[2] is set) +-------------------+ 20 | mods_count | (present if flags[3] is set) 24 | mods_addr | (present if flags[3] is set) +-------------------+ 28 - 40 | syms | (present if flags[4] or flags[5] is set) +-------------------+ 44 | mmap_length | (present if flags[6] is set) 48 | mmap_addr | (present if flags[6] is set) +-------------------+ The first longword indicates the presence and validity of other fields in the multiboot_info structure. All as-yet-undefined bits must be set to zero by the boot loader. Any set bits that the OS does not understand should be ignored. Thus, the flags field also functions as a version indicator, allowing the multiboot_info structure to be expanded in the future without breaking anything. If bit 0 in the multiboot_info.flags word is set, then the 'mem_*' fields are valid. 'mem_lower' and 'mem_upper' indicate the amount of lower and upper memory, respectively, in kilobytes. Lower memory starts at address 0, and upper memory starts at address 1 megabyte. The maximum possible value for lower memory is 640 kilobytes. The value returned for upper memory is maximally the address of the first upper memory hole minus 1 megabyte. It is not guaranteed to be this value. If bit 1 in the multiboot_info.flags word is set, then the 'boot_device' field is valid, and indicates which BIOS disk device the boot loader loaded the OS from. If the OS was not loaded from a BIOS disk, then this field must not be present (bit 3 must be clear). The OS may use this field as a hint for determining its own "root" device, but is not required to. The boot_device field is layed out in four one-byte subfields as follows: +-------+-------+-------+-------+ | drive | part1 | part2 | part3 | +-------+-------+-------+-------+ The first byte contains the BIOS drive number as understood by the BIOS INT 0x13 low-level disk interface: e.g. 0x00 for the first floppy disk or 0x80 for the first hard disk. The three remaining bytes specify the boot partition. 'part1' specifies the "top-level" partition number, 'part2' specifies a "sub-partition" in the top-level partition, etc. Partition numbers always start from zero. Unused partition bytes must be set to 0xFF. For example, if the disk is partitioned using a simple one-level DOS partitioning scheme, then 'part1' contains the DOS partition number, and 'part2' and 'part3' are both zero. As another example, if a disk is partitioned first into DOS partitions, and then one of those DOS partitions is subdivided into several BSD partitions using BSD's "disklabel" strategy, then 'part1' contains the DOS partition number, 'part2' contains the BSD sub-partition within that DOS partition, and 'part3' is 0xFF. DOS extended partitions are indicated as partition numbers starting from 4 and increasing, rather than as nested sub-partitions, even though the underlying disk layout of extended partitions is hierarchical in nature. For example, if the boot loader boots from the second extended partition on a disk partitioned in conventional DOS style, then 'part1' will be 5, and 'part2' and 'part3' will both be 0xFF. If bit 2 of the flags longword is set, the 'cmdline' field is valid, and contains the physical address of the the command line to be passed to the kernel. The command line is a normal C-style null-terminated string. If bit 3 of the flags is set, then the 'mods' fields indicate to the kernel what boot modules were loaded along with the kernel image, and where they can be found. 'mods_count' contains the number of modules loaded; 'mods_addr' contains the physical address of the first module structure. 'mods_count' may be zero, indicating no boot modules were loaded, even if bit 1 of 'flags' is set. Each module structure is formatted as follows: +-------------------+ 0 | mod_start | 4 | mod_end | +-------------------+ 8 | string | +-------------------+ 12 | reserved (0) | +-------------------+ The first two fields contain the start and end addresses of the boot module itself. The 'string' field provides an arbitrary string to be associated with that particular boot module; it is a null-terminated ASCII string, just like the kernel command line. The 'string' field may be 0 if there is no string associated with the module. Typically the string might be a command line (e.g. if the OS treats boot modules as executable programs), or a pathname (e.g. if the OS treats boot modules as files in a file system), but its exact use is specific to the OS. The 'reserved' field must be set to 0 by the boot loader and ignored by the OS. NOTE: Bits 4 & 5 are mutually exclusive. If bit 4 in the multiboot_info.flags word is set, then the following fields in the multiboot_info structure starting at byte 28 are valid: +-------------------+ 28 | tabsize | 32 | strsize | 36 | addr | 40 | reserved (0) | +-------------------+ These indicate where the symbol table from an a.out kernel image can be found. 'addr' is the physical address of the size (4-byte unsigned long) of an array of a.out-format 'nlist' structures, followed immediately by the array itself, then the size (4-byte unsigned long) of a set of null-terminated ASCII strings (plus sizeof(unsigned long) in this case), and finally the set of strings itself. 'tabsize' is equal to it's size parameter (found at the beginning of the symbol section), and 'strsize' is equal to it's size parameter (found at the beginning of the string section) of the following string table to which the symbol table refers. Note that 'tabsize' may be 0, indicating no symbols, even if bit 4 in the flags word is set. If bit 5 in the multiboot_info.flags word is set, then the following fields in the multiboot_info structure starting at byte 28 are valid: +-------------------+ 28 | num | 32 | size | 36 | addr | 40 | shndx | +-------------------+ These indicate where the section header table from an ELF kernel is, the size of each entry, number of entries, and the string table used as the index of names. They correspond to the 'shdr_*' entries ('shdr_num', etc.) in the Executable and Linkable Format (ELF) specification in the program header. All sections are loaded, and the physical address fields of the elf section header then refer to where the sections are in memory (refer to the i386 ELF documentation for details as to how to read the section header(s)). Note that 'shdr_num' may be 0, indicating no symbols, even if bit 5 in the flags word is set. If bit 6 in the multiboot_info.flags word is set, then the 'mmap_*' fields are valid, and indicate the address and length of a buffer containing a memory map of the machine provided by the BIOS. 'mmap_addr' is the address, and 'mmap_length' is the total size of the buffer. The buffer consists of one or more of the following size/structure pairs ('size' is really used for skipping to the next pair): +-------------------+ -4 | size | +-------------------+ 0 | BaseAddrLow | 4 | BaseAddrHigh | 8 | LengthLow | 12 | LengthHigh | 16 | Type | +-------------------+ where 'size' is the size of the associated structure in bytes, which can be greater than the minimum of 20 bytes. 'BaseAddrLow' is the lower 32 bits of the starting address, and 'BaseAddrHigh' is the upper 32 bits, for a total of a 64-bit starting address. 'LengthLow' is the lower 32 bits of the size of the memory region in bytes, and 'LengthHigh' is the upper 32 bits, for a total of a 64-bit length. 'Type' is the variety of address range represented, where a value of 1 indicates available RAM, and all other values currently indicated a reserved area. The map provided is guaranteed to list all standard RAM that should be available for normal use. __________________________________________________________________________ Authors Bryan Ford Flux Research Group Dept. of Computer Science University of Utah Salt Lake City, UT 84112 multiboot@flux.cs.utah.edu baford@cs.utah.edu Erich Stefan Boleyn 924 S.W. 16th Ave, #202 Portland, OR, USA 97205 (503) 226-0741 erich@uruk.org We would also like to thank the many other people have provided comments, ideas, information, and other forms of support for our work. __________________________________________________________________________ Example OS code can be found in the OSKit in the "kern/x86" directory and in the oskit/x86/multiboot.h file. __________________________________________________________________________ Example Bootloader Code (from Erich Boleyn) - The GRUB bootloader project (http://www.uruk.org/grub) will be fully Multiboot-compliant, supporting all required and optional features present in this standard. A final release has not been made, but the GRUB beta release (which is quite stable) is available from ftp://ftp.uruk.org/public/grub/.