projects techreports press lab location staff
citi top.2 top.3
citi mid.3
bot.1 bot.2 bot.3

Projects: Linux scalability: Scaling poll() on Linux

Linux Optimizations

The following summarizes modifications made to the Linux kernel in order to optimize I/O-bound applications.

Open and Close

For applications with many open file descriptors, the open() system call needs to search a linear list in the form of a single bitmap in the Linux kernel to find a free fd (file descriptor). The result is that the time to find an open file descriptor increases linearly with the number of open files. Linux and *BSD memorize the last closed fd which allows a single open() to return quickly. But when open() and close() system calls are interleaved, a performance degradation is noticeable

A possible solution to this problem was proposed by Banga and Mogul in the paper "Scalable kernel performance for Internet servers under realistic load" (see annotated bibliography). Instead of using a single bitmap a two bitmap structure is used to maintain the list of free fds. The first bitmap points to free entries in the second bitmap, and the second bitmap points to the free fds themselves.

Graph: Performance difference between single bitmap and two bitmap allocation of free file descriptors.

The graph above shows the result of a benchmark that opened about 100,000 file descriptors. It measured the time between closing the first and the n-th file descriptors and two subsequent calls to open(). There is a noticeable difference between the single bitmap allocation of fds and the two-bitmap allocation.

Patch: two bitmap allocation of file descriptors.

Scaling Poll

Until recently Linux did not support more than 1024 file descriptors per process. For this reason it's kernel supports polling() and selecting() on only up to about 1000 file descriptors.

When calling poll() or select() each process has to allocate a fixed size wait_table which could be used by the underlying device drivers to store information about the processes which need to be woken up when an event occurs.

Each device driver adds information to the wait_table via the poll_wait() interface. The interface has been extended to dynamically allocate more storage space for the wait_table if needed.

Patch: scalable poll.

Hinting to Poll

A process registers interest in events on file descriptors with the poll() system call. The kernel passes down this information to all concerned device drivers and puts the process to sleep until a relevant event occurs.

When the process wakes up it has to scan all file descriptors it has been polling on to check for status changes. This is the case even though the status of only one file descriptor might have changed.

It would be useful if the device drivers could tell - hint to - each process which file descriptors changed their status.

To make this possible the poll()-interface was extended to additionally pass down the file descriptor each process uses as identification. The device drives keeps them in a backmapping list and on an event flags the file descriptor for each process in the backmapping list.

Because in general the hinting system would require each device driver to be rewritten, each device driver was given the ability to indicate if it supports hinting. So that only the essential drivers need to be modified, e.g. the network devices.

Graph: Performance difference between standard poll() and poll() using hints. Patch: hinting poll for 2.2.9 and hinting poll for 2.3.15
Niels Provos projects | techreports | press
lab | location | staff |
for info, email or call +1 (734) 763-2929.
Copyright 1999 University of Michigan Regents. All rights reserved.