Running SPEC Benchmarks on Linux

Chuck Lever, Netscape Communications Corp.

$Id: spec.html,v 1.7 1999/11/12 20:12:54 cel Exp $

Abstract

We describe the modifications required to run SPEC's SDET, KENBUS, and SPECweb96 benchmarks on Linux.

This document is Copyright © 1999 Netscape Communications Corp., all rights reserved. Trademarked material referenced in this document is copyright by its respective owner.

Introduction

Linux is an open-source POSIX-compliant operating system that runs on commodity Intel PC hardware. The Linux kernel is developed using an innovative distributed process that has resulted in a high-performance ultra-stable OS platform that is often preferred over commercial OS distributions.

However, this process isn't perfect. Sometimes old problems are reintroduced into the kernel, and performance can vary from release to release. Performance changes are often described only in informal terms, with no scientific analysis, and no history of performance improvements is maintained. A database of regression test results would help catch declines in performance or API compliance as development progresses.

In order to regression-test Linux scientifically, benchmark standards must be agreed upon. In this paper, we describe the modifications required to run SPEC's SDET, KENBUS, and SPECweb96 benchmarks on Linux. We show how to run the standard SPEC benchmarks on Linux, and make a case for using these as part of a regression suite.

SPEC SDM 1.1

The latest release of SPEC's SDM benchmark is 1.1, current as of the early 90's. Very little work has been done on this suite since then. However, it is still a useful benchmark because of how thoroughly it checks its output. SPEC SDM is distributed by the SPEC Organization for a fee.

In the modification instructions below, we assume that you already have some familiarity with the benchmarks.

Common changes

This section describes changes you'll need to make to Linux to get both S-DET and KENBUS working correctly.

In order to run tests with a large number of processes, you will need to re-compile the Linux kernel with NR_TASKS set to its maximum value. The NR_TASKS constant is contained in /usr/src/linux/include/linux/tasks.h. Its maximum value is 4090 on Intel processors with the kernel APM extensions enabled. It is also recommended that MAX_TASKS_PER_USER be increased to a number almost as large, say, 3500.
Be sure that any programs that can alter the system clock, such as xntpd, have been disabled. The system clock is used to measure the elapsed time of a benchmark run. If it changes during a run, the results will be skewed: sometimes they still look reasonable, and sometimes they are obviously wrong (negative throughput, for example).
Disable any regularly scheduled jobs such as sendmail, or, for example, a cron job that might start a news process. These jobs will compete with the benchmark for I/O bandwidth and physical memory, and cause unpredictable variation in the results.
Create /bin/time with the contents:
```
#!/bin/sh
exec /usr/bin/time --portability "$@"
```
and be sure that /bin comes before /usr/bin in the default PATH used during the benchmarks.
The Linux "top" command may need to be rebuilt if you have modified the NR_TASKS macro as described above, and would like to use "top" to watch the system as you run the benchmark. Otherwise, "top" will run out of process table space, and will stop prematurely during benchmark runs with a large number of scripts.
If you have installed an alternate C compiler (that is, a C compiler that exists somewhere other than /usr/bin/gcc) you will need to change the /usr/bin/cc link to point to it if you want to use the alternate compiler during the benchmark.
If you want to run with many scripts, you will need to increase the system-wide file descriptor maximum. You can do that by echoing the new maximum into /proc/sys/fs/file-max:
```
su
echo 32768 >/proc/sys/fs/file-max
```
Be careful about how your benchmark file system is mounted. If it is mounted with the "sync" option, the benchmark will run much more slowly and will be very disk-intensive. You may also choose to set the "noatime" mount option to reduce disk activity even further. Specifically, both benchmark suites use /usr/tmp, which is linked to /var/tmp. If /var is mounted with the "sync" option, this will cause significant slowdowns.
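
Several of the items above can be checked mechanically before a run. The following is only a sketch of such a pre-flight check, not part of the SPEC suite: the function names are hypothetical, the 32768 threshold simply mirrors the example value used above, and the mount check only looks at /var when it is its own mount point.

```shell
#!/bin/sh
# Illustrative pre-flight helpers; all function names here are hypothetical.

# path_precedes FIRST SECOND LIST
# Succeeds if FIRST appears before SECOND in the colon-separated LIST.
path_precedes() {
    _old_ifs=$IFS
    IFS=:
    for _dir in $3; do
        if [ "$_dir" = "$1" ]; then IFS=$_old_ifs; return 0; fi
        if [ "$_dir" = "$2" ]; then IFS=$_old_ifs; return 1; fi
    done
    IFS=$_old_ifs
    return 1
}

# has_mount_option OPTIONS NAME
# Succeeds if the comma-separated OPTIONS string contains NAME exactly
# (so "async" does not count as "sync").
has_mount_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# mount_options MOUNTPOINT
# Prints the options field for MOUNTPOINT from /proc/mounts, if present.
mount_options() {
    awk -v mp="$1" '$2 == mp { print $4 }' /proc/mounts 2>/dev/null
}

# preflight: print a warning for each condition described in the list above.
preflight() {
    path_precedes /bin /usr/bin "$PATH" ||
        echo "warning: /bin does not come before /usr/bin in PATH"
    if [ -r /proc/sys/fs/file-max ] &&
       [ "$(cat /proc/sys/fs/file-max)" -lt 32768 ]; then
        echo "warning: fs.file-max is below 32768"
    fi
    if has_mount_option "$(mount_options /var)" sync; then
        echo "warning: /var is mounted with the sync option"
    fi
    if ps ax 2>/dev/null | grep -q '[x]ntpd'; then
        echo "warning: xntpd is running and may adjust the system clock"
    fi
    return 0
}
```

Calling `preflight` from the shell that will start the benchmark prints one warning per problem found and nothing otherwise.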

Description of S-DET

The S-DET portion of the benchmark is based on a script of typical programs run by an imaginary software developer. The script contains commands such as nroff, cc, and spell. The script, and therefore the system load offered by the script, remains the same over all invocations of the benchmark. Offered load is varied by concurrently invoking several copies of this script. A throughput result is obtained by dividing the number of running scripts by the elapsed time required for their completion.
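
The throughput calculation is simple arithmetic. As a minimal sketch, assuming the elapsed time is measured in seconds and the result is scaled to scripts per hour (the function name and the sample numbers are made up for illustration):

```shell
# throughput SCRIPTS ELAPSED_SECONDS
# Prints the number of scripts completed per hour: SCRIPTS * 3600 / ELAPSED_SECONDS.
throughput() {
    awk -v n="$1" -v t="$2" 'BEGIN {
        if (t <= 0) {
            # A zero or negative elapsed time usually means the clock changed mid-run.
            print "error: elapsed time must be positive" > "/dev/stderr"
            exit 1
        }
        printf "%.1f\n", n * 3600 / t
    }'
}

throughput 20 450   # 20 concurrent scripts finishing in 450 seconds: prints 160.0
```

The guard against a non-positive elapsed time reflects the clock problem described earlier: if the system clock is adjusted during a run, the computed throughput can come out negative.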

This benchmark exercises multiprocessing, filesystem, and virtual memory facilities. Even on modern hardware, this benchmark is able to create significant loads. Because the output of every script is checked against a standard output log, misbehavior caused by system overload can be spotted by the benchmark automatically.

S-DET modifications

Linux distributions don't have standard "time" or "spell" programs, so some minor adjustments must be made to compensate.

Step-by-step:

Create sdm1.1/benchspec/057.sdet/M.linux.22
- copy from M.sun
- delete extra LD flags
- change compiler optimization, MACHID, and LABEL
- salt to taste
Edit sdm1.1/benchspec/057.sdet/tools/excommon.h
- line 30 should read "} dummy;"
Edit /usr/bin/spell
- change the ispell invocation to "cat $* | ispell -l"

Edit sdm1.1/benchspec/057.sdet/output/generic
- go to the "starting text" section, replace it with:

*** starting text
real
user
sys
pre
Pre
SPECmark
SPECthruput
spiff
pre
pre
pre
POSIX
*** starting bprogs
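
For the /usr/bin/spell step, the script shipped by your distribution will differ, so the following is only a hypothetical minimal wrapper showing the ispell invocation described above. It is written to a scratch file (the path is an assumption) rather than to /usr/bin/spell, and it uses "$@" instead of $* so that file names containing spaces survive.

```shell
# Write an example spell wrapper to a scratch location and syntax-check it.
wrapper=${TMPDIR:-/tmp}/spell.example

cat > "$wrapper" <<'EOF'
#!/bin/sh
# Hypothetical minimal "spell": concatenate the named files (or read
# standard input if none are given) and have ispell list the misspelled
# words, one per line.
cat "$@" | ispell -l
EOF

chmod +x "$wrapper"
sh -n "$wrapper"    # parse only; does not require ispell to be installed
```

Once you are satisfied with the wrapper, it can be compared against the existing /usr/bin/spell before changing the real script.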

At this time, there is a bug/feature in GNU "make" that prevents the "wrapper" feature of the runsdm script from working. We are still investigating this problem.

Description of KENBUS

The KENBUS portion of the SDM benchmark suite is similar to S-DET in that it is based on a fixed script of programs. However, the KENBUS script is more typical of a time-shared word processing environment. The script driver simulates keystrokes at a rate controlled by the benchmark user. The KENBUS script is meant to be tailored by the benchmark user, so the offered load can vary significantly. Overall offered load is varied by concurrently invoking several copies of the KENBUS script. As with S-DET, a throughput result is obtained by dividing the number of scripts by the elapsed time required for their completion.

This benchmark, like its cousin S-DET, can also tax a modern system significantly, revealing operating system problems that don't arise under everyday loads. Because the output of every script is checked against a standard output log, misbehavior caused by system overload can be spotted by the benchmark automatically.
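
The idea of checking each script's output against a reference log can be illustrated with a plain byte-for-byte comparison. This is only a stand-in for the suite's own comparison tooling, which is more elaborate; the function name and the sample files are invented for the example.

```shell
# check_output REFERENCE ACTUAL
# Prints PASS if the two files are identical, FAIL otherwise.
check_output() {
    if cmp -s "$1" "$2"; then
        echo PASS
    else
        echo FAIL
        return 1
    fi
}

# Demonstration with throwaway files.
ref=$(mktemp); good=$(mktemp); bad=$(mktemp)
printf 'line one\nline two\n' > "$ref"
printf 'line one\nline two\n' > "$good"
printf 'line one\n' > "$bad"          # truncated, as if a script died early

check_output "$ref" "$good"           # prints PASS
check_output "$ref" "$bad" || true    # prints FAIL
rm -f "$ref" "$good" "$bad"
```

A script that is killed or starved under heavy load typically produces truncated or missing output, which is exactly the kind of difference such a comparison exposes.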

KENBUS modifications

The GNU versions of "make" and "time" need to have special options in order to operate in a way the KENBUS scripts expect.

Step-by-step:

Create sdm1.1/benchspec/061.kenbus1/M.linux.22
- copy from M.sun
- delete extra LD flags
- change compiler optimization, MACHID, and LABEL
- salt to taste
Before invoking runsdm, set and export the following environment variable:
- MAKEFLAGS=--no-print-directory
Edit the master script in sdm1.1/benchspec/061.kenbus1/Workload/script.master
- change /bin/sh to /bin/ash
- add "-s" to the piped invocations of "ed"
Edit sdm1.1/benchspec/061.kenbus1/check.sed
- replace PS1= assignment with static 'PS1="#"'
Edit benchspec/061.kenbus1/time.awk
- on line 64, change 'print' to 'print " "'
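
Two of the steps above can be sketched in shell. The `use_ash` helper is hypothetical and is demonstrated on a throwaway file rather than on the real script.master; note that its simple pattern would also rewrite a path such as /usr/bin/sh, so check the result by hand. The other edits depend on the exact contents of the SPEC files and are not shown.

```shell
# Set and export MAKEFLAGS so that GNU make does not print its
# "Entering directory" / "Leaving directory" messages.
MAKEFLAGS=--no-print-directory
export MAKEFLAGS

# use_ash FILE
# Replace /bin/sh with /bin/ash in FILE, keeping FILE.orig as a backup.
use_ash() {
    cp "$1" "$1.orig" &&
    sed 's|/bin/sh|/bin/ash|g' "$1.orig" > "$1"
}

# Demonstrate on a scratch file instead of the real master script.
demo=$(mktemp)
printf '#!/bin/sh\necho hello\n' > "$demo"
use_ash "$demo"
head -n 1 "$demo"    # prints #!/bin/ash
```

Keeping the .orig copy makes it easy to diff the modified script against the version shipped with the benchmark.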

Due to the small system-wide maximum number of processes on Intel Linux (4090), the KENBUS benchmark can't drive modern hardware very hard. Please stay tuned to this space for progress.

This document was written as part of the Linux Scalability Project. For more information, see our home page.
If you have comments or suggestions, email linux-scalability@citi.umich.edu
