CITI: Projects: NFS Version 4 Open Source Reference Implementation

NFS Version 4 Open Source Project Code Release Linux Documentation

Note:
This page will dramatically improve over the next week as we madly type up documentation....

Functionality

General

In this code release we deliver a client and server that implement a large portion of the required pieces of the NFSv4 protocol. This release is based on the merged Linux 2.2.14 NFSv3 patches produced by Dave Higgins of VA Linux Systems, specifically dhiggen_merge-1.1, which consists of client work by Trond Myklebust, server work by Neil Brown, and NLM code by SGI. There have been many bug fixes and improvements to the NFSv3 code since this merge. We have not tracked them, concentrating instead on producing a base NFSv4 implementation. In most cases, we use a #define to mark our changes to the NFSv3 base. Functionality is our goal, not performance, so we sometimes take the easier route of extra copies or ultimately unnecessary kmalloc calls. In many cases, our client still 'thinks' NFSv3 and does not take full advantage of the compound RPC.

The NFSv4 specification is large: 257 pages. The order of our development has been determined in part by the NFSv4 bakeoffs; we develop the features necessary to interoperate with other NFSv4 implementations.

Testing

We use the nine Basic Connectathon Tests as regression tests, with one caveat: O_TRUNC does not currently work, so in the fifth basic test we perform only one write test instead of the default ten.

Comments

Access Control Lists

ACLs are not implemented.

File System Migration and Replication

We have yet to implement the fs_locations attribute.

Pseudo File System

Our server exports a pseudo file system. See Mounting under the Operations section.

Client Caching and Delegation

We have yet to implement a client-side persistent cache. See the Delegation Support section.

RPC and Security Flavors

We have a rudimentary RPCSEC_GSS kernel RPC implementation, with a hardcoded security tuple of <Kerberos v5, default QOP, RPCSEC_GSS_SVC_NONE>.

File Locking and Share Reservations

We have implemented share reservations. See the State section for details. Byte range locking is partially complete.

UTF8

The Linux VFS uses ASCII, while NFSv4 is UTF8 on the wire. We have a set of UTF8 functions, which currently assume ASCII and which can be updated to full UTF8 functionality.

Filehandle Volatility

The Linux VFS dcache maps file names to dcache entries. This means that the NFSv4 server does not export persistent filehandles; rather, the filehandles are volatile on rename and migration. We have not addressed this, and the server still claims to export persistent filehandles.

Compound Operations

Compound procedure 0, the NULL procedure, works and is used for RPCSEC_GSS. Here is an overview of the functionality of each of the compound RPC procedure 1 operations. This is a high-level overview; the detail is in the code.

ACCESS - functional

CLOSE - functional

COMMIT - not functional

CREATE - functional

DELEGPURGE - functional - see delegation section

DELEGRETURN - functional - see delegation section

GETATTR - functional, not all attributes coded

GETFH - functional

LINK - functional

LOCK - in development

LOCKT - in development

LOCKU - in development

LOOKUP - functional

LOOKUPP - functional

NVERIFY - not functional

OPEN - partially functional

OPENATTR - not functional

OPEN_CONFIRM - functional

OPEN_DOWNGRADE - not functional

PUTFH - functional

PUTPUBFH - not functional

PUTROOTFH - functional

READ - functional

READDIR - functional

READLINK - functional

REMOVE - functional

RENEW - partially functional

RESTOREFH - functional

SAVEFH - functional

SECINFO - functional

SETATTR - functional

SETCLIENTID - functional

SETCLIENTID_CONFIRM - functional

VERIFY - not functional

WRITE - functional

Operating Instructions

Daemons

MOUNTD and LOCKD are no longer needed as separate daemons, as their functionality is subsumed by the protocol. Both the client and server need to run GSSD, which serves two functions: communicating with the underlying security services (Kerberos V5 for this port) and performing various name translations.

Mounting

The server has a different view of the namespace it exports than previous versions of NFS. The exported sub-trees specified in /etc/exports are joined together by a read-only virtual file system called the Pseudo file system. Clients mount the root of the server, which returns the Pseudo file system root. Clients then browse the Pseudo file system, and are challenged for access upon crossing from the Pseudo file system into a real file system mounted on the server. Note that any client is able to mount the Pseudo file system root; it is the user's credentials that allow access into the exported pieces of the server's native file system. Thus, the client list portion of the /etc/exports stanza is ignored by the NFSv4 server. Since clients always mount the root of the server, the client's /etc/fstab no longer needs to track changes in the server's /etc/exports.

Since MOUNTD is no longer needed, mount/umount have been modified, and the new binaries need to be used.

Security

In this code release, the Pseudo file system is exported as security flavor AUTH_UNIX. In future releases, as per the protocol specification, a minimum RPCSEC_GSS security tuple will be required.

We have implemented some security features based on the default <Kerberos V5, default QOP, RPCSEC_GSS_SVC_NONE> RPCSEC_GSS security tuple. Kerberos V5 requires the following to be installed on both the client and the server machines:

/etc/krb5.keytab - the Kerberos v5 srvtab for this machine.
/etc/krb5.conf - Kerberos v5 configuration file.

Currently, to export a sub-tree with Kerberos v5 security, we use the following HACK! Since the client list is not used, specifying a client of 10.10.10.10 indicates that the exported subtree has the hardcoded default security tuple. We are hard at work on exportfs to implement the scheme that Sun Microsystems will use to describe the security on an exported fs, and will update the release as soon as possible.

Examples

/etc/exports

The first subtree uses RPCSEC_GSS, the second subtree uses AUTH_UNIX

/export/nfstest          10.10.10.10(rw)  
/export/nfstest1         141.211.92.196(rw)

/etc/fstab

leeds:/ /nfs/leeds nfs rsize=1024,wsize=1024,timeo=14,intr 0 0

Without proper Kerberos v5 credentials on the client, any access to the RPCSEC_GSS-protected export will be denied.

State

One of the big differences between NFSv4 and previous versions of NFS is that the protocol has become stateful. Security, compound RPC, DOS-style share locking, byte-range file locking, and delegation all require that the NFSv4 client and server maintain information about past NFSv4 events. Locking and delegation state is negotiated before and initialized in the NFSv4 OPEN call.

Compound RPC

In the compound RPC, the result of one operation is often the input to the next operation, so the NFSv4 server needs to keep state. It turns out that the only result that needs to be passed to the next operation is a filehandle or two: the current filehandle, and sometimes the previous (saved) filehandle. Our NFSv4 server stores these as per-thread globals.
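
The filehandle passing described above can be illustrated with a small user-space sketch. It is not taken from the release: the types, the operation subset, and the toy LOOKUP are hypothetical, and the state is passed as a struct argument rather than held in per-thread globals. It assumes only that processing stops at the first failing operation and that SAVEFH/RESTOREFH copy between the current and saved filehandle.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified filehandle; a real one is an opaque byte string. */
struct fh {
    int      valid;
    uint32_t id;
};

/* The only state carried from one operation to the next in a compound. */
struct compound_state {
    struct fh current_fh;
    struct fh saved_fh;
};

enum op { OP_PUTROOTFH, OP_LOOKUP, OP_SAVEFH, OP_RESTOREFH };

struct op_call {
    enum op  op;
    uint32_t arg;   /* stands in for a name component in LOOKUP */
};

#define ROOT_FH_ID 1u

/* Returns 0 on success, -1 if a required filehandle is not set. */
static int run_op(struct compound_state *cs, const struct op_call *c)
{
    switch (c->op) {
    case OP_PUTROOTFH:
        cs->current_fh.valid = 1;
        cs->current_fh.id = ROOT_FH_ID;
        return 0;
    case OP_LOOKUP:
        if (!cs->current_fh.valid)
            return -1;
        /* Toy lookup: derive a child id from the parent id and the name. */
        cs->current_fh.id = cs->current_fh.id * 31u + c->arg;
        return 0;
    case OP_SAVEFH:
        if (!cs->current_fh.valid)
            return -1;
        cs->saved_fh = cs->current_fh;
        return 0;
    case OP_RESTOREFH:
        if (!cs->saved_fh.valid)
            return -1;
        cs->current_fh = cs->saved_fh;
        return 0;
    }
    return -1;
}

/* Runs the operations in order; returns how many completed successfully. */
static int run_compound(struct compound_state *cs,
                        const struct op_call *ops, int nops)
{
    int i;

    memset(cs, 0, sizeof(*cs));   /* each compound starts with no filehandles */
    for (i = 0; i < nops; i++)
        if (run_op(cs, &ops[i]) != 0)
            break;                /* stop at the first failing operation */
    return i;
}
```

For example, the sequence PUTROOTFH, SAVEFH, LOOKUP, RESTOREFH completes all four operations and leaves the current filehandle back at the root, while a compound that begins with LOOKUP fails immediately because no current filehandle has been set.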

Locking and Delegation

The bookkeeping involved in implementing lease-based share and byte-range locks, and delegation, is described below. There are three levels of activity imposed by the NFSv4 protocol: per client, per lock owner, and per file. Both the NFSv4 client and server maintain state corresponding to these activity levels. Note that delegation means that on delegated files, the client needs to do all the bookkeeping that the server does, and therefore, in general, maintains the same state. In this document, only the server-side state is described.

Server
Per client state consists of a negotiated clientid (see SETCLIENTID section), and the lease.

Per lock owner state consists of the lock owner identity as well as the bookkeeping needed to negotiate and maintain the lock sequence number used to maintain 'at most once' lock semantics.

Per file state is kept in a hash table of nfs4_file_element structures hashed by the ext2 inode number and device number. Currently, there are 1024 hash buckets. The nfs4_file structure keeps the usual information about the file, such as the pointer to the VFS file structure, the filehandle, etc. Also kept in the file structure are linked lists of the nfs4_locks and nfs4_delegations associated with the file. Of interest here is the construction of the nfs4_lock stateid (l_stateid) and the nfs4_delegation stateid (d_stateid). Both stateids act as handles to server-side state: they are opaque to the client, and they are constructed so as to uniquely locate a lock or delegation in both space (the hash table) and time (across server reboots).

l_stateid is 64 bits long. The first 32 bits are the server boot time in seconds. Then come 12 bits of the nfs4_lock l_generation field, 10 bits of the nfs4_file f_seq field, and 10 bits of the hash bucket index. The nfs4_lock.l_generation value and the nfs4_file.f_seq value are set by incrementing nfs4_file.f_lock4_seq and nfs4_file_element.f_uniq_seq respectively, which provide unique values across the nfs4_lock and nfs4_file linked lists. So, given an l_stateid, the NFSv4 server can use the server boot time to determine if the stateid is stale (NFS4ERR_STALE_STATEID). If the stateid is not stale, the combination of the hash bucket number and the f_seq stored in the l_stateid gives the unique location of the nfs4_file that the stateid refers to. If the nfs4_file is found, and the l_generation number stored in the l_stateid doesn't match the l_generation of an existing nfs4_lock on the nfs4_file's linked list, then the l_stateid is old (NFS4ERR_OLD_STATEID). If the nfs4_file is not found, then the l_stateid is bad (NFS4ERR_BAD_STATEID).
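
A chained hash table of the kind described above might look like the following user-space sketch. The 1024 buckets come from the text; the struct, the function names, and in particular the mixing function are assumptions, since the release's actual hash function is not described here.

```c
#include <stddef.h>
#include <stdint.h>

#define FILE_HASH_BITS 10
#define FILE_HASH_SIZE (1u << FILE_HASH_BITS)   /* 1024 buckets */

/* Hypothetical stand-in for the per-file state element. */
struct file_entry {
    uint32_t dev;               /* device number */
    uint32_t ino;               /* inode number */
    struct file_entry *next;    /* next entry in the same bucket */
};

static struct file_entry *file_table[FILE_HASH_SIZE];

/* Assumed mixing function: any function of (dev, ino) masked to 10 bits works. */
static unsigned int file_hashval(uint32_t dev, uint32_t ino)
{
    return (ino ^ (dev * 31u)) & (FILE_HASH_SIZE - 1u);
}

/* Push an entry onto the front of its bucket's chain. */
static void insert_file(struct file_entry *fe)
{
    unsigned int b = file_hashval(fe->dev, fe->ino);

    fe->next = file_table[b];
    file_table[b] = fe;
}

/* Walk the bucket's chain; returns NULL if the file has no state. */
static struct file_entry *find_file(uint32_t dev, uint32_t ino)
{
    struct file_entry *fe;

    for (fe = file_table[file_hashval(dev, ino)]; fe != NULL; fe = fe->next)
        if (fe->dev == dev && fe->ino == ino)
            return fe;
    return NULL;
}
```

Because the bucket index always fits in 10 bits, it can be stored directly in a stateid, which is what makes the stateid usable as a locator into this table.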
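
The l_stateid layout and the order of checks described above can be sketched as follows. The field widths (32/12/10/10) are taken from the text; placing the "first" field in the most significant bits of a uint64_t is an assumption, as are all function and enum names. The file lookup itself is reduced to two boolean inputs so that the sketch stays self-contained.

```c
#include <stdint.h>

/* Pack the four fields; values wider than their field are truncated. */
static uint64_t stateid_pack(uint32_t boot_time, uint32_t generation,
                             uint32_t f_seq, uint32_t bucket)
{
    return ((uint64_t)boot_time << 32)
         | ((uint64_t)(generation & 0xFFFu) << 20)   /* 12 bits */
         | ((uint64_t)(f_seq & 0x3FFu) << 10)        /* 10 bits */
         | (uint64_t)(bucket & 0x3FFu);              /* 10 bits */
}

static uint32_t stateid_boot_time(uint64_t sid)  { return (uint32_t)(sid >> 32); }
static uint32_t stateid_generation(uint64_t sid) { return (uint32_t)(sid >> 20) & 0xFFFu; }
static uint32_t stateid_f_seq(uint64_t sid)      { return (uint32_t)(sid >> 10) & 0x3FFu; }
static uint32_t stateid_bucket(uint64_t sid)     { return (uint32_t)sid & 0x3FFu; }

enum sid_status { SID_OK, SID_STALE, SID_OLD, SID_BAD };

/*
 * Classify a stateid in the order the text gives:
 *   boot time mismatch          -> stale (NFS4ERR_STALE_STATEID)
 *   file not found              -> bad   (NFS4ERR_BAD_STATEID)
 *   no lock with that generation -> old  (NFS4ERR_OLD_STATEID)
 * file_found and generation_matches stand in for the hash-table lookup
 * (by bucket and f_seq) and the walk of the file's lock list.
 */
static enum sid_status stateid_classify(uint64_t sid, uint32_t server_boot_time,
                                        int file_found, int generation_matches)
{
    if (stateid_boot_time(sid) != server_boot_time)
        return SID_STALE;
    if (!file_found)
        return SID_BAD;
    if (!generation_matches)
        return SID_OLD;
    return SID_OK;
}
```

The widths add up to exactly 64 bits, and the 10-bit bucket field matches the 1024-entry hash table, so a stateid presented after a reboot can be rejected without touching the table at all.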

Eventually, the d_stateid will be constructed in a similar manner to the l_stateid. Currently, the d_stateid does not contain the server boot time, but only contains the nfs4_files[ ] hash bucket and a uniquifier.
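
The per-lock-owner sequence number mentioned in the Server section is what gives 'at most once' semantics. The following sketch shows the general technique and is not the release's code: the server remembers the last sequence number it executed for a lock owner, executes a request only if it carries the next number, and answers a retransmission from a cached result. All names are hypothetical, and how the initial sequence number is negotiated is left out.

```c
#include <stdint.h>

/* Outcome of comparing a request's sequence number with the stored one. */
enum seq_result {
    SEQ_NEW,      /* next in order: execute the request */
    SEQ_REPLAY,   /* retransmission: return the cached result, do not re-execute */
    SEQ_BAD       /* anything else: reject the request */
};

/* Hypothetical per-lock-owner bookkeeping. */
struct lock_owner_seq {
    uint32_t last_seqid;    /* sequence number of the last executed request */
    int      last_status;   /* cached status of that request */
};

static enum seq_result check_seqid(const struct lock_owner_seq *lo, uint32_t seqid)
{
    if (seqid == lo->last_seqid + 1u)   /* unsigned arithmetic wraps naturally */
        return SEQ_NEW;
    if (seqid == lo->last_seqid)
        return SEQ_REPLAY;
    return SEQ_BAD;
}

/* Record the result of a newly executed request so a replay can reuse it. */
static void record_result(struct lock_owner_seq *lo, uint32_t seqid, int status)
{
    lo->last_seqid = seqid;
    lo->last_status = status;
}
```

A caller would run check_seqid first, perform the locking operation only on SEQ_NEW, and then call record_result, so that a duplicated request can never apply the same lock operation twice.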

Delegation Support

Support for NFSv4 delegations is underway in our implementation, although the current level of delegation support is quite minimal. By default, the delegation subsystem is disabled, but it can be enabled by changing the values of several #defines in the files fs/nfs/nfs4deleg.c (client-side) and fs/nfsd/nfsd4deleg.c (server-side). These two files also contain most of the delegation code, with the exception of the callback XDR, which can be found in the files fs/nfs/nfs4cbxdr.c (client-side) and fs/nfsd/nfsd4cbxdr.c (server-side).

Assuming that delegation is enabled, the client spawns several threads to handle callback RPCs when the nfs4 module is loaded. These threads currently listen on the hardcoded port 23137 (arbitrarily chosen). When a client performs SETCLIENTID, the server does a CB_NULL to probe for the existence of a callback path to the client. If this call succeeds, the client is considered a candidate for receiving future delegations; if it fails, the client will never receive them. The server-side logic for issuing delegations is currently fairly simple: the server always issues a delegation unless doing so would conflict with other clients who have OPENed the file (such as issuing a read delegation when a different client has opened the file for writing, or issuing a write delegation when a different client has opened the file read-only). When a delegation is issued, the server allocates a structure (struct nfs4_delegation) to store state for the delegation; the structure is inserted into a linked list of per-file delegations, and inserted into a hash table of all delegations (keyed by stateid, since the client references delegations by stateid during DELEGRETURN). When the client receives a delegation, the client allocates a structure (struct nfs4_deleg_inode) to store state for the delegation; the structure is linked off the file's in-memory inode, and inserted into a hash table (keyed by filehandle, since the server references delegations by filehandle during callbacks).

Once a delegation has been issued, it can be recalled at a later time. The server issues a recall when it receives an OPEN from a different client that conflicts with the delegation. The thread which handles the OPEN call performs CB_RECALL operations on all conflicting delegations, and goes to sleep waiting for the delegations to be returned. When a client receives the CB_RECALL, it responds to the callback immediately (as mandated by the NFSv4 spec), and dispatches an asynchronous RPC task to perform the actual DELEGRETURN. When the server receives the DELEGRETURN, it deletes the delegation state and wakes up the sleeping thread, which can now finish processing the OPEN call.

This basic mechanism for delegation exchange between client and server, as described in the preceding two paragraphs, is fully implemented at present. However, no further delegation support has been implemented yet; the client does not retain a persistent cache in the presence of a delegation, perform OPEN operations locally in the presence of a delegation, perform LOCK operations locally in the presence of a delegation, etc. In other words, our implementation of delegation is in a half-finished state; the client treats the delegation as a piece of state which must be remembered, but does not attach any meaning to the delegation itself once it has been acquired.
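
The server's issuing rule described above can be sketched as a conflict test over the file's existing opens. This is an illustration only: the types, constants, and function name are hypothetical, and the rule is the conservative reading of the text, in which a read delegation conflicts with another client's open for writing and a write delegation conflicts with any open by another client.

```c
#include <stdint.h>

enum deleg_type { DELEG_READ, DELEG_WRITE };

#define OPEN_ACCESS_READ  1u
#define OPEN_ACCESS_WRITE 2u

/* Hypothetical record of one OPEN held on the file. */
struct open_rec {
    uint32_t clientid;   /* client holding the open */
    unsigned access;     /* OPEN_ACCESS_* bits */
};

/*
 * Returns 1 if granting `want` to `clientid` would conflict with an open
 * held by a different client, 0 if the delegation may be issued.
 */
static int deleg_conflicts(enum deleg_type want, uint32_t clientid,
                           const struct open_rec *opens, int nopens)
{
    int i;

    for (i = 0; i < nopens; i++) {
        if (opens[i].clientid == clientid)
            continue;                     /* a client never conflicts with itself */
        if (want == DELEG_WRITE)
            return 1;                     /* any open by another client conflicts */
        if (opens[i].access & OPEN_ACCESS_WRITE)
            return 1;                     /* read delegation vs. another writer */
    }
    return 0;
}
```

With two clients holding the file open read-only, a read delegation can be issued to either of them, but a write delegation cannot.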

Enabling Delegations

To enable delegations, change the values of

#define ENABLE_CALLBACKS 0

#define ACCEPT_READ_DELEGATIONS 0

in fs/nfs/nfs4deleg.c (client-side) from 0 to 1 and

#define TEST_CALLBACKS 0

#define ISSUE_READ_DELEGATIONS 0

in fs/nfsd/nfsd4deleg.c (server-side) from 0 to 1.

The client and server will then exchange delegations. As remarked above, this will have no effect on performance, since the client doesn't use the delegations for anything, but the basic mechanism of delegation exchange between client and server can be inspected.

IMPORTANT: If delegations are enabled on the client, several threads (all named nfs4_cb) will be spawned when the nfs module is loaded. Before the module can be unloaded, these threads must be killed, e.g. using 'killall -9 nfs4_cb'. Until these threads are killed, 'rmmod nfs' will fail. Future releases will kill the callback threads automatically when the module is unloaded.