NFS Version 4 Open Source Project Code Release Linux Documentation
In this code release we deliver a client and server that implement a large portion of the required pieces of the NFSv4 protocol. This release is based on the merged Linux 2.2.14 NFSv3 patches produced by Dave Higgins of VA Linux Systems, specifically dhiggen_merge-1.1, which consists of client work by Trond Myklebust, server work by Neil Brown, and NLM code by SGI. Many bug fixes and improvements have been made to the NFSv3 code since this merge; we have not tracked them, concentrating instead on producing a base NFSv4 implementation. In most cases, we use a #define to mark our changes to the NFSv3 base. Functionality is our goal, not performance, so we sometimes take the easier route of extra copies or ultimately unnecessary kmalloc calls. In many cases, our client still 'thinks' NFSv3 and does not take full advantage of the compound RPC.
The NFSv4 specification is large (257 pages). The order of our development schedule has been determined in part by the NFSv4 bakeoffs: we develop the features necessary to interoperate with other NFSv4 implementations.
We use the nine Basic Connectathon Tests as regression tests, with one caveat: O_TRUNC currently does not work, so in the fifth basic test we perform only one write test instead of the default ten.
Access Control Lists
ACLs are not implemented.
File System Migration and Replication
We have yet to implement the fs_locations attribute.
Pseudo File System
Our server exports a pseudo file system. See Mounting under the Operations section.
Client Caching and Delegation
We have yet to implement a client side persistent cache. See the Delegation Support section.
RPC and Security Flavors
We have a rudimentary RPCSEC_GSS kernel RPC implementation, with a hardcoded security tuple of <Kerberos v5, default QOP, RPCSEC_GSS_SVC_NONE>.
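To make the hardcoded tuple concrete, here is a minimal C sketch of a <mechanism, QOP, service> check. The service values follow RFC 2203. 1.2.840.113554.1.2.2 is the Kerberos v5 GSS-API mechanism OID, and QOP 0 denotes the mechanism default. The struct, constant, and function names are hypothetical and are not taken from this release's RPC code.

```c
#include <assert.h>
#include <string.h>

/* Service values from RFC 2203 (rpc_gss_service_t). */
enum rpc_gss_svc {
    RPC_GSS_SVC_NONE      = 1,
    RPC_GSS_SVC_INTEGRITY = 2,
    RPC_GSS_SVC_PRIVACY   = 3
};

/* Hypothetical representation of a <mechanism, QOP, service> tuple. */
struct sec_tuple {
    const char      *mech_oid;   /* GSS-API mechanism OID, dotted form */
    unsigned int     qop;        /* quality of protection; 0 = mechanism default */
    enum rpc_gss_svc service;
};

/* The single tuple accepted: Kerberos v5, default QOP, no integrity/privacy. */
static const struct sec_tuple hardcoded_tuple = {
    "1.2.840.113554.1.2.2", 0, RPC_GSS_SVC_NONE
};

/* Returns nonzero only if the requested tuple matches the hardcoded one. */
static int tuple_is_supported(const struct sec_tuple *t)
{
    assert(t != NULL && t->mech_oid != NULL);
    return strcmp(t->mech_oid, hardcoded_tuple.mech_oid) == 0 &&
           t->qop == hardcoded_tuple.qop &&
           t->service == hardcoded_tuple.service;
}
```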
File Locking and Share Reservations
We have implemented share reservations. See the State section for details. Byte range locking is partially complete.
The Linux VFS uses ASCII, while NFSv4 uses UTF8 on the wire. We have a set of UTF8 functions, which currently assume ASCII and can be updated to full UTF8 functionality.
The Linux VFS dcache maps file names to dcache entries. This means that
the NFSv4 server does not export persistent filehandles; rather, the filehandles
are volatile on rename and migration. We have not addressed this, and the server
currently claims to export persistent filehandles.
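ASCII bytes (0x00 to 0x7F) are encoded identically in UTF8, so an ASCII-only conversion can simply copy names. The hypothetical function below sketches that approach; it is not the release's actual UTF8 routine. It rejects any byte at or above 0x80, and that is exactly where a full UTF8 version would have to validate or transcode multi-byte sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ASCII-only stand-in for a VFS-name -> UTF8 conversion.
 * Copies the name byte for byte, because ASCII is a subset of UTF8.
 * Returns the number of bytes written, or -1 if the output buffer is
 * too small or the name contains a non-ASCII byte. */
static long vfs_name_to_utf8(const char *name, size_t len, char *out, size_t outlen)
{
    size_t i;

    assert(name != NULL && out != NULL);
    if (len > outlen)
        return -1;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (c >= 0x80)
            return -1;          /* full UTF8 support would handle this case */
        out[i] = (char)c;
    }
    return (long)len;
}
```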
RPC procedure 0, the NULL procedure, works and is used for RPCSEC_GSS. Here is an overview of the functionality of each operation of procedure 1, the COMPOUND procedure. This is a high-level overview; the details are in the code.
ACCESS - functional
CLOSE - functional
COMMIT - not functional
CREATE - functional
DELEGPURGE - functional - see delegation section
DELEGRETURN - functional - see delegation section
GETATTR - functional, not all attributes coded
GETFH - functional
LOCK - in development
LOCKT - in development
LOCKU - in development
LOOKUP - functional
LOOKUPP - functional
NVERIFY - not functional
OPEN - partially functional
OPENATTR - not functional
OPEN_CONFIRM - functional
OPEN_DOWNGRADE - not functional
PUTFH - functional
PUTPUBFH - not functional
PUTROOTFH - functional
READ - functional
READDIR - functional
READLINK - functional
REMOVE - functional
RENEW - partially functional
RESTOREFH - functional
SAVEFH - functional
SECINFO - functional
SETATTR - functional
SETCLIENTID - functional
SETCLIENTID_CONFIRM - functional
VERIFY - not functional
WRITE - functional
MOUNTD and LOCKD are no longer needed as separate daemons, because their functionality is subsumed into the protocol. Both the client and server need to run GSSD, which serves two functions: communicating with the underlying security services (namely Kerberos V5 for this port) and performing various name translations.
The server has a different view of the namespace it exports than previous versions of NFS. The exported sub-trees specified in /etc/exports are joined together by a read-only virtual file system called the Pseudo file system. Clients mount the root of the server, which returns the Pseudo file system root. Clients then browse the Pseudo file system and are challenged for access when they cross from the Pseudo file system into the real file systems mounted on the server. Note that any client is able to mount the Pseudo file system root; it is the user's credentials that allow access into the exported pieces of the server's native file system. Thus, the client list portion of an /etc/exports stanza is ignored by the NFSv4 server. Since clients always mount the root of the server, a client's /etc/fstab no longer needs to track changes in the server's /etc/exports.
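The C sketch below shows how the Pseudo file system joins exports by classifying a server path against a list of exported sub-trees. Directories that are proper ancestors of an export are pseudo directories. Paths at or below an export are in a real exported file system, where the user's credentials are checked. Everything else is invisible. The enum and function names are hypothetical, and paths are assumed to be absolute and normalized with no trailing slash.

```c
#include <assert.h>
#include <string.h>

/* How a server path looks to a client browsing the namespace (hypothetical names). */
enum ns_kind {
    NS_HIDDEN,    /* not reachable through any export */
    NS_PSEUDO,    /* read-only pseudo file system directory joining exports */
    NS_EXPORTED   /* inside a real exported subtree: user credentials are checked */
};

/* Nonzero if `dir` is a proper ancestor directory of `path` (both absolute). */
static int is_proper_ancestor(const char *dir, const char *path)
{
    size_t n = strlen(dir);

    if (strcmp(dir, "/") == 0)
        return strcmp(path, "/") != 0;
    /* strncmp succeeding guarantees path has at least n chars, so path[n] is safe */
    return strncmp(dir, path, n) == 0 && path[n] == '/';
}

static enum ns_kind classify(const char *path, const char *const exports[], size_t nexports)
{
    enum ns_kind kind = NS_HIDDEN;
    size_t i;

    assert(path != NULL && path[0] == '/');
    for (i = 0; i < nexports; i++) {
        if (strcmp(path, exports[i]) == 0 || is_proper_ancestor(exports[i], path))
            return NS_EXPORTED;          /* crossing into a real file system */
        if (is_proper_ancestor(path, exports[i]))
            kind = NS_PSEUDO;            /* on the way to some export */
    }
    return kind;
}
```

With exports /export/nfstest and /export/nfstest1, a client sees / and /export as pseudo directories. /etc is not visible at all.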
Since MOUNTD is no longer needed, mount/umount have been modified, and the new binaries need to be used.
In this code release, the Pseudo file system is exported as security flavor AUTH_UNIX. In future releases, as per the protocol specification, a minimum RPCSEC_GSS security tuple will be required.
We have implemented some security features based on the default <Kerberos V5, default QOP, RPCSEC_GSS_SVC_NONE> RPCSEC_GSS security tuple. Kerberos V5 requires the following to be installed on both the client and the server machines:
Currently, to export a sub-tree with Kerberos v5 security, we use the
following HACK! Since the client list is not used, specifying a client
of 10.10.10.10 indicates that the exported subtree uses the hardcoded
default security tuple. We are hard at work on exportfs to
implement the scheme that Sun Microsystems will use to describe the
security on an exported file system, and will update the release as soon as possible.
In this /etc/exports example, the first subtree uses RPCSEC_GSS and the second uses AUTH_UNIX:
    /export/nfstest     10.10.10.10(rw)
    /export/nfstest1    18.104.22.168(rw)
/etc/fstab
Without proper Kerberos v5 credentials on the client, any access to the RPCSEC_GSS-protected export will be denied.
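The C sketch below shows how the 10.10.10.10 convention could select a security flavor. It assumes a parser has already split an /etc/exports line into its path and client fields. The enum and function names are hypothetical and do not come from exportfs or the server.

```c
#include <assert.h>
#include <string.h>

enum export_flavor { FLAVOR_AUTH_UNIX, FLAVOR_RPCSEC_GSS };

/* Map the client field of an exports entry, e.g. "10.10.10.10(rw)", to a flavor. */
static enum export_flavor flavor_for_client_field(const char *field)
{
    static const char magic[] = "10.10.10.10";
    size_t host_len;

    assert(field != NULL);
    host_len = strcspn(field, "(");            /* host part ends at the option list */
    if (host_len == sizeof magic - 1 && strncmp(field, magic, host_len) == 0)
        return FLAVOR_RPCSEC_GSS;              /* hardcoded <krb5, default QOP, svc none> */
    return FLAVOR_AUTH_UNIX;                   /* any other "client" means AUTH_UNIX */
}
```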
One of the big differences between NFSv4 and previous versions of NFS is that the protocol has become stateful. Security, compound RPC, DOS-style share locking, byte-range file locking, and delegation all require that the NFSv4 client and server maintain information about past NFSv4 events. Locking and delegation state is negotiated before the NFSv4 OPEN call and initialized in it.
In the compound RPC, the result of one operation is often the input to the next, so the NFSv4 server needs to keep state between operations. It turns out that the only result that needs to be passed to the next operation is one or two filehandles: the current filehandle and, sometimes, the saved filehandle. Our NFSv4 server stores these as per-thread globals.
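Here is a minimal C sketch of the per-compound filehandle state described above. Unlike the release, it passes the state explicitly in a struct rather than using per-thread globals. The type, function, and status names are hypothetical. The status codes only mirror NFS4_OK, NFS4ERR_NOFILEHANDLE, and NFS4ERR_RESTOREFH and do not carry their wire values. The 128-byte limit is NFS4_FHSIZE from the NFSv4 specification.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_FHSIZE 128   /* NFSv4 filehandles are at most 128 bytes (NFS4_FHSIZE) */

struct fh {
    unsigned int  len;
    unsigned char data[SKETCH_FHSIZE];
};

/* The only state carried from one operation to the next in a compound. */
struct compound_state {
    struct fh current;   /* set by PUTFH, PUTROOTFH, LOOKUP, ... */
    struct fh saved;     /* set by SAVEFH, consumed by RESTOREFH */
    int have_current;
    int have_saved;
};

/* Sketch-local codes mirroring NFS4_OK, NFS4ERR_NOFILEHANDLE, NFS4ERR_RESTOREFH. */
enum sketch_status { S_OK, S_NOFILEHANDLE, S_RESTOREFH };

static enum sketch_status op_putfh(struct compound_state *cs, const struct fh *arg)
{
    assert(arg->len <= SKETCH_FHSIZE);
    cs->current = *arg;
    cs->have_current = 1;
    return S_OK;
}

static enum sketch_status op_savefh(struct compound_state *cs)
{
    if (!cs->have_current)
        return S_NOFILEHANDLE;
    cs->saved = cs->current;
    cs->have_saved = 1;
    return S_OK;
}

static enum sketch_status op_restorefh(struct compound_state *cs)
{
    if (!cs->have_saved)
        return S_RESTOREFH;
    cs->current = cs->saved;
    cs->have_current = 1;
    return S_OK;
}

/* Operations that need a current filehandle (READ, GETATTR, ...) check it first. */
static enum sketch_status op_getfh(const struct compound_state *cs, struct fh *out)
{
    if (!cs->have_current)
        return S_NOFILEHANDLE;
    memcpy(out, &cs->current, sizeof *out);
    return S_OK;
}
```

Passing the state explicitly makes it clear that nothing besides these two filehandles survives between operations. Per-thread globals achieve the same effect without threading a pointer through every operation handler.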
Locking and Delegation
The bookkeeping involved in implementing lease-based share and byte-range locks and delegation is described below. There are three levels of activity imposed by the NFSv4 protocol: per client, per lock owner, and per file. Both the NFSv4 client and server maintain state corresponding to these activity levels. Note that delegation means that, on delegated files, the client needs to do all the bookkeeping that the server does, and therefore in general maintains the same state. This document describes only the server-side state.
Per-lock-owner state consists of the lock owner identity as well as the bookkeeping needed to negotiate and maintain the lock sequence number used to provide 'at most once' lock semantics.
Per-file state is kept in a hash table of nfs4_file_element structures
hashed by the ext2 inode number and device number. Currently, there are
1024 hash buckets. The nfs4_file structure keeps general information about
the file, such as the pointer to the VFS file structure, the filehandle, etc.
Also kept in the file structure are linked lists of the nfs4_locks and
nfs4_delegations associated with the file. Of interest here is the
construction of the nfs4_lock stateid (l_stateid) and the nfs4_delegation
stateid (d_stateid). Both stateids act as handles to server-side state: they
are opaque to the client, and they are constructed so as to uniquely locate
a lock or delegation in both space (the hash table) and time (across server
reboots).
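Here is a sketch of choosing one of the 1024 buckets from the inode and device numbers. The mixing function is hypothetical, because the release's actual hash is not described here. The point is that 1024 = 2^10, so masking with 1023 yields an index that fits exactly in the 10-bit hash bucket field of the l_stateid.

```c
#include <assert.h>
#include <stdint.h>

#define NFS4_FILE_HASH_BITS 10
#define NFS4_FILE_HASH_SIZE (1u << NFS4_FILE_HASH_BITS)    /* 1024 buckets */

/* Hypothetical bucket function: mixes inode and device numbers, then keeps
 * the low 10 bits, so every bucket index fits the 10-bit l_stateid field. */
static unsigned int nfs4_file_hashval(uint32_t ino, uint32_t dev)
{
    uint32_t h = ino ^ (dev * 0x9e3779b1u);   /* multiplicative mix of dev */
    h ^= h >> 16;                              /* fold high bits into low bits */
    return h & (NFS4_FILE_HASH_SIZE - 1);
}
```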
l_stateid is 64 bits long. The first 32 bits are the server boot time in seconds, followed by 12 bits of the nfs4_lock l_generation field, 10 bits of the nfs4_file f_seq field, and 10 bits of the hash bucket index. The nfs4_lock.l_generation and nfs4_file.f_seq values are set by incrementing nfs4_file.f_lock4_seq and nfs4_file_element.f_uniq_seq respectively, which provide unique values across the nfs4_lock and nfs4_file linked lists. Given an l_stateid, the NFSv4 server uses the server boot time to determine whether the stateid is stale (NFS4ERR_STALE_STATEID). If the stateid is not stale, the combination of the hash bucket number and the f_seq stored in the l_stateid gives the unique location of the nfs4_file that the stateid refers to. If the nfs4_file is not found, the l_stateid is bad (NFS4ERR_BAD_STATEID). If the nfs4_file is found, but the l_generation stored in the l_stateid does not match the l_generation of any existing nfs4_lock on the nfs4_file's linked list, the l_stateid is old (NFS4ERR_OLD_STATEID).
Eventually, the d_stateid will be constructed in a similar manner to
the l_stateid. Currently, the d_stateid does not contain the server boot
time; it contains only the nfs4_files[ ] hash bucket and a uniquifier.
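The l_stateid layout and the three-way validity check can be sketched in C. The sketch assumes the fields are packed most significant first in the order listed (boot time, l_generation, f_seq, bucket) and that each value already fits its field. lock_rec and file_rec are hypothetical stand-ins for nfs4_lock and nfs4_file, trimmed to the fields the check uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (most significant bits first):
 *   bits 63..32  server boot time in seconds
 *   bits 31..20  nfs4_lock l_generation (12 bits)
 *   bits 19..10  nfs4_file f_seq        (10 bits)
 *   bits  9..0   hash bucket index      (10 bits) */
#define SID_BOOT(s)   ((uint32_t)((s) >> 32))
#define SID_GEN(s)    ((uint32_t)(((s) >> 20) & 0xfffu))
#define SID_FSEQ(s)   ((uint32_t)(((s) >> 10) & 0x3ffu))
#define SID_BUCKET(s) ((uint32_t)((s) & 0x3ffu))

static uint64_t make_l_stateid(uint32_t boot, uint32_t gen, uint32_t fseq, uint32_t bucket)
{
    assert(gen < (1u << 12) && fseq < (1u << 10) && bucket < (1u << 10));
    return ((uint64_t)boot << 32) | ((uint64_t)gen << 20) |
           ((uint64_t)fseq << 10) | (uint64_t)bucket;
}

/* Simplified stand-ins for nfs4_lock / nfs4_file. */
struct lock_rec { uint32_t l_generation; struct lock_rec *next; };
struct file_rec { uint32_t f_seq; struct lock_rec *locks; struct file_rec *next; };

static struct file_rec *file_table[1024];     /* bucket index -> chain of files */

enum stateid_status {
    STATEID_OK,
    STATEID_STALE,   /* NFS4ERR_STALE_STATEID: issued before the last reboot */
    STATEID_BAD,     /* NFS4ERR_BAD_STATEID: no such nfs4_file */
    STATEID_OLD      /* NFS4ERR_OLD_STATEID: file found, lock generation gone */
};

static enum stateid_status check_l_stateid(uint64_t sid, uint32_t server_boot)
{
    struct file_rec *f;
    struct lock_rec *l;

    if (SID_BOOT(sid) != server_boot)
        return STATEID_STALE;
    for (f = file_table[SID_BUCKET(sid)]; f != NULL; f = f->next)
        if (f->f_seq == SID_FSEQ(sid))
            break;
    if (f == NULL)
        return STATEID_BAD;
    for (l = f->locks; l != NULL; l = l->next)
        if (l->l_generation == SID_GEN(sid))
            return STATEID_OK;
    return STATEID_OLD;
}
```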
Support for NFSv4 delegations is underway in our implementation, although the current level of delegation support is quite minimal. By default, the delegation subsystem is disabled, but it can be enabled by changing the values of several #defines in the files fs/nfs/nfs4deleg.c (client-side) and fs/nfsd/nfsd4deleg.c (server-side). These two files also contain most of the delegation code. The exception is the callback XDR, which can be found in fs/nfs/nfs4cbxdr.c (client-side) and fs/nfsd/nfsd4cbxdr.c (server-side).
Assuming that delegation is enabled, when the nfs4 module is loaded the client spawns several threads to handle callback RPCs. These threads currently listen on the hardcoded (arbitrarily chosen) port 23137. When a client performs SETCLIENTID, the server does a CB_NULL to probe for the existence of a callback path to the client. If this call succeeds, the client is considered a candidate for receiving future delegations; if it fails, the client will never receive them. The server-side logic for issuing delegations is currently fairly simple: the server always issues a delegation if there are no conflicts with other clients that have OPENed the file. Examples of conflicts are issuing a read delegation when a different client has opened the file for writing, or issuing a write delegation when a different client has opened the file read-only. When a delegation is issued, the server allocates a structure (struct nfs4_delegation) to store state for the delegation. The structure is inserted into a linked list of per-file delegations and into a hash table of all delegations, keyed by stateid, since the client references delegations by stateid during DELEGRETURN. When the client receives a delegation, it allocates a structure (struct nfs4_deleg_inode) to store state for the delegation. The structure is linked off the file's in-memory inode and inserted into a hash table keyed by filehandle, since the server references delegations by filehandle during callbacks.
Once a delegation has been issued, it can be recalled later. The server issues a recall when it receives an OPEN from a different client that conflicts with the delegation. The thread handling the OPEN call performs CB_RECALL operations on all conflicting delegations and goes to sleep waiting for the delegations to be returned. When a client receives the CB_RECALL, it responds to the callback immediately (as mandated by the NFSv4 spec) and dispatches an asynchronous RPC task to perform the actual DELEGRETURN. When the server receives the DELEGRETURN, it deletes the delegation state and wakes up the sleeping thread, which can then finish processing the OPEN call.
This basic mechanism for delegation exchange between client and server, as described in the preceding two paragraphs, is fully implemented at present. However, no further delegation support has been implemented yet. The client does not retain a persistent cache, perform OPEN operations locally, or perform LOCK operations locally in the presence of a delegation, and so on. In other words, our implementation of delegation is half-finished: the client treats the delegation as a piece of state that must be remembered, but does not attach any meaning to the delegation itself once it has been acquired.
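Here is a minimal sketch of the conflict rule for issuing a delegation. It assumes the requested delegation type follows the requested share access, so an OPEN for writing asks for a write delegation. It also treats a client's own OPENs as non-conflicting. All names are hypothetical and are not the release's server code.

```c
#include <assert.h>
#include <stddef.h>

enum { SHARE_READ = 1, SHARE_WRITE = 2 };      /* share access bits */
enum deleg_type { DELEG_NONE, DELEG_READ, DELEG_WRITE };

struct open_rec {
    int clientid;   /* which client holds this OPEN */
    int access;     /* SHARE_READ and/or SHARE_WRITE */
};

/* Decide which delegation (if any) to give `client` on an OPEN with `want`
 * access, given the OPENs already held on the same file. */
static enum deleg_type choose_delegation(int client, int want,
                                         const struct open_rec *opens, size_t nopens)
{
    enum deleg_type d = (want & SHARE_WRITE) ? DELEG_WRITE : DELEG_READ;
    size_t i;

    for (i = 0; i < nopens; i++) {
        if (opens[i].clientid == client)
            continue;                       /* assumption: own OPENs never conflict */
        if (d == DELEG_WRITE)
            return DELEG_NONE;              /* any other opener conflicts with a write delegation */
        if (opens[i].access & SHARE_WRITE)
            return DELEG_NONE;              /* another writer conflicts with a read delegation */
    }
    return d;
}
```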
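The recall sequence can be modeled in a single thread. start_recall() stands in for the OPEN thread issuing CB_RECALLs, and the may_proceed flag stands in for waking that thread once the last DELEGRETURN arrives. This is a hypothetical sketch that treats every delegation held by another client as conflicting. The real server puts the OPEN thread to sleep rather than setting a flag.

```c
#include <assert.h>
#include <stddef.h>

struct deleg {
    int clientid;   /* client holding the delegation */
    int recalled;   /* CB_RECALL has been sent */
    int returned;   /* DELEGRETURN has been received; state deleted */
};

struct pending_open {
    int waiting_for;   /* recalled delegations not yet returned */
    int may_proceed;   /* stands in for waking the sleeping OPEN thread */
};

/* Placeholder for sending the CB_RECALL callback RPC. */
static void send_cb_recall(struct deleg *d) { d->recalled = 1; }

/* A conflicting OPEN arrives from `opener`: recall other clients' delegations. */
static void start_recall(struct pending_open *po, struct deleg *d, size_t n, int opener)
{
    size_t i;

    assert(po != NULL && (d != NULL || n == 0));
    po->waiting_for = 0;
    for (i = 0; i < n; i++) {
        if (d[i].clientid != opener && !d[i].returned) {
            send_cb_recall(&d[i]);
            po->waiting_for++;
        }
    }
    po->may_proceed = (po->waiting_for == 0);
}

/* A DELEGRETURN arrives: delete the state, wake the OPEN after the last one. */
static void handle_delegreturn(struct pending_open *po, struct deleg *d)
{
    if (d->returned)
        return;
    d->returned = 1;
    if (d->recalled && --po->waiting_for == 0)
        po->may_proceed = 1;
}
```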
To enable delegations, change the values of
#define ENABLE_CALLBACKS 0
#define ACCEPT_READ_DELEGATIONS 0
in fs/nfs/nfs4deleg.c (client-side) from 0 to 1 and
#define TEST_CALLBACKS 0
#define ISSUE_READ_DELEGATIONS 0
in fs/nfsd/nfsd4deleg.c (server-side) from 0 to 1.
The client and server will then exchange delegations. As remarked above, this will have no effect on performance, since the client doesn't use the delegations for anything, but the basic mechanism of delegation exchange between client and server can be inspected.
IMPORTANT: If delegations are enabled on the client, several threads (all named nfs4_cb) will be spawned when the nfs module is loaded. Before the module can be unloaded, these threads must be killed, e.g. using 'killall -9 nfs4_cb'. Until these threads are killed, 'rmmod nfs' will fail. Future releases will kill the callback threads automatically when the module is unloaded.