A loose glossary of NTAP's parts.
_____________________________________
.
initial draft 09/06/2005. dmr.
_____________________________________
.
portal: The portal host is basically the hub of NTAP tests. It runs the PHP that provides the entire web GUI and all of the profile-based user/test configuration functionality. The portal's webserver, in addition to running mod_php, runs several "grid" modules that perform authentication and credential-translation duties in order to secure the portal's online interface.
..
perhaps it's easier to break this into parts right now:
.
grid modules: When a user wants to connect to the portal (at Umich, at least), they first do the logical equivalent of running the `kinit' program to acquire Kerberos credentials, and thereafter run `kx509' to get shorter-term X509 credentials. These kX509 creds are used by the PKCS11 plugin in the user's browser to authenticate the client to the portal machine. On the portal, several Apache modules assist in the credential dance: `grid-proxy-init', `mod_kct' ("kerberized credential translator"), `mod_kx509' (which creates a second set of proxy credentials for the user on the portal; these are the ones actually used when issuing remote commands), and finally `mod_tts' (which assists in acquiring renewable credentials for long-running and/or repetitive jobs). These modules currently (mid-2005) exist for Apache 1.3+ and are being ported to Apache 2.
.
The LDAP directory: Also on the portal is NTAP's LDAP directory, which is used for a few different things. First, there is a directory that stores information about all of the PMPs and associated routers in your institution -- all network interfaces are specified with VLAN information (we use this when planning which PMPs to use as waypoints along the overall testpath).
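As a toy illustration of the waypoint-planning lookup this directory enables (the flattened record layout below is invented for the example -- the real, annotated LDAP schema is what you'd consult):

```python
# Hypothetical flattened view of the PMP/router interface records;
# the real LDAP schema differs -- this only shows the idea.
PMP_INTERFACES = [
    {"pmp": "pmp-a", "addr": "10.1.1.5", "vlan": 101},
    {"pmp": "pmp-a", "addr": "10.2.2.5", "vlan": 202},
    {"pmp": "pmp-b", "addr": "10.2.2.6", "vlan": 202},
]

def pmps_on_vlan(vlan, interfaces=PMP_INTERFACES):
    """Name the PMPs that have an interface (a "presence") on a VLAN."""
    return sorted({i["pmp"] for i in interfaces if i["vlan"] == vlan})
```

Waypoint planning then reduces to queries of this shape: which PMPs sit on the VLANs that the test traffic will cross?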
Next, there is a directory in which all of the PHP's profile-based settings are stored (e.g., a user's scheduled tests, saved option templates for different kinds of tests, etc.). Finally, the output data from, e.g., iperf and owamp are stored in another section of the directory. CITI's annotated schemas, as well as some utilities and documentation, are under `/usr/local/ntap2/ldap/'.
.
testpilot.py: An NTAP test is actually run by the "testpilot", a program that takes as input (at least) two PMP addresses, arguments for the test program(s) that will run between the PMPs during the test, and various options for storing output, determining waypoints of the testpath on the fly, and so on. After the pilot determines the testpath of PMPs, it creates a schedule for running each of the pairwise tests and, using the translated credentials supplied by the portal modules, issues globus-client commands to remotely and securely schedule, execute, and collect output from the tests. Note that the PMPs themselves independently authenticate and authorize the user scheduling the tests. As the pilot is a command-line program, it can be run from a shell without trouble -- the PHP on the portal collects all of the arguments to the pilot, runs it, and passes the pilot's stdout through to the web user. The pilot's usage information is helpful.
.
copilot.py: NTAP now supports deferred and/or repetitive job scheduling (e.g., "run this network test once a week at 4am"). The copilot's arguments consist of time/date scheduling elements ("a period of 2 days", "end at this time on this date", etc.), how many times to retry a failed execution of the pilot (e.g., if the PMPs were in use or down), and then all of the normal arguments that the pilot requires.
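That retry-count argument amounts to a bounded retry loop around pilot invocations; a minimal sketch (the function and its signature are illustrative, not the copilot's actual interface):

```python
import time

def run_with_retries(run_pilot, max_retries, delay=0.0):
    """Invoke run_pilot() until it succeeds or the retries run out.

    run_pilot stands in for one execution of testpilot.py and returns
    the pilot's exit status (0 on success).  Returns the number of
    attempts made on success, or a negative count on giving up.
    """
    attempts = 0
    while True:
        attempts += 1
        status = run_pilot()
        if status == 0:
            return attempts        # success: report how many tries it took
        if attempts > max_retries:
            return -attempts       # give up: PMPs may be in use or down
        time.sleep(delay)          # pause briefly before retrying
```

A deferred reservation would then just be a cron job whose payload is one call like this.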
The copilot gathers any conf files being given to the testpilot and backs them up into a per-reservation directory, along with crontab files that schedule and unschedule the repetitive invocations of the pilot. Since the copilot doesn't run synchronously with any web-based user interaction, it spools all of the pilot's output to a per-reservation file. The copilot is also used to manually unschedule a repetitive or deferred test. Nearly all scheduling-related information can be found in `/usr/local/ntap2/reservations/'. Note, also, that the copilot utilizes the patched Vixie cron facility, `/etc/cron.d/' (upshot: don't delete anything in there that starts with "ntap").
.
guru (daemon): In addition to tests run between pairs of PMPs, a "first-hop" test can also be run -- i.e., between an end-user and an NDT/web100-enabled PMP. Since we can't authenticate the end-user's machine, we use a modified version of the NDT client applet that runs tests between the user's machine and a PMP; a web100-savvy PMP has an instrumented network stack that gathers statistics, which are then analyzed for common errors (e.g., a duplex mismatch) and reported to the user. Our modified applet also sends its (nicely formatted) data back to the PMP, which securely relays the test data to the portal. The "guru" is a daemon that runs on the portal and chiefly does two things: (1) it stores NDT data from users' tests, and (2) it maintains a cache of traceroutes to and from the end-users' machines, which assists in finding a PMP "close" to the client host. The guru program itself is installed as /usr/local/ntap2/webserver/bin/ntapguru.py and has usage information.
.
guru client: The guru client program does a few different things; use `ntaputil --guru' to see its (extensive) usage information.
First, the guru client can be used to send raw commands to the guru -- things like: given a user's DN and IP address, get the most recent NDT results or the traceroute from the portal to the user's machine; or get all user DNs for which we have data. These commands can be given on the command line ("--getuserdns"), or the client has an interactive mode ("--interact") in which raw commands can be sent straight through to the guru. Second, the guru client can be used to run traceroutes to client machines ("--radar") and store them in the guru. There are two modes, one of which is implemented as of mid-2005: "p2h" (runs a traceroute from the portal back to the end-user) and "mesh" (unimplemented; will have one or more PMPs traceroute back to the end-user and relay the data back to the guru on the portal). These traceroutes are then used with "--find" to choose a PMP near an end-user. If none are available or no good choices can be made, all available PMPs (or routers) can be listed with "--list pmps|routers". Lastly, the guru client is used when relaying NDT data from the web100-enabled PMP back to the portal.
.
renewd: In conjunction with `mod_tts' (above), a new Kerberized service called `renewd' handles the acquisition and maintenance of renewable kX509 credentials on the portal. Though my Kerberos creds may expire after a day, I can have them translated into service-specific credentials that will be renewed, e.g., every day for a month. This facilitates long-running jobs. Note that `renewd' is in no way NTAP-specific and is a significant piece of software in its own right.
.
PMPs: Each PMP can be thought of as a Grid resource, which NTAP uses as a secure, remote invocation platform. PMPs run basic Globus gatekeeper software (chiefly for authentication), as well as a Globus-based resource manager known as GARA (for authorization and for actually executing the jobs).
While some Grid setups have one gatekeeper protect multiple Grid resources, each of our PMPs independently authenticates and authorizes its users (via the user's X509 credentials).
.
install bits: Setting up a PMP is a bit complicated, but the instructions enumerate the various steps. Mainly, one installs the PMP RPM, Kerberos, and a Java JDK; contained within the PMP RPM is a(n instrumented) web100 kernel RPM that can be used if a custom kernel + patching isn't the route for you. Much more detailed instructions come later. After installing the RPM, directions are given for configuring the PMP constituents described below. Thereafter, one runs `ntap-postinstall-verify.sh', which finds most PMP setup snags and offers fixes. The bulk of the NTAP CVS repository (which includes our utilities, docs, etc.) is installed in /usr/local/ntap2/ .
.
globus-gatekeeper: The gatekeeper is a daemon launched via xinetd. It's basically a funnel through which nearly all NTAP requests pass; NDT tests are the exception -- they're essentially orthogonal to the Globus infrastructure. The gatekeeper authenticates users in one of two ways: first, it can use the default (decentralized, very-difficult-to-administer) per-Grid-resource flatfile, the "grid-mapfile" (which just contains a mapping from allowed-user-DN to local UID); second, the gatekeeper can instead use a callout to Walden (a centralized LDAP directory of user DNs, user "groups", and other elements). For more information, see globus.org.
.
globus-client: Using both Globus and GARA libraries, the globus-client takes a user's X509 credentials and, along with a lengthy string (an "RSL") describing the various options for the test, contacts each of the PMPs (the globus-client is normally run from the portal itself) and schedules the tests and output-gathering through the remote diffserv managers.
.
Walden: Walden is a per-PMP daemon, written in Java, that is used by the gatekeeper.
The most succinct summary of Walden that I can summon to mind is: "Walden makes Globus-based grid authentication- and authorization-management solutions scale". For a solid description of Walden and examples of its usefulness, please visit: http://www.citi.umich.edu/projects/ntap/docs.html#rawk
.
GARA diffserv manager: The diffserv manager is a daemon launched with an init-style script. Once the gatekeeper has authenticated the user, it hands off the request to the local diffserv manager, which then executes the test(s) and returns the spew.
.
NDT/Web100 ("first-hop") tests: As mentioned above, NTAP supports running tests between an end-user's machine and a web100-savvy PMP. What that actually means is that the PMP must be running: 1) the instrumented web100 kernel (or a suitably patched version), 2) the daemon process that runs the performance tests (`web100srv'), and 3) the lightweight webserver whose sole job is to provide a webpage containing the NDT Java applet (`Tcpbw100', a client for `web100srv'). In return, the user is presented with detailed statistics about network conditions, how the user's hardware and software are configured, and potential problem conditions as determined by several heuristics. Our modified versions of `Tcpbw100' and `web100srv' then utilize the guru client to save these results on the portal. E.g., someone in network support might respond to a user's complaint of poor network conditions by sending them a URL for a web100-enabled PMP; by tacking an encoded form of the user's DN onto the URL's query string, the results are automatically saved back on the portal. The support person can then easily get at the user's specific data and, lo, it's a duplex mismatch or some such. A simple demo shell script called `show-last-NDT-results.sh' brings up a user's most recent test data. Note that the guru client interface is very general and therefore invites higher-level wrappers.
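Since the interface invites wrappers, here's the flavor of one (a sketch; the exact argv layout for `ntaputil --guru' is my assumption from the usage text, not a documented API):

```python
def guru_cmd(*args):
    """Assemble an `ntaputil --guru' command line as an argv list.

    Flags like --getuserdns, --radar, and --find come from the guru
    client's usage text; the argv layout itself is assumed here.
    """
    return ["ntaputil", "--guru"] + [str(a) for a in args]

def fallback_listing():
    """If --find can't pick a nearby PMP, list everything instead."""
    return [guru_cmd("--list", "pmps"), guru_cmd("--list", "routers")]
```

A real wrapper would hand these argv lists to a subprocess and parse the output; the point is just that higher-level policy (e.g., "find a close PMP, else list everything") composes naturally on top of the raw commands.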
For those looking for this software (e.g., among the files installed by our PMP RPM), it isn't located in a great spot: under our `ntap2' top-level directory (normally /usr/local/ntap2), the source code for our modified NDT/Web100 servers and our modified NDT applet lives in `ntap2/webserver/jarsigner' (it really should be called "firsthop" or something).
.
policy routing: Given the PMP/router LDAP directory described above, a given PMP might have addresses on a variety of VLANs. A PMP tries to "proxy" its traffic so that it follows, as closely as possible, the network path taken by a test-invoker's packets: if the PMP finds that the test user is on subnet X, which is VLAN Y, and the PMP has a presence on VLAN Y, it will choose that address. Effectively, we use Linux's `iproute2' code to implement one routing table per virtual interface, instead of one (primary) routing table for the entire host. The utility `ntap-config' is relatively primitive but, given a conf file, can quickly set up all of the required policy routing. However, here at Umich we haven't had a suitable network on which to actually use much of this. More information is available online.
.
Utilities: Several utilities have grown out of the various development needs of the project. They get used all over the place; some are mentioned elsewhere in this document.
.
send-ssh-key: `send-ssh-key' is a shell script that is not at all NTAP-specific -- I use it whenever I am distributing my ssh public key for use with `ssh-agent'. It transmits, installs, and configures an ssh key -- and the user only has to type the password once (it is not cached or stored in any way). Read below about `ntapctl' for how we use it in NTAP. Its usage info is plenty for anyone to get going.
.
ntapctl: The idea behind `ntapctl' is that each PMP runs five daemon processes (the globus-gatekeeper, the diffserv manager, Walden's mgridauthd, and NDT's `web100srv' and `fakewww'), and they need to be restarted periodically.
I use four PMPs for development -- that's a lot of remote ssh commands. So, `ntapctl' is meant to be used in conjunction with `ssh-agent' so it can send commands over ssh but -not- ask for a password (which makes it automatable, too). First, one sets up the agent machinery (e.g., `ssh-keygen -t rsa && ssh-agent $SHELL', followed by `ssh-add'). Then, the ssh public key (e.g., `~/.ssh/id_rsa.pub') needs to be installed on all of the remote machines that you don't want to have to type a password to access. Many people end up with default umasks that botch the file permissions here, at which point many give up on the ssh-agent altogether (because one often needs to crank up the debug spew on sshd itself, which most people can't/won't do). Instead, we use `send-ssh-key' to set everything up and fix all of the various file permissions. Then, having sent our key to each PMP -once-, we need only load our key into the ssh-agent on our local machine, and `ntapctl' can issue passwordless remote commands that are nevertheless secure (more so than if a password were used with ssh, actually). Then, given a tiny prefab list of our PMPs' IP addresses (crack open the script, it's obvious), I can issue the normal init-style commands "start", "stop", "restart", and "status". E.g., running `ntapctl status --citi' makes all four of my normal PMPs report. Quite handy. I intend to expand this so it can be used to securely, remotely upgrade PMP software. Read `ntapctl's usage information from the command line.
.
ntapproc.py: If a parent process creates children that create children (and so on), those children are re-parented to the `init' process if the parent is killed (more or less). If one wants to launch a process that is allowed to run for at most N seconds, and after N seconds that process and -all- of its descendants must have either terminated on their own or been killed, it's a serious pain.
Given that /proc/ and `kill' and whatnot are not standard across various POSIX systems, there isn't a stock "kill my whole process group" utility; that's what `ntapproc.py' provides. Right now, it uses `ps' to gather PIDs, but I think I'll rewrite it to look for and parse /proc/ entries. Regardless, this is another fairly general-use utility.
.
garaerrors.py: When the globus-client runs, it returns GARA error codes, which are rather inscrutable. `garaerrors.py' is just a simple mapping of codes to strings; given an integer, it prints a message. Really only useful if you're in GARA's guts.
.
tagparser.py: Super-super simple markup-format parser. We use it for our conf files: --program and --pathmap files for `testpilot.py', --relayNDT files for the guru client, and other places. Returns Python dictionaries of (sub)key(s) and value(s).
.
"nannies": We have a shell-script wrapper for our daemons that will restart them a certain number of times if they exit, possibly delaying for some time and possibly scaling the timeouts. An example is "nanny.sh", which we use on the PMPs to keep the diffserv manager (which restarts itself periodically) running.
.
Heretofore-unclassified, largely task-oriented things: I'm not sure how this list will end up, but I'm thinking of several project-level tasks that I handle.
.
Building the PMP RPM: Actually rebuilding the RPM we distribute online takes a bit of doing, but once the setup is done it's not very hard to roll out new versions of the RPM. However, a bad setup can pretty much obliterate the (carefully configured) machine from which you're trying to gather all the RPM's components. As such, I wrote up a step-by-step document for setting up the build environment and whatnot -- it is in `/usr/local/ntap2/docs/HOWTO.create_RPM'. Note that changes to the RPM involve editing two things: a prep script called `prep-pmp-rpmbuild.sh' and the RPM's "spec" file, `pmp-x.y.spec'. These and more are in `/usr/local/ntap2/pmp/rpm-dev/'.
.
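The "nanny" idea above -- restart a daemon a bounded number of times, with an optional scaling delay -- is easy to sketch; here in Python rather than nanny.sh's shell, with names invented for the example:

```python
import time

def nanny(start_daemon, max_restarts, delay=1.0, scale=2.0):
    """Re-run start_daemon() each time it exits, up to max_restarts times.

    start_daemon stands in for launching the watched daemon; it returns
    the daemon's exit status once it dies.  A zero status is a clean
    exit and ends the nannying.  Returns the number of restarts made.
    """
    restarts = 0
    while True:
        status = start_daemon()
        if status == 0 or restarts >= max_restarts:
            return restarts
        time.sleep(delay)   # wait a bit before reviving the daemon...
        delay *= scale      # ...and a bit longer each successive time
        restarts += 1
```

The scaling delay keeps a crash-looping daemon (or a diffserv manager that restarts itself by design) from being hammered back up as fast as it can fall over.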
Building and signing Java jarfiles (NDT): There are currently two Java jars that we build and distribute: our modified NDT applet for first-hop performance tests, and our (underutilized) FirstHopScout for running traceroutes -from- the end-user's machine out -to- arbitrary destinations (the guru then saves these traceroutes). In `/usr/local/ntap2/webserver/jarsigner/', our modified NDT applet source is `Tcpbw100.java' and can be rebuilt with the `redo' build script there. Similarly, `FirstHopScout.java' is in the same directory. The steps actually involved in setting up the Java keystore used to digitally sign these jars (google for "signed Java Webstart jar") are written up in `/usr/local/ntap2/docs/HOWTO-java-webstart'. FWIW, Tcpbw100.jar needs to be in /usr/local/ndt/ on the PMPs.
.
Creating and signing host certs: In order to authenticate the PMPs, their gatekeeper principal needs a signed X509 cert from the local KCA. At CITI, we have several (very) helpful scripts to automate the creation of certreqs/keys/certs. Please email someone at CITI for the newest versions of the base scripts, but some are included in `/usr/local/ntap2/webserver/setup/'. `makecerts.expect' is really the useful one. `certcat' is just a simple wrapper around openssl. `certreqtifier' generates sane certreqs -- I used it to create a cert for CITI's jarsigner principal (used for the Webstart stuff).
.
Old, vestigial things that you needn't worry about:
.
ntap2/citi-permis
ntap2/demos
ntap2/portal
ntap2/qos