A loose glossary of NTAP's parts.
_____________________________________
.
initial draft 09/06/2005. dmr.
_____________________________________
.
portal: The portal host is basically the hub of NTAP tests. It runs the PHP that provides the entire web GUI and all of the profile-based user/test configuration functionality. The portal's webserver, in addition to running mod_php, runs several "grid" modules that perform authentication and credential-translation duties in order to secure the portal's online interface.
..
perhaps it's easier to break this into parts right now:
.
grid modules: When a user wants to connect to the portal (at Umich, at least), they first do the logical equivalent of running the `kinit' program to acquire Kerberos credentials, and thereafter run `kx509' to get shorter-term X509 credentials. These kX509 creds are used by the PKCS11 plugin in the user's browser to authenticate the client to the portal machine. On the portal, several Apache modules assist in the credential dance: `grid-proxy-init', `mod_kct' ("kerberized credential translator"), `mod_kx509' (which creates a second set of proxy credentials for the user on the portal; these are the ones actually used when issuing remote commands), and finally `mod_tts' (which assists in acquiring renewable credentials for long-running and/or repetitive jobs). These modules currently (mid-2005) exist for Apache 1.3+ and are being ported to Apache 2.
.
The LDAP directory: Also on the portal is NTAP's LDAP directory, which is used for a few different things. First, there is a directory that stores information about all of the PMPs and associated routers in your institution -- all network interfaces are specified with VLAN information (we use this when planning which PMPs to use as waypoints along the overall testpath).
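As a toy illustration of the waypoint-planning lookup this directory enables (the flattened record layout below is invented for the example -- the real, annotated LDAP schema is what you'd consult):

```python
# Hypothetical flattened view of the PMP/router interface records;
# the real LDAP schema differs -- this only shows the idea.
PMP_INTERFACES = [
    {"pmp": "pmp-a", "addr": "10.1.1.5", "vlan": 101},
    {"pmp": "pmp-a", "addr": "10.2.2.5", "vlan": 202},
    {"pmp": "pmp-b", "addr": "10.2.2.6", "vlan": 202},
]

def pmps_on_vlan(vlan, interfaces=PMP_INTERFACES):
    """Name the PMPs that have an interface (a "presence") on a VLAN."""
    return sorted({i["pmp"] for i in interfaces if i["vlan"] == vlan})
```

Waypoint planning then reduces to queries of this shape: which PMPs sit on the VLANs that the test traffic will cross?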
Next, there is a directory in which all of the PHP's profile-based settings are stored (e.g., a user's scheduled tests, saved option templates for different kinds of tests, etc.). Finally, the output data from, e.g., iperf and owamp are stored in another section of the directory. CITI's annotated schemas, as well as some utilities and documentation, are under `/usr/local/ntap2/ldap/'.
.
testpilot.py: An NTAP test is actually run by the "testpilot", a program that takes as input (at least) two PMP addresses, arguments for the test program(s) that will run between the PMPs during the test, and various options for storing output, determining waypoints of the testpath on the fly, and so on. After the pilot determines the testpath of PMPs, it creates a schedule for running each of the pairwise tests and, using the translated credentials supplied by the portal modules, issues globus-client commands to remotely and securely schedule, execute, and collect output from the tests. Note that the PMPs themselves independently authenticate and authorize the user scheduling the tests. As the pilot is a command-line program, it can be run from a shell without trouble -- the PHP on the portal collects all of the arguments to the pilot, runs it, and passes the pilot's stdout through to the web user. The pilot's usage information is helpful.
.
copilot.py: NTAP now supports deferred and/or repetitive job scheduling (e.g., "run this network test once a week at 4am"). The copilot's arguments consist of time/date scheduling elements ("a period of 2 days", "end at this time on this date", etc.), how many times to retry a failed execution of the pilot (e.g., if the PMPs were in use or down), and then all of the normal arguments that the pilot requires.
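That retry-count argument amounts to a bounded retry loop around pilot invocations; a minimal sketch (the function and its signature are illustrative, not the copilot's actual interface):

```python
import time

def run_with_retries(run_pilot, max_retries, delay=0.0):
    """Invoke run_pilot() until it succeeds or the retries run out.

    run_pilot stands in for one execution of testpilot.py and returns
    the pilot's exit status (0 on success).  Returns the number of
    attempts made on success, or a negative count on giving up.
    """
    attempts = 0
    while True:
        attempts += 1
        status = run_pilot()
        if status == 0:
            return attempts        # success: report how many tries it took
        if attempts > max_retries:
            return -attempts       # give up: PMPs may be in use or down
        time.sleep(delay)          # pause briefly before retrying
```

A deferred reservation would then just be a cron job whose payload is one call like this.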
The copilot gathers any conf files being given to the testpilot and backs them up into a per-reservation directory, along with crontab files that schedule and unschedule the repetitive invocations of the pilot. Since the copilot doesn't run synchronously with any web-based user interaction, it spools all of the pilot's output to a per-reservation file. The copilot is also used to manually unschedule a repetitive or deferred test. Nearly all scheduling-related information can be found in `/usr/local/ntap2/reservations/'. Note, also, that the copilot utilizes the patched Vixie cron facility, `/etc/cron.d/' (upshot: don't delete anything in there that starts with "ntap").
.
guru (daemon): In addition to tests run between pairs of PMPs, a "first-hop" test can also be run -- i.e., between an end-user and an NDT/web100-enabled PMP. Since we can't authenticate the end-user's machine, we use a modified version of the NDT client applet that runs tests between the user's machine and a PMP; a web100-savvy PMP has an instrumented network stack that gathers statistics, which are then analyzed for common errors (e.g., a duplex mismatch) and reported to the user. Our modified applet also sends its (nicely formatted) data back to the PMP, which securely relays the test data to the portal. The "guru" is a daemon that runs on the portal and chiefly does two things: (1) it stores NDT data from users' tests, and (2) it maintains a cache of traceroutes to and from the end-users' machines, which assists in finding a PMP "close" to the client host. The guru program itself is installed as /usr/local/ntap2/webserver/bin/ntapguru.py and has usage information.
.
guru client: The guru client program does a few different things; use `ntaputil --guru' to see its (extensive) usage information.
First, the guru client can be used to send raw commands to the guru -- things like: given a user's DN and IP address, get the most recent NDT results or the traceroute from the portal to the user's machine; or get all user DNs for which we have data. These commands can be given on the command line ("--getuserdns"), or the client has an interactive mode ("--interact") in which raw commands can be sent straight through to the guru. Second, the guru client can be used to run traceroutes to client machines ("--radar") and store them in the guru. There are two modes, one of which is implemented as of mid-2005: "p2h" (runs a traceroute from the portal back to the end-user) and "mesh" (unimplemented; will have one or more PMPs traceroute back to the end-user and relay the data back to the guru on the portal). These traceroutes are then used with "--find" to choose a PMP near an end-user. If none are available or no good choices can be made, all available PMPs (or routers) can be listed with "--list pmps|routers". Lastly, the guru client is used when relaying NDT data from the web100-enabled PMP back to the portal.
.
renewd: In conjunction with `mod_tts' (above), a new Kerberized service called `renewd' handles the acquisition and maintenance of renewable kX509 credentials on the portal. Though my Kerberos creds may expire after a day, I can have them translated into service-specific credentials that will be renewed, e.g., every day for a month. This facilitates long-running jobs. Note that `renewd' is in no way NTAP-specific and is a significant piece of software in its own right.
.
PMPs: Each PMP can be thought of as a Grid resource, which NTAP uses as a secure, remote invocation platform. PMPs run basic Globus gatekeeper software (chiefly for authentication), as well as a Globus-based resource manager known as GARA (for authorization and for actually executing the jobs).
While some Grid setups have one gatekeeper protect multiple Grid resources, each of our PMPs independently authenticates and authorizes its users (via the user's X509 credentials).
.
install bits: Setting up a PMP is a bit complicated, but the instructions enumerate the various steps. Mainly, one installs the PMP RPM, Kerberos, and a Java JDK; contained within the PMP RPM is a(n instrumented) web100 kernel RPM that can be used if a custom kernel + patching isn't the route for you. Much more detailed instructions come later. After installing the RPM, directions are given for configuring the PMP constituents described below. Thereafter, one runs `ntap-postinstall-verify.sh', which finds most PMP setup snags and offers fixes. The bulk of the NTAP CVS repository (which includes our utilities, docs, etc.) is installed in /usr/local/ntap2/ .
.
globus-gatekeeper: The gatekeeper is a daemon launched via xinetd. It's basically a funnel through which nearly all NTAP requests pass; NDT tests are the exception -- they're essentially orthogonal to the Globus infrastructure. The gatekeeper authenticates users in one of two ways: first, it can use the default (decentralized, very-difficult-to-administer) per-Grid-resource flatfile, the "grid-mapfile" (which just contains a mapping from allowed-user-DN to local UID); second, the gatekeeper can instead use a callout to Walden (a centralized LDAP directory of user DNs, user "groups", and other elements). For more information, see globus.org.
.
globus-client: Using both Globus and GARA libraries, the globus-client takes a user's X509 credentials and, along with a lengthy string (an "RSL") describing the various options for the test, contacts each of the PMPs (the globus-client is normally run from the portal itself) and schedules the tests and output-gathering through the remote diffserv managers.
.
Walden: Walden is a per-PMP daemon, written in Java, that is used by the gatekeeper.
The most succinct summary of Walden that I can summon to mind is: "Walden makes Globus-based grid authentication- and authorization-management solutions scale". For a solid description of Walden and examples of its usefulness, please visit: http://www.citi.umich.edu/projects/ntap/docs.html#rawk
.
GARA diffserv manager: The diffserv manager is a daemon launched with an init-style script. Once the gatekeeper has authenticated the user, it hands off the request to the local diffserv manager, which then executes the test(s) and returns the spew.
.
NDT/Web100 ("first-hop") tests: As mentioned above, NTAP supports running tests between an end-user's machine and a web100-savvy PMP. What that actually means is that the PMP must be running: 1) the instrumented web100 kernel (or a suitably patched version), 2) the daemon process that runs the performance tests (`web100srv'), and 3) the lightweight webserver whose sole job is to provide a webpage containing the NDT Java applet (`Tcpbw100', a client for `web100srv'). In return, the user is presented with detailed statistics about network conditions, how the user's hardware and software are configured, and potential problem conditions as determined by several heuristics. Our modified versions of `Tcpbw100' and `web100srv' then utilize the guru client to save these results on the portal. E.g., someone in network support might respond to a user's complaint of poor network conditions by sending them a URL for a web100-enabled PMP; by tacking an encoded form of the user's DN onto the URL's query string, the results are automatically saved back on the portal. The support person can then easily get at the user's specific data and, lo, it's a duplex mismatch or some such. A simple demo shell script called `show-last-NDT-results.sh' brings up a user's most recent test data. Note that the guru client interface is very general and therefore invites higher-level wrappers.
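Since the interface invites wrappers, here's the flavor of one (a sketch; the exact argv layout for `ntaputil --guru' is my assumption from the usage text, not a documented API):

```python
def guru_cmd(*args):
    """Assemble an `ntaputil --guru' command line as an argv list.

    Flags like --getuserdns, --radar, and --find come from the guru
    client's usage text; the argv layout itself is assumed here.
    """
    return ["ntaputil", "--guru"] + [str(a) for a in args]

def fallback_listing():
    """If --find can't pick a nearby PMP, list everything instead."""
    return [guru_cmd("--list", "pmps"), guru_cmd("--list", "routers")]
```

A real wrapper would hand these argv lists to a subprocess and parse the output; the point is just that higher-level policy (e.g., "find a close PMP, else list everything") composes naturally on top of the raw commands.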
For those looking for this software (e.g., among the files installed by our PMP RPM), it isn't located in a great spot: under our `ntap2' top-level directory (normally /usr/local/ntap2), the source code for our modified NDT/Web100 servers and our modified NDT applet lives in `ntap2/webserver/jarsigner' (it really should be called "firsthop" or something).
.
policy routing: Given the PMP/router LDAP directory described above, a given PMP might have addresses on a variety of VLANs. A PMP tries to "proxy" its traffic so that it follows, as closely as possible, the network path taken by a test-invoker's packets: if the PMP finds that the test user is on subnet X, which is VLAN Y, and the PMP has a presence on VLAN Y, it will choose that address. Effectively, we use Linux's `iproute2' code to implement one routing table per virtual interface, instead of one (primary) routing table for the entire host. The utility `ntap-config' is relatively primitive but, given a conf file, can quickly set up all of the required policy routing. However, here at Umich we haven't had a suitable network on which to actually use much of this. More information is available online.
.
Utilities: Several utilities have grown out of the various development needs of the project. They get used all over the place; some are mentioned elsewhere in this document.
.
send-ssh-key: `send-ssh-key' is a shell script that is not at all NTAP-specific -- I use it whenever I am distributing my ssh public key for use with `ssh-agent'. It transmits, installs, and configures an ssh key -- and the user only has to type the password once (it is not cached or stored in any way). Read below about `ntapctl' for how we use it in NTAP. Its usage info is plenty for anyone to get going.
.
ntapctl: The idea behind `ntapctl' is that each PMP runs five daemon processes (the globus-gatekeeper, the diffserv manager, Walden's mgridauthd, and NDT's `web100srv' and `fakewww'), and they need to be restarted periodically.
I use four PMPs for development -- that's a lot of remote ssh commands. So, `ntapctl' is meant to be used in conjunction with `ssh-agent' so it can send commands over ssh but -not- ask for a password (which makes it automatable, too). First, one sets up the agent machinery (e.g., `ssh-keygen -t rsa && ssh-agent $SHELL', followed by `ssh-add'). Then, the ssh public key (e.g., `~/.ssh/id_rsa.pub') needs to be installed on all of the remote machines that you don't want to have to type a password to access. Many people end up with default umasks that botch the file permissions here, at which point many give up on the ssh-agent altogether (because one often needs to crank up the debug spew on sshd itself, which most people can't/won't do). Instead, we use `send-ssh-key' to set everything up and fix all of the various file permissions. Then, having sent our key to each PMP -once-, we need only load our key into the ssh-agent on our local machine, and `ntapctl' can issue passwordless remote commands that are nevertheless secure (more so than if a password were used with ssh, actually). Then, given a tiny prefab list of our PMPs' IP addresses (crack open the script, it's obvious), I can issue the normal init-style commands "start", "stop", "restart", and "status". E.g., running `ntapctl status --citi' makes all four of my normal PMPs report. Quite handy. I intend to expand this so it can be used to securely, remotely upgrade PMP software. Read `ntapctl's usage information from the command line.
.
ntapproc.py: If a parent process creates children that create children (and so on), those children are re-parented to the `init' process if the parent is killed (more or less). If one wants to launch a process that is allowed to run for at most N seconds, and after N seconds that process and -all- of its descendants must have either terminated on their own or been killed, it's a serious pain.
Given that /proc/ and `kill' and whatnot are not standard across various POSIX systems, there isn't a stock "kill my whole process group" utility; that's what `ntapproc.py' provides. Right now, it uses `ps' to gather PIDs, but I think I'll rewrite it to look for and parse /proc/ entries. Regardless, this is another fairly general-use utility.
.
garaerrors.py: When the globus-client runs, it returns GARA error codes, which are rather inscrutable. `garaerrors.py' is just a simple mapping of codes to strings; given an integer, it prints a message. Really only useful if you're in GARA's guts.
.
tagparser.py: Super-super simple markup-format parser. We use it for our conf files: --program and --pathmap files for `testpilot.py', --relayNDT files for the guru client, and other places. Returns Python dictionaries of (sub)key(s) and value(s).
.
"nannies": We have a shell-script wrapper for our daemons that will restart them a certain number of times if they exit, possibly delaying for some time and possibly scaling the timeouts. An example is "nanny.sh", which we use on the PMPs to keep the diffserv manager (which restarts itself periodically) running.
.
Heretofore-unclassified, largely task-oriented things: I'm not sure how this list will end up, but I'm thinking of several project-level tasks that I handle.
.
Building the PMP RPM: Actually rebuilding the RPM we distribute online takes a bit of doing, but once the setup is done it's not very hard to roll out new versions of the RPM. However, a bad setup can pretty much obliterate the (carefully configured) machine from which you're trying to gather all the RPM's components. As such, I wrote up a step-by-step document for setting up the build environment and whatnot -- it is in `/usr/local/ntap2/docs/HOWTO.create_RPM'. Note that changes to the RPM involve editing two things: a prep script called `prep-pmp-rpmbuild.sh' and the RPM's "spec" file, `pmp-x.y.spec'. These and more are in `/usr/local/ntap2/pmp/rpm-dev/'.
.
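The "nanny" idea above -- restart a daemon a bounded number of times, with an optional scaling delay -- is easy to sketch; here in Python rather than nanny.sh's shell, with names invented for the example:

```python
import time

def nanny(start_daemon, max_restarts, delay=1.0, scale=2.0):
    """Re-run start_daemon() each time it exits, up to max_restarts times.

    start_daemon stands in for launching the watched daemon; it returns
    the daemon's exit status once it dies.  A zero status is a clean
    exit and ends the nannying.  Returns the number of restarts made.
    """
    restarts = 0
    while True:
        status = start_daemon()
        if status == 0 or restarts >= max_restarts:
            return restarts
        time.sleep(delay)   # wait a bit before reviving the daemon...
        delay *= scale      # ...and a bit longer each successive time
        restarts += 1
```

The scaling delay keeps a crash-looping daemon (or a diffserv manager that restarts itself by design) from being hammered back up as fast as it can fall over.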
Building and signing Java jarfiles (NDT): There are currently two Java jars that we build and distribute: our modified NDT applet for first-hop performance tests, and our (underutilized) FirstHopScout for running traceroutes -from- the end-user's machine out -to- arbitrary destinations (the guru then saves these traceroutes). In `/usr/local/ntap2/webserver/jarsigner/', our modified NDT applet source is `Tcpbw100.java' and can be rebuilt with the `redo' build script there. Similarly, `FirstHopScout.java' is in the same directory. The steps actually involved in setting up the Java keystore used to digitally sign these jars (google for "signed Java Webstart jar") are written up in `/usr/local/ntap2/docs/HOWTO-java-webstart'. FWIW, Tcpbw100.jar needs to be in /usr/local/ndt/ on the PMPs.
.
Creating and signing host certs: In order to authenticate the PMPs, their gatekeeper principal needs a signed X509 cert from the local KCA. At CITI, we have several (very) helpful scripts to automate the creation of certreqs/keys/certs. Please email someone at CITI for the newest versions of the base scripts, but some are included in `/usr/local/ntap2/webserver/setup/'. `makecerts.expect' is really the useful one. `certcat' is just a simple wrapper around openssl. `certreqtifier' generates sane certreqs -- I used it to create a cert for CITI's jarsigner principal (used for the Webstart stuff).
.
Old, vestigial things that you needn't worry about:
.
ntap2/citi-permis
ntap2/demos
ntap2/portal
ntap2/qos