SLab:Todo
From CCGB
in progress
- md1k-2 disk problems - currently waiting for problems to show up again to get fresh RAID controller log entries, and to verify that switching to the spare EMM (array controller) didn't fix the problem
- nagios monitoring (http://kaylee.bx.psu.edu/nagios [login as guest/guest])
- who gets notified? everyone all at once, or use elapsed-time-based escalations?
- what do we want to monitor?
- (DONE) samba share status on s2 so illumina-ga can copy data!
- up/down state for all nodes and servers
- (DONE) NFS server on s2/s3; (PARTIAL) all the related TCP/UDP ports necessary for proper NFS operation
- disk usage via snmp for s2+s3
- (DONE) fault management via FMD over SNMP like we do for afs-fs{4..7}, thumper, saturn, etc.
- SGE queue status
- sequencer up/down state, and maybe disk usage (can we do that for illumina-ga remotely somehow?)
- Migrate linne to bx network. See Slab:Linne_BX_migration
- install AFS client on all nodes
- (PARTIAL) finish syncing UIDs/GIDs to match what is in BX LDAP
- create BX accounts for those that don't already have them (for security reasons, also clean up/disable linne accounts that are no longer necessary?)
- point all nodes to ldap.bx.psu.edu for authZ, and switch to the BX.PSU.EDU krb5 realm for authN
- disable the services running on linne that are no longer necessary
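For the disk usage item under nagios monitoring above, a minimal sketch of the threshold logic such a check might use. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention. Everything else is an assumption for illustration: the function names, the 80%/90% thresholds, and reading a local path with `shutil.disk_usage` instead of querying s2/s3 over SNMP as the item proposes.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_usage(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to a (status code, label) pair.

    The warn/crit defaults are placeholders, not agreed thresholds.
    """
    if percent_used >= crit:
        return CRITICAL, "CRITICAL"
    if percent_used >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path, warn=80.0, crit=90.0):
    """Return (status code, one-line message) for a locally mounted path.

    Assumption: the filesystem is visible locally. An SNMP-based check
    would obtain the used/total numbers differently but could reuse
    classify_usage() unchanged.
    """
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: reported size is zero"
    percent = 100.0 * usage.used / usage.total
    code, label = classify_usage(percent, warn, crit)
    return code, f"DISK {label} - {path} is {percent:.1f}% full"
```

A real plugin would print the message on one line and exit with the returned code, which is how Nagios decides whether to notify.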
queued
- attach the UPSes to s2 and s3 to enable graceful shutdown in the event of a power outage?
- automatic snapshots for the ZFS datasets on s2 and s3 (see http://blogs.sun.com/timf/resource/README.zfs-auto-snapshot.txt)
- more scripts:
- migrate sequencing runs from temp to staging (currently /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/support/software/archive/move_*_temp_to_staging)
- perhaps notify by email automatically when there are finished runs ready to be moved?
- notify by email when this is done so any interested parties will see that it has been done, and provide paths to new runs
- this should call a script to update symlinks and release the data.schuster_lab volume
- script to better handle submitting illumina jobs to cluster, with email notifications
- script to allow rsync'ing individual lanes from a run, given source directory, dest directory, and lane(s)
- Migrate Schuster Lab machines to bx network.
- Or at least, install AFS client and set up BX.PSU.EDU krb5 realm to handle authentication so it's easier for the Schuster Lab machines to work with everyone else
- Automate the archiving of sequencing run directories.
- Maybe after two weeks in staging they're moved into the archive?
- need to keep symlinks up-to-date and release the data.schuster_lab volume appropriately!
- Combine linne and persephone clusters
- dependent on finishing linne-to-bx migration
- master/slave SGE qmasters running somewhere central and more reliable (currently c1.persephone and linne)
- TSM backups of s2 and s3?
- Replace BioTeam iNquiry
- Use Galaxy instead?
- Implement a centralized database of sequencing run information.
- maybe generate this based on the filesystem layout and the presence/absence of certain files?
- maybe use this for generating notifications so people know when certain parts of the pipeline are done?
- Basically a small LIMS.
- Maybe integrate with Galaxy
- After problems with md1k-2 are fixed, turn on automated scrubbing.
- clean up old files in /afs/bx.psu.edu/depot/data/schuster_lab/old_stuff_to_cleanup
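For the idea above of archiving run directories after two weeks in staging, a rough sketch of how candidates might be selected. The function name, the 14-day default, and the use of directory mtime as a stand-in for "time spent in staging" are all assumptions. Note that a directory's mtime changes whenever entries are added or removed directly inside it, so a marker file or a run database may turn out to be a more reliable signal.

```python
import os
import time

SECONDS_PER_DAY = 86400


def runs_ready_for_archive(staging_dir, max_age_days=14, now=None):
    """List run directories in staging_dir not modified for max_age_days.

    Hypothetical helper: it only selects candidates. Moving them,
    updating symlinks, and releasing the data.schuster_lab volume
    would be separate steps.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    ready = []
    with os.scandir(staging_dir) as entries:
        for entry in entries:
            # Skip plain files and symlinks; only real directories count as runs.
            if not entry.is_dir(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime <= cutoff:
                ready.append(entry.path)
    return sorted(ready)
```

Run from cron, the returned list could feed both the move itself and the email notification listing paths to the newly archived runs.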