SLab:Todo

From CCGB

in progress

  • md1k-2 disk problems: currently waiting for the problems to recur, both to get fresh RAID controller log entries and to confirm that switching to the spare EMM (array controller) didn't fix them
  • Nagios monitoring (http://kaylee.bx.psu.edu/nagios; log in as guest/guest)
    • who gets notified? everyone all at once, or use elapsed-time-based escalations?
    • what do we want to monitor?
      • (DONE) Samba share status on s2 so illumina-ga can copy data!
      • up/down state for all nodes and servers
      • (DONE) NFS server on s2/s3
      • (PARTIAL) all the related TCP/UDP ports needed for proper NFS operation
      • disk usage via SNMP for s2+s3
      • (DONE) fault management via FMD over SNMP, as we do for afs-fs{4..7}, thumper, saturn, etc.
      • SGE queue status
      • sequencer up/down state, and maybe disk usage (can we do that for illumina-ga remotely somehow?)
  • Migrate linne to the BX network (see Slab:Linne_BX_migration)
    • install AFS client on all nodes
    • (PARTIAL) finish syncing UIDs/GIDs to match what is in BX LDAP
    • create BX accounts for users who don't already have them (for security, clean up or disable linne accounts that are no longer needed?)
    • point all nodes to ldap.bx.psu.edu for authZ, and switch to the BX.PSU.EDU krb5 realm for authN
    • disable the services running on linne that are no longer necessary
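One possible starting point for the s2/s3 disk usage item: a minimal Nagios-style check, sketched in Python. It assumes the check runs locally on s2/s3 (for example under NRPE) instead of polling over SNMP as planned, and the 80%/90% thresholds and the function names are placeholders. The stock check_disk plugin from the Nagios plugins package may already be enough.

```python
"""Sketch of a Nagios-style disk usage check for s2/s3.

ASSUMPTION: runs locally (e.g. under NRPE) instead of polling over SNMP.
Thresholds and function names are hypothetical placeholders.
"""
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct, warn=80.0, crit=90.0):
    """Map a usage percentage to an (exit_code, status_line) pair."""
    if used_pct >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_pct:.1f}% used"
    if used_pct >= warn:
        return WARNING, f"DISK WARNING - {used_pct:.1f}% used"
    return OK, f"DISK OK - {used_pct:.1f}% used"


def check_path(path, warn=80.0, crit=90.0):
    """Check one mount point; report UNKNOWN if it cannot be read."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    return classify(100.0 * usage.used / usage.total, warn, crit)
```

A small wrapper script would print the status line and pass the code to sys.exit(); Nagios takes the service state from the exit code and the status text from the first line of output.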

queued

  • attach the UPSes to s2 and s3 to enable graceful shutdown in the event of a power outage?
  • automatic snapshots for the ZFS datasets on s2 and s3 (see http://blogs.sun.com/timf/resource/README.zfs-auto-snapshot.txt)
  • more scripts:
    • migrate sequencing runs from temp to staging (currently done by /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/support/software/archive/move_*_temp_to_staging)
      • perhaps notify by email automatically when there are finished runs ready to be moved?
      • send an email notification when this is done, so any interested parties know, and include the paths to the new runs
      • this should call a script to update symlinks and release the data.schuster_lab volume
    • script to better handle submitting Illumina jobs to the cluster, with email notifications
    • script to allow rsync'ing individual lanes from a run, given source directory, dest directory, and lane(s)
  • Migrate Schuster Lab machines to the BX network.
    • Or at least install the AFS client and set up the BX.PSU.EDU krb5 realm for authentication, so it's easier for the Schuster Lab machines to work with everyone else
  • Automate the archiving of sequencing run directories.
    • Maybe after two weeks in staging they're moved into the archive?
    • need to keep symlinks up-to-date and release the data.schuster_lab volume appropriately!
  • Combine linne and persephone clusters
    • dependent on finishing linne-to-bx migration
    • run master/slave SGE qmasters somewhere central and more reliable (currently on c1.persephone and linne)
  • TSM backups of s2 and s3?
  • Replace BioTeam iNquiry
    • Use Galaxy instead?
  • Implement a centralized database of sequencing run information.
    • maybe generate this based on the filesystem layout and the presence/absence of certain files?
    • maybe use this for generating notifications so people know when certain parts of the pipeline are done?
    • Basically a small LIMS.
    • Maybe integrate with Galaxy
  • After problems with md1k-2 are fixed, turn on automated scrubbing.
  • clean up old files in /afs/bx.psu.edu/depot/data/schuster_lab/old_stuff_to_cleanup
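For the script that rsyncs individual lanes, a minimal sketch that builds an rsync command from a source directory, a destination and a list of lanes. It assumes per-lane files follow the Illumina GA pipeline naming s_<lane>_* (e.g. s_1_sequence.txt); build_rsync_cmd and sync_lanes are hypothetical names. rsync checks filter rules in order and the first match wins, so directories are included first (letting rsync recurse), then the wanted lanes, and everything else is excluded; --prune-empty-dirs keeps directories with no matching files from being created on the destination.

```python
"""Sketch: rsync only selected lanes of a sequencing run.

ASSUMPTION: per-lane files follow the Illumina GA pipeline naming
s_<lane>_* and may live anywhere under the run directory.
Function names are hypothetical.
"""
import subprocess


def build_rsync_cmd(source, dest, lanes, dry_run=False):
    """Return an rsync argv that copies only the given lanes' files."""
    cmd = ["rsync", "-a", "--prune-empty-dirs"]
    if dry_run:
        cmd.append("--dry-run")
    # Filter rules are checked in order and the first match wins.
    cmd.append("--include=*/")               # descend into every directory
    for lane in lanes:
        cmd.append(f"--include=s_{lane}_*")  # keep this lane's files
    cmd.append("--exclude=*")                # skip everything else
    # Trailing slash copies the contents of source, not the directory itself.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd


def sync_lanes(source, dest, lanes, dry_run=False):
    """Run rsync; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_cmd(source, dest, lanes, dry_run), check=True)
```

For example, sync_lanes("/path/to/run", "/path/to/dest", [1, 3], dry_run=True) would preview what lanes 1 and 3 would copy. Because the pattern requires an underscore after the lane number, lane 1 does not accidentally match s_10_* style names.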
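For automating the archive step, a sketch that lists staging runs older than two weeks. It assumes each run is a directory directly under staging and that the directory's mtime approximates when it arrived there; that may not hold (moving a run with mv can keep its original timestamp, and adding files inside a directory updates it), so a recorded arrival time might be more reliable. runs_to_archive and scan_staging are hypothetical names.

```python
"""Sketch: list staging runs old enough to move into the archive.

ASSUMPTIONS: each run is a directory directly under the staging area,
its mtime approximates when it arrived, and the cutoff is 14 days.
Function names are hypothetical.
"""
import os

MAX_AGE_DAYS = 14  # "two weeks in staging", per the todo item


def runs_to_archive(run_mtimes, now, max_age_days=MAX_AGE_DAYS):
    """Return sorted names of runs whose mtime is older than the cutoff."""
    cutoff = now - max_age_days * 86400
    return sorted(name for name, mtime in run_mtimes.items() if mtime < cutoff)


def scan_staging(staging_dir):
    """Map each run directory under staging_dir to its mtime."""
    with os.scandir(staging_dir) as entries:
        return {e.name: e.stat().st_mtime for e in entries if e.is_dir()}
```

runs_to_archive(scan_staging(staging_dir), time.time()) would give the candidates. After moving them and updating the symlinks, running vos release data.schuster_lab pushes the change out to the volume's read-only replicas.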
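For the centralized run database, a small sketch of the "generate it from the filesystem layout" idea using SQLite. The marker file names in STAGES, and the function names, are hypothetical placeholders for whatever files each pipeline stage actually leaves behind; the table gets one row per run recording which stages appear finished.

```python
"""Sketch: derive a small run-status table from marker files.

ASSUMPTIONS (all hypothetical): each run is a directory, and pipeline
stages leave marker files whose names are listed in STAGES below.
"""
import os
import sqlite3

# Hypothetical marker files, in pipeline order; adjust to the real layout.
STAGES = [("imaging_done", "Imaging.done"),
          ("basecalls_done", "Basecalls.done"),
          ("alignment_done", "Alignment.done")]


def run_status(run_dir):
    """Return {stage: bool} based on which marker files exist."""
    return {stage: os.path.exists(os.path.join(run_dir, marker))
            for stage, marker in STAGES}


def update_db(conn, runs_root):
    """Insert or refresh one row per run directory under runs_root."""
    cols = [stage for stage, _ in STAGES]
    conn.execute("CREATE TABLE IF NOT EXISTS runs (name TEXT PRIMARY KEY, "
                 + ", ".join(f"{c} INTEGER" for c in cols) + ")")
    placeholders = ", ".join("?" * (len(cols) + 1))
    with os.scandir(runs_root) as entries:
        for entry in entries:
            if entry.is_dir():
                status = run_status(entry.path)
                conn.execute(
                    f"INSERT OR REPLACE INTO runs (name, {', '.join(cols)}) "
                    f"VALUES ({placeholders})",
                    [entry.name] + [int(status[c]) for c in cols])
    conn.commit()
```

Comparing rows between successive scans would show which runs just finished a stage, which is one way to drive the pipeline notifications mentioned in the list.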