BX:SGE

From CCGB
Revision as of 16:17, 3 September 2010 by Phalenor (talk | contribs)

Overview

There is one central SGE installation which handles job scheduling across all of the BX clusters, work servers, and workstations (with the exception of the okinawa, linne, and galaxy clusters). Merging the existing clusters, work servers, and workstations into it is still a work in progress.

The central grid engine has a pair of fully redundant master servers to ensure continuous job scheduling. The loss of both SGE masters does not kill jobs that are currently running or queued, but it does prevent any further job submissions. There is a failover period of approximately 5 minutes between the failure of one SGE master and the startup of the other.

Status

Usage

Job submission

To submit a job, put the command(s) into a script, and use qsub.

Job resource requirements can be specified with -l resource=value. Give careful consideration to your job's resource requirements. Specifying arch, mem_free, s_vmem, and (for threaded jobs) slots is essential to ensure that your job does not oversubscribe resources and that it runs to completion. SGE cannot predict what resources your job requires. For example, it cannot predict how much memory your job will need, so it might schedule the job on a node that has far less memory than necessary, causing the node to wedge itself or the job to die.
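As an illustration of the resource requests described above, here is a minimal sketch of a job script. Every value in it is a placeholder: the job name, the architecture string, the memory sizes, and especially the parallel environment name "threaded" are assumptions and need to be replaced with whatever this installation actually defines.

```shell
#!/bin/bash
# Sketch of an SGE job script. Lines starting with "#$" are read by qsub
# as submit options; a plain shell treats them as comments.
# All values below are placeholders (assumptions), not site defaults.
#
# Job name (hypothetical):
#$ -N example_job
# Run the job from the directory it was submitted from:
#$ -cwd
# Architecture string; check the ARCH column of qhost for real values:
#$ -l arch=lx24-amd64
# Only start on a host that currently has at least this much free memory:
#$ -l mem_free=4G
# Soft limit on the job's virtual memory:
#$ -l s_vmem=4G
# Request 4 slots; "threaded" is a hypothetical parallel environment name:
#$ -pe threaded 4

# SGE sets NSLOTS to the number of granted slots; default to 1 outside SGE.
threads="${NSLOTS:-1}"
echo "running with ${threads} thread(s)"
```

Saved as, say, job.sh (a hypothetical file name), the script would be submitted with qsub job.sh. Reading NSLOTS inside the script keeps the thread count in step with the number of slots actually granted.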

For more detailed usage and examples, please see the SGE Documentation Site: SGE 6.2u5 documentation

Job monitoring

SGE host status can be seen with qhost.

Job queue and job status can be seen with qstat -f, which shows only your own jobs. To see everyone's jobs, use qstat -f -u '*'. Note that qstat behaves differently from previous versions of SGE.

Turning on job notification is highly recommended, although whether it is practical depends on how many jobs you will be submitting, since each job generates its own email. Notification is enabled with the following SGE directives in your job script:

#$ -M youremail@bx.psu.edu
#$ -m beas

This sends email when your job begins (b), ends (e), is aborted or rescheduled (a), or is suspended (s).

Disk space

Most if not all nodes have a /scratch directory which should be used in lieu of /tmp or /var/tmp for temporary job output.
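A minimal sketch of how a job script might use /scratch for its temporary files. The fallback to TMPDIR or /tmp on nodes without a writable /scratch, the directory naming scheme, and the file name partial.txt are all assumptions made for illustration.

```shell
#!/bin/bash
# Prefer /scratch for temporary job files. Falling back to TMPDIR or /tmp
# is an assumption for nodes that lack a writable /scratch.
if [ -d /scratch ] && [ -w /scratch ]; then
    scratch_base=/scratch
else
    scratch_base="${TMPDIR:-/tmp}"
fi

# Create a private working directory. JOB_ID is set by SGE inside a job;
# the defaults only matter when the script is run by hand.
workdir="$(mktemp -d "${scratch_base}/${USER:-user}.${JOB_ID:-nojob}.XXXXXX")"

# Remove the directory when the script exits so scratch space is not leaked.
trap 'rm -rf "$workdir"' EXIT

# Write intermediate files under the private directory (hypothetical file).
echo "intermediate result" > "${workdir}/partial.txt"
# ... run the real work here, then copy final results to permanent storage ...
```

Giving each job its own directory avoids collisions between jobs that land on the same node, and the exit trap cleans up even when the job fails partway through.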

AFS considerations

Without going into too much technical detail, when you submit a job, SGE 'steals' your Kerberos tickets and sends them along with the job, and uses them to obtain AFS tokens right before the job starts. Your Kerberos tickets, and hence your AFS tokens, have a default lifetime of 14 days, renewable up to 30 days. It is essential that you check your current ticket expiration date before submitting a long-running job, or a job that will sit in the queue for a significant period of time. The expiration date is shown in the output of klist.

If you submit a job without currently possessing valid Kerberos tickets for your user, then your job will not have authenticated access to AFS, and hence to your home directory.
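One way to guard against submitting without valid tickets is a small check run before qsub. The sketch below assumes a klist that supports the -s option (as in MIT Kerberos), which prints nothing and reports through its exit status whether the credentials cache is readable and unexpired. The function name have_valid_tickets is hypothetical.

```shell
#!/bin/bash
# Returns 0 if klist reports a readable, unexpired credentials cache.
# Assumes "klist -s" (silent mode, MIT Kerberos); returns 1 if klist is missing.
have_valid_tickets() {
    command -v klist >/dev/null 2>&1 || return 1
    klist -s
}

if have_valid_tickets; then
    echo "Kerberos tickets are valid; check the expiry in klist before a long job."
else
    echo "No valid Kerberos tickets found; run kinit before qsub." >&2
fi
```

This only confirms that tickets are valid at the moment of the check. It does not tell you whether they will outlast the job, so the expiration date in the klist output still needs to be compared against the expected queue wait and run time.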

You should not use your AFS home directory for any high-throughput I/O. The AFS home directory servers are fast, but they are not fast enough to serve HPC needs. Running programs and scripts, reading small files, and writing small output files should be fine, though. AFS does tend to break down when there are multiple writers to the same volume or directory, so keep this in mind. In most cases, the various NFS storage systems should be preferred to AFS.