Difference between revisions of "BX:SGE"

From CCGB
(Created page with 'Using SGE 6.2u5 : http://wikis.sun.com/display/gridengine62u5/Using')
 
Line 1: Line 1:
Using SGE 6.2u5 : http://wikis.sun.com/display/gridengine62u5/Using
+
= Overview =
 +
 
 +
A single central SGE installation handles job scheduling across all of the BX clusters, work servers, and workstations, with the exception of the okinawa, linne, and galaxy clusters. Merging the existing clusters, work servers, and workstations into it is still a work in progress.
 +
 
 +
 
 +
The central grid engine has a pair of fully redundant master servers to ensure continuous job scheduling. The loss of both SGE masters does not kill jobs that are currently running or queued, but it prevents any further job submissions. When one SGE master fails, there is a failover period of approximately 5 minutes before the other SGE master starts up.
 +
 
 +
= Status =
 +
 
 +
* Current ''BX Grid'' load can be seen through GANGLIA at http://ganglia.bx.psu.edu
 +
* A web version of qstat (an XSL-formatted version of ''qstat -f -u '*' -xml'') is available at http://qstat.bx.psu.edu
 +
 
 +
= Usage =
 +
 
 +
To submit a job, put the command(s) into a script, and use qsub. Various job resource requirements can be specified with '''-l resource=foo'''.
 +
 
 +
SGE host status can be seen with '''qhost'''.
 +
 
 +
Job queue/status can be seen with '''qstat -f''', which shows only your own jobs. To see everyone's jobs, use '''qstat -f -u '*''''. Note that qstat behaves differently than in previous versions of SGE.
 +
 
 +
For more detailed usage and examples, please see the SGE Documentation Site:
 +
[http://wikis.sun.com/display/gridengine62u5/Using SGE 6.2u5 documentation]

Revision as of 15:59, 3 September 2010

Overview

A single central SGE installation handles job scheduling across all of the BX clusters, work servers, and workstations, with the exception of the okinawa, linne, and galaxy clusters. Merging the existing clusters, work servers, and workstations into it is still a work in progress.


The central grid engine has a pair of fully redundant master servers to ensure continuous job scheduling. The loss of both SGE masters does not kill jobs that are currently running or queued, but it prevents any further job submissions. When one SGE master fails, there is a failover period of approximately 5 minutes before the other SGE master starts up.

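Status

Current BX Grid load can be seen through GANGLIA at http://ganglia.bx.psu.edu
A web version of qstat (an XSL-formatted version of qstat -f -u '*' -xml) is available at http://qstat.bx.psu.edu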

Usage

To submit a job, put the command(s) into a script, and use qsub. Various job resource requirements can be specified with -l resource=foo.
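
As a rough illustration of the submission step, the sketch below writes a small job script and hands it to qsub. The file name hello_job.sh, the job name, and the h_rt runtime request are assumptions chosen for the example; which resources can actually be requested with -l depends on how the BX grid is configured.

```shell
#!/bin/bash
# Write a minimal job script. Lines beginning with "#$" are read by qsub
# as if they were command-line options.
cat > hello_job.sh <<'EOF'
#!/bin/bash
# Job name (hypothetical).
#$ -N hello_job
# Run the job from the directory it was submitted from.
#$ -cwd
# Merge stderr into the stdout file.
#$ -j y
# Example resource request: 10 minutes of wall-clock time (assumed to be requestable here).
#$ -l h_rt=00:10:00
echo "Running on $(hostname)"
EOF

# Submit only where qsub is available, i.e. on a submit host.
if command -v qsub >/dev/null 2>&1; then
    qsub hello_job.sh
else
    echo "qsub not found; run this on an SGE submit host" >&2
fi
```

The same request can be given on the command line instead of inside the script, for example qsub -l h_rt=00:10:00 hello_job.sh.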

SGE host status can be seen with qhost.

Job queue/status can be seen with qstat -f, which shows only your own jobs. To see everyone's jobs, use qstat -f -u '*'. Note that qstat behaves differently than in previous versions of SGE.
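
For a quick overview, the status commands above can be combined into a small helper. This is only a sketch: summarize_jobs is a hypothetical function name, and it assumes the plain (non -f) qstat listing, with two header lines followed by one line per job and the job state in the fifth column.

```shell
#!/bin/bash
# Count jobs per state (r, qw, Eqw, ...) from plain `qstat` output on stdin.
summarize_jobs() {
    # Skip the two header lines; column 5 holds the job state.
    awk 'NR > 2 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# Run the live queries only where the SGE client tools are installed.
if command -v qstat >/dev/null 2>&1; then
    qhost                            # per-host status
    qstat | summarize_jobs           # your own jobs
    qstat -u '*' | summarize_jobs    # everyone's jobs
fi
```

Note that the single quotes around * keep the shell from expanding it into file names before qstat sees it.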

For more detailed usage and examples, please see the SGE Documentation Site: SGE 6.2u5 documentation (http://wikis.sun.com/display/gridengine62u5/Using)