

submitting a SCore MPI job

If your system is set up with the SCore SGE environment then, in order to share resources fairly with other users, you must submit batch jobs using the qsub command.

To do this you must first write a simple script file; you cannot submit an executable directly to SGE. This is best explained with an example. Suppose you have compiled an MPI binary called mpitest (see section 8.2 for instructions on how to compile) and want to run it on 4 CPUs. You will then need to write a script called (say) score.sh whose contents are:

#!/bin/bash
#$ -masterq ehtpx-cluster.q -cwd -V
# $JOB_ID and $NSLOTS are set by SGE when the job runs;
# scout launches the MPI binary on NSLOTS-1 compute nodes, one process per node.
scout -wait -F $HOME/.score/ndfile.$JOB_ID -e /tmp/scrun.$JOB_ID \
 -nodes=$((NSLOTS-1))x1 /users/nrcb/mpi/mpitest
Here the variables $JOB_ID and $NSLOTS are defined by SGE when your job runs, so you must not set values for these yourself. Next you must submit the script file to the batch system using the ``score'' parallel environment, thus:

[nrcb@ehtpx-cluster]$ qsub -pe score 5 ./score.sh
your job 38 ("score.sh") has been submitted
[nrcb@ehtpx-cluster]$ qstat
job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID
---------------------------------------------------------------------------------------------
     38     0 score.sh   nrcb         t     08/14/2002 09:55:28 comp000.q  SLAVE 
     38     0 score.sh   nrcb         t     08/14/2002 09:55:28 comp001.q  SLAVE 
     38     0 score.sh   nrcb         t     08/14/2002 09:55:28 comp002.q  SLAVE 
     38     0 score.sh   nrcb         t     08/14/2002 09:55:28 comp003.q  SLAVE 
     38     0 score.sh   nrcb         t     08/14/2002 09:55:28 ehtpx-cluster.q  MASTER
Because SCore always spawns MPI jobs from the front end server, you need to include an extra "slot" to account for this. So to run a parallel job on 4 compute nodes you need to request 5 slots: 4 slots for the parallel execution and 1 slot for the spawning process. In this case the front end server is called ehtpx-cluster.

Options to qsub may be embedded in the job script after #$ at the beginning of a job script line. The qsub option -masterq ehtpx-cluster.q in the job script refers to the spawning queue on the front end server; change ehtpx-cluster to whatever the name of your front end server is. The qsub option -cwd runs the job in the directory you were in when you submitted it, and the qsub option -V carries all your currently set environment variables over to the job when it executes.

The qstat command can be used to query the job, and the output above shows that the job is running. If the job normally prints to the screen (standard output), this output is redirected to a file in your home directory named after the job script with .o and the job id number appended, e.g. score.sh.o38 in the above example; similarly, any errors are sent to score.sh.e38. For a parallel job two additional files are generated, containing the output from the parallel job "start" and "stop" scripts; in this example these would be score.sh.po38 and score.sh.pe38 respectively.
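
For instance, the parallel environment request itself can be embedded in the script rather than given on the qsub command line. The following sketch is equivalent to the earlier example (it reuses the queue name, parallel environment name and binary path from above; adjust them to your own site):

#!/bin/bash
#$ -pe score 5
#$ -masterq ehtpx-cluster.q
#$ -cwd -V
scout -wait -F $HOME/.score/ndfile.$JOB_ID -e /tmp/scrun.$JOB_ID \
 -nodes=$((NSLOTS-1))x1 /users/nrcb/mpi/mpitest

With these lines in place the job can be submitted simply as qsub ./score.sh, with no extra options on the command line.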

The status of a submitted job may be queried with the qstat command. A user may list only his or her own jobs using the -u option, e.g. qstat -u bloggs. See man qstat for full details of the qstat options.
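
For example, using the user name bloggs and the job number 38 from the example above (on most SGE installations qstat -j prints full details of a single job, which can help explain why a job is still waiting):

[nrcb@ehtpx-cluster]$ qstat -u bloggs
[nrcb@ehtpx-cluster]$ qstat -j 38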

SGE can be set up so that it emails you when the job is complete. Please see the man page for qsub or ask your administrator.
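
As a sketch (the mail address below is a placeholder), mail at the end of a job is normally requested with qsub's -M and -m options, either embedded in the job script:

#$ -M bloggs@example.com
#$ -m e

or given on the command line:

[nrcb@ehtpx-cluster]$ qsub -pe score 5 -M bloggs@example.com -m e ./score.sh

Whether outgoing mail actually works depends on how your cluster is configured, so check with your administrator.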

