Submit Jobs to SLURM

Author: Jillian Rowe

When you submit jobs, you can optionally specify CPUs, walltime, and the number of nodes. You can also specify the partition and/or constrain the job to a particular node type.

 

Cluster Configuration

To get your cluster configuration, run:

sinfo

This will show you the queues (partitions) and the node types in each queue.

For more information on each node, run sinfo -N -l.
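
On a hypothetical AWS cluster the output might look something like the example below; your partition names, node names, and states will differ. The ~ suffix on the state simply means the cloud nodes are currently powered down and will start on demand.

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
dev*         up   infinite     10  idle~ dev-dy-t3a2xlarge-[1-10]
compute      up   infinite     20  idle~ compute-dy-c5n18xlarge-[1-20]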

 

Specifying Memory on SLURM AWS

By default, SLURM on AWS is not configured to schedule jobs by memory. Instead, you constrain your job to a particular node (instance) type.

sbatch --constraint=t3a.2xlarge --partition dev my_job.sh
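
To see which constraints (features) are available on your cluster, you can ask sinfo to print the features column alongside each partition. This is a minimal sketch; the exact features advertised depend on your cluster configuration.

sinfo -o "%P %f %N"    # partition, available features, nodelist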

 

Submit Jobs

The simplest way to submit a job to the cluster is to use sbatch job-script.sh.

Make sure your job submission script is executable with chmod +x /path/to/job-script.sh.

Here’s an example job script. Replace the job name, memory, and CPU requirements with real values.

#!/usr/bin/env bash

#SBATCH --job-name=jobname        # replace with a descriptive name
#SBATCH --ntasks=1                # number of tasks
#SBATCH --cpus-per-task=1         # CPUs for the job
#SBATCH --mem=1gb                 # memory for the job

echo "HELLO"
sleep 60
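
Once the script is saved, a minimal submit-and-check workflow looks like this (the 1234 job ID below is hypothetical; yours will differ):

sbatch job-script.sh        # prints something like: Submitted batch job 1234
squeue -u $USER             # list your queued and running jobs
scontrol show job 1234      # detailed information about a specific job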

 

Submit For-Loop-Like Jobs

Let’s say you have 10 samples, and you want to run the same analysis across each of them. Instead of submitting 10 individual jobs, you can submit a single job that is itself an array of jobs.

#!/bin/bash
#
#SBATCH --job-name=analysis
#SBATCH --output=res_emb_arr.txt
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
#
#SBATCH --array=1-10

./my_analysis $SLURM_ARRAY_TASK_ID
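
Submitting the script works the same as any other job; SLURM expands the array for you. Here array-job.sh is a hypothetical name for the script above.

sbatch array-job.sh     # submits all 10 array tasks at once
squeue -u $USER         # running tasks show up with an index, e.g. 1234_1, 1234_2, ...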

 

Now let’s say you have a file that lists your samples, aptly named sample_file.

 

sample1
sample2
sample3

 

(and so on). Instead of passing an integer to your analysis script, you may want to pass the sample name.

#!/bin/bash
#
#SBATCH --job-name=analysis
#SBATCH --output=analysis.txt
#SBATCH --ntasks=1
#SBATCH --time=1:00:00
#SBATCH --mem-per-cpu=100
#SBATCH --array=1-10

./my_analysis $(sed -n "${SLURM_ARRAY_TASK_ID}p" sample_file)
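
If the number of samples changes, you can size the array from the command line instead of hard-coding it in the script; options passed to sbatch override the #SBATCH directives. Here analysis.sh is a hypothetical name for the script above.

sbatch --array=1-$(wc -l < sample_file) analysis.sh    # one task per line in sample_file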

 

Submit Interactive Jobs (srun)

If you have used other clusters, you are probably used to running srun to get an interactive job. That doesn’t work on AWS HPC clusters because nodes come up and down on demand, which is different from what SLURM expects in a typical data center.

Instead, submit a job and give it a command that won’t exit right away like sleep.

#!/usr/bin/env bash

# submit-script.sh

#SBATCH --job-name=jobname
#SBATCH --ntasks=1
#SBATCH --mem=1gb
#SBATCH --time=05:00:00    # keep the time limit at least as long as the sleep below

sleep 5h

 

Name this script submit-script.sh, give it the appropriate memory and CPU requirements, and submit it by running sbatch submit-script.sh. This will give you a job ID. Then run squeue -u $USER, or watch squeue -u $USER. Wait for your job to come up as RUNNING or R. Once it’s running, you’ll see an IP address or hostname listed. SSH over to that address as the same user you logged into the cluster as.
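
Put together, the workflow looks something like this. The node name shown is hypothetical; use whatever appears in the NODELIST(REASON) column of your squeue output.

sbatch submit-script.sh       # note the job ID that is printed
watch squeue -u $USER         # wait for the state to show R (RUNNING)
ssh dev-dy-t3a2xlarge-1       # hypothetical node name from the NODELIST column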

 

Constrain your Job to a Node Type or Partition

You can constrain your job to a particular node type by using a combination of --partition and --constraint. This is especially useful for machine learning jobs that require GPU nodes, analyses that require large amounts of memory, and so on.

Exactly which combination you need depends on your cluster configuration. Partitions can mix and match node types. To see how your cluster is configured, run:

sinfo
 
#!/usr/bin/env bash

#SBATCH --partition=extralarge
#SBATCH --constraint=m4.10xlarge
#SBATCH --time=1:00:00
#SBATCH --mem=2gb
#SBATCH --cpus-per-task=2

echo "HELLO"

sleep 5m

Submit to t3.2xlarge

#!/usr/bin/env bash

#SBATCH --partition=compute
#SBATCH --constraint=t3.2xlarge

echo "HELLO"

sleep 5m
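
Once a constrained job is running, you can confirm which node (and therefore node type) it landed on. The 12345 job ID below is hypothetical.

scontrol show job 12345 | grep -i nodelist    # the NodeList field shows where the job is running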
 
 

Submit Jobs Through JupyterHub

The simplest way to get a notebook session backed by a SLURM job is to go to your JupyterHub URL and log in with your username and password.

Once there, you will see a form with parameters that correspond to a SLURM submission. Fill it in as you would a SLURM job. A background process submits a job to the cluster, and you are then given a notebook on that node.

 

Submit Jobs to Multiple Nodes

If you’re running Dask, Spark, MPI, or similar jobs, you may want to submit to multiple hosts using the -N (capital N!) option.

 

#!/bin/bash
#SBATCH -N 2    # request two nodes

... rest of your submission script

 

Once you’ve submitted a job, you can SSH over to one of your nodes. The environment variable $SLURM_JOB_NODELIST holds the list of allocated nodes.

 

scontrol show hostnames $SLURM_JOB_NODELIST
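
For example, here is a minimal sketch that expands the nodelist and prints each hostname; you could ssh to each host instead of echoing it.

for host in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    echo "allocated node: $host"
done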

 

Other Resources

There is a lot of information on submitting SLURM jobs, and this article is only meant to give a brief overview.

All SBATCH Options
SLURM Environment Variables
Tutorials on Submitting SLURM Jobs
