Topics

  1. Setting up aliases
  2. Submitting batch jobs (review)
  3. Running parallel (array) jobs
  4. Running sequential jobs
  5. Running MATLAB interactively
  6. Submitting MATLAB jobs

Topic 1: Setting up aliases

  1. Open the terminal
  2. Type ls -a to check whether a .bash_profile file already exists
  3. Type emacs -nw .bash_profile to edit it (or create it if it does not exist)
  4. Add alias(es) by typing:

alias [alias-name]="[command-to-run]"

Example: alias jhpce="ssh -X amejia@jhpce01.jhsph.edu"

  5. Type Ctrl-x Ctrl-c then y to exit and save
  6. Open a new terminal (or type source ~/.bash_profile) so that the alias takes effect
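
The alias line can also be added without opening an editor. The following is a minimal sketch, assuming a bash-style shell; the function name add_alias and the username yourid are hypothetical placeholders, not part of the steps above. It appends the alias only if that exact line is not already in the file, so running it twice does not create duplicates.

```shell
# add_alias FILE NAME COMMAND
# Appends the line  alias NAME="COMMAND"  to FILE unless it is already there.
# (add_alias is a hypothetical helper name used for illustration.)
add_alias() {
    file=$1
    line="alias $2=\"$3\""
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example call (replace yourid with your own username):
# add_alias "$HOME/.bash_profile" jhpce "ssh -X yourid@jhpce01.jhsph.edu"
```

The example call is left commented out so that pasting the block does not modify your profile until you have filled in your own username.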

Exercise!

Set up an alias to get on the jhpce cluster.


Topic 2: Submitting batch jobs (review)

  1. Get on jhpce.
  2. Go to the directory where your code is OR create a short R file by typing emacs -nw myscript.R and entering a couple lines of R code.
  3. Create and open a shell script by typing emacs -nw shellscriptname.sh. Write the shell script (example below), then exit and save by typing Ctrl-x Ctrl-c then y.
  4. Submit the job by typing qsub shellscriptname.sh

Example shell script:

#!/bin/sh
#$ -cwd
Rscript myscript.R
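
If you prefer not to open an editor, the shell script can be written and submitted in one go. This is only a sketch: it assumes the file names shellscriptname.sh and myscript.R from the steps above, and it checks for qsub first so that it does nothing harmful when run off the cluster.

```shell
# Create the shell script without opening an editor.
# Quoting 'EOF' stops the shell from expanding anything inside the heredoc.
cat > shellscriptname.sh <<'EOF'
#!/bin/sh
#$ -cwd
Rscript myscript.R
EOF

# Submit only where qsub exists (that is, on the cluster).
if command -v qsub >/dev/null 2>&1; then
    qsub shellscriptname.sh
else
    echo "qsub not found: run this on the cluster"
fi
```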

Exercise!

Submit an R job that computes 2+2.


Topic 3: Running parallel (array) jobs

Array jobs are useful if you want to run the same job in parallel over different subjects, scenarios, simulation iterations, etc. The basic idea is that each array job will have a task ID, and in your R code you will grab this task ID.

  1. Add task IDs to your shell script through the -t option. For example, if you want your tasks to be indexed 1-10, you’d add the line #$ -t 1-10
  2. Add a line k <- as.numeric(Sys.getenv("SGE_TASK_ID")) in your R code to grab the task ID, and use k to determine which subjects, scenarios, etc. to run.
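
On the shell-script side, the two steps above might look like the following sketch. The file name arrayjob.sh is a hypothetical choice, and myscript.R stands for whatever R file reads SGE_TASK_ID. The echo line is only there to show that the task ID is also visible inside the shell script itself.

```shell
# Write an array-job shell script with ten tasks.
cat > arrayjob.sh <<'EOF'
#!/bin/sh
#$ -cwd
#$ -t 1-10
# The scheduler sets SGE_TASK_ID to a different value (1 to 10) for each task.
echo "Running task $SGE_TASK_ID"
Rscript myscript.R
EOF
```

Submit it as usual with qsub arrayjob.sh. The scheduler then runs the script once per task ID, and each run of the R code sees its own value of k.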

Exercise!

Run the following R code in parallel:

subjects <- c("Marie","John","Shelly","Chen")
k <- as.numeric(Sys.getenv("SGE_TASK_ID"))
subject.k <- subjects[k]
print(subject.k)

Topic 4: Running sequential jobs

Sequential jobs are useful if you want to run a series of jobs, where each job should be completed before the next begins.

  1. Submit the first job by typing qsub job1.sh
  2. Type qstat and copy the job ID (shown in the first column), for example 1234.
  3. Submit the second job by typing qsub -hold_jid 1234 job2.sh
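
The copy-and-paste step can be avoided by capturing the job ID when the first job is submitted. The sketch below assumes your Grid Engine installation supports the -terse option of qsub, which prints only the job ID; job_id_only is a hypothetical helper that strips the task-range suffix printed when the first job is an array job.

```shell
# Keep only the part before the first dot, e.g. "1234.1-10:1" becomes "1234".
# (job_id_only is a hypothetical helper name used for illustration.)
job_id_only() {
    printf '%s\n' "${1%%.*}"
}

if command -v qsub >/dev/null 2>&1; then
    # -terse makes qsub print just the job ID instead of a full sentence.
    jid=$(job_id_only "$(qsub -terse job1.sh)")
    # job2.sh stays on hold until job $jid has finished.
    qsub -hold_jid "$jid" job2.sh
else
    echo "qsub not found: run this on the cluster"
fi
```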

Exercise!

Run the following two R jobs sequentially:

Job 1:

a <- "hello world"
save(a, file="~/a.Rdata")

Job 2:

load(file="~/a.Rdata")
print(a)

Topic 5: Running MATLAB interactively

  1. Type which matlab to see if the MATLAB module is loaded.
  2. If you get a “no matlab in…” message, add the following line to the end of your .bashrc file: module load matlab
  3. Get onto a node by typing qrsh
  4. Type matlab -nodisplay to run MATLAB on the command line, or just matlab to open the GUI.
  5. To exit MATLAB, type exit

Exercise!

Open MATLAB and compute 2+2.


Topic 6: Submitting MATLAB jobs

If you want to run a MATLAB job titled myscript.m, create the following shell script, then submit as usual with qsub:

#$ -m e -M amejia@jhsph.edu
#$ -cwd
matlab -r "myscript; exit"

A cautionary note: MATLAB may run code in a parallel environment without being explicitly told to do so, taking up resources on multiple cores!

  * If your job is relatively big, you should add a line #$ -pe local [num-cores] to the shell script, where num-cores is the number of cores you think MATLAB might use.
  * To estimate the number of cores MATLAB used on a previous job, find the email sent to you by root@local. This email should include CPU time and Wallclock time.
  * If the CPU time is greater than the Wallclock time, this means that multiple cores were used. The number of cores used ≈ CPU Time/Wallclock Time.
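
As a worked example of the CPU Time/Wallclock Time estimate, here is a small sketch. It assumes you have already converted both times from the email into seconds; the helper name estimate_cores and the numbers are made up for illustration. The ratio is rounded up, since you cannot request a fraction of a core.

```shell
# estimate_cores CPU_SECONDS WALLCLOCK_SECONDS
# Prints CPU time / wallclock time, rounded up to a whole number of cores.
# (estimate_cores is a hypothetical helper name used for illustration.)
estimate_cores() {
    awk -v cpu="$1" -v wall="$2" 'BEGIN {
        r = cpu / wall
        n = int(r)
        if (n < r) n++      # round up
        if (n < 1) n = 1    # a job always uses at least one core
        print n
    }'
}

# Made-up example: 7200 s of CPU time over 1800 s of wallclock time.
estimate_cores 7200 1800    # prints 4
```

With these numbers you would add the line #$ -pe local 4 to the shell script for the next run.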