Q1.1. Please answer the following questions concerning the directories [ 1 point ].
Q1.1a. Which of these directories should you write output to from jobs submitted on compute nodes? Indicate all that apply from /scratch, /archive, /home
/scratch
Q1.1b. Which of these directories is backed up and can be recovered should the data be lost? Indicate all that apply from /scratch, /archive, /home
/home, /archive
Q1.1c. Which of these directories is flushed every 60 days? Indicate all that apply from /scratch, /archive, /home
/scratch
Q1.1d. Execute the “myquota” command to determine how much disk space you have available in each directory. Note that if you are working from a compute node prompt, the /archive directory will not appear because /archive is not mounted on compute nodes. How much space do you have remaining in each of your /home and /scratch directories?
/home: 50.0GB /scratch: 5.0TB
Q2.1 Scroll down past the header lines of the SAM file to where the alignment records begin and answer the following [ 1 point ].
Q2.1a What is the delimiter between columns of an alignment record (row)? (Hint: your answer should not be “^I”. You may need to use online resources to answer the question.)
The delimiter between columns of an alignment record is the tab character. SAM is a tab-delimited text format, so each column (field) of an alignment record is separated from the next by a single tab. The “^I” is only how the tab is displayed when non-printing characters are made visible.
Q2.1b What does the “$” represent at the end of each line?
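The “$” marks the end of each line, that is, the position of the newline character. It is displayed only when non-printing characters are made visible and is not part of the read or the alignment record itself.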
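The tab and end-of-line markers from Q2.1 can be reproduced on a small example. The sketch below uses a made-up, shortened record (not taken from week1.sam) and assumes a cat that accepts the -e and -t options, which show line ends as $ and tabs as ^I.

```shell
# A made-up, shortened alignment record; the fields are joined by real tab characters.
record=$(printf 'read1\t0\t11\t60001\t37')

# Make non-printing characters visible: each tab shows as ^I, the line end as $.
visible=$(printf '%s\n' "$record" | cat -et)
echo "$visible"

# Count the tab-separated fields in the record.
nfields=$(printf '%s\n' "$record" | awk -F'\t' '{print NF}')
echo "fields: $nfields"
```

The first echo prints the record with ^I between fields and a trailing $, and the second reports 5 fields.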
Q2.2 Execute week1.sh using your preferred method and copy both the command and output into your answers file [ 1 point ].
Command:
cp /scratch/work/courses/BI7653/hw1.2022/week1.sh .
cat week1.sh | less
chmod +x week1.sh
./week1.sh
Output is as follows:
This is the contents of the samfile variable: /scratch/courses/BI7653/hw1.2022/week1.sam
This is the first alignment record in the sam file:
grep: /scratch/courses/BI7653/hw1.2022/week1.sam: No such file or directory
This is the chromosome and position of the first 3 records in the sam
grep: /scratch/courses/BI7653/hw1.2022/week1.sam: No such file or directory
The following is todays time and date:
Mon Feb 7 18:08:43 EST 2022
This is todays time and date: Mon Feb 7 18:08:43 EST 2022
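The two grep errors in the output above mean that the path held in the samfile variable could not be opened when the script ran. A minimal sketch of a guard that checks the file before reading it is shown below. The function name first_record is hypothetical and is not taken from week1.sh.

```shell
# Hypothetical helper (not part of week1.sh): print the first alignment
# record of a SAM file, or report clearly that the file cannot be read.
first_record() {
    samfile=$1
    if [ ! -r "$samfile" ]; then
        echo "cannot read SAM file: $samfile" >&2
        return 1
    fi
    # Header lines start with "@"; the first other line is the first record.
    grep -v '^@' "$samfile" | head -n 1
}
```

Calling first_record with a path that does not exist prints one message to STDERR and returns a non-zero status instead of letting grep fail later in the script.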
Q3.1. Now you will create and modify a shell script with a command-line (or “terminal”) text editor and execute it as a Slurm job.
Command line text editors nano, vim, emacs are available on the HPC. You may launch a text editor simply by typing the name of the editor at the command prompt. nano is the simplest editor available on HPC and recommended for a quick start.
Perform the following tasks after confirming that you are working on a compute node.
echo script begin: $(date)
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai
echo script completed: $(date)
In your homework answers document for Q3.1, report (1) the commands you used for steps 1-5, (2) the contents of your script, and (3) the job id [ 1 point ].
Commands used for 1-5:
cd /scratch/bl2477
mkdir ngs.week1
cp /scratch/work/courses/BI7653/hw1.2022/slurm_template.sh ngs.week1
cd ngs.week1
mv slurm_template.sh slurmjob_template.sh
vim slurmjob_template.sh
Content of script:
#!/bin/bash
#
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=4:00:00
#SBATCH --mem=10GB
#SBATCH --job-name=slurm_template
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=bl2477@nyu.edu
module purge
echo script begin: $(date)
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai
echo script completed: $(date)
Submit job script command:
sbatch slurmjob_template.sh
Job ID: 14715512
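The job id can also be captured in a variable rather than copied by hand. The sketch below parses the confirmation line that sbatch prints. The sample line is typed in using the job id reported above, and the function name jobid_from_sbatch is hypothetical.

```shell
# Extract the numeric job id from sbatch's "Submitted batch job <id>" line.
jobid_from_sbatch() {
    awk '/^Submitted batch job/ {print $4}'
}

# Sample confirmation line, typed in by hand; on the cluster you would
# pipe the real output of sbatch into the function instead.
jobid=$(echo "Submitted batch job 14715512" | jobid_from_sbatch)

# With default settings the job's output file is named slurm-<jobid>.out.
echo "slurm-${jobid}.out"
```

On the cluster this would be used as sbatch slurmjob_template.sh | jobid_from_sbatch. sbatch also has a --parsable option that prints the job id without the surrounding text.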
Q3.2. Submit your job script using the sbatch command and monitor your job status using the command:
squeue -u bl2477
The job will typically register as pending “PD”, running “R”, or complete “C”. If the job is no longer in the queue then it has completed. If you have a syntax error you can typically identify the problem by reviewing the STDERR of the job, or by reviewing the exit status (see pre-recorded video).
Q3.2 Now answer the following questions [ 1 point ].
Q3.2a What is the job id of your job?
Job ID: 14715512
Q3.2b What are the names of ALL the files in the directory where you launched the job after the job has completed?
Name of the files in the directory:
HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai
slurm-14715512.out
slurmjob_template.sh
Q3.2c What is the exit status of your job? To see it, execute seff <job id>.
Job ID: 14715512
Cluster: greene
User/Group: bl2477/bl2477
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:03
CPU Efficiency: 0.45% of 00:11:04 core-walltime
Job Wall-clock time: 00:11:04
Memory Utilized: 1.27 MB
Memory Efficiency: 0.01% of 10.00 GB
Q3.2d How much memory (RAM) was used? Again, try seff <job id>
1.27 MB was used
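The two efficiency figures in the seff output can be checked by hand from the other values it reports. This is a sketch of the arithmetic only, assuming 1 GB = 1024 MB and a single core; it is not a description of how seff is implemented.

```shell
cpu_seconds=3                    # CPU Utilized: 00:00:03
wall_seconds=$((11 * 60 + 4))    # Job Wall-clock time: 00:11:04 on 1 core

# CPU efficiency = CPU time / core-walltime, as a percentage.
cpu_eff=$(awk -v c="$cpu_seconds" -v w="$wall_seconds" \
    'BEGIN {printf "%.2f", 100 * c / w}')

# Memory efficiency = 1.27 MB used out of the 10 GB requested.
mem_eff=$(awk 'BEGIN {printf "%.2f", 100 * 1.27 / (10 * 1024)}')

echo "CPU efficiency: ${cpu_eff}%"
echo "Memory efficiency: ${mem_eff}%"
```

Both values agree with the 0.45% and 0.01% reported by seff, which shows that the job requested far more memory than the two downloads needed.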
Q3.3 Answer the following [ 1 point ].
Q3.3a What is the name of the file(s) with the STDERR and STDOUT for your job? (Hint: watch the pre-recorded video.)
slurm-14715512.out
Q3.3b What is the output of the “date” command substitution from your script in the STDERR/STDOUT file for your job?
Output:
script begin: Mon Feb 7 18:44:25 EST 2022
script completed: Mon Feb 7 18:55:22 EST 2022
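The two timestamps give the time spent between the first and last echo commands. The sketch below computes the difference, assuming both times fall on the same day; the helper name to_seconds is hypothetical.

```shell
# Convert HH:MM:SS to seconds since midnight (awk treats "08" as decimal 8).
to_seconds() {
    echo "$1" | awk -F: '{print $1 * 3600 + $2 * 60 + $3}'
}

begin=$(to_seconds 18:44:25)    # script begin
end=$(to_seconds 18:55:22)      # script completed
elapsed=$((end - begin))

printf 'elapsed: %d min %d s\n' $((elapsed / 60)) $((elapsed % 60))
```

The result, 10 min 57 s, is a few seconds shorter than the 00:11:04 wall-clock time reported by seff, presumably because the wall-clock time also covers job start-up and shutdown outside the two echo lines.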
Q4.1. Perform the following steps and save commands and output for your answer using the pre-recorded video (and powerpoint) for help.
1. Load the most recent samtools module (highest version number) (see the pre-recorded video for help with the module load command)
2. Use the “which” command to confirm samtools is now in your path.
3. Print the samtools help to your terminal
samtools --help | head -n 5 # or simply enter "samtools | head -n 5"
Report all command lines and output from Q4.1 for your answer [ 1 point ].
Command (part 1):
module avail samtools
Output (part 1):
--------------------------- /share/apps/modulefiles ----------------------------
samtools/intel/1.11 samtools/intel/1.12 samtools/intel/1.14
Command (part 1&2):
module load samtools/intel/1.14
which samtools
Output (part 1&2):
/share/apps/samtools/1.14/intel/bin/samtools
Command (part 3):
samtools --help | head -n 5
Output (part 3):
Program: samtools (Tools for alignments in the SAM format)
Version: 1.14 (using htslib 1.14)
Command (part 4):
module list
Output (part 4):
Currently Loaded Modules:
1) perl/intel/5.32.0 3) htslib/intel/1.14
2) intel/19.1.2 4) samtools/intel/1.14
Command (part 5):
module purge
module list
Output (part 5):
No modules loaded
Q4.2. Convert the BAM downloaded in Task 3 to SAM format.
Command:
module load samtools/intel/1.14
samtools view -h HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.bam > HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.sam
head -n 10 HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.sam
First 10 lines of sam file:
@HD VN:1.0 SO:coordinate
@SQ SN:1 LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37 SP:Human
@SQ SN:2 LN:243199373 M5:a0d9851da00400dec1098a9255ac712e UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37 SP:Human
@SQ SN:3 LN:198022430 M5:fdfd811849cc2fadebc929bb925902e5 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37 SP:Human
@SQ SN:4 LN:191154276 M5:23dccd106897542ad87d2765d28a19a1 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37 SP:Human
@SQ SN:5 LN:180915260 M5:0740173db9ffd264d728f32784845cd7 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37 SP:Human
@SQ SN:6 LN:171115067 M5:1d3a93a248d92a729ee764823acbbc6b UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37 SP:Human
@SQ SN:7 LN:159138663 M5:618366e953d6aaad97dbe4777c29375e UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37 SP:Human
@SQ SN:8 LN:146364022 M5:96f514a9929e410c6651697bded59aec UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37 SP:Human
@SQ SN:9 LN:141213431 M5:3e273117f15e0a400f01055d9f393768 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37 SP:Human
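All ten lines shown above begin with “@”, so they are header lines, which the -h option includes in the SAM output. The sketch below counts header lines and alignment records separately, using a tiny made-up file in place of the real 2.7G SAM file.

```shell
# Build a tiny, made-up SAM-like file: two header lines and three records.
sam=$(mktemp)
printf '@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:11\tLN:1000\n' > "$sam"
printf 'r1\t0\t11\t100\nr2\t0\t11\t200\nr3\t16\t11\t300\n' >> "$sam"

# Header lines begin with "@"; every other line is an alignment record.
counts=$(awk '/^@/ {h++} !/^@/ {r++} END {print h + 0, r + 0}' "$sam")
echo "header lines and records: $counts"

rm -f "$sam"
```

Run against the converted file, the same awk command would show how many lines must be skipped before the first alignment record appears.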
Q5.1. What is the size of the SAM file (in human readable bytes)? See the man page for the “du” command and report the human readable file size. [ 1 point ].
cd /scratch/bl2477/ngs.week1
du -h HG00096.chrom11.ILLUMINA.bwa.GBR.low_coverage.20120522.sam
The human readable file size for the SAM file is 2.7G
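Note that du reports the disk space a file occupies, which can differ slightly from its exact length in bytes. The sketch below compares the two on a small temporary file created only for this illustration.

```shell
# Create a temporary file of exactly 2048 bytes.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1024 count=2 2>/dev/null

# wc -c gives the exact byte count (tr strips padding some systems add).
bytes=$(wc -c < "$f" | tr -d ' ')
echo "exact size in bytes: $bytes"

# du -h gives the space used on disk, rounded to whole file system blocks.
du -h "$f"

rm -f "$f"
```

For a multi-gigabyte SAM file the difference is negligible, so the du -h value is a fair answer to the question.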
Q5.2 How did your /scratch quota change relative to your myquota command from Task 1? Include the output from your terminal into your answer (you can highlight text on your console and copy and paste to your homework document) [ 1 point ].
Disk usage on /scratch increased relative to the myquota output from Task 1. As shown below, 3.30GB of the /scratch allocation is now in use.
Output:
Filesystem   Environment   Backed up?   Allocation      Current Usage
Space        Variable      /Flushed?    Space / Files   Space(%) / Files(%)
/home        $HOME         Yes/No       50.0GB/30.7K    0.00GB(0.00%)/15(0.05%)
/scratch     $SCRATCH      No/Yes       5.0TB/1.0M      3.30GB(0.06%)/7(0.00%)