Check the kallisto version to confirm the correct file path to the program with the following command:
/home/shared/kallisto/kallisto
kallisto index -i
This code builds an index from the file Montipora_capitata_HIv3.genes.cds.fna and writes it to mcap_rna_kallisto.index. /home/shared/kallisto/kallisto is the absolute path to the kallisto program on the raven server, while the arguments after the index command indicate where to write the index (mcap_rna_kallisto.index) and where to get the data from (the Montipora_capitata_HIv3.genes.cds.fna file).
# if it doesn't already exist, make an index folder inside output
mkdir -p ../output/kallisto-index
# use kallisto to build an index file
/home/shared/kallisto/kallisto \
index -i \
../output/kallisto-index/mcap_rna_kallisto.index \
../data/Montipora_capitata_HIv3.genes.cds.fna
abundance.tsv for each sample with kallisto quant -i
create a subdirectory kallisto-abundance in the output folder using mkdir -p
use the find utility to search for all files in the /home/shared/8TB_HDD_01/mcap/ directory that match the pattern *.fastq
use the basename command to extract the base file name of each file (i.e., the file name without the directory path), and remove the suffix .fastq.
run the kallisto quant command on each input file, with the following options:
-i ../output/kallisto-index/mcap_rna_kallisto.index: Use the kallisto index file located at ../output/kallisto-index/mcap_rna_kallisto.index.
-o ../output/kallisto-abundance/{}: Write the output files to a directory called ../output/kallisto-abundance/ with a subdirectory named after the base filename of the input file (the {} is a placeholder for the base filename).
-t 40: Use 40 threads for the computation.
--single -l 100 -s 10: Specify that the input file contains single-end reads (--single), with an estimated average fragment length of 100 (-l 100) and an estimated standard deviation of fragment length of 10 (-s 10).
The input file to process is specified using the {} placeholder, which is replaced by the base file name from the previous step.
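The placeholder substitution described in the steps above can be previewed without running kallisto. The following is a minimal sketch under stated assumptions: it creates two empty, hypothetical files (sampleA.fastq and sampleB.fastq) in a temporary directory and puts echo in place of the kallisto call, so it only prints the commands that would be run. The READS_DIR label is introduced here only to keep the printed output readable.

```shell
# Dry run of the find | basename | xargs pattern using hypothetical sample files.
demo_dir=$(mktemp -d)
touch "$demo_dir/sampleA.fastq" "$demo_dir/sampleB.fastq"

# echo stands in for the kallisto call, so the commands are printed, not executed.
# sed replaces the temporary directory with the label READS_DIR for readability.
dry_run=$(find "$demo_dir"/*.fastq \
  | xargs basename -s .fastq \
  | xargs -I{} echo kallisto quant -o ../output/kallisto-abundance/{} "$demo_dir"/{}.fastq \
  | sed "s|$demo_dir|READS_DIR|g")

printf '%s\n' "$dry_run"
rm -r "$demo_dir"
```

Each printed line shows one sample name filling both {} positions: once in the output folder and once in the input file name.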
# make a directory in the output folder for the kallisto abundance results (the `-p` option creates intermediate directories as needed and avoids an error if the directory already exists)
mkdir -p ../output/kallisto-abundance
find /home/shared/8TB_HDD_01/mcap/*.fastq \
| xargs basename -s .fastq | xargs -I{} /home/shared/kallisto/kallisto \
quant -i ../output/kallisto-index/mcap_rna_kallisto.index \
-o ../output/kallisto-abundance/{} \
-t 40 \
--single -l 100 -s 10 /home/shared/8TB_HDD_01/mcap/{}.fastq
This makes an individual folder for each sample. In each folder there should be three files:
- run_info.json
- abundance.tsv
- abundance.h5
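One way to confirm that every sample folder contains the three expected files is a small check like the one below. This is a sketch, not part of the original workflow: the function name check_kallisto_outputs is made up for illustration, and it treats an empty file the same as a missing one.

```shell
# Hypothetical helper: report any sample folder that lacks an expected kallisto output.
check_kallisto_outputs() {
  out_root=$1
  missing=0
  for sample_dir in "$out_root"/*/; do
    for f in run_info.json abundance.tsv abundance.h5; do
      # -s is true only when the file exists and is not empty
      if [ ! -s "$sample_dir$f" ]; then
        echo "missing or empty: $sample_dir$f"
        missing=1
      fi
    done
  done
  return $missing
}

# Usage, with the output folder from the command above:
# check_kallisto_outputs ../output/kallisto-abundance
```

The function returns a non-zero status when anything is missing, so it can gate the next step of the workflow.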
This next command runs the abundance_estimates_to_matrix.pl script from the Trinity RNA-seq assembly software package to create a gene expression matrix from kallisto output files.
The specific options and arguments used in the command are as follows:
perl /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl: Run the abundance_estimates_to_matrix.pl script from Trinity.
--est_method kallisto: Specify that the abundance estimates were generated using kallisto.
--gene_trans_map none: Do not use a gene-to-transcript mapping file.
--out_prefix ../output/trinity-matrix/mcap: Write the gene expression matrix files to the ../output/trinity-matrix directory, using mcap as the file name prefix.
--name_sample_by_basedir: Use the sample directory name (i.e., the final directory in the input file paths) as the sample name in the output matrix.
Use the kallisto abundance.tsv files generated in the last step as input for creating the gene expression matrix.
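To make the --name_sample_by_basedir naming rule concrete, the sketch below reproduces it in plain shell: the sample label is the name of the directory that contains abundance.tsv. It imitates the behaviour described above and does not call Trinity. The function name sample_name_from_path and the sampleA path are hypothetical.

```shell
# Hypothetical helper: take the last directory in the path as the sample label.
sample_name_from_path() {
  basename "$(dirname "$1")"
}

# Hypothetical path, used only for illustration; prints: sampleA
sample_name_from_path ../output/kallisto-abundance/sampleA/abundance.tsv
```

This is why the per-sample folder names created by kallisto quant end up as the column names of the matrix.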
trinity gene expression matrix
# If it doesn't already exist, create a folder named 'trinity-matrix' inside the 'output' folder
mkdir -p ../output/trinity-matrix
# Collect the path to each sample's 'abundance.tsv' file in a bash array
abundance_paths=(../output/kallisto-abundance/*/abundance.tsv)
# Create the gene expression matrix with Trinity, passing each path in `abundance_paths` as a separate argument
perl /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl \
--est_method kallisto \
--gene_trans_map none \
--name_sample_by_basedir \
--out_prefix ../output/trinity-matrix/mcap \
"${abundance_paths[@]}"
Getting some errors I don’t understand:
R --no-save --no-restore --no-site-file --no-init-file -q < ../output/trinity-gem/.isoform.TPM.not_cross_norm.runTMM.R 1>&2
sh: 1: R: not found
Error, cmd: R --no-save --no-restore --no-site-file --no-init-file -q < ../output/trinity-gem/.isoform.TPM.not_cross_norm.runTMM.R 1>&2 died with ret (32512) at /home/shared/trinityrnaseq-v2.12.0/util/support_scripts/run_TMM_scale_matrix.pl line 105.
Error, CMD: /home/shared/trinityrnaseq-v2.12.0/util/support_scripts/run_TMM_scale_matrix.pl --matrix ../output/trinity-gem/.isoform.TPM.not_cross_norm > ../output/trinity-gem/.isoform.TMM.EXPR.matrix died with ret 6400 at /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl line 385.
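The line `sh: 1: R: not found` in the log above suggests that the shell could not find an R executable on its PATH when Trinity's TMM scaling script tried to launch it. The return value 32512 is consistent with that, since it is 127 × 256 and 127 is the shell's "command not found" status. A quick check before rerunning might look like the sketch below. The helper name require_cmd is made up for illustration.

```shell
# Hypothetical helper: succeeds only if the named program is on the PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1 -> $(command -v "$1")"
  else
    echo "not found on PATH: $1" >&2
    return 1
  fi
}

# R is the program the TMM scaling step tries to launch.
require_cmd R || echo "install R or add it to PATH before rerunning the matrix script"
```

If R is installed but reported as not found, the shell that runs the chunk may be using a different PATH from the interactive session.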