This post is aimmed at showing how to count gene expression using HISAT2.
Data that I need are cfRDNA sequences downloaded from ENR Browser.
SRR15852393.fastq
SRR15852394.fastq
SRR15852395.fastq
The first step is to check quality of data
system("fastqc SRR15852393.fastq")
Then I trim data as:
system("
java -jar /Users/nnthieu/genome_tools/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 6 \
./reads/SRR15852394_1.fastq ./reads/SRR15852394_2.fastq \
SRR15852394_trimmed_1P.fastq SRR15852394_trimmed_1U.fastq \
SRR15852394_trimmed_2P.fastq SRR15852394_trimmed_2U.fastq \
ILLUMINACLIP:adapters.fa:2:30:10 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36
")
The output files are:
SRR31264073_1_trimmed_1P.fastq
SRR31264073_2_trimmed_2P.fastq
SRR31264074_1_trimmed_1P.fastq
SRR31264074_2_trimmed_2P.fastq
Then I merge these files into SRR31264073_trimmed.fastq and SRR31264074_trimmed.fastq
system("cat SRR15852394_trimmed_1P.fastq SRR15852394_trimmed_2P.fastq > ./data/SRR15852394_trimmed.fastq
")
In this step I use hisat2 indexes reference genome downloaded from ‘HISAT2 Download’ and save in ‘genome_ref’ directory.
system("
hisat2 -q --rna-strandness R -x /Users/nnthieu/genome_ref/hisat2_indexes/grch38/genome \
-U /Volumes/Extreme/rnaseq/data/SRR15852394_trimmed.fastq | \
samtools sort -o ./hisat2/SRR15852394_trimmed.bam
")
The output are bam files
SRR15852393_trimmed.bam
SRR15852394_trimmed.bam
I need a reference genome for hisat2:
Homo_sapiens.GRCh38.106.gtf.gz
downloaded from http://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/Homo_sapiens.GRCh38.106.gtf.gz
system("wget http://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/Homo_sapiens.GRCh38.106.gtf.gz
")
system("
featureCounts -S 2 -a /Users/nnthieu/genome_ref/gtf/Homo_sapiens.GRCh38.106.gtf -o /Volumes/Extreme/rnaseq/quantification/SRR15852394_featurecounts.txt /Volumes/Extreme/rnaseq/hisat2/SRR15852394_trimmed.bam
")
The output files are SRR15852393_featurecounts.txt and SRR15852394_featurecounts.txt
I want to get out data of geneid and counts, so I extract data as:
system("cat ./quantification/SRR15852394_featurecounts.txt | cut -f 1,7 > ./quantification/SRR15852394_quants.txt
")
Quantification
# Load the data
sample1 <- read.table("/Volumes/Extreme/rnaseq/quantification/SRR15852393_quants.txt", header=TRUE, sep="\t")
sample2 <- read.table("/Volumes/Extreme/rnaseq/quantification/SRR15852394_quants.txt", header=TRUE, sep="\t")
# Merge the data on 'Geneid'
merged_tab <- merge(sample1, sample2, by="Geneid", suffixes=c("_Sample1", "_Sample2"))
# Write to a file
write.table(merged_tab, "/Volumes/Extreme/rnaseq/quantification/merged_counts.txt", sep="\t", row.names=FALSE, quote=FALSE)
the merged txt file is:
The merged file is now ready for gene expression analysis comparing between samples and groups difference.
#—-