HISAT2: Gene Expression Counts

This post is aimmed at showing how to count gene expression using HISAT2.

Loading and preparing Data

Data that I need are cfRDNA sequences downloaded from ENR Browser.

SRR15852393.fastq

SRR15852394.fastq

SRR15852395.fastq

The first step is to check quality of data

system("fastqc SRR15852393.fastq")

Then I trim data as:

system("
       java -jar /Users/nnthieu/genome_tools/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 6 \
./reads/SRR15852394_1.fastq ./reads/SRR15852394_2.fastq \
SRR15852394_trimmed_1P.fastq SRR15852394_trimmed_1U.fastq \
SRR15852394_trimmed_2P.fastq SRR15852394_trimmed_2U.fastq \
ILLUMINACLIP:adapters.fa:2:30:10 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36
")

The output files are:

SRR31264073_1_trimmed_1P.fastq

SRR31264073_2_trimmed_2P.fastq

SRR31264074_1_trimmed_1P.fastq

SRR31264074_2_trimmed_2P.fastq

Then I merge these files into SRR31264073_trimmed.fastq and SRR31264074_trimmed.fastq

system("cat SRR15852394_trimmed_1P.fastq SRR15852394_trimmed_2P.fastq > ./data/SRR15852394_trimmed.fastq
 ")

HISAT2: run alignment

In this step I use hisat2 indexes reference genome downloaded from ‘HISAT2 Download’ and save in ‘genome_ref’ directory.

system("
       hisat2 -q --rna-strandness R -x /Users/nnthieu/genome_ref/hisat2_indexes/grch38/genome \
-U /Volumes/Extreme/rnaseq/data/SRR15852394_trimmed.fastq | \
samtools sort -o ./hisat2/SRR15852394_trimmed.bam
")

The output are bam files

SRR15852393_trimmed.bam

SRR15852394_trimmed.bam

HISAT2: runing featureCounts - Quantification

I need a reference genome for hisat2:

Homo_sapiens.GRCh38.106.gtf.gz

downloaded from http://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/Homo_sapiens.GRCh38.106.gtf.gz

system("wget http://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/Homo_sapiens.GRCh38.106.gtf.gz
")

runing featureCounts

system("
       featureCounts -S 2 -a /Users/nnthieu/genome_ref/gtf/Homo_sapiens.GRCh38.106.gtf -o /Volumes/Extreme/rnaseq/quantification/SRR15852394_featurecounts.txt /Volumes/Extreme/rnaseq/hisat2/SRR15852394_trimmed.bam
")

The output files are SRR15852393_featurecounts.txt and SRR15852394_featurecounts.txt

I want to get out data of geneid and counts, so I extract data as:

system("cat ./quantification/SRR15852394_featurecounts.txt | cut -f 1,7 > ./quantification/SRR15852394_quants.txt
")

Quantification

Merge txt files for many samples using R

# Load the data
sample1 <- read.table("/Volumes/Extreme/rnaseq/quantification/SRR15852393_quants.txt", header=TRUE, sep="\t")

sample2 <- read.table("/Volumes/Extreme/rnaseq/quantification/SRR15852394_quants.txt", header=TRUE, sep="\t")

# Merge the data on 'Geneid'
merged_tab <- merge(sample1, sample2, by="Geneid", suffixes=c("_Sample1", "_Sample2"))

# Write to a file
write.table(merged_tab, "/Volumes/Extreme/rnaseq/quantification/merged_counts.txt", sep="\t", row.names=FALSE, quote=FALSE)

the merged txt file is:

The merged file is now ready for gene expression analysis comparing between samples and groups difference.

#—-