ISG 5311 Final Project

Author

Stephanie Ballas

Published

December 7, 2025

1 Introduction / Background

Pyrethroid pesticides, such as deltamethrin (DM), are widely used in agriculture and aquaculture due to their high insecticidal activity and relatively low toxicity compared to older pesticides. DM is effective against a broad spectrum of pests, making it a popular choice for pest and disease control in aquatic environments (Guardiola et al. (2014)). However, DM frequently enters rivers and lakes through surface runoff and wastewater discharge, posing risks to aquatic organisms. Fish are particularly vulnerable because they lack efficient mechanisms to metabolize and degrade DM, leading to toxic effects even at environmentally relevant concentrations (Wu et al. (2020)).

Goldfish (Carassius auratus) are a common freshwater species widely distributed in many water bodies and are often used as a model organism in toxicology research (Blanco et al. (2018)). Their ecological relevance, manageable size, and ease of maintenance make them suitable for laboratory experiments, while their physiology provides insight into potential impacts of environmental contaminants on aquatic species. Prior studies have shown that DM exposure can induce oxidative stress, apoptosis, and immune dysfunction in various fish species, but the molecular mechanisms underlying these effects are not fully understood (Zhou et al. (2021)).

The aim of this study was to investigate the effects of acute, environmentally relevant concentrations of DM on goldfish kidney transcriptomes and intestinal microbiota. Nine healthy goldfish were randomly assigned to three treatment groups: 0 µg/L control, 0.2 µg/L DM, and 2 µg/L DM, with three fish per group. Fish were exposed for 96 hours under controlled laboratory conditions. Kidney tissues were collected and RNA extracted for high-throughput RNA-sequencing to identify differentially expressed genes and characterize affected biological pathways, including apoptosis, immune response, and drug metabolism. Additionally, intestinal contents were collected to assess changes in microbial composition using 16S rRNA sequencing.

By combining transcriptomic and microbiota analyses, this study provides an integrative understanding of how DM affects fish at both the molecular and microbial levels. The results are expected to reveal key pathways disrupted by DM exposure and highlight the broader impacts of environmental contaminants on fish physiology and gut microbial communities (Zhou et al. (2021)). This work contributes to the growing field of aquatic toxicology by providing detailed molecular insights into DM-induced toxicity in a widely studied model species.

2 Methods

Methods were adapted from Zhou et al. (2021) unless otherwise noted.

2.1 Experimental Design and Samples

Eighty-one healthy goldfish (Carassius auratus) were purchased from a commercial supplier (Hanjin Ornamental Fish Farm, Wuhan, China) and acclimated for two weeks in aerated, dechlorinated water. During acclimation, fish were held in large tanks under controlled temperature (19–21 °C), pH (6.9–7.3), and dissolved oxygen (~6.2 mg/L), with a consistent light:dark cycle of 10 hours of light to 14 hours of darkness. Fish were fed once daily with a commercial pellet diet, and one-third of the water was exchanged every three days to maintain water quality.

Deltamethrin (>98% purity) was dissolved in dimethyl sulfoxide (DMSO) to generate stock solutions. Fish were randomly assigned to one of three treatment groups: 0 µg/L DM (control), 0.2 µg/L DM, and 2 µg/L DM (n = 9 fish per group). These concentrations reflect environmentally relevant DM levels reported in aquatic systems. All groups, including controls, contained trace amounts of DMSO (<0.001% v/v). Fish were exposed to their assigned treatments for 96 hours. Solutions were renewed daily to maintain stable DM levels, and fish were not fed during the exposure period.

2.2 Sample Collection

After 96 hours exposure, nine specimens were selected from each group to collect intestinal contents and pooled for transctiprome anslysis. Simultaneously, three specimens were randomly selected from each group and the kidney was removed. The kidney specimens were used for RNA-seq analysis.

2.3 Transcriptome Processing and Analysis

2.3.1 RNA Extraction and Sequencing

Total RNA was extracted from kidney tissue using TRIzol Reagent (Life Technologies) in accordance with the manufacturer’s protocol. A NanoDrop 2000 was used to assess RNA quantity and purity, and an Agilent 2100 Bioanalyzer was used to confirm RNA integrity. Only high-quality RNA samples were processed for sequencing. Libraries were prepared and sequenced on an Illumina NovaSeq 6000 to produce paired-end reads. Raw RNA-seq data have been submitted to the Sequence Read Archive (SRA) database in NCBI (accession number: SRP323663).

2.3.2 Bioinformatic Workflow

Raw reads were filtered to remove low-quality sequences, adapter contamination, and poly-N content. Clean reads were aligned to the goldfish reference genome (GCF_003368295.1) using HISAT2 v2.2.1. Gene-level counts were generated and normalized using DESeq2 v1.38.0 within R.

Differentially expressed genes (DEGs) were identified based on false discovery rate (FDR) < 0.05 and |log2 fold change| > 1. Functional enrichment of DEGs was performed using GOseq for Gene Ontology (GO) enrichment and KOBAS for Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. These analyses were used to identify biological pathways associated with apoptosis, immune function, and metabolism.

2.3.3 qPCR Validation

To confirm the reliability of RNA-seq results, a subset of ten DEGs were selected for quantitative real-time PCR (qPCR). Primers were designed using Primer3, and qPCR was conducted using a QuantStudio 3 Real-Time PCR System with ChamQ Universal SYBR qPCR Master Mix. Each reaction contained SYBR qPCR Master Mix, forward and reverse primers, cDNA template, and nuclease-free water. Thermal cycling conditions were: 95 °C for 30 seconds followed by 40 cycles of denaturing at 95 °C for 10 seconds and annealing at 60 °C for 30 seconds. The β-actin gene was used as a reference (housekeeping) gene, its expression level was assumed to remain stable across all samples. This allows for comparison of the expression of the target DEGs relative to a consistent baseline. To quantify gene expression differences, the 2⁻ΔΔCt method was applied. This method uses the cycle threshold (Ct) values from qPCR to calculate relative fold changes in gene expression between experimental groups.

2.4 Intestinal Microbiota Analysis

2.4.1 DNA Extraction and Sequencing

DNA was extracted from intestinal contents using the QIAamp DNA Stool Mini Kit (Qiagen). DNA concentration and purity were measured using a NanoDrop spectrophotometer. Full-length 16S rRNA genes were amplified using primers 27F and 1492R. PCR amplification was performed using Premix Taq (Takara) with the following thermal cycling program: an initial denaturation at 95 °C for 5 minutes; 25 cycles of denaturation at 95 °C for 30 seconds, annealing at 50 °C for 30 seconds, and extension at 72 °C for 1 minute; followed by a final extension at 72 °C for 7 minutes. Libraries were prepped from purified DNA using the SMRTbell Template Prep Kit 1.0, and sequenced on a PacBio SMRT RS II platform.

2.4.2 Sequence Processing and Statistical Analysis

Raw PacBio reads were processed with the SMRT Analysis Suite to generate high-quality circular consensus sequences. Sequences were clustered into operational taxonomic units at 97% similarity using USEARCH v10.0. Taxonomic classification was performed using the RDP Classifier with an 80% confidence threshold.

Microbial diversity analyses were completed in QIIME2, including: alpha diversity (Chao1, ACE, Shannon, and Simpson indices), Good’s coverage to estimate sequencing completeness, and beta diversity (PCoA and UPGMA clustering). All statistical analyses were performed in R. Data were checked for normality and homogeneity of variance. Differences among treatment groups were evaluated using one-way ANOVA followed by Duncan’s multiple range test, with significance set at p < 0.05.

3 Results

3.1 Raw sequencing and preprocessing

Nine kidney RNA-seq libraries, three per treatment group, were sequenced. After quality control, the number of clean reads per library was ~41.36 million and per-library Q30 exceeded 95.09% (see Table 1). Clean reads mapped to the goldfish reference genome (GCF_003368295.1) with mapping rates between 88.37% and 90.71%, indicating high-quality sequencing and good genome coverage.

Table 1: Summary of RNA-seq library quality and mapping statistics
Table 1

3.2 Differential expression analysis

After alignment and normalization, DESeq2 identified 270 DEGs, 160 upregulated and 110 downregulated in the 0.2 µg/L versus control comparison, and 711 DEGs, 309 upregulated and 402 downregulated in the 2 µg/L versus control comparison (Table 2). A set of DEGs overlapped between treatments; the intersection is shown in a Venn diagram (Figure 2D).

Table 2: Summary of differentially expressed genes (DEGs)
Warning: package 'knitr' was built under R version 4.5.2
Warning: package 'ggplot2' was built under R version 4.5.2
Summary of differentially expressed genes (DEGs) {#tab:DEGs}
Comparison Total DEGs Upregulated Downregulated
0.2 µg/L vs Control 270 160 110
2 µg/L vs Control 711 309 402
Figure 1: Differentially Expressed Genes by Treatment

Figure 2: Summary of differentially expressed genes in goldfish (Carassius auratus) after exposure to 0.2 and 2 µg/L deltamethrin Zhou et al. (2021).

3.3 Functional enrichment (GO & KEGG)

GO enrichment for the 0.2 µg/L group assigned 228 DEGs to GO categories: biological process (BP), cellular component (CC) and molecular function (MF) (Figure 3). The most-enriched BP terms included regulation of programmed cell death and regulation of humoral immune response. MF enrichment highlighted sequence-specific DNA-binding transcription factor activity. For the 2 µg/L group, enriched terms included myeloid leukocyte activation and several cellular component terms related to mitochondria and granules.

KEGG enrichment identified 92 pathways significantly affected at 0.2 µg/L and 151 pathways at 2 µg/L. The top enriched pathway at 0.2 µg/L was apoptosis (ko04210) (14 DEGs), and at 2 µg/L phagosome (ko04145) had the highest enrichment (27 DEGs) (Figure 4).

Figure 3: Gene ontology annotation of the differentially expressed genes in the three main GO categories: biological process, cellular component and molecular function. A: 0 μg/L vs 0.2 μg/L deltamethrin; B: 0 μg/L vs 2 μg/L deltamethrin Zhou et al. (2021).

Figure 4: KEGG enrichment based on the DEGs after exposure to (A) 0.2 and (B) 2 µg/L DM compared to the control group (0 µg/L DM). The vertical and horizontal axes represent different pathways and rich factor, respectively Zhou et al. (2021).

3.4 qPCR validation

Ten DEGs were selected for qPCR validation, five in the 0.2 µg/L group and five in the 2 µg/L group. IFI44L, CYP27C1, CTSL, IFIG1 and GIMAP7 are evaluated in the 0.2 µg/L DM-treated group and are compared with control samples. LYS, ALP, C3, GST and CYP1A1 genes are evaluated in the 2 µg/L DM-treated group and are compared with control samples. qPCR fold-change results were consistent with RNA-seq fold changes for all ten genes, supporting the reliability of the transcriptomic data (Figure 5). Error bars indicate mean ± SD (n = 3 biological replicates).

Figure 5: Differentially expressed genes in goldfish kidneys confirmed by qPCR. The X-axis displays 10 DEGs and the Y-axis represents relative fold change. The data are expressed as the means ± SD (n = 3) Zhou et al. (2021).

3.5 Microbiome sequencing and alpha/beta diversity

A total of 88,633 effective circular consensus reads (CCS) were obtained from intestinal samples, with average read lengths between 1,418 and 1,456 bp. Rarefaction curves indicated that the sequencing depth was sufficient to capture the majority of microbial diversity, allowing for reliable downstream analysis. Analysis of alpha-diversity indices showed a slight decrease in microbial diversity in the 0.2 µg/L deltamethrin (DM) group and an increase in the 2 µg/L DM group; however, these changes were not statistically significant (P > 0.05).

Proteobacteria dominated at the phylum level, followed by Firmicutes, Bacteroidetes, Fusobacteria, and Verrucomicrobia. Exposure to 2 µg/L DM increased Proteobacteria, Firmicutes, and Fusobacteria, while Bacteroidetes decreased in both DM-treated groups; Verrucomicrobia increased in the 0.2 µg/L group. At the genus level, Aeromonas, Cetobacterium, Dielma, Pseudorhodobacter, and Akkermansia were most abundant. DM exposure decreased Aeromonas, Cetobacterium, Dielma, and Pseudorhodobacter, but increased Akkermansia. These results indicate that deltamethrin exposure alters intestinal microbial composition in a dose-dependent manner.

Figure 6: Hierarchical cluster analysis based on the unweighted pair group method with arithmetic mean; Relative abundances of dominant microbial genera. (CM: 0 μg/L deltamethrin; LM: 0.2 μg/L deltamethrin; HM: 2 μg/L deltamethrin) Zhou et al. (2021).

Discussion

This project applied a comprehensive RNA-seq bioinformatics pipeline to evaluate the transcriptional effects of DM exposure in goldfish kidney tissue. Beginning with raw sequencing data, each analytical step—from quality control to gene ontology enrichment—contributed to understanding how DM alters gene expression in a dose-dependent manner. Overall, the results strongly align with the original publication while providing additional insight.

The raw RNA-seq dataset and the corresponding goldfish reference genome and annotation (GCF_003368295.1) were downloaded to ensure accurate mapping and quantification. The genome assembly matches that used in the published study, ensuring compatibility between analyses.

Quality control (QC) of the FASTQ files indicated high-quality sequencing suitable for downstream analysis. Per-base Phred quality scores were consistently above 30, N content was negligible, and GC content ranged from 45–47%, closely matching the 46–47% reported in the original study. Adapter contamination was minimal, and overrepresented sequences accounted for less than 1% of reads. Duplication levels were relatively high (68–72%), likely reflecting biological duplication or highly expressed transcripts, as the pattern was consistent across all replicates. No sample displayed abnormal metrics, suggesting the dataset was robust and no libraries required removal.

Reads were aligned to the goldfish genome using a splice-aware aligner (HISAT2). Mapping rates ranged from approximately 88–91%, consistent with the original study, confirming the reference genome’s suitability for kidney transcriptomics. Uniform mapping rates across treatment groups suggest consistent library preparation.

Gene expression quantification was performed using featureCounts to assign reads to annotated genes. Each sample yielded ~41 million clean reads, aligning with the sequencing depth in the original article. Approximately 25–33% of genes were filtered out due to low counts, reflecting typical tissue- and condition-specific expression patterns in fish transcriptomes.

Differential expression analysis using DESeq2 revealed clear, dose-dependent transcriptional responses to DM. In the 0.2 μg/L treatment versus control, 173 genes were upregulated and 123 downregulated (296 total DEGs). In the 2 μg/L group versus control, 531 genes were upregulated and 622 downregulated (1,153 DEGs). The high-versus-low dose comparison identified 1,265 DEGs, demonstrating that increasing DM concentration substantially alters transcriptional profiles. Principal component analysis (PCA) showed clear clustering by treatment with no outliers, confirming data reliability. Heatmap visualization displayed less distinct grouping, likely due to the large number of DEGs and gene-specific expression variability.

These results parallel the original publication, which reported 270 DEGs for the low dose and 711 DEGs for the high dose. Minor differences in DEG counts likely reflect variations in filtering thresholds, DESeq2 versions, or QC cutoffs, but the biological trend is consistent: DM induces a dose-dependent transcriptional response in goldfish kidneys.

Gene Ontology (GO) enrichment analysis contextualized the DEGs within biological processes. Enriched terms included immune activation, apoptosis, complement signaling, programmed cell death regulation, and membrane trafficking, consistent with the original study. These findings reinforce that DM disrupts immune function and cellular homeostasis, even at low exposure levels. Expansion of apoptotic and immune-related GO categories at higher doses suggests a model in which DM induces progressive cellular stress that intensifies with concentration.

Overall, the bioinformatics pipeline—from data download and QC through alignment, quantification, differential expression analysis, and GO enrichment—produced results that are technically sound and biologically meaningful. The findings align with the central conclusions of the original study.

Scripts

Download SRA

This script automates the download of raw RNA-seq data from the NCBI Sequence Read Archive (SRA) using fasterq-dump. It loops through a list of SRR accession numbers, downloads the corresponding FASTQ files, and compresses them with gzip. Job metadata, such as run time and host information, is recorded, and email notifications are configured to monitor job status.

#!/bin/bash
#SBATCH --job-name=fasterq_dump_goldfish
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --mem=15G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

# Load software
module load sratoolkit/3.0.1

# Data is from:
#  https://www.ncbi.nlm.nih.gov/sra/?term=SRP323663
#  https://doi.org/10.1016/j.ecoenv.2021.112716

# Output folder
OUTDIR=~/Goldfish_Deltamethrin_Project/data/fastq
mkdir -p ${OUTDIR}

# Accession list
ACCLIST=~/Goldfish_Deltamethrin_Project/metadata/accessionlist.txt

# Loop through each SRR in the accession list and download
while read SRR; do
    echo "Downloading ${SRR}..."
    fasterq-dump ${SRR} -O ${OUTDIR}
done < ${ACCLIST}

# Compress all FASTQ files
echo "Compressing FASTQ files..."
gzip ${OUTDIR}/*.fastq

date
echo "Download and compression complete."

Download Genome

This script retrieves the goldfish reference genome, GTF annotation, and cDNA sequences from Ensembl using wget. Files are stored in a designated genome directory, decompressed, and indexed using samtools faidx to facilitate downstream alignment and analysis.

#!/bin/bash
#SBATCH --job-name=get_genome
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 4
#SBATCH --mem=4G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH -o /home/FCAM/sballas/Goldfish_Deltamethrin_Project/logs/%x_%j.out
#SBATCH -e /home/FCAM/sballas/Goldfish_Deltamethrin_Project/logs/%x_%j.err

# Load samtools
module load samtools/1.16.1

# Make genome directory
mkdir -p /home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/genome

# Download genome, GTF, and cDNA from Ensembl
wget -c -P /home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/genome \
http://ftp.ensembl.org/pub/release-115/fasta/carassius_auratus/dna/Carassius_auratus.ASM336829v1.dna_sm.toplevel.fa.gz

wget -c -P /home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/genome \
http://ftp.ensembl.org/pub/release-115/gtf/carassius_auratus/Carassius_auratus.ASM336829v1.115.gtf.gz

wget -c -P /home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/genome \
http://ftp.ensembl.org/pub/release-115/fasta/carassius_auratus/cdna/Carassius_auratus.ASM336829v1.cdna.all.fa.gz

# Decompress
gunzip -f /home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/genome/*.gz

# Create samtools index
samtools faidx /home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/genome/Carassius_auratus.ASM336829v1.dna_sm.toplevel.fa
samtools faidx /home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/genome/Carassius_auratus.ASM336829v1.cdna.all.fa

FastQC

The FastQC script performs quality control on raw FASTQ files. It reads a list of sample accessions, runs FastQC on paired-end files, and outputs detailed per-sample quality reports, including read quality, adapter content, and sequence duplication levels.

#!/bin/bash
#SBATCH --job-name=fastqc
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 10
#SBATCH --mem=20G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

echo `hostname`

#################################################################
# FastQC
#################################################################
module load fastqc/0.12.1

# set input/output directory variables
INDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/fastq
REPORTDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/fastqc
mkdir -p $REPORTDIR


# accession list
ACCLIST=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/metadata/accessionlist.txt

# Run FastQC on all FASTQ pairs (no parallel needed)
for ACC in $(cat $ACCLIST); do
    echo "Running FastQC on $ACC..."
    fastqc --outdir $REPORTDIR ${INDIR}/${ACC}_1.fastq.gz ${INDIR}/${ACC}_2.fastq.gz
done

MultiQC

This script aggregates individual FastQC reports into a single summary using MultiQC. It collects output from all samples and generates an integrated HTML report, simplifying the evaluation of overall sequencing quality.

#!/bin/bash
#SBATCH --job-name=multiqc
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 4
#SBATCH --mem=10G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

# Load module
module load MultiQC/1.15

# Set directories
INDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/fastqc
OUTDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/multiqc

mkdir -p $OUTDIR

# run on fastqc output
multiqc -f -o ${OUTDIR} ${INDIR}

Trimmomatic


#!/bin/bash
#SBATCH --job-name=trimmomatic
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 4
#SBATCH --mem=20G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

# Record where and when this ran
hostname
date

#################################################################
# Trimmomatic
#################################################################

# Load Trimmomatic module
module load Trimmomatic/0.39
module list

# Set directories
INDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/fastq
TRIMDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/trimmed_fastq
mkdir -p $TRIMDIR

# Adapter file
ADAPTERS=/isg/shared/apps/Trimmomatic/0.39/adapters/TruSeq3-PE.fa

# Sample list
SAMPLES=(SRR14783835 SRR14783836 SRR14783837 SRR14783838 SRR14783839 SRR14783840 SRR14783841 SRR14783$

# Loop through samples
for SAMPLE in "${SAMPLES[@]}"
do
    echo "Processing ${SAMPLE}..."
    java -jar /isg/shared/apps/Trimmomatic/0.39/trimmomatic-0.39.jar PE -phred33 \
        -threads 4 \
        ${INDIR}/${SAMPLE}_1.fastq.gz ${INDIR}/${SAMPLE}_2.fastq.gz \
        ${TRIMDIR}/${SAMPLE}_trim_1.fastq.gz ${TRIMDIR}/${SAMPLE}_trim_orphans_1.fastq.gz \
        ${TRIMDIR}/${SAMPLE}_trim_2.fastq.gz ${TRIMDIR}/${SAMPLE}_trim_orphans_2.fastq.gz \
        ILLUMINACLIP:${ADAPTERS}:2:30:10 \
        SLIDINGWINDOW:4:25 MINLEN:45
done

FastQC Trimmed

#!/bin/bash
#SBATCH --job-name=fastqc_trimmed
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 4
#SBATCH --mem=10G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

# Load FastQC
module load fastqc/0.12.1

# Directories
TRIMDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/trimmed_fastq
OUTDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/fastqc_trimmed
mkdir -p ${OUTDIR}

# Run FastQC on all trimmed paired-end files
fastqc -t 4 -o ${OUTDIR} ${TRIMDIR}/*_trim_[12].fastq.gz

MultiQC Trimmed

#!/bin/bash
#SBATCH --job-name=multiqc_trimmed
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 4
#SBATCH --mem=10G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

# Load MultiQC
module load MultiQC/1.15

# Directories
FASTQCDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/fastqc_trimmed
OUTDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/multiqc_trimmed

# Run MultiQC to summarize all FastQC reports
multiqc ${FASTQCDIR} -o ${OUTDIR}
MultiQC Results

Index

#!/bin/bash
#SBATCH --job-name=hisat2_index
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 16
#SBATCH --mem=10G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

hostname
date

#################################################################
# Index the Genome
#################################################################

# Load HISAT2
module load hisat2/2.2.1

# Define absolute input/output directories
OUTDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/genome/hisat2_index
mkdir -p $OUTDIR

GENOME=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/genome/Carassius_auratus.ASM336829v1.dna_sm.toplevel.fa

# Build the HISAT2 index
hisat2-build -p 16 $GENOME $OUTDIR/Fhet

Alignment

#!/bin/bash
#SBATCH --job-name=align
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 3
#SBATCH --mem=15G
#SBATCH --partition=xeon
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH -o /home/FCAM/sballas/Goldfish_Deltamethrin_Project/logs/%x_%A_%a.out
#SBATCH -e /home/FCAM/sballas/Goldfish_Deltamethrin_Project/logs/%x_%A_%a.err
#SBATCH --array=1-9%3   # Adjust to the number of samples, max concurrent jobs 3

# Print hostname and date
hostname
date

#################################################################
# Align reads to genome (HISAT2)
#################################################################

module load hisat2/2.2.1
module load samtools/1.16.1

# Absolute paths
FASTQ_DIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/trimmed_fastq
OUTDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/align
mkdir -p ${OUTDIR}

GENOME_DIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/genome
HISAT2_INDEX=${GENOME_DIR}/Carassius_auratus_index  # We'll build index below if needed

ACCLIST=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/metadata/accessionlist.txt

# Build HISAT2 index if it doesn't exist
if [ ! -f "${HISAT2_INDEX}.1.ht2" ]; then
    hisat2-build ${GENOME_DIR}/Carassius_auratus.ASM336829v1.dna_sm.toplevel.fa ${HISAT2_INDEX}
fi

# Get sample for this array task
SAMPLE=$(sed -n ${SLURM_ARRAY_TASK_ID}p ${ACCLIST})

# Run HISAT2 alignment and pipe to sorted BAM
hisat2 \
    -p 3 \
    -x ${HISAT2_INDEX} \
    -1 ${FASTQ_DIR}/${SAMPLE}_trim_1.fastq.gz \
    -2 ${FASTQ_DIR}/${SAMPLE}_trim_2.fastq.gz | \
samtools sort -@ 3 -T ${OUTDIR}/${SAMPLE} -o ${OUTDIR}/${SAMPLE}.bam

# Index BAM file
samtools index ${OUTDIR}/${SAMPLE}.bam

Samtools Stats


#!/bin/bash
#SBATCH --job-name=samstats
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH --mail-type=ALL
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=15G
#SBATCH --qos=general
#SBATCH --partition=general

hostname
date

##################################
# Calculate alignment statistics using Samtools
##################################

# Load software
module load samtools/1.16.1
module load parallel/20180122

# Define directories
INDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/align
OUTDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/samstats
mkdir -p $OUTDIR

# Accession list
ACCLIST=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/metadata/accessionlist.txt

# Run samtools stats for each BAM file
cat $ACCLIST | parallel -j 10 \
    "samtools stats $INDIR/{}.bam > $OUTDIR/{}.stats"

# Combine key stats into one table
FILES=($(find $OUTDIR -name "*.stats" | sort))

# Initialize summary table with metric names (SN lines)
grep "^SN" ${FILES[0]} | cut -f 2 > $OUTDIR/SN.txt

# Loop over each stats file and append columns
for file in ${FILES[@]}; do
    paste $OUTDIR/SN.txt <(grep ^SN $file | cut -f 3) > $OUTDIR/SN2.txt && \
    mv $OUTDIR/SN2.txt $OUTDIR/SN.txt
    echo "Processed $file"
done

# Add header with sample names
cat \
<(echo ${FILES[@]} | sed 's,.*/,,g' | sed 's/.stats//g' | tr ' ' '\t') \
$OUTDIR/SN.txt \
> $OUTDIR/Samtools_Summary.txt

Qualimap

#!/bin/bash
#SBATCH --job-name=qualimap
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH --mail-type=ALL
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=5
#SBATCH --mem=10G
#SBATCH --qos=general
#SBATCH --partition=general

hostname
date

##################################
# Calculate stats on alignments using Qualimap
##################################

# Load required software
module load qualimap/2.2.1
module load parallel/20180122

# Define directories
INDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/align
OUTDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/qualimap
mkdir -p $OUTDIR

# GTF annotation file
GTF=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/genome/Carassius_auratus.ASM336829v1.115.gtf

# Accession list
ACCLIST=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/metadata/accessionlist.txt

# Run Qualimap in parallel
cat $ACCLIST | \
parallel -j 5 \
    qualimap rnaseq \
        -bam $INDIR/{}.bam \
        -gtf $GTF \
        -outdir $OUTDIR/{} \
        --java-mem-size=2G

MultiQC

#!/bin/bash
#SBATCH --job-name=multiqc
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 4
#SBATCH --mem=5G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

hostname
date

#################################################################
# Aggregate reports using MultiQC
#################################################################

module load MultiQC/1.9

# Define input and output directories
STATS=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/samtools_stats
QUALIMAP=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/qualimap
OUTDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/multiqc

# Create output directories if needed
mkdir -p ${OUTDIR}/samtools_stats
mkdir -p ${OUTDIR}/qualimap

# Run MultiQC on Samtools stats output
multiqc -f -o ${OUTDIR}/samtools_stats ${STATS}

# Run MultiQC on Qualimap output
multiqc -f -o ${OUTDIR}/qualimap ${QUALIMAP}

Counts

#SBATCH --job-name=htseq_count
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 4
#SBATCH --mem=20G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=stephanie.ballas@uconn.edu
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

echo `hostname`
date

#################################################################
# Generate Counts
#################################################################
module load htseq/0.13.5

# Directories
INDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/align
OUTDIR=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/results/counts
mkdir -p $OUTDIR

# Accession list
ACCLIST=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/metadata/accessionlist.txt

# GTF annotation file
GTF=/home/FCAM/sballas/Goldfish_Deltamethrin_Project/data/genome/Carassius_auratus.ASM336829v1.115.gtf

echo "Starting HTSeq-count..."

while read SAMPLE; do
    echo "Processing $SAMPLE..."
    htseq-count \
        -s no \
        -r pos \
        -f bam \
        $INDIR/${SAMPLE}.bam \
        $GTF \
        > $OUTDIR/${SAMPLE}.counts
done < $ACCLIST

echo "Done!"
date
Heatmap

PCA Plot

References

Blanco, Ayelén Melisa, Lakshminarasimhan Sundarrajan, Juan Ignacio Bertucci, and Suraj Unniappan. 2018. “Why Goldfish? Merits and Challenges in Employing Goldfish as a Model Organism in Comparative Endocrinology Research.” General and Comparative Endocrinology 257: 13–28. https://doi.org/https://doi.org/10.1016/j.ygcen.2017.02.001.
Guardiola, F. A., P. Gónzalez-Párraga, J. Meseguer, A. Cuesta, and M. A. Esteban. 2014. “Modulatory Effects of Deltamethrin-Exposure on the Immune Status, Metabolism and Oxidative Stress in Gilthead Seabream (Sparus Aurata l.).” Fish & Shellfish Immunology 36 (1): 120–29. https://doi.org/https://doi.org/10.1016/j.fsi.2013.10.020.
Wu, Yaqin, Wenhua Li, Mingrui Yuan, and Xuan Liu. 2020. “The Synthetic Pyrethroid Deltamethrin Impairs Zebrafish (Danio Rerio) Swim Bladder Development.” Science of The Total Environment 701: 134870. https://doi.org/https://doi.org/10.1016/j.scitotenv.2019.134870.
Zhou, Shun, Jing Dong, Yongtao Liu, Qiuhong Yang, Ning Xu, Yibin Yang, and Xiaohui Ai. 2021. “Effects of Acute Deltamethrin Exposure on Kidney Transcriptome and Intestinal Microbiota in Goldfish (Carassius Auratus).” Ecotoxicology and Environmental Safety 225: 112716. https://doi.org/https://doi.org/10.1016/j.ecoenv.2021.112716.