06-slides

Project Goal

I am using RNA-seq data from sea cucumbers (Apostichopus japonicus) exposed to two elevated temperatures (26°C and 30°C). The goal is to perform differential gene expression (DGE) analysis to determine the biological responses that heat stress induces in this organism. The data were obtained from the NIH website and were generated by researchers at Qingdao Agricultural University.

Methods: Heat stress experiment

  • Three controls kept at 18°C

  • Six at 26°C (sublethal temperature)

  • Three at 30°C (lethal temperature)

  • Sea cucumbers were heated from 18°C to either 26°C or 30°C at a rate of 2°C per hour using a heating rod.

  • The 26°C animals were held at that temperature for 6 hours or 48 hours (two time points).

  • The 30°C animals were held at that temperature for only 6 hours (likely due to lethality).

  • Intestine tissue was used for RNA-seq.

Preliminary Results: FastQC results

/home/shared/8TB_HDD_02/hannia/SeaCucumber/FastQC/fastqc \
/home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq/*.fastq \
-o /home/shared/8TB_HDD_02/hannia/SeaCucumber/output

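Conclusion: The FastQC reports show a red “X” (FAIL) for the per-base sequence content and sequence duplication levels modules in all samples. Both modules often fail for RNA-seq libraries (random-hexamer priming bias at the start of reads, and highly expressed transcripts), so these flags do not necessarily indicate poor data.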

Screenshot of the 30°C data.
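Rather than reading each HTML report, the FAIL calls can be listed for all samples at once. A minimal sketch, assuming FastQC is rerun with its `--extract` option so each report is unpacked into a `<sample>_fastqc/` folder containing `summary.txt` (tab-separated: status, module, file name); the function name `list_fastqc_fails` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list every FastQC module flagged FAIL (the red "X") per input file.
# Assumption: FastQC was run with --extract, so each report folder
# <sample>_fastqc/ contains summary.txt with columns: status, module, file name.

list_fastqc_fails() {
    local qc_dir="$1" summary
    for summary in "$qc_dir"/*_fastqc/summary.txt; do
        [ -e "$summary" ] || continue   # glob matched nothing
        # Print "file name<TAB>module" for each failed module
        awk -F'\t' '$1 == "FAIL" { print $3 "\t" $2 }' "$summary"
    done
}

# Example call on the FastQC output directory used above (skipped if absent)
QC_DIR="/home/shared/8TB_HDD_02/hannia/SeaCucumber/output"
if [ -d "$QC_DIR" ]; then
    list_fastqc_fails "$QC_DIR"
fi
```

Printing one line per failure makes it easy to confirm that the same two modules fail in every sample, for example by piping the output through `cut -f2 | sort | uniq -c`.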

Preliminary Results: Pseudo-alignment

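# Note: a single kallisto call on every FASTQ file quantifies all of them
# together as one sample; the per-sample loop below quantifies each run separately
/home/shared/kallisto/kallisto quant \
  -i /home/shared/8TB_HDD_02/hannia/SeaCucumber/index.idx \
  -o /home/shared/8TB_HDD_02/hannia/SeaCucumber/output \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq/*.fastq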
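Since one 30°C file is still missing, it may help to confirm before quantification that every forward file has a reverse mate with the same number of reads. A minimal sketch, assuming uncompressed FASTQ (4 lines per record) and the `_1.fastq`/`_2.fastq` naming used here; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that each forward file (<ID>_1.fastq) has a reverse mate
# (<ID>_2.fastq) with the same number of reads before running kallisto.
# Assumption: uncompressed FASTQ with 4 lines per record.

count_reads() {
    # reads = lines / 4
    echo $(( $(wc -l < "$1") / 4 ))
}

check_pairs() {
    local dir="$1" status=0 r1 r2 sample n1 n2
    for r1 in "$dir"/*_1.fastq; do
        [ -e "$r1" ] || continue
        sample=$(basename "$r1" _1.fastq)
        r2="${dir}/${sample}_2.fastq"
        if [ ! -e "$r2" ]; then
            echo "MISSING_MATE ${sample}"
            status=1
            continue
        fi
        n1=$(count_reads "$r1")
        n2=$(count_reads "$r2")
        if [ "$n1" -ne "$n2" ]; then
            echo "COUNT_MISMATCH ${sample} ${n1} ${n2}"
            status=1
        else
            echo "OK ${sample} ${n1}"
        fi
    done
    return "$status"   # non-zero if any pair is incomplete
}

# Example call on the FASTQ directory used above (skipped if absent)
FASTQ_DIR="/home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq"
if [ -d "$FASTQ_DIR" ]; then
    check_pairs "$FASTQ_DIR"
fi
```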

Preliminary Results: Path to DGE Analysis

1. RNA-seq quantification using Kallisto

# Set input and output directories
INPUT_DIR="/home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq"
OUTPUT_DIR="/home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01"
INDEX="/home/shared/8TB_HDD_02/hannia/SeaCucumber/index.idx"
KALLISTO="/home/shared/kallisto/kallisto"

# Loop through all forward reads (_1.fastq)
for R1 in "${INPUT_DIR}"/*_1.fastq; do
    # Extract the base sample ID (e.g., SRR19635628)
    SAMPLE=$(basename "$R1" _1.fastq)

    # Define the matching reverse read
    R2="${INPUT_DIR}/${SAMPLE}_2.fastq"

    # Create an output directory for this sample
    SAMPLE_OUT="${OUTPUT_DIR}/${SAMPLE}"
    mkdir -p "$SAMPLE_OUT"

    # Run kallisto quant on the read pair (40 threads)
    "$KALLISTO" quant -i "$INDEX" -o "$SAMPLE_OUT" -t 40 "$R1" "$R2"
done

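kallisto also writes a `run_info.json` into each output folder, which can be used to compare how many reads pseudoaligned per sample. A minimal sketch, assuming the field names `n_processed` and `p_pseudoaligned` as written by recent kallisto versions (one field per line); check one file before relying on it. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarize each kallisto run from its run_info.json
# (reads processed and percent pseudoaligned), one line per sample.
# Assumption: "n_processed" and "p_pseudoaligned" fields, one per line.

report_pseudoalignment() {
    local out_dir="$1" info sample n pct
    printf 'sample\tn_processed\tp_pseudoaligned\n'
    for info in "$out_dir"/*/run_info.json; do
        [ -e "$info" ] || continue
        sample=$(basename "$(dirname "$info")")   # per-sample folder name
        n=$(sed -n 's/.*"n_processed": *\([0-9]*\).*/\1/p' "$info")
        pct=$(sed -n 's/.*"p_pseudoaligned": *\([0-9.]*\).*/\1/p' "$info")
        printf '%s\t%s\t%s\n' "$sample" "$n" "$pct"
    done
}

# Example call on the per-sample output directory used above (skipped if absent)
OUTPUT_DIR="/home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01"
if [ -d "$OUTPUT_DIR" ]; then
    report_pseudoalignment "$OUTPUT_DIR"
fi
```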
2. Combining the kallisto abundance estimates into a gene expression matrix.

perl /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl \
  --est_method kallisto \
  --gene_trans_map none \
  --out_prefix /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01 \
  --name_sample_by_basedir \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635628/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635629/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635630/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635631/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635632/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635633/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635634/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635635/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635636/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635637/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635638/abundance.tsv
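Before the DGE step, a quick check of the combined matrix can confirm the number of sample columns and how many transcripts have zero counts in every sample. A minimal sketch, assuming a tab-separated matrix whose header row starts with an empty field followed by the sample names; the output file name `kallisto_01.isoform.counts.matrix` is an assumption based on the `--out_prefix` above and should be confirmed with `ls`.

```shell
#!/usr/bin/env bash
# Sketch: sanity check of a transcript count matrix before DGE.
# Assumptions: tab-separated, header row whose first field is empty
# (sample names follow), one transcript per row after that.

summarize_matrix() {
    awk -F'\t' '
        NR == 1 { n_samples = NF - 1; next }   # header row
        {
            total++
            zero = 1
            for (i = 2; i <= NF; i++) if ($i + 0 > 0) { zero = 0; break }
            if (zero) all_zero++
        }
        END {
            printf "samples=%d transcripts=%d all_zero=%d\n", n_samples, total, all_zero
        }' "$1"
}

# Assumed file name (see lead-in); skipped if absent
MATRIX="/home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01.isoform.counts.matrix"
if [ -f "$MATRIX" ]; then
    summarize_matrix "$MATRIX"
fi
```

With 11 abundance files listed above, `samples=11` would be expected until the missing 30°C file is added.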

3. Top 100 Differential Expression Results

4. Top 50 most differentially expressed genes.
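Top-N tables like these can be regenerated from any tab-separated results file by sorting on one column. A minimal sketch, assuming a header row and a user-supplied column number holding the value to rank by (for example an adjusted p-value); the column number and file name in the example are hypothetical, and rows with `NA` in that column are dropped.

```shell
#!/usr/bin/env bash
# Sketch: keep the header plus the N rows with the smallest value in one
# column (e.g. an adjusted p-value), dropping rows where that value is NA.
# Assumptions: tab-separated table with one header row; column number is
# supplied by the user (hypothetical; check your DGE output).

top_n_by_column() {
    local table="$1" col="$2" n="$3"
    head -n 1 "$table"                                   # keep header
    tail -n +2 "$table" \
        | awk -F'\t' -v c="$col" '$c != "NA" && $c != ""' \
        | LC_ALL=C sort -t "$(printf '\t')" -k "${col},${col}" -g \
        | head -n "$n"
}

# Example (hypothetical file name and column): top 50 by column 7
# top_n_by_column dge_results.tsv 7 50
```

`sort -g` is used instead of `-n` so that values in scientific notation such as `1e-10` sort correctly.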

Plan for the next 4 weeks

  • Import the missing file for the 30°C treatment

  • Identify the genes behind the IDs in the results so the results can be interpreted

  • Relabel the tables so the sample names are clearer (e.g., instead of XM32333 -> Control 1, Control 2, 26_1, 26_2, 26_3, etc.)

  • Review the literature and ask for advice on how to conduct the DGE analysis and interpret the results

  • Complete a comprehensive analysis, from data QC to DGE analysis
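For the relabeling step, one option is to rewrite only the header row of a matrix from a two-column lookup file. A minimal sketch, assuming a hypothetical tab-separated mapping file (`sample_labels.tsv`: old name, new label) that would have to be filled in by hand from the run metadata; columns not in the mapping keep their original names and data rows are left unchanged.

```shell
#!/usr/bin/env bash
# Sketch: rename the sample columns in a matrix header using a two-column,
# tab-separated mapping file (old_name<TAB>new_name), e.g. an SRR run ID
# mapped to a label such as "Control 1" or "26_1". Columns missing from the
# mapping keep their original name; data rows are printed unchanged.

relabel_header() {
    local matrix="$1" map="$2"
    awk -F'\t' -v OFS='\t' '
        NR == FNR { label[$1] = $2; next }   # first file: build the lookup
        FNR == 1 {                           # header row of the matrix
            for (i = 1; i <= NF; i++) if ($i in label) $i = label[$i]
        }
        { print }' "$map" "$matrix"
}

# Example (hypothetical file names; the mapping must be filled in from the
# run metadata for PRJNA848687):
# relabel_header kallisto_01.isoform.counts.matrix sample_labels.tsv > relabeled.matrix
```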