wget -P /home/shared/8TB_HDD_02/zbengt/github/zach-lncRNA/data http://pver.reefgenomics.org/download/Pver_genome_assembly_v1.0.gff3.gz
gunzip /home/shared/8TB_HDD_02/zbengt/github/zach-lncRNA/data/Pver_genome_assembly_v1.0.gff3.gz
wget -P /home/shared/8TB_HDD_02/zbengt/github/zach-lncRNA/data http://pver.reefgenomics.org/download/Pver_genome_assembly_v1.0.fasta.gz
gunzip /home/shared/8TB_HDD_02/zbengt/github/zach-lncRNA/data/Pver_genome_assembly_v1.0.fasta.gzLong non-coding RNA discovery in Pocillopora sp.
Introduction
This project uses RNA-seq data from Danielle Becker’s experiment exposing colonies of Pocillopora sp. to low level nutrient conditions (nitrogen and phosphorous).
- Original Experiment GitHub Repo
- Project OSF Link
Experimental Design
This study attempts to identify and understand the expression of long non-coding RNAs in corals experiencing low level nutrient enrichment. Long non-coding RNAs (lncRNAs), RNA strands >200bp that do not code for proteins, are implicated in important processes of gene regulation, but little is known about their function in corals.
- Pocillopora sp. corals were experimentall enriched with dissolved inorganic nitrogen and phosphate for 15 months on an oligotrophic fore reef in Mo’orea, French Polynesia.
- Tissue samples were retrieved, went through RNA extraction, and were sent off for sequencing.
- The primary data type used in this study is RNA-seq data.
Objectives
- Identify lncRNAs present within the RNA-seq data
- Create a list of lncRNA transcript IDs
- Examine the sensitivity of lncRNA expression to low level nutrient enrichment with differential expression analysis
Part 1: Finding lncRNAs
Data Wrangling
Get data from reef genomics genome assembly (GFF3) and scaffolds (FASTA) as well as genome assembly GTF created by Roberts Lab Research Scientist, Sam White…
curl -o ../data/Pver_genome_assembly_v1.0-valid.gtf https://gannet.fish.washington.edu/Atumefaciens/20230127-pver-gff_to_gtf/Pver_genome_assembly_v1.0-valid.gtfDownload 32 paired-end RNA-seq FASTQ files from Roberts Lab server, Gannet. This downloads all FASTQ files at once…
wget -r \
--no-check-certificate \
--quiet \
--no-directories --no-parent \
-P /home/shared/8TB_HDD_01/pver \
-A *fastq.gz \
https://gannet.fish.washington.edu/Atumefaciens/hputnam-Becker_E5/Becker_RNASeq/data/trimmed/What an unzipped FASTQ should look like…
@GWNJ-0957:702:GW2012083569th:7:1101:21278:1608 1:N:0:TCCGGAGA+TTTCGCCT
TATGCTGTGTTGAGTTCAGTATGCAACCCAATTGATACTGTTCATTTTGTGTGCTTCTAGGAAACTGAGGACAATATGATATTCTGTAGACAGCCCAGGATGAGAGATTCCTCTGTGCGAGTGGTCTCCAAATAATATCAACATCCTGCA
+
AAAFFJJJJJJJF-AA7FJJJJJFJFAFFJJFJFJJJJJJJJJJAJJJJJFJJAFF<F-<JFFJA<F7FA7AFJJJJJJJJJJJAJFF-FFJJ-7AFAJJJJ<JFJJJJJJFJJJJJJFJJJJJJFFJJJFJJJJJ<F7AJJJJJJFA<F
@GWNJ-0957:702:GW2012083569th:7:1101:13758:1907 1:N:0:TCCGGAGA+ATTCGCCT
GGATCATCATATTTGTTGTTCACCAGGTTAAGCTCGTGGCGTAGATGATCGACGGGTGAATCCAGGTTCTTTATGTCTGCATCATTTTCTTTAAGTCATCTTCTGGCTTCCGTTAGGTCCTCTTGAATTTGCTCGTTCTTGCTCTGCTTG
+
AAFFFJJJJJJF-AF<AFFJJJJJJJJJJJJJJJJJJJJJJJJJFAFJJJJFJJA<JJA<JJJJJAAAJJJJJFJJJJFJJJJJFJJJ-FJ-AFFJ--AJFJJJJJJFFJJAJJJJFJ<JJJJFJJJJJJJJFAFJJA-AFAFFJ<777-
@GWNJ-0957:702:GW2012083569th:7:1101:8389:1924 1:N:0:TCCGCAGA+CTTCGCCT
CCCAAAGTCATTTTCCCAGTCTGTGCGGGGACATCGTGGAGGAGTATAAATGAAACTTGCTCTTGCTGTGTAGTAGAATTTACACTTGGTTACAGGATCAAGGTGAGTTCGGATATCTTTTGTTTTCTTATCAAACCAGTGTGTAATATC
HISAT2
HISAT2 build to create an index for the Pver_genome_assembly_v1.0.fasta file…
/home/shared/hisat2-2.2.1/hisat2-build \
-f ../data/Pver_genome_assembly_v1.0.fasta \
../output/Pver_genome_assembly_v1.0-valid.indexHISAT2 to align paired-end RNA-Seq reads to the Pver_genome_assembly_v1.0.fasta index…
find /home/shared/8TB_HDD_01/pver/*gz \
| xargs basename -s _R1_001.fastq.gz | xargs -I{} \
/home/shared/hisat2-2.2.1/hisat2 \
-x ../output/Pver_genome_assembly_v1.0-valid.index \
-p 8 \
-1 /home/shared/8TB_HDD_01/pver/{}_R1_001.fastq.gz \
-2 /home/shared/8TB_HDD_01/pver/{}_R2_001.fastq.gz \
-S /home/shared/8TB_HDD_01/pver/hisat-output/{}-valid.samSamtools
Samtools to convert the SAM files output from the previous HISAT2 command into sorted BAM files…
for file in /home/shared/8TB_HDD_01/pver/hisat-output/*-valid.sam; do
base=$(basename "$file" -valid.sam)
/home/shared/samtools-1.12/samtools view -bS "$file" | \
/home/shared/samtools-1.12/samtools sort \
-o /home/shared/8TB_HDD_01/pver/samtools-output/"$base"_valid_sorted.bam
doneStringTie
StringTie to assemble transcripts from the sorted BAM files generated by the previous Samtools commands…
find /home/shared/8TB_HDD_01/pver/samtools-output/*bam \
| xargs basename -s -valid_sorted.bam | xargs -I{} \
/home/shared/stringtie-2.2.1.Linux_x86_64/stringtie \
-p 8 \
-G ../data/Pver_genome_assembly_v1.0-valid.gtf \
-o /home/shared/8TB_HDD_01/pver/stringtie-output/{}-valid.gtf \
/home/shared/8TB_HDD_01/pver/samtools-output/{}-valid_sorted.bam \Use StringTie merge to merge all the individual GTF files into a single merged GTF file…
/home/shared/stringtie-2.2.1.Linux_x86_64/stringtie \
--merge \
-G ../data/Pver_genome_assembly_v1.0-valid.gtf \
-o /home/shared/8TB_HDD_01/pver/stringtie-merge-output/stringtie_merged.gtf \
/home/shared/8TB_HDD_01/pver/stringtie-output/*-valid.gtfGffCompare
Use gffcompare to compare annotation file generated by StringTie to a reference annotation file and to produce a set of output files summarizing the results of the comparison, including classification of each transcript…
/home/shared/gffcompare-0.12.6.Linux_x86_64/gffcompare \
-r ../data/Pver_genome_assembly_v1.0-valid.gtf \
-G ../data/Pver_genome_assembly_v1.0-valid.gtf \
-o /home/shared/8TB_HDD_01/pver/gffcompare-output/gffcompare_merged \
/home/shared/8TB_HDD_01/pver/stringtie-merge-output/stringtie_merged.gtf \Filter
Filter to get a subset of the transcripts from the original GTF file that are putative lncRNA candidates based on their length and lack of overlap with known reference transcripts…
awk '$3 == "transcript" && $1 !~ /^#/ {print}' /home/shared/8TB_HDD_01/pver/gffcompare-output/gffcompare_merged.annotated.gtf | grep 'class_code "u"' | awk '$5 - $4 > 199 {print}' > /home/shared/8TB_HDD_01/pver/filter-output/merged_lncRNA_candidates.gtfBedtools
Get fasta of lncRNA candidate regions with Bedtools…
/home/shared/bedtools2/bin/bedtools \
getfasta -fi ../data/Pver_genome_assembly_v1.0.fasta -bed /home/shared/8TB_HDD_01/pver/filter-output/merged_lncRNA_candidates.gtf -fo /home/shared/8TB_HDD_01/pver/bedtools-output/merged_lncRNA_candidates.fasta -name -split>transcript::Pver_Sc0000000_size2095917:21743-24809
CTTACCTTTTAGAGCATGTGCCAATGTCTTTACCCAGCCTGTCATATGACATTCACACGTAACCAGTCAATATTTTCTGTTTTTACAATTTAAATCAGCCTCGCTGAGAATTTGACGTCACTTTTTTTAGTCGTCGAAGAAAGGTGGCTCACCCATGTGATTTCGAGTAATTGTTTCTATCATTCTCATTTATTTCCCCTTTCAACTAACTGAGGTAATTACACAAGGAGCATCGACATTATCTTTTTCAGTTTTGTACGTTCTGAACATGGTTTTCAAGACTATCGTTTGCATTCTGCTTGCGAGTCAAGTATTCAGCATCGGTAGGTCTCTTCCTATATTTATTCCATCGTTTGAGGATGCTCTTAAATATTTCCAAATGTTTTAGCCGGTAACCGATTTTTTTATTTTTTTCTGCTCGTTGCATACTCCATGGTTTTTTTTCTGTGTTCTCTAATCGAGAATTTCGAGTTTTCGGGTAGATTTTTCATACGATAAGAAAAGAATTGACAACTAGCTGCTTCATGGTGGAGCCTATTTATTAATTTTTTTTGTTTCAGGTTATTTTATCTCTCCTCTCTAAAGTTCTCCTTGATGAGAGTCCACCATTTTCTCTGTCAATTTTCCATTCCCTCAATATTCCAAAGAGTTTTTGACCAAACAAAACGAGTTTCTCACCACACTTATTGGATATTTCATAGTACCCTTGGCTTATTACTTCTCCTCTTTGCTTGTGCTATTTTTTTGTTTAATTTGCGTCTGATTCTTCTTAGTTGTTTTTAATTTTTTTCATTTTGTATCTTTTTTAAAAATTCTTTATTTTTACTGATCCCAGCATGTTTCGGGATCTTTTGCGCTAAAATCACTCATGGTCATGTAGAAAAGTCACTCCTAAATGAAATAACTTACAATTTGTCGGTCATTTCTTGTTTACAGACGTCATAGACTACGTACCCGTCGGATGTTACAAGGACATACGCTTAAATAGGGCCTTGCCTACTTTGCTCGCAAACTACCGCGTGAAAAACGCAAAATGGCCAGATATCTTGGACTGGACTGATTTGGAAAACTCTGTCATTAAAAAATGCGCCACTAAGGTTTGCCTGATTATTATTGTTGACATTGTTGTCGTTGTTGATGTTTTTCATAATAGTTGCAAAGAAAAAAAATGGCGAAGGTACTTGCACAGACATCTCTCTTAATTCGCCACGGTACTTACGGTCGTCTTGGTAACTTTATTTTCTTTCTACGCATACAGCTCTGTAAGAAGCCTGTTACACTGAACTGACGTTTTTCAAAATTTTATACGAGAAACATGACGAGCATTTTTTGTCCTATTCAGGCTAAAGAGAGGGGCTATTCATTTTTCGGGATCCAATTCTATGGAGAGTGCTGGAGTGGATCAGCGAGTCAAAGTACCTATGCTAAACACGGTTTATCCAAGAGATGCACCACGGCGGCTCCATATGTGGGAAAGGATCACGCCAACTTCGTTTACATGCTTACTGAAGGTTAGCACTTACTAGGAAAGATTTACATTGATGCTTGTTGTACTTGTGTACATATGTAAGCGACTAATTGATTACTGGAACTGATTCTCCTCCTATTTCGGTTAATCCACATGTGGCGGGTGGATTAAAGAGTCCATTGTACCCAGCCGTTACACGGAAGGCACGTGCTTTAGTCAAGTCAAAGCCTTGTTTCCTTATTTTCATTAATTATTTTTGTTATTCTTGTTGCTTTCAAAGAACTGTTTGTTCTGTTTTTTACTTCTAAAGGTTGATTTGTCTTCTTTATCTTTTAATACTGTGCCGAGTTGGACCCAAACGACCTTTTGCTCAAGACCTCGATAGTTAAAAATTTTGTTTTCGTGGGTTTGATTTTGTGCGTGAAGTATTATTGCGCGAAATATTATTAAACAACGTCATAGTAAGAGTCACAGACAATGGGCGAAAGAAAAAGCAGACGAAGAAAATAGATTACAGGTAGGTACGCGTGCGGTCTAGCACAATTGTCATCGCAATGTTGTACGGTTTCGATACTAAAGCGTCCTTACGCAATTTTGTAACAGTATACCAATATTTTCTCTTTTGTGTATACTTCATGCAAATTCTGCTGTTGTTCTTATGCAGAAAATGAATGTAAAAGCTACAAAATTTTTGACGACGCCTCCCGATCTCAGTTTCATGTAACTAACTATGGACGCCCGATTAGATGCGACAATGGACAGTTGGATGAAAACCGCTGGTATAGATTTACTGGCCAAGCAGGTAGTGCCATGGCAGACCAATGTGTCCCATCCAGCAGGTGTCAGACGTTTGGTACAGGATGGTACAGCGGATCTTATCCTCAGGTAATCCTCAGTTTAAATTTGGATTATCTCTTTGTGTTCTATTGGCCCATGATTTTTTCGTGACCATCTGTATAACGAGGGAGTGGCTTGTTTACTCAAAGTAAAATGTCAACGCATGGGAAATGGTCAGGTAGCAGTGGTAACTATATCATCCACTACCACTCATTAGTCAAAAGTAGTCATTCATTTCGCTTGGGGCATTCTTTTGAAGAGAAGATCTCTTCACCTTAAGATAAACATTTACTGACCAAAGAATCTATTCCACACTTCATGTTGAAAGTTAATGAACTAACTTGGACTAACTCCAAACTTTGTATCTCTGTCAGTCGTATTCCTTGTTAAGACAAATTCAAACCAAACCAAGGGAATGAACACTAGGAATGCCATATCAATACGCATTAACCATTTTCCATCTATTTTCTTTCTGGTTTTTTTTTTGTAAATTGAAGGTTTCAGACGGAATTCAAGAAGGAGATGTTTGCTTCAAATGGGATGGCAATTGCTGTAGCAAAAAAGTCAACATTCGCATTCGAAACTGTTCATCTTTTTTCGTTTACAAGCTGAACAACAAGATCAGCTGCCAGTATGCATATTGCGGAAACGGGCAGTGATTAGGAGAGGTCTTCAGTAATGAAACGGGAAGTGAACTGACCCACTGTATCAGCCAGTCATTTTTAACATTATTATCTGCTCAACACACAGCTGATATCAAG
>transcript::Pver_Sc0000000_size2095917:205920-207483
GAGACCTTCGGGCTCGAAGCTTGGGAACTAGTATTCAGGGAAAGATAAAATGGCGGCTTACGGCGGCGAAGGAAATCTTAGTAGTTTAATATTTTAGCGTGAAAAGGCTTGGGTGTTGTGATCTGGTTCTCAAGTAAAAAAAAATCACGGTCTGCAATGGCACTACCACCTTTTGAACAACAAGAAAACATTGAAACGGTGGTAAAAAGCAGCAAGTTTCGAAGTCTAAACGAAAGATGGGTAAGCTGAGTTTCAGTCAAATATCATGACATTTGGGATGTAATTTCGATACCGAATAAATAGTTCTACATGATTATACACGCATAGGATTCTGTTGTAAGTGGTAGGAAATAAAATATATGCATACTTGTTTCGTATTGAAGCCCTGTAAAATCACAAATTTGATTTATTTCTTCTTTGTTTTAGGTGGTCATTATCCATTATTTTTGGAAGTCGAATCATACTGATGAGAAAGAAAAAATGTTTACGATTATTTCTTCTCTCGAACATTTTACTATTTAGGTACCAAATAATCTTGATACTGAAGATCATTGCATCCTATTATGAAAAAATTACACTTGATTGCTTTGTCAACTGGAATGAATAATGTAATATACCTTGTGATTTGTGGGATTTTCACAAGTAAACTAGAAATACTTGAAGAGGGTGATGATGTACTTAATGCAGAGCATTTGGTACAAAGTCTGCTTTGCTGTAATCTACCTGAAGGTGCTTTTTATATGCTAGTAAGACTTGCACAGCCCAGTCAAAATGTTTTTGAGTAGTAGACAAGCTGGCTGTAGTAATTATAGGCAGCAGAAGCCAAACCTCTTCCTGAAAAGCAGTCTGAAAAGTGTGAAGTTAGCGGAAGGTACTTCATTGTAGGCAGCCAAATTGTAAGCTGCCAGCCCATTTTGAGTGAGCAGTAAAACTCAAGATTGGCTATTTTAGCTAATTTTATTGAGAACCAAATTCCCAGCAACGGGCATATTTTTACTGGGCTATTTGTTATTTAACTACTTAGTAATATTCTTATTAAAATTAACTGCATTCCTTTCTAATAATTTGCTGCTGATTGTATGCTGTATATAGCTTATGATTTGTGTGATTTTCACAAGTAAATTAGAAATACCTGAAGATGGTAATGATGTACTAGATAATGTACTTGATGCAAAGCTATTTTGGTACAAAGTGTGTGCCTTGTCCGTAGGTGATTTTGCATGCTAATAAGACTTGTAATTCAAATAATACACATCAGCTGGAGTGGGAATATTAAAAGAAAAATAATAATATTTAGCTATACCAAGTTCTGGTTTTATGTCCAAANTTTCACGCTAAAATATTAAACTACTGAGATTTCCTTCGCCGCCGTAAGCCGCCATTTTATCTTTCCCTGAATACTAGTTCCCAAGCTTCGAGCCCGAAGGTCTCAGACATGCACTGAAATTATGATATTCGTTATTTCAGGGCCAATTTAGAAATAACGATAATCGTTATTTTGGTGTCGAGATCATGCTGATCACAACTGAACGCACATTTACCCCTTTCCTCCGCCATCTTGAA
>transcript::Pver_Sc0000000_size2095917:209824-211345
CATATCAAACTCTTTCCCCAGCGCGATGAACATTTGTTTTTCTCGGTTCCAGGTGTACGCGACCATGCACATGCTCTCCGTTGCGAATTTTCAAAATGGCGGACAGGCTAAATAGTGAAATAAAAAAAGCAAAGTGATTTTGCAACAAATACTTAACATAAACCCCCGTTAGAAAGGCGAATCCACGATAGAAAGATACAAAACTGTATGATATTGAGATCAGGGAGGTTGACAAAATCAACAAGAGAGGTTGAAAATTCATTACGTTGGCTATGGCACATGCTTCGATGAATGGACACCCTTCAGGGCCGATAGAGGATCTGGATATTTTCCGTTTGTTCGCAGAGAAAAGCTGTTCATTCCCTCGGAAAATTCTCTAGAGGATAGAACAGAATCGTTCCATGGCCAACTTGCTCGTGAAATAAAGAAAAAGCTTTGGTCTGGACAGCGAAAGGACCCTGATATAAACATAGGCCTTAATGTTGACCAAGATGTTTTCATGGAAGGTCTTGGGAGTGTTTTTCCCGGCGTTCTTCAAAGAGGGAAGATGGTTTACCAATTGGAGTCAAATAGGGTGCTTGATTCATATCTGGGGCTGAAGTGGGATGAAAGAATTTTAAATGTAGGTGAAGATTTTGCCTATGTAATTCCAGGAACTATTAAGTATTGGCTGGGACAAAGAAGTCCAATCATTGAATACAAGCTCATCGTTGTAGGTGTTCTTAAACCCCTTCATAACCTGCTGCCTCCAGGAACTGAATCCTGTTTTTCTTCAGAAACAGTTGTTTGAGGGCATCAACCCCTATGTGAATTGTGGATCTGAAGAAATCTTTAGAAGCTCTATTATTGTATTTTTCCAAGCCTTGCTGTGTGTAATATGCTATGTTCTGATAAAGTGAGAGGAACTCAGGAACATGATAACATAGTGCATGCATGTAGGGAGTGACATCTTTTGCTTGGTAAACTTTGAGAAAGTGCTCAAACCATGATTTGATATTTCTTTGCAGTTCAAGAATACCCTCTTCAGATGTAAAATCTAGCCTCAAATCACCAATTATGTCCATAAATTTAGCCCAGATCACTTGTATGTCTTTGTTTTGACTGCAAAGAGGGACAAGCATAGGTCGGAATGTTTATATTCTGAAAGAGTTTCAACTTTTCTGGTTTGGTAAGATCTCTATATTCCAGCTTTTTGCTCTCTTTGTTGATTCTAAAATTAAAAGTGATTCCCATACTTTTGATAAATTCTTCATAACATGCCATGTGTTTATNCGATCCCTCTACTACTGCTTTCACGCTCACCAAACAGGAAAAAATCAAAAGGAGGAAGATACAGCTGTTTTGAAACGCCATTTGACGAGGGTAAATCTGCTTTTTGTTCGACACTTTACATATCAAAACAAAGGAAATTTGTTCCTCTTTTTGATTCTGGCGGTACATTTTTAAATTGCGCTTGCGTAAAAGGGATATCGCCCGAGACGCTTCGCTCGGTTGTGTTTTGAACAAAAACGCCTTGTTAAT
>transcript::Pver_Sc0000000_size2095917:209932-211345
TAAATAGTGAAATAAAAAAAGCAAAGTGATTTTGCAACAAATACTTAACATAAACCCCCGTTAGAAAGGCGAATCCACGATAGAAAGATACAAAACTGTATGATATTGAGATCAGGGAGGTTGACAAAATCAACAAGAGAGGTTGAAAATTCATTACGTTGGCTATGGCACATGCTTCGATGAATGGACACCCTTCAGGGCCGATAGAGGATCTGGATATTTTCCGTTTGTTCGCAGAGAAAAGCTGTTCATTCCCTCGGAAAATTCTCTAGAGGATAGAACAGAATCGTTCCATGGCCAACTTGCTCGTGAAATAAAGAAAAAGCTTTGGTCTGGACAGCGAAAGGACCCTGATATAAACATAGGCCTTAATGTTGACCAAGATGTTTTCATGGAAGGTCTTGGGAGTGTTTTTCCCGGCGTTCTTCAAAGAGGGAAGATGGTTTACCAATTGGAGTCAAATAGGGTGCTTGATTCATATCTGGGGCTGAAGTGGGATGAAAGAATTTTAAATGTAGGTGAAGATTTTGCCTATGTAATTCCAGGAACTATTAAGTATTGGCTGGGACAAAGAAGTCCAATCATTGAATACAAGCTCATCGTTGTAGGTGTTCTTAAACCCCTTCATAACCTGCTGCCTCCAGGAACTGAATCCTGTTTTTCTTCAGAAACAGTTGTTTGAGGGCATCAACCCCTATGTGAATTGTGGATCTGAAGAAATCTTTAGAAGCTCTATTATTGTATTTTTCCAAGCCTTGCTGTGTGTAATATGCTATGTTCTGATAAAGTGAGAGGAACTCAGGAACATGATAACATAGTGCATGCATGTAGGGAGTGACATCTTTTGCTTGGTAAACTTTGAGAAAGTGCTCAAACCATGATTTGATATTTCTTTGCAGTTCAAGAATACCCTCTTCAGATGTAAAATCTAGCCTCAAATCACCAATTATGTCCATAAATTTAGCCCAGATCACTTGTATGTCTTTGTTTTGACTGCAAAGAGGGACAAGCATAGGTCGGAATGTTTATATTCTGAAAGAGTTTCAACTTTTCTGGTTTGGTAAGATCTCTATATTCCAGCTTTTTGCTCTCTTTGTTGATTCTAAAATTAAAAGTGATTCCCATACTTTTGATAAATTCTTCATAACATGCCATGTGTTTATNCGATCCCTCTACTACTGCTTTCACGCTCACCAAACAGGAAAAAATCAAAAGGAGGAAGATACAGCTGTTTTGAAACGCCATTTGACGAGGGTAAATCTGCTTTTTGTTCGACACTTTACATATCAAAACAAAGGAAATTTGTTCCTCTTTTTGATTCTGGCGGTACATTTTTAAATTGCGCTTGCGTAAAAGGGATATCGCCCGAGACGCTTCGCTCGGTTGTGTTTTGAACAAAAACGCCTTGTTAAT
>transcript::Pver_Sc0000000_size2095917:309051-311795
AGCAAATCGACGTGGGAGCTTAAAGTAAATCGATGTCCCAGCAGAAATAACCGGTGATTTCAGTCAGGAAAACGCACCAGCATTTGCGTAAGCTTGTTAGGCCACCGAAAGTATACCATTTGTACATAAAACAATTCTTCCTAAATTATTGGTTTTACAAGTGATTATGGTTTTGATAAGGGGACTCATGGTAGTTTGGGATACATCGGTCAAAAATCCGCATTTCTTAAAGAATTATGTTTGCCATTACCATGCAACAGACAAAGAGACTCGTCGTGCTCTAGTATCGGAACTTTTTGATAGAAATATACCAAGGCACTCTTAATGCCACTTCATGTGGCTCCTAGTGTCTTGTTGACACAGCTTTATCAGGGAAAATGTTAATGACATAAAGAATATAATCAGTTCAACATTCTCCACAGGCTCAATCACCGGCGCACGAGGAAACTTTTTTTATTAAATCTAAATTAAAGTCGATTTAGATAAATTTCACTGATTGACTTCGCTGAAACGACTTCCAACTATGAGAAAGTCAGGGGGAAATACTGAACTTCTCTGACGGTAACAATATCAGAAAGATTCCTGACTAATGCGTCTGCGAATTGTTCATGAATCGGTTCAACACAGTTTGATTCAGCTTATTGTTTTTTTGATTTTAGATTGAACTGCAACGGGAGTTGACAAACACACCTGCTTTTTCGAGGCTTTAACATCTCTTTCTAATTATCCTTACCCAGTCTATTCAGCTACATGTTTTATTATGGCATCCTCACAGTGTCTGATCCTGGAGATTAAAGTGAATATTTCCATTACAAAAGCGCTTAAAAACTCGTTTTCAAGAGCCCCGGTGGCATTTCTATCAGATCAAAGAGTGGTCGAAAAAGAGAATGAGAGAGAAGAAGGACGCCTCCCGCTGGTAATTTTAATTTTGTGGTTCCTATGGTGAAGAAGACGACTCGTTTTGTTGGTTTCCTAGACCTATTATAAGTATAAATACAGCAAACCAAGAAACTCAATTAAATTGTGAATTACAGAAGTGCAGGTTATGCAAAAAAGAACTAGTGGAATAACGACAAGGGCAGAAATTCAATCGCCGCACAACAGTGAATAGCGGACCCTAGGCGAGAAAGAAAGATACGTTCGGAACTGGTCCAAAGCCTTGCAAAGCTTCTAAACGTAAGCGCAATTCAGAACGAACACGTCGAAAGAAACGGAGTTCTCATGCAGTAGAAGTTGATCATCGCCCTACAATAAATAGCTTATCTAAAGCACAAAAGCAAGATGAGCTCGGATGTCATTCAAAACATGACAAAACGACTAAATTCGGGTGCAATTCAGAACAAGTCTTCCGAAGGCAACGGACTTCAAATCGACAGAGTGCTCGAGGCGATTTTACTCTCTTTTTTGGCCCTCTTCGCTATCGTGGGAAATGGCCTTCTCATGTACGTCATATACTCAAATCACAATCTTCGAACTCTTCGGAATGCTTTTGTAGCAAGTCTTGCGGCTGCCGATCTCATGTTAGCTTGTACAGACATTATTTACCAAGCTGTGGAAAAGGTCATAATCGACTTTAAGCCTATCCACCCATCGGTTTGTTACATTATCTTGCTTAGCGGAGTATTGTTCGGTTCTGCTTCTGTGTTTAACCTCACCGCCATGACTTTGGTCCGTTACGTAAACATTCGCTACCCTCTTCATTTTAACCGTTACCTAACTGTCCATCGCTCTACTACCATTGTACTGGTGACTTGGCTAATTGCGCTTTGTCTCGCGTTCCCGCCTCTCATTTGGCGTCCGGAAGGCGTCGTGTGTTCGACAACGAAGCCTTCCGACGAGCACATCTTAAACGAAGCCATATACATGAGTGCCGAGTGGTTAATTTGGTTCATTATCCCAGCGATACTCATAACGTTTTCATATTATCGCATATACATGATCGCAAGAACTCAAGCCAGGCAGATAGCTGCACTAGAAGTGACTGCAGTCAGTGAAACAAATTCTAAGAATTCTTTTCCTGTAAGAGGACGAGCGTCGTCTTTGAGAGAAAAGAAAGCGGGGCGAATGGTGGCCATTTTGGTTGGATTTTGGGTTCTTTGCTGGCTTCCGTTCTTCACAGTGCTCACCATAAGTAAATTCAAGTCTCAAGTGCCAGGCTTTCTCATGCGTTTGTTTCTGTGTCTTATGTTTGCAAACTCAGCAATTAACCCTATTGTATTAACGTGGTATAACCGTGAACTTAAAGAAGCGATTAAGAAGTTGTTCTGTCGAGAAAAACAGCGTAACCAGTCGTTATCCATGGCAGAAACACGGTCAAACTTGAGAGGATCAACTTCTAGAACAGCATTTTAAACATGGCATACAAACTCATAAATAAGACTGATCAAGGGCGCTGGCTGGTGTTTCTCTCAGAATGTTTTAAACACCTACTGAGAAGGGAGGACAAAATGTAAGAGCATGCCGTGTCAAAAAAAGTATTATTACTCGCAAGAAATAGCTCTGGATGGTACTTTAAACGAGTGAGTTTTTACCCCTTAAAATCCATTCAAATAAGTCGCAATCTTTGATTCTTTATTACGTACAGAAAATAAATCTAATCAAATTTTATGATTACTGAATCGTTATAAACTCAAGGTTAATGTCATATAGGATCTCAAACTCACAGCAAAAGAAATCACCTTGTGGCAAGGACAATAACTCTTTACATGACATCAAATACAAAATATCAGACGGACCTAGTCTCG
CPC2
Evaluate coding potential of transcripts with Coding Potential Calculator 2…
eval "$(/opt/anaconda/anaconda3/bin/conda shell.bash hook)"
python /home/shared/CPC2_standalone-1.0.1/bin/CPC2.py -i /home/shared/8TB_HDD_01/pver/bedtools-output/merged_lncRNA_candidates.fasta -o ~/github/zach-lncRNA/output/merged_cpc2_resultsFinal Filter
Filter for those transcripts with label “noncoding”…
awk '$8 == "noncoding" {print $1}' ~/github/zach-lncRNA/output/merged_cpc2_results.txt > ~/github/zach-lncRNA/output/noncoding_transcripts_ids.txt
grep -Fwf ~/github/zach-lncRNA/output/noncoding_transcripts_ids.txt /home/shared/8TB_HDD_01/pver/bedtools-output/merged_lncRNA_candidates.fasta > ~/github/zach-lncRNA/output/merged_final_lncRNAs.gtfThis results in a final list of transcript IDs identified as lncRNAs. The final GTF of lncRNA transcripts can be found here.
>transcript::Pver_Sc0000000_size2095917:21743-24809
>transcript::Pver_Sc0000000_size2095917:205920-207483
>transcript::Pver_Sc0000000_size2095917:209824-211345
>transcript::Pver_Sc0000000_size2095917:209932-211345
>transcript::Pver_Sc0000000_size2095917:405436-413642
>transcript::Pver_Sc0000000_size2095917:434811-435922
>transcript::Pver_Sc0000000_size2095917:437235-439117
>transcript::Pver_Sc0000000_size2095917:645784-646411
>transcript::Pver_Sc0000000_size2095917:811102-817194
>transcript::Pver_Sc0000000_size2095917:812141-814871
>transcript::Pver_xpSc0122334_size1013:0-293
>transcript::Pver_xpSc0122334_size1013:566-801
>transcript::Pver_xpSc0122341_size1005:3-987
>transcript::Pver_xpSc0122357_size967:266-953
>transcript::Pver_xpSc0122375_size948:11-928
>transcript::Pver_xpSc0122429_size882:198-839
>transcript::Pver_xpSc0122462_size858:90-315
>transcript::Pver_xpSc0122471_size849:473-849
>transcript::Pver_xpSc0122613_size713:64-293
>transcript::Pver_xpSc0122829_size550:47-547
Part 2: Differential Expression of lncRNAs
Kallisto Abundance Estimation
Transcript quantification using the Kallisto software on multiple pairs of FASTQ files…
find /home/shared/8TB_HDD_01/pver/*R1_001.fastq* \
| xargs basename -s _R1_001.fastq.gz | xargs -I{} /home/shared/kallisto/kallisto \
quant -i ~/github/zach-lncRNA/output/kallisto-output/Pver_genome_assembly_v1.0.index \
-o /home/shared/8TB_HDD_01/pver/kallisto-output/{} \
-t 20 \
/home/shared/8TB_HDD_01/pver/{}_R1_001.fastq.gz \
/home/shared/8TB_HDD_01/pver/{}_R2_001.fastq.gz \
2>&1 | tee /home/shared/8TB_HDD_01/pver/kallisto-output/{}.outTrinity Count Matrix Generation
Generate a count matrix from Kallisto abundance estimates for all transcripts in RNA-seq data…
perl /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl \
--est_method kallisto \
--gene_trans_map none \
--out_prefix ~/github/zach-lncRNA/output/count_matrix \
--name_sample_by_basedir \
/home/shared/8TB_HDD_01/pver/kallisto-output/*/abundance.tsvFilter Count Matrix for lncRNA Transcript IDs
Extract counts only for lncRNAs and save as a new filtered count matrix…
# Specify the transcript ID file (GTF format)
transcript_id_file="/home/shared/8TB_HDD_02/zbengt/github/zach-lncRNA/output/merged_final_lncRNAs.gtf"
# Specify the count matrix file
count_matrix_file="/home/shared/8TB_HDD_02/zbengt/github/zach-lncRNA/output/count_matrix.isoform.counts.matrix"
# Specify the output file for the filtered count matrix
filtered_count_matrix_file="/home/shared/8TB_HDD_02/zbengt/github/zach-lncRNA/output/filtered_count_matrix.tsv"
# Extract the middle part of the transcript IDs from the GTF file
grep -oP "(?<=transcript::)[^:]*" "$transcript_id_file" > "tmp_transcript_ids.txt"
# Store the first line (column headings) of the count matrix file
head -n 1 "$count_matrix_file" > "$filtered_count_matrix_file"
# Filter the count matrix using the extracted transcript IDs, starting from the second line
awk 'NR==FNR{a[$0];next} ($1 in a)' "tmp_transcript_ids.txt" <(tail -n +2 "$count_matrix_file") >> "$filtered_count_matrix_file"
# Remove the temporary transcript ID file
rm "tmp_transcript_ids.txt"The final count matrix of lncRNAs can be found here. Note: Counts need to be rounded before DESeq2 analysis
DESeq2 Differential Expression Analysis
Install the DESeq2 package from Bioconductor and load necessary packages to run DESeq2 and create figures…
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("DESeq2")library(DESeq2)
library(tidyverse)
library(pheatmap)
library(RColorBrewer)
library(data.table)Create data frame for DESeq to reference. This specifies which sample IDs are control and which are treatment (nutrient enriched)…
metaData <- data.frame(
sample = c("C17", "C18", "C19", "C20", "C21", "C22", "C23", "C24", "C25", "C26", "C27", "C28", "C29", "C30", "C31", "C32", "E10", "E11", "E12", "E13", "E14", "E15", "E16", "E1", "E2", "E3", "E4", "E5", "E6", "E7", "E8", "E9"),
treatment = c("Control", "Control", "Control", "Control", "Control", "Control", "Control", "Control", "Control", "Control", "Control", "Control", "Control", "Control", "Control", "Control", "Enriched", "Enriched", "Enriched", "Enriched", "Enriched", "Enriched", "Enriched", "Enriched", "Enriched", "Enriched", "Enriched", "Enriched", "Enriched", "Enriched", "Enriched", "Enriched")
)Load lncRNA filtered count matrix…
countmatrix <- read.delim("~/github/zach-lncRNA/output/filtered_count_matrix.tsv", header = TRUE, sep = '\t')
rownames(countmatrix) <- countmatrix$X
countmatrix <- countmatrix[,-1]
head(countmatrix) C17 C18 C19 C20 C21 C22
Pver_Sc0004518_size2183 7.85228 17.11440 6.81050 2.16756 7.18268 4.66885
Pver_Sc0006272_size1408 0.00000 0.00000 2.00000 0.00000 1.00000 0.00000
Pver_Sc0003523_size3224 0.00000 1.09759 12.12730 1.61304 1.16390 3.45154
Pver_Sc0006233_size1418 7.34194 14.33960 1.32007 12.92490 20.73860 9.61535
Pver_xpSc0121546_size5746 6.00000 7.46051 13.02040 5.35396 2.47586 0.00000
Pver_xfSc0001217_size5096 6.85357 2.54863 1.00207 2.02089 0.00000 2.69000
C23 C24 C25 C26 C27 C28
Pver_Sc0004518_size2183 21.47020 9.93404 1.07379 30.33510 3.59148 4.56873
Pver_Sc0006272_size1408 0.00000 0.00000 13.00000 1.00000 0.00000 1.00000
Pver_Sc0003523_size3224 1.22297 2.52883 9.83971 5.84554 8.89700 6.15775
Pver_Sc0006233_size1418 21.42530 3.65209 9.37371 14.74800 5.75744 7.00592
Pver_xpSc0121546_size5746 2.03561 8.09258 6.53826 7.29864 8.02735 6.94427
Pver_xfSc0001217_size5096 0.00000 0.00000 0.00000 2.84627 3.77124 1.67987
C29 C30 C31 C32 E10 E11
Pver_Sc0004518_size2183 3.49436 14.42350 19.15810 19.950900 3.16544 4.45746
Pver_Sc0006272_size1408 2.00000 0.00000 0.00000 2.324370 0.00000 0.00000
Pver_Sc0003523_size3224 18.88970 1.15033 6.69146 2.181350 5.74262 2.48931
Pver_Sc0006233_size1418 7.79912 20.08930 26.51350 12.057400 17.45350 9.35781
Pver_xpSc0121546_size5746 2.91164 3.22259 1.99999 5.000000 1.08613 1.00000
Pver_xfSc0001217_size5096 0.00000 9.00644 3.62421 0.856301 2.81287 2.09052
E12 E13 E14 E15 E16 E1
Pver_Sc0004518_size2183 6.03863 5.71854 5.38058 35.04610 18.17120 15.94860
Pver_Sc0006272_size1408 0.00000 2.00000 0.00000 2.00000 1.00000 0.00000
Pver_Sc0003523_size3224 5.28486 2.52545 3.55154 15.84280 3.86092 0.00000
Pver_Sc0006233_size1418 3.82369 12.46040 19.26390 12.24390 28.95150 7.63817
Pver_xpSc0121546_size5746 3.81159 5.89469 5.36072 6.68835 2.41352 1.00000
Pver_xfSc0001217_size5096 0.00000 0.00000 0.00000 3.43347 2.08294 1.57733
E2 E3 E4 E5 E6 E7
Pver_Sc0004518_size2183 6.64014 2.77836 15.68280 4.42886 6.84266 28.33200
Pver_Sc0006272_size1408 1.00000 0.00000 2.38019 1.00000 0.00000 0.00000
Pver_Sc0003523_size3224 1.68306 9.61741 2.51885 4.15475 11.98390 1.83673
Pver_Sc0006233_size1418 3.93828 14.00460 1.24852 3.46708 12.58970 2.49715
Pver_xpSc0121546_size5746 1.00000 3.60409 0.00000 1.09690 1.00000 11.00000
Pver_xfSc0001217_size5096 0.00000 3.81283 1.39317 4.50908 2.31714 8.13514
E8 E9
Pver_Sc0004518_size2183 4.73182 0.00000
Pver_Sc0006272_size1408 0.00000 1.00000
Pver_Sc0003523_size3224 5.64801 5.10102
Pver_Sc0006233_size1418 11.70150 9.82817
Pver_xpSc0121546_size5746 29.74560 1.70132
Pver_xfSc0001217_size5096 0.00000 0.00000
Round counts to integers so they can be run through DESeq2 and summarize countmatrix structure to check and see that it worked…
countmatrix <- round(countmatrix, 0)
str(countmatrix)'data.frame': 6264 obs. of 32 variables:
$ C17: num 8 0 0 7 6 ...
$ C18: num 17 0 1 14 7 ...
$ C19: num 7 2 12 1 13 ...
$ C20: num 2 0 2 13 5 ...
$ C21: num 7 1 1 21 2 ...
$ C22: num 5 0 3 10 0 ...
$ C23: num 21 0 1 21 2 ...
$ C24: num 10 0 3 4 8 ...
$ C25: num 1 13 10 9 7 ...
$ C26: num 30 1 6 15 7 ...
$ C27: num 4 0 9 6 8 ...
$ C28: num 5 1 6 7 7 ...
$ C29: num 3 2 19 8 3 ...
$ C30: num 14 0 1 20 3 ...
$ C31: num 19 0 7 27 2 ...
$ C32: num 20 2 2 12 5 ...
$ E10: num 3 0 6 17 1 ...
$ E11: num 4 0 2 9 1 ...
$ E12: num 6 0 5 4 4 ...
$ E13: num 6 2 3 12 6 ...
$ E14: num 5 0 4 19 5 ...
$ E15: num 35 2 16 12 7 ...
$ E16: num 18 1 4 29 2 ...
$ E1 : num 16 0 0 8 1 ...
$ E2 : num 7 1 2 4 1 ...
$ E3 : num 3 0 10 14 4 ...
$ E4 : num 16 2 3 1 0 ...
$ E5 : num 4 1 4 3 1 ...
$ E6 : num 7 0 12 13 1 ...
$ E7 : num 28 0 2 2 11 ...
$ E8 : num 5 0 6 12 30 ...
$ E9 : num 0 1 5 10 2 ...
The count matrix should look like this before you input into DESeq2…
C17 C18 C19 C20 C21 C22 C23 C24 C25 C26 C27 C28 C29
Pver_Sc0004518_size2183 8 17 7 2 7 5 21 10 1 30 4 5 3
Pver_Sc0006272_size1408 0 0 2 0 1 0 0 0 13 1 0 1 2
Pver_Sc0003523_size3224 0 1 12 2 1 3 1 3 10 6 9 6 19
Pver_Sc0006233_size1418 7 14 1 13 21 10 21 4 9 15 6 7 8
Pver_xpSc0121546_size5746 6 7 13 5 2 0 2 8 7 7 8 7 3
Pver_xfSc0001217_size5096 7 3 1 2 0 3 0 0 0 3 4 2 0
C30 C31 C32 E10 E11 E12 E13 E14 E15 E16 E1 E2 E3 E4
Pver_Sc0004518_size2183 14 19 20 3 4 6 6 5 35 18 16 7 3 16
Pver_Sc0006272_size1408 0 0 2 0 0 0 2 0 2 1 0 1 0 2
Pver_Sc0003523_size3224 1 7 2 6 2 5 3 4 16 4 0 2 10 3
Pver_Sc0006233_size1418 20 27 12 17 9 4 12 19 12 29 8 4 14 1
Pver_xpSc0121546_size5746 3 2 5 1 1 4 6 5 7 2 1 1 4 0
Pver_xfSc0001217_size5096 9 4 1 3 2 0 0 0 3 2 2 0 4 1
E5 E6 E7 E8 E9
Pver_Sc0004518_size2183 4 7 28 5 0
Pver_Sc0006272_size1408 1 0 0 0 1
Pver_Sc0003523_size3224 4 12 2 6 5
Pver_Sc0006233_size1418 3 13 2 12 10
Pver_xpSc0121546_size5746 1 1 11 30 2
Pver_xfSc0001217_size5096 5 2 8 0 0
Use the DESeq2 package to create a DESeqDataSet object from a count matrix and metadata…
deseq2.dds <- DESeqDataSetFromMatrix(countData = countmatrix,
colData = metaData,
design = ~ treatment)Performs differential expression analysis using DESeq2 on the deseq2.dds object, extract the results, converts them into a data frame, and then orders the rows of the data frame alphabetically based on the row names…
deseq2.dds <- DESeq(deseq2.dds)
deseq2.res <- results(deseq2.dds)
deseq2.res <- as.data.frame(deseq2.res) # Convert to data frame
deseq2.res <- deseq2.res[order(rownames(deseq2.res)), ]The results (deseq2.res) should look like this…
baseMean log2FoldChange lfcSE stat
Pver_Sc0000000_size2095917 110374.10 0.055635923 0.04699103 1.1839689
Pver_Sc0000001_size2081954 102310.20 0.117073634 0.05308784 2.2052817
Pver_Sc0000002_size1617595 76740.43 0.281523591 0.15086218 1.8660978
Pver_Sc0000003_size1576134 63080.00 0.045165834 0.03014089 1.4984905
Pver_Sc0000004_size1560107 45979.44 -0.001693396 0.02232370 -0.0758564
Pver_Sc0000005_size1451149 40752.22 0.019263990 0.03704071 0.5200762
pvalue padj
Pver_Sc0000000_size2095917 0.23642537 0.7606988
Pver_Sc0000001_size2081954 0.02743434 0.3870680
Pver_Sc0000002_size1617595 0.06202768 0.5058815
Pver_Sc0000003_size1576134 0.13400586 0.6511615
Pver_Sc0000004_size1560107 0.93953335 0.9916985
Pver_Sc0000005_size1451149 0.60301048 0.9171572
Count the number of hits with adjusted p-value less then 0.05…
dim(deseq2.res[!is.na(deseq2.res$padj) & deseq2.res$padj <= 0.05, ])[1] 22 6
Retrieve the transcript IDs of the 22 lncRNAs with differential expression p-values less than 0.05 and save them to a text file…
# Subset the hits with adjusted p-value less than 0.05
hits <- deseq2.res[!is.na(deseq2.res$padj) & deseq2.res$padj <= 0.05, ]
# View the transcript IDs of the hits
transcript_ids <- rownames(hits)
print(transcript_ids) [1] "Pver_Sc0000139_size537088" "Pver_Sc0000156_size509041"
[3] "Pver_Sc0000172_size480810" "Pver_Sc0000556_size184003"
[5] "Pver_Sc0000853_size79572" "Pver_Sc0000968_size64176"
[7] "Pver_Sc0001125_size43131" "Pver_Sc0001315_size24563"
[9] "Pver_Sc0001356_size25637" "Pver_Sc0001408_size22845"
[11] "Pver_Sc0001524_size18081" "Pver_Sc0001709_size13140"
[13] "Pver_Sc0001979_size9044" "Pver_Sc0002476_size5736"
[15] "Pver_Sc0009607_size821" "Pver_xfSc0000044_size69330"
[17] "Pver_xfSc0000188_size16086" "Pver_xfSc0001443_size4587"
[19] "Pver_xfSc0006440_size1274" "Pver_xfSc0006736_size1221"
[21] "Pver_xpSc0121218_size45261" "Pver_xpSc0121370_size11769"
# Save the transcript IDs in a text file
writeLines(transcript_ids, "~/github/zach-lncRNA/output/sig_diff_IDs.txt")These transcript IDs can be found in a text file here.
Generating Differential Expression Figures
Create a plot of differentially expressed lncRNAs with those that have a p-value less than 0.05 highlighted in red…
tmp <- deseq2.res
# The main plot
plot(tmp$baseMean, tmp$log2FoldChange, pch=20, cex=0.45, ylim=c(-3, 3), log="x", col="darkgray",
main="DEG Nutrient Enrichment (pval <= 0.05)",
xlab="mean of normalized counts",
ylab="Log2 Fold Change")
# Getting the significant points and plotting them again so they're a different color
tmp.sig <- deseq2.res[!is.na(deseq2.res$padj) & deseq2.res$padj <= 0.05, ]
points(tmp.sig$baseMean, tmp.sig$log2FoldChange, pch=20, cex=0.45, col="red")
# 2 FC lines
abline(h=c(-1,1), col="blue")Generate a PCA plot to see how all of the differential expression results cluster…
vsd <- vst(deseq2.dds, blind = FALSE)
plotPCA(vsd, intgroup = "treatment")Create heatmaps to view the top 50 and top 22 differentially expressed lncRNAs…
# Select top 50 differentially expressed genes
res <- results(deseq2.dds)
res_ordered <- res[order(res$padj), ]
top_genes <- row.names(res_ordered)[1:50]
# Extract counts and normalize
counts <- counts(deseq2.dds, normalized = TRUE)
counts_top <- counts[top_genes, ]
# Log-transform counts
log_counts_top <- log2(counts_top + 1)
# Generate heatmap
pheatmap(log_counts_top, scale = "row")# Select top 22 differentially expressed genes
res <- results(deseq2.dds)
res_ordered <- res[order(res$padj), ]
top_genes <- row.names(res_ordered)[1:22]
# Extract counts and normalize
counts <- counts(deseq2.dds, normalized = TRUE)
counts_top <- counts[top_genes, ]
# Log-transform counts
log_counts_top <- log2(counts_top + 1)
# Generate heatmap
pheatmap(log_counts_top, scale = "row")Write final differential expression results table…
write.table(tmp.sig, "../output/DEGlist.tab", row.names = T)The final differential gene expression results can be found here.
Part 3: Interpreting Results and Future Work
Of 6264 identified lncRNA transcripts, only 22 were differentially expressed with a p-value of less than 0.05 in this analysis.
It may be that lncRNAs don’t play much of a role in nutrient enrichment, or that a relatively small number of lncRNAs are particularly important managing nutrient uptake and utilization. Low level nutrient enrichment may also not be enough stimulus to elicit a response in the expression of lncRNAs.
Further work needs to be done to understand the potential function of these lncRNAs. Preliminary database searches did not yield transcript matches with associated GO terms.
The next step in this work will be to complete weighted correlation network analysis (WGCNA) to see how these lncRNAs are expressed alongside mRNAs with associated functions. This will provide clues about which GO terms, and thus potential functions, these lncRNAs may be associated with.