R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. R code chunks look like this:

# This is a comment line, which we will often use as a question prompt.
# Do not change the name of the chunk because it is used when grading.
note <- "This is a line of R code that sets a variable to a value"
note # This line will print the value to the knitted document.
## [1] "This is a line of R code that sets a variable to a value"

We will also use multiple choice questions within this assignment. You will use an X to indicate you choice, as shown below.

Which of the following is a shade of red?

Module 3 Differential Expression Analysis

In this module, you will perform differential gene expression analysis using DESeq2, a widely used method for modeling RNA-seq count data. Differential expression analysis begins with two key inputs: (1) a phenotype/metadata table describing each sample and its experimental condition, and (2) a gene-by-sample count matrix containing raw, unnormalized sequencing counts for each gene. After subsetting the data to the conditions of interest, you will create a DESeqDataSet object, which stores the counts, sample information, and your experimental design formula.

DESeq2 normalizes counts to account for differences in sequencing depth and fits a negative binomial model to estimate gene-wise dispersion and fold changes. When you run DESeq(), the software performs all modeling steps and produces statistical results, including log2 fold changes, p-values, and adjusted p-values that correct for multiple testing. You will learn how to interpret the direction and size of fold changes based on the reference condition, filter significant genes, and visualize results using tools such as volcano plots and gene-level count plots.

Once you have completed all of the exercises below, use the Knit button to generate an html file with all of your code and output. Then submit your .html file to Canvas.

Question 1

#load the dplyr and DESeq2 libraries
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(magrittr)
library("DESeq2")
## Loading required package: S4Vectors
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: generics
## 
## Attaching package: 'generics'
## The following object is masked from 'package:dplyr':
## 
##     explain
## The following objects are masked from 'package:base':
## 
##     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
##     setequal, union
## 
## Attaching package: 'BiocGenerics'
## The following object is masked from 'package:dplyr':
## 
##     combine
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
##     unsplit, which.max, which.min
## 
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:dplyr':
## 
##     first, rename
## The following object is masked from 'package:utils':
## 
##     findMatches
## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname
## Loading required package: IRanges
## 
## Attaching package: 'IRanges'
## The following objects are masked from 'package:dplyr':
## 
##     collapse, desc, slice
## The following object is masked from 'package:grDevices':
## 
##     windows
## Loading required package: GenomicRanges
## Loading required package: Seqinfo
## 
## Attaching package: 'GenomicRanges'
## The following object is masked from 'package:magrittr':
## 
##     subtract
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
## 
## Attaching package: 'matrixStats'
## The following object is masked from 'package:dplyr':
## 
##     count
## 
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians
## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians
library(ggplot2)
library("RColorBrewer")
library("pheatmap")

Question 2

The tutorial data was a .txt file, while the exercise data here is a .csv file. You will need a different function (read.csv) from the tutorial to read this data. Set the argument stringsAsFactors in the function read.csv as FALSE so that R reads text columns as plain character strings instead of automatically converting them into factors. Strings are raw text, while factors are categorical variables stored as integer codes with predefined levels. In this dataset, keeping sample.id and stage as strings initially gives us full control over how and when we convert variables to factors and avoids unintended recoding of sample IDs, and prevents confusion when manipulating the dataframe. You will also need to set the argument header in the function read.csv as T so that the titles of the column names are preserved.

#Read in the samples/conditions file using read.csv
#Read in the counts data using read.csv
#Convert the first column named "gene" to the rownames index 
# Load libraries
library(dplyr)
library(magrittr)
library(DESeq2)
library(ggplot2)
library(RColorBrewer)
library(pheatmap)

# Read in the samples/conditions file
samples <- read.csv("C:/Users/acman/Downloads/modencodefly_phenodata (1).csv")

# Read in the counts data
counts <- read.csv("C:/Users/acman/Downloads/modencodefly_count_table (1).csv")

# Convert the first column named "gene" to row names
rownames(counts) <- counts$gene
counts$gene <- NULL

# Check the result
head(counts)
##             SRX007811 SRX008180 SRX008227 SRX008238 SRX008258 SRX008015
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008      3903      1601       681       648       718       921
## FBgn0000014        75        29        10         9        12       563
## FBgn0000015        33        11         8         4         8       235
## FBgn0000017      9763      4107      1814      1652      1788      6358
## FBgn0000018       832       368       146       126       170        97
##             SRX008179 SRX008190 SRX008193 SRX008271 SRX008027 SRX008181
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       295       364       268       267      1764       411
## FBgn0000014       186       239       152       191      4890       957
## FBgn0000015        95       101        58       101      3000       590
## FBgn0000017      2214      2763      1906      2053      6006      1252
## FBgn0000018        33        55        25        22       339        57
##             SRX008217 SRX008250 SRX008265 SRX008025 SRX008175 SRX008210
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       284       396       706      1074       563       156
## FBgn0000014       806      1036      1956      6147      3166       956
## FBgn0000015       492       715      1185      2492      1228       340
## FBgn0000017       904      1244      2456      5030      2584       746
## FBgn0000018        60        71       116       178        95        34
##             SRX008212 SRX008257 SRX008010 SRX008249 SRX008252 SRX008273
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       160       440      1557       874       904       632
## FBgn0000014      1053      2876      5160      3270      2836      2048
## FBgn0000015       458      1174      1986      1454      1138       788
## FBgn0000017       847      2105      5350      3101      3046      2343
## FBgn0000018        35        75       130        99        75        56
##             SRX008274 SRX008005 SRX008198 SRX008208 SRX008243 SRX008247
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       234      4488      1222       825      3030      1224
## FBgn0000014       815      4792      1390       883      3102      1192
## FBgn0000015       324      1709       541       370      1076       431
## FBgn0000017       969      8695      2448      1518      5540      2345
## FBgn0000018        25       141        43        25        97        34
##             SRX008018 SRX008177 SRX008225 SRX008235 SRX008277 SRX008007
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008      8755      2431      1718      5478      2212      5617
## FBgn0000014      9943      2536      1890      5782      2559      5134
## FBgn0000015      3421       886       744      1942       985      1313
## FBgn0000017     12295      3296      2440      7652      3125      9374
## FBgn0000018       229        59        50       113        48       122
##             SRX008196 SRX008233 SRX008237 SRX008262 SRX008006 SRX008178
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008      1039      2769      1716      1664      1749      1532
## FBgn0000014       935      2339      1404      1426      2908      2602
## FBgn0000015       240       645       392       419      1024       845
## FBgn0000017      1816      4564      2640      2853      4548      4156
## FBgn0000018        19        67        27        44        93        57
##             SRX008205 SRX008242 SRX008278 SRX008020 SRX008213 SRX008215
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       549       715       390      1685       384      1397
## FBgn0000014       965      1116       638      2333       543      1832
## FBgn0000015       321       398       216       914       199       667
## FBgn0000017      1537      1816      1085      2577       572      2010
## FBgn0000018        20        23        10        42        18        31
##             SRX008222 SRX008259 SRX008011 SRX008214 SRX008221 SRX008241
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       407      2141      1036       417       353       560
## FBgn0000014       512      3101      1068       448       338       589
## FBgn0000015       217      1153       419       165       139       219
## FBgn0000017       526      3328      3327      1425      1046      1719
## FBgn0000018        12        72        35        16        11        15
##             SRX008256 SRX008019 SRX008167 SRX008234 SRX008251 SRX008266
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       394      2548       730       687       786       587
## FBgn0000014       475      3090       979       819      1008       810
## FBgn0000015       179      1306       404       334       450       310
## FBgn0000017      1342      6979      1985      1843      2297      1539
## FBgn0000018        15       113        33        26        39        23
##             SRX008026 SRX008174 SRX008201 SRX008239 SRX008008 SRX008168
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       928       392       575       851      1696       353
## FBgn0000014      1845       796      1203      1740      2030       422
## FBgn0000015       635       295       485       617       695       155
## FBgn0000017      1440       670      1021      1330      2360       481
## FBgn0000018       150        64       102       139       371        94
##             SRX008211 SRX008255 SRX008261 SRX008009 SRX008194 SRX008207
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       284       733       258       311        94       108
## FBgn0000014       375       914       296       536       182       211
## FBgn0000015       130       350        95       190        71        70
## FBgn0000017       434      1054       381       481       154       197
## FBgn0000018        66       162        49        83        29        27
##             SRX008224 SRX008244 SRX008029 SRX008184 SRX008206 SRX008220
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       110       148       228        74       115        77
## FBgn0000014       196       296       535       181       278       212
## FBgn0000015        81       111       127        56        62        56
## FBgn0000017       197       292       667       226       357       242
## FBgn0000018        35        48       131        63        82        48
##             SRX008267 SRX008014 SRX008182 SRX008187 SRX008189 SRX008200
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008        74       393       541       168       161       163
## FBgn0000014       173       822      1342       360       434       398
## FBgn0000015        44       166       324        82        95       103
## FBgn0000017       216      1036      1416       490       452       423
## FBgn0000018        57       107       167        44        57        39
##             SRX008156 SRX008199 SRX008245 SRX008253 SRX008023 SRX008218
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008      1205      2939       945      1022      3602       926
## FBgn0000014      1598      3795      1136      1349      3227       836
## FBgn0000015       389      1029       322       338       470       134
## FBgn0000017      1791      4400      1395      1484      5073      1312
## FBgn0000018       154       334        94       124       307        78
##             SRX008246 SRX008248 SRX008275 SRX008016 SRX008192 SRX008219
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008      2209      1494      1076      4011      1233      1758
## FBgn0000014      1903      1231       878      2698       885      1115
## FBgn0000015       287       201       130       529       192       229
## FBgn0000017      3019      1972      1521      5793      1860      2517
## FBgn0000018       178       105        80       274        95       128
##             SRX008236 SRX008264 SRX008013 SRX008188 SRX008197 SRX008228
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008      2123      2023      1117      1004      1447      1452
## FBgn0000014      1577      1488       282       286       355       379
## FBgn0000015       325       316        99        87       153       101
## FBgn0000017      3176      2960      2356      2159      3069      3182
## FBgn0000018       156       148       264       247       320       333
##             SRX008022 SRX008183 SRX008185 SRX008216 SRX012269 SRX008024
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       665       396       686       409      1011       603
## FBgn0000014        61        44        64        36        72        39
## FBgn0000015        17        19        17        15        20        15
## FBgn0000017      2170      1317      2252      1426      3357      1906
## FBgn0000018       218       149       220       137       343       194
##             SRX008173 SRX008195 SRX008202 SRX008254 SRX012270 SRX008012
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       336       380       431       437       738       717
## FBgn0000014        35        40        36        29        54       340
## FBgn0000015        17        20         8        10        12       256
## FBgn0000017      1123      1239      1551      1410      2504      1388
## FBgn0000018       138       164       143       136       224       170
##             SRX008170 SRX008263 SRX008268 SRX016332 SRX008028 SRX008191
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       630       389       303       373      1234      1112
## FBgn0000014       307       195       148       222       309       298
## FBgn0000015       270       165       116       142       347       292
## FBgn0000017      1279       790       579       849      2007      1837
## FBgn0000018       186       107        76       108       319       280
##             SRX008223 SRX008260 SRX008021 SRX008203 SRX008226 SRX008229
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       989      1026       430       200       158       430
## FBgn0000014       243       281       137        75        47       132
## FBgn0000015       305       328       167       118        70       177
## FBgn0000017      1596      1644       898       461       341       777
## FBgn0000018       251       308       154        98        54       137
##             SRX008232 SRX012271 SRX008171 SRX008186 SRX008240 SRX010758
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       250      1074      3386      6313      3210      4811
## FBgn0000014        74       339       647      1310       632       939
## FBgn0000015        98       322       202       375       203       259
## FBgn0000017       504      2100      2759      5211      2524      3931
## FBgn0000018        71       332       128       263        96       165
##             SRX016331 SRX008155 SRX008169 SRX008172 SRX008231 SRX008542
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008      3803      2393      1834      1890       744       952
## FBgn0000014       751       572       404       440       156       212
## FBgn0000015       219       252       183       210        76        81
## FBgn0000017      3212      2584      2013      2156       809       963
## FBgn0000018       117       216       142       143        47        65
##             SRX008017 SRX008209 SRX008230 SRX008270 SRX008157 SRX008204
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       971       844      1204       620      1519      1267
## FBgn0000014       307       253       371       159      1841      1650
## FBgn0000015       146       146       168        93       340       332
## FBgn0000017      1168      1085      1513       782      1973      1592
## FBgn0000018       134       124       146        82       101       112
##             SRX008269 SRX008272 SRX008276
## FBgn0000003         0         0         0
## FBgn0000008      1448      1185      3225
## FBgn0000014      1875      1370      4048
## FBgn0000015       368       261       711
## FBgn0000017      1833      1465      4066
## FBgn0000018       116       102       294
head(samples)
##   sample.id num.tech.reps       stage
## 1 SRX007811             5 Embryos0002
## 2 SRX008180             2 Embryos0002
## 3 SRX008227             1 Embryos0002
## 4 SRX008238             1 Embryos0002
## 5 SRX008258             1 Embryos0002
## 6 SRX008015             2 Embryos0204

Question 3

##Since we do not want to work on all comparisons, we will filter out the samples and conditions that we do not need. 
#Filter the samples/conditions file to samples in "adultfemale5d" and "adultmale5d"
#you can do this using the subset() function, or the %>% (pipe) and %in% membership operators as part of the dplyr package
#Then, filter the counts file to samples in "adultfemale5d" and "adultmale5d"
#Finally, print the filtered groups dataframe and the first 10 lines of the filtered counts dataframe
library(dplyr)
library(magrittr)
# Filter the samples/conditions file to adultfemale5d and adultmale5d
filtered_samples <- samples %>%
  filter(stage %in% c("adultfemale5d", "adultmale5d"))

# Filter the counts file to keep only those same samples
filtered_counts <- counts[, colnames(counts) %in% filtered_samples$sample.id]

# Print the filtered groups dataframe
filtered_samples
##    sample.id num.tech.reps         stage
## 1  SRX008024             2 adultfemale5d
## 2  SRX008173             1 adultfemale5d
## 3  SRX008195             1 adultfemale5d
## 4  SRX008202             1 adultfemale5d
## 5  SRX008254             1 adultfemale5d
## 6  SRX012270             2 adultfemale5d
## 7  SRX008021             3   adultmale5d
## 8  SRX008203             1   adultmale5d
## 9  SRX008226             1   adultmale5d
## 10 SRX008229             1   adultmale5d
## 11 SRX008232             2   adultmale5d
## 12 SRX012271             4   adultmale5d
# Print the first 10 lines of the filtered counts dataframe
head(filtered_counts, 10)
##             SRX008024 SRX008173 SRX008195 SRX008202 SRX008254 SRX012270
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       603       336       380       431       437       738
## FBgn0000014        39        35        40        36        29        54
## FBgn0000015        15        17        20         8        10        12
## FBgn0000017      1906      1123      1239      1551      1410      2504
## FBgn0000018       194       138       164       143       136       224
## FBgn0000022         1         0         1         1         0         0
## FBgn0000024        97        63        76        62        58        85
## FBgn0000028         3         4         1         5         2         5
## FBgn0000032       899       592       862       588       629       923
##             SRX008021 SRX008203 SRX008226 SRX008229 SRX008232 SRX012271
## FBgn0000003         0         0         0         0         0         0
## FBgn0000008       430       200       158       430       250      1074
## FBgn0000014       137        75        47       132        74       339
## FBgn0000015       167       118        70       177        98       322
## FBgn0000017       898       461       341       777       504      2100
## FBgn0000018       154        98        54       137        71       332
## FBgn0000022         0         0         0         0         0         0
## FBgn0000024       610       360       238       645       348      1309
## FBgn0000028       160        71        64       125        80       469
## FBgn0000032       303       189       114       340       151       730

Question 4

#Create the DESeqDataSet object
#A DESeqDataSet object is the data structure in the DESeq2 package for performing differential expression analysis
#print the DESeqDataSet object by typing the variable name on a new line by itself
# Create the DESeqDataSet object
library(DESeq2)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("DESeq2")
## Bioconductor version 3.22 (BiocManager 1.30.27), R 4.5.2 (2025-10-31 ucrt)
## Warning: package(s) not installed when version(s) same as or greater than current; use
##   `force = TRUE` to re-install: 'DESeq2'
## Installation paths not writeable, unable to update packages
##   path: C:/Program Files/R/R-4.5.2/library
##   packages:
##     cluster, foreign, lattice, mgcv, survival
## Old packages: 'later', 'phyloseq', 'renv', 'systemfonts', 'vegan'
dds <- DESeqDataSetFromMatrix(
  countData = filtered_counts,
  colData = filtered_samples,
  design = ~ stage
)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
# Print the object
dds
## class: DESeqDataSet 
## dim: 14869 12 
## metadata(1): version
## assays(1): counts
## rownames(14869): FBgn0000003 FBgn0000008 ... FBgn0261574 FBgn0261575
## rowData names(0):
## colnames(12): SRX008024 SRX008173 ... SRX008232 SRX012271
## colData names(3): sample.id num.tech.reps stage

Question 5

#call the DESeq function on a DESeqDataSet object
#extract the results and save them to the variable name
#print the first 10 lines of the results file 
# Run the differential expression analysis

library(DESeq2)

filtered_samples <- filtered_samples[match(colnames(filtered_counts), filtered_samples$sample.id), ]

dds <- DESeqDataSetFromMatrix(
  countData = filtered_counts,
  colData = filtered_samples,
  design = ~ stage
)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
dds <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
res <- results(dds)

head(res, 10)
## log2 fold change (MLE): stage adultmale5d vs adultfemale5d 
## Wald test p-value: stage adultmale5d vs adultfemale5d 
## DataFrame with 10 rows and 6 columns
##                baseMean log2FoldChange     lfcSE       stat       pvalue
##               <numeric>      <numeric> <numeric>  <numeric>    <numeric>
## FBgn0000003    0.000000             NA        NA         NA           NA
## FBgn0000008  407.826981      -0.562735 0.0858582  -6.554233  5.59287e-11
## FBgn0000014   71.817903       1.425201 0.1308824  10.889174  1.29812e-27
## FBgn0000015   75.111228       3.266039 0.2079481  15.706033  1.37519e-55
## FBgn0000017 1147.875974      -1.275833 0.0837582 -15.232330  2.15763e-52
## FBgn0000018  141.237348      -0.574080 0.0884264  -6.492177  8.46048e-11
## FBgn0000022    0.260019      -1.570634 2.0699669  -0.758772  4.47989e-01
## FBgn0000024  276.893656       2.679440 0.0921583  29.074327 7.58177e-186
## FBgn0000028   61.379165       5.158434 0.3570199  14.448590  2.55869e-47
## FBgn0000032  502.076585      -1.656786 0.0929024 -17.833624  3.87474e-71
##                     padj
##                <numeric>
## FBgn0000003           NA
## FBgn0000008  8.50316e-11
## FBgn0000014  2.75901e-27
## FBgn0000015  5.23212e-55
## FBgn0000017  7.78738e-52
## FBgn0000018  1.28078e-10
## FBgn0000022  4.79309e-01
## FBgn0000024 1.00697e-184
## FBgn0000028  8.39766e-47
## FBgn0000032  1.82417e-70

Question 6

#It is important to know the order of the condition/group levels in a DESeqDataSet object so we know what our reference condition is
#The first condition listed in the DESeqDataSet object is the reference condition
#A positive gene fold change means that the gene is upregulated in the second condition relative to the first (reference) condition.
#Check the order of the condition levels by typing name_of_your_DESeq_object$stage
dds$stage <- relevel(dds$stage, ref = "adultmale5d")
dds <- DESeq(dds)  # rerun DESeq after changing reference
## using pre-existing size factors
## estimating dispersions
## found already estimated dispersions, replacing these
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing

Question 7

Which of the conditions from our DESeqDataSet object is going to be used as the reference level?

Question 8

What is the biological meaning of the log2 fold change for gene FBgn0000014?

Question 9

#In R, compute the log2 fold change of a gene that has:
#A gene expression equal to 230 in the “adultfemale5d” condition.
#A gene expression equal to 750 in the “adultmale5d” condition.
#Express the results where the adultfemale5d is the reference condition
# Expression values
expr_ref <- 230      # adultfemale5d
expr_second <- 750   # adultmale5d

# Compute log2 fold change
log2FC <- log2(expr_second / expr_ref)
log2FC
## [1] 1.705257

Question 10

Select the answer that completes the true statement:

A log2 fold change of [blank] means that the gene has [blank] expression (approximately[blank]) in [blank] compared to [blank]

Question 11

#Filter your DESeq2 results so that only genes that meet a p-adjusted significance level of 0.05 and show at least a 2-fold change are retained
#When you perform thousands of statistical tests (one for each gene), you will by chance call genes differentially expressed while they are not (false positives). You can control for this by applying certain statistical procedures called multiple hypothesis test correction.
#Print the dimensions of this list
# Filter significant genes: padj < 0.05 and |log2FoldChange| >= 1 (2-fold change)
sig_genes <- res[which(res$padj < 0.05 & abs(res$log2FoldChange) >= 1), ]

# Print the number of genes and columns
dim(sig_genes)
## [1] 8258    6

Question 12

#To the unfiltered results, add a column called "category" whose entry indicates 
#whether a gene falls into one of three groups:
# 1) adjusted p-value > 0.05  AND  |log2FoldChange| < 2      → color = "red"
# 2) adjusted p-value ≤ 0.05 AND  |log2FoldChange| < 2      → color = "blue"
# 3) adjusted p-value ≤ 0.05 AND  |log2FoldChange| ≥ 2      → color = "green"
#Then, create a Volcano plot (log2FoldChange vs -log10(p-value)) colored accordingly

res_df <- as.data.frame(res)  # convert to data frame if it's not already

res_df$category <- "red"  # default category

# Assign blue: significant but <2-fold change
res_df$category[res_df$padj <= 0.05 & abs(res_df$log2FoldChange) < 1] <- "blue"

# Assign green: significant and ≥2-fold change
res_df$category[res_df$padj <= 0.05 & abs(res_df$log2FoldChange) >= 1] <- "green"

library(ggplot2)

ggplot(res_df, aes(x = log2FoldChange, y = -log10(pvalue), color = category)) +
  geom_point(alpha = 0.6, size = 1.5) +
  scale_color_manual(values = c("red" = "red", "blue" = "blue", "green" = "green")) +
  theme_minimal() +
  xlab("Log2 Fold Change") +
  ylab("-log10(p-value)") +
  ggtitle("Volcano Plot of Differentially Expressed Genes") +
  theme(legend.title = element_blank())
## Warning: Removed 1800 rows containing missing values or values outside the scale range
## (`geom_point()`).

Question 13

#From the top result of your filtered list from Question 11, plot the counts per sample using the plotCounts() function
# Identify the gene with the smallest adjusted p-value
top_gene <- rownames(sig_genes)[1]  # the first gene in your filtered list

# Plot counts for this gene by stage
plotCounts(
  dds,
  gene = top_gene,
  intgroup = "stage",
  main = paste("Counts for", top_gene),
  xlab = "Stage"
)

Question 14

This plot shows this gene is [blank] in group adultmale5d.

Question 15

Using https://flybase.org/ what is the known function of this gene?