This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. R code chunks look like this:
# This is a comment line, which we will often use as a question prompt.
# Do not change the name of the chunk because it is used when grading.
note <- "This is a line of R code that sets a variable to a value"
note # This line will print the value to the knitted document.
## [1] "This is a line of R code that sets a variable to a value"
We will also use multiple choice questions within this assignment.
You will use an X to indicate you choice, as shown
below.
Which of the following is a shade of red?
In this module, you will perform differential gene expression analysis using DESeq2, a widely used method for modeling RNA-seq count data. Differential expression analysis begins with two key inputs: (1) a phenotype/metadata table describing each sample and its experimental condition, and (2) a gene-by-sample count matrix containing raw, unnormalized sequencing counts for each gene. After subsetting the data to the conditions of interest, you will create a DESeqDataSet object, which stores the counts, sample information, and your experimental design formula.
DESeq2 normalizes counts to account for differences in sequencing depth and fits a negative binomial model to estimate gene-wise dispersion and fold changes. When you run DESeq(), the software performs all modeling steps and produces statistical results, including log2 fold changes, p-values, and adjusted p-values that correct for multiple testing. You will learn how to interpret the direction and size of fold changes based on the reference condition, filter significant genes, and visualize results using tools such as volcano plots and gene-level count plots.
Once you have completed all of the exercises below, use the Knit button to generate an html file with all of your code and output. Then submit your .html file to Canvas.
#load the dplyr and DESeq2 libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(magrittr)
library("DESeq2")
## Loading required package: S4Vectors
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: generics
##
## Attaching package: 'generics'
## The following object is masked from 'package:dplyr':
##
## explain
## The following objects are masked from 'package:base':
##
## as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
## setequal, union
##
## Attaching package: 'BiocGenerics'
## The following object is masked from 'package:dplyr':
##
## combine
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
## unsplit, which.max, which.min
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:dplyr':
##
## first, rename
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
## Loading required package: IRanges
##
## Attaching package: 'IRanges'
## The following objects are masked from 'package:dplyr':
##
## collapse, desc, slice
## The following object is masked from 'package:grDevices':
##
## windows
## Loading required package: GenomicRanges
## Loading required package: Seqinfo
##
## Attaching package: 'GenomicRanges'
## The following object is masked from 'package:magrittr':
##
## subtract
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'matrixStats'
## The following object is masked from 'package:dplyr':
##
## count
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
library(ggplot2)
library("RColorBrewer")
library("pheatmap")
The tutorial data was a .txt file, while the exercise data here is a .csv file. You will need a different function (read.csv) from the tutorial to read this data. Set the argument stringsAsFactors in the function read.csv as FALSE so that R reads text columns as plain character strings instead of automatically converting them into factors. Strings are raw text, while factors are categorical variables stored as integer codes with predefined levels. In this dataset, keeping sample.id and stage as strings initially gives us full control over how and when we convert variables to factors and avoids unintended recoding of sample IDs, and prevents confusion when manipulating the dataframe. You will also need to set the argument header in the function read.csv as T so that the titles of the column names are preserved.
#Read in the samples/conditions file using read.csv
#Read in the counts data using read.csv
#Convert the first column named "gene" to the rownames index
# Load libraries
library(dplyr)
library(magrittr)
library(DESeq2)
library(ggplot2)
library(RColorBrewer)
library(pheatmap)
# Read in the samples/conditions file
samples <- read.csv("C:/Users/acman/Downloads/modencodefly_phenodata (1).csv")
# Read in the counts data
counts <- read.csv("C:/Users/acman/Downloads/modencodefly_count_table (1).csv")
# Convert the first column named "gene" to row names
rownames(counts) <- counts$gene
counts$gene <- NULL
# Check the result
head(counts)
## SRX007811 SRX008180 SRX008227 SRX008238 SRX008258 SRX008015
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 3903 1601 681 648 718 921
## FBgn0000014 75 29 10 9 12 563
## FBgn0000015 33 11 8 4 8 235
## FBgn0000017 9763 4107 1814 1652 1788 6358
## FBgn0000018 832 368 146 126 170 97
## SRX008179 SRX008190 SRX008193 SRX008271 SRX008027 SRX008181
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 295 364 268 267 1764 411
## FBgn0000014 186 239 152 191 4890 957
## FBgn0000015 95 101 58 101 3000 590
## FBgn0000017 2214 2763 1906 2053 6006 1252
## FBgn0000018 33 55 25 22 339 57
## SRX008217 SRX008250 SRX008265 SRX008025 SRX008175 SRX008210
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 284 396 706 1074 563 156
## FBgn0000014 806 1036 1956 6147 3166 956
## FBgn0000015 492 715 1185 2492 1228 340
## FBgn0000017 904 1244 2456 5030 2584 746
## FBgn0000018 60 71 116 178 95 34
## SRX008212 SRX008257 SRX008010 SRX008249 SRX008252 SRX008273
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 160 440 1557 874 904 632
## FBgn0000014 1053 2876 5160 3270 2836 2048
## FBgn0000015 458 1174 1986 1454 1138 788
## FBgn0000017 847 2105 5350 3101 3046 2343
## FBgn0000018 35 75 130 99 75 56
## SRX008274 SRX008005 SRX008198 SRX008208 SRX008243 SRX008247
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 234 4488 1222 825 3030 1224
## FBgn0000014 815 4792 1390 883 3102 1192
## FBgn0000015 324 1709 541 370 1076 431
## FBgn0000017 969 8695 2448 1518 5540 2345
## FBgn0000018 25 141 43 25 97 34
## SRX008018 SRX008177 SRX008225 SRX008235 SRX008277 SRX008007
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 8755 2431 1718 5478 2212 5617
## FBgn0000014 9943 2536 1890 5782 2559 5134
## FBgn0000015 3421 886 744 1942 985 1313
## FBgn0000017 12295 3296 2440 7652 3125 9374
## FBgn0000018 229 59 50 113 48 122
## SRX008196 SRX008233 SRX008237 SRX008262 SRX008006 SRX008178
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 1039 2769 1716 1664 1749 1532
## FBgn0000014 935 2339 1404 1426 2908 2602
## FBgn0000015 240 645 392 419 1024 845
## FBgn0000017 1816 4564 2640 2853 4548 4156
## FBgn0000018 19 67 27 44 93 57
## SRX008205 SRX008242 SRX008278 SRX008020 SRX008213 SRX008215
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 549 715 390 1685 384 1397
## FBgn0000014 965 1116 638 2333 543 1832
## FBgn0000015 321 398 216 914 199 667
## FBgn0000017 1537 1816 1085 2577 572 2010
## FBgn0000018 20 23 10 42 18 31
## SRX008222 SRX008259 SRX008011 SRX008214 SRX008221 SRX008241
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 407 2141 1036 417 353 560
## FBgn0000014 512 3101 1068 448 338 589
## FBgn0000015 217 1153 419 165 139 219
## FBgn0000017 526 3328 3327 1425 1046 1719
## FBgn0000018 12 72 35 16 11 15
## SRX008256 SRX008019 SRX008167 SRX008234 SRX008251 SRX008266
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 394 2548 730 687 786 587
## FBgn0000014 475 3090 979 819 1008 810
## FBgn0000015 179 1306 404 334 450 310
## FBgn0000017 1342 6979 1985 1843 2297 1539
## FBgn0000018 15 113 33 26 39 23
## SRX008026 SRX008174 SRX008201 SRX008239 SRX008008 SRX008168
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 928 392 575 851 1696 353
## FBgn0000014 1845 796 1203 1740 2030 422
## FBgn0000015 635 295 485 617 695 155
## FBgn0000017 1440 670 1021 1330 2360 481
## FBgn0000018 150 64 102 139 371 94
## SRX008211 SRX008255 SRX008261 SRX008009 SRX008194 SRX008207
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 284 733 258 311 94 108
## FBgn0000014 375 914 296 536 182 211
## FBgn0000015 130 350 95 190 71 70
## FBgn0000017 434 1054 381 481 154 197
## FBgn0000018 66 162 49 83 29 27
## SRX008224 SRX008244 SRX008029 SRX008184 SRX008206 SRX008220
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 110 148 228 74 115 77
## FBgn0000014 196 296 535 181 278 212
## FBgn0000015 81 111 127 56 62 56
## FBgn0000017 197 292 667 226 357 242
## FBgn0000018 35 48 131 63 82 48
## SRX008267 SRX008014 SRX008182 SRX008187 SRX008189 SRX008200
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 74 393 541 168 161 163
## FBgn0000014 173 822 1342 360 434 398
## FBgn0000015 44 166 324 82 95 103
## FBgn0000017 216 1036 1416 490 452 423
## FBgn0000018 57 107 167 44 57 39
## SRX008156 SRX008199 SRX008245 SRX008253 SRX008023 SRX008218
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 1205 2939 945 1022 3602 926
## FBgn0000014 1598 3795 1136 1349 3227 836
## FBgn0000015 389 1029 322 338 470 134
## FBgn0000017 1791 4400 1395 1484 5073 1312
## FBgn0000018 154 334 94 124 307 78
## SRX008246 SRX008248 SRX008275 SRX008016 SRX008192 SRX008219
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 2209 1494 1076 4011 1233 1758
## FBgn0000014 1903 1231 878 2698 885 1115
## FBgn0000015 287 201 130 529 192 229
## FBgn0000017 3019 1972 1521 5793 1860 2517
## FBgn0000018 178 105 80 274 95 128
## SRX008236 SRX008264 SRX008013 SRX008188 SRX008197 SRX008228
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 2123 2023 1117 1004 1447 1452
## FBgn0000014 1577 1488 282 286 355 379
## FBgn0000015 325 316 99 87 153 101
## FBgn0000017 3176 2960 2356 2159 3069 3182
## FBgn0000018 156 148 264 247 320 333
## SRX008022 SRX008183 SRX008185 SRX008216 SRX012269 SRX008024
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 665 396 686 409 1011 603
## FBgn0000014 61 44 64 36 72 39
## FBgn0000015 17 19 17 15 20 15
## FBgn0000017 2170 1317 2252 1426 3357 1906
## FBgn0000018 218 149 220 137 343 194
## SRX008173 SRX008195 SRX008202 SRX008254 SRX012270 SRX008012
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 336 380 431 437 738 717
## FBgn0000014 35 40 36 29 54 340
## FBgn0000015 17 20 8 10 12 256
## FBgn0000017 1123 1239 1551 1410 2504 1388
## FBgn0000018 138 164 143 136 224 170
## SRX008170 SRX008263 SRX008268 SRX016332 SRX008028 SRX008191
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 630 389 303 373 1234 1112
## FBgn0000014 307 195 148 222 309 298
## FBgn0000015 270 165 116 142 347 292
## FBgn0000017 1279 790 579 849 2007 1837
## FBgn0000018 186 107 76 108 319 280
## SRX008223 SRX008260 SRX008021 SRX008203 SRX008226 SRX008229
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 989 1026 430 200 158 430
## FBgn0000014 243 281 137 75 47 132
## FBgn0000015 305 328 167 118 70 177
## FBgn0000017 1596 1644 898 461 341 777
## FBgn0000018 251 308 154 98 54 137
## SRX008232 SRX012271 SRX008171 SRX008186 SRX008240 SRX010758
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 250 1074 3386 6313 3210 4811
## FBgn0000014 74 339 647 1310 632 939
## FBgn0000015 98 322 202 375 203 259
## FBgn0000017 504 2100 2759 5211 2524 3931
## FBgn0000018 71 332 128 263 96 165
## SRX016331 SRX008155 SRX008169 SRX008172 SRX008231 SRX008542
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 3803 2393 1834 1890 744 952
## FBgn0000014 751 572 404 440 156 212
## FBgn0000015 219 252 183 210 76 81
## FBgn0000017 3212 2584 2013 2156 809 963
## FBgn0000018 117 216 142 143 47 65
## SRX008017 SRX008209 SRX008230 SRX008270 SRX008157 SRX008204
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 971 844 1204 620 1519 1267
## FBgn0000014 307 253 371 159 1841 1650
## FBgn0000015 146 146 168 93 340 332
## FBgn0000017 1168 1085 1513 782 1973 1592
## FBgn0000018 134 124 146 82 101 112
## SRX008269 SRX008272 SRX008276
## FBgn0000003 0 0 0
## FBgn0000008 1448 1185 3225
## FBgn0000014 1875 1370 4048
## FBgn0000015 368 261 711
## FBgn0000017 1833 1465 4066
## FBgn0000018 116 102 294
head(samples)
## sample.id num.tech.reps stage
## 1 SRX007811 5 Embryos0002
## 2 SRX008180 2 Embryos0002
## 3 SRX008227 1 Embryos0002
## 4 SRX008238 1 Embryos0002
## 5 SRX008258 1 Embryos0002
## 6 SRX008015 2 Embryos0204
##Since we do not want to work on all comparisons, we will filter out the samples and conditions that we do not need.
#Filter the samples/conditions file to samples in "adultfemale5d" and "adultmale5d"
#you can do this using the subset() function, or the %>% (pipe) and %in% membership operators as part of the dplyr package
#Then, filter the counts file to samples in "adultfemale5d" and "adultmale5d"
#Finally, print the filtered groups dataframe and the first 10 lines of the filtered counts dataframe
library(dplyr)
library(magrittr)
# Filter the samples/conditions file to adultfemale5d and adultmale5d
filtered_samples <- samples %>%
filter(stage %in% c("adultfemale5d", "adultmale5d"))
# Filter the counts file to keep only those same samples
filtered_counts <- counts[, colnames(counts) %in% filtered_samples$sample.id]
# Print the filtered groups dataframe
filtered_samples
## sample.id num.tech.reps stage
## 1 SRX008024 2 adultfemale5d
## 2 SRX008173 1 adultfemale5d
## 3 SRX008195 1 adultfemale5d
## 4 SRX008202 1 adultfemale5d
## 5 SRX008254 1 adultfemale5d
## 6 SRX012270 2 adultfemale5d
## 7 SRX008021 3 adultmale5d
## 8 SRX008203 1 adultmale5d
## 9 SRX008226 1 adultmale5d
## 10 SRX008229 1 adultmale5d
## 11 SRX008232 2 adultmale5d
## 12 SRX012271 4 adultmale5d
# Print the first 10 lines of the filtered counts dataframe
head(filtered_counts, 10)
## SRX008024 SRX008173 SRX008195 SRX008202 SRX008254 SRX012270
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 603 336 380 431 437 738
## FBgn0000014 39 35 40 36 29 54
## FBgn0000015 15 17 20 8 10 12
## FBgn0000017 1906 1123 1239 1551 1410 2504
## FBgn0000018 194 138 164 143 136 224
## FBgn0000022 1 0 1 1 0 0
## FBgn0000024 97 63 76 62 58 85
## FBgn0000028 3 4 1 5 2 5
## FBgn0000032 899 592 862 588 629 923
## SRX008021 SRX008203 SRX008226 SRX008229 SRX008232 SRX012271
## FBgn0000003 0 0 0 0 0 0
## FBgn0000008 430 200 158 430 250 1074
## FBgn0000014 137 75 47 132 74 339
## FBgn0000015 167 118 70 177 98 322
## FBgn0000017 898 461 341 777 504 2100
## FBgn0000018 154 98 54 137 71 332
## FBgn0000022 0 0 0 0 0 0
## FBgn0000024 610 360 238 645 348 1309
## FBgn0000028 160 71 64 125 80 469
## FBgn0000032 303 189 114 340 151 730
#Create the DESeqDataSet object
#A DESeqDataSet object is the data structure in the DESeq2 package for performing differential expression analysis
#print the DESeqDataSet object by typing the variable name on a new line by itself
# Create the DESeqDataSet object
library(DESeq2)
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("DESeq2")
## Bioconductor version 3.22 (BiocManager 1.30.27), R 4.5.2 (2025-10-31 ucrt)
## Warning: package(s) not installed when version(s) same as or greater than current; use
## `force = TRUE` to re-install: 'DESeq2'
## Installation paths not writeable, unable to update packages
## path: C:/Program Files/R/R-4.5.2/library
## packages:
## cluster, foreign, lattice, mgcv, survival
## Old packages: 'later', 'phyloseq', 'renv', 'systemfonts', 'vegan'
dds <- DESeqDataSetFromMatrix(
countData = filtered_counts,
colData = filtered_samples,
design = ~ stage
)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
# Print the object
dds
## class: DESeqDataSet
## dim: 14869 12
## metadata(1): version
## assays(1): counts
## rownames(14869): FBgn0000003 FBgn0000008 ... FBgn0261574 FBgn0261575
## rowData names(0):
## colnames(12): SRX008024 SRX008173 ... SRX008232 SRX012271
## colData names(3): sample.id num.tech.reps stage
#call the DESeq function on a DESeqDataSet object
#extract the results and save them to the variable name
#print the first 10 lines of the results file
# Run the differential expression analysis
library(DESeq2)
filtered_samples <- filtered_samples[match(colnames(filtered_counts), filtered_samples$sample.id), ]
dds <- DESeqDataSetFromMatrix(
countData = filtered_counts,
colData = filtered_samples,
design = ~ stage
)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
dds <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
res <- results(dds)
head(res, 10)
## log2 fold change (MLE): stage adultmale5d vs adultfemale5d
## Wald test p-value: stage adultmale5d vs adultfemale5d
## DataFrame with 10 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue
## <numeric> <numeric> <numeric> <numeric> <numeric>
## FBgn0000003 0.000000 NA NA NA NA
## FBgn0000008 407.826981 -0.562735 0.0858582 -6.554233 5.59287e-11
## FBgn0000014 71.817903 1.425201 0.1308824 10.889174 1.29812e-27
## FBgn0000015 75.111228 3.266039 0.2079481 15.706033 1.37519e-55
## FBgn0000017 1147.875974 -1.275833 0.0837582 -15.232330 2.15763e-52
## FBgn0000018 141.237348 -0.574080 0.0884264 -6.492177 8.46048e-11
## FBgn0000022 0.260019 -1.570634 2.0699669 -0.758772 4.47989e-01
## FBgn0000024 276.893656 2.679440 0.0921583 29.074327 7.58177e-186
## FBgn0000028 61.379165 5.158434 0.3570199 14.448590 2.55869e-47
## FBgn0000032 502.076585 -1.656786 0.0929024 -17.833624 3.87474e-71
## padj
## <numeric>
## FBgn0000003 NA
## FBgn0000008 8.50316e-11
## FBgn0000014 2.75901e-27
## FBgn0000015 5.23212e-55
## FBgn0000017 7.78738e-52
## FBgn0000018 1.28078e-10
## FBgn0000022 4.79309e-01
## FBgn0000024 1.00697e-184
## FBgn0000028 8.39766e-47
## FBgn0000032 1.82417e-70
#It is important to know the order of the condition/group levels in a DESeqDataSet object so we know what our reference condition is
#The first condition listed in the DESeqDataSet object is the reference condition
#A positive gene fold change means that the gene is upregulated in the second condition relative to the first (reference) condition.
#Check the order of the condition levels by typing name_of_your_DESeq_object$stage
dds$stage <- relevel(dds$stage, ref = "adultmale5d")
dds <- DESeq(dds) # rerun DESeq after changing reference
## using pre-existing size factors
## estimating dispersions
## found already estimated dispersions, replacing these
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
Which of the conditions from our DESeqDataSet object is going to be used as the reference level?
What is the biological meaning of the log2 fold change for gene FBgn0000014?
#In R, compute the log2 fold change of a gene that has:
#A gene expression equal to 230 in the “adultfemale5d” condition.
#A gene expression equal to 750 in the “adultmale5d” condition.
#Express the results where the adultfemale5d is the reference condition
# Expression values
expr_ref <- 230 # adultfemale5d
expr_second <- 750 # adultmale5d
# Compute log2 fold change
log2FC <- log2(expr_second / expr_ref)
log2FC
## [1] 1.705257
Select the answer that completes the true statement:
A log2 fold change of [blank] means that the gene has [blank] expression (approximately[blank]) in [blank] compared to [blank]
#Filter your DESeq2 results so that only genes that meet a p-adjusted significance level of 0.05 and show at least a 2-fold change are retained
#When you perform thousands of statistical tests (one for each gene), you will by chance call genes differentially expressed while they are not (false positives). You can control for this by applying certain statistical procedures called multiple hypothesis test correction.
#Print the dimensions of this list
# Filter significant genes: padj < 0.05 and |log2FoldChange| >= 1 (2-fold change)
sig_genes <- res[which(res$padj < 0.05 & abs(res$log2FoldChange) >= 1), ]
# Print the number of genes and columns
dim(sig_genes)
## [1] 8258 6
#To the unfiltered results, add a column called "category" whose entry indicates
#whether a gene falls into one of three groups:
# 1) adjusted p-value > 0.05 AND |log2FoldChange| < 2 → color = "red"
# 2) adjusted p-value ≤ 0.05 AND |log2FoldChange| < 2 → color = "blue"
# 3) adjusted p-value ≤ 0.05 AND |log2FoldChange| ≥ 2 → color = "green"
#Then, create a Volcano plot (log2FoldChange vs -log10(p-value)) colored accordingly
res_df <- as.data.frame(res) # convert to data frame if it's not already
res_df$category <- "red" # default category
# Assign blue: significant but <2-fold change
res_df$category[res_df$padj <= 0.05 & abs(res_df$log2FoldChange) < 1] <- "blue"
# Assign green: significant and ≥2-fold change
res_df$category[res_df$padj <= 0.05 & abs(res_df$log2FoldChange) >= 1] <- "green"
library(ggplot2)
ggplot(res_df, aes(x = log2FoldChange, y = -log10(pvalue), color = category)) +
geom_point(alpha = 0.6, size = 1.5) +
scale_color_manual(values = c("red" = "red", "blue" = "blue", "green" = "green")) +
theme_minimal() +
xlab("Log2 Fold Change") +
ylab("-log10(p-value)") +
ggtitle("Volcano Plot of Differentially Expressed Genes") +
theme(legend.title = element_blank())
## Warning: Removed 1800 rows containing missing values or values outside the scale range
## (`geom_point()`).
#From the top result of your filtered list from Question 11, plot the counts per sample using the plotCounts() function
# Identify the gene with the smallest adjusted p-value
top_gene <- rownames(sig_genes)[1] # the first gene in your filtered list
# Plot counts for this gene by stage
plotCounts(
dds,
gene = top_gene,
intgroup = "stage",
main = paste("Counts for", top_gene),
xlab = "Stage"
)
This plot shows this gene is [blank] in group adultmale5d.