## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'ggplot2'
## [1] "/Users/nataliamunozperez/Documents/Einstein/PhD/Martin Lab/Year 1/Data/RNA-seq/counts"
## [1] "/Users/nataliamunozperez/Documents/Einstein/PhD/Martin Lab/Year 1/Data/RNA-seq/counts"
## parent_1A parent_2A parent_3A Sample_2_1_1
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median : 0 Median : 0 Median : 0 Median : 0
## Mean : 1661 Mean : 1286 Mean : 1326 Mean : 1533
## 3rd Qu.: 36 3rd Qu.: 34 3rd Qu.: 28 3rd Qu.: 34
## Max. :27002543 Max. :20936225 Max. :22118521 Max. :27493245
## Sample_2_1_2 Sample_2_1_3 Sample_2_2_1 Sample_2_2_2
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median : 0 Median : 0 Median : 0 Median : 0
## Mean : 1872 Mean : 1626 Mean : 1473 Mean : 1387
## 3rd Qu.: 40 3rd Qu.: 36 3rd Qu.: 31 3rd Qu.: 29
## Max. :33341466 Max. :29367119 Max. :24550302 Max. :23262095
## Sample_2_2_3 Sample_6_1_1 Sample_6_1_2 Sample_6_1_3
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median : 0 Median : 0 Median : 0 Median : 0
## Mean : 2030 Mean : 1550 Mean : 1563 Mean : 1720
## 3rd Qu.: 45 3rd Qu.: 31 3rd Qu.: 35 3rd Qu.: 36
## Max. :35812014 Max. :26936999 Max. :27213534 Max. :28815476
## Sample_Scr2_1A Sample_Scr2_2A Sample_Scr2_3A gene_name
## Min. : 0 Min. : 0 Min. : 0 Length:55541
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 0 Median : 0 Median : 0 Mode :character
## Mean : 1533 Mean : 1446 Mean : 1389
## 3rd Qu.: 36 3rd Qu.: 34 3rd Qu.: 32
## Max. :23431953 Max. :23751800 Max. :22851275
## [1] 55541 16
## [1] "/Users/nataliamunozperez/Documents/Einstein/PhD/Martin Lab/Year 1/Data/RNA-seq/counts"
## [1] "/Users/nataliamunozperez/Documents/Einstein/PhD/Martin Lab/Year 1/Data/RNA-seq/counts"
## [1] 15 3
3.Generate a dds object
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'DESeq2'
## Warning: package 'S4Vectors' was built under R version 4.1.2
## Warning: package 'GenomicRanges' was built under R version 4.1.2
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## class: DESeqDataSet
## dim: 55541 12
## metadata(1): version
## assays(1): counts
## rownames(55541): ENSMUSG00000000001.4 ENSMUSG00000000003.15 ...
## __not_aligned __alignment_not_unique
## rowData names(0):
## colnames(12): parent_1A parent_2A ... Sample_6_1_2 Sample_6_1_3
## colData names(3): condition group replicate
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## class: DESeqDataSet
## dim: 55541 12
## metadata(1): version
## assays(1): counts
## rownames(55541): ENSMUSG00000000001.4 ENSMUSG00000000003.15 ...
## __not_aligned __alignment_not_unique
## rowData names(0):
## colnames(12): Sample_2_1_1 Sample_2_1_2 ... Sample_Scr2_2A
## Sample_Scr2_3A
## colData names(3): condition group replicate
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## class: DESeqDataSet
## dim: 55541 6
## metadata(1): version
## assays(1): counts
## rownames(55541): ENSMUSG00000000001.4 ENSMUSG00000000003.15 ...
## __not_aligned __alignment_not_unique
## rowData names(0):
## colnames(6): parent_1A parent_2A ... Sample_Scr2_2A Sample_Scr2_3A
## colData names(3): condition group replicate
3b. Make sure factor levels are correct
## [1] "parent" "scr2"
4.Pre-filtering: filter out lowly expressed genes by only keeping rows with at least 10 counts per gene
## log2 fold change (MLE): condition clone vs control
## Wald test p-value: condition clone vs control
## DataFrame with 55541 rows and 6 columns
## baseMean log2FoldChange lfcSE stat
## <numeric> <numeric> <numeric> <numeric>
## ENSMUSG00000000001.4 1.50452e+04 0.524650 0.0853515 6.1469390
## ENSMUSG00000000003.15 6.33682e-02 0.126886 3.5986409 0.0352595
## ENSMUSG00000000028.15 3.34137e+03 -0.194482 0.1339397 -1.4520105
## ENSMUSG00000000031.16 6.07257e+02 -7.770508 0.9940525 -7.8169996
## ENSMUSG00000000037.16 1.84680e+00 -1.383965 1.2275245 -1.1274438
## ... ... ... ... ...
## __no_feature 4391282 -0.2050530 0.1304784 -1.57155
## __ambiguous 2495136 -0.2823281 0.1409592 -2.00291
## __too_low_aQual 0 NA NA NA
## __not_aligned 0 NA NA NA
## __alignment_not_unique 26697626 0.0901309 0.0629959 1.43074
## pvalue padj
## <numeric> <numeric>
## ENSMUSG00000000001.4 7.89925e-10 6.10884e-08
## ENSMUSG00000000003.15 9.71873e-01 NA
## ENSMUSG00000000028.15 1.46499e-01 4.54375e-01
## ENSMUSG00000000031.16 5.40973e-15 9.34790e-13
## ENSMUSG00000000037.16 2.59555e-01 NA
## ... ... ...
## __no_feature 0.1160556 0.400921
## __ambiguous 0.0451873 0.224862
## __too_low_aQual NA NA
## __not_aligned NA NA
## __alignment_not_unique 0.1525045 0.464131
## log2 fold change (MLE): condition clone vs control
## Wald test p-value: condition clone vs control
## DataFrame with 55541 rows and 6 columns
## baseMean log2FoldChange lfcSE stat
## <numeric> <numeric> <numeric> <numeric>
## ENSMUSG00000000001.4 1.59289e+04 0.2414369 0.0737715 3.2727678
## ENSMUSG00000000003.15 6.42570e-02 0.2291678 3.5986409 0.0636818
## ENSMUSG00000000028.15 3.33477e+03 -0.0929964 0.1276046 -0.7287859
## ENSMUSG00000000031.16 5.15760e+01 -3.9539147 0.9023733 -4.3816843
## ENSMUSG00000000037.16 1.28053e+00 0.3673577 1.4522845 0.2529516
## ... ... ... ... ...
## __no_feature 4257890 0.0612479 0.1278220 0.479166
## __ambiguous 2457085 -0.1130532 0.0777348 -1.454345
## __too_low_aQual 0 NA NA NA
## __not_aligned 0 NA NA NA
## __alignment_not_unique 26744126 0.1852999 0.0580834 3.190241
## pvalue padj
## <numeric> <numeric>
## ENSMUSG00000000001.4 1.06500e-03 0.01108376
## ENSMUSG00000000003.15 9.49224e-01 NA
## ENSMUSG00000000028.15 4.66133e-01 0.73032335
## ENSMUSG00000000031.16 1.17765e-05 0.00024855
## ENSMUSG00000000037.16 8.00306e-01 NA
## ... ... ...
## __no_feature 0.63182074 0.8399297
## __ambiguous 0.14585071 0.3904142
## __too_low_aQual NA NA
## __not_aligned NA NA
## __alignment_not_unique 0.00142154 0.0139158
## log2 fold change (MLE): group scr2 vs parent
## Wald test p-value: group scr2 vs parent
## DataFrame with 55541 rows and 6 columns
## baseMean log2FoldChange lfcSE stat
## <numeric> <numeric> <numeric> <numeric>
## ENSMUSG00000000001.4 11788.31229 0.288504 0.162454 1.775916
## ENSMUSG00000000003.15 0.00000 NA NA NA
## ENSMUSG00000000028.15 3347.62931 -0.096815 0.195210 -0.495953
## ENSMUSG00000000031.16 1202.45920 -3.809510 0.763116 -4.992044
## ENSMUSG00000000037.16 2.06334 -1.741200 1.662284 -1.047474
## ... ... ... ... ...
## __no_feature 4192386 -0.2602742 0.224274 -1.160519
## __ambiguous 2553811 -0.1632179 0.251762 -0.648301
## __too_low_aQual 0 NA NA NA
## __not_aligned 0 NA NA NA
## __alignment_not_unique 23151471 -0.0896871 0.155899 -0.575291
## pvalue padj
## <numeric> <numeric>
## ENSMUSG00000000001.4 7.57467e-02 0.628107119
## ENSMUSG00000000003.15 NA NA
## ENSMUSG00000000028.15 6.19928e-01 0.979093604
## ENSMUSG00000000031.16 5.97437e-07 0.000110118
## ENSMUSG00000000037.16 2.94881e-01 NA
## ... ... ...
## __no_feature 0.245837 0.871268
## __ambiguous 0.516790 0.967566
## __too_low_aQual NA NA
## __not_aligned NA NA
## __alignment_not_unique 0.565094 0.972468
##
## out of 38992 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 1340, 3.4%
## LFC < 0 (down) : 1778, 4.6%
## outliers [1] : 74, 0.19%
## low counts [2] : 16275, 42%
## (mean count < 2)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
##
## out of 38952 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 2255, 5.8%
## LFC < 0 (down) : 2224, 5.7%
## outliers [1] : 43, 0.11%
## low counts [2] : 15516, 40%
## (mean count < 2)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
##
## out of 35175 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 208, 0.59%
## LFC < 0 (down) : 255, 0.72%
## outliers [1] : 30, 0.085%
## low counts [2] : 16529, 47%
## (mean count < 6)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] 3118
## [1] 4479
## [1] 463
6: Use the alpha argument in the results function to identify a different p-value cut off (in this case, 0.05) and output a summary with the chosen cutoff.
##
## out of 38992 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 1001, 2.6%
## LFC < 0 (down) : 1417, 3.6%
## outliers [1] : 74, 0.19%
## low counts [2] : 15537, 40%
## (mean count < 2)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
##
## out of 38952 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 1751, 4.5%
## LFC < 0 (down) : 1723, 4.4%
## outliers [1] : 43, 0.11%
## low counts [2] : 16253, 42%
## (mean count < 2)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
##
## out of 35175 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 154, 0.44%
## LFC < 0 (down) : 185, 0.53%
## outliers [1] : 30, 0.085%
## low counts [2] : 13885, 39%
## (mean count < 3)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
8b. MA plots and volcano plots
9.Using the plotcounts functions, you can visualize the read counts of a gene across different groups.
Plot the gene from your results that has the lowest p-value from your results.
## [1] "Ddx19b"
## [1] "Sri"
## [1] "Rundc3b"
12.Save your results into a .txt file
13.Plot PCA
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'AnnotationDbi'
## Warning: package 'AnnotationDbi' was built under R version 4.1.2
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'AnnotationHub'
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'org.Mm.eg.db'
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
## [11] "GENETYPE" "GO" "GOALL" "IPI" "MGI"
## [16] "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID"
## [21] "PROSITE" "REFSEQ" "SYMBOL" "UNIPROT"
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
## [11] "GENETYPE" "GO" "GOALL" "IPI" "MGI"
## [16] "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID"
## [21] "PROSITE" "REFSEQ" "SYMBOL" "UNIPROT"
## [1] "11287" "11298" "11302" "11303" "11304" "11305"
## [1] "ENSMUSG00000033658" "ENSMUSG00000060803" "ENSMUSG00000051579"
## [4] "ENSMUSG00000046991" "ENSMUSG00000117904" "ENSMUSG00000074863"
## [1] "ENSMUSG00000003161" "ENSMUSG00000040570" "ENSMUSG00000003623"
## [4] "ENSMUSG00000079659" "ENSMUSG00000054099" "ENSMUSG00000033658"
## [1] "ENSMUSG00000040570" "ENSMUSG00000003623" "ENSMUSG00000003161"
## [4] "ENSMUSG00000056004" "ENSMUSG00000054099" "ENSMUSG00000002297"
## AnnotationHub with 1 record
## # snapshotDate(): 2021-10-20
## # names(): AH78811
## # $dataprovider: Ensembl
## # $species: Mus musculus
## # $rdataclass: EnsDb
## # $rdatadateadded: 2019-10-29
## # $title: Ensembl 99 EnsDb for Mus musculus
## # $description: Gene and protein annotations for Mus musculus based on Ensem...
## # $taxonomyid: 10090
## # $genome: GRCm38
## # $sourcetype: ensembl
## # $sourceurl: http://www.ensembl.org
## # $sourcesize: NA
## # $tags: c("99", "AHEnsDbs", "Annotation", "EnsDb", "Ensembl", "Gene",
## # "Protein", "Transcript")
## # retrieve record with 'object[["AH78811"]]'
## EnsDb for Ensembl:
## |Backend: SQLite
## |Db type: EnsDb
## |Type of Gene ID: Ensembl Gene ID
## |Supporting package: ensembldb
## |Db created by: ensembldb package from Bioconductor
## |script_version: 0.3.5
## |Creation time: Wed Feb 5 00:04:44 2020
## |ensembl_version: 99
## |ensembl_host: localhost
## |Organism: Mus musculus
## |taxonomy_id: 10090
## |genome_build: GRCm38
## |DBSCHEMAVERSION: 2.1
## | No. of genes: 56289.
## | No. of transcripts: 144726.
## |Protein data available.
## [1] "DESCRIPTION" "ENTREZID" "EXONID"
## [4] "EXONIDX" "EXONSEQEND" "EXONSEQSTART"
## [7] "GCCONTENT" "GENEBIOTYPE" "GENEID"
## [10] "GENEIDVERSION" "GENENAME" "GENESEQEND"
## [13] "GENESEQSTART" "INTERPROACCESSION" "ISCIRCULAR"
## [16] "PROTDOMEND" "PROTDOMSTART" "PROTEINDOMAINID"
## [19] "PROTEINDOMAINSOURCE" "PROTEINID" "PROTEINSEQUENCE"
## [22] "SEQCOORDSYSTEM" "SEQLENGTH" "SEQNAME"
## [25] "SEQSTRAND" "SYMBOL" "TXBIOTYPE"
## [28] "TXCDSSEQEND" "TXCDSSEQSTART" "TXID"
## [31] "TXIDVERSION" "TXNAME" "TXSEQEND"
## [34] "TXSEQSTART" "TXSUPPORTLEVEL" "UNIPROTDB"
## [37] "UNIPROTID" "UNIPROTMAPPINGTYPE"
## [1] "ENTREZID" "EXONID" "GENEBIOTYPE"
## [4] "GENEID" "GENENAME" "PROTDOMID"
## [7] "PROTEINDOMAINID" "PROTEINDOMAINSOURCE" "PROTEINID"
## [10] "SEQNAME" "SEQSTRAND" "SYMBOL"
## [13] "TXBIOTYPE" "TXID" "TXNAME"
## [16] "UNIPROTID"
## Warning: Unable to map 5 of 2431 requested IDs.
## Warning: Unable to map 10 of 3458 requested IDs.
## Warning: Unable to map 1 of 348 requested IDs.
Make a pretty heatmap
Gene ontology
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'topGO'
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'org.Mm.eg.db'
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'tidygraph'
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'clusterProfiler'
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'enrichplot'
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'ggplot2'