Today we began our independent data research. First step - Install DESeq2- I will have to this every time I restart this R project to get access to the library.
We are going to compare mutant auxin to WT auxin In order to do this we are going to use many different R functions and then GoTerm analysis
The purpose of this investigation is to understand which genes are regulated by NHR-25 within C. elegans, and which tissues these genes are expressed in so that we may decipher phenotypic associations. We are going to investigate gene expression differentiation between C. elegans that had transient knockdown of NHR-25 and wildtype C. elegans using DESeq2. Differentially expressed genes will then be run through GO Term enrichment analysis to identify conserved pathways and the top 50 regulated genes will be selected for further investigation. The genes will then run through a tissue and phenotypic enrichment analysis to locate the gene expression and phenotypic associations.
The command below must be run each time to download the DESeq2 package in order to look at our data
plot(cars)
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("DESeq2", version = "3.8")
Bioconductor version 3.8 (BiocManager 1.30.4), R 3.5.2 (2018-12-20)
Installing package(s) 'DESeq2'
cannot open URL 'https://bioconductor.org/packages/3.8/data/experiment/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.8/workflows/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'trying URL 'https://bioconductor.org/packages/3.8/bioc/bin/macosx/el-capitan/contrib/3.5/DESeq2_1.22.2.tgz'
Content type 'application/x-gzip' length 4063874 bytes (3.9 MB)
==================================================
downloaded 3.9 MB
The downloaded binary packages are in
/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T//RtmpKyO24P/downloaded_packages
Update old packages: 'MASS', 'Matrix', 'mgcv', 'RcppArmadillo', 'rlang', 'survival', 'xfun'
Update all/some/none? [a/s/n]:
a
cannot open URL 'https://bioconductor.org/packages/3.8/data/experiment/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.8/workflows/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'
There are binary versions available but the source versions are later:
no
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/MASS_7.3-51.4.tgz'
Content type 'application/x-gzip' length 1163710 bytes (1.1 MB)
==================================================
downloaded 1.1 MB
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/Matrix_1.2-17.tgz'
Content type 'application/x-gzip' length 4700204 bytes (4.5 MB)
==================================================
downloaded 4.5 MB
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/mgcv_1.8-28.tgz'
Content type 'application/x-gzip' length 3115955 bytes (3.0 MB)
==================================================
downloaded 3.0 MB
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/RcppArmadillo_0.9.300.2.0.tgz'
Content type 'application/x-gzip' length 1676095 bytes (1.6 MB)
==================================================
downloaded 1.6 MB
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/rlang_0.3.1.tgz'
Content type 'application/x-gzip' length 1122574 bytes (1.1 MB)
==================================================
downloaded 1.1 MB
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/survival_2.44-1.1.tgz'
Content type 'application/x-gzip' length 5175991 bytes (4.9 MB)
==================================================
downloaded 4.9 MB
The downloaded binary packages are in
/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T//RtmpKyO24P/downloaded_packages
installing the source package ‘xfun’
trying URL 'https://cran.rstudio.com/src/contrib/xfun_0.6.tar.gz'
Content type 'application/x-gzip' length 54516 bytes (53 KB)
==================================================
downloaded 53 KB
* installing *source* package ‘xfun’ ...
** package ‘xfun’ successfully unpacked and MD5 sums checked
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (xfun)
The downloaded source packages are in
‘/private/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T/RtmpKyO24P/downloaded_packages’
library("DESeq2")
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap,
parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colMeans, colnames, colSums,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl,
intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget, order, paste,
pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
rowSums, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which,
which.max, which.min
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite
Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: DelayedArray
Loading required package: matrixStats
Attaching package: ‘matrixStats’
The following objects are masked from ‘package:Biobase’:
anyMissing, rowMedians
Loading required package: BiocParallel
Attaching package: ‘DelayedArray’
The following objects are masked from ‘package:matrixStats’:
colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
The following objects are masked from ‘package:base’:
aperm, apply
After downloading DESeq2 we wanted to create a project table with our data in it,we decided to first run the data with all 6 of our data sets
AllProject_table <- read.table("Ourdatatable.txt", header = TRUE, sep= '\t')
AllProject_data <- DESeqDataSetFromHTSeqCount(sampleTable = AllProject_table, directory = 'Alldata', design = ~ Condition)
Normal_AllProject_Data_In_R <- vst(AllProject_data, blind = FALSE)
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
plotPCA(Normal_AllProject_Data_In_R, intgroup=c("Condition", "Replicate"))
This plot shows that MUT1 has a lot of variation compared to the other mut data sets, may be a problem with it.
Can find the plot over by plots, we can clearly see that Mut 1 is not similar to the other mutation data sets, which may cause problems in the future, also use the view feature in order to see the plot that was made
Deseq_Allproject_analysis_data <- DESeq(AllProject_data)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Results_for_AllProject <- results(Deseq_Allproject_analysis_data, contrast = c("Condition", "mut", "norm"))
summary(Results_for_AllProject)
out of 26103 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 102, 0.39%
LFC < 0 (down) : 97, 0.37%
outliers [1] : 1674, 6.4%
low counts [2] : 13225, 51%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
These are the summaries for the DESeq analysis. We can see theat 102 genes are upregulated and 97 are downregulated.
Next we want to make a plot to see if some genes are being over/underexpressed look at fold changes
RNAi_N2_res_ordered <- Results_for_AllProject [order(Results_for_AllProject$pvalue),]
RNAi_N2_res_sig <- subset(RNAi_N2_res_ordered, padj < .05)
write.table(as.data.frame(RNAi_N2_res_sig), file="Final_AllData_for_GOTERM.txt", sep = "\t")
plotMA(Results_for_AllProject)
View(plotMA(Results_for_AllProject))
To view the plot do View(plotMA(Results_for_AllProject)) also taking screen shots of the plots
Summary-this plot shows us that the red are outliers that are either upregulated or downregulated-a visual representation
Next we wanted to create a pheatmap to see which genes are being overexpressed
install.packages("pheatmap")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/pheatmap_1.0.12.tgz'
Content type 'application/x-gzip' length 75564 bytes (73 KB)
==================================================
downloaded 73 KB
The downloaded binary packages are in
/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T//Rtmp8BR6vx/downloaded_packages
library("pheatmap")
subset_first_25_genes <- order(rowMeans(counts(Deseq_Allproject_analysis_data, normalized = TRUE)), decreasing = TRUE)[1:25]
pheatmap_aux_analysis <- normTransform(AllProject_data)
pheatmap(assay(pheatmap_aux_analysis)[subset_first_25_genes,])
since we can see that there is a problem with KRY85_1_aux (all of the red on the pheatmap) we have decided that we are going to try and see what the results are like if we do not use this data Therefore I am going to run everything again but without that one data set
since we can see that there is a problem with KRY85_1_aux we have decided that we are going to try and see what the results are like if we do not use this data Therefore I am going to run everything again but without that one data set Furthermore we wanted to run Glimma–which is a new tool that Dr. TS showed us.
We furthermore wanted to investigate just the one data set being taken out since from the principle component we can see that the three norms are all relatively similar, therefore the more data the better, truly the only odd one is KRY85_1_aux.
OneProject_table <- read.table("Ourdatatableonegone.txt", header = TRUE, sep= '\t')
OneProject_data <- DESeqDataSetFromHTSeqCount(sampleTable = OneProject_table, directory = 'Onedata', design = ~ Condition)
Normal_OneProject_Data_In_R <- vst(OneProject_data, blind = FALSE)
plotPCA(Normal_OneProject_Data_In_R, intgroup=c("Condition", "Replicate"))
Deseq_Oneproject_analysis_data <- DESeq(OneProject_data)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Results_for_OneProject <- results(Deseq_Oneproject_analysis_data, contrast = c("Condition", "mut", "norm"))
summary(Results_for_OneProject)
out of 26089 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 229, 0.88%
LFC < 0 (down) : 336, 1.3%
outliers [1] : 1, 0.0038%
low counts [2] : 11259, 43%
(mean count < 4)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
RNAi_N2_oneres_ordered <- Results_for_OneProject [order(Results_for_OneProject$pvalue),]
RNAi_N2_oneres_sig <- subset(RNAi_N2_oneres_ordered, padj < .05)
write.table(as.data.frame(RNAi_N2_oneres_sig), file="Final_OneData_for_GOTERM.txt", sep = "\t")
plotMA(Results_for_OneProject)
View(plotMA(Results_for_OneProject))
install.packages("pheatmap")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/pheatmap_1.0.12.tgz'
Content type 'application/x-gzip' length 75564 bytes (73 KB)
==================================================
downloaded 73 KB
The downloaded binary packages are in
/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T//RtmpKyO24P/downloaded_packages
library("pheatmap")
subset_onefirst_25_genes <- order(rowMeans(counts(Deseq_Oneproject_analysis_data, normalized = TRUE)), decreasing = TRUE)[1:25]
pheatmap_auxone_analysis <- normTransform(OneProject_data)
pheatmap(assay(pheatmap_auxone_analysis)[subset_onefirst_25_genes,])
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("GenomeInfoDb", version = "3.8")
Bioconductor version 3.8 (BiocManager 1.30.4), R 3.5.2 (2018-12-20)
Installing package(s) 'GenomeInfoDb'
cannot open URL 'https://bioconductor.org/packages/3.8/data/experiment/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.8/workflows/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'trying URL 'https://bioconductor.org/packages/3.8/bioc/bin/macosx/el-capitan/contrib/3.5/GenomeInfoDb_1.18.2.tgz'
Content type 'application/x-gzip' length 3801774 bytes (3.6 MB)
==================================================
downloaded 3.6 MB
The downloaded binary packages are in
/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T//RtmpKyO24P/downloaded_packages
Update old packages: 'rlang'
Update all/some/none? [a/s/n]:
a
cannot open URL 'https://bioconductor.org/packages/3.8/data/experiment/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.8/workflows/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'
There is a binary version available but the source version is later:
no
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/rlang_0.3.1.tgz'
Content type 'application/x-gzip' length 1122574 bytes (1.1 MB)
==================================================
downloaded 1.1 MB
The downloaded binary packages are in
/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T//RtmpKyO24P/downloaded_packages
library("GenomeInfoDb")
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Glimma", version = "3.8")
Bioconductor version 3.8 (BiocManager 1.30.4), R 3.5.2 (2018-12-20)
Installing package(s) 'Glimma'
cannot open URL 'https://bioconductor.org/packages/3.8/data/experiment/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.8/workflows/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'trying URL 'https://bioconductor.org/packages/3.8/bioc/bin/macosx/el-capitan/contrib/3.5/Glimma_1.10.1.tgz'
Content type 'application/x-gzip' length 3483591 bytes (3.3 MB)
==================================================
downloaded 3.3 MB
The downloaded binary packages are in
/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T//RtmpKyO24P/downloaded_packages
Update old packages: 'rlang'
Update all/some/none? [a/s/n]:
a
cannot open URL 'https://bioconductor.org/packages/3.8/data/experiment/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.8/workflows/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'
There is a binary version available but the source version is later:
no
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/rlang_0.3.1.tgz'
Content type 'application/x-gzip' length 1122574 bytes (1.1 MB)
==================================================
downloaded 1.1 MB
The downloaded binary packages are in
/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T//RtmpKyO24P/downloaded_packages
library(Glimma)
status <- as.numeric(Results_for_OneProject$padj < .1)
anno_onegene <- data.frame(GeneID=rownames(Results_for_OneProject), symbol = rownames(Results_for_OneProject))
glMDPlot(Results_for_OneProject, status = status, counts = counts(Deseq_Oneproject_analysis_data, normalized = TRUE), groups= Deseq_Oneproject_analysis_data$Condition,transform = TRUE, samples = colnames(Deseq_Oneproject_analysis_data), anno = anno_onegene, path = './', folder = "glimma_MD", launch = FALSE)
Data is in all of the plots above and the DESeq results are below showing over/underexpression
out of 26089 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 229, 0.88% LFC < 0 (down) : 336, 1.3% outliers [1] : 1, 0.0038% low counts [2] : 11259, 43% (mean count < 4)
Interesting that taking out the one file, leasds to more upregulated genes and downregulated Clearing the one data set was causing a lot of problems, however we can also see that mut 2 and 3 are still quite different and norm 3 is very different from norm 1 and 2 from our PCA plot.
Glimma link file:///Users/mhunter/Desktop/Genomics%20Lab%20Methods/glimma_MD/MD-Plot.html
Plot is saved on desktop names onedata
Class note 3/14/19 chipSeq data-chromatin precipitation, where transcription factor binds to the geneome and freeze those associations and then sequence where transpriction binds–this is on the galaxy organism—how to use modERN-ce10–wormby gene—-most useful are the peaks and called where it is binding–lots of different data sets and look how it is in the paper–where a transcription factor is binding—unzip it—region peak fine–use IGV and download it,
ALSO WE NEED TO USE THE EXCEL link on moodle to find the tissues HiSeq–number of bases in each gene and number of transcripts also there is a Jaspar–a place where peoaple ahve catalouged, >and then past the sequence from IGV- use to see if there are patterns
Brooke is having problems with her data set, so I have decided to run her data (which was taking out both KRY85_1_aux and N2_1_auxin to see whether by removing both data sets we have a difference in the genes that are over/underexpressed. I previously used the data that brooke sent me and tried to do a GENE Venn diagram however, found that her data and alldata were the same, however using my data there were many differnces–This venndiagram was saved in ourdata folder on my computer–Also going to try to find a way to run a venndiagram on my R
TwoProject_table <- read.table("Ourdatatabletwo.txt", header = TRUE, sep= '\t')
incomplete final line found by readTableHeader on 'Ourdatatabletwo.txt'
TwoProject_data <- DESeqDataSetFromHTSeqCount(sampleTable = TwoProject_table, directory = 'Twodata', design = ~ Condition)
Normal_TwoProject_Data_In_R <- vst(TwoProject_data, blind = FALSE)
plotPCA(Normal_TwoProject_Data_In_R, intgroup=c("Condition", "Replicate"))
Deseq_Twoproject_analysis_data <- DESeq(TwoProject_data)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Results_for_TwoProject <- results(Deseq_Twoproject_analysis_data, contrast = c("Condition", "mut", "norm"))
summary(Results_for_TwoProject)
out of 23647 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 67, 0.28%
LFC < 0 (down) : 125, 0.53%
outliers [1] : 0, 0%
low counts [2] : 11871, 50%
(mean count < 9)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
RNAi_N2_resTwo_ordered <- Results_for_TwoProject [order(Results_for_TwoProject$pvalue),]
RNAi_N2_resTwo_sig <- subset(RNAi_N2_resTwo_ordered, padj < .05)
write.table(as.data.frame(RNAi_N2_resTwo_sig), file="Final_TwoData_for_GOTERM.txt", sep = "\t")
plotMA(Results_for_TwoProject)
View(plotMA(Results_for_TwoProject))
subset_first_25Two_genes <- order(rowMeans(counts(Deseq_Twoproject_analysis_data, normalized = TRUE)), decreasing = TRUE)[1:25]
pheatmap_auxTwo_analysis <- normTransform(TwoProject_data)
pheatmap(assay(pheatmap_auxTwo_analysis)[subset_first_25Two_genes,])
BiocManager::install("GenomeInfoDb", version = "3.8")
Bioconductor version 3.8 (BiocManager 1.30.4), R 3.5.2 (2018-12-20)
Installing package(s) 'GenomeInfoDb'
cannot open URL 'https://bioconductor.org/packages/3.8/data/experiment/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.8/workflows/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'trying URL 'https://bioconductor.org/packages/3.8/bioc/bin/macosx/el-capitan/contrib/3.5/GenomeInfoDb_1.18.2.tgz'
Content type 'application/x-gzip' length 3801774 bytes (3.6 MB)
==================================================
downloaded 3.6 MB
The downloaded binary packages are in
/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T//RtmpKyO24P/downloaded_packages
Update old packages: 'rlang'
Update all/some/none? [a/s/n]:
a
cannot open URL 'https://bioconductor.org/packages/3.8/data/experiment/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.8/workflows/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'
There is a binary version available but the source version is later:
no
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/rlang_0.3.1.tgz'
Content type 'application/x-gzip' length 1122574 bytes (1.1 MB)
==================================================
downloaded 1.1 MB
The downloaded binary packages are in
/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T//RtmpKyO24P/downloaded_packages
library("GenomeInfoDb")
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Glimma", version = "3.8")
Bioconductor version 3.8 (BiocManager 1.30.4), R 3.5.2 (2018-12-20)
Installing package(s) 'Glimma'
cannot open URL 'https://bioconductor.org/packages/3.8/data/experiment/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.8/workflows/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'trying URL 'https://bioconductor.org/packages/3.8/bioc/bin/macosx/el-capitan/contrib/3.5/Glimma_1.10.1.tgz'
Content type 'application/x-gzip' length 3483591 bytes (3.3 MB)
==================================================
downloaded 3.3 MB
The downloaded binary packages are in
/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T//RtmpKyO24P/downloaded_packages
Update old packages: 'rlang'
Update all/some/none? [a/s/n]:
a
cannot open URL 'https://bioconductor.org/packages/3.8/data/experiment/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.8/workflows/bin/macosx/el-capitan/contrib/3.5/PACKAGES.rds': HTTP status was '404 Not Found'
There is a binary version available but the source version is later:
no
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/rlang_0.3.1.tgz'
Content type 'application/x-gzip' length 1122574 bytes (1.1 MB)
==================================================
downloaded 1.1 MB
The downloaded binary packages are in
/var/folders/r4/rf1jhdcd3szb7m4lswmzhk740000gn/T//RtmpKyO24P/downloaded_packages
library(Glimma)
status <- as.numeric(Results_for_TwoProject$padj < .1)
anno_twogene <- data.frame(GeneID=rownames(Results_for_TwoProject), symbol = rownames(Results_for_TwoProject))
glMDPlot(Results_for_TwoProject, status = status, counts = counts(Deseq_Twoproject_analysis_data, normalized = TRUE), groups= Deseq_Twoproject_analysis_data$Condition,transform = TRUE, samples = colnames(Deseq_Twoproject_analysis_data), anno = anno_twogene, path = './', folder = "glimma_twoMD", launch = FALSE)
DESeq-data out of 23647 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 67, 0.28% LFC < 0 (down) : 125, 0.53% outliers [1] : 0, 0% low counts [2] : 11871, 50% (mean count < 9)
If we take out two we have less under/overexpression of genes when we do with one gone
Below are my attempts to run a new R package VennDiagram This program would be so useful bc it would allow us ro title and change the color of the venndiagram, furthermore we would be able to do analysis on more than three data sets and would be able to change the size to correspond wiht the data sets size.
We are going to present on this data in class on Tuesday Struggling to get the data to work in the VennDiagram, might just keep using the GeneVenn Diagram
install.packages('VennDiagram')
library(VennDiagram)
draw.pairwise.venn(area1, area2, cross.area, category = rep("", 2),
euler.d = TRUE, scaled = TRUE, inverted = FALSE,
ext.text = TRUE, ext.percent = rep(0.05, 3), lwd =
rep(2, 2), lty = rep("solid", 2), col = rep("black",
2), fill = NULL, alpha = rep(0.5, 2), label.col =
rep("black", 3), cex = rep(1, 3), fontface =
rep("plain", 3), fontfamily = rep("serif", 3), cat.pos
= c(-50, 50), cat.dist = rep(0.025, 2), cat.cex =
rep(1, 2), cat.col = rep("black", 2), cat.fontface =
rep("plain", 2), cat.fontfamily = rep("serif", 2),
cat.just = rep(list(c(0.5, 0.5)), 2), cat.default.pos
= "outer", cat.prompts = FALSE, ext.pos = rep(0, 2),
ext.dist = rep(0, 2), ext.line.lty = "solid",
ext.length = rep(0.95, 2), ext.line.lwd = 1,
rotation.degree = 0, rotation.centre = c(0.5, 0.5),
ind = TRUE, sep.dist = 0.05, offset = 0, cex.prop =
NULL, print.mode = "raw", sigdigs = 3, ...)
Answer to last check. Our heatmaps are quite intesting. The first one we did clearly showed problems with one of our data points, however when we removed 1 data set we saw that we had a big change in what the heatmap looked like. Clearly there is a lot of variation even between the the normal data sets. Specifically it is interesting to look at when the two data sets are gone. It almost looks like from the heatmap that mut 3 and norm 2 are more simialr to one another, and vice versa, mut 2 and norm 3. However when look at hte principle component analysis we know this is not the case.
Today I am going to begin examining the actual genes that are being up/down regulated! Going to use GennVenn-the red circle is AllData, the yellow circle is OneData, and green is TwoData sets There are 75 genes that are expressed differently than all the data, and so we are going to focus on those data sets. Below are the gene names that are similar. It is interesting to not that the TwoData sets shares no similarity with the AllData set on its own. Overall all three share 45 of the same up/downregulated genes.
I am going to spend the next few classes making notes on these genes and what we know of them. I am saving the places that I cite in Zotero
Most of these genes we know nothing about–these top 20 or so B0218.7 C04F12.8 C08F8.6 C10G11.8– functional partners: gsp-3: Serine/threonine-protein phosphatase PP1-gamma; Probable phosphatase which plays a redundant role with gsp-4 in spermatogenesis by regulating sister chromatid segregation during meiosis. In addition, involved in sperm motility by controlling the dynamic disassembly of major sperm proteins (MSP) in the spermatozoan pseudopodium Identifier: W09C3.6, gsp-3 Organism: Caenorhabditis elegans C13A2.1 C13A2.3 C30H6.12 C34F11.5 C34H4.2 C38D9.2 D1007.13 E01G6.1 F01G10.5 F07G11.3 F11D11.3 F15B9.6 F20D6.2 F21F3.3 F26D2.10 F27E5.3 F33H12.7 F35E2.5 F37A4.4 F37A4.5 F42G8.8 F53B6.4 F55F8.7 K08D9.2 K08D9.6 R04B5.6 T03F7.7 T05A7.1 T16G12.7 T22C1.8 T24E12.5 T25B9.6 W01B6.2 W02A2.8 W03F11.4 Y106G6G.4 Y113G7C.1 Y37H2A.14 Y47H10A.5 Y49E10.10 Y57G11C.42 Y6E2A.10 ZK484.7 ZK616.61 ZK849.1 ZK970.7
Cut off of what we know
acp-6 alh-9 cbl-1 clec-42 cnp-2 col-163 col-38 col-45 col-49
cutl-16-Brooke is looking into cyn-6-Brooke is looking into dmd-10-Brooke is looking into dod-20-Brooke is looking into
dpy-1 dumPY–shorter than WT Process: body morphogenesis IMP PubMed positive regulation of multicellular organism growth
fer-1 Human orthologs:DYSF-LOOK INTO THIS LATER ON****** Sperm vesicle fusion protein-THERE ARE A TON OF PAPERS ON THIS ONE Function: calcium-dependent phospholipid binding Process: cell differentiation IEA
flagellated sperm motility IMP PubMed membrane fusion IMP PubMed multicellular organism development IEA
plasma membrane organization IMP PubMed pseudopodium organization IMP PubMed spermatogenesis IEA
spermatogenesis
gipc-1 gene expression in material anatomical entity
gipc-2
igeg-2 immunoglobulin, intergral protein of membrane
mltn-7- intergral portein of membrane
pgp-8-ATP binding, transmembrane transport
rab-11.2*********** GTP binding Rab-protein signal transduction
sams-5-ATP binding,ATP binding IEA
metal ion binding IEA
methionine adenosyltransferase activity IBA PubMed methionine adenosyltransferase activity IEA
nucleotide binding IEA
transferase activity tba-8-GTP binding-mitotic cell cycle, nucleotide binding, exoskeleton, structural constitute of
cytoskeleton
Process: S-adenosylmethionine biosynthetic process IEA
one-carbon metabolic process
MAT2A- Human ortholog, The protein encoded by this gene catalyzes the production of
S-adenosylmethionine from methionine and ATP. AdoMet is the key methyl donor in cellular processes. Expression Ubiquitous expression in thyroid (RPKM 234.0), adrenal (RPKM 141.3) and 25 other tissues
tsp-10
wrt-7– Found in Hedgehog-like family–look into this one
Going to look at these genes specfically to see what is exactly going on.
While looking at these genes, i have found that a lot of them them are found on the X Chromosome–which is helpful for sex orientation LOOK FOR GENES WITH ********** to look into them more later on
Wormbase Gene Richment Analysis
Term Expected Observed Enrichment Fold Change P value Q value structural constituent of cuticle GO:0042302 2.4 21 8.7 6.9e-15 8.9e-13 collagen trimer GO:0005581 2.5 20 8.1 1e-13 6.6e-12 dephosphorylation GO:0016311 3.9 18 4.6 2e-08 8.6e-07 phosphorus metabolic process GO:0006793 19 44 2.4 9.9e-08 3.2e-06 extracellular region GO:0005576 8.3 25 3 4.3e-07 1.1e-05 protein modification process GO:0036211 21 43 2.1 4.1e-06 8.8e-05 peptidyl-serine modification GO:0018209 1.9 10 5.2 4.2e-06 8.8e-05 peptidyl-threonine modification GO:0018210 1.7 9 5.2 1.1e-05 0.00017 nucleoside phosphate binding GO:1901265 19 34 1.8 0.00069 0.0098 peptidase activity GO:0008233 6.7 15 2.2 0.0014 0.018 purine nucleotide binding GO:0017076 17 30 1.7 0.0016 0.018 extracellular space GO:0005615 4.3 11 2.5 0.0016 0.018 carbohydrate derivative binding GO:0097367 18 31 1.7 0.0021 0.021 peptidyl-tyrosine modification GO:0018212 1.4 5 3.5 0.0033 0.03 metalloendopeptidase activity GO:0004222 1.5 5 3.2 0.0048 0.041 intrinsic component of membrane GO:0031224 86 107 1.2 0.01 0.081
Instead of working on R, I have been working on a google doc, attempting to see where the genes we are looking at are expressed within a c.elegan. We used an excel sheet given to us by Dr. TS called “C. elegans scRNA-seq data”which gave us information on the tissues that our genes were expressed in. We then created a google-doc and documented the locations that our genes were most expressed. We then created a wordle–(also known as a word cloud) for a visual representation of which tissues our genes were most highly expressed in, which were hypoermis, gonads, and body-wall-muscle.
After completing this we wanted to examine our most downregulated and overexpressed gene between mutation and normal. So we ran the r-lines below to examine the comparison between cpb-2’s expression between normal and mutatant strains. In order to do this I ran the plotcounts program on the set of data sets without the one strange mutant data set. As seen below were able to comfirm that cpb-2 is definitly downregulated compared to the normal expression level of cpb-2. We also found that there is very little information on this gene, however some is known of cpb-1 which might give us some idea of what cpb-2 does. Here is a link we used- https://wormbase.org/species/c_elegans/gene/WBGene00000771#0-9g-3 Furthermore we were able to look at other interactions this gene is having with other genes using String– https://string-db.org/network/281687.CJA09303
plotCounts(dds= Deseq_Oneproject_analysis_data, gene ="cpb-2" , intgroup= c( "Condition"))
plotCounts(dds= Deseq_Oneproject_analysis_data, gene ="K08D9.2" , intgroup= c( "Condition"))
The gene we found that was most upregulated was K08D9.2 We found that K08D9.2 was upregulated in the mutant compared to the normal data sets when we looked at the data sets with just the messed up mutant gone (called Onedata). Interestingly this gene has been found to be involved in material anatomical entity link–https://bgee.org/?page=gene&gene_id=WBGene00019524
plotCounts(dds= Deseq_Twoproject_analysis_data, gene ="cpb-2" , intgroup= c( "Condition"))
Next we wanted to look at the most upregulated gene found during our gene expression analysis when we were looking at two data sets removed from our data (both the mutant that was messed up and the norm that went with it). We found that for both cpb-2 and K08D9.2 that these data sets showed similar regulation between the Onedata set and Twodata set.
plotCounts(dds= Deseq_Twoproject_analysis_data, gene ="K08D9.2" , intgroup= c( "Condition"))
Next we wanted to look at the specific pathways that our genes were involved in, so we began individually reasearching both the interactions and the pathways that these genes are involved in.
upregulated mltn-7 integral protein of membrane upregulated pgp-8 ATP-binding, transmembrane transport, p=glycoprotein upregulated rab-11.2 GTP-bidning upregulated tba-8 structural constitutuit of skelton, GTP-binding upregulated wrt-7 hedge-hog like family upregulated E01G6.1 chitin binding, negative regulation of endopeptidase activity upregulated C13A2.1 Glycosyltransferas upregulated C13A2.3 Glycosyltransferas upregulated C30H6.12 no info downregulated C34F11.5 ATP binding, protein tryrosine kinase activity downregulated tsp-10 highest expression in material antomical entities, cell-signall downregulated F01G10.5 no info downregulated sams-5 ATP binding down C08F8.6 Protein kinase family-protein coding down C10G11.8 imporatant- spermatogensis, protein kinase pathway–ATP binding K8
From this research we were able to see that a lot of the upregulated genes had to do with structural componets of c. elegans, whereas some of the down regulated genes had to with spermatagenis. Specififcaly by using string, we were able to see that a lot of these genes interacted with otehr genes that are involved in spermatogensis.
We also ran our goTerm anaylsis data on Wormbase and found that most of the genes that are downregulated are found in male tissues (tissue richment analsis). From the phenotype enrichment anaylsis we found that the most significant phenotype terms are blistered and fat content increased. Finally form the gene ontology enrichment analysis the most significant terms were phosphorus metbolic process, dephosphoylation, protein modification process, and peptidyl-serine modification. Whereas on the other hand around the most upregualted genes are found in epitheliel system, intesistne, gonandal primordium, and the nervous system. The phenotype associated with these genes are calium hypersensitive and dumpy. Lastly the gene ontology enrichment analysis found the most significant terms to be collagen trimer, structureal constuent of cuticles, and extracellular region.
Now my parnter and I are going to start on synthesizing all of this information and working on our presentation.