This is a tutorial on how to use R markdown or reproducible research
Here we can type long passages or descriptions of our data without the need of “hashing” out our comments with the # symbol. In our first example we will be using the ToothGrowth dataset. In this experiment, Guinea Pigs (literal) were given different amount of vitamin C to see the effects on the animal’s tooth growth.
To run R code in a markdown file, we need to denote the section that is considered R code. We call these “code chunks.”
Below is a code chunk:
Toothdata <- ToothGrowth
head(Toothdata)
As you can see from running the “play” button on the code chunk, the results are printed inline of the r markdown file.
fit <- lm(len ~ dose, data = Toothdata)
b <- fit$coefficients
plot(len ~ dose, data = Toothdata)
abline(lm(len ~ dose, data = Toothdata))
Figure 1: The tooth growth of Guinea Pigs when given variable amounts of Vitamin C
The slope of the regression line is 9.7635714 .
We can also put sections and subsections in our r markdown file similar to numbers or bullet points in a word document. This is done with the “#” that we previously used to denote text in an R script.
Make sure you put a space after the hastag, otherwise it will not work!
We can also add bullet point-type marks in our R markdown file.
Its important to note here that in R markdown identation matters!
we can put really nice quotes in the markdown document. We can do this by using the “>” symbol.
“Genes are like the story, and DNA is the language that the story in written in.”
—Sam Kean
Hyperlinks can also be incorporated into these files. This is especially useful in HTML files, since they are in a web browser and will redirect the reader to the material that you are interested in showing them. Here we will use the link to R markdown’s homepage for this example. Rmarkdown
We can also put nice formatted formulas into Markdown by using two dollar signs
Hard-Weinburg Formula
\[p^2 + 2pq + q^2 = 1\]
And you can get really complex as we!!
\[\Theta = \begin{pmatrix}\alpha & \beta\\ \gamma & \delta \end{pmatrix}\]
There are aslo options for your R markdown file on how knitr interprets code chunk. There are the following options
Eval (T or F): whether or not to evaluate the code chunk.
Echo (T or F): whether or not show the code for the chunk, but results will still print.
Cache: If enable, the same code chunk will not be evaluated the next time the knitr is run. Great for code that has LONG runtimes.
fig.width or fig.height: the (graphical device) size of the R plots in inches. The figures are first written to the knitr document then to files that are saved separately.
out.width or out.height: The output of the size of the R plots IN THE R DOCUMENT
fig.cap: the words for the figure caption
We can also add a table of contents to our HTML document. We do this by altering the YAML code (the weird code chunk at the VERY top of the document) We can add this:
title: “HTML_Tutorial” author: “Lauren Green” date: “4/11/2022” output: html_document: toc: true toc_float: true
This will give us a very nice floating table of contents on the right hand side of the document.
You can also add tabs in our report. To this you need to specify each section that you want to become a tab by placing “{.tabset}” after the line. Every subsequent header will be a new tab.
You can also add themes to your HTML document that change the highlighting color and hyperlink color of your html output. This can be nice aesthetically. TO do this, you change your theme and your YAML to one of the following:
cerulean journal flatly readable spacelab united cosmo lumen paper sandstone simplex yeti null
You can also change the color by specifing higlight:
default tango payments kate monochrome espresso zenburn haddock textmate
you can also use the code_folding option to allow the reader tot toggle between displaying the code and hiding the code. This is done with:
code_folding: hide
There are a TON of options and ways for you to customize your R code using the HTML format. This is also a great way to display a “portfolio of your work if you are trying to market yourself to interested parties.
library(tidyverse)
??flights
my_data <- nycflights13::flights
head(my_data)
First we will just look at the data on the october 14th.
filter(my_data, month == 10, day ==14)
If we want to subset this into a new variable we do the following: oct_14_flight <- filter(my_data, month == 10, day ==14)
what if you want to do both print and save the variable?
(oct_14_flight_2 <- filter(my_data, month == 10, day ==14))
If you want to filter based on different opperators, you can use the following:
Equals == not equal to != greater than > less than < greater than or equal to >= less than or equal to <=
(flight_through_september <- filter(my_data, month < 10))
If we don’t use the == to mean equals, we get this: (oct_14_flight_2 <- filter(my_data, month = 10, day =14))
you can also use logical opperators to be more selective and & or | not !
Lets use the “or” function to pick flights in March and April
March_April_Flights <- filter(my_data, month ==3 | month == 4)
March_4th_Flights <- filter(my_data, month == 3 & day == 4)
Non_jan_flights <- filter(my_data, month != 1)
Arrange allows us to arrange the dataset nased on the variables we desire.
arrange(my_data, year, day, month)
we can also do this in descending fashion descending <- arrange(my_data, desc(year), desc(day), desc(month))
Missing values are always placed at the end of the dataframe regardless of ascending or descending
we can also select specific columns that we want to look at.
calendar <- select(my_data, year, month, day) print(calendar)
we can also look at a range of columns
calendar2 <- select(my_data, year:day)
lets look at all columns months through carrier calendar3 <- select(my_data, year:carrier)
we can also choose which columns NOT to include
everything_else <- select(my_data, -(year:day))
In this instance we can use the “not” opperator everything_else2 <- select(my_data, !(year:day))
there are also some other helper functions that can help you select the colums or data you’re looking for
starts_with(“xyz”) – will select the values that starts with xyz ends_with(“xyz”) — will select the values that ends with xyz contains(“xyz”) — will select the values that contain xyz matches (“xyz”) — will match the identical value xyz
head(my_data)
rename(my_data, departure_time = dep_time)
my_data <- rename(my_data, departure_time = dep_time)
what if you want to add new columns to your data frame? we have the mutate() function for that.
first, lets make smaller data fram so we can see what we’re doing
my_data_small <- select(my_data, year:day, distance, air_time)
lets calculate the speed of the flights. mutate(my_data_small, speed = distance / air_time * 60)
my_data_small <- mutate(my_data_small, speed = distance / air_time * 60)
what if we wanted to create a new dataframe with ONLY your calulations (transmute)
airspeed <- transmute(my_data_small, speed = distance / air_time * 60 , speed2 = distance / air_time)
we can summarize to run a function on a data column to get a single return
summarize(my_data, delay = mean(dep_delay, na.rm = TRUE))
by_day <- group_by(my_data, year, month, day) summarize(by_day, delay = mean (dep_delay, na.rm = TRUE))
As you can see, we now have the delay by the days of the year
what happens if we don’t tell R what do to with the missing data? summarize(by_day, delay = mean (dep_delay))
we can also filter our data based on NA (which in this dataset was canelled flights)
not_cancelled <- filter (my_data, !is.na(dep_delay), !is.na(arr_delay))
lets run summarize again on this data summarize(not_cancelled, delay = mean(dep_delay))
we can also count the number of variables that are NA
sum(is.na(my_data$dep_delay))
we can also count the numbers that are a NOT NA sum(!is.na(my_data$dep_delay))
with tibble datasets (more on them soon, we can pipe results to get rid of the need to use the dollar signs) we can then summarize the number of flights by minutes delayed.
my_data %>% group_by(year, month, day) %>% summarize(mean = mean(departure_time, na.rm = TRUE))
library(tibble)
as_tibble(iris)
tibble( x = 1:5, y = 1, z = x ^ 2 + y )
tribble( ~genea, ~geneb, ~genec, ######################################## 110, 112, 114, 6, 5, 4 )
print(by_day) as.data.frame(by_day) head(by_day)
nycflights13::flights %>% print(n=10, width=Inf)
subsetting tibbles is easy, similar to data.frames
df_tibble <- tibble(nycflights13::flights)
df_tibble
df_tibble$carrier
df_tibble[[2]]
df_tibble %>% .$carrier
class(df_tibble)
df_tibble_2 <- as.data.frame(df_tibble)
class(df_tibble_2)
df_tibble
head(df_tibble_2)
library(tidyverse)
How do we make a tidy data set? well the tidyverse follows three rules
1 - each variable must have its own column. 2 - each observation has its own row. 3 - each value has its own cell.
It is impossible to satisfy two of the three rules
this leads to the following instructions for tidy data
1 put each data set into a tibble 2 put each variable into a column 3 profit
picking one consistent method of data storage makes for easier understanding of your code and what is happening “under the hood” or behind the scenes.
lets now look at working with tibbles
bmi <- tibble (women)
bmi %>% mutate(bmi = (703 * weight)/(height)^2)
sometimes you’ll find datasets that don’t fit well into a tibble
we’ll use the built-in data from tidyverse for this part
table4a
as you can see from this data, we have one variabel in column A (country) # but column b and c are two of the same. Thus, there are two observations in # each row.
table4a %>% gather(‘1999’, ‘2000’, key = ‘year’, value = ‘cases’)
table4b
table4b %>% gather(“1999”, “2000”, key = “year”, value = “population”)
table4a <- table4a %>% gather(‘1999’, ‘2000’, key = ‘year’, value = ‘cases’) table4b <- table4b %>% gather(“1999”, “2000”, key = “year”, value = “population”)
left_join(table4a, table4b)
table2
spread(table2, key = type, value = count)
table3
table3 %>% separate(rate, into = c(“cases”, “population”))
table3 %>% separate(rate, into = c(“cases”, “populate”), conver = TRUE)
table3 %>% separate(rate, into = c(“cases”, “populate”), sep = “/”, conver = TRUE)
table3 %>% separate( year, into = c(“century”, “year”), convert = TRUE, sep = 2 )
what happens if we want to do the inverse of separate?
table5
table5 %>% unite(date, century, year)
table5 %>% unite(date, century, year, sep = ““)
There can be two types of missing vales. NA (explicit) or just no entry(implicit)
gene_data <- tibble( gene = c(‘a’, ‘a’, ‘a’, ‘a’, ‘b’, ‘b’, ‘b’), nuc = c(20, 22, 24, 25, NA, 42, 67), run = c(1,2,3,4,2,3,4) )
gene_data
gene_data %>% spread(gene, nuc)
gene_data %>% spread(gene, nuc) %>% gather(gene, nuc, ‘a’:‘b’, na.rm = TRUE)
gene_data %>% complete(gene, run)
treatment <-tribble( ~person, ~treatment, ~response, ############################################# “Issac”, 1, 7,
NA, 2, 10, NA, 3, 9, “VDB”, 1, 8, NA, 2, 11, NA, 3, 10, )
treatment
treatment %>% fill(person)
It is rare that you will be working with a single data table. The DPLYR package allows you to join teo data tables based on common values
# they are present in another
library(tidyverse) library(nycflights13)
airlines
airports
planes
weather
flights
keys are uniwue identifiers per observation primary key uniquely identifies an observation in its own table.
one way to identify a primary key is as follows
planes %>% count(tailnum) %>% filter(n>1)
planes %>% count(model) %>% filter(n>1)
flights2 <- flights %>% select(year:day, hour, origin, dest, tailnum, carrier)
flights2
flights2 %>% select(-origin, -dest) %>% left_join(airlines, by = ‘carrier’)
library(tidyverse) library(stringr)
string1 <- “this is a string” string2 <- “to put a quote in your string, use the opposite”
string1 string2
string3 <- “where is this string going?”
string3
string4 <- c(“one”, “two”, “three”) string4
str_length(string3)
tr_length(string4)
str_c(“X”, “Y”)
str_c(string1, string2)
str_c(string1, string2, sep = ” “)
str_c(“x”, “y”, “z”, sep = “_“)
HSP <- c(“HSP123”, “HSP234”, “HSP456”)
str_sub(HSP, 4,6)
str_sub(HSP, -3, -1)
HSP str_to_lower(HSP)
install.packages(“htmlwidgets”)
x <- c(‘ATTAGA’, ‘CGCCCCCGGAT’, ‘TATTA’)
str_view(x, “G”)
str_view(x, “TA”)
str_view(x, “.G.”)
str_view(x, “^TA”)
str_view(x, “TA$”)
you can also use | to pick between two alternatives str_view(x, “TA[G|T]”)
str_detect() returns a logical vextor the same length of input
y <- c(“apple”, “banana”, “pear”) y
str_detect(y, “e”)
words
sum(str_detect(words, “e”))
mean(str_detect(words, “[aeiou]$”))
mean (str_detect(words, “1”))
no_o <- !str_detect(words, “[ou]”)
no_o
words[!str_detect(words, “[ou]”)]
x
str_count(x, “[GC]”)
df <- tibble( word = words, count = seq_along(word) )
df
df %>% mutate( vowels = str_count(words, “[aeiou]”), constonants = str_count(words, “[^aeiou]”) )
setwd(“~/Desktop/classroom/myfiles”) #setwd(“~/Desktop/classroom/myfiles”)
library(ath1121501cdf) library(ath1121501.db) library(annotate) library(affy) library(limma) library(oligo) library(dplyr) library(stats) library(reshape)
Read the cell files into the directory
targets <- readTargets(“Bric16_Targets.csv”, sep = “,”, row.names = “filenames”)
ab <- ReadAffy() ########################################## hist(ab)
eset <- affy::rma(ab)
pData(eset)
hist(eset)
ID <- featureNames(eset) Symbol <- getSYMBOL(ID, “ath1121501.db”)
affydata <- read.delim(“affy_ATH1_array_elements.txt”)
condition <- targets$gravity
design <- model.matrix(~factor(condition)) colnames(design) <- c(“flight”, “ground”)
fit <- lmFit(eset, design) fit <- eBayes(fit) options(digits = 2) top <- topTable(fit,coef = 2, n = Inf, adjust = ‘fdr’)
Annot <- data.frame(GENE = sapply(contents(ath1121501GENENAME), paste, collapse = “,”), KEGG = sapply(contents(ath1121501PATH), paste, collapse = “,”), GO = sapply(contents(ath1121501GO), paste, collapse = “,”), SYMBOL = sapply(contents(ath1121501SYMBOL), paste, collapse = “,”), ACCNUM = sapply(contents(ath1121501ACCNUM), paste, collapse = “,”))
all <- merge(Annot, top, by.x = 0, by.y = 0, all = TRUE)
all2 <- merge(all, affydata, by.x = “Row.names”, by.y = “array_element_name”)
all2\(ACCNUM <- as.character(all2\)ACCNUM)
write.table(all2, file = “BRIC_16_Final.csv”, row.names = TRUE, col.names = TRUE, sep = “)
columns(org.At.tair.db)
foldchanges <- as.data.frame(all2$logFc)
all2\(entrez = mapIds(org.At.tair.db, keys = all2\)ACCNUM, column = “ENTREZID”, keytype = “TAIR”, multiVals = “first”)
head(all2, 10)
library(pathview) library(gage) library(gageData) data(kegg.sets.hs)
foldchanges = all2\(logFC names (foldchanges) = all2\)entrez
head(foldchanges)
kegg.ath = kegg.gsets(“ath”, id.type = “entrez”) kegg.ath.sigmet = kegg.ath\(kg.sets[kegg.ath\)sigmet.idx]
keggres = gage(foldchanges, gsets = kegg.ath.sigmet, same.dir = TRUE)
lapply(keggres, head)
greater <- keggres\(greater lessers <- keggres\)less
write.csv(greater, file = “BRIC_16_pathview_Greater_Pathways.csv”) write.csv(lessers, file = “BRIC_16_pathview_lesser_Pathways.csv”)
keggrespathways = data.frame(id=rownames(keggres\(greater), keggres\)greater) %>% tbl_df() %>% filter(row_number()<=5) %>% .$id %>% as.character() keggrespathways
keggressids_greater = substr(keggrespathways, start=1, stop=8) keggressids_greater
genedata <- as.character(all2$logFC)
plot_pathway = function(pid) pathview(gene.data=foldchanges, pathway.id = pid, species = “ath”, new.signature = FALSE)
tmp = sapply(keggressids_greater, function(pid) pathview(gene.data=foldchanges, pathway.id = pid, species = “ath”))
keggrespathways = data.frame(id=rownames(keggres\(less), keggres\)less) %>% tbl_df() %>% filter(row_number()<=5) %>% .$id %>% as.character() keggrespathways
keggressids_less = substr(keggrespathways, start=1, stop=8) keggressids_less
genedata <- as.character(all2$logFC)
plot_pathway = function(pid) pathview(gene.data=foldchanges, pathway.id = pid, species = “ath”, new.signature = FALSE)
tmp = sapply(keggressids_less, function(pid) pathview(gene.data=foldchanges, pathway.id = pid, species = “ath”))
author: Lauren Green date: 5/20/2022 output: html_document —
Lets load the required libraries for this analysis
library("DESeq2")
## Loading required package: S4Vectors
## Loading required package: stats4
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which.max, which.min
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
## Loading required package: IRanges
## Loading required package: GenomicRanges
## Loading required package: GenomeInfoDb
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
library("dplyr")
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:Biobase':
##
## combine
## The following object is masked from 'package:matrixStats':
##
## count
## The following objects are masked from 'package:GenomicRanges':
##
## intersect, setdiff, union
## The following object is masked from 'package:GenomeInfoDb':
##
## intersect
## The following objects are masked from 'package:IRanges':
##
## collapse, desc, intersect, setdiff, slice, union
## The following objects are masked from 'package:S4Vectors':
##
## first, intersect, rename, setdiff, setequal, union
## The following objects are masked from 'package:BiocGenerics':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("apeglm")
lets load in our data
countData <- read.csv("GLDS-102_rna_seq_Unnormalized_Counts.csv")
## Warning in file(file, "rt"): cannot open file
## 'GLDS-102_rna_seq_Unnormalized_Counts.csv': No such file or directory
## Error in file(file, "rt"): cannot open the connection
colData <- read.csv("PHENO_DATA_Mouse.csv", sep = ",", row.names = 1)
## Warning in file(file, "rt"): cannot open file 'PHENO_DATA_Mouse.csv': No such
## file or directory
## Error in file(file, "rt"): cannot open the connection
Now we need to add leveling factors to the colData
colData$group <- factor(colData$group, levels = c("0", "1"))
## Error in colData$group: object of type 'closure' is not subsettable
Now lets make sure we have the same number of individuals as well as groups
all(rownames(colData)) %in% colnames(countData)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'table' in selecting a method for function '%in%': error in evaluating the argument 'x' in selecting a method for function 'colnames': object 'countData' not found
we need to round up the counts, because DESeq2 only allows integers as an input, and not fractional numbers. This depends on the method of alignment that was used upstream.
countData %>%
mutate_if(is.numeric, ceiling)
## Error in is_grouped_df(tbl): object 'countData' not found
countData[, 2:13] <- sapply(countData[, 2:13], as.integer)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'X' in selecting a method for function 'sapply': object 'countData' not found
row.names(countData) <- countData [, 1]
## Error in eval(expr, envir, enclos): object 'countData' not found
countData <- countData[2:13]
## Error in eval(expr, envir, enclos): object 'countData' not found
rownames(colData) == colnames(countData)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'colnames': object 'countData' not found
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~group)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'ncol': object 'countData' not found
dds <- dds[rowSums(counts(dds)>2) >=4]
## Error in eval(expr, envir, enclos): object 'dds' not found
dds <- DESeq(dds)
## Error in is(object, "DESeqDataSet"): object 'dds' not found
res <- results(dds)
## Error in is(object, "DESeqDataSet"): object 'dds' not found
resLFC <- lfcShrink(dds, coef= 2)
## Error in is(dds, "DESeqDataSet"): object 'dds' not found
(resOrdered <- res[order(res$padj), ])
## Error in eval(expr, envir, enclos): object 'res' not found
summary(res)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'res' not found
FLT_VS_GC <- as.data.frame(res$log2FoldChange)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'res' not found
head(FLT_VS_GC)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'head': object 'FLT_VS_GC' not found
plotMA(resLFC, ylim = c(-2,2))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'plotMA': object 'resLFC' not found
pdf(file = "MA_Plot_FLT_vs_GC.pdf")
dev.off()
## png
## 2
Lets save our differential expression results to file.
write.csv(as.data.frame(resOrdered), file = "Mouse_DESeq.csv")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'resOrdered' not found
Lets perform a custom transformation
dds <- estimateSizeFactors(dds)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'estimateSizeFactors': object 'dds' not found
se <- SummarizedExperiment(log2(counts(dds, normalize = TRUE) +1), colData = colData(dds))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'counts': object 'dds' not found
pdf(file = "PCA_PLOT_FLT_vs_GC.pdf")
plotPCA(DESeqTransform(se), intgroup = "group")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'plotPCA': object 'se' not found
lets load our required packages
library(AnnotationDbi)
##
## Attaching package: 'AnnotationDbi'
## The following object is masked from 'package:dplyr':
##
## select
library(org.Mm.eg.db)
##
columns(org.Mm.eg.db)
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
## [11] "GENETYPE" "GO" "GOALL" "IPI" "MGI"
## [16] "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID"
## [21] "PROSITE" "REFSEQ" "SYMBOL" "UNIPROT"
foldchanges <- as.data.frame(res$log2FoldChange, row.names = row.names(res))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'res' not found
head(foldchanges)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'head': object 'foldchanges' not found
res$symbol = mapIds(org.Mm.eg.db,
keys = row.names(res),
column = "SYMBOL",
keytype = "ENSEMBL",
multiVals = "first")
## Error in row.names(res): object 'res' not found
res$entrez = mapIds(org.Mm.eg.db,
keys = row.names(res),
column = "ENTREZID",
keytype = "ENSEMBL",
multiVals = "first")
## Error in row.names(res): object 'res' not found
res$name = mapIds(org.Mm.eg.db,
keys = row.names(res),
column = "GENENAME",
keytype = "ENSEMBL",
multiVals = "first")
## Error in row.names(res): object 'res' not found
head(res, 10)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'head': object 'res' not found
Lets load our pathview packages
library(pathview)
##
## ##############################################################################
## Pathview is an open source software package distributed under GNU General
## Public License version 3 (GPLv3). Details of GPLv3 is available at
## http://www.gnu.org/licenses/gpl-3.0.html. Particullary, users are required to
## formally cite the original Pathview paper (not just mention it) in publications
## or products. For details, do citation("pathview") within R.
##
## The pathview downloads and uses KEGG data. Non-academic uses may require a KEGG
## license agreement (details at http://www.kegg.jp/kegg/legal.html).
## ##############################################################################
library(gage)
##
library(gageData)
data(kegg.sets.mm)
data(go.subs.mm)
foldchanges <- res$log2FoldChange
## Error in eval(expr, envir, enclos): object 'res' not found
names(foldchanges) = res$entrez
## Error in eval(expr, envir, enclos): object 'res' not found
head(foldchanges)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'head': object 'foldchanges' not found
kegg.mm = kegg.gsets("mouse", id.type = "entrez")
kegg.mm.sigmet <- kegg.mm$kg.sets[kegg.mm$sigmet.idx]
lets map the resulta
keggres <- gage(foldchanges, gsets = kegg.mm.sigmet, same.dir = TRUE)
## Error in saaPrep(exprs, ref = ref, samp = samp, same.dir = same.dir, compare = compare, : object 'foldchanges' not found
lapply(keggres, head)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'X' in selecting a method for function 'lapply': object 'keggres' not found
Lets save our greater and less than pathways
greaters <- keggres$greater
## Error in eval(expr, envir, enclos): object 'keggres' not found
lessers <- keggres$less
## Error in eval(expr, envir, enclos): object 'keggres' not found
keggrespathways <- data.frame(id = rownames(keggres$greater), keggres$greater) %>%
tbl_df() %>%
filter(row_number() <=3) %>%
.$id %>%
as.character()
## Warning: `tbl_df()` was deprecated in dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'keggres' not found
keggrespathways
## Error in eval(expr, envir, enclos): object 'keggrespathways' not found
keggresids <- substr(keggrespathways, start = 1, stop = 8)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'substr': object 'keggrespathways' not found
keggresids
## Error in eval(expr, envir, enclos): object 'keggresids' not found
PLOT!
plot_pathway = function(pid) pathview(gene.data = foldchange, pathway.id = pid, species = "mouse", new.signature = FALSE)
tmp = sapply(keggresids, function(pid) pathview(gene.data = foldchanges, pathway.id = pid, species = "mouse"))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'X' in selecting a method for function 'sapply': object 'keggresids' not found
keggrespathways <- data.frame(id = rownames(keggres$less), keggres$less) %>%
tbl_df() %>%
filter(row_number() <=3) %>%
.$id %>%
as.character()
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'keggres' not found
keggrespathways
## Error in eval(expr, envir, enclos): object 'keggrespathways' not found
keggresids <- substr(keggrespathways, start = 1, stop = 8)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'substr': object 'keggrespathways' not found
keggresids
## Error in eval(expr, envir, enclos): object 'keggresids' not found
PLOT!
plot_pathway = function(pid) pathview(gene.data = foldchange, pathway.id = pid, species = "mouse", new.signature = FALSE)
tmp = sapply(keggresids, function(pid) pathview(gene.data = foldchanges, pathway.id = pid, species = "mouse"))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'X' in selecting a method for function 'sapply': object 'keggresids' not found
library(imager)
## Loading required package: magrittr
##
## Attaching package: 'imager'
## The following object is masked from 'package:magrittr':
##
## add
## The following objects are masked from 'package:SummarizedExperiment':
##
## resize, width
## The following object is masked from 'package:Biobase':
##
## channel
## The following objects are masked from 'package:GenomicRanges':
##
## resize, width
## The following objects are masked from 'package:IRanges':
##
## resize, width
## The following object is masked from 'package:S4Vectors':
##
## width
## The following object is masked from 'package:BiocGenerics':
##
## width
## The following objects are masked from 'package:stats':
##
## convolve, spectrum
## The following object is masked from 'package:graphics':
##
## frame
## The following object is masked from 'package:base':
##
## save.image
file.names <- list.files(path = "~desktop/classroom/myfiles", pattern = ".*pathview.png")
all_images <- lapply(filenames, load.image)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'X' in selecting a method for function 'lapply': object 'filenames' not found
knitr::include_graphics(filenames)
## Error in Encoding(x): object 'filenames' not found
author: Lauren Green date: “5/20/2022” output: html_document —
library("edgeR")
## Loading required package: limma
##
## Attaching package: 'limma'
## The following object is masked from 'package:DESeq2':
##
## plotMA
## The following object is masked from 'package:BiocGenerics':
##
## plotMA
library("dplyr")
library("AnnotationDbi")
library("org.Mm.eg.db")
rawdata = read.csv("GLDS-102_rna_seq_Normalized_Counts.csv", row.names = "gene_id")
## Warning in file(file, "rt"): cannot open file
## 'GLDS-102_rna_seq_Normalized_Counts.csv': No such file or directory
## Error in file(file, "rt"): cannot open the connection
group <- factor(c("1", '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2'))
dgeGlm <- DGEList(counts = rawdata, group = group)
## Error in is.data.frame(counts): object 'rawdata' not found
str(dgeGlm)
## Error in str(dgeGlm): object 'dgeGlm' not found
keep <- rowSums(cpm(dgeGlm)>2) >=4
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rowSums': object 'dgeGlm' not found
dgeGlm <- dgeGlm[keep,]
## Error in eval(expr, envir, enclos): object 'dgeGlm' not found
names(dgeGlm)
## Error in eval(expr, envir, enclos): object 'dgeGlm' not found
dgeGlm[["samples"]]
## Error in eval(expr, envir, enclos): object 'dgeGlm' not found
design <- model.matrix(~group)
design
## (Intercept) group2
## 1 1 0
## 2 1 0
## 3 1 0
## 4 1 0
## 5 1 0
## 6 1 0
## 7 1 1
## 8 1 1
## 9 1 1
## 10 1 1
## 11 1 1
## 12 1 1
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
dgeGlmComDisp <- estimateGLMCommonDisp(dgeGlm, design, verbose = TRUE)
## Error in estimateGLMCommonDisp(dgeGlm, design, verbose = TRUE): object 'dgeGlm' not found
dgeGlmTrendDisp <- estimateGLMTrendedDisp(dgeGlmComDisp, design)
## Error in estimateGLMTrendedDisp(dgeGlmComDisp, design): object 'dgeGlmComDisp' not found
dgeGlmTagDisp <- estimateGLMTagwiseDisp(dgeGlmTrendDisp, design)
## Error in estimateGLMTagwiseDisp(dgeGlmTrendDisp, design): object 'dgeGlmTrendDisp' not found
plotBCV(dgeGlmTagDisp)
## Error in is(y, "DGEList"): object 'dgeGlmTagDisp' not found
fit <- glmFit(dgeGlmTagDisp, design)
## Error in glmFit(dgeGlmTagDisp, design): object 'dgeGlmTagDisp' not found
colnames(coef(fit))
## NULL
lrt <- glmLRT(fit, coef =2)
## Error in glmLRT(fit, coef = 2): glmfit must be an DGEGLM object (usually produced by glmFit).
ttGlm <- topTags(lrt, n = Inf)
## Error in topTags(lrt, n = Inf): object 'lrt' not found
class(ttGlm)
## Error in eval(expr, envir, enclos): object 'ttGlm' not found
summary(deGlm <- decideTestsDGE(lrt, p = 0.1, adjust = "fdr"))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'lrt' not found
tagsGlm <- rownames(dgeGlmTagDisp)[as.logical(deGlm)]
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'dgeGlmTagDisp' not found
plotSmear(lrt, de.tags = tagsGlm)
## Error in plotSmear(lrt, de.tags = tagsGlm): object 'lrt' not found
hits2 <- ttGlm$table[ttGlm$table$FDR < 0.1, ]
## Error in eval(expr, envir, enclos): object 'ttGlm' not found
write.csv(hits2, "mouse_edgeR_results_Zero_vs_1.csv")
## Error in is.data.frame(x): object 'hits2' not found
columns(org.Mm.eg.db)
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
## [11] "GENETYPE" "GO" "GOALL" "IPI" "MGI"
## [16] "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID"
## [21] "PROSITE" "REFSEQ" "SYMBOL" "UNIPROT"
ttGlm2 <- as.data.frame(ttGlm$table)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'ttGlm' not found
ttGlm$symbol = mapIds(org.Mm.eg.db,
keys=row.names(ttGlm2),
column = "SYMBOL",
keytype = "ENSEMBL",
multiVals = "first")
## Error in row.names(ttGlm2): object 'ttGlm2' not found
ttGlm2$entrez = mapIds(org.Mm.eg.db,
keys=row.names(ttGlm2),
column = "ENTREZID",
keytype = "ENSEMBL",
multiVals = "first")
## Error in row.names(ttGlm2): object 'ttGlm2' not found
ttGlm2$name = mapIds(org.Mm.eg.db,
keys=row.names(ttGlm2),
column = "GENENAME",
keytype = "ENSEMBL",
multiVals = "first")
## Error in row.names(ttGlm2): object 'ttGlm2' not found
head(ttGlm2)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'head': object 'ttGlm2' not found
library(pathview)
library(gage)
library(gageData)
data(kegg.sets.mm)
data(go.subs.mm)
foldchanges <- ttGlm2$logFC
## Error in eval(expr, envir, enclos): object 'ttGlm2' not found
names(foldchanges) <- ttGlm2$entrez
## Error in eval(expr, envir, enclos): object 'ttGlm2' not found
kegg.mm = kegg.gsets("mouse", id.type = "entrez")
kegg.mm.sigmet = kegg.mm$kg.sets [kegg.mm$sigmet.idx]
# lets get the results
keggres = gage(foldchanges, gsets = kegg.mm.sigmet, smae.dir = TRUE)
## Error in saaPrep(exprs, ref = ref, samp = samp, same.dir = same.dir, compare = compare, : object 'foldchanges' not found
lapply(keggres, head)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'X' in selecting a method for function 'lapply': object 'keggres' not found
greaters <- keggres$greater
## Error in eval(expr, envir, enclos): object 'keggres' not found
lessers <- keggres$lessers
## Error in eval(expr, envir, enclos): object 'keggres' not found
{
keggrespathways = data.frame(id = rownames(keggres$greater), keggres$greater) %>%
tbl_df() %>%
filter(row_number()<=5) %>%
.$id %>%
as.character()
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'keggres' not found
keggrespathways
## Error in eval(expr, envir, enclos): object 'keggrespathways' not found
keggresids = substr(keggrespathways, start=1, stop = 8)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'substr': object 'keggrespathways' not found
keggresids
## Error in eval(expr, envir, enclos): object 'keggresids' not found
plot_pathway = function(pid) pathview(gene.data = foldchanges, pathway.id = pid, species = "mouse", new.signature = FALSE)
# plot multiple pathways
tmp = sapply(keggresids, function(pid) pathview(gene.data = foldchanges, pathway.id = pid, species = "mouse"))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'X' in selecting a method for function 'sapply': object 'keggresids' not found
keggrespathways = data.frame(id = rownames(keggres$less), keggres$less) %>%
tbl_df() %>%
filter(row_number()<=5) %>%
.$id %>%
as.character()
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'keggres' not found
keggrespathways
## Error in eval(expr, envir, enclos): object 'keggrespathways' not found
keggresids = substr(keggrespathways, start=1, stop = 8)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'substr': object 'keggrespathways' not found
keggresids
## Error in eval(expr, envir, enclos): object 'keggresids' not found
plot_pathway = function(pid) pathview(gene.data = foldchanges, pathway.id = pid, species = "mouse", new.signature = FALSE)
# plot multiple pathways
tmp = sapply(keggresids, function(pid) pathview(gene.data = foldchanges, pathway.id = pid, species = "mouse"))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'X' in selecting a method for function 'sapply': object 'keggresids' not found
library(imager)
filenames <- list.files(path = "~/Desktop/classroom/myfiles", pattern= ".*pathview.png")
all_images <- lapply(filenames, load.image)
## Error in wrap.url(file, load.image.internal): File not found
knitr::include_graphics(filenames)
## Error in knitr::include_graphics(filenames): Cannot find the file(s): "ath00020.pathview.png"; "ath00071.pathview.png"; "ath00195.pathview.png"; "ath00592.pathview.png"; "ath00906.pathview.png"; "ath01230.pathview.png"; "ath03010.pathview.png"; "ath04016.pathview.png"; "ath04141.pathview.png"; "ath04145.pathview.png"; "mmu00910.pathview.png"; "mmu00970.pathview.png"; "mmu02010.pathview.png"; "mmu03008.pathview.png"; "mmu03010.pathview.png"; "mmu03013.pathview.png"; "mmu04022.pathview.png"; "mmu04110.pathview.png"; "mmu04145.pathview.png"; "mmu04217.pathview.png"; "mmu04330.pathview.png"; "mmu04350.pathview.png"; "mmu04360.pathview.png"; "mmu04390.pathview.png"; "mmu04550.pathview.png"; "mmu04657.pathview.png"; "mmu04714.pathview.png"
library(tidyverse)
EdgeR <- read.csv(“mouse_edgeR_results_Zero_vs_1.csv”)
DESeq <- read.csv(“mouse_DESeq.csv”)
DESeq2 <- DESeq %>% filter(padj <0.1)
DESeq2 <- DESeq2[,c(1,3)]
EdgeR<-EdgeR[, 1:2]
colnames(DESeq2) <- c(“ID”, “logFC”) colnames(EdgeR) <- c(“ID”, “logFC”)
install.packages(“GOplot”) library(GOplot)
comp <- GOVenn(DESeq2, EdgeR, label = c(“DESeq1”, “EdgeR”), title = “Comparison of DESeq and EdgeR DE Genes”, plot = FALSE)
comp$plot
comp$table
library(msa)
seq <- readAAStringSet(“hglobin.fa”) # seq <- readAAStringSet(“hglobin.fa”)
seq
alignments <- msa(seq, method = “ClustalW”) alignments
msaPrettyPrint(alignments, output = “pdf”, showNames = “left”, showLogo = “none”, askForOverwrite = FALSE, verbose = TRUE, file = “whole_align.pdf”)
msaPrettyPrint(alignments, c(10,30), output = “pdf”, showNames = “left”, file = “Zoomed_align.pdf”, showLogo = “top”, askForOverwrite = FALSE, verbose = TRUE)
library(ape) library(seqinr)
alignment_seqninr <- msaConvert(alignments, type = “seqinr::alignment”)
distances1 <- seqinr::dist.alignment(alignment_seqninr, “identity”)
tree <- ape::nj(distances1) plot(tree, main = “Phylogenetic Tree of HBA Sequences”)
library(DECIPHER)
long_seg <- readDNAStringSet(“plastid_genomes.fa”) long_seg
Seqs2DB(long_seq, “XStringSet”, “long_db”, names(long_seq))
synteny <- FindSynteny(“long_db”)
#view blocs with plotting pairs(synteny) plot(synteny)
alignment <- AlignSynteny(synteny, “long_db”)
block <- unlist(alignment[[1]])
writeXStringSet(block, “genome_blocks_out.fa”)
library(DECIPHER)
long_seg <- readDNAStringSet(“plastid_genomes.fa”) long_seg
Seqs2DB(long_seq, “XStringSet”, “long_db”, names(long_seq))
synteny <- FindSynteny(“long_db”)
#view blocs with plotting pairs(synteny) plot(synteny)
alignment <- AlignSynteny(synteny, “long_db”)
block <- unlist(alignment[[1]])
writeXStringSet(block, “genome_blocks_out.fa”)
#Quantifying difference between trees with treespace
library(ape) library(adegraphics) library(treespace)
treefiles <- list.files(file.path(getwd(), “gene_trees”), full.names = TRUE) tree_list <- lapply(treefiles, read.tree) class(tree_list)
class(tree_list) <- “multiPhylo”
class(tree_list)
# Now we can compute the Kendall-Coljin distances between tress. This function # does a LOT of analyses.
# First it runs a pairwise comparison of all trees in the input # Second it carrier out clustering using PCA # These results are returned in a list of objects, where $D contains the pairwise # metric of the trees, and $pco contains the PCA. The method we use (Kendal-Coljin) # particularly sutiable for rooted trees as we have here. The option NF tells us how # many principal components to retain.
comparisons <- treespace(tree_list, nf = 3)
aeiou↩︎