Reproducible research using R Markdown tutorial.
R Markdown lets us write long passages or descriptions of our data without having to prefix every line with the # symbol, as we would for comments in a regular R script. The first example uses the ToothGrowth dataset. In that experiment, guinea pigs were given different amounts of vitamin C to see whether their tooth growth was affected.
To run R code in the document, you have to mark which parts are R code. These sections are called “code chunks”.
Below is the output of a code chunk:
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
R Markdown printed the output of the code chunk after we pressed the Knit button.
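In the .Rmd source, a code chunk is delimited by three backticks followed by `{r}`. The following is a minimal sketch of what the chunk behind the output above might look like. The assignment `Toothdata <- ToothGrowth` and the `head()` call are assumptions, inferred from the object name used in this tutorial and from the six rows shown.

````markdown
```{r}
# Assumed setup: copy the built-in ToothGrowth data into Toothdata
Toothdata <- ToothGrowth
# Print the first six rows
head(Toothdata)
```
````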
fit <- lm(len ~ dose, data = Toothdata)
b <- fit$coefficients
plot(len ~ dose, data = Toothdata)
abline(lm(len ~ dose, data = Toothdata))
Figure 1: The tooth growth of Guinea Pigs when given variable amounts of vitamin C
The slope of the regression line is 9.7635714.
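The slope can be read directly from the fitted model instead of being typed by hand. This is a minimal sketch, assuming `Toothdata` is simply a copy of the built-in `ToothGrowth` data; the name `slope` is introduced here for illustration only.

```r
# Assumption: Toothdata is a plain copy of the built-in ToothGrowth dataset
Toothdata <- ToothGrowth

# Fit tooth length as a linear function of vitamin C dose
fit <- lm(len ~ dose, data = Toothdata)

# Extract the coefficient for dose, i.e. the slope of the regression line
slope <- coef(fit)[["dose"]]
print(slope)
```

In an R Markdown document, a value like this can be placed inside a sentence with inline code (a backtick, the letter r, the expression, and a closing backtick), so the text updates whenever the data change.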
We can also put sections and subsections into our markdown file, similar to numbered headings and bullet points in a Word document. We do this with the “#” symbol, the same symbol that marks comments in a regular R script.
Always make sure to put a space after the hashtag or it will not work!!
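As a short illustration of the heading syntax described above, the number of # symbols sets the heading level. The heading titles here are placeholders.

```markdown
# Section

## Subsection

### Sub-subsection

#This line is not a heading because there is no space after the #
```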
We can also add bullet points to our R Markdown file.
It’s important to note that indentation matters in R Markdown.
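A small sketch of a bulleted list in the .Rmd source, with placeholder item text. Sub-items are indented (four spaces here) under their parent item; without the indentation they would be rendered at the top level.

```markdown
* First item
* Second item
    + Sub-item of the second item
    + Another sub-item
* Third item
```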
We can also put nicely formatted quotes into the document by starting a line with the “>” symbol.
“Genes are like the story, and DNA is the language that the story is written in.”
— Sam Kean
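The quote above could be written in the source roughly as follows. This is a sketch: the three hyphens are one common way to get a long dash before the attribution when the document is rendered, and the exact source used for this tutorial is not shown.

```markdown
> "Genes are like the story, and DNA is the language that the story is written in."
>
> --- Sam Kean
```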
Hyperlinks can also be incorporated into these files. This is especially useful in HTML files, since they are viewed in a web browser and the link takes the reader directly to the material you want to show them. Here we use a link to R Markdown’s homepage as an example: Markdown
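A link is written as the link text in square brackets followed by the address in parentheses. In this sketch the address is assumed to be the R Markdown homepage, since the original URL is not visible in the rendered text.

```markdown
[Markdown](https://rmarkdown.rstudio.com/)
```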
We can also put nicely formatted formulas into R Markdown using two dollar signs.
Hardy-Weinberg Formula
\[p^2 + 2qp + q^2 = 1\]
You can get very complex!
\[\theta = \begin{pmatrix}\alpha & \beta\\ \gamma & \delta \end{pmatrix}\]
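In the source, each displayed formula is wrapped in a pair of double dollar signs. The two formulas above would be typed like this:

```latex
$$p^2 + 2qp + q^2 = 1$$

$$\theta = \begin{pmatrix}\alpha & \beta\\ \gamma & \delta \end{pmatrix}$$
```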
R Markdown chunks accept options that control how each code chunk is processed. Some of the most useful are:
eval (TRUE or FALSE): whether or not to evaluate the code chunk.
echo (TRUE or FALSE): whether or not to show the code for the chunk; the results will still print.
cache: if enabled, the code chunk will not be re-evaluated the next time the document is knitted, as long as the chunk has not changed. Great for code that has LONG run times.
fig.width or fig.height: the (graphical device) size of the R plots in inches. The figures are first recorded on the graphics device and then written to files that are saved separately.
out.width or out.height: the output size of the R plots in the final document.
fig.cap: the text of the figure caption.
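Chunk options go inside the braces of the chunk header, separated by commas. The sketch below hides the code, sets the figure size, and adds a caption. The chunk label `tooth-plot` and the caption text are made up for illustration.

````markdown
```{r tooth-plot, echo=FALSE, fig.width=6, fig.height=4, fig.cap="Tooth growth versus vitamin C dose"}
# The code is hidden in the output (echo=FALSE), but the plot still appears
plot(len ~ dose, data = ToothGrowth)
```
````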
We can also add a table of contents to our HTML document. We do this by altering the YAML header (the block of metadata at the very top of the document). We can add this:
title: "R_Markdown"
author: "MLD"
date: "2024-11-12"
output:
  html_document:
    toc: true
    toc_float: true
This will give us a very nice floating table of contents on the left-hand side of the document.
You can also add tabs to the report. To do this, place {.tabset} after the heading of the section whose subsections you want to turn into tabs; each heading one level below it becomes a tab.
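A minimal sketch of a tabbed section, with placeholder heading names and text. The level-three headings under the `{.tabset}` heading become the tabs, and the next level-two heading ends the tabbed section.

```markdown
## Results {.tabset}

### Plot

Content shown in the first tab.

### Table

Content shown in the second tab.

## Next section
```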
You can add themes to the HTML document that change its overall appearance, such as the hyperlink color. This can be nice aesthetically. To do this, set theme in the YAML to one of the following:
cerulean journal flatly readable spacelab united cosmo lumen paper sandstone simplex yeti null
You can also change the syntax highlighting style of the code by specifying highlight:
default tango pygments kate monochrome espresso zenburn haddock textmate
You can also use the code_folding option to allow the reader to toggle between displaying the code and hiding the code. This is done with:
code_folding: hide
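Putting the HTML options from this section together, a YAML header might look like the sketch below. The title, author, and date are the ones used in this tutorial; the choice of `flatly` and `tango` is arbitrary and only for illustration. Indentation matters here, since each option must be nested under `html_document`.

```yaml
---
title: "R_Markdown"
author: "MLD"
date: "2024-11-12"
output:
  html_document:
    toc: true
    toc_float: true
    theme: flatly
    highlight: tango
    code_folding: hide
---
```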
There are many options for you to customize your R code using the HTML format. This is also a great way to display any “portfolio” of your work if you are trying to market yourself to interested parties.
library(tidyverse)
my_data <- nycflights13::flights
head(my_data)
filter(my_data, month == 10, day == 14)
oct_14_flight <- filter(my_data, month == 10, day == 14)
(oct_14_flight <- filter(my_data, month == 10, day == 14))
(flight_through_september <- filter(my_data, month < 10))
arrange(my_data, year, day, month)
descending <- arrange(my_data, desc(year), desc(day), desc(month))
### Select
calendar <- dplyr::select(my_data, year, month, day)
print(calendar)
calendar2 <- dplyr::select(my_data, year:day)
calendar3 <- dplyr::select(my_data, year:carrier)
everything_else <- dplyr::select(my_data, -(year:day))
everything_else2 <- dplyr::select(my_data, !(year:day))
head(my_data)
my_data_small <- dplyr::select(my_data, year:day, distance, air_time)
### Let's calculate the speed of the flights
mutate(my_data_small, speed = distance / air_time * 60)
my_data_small <- mutate(my_data_small, speed = distance / air_time * 60)
dplyr::summarize(my_data, delay = mean(dep_delay, na.rm = TRUE))
by_day <- group_by(my_data, year, month, day)
dplyr::summarize(by_day, delay = mean(dep_delay, na.rm = TRUE))
dplyr::summarize(by_day, delay = mean(dep_delay))
not_cancelled <- filter(my_data, !is.na(dep_delay), !is.na(arr_delay))
dplyr::summarize(not_cancelled, delay = mean(dep_delay))
sum(is.na(my_data$dep_delay))
sum(!is.na(my_data$dep_delay))
my_data %>% group_by(year, month, day) %>% dplyr::summarize(mean = mean(dep_time, na.rm = TRUE))
library(tibble)
as_tibble(iris)
tibble( x = 1:5, y = 1, z = x ^ 2 + y )
tribble(
  ~genea, ~geneb, ~genec,
  ########################
  110, 112, 114,
  6, 5, 4
)
print(by_day)
as.data.frame(by_day)
head(by_day)
nycflights13::flights %>% print(n=10, width = Inf)
df_tibble <- tibble(nycflights13::flights)
df_tibble
df_tibble$carrier
df_tibble[[2]]
class(df_tibble)
df_tibble2 <- as.data.frame(df_tibble)
df_tibble
head(df_tibble2)
library(tidyverse)
bmi <- tibble(women)
bmi %>% mutate(bmi = (703 * weight)/(height)^2)
table4a
table4a %>% gather(`1999`, `2000`, key = "year", value = "cases")
table4b
table4b %>% gather(`1999`, `2000`, key = "year", value = "population")
table4a <- table4a %>% gather(`1999`, `2000`, key = "year", value = "cases")
table4b <- table4b %>% gather(`1999`, `2000`, key = "year", value = "population")
left_join(table4a, table4b, by = c("country", "year"))
table2
spread(table2, key = type, value = count)
table3
table3 %>% separate(rate, into = c("cases", "population"))
table3 %>% separate(rate, into = c("cases", "population"), convert = TRUE)
table3 %>% separate(rate, into = c("cases", "population"), sep = "/", convert = TRUE)
table3 %>% separate(year, into = c("century", "year"), convert = TRUE, sep = 2)
library(tidyverse)
library(nycflights13)
airlines
airports
planes
weather
flights
planes %>% count(tailnum) %>% filter(n>1)
planes %>% count(model) %>% filter(n>1)
flights2 <- flights %>% select(year:day, hour, origin, dest, tailnum, carrier)
flights2
flights2 %>% select(-origin, -dest) %>% left_join(airlines, by = "carrier")
table5
table5 %>% unite(data, century, year)
table5 %>% unite(data, century, year, sep = "")
gene_data <- tibble(
  gene = c("a", "a", "a", "a", "b", "b", "b"),
  nuc = c(20, 22, 24, 25, NA, 42, 67),
  run = c(1, 2, 3, 4, 2, 3, 4)
)
gene_data
gene_data %>% spread(gene, nuc) %>% gather(gene, nuc, `a`:`b`, na.rm = TRUE)
gene_data %>% complete(gene, run)
gene_data %>% spread(gene, nuc) %>% gather(gene, nuc, `a`, `b`, na.rm = TRUE)
gene_data %>% complete(gene, nuc)
treatment <- tribble(
  ~person, ~treatment, ~response,
  ###########################################
  "Isaac", 1, 7,
  NA, 2, 10,
  NA, 3, 9,
  "VDB", 1, 8,
  NA, 2, 11,
  NA, 3, 10
)
treatment
treatment %>% fill(person)
library(tidyverse)
library(stringr)
string1 <- "this is a string"
string2 <- 'to put a "quote" in your string, use the opposite'
string1
string2
string3 <- "where is this string going?"
string3
string4 <- c("one", "two", "three")
string4
str_length(string3)
str_length(string4)
str_c("x", "y")
str_c(string1, string2)
str_c(string1, string2, sep = " ")
str_c("x", "y", "z", sep = "_")
MSP <- c("MSP123", "MSP234", "MSP456")
str_sub(MSP, 4, 6)
str_sub(MSP, -3, -1)
MSP
str_to_lower(MSP)
install.packages("htmlwidgets")
x <- c("ATTAGA", "CGCCCCCGGAT", "TATTA")
str_view(x, "G")
str_view(x, "TA")
str_view(x, ".G.")
str_view(x, "^TA")
str_view(x, "TA$")
str_view(x, "TA[GT]")
str_view(x, "TA[^T]")
str_view(x, "TA[G|T]")
y <- c("apple", "banana", "pear")
y
str_detect(y, "e")
sum(str_detect(words, "e"))
mean(str_detect(words, "[aeiou]$"))
mean(str_detect(words, "^[aeiou]"))
no_o <- !str_detect(words, "[ou]")
no_o
words[!str_detect(words, "[ou]")]
x
str_count(x, “GC”)
df <- tibble( word = words, i = seq_along(word) )
df
df %>% mutate( vowels = str_count(word, "[aeiou]"), consonants = str_count(word, "[^aeiou]") )
library(ath1121501cdf)
library(ath1121501.db)
library(dplyr)
library(stats)
library(reshape)
library(affy)
library(limma)  # provides readTargets()
targets <- readTargets("Bric16_Targets.csv", sep = ",", row.names = "filename")
ab <- ReadAffy()