Get the code for this markdown here.
R Markdown can make biostats labs more seamless. Instead of copying and pasting your work into a separate document to submit, you can knit your text, code, and output together in one file. To open a new R Markdown document, click File > New File > R Markdown. This tutorial is an HTML document, but you also have the option of creating PDF or Word files.
In a new document you will see an automatically generated code chunk that contains the text “knitr::opts_chunk$set(echo = TRUE)”. Click just after the closing parenthesis and press Enter. This new line is where you should add any libraries that you will be using in your markdown. I plan to use ggplot2 to create graphs for this tutorial. Be sure that you have installed the package first, or else the library call will not work. Click the “Packages” tab in RStudio (to the right of the console) to see which packages you already have.
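As a rough sketch of what could go on that new line, the code below checks from R whether ggplot2 is installed before loading it. The variable names `needed`, `installed`, and `not_installed` are illustrative and not part of the default template.

```r
# Packages this document relies on (ggplot2 is the one used in this tutorial)
needed <- c("ggplot2")

# installed.packages() returns a matrix with one row per installed package
installed <- rownames(installed.packages())

# Any needed package that is not in the installed list
not_installed <- setdiff(needed, installed)

if (length(not_installed) > 0) {
  # install.packages("ggplot2") only has to be run once, in the console
  message("Install these first: ", paste(not_installed, collapse = ", "))
} else {
  library(ggplot2)
}
```

Keeping `install.packages()` out of the markdown itself is a deliberate choice here: the package only needs installing once, while the document may be knitted many times.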
Create *italics* by placing one asterisk on each side of your text: `*I am italic*` becomes *I am italic*.
Create **bold** by placing two asterisks on each side of your text: `**I am bold**` becomes **I am bold**.
Create headings of different sizes by using your fav, the hashtag (#), at the start of a line. The more hashtags you use, the smaller the heading.
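Because a knitted page shows only the rendered result, it can help to look at the raw source. The sketch below stores a few illustrative Markdown lines as R strings (the heading text is made up), counts the leading hashtags on each line, and prints the source as you would type it.

```r
# Raw Markdown source lines, stored as plain R strings
md_source <- c(
  "# Largest heading",
  "## Smaller heading",
  "### Smaller still",
  "",
  "*I am italic*",
  "**I am bold**"
)

# Count the leading hashtags on each line; more hashtags means a smaller heading
heading_level <- attr(regexpr("^#+", md_source), "match.length")
heading_level[heading_level < 0] <- 0  # lines with no leading hashtag

# Print the raw source exactly as it would appear in the .Rmd file
writeLines(md_source)
```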
You can embed an R code chunk by pressing Ctrl+Alt+I (Cmd+Option+I on a Mac), which you will need to do whenever you add code to your markdown.
I am going to use a dataset that is already built into R, called cars, for the Summaries & plots section below.
You can do anything in R Markdown that you would normally do in R. For example, you can summarize every column of the cars data by embedding a new R code chunk and calling the summary function, as shown below.
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
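Note that summary() describes every row of the data, not just the first few. To look at the first rows themselves, head() is the usual tool. A minimal sketch, where `n_rows` and `first_five` are names chosen only for illustration:

```r
# cars is a built-in data frame, so no file needs to be read
n_rows <- nrow(cars)         # number of observations in the full dataset
first_five <- head(cars, 5)  # the first five rows only

print(n_rows)
print(first_five)

# str() lists each column with its type, a quick check before plotting
str(cars)
```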
You can also add a basic plot, as shown below.
plot(cars$speed)
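Called with a single column, plot() draws each speed against its row number. One possible variation, sketched here with base R only, plots the two columns against each other and labels the axes. The title and label wording are my own; the units are the ones given on the dataset's help page (?cars).

```r
# Scatter plot of stopping distance against speed, with readable labels
plot(
  cars$speed, cars$dist,
  xlab = "Speed (mph)",
  ylab = "Stopping distance (ft)",
  main = "cars: speed vs. stopping distance"
)

# Range of each column, handy for checking the axes against the summary output
speed_range <- range(cars$speed)
dist_range <- range(cars$dist)
```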
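You can hide the code of a chunk in your output document by setting echo = FALSE in place of TRUE in the chunk options. The code still runs and its output still appears; only the code itself is hidden. This is easier to see in the source file in the coding window than in the HTML document, where the hidden code does not appear at all.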
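A minimal sketch of a chunk whose code would be hidden in the knitted document. It assumes knitr 1.35 or later, which accepts chunk options written as `#|` comments at the top of a chunk; with older versions the option goes in the chunk header instead, as in `{r, echo = FALSE}`.

```r
#| echo: false
# With echo set to false, the knitted document shows the printed mean
# but not these lines of code.
mean_speed <- mean(cars$speed)
print(mean_speed)
```

Run outside of knitting, the `#|` line is an ordinary comment, so the chunk behaves the same in the console.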
Snow <- read.csv("Snow.csv")
head(Snow, 5)
##   id inches period  temp
## 1  1  5.090  first 29.62
## 2  2  2.306  first 33.13
## 3  3  0.453  first 36.48
## 4  4  0.639  first 35.04
## 5  5  0.604  first 36.65
Here you can see that there are four columns of information: id, inches, period, and temp. Inches and temp (temperature) appear to be continuous numerical data, whereas period is categorical. Section 2.3 of your textbook, “Showing associations between 2 variables and differences between groups”, explains which types of graphs to use for each combination of data types.
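You can also check column types from R rather than by eye. The sketch below rebuilds only the five rows printed above (the full Snow.csv is not reproduced here) in a data frame I have called `snow_head`, then asks R for the class of each column.

```r
# Only the five rows shown in the head() output; not the full dataset
snow_head <- data.frame(
  id     = 1:5,
  inches = c(5.090, 2.306, 0.453, 0.639, 0.604),
  period = c("first", "first", "first", "first", "first"),
  temp   = c(29.62, 33.13, 36.48, 35.04, 36.65),
  stringsAsFactors = FALSE
)

# Class of each column: numeric columns hold numerical data,
# character (or factor) columns hold categorical data
col_classes <- sapply(snow_head, class)
print(col_classes)

# Mark period as categorical explicitly by converting it to a factor
snow_head$period <- factor(snow_head$period)
print(levels(snow_head$period))
```

With the real file, the same `sapply(Snow, class)` call would work on the full data frame; the other levels of period are not shown in this tutorial, so they are not guessed at here.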
If we want to display categorical vs. numerical data, we should use strip charts, violin plots, or overlaid histograms.
# Strip chart: jitter the points sideways so overlapping temperatures stay visible
ggplot(Snow, aes(period, temp, col = temp)) +
  geom_jitter(position = position_jitter(width = 0.2)) +
  labs(title = "Snow data with 1 categorical and 1 numerical", x = "Period", y = "Temperature")
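The strip chart is only one of the options named above. Here is a hedged sketch of the other two, a violin plot and overlaid histograms. Because Snow.csv is not included here, it uses the built-in iris data purely as a stand-in (Species is categorical, Sepal.Length is numerical), and it assumes ggplot2 is installed. The object names `p_violin` and `p_hist` are illustrative.

```r
library(ggplot2)

# Violin plot: one violin per category shows the shape of each distribution
p_violin <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_violin() +
  labs(title = "Violin plot", x = "Species", y = "Sepal length (cm)")

# Overlaid histograms: one semi-transparent histogram per category,
# drawn on top of each other rather than stacked
p_hist <- ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 20) +
  labs(title = "Overlaid histograms", x = "Sepal length (cm)", y = "Count")

print(p_violin)
print(p_hist)
```

To try these on the Snow data, swapping in `Snow`, `period`, and `temp` for the iris names should be enough.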
If we want to display numerical vs. numerical data, we should use scatter plots.
# Scatter plot: temperature on the x-axis, snowfall in inches on the y-axis
ggplot(Snow, aes(temp, inches)) +
  geom_point(color = "tomato") +
  labs(title = "Snow data with 2 numerical data types", x = "Temperature", y = "Inches")
Resources
*The Analysis of Biological Data*, 3rd edition, by Michael Whitlock and Dolph Schluter
Jessie and Dr. Kodner are good resources too.