Get the code for this markdown here.

R Markdown can make labs for biostats more seamless. Instead of copying and pasting your work into a document to submit, you can knit it together here. To open a new R Markdown document, click File > New File > R Markdown. This is an HTML document, but you have the option of creating PDF or Word files too.

Set up your library

Below you will see an automatic code chunk that contains the text “knitr::opts_chunk$set(echo = TRUE)”. Click to the right of the closing parenthesis and hit Enter. This new line is the place to add any libraries that you will be using in your markdown. I plan to use ggplot2 to create graphs for this tutorial. Be sure that you have installed the package first, or else the library call will fail. Click the “Packages” tab in the lower-right pane of RStudio to see which packages you already have.
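If you are not sure whether ggplot2 is installed, a quick check like the one below can be run in the Console before knitting. This is only a sketch, and the variable name has_ggplot2 is made up for illustration.

```r
# requireNamespace() returns TRUE if the package can be loaded and FALSE
# otherwise, without attaching it to your session.
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
print(has_ggplot2)

# If the result is FALSE, install the package once from the Console
# (not inside your markdown), then knit again:
# install.packages("ggplot2")
```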

Basic things

Create italics by placing one asterisk on each side of your text: *I am italic*

Create bold by placing two asterisks on each side of your text: **I am bold**

Create headings of different sizes by using your fav, the hashtag. The more hashtags you put in front of the heading text, the smaller the heading.

Code chunks

You can embed an R code chunk by pressing Ctrl+Alt+I (Cmd+Option+I on a Mac), which you will need to do if you plan to add any code to your markdown.

I am going to use a dataset that is already built into R, called cars, for the Summaries & plots section below.
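Because cars ships with R (in the datasets package), you can look at it without downloading anything. A minimal sketch of a first look, using only base R functions:

```r
dim(cars)     # number of rows and number of columns
names(cars)   # column names
head(cars)    # first six rows by default
```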

Summaries & plots

You can do anything you would normally do in R in R Markdown, like summarizing each column of the cars data by embedding a new R code chunk and calling the summary function, as shown below.

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Or you can add a basic plot, as shown below.

plot(cars$speed)
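With a single column, plot() draws each speed against its row number. To plot one column against the other, a base R sketch could look like the following. The axis labels use the units listed on the cars help page (?cars), and the title is just an example.

```r
# Stopping distance against speed for the built-in cars data
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "cars: speed vs. stopping distance")
```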

You can hide a chunk's code in your output document by setting echo = FALSE in place of echo = TRUE in the chunk options. The chunk still runs and its output still appears. This only makes sense if you are looking at the coding window and not reading the HTML document.

Try it yourself

Download the following data and place it into your working directory.

Snow Data

Read in the data:


Snow <- read.csv("Snow.csv")
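If read.csv() reports that it cannot open the file, the file is probably not in your working directory. A quick check, assuming the file is named Snow.csv as above (snow_found is a made-up variable name):

```r
getwd()                                 # the folder R is currently working in
snow_found <- file.exists("Snow.csv")   # TRUE once the file is in that folder
print(snow_found)
```

When you knit, R Markdown by default treats the folder that contains your .Rmd file as the working directory, so keeping Snow.csv next to your markdown file is usually the simplest option.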

What is in the data?

head(Snow, 5)
##   id inches period  temp
## 1  1  5.090  first 29.62
## 2  2  2.306  first 33.13
## 3  3  0.453  first 36.48
## 4  4  0.639  first 35.04
## 5  5  0.604  first 36.65

Here you can see that there are 4 columns of information: id, inches, period, temp. Inches and temperature seem to be continuous numerical data, whereas period is categorical. Chapter 2.3 of your textbook, “Showing associations between 2 variables and differences between groups”, provides information on which types of graphs to use for each combination of data types.
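To confirm the data types rather than judging by eye, you can ask R for the class of each column. The sketch below rebuilds only the five rows printed above in a stand-in data frame (snow_head is a made-up name) so that it runs without the file. With the real data you would pass Snow instead.

```r
# Stand-in built from the five rows shown by head(Snow, 5)
snow_head <- data.frame(
  id     = 1:5,
  inches = c(5.090, 2.306, 0.453, 0.639, 0.604),
  period = rep("first", 5),
  temp   = c(29.62, 33.13, 36.48, 35.04, 36.65)
)

# Class of each column: integer or numeric columns hold numbers,
# character or factor columns hold categories
col_types <- sapply(snow_head, class)
print(col_types)
```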

How should we display this data?

If we want to display categorical vs. numerical data, we should use strip charts, violin plots, or overlaid histograms. The code below draws a strip chart.

ggplot(Snow, aes(period, temp, col = temp))+
  geom_jitter(position = position_jitter(width = .2))+
  labs(title = "Snow data with 1 categorical and 1 numerical", x = "Period", y = "Temperature")

If we want to display numerical vs. numerical data, we should use scatter plots.

ggplot(Snow, aes(temp, inches))+
  geom_point(color = "tomato")+
  labs(title = "Snow data with 2 numerical data types", x = "Temperature", y = "Inches")