In this tutorial, we will go through a piece of code that reads in data, manipulates it, and generates a plot. Specifically, the code imports a CSV file containing data on weekly death trends in the United States, manipulates the data to ensure the date column is in the correct format, and generates a line graph of the number of weekly deaths.
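Before we can start, we need to make sure that the necessary R packages are installed. We will be using the following packages in this tutorial: readr, tidyverse, lubridate, and ggthemes. (readr is itself part of the tidyverse, so listing it separately is harmless but redundant.)
To install these packages, we can use the following code: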
install.packages("readr")
install.packages("tidyverse")
install.packages("lubridate")
install.packages("ggthemes")
Once we have installed these packages, we can load them into our environment using the library() function:
library(readr)
library(tidyverse)
library(lubridate)
library(ggthemes)
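Re-running install.packages() on every session is unnecessary once the packages are present. As a minimal sketch, the install step can be limited to whatever is missing. The helper name missing_packages() is hypothetical (it is not part of any package), and the interactive() guard is an assumption made here so that a non-interactive script never triggers a download by surprise.

```r
# Hypothetical helper: return the names of packages that are not installed.
# requireNamespace() returns FALSE when a package is unavailable
# (as a side effect, it loads the namespace of packages that are available).
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("readr", "tidyverse", "lubridate", "ggthemes")
to_install <- missing_packages(needed)

# Install only what is missing, and only in an interactive session
# (installing requires an internet connection).
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

After this runs, the library() calls can be used exactly as in the tutorial.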
The first part of our code reads in a CSV file containing data on weekly death trends in the United States. This is done using the read_csv() function from the readr package:
data_table_for_weekly_death_trends_the_united_states <- read_csv("~/Downloads/data_table_for_weekly_death_trends__the_united_states.csv", skip = 2)
Here, we specify the path to our file using ~/Downloads/, followed by the name of our file, data_table_for_weekly_death_trends__the_united_states.csv. The skip = 2 argument tells the function to ignore the first two lines of the file; the next line is then used for the column names.
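To see what skip = 2 does without needing the downloaded file, here is a small sketch that writes a temporary CSV and reads it back. The file contents are made up for illustration and only mimic the assumed layout (two leading title lines, then a header row); the show_col_types argument assumes readr 2.0 or later.

```r
library(readr)

# Made-up file contents: two title lines, then a header row and two data rows.
lines <- c(
  "Example title line",
  "Example note line",
  "Date,Weekly Deaths",
  "01/04/2020,100",
  "01/11/2020,110"
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# skip = 2 drops the two title lines; the third line supplies the column names.
demo <- read_csv(path, skip = 2, show_col_types = FALSE)

names(demo)  # column names taken from the third line of the file
nrow(demo)   # number of data rows that follow the header
```

Without skip = 2, the first title line would be treated as the header and the table would not have the expected columns.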
We then assign the resulting table to a shorter variable called data:
data <- data_table_for_weekly_death_trends_the_united_states
Next, we use the mdy() function from the lubridate package to convert the date column in our table from month/day/year text into proper Date values:
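data$Date <- mdy(data$Date)
Here, data$Date extracts the date column as a vector, and mdy() parses each value as a month-day-year date. The resulting vector is then assigned back to the Date column of our table. The same step could also be written with the %>% symbol (also known as the “pipe” operator), as data$Date <- data$Date %>% mdy(), which passes the date vector to the mdy() function.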
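The behaviour of mdy() can be checked on a few standalone strings before applying it to a whole column. This is a small sketch with made-up input values; the variable names raw, parsed, and bad are used here for illustration only.

```r
library(lubridate)

# Made-up month-day-year strings using two different separators.
raw <- c("01/04/2020", "1-11-2020")
parsed <- mdy(raw)

class(parsed)   # the result is a Date vector, not text
format(parsed)  # dates printed in year-month-day order

# Values that cannot be parsed become NA (with a warning), so it is worth
# checking for NAs after converting a real column.
bad <- suppressWarnings(mdy("not a date"))
is.na(bad)
```

On the real table, sum(is.na(data$Date)) after the conversion gives a quick count of rows whose date could not be parsed.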
Finally, we use the ggplot() function from the ggplot2 package (part of the tidyverse) to create a line graph of the number of weekly deaths. We use the aes() function to specify that we want to plot Weekly Deaths against Date. Because the column name Weekly Deaths contains a space, it must be wrapped in backticks in the code:
ggplot(data, aes(Date, `Weekly Deaths`)) +
  geom_line() +
  geom_smooth() +
  theme_economist()
Each + adds a layer or setting to the plot. The geom_line() function draws the line, the geom_smooth() function adds a smoothed trend line, and the theme_economist() function from the ggthemes package applies a theme to the plot.
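The same plot can be built on a small synthetic table, which is handy for experimenting with the layers without the downloaded file. In this sketch the toy values are invented for illustration and are not taken from the real data set; the object names toy, p, and out_file and the output file name are assumptions. The method and formula arguments simply spell out what geom_smooth() picks by default for small data sets, which avoids the informational message it would otherwise print.

```r
library(ggplot2)
library(ggthemes)

# Synthetic weekly values for illustration only (not real death counts).
toy <- data.frame(
  Date = seq(as.Date("2020-01-04"), by = "week", length.out = 30)
)
toy$`Weekly Deaths` <- 100 + 10 * sin(seq_along(toy$Date) / 4)

# Store the plot in an object so it can be printed or saved later.
p <- ggplot(toy, aes(Date, `Weekly Deaths`)) +
  geom_line() +
  geom_smooth(method = "loess", formula = y ~ x) +
  theme_economist()

# Save the plot to a temporary PNG file; width and height are in inches.
out_file <- file.path(tempdir(), "weekly_deaths.png")
ggsave(out_file, plot = p, width = 8, height = 5)
```

Typing p at the console displays the plot; swapping toy for data reproduces the tutorial's graph.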
This tutorial has covered a simple yet useful example of manipulating data and creating visualizations in R using some popular packages. With more practice, these concepts will become more familiar and intuitive, helping you to unlock R’s full potential!