library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
if (!file.exists('president_approval_polls.csv')){
download.file('https://projects.fivethirtyeight.com/polls-page/president_approval_polls.csv', 'president_approval_polls.csv')
}
df <- read.csv('president_approval_polls.csv')
GGPlot is centered around is base function ggplot and then adding certain geometries to ggplot to create different types of graphs. The grammar of ggplot is ggplot(arguments) + geom_type() where geom_type determines the type of plot you will make. Here are some examples.
df$end_date <- df$end_date %>% mdy()
df %>% ggplot(aes(x = end_date, y = yes)) + geom_line()
Here we have a basic plot. Inside the ggplot call we call aes which creates the aesthetics for the plot. Inside the aes call we create the x and y variables. However we see that the graph is extremely cluttered and that perhaps the line geometry is not correct for this graph. Let us try and get more visual clarity by using geom_point rather than geom_line. I also added a second geometry geom_smooth to see the trend.
df %>% ggplot(aes(x = end_date, y = yes)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
We see that changing geometry is as simple as changing the geometry argument. Now each poll is much clearer, and we can see an interesting phenomena: There are horizontal lines in this graph where polls are being rounded to the nearest percent. You can find a full list of different geometries here https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf.
Now say we want to graph approval vs. disapproval. We do this by making two geom calls, and we will no longer use a global aesthetic.
df %>% ggplot() + geom_point(aes(end_date, yes), color = 'green') + geom_point(aes(end_date, no), color = 'red')
Here I made 2 different calls to geom_point and set the aesthetics for each of them within their own function call rather than as part of the global graph. This is because I wanted each of them to plot different things. I also set the color for each set of points outside of the aesthetics call. Let us see what happens when we set color inside of the aesthetics call.
if (!file.exists('president_polls.csv')){
download.file('https://projects.fivethirtyeight.com/polls-page/president_polls.csv', 'president_polls.csv')
}
df2 <- read.csv('president_polls.csv')
df2$created_at <- df2$created_at %>% mdy_hm()
df2 <- df2 %>% filter(candidate_name %in% c('Donald Trump', 'Joseph R. Biden Jr.', 'Bernard Sanders'))
df2 %>% ggplot(aes(created_at, pct, color = candidate_name)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Notice how I called color with a column name. This indicates to ggplot that each value in that column should receive a different color. When you want to give different values in a column different colors that must go inside the aes call, whereas if you want to assign all points the same color, that falls outside the aes call.
Another way to split data is by using the facet_grid function.
df2 %>% ggplot(aes(created_at, pct)) + geom_point() + geom_smooth() + facet_grid(. ~ candidate_name)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
The facet grid function tells ggplot to split the data into different graphs based on the values in the faceted columns. to split into columns we use the notation . ~ column, and to split into rows we use column . ~
One other functionality we should discuss is the personalization of graphs. Adding labels and titles to graphs is extremely easy.
df2 %>% ggplot(aes(created_at, pct, color = candidate_name)) + geom_point() + geom_smooth() + xlab('Date') + ylab('%') + ggtitle('Presidential Candidate Approval over Time')
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Using the functions xlab, ylab, and ggtitle, we quickly labeled and titled our plot. Another easy personalization we can do is to change the theme of said plot.
df2 %>% ggplot(aes(created_at, pct, color = candidate_name)) + geom_point() + geom_smooth() + xlab('Date') + ylab('%') + ggtitle('Presidential Candidate Approval over Time') + theme_bw()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Here I changed the theme of the graph by adding theme_bw. There are a variety of themes included with ggplot, and even more added in the ggthemes package for added variability.
There are many other useful functionalities for ggplot including zooming on certain coordinates, scaling axes, and generating statistics. If you want more information, I encourage you to use the cheat sheet above to find out more about ggplot.