ggplot2, a package in the core tidyverse, is a system for easily and efficiently creating graphics in R. It is based on the “Grammar of Graphics”, the notion that all graphs can be built from the same components: a data set, a coordinate system, and visual marks that represent data points. You specify these details to ggplot2 and the package creates the graph.
The possibilities with ggplot2 are endless. You can learn more at https://ggplot2.tidyverse.org/. Here I describe some of its core capabilities and then take a deeper dive into themes.
We start by loading the tidyverse library. If tidyverse is not installed, you must first install it with install.packages("tidyverse").
library(tidyverse)
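If you are not sure whether a package is already installed, the install step can be guarded. The following is a minimal sketch, assuming a hypothetical helper named `ensure_package` (it is not part of the tidyverse). It relies on base R's `requireNamespace()`, which returns FALSE instead of raising an error when a package is missing.

```r
# Hypothetical helper: install a package only when it is not already available.
# `installer` defaults to install.packages(); it is a parameter so the
# behaviour can be checked without touching the network.
ensure_package <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
    return(invisible(FALSE))  # FALSE: the package had to be installed
  }
  invisible(TRUE)             # TRUE: the package was already available
}

# "stats" ships with R, so this call installs nothing.
ensure_package("stats")
```

Calling `ensure_package("tidyverse")` before `library(tidyverse)` would then install the package only on machines that lack it.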
Next we load the data that will be used for our examples. This data is from the article, “Marriage Isn’t Dead - Yet”, published on fivethirtyeight.com. The data represents the share of the population, aged 25 to 34, that has never been married, broken down by year and by other factors - we will be looking at level of education for the years from 2000 forward:
dfMarriage <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master//marriage/both_sexes.csv", header= TRUE)
dfMarriage_Subset <- dfMarriage %>%
  subset(select = c("year", "all_2534", "HS_2534", "SC_2534", "BAo_2534", "GD_2534")) %>%
  rename(c(all_individuals = "all_2534", High_School_Or_Less = "HS_2534", Some_College = "SC_2534", Bachelors_NoGradDegree = "BAo_2534", Graduate_Degree = "GD_2534")) %>%
  filter(year >= 2000)
knitr::kable(head(dfMarriage_Subset))
| year | all_individuals | High_School_Or_Less | Some_College | Bachelors_NoGradDegree | Graduate_Degree |
|---|---|---|---|---|---|
| 2000 | 0.3450087 | 0.3316545 | 0.3249205 | 0.3939579 | 0.3691740 |
| 2001 | 0.3527767 | 0.3446069 | 0.3341101 | 0.3925148 | 0.3590304 |
| 2002 | 0.3535249 | 0.3490367 | 0.3361595 | 0.3870840 | 0.3512848 |
| 2003 | 0.3620345 | 0.3581877 | 0.3418930 | 0.4000039 | 0.3538130 |
| 2004 | 0.3673247 | 0.3708102 | 0.3450748 | 0.3976124 | 0.3517729 |
| 2005 | 0.3793451 | 0.3870680 | 0.3596663 | 0.4029116 | 0.3514251 |
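The same subset can also be written with dplyr verbs only. The sketch below is an alternative to the pipeline above, not a replacement for it. To stay self-contained it runs on a two-row stand-in (`raw`, a name chosen here) built from values in the table, using the raw column names from the pipeline, rather than downloading the file.

```r
library(dplyr)

# Stand-in for the raw file: two rows copied from the table above,
# using the original column names (the real file has more columns).
raw <- data.frame(
  year     = c(2000, 2001),
  all_2534 = c(0.3450087, 0.3527767),
  HS_2534  = c(0.3316545, 0.3446069),
  SC_2534  = c(0.3249205, 0.3341101),
  BAo_2534 = c(0.3939579, 0.3925148),
  GD_2534  = c(0.3691740, 0.3590304)
)

# select() can pick and rename columns in one step (new_name = old_name).
marriage_subset <- raw %>%
  select(
    year,
    all_individuals        = all_2534,
    High_School_Or_Less    = HS_2534,
    Some_College           = SC_2534,
    Bachelors_NoGradDegree = BAo_2534,
    Graduate_Degree        = GD_2534
  ) %>%
  filter(year >= 2000)

names(marriage_subset)
```

Renaming inside `select()` keeps the column choice and the new names in one place, which can be easier to read than separate `subset()` and `rename()` steps.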
ggplot2 library
ggplot2 can create a vast array of graphs with many features. We will focus here on a few: creating different types of graphs, adding simple features like titles and labels, and creating themes.
We begin with a boxplot of the share of never-married adults in the population with a high school education or less, for all the years in the dataset:
ggplot(data = dfMarriage_Subset, aes(x = "", y = High_School_Or_Less)) +
  geom_boxplot()
Note the grammar of graphics at work - we supply a dataset (dfMarriage_Subset), an “aesthetic” mapping that assigns variables to the axes of the coordinate system (aes(x = "", y = High_School_Or_Less)), and a type of graph (geom_boxplot()) - and ggplot2 does the rest.
We can change or add to any of the parameters for this plot and generate a new graph. For example, we can improve the y axis label by specifying new text with ylab(), add a title with ggtitle(), and set a new font size with theme(text = element_text(size = 9)).
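Because each component is a separate piece, a plot can be assembled step by step. The following is a small sketch that uses the six High_School_Or_Less values from the table above as a stand-in data frame; `hs`, `base_plot` and `box_plot` are names chosen here for illustration.

```r
library(ggplot2)

# Six values copied from the table above (years 2000 to 2005).
hs <- data.frame(
  year = 2000:2005,
  High_School_Or_Less = c(0.3316545, 0.3446069, 0.3490367,
                          0.3581877, 0.3708102, 0.3870680)
)

# Data and aesthetic mapping only: this defines the axes but draws no marks.
base_plot <- ggplot(data = hs, aes(x = "", y = High_School_Or_Less))

# Adding a geom supplies the visual marks.
box_plot <- base_plot + geom_boxplot()
box_plot
```

Storing the base plot in a variable makes it easy to try a different geom on the same data and mapping.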
ggplot(data = dfMarriage_Subset, aes(x = "", y = High_School_Or_Less)) +
  geom_boxplot() +
  ylab("Share of adults who have never married") +
  ggtitle("Marriage Rates for Adults with a High School Education or Less") +
  theme(text = element_text(size = 9))
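The title and axis labels can also be set together with labs(), which ggplot2 provides alongside ylab() and ggtitle(). The following is a sketch on a stand-in data frame (`hs`, built from the six values in the table above); dropping the x axis title with `x = NULL` is a choice made here, not something the plot above does.

```r
library(ggplot2)

# Six values copied from the table above (years 2000 to 2005).
hs <- data.frame(
  year = 2000:2005,
  High_School_Or_Less = c(0.3316545, 0.3446069, 0.3490367,
                          0.3581877, 0.3708102, 0.3870680)
)

p_labs <- ggplot(hs, aes(x = "", y = High_School_Or_Less)) +
  geom_boxplot() +
  labs(
    title = "Marriage Rates for Adults with a High School Education or Less",
    x = NULL,  # remove the x axis title, since x carries no variable here
    y = "Share of adults who have never married"
  ) +
  theme(text = element_text(size = 9))
p_labs
```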
What about a bar chart of the share of ‘never-married adults with a high school education or less’ against ‘year’? True to the grammar of graphics, we can use virtually the same code with just two adjustments - change the x axis to ‘year’ and the plot type to geom_col(). We also add an x axis label with xlab(); the y axis label and title carry over unchanged:
g1 <- ggplot(data = dfMarriage_Subset, aes(x = year, y = High_School_Or_Less)) +
  geom_col() +
  ylab("Share of adults who have never married") +
  xlab("Year") +
  ggtitle("Marriage Rates for Adults with a High School Education or Less") +
  theme(text = element_text(size = 10))
g1
We can see that among adults aged 25 to 34 with a high school education or less, the share of those who have never been married has increased steadily over time.
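A quick numeric check can back up what the chart suggests. The sketch below only covers the six years shown in the table above (2000 to 2005); the same check would need to be run on the full column to confirm the trend for later years.

```r
# High_School_Or_Less values for 2000 to 2005, copied from the table above.
hs_share <- c(0.3316545, 0.3446069, 0.3490367, 0.3581877, 0.3708102, 0.3870680)

# Year-over-year change; all positive means the share rose every year.
changes <- diff(hs_share)
all(changes > 0)  # TRUE for these six years
```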
There are many other plots possible with ggplot2 - scatter plots, histograms, line charts - virtually any plot a data scientist might need. We will not explore them here.
We have already used theme() to change the font size, but it can do much more: there are over 100 elements that theme() can set to control the look of your graph. The best way to understand it is to see it in action.
Let’s use theme elements on the above graph to restyle the title, the panel and plot backgrounds, the grid lines, and the axis text, titles and ticks:
g2 <- g1 +
  theme(plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_rect(fill = "white"),
        panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_line(linewidth = 0.2, color = "gray"),
        plot.background = element_rect(fill = "cornsilk3"),
        axis.text = element_text(colour = "darkslateblue", face = "bold"),
        axis.text.x = element_text(angle = 65),
        axis.title = element_text(size = 10, face = "bold"),
        axis.title.y = element_text(vjust = 3),
        axis.title.x = element_text(vjust = 7),
        axis.ticks.x = element_blank())
g2
Beware of chartjunk. Use ornament sparingly, and only when it contributes to the overall communication of the data. That said, many of these elements can come in handy to make charts more readable and understandable.
To apply this theme consistently, we can put it in a function and call it again within our markdown file:
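One way to keep ornament in check is to start from one of ggplot2's built-in complete themes and override only what is needed. The following is a minimal sketch, assuming theme_minimal() is an acceptable starting point (that choice is an assumption made here) and using the six table values as a stand-in data frame named `hs`.

```r
library(ggplot2)

# Six values copied from the table above (years 2000 to 2005).
hs <- data.frame(
  year = 2000:2005,
  High_School_Or_Less = c(0.3316545, 0.3446069, 0.3490367,
                          0.3581877, 0.3708102, 0.3870680)
)

p_minimal <- ggplot(hs, aes(x = year, y = High_School_Or_Less)) +
  geom_col() +
  theme_minimal() +                           # built-in complete theme
  theme(panel.grid.minor = element_blank())   # one targeted override
p_minimal
```

A theme() call placed after a complete theme only changes the elements it names and leaves the rest of that theme in place.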
my_theme <- function()
{
  theme(plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_rect(fill = "white"),
        plot.background = element_rect(fill = "cornsilk3"),
        panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_line(linewidth = 0.2, color = "gray"),
        axis.text = element_text(colour = "darkslateblue", face = "bold"),
        axis.text.x = element_text(angle = 65),
        axis.title = element_text(size = 10, face = "bold"),
        axis.title.y = element_text(vjust = 3),
        axis.title.x = element_text(vjust = 7),
        axis.ticks.x = element_blank())
}
Now if we look at marriage rates for individuals with a Graduate degree we can use our theme for a consistent look like this:
ggplot(data = dfMarriage_Subset, aes(x = year, y = Graduate_Degree)) +
  geom_col() +
  ylab("Share of adults who have never married") +
  xlab("Year") +
  ggtitle("Marriage Rates for Adults with a Graduate Degree") +
  my_theme()
Here we see that the pattern for adults with a graduate degree is different - the share who have never married fell until about 2005 (that is, marriage rates increased) and then began to rise.
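Instead of adding the theme function to every plot, it can also be made the session default with theme_set(). The sketch below uses a shortened stand-in for the my_theme() function defined above (only three of its elements) so that it runs on its own; layering it on theme_gray(), the ggplot2 default, is an assumption made here.

```r
library(ggplot2)

# Shortened stand-in for the my_theme() function defined above.
my_theme <- function() {
  theme(plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_rect(fill = "white"),
        plot.background = element_rect(fill = "cornsilk3"))
}

# theme_set() replaces the session default and returns the previous theme,
# so the old default can be restored later.
old_theme <- theme_set(theme_gray() + my_theme())

# Plots created at this point pick up the custom look without calling my_theme().

theme_set(old_theme)  # restore the previous default
```

Setting the default once at the top of a markdown file keeps every later plot consistent, at the cost of making the styling less visible in each individual chunk.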
This ends our introduction to ggplot2 and themes. There is so much more we can do! But this introduction should get you started.