Before introducing package ggplot2, it’s important to understand the theory on which it’s built.
Consider human language. Native speakers rarely understand grammatical rules, but they use them intuitively.
For example, we may not know why the indefinite article is sometimes “a” and at other times “an”. Observe:
We may also intuit the definite article, “the”, in the case of superlatives. Observe:
Every Grammatical Element is Significant. We use parts of speech like comparatives, superlatives, prepositions, definite and indefinite articles, verbs, nouns, adjectives, adverbs, phrasal verbs, and every other variety of grammatical element to create novel sentences. Observe a classic example:
“The quick brown fox jumps over the lazy dog.”
Note that if we change a single word, we change the significance of the sentence.
“The slow brown fox jumps over the lazy dog.”
By using an antonym for “quick”, the sentence is even more absurd. Let’s make two more changes.
“The slow brown fox runs over the ethereal dog.”
Now we have a fox that can presumably operate a motor vehicle, and we delve into metaphysics a bit.
We intuit the grammar of visualization. But we often don’t know the “rules” - if there are any.
Like human sentences, machine visualization can be elegant and efficient in conveying meaning.
Or it can be a salad of visual elements that disrespects your audience’s time.
In 1999, Leland Wilkinson published The Grammar of Graphics, which laid out a theoretical framework for such a grammar.
Layers. Graphics are comprised of distinct layers of grammatical elements.
Typical Conversation in Grammatical Terms. Note what layers are really being discussed:
Data:
“Are you pulling occupations from O*NET or BLS? We only need SOC-level.”
Aesthetics:
“Could you color code the data points by ethnicity?”
Geometry:
“I’m trying to emphasize the increase in cases of EBLL levels over time.”
Coordinates:
“Can we zoom in on just the household incomes that are less than $45,000?”
Statistics:
“Let’s add one of those squiggly lines to make that trend really stand out.”
Facets:
“Can we show multiple graphs that are organized by country of origin?”
Themes:
“We can only use colors that are in the company logo. Wait, what is that?”
“My two weeks’ notice.”
A Unified Framework. Individually, each element is a building block. As a whole, the Grammar of Graphics provides a common language for visualization experts.
Mapping or Aesthetic Mapping is simply depicting a variable by using these elements.
Package ggplot2 is a popular, flexible, and powerful visualization extension for R.
Further resources:
ggplot2 Vignette
ggplot2 Cheat Sheet
ggplot2 from the R Graph Gallery
ggplot2 Extensions Gallery
Installing and loading ggplot2 is easy.
Use function install.packages() to install ggplot2.
install.packages("ggplot2")
Use function library() to load ggplot2.
Load ggplot2 every time you start a new session.
library(ggplot2)
library("ggplot2")
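If you'd rather not reinstall the package every time a script runs, one common pattern is to install it only when it is missing. This is a minimal sketch, assuming a CRAN mirror is already configured for your session:

```r
# Install ggplot2 only if it is not already available, then load it.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)

# Check which version was loaded
packageVersion("ggplot2")
```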
We’ll use the practice dataset, diamonds, which comes with the ggplot2 package.
Variables include color, price, clarity, etc.
Read the documentation with ?diamonds or help(diamonds).
List the available datasets with data().
data(diamonds)
Some other functions for exploring the diamonds dataset include:
str(diamonds) # The structure of the data
## Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 10 variables:
## $ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
names(diamonds) # Variable names, or use...
## [1] "carat" "cut" "color" "clarity" "depth" "table" "price"
## [8] "x" "y" "z"
colnames(diamonds)
## [1] "carat" "cut" "color" "clarity" "depth" "table" "price"
## [8] "x" "y" "z"
dim(diamonds) # Tabular dimensions
## [1] 53940 10
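A few more functions give a feel for the values themselves, not just the structure. This is only a sketch, assuming ggplot2 is installed; head(), summary(), levels(), and table() are base R functions rather than part of ggplot2.

```r
library(ggplot2)  # provides the diamonds dataset

first_rows <- head(diamonds, n = 5)  # first five rows
first_rows

summary(diamonds$price)              # minimum, quartiles, mean, and maximum of price
levels(diamonds$cut)                 # ordered categories of cut, lowest to highest
table(diamonds$cut)                  # number of diamonds in each cut category
```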
Package ggplot2 has an absurd number of functions, but if you grasp the theory and practice, you’ll get it.
Each function begins with a name corresponding to the layers identified above.
ggplot() corresponds to the Data Layer
aes() corresponds to the Aesthetics Layer
geom_ functions correspond to the Geometry Layer
theme_ functions correspond to the Themes Layer
facet_ functions correspond to the Facets Layer
The Addition Operator, or +, connects all of these functions into a single visualization.
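As a rough sketch of how these prefixes line up with the layers, here is one plot that touches all seven. The name layered_plot and the specific choices (a linear trend, faceting by clarity, zooming to 0-3 carats) are assumptions made for illustration. The stat_ and coord_ prefixes are not in the list above, but ggplot2 uses them for the Statistics and Coordinates layers.

```r
library(ggplot2)

layered_plot <- ggplot(data = diamonds) +                      # Data
  aes(x = carat, y = price, color = cut) +                     # Aesthetics
  geom_point(alpha = 0.2) +                                    # Geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +    # Statistics
  facet_wrap(~ clarity) +                                      # Facets
  coord_cartesian(xlim = c(0, 3)) +                            # Coordinates
  theme_light()                                                # Themes

layered_plot  # printing the object draws the plot
```

Removing any line after the geometry still leaves a valid plot, which is a handy way to see what each layer contributes.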
For example, I’ve saved a grob, or Graphical Object that contains a plot, to demonstrate.
my_grob
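The code that created my_grob isn't shown here. If you want to follow along, any plot can be saved to a name with the assignment operator. The plot below is a hypothetical stand-in, not the original object.

```r
library(ggplot2)

# Hypothetical stand-in for my_grob: the original plot is not shown,
# so this scatter plot of the diamonds data is only an assumption.
my_grob <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point()

my_grob  # printing the saved object draws the plot
```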
Note that we can add, for example, a new theme with a theme_*() function.
* represents anything that may follow theme_
my_grob +
theme_classic()
Let’s try a different premade theme - there are a ton!
my_grob +
theme_minimal()
my_grob +
theme_light()
Package Extensions. You can even add new themes with package ggthemes.
Install with install.packages() and load with library().
library(ggthemes)
my_grob +
theme_fivethirtyeight()
Pretty neat. There’s even a color scheme for each of Wes Anderson’s movies.
We’ll discuss extensions in a later session.
The anatomy of a plot is simple. As mentioned, three layers are essential:
ggplot() and the name of the dataset
aes() and the variables to map
geom_ for the shape of the plot
You can chain all of these together with the Addition Operator.
Here, x = and y = in aes(), added after ggplot(), map carat and price:
ggplot(data = diamonds) +
aes(x = carat, y = price) +
geom_point()
The “Right” Way. Although it’s much easier and cleaner to keep these functions separate, you will often see ggplot() and aes() combined:
ggplot(data = diamonds, aes(x = carat,
y = price)) +
geom_point()
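Both spellings above should describe the same plot. One way to convince yourself, sketched here with the hypothetical names p_separate and p_nested, is to build both and compare the plot-level mappings they carry:

```r
library(ggplot2)

# 1. Separate functions chained with +
p_separate <- ggplot(data = diamonds) +
  aes(x = carat, y = price) +
  geom_point()

# 2. aes() nested inside ggplot()
p_nested <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point()

# Both plots carry plot-level mappings for the same aesthetics
names(p_separate$mapping)
names(p_nested$mapping)
```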
The difference between Attributes and Aesthetics is extremely important but very simple.
Aesthetic Mappings depict a variable from your data
x =, y =, color =, fill =, alpha =, etc. all represent data when they are set inside aes()
Attributes do not depict data - this is called Non-Data Ink
color = and fill =, for example, are Attributes when they are set inside a geom_ function
Let’s check out the same plot with an extra Aesthetic Mapping. Here, we’ll add a new variable, each diamond’s quality of cut, and map it to color = in function aes():
ggplot(data = diamonds, aes(x = carat,
y = price,
color = cut)) +
geom_point()
We can already gain new insights by mapping the variable cut.
Question. What can you tell about the relationship between price and cut?
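A numeric summary can complement what the colors suggest. This is only a sketch using base R's aggregate(); the name cut_summary and the choice of the median are assumptions made for illustration.

```r
library(ggplot2)  # provides the diamonds dataset

# Median price and median carat within each category of cut
cut_summary <- aggregate(cbind(price, carat) ~ cut,
                         data = diamonds,
                         FUN = median)
cut_summary
```

Looking at carat alongside price helps when thinking about the question, since diamond size differs across cut categories too.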
Now, observe an example of an Attribute or Non-Data Ink. Because we’re not depicting a new variable, we don’t put these arguments in function aes(). Instead, we put them in function geom_point(), and we can choose the Attributes that help us to interpret the plot. Specifically:
alpha = sets the transparency of data points; helpful because of heavy overlap
color = helps us discern transparent points from the default grey background of ggplot2
theme_light(), or another theme function, gives a non-grey background, if needed
ggplot(data = diamonds, aes(x = carat,
y = price)) +
geom_point(alpha = 0.1,
color = "tomato") +
theme_light()
We haven’t added any new data to the plot, but now we can more easily interpret it.
Question. What insights are made more clear by the parsimonious use of Non-Data Ink?
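A common slip is to put an Attribute inside aes(). The sketch below contrasts the two placements; the names p_attribute and p_mapped are hypothetical.

```r
library(ggplot2)

# Attribute: color is set outside aes(), so every point is drawn in tomato
p_attribute <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1, color = "tomato")

# Mapping by mistake: inside aes(), "tomato" is treated as a one-value
# variable, so ggplot2 chooses its own color and adds a legend for it
p_mapped <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = "tomato"), alpha = 0.1)

p_attribute
p_mapped
```

If a legend shows up for something that isn't a variable in your data, an Attribute has probably ended up inside aes().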
I went over my hours cap for session prep like three hours ago, and I have some rare birds to hunt in Red Dead Redemption 2, but this practice may still be useful.
Instructions. Using the same plot with which we’ve practiced, experiment with:
Mapping other variables inside aes(), because they are Data Ink:
names(diamonds)
## [1] "carat" "cut" "color" "clarity" "depth" "table" "price"
## [8] "x" "y" "z"
Also, try:
Setting Attributes inside geom_point(), because they are Non-Data Ink.
It’s dangerous to go alone. Take this:
shape =, size =, alpha =, etc.
Thanks for reading!