\(\underline{\textbf{Chapter Description}}\)
The Coordinates layers offer specific and very useful tools for efficiently and accurately communicating data.
Here we’ll look at the various ways of effectively using these layers, so you can clearly visualize lognormal datasets, variables with units, and periodic data.
# The datasets airquality and sunspots were loaded from the built-in "datasets" package
# ggplot2 contains the msleep dataset
library(tidyverse)
# Used the function index() from the zoo package to modify the sunspot.month dataset from the datasets package
library(zoo)
# Used lubridate functions to append a date column to the airquality dataset
library(lubridate)
airquality <- airquality %>%
  mutate(Date = make_date(1973, Month, Day))
str(airquality)
'data.frame': 153 obs. of 7 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
$ Date : Date, format: "1973-05-01" "1973-05-02" ...
mtcars <- read.csv("~/Desktop/R/Datacamp/Data Visualization/Datasets/mtcars.csv", stringsAsFactors=FALSE)
mtcars <- mtcars %>%
mutate(fam = as.factor(am), fcyl = as.factor(cyl), car = model, fvs = as.factor(vs))
str(mtcars)
'data.frame': 32 obs. of 16 variables:
$ model: chr "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : int 6 6 4 6 8 6 8 4 4 6 ...
$ disp : num 160 160 108 258 360 ...
$ hp : int 110 110 93 110 175 105 245 62 95 123 ...
$ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec : num 16.5 17 18.6 19.4 17 ...
$ vs : int 0 0 1 1 0 1 0 1 1 1 ...
$ am : int 1 1 1 0 0 0 0 0 0 0 ...
$ gear : int 4 4 4 3 3 3 3 4 4 4 ...
$ carb : int 4 4 1 1 2 1 4 2 2 4 ...
$ fcyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
$ fam : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
$ car : chr "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ fvs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
wind <- read_csv("~/Desktop/R/Datacamp/Data Visualization/Datasets/wind.csv",
col_types = cols(
date = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
ws = col_factor(levels = c("0 - 2", "2 - 4", "4 - 6", "6 - 8", "8 - 10", "10 - 12", "12 - 14")),
wd = col_factor(levels = c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE", "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"))
))
str(wind)
spec_tbl_df [8,753 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ date: POSIXct[1:8753], format: "2003-01-01 00:00:00" "2003-01-01 01:00:00" ...
$ ws : Factor w/ 7 levels "0 - 2","2 - 4",..: 3 3 2 3 3 3 3 3 2 2 ...
$ wd : Factor w/ 16 levels "N","NNE","NE",..: 8 7 7 7 7 7 8 8 8 8 ...
- attr(*, "spec")=
.. cols(
.. date = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
.. ws = col_factor(levels = c("0 - 2", "2 - 4", "4 - 6", "6 - 8", "8 - 10", "10 - 12", "12 - 14"
.. ), ordered = FALSE, include_na = FALSE),
.. wd = col_factor(levels = c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE", "S", "SSW",
.. "SW", "WSW", "W", "WNW", "NW", "NNW"), ordered = FALSE, include_na = FALSE)
.. )
- attr(*, "problems")=<externalptr>
Lecture Slides 1-12.
In the video, you saw different ways of using the coordinates layer to zoom in.
# ?coord_cartesian
# ?scale_x_continuous
In this exercise, we’ll compare zooming by changing scales and by changing coordinates.
The big difference is that the scale functions change the underlying dataset, which affects calculations made by computed geoms (like histograms or smooth trend lines), whereas coordinate functions make no changes to the dataset.
A scatter plot using mtcars with a LOESS smoothed trend line is provided. Take a look at this before updating it.
Update the plot by adding (+) a continuous x scale with limits from 3 to 6.
# Run the code, view the plot, then update it
ggplot(mtcars, aes(x = wt, y = hp, color = fam)) +
geom_point() +
geom_smooth() +
# Add a continuous x scale from 3 to 6
scale_x_continuous(limits = c(3, 6))
Warning: Removed 12 rows containing non-finite values (stat_smooth).
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : span too small. fewer data values than degrees of freedom.
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : at 3.168
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : radius 4e-06
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : all data on boundary of neighborhood. make span bigger
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : pseudoinverse used at 3.168
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : neighborhood radius 0.002
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : reciprocal condition number 1
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : at 3.572
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : radius 4e-06
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : all data on boundary of neighborhood. make span bigger
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : There are other near singularities as well. 4e-06
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : zero-width neighborhood. make span bigger
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : zero-width neighborhood. make span bigger
Warning: Computation failed in `stat_smooth()`:
NA/NaN/Inf in foreign function call (arg 5)
Warning: Removed 12 rows containing missing values (geom_point).
Update the plot by adding a Cartesian coordinate system with x limits, xlim, from 3 to 6.
ggplot(mtcars, aes(x = wt, y = hp, color = fam)) +
geom_point() +
geom_smooth() +
# Add Cartesian coordinates with x limits from 3 to 6
coord_cartesian(xlim = c(3, 6))
Using the scale function to zoom in meant that there wasn’t
enough data to calculate the trend line, and geom_smooth()
failed.
When coord_cartesian() was applied, the full dataset
was used for the trend calculation.
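The difference can also be checked numerically. The sketch below assumes only ggplot2 and the built-in mtcars data. It swaps the LOESS smoother for a linear fit (so the scale version does not fail) and drops the color grouping, then uses layer_data() to recover the fitted line from each plot. The helper slope() is a hypothetical name introduced for this illustration.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = wt, y = hp)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Zoom with a scale: rows with wt < 3 are dropped before the fit
# (suppressWarnings() hides the "removed rows" warning)
scale_fit <- suppressWarnings(
  layer_data(base_plot + scale_x_continuous(limits = c(3, 6)))
)

# Zoom with coordinates: the fit still uses every row
coord_fit <- layer_data(base_plot + coord_cartesian(xlim = c(3, 6)))

# Recover the slope of the drawn line from the layer data
slope <- function(d) unname(coef(lm(y ~ x, data = d))["x"])

slope_scale  <- slope(scale_fit)
slope_coord  <- slope(coord_fit)

# Reference fits computed directly on the data
slope_subset <- unname(coef(lm(hp ~ wt, data = subset(mtcars, wt >= 3)))["wt"])
slope_full   <- unname(coef(lm(hp ~ wt, data = mtcars))["wt"])
```

If the claim above holds, slope_scale should match slope_subset and slope_coord should match slope_full.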
We can set the aspect ratio of a plot with
coord_fixed(), which uses ratio = 1 as a
default. A 1:1 aspect ratio is most appropriate when two continuous
variables are on the same scale, as with the iris
dataset.
All variables are measured in centimeters, so it only makes sense that one unit on the plot should be the same physical distance on each axis. This gives a more truthful depiction of the relationship between the two variables, because the aspect ratio changes the angle at which the smoothing line is drawn, and a distorted angle could give an erroneous impression of the data. Of course the underlying linear models don’t change, but our perception can be influenced by the angle drawn.
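As a rough illustration of what the ratio argument does: in coord_fixed(), ratio is expressed as y / x, so the panel's height-to-width proportion is approximately ratio times the y data range divided by the x data range. The helper panel_aspect() below is a hypothetical name for this sketch, and it ignores the small default axis expansion and any jitter.

```r
# Approximate height / width of the panel implied by coord_fixed(ratio = ratio),
# ignoring the default axis expansion
panel_aspect <- function(x, y, ratio = 1) {
  ratio * diff(range(y, na.rm = TRUE)) / diff(range(x, na.rm = TRUE))
}

# Sepal width (y) against sepal length (x) in the iris data
aspect_one  <- panel_aspect(iris$Sepal.Length, iris$Sepal.Width, ratio = 1)
aspect_half <- panel_aspect(iris$Sepal.Length, iris$Sepal.Width, ratio = 0.5)

aspect_one   # below 1: the panel is wider than it is tall
aspect_half  # half of aspect_one: the panel is flatter still
```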
A plot using the iris dataset, of sepal width vs. sepal
length colored by species, is shown in the viewer.
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
# Fix the coordinate ratio
# coord_fixed(ratio = 1) is the same as coord_fixed(), since ratio = 1 is the default
coord_fixed(ratio = 0.5)
My Remarks:
coord_fixed(ratio = 1) is the same as calling coord_fixed() with no arguments, because ratio = 1 is the default. To see a different aspect ratio, I used ratio = 0.5.
When values are not on the same scale it can be a bit tricky to set an appropriate aspect ratio. A classic William Cleveland (inventor of dot plots) example is the sunspots data set. We have 3200 observations from 1750 to 2016.
# Zoo Package Required to Load Dataset
sunspots.m <- data.frame(year = index(sunspot.month),
value = reshape2::melt(sunspot.month)$value)
sun_plot (below) is a plot without any set
aspect ratio. It fills up the graphics device.
# Automatically Loaded in datacamp.com BUT Background Code NOT Given
sun_plot <- ggplot(sunspots.m, aes(x = year, y = value)) +
geom_line(color = "skyblue") #+
# coord_fixed(ratio = 0.0555) # I added this ratio so that it
# sun_plot
To make aspect ratios clear, we’ve drawn an orange box that is 75 units high and 75 years wide.
Using a 1:1 aspect ratio would make the box square. That aspect ratio would make the oscillations harder to see: it is better to force a wider ratio.
# Fix the aspect ratio to 1:1
sun_plot +
coord_fixed(ratio = 1)
The \(y\) axis is now unreadable because it is too small. Make it bigger!
Change the aspect ratio to 20:1. This is the aspect ratio recommended by Cleveland to help make the trend among oscillations easiest to see.
# Change the aspect ratio to 20:1
sun_plot +
coord_fixed(ratio = 20)
coord_fixed() with a high ratio is often useful for long time series.
My Remarks: This section was confusing because they did not provide the underlying code that gave the graph its weird aspect ratios.
The coord_*() layer functions offer two useful arguments that work well together: expand and clip.
expand sets a buffer margin around the plot, so data and axes don’t overlap. Setting expand to 0 draws the axes to the limits of the data.
clip decides whether plot elements that would lie outside the plot panel are displayed or ignored (“clipped”).
When done properly this can make a great visual effect! We’ll use
theme_classic() and modify the axis lines in this
example.
Add Cartesian coordinates with zero expansion on both the x and y axes.
ggplot(mtcars, aes(wt, mpg)) +
geom_point(size = 2) +
# Add Cartesian coordinates with zero expansion
coord_cartesian(expand = 0) +
theme_classic()
Setting expand to 0 caused points at
the edge of the plot panel to be cut off.
Set the clip argument to "off" to
prevent this.
Remove the axis lines by setting the axis.line
argument to element_blank() in the theme()
layer function.
ggplot(mtcars, aes(wt, mpg)) +
geom_point(size = 2) +
# Turn clipping off
coord_cartesian(expand = 0, clip = "off") +
theme_classic() +
# Remove axis lines
theme(axis.line = element_blank())
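To see what expand changes, the panel limits can be compared with the data range. This is a minimal sketch, assuming a reasonably recent ggplot2. It passes expand = FALSE, the logical equivalent of the 0 used above, and reads the limits from ggplot_build(), whose panel_params structure is an internal detail that may differ between ggplot2 versions. The helper x_range() is a hypothetical name.

```r
library(ggplot2)

p_default <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(size = 2)

# No buffer margin, and points on the panel edge are drawn in full
p_tight <- p_default +
  coord_cartesian(expand = FALSE, clip = "off")

# Read the x limits of the first panel from the built plot
x_range <- function(p) ggplot_build(p)$layout$panel_params[[1]]$x.range

default_limits <- x_range(p_default)  # wider than the data
tight_limits   <- x_range(p_tight)    # should equal range(mtcars$wt)
```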
Lecture Slides 13-22.
msleep
# A tibble: 83 × 11
name genus vore order conse…¹ sleep…² sleep…³ sleep…⁴ awake brainwt
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Cheetah Acin… carni Carn… lc 12.1 NA NA 11.9 NA
2 Owl monkey Aotus omni Prim… <NA> 17 1.8 NA 7 0.0155
3 Mountain be… Aplo… herbi Rode… nt 14.4 2.4 NA 9.6 NA
4 Greater sho… Blar… omni Sori… lc 14.9 2.3 0.133 9.1 0.00029
5 Cow Bos herbi Arti… domest… 4 0.7 0.667 20 0.423
6 Three-toed … Brad… herbi Pilo… <NA> 14.4 2.2 0.767 9.6 NA
7 Northern fu… Call… carni Carn… vu 8.7 1.4 0.383 15.3 NA
8 Vesper mouse Calo… <NA> Rode… <NA> 7 NA NA 17 NA
9 Dog Canis carni Carn… domest… 10.1 2.9 0.333 13.9 0.07
10 Roe deer Capr… herbi Arti… lc 3 NA NA 21 0.0982
# … with 73 more rows, 1 more variable: bodywt <dbl>, and abbreviated variable
# names ¹conservation, ²sleep_total, ³sleep_rem, ⁴sleep_cycle
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
Using scale_y_log10() and scale_x_log10()
is equivalent to transforming our actual dataset before getting
to ggplot2.
Using coord_trans(), setting x = "log10"
and/or y = "log10" arguments, transforms the data
after statistics have been calculated. The plot will look the
same as with using scale_*_log10(), but the scales will be
different, meaning that we’ll see the original values on our log10
transformed axes. This can be useful since log scales are not always
intuitive.
Let’s see this in action with positively skewed data - the brain and
body weight of 51 mammals from the msleep dataset.
Using the msleep dataset, plot the raw values of brainwt against bodywt values as a scatter plot.
# Produce a scatter plot of brainwt vs. bodywt
ggplot(data = msleep, aes(x = bodywt, y = brainwt)) +
geom_point(na.rm = T) +
ggtitle("Raw Values")
Add the scale_x_log10() and scale_y_log10() layers with default values to transform the data before plotting.
# Add scale_x_log10() and scale_y_log10() functions
ggplot(msleep, aes(bodywt, brainwt)) +
geom_point(na.rm = T) +
scale_x_log10() +
scale_y_log10() +
ggtitle("Scale_ functions")
Use coord_trans() to apply a "log10" transformation to both the x and y scales.
# Perform a log10 coordinate system transformation
ggplot(msleep, aes(bodywt, brainwt)) +
geom_point(na.rm = T) +
coord_trans(x = "log10", y = "log10")
Each transformation method has implications for the plot’s interpretability.
Think about your audience when choosing a method for applying transformations.
In the last exercise, we saw the usefulness of the
coord_trans() function, but be careful! Remember that
statistics are calculated on the untransformed data. A linear model may
end up looking not-so-linear after an axis transformation. Let’s revisit
the two plots from the previous exercise and compare their linear
models.
Add log10 transformed scales to the x and y axes.
# Plot with a scale_*_*() function:
ggplot(msleep, aes(bodywt, brainwt)) +
geom_point(na.rm = T) +
geom_smooth(method = "lm", se = FALSE, na.rm = T) +
# Add a log10 x scale
scale_x_log10() +
# Add a log10 y scale
scale_y_log10() +
ggtitle("Scale functions")
Add a log10 coordinate transformation for both the
x and y axes.
Do you notice the difference between the two plots?
# Plot with transformed coordinates
ggplot(msleep, aes(bodywt, brainwt)) +
geom_point(na.rm = T) +
geom_smooth(method = "lm", se = FALSE, na.rm = T) +
# Add a log10 coordinate transformation for x and y axes
coord_trans(x = "log10", y = "log10") # Transforms coordinates too
The smooth trend line is calculated after scale transformations but not coordinate transformations, so the second plot doesn’t make sense.
Be careful when using the coord_trans()
function!
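The two trend lines can be compared numerically as well. The sketch below assumes ggplot2 (which ships msleep) and that coord_trans() is available in the installed version. It recovers each drawn line with layer_data() and compares its slope with lm() fitted on log10-transformed and on raw values. The helper slope() is a hypothetical name.

```r
library(ggplot2)

base_plot <- ggplot(msleep, aes(bodywt, brainwt)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, na.rm = TRUE)

# Scale transformation: the model is fitted to log10 values
scale_line <- layer_data(base_plot + scale_x_log10() + scale_y_log10())

# Coordinate transformation: the model is fitted to raw values,
# and only the drawing is warped afterwards
coord_line <- layer_data(base_plot + coord_trans(x = "log10", y = "log10"))

# Recover the slope of the fitted line from the layer data
slope <- function(d) unname(coef(lm(y ~ x, data = d))["x"])

slope_scale <- slope(scale_line)
slope_coord <- slope(coord_line)

# Reference fits; lm() drops rows with a missing brainwt by default
slope_log <- unname(coef(lm(log10(brainwt) ~ log10(bodywt), data = msleep))[2])
slope_raw <- unname(coef(lm(brainwt ~ bodywt, data = msleep))[2])
```

The scale version should reproduce the log-log slope, while the coord_trans() version reproduces the raw-data slope, which is why its line no longer looks straight on the transformed axes.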
Lecture Slides 23-31.
Double x and y-axes are a contentious topic
in data visualization. We’ll revisit that discussion at the end of
Chapter 4. Here, I want to review a great use case where double axes
actually do add value to a plot.
Our goal plot is displayed in the viewer. The two axes are the raw temperature values on a Fahrenheit scale and the transformed values on a Celsius scale.
You can imagine a similar scenario for Log-transformed and original values, miles and kilometers, or pounds and kilograms. A scale that is not intuitive for many people can be made easier by adding a transformation as a double axis.
Begin with a standard line plot of Temp described by Date in the airquality dataset.
# Using airquality, plot Temp vs. Date
ggplot(data = airquality, aes(x = Date, y = Temp)) +
# Add a line layer
geom_line() +
labs(x = "Date (1973)", y = "Fahrenheit")
Convert y_breaks from Fahrenheit to Celsius
(subtract 32, then multiply by 5, then divide by 9).
Define the secondary y-axis using
sec_axis().
Use the identity transformation.
Set the breaks and labels to the
defined objects y_breaks and y_labels,
respectively.
# Define breaks (Fahrenheit)
y_breaks <- c(59, 68, 77, 86, 95, 104)
# Convert y_breaks from Fahrenheit to Celsius
y_labels <- (y_breaks - 32) * 5 / 9
# Create a secondary y-axis
secondary_y_axis <- sec_axis(
# Use identity transformation
trans = identity,
name = "Celsius",
# Define breaks and labels as above
breaks = y_breaks,
labels = y_labels
)
Add your secondary y-axis to the sec.axis argument of scale_y_continuous().
# Update the plot
ggplot(airquality, aes(Date, Temp)) +
geom_line() +
# Add the secondary y-axis
scale_y_continuous(sec.axis = secondary_y_axis) +
labs(x = "Date (1973)", y = "Fahrenheit")
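An alternative sketch, not the approach used in the exercise: instead of an identity transformation with hand-made labels, the Fahrenheit-to-Celsius formula can be passed to sec_axis() directly, so the secondary axis chooses its own breaks. The names aq and f_to_c are hypothetical and introduced only for this example. The Date column is rebuilt with base R on the assumption that the airquality measurements are from 1973.

```r
library(ggplot2)

# Rebuild a Date column with base R (airquality covers May to September 1973)
aq <- transform(airquality, Date = as.Date(paste(1973, Month, Day, sep = "-")))

# Fahrenheit to Celsius: subtract 32, multiply by 5, divide by 9
f_to_c <- function(f) (f - 32) * 5 / 9

y_breaks <- c(59, 68, 77, 86, 95, 104)
y_labels <- f_to_c(y_breaks)

temp_plot <- ggplot(aq, aes(Date, Temp)) +
  geom_line() +
  scale_y_continuous(
    name = "Fahrenheit",
    # The formula maps primary-axis values onto the secondary axis
    sec.axis = sec_axis(~ (. - 32) * 5 / 9, name = "Celsius")
  ) +
  labs(x = "Date (1973)")
```

With a formula, the Celsius axis is labelled at round Celsius values rather than at the converted Fahrenheit breaks. The identity approach is the better fit when both axes should share tick positions.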
Flipping axes means to reverse the variables mapped onto the
x and y aesthetics. We can just change the
mappings in aes(), but we can also use the
coord_flip() layer function.
There are two reasons to use this function:
We want a vertical geom to be horizontal, or
We’ve completed a long series of plotting functions and want to flip it without having to rewrite all our commands.
Create a side-by-side (“dodged”) bar chart of fcyl, filled according to fam.
# Plot fcyl bars, filled by fam
ggplot(data = mtcars, aes(x = fcyl, fill = fam)) +
# Place bars side by side
geom_bar(position = "dodge")
To get horizontal bars, add the coord_flip() function.
ggplot(mtcars, aes(fcyl, fill = fam)) +
geom_bar(position = "dodge") +
# Flip the x and y coordinates
coord_flip()
Partially overlapping bars are popular with “infoviz” in magazines.
Update the position argument to use position_dodge()
with a width of 0.5.
ggplot(mtcars, aes(fcyl, fill = fam)) +
# Set a dodge width of 0.5 for partially overlapping bars
geom_bar(position = position_dodge(width = 0.5)) +
coord_flip()
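The other route mentioned above, changing the mappings in aes(), can be sketched as follows. This assumes ggplot2 3.3.0 or later, where geom_bar() accepts a y-only mapping, and it rebuilds the factor columns from the built-in mtcars under the hypothetical name mt rather than relying on the .csv file.

```r
library(ggplot2)

# Factor versions of cyl and am, built from the built-in mtcars
mt <- transform(mtcars, fcyl = factor(cyl), fam = factor(am))

# Horizontal bars via coord_flip()
bars_coord <- ggplot(mt, aes(x = fcyl, fill = fam)) +
  geom_bar(position = "dodge") +
  coord_flip()

# Horizontal bars by mapping the category to y instead of x
bars_aes <- ggplot(mt, aes(y = fcyl, fill = fam)) +
  geom_bar(position = "dodge")

# Both plots should count the same groups
counts_coord <- sort(layer_data(bars_coord)$count)
counts_aes   <- sort(layer_data(bars_aes)$count)
```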
In this exercise, we’ll continue to use the coord_flip()
layer function to reverse the variables mapped onto the x
and y aesthetics.
Within the mtcars dataset, car is the name
of the car and wt is its weight.
Create a scatter plot of wt versus car using the mtcars dataset. (We’ll flip the axes in the next step.)
# Plot of wt vs. car
ggplot(data = mtcars, aes(x = car, y = wt)) +
# Add a point layer
geom_point() +
labs(x = "car", y = "weight")
It would be easier to read if car was mapped to the
y axis. Flip the coordinates.
# Flip the axes to set car to the y axis
ggplot(mtcars, aes(car, wt)) +
geom_point() +
labs(x = "car", y = "weight") +
coord_flip()
Lecture Slides 32-39.
The coord_polar() function converts a planar
x-y Cartesian plot to polar coordinates. This can be useful
if you are producing pie charts.
We can imagine two forms for pie charts - the typical filled circle, or a colored ring.
Typical pie charts omit all of the non-data ink, which we saw in the themes chapter of the last course. Pie charts are not really better than stacked bar charts, but we’ll come back to this point in the next chapter.
A bar plot using mtcars of the number of cylinders (as a
factor), fcyl, is shown in the console.
Run the code to see the stacked bar plot.
Add (+) a polar coordinate system, mapping the angle
to the y variable by setting theta to
"y".
# Run the code, view the plot, then update it
ggplot(mtcars, aes(x = 1, fill = fcyl)) +
geom_bar() +
# Add a polar coordinate system
coord_polar(theta = "y")
Reduce the width of the bars to
0.1.
Make it a ring plot by adding a continuous x scale
with limits from 0.5 to
1.5.
ggplot(mtcars, aes(x = 1, fill = fcyl)) +
# Reduce the bar width to 0.1
geom_bar(width = 0.1) +
coord_polar(theta = "y") +
# Add a continuous x scale from 0.5 to 1.5
scale_x_continuous(limits = c(0.5, 1.5))
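With theta = "y", the angle of each segment is proportional to the height of its piece of the stacked bar, that is, to its count. A small base R sketch of that arithmetic, using the built-in mtcars (the variable names are illustrative only):

```r
# Number of cars per cylinder count
counts <- table(mtcars$cyl)

# Share of the full circle taken by each segment, in degrees
angles <- as.numeric(counts) / sum(counts) * 360
names(angles) <- names(counts)

angles
```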
Polar coordinate plots are well-suited to scales like compass direction or time of day. A popular example is the “wind rose”.
The wind dataset is taken from the openair
package and contains hourly measurements for wind-speed
(ws) and direction (wd) from London in 2003.
Both variables are factors.
The final form of the wind dataset used here was found online and loaded as a .csv file at the beginning of “Polar Coordinates”.
Make a classic bar plot mapping wd onto the
x aesthetic and ws onto
fill.
Use a geom_bar() layer, since we want to aggregate
over all date values, and set the width argument to
1, to eliminate any spaces between the bars.
# Using wind, plot wd filled by ws
ggplot(data = wind, aes(x = wd, fill = ws)) +
# Add a bar layer with width 1
geom_bar(width = 1)
Convert the plot to polar coordinates with coord_polar().
# Convert to polar coordinates:
ggplot(wind, aes(wd, fill = ws)) +
geom_bar(width = 1) +
coord_polar()
Set the start argument to -pi/16 to position North at the top of the plot.
# Convert to polar coordinates:
ggplot(wind, aes(wd, fill = ws)) +
geom_bar(width = 1) +
coord_polar(start = -pi/16)
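One way to read the value -pi/16: with 16 compass directions, each sector spans 2 * pi / 16 radians, and by default the first sector starts at the top and extends clockwise, so its center sits half a sector away from North. Rotating the plot back by half a sector centers the first bar on North. The sketch below uses a made-up data frame (toy_wind, one observation per direction) purely for illustration. It is not the wind data.

```r
library(ggplot2)

directions <- c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
                "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")

# Angular width of one sector, and the half-sector rotation
sector_width <- 2 * pi / length(directions)
start_offset <- -sector_width / 2

# Made-up data: one row per direction, for illustration only
toy_wind <- data.frame(wd = factor(directions, levels = directions))

rose <- ggplot(toy_wind, aes(wd)) +
  geom_bar(width = 1) +
  coord_polar(start = start_offset)
```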