Chapter Description

The Coordinates layers offer specific and very useful tools for efficiently and accurately communicating data.
Here we’ll look at the various ways of effectively using these layers, so you can clearly visualize lognormal datasets, variables with units, and periodic data.

# The datasets airquality and sunspot.month were loaded from the built-in "datasets" package

# ggplot2 contains the msleep dataset
library(tidyverse)  
# Used the function index() from the zoo package to modify the sunspot.month dataset from the datasets package
library(zoo)        
# Used lubridate functions to append a date column to the airquality dataset
library(lubridate)
airquality <- airquality %>% 
  mutate(Date = make_date(1973, Month, Day))
str(airquality)
'data.frame':   153 obs. of  7 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Date   : Date, format: "1973-05-01" "1973-05-02" ...
mtcars <- read.csv("~/Desktop/R/Datacamp/Data Visualization/Datasets/mtcars.csv", stringsAsFactors=FALSE)
mtcars <- mtcars %>% 
  mutate(fam = as.factor(am), fcyl = as.factor(cyl), car = model, fvs = as.factor(vs))
str(mtcars)
'data.frame':   32 obs. of  16 variables:
 $ model: chr  "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ mpg  : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl  : int  6 6 4 6 8 6 8 4 4 6 ...
 $ disp : num  160 160 108 258 360 ...
 $ hp   : int  110 110 93 110 175 105 245 62 95 123 ...
 $ drat : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt   : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec : num  16.5 17 18.6 19.4 17 ...
 $ vs   : int  0 0 1 1 0 1 0 1 1 1 ...
 $ am   : int  1 1 1 0 0 0 0 0 0 0 ...
 $ gear : int  4 4 4 3 3 3 3 4 4 4 ...
 $ carb : int  4 4 1 1 2 1 4 2 2 4 ...
 $ fcyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
 $ fam  : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
 $ car  : chr  "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ fvs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
wind <- read_csv("~/Desktop/R/Datacamp/Data Visualization/Datasets/wind.csv", 
                 col_types = cols(
                   date = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
                   ws = col_factor(levels = c("0 - 2", "2 - 4", "4 - 6", "6 - 8", "8 - 10", "10 - 12", "12 - 14")),
                   wd = col_factor(levels = c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE", "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"))
                   ))
str(wind)
spec_tbl_df [8,753 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ date: POSIXct[1:8753], format: "2003-01-01 00:00:00" "2003-01-01 01:00:00" ...
 $ ws  : Factor w/ 7 levels "0 - 2","2 - 4",..: 3 3 2 3 3 3 3 3 2 2 ...
 $ wd  : Factor w/ 16 levels "N","NNE","NE",..: 8 7 7 7 7 7 8 8 8 8 ...
 - attr(*, "spec")=
  .. cols(
  ..   date = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
  ..   ws = col_factor(levels = c("0 - 2", "2 - 4", "4 - 6", "6 - 8", "8 - 10", "10 - 12", "12 - 14"
  ..     ), ordered = FALSE, include_na = FALSE),
  ..   wd = col_factor(levels = c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE", "S", "SSW", 
  ..     "SW", "WSW", "W", "WNW", "NW", "NNW"), ordered = FALSE, include_na = FALSE)
  .. )
 - attr(*, "problems")=<externalptr> 

Coordinates

Lecture Slides 1-12.

In the video, you saw different ways of using the coordinates layer to zoom in.

# ?coord_cartesian
# ?scale_x_continuous

Zooming In

In this exercise, we’ll compare zooming by changing scales and by changing coordinates.

The big difference is that scale limits discard out-of-range observations before any statistics are computed, which affects the calculations made by computed geoms (like histograms or smooth trend lines), whereas coordinate limits only change the visible region and leave the data untouched.

A scatter plot using mtcars with a LOESS smoothed trend line is provided. Take a look at this before updating it.

Exercise 1

  • Update the plot by adding (+) a continuous x scale with limits from 3 to 6.

    • Spoiler: this will cause a problem!
# Run the code, view the plot, then update it
ggplot(mtcars, aes(x = wt, y = hp, color = fam)) +
  geom_point() +
  geom_smooth() +
  # Add a continuous x scale from 3 to 6
  scale_x_continuous(limits = c(3, 6))
Warning: Removed 12 rows containing non-finite values (stat_smooth).
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : span too small. fewer data values than degrees of freedom.
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : at 3.168
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : radius 4e-06
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : all data on boundary of neighborhood. make span bigger
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : pseudoinverse used at 3.168
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : neighborhood radius 0.002
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : reciprocal condition number 1
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : at 3.572
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : radius 4e-06
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : all data on boundary of neighborhood. make span bigger
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : There are other near singularities as well. 4e-06
Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : zero-width neighborhood. make span bigger

Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
parametric, : zero-width neighborhood. make span bigger
Warning: Computation failed in `stat_smooth()`:
NA/NaN/Inf in foreign function call (arg 5)
Warning: Removed 12 rows containing missing values (geom_point).

Exercise 2

  • Update the plot by adding a Cartesian coordinate system with x limits, xlim, from 3 to 6.
ggplot(mtcars, aes(x = wt, y = hp, color = fam)) +
  geom_point() +
  geom_smooth() +
  # Add Cartesian coordinates with x limits from 3 to 6
  coord_cartesian(xlim = c(3, 6))

Concluding Remarks

  • Using the scale function to zoom in meant that there wasn’t enough data to calculate the trend line, and geom_smooth() failed.

  • When coord_cartesian() was applied, the full dataset was used for the trend calculation.
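
The difference can also be checked outside the plot. The following is a minimal sketch, not the course's code: it assumes the built-in mtcars data (rather than the CSV version loaded above) and swaps the LOESS default for a linear fit so that the two fits can be compared by their coefficients. The names outside, n_dropped, fit_scale and fit_coord are introduced here for illustration.

```r
library(ggplot2)

# Rows of the built-in mtcars that fall outside the x limits c(3, 6)
outside <- mtcars$wt < 3 | mtcars$wt > 6
n_dropped <- sum(outside)

base_plot <- ggplot(mtcars, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Scale limits: out-of-range rows are discarded before the fit is computed
p_scale <- base_plot + scale_x_continuous(limits = c(3, 6))

# Coordinate limits: the fit uses every row; only the view is cropped
p_coord <- base_plot + coord_cartesian(xlim = c(3, 6))

# The same two fits, computed by hand for comparison
fit_scale <- lm(hp ~ wt, data = mtcars[!outside, ])
fit_coord <- lm(hp ~ wt, data = mtcars)
coef(fit_scale)
coef(fit_coord)
```

Here n_dropped is 12, the same count reported in the "Removed 12 rows" warnings above.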



Aspect Ratio Part 1: 1 to 1 Ratios

We can set the aspect ratio of a plot with coord_fixed(), which uses ratio = 1 as a default. A 1:1 aspect ratio is most appropriate when two continuous variables are on the same scale, as with the iris dataset.

All variables are measured in centimeters, so it only makes sense that one unit on the plot should be the same physical distance on each axis. This gives a more truthful depiction of the relationship between the two variables, because a distorted aspect ratio changes the apparent angle of the smoothing lines and can give an erroneous impression of the data. Of course the underlying linear models don’t change, but our perception can be influenced by the angle drawn.

A plot using the iris dataset, of sepal width vs. sepal length colored by species, is shown in the viewer.

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE)

Exercise 1

  • Add a fixed coordinate layer to force a 1:1 aspect ratio.
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE) +
  
  # Fix the coordinate ratio at 1:1
  # (coord_fixed() alone is equivalent because ratio defaults to 1)
  coord_fixed(ratio = 1)

Concluding Remarks

  • A 1:1 aspect ratio is helpful when your axes show the same scales.

My Remarks:

  • coord_fixed() and coord_fixed(ratio = 1) give the same plot because ratio = 1 is the default. To see the ratio argument change the plot, try another value, such as ratio = 0.5.



Aspect Ratio Part 2: Setting Ratios

When values are not on the same scale it can be a bit tricky to set an appropriate aspect ratio. A classic William Cleveland (inventor of dot plots) example is the sunspots data set. We have 3200 observations from 1750 to 2016.

# The zoo package provides index(), used here to extract the time index of the series
sunspots.m <- data.frame(year = index(sunspot.month), 
                         value = reshape2::melt(sunspot.month)$value)

sun_plot (below) is a plot without any set aspect ratio. It fills up the graphics device.

# Automatically Loaded in datacamp.com BUT Background Code NOT Given
sun_plot <- ggplot(sunspots.m, aes(x = year, y = value)) +
  geom_line(color = "skyblue") #+
  # coord_fixed(ratio = 0.0555) # I added this ratio so that it
# sun_plot

To make aspect ratios clear, we’ve drawn an orange box that is 75 units high and 75 years wide.

Using a 1:1 aspect ratio would make the box square. That aspect ratio would make the oscillations harder to see: it is better to force a wider ratio.

Exercise 1

  • Fix the coordinates to a 1:1 aspect ratio.
# Fix the aspect ratio to 1:1
sun_plot +
  coord_fixed(ratio = 1)

Exercise 2

  • The y axis is now unreadable because it is too small. Make it bigger!

    • Change the aspect ratio to 20:1. This is the aspect ratio recommended by Cleveland to help make the trend among oscillations easiest to see.
# Change the aspect ratio to 20:1
sun_plot +
  coord_fixed(ratio = 20)

Concluding Remarks

  • Making a wide plot by calling coord_fixed() with a high ratio is often useful for long time series.

My Remarks: This section was confusing because the course did not provide the underlying code that produced the aspect ratios shown in its plots.
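
One way to reason about the ratio argument: coord_fixed() documents ratio as y / x, so one data unit on the y axis is drawn ratio times as long as one data unit on the x axis. The sketch below uses a hypothetical helper, panel_shape(), to estimate the panel's height divided by its width from the data ranges alone. It assumes the built-in sunspot.month series in its raw units, which may differ from the data behind the course's sun_plot.

```r
# Hypothetical helper: approximate panel height / width under coord_fixed(ratio)
panel_shape <- function(ratio, x, y) {
  ratio * diff(range(y, na.rm = TRUE)) / diff(range(x, na.rm = TRUE))
}

# Built-in monthly sunspot series: decimal years and sunspot counts
year  <- as.numeric(time(sunspot.month))
value <- as.numeric(sunspot.month)

panel_shape(1, year, value)       # shape with ratio = 1
panel_shape(20, year, value)      # 20 times taller than with ratio = 1
panel_shape(1 / 20, year, value)  # 20 times flatter than with ratio = 1
```

A result above 1 means a panel that is taller than it is wide, and a result below 1 means a panel that is wider than it is tall. Whether a given ratio produces a wide plot therefore depends on the units of the plotted data.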



Expand and Clip

The coord_*() layer functions offer two useful arguments that work well together: expand and clip.

  • expand sets a buffer margin around the plot, so data and axes don’t overlap.

    • Setting expand to 0 draws the axes to the limits of the data.
  • clip decides whether plot elements that would lie outside the plot panel are displayed or ignored (“clipped”).

When done properly this can make a great visual effect! We’ll use theme_classic() and modify the axis lines in this example.

Exercise 1

  • Add Cartesian coordinates with zero expansion, to remove all buffer margins on both the x and y axes.
ggplot(mtcars, aes(wt, mpg)) +
  geom_point(size = 2) +
  
  # Add Cartesian coordinates with zero expansion
  coord_cartesian(expand = 0) +
  
  theme_classic()

Exercise 2

  • Setting expand to 0 caused points at the edge of the plot panel to be cut off.

    • Set the clip argument to "off" to prevent this.

    • Remove the axis lines by setting the axis.line argument to element_blank() in the theme() layer function.

ggplot(mtcars, aes(wt, mpg)) +
  geom_point(size = 2) +
  
  # Turn clipping off
  coord_cartesian(expand = 0, clip = "off") +
  
  theme_classic() +
  
  # Remove axis lines
  theme(axis.line = element_blank())

Concluding Remarks

  • These arguments make clean and accurate plots by not cutting off data.
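
To see what the buffer amounts to, the sketch below reproduces the default padding by hand. It assumes ggplot2's default for continuous axes, a multiplicative expansion of 5% of the data range on each side, and uses the built-in mtcars data. padded_limits() is a hypothetical helper introduced here, not a ggplot2 function.

```r
# Hypothetical helper: axis limits after a multiplicative expansion
padded_limits <- function(x, mult = 0.05) {
  r <- range(x, na.rm = TRUE)
  r + c(-1, 1) * mult * diff(r)
}

# Limits when expansion is switched off: exactly the data range
range(mtcars$wt)

# Approximate limits with the default expansion
padded_limits(mtcars$wt)
```

With zero expansion the outermost points sit exactly on the panel edge, which is why they are cut in half unless clip is set to "off".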



Coordinates vs. Scales

Lecture Slides 13-22.

msleep
# A tibble: 83 × 11
   name         genus vore  order conse…¹ sleep…² sleep…³ sleep…⁴ awake  brainwt
   <chr>        <chr> <chr> <chr> <chr>     <dbl>   <dbl>   <dbl> <dbl>    <dbl>
 1 Cheetah      Acin… carni Carn… lc         12.1    NA    NA      11.9 NA      
 2 Owl monkey   Aotus omni  Prim… <NA>       17       1.8  NA       7    0.0155 
 3 Mountain be… Aplo… herbi Rode… nt         14.4     2.4  NA       9.6 NA      
 4 Greater sho… Blar… omni  Sori… lc         14.9     2.3   0.133   9.1  0.00029
 5 Cow          Bos   herbi Arti… domest…     4       0.7   0.667  20    0.423  
 6 Three-toed … Brad… herbi Pilo… <NA>       14.4     2.2   0.767   9.6 NA      
 7 Northern fu… Call… carni Carn… vu          8.7     1.4   0.383  15.3 NA      
 8 Vesper mouse Calo… <NA>  Rode… <NA>        7      NA    NA      17   NA      
 9 Dog          Canis carni Carn… domest…    10.1     2.9   0.333  13.9  0.07   
10 Roe deer     Capr… herbi Arti… lc          3      NA    NA      21    0.0982 
# … with 73 more rows, 1 more variable: bodywt <dbl>, and abbreviated variable
#   names ¹​conservation, ²​sleep_total, ³​sleep_rem, ⁴​sleep_cycle
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Log-Transforming Scales

Using scale_y_log10() and scale_x_log10() is equivalent to transforming our actual dataset before getting to ggplot2.

Using coord_trans(), setting x = "log10" and/or y = "log10" arguments, transforms the data after statistics have been calculated. The plot will look the same as with using scale_*_log10(), but the scales will be different, meaning that we’ll see the original values on our log10 transformed axes. This can be useful since log scales are not always intuitive.

Let’s see this in action with positively skewed data - the brain and body weight of 51 mammals from the msleep dataset.

Exercise 1

  • Using the msleep dataset, plot the raw values of brainwt against bodywt values as a scatter plot.
# Produce a scatter plot of brainwt vs. bodywt
ggplot(data = msleep, aes(x = bodywt, y = brainwt)) +
  geom_point(na.rm = TRUE) +
  ggtitle("Raw Values")

Exercise 2

  • Add the scale_x_log10() and scale_y_log10() layers with default values to transform the data before plotting.
# Add scale_x_log10() and scale_y_log10() functions
ggplot(msleep, aes(bodywt, brainwt)) +
  geom_point(na.rm = TRUE) +
  scale_x_log10() +
  scale_y_log10() +
  ggtitle("Scale functions")

Exercise 3

  • Use coord_trans() to apply a "log10" transformation to both the x and y scales.
# Perform a log10 coordinate system transformation
ggplot(msleep, aes(bodywt, brainwt)) +
  geom_point(na.rm = TRUE) +
  coord_trans(x = "log10", y = "log10")

Concluding Remarks

  • Each transformation method has implications for the plot’s interpretability.

  • Think about your audience when choosing a method for applying transformations.



Adding Stats to Transformed Scales

In the last exercise, we saw the usefulness of the coord_trans() function, but be careful! Remember that statistics are calculated on the untransformed data. A linear model may end up looking not-so-linear after an axis transformation. Let’s revisit the two plots from the previous exercise and compare their linear models.

Exercise 1

  • Add log10 transformed scales to the x and y axes.
# Plot with a scale_*_*() function:
ggplot(msleep, aes(bodywt, brainwt)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = "lm", se = FALSE, na.rm = TRUE) +
  # Add a log10 x scale
  scale_x_log10() +
  # Add a log10 y scale
  scale_y_log10() +
  ggtitle("Scale functions")

Exercise 2

  • Add a log10 coordinate transformation for both the x and y axes.

  • Do you notice the difference between the two plots?

# Plot with transformed coordinates
ggplot(msleep, aes(bodywt, brainwt)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = "lm", se = FALSE, na.rm = TRUE) +
  # Add a log10 coordinate transformation for x and y axes
  coord_trans(x = "log10", y = "log10")           # Applied after the lm fit, so the fitted line is drawn curved

Concluding Remarks

  • The smooth trend line is calculated after scale transformations but not coordinate transformations, so the second plot doesn’t make sense.

  • Be careful when using the coord_trans() function!
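
The two fits can be reproduced by hand. This is a sketch of the model each plot's geom_smooth(method = "lm") layer ends up fitting, assuming rows with a missing weight are dropped first. The names d, fit_scale and fit_coord are introduced here for illustration.

```r
library(ggplot2)  # provides the msleep dataset

# Keep only rows where both weights are present
d <- as.data.frame(msleep[, c("bodywt", "brainwt")])
d <- d[complete.cases(d), ]

# scale_*_log10(): the data are log-transformed first, then the line is fitted
fit_scale <- lm(log10(brainwt) ~ log10(bodywt), data = d)

# coord_trans(): the line is fitted to the raw values, then the axes are transformed
fit_coord <- lm(brainwt ~ bodywt, data = d)

coef(fit_scale)
coef(fit_coord)
```

A line that is straight in raw units is no longer straight once it is drawn on log10 axes, which is why the trend line in the second plot bends.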



Double and Flipped Axes

Lecture Slides 23-31.


Useful Double Axes

Double x and y-axes are a contentious topic in data visualization. We’ll revisit that discussion at the end of Chapter 4. Here, I want to review a great use case where double axes actually do add value to a plot.

Our goal plot is displayed in the viewer. The two axes are the raw temperature values on a Fahrenheit scale and the transformed values on a Celsius scale.

You can imagine a similar scenario for Log-transformed and original values, miles and kilometers, or pounds and kilograms. A scale that is not intuitive for many people can be made easier by adding a transformation as a double axis.

Exercise 1

  • Begin with a standard line plot, of Temp described by Date in the airquality dataset.
# Using airquality, plot Temp vs. Date
ggplot(data = airquality, aes(x = Date, y = Temp)) +
  # Add a line layer
  geom_line() +
  labs(x = "Date (1973)", y = "Fahrenheit")

Exercise 2

  • Convert y_breaks from Fahrenheit to Celsius (subtract 32, then multiply by 5, then divide by 9).

  • Define the secondary y-axis using sec_axis().

    • Use the identity transformation.

    • Set the breaks and labels to the defined objects y_breaks and y_labels, respectively.

# Define breaks (Fahrenheit)
y_breaks <- c(59, 68, 77, 86, 95, 104)

# Convert y_breaks from Fahrenheit to Celsius
y_labels <- (y_breaks - 32) * 5 / 9

# Create a secondary y-axis
secondary_y_axis <- sec_axis(
  # Use identity transformation
  trans = identity,
  name = "Celsius",
  # Define breaks and labels as above
  breaks = y_breaks,
  labels = y_labels
)

Exercise 3

  • Add your secondary y-axis to the sec.axis argument of scale_y_continuous().
# Update the plot
ggplot(airquality, aes(Date, Temp)) +
  geom_line() +
  # Add the secondary y-axis 
  scale_y_continuous(sec.axis = secondary_y_axis) +
  labs(x = "Date (1973)", y = "Fahrenheit")

Concluding Remarks

  • Double axes are most useful when you want to display the same value in two different units.
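
An alternative to supplying breaks and labels by hand is to let the secondary axis compute the conversion itself. The following is a sketch, not the course's solution: it assumes the built-in airquality data and passes a one-sided formula to sec_axis() as its first argument. f_to_c() and aq are names introduced here for illustration.

```r
library(ggplot2)

# Fahrenheit to Celsius: subtract 32, multiply by 5, divide by 9
f_to_c <- function(f) (f - 32) * 5 / 9

y_breaks <- c(59, 68, 77, 86, 95, 104)
f_to_c(y_breaks)  # 15 20 25 30 35 40

# Build a Date column from the built-in Month and Day columns (measured in 1973)
aq <- transform(airquality, Date = as.Date(ISOdate(1973, Month, Day)))

p <- ggplot(aq, aes(Date, Temp)) +
  geom_line() +
  scale_y_continuous(
    name = "Fahrenheit",
    # The formula converts every primary-axis value to Celsius
    sec.axis = sec_axis(~ (. - 32) * 5 / 9, name = "Celsius")
  )
```

With this approach the Celsius axis chooses its own breaks, so its tick marks will not necessarily line up with the Fahrenheit ticks. The identity transformation with explicit breaks and labels, as in the exercise, keeps both sets of ticks aligned.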



Flipping Axes Part 1

Flipping axes means to reverse the variables mapped onto the x and y aesthetics. We can just change the mappings in aes(), but we can also use the coord_flip() layer function.

There are two reasons to use this function:

  1. We want a vertical geom to be horizontal, or

  2. We’ve completed a long series of plotting functions and want to flip it without having to rewrite all our commands.

Exercise 1

  • Create a side-by-side (dodged) bar chart of fcyl, filled according to fam.
# Plot fcyl bars, filled by fam
ggplot(data = mtcars, aes(x = fcyl, fill = fam)) +
  # Place bars side by side
  geom_bar(position = "dodge")

Exercise 2

  • To get horizontal bars, add a coord_flip() function.
ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar(position = "dodge") +
  # Flip the x and y coordinates
  coord_flip()

Exercise 3

  • Partially overlapping bars are popular with “infoviz” in magazines.

  • Update the position argument to use position_dodge() with a width of 0.5.

ggplot(mtcars, aes(fcyl, fill = fam)) +
  # Set a dodge width of 0.5 for partially overlapping bars
  geom_bar(position = position_dodge(width = 0.5)) +
  coord_flip()

Concluding Remarks

  • Horizontal bars are especially useful when the axis labels are long.
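
As noted at the start of this exercise, changing the mappings in aes() is the other route to horizontal bars. The following is a sketch under two assumptions: a ggplot2 version in which geom_bar() infers its orientation from a y-only mapping (3.3.0 or later), and the built-in mtcars data with the factors created inline. The names mt, p_horizontal and bars are introduced here for illustration.

```r
library(ggplot2)

# Recreate the factor columns used in the exercises from the built-in data
mt <- transform(mtcars, fcyl = factor(cyl), fam = factor(am))

# Mapping fcyl to y gives horizontal bars without coord_flip()
p_horizontal <- ggplot(mt, aes(y = fcyl, fill = fam)) +
  geom_bar(position = "dodge")

# The computed layer data holds one bar per fcyl/fam combination
bars <- layer_data(p_horizontal, 1)
nrow(bars)
```

coord_flip() remains convenient when a long plotting pipeline already exists and only the orientation needs to change.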



Flipping Axes Part 2

In this exercise, we’ll continue to use the coord_flip() layer function to reverse the variables mapped onto the x and y aesthetics.

Within the mtcars dataset, car is the name of the car and wt is its weight.

Exercise 1

  • Create a scatter plot of wt versus car using the mtcars dataset. (We’ll flip the axes in the next step.)
# Plot of wt vs. car
ggplot(data = mtcars, aes(x = car, y = wt)) +
  # Add a point layer
  geom_point() +
  labs(x = "car", y = "weight")

Exercise 2

  • It would be easier to read if car was mapped to the y axis. Flip the coordinates.

    • Notice that the labels also get flipped!
# Flip the axes to set car to the y axis
ggplot(mtcars, aes(car, wt)) +
  geom_point() +
  labs(x = "car", y = "weight") +
  coord_flip()

Concluding Remarks

  • Notice how much more interpretable the plot is after flipping the axes.



Polar Coordinates

Lecture Slides 32-39.


Pie Charts

The coord_polar() function converts a planar x-y Cartesian plot to polar coordinates. This can be useful if you are producing pie charts.

We can imagine two forms for pie charts - the typical filled circle, or a colored ring.

Typical pie charts omit all of the non-data ink, which we saw in the themes chapter of the last course. Pie charts are not really better than stacked bar charts, but we’ll come back to this point in the next chapter.

A bar plot using mtcars of the number of cylinders (as a factor), fcyl, is shown in the console.

Exercise 1

  • Run the code to see the stacked bar plot.

  • Add (+) a polar coordinate system, mapping the angle to the y variable by setting theta to "y".

# Run the code, view the plot, then update it
ggplot(mtcars, aes(x = 1, fill = fcyl)) +
  geom_bar() +

  # Add a polar coordinate system
  coord_polar(theta = "y")

Exercise 2

  • Reduce the width of the bars to 0.1.

  • Make it a ring plot by adding a continuous x scale with limits from 0.5 to 1.5.

ggplot(mtcars, aes(x = 1, fill = fcyl)) +
  # Reduce the bar width to 0.1
  geom_bar(width = 0.1) +
  coord_polar(theta = "y") +
  # Add a continuous x scale from 0.5 to 1.5
  scale_x_continuous(limits = c(0.5, 1.5))

Concluding Remarks

  • Polar coordinates are particularly useful if you are dealing with a cycle, like yearly data, that you would like to see represented as such.
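
With theta = "y", each stacked segment's height becomes its share of the full circle. The arithmetic can be sketched in base R, assuming the built-in mtcars data; counts, share and degrees are names introduced here for illustration.

```r
# Number of cars per cylinder count (the segment heights in the stacked bar)
counts <- table(mtcars$cyl)

# Share of the circle taken by each slice, as a proportion and in degrees
share   <- counts / sum(counts)
degrees <- 360 * share

counts
round(degrees, 1)
```

The slice angles always add up to 360 degrees, whatever the counts are, which is what makes the stacked bar wrap into a closed circle.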



Wind Rose Plots

Polar coordinate plots are well-suited to scales like compass direction or time of day. A popular example is the “wind rose”.

The wind dataset is taken from the openair package and contains hourly measurements for wind-speed (ws) and direction (wd) from London in 2003. Both variables are factors.

Note: the final form of the wind dataset used here was found online and is read from a .csv file with the read_csv() call at the start of these notes.

Exercise 1

  • Make a classic bar plot mapping wd onto the x aesthetic and ws onto fill.

  • Use a geom_bar() layer, since we want to aggregate over all date values, and set the width argument to 1, to eliminate any spaces between the bars.

# Using wind, plot wd filled by ws
ggplot(data = wind, aes(x = wd, fill = ws)) +
  # Add a bar layer with width 1
  geom_bar(width = 1)

Exercise 2

  • Convert the Cartesian coordinate space into a polar coordinate space with coord_polar().
# Convert to polar coordinates:
ggplot(wind, aes(wd, fill = ws)) +
  geom_bar(width = 1) +
  coord_polar()

Exercise 3

  • Set the start argument to -pi/16 to position North at the top of the plot.
# Convert to polar coordinates and rotate so that North is at the top:
ggplot(wind, aes(wd, fill = ws)) +
  geom_bar(width = 1) +
  coord_polar(start = -pi/16)

Concluding Remarks

  • Polar coordinate plots are not common, but they are really useful for cyclic scales such as compass direction.
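
The value -pi/16 follows from the number of compass directions. A short check, assuming 16 equally wide bars as in the wd factor above; n_directions, bar_angle and start are names introduced here for illustration.

```r
n_directions <- 16

# Each bar spans an equal share of the full circle (2 * pi radians)
bar_angle <- 2 * pi / n_directions

# By default the first bar ("N") begins at 12 o'clock and extends clockwise,
# so rotating back by half a bar centres it on the top of the plot
start <- -bar_angle / 2

all.equal(start, -pi / 16)
```

The same reasoning gives the offset for any number of equally spaced categories: minus pi divided by the number of categories.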