Scatterplots and Accessibility

Author

Rachel Saidi

Published

February 5, 2022

Load dataset mpg and explore accessibility in R

Some data frames come bundled with R packages; mpg, for example, ships with ggplot2, which is loaded as part of the tidyverse. Load the data, then use str and head to look at it.

The chunk labeled {r mpg} loads the data. Alternatively, once ggplot2 is loaded, you can use the command: data(mpg)

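You will look at the data using the commands “str” (gives the structure of the data), “head” (lists the first 6 rows of the dataset), and “describe” from the “psych” package (gives quite detailed summary statistics on the continuous variables). In the “describe” output, variables marked with an asterisk are categorical variables that were converted to numeric codes, so their summary statistics are not meaningful.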

# install.packages("psych")
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.1.3
-- Attaching packages --------------------------------------- tidyverse 1.3.2 --
v ggplot2 3.3.6      v purrr   0.3.4 
v tibble  3.1.8      v dplyr   1.0.10
v tidyr   1.2.1      v stringr 1.4.1 
v readr   2.1.3      v forcats 0.5.2 
Warning: package 'ggplot2' was built under R version 4.1.3
Warning: package 'tibble' was built under R version 4.1.3
Warning: package 'tidyr' was built under R version 4.1.3
Warning: package 'readr' was built under R version 4.1.3
Warning: package 'dplyr' was built under R version 4.1.3
Warning: package 'stringr' was built under R version 4.1.3
Warning: package 'forcats' was built under R version 4.1.3
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
library(psych)
Warning: package 'psych' was built under R version 4.1.3

Attaching package: 'psych'

The following objects are masked from 'package:ggplot2':

    %+%, alpha
str(mpg)
tibble [234 x 11] (S3: tbl_df/tbl/data.frame)
 $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
 $ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
 $ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
 $ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
 $ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
 $ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
 $ drv         : chr [1:234] "f" "f" "f" "f" ...
 $ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
 $ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
 $ fl          : chr [1:234] "p" "p" "p" "p" ...
 $ class       : chr [1:234] "compact" "compact" "compact" "compact" ...
head(mpg)
# A tibble: 6 x 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
  <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa~
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa~
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa~
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa~
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa~
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa~
describe(mpg)
              vars   n    mean    sd median trimmed   mad    min  max range
manufacturer*    1 234    7.76  5.13    6.0    7.68  5.93    1.0   15  14.0
model*           2 234   19.09 11.15   18.5   18.98 14.08    1.0   38  37.0
displ            3 234    3.47  1.29    3.3    3.39  1.33    1.6    7   5.4
year             4 234 2003.50  4.51 2003.5 2003.50  6.67 1999.0 2008   9.0
cyl              5 234    5.89  1.61    6.0    5.86  2.97    4.0    8   4.0
trans*           6 234    5.65  2.88    4.0    5.53  1.48    1.0   10   9.0
drv*             7 234    1.67  0.66    2.0    1.59  1.48    1.0    3   2.0
cty              8 234   16.86  4.26   17.0   16.61  4.45    9.0   35  26.0
hwy              9 234   23.44  5.95   24.0   23.23  7.41   12.0   44  32.0
fl*             10 234    4.63  0.70    5.0    4.77  0.00    1.0    5   4.0
class*          11 234    4.59  1.99    5.0    4.64  2.97    1.0    7   6.0
               skew kurtosis   se
manufacturer*  0.21    -1.63 0.34
model*         0.11    -1.23 0.73
displ          0.44    -0.91 0.08
year           0.00    -2.01 0.29
cyl            0.11    -1.46 0.11
trans*         0.29    -1.65 0.19
drv*           0.48    -0.76 0.04
cty            0.79     1.43 0.28
hwy            0.36     0.14 0.39
fl*           -2.25     5.76 0.05
class*        -0.14    -1.52 0.13

“glimpse” is another useful way to look at the data

glimpse(mpg)
Rows: 234
Columns: 11
$ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "~
$ model        <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "~
$ displ        <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.~
$ year         <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 200~
$ cyl          <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, ~
$ trans        <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto~
$ drv          <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "4~
$ cty          <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 1~
$ hwy          <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 2~
$ fl           <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p~
$ class        <chr> "compact", "compact", "compact", "compact", "compact", "c~

It is essential to recognize the type of each variable. Numeric variables appear as int (integer), num (numeric), or dbl (double), while categorical variables appear as chr (character) or fct (factor).

Typically, chr or factor are used for discrete variables and int, dbl, or num for continuous variables.
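To make the distinction concrete, the short sketch below lists the class of each column in mpg and converts drv from character to factor. The object name mpg_factor and the level labels are illustrative choices, not part of the original lesson.

```r
library(ggplot2)  # provides the mpg data frame

# Class of every column: "character", "integer", or "numeric"
sapply(mpg, class)

# Convert the drive-type codes to a labeled factor
# (mpg_factor and the labels are illustrative names)
mpg_factor <- mpg
mpg_factor$drv <- factor(
  mpg_factor$drv,
  levels = c("4", "f", "r"),
  labels = c("4-wheel", "front-wheel", "rear-wheel")
)

# Inspect the categories and how many vehicles fall in each
levels(mpg_factor$drv)
table(mpg_factor$drv)
```

Note that an int column such as cyl takes only a few distinct values, so it can reasonably be treated as discrete even though it is stored as a number.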

Now make a scatterplot using ggplot2

Make a scatterplot of city vs highway miles per gallon, and color the points by drive type: 4-wheel, front-wheel, or rear-wheel drive.

Here is how we will code:

  1. name the plot: “plot1” <-

  2. call the dataset “mpg” by name and “pipe it” (more on that later) into the function that creates the frame for your plot

  3. call “ggplot” to make a set of axes, with the aesthetics (aes) for city and highway mpg, but color points by the factors for drv

  4. add geom_point to see the points

  5. call plot1 to see the entire plot

You can improve accessibility by adding a caption (fig.cap) and alt text (fig.alt) in the R chunk header. This is what the code looks like in RStudio:

{r fig.cap = "Figure X: City MPG vs Highway MPG", fig.alt = "A scatterplot showing a positive, relatively strong relationship between city and highway miles per gallon. The points representing each of the three vehicle drive modes (4-wheel drive, front-wheel drive, and rear-wheel drive) are clustered, with rear-wheel drive at the lower end, 4-wheel drive also lower, and front-wheel drive at the upper end."}

plot1 <- mpg %>% ggplot(aes(cty, hwy, color = drv))+ 
  geom_point()
plot1

A scatterplot showing a positive, relatively strong relationship between city and highway miles per gallon. The points representing each of the three vehicle drive modes (4-wheel drive, front-wheel drive, and rear-wheel drive) are clustered, with rear-wheel drive at the lower end, 4-wheel drive also lower, and front-wheel drive at the upper end.

Figure X: City MPG vs Highway MPG

Notice that the blue points for rear-wheel drive are only at the lower left side of the plot (i.e., not great mpg). Red points for 4-wheel drive have a wider spread of points, but they are also mainly at the lower left corner of the plot. The green points for front-wheel drive are mostly at the upper right, for the higher mpg.
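The visual impression can be checked numerically. The sketch below, which is an addition to the lesson rather than part of it, computes the correlation between city and highway mpg and the average mileage for each drive type. The names cty_hwy_cor and mpg_by_drv are illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data frame

# Strength of the linear relationship between city and highway mpg
cty_hwy_cor <- cor(mpg$cty, mpg$hwy)
cty_hwy_cor

# Average mileage and number of vehicles for each drive type
mpg_by_drv <- mpg %>%
  group_by(drv) %>%
  summarise(
    mean_cty = mean(cty),
    mean_hwy = mean(hwy),
    n = n()
  )
mpg_by_drv
```

A correlation close to 1 supports the description of a strong positive relationship, and the group means show which drive types sit at the lower and upper ends of the plot.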

Add a title and labels

Although there are already axis labels, we can do better. We should also add a title.

plot2 <- mpg %>% ggplot(aes(cty, hwy, color = drv))+ 
  geom_point()+
  xlab("City miles per gallon") +
  ylab("Highway miles per gallon") +
  ggtitle("Scatterplot of City versus Highway Miles per Gallon")
plot2

A scatterplot showing a positive, relatively strong relationship between city and highway miles per gallon. The points representing each of the three vehicle drive modes (4-wheel drive, front-wheel drive, and rear-wheel drive) are clustered, with rear-wheel drive at the lower end, 4-wheel drive also lower, and front-wheel drive at the upper end.

Figure X: City MPG vs Highway MPG
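As an alternative sketch, the same labels can be set in a single labs() call, which also lets you replace the legend title "drv" with something more readable. The legend title, the spelled-out legend labels, and the name plot2_labs are illustrative choices that do not appear in the original lesson.

```r
library(ggplot2)

plot2_labs <- ggplot(mpg, aes(cty, hwy, color = drv)) +
  geom_point() +
  # labs() sets axis labels, legend title, and plot title in one place
  labs(
    x = "City miles per gallon",
    y = "Highway miles per gallon",
    color = "Drive type",
    title = "Scatterplot of City versus Highway Miles per Gallon"
  ) +
  # Spell out the drive-type codes in the legend
  scale_color_discrete(
    labels = c("4" = "4-wheel", "f" = "front-wheel", "r" = "rear-wheel")
  )
plot2_labs
```

Readable legend text helps every reader, and it matters most for people using magnification or reading the plot out of context.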

Install the colorblindr package to improve visibility for colorblind audiences

#remotes::install_github("clauswilke/colorblindr",  force = TRUE)
plot3 <- mpg %>% ggplot(aes(cty, hwy, color = drv))+ 
  geom_point()+
  xlab("City miles per gallon") +
  ylab("Highway miles per gallon") +
  ggtitle("Scatterplot of City versus Highway Miles per Gallon") +
  colorblindr::scale_color_OkabeIto()   # these are good colorblind colors
plot3

A scatterplot showing a positive, relatively strong relationship between city and highway miles per gallon. The points representing each of the three vehicle drive modes (4-wheel drive, front-wheel drive, and rear-wheel drive) are clustered, with rear-wheel drive at the lower end, 4-wheel drive also lower, and front-wheel drive at the upper end.

Figure X: City MPG vs Highway MPG
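If installing colorblindr from GitHub is not an option, a similar result can be sketched with base R alone. This assumes R 4.0.0 or later, where palette.colors() includes the Okabe-Ito palette. The names okabe_ito and plot3_base are illustrative, and mapping drv to shape as well as color is an addition that is not in the original plot.

```r
library(ggplot2)

# Okabe-Ito colors from base R (assumes R >= 4.0.0).
# Drop the names so ggplot2 assigns colors by position,
# and skip the first color (black) by taking entries 2 to 4.
okabe_ito <- unname(palette.colors(palette = "Okabe-Ito"))[2:4]

plot3_base <- ggplot(mpg, aes(cty, hwy, color = drv, shape = drv)) +
  geom_point() +
  xlab("City miles per gallon") +
  ylab("Highway miles per gallon") +
  ggtitle("Scatterplot of City versus Highway Miles per Gallon") +
  scale_color_manual(values = okabe_ito)
plot3_base
```

Using shape together with color gives readers a second cue for drive type, so the groups remain distinguishable even when the colors are hard to tell apart or the figure is printed in grayscale.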

Using palmerpenguins library to explore accessibility

library(palmerpenguins)
Warning: package 'palmerpenguins' was built under R version 4.1.3

Scatterplot with alt tags for screen readers

Combine fig.cap for the Figure label and fig.alt for the alt text

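Go back through the chunks presented above and look for fig.cap and fig.alt. These are chunk options: fig.cap sets the visible figure caption, and fig.alt sets the alt text that screen readers announce, which improves accessibility in your document. The colors darkorange, purple, and cyan4 are easier for colorblind readers to tell apart, and mapping species to point shape as well gives a second cue that does not depend on color.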

ggplot(data = penguins, aes(x = flipper_length_mm,
                            y = bill_length_mm,
                            color = species)) +
  geom_point(aes(shape = species), alpha = 0.8) +
  scale_color_manual(values = c("darkorange","purple","cyan4")) 
Warning: Removed 2 rows containing missing values (geom_point).

Scatterplot of flipper length by bill length of 3 penguin species, where we show penguins with bigger flippers have bigger bills.

Bigger flippers, bigger bills
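For reference, here is a sketch of how the caption and alt text shown above might be attached to this chunk. It assumes a recent version of knitr (or Quarto with the knitr engine), where chunk options can be written as "#|" comment lines at the top of the chunk instead of inside the {r ...} header. The axis labels and the name penguin_plot are illustrative additions.

```r
#| fig.cap: "Bigger flippers, bigger bills"
#| fig.alt: "Scatterplot of flipper length by bill length of 3 penguin species, where we show penguins with bigger flippers have bigger bills."

library(ggplot2)
library(palmerpenguins)  # provides the penguins data frame

penguin_plot <- ggplot(data = penguins, aes(x = flipper_length_mm,
                                            y = bill_length_mm,
                                            color = species)) +
  # Shape repeats the species information so color is not the only cue
  geom_point(aes(shape = species), alpha = 0.8) +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(x = "Flipper length (mm)", y = "Bill length (mm)")
penguin_plot
```

Because the "#|" lines are ordinary comments to R, the chunk still runs unchanged when pasted into the console. The caption and alt text only take effect when the document is rendered.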