Further Graphics and Relationships Between Variables

Chris Brunsdon

Further Graphics and ggplot

What is ggplot?

  • An alternative way of creating graphics in R
    • It has been around for about 10 years
  • The original (base R) graphics have been around since R began
    • Over 25 years
  • Easier for certain things
  • More consistent
    • Generally different kinds of plot specified in the same way
  • Ideas demonstrated below

An example data set

library(tidyverse)
buoy <- read_csv('weather_buoy_daily.csv')
head(buoy)
## # A tibble: 6 x 11
##   date       station_id AtmosphericPres… WindSpeed  Gust WaveHeight WavePeriod
##   <date>     <chr>                 <dbl>     <dbl> <dbl>      <dbl>      <dbl>
## 1 2001-02-06 M1                     976.     14.8   23.3         NA         NA
## 2 2001-02-07 M1                     992.     15.7   24.7         NA         NA
## 3 2001-02-08 M1                    1007.      8.69  15.3         NA         NA
## 4 2001-02-09 M1                    1009.     25.5   35.7         NA         NA
## 5 2001-02-10 M1                    1004.     19.4   27.9         NA         NA
## 6 2001-02-11 M1                    1012.     17.9   27.0         NA         NA
## # … with 4 more variables: AirTemperature <dbl>, DewPoint <dbl>,
## #   SeaTemperature <dbl>, RelativeHumidity <dbl>

Further Details

  • Real-time meteorological and oceanographic data collected from the Irish moored Weather Buoy network of stations.
  • Atmospheric Pressure (mbar): AtmosphericPressure
  • Air Temperature (°C): AirTemperature
  • Dew Point Temperature (°C): DewPoint
  • Wind Speed (knots): WindSpeed
  • Max Gust Wind Speed (knots): Gust
  • Sea Surface Temperature (°C): SeaTemperature
  • Wave Period (s): WavePeriod
  • Wave Height (m): WaveHeight
  • Relative Humidity (%): RelativeHumidity
  • License: https://creativecommons.org/licenses/by/4.0/
  • I have computed daily averages

First ggplot exploration

ggplot(buoy,aes(x=AirTemperature,y=AtmosphericPressure)) + geom_point()

What's going on?

  • ggplot(buoy,aes(...))
    • tells R that you want a plot using the data frame buoy
  • aes(x=...,y=...,...)
    • the aesthetics: which variables in the data connect to which visual characteristic
    • Here x is AirTemperature and y is AtmosphericPressure
  • geom_point()
    • adds a geometry - a mapping from the x and y aesthetics to a visual entity (here to points)
  • Note that + is used to build up the specification of the graphic, one component at a time.
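To see the "building up" idea in action, here is a minimal sketch using a tiny made-up data frame (not the buoy data; it only assumes ggplot2 is installed). The ggplot() call on its own is a complete plot object with no layers; each + returns a new object with one more component.

```r
library(ggplot2)

# Toy data, made up for illustration only (NOT the buoy data)
toy <- data.frame(AirTemperature = c(8.1, 10.4, 12.9, 15.2),
                  AtmosphericPressure = c(1002, 1010, 1015, 1019))

# Data + aesthetic mapping only: no geometry yet, so no layers
p <- ggplot(toy, aes(x = AirTemperature, y = AtmosphericPressure))

# '+' returns a new plot object with a point layer added
p_points <- p + geom_point()

length(p$layers)         # 0
length(p_points$layers)  # 1
p_points                 # printing the object is what actually draws it
```

Because p is left unchanged, the same base specification can be reused with different geometries.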

A different geometry

ggplot(buoy,aes(x=AirTemperature,y=AtmosphericPressure)) +
  geom_density_2d_filled()

Overlaying geometries

ggplot(buoy,aes(x=AirTemperature,y=AtmosphericPressure)) + geom_point(size=0.3)  +
  geom_density_2d_filled(alpha=0.5)

  • alpha sets transparency: 0 = invisible, 1 = solid

Designer vs. Painter Model

  • Painter
    • Code type: standard (base) R
    • Method: issue drawing commands directly
    • Example: plot(xvar,yvar) followed by abline(lm(yvar~xvar))
    • Approach: think about what to draw on the graph as a set of steps
  • Designer
    • Code type: ggplot
    • Method: specify the design of the graphic
    • Example: ggplot(dat,aes(x=xvar,y=yvar)) + geom_point() + geom_smooth(method='lm')
    • Approach: think about how you want variables to be represented in the graphic
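A runnable version of the two examples, as a sketch using R's built-in mtcars data set in place of the placeholder names dat, xvar and yvar (wt and mpg are stand-ins chosen only for illustration):

```r
library(ggplot2)

# Painter model (base R): each call draws onto the current device straight away
plot(mtcars$wt, mtcars$mpg)
abline(lm(mpg ~ wt, data = mtcars))

# Designer model (ggplot): build a description of the graphic first;
# nothing is drawn until the object is printed
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
print(p)
```

In the painter version the fitted line has to be computed by hand with lm(); in the designer version geom_smooth() fits it from the same aesthetics as the points.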

More examples

ggplot(buoy,aes(x=date,y=AirTemperature)) +
  geom_line()

  • Note: this includes readings from all buoys, joined into a single line.

More examples

ggplot(buoy,aes(x=date,y=AirTemperature, col=station_id)) +
  geom_point()
## Warning: Removed 169 rows containing missing values (geom_point).

Facet Approach

ggplot(buoy,aes(x=date,y=AirTemperature)) +
  geom_line() + facet_wrap(~station_id)

  • Gaps in the data (and the different time coverage of each buoy) have become visible
  • Note that the line geometry joins across gaps in the data with a straight line
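One way to stop the line bridging gaps is to add an explicit row, with a missing value, for every absent day. geom_line() breaks the line at an NA rather than joining across it. The sketch below uses tidyr::complete() on a small made-up data set (station names A and B are hypothetical). The same pattern would apply to buoy, though with the real data it adds many rows.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy daily data (made up): station "A" has no reading on 2001-01-03
toy <- tibble(
  station_id = c("A", "A", "A", "B", "B", "B", "B"),
  date = as.Date("2001-01-01") + c(0, 1, 3, 0, 1, 2, 3),
  AirTemperature = c(8, 9, 7, 10, 11, 12, 11)
)

# Within each station, add a row (with NA temperature) for every missing day
full <- toy %>%
  group_by(station_id) %>%
  complete(date = seq(min(date), max(date), by = "day")) %>%
  ungroup()

# The line now breaks at the NA instead of joining across the gap
ggplot(full, aes(x = date, y = AirTemperature)) +
  geom_line() +
  facet_wrap(~station_id)
```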

More detail about the gaps

ggplot(buoy,aes(x=date,y=station_id)) + geom_point()
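A numerical companion to this plot is a per-station summary of date coverage. The sketch below uses made-up data (hypothetical stations A and B). It compares the number of daily records with the number of days between the first and last date; a shortfall means there are gaps.

```r
library(dplyr)

# Toy data (made up): two stations with different date coverage
toy <- tibble(
  station_id = c("A", "A", "B", "B", "B"),
  date = as.Date(c("2001-01-01", "2001-01-05",
                   "2001-01-02", "2001-01-03", "2001-01-04"))
)

# One row per station: first/last date, records present, days expected
coverage <- toy %>%
  group_by(station_id) %>%
  summarise(first_date = min(date),
            last_date = max(date),
            n_days = n(),
            expected_days = as.numeric(last_date - first_date) + 1)
coverage
```

Here station A has 2 records over a 5-day span, while station B has no gaps.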

Graphics for Exploration

  • The previous graph is more useful for checking the data
  • It is not for analysis
  • … or for public display (like the R numbers at NPHET press conferences, for example)
  • Public vs. private analyses
  • Exploration to help you understand the data
  • Exploration to help you understand the process

Variable Types

  • Also note that the y aesthetic was categorical, not a number
  • … and the x aesthetic was a <date>, not an ordinary number
head(buoy)
## # A tibble: 6 x 11
##   date       station_id AtmosphericPres… WindSpeed  Gust WaveHeight WavePeriod
##   <date>     <chr>                 <dbl>     <dbl> <dbl>      <dbl>      <dbl>
## 1 2001-02-06 M1                     976.     14.8   23.3         NA         NA
## 2 2001-02-07 M1                     992.     15.7   24.7         NA         NA
## 3 2001-02-08 M1                    1007.      8.69  15.3         NA         NA
## 4 2001-02-09 M1                    1009.     25.5   35.7         NA         NA
## 5 2001-02-10 M1                    1004.     19.4   27.9         NA         NA
## 6 2001-02-11 M1                    1012.     17.9   27.0         NA         NA
## # … with 4 more variables: AirTemperature <dbl>, DewPoint <dbl>,
## #   SeaTemperature <dbl>, RelativeHumidity <dbl>
  • Column types shown
    • <chr> : Character (can be used for categorical data)
    • <dbl> : Numeric (short for double precision)
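The abbreviations in the printed tibble correspond to ordinary R classes. A small sketch with a made-up tibble, whose columns mirror three of the buoy columns, shows the link. It also shows that a date is stored internally as a double (days since 1970-01-01), with the Date class telling R and ggplot to treat it as a date.

```r
library(tibble)

# Toy tibble (made up) with the same three column types as the buoy data
toy <- tibble(
  date = as.Date(c("2001-02-06", "2001-02-07")),
  station_id = c("M1", "M1"),
  WindSpeed = c(14.8, 15.7)
)

# The R class behind each abbreviation: "Date", "character", "numeric"
sapply(toy, class)

# <dbl> is a double-precision number; a Date is also a double underneath
typeof(toy$WindSpeed)  # "double"
typeof(toy$date)       # "double"
```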

More geometries - boxplots

ggplot(buoy,aes(x=WindSpeed,y=station_id)) + geom_boxplot()
## Warning: Removed 2144 rows containing non-finite values (stat_boxplot).
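The warning counts rows whose WindSpeed is missing. A sketch on made-up data shows how to see where the missing values are. It also uses na.rm = TRUE, which drops them without the warning. This is a deliberate choice: the warning is a useful reminder when exploring, so only silence it once you know why values are missing.

```r
library(dplyr)
library(ggplot2)

# Toy data (made up) with some missing wind speeds
toy <- tibble(
  station_id = c("M1", "M1", "M1", "M2", "M2", "M2"),
  WindSpeed = c(14.8, NA, 8.7, 20.1, 18.3, NA)
)

# How many missing values each station contributes
miss <- toy %>%
  group_by(station_id) %>%
  summarise(n_missing = sum(is.na(WindSpeed)))
miss

# na.rm = TRUE removes the missing rows silently
p <- ggplot(toy, aes(x = WindSpeed, y = station_id)) +
  geom_boxplot(na.rm = TRUE)
p
```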

More Geometries - Trends (1) to (5)
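Trend lines in ggplot are usually added with geom_smooth(), as in the Designer example earlier. A minimal sketch on made-up, simulated data (not the buoy data) overlays a straight-line fit and a loess smoother on the same points.

```r
library(ggplot2)

set.seed(1)
# Toy data (simulated, made up): a noisy linear relationship
toy <- data.frame(x = 1:50)
toy$y <- 2 + 0.5 * toy$x + rnorm(50, sd = 3)

# Points plus two trend layers: linear fit (with confidence band) and loess
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, colour = "red") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
p
```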

Further Geometries and Faceting

ggplot(buoy,aes(x=AirTemperature,y=SeaTemperature)) +
  geom_hex() + facet_wrap(~station_id)

  • Divides the graph into hexagons and counts how many points fall in each
  • For this to work, install the hexbin package first
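If hexbin is not available, geom_bin2d() applies the same counting idea with rectangular bins and needs no extra package. The sketch uses simulated, made-up temperatures. It checks that the per-bin counts, which layer_data() exposes in the computed count column, add up to the number of points.

```r
library(ggplot2)

set.seed(42)
# Toy data (simulated, made up): 200 correlated air/sea temperature pairs
toy <- data.frame(AirTemperature = rnorm(200, mean = 11, sd = 3))
toy$SeaTemperature <- 0.6 * toy$AirTemperature + rnorm(200, mean = 4, sd = 1)

# Rectangular binning: count points per bin, shown as fill colour
p <- ggplot(toy, aes(x = AirTemperature, y = SeaTemperature)) +
  geom_bin2d(bins = 15)
p

# The bin counts together account for every point
sum(layer_data(p)$count)  # 200
```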

Other Design Artifacts

Scale

  • Scales give finer control over how data values are mapped to aesthetics
  • e.g. the fill colour in the previous example
ggplot(buoy,aes(x=AirTemperature,y=SeaTemperature)) +
  geom_hex() + facet_wrap(~station_id) + scale_fill_viridis_c()

Scale

  • Scales give finer control over how data values are mapped to aesthetics
  • e.g. reversing the x direction in the previous example
ggplot(buoy,aes(x=AirTemperature,y=SeaTemperature)) +
  geom_hex() + facet_wrap(~station_id) + scale_x_reverse()

Coordinates 1

  • Some overlap with scales, but
    • coordinates deal only with position, and refer to x and y together
ggplot(buoy,aes(x=AirTemperature,y=SeaTemperature)) +
  geom_hex() + facet_wrap(~station_id) + coord_equal()
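One practical difference between scales and coordinates shows up when setting axis limits. Limits on a scale turn out-of-range values into NA before any statistics are computed. Limits on the coordinate system only zoom the view. A sketch with made-up values (1 to 100 in one group) shows the effect on a boxplot's median.

```r
library(ggplot2)

# Toy data (made up): the values 1 to 100 in a single group
toy <- data.frame(g = "a", y = 1:100)

# Scale limits: values above 50 become NA and are dropped before the
# boxplot is computed (with a "non-finite values" warning)
p_scale <- ggplot(toy, aes(x = g, y = y)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(0, 50))

# Coordinate limits: the boxplot uses all 100 values; only the view changes
p_coord <- ggplot(toy, aes(x = g, y = y)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 50))

layer_data(p_scale)$middle  # 25.5 (median of 1..50)
layer_data(p_coord)$middle  # 50.5 (median of 1..100)
```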

Coordinates 2

  • Polar coordinates (i.e. angle/radius) are another coordinate system
library(lubridate)
# month(..., label=TRUE) returns an ordered factor with levels Jan to Dec
buoy <- buoy %>% mutate(Month=month(date,label=TRUE))
ggplot(buoy,aes(y=AirTemperature,x=Month,group=Month)) + geom_boxplot() + coord_polar()

Coordinates 3

  • Maybe reverse direction for temperature?
library(lubridate)
buoy <- buoy %>% mutate(Month=month(date,label=TRUE))
ggplot(buoy,aes(y=AirTemperature,x=Month,group=Month)) +
  geom_boxplot() + coord_polar() + scale_y_reverse()

Final adjustments - Labelling

  • Alter labelling on plots
ggplot(buoy,aes(y=AirTemperature,x=Month,group=Month)) +
  geom_boxplot() +  labs(x="Month of Year",y="Temperature (°C)",title="Air Temperature")

Final adjustments - Theme

  • The overall ‘look’ of the graphics
ggplot(buoy,aes(y=AirTemperature,x=Month,group=Month)) +
  geom_boxplot() +  
  labs(x="Month of Year",y="Temperature (°C)",title="Air Temperature") + theme_minimal()
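A complete theme such as theme_minimal() can be adjusted element by element with theme(), or set as the default for the whole session with theme_set(). A sketch on a made-up three-month data frame (the values are illustrative, not buoy data):

```r
library(ggplot2)

# Toy data (made up): three months with illustrative temperatures
toy <- data.frame(Month = factor(month.abb[1:3], levels = month.abb[1:3]),
                  AirTemperature = c(7.5, 7.8, 9.1))

# Start from a complete theme, then override individual elements
p <- ggplot(toy, aes(x = Month, y = AirTemperature)) +
  geom_col() +
  labs(x = "Month of Year", y = "Temperature (°C)", title = "Air Temperature") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold"),
        panel.grid.minor = element_blank())
p

# theme_set() changes the default for later plots and returns the old theme
old <- theme_set(theme_minimal())
theme_set(old)  # restore the previous default
```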

Add-ons - ggthemes

  • You need to install the ggthemes package first
library(ggthemes)
ggplot(buoy,aes(y=AirTemperature,x=Month,group=Month)) +
  geom_boxplot() +  
  labs(x="Month of Year",y="Temperature (°C)",title="Air Temperature") + theme_economist()

Final Observation - Data Types

  • Note that there are a number of different kinds of variable
buoy %>% select(date,station_id,WindSpeed,Month)
## # A tibble: 26,352 x 4
##    date       station_id WindSpeed Month
##    <date>     <chr>          <dbl> <ord>
##  1 2001-02-06 M1             14.8  Feb  
##  2 2001-02-07 M1             15.7  Feb  
##  3 2001-02-08 M1              8.69 Feb  
##  4 2001-02-09 M1             25.5  Feb  
##  5 2001-02-10 M1             19.4  Feb  
##  6 2001-02-11 M1             17.9  Feb  
##  7 2001-02-12 M1              9.00 Feb  
##  8 2001-02-13 M1             18.1  Feb  
##  9 2001-02-14 M1             20.5  Feb  
## 10 2001-02-15 M1              6.79 Feb  
## # … with 26,342 more rows

Some of the types

  • <dbl> : Ordinary numeric variable, e.g. speed, height
  • <date> : Date variable
  • <chr> : Character variable, often used as categorical, e.g. place name, station_id, colour
  • <ord> : Categorical variable with an implicit order (e.g. month, day)
  • ggplot considers variable types when choosing the format of a plot
  • Other types also exist, e.g. <dttm> (date + time)
  • Longer list here: https://tibble.tidyverse.org/articles/types.html
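The difference between <chr> and <ord> matters for the order of categories on an axis. Character values are placed in alphabetical order. An ordered factor follows its levels, which is why the Month column appears in calendar order. A base R sketch with made-up month values:

```r
# Month abbreviations in data order (made up)
m <- c("Mar", "Jan", "Feb")

# As plain character data, categories are sorted alphabetically
levels(factor(m))            # "Feb" "Jan" "Mar"

# As an ordered factor with calendar levels (month.abb is built into R)
m_ord <- factor(m, levels = month.abb, ordered = TRUE)
is.ordered(m_ord)            # TRUE
levels(droplevels(m_ord))    # "Jan" "Feb" "Mar"
m_ord[1] > m_ord[2]          # TRUE: "Mar" comes after "Jan"
```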

Conclusion

💡 New ideas

  • New general ideas
    • Designer model of graphics
    • Data types
  • New techniques
    • Using ggplot
    • aes
    • geom, scale, coord, labs, theme
  • Practical issues
    • ggplot syntax
    • Different geometries and their plot types
  • Next lecture - Time Series

  • This link may be useful for some of the ideas.