Advanced R and Data Analysis

Overview


  • More on time series
  • Making interactive tools
  • Some new spatial techniques
  • Reproducibility

RStudio

Looking at the rainfall data

💧 Main working data set is rainfall.RData


📥 Get it from Moodle, in the section headed Chris Brunsdon’s Lectures.
load('rainfall.RData')
head(stations)
## # A tibble: 6 x 9
##   Station Elevation Easting Northing   Lat  Long County Abbreviation Source
##   <chr>       <int>   <dbl>    <dbl> <dbl> <dbl> <chr>  <chr>        <chr> 
## 1 Athboy         87  270400   261700  53.6 -6.93 Meath  AB           Met E…
## 2 Foulks…        71  284100   118400  52.3 -6.77 Wexfo… F            Met E…
## 3 Mullin…       112  241780   247765  53.5 -7.37 Westm… M            Met E…
## 4 Portlaw         8  246600   115200  52.3 -7.31 Water… P            Met E…
## 5 Rathdr…       131  319700   186000  52.9 -6.22 Wickl… RD           Met E…
## 6 Stroke…        49  194500   279100  53.8 -8.1  Rosco… S            Met E…

The file packages all of the rainfall information: stations describes each station, and rain holds the monthly rainfall values.


head(rain)
## # A tibble: 6 x 4
##    Year Month Rainfall Station
##   <dbl> <fct>    <dbl> <chr>  
## 1  1850 Jan      169   Ardara 
## 2  1851 Jan      236.  Ardara 
## 3  1852 Jan      250.  Ardara 
## 4  1853 Jan      209.  Ardara 
## 5  1854 Jan      188.  Ardara 
## 6  1855 Jan       32.3 Ardara

  • It’s been converted to an R binary data file for convenience.
    • There are no missing values.
    • Data runs from 1850 to 2014 by month.
    • 👍🏻 Thanks to Conor Murphy and Simon Noone for supplying the data.
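
The points in the list above are easy to check for yourself. The sketch below uses a small made-up tibble, toy_rain, with the same four columns as rain (the name and all the values are invented for illustration). With the real data loaded you would use rain in place of toy_rain.

```r
library(dplyr)

# toy_rain is a hypothetical stand-in for rain: same columns, invented values
toy_rain <- tibble(
  Year     = rep(c(1850, 1851), each = 2),
  Month    = factor(rep(c("Jan", "Feb"), times = 2), levels = month.abb),
  Rainfall = c(169.0, 120.5, 236.4, 98.2),
  Station  = "Ardara"
)

# Count missing values in the Rainfall column (0 means none are missing)
n_missing <- sum(is.na(toy_rain$Rainfall))

# First and last year covered by the data
year_range <- range(toy_rain$Year)

n_missing
year_range
```
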

Introduction to the dplyr package

🎁 A new package dplyr


library(dplyr)
rain %>% group_by(Station) %>% 
  summarise(mrain=mean(Rainfall))  -> rain_summary
head(rain_summary)
## # A tibble: 6 x 2
##   Station    mrain
##   <chr>      <dbl>
## 1 Ardara     140. 
## 2 Armagh      68.3
## 3 Athboy      74.7
## 4 Belfast     87.1
## 5 Birr        70.8
## 6 Cappoquinn 121.

  • The %>% operator acts as a ‘pipeline’.
    • x %>% f(y) is the same as f(x,y).
    • x %>% f1(y) %>% f2(z) is the same as f2(f1(x,y),z) but easier to read.
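
The two equivalences above can be checked directly. This is a minimal sketch using an invented numeric vector x and the base R functions head and round in the roles of f, f1 and f2.

```r
library(dplyr)  # loading dplyr makes %>% available

x <- c(2.71828, 3.14159, 1.41421, 1.73205)

# x %>% f(y) is the same as f(x, y)
piped_one  <- x %>% head(3)
nested_one <- head(x, 3)

# x %>% f1(y) %>% f2(z) is the same as f2(f1(x, y), z)
piped_two  <- x %>% head(3) %>% round(1)
nested_two <- round(head(x, 3), 1)

identical(piped_one, nested_one)   # TRUE
identical(piped_two, nested_two)   # TRUE
```

The piped form reads left to right in the order the steps happen, whereas the nested form has to be read from the inside out.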

📆 Group by month


rain %>% group_by(Month) %>% 
  summarise(mrain=mean(Rainfall)) -> rain_months
head(rain_months)
## # A tibble: 6 x 2
##   Month mrain
##   <fct> <dbl>
## 1 Jan   113. 
## 2 Feb    83.2
## 3 Mar    79.5
## 4 Apr    68.7
## 5 May    71.3
## 6 Jun    72.7

summarise applies a summary function to each group and creates a new data frame with one row for each group. Any R function that reduces a vector to a single value can be used, e.g. median, sd, max or a user-defined function.
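
As an illustration of using other summary functions, here is a sketch with a small invented tibble, toy_rain (not the real data), and a hypothetical user-defined function, rain_spread. Several summaries can be computed in a single summarise call.

```r
library(dplyr)

# toy_rain is a hypothetical stand-in for rain, with invented values
toy_rain <- tibble(
  Station  = rep(c("Ardara", "Armagh"), each = 3),
  Rainfall = c(100, 150, 200, 60, 70, 80)
)

# A user-defined summary function: gap between the wettest and driest value
rain_spread <- function(x) max(x) - min(x)

toy_rain %>% group_by(Station) %>%
  summarise(med_rain = median(Rainfall),
            sd_rain  = sd(Rainfall),
            max_rain = max(Rainfall),
            spread   = rain_spread(Rainfall)) -> toy_summary
toy_summary
```

The result has one row per station and one column per summary.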

📊 Basic graphic investigation


library(ggplot2)
ggplot(rain_months,aes(x=Month,y=mrain)) + geom_col() 

📊 What does that mean?

  • library(ggplot2) loads ggplot2, an R graphics package
  • Enter the line: ggplot(rain_months,aes(x=Month,y=mrain)) + geom_col()
  • ggplot is the main command - it creates a graph object
    • aes sets the aesthetics, which link variables to characteristics of the graph
    • Here x is Month and y is mean rainfall (mrain)
  • geom_col specifies the kind of graph - here it is a column plot
  • You add a geometry to the graph object to specify the actual graph
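
To see the "graph object plus geometry" idea more clearly, the one-line command can be split into two steps. This is only a sketch: toy_months is an invented stand-in for rain_months, and the names p and p_cols are arbitrary.

```r
library(ggplot2)

# toy_months is a hypothetical stand-in for rain_months, with invented values
toy_months <- data.frame(
  Month = factor(c("Jan", "Feb", "Mar"), levels = c("Jan", "Feb", "Mar")),
  mrain = c(110, 85, 80)
)

# Step 1: ggplot() creates the graph object and records the aesthetics
p <- ggplot(toy_months, aes(x = Month, y = mrain))

# Step 2: adding a geometry tells ggplot2 how to draw the data
p_cols <- p + geom_col()

# Printing the object draws the plot
print(p_cols)
```

Until a geometry is added, p holds the data and the aesthetic mappings but no instructions for drawing them.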

📊 Modifying the style of plots


ggplot(rain_months,aes(x=Month,y=mrain)) + geom_col(fill='dodgerblue')  

📈 Also for yearly data


Here we use geom_line to get a line graph, rather than a column plot.

rain %>% group_by(Year) %>% 
  summarise(total_rain=sum(Rainfall)) -> rain_years
ggplot(rain_years,aes(x=Year,y=total_rain)) +  geom_line(col='indianred')