Advanced R and Data Analysis



Overview


  • More on time series
  • Making interactive tools
  • Some new spatial techniques
  • Reproducibility

RStudio




Looking at the rainfall data



💧 Main working data set is rainfall.RData


📥 Get it from Moodle, in the section headed ‘Chris Brunsdon’s Lectures’.
load('rainfall.RData')
head(stations)
## # A tibble: 6 × 9
##       Station Elevation Easting Northing   Lat  Long    County
##         <chr>     <int>   <dbl>    <dbl> <dbl> <dbl>     <chr>
## 1      Athboy        87  270400   261700 53.60 -6.93     Meath
## 2 Foulksmills        71  284100   118400 52.30 -6.77   Wexford
## 3   Mullingar       112  241780   247765 53.47 -7.37 Westmeath
## 4     Portlaw         8  246600   115200 52.28 -7.31 Waterford
## 5    Rathdrum       131  319700   186000 52.91 -6.22   Wicklow
## 6 Strokestown        49  194500   279100 53.75 -8.10 Roscommon
## # ... with 2 more variables: Abbreviation <chr>, Source <chr>


Alongside stations, which describes each station, the file contains rain, which holds all of the rainfall records:


head(rain)
## # A tibble: 6 × 4
##    Year  Month Rainfall Station
##   <dbl> <fctr>    <dbl>   <chr>
## 1  1850    Jan    169.0  Ardara
## 2  1851    Jan    236.4  Ardara
## 3  1852    Jan    249.7  Ardara
## 4  1853    Jan    209.1  Ardara
## 5  1854    Jan    188.5  Ardara
## 6  1855    Jan     32.3  Ardara

  • It has been converted to an R binary data file (.RData) for convenience.
  • There are no missing values.
  • The data run monthly from 1850 to 2014.
  • 👍🏻 Thanks to Conor Murphy and Simon Noone for supplying the data.
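The claims above (no missing values, monthly records, years 1850 to 2014) can be checked directly once the data are loaded. The sketch below uses a hypothetical helper, check_rain, which is not part of the course materials. It is run here on made-up toy data with the same columns as rain, so it works without the .RData file.

```r
# Hypothetical helper (not from the course materials): quick sanity checks
# on a data frame with the same columns as rain (Year, Month, Rainfall, Station)
check_rain <- function(df) {
  # number of records for each station-year combination that is present
  counts <- table(paste(df$Station, df$Year))
  list(
    any_missing     = anyNA(df),                    # TRUE if any value is NA
    year_range      = range(df$Year),               # first and last year
    n_stations      = length(unique(df$Station)),   # distinct station names
    months_complete = all(counts == 12)             # 12 months per station-year
  )
}

# Made-up toy data standing in for rain (illustration only)
toy_rain <- expand.grid(Month = factor(month.abb, levels = month.abb),
                        Year = 1850:1851, Station = "Ardara",
                        stringsAsFactors = FALSE)
toy_rain$Rainfall <- seq(10, by = 5, length.out = nrow(toy_rain))

checks <- check_rain(toy_rain)
str(checks)
```

With the real data loaded, check_rain(rain) should report no missing values and a year range of 1850 to 2014, if the description above holds.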


Introduction to the dplyr package



🎁 A new package: dplyr


library(dplyr)
rain %>% group_by(Station) %>% 
  summarise(mrain=mean(Rainfall))  -> rain_summary
head(rain_summary)
## # A tibble: 6 × 2
##      Station     mrain
##        <chr>     <dbl>
## 1     Ardara 140.36753
## 2     Armagh  68.32096
## 3     Athboy  74.74356
## 4    Belfast  87.10995
## 5       Birr  70.83498
## 6 Cappoquinn 121.22455

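  • The %>% operator (from the magrittr package, re-exported by dplyr) acts as a ‘pipeline’: it passes the value on its left into the function on its right.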
  • x %>% f(y) is the same as f(x,y).
  • x %>% f1(y) %>% f2(z) is the same as f2(f1(x,y),z) but easier to read.
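A small check of the rules above, using a made-up numeric vector rather than the rainfall data. The nested and piped forms compute exactly the same thing; the pipe just lets the steps be read left to right.

```r
library(dplyr)  # dplyr re-exports the %>% pipe from magrittr

x <- c(2, 3, 5)

# Nested form: read from the inside out
nested <- round(sqrt(x), 2)

# Piped form: x %>% sqrt() is sqrt(x); that result %>% round(2) is round(result, 2)
piped <- x %>% sqrt() %>% round(2)

identical(nested, piped)   # TRUE
piped                      # 1.41 1.73 2.24
```

By the same rule, the station summary above is equivalent to summarise(group_by(rain, Station), mrain=mean(Rainfall)).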


📆 Group by month


rain %>% group_by(Month) %>% 
  summarise(mrain=mean(Rainfall)) -> rain_months
head(rain_months)
## # A tibble: 6 × 2
##    Month     mrain
##   <fctr>     <dbl>
## 1    Jan 112.64355
## 2    Feb  83.24975
## 3    Mar  79.53280
## 4    Apr  68.74165
## 5    May  71.31769
## 6    Jun  72.74104

summarise applies a summary function to each group and creates a new data frame with one row per group. Any function that reduces a vector to a single value can be used, e.g. median, sd, max or a user-defined function.
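As a sketch of several summaries computed in one call, including a user-defined one, here is summarise applied to made-up toy data (two hypothetical stations, A and B). The helper rain_range is invented for illustration.

```r
library(dplyr)

# Made-up toy data with Station and Rainfall columns (illustration only)
toy_rain <- data.frame(
  Station  = c("A", "A", "A", "B", "B", "B"),
  Rainfall = c(10, 20, 60, 5, 15, 25)
)

# Hypothetical user-defined summary: the spread between wettest and driest
rain_range <- function(x) max(x) - min(x)

toy_summary <- toy_rain %>%
  group_by(Station) %>%
  summarise(mrain   = mean(Rainfall),
            medrain = median(Rainfall),
            sdrain  = sd(Rainfall),
            rng     = rain_range(Rainfall))

print(toy_summary)   # one row per station: A and B
```

Each summary function returns a single value per group, which is why the result has exactly one row for each station.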



📊 Basic graphic investigation


# names.arg labels the bars; las=3 draws the axis labels vertically
barplot(rain_months$mrain,names.arg=rain_months$Month,las=3,col='dodgerblue')



📊 Shorter version using with


with(rain_months,barplot(mrain,names.arg=Month,las=3,col='dodgerblue'))
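with(data, expression) evaluates the expression using the data frame's columns as if they were ordinary variables, so the rain_months$ prefixes can be dropped. A minimal base R illustration on a small made-up data frame:

```r
# A small made-up data frame (illustration only)
df <- data.frame(a = 1:3, b = c(4, 5, 6))

# Without with(): every column must be prefixed with df$
long_form <- sum(df$a * df$b)

# With with(): the expression is evaluated using df's columns as variables
short_form <- with(df, sum(a * b))

c(long_form, short_form)   # 32 32
```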



📈 Also for yearly data


rain %>% group_by(Year) %>% 
  summarise(total_rain=sum(Rainfall)) -> rain_years
with(rain_years,plot(Year,total_rain,type='l',col='dodgerblue'))
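Note that sum(Rainfall) within each Year adds up every station's monthly values for that year, so total_rain is a total across the whole station network. The sketch below shows this on made-up toy data (two hypothetical stations, A and B) and, for comparison, computes the same totals with base R's tapply.

```r
library(dplyr)

# Made-up toy data: two stations, two years (illustration only)
toy_rain <- data.frame(
  Year     = c(1850, 1850, 1851, 1851),
  Station  = c("A", "B", "A", "B"),
  Rainfall = c(100, 50, 80, 20)
)

# dplyr version, following the yearly-total code above
dplyr_totals <- toy_rain %>%
  group_by(Year) %>%
  summarise(total_rain = sum(Rainfall))

# Base R comparison: tapply() applies sum() to Rainfall within each Year
base_totals <- tapply(toy_rain$Rainfall, toy_rain$Year, sum)

dplyr_totals$total_rain   # 150 100
as.vector(base_totals)    # 150 100
```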