Question - Creating a dygraph of the weather stations at Belfast, Dublin Airport, University College Galway, and Cork Airport showing time series of rainfall on a monthly basis. Include a RangeSelector control that simultaneously changes the time window on the four time series.

Introduction

This exercise explores rainfall data from the weather stations at Belfast, Dublin Airport, University College Galway, and Cork Airport. The rainfall data is required to be shown as a time series on a monthly basis. Time series analysis has become a major tool for studying meteorological phenomena such as rainfall. A time series is a set of observations of a variable (rainfall, in this case) measured at equally spaced time intervals. Rainfall is one of the key variables in the atmosphere and the hydrological cycle. It has great economic and social significance, so it is worth trying to identify past patterns in Irish rainfall, particularly at the four stations mentioned above. The assignment also requires a dygraph of the four stations being analysed. dygraphs is a fast, flexible, open-source JavaScript charting library, wrapped by the R package of the same name. It allows users to explore and interpret dense data sets, such as the one used for this task.

Methodology

setwd("/Users/michael/Downloads")

Necessary Libraries

The libraries needed for this assignment are loaded below.

library(ggvis)
library(tidyverse)
## ── Attaching packages ────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.8
## ✔ tidyr   0.8.1     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dygraphs)
library(dplyr)
library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(leaflet)

Load Rainfall Data

The data, which was obtained from Moodle, must first be loaded into R. The ‘head’ function then gives a first glimpse of the data.

load("rainfall.RData")
head(stations)
## # A tibble: 6 x 9
##   Station Elevation Easting Northing   Lat  Long County Abbreviation Sour…
##   <chr>       <int>   <dbl>    <dbl> <dbl> <dbl> <chr>  <chr>        <chr>
## 1 Athboy         87  270400   261700  53.6 -6.93 Meath  AB           Met …
## 2 Foulks…        71  284100   118400  52.3 -6.77 Wexfo… F            Met …
## 3 Mullin…       112  241780   247765  53.5 -7.37 Westm… M            Met …
## 4 Portlaw         8  246600   115200  52.3 -7.31 Water… P            Met …
## 5 Rathdr…       131  319700   186000  52.9 -6.22 Wickl… RD           Met …
## 6 Stroke…        49  194500   279100  53.8 -8.1  Rosco… S            Met …

We can now carry out some initial exploration of the rainfall data, whose variables can be seen below. The rainfall records for each station run from 1850 to 2014, spanning a 164-year period, and there are no missing values.

head(rain)
## # A tibble: 6 x 4
##    Year Month Rainfall Station
##   <dbl> <fct>    <dbl> <chr>  
## 1  1850 Jan      169   Ardara 
## 2  1851 Jan      236.  Ardara 
## 3  1852 Jan      250.  Ardara 
## 4  1853 Jan      209.  Ardara 
## 5  1854 Jan      188.  Ardara 
## 6  1855 Jan       32.3 Ardara
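The statements above about the date range and missing values can be checked directly. The sketch below runs on a small simulated data frame, `toy`, a hypothetical stand-in for `rain` with the same columns (Year, Month, Rainfall, Station) but made-up values for two stations and two years. It is written in base R so it runs without the tidyverse.

```r
# Hypothetical stand-in for `rain`: same columns, simulated values only
set.seed(1)
months <- factor(month.abb, levels = month.abb)
toy <- expand.grid(Month = months, Year = 1850:1851,
                   Station = c("Ardara", "Armagh"),
                   stringsAsFactors = FALSE)
toy$Rainfall <- round(runif(nrow(toy), 20, 200), 1)

any(is.na(toy))                 # are there any missing values at all?
range(toy$Year)                 # first and last year on record
table(toy$Station)              # number of records per station

# A complete record has 12 monthly values for every station in every year
all(table(toy$Station, toy$Year) == 12)
```

On the real data, the same calls applied to `rain` would confirm (or refute) the 1850-2014 range and the absence of missing values.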

Dplyr Package

The %>% operator acts as a pipeline: it passes the result of one step as the first argument of the next, so a sequence of operations reads from left to right instead of as nested function calls. The code below groups the rainfall records by the station at which they were recorded. The ‘summarise’ command then applies a summary function (here the mean) to each group and creates a new data frame with one entry for each group. This is where we can carry out some initial data exploration.

We can see from the table below that, of the six stations shown, Ardara has the highest mean monthly rainfall (about 140 mm) and Armagh the lowest (about 68 mm).

library(dplyr)
rain %>% group_by(Station) %>% 
  summarise(mrain=mean(Rainfall))  -> rain_summary
head(rain_summary)
## # A tibble: 6 x 2
##   Station    mrain
##   <chr>      <dbl>
## 1 Ardara     140. 
## 2 Armagh      68.3
## 3 Athboy      74.7
## 4 Belfast     87.1
## 5 Birr        70.8
## 6 Cappoquinn 121.
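Means and medians can differ noticeably for rainfall, and `head` only shows the first six stations, so it may be worth computing both statistics for every station and ranking over the whole table. The sketch below does this in base R with `tapply` on a small simulated data frame (`toy`, a hypothetical stand-in for `rain`; the gamma-distributed values are made up).

```r
# Hypothetical stand-in for `rain` with simulated monthly rainfall (mm)
set.seed(2)
toy <- data.frame(
  Station  = rep(c("Ardara", "Armagh", "Belfast"), each = 24),
  Rainfall = c(rgamma(24, shape = 3, scale = 45),
               rgamma(24, shape = 3, scale = 23),
               rgamma(24, shape = 3, scale = 29))
)

# Mean and median per station; tapply sorts the groups alphabetically
means   <- tapply(toy$Rainfall, toy$Station, mean)
medians <- tapply(toy$Rainfall, toy$Station, median)
stats_by_station <- data.frame(Station     = names(means),
                               mean_rain   = as.numeric(means),
                               median_rain = as.numeric(medians))
stats_by_station

# Wettest and driest stations across ALL rows, not just the first six
stats_by_station$Station[which.max(stats_by_station$mean_rain)]
stats_by_station$Station[which.min(stats_by_station$mean_rain)]
```

With dplyr, the equivalent is adding `median(Rainfall)` inside the existing `summarise` call and sorting the result with `arrange`.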

Grouping the data by Month

It is also possible to group the rainfall data into their individual months to gain a greater insight, whilst exploring the large dataset.

rain %>% group_by(Month) %>% 
  summarise(mrain=mean(Rainfall)) -> rain_months
head(rain_months)
## # A tibble: 6 x 2
##   Month mrain
##   <fct> <dbl>
## 1 Jan   113. 
## 2 Feb    83.2
## 3 Mar    79.5
## 4 Apr    68.7
## 5 May    71.3
## 6 Jun    72.7

Visualising Data

The data grouped by month can be visualised with a simple graphic such as the bar plot below.

barplot(rain_months$mrain,names=rain_months$Month,las=3,col='firebrick')

Yearly Data

The data set can also be grouped by year. Summing the rainfall over all stations and months in each year gives a time series of total annual rainfall.

rain %>% group_by(Year) %>% 
  summarise(total_rain=sum(Rainfall)) -> rain_years
with(rain_years,plot(Year,total_rain,type='l',col='dodgerblue'))
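A yearly total summed over all stations is only comparable from one year to the next if every station reports in every year (the text above states there are no missing values). The base-R sketch below, on simulated data (`toy` is a hypothetical stand-in for `rain`), checks that the number of records per year is constant and then builds the same annual totals as an annual `ts` object.

```r
# Hypothetical stand-in for `rain`: two stations, five years, simulated values
set.seed(3)
toy <- expand.grid(Month = factor(month.abb, levels = month.abb),
                   Year = 1850:1854, Station = c("Ardara", "Armagh"),
                   stringsAsFactors = FALSE)
toy$Rainfall <- round(runif(nrow(toy), 20, 200), 1)

# Records per year: should be constant (12 months x number of stations)
table(toy$Year)

# Total rainfall per year over all stations, as an annual time series
annual_total <- tapply(toy$Rainfall, toy$Year, sum)
annual_ts <- ts(as.numeric(annual_total), start = 1850, frequency = 1)
plot(annual_ts, col = "dodgerblue", ylab = "Total rainfall (mm)")
```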

Individual Station Ardara

It is also possible to visualise a very specific part of the data, such as a line graph of the annual rainfall at an individual weather station. The example below does this for the Ardara station.

rain %>% group_by(Year) %>% 
  filter(Station=='Ardara') %>%
  summarise(total_rain=sum(Rainfall)) -> rain_years_str
with(rain_years_str,plot(Year,total_rain,type='l',col='darkgreen'))

Multiple Groupings

Grouping by both month and station gives the mean rainfall at each station for each month.

rain %>% group_by(Month,Station) %>% 
  summarise(mean_rain=mean(Rainfall)) -> rain_season_station
head(rain_season_station)
## # A tibble: 6 x 3
## # Groups:   Month [1]
##   Month Station    mean_rain
##   <fct> <chr>          <dbl>
## 1 Jan   Ardara         175. 
## 2 Jan   Armagh          74.6
## 3 Jan   Athboy          84.9
## 4 Jan   Belfast        101. 
## 5 Jan   Birr            79.9
## 6 Jan   Cappoquinn     154.

The data can be rearranged for a different view using the ‘acast’ function from the ‘reshape2’ package. As the table below shows, the results are now organised into a 2D array, with one row per station and one column per month.

library(reshape2)
rain_season_station %>% acast(Station~Month) %>% head
## Using mean_rain as value column: use value.var to override.
##                  Jan       Feb       Mar      Apr      May       Jun
## Ardara     174.82606 126.82303 123.02000 98.79333 96.90727 105.24061
## Armagh      74.57242  55.97182  56.48879 53.67030 59.23182  62.72939
## Athboy      84.94759  62.62133  62.44944 58.97874 62.16260  68.11460
## Belfast    101.20718  74.50206  73.10221 65.90492 69.23426  74.48525
## Birr        79.92074  57.88501  58.42056 54.07187 60.25831  61.96445
## Cappoquinn 153.97159 117.77099 110.02890 94.00365 95.81437  94.86357
##                  Jul       Aug       Sep       Oct       Nov       Dec
## Ardara     123.70485 145.24788 152.80727 174.44788 176.45030 186.14182
## Armagh      72.50636  81.92182  69.02576  80.94242  73.73121  79.05939
## Athboy      76.28662  88.84710  76.35077  88.14627  80.95767  87.05997
## Belfast     87.70003 102.44499  87.11622 106.35394 100.31760 102.95073
## Birr        75.10084  86.75669  72.21722  83.80810  77.87266  81.74334
## Cappoquinn 104.09372 125.42245 116.04466 146.70658 141.10012 154.87402
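Passing `value.var = "mean_rain"` to `acast` names the value column explicitly and silences the "Using mean_rain as value column" message. The same long-to-wide reshape can also be done in base R with `tapply`. The sketch below uses a small simulated table, `long`, a hypothetical stand-in for `rain_season_station`.

```r
# Hypothetical long-format table like rain_season_station (simulated values)
set.seed(4)
long <- expand.grid(Month = factor(month.abb, levels = month.abb),
                    Station = c("Ardara", "Armagh", "Belfast"),
                    stringsAsFactors = FALSE)
long$mean_rain <- round(runif(nrow(long), 50, 180), 1)

# Long -> wide: one row per station, one column per month.
# tapply follows the factor levels, so the months stay in Jan..Dec order.
wide <- with(long, tapply(mean_rain, list(Station, Month), mean))
dim(wide)
wide[, 1:6]

# The resulting matrix can be passed straight to heatmap():
# heatmap(wide, Colv = NA)
```

Unlike `xtabs`, `tapply` leaves a station/month combination with no data as `NA` rather than 0, which is the safer default for rainfall.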

Heatmap

Another way of visualising this rainfall data is a heatmap. The reshape2 package again builds the station-by-month matrix, and base R's heatmap function draws it; Colv=NA suppresses the column dendrogram so the months stay in calendar order.

library(reshape2)
rain_season_station %>% acast(Station~Month) %>% heatmap(Colv=NA)
## Using mean_rain as value column: use value.var to override.

Create New Variables

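The ‘mutate’ function creates new variables. The code below computes the mean and standard deviation of rainfall for each month and then uses ‘mutate’ to add the coefficient of variation, cv_rain = 100 * sd_rain / mean_rain, which measures relative variability as a percentage.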

rain %>% group_by(Month) %>% 
  summarise(mean_rain=mean(Rainfall),sd_rain=sd(Rainfall)) %>%
  mutate(cv_rain=100 * sd_rain/mean_rain)  -> rain_mnsdcv
head(rain_mnsdcv)
## # A tibble: 6 x 4
##   Month mean_rain sd_rain cv_rain
##   <fct>     <dbl>   <dbl>   <dbl>
## 1 Jan       113.     57.6    51.1
## 2 Feb        83.2    51.5    61.8
## 3 Mar        79.5    44.3    55.7
## 4 Apr        68.7    36.4    52.9
## 5 May        71.3    37.2    52.2
## 6 Jun        72.7    40.9    56.3
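The coefficient of variation is the standard deviation expressed as a percentage of the mean, so two months with the same absolute spread can have very different relative variability. The small sketch below uses two made-up three-value series (hypothetical numbers, not from the rainfall data) to show this. This is why the relative and absolute variability barplots that follow can rank the months differently.

```r
# Coefficient of variation (%), as in cv_rain = 100 * sd_rain / mean_rain
cv <- function(x) 100 * sd(x) / mean(x)

# Two made-up series with identical spread but different means
jan <- c(60, 110, 160)   # mean 110, sd 50
jul <- c(10, 60, 110)    # mean 60,  sd 50

c(sd_jan = sd(jan), sd_jul = sd(jul))   # same absolute variability
c(cv_jan = cv(jan), cv_jul = cv(jul))   # about 45.5% vs 83.3%
```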

Visualising Relative Variability

The relative variability can be visualised with a simple barplot of the coefficient of variation for each month.

barplot(rain_mnsdcv$cv_rain,names=rain_mnsdcv$Month,las=3,col='dodgerblue')

Visualising Absolute Variability

To visualise absolute variability, a barplot of the monthly standard deviation can be used. Shown alongside the previous barplot, it is a convenient way to compare relative and absolute variability.

barplot(rain_mnsdcv$sd_rain,names=rain_mnsdcv$Month,las=3,col='dodgerblue')

rain %>%  group_by(Year,Month) %>% 
  summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>% 
  ts(start=c(1850,1),freq=12) -> rain_ts
rain_ts %>% window(c(1870,1),c(1871,12))
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 1870 2666.2 1975.3 1500.5 1024.8 1862.8  789.2 1038.6 1510.5 2045.5 5177.6
## 1871 3148.3 2343.7 1731.7 2654.5  657.6 2040.1 3705.0 1869.9 2083.4 2774.3
##         Nov    Dec
## 1870 1733.2 1902.2
## 1871 2000.1 1902.0
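`ts()` assigns dates purely by position, using `start` and `frequency`, so the input rows must be in Year-then-Month order with no gaps. Grouping by `Year, Month` and summarising returns rows in that order as long as Month is a factor with levels Jan to Dec, as the month summaries above suggest. The sketch below uses 24 simulated monthly values (hypothetical, not from the data) to show how positions map to dates.

```r
# 24 simulated monthly totals starting in January 1850
set.seed(5)
x <- ts(round(runif(24, 500, 3000), 1), start = c(1850, 1), frequency = 12)

start(x)          # 1850 1
end(x)            # 1851 12
time(x)[1:3]      # months expressed as fractions of a year
cycle(x)[1:3]     # position within the year: 1 2 3
window(x, start = c(1851, 1), end = c(1851, 3))   # Jan-Mar 1851
```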

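The code above sums the rainfall over all stations for each month and converts the result into a monthly time series object, rain_ts, starting in January 1850. The next step is to start using the dygraphs package, which is well suited to visualising large data sets. The rain time series is passed to dygraph using the pipeline operator (%>%).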

Data of all stations

library(dygraphs) 
rain_ts %>% dygraph 

Add RangeSelector Control

The interactive nature of a dygraph can be enhanced with the dyRangeSelector function, which adds a control below the chart for selecting and dragging the time window. This makes it even easier to navigate through the data. The width and height arguments passed to dygraph set the size of the chart itself; they are optional.

rain_ts %>% dygraph(width=800,height=300) %>% dyRangeSelector

Add Interactive Rolling Mean

rain_ts %>% dygraph(width=800,height=300) %>% dyRangeSelector %>% dyRoller(rollPeriod = 600)
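With monthly data, `rollPeriod = 600` smooths over 600 points, i.e. 50 years. A trailing moving average of this kind can be computed explicitly with `stats::filter`; the `stats::` prefix matters because, as the conflicts message above shows, `dplyr::filter` masks it once the tidyverse is attached. The sketch below uses a 12-month window on simulated data. It is only an approximation of what the roller displays: dygraphs may treat the first few points of the series differently (for example by averaging over a shorter window), whereas `stats::filter` returns `NA` there.

```r
# Simulated monthly series (hypothetical values)
set.seed(5)
x <- ts(runif(120, 20, 200), start = c(1850, 1), frequency = 12)

# Trailing moving average over k points: mean of x[i-k+1], ..., x[i]
k <- 12                                   # 12 months here; 600 in dyRoller above
roll <- stats::filter(x, rep(1 / k, k), sides = 1)

roll[1:(k - 1)]                           # NA: not enough history yet
all.equal(roll[k], mean(x[1:k]))          # first full window
```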

Multiple Dygraphs

The next step is to look at several series together, which is useful for examining them simultaneously. The records for Cork Airport, University College Galway, Dublin Airport and Belfast are filtered from the data, each is turned into a monthly time series, and the ‘cbind’ function combines them into a single multiple time series. Separate dygraphs can also be linked with the ‘group’ option in ‘dygraph’, and the range selector control is added using the pipeline operator %>%.

rain %>%  group_by(Year,Month) %>% filter(Station=="Cork Airport") %>%
  summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
  ts(start=c(1850,1),freq=12) ->  cor_ts
rain %>%  group_by(Year,Month) %>% filter(Station=="University College Galway") %>%
  summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
  ts(start=c(1850,1),freq=12) ->  gal_ts
rain %>%  group_by(Year,Month) %>% filter(Station=="Dublin Airport") %>%
  summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
  ts(start=c(1850,1),freq=12) ->  dub_ts
rain %>%  group_by(Year,Month) %>% filter(Station=="Belfast") %>%
  summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
  ts(start=c(1850,1),freq=12) ->  bel_ts
galcorbeldub_ts <- cbind(gal_ts,cor_ts,bel_ts,dub_ts)
window(galcorbeldub_ts,c(1850,1),c(1850,5))
##          gal_ts cor_ts bel_ts dub_ts
## Jan 1850  108.9  155.3  115.7   75.8
## Feb 1850  131.5   92.6  120.5   47.8
## Mar 1850   56.6   56.0   56.8   18.5
## Apr 1850  120.5  207.2  142.6   97.5
## May 1850   69.8   35.3   57.9   58.6

Multiple Dygraph

Now the rainfall data from each of the four stations above can be visualised in a single dygraph with the RangeSelector control.

galcorbeldub_ts %>% dygraph(width=800,height=360) %>% dyRangeSelector

Individual Dygraphs

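The data can also be shown as a separate dygraph for each station. Because all four charts share the same group value, "gal_dub_belf_cor", their time windows are synchronised: moving the range selector or zooming on any one chart changes the time window on all four time series, as the question requires.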

cor_ts %>% dygraph(width=800,height=170,group="gal_dub_belf_cor",main="Cork Airport") %>% dyRangeSelector()
gal_ts %>% dygraph(width=800,height=170,group="gal_dub_belf_cor",main="University College Galway") %>% dyRangeSelector()
dub_ts %>% dygraph(width=800,height=170,group="gal_dub_belf_cor",main="Dublin Airport") %>% dyRangeSelector()
bel_ts %>% dygraph(width=800,height=170,group="gal_dub_belf_cor",main="Belfast") %>% dyRangeSelector()

It can be seen from these individual dygraphs that the four weather stations follow broadly similar patterns over the period 1850-2014. Rainfall varies considerably from year to year and appears to spike roughly once a decade throughout the record. There are also some irregular components or outliers common to all four stations, in particular around 1900 and 2000.
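The four synchronised dygraph calls differ only in their series and title, so they could also be generated in a loop. The sketch below does this for four simulated monthly series (hypothetical stand-ins for `cor_ts`, `gal_ts`, `dub_ts` and `bel_ts`) and collects the widgets with `htmltools::tagList`, wrapped in `browsable` so they can be displayed together. It reuses the original group id; any shared string would work. It calls functions without the `%>%` pipe so that only dygraphs and htmltools are needed.

```r
library(dygraphs)
library(htmltools)

# Simulated monthly series standing in for the four station series
set.seed(7)
sim <- function() ts(runif(120, 20, 200), start = c(1850, 1), frequency = 12)
series <- list("Cork Airport" = sim(), "University College Galway" = sim(),
               "Dublin Airport" = sim(), "Belfast" = sim())

# Same `group` on every chart, so their x-axis windows stay synchronised
plots <- lapply(names(series), function(nm) {
  dyRangeSelector(dygraph(series[[nm]], main = nm, group = "gal_dub_belf_cor",
                          width = 800, height = 170))
})

# Print `page` (or make it the last value of an R Markdown chunk) to show all four
page <- browsable(tagList(plots))
```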

Conclusion

The objective of this assignment was to present the data from the weather stations at Dublin Airport, University College Galway, Belfast, and Cork Airport and to create a dygraph of these stations showing a monthly time series of rainfall. A RangeSelector control was included so that the time window on the four time series changes simultaneously. The dygraph function is a very effective way of visualising and exploring large, complex datasets, and the same approach can be applied to many other variables.