This exercise explores rainfall data from the weather stations at Belfast, Dublin Airport, University College Galway, and Cork Airport. The rainfall data is to be shown as a time series on a monthly basis. Time series analysis has become a major tool for studying meteorological phenomena such as rainfall. A time series is a set of observations of a variable, in this case rainfall, measured at equally spaced time intervals. Rainfall is a key variable in the atmosphere and the hydrological cycle, and it has great economic and social significance, so it is important to identify past patterns in Irish rainfall, particularly at the stations mentioned above. The assignment also requires a dygraph for the four weather stations being analysed. Dygraphs is a fast, flexible, open-source JavaScript charting library that lets users explore and interpret dense data sets such as the one used here; the ‘dygraphs’ R package provides an interface to it.
setwd("/Users/michael/Downloads")
The libraries needed for this assignment are loaded below.
library(ggvis)
library(tidyverse)
## ── Attaching packages ────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ dplyr 0.7.8
## ✔ tidyr 0.8.1 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dygraphs)
library(dplyr)
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
library(leaflet)
The data, which was provided on Moodle, must first be loaded into R. The ‘head’ function gives a first glimpse of it.
load("rainfall.RData")
head(stations)
## # A tibble: 6 x 9
## Station Elevation Easting Northing Lat Long County Abbreviation Sour…
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
## 1 Athboy 87 270400 261700 53.6 -6.93 Meath AB Met …
## 2 Foulks… 71 284100 118400 52.3 -6.77 Wexfo… F Met …
## 3 Mullin… 112 241780 247765 53.5 -7.37 Westm… M Met …
## 4 Portlaw 8 246600 115200 52.3 -7.31 Water… P Met …
## 5 Rathdr… 131 319700 186000 52.9 -6.22 Wickl… RD Met …
## 6 Stroke… 49 194500 279100 53.8 -8.1 Rosco… S Met …
Some initial exploration of the rainfall data can now be carried out. The first rows of ‘rain’ show its four variables: Year, Month, Rainfall and Station. The rainfall records for each station run from 1850 to 2014, a period of more than 160 years, and there are no missing values.
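The coverage and missing-value claims can be checked directly. The sketch below uses a small, invented stand-in for ‘rain’ (called rain_toy here, a hypothetical name, with one value deliberately set to NA so the check has something to find) so that it runs on its own; with the real data, the same ‘range’ and ‘is.na’ calls can be applied to ‘rain’.

```r
# Hypothetical toy stand-in for the `rain` data frame (all values invented)
rain_toy <- data.frame(
  Year = c(1850, 1850, 1851, 1851),
  Month = factor(c("Jan", "Feb", "Jan", "Feb"), levels = month.abb),
  Rainfall = c(169, 120.5, 236.4, NA),  # one value deliberately missing
  Station = "Ardara"
)

# First and last year covered by the records
year_range <- range(rain_toy$Year)

# Number of missing values in each column
na_counts <- colSums(is.na(rain_toy))

print(year_range)
print(na_counts)
```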
head(rain)
## # A tibble: 6 x 4
## Year Month Rainfall Station
## <dbl> <fct> <dbl> <chr>
## 1 1850 Jan 169 Ardara
## 2 1851 Jan 236. Ardara
## 3 1852 Jan 250. Ardara
## 4 1853 Jan 209. Ardara
## 5 1854 Jan 188. Ardara
## 6 1855 Jan 32.3 Ardara
The %>% (pipe) operator passes the result on its left as the first argument to the function on its right, so a sequence of steps can be written as a readable pipeline instead of nested function calls. The code below groups the rainfall records by station and then uses ‘summarise’, which applies a summary function to each group and creates a new data frame with one row per group, to compute each station's mean monthly rainfall. This is a useful first step in exploring the data.
Of the six stations shown in the table below, Ardara has the highest mean monthly rainfall (about 140mm) and Armagh the lowest (about 68mm).
library(dplyr)
rain %>% group_by(Station) %>%
summarise(mrain=mean(Rainfall)) -> rain_summary
head(rain_summary)
## # A tibble: 6 x 2
## Station mrain
## <chr> <dbl>
## 1 Ardara 140.
## 2 Armagh 68.3
## 3 Athboy 74.7
## 4 Belfast 87.1
## 5 Birr 70.8
## 6 Cappoquinn 121.
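The summary above uses ‘mean’; swapping in ‘median’ gives a value that is less affected by a few exceptionally wet months, and rainfall distributions are often right-skewed. The base-R sketch below, on invented data for two hypothetical stations A and B, computes both per group with ‘tapply’, which plays the same split-apply-combine role as group_by followed by summarise.

```r
# Invented monthly rainfall for two hypothetical stations
toy <- data.frame(
  Station = rep(c("A", "B"), each = 3),
  Rainfall = c(100, 110, 300, 60, 70, 80)
)

# One summary value per station (split-apply-combine)
mean_by <- tapply(toy$Rainfall, toy$Station, mean)
median_by <- tapply(toy$Rainfall, toy$Station, median)

# Station A has one very wet month, so its mean sits well above its median
print(mean_by)
print(median_by)
```

For station B the two summaries agree, while the single wet month at station A pulls its mean far above its median.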
Grouping the rainfall data by month gives further insight into this large dataset.
rain %>% group_by(Month) %>%
summarise(mrain=mean(Rainfall)) -> rain_months
head(rain_months)
## # A tibble: 6 x 2
## Month mrain
## <fct> <dbl>
## 1 Jan 113.
## 2 Feb 83.2
## 3 Mar 79.5
## 4 Apr 68.7
## 5 May 71.3
## 6 Jun 72.7
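The months in ‘rain_months’ come out in calendar order because ‘Month’ is stored as a factor (shown as `<fct>` in the output), and grouping follows the factor's level order. A small base-R sketch with invented values shows what would happen if the months were stored as plain character strings instead: the groups would be sorted alphabetically, and the bar plot below would no longer run from January to December.

```r
# Invented monthly values
rainfall <- c(10, 20, 30, 40)
month_chr <- c("Jan", "Feb", "Mar", "Apr")

# Same months as a factor whose levels are in calendar order (month.abb)
month_fct <- factor(month_chr, levels = month.abb)

# Character grouping: groups are sorted alphabetically
by_chr <- tapply(rainfall, month_chr, mean)

# Factor grouping: groups follow the level order; droplevels() removes
# the eight months with no data so they do not appear as NA
by_fct <- tapply(rainfall, droplevels(month_fct), mean)

print(names(by_chr))
print(names(by_fct))
```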
The monthly means can be visualised with a simple graphic such as the bar plot below.
barplot(rain_months$mrain,names=rain_months$Month,las=3,col='firebrick')
The data can also be grouped by year. Summing the rainfall across all stations for each year gives a yearly total, which can be plotted as a time series.
rain %>% group_by(Year) %>%
summarise(total_rain=sum(Rainfall)) -> rain_years
with(rain_years,plot(Year,total_rain,type='l',col='dodgerblue'))
It is also possible to focus on a single weather station. The line graph below shows the yearly rainfall totals for Ardara.
rain %>% group_by(Year) %>%
filter(Station=='Ardara') %>%
summarise(total_rain=sum(Rainfall)) -> rain_years_str
with(rain_years_str,plot(Year,total_rain,type='l',col='darkgreen'))
The mean rainfall can also be computed for each combination of month and station.
rain %>% group_by(Month,Station) %>%
summarise(mean_rain=mean(Rainfall)) -> rain_season_station
head(rain_season_station)
## # A tibble: 6 x 3
## # Groups: Month [1]
## Month Station mean_rain
## <fct> <chr> <dbl>
## 1 Jan Ardara 175.
## 2 Jan Armagh 74.6
## 3 Jan Athboy 84.9
## 4 Jan Belfast 101.
## 5 Jan Birr 79.9
## 6 Jan Cappoquinn 154.
The data can be rearranged to be viewed differently using the ‘reshape2’ package. Its ‘acast’ function casts the long table into a 2D matrix with one row per station and one column per month, as shown below.
library(reshape2)
rain_season_station %>% acast(Station~Month) %>% head
## Using mean_rain as value column: use value.var to override.
## Jan Feb Mar Apr May Jun
## Ardara 174.82606 126.82303 123.02000 98.79333 96.90727 105.24061
## Armagh 74.57242 55.97182 56.48879 53.67030 59.23182 62.72939
## Athboy 84.94759 62.62133 62.44944 58.97874 62.16260 68.11460
## Belfast 101.20718 74.50206 73.10221 65.90492 69.23426 74.48525
## Birr 79.92074 57.88501 58.42056 54.07187 60.25831 61.96445
## Cappoquinn 153.97159 117.77099 110.02890 94.00365 95.81437 94.86357
## Jul Aug Sep Oct Nov Dec
## Ardara 123.70485 145.24788 152.80727 174.44788 176.45030 186.14182
## Armagh 72.50636 81.92182 69.02576 80.94242 73.73121 79.05939
## Athboy 76.28662 88.84710 76.35077 88.14627 80.95767 87.05997
## Belfast 87.70003 102.44499 87.11622 106.35394 100.31760 102.95073
## Birr 75.10084 86.75669 72.21722 83.80810 77.87266 81.74334
## Cappoquinn 104.09372 125.42245 116.04466 146.70658 141.10012 154.87402
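The same long-to-wide reshaping can be sketched in base R with ‘tapply’ and two grouping variables, which returns a matrix indexed by both. The small long-format table below is invented and only mimics the layout of ‘rain_season_station’; it is an illustration of the idea, not a replacement for ‘acast’. (The message printed by ‘acast’ above can be silenced by passing value.var="mean_rain" explicitly.)

```r
# Invented long-format table: one row per Month x Station combination
long <- data.frame(
  Month = factor(c("Jan", "Feb", "Jan", "Feb"), levels = c("Jan", "Feb")),
  Station = c("A", "A", "B", "B"),
  mean_rain = c(175, 127, 75, 56)
)

# Station x Month matrix, analogous to acast(Station ~ Month);
# each cell holds exactly one value, so sum() simply returns it
wide <- tapply(long$mean_rain, list(long$Station, long$Month), sum)

print(wide)
```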
Another way of visualising this data is a heatmap. The ‘acast’ function from reshape2 builds the station-by-month matrix, and base R's ‘heatmap’ function draws it; setting Colv=NA suppresses column reordering, so the months stay in calendar order.
library(reshape2)
rain_season_station %>% acast(Station~Month) %>% heatmap(Colv=NA)
## Using mean_rain as value column: use value.var to override.
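While the columns are left in order, the rows (stations) of the heatmap are still reordered: by default ‘heatmap’ clusters them with hclust(dist(x)), so stations with similar monthly profiles end up next to each other. The sketch below reproduces that clustering step on an invented three-station matrix; the station names and values are hypothetical.

```r
# Invented station-by-month matrix (illustrative values only)
m <- rbind(
  StationA = c(Jan = 100, Feb = 80),
  StationB = c(Jan = 102, Feb = 79),
  StationC = c(Jan = 50,  Feb = 40)
)

# heatmap() uses distfun = dist and hclustfun = hclust by default
hc <- hclust(dist(m))

# The first merge joins the two most similar rows;
# negative entries refer to original row numbers
print(hc$merge[1, ])
```

StationA and StationB have almost identical profiles, so they are merged first and would be drawn as adjacent rows.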
The ‘mutate’ function creates new variables from existing ones. Here it computes the coefficient of variation (cv_rain = 100 * sd_rain / mean_rain), a measure of relative variability, for each month, as shown in the table below.
rain %>% group_by(Month) %>%
summarise(mean_rain=mean(Rainfall),sd_rain=sd(Rainfall)) %>%
mutate(cv_rain=100 * sd_rain/mean_rain) -> rain_mnsdcv
head(rain_mnsdcv)
## # A tibble: 6 x 4
## Month mean_rain sd_rain cv_rain
## <fct> <dbl> <dbl> <dbl>
## 1 Jan 113. 57.6 51.1
## 2 Feb 83.2 51.5 61.8
## 3 Mar 79.5 44.3 55.7
## 4 Apr 68.7 36.4 52.9
## 5 May 71.3 37.2 52.2
## 6 Jun 72.7 40.9 56.3
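The coefficient of variation expresses the standard deviation $s$ as a percentage of the mean $\bar{x}$. Checking the January row by hand with the printed values:

$$
\mathrm{CV} = 100 \times \frac{s}{\bar{x}}, \qquad
\mathrm{CV}_{\text{Jan}} \approx 100 \times \frac{57.6}{113} \approx 51.0
$$

The small difference from the 51.1 in the table comes from rounding in the printed mean and standard deviation; ‘mutate’ works with the unrounded values.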
To visualise the relative variability, the coefficient of variation for each month can be shown in a bar plot.
barplot(rain_mnsdcv$cv_rain,names=rain_mnsdcv$Month,las=3,col='dodgerblue')
Absolute variability (the standard deviation) can be plotted in the same way, which allows relative and absolute variability to be compared.
barplot(rain_mnsdcv$sd_rain,names=rain_mnsdcv$Month,las=3,col='dodgerblue')
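To build a single monthly time series, the rainfall is summed over all stations for each year and month, and the resulting column is converted into a ‘ts’ object that starts in January 1850 with 12 observations per year. The ‘window’ function then extracts a two-year slice (1870 to 1871) for inspection.
# Total rainfall over all stations for each year and month, in chronological order
rain %>% group_by(Year,Month) %>%
summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
ts(start=c(1850,1),freq=12) -> rain_ts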
rain_ts %>% window(c(1870,1),c(1871,12))
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1870 2666.2 1975.3 1500.5 1024.8 1862.8 789.2 1038.6 1510.5 2045.5 5177.6
## 1871 3148.3 2343.7 1731.7 2654.5 657.6 2040.1 3705.0 1869.9 2083.4 2774.3
## Nov Dec
## 1870 1733.2 1902.2
## 1871 2000.1 1902.0
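A ‘ts’ object stores only its start time and frequency, not a date for each value, so the rows passed to ‘ts’ must already be in chronological order with no missing months. The sketch below uses 36 invented monthly values to show how ‘ts’ and ‘window’ fit together.

```r
# Invented monthly values: 36 months starting January 1850
x <- ts(1:36, start = c(1850, 1), frequency = 12)

# Extract January to December 1851, in the same way as the
# window(c(1870,1), c(1871,12)) call above
y <- window(x, start = c(1851, 1), end = c(1851, 12))

print(y)
```

Because the values are simply 1 to 36, the 1851 slice contains the 13th to 24th observations.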
The next step is to use the dygraphs package, which is well suited to exploring long time series interactively. Piping the rain time series into ‘dygraph’ with %>% produces an interactive plot.
library(dygraphs)
rain_ts %>% dygraph
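The interactivity of a dygraph can be enhanced with the ‘dyRangeSelector’ function, which adds a slider below the plot for choosing the time window and makes the data easier to navigate. The width and height arguments of ‘dygraph’ set the size of the chart; they are optional. The second command below also adds a ‘dyRoller’ control, which smooths the series with a rolling average, here initially over 600 points (50 years of monthly data).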
rain_ts %>% dygraph(width=800,height=300) %>% dyRangeSelector
rain_ts %>% dygraph(width=800,height=300) %>% dyRangeSelector %>% dyRoller(rollPeriod = 600)
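A trailing rolling mean of the kind ‘dyRoller’ displays can be sketched in base R with ‘stats::filter’. This is a stand-in to show the idea, not the dygraphs implementation: dygraphs computes its average in the browser and may treat the first few points differently. The ‘stats::’ prefix is needed because, as the startup messages above show, dplyr masks ‘filter’.

```r
# Invented monthly values
x <- ts(c(10, 20, 30, 40, 50, 60), start = c(1850, 1), frequency = 12)

# Trailing rolling mean over 3 points: sides = 1 uses the current
# value and the two before it; the first two results are NA
roll <- stats::filter(x, rep(1 / 3, 3), sides = 1)

print(roll)
```

With rollPeriod = 600 the same idea applies, only the window spans 600 months instead of 3.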
The next step is to compare the four stations. Their records are selected with ‘filter’, each station's rainfall is converted into a monthly time series, and ‘cbind’ combines the four into a multiple time series that can be drawn on a single dygraph. Separate dygraphs can also be linked with the ‘group’ option so that they can be examined simultaneously. The range selector and roller controls are added with the pipe operator %>%.
# Helper: monthly rainfall time series for a single station
station_ts <- function(station) {
  rain %>% filter(Station==station) %>% group_by(Year,Month) %>%
    summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
    ts(start=c(1850,1),freq=12)
}
cor_ts <- station_ts("Cork Airport")
gal_ts <- station_ts("University College Galway")
dub_ts <- station_ts("Dublin Airport")
bel_ts <- station_ts("Belfast")
galcorbeldub_ts <- cbind(gal_ts,cor_ts,bel_ts,dub_ts)
window(galcorbeldub_ts,c(1850,1),c(1850,5))
## gal_ts cor_ts bel_ts dub_ts
## Jan 1850 108.9 155.3 115.7 75.8
## Feb 1850 131.5 92.6 120.5 47.8
## Mar 1850 56.6 56.0 56.8 18.5
## Apr 1850 120.5 207.2 142.6 97.5
## May 1850 69.8 35.3 57.9 58.6
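When applied to ‘ts’ objects, ‘cbind’ aligns the series by time rather than by row position. All four station series above start in January 1850, so the alignment is invisible here, but the sketch below with two invented series that start a month apart shows what happens when they differ: the result covers the union of both time ranges and is padded with NA.

```r
# Two invented monthly series; b starts one month later than a
a <- ts(c(1, 2, 3), start = c(1850, 1), frequency = 12)
b <- ts(c(4, 5, 6), start = c(1850, 2), frequency = 12)

# Align by date: January to April 1850, NA where a series has no value
both <- cbind(a, b)

print(both)
```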
The rainfall data from the four stations can now be visualised together in a single dygraph with a range selector.
galcorbeldub_ts %>% dygraph(width=800,height=360) %>% dyRangeSelector
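The stations can also be shown in separate dygraphs. Because all four share the same ‘group’ value ("gal_dub_belf_cor"), they are synchronised: zooming or selecting a time range in one applies the same window to the others.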
cor_ts %>% dygraph(width=800,height=170,group="gal_dub_belf_cor",main="Cork Airport") %>% dyRangeSelector()
gal_ts %>% dygraph(width=800,height=170,group="gal_dub_belf_cor",main="University College Galway") %>% dyRangeSelector()
dub_ts %>% dygraph(width=800,height=170,group="gal_dub_belf_cor",main="Dublin Airport") %>% dyRangeSelector()
bel_ts %>% dygraph(width=800,height=170,group="gal_dub_belf_cor",main="Belfast") %>% dyRangeSelector()
These individual dygraphs show that the four weather stations follow broadly similar patterns over the period 1850 to 2014. Rainfall varies considerably from year to year, and peaks appear to recur roughly every decade, although this impression comes from visual inspection rather than a formal test. Some irregular components or outliers, notably around 1900 and 2000, also appear at each station.
The objective of this assignment was to present the rainfall data from the weather stations at Dublin Airport, University College Galway, Belfast, and Cork Airport, and to create dygraphs of these stations showing their rainfall as monthly time series. The dyRangeSelector control made it possible to adjust the time window on the four time series. Dygraphs are a very effective way of visualising and exploring large, complex datasets, and the same approach can be applied to many other variables.