Assignment 2: Creating a dygraph of the weather stations at Belfast, Dublin Airport, University College Galway, and Cork Airport showing time series of rainfall on a monthly basis. Include a RangeSelector control that simultaneously changes the time window on the four time series.
The main working data set used is rainfall.RData. It contains rainfall data from observation stations across the island of Ireland, converted to an R binary data file for ease of use in R. Information about the stations, such as elevation, latitude, longitude and source, is also available within the data file. The data set was obtained through Prof. Christopher Brunsdon and supplied by Prof. Conor Murphy and Simon Noone. No missing values are present, and the data covers every month from 1850 to 2014.
The packages used in RStudio to produce the expected results are dplyr and dygraphs. dplyr follows the tidyverse approach and supports pipelines, a flow-based and more modern method of writing code. For more complicated expressions, a pipeline can be more straightforward and easier to read. The dygraphs package is a dynamic graph library that produces dygraphs from time series. A dygraph is an interactive graph that can show patterns within a time series, as well as additional details at a specific chosen point, for a more interactive interrogation. As dygraphs also works in a pipeline, the dplyr package is a natural companion.
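As a small illustration of why a pipeline can be easier to read, the sketch below computes the same result in nested form and in pipeline form. The vector values is a hypothetical example and is not part of the rainfall data.

```r
library(dplyr)

# Hypothetical example values, not taken from rainfall.RData
values <- c(1, 4, 9)

# Nested form: read from the inside out
nested <- sum(sqrt(values))

# Pipeline form: read left to right, one step at a time
piped <- values %>% sqrt() %>% sum()

# Both forms give the same result
identical(nested, piped)
```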
All data sets and libraries mentioned are loaded into RStudio as shown below:
library(dplyr)
library(dygraphs)
load('rainfall.RData')
For the purposes of describing the code used to produce the final dygraph, two stages are examined: firstly, grouping the data and forming the time series and, secondly, creating the dygraph.
rain %>% group_by(Year,Month) %>% filter(Station=='Belfast') %>%
summarise(Rainfall=sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
ts(start = c(1850,1),frequency = 12) -> bel_ts
rain %>% group_by(Year,Month) %>% filter(Station=='Dublin Airport') %>%
summarise(Rainfall=sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
ts(start = c(1850,1),frequency = 12) -> dub_ts
rain %>% group_by(Year,Month) %>% filter(Station=='University College Galway') %>%
summarise(Rainfall=sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
ts(start = c(1850,1),frequency = 12) -> gal_ts
rain %>% group_by(Year,Month) %>% filter(Station=='Cork Airport') %>%
summarise(Rainfall=sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
ts(start = c(1850,1),frequency = 12) -> cor_ts
bdgc_ts <- cbind(bel_ts,dub_ts,gal_ts,cor_ts)
window(bdgc_ts, c(1850,1), c(1850,5))
## bel_ts dub_ts gal_ts cor_ts
## Jan 1850 115.7 75.8 108.9 155.3
## Feb 1850 120.5 47.8 131.5 92.6
## Mar 1850 56.8 18.5 56.6 56.0
## Apr 1850 142.6 97.5 120.5 207.2
## May 1850 57.9 58.6 69.8 35.3
Firstly, the rain data is grouped using the group_by command. The group_by function does not change the data table itself. Instead, it ensures that subsequent commands are applied within each group, in this case each combination of year and month. Year and month were chosen to ensure that the resulting time series covers the entire range of rain data from 1850 to 2014 on a monthly basis. The data is also filtered to isolate the appropriate station for the monthly time series. The Rainfall variable is then summarised, giving the total rainfall for each year and month at the selected station. The ungroup function is then applied to remove the remaining grouping, so that later steps are not carried out by group. transmute is used to keep only the Rainfall column, discarding all unreferenced variables. Finally, the ts function converts the column to a time series with a frequency of 12 observations per year (that is, monthly), beginning in the first month of 1850.
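To make the meaning of start and frequency in ts concrete, here is a minimal base R sketch. The values are hypothetical and simply stand in for 18 monthly rainfall totals.

```r
# Hypothetical values standing in for 18 monthly rainfall totals
x <- ts(seq(10, 180, by = 10), start = c(1850, 1), frequency = 12)

frequency(x)  # number of observations per year
start(x)      # year and month of the first observation
end(x)        # year and month of the last observation

# Extract November 1850 to February 1851
w <- window(x, start = c(1850, 11), end = c(1851, 2))
w
```

With frequency = 12, the second element of start and end is read as a month number, so c(1850, 11) refers to November 1850.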
In this case, the result is stored as a time series for the Belfast station and named accordingly (bel_ts). The process is then repeated for Dublin Airport, University College Galway and Cork Airport. The cbind function, or column bind, is used to bind all four time series and create a compound time series named bdgc_ts. A sample of the first five months is shown using window.
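Because the same pipeline is repeated for each station, it could also be wrapped in a small helper function. The sketch below is one possible way to do this and is not the code used in this report. The function station_ts and the toy data frame are hypothetical, and the toy data assumes that Month is stored as a number from 1 to 12, which may differ from rainfall.RData.

```r
library(dplyr)

# Hypothetical helper: monthly rainfall totals for one station as a ts object
station_ts <- function(data, station, start_year = 1850) {
  data %>%
    filter(Station == station) %>%
    group_by(Year, Month) %>%
    summarise(Rainfall = sum(Rainfall), .groups = "drop") %>%
    pull(Rainfall) %>%
    ts(start = c(start_year, 1), frequency = 12)
}

# Hypothetical toy data: two stations, two years of monthly values
toy <- expand.grid(Month = 1:12, Year = 1850:1851,
                   Station = c("A", "B"), stringsAsFactors = FALSE)
toy$Rainfall <- seq_len(nrow(toy))

# Bind the per-station series into one compound (multivariate) time series
toy_ts <- cbind(a = station_ts(toy, "A"), b = station_ts(toy, "B"))
window(toy_ts, start = c(1850, 1), end = c(1850, 3))
```

Here summarise is given .groups = "drop", which removes the grouping in the same step and so takes the place of a separate ungroup call.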
dub_ts %>% dygraph(width = 800, height = 109, group = "bdgc_ts", main="Dublin Airport")
bel_ts %>% dygraph(width = 800, height = 140, group = "bdgc_ts", main="Belfast")
gal_ts %>% dygraph(width = 800, height = 140, group = "bdgc_ts", main="University College Galway", ylab = 'Rainfall (mm)')
cor_ts %>% dygraph(width = 800, height = 220, group = "bdgc_ts", main="Cork Airport", xlab = 'Date') %>%
dyRangeSelector()
Secondly, there are different options available for displaying the new time series. A single dygraph with multiple series could show all the data on the same graph, though this makes the data harder to read clearly, particularly with four station time series. Instead, the information is displayed on separate graphs, linked together by the same controls, so that the station data can be easily distinguished. Small multiples of plots can be more effective in conveying information than one plot with several overlapping sets of data.
A plot is created for each of the individual station time series established in the previous section. They are linked through the group option: every plot is given the same group name, here the string "bdgc_ts" (chosen to match the name of the compound time series), so that zooming or panning one plot updates the others. The height and width are set, along with the appropriate title for each station. The heights differ between plots because each was chosen so that a common point of reference, the 200 mm rainfall mark, sits at a similar position in every plot, making it easier to compare patterns between the stations. As the range of rainfall differs between stations, the height is adjusted so that this point of reference is displayed uniformly across the plots.
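An alternative to tuning the plot heights by hand, which is not the approach used in this report, would be to give each plot the same fixed y-axis range and mark the reference level directly. The sketch below assumes a hypothetical series toy_ts and an assumed range of 0 to 500 mm. It uses dyAxis with valueRange and dyLimit from the dygraphs package.

```r
library(dygraphs)

# Hypothetical monthly series standing in for one station
set.seed(1)
toy_ts <- ts(runif(36, min = 0, max = 300), start = c(1850, 1), frequency = 12)

p <- dygraph(toy_ts, main = "Toy station", ylab = "Rainfall (mm)",
             width = 800, height = 150)
# Fix the y-axis range (assumed here to be 0 to 500 mm) so plots are comparable
p <- dyAxis(p, "y", valueRange = c(0, 500))
# Draw a horizontal reference line at the 200 mm mark
p <- dyLimit(p, 200, label = "200 mm")
p
```

With a shared valueRange, plots of equal height would place the 200 mm mark at the same position, at the cost of leaving unused space in plots for drier stations.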
A RangeSelector allows for a more interactive investigation of the data, letting the user change the time window (the x-axis range) for all time series simultaneously. Focusing on distinct areas of interest and setting desired time ranges within the time series are just some of its uses. The dyRangeSelector function is applied to the last time series, in this case the Cork Airport station, as this places the RangeSelector underneath all the graphs, which is clearer visually and easier to use.
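The linking behaviour described above can be sketched with two hypothetical series. The group name "toy_group", the series a and b, and the initial dateWindow are all assumptions made for illustration.

```r
library(dygraphs)

# Two hypothetical monthly series covering the same period
set.seed(42)
a <- ts(runif(60, 0, 200), start = c(1850, 1), frequency = 12)
b <- ts(runif(60, 0, 300), start = c(1850, 1), frequency = 12)

# Plots that share a group name have their x-axis zoom synchronised
top <- dygraph(a, main = "Station A", group = "toy_group",
               width = 800, height = 150)

# The range selector is attached to the last plot only, so it appears
# beneath both graphs; dateWindow sets an assumed initial zoom window
bottom <- dyRangeSelector(
  dygraph(b, main = "Station B", group = "toy_group",
          width = 800, height = 200),
  dateWindow = c("1851-01-01", "1852-12-01")
)

plots <- list(top, bottom)
```

The synchronisation takes effect when the plots are rendered on the same page, for example in the same R Markdown document.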
At first glance, Cork Airport displays a higher degree of rainfall than the other stations, with measurements consistently reaching over 200 mm and the highest overall measurement of 460.5 mm recorded in November 1899. Dublin Airport shows the lowest overall rainfall, with only one recorded observation above 200 mm. Belfast and University College Galway display similar observations, though Galway has slightly higher rainfall measurements and a larger range within its observations.
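Readings taken from the plots, such as the largest observation and the number of months above 200 mm, can also be checked directly on a time series. The sketch below uses a short hypothetical series, not the station data, to show one way of doing this in base R.

```r
# Hypothetical monthly totals (mm); not taken from rainfall.RData
x <- ts(c(50, 120, 80, 210, 95, 60, 30, 45, 110, 150, 305, 90, 70, 40),
        start = c(1850, 1), frequency = 12)

i <- which.max(x)                    # position of the largest observation
max_value <- x[i]
max_year  <- floor(time(x)[i])       # calendar year of that observation
max_month <- month.abb[cycle(x)[i]]  # month name of that observation

# Number of months with more than 200 mm
n_above_200 <- sum(x > 200)

paste(max_month, max_year, max_value)
```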
A low range in rainfall observations is observed at the beginning of the Dublin, Belfast and Galway time series. Both Galway and Cork develop a much greater range beginning in the 1860s and continuing up to the end of the decade. The 1870s and 1890s display some of the highest rainfall observations of that half century across all stations. Rainfall range appears to diminish in size within all stations in the early 1900s, further developing into larger ranges with more consistent high rainfall observations between 1920 and 1950.
Again, lower ranges in rainfall are displayed at the Dublin and Belfast stations between 1960 and 1980, though a significant rainfall event is observed at both stations in December 1978. The Galway and Cork stations appear relatively unchanged up to the 2000s, despite a slight decrease in range in the early 1960s. In contrast to the other stations, rainfall observations in Cork are lower towards the end of the time series than in the station’s previous history. The other three stations display comparatively high rainfall observations within the 21st century, each showing some of the highest measurements of their respective time series.