Create an interactive essay using programming Language R

Creating a dygraph of the weather stations at Belfast, Dublin Airport, University College Galway, and Cork Airport showing time series of rainfall on a monthly basis. Include a RangeSelector control that simultaneously changes the time window on the four time series.


Introduction

This interactive essay will import and utilise a dataset containing rainfall data for 25 weather stations located throughout Ireland. It contains a continuous record from 1850 to 2014. The data was initially collected by amateur meteorologists in an early precursor of citizen science, and later came under the remit of Met Éireann. The dataset was provided to the students of AFF624 - (Mapping and Modelling Space and Time module of the MA in Digital Humanities from NUIM) by Professor Chris Brunsdon, director of the National Centre for Geocomputation, who in turn acknowledges Simon Noone and Conor Murphy for their work on the dataset.


Load the Data and Initial Packages


First we will load the dataset from the working directory of the R programme:

load('rainfall.RData')

We can load the core ‘tidyverse’ package and make it available in the current R session. The core tidyverse includes the packages that we will use for our analysis of the rainfall dataset. We will need to load them separately as needed. They are ‘ggplot’, and ‘dplyr’.

library("tidyverse"); 

Next we will check the data has loaded and view the variables contained with the dataset.

head(stations)
## # A tibble: 6 x 9
##   Station     Elevation Easting Northing   Lat  Long County    Abbr~ Sour~
##   <chr>           <int>   <dbl>    <dbl> <dbl> <dbl> <chr>     <chr> <chr>
## 1 Athboy             87  270400   261700  53.6 -6.93 Meath     AB    Met ~
## 2 Foulksmills        71  284100   118400  52.3 -6.77 Wexford   F     Met ~
## 3 Mullingar         112  241780   247765  53.5 -7.37 Westmeath M     Met ~
## 4 Portlaw             8  246600   115200  52.3 -7.31 Waterford P     Met ~
## 5 Rathdrum          131  319700   186000  52.9 -6.22 Wicklow   RD    Met ~
## 6 Strokestown        49  194500   279100  53.8 -8.10 Roscommon S     Met ~
head(rain)
## # A tibble: 6 x 4
##    Year Month Rainfall Station
##   <dbl> <ord>    <dbl> <chr>  
## 1  1850 Jan      169   Ardara 
## 2  1851 Jan      236   Ardara 
## 3  1852 Jan      250   Ardara 
## 4  1853 Jan      209   Ardara 
## 5  1854 Jan      188   Ardara 
## 6  1855 Jan       32.3 Ardara

The Data Frames


The first tible table shows the first six values for the nine variables in the table: Station (Location), Elevation (height above sea level), Easting, Northing (terms easting and northing are geographic Cartesian coordinates for a point), Lat, Long (longitude and latitude), County, Abbr (Location Abbreviation), Sour (Source: Met Éireann)

The second tible table shows the first six values for the four variables in the table: Year, Month, Rainfall, Station.


Load Dplyr


Next we will load dplyr.

‘dplyr’ is a package to manipulate data easily and solve the most common data manipulation challenges:

Using the group_by() function allows any operation to be preformed “by group”. http://dplyr.tidyverse.org/

A very advantageous feature of ‘dpylr’ is the command %>% which act as a pipeline, making the code easier to write and clearer to read.

library("dplyr");

Start to Manipulate the data


Now we will take the rain data and pipeline it. First by grouping the stations and then using the summarise function to calculate the mean rainfall for each station.

We will use the group() command to group by stations, and get each station’s mean rainfall using the command summarise, which applies a summary function to each rainfall entry and create a new data frame with one rainfall entry for each station. Any R summary function can be used as such; e.g. median,sd,max or a user defined function. We will store that new data frame in a new variable called rain_summary. We will do this by using the -> assignment operator. We can print the results to see if the summary has worked.

rain %>% group_by(Station) %>% 
  summarise(mrain=mean(Rainfall))  -> rain_summary
head(rain_summary)
## # A tibble: 6 x 2
##   Station    mrain
##   <chr>      <dbl>
## 1 Ardara     140  
## 2 Armagh      68.3
## 3 Athboy      74.7
## 4 Belfast     87.1
## 5 Birr        70.8
## 6 Cappoquinn 121

Mapping the data


To make more sense of the data, it might be useful to locate the weather stations on some maps so as to elucidate the graph data which will follow. For example, to know if a station is in the East, North-East, or the Midlands, etc., is not apparent from the graph visualisations without a prior knowledge of the geography. It could be significant to know if the east has a higher average rainfall than the west, or vice-a-versa. Similarly, it might be significant to know if the midlands have more or less rain than coastal regions.

To display a map we could load a shape file, or we could use a Leaflet map, or indeed a combination of both. For example, the counties could be loaded onto a Leaflet map as shape files provided their projection is latitude / longitude.

Leaflet is an open-source JavaScript library for interactive maps. http://leafletjs.com/ The map base imagery data consists of tiled raster canvases. These can be replaced by themed canvases which are offered by different providers with differing conditions and rights.


Load Leaflet Map


library(leaflet) 

Load leaflet Map and display stations


On mouse-over or click, the name of the station will be displayed.

leaflet(data=stations,height=650,width=820) %>% addProviderTiles('CartoDB.Positron')  %>%
  setView(-8,53.5,7) %>% addCircleMarkers(~Long, ~Lat, popup = ~as.character(Station), label = ~as.character(Station))

Visually Analysing the Data


We have been tasked with displaying the data in a time series defined by monthly rain fall, so we will now group the rain data by month, gets the mean values and stores the summary in a new variable called rain_months.

rain %>% group_by(Month) %>% 
  summarise(mrain=mean(Rainfall)) -> rain_months
head(rain_months)
## # A tibble: 6 x 2
##   Month mrain
##   <ord> <dbl>
## 1 Jan   113  
## 2 Feb    83.2
## 3 Mar    79.5
## 4 Apr    68.7
## 5 May    71.3
## 6 Jun    72.7

Now we can carry out some basic visual analysis.We will start with a simple bar plot.

barplot(rain_months$mrain,names=rain_months$Month,las=1,col='dodgerblue')

With our first visualisation of the data we can see clearly that precipitation is highest on the turn or the year, so that in January, it is only slightly below the maximum which is in December. From January the mean drops sharply, and in February, there is a steadier drop to April. Then there is a gradual rise over the following months back up to maximum precipitation in December.


Visualisation Examples


We will load ggplot2.

ggplot2 is a system for declaratively creating graphics. The user provides the data, tells ggplot2 how to map the variables to aesthetics, what graphical primitives to use, and then ggplot2 will generate graphical plots of the details provided.

library("ggplot2"); 

There follows some examples of how the data can be visualised.

rain %>% group_by(Year) %>% 
  summarise(total_rain=sum(Rainfall)) -> rain_years
ggplot(aes(x=Year,y=total_rain, col='indianred'),data=rain_years) + geom_line(stat='identity') + labs(y='Total Annual Rainfall')


Is a trend discernible? It is hard to detect with this type of chart with this scope, but we can apply a gemon_smoth() to the graph. This will average the graph, weighting the smoothing line towards the centre in order to draw a graphic line. The graphic also shows an amount of variability around the average.

So we will apply the Smoothing below.


rain %>% group_by(Year) %>% 
  summarise(total_rain=sum(Rainfall)) -> rain_years
ggplot(aes(x=Year,y=total_rain, col='indianred'),data=rain_years) + geom_line(stat='identity') + geom_smooth(col='dodgerblue') + labs(y='Total Annual Rainfall')
## `geom_smooth()` using method = 'loess'


Having applied the gemon_smoth() to the graph, we can see a clear upward trend in precipitation levels.

Is this trend the same for all stations? We can use a coplot of the stations to see them all individually but within the same field of view on screen. This will enable us to quickly see if the upward trend is present in all stations.


rain %>% group_by(Year, Station) %>% 
  summarise(total_rain=sum(Rainfall)) -> rain_years_st
ggplot(aes(x=Year,y=total_rain),data=rain_years_st) + 
  geom_smooth() + labs(y='Total Annual Rainfall') +
  facet_wrap(~Station,nrow=5) + theme_dark()
## `geom_smooth()` using method = 'loess'


Even at this scale, we can discern an upward trend across the stations as represented by the blue line tilting upward from left to right, with only one outlier. The most precipitous station, Ardara, has a fall towards the end representing a drop in precipitation in more recent years. We can also see at a glance that a number of stations, for example, Ardara, Valentia, and Killarney, have a much higher rainfall than most. If we were not familiar with the geography, it might be useful to combine this data with a map to see if these stations present some kind of geographical pattern.

In order to do this we need to link the geographical location of the stations to the rainfall data.


rain %>% group_by(Station) %>%  summarise(mrain=mean(Rainfall))  -> rain_summary
rain_summary %>% left_join(stations) -> station_means
## Joining, by = "Station"

left_join links the Station data frame to rain_summary to join the rainfall mean data to the geographical information about station location.

We will Create the colour profile for the station markers on the map.

color_fun <- colorNumeric('Blues',station_means$mrain)
previewColors(color_fun,fivenum(station_means$mrain))

Colors: color_fun
Values: fivenum(station_means$mrain)

61.432
78.62
87.11
100
140.37

Create the map


Now we can create a map which indicates the mean rain fall for each station by colour shading of the circular markers which identify the stations’ locations. The darker the shade of blue the greater the mean rain fall at that station. The names of the stations can be displayed by clicking on the circular marker.


leaflet(data=station_means,height=650,width=820) %>% addProviderTiles('CartoDB.Positron')  %>%
  setView(-8,53.5,7) %>% addCircleMarkers(fillColor=~color_fun(mrain),weight=0,fillOpacity = 0.85, popup = ~as.character(Station)) %>%
  addLegend(pal=color_fun,values=~mrain,title="Rainfall",position='bottomleft')

We can observe that the stations registering the highest precipitation levels are on the periphery of island. While not all coastal, they are very close to the coast in the south, southwest, and northwest. The midlands, in particular the eastern region around Dublin, register the least amount of rainfall. Ardara, the station which measures the most rainfall, lies in the North West close to the Atlantic coast. The combination of map and visualisation of mean rainfall is very useful for quickly observing patterns.


Some other Visualisation options for groupings


In the plot below grouped by year and month, the summary function is used to calculate the total rain fall. We then need to ungroup so that we can mutate the data to display only a specified number of years, filtering out all years beyond our maximum. We will also use a geom smooth to make any trends more discernible.

rain %>% group_by(Year,Month) %>% summarise(rf=sum(Rainfall)) %>% ungroup %>%
  mutate(moyr = 1850 + row_number()/12) %>% filter(Year <= 1870) -> monthly_total
ggplot(aes(x=moyr,y=rf),data=monthly_total) + geom_point() + 
  geom_line(col='indianred') + labs(x='Year',y='Rainfall') + 
  geom_smooth()
## `geom_smooth()` using method = 'loess'


Multiple groupings - average by month/station combination


rain %>% group_by(Month,Station) %>% 
  summarise(mean_rain=mean(Rainfall)) -> rain_season_station

The code above groups the Month and Station and combines them into a new dataframe called rain_season_station. Now we can use the reshape2package to rearrange the data to produce a a 2D array (below), which we can use to produce a head map.

library(reshape2)
rain_season_station %>% acast(Station~Month) %>% head
##                  Jan       Feb       Mar      Apr      May       Jun
## Ardara     174.82606 126.82303 123.02000 98.79333 96.90727 105.24061
## Armagh      74.57242  55.97182  56.48879 53.67030 59.23182  62.72939
## Athboy      84.94759  62.62133  62.44944 58.97874 62.16260  68.11460
## Belfast    101.20718  74.50206  73.10221 65.90492 69.23426  74.48525
## Birr        79.92074  57.88501  58.42056 54.07187 60.25831  61.96445
## Cappoquinn 153.97159 117.77099 110.02890 94.00365 95.81437  94.86357
##                  Jul       Aug       Sep       Oct       Nov       Dec
## Ardara     123.70485 145.24788 152.80727 174.44788 176.45030 186.14182
## Armagh      72.50636  81.92182  69.02576  80.94242  73.73121  79.05939
## Athboy      76.28662  88.84710  76.35077  88.14627  80.95767  87.05997
## Belfast     87.70003 102.44499  87.11622 106.35394 100.31760 102.95073
## Birr        75.10084  86.75669  72.21722  83.80810  77.87266  81.74334
## Cappoquinn 104.09372 125.42245 116.04466 146.70658 141.10012 154.87402

Generate a headmap

library(reshape2)
rain_season_station %>% acast(Station~Month) %>% heatmap(Colv=NA)
## Using mean_rain as value column: use value.var to override.


Time Series


Creating a dygraph of the weather stations at Belfast, Dublin Airport, University College Galway, and Cork Airport showing time series of rainfall on a monthly basis. Include a RangeSelector control that simultaneously changes the time window on the four time series.

Now we have explored some visualisation techniques and how to customise them, we can begin to answer the task set as outlined above. We will explore how to create a time series and manipulate the visualisations.

The R time-series object

ts - A time series object that is useful for many functions.

Examples of time-series visualisations

Below the monthly total for the years 1850 (Jan) to 1853 (Dec). Specified by concatenating c the years within our specified window,window.

monthly_total$rf %>% ts(freq=12,start=1850) -> rain_ts # Time serioes object
rain_ts %>% window(c(1850,1),c(1853,12)) # window from jan 1850 - dec 1853
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 1850 2836.3 2158.9  964.1 3457.2 1492.1 1362.4 2584.8 1906.4 1763.1 1567.8
## 1851 4875.5 1379.9 1997.9 1368.1 1124.2 2537.1 2258.3 2416.2 1234.5 2834.2
## 1852 4036.1 2311.2 1116.0 1110.0 1754.6 4484.9 1716.0 2752.0 1433.1 2217.7
## 1853 3618.6 1073.9 2088.0 1881.9  829.1 1936.7 2512.3 2072.0 1504.1 4627.6
##         Nov    Dec
## 1850 2828.5 2600.1
## 1851 1134.3 1719.2
## 1852 5581.2 5205.9
## 1853 2651.1 1209.3

The plot below displays the same scope (all stations and all years) as the first bar chart in this essay, but using the total rainfall instead of mean. Its different visualisation technique provides more information than a simple bar chart.


rain_ts %>% 
  monthplot(col='dodgerblue',col.base='indianred',lwd.base=3)


We will create a new time series variable called rain_ts and ascribe it the values of total rainfall and group it by month and year. Again this needs to be ungrouped to specify a window. First establish the start as being Jan 1850 going up in frequencies of 12 (months). Then specify the window. We will display this as a tible to see if it worked.


rain %>%  group_by(Year,Month) %>% 
  summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>% 
  ts(start=c(1850,1),freq=12) -> rain_ts
rain_ts %>% window(c(1870,1),c(1871,12))
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 1870 2666.2 1975.3 1500.5 1024.8 1862.8  789.2 1038.6 1510.5 2045.5 5177.6
## 1871 3148.3 2343.7 1731.7 2654.5  657.6 2040.1 3705.0 1869.9 2083.4 2774.3
##         Nov    Dec
## 1870 1733.2 1902.2
## 1871 2000.1 1902.0

Creating a dynamic graph (dygraph)


Now we can load dygraphs library to create dynamic interactive graphs. Then the desired data frame can be displayed as a dynamic graph. This is achieved by using the %>% command to pipeline the rain_ts into a dygraph.


library(dygraphs) # A dynamic graph library
rain_ts %>% dygraph (width=800,height=500,x='Year') # Try moving the pointer along the curve

This dygraph looks rather cluttered and difficult to read at this scale. It also lacks a controlling interface; therefore, we will add a range selector.


rain_ts %>% dygraph(width=800,height=300) %>% dyRangeSelector

We need to isolate individual stations, let us start with Belfast. First we will pipeline the data for Belfast into a new time series variable called bel_ts.


rain %>%  group_by(Year,Month) %>% filter(Station=="Belfast") %>%
    summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
    ts(start=c(1850,1),freq=12) ->  bel_ts

Now we will create the dygraph.


bel_ts %>% dygraph(width=800,height=360, main='Belfast') %>% dyRangeSelector

Any station can be queried in this way and the results stored in a new variable, We will do this for the remaining stations of Dublin Airport, University College Galway, and Cork Airport.


rain %>%  group_by(Year,Month) %>% filter(Station=="Dublin Airport") %>%
    summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
    ts(start=c(1850,1),freq=12) ->  dub_ts

dub_ts %>% dygraph(width=800,height=360, main='Dublin Airport') %>% dyRangeSelector

rain %>%  group_by(Year,Month) %>% filter (Station=="University College Galway") %>%
    summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
    ts(start=c(1850,1),freq=12) ->  gal_ts
gal_ts %>% dygraph(width=800,height=360, main='University College Galway') %>% dyRangeSelector

rain %>%  group_by(Year,Month) %>% filter(Station=="Cork Airport") %>%
    summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
    ts(start=c(1850,1),freq=12) ->  cor_ts
cor_ts %>% dygraph(width=800,height=360, main='Cork Airport') %>% dyRangeSelector

We can display all of the above as a four-way comparison dynamic graph. First we need to create a new variable which will store the data of the four grouped stations using the combine command cbind, and, as always, the assignment operator ->


comp4_ts <- cbind(bel_ts,dub_ts, gal_ts, cor_ts)

Now to create a dygraph displaying four stations.


We can use the dyOptions to change attributes of the graph, for example the stroke width and/or colours.

comp4_ts %>% dygraph(width=800,height=360) %>%
  dyOptions(strokeWidth = 2, colors = RColorBrewer::brewer.pal(4, "Set2")) %>% dyRangeSelector

It is difficult to draw conclusions as a non meteorologist, however, the rain fall at the four stations seems to correspond closely for the most part, in that if it is a particularly precipitous period, levels tend to be elevated in all, and the converse is true of periods of low precipitation. However, sometimes the stations diverge considerably, as in Oct 1902, while at tother times they correlate very closely as in Oct 2002.

As can be seen from above, this type of graph is very useful for comparing data, however, for the amount of data and the range of years, it can be difficult to read with more than two lines (stations in this case) and becomes more difficult as the scope is increased to include a greater time period. It would be more advantageous to separate out the stations and group them to allow one range control to manipulate all stations. This is also the objective of this assignment.


bel_ts %>% dygraph(width=800,height=130,group="bdgc_four",main='Belfast')
dub_ts %>% dygraph(width=800,height=130,group="bdgc_four",main='Dublin Airport') 
gal_ts %>% dygraph(width=800,height=130,group="bdgc_four",main='University College Galway') 
cor_ts %>% dygraph(width=800,height=170,group="bdgc_four",main='Cork Airport') %>% dyRangeSelector


We can hide the code in RMarkdown so we can read the charts more easily.




A blog associated with my MA course work can be found here: http://dhprojects.maynoothuniversity.ie/2017/ssourke/