Create an interactive essay using programming Language R

Creating a dygraph of the weather stations at Belfast, Dublin Airport, University College Galway, and Cork Airport showing time series of rainfall on a monthly basis. Include a RangeSelector control that simultaneously changes the time window on the four time series.


Introduction

This interactive essay will import and utilise a dataset containing rainfall data for 25 weather stations located throughout in Ireland. It contains an impressive continuous record from 1850 to 2014. The data was sourced by Met Éireann and provided to the students of AFF624 - (Mapping Modelling Space and Time module) by Chris Brunsdon, director of the National Centre for Geocomputation, who in turn acknowledges Simon Noone and Conor Murphy for collecting the data.


Load the Data and Initial Packages


First we will load the dataset:

load('rainfall.RData')

We can load the core ‘tidyverse’ package and make it available in your current R session. The core tidyverse includes the packages that we will use for our analysis of the rainfall dataset. We will need to load them separately as needed. They are ‘ggplot’, and ‘dplyr’.

library("tidyverse"); 

Next we will check the data has loaded.

head(stations)
## # A tibble: 6 x 9
##   Station     Elevation Easting Northing   Lat  Long County    Abbr~ Sour~
##   <chr>           <int>   <dbl>    <dbl> <dbl> <dbl> <chr>     <chr> <chr>
## 1 Athboy             87  270400   261700  53.6 -6.93 Meath     AB    Met ~
## 2 Foulksmills        71  284100   118400  52.3 -6.77 Wexford   F     Met ~
## 3 Mullingar         112  241780   247765  53.5 -7.37 Westmeath M     Met ~
## 4 Portlaw             8  246600   115200  52.3 -7.31 Waterford P     Met ~
## 5 Rathdrum          131  319700   186000  52.9 -6.22 Wicklow   RD    Met ~
## 6 Strokestown        49  194500   279100  53.8 -8.10 Roscommon S     Met ~
head(rain)
## # A tibble: 6 x 4
##    Year Month Rainfall Station
##   <dbl> <ord>    <dbl> <chr>  
## 1  1850 Jan      169   Ardara 
## 2  1851 Jan      236   Ardara 
## 3  1852 Jan      250   Ardara 
## 4  1853 Jan      209   Ardara 
## 5  1854 Jan      188   Ardara 
## 6  1855 Jan       32.3 Ardara

The Data Frames


The first tible table shows the first six values for the nine variables in the table: Station (Location), Elevation (Height above sea level), Easting, Northing (terms easting and northing are geographic Cartesian coordinates for a point), Lat, Long (longitude and latitude), County, Abbr(Location Abbreviation), Sour (Source: Met Éireann)

The second tible table shows the first six values for the four variables in the table: Year, Month, Rainfall, Station.


Load Dplyr


Next we will load dplyr.

‘dplyr’ is a package to manipulate data easily and solve the most common data manipulation challenges:

Using the group_by() function allows any operation to be preformed “by group”. http://dplyr.tidyverse.org/

A very advantageous feature of ‘dpylr’ is the command %>% which act as a pipeline, making the code easier to write and clearer to read.

library("dplyr");

Start to Manipulate the data


Now we will take the rain data and pipeline them, first by group and then summarise the mean rainfall. Here we are taking the variables and reducing the multiple values down to a single summary.

We will use the group() command to group them by station, and get the station’s mean rainfall. Summarise applies a summary function to each group and creates a new data frame with one entry for each group. Any R summary function can be used - eg median,sd,max or a user defined function.We will store that new data frame in a new variable called rain_summary and We will do this by using the -> assignment operator. And we can print to see if the summary is worked.

rain %>% group_by(Station) %>% 
  summarise(mrain=mean(Rainfall))  -> rain_summary
head(rain_summary)
## # A tibble: 6 x 2
##   Station    mrain
##   <chr>      <dbl>
## 1 Ardara     140  
## 2 Armagh      68.3
## 3 Athboy      74.7
## 4 Belfast     87.1
## 5 Birr        70.8
## 6 Cappoquinn 121

Mapping the data


To make more sense of the data, it might be useful to locate the weather stations on some maps so as to elucidate the graph data which will follow. For example, to know if a station is in the East, North-East, or the Midlands, etc., is not apparent from the graph visualisations without a prior knowledge of the geography. It could be significant to know if the east has a higher average rainfall than the west, or vice-a-versa. Similarly, it might be significant to know if the midlands has more or less rain than coastal regions.

To display a map we could load a shape file, or we could use a Leaflet map, or indeed a combination of both. For example, the counties could be loaded onto a Leaflet map as shape files so long as their projection is latitude / longitude.

Leaflet is an open-source JavaScript library for interactive maps. http://leafletjs.com/ The map base imagery data consists of tiled raster canvases. These can be replaced by themed canvases which are offered by different providers with differing conditions and rights.


Load Leaflet Map


library(leaflet) 

Load leaflet Map and display stations


On mouse-over or click, the name of the station will display

leaflet(data=stations,height=650,width=820) %>% addProviderTiles('CartoDB.Positron')  %>%
  setView(-8,53.5,7) %>% addCircleMarkers(~Long, ~Lat, popup = ~as.character(Station), label = ~as.character(Station))

Visually Analysing the Data


We have been tasked with displaying the data in a time series defined by monthly rain fall, so we will now create a variable that groups the rain data by month, gets the mean values and stores the summary in a new variable called rain_months

rain %>% group_by(Month) %>% 
  summarise(mrain=mean(Rainfall)) -> rain_months
head(rain_months)
## # A tibble: 6 x 2
##   Month mrain
##   <ord> <dbl>
## 1 Jan   113  
## 2 Feb    83.2
## 3 Mar    79.5
## 4 Apr    68.7
## 5 May    71.3
## 6 Jun    72.7

Now we can try to basic visual analysis.

barplot(rain_months$mrain,names=rain_months$Month,las=1,col='dodgerblue')

With our first visualisation of the data we can see clearly that precipitation is highest on the turn or the year, so that in January, it is only slightly below maximum in the previous month. From January the mean drops sharply, and in February, there is a steadier drop to April. Then there is a gradual rise over the following months back up to maximum precipitation in December.


Visualisation Examples


We will load ggplot2.

ggplot2 is a system for declaratively creating graphics, where the user provides the data, tells ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes will generate graphical plots of the details provided.

library("ggplot2"); 

There follows some examples of how the data can be visualised.

rain %>% group_by(Year) %>% 
  summarise(total_rain=sum(Rainfall)) -> rain_years
ggplot(aes(x=Year,y=total_rain, col='indianred'),data=rain_years) + geom_line(stat='identity') + labs(y='Total Annual Rainfall')

Is a trend discernible? It is hard to detect with this type of chart, but we can apply a gemon_smoth() to the graph. This will average graph, weighting the smoothing line towards the centre to draw a graphic line. The graphic also shows an amount of variability around the average.

rain %>% group_by(Year) %>% 
  summarise(total_rain=sum(Rainfall)) -> rain_years
ggplot(aes(x=Year,y=total_rain, col='indianred'),data=rain_years) + geom_line(stat='identity') + geom_smooth(col='dodgerblue') + labs(y='Total Annual Rainfall')
## `geom_smooth()` using method = 'loess'

Having applied the gemon_smoth() to the graph, we can see a clear upward trend in precipitation levels.

Is this trend the same for all stations? We can coplot the stations to see them all at once to see if the upward trend is present in all stations.

rain %>% group_by(Year, Station) %>% 
  summarise(total_rain=sum(Rainfall)) -> rain_years_st
ggplot(aes(x=Year,y=total_rain),data=rain_years_st) + 
  geom_smooth() + labs(y='Total Annual Rainfall') +
  facet_wrap(~Station,nrow=5) + theme_dark()
## `geom_smooth()` using method = 'loess'

Even at this scale, we can discern an upward trend across the stations as represented by the blue line tilting upward from left to right, with only one outlier, the most precipitous station, Arddara, which has a fall towards the end representing a drop in precipitation. We can also see that a number of stations, Ardara, Valentia, Killarney, if we were not familiar with the geography, it might be useful to combine this data with a map to see if these present some kind of geographical pattern.

We need to link the geographical location of the stations to the rainfall data.

rain %>% group_by(Station) %>%  summarise(mrain=mean(Rainfall))  -> rain_summary
rain_summary %>% left_join(stations) -> station_means
## Joining, by = "Station"

left_join links the stations data frame to rain_summary to get the geographical information.

Create the colour profile

color_fun <- colorNumeric('Blues',station_means$mrain)
previewColors(color_fun,fivenum(station_means$mrain))

Colors: color_fun
Values: fivenum(station_means$mrain)

61.432
78.62
87.11
100
140.37

Create the map

Now we can create a map which shows the mean rain fall for each station through colour shading the circular markers which identify the stations locations. The darker the shade of blue the greater the mean rain fall. The names of the stations can be got by clicking on the circular marker.

leaflet(data=station_means,height=650,width=820) %>% addProviderTiles('CartoDB.Positron')  %>%
  setView(-8,53.5,7) %>% addCircleMarkers(fillColor=~color_fun(mrain),weight=0,fillOpacity = 0.85, popup = ~as.character(Station)) %>%
  addLegend(pal=color_fun,values=~mrain,title="Rainfall",position='bottomleft')

We can observe that the stations registering the highest precipitation levels are on the periphery of island, while not all coastal, they are very close to the coast in the south, south west, and north west. The midlands and particularly the eastern region around Dublin register the least rainfall. Ardara, the station which measures the most rainfall, lies in the North West. The combination of map and visualisation of mean rainfall is very useful for quickly observing patterns.

Some other Visualisation options for groupings

rain %>% group_by(Year,Month) %>% summarise(rf=sum(Rainfall)) %>% ungroup %>%
  mutate(moyr = 1850 + row_number()/12) %>% filter(Year <= 1870) -> monthly_total
ggplot(aes(x=moyr,y=rf),data=monthly_total) + geom_point() + 
  geom_line(col='indianred') + labs(x='Year',y='Rainfall') + 
  geom_smooth()
## `geom_smooth()` using method = 'loess'

Multiple groupings - eg average by month/station combination

rain %>% group_by(Month,Station) %>% 
  summarise(mean_rain=mean(Rainfall)) -> rain_season_station

In the coad above we have grouped the Mothns and Stations and combined them into a new dataframe called rain_season_station. Now we can use the reshape2package to rearrange the data to produce a a 2D array (Below), which we can use to produce a head map.

library(reshape2)
rain_season_station %>% acast(Station~Month) %>% head
##                  Jan       Feb       Mar      Apr      May       Jun
## Ardara     174.82606 126.82303 123.02000 98.79333 96.90727 105.24061
## Armagh      74.57242  55.97182  56.48879 53.67030 59.23182  62.72939
## Athboy      84.94759  62.62133  62.44944 58.97874 62.16260  68.11460
## Belfast    101.20718  74.50206  73.10221 65.90492 69.23426  74.48525
## Birr        79.92074  57.88501  58.42056 54.07187 60.25831  61.96445
## Cappoquinn 153.97159 117.77099 110.02890 94.00365 95.81437  94.86357
##                  Jul       Aug       Sep       Oct       Nov       Dec
## Ardara     123.70485 145.24788 152.80727 174.44788 176.45030 186.14182
## Armagh      72.50636  81.92182  69.02576  80.94242  73.73121  79.05939
## Athboy      76.28662  88.84710  76.35077  88.14627  80.95767  87.05997
## Belfast     87.70003 102.44499  87.11622 106.35394 100.31760 102.95073
## Birr        75.10084  86.75669  72.21722  83.80810  77.87266  81.74334
## Cappoquinn 104.09372 125.42245 116.04466 146.70658 141.10012 154.87402

Generate a headmap

library(reshape2)
rain_season_station %>% acast(Station~Month) %>% heatmap(Colv=NA)
## Using mean_rain as value column: use value.var to override.


Time Series


Creating a dygraph of the weather stations at Belfast, Dublin Airport, University College Galway, and Cork Airport showing time series of rainfall on a monthly basis. Include a RangeSelector control that simultaneously changes the time window on the four time series.

Now we have explored some visualisation techniques and how to customise them, we can begin to answer the task set as outlined above. We will explore how to create a time series and manipulate the visualisations.

The R time-series object ts - A time series object that is useful for some functions

Examples of TS Visualisations

monthly_total$rf %>% ts(freq=12,start=1850) -> rain_ts # Time serioes object
rain_ts %>% window(c(1850,1),c(1853,12)) # window from jan 1850 - dec 1853
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 1850 2836.3 2158.9  964.1 3457.2 1492.1 1362.4 2584.8 1906.4 1763.1 1567.8
## 1851 4875.5 1379.9 1997.9 1368.1 1124.2 2537.1 2258.3 2416.2 1234.5 2834.2
## 1852 4036.1 2311.2 1116.0 1110.0 1754.6 4484.9 1716.0 2752.0 1433.1 2217.7
## 1853 3618.6 1073.9 2088.0 1881.9  829.1 1936.7 2512.3 2072.0 1504.1 4627.6
##         Nov    Dec
## 1850 2828.5 2600.1
## 1851 1134.3 1719.2
## 1852 5581.2 5205.9
## 1853 2651.1 1209.3

The plot below displays the same information as the first bar chart in this essay, but using a different visualisation technique. Monthly average rainfall for the entire period.


rain_ts %>% 
  monthplot(col='dodgerblue',col.base='indianred',lwd.base=3)


We will create a new time series variable called rain_ts and ascribe it the values of rain total rainfall and group by month and year. We will also establish the start as being Jan 1850 going up in frequencies of 12 (months ). We will display this as a tible to see if it worked.


rain %>%  group_by(Year,Month) %>% 
  summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>% 
  ts(start=c(1850,1),freq=12) -> rain_ts
rain_ts %>% window(c(1870,1),c(1871,12))
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 1870 2666.2 1975.3 1500.5 1024.8 1862.8  789.2 1038.6 1510.5 2045.5 5177.6
## 1871 3148.3 2343.7 1731.7 2654.5  657.6 2040.1 3705.0 1869.9 2083.4 2774.3
##         Nov    Dec
## 1870 1733.2 1902.2
## 1871 2000.1 1902.0

Now we can load dygraphs to create dynamic interactive graphs and pipeine the rain_ts into a dygraph.


library(dygraphs) # A dynamic graph library
rain_ts %>% dygraph (width=800,height=500,x='Year') # Try moving the pointer along the curve

This is rather messy and lacks control; therefore, we will add a range selector.


rain_ts %>% dygraph(width=800,height=300) %>% dyRangeSelector

We need to isolate individual stations, let us start with Belfast. First we will pipeline the data for Belfast into a new time series variable called bel_ts


rain %>%  group_by(Year,Month) %>% filter(Station=="Belfast") %>%
    summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
    ts(start=c(1850,1),freq=12) ->  bel_ts

Now we will create the dygraph


bel_ts %>% dygraph(width=800,height=360, main='Belfast') %>% dyRangeSelector

Any station can be queried in this way and the results stored in a new variable, We will do this for the remain stations of Dublin Airport, University College Galway, and Cork Airport.


rain %>%  group_by(Year,Month) %>% filter(Station=="Dublin Airport") %>%
    summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
    ts(start=c(1850,1),freq=12) ->  dub_ts

dub_ts %>% dygraph(width=800,height=360, main='Dublin Airport') %>% dyRangeSelector

rain %>%  group_by(Year,Month) %>% filter (Station=="University College Galway") %>%
    summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
    ts(start=c(1850,1),freq=12) ->  gal_ts
gal_ts %>% dygraph(width=800,height=360, main='University College Galway') %>% dyRangeSelector

rain %>%  group_by(Year,Month) %>% filter(Station=="Cork Airport") %>%
    summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
    ts(start=c(1850,1),freq=12) ->  cor_ts
cor_ts %>% dygraph(width=800,height=360, main='Cork Airport') %>% dyRangeSelector

We can display all of the above as a four-way comparison dynamic graph - with rolling mean. First we need to create a new variable which will assign the four grouped stations using the combine command, and, as always, the assignment operator ->


comp4_ts <- cbind(bel_ts,dub_ts, gal_ts, cor_ts)

Now to create a dygraph.


comp4_ts %>% dygraph(width=800,height=360) %>% dyRangeSelector

As can be seen this type of graph could be very useful for comparing data, however, for the amount of data and the range of years it is less suitable. It would be more advantageous to separate out the stations and group them to allow one range control to manipulate all stations. This is also the objective of this assignment.


bel_ts %>% dygraph(width=800,height=170,group="bdgc_four",main='Belfast')
dub_ts %>% dygraph(width=800,height=130,group="bdgc_four",main='Dublin Airport') 
gal_ts %>% dygraph(width=800,height=130,group="bdgc_four",main='University College Galway') 
cor_ts %>% dygraph(width=800,height=170,group="bdgc_four",main='Cork Airport') %>% dyRangeSelector

We can hide the code in RMarkdown so we can read the charts more easily.