This interactive essay imports and uses a dataset containing rainfall data for 25 weather stations located throughout Ireland. It contains an impressive continuous record from 1850 to 2014. The data was sourced by Met Éireann and provided to the students of AFF624 (the Mapping Modelling Space and Time module) by Chris Brunsdon, director of the National Centre for Geocomputation, who in turn acknowledges Simon Noone and Conor Murphy for collecting the data.
First we will load the dataset:
load('rainfall.RData')
We can load the core ‘tidyverse’ package to make it available in the current R session. The core tidyverse includes the packages that we will use for our analysis of the rainfall dataset, namely ‘ggplot2’ and ‘dplyr’. Loading ‘tidyverse’ attaches both of them, although we will also load each one explicitly later, at the point where it is introduced.
library("tidyverse");
Next we will check the data has loaded.
head(stations)
## # A tibble: 6 x 9
## Station Elevation Easting Northing Lat Long County Abbr~ Sour~
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
## 1 Athboy 87 270400 261700 53.6 -6.93 Meath AB Met ~
## 2 Foulksmills 71 284100 118400 52.3 -6.77 Wexford F Met ~
## 3 Mullingar 112 241780 247765 53.5 -7.37 Westmeath M Met ~
## 4 Portlaw 8 246600 115200 52.3 -7.31 Waterford P Met ~
## 5 Rathdrum 131 319700 186000 52.9 -6.22 Wicklow RD Met ~
## 6 Strokestown 49 194500 279100 53.8 -8.10 Roscommon S Met ~
head(rain)
## # A tibble: 6 x 4
## Year Month Rainfall Station
## <dbl> <ord> <dbl> <chr>
## 1 1850 Jan 169 Ardara
## 2 1851 Jan 236 Ardara
## 3 1852 Jan 250 Ardara
## 4 1853 Jan 209 Ardara
## 5 1854 Jan 188 Ardara
## 6 1855 Jan 32.3 Ardara
The Data Frames
The first tibble shows the first six values for the nine variables in the table: Station (location), Elevation (height above sea level), Easting and Northing (geographic Cartesian coordinates for a point), Lat and Long (latitude and longitude), County, Abbr (location abbreviation) and Sour (source: Met Éireann).
The second tibble shows the first six values for the four variables in the table: Year, Month, Rainfall and Station.
Load dplyr
Next we will load dplyr.
‘dplyr’ is a package for manipulating data easily; it solves the most common data manipulation challenges:
Using the group_by() function allows any operation to be performed “by group”. http://dplyr.tidyverse.org/
A very advantageous feature of ‘dplyr’ is the %>% operator (the pipe, which dplyr re-exports from the magrittr package). It passes the result of one step on as the input of the next, making the code easier to write and clearer to read.
library("dplyr");
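To illustrate what the pipe does, here is a minimal sketch using a made-up numeric vector (not the rainfall data). The nested call and the piped call compute the same value, but the piped version reads left to right in the order the steps are applied.

```r
library(dplyr)

# Made-up values, for illustration only
x <- c(4, 9, 16)

# Nested form: must be read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped form: each step receives the result of the previous one as its first argument
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)  # TRUE
```

The same left-to-right reading applies to the -> operator used in this essay, which stores the result of the whole pipeline in the name written at the end.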
Now we will take the rain data and pipe it through two steps: first grouping, then summarising the mean rainfall. Here we are taking a variable and reducing its many values down to a single summary per group.
We will use the group_by() function to group the rows by station and then get each station’s mean rainfall. summarise() applies a summary function to each group and creates a new data frame with one entry for each group. Any R summary function can be used, e.g. median, sd, max or a user-defined function. We will store the new data frame in a variable called rain_summary using the -> assignment operator, and print its first rows to check that the summary has worked.
rain %>% group_by(Station) %>%
summarise(mrain=mean(Rainfall)) -> rain_summary
head(rain_summary)
## # A tibble: 6 x 2
## Station mrain
## <chr> <dbl>
## 1 Ardara 140
## 2 Armagh 68.3
## 3 Athboy 74.7
## 4 Belfast 87.1
## 5 Birr 70.8
## 6 Cappoquinn 121
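As noted above, summarise() is not limited to the mean. The following sketch uses a small made-up table (toy_rain, standing in for rain) and a hypothetical user-defined function, rain_range(), to show several summaries computed per station in one call.

```r
library(dplyr)

# Made-up toy data standing in for the `rain` tibble (values are illustrative only)
toy_rain <- tibble(
  Station  = c("A", "A", "A", "B", "B", "B"),
  Rainfall = c(100, 120, 140, 50, 60, 100)
)

# Hypothetical user-defined summary: the spread between the wettest and driest value
rain_range <- function(x) max(x) - min(x)

toy_rain %>%
  group_by(Station) %>%
  summarise(
    mrain    = mean(Rainfall),       # mean, as in the essay
    med_rain = median(Rainfall),     # median
    sd_rain  = sd(Rainfall),         # standard deviation
    max_rain = max(Rainfall),        # maximum
    spread   = rain_range(Rainfall)  # user-defined function
  ) -> toy_summary

print(toy_summary)
```

Each summary becomes one column of the result, and each group (here, each station) becomes one row.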
To make more sense of the data, it is useful to locate the weather stations on a map so as to elucidate the graphs which follow. Whether a station is in the east, the north-east or the midlands, for example, is not apparent from the graphs without prior knowledge of the geography. It could be significant to know whether the east has a higher average rainfall than the west, or vice versa. Similarly, it might be significant to know whether the midlands have more or less rain than the coastal regions.
To display a map we could load a shape file, or we could use a Leaflet map, or indeed a combination of both. For example, the counties could be loaded onto a Leaflet map as shape files so long as their projection is latitude / longitude.
Leaflet is an open-source JavaScript library for interactive maps. http://leafletjs.com/ The map base imagery data consists of tiled raster canvases. These can be replaced by themed canvases which are offered by different providers with differing conditions and rights.
Load Leaflet Map
library(leaflet)
Load leaflet Map and display stations
On mouse-over or click, the name of the station will display
leaflet(data=stations,height=650,width=820) %>% addProviderTiles('CartoDB.Positron') %>%
setView(-8,53.5,7) %>% addCircleMarkers(~Long, ~Lat, popup = ~as.character(Station), label = ~as.character(Station))
We have been tasked with displaying the data as a time series of monthly rainfall, so we will now group the rain data by month, take the mean values and store the summary in a new variable called rain_months.
rain %>% group_by(Month) %>%
summarise(mrain=mean(Rainfall)) -> rain_months
head(rain_months)
## # A tibble: 6 x 2
## Month mrain
## <ord> <dbl>
## 1 Jan 113
## 2 Feb 83.2
## 3 Mar 79.5
## 4 Apr 68.7
## 5 May 71.3
## 6 Jun 72.7
Now we can try some basic visual analysis.
barplot(rain_months$mrain,names=rain_months$Month,las=1,col='dodgerblue')
With our first visualisation of the data we can see clearly that precipitation is highest at the turn of the year: January is only slightly below the maximum reached in the previous month, December. From January the mean drops sharply to February, after which there is a steadier decline to April. There is then a gradual rise over the following months back up to the maximum in December.
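The size of these month-to-month changes can be checked with a little arithmetic. The sketch below uses only the six rounded values printed by head(rain_months) above, so it covers January to June and says nothing about the second half of the year.

```r
# First six monthly means, copied from the head(rain_months) output (rounded as printed)
mrain <- c(Jan = 113, Feb = 83.2, Mar = 79.5, Apr = 68.7, May = 71.3, Jun = 72.7)

# Change from each month to the next; the name is the month being moved into
changes <- diff(mrain)
print(round(changes, 1))

# Driest of these six months, and the month with the largest drop
names(which.min(mrain))    # "Apr"
names(which.min(changes))  # "Feb"
```

The drop into February (about 30 units) is much larger than the later steps down to April, and the changes turn positive from May onwards.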
We will load ggplot2.
ggplot2 is a system for declaratively creating graphics: the user provides the data and tells ggplot2 how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details of generating the plot.
library("ggplot2");
There follow some examples of how the data can be visualised.
rain %>% group_by(Year) %>%
summarise(total_rain=sum(Rainfall)) -> rain_years
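# The line colour is a fixed value, so it is set in geom_line() rather than inside aes(),
# where it would be treated as a data variable and produce a spurious legend
ggplot(aes(x=Year,y=total_rain),data=rain_years) + geom_line(col='indianred') + labs(y='Total Annual Rainfall')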
Is a trend discernible? It is hard to detect with this type of chart, but we can add a geom_smooth() layer to the graph. This fits a smoothed line through the data (here a loess local regression, in which the points nearest to each position are weighted most heavily). The shaded band around the line shows the amount of variability around the smoothed average.
rain %>% group_by(Year) %>%
summarise(total_rain=sum(Rainfall)) -> rain_years
# Fixed colours are set in each geom, outside aes()
ggplot(aes(x=Year,y=total_rain),data=rain_years) + geom_line(col='indianred') + geom_smooth(col='dodgerblue') + labs(y='Total Annual Rainfall')
## `geom_smooth()` using method = 'loess'
Having added geom_smooth() to the graph, we can see a clear upward trend in precipitation levels.
Is this trend the same for all stations? We can plot the stations side by side in a grid of panels to see whether the upward trend is present in all of them.
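To show roughly what the smoother computes, the following sketch fits a loess curve directly with base R. It uses synthetic data (a made-up upward trend plus random noise, not the real annual totals), and the span of 0.75 is an assumption chosen to match the usual loess default. The band is an approximate 95% interval, so it may differ slightly from the one ggplot2 draws.

```r
set.seed(1)

# Synthetic stand-in for rain_years: a gentle upward trend plus noise (illustrative only)
toy <- data.frame(Year = 1850:2014)
toy$total_rain <- 25000 + 10 * (toy$Year - 1850) + rnorm(nrow(toy), sd = 2000)

# Local regression: each fitted value is a weighted fit to nearby years
fit  <- loess(total_rain ~ Year, data = toy, span = 0.75)
pred <- predict(fit, newdata = toy, se = TRUE)

toy$smooth <- pred$fit
toy$lower  <- pred$fit - 1.96 * pred$se.fit  # approximate lower edge of the band
toy$upper  <- pred$fit + 1.96 * pred$se.fit  # approximate upper edge of the band

head(toy)
```

The smoothed column varies far less from year to year than the raw totals, which is why a trend that is hidden in the jagged line becomes visible.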
rain %>% group_by(Year, Station) %>%
summarise(total_rain=sum(Rainfall)) -> rain_years_st
ggplot(aes(x=Year,y=total_rain),data=rain_years_st) +
geom_smooth() + labs(y='Total Annual Rainfall') +
facet_wrap(~Station,nrow=5) + theme_dark()
## `geom_smooth()` using method = 'loess'
Even at this scale, we can discern an upward trend across the stations, represented by the blue line tilting upward from left to right, with only one outlier: the wettest station, Ardara, where the line falls towards the end, representing a drop in precipitation. We can also see that a number of stations, such as Ardara, Valentia and Killarney, stand apart from the rest. If we were not familiar with the geography, it might be useful to combine this data with a map to see whether these stations present some kind of geographical pattern.
We need to link the geographical location of the stations to the rainfall data.
rain %>% group_by(Station) %>% summarise(mrain=mean(Rainfall)) -> rain_summary
rain_summary %>% left_join(stations) -> station_means
## Joining, by = "Station"
left_join links the stations data frame to rain_summary to get the geographical information.
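A left join keeps every row of the left-hand table and adds the matching columns from the right-hand table. The sketch below shows this on two small made-up tables (the station names and coordinates are invented); naming the key with by = "Station" also avoids the "Joining, by" message.

```r
library(dplyr)

# Made-up summary table: three stations with a mean rainfall each
toy_summary <- tibble(Station = c("A", "B", "C"), mrain = c(120, 70, 95))

# Made-up station table: location details for only two of them
toy_stations <- tibble(
  Station = c("A", "B"),
  Lat     = c(54.8, 53.3),
  Long    = c(-8.4, -6.2)
)

# Every row of toy_summary is kept; station "C" has no match, so it gets NA coordinates
toy_summary %>% left_join(toy_stations, by = "Station") -> toy_means
print(toy_means)
```

A station with missing coordinates would not be drawn on the map, so checking the joined table for NA values is a useful safeguard.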
Create the colour profile
color_fun <- colorNumeric('Blues',station_means$mrain)
previewColors(color_fun,fivenum(station_means$mrain))
## Colours assigned by color_fun to fivenum(station_means$mrain) (colour swatches not reproduced): 61.432, 78.62, 87.11, 100, 140.37
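The colour function can also be inspected without a preview widget. This sketch rebuilds a palette from the five summary values shown above (rather than from the full station_means table, which is an assumption made to keep the example self-contained) and prints the hex colour assigned to each value.

```r
library(leaflet)

# Five-number summary of mean rainfall, copied from the output above
five <- c(61.432, 78.62, 87.11, 100, 140.37)

# colorNumeric() returns a function that maps numbers in the domain to colours
toy_color_fun <- colorNumeric("Blues", domain = range(five))

# Each value gets its own shade; larger values map to darker blues
shades <- toy_color_fun(five)
print(data.frame(mrain = five, colour = shades))
```

Values outside the domain would come back as the palette's missing-value colour, which is why the domain is taken from the data being mapped.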
Create the map
Now we can create a map which shows the mean rainfall for each station by shading the circular markers that identify the stations’ locations. The darker the shade of blue, the greater the mean rainfall. The name of a station can be seen by clicking on its circular marker.
leaflet(data=station_means,height=650,width=820) %>% addProviderTiles('CartoDB.Positron') %>%
setView(-8,53.5,7) %>% addCircleMarkers(fillColor=~color_fun(mrain),weight=0,fillOpacity = 0.85, popup = ~as.character(Station)) %>%
addLegend(pal=color_fun,values=~mrain,title="Rainfall",position='bottomleft')
We can observe that the stations registering the highest precipitation levels are on the periphery of the island; while not all are coastal, they are very close to the coast in the south, south-west and north-west. The midlands and particularly the eastern region around Dublin register the least rainfall. Ardara, the station which measures the most rainfall, lies in the north-west. The combination of map and visualisation of mean rainfall is very useful for quickly observing patterns.
rain %>% group_by(Year,Month) %>% summarise(rf=sum(Rainfall)) %>% ungroup %>%
mutate(moyr = 1850 + row_number()/12) %>% filter(Year <= 1870) -> monthly_total
ggplot(aes(x=moyr,y=rf),data=monthly_total) + geom_point() +
geom_line(col='indianred') + labs(x='Year',y='Rainfall') +
geom_smooth()
## `geom_smooth()` using method = 'loess'
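The moyr column in the code above turns the row number into a decimal year so that months can be placed on a continuous axis. As a small sketch of the arithmetic (with an assumed 24 monthly rows), the original formula places the first month at 1850 plus one twelfth, whereas subtracting one from the row number would place January 1850 at exactly 1850. Either convention gives the same shape of plot, shifted by one month.

```r
n <- 24  # assumed number of monthly rows, for illustration

# As in the essay: the first row is at 1850 + 1/12
moyr_original <- 1850 + seq_len(n) / 12

# Alternative convention: the first row is at exactly 1850
moyr_shifted <- 1850 + (seq_len(n) - 1) / 12

head(data.frame(row = seq_len(n), moyr_original, moyr_shifted), 3)
```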
Multiple groupings, e.g. the average for each month/station combination
rain %>% group_by(Month,Station) %>%
summarise(mean_rain=mean(Rainfall)) -> rain_season_station
In the code above we have grouped the data by Month and Station and stored the result in a new data frame called rain_season_station. Now we can use the reshape2 package to rearrange the data into a 2D array (shown below), which we can use to produce a heat map.
library(reshape2)
rain_season_station %>% acast(Station~Month) %>% head
## Jan Feb Mar Apr May Jun
## Ardara 174.82606 126.82303 123.02000 98.79333 96.90727 105.24061
## Armagh 74.57242 55.97182 56.48879 53.67030 59.23182 62.72939
## Athboy 84.94759 62.62133 62.44944 58.97874 62.16260 68.11460
## Belfast 101.20718 74.50206 73.10221 65.90492 69.23426 74.48525
## Birr 79.92074 57.88501 58.42056 54.07187 60.25831 61.96445
## Cappoquinn 153.97159 117.77099 110.02890 94.00365 95.81437 94.86357
## Jul Aug Sep Oct Nov Dec
## Ardara 123.70485 145.24788 152.80727 174.44788 176.45030 186.14182
## Armagh 72.50636 81.92182 69.02576 80.94242 73.73121 79.05939
## Athboy 76.28662 88.84710 76.35077 88.14627 80.95767 87.05997
## Belfast 87.70003 102.44499 87.11622 106.35394 100.31760 102.95073
## Birr 75.10084 86.75669 72.21722 83.80810 77.87266 81.74334
## Cappoquinn 104.09372 125.42245 116.04466 146.70658 141.10012 154.87402
Generate a heat map
library(reshape2)
rain_season_station %>% acast(Station~Month) %>% heatmap(Colv=NA)
## Using mean_rain as value column: use value.var to override.
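The message above appears because acast() had to guess which column holds the values. The sketch below, on a small made-up long-format table, names that column explicitly with value.var, which silences the message and makes the reshaping step clearer.

```r
library(reshape2)

# Made-up long-format table: one row per month/station combination
toy_long <- data.frame(
  Month     = factor(rep(c("Jan", "Feb"), each = 2), levels = c("Jan", "Feb")),
  Station   = rep(c("A", "B"), times = 2),
  mean_rain = c(170, 75, 125, 56)
)

# Rows become stations, columns become months, cells hold mean_rain
toy_wide <- acast(toy_long, Station ~ Month, value.var = "mean_rain")
print(toy_wide)
```

The result is a numeric matrix with station names as row names, which is the form that heatmap() expects.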
Creating a dygraph of the weather stations at Belfast, Dublin Airport, University College Galway, and Cork Airport showing time series of rainfall on a monthly basis. Include a RangeSelector control that simultaneously changes the time window on the four time series.
Now that we have explored some visualisation techniques and how to customise them, we can begin to address the task outlined above. We will explore how to create a time series and manipulate the visualisations.
The R time-series object ts: a time series object that some functions require as input
Examples of TS Visualisations
monthly_total$rf %>% ts(freq=12,start=1850) -> rain_ts # Time series object
rain_ts %>% window(c(1850,1),c(1853,12)) # Window from Jan 1850 to Dec 1853
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1850 2836.3 2158.9 964.1 3457.2 1492.1 1362.4 2584.8 1906.4 1763.1 1567.8
## 1851 4875.5 1379.9 1997.9 1368.1 1124.2 2537.1 2258.3 2416.2 1234.5 2834.2
## 1852 4036.1 2311.2 1116.0 1110.0 1754.6 4484.9 1716.0 2752.0 1433.1 2217.7
## 1853 3618.6 1073.9 2088.0 1881.9 829.1 1936.7 2512.3 2072.0 1504.1 4627.6
## Nov Dec
## 1850 2828.5 2600.1
## 1851 1134.3 1719.2
## 1852 5581.2 5205.9
## 1853 2651.1 1209.3
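For readers new to ts objects, here is a minimal base R sketch on made-up values (the numbers 1 to 24 standing in for two years of monthly rainfall). It shows how the start and frequency arguments attach calendar positions to a plain vector, and how window() then selects by date rather than by index. The freq argument used in this essay is a shortened spelling of frequency.

```r
# Made-up monthly values for 1850 and 1851 (illustrative only)
toy_ts <- ts(1:24, start = c(1850, 1), frequency = 12)
print(toy_ts)

# Select November 1850 to February 1851 by (year, month) pairs
win <- window(toy_ts, start = c(1850, 11), end = c(1851, 2))
print(win)

# cycle() gives the position of each observation within the year
print(cycle(win))
```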
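The plot below displays the same seasonal pattern as the first bar chart in this essay, but using a different visualisation technique: a month plot, which draws the series of yearly values for each calendar month together with a horizontal line at that month’s mean. Because rain_ts was built from monthly_total, it covers 1850 to 1870 and holds totals summed over all stations rather than averages.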
rain_ts %>%
monthplot(col='dodgerblue',col.base='indianred',lwd.base=3)
We will now rebuild the time series variable rain_ts from the full record. We group the data by year and month, sum the rainfall within each group, and convert the result to a time series that starts in January 1850 and has a frequency of 12 observations (months) per year. We will print a two-year window of it to check that it worked.
rain %>% group_by(Year,Month) %>%
summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
ts(start=c(1850,1),freq=12) -> rain_ts
rain_ts %>% window(c(1870,1),c(1871,12))
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1870 2666.2 1975.3 1500.5 1024.8 1862.8 789.2 1038.6 1510.5 2045.5 5177.6
## 1871 3148.3 2343.7 1731.7 2654.5 657.6 2040.1 3705.0 1869.9 2083.4 2774.3
## Nov Dec
## 1870 1733.2 1902.2
## 1871 2000.1 1902.0
Now we can load dygraphs to create dynamic interactive graphs and pipe rain_ts into a dygraph.
library(dygraphs) # A dynamic graph library
rain_ts %>% dygraph(width=800,height=500,xlab='Year') # Try moving the pointer along the curve
This is rather messy and lacks control; therefore, we will add a range selector.
rain_ts %>% dygraph(width=800,height=300) %>% dyRangeSelector
We need to isolate individual stations, so let us start with Belfast. First we will pipe the data for Belfast into a new time series variable called bel_ts.
rain %>% group_by(Year,Month) %>% filter(Station=="Belfast") %>%
summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
ts(start=c(1850,1),freq=12) -> bel_ts
Now we will create the dygraph
bel_ts %>% dygraph(width=800,height=360, main='Belfast') %>% dyRangeSelector
Any station can be queried in this way and the results stored in a new variable. We will do this for the remaining stations: Dublin Airport, University College Galway and Cork Airport.
rain %>% group_by(Year,Month) %>% filter(Station=="Dublin Airport") %>%
summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
ts(start=c(1850,1),freq=12) -> dub_ts
dub_ts %>% dygraph(width=800,height=360, main='Dublin Airport') %>% dyRangeSelector
rain %>% group_by(Year,Month) %>% filter (Station=="University College Galway") %>%
summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
ts(start=c(1850,1),freq=12) -> gal_ts
gal_ts %>% dygraph(width=800,height=360, main='University College Galway') %>% dyRangeSelector
rain %>% group_by(Year,Month) %>% filter(Station=="Cork Airport") %>%
summarise(Rainfall=sum(Rainfall)) %>% ungroup %>% transmute(Rainfall) %>%
ts(start=c(1850,1),freq=12) -> cor_ts
cor_ts %>% dygraph(width=800,height=360, main='Cork Airport') %>% dyRangeSelector
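Since the same steps are repeated for every station, they could be wrapped in a small helper. The function below, station_ts(), is a hypothetical name introduced here for illustration and is not part of any package. It is sketched on made-up data and assumes one row per station per month, so no summing is needed.

```r
library(dplyr)

# Made-up toy data: two stations, two years of monthly values (illustrative only)
toy_rain <- tibble(
  Year     = rep(rep(1850:1851, each = 12), times = 2),
  Month    = factor(rep(month.abb, times = 4), levels = month.abb, ordered = TRUE),
  Rainfall = c(1:24, 101:124),
  Station  = rep(c("A", "B"), each = 24)
)

# Hypothetical helper: monthly time series for one station
# Assumes a complete record with exactly one row per month from January of start_year
station_ts <- function(data, station, start_year = 1850) {
  data %>%
    filter(Station == station) %>%  # keep one station
    arrange(Year, Month) %>%        # put rows in chronological order
    pull(Rainfall) %>%              # extract the values as a plain vector
    ts(start = c(start_year, 1), frequency = 12)
}

b_ts <- station_ts(toy_rain, "B")
window(b_ts, start = c(1851, 1), end = c(1851, 3))
```

With the real data, a call such as station_ts(rain, "Belfast") would play the role of the longer pipeline above, provided the assumptions in the comments hold.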
We can display all of the above as a four-way comparison in a single dynamic graph. First we need to create a new variable which combines the four station series using the cbind() command, this time with the assignment operator <-
comp4_ts <- cbind(bel_ts,dub_ts, gal_ts, cor_ts)
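When time series are combined with cbind(), the result is a multivariate time series with one column per input. The sketch below uses two made-up series to show this, and also shows that supplying names in the call gives the columns more readable labels; the names First and Second are arbitrary placeholders.

```r
# Two made-up monthly series covering the same period (illustrative only)
a_ts <- ts(1:24,    start = c(1850, 1), frequency = 12)
b_ts <- ts(101:124, start = c(1850, 1), frequency = 12)

# Columns are named after the objects by default
both_ts <- cbind(a_ts, b_ts)
colnames(both_ts)

# Supplying names in the call overrides the defaults
named_ts <- cbind(First = a_ts, Second = b_ts)
colnames(named_ts)

# The time index is shared, so window() applies to all columns at once
window(named_ts, start = c(1850, 1), end = c(1850, 3))
```

Column names of this kind are what a plotting function has available when labelling each series.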
Now to create a dygraph.
comp4_ts %>% dygraph(width=800,height=360) %>% dyRangeSelector
As can be seen, this type of graph could be very useful for comparing data. However, for this amount of data and this range of years it is less suitable. It would be more advantageous to separate out the stations and group them so that one range control manipulates all of them. This is also the objective of this assignment.
bel_ts %>% dygraph(width=800,height=170,group="bdgc_four",main='Belfast')
dub_ts %>% dygraph(width=800,height=130,group="bdgc_four",main='Dublin Airport')
gal_ts %>% dygraph(width=800,height=130,group="bdgc_four",main='University College Galway')
cor_ts %>% dygraph(width=800,height=170,group="bdgc_four",main='Cork Airport') %>% dyRangeSelector
We can hide the code in RMarkdown so we can read the charts more easily.
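One way to do this, sketched here on the assumption that the document is knitted with knitr (the usual engine for R Markdown), is to switch off code echoing in a setup chunk near the top of the file. Individual chunks can alternatively set echo=FALSE in their own chunk header.

```r
# Placed in a setup chunk: hide the source code of all later chunks,
# while still running them and showing their output
knitr::opts_chunk$set(echo = FALSE)
```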