Introduction

R is a useful programming language for undertaking statistical analysis of many kinds of data. However, it is also an apt tool for communicating data analysis results through appropriate data visualisation techniques. In this exercise, the steps taken to create a web map of median rainfall amounts (for the period 1850-2014) at weather stations across Ireland for the month of January will be documented. The aim of such data visualisation being to effectively and simply communicate patterns observed in the data in a spatial context. By using RMarkdown, the R code employed at each stage of the data exploration and analysis process will show transparency in the code used and decisions made and will accompany any discussion of this process or subsequent results. This is to keep with the ethos of transparency and reproducible research (Brunsdon, 2018a). Lastly, a discussion of the visual patterns which emerge will conclude this investigation.

The Importance of Rainfall Data

Rainfall is a key climate variable of which reliable historical observations (data) can be analysed to investigate how climate change may have impacted rainfall amounts, patterns, or more broadly the water cycle, in a given location over time (Met Eireann, 2018a). With the existence of the Island of Ireland Precipitation (IIP) series, it is possible to assess median rainfall at stations across Ireland for a 164 year period. As such, rainfall amounts (a continuous ratio variable bounded at a value of 0 (NOAA, n.d.)) will be explored and mapped in this exercise.

Methods

Sourcing the data

Firstly, the working directory was set to access to rainfall and station data using the setwd() function (the function not included as R code here as it is not appropriate for the format of this document).

setwd("C:/Users/Jordan/Desktop/R/R Assignment 2")

The data for this project came from two data sets contained within the ‘rainfall.Rdata’ file (which was downloaded in a ready to use format from Moodle and saved in the working directory folder). The rainfall.Rdata file was first loaded. All of the code from this point onwards is sourced from the lectures given as part of the Methods & Techniques in Geocomputation module (Brunsdon, 2018b).

load('rainfall.RData')

The head () function was used to check the contents of the two tables.

  1. ‘rain’ was revealed to contains daily rainfall records for each weather station across Ireland and,
head(rain)
##   Year Month Rainfall Station
## 1 1850   Jan    169.0  Ardara
## 2 1851   Jan    236.4  Ardara
## 3 1852   Jan    249.7  Ardara
## 4 1853   Jan    209.1  Ardara
## 5 1854   Jan    188.5  Ardara
## 6 1855   Jan     32.3  Ardara
  1. ‘stations’ contained a multitude of attributes for each station, namely coordinate information, elevation, the county where it is located, and importantly a column referencing the source of these attributes.
head(stations)
##       Station Elevation Easting Northing   Lat  Long    County
## 1      Athboy        87  270400   261700 53.60 -6.93     Meath
## 2 Foulksmills        71  284100   118400 52.30 -6.77   Wexford
## 3   Mullingar       112  241780   247765 53.47 -7.37 Westmeath
## 4     Portlaw         8  246600   115200 52.28 -7.31 Waterford
## 5    Rathdrum       131  319700   186000 52.91 -6.22   Wicklow
## 6 Strokestown        49  194500   279100 53.75 -8.10 Roscommon
##   Abbreviation      Source
## 1           AB Met Eireann
## 2            F Met Eireann
## 3            M Met Eireann
## 4            P Met Eireann
## 5           RD Met Eireann
## 6            S Met Eireann

From the above head() functions, it was observed that the variables of specific interest (rainfall amounts in January, station names, and coordinate information) for mapping stations and their associated median rainfall in January were initially in two separate objects. Prior to beginning the process of combining these datasets for the desired attributes, the libraries necessary for analysis were loaded and the datasets explored further.

Exploring the data prior to mapping

library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("knitr")
## Warning: package 'knitr' was built under R version 3.5.2
library("leaflet")
library("RColorBrewer")

‘knitr’ is essential for publishing this document in a HTML blog format with the associated chunks of R code upon completion. The error message highlighted the fact that the knitr package was built under a different version of R. However, it did not cause a problem to use.

The ‘dplyr’ library is essential for data manipulation functions such as pipe operator for inserting values into functions (Brunsdon, 2018c), and the mutate and filter functions which are important for producing a map of median rainfall for only January over the time period. With regard to the warning message which appeared upon loading dplyr, this was important to recognise as it informed us that the objects would be masked from the stats and base packages i.e. that the dplyr library would be referred to for these functions.

‘leaflet’ is essential for designing and creating the interactive map of derived medians by station.

‘RColorBrewer’ will allow for colour palettes which work well on maps.

Summary statistics of the rainfall values

summary(rain$Rainfall)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   53.60   82.20   90.97  118.10  460.50
sd(rain$Rainfall)
## [1] 51.60415

Using the summary() function gave a high level glance at indicators such as the mean and median to assess central tendency and the standard deviation gave hints to the underlying distribution of the daily rainfall amounts - giving an indication of what can be viewed as comparatively high levels of rain for the period, and similarly with lower levels. The sd() gaves the standard deviation for the dataset - used to quickly check how far from the mean rainfall amount an observation is likely to lie in.

mn.allrain <- mean(rain$Rainfall)
md.allrain <- median(rain$Rainfall)
mn.allrain
## [1] 90.97078
md.allrain
## [1] 82.2

Visualising the distribution of the rainfall values

hist(rain$Rainfall, main="Distribution of Rainfall Value amounts recorded (1850-2014)")
abline(v=c(md.allrain,mn.allrain),col=c("red","blue"),lty=c(2,2),lwd=c(2,2))

Hist() visualised this distribution and a positive skew was revealed - typical of rainfall data and many datasets. Given the summary() showed the mean was a higher value than the median (indicated by the broken lines on the histogram)

Blue - Mean Red - Median

It is important to note this distribution when considering variance, but bearing in mind that this histogram represents the combined values of all stations and the median is being used, there was no transformation applied.

Deriving and grouping median rainfall amounts by station to investigate differences

rain %>% group_by(Station) %>% 
  summarise(medianrain=median(Rainfall)) -> rain_summary

The code above illustrates the use of the pipeline operator (%>%). Essentially, this code created the object ‘rain_summary’ by taking the ‘rain’ data, grouping it by the stations column within the same object, and using the summarise function to determine the median for each of the stations individually and create a new attribute coloumn.

head(rain_summary)
## # A tibble: 6 x 2
##   Station    medianrain
##   <chr>           <dbl>
## 1 Ardara          132. 
## 2 Armagh           65.4
## 3 Athboy           70.7
## 4 Belfast          82.1
## 5 Birr             66.2
## 6 Cappoquinn      113.

A preview of the new ‘rain_summary’ object then previewed the results.

barplot(rain_summary$medianrain, main = "Median rainfall amounts over entire period (1850-2014) by station", 
        xlab = "Stations", ylab = "Rainfall (mm)", 
        col=("darkblue"), names.arg = rain_summary$Station)

A quick visualisation using the barplot function shows much variation in median values between stations at an initial glance. Ardara and Valentia specifically seem to show some of the highest values - both in the most western regions of Ireland, albeit at opposite ends of the country in terms of latitude. An interesting observation to consider for all of the stations, hinting at a potential underlying spatial aspect or gradient to the varying rainfall amounts observed.

Deriving and grouping the median rainfall by month

rain %>% group_by(Month) %>% 
  summarise(mrain=median(Rainfall)) -> mrain_allmonths

Similar to the previous chunk looking at grouping by stations, the above code makes use of the ‘month’ column in the ‘rain’ object to group the data for calculation of monthly medians.

head(mrain_allmonths)
## # A tibble: 6 x 2
##   Month mrain
##   <fct> <dbl>
## 1 Jan   105. 
## 2 Feb    74.4
## 3 Mar    73.1
## 4 Apr    62.5
## 5 May    65.5
## 6 Jun    65.2

The head() function again showed the varying medians by month for all of the stations across all of the months.

barplot(mrain_allmonths$mrain,names=mrain_allmonths$Month,
main = "Median rainfall amounts over entire period (1850-2014) by months",
las=3,col='dodgerblue')

The barpplot() function above allows for the visualisation of median values by month. At first glance, a seasonal patterns appears evident. December and January show very similar median values, constituting the highest of the year. However, notable peaks are evidence in October and August too. A much lower median value is observed from April through July. Thus, this graph shows a clear temporal dimension to rainfall as a variable in this dataset.

With both the temporal and spatial aspect briefly explored graphically and hinted at, the rationale for mapping these median values according to the location and with explicit reference to median values in January became evident: exploring spatial variation across a specified time of year or temporal interval. The remainder of this exercise documents the steps taken to produce the map.

Filtering out January median rainfall totals for each station

rain %>% group_by(Month,Station) %>% 
  summarise(mrain=median(Rainfall)) -> mrain_allmonths_allstations

Before isolating January median values, the above code was used to group the median rainfall amounts for each month and each station.

Checking the results of the grouping of the medians by station and month

head(mrain_allmonths_allstations)
## # A tibble: 6 x 3
## # Groups:   Month [1]
##   Month Station    mrain
##   <fct> <chr>      <dbl>
## 1 Jan   Ardara     172. 
## 2 Jan   Armagh      75  
## 3 Jan   Athboy      87.1
## 4 Jan   Belfast    102. 
## 5 Jan   Birr        77.5
## 6 Jan   Cappoquinn 147.
View(mrain_allmonths_allstations)

This time, the newly created object was inspected more thoroughly to ensure the code had executed as intended. Intended being to group each station’s monthly median.

As there were over 300 entries for this new object, it was not visualised in the same manner. However, this allowed for the filtering out of median totals for each station only for the month of January.

Filtering out January median values for each station and plotting the distribution for all months

rain %>% group_by(Month,Station) %>% 
  filter(Month=='Jan') %>%
  summarise(mrain=median(Rainfall)) -> mrain_Jan
  with(mrain_allmonths_allstations,plot(Month,mrain,main = "Rainfall distributions across months",xlab="Month", ylab="Rainfall amount (mm)", type='l',col='dodgerblue'))

The new ‘mrain_Jan’ constituted the object to be mapped using the leaflet() package and stations coordinate information.

By using the filter() function in addition to the pipeline operator, January median values could be isolated from the ‘rain’ object as well as simulataneously grouping values by station. At the end, the with() function was used to swiftly refer to the object with all of the months present and create a box and whisker plot.

A box and whisker plot was deemed more appropriate to visualise the distributions that occurred throughout the months of the years, and to contextualise the median. As was observed through the plot, variance in the data seemed to increase in the winter months compared to the summer months, signalled by the presence of wider ranges and outliers. With January, the median (black bar) is contextualised. This may be evidence of many higher value events (extreme) occuring in January, perhaps less frequently than rainfall events which produce values closer to the median. However, it’s an interesting consideration when judging the medians for this month specifically.

Plotting the median rainfall for January over the entire series

rain %>% group_by(Year,Month) %>% 
summarise(mrain=median(Rainfall)) %>% filter(Month=='Jan') -> mrain_allJans

plot(mrain_allJans$mrain[1:165],type='b',ann=F, axes=F, cex=0.6)
axis(1,at=seq(1,165,by=10),labels=seq(1850,2014, by=10), axis(2), title('Median Rainfall (mm) for each January 1850-2014'))

Visualising the medians months for January over the time period shows a pattern which is pretty hard to immediately decipher in terms of any emerging trends. However, where the rationale for making this graph came from is easily discernible in the individual years. It shows the huge variability in wetter and drier years throughout the observation, and the importance of considering this variation for the country as a whole (based off these observations) let alone for each station. The medians to be mapped for each station will be a measure of the centre of this variability throughout time.

A quick check on the structure of total monthly rainfall amounts

rain %>% group_by(Year,Month) %>% summarise(MonthlyTotal=sum(Rainfall)) -> monthly_totals

monthly_totals$MonthlyTotal %>% ts(freq=12,start=1850) -> rain_ts
rain_ts %>% 
  monthplot(col='dodgerblue',col.base='indianred',lwd.base=3, main="Monthly Total Rainfall Amounts 1850-2014", xlab="Month", ylab="Rainfall Amount (mm)")

Above, the time series or ts() function was used to create a time series object in R. Effectively, the code above first grouped the rainfall totals by month and then by year, and making use of the summarise () function, created a new column which calculated the sum of these groupings. As such, this column could be fed into a ts() function to indicate a start date, end date, and what frequency of observation (in this case, 12 times a year i.e. each month). To visualise the results, the monthplot() was used as this allows for the frequency of interest to be visualised after being grouped.

Creating Web Maps

Creating interactive map of median rainfall in Ireland (1850-2014)

leaflet(data=stations,height=400,width=360) %>% addTiles %>%
  setView(-8,53.5,6) %>% addCircleMarkers
## Assuming "Long" and "Lat" are longitude and latitude, respectively

The above leaflet() function takes the coordinate references (latitude and longitude) from the Stations object and creates a web map. The height and width parameters were specified and the addTiles () function adds OpenStreet Map vector tiles.

Changing the basemap so it’s easier to interpret graduated colours for median totals

leaflet(data=stations,height=430,width=390) %>% addProviderTiles('CartoDB.Positron')  %>%
  setView(-8,53.5,6) %>% addCircleMarkers(fillColor='green',stroke=FALSE,fillOpacity = 0.8)
## Assuming "Long" and "Lat" are longitude and latitude, respectively

The above function is the same as the previous map, but in this case, the addProviderTiles () makes use of other online map tiles that are available to customise the design. The ‘CartoDB.Positron’, as the name suggests, is provided by Carto and Stamen Design.

Joining the median values to the station points layer

mrain_Jan %>% left_join(stations) -> station_medians
## Joining, by = "Station"

Taking the mrain_Jan object created previously, the left_join() function allowed for the creation of a spatial data object by joining the two tables of interest together: mrain_allJans (the variables to be mapped) and Stations (the points to populate) by a common attribute field - station names.

head(station_medians, n=6)
## # A tibble: 6 x 11
## # Groups:   Month [1]
##   Month Station mrain Elevation Easting Northing   Lat  Long County
##   <fct> <chr>   <dbl>     <int>   <dbl>    <dbl> <dbl> <dbl> <chr> 
## 1 Jan   Ardara  172.         15 180788.  394679.  54.8 -8.29 Doneg~
## 2 Jan   Armagh   75          62 287831.  345772.  54.4 -6.64 Armagh
## 3 Jan   Athboy   87.1        87 270400   261700   53.6 -6.93 Meath 
## 4 Jan   Belfast 102.        115 329623.  363141.  54.5 -5.99 Antrim
## 5 Jan   Birr     77.5        73 208017.  203400.  53.1 -7.88 Offaly
## 6 Jan   Cappoq~ 147.         76 213269.  104800.  52.2 -7.8  Water~
## # ... with 2 more variables: Abbreviation <chr>, Source <chr>

From the above preview, the table join was deemed successful.

Selecting a colour palette from RColorBrewer

display.brewer.pal(5,"BrBG")

BrBG_pal <- brewer.pal(5, "BrBG")

From the Color Brewer website, the ‘BrBG’ divergent palette was discovered (Color Brewer, 2018). Using the RColorBrewer library, the display.brewer.pal () function above showed what the palette would look like for 5 classes. This palette in particular was identified as appropriate due to the idea of brown representing relatively drier conditions while green is a more lush colour representing wetter areas in this case. The BrBG palette was also chosen as it was listed as colour blind friendly.

‘BrBG_pal’ packaged this into an object.

Creating the colour palette for mapping the medians

Rain_pal <- colorNumeric(palette = BrBG_pal, domain = station_medians$mrain)
previewColors(Rain_pal, fivenum(station_medians$mrain))

Colors: Rain_pal
Values: fivenum(station_medians$mrain)

63
92.9
105.7
126.4
177.7

The ‘Rain_pal’ function was created to operationalise a gradient scale for the map legend that would classify rainfall values appropriately. The use of the previewColors function which incoporated the Tukey Five-Number summary function produced a legend scale which gave, from top to bottom respectively, the minimum value, lower-hinge (25th percentile), median, upper-hinge (75th percentile), and maximum value.

Creating the map of median rainfall amounts (for January) for 25 rainfall stations across Ireland

leaflet(data=station_medians,height=450,width=620,) %>% addProviderTiles('CartoDB.Positron')  %>%
  setView(-8,53.5,6) %>% addCircleMarkers(fillColor=~Rain_pal(mrain),weight=0,fillOpacity = 0.85, popup=station_medians$Station) %>%
  addLegend(pal=Rain_pal,values=~mrain,title="Jan Median rainfall (mm) 1850-2014",position='bottomright')
## Assuming "Long" and "Lat" are longitude and latitude, respectively

First, the leaflet() function was used as it enables a Leaflet map widget which could be rendered on this HTML page. The data was sourced from the database with both the January median values and coordinate information. The same basemap were then imported and the circle symbols chosen with the addCircleMarkers () function. The popup argument allowed for a degree of interaction, with the station name displaying when clicked. Lastly, the addLegend () function allowed for positioning, using the specific palette, and creating a title.

Discussion and Conclusion

Perhaps it is sensible to be cautious when interpreting this map due to the spread of values that constitute a median rainfall amount in this map. The strength of the colour associations may not be accurately captured by this color gradient, but this method was employed with the intention of representing these values in a spatial context.

Regardless, It is clear from the above map of median rainfall that there is a general east-west gradient towards wetter conditions on the Atlantic coasts of Ireland as opposed to the eastern Irish Sea coast, with one or two exceptions either side. Dublin stations in particular show a dense concentration of much lower median rainfall values, but these comparatively lower or middle range median values extend throughout the midlands and into the north too. Perhaps this is due to the rainshadow effect of the Wicklow mountains.

Kerry (Kilarney and Valentia stations) and Donegal (Ardara station) stations seem to display the highest median values. This is somewhat expected for two reasons:

  1. Ireland’s prevailing winds are south-westerly which bring moist air over the western part of Ireland, with mid-latitude storm tracks often crossing Ireland in the winter months (Met Eireann, 2018b), and

  2. Donegal and Kerry have a relatively rugged and mountainous landscape compared to many other counties of Ireland and perhaps this plays a role in rainfall amounts at these stations.

To investigate this, the ‘Elevation’ attribute was made use of in the station_medians object. The same process for mapping the median amounts was applied to the elevation values map.

color_elevation <- colorNumeric('YlOrBr',station_medians$Elevation)

leaflet(data=station_medians,height=450,width=620,) %>% addProviderTiles('CartoDB.Positron') %>% setView(-8,53.5,6) %>% addCircleMarkers(fillColor=~color_elevation(Elevation),weight=0,fillOpacity = 0.85, popup=station_medians$Station) %>%
addLegend(pal=color_elevation,values=~Elevation,title="Station Elevations (m)",position='bottomright')
## Assuming "Long" and "Lat" are longitude and latitude, respectively

Despite the reference to Mountainous regions receiving more than 2000 mm annually by Met Eireann (2018a), stations on the west coast appear to be at much lower elevations. Perhaps, initially this is counterintuitive to the mountainous region hypothesis but not necessarily so on further consideration. Rainfall dyanmics in mountainous regions are affected by large scale synoptic conidtions but also hugely influenced by the local terrain.

If these stations lie on the windward side of mountain ranges, their high values may represent this dynamic. Similarly, for stations at high elevations such as Cork and in Wicklow, their higher elevations indicate that they may be located in different environments. However, this is just a consideration that cannot be verified by this data and would need significant analysis of more variables to gain credibility.

In conclusion, this document set out to illustrate the steps taken to produce a web map of Ireland which showed median rainfall amounts at 25 stations for the month of January for the period 1850 - 2014. It is hoped that by using RMarkdown that the methods and logic proposed here are sufficiently clear and transparent for anyone to reproduce this exercise. As shown above, rainfall data is noisy. It is hard to observe any obvious trends on both graphs and maps. Nonetheless, observational records provide a huge resource for statistical analysis of this important variable. Going forward, progress in rescuing historical rainfall data and ensuring current values are captured in areas throughout the world with barely any records will be vital to detecting trends and effects in this climate variable.

Bibliography

Brunsdon, C. (2018a) Reproducible Research Using RMarkdown [RPubs blog accessed through Moodle]. NCG602A: Methods & Techniques in Geocomputation. Maynooth University (accessed 5 January 2018).

Brunsdon, C. (2018b) Further Interactive Techniques [RPubs blog accessed through Moodle]. NCG602A: Methods & Techniques in Geocomputation. Maynooth University (accessed 5 January 2018).

Brunsdon, C. (2018c) Working with dplyr [pdf accessed through Moodle]. NCG602A: Methods & Techniques in Geocomputation. Maynooth University (accessed 2 January 2018).

Color Brewer (2018) Color Brewer 2.0 Color Advice for Cartography [online]. Available at: http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3 (accessed 7 January 2018).

Met Eireann (2018a) Rainfall [online]. Available at: https://www.met.ie/climate/what-we-measure/rainfall (accessed 3 January 2018).

Met Eireann (2018b) Wind [online]. Available at: https://www.met.ie/climate/what-we-measure/wind (accessed 3 January 2018).

National Oceanic and Atmospheric Administration (n.d.) Climate Time Series: Random Variables [online]. Available at: http://www.nws.noaa.gov/om/csd/pds/PCU2/statistics/Stats/part1/CTS_random.htm (accessed 7 January 2018).