R is a useful programming language for undertaking statistical analysis of many kinds of data. However, it is also an apt tool for communicating data analysis results through appropriate data visualisation techniques. In this exercise, the steps taken to create a web map of median rainfall amounts (for the period 1850-2014) at weather stations across Ireland for the month of January will be documented. The aim of such data visualisation being to effectively and simply communicate patterns observed in the data in a spatial context. By using RMarkdown, the R code employed at each stage of the data exploration and analysis process will show transparency in the code used and decisions made and will accompany any discussion of this process or subsequent results. This is to keep with the ethos of transparency and reproducible research (Brunsdon, 2018a). Lastly, a discussion of the visual patterns which emerge will conclude this investigation.
Rainfall is a key climate variable of which reliable historical observations (data) can be analysed to investigate how climate change may have impacted rainfall amounts, patterns, or more broadly the water cycle, in a given location over time (Met Eireann, 2018a). With the existence of the Island of Ireland Precipitation (IIP) series, it is possible to assess median rainfall at stations across Ireland for a 164 year period. As such, rainfall amounts (a continuous ratio variable bounded at a value of 0 (NOAA, n.d.)) will be explored and mapped in this exercise.
Firstly, the working directory was set to access to rainfall and station data using the setwd() function (the function not included as R code here as it is not appropriate for the format of this document).
setwd("C:/Users/Jordan/Desktop/R/R Assignment 2")
The data for this project came from two data sets contained within the ‘rainfall.Rdata’ file (which was downloaded in a ready to use format from Moodle and saved in the working directory folder). The rainfall.Rdata file was first loaded. All of the code from this point onwards is sourced from the lectures given as part of the Methods & Techniques in Geocomputation module (Brunsdon, 2018b).
load('rainfall.RData')
The head () function was used to check the contents of the two tables.
head(rain)
## Year Month Rainfall Station
## 1 1850 Jan 169.0 Ardara
## 2 1851 Jan 236.4 Ardara
## 3 1852 Jan 249.7 Ardara
## 4 1853 Jan 209.1 Ardara
## 5 1854 Jan 188.5 Ardara
## 6 1855 Jan 32.3 Ardara
head(stations)
## Station Elevation Easting Northing Lat Long County
## 1 Athboy 87 270400 261700 53.60 -6.93 Meath
## 2 Foulksmills 71 284100 118400 52.30 -6.77 Wexford
## 3 Mullingar 112 241780 247765 53.47 -7.37 Westmeath
## 4 Portlaw 8 246600 115200 52.28 -7.31 Waterford
## 5 Rathdrum 131 319700 186000 52.91 -6.22 Wicklow
## 6 Strokestown 49 194500 279100 53.75 -8.10 Roscommon
## Abbreviation Source
## 1 AB Met Eireann
## 2 F Met Eireann
## 3 M Met Eireann
## 4 P Met Eireann
## 5 RD Met Eireann
## 6 S Met Eireann
From the above head() functions, it was observed that the variables of specific interest (rainfall amounts in January, station names, and coordinate information) for mapping stations and their associated median rainfall in January were initially in two separate objects. Prior to beginning the process of combining these datasets for the desired attributes, the libraries necessary for analysis were loaded and the datasets explored further.
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("knitr")
## Warning: package 'knitr' was built under R version 3.5.2
library("leaflet")
library("RColorBrewer")
‘knitr’ is essential for publishing this document in a HTML blog format with the associated chunks of R code upon completion. The error message highlighted the fact that the knitr package was built under a different version of R. However, it did not cause a problem to use.
The ‘dplyr’ library is essential for data manipulation functions such as pipe operator for inserting values into functions (Brunsdon, 2018c), and the mutate and filter functions which are important for producing a map of median rainfall for only January over the time period. With regard to the warning message which appeared upon loading dplyr, this was important to recognise as it informed us that the objects would be masked from the stats and base packages i.e. that the dplyr library would be referred to for these functions.
‘leaflet’ is essential for designing and creating the interactive map of derived medians by station.
‘RColorBrewer’ will allow for colour palettes which work well on maps.
summary(rain$Rainfall)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 53.60 82.20 90.97 118.10 460.50
sd(rain$Rainfall)
## [1] 51.60415
Using the summary() function gave a high level glance at indicators such as the mean and median to assess central tendency and the standard deviation gave hints to the underlying distribution of the daily rainfall amounts - giving an indication of what can be viewed as comparatively high levels of rain for the period, and similarly with lower levels. The sd() gaves the standard deviation for the dataset - used to quickly check how far from the mean rainfall amount an observation is likely to lie in.
mn.allrain <- mean(rain$Rainfall)
md.allrain <- median(rain$Rainfall)
mn.allrain
## [1] 90.97078
md.allrain
## [1] 82.2
hist(rain$Rainfall, main="Distribution of Rainfall Value amounts recorded (1850-2014)")
abline(v=c(md.allrain,mn.allrain),col=c("red","blue"),lty=c(2,2),lwd=c(2,2))
Hist() visualised this distribution and a positive skew was revealed - typical of rainfall data and many datasets. Given the summary() showed the mean was a higher value than the median (indicated by the broken lines on the histogram)
Blue - Mean Red - Median
It is important to note this distribution when considering variance, but bearing in mind that this histogram represents the combined values of all stations and the median is being used, there was no transformation applied.
rain %>% group_by(Station) %>%
summarise(medianrain=median(Rainfall)) -> rain_summary
The code above illustrates the use of the pipeline operator (%>%). Essentially, this code created the object ‘rain_summary’ by taking the ‘rain’ data, grouping it by the stations column within the same object, and using the summarise function to determine the median for each of the stations individually and create a new attribute coloumn.
head(rain_summary)
## # A tibble: 6 x 2
## Station medianrain
## <chr> <dbl>
## 1 Ardara 132.
## 2 Armagh 65.4
## 3 Athboy 70.7
## 4 Belfast 82.1
## 5 Birr 66.2
## 6 Cappoquinn 113.
A preview of the new ‘rain_summary’ object then previewed the results.
barplot(rain_summary$medianrain, main = "Median rainfall amounts over entire period (1850-2014) by station",
xlab = "Stations", ylab = "Rainfall (mm)",
col=("darkblue"), names.arg = rain_summary$Station)
A quick visualisation using the barplot function shows much variation in median values between stations at an initial glance. Ardara and Valentia specifically seem to show some of the highest values - both in the most western regions of Ireland, albeit at opposite ends of the country in terms of latitude. An interesting observation to consider for all of the stations, hinting at a potential underlying spatial aspect or gradient to the varying rainfall amounts observed.
rain %>% group_by(Month) %>%
summarise(mrain=median(Rainfall)) -> mrain_allmonths
Similar to the previous chunk looking at grouping by stations, the above code makes use of the ‘month’ column in the ‘rain’ object to group the data for calculation of monthly medians.
head(mrain_allmonths)
## # A tibble: 6 x 2
## Month mrain
## <fct> <dbl>
## 1 Jan 105.
## 2 Feb 74.4
## 3 Mar 73.1
## 4 Apr 62.5
## 5 May 65.5
## 6 Jun 65.2
The head() function again showed the varying medians by month for all of the stations across all of the months.
barplot(mrain_allmonths$mrain,names=mrain_allmonths$Month,
main = "Median rainfall amounts over entire period (1850-2014) by months",
las=3,col='dodgerblue')
The barpplot() function above allows for the visualisation of median values by month. At first glance, a seasonal patterns appears evident. December and January show very similar median values, constituting the highest of the year. However, notable peaks are evidence in October and August too. A much lower median value is observed from April through July. Thus, this graph shows a clear temporal dimension to rainfall as a variable in this dataset.
With both the temporal and spatial aspect briefly explored graphically and hinted at, the rationale for mapping these median values according to the location and with explicit reference to median values in January became evident: exploring spatial variation across a specified time of year or temporal interval. The remainder of this exercise documents the steps taken to produce the map.
rain %>% group_by(Month,Station) %>%
summarise(mrain=median(Rainfall)) -> mrain_allmonths_allstations
Before isolating January median values, the above code was used to group the median rainfall amounts for each month and each station.
head(mrain_allmonths_allstations)
## # A tibble: 6 x 3
## # Groups: Month [1]
## Month Station mrain
## <fct> <chr> <dbl>
## 1 Jan Ardara 172.
## 2 Jan Armagh 75
## 3 Jan Athboy 87.1
## 4 Jan Belfast 102.
## 5 Jan Birr 77.5
## 6 Jan Cappoquinn 147.
View(mrain_allmonths_allstations)
This time, the newly created object was inspected more thoroughly to ensure the code had executed as intended. Intended being to group each station’s monthly median.
As there were over 300 entries for this new object, it was not visualised in the same manner. However, this allowed for the filtering out of median totals for each station only for the month of January.
rain %>% group_by(Month,Station) %>%
filter(Month=='Jan') %>%
summarise(mrain=median(Rainfall)) -> mrain_Jan
with(mrain_allmonths_allstations,plot(Month,mrain,main = "Rainfall distributions across months",xlab="Month", ylab="Rainfall amount (mm)", type='l',col='dodgerblue'))
The new ‘mrain_Jan’ constituted the object to be mapped using the leaflet() package and stations coordinate information.
By using the filter() function in addition to the pipeline operator, January median values could be isolated from the ‘rain’ object as well as simulataneously grouping values by station. At the end, the with() function was used to swiftly refer to the object with all of the months present and create a box and whisker plot.
A box and whisker plot was deemed more appropriate to visualise the distributions that occurred throughout the months of the years, and to contextualise the median. As was observed through the plot, variance in the data seemed to increase in the winter months compared to the summer months, signalled by the presence of wider ranges and outliers. With January, the median (black bar) is contextualised. This may be evidence of many higher value events (extreme) occuring in January, perhaps less frequently than rainfall events which produce values closer to the median. However, it’s an interesting consideration when judging the medians for this month specifically.
rain %>% group_by(Year,Month) %>%
summarise(mrain=median(Rainfall)) %>% filter(Month=='Jan') -> mrain_allJans
plot(mrain_allJans$mrain[1:165],type='b',ann=F, axes=F, cex=0.6)
axis(1,at=seq(1,165,by=10),labels=seq(1850,2014, by=10), axis(2), title('Median Rainfall (mm) for each January 1850-2014'))
Visualising the medians months for January over the time period shows a pattern which is pretty hard to immediately decipher in terms of any emerging trends. However, where the rationale for making this graph came from is easily discernible in the individual years. It shows the huge variability in wetter and drier years throughout the observation, and the importance of considering this variation for the country as a whole (based off these observations) let alone for each station. The medians to be mapped for each station will be a measure of the centre of this variability throughout time.
rain %>% group_by(Year,Month) %>% summarise(MonthlyTotal=sum(Rainfall)) -> monthly_totals
monthly_totals$MonthlyTotal %>% ts(freq=12,start=1850) -> rain_ts
rain_ts %>%
monthplot(col='dodgerblue',col.base='indianred',lwd.base=3, main="Monthly Total Rainfall Amounts 1850-2014", xlab="Month", ylab="Rainfall Amount (mm)")
Above, the time series or ts() function was used to create a time series object in R. Effectively, the code above first grouped the rainfall totals by month and then by year, and making use of the summarise () function, created a new column which calculated the sum of these groupings. As such, this column could be fed into a ts() function to indicate a start date, end date, and what frequency of observation (in this case, 12 times a year i.e. each month). To visualise the results, the monthplot() was used as this allows for the frequency of interest to be visualised after being grouped.
leaflet(data=stations,height=400,width=360) %>% addTiles %>%
setView(-8,53.5,6) %>% addCircleMarkers
## Assuming "Long" and "Lat" are longitude and latitude, respectively
The above leaflet() function takes the coordinate references (latitude and longitude) from the Stations object and creates a web map. The height and width parameters were specified and the addTiles () function adds OpenStreet Map vector tiles.
leaflet(data=stations,height=430,width=390) %>% addProviderTiles('CartoDB.Positron') %>%
setView(-8,53.5,6) %>% addCircleMarkers(fillColor='green',stroke=FALSE,fillOpacity = 0.8)
## Assuming "Long" and "Lat" are longitude and latitude, respectively
The above function is the same as the previous map, but in this case, the addProviderTiles () makes use of other online map tiles that are available to customise the design. The ‘CartoDB.Positron’, as the name suggests, is provided by Carto and Stamen Design.
mrain_Jan %>% left_join(stations) -> station_medians
## Joining, by = "Station"
Taking the mrain_Jan object created previously, the left_join() function allowed for the creation of a spatial data object by joining the two tables of interest together: mrain_allJans (the variables to be mapped) and Stations (the points to populate) by a common attribute field - station names.
head(station_medians, n=6)
## # A tibble: 6 x 11
## # Groups: Month [1]
## Month Station mrain Elevation Easting Northing Lat Long County
## <fct> <chr> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Jan Ardara 172. 15 180788. 394679. 54.8 -8.29 Doneg~
## 2 Jan Armagh 75 62 287831. 345772. 54.4 -6.64 Armagh
## 3 Jan Athboy 87.1 87 270400 261700 53.6 -6.93 Meath
## 4 Jan Belfast 102. 115 329623. 363141. 54.5 -5.99 Antrim
## 5 Jan Birr 77.5 73 208017. 203400. 53.1 -7.88 Offaly
## 6 Jan Cappoq~ 147. 76 213269. 104800. 52.2 -7.8 Water~
## # ... with 2 more variables: Abbreviation <chr>, Source <chr>
From the above preview, the table join was deemed successful.
display.brewer.pal(5,"BrBG")
BrBG_pal <- brewer.pal(5, "BrBG")
From the Color Brewer website, the ‘BrBG’ divergent palette was discovered (Color Brewer, 2018). Using the RColorBrewer library, the display.brewer.pal () function above showed what the palette would look like for 5 classes. This palette in particular was identified as appropriate due to the idea of brown representing relatively drier conditions while green is a more lush colour representing wetter areas in this case. The BrBG palette was also chosen as it was listed as colour blind friendly.
‘BrBG_pal’ packaged this into an object.
Rain_pal <- colorNumeric(palette = BrBG_pal, domain = station_medians$mrain)
previewColors(Rain_pal, fivenum(station_medians$mrain))
Rain_pal
fivenum(station_medians$mrain)
| 63 | |
| 92.9 | |
| 105.7 | |
| 126.4 | |
| 177.7 |
The ‘Rain_pal’ function was created to operationalise a gradient scale for the map legend that would classify rainfall values appropriately. The use of the previewColors function which incoporated the Tukey Five-Number summary function produced a legend scale which gave, from top to bottom respectively, the minimum value, lower-hinge (25th percentile), median, upper-hinge (75th percentile), and maximum value.
leaflet(data=station_medians,height=450,width=620,) %>% addProviderTiles('CartoDB.Positron') %>%
setView(-8,53.5,6) %>% addCircleMarkers(fillColor=~Rain_pal(mrain),weight=0,fillOpacity = 0.85, popup=station_medians$Station) %>%
addLegend(pal=Rain_pal,values=~mrain,title="Jan Median rainfall (mm) 1850-2014",position='bottomright')
## Assuming "Long" and "Lat" are longitude and latitude, respectively
First, the leaflet() function was used as it enables a Leaflet map widget which could be rendered on this HTML page. The data was sourced from the database with both the January median values and coordinate information. The same basemap were then imported and the circle symbols chosen with the addCircleMarkers () function. The popup argument allowed for a degree of interaction, with the station name displaying when clicked. Lastly, the addLegend () function allowed for positioning, using the specific palette, and creating a title.
Perhaps it is sensible to be cautious when interpreting this map due to the spread of values that constitute a median rainfall amount in this map. The strength of the colour associations may not be accurately captured by this color gradient, but this method was employed with the intention of representing these values in a spatial context.
Regardless, It is clear from the above map of median rainfall that there is a general east-west gradient towards wetter conditions on the Atlantic coasts of Ireland as opposed to the eastern Irish Sea coast, with one or two exceptions either side. Dublin stations in particular show a dense concentration of much lower median rainfall values, but these comparatively lower or middle range median values extend throughout the midlands and into the north too. Perhaps this is due to the rainshadow effect of the Wicklow mountains.
Kerry (Kilarney and Valentia stations) and Donegal (Ardara station) stations seem to display the highest median values. This is somewhat expected for two reasons:
Ireland’s prevailing winds are south-westerly which bring moist air over the western part of Ireland, with mid-latitude storm tracks often crossing Ireland in the winter months (Met Eireann, 2018b), and
Donegal and Kerry have a relatively rugged and mountainous landscape compared to many other counties of Ireland and perhaps this plays a role in rainfall amounts at these stations.
To investigate this, the ‘Elevation’ attribute was made use of in the station_medians object. The same process for mapping the median amounts was applied to the elevation values map.
color_elevation <- colorNumeric('YlOrBr',station_medians$Elevation)
leaflet(data=station_medians,height=450,width=620,) %>% addProviderTiles('CartoDB.Positron') %>% setView(-8,53.5,6) %>% addCircleMarkers(fillColor=~color_elevation(Elevation),weight=0,fillOpacity = 0.85, popup=station_medians$Station) %>%
addLegend(pal=color_elevation,values=~Elevation,title="Station Elevations (m)",position='bottomright')
## Assuming "Long" and "Lat" are longitude and latitude, respectively
Despite the reference to Mountainous regions receiving more than 2000 mm annually by Met Eireann (2018a), stations on the west coast appear to be at much lower elevations. Perhaps, initially this is counterintuitive to the mountainous region hypothesis but not necessarily so on further consideration. Rainfall dyanmics in mountainous regions are affected by large scale synoptic conidtions but also hugely influenced by the local terrain.
If these stations lie on the windward side of mountain ranges, their high values may represent this dynamic. Similarly, for stations at high elevations such as Cork and in Wicklow, their higher elevations indicate that they may be located in different environments. However, this is just a consideration that cannot be verified by this data and would need significant analysis of more variables to gain credibility.
In conclusion, this document set out to illustrate the steps taken to produce a web map of Ireland which showed median rainfall amounts at 25 stations for the month of January for the period 1850 - 2014. It is hoped that by using RMarkdown that the methods and logic proposed here are sufficiently clear and transparent for anyone to reproduce this exercise. As shown above, rainfall data is noisy. It is hard to observe any obvious trends on both graphs and maps. Nonetheless, observational records provide a huge resource for statistical analysis of this important variable. Going forward, progress in rescuing historical rainfall data and ensuring current values are captured in areas throughout the world with barely any records will be vital to detecting trends and effects in this climate variable.
Brunsdon, C. (2018a) Reproducible Research Using RMarkdown [RPubs blog accessed through Moodle]. NCG602A: Methods & Techniques in Geocomputation. Maynooth University (accessed 5 January 2018).
Brunsdon, C. (2018b) Further Interactive Techniques [RPubs blog accessed through Moodle]. NCG602A: Methods & Techniques in Geocomputation. Maynooth University (accessed 5 January 2018).
Brunsdon, C. (2018c) Working with dplyr [pdf accessed through Moodle]. NCG602A: Methods & Techniques in Geocomputation. Maynooth University (accessed 2 January 2018).
Color Brewer (2018) Color Brewer 2.0 Color Advice for Cartography [online]. Available at: http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3 (accessed 7 January 2018).
Met Eireann (2018a) Rainfall [online]. Available at: https://www.met.ie/climate/what-we-measure/rainfall (accessed 3 January 2018).
Met Eireann (2018b) Wind [online]. Available at: https://www.met.ie/climate/what-we-measure/wind (accessed 3 January 2018).
National Oceanic and Atmospheric Administration (n.d.) Climate Time Series: Random Variables [online]. Available at: http://www.nws.noaa.gov/om/csd/pds/PCU2/statistics/Stats/part1/CTS_random.htm (accessed 7 January 2018).