Introduction

The objective is to create a map of rainfall in Ireland for the 25 weather stations, colour coding the symbol for each station according to its median rainfall level in January.

In the following sections, the titles correspond to the procedures executed. For each procedure, the grey boxes show the code entered in the R script, followed by the output generated and an explanation of the code.

Finally, the conclusion brings a brief discussion of patterns identified.

Load the data

setwd("D:/Program Files/RStudio")
load('rainfall.RData')

The dataset is loaded with two variables: rain and stations.

head(stations)
##       Station Elevation Easting Northing   Lat  Long    County
## 1      Athboy        87  270400   261700 53.60 -6.93     Meath
## 2 Foulksmills        71  284100   118400 52.30 -6.77   Wexford
## 3   Mullingar       112  241780   247765 53.47 -7.37 Westmeath
## 4     Portlaw         8  246600   115200 52.28 -7.31 Waterford
## 5    Rathdrum       131  319700   186000 52.91 -6.22   Wicklow
## 6 Strokestown        49  194500   279100 53.75 -8.10 Roscommon
##   Abbreviation      Source
## 1           AB Met Eireann
## 2            F Met Eireann
## 3            M Met Eireann
## 4            P Met Eireann
## 5           RD Met Eireann
## 6            S Met Eireann

‘Stations’ contains information about the 25 weather stations, in particular their location.

head(rain)
##   Year Month Rainfall Station
## 1 1850   Jan    169.0  Ardara
## 2 1851   Jan    236.4  Ardara
## 3 1852   Jan    249.7  Ardara
## 4 1853   Jan    209.1  Ardara
## 5 1854   Jan    188.5  Ardara
## 6 1855   Jan     32.3  Ardara

‘Rain’ contains rainfall records for each station over a 165-year period (January 1850 to December 2014).

Median rainfall summary for each station over the entire period

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
rain %>% group_by(Station) %>% 
  summarise(mrain=median(Rainfall))  -> rain_summary
head(rain_summary)
## # A tibble: 6 x 2
##   Station    mrain
##   <chr>      <dbl>
## 1 Ardara     132. 
## 2 Armagh      65.4
## 3 Athboy      70.7
## 4 Belfast     82.1
## 5 Birr        66.2
## 6 Cappoquinn 113.

County boundaries and settlements for leaflet

The ‘dplyr’ package provides an extension of the R statistical programming language that allows easier manipulation of data, and a ‘pipelining’-style notation, which in some data processing situations makes instructions much clearer than the classical ‘mathematical function’ notation.

‘summarise’ applies a summary function to each group and creates a new data frame with one row for each group. Here, the summary function is ‘median’, applied to the Rainfall column, so the result holds the median rainfall for each station.
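The grouping and summarising step can be illustrated on a small scale. The sketch below uses a hypothetical toy data frame (toy_rain, with made-up values that are not taken from rainfall.RData) and assumes only that the ‘dplyr’ package is installed.

```r
library(dplyr)

# Hypothetical toy records, not taken from rainfall.RData
toy_rain <- data.frame(
  Station  = c("A", "A", "A", "B", "B", "B"),
  Rainfall = c(10, 30, 20, 5, 7, 100)
)

# One output row per Station; mrain holds the median of that group's Rainfall
toy_summary <- toy_rain %>%
  group_by(Station) %>%
  summarise(mrain = median(Rainfall))

print(toy_summary)
```

In this toy case the single extreme value of 100 at station B does not pull its median upwards, which is one reason a median can be preferred to a mean for rainfall totals.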

Investigate rainfall summary

head(rain_summary)
## # A tibble: 6 x 2
##   Station    mrain
##   <chr>      <dbl>
## 1 Ardara     132. 
## 2 Armagh      65.4
## 3 Athboy      70.7
## 4 Belfast     82.1
## 5 Birr        66.2
## 6 Cappoquinn 113.

The ‘head’ function returns the first part of the analysed object (to return the last part, the ‘tail’ function would be used). Here the object is the rainfall summary by station, which was grouped with ‘group_by’ and passed along using the pipeline operator.

Monthly rainfall summary across all stations

rain %>% group_by(Month) %>% 
  summarise(mrain=median(Rainfall)) -> rain_months
head(rain_months)
## # A tibble: 6 x 2
##   Month mrain
##   <fct> <dbl>
## 1 Jan   105. 
## 2 Feb    74.4
## 3 Mar    73.1
## 4 Apr    62.5
## 5 May    65.5
## 6 Jun    65.2

This time, the rainfall data are grouped by month instead of by station, so each median is taken over all stations and all years for that month.

‘summarise’ applies a summary function to each group and creates a new data frame with one row for each group. Here, the summary function is the median of Rainfall within each month.

The ‘head’ function returns the first part of the analysed object, which is the median rainfall for each month.

Grouping each station by median monthly rainfall

rain %>% group_by(Month,Station) %>%
  summarise(median_rain=median(Rainfall)) -> Median_rain_month_station
head(Median_rain_month_station)
## # A tibble: 6 x 3
## # Groups:   Month [1]
##   Month Station    median_rain
##   <fct> <chr>            <dbl>
## 1 Jan   Ardara           172. 
## 2 Jan   Armagh            75  
## 3 Jan   Athboy            87.1
## 4 Jan   Belfast          102. 
## 5 Jan   Birr              77.5
## 6 Jan   Cappoquinn       147.

Here the rainfall data are grouped by month and by station together (previously they were grouped only by station, and then only by month).

The summary function used here is the median rainfall for each combination of month and station (previously the monthly median was computed without distinguishing the stations).

The ‘head’ function returns the first part of the analysed object, which is the median monthly rainfall of each station.

Basic investigation: heatmap for median rainfall data by station

library(reshape2)
Median_rain_month_station %>% acast(Station~Month) %>% heatmap(Colv=NA)
## Using median_rain as value column: use value.var to override.

‘reshape2’ is an R package that makes it easy to transform data between wide and long formats (wide data has a column for each variable, while long-format data has one column naming the variable and another column holding its value).

The pipeline operator is used here to pass the medians grouped by month and by station through the remaining steps: ‘acast’ reshapes the Median_rain_month_station data frame into a matrix with one row per station and one column per month, and ‘heatmap’ draws it. Setting Colv=NA keeps the months in calendar order instead of reordering the columns by clustering.
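The reshaping from long to wide format can be sketched on a small scale. The example below uses a hypothetical toy data frame (toy_long, with made-up values) and names the value column explicitly through ‘value.var’, which should avoid the "Using median_rain as value column" message seen above.

```r
library(reshape2)

# Hypothetical long-format medians: one row per Month/Station pair
toy_long <- data.frame(
  Month       = factor(c("Jan", "Feb", "Jan", "Feb"), levels = c("Jan", "Feb")),
  Station     = c("A", "A", "B", "B"),
  median_rain = c(100, 70, 60, 50)
)

# Wide format: one row per Station, one column per Month
toy_wide <- acast(toy_long, Station ~ Month, value.var = "median_rain")

print(toy_wide)
```

When reading the resulting heat map, it may help to remember that ‘heatmap’ standardises each row by default (scale = "row"), so its colours compare months within a station rather than absolute rainfall between stations.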

Filtering out January median rainfall totals for each station

rain %>% group_by(Month,Station) %>%
  filter(Month=='Jan') %>%
  summarise(median_rain=median(Rainfall)) -> Median_rain_Jan

The rainfall data are grouped by month and by station, as in the previous section.

As our objective is to map the median rainfall level in January over the weather stations in Ireland, the ‘filter’ function is used here to keep only the January records before the medians are computed.

The summary function used here is the median January rainfall by station (previously the median rainfall was computed by station for every month, without filtering for January).
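Because only January is wanted, the filter could also be applied first, with the grouping then done by station alone. The sketch below shows this ordering on a hypothetical toy data frame (toy_rain, with made-up values); it is an alternative arrangement for illustration, not the code used in this report.

```r
library(dplyr)

# Hypothetical toy records, not taken from rainfall.RData
toy_rain <- data.frame(
  Month    = c("Jan", "Jan", "Feb", "Jan", "Jan", "Feb"),
  Station  = c("A", "A", "A", "B", "B", "B"),
  Rainfall = c(100, 120, 70, 60, 80, 50)
)

# Keep the January rows first, then take one median per station
toy_jan <- toy_rain %>%
  filter(Month == "Jan") %>%
  group_by(Station) %>%
  summarise(median_rain = median(Rainfall))

print(toy_jan)
```

Filtering first means that the result has one row per station and no leftover grouping by month.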

Time series of median rainfall data for January over the period 1850-2014

rain %>% group_by(Year,Month) %>% summarise(median_rain=median(Rainfall)) %>%
  filter(Month=='Jan') -> monthly_medians

The pipeline operator is used here to group the median rainfall data by year and by month

The January medians are selected using the ‘filter’ function

plot(monthly_medians$median_rain[1:165],type='b',ann=F, axes=F, cex=0.6)
axis(1,at=seq(1,165,by=1),labels=1850:2014); axis(2); title('Median Rainfall for January 1850-2014')

The filtered median rainfall data for January over the period 1850-2014 are now used to create a plot. ‘monthly_medians’ is the data frame created previously, from which we extract (using the ‘$’ symbol) the median_rain column; its elements 1 to 165 are the January medians for the years 1850 to 2014. The call to ‘axis’ then labels positions 1 to 165 with the corresponding years.
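An alternative way of labelling the horizontal axis is sketched below, under the assumption that one tick per decade is enough. It plots against the years directly instead of positions 1 to 165. The values in toy_medians are randomly generated placeholders, not the medians from this report.

```r
years <- 1850:2014

# Hypothetical placeholder values standing in for the January medians
set.seed(1)
toy_medians <- runif(length(years), min = 50, max = 150)

# Positions of the years that start a decade (1850, 1860, ..., 2010)
decade_idx <- which(years %% 10 == 0)

# Plot against the years themselves and suppress the default x axis
plot(years, toy_medians, type = "b", cex = 0.6, xaxt = "n",
     xlab = "Year", ylab = "Median rainfall",
     main = "Median Rainfall for January 1850-2014")

# Draw one labelled tick per decade
axis(1, at = years[decade_idx], labels = years[decade_idx])
```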

Interactive map with the station locations around Ireland

library(leaflet)
leaflet(data=stations,height=430,width=390) %>% addTiles %>%
  setView(-8,53.5,6) %>% addCircleMarkers
## Assuming "Long" and "Lat" are longitude and latitude, respectively

Leaflet is an open-source JavaScript library for interactive maps.

The ‘leaflet’ function creates a Leaflet map widget using htmlwidgets. This widget can be rendered on HTML pages generated from R Markdown. The data we want to represent in the map are the geographical locations of the weather stations across Ireland (data=stations).

‘setView’ is one of a series of methods for manipulating the map. Its arguments inside the brackets are the longitude (-8) and latitude (53.5) of the centre of the view, followed by the zoom level (6).
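The same kind of map can be written with every argument named, which makes the role of each number explicit and avoids the "Assuming ..." message. The sketch below uses a hypothetical two-row data frame (toy_stations) that borrows the column names of ‘stations’; the ‘label’ argument is an addition for illustration.

```r
library(leaflet)

# Hypothetical two-station table using the same column names as 'stations'
toy_stations <- data.frame(
  Station = c("A", "B"),
  Lat     = c(53.60, 52.30),
  Long    = c(-6.93, -6.77)
)

m <- leaflet(data = toy_stations) %>%
  addTiles() %>%
  # Centre of the view (longitude, latitude) and zoom level
  setView(lng = -8, lat = 53.5, zoom = 6) %>%
  # Name the coordinate columns explicitly instead of letting leaflet guess
  addCircleMarkers(lng = ~Long, lat = ~Lat, label = ~Station)

# Printing m in an interactive session or an R Markdown chunk displays the map
```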

Changing the background to improve visualization

leaflet(data=stations,height=430,width=390) %>%
  addProviderTiles('CartoDB.Positron')  %>%
  setView(-8,53.5,6) %>% addCircleMarkers
## Assuming "Long" and "Lat" are longitude and latitude, respectively

CartoDB is a cloud computing platform that provides GIS and web mapping tools for display in a web browser. Positron is a light, grey-scale map, which makes it an ideal unobtrusive basemap for data visualization.

Linking the rainfall data frame with the geographical location of the stations

library(dplyr)

The ‘dplyr’ package provides an extension of the R statistical programming language that allows easier manipulation of data, and a ‘pipelining’-style notation, which in some data processing situations makes instructions much clearer than the classical ‘mathematical function’ notation.

load('rainfall.RData')
rain %>% group_by(Station) %>%  summarise(median_rain=median(Rainfall)) -> rain_summary
rain_summary %>% left_join(stations) -> station_medians
## Joining, by = "Station"

The ‘left_join’ function is used here to join the median rainfall values to the geographical locations of the stations, matching rows on the shared ‘Station’ column. The result, station_medians, holds both the ‘Lat’ and ‘Long’ information and the median rainfall for each station.
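The behaviour of the join can be sketched on two hypothetical toy tables (toy_summary and toy_stations, with made-up values). Naming the key through ‘by’ is an optional addition here that should avoid the "Joining, by" message.

```r
library(dplyr)

# Hypothetical toy tables; station "C" has no matching location row
toy_summary <- data.frame(
  Station     = c("A", "B", "C"),
  median_rain = c(80, 60, 90)
)
toy_stations <- data.frame(
  Station = c("A", "B"),
  Lat     = c(53.60, 52.30),
  Long    = c(-6.93, -6.77)
)

# Keep every row of toy_summary and attach the matching location columns
toy_joined <- left_join(toy_summary, toy_stations, by = "Station")

print(toy_joined)
```

A left join keeps every row of the first table, so a station without a matching location still appears, with NA in ‘Lat’ and ‘Long’.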

Viewing the new variable

head(station_medians, n=10)
## # A tibble: 10 x 10
##    Station median_rain Elevation Easting Northing   Lat  Long County
##    <chr>         <dbl>     <int>   <dbl>    <dbl> <dbl> <dbl> <chr> 
##  1 Ardara        132.         15 180788.  394679.  54.8 -8.29 Doneg~
##  2 Armagh         65.4        62 287831.  345772.  54.4 -6.64 Armagh
##  3 Athboy         70.7        87 270400   261700   53.6 -6.93 Meath 
##  4 Belfast        82.1       115 329623.  363141.  54.5 -5.99 Antrim
##  5 Birr           66.2        73 208017.  203400.  53.1 -7.88 Offaly
##  6 Cappoq~       113.         76 213269.  104800.  52.2 -7.8  Water~
##  7 Cork A~        90         154 167336.   64387.  51.8 -8.47 Cork  
##  8 Derry          80.5        37 226324.  416014.  55.0 -7.58 Derry 
##  9 Drumsna        80.2        45 200000   295800   53.9 -8    Leitr~
## 10 Dublin~        56          71 319767.  240361.  53.4 -6.2  Dublin
## # ... with 2 more variables: Abbreviation <chr>, Source <chr>

The ‘head’ function returns the first part (here, the first 10 rows) of the analysed object, which is the station_medians data frame with the ‘Lat’ and ‘Long’ information and the median rainfall data.

Creating the color mapping for median rainfall over the period 1850-2014

color_mapping <- colorNumeric('PuBuGn',station_medians$median_rain)
previewColors(color_mapping,fivenum(station_medians$median_rain))

[Color preview: color_mapping applied to fivenum(station_medians$median_rain), with values 56, 73, 82.1, 91.9 and 131.5]

leaflet(data=station_medians,height=430,width=600) %>%
  addProviderTiles('CartoDB.Positron')  %>%
  setView(-8,53.5,6) %>%
  addCircleMarkers(fillColor=~color_mapping(median_rain),weight=0,fillOpacity = 0.75) %>%
  addLegend(pal=color_mapping,values=~median_rain,title="Rainfall(median)",position='bottomleft')
## Assuming "Long" and "Lat" are longitude and latitude, respectively

A sequential palette (PuBuGn, a gradient from purple through blue to green) was chosen because the median rainfall data progress from low to high. The data frame (station_medians) from which the data (median_rain) are extracted (using the ‘$’ symbol) is the one that joins the median rainfall values to the geographical locations of the stations. Note that median_rain in station_medians is the median over all months, as computed in the previous section, so the map shows the overall median for each station; the January medians are held separately in Median_rain_Jan.

The ‘previewColors’ function is used to give a preview of the color palette selected

The ‘leaflet’ function creates a Leaflet map widget using htmlwidgets. This widget can be rendered on HTML pages generated from R Markdown. The data we want to represent in the map are the median rainfall values together with the geographical locations of the weather stations across Ireland (data=station_medians).

CartoDB is a cloud computing platform that provides GIS and web mapping tools for display in a web browser. Positron is a light, grey-scale map, which makes it an ideal unobtrusive basemap for data visualization.

‘setView’ is one of a series of methods for manipulating the map. Its arguments inside the brackets are the longitude (-8) and latitude (53.5) of the centre of the view, followed by the zoom level (6).

The function ‘addCircleMarkers’ adds a layer of circle markers to the map widget. ‘fillColor’ colours each circle by passing the station’s median_rain through color_mapping, ‘weight=0’ removes the circle outline, and ‘fillOpacity=0.75’ makes the fill 75% opaque.

The ‘addLegend’ function adds a legend to the map, and it is possible to select the ‘title’ and the ‘position’ of the legend.
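The map in this section colours the stations by the medians in station_medians, which are computed over all months. If the January medians are wanted instead, the same steps could be applied to a January-only table. The sketch below is an illustration and not the code used in this report: it rebuilds a three-station table by hand from the January medians and coordinates printed earlier (Ardara, Armagh and Athboy), and the object names toy_jan, toy_locations, jan_medians, jan_mapping and jan_map are hypothetical.

```r
library(dplyr)
library(leaflet)

# January medians and coordinates copied by hand from the printed output above
toy_jan <- data.frame(
  Station     = c("Ardara", "Armagh", "Athboy"),
  median_rain = c(172, 75, 87.1)
)
toy_locations <- data.frame(
  Station = c("Ardara", "Armagh", "Athboy"),
  Lat     = c(54.8, 54.4, 53.6),
  Long    = c(-8.29, -6.64, -6.93)
)

# Attach the coordinates to the January medians
jan_medians <- left_join(toy_jan, toy_locations, by = "Station")

# Palette function scaled to the range of the January medians
jan_mapping <- colorNumeric("PuBuGn", jan_medians$median_rain)

jan_map <- leaflet(data = jan_medians) %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(lng = -8, lat = 53.5, zoom = 6) %>%
  addCircleMarkers(lng = ~Long, lat = ~Lat,
                   fillColor = ~jan_mapping(median_rain),
                   weight = 0, fillOpacity = 0.75) %>%
  addLegend(pal = jan_mapping, values = ~median_rain,
            title = "January rainfall (median)", position = "bottomleft")
```

With the full data, the hand-built tables would be replaced by Median_rain_Jan joined to stations.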

Rearrange rainfall data to go by month for each station

rain %>% arrange(Year,Month) -> rain2
head(rain2,n=4)
## # A tibble: 4 x 4
##    Year Month Rainfall Station   
##   <dbl> <fct>    <dbl> <chr>     
## 1  1850 Jan      169   Ardara    
## 2  1850 Jan       96.9 Derry     
## 3  1850 Jan      108.  Malin Head
## 4  1850 Jan       92.5 Armagh
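# Draw a month plot (seasonal sub-series plot) of rainfall for one station
local_monthplot <- function(station,raindata) {
  # Keep only the rows for the requested station
  raindata %>% filter(Station == station) -> local_rd
  # Convert to a monthly time series starting in January 1850
  local_rd$Rainfall %>% ts(freq=12,start=1850) -> rain_ts
  # Plot each month's sub-series, with its average as a thick red line
  rain_ts %>% monthplot(col='dodgerblue',col.base='indianred',lwd.base=3)
}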

local_monthplot('Derry',rain2)
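
The two base R steps inside local_monthplot can be tried on their own. The sketch below uses three years of randomly generated placeholder values (toy_values), not the rainfall records, to show how ‘ts’ attaches a monthly calendar to a plain vector and how ‘monthplot’ then draws one sub-series per month.

```r
# Hypothetical placeholder values: 36 months of made-up rainfall
set.seed(42)
toy_values <- runif(36, min = 20, max = 200)

# Monthly series starting in January 1850 (frequency = 12 observations per year)
toy_ts <- ts(toy_values, frequency = 12, start = c(1850, 1))

# One panel per calendar month; the thick red line is that month's average
monthplot(toy_ts, col = "dodgerblue", col.base = "indianred", lwd.base = 3)
```

This relies on the records being in chronological order with no missing months, which is why the data are arranged by Year and Month before local_monthplot is called.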

Save as image

png('test.png',width=400,height=300)
local_monthplot('Derry',rain2)
dev.off()
## png 
##   2

Create a filename for each station

library(leaflet)
stations %>% mutate(Filename=paste0('mp ',Station,'.png')) -> files
files %>% select(Station,Filename) %>% head
## # A tibble: 6 x 2
##   Station     Filename          
##   <chr>       <chr>             
## 1 Athboy      mp Athboy.png     
## 2 Foulksmills mp Foulksmills.png
## 3 Mullingar   mp Mullingar.png  
## 4 Portlaw     mp Portlaw.png    
## 5 Rathdrum    mp Rathdrum.png   
## 6 Strokestown mp Strokestown.png

Create a month plot for each station and store it as an image (png) in the working directory

for (i in 1:nrow(files))
  with(files, {
    png(Filename[i],width=400,height=300)
    local_monthplot(Station[i],rain2)
    dev.off()} )

Conclusion

Analysing the graphs and maps obtained, it was possible to see on the heatmap that the months with the most rainfall are April, May and June. The stations with the greatest amount of precipitation vary according to the month but, in general, some of the stations with a significant amount of precipitation in the rainiest months (April, May and June) are Foulksmills, Roches Point, Portlaw and Ardara.

The graph of median rainfall for January 1850-2014 shows a peak of precipitation between 1940 and 1958. The amount of rainfall oscillates over the years but, in general, the pattern is maintained.

Finally, the results obtained on the color map of median rainfall over the period 1850-2014 show three stations that recorded the highest amount of precipitation over the whole period, which are featured in dark green. The geographical distribution of rainfall in the period analysed shows that the South and East of Ireland are the regions with the highest records of precipitation, with one singular spot in the Northwest.