‘It’s always rainy in Ireland!’

Hi! In this blog, I investigate rainfall trends at 4 weather stations across Ireland using dygraphs, an R package that creates dynamic time series plots. The blog is divided into 4 parts: Introduction, Data Exploration, Data Analysis and Trends. Please click on the show button to see the code used for each section.

Introduction

I carried out this investigation using RStudio. First, I load the packages the investigation needs. Each package provides a set of functions that become available once it is loaded. The packages are: tidyverse (data analysis, manipulation and visualization), dplyr (data manipulation and transformation; it is also loaded as part of tidyverse), dygraphs (dynamic time series plots), sp (spatial data manipulation), RColorBrewer (colour palettes), tmap (thematic maps), leaflet (interactive maps) and sf (spatial data based on the Simple Features standard). [Note: some of these packages are not used in this blog, but they are important from a spatial R point of view.]

library(tidyverse)

library(dplyr)

library(dygraphs)

library(sp)

library(RColorBrewer)

library(tmap)

library(leaflet)

library(sf)

Now, I will load the rainfall data and prepare it for further investigation.

load("rainfall.RData")

Data Exploration

To understand the dataset, we first need to carry out some exploratory operations on it. I start by checking what the file contains: load() returns the names of the objects it restores, and str() shows the structure of that result.

rain_info <- load("rainfall.RData")
str(rain_info)
##  chr [1:2] "stations" "rain"

This shows that rainfall.RData contains two datasets: stations and rain.
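load() restores objects into the workspace and returns only their names, which is why str() reports a character vector. Here is a minimal sketch of that behaviour, using a throwaway file and made-up objects (demo_file, stations_demo and rain_demo are hypothetical names, not part of rainfall.RData):

```r
# Create two small stand-in objects and save them to a temporary .RData file
stations_demo <- data.frame(Station = c("A", "B"), Elevation = c(10L, 20L))
rain_demo     <- data.frame(Year = 1850, Month = "Jan", Rainfall = 12.3, Station = "A")
demo_file <- tempfile(fileext = ".RData")
save(stations_demo, rain_demo, file = demo_file)

# Load into a fresh environment so the global workspace is left untouched
demo_env <- new.env()
loaded_names <- load(demo_file, envir = demo_env)

str(loaded_names)            # a character vector holding the object names
class(demo_env$rain_demo)    # the objects themselves live in the environment
```

Loading into a separate environment is optional; it is simply a safe way to peek inside a file before its contents overwrite anything in the workspace.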


They look like this: Stations:

stations
## # A tibble: 25 × 9
##    Station     Elevation Easting Northing   Lat  Long County Abbreviation Source
##    <chr>           <int>   <dbl>    <dbl> <dbl> <dbl> <chr>  <chr>        <chr> 
##  1 Athboy             87 270400   261700   53.6 -6.93 Meath  AB           Met E…
##  2 Foulksmills        71 284100   118400   52.3 -6.77 Wexfo… F            Met E…
##  3 Mullingar         112 241780   247765   53.5 -7.37 Westm… M            Met E…
##  4 Portlaw             8 246600   115200   52.3 -7.31 Water… P            Met E…
##  5 Rathdrum          131 319700   186000   52.9 -6.22 Wickl… RD           Met E…
##  6 Strokestown        49 194500   279100   53.8 -8.1  Rosco… S            Met E…
##  7 University…        14 129000   225600   53.3 -9.06 Galway UCG          Met E…
##  8 Drumsna            45 200000   295800   53.9 -8    Leitr… DAL          Met E…
##  9 Ardara             15 180788.  394679.  54.8 -8.29 Doneg… AR           Briffa
## 10 Armagh             62 287831.  345772.  54.4 -6.64 Armagh A            Armag…
## # ℹ 15 more rows

The “stations” dataset is a tibble with 25 rows and 9 columns.
Columns:
Station: character variable giving the name of the weather station.
Elevation: integer variable giving the elevation of the station.
Easting: double variable giving the easting coordinate.
Northing: double variable giving the northing coordinate.
Lat: double variable giving the latitude of the station.
Long: double variable giving the longitude of the station.
County: character variable giving the county where the station is located.
Abbreviation: character variable giving a short code for the station (for example, AB for Athboy).
Source: character variable giving the source of the station’s record.

Rain:

rain
## # A tibble: 49,500 × 4
##     Year Month Rainfall Station
##    <dbl> <fct>    <dbl> <chr>  
##  1  1850 Jan      169   Ardara 
##  2  1851 Jan      236.  Ardara 
##  3  1852 Jan      250.  Ardara 
##  4  1853 Jan      209.  Ardara 
##  5  1854 Jan      188.  Ardara 
##  6  1855 Jan       32.3 Ardara 
##  7  1856 Jan      152.  Ardara 
##  8  1857 Jan      179.  Ardara 
##  9  1858 Jan      110.  Ardara 
## 10  1859 Jan      158.  Ardara 
## # ℹ 49,490 more rows


This shows that “rain” is a tibble with 49,500 rows and 4 columns. With 25 stations, that works out at 1,980 monthly readings (165 years) per station, if every station has a complete record.
Columns:
Year: numeric variable giving the year.
Month: factor variable giving the month.
Rainfall: numeric variable giving the amount of rainfall.
Station: character variable giving the name of the weather station.

Lastly, I store the names of the two datasets in a variable, ‘rain_info’. This holds only the names, not the data itself.

rain_info <- c("stations", "rain")
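If the objects themselves are wanted rather than their names, base R's mget() can look them up. This is a minimal sketch with made-up stand-ins (stations_demo, rain_demo and demo_info are hypothetical; the real objects come from rainfall.RData):

```r
# Stand-in objects with a few made-up rows
stations_demo <- data.frame(Station = c("A", "B"))
rain_demo     <- data.frame(Station = c("A", "A", "B"), Rainfall = c(1.5, 2.0, 3.2))

demo_info <- c("stations_demo", "rain_demo")   # names only, like rain_info

# mget() looks up each name and returns the objects in a named list
demo_objects <- mget(demo_info)
sapply(demo_objects, nrow)
```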

Data Analysis

Now, to carry out the data analysis. In the following code, the dplyr package is used to aggregate the rainfall data. The group_by function groups the data by Year and Month, and summarise calculates the total rainfall for each combination of Year and Month, adding up all stations. The ungroup function removes the grouping, and transmute (like mutate, but it drops unreferenced variables) retains only the Rainfall column. The result is then converted into a time series object (ts) that starts in January 1850 and has a frequency of 12, i.e. monthly. Finally, the window function is used to focus on a specific time range (1877-1896).
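The same pattern can be checked on made-up data where the answer is known in advance. In this sketch, toy_rain and toy_ts are hypothetical names, and every reading is set to 10 so that two stations should add up to 20 in each month:

```r
library(dplyr)

# Made-up data: 2 stations x 2 years x 12 months, every reading equal to 10
toy_rain <- expand.grid(
  Month   = factor(month.abb, levels = month.abb),
  Year    = 1850:1851,
  Station = c("A", "B"),
  stringsAsFactors = FALSE
) %>%
  mutate(Rainfall = 10)

toy_ts <- toy_rain %>%
  group_by(Year, Month) %>%
  summarise(Rainfall = sum(Rainfall), .groups = "drop") %>%  # one row per month, stations added up
  transmute(Rainfall) %>%
  ts(start = c(1850, 1), frequency = 12)

# Keep only the second year; each month should be 10 + 10 = 20
window(toy_ts, start = c(1851, 1), end = c(1851, 12))
```

Month is kept as a factor with its levels in calendar order, so the summarised rows come out in time order before they are handed to ts().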

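# Total rainfall across all stations for each month, as a monthly series from January 1850
rain %>% group_by(Year, Month) %>%
  summarise(Rainfall = sum(Rainfall)) %>%
  ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), frequency = 12) -> rain_ts
# Show the twenty years from January 1877 to December 1896
rain_ts %>% window(c(1877, 1), c(1896, 12))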
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 1877 4273.1 1855.2 2154.0 2956.1 1908.2 2084.6 2069.5 3537.6 1981.6 3406.6
## 1878 2370.6 1559.5 1158.3 1885.6 3050.8 3346.6 1012.7 2870.8 2318.3 2692.9
## 1879 2543.1 3010.7 1378.1 1761.5 1774.8 3874.5 2996.7 3064.7 3040.8 1183.8
## 1880 1193.6 2684.2 2296.4 2517.0  935.0 2212.3 3657.7 1299.0 2120.4 2145.1
## 1881  912.6 2953.5 2587.3 1130.6 1669.6 3203.0 1739.8 3258.4 1858.9 2594.0
## 1882 2004.4 2362.1 2104.3 2940.2 1931.6 2433.1 3467.1 2270.0 2294.0 2893.3
## 1883 4179.5 4360.4 1226.3 1666.8 1558.2 1494.0 2699.9 2742.8 3351.6 2485.3
## 1884 3624.0 3916.5 2946.4 1129.7 1797.2  657.8 2361.2 1240.9 1804.8 1827.9
## 1885 2454.8 3174.8 1896.5 2254.3 1811.1  756.8 1422.4 2098.9 3599.3 2880.4
## 1886 2814.2 2063.9 2517.4 1568.8 2795.1 1100.9 2576.8 1890.3 2729.4 3773.4
## 1887 2556.0 1185.6 1131.5 1243.8  969.7  372.1 1789.7 2109.4 2138.3 1747.7
## 1888 2242.4  643.0 2457.8 1397.0 1918.4 3047.9 3496.2 2209.9  830.6 1623.9
## 1889 2256.5 2027.5 1261.6 1863.9 2567.9  701.0 1591.8 4011.2 1376.3 3379.1
## 1890 3582.6 1143.0 2446.4 1146.3 1898.5 2205.3 1828.0 2117.9 2133.4 1624.8
## 1891 1468.1  253.7 1211.6 1535.9 2192.0 1854.9 1340.3 4031.8 2072.9 3545.9
## 1892 1772.9 2024.7  698.3  816.0 2808.7 1766.5 2252.2 4116.0 2533.2 2242.5
## 1893 2206.7 2556.2  589.2  745.7 1250.2 1331.0 1854.2 2843.5 1456.1 2035.8
## 1894 3296.4 2372.0 1553.9 2689.9 1806.5 1570.8 2851.6 2216.8  408.7 3082.4
## 1895 2584.5  745.6 2486.5 1442.5  457.5 1480.9 3110.2 3475.6  711.3 2607.3
## 1896 1219.5 1411.9 2992.7  887.8  279.1 1768.6 3711.8 1498.4 3980.5 2524.6
##         Nov    Dec
## 1877 4059.8 2959.0
## 1878 1651.4 1684.1
## 1879  713.8 1411.4
## 1880 3335.5 2480.2
## 1881 3667.9 3135.2
## 1882 3708.6 2905.7
## 1883 3174.6 1425.8
## 1884 2431.5 2915.1
## 1885 1858.9 1168.5
## 1886 2281.4 3653.2
## 1887 2361.0 2120.3
## 1888 3231.4 3549.7
## 1889 1436.9 2726.7
## 1890 4294.2 1756.6
## 1891 2576.4 3970.7
## 1892 3358.2 1869.3
## 1893 1633.4 2916.3
## 1894 2911.2 2428.8
## 1895 3544.9 3875.9
## 1896  870.6 4139.4
library(dygraphs) 
rain_ts %>% dygraph() 

This step loads the dygraphs library and then calls dygraph() to plot the whole series. Please feel free to hover over the graph to explore the data points.

rain_ts %>% 
  window(c(1850, 1), c(1889, 12)) %>%
  dygraph(height = 300, width = 960)

Building upon the previous step, this code adds a window restriction to focus on a specific time range (1850-1889). The resulting dygraph is limited to this window, providing a more detailed view of the selected period. Notice how the size of the graph is set by the width and height I provided!

rain_ts %>% dygraph() %>% 
  dyRangeSelector()

This step adds interactivity to the dygraph by implementing a range selector (dyRangeSelector). You can drag and adjust the selected time window, for better exploration of the data.
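The range selector can also open on a chosen period rather than the whole record. Here is a minimal sketch, assuming a made-up series (toy_ts and toy_plot are hypothetical names) and using the dateWindow argument of dyRangeSelector:

```r
library(dygraphs)

# Made-up monthly series standing in for rain_ts
toy_ts <- ts(seq_len(120), start = c(1850, 1), frequency = 12)

toy_plot <- dygraph(toy_ts) %>%
  dyRangeSelector(dateWindow = c("1853-01-01", "1856-12-31"))  # initial visible range
```

Printing toy_plot draws the chart already zoomed to 1853-1856, and the selector can still be dragged out to the full range.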

rain_ts %>% dygraph(width = 960, height = 330) %>% 
  dyRangeSelector() %>% dyRoller(rollPeriod = 600)

Here, a rolling mean is added to the interactive dygraph using the dyRoller function. The rolling mean smooths the curve and helps identify trends by averaging over a specified rolling period (600 months, i.e. 50 years, in this case). Notice how much more stable it looks than before!
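dygraphs works out the rolling mean inside the browser, so the smoothed values never appear in R. The idea can be sketched in base R with a trailing average on made-up data (x, k and x_roll are hypothetical names). This version leaves the first k - 1 points as NA, which may differ from how dygraphs treats the start of the series:

```r
# Made-up monthly values: a strong alternating swing around a level of 100, plus noise
set.seed(1)
x <- ts(100 + rep(c(30, -30), times = 60) + rnorm(120),
        start = c(1850, 1), frequency = 12)

k <- 12  # rolling period in months (the post uses 600)

# Trailing mean: each value averages the current point and the k - 1 before it.
# stats:: is spelled out because dplyr also exports a function called filter().
x_roll <- stats::filter(x, rep(1 / k, k), sides = 1)

sd(x)                      # spread of the raw series
sd(x_roll, na.rm = TRUE)   # much smaller spread after smoothing
```

A longer rolling period flattens the curve further, at the cost of hiding any change that is shorter than the period itself.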

rain %>%  group_by(Year, Month) %>% filter(Station == "Dublin Airport") %>%
  summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), freq = 12) ->  dub_ts
rain %>%  group_by(Year, Month) %>% filter(Station == "Belfast") %>%
  summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), freq = 12) ->  bel_ts
rain %>%  group_by(Year, Month) %>% filter(Station == "University College Galway") %>%
  summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), freq = 12) ->  ucg_ts
rain %>%  group_by(Year, Month) %>% filter(Station == "Cork Airport") %>%
  summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), freq = 12) ->  cor_ts
beldubucgcor_ts <- cbind(bel_ts, dub_ts, ucg_ts, cor_ts)
window(beldubucgcor_ts, c(1850, 1), c(1850, 12))
##          bel_ts dub_ts ucg_ts cor_ts
## Jan 1850  115.7   75.8  108.9  155.3
## Feb 1850  120.5   47.8  131.5   92.6
## Mar 1850   56.8   18.5   56.6   56.0
## Apr 1850  142.6   97.5  120.5  207.2
## May 1850   57.9   58.6   69.8   35.3
## Jun 1850   62.0   43.6   74.7   11.4
## Jul 1850   96.3   66.0   89.1  179.0
## Aug 1850  110.4   41.2  136.8   46.5
## Sep 1850   65.8   54.2   85.2   40.7
## Oct 1850   87.6   40.4   90.7   53.8
## Nov 1850  104.4   60.0  131.3  153.2
## Dec 1850   57.6   81.1   90.6  169.4

This step creates a separate time series object (dub_ts, bel_ts, ucg_ts, cor_ts) for each of the four weather stations. These are then combined into a single object (beldubucgcor_ts) for comparative analysis. The window function is applied to show a specific time range (January to December 1850).
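Since the four pipelines differ only in the station name, the repeated part could be wrapped in a small function. This is a sketch on made-up data, assuming each station has exactly one row per month (station_ts, toy_rain and toy_both are hypothetical names, not part of the original analysis):

```r
library(dplyr)

# Hypothetical helper: monthly series for one station, starting in January of start_year
station_ts <- function(data, station, start_year = 1850) {
  data %>%
    filter(Station == station) %>%
    arrange(Year, Month) %>%        # make sure the rows are in time order
    pull(Rainfall) %>%
    ts(start = c(start_year, 1), frequency = 12)
}

# Made-up data: two stations, one year each
toy_rain <- tibble(
  Year     = 1850,
  Month    = factor(rep(month.abb, 2), levels = month.abb),
  Rainfall = c(1:12, 101:112),
  Station  = rep(c("A", "B"), each = 12)
)

# Combine the two series side by side, as cbind() does in the post
toy_both <- cbind(a_ts = station_ts(toy_rain, "A"),
                  b_ts = station_ts(toy_rain, "B"))
window(toy_both, start = c(1850, 1), end = c(1850, 3))
```

With the real data, the call would take the form station_ts(rain, "Belfast"), again assuming one reading per station per month.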

beldubucgcor_ts %>% dygraph(width = 960, height = 360) %>%
  dyRangeSelector()

The combined time series object is visualized using a dygraph for a four-way comparison. The dygraph includes a range selector for interactive exploration. This graph is not very pleasing to the eye: the four series overlap so closely that it is hard to tell the stations apart.

dub_ts %>% dygraph(width = 800, height = 130, group = "dub_belf_ucg_cor", main = "Dublin")
bel_ts %>% dygraph(width = 800, height = 130, group = "dub_belf_ucg_cor", main = "Belfast")
ucg_ts %>% dygraph(width = 800, height = 130, group = "dub_belf_ucg_cor", main = "University College Galway")
cor_ts %>% dygraph(width = 800, height = 170, group = "dub_belf_ucg_cor", main = "Cork Airport") %>% dyRangeSelector()

Hence, in this final step, an individual dygraph is created for each station, providing a detailed view of its rainfall pattern. All four share the same group name, which keeps their zoom in step, so the dyRangeSelector attached to the last graph controls all of them.
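The linking comes from the group argument of dygraph(). Here is a minimal sketch with two made-up series (a_ts, b_ts, top, bottom and the group name "demo_group" are all hypothetical):

```r
library(dygraphs)

# Two made-up monthly series
a_ts <- ts(sin(seq_len(120) / 6), start = c(1850, 1), frequency = 12)
b_ts <- ts(cos(seq_len(120) / 6), start = c(1850, 1), frequency = 12)

# Same group name on both charts, so zooming one zooms the other
top    <- dygraph(a_ts, main = "Series A", group = "demo_group", height = 130)
bottom <- dygraph(b_ts, main = "Series B", group = "demo_group", height = 170) %>%
  dyRangeSelector()   # one selector is enough for the whole group
```

In an R Markdown chunk, writing top and bottom on separate lines prints the two charts stacked, and dragging the selector under the second one moves both.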

These steps collectively form a comprehensive exploration of the rainfall time series data, using interactive dygraphs for in-depth analysis and visualization. If you compare these graphs with the last one, you’ll see why we don’t visualise different time series superimposed upon one another!