Hi! In this blog, I’m going to investigate rainfall trends at 4
weather stations across Ireland using dygraphs, an R package that
creates dynamic time series plots. This blog is divided into 4 parts:
Introduction, Data Exploration, Data Analysis and Trends. Please click
on the show button to see the code used for each section.
I carried out this investigation using RStudio. First, I need to load the packages the investigation relies on. Each package provides a set of functions that become available once it is loaded. The packages are: tidyverse (data analysis, manipulation & visualization), dplyr (data manipulation and transformation; tidyverse already attaches it, so loading it again is harmless), dygraphs (dynamic time series plots), sp (spatial data manipulation), sf (spatial data based on the Simple Features standard), and RColorBrewer, tmap and leaflet (colour palettes, thematic maps and interactive maps). [Note: some of these packages are not used in this blog but are certainly important from a spatial R point of view.]
library(tidyverse)
library(dplyr)
library(dygraphs)
library(sp)
library(RColorBrewer)
library(tmap)
library(leaflet)
library(sf)
Now, I will load the rainfall data and prepare it for further investigation.
load("rainfall.RData")
To understand our dataset, we first need to carry out some exploratory operations on the data. Here, I start by capturing the return value of load(), which is a character vector naming the objects it restored, and checking its structure with str().
rain_info <- load("rainfall.RData")
str(rain_info)
## chr [1:2] "stations" "rain"
This shows that rainfall.RData contains two datasets: stations and rain.
They look like this:
Stations:
stations
## # A tibble: 25 × 9
## Station Elevation Easting Northing Lat Long County Abbreviation Source
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
## 1 Athboy 87 270400 261700 53.6 -6.93 Meath AB Met E…
## 2 Foulksmills 71 284100 118400 52.3 -6.77 Wexfo… F Met E…
## 3 Mullingar 112 241780 247765 53.5 -7.37 Westm… M Met E…
## 4 Portlaw 8 246600 115200 52.3 -7.31 Water… P Met E…
## 5 Rathdrum 131 319700 186000 52.9 -6.22 Wickl… RD Met E…
## 6 Strokestown 49 194500 279100 53.8 -8.1 Rosco… S Met E…
## 7 University… 14 129000 225600 53.3 -9.06 Galway UCG Met E…
## 8 Drumsna 45 200000 295800 53.9 -8 Leitr… DAL Met E…
## 9 Ardara 15 180788. 394679. 54.8 -8.29 Doneg… AR Briffa
## 10 Armagh 62 287831. 345772. 54.4 -6.64 Armagh A Armag…
## # ℹ 15 more rows
The “stations” dataset is a tibble with 25 rows and 9 columns.
Columns:
Station: Character variable representing the name of the weather station.
Elevation: Integer variable representing the elevation of the station.
Easting: Double variable representing the easting coordinate.
Northing: Double variable representing the northing coordinate.
Lat: Double variable representing the latitude of the station.
Long: Double variable representing the longitude of the station.
County: Character variable representing the county where the station is located.
Abbreviation: Character variable representing the abbreviation of the station name (for example, UCG).
Source: Character variable representing the source of the station's record.
Rain:
rain
## # A tibble: 49,500 × 4
## Year Month Rainfall Station
## <dbl> <fct> <dbl> <chr>
## 1 1850 Jan 169 Ardara
## 2 1851 Jan 236. Ardara
## 3 1852 Jan 250. Ardara
## 4 1853 Jan 209. Ardara
## 5 1854 Jan 188. Ardara
## 6 1855 Jan 32.3 Ardara
## 7 1856 Jan 152. Ardara
## 8 1857 Jan 179. Ardara
## 9 1858 Jan 110. Ardara
## 10 1859 Jan 158. Ardara
## # ℹ 49,490 more rows
This shows that the “rain” tibble is a data frame with 49,500 rows and 4 columns.
Columns:
Year: Numeric variable representing the year.
Month: Factor variable representing the month.
Rainfall: Numeric variable representing the amount of rainfall.
Station: Character variable representing the weather station.
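The row count is worth a quick sanity check, because the ts() conversion used later relies on there being one value per station per month with no gaps. Here is a minimal sketch of the arithmetic, assuming every station covers the same span (an assumption, not something the printed output proves):

```r
n_rows     <- 49500  # rows in `rain`, from the tibble header above
n_stations <- 25     # rows in `stations`

# Years of monthly data per station, if all stations cover the same span
n_years <- n_rows / (n_stations * 12)
n_years              # 165

# Last year of the record, assuming it starts in January 1850 with no gaps
1850 + n_years - 1   # 2014
```

On the real data, counting rows per station (for example with dplyr's count(rain, Station)) would confirm or refute the equal-span assumption.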
Lastly, I will store the names of these two datasets in a variable ‘rain_info’.
rain_info <- c("stations", "rain")
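Note that rain_info holds only the names of the two objects, not the data itself. If you want the objects themselves, base R's mget() can look them up by name. The sketch below uses made-up toy objects and a temporary file standing in for rainfall.RData, so it runs on its own:

```r
# Toy stand-ins for the real objects (made-up values, for illustration only)
stations <- data.frame(Station = c("A", "B"))
rain     <- data.frame(Station = c("A", "B"), Rainfall = c(10.5, 20.1))

# Save to a temporary file that plays the role of rainfall.RData
tmp <- tempfile(fileext = ".RData")
save(stations, rain, file = tmp)

# Load into a fresh environment so nothing in the workspace is overwritten
env <- new.env()
rain_info <- load(tmp, envir = env)
rain_info                      # the object names, not the data

# mget() fetches the objects themselves as a named list
datasets <- mget(rain_info, envir = env)
str(datasets, max.level = 1)
```

Loading into a separate environment is optional, but it makes it obvious which objects came from the file.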
Now, to carry out the data analysis. In the following code, the
dplyr package is used to aggregate the rainfall data. The
group_by function groups the data by Year and Month, and
summarise calculates the total rainfall across all stations for each
combination of Year and Month. The ungroup function removes the
grouping, and transmute (like mutate, but dropping unreferenced
variables) retains only the Rainfall column. The resulting
data is then converted into a time series object (ts)
with a specified start date (January 1850) and frequency (12, i.e.
monthly). Finally, the window function is used to focus on a specific
time range (1877-1896).
rain %>% group_by(Year, Month) %>%
summarise(Rainfall = sum(Rainfall)) %>%
ungroup() %>% transmute(Rainfall) %>%
ts(start = c(1850, 1), freq = 12) -> rain_ts
rain_ts %>% window(c(1877, 1), c(1896, 12))
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1877 4273.1 1855.2 2154.0 2956.1 1908.2 2084.6 2069.5 3537.6 1981.6 3406.6
## 1878 2370.6 1559.5 1158.3 1885.6 3050.8 3346.6 1012.7 2870.8 2318.3 2692.9
## 1879 2543.1 3010.7 1378.1 1761.5 1774.8 3874.5 2996.7 3064.7 3040.8 1183.8
## 1880 1193.6 2684.2 2296.4 2517.0 935.0 2212.3 3657.7 1299.0 2120.4 2145.1
## 1881 912.6 2953.5 2587.3 1130.6 1669.6 3203.0 1739.8 3258.4 1858.9 2594.0
## 1882 2004.4 2362.1 2104.3 2940.2 1931.6 2433.1 3467.1 2270.0 2294.0 2893.3
## 1883 4179.5 4360.4 1226.3 1666.8 1558.2 1494.0 2699.9 2742.8 3351.6 2485.3
## 1884 3624.0 3916.5 2946.4 1129.7 1797.2 657.8 2361.2 1240.9 1804.8 1827.9
## 1885 2454.8 3174.8 1896.5 2254.3 1811.1 756.8 1422.4 2098.9 3599.3 2880.4
## 1886 2814.2 2063.9 2517.4 1568.8 2795.1 1100.9 2576.8 1890.3 2729.4 3773.4
## 1887 2556.0 1185.6 1131.5 1243.8 969.7 372.1 1789.7 2109.4 2138.3 1747.7
## 1888 2242.4 643.0 2457.8 1397.0 1918.4 3047.9 3496.2 2209.9 830.6 1623.9
## 1889 2256.5 2027.5 1261.6 1863.9 2567.9 701.0 1591.8 4011.2 1376.3 3379.1
## 1890 3582.6 1143.0 2446.4 1146.3 1898.5 2205.3 1828.0 2117.9 2133.4 1624.8
## 1891 1468.1 253.7 1211.6 1535.9 2192.0 1854.9 1340.3 4031.8 2072.9 3545.9
## 1892 1772.9 2024.7 698.3 816.0 2808.7 1766.5 2252.2 4116.0 2533.2 2242.5
## 1893 2206.7 2556.2 589.2 745.7 1250.2 1331.0 1854.2 2843.5 1456.1 2035.8
## 1894 3296.4 2372.0 1553.9 2689.9 1806.5 1570.8 2851.6 2216.8 408.7 3082.4
## 1895 2584.5 745.6 2486.5 1442.5 457.5 1480.9 3110.2 3475.6 711.3 2607.3
## 1896 1219.5 1411.9 2992.7 887.8 279.1 1768.6 3711.8 1498.4 3980.5 2524.6
## Nov Dec
## 1877 4059.8 2959.0
## 1878 1651.4 1684.1
## 1879 713.8 1411.4
## 1880 3335.5 2480.2
## 1881 3667.9 3135.2
## 1882 3708.6 2905.7
## 1883 3174.6 1425.8
## 1884 2431.5 2915.1
## 1885 1858.9 1168.5
## 1886 2281.4 3653.2
## 1887 2361.0 2120.3
## 1888 3231.4 3549.7
## 1889 1436.9 2726.7
## 1890 4294.2 1756.6
## 1891 2576.4 3970.7
## 1892 3358.2 1869.3
## 1893 1633.4 2916.3
## 1894 2911.2 2428.8
## 1895 3544.9 3875.9
## 1896 870.6 4139.4
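The values above run into the thousands because each one is a sum over all 25 stations, not a single station's rainfall. The same pipeline can be sketched on a small toy dataset where the answer is known in advance. Everything here (the toy tibble, its two stations "A" and "B", the constant value of 1) is made up for illustration:

```r
library(dplyr)

# Toy data: 2 made-up stations, 3 years of monthly values (not real rainfall)
toy <- expand.grid(
  Month   = factor(month.abb, levels = month.abb),
  Year    = 1850:1852,
  Station = c("A", "B"),
  stringsAsFactors = FALSE
)
toy$Rainfall <- 1  # every station records 1 unit each month

toy_ts <- toy %>%
  group_by(Year, Month) %>%
  # Sum over stations; .groups = "drop" does the job of ungroup()
  summarise(Rainfall = sum(Rainfall), .groups = "drop") %>%
  transmute(Rainfall) %>%
  ts(start = c(1850, 1), frequency = 12)

# Each monthly total is 2 (one unit from each of the two stations)
window(toy_ts, start = c(1851, 1), end = c(1851, 12))
```

Because ts() only attaches dates by position, the rows must already be in calendar order and complete; summarise() returns them sorted by Year and then by the Month factor levels, which is what makes this work.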
library(dygraphs)
rain_ts %>% dygraph()
This step loads the dygraphs library and passes the time series to
dygraph() to draw an interactive plot. Please feel free to hover over
the graph to explore the data points.
rain_ts %>%
window(c(1850, 1), c(1889, 12)) %>%
dygraph(height = 300, width = 960)
Building upon the previous step, this code uses window to restrict the series to a specific time range (1850-1889). The resulting dygraph is limited to this time window, providing a more detailed view of the selected period. Notice how the size of the graph is set by the width and height (in pixels) that I passed to dygraph!
rain_ts %>% dygraph() %>%
dyRangeSelector()
This step adds further interactivity to the dygraph by attaching a range
selector (dyRangeSelector). You can drag and adjust the
selected time window for better exploration of the data.
rain_ts %>% dygraph(width = 960, height = 330) %>%
dyRangeSelector() %>% dyRoller(rollPeriod = 600)
Here, a rolling mean is added to the interactive dygraph using the
dyRoller function. The rolling mean smooths the curve and helps
identify trends by averaging over a specified rolling period (600
months, i.e. 50 years, in this case). Notice how much more stable it
looks than before!
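To see what a rolling mean does to a series, here is a minimal base R sketch of a trailing moving average on a made-up toy series. It illustrates the idea only: dygraphs computes its own average in the browser and may treat the first few points differently.

```r
# Toy monthly series: a steady upward trend plus an alternating swing
x <- ts(1:48 + rep(c(-5, 5), 24), start = c(1850, 1), frequency = 12)

k <- 12  # rolling period in months; the dygraph above uses 600

# Trailing mean: each value averages the current point and the previous k - 1
roll <- stats::filter(x, rep(1 / k, k), sides = 1)

roll[1:(k - 1)]  # NA: not enough history yet for a full window
roll[k]          # mean of the first 12 values
```

The call is written as stats::filter() because dplyr, loaded earlier, also exports a function named filter().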
rain %>% group_by(Year, Month) %>% filter(Station == "Dublin Airport") %>%
summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
ts(start = c(1850, 1), freq = 12) -> dub_ts
rain %>% group_by(Year, Month) %>% filter(Station == "Belfast") %>%
summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
ts(start = c(1850, 1), freq = 12) -> bel_ts
rain %>% group_by(Year, Month) %>% filter(Station == "University College Galway") %>%
summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
ts(start = c(1850, 1), freq = 12) -> ucg_ts
rain %>% group_by(Year, Month) %>% filter(Station == "Cork Airport") %>%
summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
ts(start = c(1850, 1), freq = 12) -> cor_ts
beldubucgcor_ts <- cbind(bel_ts, dub_ts, ucg_ts, cor_ts)
window(beldubucgcor_ts, c(1850, 1), c(1850, 12))
## bel_ts dub_ts ucg_ts cor_ts
## Jan 1850 115.7 75.8 108.9 155.3
## Feb 1850 120.5 47.8 131.5 92.6
## Mar 1850 56.8 18.5 56.6 56.0
## Apr 1850 142.6 97.5 120.5 207.2
## May 1850 57.9 58.6 69.8 35.3
## Jun 1850 62.0 43.6 74.7 11.4
## Jul 1850 96.3 66.0 89.1 179.0
## Aug 1850 110.4 41.2 136.8 46.5
## Sep 1850 65.8 54.2 85.2 40.7
## Oct 1850 87.6 40.4 90.7 53.8
## Nov 1850 104.4 60.0 131.3 153.2
## Dec 1850 57.6 81.1 90.6 169.4
This step creates a separate time series object
(dub_ts, bel_ts, ucg_ts,
cor_ts) for each of the four weather stations. These are then
combined with cbind into a single object (beldubucgcor_ts) for
comparative analysis. The window function is applied to show a specific
time range (January to December 1850).
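Since the four pipelines differ only in the station name, the repetition could be folded into a small helper. The function station_ts() below is hypothetical (it is not part of the original code or of any package), and the toy data stands in for the real rain tibble. It assumes one row per station per month, so no summing is needed:

```r
library(dplyr)

# Hypothetical helper (not from the original post): monthly ts for one station.
# Assumes `data` has Year, Month (factor in calendar order), Rainfall and
# Station columns, with one row per station per month and no gaps.
station_ts <- function(data, station, start = c(1850, 1)) {
  data %>%
    filter(Station == station) %>%
    arrange(Year, Month) %>%      # calendar order before attaching dates
    pull(Rainfall) %>%
    ts(start = start, frequency = 12)
}

# Toy data standing in for `rain` (made-up values)
toy <- expand.grid(
  Month   = factor(month.abb, levels = month.abb),
  Year    = 1850:1851,
  Station = c("Belfast", "Dublin Airport"),
  stringsAsFactors = FALSE
)
toy$Rainfall <- ifelse(toy$Station == "Belfast", 100, 50)

bel_toy <- station_ts(toy, "Belfast")
dub_toy <- station_ts(toy, "Dublin Airport")

# cbind() on ts objects gives a multi-column series, one column per station
both <- cbind(bel_toy, dub_toy)
window(both, start = c(1850, 1), end = c(1850, 3))
```

With the real data, the equivalent call would be something like station_ts(rain, "Cork Airport").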
beldubucgcor_ts %>% dygraph(width = 960, height = 360) %>%
dyRangeSelector()
The combined time series object is visualized using a dygraph for a four-way comparison. The dygraph includes a range selector for interactive exploration. This graph is not very pleasing to the eye: the four series overlap so closely that you cannot tell the weather stations apart.
dub_ts %>% dygraph(width = 800, height = 130, group = "dub_belf_ucg_cor", main = "Dublin")
bel_ts %>% dygraph(width = 800, height = 130, group = "dub_belf_ucg_cor", main = "Belfast")
ucg_ts %>% dygraph(width = 800, height = 130, group = "dub_belf_ucg_cor", main = "University College Galway")
cor_ts %>% dygraph(width = 800, height = 170, group = "dub_belf_ucg_cor", main = "Cork Airport") %>% dyRangeSelector()
Hence, in this final step, an individual dygraph is created for each
station, providing a detailed view of its rainfall pattern. All four
plots share the same group argument, which keeps their zoom level
synchronised, so adjusting the dyRangeSelector on the last graph moves
all of them together.
These steps collectively form an exploration of the rainfall time series data, using interactive dygraphs for analysis and visualization. If you compare these graphs with the last one, you’ll see why it is better not to superimpose several time series on a single plot!
1- Dublin, Belfast, University College Galway and Cork Airport all
had their highest rainfall in January 1900.
2- On the evidence of these four stations, January 1900 is the
wettest month in modern Irish history.
3- Each weather station has different troughs and crests. This means
that when rainfall was high at one station, say Dublin, the others did
not necessarily have their highest rainfall at the same time (except
for national outliers like January 1900). This is because these cities
are quite far from each other.
4- The stations show no visible difference when viewed over a larger
timeframe (> 15 years).
5- When the observed timeframe is shortened, month-to-month
fluctuations become visible.
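The first two observations were read off the graphs, but they can also be checked numerically. Here is a minimal sketch of how to find the peak month of a monthly ts, using a made-up toy series with one planted spike rather than the real data:

```r
# Toy monthly series with one planted spike (made-up values, not real rainfall)
x <- ts(rep(100, 120), start = c(1895, 1), frequency = 12)
x[61] <- 400  # the 61st month counting from January 1895 is January 1900

i <- which.max(x)                         # position of the largest value
peak_year  <- floor(time(x)[i] + 1e-8)    # small offset guards against rounding
peak_month <- month.abb[cycle(x)[i]]      # cycle() gives the month number, 1-12
paste(peak_month, peak_year)              # "Jan 1900"
```

Running the same three lines on dub_ts, bel_ts, ucg_ts and cor_ts would show whether each station really peaks in January 1900.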