Puspita Ghosh
2025-01-04
In this blog, I’ll be exploring rainfall patterns across Ireland by analyzing data from four different weather stations. I’ll be using dygraphs, an R package that creates dynamic time series plots, to help visualize and understand these patterns. The blog is structured into four main sections. We’ll start with an Introduction that sets the context for our analysis, followed by Data Exploration where we’ll examine the raw data from our weather stations. Then we’ll move into Data Analysis to investigate patterns and relationships in the rainfall data, and finally look at Trends to understand how rainfall patterns have changed over time. To make this analysis transparent and reproducible, I’ve included expandable “show” buttons throughout the blog that reveal the exact R code used for each section of the analysis.
In conducting this analysis in RStudio, the first step was to load the required packages. The tidyverse package provides a comprehensive set of tools for data import, manipulation, and visualization. dplyr, which handles data manipulation and transformation, is already attached by tidyverse, so loading it separately is harmless but redundant. For dynamic time series visualizations, I used the dygraphs package. I also loaded several packages that are not directly used in this blog but are valuable for spatial work: sp for spatial data manipulation, sf for spatial data based on the Simple Features standard, tmap and leaflet for thematic and interactive maps, and RColorBrewer for colour palettes.
library(tidyverse)
library(dplyr)
library(dygraphs)
library(sp)
library(RColorBrewer)
library(tmap)
library(leaflet)
library(sf)
Now, I will load the rainfall data and prepare it for further investigation.
load("rainfall.RData")
To understand the dataset, I began with some exploratory analysis. The load() function restores the objects saved in the file into the workspace and returns a character vector of their names, so applying str() to that return value shows which objects the file contains rather than the structure of the data itself.
rain_info <- load("rainfall.RData")
str(rain_info)
## chr [1:2] "stations" "rain"
This shows that rainfall.RData contains two datasets: stations and rain.
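As a minimal sketch of why str() printed names rather than data, the snippet below saves two made-up objects to a temporary file and loads them back into a separate environment. The object names stations_demo and rain_demo are hypothetical stand-ins and are not part of rainfall.RData.

```r
# Two small made-up objects (hypothetical stand-ins for the real datasets)
stations_demo <- data.frame(Station = c("A", "B"))
rain_demo <- data.frame(Year = 1850, Rainfall = 10)

# Save them to a temporary .RData file
tmp <- tempfile(fileext = ".RData")
save(stations_demo, rain_demo, file = tmp)

# Load into a fresh environment so the workspace is left untouched
env <- new.env()
loaded <- load(tmp, envir = env)

str(loaded)              # a character vector holding the two object names
str(env$stations_demo)   # the structure of the data itself
```

Loading into a separate environment is optional; it simply makes it easy to see exactly what a file contains before using it.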
The two datasets look like this, starting with stations:
stations
## # A tibble: 25 × 9
## Station Elevation Easting Northing Lat Long County Abbreviation Source
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
## 1 Athboy 87 270400 261700 53.6 -6.93 Meath AB Met E…
## 2 Foulksmills 71 284100 118400 52.3 -6.77 Wexfo… F Met E…
## 3 Mullingar 112 241780 247765 53.5 -7.37 Westm… M Met E…
## 4 Portlaw 8 246600 115200 52.3 -7.31 Water… P Met E…
## 5 Rathdrum 131 319700 186000 52.9 -6.22 Wickl… RD Met E…
## 6 Strokestown 49 194500 279100 53.8 -8.1 Rosco… S Met E…
## 7 University… 14 129000 225600 53.3 -9.06 Galway UCG Met E…
## 8 Drumsna 45 200000 295800 53.9 -8 Leitr… DAL Met E…
## 9 Ardara 15 180788. 394679. 54.8 -8.29 Doneg… AR Briffa
## 10 Armagh 62 287831. 345772. 54.4 -6.64 Armagh A Armag…
## # ℹ 15 more rows
The “stations” dataset is a tibble with 25 rows and 9 columns.
Columns:
Station: Character variable giving the name of the weather station.
Elevation: Integer variable giving the elevation of the station.
Easting: Double variable giving the easting coordinate.
Northing: Double variable giving the northing coordinate.
Lat: Double variable giving the latitude of the station.
Long: Double variable giving the longitude of the station.
County: Character variable giving the county where the station is located.
Abbreviation: Character variable giving a short code for the station (for example, UCG for University College Galway).
Source: Character variable giving the source of the station's data (the values are truncated in the printout above).
Rain:
rain
## # A tibble: 49,500 × 4
## Year Month Rainfall Station
## <dbl> <fct> <dbl> <chr>
## 1 1850 Jan 169 Ardara
## 2 1851 Jan 236. Ardara
## 3 1852 Jan 250. Ardara
## 4 1853 Jan 209. Ardara
## 5 1854 Jan 188. Ardara
## 6 1855 Jan 32.3 Ardara
## 7 1856 Jan 152. Ardara
## 8 1857 Jan 179. Ardara
## 9 1858 Jan 110. Ardara
## 10 1859 Jan 158. Ardara
## # ℹ 49,490 more rows
This shows that the “rain” dataset is a tibble with 49,500 rows and 4 columns.
Columns:
Year: Numeric variable giving the year.
Month: Factor variable giving the month.
Rainfall: Numeric variable giving the amount of rainfall.
Station: Character variable giving the name of the weather station.
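A quick arithmetic check on these dimensions hints at the time span covered. The sketch below assumes every one of the 25 stations has a complete monthly record starting in 1850; that assumption is mine and is not verified against the data here.

```r
n_rows     <- 49500   # rows in the rain tibble
n_stations <- 25      # rows in the stations tibble
n_months   <- 12

# Years per station, assuming complete monthly records for every station
n_years   <- n_rows / (n_stations * n_months)
last_year <- 1850 + n_years - 1

n_years     # 165
last_year   # 2014
```

If the assumption holds, the record runs from 1850 to 2014.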
Lastly, I store the names of the two datasets in a variable, ‘rain_info’.
rain_info <- c("stations", "rain")
To analyze the rainfall data, I chained several dplyr functions. First, I grouped the data by both Year and Month using group_by. Then, using summarise, I calculated the total rainfall for each Year-Month combination, which adds up the readings from all 25 stations. After removing the grouping with ungroup, I used transmute to keep only the Rainfall column. I then converted the result into a time series with the ts function, specifying monthly data starting in January 1850. Finally, I used the window function to look at a specific 20-year period from 1877 to 1896. This organizes the rainfall data chronologically and prepares it for more detailed analysis.
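# Sum rainfall over all stations for each Year-Month, then build a monthly series
rain %>%
  group_by(Year, Month) %>%
  summarise(Rainfall = sum(Rainfall)) %>%
  ungroup() %>%
  transmute(Rainfall) %>%
  ts(start = c(1850, 1), frequency = 12) -> rain_ts
# Show the 20 years from January 1877 to December 1896
rain_ts %>% window(c(1877, 1), c(1896, 12))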
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1877 4273.1 1855.2 2154.0 2956.1 1908.2 2084.6 2069.5 3537.6 1981.6 3406.6
## 1878 2370.6 1559.5 1158.3 1885.6 3050.8 3346.6 1012.7 2870.8 2318.3 2692.9
## 1879 2543.1 3010.7 1378.1 1761.5 1774.8 3874.5 2996.7 3064.7 3040.8 1183.8
## 1880 1193.6 2684.2 2296.4 2517.0 935.0 2212.3 3657.7 1299.0 2120.4 2145.1
## 1881 912.6 2953.5 2587.3 1130.6 1669.6 3203.0 1739.8 3258.4 1858.9 2594.0
## 1882 2004.4 2362.1 2104.3 2940.2 1931.6 2433.1 3467.1 2270.0 2294.0 2893.3
## 1883 4179.5 4360.4 1226.3 1666.8 1558.2 1494.0 2699.9 2742.8 3351.6 2485.3
## 1884 3624.0 3916.5 2946.4 1129.7 1797.2 657.8 2361.2 1240.9 1804.8 1827.9
## 1885 2454.8 3174.8 1896.5 2254.3 1811.1 756.8 1422.4 2098.9 3599.3 2880.4
## 1886 2814.2 2063.9 2517.4 1568.8 2795.1 1100.9 2576.8 1890.3 2729.4 3773.4
## 1887 2556.0 1185.6 1131.5 1243.8 969.7 372.1 1789.7 2109.4 2138.3 1747.7
## 1888 2242.4 643.0 2457.8 1397.0 1918.4 3047.9 3496.2 2209.9 830.6 1623.9
## 1889 2256.5 2027.5 1261.6 1863.9 2567.9 701.0 1591.8 4011.2 1376.3 3379.1
## 1890 3582.6 1143.0 2446.4 1146.3 1898.5 2205.3 1828.0 2117.9 2133.4 1624.8
## 1891 1468.1 253.7 1211.6 1535.9 2192.0 1854.9 1340.3 4031.8 2072.9 3545.9
## 1892 1772.9 2024.7 698.3 816.0 2808.7 1766.5 2252.2 4116.0 2533.2 2242.5
## 1893 2206.7 2556.2 589.2 745.7 1250.2 1331.0 1854.2 2843.5 1456.1 2035.8
## 1894 3296.4 2372.0 1553.9 2689.9 1806.5 1570.8 2851.6 2216.8 408.7 3082.4
## 1895 2584.5 745.6 2486.5 1442.5 457.5 1480.9 3110.2 3475.6 711.3 2607.3
## 1896 1219.5 1411.9 2992.7 887.8 279.1 1768.6 3711.8 1498.4 3980.5 2524.6
## Nov Dec
## 1877 4059.8 2959.0
## 1878 1651.4 1684.1
## 1879 713.8 1411.4
## 1880 3335.5 2480.2
## 1881 3667.9 3135.2
## 1882 3708.6 2905.7
## 1883 3174.6 1425.8
## 1884 2431.5 2915.1
## 1885 1858.9 1168.5
## 1886 2281.4 3653.2
## 1887 2361.0 2120.3
## 1888 3231.4 3549.7
## 1889 1436.9 2726.7
## 1890 4294.2 1756.6
## 1891 2576.4 3970.7
## 1892 3358.2 1869.3
## 1893 1633.4 2916.3
## 1894 2911.2 2428.8
## 1895 3544.9 3875.9
## 1896 870.6 4139.4
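The pipeline above can be tried on a small example. The sketch below uses made-up toy data (two hypothetical stations, two years, a constant 10 mm per month) so that the totals are easy to check by eye. It uses pull() to obtain a plain vector, which is a slight variation on the transmute() step in the original code.

```r
library(dplyr)

# Toy data: 2 stations x 2 years x 12 months, every value 10 mm (made up)
toy <- expand.grid(
  Month   = factor(month.abb, levels = month.abb),
  Year    = 1850:1851,
  Station = c("A", "B"),
  stringsAsFactors = FALSE
)
toy$Rainfall <- 10

toy_ts <- toy %>%
  group_by(Year, Month) %>%
  summarise(Rainfall = sum(Rainfall), .groups = "drop") %>%  # total over stations
  arrange(Year, Month) %>%                                   # chronological order
  pull(Rainfall) %>%
  ts(start = c(1850, 1), frequency = 12)

# Each monthly total is 20 (two stations at 10 mm each)
window(toy_ts, start = c(1851, 1), end = c(1851, 6))
```

Because Month is a factor with levels in calendar order, sorting by Year and then Month gives the chronological order that ts() expects.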
library(dygraphs)
create_rainfall_graph <- function(data, title = "Rainfall Data Over Time") {
  data %>%
    dygraph(main = title) %>%
    dySeries("Rainfall", color = "blue") %>%
    dyAxis("x", label = "Year") %>%
    dyAxis("y", label = "Rainfall (mm)") %>%
    dyOptions(drawGrid = TRUE, strokeWidth = 2) %>%
    dyLegend(show = "always", width = 300)
}
create_rainfall_graph(rain_ts)
This step loads the dygraphs library and then plots the series with dygraph(). Please feel free to hover over the graph to explore the data points.
I wrote the function create_rainfall_graph so that the same customization code does not have to be duplicated for every graph. This keeps the look of the graphs uniform, saves time, and keeps the main code tidier, so I can concentrate on the analysis rather than the design.
rain_ts %>%
  window(c(1850, 1), c(1889, 12)) %>%
  create_rainfall_graph(title = "Rainfall Data (1850–1889)")
In this next phase of the analysis, I restricted the visualization to a specific time window. Using the window function, I selected the years between 1850 and 1889 and passed the result to create_rainfall_graph with a matching title. This narrower time frame allows us to examine the rainfall patterns in greater detail during this 40-year period, making it easier to identify trends and patterns that might be less visible in a broader time range.
create_rainfall_graph(rain_ts, title = "Full Rainfall Data") %>%
  dyRangeSelector()
This step adds interactivity to the dygraph with a range selector (dyRangeSelector). You can drag and resize the selected time window to explore the data more closely.
create_rainfall_graph(rain_ts, "Rainfall Data Over Time") %>%
  dyRangeSelector() %>%
  dyRoller(rollPeriod = 600)
Here, a rolling mean is added to the interactive dygraph using the dyRoller function. The rolling mean smooths the curve and helps identify trends by averaging over a specified rolling period (600 months, or 50 years, in this case). Notice how much more stable the curve looks than before!
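To make the idea of a rolling mean concrete, here is a minimal sketch in base R on five made-up values with a window of 3. It illustrates the averaging itself and is not how dygraphs computes it internally; in particular, dygraphs may treat the first few points differently.

```r
# Five made-up monthly values
x <- ts(c(2, 4, 6, 8, 10), start = c(1850, 1), frequency = 12)

# Trailing 3-point mean: each value is the average of itself and the two before it.
# stats::filter is spelled out because dplyr::filter masks it once dplyr is loaded.
roll3 <- stats::filter(x, rep(1 / 3, 3), sides = 1)

roll3   # the first two values are NA, followed by 4, 6 and 8
```

A larger window averages over more months, which is why a rollPeriod of 600 flattens almost all of the month-to-month variation.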
# Dublin Airport
rain %>% filter(Station == "Dublin Airport") %>% group_by(Year, Month) %>%
  summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), frequency = 12) -> dub_ts
# Belfast
rain %>% filter(Station == "Belfast") %>% group_by(Year, Month) %>%
  summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), frequency = 12) -> bel_ts
# University College Galway
rain %>% filter(Station == "University College Galway") %>% group_by(Year, Month) %>%
  summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), frequency = 12) -> ucg_ts
# Cork Airport
rain %>% filter(Station == "Cork Airport") %>% group_by(Year, Month) %>%
  summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), frequency = 12) -> cor_ts
beldubucgcor_ts <- cbind(bel_ts, dub_ts, ucg_ts, cor_ts)
window(beldubucgcor_ts, c(1850, 1), c(1850, 12))
## bel_ts dub_ts ucg_ts cor_ts
## Jan 1850 115.7 75.8 108.9 155.3
## Feb 1850 120.5 47.8 131.5 92.6
## Mar 1850 56.8 18.5 56.6 56.0
## Apr 1850 142.6 97.5 120.5 207.2
## May 1850 57.9 58.6 69.8 35.3
## Jun 1850 62.0 43.6 74.7 11.4
## Jul 1850 96.3 66.0 89.1 179.0
## Aug 1850 110.4 41.2 136.8 46.5
## Sep 1850 65.8 54.2 85.2 40.7
## Oct 1850 87.6 40.4 90.7 53.8
## Nov 1850 104.4 60.0 131.3 153.2
## Dec 1850 57.6 81.1 90.6 169.4
This step creates a separate time series object (dub_ts, bel_ts, ucg_ts, cor_ts) for each of the four weather stations. These are then combined with cbind into a single object (beldubucgcor_ts) for comparative analysis. The window function is applied to show a specific time range (January to December 1850).
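Since the four station pipelines differ only in the station name, they could be wrapped in a small helper. The sketch below is one possible way to do that; station_ts() is a hypothetical function that is not part of the original analysis, and the toy data is made up so the example runs on its own.

```r
library(dplyr)

# Hypothetical helper: monthly time series for a single station
station_ts <- function(data, station_name, start_year = 1850) {
  data %>%
    filter(Station == station_name) %>%
    group_by(Year, Month) %>%
    summarise(Rainfall = sum(Rainfall), .groups = "drop") %>%
    arrange(Year, Month) %>%
    pull(Rainfall) %>%
    ts(start = c(start_year, 1), frequency = 12)
}

# Made-up toy data: Belfast always 100 mm, Dublin Airport always 50 mm
toy <- expand.grid(
  Month   = factor(month.abb, levels = month.abb),
  Year    = 1850:1851,
  Station = c("Belfast", "Dublin Airport"),
  stringsAsFactors = FALSE
)
toy$Rainfall <- ifelse(toy$Station == "Belfast", 100, 50)

# Combine the per-station series into one multi-column object
both <- cbind(
  bel = station_ts(toy, "Belfast"),
  dub = station_ts(toy, "Dublin Airport")
)
window(both, start = c(1850, 1), end = c(1850, 3))
```

With the real data, the same helper could be called once per station, for example station_ts(rain, "Cork Airport").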
beldubucgcor_ts %>%
  dygraph(width = 960, height = 360) %>%
  dyRangeSelector()
The next step involved creating a comparative visualization of all four weather stations using dygraphs, complete with an interactive range selector for exploring the data. However, this visualization proved to be less than ideal for our analysis. The challenge lies in the close proximity of the rainfall values across stations – when plotted together, the lines overlap significantly, making it difficult to distinguish between individual weather stations and their unique patterns. This visualization limitation makes it challenging to effectively compare rainfall trends across different locations.
dub_ts %>%
  dygraph(width = 800, height = 130, group = "dub_belf_ucg_cor", main = "Dublin") %>%
  dySeries(color = "blue")
bel_ts %>%
  dygraph(width = 800, height = 130, group = "dub_belf_ucg_cor", main = "Belfast") %>%
  dySeries(color = "green")
ucg_ts %>%
  dygraph(width = 800, height = 130, group = "dub_belf_ucg_cor", main = "University College Galway") %>%
  dySeries(color = "red")
cor_ts %>%
  dygraph(width = 800, height = 170, group = "dub_belf_ucg_cor", main = "Cork Airport") %>%
  dySeries(color = "purple") %>%
  dyRangeSelector()
To overcome the visualization challenge from the previous combined graph, I created a separate dygraph for each weather station. The four plots share the same group argument, so zooming or panning one of them is mirrored in the others, and the range selector attached to the bottom (Cork Airport) plot can be used to pick the time window for all four. This separation makes it much easier to observe and analyze the unique rainfall patterns at each location. Compared with the earlier attempt at combining all stations in one graph, the advantage is clear: we can now see each station’s rainfall trends distinctly, without the confusion of overlapping lines. This shows why it is often better to separate multiple time series rather than superimposing them on a single graph, especially when the values are similar and the patterns need to be clearly distinguished.
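The linking between plots comes from the group argument of dygraph(). Below is a minimal sketch with two made-up series; the group name "demo_group" and the series themselves are hypothetical and only illustrate the mechanism.

```r
library(dygraphs)

# Two made-up monthly series, purely for illustration
a <- ts(sin(1:48), start = c(1850, 1), frequency = 12)
b <- ts(cos(1:48), start = c(1850, 1), frequency = 12)

# Widgets that share a group name are linked when rendered on the same page
top <- dygraph(a, main = "Series A", height = 130, group = "demo_group")

# Only the bottom plot carries the range selector
bottom <- dyRangeSelector(
  dygraph(b, main = "Series B", height = 170, group = "demo_group")
)

top
bottom
```

The linking only takes effect when both widgets appear in the same rendered document, such as an R Markdown page; printed one at a time in the console, each plot is shown on its own.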
1- Dublin, Belfast, University College Galway and Cork Airport all had their highest rainfall in January 1900.
2- January 1900 is the wettest month in modern Irish history.
3- Each weather station has different troughs and crests. When rainfall was high at one station, say Dublin, the others did not necessarily peak at the same time (except for national outliers like January 1900). This is because these cities are quite far from each other.
4- The stations show no visible difference when viewed over a longer timeframe (> 15 years).
5- When the timeframe is shortened, month-to-month fluctuations become visible.