Introduction
Welcome to my Geo Computation journey! In this blog post, we’ll
delve into the world of rainfall data analysis using R. I recently
completed an assignment that involved processing, cleaning, and
visualizing rainfall data from various weather stations in Ireland. Join
me as I walk you through the steps I took to transform raw data into
insightful visualizations.
Data Explanation
The data used in this analysis consists of monthly rainfall
measurements from four weather stations in Ireland: Belfast, Dublin
Airport, University College Galway, and Cork Airport. The dataset
includes columns for the year, month, station name, and rainfall amount.
The goal is to process this data, handle any missing values or
anomalies, and create an interactive time series visualization to
identify patterns in rainfall.
Libraries and Data Loading
To kick things off, I loaded the libraries needed for data
manipulation (dplyr), interactive plotting (dygraphs), time series
handling (xts, zoo), imputation (imputeTS), and outlier cleaning
(forecast):
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(dygraphs)
## Warning: package 'dygraphs' was built under R version 4.4.2
library(xts)
## Warning: package 'xts' was built under R version 4.4.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.4.2
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
library(zoo)
library(imputeTS)
## Warning: package 'imputeTS' was built under R version 4.4.2
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
##
## Attaching package: 'imputeTS'
## The following object is masked from 'package:zoo':
##
## na.locf
library(forecast)
## Warning: package 'forecast' was built under R version 4.4.2
Next, I loaded the rainfall data from an .RData file and prepared it
for processing.
Processing Data for Each Station
The first task was to create a function, process_station_data, that
processes the data for one weather station. It filters the data to that
station, sums the rainfall for each month, ensures the Year and Month
columns are numeric, and creates a Date column in YYYY-MM-DD format.
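The body of process_station_data is not reproduced in this post, so here is a minimal sketch of what such a function could look like. It assumes the loaded file provides a data frame called rain with columns Year, Month (three-letter abbreviations such as "Jan"), Station, and Rainfall. The object name, the Month encoding, and the made-up values in the toy rain data frame are all assumptions for illustration, not details taken from the original file.

```r
library(dplyr)

# Hypothetical stand-in for the data frame loaded from rainfall.RData
# (values are invented purely to make the sketch runnable)
rain <- data.frame(
  Year     = c(1850, 1850, 1850, 1850),
  Month    = c("Jan", "Feb", "Jan", "Feb"),
  Station  = c("Belfast", "Belfast", "Dublin Airport", "Dublin Airport"),
  Rainfall = c(115.7, 120.5, 75.3, 61.0)
)

process_station_data <- function(station_name, data = rain) {
  data %>%
    filter(Station == station_name) %>%                  # keep one station
    mutate(
      Year  = as.integer(as.character(Year)),            # force numeric year
      Month = match(as.character(Month), month.abb)      # "Jan" -> 1, etc.
    ) %>%
    group_by(Year, Month) %>%
    summarise(Rainfall = sum(Rainfall), .groups = "drop") %>%  # monthly total
    mutate(Date = as.Date(sprintf("%04d-%02d-01", Year, Month))) %>%
    arrange(Date) %>%
    select(Date, Rainfall)                               # Date first, then value
}

belfast <- process_station_data("Belfast")
```

Returning Date as the first column matches the later step that drops the first column before building the xts object.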
Handling Missing Data and Anomalies
Missing data and anomalies can skew analysis results, so I created a
function to clean the data. The na_interpolation function (from
imputeTS) fills in any missing rainfall values by interpolation, and the
tsclean function (from forecast) then identifies outliers in the filled
series and replaces them with estimated values.
clean_data <- function(ts_data) {
  ts_data %>%
    # Fill missing values by interpolation (imputeTS)
    mutate(Rainfall = na_interpolation(Rainfall)) %>%
    # Replace outliers with estimates (forecast); as.numeric() drops the
    # ts class that tsclean() returns so the column stays a plain vector
    mutate(Rainfall = as.numeric(tsclean(Rainfall)))
}
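To see what the two cleaning steps do, here is a small sketch on synthetic data with two missing months and one artificial spike. Unlike clean_data above, which passes a plain vector, this variation wraps the series in ts(..., frequency = 12) so that tsclean can take the yearly cycle into account. The toy series and the choice of frequency are assumptions for illustration.

```r
library(imputeTS)
library(forecast)

set.seed(1)
# Four years of made-up monthly rainfall with a seasonal cycle
toy <- data.frame(
  Date     = seq(as.Date("2000-01-01"), by = "month", length.out = 48),
  Rainfall = 80 + 30 * sin(2 * pi * (1:48) / 12) + rnorm(48, sd = 5)
)
toy$Rainfall[c(10, 11)] <- NA   # two missing months
toy$Rainfall[30] <- 400         # one artificial spike

# Step 1: linear interpolation of the gaps (the na_interpolation default)
filled <- na_interpolation(toy$Rainfall)

# Step 2: outlier replacement on a monthly ts object
cleaned <- as.numeric(tsclean(ts(filled, frequency = 12)))

# Compare raw and cleaned values at the rows that were tampered with
data.frame(Date = toy$Date, raw = toy$Rainfall, cleaned = cleaned)[c(10, 11, 30), ]
```

Inspecting the rows that were changed is a quick way to confirm that the cleaning only touched the values it was meant to.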
Cleaning Data for Each Station
After loading the .RData file, I applied process_station_data and then
clean_data to each of the four stations: Belfast, Dublin Airport,
University College Galway, and Cork Airport.
load("C:/Users/Dell/Downloads/rainfall.RData")
bel_ts <- clean_data(process_station_data("Belfast"))
dub_ts <- clean_data(process_station_data("Dublin Airport"))
gal_ts <- clean_data(process_station_data("University College Galway"))
cor_ts <- clean_data(process_station_data("Cork Airport"))
Combining Data
To gain a holistic view, I combined the cleaned data from all four
stations into a single data frame, merged by date. The suffix arguments
give each station’s Rainfall column a distinct name, such as
Rainfall_Belfast.
all_stations_df <- full_join(
  full_join(bel_ts, dub_ts, by = "Date", suffix = c("_Belfast", "_Dublin_Airport")),
  full_join(gal_ts, cor_ts, by = "Date", suffix = c("_Galway", "_Cork_Airport")),
  by = "Date"
)
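The suffix argument only renames columns whose names clash between the two inputs of a join, which is why the stations are joined in pairs above. As an alternative sketch, each Rainfall column can be renamed up front and the data frames joined in a loop, which scales to any number of stations. The three toy data frames and their values are assumptions for illustration.

```r
library(dplyr)

dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 3)

# Hypothetical per-station data frames with the same Date/Rainfall layout
stations <- list(
  Belfast        = data.frame(Date = dates, Rainfall = c(90, 70, 80)),
  Dublin_Airport = data.frame(Date = dates, Rainfall = c(60, 50, 55)),
  Galway         = data.frame(Date = dates, Rainfall = c(120, 100, 95))
)

# Rename each Rainfall column to Rainfall_<station> before joining
renamed <- lapply(names(stations), function(nm) {
  out <- stations[[nm]]
  names(out)[names(out) == "Rainfall"] <- paste0("Rainfall_", nm)
  out
})

# Join all data frames on Date, one after another
combined <- Reduce(function(a, b) full_join(a, b, by = "Date"), renamed)

names(combined)
# "Date" "Rainfall_Belfast" "Rainfall_Dublin_Airport" "Rainfall_Galway"
```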
Creating xts Object
For compatibility with dygraphs, I converted the combined data frame
into an xts time series object. The first column, which holds Date, is
dropped from the data and used as the index instead.
all_stations_xts <- xts(all_stations_df[-1], order.by = all_stations_df$Date)
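Dropping the first column by position works only while Date stays in that position. A slightly more defensive sketch selects the value columns by name and also shows the date-range subsetting that an xts index makes possible. The toy data frame is an assumption for illustration.

```r
library(xts)

# Hypothetical combined data frame with made-up values
df <- data.frame(
  Date             = seq(as.Date("2000-01-01"), by = "month", length.out = 4),
  Rainfall_Belfast = c(90, 70, 80, 65),
  Rainfall_Galway  = c(120, 100, 95, 110)
)

# Select every column except Date by name rather than by position
value_cols <- setdiff(names(df), "Date")
rain_xts <- xts(as.matrix(df[, value_cols]), order.by = df$Date)

# xts supports subsetting by an ISO 8601 date range string
rain_xts["2000-02/2000-03"]
```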
Feature Engineering: Adding Lagged Variables and Moving Averages
To enhance the analysis, I added two features for each station: a
lagged variable (the previous month’s rainfall) and a 12-month moving
average. dplyr::lag shifts each series by one month, and the rollapply
function from zoo calculates the moving average over a right-aligned
12-month window, so the first 11 values are NA. Because mutate works on
data frames, the xts object is converted to a data frame first and back
into an xts object afterwards.
all_stations_xts <- all_stations_xts %>%
  as.data.frame() %>%
  mutate(
    Belfast_Lag1        = dplyr::lag(Rainfall_Belfast, 1),
    Dublin_Airport_Lag1 = dplyr::lag(Rainfall_Dublin_Airport, 1),
    Galway_Lag1         = dplyr::lag(Rainfall_Galway, 1),
    Cork_Airport_Lag1   = dplyr::lag(Rainfall_Cork_Airport, 1),
    Belfast_MA          = rollapply(Rainfall_Belfast, 12, mean, align = "right", fill = NA),
    Dublin_Airport_MA   = rollapply(Rainfall_Dublin_Airport, 12, mean, align = "right", fill = NA),
    Galway_MA           = rollapply(Rainfall_Galway, 12, mean, align = "right", fill = NA),
    Cork_Airport_MA     = rollapply(Rainfall_Cork_Airport, 12, mean, align = "right", fill = NA)
  ) %>%
  as.xts(order.by = all_stations_df$Date)
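A tiny worked example makes the behaviour of the two features easier to check by hand. It uses a 3-value window instead of 12 so the result fits on one line; the vector x is made up for illustration.

```r
library(dplyr)
library(zoo)

x <- c(10, 20, 30, 40, 50)

# Shift the series down by one position; the first value has no predecessor
lagged <- dplyr::lag(x, 1)
# NA 10 20 30 40

# Right-aligned rolling mean: each value averages itself and the 2 before it
ma3 <- rollapply(x, 3, mean, align = "right", fill = NA)
# NA NA 20 30 40
```

With a 12-month window the same logic leaves the first 11 positions as NA, because a full year of data is not yet available there.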
Creating the Dygraph
Finally, I created an interactive time series plot using dygraphs.
The dygraph function creates the plot, and the various dy* functions add
features such as a range selector, axis labels, highlighting, and
annotations:
1) dygraph creates the interactive time series plot.
2) dyRangeSelector adds a range selector to the bottom of the chart.
3) dyOptions sets the colors for the series.
4) dyLegend controls the display of the legend.
5) dyAxis labels the axes.
6) dyHighlight adds highlighting for selected data points.
7) dyEvent marks a specific date with a labeled line (e.g., the Easter
Rising).
8) dyShading shades a specific time period (e.g., World War II).
dygraph(all_stations_xts, main = "Monthly Rainfall at Weather Stations") %>%
  dyRangeSelector() %>%
  dyOptions(colors = c("blue", "red", "green", "purple")) %>%
  dyLegend(show = "always", hideOnMouseOut = FALSE) %>%
  dyAxis("y", label = "Rainfall (mm)") %>%
  dyAxis("x", label = "Year") %>%
  dyHighlight(
    highlightCircleSize = 5,
    highlightSeriesBackgroundAlpha = 0.2,
    hideOnMouseOut = FALSE
  ) %>%
  dyEvent("1916-04-24", "Easter Rising", labelLoc = "bottom") %>%
  dyShading(from = "1939-09-01", to = "1945-09-02", color = "#FFE6E6")
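After the feature engineering step the xts object holds twelve columns, while only four colors are supplied, so colors may end up shared between raw, lagged, and averaged series. One possible variation, sketched here on toy data, is to plot only the raw rainfall columns by selecting them by name. The toy object, its values, and the two-station layout are assumptions for illustration.

```r
library(dygraphs)
library(xts)

set.seed(42)
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 24)

# Hypothetical xts object with two raw series and one derived series
toy_xts <- xts(
  cbind(
    Rainfall_Belfast = runif(24, 40, 120),
    Rainfall_Galway  = runif(24, 60, 160),
    Belfast_MA       = runif(24, 70, 90)
  ),
  order.by = dates
)

# Keep only the columns whose names start with "Rainfall_"
raw_cols <- grep("^Rainfall_", colnames(toy_xts), value = TRUE)

# One color per plotted series
plot_widget <- dygraph(toy_xts[, raw_cols], main = "Monthly Rainfall (toy data)") %>%
  dyOptions(colors = c("blue", "green")) %>%
  dyAxis("y", label = "Rainfall (mm)")
```

Printing plot_widget in an interactive session or an R Markdown chunk renders the chart.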
Embedded Dygraph
Discussion of Patterns
From the interactive dygraph, several patterns emerge. We can
observe seasonal variations in rainfall, with higher amounts typically
occurring in the winter months. The moving averages smooth out the
month-to-month noise, revealing long-term trends and changes in rainfall
patterns over the years. Notable events such as the Easter Rising and
World War II are marked as reference points, so any changes in rainfall
can be placed in historical context.
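The seasonal pattern can also be checked numerically by averaging rainfall for each calendar month across all years. The sketch below does this on synthetic data with a built-in winter peak; the toy data frame and its column names are assumptions for illustration, not the station data.

```r
library(dplyr)

set.seed(1)
# Ten years of made-up monthly rainfall that peaks in January
toy <- data.frame(
  Date = seq(as.Date("1990-01-01"), by = "month", length.out = 120)
)
month_num <- as.integer(format(toy$Date, "%m"))
toy$Rainfall <- 80 + 30 * cos(2 * pi * (month_num - 1) / 12) + rnorm(120, sd = 10)

# Mean rainfall for each calendar month across all years
monthly_mean <- toy %>%
  mutate(Month = as.integer(format(Date, "%m"))) %>%
  group_by(Month) %>%
  summarise(MeanRainfall = mean(Rainfall), .groups = "drop")

monthly_mean
```

Applied to one station's cleaned series, a table like this would show directly which months are wettest on average.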
Analysis of Rainfall Trends in Specific Decades and Historical Events
Late 1800s to 2014 (General Overview)
This broader period shows how rainfall trends evolved over more than
a century.
Observations:
Stability Over Time:
While individual years show extreme peaks, there is no clear
long-term trend of increasing or decreasing rainfall at any of the four
stations from the late 1800s to 2014.
This suggests that, at least in terms of total rainfall, the Irish
climate has been relatively stable.
Extreme Events (Post-1900):
Later periods (20th and early 21st centuries) show fewer extreme
peaks compared to the late 1800s. This might indicate improved weather
recording methods or natural climate variability.
Moving Averages (MA):
The moving averages (e.g., Galway_MA, Cork_Airport_MA) show smoother
trends, which help to highlight that while there are fluctuations in
rainfall, the overall patterns remain consistent.
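A visual impression of stability can be backed up with a simple check: sum the rainfall per year and fit a linear trend to the annual totals. The sketch below uses synthetic data with no built-in trend; the toy data frame is an assumption for illustration, and a linear fit is only one of several possible trend tests.

```r
set.seed(2)
# Fifty years of made-up monthly rainfall with no underlying trend
toy <- data.frame(
  Date = seq(as.Date("1950-01-01"), by = "month", length.out = 12 * 50)
)
toy$Rainfall <- pmax(0, rnorm(nrow(toy), mean = 80, sd = 25))

# Annual totals
toy$Year <- as.integer(format(toy$Date, "%Y"))
annual <- aggregate(Rainfall ~ Year, data = toy, FUN = sum)

# Linear trend in annual totals: slope in mm per year, plus its p-value
trend <- lm(Rainfall ~ Year, data = annual)
slope <- coef(trend)[["Year"]]
p_value <- summary(trend)$coefficients["Year", "Pr(>|t|)"]

c(slope = slope, p_value = p_value)
```

A slope close to zero with a large p-value would be consistent with the "no clear long-term trend" reading of the chart.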
Late 1800s to 1900:
This period includes the late 19th century, which appears to show
some notable variability and extreme rainfall events.
Observations:
High Variability:
The graph shows significant fluctuations in rainfall, with frequent
monthly peaks exceeding 200 mm. These spikes may correspond to extreme
weather events such as storms or floods.
For example, there are sharp spikes in rainfall in the 1880s and
1890s.
Seasonal Patterns:
Despite the variability, there is evidence of a seasonal pattern,
with regular cycles of high and low rainfall. This could indicate
consistent annual weather cycles.
Extreme Events:
Notable peaks are observed during this period, possibly representing
extreme rainfall events.
Historical context: The late 19th century saw several significant
weather anomalies globally (e.g., the 1883 eruption of Krakatoa, which
affected weather patterns worldwide).
Regional Comparison:
Galway (green) and Cork (purple) consistently show higher rainfall
compared to Dublin and Belfast. This suggests that western and southern
parts of Ireland experienced heavier rainfall during this period.
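The regional comparison can be quantified with per-station means and with the share of months in which one station is wetter than another. The sketch below uses a made-up wide data frame with the same column naming as the combined data; the values are invented for illustration.

```r
# Hypothetical combined rainfall values (mm) for four months
wide <- data.frame(
  Rainfall_Belfast        = c(90, 70, 80, 65),
  Rainfall_Dublin_Airport = c(60, 50, 55, 70),
  Rainfall_Galway         = c(120, 100, 95, 60),
  Rainfall_Cork_Airport   = c(110, 95, 100, 90)
)

# Mean monthly rainfall per station, wettest first
station_means <- sort(colMeans(wide, na.rm = TRUE), decreasing = TRUE)
station_means
# Cork 98.75, Galway 93.75, Belfast 76.25, Dublin Airport 58.75

# Share of months in which Galway was wetter than Dublin Airport
share_galway_wetter <- mean(wide$Rainfall_Galway > wide$Rainfall_Dublin_Airport)
share_galway_wetter
# 0.75
```

A share close to 1 on the real data would support the word "consistently"; a lower share would suggest the difference holds on average but not in every month.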
Late 1900s to 2000:
This period represents the late 20th century, during which rainfall
patterns could reflect changes due to natural variability,
industrialization, and early climate change effects. Here’s a breakdown
of the trends:
Stability in Seasonal Patterns:
The data shows consistent seasonal variation, with regular peaks and
troughs indicating wet and dry periods throughout the year.
Rainfall trends in this period align with the historical patterns
observed earlier (e.g., late 1800s to 1900).
Regional Differences:
Galway and Cork (western and southern Ireland) consistently show
higher rainfall than Dublin and Belfast (eastern and northern
Ireland).
This suggests that the Atlantic weather systems continue to dominate
rainfall patterns, bringing more moisture to the west and south.
Moving Averages (MA):
The moving averages (e.g., Galway_MA, Cork_Airport_MA) show smooth,
long-term trends without significant deviations.
There is no clear evidence of a long-term increase or decrease in
rainfall during this time.
Extreme Rainfall Events:
Notable Spikes:
Several sharp peaks in rainfall are visible during this period,
indicating extreme weather events. These may correspond to storms,
floods, or other anomalies.
Examples include:
The mid-1990s, which show a notable spike in rainfall across
multiple stations.
The late 1980s, which also exhibit some extreme events, though less
pronounced ones.
Potential Causes:
These spikes could be linked to specific meteorological events, such
as:
Storms or cyclones: These could bring intense rainfall to
Ireland.
El Niño and La Niña events: These global phenomena can impact
weather patterns, potentially leading to wetter-than-normal conditions
in some regions.
Climate Context (Late 1900s):
Global Climate Trends:
The late 20th century saw the beginning of noticeable impacts of
climate change, including warmer temperatures and changing precipitation
patterns globally.
However, in this dataset, there is no clear evidence of a long-term
trend in rainfall totals during this period. Rainfall appears to remain
stable, with natural variability dominating the trends.
Irish Context:
Ireland’s weather is heavily influenced by the North Atlantic
Oscillation (NAO), which can cause variability in rainfall
patterns.
The NAO likely contributed to some of the variability seen during
this period, especially in years with extreme rainfall.
Late 2000s to 2014
This period covers the early 21st century, a time when global
climate change impacts began to become more pronounced, including more
extreme weather events. Let’s analyze the trends during this
timeframe:
General Observations (Late 2000s to 2014):
Seasonal Patterns:
As in previous decades, rainfall trends continue to exhibit seasonal
variability, with regular peaks and troughs.
This indicates that annual weather cycles remain intact, with wetter
periods occurring consistently at certain times of the year.
Regional Differences:
Galway and Cork (western and southern Ireland) maintain higher
rainfall levels compared to Dublin and Belfast (eastern and northern
Ireland), consistent with Ireland’s typical west-to-east rainfall
gradient.
This suggests that Atlantic-driven weather systems are still the
dominant factor in determining rainfall patterns.
Moving Averages (MA):
The moving averages (e.g., Galway_MA, Cork_Airport_MA) show smooth,
consistent trends with no drastic upward or downward shifts.
This indicates that while there are short-term fluctuations, the
overall rainfall totals remain stable.
Extreme Rainfall Events:
Notable Spikes in Rainfall:
The late 2000s to 2014 period shows several sharp peaks in rainfall,
indicating extreme weather events:
2009: A significant spike in rainfall is observed across multiple
stations, likely corresponding to a major storm or flood event.
2011: Another notable peak, potentially linked to extreme rainfall
events such as heavy storms or prolonged wet periods.
2014: Toward the end of the period, there are signs of increased
rainfall, possibly tied to another extreme weather event.
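Rather than reading spikes off the chart, the wettest months can be listed directly by ranking the data. The sketch below ranks months by combined rainfall at two stations; the toy data frame, its dates, and its values are made up for illustration and are not observations from the dataset.

```r
# Hypothetical monthly rainfall (mm) at two stations
toy <- data.frame(
  Date                  = seq(as.Date("2000-01-01"), by = "month", length.out = 6),
  Rainfall_Galway       = c(80, 110, 250, 120, 90, 70),
  Rainfall_Cork_Airport = c(70, 130, 270, 100, 95, 60)
)

# Combined rainfall across the two stations for each month
toy$Total <- toy$Rainfall_Galway + toy$Rainfall_Cork_Airport

# Three wettest months, highest total first
wettest <- toy[order(toy$Total, decreasing = TRUE), ][1:3, c("Date", "Total")]
wettest
# 2000-03-01 (520), 2000-02-01 (240), 2000-04-01 (220)
```

Running the same ranking on the real combined data would give exact dates that can then be matched against documented storms and floods.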
Historical Context for Extreme Events:
2009 Floods: In November 2009, Ireland experienced one of its worst
floods in decades due to heavy rainfall. The west and south (Galway and
Cork) were particularly affected, consistent with the observed spike in
the graph.
2011 Heavy Rainfall:
October 2011 saw severe flooding in Dublin, caused by heavy
rainfall. This event is likely reflected in the data.
2014 Storms:
The winter of 2013-2014 was marked by severe storms across Ireland,
leading to heavy rainfall and flooding. This could explain the peaks
toward the end of the dataset.
Climate Context (2000s to 2014):
Global Climate Trends:
The early 21st century saw increasing evidence of climate change
impacts, including more frequent and intense extreme weather
events.
While Ireland’s overall rainfall totals remain stable, the
occurrence of extreme rainfall events during this period may be linked
to these global changes.
Irish Context:
Ireland’s weather during this period was heavily influenced by:
North Atlantic Oscillation (NAO): Shifts in the NAO can cause wetter
winters or drier summers, contributing to variability.
Jet Stream Variability: Changes in the position and strength of the
jet stream may have led to prolonged periods of heavy rainfall during
storms.
Extreme Events Summary:
2009 Floods: Severe flooding in the west and south, particularly
Galway and Cork.
2011 Floods: Heavy rainfall caused major flooding in Dublin and
surrounding areas.
2014 Storms: Winter storms brought prolonged heavy rainfall,
affecting large parts of Ireland.
Summary:
Late 1800s to 1900: Characterized by high variability, extreme
rainfall events, and regular seasonal patterns.
Late 1800s to 2014: Rainfall patterns remain relatively stable over
the long term, with occasional extreme events and no significant
evidence of long-term climate change impacts on total rainfall in the
dataset.
Conclusion
Through this analysis, we can visualize monthly rainfall patterns at
various weather stations in Ireland. This journey demonstrated the power
of R for data processing, cleaning, and visualization. By understanding
rainfall patterns, we can gain valuable insights into climate trends and
make informed decisions.