
‘It’s always rainy in Ireland!’

In this blog, I’ll be exploring rainfall patterns across Ireland by analyzing data from four different weather stations. I’ll be using dygraphs, an R package that creates dynamic time series plots, to help visualize and understand these patterns. The blog is structured into four main sections. We’ll start with an Introduction that sets the context for our analysis, followed by Data Exploration where we’ll examine the raw data from our weather stations. Then we’ll move into Data Analysis to investigate patterns and relationships in the rainfall data, and finally look at Trends to understand how rainfall patterns have changed over time. To make this analysis transparent and reproducible, I’ve included expandable “show” buttons throughout the blog that reveal the exact R code used for each section of the analysis.

Introduction

Working in RStudio, the first step was to load the packages the analysis needs. The tidyverse package provides tools for data analysis, manipulation, and visualization. It already attaches dplyr, which supplies the data manipulation and transformation functions used below, so the separate library(dplyr) call is harmless but redundant. For dynamic time series visualizations, I used the dygraphs package. I also loaded sp and sf, which are not directly used in this blog but are valuable tools for handling spatial data: sp enables spatial data manipulation, while sf works with spatial data based on the Simple Features standard. The same applies to RColorBrewer (colour palettes), tmap (thematic maps) and leaflet (interactive maps), which are loaded here but do not appear in the analysis below.

library(tidyverse)

library(dplyr)

library(dygraphs)

library(sp)

library(RColorBrewer)

library(tmap)

library(leaflet)

library(sf)

Now, I will load the rainfall data and prepare it for further investigation.

load("rainfall.RData")

Data Exploration

To understand the dataset, I began with some exploratory data analysis by checking what the file contains. load() returns the names of the objects it restores, so storing that result in rain_info and passing it to str() lists the objects held in the file rather than their contents.

rain_info <- load("rainfall.RData")
str(rain_info)

##  chr [1:2] "stations" "rain"

This shows that rainfall.RData contains two datasets: stations and rain.
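The behaviour of load() can be tried on a small stand-in file. The sketch below is only illustrative: stations_toy, rain_toy and the temporary file are made up for the example and are not part of the real rainfall.RData.

```r
# Two small stand-in objects (hypothetical, not the real data).
stations_toy <- data.frame(Station = c("A", "B"), Elevation = c(10L, 20L))
rain_toy <- data.frame(Year = c(1850, 1851), Rainfall = c(1.5, 2.5))

# Save both objects into a temporary .RData file.
tmp <- tempfile(fileext = ".RData")
save(stations_toy, rain_toy, file = tmp)

# Load into a fresh environment; load() returns the names of the objects.
env <- new.env()
loaded <- load(tmp, envir = env)
print(loaded)

# Inspect each loaded object itself, not just its name.
for (nm in loaded) str(get(nm, envir = env))
```

Loading into a separate environment is optional; it simply keeps the toy objects apart from anything already in the workspace.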

They look like this: Stations:

stations

## # A tibble: 25 × 9
##    Station     Elevation Easting Northing   Lat  Long County Abbreviation Source
##    <chr>           <int>   <dbl>    <dbl> <dbl> <dbl> <chr>  <chr>        <chr> 
##  1 Athboy             87 270400   261700   53.6 -6.93 Meath  AB           Met E…
##  2 Foulksmills        71 284100   118400   52.3 -6.77 Wexfo… F            Met E…
##  3 Mullingar         112 241780   247765   53.5 -7.37 Westm… M            Met E…
##  4 Portlaw             8 246600   115200   52.3 -7.31 Water… P            Met E…
##  5 Rathdrum          131 319700   186000   52.9 -6.22 Wickl… RD           Met E…
##  6 Strokestown        49 194500   279100   53.8 -8.1  Rosco… S            Met E…
##  7 University…        14 129000   225600   53.3 -9.06 Galway UCG          Met E…
##  8 Drumsna            45 200000   295800   53.9 -8    Leitr… DAL          Met E…
##  9 Ardara             15 180788.  394679.  54.8 -8.29 Doneg… AR           Briffa
## 10 Armagh             62 287831.  345772.  54.4 -6.64 Armagh A            Armag…
## # ℹ 15 more rows

The “stations” dataset is a tibble with 25 rows and 9 columns.
Columns:
Station: character variable, the name of the weather station.
Elevation: integer variable, the elevation of the station.
Easting: double variable, the easting coordinate.
Northing: double variable, the northing coordinate.
Lat: double variable, the latitude of the station.
Long: double variable, the longitude of the station.
County: character variable, the county where the station is located.
Abbreviation: character variable, a short code for the station (for example, AB for Athboy).
Source: character variable, the source of the station's record.

Rain:

rain

## # A tibble: 49,500 × 4
##     Year Month Rainfall Station
##    <dbl> <fct>    <dbl> <chr>  
##  1  1850 Jan      169   Ardara 
##  2  1851 Jan      236.  Ardara 
##  3  1852 Jan      250.  Ardara 
##  4  1853 Jan      209.  Ardara 
##  5  1854 Jan      188.  Ardara 
##  6  1855 Jan       32.3 Ardara 
##  7  1856 Jan      152.  Ardara 
##  8  1857 Jan      179.  Ardara 
##  9  1858 Jan      110.  Ardara 
## 10  1859 Jan      158.  Ardara 
## # ℹ 49,490 more rows

This shows that the “rain” tibble is a data frame with 49,500 rows and 4 columns.
Columns:
Year: numeric variable, the year.
Month: factor variable, the month.
Rainfall: numeric variable, the amount of rainfall.
Station: character variable, the weather station.

Lastly, I will store the names of these two objects in a variable, ‘rain_info’. Note that this holds only the names, not the data itself.

rain_info <- c("stations", "rain")

Data Analysis

To analyze our rainfall data, I chained several dplyr functions. First, I grouped the data by both Year and Month using group_by. Then, using summarise, I calculated the total rainfall for each Year-Month combination; because no station filter is applied here, each total is the sum over all 25 stations. After removing the grouping with ungroup, I used transmute (which works like mutate but drops unreferenced variables) to keep just the Rainfall column. I then converted the result into a time series with the ts function, starting in January 1850 with a frequency of 12 to mark the data as monthly. Finally, I used the window function to look at a specific 20-year period, from January 1877 to December 1896. This organizes the rainfall data chronologically and prepares it for more detailed analysis.

rain %>%  group_by(Year, Month) %>% 
  summarise(Rainfall = sum(Rainfall)) %>% 
  ungroup() %>% transmute(Rainfall) %>% 
  ts(start = c(1850, 1), freq = 12) -> rain_ts
rain_ts %>% window(c(1877, 1), c(1896, 12))

##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 1877 4273.1 1855.2 2154.0 2956.1 1908.2 2084.6 2069.5 3537.6 1981.6 3406.6
## 1878 2370.6 1559.5 1158.3 1885.6 3050.8 3346.6 1012.7 2870.8 2318.3 2692.9
## 1879 2543.1 3010.7 1378.1 1761.5 1774.8 3874.5 2996.7 3064.7 3040.8 1183.8
## 1880 1193.6 2684.2 2296.4 2517.0  935.0 2212.3 3657.7 1299.0 2120.4 2145.1
## 1881  912.6 2953.5 2587.3 1130.6 1669.6 3203.0 1739.8 3258.4 1858.9 2594.0
## 1882 2004.4 2362.1 2104.3 2940.2 1931.6 2433.1 3467.1 2270.0 2294.0 2893.3
## 1883 4179.5 4360.4 1226.3 1666.8 1558.2 1494.0 2699.9 2742.8 3351.6 2485.3
## 1884 3624.0 3916.5 2946.4 1129.7 1797.2  657.8 2361.2 1240.9 1804.8 1827.9
## 1885 2454.8 3174.8 1896.5 2254.3 1811.1  756.8 1422.4 2098.9 3599.3 2880.4
## 1886 2814.2 2063.9 2517.4 1568.8 2795.1 1100.9 2576.8 1890.3 2729.4 3773.4
## 1887 2556.0 1185.6 1131.5 1243.8  969.7  372.1 1789.7 2109.4 2138.3 1747.7
## 1888 2242.4  643.0 2457.8 1397.0 1918.4 3047.9 3496.2 2209.9  830.6 1623.9
## 1889 2256.5 2027.5 1261.6 1863.9 2567.9  701.0 1591.8 4011.2 1376.3 3379.1
## 1890 3582.6 1143.0 2446.4 1146.3 1898.5 2205.3 1828.0 2117.9 2133.4 1624.8
## 1891 1468.1  253.7 1211.6 1535.9 2192.0 1854.9 1340.3 4031.8 2072.9 3545.9
## 1892 1772.9 2024.7  698.3  816.0 2808.7 1766.5 2252.2 4116.0 2533.2 2242.5
## 1893 2206.7 2556.2  589.2  745.7 1250.2 1331.0 1854.2 2843.5 1456.1 2035.8
## 1894 3296.4 2372.0 1553.9 2689.9 1806.5 1570.8 2851.6 2216.8  408.7 3082.4
## 1895 2584.5  745.6 2486.5 1442.5  457.5 1480.9 3110.2 3475.6  711.3 2607.3
## 1896 1219.5 1411.9 2992.7  887.8  279.1 1768.6 3711.8 1498.4 3980.5 2524.6
##         Nov    Dec
## 1877 4059.8 2959.0
## 1878 1651.4 1684.1
## 1879  713.8 1411.4
## 1880 3335.5 2480.2
## 1881 3667.9 3135.2
## 1882 3708.6 2905.7
## 1883 3174.6 1425.8
## 1884 2431.5 2915.1
## 1885 1858.9 1168.5
## 1886 2281.4 3653.2
## 1887 2361.0 2120.3
## 1888 3231.4 3549.7
## 1889 1436.9 2726.7
## 1890 4294.2 1756.6
## 1891 2576.4 3970.7
## 1892 3358.2 1869.3
## 1893 1633.4 2916.3
## 1894 2911.2 2428.8
## 1895 3544.9 3875.9
## 1896  870.6 4139.4
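The ts and window calls used above can be tried on a toy series to see how the start and end arguments select months. This is a minimal base-R sketch on made-up numbers: toy_ts and toy_1851 are hypothetical names, and the values 1 to 36 stand in for real monthly totals.

```r
# Toy monthly series: 36 made-up values starting in January 1850.
# frequency = 12 means 12 observations per year.
toy_ts <- ts(1:36, start = c(1850, 1), frequency = 12)

# Keep only January to December 1851, the second of the three years.
toy_1851 <- window(toy_ts, start = c(1851, 1), end = c(1851, 12))

print(toy_1851)
```

The pipeline above writes freq = 12, which R matches to the frequency argument by partial matching; spelling it out in full is a little more explicit.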

library(dygraphs)


create_rainfall_graph <- function(data, title = "Rainfall Data Over Time") {
  data %>%
    dygraph(main = title) %>%
    dySeries("Rainfall", color = "blue") %>%  
    dyAxis("x", label = "Year") %>%
    dyAxis("y", label = "Rainfall (mm)") %>%
    dyOptions(drawGrid = TRUE, strokeWidth = 2) %>%
    dyLegend(show = "always", width = 300)
}
create_rainfall_graph(rain_ts)

This step uses the dygraph function from the dygraphs package to plot the full series. Feel free to hover over the graph to explore individual data points.
I developed a function create_rainfall_graph for reusability to prevent duplicating the same customization code for every graph. This guarantees a uniform look, conserves time, and maintains a tidier main codebase. Employing one reusable function allows me to swiftly create professional-quality graphs, enabling me to concentrate on analysis rather than design.

rain_ts %>%
  window(c(1850, 1), c(1889, 12)) %>%
  create_rainfall_graph(title = "Rainfall Data (1850–1889)")

In this next phase of the analysis, I refined the visualization by restricting the time window. The windowed series for 1850 to 1889 is passed to create_rainfall_graph with its own title, giving a more targeted view of the rainfall data. This narrower time frame lets us examine the rainfall patterns of this 40-year period in greater detail, making it easier to identify trends that might be less visible over a broader time range.

create_rainfall_graph(rain_ts, title = "Full Rainfall Data") %>%
  dyRangeSelector()

This step adds interactivity to the dygraph with a range selector (dyRangeSelector). You can drag and resize the selected time window to explore the data more closely.

create_rainfall_graph(rain_ts, "Rainfall Data Over Time") %>% 
  dyRangeSelector() %>% 
  dyRoller(rollPeriod = 600)

Here, a rolling mean is added to the interactive dygraph using the dyRoller function. The rolling mean smooths the curve and helps identify trends by averaging over a specified rolling period (600 months, or 50 years, in this case). Notice how much more stable it looks than before!
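To get a feel for what a rolling mean does to a series, here is a small base-R sketch of a trailing average on made-up values. It is only an analogy: dygraphs computes its own average in the browser and may treat the first few points differently, and toy_ts, k and toy_roll are hypothetical names chosen for the example.

```r
# Made-up monthly values (not real rainfall).
toy_ts <- ts(c(10, 20, 30, 40, 50, 60), start = c(1850, 1), frequency = 12)

# Trailing 3-point mean: each value is the average of itself and the two
# points before it. sides = 1 uses past values only, so the first two are NA.
k <- 3
toy_roll <- stats::filter(toy_ts, rep(1 / k, k), sides = 1)

print(toy_roll)
```

A larger k gives a smoother curve but hides more short-term variation, which is the same trade-off made by choosing rollPeriod = 600 above.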

rain %>%  group_by(Year, Month) %>% filter(Station == "Dublin Airport") %>%
  summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), freq = 12) ->  dub_ts
rain %>%  group_by(Year, Month) %>% filter(Station == "Belfast") %>%
  summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), freq = 12) ->  bel_ts
rain %>%  group_by(Year, Month) %>% filter(Station == "University College Galway") %>%
  summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), freq = 12) ->  ucg_ts
rain %>%  group_by(Year, Month) %>% filter(Station == "Cork Airport") %>%
  summarise(Rainfall = sum(Rainfall)) %>% ungroup() %>% transmute(Rainfall) %>%
  ts(start = c(1850, 1), freq = 12) ->  cor_ts
beldubucgcor_ts <- cbind(bel_ts, dub_ts, ucg_ts, cor_ts)
window(beldubucgcor_ts, c(1850, 1), c(1850, 12))

##          bel_ts dub_ts ucg_ts cor_ts
## Jan 1850  115.7   75.8  108.9  155.3
## Feb 1850  120.5   47.8  131.5   92.6
## Mar 1850   56.8   18.5   56.6   56.0
## Apr 1850  142.6   97.5  120.5  207.2
## May 1850   57.9   58.6   69.8   35.3
## Jun 1850   62.0   43.6   74.7   11.4
## Jul 1850   96.3   66.0   89.1  179.0
## Aug 1850  110.4   41.2  136.8   46.5
## Sep 1850   65.8   54.2   85.2   40.7
## Oct 1850   87.6   40.4   90.7   53.8
## Nov 1850  104.4   60.0  131.3  153.2
## Dec 1850   57.6   81.1   90.6  169.4

This step creates a separate time series object (dub_ts, bel_ts, ucg_ts, cor_ts) for each of the four weather stations. These are then combined with cbind into a single object (beldubucgcor_ts) for comparative analysis. The window function is applied to show a specific time range (January to December 1850).
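The four pipelines differ only in the station name, so the same steps could be wrapped in a small helper. The sketch below is one possible way to do that, not the code used for this blog: station_ts is a hypothetical function and toy_rain is made-up data with the same column names as rain.

```r
library(dplyr)

# Toy stand-in for `rain` (made-up values): two stations, two years.
toy_rain <- expand.grid(
  Year = 1850:1851,
  Month = factor(month.abb, levels = month.abb),
  Station = c("Belfast", "Cork Airport"),
  stringsAsFactors = FALSE
)
toy_rain$Rainfall <- seq_len(nrow(toy_rain))

# Hypothetical helper: build a monthly time series for one station.
station_ts <- function(data, station) {
  data %>%
    filter(Station == station) %>%
    group_by(Year, Month) %>%
    summarise(Rainfall = sum(Rainfall), .groups = "drop") %>%
    transmute(Rainfall) %>%
    ts(start = c(min(data$Year), 1), frequency = 12)
}

bel_toy <- station_ts(toy_rain, "Belfast")
cor_toy <- station_ts(toy_rain, "Cork Airport")
print(bel_toy)
```

With the real data, the call would be station_ts(rain, "Belfast") and so on for each station, assuming every station has a complete monthly record from the first year onward.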

beldubucgcor_ts %>% dygraph(width = 960, height = 360) %>%
  dyRangeSelector()

The next step involved creating a comparative visualization of all four weather stations using dygraphs, complete with an interactive range selector for exploring the data. However, this visualization proved to be less than ideal for our analysis. The challenge lies in the close proximity of the rainfall values across stations – when plotted together, the lines overlap significantly, making it difficult to distinguish between individual weather stations and their unique patterns. This visualization limitation makes it challenging to effectively compare rainfall trends across different locations.

dub_ts %>% 
  dygraph(width = 800, height = 130, group = "dub_belf_ucg_cor", main = "Dublin") %>%
  dySeries(color = "blue")

bel_ts %>% 
  dygraph(width = 800, height = 130, group = "dub_belf_ucg_cor", main = "Belfast") %>%
  dySeries(color = "green")

ucg_ts %>% 
  dygraph(width = 800, height = 130, group = "dub_belf_ucg_cor", main = "University College Galway") %>%
  dySeries(color = "red")

cor_ts %>% 
  dygraph(width = 800, height = 170, group = "dub_belf_ucg_cor", main = "Cork Airport") %>%
  dySeries(color = "purple") %>%
  dyRangeSelector()

To overcome the visualization challenge from the previous combined graph, I took a different approach and created a separate dygraph for each weather station. The four plots share the same group, so zooming or panning one plot moves the others with it, and the range selector under the last plot (Cork Airport) controls the time window for all four. This separation makes it much easier to observe and analyze the rainfall patterns at each location. Compared with the earlier attempt at combining all stations in one graph, the advantage is clear: we can now see each station’s rainfall trends distinctly, without the confusion of overlapping lines. This shows why it is often better to separate multiple time series rather than superimpose them on a single graph, especially when the values are similar and the patterns need to be clearly distinguished.

Trends and Conclusion.

1- Dublin Airport, Belfast, University College Galway and Cork Airport all recorded their highest rainfall in January 1900.
2- In this dataset, January 1900 stands out as the wettest month in the record.
3- Each weather station has different peaks and troughs. When rainfall was high at one station, say Dublin, it was not necessarily high at the others at the same time (except for national outliers like January 1900). This is likely because these cities are quite far from each other.
4- Over longer timeframes (more than 15 years), the stations show no visible difference from one another.
5- When the timeframe is narrowed, month-to-month fluctuations become visible. The data is monthly, so day-to-day variation cannot be seen.
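The first observation can be checked directly by finding the wettest month for each station. The sketch below shows the idea in base R on made-up numbers; toy_rain and wettest are hypothetical, and the values are not real rainfall.

```r
# Toy stand-in for `rain` with made-up values.
toy_rain <- data.frame(
  Year = c(1899, 1900, 1901, 1899, 1900, 1901),
  Month = factor("Jan", levels = month.abb),
  Rainfall = c(80, 210, 95, 60, 150, 170),
  Station = rep(c("Belfast", "Dublin Airport"), each = 3)
)

# For each station, keep the row with the largest monthly total.
wettest <- do.call(
  rbind,
  lapply(split(toy_rain, toy_rain$Station), function(d) d[which.max(d$Rainfall), ])
)

print(wettest[, c("Station", "Year", "Month", "Rainfall")])
```

Run on the real rain tibble, restricted to the four stations, the same steps would list the Year and Month of each station's maximum so they can be compared with January 1900.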

Rainfall Analysis through Four Weather Stations in Ireland using R.
