GY672 A Assignment 2

Introduction

The key dataset used in this blog is weather_stations.geojson, which contains monthly rainfall totals for 25 Irish weather stations from 1850 to 2014. Besides the monthly rainfall totals, the dataset includes coastal distance, elevation, easting, northing and a point geometry for each station. In this blog I create a map of median January rainfall at the 25 weather stations, briefly analyse the result, and examine two other potential influences on rainfall: coastal distance and elevation. The weather_stations.geojson dataset (read into R as rain_data) was filtered to the January monthly rainfall totals for 1850–2014, and the median January rainfall for each weather station was then calculated over that period. Finally, a map was created on which each station is colour coded according to its median January rainfall. The code to achieve this, together with the necessary packages, is shown below.

library("sf")

## Warning: package 'sf' was built under R version 4.3.3

## Linking to GEOS 3.11.2, GDAL 3.8.2, PROJ 9.3.1; sf_use_s2() is TRUE

library("tmap")

## Warning: package 'tmap' was built under R version 4.3.3

## Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
## remotes::install_github('r-tmap/tmap')

library("tidyverse")

## Warning: package 'tidyverse' was built under R version 4.3.3

## Warning: package 'ggplot2' was built under R version 4.3.3

## Warning: package 'readr' was built under R version 4.3.3

## Warning: package 'dplyr' was built under R version 4.3.3

## Warning: package 'stringr' was built under R version 4.3.3

## Warning: package 'forcats' was built under R version 4.3.3

## Warning: package 'lubridate' was built under R version 4.3.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library("spdep")

## Warning: package 'spdep' was built under R version 4.3.3

## Loading required package: spData

## Warning: package 'spData' was built under R version 4.3.3

## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`

# Read the weather station rainfall data
rain_data <- st_read("C:/Users/lynch/Documents/R module assignment 2/weather_stations")

## Reading layer `weather_stations' from data source 
##   `C:\Users\lynch\Documents\R module assignment 2\weather_stations' 
##   using driver `GeoJSON'
## Simple feature collection with 49500 features and 11 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 46633.44 ymin: 59756.08 xmax: 330201.3 ymax: 457227.9
## Projected CRS: TM65 / Irish Grid

# Read the county boundaries used as the base map
counties <- st_read('counties.geojson')

## Reading layer `counties' from data source 
##   `C:\Users\lynch\Documents\R module assignment 2\counties.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 40 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 17528.59 ymin: 19537.25 xmax: 366403.6 ymax: 466923.2
## Projected CRS: TM65 / Irish Grid

# Filter data for January
rain_jan <- rain_data %>%
  filter(Month == "Jan")

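# Calculate median January rainfall (1850-2014) for each station,
# keeping each station's point geometry and county
median_rainfall <- rain_jan %>%
  group_by(Station) %>%
  summarize(
    MedianRainfall = median(Rainfall, na.rm = TRUE),
    geometry = first(geometry),  # one point per station
    County = first(County)       # county the station lies in
  ) %>%
  st_as_sf()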
   
# Map the county borders, then overlay the stations coloured by median January rainfall
tm_shape(counties) + 
  tm_borders() + 
  tm_shape(median_rainfall) +
  tm_dots(col = 'MedianRainfall',
          size = 0.7, style = 'cont')
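The filtering and per-station median described in the introduction can be checked on a small example. The sketch below is a base-R stand-in that assumes a plain data frame with the Station, Month and Rainfall columns used above; the toy values are invented for illustration and are not from the dataset. It also checks that 25 stations × 165 years × 12 months gives the 49,500 features reported by st_read().

```r
# Hypothetical toy records standing in for rain_data with geometry dropped:
# two stations, three Januaries and one February each (values invented).
toy <- data.frame(
  Station  = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Month    = c("Jan", "Jan", "Jan", "Feb", "Jan", "Jan", "Jan", "Feb"),
  Rainfall = c(120, 150, 90, 300, 60, 80, 70, 10)
)

# Keep only the January records, as filter(Month == "Jan") does above
toy_jan <- subset(toy, Month == "Jan")

# One median per station; the February values play no part
station_medians <- aggregate(Rainfall ~ Station, data = toy_jan,
                             FUN = median, na.rm = TRUE)
station_medians   # A: 120, B: 70

# Size check for the full dataset: 25 stations x 165 years x 12 months
n_expected <- 25 * length(1850:2014) * 12
n_expected        # 49500, matching the feature count reported by st_read()
```

The median is used rather than the mean so that a few exceptionally wet or dry Januaries do not dominate a station's value.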

Analysis

As the map shows, many of the weather stations in coastal counties have the highest median January rainfall. Kerry, for example, has stations with median rainfall of up to 160 mm. There is also possible clustering, where neighbouring stations have similar median rainfall levels; this is most evident in Wexford, Waterford and Cork. This is the idea behind Tobler's statement that "everything is related to everything else, but near things are more related than distant things" (Tobler, 1970: 236). Spatial autocorrelation refers to similar (or dissimilar) values clustering together in space rather than being randomly distributed (Griffith, 1992), and Moran's I has been widely used to measure and test for it (Getis, 2008). The Moran's I statistic was calculated using the following code:

# 1. Find the 5 nearest neighbours of each weather station
neighbors <- knn2nb(knearneigh(st_coordinates(median_rainfall), k = 5))

# 2. Convert the neighbours to a row-standardised spatial weights list ("W" is the default style)
listw <- nb2listw(neighbors, style = "W")

# 3. Calculate Moran's I for the median rainfall values at the weather stations
moran_result <- moran.test(median_rainfall$MedianRainfall, listw = listw)

# 4. Display the result
moran_result

## 
##  Moran I test under randomisation
## 
## data:  median_rainfall$MedianRainfall  
## weights: listw    
## 
## Moran I statistic standard deviate = 2.9857, p-value = 0.001415
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.26425540       -0.04166667        0.01049844
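To make the statistic less of a black box, the sketch below computes Moran's I directly from its definition, I = (n / S0) × Σi Σj wij (xi - x̄)(xj - x̄) / Σi (xi - x̄)², where S0 is the sum of all weights. The helper morans_i() is hypothetical (it is not part of spdep) and assumes row-standardised k-nearest-neighbour weights, which is what knearneigh() and nb2listw() build above. It is tried on toy points along a line, and it also shows where the "Expectation" of -0.0417 comes from: under no autocorrelation E[I] = -1/(n - 1), with n = 25 stations.

```r
# Moran's I from its definition, using row-standardised k-nearest-neighbour
# weights (intended to mirror knearneigh(k = 5) + nb2listw(style = "W") above).
morans_i <- function(x, coords, k) {
  n <- length(x)
  d <- as.matrix(dist(coords))       # Euclidean distances between all points
  diag(d) <- Inf                     # a point is never its own neighbour
  w <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nn <- order(d[i, ])[seq_len(k)]  # indices of the k nearest points
    w[i, nn] <- 1 / k                # row-standardised: each row sums to 1
  }
  z <- x - mean(x)                   # deviations from the mean
  s0 <- sum(w)                       # sum of all weights (equals n here)
  (n / s0) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy check: ten points evenly spaced along a line, k = 2
coords <- cbind(1:10, 0)
morans_i(1:10, coords, k = 2)               # smooth trend: about 0.836
morans_i(rep(c(1, -1), 5), coords, k = 2)   # alternating values: -0.8

# Expected value of I under no spatial autocorrelation, for n = 25 stations
-1 / (25 - 1)                               # -0.04166667
```

Called on the real data, morans_i(median_rainfall$MedianRainfall, st_coordinates(median_rainfall), k = 5) should give a value close to the 0.264 reported by moran.test(), although ties in distance could be broken differently.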

The Moran's I statistic of 0.264 is well above its expected value under no autocorrelation (-0.042), indicating weak to moderate positive spatial clustering amongst the 25 weather stations. The p-value of 0.0014 (< 0.05) allows us to reject the null hypothesis that there is no spatial autocorrelation.

Further analysis was conducted on the correlation between median rainfall and both coastal distance and elevation. Before running these correlation tests, a normality test is needed to decide which correlation test is appropriate. The Shapiro-Wilk test was used for this purpose on median rainfall, coastal distance and elevation; the code used to run the tests and draw the histograms is shown below. For median rainfall and elevation, the p-values were approximately 0.27 and 0.14 respectively, so normality could not be rejected for either variable. Coastal distance, however, was found not to be normally distributed (p ≈ 0.006).

combined_data <- read.csv("combined_data")

shapiro.test(combined_data$MedianRainfall)

## 
##  Shapiro-Wilk normality test
## 
## data:  combined_data$MedianRainfall
## W = 0.95137, p-value = 0.2692

hist(combined_data$MedianRainfall)

shapiro.test(combined_data$coast_dist)

## 
##  Shapiro-Wilk normality test
## 
## data:  combined_data$coast_dist
## W = 0.87633, p-value = 0.005811

hist(combined_data$coast_dist)

shapiro.test(combined_data$Elevation)

## 
##  Shapiro-Wilk normality test
## 
## data:  combined_data$Elevation
## W = 0.93827, p-value = 0.135

hist(combined_data$Elevation)
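The decision rule used here (Pearson only when neither variable rejects normality, otherwise Spearman) can be written down explicitly. The sketch below is a hypothetical pair of helpers, pick_method() and cor_after_normality_check(), that are not from the original analysis; they use only shapiro.test() and cor.test() from base R and assume a 5% significance level.

```r
# Decision rule: Pearson only if neither variable rejects normality at alpha
pick_method <- function(p_x, p_y, alpha = 0.05) {
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Run the Shapiro-Wilk tests, choose the method, then run cor.test()
cor_after_normality_check <- function(x, y, alpha = 0.05) {
  method <- pick_method(shapiro.test(x)$p.value, shapiro.test(y)$p.value, alpha)
  cor.test(x, y, method = method)
}

# Applying the rule to the Shapiro-Wilk p-values reported above
pick_method(0.2692, 0.005811)  # rainfall vs coast_dist: "spearman"
pick_method(0.2692, 0.135)     # rainfall vs Elevation:  "pearson"
```

With the station table loaded, cor_after_normality_check(combined_data$MedianRainfall, combined_data$Elevation) would run the Pearson test used below. A non-significant Shapiro-Wilk result only means normality could not be rejected, which is why the rule is phrased in terms of rejection.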

Since coastal distance is not normally distributed, Spearman's rank correlation test was used to test the correlation between median rainfall and coastal distance. The scatter plot suggests a negative relationship, with median rainfall decreasing as coastal distance increases. However, although Spearman's rho was -0.37, the test gave a p-value of 0.07, which is not enough to reject, at the 5% level, the null hypothesis of no monotonic relationship between the two. Since both median rainfall and elevation are approximately normally distributed, Pearson's correlation test was used for that pair. The smoothed line in the plot shows median rainfall decreasing as elevation rises to around 100 m, after which median rainfall increases with elevation. Pearson's test gave a correlation coefficient of -0.11, implying a weak negative relationship, and a p-value of approximately 0.59, so we cannot reject the null hypothesis of no correlation between the two. The code used to run both Spearman's and Pearson's correlation tests is displayed below, as well as the code used to generate the scatter plots with smoothed lines.

# Numeric copy of MedianRainfall, used as MedianRain in the tests and plots below
combined_data$MedianRain <- as.numeric(combined_data$MedianRainfall)


cor_test_spearman <- cor.test(combined_data$MedianRain, combined_data$coast_dist, method = "spearman")

# Print the results
print(cor_test_spearman)

## 
##  Spearman's rank correlation rho
## 
## data:  combined_data$MedianRain and combined_data$coast_dist
## S = 3558, p-value = 0.0707
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.3684615

cor_test <- cor.test(combined_data$MedianRainfall, combined_data$Elevation, method = "pearson")

# Print the results
print(cor_test)

## 
##  Pearson's product-moment correlation
## 
## data:  combined_data$MedianRainfall and combined_data$Elevation
## t = -0.54831, df = 23, p-value = 0.5888
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4868698  0.2947699
## sample estimates:
##        cor 
## -0.1135913
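As a check on the printed output, both test statistics can be recomputed from the reported coefficients. The sketch below uses the standard relationships t = r√(n - 2) / √(1 - r²) on n - 2 degrees of freedom for Pearson's test and S = (n³ - n)(1 - ρ) / 6 for Spearman's test, which should reproduce the values printed by cor.test() above. It also confirms, on made-up values, that Spearman's rho is simply Pearson's r computed on ranks.

```r
# Reported coefficients (from the cor.test() output above) and sample size
n   <- 25
r   <- -0.1135913   # Pearson r, median rainfall vs elevation
rho <- -0.3684615   # Spearman rho, median rainfall vs coastal distance

# Pearson: t = r * sqrt(n - 2) / sqrt(1 - r^2), two-sided p on n - 2 df
t_stat    <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_pearson <- 2 * pt(-abs(t_stat), df = n - 2)

# Spearman: S = (n^3 - n) * (1 - rho) / 6
S <- (n^3 - n) * (1 - rho) / 6

c(t = t_stat, p = p_pearson, S = S)

# Spearman's rho equals Pearson's r on the ranks (invented toy values, no ties)
x <- c(3, 1, 4, 1.5, 5, 9, 2.6)
y <- c(2, 7, 1, 8, 2.8, 1.8, 2.9)
isTRUE(all.equal(cor(rank(x), rank(y)), cor(x, y, method = "spearman")))
```

This gives t ≈ -0.548 and S ≈ 3558, in line with the output above. Because Spearman works on ranks, it is less affected by the skewed coastal distance values than Pearson would be.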

ggplot(combined_data, aes(x = coast_dist, y = MedianRainfall)) +
  geom_point(size = 0.2) + 
  geom_smooth()

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(combined_data, aes(x = Elevation, y = MedianRain)) +
  geom_point(size = 0.2) + 
  geom_smooth()

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Conclusion

The median January rainfall totals for 1850–2014 at 25 weather stations across Ireland were mapped and colour coded according to each station's median. For the most part, higher median rainfall levels were found at weather stations in coastal counties than at those located inland. Moran's I test found significant positive spatial autocorrelation amongst the 25 weather stations. An attempt was made to see whether coastal distance and elevation could explain the spatial distribution of median rainfall. Unfortunately, neither Spearman's nor Pearson's correlation test found a significant relationship between median rainfall and either variable. Future work could use more weather stations and assess median or even mean rainfall in other months of the year. If available, further variables such as temperature, orographic profile and topographic variability could also be considered.

References

Getis, A. (2008) A history of the concept of spatial autocorrelation: A geographer's perspective. Geographical Analysis, 40(3), 297-309.

Griffith, D.A. (1992) What is spatial autocorrelation? Reflections on the past 25 years of spatial statistics. L'Espace géographique, 265-280.

Tobler, W.R. (1970) A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(sup1), 234-240.


Sam Lynch

2025-01-11
