knitr::opts_chunk$set(echo = TRUE)

Exercise (1/2). Calculating climate change in USA

1.1 Uploading libraries and data

We start this section by uploading the libraries for spatial data handling, analysis, and visualization, such as sf, terra, and tidyverse, among others. We then load our primary datasets: the us_states data for geographic information of US states and a NetCDF file containing the SPEI (Standardized Precipitation Evapotranspiration Index) data. This then sets us up to examine the impact of climate change across the US by analyzing drought conditions over the past 50 years.

#Load Libraries

library(sf)

## Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE

library(sp)
library(spData)

## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(sp)
library(terra)

## terra 1.7.71
## 
## Attaching package: 'terra'
## 
## The following object is masked from 'package:tidyr':
## 
##     extract

library(exactextractr)
library(zoo)

## 
## Attaching package: 'zoo'
## 
## The following object is masked from 'package:terra':
## 
##     time<-
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(gdistance)

## Loading required package: raster
## 
## Attaching package: 'raster'
## 
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## Loading required package: igraph
## 
## Attaching package: 'igraph'
## 
## The following object is masked from 'package:raster':
## 
##     union
## 
## The following objects are masked from 'package:terra':
## 
##     blocks, compare, union
## 
## The following objects are masked from 'package:lubridate':
## 
##     %--%, union
## 
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## 
## The following objects are masked from 'package:purrr':
## 
##     compose, simplify
## 
## The following object is masked from 'package:tidyr':
## 
##     crossing
## 
## The following object is masked from 'package:tibble':
## 
##     as_data_frame
## 
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## 
## The following object is masked from 'package:base':
## 
##     union
## 
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## 
## Attaching package: 'gdistance'
## 
## The following object is masked from 'package:igraph':
## 
##     normalize

#Loading Data

#Use the us_states data on the geography of US states
sf.states <-us_states

#Use SPEI data
setwd('~/BSE/BSE/Term 2/Geospatial Data Science/Assignment 3')
r.spei <- rast("spei01.nc")
r.spei

## class       : SpatRaster 
## dimensions  : 360, 720, 1380  (nrow, ncol, nlyr)
## resolution  : 0.5, 0.5  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
## source      : spei01.nc 
## varname     : spei (Standardized Precipitation-Evapotranspiration Index) 
## names       :   spei_1,   spei_2,   spei_3,   spei_4,   spei_5,   spei_6, ... 
## unit        : z-values, z-values, z-values, z-values, z-values, z-values, ... 
## time (days) : 1901-01-16 to 2015-12-16

1.2 Organizing and editing the data

In this section we process our data to study drought trends over the last 50 years. Initially, we subset the SPEI dataset to include only the past 50 years, aligning it spatially with US states’ boundaries for precise regional analysis. Following this, we perform a detailed examination by extracting average SPEI values for New York as a case study, then extend our method to all states. This involves aggregating data across regions, cleaning column names for clarity, and integrating SPEI values with geographical information. The data is further transformed into a long format and grouped by region and date, enabling us to assess drought trends with greater granularity. This section lays the groundwork for the next section which aims to visualise how drought conditions have evolved across the US over time.

#Subset of the past 50 years
r.spei_50 <- r.spei[[781:1380]] # 600 months to get a 50 year period

#Spatially align SPEI data with the geographic boundaries of US states. 
r.states_crop<-crop(r.spei,sf.states)
#Plot
plot(r.states_crop[[1]])
plot(sf.states$geometry,lwd=3,add=TRUE)

# Testing with one layer

# Extracting SPEI for New York
r.hope <- exact_extract(x=r.spei[[781:1380]],y=sf.states%>%filter(NAME=='New York'),fun='mean')

## Warning in .local(x, y, ...): Polygons transformed to raster CRS (EPSG:NA)

# Aggregating SPEI Data Across Regions
sf.states_spei<- sf.states%>%
  mutate(spei=r.hope)

sf.states_agg <- sf.states_spei%>%
  group_by(REGION)%>%
  summarise(spei_region = mean(spei))

## Warning: There were 4 warnings in `summarise()`.
## The first warning was:
## ℹ In argument: `spei_region = mean(spei)`.
## ℹ In group 1: `REGION = Norteast`.
## Caused by warning in `mean.default()`:
## ! argument is not numeric or logical: returning NA
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 3 remaining warnings.

# Carrying out with all layers

# Extracting SPEI Values for All States
r.spei_index_full <- exact_extract(x=r.spei_50,y=sf.states,fun='mean')

## Warning in .local(x, y, ...): Polygons transformed to raster CRS (EPSG:NA)

## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  31%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |=================================================                     |  69%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |======================================================================| 100%

# Cleaning Column Names
names(r.spei_index_full) <- gsub("mean.spei_", "", names(r.spei_index_full))

#Combining SPEI Values with State Geographies
sf.states_spei_full <- cbind(sf.states,r.spei_index_full) #have checked with individual states that this lines up correctly 

# Setting Column Names to Dates
dt <- seq(as.Date("1974-01-01"), as.Date("2023-12-01"), by = "months")
colnames(sf.states_spei_full)[7:606]<-as.character(dt)

# Preparing the Final Time Series Dataset
sf.ts_fin <- st_drop_geometry(sf.states_spei_full)

# Transforming SPEI Data to Long Format
df_time_series <- sf.ts_fin %>%
  pivot_longer(cols = names(sf.ts_fin)[7:606], names_to = "spei", values_to = "value")%>%
  rename(date= spei)

# Converting Date Strings to Date Objects
df_time_series$date <-as.Date(df_time_series$date)

# Edit the df_time_series to include a column for year and to group the data by region and date.
df_time_series_region<-df_time_series%>%
  mutate (year= format(date,'%Y'))%>%
  group_by(REGION,date)%>%
  summarise(av_spei = mean(value)) # Grouped by year in SPEI and region

## `summarise()` has grouped output by 'REGION'. You can override using the
## `.groups` argument.

1.3 Visualising how drought conditions have changed across the US over time.

In this section, we aim to visualize the change of the SPEI index across US regions over the past 50 years. The first plot, using precise dates, proved too cluttered for clear analysis, prompting a refined approach that aggregates data by year to simplify the visual representation. Despite adjustments, including the introduction of smoothing lines to highlight trends, clarity remained an issue. The final attempt strategically employs color differentiation for regions and optimizes the x-axis with five-year intervals, significantly enhancing plot readability. Visualizations like the final plot help to give us nuanced insights into the regional impacts of climate change across the United States.

# First plot attempt
ggplot(df_time_series_region, aes(x= date, y=av_spei,color=REGION))+
  geom_line() # Plot is too cluttered using the date, will attempt next plot using year.

# Aggregating Data by Year
df_tsry<-df_time_series%>%
  mutate (year= format(date,'%Y'))%>%
  group_by(REGION,year)%>%
  summarise(av_spei = mean(value))

## `summarise()` has grouped output by 'REGION'. You can override using the
## `.groups` argument.

# Standardize dates
df_tsry$year <- as.Date(paste0(df_tsry$year, "-01-01"))

# Second Plot Attempt
ggplot(df_tsry, aes(x= year, y=av_spei,color=REGION))+
  geom_line()+
  geom_smooth() # Graph still unclear.

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# Third Plot attempt
ggplot(df_tsry, aes(x = year, y = av_spei)) + 
  geom_line(aes(color = REGION)) + #need to but colour here not globally or else you get 4 geom smooths
  geom_smooth() + 
  scale_x_date(date_breaks = "5 years", date_labels = "%Y") # Plot each region's data with a different colour and adjust the x-axis to improve readability.

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Exercise (2/2). Transportation centrality and isolation

2.1 Loading Data

setwd('~/BSE/BSE/Term 2/Geospatial Data Science/Assignment 3')
populated_dat <- read_sf('ne_10m_populated_places',"ne_10m_populated_places")
roads_data <- read_sf('ne_10m_roads',"ne_10m_roads")

2.2 Setting up the data

For this exercise we are asked only for only spain, for this we filter the world dataframe, and the populated data. In addition, we will only work with the top 10 most populated cities in Spain, for this excersice the dataset has different metrics for population, we used “POP2025” for this occasion.

After filtering our target dataframes, we use st_intersection to find the common ground between the spain boundaries and the roads dataframe.

spain <- world %>% 
  filter(name_long == "Spain") #Filtering for only Spain

populated_data <- populated_dat %>% 
  arrange(desc(POP_MAX)) %>%   #The dataset have different metrics for population we used POP2025, which can vary a little bit from the others.
  filter(ADM0NAME == "Spain") %>% 
  arrange(desc(POP_MAX)) %>% 
  slice(1:10) #After arrange get the top 10

roads_in_spain <- st_intersection(roads_data, spain)

## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries

2.3 Mapping top 10 Cities in Spain

Here we use ggplot to get a final dataframe for spain, here we can see the red dots indicating the cities, and also the black lines representing the road system. This visual analysis help us identify the most important cities. For example it is clear how all roads are connected to the center in Madrid which is the capital of the country.

# Plot the map with the cropped road data
ggplot() +
  geom_sf(data = spain) +
  geom_sf(data = roads_in_spain, color = "black") +
  geom_sf(data = populated_data, color = "red")

2.4 Distances

Now we need to compute the bilateral matrix of distances, base on the roads, not euclidian distances. First off we will create an empty raster and then rasterize our roads spatial data and set up the friction of where is we are “less likely to pass” when computing the distances.

sf.roads <- roads_in_spain %>% 
  st_transform('EPSG:4326')

r.template <- rast() %>% 
  crop(sf.roads)

sf.roads.rast <- rasterize(vect(sf.roads),r.template)

sf.pop <- populated_data #rename the previous loaded dataframe
sf.points <- st_point_on_surface(sf.roads)

## Warning: st_point_on_surface assumes attributes are constant over geometries

## Warning in st_point_on_surface.sfc(st_geometry(x)): st_point_on_surface may not
## give correct results for longitude/latitude data

# First, transform the raster into a transition
# matrix. For that, replace missing values:

vv<-values(sf.roads.rast)
vv[is.nan(vv)] <- 1/100 # setting the friction, 100x less likely to cross. Wherever there's a NaN, we're setting the value to 1/100. This makes those areas without roads 100 times more difficult to cross compared to the areas with roads.

values(sf.roads.rast) <- vv
sf.raster <- raster::raster(sf.roads.rast)

tr.matrix <- transition(               #Create a transition matrix 
  x = raster::raster(sf.roads.rast),
  transitionFunction = mean,
  directions = 8 #The directions = 8 means that movement can be to any of the 8 surrounding cells (up, down, left, right, and diagonals)
)

Next, we go on to compute the matrix of bilateral distances between all the top 10 populated cities in spain that we selected before.

#Calculate a bilateral distance matrix

# Empty matrix:
dist.mat.riv <- matrix(
  0,
  nrow = nrow(sf.pop),
  ncol = nrow(sf.pop)
  )

for (i in 1:nrow(sf.pop)) {
  
  for (j in 1:nrow(sf.pop)) {
    
    # Creating the sp points:
    sp.point.1 <- as_Spatial(sf.pop[i,])
    sp.point.2 <- as_Spatial(sf.pop[j,])
    
    # Caluculating path:
    sp.distance <- shortestPath(
      x = tr.matrix,
      origin = sp.point.1,
      goal = sp.point.2,
      output = "SpatialLines"
    )
    
    # Extracting length:
    sf.distance <- st_as_sf(sp.distance)
    
    dist.mat.riv[i,j] <- st_length(sf.distance)
    
  }
  
}

After we compute it, we need to make a better format out of it, so is easier to handle for the later plotting that we are going to do. What we do is set up the names of the rows and columns and then put it in a long format taking into account the columns of origin and destination.

Furthermore, after setting up the long format we have to filter the origin to only madrid and vigo so we can compare both of them in the graph. Finally after all those steps we are ready to plot.

# Transform it into a data frame:
location_name <- sf.pop$NAME

colnames(dist.mat.riv) <- location_name #setting up columns name
rownames(dist.mat.riv) <- location_name #setting up rows name

dist.df <- as.data.frame(dist.mat.riv) %>%
  rownames_to_column(var = "origin") %>%
  gather(key = "destination", value = "value", -origin) # set up long format of the matrix

dist.df_obj <- dist.df %>% 
  filter(origin == "Madrid" | origin == "Vigo") #filter out only the origins of vigo or madrid

ggplot(dist.df_obj, aes(x = value*100, fill = origin, color = origin)) +
  geom_density(alpha = 0.7) +
  scale_fill_manual(values = c("Madrid" = "#f4b4b4", "Vigo" = "#84d4dc")) +
  scale_color_manual(values = c("Madrid" = "#f4b4b4", "Vigo" = "#84d4dc"), guide = "none") +
  labs(x = "Distance (meters)", y = "Density", fill = "Origin") +
  theme_minimal()

Analyzing our graph we can notice how Madrid distances to the other major cities of spain are concentrated in one spot, and are closer to zero, meaning that most of those cities are prett close to madrid, this makes sense after we saw the map and how all of the roads led to madrid the capital of spain. On the other hand Vigo distribution is flatter and its peak is farther than the madrid one, meaning that it is more isolated in comparison to the Capital Madrid.

Geospatial Data Science 3

Alejandro Rodriguez, David Frankel, Tiago Zibecchi

2024-03-04