EVEN 2909: Intro to Global Engineering and Resilience

Instructors: Evan Thomas, Taylor Sharpe

Overview

The term remote sensing usually describes the collection of data by satellites. In most cases, “remote” refers to spectral imagery collected by cameras and other spectral instruments across a broad range of wavelengths. In the case of Earth observation, satellites take spectral data reflecting from the atmosphere and the Earth’s surface. Interpretation of this data (often represented as imagery) requires an understanding of spectral data and the physical properties of the Earth and atmosphere. It also often requires calibration against data collected on the Earth’s surface or in the atmosphere directly – data from sensors that are in-situ, rather than “remote”.

NASA Tutorials on Remote Sensing and Drought Detection

Before we dive into analyzing and visualizing data ourselves, take the following tutorials from NASA on remote sensing and drought detection. Some of this content will be included in the quiz associated with this lab activity, so it’s important to complete these.

NASA Fundamentals of Remote Sensing: https://rise.articulate.com/share/Cs2pB_Kx2Mdnuv4zyeakUAPMhEEdOond#/

NASA Advanced Webinar Remote Sensing of Drought (Watch the 1st Webinar, up to the 1:12:00 mark): https://appliedsciences.nasa.gov/get-involved/training/english/arset-remote-sensing-drought

Intro to ArcGIS

GIS stands for Global Information System. The USGS defines it as “a computer system that analyzes and displays geographically referenced information.” In the first part of this lab, you will become familiar with ESRI ArcGIS Online, one of the leading GIS programs. You will create a ArcGIS Online account, link it to the CU Boulder system, and then load temporal and spatial data layers to conduct a preliminary investigation into the relationships between several of these variables.

Throughout this lab, general instructions and guidance are provided, but not every single step-by-step instruction. This is so that you can have the chance to explore these tools and learn how they work rather than just following a recipe.

1 - Sign into ArcGIS Online using CU Boulder’s SSO Account. This will use your Identikey credentials - https://ucboulder.maps.arcgis.com/ (open link in new window)

2 - Create a new map.

3 - Import the monthly CHIRPS rainfall data set, called Total Rainfall CHIRPS (Month). You can do this by selecting “Add Layer,” “ArcGIS Online,” and searching for “Total Rainfall CHIRPS”.

Take a moment to learn about the CHIRPs data here - https://www.chc.ucsb.edu/data/chirps (open link in new window)

4 - Add “Borehole Runtime Data” which is shared by ethomas_ucboulder with all users.

You can find this by searching for it within arcGIS Online.

This data set reflects daily runtime in hours of groundwater boreholes as collected by sensors installed in Ethiopia and Kenya and transmitting data daily over satellite and cellular networks. This is data from 2017 through most of 2018, representing groundwater use across the Afar region of Ethiopia and across much of northern Kenya, for approximately 1.3 million people.

This is what this looks like in the field:

Water Practices in Northern Kenya
Water Practices in Northern Kenya

6 - Create a map overlaying the CHIRPS data with the borehole sensor data.

Within “Layers,” select “Borehole Runtime Data” and “Choose attributes.” You want to show Volume. Choose a color that makes sense to show borehole runtime.

7 - Both of these data sets are time series in addition to having spatial coordinates. Animate this map with the time series data, and add a legend.

Examine the Borehole Runtime Data by selecting “Show table” and looking at what range of dates have data.

Select “Time,” and go to the Time Slider Options. Set Start Date and End Dates to match what you see in the data table. Select a one month timestep to examine data over time.

You should get a map that looks something like the plot below. Are there any visible trends over time?

Screenshot of ArcGIS Online Plot of CHIRPS rainfall and in-situ borehole runtime
Screenshot of ArcGIS Online Plot of CHIRPS rainfall and in-situ borehole runtime

Using R to analyze data

In the second part of this lab, we’re going to use R to analyze these two data sets in more detail. For this section, we are using the data presented in our paper, “Quantifying increased groundwater demand from prolonged drought in the East African Rift Valley”. The paper is linked here: https://www.dropbox.com/scl/fi/96llqdjamzygej9xoaf4t/STOTEN_RainfallBoreholeEastAfrica_2019.pdf?rlkey=kahigjbq5hrfveltt8342tbg2&dl=0

In this paper, we use a negative bionomial model to correlate rainfall and borehole use across the region. We are going to take a simplified approach in this lab. The goal here is to make you familiar with the most popular programming for statistical analysis, called R.

1 - Download and install R Studio - https://posit.co/downloads/ You will need to install both R and RStudio.

2 - Create a new folder on your computer for this assignment, and create a new R script and save it in that folder.

3 - R leans heavily on functions and capabilities developed by third parties. To use these functions, you have to install libraries associated with them. The first one we’ll use is “ggplot2”, which you can learn about here - https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf

Install the library “ggplot2” using this command in the console “install.packages(”ggplot2”)”

4 - Pasted below is a R script that you can cut and paste into your own file. It is commented so you can see what’s going on.

Rainfall and Borehole runtime analysis

This analysis uses the same borehole runtime and CHIRPS data you examined earlier using ArcGIS, with the following legend -

Variables:

  • id=the unique identifier of a borehole
  • week
  • date
  • month of year
  • rm=mean rainfall
  • hm=mean borehole hours
  • rs=sum rainfall
  • hs=sum borehole hours
  • nonmiss=(0=incomplete borehole data, 1=complete borehole data)
  • location (50=Kenya, 51=Ethiopia)
  • country
  • no_activity=1 if no pump hours for duration of observation period
  • rmlag1 & rmlag2=1 & 2 week lagged mean rainfall
  • rslag1 & rslag2=1 & 2 week lagged sum rainfall
  • drought=(1=no rainfall in previous week, 0=rainfall in previous week)

The reason the word “lagged” is used is because we know from previous experiments that the response of borehole runtime “lags” rainfall, meaning borehole usage changes occur one or two weeks after rainfall occurs.

#Borehole Data Analysis

#This tells your script to load the ggplot2 library - this won't work unless you've already installed the library!
require(ggplot2)
## Loading required package: ggplot2
require(ggpubr) #Another required library
## Loading required package: ggpubr
#This line reads a publicly posted CSV file and loads it into a data frame.
BoreholeData = read.csv("https://www.dropbox.com/scl/fi/y38llvmskhv0n1gam9kt3/rain_bore_week_10_3_2018.csv?rlkey=cegb4b9he3sgpom7owni9245m&dl=1",header=T)

#Often, raw retrieved data will have variable names that are unclear, and make your code harder to read. We will rename some of these variables here, while leaving "id" as is:
colnames(BoreholeData)[colnames(BoreholeData) == "hs"] <- "weeklyRuntime"
colnames(BoreholeData)[colnames(BoreholeData) == "rmlag1"] <- "weeklyRainfall"

Let’s first look at all of the historical data for some of the boreholes, to see if we observe any general trends. First, we’ll get a list of all of the borehole IDs. Then, we’ll select a random subset of those, and display them. There will be a lot of variation between boreholes, including some sites with very little data, but there will likely be a visible trend. Run this snippet several times to look at new random selections of boreholes:

#Randomly select only some boreholes to plot data in a grid, to look for trends
someData <- subset(BoreholeData, id %in% sample(unique(BoreholeData$id))[1:16])

#Make a plot showing runtime vs. rainfall, broken out by borehole ID
ggplot(someData, aes(x = weeklyRainfall, y = weeklyRuntime, color=as.factor(id))) + 
 geom_point() +
 #stat_smooth(method=lm,se=FALSE) +
 xlim(0, max(BoreholeData$weeklyRainfall)) + ylim(0, max(BoreholeData$weeklyRuntime)) +
 labs(x = "Mean rainfall in previous week (mm)", 
      y = "Borehole runtime this week (hours)", color = "Borehole ID") +
  facet_wrap(~ id)

To examine this relationship, we’ll use a linear regression model. In the following code, we plot the rainfall and borehole data for all boreholes, and run a linear regression. While the rainfall and borehole data are matched to each other (rainfall at the same site as the borehole runtime), in real life, there are going to be other differences that influence the relationship between rainfall and borehole use at each site, including things like proximity to surface water sources, age of the borehole, population density, recent breakdown and repair activities, etc. We don’t have data on all of these different characteristics, but we expect that the linear correlation between rainfall and borehole use isn’t identical between sites.

To address this, in the linear regression below we use the borehole unique id as a fixed effect. This means that the R code will run each linear regression independently, allowing the intercept to vary between each of the borehole locations. The slope, however, will be an average for all of the sites.

The plot that is is generated shows only one of the linear regressions, but the slope reflects the overall mean relationship between rainfall and borehole runtime.

#This line runs a linear regression of the sum of weekly borehole runtime against the sum of rainfall in the previous week at that site (4 km CHIRPS resolution). Further, it uses the id of the borehole as a fixed effect. In R, a linear regression uses the function lm():

BoreholeFixedEffectRegression = lm(BoreholeData$weeklyRuntime~BoreholeData$weeklyRainfall + factor(BoreholeData$id))

#What is included in this model? You can use the command summary(BoreholeFixedEffectRegression) to examine it.
#boxplot(BoreholeFixedEffectRegression$coefficients[3:224])

#This code chunk plots this data as well as the linear regression and associated regression parameters
cbPallete <- c("#56B4E9","#004166") #Selects two colors for the two countries

ggplot(BoreholeData, aes(x = weeklyRainfall, y = weeklyRuntime, col=country)) + 
 geom_point() + scale_colour_manual(values=cbPallete) +
 xlim(0, max(BoreholeData$weeklyRainfall)) + ylim(0, max(BoreholeData$weeklyRuntime)) +
 labs(x = "Mean rainfall in previous week (mm)", 
      y = "Borehole runtime this week (hours)", col = "Location", 
      title = paste("R2 = ",signif(summary(BoreholeFixedEffectRegression)$adj.r.squared, 5),
                     "Intercept =",signif(BoreholeFixedEffectRegression$coef[[1]],5),
                     " Slope =",signif(BoreholeFixedEffectRegression$coef[[2]], 5),
                     " p =",signif(summary(BoreholeFixedEffectRegression)$coef[2,4], 5))) + 
geom_abline(intercept=BoreholeFixedEffectRegression$coef[[1]],
            slope=BoreholeFixedEffectRegression$coef[[2]], col = "red", linewidth=1) #+

  #stat_smooth(method=lm, se=FALSE)

On the plot above, the red line shows the relationship between rainfall and borehole runtime at one specific borehole, projecting rainfall’s impact based on all of the data. In other words, the line would be pushed up or down to represent the zero-rain borehole runtime at each different borehole. Let’s also look at the overall relationship between rainfall and borehole runtime, by country. You can do this by uncommenting the commented out last two lines in the R script above, by removing the “#” sign in front of lines. This single line will add a best-fit line to each country’s data.

What are the results?

In the real world, there are some complex interactions between seasonality, infrastructure use and maintenance, and human behaviors. Here is a 3D scatterplot of rainfall, runtime and month of year at all the boreholes:

require(scatterplot3d) #See: https://cran.r-project.org/web/packages/scatterplot3d/vignettes/s3d.pdf
## Loading required package: scatterplot3d
#This is a 3D scatter plot of mean weekly borehole runtime, 1 week lagged rainfall, and the week of the year
scatterplot3d(na.omit(data.frame(BoreholeData$month,BoreholeData$hm,BoreholeData$weeklyRainfall)), pch = 3, color="steelblue",
               main="3D Scatter Plot of Rainfall, Runtime and Month of Year",
              xlab = "Month of Year",
              ylab = "Mean rainfall in previous week (mm)",
              zlab = "Borehole runtime this week (hours)",
              mar = c(3, 3, 3, 3) + 0.1)

This plot might teach us something, but it is a bit hard to read. A more direct way to examine the relationship between seasonality, rainfall, and borehole runtime could be to present “timeseries” data. Where the scatter plots above show data collected together, timeseries plots show the evolution and behaviors of our data over time. Timeseries analysis helps us examine patterns and trends in the temporal domain.

Uncomment the last line in this code, and run it several times to look at results from individual, randomly-selected boreholes.

#Randomly select a single borehole to examine its timeseries data
oneData <- subset(BoreholeData, id %in% sample(unique(BoreholeData$id))[1])

#Create two objects that will each store a timeseries plot
runtime <- ggplot(oneData, aes(x=week, y=weeklyRuntime)) + 
  geom_point(color="forestgreen") + xlab("") + ylab("Weekly Runtime (hr)")
rainfall <- ggplot(oneData, aes(x=week, y=weeklyRainfall)) + 
  geom_col(fill="lightblue") + xlab("Week") + ylab("Weekly Rainfall (mm)")

#ggpubr::ggarrange(runtime, rainfall, ncol = 1, nrow = 2, align = "hv") #Plot both together

Now, we will combine all of the data to show the AVERAGE of weekly timeseries data from ALL of the boreholes. We’ll use the mean values of borehole runtime and rainfall over time, to see if there is a clear pattern to be seen.

runtimeTS <- aggregate(hm ~ week, BoreholeData, mean)  #Average all of the runtime data
rainfallTS <- aggregate(rm ~ week, BoreholeData, mean) #Average all of the rainfall data

#Make two objects, each storing a timeseries plot
runtimeAll <- ggplot(runtimeTS, aes(x=week, y=hm)) + 
  geom_area(color="darkgreen", fill="lightgreen") + xlab("") + ylab("Weekly Runtime (hr)")
rainfallAll <- ggplot(rainfallTS, aes(x=week, y=rm)) + 
  geom_area(color="darkblue", fill="lightblue") + xlab("Week") + ylab("Weekly Rainfall (mm)")

ggpubr::ggarrange(runtimeAll, rainfallAll, ncol = 1, nrow = 2, align = "hv") #Plot both together

In the actual publication that resulted from analysis of these data, results were shown as timeseries data and also as modeled projections. This allowed the authors to make borehole runtime predictions based on changes in local rainfall. These predictions could be used to help support operations and maintenance programs in drought-prone regions, improving water availability to impacted communities. Note that while Kenya and Ethiopia saw different rainfall patterns and borehole runtime, the same trend is visible in both datasets.

Findings from borehole runtime and rainfall analysis
Findings from borehole runtime and rainfall analysis

Net Primary Productivity and Rainfall

In the last part of this lab, you will learn about Net Primary Productivity, and how it may correlate to rainfall. Both of these parameters can be detected by satellite. Net Primary Productivity is a critical component of the terrestrial carbon cycle.

Learn about Net Primary Productivity: https://neo.gsfc.nasa.gov/view.php?datasetId=MOD17A3H_Y_NPP NPP Animation: https://svs.gsfc.nasa.gov/30380

We want to analyze how NPP may correlate to rainfall over Africa. This requires both spatial and temporal correlation of two data sets. To do this, we’re going to lean on this publicly posted example: https://hakimabdi.com/blog/test-pixelwise-correlation-between-two-time-series-of-gridded-satellite-data-in-r

1 - Download the gridcorts.R function here: https://gist.github.com/hakimabdi/7308bbd6d9d94cf0a1b8 Put the “gridcorts.R” file in the same directory as your R script.

2 - Install the raster R package.

3 - Download the npp.tif and rain.tif raster files, and load them into the same directory as your R script: https://www.dropbox.com/s/vk0yutk4vm4tyae/gridcorts_data.zip?dl=1

4 - Run the script below to generate a comparison of NPP to rainfall over Africa. Note, this code can take quite a while to run. Be patient!

require(raster)
## Loading required package: raster
## Warning: package 'raster' was built under R version 4.2.3
## Loading required package: sp
## Warning: package 'sp' was built under R version 4.2.3
source("gridcorts.R")

rstack <- stack("npp.tif","rain.tif")

#Uncomment the next two lines - they take a few minutes to run
#system.time(cg1 <- suppressWarnings(gridcorts(rasterstack = rstack, method = "spearman", type = "both")))

#plot(cg1)

Lab Evaluation

Your understanding of the content of this lab will be evaluated with a quiz on Canvas. Before starting the quiz, think about these questions:

1 - How is rainfall detected and estimated with remote sensing? What are the data sources for the CHIRPS data set?

2 - Consider the borehole and CHIRPS data sets. What observations can you make based on the ArcGIS spatial and temporal mapping?

3 - In the R analysis of rainfall versus runtime, what are your findings? Is there a correlation? What is the magnitude of the correlation? What are the differences between Ethiopia and Kenya?

4 - How does the relationship between rainfall and borehole runtime evolve over the course of the year examined?

5 - How much does borehole use increase when there has been no rain in the previous week?

6 - What do you observe in the spatial analysis of the correlation between NPP and rainfall? Is there any region in Africa where the correlation is stronger?

7 - What is the utility of this kind of remote sensing data? What can this type of data be used for in the context of global engineering, global health and resource limited environments?

Interested in learning more? Check out the Earth Lab’s Earth Data Analytics Online Certificate - https://www.earthdatascience.org/courses/earth-analytics/