Identifying and characterizing urban vulnerability to heat is a key step in designing intervention strategies to combat negative consequences of extreme heat on human health. In this study, we seek to identify the most vulnerable populations to extreme heat by analyzing the impact of social-demographics and minimum daily temperature on the risk of mortality for the 1487 census block groups in the greater Houston area. The dataset we consider for this analysis has the following variables collected on each of the 1487 census block groups in Houston:
Variable | Description |
---|---|
NOAC | The percent of homes without air conditioning in the block group |
MED_AGE | the median age of persons living in the block group |
HispanicPC | Percent hispanic in the block group |
BlackPCT | Percent black in the block group |
under5PCT | Percent under 5 years in the block group |
over65PCT | Percent over 65 years in the block group |
povertyPCT | Percent living below poverty line in the block group |
alonePCT | Percent living alone in the block group |
MinTemp | Average minimum summer temperature in the block group |
Count | The count of number of heat-related morbidities |
Population | The population of the census block group |
library(ggplot2)
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(tidyr)
library(car)
## Loading required package: carData
library(MASS)
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(readr)
library(sf)
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(spdep)
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
library(RSpectra)
myShp <- st_read("HoustonHeat.shp")
## Reading layer `HoustonHeat' from data source
## `/Users/guozhupeng/Documents/Stat 469/HoustonHeat.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1487 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -95.7332 ymin: 29.52989 xmax: -95.02184 ymax: 30.16357
## Geodetic CRS: WGS 84
myShpDF <- myShp %>%
st_drop_geometry()
ggplot(myShpDF, aes(x = MinTemp, y = log(Count+1))) +
geom_point() +
geom_smooth() +
labs(title = "Log of Number of Heat-related Morbidities by Average Minimum Temperature",
x = "Average Minimum Temperature in the Summer",
y = "Log of Number of Heat-related Morbidities")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
From the graph above, we can see that the blue line is fairly flat. It indicators that the increase in the average minimum temperatures seems to not have much effects on the number of heat-related morbidities.
ggplot(myShpDF, aes(x = Population, y = log(Count+1))) +
geom_point() +
geom_smooth() +
labs(title = "Log of Number of Heat-related Morbidities by Population",
x = "Population",
y = "Log of Number of Heat-related Morbidities")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
From the graph above, we can see that the blue line is increasing at the beginning and it turns fairly flat. It indicators that the increase in the population seems to not have much effects on the number of heat-related morbidities.
ggplot(myShpDF, aes(x = alonePCT, y = log(Count+1))) +
geom_point() +
geom_smooth() +
labs(title = "Log of Number of Heat-related Morbidities by Percentage of Living Alone",
x = "Percentage of Living Alone",
y = "Log of Number of Heat-related Morbidities")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
From the graph above, we can see that the blue line is fairly flat. It indicators that the increase in the percentage of living alone seems to not have much effects on the number of heat-related morbidities. But it still worth it exploring it later on.
sf::sf_use_s2(FALSE)
## Spherical geometry (s2) switched off
model.lm <- lm(log(Count+1) ~ NOAC + MED_AGE + HispanicPC + BlackPCT + under5PCT + over65PCT + povertyPCT + alonePCT + MinTemp + Population, data = myShp)
residuals <- resid(model.lm)
ggplot(data=myShp) +
geom_sf(mapping=aes(fill=residuals))
moran.test(x=residuals, listw = nb2listw(poly2nb(myShp)))
## although coordinates are longitude/latitude, st_intersects assumes that they
## are planar
##
## Moran I test under randomisation
##
## data: residuals
## weights: nb2listw(poly2nb(myShp))
##
## Moran I statistic standard deviate = 15.736, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.2338486649 -0.0006729475 0.0002221256
From the graph above, we can see that areas around dark areas tend to be darker. I conducted a Moran’s I test on the residuals. Here is the result from the Moran’s I test. Since the p-value is 2.2*10^(-16), which is less than 0.05. We conclude that there is spatial correlation in the residuals.
The model we’ll use for this problem is a spatial GLM model, y ~ Pois(\(\lambda\)). y represents count of number of heat-related morbidities, and \(\lambda\) is estimated by \(log(\lambda) = X^{'} \beta + b^{'} \theta\). \(X^{'}\) represents a matrix of variables like HispanicPC, MinTemp, and so on. It also contains a column of 1 as the intercept. \(\beta\) represents the coefficients of each variables in \(X^{'}\). For example, for every unit increase in \(X_{MinTemp}\) , the log of the mean count of number of heat-related morbidities counts increases by \(\beta_{MinTemp}\). \(b^{'}\) represents the spatial bases, which is generated from the spatial data. \(\theta\) is the coefficients of each spatial basis in \(b^{'}\). The spatial basis functions are similar to error terms, and they will adjust the value of log of \(\lambda\).
A <- nb2mat(poly2nb(myShp), style = "B")
## although coordinates are longitude/latitude, st_intersects assumes that they
## are planar
X <- model.matrix(object = log(Count+1) ~ NOAC + MED_AGE + HispanicPC + BlackPCT + under5PCT + over65PCT + povertyPCT + alonePCT + MinTemp + Population, data = myShp)
source("moranBasis.R")
M <- moranBasis(X, A, tol=0.95)
myShpDF <- bind_cols(myShpDF, M)
model.glm <- glm(formula= Count ~ ., data=myShpDF, family=poisson)
Here I fitted my spatial GLM model using Moran basis function.
avPlots(model.lm)
From the graphs above, we can see that there is straight line in each graph. So, the assumption of linearity is met.
decorrelated_residuals <- resid(model.glm)
moran.test(x=decorrelated_residuals, listw = nb2listw(poly2nb(myShp)))
## although coordinates are longitude/latitude, st_intersects assumes that they
## are planar
##
## Moran I test under randomisation
##
## data: decorrelated_residuals
## weights: nb2listw(poly2nb(myShp))
##
## Moran I statistic standard deviate = -11.326, p-value = 1
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## -0.1694150910 -0.0006729475 0.0002219758
Here I decorrelated the residuals and performed a Moran’s I test. The p-value of the test is 1, which is greater than 0.05. So, there is no more spatial correlation. The assumption of independence is met.
log_fitted_values <- log(fitted(model.glm))
plot(log_fitted_values, decorrelated_residuals,
xlab = "Fitted Values", ylab = "Decorrelated Residuals",
main = "Scatterplot of Log of Fitted Values vs Decorrelated Residuals")
abline(h = 0, col = "red", lty = 2) # Add a horizontal line at y = 0
From the graph, we can see that points of the Log of Fitted Values vs Decorrelated Residuals scatter plot scattered around the red line since the response variable is counts. Overall, the pattern is good. We can conclude that the equal variance assumption for this model is met.