DEM 7263 - Homework Exploratory Spatial Data Analysis

In this descriptive assignment, I compute tract-level mortality rates for San Antonio using 2008 death records aggregated to census tracts. I compute the global Moran’s I statistic under two contiguity definitions, rook and queen. Finally, I run a local Moran’s I analysis to identify clusters of low or high mortality.

Aggregate Number of Deaths in San Antonio

library(foreign)
d99<-read.dbf("2008_Texas_Deaths.dbf")
d99$cofips<-substr(d99$D_TR2000, 3,5)
#get only bexar county, fips==029
bexar08<-subset(d99, d99$cofips=="029")

#form the tract ID
bexar08$tract<-substr(bexar08$D_TR2000,6,11)

#Cause of death is D_SUPC10

#aggregate by tract
bextr08<-data.frame(xtabs(~tract, bexar08, drop.unused.levels=T))

trs<-bextr08$tract
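
The substring positions used above imply that D_TR2000 holds an 11-character code: a 2-digit state FIPS, a 3-digit county FIPS, and a 6-digit tract code. The sketch below illustrates that split on a few identifiers I assembled by hand for illustration; they are not read from the death file, and the layout itself is an assumption inferred from the code.

```r
# Hypothetical 11-character tract identifiers, built by hand for illustration:
# state FIPS (48 = Texas) + county FIPS (029 = Bexar) + 6-digit tract code.
id <- c("48029110100", "48029110200", "48201100000")

state  <- substr(id, 1, 2)   # characters 1-2: state FIPS
cofips <- substr(id, 3, 5)   # characters 3-5: county FIPS
tract  <- substr(id, 6, 11)  # characters 6-11: tract code

# Keep only the Bexar County records and count records per tract,
# which mirrors the subset() and xtabs() steps above.
in_bexar <- cofips == "029"
bexar_counts <- data.frame(table(tract = tract[in_bexar]))
print(bexar_counts)
```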

Merge Data

#make a dataframe with an id and deaths data
trmort<-data.frame(tract=bextr08$tract, d99=bextr08$Freq)

head(trmort)

##    tract d99
## 1 110100  41
## 2 110200   3
## 3 110300  32
## 4 110400  12
## 5 110500  15
## 6 110600  36

library(spdep)
library(maptools)
sa<-readShapePoly("SA_classdata.shp")

## Warning: readShapePoly is deprecated; use rgdal::readOGR or sf::st_read

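mdat<-sa@data
#join the death counts to the shapefile attribute table
#match() keeps the rows in polygon order; merge() with sort=F leaves the row order unspecified,
#which can misalign the attributes with the polygons when they are written back to sa@data
mdat2<-mdat
mdat2$d99<-trmort$d99[match(as.character(mdat2$TractID), as.character(trmort$tract))]
#deaths per 1,000 residents
mdat2$rate <- 1000*mdat2$d99 / mdat2$acs_poptot

#tracts with no recorded deaths (and any other missing values) are set to 0
mdat2[is.na(mdat2)] <- 0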

sa@data<-mdat2

writePolyShape(sa, fn="SA_2008merged.shp")

## Warning: writePolyShape is deprecated; use rgdal::writeOGR or sf::st_write
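
When attributes are attached to polygons by position, the order of the rows matters. The R documentation for merge() states that rows come back in an unspecified order when sort = FALSE, so a join based on match() is a safer way to keep the rows aligned with the polygons. The sketch below shows this on a made-up table: the populations are invented, the death counts are borrowed from the head(trmort) output above, and the object names are hypothetical.

```r
# Toy attribute table in "polygon order" (populations are made up).
tracts <- data.frame(
  TractID    = c("110400", "110300", "110100", "110200"),
  acs_poptot = c(2500, 4000, 5000, 1500),
  stringsAsFactors = FALSE
)

# Toy death counts; tract 110400 has no recorded deaths.
deaths <- data.frame(
  tract = c("110100", "110200", "110300"),
  d99   = c(41, 3, 32),
  stringsAsFactors = FALSE
)

# For comparison only: merge() does not promise to preserve the row order of x.
merged <- merge(tracts, deaths, by.x = "TractID", by.y = "tract",
                all.x = TRUE, sort = FALSE)
print(merged$TractID)

# Order-preserving join: look up each tract's deaths by position.
tracts$d99 <- deaths$d99[match(tracts$TractID, deaths$tract)]
tracts$d99[is.na(tracts$d99)] <- 0                      # no recorded deaths
tracts$rate <- 1000 * tracts$d99 / tracts$acs_poptot    # deaths per 1,000
print(tracts)
```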

Exploratory Analysis

A quick exploratory analysis shows that most tracts have similar mortality rates: the majority fall between 3 and 15 deaths per 1,000 residents, with a few outliers above 15. A histogram of poverty rates shows a divide between census tracts with much less poverty and those with much more, which is consistent with the distribution of the Hispanic and White populations seen in the boxplots. A second histogram shows the distribution of unemployment, which is generally below 0.2 across census tracts. Finally, a boxplot for the Black population shows its smaller share in most census tracts.

library(spdep)
library(maptools)
library(RColorBrewer)
library(classInt)
library(ggplot2)

#Simple Plot
plot (mdat2$TractID, mdat2$rate,
main="Mortality Rate by Census Tract in San Antonio",
ylab="Rate")

#Histogram 
qplot(mdat2$PPOVERTY, geom="histogram", binwidth=.05, main="Distribution of Poverty Rate in San Antonio by Census Tracts",
ylab="Number of Tracts", xlab="Poverty Rate" )

qplot(mdat2$PCIVUNEMP, geom="histogram", binwidth=.05, main="Distribution of Unemployment Rate in San Antonio by Census Tracts",
ylab="Number of Tracts", xlab="Unemployment Rate" )

#Boxplot
mdat2$decade<-cut(mdat2$MEDYRBLT, breaks = 4)
ggplot(mdat2, aes(x=decade, y=PHISPANIC))+ geom_boxplot()

ggplot(mdat2, aes(x=decade, y=PWHITE))+ geom_boxplot()

ggplot(mdat2, aes(x=decade, y=PBLACK))+ geom_boxplot()

Thematic Maps

The thematic maps below illustrate where the White and Hispanic populations are concentrated. The Hispanic population is most concentrated in the southern (bottom) part of the map, while the White population is clustered in the north. The downtown area of San Antonio is populated primarily by Hispanics.

The next map shows the mortality rate, which appears highest in the downtown area and in several census tracts to the east, with a few higher-mortality tracts on the outskirts. The last map shows the distribution of poverty, which to some extent mirrors that of mortality, particularly around the downtown area.

dat<-readShapePoly("SA_2008merged.shp", proj4string=CRS("+proj=utm +zone=14 +north"))

## Warning: readShapePoly is deprecated; use rgdal::readOGR or sf::st_read

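#st_read() comes from the sf package, which also provides geom_sf() support for the maps below
library(sf)
dat<-st_read(dsn = "C:/Users/PCMcC/Documents/Spatial Demography/Homeworks/HW1", layer = "SA_2008merged")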

## Reading layer `SA_2008merged' from data source `C:\Users\PCMcC\Documents\Spatial Demography\Homeworks\HW1' using driver `ESRI Shapefile'
## Simple feature collection with 235 features and 74 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 518785.6 ymin: 3231290 xmax: 569265.6 ymax: 3283869
## epsg (SRID):    NA
## proj4string:    NA

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:GGally':
## 
##     nasa

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

dat%>%
  ggplot()+geom_sf(aes(fill=PWHITE))+scale_fill_gradient(low = "powderblue" , high = "darkblue")+ggtitle("Percentage of White Population in San Antonio ")

dat%>%  
ggplot()+geom_sf(aes(fill=PHISPANIC))+scale_fill_gradient(low = "plum3" , high = "purple4")+ggtitle("Percentage of Hispanics in San Antonio ")

dat%>%  
ggplot()+geom_sf(aes(fill=rate))+scale_fill_gradient(low = "lightgreen" , high = "darkgreen")+ggtitle("Mortality Rate in San Antonio ")

dat%>%  
ggplot()+geom_sf(aes(fill=PPOVERTY))+scale_fill_gradient(low = "bisque2" , high = "darkorange4")+ggtitle("Poverty in San Antonio ")

Moran’s I Analysis - QUEEN

A global Moran’s I analysis was conducted using rook and queen contiguity weights. Both show weak but statistically significant positive spatial autocorrelation in tract mortality rates: Moran’s I is 0.106 under queen contiguity (p = 0.0025) and 0.123 under rook contiguity (p = 0.0016), against an expected value of -0.004 under no spatial autocorrelation.
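
To make the difference between the two contiguity definitions concrete, here is a minimal base R sketch on a made-up 3x3 grid. It is not the code that produced the results in this report: the grid, the values, and the helper functions contiguity() and morans_i() are all hypothetical. Rook neighbors share an edge, queen neighbors share an edge or a corner, and Moran’s I is computed with row-standardized weights.

```r
# Toy 3x3 grid, cells numbered row by row; the values are made up.
n_side <- 3
cells <- expand.grid(col = 1:n_side, row = 1:n_side)
x <- c(9, 8, 2,
       8, 7, 1,
       3, 2, 1)

# Binary contiguity matrix built from grid coordinates.
# Rook: cells share an edge. Queen: cells share an edge or a corner.
contiguity <- function(cells, queen = TRUE) {
  dr <- abs(outer(cells$row, cells$row, "-"))
  dc <- abs(outer(cells$col, cells$col, "-"))
  if (queen) {
    w <- dr <= 1 & dc <= 1 & (dr + dc) > 0
  } else {
    w <- (dr + dc) == 1
  }
  w * 1  # convert logical to 0/1
}

# Global Moran's I with row-standardized weights.
morans_i <- function(x, w) {
  w <- w / rowSums(w)   # each row of weights sums to 1
  z <- x - mean(x)      # deviations from the mean
  n <- length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

rook_w  <- contiguity(cells, queen = FALSE)
queen_w <- contiguity(cells, queen = TRUE)

i_rook  <- morans_i(x, rook_w)
i_queen <- morans_i(x, queen_w)
print(c(rook = i_rook, queen = i_queen))
```

Queen contiguity always gives each cell at least as many neighbors as rook contiguity, so the two statistics average over different neighborhoods and generally differ, as they do for the tract data.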

## 
##  Moran I test under randomisation
## 
## data:  dat$rate  
## weights: queenw    
## 
## Moran I statistic standard deviate = 2.8086, p-value = 0.002488
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.106251726      -0.004273504       0.001548625

Moran’s I Analysis - ROOK

## 
##  Moran I test under randomisation
## 
## data:  dat$rate  
## weights: rookw    
## 
## Moran I statistic standard deviate = 2.9408, p-value = 0.001637
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.122655508      -0.004273504       0.001862871
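
As a check on how the reported quantities fit together, the standard deviate in each test is the observed Moran’s I minus its expectation, divided by the square root of the variance, and the expectation equals -1/(n-1) for the n = 235 tracts. The sketch below recomputes these from the printed values above; the data frame `res` is a hypothetical name used only for this illustration.

```r
# Values copied from the two moran.test outputs above.
res <- data.frame(
  weights     = c("queen", "rook"),
  I           = c(0.106251726, 0.122655508),
  expectation = c(-0.004273504, -0.004273504),
  variance    = c(0.001548625, 0.001862871)
)

# Standard deviate and one-sided p-value (alternative: greater).
res$z <- (res$I - res$expectation) / sqrt(res$variance)
res$p <- pnorm(res$z, lower.tail = FALSE)

# Expected Moran's I under no spatial autocorrelation, with 235 tracts.
n <- 235
expected_I <- -1 / (n - 1)

print(res)
print(expected_I)
```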

Local Moran’s Analysis

Finally, the local Moran’s I analysis identifies the clustering pattern shown in the last map. The flagged census tracts have mortality rates very similar to those of their neighboring tracts, which is what marks them as a spatial cluster.

#Here, I use the false discovery rate correction to adjust the p-values for multiple testing;
#it plays a role similar to a Bonferroni correction but is less conservative
locali<-localmoran(dat$rate, rookw, p.adjust.method="fdr")
dat$locali<-locali[,1]
dat$localp<-locali[,5]

#Create cluster identifiers
dat$cl<-as.factor(ifelse(dat$localp<=.05,"Clustered","NotClustered"))

#Plots of the results
spplot(as(dat, "Spatial"), "locali", main="Local Moran's I", at=quantile(dat$locali), col.regions=brewer.pal(n=4, "RdBu"))

spplot(as(dat, "Spatial"), "cl", main="Local Moran Clusters", col.regions=c(2,0))
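
To show what the false discovery rate adjustment does compared with a Bonferroni correction, here is a small base R sketch using p.adjust() on five made-up p-values. The numbers are invented for illustration and are not taken from the local Moran results.

```r
# Made-up p-values standing in for a handful of local tests.
p_raw <- c(0.001, 0.008, 0.012, 0.041, 0.20)

# Adjust for multiple testing two ways.
p_fdr  <- p.adjust(p_raw, method = "fdr")         # Benjamini-Hochberg
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # multiply by number of tests

# Count how many tests stay at or below the 0.05 threshold under each approach.
n_signif <- c(
  raw        = sum(p_raw  <= 0.05),
  fdr        = sum(p_fdr  <= 0.05),
  bonferroni = sum(p_bonf <= 0.05)
)

print(data.frame(p_raw, p_fdr, p_bonf))
print(n_signif)
```

The false discovery rate adjustment keeps more tests below the threshold than Bonferroni does, which is why it is a common choice when one test is run per tract.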
