In this week’s assignment you will explore the use of geographically weighted statistics using the Detroit database. Specifically, you will calculate geographically weighted means and medians for WPOP (persons identifying as white), BPOP (persons identifying as black or African American), and PCINC (per capita income in the past 12 months). You will also compute geographically weighted correlations for PCINC ~ BPOP and PCINC ~ WPOP. We will use the gwss function from the GWmodel package. This function uses a distance-decay kernel weighting function (bisquare by default) to ‘localize’ the calculation, so that nearby tracts count more than distant ones. Finally, you will compare the range of values observed for each local descriptive statistic to its corresponding global value.
Install the GWmodel package (new package in this session).
#install multiple packages. You do this only the first time.
#install.packages(c('tidyverse', 'dplyr', 'tmap', 'ggplot2', 'sf', 'EnvStats', 'cowplot', 'GWmodel'))
#Load the libraries. You do this during every R session.
library(tidyverse) #for processing dataframes (tables, like CSV files)
library(dplyr) #for processing dataframes (tables, like CSV files)
library(tmap) #for plotting shapefiles
library(ggplot2) #for plotting graphics in R
library(sf) #for processing shapefiles
library(EnvStats) #to display some stats on histograms and boxplots
library(cowplot) #To combine multiple graphs into one
library(GWmodel) #To compute localized descriptive statistics
Read your Detroit2015_CTracts shapefile and explore it.
detroit1 <- st_read('./Data/Detroit2015_CTracts.shp')
#print it to view some details
detroit1
Explore global descriptive statistics
#Means
mean_bpop <- mean(detroit1$BPOP)
mean_wpop <- mean(detroit1$WPOP)
mean_pcinc <- mean(detroit1$PCINC)
#Medians
median_bpop <- median(detroit1$BPOP)
median_wpop <- median(detroit1$WPOP)
median_pcinc <- median(detroit1$PCINC)
#Standard Deviations
sd_bpop <- sd(detroit1$BPOP)
sd_wpop <- sd(detroit1$WPOP)
sd_pcinc <- sd(detroit1$PCINC)
#Correlations
cor_pcinc_wpop <- cor(detroit1$PCINC, detroit1$WPOP)
cor_pcinc_bpop <- cor(detroit1$PCINC, detroit1$BPOP)
print(cor_pcinc_wpop)
## [1] -0.04884759
print(cor_pcinc_bpop)
## [1] 0.1515088
#Note: print any of these objects to view the statistic
#For example, below I am printing the mean of the WPOP variable (mean_wpop)
print(mean_wpop)
## [1] 339.8997
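The nine global statistics above can also be collected in one step. The sketch below is a minimal example built on a hypothetical toy data frame `toy`, which stands in for `detroit1` and uses the same column names. With the real sf object you would first drop the geometry column with `st_drop_geometry(detroit1)`. The sketch also passes `na.rm = TRUE`. That is an assumption worth keeping if any tract has missing values, because `mean()`, `median()`, and `sd()` otherwise return `NA`.

```r
# Hypothetical toy data standing in for detroit1 (same column names, made-up values)
toy <- data.frame(
  WPOP  = c(50, 90, 120, 800),
  BPOP  = c(2000, 1500, 1700, 300),
  PCINC = c(11000, 13000, 12500, 25000)
)

vars <- c("WPOP", "BPOP", "PCINC")

# One column per variable, one row per global statistic
global_stats <- sapply(toy[vars], function(v) {
  c(mean   = mean(v, na.rm = TRUE),
    median = median(v, na.rm = TRUE),
    sd     = sd(v, na.rm = TRUE))
})
print(global_stats)

# Global Pearson correlations of each population variable with PCINC
cor(toy$PCINC, toy$WPOP)
cor(toy$PCINC, toy$BPOP)
```

With the real data, `sapply(st_drop_geometry(detroit1)[vars], ...)` should reproduce the values stored in `mean_wpop`, `median_wpop`, and the other objects above.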
The gwss function works with sp objects, i.e. spatial points data frames or spatial polygons data frames. We will therefore convert our sf object into a spatial polygons data frame and then run gwss on it. We need to provide the spatial polygons data frame, the three variables we want statistics for, and a bandwidth. We will use a fixed bandwidth of 5 kilometers here (bw = 5000, which assumes the shapefile’s projected coordinate system is in meters). Setting quantile = TRUE also returns local medians, interquartile ranges, and quantile imbalance. Remember from class that the bandwidth sets the scale at which your process is operating. Feel free to adjust this number and see how it changes the patterns in the geographically weighted statistics.
detroit_spdf <- as_Spatial(detroit1)
localstats1 <- gwss(detroit_spdf, vars = c("WPOP", "BPOP", "PCINC"), bw = 5000, quantile = TRUE)
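Computing the kernel weights by hand shows what the bandwidth does. The printed summary of localstats1 reports that gwss used a bisquare kernel with a fixed bandwidth. The GWmodel documentation gives the bisquare weight as (1 - (d/b)^2)^2 for distances d below the bandwidth b, and 0 otherwise. The sketch below implements that formula with a hypothetical helper `bisquare()` and made-up distances. The distances are in meters, on the assumption that the projection is meter-based. It compares the 5,000 m bandwidth used here with a narrower 2,000 m one.

```r
# Hypothetical helper: bisquare kernel weight for distance d and fixed bandwidth bw
# w = (1 - (d / bw)^2)^2 when d < bw, and 0 otherwise
bisquare <- function(d, bw) {
  ifelse(d < bw, (1 - (d / bw)^2)^2, 0)
}

# Made-up distances (meters) from one focal tract to five tracts, including itself
d <- c(0, 1000, 2500, 4000, 6000)

w_5km <- bisquare(d, bw = 5000)  # bandwidth used in this assignment
w_2km <- bisquare(d, bw = 2000)  # a narrower, more "local" alternative

# Side-by-side comparison: the smaller bandwidth drops distant tracts entirely
data.frame(distance_m = d, w_5km = w_5km, w_2km = w_2km)
```

Tracts beyond the bandwidth receive zero weight. A larger bw therefore tends to smooth the local statistics toward the global values, while a smaller bw lets them vary more sharply from place to place.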
The localstats1 object has a number of components. The most important is probably SDF, a spatial data frame containing the local summary statistics for each polygon. Let’s view the object.
localstats1
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
##
## ***********************Calibration information*************************
##
## Local summary statistics calculated for variables:
## WPOP BPOP PCINC
## Number of summary points: 309
## Kernel function: bisquare
## Summary points: the same locations as observations are used.
## Fixed bandwidth: 5000
## Distance metric: Euclidean distance metric is used.
##
## ************************Local Summary Statistics:**********************
## Summary information for Local means:
## Min. 1st Qu. Median 3rd Qu. Max.
## WPOP_LM 50.694 113.497 180.957 382.373 2003.3
## BPOP_LM 282.457 1326.533 1907.261 2264.590 3185.5
## PCINC_LM 10717.254 12970.257 14092.007 15861.450 22610.9
## Summary information for local standard deviation :
## Min. 1st Qu. Median 3rd Qu. Max.
## WPOP_LSD 51.89 151.62 265.76 590.18 1193.5
## BPOP_LSD 251.28 762.24 855.02 907.50 1628.2
## PCINC_LSD 2068.88 3266.63 4277.31 6433.47 14632.9
## Summary information for local variance :
## Min. 1st Qu. Median 3rd Qu. Max.
## WPOP_LVar 2692.6 22989.7 70631.2 348310.2 1424440
## BPOP_LVar 63141.4 581013.2 731056.4 823551.2 2651097
## PCINC_LVar 4280247.3 10670888.5 18295402.9 41389465.8 214121212
## Summary information for Local skewness:
## Min. 1st Qu. Median 3rd Qu. Max.
## WPOP_LSKe -0.091508 1.395554 2.334593 3.799982 11.7662
## BPOP_LSKe -0.905282 0.115614 0.360371 0.627703 6.5178
## PCINC_LSKe -1.247439 0.543064 1.091245 1.828697 4.5424
## Summary information for localized coefficient of variation:
## Min. 1st Qu. Median 3rd Qu. Max.
## WPOP_LCV 0.50890 0.99169 1.30128 1.73306 2.9935
## BPOP_LCV 0.24019 0.39371 0.47324 0.58125 2.0126
## PCINC_LCV 0.15514 0.23403 0.31438 0.44207 0.7326
## Summary information for localized Covariance and Correlation between these variables:
## Min. 1st Qu. Median 3rd Qu.
## Cov_WPOP.BPOP -1.1790e+06 -1.1837e+05 -1.7845e+04 1.6288e+04
## Cov_WPOP.PCINC -1.7685e+06 -1.9378e+05 7.5579e+04 3.8133e+05
## Cov_BPOP.PCINC -1.9585e+06 1.6381e+05 6.6823e+05 1.0939e+06
## Corr_WPOP.BPOP -7.7711e-01 -3.3719e-01 -9.1896e-02 1.1997e-01
## Corr_WPOP.PCINC -6.6577e-01 -9.5766e-02 1.0924e-01 3.6381e-01
## Corr_BPOP.PCINC -3.2721e-01 5.2844e-02 1.7142e-01 2.9207e-01
## Spearman_rho_WPOP.BPOP -9.2490e-01 -1.7026e-01 2.2529e-02 2.1546e-01
## Spearman_rho_WPOP.PCINC -8.5864e-01 -9.1605e-02 1.1034e-01 2.9979e-01
## Spearman_rho_BPOP.PCINC -2.7384e-01 1.1226e-01 2.1898e-01 3.6340e-01
## Max.
## Cov_WPOP.BPOP 9.7005e+04
## Cov_WPOP.PCINC 1.7613e+06
## Cov_BPOP.PCINC 6.1280e+06
## Corr_WPOP.BPOP 5.6100e-01
## Corr_WPOP.PCINC 7.9600e-01
## Corr_BPOP.PCINC 9.8230e-01
## Spearman_rho_WPOP.BPOP 6.6610e-01
## Spearman_rho_WPOP.PCINC 7.6970e-01
## Spearman_rho_BPOP.PCINC 9.8930e-01
## Summary information for Local median:
## Min. 1st Qu. Median 3rd Qu. Max.
## WPOP_Median 28 63 86 142 2141
## BPOP_Median 200 1245 1695 2219 3280
## PCINC_Median 10050 12380 13436 14727 20251
## Summary information for Interquartile range:
## Min. 1st Qu. Median 3rd Qu. Max.
## WPOP_IQR 53 84 135 405 2088
## BPOP_IQR 132 1060 1248 1465 3635
## PCINC_IQR 1616 3602 4501 5867 15843
## Summary information for Quantile imbalance:
## Min. 1st Qu. Median 3rd Qu. Max.
## WPOP_QI -0.926045 -0.496689 -0.275862 -0.113208 1.0000
## BPOP_QI -1.000000 -0.308738 -0.079137 0.127202 0.6245
## PCINC_QI -1.000000 -0.307807 -0.144461 0.027560 0.6119
##
## ************************************************************************
Here are definitions of all variables contained in the SDF component of the object. Please note that X and Y stand for your respective variables (WPOP, BPOP, and PCINC in this study).
X_LM - GW mean
X_LSD - GW standard deviation
X_LVar - GW variance (GW standard deviation squared)
X_LSKe - GW skewness
X_LCV - GW coefficient of variation (GW standard deviation divided by GW mean)
Cov_X.Y - GW covariance
Corr_X.Y - GW Pearson correlation
Spearman_rho_X.Y - GW Spearman rank correlation
X_Median - GW median (returned when quantile = TRUE)
X_IQR - GW interquartile range (returned when quantile = TRUE)
X_QI - GW quantile imbalance (returned when quantile = TRUE)
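To make these definitions concrete, the sketch below computes several of them by hand for a single focal tract. It uses the standard weighted formulas with bisquare weights normalised to sum to 1. The toy values, distances, and helper names (`bisquare()`, `weighted_median()`) are hypothetical. GWmodel’s internal implementation may differ in details, for example in how it defines a weighted quantile, so treat this as an illustration rather than a reproduction of gwss.

```r
# Hypothetical helper: bisquare kernel weight (0 beyond the bandwidth)
bisquare <- function(d, bw) ifelse(d < bw, (1 - (d / bw)^2)^2, 0)

# Hypothetical helper: smallest value whose cumulative weight reaches 50%
weighted_median <- function(v, w) {
  o  <- order(v)
  cw <- cumsum(w[o]) / sum(w)
  v[o][which(cw >= 0.5)[1]]
}

# Made-up data for five tracts around one focal tract (distance 0)
x <- c(100, 250, 80, 600, 1200)            # e.g. WPOP
y <- c(12000, 15000, 11000, 18000, 25000)  # e.g. PCINC
d <- c(0, 1500, 2200, 3500, 4800)          # meters to the focal tract

w <- bisquare(d, bw = 5000)
w <- w / sum(w)                            # normalise weights to sum to 1

x_lm   <- sum(w * x)                       # X_LM
x_lvar <- sum(w * (x - x_lm)^2)            # X_LVar
x_lsd  <- sqrt(x_lvar)                     # X_LSD
x_lske <- sum(w * (x - x_lm)^3) / x_lsd^3  # X_LSKe
x_lcv  <- x_lsd / x_lm                     # X_LCV: SD divided by mean
x_med  <- weighted_median(x, w)            # X_Median

y_lm    <- sum(w * y)
y_lsd   <- sqrt(sum(w * (y - y_lm)^2))
cov_xy  <- sum(w * (x - x_lm) * (y - y_lm))  # Cov_X.Y
corr_xy <- cov_xy / (x_lsd * y_lsd)          # Corr_X.Y (weighted Pearson)

round(c(LM = x_lm, LSD = x_lsd, LCV = x_lcv, LSKe = x_lske,
        Median = x_med, Corr = corr_xy), 3)
```

Repeating this calculation with each tract in turn as the focal location yields the maps of local statistics, since gwss here uses the observation locations as summary points.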
pcinc_stats1 <- tm_shape(st_as_sf(localstats1$SDF)) +
tm_fill(col = c("PCINC_Median", "PCINC_LM"), palette = "RdBu", title= c('PCINC Median', "PCINC Mean")) +
tm_borders(alpha = 0.5) +
tm_layout(legend.position = c("right", "bottom"))
#Print to view
pcinc_stats1
wpop_stats1 <- tm_shape(st_as_sf(localstats1$SDF)) +
tm_fill(col = c("WPOP_Median", "WPOP_LM"), palette = "RdBu", title= c('WPOP Median', "WPOP Mean")) +
tm_borders(alpha = 0.5) +
tm_layout(legend.position = c("right", "bottom"))
#Print to view
wpop_stats1
bpop_stats1 <- tm_shape(st_as_sf(localstats1$SDF)) +
tm_fill(col = c("BPOP_Median", "BPOP_LM"), palette = "RdBu", title= c('BPOP Median', "BPOP Mean")) +
tm_borders(alpha = 0.5) +
tm_layout(legend.position = c("right", "bottom"))
#Print to view
bpop_stats1
wpop_cor <- tm_shape(st_as_sf(localstats1$SDF)) +
tm_fill(col = c("Corr_WPOP.PCINC", "Spearman_rho_WPOP.PCINC"), palette = "RdBu",
title= c('Pearson Correlation', "Spearman Correlation"), n = 6) +
tm_borders(alpha = 0.5) +
tm_layout(legend.position = c("right", "bottom"))
wpop_cor
bpop_cor <- tm_shape(st_as_sf(localstats1$SDF)) +
tm_fill(col = c("Corr_BPOP.PCINC", "Spearman_rho_BPOP.PCINC"), palette = "RdBu",
title= c('Pearson Correlation', "Spearman Correlation")) +
tm_borders(alpha = 0.5) +
tm_layout(legend.position = c("right", "bottom"))
bpop_cor
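Several comparisons in the discussion depend on how the maps are classified. A mean map and a median map drawn with different class breaks can look different even when the underlying values are similar. One option is to compute a single set of breaks from the combined range of both variables with base R’s `pretty()` and reuse it for both maps. The sketch below does this, assuming shared breaks suit your question. The values are hypothetical; with the real results you would use `localstats1$SDF$PCINC_LM` and `localstats1$SDF$PCINC_Median`.

```r
# Hypothetical local means and medians of PCINC for six tracts
pcinc_lm  <- c(10700, 12900, 14100, 15900, 18500, 22600)
pcinc_med <- c(10050, 12400, 13400, 14700, 17000, 20250)

# One set of "pretty" breaks covering the combined range of both variables
common_breaks <- pretty(range(c(pcinc_lm, pcinc_med)), n = 5)
print(common_breaks)

# Classify both variables with the same breaks so map colours are comparable
lm_class  <- cut(pcinc_lm,  breaks = common_breaks, include.lowest = TRUE)
med_class <- cut(pcinc_med, breaks = common_breaks, include.lowest = TRUE)
table(lm_class)
table(med_class)
```

In tmap 3.x, passing the same vector to the `breaks` argument of `tm_fill()` should then give both panels a common legend.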
Your submission will consist of a single knitted HTML file containing the maps above together with your written responses to the questions below. Please name this file using the convention: LastName_Assignment6.html, and zip it. Submit your zipped file to the Assignment 6 folder on HuskyCT. Your submission is due Wednesday, March 20th by 11:59pm.
Discussion: Describe the patterns observed in the geographically weighted statistic maps: (1) How do local means and medians compare for the same variable? (2) How do Pearson’s and Spearman’s rank correlations compare for the same variable? (3) How do these statistics vary across the city, and do you feel they exhibit spatial stationarity? (4) What benefit(s) do local statistics provide in these examples? (Hint: compare the range of local values to the global value for each statistic.) I have provided all global statistics in the table below. For correlations, compare both the Pearson and Spearman rank correlations to the single global (Pearson) correlation value provided. Be thoughtful in your response.
wpop_stats <- c(round(mean_wpop, 0), round(median_wpop, 0), round(sd_wpop, 0), round(cor_pcinc_wpop, 2)) #mean, median, SD, and Pearson correlation with PCINC
bpop_stats <- c(round(mean_bpop, 0), round(median_bpop, 0), round(sd_bpop, 0), round(cor_pcinc_bpop, 2)) #mean, median, SD, and Pearson correlation with PCINC
pcinc_stats <- c(round(mean_pcinc, 0), round(median_pcinc, 0), round(sd_pcinc, 0), NA) #mean, median, SD; no correlation reported for PCINC with itself
#Create a table for all variables
desc_stats <- as.data.frame(rbind(wpop_stats, bpop_stats, pcinc_stats))
#Rename columns
colnames(desc_stats) <- c('Mean', 'Median', 'SD', 'Correlation')
rownames(desc_stats) <- c('White Population', 'Black Population', 'Per Capita Income')
knitr::kable(desc_stats)
| | Mean | Median | SD | Correlation |
|---|---|---|---|---|
| White Population | 340 | 93 | 624 | -0.05 |
| Black Population | 1826 | 1724 | 1080 | 0.15 |
| Per Capita Income | 14619 | 13540 | 6511 | NA |
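The hint in question (4) can be put into numbers. The hypothetical helper `compare_local_global()` below summarises a vector of local values against one global value. It reports the range of the local values, the share of tracts above the global value, and the share whose sign is opposite to it. The local values here are made up. With the real results you would pass, for example, `localstats1$SDF$Corr_BPOP.PCINC` together with `cor_pcinc_bpop`.

```r
# Hypothetical helper: summarise local statistics relative to one global value
compare_local_global <- function(local, global) {
  local <- local[!is.na(local)]
  c(min                 = min(local),
    max                 = max(local),
    global              = global,
    share_above_global  = mean(local > global),
    share_opposite_sign = mean(sign(local) == -sign(global)))
}

# Made-up local correlations for six tracts and a global correlation of 0.15
local_r  <- c(-0.30, -0.05, 0.10, 0.20, 0.45, 0.90)
global_r <- 0.15

res <- compare_local_global(local_r, global_r)
print(round(res, 3))
```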
Your answer:
Local means and medians show some differences in the maps for all three variables: PCINC, BPOP, and WPOP. For PCINC, local mean incomes appear slightly higher than local median incomes, which is expected for a right-skewed variable such as income. The mean map also appears to show a more defined gradation than the median map, but this visual effect may be partly due to the mean map having one more class break than the median map. For BPOP, the effect is reversed: the median map shows more visual gradation between classes than the mean map. Clustering patterns are preserved and match between the two maps, but they are accentuated in the median map. For WPOP, both maps show strong clustering. However, unlike the BPOP maps, not all cluster differences are preserved under the classification (binning) used.
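A small example supports the point about income. In a right-skewed distribution a few high values pull the mean above the median. The positive median of PCINC_LSKe in the gwss output (about 1.09) suggests that local income distributions in Detroit are generally right-skewed. The incomes below are hypothetical.

```r
# Hypothetical per capita incomes for seven tracts, with one high-income outlier
pcinc <- c(9000, 10500, 11000, 12000, 13000, 14500, 40000)

mean(pcinc)    # about 15714: pulled upward by the 40000 tract
median(pcinc)  # 12000: the middle value, unaffected by how large the outlier is

# Dropping the outlier shifts the mean by about 4000 but the median by only 500
mean(pcinc[-7])    # about 11667
median(pcinc[-7])  # 11500
```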
For WPOP, Pearson’s and Spearman’s rank correlations produce somewhat different spatial distributions of clusters of local correlation. These differences could matter if the data were used for policymaking. The two statistics are calculated differently: Pearson’s r measures linear association in the raw values, while Spearman’s rho measures monotonic association in their ranks. Further examination would therefore be needed to decide which one better suits our needs and modeling strategy. For BPOP, the Pearson and Spearman rank correlation patterns match more closely.
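Toy data shows the difference between the two correlation measures. Spearman’s rho is Pearson’s r applied to the ranks of the values, so it captures any monotonic relationship and is far less sensitive to a single extreme value. The numbers below are hypothetical.

```r
# Hypothetical data: perfectly monotonic, but the last tract is an extreme value
x <- 1:8
y <- c(1, 2, 3, 4, 5, 6, 7, 100)

pearson  <- cor(x, y)                       # about 0.62, dragged down by the outlier
spearman <- cor(x, y, method = "spearman")  # 1, because the ranks agree perfectly

# Spearman's rho equals Pearson's r computed on the ranks
rank_based <- cor(rank(x), rank(y))
c(pearson = pearson, spearman = spearman, rank_based = rank_based)
```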
Overall, these maps show that the association between per capita income and racial demographics differs in strength, and even in sign, across geographic regions. This is evidence of spatial non-stationarity: the correlation between our variables changes with location rather than holding constant across the city.
This exercise shows that local statistics can be very important for understanding patterns in our data. Based on the global statistics table, per capita income has only a weak to negligible correlation with either the white or the black population of a census tract (r = -0.05 for WPOP, r = 0.15 for BPOP). However, the local correlations vary widely. The maximum local Pearson correlation is about 0.98 between BPOP and PCINC and about 0.80 between WPOP and PCINC, while the minimum values are negative (about -0.33 and -0.67, respectively). This is a very different result from stating that there is no correlation present.