1 Introduction

In this week’s assignment you will explore geographically weighted statistics using the Detroit database. Specifically, you will calculate geographically weighted means and medians for WPOP (persons identifying as white), BPOP (persons identifying as black or African American), and PCINC (per capita income in past 12 months). You will also compute geographically weighted correlations for PCINC ~ BPOP and PCINC ~ WPOP. We will use the gwss function from the GWmodel package. This function uses a distance-based kernel weighting function (bisquare by default, as the function's printed output reports) to ‘localize’ each calculation. Finally, you will compare the range of values observed for each local descriptive statistic to its corresponding global value.
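
To make the idea of a ‘localized’ statistic concrete, here is a minimal sketch of a geographically weighted mean computed by hand for a single focal location. The helper names (bisquare, gw_mean) and the toy values are hypothetical, and the sketch illustrates the general technique rather than the exact internals of gwss.

```r
# Bisquare kernel: weight is 1 at distance 0 and falls to 0 at the bandwidth
bisquare <- function(d, bw) {
  ifelse(d < bw, (1 - (d / bw)^2)^2, 0)
}

# Geographically weighted mean of x at one focal location,
# given the distances d from that location to every observation
gw_mean <- function(x, d, bw) {
  w <- bisquare(d, bw)
  sum(w * x) / sum(w)
}

# Hypothetical toy data: five tracts and their distances (m) from a focal tract
x <- c(100, 200, 300, 400, 500)
d <- c(0, 1000, 2500, 4000, 6000)

gw_mean(x, d, bw = 5000)  # local mean, dominated by nearby tracts
mean(x)                   # global mean, all tracts weighted equally
```

Because the nearest tracts in this toy example have the smallest values, the local mean falls below the global mean; the tract 6 km away receives zero weight.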

2 Setting up R

2.1 Packages

Install the GWmodel package (new package in this session).

#install multiple packages. You do this only the first time. 
#install.packages(c('tidyverse', 'dplyr', 'tmap', 'ggplot2', 'sf', 'EnvStats', 'cowplot', 'GWmodel'))

#Load the libraries. You do this during every R session. 
library(tidyverse) #for processing dataframes (tables, like CSV files)
library(dplyr) #for processing dataframes (tables, like CSV files)
library(tmap) #for plotting shapefiles
library(ggplot2) #for plotting graphics in R
library(sf) #for processing shapefiles
library(EnvStats) #to display some stats on histograms and boxplots
library(cowplot) #To combine multiple graphs into one
library(GWmodel) #To compute localized descriptive statistics

3 Read in Your Files

Read your Detroit2015_CTracts shapefile and explore it.

detroit1 <- st_read('./Data/Detroit2015_CTracts.shp')

#print it to view some details
detroit1

4 Global Descriptive Statistics

Explore global descriptive statistics

#Means
mean_bpop <- mean(detroit1$BPOP)
mean_wpop <- mean(detroit1$WPOP)
mean_pcinc <- mean(detroit1$PCINC)
#Medians
median_bpop <- median(detroit1$BPOP)
median_wpop <- median(detroit1$WPOP)
median_pcinc <- median(detroit1$PCINC)
#Standard Deviations
sd_bpop <- sd(detroit1$BPOP)
sd_wpop <- sd(detroit1$WPOP)
sd_pcinc <- sd(detroit1$PCINC)

#Correlations
cor_pcinc_wpop <- cor(detroit1$PCINC, detroit1$WPOP)
cor_pcinc_bpop <- cor(detroit1$PCINC, detroit1$BPOP)

print(cor_pcinc_wpop)
## [1] -0.04884759
print(cor_pcinc_bpop)
## [1] 0.1515088
#Note: print any of these objects to view the statistic
#For example, below I am printing the mean of the WPOP variable (mean_wpop)
print(mean_wpop)
## [1] 339.8997
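
The global correlations above are Pearson correlations, which is the default for cor. Since the local results also include Spearman rank correlations, the sketch below shows how the two can differ, using hypothetical toy vectors (not the Detroit data) with one extreme value.

```r
# Hypothetical toy data with one extreme observation in each variable
x <- c(1, 2, 3, 4, 100)
y <- c(2, 1, 4, 3, 50)

# Pearson measures linear association and is pulled toward 1 by the outlier
pearson_r <- cor(x, y)

# Spearman works on ranks, so the outlier counts only as the largest rank
spearman_rho <- cor(x, y, method = "spearman")

pearson_r
spearman_rho
```

The same method argument could be used on the tract data, for example cor(detroit1$PCINC, detroit1$WPOP, method = "spearman"), to obtain a global Spearman value.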

5 Geographically Weighted Descriptive Statistics

The gwss function only takes spatial points dataframes or spatial polygons dataframes. We will now convert our shapefile into a spatial polygons dataframe. Then, we will run the gwss function on it. We will need to provide the spatial polygons dataframe, the three variables we want statistics for, and a bandwidth. We will use a bandwidth of 5 kilometers here (bw = 5000, which assumes the coordinates of the layer are in meters). Remember from class that the bandwidth sets the scale at which your process is operating. Feel free to adjust this number and see how that changes the patterns in the geographically weighted statistics.

detroit_spdf <- as_Spatial(detroit1)

localstats1 <- gwss(detroit_spdf, vars = c("WPOP", "BPOP", "PCINC"), bw = 5000, quantile = TRUE)
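
To get a feel for what the bandwidth controls, the following sketch counts how many neighbors receive a non-zero bisquare weight at three candidate bandwidths. The distances are hypothetical, and the bisquare helper is written by hand for illustration rather than taken from GWmodel.

```r
# Bisquare kernel written by hand (illustrative helper, not a GWmodel function)
bisquare <- function(d, bw) ifelse(d < bw, (1 - (d / bw)^2)^2, 0)

# Hypothetical distances (in meters) from one focal tract to ten neighbors
d <- seq(500, 9500, by = 1000)

# For each candidate bandwidth, count the neighbors with non-zero weight
bandwidths <- c(2500, 5000, 10000)
n_used <- sapply(bandwidths, function(bw) sum(bisquare(d, bw) > 0))

data.frame(bandwidth_m = bandwidths, neighbors_used = n_used)
```

A smaller bandwidth uses fewer neighbors, so the local statistics vary more from place to place; a larger bandwidth pulls them toward the global values.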

The localstats1 object has a number of components. The most important one is probably a spatial data frame containing the results of local summary statistics for each polygon. Let’s view this object.

localstats1
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
## 
##    ***********************Calibration information*************************
## 
##    Local summary statistics calculated for variables:
##     WPOP BPOP PCINC
##    Number of summary points: 309
##    Kernel function: bisquare 
##    Summary points: the same locations as observations are used.
##    Fixed bandwidth: 5000 
##    Distance metric: Euclidean distance metric is used.
## 
##    ************************Local Summary Statistics:**********************
##    Summary information for Local means:
##                  Min.   1st Qu.    Median   3rd Qu.    Max.
##    WPOP_LM     50.694   113.497   180.957   382.373  2003.3
##    BPOP_LM    282.457  1326.533  1907.261  2264.590  3185.5
##    PCINC_LM 10717.254 12970.257 14092.007 15861.450 22610.9
##    Summary information for local standard deviation :
##                 Min. 1st Qu.  Median 3rd Qu.    Max.
##    WPOP_LSD    51.89  151.62  265.76  590.18  1193.5
##    BPOP_LSD   251.28  762.24  855.02  907.50  1628.2
##    PCINC_LSD 2068.88 3266.63 4277.31 6433.47 14632.9
##    Summary information for local variance :
##                     Min.    1st Qu.     Median    3rd Qu.      Max.
##    WPOP_LVar      2692.6    22989.7    70631.2   348310.2   1424440
##    BPOP_LVar     63141.4   581013.2   731056.4   823551.2   2651097
##    PCINC_LVar  4280247.3 10670888.5 18295402.9 41389465.8 214121212
##    Summary information for Local skewness:
##                    Min.   1st Qu.    Median   3rd Qu.    Max.
##    WPOP_LSKe  -0.091508  1.395554  2.334593  3.799982 11.7662
##    BPOP_LSKe  -0.905282  0.115614  0.360371  0.627703  6.5178
##    PCINC_LSKe -1.247439  0.543064  1.091245  1.828697  4.5424
##    Summary information for localized coefficient of variation:
##                 Min. 1st Qu.  Median 3rd Qu.   Max.
##    WPOP_LCV  0.50890 0.99169 1.30128 1.73306 2.9935
##    BPOP_LCV  0.24019 0.39371 0.47324 0.58125 2.0126
##    PCINC_LCV 0.15514 0.23403 0.31438 0.44207 0.7326
##    Summary information for localized Covariance and Correlation between these variables:
##                                   Min.     1st Qu.      Median     3rd Qu.
##    Cov_WPOP.BPOP           -1.1790e+06 -1.1837e+05 -1.7845e+04  1.6288e+04
##    Cov_WPOP.PCINC          -1.7685e+06 -1.9378e+05  7.5579e+04  3.8133e+05
##    Cov_BPOP.PCINC          -1.9585e+06  1.6381e+05  6.6823e+05  1.0939e+06
##    Corr_WPOP.BPOP          -7.7711e-01 -3.3719e-01 -9.1896e-02  1.1997e-01
##    Corr_WPOP.PCINC         -6.6577e-01 -9.5766e-02  1.0924e-01  3.6381e-01
##    Corr_BPOP.PCINC         -3.2721e-01  5.2844e-02  1.7142e-01  2.9207e-01
##    Spearman_rho_WPOP.BPOP  -9.2490e-01 -1.7026e-01  2.2529e-02  2.1546e-01
##    Spearman_rho_WPOP.PCINC -8.5864e-01 -9.1605e-02  1.1034e-01  2.9979e-01
##    Spearman_rho_BPOP.PCINC -2.7384e-01  1.1226e-01  2.1898e-01  3.6340e-01
##                                  Max.
##    Cov_WPOP.BPOP           9.7005e+04
##    Cov_WPOP.PCINC          1.7613e+06
##    Cov_BPOP.PCINC          6.1280e+06
##    Corr_WPOP.BPOP          5.6100e-01
##    Corr_WPOP.PCINC         7.9600e-01
##    Corr_BPOP.PCINC         9.8230e-01
##    Spearman_rho_WPOP.BPOP  6.6610e-01
##    Spearman_rho_WPOP.PCINC 7.6970e-01
##    Spearman_rho_BPOP.PCINC 9.8930e-01
##    Summary information for Local median:
##                  Min. 1st Qu. Median 3rd Qu.  Max.
##    WPOP_Median     28      63     86     142  2141
##    BPOP_Median    200    1245   1695    2219  3280
##    PCINC_Median 10050   12380  13436   14727 20251
##    Summary information for Interquartile range:
##              Min. 1st Qu. Median 3rd Qu.  Max.
##    WPOP_IQR    53      84    135     405  2088
##    BPOP_IQR   132    1060   1248    1465  3635
##    PCINC_IQR 1616    3602   4501    5867 15843
##    Summary information for Quantile imbalance:
##                  Min.   1st Qu.    Median   3rd Qu.   Max.
##    WPOP_QI  -0.926045 -0.496689 -0.275862 -0.113208 1.0000
##    BPOP_QI  -1.000000 -0.308738 -0.079137  0.127202 0.6245
##    PCINC_QI -1.000000 -0.307807 -0.144461  0.027560 0.6119
## 
##    ************************************************************************

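Here are definitions of the variables contained in the object, as named in the summary above. Please note that X and Y are your respective variables (WPOP, BPOP, and PCINC in this study).

X_LM: local (geographically weighted) mean of X
X_LSD: local standard deviation of X
X_LVar: local variance of X
X_LSKe: local skewness of X
X_LCV: local coefficient of variation of X
Cov_X.Y: local covariance between X and Y
Corr_X.Y: local Pearson correlation between X and Y
Spearman_rho_X.Y: local Spearman rank correlation between X and Y
X_Median: local median of X
X_IQR: local interquartile range of X
X_QI: local quantile imbalance of X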
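
As an illustration of what a local Pearson correlation means, the sketch below computes a kernel-weighted correlation at one focal location using cov.wt from base R. The data and the bisquare helper are hypothetical, and this is a sketch of the general idea, not necessarily the exact computation performed inside gwss.

```r
# Bisquare kernel written by hand (illustrative helper)
bisquare <- function(d, bw) ifelse(d < bw, (1 - (d / bw)^2)^2, 0)

# Hypothetical toy data: two variables for five tracts,
# plus distances (m) from a focal tract
x <- c(100, 200, 300, 400, 500)
y <- c(12000, 15000, 11000, 20000, 30000)
d <- c(0, 1000, 2500, 4000, 6000)

# Kernel weights for the focal tract at a 5 km bandwidth
w <- bisquare(d, bw = 5000)

# Weighted correlation matrix; element [1, 2] is the x-y correlation
gw_cor <- cov.wt(cbind(x, y), wt = w, cor = TRUE)$cor[1, 2]

gw_cor     # local (weighted) correlation at the focal tract
cor(x, y)  # global (unweighted) correlation
```

Repeating this calculation with each tract in turn as the focal location yields one correlation per tract, which is what the correlation maps display.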

5.1 Compare Geographically Weighted Mean and Medians

5.1.1 PCINC

pcinc_stats1 <- tm_shape(st_as_sf(localstats1$SDF)) +
  tm_fill(col = c("PCINC_Median", "PCINC_LM"), palette = "RdBu", title= c('PCINC Median', "PCINC Mean")) +
  tm_borders(alpha = 0.5) +
  tm_layout(legend.position = c("right", "bottom"))

#Print to view
pcinc_stats1

5.1.2 WPOP

wpop_stats1 <- tm_shape(st_as_sf(localstats1$SDF)) +
  tm_fill(col = c("WPOP_Median", "WPOP_LM"), palette = "RdBu", title= c('WPOP Median', "WPOP Mean")) +
  tm_borders(alpha = 0.5) +
  tm_layout(legend.position = c("right", "bottom"))

#Print to view
wpop_stats1

5.1.3 BPOP

bpop_stats1 <- tm_shape(st_as_sf(localstats1$SDF)) +
  tm_fill(col = c("BPOP_Median", "BPOP_LM"), palette = "RdBu", title= c('BPOP Median', "BPOP Mean")) +
  tm_borders(alpha = 0.5) +
  tm_layout(legend.position = c("right", "bottom"))

#Print to view
bpop_stats1

5.2 Compare Geographically Weighted Correlations

5.2.1 PCINC vs WPOP

wpop_cor <- tm_shape(st_as_sf(localstats1$SDF)) +
  tm_fill(col = c("Corr_WPOP.PCINC", "Spearman_rho_WPOP.PCINC"), palette = "RdBu", 
          title= c('Pearson Correlation', "Spearman Correlation"), n = 6) +
  tm_borders(alpha = 0.5) +
  tm_layout(legend.position = c("right", "bottom"))

wpop_cor

5.2.2 PCINC vs BPOP

bpop_cor <- tm_shape(st_as_sf(localstats1$SDF)) +
  tm_fill(col = c("Corr_BPOP.PCINC", "Spearman_rho_BPOP.PCINC"), palette = "RdBu", 
          title= c('Pearson Correlation', "Spearman Correlation")) +
  tm_borders(alpha = 0.5) +
  tm_layout(legend.position = c("right", "bottom"))

bpop_cor

6 Submissions

Your submission will consist of a single knitted HTML file containing the maps above together with your written responses to the questions below. Please name this file using the convention: LastName_Assignment6.html, and zip it. Submit your zipped file to the Assignment 6 folder on HuskyCT. Your submission is due Wednesday, March 20th by 11:59pm.

Discussion: Describe the patterns observed in the geographically weighted statistic maps; i.e., (1) how do local means and medians compare for the same variable? (2) How do Pearson’s and Spearman’s rank correlations compare for the same variable? (3) How do they vary across the city, and do you feel these statistical properties exhibit spatial stationarity? (4) Address the benefit(s) that local statistics provide in these examples (hint: compare the range of values to the global value for each statistic). I have provided all global statistics in the table below. For correlations, compare both Pearson and Spearman rank correlations to the one global correlation value provided. Be thoughtful in your response.

wpop_stats <- c(round(mean_wpop,0), round(median_wpop,0), round(sd_wpop,0), round(cor_pcinc_wpop,2)) #Mean, median, SD, and correlation with PCINC
bpop_stats <- c(round(mean_bpop,0), round(median_bpop,0), round(sd_bpop,0), round(cor_pcinc_bpop,2)) #Mean, median, SD, and correlation with PCINC
pcinc_stats <- c(round(mean_pcinc,0), round(median_pcinc,0), round(sd_pcinc,0), 'NA') #Mean, median, and SD; the correlation column does not apply to PCINC itself

#Create a table for all variables
desc_stats <- as.data.frame(rbind(wpop_stats, bpop_stats, pcinc_stats))
#Rename columns
colnames(desc_stats) <- c('Mean', 'Median', 'SD', 'Correlation')
rownames(desc_stats) <- c('White Population', 'Black Population', 'Per Capita Income')

knitr::kable(desc_stats)
                   Mean   Median  SD    Correlation
White Population   340    93      624   -0.05
Black Population   1826   1724    1080  0.15
Per Capita Income  14619  13540   6511  NA
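
The comparison suggested in the hint can be sketched as follows for the Pearson correlations with PCINC. The minimum and maximum local values are copied from the gwss summary output and the global values from Section 4; the object names are only illustrative.

```r
# Minimum and maximum local Pearson correlations with PCINC,
# copied from the gwss summary output
local_min <- c(WPOP = -0.66577, BPOP = -0.32721)
local_max <- c(WPOP = 0.79600, BPOP = 0.98230)

# Global Pearson correlations with PCINC, copied from Section 4
global_r <- c(WPOP = -0.04884759, BPOP = 0.1515088)

# One row per variable: local extremes, their spread, and the global value
comparison <- data.frame(
  local_min,
  local_max,
  local_range = local_max - local_min,
  global_r = round(global_r, 2)
)
comparison
```

With the fitted object available, the same extremes should be obtainable directly, for example with range(localstats1$SDF$Corr_BPOP.PCINC).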



Your answer:


  1. Local means and medians differ somewhat in the maps for all three variables: PCINC, BPOP, and WPOP. For PCINC, mean incomes are slightly higher than median incomes, which is expected for a right-skewed variable such as income. The mean map appears to show a more defined gradation than the median map, but this visual effect may be partly due to the mean map having one more class break than the median map. For BPOP, the effect is reversed: the median map shows more visual gradation between bins than the mean map. Clustering patterns match between the two maps but are accentuated in the median map. For WPOP, both maps show strong clustering. However, unlike for the BPOP maps, not all cluster differences are preserved under the binning used.


  2. For WPOP, Pearson’s and Spearman’s rank correlations produce slightly different spatial patterns of clusters. These differences could matter if the data were used for policymaking. The two statistics are calculated differently (Pearson’s measures linear association, whereas Spearman’s measures monotonic association based on ranks), so further examination would be needed to decide which better fits our needs and modeling strategy. For BPOP, the Pearson and Spearman rank correlation patterns match more closely.


  3. Overall, these maps show that the association between per capita income and racial demographics differs in strength, and even in sign, across geographic regions. These statistical properties therefore do not exhibit spatial stationarity: the correlation between our variables varies with location.


  4. This exercise shows that local statistics can be very important for understanding patterns in our data. Based on our global statistics table, we observe almost no correlation between per capita income and either the white or the black population of a census tract (r = -0.05 for WPOP, r = 0.15 for BPOP). However, the maps show local correlations that approach 1 in some areas (the maximum local Pearson correlation between BPOP and PCINC is 0.98). This is a very different result from concluding that no correlation is present.