In this assignment you will explore spatial relationships and spatial dependency in the Detroit Census tract database. First, you will construct a series of spatial weights matrices that capture different types of spatial relationships, including distance-based and contiguity-based definitions. Then, you will use these spatial weights matrices to explore spatial autocorrelation. Your goal is to determine the degree of spatial autocorrelation present in the attribute percent of the population with less than a high school diploma (PER_HGSCH) under varying definitions of what constitutes a neighbor. You will use R to complete this analysis.
Install (any new) packages and load all packages needed for this session. This lab requires one new package we have not worked with before, spdep. Other packages were installed in our previous R session. For these, we just need to make them available by loading them using the library() function.
#install multiple packages. You do this only the first time.
#install.packages(c('spdep'))
#Load the libraries. You do this during every R session.
library(tidyverse) #loads the core data-science packages, including dplyr and ggplot2
library(dplyr) #for processing dataframes (tables, like CSV files); already attached by tidyverse
library(tmap) #for plotting shapefiles
library(ggplot2) #for plotting graphics in R; already attached by tidyverse
library(sf) #for processing shapefiles
library(spdep) #for computing spatial weights and spatial autocorrelation
Read your Detroit2015_CTracts shapefile and explore it.
detroit1 <- st_read('./Data/Detroit2015_CTracts.shp')
#print it to view some details
detroit1
It is important to map the variable of interest (PER_HGSCH) to see if there are any obvious clusters in space.
tmap_mode("view") #Change this to 'map' if you want a static map
tm_shape(detroit1, unit = 'm') +
tm_polygons(col = "PER_HGSCH", style = "pretty",palette = "Reds",
border.alpha = 0, title = "% with Less Than High School Diploma")
The first step in calculating spatial autocorrelation measures is defining the spatial weights matrix. You will examine four spatial weights matrices – i.e., you will define four types of spatial relationships between the Census tracts.
#Create neighbors based on Queen Contiguity
CTracts_queen <- poly2nb(detroit1, row.names = detroit1$UID, queen=T)
We can then use the summary() function to learn something about how connected the neighbor structure is.
summary(CTracts_queen)
For each census tract in Detroit, CTracts_queen lists all neighboring census tracts. The UID field you specified with the row.names parameter above is stored as the region ID for each tract, while the neighbor list itself stores row positions. For example, to see the neighbors for the first tract:
detroit1$UID[1] #the UID value for the first census tract
CTracts_queen[[1]] #the row indices of the neighboring census tracts
#Print the Tract IDs of neighbors for the first census tract
detroit1$TRACTCE[CTracts_queen[[1]]]
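Since the neighbor list stores row positions in detroit1 rather than the UIDs themselves, you can index into the UID column the same way to see the neighbors' UID values (an optional check):
#Optional: map the neighbor indices back to UID values
detroit1$UID[CTracts_queen[[1]]]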
We can view the distribution of the neighbors using a connectivity histogram. A connectivity histogram shows how many observations have the same number of neighbors. Before doing that, we need to generate the information from the neighbor object using the card() function from the spdep package.
queen_card <- card(CTracts_queen)
ggplot() +
geom_histogram(aes(x=queen_card), breaks = seq(0,10, by = 1)) +
xlab("Number of Neighbors") +
ylab("Frequency") +
ggtitle('Queen Contiguity Neighbors') +
theme_bw()
We can now visualize the neighbor connections between tracts using a connectivity graph/map, which displays a point for each observation and connects it to all of its neighbors. Since we are working with polygons in this exercise, we first convert them to points by taking their centroids. We then use the generic plot() function to create the graphs/maps. Below we visualize the Queen contiguity-based definition of neighbors.
#Plot the connectivity graph/map to visualize neighbor connections
centroids <- st_centroid(st_geometry(detroit1)) #get centroids for each census tract
plot(st_geometry(detroit1), border = "grey60", reset = FALSE) #plot the boundaries for each census tract
plot(CTracts_queen, coords = centroids, add=T, col = "red")
What we have established above are neighbors for each census tract. The next thing to do is to assign weights to each neighbor relationship. Weights determine how much each neighbor counts. To accomplish this, we will use the nb2listw() function from the spdep package. For example, we can create weights for our Queen contiguity defined neighbor object CTracts_queen following the procedure below.
queen_weights<-nb2listw(CTracts_queen, style="W")
In the command, the first input is the neighbor (nb) object (CTracts_queen). We then define the weights style = "W", which indicates that the weights for each spatial unit are standardized to sum to 1 (also known as row standardization). For example, if census tract 1 has 6 neighbors, each of those neighbors will have a weight of 1/6.
This allows for comparability between areas with different numbers of neighbors.
queen_weights$weights[[1]]
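As a quick optional sanity check, the row-standardized weights for any tract should sum to 1:
#Optional check: row-standardized weights sum to 1
sum(queen_weights$weights[[1]]) #should return 1
range(sapply(queen_weights$weights, sum)) #should return 1 1 across all tracts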
Now, let's create spatial weights based on Rook's contiguity.
#Create neighbors based on Rook Contiguity
CTracts_rook <- poly2nb(detroit1, row.names = detroit1$UID, queen=F) #queen=F creates neighbors based on Rook Contiguity
#View the summary
summary(CTracts_rook)
#Explore some neighbor characteristics
detroit1$UID[1] #the UID value for the first census tract
CTracts_rook[[1]] #the row indices of the neighboring census tracts
#Print the Tract IDs of neighbors for the first census tract
detroit1$TRACTCE[CTracts_rook[[1]]]
#Plot the connectivity histogram for the neighbors based on Rook contiguity
rook_card <- card(CTracts_rook)
ggplot() +
geom_histogram(aes(x=rook_card), breaks = seq(0,10, by = 1)) +
xlab("Number of Neighbors") +
ylab("Frequency") +
ggtitle('Rook Contiguity Neighbors') +
theme_bw()
#Plot the connectivity graph/map to visualize neighbor connections
plot(st_geometry(detroit1), border = "grey60", reset = FALSE) #plot the boundaries for each census tract
plot(CTracts_rook, coords = centroids, add=T, col = "red")
#Create weights
rook_weights<-nb2listw(CTracts_rook, style="W")
#View weights for neighbors for first census tract
rook_weights$weights[[1]]
Now, let's create spatial weights based on k-nearest neighbors. This method finds the k closest observations for each observation of interest, where k is some integer. For example, with k=6, each observation takes the 6 observations closest to it as its neighbors. This rule disregards the absolute distance between observations; two observations could be very far away from one another and still be considered neighbors.
We create a k-nearest neighbor object using the functions knearneigh() and knn2nb(), also from the spdep package. Below we create a k-nearest neighbor object with k=6 and derive weights for each neighbor. We will then view the summary of the neighbors, visualize the neighbors using the connectivity map, and finally create weights. Note: since we specified the number of neighbors as 6 (k=6), all census tracts will have 6 neighbors. Therefore, we do not need to plot the connectivity histogram.
#Create neighbors based on k-nearest neighbor
CTracts_knn <- knearneigh(centroids, k = 6) # k number nearest neighbors
CTracts_knn <- knn2nb(CTracts_knn)
#View the summary
summary(CTracts_knn)
#Plot the connectivity graph/map to visualize neighbor connections
plot(st_geometry(detroit1), border = "grey60", reset = FALSE) #plot the boundaries for each census tract
plot(CTracts_knn, coords = centroids, add=T, col = "red")
#Create weights
knn_weights<-nb2listw(CTracts_knn, style="W")
#View weights for neighbors for first census tract
knn_weights$weights[[1]]
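To confirm the note above that every tract has exactly six neighbors under k = 6, we can tabulate the neighbor counts (an optional check):
#Optional check: every tract should have exactly 6 neighbors
table(card(CTracts_knn)) #expect a single entry: all tracts with 6 neighbors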
Now, let’s create spatial weights based on distance. In this case, features within a given radius are considered neighbors. If working with points, then all points that fall within that radius are considered neighbors. If working with polygons, centroids of the polygons are assessed and polygons whose centroids fall within the radius are considered neighbors.
We create a distance-based neighbor object using the function dnearneigh(), also from the spdep package. The function tells R to designate as neighbors all features falling between d1 (the lower distance bound) and d2 (the upper distance bound). d1 and d2 must be provided in the units of the shapefile's projection. Our Detroit2015_CTracts shapefile is in the USA Contiguous Albers Equal Area Conic projection, which uses meters as its unit of measurement.
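If you want to confirm the projection and its units before choosing d1 and d2, you can print the coordinate reference system (an optional check):
#Optional: inspect the coordinate reference system; look for the length unit (metre) in the printout
st_crs(detroit1)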
Below we will find all neighbors within 0 to 3000 meters of each polygon's centroid. We will then view the summary of the neighbors, create weights, and finally visualize neighbor connections. Note: depending on the maximum distance (d2) you provide, calculating weights may fail because some census tracts might not have any neighbors within that distance. To take care of this, you can provide a higher distance threshold or set zero.policy = TRUE in your nb2listw() function.
#Create neighbors based on distance
CTracts_dist <- dnearneigh(centroids, d1 = 0, d2 = 3000, row.names = detroit1$UID) #d1 = lower distance bound and d2 = upper distance bound
#View the summary
summary(CTracts_dist)
#Plot the connectivity histogram for the neighbors based on distance
dist_card <- card(CTracts_dist)
ggplot() +
geom_histogram(aes(x=dist_card), breaks = seq(0,35, by = 1)) +
xlab("Number of Neighbors") +
ylab("Frequency") +
ggtitle('Distance Neighbors (d1=0, d2=3000)') +
theme_bw()
#Plot the connectivity graph/map to visualize neighbor connections
plot(st_geometry(detroit1), border = "grey60", reset = FALSE) #plot the boundaries for each census tract
plot(CTracts_dist, coords = centroids, add=T, col = "red")
#Create weights
dist_weights<-nb2listw(CTracts_dist, style="W", zero.policy = T) #set zero.policy = T
#View weights for neighbors for first census tract
dist_weights$weights[[1]]
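As a quick optional check related to the zero.policy note above, you can identify any tracts that have no neighbors at this distance threshold:
#Optional: find tracts with no neighbors within d2
which(card(CTracts_dist) == 0) #integer(0) means every tract has at least one neighbor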
In Part 1 we defined what we mean by neighbor by creating a neighbor object. We also generated a spatial weights matrix to characterize the influence of each neighbor. Our map of the PER_HGSCH variable suggested that the percent of the population with less than a high school diploma is clustered in Detroit. We can explore this further using a Moran scatterplot. The Moran scatterplot plots standardized PER_HGSCH values on the x-axis and the standardized average PER_HGSCH for one's neighbors on the y-axis. The standardized average PER_HGSCH for one's neighbors is known as the spatial lag.
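The spatial lag itself can be computed directly with the lag.listw() function from spdep. A minimal sketch of the quantity plotted on the y-axis (the variable names here are my own):
#Compute the spatially lagged standardized PER_HGSCH (the Moran scatterplot y-axis)
std_hgsch <- as.numeric(scale(detroit1$PER_HGSCH)) #standardized values
lag_hgsch <- lag.listw(queen_weights, std_hgsch) #weighted average of each tract's neighbors
head(lag_hgsch)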
Below we plot a Moran scatterplot based on spatial weights derived using Queen’s contiguity.
moran.plot(as.numeric(scale(detroit1$PER_HGSCH)), listw=queen_weights,
xlab="Standardized % With Less Than High School Diploma",
ylab="Standardized Lagged % With Less Than High School Diploma",
main=c("Moran Scatterplot for PER_HGSCH", "Based on Queen's Contiguity"))
There appears to be a fairly strong positive association - the higher your neighbors’ PER_HGSCH, the higher your PER_HGSCH.
The map and Moran scatterplot provide descriptive visualizations of spatial clustering (autocorrelation) in percent with less than a high school diploma. But we do not want to stop here: we need an objective, quantitative measure of the degree to which similar features cluster. This is where global measures of autocorrelation come in. A global index of spatial autocorrelation gives us a single summary, over the entire study area, of the level of spatial similarity observed among neighboring observations. We will use Global Moran's I, the most popular test of spatial autocorrelation.
We use the function moran.test() in the spdep package to calculate the Moran’s I. We need to specify the field (PER_HGSCH) and the spatial weights matrix.
queen_moran <- moran.test(detroit1$PER_HGSCH, queen_weights)
queen_moran #print the test results
We find that Moran's I is positive (0.45). Remember that Moran's I behaves like a correlation coefficient, ranging from -1 to 1. A common rule of thumb is that a spatial autocorrelation above 0.3 or below -0.3 is meaningful. Our value of 0.45 is fairly high, indicating strong positive clustering, and this clustering is statistically significant (p-value < 0.01).
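The p-value reported by moran.test() relies on analytical assumptions about the distribution of the statistic. As an optional, distribution-free check (not required for this lab), you can run a Monte Carlo permutation test with spdep's moran.mc():
#Optional: permutation-based test of Moran's I using 999 random permutations
set.seed(42) #seed chosen arbitrarily, for reproducibility
moran.mc(detroit1$PER_HGSCH, queen_weights, nsim = 999)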
We now need to compute the same stats for the PER_HGSCH variable using spatial weights from the Rook’s contiguity, k-nearest neighbor, and distance. As we do this, think about how different (or similar) the results will be with each of the definitions of neighbor.
moran.plot(as.numeric(scale(detroit1$PER_HGSCH)), listw=rook_weights,
xlab="Standardized % With Less Than High School Diploma",
ylab="Standardized Lagged % With Less Than High School Diploma",
main=c("Moran Scatterplot for PER_HGSCH", "Based on Rook's Contiguity"))
rook_moran <- moran.test(detroit1$PER_HGSCH, rook_weights)
rook_moran
moran.plot(as.numeric(scale(detroit1$PER_HGSCH)), listw=knn_weights,
xlab="Standardized % With Less Than High School Diploma",
ylab="Standardized Lagged % With Less Than High School Diploma",
main=c("Moran Scatterplot for PER_HGSCH", "Based on K-Nearest Neighbor"))
knn_moran <- moran.test(detroit1$PER_HGSCH, knn_weights)
knn_moran
moran.plot(as.numeric(scale(detroit1$PER_HGSCH)), listw=dist_weights,
xlab="Standardized % With Less Than High School Diploma",
ylab="Standardized Lagged % With Less Than High School Diploma",
main=c("Moran Scatterplot for PER_HGSCH", "Based on Distance"))
dist_moran <- moran.test(detroit1$PER_HGSCH, dist_weights)
dist_moran
The Moran’s I tells us whether clustering exists in the area. It does not tell us, however, where clusters are located. Due to this limitation, scholars came up with Local Indicators of Spatial Association (LISAs), which are local forms of the global indices.
LISAs provide a local measure of similarity between each unit’s value (in our case, percent of population with less than a high school diploma) and those of nearby cases. That is, rather than one single summary measure of spatial association (Moran’s I), we have a measure for every single unit (a census tract in our case) in the study area. We can then map each tract’s LISA value to provide insight into the location of neighborhoods with comparatively high or low associations with neighboring values (i.e. hot or cold spots).
LISA can be measured using the Getis-Ord (Gi and Gi*) and Local Moran’s I. We will use the Getis-Ord Gi statistic in this exercise.
We use the localG() function from the spdep package to do this.
local_queen <-localG(detroit1$PER_HGSCH, queen_weights)
The command returns a localG object containing the Z-scores for the Gi statistic. A large positive value suggests a cluster of tracts with high PER_HGSCH values (a hot spot of low educational attainment), and a large negative value indicates a cluster of low PER_HGSCH values (a cold spot).
In order to plot the results, we will need to coerce the local_queen object to numeric. Let's do that and save it as a column in our detroit1 object.
detroit1 <- detroit1 %>%
mutate(local_queen = as.numeric(local_queen))
Given that the returned results are Z-scores, we can choose hot spot thresholds in the statistic and calculate them with the case_when() function. Below we categorize hot and cold spots based on different significance levels (1% (or 99%), 5% (or 95%), and 10% (or 90%)) using the appropriate Z-scores. We then set the categorical variable as a factor, ordering the levels from cold to not significant to hot.
detroit1 <- detroit1 %>%
mutate(hotspots_queen = case_when(
local_queen <= -2.58 ~ "Cold spot 99%",
local_queen > -2.58 & local_queen <=-1.96 ~ "Cold spot 95%",
local_queen > -1.96 & local_queen <= -1.65 ~ "Cold spot 90%",
local_queen >= 1.65 & local_queen < 1.96 ~ "Hot spot 90%",
local_queen >= 1.96 & local_queen < 2.58 ~ "Hot spot 95%",
local_queen >= 2.58 ~ "Hot spot 99%",
TRUE ~ "Not Significant"))
#coerce into a factor, and sort levels from cold to not significant to hot
detroit1 <- detroit1 %>%
mutate(hotspots_queen = factor(hotspots_queen,
levels = c("Cold spot 99%", "Cold spot 95%",
"Cold spot 90%", "Not Significant",
"Hot spot 90%", "Hot spot 95%",
"Hot spot 99%")))
Now we can plot the clusters.
tmap_mode("plot") #Change this to 'view' if you want an interactive map
queen_clusters <- tm_shape(detroit1, unit = "m") +
tm_polygons(col = "hotspots_queen", title = "Gi value", palette = c("blue","white", "red")) +
tm_compass(type = "4star", position = c("left", "bottom")) +
tm_layout(frame = F, main.title = "Detroit Education Clusters - Queen",
legend.outside = F)
#print queen_clusters to see the map
queen_clusters
The map illustrates distinctive patterns of spatial clustering. We see clusters of a high percent of the population without a high school diploma in the southernmost parts of Detroit. We also see areas with low percentages of the population without a high school diploma in northwestern Detroit. Cold spots also exist around the southeastern parts of Detroit, and we see some hot spots in north-central Detroit.
#Getis-Ord Gi
localg <- localG(detroit1$PER_HGSCH, rook_weights)
#Coerce it to numeric
detroit1 <- detroit1 %>%
mutate(local_rook = as.numeric(localg))
#Reclassify Z-scores
detroit1 <- detroit1 %>%
mutate(hotspots_rook = case_when(
local_rook <= -2.58 ~ "Cold spot 99%",
local_rook > -2.58 & local_rook <=-1.96 ~ "Cold spot 95%",
local_rook > -1.96 & local_rook <= -1.65 ~ "Cold spot 90%",
local_rook >= 1.65 & local_rook < 1.96 ~ "Hot spot 90%",
local_rook >= 1.96 & local_rook < 2.58 ~ "Hot spot 95%",
local_rook >= 2.58 ~ "Hot spot 99%",
TRUE ~ "Not Significant"))
#coerce into a factor, and sort levels from cold to not significant to hot
detroit1 <- detroit1 %>%
mutate(hotspots_rook = factor(hotspots_rook,
levels = c("Cold spot 99%", "Cold spot 95%",
"Cold spot 90%", "Not Significant",
"Hot spot 90%", "Hot spot 95%",
"Hot spot 99%")))
#Plot the clusters
tmap_mode("plot") #Change this to 'view' if you want an interactive map
rook_clusters <- tm_shape(detroit1, unit = "m") +
tm_polygons(col = "hotspots_rook", title = "Gi value", palette = c("blue","white", "red")) +
tm_compass(type = "4star", position = c("left", "bottom")) +
tm_layout(frame = F, main.title = "Detroit Education Clusters - Rook",
legend.outside = F)
#print rook_clusters to see the map
rook_clusters
rm(localg)
#Getis-Ord Gi
localg <- localG(detroit1$PER_HGSCH, knn_weights)
#Coerce it to numeric
detroit1 <- detroit1 %>%
mutate(local_knn = as.numeric(localg))
#Reclassify Z-scores
detroit1 <- detroit1 %>%
mutate(hotspots_knn = case_when(
local_knn <= -2.58 ~ "Cold spot 99%",
local_knn > -2.58 & local_knn <=-1.96 ~ "Cold spot 95%",
local_knn > -1.96 & local_knn <= -1.65 ~ "Cold spot 90%",
local_knn >= 1.65 & local_knn < 1.96 ~ "Hot spot 90%",
local_knn >= 1.96 & local_knn < 2.58 ~ "Hot spot 95%",
local_knn >= 2.58 ~ "Hot spot 99%",
TRUE ~ "Not Significant"))
#coerce into a factor, and sort levels from cold to not significant to hot
detroit1 <- detroit1 %>%
mutate(hotspots_knn = factor(hotspots_knn,
levels = c("Cold spot 99%", "Cold spot 95%",
"Cold spot 90%", "Not Significant",
"Hot spot 90%", "Hot spot 95%",
"Hot spot 99%")))
#Plot the clusters
tmap_mode("plot") #Change this to 'view' if you want an interactive map
knn_clusters <- tm_shape(detroit1, unit = "m") +
tm_polygons(col = "hotspots_knn", title = "Gi value", palette = c("blue","white", "red")) +
tm_compass(type = "4star", position = c("left", "bottom")) +
tm_layout(frame = F, main.title = "Detroit Education Clusters - KNN",
legend.outside = F)
#print knn_clusters to see the map
knn_clusters
rm(localg)
#Getis-Ord Gi
localg <- localG(detroit1$PER_HGSCH, dist_weights)
#Coerce it to numeric
detroit1 <- detroit1 %>%
mutate(local_dist = as.numeric(localg))
#Reclassify Z-scores
detroit1 <- detroit1 %>%
mutate(hotspots_dist = case_when(
local_dist <= -2.58 ~ "Cold spot 99%",
local_dist > -2.58 & local_dist <=-1.96 ~ "Cold spot 95%",
local_dist > -1.96 & local_dist <= -1.65 ~ "Cold spot 90%",
local_dist >= 1.65 & local_dist < 1.96 ~ "Hot spot 90%",
local_dist >= 1.96 & local_dist < 2.58 ~ "Hot spot 95%",
local_dist >= 2.58 ~ "Hot spot 99%",
TRUE ~ "Not Significant"))
#coerce into a factor, and sort levels from cold to not significant to hot
detroit1 <- detroit1 %>%
mutate(hotspots_dist = factor(hotspots_dist,
levels = c("Cold spot 99%", "Cold spot 95%",
"Cold spot 90%", "Not Significant",
"Hot spot 90%", "Hot spot 95%",
"Hot spot 99%")))
#Plot the clusters
tmap_mode("plot") #Change this to 'view' if you want an interactive map
dist_clusters <- tm_shape(detroit1, unit = "m") +
tm_polygons(col = "hotspots_dist", title = "Gi value", palette = c("blue","white", "red")) +
tm_compass(type = "4star", position = c("left", "bottom")) +
tm_layout(frame = F, main.title = "Detroit Education Clusters - Distance",
legend.outside = F)
#print dist_clusters to see the map
dist_clusters
rm(localg)
Your submission will consist of a single knitted HTML file containing your written answers to the three discussions below. Please limit your response to a maximum of 500 words for each discussion point. Please name this file using the convention: LastName_Assignment5.html.
Submit all files to the Assignment 5 folder on HuskyCT. Your submission is due 11:59pm on Wednesday, February 28th.
Discussion 1: Discuss the differences between the spatial weights matrices. How does changing the definition of neighbors (contiguity, k-nearest, distance threshold) affect the connectivity patterns? Which definition leads to the greatest number of neighbors, and which to the fewest? To answer this question, you will need to explore summaries of the neighbor objects obtained from the four definitions. You will also need to explore the various connectivity histograms. Which definition do you think best captures the spatial relationships between census tracts?
summary(CTracts_queen) #summary of queen neighbors
summary(CTracts_rook) #summary of rook neighbors
summary(CTracts_knn) #summary of knn neighbors
summary(CTracts_dist) #summary of distance neighbors
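To attach exact numbers to the connectivity comparison discussed below, one optional approach (the object name nb_list is my own) is to compute the average number of neighbors under each definition:
#Optional: average number of neighbors under each definition
nb_list <- list(queen = CTracts_queen, rook = CTracts_rook,
                knn = CTracts_knn, dist = CTracts_dist)
sapply(nb_list, function(nb) mean(card(nb)))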
My answer:
Building the networks that underlie the spatial weights matrices using the different neighbor definitions results in networks of very different natures. Since we are working with census tract data, it is not uncommon for many tracts to share vertices as opposed to just sides, which produces very different connectivity behavior between the rook and queen definitions. While I do not have the exact figure for the average number of connections gained in the queen network compared to the rook network, it appears that, on average, each tract adds roughly 1.5 neighbors under the queen method. Using 6 nearest neighbors, we see a network that is somewhat similar to the queen contiguity method in terms of the number of connections per tract, although the pattern of connections on the map is different. The distance-based network yields a huge number of connections per census tract. In order of average connections per census tract, the distance method (with the specified threshold) has the most, followed by k-nearest neighbors (k=6), then queen, and finally rook.
Discussion 2: Discuss the differences observed in the Moran's I values based on the spatial weights matrix used in the calculation. Which spatial weights matrix produced the highest or lowest Moran's I values? Why might this be the case? Be sure to consider the differences in how neighbors were defined. To answer this question, you may need to explore the summaries of the global Moran's I statistics obtained from the four neighbor definitions, as well as the choropleth map of PER_HGSCH.
queen_moran
rook_moran
knn_moran
dist_moran
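To pull just the Moran's I estimates out of the four test objects for a side-by-side comparison, one optional sketch:
#Optional: extract the Moran's I statistic from each test result
sapply(list(queen = queen_moran, rook = rook_moran,
            knn = knn_moran, dist = dist_moran),
       function(m) m$estimate["Moran I statistic"])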
My answer: Based upon the Moran's I tests performed in this code, we can see that changing our neighbor network definitions produces differences in the resultant Moran's I values that range from very small to quite substantial. The Moran's I values are as follows:
Queen's: 0.445301966
Rook's: 0.433396792
6-nearest: 0.4422392767
Distance: 0.3522549066
Here, we observe that the Moran's I values are quite close for the queen, rook, and KNN methods, while the distance method has a substantially smaller value. The average numbers of connections for the former three definitions are quite similar, as are their Moran's I values. For this data set, the Moran's I value drops as the average number of connections per polygon increases. This could be a result of a couple of things. It could mean that the data are autocorrelated and very tightly clustered, or that the borders of census tracts have an important real-world influence on the data. It could also indicate that the data are not as autocorrelated as the queen, rook, and KNN models might suggest, that the polygon boundaries are somewhat arbitrary in determining autocorrelation, and that distance is actually a more important determinant of autocorrelation.
Discussion 3: Discuss the differences observed in the spatial clustering of PER_HGSCH in Detroit obtained by the Getis-Ord Gi statistic. To answer this question, you will need to explore the patterns in the four maps of spatial clusters. Do you see any differences in the clustering of the PER_HGSCH variable? Why or why not? Be sure to consider the differences in how neighbors were defined. Which spatial weights matrix produced the most realistic spatial clusters, in your opinion?
tmap_arrange(queen_clusters, rook_clusters, knn_clusters, dist_clusters, ncol = 2)
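To quantify how readily each definition declares significance, an optional sketch that counts the tracts in each hot/cold spot category under all four definitions:
#Optional: tract counts per hot/cold spot category for each neighbor definition
sapply(st_drop_geometry(detroit1)[c("hotspots_queen", "hotspots_rook",
                                    "hotspots_knn", "hotspots_dist")], table)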
My answer: Each of these neighborhood definitions yields a map with broadly similar patterns of hot and cold spots, but the maps differ in important ways. Maps like these could be given to regional or federal authorities to determine things like eligibility boundaries for adult education programs in Detroit. Since such a map could have important implications for the life situations and outcomes of many people, it is important that the data be represented accurately; a compelling map is great, but accuracy comes first. Hypothesis tests for hot and cold spots appear to be influenced heavily by the neighborhood definition. Definitions that include more neighbors more readily declare significance in hot and cold spot analyses. This could be because there are more possible comparisons to be made when there are more neighbors, and, presumably, if autocorrelation is present, each additional neighbor included is less related than the closer, more primary neighbors. If all neighbors are weighted equally in the hot/cold spot analysis, it makes sense that significant differences will be accentuated when more neighbors are included.
…………………………….End of Lab Assignment 5…………………………….