HW 3 Spatial Anaylsis

Data

In this assignment we will be looking at handling spatial properties of US counties using a spatial regimes model.

usco.s <- st_simplify(usco, dTolerance = 2000)
ggplot(usco.s)+
  geom_sf() +
  labs(title = "United States Counties")

We will be merging it with United States Regions

ggplot(us_states)+
  geom_sf(aes(fill = REGION)) +
  labs(title = "United States regions")

us.regions <- us_states %>% 
  group_by(REGION) %>% 
  summarise(do_union = TRUE)
usco.s <- st_intersection(usco.s, us.regions)

ggplot(usco.s)+
  geom_sf(aes(fill = REGION)) +
  labs(title = "United States Counties By Region")

We will examine a regression model with these regions in mind. ### Global Linear Regression Looking at things that effect % of persons in poverty

#Scaling all the variables
usco.s<- usco.s %>% 
  mutate_if(is.numeric, scale)

g.fit <- lm(ppov ~ pblack + phisp + prural + pfemhh + unemp + medhouseval + popdensity, data = usco.s)

summary(g.fit)

## 
## Call:
## lm(formula = ppov ~ pblack + phisp + prural + pfemhh + unemp + 
##     medhouseval + popdensity, data = usco.s)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6165 -0.3797 -0.0612  0.2959  4.1790 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -9.634e-05  9.858e-03  -0.010    0.992    
## pblack      -1.976e-01  1.847e-02 -10.695   <2e-16 ***
## phisp        2.051e-01  1.051e-02  19.512   <2e-16 ***
## prural       3.493e-01  1.296e-02  26.948   <2e-16 ***
## pfemhh       7.738e-01  2.187e-02  35.384   <2e-16 ***
## unemp        1.521e-01  1.232e-02  12.347   <2e-16 ***
## medhouseval -2.100e-01  1.223e-02 -17.179   <2e-16 ***
## popdensity   1.159e-02  1.078e-02   1.075    0.282    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5645 on 3271 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.6821, Adjusted R-squared:  0.6814 
## F-statistic:  1003 on 7 and 3271 DF,  p-value: < 2.2e-16

We see results are significant now we try spatial regimes modeling

Regimes Modeling

regions.vector <- as.character(unique(usco.s$REGION))
models <- vector(mode = "list", length = length(regions.vector))
for (i in 1:length(regions.vector)) {
  models[[i]] <- lm(ppov ~ pblack + phisp + prural + pfemhh + unemp + medhouseval + popdensity, data = usco.s,
     subset = REGION == regions.vector[[i]])
}

Chow <- function(lm.list, global){
  RSS2 <- sapply(lm.list, function(x) resid(x)^2) %>% 
    unlist() %>% 
    sum()
  RSS0 <- sum(resid(global)^2)
  k <- length(coef(global))
  n <- sapply(lm.list, function(x) length(resid(x))) %>% 
    sum()
  
  ctest <- ((RSS0 - RSS2)/k) / (RSS2 / (n- 2*k))
  p.value <- df(ctest, k, n - 2 *k)
  cbind(ctest, p.value)
}
Chow(models, g.fit)

##         ctest       p.value
## [1,] 78.69702 1.017381e-118

We have a significant Chow test which means we have motivation for spatial regimes.

Examining the models

temp <- lapply(models, tidy) %>%
  as.data.frame()

m.est <- temp %>% 
  select(term, starts_with("estimate")) %>% 
  tidyr::pivot_longer(cols = starts_with("estimate"),
                      values_to = "estimate",
                      names_to = "Model",
                      names_prefix = "estimate")
  
  
m.p.values <- temp  %>% 
  select(term, starts_with("p.value")) %>% 
  tidyr::pivot_longer(cols = starts_with("p.value"),
                      values_to = "p.value",
                      names_to = "Model",
                      names_prefix = "p.value")

model.clean <- m.est %>% 
  left_join(m.p.values)

## Joining, by = c("term", "Model")

model.clean <- model.clean %>% 
  mutate(Model = case_when(Model == "" ~ regions.vector[1],
                           Model == ".1" ~ regions.vector[2],
                           Model == ".2" ~ regions.vector[3],
                           Model == ".3" ~ regions.vector[4]),
         sig = ifelse(p.value < .05, "Significant", "Non- Significant"))

Looking at the variables across variables.

Examine.Variable <- function(data, name){
  data %>% 
    filter(term == name) %>% 
    select(-p.value) %>% 
      flextable()

}

Examine.Variable(model.clean, name = "(Intercept)")

term	Model	estimate	sig
(Intercept)	Norteast	-0.39945000	Significant
(Intercept)	Midwest	-0.19094299	Significant
(Intercept)	South	0.09186333	Significant
(Intercept)	West	0.05602628	Non- Significant

Looking at the intercepts we see that the estimates vary by region with the North East having the smallest estimate intercept. And the West is Non Significant

Examine.Variable(model.clean, name = "medhouseval")

term	Model	estimate	sig
medhouseval	Norteast	-0.09834662	Significant
medhouseval	Midwest	-0.45079076	Significant
medhouseval	South	-0.29641718	Significant
medhouseval	West	-0.20237483	Significant

Looking at median house value we see that all the estimates are negative and significant. This makes sense as a counties median value increases percent of people in poverty decreases.

Examine.Variable(model.clean, "pblack")

term	Model	estimate	sig
pblack	Norteast	-0.10886792	Non- Significant
pblack	Midwest	0.06352553	Non- Significant
pblack	South	-0.32929979	Significant
pblack	West	-0.61409027	Significant

Looking at percent black we see a larger effect in the South and West and a non significant effect in the North-East and Mid-West. This is probably due to the history of the segregation in the South and Western states. Though the signs are not what we would expect being negative meaning as a counties black population increases the percent of poverty decreases.

Examine.Variable(model.clean, "pfemhh")

term	Model	estimate	sig
pfemhh	Norteast	0.5138073	Significant
pfemhh	Midwest	0.7562961	Significant
pfemhh	South	0.8508856	Significant
pfemhh	West	0.7196037	Significant

Percent of Women headed households has a positive effect on percent on poverty in all the regions. This makes sense since a women head house might mean a single parent home which would make less income.

Examine.Variable(model.clean, "phisp")

term	Model	estimate	sig
phisp	Norteast	-0.005450997	Non- Significant
phisp	Midwest	0.021925421	Non- Significant
phisp	South	0.170032414	Significant
phisp	West	0.073540316	Significant

Looking at percent Hispanic we see a positive effect in the South and West regions for percent in poverty. The Hispanic population is greater in the south and western regions of which might lead to that effect.

Examine.Variable(model.clean, "popdensity")

term	Model	estimate	sig
popdensity	Norteast	0.037070406	Significant
popdensity	Midwest	-0.289387324	Significant
popdensity	South	-0.004484946	Non- Significant
popdensity	West	0.119676268	Significant

Looking at the effect of population density it varies by the region. We see South the effect is not significant. The effect is negative in Midwest and positive in the Northeast and West.

Examine.Variable(model.clean, "prural")

term	Model	estimate	sig
prural	Norteast	0.2498425	Significant
prural	Midwest	0.2794856	Significant
prural	South	0.3287613	Significant
prural	West	0.2504155	Significant

Percent rural effect on poverty is significant throughout the regions and of similar size with the different regions.

Examine.Variable(model.clean, "unemp")

term	Model	estimate	sig
unemp	Norteast	0.288979609	Significant
unemp	Midwest	-0.005081404	Non- Significant
unemp	South	0.204371173	Significant
unemp	West	0.117802244	Significant

We would expect as the percent of unemployment increases that poverty would also increase. That can be seen in these models. Though in the Midwest region we do not see that relationship.

Comparing Variable Importance across Models

Since we scaled all of our variables we can compare the value of the estimates across variables.

model.clean <- model.clean %>% 
  group_by(Model) %>% 
  mutate(rank = order(order(abs(estimate), decreasing=TRUE)))

model.clean %>% 
  arrange(rank) %>% 
  select(term, Model, estimate, rank) %>% 
  flextable()

term	Model	estimate	rank
pfemhh	Norteast	0.513807326	1
pfemhh	Midwest	0.756296120	1
pfemhh	South	0.850885604	1
pfemhh	West	0.719603669	1
(Intercept)	Norteast	-0.399449998	2
pblack	South	-0.329299793	2
pblack	West	-0.614090267	2
medhouseval	Midwest	-0.450790765	2
prural	South	0.328761268	3
prural	West	0.250415535	3
unemp	Norteast	0.288979609	3
popdensity	Midwest	-0.289387324	3
prural	Norteast	0.249842492	4
prural	Midwest	0.279485566	4
medhouseval	South	-0.296417183	4
medhouseval	West	-0.202374828	4
(Intercept)	Midwest	-0.190942991	5
pblack	Norteast	-0.108867919	5
unemp	South	0.204371173	5
popdensity	West	0.119676268	5
pblack	Midwest	0.063525531	6
phisp	South	0.170032414	6
unemp	West	0.117802244	6
medhouseval	Norteast	-0.098346621	6
(Intercept)	South	0.091863325	7
phisp	Midwest	0.021925421	7
phisp	West	0.073540316	7
popdensity	Norteast	0.037070406	7
(Intercept)	West	0.056026279	8
phisp	Norteast	-0.005450997	8
unemp	Midwest	-0.005081404	8
popdensity	South	-0.004484946	8

We can see that across regions that percent of women headed household had the largest effect on poverty. In second we see percent black in the south and Midwest. The intercept is second in North East that might mean there is a variable that is not included in the model that North East could contribute to percent poverty.