Data

In this assignment we will be looking at handling spatial properties of US counties using a spatial regimes model.

usco.s <- st_simplify(usco, dTolerance = 2000)
ggplot(usco.s)+
  geom_sf() +
  labs(title = "United States Counties")

We will be merging it with United States Regions

ggplot(us_states)+
  geom_sf(aes(fill = REGION)) +
  labs(title = "United States regions")

us.regions <- us_states %>% 
  group_by(REGION) %>% 
  summarise(do_union = TRUE)
usco.s <- st_intersection(usco.s, us.regions)
ggplot(usco.s)+
  geom_sf(aes(fill = REGION)) +
  labs(title = "United States Counties By Region")

We will examine a regression model with these regions in mind. ### Global Linear Regression Looking at things that effect % of persons in poverty

#Scaling all the variables
usco.s<- usco.s %>% 
  mutate_if(is.numeric, scale)

g.fit <- lm(ppov ~ pblack + phisp + prural + pfemhh + unemp + medhouseval + popdensity, data = usco.s)

summary(g.fit)
## 
## Call:
## lm(formula = ppov ~ pblack + phisp + prural + pfemhh + unemp + 
##     medhouseval + popdensity, data = usco.s)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6165 -0.3797 -0.0612  0.2959  4.1790 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -9.634e-05  9.858e-03  -0.010    0.992    
## pblack      -1.976e-01  1.847e-02 -10.695   <2e-16 ***
## phisp        2.051e-01  1.051e-02  19.512   <2e-16 ***
## prural       3.493e-01  1.296e-02  26.948   <2e-16 ***
## pfemhh       7.738e-01  2.187e-02  35.384   <2e-16 ***
## unemp        1.521e-01  1.232e-02  12.347   <2e-16 ***
## medhouseval -2.100e-01  1.223e-02 -17.179   <2e-16 ***
## popdensity   1.159e-02  1.078e-02   1.075    0.282    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5645 on 3271 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.6821, Adjusted R-squared:  0.6814 
## F-statistic:  1003 on 7 and 3271 DF,  p-value: < 2.2e-16

We see results are significant now we try spatial regimes modeling

Regimes Modeling

regions.vector <- as.character(unique(usco.s$REGION))
models <- vector(mode = "list", length = length(regions.vector))
for (i in 1:length(regions.vector)) {
  models[[i]] <- lm(ppov ~ pblack + phisp + prural + pfemhh + unemp + medhouseval + popdensity, data = usco.s,
     subset = REGION == regions.vector[[i]])
}
Chow <- function(lm.list, global){
  RSS2 <- sapply(lm.list, function(x) resid(x)^2) %>% 
    unlist() %>% 
    sum()
  RSS0 <- sum(resid(global)^2)
  k <- length(coef(global))
  n <- sapply(lm.list, function(x) length(resid(x))) %>% 
    sum()
  
  ctest <- ((RSS0 - RSS2)/k) / (RSS2 / (n- 2*k))
  p.value <- df(ctest, k, n - 2 *k)
  cbind(ctest, p.value)
}
Chow(models, g.fit)
##         ctest       p.value
## [1,] 78.69702 1.017381e-118

We have a significant Chow test which means we have motivation for spatial regimes.

Examining the models

temp <- lapply(models, tidy) %>%
  as.data.frame()

m.est <- temp %>% 
  select(term, starts_with("estimate")) %>% 
  tidyr::pivot_longer(cols = starts_with("estimate"),
                      values_to = "estimate",
                      names_to = "Model",
                      names_prefix = "estimate")
  
  
m.p.values <- temp  %>% 
  select(term, starts_with("p.value")) %>% 
  tidyr::pivot_longer(cols = starts_with("p.value"),
                      values_to = "p.value",
                      names_to = "Model",
                      names_prefix = "p.value")

model.clean <- m.est %>% 
  left_join(m.p.values)
## Joining, by = c("term", "Model")
model.clean <- model.clean %>% 
  mutate(Model = case_when(Model == "" ~ regions.vector[1],
                           Model == ".1" ~ regions.vector[2],
                           Model == ".2" ~ regions.vector[3],
                           Model == ".3" ~ regions.vector[4]),
         sig = ifelse(p.value < .05, "Significant", "Non- Significant"))

Looking at the variables across variables.

Examine.Variable <- function(data, name){
  data %>% 
    filter(term == name) %>% 
    select(-p.value) %>% 
      flextable()

}

Examine.Variable(model.clean, name = "(Intercept)")

Looking at the intercepts we see that the estimates vary by region with the North East having the smallest estimate intercept. And the West is Non Significant

Examine.Variable(model.clean, name = "medhouseval")

Looking at median house value we see that all the estimates are negative and significant. This makes sense as a counties median value increases percent of people in poverty decreases.

Examine.Variable(model.clean, "pblack")

Looking at percent black we see a larger effect in the South and West and a non significant effect in the North-East and Mid-West. This is probably due to the history of the segregation in the South and Western states. Though the signs are not what we would expect being negative meaning as a counties black population increases the percent of poverty decreases.

Examine.Variable(model.clean, "pfemhh")

Percent of Women headed households has a positive effect on percent on poverty in all the regions. This makes sense since a women head house might mean a single parent home which would make less income.

Examine.Variable(model.clean, "phisp")

Looking at percent Hispanic we see a positive effect in the South and West regions for percent in poverty. The Hispanic population is greater in the south and western regions of which might lead to that effect.

Examine.Variable(model.clean, "popdensity")

Looking at the effect of population density it varies by the region. We see South the effect is not significant. The effect is negative in Midwest and positive in the Northeast and West.

Examine.Variable(model.clean, "prural")

Percent rural effect on poverty is significant throughout the regions and of similar size with the different regions.

Examine.Variable(model.clean, "unemp")

We would expect as the percent of unemployment increases that poverty would also increase. That can be seen in these models. Though in the Midwest region we do not see that relationship.

Comparing Variable Importance across Models

Since we scaled all of our variables we can compare the value of the estimates across variables.

model.clean <- model.clean %>% 
  group_by(Model) %>% 
  mutate(rank = order(order(abs(estimate), decreasing=TRUE)))
model.clean %>% 
  arrange(rank) %>% 
  select(term, Model, estimate, rank) %>% 
  flextable()

We can see that across regions that percent of women headed household had the largest effect on poverty. In second we see percent black in the south and Midwest. The intercept is second in North East that might mean there is a variable that is not included in the model that North East could contribute to percent poverty.