Overview

I found that county-level crime rates (violent, property, and total) have little statistical effect on educational outcomes in West Virginia. Several factors contribute to this: crime rates are generally low, funding is positively correlated with crime rates, and funding is positively correlated with educational outcomes.

The educational outcome I thought would be most impacted by crime rates was 11th grade math. I tested multiple grade levels and subjects before choosing this measure.

Correlations

First, the correlations between these variables reveal some interesting relationships with crime rates. For example:

Crime rates are positively correlated with population, which has a strong pull on the model by influencing revenues and enrollment.

Surprisingly, but likely tied to their correlation with population, crime rates have a positive correlation with 11th grade math proficiency. This trend held true across grade levels and subjects.

Linear Regression Model

This model uses unemployment and total crime rate to predict 11th grade math proficiency. I intentionally excluded revenue from the model because local revenue is by far the best predictor in the dataset and could overpower the other factors.


Call:
lm(formula = math_proficiency ~ total_crime_rate + unemployed, 
    data = t_train)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.7893  -4.8249  -0.8843   3.3967  22.9208 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       24.4371     4.0317   6.061 6.41e-07 ***
total_crime_rate   0.2271     0.1504   1.511   0.1399    
unemployed        -0.9998     0.3919  -2.551   0.0153 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.138 on 35 degrees of freedom
Multiple R-squared:  0.2438,    Adjusted R-squared:  0.2006 
F-statistic: 5.642 on 2 and 35 DF,  p-value: 0.007519
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-13.20134  -2.80471  -0.58314   0.08583   6.82420  10.03340 
Warning: Ignoring unknown parameters: `method`

Additional Plots

Below are some additional plots which are helpful in understanding why crime rates are positively correlated with educational outcomes in West Virginia.

Discussion and Explanation

West Virginia has one of the lowest crime rates in the country, ranking in the bottom 10 states. The state is also mostly rural, with small populations and low funding.

Higher-population areas in WV have higher (though still low) crime rates, but they also have more funding and generally better educational outcomes. The state’s rural demographics and relatively low crime rates therefore produce this positive correlation, because crime rates serve as a proxy for population and funding.

It is important to note that, in this example, correlation IS NOT causation. All else equal, one would normally expect higher crime rates to lead to worse educational outcomes.

Recommendation

My understanding from this model is that many districts in WV are so underfunded that higher populations, and the accompanying increases in crime rates, are actually beneficial for educational outcomes because of the higher funding that comes with larger populations. This may be a case where more state or federal funding needs to be allocated to districts that cannot meet their needs with local revenues.

---
title: "The Impact of Crime Rates on WV Educational Outcomes"
output: html_notebook
---

# Overview

### I found that county-level crime rates (violent, property, and total) have little statistical effect on educational outcomes in West Virginia. Several factors contribute to this: crime rates are generally low, funding is positively correlated with crime rates, and funding is positively correlated with educational outcomes.

### The educational outcome I thought would be most impacted by crime rates was 11th grade math. I tested multiple grade levels and subjects before choosing this measure.



```{r message=FALSE, warning=FALSE, include=FALSE}
## Load assessment data

library(tidyverse)
library(caret)
library(rpart)
library(readxl)

assessment_path <- './wv ed student achievement/Historical_AssessmentResults_SY15-to-SY21.xlsx'


t_assess_raw_school <- read_excel(path = assessment_path,
                           sheet = 'SY21 School & District',
                           range = 'b2:f7312')


t_assess_raw_math <- read_excel(path = assessment_path,
                           sheet = 'SY21 School & District',
                           range = 'ao3:ao7312', 
                           col_names = c('math_proficiency'),
                           na = '**')

t_assess_raw <- t_assess_raw_school %>%
  bind_cols(t_assess_raw_math) %>% 
  janitor::clean_names()  


# Remove subgroups
t_assess <- t_assess_raw %>% 
  filter(school == 999) %>% 
  filter(population_group == 'Total Population') %>% 
  filter(county != 'Statewide') %>% 
  mutate(proficiency = math_proficiency)  

print(t_assess)
```




```{r message=FALSE, warning=FALSE, include=FALSE}

## Load spending data

spending_path <- './us census ed spending/elsec22t.xls'

t_spending_raw <- read_excel(path = spending_path,
                           sheet = 'elsec22t',
                           range = 'a1:gb14106') %>% 
  janitor::clean_names()


# Regional cooperatives are not county districts, so they are excluded below
cooperatives <- c('MOUNTAIN STATE EDUCATIONAL SERVICES COOPERATIVE',
                  'EASTERN PANHANDLE INSTRUCTIONAL COOPERATIVE',
                  'SOUTHERN EDUCATIONAL SERVICES COOPERATIVE')

t_spending <- t_spending_raw %>% 
  filter(state == 49) %>% 
  filter(!name %in% cooperatives) %>% 
  select(name, enroll, tfedrev, tstrev, tlocrev, totalexp, ppcstot) %>% 
  # County is the first word of the district name; a first word of 'Mc' is mapped to McDowell
  mutate(county = str_to_title(str_split_i(name, ' ',1)),
         county = ifelse(county == 'Mc', 'McDowell', county))


print(t_spending)
```



```{r message=FALSE, warning=FALSE, include=FALSE}
## Load demographic data
  

t_demographics_unemployed <- read_csv('./demographics/unemployed.csv', 
                            skip = 4,
                            na = 'N/A') %>%
  janitor::clean_names() %>% 
  filter(county != 'West Virginia',
         county != 'United States',
         !is.na(value_percent) ) %>% 
  select(county, value_percent) %>%
  rename(unemployed = value_percent)


t_demographics <-  t_demographics_unemployed

print(t_demographics)
```



```{r message=FALSE, warning=FALSE, include=FALSE}

## Joined data

# Merge data
t <- t_assess %>% 
  left_join(t_spending, by = "county") %>% 
  select(-school, -subgroup, -population_group, -school_name, -name, -proficiency) %>% 
  mutate(temp_county = paste(county, "County", sep = " ")) %>% 
  left_join(t_demographics, by = c("temp_county" = "county")) %>% 
  select(-temp_county) %>% 
  # Assumes the revenue columns are reported in thousands of dollars, hence the * 1000
  mutate(rev_per_student = (tfedrev + tstrev + tlocrev) / enroll * 1000) %>% 
  mutate(surplus_per_student = rev_per_student - ppcstot)

t_crime <- read.csv('wv_crime.csv') %>% 
  rename(county = 'County')

t <- t %>% 
  left_join(t_crime, by = 'county') %>% 
  rename(pop_2019 = 'X2019.Population',
         violent_crime_rate = 'Violent.Crime.Rate',
         property_crime_rate = 'Property.Crime.Rate',
         total_crime_rate = 'Total.Crime.Rate') %>% 
  mutate(percent_student = enroll / pop_2019 * 100)
```


## Correlations

```{r echo=FALSE, message=FALSE, warning=FALSE}
library(ggcorrplot)

t_numeric <- t %>% 
  mutate(pop_2019 = as.numeric(pop_2019)) %>% 
  select(where(is.numeric))

ggcorrplot(cor(t_numeric), 
           colors = c("darkred", "white", "purple4"))
```


```{r message=FALSE, warning=FALSE, include=FALSE}
cor(t_numeric)
cor.test(t_numeric$rev_per_student, t_numeric$math_proficiency)
cor.test(t_numeric$ppcstot, t_numeric$math_proficiency)
cor.test(t_numeric$total_crime_rate, t_numeric$math_proficiency)
```

### First, the correlations between these variables reveal some interesting relationships with crime rates. For example:

### Crime rates are positively correlated with population, which has a strong pull on the model by influencing revenues and enrollment.

### Surprisingly, but likely tied to their correlation with population, crime rates have a positive correlation with 11th grade math proficiency. This trend held true across grade levels and subjects.
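
### One way to check whether population explains a positive crime/proficiency correlation is a partial correlation. The block below is a minimal sketch on synthetic data, not the West Virginia data: `log_pop`, `crime_rate`, and `math_prof` are hypothetical variables, built so that population drives both of the others and crime has no direct effect.

```r
# Synthetic illustration only: these are NOT the West Virginia data.
set.seed(42)
n <- 500

# Hypothetical standardized log-population drives both variables.
log_pop    <- rnorm(n)
crime_rate <- 2 + 0.8 * log_pop + rnorm(n, sd = 0.5)
math_prof  <- 20 + 5 * log_pop + rnorm(n, sd = 3)

# Raw correlation: positive even though crime has no direct effect here.
raw_r <- cor(crime_rate, math_prof)

# Partial correlation: correlate what is left of each variable
# after regressing out population.
partial_r <- cor(resid(lm(crime_rate ~ log_pop)),
                 resid(lm(math_prof ~ log_pop)))

print(round(c(raw = raw_r, partial = partial_r), 3))
```

### In this constructed example the raw correlation is clearly positive while the partial correlation sits near zero. The same residual-based check could be applied to `t_numeric`, using `pop_2019` as the control.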



## Linear Regression Model

### This model uses unemployment and total crime rate to predict 11th grade math proficiency. I intentionally excluded revenue from the model because local revenue is by far the best predictor in the dataset and could overpower the other factors.

```{r message=FALSE, warning=FALSE, include=FALSE}
set.seed(1)

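# Row indices for a 70% training sample (named train_idx so it does not mask base::sample)
train_idx <- sample(nrow(t_numeric), size = floor(0.7 * nrow(t_numeric)))

t_train <- t_numeric %>% slice(train_idx)
t_test <- t_numeric %>% slice(-train_idx)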
```


```{r echo=FALSE, message=FALSE, warning=FALSE}
m1 <- lm(formula = math_proficiency ~ total_crime_rate + unemployed, data = t_train)
summary(m1)
```
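
### The figures in the model summary can be cross-checked by hand. The sketch below recomputes the adjusted R-squared and F-statistic from the reported multiple R-squared (0.2438) and residual degrees of freedom (35), then builds a 95% confidence interval for the `total_crime_rate` coefficient from its reported estimate and standard error. The inputs are the rounded values printed by `summary(m1)`, so small rounding differences are expected.

```r
# Values copied from the printed summary(m1) output (already rounded).
r2     <- 0.2438          # Multiple R-squared
df_res <- 35              # residual degrees of freedom
p      <- 2               # predictors: total_crime_rate, unemployed
n      <- df_res + p + 1  # number of training rows

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# Overall F-statistic on p and df_res degrees of freedom.
f_stat <- (r2 / p) / ((1 - r2) / df_res)

# 95% confidence interval for the total_crime_rate coefficient.
est <- 0.2271
se  <- 0.1504
ci  <- est + c(-1, 1) * qt(0.975, df_res) * se

print(round(c(adj_r2 = adj_r2, f_stat = f_stat), 4))
print(round(ci, 3))
```

### The interval spans zero, which agrees with the reported p-value of 0.1399 for total crime rate: the training data cannot distinguish its coefficient from zero at the 5% level.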


```{r echo=FALSE, message=FALSE, warning=FALSE}
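t_test <- t_test %>% 
  mutate(predicted = predict(m1, newdata = t_test)) %>% 
  # Computed as predicted minus observed, so positive values are over-predictions
  mutate(residuals = predicted - math_proficiency)

summary(t_test$residuals)

# Predicted vs. observed; geom_abline() has no `method` argument,
# so the identity line (slope 1, intercept 0) is drawn explicitly
ggplot(t_test) +
  aes(x = predicted, y = math_proficiency) +
  geom_point(color = "#112446") +
  geom_abline(slope = 1, intercept = 0) +
  coord_equal() +
  theme_minimal()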
```
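
### A five-number summary of test-set residuals is easier to interpret next to a single error metric and a baseline. The block below is a minimal sketch on synthetic data, not the West Virginia data: it fits a one-predictor model on a 70/30 split and compares holdout RMSE and MAE against a baseline that always predicts the training mean. The helper functions `rmse` and `mae` are defined here for illustration and are not part of the analysis above.

```r
# Synthetic illustration only: these are NOT the West Virginia data.
set.seed(1)
n <- 200
d <- data.frame(unemployed = runif(n, 3, 12))
d$math_proficiency <- 40 - 2 * d$unemployed + rnorm(n, sd = 3)

# 70/30 train/test split.
train_idx <- sample(n, size = floor(0.7 * n))
train <- d[train_idx, ]
test  <- d[-train_idx, ]

fit  <- lm(math_proficiency ~ unemployed, data = train)
pred <- predict(fit, newdata = test)

# Illustrative helpers: root mean squared error and mean absolute error.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

model_rmse <- rmse(test$math_proficiency, pred)
model_mae  <- mae(test$math_proficiency, pred)

# Baseline: always predict the training-set mean.
baseline_rmse <- rmse(test$math_proficiency, mean(train$math_proficiency))

print(round(c(model_rmse = model_rmse, model_mae = model_mae,
              baseline_rmse = baseline_rmse), 2))
```

### Applied to this report, the same comparison would use `t_train`, `t_test`, and `m1`. A model RMSE close to the baseline RMSE would indicate that the predictors add little on unseen counties.
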
## Additional Plots

### Below are some additional plots which are helpful in understanding why crime rates are positively correlated with educational outcomes in West Virginia.


```{r echo=FALSE, message=FALSE, warning=FALSE}
ggplot(t) +
  aes(x = total_crime_rate, y = math_proficiency) +
  geom_point(color = "#112446") +
  geom_smooth(method = lm) +
  theme_minimal()
```


```{r echo=FALSE, message=FALSE, warning=FALSE}
ggplot(t) +
  aes(x = pop_2019, y = total_crime_rate) +
  geom_point(color = "#112446") +
  geom_smooth(method = lm) +
  theme_minimal()
```


```{r echo=FALSE, message=FALSE, warning=FALSE}
ggplot(t) +
  aes(x = enroll, y = total_crime_rate) +
  geom_point(colour = "#112446") +
  geom_smooth(method = lm) +
  theme_minimal()
```


```{r echo=FALSE, message=FALSE, warning=FALSE}
ggplot(t) +
  aes(x = percent_student, y = total_crime_rate) +
  geom_point(colour = "#112446") +
  geom_smooth(method = lm) +
  theme_minimal()
```

## Discussion and Explanation

### West Virginia has one of the lowest crime rates in the country, ranking in the bottom 10 states. The state is also mostly rural, with small populations and low funding.

### Higher-population areas in WV have higher (though still low) crime rates, but they also have more funding and generally better educational outcomes. The state's rural demographics and relatively low crime rates therefore produce this positive correlation, because crime rates serve as a proxy for population and funding.

### It is important to note that, in this example, correlation IS NOT causation. All else equal, one would normally expect higher crime rates to lead to worse educational outcomes.
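
### The proxy effect described above can be reproduced in a small simulation. This is a sketch under stated assumptions, not a model of the WV data: `funding`, `crime_rate`, and `math_prof` are synthetic, and the data are built so that crime has a negative direct effect on proficiency while also rising with funding.

```r
# Synthetic illustration only: these are NOT the West Virginia data.
set.seed(7)
n <- 500

# Hypothetical standardized funding (stands in for population and funding).
funding <- rnorm(n)

# Crime rises with funding in this constructed example.
crime_rate <- 0.8 * funding + rnorm(n, sd = 0.6)

# The built-in direct effect of crime is negative (-2).
math_prof <- 30 + 6 * funding - 2 * crime_rate + rnorm(n, sd = 3)

# Leaving funding out lets crime act as its proxy.
naive    <- lm(math_prof ~ crime_rate)
adjusted <- lm(math_prof ~ crime_rate + funding)

naive_slope    <- unname(coef(naive)["crime_rate"])
adjusted_slope <- unname(coef(adjusted)["crime_rate"])

print(round(c(naive = naive_slope, adjusted = adjusted_slope), 2))
```

### With funding omitted, the crime coefficient comes out positive. Once funding is included, it turns negative, matching the direct effect built into the simulation.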

## Recommendation

### My understanding from this model is that many districts in WV are so underfunded that higher populations, and the accompanying increases in crime rates, are actually beneficial for educational outcomes because of the higher funding that comes with larger populations. This may be a case where more state or federal funding needs to be allocated to districts that cannot meet their needs with local revenues.

## Resources

### WV Crime Stats: https://das.wv.gov/JCS/ORSP/SAC/Publications/Documents/Crime%20in%20WV-Final%20Copy%20%281%29%20%282%29.pdf

### Crime Rates by State: https://worldpopulationreview.com/state-rankings/crime-rate-by-state

### Link: https://rpubs.com/ClaycanTCode/1252385
