DiscussWk11_605

jbrnbrg

November 8, 2017

I made this data from a previous project but didn’t complete this particular analysis with the data.

It’s NYS counties broken down by average annual school aid and graduation and drop-out rates.

combo_school_aid_url <- "https://raw.githubusercontent.com/jbrnbrg/proj_proposal_606/master/jbrnbrg-project606-proposal20170322.csv"
school_aid_grad_NYS<- read.csv(text = getURL(combo_school_aid_url), stringsAsFactors = FALSE) 

head(school_aid_grad_NYS)
##   X COUNTY_NAME   AVG_AID STU_POP_4YRAVG USD_AID_PER_STU   GRAD_RT
## 1 1      albany 235888446          41046        5746.929 0.8592629
## 2 2    allegany 106406561           7726       13772.529 0.8605809
## 3 3      broome 258815172          26419        9796.554 0.8834121
## 4 4 cattaraugus 171343074          10763       15919.639 0.8801755
## 5 5      cayuga  96665641          10002        9664.631 0.8719947
## 6 6  chautauqua 225452963          18031       12503.631 0.8780143
##      DROP_RT
## 1 0.09063595
## 2 0.04613393
## 3 0.07345100
## 4 0.08180304
## 5 0.08007588
## 6 0.08269780
sm_src <- school_aid_grad_NYS

ggplot(data = sm_src, aes(x=USD_AID_PER_STU, y=DROP_RT)) + 
  geom_point(size = 2, alpha = .4) + 
  scale_x_continuous(label = dollar_format()) + 
  theme(panel.grid.major = element_line(color = "gray")) +
  labs(title = "Average Annual Student Aid per Student vs. County Drop-out Rates:", 
       x = "Annual Student Aid per Student by County", 
       y = "HS Drop-out Rates by County") 

There looks like there’s a trend:

cor(sm_src$DROP_RT, sm_src$USD_AID_PER_STU)
## [1] 0.2702581

I’ll create the linear model:

jmb_m1 <- lm(DROP_RT~USD_AID_PER_STU, data = sm_src)

ggplot(data = sm_src, aes(x=USD_AID_PER_STU, y=DROP_RT)) + 
  geom_point(size = 2, alpha = .4) + 
  scale_x_continuous(label = dollar_format()) +
  #geom_abline(intercept = 0.9238, slope = -0.000004511, color = "blue") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(panel.grid.major = element_line(color = "gray")) +
  labs(title = "Average Annual Student Aid per Student vs. County Drop-out Rates:", 
       x = "Annual Student Aid per Student by County", 
       y = "HS Drop-out Rates by County") 

I’ll review the residuals to see if they’re normally dispersed about zero and then review the qqnorm plot of the linear model’s residuals. This will help support a normal determination:

ggplot(data = sm_src, aes(x=USD_AID_PER_STU, y=jmb_m1$residuals)) + 
  geom_point(size = 2, alpha = .4) + 
  scale_x_continuous(label = dollar_format()) +
  geom_abline(intercept = 0, slope = 0, color = "blue") +
  theme(panel.grid.major = element_line(color = "gray")) +
  labs(title = "Average Annual Student Aid per Student vs. Model Residuals:", 
       x = "Annual Student Aid per Student by County", 
       y = "Model Residuals") 

qqnorm(jmb_m1$residuals)
qqline(jmb_m1$residuals)

hist(jmb_m1$residuals)

summary(jmb_m1)
## 
## Call:
## lm(formula = DROP_RT ~ USD_AID_PER_STU, data = sm_src)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.043573 -0.017160 -0.002709  0.012247  0.088565 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     4.979e-02  1.144e-02   4.352 5.78e-05 ***
## USD_AID_PER_STU 2.233e-06  1.063e-06   2.101   0.0402 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0259 on 56 degrees of freedom
## Multiple R-squared:  0.07304,    Adjusted R-squared:  0.05649 
## F-statistic: 4.412 on 1 and 56 DF,  p-value: 0.04019