This gives us the final state-level metrics, which represent typical performance across all cohorts.
We then draw a scatter plot with a regression line showing the relationship between initial scores (x-axis) and gains (y-axis). The residuals indicate which states over- or under-performed relative to the model's predictions.
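The numbers printed below can be reproduced in outline with a few lines of base R. This is a minimal sketch, not the code that produced the report: the column names (`avg_state_gain`, `avg_state_math_5th`, `avg_drop_out_rate`, `predicted_gain`, `residuals`) are taken from the output, but the data frame here is filled with synthetic values, and the names `fit_base`, `fit_full`, `r2_base`, `r2_full`, and `ranked` are hypothetical.

```r
# Synthetic, illustrative data only: these are NOT the values from the report.
set.seed(42)
n_states <- 26
state_averages <- data.frame(
  Uf                 = sprintf("S%02d", seq_len(n_states)),  # placeholder state codes
  avg_state_math_5th = runif(n_states, 165, 235),
  avg_drop_out_rate  = runif(n_states, 0.09, 0.41)
)
# Assumed generating process, chosen only so that the example has some signal.
state_averages$avg_state_gain <-
  120 - 0.35 * state_averages$avg_state_math_5th +
  15 * state_averages$avg_drop_out_rate +
  rnorm(n_states, sd = 5)

# Pairwise correlations, as in the "Correlation between ..." lines.
cor_init_dropout <- cor(state_averages$avg_state_math_5th,
                        state_averages$avg_drop_out_rate)
cor_gain_dropout <- cor(state_averages$avg_state_gain,
                        state_averages$avg_drop_out_rate)

# Model without and with the dropout rate as a covariate.
fit_base <- lm(avg_state_gain ~ avg_state_math_5th, data = state_averages)
fit_full <- lm(avg_state_gain ~ avg_state_math_5th + avg_drop_out_rate,
               data = state_averages)

r2_base <- summary(fit_base)$r.squared
r2_full <- summary(fit_full)$r.squared

# Predicted gain and residual (actual minus predicted) per state.
state_averages$predicted_gain <- fitted(fit_full)
state_averages$residuals      <- resid(fit_full)

# Positive residual: the state gained more than the model predicts.
ranked <- state_averages[order(-state_averages$residuals), ]

cat("Correlation between 5th Grade and Dropout Rate:", round(cor_init_dropout, 2), "\n")
cat("Correlation between Gain and Dropout Rate:", round(cor_gain_dropout, 2), "\n")
cat("R2 without dropout:", r2_base, "\n")
cat("R2 with dropout:", r2_full, "\n")
print(head(ranked, 3))
```

Because the two models are nested, the R² of the model with the dropout rate can never be lower than that of the model without it, so the adjusted R² or the covariate's own p-value is the more informative check of whether it helps. The scatter plot itself could be drawn in base R with `plot(avg_state_gain ~ avg_state_math_5th, data = state_averages)` followed by `abline(fit_base)`.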
## Correlation between 5th Grade and Dropout Rate: -0.47
## Correlation between Gain and Dropout Rate: 0.46
## R2 Without drop out 0.4756675
## R2 with drop out 0.4996888
## `geom_smooth()` using formula = 'y ~ x'
##
## Call:
## lm(formula = avg_state_gain ~ avg_state_math_5th + avg_drop_out_rate,
## data = state_averages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.7797 -3.7241 -0.4391 3.7562 8.0028
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 118.5694 20.8585 5.684 8.69e-06 ***
## avg_state_math_5th -0.3646 0.1001 -3.643 0.00136 **
## avg_drop_out_rate 17.1778 16.3466 1.051 0.30424
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.808 on 23 degrees of freedom
## Multiple R-squared: 0.4997, Adjusted R-squared: 0.4562
## F-statistic: 11.49 on 2 and 23 DF, p-value: 0.0003477
## # A tibble: 26 × 6
## Uf avg_state_gain avg_state_math_5th avg_drop_out_rate predicted_gain
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AC 58.0 194. 0.241 52.0
## 2 AM 52.8 185. 0.220 54.8
## 3 AP 54.4 170. 0.304 61.8
## 4 BA 52.7 182. 0.251 56.6
## 5 CE 53.2 203. 0.137 46.9
## 6 DF 35.9 213. 0.158 43.7
## 7 ES 53.1 199. 0.232 49.8
## 8 GO 54.0 197. 0.139 49.2
## 9 MA 63.6 175. 0.232 58.6
## 10 MG 46.0 210. 0.165 44.8
## # ℹ 16 more rows
## # ℹ 1 more variable: residuals <dbl>
## Correlation between 5th Grade and Dropout Rate: -0.4
## Correlation between Gain and Dropout Rate: 0.49
## R2 Without drop out 0.1190966
## R2 with drop out 0.2664999
##
## Call:
## lm(formula = avg_state_gain ~ avg_state_leitura_5th + avg_drop_out_rate,
## data = state_averages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.936 -2.861 -1.377 3.149 22.066
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 78.0296 26.8320 2.908 0.00792 **
## avg_state_leitura_5th -0.1283 0.1427 -0.899 0.37793
## avg_drop_out_rate 45.5938 21.2074 2.150 0.04231 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.453 on 23 degrees of freedom
## Multiple R-squared: 0.2665, Adjusted R-squared: 0.2027
## F-statistic: 4.178 on 2 and 23 DF, p-value: 0.02832
## # A tibble: 26 × 6
## Uf avg_state_gain avg_state_leitura_5th avg_drop_out_rate predicted_gain
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AC 69.0 181. 0.241 65.9
## 2 AM 69.5 168. 0.220 66.5
## 3 AP 66.5 155. 0.304 72.0
## 4 BA 64.0 168. 0.251 67.9
## 5 CE 62.8 193. 0.137 59.5
## 6 DF 49.5 194. 0.158 60.4
## 7 ES 66.0 181. 0.232 65.3
## 8 GO 67.8 179. 0.139 61.4
## 9 MA 66.9 162. 0.232 67.8
## 10 MG 60.3 189. 0.165 61.3
## # ℹ 16 more rows
## # ℹ 1 more variable: residuals <dbl>
## Correlation between 5th Grade and Dropout Rate: -0.74
## Correlation between Gain and Dropout Rate: 0.49
## R2 Without drop out 0.3000958
## R2 with drop out 0.3158019
##
## Call:
## lm(formula = avg_state_gain ~ avg_state_math_5th + avg_drop_out_rate,
## data = state_averages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.091 -3.003 1.904 2.876 8.053
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 98.3857 33.8426 2.907 0.00773 **
## avg_state_math_5th -0.2690 0.1637 -1.643 0.11337
## avg_drop_out_rate 14.1737 19.0957 0.742 0.46514
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.589 on 24 degrees of freedom
## Multiple R-squared: 0.3158, Adjusted R-squared: 0.2588
## F-statistic: 5.539 on 2 and 24 DF, p-value: 0.01052
## # A tibble: 27 × 6
## Uf avg_state_gain avg_state_math_5th avg_drop_out_rate predicted_gain
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AC 55.6 181. 0.285 53.7
## 2 AL 60.0 166. 0.404 59.5
## 3 AM 46.4 184. 0.356 54.0
## 4 AP 51.9 169. 0.372 58.1
## 5 BA 52.6 176. 0.333 55.6
## 6 CE 61.3 180. 0.241 53.3
## 7 ES 54.4 193. 0.270 50.2
## 8 GO 55.0 191. 0.179 49.7
## 9 MA 53.6 166. 0.341 58.6
## 10 MG 49.6 199. 0.204 47.8
## # ℹ 17 more rows
## # ℹ 1 more variable: residuals <dbl>
## Correlation between 5th Grade and Dropout Rate: -0.69
## Correlation between Gain and Dropout Rate: 0.52
## R2 Without drop out 0.1907272
## R2 with drop out 0.2832186
##
## Call:
## lm(formula = avg_state_gain ~ avg_state_leitura_5th + avg_drop_out_rate,
## data = state_averages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.9277 -2.2789 -0.1677 2.6976 8.0947
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 73.19874 26.73840 2.738 0.0115 *
## avg_state_leitura_5th -0.08928 0.14614 -0.611 0.5470
## avg_drop_out_rate 24.50087 13.92255 1.760 0.0912 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.362 on 24 degrees of freedom
## Multiple R-squared: 0.2832, Adjusted R-squared: 0.2235
## F-statistic: 4.742 on 2 and 24 DF, p-value: 0.01839
## # A tibble: 27 × 6
## Uf avg_state_gain avg_state_leitura_5th avg_drop_out_rate predicted_gain
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AC 62.9 168. 0.285 65.1
## 2 AL 69.6 150. 0.404 69.7
## 3 AM 61.1 164. 0.356 67.2
## 4 AP 62.7 154. 0.372 68.5
## 5 BA 65.3 160. 0.333 67.1
## 6 CE 72.0 165. 0.241 64.4
## 7 ES 68.9 172. 0.270 64.4
## 8 GO 65.9 173. 0.179 62.1
## 9 MA 67.5 150. 0.341 68.1
## 10 MG 63.0 179. 0.204 62.2
## # ℹ 17 more rows
## # ℹ 1 more variable: residuals <dbl>
## Correlation between 5th Grade and Dropout Rate: -0.54
## Correlation between Gain and Dropout Rate: 0.38
## R2 Without drop out 0.5942884
## R2 with drop out 0.5964551
##
## Call:
## lm(formula = avg_state_gain ~ avg_state_math_5th + avg_drop_out_rate,
## data = state_averages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.5810 -4.7036 -0.4465 3.3702 21.6183
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 146.15913 20.97747 6.967 4.22e-07 ***
## avg_state_math_5th -0.45612 0.08961 -5.090 3.73e-05 ***
## avg_drop_out_rate -8.39365 23.88546 -0.351 0.728
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.055 on 23 degrees of freedom
## Multiple R-squared: 0.5965, Adjusted R-squared: 0.5614
## F-statistic: 17 on 2 and 23 DF, p-value: 2.936e-05
## # A tibble: 26 × 6
## Uf avg_state_gain avg_state_math_5th avg_drop_out_rate predicted_gain
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AC 57.1 202. 0.222 52.1
## 2 AM 50.1 195. 0.197 55.5
## 3 AP 53.8 171. 0.302 65.4
## 4 BA 54.5 187. 0.235 58.9
## 5 CE 55.8 216. 0.0949 46.8
## 6 DF 37.2 223. 0.158 42.9
## 7 ES 51.0 220. 0.232 43.7
## 8 GO 50.9 210. 0.132 49.1
## 9 MA 57.3 184. 0.197 60.5
## 10 MG 42.1 233. 0.162 38.4
## # ℹ 16 more rows
## # ℹ 1 more variable: residuals <dbl>
## Correlation between 5th Grade and Dropout Rate: -0.52
## Correlation between Gain and Dropout Rate: 0.29
## R2 Without drop out 0.5113434
## R2 with drop out 0.5209891
##
## Call:
## lm(formula = avg_state_gain ~ avg_state_leitura_5th + avg_drop_out_rate,
## data = state_averages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.4470 -3.8158 0.2201 1.9868 15.8392
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 153.04449 20.71003 7.390 1.63e-07 ***
## avg_state_leitura_5th -0.45149 0.09846 -4.586 0.000131 ***
## avg_drop_out_rate -14.67337 21.56122 -0.681 0.502954
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.459 on 23 degrees of freedom
## Multiple R-squared: 0.521, Adjusted R-squared: 0.4793
## F-statistic: 12.51 on 2 and 23 DF, p-value: 0.0002108
## # A tibble: 26 × 6
## Uf avg_state_gain avg_state_leitura_5th avg_drop_out_rate predicted_gain
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AC 69.9 188. 0.222 64.8
## 2 AM 70.1 180. 0.197 68.9
## 3 AP 69.8 161. 0.302 76.0
## 4 BA 68.6 172. 0.235 71.9
## 5 CE 75.1 202. 0.0949 60.3
## 6 DF 50.7 205. 0.158 58.0
## 7 ES 61.5 200. 0.232 59.4
## 8 GO 65.8 193. 0.132 64.2
## 9 MA 72.6 171. 0.197 72.7
## 10 MG 57.4 210. 0.162 56.0
## # ℹ 16 more rows
## # ℹ 1 more variable: residuals <dbl>
## Correlation between 5th Grade and Dropout Rate: -0.66
## Correlation between Gain and Dropout Rate: 0.49
## R2 Without drop out 0.4377618
## R2 with drop out 0.4440434
##
## Call:
## lm(formula = avg_state_gain ~ avg_state_math_5th + avg_drop_out_rate,
## data = state_averages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.2643 -2.6634 -0.0973 3.7654 8.2769
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 89.55785 16.48836 5.432 1.61e-05 ***
## avg_state_math_5th -0.21318 0.07417 -2.874 0.00857 **
## avg_drop_out_rate 5.85401 11.48345 0.510 0.61506
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.348 on 23 degrees of freedom
## Multiple R-squared: 0.444, Adjusted R-squared: 0.3957
## F-statistic: 9.185 on 2 and 23 DF, p-value: 0.001169
## # A tibble: 26 × 6
## Uf avg_state_gain avg_state_math_5th avg_drop_out_rate predicted_gain
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AC 52.1 184. 0.297 52.0
## 2 AL 56.7 171. 0.410 55.5
## 3 AM 45.8 188. 0.355 51.7
## 4 AP 45.5 174. 0.385 54.7
## 5 BA 51.4 180. 0.334 53.1
## 6 CE 58.7 190. 0.242 50.4
## 7 ES 50.2 212. 0.263 45.9
## 8 GO 53.9 202. 0.162 47.4
## 9 MA 52.5 169. 0.344 55.5
## 10 MG 44.5 216. 0.201 44.6
## # ℹ 16 more rows
## # ℹ 1 more variable: residuals <dbl>
## Correlation between 5th Grade and Dropout Rate: -0.69
## Correlation between Gain and Dropout Rate: 0.63
## R2 Without drop out 0.3386471
## R2 with drop out 0.4425327
##
## Call:
## lm(formula = avg_state_gain ~ avg_state_leitura_5th + avg_drop_out_rate,
## data = state_averages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.3151 -1.9717 -0.3905 1.3437 8.5151
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 75.94963 15.57545 4.876 6.34e-05 ***
## avg_state_leitura_5th -0.09977 0.07810 -1.278 0.2142
## avg_drop_out_rate 20.47556 9.89018 2.070 0.0498 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.595 on 23 degrees of freedom
## Multiple R-squared: 0.4425, Adjusted R-squared: 0.3941
## F-statistic: 9.129 on 2 and 23 DF, p-value: 0.001206
## # A tibble: 26 × 6
## Uf avg_state_gain avg_state_leitura_5th avg_drop_out_rate predicted_gain
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AC 63.6 171. 0.297 65.0
## 2 AL 69.6 155. 0.410 68.9
## 3 AM 64.2 170. 0.355 66.3
## 4 AP 60.6 159. 0.385 68.0
## 5 BA 65.0 164. 0.334 66.5
## 6 CE 71.9 176. 0.242 63.4
## 7 ES 63.8 190. 0.263 62.4
## 8 GO 67.7 184. 0.162 60.9
## 9 MA 66.7 155. 0.344 67.6
## 10 MG 60.5 194. 0.201 60.7
## # ℹ 16 more rows
## # ℹ 1 more variable: residuals <dbl>
## Correlation between 5th Grade and Dropout Rate: -0.55
## Correlation between Gain and Dropout Rate: 0.55
## R2 Without drop out 0.497894
## R2 with drop out 0.5354845
##
## Call:
## lm(formula = avg_state_gain ~ avg_state_math_5th + avg_drop_out_rate,
## data = state_averages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.597 -3.627 1.481 3.270 11.057
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 107.39275 21.00817 5.112 3.54e-05 ***
## avg_state_math_5th -0.32052 0.09446 -3.393 0.0025 **
## avg_drop_out_rate 22.91661 16.79760 1.364 0.1857
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.07 on 23 degrees of freedom
## Multiple R-squared: 0.5355, Adjusted R-squared: 0.4951
## F-statistic: 13.26 on 2 and 23 DF, p-value: 0.0001481
## # A tibble: 26 × 6
## Uf avg_state_gain avg_state_math_5th avg_drop_out_rate predicted_gain
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AC 50.1 205. 0.222 46.8
## 2 AM 47.0 198. 0.200 48.6
## 3 AP 46.5 178. 0.302 57.1
## 4 BA 50.9 188. 0.242 52.6
## 5 CE 40.3 224. 0.0949 37.9
## 6 DF 32.5 220. 0.158 40.4
## 7 ES 48.7 211. 0.233 45.0
## 8 GO 48.7 209. 0.131 43.5
## 9 MA 52.5 186. 0.197 52.2
## 10 MG 40.2 224. 0.162 39.2
## # ℹ 16 more rows
## # ℹ 1 more variable: residuals <dbl>
## Correlation between 5th Grade and Dropout Rate: -0.53
## Correlation between Gain and Dropout Rate: 0.56
## R2 Without drop out 0.3834476
## R2 with drop out 0.457949
##
## Call:
## lm(formula = avg_state_gain ~ avg_state_leitura_5th + avg_drop_out_rate,
## data = state_averages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.4113 -2.6414 0.0139 3.0658 11.7791
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 102.6578 20.4412 5.022 4.42e-05 ***
## avg_state_leitura_5th -0.2527 0.1015 -2.491 0.0204 *
## avg_drop_out_rate 27.5465 15.4932 1.778 0.0886 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.761 on 23 degrees of freedom
## Multiple R-squared: 0.4579, Adjusted R-squared: 0.4108
## F-statistic: 9.716 on 2 and 23 DF, p-value: 0.0008739
## # A tibble: 26 × 6
## Uf avg_state_gain avg_state_leitura_5th avg_drop_out_rate predicted_gain
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AC 64.0 190. 0.222 60.7
## 2 AM 64.6 184. 0.200 61.8
## 3 AP 60.9 165. 0.302 69.3
## 4 BA 63.8 174. 0.242 65.4
## 5 CE 57.4 209. 0.0949 52.3
## 6 DF 46.7 201. 0.158 56.2
## 7 ES 60.0 192. 0.233 60.5
## 8 GO 63.3 191. 0.131 58.1
## 9 MA 65.9 173. 0.197 64.3
## 10 MG 55.5 201. 0.162 56.2
## # ℹ 16 more rows
## # ℹ 1 more variable: residuals <dbl>
## Correlation between 5th Grade and Dropout Rate: -0.63
## Correlation between Gain and Dropout Rate: 0.54
## R2 Without drop out 0.3191999
## R2 with drop out 0.3741141
##
## Call:
## lm(formula = avg_state_gain ~ avg_state_math_5th + avg_drop_out_rate,
## data = state_averages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.1837 -2.9419 0.3022 3.2059 7.9808
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 75.34155 19.35466 3.893 0.000734 ***
## avg_state_math_5th -0.16075 0.09105 -1.766 0.090747 .
## avg_drop_out_rate 15.27568 10.75330 1.421 0.168861
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.195 on 23 degrees of freedom
## Multiple R-squared: 0.3741, Adjusted R-squared: 0.3197
## F-statistic: 6.874 on 2 and 23 DF, p-value: 0.004568
## # A tibble: 26 × 6
## Uf avg_state_gain avg_state_math_5th avg_drop_out_rate predicted_gain
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AC 53.2 187. 0.297 49.8
## 2 AL 56.8 172. 0.410 53.9
## 3 AM 44.1 192. 0.355 49.9
## 4 AP 45.0 175. 0.389 53.2
## 5 BA 49.8 182. 0.334 51.1
## 6 CE 56.2 192. 0.241 48.2
## 7 ES 47.5 205. 0.263 46.4
## 8 GO 48.4 202. 0.162 45.4
## 9 MA 49.6 173. 0.344 52.8
## 10 MG 42.5 212. 0.202 44.4
## # ℹ 16 more rows
## # ℹ 1 more variable: residuals <dbl>
## Correlation between 5th Grade and Dropout Rate: -0.66
## Correlation between Gain and Dropout Rate: 0.62
## R2 Without drop out 0.249493
## R2 with drop out 0.3942836
##
## Call:
## lm(formula = avg_state_gain ~ avg_state_leitura_5th + avg_drop_out_rate,
## data = state_averages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2918 -1.4494 -0.1022 1.9905 7.2898
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 68.33387 17.84737 3.829 0.00086 ***
## avg_state_leitura_5th -0.06905 0.09295 -0.743 0.46506
## avg_drop_out_rate 21.54537 9.18871 2.345 0.02804 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.449 on 23 degrees of freedom
## Multiple R-squared: 0.3943, Adjusted R-squared: 0.3416
## F-statistic: 7.486 on 2 and 23 DF, p-value: 0.003134
## # A tibble: 26 × 6
## Uf avg_state_gain avg_state_leitura_5th avg_drop_out_rate predicted_gain
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AC 63.7 173. 0.297 62.8
## 2 AL 68.5 156. 0.410 66.4
## 3 AM 61.4 174. 0.355 64.0
## 4 AP 59.4 159. 0.389 65.7
## 5 BA 63.2 166. 0.334 64.1
## 6 CE 68.6 177. 0.241 61.3
## 7 ES 61.6 184. 0.263 61.3
## 8 GO 61.8 184. 0.162 59.1
## 9 MA 64.0 158. 0.344 64.9
## 10 MG 58.1 190. 0.202 59.6
## # ℹ 16 more rows
## # ℹ 1 more variable: residuals <dbl>