The data are from all 420 K-6 and K-8 districts in California with data available for 1998 and 1999. The data set contains information on test performance, school characteristics and student demographic backgrounds for school districts.
Test scores are on the Stanford 9 standardized test administered to 5th grade students. School characteristics (averaged across the district) include enrollment, number of teachers (measured as “full-time equivalents”), number of computers per classroom, and expenditures per student. Demographic variables for the students are averaged across the district.
A data frame with 420 observations on the following 14 variables.
library(dplyr)    # glimpse(), mutate(), add_count()
library(ggplot2)  # ggplot() and friends
data(CASchools, package="AER")
dta <- CASchools
# have a look
glimpse(dta)
Rows: 420
Columns: 14
$ district <chr> "75119", "61499", "61549", "61457", "61523", "62042", "685…
$ school <chr> "Sunol Glen Unified", "Manzanita Elementary", "Thermalito …
$ county <fct> Alameda, Butte, Butte, Butte, Butte, Fresno, San Joaquin, …
$ grades <fct> KK-08, KK-08, KK-08, KK-08, KK-08, KK-08, KK-08, KK-08, KK…
$ students <dbl> 195, 240, 1550, 243, 1335, 137, 195, 888, 379, 2247, 446, …
$ teachers <dbl> 10.90, 11.15, 82.90, 14.00, 71.50, 6.40, 10.00, 42.50, 19.…
$ calworks <dbl> 0.5102, 15.4167, 55.0323, 36.4754, 33.1086, 12.3188, 12.90…
$ lunch <dbl> 2.0408, 47.9167, 76.3226, 77.0492, 78.4270, 86.9565, 94.62…
$ computer <dbl> 67, 101, 169, 85, 171, 25, 28, 66, 35, 0, 86, 56, 25, 0, 3…
$ expenditure <dbl> 6384.91, 5099.38, 5501.95, 7101.83, 5235.99, 5580.15, 5253…
$ income <dbl> 22.69000, 9.82400, 8.97800, 8.97800, 9.08033, 10.41500, 6.…
$ english <dbl> 0.00000, 4.58333, 30.00000, 0.00000, 13.85768, 12.40876, 6…
$ read <dbl> 691.6, 660.5, 636.3, 651.9, 641.8, 605.7, 604.5, 605.5, 60…
$ math <dbl> 690.0, 661.9, 650.9, 643.5, 639.9, 605.4, 609.0, 612.5, 61…
We compute the student-teacher ratio and a score that is the average of the math and reading scores. The number of districts in each county (each row is one district) is counted and added to the data frame as the column n.
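To make the three derived columns concrete, here is a minimal base-R sketch on a made-up data frame. The data frame `toy` and all of its values are hypothetical and are not taken from CASchools; `ave()` stands in for the per-county counting step.

```r
# Toy data: hypothetical values, NOT taken from CASchools
toy <- data.frame(
  county   = c("A", "A", "B", "A", "B"),
  students = c(200, 300, 150, 400, 250),
  teachers = c(10, 15, 10, 16, 10),
  math     = c(650, 660, 640, 670, 655),
  read     = c(655, 650, 645, 665, 660),
  stringsAsFactors = FALSE
)

# Student-teacher ratio and average of the two test scores
toy$stratio <- toy$students / toy$teachers
toy$score   <- (toy$math + toy$read) / 2

# Number of rows per county, repeated on every row of that county
toy$n <- ave(seq_len(nrow(toy)), toy$county, FUN = length)

print(toy)
```

Every row of county A gets n = 3 and every row of county B gets n = 2, so a later filter such as n > 10 keeps or drops whole counties at once.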
dta <- dta |> mutate(stratio = students/teachers,
                     score = (math+read)/2) |>
  add_count(county)
# pooled OLS regression line for counties with more than 10 districts
ggplot(subset(dta, n > 10), aes(x=stratio, y=score, group=county)) +
  geom_point(alpha=0.5) +
  stat_smooth(aes(group=1),
              method="lm", formula=y ~ x, se=FALSE) +
  labs(x="Student-teacher ratio",
       y="Score (average of math and reading)") +
  theme_minimal()
# the same plot with lunch as the predictor, points coloured by county
ggplot(subset(dta, n > 10), aes(x=lunch, y=score, group=county, color=county)) +
  geom_point(alpha=0.5) +
  stat_smooth(aes(group=1),
              method="lm", formula=y ~ x, se=TRUE) +
  labs(x="lunch",
       y="Score (average of math and reading)") +
  theme_minimal()
- Replace stratio with lunch in the first plot above.
summary(lm(score ~ stratio, data=subset(dta, n > 10)))
Call:
lm(formula = score ~ stratio, data = subset(dta, n > 10))
Residuals:
Min 1Q Median 3Q Max
-44.44 -14.23 0.42 13.92 47.84
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 709.472 13.324 53.2 < 2e-16
stratio -2.830 0.673 -4.2 3.7e-05
Residual standard error: 19.5 on 249 degrees of freedom
Multiple R-squared: 0.0663, Adjusted R-squared: 0.0625
F-statistic: 17.7 on 1 and 249 DF, p-value: 3.66e-05
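As a check on how to read this table, the t value and a 95% confidence interval for the slope can be rebuilt from the printed estimate and standard error. This is a sketch using the rounded numbers shown above, so the results agree with the printout only up to rounding; the variable names are illustrative.

```r
# Numbers copied from the summary() output above (rounded as printed)
estimate <- -2.830   # slope of stratio
std_err  <- 0.673    # its standard error
df_resid <- 249      # residual degrees of freedom

# t value = estimate / standard error
t_value <- estimate / std_err

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# 95% confidence interval for the slope
ci <- estimate + c(-1, 1) * qt(0.975, df = df_resid) * std_err

print(round(t_value, 2))
print(signif(p_value, 2))
print(round(ci, 2))
```

The interval runs from roughly -4.2 to -1.5 and excludes zero, in line with the small p-value in the table.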
nlme::lmList(score ~ stratio | county, data=subset(dta, n > 10))
Call:
Model: score ~ stratio | county
Data: subset(dta, n > 10)
Coefficients:
(Intercept) stratio
Fresno 663.794 -1.460448
Humboldt 573.106 4.652889
Kern 590.222 2.458576
Los Angeles 758.799 -5.315277
Merced 672.068 -1.795721
Orange 662.864 -0.403029
Placer 655.978 0.491432
San Diego 737.238 -3.877054
San Mateo 813.695 -7.901506
Santa Barbara 704.248 -1.903107
Santa Clara 822.429 -7.921588
Shasta 649.677 0.478515
Sonoma 673.143 -0.465782
Tulare 631.051 0.371337
Degrees of freedom: 251 total; 223 residual
Residual standard error: 15.9413
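The coefficient table above is what one would get by fitting a separate lm() within each county. The base-R sketch below shows the idea on simulated data; the groups, the true slopes and the object names `d` and `per_group` are all made up for illustration and have nothing to do with CASchools.

```r
set.seed(1)

# Simulated data: three hypothetical groups with different true slopes
d <- data.frame(
  g = rep(c("a", "b", "c"), each = 30),
  x = runif(90, 15, 25),
  stringsAsFactors = FALSE
)
true_slope <- c(a = -2, b = 0, c = 1)
d$y <- 650 + unname(true_slope[d$g]) * d$x + rnorm(90, sd = 5)

# One lm() per group: split the rows by group, fit, collect the coefficients
per_group <- t(sapply(split(d, d$g), function(s) coef(lm(y ~ x, data = s))))
print(per_group)

# A single regression that ignores the grouping
pooled <- coef(lm(y ~ x, data = d))
print(pooled)

# Residual degrees of freedom of the per-group fits:
# observations minus the number of estimated coefficients
df_resid <- nrow(d) - length(per_group)
print(df_resid)
```

The same accounting gives the 223 residual degrees of freedom reported above: 251 observations minus 2 coefficients for each of the 14 counties.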
Regress score on the student-teacher ratio separately for each county. Compare the average of the estimated slope coefficients with the slope estimate from the overall regression that ignores county clusters.
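One way to make this comparison is sketched below, using the slope values copied from the lmList() output and the pooled slope from the summary() output above. The average is a simple unweighted mean over counties, which is an assumption of this sketch; a weighted average would give a different number.

```r
# Per-county slopes copied from the lmList() output above
slopes <- c(
  "Fresno" = -1.460448, "Humboldt" = 4.652889, "Kern" = 2.458576,
  "Los Angeles" = -5.315277, "Merced" = -1.795721, "Orange" = -0.403029,
  "Placer" = 0.491432, "San Diego" = -3.877054, "San Mateo" = -7.901506,
  "Santa Barbara" = -1.903107, "Santa Clara" = -7.921588,
  "Shasta" = 0.478515, "Sonoma" = -0.465782, "Tulare" = 0.371337
)

# Slope from the overall regression, as printed by summary() above
pooled_slope <- -2.830

# Unweighted average of the 14 county slopes
mean_slope <- mean(slopes)
print(round(mean_slope, 3))

# How much the county slopes vary, and how many are negative
print(round(range(slopes), 3))
n_negative <- sum(slopes < 0)
print(n_negative)

# Difference between the pooled slope and the average county slope
print(round(pooled_slope - mean_slope, 3))
```

The average county slope is about -1.61, smaller in magnitude than the pooled estimate of -2.830, and 9 of the 14 county slopes are negative. With a fitted lmList object, the slopes could presumably be pulled out with coef() instead of being typed in by hand.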
Stock, J.H. & Watson, M.W. (2007). Introduction to Econometrics. 2nd Ed. Boston: Addison Wesley.