Data
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
# Explore the dataset
head(df)
## district municipality expreg expspecial expbil expocc exptot scratio special
## 1 1 Abington 4201 7375.69 0 0 4646 16.6 14.6
## 2 2 Acton 4129 8573.99 0 0 4930 5.7 17.4
## 3 3 Acushnet 3627 8081.72 0 0 4281 7.5 12.1
## 4 5 Agawam 4015 8181.37 0 0 4826 8.6 21.1
## 5 7 Amesbury 4273 7037.22 0 0 4824 6.1 16.8
## 6 8 Amherst 5183 10595.80 6235 0 6454 7.7 17.2
## lunch stratio income score4 score8 salary english
## 1 11.8 19.0 16.379 714 691 34.3600 0.0000000
## 2 2.5 22.6 25.792 731 NA 38.0630 1.2461059
## 3 14.1 19.3 14.040 704 693 32.4910 0.0000000
## 4 12.1 17.9 16.111 704 691 33.1060 0.3225806
## 5 17.4 17.5 15.423 701 699 34.4365 0.0000000
## 6 26.8 15.7 11.144 714 NA NA 3.9215686
str(df)
## 'data.frame': 220 obs. of 16 variables:
## $ district : chr "1" "2" "3" "5" ...
## $ municipality: chr "Abington" "Acton" "Acushnet" "Agawam" ...
## $ expreg : int 4201 4129 3627 4015 4273 5183 4685 5518 5009 3823 ...
## $ expspecial : num 7376 8574 8082 8181 7037 ...
## $ expbil : int 0 0 0 0 0 6235 0 0 0 12943 ...
## $ expocc : int 0 0 0 0 0 0 0 0 0 11519 ...
## $ exptot : int 4646 4930 4281 4826 4824 6454 5537 6405 5649 4814 ...
## $ scratio : num 16.6 5.7 7.5 8.6 6.1 ...
## $ special : num 14.6 17.4 12.1 21.1 16.8 ...
## $ lunch : num 11.8 2.5 14.1 12.1 17.4 ...
## $ stratio : num 19 22.6 19.3 17.9 17.5 ...
## $ income : num 16.4 25.8 14 16.1 15.4 ...
## $ score4 : num 714 731 704 704 701 714 725 717 702 701 ...
## $ score8 : num 691 NA 693 691 699 NA 728 715 705 688 ...
## $ salary : num 34.4 38.1 32.5 33.1 34.4 ...
## $ english : num 0 1.246 0 0.323 0 ...
stargazer(df, type = "text", title = "Summary of Dataset", summary = TRUE)
##
## Summary of Dataset
## ========================================================
## Statistic N Mean St. Dev. Min Max
## --------------------------------------------------------
## expreg 220 4,605.464 880.252 2,905 8,759
## expspecial 220 8,900.727 3,511.696 3,832.230 53,569.240
## expbil 220 3,037.309 20,259.260 0 295,140
## expocc 220 1,104.209 2,732.449 0 15,088
## exptot 220 5,370.250 977.040 3,465 9,868
## scratio 211 8.107 2.836 2.300 18.400
## special 220 15.968 3.538 8.100 34.300
## lunch 220 15.316 15.060 0.400 76.200
## stratio 220 17.344 2.277 11.400 27.000
## income 220 18.747 5.808 9.686 46.855
## score4 220 709.827 15.126 658 740
## score8 180 698.411 21.053 641 747
## salary 195 35.993 3.191 24.965 44.494
## english 220 1.118 2.901 0.000 24.494
## --------------------------------------------------------
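# Visualize missing values in the dataset
library(visdat)  # provides vis_miss() (naniar also re-exports it)
options(repr.plot.width = 6, repr.plot.height = 3)  # plot size; only takes effect in Jupyter (IRkernel)
vis_miss(df)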

# Remove rows with any missing (NA) value
df <- na.omit(df)
stargazer(df, type = "text", title = "Summary of Cleaned Dataset", summary = TRUE)
##
## Summary of Cleaned Dataset
## ========================================================
## Statistic N Mean St. Dev. Min Max
## --------------------------------------------------------
## expreg 155 4,637.394 768.997 3,023 7,944
## expspecial 155 8,592.224 1,709.364 3,832.230 15,740.580
## expbil 155 4,118.619 24,050.970 0 295,140
## expocc 155 1,318.368 2,905.513 0 15,088
## exptot 155 5,389.645 861.020 3,868 8,623
## scratio 155 8.174 2.656 2.600 16.600
## special 155 15.957 3.266 10.400 26.000
## lunch 155 16.706 16.605 0.500 76.200
## stratio 155 17.370 2.178 11.400 27.000
## income 155 18.350 5.378 9.686 46.855
## score4 155 708.335 16.078 658 740
## score8 155 697.626 21.471 641 747
## salary 155 36.169 3.227 24.965 44.494
## english 155 1.346 3.257 0.000 24.494
## --------------------------------------------------------
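Note that `na.omit(df)` drops every row that has an NA in any column, not only in the variables used later. The two summaries show the cost: score8 has 180 non-missing values, yet only 155 rows remain, because rows missing scratio or salary are also dropped even though the regression below does not use those variables. The sketch below restricts deletion to the model variables instead. It uses a small hypothetical data frame (`toy`, with invented values) in place of `df` so that it runs on its own.

```r
# Hypothetical toy data standing in for df: score8 is needed by the model,
# salary is not, but both contain missing values
toy <- data.frame(
  score8 = c(691, NA, 693, 691, 699, 728),
  income = c(16.4, 25.8, 14.0, 16.1, 15.4, 11.1),
  salary = c(34.4, 38.1, NA, 33.1, NA, 36.0)
)

# Count missing values per column (a text alternative to vis_miss())
colSums(is.na(toy))

# na.omit() drops any row with an NA in ANY column
all_complete <- na.omit(toy)

# Drop only rows that are incomplete in the variables the model uses
model_vars <- c("score8", "income")
model_complete <- toy[complete.cases(toy[, model_vars]), model_vars]

nrow(all_complete)    # rows 2, 3 and 5 removed -> 3 rows
nrow(model_complete)  # only row 2 removed -> 5 rows
```

By default `lm()` already drops rows that are missing in any variable of its formula (its `na.action` defaults to `na.omit`), so fitting the model on the uncleaned `df` would be expected to use all 180 rows with an observed score8.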
Main Regression Model
Specification
Specify your linear regression
\(\text{8th-grade score}_i = \beta_0 + \beta_1 \cdot \text{Income per capita}_i + \beta_2 \cdot \text{Student-teacher ratio}_i + \beta_3 \cdot \text{Total expenditure per pupil}_i + \epsilon_i\)
where \(i\) indexes school districts and \(\epsilon_i\) is the error term.
The goal is to predict the 8th-grade score (score8) from income per
capita, the student-teacher ratio, and total expenditure per pupil. The
hypothesis is that districts with higher income and higher expenditure
per pupil, and with a lower student-teacher ratio, have better
resources, which may lead to higher test scores. In terms of the
specification, this implies \(\beta_1 > 0\), \(\beta_2 < 0\), and
\(\beta_3 > 0\).
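The coefficients in this specification are estimated by ordinary least squares: \(\hat\beta\) minimizes the sum of squared residuals and solves the normal equations \((X^\top X)\hat\beta = X^\top y\), where \(X\) holds an intercept column plus the three predictors. The sketch below uses simulated data; the "true" coefficients and noise level are invented purely for illustration and are not taken from the dataset. It checks that solving the normal equations directly reproduces `coef(lm(...))`. `lm()` itself uses a QR decomposition, which is numerically more stable, but it reaches the same solution.

```r
set.seed(42)
n <- 150
# Simulated predictors on roughly the scales seen in the summary table
income  <- runif(n, 10, 45)
stratio <- runif(n, 11, 27)
exptot  <- runif(n, 3800, 8600)
# Hypothetical "true" coefficients, chosen only for illustration
score8 <- 700 + 3 * income - 2 * stratio - 0.005 * exptot + rnorm(n, sd = 12)
sim <- data.frame(score8, income, stratio, exptot)

# Design matrix with an intercept column, matching the specification
X <- model.matrix(~ income + stratio + exptot, data = sim)
y <- sim$score8

# Solve the normal equations (X'X) b = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Compare with lm(), which reaches the same solution via a QR decomposition
fit <- lm(score8 ~ income + stratio + exptot, data = sim)
cbind(normal_equations = drop(beta_hat), lm = coef(fit))
```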
Estimation
Estimate your model.
# Keep only the outcome and the three predictors used in the model
subset_df <- df[, c("income", "stratio", "score8", "exptot")]
# Fit the linear regression model
model <- lm(score8 ~ income + stratio + exptot, data = subset_df)
# Display the summary of the regression model
summary(model)
##
## Call:
## lm(formula = score8 ~ income + stratio + exptot, data = subset_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.703 -7.634 -0.295 6.848 47.139
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 714.023990 12.519185 57.034 < 2e-16 ***
## income 3.486270 0.211337 16.496 < 2e-16 ***
## stratio -2.207935 0.484608 -4.556 1.07e-05 ***
## exptot -0.007796 0.001369 -5.696 6.23e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.12 on 151 degrees of freedom
## Multiple R-squared: 0.6875, Adjusted R-squared: 0.6813
## F-statistic: 110.7 on 3 and 151 DF, p-value: < 2.2e-16
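As a quick arithmetic check on these estimates: with an intercept, the OLS fitted plane passes through the point of sample means. Plugging the cleaned-sample means of income, stratio and exptot (from the summary table above) into the reported coefficients should therefore return the mean of score8, 697.626, up to rounding.

```r
# Coefficients as reported by summary(model)
b <- c(intercept = 714.023990, income = 3.486270,
       stratio = -2.207935, exptot = -0.007796)
# Cleaned-sample means from the stargazer summary (N = 155)
means <- c(income = 18.350, stratio = 17.370, exptot = 5389.645)

pred_at_means <- b[["intercept"]] + sum(b[names(means)] * means)
round(pred_at_means, 2)  # about 697.63, vs. mean score8 = 697.626
```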
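Income (3.49): Income appears to be recorded in thousands of dollars
per capita (values range from about 9.7 to 46.9), so each additional
$1,000 of per-capita income is associated with a predicted score8 that
is about 3.49 points higher, holding the other variables constant. This
is consistent with the hypothesis: wealthier districts are likely to
have better resources and facilities, which may contribute to higher
test scores.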
Student-teacher ratio (stratio) (-2.21): Each additional student per
teacher is associated with a predicted score8 that is about 2.21 points
lower, holding the other variables constant. This is consistent with the
hypothesis, since a higher student-teacher ratio usually means less
individual attention for each student, which can lead to lower test
scores.
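Total expenditures per pupil (exptot) (-0.0078):
exptot appears to be measured in dollars per pupil, so each additional
dollar of spending is associated with a predicted score8 that is about
0.0078 points lower (about 7.8 points per additional $1,000), holding
the other variables constant. This result is counterintuitive, because
I would expect higher spending to lead to better test scores. One
possible explanation is omitted-variable bias: districts that spend more
per pupil may also serve more students with greater needs (for example,
in special-education or bilingual programs), and those characteristics
are not controlled for in this model.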
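Because the three predictors are on very different scales (thousands of dollars, students per teacher, dollars per pupil), the raw coefficients are hard to compare directly. A rough sketch is to multiply each reported coefficient by that predictor's standard deviation in the cleaned sample (from the summary table above). This gives the predicted change in score8 for a one-standard-deviation increase, holding the other predictors fixed.

```r
# Reported slope coefficients and cleaned-sample standard deviations (N = 155)
b   <- c(income = 3.486270, stratio = -2.207935, exptot = -0.007796)
sds <- c(income = 5.378,    stratio = 2.178,     exptot = 861.020)

# Predicted change in score8 for a one-SD increase in each predictor
one_sd_effect <- b * sds
round(one_sd_effect, 2)
# income ~ 18.75, stratio ~ -4.81, exptot ~ -6.71

# Predicted change for an extra $1,000 of total spending per pupil
b[["exptot"]] * 1000  # -7.796
```

On this scale income has by far the largest association with score8, and the negative spending coefficient is comparable in size to the student-teacher ratio effect.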
# Check multicollinearity with variance inflation factors (vif() from car, loaded with AER)
vif(model)
## income stratio exptot
## 1.353946 1.167968 1.455779
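All three VIFs are close to 1 (the largest, for exptot, is about 1.46). That is well below the rules of thumb of 5 or 10 commonly used to flag problematic multicollinearity, so correlation among the predictors does not appear to be inflating these standard errors much. For predictor \(j\), \(\text{VIF}_j = 1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the other predictors. The sketch below computes this by hand on simulated data, invented for illustration, with exptot made to depend partly on income. It also checks the equivalent shortcut, the diagonal of the inverse correlation matrix of the predictors. For a model with only numeric, one-column predictors like this one, these values should match what `car::vif()` reports.

```r
set.seed(1)
n <- 150
# Simulated predictors; exptot is made to depend partly on income
income  <- runif(n, 10, 45)
stratio <- runif(n, 11, 27)
exptot  <- 4000 + 60 * income + rnorm(n, sd = 700)
X <- data.frame(income, stratio, exptot)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others
vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_from_cor <- diag(solve(cor(X)))

round(rbind(vif_by_hand, vif_from_cor), 3)
```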