Q1
# Student-teacher ratio; assumes CASchools (AER package) and dplyr are already loaded
data <- mutate(CASchools, stu_teach = students / teachers)
Q2
# Average of the reading and math scores
data <- mutate(data, avg_score = (read + math) / 2)
Q3
lg1 <- lm(avg_score ~ stu_teach, data = data)
summary(lg1)
##
## Call:
## lm(formula = avg_score ~ stu_teach, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -47.727 -14.251   0.483  12.822  48.540
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 698.9329     9.4675  73.825  < 2e-16 ***
## stu_teach    -2.2798     0.4798  -4.751 2.78e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.58 on 418 degrees of freedom
## Multiple R-squared: 0.05124, Adjusted R-squared: 0.04897
## F-statistic: 22.58 on 1 and 418 DF, p-value: 2.783e-06
Ans: The coefficient indicates that when the student-teacher ratio increases by one unit, the predicted average score falls by approximately 2.28 points. The two variables are therefore negatively associated, and the coefficient is statistically significant (p = 2.78e-06).
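As a rough illustration of this interpretation, the sketch below plugs the Q3 estimates into the fitted line. The helper `predict_avg_score` and the two ratios (19 and 20) are chosen only for the example and are not part of the assignment.

```r
# Coefficients copied from the Q3 summary output
b0 <- 698.9329   # intercept
b1 <- -2.2798    # slope on stu_teach

# Hypothetical helper: fitted average score at a given student-teacher ratio
predict_avg_score <- function(ratio) b0 + b1 * ratio

# Two illustrative ratios (picked for the example, not taken from the data)
score_19 <- predict_avg_score(19)
score_20 <- predict_avg_score(20)

# The gap between the two fitted scores equals the slope
score_gap <- score_20 - score_19
print(score_gap)
```

Any two ratios that differ by one unit give the same gap, which is what the slope of a linear model means.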
Q4
ggplot(data = data, aes(x = stu_teach, y = avg_score)) +
  geom_point() +
  stat_smooth(method = "lm")
Q5
lg2 <- lm(avg_score ~ stu_teach + english + expenditure, data = data)
summary(lg2)
##
## Call:
## lm(formula = avg_score ~ stu_teach + english + expenditure, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -51.340 -10.111   0.293  10.318  43.181
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 649.577947  15.205717  42.719  < 2e-16 ***
## stu_teach    -0.286399   0.480523  -0.596  0.55149
## english      -0.656023   0.039106 -16.776  < 2e-16 ***
## expenditure   0.003868   0.001412   2.739  0.00643 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.35 on 416 degrees of freedom
## Multiple R-squared: 0.4366, Adjusted R-squared: 0.4325
## F-statistic: 107.5 on 3 and 416 DF, p-value: < 2.2e-16
Ans: The average score is negatively associated with both the student-teacher ratio and the percentage of English learners. Holding the other regressors fixed, a one-unit increase in ‘stu_teach’ lowers the predicted avg_score by 0.286, and a one-percentage-point increase in English learners lowers it by 0.656. The association with the percentage of English learners is much stronger and is highly significant (p < 2e-16). Expenditure per student is positively associated with the avg_score and is significant at the 1% level (p = 0.00643), but the coefficient is small: about 0.0039 points per additional unit of expenditure. In this model the variable ‘stu_teach’ is not statistically significant (p = 0.551).
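To make the significance statement concrete, the following sketch recomputes the t value and two-sided p-value for ‘stu_teach’ from the estimate, standard error, and residual degrees of freedom printed in the Q5 output. The variable names are chosen for the example.

```r
# Values copied from the Q5 summary output for stu_teach
est <- -0.286399   # coefficient estimate
se  <- 0.480523    # standard error
df  <- 416         # residual degrees of freedom

# t statistic: estimate divided by its standard error
t_stat <- est / se

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df)

print(c(t = t_stat, p = p_val))
```

Because the p-value is far above 0.05, the data do not rule out a zero effect of the student-teacher ratio once ‘english’ and ‘expenditure’ are controlled for.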
Q6
Ans: The student-teacher ratio coefficient is smaller in magnitude in Q5 (-0.286) than in Q3 (-2.280). This is likely because Q5 is a multivariate regression that controls for the percentage of English learners and expenditure, both of which help explain the average score. Once they are held fixed, the estimated effect of the student-teacher ratio shrinks. In Q3, ‘english’ and ‘expenditure’ were omitted variables. If they are correlated with ‘stu_teach’, part of their effect is attributed to ‘stu_teach’, which biases its coefficient away from zero and overstates its effect on the average score. We can check this by regressing ‘stu_teach’ on ‘english’ and ‘expenditure’:
check <- lm(stu_teach ~ english + expenditure, data = data)
summary(check)
##
## Call:
## lm(formula = stu_teach ~ english + expenditure, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.6152 -0.8798  0.0000  0.8414  4.3801
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.071055   0.612073  47.496  < 2e-16 ***
## english      0.014909   0.003918   3.806 0.000163 ***
## expenditure -0.001819   0.000113 -16.100  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.463 on 417 degrees of freedom
## Multiple R-squared: 0.405, Adjusted R-squared: 0.4022
## F-statistic: 141.9 on 2 and 417 DF, p-value: < 2.2e-16
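Both ‘english’ and ‘expenditure’ are significantly related to ‘stu_teach’ in the output above, which is the condition under which omitting them biases the Q3 slope. The sketch below illustrates the mechanism on simulated data (not the CASchools data): `x` stands in for the student-teacher ratio and `z` for a single omitted control, and all numbers in the data-generating step are assumptions made for the example. It checks the standard identity that the short-regression slope equals the long-regression slope plus the omitted variable's coefficient times the slope from regressing `z` on `x`.

```r
set.seed(1)
n <- 500

# Simulated data (illustrative only): z is correlated with x and also affects y
x <- rnorm(n)
z <- 0.5 * x + rnorm(n)
y <- 1 - 0.3 * x - 0.7 * z + rnorm(n)

long  <- lm(y ~ x + z)   # regression with the control included
short <- lm(y ~ x)       # regression with the control omitted
aux   <- lm(z ~ x)       # auxiliary regression of the omitted variable on x

long_slope  <- unname(coef(long)["x"])
short_slope <- unname(coef(short)["x"])

# Omitted variable bias identity: short = long + (coef on z) * (slope of z on x)
implied_slope <- long_slope + unname(coef(long)["z"]) * unname(coef(aux)["x"])

print(c(long = long_slope, short = short_slope, implied = implied_slope))
```

In this simulation the short-regression slope is further from zero than the long-regression slope, which mirrors the move from Q3 to Q5.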
Q7
model1 <- lm(math ~ computer + expenditure + income + calworks + lunch, data = data)
summary(model1)
##
## Call:
## lm(formula = math ~ computer + expenditure + income + calworks +
## lunch, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -32.921  -6.803   0.237   6.151  33.410
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  6.577e+02  4.429e+00 148.504  < 2e-16 ***
## computer    -4.891e-04  1.154e-03  -0.424    0.672
## expenditure  1.261e-03  8.662e-04   1.455    0.146
## income       6.146e-01  1.041e-01   5.901 7.51e-09 ***
## calworks    -4.421e-02  6.515e-02  -0.679    0.498
## lunch       -4.409e-01  3.214e-02 -13.717  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.09 on 414 degrees of freedom
## Multiple R-squared: 0.7141, Adjusted R-squared: 0.7106
## F-statistic: 206.8 on 5 and 414 DF, p-value: < 2.2e-16
model2 <- lm(read ~ computer + expenditure + income + calworks + lunch, data = data)
summary(model2)
##
## Call:
## lm(formula = read ~ computer + expenditure + income + calworks +
## lunch, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -32.533  -5.312   0.005   5.109  36.070
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  6.562e+02  3.864e+00 169.823  < 2e-16 ***
## computer    -3.073e-03  1.007e-03  -3.052  0.00242 **
## expenditure  3.656e-03  7.557e-04   4.838 1.85e-06 ***
## income       3.899e-01  9.086e-02   4.291 2.21e-05 ***
## calworks     1.037e-01  5.684e-02   1.825  0.06874 .
## lunch       -6.045e-01  2.804e-02 -21.556  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.802 on 414 degrees of freedom
## Multiple R-squared: 0.8107, Adjusted R-squared: 0.8084
## F-statistic: 354.6 on 5 and 414 DF, p-value: < 2.2e-16
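Ans: This assumption is not valid, because the regressors do not affect the two scores in the same way. ‘computer’ and ‘expenditure’ are statistically significant for reading (p = 0.00242 and p = 1.85e-06) but not for math (p = 0.672 and p = 0.146), and ‘calworks’ is marginally significant for reading (p = 0.069) but not for math (p = 0.498). ‘income’ and ‘lunch’ are highly significant in both models, although their coefficients differ in size.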
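One way to see the comparison at a glance is to tabulate the p-values from the two summaries and flag which are below 0.05. The sketch below hard-codes the values printed above. The column names are chosen for the example, and the ‘lunch’ entries use 2e-16 as a stand-in for the reported "< 2e-16".

```r
# p-values copied from the model1 (math) and model2 (read) summary outputs
pvals <- data.frame(
  variable = c("computer", "expenditure", "income", "calworks", "lunch"),
  math     = c(0.672, 0.146, 7.51e-09, 0.498, 2e-16),      # 2e-16 stands in for "< 2e-16"
  read     = c(0.00242, 1.85e-06, 2.21e-05, 0.06874, 2e-16)
)

# Flag significance at the 5% level in each model
pvals$sig_math <- pvals$math < 0.05
pvals$sig_read <- pvals$read < 0.05

# Variables that matter for reading but not for math at the 5% level
read_only <- pvals$variable[pvals$sig_read & !pvals$sig_math]

print(pvals)
print(read_only)
```

With the fitted objects available, the same table could be built from `coef(summary(model1))` and `coef(summary(model2))` instead of typed values.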