library(wooldridge)
library(tidyverse)
library(ggplot2)
perf.lm <- lm(math10 ~ lnchprg, data = meap93)
summary(perf.lm)
##
## Call:
## lm(formula = math10 ~ lnchprg, data = meap93)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.386 -5.979 -1.207 4.865 45.845
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.14271 0.99758 32.221 <2e-16 ***
## lnchprg -0.31886 0.03484 -9.152 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.566 on 406 degrees of freedom
## Multiple R-squared: 0.171, Adjusted R-squared: 0.169
## F-statistic: 83.77 on 1 and 406 DF, p-value: < 2.2e-16
plot(meap93$lnchprg, meap93$math10,
     col = "steelblue",
     pch = 20,
     xlab = "Lunch Program",
     ylab = "Student performance in math exam",
     main = "Level-Level Regression fit",
     cex.main = 1)   # cex.main was given twice; keep a single value
# Overlay the fitted line from the model estimated above
abline(perf.lm,
       col = "blue",
       lwd = 2)
The estimated regression equation is
\[\widehat{performance}=32.143-0.319(lnchprg)\]
The summary of the model gives an RMSE (residual standard error) of 9.566, so the fitted values typically miss the observed performance by roughly 9.6 percentage points. This number is in the units of the dependent variable, so it should be judged against the scale of math10 rather than against 0. The \(R^2\) value of 0.171 means that the lunch program explains about 17.1% of the variation in student performance.
The intercept says that when no students are eligible for the lunch program (lnchprg = 0), the predicted student performance in the math exam is 32.14%. The slope says that each one-percentage-point increase in lunch program eligibility is associated with a 0.319 percentage-point decrease in predicted performance. Because this is a simple regression, there are no other variables to hold constant.
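As a minimal sketch of how the level-level equation is read, the snippet below plugs a few values of lnchprg into the coefficients copied from the summary output above. The helper name `predict_level_level` and the values 0, 10 and 50 are assumptions chosen for illustration, not part of the original analysis.

```r
# Coefficients copied from the printed summary of perf.lm
b0 <- 32.14271   # intercept
b1 <- -0.31886   # slope on lnchprg

# Hypothetical helper: predicted math10 (in percent) for a given lnchprg
predict_level_level <- function(lnchprg) b0 + b1 * lnchprg

# Predicted performance at 0%, 10% and 50% lunch program eligibility
pred <- predict_level_level(c(0, 10, 50))
print(round(pred, 2))

# Moving from 10% to 50% eligibility changes the prediction by 40 * b1
print(round(pred[3] - pred[2], 3))
```

With the fitted object in memory, `predict(perf.lm, newdata = data.frame(lnchprg = c(0, 10, 50)))` should give the same predictions up to rounding of the coefficients.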
perf.logL <- lm(log(math10) ~ lnchprg, data = meap93)
summary(perf.logL)
##
## Call:
## lm(formula = log(math10) ~ lnchprg, data = meap93)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.34067 -0.22219 0.03436 0.27521 1.29532
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.497277 0.046338 75.47 <2e-16 ***
## lnchprg -0.016734 0.001618 -10.34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4443 on 406 degrees of freedom
## Multiple R-squared: 0.2085, Adjusted R-squared: 0.2065
## F-statistic: 106.9 on 1 and 406 DF, p-value: < 2.2e-16
plot(log(math10) ~ lnchprg,
     data = meap93,   # was ceosal1, which does not contain these variables
     col = "steelblue",
     pch = 20,
     main = "Log-Level Regression Fit",
     xlab = "Lunch program",
     ylab = "log(Student performance)",
     cex.main = 1)
# Overlay the fitted line from the log-level model estimated above
abline(perf.logL,
       col = "red",
       lwd = 2)
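The estimated regression equation is
\[\log(\widehat{performance})=3.497-0.017(lnchprg)\]
The summary of the model gives an RMSE (residual standard error) of 0.4443. This value is measured in log points, so it cannot be compared directly with the 9.566 of the level-level model; it is small because of the log scale, not necessarily because the fit is better. The \(R^2\) value of 0.209 is numerically the highest among the 4 models and means that the lunch program explains about 20.9% of the variation in log(student performance). Strictly, this \(R^2\) is only directly comparable with that of the log-log model, which has the same dependent variable.
The intercept says that when no students are eligible for the lunch program (lnchprg = 0), the predicted log(performance) is 3.497, which corresponds to a performance of about \(e^{3.497}\approx 33.0\%\), not 3.5%. The slope says that each one-percentage-point increase in lunch program eligibility is associated with an approximate 1.7% decrease in student performance (\(100\times 0.0167\)). Because this is a simple regression, there are no other variables to hold constant.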
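The log-level reading above relies on two back-transformations: exponentiating the intercept, and turning the slope into a percentage change. A minimal sketch using the coefficients copied from the summary output follows; the variable names are illustrative assumptions.

```r
# Coefficients copied from the printed summary of perf.logL
a <- 3.497277    # intercept, on the log(math10) scale
b <- -0.016734   # slope on lnchprg

# Predicted math10 (in percent) at lnchprg = 0: undo the log
level_at_zero <- exp(a)

# Percentage change in math10 per one-point increase in lnchprg
approx_pct <- 100 * b              # usual approximation for small b
exact_pct  <- 100 * (exp(b) - 1)   # exact change implied by the model

print(round(c(level_at_zero = level_at_zero,
              approx_pct    = approx_pct,
              exact_pct     = exact_pct), 3))
```

The approximate and exact percentage changes are close here because the slope is small in absolute value; the gap grows as the coefficient gets larger.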
perf.Llog <- lm(math10 ~ log(lnchprg), data = meap93)
summary(perf.Llog)
##
## Call:
## lm(formula = math10 ~ log(lnchprg), data = meap93)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.336 -6.253 -1.417 4.724 46.218
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.6269 2.2732 20.072 <2e-16 ***
## log(lnchprg) -7.0500 0.7287 -9.675 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.471 on 406 degrees of freedom
## Multiple R-squared: 0.1874, Adjusted R-squared: 0.1854
## F-statistic: 93.6 on 1 and 406 DF, p-value: < 2.2e-16
plot(meap93$lnchprg, meap93$math10,
     col = "steelblue",
     pch = 20,
     xlab = "Lunch Program",
     ylab = "Student performance in math exam",
     main = "Level-Log Regression Fit",
     cex.main = 1)   # cex.main was given twice; keep a single value
order_id <- order(meap93$lnchprg)
lines(meap93$lnchprg[order_id],
fitted(perf.Llog)[order_id],
col = "violet",
lwd = 2)
The estimated regression equation is
\[\widehat{performance}=45.627-7.05\log(lnchprg)\]
The summary of the model gives an RMSE (residual standard error) of 9.471, slightly below the 9.566 of the level-level model, which has the same dependent variable. The fitted values still miss the observed performance by roughly 9.5 percentage points on average. The \(R^2\) value of 0.187 means that log(lunch program) explains about 18.7% of the variation in student performance.
The intercept is the predicted performance when log(lnchprg) = 0, that is, when lunch program eligibility is 1% (not 0%, where the log is undefined); at that point the predicted student performance in the math exam is 45.627%. The slope says that a 1% (relative) increase in lunch program eligibility is associated with a decrease of about \(7.05/100=0.0705\) percentage points in student performance. Because this is a simple regression, there are no other variables to hold constant.
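The "divide the coefficient by 100" rule used for the level-log model is an approximation of the slope times \(\log(1.01)\). The sketch below checks it with the coefficients copied from the summary output; the 10% case is an added illustration, and the variable names are assumptions.

```r
# Coefficients copied from the printed summary of perf.Llog
a <- 45.6269   # intercept: predicted math10 when log(lnchprg) = 0
b <- -7.0500   # slope on log(lnchprg)

# Change in math10 (percentage points) for a 1% relative increase in lnchprg
approx_1pct <- b / 100         # rule of thumb
exact_1pct  <- b * log(1.01)   # exact change implied by the model

# The same calculation for a 10% relative increase
exact_10pct <- b * log(1.10)

# The intercept is the prediction at lnchprg = 1, because log(1) = 0
pred_at_one <- a + b * log(1)

print(round(c(approx_1pct = approx_1pct,
              exact_1pct  = exact_1pct,
              exact_10pct = exact_10pct,
              pred_at_one = pred_at_one), 4))
```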
perf.loglog <- lm(log(math10) ~ log(lnchprg), data = meap93)
summary(perf.loglog)
##
## Call:
## lm(formula = log(math10) ~ log(lnchprg), data = meap93)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.27639 -0.22457 0.03033 0.25315 1.29443
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.08338 0.10842 37.661 <2e-16 ***
## log(lnchprg) -0.33017 0.03476 -9.499 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4518 on 406 degrees of freedom
## Multiple R-squared: 0.1818, Adjusted R-squared: 0.1798
## F-statistic: 90.24 on 1 and 406 DF, p-value: < 2.2e-16
plot(log(math10) ~ log(lnchprg),
     data = meap93,   # was ceosal1, which does not contain these variables
     col = "steelblue",
     pch = 20,
     main = "Log-Log Regression Fit",
     xlab = "log(Lunch Program)",
     ylab = "log(Student Performance in Math Exam)",
     cex.main = 1)
# Overlay the fitted line from the log-log model estimated above
abline(perf.loglog,
       col = "red",
       lwd = 2)
The estimated regression equation is
\[\log(\widehat{performance})=4.083-0.33\log(lnchprg)\]
The summary of the model gives an RMSE (residual standard error) of 0.4518. This value is measured in log points, so it is comparable with the 0.4443 of the log-level model (which it slightly exceeds) but not with the RMSEs of the models for math10 in levels. The \(R^2\) value of 0.182 means that log(lunch program) explains about 18.2% of the variation in log(student performance).
The intercept is the predicted log(performance) when log(lnchprg) = 0, that is, when lunch program eligibility is 1%; it corresponds to a performance of about \(e^{4.083}\approx 59.3\%\), not 4.08%. The slope is an elasticity: a 1% increase in lunch program eligibility is associated with a decrease of about 0.33% in student performance. Because this is a simple regression, there are no other variables to hold constant.
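To make the elasticity reading of the log-log model concrete, the sketch below back-transforms the intercept and compares the approximate and exact effect of a 1% increase in lnchprg, using the coefficients copied from the summary output. The variable names are illustrative assumptions.

```r
# Coefficients copied from the printed summary of perf.loglog
a <- 4.08338    # intercept, on the log(math10) scale
b <- -0.33017   # slope on log(lnchprg), i.e. the elasticity

# Predicted math10 (in percent) when lnchprg = 1, where log(lnchprg) = 0
level_at_one <- exp(a)

# Percentage change in math10 for a 1% increase in lnchprg
approx_pct <- b                     # elasticity read directly
exact_pct  <- 100 * (1.01^b - 1)    # exact change implied by the model

print(round(c(level_at_one = level_at_one,
              approx_pct   = approx_pct,
              exact_pct    = exact_pct), 3))
```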