Exercise 1

Data and library

library(wooldridge)
library(tidyverse)
library(ggplot2)

A

A. Level-Level

perf.lm <- lm(math10 ~ lnchprg, data = meap93)
summary(perf.lm)
## 
## Call:
## lm(formula = math10 ~ lnchprg, data = meap93)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.386  -5.979  -1.207   4.865  45.845 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 32.14271    0.99758  32.221   <2e-16 ***
## lnchprg     -0.31886    0.03484  -9.152   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.566 on 406 degrees of freedom
## Multiple R-squared:  0.171,  Adjusted R-squared:  0.169 
## F-statistic: 83.77 on 1 and 406 DF,  p-value: < 2.2e-16
plot(meap93$lnchprg, meap93$math10,
     col = "steelblue",
     pch = 20,
     xlab = "Lunch Program", 
     ylab = "Student performance in math exam",
     main = "Level-Level Regression Fit", 
     cex.main = 1)
# Overlay the fitted line from the model estimated above
abline(perf.lm,
       col = "blue", 
       lwd = 2)

  1. The estimated regression equation is
    \[\hat{performance}=32.143-0.319(lnchprg)\]

  2. The model summary reports a residual standard error (RMSE) of 9.566, so the observed values of math10 typically deviate from the fitted line by about 9.6 percentage points. The \(R^2\) of 0.171 means that the lunch program variable explains about 17.1% of the variation in student performance, which leaves most of the variation unexplained.

  3. The intercept says that when no students in a school are eligible for the lunch program (lnchprg = 0), the predicted student performance in the math exam (math10) is 32.14%. The slope says that each one-percentage-point increase in lunch program eligibility is associated with a decrease of about 0.319 percentage points in predicted student performance.
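The level-level interpretation can be checked numerically. The sketch below plugs the coefficients printed in the summary output into the fitted line; `predict_math10` is a hypothetical helper introduced only for this illustration, not part of the original analysis.

```r
# Coefficients copied from the summary(perf.lm) output above
b0 <- 32.14271
b1 <- -0.31886

# Hypothetical helper: predicted math10 (percent) at a given lnchprg (percent)
predict_math10 <- function(lnchprg) b0 + b1 * lnchprg

# Predictions at 0%, 10% and 11% lunch program eligibility
pred <- predict_math10(c(0, 10, 11))
print(pred)

# One extra percentage point of eligibility shifts the prediction by b1
slope_check <- pred[3] - pred[2]
print(slope_check)
```

The first prediction reproduces the intercept, and the difference between the last two reproduces the slope, which is what the interpretation in item 3 relies on.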

A. Log-Level

perf.logL <- lm(log(math10) ~ lnchprg, data = meap93)
summary(perf.logL)
## 
## Call:
## lm(formula = log(math10) ~ lnchprg, data = meap93)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.34067 -0.22219  0.03436  0.27521  1.29532 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.497277   0.046338   75.47   <2e-16 ***
## lnchprg     -0.016734   0.001618  -10.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4443 on 406 degrees of freedom
## Multiple R-squared:  0.2085, Adjusted R-squared:  0.2065 
## F-statistic: 106.9 on 1 and 406 DF,  p-value: < 2.2e-16
plot(log(math10) ~ lnchprg, 
     data = meap93,
     col = "steelblue",
     pch = 20,
     main = "Log-Level Regression Fit", 
     xlab = "Lunch program",
     ylab = "log(Student performance)",
     cex.main = 1)
# Overlay the fitted line from the model estimated above
abline(perf.logL, 
       col = "red", 
       lwd = 2)

  1. The estimated regression equation is
    \[\log(\hat{performance})=3.497-0.017(lnchprg)\]

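  2. The model summary reports a residual standard error (RMSE) of 0.4443. This value is measured in log units of math10, so it cannot be compared directly with the residual standard errors of the models that use math10 in levels. The \(R^2\) of 0.209 is the highest among the 4 models and means that the lunch program variable explains about 20.9% of the variation in log(student performance). Because the dependent variable differs between the level and log specifications, this \(R^2\) is directly comparable only with that of the log-log model.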

  3. The intercept is the predicted value of log(math10) when no students in a school are eligible for the lunch program (lnchprg = 0). Its value of 3.497 corresponds to a student performance of about exp(3.497), or roughly 33.0%, not 3.5%. The slope says that each one-percentage-point increase in lunch program eligibility is associated with a decrease of about 1.7% in predicted student performance.
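To make the log-level reading concrete, the following sketch back-transforms the intercept and compares the usual approximation (100 times the slope) with the exact percentage change. It uses only the coefficients printed above; the variable names are illustrative, and the back-transformed intercept is a rough figure that ignores the retransformation bias of log models.

```r
# Coefficients copied from the summary(perf.logL) output above
b0 <- 3.497277
b1 <- -0.016734

# Back-transform the intercept from log units to the percent scale
math10_at_zero <- exp(b0)

# Percent change in math10 per one-percentage-point increase in lnchprg
approx_pct <- 100 * b1              # common approximation
exact_pct  <- 100 * (exp(b1) - 1)   # exact change implied by the model

print(c(math10_at_zero = math10_at_zero,
        approx_pct = approx_pct,
        exact_pct = exact_pct))
```

For a slope this small the approximate and exact percentage changes are nearly identical, so reporting about 1.7% is reasonable.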

A. Level-Log

perf.Llog <- lm(math10 ~ log(lnchprg), data = meap93)
summary(perf.Llog)
## 
## Call:
## lm(formula = math10 ~ log(lnchprg), data = meap93)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.336  -6.253  -1.417   4.724  46.218 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   45.6269     2.2732  20.072   <2e-16 ***
## log(lnchprg)  -7.0500     0.7287  -9.675   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.471 on 406 degrees of freedom
## Multiple R-squared:  0.1874, Adjusted R-squared:  0.1854 
## F-statistic:  93.6 on 1 and 406 DF,  p-value: < 2.2e-16
plot(meap93$lnchprg, meap93$math10,
     col = "steelblue",
     pch = 20,
     xlab = "Lunch Program", 
     ylab = "Student performance in math exam",
     main = "Level-Log Regression Fit", 
     cex.main = 1)
order_id  <- order(meap93$lnchprg)
lines(meap93$lnchprg[order_id],
      fitted(perf.Llog)[order_id], 
      col = "violet", 
      lwd = 2)

  1. The estimated regression equation is
    \[\hat{performance}=45.627-7.05\log(lnchprg)\]

  2. The model summary reports a residual standard error (RMSE) of 9.471, so the observed values of math10 typically deviate from the fitted curve by about 9.5 percentage points. The \(R^2\) of 0.187 means that log(lunch program) explains about 18.7% of the variation in student performance.

  3. The intercept is the predicted student performance when log(lnchprg) = 0, that is, when 1% of students are eligible for the lunch program: 45.627%. It does not describe a school with no eligible students, because log(0) is undefined. The slope says that each 1% increase in lunch program eligibility is associated with a decrease of about 0.0705 percentage points (7.05/100) in predicted student performance.
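The division by 100 in the level-log interpretation can be verified with a short calculation. This is a minimal sketch based on the coefficients printed above; `predict_math10` is a hypothetical helper, and the starting value of 20% eligibility is an arbitrary choice for illustration.

```r
# Coefficients copied from the summary(perf.Llog) output above
b0 <- 45.6269
b1 <- -7.0500

# Hypothetical helper: predicted math10 (percent) at a given lnchprg (percent)
predict_math10 <- function(lnchprg) b0 + b1 * log(lnchprg)

# The intercept is the prediction at lnchprg = 1, because log(1) = 0
at_one <- predict_math10(1)

# A 1% relative increase in eligibility, here from 20 to 20.2
change_1pct   <- predict_math10(20.2) - predict_math10(20)
approx_change <- b1 / 100   # rule-of-thumb change in percentage points

print(c(at_one = at_one,
        change_1pct = change_1pct,
        approx_change = approx_change))
```

The exact change equals the slope times log(1.01), which does not depend on the starting value and is very close to the slope divided by 100.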

A. Log-Log

perf.loglog <- lm(log(math10) ~ log(lnchprg), data = meap93)
summary(perf.loglog)
## 
## Call:
## lm(formula = log(math10) ~ log(lnchprg), data = meap93)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.27639 -0.22457  0.03033  0.25315  1.29443 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.08338    0.10842  37.661   <2e-16 ***
## log(lnchprg) -0.33017    0.03476  -9.499   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4518 on 406 degrees of freedom
## Multiple R-squared:  0.1818, Adjusted R-squared:  0.1798 
## F-statistic: 90.24 on 1 and 406 DF,  p-value: < 2.2e-16
plot(log(math10) ~ log(lnchprg), 
     data = meap93,
     col = "steelblue",
     pch = 20,
     main = "Log-Log Regression Fit", 
     xlab = "log(Lunch Program)",
     ylab = "log(Student Performance in Math Exam)",
     cex.main = 1)
# Overlay the fitted line from the model estimated above
abline(perf.loglog, 
       col = "red", 
       lwd = 2)

  1. The estimated regression equation is
    \[\log(\hat{performance})=4.083-0.33\log(lnchprg)\]

  2. The model summary reports a residual standard error (RMSE) of 0.4518, measured in log units of math10. Among the models with log(math10) as the dependent variable, this is slightly larger than the 0.4443 of the log-level model. The \(R^2\) of 0.182 means that log(lunch program) explains about 18.2% of the variation in log(student performance).

  3. The intercept is the predicted value of log(math10) when log(lnchprg) = 0, that is, when 1% of students are eligible for the lunch program. Its value of 4.083 corresponds to a student performance of about exp(4.083), or roughly 59.3%, not 4.08%. The slope is an elasticity: each 1% increase in lunch program eligibility is associated with a decrease of about 0.33% in predicted student performance.
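The elasticity reading of the log-log slope can be confirmed the same way. The sketch below relies only on the coefficients printed above; `predict_math10` is a hypothetical helper, the 20% starting value is arbitrary, and the back-transformed predictions ignore the retransformation bias of log models.

```r
# Coefficients copied from the summary(perf.loglog) output above
b0 <- 4.08338
b1 <- -0.33017

# Hypothetical helper: predicted math10 (percent) on the original scale
predict_math10 <- function(lnchprg) exp(b0 + b1 * log(lnchprg))

# The back-transformed intercept is the prediction at lnchprg = 1
at_one <- predict_math10(1)

# Percent change in math10 for a 1% increase in eligibility (20 to 20.2)
pct_change <- 100 * (predict_math10(20.2) / predict_math10(20) - 1)

print(c(at_one = at_one, pct_change = pct_change, slope = b1))
```

The percent change equals 100 * (1.01^b1 - 1) regardless of the starting value, which is why the slope itself can be read as an approximate elasticity.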