11/3

C6 Use the data in ECONMATH to answer this question.

library(wooldridge)
data <- wooldridge::econmath
head(data,5)

##   age work study econhs colgpa hsgpa acteng actmth act mathscr male calculus
## 1  23   15  10.0      0 3.4909 3.355     24     26  27      10    1        1
## 2  23    0  22.5      1 2.1000 3.219     23     20  24       9    1        0
## 3  21   25  12.0      0 3.0851 3.306     21     24  21       8    1        1
## 4  22   30  40.0      0 2.6805 3.977     31     28  31      10    0        1
## 5  22   25  15.0      1 3.7454 3.890     28     31  32       8    1        1
##   attexc attgood fathcoll mothcoll score
## 1      0       0        1        1 84.43
## 2      0       0        0        1 57.38
## 3      1       0        0        1 66.39
## 4      0       1        1        1 81.15
## 5      0       1        0        1 95.90

(i) Logically, what are the smallest and largest values that can be taken on by the variable score? What are the smallest and largest values in the sample? ANSWER: Because score is the course score measured in percent, it logically cannot fall below 0 or rise above 100. The sample maximum and minimum are:

max(data$score)

## [1] 98.44

min(data$score)

## [1] 19.53

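(ii) Consider the linear model score = B0 + B1colgpa + B2actmth + B3acteng + u. Why cannot Assumption MLR.6 hold for the error term u? What consequences does this have for using the usual t statistic to test H0: B3 = 0? ANSWER: MLR.6 requires u to be independent of the explanatory variables and normally distributed, which would make score, conditional on the regressors, normally distributed and therefore able to take any real value. Because score is bounded between 0 and 100, u cannot be normally distributed, so MLR.6 fails. As a consequence, the usual t statistic does not have an exact t distribution under H0: B3 = 0. This does not by itself tell us whether to reject H0. In a large sample the t statistic is still approximately valid, provided MLR.1 through MLR.5 hold, because the OLS estimators are asymptotically normal.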

model <- lm(score ~ colgpa + actmth + acteng, data=data)
summary(model)

## 
## Call:
## lm(formula = score ~ colgpa + actmth + acteng, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.855  -6.215   0.444   6.812  32.670 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 16.17402    2.80044   5.776 1.09e-08 ***
## colgpa      12.36620    0.71506  17.294  < 2e-16 ***
## actmth       0.88335    0.11220   7.873 1.11e-14 ***
## acteng       0.05176    0.11106   0.466    0.641    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.35 on 810 degrees of freedom
##   (42 observations deleted due to missingness)
## Multiple R-squared:  0.3972, Adjusted R-squared:  0.395 
## F-statistic: 177.9 on 3 and 810 DF,  p-value: < 2.2e-16
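
As a quick arithmetic check on the acteng row, the sketch below recomputes the t statistic as estimate divided by standard error and then the two-sided p-value, once from the t distribution with 810 degrees of freedom and once from the standard normal. The three input numbers are copied from the output above; the variable names are only illustrative.

```r
# Values copied from the acteng row of the summary(model) output
est <- 0.05176   # coefficient estimate
se  <- 0.11106   # standard error
df  <- 810       # residual degrees of freedom

# t statistic for H0: B3 = 0
t_stat <- est / se

# Two-sided p-values: t distribution vs. standard normal approximation
p_t    <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
p_norm <- 2 * pnorm(abs(t_stat), lower.tail = FALSE)

print(round(c(t = t_stat, p_t = p_t, p_normal = p_norm), 3))
```

With this many degrees of freedom the two p-values should be nearly identical, which is why the choice between t and normal critical values matters little here.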

plot(model)
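
To see concretely why a normal error term is incompatible with a bounded score, the following sketch takes the estimates and residual standard error from the output above and asks what MLR.6 would imply for one hypothetical student. The student's values (colgpa = 4.0, actmth = 36, acteng = 36) are assumptions chosen for illustration and are not a row of the data set.

```r
# Estimates and residual standard error copied from the summary(model) output
b <- c(intercept = 16.17402, colgpa = 12.36620, actmth = 0.88335, acteng = 0.05176)
sigma_hat <- 10.35

# Hypothetical student (illustrative values only): intercept, colgpa, actmth, acteng
x <- c(1, 4.0, 36, 36)

# Predicted score for this student
pred <- sum(b * x)

# If u were normal with standard deviation sigma_hat, score given x would be
# normal around pred, so these probabilities would both be strictly positive
p_above_100 <- pnorm(100, mean = pred, sd = sigma_hat, lower.tail = FALSE)
p_below_0   <- pnorm(0, mean = pred, sd = sigma_hat)

print(c(predicted = pred, p_above_100 = p_above_100, p_below_0 = p_below_0))
```

Under normality a sizeable share of such students would be predicted to score above 100, which is impossible, so the normal model can only be an approximation.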

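(iii) Estimate the model from part (ii) and obtain the t statistic and associated p-value for testing H0: B3 = 0. How would you defend your findings to someone who makes the following statement: “You cannot trust that p-value because clearly the error term in the equation cannot have a normal distribution.” ANSWER: From the output above, the coefficient on acteng is 0.05176 with a t statistic of 0.466 and a p-value of 0.641, so we fail to reject H0: B3 = 0 at any conventional significance level. Normality of the error term is needed only for the t statistic to have an exact t distribution in finite samples. The regression uses 814 observations (810 degrees of freedom plus 4 estimated parameters), and under MLR.1 through MLR.5 the central limit theorem implies that the OLS estimators are approximately normally distributed in samples of this size. The t statistic and its p-value therefore remain approximately valid, and with a p-value this far from the usual cutoffs a modest approximation error would not change the conclusion.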
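
The large-sample argument can be illustrated with a small simulation. This is a minimal sketch on artificial data, not the ECONMATH sample: the regressors, the coefficient values, the uniform error distribution, the seed, and the number of replications are all assumptions made for illustration. The errors are bounded and non-normal, the null hypothesis is true by construction, and the sample size matches the 814 observations used in the regression.

```r
set.seed(1)      # arbitrary seed for reproducibility
n    <- 814      # same number of observations as in the regression
reps <- 1000     # number of simulated samples (arbitrary)

reject <- logical(reps)
for (r in seq_len(reps)) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  # Errors are uniform on [-1, 1]: mean zero, bounded, clearly non-normal
  u <- runif(n, min = -1, max = 1)
  # The true coefficient on x2 is 0, so H0 holds by construction
  y <- 1 + 0.5 * x1 + 0 * x2 + u
  fit <- summary(lm(y ~ x1 + x2))
  # Record whether the usual t test rejects H0 at the 5% level
  reject[r] <- coef(fit)["x2", "Pr(>|t|)"] < 0.05
}

# Share of samples in which a true H0 was rejected
rejection_rate <- mean(reject)
print(rejection_rate)
```

If the usual p-value is trustworthy despite non-normal errors, the rejection rate should be close to the nominal 5%. The simulated errors are homoskedastic and independent of the regressors, so this speaks only to the normality objection and not to possible violations of MLR.5.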

2023-11-03