# Install wooldridge if not already installed
if (!require(wooldridge)) install.packages("wooldridge")
library(wooldridge)
# Load datasets
data("wage2")
data("hprice1")
The model to be estimated is:
\[\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure + u\]
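Where \(\log(wage)\) is the natural log of the wage (the variable lwage in wage2), \(educ\) is years of education, \(exper\) is years of work experience, \(tenure\) is years with the current employer, and \(u\) is the error term.
# Fit the model (call taken from the summary output below)
model1 <- lm(lwage ~ educ + exper + tenure, data = wage2)
summary(model1)
Call: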
lm(formula = lwage ~ educ + exper + tenure, data = wage2)
Residuals:
Min 1Q Median 3Q Max
-1.8282 -0.2401 0.0203 0.2569 1.3400
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.496696 0.110528 49.731 < 2e-16 ***
educ 0.074864 0.006512 11.495 < 2e-16 ***
exper 0.015328 0.003370 4.549 6.10e-06 ***
tenure 0.013375 0.002587 5.170 2.87e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3877 on 931 degrees of freedom
Multiple R-squared: 0.1551, Adjusted R-squared: 0.1524
F-statistic: 56.97 on 3 and 931 DF, p-value: < 2.2e-16
Hypothesis that education has no effect vs. some effect:
\[H_0: \beta_1 = 0\] \[H_1: \beta_1 \neq 0\]
This is a two-tailed test. We make no prior assumption about the direction of education’s effect on wages.
Hypothesis that education has no effect vs. a positive effect:
\[H_0: \beta_1 = 0\] \[H_1: \beta_1 > 0\]
This is a one-tailed (right) test. We expect education to increase wages.
Hypothesis that education has no effect vs. a negative effect:
\[H_0: \beta_1 = 0\] \[H_1: \beta_1 < 0\]
This is a one-tailed (left) test. We test the unlikely case that more education lowers wages.
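The three alternatives differ only in which tail of the t distribution supplies the p-value. The sketch below illustrates this with the educ coefficient and standard error from the regression output above; the names `t_stat` and `df_res` are illustrative and are not part of the assignment code.

```r
# Values copied from the regression output above
t_stat <- 0.074864 / 0.006512   # coefficient on educ divided by its standard error
df_res <- 931                   # residual degrees of freedom

# p-value under each alternative hypothesis
p_two   <- 2 * pt(abs(t_stat), df = df_res, lower.tail = FALSE)  # H1: beta1 != 0
p_right <- pt(t_stat, df = df_res, lower.tail = FALSE)           # H1: beta1 > 0
p_left  <- pt(t_stat, df = df_res, lower.tail = TRUE)            # H1: beta1 < 0

cat(sprintf("two-tailed  : %.3g\nright-tailed: %.3g\nleft-tailed : %.3g\n",
            p_two, p_right, p_left))
```

With a large positive t-statistic, the right-tailed p-value is half the two-tailed one, while the left-tailed p-value is close to 1, so the left-tailed test cannot reject \(H_0\).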
# Extract coefficients and standard errors
b <- coef(model1)
se <- sqrt(diag(vcov(model1)))
n <- nobs(model1)
r2 <- summary(model1)$r.squared
cat("Estimated Regression Equation:\n")
Estimated Regression Equation:
================================
cat(sprintf(
  "log(wage) = %.4f + %.4f*educ + %.4f*exper + %.4f*tenure\n",
  b["(Intercept)"], b["educ"], b["exper"], b["tenure"]
))
log(wage) = 5.4967 + 0.0749*educ + 0.0153*exper + 0.0134*tenure
cat(sprintf(
  "SE: (%.4f) (%.4f) (%.4f) (%.4f)\n",
  se["(Intercept)"], se["educ"], se["exper"], se["tenure"]
))
SE: (0.1105) (0.0065) (0.0034) (0.0026)
n = 935, R-squared = 0.1551
The full estimated model with standard errors in parentheses is:
\[\widehat{\log(wage)} = \underset{(0.1105)}{5.4967} + \underset{(0.0065)}{0.0749}\,educ + \underset{(0.0034)}{0.0153}\,exper + \underset{(0.0026)}{0.0134}\,tenure\]
\[n = 935, \quad R^2 = 0.1551\]
# Extract values for education
beta1 <- b["educ"]
se1 <- se["educ"]
df <- model1$df.residual
# Step 1: Compute t-statistic
t_educ <- beta1 / se1
# Step 2: Critical value (one-tailed, alpha = 0.05, df = 931)
t_crit_05 <- qt(0.95, df = df)
# Step 3: One-tailed p-value
p_one <- pt(t_educ, df = df, lower.tail = FALSE)
cat("=== Q1(e): t-Test for Education (One-Tailed, alpha = 5%) ===\n\n")
=== Q1(e): t-Test for Education (One-Tailed, alpha = 5%) ===
Coefficient on educ : 0.074864
Standard Error : 0.006512
t-statistic : 0.074864 / 0.006512 = 11.495
Critical value (5%) : 1.6465
p-value (one-tailed) : 5.406440e-29
if (t_educ > t_crit_05) {
  cat("Decision: REJECT H0\n")
  cat("Conclusion: Education has a significant positive effect on wages.\n")
} else {
  cat("Decision: FAIL TO REJECT H0\n")
}
Decision: REJECT H0
Conclusion: Education has a significant positive effect on wages.
Step-by-Step Workings:
Step 1 — Hypotheses: \[H_0: \beta_1 = 0 \qquad H_1: \beta_1 > 0\]
Step 2 — t-statistic: \[t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{0.074864}{0.006512} = 11.495\]
Step 3 — Critical value (one-tailed, \(\alpha = 0.05\), \(df = 931\)): \[t_{critical} = 1.6465\]
Step 4 — Decision: \[11.495 > 1.6465 \quad \Rightarrow \textbf{Reject } H_0\]
Technical Conclusion: At the 5% significance level, we reject \(H_0\) and conclude that education has a statistically significant positive effect on log(wage).
Plain Language: There is very strong evidence that more years of education lead to higher wages. Specifically, each additional year of education increases wages by approximately 7.49% (\(100 \times 0.0749\)), holding experience and tenure constant.
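The 7.49% figure uses the usual approximation \(\%\Delta wage \approx 100 \times \hat{\beta}_1\) for a log-level model. As a rough check, the exact percentage change implied by the coefficient is \(100 \times (e^{\hat{\beta}_1} - 1)\). The sketch below compares the two; the variable names are illustrative only.

```r
b_educ <- 0.074864                      # coefficient on educ from the output above

approx_pct <- 100 * b_educ              # log approximation: 100 * beta
exact_pct  <- 100 * (exp(b_educ) - 1)   # exact change implied by the log-level model

cat(sprintf("approximate: %.2f%%   exact: %.2f%%\n", approx_pct, exact_pct))
```

For a coefficient this small the two figures are close, which is why the approximation is normally reported.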
=== Q1(f): p-value Approach ===
One-tailed p-value : 5.406440e-29
Alpha : 0.05
if (p_one < 0.05) {
  cat("Since p-value < 0.05, we REJECT H0.\n")
  cat("Conclusion: Education has a significant positive effect on wages.\n")
} else {
  cat("Since p-value > 0.05, we FAIL TO REJECT H0.\n")
}
Since p-value < 0.05, we REJECT H0.
Conclusion: Education has a significant positive effect on wages.
Conclusion: The one-tailed p-value (\(p \approx 5.41 \times 10^{-29}\)) is far below \(\alpha = 0.05\). We reject \(H_0\) and confirm that education has a significant positive effect on wages.
# Critical value for 95% CI (two-tailed)
t_crit_975 <- qt(0.975, df = df)
ci_lo <- beta1 - t_crit_975 * se1
ci_hi <- beta1 + t_crit_975 * se1
cat("=== Q1(g): 95% Confidence Interval for Education ===\n\n")
=== Q1(g): 95% Confidence Interval for Education ===
Formula: B1_hat +/- t(0.975, df) x SE(B1_hat)
t critical value (0.975, 931) : 1.9625
Margin of error : 1.9625 x 0.006512 = 0.012781
Lower Bound : 0.074864 - 0.012781 = 0.062083
Upper Bound : 0.074864 + 0.012781 = 0.087645
95% CI for education: [0.0621, 0.0876]
Formula: \[\hat{\beta}_1 \pm t_{0.975,\,931} \times SE(\hat{\beta}_1)\]
Substituting values: \[0.074864 \pm 1.9625 \times 0.006512\] \[0.074864 \pm 0.012781\]
\[\boxed{95\%\ CI: \ [0.0621, \ 0.0876]}\]
Comment: The confidence interval does not contain zero, which confirms that education has a statistically significant positive effect on log(wage) at the 5% level (two-tailed). We are 95% confident that one additional year of education increases log(wage) by between 0.0621 and 0.0876, that is, it raises wages by approximately 6.21% to 8.76%, holding experience and tenure constant.
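The link between the interval and the test can be made explicit: a hypothesised value \(\beta_1^0\) lies inside the 95% confidence interval exactly when a two-tailed test of \(H_0: \beta_1 = \beta_1^0\) fails to reject at the 5% level. The following sketch checks this for a few candidate values, using the estimate and standard error reported above; `rejects()`, `inside()` and the candidate values are hypothetical and chosen only for illustration.

```r
est    <- 0.074864   # coefficient on educ
se_est <- 0.006512   # its standard error
df_res <- 931        # residual degrees of freedom

t_crit <- qt(0.975, df = df_res)
ci <- c(est - t_crit * se_est, est + t_crit * se_est)

# Two-tailed test of H0: beta1 = beta_null at the 5% level
rejects <- function(beta_null) abs((est - beta_null) / se_est) > t_crit
# Is beta_null inside the 95% confidence interval?
inside  <- function(beta_null) beta_null >= ci[1] & beta_null <= ci[2]

candidates <- c(0, 0.05, 0.07, 0.08, 0.10)
print(data.frame(beta_null = candidates,
                 in_ci     = inside(candidates),
                 rejected  = rejects(candidates)))
```

Every candidate that falls outside the interval is rejected and every candidate inside it is not, so "zero is not in the interval" and "reject \(H_0: \beta_1 = 0\) against a two-sided alternative" are the same statement.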
\[H_0: \beta_2 = 0\] \[H_1: \beta_2 > 0\]
We expect experience to increase wages. This is a one-tailed (right) test.
beta2 <- b["exper"]
se2 <- se["exper"]
t_exper <- beta2 / se2
t_crit_10 <- qt(0.90, df = df)
p_exper <- pt(t_exper, df = df, lower.tail = FALSE)
cat("=== Q2(b): t-Test for Experience (One-Tailed, alpha = 10%) ===\n\n")
=== Q2(b): t-Test for Experience (One-Tailed, alpha = 10%) ===
Coefficient on exper : 0.015328
Standard Error : 0.003370
t-statistic : 0.015328 / 0.003370 = 4.549
Critical value (10%) : 1.2825
p-value (one-tailed) : 3.049992e-06
if (t_exper > t_crit_10) {
  cat("Decision: REJECT H0\n")
  cat("Conclusion: Experience has a significant positive effect on wages.\n")
} else {
  cat("Decision: FAIL TO REJECT H0\n")
}
Decision: REJECT H0
Conclusion: Experience has a significant positive effect on wages.
Step-by-Step Workings:
Step 1 — Hypotheses: \[H_0: \beta_2 = 0 \qquad H_1: \beta_2 > 0\]
Step 2 — t-statistic: \[t = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} = \frac{0.015328}{0.003370} = 4.549\]
Step 3 — Critical value (one-tailed, \(\alpha = 0.10\), \(df = 931\)): \[t_{critical} = 1.2825\]
Step 4 — Decision: \[4.549 > 1.2825 \quad \Rightarrow \textbf{Reject } H_0\]
Technical Conclusion: At the 10% significance level, we reject \(H_0\) and conclude that experience has a statistically significant positive effect on log(wage).
Plain Language: Workers with more years of experience earn significantly higher wages. Each additional year of experience increases wages by approximately 1.53%, holding education and tenure constant.
beta3 <- b["tenure"]
se3 <- se["tenure"]
t_crit_995 <- qt(0.995, df = df)
ci_lo_ten <- beta3 - t_crit_995 * se3
ci_hi_ten <- beta3 + t_crit_995 * se3
cat("=== Q3: 99% Confidence Interval for Tenure ===\n\n")
=== Q3: 99% Confidence Interval for Tenure ===
Formula: B3_hat +/- t(0.995, df) x SE(B3_hat)
t critical value (0.995, 931) : 2.5811
Margin of error : 2.5811 x 0.002587 = 0.006678
Lower Bound : 0.013375 - 0.006678 = 0.006697
Upper Bound : 0.013375 + 0.006678 = 0.020053
99% CI for tenure: [0.0067, 0.0201]
Formula: \[\hat{\beta}_3 \pm t_{0.995,\,931} \times SE(\hat{\beta}_3)\]
Substituting values: \[0.013375 \pm 2.5811 \times 0.002587\] \[0.013375 \pm 0.006678\]
\[\boxed{99\%\ CI: \ [0.0067, \ 0.0201]}\]
Comment: The interval does not include zero, confirming that tenure has a statistically significant positive effect on log(wage) even at the strict 1% significance level. Each additional year with the same employer increases wages by between 0.67% and 2.01%.
f_vals <- summary(model1)$fstatistic
F_stat <- f_vals[1]
F_df1 <- f_vals[2]
F_df2 <- f_vals[3]
F_crit_q4 <- qf(0.95, df1 = F_df1, df2 = F_df2)
p_F <- pf(F_stat, df1 = F_df1, df2 = F_df2, lower.tail = FALSE)
cat("=== Q4: Joint F-Test — All Variables (alpha = 5%) ===\n\n")
=== Q4: Joint F-Test — All Variables (alpha = 5%) ===
H0: B1 = B2 = B3 = 0 (no variable matters)
H1: At least one Bj is not zero
F-statistic : 56.9739
Numerator df (k) : 3
Denominator df (n-k-1) : 931
F critical value (5%) : 2.6145
p-value : 8.119592e-34
if (F_stat > F_crit_q4) {
  cat("Decision: REJECT H0\n")
  cat("Conclusion: All variables are jointly significant.\n")
} else {
  cat("Decision: FAIL TO REJECT H0\n")
}
Decision: REJECT H0
Conclusion: All variables are jointly significant.
Hypotheses: \[H_0: \beta_1 = \beta_2 = \beta_3 = 0\] \[H_1: \text{At least one } \beta_j \neq 0\]
Decision: \[F = 56.97 > F_{critical} = 2.6145 \quad \Rightarrow \textbf{Reject } H_0\]
Conclusion: At the 5% significance level, education, experience, and tenure are jointly statistically significant in explaining log(wage). The overall model is statistically meaningful.
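Because the null hypothesis sets all slope coefficients to zero, the overall F-statistic can also be recovered from \(R^2\) alone: \(F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}\). The sketch below applies this to the rounded values printed in the summary (\(R^2 = 0.1551\), \(n = 935\), \(k = 3\)); the variable names are illustrative, and the result should match the reported 56.97 only up to rounding of \(R^2\).

```r
r2_rounded <- 0.1551   # Multiple R-squared from the summary output
k_slopes   <- 3        # educ, exper, tenure
n_obs_m1   <- 935      # number of observations

df_den <- n_obs_m1 - k_slopes - 1                       # 931 residual df
F_from_r2 <- (r2_rounded / k_slopes) / ((1 - r2_rounded) / df_den)
p_from_r2 <- pf(F_from_r2, df1 = k_slopes, df2 = df_den, lower.tail = FALSE)

cat(sprintf("F from R-squared: %.2f (p-value %.3g)\n", F_from_r2, p_from_r2))
```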
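# Fit the house price model (call taken from the summary output below)
model2 <- lm(price ~ sqrft + bdrms, data = hprice1)
summary(model2)
Call: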
lm(formula = price ~ sqrft + bdrms, data = hprice1)
Residuals:
Min 1Q Median 3Q Max
-127.627 -42.876 -7.051 32.589 229.003
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.31500 31.04662 -0.622 0.536
sqrft 0.12844 0.01382 9.291 1.39e-14 ***
bdrms 15.19819 9.48352 1.603 0.113
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 63.04 on 85 degrees of freedom
Multiple R-squared: 0.6319, Adjusted R-squared: 0.6233
F-statistic: 72.96 on 2 and 85 DF, p-value: < 2.2e-16
b2 <- coef(model2)
se2b <- sqrt(diag(vcov(model2)))
n2 <- nobs(model2)
r2_2 <- summary(model2)$r.squared
cat("Estimated Regression Equation:\n")
Estimated Regression Equation:
================================
cat(sprintf(
  "price = %.4f + %.4f*sqrft + %.4f*bdrms\n",
  b2["(Intercept)"], b2["sqrft"], b2["bdrms"]
))
price = -19.3150 + 0.1284*sqrft + 15.1982*bdrms
SE: (31.0466) (0.0138) (9.4835)
n = 88, R-squared = 0.6319
\[\widehat{price} = \underset{(31.0466)}{-19.3150} + \underset{(0.0138)}{0.1284}\,sqrft + \underset{(9.4835)}{15.1982}\,bdrms\]
\[n = 88, \quad R^2 = 0.6319\]
Interpretation: The model explains 63.19% of variation in house prices. Square footage and bedrooms together account for a substantial portion of price differences across the 88 houses.
df2_m <- model2$df.residual
# Two-tailed test at alpha = 0.10: 5% in each tail, so use the 0.95 quantile
t_crit_q5 <- qt(0.95, df = df2_m)
cat("=== Q5(b): Individual Significance at 10% Level ===\n\n")
=== Q5(b): Individual Significance at 10% Level ===
Critical value (10%, df = 85): 1.6630
coef_table <- summary(model2)$coefficients
cat(sprintf("%-15s | %-8s | %-8s | %-8s | %s\n",
            "Variable", "t-stat", "p-value", "Decision", ""))
Variable | t-stat | p-value | Decision |
-----------------------------------------------------------------
for (v in rownames(coef_table)) {
  tv <- coef_table[v, "t value"]
  pv <- coef_table[v, "Pr(>|t|)"]
  # Two-tailed decision: compare |t| with the critical value
  dec <- ifelse(abs(tv) > t_crit_q5,
                "SIGNIFICANT", "not significant")
  cat(sprintf("%-15s | %-8.3f | %-8.4f | %s\n", v, tv, pv, dec))
}
(Intercept) | -0.622 | 0.5355 | not significant
sqrft | 9.291 | 0.0000 | SIGNIFICANT
bdrms | 1.603 | 0.1127 | not significant
How we reach this conclusion:
We compare each \(|t_{calculated}|\) to \(t_{critical} = 1.6630\) (two-tailed, \(\alpha = 0.10\), \(df = 85\)):
| Variable | t-stat | p-value | Significant at 10%? |
|---|---|---|---|
| Intercept | -0.622 | 0.536 | ❌ No |
| sqrft | 9.291 | 0.000 | ✅ Yes |
| bdrms | 1.603 | 0.113 | ❌ No |
Only sqrft is individually significant. The p-value for bdrms (0.113) exceeds \(\alpha = 0.10\), so we cannot reject \(H_0\) for bedrooms individually.
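The decision for bdrms depends on the form of the alternative. The p-values printed by `summary()` are two-sided, and comparing \(|t|\) with `qt(0.95, 85)` is the matching two-tailed test at 10%. The sketch below, using the rounded t-statistic from the table above, shows how the critical value and p-value would change under a one-tailed alternative \(H_1: \beta_2 > 0\); this one-tailed variant is an illustration only and is not part of the assignment's analysis.

```r
t_bdrms <- 1.603   # t-statistic for bdrms (rounded, from the coefficient table)
df_res  <- 85      # residual degrees of freedom

crit_two <- qt(0.95, df = df_res)   # two-tailed test, alpha = 0.10
crit_one <- qt(0.90, df = df_res)   # one-tailed test, alpha = 0.10

p_bdrms_two <- 2 * pt(abs(t_bdrms), df = df_res, lower.tail = FALSE)
p_bdrms_one <- pt(t_bdrms, df = df_res, lower.tail = FALSE)

cat(sprintf("two-tailed: crit = %.4f, p = %.4f\n", crit_two, p_bdrms_two))
cat(sprintf("one-tailed: crit = %.4f, p = %.4f\n", crit_one, p_bdrms_one))
```

Under the two-tailed test used here, bdrms falls short of the critical value. A one-tailed test has a lower critical value, so the choice of alternative should be fixed before looking at the estimate.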
# Unrestricted model R-squared
R2_unres <- summary(model2)$r.squared
# Restricted model (intercept only)
model2_res <- lm(price ~ 1, data = hprice1)
R2_res <- summary(model2_res)$r.squared
# Parameters
q_r <- 2 # number of restrictions
n_obs <- nobs(model2)
k_v <- 2 # number of predictors
# Manual F-statistic
F_num <- (R2_unres - R2_res) / q_r
F_den <- (1 - R2_unres) / (n_obs - k_v - 1)
F_manual <- F_num / F_den
F_crit_q5 <- qf(0.95, df1 = q_r, df2 = n_obs - k_v - 1)
cat("=== Q5(c): Joint F-Test for sqrft and bdrms (alpha = 5%) ===\n\n")
=== Q5(c): Joint F-Test for sqrft and bdrms (alpha = 5%) ===
H0: B1 = B2 = 0
H1: At least one of B1, B2 is not zero
Formula:
F = [(R2_unres - R2_res)/q] / [(1 - R2_unres)/(n - k - 1)]
R-squared unrestricted : 0.6319
R-squared restricted : 0.0000
q (restrictions) : 2
n (observations) : 88
Denominator df : 85
Numerator : (0.6319 - 0.0000) / 2 = 0.315959
Denominator : (1 - 0.6319) / 85 = 0.004330
F-statistic computed : 72.9635
F critical value (5%) : 3.1038
if(F_manual > F_crit_q5){
cat("\nDecision: REJECT H0\n")
cat("Conclusion: sqrft and bdrms are jointly significant.\n")
} else {
cat("\nDecision: FAIL TO REJECT H0\n")
}
Decision: REJECT H0
Conclusion: sqrft and bdrms are jointly significant.
Hypotheses: \[H_0: \beta_1 = \beta_2 = 0\] \[H_1: \beta_1 \neq 0 \text{ and/or } \beta_2 \neq 0\]
Formula: \[F = \frac{(R^2_{unres} - R^2_{res})/q}{(1 - R^2_{unres})/(n-k-1)}\]
Substituting values: \[F = \frac{(0.6319 - 0)/2}{(1 - 0.6319)/85} = \frac{0.315959}{0.0043304} \approx 72.96\]
Critical value (\(\alpha = 0.05\), \(df_1 = 2\), \(df_2 = 85\)): \[F_{critical} = 3.1038\]
Decision: \[72.96 > 3.1038 \quad \Rightarrow \textbf{Reject } H_0\]
Conclusion: At the 5% significance level, we reject \(H_0\). Even though bdrms is not individually significant, sqrft and bdrms are jointly statistically significant in explaining house prices. Dropping both variables from the model would make it significantly worse.
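The same restricted-versus-unrestricted comparison can be run with `anova()` on two nested `lm` fits, which avoids computing the statistic by hand. The sketch below uses the built-in `mtcars` data purely as a stand-in so that it runs without the wooldridge package; the variables `mpg`, `wt` and `hp` are assumptions for illustration and have nothing to do with hprice1. For the model in this assignment the corresponding call would be `anova(model2_res, model2)`.

```r
# Stand-in data: mtcars ships with R; these are NOT the hprice1 variables
unres <- lm(mpg ~ wt + hp, data = mtcars)   # unrestricted model
res   <- lm(mpg ~ 1,       data = mtcars)   # restricted model (intercept only)

# F-test of the q = 2 exclusion restrictions
cmp <- anova(res, unres)
F_anova <- cmp$F[2]

# The same statistic from summary() and from the R-squared formula
F_summary <- unname(summary(unres)$fstatistic["value"])
r2_unres  <- summary(unres)$r.squared
F_r2      <- (r2_unres / 2) / ((1 - r2_unres) / unres$df.residual)

print(cmp)
cat(sprintf("anova: %.4f  summary: %.4f  R-squared form: %.4f\n",
            F_anova, F_summary, F_r2))
```

When the restricted model contains only an intercept, all three routes give the same F-statistic. `anova()` becomes the more convenient choice when only a subset of the coefficients is restricted.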
| Question | Test | Decision | Finding |
|---|---|---|---|
| Q1(e) | t-test: educ (5%, one-tail) | Reject H0 | Education increases wages |
| Q1(f) | p-value: educ | Reject H0 | Education increases wages |
| Q1(g) | 95% CI: educ | [0.0621, 0.0876] | Zero not in interval — significant |
| Q2(b) | t-test: exper (10%, one-tail) | Reject H0 | Experience increases wages |
| Q3 | 99% CI: tenure | [0.0067, 0.0201] | Zero not in interval — significant |
| Q4 | Joint F-test (5%) | Reject H0 | All 3 variables jointly significant |
| Q5(b) | Individual sig (10%) | sqrft only | Only sqrft significant individually |
| Q5(c) | Joint F-test: sqrft & bdrms (5%) | Reject H0 | Both jointly significant |
End of Assignment — Brian Okumu | BE01/1011/2023