1 Setup: Load Package and Data

# Install wooldridge if not already installed
if (!require(wooldridge)) install.packages("wooldridge")
library(wooldridge)

# Load datasets
data("wage2")
data("hprice1")

2 Questions 1–4: wage2 Dataset

2.1 Econometric Model

The model to be estimated is:

\[\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure + u\]

Where:

  • \(\log(wage)\) = natural log of monthly wage
  • \(educ\) = years of education
  • \(exper\) = years of work experience
  • \(tenure\) = years with current employer
  • \(u\) = error term

2.2 Estimate the Regression Model

model1 <- lm(lwage ~ educ + exper + tenure, data = wage2)
summary(model1)

Call:
lm(formula = lwage ~ educ + exper + tenure, data = wage2)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.8282 -0.2401  0.0203  0.2569  1.3400 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 5.496696   0.110528  49.731  < 2e-16 ***
educ        0.074864   0.006512  11.495  < 2e-16 ***
exper       0.015328   0.003370   4.549 6.10e-06 ***
tenure      0.013375   0.002587   5.170 2.87e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3877 on 931 degrees of freedom
Multiple R-squared:  0.1551,    Adjusted R-squared:  0.1524 
F-statistic: 56.97 on 3 and 931 DF,  p-value: < 2.2e-16

3 Question 1: Education and log(wage)

3.1 Q1(a) — Two-Sided Hypothesis

Hypothesis that education has no effect vs. some effect:

\[H_0: \beta_1 = 0\] \[H_1: \beta_1 \neq 0\]

This is a two-tailed test. We make no prior assumption about the direction of education’s effect on wages.


3.2 Q1(b) — One-Sided Positive Hypothesis

Hypothesis that education has no effect vs. a positive effect:

\[H_0: \beta_1 = 0\] \[H_1: \beta_1 > 0\]

This is a one-tailed (right) test. We expect education to increase wages.


3.3 Q1(c) — One-Sided Negative Hypothesis

Hypothesis that education has no effect vs. a negative effect:

\[H_0: \beta_1 = 0\] \[H_1: \beta_1 < 0\]

This is a one-tailed (left) test. We test the unlikely case that more education lowers wages.
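The three alternatives differ only in where the rejection region sits. As a minimal sketch, assuming the 931 residual degrees of freedom reported in the regression output and a 5% significance level (the variable names are illustrative, not part of the assignment code), the corresponding critical values can be computed with `qt()`:

```r
# Rejection regions for the three alternatives in Q1(a)-(c).
# Assumptions: df = 931 (from the regression output) and alpha = 0.05.
df_resid <- 931
alpha    <- 0.05

crit_two_sided <- qt(1 - alpha / 2, df = df_resid)  # reject if |t| > this value
crit_right     <- qt(1 - alpha,     df = df_resid)  # reject if t > this value
crit_left      <- qt(alpha,         df = df_resid)  # reject if t < this value

cat(sprintf("Two-sided : reject H0 if |t| > %.4f\n", crit_two_sided))
cat(sprintf("Right tail: reject H0 if  t  > %.4f\n", crit_right))
cat(sprintf("Left tail : reject H0 if  t  < %.4f\n", crit_left))
```

Because the t distribution is symmetric, the left-tail critical value is the negative of the right-tail one, and the two-sided cutoff is larger in absolute value than either one-sided cutoff.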


3.4 Q1(d) — Full Regression Results

# Extract coefficients and standard errors
b  <- coef(model1)
se <- sqrt(diag(vcov(model1)))
n  <- nobs(model1)
r2 <- summary(model1)$r.squared

cat("Estimated Regression Equation:\n")
Estimated Regression Equation:
cat("================================\n")
================================
cat(sprintf(
  "log(wage) = %.4f + %.4f*educ + %.4f*exper + %.4f*tenure\n",
  b["(Intercept)"], b["educ"], b["exper"], b["tenure"]
))
log(wage) = 5.4967 + 0.0749*educ + 0.0153*exper + 0.0134*tenure
cat(sprintf(
  "SE:         (%.4f)    (%.4f)       (%.4f)       (%.4f)\n",
  se["(Intercept)"], se["educ"], se["exper"], se["tenure"]
))
SE:         (0.1105)    (0.0065)       (0.0034)       (0.0026)
cat(sprintf("n = %d,  R-squared = %.4f\n", n, r2))
n = 935,  R-squared = 0.1551

The full estimated model with standard errors in parentheses is:

\[\widehat{\log(wage)} = \underset{(0.1105)}{5.4967} + \underset{(0.0065)}{0.0749}\,educ + \underset{(0.0034)}{0.0153}\,exper + \underset{(0.0026)}{0.0134}\,tenure\]

\[n = 935, \quad R^2 = 0.1551\]


3.5 Q1(e) — t-Test for Education at 5% Significance (One-Tailed)

# Extract values for education
beta1     <- b["educ"]
se1       <- se["educ"]
df        <- model1$df.residual

# Step 1: Compute t-statistic
t_educ    <- beta1 / se1

# Step 2: Critical value (one-tailed, alpha = 0.05, df = 931)
t_crit_05 <- qt(0.95, df = df)

# Step 3: One-tailed p-value
p_one     <- pt(t_educ, df = df, lower.tail = FALSE)

cat("=== Q1(e): t-Test for Education (One-Tailed, alpha = 5%) ===\n\n")
=== Q1(e): t-Test for Education (One-Tailed, alpha = 5%) ===
cat(sprintf("Coefficient on educ  : %.6f\n", beta1))
Coefficient on educ  : 0.074864
cat(sprintf("Standard Error       : %.6f\n", se1))
Standard Error       : 0.006512
cat(sprintf("t-statistic          : %.6f / %.6f = %.3f\n",
            beta1, se1, t_educ))
t-statistic          : 0.074864 / 0.006512 = 11.495
cat(sprintf("Critical value (5%%) : %.4f\n", t_crit_05))
Critical value (5%) : 1.6465
cat(sprintf("p-value (one-tailed) : %e\n\n", p_one))
p-value (one-tailed) : 5.406440e-29
if(t_educ > t_crit_05){
  cat("Decision: REJECT H0\n")
  cat("Conclusion: Education has a significant positive effect on wages.\n")
} else {
  cat("Decision: FAIL TO REJECT H0\n")
}
Decision: REJECT H0
Conclusion: Education has a significant positive effect on wages.

Step-by-Step Workings:

Step 1 — Hypotheses: \[H_0: \beta_1 = 0 \qquad H_1: \beta_1 > 0\]

Step 2 — t-statistic: \[t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{0.074864}{0.006512} = 11.495\]

Step 3 — Critical value (one-tailed, \(\alpha = 0.05\), \(df = 931\)): \[t_{critical} = 1.6465\]

Step 4 — Decision: \[11.495 > 1.6465 \quad \Rightarrow \textbf{Reject } H_0\]

Technical Conclusion: At the 5% significance level, we reject \(H_0\) and conclude that education has a statistically significant positive effect on log(wage).

Plain Language: There is very strong evidence that more years of education are associated with higher wages. Specifically, each additional year of education raises predicted wages by approximately 7.49%, holding experience and tenure constant.
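The 7.49% figure uses the usual approximation that the percentage change in wage is about 100 times the coefficient, which works well for small coefficients. A minimal sketch of the exact log-level conversion, \(100 \times (e^{\hat{\beta}_1} - 1)\), using the rounded coefficient from the output above (variable names are illustrative):

```r
# Approximate vs. exact percentage effect in a log-level model.
# The coefficient is copied from the regression output above.
beta_educ <- 0.074864

approx_pct <- 100 * beta_educ               # usual approximation: 100 * beta
exact_pct  <- 100 * (exp(beta_educ) - 1)    # exact change implied by the log model

cat(sprintf("Approximate effect: %.2f%%\n", approx_pct))
cat(sprintf("Exact effect      : %.2f%%\n", exact_pct))
```

The exact figure is slightly larger (about 7.8%), and the gap widens as the coefficient grows.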


3.6 Q1(f) — Conclusion Using p-value

cat("=== Q1(f): p-value Approach ===\n\n")
=== Q1(f): p-value Approach ===
cat(sprintf("One-tailed p-value : %e\n", p_one))
One-tailed p-value : 5.406440e-29
cat("Alpha              : 0.05\n\n")
Alpha              : 0.05
if(p_one < 0.05){
  cat("Since p-value < 0.05, we REJECT H0.\n")
  cat("Conclusion: Education has a significant positive effect on wages.\n")
} else {
  cat("Since p-value > 0.05, we FAIL TO REJECT H0.\n")
}
Since p-value < 0.05, we REJECT H0.
Conclusion: Education has a significant positive effect on wages.

Conclusion: The one-tailed p-value (\(p \approx 5.41 \times 10^{-29}\)) is far below \(\alpha = 0.05\). We reject \(H_0\) and confirm that education has a significant positive effect on wages.
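`summary()` reports two-sided p-values, so a one-sided p-value has to be derived separately. A small sketch of the relationship between the two, using the rounded t-statistic and degrees of freedom reported above (variable names are illustrative):

```r
# Relationship between one-sided and two-sided p-values for a positive t-statistic.
# Values are copied (rounded) from the output above.
t_stat   <- 11.495
df_resid <- 931

p_two_sided <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)
p_right     <- pt(t_stat, df = df_resid, lower.tail = FALSE)

# When the estimate has the sign predicted by H1, the one-sided p-value
# is half of the two-sided one.
cat(sprintf("Two-sided p-value : %e\n", p_two_sided))
cat(sprintf("Right-tail p-value: %e\n", p_right))
```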


3.7 Q1(g) — 95% Confidence Interval for Education

# Critical value for 95% CI (two-tailed)
t_crit_975 <- qt(0.975, df = df)
ci_lo      <- beta1 - t_crit_975 * se1
ci_hi      <- beta1 + t_crit_975 * se1

cat("=== Q1(g): 95% Confidence Interval for Education ===\n\n")
=== Q1(g): 95% Confidence Interval for Education ===
cat("Formula: B1_hat +/- t(0.975, df) x SE(B1_hat)\n\n")
Formula: B1_hat +/- t(0.975, df) x SE(B1_hat)
cat(sprintf("t critical value (0.975, 931) : %.4f\n", t_crit_975))
t critical value (0.975, 931) : 1.9625
cat(sprintf("Margin of error               : %.4f x %.6f = %.6f\n",
            t_crit_975, se1, t_crit_975 * se1))
Margin of error               : 1.9625 x 0.006512 = 0.012781
cat(sprintf("Lower Bound : %.6f - %.6f = %.6f\n",
            beta1, t_crit_975 * se1, ci_lo))
Lower Bound : 0.074864 - 0.012781 = 0.062083
cat(sprintf("Upper Bound : %.6f + %.6f = %.6f\n",
            beta1, t_crit_975 * se1, ci_hi))
Upper Bound : 0.074864 + 0.012781 = 0.087645
cat(sprintf("\n95%% CI for education: [%.4f, %.4f]\n", ci_lo, ci_hi))

95% CI for education: [0.0621, 0.0876]

Formula: \[\hat{\beta}_1 \pm t_{0.975,\,931} \times SE(\hat{\beta}_1)\]

Substituting values: \[0.074864 \pm 1.9625 \times 0.006512\] \[0.074864 \pm 0.012781\]

\[\boxed{95\%\ CI: \ [0.0621, \ 0.0876]}\]

Comment: The confidence interval does not contain zero, which confirms that education has a statistically significant effect on log(wage) at the 5% level (two-sided). We are 95% confident that one additional year of education increases log(wage) by between 0.0621 and 0.0876, that is, raises wages by roughly 6.21% to 8.76%.
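The manual interval can be cross-checked against R's built-in `confint()`. The sketch below does this on the built-in `mtcars` data purely so that it runs without the wooldridge package; the model and variable names are stand-ins, not part of the assignment.

```r
# Cross-check a hand-computed confidence interval against confint().
# Assumption: mtcars (shipped with base R) stands in for wage2.
fit <- lm(mpg ~ wt + hp, data = mtcars)

est    <- coef(fit)["wt"]
se_wt  <- sqrt(diag(vcov(fit)))["wt"]
t_crit <- qt(0.975, df = fit$df.residual)

manual_ci  <- unname(c(est - t_crit * se_wt, est + t_crit * se_wt))
builtin_ci <- unname(confint(fit, "wt", level = 0.95)[1, ])

print(manual_ci)
print(builtin_ci)
```

By the same logic, `confint(model1, "educ", level = 0.95)` should reproduce the interval computed by hand above.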


4 Question 2: Experience and log(wage)

4.1 Q2(a) — State the Hypothesis

\[H_0: \beta_2 = 0\] \[H_1: \beta_2 > 0\]

We expect experience to increase wages. This is a one-tailed (right) test.


4.2 Q2(b) — t-Test at 10% Significance Level

beta2      <- b["exper"]
se2        <- se["exper"]
t_exper    <- beta2 / se2
t_crit_10  <- qt(0.90, df = df)
p_exper    <- pt(t_exper, df = df, lower.tail = FALSE)

cat("=== Q2(b): t-Test for Experience (One-Tailed, alpha = 10%) ===\n\n")
=== Q2(b): t-Test for Experience (One-Tailed, alpha = 10%) ===
cat(sprintf("Coefficient on exper  : %.6f\n", beta2))
Coefficient on exper  : 0.015328
cat(sprintf("Standard Error        : %.6f\n", se2))
Standard Error        : 0.003370
cat(sprintf("t-statistic           : %.6f / %.6f = %.3f\n",
            beta2, se2, t_exper))
t-statistic           : 0.015328 / 0.003370 = 4.549
cat(sprintf("Critical value (10%%) : %.4f\n", t_crit_10))
Critical value (10%) : 1.2825
cat(sprintf("p-value (one-tailed)  : %e\n\n", p_exper))
p-value (one-tailed)  : 3.049992e-06
if(t_exper > t_crit_10){
  cat("Decision: REJECT H0\n")
  cat("Conclusion: Experience has a significant positive effect on wages.\n")
} else {
  cat("Decision: FAIL TO REJECT H0\n")
}
Decision: REJECT H0
Conclusion: Experience has a significant positive effect on wages.

Step-by-Step Workings:

Step 1 — Hypotheses: \[H_0: \beta_2 = 0 \qquad H_1: \beta_2 > 0\]

Step 2 — t-statistic: \[t = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} = \frac{0.015328}{0.003370} = 4.549\]

Step 3 — Critical value (one-tailed, \(\alpha = 0.10\), \(df = 931\)): \[t_{critical} = 1.2825\]

Step 4 — Decision: \[4.549 > 1.2825 \quad \Rightarrow \textbf{Reject } H_0\]

Technical Conclusion: At the 10% significance level, we reject \(H_0\) and conclude that experience has a statistically significant positive effect on log(wage).

Plain Language: Workers with more years of experience earn significantly higher wages. Each additional year of experience increases wages by approximately 1.53%, holding education and tenure constant.
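Q1(e) and Q2(b) follow the same recipe, so the steps can be wrapped in a small helper. This is only a sketch: `right_tail_t_test()` is a hypothetical function written for illustration, and the inputs are the rounded values from the output above.

```r
# Hypothetical helper: right-tailed t-test of H0: beta = 0 vs H1: beta > 0.
right_tail_t_test <- function(beta_hat, se, df, alpha) {
  t_stat <- beta_hat / se
  t_crit <- qt(1 - alpha, df = df)
  list(
    t_stat  = t_stat,
    t_crit  = t_crit,
    p_value = pt(t_stat, df = df, lower.tail = FALSE),
    reject  = t_stat > t_crit
  )
}

# Experience coefficient at the 10% level (rounded values from the output)
res_exper <- right_tail_t_test(beta_hat = 0.015328, se = 0.003370,
                               df = 931, alpha = 0.10)
cat(sprintf("t = %.3f, critical value = %.4f, reject H0: %s\n",
            res_exper$t_stat, res_exper$t_crit, res_exper$reject))
```

Because the inputs are rounded, the t-statistic computed here can differ from the reported 4.549 in the third decimal place.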


5 Question 3: Tenure — 99% Confidence Interval

beta3       <- b["tenure"]
se3         <- se["tenure"]
t_crit_995  <- qt(0.995, df = df)
ci_lo_ten   <- beta3 - t_crit_995 * se3
ci_hi_ten   <- beta3 + t_crit_995 * se3

cat("=== Q3: 99% Confidence Interval for Tenure ===\n\n")
=== Q3: 99% Confidence Interval for Tenure ===
cat("Formula: B3_hat +/- t(0.995, df) x SE(B3_hat)\n\n")
Formula: B3_hat +/- t(0.995, df) x SE(B3_hat)
cat(sprintf("t critical value (0.995, 931) : %.4f\n", t_crit_995))
t critical value (0.995, 931) : 2.5811
cat(sprintf("Margin of error               : %.4f x %.6f = %.6f\n",
            t_crit_995, se3, t_crit_995 * se3))
Margin of error               : 2.5811 x 0.002587 = 0.006678
cat(sprintf("Lower Bound : %.6f - %.6f = %.6f\n",
            beta3, t_crit_995 * se3, ci_lo_ten))
Lower Bound : 0.013375 - 0.006678 = 0.006697
cat(sprintf("Upper Bound : %.6f + %.6f = %.6f\n",
            beta3, t_crit_995 * se3, ci_hi_ten))
Upper Bound : 0.013375 + 0.006678 = 0.020053
cat(sprintf("\n99%% CI for tenure: [%.4f, %.4f]\n",
            ci_lo_ten, ci_hi_ten))

99% CI for tenure: [0.0067, 0.0201]

Formula: \[\hat{\beta}_3 \pm t_{0.995,\,931} \times SE(\hat{\beta}_3)\]

Substituting values: \[0.013375 \pm 2.5811 \times 0.002587\] \[0.013375 \pm 0.006678\]

\[\boxed{99\%\ CI: \ [0.0067, \ 0.0201]}\]

Comment: The interval does not include zero, confirming that tenure has a statistically significant effect on log(wage) even at the strict 1% significance level. We are 99% confident that each additional year with the same employer raises wages by roughly 0.67% to 2.01%, holding education and experience constant.
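A confidence interval and a two-sided test carry the same information: the 99% interval excludes zero exactly when \(|t|\) exceeds the 0.995 quantile. A brief sketch using the rounded tenure estimates from the output above (variable names are illustrative):

```r
# Duality between a 99% CI and a two-sided test at the 1% level.
# Rounded values copied from the output above.
beta_tenure <- 0.013375
se_tenure   <- 0.002587
df_resid    <- 931

t_crit <- qt(0.995, df = df_resid)
t_stat <- beta_tenure / se_tenure
ci     <- beta_tenure + c(-1, 1) * t_crit * se_tenure

ci_excludes_zero <- ci[1] > 0 || ci[2] < 0
rejects_at_1pct  <- abs(t_stat) > t_crit

cat(sprintf("t = %.3f, critical value = %.4f\n", t_stat, t_crit))
cat(sprintf("CI excludes zero: %s, test rejects: %s\n",
            ci_excludes_zero, rejects_at_1pct))
```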


6 Question 4: Joint F-Test

f_vals     <- summary(model1)$fstatistic
F_stat     <- f_vals[1]
F_df1      <- f_vals[2]
F_df2      <- f_vals[3]
F_crit_q4  <- qf(0.95, df1 = F_df1, df2 = F_df2)
p_F        <- pf(F_stat, df1 = F_df1, df2 = F_df2, lower.tail = FALSE)

cat("=== Q4: Joint F-Test — All Variables (alpha = 5%) ===\n\n")
=== Q4: Joint F-Test — All Variables (alpha = 5%) ===
cat("H0: B1 = B2 = B3 = 0  (no variable matters)\n")
H0: B1 = B2 = B3 = 0  (no variable matters)
cat("H1: At least one Bj is not zero\n\n")
H1: At least one Bj is not zero
cat(sprintf("F-statistic            : %.4f\n", F_stat))
F-statistic            : 56.9739
cat(sprintf("Numerator df (k)       : %.0f\n", F_df1))
Numerator df (k)       : 3
cat(sprintf("Denominator df (n-k-1) : %.0f\n", F_df2))
Denominator df (n-k-1) : 931
cat(sprintf("F critical value (5%%) : %.4f\n", F_crit_q4))
F critical value (5%) : 2.6145
cat(sprintf("p-value                : %e\n\n", p_F))
p-value                : 8.119592e-34
if(F_stat > F_crit_q4){
  cat("Decision: REJECT H0\n")
  cat("Conclusion: All variables are jointly significant.\n")
} else {
  cat("Decision: FAIL TO REJECT H0\n")
}
Decision: REJECT H0
Conclusion: All variables are jointly significant.

Hypotheses: \[H_0: \beta_1 = \beta_2 = \beta_3 = 0\] \[H_1: \text{At least one } \beta_j \neq 0\]

Decision: \[F = 56.97 > F_{critical} = 2.6145 \quad \Rightarrow \textbf{Reject } H_0\]

Conclusion: At the 5% significance level, education, experience, and tenure are jointly statistically significant in explaining log(wage). The overall model is statistically meaningful.
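For the overall-significance test, the F-statistic reported by `summary()` can be recovered from \(R^2\) alone, since the restricted (intercept-only) model has \(R^2 = 0\). A minimal sketch using the rounded values reported above (variable names are illustrative), so the result matches 56.97 only up to rounding:

```r
# Overall F-statistic from R-squared: F = (R2 / k) / ((1 - R2) / (n - k - 1)).
# Rounded values copied from the regression output above.
r2 <- 0.1551
n  <- 935
k  <- 3

F_from_r2 <- (r2 / k) / ((1 - r2) / (n - k - 1))
F_crit    <- qf(0.95, df1 = k, df2 = n - k - 1)

cat(sprintf("F from R-squared: %.2f (critical value %.4f)\n", F_from_r2, F_crit))
```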


7 Question 5: hprice1 Dataset

7.1 Estimate the Regression Model

model2 <- lm(price ~ sqrft + bdrms, data = hprice1)
summary(model2)

Call:
lm(formula = price ~ sqrft + bdrms, data = hprice1)

Residuals:
     Min       1Q   Median       3Q      Max 
-127.627  -42.876   -7.051   32.589  229.003 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -19.31500   31.04662  -0.622    0.536    
sqrft         0.12844    0.01382   9.291 1.39e-14 ***
bdrms        15.19819    9.48352   1.603    0.113    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 63.04 on 85 degrees of freedom
Multiple R-squared:  0.6319,    Adjusted R-squared:  0.6233 
F-statistic: 72.96 on 2 and 85 DF,  p-value: < 2.2e-16

7.2 Q5(a) — Full Estimated Regression Equation

b2   <- coef(model2)
se2b <- sqrt(diag(vcov(model2)))
n2   <- nobs(model2)
r2_2 <- summary(model2)$r.squared

cat("Estimated Regression Equation:\n")
Estimated Regression Equation:
cat("================================\n")
================================
cat(sprintf(
  "price = %.4f + %.4f*sqrft + %.4f*bdrms\n",
  b2["(Intercept)"], b2["sqrft"], b2["bdrms"]
))
price = -19.3150 + 0.1284*sqrft + 15.1982*bdrms
cat(sprintf(
  "SE:    (%.4f)      (%.4f)        (%.4f)\n",
  se2b["(Intercept)"], se2b["sqrft"], se2b["bdrms"]
))
SE:    (31.0466)      (0.0138)        (9.4835)
cat(sprintf("n = %d,  R-squared = %.4f\n", n2, r2_2))
n = 88,  R-squared = 0.6319

\[\widehat{price} = \underset{(31.0466)}{-19.3150} + \underset{(0.0138)}{0.1284}\,sqrft + \underset{(9.4835)}{15.1982}\,bdrms\]

\[n = 88, \quad R^2 = 0.6319\]

Interpretation: The model explains 63.19% of variation in house prices. Square footage and bedrooms together account for a substantial portion of price differences across the 88 houses.
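To make the coefficients concrete, the sketch below plugs a hypothetical house into the fitted equation. The 2,000 sq ft, 3-bedroom house is an invented example, and reading `price` in thousands of dollars is an assumption based on the wooldridge package's description of `hprice1`.

```r
# Predicted price for a hypothetical house using the rounded coefficients above.
# Assumption: price is measured in thousands of dollars.
b_intercept <- -19.31500
b_sqrft     <- 0.12844
b_bdrms     <- 15.19819

sqrft_new <- 2000  # hypothetical size in square feet
bdrms_new <- 3     # hypothetical number of bedrooms

price_hat <- b_intercept + b_sqrft * sqrft_new + b_bdrms * bdrms_new
cat(sprintf("Predicted price: %.2f (thousand dollars)\n", price_hat))
```

With the fitted model available, `predict(model2, newdata = data.frame(sqrft = 2000, bdrms = 3))` should give the same value up to rounding.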


7.3 Q5(b) — Individual Significance at 10%

df2_m     <- model2$df.residual
# Two-sided test at alpha = 0.10: 0.05 in each tail, so use the 0.95 quantile
t_crit_q5 <- qt(0.95, df = df2_m)

cat("=== Q5(b): Individual Significance at 10% Level ===\n\n")
=== Q5(b): Individual Significance at 10% Level ===
cat(sprintf("Critical value (10%%, df = 85): %.4f\n\n", t_crit_q5))
Critical value (10%, df = 85): 1.6630
coef_table <- summary(model2)$coefficients

cat(sprintf("%-15s | %-8s | %-8s | %-8s | %s\n",
            "Variable", "t-stat", "p-value", "Decision", ""))
Variable        | t-stat   | p-value  | Decision | 
cat(rep("-", 65), "\n", sep = "")
-----------------------------------------------------------------
for(v in rownames(coef_table)){
  tv  <- coef_table[v, "t value"]
  pv  <- coef_table[v, "Pr(>|t|)"]
  dec <- ifelse(abs(tv) > t_crit_q5,
                "SIGNIFICANT", "not significant")
  cat(sprintf("%-15s | %-8.3f | %-8.4f | %s\n", v, tv, pv, dec))
}
(Intercept)     | -0.622   | 0.5355   | not significant
sqrft           | 9.291    | 0.0000   | SIGNIFICANT
bdrms           | 1.603    | 0.1127   | not significant

How we reach this conclusion:

We compare each \(|t_{calculated}|\) to \(t_{critical} = 1.6630\) (two-tailed, \(\alpha = 0.10\), \(df = 85\), i.e. the 0.95 quantile of the t distribution):

Variable  | t-stat | p-value | Significant at 10%?
----------|--------|---------|--------------------
Intercept | -0.622 | 0.536   | ❌ No
sqrft     | 9.291  | < 0.001 | ✅ Yes
bdrms     | 1.603  | 0.113   | ❌ No

Only sqrft is individually significant. The p-value for bdrms (0.113) exceeds \(\alpha = 0.10\), so we cannot reject \(H_0\) for bedrooms individually.
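The verdict on bdrms depends on the alternative hypothesis. The p-values reported by `summary()` are two-sided; as a hedged illustration (not part of the original question), the sketch below shows how the same t-statistic fares under a right-tailed alternative, using the rounded t-value and degrees of freedom from the output (variable names are illustrative):

```r
# bdrms at the 10% level: two-sided vs. right-tailed alternative.
# Rounded values copied from the output above.
t_bdrms  <- 1.603
df_resid <- 85
alpha    <- 0.10

p_two_sided <- 2 * pt(abs(t_bdrms), df = df_resid, lower.tail = FALSE)
p_right     <- pt(t_bdrms, df = df_resid, lower.tail = FALSE)

cat(sprintf("Two-sided p-value : %.4f (reject at 10%%: %s)\n",
            p_two_sided, p_two_sided < alpha))
cat(sprintf("Right-tail p-value: %.4f (reject at 10%%: %s)\n",
            p_right, p_right < alpha))
```

Under a two-sided alternative bdrms is not significant at 10%, whereas a right-tailed test would reject; the alternative should be chosen before looking at the data.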


7.4 Q5(c) — Joint F-Test for sqrft and bdrms

# Unrestricted model R-squared
R2_unres   <- summary(model2)$r.squared

# Restricted model (intercept only)
model2_res <- lm(price ~ 1, data = hprice1)
R2_res     <- summary(model2_res)$r.squared

# Parameters
q_r   <- 2       # number of restrictions
n_obs <- nobs(model2)
k_v   <- 2       # number of predictors

# Manual F-statistic
F_num     <- (R2_unres - R2_res) / q_r
F_den     <- (1 - R2_unres) / (n_obs - k_v - 1)
F_manual  <- F_num / F_den
F_crit_q5 <- qf(0.95, df1 = q_r, df2 = n_obs - k_v - 1)

cat("=== Q5(c): Joint F-Test for sqrft and bdrms (alpha = 5%) ===\n\n")
=== Q5(c): Joint F-Test for sqrft and bdrms (alpha = 5%) ===
cat("H0: B1 = B2 = 0\n")
H0: B1 = B2 = 0
cat("H1: At least one of B1, B2 is not zero\n\n")
H1: At least one of B1, B2 is not zero
cat("Formula:\n")
Formula:
cat("F = [(R2_unres - R2_res)/q] / [(1 - R2_unres)/(n - k - 1)]\n\n")
F = [(R2_unres - R2_res)/q] / [(1 - R2_unres)/(n - k - 1)]
cat(sprintf("R-squared unrestricted : %.4f\n", R2_unres))
R-squared unrestricted : 0.6319
cat(sprintf("R-squared restricted   : %.4f\n", R2_res))
R-squared restricted   : 0.0000
cat(sprintf("q (restrictions)       : %d\n", q_r))
q (restrictions)       : 2
cat(sprintf("n (observations)       : %d\n", n_obs))
n (observations)       : 88
cat(sprintf("Denominator df         : %d\n", n_obs - k_v - 1))
Denominator df         : 85
cat(sprintf("\nNumerator   : (%.4f - %.4f) / %d = %.6f\n",
            R2_unres, R2_res, q_r, F_num))

Numerator   : (0.6319 - 0.0000) / 2 = 0.315959
cat(sprintf("Denominator : (1 - %.4f) / %d = %.6f\n",
            R2_unres, n_obs - k_v - 1, F_den))
Denominator : (1 - 0.6319) / 85 = 0.004330
cat(sprintf("\nF-statistic computed   : %.4f\n", F_manual))

F-statistic computed   : 72.9635
cat(sprintf("F critical value (5%%) : %.4f\n", F_crit_q5))
F critical value (5%) : 3.1038
if(F_manual > F_crit_q5){
  cat("\nDecision: REJECT H0\n")
  cat("Conclusion: sqrft and bdrms are jointly significant.\n")
} else {
  cat("\nDecision: FAIL TO REJECT H0\n")
}

Decision: REJECT H0
Conclusion: sqrft and bdrms are jointly significant.

Hypotheses: \[H_0: \beta_1 = \beta_2 = 0\] \[H_1: \beta_1 \neq 0 \text{ and/or } \beta_2 \neq 0\]

Formula: \[F = \frac{(R^2_{unres} - R^2_{res})/q}{(1 - R^2_{unres})/(n-k-1)}\]

Substituting values: \[F = \frac{(0.6319 - 0)/2}{(1 - 0.6319)/85} = \frac{0.315959}{0.004330} = 72.96\]

Critical value (\(\alpha = 0.05\), \(df_1 = 2\), \(df_2 = 85\)): \[F_{critical} = 3.1038\]

Decision: \[72.96 > 3.1038 \quad \Rightarrow \textbf{Reject } H_0\]

Conclusion: At the 5% significance level, we reject \(H_0\). Even though bdrms is not individually significant, sqrft and bdrms are jointly statistically significant in explaining house prices. Dropping both variables from the model would make it significantly worse.
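The \(R^2\)-based F-statistic can also be obtained by comparing nested models with `anova()`. The sketch below demonstrates the equivalence on the built-in `mtcars` data so that it runs without the wooldridge package; the variables are stand-ins for `price`, `sqrft`, and `bdrms`.

```r
# R-squared form of the F-statistic vs. anova() on nested models.
# Assumption: mtcars (shipped with base R) stands in for hprice1.
unres <- lm(mpg ~ wt + hp, data = mtcars)  # unrestricted model
res   <- lm(mpg ~ 1,       data = mtcars)  # restricted model (intercept only)

r2_u <- summary(unres)$r.squared
r2_r <- summary(res)$r.squared             # zero for an intercept-only model
q    <- 2                                  # number of restrictions
df_u <- unres$df.residual                  # n - k - 1

F_manual <- ((r2_u - r2_r) / q) / ((1 - r2_u) / df_u)
F_anova  <- anova(res, unres)$F[2]

cat(sprintf("Manual F: %.4f, anova() F: %.4f\n", F_manual, F_anova))
```

Applied to the assignment's models, `anova(model2_res, model2)` should report the same F-statistic as the manual calculation.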


8 Overall Summary of Results

Summary: Brian Okumu — BE01/1011/2023
Question | Test                              | Decision / Interval | Finding
---------|-----------------------------------|---------------------|------------------------------------
Q1(e)    | t-test: educ (5%, one-tail)       | Reject H0           | Education increases wages
Q1(f)    | p-value: educ                     | Reject H0           | Education increases wages
Q1(g)    | 95% CI: educ                      | [0.0621, 0.0876]    | Zero not in interval; significant
Q2(b)    | t-test: exper (10%, one-tail)     | Reject H0           | Experience increases wages
Q3       | 99% CI: tenure                    | [0.0067, 0.0201]    | Zero not in interval; significant
Q4       | Joint F-test (5%)                 | Reject H0           | All 3 variables jointly significant
Q5(b)    | Individual sig (10%)              | sqrft only          | Only sqrft significant individually
Q5(c)    | Joint F-test: sqrft & bdrms (5%)  | Reject H0           | Both jointly significant
