Stat 115s (Introduction to Econometrics)

Lesson 2.2- Multiple Linear Regression: Inference

Author

Norberto E. Milla, Jr.

Published

April 21, 2023

1 Sampling distributions of OLS estimators

  • To conduct statistical inference (hypothesis tests, confidence intervals), we need to know the sampling distributions of the \hat{\beta}_j's, in addition to their expected values and variances.

  • To do this we assume that the error term is normally distributed. Under the Gauss-Markov assumptions alone, the sampling distributions of the OLS estimators can have virtually any shape.

Assumption MLR.6: Normality

Population error term u is independent of the explanatory variables and follows a normal distribution with mean 0 and variance \sigma^2, that is,

u \sim N(0, \sigma^2)

  • Normality assumption is stronger than the previous assumptions.

  • Assumption MLR.6 implies that MLR.4, Zero conditional mean, and MLR.5, homoscedasticity, are also satisfied.

  • Assumptions MLR.1 through MLR.6 are called classical assumptions. (Gauss-Markov assumptions + Normality)

  • Under the classical assumptions, the OLS estimators \hat{\beta}_j are the best unbiased estimators, not only among all linear estimators but among all estimators (including nonlinear ones).

  • Classical assumptions can be summarized as follows:

y|x \sim N(\beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k, \sigma^2)

  • Remember that u is the sum of many different unobserved factors affecting y.

  • We can invoke the Central Limit Theorem (CLT) to conclude that u has an approximately normal distribution.

  • The CLT argument assumes that the unobserved factors in u affect y in an additive fashion.

  • If u is a complicated function of unobserved factors then the CLT may not apply.

  • In some cases the normality assumption may be violated; for example, the distribution of wages may not be normal (positive values, minimum wage laws, etc.). In practice, we assume that the conditional distribution is close to normal.

  • In some cases, transformations of variables (e.g., natural log) may yield an approximately normal distribution.

Normal sampling distributions

Under assumptions MLR.1 through MLR.6, the OLS estimators follow normal distributions (conditional on the x’s):

\hat{\beta}_j \sim N(\beta_j, Var(\hat{\beta}_j))

Standardizing we obtain:

\frac{\hat{\beta}_j - \beta_j}{sd(\hat{\beta}_j)} \sim N(0,1)

  • OLS estimators can be written as a linear combination of error terms.

  • Recall that linear combinations of normally distributed random variables also follow normal distribution.

2 Testing hypotheses about a single population parameter

The t Test

Recall that

\frac{\hat{\beta}_j - \beta_j}{sd(\hat{\beta}_j)} \sim N(0,1)

  • Replacing the standard deviation (sd) in the denominator by standard error (se) we obtain:

\frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k-1}

  • The t test is used to test hypotheses about a single population parameter, e.g., H_0: \beta_j = 0 against a suitable alternative
Code
library(wooldridge)  # assumed source of the example data sets (wage1, gpa1, hprice2, meap93, twoyear, bwght)
reg1 <- lm(log(wage) ~ educ + exper + tenure, data = wage1)
summary(reg1)

Call:
lm(formula = log(wage) ~ educ + exper + tenure, data = wage1)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.05802 -0.29645 -0.03265  0.28788  1.42809 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.284360   0.104190   2.729  0.00656 ** 
educ        0.092029   0.007330  12.555  < 2e-16 ***
exper       0.004121   0.001723   2.391  0.01714 *  
tenure      0.022067   0.003094   7.133 3.29e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4409 on 522 degrees of freedom
Multiple R-squared:  0.316, Adjusted R-squared:  0.3121 
F-statistic: 80.39 on 3 and 522 DF,  p-value: < 2.2e-16
  • Is exper statistically significant? Test H_0 : \beta_{exper} = 0 against H_1 : \beta_{exper} > 0

  • The t-statistic is: t = \frac{0.004121}{0.001723} \approx 2.391 with (upper-tail) p-value \approx 0.0086.

  • We reject H_0 and conclude that exper is statistically significant (greater than zero) at the 1% level.
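  • As a quick check (a minimal sketch, assuming reg1 from the chunk above is still in the workspace), the one-sided p-value can be computed directly from the fitted model:
Code
# upper-tail p-value for H0: beta_exper = 0 vs H1: beta_exper > 0
t_exper <- coef(summary(reg1))["exper", "t value"]
pt(t_exper, df = df.residual(reg1), lower.tail = FALSE)  # approx. 0.0086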

Code
reg2 <- lm(colGPA ~ hsGPA + ACT + skipped, data = gpa1)
summary(reg2)

Call:
lm(formula = colGPA ~ hsGPA + ACT + skipped, data = gpa1)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.85698 -0.23200 -0.03935  0.24816  0.81657 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.38955    0.33155   4.191 4.95e-05 ***
hsGPA        0.41182    0.09367   4.396 2.19e-05 ***
ACT          0.01472    0.01056   1.393  0.16578    
skipped     -0.08311    0.02600  -3.197  0.00173 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3295 on 137 degrees of freedom
Multiple R-squared:  0.2336,    Adjusted R-squared:  0.2168 
F-statistic: 13.92 on 3 and 137 DF,  p-value: 5.653e-08
  • The variable skipped refers to the average number of lectures missed per week, while ACT refers to achievement test score

  • H_0: \beta_j = 0 versus H_1: \beta_j \neq 0

    • t_{hsGPA} \approx 4.396;\, p \approx 0 \implies hsGPA is statistically different from zero

    • t_{ACT} \approx 1.393;\, p \approx 0.1658 \implies ACT is not statistically different from zero

    • t_{skipped} \approx -3.197;\, p \approx 0.0017 \implies skipped is statistically different from zero

2.1 Testing other hypotheses about \beta_j

Suppose we want to test H_0: \beta_j = b_j against any suitable alternative.

  • Test statistic:

t = \frac{\hat{\beta}_j - b_j}{se(\hat{\beta}_j)} \sim t_{n-k-1}

Code
reg3 <- lm(log(price) ~ log(nox) + log(dist) + rooms + stratio, data = hprice2)
summary(reg3)

Call:
lm(formula = log(price) ~ log(nox) + log(dist) + rooms + stratio, 
    data = hprice2)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.05890 -0.12427  0.02128  0.12882  1.32531 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 11.083861   0.318111  34.843  < 2e-16 ***
log(nox)    -0.953539   0.116742  -8.168 2.57e-15 ***
log(dist)   -0.134339   0.043103  -3.117  0.00193 ** 
rooms        0.254527   0.018530  13.736  < 2e-16 ***
stratio     -0.052451   0.005897  -8.894  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.265 on 501 degrees of freedom
Multiple R-squared:  0.584, Adjusted R-squared:  0.5807 
F-statistic: 175.9 on 4 and 501 DF,  p-value: < 2.2e-16
  • The variables:

    • price: the median house price
    • nox: the amount of nitrogen oxide in the air in the community (an indicator of pollution)
    • dist: distance to employment centers
    • rooms: average number of rooms in houses in the community
    • stratio: average student-teacher ratio of schools in the community
  • Suppose we want to test H_0: \beta_{log(nox)} = -1 against H_1: \beta_{log(nox)} \neq -1

  • Test statistic:

t = \frac{-0.953539 - (-1)}{0.116742} \approx 0.398\,\, [p-value \approx 0.6908]

  • Thus, we fail to reject H_0.
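  • This calculation can be reproduced from the fitted model (a minimal sketch, assuming reg3 from the chunk above is available):
Code
# t statistic and two-sided p-value for H0: beta_log(nox) = -1
b  <- unname(coef(reg3)["log(nox)"])
se <- unname(sqrt(diag(vcov(reg3)))["log(nox)"])
t  <- (b - (-1)) / se
c(t = t, p = 2 * pt(abs(t), df = df.residual(reg3), lower.tail = FALSE))  # approx. 0.398 and 0.691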

2.2 Large standard errors and small t statistics

  • As the sample size (n) gets bigger the standard errors of \hat{\beta}_js become smaller.

  • Therefore, as n becomes larger it is more appropriate to use smaller significance levels (such as 1%).

  • One reason for large standard errors in practice may be due to high collinearity among explanatory variables (multicollinearity).

  • If explanatory variables are highly correlated it may be difficult to determine the partial effects of variables.

  • In this case the best we can do is to collect more data.

2.3 Guidelines for economic and statistical significance

  • Check for statistical significance: if significant discuss the practical and economic significance using the magnitude of the coefficient.

    • If a variable is not statistically significant at the usual levels (1%, 5%, 10%), you may still discuss whether its estimated effect has the expected sign and is practically large, and report the p-value.
  • Small t statistics combined with the “wrong” sign on a coefficient can usually be ignored in practice.

    • A significant variable that has the unexpected sign and practically large effect is much more difficult to interpret.
    • This may imply a problem associated with model specification and/or data problems.

3 Confidence intervals

  • We know that:

t = \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k-1}

  • Using this result we can construct a (1 - \alpha)\times 100\% confidence interval for \beta_j:

\hat{\beta}_j \pm c \times se(\hat{\beta}_j)

  • We need three quantities to calculate confidence intervals: coefficient estimate, standard error of the estimate, and critical value (c).

  • The critical value (c) is the upper \alpha/2 quantile of the t_{n-k-1} distribution

    • For example, for df = 25 and a 95% confidence level, c = t_{0.025,\,25} \approx 2.06, so the confidence interval for a population parameter can be calculated using:

\hat{\beta}_j \pm 2.06 \times se(\hat{\beta}_j)
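  • Critical values can be obtained with qt(); a quick check of the df = 25 example above:
Code
qt(0.975, df = 25)  # upper 2.5% critical value of t with 25 df, approx. 2.06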

  • If n - k - 1 > 50, then 95% confidence interval can easily be calculated using

\hat{\beta}_j \pm 2 \times se(\hat{\beta}_j)

  • How do we interpret confidence intervals?

    • If random samples were obtained over and over again and a confidence interval were computed for each sample, the unknown population value \beta_j would lie inside the interval in (1-\alpha) \times 100\% of the samples

    • For example, for a 95% confidence interval, about 95 out of 100 such intervals would contain the true value.

  • In practice, we only have one sample and thus, only one confidence interval estimate. We do not know if the estimated confidence interval really contains the true value.

  • Confidence intervals can be used also to test H_0: \beta_j = b_j versus H_1: \beta_j \neq b_j

    • We reject H_0 at the 5% significance level in favor of H_1 if the 95% confidence interval does not contain b_j
Code
reg4 <- lm(math10 ~ totcomp + staff + enroll, data = meap93)
summary(reg4)

Call:
lm(formula = math10 ~ totcomp + staff + enroll, data = meap93)

Residuals:
    Min      1Q  Median      3Q     Max 
-22.235  -7.008  -0.807   6.097  40.689 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.2740209  6.1137938   0.372    0.710    
totcomp      0.0004586  0.0001004   4.570 6.49e-06 ***
staff        0.0479199  0.0398140   1.204    0.229    
enroll      -0.0001976  0.0002152  -0.918    0.359    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.24 on 404 degrees of freedom
Multiple R-squared:  0.05406,   Adjusted R-squared:  0.04704 
F-statistic: 7.697 on 3 and 404 DF,  p-value: 5.179e-05
  • The variables:

    • math10: mathematics test results (a measure of student performance)
    • totcomp: total compensation for teachers (a measure of teacher quality)
    • staff: number of staff per 1000 students (a measure of how much attention students get)
    • enroll: number of students (a measure of school size)
  • 95% CI for totcomp: 0.0004586 \pm 2.0 \times 0.0001004 \implies (0.0002578, 0.0006594)

    • Zero is not contained in this interval; thus, the coefficient for totcomp is significantly different from zero
  • 95% CI for staff: 0.0479199 \pm 2.0 \times 0.0398140 \implies (-0.0317081, 0.1275479)

    • This interval includes zero, hence, the coefficient for staff is not statistically different from zero
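  • The rule-of-thumb intervals above can be compared with the exact t-based intervals from confint() (a quick check, assuming reg4 from the chunk above):
Code
confint(reg4, level = 0.95)  # exact 95% confidence intervals for all coefficients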

4 Testing hypotheses about a single linear combination

Consider the following model

log(wage) = \beta_0 + \beta_1jc + \beta_2 univ + \beta_3 exper + u

where:

jc: number of years attending a junior college

univ: number of years at a 4-year college

exper: experience (year)

  • Is one year at a junior college (2-year higher education) worth one year at a university (4-year)?

  • Null hypothesis: H_0: \beta_1 = \beta_2 \Longleftrightarrow H_0: \beta_1 - \beta_2 = 0

  • Alternative hypothesis: H_1: \beta_1 < \beta_2 \Longleftrightarrow H_1: \beta_1 - \beta_2 < 0

  • Since the null hypothesis contains a single linear combination we can use t test:

t = \frac{\hat{\beta}_1 - \hat{\beta}_2}{se(\hat{\beta}_1 - \hat{\beta}_2)}

  • The standard error is given by:

\begin{align} se(\hat{\beta}_1 - \hat{\beta}_2) &= \sqrt{Var(\hat{\beta}_1 - \hat{\beta}_2)} \notag \\ &= \sqrt{Var(\hat{\beta}_1) + Var(\hat{\beta}_2) -2 Cov(\hat{\beta}_1, \hat{\beta}_2)} \notag \end{align}

  • An alternative method to compute se(\hat{\beta}_1 - \hat{\beta}_2) is to fit a re-arranged regression.

  • Let \theta = \beta_1 - \beta_2. Now the null and alternative hypotheses become:

H_0: \theta = 0\,\, \text{versus}\, \, H_1: \theta \neq 0

  • Substituting \beta_1 = \theta + \beta_2 into the original model we obtain:

\begin{align} log(wage) &= \beta_0 + (\theta + \beta_2)jc + \beta_2univ + \beta_3exper + u \notag \\ &= \beta_0 + \theta jc + \beta_2 (jc+univ) + \beta_3exper + u \notag \\ &= \beta_0 + \theta jc + \beta_2 (totcoll) + \beta_3exper + u \notag \end{align}

Code
reg5 <- lm(lwage ~ jc + totcoll + exper, data = twoyear)
summary(reg5)

Call:
lm(formula = lwage ~ jc + totcoll + exper, data = twoyear)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.10362 -0.28132  0.00551  0.28518  1.78167 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.4723256  0.0210602  69.910   <2e-16 ***
jc          -0.0101795  0.0069359  -1.468    0.142    
totcoll      0.0768762  0.0023087  33.298   <2e-16 ***
exper        0.0049442  0.0001575  31.397   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4301 on 6759 degrees of freedom
Multiple R-squared:  0.2224,    Adjusted R-squared:  0.2221 
F-statistic: 644.5 on 3 and 6759 DF,  p-value: < 2.2e-16
  • Based on the above output, we have

t_{jc} = \frac{-0.0101795}{0.0069359} \approx -1.468, with a two-sided p-value of 0.142 (one-sided p \approx 0.071).

  • There is no strong evidence against H_0: the return to an additional year at a 2-year college is statistically similar to the return to an additional year at a 4-year college.
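  • Alternatively, se(\hat{\beta}_1 - \hat{\beta}_2) can be computed directly from the variance-covariance matrix of the original (un-rearranged) model. A minimal sketch, assuming the twoyear data contain jc and univ and using reg5b as an illustrative object name:
Code
reg5b <- lm(lwage ~ jc + univ + exper, data = twoyear)  # original model with jc and univ
v  <- vcov(reg5b)                                       # variance-covariance matrix of the estimates
se <- sqrt(v["jc", "jc"] + v["univ", "univ"] - 2 * v["jc", "univ"])
t  <- unname((coef(reg5b)["jc"] - coef(reg5b)["univ"]) / se)
c(se = se, t = t)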

5 Testing multiple linear restrictions: the F Test

  • In practice, we would like to test multiple hypotheses about the population parameters.

  • We will use the F test for this purpose

  • We want to test whether a group of variables has no effect on the dependent variable

  • For example, in the following model (unrestricted model)

y = \beta_0 +\beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 + \beta_5x_5 + u

we want to test

H_0: \beta_3 = 0, \beta_4 = 0, \beta_5 = 0 \quad \text{versus} \quad H_1: \text{at least one of } \beta_3, \beta_4, \beta_5 \text{ is nonzero}

which means that, under H_0, x_3, x_4\, \text{and}\, x_5 together have no effect on y after controlling for x_1\, \text{and}\, x_2.

  • H_0 puts 3 exclusion restrictions on the model

  • The alternative holds if at least one of \beta_3, \beta_4\, \text{or}\, \beta_5 is different from zero

  • Under H_0, the restricted model is given by

y = \beta_0 +\beta_1x_1 + \beta_2x_2 + u

  • Let SSE_{ur} and SSE_{r} be the residual (error) sums of squares for the unrestricted and restricted models, respectively.

  • Then H_0 can be tested using

F = \frac{(SSE_{r} - SSE_{ur})/q}{SSE_{ur}/(n-k-1)} \sim F_{q, n-k-1}

  • the numerator df (q) is the number of restrictions imposed on the model

  • This test is useful when the variables in the group are highly correlated, in which case the individual t tests are unreliable because multicollinearity inflates the standard errors

  • The F test outlined above can also be used to test for q = 1, say H_0: \beta_j = 0, since t^2_{n-k-1} \sim F_{1, n-k-1}

  • However, the t test is more flexible since it allows for one-sided alternatives

  • We can also express the above F test statistic in terms of R^2

  • Let R^2_{ur} and R^2_{r} be the R^2 for the unrestricted and restricted models, respectively.

F = \frac{(R^2_{ur} - R^2_{r})/q}{(1- R^2_{ur})/(n-k-1)} \sim F_{q, n-k-1}

  • The above expression is true because SSE_r = SST (1-R^2_r) and SSE_{ur} = SST (1-R^2_{ur})

  • To illustrate the above concepts, consider the following model:

bwght = \beta_0 +\beta_1 cigs + \beta_2 parity + \beta_3 faminc + \beta_4 motheduc + \beta_5 fatheduc + u

  • where:

    • bwght: birth weight of newly born babies, in pounds
    • cigs: average number of cigarettes the mother smoked per day during pregnancy,
    • parity: the birth order of this child
    • faminc: annual family income
    • motheduc: years of schooling for the mother
    • fatheduc: years of schooling for the father
Code
library(dplyr)  # provides the pipe operator %>%
bwght1 <- bwght %>% 
  na.omit()
# Unrestricted model
reg6 <- lm(bwght ~ cigs + parity + faminc + motheduc + fatheduc, data = bwght1)
summary(reg6)

Call:
lm(formula = bwght ~ cigs + parity + faminc + motheduc + fatheduc, 
    data = bwght1)

Residuals:
    Min      1Q  Median      3Q     Max 
-95.796 -11.960   0.643  12.679 150.879 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 114.52433    3.72845  30.716  < 2e-16 ***
cigs         -0.59594    0.11035  -5.401 8.02e-08 ***
parity        1.78760    0.65941   2.711  0.00681 ** 
faminc        0.05604    0.03656   1.533  0.12559    
motheduc     -0.37045    0.31986  -1.158  0.24702    
fatheduc      0.47239    0.28264   1.671  0.09492 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19.79 on 1185 degrees of freedom
Multiple R-squared:  0.03875,   Adjusted R-squared:  0.03469 
F-statistic: 9.553 on 5 and 1185 DF,  p-value: 5.986e-09
Code
anova(reg6)
Analysis of Variance Table

Response: bwght
            Df Sum Sq Mean Sq F value    Pr(>F)    
cigs         1  13076 13075.8 33.3912 9.625e-09 ***
parity       1   2825  2824.5  7.2129  0.007339 ** 
faminc       1   1680  1679.5  4.2889  0.038578 *  
motheduc     1     32    31.8  0.0811  0.775804    
fatheduc     1   1094  1093.9  2.7934  0.094918 .  
Residuals 1185 464041   391.6                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Suppose we want to test H_0: \beta_4 = 0, \beta_5 = 0. That is, parents’ education has no effect on birth weight, ceteris paribus.
Code
#Restricted model
reg7 <- lm(bwght ~ cigs + parity + faminc, data = bwght1)
summary(reg7)

Call:
lm(formula = bwght ~ cigs + parity + faminc, data = bwght1)

Residuals:
    Min      1Q  Median      3Q     Max 
-95.811 -11.552   0.524  12.739 150.848 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 115.46993    1.65590  69.733  < 2e-16 ***
cigs         -0.59785    0.10877  -5.496 4.74e-08 ***
parity        1.83227    0.65754   2.787  0.00541 ** 
faminc        0.06706    0.03239   2.070  0.03865 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19.8 on 1187 degrees of freedom
Multiple R-squared:  0.03642,   Adjusted R-squared:  0.03398 
F-statistic: 14.95 on 3 and 1187 DF,  p-value: 1.472e-09
Code
anova(reg7)
Analysis of Variance Table

Response: bwght
            Df Sum Sq Mean Sq F value   Pr(>F)    
cigs         1  13076 13075.8 33.3666 9.74e-09 ***
parity       1   2825  2824.5  7.2076 0.007361 ** 
faminc       1   1680  1679.5  4.2857 0.038649 *  
Residuals 1187 465167   391.9                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • F statistic in SSE form:

F = \frac{(465167 - 464041)/2}{464041/1185} = 1.437707\,;\, p \approx 0.2379

  • F statistic in R^2 form:

F = \frac{(0.03875 - 0.03642)/2}{(1- 0.03875)/1185} = 1.436177;\,\, p \approx 0.2382

  • Either form of the F statistic leads to non-rejection of H_0: motheduc and fatheduc are jointly insignificant, i.e., parents’ education has no effect on birth weight.

  • The above calculations can be carried out easily using the anova() function, as shown in the following code chunk

Code
anova(reg7, reg6)
Analysis of Variance Table

Model 1: bwght ~ cigs + parity + faminc
Model 2: bwght ~ cigs + parity + faminc + motheduc + fatheduc
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1   1187 465167                           
2   1185 464041  2    1125.7 1.4373  0.238
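  • The reported p-value can be recovered by hand with pf() (a quick check using the numbers above):
Code
pf(1.4373, df1 = 2, df2 = 1185, lower.tail = FALSE)  # approx. 0.238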

6 Asymptotic normality and large-sample inference

  • The last classical assumption (MLR.6) states that, conditional on x variables, the error term has a normal distribution. This implies that the conditional distribution of y is also normal because linear combinations of normal random variables also follow the normal distribution.

  • We do not need the normality assumption for the unbiasedness of OLS estimators. The normality assumption is required to derive the exact (finite sample) sampling distributions of OLS estimators (which are also normal).

  • If the normality assumption fails, does this imply that we cannot carry out t and F tests?

  • The answer is NO! If the sample size is large enough, we may be able to rely on the Central Limit Theorem to conclude that OLS estimators are asymptotically normal.

  • Asymptotic = large sample: we collect more data (hence more information) as n \rightarrow \infty.

6.1 Lagrange Multiplier (LM) test statistic

  • In large samples, we can use the Lagrange Multiplier (LM, or score test) statistic to test linear restrictions.

  • The LM statistic relies only on the estimation of the restricted model. After the restricted model is estimated an auxiliary regression is run to get the LM statistic.

  • Consider the model

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u

  • Suppose we want to test H_0: \beta_3 = \beta_4 = 0 versus H_1: \text{at least one of them is not zero}

  • The LM test statistic is computed by multiplying the sample size n by R^2 obtained from the regression of the residuals from the restricted model on all explanatory variables.

  • Under H_0, the restricted model is

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u

  • Let \tilde{u} be the residuals from the restricted model

  • Fit the following model

\tilde{u} = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + \delta_4 x_4 + error

  • Let R^2_{\tilde{u}} be the coefficient of determination of the above regression

  • Then the LM statistic is

LM = n \times R^2_{\tilde{u}} \sim \chi^2_q

  • Again, q is the number of restrictions imposed on the full model

  • As an example, consider again the birth weight model

bwght = \beta_0 +\beta_1 cigs + \beta_2 parity + \beta_3 faminc + \beta_4 motheduc + \beta_5 fatheduc + u

where the variables are defined as before, and we want to test H_0: \beta_4 = 0, \beta_5 = 0.

Code
resid7 <- reg7$residuals #extracting the residuals from the restricted model
bwght1 <- cbind(bwght1,resid7)
reg8 <- lm(resid7 ~ cigs + parity + faminc + motheduc + fatheduc, data = bwght1)
summary(reg8)

Call:
lm(formula = resid7 ~ cigs + parity + faminc + motheduc + fatheduc, 
    data = bwght1)

Residuals:
    Min      1Q  Median      3Q     Max 
-95.796 -11.960   0.643  12.679 150.879 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.945597   3.728453  -0.254   0.7998  
cigs         0.001916   0.110348   0.017   0.9862  
parity      -0.044671   0.659406  -0.068   0.9460  
faminc      -0.011020   0.036562  -0.301   0.7631  
motheduc    -0.370450   0.319855  -1.158   0.2470  
fatheduc     0.472394   0.282643   1.671   0.0949 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19.79 on 1185 degrees of freedom
Multiple R-squared:  0.00242,   Adjusted R-squared:  -0.001789 
F-statistic: 0.5749 on 5 and 1185 DF,  p-value: 0.7193
  • The LM statistic is

LM = 1191 \times 0.00242 \approx 2.88;\,\, p \approx 0.2369

  • We obtain the same conclusion as with the F test: parents’ education has no effect on birth weight (motheduc and fatheduc are jointly insignificant).
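  • The LM statistic and its p-value can also be computed directly (a minimal sketch, assuming bwght1 and reg8 from the chunk above):
Code
LM <- nrow(bwght1) * summary(reg8)$r.squared  # n times R-squared from the auxiliary regression
LM                                            # approx. 2.88
pchisq(LM, df = 2, lower.tail = FALSE)        # chi-squared p-value with q = 2 restrictions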

7 Reporting regression results

  • A very useful function for creating publication-ready summary tables of regression results is tbl_regression() from the gtsummary package
Code
library(gtsummary)  # for tbl_regression() and tbl_merge()
tbl_regression(reg6) %>%
  modify_column_hide(column = ci) %>%
  modify_column_unhide(column = std.error) %>% 
  add_glance_table(include = c(nobs, r.squared, adj.r.squared, AIC))
Characteristic    Beta     SE¹      p-value
cigs             -0.60     0.110    <0.001
parity            1.8      0.659    0.007
faminc            0.06     0.037    0.13
motheduc         -0.37     0.320    0.2
fatheduc          0.47     0.283    0.095
No. Obs.          1,191
R²                0.039
Adjusted R²       0.035
AIC               10,498
¹ SE = Standard Error
  • It is also flexible enough to report the results of two or more models side by side (via tbl_merge())
Code
tbl1 <- tbl_regression(reg6) %>%
  modify_column_hide(column = ci) %>%
  modify_column_unhide(column = std.error) %>% 
  add_glance_table(include = c(nobs, r.squared, adj.r.squared, sigma, AIC))

tbl2 <- tbl_regression(reg7) %>%
  modify_column_hide(column = ci) %>%
  modify_column_unhide(column = std.error) %>% 
  add_glance_table(include = c(nobs, r.squared, adj.r.squared, sigma, AIC))

tbl_merge(
    tbls = list(tbl1, tbl2),
    tab_spanner = c("**Unrestricted model**", "**Restricted model**")
  )
Characteristic    Unrestricted model               Restricted model
                  Beta     SE¹      p-value        Beta     SE¹      p-value
cigs             -0.60     0.110    <0.001        -0.60     0.109    <0.001
parity            1.8      0.659    0.007          1.8      0.658    0.005
faminc            0.06     0.037    0.13           0.07     0.032    0.039
motheduc         -0.37     0.320    0.2
fatheduc          0.47     0.283    0.095
No. Obs.          1,191                            1,191
R²                0.039                            0.036
Adjusted R²       0.035                            0.034
Sigma             19.8                             19.8
AIC               10,498                           10,497
¹ SE = Standard Error

8 Asymptotic properties of OLS estimators

Recall

  • Finite sample properties: unbiasedness and efficiency

  • These are valid for any sample size n

  • OLS estimators are:

    • Unbiased (under assumptions MLR.1-MLR.4)

    • BLUE (under assumptions MLR.1-MLR.5); “best” means the most efficient among linear unbiased estimators

  • Assumption MLR.6: Normality of the error term (u), independence from explanatory variables

  • Normality assumption is used to derive the sampling distributions of OLS estimators for any n

    • Under normality usual t and F test statistics follow standard distributions for any sample size
  • What are the asymptotic properties of OLS estimators?

  • Asymptotic: “as the sample size, n, increases without limit”

  • These properties are: consistency and asymptotic normality

8.1 Asymptotic consistency

Definition:

Let W_n be an estimator of the unknown population parameter \theta based on a random sample \{Y_1, Y_2, \cdots, Y_n\}. W_n is a consistent estimator of \theta if, for every arbitrarily small \epsilon > 0,

P(|W_n - \theta| > \epsilon) \rightarrow 0 \quad \text{as} \quad n \rightarrow \infty

In this case we write plim(W_n) = \theta.

  • For example, the sample mean \overline{y} is a consistent estimator for the population mean \mu (Law of Large Numbers)

  • If \hat{\beta}_j is consistent, as n gets larger and larger the sampling distributions become more concentrated around the true value \beta_j

  • As n \rightarrow \infty, obtaining more and more data gets us closer to the parameter of interest, \beta_j; in the limit, the sampling distribution collapses to a single point.

  • This means that if we can collect more and more data we can make our estimator arbitrarily close to the true value.

  • As n increases, the sampling distribution of \hat{\beta}_1 becomes more tightly concentrated around the true value \beta_1 (the simulation sketch below illustrates this)
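  • A minimal simulation sketch (illustrative only; the function name and parameter values are assumptions, not from the original notes) showing how the spread of \hat{\beta}_1 shrinks as n grows:
Code
# Simulate the sampling distribution of the OLS slope for several sample sizes
set.seed(123)
sim_slope <- function(n, beta0 = 1, beta1 = 0.5, reps = 1000) {
  replicate(reps, {
    x <- rnorm(n)
    y <- beta0 + beta1 * x + rnorm(n)
    coef(lm(y ~ x))[2]
  })
}
sapply(c(25, 100, 400), function(n) sd(sim_slope(n)))  # standard deviation shrinks as n grows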

  • Recall MLR.4: Random error term is uncorrelated with explanatory variables

E(u|x_1, x_2, \cdots, x_n) = 0

  • A weaker version (MLR.4’) is given by

E(u) = 0 \quad \text{and} \quad Cov(x_j, u) = 0,\,\, j= 1, 2, 3, \cdots, k

  • Replacing MLR.4 with MLR.4’, it can be shown that the OLS estimators are consistent: plim(\hat{\beta}_j) = \beta_j

  • For example, in the simple regression model the OLS estimator of the slope parameter can be written as:

\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_{i1} - \overline{x}_1)y_i}{\sum_{i=1}^n (x_{i1} - \overline{x}_1)^2}

  • Substituting y_i = \beta_0 + \beta_1 x_{i1} + u_i and rearranging we obtain

\hat{\beta}_1 = \beta_1 + \frac{n^{-1} \sum_{i=1}^n (x_{i1} - \overline{x}_1)u_i}{n^{-1} \sum_{i=1}^n (x_{i1} - \overline{x}_1)^2}

  • Taking plim:

plim(\hat{\beta}_1) = \beta_1 + \frac{Cov(x_1, u)}{Var(x_1)}

  • By assumption MLR.4’: Cov(x_1, u) = 0, we obtain

plim(\hat{\beta}_1) = \beta_1

  • Asymptotic bias:

plim(\hat{\beta}_1) - \beta_1 = \frac{Cov(x_1, u)}{Var(x_1)}

8.2 Asymptotic normality

  • Under the Gauss-Markov assumptions:

\sqrt{n} (\hat{\beta}_j - \beta_j) \overset{a}{\sim} N\left(0, \frac{\sigma^2}{a^2_j}\right)

  • The asymptotic variance is

\frac{\sigma^2}{a^2_j} > 0

where:

a_j^2 = plim \left(n^{-1} \sum_{i=1}^n \hat{r}_{ij}^2 \right)

and \hat{r}_{ij} denotes the residuals from the regression of x_j on all the other explanatory variables.

8.3 Asymptotic efficiency

  • We know that OLS estimators are BLUE under the Gauss-Markov assumptions.

  • OLS estimators are also asymptotically efficient under the Gauss-Markov assumptions

Avar\left(\sqrt{n}(\hat{\beta}_j - \beta_j)\right) \leq Avar\left(\sqrt{n}(\tilde{\beta}_j - \beta_j)\right)

where Avar denotes the asymptotic variance and \tilde{\beta}_j denotes any other consistent estimator (other than OLS)