To conduct statistical inference (hypothesis tests, confidence intervals), we need to know not only the expected values and variances of the \hat{\beta}_j but also their sampling distributions.
For this we need to assume that the error term is normally distributed: under the Gauss-Markov assumptions alone, the sampling distributions of the OLS estimators can have virtually any shape.
Assumption MLR.6: Normality
Population error term u is independent of the explanatory variables and follows a normal distribution with mean 0 and variance \sigma^2, that is,
u \sim N(0, \sigma^2)
The normality assumption is stronger than the previous assumptions.
Assumption MLR.6 implies that MLR.4 (zero conditional mean) and MLR.5 (homoscedasticity) are also satisfied.
Assumptions MLR.1 through MLR.6 are called classical assumptions. (Gauss-Markov assumptions + Normality)
Under the classical assumptions, the OLS estimators \hat{\beta}_j are the best unbiased estimators not only among all linear estimators but among all estimators (including nonlinear ones).
The classical assumptions can be summarized compactly as
y \mid \mathbf{x} \sim N(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,\ \sigma^2)
Why is normality a reasonable assumption? Remember that u is the sum of many different unobserved factors affecting y.
We can invoke the Central Limit Theorem (CLT) to conclude that u has an approximate normal distribution.
This CLT argument assumes that the unobserved factors in u affect y in an additive fashion.
If u is a complicated function of unobserved factors then the CLT may not apply.
In some cases the normality assumption is violated; for example, the distribution of wages cannot be exactly normal (wages take only positive values, minimum wage laws, etc.). In practice, we assume that the conditional distribution is close to normal.
In some cases, transformations of variables (e.g., natural log) may yield an approximately normal distribution.
Normal sampling distributions
Under assumptions MLR.1 through MLR.6, the OLS estimators are normally distributed (conditional on the x's):
\hat{\beta}_j \sim N\!\left(\beta_j,\ Var(\hat{\beta}_j)\right), \qquad \text{so that} \qquad \frac{\hat{\beta}_j - \beta_j}{sd(\hat{\beta}_j)} \sim N(0, 1)
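Example: housing prices and air pollution. A minimal sketch of code that could produce the output below (assuming the hprice2 data ships with the wooldridge package; the object name reg_hprice is illustrative):
Code
# Load the data and estimate the model for log house prices
library(wooldridge)
data("hprice2")
reg_hprice <- lm(log(price) ~ log(nox) + log(dist) + rooms + stratio, data = hprice2)
summary(reg_hprice)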
Call:
lm(formula = log(price) ~ log(nox) + log(dist) + rooms + stratio,
data = hprice2)
Residuals:
Min 1Q Median 3Q Max
-1.05890 -0.12427 0.02128 0.12882 1.32531
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.083861 0.318111 34.843 < 2e-16 ***
log(nox) -0.953539 0.116742 -8.168 2.57e-15 ***
log(dist) -0.134339 0.043103 -3.117 0.00193 **
rooms 0.254527 0.018530 13.736 < 2e-16 ***
stratio -0.052451 0.005897 -8.894 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.265 on 501 degrees of freedom
Multiple R-squared: 0.584, Adjusted R-squared: 0.5807
F-statistic: 175.9 on 4 and 501 DF, p-value: < 2.2e-16
The variables:
price: the median house price
nox: the amount of nitrogen oxide in the air in the community (an indicator of pollution)
dist: distance to employment centers
rooms: average number of rooms in houses in the community
stratio: average student-teacher ratio of schools in the community
Suppose we want to test H_0: \beta_{log(nox)} = -1 against H_1: \beta_{log(nox)} \neq -1
Test statistic:
t = \frac{-0.953539 - (-1)}{0.116742} \approx 0.398\,\, [p-value \approx 0.6908]
Thus, we fail to reject H_0.
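This calculation is easy to replicate in R from the reported estimate, standard error, and degrees of freedom (501, from the output above):
Code
# t statistic for H0: beta_log(nox) = -1 and its two-sided p-value
b  <- -0.953539                      # estimated coefficient on log(nox)
se <-  0.116742                      # its standard error
t  <- (b - (-1)) / se                # approximately 0.398
p  <- 2 * pt(-abs(t), df = 501)      # two-sided p-value, approximately 0.69
c(t = t, p = p)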
2.2 Large standard errors and small t statistics
As the sample size n gets bigger, the standard errors of the \hat{\beta}_j become smaller, so even very small effects can show up as statistically significant.
Therefore, as n becomes larger it is more appropriate to use smaller significance levels (such as 1%).
One reason for large standard errors in practice may be due to high collinearity among explanatory variables (multicollinearity).
If explanatory variables are highly correlated it may be difficult to determine the partial effects of variables.
In this case the best we can do is to collect more data.
2.3 Guidelines for economic and statistical significance
Check for statistical significance: if significant discuss the practical and economic significance using the magnitude of the coefficient.
If a variable is not statistically significant at the usual levels (1%, 5%, 10%), you may still ask whether it has the expected sign and a practically meaningful magnitude, and report its p-value.
A variable with a small t statistic and the "wrong" sign can usually be ignored in practice.
A significant variable that has the unexpected sign and practically large effect is much more difficult to interpret.
This may imply a problem associated with model specification and/or data problems.
Since \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k-1}, we can use this ratio to construct the (1 - \alpha)\times 100\% confidence interval for \beta_j:
\hat{\beta}_j \pm c \times se(\hat{\beta}_j)
We need three quantities to calculate confidence intervals: coefficient estimate, standard error of the estimate, and critical value (c).
The critical value c is the (1 - \alpha/2) quantile of the t_{n-k-1} distribution, i.e. the value that leaves \alpha/2 of the probability in the upper tail.
For example, for df = 25 and a 95% confidence level, the confidence interval for a population parameter is
\hat{\beta}_j \pm 2.06 \times se(\hat{\beta}_j)
If n - k - 1 > 50, a 95% confidence interval can be approximated by the rule of thumb
\hat{\beta}_j \pm 2 \times se(\hat{\beta}_j)
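In R, the exact critical values come from qt(), and confidence intervals for a fitted model can be obtained with confint(). A short sketch (reg_hprice is the hypothetical object name used for the housing price regression above):
Code
# Critical values: the (1 - alpha/2) quantile of the t distribution
qt(0.975, df = 25)              # about 2.06
qt(0.975, df = 501)             # about 1.96, close to the rule of thumb of 2

# 95% confidence intervals for all coefficients of a fitted model
confint(reg_hprice, level = 0.95)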
How do we interpret confidence intervals?
If random samples were obtained over and over again and a confidence interval were computed for each sample, then the unknown population value \beta_j would lie inside the interval in (1-\alpha) \times 100\% of the samples.
For example, with a 95% confidence level, about 95 out of every 100 such intervals would contain the true value.
In practice, we only have one sample and thus, only one confidence interval estimate. We do not know if the estimated confidence interval really contains the true value.
Confidence intervals can be used also to test H_0: \beta_j = b_j versus H_1: \beta_j \neq b_j
We reject H_0 at the 5% significance level in favor of H_1 if the 95% confidence interval does not contain b_j
Example: is a year at a junior (two-year) college worth as much as a year at a four-year college? In the model log(wage) = \beta_0 + \beta_1 jc + \beta_2 univ + \beta_3 exper + u, the hypothesis is H_0: \beta_1 = \beta_2. Defining \theta = \beta_1 - \beta_2 and totcoll = jc + univ (total years of college), the model can be rewritten as log(wage) = \beta_0 + \theta\, jc + \beta_2 totcoll + \beta_3 exper + u, so H_0 can be tested with an ordinary t test of \theta = 0 on the jc coefficient:
Call:
lm(formula = lwage ~ jc + totcoll + exper, data = twoyear)
Residuals:
Min 1Q Median 3Q Max
-2.10362 -0.28132 0.00551 0.28518 1.78167
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.4723256 0.0210602 69.910 <2e-16 ***
jc -0.0101795 0.0069359 -1.468 0.142
totcoll 0.0768762 0.0023087 33.298 <2e-16 ***
exper 0.0049442 0.0001575 31.397 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4301 on 6759 degrees of freedom
Multiple R-squared: 0.2224, Adjusted R-squared: 0.2221
F-statistic: 644.5 on 3 and 6759 DF, p-value: < 2.2e-16
Based on the above output, we have
t_{jc} = \frac{-0.0101795}{0.0069359} \approx -1.468;\, p = 0.142
There is no strong evidence against H_0: the return to an additional year of education at a two-year college is not statistically different from the return to an additional year at a four-year college.
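A sketch of how this regression could be run (assuming the twoyear data from the wooldridge package; totcoll may already be present in the data, but constructing it explicitly makes the reparametrization transparent):
Code
library(wooldridge)
data("twoyear")

# Reparametrize so that the coefficient on jc equals theta = beta_jc - beta_univ
twoyear$totcoll <- twoyear$jc + twoyear$univ
reg_college <- lm(lwage ~ jc + totcoll + exper, data = twoyear)
summary(reg_college)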
5 Testing multiple linear restrictions: the F Test
In practice, we would like to test multiple hypotheses about the population parameters.
We will use the F test for this purpose
We want to test whether a group of variables has no effect on the dependent variable
For example, in the following model (unrestricted model)
y = \beta_0 +\beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 + \beta_5x_5 + u
we want to test
H_0: \beta_3 = 0, \beta_4 = 0, \beta_5 = 0 \\
H_1: \text{at least one of } \beta_3, \beta_4, \beta_5 \text{ is different from zero (i.e., } H_0 \text{ does not hold)}
which means that x_3, x_4\, \text{and}\, x_5 together have no effect on y after controlling for x_1\, \text{and}\, x_2.
H_0 puts 3 exclusion restrictions on the model
The alternative holds if at least one of \beta_3, \beta_4\, \text{or}\, \beta_5 is different from zero
Under H_0, the restricted model is given by
y = \beta_0 +\beta_1x_1 + \beta_2x_2 + u
Let SSE_{ur} and SSE_{r} be the residual (error) sums of squares for the unrestricted and restricted models, respectively.
Then H_0 can be tested using
F = \frac{(SSE_{r} - SSE_{ur})/q}{SSE_{ur}/(n-k-1)} \sim F_{q, n-k-1}
the numerator df (q) is the number of restrictions imposed on the model
This test is useful when the variables in the group are highly correlated, in which case the individual t tests are unreliable because multicollinearity inflates the standard errors.
The F test outlined above can also be used to test for q = 1, say H_0: \beta_j = 0, since t^2_{n-k-1} \sim F_{1, n-k-1}
However, the t test is more flexible since it allows for one-sided alternatives
We can also express the above F test statistic in terms of R^2
Let R^2_{ur} and R^2_{r} be the R^2 for the unrestricted and restricted models, respectively.
F = \frac{(R^2_{ur} - R^2_{r})/q}{(1- R^2_{ur})/(n-k-1)} \sim F_{q, n-k-1}
The above expression is true because SSE_r = SST (1-R^2_r) and SSE_{ur} = SST (1-R^2_{ur})
To illustrate these concepts, consider the following birth weight model (the variables are described in Section 6.1 below):
bwght = \beta_0 + \beta_1 cigs + \beta_2 parity + \beta_3 faminc + \beta_4 motheduc + \beta_5 fatheduc + u
We test whether the parents' education variables can be excluded, H_0: \beta_4 = 0,\ \beta_5 = 0, so q = 2. The F statistic in SSE form is
F = \frac{(465167 - 464041)/2}{464041/1185} = 1.437707\,;\, p \approx 0.2379
and in R^2 form:
F = \frac{(0.03875 - 0.03642)/2}{(1- 0.03875)/1185} = 1.436177;\,\, p \approx 0.2382
Using either form of the F statistic, we fail to reject H_0: the parents' education variables are jointly insignificant, i.e. we find no statistically significant effect of parents' education on birth weight.
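The two F statistics above can be reproduced directly from the reported sums of squared residuals and R-squared values, with the p-value from pf():
Code
# F statistic in SSE form: q = 2 restrictions, n - k - 1 = 1185 df
sse_r  <- 465167                                    # restricted model
sse_ur <- 464041                                    # unrestricted model
F_sse  <- ((sse_r - sse_ur) / 2) / (sse_ur / 1185)

# F statistic in R-squared form
r2_ur <- 0.03875
r2_r  <- 0.03642
F_r2  <- ((r2_ur - r2_r) / 2) / ((1 - r2_ur) / 1185)

c(F_sse = F_sse, F_r2 = F_r2)                       # both about 1.44
pf(F_sse, df1 = 2, df2 = 1185, lower.tail = FALSE)  # p-value, about 0.238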
The above calculations can be carried out easily with the anova() function, as shown in the following code chunk.
Code
anova(reg7, reg6)  # reg7: restricted model, reg6: unrestricted model (estimated earlier)
Analysis of Variance Table
Model 1: bwght ~ cigs + parity + faminc
Model 2: bwght ~ cigs + parity + faminc + motheduc + fatheduc
Res.Df RSS Df Sum of Sq F Pr(>F)
1 1187 465167
2 1185 464041 2 1125.7 1.4373 0.238
6 Asymptotic normality and large-sample inference
The last classical assumption (MLR.6) states that, conditional on x variables, the error term has a normal distribution. This implies that the conditional distribution of y is also normal because linear combinations of normal random variables also follow the normal distribution.
We do not need the normality assumption for the unbiasedness of OLS estimators. The normality assumption is required to derive the exact (finite sample) sampling distributions of OLS estimators (which are also normal).
If the normality assumption fails, does this imply that we cannot carry out t and F tests?
The answer is NO! If the sample size is large enough, we may be able to rely on the Central Limit Theorem to conclude that OLS estimators are asymptotically normal.
Asymptotic = large sample = we collect more data (hence more information as n \rightarrow \infty)
6.1 Lagrange Multiplier (LM) test statistic
In large samples, we can use the Lagrange Multiplier (LM, or score test) statistic to test linear restrictions.
The LM statistic relies only on the estimation of the restricted model. After the restricted model is estimated an auxiliary regression is run to get the LM statistic.
Consider the model
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u
Suppose we want to test H_0: \beta_3 = \beta_4 = 0 versus H_1: \text{at least one of them is not zero}
The LM test statistic is computed by multiplying the sample size n by the R^2 obtained from the regression of the restricted-model residuals on all of the explanatory variables. Under H_0, the LM statistic has an asymptotic \chi^2_q distribution, where q is the number of restrictions.
Under H_0, the restricted model is
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u
Let \tilde{u} denote the residuals from the restricted model; the auxiliary regression of \tilde{u} on all of the explanatory variables yields the R^2 used to form the LM statistic.
As an example, consider again the birth weight model
bwght = \beta_0 + \beta_1 cigs + \beta_2 parity + \beta_3 faminc + \beta_4 motheduc + \beta_5 fatheduc + u
where
bwght: birth weight of newly born babies, in pounds
cigs: average number of cigarettes the mother smoked per day during pregnancy,
parity: the birth order of this child
faminc: annual family income
motheduc: years of schooling for the mother
fatheduc: years of schooling for the father
and we want to test H_0: \beta_4 = 0, \beta_5 = 0.
Code
resid7 <- reg7$residuals   # extract the residuals from the restricted model
bwght1 <- cbind(bwght1, resid7)
reg8 <- lm(resid7 ~ cigs + parity + faminc + motheduc + fatheduc, data = bwght1)
summary(reg8)
Call:
lm(formula = resid7 ~ cigs + parity + faminc + motheduc + fatheduc,
data = bwght1)
Residuals:
Min 1Q Median 3Q Max
-95.796 -11.960 0.643 12.679 150.879
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.945597 3.728453 -0.254 0.7998
cigs 0.001916 0.110348 0.017 0.9862
parity -0.044671 0.659406 -0.068 0.9460
faminc -0.011020 0.036562 -0.301 0.7631
motheduc -0.370450 0.319855 -1.158 0.2470
fatheduc 0.472394 0.282643 1.671 0.0949 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 19.79 on 1185 degrees of freedom
Multiple R-squared: 0.00242, Adjusted R-squared: -0.001789
F-statistic: 0.5749 on 5 and 1185 DF, p-value: 0.7193
The LM statistic is
LM = 1191 \times 0.00242 \approx 2.88;\,\, p \approx 0.2369
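The LM statistic and its chi-squared p-value can be computed directly from the auxiliary regression reg8 estimated above:
Code
# LM = n * R-squared from the auxiliary regression; asymptotically chi-squared with q = 2 df under H0
n  <- nobs(reg8)                              # 1191 observations
LM <- n * summary(reg8)$r.squared             # about 2.88
pchisq(LM, df = 2, lower.tail = FALSE)        # p-value, about 0.24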
We reach the same conclusion as with the F test: parents' education has no statistically significant effect on birth weight.
7 Reporting regression results
One function that is very useful for creating publication-ready summary tables of regression results is tbl_regression() from the gtsummary package.
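A minimal example of its use (assuming gtsummary is installed; reg6 is the unrestricted birth weight model estimated earlier):
Code
library(gtsummary)
# Publication-ready table of coefficients, confidence intervals and p-values
tbl_regression(reg6)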
8 Asymptotic properties of OLS estimators
Finite sample properties: unbiasedness and efficiency
These are valid for any sample size n
OLS estimators are:
Unbiased (under assumptions MLR.1-MLR.4)
BLUE (under assumptions MLR.1-MLR.5); Best = the most efficient
Assumption MLR.6: Normality of the error term (u), independence from explanatory variables
The normality assumption is used to derive the exact sampling distributions of the OLS estimators for any n
Under normality, the usual t and F test statistics follow exact t and F distributions for any sample size
What are the asymptotic properties of OLS estimators?
Asymptotic: “as the sample size, n, increases without limit”
These properties are: consistency and asymptotic normality
8.1 Asymptotic consistency
Definition:
Let W_n be an estimator of the unknown population parameter \theta based on a random sample \{Y_1, Y_2, \cdots, Y_n\}. W_n is a consistent estimator of \theta if, for every arbitrarily small number \epsilon > 0,
P(|W_n - \theta| > \epsilon) \rightarrow 0 \quad \text{as} \quad n \rightarrow \infty
This is equivalent to writing plim(W_n) = \theta.
For example, the sample mean \overline{y} is a consistent estimator for the population mean \mu (Law of Large Numbers)
If \hat{\beta}_j is consistent, then as n gets larger and larger its sampling distribution becomes more and more concentrated around the true value \beta_j.
As n \rightarrow \infty, obtaining more and more data gets us ever closer to the parameter of interest \beta_j; in the limit, the sampling distribution collapses to a single point.
This means that if we can collect more and more data, we can make our estimator arbitrarily close to the true value.
In the graph below, as n increases, the distribution of \hat{\beta}_1 gets concentrated closer around the true value \beta_1
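A simulation along these lines could be used to generate such a graph (an illustrative sketch, not the code behind the original figure): for each sample size the OLS slope is re-estimated over many replications, and its sampling distribution visibly tightens around the true value.
Code
set.seed(123)
beta0 <- 1; beta1 <- 0.5                  # true parameter values

# Simulate the sampling distribution of the OLS slope for a given sample size n
sim_slope <- function(n, reps = 1000) {
  replicate(reps, {
    x <- rnorm(n)
    y <- beta0 + beta1 * x + rnorm(n)     # error independent of x (MLR.4 holds)
    coef(lm(y ~ x))[2]                    # OLS slope estimate
  })
}

b_n10   <- sim_slope(10)
b_n100  <- sim_slope(100)
b_n1000 <- sim_slope(1000)
sapply(list(n10 = b_n10, n100 = b_n100, n1000 = b_n1000), sd)   # standard deviations shrink with n

# Overlaid densities concentrate around beta1 = 0.5 as n grows
plot(density(b_n1000), xlim = range(b_n10),
     main = "Sampling distribution of the OLS slope", xlab = "estimate")
lines(density(b_n100), lty = 2)
lines(density(b_n10), lty = 3)
abline(v = beta1)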
Recall MLR.4 (zero conditional mean): E(u \mid x_1, \ldots, x_k) = 0, which implies that the error term is uncorrelated with the explanatory variables.