ECOM30001/ECOM90001
Basic Econometrics
Semester 1, 2025
Week 3, Lecture 1
Basic Linear Model: Model Specification
Reading: Hill et al., Chapter 6
- Hypothesis Testing about more than one parameter: The F-test
- Testing the Significance of a Model
- Model Specification
- Some General Considerations
- Interpretation in Log Models
- A Reciprocal Model
- Simulation Example
Lecture Objectives
The F-test
consider the (general) econometric model \[\color {blue}{y_i=\beta_0+\beta_1\,X_{1i} + \dots+\beta_K\, X_{Ki} + \varepsilon_i}\]
if \(\color{green}{\varepsilon_i}\) are normally distributed (or \(N\) is sufficiently large), we can test one or more hypotheses about the unknown population parameters using the F-test
how: compare the sum of squared residuals (RSS) from an unrestricted model to the RSS from a restricted model in which H0 is assumed to be true
Unrestricted and Restricted Model
recall the OLS principle:
- the OLS estimators are the solution to a minimization problem (minimize the sum of squared errors)
the sum of squared residuals (RSS) will represent the minimized value of the objective function—the sum of squared errors evaluated at the solution to the minimization problem \(\color {blue}{b_0,b_1,b_2, \dots}\) etc.
in our example, the restricted model imposes \(\color {red}{\beta_2= 0}\)
the minimized value of the objective function for the restricted model can never be smaller than that achieved for the unrestricted model. Why?
The F-test continued
By the property of the minimum, the \(\color{green}{RSS}\) associated with the restrictions \(\color{red}{(RSS_R)}\) cannot be lower than the \(\color{green}{RSS}\) associated with no restrictions \(\color{red}{(RSS_{UR})}\).
if the null hypothesis is true, we expect that the difference in the RSS associated with restrictions compared to the RSS without the restrictions should be small
if the null hypothesis is not true, we expect that the RSS associated with the restrictions is considerably larger than the RSS associated without the restrictions.
- intuitively, the (false) null hypothesis has significantly reduced the ability of the model to fit the data (and considerably raised the RSS).
By the property of the minimum \(\color{red}{\left(RSS_R-RSS_{UR} \right) \geq 0}\)
Let \(\color {green}{M}\) denote the number of restrictions on the unknown parameters and define the random variable \[\color{blue}{F=\dfrac{\left(RSS_R-RSS_{UR} \right)/M}{RSS_{UR}/(N-K-1)}} \thicksim F(M,\,N-K-1)\] if \(H_0\) is true this random variable \(F\) follows an F-distribution with degrees of freedom \(M\) and \((N-K-1)\)
where \[ \begin{align} \color{green}{RSS_R} &= {\color{green}{\sum \hat{e}_i^2}} \text{ under } H_0 \text{ imposing the restrictions}\\ \color{green}{RSS_{UR}} &= {\color{green}{\sum \hat{e}_i^2}} \text{ in the unrestricted model } \end{align} \]
if the null hypothesis is not true \(\color{red}{\left(RSS_R-RSS_{UR} \right)}\) should be relatively large
- the restrictions considerably reduce the ability of the model to fit the data—and the sample value of F-statistic becomes relatively large
when the sample F-statistic becomes ‘sufficiently large’ we will reject \(H_0\)
- the judgement about what value is ‘too large’ is evaluated by comparing the sample value to some \(F_c\) such that: \[ {\color{blue}{Pr[F \geq F_c]=\alpha}} \quad \text{ where } \alpha \text{ is the level of significance}\]
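As a minimal sketch of this procedure (simulated data; all names are illustrative, not the course data sets), the F-statistic can be computed directly from the restricted and unrestricted RSS, or obtained automatically with `anova()`:

```r
# Sketch: F-test of H0: beta2 = 0 via restricted vs unrestricted RSS
set.seed(42)
N  <- 200
x1 <- rnorm(N); x2 <- rnorm(N)
y  <- 1 + 0.5 * x1 + rnorm(N)      # data generated with beta2 = 0 (H0 true)

unrestricted <- lm(y ~ x1 + x2)
restricted   <- lm(y ~ x1)         # imposes the restriction beta2 = 0

RSS_UR <- sum(resid(unrestricted)^2)
RSS_R  <- sum(resid(restricted)^2)
M <- 1                             # number of restrictions
K <- 2                             # number of regressors in the unrestricted model

F_stat <- ((RSS_R - RSS_UR) / M) / (RSS_UR / (N - K - 1))
p_val  <- pf(F_stat, df1 = M, df2 = N - K - 1, lower.tail = FALSE)

anova(restricted, unrestricted)    # same F-statistic and p-value
```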
F-tests and t-tests
the null hypothesis \(\color{red}{\beta_2=0}\) is a test regarding a single restriction. In this case, we can use a t-test to test the hypothesis.
you will notice that the p-value associated with the F-test statistic and p-value associated with the t-test statistic are identical
Result: for a two-tailed test about a single coefficient (i.e. a single restriction), we have \[\color{blue}{\{t(N-K-1)\}}^2=F(1,N-K-1)\] the square of a t random variable with \(\color{green}{N-K-1}\) df is an F random variable with distribution \(\color{green}{F(1,N-K-1)}\)
this equivalence does not hold for a one-tailed test
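Continuing the simulated sketch above, the equivalence can be verified numerically:

```r
# For the single restriction beta2 = 0: squared t-statistic = F-statistic
t_stat <- summary(unrestricted)$coefficients["x2", "t value"]
t_stat^2                              # identical to F_stat computed above
anova(restricted, unrestricted)$F[2]  # identical F, identical p-value
```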
Testing the Significance of a Model
in the MLRM, testing the overall significance of the model amounts to testing whether there is a significant relationship between \(y\) and all of the included \(X\)’s: \[\color{blue}{y_i=\beta_0+\beta_1\,X_{1i}+ \dots+\beta_K\, X_{Ki}+\varepsilon_i}\]
\[ \begin{align} \color{red}{H_0}:& \color{red}{\beta_1 =\beta_2=\dots=\beta_K=0}\\ \color{red}{H_A}: & \color{red}{\text{ at least one } \beta_j \neq 0 \text{ for }j=1,2,\dots,K} \end{align}\] if the null hypothesis is true then none of our explanatory variables influence \(y\) and our model is of little value
note however \(\color{red}{H_0}\) involves \(\color{green}{K}\) restrictions
rejection of \(\color{red}{H_0}\) does not tell us which of the included \(\color{green}{X}\) variables are important in determining \(y\)
- it only tells us that at least one of the included \(\color{green}{X}\)’s is important (statistically)
the unrestricted model includes all of the \(X\)’s: \[\color{blue}{y_i=\beta_0+\beta_1\,X_{1i}+ \dots+\beta_K\, X_{Ki}+\varepsilon_i}\]
the restricted model (imposing K restrictions) \[\color{blue}{y_i=\beta_0+\varepsilon_i}\]
The RSS from a model with only a constant is equal to the total sum of squares \(\color{green}{\sum(y_i-\bar{y})^2}\). This means \(\color{blue}{RSS_R=TSS}\)
we do not need to estimate the restricted model to get \(\color{green} {RSS_R}\) since \(\color{blue}{TSS}\) will be the same in both the restricted and the unrestricted model. Why?
the F-statistic becomes \[\color{blue}{F=\dfrac{(RSS_R-RSS_{UR})/M}{RSS_{UR}/(N-K-1)}=\dfrac{(TSS-RSS)/K}{RSS/(N-K-1)}}\]
this F-statistic has \(\color{green}{K}\) numerator df and \(\color{green}{N-K-1}\) denominator df
R computes the sample F-statistic for this test of overall significance and reports it in the regression output as `F-statistic`
- recall \[\color{blue}{R^2=\dfrac{\sum(\hat{y}_i-\bar{y})^2}{\sum(y_i-\bar{y})^2}= 1- \dfrac{RSS}{TSS}}\]
so the sample F-statistic for testing the significance of the model becomes: \[\color{green}{F=\dfrac{(TSS-RSS)/K}{RSS/(N-K-1)}= \dfrac{R^2/K}{(1-R^2)/(N-K-1)}}\]
since \(\color{blue}{(TSS-RSS)=TSS*R^2}\) and \(\color{blue}{RSS=TSS(1-R^2)}\)
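Continuing the simulated sketch from earlier, the overall F-statistic reported by `summary()` can be reproduced from \(R^2\):

```r
# Sketch: overall F-statistic computed from R^2 (K = 2 regressors)
s  <- summary(unrestricted)
R2 <- s$r.squared
(R2 / K) / ((1 - R2) / (N - K - 1))   # matches s$fstatistic[1]
```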
Individual or Joint Tests
consider \(H_0:\beta_1=\beta_2=0\)
- why not just perform a t-test for each of the null hypotheses \(H_0:\beta_1=0\) and \(H_0: \beta_2=0\) ?
the reason is that \(\color{green}{\text{corr}(b_1,b_2)}\) is not necessarily zero; the F-testing procedure makes allowance for correlation between the OLS estimators
the F-test is a joint test of whether the pair of values \(\beta_1=0\) and \(\beta_2=0\) is consistent with the data
testing \(\beta_1=0\) using a t-test does not take into account the possibility that \(\beta_2=0\) and no allowance is made for \(\color{green}{\text{corr}(b_1,b_2)}\)
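A minimal sketch of the joint test with the `car` package (re-using the simulated data from the earlier sketch; `linearHypothesis()` is the relevant `car` function):

```r
# Joint F-test of H0: beta1 = beta2 = 0, allowing for corr(b1, b2)
library(car)
m <- lm(y ~ x1 + x2)
linearHypothesis(m, c("x1 = 0", "x2 = 0"))  # joint test, M = 2 restrictions
summary(m)                                  # individual t-tests, one per coefficient
```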
Worked Examples
Worked Example 1
- Using the `car` package in R
- Hypothesis testing for a single parameter: t-test and F-test
Worked Example 2
- Using the `car` package in R
- Hypothesis testing the overall significance of a model: F-test
Worked Example 3
- Using the `car` package in R
- Testing joint linear hypotheses: F-test
Additional Example
- Using the `car` package in R
- Testing the overall significance of a model: F-test
- Testing hypotheses in a quadratic model: t-test and F-test
Model Specification
in any econometric analysis, specification of the model is one of the first steps in the econometric methodology
three essential features of model specification are:
- choice of functional form
- choice of explanatory variables
  - omission of relevant explanatory variables
  - inclusion of irrelevant variables
- examining whether the assumptions of the MLRM hold, and if not, which assumptions are violated
for items 1 and 2, economic principles and logical reasoning play a prominent role
Functional Form
The MLRM does not necessarily restrict the relationship between \(X\) and \(y\) to be linear.
- Often economic theory implies a non-linear relationship between the variables \(X\) and \(y\).
However, it does restrict the way the parameters \(\beta_j\) enter the econometric model
The econometric model must be linear in parameters.
The parameters \(\beta_j\) cannot be multiplied together, divided, squared, etc.
The variables \(X\) and \(y\) can be transformed in any way, as long as the resulting model satisfies the assumptions of the regression model.
Choose a functional form that is sufficiently flexible to fit the data while preserving the assumptions about the random error term.
Summary
Linear Model
\[\color{blue}{y=\beta_0+\beta_1\,X + \varepsilon}\]
where
\[\color{red}{\beta_1= \dfrac{\Delta E[y|X]}{\Delta X}}\] so \(\color{green}\beta_1\) represents the slope of the conditional mean function.
Log-Linear Model \[\color{blue}{\text{ln}\,y=\beta_0+\beta_1\,X + \varepsilon}\] so
\[\color{red}{(100*\beta_1) \approx \left( \dfrac{\% \Delta E[y|X]}{\Delta X} \right)}\]
so \(\color{blue}{(100*\beta_1)}\) represents the (approximate) percentage change in \(E[y|X]\) associated with a one-unit change in the level of \(X\), for a ‘small’ change in \(X\) (semi-elasticity)
Log-Linear Model
Model (2): \(\color{red}{b_1=0.07676}\), so an additional year of education raises average wages by approximately 7.68%
Linear-Log Model
Linear-Log Model \[\color{blue}{y= \beta_0+\beta_1\, \text{ln}\,X + \varepsilon}\]
so \[\color{red}{\dfrac{\beta_1}{100}= \dfrac{1}{100}* \left( \dfrac{\Delta E[y|X]}{\Delta X/X} \right) = \left(\dfrac{\Delta E[y|X]}{\% \Delta X} \right)}\]
so \(\color{red}{\beta_1/100}\) represents the level change in \(E[y|X]\) associated with a percentage change in the level of \(X\), for a small change in \(X\).
Alternatively, \(\color{blue}{\beta_1}\) then represents the change in \(E[y|X]\) associated with a doubling or 100% change in \(X\).
Log-Log Model
Log-Log Model \[\color{blue}{\text{ln}\,y= \beta_0+\beta_1\, \text{ln}\,X + \varepsilon}\]
so \[\color{red}{\beta_1= \dfrac{\Delta E[\text{ln}\,y|X]}{\Delta\, \text{ln}\,X} \approx \dfrac{\Delta E[y|X]/E[y|X]}{\Delta X/X}}\]
so \[\color{red}{\beta_1 \approx \dfrac{100}{100}* \left( \dfrac{\Delta E[y|X]/E[y|X]}{\Delta X/X} \right) = \dfrac{\% \Delta E[y|X]}{\% \Delta X}}\]
so \(\color{blue}{\beta_1}\) represents the (approximate) percentage change in \(E[y|X]\) associated with a percentage change in the level of \(X\).
Note that the parameter \(\beta_1\) can be interpreted as an elasticity.
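As a rough sketch, the four functional forms correspond to the following `lm()` specifications (`df`, `y`, `x` are illustrative names; the log forms require positive values):

```r
linear     <- lm(y ~ x, data = df)            # beta1: level change per unit change in x
log_linear <- lm(log(y) ~ x, data = df)       # 100*beta1: approx % change in y per unit of x
linear_log <- lm(y ~ log(x), data = df)       # beta1/100: level change per 1% change in x
log_log    <- lm(log(y) ~ log(x), data = df)  # beta1: elasticity
```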
Example: GDP and Child Labour
Question: What is the relationship between child labour and GDP per-capita?
- we expect a negative relationship between GDP per-capita and the share of child labour
- countries with larger GDP per capita will tend to have, on average, a lower share of child labour
- but the relationship might be non-linear - the child labour share might decline sharply with GDP per-capita
- the reduction in the child labour share might depend upon the level of GDP per-capita
- increases in GDP per-capita at low levels of GDP per-capita might have a greater effect upon the child labour share
theory suggests a non-linear relationship between GDP per-capita and child labour but does not provide the functional form of the relationship - as GDP per-capita increases, the slope of the conditional mean becomes less negative
plot of the data suggests negative but non-linear relationship between GDP per-capita and child share
plot of data suggests that the following reciprocal econometric model might be appropriate: \[\color{blue}{\text{cshare}_i = \beta_0 + \beta_1 \, \dfrac{1}{\text{gdp}_i}+ \varepsilon_i}\]
as gdp (per-capita) \(\rightarrow \infty\), \(E[\text{cshare|gdp}] \rightarrow \beta_0\)
slope becomes flatter as GDP per-capita increases:
\[\color{red} {\dfrac {\Delta E[\text{cshare|gdp}]}{\Delta \text{gdp}} = - \beta_1 \dfrac{1}{\text{gdp}^2}}\]
the slope depends upon the level of GDP per-capita
when \(\beta_1>0\) , slope is negative for all values of GDP per-capita
- \(\color{blue}{b_1= -0.2696}\) - negative relationship between child share and GDP per-capita
- additional $100 of income reduces child share by 0.27 percentage points
- look at the fitted values - linear model predicts negative child share for some ‘wealthy’ countries (red line)
Reciprocal Model
[Figure: GDP and Child Labour (OLS Residuals)]
[Figure: GDP and Child Labour Reciprocal Model (OLS Residuals)]
[Figure: GDP and Child Labour Reciprocal Model (Fitted Values)]
\(\color{blue}{b_1>0}\) so the slope of the fitted regression line is negative for all values of GDP per-capita
Policy Implication: raising GDP per-capita of the ‘poorest’ countries will have the largest effect upon the child labour share
look at the fitted values
evaluating at the mean GDP per-capita of $15,156 (15.156 in $’000), the average slope is approximately -0.056. Compare this to the estimated linear effect of -0.2696
\[\color{red}{\dfrac {\Delta E{[\text{cshare|gdp]}}}{\Delta \text{gdp}} = -b_1\, \dfrac{1}{\overline{\text{gdp}}^2} = - \dfrac{12.8673}{(15.156)^2}=-0.056}\]
- as GDP per-capita \(\rightarrow \infty\), child share \(\rightarrow b_0 \approx 3\%\) - and this limit is statistically significantly different from zero.
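A minimal sketch of the reciprocal specification in R (the data frame `childlabour` and variable names are illustrative, not the actual course data):

```r
# Reciprocal model: cshare = b0 + b1*(1/gdp) + e
recip <- lm(cshare ~ I(1/gdp), data = childlabour)
summary(recip)

# Implied slope evaluated at the sample mean of gdp: -b1 / mean(gdp)^2
b1 <- coef(recip)[2]
-b1 / mean(childlabour$gdp)^2
```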
Choosing a Functional Form
economic theory may not provide enough information to identify which functional form is appropriate
several alternative functional forms may be consistent with the restrictions suggested by economic theory
we need to choose a functional form that is:
- sufficiently flexible to fit the data
- while at the same time preserving the assumptions about the error term
What to do?
plot the data - check whether, for larger values of \(X\), \(y\) tends to increase (or decrease) at an increasing, constant or decreasing rate. This might give us some indication of the appropriate functional form
pick a functional form and plot the residuals - check whether the residuals for the chosen functional form are consistent with zero mean and constant variance random errors.
ideally there should be no (systematic) pattern of any sort in the residuals
if there does appear to be a systematic pattern, then maybe an alternative functional form is appropriate
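A quick sketch of this residual check for any fitted model `m`:

```r
# Residuals vs fitted values: no systematic pattern should be visible
plot(fitted(m), resid(m), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```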
Next lecture: testing for ‘incorrect’ functional form
Simulation Example
Consider the following econometric model
\[\color{blue}{y_i = \beta_0+\beta_1\, X_{1i} + \beta_2\, X_{2i}+ \beta_3\, X_{2i}^2+ \varepsilon_i \qquad \varepsilon_i|X_i \thicksim \mathcal{N}(0,1)}\]
The true values of the parameters are given by: \[ \color{green}{ \begin{align} \beta_0 & = 1 \\ \beta_1 & = 2 \\ \beta_2 & = 3 \\ \beta_3 & = 4 \end{align} } \]
\(X\) is bivariate normally distributed \[ \color{red}{ \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim \mathcal{N} \left( \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \right) } \]
with \(\color{blue}{\text{E}(X_1)=1,\text{E}(X_2)=2,\text{VAR}(X_1)=1,\text{VAR}(X_2)=2}\) and \(\color{red}{\text{COV}(X_1,X_2)=0}\) \(y\) will be normally distributed \[ \color{green}{y_i \thicksim \mathcal{N}\left(\beta_0+\beta_1\, X_{1i}+\beta_2\, X_{2i}+\beta_3\, X_{2i}^2,\,1 \right)} \]
Suppose instead that we estimate the following ‘incorrect’ model, ignoring the quadratic relationship in \(X_2\): \[\color{green}{y_i=\beta_0+\beta_1\, X_{1i}+ \beta_2\, X_{2i} + \varepsilon_i}\]
We are estimating the wrong functional form, imposing the restriction \(\color{green}{\beta_3=0}\).
omitted variable bias: omitted variable \(X_{2i}^2\)
\(\color{green}{\beta_3>0}\) and \(\color{green}{\text{COV}(X_2,X_2^2)>0}\)
relative to the true model, all of the estimated coefficients in the ‘incorrect’ model will be biased
estimate for \(\color{green}{\beta_2}\) will generally be upward biased
note that the bias is a property of the estimator. We cannot determine the sign and magnitude of the bias from a single estimate.
Why is \(\text{COV}(X_2,X_2^2) \neq 0\)?
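A quick check under the stated distribution \(X_2 \sim \mathcal{N}(2,2)\): for a normal random variable with mean \(\mu\) and variance \(\sigma^2\), \[\text{COV}(X_2,X_2^2) = E[X_2^3]-E[X_2]\,E[X_2^2] = (\mu^3+3\mu\sigma^2)-\mu(\mu^2+\sigma^2)=2\mu\sigma^2\] which here equals \(2(2)(2)=8>0\); the covariance is non-zero whenever \(\mu \neq 0\).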
The ‘incorrect’ model can be written as: \[\color{blue}{y_i = \beta_0 + \beta_1\, X_{1i}+\beta_2\, X_{2i} + \left \{\beta_3 \, X_{2i}^2 + \varepsilon_i \right\}}\] - the role of the residuals is to capture everything that is not in the model
- The pattern that would otherwise be explained by the true model would be revealed in the residuals
Although we could identify the missing pattern by trying different functions of the regressors one-by-one, an efficient way to test for misspecification is to compare what has been explained by the model \(\left( \hat{y}_i \right)\) against what is not explained by the model \(\left( \hat{e}_i \right)\).
our misspecified model has omitted a quadratic term. The scatter plot of the predicted values \(\left( \hat{y}_i \right)\) against the residuals \(\left( \hat{e}_i \right)\) shows a parabolic relationship.
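A minimal sketch of this simulation in R, using the true parameter values and distributions stated above:

```r
# Simulate the true quadratic model, fit the misspecified linear model,
# and plot residuals against fitted values
set.seed(1)
N  <- 500
x1 <- rnorm(N, mean = 1, sd = 1)
x2 <- rnorm(N, mean = 2, sd = sqrt(2))
y  <- 1 + 2 * x1 + 3 * x2 + 4 * x2^2 + rnorm(N)   # beta = (1, 2, 3, 4)

wrong <- lm(y ~ x1 + x2)                          # omits the x2^2 term
plot(fitted(wrong), resid(wrong),
     xlab = "Fitted values", ylab = "Residuals")  # parabolic pattern emerges
```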