Problem 1. We estimate the following linear regression model with a sample of \(n\) observations.
\[ Model \quad (1): \quad y= \beta_1 + \beta_2 x_2 + \beta_3x_3+\epsilon \]
Consider a change of units of measurement of regressor \(x_2\) (only this regressor), where now each new value of this regressor is \(x^*_2 = a\cdot x_2\). Define \(X\) as the matrix of observations of the original regressors and \(X^*\) as the matrix of observations with the change of units applied to regressor \(x_2\).
Question (a.i) If we set \(X^* = X \cdot A\) what would be the elements of matrix A?
We can define the matrix \(X\) as:
\[ X=\begin{bmatrix} 1 & x_{12} & x_{13} \\ 1 & x_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ 1 & x_{n2} & x_{n3} \end{bmatrix} \]
Therefore the matrix \(X^*\) would be:
\[ X^*=\begin{bmatrix} 1 & ax_{12} & x_{13} \\ 1 & ax_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ 1 & ax_{n2} & x_{n3} \end{bmatrix} \]
And matrix \(A\) would be:
\[ A=\begin{bmatrix} 1 & 0 & 0 \\ 0 & a & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} \]
Question (a.ii) Using matrix algebra, derive the expression that relates \(\hat{\beta^*}\) (the OLS estimator of parameter vector \(\beta\) after the change of units of measurement), with \(\hat{\beta}\) (the OLS estimator before the change of units of measurement).
The OLS coefficient vector minimizes the sum of squared residuals. In linear algebra terms it is the solution to this minimization problem:
\[ \begin{aligned} \min_b \quad S(b) = e_0'e_0= (y-X^*b)'(y-X^*b) \\ e_0'e_0 = y'y-b'X^{*'}y-y'X^*b +b'X^{*'}X^*b \\ e_0'e_0 = y'y - 2b'X^{*'}y+b'X^{*'}X^*b \end{aligned} \] where the last line uses that \(b'X^{*'}y=y'X^*b\), since both are scalars.
The necessary first-order condition for a minimum is:
\[ \frac{\partial S}{\partial b} = - 2X^{*'}y+ 2X^{*'}X^*b = 0 \]
If the inverse of \(X^{*'}X^*\) exists, then the solution is:
\[ \hat{\beta^*} =(X^{*'}X^*)^{-1} X^{*'}y \]
Analogously, with the original regressors:
\[ \hat{\beta} =(X^{'}X)^{-1} X^{'}y \]
From these two expressions we can work out how \(\hat{\beta^*}\) relates to \(\hat{\beta}\).
First, recall how the two regressor matrices are related:
\[ X^* = X A \] Second, we know that \((AB)^{-1}= B^{-1} A^{-1}\) (provided both inverses exist), and we also know that \((AB)'=B'A'\).
The first step below substitutes the transformed regressor matrix with the original matrix of regressors postmultiplied by the transformation matrix. The second step uses the property of the transpose of a product. The third step uses the property of the inverse of a product and bears in mind that \(A^{'-1}A'=I\).
\[ \begin{aligned} \hat{\beta^*} =((XA)'(XA))^{-1} (XA)'y \quad (1)\\ = (A'X'XA)^{-1}A'X'y \quad (2)\\ = A^{-1}(X'X)^{-1}A^{'-1}A'X'y\quad (3) \\ =A^{-1}\hat{\beta}\\ \end{aligned} \]
Question (a.iii) Comment on the unit-dependency of the least squares estimator.
In this exercise we have seen how the columns of the matrix of regressors are linearly transformed. Common applications include changes in the units of measurement, say by changing units of currency, hours to minutes, or miles to kilometers. This is a useful practical, algebraic result. For example, it simplifies the analysis in the first application suggested, changing the units of measurement: if an independent variable is scaled by a constant \(a\), its regression coefficient is scaled by \(1/a\). There is no need to recompute the regression.
Nevertheless, we have to bear in mind that for this result to hold, matrix \(A\) must be non-singular: if the transformation were not a square, invertible matrix, the result would not carry over. In our case \(A\) is diagonal with non-zero diagonal entries (as long as \(a\neq0\)), so it is invertible and the result applies.
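A quick numerical check of this result, sketched in R with the marketing data used later in this document (the choice of \(a=2\) and of the regressors is arbitrary):
data("marketing", package = "datarium")
X <- model.matrix(~ youtube + facebook, data = marketing)    # original regressor matrix
A <- diag(c(1, 2, 1))                                        # rescale the second regressor by a = 2
Xstar <- X %*% A
b_hat     <- solve(t(X) %*% X, t(X) %*% marketing$sales)
b_hat_new <- solve(t(Xstar) %*% Xstar, t(Xstar) %*% marketing$sales)
all.equal(c(b_hat_new), c(solve(A) %*% b_hat))               # TRUE: beta* = A^{-1} beta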
Question b. Consider now the following linear regression model:
\[ Model(2) \quad y= \beta_1+\beta_2ln(x_2)+\beta_3x_3 + \beta_4x_4+\epsilon \]
The same change of units of measurement of regressor x2 that we discussed for Model (1) is considered here for Model (2). Using matrix algebra with similar steps as before, derive the expression that relates \(\hat{\beta}\) (the OLS estimator of the parameters of Model (2) with the original units) with \(\hat{\beta^*}\) (the OLS estimator of the parameters of Model (2) after the change of units of measurement). Are you surprised?
\[ X=\begin{bmatrix} 1 & ln(x_{12}) & x_{13} &x_{14} \\ 1 & ln(x_{22}) & x_{23} &x_{24} \\ \vdots & \vdots & \vdots &\vdots\\ 1 & ln(x_{n2}) & x_{n3} &x_{n4} \end{bmatrix} \]
\[ X^*=\begin{bmatrix} 1 & ln(ax_{12}) & x_{13} &x_{14} \\ 1 & ln(ax_{22}) & x_{23} &x_{24} \\ \vdots & \vdots & \vdots &\vdots\\ 1 & ln(ax_{n2}) & x_{n3} &x_{n4} \end{bmatrix} \]
Note that the change of units now enters through the logarithm, so the second column of \(X^*\) is not simply \(a\) times the second column of \(X\). Nevertheless, we can express each element of the second column of \(X^*\) as \(ln(a)+ln(x_{i2})\). This is very important because \(ln(a)\) is just a number; we are adding a constant to our variable, and therefore we are not changing the relationship between our initial variable of interest \(ln(x_2)\) and our dependent variable \(y\).
From a statistical point of view, the coefficient \(\hat{\beta}_i\) summarizes the relationship between \(y\) and \(x_i\). In the simple (single-regressor) case, this relationship can be expressed as:
\[ \hat{\beta}_i = \frac{Cov(y,ln(a)+ln(x_i))}{Var(ln(a)+ln(x_i))} = \frac{Cov(y,ln(x_i))}{Var(ln(x_i))} \]The reason why the natural logarithm of \(a\), \(ln(a)\), drops out of the final expression is that it is a constant. For the denominator, adding a constant does not change the variance, \(Var(c+X)=Var(X)\), because a constant has zero variance. For the numerator, the covariance of a random variable with a constant is zero, so \(Cov(y,ln(a)+ln(x_i))=Cov(y,ln(x_i))\). In other words, the constant does not alter the relationship between the variables.
Nevertheless, there is no free lunch in multiplying an independent variable by a constant. Although it does not change the relationship between the independent and dependent variables, it does change the intercept. If we assume a simple linear regression model, it is easy to see that:
\[ y=\beta_1+ \beta_2ln(ax_2) + \epsilon = \beta_1+ \beta_2ln(a) + \beta_2ln(x_2) + \epsilon = \beta'_1+ \beta_2ln(x_2) + \epsilon \]
The question now is how \(\beta_1\) relates to \(\beta'_1\). The answer is that the intercept written in terms of \(ln(x_2)\) absorbs the term \(\beta_2 ln(a)\):
\[ \beta'_1=\beta_1+\beta_2ln(a) \]
Equivalently, the intercept of the model estimated with the rescaled regressor equals the original intercept minus \(\beta_2 ln(a)\).
To justify this step we can make use of the formula for the intercept estimate. In the simple regression case (omitting the other regressors for brevity), the intercept after the change of units is:
\[ \hat{\beta}^*_1 = \bar{y}-\hat{\beta}_2 \overline{ln(ax_2)}= \bar{y}-\hat{\beta}_2\left(ln(a)+\overline{ln(x_2)}\right) = \hat{\beta}_1-\hat{\beta}_2\, ln(a) \]
To illustrate this, it is useful to run a regression. With a classical data set we will explain sales in terms of the money spent on advertising on different platforms. We run two models; the second simply multiplies the variable “youtube” by two (an arbitrary value for \(a\)).
data("marketing", package = "datarium")
head(marketing, 4)
## youtube facebook newspaper sales
## 1 276.12 45.36 83.04 26.52
## 2 53.40 47.16 54.12 12.48
## 3 20.64 55.08 83.16 11.16
## 4 181.80 49.56 70.20 22.20
model_1 <- lm(sales ~ log(youtube) + facebook + newspaper , data = marketing)
model_2 <- lm(sales ~ log(2*youtube) + facebook + newspaper, data = marketing)
summary(model_1)
##
## Call:
## lm(formula = sales ~ log(youtube) + facebook + newspaper, data = marketing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1081 -1.0924 -0.3047 0.9401 5.9971
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -11.779831 0.715021 -16.475 <2e-16 ***
## log(youtube) 4.723414 0.136047 34.719 <2e-16 ***
## facebook 0.206700 0.008203 25.199 <2e-16 ***
## newspaper -0.002531 0.005596 -0.452 0.652
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.927 on 196 degrees of freedom
## Multiple R-squared: 0.9067, Adjusted R-squared: 0.9052
## F-statistic: 634.7 on 3 and 196 DF, p-value: < 2.2e-16
summary(model_2)
##
## Call:
## lm(formula = sales ~ log(2 * youtube) + facebook + newspaper,
## data = marketing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1081 -1.0924 -0.3047 0.9401 5.9971
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -15.053852 0.802520 -18.758 <2e-16 ***
## log(2 * youtube) 4.723414 0.136047 34.719 <2e-16 ***
## facebook 0.206700 0.008203 25.199 <2e-16 ***
## newspaper -0.002531 0.005596 -0.452 0.652
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.927 on 196 degrees of freedom
## Multiple R-squared: 0.9067, Adjusted R-squared: 0.9052
## F-statistic: 634.7 on 3 and 196 DF, p-value: < 2.2e-16
As we can see, the only difference is in the intercepts. The difference between the second and the first intercept is \(-3.27\), which, as we previously explained, equals \(-\hat{\beta_2}\cdot ln(2) = -4.72\times0.693\approx-3.27\).
So, the relationship between \(\hat{\beta}\) (the OLS estimator of the parameters of Model (2) with the original units) and \(\hat{\beta^*}\) (the OLS estimator after the change of units of measurement) can be seen in the following expression:
\[ \hat{\beta}=\begin{bmatrix} \hat{\beta_1} \\ \hat{\beta_2} \\ \hat{\beta_3}\\ \hat{\beta_4} \end{bmatrix} \rightarrow \begin{bmatrix} \hat{\beta_1} -\hat{\beta_2}\cdot ln(a) \\ \hat{\beta_2} \\ \hat{\beta_3}\\ \hat{\beta_4} \end{bmatrix}=\hat{\beta}^* \]
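For completeness, the same result can be phrased in the matrix-algebra language of part (a): since \(ln(ax_{i2})=ln(a)+ln(x_{i2})\), the transformed log column is a linear combination of the intercept column and the original log column, so \(X^*=XA\) for a non-singular \(A\) and the result \(\hat{\beta}^*=A^{-1}\hat{\beta}\) still applies:
\[ A=\begin{bmatrix} 1 & ln(a) & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad A^{-1}=\begin{bmatrix} 1 & -ln(a) & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad \hat{\beta}^*=A^{-1}\hat{\beta}=\begin{bmatrix} \hat{\beta}_1-\hat{\beta}_2 ln(a) \\ \hat{\beta}_2 \\ \hat{\beta}_3 \\ \hat{\beta}_4 \end{bmatrix} \]
This coincides with the expression above, so the answer should not come as a surprise.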
Problem 2. If \(\hat{\epsilon} = y - X\hat{\beta}\) is the vector of OLS residuals associated with the linear regression model \(y=X\beta+\epsilon\), prove that:
Question (a) \(X'\hat{\epsilon}=0\)
By construction, the residuals are orthogonal to the regressors, not only in the statistical sense but also as numerical vectors.
We can define matrix X as:
\[ X_{nxk}=\begin{bmatrix} 1 & x_{12} & \cdots & x_{1k} \\ 1 & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots &\vdots \\ 1 & x_{n2} & \cdots & x_{nk} \end{bmatrix} \]
And we can define \(y\) as: \[ y_{nx1}=[y_{1}, y_{2}, ...,y_{n}]' \]
Therefore:
\[ \begin{aligned} X'e=X'(y-\hat{y})=X'(y-X\hat{\beta})\\ =X'(y-X((X^{'}X)^{-1} X^{'}y ))\\ =X'y-X'X(X'X)^{-1}X'y\\ =X'y - IX'y=0 \end{aligned} \]
Question (b) If the model includes a constant regressor: \(\sum_{i}^{n}\hat{\epsilon}_i=0\)
The previous part proved that the column vector obtained when premultiplying the residual vector by the transposed regressor matrix is equal to zero. This product has dimensions \(kx1\), so it is a column vector. Developing the expression, we find that the first element of this vector is just the sum of each residual times 1, which equals the sum of the residuals. This sum is the first entry of a column vector of zeros, so it has to be zero.
\[ X'e=\begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{12} & x_{22} & \cdots & x_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1k} & x_{2k} & \cdots & x_{nk} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} = \begin{bmatrix} \sum_{i}e_i \\ \sum_{i}x_{i2}e_i \\ \vdots \\ \sum_{i}x_{ik}e_i \end{bmatrix}= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \]
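Both results are easy to check numerically; a minimal sketch in R, reusing model_1, the marketing regression estimated in Problem 1 (any fitted lm object would do):
X <- model.matrix(model_1)      # regressor matrix, including the constant column
e <- resid(model_1)             # OLS residuals
t(X) %*% e                      # every entry is zero up to floating-point error
sum(e)                          # also ~0, because the model includes a constant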
Questions (c, d & e) Provide the dimensions of the associated projection matrix, \(P_x\), and residual maker matrix, \(M_x\), and prove that both matrices are idempotent.
First we define the residual maker matrix as:
\[ e=y-X(X^{'}X)^{-1} X^{'}y=(I-X(X^{'}X)^{-1} X^{'})y=My \]
The dimensions of each matrix are:
\(y_{nx1}\)
\(X_{nxk}\)
Therefore:
\(M = I-X(X^{'}X)^{-1} X^{'}\), whose dimensions are \(nxn- nxk(kxn \cdot nxk)^{-1}kxn=nxn\)
Second we define the projection matrix as:
\[ \hat{y}= y-e = Iy-My=(I-M)y=X(X^{'}X)^{-1} X^{'}y =Py \]
Therefore:
\(P = X(X^{'}X)^{-1} X^{'}\), whose dimensions are \(nxk(kxn \cdot nxk)^{-1}kxn=nxn\)
An idempotent matrix, \(M\), is one that is equal to its square, that is, \(M^2=MM=M\). If \(M\) is a symmetric idempotent matrix then \(M'M=M\)
\(P\) is an \(nxn\) matrix, and it is idempotent, which can be verified as follows:
\[ \begin{aligned} PP = (X(X^{'}X)^{-1} X^{'})(X(X^{'}X)^{-1} X^{'})\\ = X(X^{'}X)^{-1} (X^{'}X)(X'X)^{-1}X'\\ = X(X^{'}X)^{-1} IX'\\ =P \end{aligned} \]
A nice property is that \(I-P\) can also be shown to be idempotent, and this is a shortcut to show that the residual maker matrix is also idempotent:
\((I-P)(I-P)=I-2P+P^2=I-2P+P=I-P\), and since \(M=I-P\), this shows \(MM=M\).
It is also very convenient to show the symmetry of the projection matrix.
For any square and invertible matrices, the inverse and transpose operator commute:
\((X^T)^{-1}= (X^{-1})^{T}\)
In addition, the transpose unary operator is an involution, since \((X^T)^{T}=X\); thus it follows that \((X^TX)^{-1}\) is its own transpose, or in other words, symmetric.
\([(X^TX)^{-1}]^T=[(X^TX)^{T}]^{-1}=(X^TX)^{-1}\)
Therefore, it is possible to apply these rules to prove that the projection matrix is its own transpose, i.e. symmetric:
\[ P^T=[X(X^{T}X)^{-1} X^{T}]^T=X[(X^{T}X)^{-1}]^T X^{T} = P \]
The same holds for the matrix \(M\), which is also symmetric:
\[ M= I-X(X^{'}X)^{-1} X^{'} \\ M'= (I-X(X^{'}X)^{-1} X^{'})'\\ =I'- (X(X^{'}X)^{-1} X^{'})'\\ =I-X(X^{'}X)^{-1} X^{'}=M \]
And finally, we show again that \(M\) is idempotent by multiplying it by itself:
\[ \begin{aligned} MM= (I-X(X^{'}X)^{-1} X^{'})(I-X(X^{'}X)^{-1} X^{'})\\ =I-X(X^{'}X)^{-1}X^{'}-X(X^{'}X)^{-1}X^{'}+X(X^{'}X)^{-1}X^{'}X(X^{'}X)^{-1}X^{'}\\ =I - 2X(X^{'}X)^{-1}X^{'}+X(X^{'}X)^{-1}X^{'}\\ =I - X(X^{'}X)^{-1}X^{'} = M \end{aligned} \]
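These properties can also be checked numerically; a minimal R sketch, again reusing the marketing regressor matrix from Problem 1 (any full-column-rank \(X\) works):
X <- model.matrix(model_1)
P <- X %*% solve(t(X) %*% X) %*% t(X)    # projection matrix, n x n
M <- diag(nrow(X)) - P                   # residual maker matrix, n x n
max(abs(P %*% P - P))                    # ~0: P is idempotent
max(abs(M %*% M - M))                    # ~0: M is idempotent
max(abs(P - t(P)))                       # ~0: P is symmetric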
Problem 3. Consider estimating the following linear regression model using a sample of n observations:
\[ \{(x_{i2}, x_{i3}, y_i)\}_{i=1}^n \\ y= \beta_1 + \beta_2x_2 + \beta_3x_3 + \epsilon \]
Question (a) Provide dimensions for \(X, y.\)
\[ y: nx1 \\ X: nx3 \]
Question (b) For the special case (K = 3) verify that: \(X'X= \sum^n_ix_ix'_i\)
\[ X'X=\begin{bmatrix} 1 & 1 & \cdots &1 \\ x_{12} & x_{22} & \cdots &x_{n2} \\ x_{13} & x_{23} &\cdots & x_{n3} \end{bmatrix} \begin{bmatrix} 1 & x_{12} & x_{13} \\ 1 & x_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ 1 & x_{n2} & x_{n3} \end{bmatrix} \\ = \begin{bmatrix} \sum^n_{i=1}x_{i1}^2 & \sum^n_{i=1}x_{i1}x_{i2} &\sum^n_{i=1}x_{i1}x_{i3} \\ \sum^n_{i=1}x_{i2}x_{i1} & \sum^n_{i=1}x_{i2}^2 &\sum^n_{i=1}x_{i2}x_{i3} \\ \sum^n_{i=1}x_{i3}x_{i1} & \sum^n_{i=1}x_{i3}x_{i2} &\sum^n_{i=1}x_{i3}^2 \end{bmatrix} = \sum^n_{i=1}x_ix'_i \]
where \(x_i=(x_{i1}, x_{i2}, x_{i3})'\) is the \(i\)-th row of \(X\) written as a column vector and \(x_{i1}=1\) for every \(i\), since each outer product is
\[ x_ix'_i=\begin{bmatrix} x_{i1}x_{i1} & x_{i1}x_{i2}& x_{i1}x_{i3} \\ x_{i2}x_{i1} & x_{i2}x_{i2}& x_{i2}x_{i3} \\ x_{i3}x_{i1} & x_{i3}x_{i2}& x_{i3}x_{i3} \end{bmatrix} \]
Question (c) For this special case (K=3) verify that: \(X'y=\sum_i^nx_iy_i\)
\[ X'y= \begin{bmatrix} 1 & 1 & \cdots &1 \\x_{12} & x_{22} & \cdots & x_{n2} \\x_{13} & x_{23} & \cdots & x_{n3}\end{bmatrix}\begin{bmatrix} y_{1}\\\vdots\\y_{n}\end{bmatrix} =\begin{bmatrix} \sum_{i=1}^n y_i\\ \sum_{i=1}^n x_{i2}y_i \\ \sum_{i=1}^n x_{i3} y_i \end{bmatrix} =\sum_{i=1}^n x_iy_i \]
Each entry of \(X'y\) is the dot product of a column of \(X\) with \(y\), which is exactly what the sum \(\sum_{i=1}^n x_iy_i\) collects row by row.
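Both identities can be verified numerically; a minimal R sketch with arbitrary simulated data for \(K=3\):
set.seed(1)
n <- 5
X <- cbind(1, rnorm(n), rnorm(n))        # columns: constant, x2, x3
y <- rnorm(n)
sum_xx <- Reduce(`+`, lapply(1:n, function(i) X[i, ] %*% t(X[i, ])))  # sum of outer products x_i x_i'
sum_xy <- Reduce(`+`, lapply(1:n, function(i) X[i, ] * y[i]))         # sum of x_i y_i
all.equal(t(X) %*% X, sum_xx)            # TRUE
all.equal(c(t(X) %*% y), sum_xy)         # TRUE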
Problem 4. Consider \(n\) observations of the vector of variables \([v_1, v_2]^T\). The center of the space of these variables is given by the mean vector \(\bar{v}= [1,2]^T\), and the covariance matrix is given by:
\[S = \begin{bmatrix} 4 & 1.5\\ 1.5 &1 \end{bmatrix}\]
Consider we are interested in calculating the distance between the observation point \(v_0= [2,1]^T\) and the center of the space \(\bar{v}\).
Question (a & b)
Please, see Matlab code attached.
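Assuming parts (a) and (b) ask for the Euclidean and the Mahalanobis distance between \(v_0\) and \(\bar{v}\), an equivalent minimal sketch in R would be:
v_bar <- c(1, 2)
v_0   <- c(2, 1)
S     <- matrix(c(4, 1.5, 1.5, 1), nrow = 2)
d     <- v_0 - v_bar
sqrt(sum(d^2))                       # Euclidean distance
sqrt(t(d) %*% solve(S) %*% d)        # Mahalanobis distance
sqrt(mahalanobis(v_0, v_bar, S))     # same value via the base R helper, which returns the squared distance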
Question (c) \[ \begin{equation*} \overline{v}= \begin{pmatrix} E(v_1) \\ E(v_2) \end{pmatrix} \rightarrow \begin{pmatrix} E(10v_1) \\ E(10v_2) \end{pmatrix} = \begin{pmatrix} 10E(v_1) \\ 10E(v_2) \end{pmatrix} =10\overline{v} \end{equation*}\]
The mean vector \(\overline{v}\) is given by the expected value of \(v_n\). The expected value is a linear operator, hence multiplying each vector in the space by a scalar \(a\) has the effect of multiplying \(\overline{v}\) also by a scalar \(a\).
\[ S= \begin{pmatrix} Var(v_1) & Cov(v_1,v_2)\\ Cov(v_1,v_2) & Var(v_2) \end{pmatrix} \rightarrow \begin{pmatrix} Var(10v_1) & Cov(10v_1,10v_2)\\ Cov(10v_1,10v_2) & Var(10v_2) \end{pmatrix} = \begin{pmatrix} 100Var(v_1) & 100Cov(v_1,v_2)\\ 100Cov(v_1,v_2) & 100Var(v_2) \end{pmatrix} =100S \]
Note that
\[Cov(X,Y)=E(XY)-E(X)E(Y)\rightarrow \] \[Cov(10X,10Y)=\] \[E(10X\cdot 10Y)-E(10X)E(10Y)=\] \[100E(XY)-100E(X)E(Y)= \] \[100(E(XY)-E(X)E(Y))=\] \[100Cov(X,Y)\] Note that multiplying each \(v_n\) by a scalar \(a\) has the effect of multiplying the variance-covariance matrix \(S\) by \(a^2\).
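A quick numerical illustration of both scaling results, sketched in R with arbitrary simulated data:
set.seed(1)
V <- cbind(v1 = rnorm(100), v2 = rnorm(100))   # arbitrary sample of the two variables
all.equal(colMeans(10 * V), 10 * colMeans(V))  # TRUE: the mean vector scales by a
all.equal(cov(10 * V), 100 * cov(V))           # TRUE: the covariance matrix scales by a^2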
Question (d)
The Euclidean distance between \(v_0\) and \(\overline{v}\) is unit-dependent: it scales linearly with any rescaling of the variables. The Mahalanobis distance is unit-free, hence independent of any rescaling.
Problem 5. The article by Enikolopov et al. (2018), “Social media and corruption”, estimates the following regression.
\[ R1: Corruption = \beta_1 + \beta_2ln(gdp) +\beta_3Socialnetshare +\epsilon \]
Using the data set corruption_socialmedia.csv, which includes the authors’ data set, we would like to verify that the OLS estimate of parameter \(\beta_2\) from running regression R1 above can also be obtained by running the associated FWL regression, defined by the FWL Theorem.
Questions (a, b & e) Please, see Matlab code attached.
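The FWL check itself can be sketched in R along the following lines; the column names corruption, gdp and socialnetshare are placeholders, since the exact names in corruption_socialmedia.csv may differ:
dat <- read.csv("corruption_socialmedia.csv")                         # placeholder column names below
r1 <- lm(corruption ~ log(gdp) + socialnetshare, data = dat)          # regression R1
ry <- resid(lm(corruption ~ log(gdp), data = dat))                    # M_x1 y
rx <- resid(lm(socialnetshare ~ log(gdp), data = dat))                # M_x1 X_2
r2 <- lm(ry ~ rx - 1)                                                 # FWL regression R2
coef(r1)["socialnetshare"]                                            # coincides with coef(r2)["rx"]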
Question (c)
Show that \(y'M_xy\) is equivalent to \(SSE(\beta)\)
\[\mathbf{SSE(\boldsymbol{\beta})}=\hat\epsilon' \hat\epsilon = (M_xy)'(M_xy)=y'M_x'M_xy=y'M_xM_xy=\mathbf{y'M_xy}\] The first step uses \(\hat\epsilon=y-\hat{y}=M_xy\); the second-last step is possible because \(M_x\) is a symmetric matrix, whereas the last step exploits the fact that \(M_x\) is idempotent.
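This identity can also be verified numerically; a minimal sketch reusing model_1 and the marketing data from Problem 1:
X <- model.matrix(model_1)
y <- marketing$sales
M <- diag(nrow(X)) - X %*% solve(t(X) %*% X) %*% t(X)      # residual maker matrix
all.equal(sum(resid(model_1)^2), c(t(y) %*% M %*% y))      # TRUE: e'e = y' M y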
Question (d)
\[R2:M_{x1} y = M_{x1} X_2 \gamma_2 + u \]
where
\[M_{x1} = I_n-X_1(X_1'X_1)^{-1} X_{1}'\]
In words, R2 is a regression of residuals on residuals. Specifically, \(M_{x1}y\) is an \(nx1\) vector of residuals from the regression in which corruption is the LHS variable and the regressor matrix is \(X_1\), composed of a vector of constants and the log of GDP per capita. On the other side, \(M_{x1}X_2\) is an \(nx1\) vector of residuals from regressing \(X_2\) (social media usage) on \(X_1\) (constant, log of GDP per capita). Regression R2 essentially regresses \(y\) on \(X_2\) while taking into account the effect of \(X_1\), captured through the residual maker matrix \(M_{x1}\). Hence, it delivers exactly the same estimate for the \(X_2\) coefficient as R1, and the interpretation of the coefficient is the same.
Question (f)
The figure plots the association between corruption and the share of social network users, taking into account the level of GDP per capita. The line is simply the graphical representation of R2. Note that the coefficient indicated in the chart is exactly the same as the one we estimated in R1 and R2.