MOOC Econometrics

Test Exercise 2

Notes:

• See website for how to submit your answers and how feedback is organized.
• For parts (e) and (f), you need regression results discussed in Lectures 2.1 and 2.5.

Goals and skills being used:

• Use matrix methods in the econometric analysis of multiple regression.
• Employ matrices and statistical methods in multiple regression analysis.
• Give numerical verification of mathematical results.

Questions

This test exercise is of a theoretical nature. In our discussion of the F-test, the total set of explanatory factors was split in two parts. The factors in X1 are always included in the model, whereas those in X2 are possibly removed. In questions (a), (b), and (c) you derive relations between the two OLS estimates of the effects of X1 on y, one in the large model and the other in the small model. In parts (d), (e), and (f), you check the relation of question (c) numerically for the wage data of our lectures.

We use the notation of Lecture 2.4.2 and assume that the standard regression assumptions A1-A6 are satisfied for the unrestricted model. The restricted model is obtained by deleting the set of g explanatory factors collected in the last g columns X2 of X. We wrote the model with X = (X1 X2) and corresponding partitioning of the OLS estimator b into b1 and b2 as y = X1β1 + X2β2 + ε = X1b1 + X2b2 + e. We denote by bR the OLS estimator of β1 obtained by regressing y on X1, so that bR = (X1′X1)⁻¹X1′y. Further, let P = (X1′X1)⁻¹X1′X2.

(a) Prove that E(bR) = β1 + Pβ2.

# Step 1: Express bR in terms of ε.

# bR = (X1′X1)⁻¹X1′y
#    = (X1′X1)⁻¹X1′(X1β1 + X2β2 + ε)
#    = (X1′X1)⁻¹X1′X1β1 + (X1′X1)⁻¹X1′X2β2 + (X1′X1)⁻¹X1′ε
#    = Iβ1 + (X1′X1)⁻¹X1′X2β2 + (X1′X1)⁻¹X1′ε
#    = β1 + Pβ2 + (X1′X1)⁻¹X1′ε

# using that, by definition, y = X1β1 + X2β2 + ε,
# and, by matrix algebra, (X1′X1)⁻¹X1′X1 = I and (X1′X1)⁻¹X1′X2 = P.

# Step 2: Take expectations.

# E(bR) = E(β1 + Pβ2 + (X1′X1)⁻¹X1′ε)
#       = β1 + Pβ2 + (X1′X1)⁻¹X1′E(ε)
#       = β1 + Pβ2,   since X is fixed (A2, A6) and E(ε) = 0 (A3)
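
# The expectation result can also be illustrated by a small Monte Carlo experiment.
# The sketch below uses an arbitrary simulated design (all object names and parameter
# values are made up for illustration, not taken from the lectures): it draws many
# error vectors, recomputes bR each time, and compares the average with β1 + Pβ2.

# Monte Carlo illustration of E(bR) = β1 + Pβ2 (illustrative values, not the wage data)
set.seed(123)
n  <- 200
X1 <- cbind(1, rnorm(n))                    # constant + one regressor kept in the model
X2 <- cbind(rnorm(n), rbinom(n, 1, 0.3))    # two regressors that may be deleted
beta1 <- c(2, 0.5); beta2 <- c(1, -0.8); sigma <- 0.4
P  <- solve(t(X1) %*% X1) %*% t(X1) %*% X2  # P = (X1'X1)^(-1) X1'X2 (X held fixed)

bR_draws <- replicate(5000, {
  y <- X1 %*% beta1 + X2 %*% beta2 + rnorm(n, sd = sigma)
  drop(solve(t(X1) %*% X1) %*% t(X1) %*% y) # bR = (X1'X1)^(-1) X1'y
})
rowMeans(bR_draws)                          # Monte Carlo average of bR ...
drop(beta1 + P %*% beta2)                   # ... should be close to β1 + Pβ2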

(b) Prove that var(bR) = σ²(X1′X1)⁻¹.

# Using part (a), write bR − E(bR) in terms of ε:

# bR − E(bR) = [β1 + Pβ2 + (X1′X1)⁻¹X1′ε] − [β1 + Pβ2]
#            = (X1′X1)⁻¹X1′ε

# Then determine var(bR):

# var(bR) = E([bR − E(bR)][bR − E(bR)]′)
#         = E([(X1′X1)⁻¹X1′ε][(X1′X1)⁻¹X1′ε]′)
#         = E([(X1′X1)⁻¹X1′ε][ε′(X1′)′((X1′X1)⁻¹)′])
#         = E([(X1′X1)⁻¹X1′ε][ε′X1(X1′X1)⁻¹])      since (X1′X1)⁻¹ is symmetric
#         = E((X1′X1)⁻¹X1′εε′X1(X1′X1)⁻¹)
#         = (X1′X1)⁻¹X1′E(εε′)X1(X1′X1)⁻¹          since X is fixed (A2, A6)
#         = (X1′X1)⁻¹X1′σ²IX1(X1′X1)⁻¹             since E(εε′) = σ²I (A3, A4)
#         = σ²(X1′X1)⁻¹X1′X1(X1′X1)⁻¹
#         = σ²(X1′X1)⁻¹
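
# The variance result can be illustrated in the same way. The sketch below (again with
# an arbitrary simulated design, not the wage data) compares the Monte Carlo covariance
# matrix of bR with σ²(X1′X1)⁻¹.

# Monte Carlo illustration of var(bR) = σ²(X1'X1)^(-1) (illustrative values)
set.seed(456)
n  <- 200
X1 <- cbind(1, rnorm(n))
X2 <- cbind(rnorm(n), rbinom(n, 1, 0.3))
beta1 <- c(2, 0.5); beta2 <- c(1, -0.8); sigma <- 0.4

bR_draws <- replicate(5000, {
  y <- X1 %*% beta1 + X2 %*% beta2 + rnorm(n, sd = sigma)
  drop(solve(t(X1) %*% X1) %*% t(X1) %*% y)
})
cov(t(bR_draws))                 # empirical covariance of bR over the draws ...
sigma^2 * solve(t(X1) %*% X1)    # ... should be close to σ²(X1'X1)^(-1)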

(c) Prove that bR = b1 + Pb2.

# Determine bR, now using the fitted decomposition y = X1b1 + X2b2 + e:
# bR = (X1′X1)⁻¹X1′y
#    = (X1′X1)⁻¹X1′(X1b1 + X2b2 + e)
#    = (X1′X1)⁻¹X1′X1b1 + (X1′X1)⁻¹X1′X2b2 + (X1′X1)⁻¹X1′e
#    = Ib1 + (X1′X1)⁻¹X1′X2b2 + (X1′X1)⁻¹X1′e
#    = b1 + Pb2 + (X1′X1)⁻¹X1′e
#    = b1 + Pb2
#
# The last step uses X1′e = 0: the OLS normal equations give X′e = 0, and X1 consists
# of the first columns of X, so in particular X1′e = 0.
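
# Unlike (a) and (b), the relation bR = b1 + Pb2 is an algebraic identity, so it holds
# exactly for every single sample, not just in expectation. A minimal sketch with
# simulated data (all values and names below are illustrative only):

# One-sample check of bR = b1 + P b2 (exact up to floating-point error)
set.seed(789)
n  <- 100
X1 <- cbind(1, rnorm(n))                 # regressors that are always included
X2 <- cbind(rnorm(n), runif(n))          # regressors that may be deleted
y  <- X1 %*% c(1, 2) + X2 %*% c(-1, 0.5) + rnorm(n)

X  <- cbind(X1, X2)
b  <- solve(t(X) %*% X) %*% t(X) %*% y   # OLS in the unrestricted model
b1 <- b[1:2, , drop = FALSE]             # coefficients of X1
b2 <- b[3:4, , drop = FALSE]             # coefficients of X2
P  <- solve(t(X1) %*% X1) %*% t(X1) %*% X2
bR <- solve(t(X1) %*% X1) %*% t(X1) %*% y

max(abs(bR - (b1 + P %*% b2)))           # effectively zero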

Now consider the wage data of Lectures 2.1 and 2.5. Let y be the (500×1) vector of log-wages, and let X1 be the (500×2) matrix for the constant term and the variable ‘Female’. Further, let X2 be the (500×3) matrix with observations of the variables ‘Age’, ‘Educ’ and ‘Parttime’. The values of bR were given in Lecture 2.1, and those of b in Lecture 2.5.

(d) Argue that the columns of the (2×3) matrix P are obtained by regressing each of the variables ‘Age’, ‘Educ’, and ‘Parttime’ on a constant term and the variable ‘Female’.

# Recall the dimensions of the matrices involved:
# a) X1 is an (n×2) matrix, with columns for the constant term and the ‘Female’ variable.
# b) X1′ is its (2×n) transpose.
# c) X2 is an (n×3) matrix, with columns for the ‘Age’, ‘Educ’ and ‘Parttime’ variables.
# d) Consequently, (X1′X1)⁻¹ is a (2×2) matrix,
# e) and (X1′X1)⁻¹X1′ is a (2×n) matrix.

# So we can conclude that:

# P = (X1′X1)⁻¹X1′X2 is a (2×3) matrix, whose columns are:

#       Column 1: (X1′X1)⁻¹X1′(Age);      the OLS formula for regressing ‘Age’ on X1.
#       Column 2: (X1′X1)⁻¹X1′(Educ);     the OLS formula for regressing ‘Educ’ on X1.
#       Column 3: (X1′X1)⁻¹X1′(Parttime); the OLS formula for regressing ‘Parttime’ on X1.

# (A numerical cross-check of this argument with lm() is given after the computation
#  of P in part (e) below.)

(e) Determine the values of P from the results in Lecture 2.1.

# Loading data
library(readxl)
df <- read_excel("DataSet2.xls", col_names = TRUE)
summary(df)
##   Observation         Wage          LogWage          Female     
##  Min.   :  1.0   Min.   : 32.0   Min.   :3.466   Min.   :0.000  
##  1st Qu.:125.8   1st Qu.: 72.0   1st Qu.:4.277   1st Qu.:0.000  
##  Median :250.5   Median :100.0   Median :4.605   Median :0.000  
##  Mean   :250.5   Mean   :114.9   Mean   :4.641   Mean   :0.368  
##  3rd Qu.:375.2   3rd Qu.:144.0   3rd Qu.:4.970   3rd Qu.:1.000  
##  Max.   :500.0   Max.   :384.0   Max.   :5.951   Max.   :1.000  
##       Age             Educ          Parttime    
##  Min.   :20.00   Min.   :1.000   Min.   :0.000  
##  1st Qu.:32.00   1st Qu.:1.000   1st Qu.:0.000  
##  Median :39.00   Median :2.000   Median :0.000  
##  Mean   :40.01   Mean   :2.078   Mean   :0.288  
##  3rd Qu.:47.00   3rd Qu.:3.000   3rd Qu.:1.000  
##  Max.   :70.00   Max.   :4.000   Max.   :1.000
# Building the matrix X1 (n×2):
constant <- rep(1,500)
matrix2_n <- data.frame(Constant = constant, Female = df$Female)
summary(matrix2_n)
##     Constant     Female     
##  Min.   :1   Min.   :0.000  
##  1st Qu.:1   1st Qu.:0.000  
##  Median :1   Median :0.000  
##  Mean   :1   Mean   :0.368  
##  3rd Qu.:1   3rd Qu.:1.000  
##  Max.   :1   Max.   :1.000
x1 <- cbind("Constant" = matrix2_n$Constant, "Female" = matrix2_n$Female)
str(x1)
##  num [1:500, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "Constant" "Female"
# and then its X1′ (2×n) transpose:
x1_t <- t(x1)
str(x1_t)
##  num [1:2, 1:500] 1 0 1 1 1 1 1 0 1 1 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:2] "Constant" "Female"
##   ..$ : NULL
# Building the X2 (n×3) matrix:
mat_3_n <- data.frame(Age = df$Age, Educ= df$Educ, Parttime= df$Parttime)
summary(mat_3_n)
##       Age             Educ          Parttime    
##  Min.   :20.00   Min.   :1.000   Min.   :0.000  
##  1st Qu.:32.00   1st Qu.:1.000   1st Qu.:0.000  
##  Median :39.00   Median :2.000   Median :0.000  
##  Mean   :40.01   Mean   :2.078   Mean   :0.288  
##  3rd Qu.:47.00   3rd Qu.:3.000   3rd Qu.:1.000  
##  Max.   :70.00   Max.   :4.000   Max.   :1.000
x2 <- cbind("Age"=mat_3_n$Age, "Educ"=mat_3_n$Educ, "Parttime"=mat_3_n$Parttime)
summary(x2)
##       Age             Educ          Parttime    
##  Min.   :20.00   Min.   :1.000   Min.   :0.000  
##  1st Qu.:32.00   1st Qu.:1.000   1st Qu.:0.000  
##  Median :39.00   Median :2.000   Median :0.000  
##  Mean   :40.01   Mean   :2.078   Mean   :0.288  
##  3rd Qu.:47.00   3rd Qu.:3.000   3rd Qu.:1.000  
##  Max.   :70.00   Max.   :4.000   Max.   :1.000
# Producing the X1′X1 (2×2) matrix:
x_xt <- x1_t %*% x1
x_xt
##          Constant Female
## Constant      500    184
## Female        184    184
# Now getting the inverse:
x_xti <- solve(x_xt)
x_xti
##              Constant       Female
## Constant  0.003164557 -0.003164557
## Female   -0.003164557  0.008599340
# Now get the (X1′X1)⁻¹X1′ (2×n) matrix:
premult <- x_xti %*% x1_t


# Finally, producing the P = (X1′X1)⁻¹X1′X2 (2×3) matrix:
Matrix_P <- premult %*% x2

round(Matrix_P, digits = 2)
##            Age  Educ Parttime
## Constant 40.05  2.26     0.20
## Female   -0.11 -0.49     0.25
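
# As a cross-check of the argument in part (d) and of the matrix computation above
# (not asked for in the exercise), each column of P can be reproduced with lm() by
# regressing the corresponding variable on a constant and 'Female'; the crossprod()
# line is an equivalent one-step way to obtain P. This sketch reuses df, x1, x2 and
# Matrix_P from the chunks above.

coef(lm(Age      ~ Female, data = df))   # column 'Age' of P
coef(lm(Educ     ~ Female, data = df))   # column 'Educ' of P
coef(lm(Parttime ~ Female, data = df))   # column 'Parttime' of P

# Equivalent one-step computation of P = (X1'X1)^(-1) X1'X2
P_direct <- solve(crossprod(x1), crossprod(x1, x2))
all.equal(P_direct, Matrix_P, check.attributes = FALSE)   # should be TRUE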

(f) Check the numerical validity of the result in part (c). Note: This equation will not hold exactly because the coefficients have been rounded to two or three decimals; more precise results would have been obtained for higher-precision coefficients.

# Fitting the unrestricted (full) model from Lecture 2.5:
fit <- lm(LogWage~ Female+Age+Educ+Parttime, data=df)
summary(fit)
## 
## Call:
## lm(formula = LogWage ~ Female + Age + Educ + Parttime, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.73843 -0.15752 -0.00406  0.16491  0.77868 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.052685   0.055334  55.168   <2e-16 ***
## Female      -0.041101   0.024711  -1.663   0.0969 .  
## Age          0.030606   0.001273  24.041   <2e-16 ***
## Educ         0.233178   0.010660  21.874   <2e-16 ***
## Parttime    -0.365449   0.031571 -11.576   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2452 on 495 degrees of freedom
## Multiple R-squared:  0.704,  Adjusted R-squared:  0.7016 
## F-statistic: 294.3 on 4 and 495 DF,  p-value: < 2.2e-16
# Now extract the coefficients and split them into the two blocks of part (c):
b1 <- fit$coefficients[[1]]   # constant
b2 <- fit$coefficients[[2]]   # Female
b3 <- fit$coefficients[[3]]   # Age
b4 <- fit$coefficients[[4]]   # Educ
b5 <- fit$coefficients[[5]]   # Parttime

basic_b <- rbind(b1, b2)                # coefficients of X1 (the vector b1 in part (c))
complementary_b <- rbind(b3, b4, b5)    # coefficients of X2 (the vector b2 in part (c))

# In part (e), the matrix P was computed as:
Matrix_P
##                 Age       Educ  Parttime
## Constant 40.0506329  2.2594937 0.1962025
## Female   -0.1104155 -0.4931893 0.2494496
# Finally, compute bR as in part (c): bR = b1 + P b2

rb <- basic_b + (Matrix_P %*% complementary_b)

round(rb, digits = 2)
##     [,1]
## b1  4.73
## b2 -0.25

This aligns with bR as reported in Lecture 2.1 for the OLS regression of ‘Log(Wage)’ on the constant and ‘Female’: Log(Wage)_i = 4.73 − 0.25 Female_i + e_i.
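
As a final check (not part of the exercise), bR can also be obtained directly by running the restricted regression; since the relation in part (c) is an exact algebraic identity when full-precision coefficients are used, its coefficients should agree with rb up to numerical precision.

# Direct computation of bR: regress LogWage on the constant and Female only
fit_restricted <- lm(LogWage ~ Female, data = df)
coef(fit_restricted)   # bR from the restricted regression
rb                     # b1 + P b2 computed above; the two should coincide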