• See the website for how to submit your answers and how feedback is organized.
• For parts (e) and (f), you need the regression results discussed in Lectures 2.1 and 2.5.

Goals and skills being used:
• Use matrix methods in the econometric analysis of multiple regression.
• Employ matrices and statistical methods in multiple regression analysis.
• Give numerical verification of mathematical results.
This test exercise is of a theoretical nature. In our discussion of the F-test, the total set of explanatory factors was split into two parts. The factors in X1 are always included in the model, whereas those in X2 are possibly removed. In parts (a), (b), and (c) you derive relations between the two OLS estimates of the effects of X1 on y, one in the large model and the other in the small model. In parts (d), (e), and (f), you check the relation of part (c) numerically for the wage data of our lectures. We use the notation of Lecture 2.4.2 and assume that the standard regression assumptions A1–A6 are satisfied for the unrestricted model. The restricted model is obtained by deleting the set of g explanatory factors collected in the last g columns X2 of X. We write the model with X = (X1 X2) and the corresponding partitioning of the OLS estimator b into b1 and b2 as y = X1β1 + X2β2 + ε = X1b1 + X2b2 + e. We denote by bR the OLS estimator of β1 obtained by regressing y on X1, so that bR = (X′1X1)−1X′1y. Further, let P = (X′1X1)−1X′1X2.
# Part (a): express bR in terms of ϵ and take expectations.
# bR = (X′1X1)−1X′1y
#    = (X′1X1)−1X′1(X1β1 + X2β2 + ϵ)                 ... by definition y = X1β1 + X2β2 + ϵ
#    = (X′1X1)−1X′1X1β1 + (X′1X1)−1X′1X2β2 + (X′1X1)−1X′1ϵ
#    = Iβ1 + Pβ2 + (X′1X1)−1X′1ϵ                     ... since (X′1X1)−1X′1X1 = I
#                                                        and (X′1X1)−1X′1X2 = P
#    = β1 + Pβ2 + (X′1X1)−1X′1ϵ
# As a consequence, the expectation of bR is:
# E(bR) = E(β1 + Pβ2 + (X′1X1)−1X′1ϵ)
#       = β1 + Pβ2 + (X′1X1)−1X′1E(ϵ)                ... X fixed (A2, A6)
#       = β1 + Pβ2                                   ... E(ϵ) = 0 (A3)
# Part (b): using part (a), write bR − E(bR) in terms of ϵ:
# bR − E(bR) = [β1 + Pβ2 + (X′1X1)−1X′1ϵ] − [β1 + Pβ2]
#            = (X′1X1)−1X′1ϵ                         ... by (a)
# Then determine var(bR):
# var(bR) = E([bR − E(bR)][bR − E(bR)]′)
#         = E([(X′1X1)−1X′1ϵ][(X′1X1)−1X′1ϵ]′)
#         = E([(X′1X1)−1X′1ϵ][ϵ′X1(X′1X1)−1])        ... transpose; (X′1X1)−1 is symmetric
#         = E((X′1X1)−1X′1ϵϵ′X1(X′1X1)−1)
#         = (X′1X1)−1X′1E(ϵϵ′)X1(X′1X1)−1            ... X fixed (A2, A6)
#         = (X′1X1)−1X′1(σ²I)X1(X′1X1)−1             ... E(ϵϵ′) = σ²I (A3, A4)
#         = σ²(X′1X1)−1X′1X1(X′1X1)−1
#         = σ²(X′1X1)−1
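# A minimal simulation sketch (not part of the exercise) to check (a) and (b)
# numerically: all values below (n, sigma, beta1, beta2 and the simulated X1, X2)
# are illustrative choices, not data from the course.
set.seed(123)
n <- 200; sigma <- 0.5
X1 <- cbind(1, rnorm(n))              # constant plus one included regressor
X2 <- cbind(rnorm(n), rnorm(n))       # two regressors that the small model omits
beta1 <- c(1, 2); beta2 <- c(0.5, -1)
P <- solve(t(X1) %*% X1) %*% t(X1) %*% X2
bR_draws <- replicate(5000, {
  y <- X1 %*% beta1 + X2 %*% beta2 + rnorm(n, sd = sigma)
  drop(solve(t(X1) %*% X1) %*% t(X1) %*% y)
})
rowMeans(bR_draws)                    # should be close to beta1 + P beta2
drop(beta1 + P %*% beta2)
cov(t(bR_draws))                      # should be close to sigma^2 (X1'X1)^(-1)
sigma^2 * solve(t(X1) %*% X1)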
#
#
# Part (c): express bR in terms of b1 and b2.
# bR = (X′1X1)−1X′1y
#    = (X′1X1)−1X′1(X1b1 + X2b2 + e)
#    = (X′1X1)−1X′1X1b1 + (X′1X1)−1X′1X2b2 + (X′1X1)−1X′1e
#    = b1 + Pb2 + (X′1X1)−1X′1e
#    = b1 + Pb2                                      ... the OLS residuals of the full model
#                                                        satisfy X′e = 0, hence X′1e = 0
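# Quick check (not part of the exercise): the relation bR = b1 + Pb2 is an
# algebraic identity, so it holds exactly for any data set; the regressors and
# response below are arbitrary simulated values, chosen only for illustration.
set.seed(1)
m <- 50
X1s <- cbind(1, rnorm(m))
X2s <- cbind(rnorm(m), runif(m))
Xs  <- cbind(X1s, X2s)
ys  <- rnorm(m)
bs  <- solve(t(Xs) %*% Xs) %*% t(Xs) %*% ys          # full-model OLS (b1, b2)
Ps  <- solve(t(X1s) %*% X1s) %*% t(X1s) %*% X2s      # P
bRs <- solve(t(X1s) %*% X1s) %*% t(X1s) %*% ys       # small-model OLS
all.equal(drop(bRs), drop(bs[1:2] + Ps %*% bs[3:4])) # TRUE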
Now consider the wage data of Lectures 2.1 and 2.5. Let y be log-wage (500×1 vector), and let X1 be the (500×2) matrix for the constant term and the variable ‘Female’. Further let X2 be the (500 × 3) matrix with observations of the variables ‘Age’, ‘Educ’ and ‘Parttime’. The values of bR were given in Lecture 2.1, and those of b in Lecture 2.5.
# Part (d): the dimensions of the matrices involved are:
# a) X1 is an (n×2) matrix, with columns for the constant term and the ‘Female’ variable.
# b) X′1 is its (2×n) transpose.
# c) X2 is an (n×3) matrix, with columns for the ‘Age’, ‘Educ’ and ‘Parttime’ variables.
# d) Consequently, (X′1X1)−1 is a (2×2) matrix,
# e) and (X′1X1)−1X′1 is a (2×n) matrix.
# It follows that P = (X′1X1)−1X′1X2 is a (2×3) matrix whose columns are:
# Column 1: (X′1X1)−1X′1(Age), the OLS coefficients from regressing ‘Age’ on X1;
# Column 2: (X′1X1)−1X′1(Educ), the OLS coefficients from regressing ‘Educ’ on X1;
# Column 3: (X′1X1)−1X′1(Parttime), the OLS coefficients from regressing ‘Parttime’ on X1.
# (This interpretation is checked numerically after P is computed below.)
# Loading data
library(readxl)
df <- read_excel("DataSet2.xls",col_names = TRUE)
summary(df)
## Observation Wage LogWage Female
## Min. : 1.0 Min. : 32.0 Min. :3.466 Min. :0.000
## 1st Qu.:125.8 1st Qu.: 72.0 1st Qu.:4.277 1st Qu.:0.000
## Median :250.5 Median :100.0 Median :4.605 Median :0.000
## Mean :250.5 Mean :114.9 Mean :4.641 Mean :0.368
## 3rd Qu.:375.2 3rd Qu.:144.0 3rd Qu.:4.970 3rd Qu.:1.000
## Max. :500.0 Max. :384.0 Max. :5.951 Max. :1.000
## Age Educ Parttime
## Min. :20.00 Min. :1.000 Min. :0.000
## 1st Qu.:32.00 1st Qu.:1.000 1st Qu.:0.000
## Median :39.00 Median :2.000 Median :0.000
## Mean :40.01 Mean :2.078 Mean :0.288
## 3rd Qu.:47.00 3rd Qu.:3.000 3rd Qu.:1.000
## Max. :70.00 Max. :4.000 Max. :1.000
# Building the matrix X1 (n×2):
constant <- rep(1,500)
matrix2_n <- data.frame(Constant = constant, Female = df$Female)
summary(matrix2_n)
## Constant Female
## Min. :1 Min. :0.000
## 1st Qu.:1 1st Qu.:0.000
## Median :1 Median :0.000
## Mean :1 Mean :0.368
## 3rd Qu.:1 3rd Qu.:1.000
## Max. :1 Max. :1.000
x1 <- cbind("Constant"=matrix2_n$Constant, "Female" = matrix2_n$Female)
str(x1)
## num [1:500, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:2] "Constant" "Female"
# and then its X′1 (2×n) transpose:
x1_t<- t(x1)
str(x1_t)
## num [1:2, 1:500] 1 0 1 1 1 1 1 0 1 1 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:2] "Constant" "Female"
## ..$ : NULL
# Building the X2 (n×3) matrix:
mat_3_n <- data.frame(Age = df$Age, Educ= df$Educ, Parttime= df$Parttime)
summary(mat_3_n)
## Age Educ Parttime
## Min. :20.00 Min. :1.000 Min. :0.000
## 1st Qu.:32.00 1st Qu.:1.000 1st Qu.:0.000
## Median :39.00 Median :2.000 Median :0.000
## Mean :40.01 Mean :2.078 Mean :0.288
## 3rd Qu.:47.00 3rd Qu.:3.000 3rd Qu.:1.000
## Max. :70.00 Max. :4.000 Max. :1.000
x2 <- cbind("Age"=mat_3_n$Age, "Educ"=mat_3_n$Educ, "Parttime"=mat_3_n$Parttime)
summary(x2)
## Age Educ Parttime
## Min. :20.00 Min. :1.000 Min. :0.000
## 1st Qu.:32.00 1st Qu.:1.000 1st Qu.:0.000
## Median :39.00 Median :2.000 Median :0.000
## Mean :40.01 Mean :2.078 Mean :0.288
## 3rd Qu.:47.00 3rd Qu.:3.000 3rd Qu.:1.000
## Max. :70.00 Max. :4.000 Max. :1.000
# Producing the X′1X1 (2×2) matrix:
x_xt<- (x1_t%*%x1)
x_xt
## Constant Female
## Constant 500 184
## Female 184 184
# Now getting the inverse:
x_xti <- solve(x_xt)
x_xti
## Constant Female
## Constant 0.003164557 -0.003164557
## Female -0.003164557 0.008599340
# Now computing the (X′1X1)−1X′1 (2×n) matrix:
premult <- x_xti%*%x1_t
# Finally, producing the P = (X′1X1)−1X′1X2 (2×3) matrix:
Matrix_P <- premult%*%x2
round(Matrix_P,digits = 2)
## Age Educ Parttime
## Constant 40.05 2.26 0.20
## Female -0.11 -0.49 0.25
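# Cross-check (not required by the exercise): as argued in part (d), each column
# of P should equal the OLS coefficients from regressing the corresponding X2
# variable on the constant and 'Female'.
coef(lm(Age ~ Female, data = df))        # first column of P
coef(lm(Educ ~ Female, data = df))       # second column of P
coef(lm(Parttime ~ Female, data = df))   # third column of P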
# Fitting the unrestricted model from Lecture 2.5:
fit <- lm(LogWage~ Female+Age+Educ+Parttime, data=df)
summary(fit)
##
## Call:
## lm(formula = LogWage ~ Female + Age + Educ + Parttime, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.73843 -0.15752 -0.00406 0.16491 0.77868
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.052685 0.055334 55.168 <2e-16 ***
## Female -0.041101 0.024711 -1.663 0.0969 .
## Age 0.030606 0.001273 24.041 <2e-16 ***
## Educ 0.233178 0.010660 21.874 <2e-16 ***
## Parttime -0.365449 0.031571 -11.576 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2452 on 495 degrees of freedom
## Multiple R-squared: 0.704, Adjusted R-squared: 0.7016
## F-statistic: 294.3 on 4 and 495 DF, p-value: < 2.2e-16
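# Side check (not part of the exercise): the same coefficient vector follows
# from the matrix OLS formula b = (X′X)−1X′y applied to the full regressor
# matrix X = (X1 X2) built above.
x_full <- cbind(x1, x2)
b_ols  <- solve(t(x_full) %*% x_full) %*% t(x_full) %*% df$LogWage
round(b_ols, 6)   # reproduces the estimates in coef(fit)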
# Extracting the coefficients: the first two (constant, 'Female') form b1,
# the last three ('Age', 'Educ', 'Parttime') form b2:
b1 <- fit$coefficients[[1]]
b2 <- fit$coefficients[[2]]
b3 <- fit$coefficients[[3]]
b4 <- fit$coefficients[[4]]
b5 <- fit$coefficients[[5]]
basic_b <- rbind(b1,b2)
complementary_b <- rbind(b3,b4,b5)
# Recalling the matrix P computed above in part (e):
Matrix_P
## Age Educ Parttime
## Constant 40.0506329 2.2594937 0.1962025
## Female -0.1104155 -0.4931893 0.2494496
# Finally, computing bR = b1 + P b2:
rb <- basic_b + (Matrix_P %*% complementary_b)
round(rb, digits = 2)
## [,1]
## b1 4.73
## b2 -0.25
This matches the bR reported in Lecture 2.1 for the OLS regression of ‘Log(Wage)’ on the constant and the ‘Female’ variable: Log(Wage)i = 4.73 − 0.25 Femalei + ei.
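As a further check (not part of the required answer), running the restricted regression directly in R reproduces these values:
# Restricted regression of 'LogWage' on a constant and 'Female' only:
round(coef(lm(LogWage ~ Female, data = df)), 2)   # gives 4.73 and -0.25, matching rb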