Homemade_Regression <- function(X, y) {
  # Standard LS formula, which calculates the coefficient estimates
  b <- solve(t(X) %*% X) %*% t(X) %*% y
  print(b)
  # Residuals of the LS regression line
  e <- y - (X %*% b)
  # Estimated sigma^2. Note: since X is an n x K matrix, we use the number of rows (n) and
  # the number of columns (K) of X to calculate the degrees of freedom, n - K.
  sigma_squared <- (t(e) %*% e) / (nrow(X) - ncol(X))
  # sigma_squared is a 1 x 1 matrix, so multiplying it directly by the K x K matrix
  # solve(t(X) %*% X) gives a "non-conformable arrays" error. Filling a K x K matrix
  # with the value gets around that.
  Sigma_Squared_Matrix <- matrix(sigma_squared, nrow = ncol(X), ncol = ncol(X))
  # Element-wise * scales every entry of (X'X)^-1 by sigma^2, which gives the
  # variance-covariance matrix. %*% would instead form a matrix product with the
  # constant-filled matrix, which is not a scaling, so it gives the wrong result.
  SE <- sqrt(Sigma_Squared_Matrix * solve(t(X) %*% X))
  # The standard errors are along the diagonal of the printed matrix. Off-diagonal
  # entries are square roots of covariances and are NaN wherever the covariance is negative.
  print(SE)
}
PS1
Problem 1
Problem 2 with created function
matrix_cars <- as.matrix(mtcars)
Intercept <- rep(1,nrow(matrix_cars))
Matrix_cars_with_int <- cbind(Intercept,matrix_cars)
FunctionRegression = Homemade_Regression((Matrix_cars_with_int[,c(1,3,5,7)]),Matrix_cars_with_int[,c(2)])
                [,1]
Intercept 38.7517874
cyl       -0.9416168
hp        -0.0180381
wt        -3.1669731
Warning in sqrt(Sigma_Squared_Matrix * solve(t(X) %*% X)): NaNs produced
          Intercept       cyl         hp        wt
Intercept 1.7868640       NaN 0.08548060       NaN
cyl             NaN 0.5509164        NaN       NaN
hp        0.0854806       NaN 0.01187625       NaN
wt              NaN       NaN        NaN 0.7405759
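The NaN entries in this output come from taking the square root of negative off-diagonal covariances; only the diagonal of the variance-covariance matrix is needed for the standard errors. A minimal sketch of that variant follows, assuming the same regressors and response as in the call above. `homemade_se` is a hypothetical helper name, not part of the original function.

```r
# Hypothetical helper: returns only the standard errors (diagonal of the vcov matrix).
homemade_se <- function(X, y) {
  XtX_inv <- solve(t(X) %*% X)                 # (X'X)^-1
  b <- XtX_inv %*% t(X) %*% y                  # LS coefficient estimates
  e <- y - X %*% b                             # residuals
  # as.numeric() turns the 1 x 1 matrix into a scalar, so no reshaping is needed
  sigma_squared <- as.numeric(t(e) %*% e) / (nrow(X) - ncol(X))
  vcov_b <- sigma_squared * XtX_inv            # variance-covariance matrix of b
  sqrt(diag(vcov_b))                           # variances are non-negative, so no NaN
}

# Same design matrix and response as in the call above, built by column name.
X <- cbind(Intercept = 1, as.matrix(mtcars[, c("cyl", "hp", "wt")]))
y <- mtcars$mpg
se <- homemade_se(X, y)
print(se)
```

Selecting columns by name instead of by position is a design choice made here to avoid off-by-one mistakes once the intercept column is prepended.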
Problem 2 with lm()
lm_regression = lm(mpg ~ cyl + hp + wt, data=mtcars)
summary(lm_regression)
Call:
lm(formula = mpg ~ cyl + hp + wt, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-3.9290 -1.5598 -0.5311  1.1850  5.8986
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.75179    1.78686  21.687  < 2e-16 ***
cyl         -0.94162    0.55092  -1.709 0.098480 .
hp          -0.01804    0.01188  -1.519 0.140015
wt          -3.16697    0.74058  -4.276 0.000199 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.512 on 28 degrees of freedom
Multiple R-squared: 0.8431, Adjusted R-squared: 0.8263
F-statistic: 50.17 on 3 and 28 DF, p-value: 2.184e-11
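The t value and Pr(>|t|) columns in the summary follow from the estimates and standard errors: each t value is the estimate divided by its standard error, and each p-value is the two-sided tail probability of a t distribution with n - K = 28 degrees of freedom. A short sketch of that arithmetic, refitting the model so the block is self-contained (the variable names other than `lm_regression` are illustrative):

```r
lm_regression <- lm(mpg ~ cyl + hp + wt, data = mtcars)
coefs <- summary(lm_regression)$coefficients

estimate  <- coefs[, "Estimate"]
std_error <- coefs[, "Std. Error"]
df_resid  <- df.residual(lm_regression)        # n - K = 32 - 4

t_value <- estimate / std_error                # t statistic for H0: coefficient = 0
p_value <- 2 * pt(-abs(t_value), df = df_resid)  # two-sided p-value

print(cbind(t_value, p_value))
```

The same two lines would apply to the homemade estimates, since they agree with the lm() estimates and standard errors to the digits printed.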
Problem 3
To be approximately correct, our standard errors rely upon the assumptions of:
Linearity (A1), as described in Problem 4. A correctly specified linear model also helps ensure that the expected value of the disturbance is 0.
Full Rank (A2): X must have full column rank, as described in Problem 4. If one of the x variables were an exact linear combination of the others, X'X could not be inverted, so neither the coefficients nor their standard errors could be computed.
Exogeneity (A3): this ensures that the expected value of the ith disturbance, conditional on the ith observation of x, is 0. If the independent variables were not exogenous, the error term would contain omitted variables correlated with them. The coefficient estimates would then be biased, and so would the residuals from which the standard errors are computed.
Homoscedasticity and non-autocorrelation (A4): the error term must have constant variance, and the errors must be uncorrelated across observations. Without homoscedasticity, the variance is far higher at some points in the data than at others, so a single estimate of sigma^2 misstates it, leading to biased standard errors.
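The full-rank assumption (A2) can be checked directly. The sketch below rebuilds the design matrix from Problem 2 and compares its rank with its number of columns. The extra column `cyl_doubled` is a hypothetical addition, made only to break the assumption.

```r
X <- cbind(Intercept = 1, as.matrix(mtcars[, c("cyl", "hp", "wt")]))
rank_ok <- qr(X)$rank                          # equals ncol(X) when X has full rank

# Hypothetical redundant regressor: an exact multiple of cyl.
X_bad <- cbind(X, cyl_doubled = 2 * X[, "cyl"])
rank_bad <- qr(X_bad)$rank                     # falls short of ncol(X_bad)

cat("rank(X)     =", rank_ok, "with", ncol(X), "columns\n")
cat("rank(X_bad) =", rank_bad, "with", ncol(X_bad), "columns\n")

# With a rank-deficient X, X'X is singular, so the inverse used for the coefficients
# and standard errors is either unavailable or numerically meaningless.
inv_attempt <- tryCatch(
  solve(t(X_bad) %*% X_bad),
  error = function(err) conditionMessage(err)
)
```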
Problem 4
To be causal, our coefficients rely upon the assumptions of:
Linearity (A1 in George’s class), which states that the relationship between the independent and dependent variable must be linear.
Full Rank (A2), which states that X has rank K, which rules out one independent variable being an exact linear combination of other independent variables.
Exogeneity (A3), which states that the independent variables must be exogenous, i.e. the expected value of the error term, conditional upon X, is equal to 0. If this were not the case, we would likely have omitted variable bias, and our coefficients would be biased.
Homoscedasticity and non-autocorrelation (A4), which states that the error term must have constant variance and no correlation across observations. Without this assumption the coefficient estimates themselves remain unbiased, but their standard errors are biased, so we may think coefficients are statistically significant when they in fact are not.
And, while it is not an assumption of the OLS model, we would typically like to be able to answer all 4 of Angrist and Pischke’s fundamental questions for research in order to be sure to establish a causal relationship.
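The omitted variable concern under A3 can be illustrated with the same data: dropping regressors that are correlated with wt changes the wt coefficient. This is only a sketch. It shows how the estimate moves between two specifications, not which of them (if either) is causal.

```r
short_fit <- lm(mpg ~ wt, data = mtcars)             # omits cyl and hp
long_fit  <- lm(mpg ~ cyl + hp + wt, data = mtcars)  # model from Problem 2

wt_short <- unname(coef(short_fit)["wt"])
wt_long  <- unname(coef(long_fit)["wt"])

# wt is correlated with the omitted regressors, so in the short model its
# coefficient absorbs part of their association with mpg.
print(cor(mtcars$wt, mtcars[, c("cyl", "hp")]))
print(c(short = wt_short, long = wt_long))
```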