Multiple Linear Regression

To build the model, I chose the uswages data set from the faraway package. I want to see how education and race relate to the weekly wage earned. First, we will look at the data to better understand it.

library(faraway)
## Warning: package 'faraway' was built under R version 3.4.3
data("uswages")
head(uswages)
##         wage educ exper race smsa ne mw so we pt
## 6085  771.60   18    18    0    1  1  0  0  0  0
## 23701 617.28   15    20    0    1  0  0  0  1  0
## 16208 957.83   16     9    0    1  0  0  1  0  0
## 2720  617.28   12    24    0    1  1  0  0  0  0
## 9723  902.18   14    12    0    1  0  1  0  0  0
## 22239 299.15   12    33    0    1  0  0  0  1  0
attach(uswages)

Now we will fit a linear model with educ as a quantitative predictor, race as a qualitative predictor (coded 0/1), and wage as the response. Then we will use the model summary to obtain the estimates, standard errors, and test statistics we need.

mymod<-lm(wage~educ+race)
summary(mymod)
## 
## Call:
## lm(formula = wage ~ educ + race)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -748.0 -260.6  -72.6  177.6 7471.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  133.668     45.071   2.966 0.003055 ** 
## educ          36.930      3.324  11.109  < 2e-16 ***
## race        -124.812     37.231  -3.352 0.000816 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 444.4 on 1997 degrees of freedom
## Multiple R-squared:  0.06692,    Adjusted R-squared:  0.06599 
## F-statistic: 71.61 on 2 and 1997 DF,  p-value: < 2.2e-16

First we will interpret the coefficients of our model. Our \(\hat{\beta_0}\)=133.668, meaning that a white person (race=0) with zero years of education is expected to earn about 134 dollars per week. Next, \(\hat{\beta_1}\)=36.930, meaning that each additional year of education is associated with about 37 dollars more per week, holding race constant. Finally, \(\hat{\beta_2}\)=-124.812, meaning that an African American person (race=1) is expected to earn about 125 dollars less per week than a white person with the same years of education.
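
Written out as a single equation, using the rounded estimates from the summary output above, the fitted model is:

\[\widehat{\text{wage}} = 133.668 + 36.930\,\text{educ} - 124.812\,\text{race}\]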

Hypothesis Test

Next we test whether there is a linear relationship between education and the weekly wage received, with race in the model. Our null hypothesis is that there is no relationship between the two (\(\beta_1=0\)). The alternative hypothesis is that there is a linear relationship between them (\(\beta_1\neq 0\)). The “educ” row of the summary output gives a test statistic of 11.109, and the model has 1997 residual degrees of freedom (2000 observations - 3 estimated coefficients). The last column provides the p-value, which is less than 2e-16. Since the p-value is far below any conventional significance level, we reject the null hypothesis in favor of the alternative and conclude that there is a linear relationship between an individual's education level and their weekly wage.
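
As a quick check, the test statistic is the estimate divided by its standard error. Using the rounded values from the “educ” row of the summary output:

\[t = \frac{36.930}{3.324} \approx 11.11\]

This agrees with the reported 11.109 up to rounding of the printed values.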

Confidence Intervals of Regression Parameters

Next we will create a 95% confidence interval for all 3 coefficient values.

confint(mymod)
##                  2.5 %    97.5 %
## (Intercept)   45.27748 222.05776
## educ          30.41033  43.44912
## race        -197.82771 -51.79699

These intervals can be interpreted as follows: we are 95% confident that each true coefficient lies within its interval. \[\beta_0\in[45.27748,222.05776]\] \[\beta_1\in[30.41033,43.44912]\] \[\beta_2\in[-197.82771,-51.79699]\]
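
To see where these numbers come from, the interval for educ can be rebuilt by hand as estimate ± critical value × standard error. This is only a sketch: the inputs are the rounded values printed by summary(mymod), and the variable names (est, se, df, t_crit, ci_educ) are made up for illustration.

```r
# Values copied from the "educ" row of the summary(mymod) output (rounded)
est <- 36.930   # coefficient estimate
se  <- 3.324    # standard error
df  <- 1997     # residual degrees of freedom

# Critical value: 97.5th percentile of the t distribution,
# which leaves 2.5% in each tail for a two-sided 95% interval
t_crit <- qt(0.975, df)

# Lower and upper limits: estimate -/+ critical value * standard error
ci_educ <- est + c(-1, 1) * t_crit * se
ci_educ
```

Because the inputs are rounded, the result should match the educ row of confint(mymod) only to about two decimal places.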

Confidence and Prediction Interval for Mean Value

Lastly, we will create both a confidence interval for the mean response and a prediction interval for an individual response at a given set of predictor values. First we must create a data frame holding the values to predict at. We will set education to 14 years and race to African American (race=1).

newdata<-data.frame(educ=14,race=1)
predi <- predict(mymod, newdata, interval="prediction")
predi
##        fit       lwr      upr
## 1 525.8714 -348.5449 1400.288
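
The fit column can be checked by hand by plugging educ=14 and race=1 into the fitted model, using the rounded coefficients from the summary output:

\[133.668 + 36.930(14) - 124.812(1) = 133.668 + 517.020 - 124.812 = 525.876\]

The small difference from the reported 525.8714 comes from rounding the coefficients.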

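As expected, the prediction interval is very wide because it must account for the variability of an individual's wage around the mean, in addition to the uncertainty in the estimated mean. We are 95% confident that a single individual with these characteristics has a weekly wage within this window. Note that the lower bound is negative, which is not a possible wage and reflects the limits of the model's normal-error assumption for this data. Now we will look at the confidence interval for the mean value, with the expectation of a much narrower interval.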

confi<-predict(mymod,newdata,interval = "confidence")
confi
##        fit      lwr      upr
## 1 525.8714 455.0116 596.7312

This interval is much narrower because it reflects only the uncertainty in the estimated mean wage, not the variability of individual wages around that mean. We are 95% confident that the mean weekly wage of African American individuals with 14 years of education lies within this window.
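
The two intervals are linked: the standard error behind a prediction interval combines the residual standard error with the standard error of the estimated mean, as sqrt(sigma^2 + se_mean^2). The sketch below recovers both standard errors from the printed interval limits and checks that relationship. It assumes the rounded values shown in the output above, and all variable names are made up for illustration.

```r
# Values copied from the printed output above (rounded)
fit    <- 525.8714   # fitted value at educ = 14, race = 1
ci_upr <- 596.7312   # upper limit of the confidence interval
pi_upr <- 1400.288   # upper limit of the prediction interval
sigma  <- 444.4      # residual standard error from summary(mymod)
df     <- 1997       # residual degrees of freedom

# Critical value used for both 95% intervals
t_crit <- qt(0.975, df)

# Back out each standard error from its interval half-width
se_mean <- (ci_upr - fit) / t_crit
se_pred <- (pi_upr - fit) / t_crit

# The prediction standard error should equal sqrt(sigma^2 + se_mean^2)
se_pred_check <- sqrt(sigma^2 + se_mean^2)
c(se_mean = se_mean, se_pred = se_pred, se_pred_check = se_pred_check)
```

Since sigma is far larger than se_mean here, the residual variability dominates, which is why the prediction interval is so much wider than the confidence interval.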