Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?
data("attitude")
head(attitude)
##   rating complaints privileges learning raises critical advance
## 1     43         51         30       39     61       92      45
## 2     63         64         51       54     63       73      47
## 3     71         70         68       69     76       86      48
## 4     61         63         45       47     54       84      35
## 5     81         78         56       66     71       83      47
## 6     43         55         49       44     54       49      34
summary(attitude)
##      rating        complaints      privileges       learning
##  Min.   :40.00   Min.   :37.0   Min.   :30.00   Min.   :34.00
##  1st Qu.:58.75   1st Qu.:58.5   1st Qu.:45.00   1st Qu.:47.00
##  Median :65.50   Median :65.0   Median :51.50   Median :56.50
##  Mean   :64.63   Mean   :66.6   Mean   :53.13   Mean   :56.37
##  3rd Qu.:71.75   3rd Qu.:77.0   3rd Qu.:62.50   3rd Qu.:66.75
##  Max.   :85.00   Max.   :90.0   Max.   :83.00   Max.   :75.00
##      raises         critical        advance
##  Min.   :43.00   Min.   :49.00   Min.   :25.00
##  1st Qu.:58.25   1st Qu.:69.25   1st Qu.:35.00
##  Median :63.50   Median :77.50   Median :41.00
##  Mean   :64.63   Mean   :74.77   Mean   :42.93
##  3rd Qu.:71.00   3rd Qu.:80.00   3rd Qu.:47.75
##  Max.   :88.00   Max.   :92.00   Max.   :72.00
# View the relationships in the data.
pairs(attitude, gap = 0.5)
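To put numbers behind the scatterplot matrix, the correlation matrix is a quick extra check (not part of the original output); rating correlates most strongly with complaints and only weakly with critical and advance.
# Pairwise correlations, rounded to two decimal places
round(cor(attitude), 2)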
Based on the pairs view, critical and advance show the weakest relationship with rating, so start with the remaining four predictors. In the summary below, every variable except complaints has a p-value above the preselected threshold of .05, while the residuals are roughly balanced around zero.
c.lm <- lm(attitude$rating ~ attitude$complaints + attitude$privileges + attitude$learning + attitude$raises)
summary(c.lm)
## 
## Call:
## lm(formula = attitude$rating ~ attitude$complaints + attitude$privileges + 
##     attitude$learning + attitude$raises)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.2663  -5.3960   0.5988   5.8000  11.2370 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         11.83354    8.53511   1.386    0.178    
## attitude$complaints  0.69115    0.14565   4.745 7.21e-05 ***
## attitude$privileges -0.10289    0.13189  -0.780    0.443    
## attitude$learning    0.24633    0.15435   1.596    0.123    
## attitude$raises     -0.02551    0.18388  -0.139    0.891    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.996 on 25 degrees of freedom
## Multiple R-squared:  0.7152, Adjusted R-squared:  0.6697 
## F-statistic: 15.7 on 4 and 25 DF,  p-value: 1.509e-06
Using backward elimination, remove the variable with the highest p-value above the threshold, which in this case is raises. The fitted equation is:
rating = 11.8335 + 0.6911 x complaints - 0.1029 x privileges + 0.2463 x learning - 0.0255 x raises
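As an aside, each manual elimination step can be cross-checked against R's built-in helpers: drop1() reports an F-test for dropping each term, and step() automates the whole search using AIC rather than raw p-values. A sketch, output omitted:
drop1(c.lm, test = "F")             # F-test for dropping each single term
step(c.lm, direction = "backward")  # automated backward elimination by AIC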
c.lm <- lm(attitude$rating ~ attitude$complaints + attitude$privileges + attitude$learning)
summary(c.lm)
## 
## Call:
## lm(formula = attitude$rating ~ attitude$complaints + attitude$privileges + 
##     attitude$learning)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.2012  -5.7478   0.5599   5.8226  11.3241 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          11.2583     7.3183   1.538   0.1360    
## attitude$complaints   0.6824     0.1288   5.296 1.54e-05 ***
## attitude$privileges  -0.1033     0.1293  -0.799   0.4318    
## attitude$learning     0.2380     0.1394   1.707   0.0997 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.863 on 26 degrees of freedom
## Multiple R-squared:  0.715,  Adjusted R-squared:  0.6821 
## F-statistic: 21.74 on 3 and 26 DF,  p-value: 2.936e-07
Again remove the variable with the highest p-value above the threshold, which in this case is privileges. The residuals, R-squared, and adjusted R-squared stay close to their original range. The fitted equation is:
rating = 11.2583 + 0.6824 x complaints - 0.1033 x privileges + 0.2380 x learning
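Each removal can also be justified with a nested-model F-test. Here the three- and two-variable fits are given the hypothetical names m3 and m2 purely for the comparison; this is a side check, not part of the original code:
m3 <- lm(rating ~ complaints + privileges + learning, data = attitude)
m2 <- lm(rating ~ complaints + learning, data = attitude)
anova(m2, m3)  # F-test: does privileges add explanatory power?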
c.lm <- lm(attitude$rating ~ attitude$complaints + attitude$learning)
summary(c.lm)
## 
## Call:
## lm(formula = attitude$rating ~ attitude$complaints + attitude$learning)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5568  -5.7331   0.6701   6.5341  10.3610 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           9.8709     7.0612   1.398    0.174    
## attitude$complaints   0.6435     0.1185   5.432 9.57e-06 ***
## attitude$learning     0.2112     0.1344   1.571    0.128    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.817 on 27 degrees of freedom
## Multiple R-squared:  0.708,  Adjusted R-squared:  0.6864 
## F-statistic: 32.74 on 2 and 27 DF,  p-value: 6.058e-08
Remove the variable with the highest p-value above the threshold, which in this case is learning. Multiple R-squared dips slightly (0.715 to 0.708) while adjusted R-squared actually improves (0.6821 to 0.6864), so little explanatory power is lost. The fitted equation is:
rating = 9.8709 + 0.6435 x complaints + 0.2112 x learning
Once learning is removed in the next fit, the complaints coefficient rises from 0.6435 to 0.7546 as it absorbs part of learning's effect.
c.lm <- lm(attitude$rating ~ attitude$complaints)
summary(c.lm)
## 
## Call:
## lm(formula = attitude$rating ~ attitude$complaints)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.8799  -5.9905   0.1783   6.2978   9.6294 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         14.37632    6.61999   2.172   0.0385 *  
## attitude$complaints  0.75461    0.09753   7.737 1.99e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.993 on 28 degrees of freedom
## Multiple R-squared:  0.6813, Adjusted R-squared:  0.6699 
## F-statistic: 59.86 on 1 and 28 DF,  p-value: 1.988e-08
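Both coefficients in the final model are significant at the .05 level. Interpreting them: each one-point increase in the complaints score is associated with a 0.755-point increase in the predicted rating, and the intercept of 14.38 is the predicted rating at a complaints score of zero (far below the observed minimum of 37, so it mainly anchors the line). Confidence intervals give a sense of the precision of these estimates; an extra check, not part of the original output:
confint(c.lm)  # 95% confidence intervals for the intercept and the complaints slope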
# Residuals vs. fitted values: look for a random, patternless scatter around zero
plot(fitted(c.lm), resid(c.lm))
abline(h = 0, lty = 2)  # zero reference line
# Normal Q-Q plot: points should follow the line if the residuals are roughly normal
qqnorm(resid(c.lm))
qqline(resid(c.lm))
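R's built-in diagnostics for lm objects produce these two plots plus scale-location and residuals-vs-leverage views in a single call; a convenient alternative, not part of the original analysis:
par(mfrow = c(2, 2))  # arrange the four diagnostic plots in a 2x2 grid
plot(c.lm)            # residuals vs fitted, normal Q-Q, scale-location, leverage
par(mfrow = c(1, 1))  # restore the default single-plot layout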
The pairs view helped narrow the candidate predictors, and backward elimination confirmed what the first lm summary already suggested: complaints is the only predictor with a p-value below .05, leaving rating = 14.3763 + 0.7546 x complaints as the final model. The residual quantiles are roughly symmetric around a median near zero, and to the extent the residuals-vs-fitted plot shows no trend and the Q-Q points stay close to the line, the linear model is appropriate for this data. The assignment also asks for a quadratic term, a dichotomous term, and a dichotomous vs. quantitative interaction term, shown in the sketch below.
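A minimal sketch of that model: the attitude data has no dichotomous variable, so one is constructed here by splitting critical at its median; that split, and the choice of learning for the quadratic term, are illustrative assumptions rather than part of the analysis above.
# Dichotomous term: 1 if the critical score is above its median, else 0 (illustrative split)
attitude$hi.critical <- as.numeric(attitude$critical > median(attitude$critical))
# Quadratic term via I(), dichotomous term, and a dichotomous x quantitative interaction
q.lm <- lm(rating ~ complaints + I(learning^2) + hi.critical + complaints:hi.critical,
           data = attitude)
summary(q.lm)
In such a fit, the complaints coefficient is the slope for the hi.critical = 0 group; the interaction coefficient is the change in that slope for the hi.critical = 1 group; the hi.critical coefficient shifts the intercept between the two groups; and the I(learning^2) coefficient is the change in rating per unit of squared learning, so the marginal effect of learning grows with its level.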