Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?

\(\color{red}{\text{The pairs chart suggests that rating and complaints have the strongest correlation}}\)

data("attitude")
pairs(attitude)
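
To put numbers on what the pairs chart suggests, the correlations with rating can be computed directly. This is a minimal sketch; the names rating_cor and strongest are illustrative and not part of the original analysis.

```r
data("attitude")

# Correlation of every variable with rating (one row of the correlation matrix).
rating_cor <- cor(attitude)["rating", ]

# Drop rating's correlation with itself and sort from strongest to weakest.
rating_cor <- sort(rating_cor[names(rating_cor) != "rating"], decreasing = TRUE)
print(round(rating_cor, 3))

# Name of the variable most strongly correlated with rating.
strongest <- names(rating_cor)[1]
print(strongest)
```

Squaring the correlation between rating and complaints gives the R-squared of the one-predictor model of rating on complaints.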

\(\color{red}{\text{In the full linear model, every predictor except complaints has a p value above the preselected threshold of .05. Residuals are roughly symmetric around zero}}\)

c.lm<-lm(attitude$rating~attitude$complaints + attitude$privileges + attitude$learning + attitude$raises)
summary(c.lm)
## 
## Call:
## lm(formula = attitude$rating ~ attitude$complaints + attitude$privileges + 
##     attitude$learning + attitude$raises)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.2663  -5.3960   0.5988   5.8000  11.2370 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         11.83354    8.53511   1.386    0.178    
## attitude$complaints  0.69115    0.14565   4.745 7.21e-05 ***
## attitude$privileges -0.10289    0.13189  -0.780    0.443    
## attitude$learning    0.24633    0.15435   1.596    0.123    
## attitude$raises     -0.02551    0.18388  -0.139    0.891    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.996 on 25 degrees of freedom
## Multiple R-squared:  0.7152, Adjusted R-squared:  0.6697 
## F-statistic:  15.7 on 4 and 25 DF,  p-value: 1.509e-06

\(\color{red}{\text{Using backward elimination, we remove the variable with the highest p value above the threshold, which in this case is raises.$\\$The residuals, R-squared, and adjusted R-squared of the refit below stay close to those of the full model. Coefficients of the full model are:}}\) rating = 11.8335428 + 0.6911487 x complaints - 0.1028856 x privileges + 0.2463306 x learning - 0.02551 x raises

c.lm<-lm(attitude$rating~attitude$complaints + attitude$privileges + attitude$learning)
summary(c.lm)
## 
## Call:
## lm(formula = attitude$rating ~ attitude$complaints + attitude$privileges + 
##     attitude$learning)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.2012  -5.7478   0.5599   5.8226  11.3241 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          11.2583     7.3183   1.538   0.1360    
## attitude$complaints   0.6824     0.1288   5.296 1.54e-05 ***
## attitude$privileges  -0.1033     0.1293  -0.799   0.4318    
## attitude$learning     0.2380     0.1394   1.707   0.0997 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.863 on 26 degrees of freedom
## Multiple R-squared:  0.715,  Adjusted R-squared:  0.6821 
## F-statistic: 21.74 on 3 and 26 DF,  p-value: 2.936e-07
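
Each elimination step can also be written with update(), which refits an existing model with one term removed. The sketch below assumes the full model is fit with a data = argument, so coefficient names lose the attitude$ prefix; the object names full and no_raises are illustrative.

```r
data("attitude")

# Same full model as above, written with `data =` so terms have short names.
full <- lm(rating ~ complaints + privileges + learning + raises, data = attitude)

# Drop `raises` and refit; `. ~ . - raises` means "same response, same terms, minus raises".
no_raises <- update(full, . ~ . - raises)
print(round(coef(no_raises), 4))
```

The printed estimates should agree with the three-predictor summary above.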

\(\color{red}{\text{Continuing backward elimination, we remove the variable with the highest p value above the threshold, which in this case is privileges.$\\$The residuals, R-squared, and adjusted R-squared of the refit below stay close to the original range. Coefficients of the model above are:}}\) rating = 11.2583051 + 0.6824165 x complaints - 0.1033 x privileges + 0.2379762 x learning

c.lm<-lm(attitude$rating~attitude$complaints + attitude$learning)
summary(c.lm)
## 
## Call:
## lm(formula = attitude$rating ~ attitude$complaints + attitude$learning)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5568  -5.7331   0.6701   6.5341  10.3610 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           9.8709     7.0612   1.398    0.174    
## attitude$complaints   0.6435     0.1185   5.432 9.57e-06 ***
## attitude$learning     0.2112     0.1344   1.571    0.128    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.817 on 27 degrees of freedom
## Multiple R-squared:  0.708,  Adjusted R-squared:  0.6864 
## F-statistic: 32.74 on 2 and 27 DF,  p-value: 6.058e-08
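
One way to check that dropping several predictors costs little is a partial F-test between nested models with anova(). This is a sketch rather than part of the original analysis; the names full, reduced, cmp, and p_dropped are illustrative.

```r
data("attitude")

full    <- lm(rating ~ complaints + privileges + learning + raises, data = attitude)
reduced <- lm(rating ~ complaints + learning, data = attitude)

# Partial F-test: the null hypothesis is that the dropped terms
# (privileges and raises) both have zero coefficients.
cmp <- anova(reduced, full)
print(cmp)

# p value for the two dropped terms taken together (second row of the table).
p_dropped <- cmp[["Pr(>F)"]][2]
print(p_dropped)
```

A p value above .05 here is consistent with removing both terms.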

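\(\color{red}{\text{Continuing backward elimination, we remove the variable with the highest p value above the threshold, which in this case is learning.$\\$In the refit below, R-squared and adjusted R-squared drop more steeply (0.708 to 0.6813 and 0.6864 to 0.6699), the residual standard error rises from 6.817 to 6.993, and the intercept and complaints coefficient rise from 9.87 to 14.38 and from 0.64 to 0.75. Coefficients of the model above are:}}\) rating = 9.8708805 + 0.6435176 x complaints + 0.2112 x learning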

c.lm<-lm(attitude$rating~attitude$complaints)
summary(c.lm)
## 
## Call:
## lm(formula = attitude$rating ~ attitude$complaints)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.8799  -5.9905   0.1783   6.2978   9.6294 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         14.37632    6.61999   2.172   0.0385 *  
## attitude$complaints  0.75461    0.09753   7.737 1.99e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.993 on 28 degrees of freedom
## Multiple R-squared:  0.6813, Adjusted R-squared:  0.6699 
## F-statistic: 59.86 on 1 and 28 DF,  p-value: 1.988e-08
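
The assignment also asks for residual analysis. A minimal sketch of the usual checks on the final one-predictor model follows; the object names final and res are illustrative, and the choice of a Shapiro-Wilk test is an assumption, not something the original analysis ran.

```r
data("attitude")

final <- lm(rating ~ complaints, data = attitude)
res <- resid(final)

# Residuals vs fitted values: look for curvature or a funnel shape,
# either of which would argue against a plain linear model.
plot(fitted(final), res, xlab = "Fitted rating", ylab = "Residual")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line suggest roughly normal residuals.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test as a numeric companion to the Q-Q plot.
print(shapiro.test(res))
```

If the residuals scatter evenly around zero with no visible pattern and the Q-Q points follow the line, the linear model is a reasonable choice for this data.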

\(\color{red}{\text{For this data, the pairs view was enough to identify the best predictor. The first lm summary already showed that complaints was the best predictor of rating, since it was the only variable with a p value below .05}}\)
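
The manual elimination steps above can be automated. The loop below is a sketch of p-value-based backward elimination under the same .05 threshold; the names fit, alpha, p, and worst are illustrative, and the model is fit with data = so that terms can be removed by name.

```r
data("attitude")

fit <- lm(rating ~ complaints + privileges + learning + raises, data = attitude)
alpha <- 0.05  # preselected significance threshold

repeat {
  # p values of the predictors only (intercept row removed), kept as a
  # one-column matrix so row names survive when one predictor is left.
  p <- summary(fit)$coefficients[-1, "Pr(>|t|)", drop = FALSE]

  # Stop when no predictors remain or all are below the threshold.
  if (nrow(p) == 0 || max(p[, 1]) <= alpha) break

  # Remove the predictor with the largest p value and refit.
  worst <- rownames(p)[which.max(p[, 1])]
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}

print(summary(fit))
```

Run on this data, the loop removes raises, privileges, and learning in that order, matching the steps shown above.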