library(car)
data(Prestige)
mod1<-lm(prestige ~ ., data = Prestige)
summary(mod1)
##
## Call:
## lm(formula = prestige ~ ., data = Prestige)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.9863 -4.9813 0.6983 4.8690 19.2402
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.213e+01 8.018e+00 -1.513 0.13380
## education 3.933e+00 6.535e-01 6.019 3.64e-08 ***
## income 9.946e-04 2.601e-04 3.824 0.00024 ***
## women 1.310e-02 3.018e-02 0.434 0.66524
## census 1.156e-03 6.183e-04 1.870 0.06471 .
## typeprof 1.077e+01 4.676e+00 2.303 0.02354 *
## typewc 2.877e-01 3.139e+00 0.092 0.92718
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.037 on 91 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.841, Adjusted R-squared: 0.8306
## F-statistic: 80.25 on 6 and 91 DF, p-value: < 2.2e-16
pairs(Prestige)
cor(Prestige[,-6]) # column 6 (type) is a factor, so it is excluded from the correlation matrix
## education income women prestige census
## education 1.00000000 0.5775802 0.06185286 0.8501769 -0.8230882
## income 0.57758023 1.0000000 -0.44105927 0.7149057 -0.3610023
## women 0.06185286 -0.4410593 1.00000000 -0.1183342 -0.2270028
## prestige 0.85017689 0.7149057 -0.11833419 1.0000000 -0.6345103
## census -0.82308821 -0.3610023 -0.22700277 -0.6345103 1.0000000
A: Yes. The main predictors are education and income, because their correlations with prestige are the highest. Also, in the full model (all variables together), their p-values are below 0.001.
A: No. The predictors type, census and women show no strong correlation with prestige, and their contributions in the full model are very low.
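To make the comparison above easier to see, the correlations of each numeric variable with prestige can be pulled out in a single line (a small sketch, not part of the original output; type is a factor and therefore does not appear in this matrix):
cor(Prestige[,-6])[, "prestige"] # education and income show the largest correlations with prestige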
mod2<-lm(prestige ~ education + income, data = Prestige)
summary(mod2)
##
## Call:
## lm(formula = prestige ~ education + income, data = Prestige)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.4040 -5.3308 0.0154 4.9803 17.6889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.8477787 3.2189771 -2.127 0.0359 *
## education 4.1374444 0.3489120 11.858 < 2e-16 ***
## income 0.0013612 0.0002242 6.071 2.36e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.81 on 99 degrees of freedom
## Multiple R-squared: 0.798, Adjusted R-squared: 0.7939
## F-statistic: 195.6 on 2 and 99 DF, p-value: < 2.2e-16
A: When we select only education and income to predict prestige, our model (mod2) shows an adjusted \(R^2 = 0.794\).
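As a quick sanity check (a minimal sketch, not shown in the original output), the adjusted \(R^2\) of the full and the reduced model can be extracted directly and compared:
summary(mod1)$adj.r.squared # full model, about 0.831
summary(mod2)$adj.r.squared # reduced model, about 0.794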
par(mfrow=c(2,2))
plot(mod2)
By plotting the residuals of mod2, we confirm that they are small and approximately normally distributed.
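As an optional complementary check of this normality claim (a sketch, not part of the original analysis), a Shapiro-Wilk test could be run on the residuals of mod2:
shapiro.test(residuals(mod2)) # a large p-value is consistent with approximately normal residuals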
A: Our model can make predictions:
# fake guess
guess<-data.frame(income=9000,education=14) # set of random predictor values
y_guess<-predict(mod2, guess)
y_guess
## 1
## 63.32694
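If an uncertainty estimate for this single prediction is of interest, predict() can also return a prediction interval (a sketch; the interval is not reported in the original output):
predict(mod2, guess, interval = "prediction") # point estimate with lower and upper bounds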
Our predicted values seem to be close to the observed values of prestige:
y_hat<-predict(mod2)
plot(y_hat~Prestige$prestige)
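To quantify how close the fitted values are to the observed scores, their correlation could also be computed (a sketch; the exact value is not reported above):
cor(y_hat, Prestige$prestige) # values near 1 indicate close agreement between fitted and observed prestige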
Performing leave-one-out cross-validation, we can see that the residuals of our two models (with and without the type variable) are approximately normally distributed.
residuos <- c()
for(i in 1:nrow(Prestige)){
  # leave-one-out CV: refit the model without observation i, then predict that observation
  model_cross_val <- lm(prestige ~ education + income, data = Prestige[-i,])
  residuos[i] <- Prestige$prestige[i] -
    predict(model_cross_val, Prestige[i, c("education","income")])
}
densityPlot(scale(residuos))
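Beyond the shape of their distribution, the leave-one-out residuals can also be summarised as an out-of-sample error estimate (a sketch, not reported in the original output):
sqrt(mean(residuos^2)) # LOOCV root-mean-squared error for the model without type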
which(is.na(Prestige$type))
## [1] 34 53 63 67
dados <- Prestige[-c(34,53,63,67),] # drop the rows with missing type
residuos <- c()
for(i in 1:nrow(dados)){
  # leave-one-out CV for the model that also includes type
  model_cross_val <- lm(prestige ~ education + income + type, data = dados[-i,])
  residuos[i] <- dados$prestige[i] -
    predict(model_cross_val, dados[i, c("education","income","type")])
}
densityPlot(scale(residuos))
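The same summary can be computed for the model that includes type, so the two cross-validated errors can be compared directly (again a sketch, values not reported above):
sqrt(mean(residuos^2)) # LOOCV root-mean-squared error for the model with type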
dados<-Prestige[-c(34,53,63,67),]
mod1<-lm(prestige~income+education,dados)
mod2<-lm(prestige~income+education+type,dados)
AIC(mod1,mod2)
## df AIC
## mod1 4 676.6695
## mod2 6 669.0151
anova(mod1,mod2)
## Analysis of Variance Table
##
## Model 1: prestige ~ income + education
## Model 2: prestige ~ income + education + type
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 95 5272.4
## 2 93 4681.3 2 591.16 5.8721 0.003966 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From our analysis, and especially from the results obtained with the AIC criterion, we conclude that including the type predictor improves the model's fit.
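As an additional, optional check (not part of the original analysis), the BIC, which penalises extra parameters more heavily than the AIC, can be compared in the same way:
BIC(mod1, mod2) # a lower BIC for mod2 would further support keeping type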