library(smss)
library(alr4)
## Loading required package: car
## Loading required package: carData
## Loading required package: effects
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
library(magrittr)
1A Beds (Be) would be eliminated first, since it has the largest p-value in the full model (see the summary and the short backward-elimination sketch below).
data('house.selling.price.2')
summary(house.selling.price.2)
##        P                S              Be              Ba
##  Min.   : 17.50   Min.   :0.40   Min.   :1.000   Min.   :1.000
##  1st Qu.: 72.90   1st Qu.:1.33   1st Qu.:3.000   1st Qu.:2.000
##  Median : 96.00   Median :1.57   Median :3.000   Median :2.000
##  Mean   : 99.53   Mean   :1.65   Mean   :3.183   Mean   :1.957
##  3rd Qu.:115.00   3rd Qu.:1.98   3rd Qu.:4.000   3rd Qu.:2.000
##  Max.   :309.40   Max.   :3.85   Max.   :5.000   Max.   :3.000
##       New
##  Min.   :0.0000
##  1st Qu.:0.0000
##  Median :0.0000
##  Mean   :0.3011
##  3rd Qu.:1.0000
##  Max.   :1.0000
fit<-lm(formula = P ~ S + Be + Ba + New, data = house.selling.price.2)
summary(fit)
##
## Call:
## lm(formula = P ~ S + Be + Ba + New, data = house.selling.price.2)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -36.212  -9.546   1.277   9.406  71.953
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -41.795     12.104  -3.453 0.000855 ***
## S             64.761      5.630  11.504  < 2e-16 ***
## Be            -2.766      3.960  -0.698 0.486763
## Ba            19.203      5.650   3.399 0.001019 **
## New           18.984      3.873   4.902  4.3e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.36 on 88 degrees of freedom
## Multiple R-squared: 0.8689, Adjusted R-squared: 0.8629
## F-statistic: 145.8 on 4 and 88 DF, p-value: < 2.2e-16
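As a minimal sketch of the first backward-elimination step referenced in 1A (the object name fit_no_be is just for illustration):
# Backward elimination, step 1: Be has the largest p-value, so it is dropped first
drop1(fit, test = "F")                # F-tests for dropping each term from the full model
fit_no_be <- update(fit, . ~ . - Be)  # refit without Beds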
1B Size (S), having the smallest p-value, would be added first in forward selection.
1C
Adding more explanatory variables can change an individual predictor's apparent effect. Beds is most likely correlated with the other predictors (Size in particular), so once they are in the model Beds adds little unique explanatory value, which is why its coefficient is small and not significant here.
1D
m1<- lm(P ~ S + Be + Ba + New, data=house.selling.price.2)
m2<- lm(P ~ Be + Ba + New, data=house.selling.price.2)
m3<- lm(P ~ S + Ba + New, data=house.selling.price.2)
m4<- lm(P ~ S + Be + New, data=house.selling.price.2)
m5<- lm(P ~ S + Be + Ba, data=house.selling.price.2)
m6<- lm(P ~ S + New, data=house.selling.price.2)
m7<- lm(P ~ Ba + New, data=house.selling.price.2)
all_model <- list(m1, m2, m3, m4, m5, m6, m7)

rsquared <- function(fit) summary(fit)$r.squared
adjrsquared <- function(fit) summary(fit)$adj.r.squared

# PRESS: sum of squared leave-one-out (deleted) residuals
PRESS <- function(fit) {
  pr <- residuals(fit) / (1 - lm.influence(fit)$hat)
  sum(pr^2)
}

R2    <- sapply(all_model, rsquared)
AR2   <- sapply(all_model, adjrsquared)
press <- sapply(all_model, PRESS)  # lower-case names avoid masking PRESS(), AIC(), BIC()
aic   <- sapply(all_model, AIC)
bic   <- sapply(all_model, BIC)
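The statistics are computed above but never displayed; a minimal sketch of gathering them into one comparison table (model_formulas and comparison are names introduced here for illustration, the other objects come from the code above):
# Collect the fit statistics for the seven candidate models into a single table
model_formulas <- sapply(all_model, function(m) deparse(formula(m)))
comparison <- data.frame(model = model_formulas,
                         R2    = round(R2, 3),
                         adjR2 = round(AR2, 3),
                         PRESS = round(press, 1),
                         AIC   = round(aic, 1),
                         BIC   = round(bic, 1))
comparison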
1E The comparison above suggests that the models excluding Beds are the better choices.
2A
data(trees)
tree<-lm(formula = Volume ~ Girth + Height, data = trees)
2B
par(mfrow = c(2,3))
plot(tree, which = 1:6)
A few of the plots show violations of the model assumptions. The residuals vs. fitted plot shows a curvilinear trend where it should be roughly flat, the scale-location plot also shows a trend, and there are points beyond the dashed lines in residuals vs. leverage. The Cook's distance plot flags observation 31 as having an extreme value.
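As a small check on the claim about observation 31 (not part of the original output), its Cook's distance can be pulled out directly:
# Confirm that observation 31 has the largest Cook's distance
which.max(cooks.distance(tree))
cooks.distance(tree)[31]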
3 Palm Beach is the biggest outlier, as the diagnostic plots below show.
data("florida")
florida_model <- lm(Buchanan ~ Bush, data = florida)
par(mfrow = c(2,3))
plot(florida_model, which = 1:6)
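To tie the plots back to the claim in 3, the most extreme county can be identified by name (a sketch assuming the counties are stored as the data frame's row names; if they are a column instead, index that column):
# Identify the most extreme county (assumes county names are the row names)
rownames(florida)[which.max(abs(rstandard(florida_model)))]   # largest standardized residual
rownames(florida)[which.max(cooks.distance(florida_model))]   # largest Cook's distance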
3B. After the log transformation there is an improvement in the residuals vs. fitted, normal Q-Q, scale-location, and residuals vs. leverage plots.
flor<- lm(log(Buchanan) ~ log(Bush), data = florida)
summary(flor)
##
## Call:
## lm(formula = log(Buchanan) ~ log(Bush), data = florida)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.96075 -0.25949  0.01282  0.23826  1.66564
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.57712    0.38919  -6.622 8.04e-09 ***
## log(Bush)    0.75772    0.03936  19.251  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4673 on 65 degrees of freedom
## Multiple R-squared: 0.8508, Adjusted R-squared: 0.8485
## F-statistic: 370.6 on 1 and 65 DF, p-value: < 2.2e-16
par(mfrow = c(2, 3)); plot(flor, which = 1:6)
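As a quick numeric check of the improvement described in 3B, the largest Cook's distance from each fit can be compared (a small sketch reusing the objects defined above):
# Compare the most influential observation before and after the log transformation
max(cooks.distance(florida_model))  # original Buchanan ~ Bush fit
max(cooks.distance(flor))           # log-log fit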