library(smss)
library(alr4)
## Loading required package: car
## Loading required package: carData
## Loading required package: effects
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
library(magrittr)
1A Beds (Be) would be eliminated first, since it has the largest p-value in the full model (see the summary and the short backward-elimination sketch below).
data('house.selling.price.2')
summary(house.selling.price.2)
##        P                S              Be              Ba
##  Min.   : 17.50   Min.   :0.40   Min.   :1.000   Min.   :1.000
##  1st Qu.: 72.90   1st Qu.:1.33   1st Qu.:3.000   1st Qu.:2.000
##  Median : 96.00   Median :1.57   Median :3.000   Median :2.000
##  Mean   : 99.53   Mean   :1.65   Mean   :3.183   Mean   :1.957
##  3rd Qu.:115.00   3rd Qu.:1.98   3rd Qu.:4.000   3rd Qu.:2.000
##  Max.   :309.40   Max.   :3.85   Max.   :5.000   Max.   :3.000
##       New
##  Min.   :0.0000
##  1st Qu.:0.0000
##  Median :0.0000
##  Mean   :0.3011
##  3rd Qu.:1.0000
##  Max.   :1.0000
fit<-lm(formula = P ~ S + Be + Ba + New, data = house.selling.price.2)
summary(fit)
##
## Call:
## lm(formula = P ~ S + Be + Ba + New, data = house.selling.price.2)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -36.212  -9.546   1.277   9.406  71.953
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -41.795     12.104  -3.453 0.000855 ***
## S             64.761      5.630  11.504  < 2e-16 ***
## Be            -2.766      3.960  -0.698 0.486763
## Ba            19.203      5.650   3.399 0.001019 **
## New           18.984      3.873   4.902  4.3e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.36 on 88 degrees of freedom
## Multiple R-squared: 0.8689, Adjusted R-squared: 0.8629
## F-statistic: 145.8 on 4 and 88 DF, p-value: < 2.2e-16
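As a minimal sketch of the first backward-elimination step referenced in 1A (the object name fit_no_be is just for illustration):
# Backward elimination, step 1: Be has the largest p-value, so it is dropped first
drop1(fit, test = "F")                # F-tests for dropping each term from the full model
fit_no_be <- update(fit, . ~ . - Be)  # refit without Beds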
1B Size (S), having the smallest p-value, would be added first in forward selection.
1C
Adding more explanatory variables can change an individual predictor's apparent effect. Beds is most likely correlated with the other predictors (Size in particular), so once they are in the model Beds adds little unique explanatory value, which is why its coefficient is small and not significant here.
1D
m1<- lm(P ~ S + Be + Ba + New, data=house.selling.price.2)
m2<- lm(P ~ Be + Ba + New, data=house.selling.price.2)
m3<- lm(P ~ S + Ba + New, data=house.selling.price.2)
m4<- lm(P ~ S + Be + New, data=house.selling.price.2)
m5<- lm(P ~ S + Be + Ba, data=house.selling.price.2)
m6<- lm(P ~ S + New, data=house.selling.price.2)
m7<- lm(P ~ Ba + New, data=house.selling.price.2)
all_model <- list(m1, m2, m3, m4, m5, m6, m7)

rsquared <- function(fit) summary(fit)$r.squared
adjrsquared <- function(fit) summary(fit)$adj.r.squared

# PRESS: sum of squared leave-one-out (deleted) residuals
PRESS <- function(fit) {
  pr <- residuals(fit) / (1 - lm.influence(fit)$hat)
  sum(pr^2)
}

R2    <- sapply(all_model, rsquared)
AR2   <- sapply(all_model, adjrsquared)
press <- sapply(all_model, PRESS)  # lower-case names avoid masking PRESS(), AIC(), BIC()
aic   <- sapply(all_model, AIC)
bic   <- sapply(all_model, BIC)
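The statistics are computed above but never displayed; a minimal sketch of gathering them into one comparison table (model_formulas and comparison are names introduced here for illustration, the other objects come from the code above):
# Collect the fit statistics for the seven candidate models into a single table
model_formulas <- sapply(all_model, function(m) deparse(formula(m)))
comparison <- data.frame(model = model_formulas,
                         R2    = round(R2, 3),
                         adjR2 = round(AR2, 3),
                         PRESS = round(press, 1),
                         AIC   = round(aic, 1),
                         BIC   = round(bic, 1))
comparison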
1E The comparison above suggests that the models excluding Beds are the better choices.
2A
data(trees)
tree<-lm(formula = Volume ~ Girth + Height, data = trees)
2B
par(mfrow = c(2,3))
plot(tree, which = 1:6)
A few of the plots show violations of the model assumptions. The residuals vs. fitted plot shows a curvilinear trend where it should be roughly flat, the scale-location plot also shows a trend, and there are points beyond the dashed lines in residuals vs. leverage. The Cook's distance plot flags observation 31 as having an extreme value.
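As a small check on the claim about observation 31 (not part of the original output), its Cook's distance can be pulled out directly:
# Confirm that observation 31 has the largest Cook's distance
which.max(cooks.distance(tree))
cooks.distance(tree)[31]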
3 Palm Beach is the biggest outlier, as the diagnostic plots below show.
data("florida")
florida_model <- lm(Buchanan ~ Bush, data = florida)
par(mfrow = c(2,3))
plot(florida_model, which = 1:6)
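To tie the plots back to the claim in 3, the most extreme county can be identified by name (a sketch assuming the counties are stored as the data frame's row names; if they are a column instead, index that column):
# Identify the most extreme county (assumes county names are the row names)
rownames(florida)[which.max(abs(rstandard(florida_model)))]   # largest standardized residual
rownames(florida)[which.max(cooks.distance(florida_model))]   # largest Cook's distance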
3B. After the log transformation there is an improvement in the residuals vs. fitted, normal Q-Q, scale-location, and residuals vs. leverage plots.
flor<- lm(log(Buchanan) ~ log(Bush), data = florida)
summary(flor)
##
## Call:
## lm(formula = log(Buchanan) ~ log(Bush), data = florida)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.96075 -0.25949  0.01282  0.23826  1.66564
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.57712    0.38919  -6.622 8.04e-09 ***
## log(Bush)    0.75772    0.03936  19.251  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4673 on 65 degrees of freedom
## Multiple R-squared: 0.8508, Adjusted R-squared: 0.8485
## F-statistic: 370.6 on 1 and 65 DF, p-value: < 2.2e-16
par(mfrow = c(2, 3)); plot(flor, which = 1:6)
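As a quick numeric check of the improvement described in 3B, the largest Cook's distance from each fit can be compared (a small sketch reusing the objects defined above):
# Compare the most influential observation before and after the log transformation
max(cooks.distance(florida_model))  # original Buchanan ~ Bush fit
max(cooks.distance(flor))           # log-log fit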