The stopping data in the alr4 package describes the relationship between the speed of cars (in miles per hour) and the stopping distance (in feet) that cars require at that speed. Fit a simple linear regression with Distance as the response and interpret the model. The model you fit must meet the assumptions of simple linear regression before you make any interpretations. Use diagnostic tools to check the assumptions, and use tools such as transformations to make the model fit. What transformations did you choose and why?
library(alr4)
## Loading required package: car
## Loading required package: carData
## Loading required package: effects
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
head(stopping)
## Speed Distance
## 1 4 4
## 2 5 2
## 3 5 4
## 4 5 8
## 5 5 8
## 6 7 7
summary(stopping)
## Speed Distance
## Min. : 4.00 Min. : 2.00
## 1st Qu.:10.00 1st Qu.: 13.25
## Median :17.50 Median : 29.50
## Mean :18.92 Mean : 39.31
## 3rd Qu.:26.75 3rd Qu.: 56.75
## Max. :40.00 Max. :138.00
g1 <- lm(sqrt(Distance) ~ Speed, data = stopping)
summary(g1)
##
## Call:
## lm(formula = sqrt(Distance) ~ Speed, data = stopping)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.49948 -0.54761 0.00469 0.53153 1.54350
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.932396 0.197909 4.711 1.5e-05 ***
## Speed 0.252466 0.009274 27.223 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7209 on 60 degrees of freedom
## Multiple R-squared: 0.9251, Adjusted R-squared: 0.9239
## F-statistic: 741.1 on 1 and 60 DF, p-value: < 2.2e-16
plot(g1)
residualPlots(g1)
## Test stat Pr(>|Test stat|)
## Speed 0.1477 0.8831
## Tukey test 0.1477 0.8826
ncvTest(g1)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 2.331351, Df = 1, p = 0.12679
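The residual plot showed a curve, which points to nonlinearity in the mean function; non-constant variance is a separate issue, which I checked with an ncvTest. I first tried the log of Distance, but the residual plot still showed a curve. With the square root of Distance, the residuals scatter randomly around zero, which told me that was the correct transformation. The tests agree: the Tukey test for curvature (p = 0.88) and the ncvTest for non-constant variance (p = 0.13) are both non-significant. In the final model, each additional 1 mph of speed is associated with an increase of about 0.25 in the square root of stopping distance (in feet), and Speed explains about 93% of the variation in sqrt(Distance).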
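Because the model is fit on the square-root scale, fitted values have to be squared to get back to feet. The following is a minimal sketch that uses only the coefficient estimates printed in summary(g1); the helper name predict_distance is made up for illustration, and squaring the fitted value is a naive back-transformation that ignores retransformation bias.

```r
# Coefficients copied from the summary(g1) output
b0 <- 0.932396  # intercept on the sqrt(Distance) scale
b1 <- 0.252466  # change in sqrt(Distance) per additional 1 mph

# Hypothetical helper: fitted value on the sqrt scale, squared back to feet
predict_distance <- function(speed) {
  (b0 + b1 * speed)^2
}

# Approximate stopping distance (feet) at a few speeds (mph)
speeds <- c(10, 20, 30)
round(predict_distance(speeds), 1)
```

With the fitted object available, predict(g1, newdata = data.frame(Speed = speeds))^2 should give the same values up to rounding of the coefficients.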
The MinnLand data in the alr4 package includes every farm sale in Minnesota from 2002-2011 enrolled in the federal Conservation Reserve Program. Fit a linear regression model to understand the impact of the acre price of the farm (variable called acrePrice HINT: THIS IS THE RESPONSE) as it relates to acres and the percentage of tillable land on the farm (variable is called tillable) (HINT: USE BOTH AS PREDICTORS in the same model, multiple linear regression). Check the assumptions of multiple linear regression using the diagnostic tools discussed in lecture. Using the tools discussed adjust the model to meet assumptions, and provide a discussion on the meaning of the model including but not limited to the interpretation of the slope coefficients. HINT: RULES OF THUMB COULD BE HELPFUL!
library(alr4)
head(MinnLand)
## acrePrice region improvements year acres tillable financing crpPct
## 1 766 Northwest 0 2002 82 94 title_transfer 0
## 2 733 Northwest 0 2003 30 63 title_transfer 0
## 3 850 Northwest 4 2002 150 47 title_transfer 0
## 4 975 Northwest 0 2003 160 86 title_transfer 0
## 5 886 Northwest 62 2002 90 NA title_transfer 0
## 6 992 Northwest 30 2003 120 83 title_transfer 0
## productivity
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
summary(MinnLand)
## acrePrice region improvements year
## Min. : 108 Northwest :3799 Min. : 0.000 Min. :2002
## 1st Qu.: 1425 West Central :3297 1st Qu.: 0.000 1st Qu.:2004
## Median : 2442 Central :4198 Median : 0.000 Median :2006
## Mean : 2787 South West :2583 Mean : 4.493 Mean :2006
## 3rd Qu.: 3702 South Central:2832 3rd Qu.: 0.000 3rd Qu.:2008
## Max. :15000 South East :1991 Max. :100.000 Max. :2011
## NA's :50
## acres tillable financing crpPct
## Min. : 1.0 Min. : 0.00 title_transfer :16601 Min. : 0.000
## 1st Qu.: 47.0 1st Qu.: 72.00 seller_financed: 2099 1st Qu.: 0.000
## Median : 80.0 Median : 92.00 Median : 0.000
## Mean : 112.7 Mean : 80.67 Mean : 4.163
## 3rd Qu.: 153.0 3rd Qu.: 97.00 3rd Qu.: 0.000
## Max. :6970.0 Max. :100.00 Max. :100.000
## NA's :1212
## productivity
## Min. : 1.00
## 1st Qu.:59.00
## Median :68.00
## Mean :66.63
## 3rd Qu.:76.00
## Max. :99.00
## NA's :9717
g2 <- lm(log(acrePrice) ~ log(acres) + sqrt(tillable), data = MinnLand)
plot(g2)
ncvTest(g2)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 37.40519, Df = 1, p = 9.5966e-10
summary(g2)
##
## Call:
## lm(formula = log(acrePrice) ~ log(acres) + sqrt(tillable), data = MinnLand)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.14351 -0.39930 0.08592 0.47220 2.48029
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.052680 0.040540 198.64 <2e-16 ***
## log(acres) -0.250169 0.006883 -36.35 <2e-16 ***
## sqrt(tillable) 0.086580 0.003377 25.64 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6818 on 17485 degrees of freedom
## (1212 observations deleted due to missingness)
## Multiple R-squared: 0.09238, Adjusted R-squared: 0.09227
## F-statistic: 889.8 on 2 and 17485 DF, p-value: < 2.2e-16
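The diagnostic plots for the untransformed acrePrice, acres, and tillable variables showed clear trends, so I knew I would have to transform the variables to get residuals that scatter randomly. I ended up finding that the log of acrePrice, the log of acres, and the square root of tillable came closest to this. The ncvTest still rejects constant variance (p = 9.6e-10), but with more than 17,000 observations the test flags even small departures. The p-values for log(acres) and sqrt(tillable) are both far below 0.05, so each predictor is related to acrePrice after adjusting for the other. The intercept is 8.05 on the log scale, which corresponds to an acrePrice of about exp(8.05) ≈ 3,140 when acres = 1 and tillable = 0. Because both acrePrice and acres are logged, the slope of -0.25 means that a 1% increase in acres is associated with about a 0.25% decrease in acrePrice, holding tillable fixed. The slope of 0.0866 means that a one-unit increase in the square root of tillable is associated with an increase of 0.0866 in log(acrePrice), which is about a 9% increase in acrePrice, holding acres fixed. The model explains only about 9% of the variation in log(acrePrice) (R-squared = 0.092), so most of the variation in price comes from factors other than these two predictors.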
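With a logged response, the slopes are easier to read after back-transforming them into multiplicative changes in acrePrice. The following is a minimal sketch that uses only the estimates printed in summary(g2); the variable names are illustrative and not part of the original analysis.

```r
# Coefficients copied from the summary(g2) output
b_acres    <- -0.250169  # coefficient on log(acres)
b_tillable <-  0.086580  # coefficient on sqrt(tillable)

# log-log term: doubling acres multiplies acrePrice by 2^b_acres
ratio_double_acres <- 2^b_acres

# log-sqrt term: raising sqrt(tillable) by 1 (e.g. tillable 81 -> 100)
# multiplies acrePrice by exp(b_tillable)
ratio_tillable <- exp(b_tillable)

# Express both effects as percent changes in acrePrice
pct_change <- 100 * (c(double_acres = ratio_double_acres,
                       tillable_81_to_100 = ratio_tillable) - 1)
round(pct_change, 1)
```

Each ratio holds the other predictor fixed, so the two percent changes describe separate comparisons rather than a combined effect.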