Using R’s lm function, perform regression analysis and measure the significance of the independent variables for the following two data sets.
In the first case, you are evaluating the commonly heard claim that a person's Maximum Heart Rate is related to their age by the following equation: MaxHR = 220 - Age
You have been given the following sample:
| Age | 18 | 23 | 25 | 35 | 65 | 54 | 34 | 56 | 72 | 19 | 23 | 42 | 18 | 39 | 37 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MaxHR | 202 | 186 | 187 | 180 | 156 | 169 | 174 | 172 | 153 | 199 | 193 | 174 | 198 | 183 | 178 |
Perform a linear regression analysis fitting the Max Heart Rate to Age using the lm function in R.
MaxHR = -0.7977 Age + 210.0485
# Put data in vectors
age <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
mhr <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)
# Put data in a dataframe for lm function
maxHRdf <- data.frame(age, mhr)
# Fit with lm function
mhrfit <- lm(mhr ~ age, maxHRdf)
mhrfit
##
## Call:
## lm(formula = mhr ~ age, data = maxHRdf)
##
## Coefficients:
## (Intercept) age
## 210.0485 -0.7977
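To see what the fitted line implies in practice, its predictions can be compared with the 220 - Age rule at a few ages. This is a minimal sketch that refits the sample above; the ages 20, 40, and 60 and the names `new_ages` and `comparison` are chosen here for illustration and are not part of the assignment.

```r
# Sample data from the table above
age <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
mhr <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)
mhrfit <- lm(mhr ~ age, data = data.frame(age, mhr))

# Illustrative ages (an assumption, not from the assignment)
new_ages <- data.frame(age = c(20, 40, 60))

# Side by side: fitted prediction vs. the 220 - Age rule of thumb
comparison <- data.frame(
  age    = new_ages$age,
  fitted = predict(mhrfit, newdata = new_ages),
  rule   = 220 - new_ages$age
)
print(comparison)
```

Because the fitted slope is shallower than -1 and the intercept is below 220, the two lines cross inside the sampled age range, so the rule sits above the fit for younger ages and below it for older ones.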
Yes. The summary shows a p-value of \(3.848 \times 10^{-8}\), which is far below 0.01% (0.0001), and the '***' significance code on the age coefficient marks its p-value as below 0.001, so Age is a highly significant predictor.
# Use summary function for details
summary(mhrfit)
##
## Call:
## lm(formula = mhr ~ age, data = maxHRdf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.9258 -2.5383 0.3879 3.1867 6.6242
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 210.04846 2.86694 73.27 < 2e-16 ***
## age -0.79773 0.06996 -11.40 3.85e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.578 on 13 degrees of freedom
## Multiple R-squared: 0.9091, Adjusted R-squared: 0.9021
## F-statistic: 130 on 1 and 13 DF, p-value: 3.848e-08
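The summary tests whether the slope differs from zero, not whether the fit agrees with MaxHR = 220 - Age. One possible check, sketched below, is to ask whether a slope of -1 and an intercept of 220 fall inside the 95% confidence intervals of the coefficients. The 95% level, the choice of check, and the variable names are assumptions made here, not part of the assignment.

```r
# Sample data from the table above
age <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
mhr <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)
mhrfit <- lm(mhr ~ age)

# 95% confidence intervals for the intercept and the slope
ci <- confint(mhrfit, level = 0.95)
print(ci)

# t statistic for H0: slope = -1 (summary() tests H0: slope = 0 instead)
est <- coef(summary(mhrfit))["age", "Estimate"]
se  <- coef(summary(mhrfit))["age", "Std. Error"]
t_slope <- (est - (-1)) / se
p_slope <- 2 * pt(abs(t_slope), df = df.residual(mhrfit), lower.tail = FALSE)

# Does each claimed value lie inside its interval?
claim_inside <- c(
  intercept = ci["(Intercept)", 1] <= 220 && 220 <= ci["(Intercept)", 2],
  slope     = ci["age", 1] <= -1 && -1 <= ci["age", 2]
)
print(claim_inside)
```

With this sample, neither claimed value falls inside its interval, which suggests the data are not fully consistent with the rule, although 15 points is a small sample to judge from.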
# Plot Age vs. Max Heart Rate to show what we are fitting
plot(age, mhr, col = "red", main = "Age vs. Max Heart Rate", xlab = "Age", ylab = "Max Heart Rate")
# Overlay the claimed relationship MaxHR = 220 - Age
lines(age, (220 - age), col = "green")
# Plot the fit results
plot(mhrfit)
Using the Auto data set from Assignment 5, perform a linear regression analysis using mpg as the dependent variable and the other four variables (displacement, horsepower, weight, acceleration) as independent variables.
# Read in the mpg data from Github
mpgdf <- read.table("https://raw.githubusercontent.com/Godbero/IS605/master/auto-mpg.data", col.names = c("dis", "hp", "wt", "acc", "mpg"))
head(mpgdf)
## dis hp wt acc mpg
## 1 307 130 3504 12.0 18
## 2 350 165 3693 11.5 15
## 3 318 150 3436 11.0 18
## 4 304 150 3433 12.0 16
## 5 302 140 3449 10.5 17
## 6 429 198 4341 10.0 15
tail(mpgdf)
## dis hp wt acc mpg
## 387 151 90 2950 17.3 27
## 388 140 86 2790 15.6 27
## 389 97 52 2130 24.6 44
## 390 135 84 2295 11.6 32
## 391 120 79 2625 18.6 28
## 392 119 82 2720 19.4 31
# Draw a random sample of 40 rows from the full auto data set
set.seed(42)
mpgdf40 <- mpgdf[sample(nrow(mpgdf), 40), ]
mpgdf40
## dis hp wt acc mpg
## 359 231 110 3415 15.8 22.4
## 367 135 84 2525 16.0 29.0
## 112 122 85 2310 18.5 19.0
## 324 90 48 2085 21.7 44.3
## 249 318 140 3735 13.2 19.4
## 201 258 95 3193 17.8 17.5
## 285 302 129 3725 13.4 17.6
## 52 88 76 2065 14.5 30.0
## 253 200 85 2965 15.8 20.2
## 271 151 85 2855 17.6 23.8
## 175 232 90 3211 17.0 19.0
## 274 163 125 3140 13.6 17.0
## 356 145 76 3160 19.6 30.7
## 97 225 105 3121 16.5 18.0
## 382 262 85 3015 17.0 38.0
## 355 141 80 3230 20.4 28.1
## 368 151 90 2735 18.0 27.0
## 45 258 110 2962 13.5 18.0
## 178 121 98 2945 14.5 22.0
## 210 168 120 3820 16.7 16.5
## 337 156 92 2620 14.4 25.8
## 385 144 96 2665 13.9 32.0
## 366 112 85 2575 16.2 31.0
## 350 105 74 2190 14.2 33.0
## 31 140 90 2264 15.5 28.0
## 189 351 152 4215 12.8 14.5
## 143 76 52 1649 16.5 31.0
## 331 168 132 2910 11.4 32.7
## 163 231 110 3039 15.0 21.0
## 304 151 90 2670 16.0 28.4
## 268 105 75 2230 14.5 30.9
## 293 86 65 1975 15.2 34.1
## 140 98 83 2219 16.5 29.0
## 246 85 70 2070 18.6 39.4
## 2 350 165 3693 11.5 15.0
## 298 141 71 3190 24.8 27.2
## 3 318 150 3436 11.0 18.0
## 74 302 140 4294 16.0 13.0
## 321 86 65 2110 17.9 46.6
## 216 111 80 2155 14.8 30.0
# Fit the full data set with the lm function
mpgfit <- lm(mpg ~ dis + hp + wt + acc, mpgdf)
mpgfit
##
## Call:
## lm(formula = mpg ~ dis + hp + wt + acc, data = mpgdf)
##
## Coefficients:
## (Intercept) dis hp wt acc
## 45.251140 -0.006001 -0.043608 -0.005281 -0.023148
# Fit the 40-row random sample with the lm function
mpgfit40 <- lm(mpg ~ dis + hp + wt + acc, mpgdf40)
mpgfit40
##
## Call:
## lm(formula = mpg ~ dis + hp + wt + acc, data = mpgdf40)
##
## Coefficients:
## (Intercept) dis hp wt acc
## 48.760681 -0.006323 -0.085800 -0.005613 0.164672
For the random sample of 40 rows: MPG = -0.006323 Displacement - 0.085800 Horsepower - 0.005613 Weight + 0.164672 Acceleration + 48.760681
For the complete data set: MPG = -0.006001 Displacement - 0.043608 Horsepower - 0.005281 Weight - 0.023148 Acceleration + 45.251140
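As a quick sanity check, both equations can be evaluated for a single car. The sketch below uses the first row shown by head(mpgdf) (displacement 307, horsepower 130, weight 3504, acceleration 12.0, observed mpg 18) and the coefficients as printed above; the vector names are illustrative only.

```r
# Coefficients as printed by lm: intercept, dis, hp, wt, acc
coef_40   <- c(48.760681, -0.006323, -0.085800, -0.005613,  0.164672)
coef_full <- c(45.251140, -0.006001, -0.043608, -0.005281, -0.023148)

# First row of the data set: 1 for the intercept, then dis, hp, wt, acc
car <- c(1, 307, 130, 3504, 12.0)

# Predicted mpg is the dot product of coefficients and predictors
pred_40   <- sum(coef_40 * car)
pred_full <- sum(coef_full * car)
print(c(sample_40 = pred_40, full = pred_full))
```

Both predictions land close to the observed 18 mpg for this car, even though the individual coefficients differ noticeably between the two fits.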
# Use summary function on fit to get significance and standard error
summary(mpgfit40)
##
## Call:
## lm(formula = mpg ~ dis + hp + wt + acc, data = mpgdf40)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.7767 -2.7214 -0.5021 1.9057 12.8559
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.760681 11.872539 4.107 0.000229 ***
## dis -0.006323 0.023991 -0.264 0.793677
## hp -0.085800 0.105529 -0.813 0.421692
## wt -0.005613 0.003972 -1.413 0.166447
## acc 0.164672 0.630572 0.261 0.795510
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.43 on 35 degrees of freedom
## Multiple R-squared: 0.6036, Adjusted R-squared: 0.5583
## F-statistic: 13.32 on 4 and 35 DF, p-value: 1.074e-06
summary(mpgfit)
##
## Call:
## lm(formula = mpg ~ dis + hp + wt + acc, data = mpgdf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.378 -2.793 -0.333 2.193 16.256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.2511397 2.4560447 18.424 < 2e-16 ***
## dis -0.0060009 0.0067093 -0.894 0.37166
## hp -0.0436077 0.0165735 -2.631 0.00885 **
## wt -0.0052805 0.0008109 -6.512 2.3e-10 ***
## acc -0.0231480 0.1256012 -0.184 0.85388
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.247 on 387 degrees of freedom
## Multiple R-squared: 0.707, Adjusted R-squared: 0.704
## F-statistic: 233.4 on 4 and 387 DF, p-value: < 2.2e-16
| Variable | p-value (40-row sample) | p-value (all data) |
|---|---|---|
| Displacement | 0.793677 | 0.37166 |
| Horsepower | 0.421692 | 0.00885 |
| Weight | 0.166447 | 2.3e-10 |
| Acceleration | 0.795510 | 0.85388 |
At the 0.05 level, none of the variables is significant for the small (40-row) sample, while horsepower and weight are significant for the full data set.
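The comparison against a cutoff can be made explicit. This is a minimal sketch assuming a 0.05 significance level (the assignment does not name one); the p-values are copied from the two summary outputs, and the names `p_40`, `p_full`, and `alpha` are illustrative.

```r
# p-values copied from the two summary() outputs
p_40   <- c(dis = 0.793677, hp = 0.421692, wt = 0.166447, acc = 0.795510)
p_full <- c(dis = 0.37166,  hp = 0.00885,  wt = 2.3e-10,  acc = 0.85388)

alpha <- 0.05  # assumed cutoff, not specified in the assignment

# TRUE where a variable's p-value falls below the cutoff
significant <- data.frame(
  variable  = names(p_40),
  sample_40 = unname(p_40 < alpha),
  full_data = unname(p_full < alpha)
)
print(significant)
```

Changing `alpha` to 0.01 leaves the result unchanged for these values, since horsepower (0.00885) and weight (2.3e-10) both remain below it.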
| Variable | Std. Error (40-row sample) | Std. Error (all data) |
|---|---|---|
| Displacement | 0.023991 | 0.0067093 |
| Horsepower | 0.105529 | 0.0165735 |
| Weight | 0.003972 | 0.0008109 |
| Acceleration | 0.630572 | 0.1256012 |
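The standard errors are several times larger for the 40-row sample than for the full data set. As a rough rule, with everything else equal, standard errors shrink in proportion to 1/sqrt(n), so the table can be compared against that benchmark. The sketch below uses only numbers from the outputs above (40 and 392 rows); the variable names are illustrative.

```r
# Standard errors copied from the table above
se_40   <- c(dis = 0.023991,  hp = 0.105529,  wt = 0.003972,  acc = 0.630572)
se_full <- c(dis = 0.0067093, hp = 0.0165735, wt = 0.0008109, acc = 0.1256012)

# How many times larger is each standard error in the 40-row sample?
se_ratio <- se_40 / se_full

# Rough benchmark: standard errors scale about like 1 / sqrt(n)
n_40 <- 40
n_full <- 392
benchmark <- sqrt(n_full / n_40)

print(round(se_ratio, 2))
print(round(benchmark, 2))
```

All four ratios come out above the benchmark. One plausible contributor is that the sample fit also has a larger residual standard error (5.43 versus 4.247), though the spread of the predictors in the particular 40 rows drawn matters as well.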