Using R's lm function, perform regression analysis and measure the significance of the independent variables for the following two data sets. In the first case, you are evaluating the commonly heard claim that a person's Maximum Heart Rate is related to their age by the following equation: \[MaxHR = 220 - Age\]
Perform a linear regression analysis fitting the Max Heart Rate to Age using the lm function in R. What is the resulting equation? Is the effect of Age on Max HR significant? What is the significance level? Please also plot the fitted relationship between Max HR and Age.
#create data frame
(regData = data.frame(
Age = c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37),
MaxHR = c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)
))
## Age MaxHR
## 1 18 202
## 2 23 186
## 3 25 187
## 4 35 180
## 5 65 156
## 6 54 169
## 7 34 174
## 8 56 172
## 9 72 153
## 10 19 199
## 11 23 193
## 12 42 174
## 13 18 198
## 14 39 183
## 15 37 178
The estimate of the model intercept is 210.05 and the estimate of the Age coefficient is -0.80. The effect of Age on Max HR is significant: the p-value for the Age coefficient is 3.85e-08, and the overall fit gives F-statistic: 130 on 1 and 13 DF, p-value: 3.848e-08.
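In the notation of the question, the resulting fit equation (rounded from the summary output below) is: \[MaxHR = 210.05 - 0.80 \times Age\]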
alli.mod1 <- lm(MaxHR ~ Age, data = regData)
summary(alli.mod1)
##
## Call:
## lm(formula = MaxHR ~ Age, data = regData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.9258 -2.5383 0.3879 3.1867 6.6242
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 210.04846 2.86694 73.27 < 2e-16 ***
## Age -0.79773 0.06996 -11.40 3.85e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.578 on 13 degrees of freedom
## Multiple R-squared: 0.9091, Adjusted R-squared: 0.9021
## F-statistic: 130 on 1 and 13 DF, p-value: 3.848e-08
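As a quick check of the 220 - Age rule of thumb, the fitted model's predictions can be compared against the rule at a few ages. This is a minimal sketch; the ages in checkAges are an arbitrary illustrative choice.
#compare model predictions with the 220 - Age rule at a few ages
checkAges <- data.frame(Age = c(20, 40, 60))
predict(alli.mod1, newdata = checkAges) #predictions from the fitted model
220 - checkAges$Age #values from the rule of thumb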
require(lattice)
## Loading required package: lattice
## Warning: package 'lattice' was built under R version 3.3.2
#Plot the residuals
xyplot(resid(alli.mod1) ~ fitted(alli.mod1),
xlab = "Fitted Values",
ylab = "Residuals",
main = "Residual Diagnostic Plot",
panel = function(x, y, ...)
{
panel.grid(h = -1, v = -1)
panel.abline(h = 0)
panel.xyplot(x, y, ...)
}
)
#The function resid extracts the model residuals from the fitted model object
qqmath( ~ resid(alli.mod1),
xlab = "Theoretical Quantiles",
ylab = "Residuals"
)
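The question also asks for a plot of the fitted relationship between Max HR and Age. A minimal lattice sketch, reusing the fitted model above, that overlays the fitted regression line and (dashed, for comparison) the 220 - Age rule:
#plot the data with the fitted regression line and the 220 - Age rule
xyplot(MaxHR ~ Age, data = regData,
xlab = "Age",
ylab = "Max Heart Rate",
main = "Fitted Relationship: Max HR vs Age",
panel = function(x, y, ...)
{
panel.grid(h = -1, v = -1)
panel.xyplot(x, y, ...)
panel.abline(reg = alli.mod1) #fitted line from the lm model
panel.abline(a = 220, b = -1, lty = 2) #the 220 - Age rule, dashed
}
)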
Using the Auto data set from Assignment 5 perform a Linear Regression analysis using mpg as the dependent variable and the other 4 (displacement, horsepower, weight, acceleration) as independent variables. What is the final linear regression fit equation? Which of the 4 independent variables have a significant impact on mpg? What are their corresponding significance levels? What are the standard errors on each of the coefficients?
auto <- as.data.frame(read.table("auto-mpg.data", header = FALSE, as.is = TRUE))
colnames(auto) <- c("displacement", "horsepower", "weight", "acceleration", "mpg")
head(auto)
## displacement horsepower weight acceleration mpg
## 1 307 130 3504 12.0 18
## 2 350 165 3693 11.5 15
## 3 318 150 3436 11.0 18
## 4 304 150 3433 12.0 16
## 5 302 140 3449 10.5 17
## 6 429 198 4341 10.0 15
autoLm = lm(formula = mpg ~ displacement + horsepower + weight + acceleration,
data = auto)
(autoLmSum <- summary(autoLm))
##
## Call:
## lm(formula = mpg ~ displacement + horsepower + weight + acceleration,
## data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.378 -2.793 -0.333 2.193 16.256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.2511397 2.4560447 18.424 < 2e-16 ***
## displacement -0.0060009 0.0067093 -0.894 0.37166
## horsepower -0.0436077 0.0165735 -2.631 0.00885 **
## weight -0.0052805 0.0008109 -6.512 2.3e-10 ***
## acceleration -0.0231480 0.1256012 -0.184 0.85388
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.247 on 387 degrees of freedom
## Multiple R-squared: 0.707, Adjusted R-squared: 0.704
## F-statistic: 233.4 on 4 and 387 DF, p-value: < 2.2e-16
The intercept and coefficient estimates extracted from the summary object:
autoLmSum$coefficients[,1]
## (Intercept) displacement horsepower weight acceleration
## 45.251139699 -0.006000871 -0.043607731 -0.005280508 -0.023147999
From the results, the linear regression fit equation is:
\[mpg = 45.2511 - 0.0060 \times displacement - 0.0436 \times horsepower - 0.0053 \times weight - 0.0231 \times acceleration\]
Of the four independent variables, horsepower (p = 0.00885) and weight (p = 2.3e-10) have a significant impact on mpg at the 0.05 level; displacement (p = 0.372) and acceleration (p = 0.854) do not. The standard errors of the coefficients are 2.456 (intercept), 0.0067 (displacement), 0.0166 (horsepower), 0.00081 (weight) and 0.1256 (acceleration), as reported in the summary above.
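The standard errors and p-values requested can also be pulled directly from the same summary object; the values match the Coefficients table above.
#standard errors and p-values for each coefficient
autoLmSum$coefficients[, c("Std. Error", "Pr(>|t|)")]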
Take the entire data set (all 392 points) and perform linear regression and measure the 95% confidence intervals.
#Examine at 95% confidence interval
confint(autoLm, level = .95)
## 2.5 % 97.5 %
## (Intercept) 40.422278855 50.080000544
## displacement -0.019192122 0.007190380
## horsepower -0.076193029 -0.011022433
## weight -0.006874738 -0.003686277
## acceleration -0.270094049 0.223798050
First take any random 40 data points from the entire auto data sample and perform the linear regression fit and measure the 95% confidence intervals.
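Note that sample() draws a different subset on each run, so the numbers below will vary from run to run. To make the draw reproducible, a seed can be set first; the seed value 42 here is an arbitrary choice.
#optional: fix the random seed so the same 40 rows are drawn every time
set.seed(42)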
autoSample <- auto[sample(1:nrow(auto), 40,
replace=FALSE),]
head(autoSample)
## displacement horsepower weight acceleration mpg
## 1 307 130 3504 12.0 18.0
## 36 250 88 3302 15.5 19.0
## 283 225 110 3360 16.6 20.6
## 212 350 145 4055 12.0 13.0
## 178 121 98 2945 14.5 22.0
## 203 85 70 1990 17.0 32.0
autoSampLm = lm(formula = mpg ~ displacement + horsepower + weight + acceleration,
data = autoSample)
(autoSampSum <- summary(autoSampLm))
##
## Call:
## lm(formula = mpg ~ displacement + horsepower + weight + acceleration,
## data = autoSample)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0117 -2.8882 -0.4985 2.1521 14.6491
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.853461 10.560844 4.437 8.68e-05 ***
## displacement -0.008204 0.028131 -0.292 0.772
## horsepower -0.042775 0.108334 -0.395 0.695
## weight -0.006310 0.004431 -1.424 0.163
## acceleration 0.106029 0.549389 0.193 0.848
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.3 on 35 degrees of freedom
## Multiple R-squared: 0.675, Adjusted R-squared: 0.6378
## F-statistic: 18.17 on 4 and 35 DF, p-value: 3.684e-08
#Examine at 95% confidence interval
confint(autoSampLm, level = .95)
## 2.5 % 97.5 %
## (Intercept) 25.41380801 68.293113524
## displacement -0.06531229 0.048904223
## horsepower -0.26270528 0.177154896
## weight -0.01530506 0.002684588
## acceleration -1.00929029 1.221348747
Please report the resulting fit equation, their significance values and confidence intervals for each of the two runs.
For the full 392-record fit, the equation is mpg = 45.2511 - 0.0060 displacement - 0.0436 horsepower - 0.0053 weight - 0.0231 acceleration, with horsepower (p = 0.00885) and weight (p = 2.3e-10) significant and an overall F-test p-value < 2.2e-16. For the 40-point sample, the equation is mpg = 46.8535 - 0.0082 displacement - 0.0428 horsepower - 0.0063 weight + 0.1060 acceleration, with none of the four independent variables significant at the 0.05 level and an overall p-value of 3.684e-08. The p-value of the full data set is therefore smaller than the p-value of the sample, and the sample's confidence intervals are considerably wider, reflecting its larger standard errors.
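One way to quantify the difference is to compare the widths of the 95% confidence intervals from the two fits. This is a short sketch; the sample widths will vary with the random draw.
#width (upper - lower) of each 95% confidence interval
apply(confint(autoLm, level = .95), 1, diff) #full data set
apply(confint(autoSampLm, level = .95), 1, diff) #40-point sample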
autoLmSum #entire data set
##
## Call:
## lm(formula = mpg ~ displacement + horsepower + weight + acceleration,
## data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.378 -2.793 -0.333 2.193 16.256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.2511397 2.4560447 18.424 < 2e-16 ***
## displacement -0.0060009 0.0067093 -0.894 0.37166
## horsepower -0.0436077 0.0165735 -2.631 0.00885 **
## weight -0.0052805 0.0008109 -6.512 2.3e-10 ***
## acceleration -0.0231480 0.1256012 -0.184 0.85388
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.247 on 387 degrees of freedom
## Multiple R-squared: 0.707, Adjusted R-squared: 0.704
## F-statistic: 233.4 on 4 and 387 DF, p-value: < 2.2e-16
autoSampSum #sample summary
##
## Call:
## lm(formula = mpg ~ displacement + horsepower + weight + acceleration,
## data = autoSample)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0117 -2.8882 -0.4985 2.1521 14.6491
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.853461 10.560844 4.437 8.68e-05 ***
## displacement -0.008204 0.028131 -0.292 0.772
## horsepower -0.042775 0.108334 -0.395 0.695
## weight -0.006310 0.004431 -1.424 0.163
## acceleration 0.106029 0.549389 0.193 0.848
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.3 on 35 degrees of freedom
## Multiple R-squared: 0.675, Adjusted R-squared: 0.6378
## F-statistic: 18.17 on 4 and 35 DF, p-value: 3.684e-08