Using R’s lm function, perform regression analysis and measure the significance of the independent variables for the following two data sets. In the first case, you are evaluating the commonly heard statement that a person’s maximum heart rate is related to their age by the equation MaxHR = 220 - Age.
Age <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
MaxHR <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)
Heart_Age <- cbind.data.frame(Age,MaxHR)
Let’s create a scatter plot:
plot(MaxHR ~ Age, data = Heart_Age,
xlab = "Age",
ylab = "Max Heart Rate",
main = "Heart rates with age"
)
The graph suggests that there is a linear relationship between maximum heart rate and age. Let’s fit a linear regression:
Heart_Age_lg <- lm(MaxHR ~ Age, data = Heart_Age)
summary(Heart_Age_lg)
##
## Call:
## lm(formula = MaxHR ~ Age, data = Heart_Age)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.9258 -2.5383 0.3879 3.1867 6.6242
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 210.04846 2.86694 73.27 < 2e-16 ***
## Age -0.79773 0.06996 -11.40 3.85e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.578 on 13 degrees of freedom
## Multiple R-squared: 0.9091, Adjusted R-squared: 0.9021
## F-statistic: 130 on 1 and 13 DF, p-value: 3.848e-08
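From the summary, the fitted equation is MaxHR = 210.04846 - 0.79773 × Age. Compared with the claimed MaxHR = 220 - Age, the fitted line starts lower (intercept about 210 rather than 220) and declines more slowly with age (slope about -0.80 rather than -1).
The residual standard error is 4.578 on 13 degrees of freedom: observed maximum heart rates typically lie about 4.6 beats per minute away from the fitted line.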
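As a check on what the residual standard error means, it can be recomputed from the residuals as the square root of the residual sum of squares divided by the residual degrees of freedom (n - 2 = 13 here). A minimal sketch using base R’s residuals(), df.residual() and sigma(); the variable names rss and rse are our own:

```r
# Data from the problem statement
Age <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
MaxHR <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)
Heart_Age_lg <- lm(MaxHR ~ Age, data = data.frame(Age, MaxHR))

# Residual standard error = sqrt(RSS / residual degrees of freedom)
rss <- sum(residuals(Heart_Age_lg)^2)
rse <- sqrt(rss / df.residual(Heart_Age_lg))
rse                    # about 4.578, matching the summary
sigma(Heart_Age_lg)    # built-in accessor for the same quantity
```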
Take as the null hypothesis that there is no relationship between the two variables (the slope on Age is zero), and use the conventional significance level of 0.05.
According to the summary, the p-value for Age is 3.85e-08 (identical to the overall F-test p-value of 3.848e-08, since there is only one predictor), which is far below 0.05. We therefore reject the null hypothesis and conclude that there is a significant relationship between Age and MaxHR.
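The slope p-value can also be pulled out of the coefficient table in code. Going one step further toward the original question: the rule MaxHR = 220 - Age fixes both the intercept (220) and the slope (-1), so one way to test it is an F-test comparing the residual sum of squares under the claimed line (no free parameters) with that of the fitted line. This is a sketch of that comparison in base R, not part of the original analysis; rss0, rss1, q and F_stat are our own names.

```r
Age <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
MaxHR <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)
fit <- lm(MaxHR ~ Age, data = data.frame(Age, MaxHR))

# H0: slope = 0, read from the coefficient table
slope_p <- coef(summary(fit))["Age", "Pr(>|t|)"]
slope_p < 0.05                          # TRUE: reject "no relationship"

# H0: MaxHR = 220 - Age exactly (intercept 220 and slope -1)
rss0 <- sum((MaxHR - (220 - Age))^2)    # RSS under the claimed line
rss1 <- deviance(fit)                   # RSS of the fitted line
q <- 2                                  # number of restrictions imposed by the rule
F_stat <- ((rss0 - rss1) / q) / (rss1 / df.residual(fit))
p_claim <- pf(F_stat, q, df.residual(fit), lower.tail = FALSE)
c(F = F_stat, p = p_claim)
```

For this sample of 15 people, the statistic comes out around 6.2 with a p-value between 0.01 and 0.05, so the exact 220 - Age rule would be rejected at the 0.05 level for this data, even though the relationship itself is clearly negative.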
Let’s look at the correlation coefficient:
cor(Age, MaxHR)
## [1] -0.9534656
The correlation coefficient is -0.9535, close to -1, which indicates a strong negative linear relationship: maximum heart rate falls as age rises. Its square, 0.909, is the Multiple R-squared reported in the regression summary.
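A correlation close to -1 shows a strong association, but whether it is statistically significant also depends on the sample size. Base R’s cor.test() handles that, and for a simple regression its t-test is the same test as the one on the slope. A short sketch confirming both facts on this data:

```r
Age <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
MaxHR <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)
fit <- lm(MaxHR ~ Age)

r <- cor(Age, MaxHR)
r^2                          # same as Multiple R-squared
summary(fit)$r.squared

# Pearson correlation test, H0: correlation = 0
ct <- cor.test(Age, MaxHR)
ct$statistic                 # t of about -11.40, as for the Age coefficient
ct$p.value                   # same p-value as the slope test
```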
Let’s look at the predicted MaxHR versus Age by plotting the fitted regression line over the data:
#fitted(Heart_Age_lg)
plot(MaxHR ~ Age, data = Heart_Age,
xlab = "Age",
ylab = "Max Heart Rate",
main = "Heart rates with age"
)
abline(Heart_Age_lg)  # add the fitted regression line, reusing the model fitted above
Problem 2
Let’s import the data set:
#filepath <- c("https://raw.githubusercontent.com/nobieyi00/CUNY_MSDA_R/master/auto-mpg.data")
filepath <- c("C:/Users/Mezue/Downloads/assign11/assign11/auto-mpg.data")
Auto_mpg <-read.table(filepath,header = FALSE, sep = "")
colnames(Auto_mpg) <- c('displacement', 'horsepower', 'weight', 'acceleration', 'mpg')
Draw a random subset of 40 rows for the linear regression analysis:
random.40 <- sample(nrow(Auto_mpg), 40)  # 40 distinct row indices; call set.seed() first for a reproducible draw
Auto_mpg_40 <- Auto_mpg[random.40, ]
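Truncating uniform draws with as.integer(runif(...)) is a tempting way to pick random rows, but it samples with replacement (a row can be picked twice) and, because runif() never returns its max and as.integer() truncates, it can never return the last row. sample() avoids both problems. A quick check, assuming 392 rows (the count implied by the full-data fit later on: 387 residual degrees of freedom plus 5 coefficients); the seed value is arbitrary:

```r
set.seed(42)   # arbitrary seed, only so the check is reproducible
n <- 392       # assumed row count of the full data set

# Truncated uniforms lie strictly between 1 and n, so as.integer()
# gives 1..(n - 1): the last row can never be selected
idx_runif <- as.integer(runif(1e5, min = 1, max = n))
range(idx_runif)

# sample() draws without replacement from 1..n, so every row is eligible
idx_sample <- sample(n, 40)
anyDuplicated(idx_sample)   # 0 means no repeated rows
```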
Let’s regress mpg on all four predictors (displacement, horsepower, weight and acceleration):
Auto_mpg_40_fit <- lm(mpg ~ displacement + horsepower + weight + acceleration, Auto_mpg_40)
Auto_mpg_40_fit_lm <-summary(Auto_mpg_40_fit)
Auto_mpg_40_fit_lm
##
## Call:
## lm(formula = mpg ~ displacement + horsepower + weight + acceleration,
## data = Auto_mpg_40)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.8579 -1.9092 -0.4054 1.6305 10.8672
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50.477375 7.261129 6.952 4.41e-08 ***
## displacement -0.001399 0.023008 -0.061 0.9519
## horsepower -0.125737 0.068366 -1.839 0.0744 .
## weight -0.003810 0.002254 -1.690 0.0999 .
## acceleration -0.135412 0.344677 -0.393 0.6968
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.354 on 35 degrees of freedom
## Multiple R-squared: 0.8296, Adjusted R-squared: 0.8101
## F-statistic: 42.59 on 4 and 35 DF, p-value: 5.526e-13
mpg = 50.477375 - 0.001399 × displacement - 0.125737 × horsepower - 0.003810 × weight - 0.135412 × acceleration
Which of the four independent variables have a significant impact on mpg, at a significance level of 0.05?
p_values <- Auto_mpg_40_fit_lm$coefficients[2:5,"Pr(>|t|)"]
p_values[which(p_values < .05)]
## named numeric(0)
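None of the four predictors is significant at the 0.05 level in this 40-row sample: the filter returns named numeric(0). Horsepower (p = 0.0744) and weight (p = 0.0999) are significant only at the 0.1 level, and displacement and acceleration are far from significant. Yet the overall F-test is highly significant (p = 5.526e-13), which suggests that the predictors jointly explain mpg well but overlap too much (they are likely strongly correlated with one another) for any single one to stand out in a small sample.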
What are the standard errors on each of the coefficients?
Auto_mpg_40_fit_lm$coefficients[2:5,"Std. Error"]
## displacement horsepower weight acceleration
## 0.023008157 0.068366461 0.002254439 0.344676843
Compute the 95% confidence intervals:
confint(Auto_mpg_40_fit, level=0.95)
## 2.5 % 97.5 %
## (Intercept) 35.736500604 6.521825e+01
## displacement -0.048107637 4.531045e-02
## horsepower -0.264528650 1.305394e-02
## weight -0.008387234 7.662727e-04
## acceleration -0.835142927 5.643195e-01
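For a linear model, confint() builds each interval as estimate ± t × standard error, with t the 97.5% quantile of the t distribution on the residual degrees of freedom (35 here). The sketch below rebuilds the intervals from the estimates and standard errors printed above; small differences in the last digits come from the rounding of the printed estimates.

```r
# Estimates and standard errors copied from the 40-row summary above
est <- c(displacement = -0.001399, horsepower = -0.125737,
         weight = -0.003810, acceleration = -0.135412)
se  <- c(displacement = 0.023008157, horsepower = 0.068366461,
         weight = 0.002254439, acceleration = 0.344676843)
df_res <- 35                            # residual degrees of freedom

t_crit <- qt(0.975, df = df_res)        # about 2.03
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci

# An interval containing 0 corresponds to p > 0.05 for that coefficient
ci[, "lower"] < 0 & ci[, "upper"] > 0
```

All four intervals contain zero, which is consistent with none of the predictors being significant at the 0.05 level in this sample.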
Now fit the same model to the full data set (392 rows) for comparison:
Auto_mpg_fit <- lm(mpg ~ displacement + horsepower + weight + acceleration, Auto_mpg)
Auto_mpg_fit_lm <-summary(Auto_mpg_fit)
Auto_mpg_fit_lm
##
## Call:
## lm(formula = mpg ~ displacement + horsepower + weight + acceleration,
## data = Auto_mpg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.378 -2.793 -0.333 2.193 16.256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.2511397 2.4560447 18.424 < 2e-16 ***
## displacement -0.0060009 0.0067093 -0.894 0.37166
## horsepower -0.0436077 0.0165735 -2.631 0.00885 **
## weight -0.0052805 0.0008109 -6.512 2.3e-10 ***
## acceleration -0.0231480 0.1256012 -0.184 0.85388
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.247 on 387 degrees of freedom
## Multiple R-squared: 0.707, Adjusted R-squared: 0.704
## F-statistic: 233.4 on 4 and 387 DF, p-value: < 2.2e-16
mpg = 45.2511397 - 0.0060009 × displacement - 0.0436077 × horsepower - 0.0052805 × weight - 0.0231480 × acceleration
Which of the four independent variables have a significant impact on mpg, at a significance level of 0.05?
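To see how the fitted equation is used, here is a worked prediction with the full-data coefficients. The car below is hypothetical, with values chosen only for illustration in the data set’s own units; with the fitted model object, predict(Auto_mpg_fit, newdata = ...) would give the same result without copying coefficients by hand.

```r
# Coefficients copied from the full-data fit above
b <- c(intercept = 45.2511397, displacement = -0.0060009,
       horsepower = -0.0436077, weight = -0.0052805, acceleration = -0.0231480)

# Hypothetical car (not a row from the data set)
car <- c(displacement = 200, horsepower = 100, weight = 3000, acceleration = 15)

# mpg_hat = intercept + sum of coefficient × value, matched by name
mpg_hat <- b[["intercept"]] + sum(b[names(car)] * car)
mpg_hat   # about 23.50
```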
p_values <- Auto_mpg_fit_lm$coefficients[2:5,"Pr(>|t|)"]
p_values[which(p_values < .05)]
## horsepower weight
## 8.848982e-03 2.302545e-10
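With the full data set, horsepower (p = 0.0088) and weight (p = 2.3e-10) are significant at the 0.05 level, with weight showing by far the strongest evidence; displacement and acceleration are not significant.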
What are the standard errors on each of the coefficients?
Auto_mpg_fit_lm$coefficients[2:5,"Std. Error"]
## displacement horsepower weight acceleration
## 0.0067093055 0.0165734633 0.0008108541 0.1256011622
Compute the 95% confidence intervals:
confint(Auto_mpg_fit, level=0.95)
## 2.5 % 97.5 %
## (Intercept) 40.422278855 50.080000544
## displacement -0.019192122 0.007190380
## horsepower -0.076193029 -0.011022433
## weight -0.006874738 -0.003686277
## acceleration -0.270094049 0.223798050
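In conclusion, the larger data set gives much more precise estimates: every standard error is smaller and every 95% confidence interval is narrower with all 392 rows than with the 40-row sample. With 40 rows all four intervals contain zero, matching the absence of significant predictors, whereas with the full data the intervals for horsepower and weight exclude zero. More data did not raise R-squared (0.707 versus 0.830), but it made the coefficient estimates, and therefore our conclusions about them, far more reliable.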
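To put a number on this, the widths of the two sets of 95% intervals can be compared directly, using the bounds printed by the two confint() calls. Standard errors tend to shrink roughly like 1/sqrt(n) when the residual spread and the spread of the predictors stay comparable, so sqrt(392/40), about 3.1, is offered here only as a loose benchmark, not an exact prediction.

```r
# 95% interval bounds copied from the two confint() outputs above
ci_40 <- rbind(displacement = c(-0.048107637, 4.531045e-02),
               horsepower   = c(-0.264528650, 1.305394e-02),
               weight       = c(-0.008387234, 7.662727e-04),
               acceleration = c(-0.835142927, 5.643195e-01))
ci_all <- rbind(displacement = c(-0.019192122, 0.007190380),
                horsepower   = c(-0.076193029, -0.011022433),
                weight       = c(-0.006874738, -0.003686277),
                acceleration = c(-0.270094049, 0.223798050))

width_40  <- ci_40[, 2] - ci_40[, 1]
width_all <- ci_all[, 2] - ci_all[, 1]
round(width_40 / width_all, 2)   # how many times wider the 40-row intervals are

sqrt(392 / 40)                   # rough 1/sqrt(n) benchmark
```

Every 40-row interval is between roughly 2.8 and 4.3 times wider than its full-data counterpart.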