Problem 1 - Heart

Case 1 - With given function

Using R’s lm function, perform regression analysis and measure the significance of the independent variables for the following two data sets. In the first case, you are evaluating the commonly heard statement that a person's Maximum Heart Rate is related to their age by the following equation:

MaxHR = 220 - Age

#Age sample
age <- c(18,23,25,35,65,54,34,56,72,19,23,42,18,39,37)

#Correlation between variables
cor(age, (220-age))
## [1] -1
#lm equation
case1_eq = lm((220-age) ~  age)

summary(case1_eq)
## Warning in summary.lm(case1_eq): essentially perfect fit: summary may be
## unreliable
## 
## Call:
## lm(formula = (220 - age) ~ age)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -1.374e-13 -2.734e-15  1.201e-14  1.739e-14  2.641e-14 
## 
## Coefficients:
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)  2.200e+02  2.599e-14  8.464e+15   <2e-16 ***
## age         -1.000e+00  6.343e-16 -1.577e+15   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.15e-14 on 13 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 2.486e+30 on 1 and 13 DF,  p-value: < 2.2e-16
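
Because the response in this case is computed as exactly 220 - age, the fit has nothing to estimate: the regression simply recovers the intercept 220 and the slope -1, and the residuals are floating-point rounding noise, which is why R warns about an essentially perfect fit. A minimal sketch of how this could be checked (the names fit, coefs and tol are illustrative, not from the assignment):

```r
# Same age sample as above
age <- c(18,23,25,35,65,54,34,56,72,19,23,42,18,39,37)

# Response built directly from the formula, so the relationship is exact
fit <- lm(I(220 - age) ~ age)

# Coefficients should match 220 and -1 up to floating-point error
tol <- 1e-8  # illustrative tolerance
coefs <- unname(coef(fit))
print(isTRUE(all.equal(coefs, c(220, -1), tolerance = tol)))

# Residuals are numerical noise, not real scatter
print(max(abs(residuals(fit))) < tol)
```

The t values and p-values reported for such a fit say nothing about real data; they only reflect that the standard errors are numerically zero.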

Case 2 - With sample data

You have been given the following sample:

Age: 18 23 25 35 65 54 34 56 72 19 23 42 18 39 37

MaxHR: 202 186 187 180 156 169 174 172 153 199 193 174 198 183 178

Perform a linear regression analysis fitting the Max Heart Rate to Age using the lm function in R. What is the resulting equation? Is the effect of Age on Max HR significant? What is the significance level? Please also plot the fitted relationship between Max HR and Age.

#sample data
age <- c(18,23,25,35,65,54,34,56,72,19,23,42,18,39,37)
maxhr <- c(202,186,187,180,156,169,174,172,153,199,193,174,198,183,178)


#Creating dataframe
heart.df = data.frame(age, maxhr)

#Correlation between two variables
cor(heart.df$age, heart.df$maxhr)
## [1] -0.9534656

#lm equation
case2_eq = lm(maxhr ~  age,heart.df)

summary(case2_eq)
## 
## Call:
## lm(formula = maxhr ~ age, data = heart.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.9258 -2.5383  0.3879  3.1867  6.6242 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 210.04846    2.86694   73.27  < 2e-16 ***
## age          -0.79773    0.06996  -11.40 3.85e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.578 on 13 degrees of freedom
## Multiple R-squared:  0.9091, Adjusted R-squared:  0.9021 
## F-statistic:   130 on 1 and 13 DF,  p-value: 3.848e-08
#Plot the data with the fitted regression line
library(ggplot2)
ggplot(heart.df,aes(x=age,y=maxhr)) + geom_point() + geom_smooth(method="lm")

Resulting equation:

\[MaxHR = 210.05 - 0.79773 \times Age\]
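
One way to use the fitted equation is to predict the maximum heart rate at a given age and compare it with the 220 - Age rule. The sketch below assumes an illustrative age of 40, which is not part of the assignment; new_age, fit, pred and rule are hypothetical names.

```r
# Sample data from the assignment
age <- c(18,23,25,35,65,54,34,56,72,19,23,42,18,39,37)
maxhr <- c(202,186,187,180,156,169,174,172,153,199,193,174,198,183,178)

fit <- lm(maxhr ~ age)

# Illustrative input, not from the sample
new_age <- data.frame(age = 40)

# Point prediction with a 95% confidence interval for the mean response
pred <- predict(fit, newdata = new_age, interval = "confidence", level = 0.95)
print(pred)

# Value given by the rule of thumb, for comparison
rule <- 220 - new_age$age
print(rule)
```

With the coefficients reported in the summary, the fitted line gives roughly 210.05 - 0.798 * 40, about 178.1, against 180 from the rule.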

Is the effect of Age on Max HR significant?

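Yes. The age coefficient has a t value of -11.40 with p = 3.85e-08, far below 0.05, so the effect of age is statistically significant. The correlation coefficient of -0.953 shows that the relationship is strong and negative: as age increases, the maximum heart rate decreases.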

What is the significance level?

From the lm summary, the p-value for age is 3.85e-08, which is almost 0 and significant at the 0.001 level ('***'). Hence we reject the null hypothesis that the age coefficient is zero and conclude that age is statistically significant.
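
The p-value for age can also be read directly from the coefficient table returned by summary(), rather than copied from the printed output. A small sketch, where fit, coef_table, p_age, t_age and alpha are illustrative names and alpha = 0.05 is an assumed conventional threshold:

```r
# Sample data from the assignment
age <- c(18,23,25,35,65,54,34,56,72,19,23,42,18,39,37)
maxhr <- c(202,186,187,180,156,169,174,172,153,199,193,174,198,183,178)

fit <- lm(maxhr ~ age)

# The coefficient table is a matrix with one row per term
coef_table <- summary(fit)$coefficients
p_age <- coef_table["age", "Pr(>|t|)"]
t_age <- coef_table["age", "t value"]

# Assumed conventional threshold
alpha <- 0.05
print(p_age)
print(p_age < alpha)
```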

#Mean and standard error of the sample max heart rates

mean_hr = mean(maxhr)
sd_hr = sd(maxhr)
se_hr = sd_hr/sqrt(length(maxhr))


#95% confidence interval for the mean max heart rate (normal approximation, z = 1.96)
#lower bound
mean_hr - 1.96*se_hr
## [1] 172.8624
#upper bound
mean_hr + 1.96*se_hr
## [1] 187.671
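
The interval above uses the normal quantile 1.96. With only 15 observations, a t quantile with n - 1 = 14 degrees of freedom is the more usual choice and gives a slightly wider interval. A sketch of that variant, offered as an alternative and not as the assignment's required method (t_crit, ci_t and ci_z are illustrative names):

```r
maxhr <- c(202,186,187,180,156,169,174,172,153,199,193,174,198,183,178)

n <- length(maxhr)
mean_hr <- mean(maxhr)
se_hr <- sd(maxhr) / sqrt(n)

# Critical value from the t distribution with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)

# t-based interval and, for comparison, the normal-approximation interval
ci_t <- c(lower = mean_hr - t_crit * se_hr, upper = mean_hr + t_crit * se_hr)
ci_z <- c(lower = mean_hr - 1.96 * se_hr, upper = mean_hr + 1.96 * se_hr)
print(ci_t)
print(ci_z)

# The one-sample t.test() reports the same t-based interval
print(t.test(maxhr)$conf.int)
```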

Problem 2 - Auto

Using the Auto data set from Assignment 5 (also attached here) perform a Linear Regression analysis using mpg as the dependent variable and the other 4 (displacement, horsepower, weight, acceleration) as independent variables. What is the final linear regression fit equation? Which of the 4 independent variables have a significant impact on mpg? What are their corresponding significance levels? What are the standard errors on each of the coefficients? Please perform this experiment in two ways. First take any random 40 data points from the entire auto data sample and perform the linear regression fit and measure the 95% confidence intervals. Then, take the entire data set (all 392 points) and perform linear regression and measure the 95% confidence intervals. Please report the resulting fit equation, their significance values and confidence intervals for each of the two runs.

#auto data

auto = read.table("./assign11/auto-mpg.data",header = FALSE)

auto = setNames(auto, c('displacement', 'horsepower','weight', 'acceleration','mpg'))

What is the final linear regression fit equation?

#Lm function with multiple parameters
auto_lm = lm(mpg ~ displacement + horsepower + weight + acceleration, data = auto)

#Display all the coefficients and significance values
summary(auto_lm)
## 
## Call:
## lm(formula = mpg ~ displacement + horsepower + weight + acceleration, 
##     data = auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.378  -2.793  -0.333   2.193  16.256 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  45.2511397  2.4560447  18.424  < 2e-16 ***
## displacement -0.0060009  0.0067093  -0.894  0.37166    
## horsepower   -0.0436077  0.0165735  -2.631  0.00885 ** 
## weight       -0.0052805  0.0008109  -6.512  2.3e-10 ***
## acceleration -0.0231480  0.1256012  -0.184  0.85388    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.247 on 387 degrees of freedom
## Multiple R-squared:  0.707,  Adjusted R-squared:  0.704 
## F-statistic: 233.4 on 4 and 387 DF,  p-value: < 2.2e-16
#Confidence interval for 95%
confint(auto_lm,level = .95)
##                     2.5 %       97.5 %
## (Intercept)  40.422278855 50.080000544
## displacement -0.019192122  0.007190380
## horsepower   -0.076193029 -0.011022433
## weight       -0.006874738 -0.003686277
## acceleration -0.270094049  0.223798050

Equation of auto dataset

\[mpg = 45.2511397 - 0.0060009 \times displacement - 0.0436077 \times horsepower - 0.0052805 \times weight - 0.0231480 \times acceleration\]

What are their corresponding significance levels?

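Horsepower (p = 0.00885) and weight (p = 2.3e-10) are below the 0.05 significance level, so these two variables have a significant relationship with mpg. Displacement (p = 0.37166) and acceleration (p = 0.85388) are above 0.05 and are not significant; their 95% confidence intervals both include zero. The standard errors of the coefficients are 0.0067093 for displacement, 0.0165735 for horsepower, 0.0008109 for weight and 0.1256012 for acceleration.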
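
The significant predictors can also be picked out programmatically from the coefficient table. The sketch below defines a hypothetical helper, significant_terms, and demonstrates it on R's built-in mtcars data as a stand-in, since the assignment's auto-mpg.data file is not bundled with R. The mtcars columns disp, hp, wt and qsec are assumed here to play the roles of displacement, horsepower, weight and acceleration; the results will not match the Auto output above.

```r
# Hypothetical helper: p-values of the non-intercept terms below alpha
significant_terms <- function(fit, alpha = 0.05) {
  coef_table <- summary(fit)$coefficients
  p_values <- coef_table[, "Pr(>|t|)"]
  p_values <- p_values[names(p_values) != "(Intercept)"]
  p_values[p_values < alpha]
}

# Stand-in data: mtcars ships with R (assumption, not the assignment's file)
demo_fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Named vector of p-values for the terms significant at the 0.05 level
print(significant_terms(demo_fit))

# Standard errors of all coefficients, from the same table
print(summary(demo_fit)$coefficients[, "Std. Error"])
```

Applied to the assignment's own fit, the call would be significant_terms(auto_lm).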

Sample data

What are the standard errors on each of the coefficients? Please perform this experiment in two ways. First take any random 40 data points from the entire auto data sample and perform the linear regression fit and measure the 95% confidence intervals. Then, take the entire data set (all 392 points) and perform linear regression and measure the 95% confidence intervals. Please report the resulting fit equation, their significance values and confidence intervals for each of the two runs.

#Fetch 40 sample rows from auto dataset 
sample_auto = auto[sample(nrow(auto),40),]

#Lm function with multiple parameters
sample_auto_lm = lm(mpg ~ displacement + horsepower + weight + acceleration, data = sample_auto)

#Display all the coefficients and significance values
summary(sample_auto_lm)
## 
## Call:
## lm(formula = mpg ~ displacement + horsepower + weight + acceleration, 
##     data = sample_auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5743 -3.3794 -0.4021  2.2251 11.6321 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  43.121003  10.795817   3.994 0.000317 ***
## displacement  0.007604   0.020934   0.363 0.718635    
## horsepower   -0.048009   0.069564  -0.690 0.494650    
## weight       -0.005757   0.002612  -2.205 0.034153 *  
## acceleration  0.034857   0.545980   0.064 0.949458    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.729 on 35 degrees of freedom
## Multiple R-squared:  0.6394, Adjusted R-squared:  0.5982 
## F-statistic: 15.52 on 4 and 35 DF,  p-value: 2.153e-07
#Confidence interval for 95%
confint(sample_auto_lm,level = .95)
##                    2.5 %        97.5 %
## (Intercept)  21.20433045 65.0376763551
## displacement -0.03489560  0.0501026354
## horsepower   -0.18923108  0.0932121842
## weight       -0.01105919 -0.0004555206
## acceleration -1.07354070  1.1432547957
#Plot mpg against weight for the 40-point sample
ggplot(sample_auto,aes(x=weight,y=mpg)) + geom_point() + geom_smooth(method="lm")

Sample mpg equation

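\[mpg_{sample} = 43.1210034 + 0.0076035 \times displacement - 0.0480094 \times horsepower - 0.0057574 \times weight + 0.034857 \times acceleration\]

In this 40-point sample only weight is significant at the 0.05 level (p = 0.034); displacement (p = 0.719), horsepower (p = 0.495) and acceleration (p = 0.949) are not. The 95% confidence intervals are much wider than for the full 392-point fit because the standard errors are larger with fewer observations. Horsepower, which is significant in the full data set, has an interval that spans zero in the sample.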
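
Because sample() draws different rows on each run, the 40-point results above will change every time the document is knit. Setting a seed first makes the draw repeatable. A sketch using R's built-in mtcars data as a stand-in for the auto data (the seed value 123, the sample size 16 and the names draw_sample, ci_width, s1 and s2 are arbitrary illustrative choices):

```r
# Hypothetical helper: draw n rows reproducibly by fixing the seed first
draw_sample <- function(data, n, seed) {
  set.seed(seed)
  data[sample(nrow(data), n), ]
}

# Two draws with the same seed return the same rows
s1 <- draw_sample(mtcars, 16, seed = 123)
s2 <- draw_sample(mtcars, 16, seed = 123)
print(identical(s1, s2))

# Fit the same model on the sample and on the full stand-in data
sample_fit <- lm(mpg ~ disp + hp + wt + qsec, data = s1)
full_fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Width of each 95% confidence interval: upper bound minus lower bound
ci_width <- function(fit) {
  ci <- confint(fit, level = 0.95)
  ci[, 2] - ci[, 1]
}
print(cbind(sample = ci_width(sample_fit), full = ci_width(full_fit)))
```

For the assignment itself, the equivalent would be a set.seed() call placed immediately before auto[sample(nrow(auto),40),].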