ASSIGNMENT 11 - LINEAR REGRESSION IN R
IS 605 FUNDAMENTALS OF COMPUTATIONAL MATHEMATICS - 2014

Using R's lm function, perform regression analysis and measure the significance of the independent variables for the following two data sets. In the first case, you are evaluating the statement that we hear that Maximum Heart Rate of a person is related to their age by the following equation:

MaxHR = 220 - Age

You have been given the following sample:

Age    18   23   25   35   65   54   34   56   72   19   23   42   18   39   37
MaxHR  202  186  187  180  156  169  174  172  153  199  193  174  198  183  178

Perform a linear regression analysis fitting the Max Heart Rate to Age using the lm function in R. What is the resulting equation? Is the effect of Age on Max HR significant? What is the significance level? Please also plot the fitted relationship between Max HR and Age.

#Read Data

age <- c(18,23,25,35,65,54,34,56,72,19,23,42,18,39,37)
hr <- c(202,186,187,180,156,169,174,172,153,199,193,174,198,183,178)

Is the effect of Age on Max HR significant? Using the linear regression model y = b0 + b1*x + e:

H0: Age has no effect on Max HR (b1 = 0)
H1: Age has an effect on Max HR (b1 != 0)

lm.r <- lm(hr ~ age)
summary(lm.r)
## 
## Call:
## lm(formula = hr ~ age)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.9258 -2.5383  0.3879  3.1867  6.6242 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 210.04846    2.86694   73.27  < 2e-16 ***
## age          -0.79773    0.06996  -11.40 3.85e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.578 on 13 degrees of freedom
## Multiple R-squared:  0.9091, Adjusted R-squared:  0.9021 
## F-statistic:   130 on 1 and 13 DF,  p-value: 3.848e-08

Based on this output, the fitted regression model for the above data is:

MaxHR = 210.048 - 0.798 * Age
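The assignment also asks for a plot of the fitted relationship. A minimal sketch of one way to draw it with base R graphics follows. The object name `fit`, the axis labels and the dashed reference line for the claimed rule MaxHR = 220 - Age are illustrative choices, not part of the original submission.

```r
# Data as given in the assignment
age <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
hr  <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)

# Same model as lm.r above, refit here so the block is self-contained
fit <- lm(hr ~ age)

# Scatter plot of the sample with the fitted regression line (solid)
plot(age, hr, xlab = "Age", ylab = "MaxHR", main = "MaxHR vs Age")
abline(fit, lwd = 2)

# Reference line for the claimed rule MaxHR = 220 - Age (dashed)
abline(a = 220, b = -1, lty = 2)
legend("topright", legend = c("lm fit", "220 - Age"),
       lty = c(1, 2), lwd = c(2, 1))
```

Drawing both lines on the same axes makes it easy to see how the fitted slope and intercept differ from the rule of thumb over the observed age range.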

The p-value is 3.848e-08, which is far below 0.05 (and below the 0.001 level, marked '***' in the output above), so I reject the null hypothesis. There is a significant relationship between Age and MaxHR in the linear regression model of the above data set.
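As a check on where the t value and p-value in the summary come from, they can be recomputed by hand from the slope estimate and its standard error. This is a sketch using the rounded numbers printed above, so the last digits may differ slightly from what `summary()` reports. The variable names are illustrative.

```r
# Values copied from the summary(lm.r) output above
estimate  <- -0.79773   # slope for age
std_error <-  0.06996   # standard error of the slope
df_resid  <- 13         # residual degrees of freedom: 15 observations - 2 coefficients

# t statistic for H0: slope = 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(t_value, 2))
print(signif(p_value, 3))
```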

Auto data set

Using the Auto data set from Assignment 5 (also attached here) perform a Linear Regression analysis using mpg as the dependent variable and the other 4 (displacement, horsepower, weight, acceleration) as independent variables. What is the final linear regression fit equation? Which of the 4 independent variables have a significant impact on mpg? What are their corresponding significance levels? What are the standard errors on each of the coefficients?

Please perform this experiment in two ways. First take any random 40 data points from the entire auto data sample and perform the linear regression fit and measure the 95% confidence intervals. Then, take the entire data set (all 392 points) and perform linear regression and measure the 95% confidence intervals. Please report the resulting fit equation, their significance values and confidence intervals for each of the two runs.

Please submit an R-markdown file documenting your experiments. Your submission should include the final linear fits, and their corresponding significance levels. In addition, you should clearly state what you concluded from looking at the fit and their significance levels.

autodata <- read.table('auto-mpg.data',
                   col.names = c('displacement', 'horsepower', 'weight', 'acceleration', 'mpg'))

head(autodata)
##   displacement horsepower weight acceleration mpg
## 1          307        130   3504         12.0  18
## 2          350        165   3693         11.5  15
## 3          318        150   3436         11.0  18
## 4          304        150   3433         12.0  16
## 5          302        140   3449         10.5  17
## 6          429        198   4341         10.0  15
set.seed(10)
random40 <- autodata[sample(nrow(autodata), 40),]
head(random40)
##     displacement horsepower weight acceleration  mpg
## 199          250         78   3574         21.0 18.0
## 120          121        112   2868         15.5 19.0
## 167          140         83   2639         17.0 23.0
## 270          156        105   2745         16.7 23.2
## 34           225        105   3439         15.5 16.0
## 88           302        137   4042         14.5 14.0
(lm.r.auto <- lm(mpg ~ .,data=random40))
## 
## Call:
## lm(formula = mpg ~ ., data = random40)
## 
## Coefficients:
##  (Intercept)  displacement    horsepower        weight  acceleration  
##    44.117698     -0.023242     -0.006429     -0.005408      0.076267
summary(lm.r.auto)
## 
## Call:
## lm(formula = mpg ~ ., data = random40)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.2563 -2.6450 -0.3425  2.2191 12.2042 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  44.117698  10.879547   4.055 0.000266 ***
## displacement -0.023242   0.025681  -0.905 0.371646    
## horsepower   -0.006429   0.075464  -0.085 0.932590    
## weight       -0.005408   0.003029  -1.785 0.082860 .  
## acceleration  0.076267   0.502775   0.152 0.880301    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.502 on 35 degrees of freedom
## Multiple R-squared:  0.7566, Adjusted R-squared:  0.7288 
## F-statistic:  27.2 on 4 and 35 DF,  p-value: 2.597e-10
lm.r.auto$coefficients
##  (Intercept) displacement   horsepower       weight acceleration 
## 44.117697855 -0.023241502 -0.006429309 -0.005408394  0.076266690

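The resulting fit equation for the 40-point sample is:

mpg = 44.117697855 - 0.023241502 * displacement - 0.006429309 * horsepower - 0.005408394 * weight + 0.076266690 * acceleration

In this sample none of the four predictors is significant at the 0.05 level; weight comes closest (p = 0.083).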

95% confidence intervals for the 40-point sample:

confint(lm.r.auto, level=0.95)
##                    2.5 %       97.5 %
## (Intercept)  22.03104408 6.620435e+01
## displacement -0.07537641 2.889340e-02
## horsepower   -0.15962850 1.467699e-01
## weight       -0.01155797 7.411781e-04
## acceleration -0.94442052 1.096954e+00
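For a linear model, each of these intervals is the estimate plus or minus a t critical value times the standard error. The sketch below reproduces the weight interval from the rounded numbers in the summary above, so it should agree with the `confint` output to about five decimal places. The variable names are illustrative.

```r
# Values copied from the summary(lm.r.auto) output above (weight row)
estimate  <- -0.005408
std_error <-  0.003029
df_resid  <- 35          # residual degrees of freedom: 40 observations - 5 coefficients

# Critical value of the t distribution for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate -/+ t_crit * standard error
ci <- estimate + c(-1, 1) * t_crit * std_error
print(ci)
```

Because the interval for weight contains zero, the weight coefficient is not significant at the 0.05 level in this sample, which matches its p-value of 0.083.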
#entire data set
(lm.r.autodata <- lm(mpg ~ .,data=autodata))
## 
## Call:
## lm(formula = mpg ~ ., data = autodata)
## 
## Coefficients:
##  (Intercept)  displacement    horsepower        weight  acceleration  
##    45.251140     -0.006001     -0.043608     -0.005281     -0.023148
summary(lm.r.autodata)
## 
## Call:
## lm(formula = mpg ~ ., data = autodata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.378  -2.793  -0.333   2.193  16.256 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  45.2511397  2.4560447  18.424  < 2e-16 ***
## displacement -0.0060009  0.0067093  -0.894  0.37166    
## horsepower   -0.0436077  0.0165735  -2.631  0.00885 ** 
## weight       -0.0052805  0.0008109  -6.512  2.3e-10 ***
## acceleration -0.0231480  0.1256012  -0.184  0.85388    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.247 on 387 degrees of freedom
## Multiple R-squared:  0.707,  Adjusted R-squared:  0.704 
## F-statistic: 233.4 on 4 and 387 DF,  p-value: < 2.2e-16
lm.r.autodata$coefficients
##  (Intercept) displacement   horsepower       weight acceleration 
## 45.251139699 -0.006000871 -0.043607731 -0.005280508 -0.023147999

The resulting fit equation for the entire data set is:

mpg = 45.251139699 - 0.006000871 * displacement - 0.043607731 * horsepower - 0.005280508 * weight - 0.023147999 * acceleration

95% confidence intervals for the entire data set:

confint(lm.r.autodata, level=0.95)
##                     2.5 %       97.5 %
## (Intercept)  40.422278855 50.080000544
## displacement -0.019192122  0.007190380
## horsepower   -0.076193029 -0.011022433
## weight       -0.006874738 -0.003686277
## acceleration -0.270094049  0.223798050

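Conclusion: with the entire data set, weight (p = 2.3e-10) and horsepower (p = 0.00885) have a significant relationship with mpg, and their 95% confidence intervals exclude zero. Displacement (p = 0.372) and acceleration (p = 0.854) are not significant, and their intervals contain zero. In the 40-point sample the standard errors are larger and the confidence intervals wider, and no predictor reaches the 0.05 level.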
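To see how much precision the larger sample buys, the standard errors from the two runs can be compared directly. This is a sketch using the values printed in the two summaries above. The 1/sqrt(n) reference is only a rough rule of thumb, not an exact prediction, and the variable names are illustrative.

```r
predictors <- c("displacement", "horsepower", "weight", "acceleration")

# Standard errors copied from the two summary() outputs above
se_sample <- c(0.025681, 0.075464, 0.003029, 0.502775)      # 40 random rows
se_full   <- c(0.0067093, 0.0165735, 0.0008109, 0.1256012)  # all 392 rows

# How many times larger each standard error is in the 40-point sample
ratio <- setNames(se_sample / se_full, predictors)
print(round(ratio, 2))

# Rough reference point: standard errors often shrink roughly like 1/sqrt(n)
print(round(sqrt(392 / 40), 2))
```

The standard errors in the 40-point sample are several times larger than in the full fit, which is why the same coefficients that are significant with 392 points do not reach significance in the smaller sample.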