days <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61 ,43, 33, 36)
index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

Question 1:

Make a scatterplot of the data. Title the plot and label the axes appropriately.

plot(index, days, xlab = "Index", ylab = "Days")
title("Scatterplot of Days vs. Index")

Question 2:

What are the Least Squares estimates of the parameters of a simple linear regression?

model <- lm(days~index)
coef(model)
## (Intercept)       index 
##  -192.98383    15.29637

The least squares estimate of the slope, B1, is 15.29637, and the estimate of the intercept, B0, is -192.98383.
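
As a cross-check, the same estimates can be reproduced from the closed-form least squares formulas, B1 = Sxy / Sxx and B0 = mean(days) - B1 * mean(index). This is only a minimal sketch: the names Sxx, Sxy, b1 and b0 are introduced here for illustration and are not part of the original answer.

```r
days <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

# Corrected sum of squares of the predictor and sum of cross-products
Sxx <- sum((index - mean(index))^2)
Sxy <- sum((index - mean(index)) * (days - mean(days)))

# Closed-form least squares estimates for simple linear regression
b1 <- Sxy / Sxx
b0 <- mean(days) - b1 * mean(index)

# These should agree with coef(lm(days ~ index))
c(b0 = b0, b1 = b1)
```

Computing the estimates this way makes it clear that the fitted line always passes through the point (mean(index), mean(days)).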

Question 3:

Add the least squares line to the scatterplot.  

plot(index, days, xlab = "Index", ylab = "Days")
abline(model)
title("Scatterplot of Days vs. Index")

Question 4:

Check for model adequacy using diagnostic plots.  Show the plots and comment on the assumptions of Constant Variance, Normality, and whether there appear to be outliers.

plot(lm(days~index))

  1. Constant variance: in the residuals vs. fitted values plot (Figure 1), the spread of the residuals does not change noticeably across the fitted values (which are a linear function of X), so the residuals appear to have constant variance across the range of X. The assumption of constant variance of the residuals is not violated. This is the most important assumption.

Figure 1

  2. Normality: in the normal probability plot (Figure 2), the residuals fall roughly along a straight line, which means the residuals approximately follow a normal distribution, as assumed. The assumption is not violated.

Figure 2

  3. Outliers: in Figure 3, the square root of the absolute standardized residuals does not go beyond 1.73, that is, sqrt(3), which corresponds to a standardized residual of 3 in absolute value. A cutoff of 3 is only a quick rule of thumb for flagging outliers. Since no point exceeds it, we conclude that there are no apparent outliers.

Figure 3
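
The visual checks above can be complemented with a few numeric ones. The sketch below is an addition to the original answer: it applies the same |standardized residual| > 3 rule of thumb directly to the values from rstandard(), and it runs a Shapiro-Wilk test (shapiro.test()) as a formal companion to the normal probability plot. The variable name std_res is hypothetical, and with only 16 observations the test has little power, so it should be read alongside the plots rather than instead of them.

```r
days <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)
model <- lm(days ~ index)

# Internally standardized residuals, the quantity plotted (after sqrt(abs(.))) in Figure 3
std_res <- rstandard(model)

# Rule of thumb from the text: flag observations with |standardized residual| > 3
max(abs(std_res))
which(abs(std_res) > 3)   # indices of flagged observations, if any

# Shapiro-Wilk test of normality on the raw residuals (not part of the original answer)
shapiro.test(residuals(model))
```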

Question 5:

Does the regression appear to be significant?  Why or why not?

summary(model)
## 
## Call:
## lm(formula = days ~ index)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -41.70 -21.54   2.12  18.56  36.42 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984    163.503  -1.180    0.258
## index         15.296      9.421   1.624    0.127
## 
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared:  0.1585, Adjusted R-squared:  0.09835 
## F-statistic: 2.636 on 1 and 14 DF,  p-value: 0.1267

As shown in the model summary, the t value of the slope is 1.624. Comparing this with the critical value 2.1448 (two-sided test, alpha = 0.05, 14 degrees of freedom), we fail to reject the null hypothesis B1 = 0 in favor of the alternative B1 ≠ 0. The regression does not appear to be significant. [The p-value gives the same conclusion: 0.1267 > 0.05, which means we fail to reject the null hypothesis.]
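
The critical value and p-value used in this comparison can be recomputed from the t distribution. This is a minimal sketch assuming a two-sided test at alpha = 0.05. The names alpha, t_stat, df_resid, t_crit and p_value are introduced here for illustration.

```r
days <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)
model <- lm(days ~ index)

alpha <- 0.05                      # assumed significance level
df_resid <- df.residual(model)     # n - 2 residual degrees of freedom

# t statistic for the slope, taken from the coefficient table
t_stat <- unname(coef(summary(model))["index", "t value"])

# Two-sided critical value and p-value from the t distribution
t_crit <- qt(1 - alpha / 2, df_resid)
p_value <- 2 * pt(-abs(t_stat), df_resid)

c(t_stat = t_stat, t_crit = t_crit, p_value = p_value)
abs(t_stat) > t_crit   # FALSE means we fail to reject B1 = 0
```

In simple linear regression the F statistic equals the square of the slope's t statistic, which is why the F-test p-value in the summary matches the p-value for the slope.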