1. Introduction

This report analyzes the relationship between Index (X) and Days (Y) using a Simple Linear Regression Model. The goal of this analysis is to determine whether Index is a useful predictor of Days and to evaluate the significance and reliability of the relationship.

2. Data

x <- c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
y <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
# check that both vectors contain 16 observations
length(x)
## [1] 16
length(y)
## [1] 16
# combine the two vectors into a data frame
xy_data <- data.frame(x,y)
xy_data
##       x   y
## 1  16.7  91
## 2  17.1 105
## 3  18.2 106
## 4  18.1 108
## 5  17.2  88
## 6  18.2  91
## 7  16.0  58
## 8  17.2  82
## 9  18.0  81
## 10 17.2  65
## 11 16.9  61
## 12 17.1  48
## 13 18.2  61
## 14 17.3  43
## 15 17.5  33
## 16 16.6  36

The data set consists of 16 observations on 2 variables, as shown in the table above.
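
A quick numerical summary of both variables can be obtained with summary(); a minimal sketch (output not shown here):

# five-number summary and mean of Index (x) and Days (y)
summary(xy_data)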

3. Scatter Plot of Days against Index

Use the plot() function to create a scatter plot of Days (Y) against Index (X).

plot(x, y, col = "blue", xlab = "Index", ylab = "Days", main = "Days (Y) vs Index (X)")

4. Regression Model

We fit the simple linear regression model

\[ Y = \beta_0 + \beta_1 x + \epsilon \]

where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the random error term.

Estimated Regression Equation

# least squares regression of Days on Index
lm(y ~ x)
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##      -193.0         15.3
# store the rounded coefficient estimates for later hand calculations
b0 <- -193.0
b1 <- 15.3
# refit and summarise the model
model <- lm(y ~ x)
summary(model)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -41.70 -21.54   2.12  18.56  36.42 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984    163.503  -1.180    0.258
## x             15.296      9.421   1.624    0.127
## 
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared:  0.1585, Adjusted R-squared:  0.09835 
## F-statistic: 2.636 on 1 and 14 DF,  p-value: 0.1267
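
Substituting the coefficient estimates from the summary above gives the estimated regression equation

\[ \hat{y} = -192.98 + 15.30\,x \]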

Interpretation

  • Intercept (\(\hat{\beta}_0\)): -192.98. An Index of 0 lies far outside the observed range (16.0 to 18.2), so the intercept has no direct practical interpretation.

  • Slope (\(\hat{\beta}_1\)): 15.30. Each one-unit increase in Index is associated with an estimated increase of about 15.3 Days, although this estimate is not statistically significant (p = 0.127).

5. Hypothesis Test

We test \(H_0:\beta_1=0\) versus \(H_a:\beta_1\neq 0\) using a t-test on the slope.
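
The test statistic, computed step by step below, is

\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}, \qquad SE(\hat{\beta}_1) = \sqrt{\frac{MSE}{S_{xx}}}, \qquad S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2. \]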

n <- 16            # sample size
b0 <- -193.0       # intercept estimate (rounded)
b1 <- 15.3         # slope estimate (rounded)
MSE <- 566.2024    # mean squared error, SSE/(n - 2) = (residual standard error)^2
sum(x)/(16)
## [1] 17.34375
xbar <- sum(x)/(16)
(x-xbar)^2
##  [1] 0.414414063 0.059414062 0.733164062 0.571914063 0.020664063 0.733164062
##  [7] 1.805664062 0.020664063 0.430664062 0.020664063 0.196914063 0.059414062
## [13] 0.733164062 0.001914062 0.024414062 0.553164062
sum((x-xbar)^2)
## [1] 6.379375
f <- 6.379375   # Sxx, the sum of squared deviations of x from its mean
sqrt(MSE/f)
## [1] 9.420995
SE <- 9.420995   # standard error of the slope, sqrt(MSE/Sxx)
t <- b1/SE       # t statistic for testing H0: beta1 = 0
b1/SE
## [1] 1.624032
t <- 1.624032
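
As a cross-check, the same standard error and t statistic can be read directly from the fitted model's coefficient table; a minimal sketch using the model object from Section 4:

# row "x" of the coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|)
# the Std. Error and t value should match the 9.421 and 1.624 computed above
coef(summary(model))["x", ]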

Conclusion

At significance level \(\alpha = 0.05\), the two-sided critical value is \(t_{0.975,\,14}\):

df <- 14    # degrees of freedom, n - 2
p <- 0.05   # significance level
qt(0.975,14)
## [1] 2.144787
1.62<2.145
## [1] TRUE

Comment

Since \(|t| = 1.62\) is less than the critical value 2.145, we fail to reject \(H_0\); the regression is not statistically significant at the 5% level.
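
Equivalently, the decision can be based on the p-value; a minimal sketch reusing the t statistic and degrees of freedom from above:

# two-sided p-value for the slope; approximately 0.127, matching Pr(>|t|) in summary(model)
2 * pt(-abs(t), df = 14)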

6. Value of \(R^2\)

\[ R^2=\frac{SSR}{SST} \]

#R^2=SSR/SST
yhat <- b0 + b1*x   # fitted values from the estimated equation
yhat
##  [1] 62.51 68.63 85.46 83.93 70.16 85.46 51.80 70.16 82.40 70.16 65.57 68.63
## [13] 85.46 71.69 74.75 60.98
ybar <- sum(y)/n
sum(y)/16
## [1] 72.3125
ybar <- 72.3125
#SSR = sum((yhat - ybar)^2)
(yhat - ybar)
##  [1]  -9.8025  -3.6825  13.1475  11.6175  -2.1525  13.1475 -20.5125  -2.1525
##  [9]  10.0875  -2.1525  -6.7425  -3.6825  13.1475  -0.6225   2.4375 -11.3325
(yhat - ybar)^2
##  [1]  96.0890063  13.5608062 172.8567562 134.9663063   4.6332562 172.8567562
##  [7] 420.7626562   4.6332562 101.7576563   4.6332562  45.4613063  13.5608062
## [13] 172.8567562   0.3875063   5.9414062 128.4255562
sum((yhat - ybar)^2)
## [1] 1493.383
SSR <- 1493.383
#SST = sum((y - ybar)^2)
(y-ybar)
##  [1]  18.6875  32.6875  33.6875  35.6875  15.6875  18.6875 -14.3125   9.6875
##  [9]   8.6875  -7.3125 -11.3125 -24.3125 -11.3125 -29.3125 -39.3125 -36.3125
(y-ybar)^2
##  [1]  349.22266 1068.47266 1134.84766 1273.59766  246.09766  349.22266
##  [7]  204.84766   93.84766   75.47266   53.47266  127.97266  591.09766
## [13]  127.97266  859.22266 1545.47266 1318.59766
sum((y-ybar)^2)
## [1] 9419.438
SST <- 9419.438
SSR/SST
## [1] 0.1585427
#R^2=0.1585427

The value is \(R^2 = 0.1585\), so Index explains only about 15.9% of the variation in Days.
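
The same value is available directly from the model summary; a minimal sketch:

# R-squared stored in the summary object; returns 0.1585427, matching the manual SSR/SST
summary(model)$r.squared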

7. Confidence and Prediction Intervals

# 95% confidence intervals for the mean response at each observed Index value
conf <- predict(model,interval = "confidence")
conf
##         fit      lwr       upr
## 1  62.46546 44.24504  80.68589
## 2  68.58401 54.90761  82.26042
## 3  85.41001 63.91295 106.90708
## 4  83.88038 63.97338 103.78737
## 5  70.11365 57.02842  83.19887
## 6  85.41001 63.91295 106.90708
## 7  51.75801 21.75791  81.75811
## 8  70.11365 57.02842  83.19887
## 9  82.35074 63.94915 100.75233
## 10 70.11365 57.02842  83.19887
## 11 65.52474 49.93042  81.11906
## 12 68.58401 54.90761  82.26042
## 13 85.41001 63.91295 106.90708
## 14 71.64328 58.85392  84.43265
## 15 74.70256 61.55896  87.84616
## 16 60.93583 41.22205  80.64961
# 95% prediction intervals for a new observation at each observed Index value
pred <- predict(model,interval = "prediction")
pred
##         fit       lwr      upr
## 1  62.46546  8.275374 116.6556
## 2  68.58401 15.748171 121.4199
## 3  85.41001 30.032167 140.7879
## 4  83.88038 29.100176 138.6606
## 5  70.11365 17.427738 122.7996
## 6  85.41001 30.032167 140.7879
## 7  51.75801 -7.441550 110.9576
## 8  70.11365 17.427738 122.7996
## 9  82.35074 28.099467 136.6020
## 10 70.11365 17.427738 122.7996
## 11 65.52474 12.160286 118.8892
## 12 68.58401 15.748171 121.4199
## 13 85.41001 30.032167 140.7879
## 14 71.64328 19.030075 124.2565
## 15 74.70256 22.002119 127.4030
## 16 60.93583  6.225545 115.6461
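
predict() can also be used for an Index value that is not in the data. As an illustration, the value 17.0 below is hypothetical, chosen only for demonstration:

# 95% prediction interval for Days at a hypothetical Index of 17.0
predict(model, newdata = data.frame(x = 17.0), interval = "prediction", level = 0.95)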

Plot with Intervals

library(ggplot2)
ggplot(xy_data,aes(x,y)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  ggtitle('Days(Y) vs Index(X)')
## `geom_smooth()` using formula = 'y ~ x'
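
geom_smooth() shades only the 95% confidence band for the mean response. A minimal sketch that overlays the prediction limits computed above (it assumes the pred matrix from this section):

# bind the prediction limits to the data and draw them as dashed lines
plot_data <- cbind(xy_data, as.data.frame(pred))
ggplot(plot_data, aes(x, y)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  geom_line(aes(y = lwr), linetype = "dashed") +
  geom_line(aes(y = upr), linetype = "dashed") +
  ggtitle('Days (Y) vs Index (X) with 95% prediction limits')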

8. Model Adequacy

plot(model)   # the four standard regression diagnostic plots

Checks

Plot                     Purpose
Residuals vs Fitted      Check linearity
Normal Q-Q               Check normality of the residuals
Scale-Location           Check constant variance
Residuals vs Leverage    Check for influential points

Three assumptions are checked to assess model adequacy (a sketch of these checks follows the list):

  • Normal errors: use the normal probability (Q-Q) plot of the residuals

  • Constant variance: plot the residuals against the fitted (predicted) values

  • Independence: plot the residuals in the time order of data collection
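
A minimal sketch of these three checks in base R (the time-order plot assumes the rows are listed in the order of data collection, which the data description does not confirm):

res <- resid(model)
# 1. Normal errors: normal probability (Q-Q) plot of the residuals
qqnorm(res)
qqline(res)
# 2. Constant variance: residuals against the fitted (predicted) values
plot(fitted(model), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# 3. Independence: residuals in observation (assumed collection) order
plot(res, type = "b", xlab = "Observation order", ylab = "Residuals")
abline(h = 0, lty = 2)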

9. Conclusion

Overall, the regression of Days on Index is not statistically significant at the 5% level (t = 1.62, p = 0.127) and explains only about 15.9% of the variation in Days (\(R^2 \approx 0.16\)). The fitted model is therefore not a reliable tool for understanding or forecasting the expected Days; the wide prediction intervals in Section 7 reflect this.