Getting Data

Here, X = Minutes of Exposure

Y = Number of Bacteria

x <- c(1:12)
y <- c(175,108,95,82,71,50,49,31,28,17,16,11)

A) Fit a simple linear regression model to the data. What is the value of R^2?

model <- lm(y~x)
summary(model)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.323  -9.890  -7.323   2.463  45.282 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   142.20      11.26  12.627 1.81e-07 ***
## x             -12.48       1.53  -8.155 9.94e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.3 on 10 degrees of freedom
## Multiple R-squared:  0.8693, Adjusted R-squared:  0.8562 
## F-statistic: 66.51 on 1 and 10 DF,  p-value: 9.944e-06

The summary of the fitted regression model above shows that R^2 is 0.8693.

B) Check for model adequacy (comment)

plot(model)

In the residuals vs. fitted plot above, the residuals do not appear randomly scattered, so we conclude that the variance is not constant in our model. We should keep in mind, however, that we have only 12 data points, so this interpretation may not be accurate.

In the normal Q-Q plot, the points do not fall close to a straight line, so we conclude that the normality assumption does not hold. Again, with only 12 data points this interpretation may not be accurate.
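The visual checks above can be supplemented with a formal test. The following is a minimal sketch, not part of the original analysis: it redraws only the two diagnostic panels discussed here and runs a Shapiro-Wilk test on the residuals. The choice of Shapiro-Wilk is an assumption made for illustration, and with n = 12 the test has little power, so a large p-value would not prove normality.

```r
# Data from the problem statement
x <- 1:12
y <- c(175, 108, 95, 82, 71, 50, 49, 31, 28, 17, 16, 11)
model <- lm(y ~ x)

# Draw only the residuals-vs-fitted (1) and normal Q-Q (2) panels side by side
op <- par(mfrow = c(1, 2))
plot(model, which = 1:2)
par(op)

# Shapiro-Wilk test on the residuals (assumed here as a supplementary check)
sw <- shapiro.test(residuals(model))
sw$p.value
```

A small p-value would agree with the Q-Q plot reading; either way, the plots remain the primary evidence with so few observations.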

C) Use Box-Cox to perform a power transformation, transform the data as appropriate

library(MASS)  # provides boxcox()
trans <- boxcox(model)

lambda <- trans$x
likelihood <- trans$y
f <- lambda[which.max(likelihood)]
f
## [1] 0.1010101

The estimated power (lambda) is about 0.10, which is very close to 0, so we take it to be zero.

A Box-Cox power of zero corresponds to the natural log, so we transform the response by taking log(y).
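One way to back up the decision to round lambda to zero is to check whether 0 falls inside the approximate 95% confidence interval for lambda. The sketch below is an illustration, not part of the original solution. It uses the usual profile-likelihood cutoff (maximum log-likelihood minus half the 0.95 chi-square quantile with 1 degree of freedom), which is the same cutoff drawn on the boxcox() plot. The grid step of 0.01 and the names lambda_hat, cutoff and ci are choices made here.

```r
library(MASS)  # provides boxcox()

x <- 1:12
y <- c(175, 108, 95, 82, 71, 50, 49, 31, 28, 17, 16, 11)
model <- lm(y ~ x)

# Profile log-likelihood on a fine lambda grid, without plotting
trans <- boxcox(model, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Lambda that maximises the log-likelihood
lambda_hat <- trans$x[which.max(trans$y)]

# Approximate 95% confidence interval for lambda
cutoff <- max(trans$y) - qchisq(0.95, df = 1) / 2
ci <- range(trans$x[trans$y >= cutoff])

lambda_hat
ci
```

If the interval contains 0, the log transformation is a reasonable choice; if it does not, using lambda_hat directly would be the safer option.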

y_new <- log(y)
y_new
##  [1] 5.164786 4.682131 4.553877 4.406719 4.262680 3.912023 3.891820 3.433987
##  [9] 3.332205 2.833213 2.772589 2.397895

The values above are the log-transformed responses.

D) Fit a simple linear regression model to the transformed data. What is the value of R^2?

model2 <- lm(y_new~x)
summary(model2)
## 
## Call:
## lm(formula = y_new ~ x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.184303 -0.083994  0.001453  0.072825  0.206246 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.33878    0.07409   72.05 6.47e-15 ***
## x           -0.23617    0.01007  -23.46 4.49e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1204 on 10 degrees of freedom
## Multiple R-squared:  0.9822, Adjusted R-squared:  0.9804 
## F-statistic: 550.3 on 1 and 10 DF,  p-value: 4.489e-10

The summary above shows that R^2 is 0.9822.
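Because the response is on the log scale, the slope is easier to read after exponentiating it. The short sketch below is an added illustration, not part of the original answer: it converts the slope into a multiplicative change in the fitted bacteria count per extra minute of exposure. The names rate and pct_drop are introduced here.

```r
x <- 1:12
y <- c(175, 108, 95, 82, 71, 50, 49, 31, 28, 17, 16, 11)

# Same model as model2, with the log taken inside the formula
model2 <- lm(log(y) ~ x)

# Multiplicative change in the fitted count for each extra minute
rate <- exp(coef(model2)[["x"]])
rate      # about 0.79

# Equivalent percentage decrease per minute
pct_drop <- 100 * (1 - rate)
pct_drop  # about 21
```

Under this model, each additional minute of exposure multiplies the fitted count by roughly 0.79, a decrease of about 21%.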

E) Check for model adequacy using the transformed data (comment)

plot(model2)

The residuals vs. fitted plot above shows no pattern, so we conclude that the variance is constant. We should keep in mind, however, that we have only 12 data points, so this interpretation may not be accurate.

The normal Q-Q plot also shows the points falling fairly close to a straight line, so we conclude that the normality assumption holds for the transformed data. Again, with only 12 data points this interpretation may not be accurate.

Taken together, this shows that the transformed model is better than the original model.

F) Estimate the number of bacteria at 10 minutes of exposure, how does this compare with the observed value?

p <- c(10)
pred <- predict(model2,data.frame(x=p),interval = "prediction")
ans <- exp(pred[1])
ans
## [1] 19.62999

Hence the estimated number of bacteria at 10 minutes of exposure is 19.63. The observed value at 10 minutes is 17, so the model overestimates it by about 2.6 bacteria.
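To put the 10-minute estimate in context, the back-transformed fitted values can be listed next to the observed counts for every exposure time. This is a minimal sketch added for illustration; the table name tab and its column names are assumptions, not part of the original solution.

```r
x <- 1:12
y <- c(175, 108, 95, 82, 71, 50, 49, 31, 28, 17, 16, 11)
model2 <- lm(log(y) ~ x)

# Observed counts alongside fitted values back-transformed to the original scale
tab <- data.frame(
  minutes  = x,
  observed = y,
  fitted   = exp(fitted(model2))
)
tab$difference <- tab$fitted - tab$observed

# Row for 10 minutes of exposure: fitted is about 19.63, observed is 17
tab[tab$minutes == 10, ]
```

Note that exponentiating a prediction made on the log scale is a standard back-transformation, but it estimates the median rather than the mean count on the original scale.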

G) Provide a 95% prediction interval on the number of bacteria at 10 minutes of exposure.

predin <- predict(model2,data.frame(x=p),interval = "prediction")
lower <- exp(predin[2])
upper <- exp(predin[3])
lower
## [1] 14.68805
upper
## [1] 26.2347

Hence the 95% prediction interval for the number of bacteria at 10 minutes of exposure is:

Lower = 14.6880484

Upper = 26.2347038
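The estimate and both interval limits can also be obtained in a single step by exponentiating the whole predict() result. The following is a compact sketch of the same calculation, with the 95% level written out explicitly (it is also the default); the name pi10 is introduced here.

```r
x <- 1:12
y <- c(175, 108, 95, 82, 71, 50, 49, 31, 28, 17, 16, 11)
model2 <- lm(log(y) ~ x)

# Prediction on the log scale, then back-transform all three columns at once
pi10 <- exp(predict(model2,
                    newdata  = data.frame(x = 10),
                    interval = "prediction",
                    level    = 0.95))

# Columns: fit (point estimate), lwr and upr (95% prediction limits)
pi10
```

Exponentiating the limits is valid because exp() is monotone increasing, so the interval keeps its 95% coverage on the original scale.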