Figure 1: Linear Fit of Bacteria and Minutes
##
## Call:
## lm(formula = bact ~ mins, data = FA5_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -17.323  -9.890  -7.323   2.463  45.282
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   142.20      11.26  12.627 1.81e-07 ***
## mins          -12.48       1.53  -8.155 9.94e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.3 on 10 degrees of freedom
## Multiple R-squared: 0.8693, Adjusted R-squared: 0.8562
## F-statistic: 66.51 on 1 and 10 DF, p-value: 9.944e-06
The linear model has \(R^2 = 0.8693\), so it explains about 87% of the variation in the bacteria counts.
Based on the summary table alone, we initially concluded that the linear model represents the data accurately.
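As a quick check on the summary table, the adjusted \(R^2\) can be reproduced from the reported \(R^2\). The sample size \(n = 12\) is an inference here (10 residual degrees of freedom plus 2 estimated coefficients), and \(p = 1\) is the number of predictors.

\[
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
= 1 - \left(1 - 0.8693\right)\frac{11}{10} \approx 0.8562
\]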
Figure 2: Standardized Residuals vs. Fitted Values
Figure 2 shows a pattern in the residuals, which indicates that the constant-variance assumption is not valid.
Figure 3: Normal Probability Plot
Figure 3 is the normal probability plot. The standardized residuals follow a non-linear pattern, which indicates that the fitted model violates the normality assumption.
Since the adequacy plots indicate violations of the least-squares assumptions, we retract our earlier statement: the linear model is not a good fit for the data.
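The two adequacy checks above can also be drawn directly from a fitted model instead of being read off the default `plot()` output. The following is a minimal sketch, assuming a small made-up data set (`demo` is hypothetical, not the assignment data) with a decaying response; `rstandard()`, `fitted()`, `qqnorm()`, and `qqline()` are base R functions.

```r
# Hypothetical data, NOT the assignment's: a decaying response with noise
set.seed(1)
demo <- data.frame(mins = 1:12)
demo$bact <- 180 * exp(-0.3 * demo$mins) * exp(rnorm(12, sd = 0.1))

fit <- lm(bact ~ mins, data = demo)

std_res <- rstandard(fit)  # internally standardized residuals
fit_val <- fitted(fit)     # fitted values on the response scale

# Residuals vs fitted: a funnel or curve suggests non-constant variance
# or a missing nonlinear term
plot(fit_val, std_res,
     xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = 0, lty = 2)

# Normal probability plot: points far from the line suggest non-normality
qqnorm(std_res)
qqline(std_res)
```

A straight-line fit to curved data like this typically leaves a U-shaped residual pattern, which is the kind of structure the adequacy plots are meant to expose.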
Figure 4: Box-Cox Transformation Plot
Figure 4 is the Box-Cox transformation plot. The best transformation parameter, \(\lambda\), is the value that maximizes the log-likelihood.
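For reference, the standard Box-Cox family for a positive response \(y\) is written below. The code in this report applies the simple power \(y^{\lambda}\) rather than the scaled form; for \(\lambda \neq 0\) the two differ only by a linear rescaling, so measures of fit such as \(R^2\) are unaffected.

\[
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[1ex]
\ln y, & \lambda = 0
\end{cases}
\]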
For this problem, the best \(\lambda\) is 0.1010101. We rounded it to 0.1 because the rounded value is easier to work with.
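One way to check that rounding \(\lambda\) is harmless is to see whether the rounded value falls inside the approximate 95% confidence interval that the Box-Cox plot marks. The sketch below computes that interval by hand on made-up data (`demo` is hypothetical, not the assignment data), assuming the usual likelihood-ratio cutoff of `qchisq(0.95, 1) / 2` below the maximum log-likelihood.

```r
library(MASS)  # provides boxcox()

# Hypothetical data, NOT the assignment's: a decaying response with noise
set.seed(1)
demo <- data.frame(mins = 1:12)
demo$bact <- 180 * exp(-0.3 * demo$mins) * exp(rnorm(12, sd = 0.1))

fit <- lm(bact ~ mins, data = demo)

# Profile log-likelihood on a grid of 100 lambda values (no plot drawn)
bc <- boxcox(fit, lambda = seq(-2, 2, length.out = 100), plotit = FALSE)

# Lambda with the highest log-likelihood
best_lambda <- bc$x[which.max(bc$y)]

# Approximate 95% interval: every lambda whose log-likelihood is within
# qchisq(0.95, 1) / 2 of the maximum
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
lambda_ci <- range(bc$x[bc$y >= cutoff])

print(best_lambda)
print(lambda_ci)
```

If a convenient value such as 0.1 lies inside `lambda_ci`, the data do not distinguish it from the exact maximizer, which gives a statistical justification for rounding in addition to convenience.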
Figure 5: Transformed Fit of Bacteria and Minutes
##
## Call:
## lm(formula = Transformed_data ~ mins, data = FA5_data)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.026103 -0.008334 -0.003293  0.012661  0.025559
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.700726   0.010226  166.31  < 2e-16 ***
## mins        -0.034957   0.001389  -25.16 2.25e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01662 on 10 degrees of freedom
## Multiple R-squared: 0.9844, Adjusted R-squared: 0.9829
## F-statistic: 632.9 on 1 and 10 DF, p-value: 2.255e-10
Figure 5 shows the fit to the transformed bacteria data. Overall, the model fits the transformed data better than the untransformed data.
The new \(R^2\) value is 0.9844, which is 0.1151 higher than that of the untransformed model.
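Because the model was fit to \(\text{bact}^{0.1}\), a prediction on the original scale requires raising the fitted value to the power \(1/0.1 = 10\). Using the coefficient estimates from the Figure 5 summary, and assuming the rounded \(\lambda = 0.1\) was the value actually applied:

\[
\widehat{\text{bact}^{0.1}} = 1.700726 - 0.034957\,\text{mins}
\quad\Longrightarrow\quad
\widehat{\text{bact}} \approx \left(1.700726 - 0.034957\,\text{mins}\right)^{10}
\]

For example, at \(\text{mins} = 0\) this gives roughly \(1.700726^{10} \approx 202\) on the original scale.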
```r
library(tidyverse)
library(readxl)
library(ggpmisc)
library(MASS)  # boxcox(); note that MASS::select masks dplyr::select

FA5_data <- read_excel('C:/Users/Rustg/OneDrive/Documents/IE 5344/Flipped_Assignment_5.xlsx')
FA5_data <- FA5_data[-(3:9)]  # drop columns 3 through 9

# Part A: least-squares fit of bacteria on minutes
linear_model <- lm(bact ~ mins, data = FA5_data)
ggplot(data = FA5_data, aes(mins, bact)) +
  geom_point() +
  geom_smooth(method = "lm", colour = "red4", fill = "bisque", formula = y ~ x) +
  ggtitle("Least Squares Fit: Bacteria vs Minutes") +
  xlab("Minutes") +
  ylab("Bacteria")
R_squared <- summary(linear_model)$r.squared
summary(linear_model)

# Part B: model adequacy plots (residuals vs fitted, normal Q-Q, and others)
plot(linear_model)

# Part C: Box-Cox transformation; keep the lambda with the highest log-likelihood
Lamda_Value <- data.frame(boxcox(linear_model))
Best_value <- round(Lamda_Value[which.max(Lamda_Value$y), 1], 2)
FA5_data$Transformed_data <- FA5_data$bact^Best_value

# Part D: refit on the transformed response
Transformed_model <- lm(Transformed_data ~ mins, data = FA5_data)
ggplot(data = FA5_data, aes(mins, Transformed_data)) +
  geom_point() +
  geom_smooth(method = "lm", colour = "red4", fill = "bisque", formula = y ~ x) +
  ggtitle("Transformed Fit: Bacteria and Minutes") +
  xlab("Minutes") +
  ylab("Transformed Bacteria")
R_squared_2 <- summary(Transformed_model)$r.squared
summary(Transformed_model)
```