Flipped Assignment 4

Figure 1: Linear Fit of Bacteria and Minutes

## 
## Call:
## lm(formula = bact ~ mins, data = FA5_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.323  -9.890  -7.323   2.463  45.282 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   142.20      11.26  12.627 1.81e-07 ***
## mins          -12.48       1.53  -8.155 9.94e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.3 on 10 degrees of freedom
## Multiple R-squared:  0.8693, Adjusted R-squared:  0.8562 
## F-statistic: 66.51 on 1 and 10 DF,  p-value: 9.944e-06

Part A: What is the value of R^2?

According to the linear model, the \(R^2\) is 0.8693.

Based on the summary table, we initially concluded that the linear model represents the data well.
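As a quick consistency check, the reported \(R^2\) can be recovered from the F-statistic and its degrees of freedom, using only numbers printed in the summary above. This is a minimal sketch; the variable names are illustrative and do not appear in the original script.

```r
# Values copied from the summary output of lm(bact ~ mins)
f_stat   <- 66.51  # F-statistic
df_model <- 1      # numerator degrees of freedom
df_resid <- 10     # residual degrees of freedom

# For a least squares fit with an intercept:
# R^2 = F * df_model / (F * df_model + df_resid)
r_squared <- (f_stat * df_model) / (f_stat * df_model + df_resid)

# Adjusted R^2 rescales the unexplained share by (n - 1) / df_resid
n <- df_model + df_resid + 1
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid

round(r_squared, 4)      # 0.8693
round(adj_r_squared, 4)  # 0.8562
```

Both values agree with the "Multiple R-squared" and "Adjusted R-squared" lines of the summary.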

Part B: Are the model assumptions adequate?

Constant Variance

Figure 2: Standardized Residuals vs. Fitted Values

Figure 2 shows a pattern in the residuals, which indicates that the constant variance assumption is not valid.

Normality Assumption

Figure 3: Normal Probability Plot

Figure 3 is the normal probability plot, in which the standardized residuals follow a non-linear pattern. This non-linear pattern indicates that the fitted model violates the normality assumption.

Since the adequacy plots indicate violations of the least squares assumptions, we retract our previous statement: the linear model is not a good fit for the data.
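Both adequacy checks can be reproduced with base R functions. The sketch below runs on synthetic stand-in data, not the assignment data; `mins_sim`, `bact_sim`, and `fit_sim` are hypothetical names, and the decay curve was chosen only to show how a curved residual pattern can arise.

```r
# Synthetic stand-in data (NOT the assignment data):
# an exponential decay with multiplicative noise
set.seed(1)
mins_sim <- 1:12
bact_sim <- 200 * exp(-0.25 * mins_sim) * exp(rnorm(12, sd = 0.1))

# Straight-line fit to curved data
fit_sim <- lm(bact_sim ~ mins_sim)

std_res     <- rstandard(fit_sim)  # standardized residuals
fitted_vals <- fitted(fit_sim)     # fitted values

# Constant variance check: look for curvature or a funnel shape
plot(fitted_vals, std_res,
     xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = 0, lty = 2)

# Normality check: points should fall close to the reference line
qqnorm(std_res)
qqline(std_res)
```

With the actual model, the same calls would take `linear_model` in place of `fit_sim`.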

Part C: Transformed Data

Figure 4: Box-Cox Transformation Plot

Figure 4 is the Box-Cox transformation plot, where the best transformation parameter, \(\lambda\), is the value that maximizes the log-likelihood.

For this problem, we determined that the best \(\lambda\) is 0.1010101. We rounded this value to 0.1 because it is easier to work with.
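The value 0.1010101 looks like a grid point, not a precise estimate. Assuming `boxcox()` from MASS was called with its defaults (\(\lambda\) from -2 to 2, with the log-likelihood interpolated onto 100 equally spaced points), the candidate values lie on the grid reproduced below. The names `lambda_grid`, `spacing`, and `nearest` are illustrative.

```r
# Assumed default grid: 100 equally spaced lambda values on [-2, 2]
lambda_grid <- seq(-2, 2, length.out = 100)

# Distance between neighbouring candidate values (about 0.04)
spacing <- diff(lambda_grid)[1]

# Grid point closest to the reported best lambda
nearest <- lambda_grid[which.min(abs(lambda_grid - 0.1010101))]

nearest            # the grid point that matches the reported value
round(nearest, 2)  # the rounded value used for the transformation
```

Under that assumption the grid resolution is coarser than the digits being dropped, so rounding to 0.1 discards no real precision.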

Part D: Transformed Linear Fit

Figure 5: Transformed Fit of Bacteria and Minutes

## 
## Call:
## lm(formula = Transformed_data ~ mins, data = FA5_data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.026103 -0.008334 -0.003293  0.012661  0.025559 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.700726   0.010226  166.31  < 2e-16 ***
## mins        -0.034957   0.001389  -25.16 2.25e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01662 on 10 degrees of freedom
## Multiple R-squared:  0.9844, Adjusted R-squared:  0.9829 
## F-statistic: 632.9 on 1 and 10 DF,  p-value: 2.255e-10

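Figure 5 shows the transformed fit of the bacteria data. Overall, the transformed data give a better fit than the non-transformed data.

The new \(R^2\) value is 0.9844, which is 0.1151 higher than for the non-transformed data. Note that the two values are computed on different response scales, so the comparison is only indicative.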
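Because the transformed model predicts bact^0.1 and not bact itself, fitted values have to be raised to the power \(1/\lambda\) to return to the original scale. This is a minimal sketch using the coefficients from the summary above; the `mins_new` values are hypothetical inputs chosen only for illustration.

```r
# Coefficients copied from the transformed model summary
lambda <- 0.1
b0 <- 1.700726   # intercept
b1 <- -0.034957  # slope for mins

# Hypothetical exposure times (not taken from the data set)
mins_new <- c(1, 5, 10)

# Prediction on the transformed scale: bact^lambda
pred_transformed <- b0 + b1 * mins_new

# Back-transform to the original bacteria scale
pred_bact <- pred_transformed^(1 / lambda)

data.frame(mins = mins_new, pred_transformed, pred_bact)
```

The back-transformed predictions decrease with time, as the negative slope implies.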

```r
library(tidyverse)
library(readxl)
library(ggpmisc)
library(MASS)

FA5_data <- read_excel('C:/Users/Rustg/OneDrive/Documents/IE 5344/Flipped_Assignment_5.xlsx')

# Drop columns 3 through 9, which are not used in this analysis
FA5_data <- FA5_data[-(3:9)]


# Part A
# Least squares fit of bacteria count on minutes
linear_model <- lm(bact ~ mins, data = FA5_data)

ggplot(data = FA5_data, aes(mins, bact)) +
  geom_point() +
  geom_smooth(method = "lm", colour = "red4", fill = "bisque", formula = y ~ x) +
  ggtitle("Least Squares Fit: Bacteria vs Minutes") +
  xlab("Minutes") +
  ylab("Bacteria")



R_squared <- summary(linear_model)$r.squared

summary(linear_model)





# Part B

# Diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
plot(linear_model)


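# Part C
# boxcox() returns the lambda grid (x) and the log-likelihood (y)
Lamda_Value <- data.frame(boxcox(linear_model))

# Pick the lambda with the highest log-likelihood, rounded to 2 decimals
Best_value <- round(Lamda_Value[which.max(Lamda_Value$y), 1], 2)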


# Apply the power transformation bact^lambda
FA5_data$Transformed_data <- FA5_data$bact^Best_value



# Part D
Transformed_model <-lm(data = FA5_data,Transformed_data~mins)

ggplot(data = FA5_data, aes(mins, Transformed_data)) +
  geom_point() +
  geom_smooth(method = "lm", colour = "red4", fill = "bisque", formula = y ~ x) +
  ggtitle("Transformed Fit: Bacteria and Minutes") +
  xlab("Minutes") +
  ylab("Bacteria (transformed)")



R_squared_2 <- summary(Transformed_model)$r.squared

summary(Transformed_model)

```