Introduction

This project analyzes the “OilDeapsorbtion” dataset using regression and ANOVA. The response variable, Diff, is the difference between the amount of oil removed in the experimental run and in the control run. The key steps are creating dot plots and an interaction plot, checking the conditions for two-way ANOVA, fitting the model, and interpreting 95% confidence intervals.

Data Preparation

We prepare the dataset for analysis by installing necessary packages, loading the data, and inspecting its structure.

# Install the required packages only if they are not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
if (!requireNamespace("Stat2Data", quietly = TRUE)) devtools::install_github("statmanrobin/Stat2Data")
# Load the dataset
library(Stat2Data)
data(OilDeapsorbtion)

# View the structure and summary of the data
str(OilDeapsorbtion)
## 'data.frame':    40 obs. of  4 variables:
##  $ Salt : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Ultra: int  5 5 5 5 5 10 10 10 10 10 ...
##  $ Oil  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Diff : num  0.5 0.5 0.5 -0.5 0 -0.5 0 1.5 1 0.5 ...
summary(OilDeapsorbtion)
##       Salt         Ultra           Oil            Diff        
##  Min.   :0.0   Min.   : 5.0   Min.   : 5.0   Min.   :-1.0000  
##  1st Qu.:0.0   1st Qu.: 5.0   1st Qu.: 5.0   1st Qu.: 0.5000  
##  Median :0.5   Median : 7.5   Median : 7.5   Median : 1.0000  
##  Mean   :0.5   Mean   : 7.5   Mean   : 7.5   Mean   : 0.9875  
##  3rd Qu.:1.0   3rd Qu.:10.0   3rd Qu.:10.0   3rd Qu.: 1.5000  
##  Max.   :1.0   Max.   :10.0   Max.   :10.0   Max.   : 3.0000
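
The str() and summary() output is consistent with a balanced factorial layout: two levels each of Salt, Ultra, and Oil, and 40 runs in total. As a hedged sketch, the grid below rebuilds such a layout under the assumption of 5 replicates per combination and tabulates the Oil by Ultra cell counts. The `design` data frame and its `rep` column are hypothetical stand-ins, not the real data.

```r
# Hypothetical design grid: 2 x 2 x 2 factorial with 5 replicates (assumed)
design <- expand.grid(
  rep   = 1:5,
  Ultra = c(5, 10),
  Oil   = c(5, 10),
  Salt  = c(0, 1)
)

# Number of runs in each Oil x Ultra cell (collapsing over Salt)
cell_counts <- table(Oil = design$Oil, Ultra = design$Ultra)
print(cell_counts)

# A design is balanced when every cell has the same number of runs
is_balanced <- length(unique(as.vector(cell_counts))) == 1
print(is_balanced)
```

Running the same table() call on OilDeapsorbtion$Oil and OilDeapsorbtion$Ultra would confirm the actual cell counts.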

(a) Dot Plots for Explanatory Variables

We create dot plots to explore the relationship between each of the two explanatory variables, the amount of oil (Oil) and the time exposed to ultrasound (Ultra), and the response variable, the difference in the amount of oil removed (Diff). The plots give a first indication of whether either variable influences the response.

library(ggplot2)

# Dot plots for explanatory variables vs. response
ggplot(OilDeapsorbtion, aes(x = factor(Oil), y = Diff)) +
  geom_jitter(width = 0.2) +
  labs(title = "Dot Plot of Diff by Amount of Oil", x = "Amount of Oil (ml)", y = "Diff")

ggplot(OilDeapsorbtion, aes(x = factor(Ultra), y = Diff)) +
  geom_jitter(width = 0.2) +
  labs(title = "Dot Plot of Diff by Ultrasound Duration", x = "Ultrasound Time (minutes)", y = "Diff")

(b) Interaction Plot

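We now create an interaction plot to visualize how the two explanatory variables jointly affect the response variable. Roughly parallel lines suggest little or no interaction, while lines that cross or diverge suggest that the effect of one variable depends on the level of the other. This helps us decide whether to include an interaction term in the ANOVA model.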

interaction.plot(x.factor = factor(OilDeapsorbtion$Oil),
                 trace.factor = factor(OilDeapsorbtion$Ultra),
                 response = OilDeapsorbtion$Diff,
                 col = c("red", "blue"), legend = TRUE,
                 trace.label = "Ultrasound (min)",
                 xlab = "Amount of Oil (ml)", ylab = "Mean Diff",
                 main = "Interaction Plot: Oil Amount and Ultrasound Duration")
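
The interaction plot draws cell means, so the same information can be read numerically. The sketch below uses a small made-up data frame (`toy`, not the OilDeapsorbtion measurements) to show how the cell means and a single interaction contrast could be computed. A contrast near zero corresponds to parallel lines in the plot.

```r
# Made-up toy data (NOT the OilDeapsorbtion measurements): 2 runs per cell
toy <- data.frame(
  Oil   = c(5, 5, 5, 5, 10, 10, 10, 10),
  Ultra = c(5, 5, 10, 10, 5, 5, 10, 10),
  Diff  = c(0, 1, 1, 2, 1, 2, 3, 4)
)

# Mean response in each Oil x Ultra cell (rows = Oil, columns = Ultra)
cell_means <- with(toy, tapply(Diff, list(Oil = Oil, Ultra = Ultra), mean))
print(cell_means)

# Interaction contrast: ultrasound effect at Oil = 10 minus the effect at Oil = 5
ultra_effect_oil10 <- cell_means["10", "10"] - cell_means["10", "5"]
ultra_effect_oil5  <- cell_means["5", "10"] - cell_means["5", "5"]
interaction_contrast <- ultra_effect_oil10 - ultra_effect_oil5
print(interaction_contrast)
```

Replacing `toy` with OilDeapsorbtion in the with() call would give the cell means that the interaction plot displays.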

(c) Checking Conditions for Two-Way ANOVA

We check the conditions for a two-way ANOVA with interaction using graphical methods. A residuals vs. fitted values plot is used to assess homogeneity of variances, and a normal Q-Q plot of the residuals is used to assess normality. Independence cannot be read from these plots; it depends on how the experimental runs were carried out.

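# Fit the two-way ANOVA model with interaction
model <- aov(Diff ~ factor(Oil) * factor(Ultra), data = OilDeapsorbtion)

# Show both diagnostic plots side by side
par(mfrow = c(1, 2))

# Residuals vs. fitted values (checks constant variance)
plot(model, which = 1, main = "Residuals vs. Fitted Values")

# Normal Q-Q plot of residuals (checks normality)
qqnorm(residuals(model))
qqline(residuals(model))

# Restore the default single-panel layout
par(mfrow = c(1, 1))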
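
Formal tests can complement the graphical checks, although they do not replace them. The sketch below is one possible approach, not part of the original analysis: it applies a Shapiro-Wilk test to the residuals and a Bartlett test to the four cells. It runs on simulated toy data (`toy`, generated with rnorm) so that it is self-contained; the numbers are not the real measurements.

```r
# Simulated toy data (NOT the real measurements): 2 x 2 layout, 10 runs per cell
set.seed(1)
toy <- expand.grid(rep = 1:10, Ultra = c(5, 10), Oil = c(5, 10))
toy$Diff <- rnorm(nrow(toy), mean = 1, sd = 0.7)

# Same model form as in the report
toy_model <- aov(Diff ~ factor(Oil) * factor(Ultra), data = toy)

# Shapiro-Wilk test: H0 is that the residuals come from a normal distribution
normality_check <- shapiro.test(residuals(toy_model))

# Bartlett test: H0 is that all four cells share the same variance
variance_check <- bartlett.test(x = toy$Diff, g = interaction(toy$Oil, toy$Ultra))

print(normality_check$p.value)
print(variance_check$p.value)
```

A small p-value in either test would point to a possible violation of the corresponding condition. With only a handful of runs per cell, these tests have limited power, so the plots remain the primary check.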

(d) Run the Two-way ANOVA Model

After verifying the conditions, we proceed with running the two-way ANOVA with an interaction term. This analysis will help determine if the effects of oil amount and ultrasound exposure time are statistically significant.

# Two-way ANOVA with interaction term
summary(model)
##                           Df Sum Sq Mean Sq F value  Pr(>F)   
## factor(Oil)                1  4.556   4.556   8.760 0.00542 **
## factor(Ultra)              1  0.056   0.056   0.108 0.74417   
## factor(Oil):factor(Ultra)  1  1.406   1.406   2.704 0.10883   
## Residuals                 36 18.725   0.520                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
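
In the table above, only the oil term has a p-value below 0.05. To make the link between the columns explicit, the sketch below recomputes the F statistic and p-value for factor(Oil) from the printed sums of squares and degrees of freedom. Because the inputs are the rounded values shown in the table, the results should agree with it only up to rounding.

```r
# Values copied from the ANOVA table above
ss_oil      <- 4.556   # Sum Sq for factor(Oil)
df_oil      <- 1
ss_residual <- 18.725  # Sum Sq for Residuals
df_residual <- 36

# Mean squares are sums of squares divided by their degrees of freedom
ms_oil      <- ss_oil / df_oil
ms_residual <- ss_residual / df_residual

# F statistic and its upper-tail p-value
f_oil <- ms_oil / ms_residual
p_oil <- pf(f_oil, df_oil, df_residual, lower.tail = FALSE)

print(round(f_oil, 2))
print(signif(p_oil, 2))
```

The same arithmetic applies to the factor(Ultra) and interaction rows by substituting their sums of squares.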

(e) Construct 95% Confidence Intervals

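We construct 95% confidence intervals for the pairwise differences between the mean responses of the four oil and ultrasound combinations, using Tukey's HSD procedure, and interpret them in context. An interval that excludes zero indicates a difference between two combinations at the 95% family-wise confidence level.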

# Running two-way ANOVA with interaction term
model <- aov(Diff ~ factor(Oil) * factor(Ultra), data = OilDeapsorbtion)

# Summary of the model
summary(model)
##                           Df Sum Sq Mean Sq F value  Pr(>F)   
## factor(Oil)                1  4.556   4.556   8.760 0.00542 **
## factor(Ultra)              1  0.056   0.056   0.108 0.74417   
## factor(Oil):factor(Ultra)  1  1.406   1.406   2.704 0.10883   
## Residuals                 36 18.725   0.520                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 95% confidence intervals for pairwise comparisons using Tukey HSD
tukey_results <- TukeyHSD(model, "factor(Oil):factor(Ultra)", conf.level = 0.95)
# Print the object itself: summary() on a TukeyHSD object reports only its
# structure (Length, Class, Mode), not the estimated differences and intervals
tukey_results
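
The width of a Tukey HSD interval can be reconstructed from the ANOVA table. The sketch below computes the half-width for a difference between two cell means, using the residual mean square and degrees of freedom reported above. It assumes a balanced design with 10 runs in each of the four Oil by Ultra cells (40 runs / 4 cells); that count is an inference from the data summary, not something stated in the output.

```r
# Quantities taken from the ANOVA table above
ms_residual <- 18.725 / 36  # residual mean square
df_residual <- 36
n_cells     <- 4            # 2 oil levels x 2 ultrasound levels

# ASSUMPTION: balanced design with 10 runs per cell (40 runs / 4 cells)
n_per_cell <- 10

# Studentized range critical value for a 95% family-wise confidence level
q_crit <- qtukey(0.95, nmeans = n_cells, df = df_residual)

# Half-width of each Tukey interval for a difference of two cell means
half_width <- q_crit / sqrt(2) * sqrt(ms_residual * (2 / n_per_cell))
print(half_width)
```

Under these assumptions, an observed difference between two cell means is distinguishable from zero when its absolute value exceeds `half_width`.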

Conclusion

In this project, we analyzed the “OilDeapsorbtion” dataset using regression and two-way ANOVA techniques to explore the relationship between the explanatory variables (amount of oil and exposure time to ultrasound) and the response variable (difference in oil removal). Through dot plots and an interaction plot, we assessed the potential effects of these variables and their interaction. After checking the conditions for ANOVA, we ran the model. At the 5% significance level, the amount of oil had a significant effect (F = 8.760, p = 0.0054), while ultrasound time (F = 0.108, p = 0.744) and the interaction between the two factors (F = 2.704, p = 0.109) did not. Finally, Tukey's HSD procedure was used to construct 95% confidence intervals for the pairwise differences between treatment combinations. Overall, the analysis suggests that the amount of oil is the main factor associated with the difference in oil removal in this experiment, and it highlights the importance of checking model assumptions and interactions in statistical analysis.