1. Abstract

This project explores the OilDeapsorbtion dataset, which comes from a 2016 high school science experiment that measured how much oil was removed from sand under different conditions. Using a two-way ANOVA, we analyze how the amount of oil added and the length of ultrasound exposure affect the amount of oil removed from the sand.

2. Introduction

When running a two-way ANOVA, one must consider possible interaction between the explanatory variables. In this project we use R to analyze the effect of two explanatory variables (amount of oil added and ultrasound time) on the response variable (amount of oil removed). We first create an interaction plot to decide whether to include an interaction term in the model. We then fit the model and interpret the results, including the appropriate 95% confidence intervals.

3. Methodology

  • Data Source: OilDeapsorbtion dataset from Stat2Data package
  • Tools: R programming language, devtools, ggplot2, tidyr
  • Data Pre-processing: reshaping the data to long format (tidyr) and converting Oil and Ultra to factors for grouping
  • Data Visualization: Creating dot plots using ggplot2, and interaction and residual plots using base R

4. Results

4.1 Dot plot of Two Explanatory Variables

The dot plot shows that the variance of the amount of oil removed is larger when more oil is added and smaller when less oil is added. The group mean of oil removed also appears lower when the smaller amount of oil is added than when the larger amount is added. For ultrasound time, the variance of oil removed is larger for the shorter ultrasound time than for the longer one, while the two group means appear similar.

Because the difference between the ultrasound group means is slight and some groups are widely spread, it is hard to tell from the plot alone whether a two-way ANOVA will show significant effects. The ANOVA might detect some effect of ultrasound time, given how tightly the values cluster at the longer time. The difference in group means between the two oil amounts may also prove significant. The two variables may also interact, but first we must construct an interaction plot to check whether this is the case.
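Group summaries can check the impressions from the dot plot numerically. Below is a minimal sketch. It assumes the `OilDeapsorbtion` data frame has numeric `Oil`, `Ultra`, and `Diff` columns, as used in the appendix. `group_summary` is a hypothetical helper, and the real-data usage runs only if Stat2Data is installed.

```r
# Hypothetical helper: mean, standard deviation, and count of a response
# within each level of one grouping variable.
group_summary <- function(df, group, response = "Diff") {
  y <- df[[response]]
  g <- factor(df[[group]])   # levels sorted numerically, e.g. "5", "10"
  data.frame(
    level = levels(g),
    mean  = as.numeric(tapply(y, g, mean)),
    sd    = as.numeric(tapply(y, g, sd)),
    n     = as.numeric(tapply(y, g, length))
  )
}

# Usage on the real data, only if the Stat2Data package is available
if (requireNamespace("Stat2Data", quietly = TRUE)) {
  data("OilDeapsorbtion", package = "Stat2Data")
  print(group_summary(OilDeapsorbtion, "Oil"))    # oil added: 5 vs. 10 ml
  print(group_summary(OilDeapsorbtion, "Ultra"))  # ultrasound: 5 vs. 10 min
}
```

Comparing the `sd` column across levels makes the claims about spread concrete. Comparing the `mean` column does the same for the claims about group means.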

4.2 Interaction plot of Two Explanatory Variables

Since the two lines in the interaction plot cross rather than run parallel, the sample suggests an interaction between the two explanatory variables, so we include an interaction term in the model. Whether the interaction is statistically significant remains to be tested. Next, we check whether the model meets the conditions for a two-way ANOVA with interaction.
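The four cell means can quantify how much the lines depart from parallel. The interaction contrast is the change in the ultrasound effect between the two oil levels, and it is zero exactly when the lines are parallel. Below is a minimal sketch, assuming each factor has exactly two levels. `interaction_contrast` is a hypothetical helper. A nonzero sample contrast is expected even without a true interaction, which is why its significance is left to the ANOVA F-test.

```r
# Hypothetical helper: cell means of Diff for each Oil x Ultra combination
# and the "difference of differences" that measures non-parallel lines.
# Assumes Oil and Ultra each have exactly two levels.
interaction_contrast <- function(df) {
  # 2 x 2 matrix of means: rows = Oil levels, columns = Ultra levels
  cells <- tapply(df$Diff, list(Oil = df$Oil, Ultra = df$Ultra), mean)
  # Effect of moving Ultra from its low to its high level, at each Oil level
  ultra_effect <- cells[, 2] - cells[, 1]
  list(
    cell_means = cells,
    ultra_effect_by_oil = ultra_effect,
    # Zero would mean parallel lines in the interaction plot
    contrast = unname(ultra_effect[2] - ultra_effect[1])
  )
}

if (requireNamespace("Stat2Data", quietly = TRUE)) {
  data("OilDeapsorbtion", package = "Stat2Data")
  print(interaction_contrast(OilDeapsorbtion))
}
```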

4.3 Checking Conditions for the Two-Way ANOVA with Interaction Model

The conditions we check here are normality and constant variance of the residuals. In the residuals vs. fitted plot, the spread of the residuals is fairly even across groups, although a few outliers may influence the results. In the normal Q-Q plot, the residuals fall approximately along a straight line, though, as before, a few outliers could indicate some departure from normality. With the conditions reasonably met, we fit the model.
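Formal tests can supplement the plot-based checks above. Below is a minimal sketch, assuming the same `Diff`, `Oil`, and `Ultra` columns. `check_anova_conditions` is a hypothetical helper. The Shapiro-Wilk and Bartlett tests it runs are additions that the original analysis did not use. Both tests are sensitive to outliers, so they complement the residual plots rather than replace them.

```r
# Hypothetical helper: formal checks of the two conditions examined above.
check_anova_conditions <- function(df) {
  # Treat both predictors as two-level factors; for two levels this gives
  # the same fit and residuals as the numeric coding used in the appendix
  df$Oil   <- factor(df$Oil)
  df$Ultra <- factor(df$Ultra)
  df$cell  <- interaction(df$Oil, df$Ultra)   # the four Oil x Ultra groups
  fit <- aov(Diff ~ Oil * Ultra, data = df)

  cell_sd <- tapply(df$Diff, df$cell, sd)
  list(
    # Shapiro-Wilk test: H0 = residuals come from a normal distribution
    shapiro  = shapiro.test(residuals(fit)),
    # Bartlett's test: H0 = equal variances in all four cells
    bartlett = bartlett.test(Diff ~ cell, data = df),
    # Largest / smallest cell SD; a common rule of thumb wants this below 2
    sd_ratio = max(cell_sd) / min(cell_sd)
  )
}

if (requireNamespace("Stat2Data", quietly = TRUE)) {
  data("OilDeapsorbtion", package = "Stat2Data")
  checks <- check_anova_conditions(OilDeapsorbtion)
  print(checks$shapiro)
  print(checks$bartlett)
  print(checks$sd_ratio)
}
```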

4.4 Interpreting the Model

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Oil          1  4.556   4.556   8.760 0.00542 **
## Ultra        1  0.056   0.056   0.108 0.74417   
## Oil:Ultra    1  1.406   1.406   2.704 0.10883   
## Residuals   36 18.725   0.520                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The large F-value (8.76) and small p-value (0.0054 < 0.05) for Oil show that the amount of oil added has a statistically significant effect on the average amount of oil removed. Neither ultrasound time (p = 0.744) nor the interaction between oil amount and ultrasound time (p = 0.109) is statistically significant at the 0.05 level. For the final step of our analysis, we calculate and interpret the appropriate confidence intervals for the model.
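The F statistics and p-values in the table follow directly from the sums of squares. Each term's mean square is divided by the residual mean square. The p-value is the upper tail of an F distribution with 1 and 36 degrees of freedom. The sketch below recomputes these quantities from the rounded table entries, so the last digit may differ slightly from the printed output.

```r
# Sums of squares copied from the ANOVA table above (rounded values)
ss <- c(Oil = 4.556, Ultra = 0.056, `Oil:Ultra` = 1.406)
df_effect <- 1        # each term has 1 degree of freedom (two levels)
ss_resid  <- 18.725
df_resid  <- 36

mse     <- ss_resid / df_resid          # residual mean square (MSE)
f_value <- (ss / df_effect) / mse       # F = MS_term / MSE
p_value <- pf(f_value, df_effect, df_resid, lower.tail = FALSE)

round(cbind(F = f_value, p = p_value), 4)
```

Only Oil has p below 0.05, which matches the interpretation above.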

4.5 Calculating and Interpreting Confidence Intervals

Because the interaction between the two explanatory variables was not statistically significant (p = 0.109), we construct confidence intervals only for the main effects.

## We can be 95% confident that the average difference in amount of oil removed in 5ml Oil vs. 10ml Oil is between 0.07 and 0.65 ml. This interval does not include 0, which suggests a statistically significant effect and supports our finding based on the p-value.
## We can be 95% confident that the average difference in amount of oil removed in 5 minute vs. 10 minute ultrasounds is between -0.08 and 0.5 ml. The interval ranges from a very slight negative effect to a slight positive effect; because it includes 0, and the model gives a high p-value, the main effect of ultrasound is not statistically significant.
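The intervals above come from `confint()` on the model in 7.4, where `Oil` and `Ultra` enter as numeric variables. Those are intervals for regression coefficients measured per ml and per minute. Because the model includes an interaction term, each coefficient describes the slope when the other variable equals 0. The sketch below shows one way to get intervals directly on the 10 vs. 5 differences in mean oil removed. It assumes factor-coded predictors and uses `TukeyHSD()` from R's stats package. `main_effect_intervals` is a hypothetical helper, and `TukeyHSD()` is intended for balanced designs.

```r
# Hypothetical helper: intervals for the difference in mean Diff between the
# two levels of each main effect (reported as "10-5", i.e. high minus low).
main_effect_intervals <- function(df, conf.level = 0.95) {
  df$Oil   <- factor(df$Oil)
  df$Ultra <- factor(df$Ultra)
  fit <- aov(Diff ~ Oil * Ultra, data = df)
  # TukeyHSD compares marginal means using the full model's residual mean
  # square. With only two levels per factor there is one comparison each,
  # and the interval equals the ordinary t-interval for that difference.
  TukeyHSD(fit, which = c("Oil", "Ultra"), conf.level = conf.level)
}

if (requireNamespace("Stat2Data", quietly = TRUE)) {
  data("OilDeapsorbtion", package = "Stat2Data")
  print(main_effect_intervals(OilDeapsorbtion))
}
```

Keeping the interaction term in the fitted model means the intervals use the same residual mean square as the ANOVA table in 4.4.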

5. Conclusion

In this project, our goal was to determine whether the amount of oil added, the ultrasound time, and the interaction between these two explanatory variables affect the amount of oil removed. We checked for interaction with an interaction plot and conducted a two-way ANOVA. We then assessed significance using F-values, p-values, and confidence intervals. Of the effects tested, only the amount of oil added proved statistically significant.

6. References

  • Stat2Data R package, OilDeapsorbtion dataset (GitHub: statmanrobin/Stat2Data). Install with devtools::install_github("statmanrobin/Stat2Data"), then load with library(Stat2Data) and data("OilDeapsorbtion").

7. Appendices

7.1 Setup Code, Loading Libraries, Reading in Data

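knitr::opts_chunk$set(echo = TRUE)   # show code in the knitted report
library(devtools)   # only needed to install Stat2Data from GitHub
library(Stat2Data)  # provides the OilDeapsorbtion dataset
library(ggplot2)    # dot plot
library(tidyr)      # reshaping to long format (also provides %>%)

data("OilDeapsorbtion")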

7.2 Data Transformation, Dot plot 4.1

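# Reshape to long format: one row per observation and explanatory variable,
# with the variable name in `Variable` and its level (ml or min) in `Amount`
OilDeapsorbtion_long <- OilDeapsorbtion %>%
  pivot_longer(cols = c(Oil, Ultra), names_to = "Variable", values_to = "Amount")

# Jitter points by up to 0.2 in each direction so overlapping values stay visible
ggplot(OilDeapsorbtion_long, aes(x = Amount, y = Diff, color = Variable)) +
  geom_jitter(width = 0.2, height = 0.2, size = 2) +
  labs(title = "Dot plot of Amount (Oil or Ultrasound) vs Difference in Oil Removed",
       x = "Amount (Oil or Ultrasound)",
       y = "Difference (ml)") +
  scale_color_manual(values = c("red", "blue"), labels = c("Oil Amount", "Ultrasound Time")) +
  theme_minimal()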

7.3 Data Grouping, Interaction Plot 4.2

# Copy the data and treat Oil and Ultra as categorical (two levels each)
OilDeapsorbtion_modified <- OilDeapsorbtion

OilDeapsorbtion_modified$Oil <- as.factor(OilDeapsorbtion_modified$Oil)
OilDeapsorbtion_modified$Ultra <- as.factor(OilDeapsorbtion_modified$Ultra)

interaction.plot(
  x.factor = OilDeapsorbtion_modified$Oil,   
  trace.factor = OilDeapsorbtion_modified$Ultra, 
  response = OilDeapsorbtion_modified$Diff,      
  type = "l",                                    
  col = c("red", "blue"),                    
  xlab = "Oil (ml)",                            
  ylab = "Difference (ml)",                    
  trace.label = "Ultrasound (min)",              
  main = "Interaction Plot of Oil (ml) and Ultrasound (min) on Difference",  # Title of the plot
  cex.main = 1)  

7.4 ANOVA Model and Residual Plots

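# Two-way ANOVA with interaction. Oil and Ultra are numeric here; with two
# levels each, the ANOVA table matches the factor-coded model from 7.3.
anova_model <- aov(Diff ~ Oil * Ultra, data = OilDeapsorbtion)

# Residuals vs. fitted values: check constant variance
plot(anova_model, which = 1)

# Normal Q-Q plot of residuals: check normality
plot(anova_model, which = 2)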

7.5 ANOVA Model Summary

summary(anova_model)

7.6 Confidence Intervals Calculations and Interpretations

# 95% confidence intervals for the model coefficients
conf_intervals <- confint(anova_model)

# Interval for the Oil coefficient
oil_ci <- conf_intervals["Oil", ]

cat("We can be 95% confident that the average difference in amount of oil removed in 5ml Oil vs. 10ml Oil is between",
    round(oil_ci[1], 2), "and", round(oil_ci[2], 2), "ml. This interval does not include 0, which suggests a statistically significant effect and supports our finding based on the p-value.\n\n")

# Interval for the Ultra coefficient
ultra_ci <- conf_intervals["Ultra", ]

cat("We can be 95% confident that the average difference in amount of oil removed in 5 minute vs. 10 minute ultrasounds is between",
    round(ultra_ci[1], 2), "and", round(ultra_ci[2], 2), "ml. The interval ranges from a very slight negative effect to a slight positive effect; because it includes 0, and the model gives a high p-value, the main effect of ultrasound is not statistically significant.\n")