This project explores the OilDeapsorbtion dataset, which comes from a 2016 high school science experiment that measured the deapsorbtion of oil from sand under various conditions. Using a two-way ANOVA, we analyze how the amount of oil added and the length of ultrasound exposure affect the amount of oil deapsorbed from the sand.
When running a two-way ANOVA, one must consider possible interaction between the explanatory variables. We use R to analyze the effect of the two explanatory variables (amount of oil and ultrasound time) on the response variable (amount of oil deapsorbed). We first create an interaction plot to decide whether to include an interaction term in the model. We then fit the model and interpret the results, including the appropriate 95% confidence intervals.
The dot plot shows that the spread of the difference in oil removed is larger when more oil is added and smaller when less oil is added. The group mean appears to be lower for the smaller amount of oil than for the larger amount. For ultrasound time, the spread is larger at the shorter time than at the longer time, while the group means appear to be similar.
Because the group means for the two ultrasound times differ only slightly and some groups are widely spread, it is hard to say in advance whether a two-way ANOVA will show significant effects. The tightly clustered points at the longer ultrasound time and the difference in group means between the two oil amounts are the most likely sources of significance. The interaction of the two variables may also be significant, but we must first construct an interaction plot to judge whether an interaction is plausible.
Since the two lines in the interaction plot cross and are not parallel, the plot suggests an interaction between the two explanatory variables, so we include an interaction term in the model. Whether the interaction is statistically significant remains to be seen. Next, we check whether the model meets the conditions for a two-way ANOVA with interaction.
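One way to put a number on "not parallel" is the difference of differences between cell means. The sketch below uses made-up cell means (the values in `cell_means` are hypothetical and are not taken from the experiment) purely to show the calculation. A result of 0 would correspond to perfectly parallel lines in an interaction plot.

```r
# Hypothetical cell means for a 2 x 2 design (NOT the experiment's values).
# Rows are oil amounts (ml), columns are ultrasound times (min).
cell_means <- matrix(
  c(1.0, 1.2,   # Ultra = 5:  Oil = 5, Oil = 10
    1.5, 2.3),  # Ultra = 10: Oil = 5, Oil = 10
  nrow = 2,
  dimnames = list(Oil = c("5", "10"), Ultra = c("5", "10"))
)

# Effect of going from 5 ml to 10 ml of oil at each ultrasound time
oil_effect <- cell_means["10", ] - cell_means["5", ]

# Difference of differences: 0 means the two lines are parallel
interaction_contrast <- unname(oil_effect["10"] - oil_effect["5"])

print(oil_effect)
print(interaction_contrast)
```

With the real data, the cell means could be obtained with something like `with(OilDeapsorbtion, tapply(Diff, list(Oil, Ultra), mean))`.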
The model must meet the assumptions of normality and constant variance of the residuals. In the residuals vs. fitted plot, the spread of the residuals is relatively even across groups, although a few outliers may influence the fit. In the normal Q-Q plot of the residuals, the points fall approximately along a straight line, though again a few outliers could indicate a deviation from normality. Now, we run the model.
##             Df Sum Sq Mean Sq F value  Pr(>F)
## Oil          1  4.556   4.556   8.760 0.00542 **
## Ultra        1  0.056   0.056   0.108 0.74417
## Oil:Ultra    1  1.406   1.406   2.704 0.10883
## Residuals   36 18.725   0.520
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on the large F-value (8.760) and small p-value (0.00542 < 0.05), the average amount of oil removed differs significantly with the amount of oil added. The effect of ultrasound time (p = 0.74417) and the interaction between the amount of oil and ultrasound time (p = 0.10883) are not statistically significant. For the final step of our analysis, we calculate and interpret the appropriate confidence intervals for the model.
Because the interaction between our two explanatory variables was not statistically significant (p = 0.10883), we construct confidence intervals only for the main effects.
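The F-value and p-value for Oil can be re-derived from the table entries, which makes the link between the columns explicit. This is a small check that uses only the rounded numbers printed in the table, so it agrees with the reported values only up to rounding. The variable names are ours.

```r
# Quantities read from the ANOVA table
ms_oil   <- 4.556   # Mean Sq for Oil
ss_resid <- 18.725  # Sum Sq for Residuals
df_oil   <- 1
df_resid <- 36

# Residual mean square, then the F statistic as a ratio of mean squares
ms_resid <- ss_resid / df_resid
f_oil <- ms_oil / ms_resid

# Upper-tail probability of the F distribution gives the p-value
p_oil <- pf(f_oil, df1 = df_oil, df2 = df_resid, lower.tail = FALSE)

print(round(f_oil, 2))
print(signif(p_oil, 2))
```

The same two steps apply to the Ultra and Oil:Ultra rows with their own mean squares.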
## We can be 95% confident that the Oil coefficient is between 0.07 and 0.65 ml of oil removed per additional ml of oil added. Because Oil and Ultra are numeric in the fitted model and the model includes their interaction, this is a per-ml slope at an ultrasound time of 0 minutes, not the difference between the 5 ml and 10 ml group means. The interval does not include 0, which is consistent with the significant effect of oil in the ANOVA table.
## We can be 95% confident that the Ultra coefficient is between -0.08 and 0.5 ml of oil removed per additional minute of ultrasound (a per-minute slope at 0 ml of oil). The interval runs from a slight negative effect to a positive effect and includes 0, which is consistent with the lack of statistical significance for ultrasound time in the ANOVA table.
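The model is fit to the original `OilDeapsorbtion` data, where `Oil` and `Ultra` are numeric, so `confint()` returns intervals for per-unit slopes rather than for differences in group means. For an interval that compares the 5 ml and 10 ml groups directly, one option is to work from the ANOVA table. The sketch below is an illustration under stated assumptions rather than the analysis run in this report: it assumes a balanced design with 20 observations at each oil level (40 in total, consistent with the degrees of freedom), recovers the size of the mean difference from the Oil sum of squares, takes its sign from the dot plot (more oil removed when more oil is added), and uses the residual mean square from the interaction model. The variable names are ours.

```r
# Quantities read from the ANOVA table
ss_oil   <- 4.556
ss_resid <- 18.725
df_resid <- 36

# Assumption: balanced design with 20 observations at each oil level
n_per_level <- 20

# For a balanced two-level factor, SS = (n_per_level / 2) * (mean difference)^2
mean_diff <- sqrt(2 * ss_oil / n_per_level)

# Standard error of a difference between two group means
se_diff <- sqrt((ss_resid / df_resid) * (2 / n_per_level))

# 95% interval based on the residual degrees of freedom
margin <- qt(0.975, df_resid) * se_diff
ci_oil <- c(lower = mean_diff - margin, upper = mean_diff + margin)

print(round(mean_diff, 2))
print(round(ci_oil, 2))
```

Under these assumptions the interval for the 10 ml minus 5 ml difference lies entirely above 0, in line with the significant Oil effect in the ANOVA table.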
In this project, our goal was to determine whether the amount of oil added, the ultrasound time, and their interaction had an effect on the amount of oil removed. We used an interaction plot to decide whether to include an interaction term, fit a two-way ANOVA, and discussed the significance of our findings using F-values, p-values, and confidence intervals. In the end, the amount of oil added was the only effect tested that proved statistically significant.
knitr::opts_chunk$set(echo = TRUE)
library(devtools)
library(Stat2Data)
library(ggplot2)
library(tidyr)
data("OilDeapsorbtion")
OilDeapsorbtion_long <- OilDeapsorbtion %>%
  gather(key = "Variable", value = "Amount", Oil, Ultra)
ggplot(OilDeapsorbtion_long, aes(x = Amount, y = Diff, color = Variable)) +
  geom_jitter(width = 0.2, height = 0.2, size = 2) +
  labs(title = "Dot plot of Amount (Oil or Ultrasound) vs Difference in Oil Removed",
       x = "Amount (Oil or Ultrasound)",
       y = "Difference (ml)") +
  scale_color_manual(values = c("red", "blue"), labels = c("Oil Amount", "Ultrasound Time")) +
  theme_minimal()
OilDeapsorbtion_modified <- OilDeapsorbtion
OilDeapsorbtion_modified$Oil <- as.factor(OilDeapsorbtion_modified$Oil)
OilDeapsorbtion_modified$Ultra <- as.factor(OilDeapsorbtion_modified$Ultra)
interaction.plot(
  x.factor = OilDeapsorbtion_modified$Oil,
  trace.factor = OilDeapsorbtion_modified$Ultra,
  response = OilDeapsorbtion_modified$Diff,
  type = "l",
  col = c("red", "blue"),
  xlab = "Oil (ml)",
  ylab = "Difference (ml)",
  trace.label = "Ultrasound (min)",
  main = "Interaction Plot of Oil (ml) and Ultrasound (min) on Difference", # Title of the plot
  cex.main = 1)
anova_model <- aov(Diff ~ Oil * Ultra, data = OilDeapsorbtion)
plot(anova_model, which = 1)
plot(anova_model, which = 2)
summary(anova_model)
# 95% confidence intervals for the model coefficients. Oil and Ultra are numeric
# in OilDeapsorbtion, so these are per-unit slopes (each taken with the other
# variable at 0, because of the interaction term), not differences in group means.
conf_intervals <- confint(anova_model)
oil_ci <- conf_intervals["Oil", ]
cat("We can be 95% confident that the Oil coefficient is between",
    round(oil_ci[1], 2), "and", round(oil_ci[2], 2),
    "ml of oil removed per additional ml of oil added. Because Oil and Ultra are numeric in the fitted model and the model includes their interaction, this is a per-ml slope at an ultrasound time of 0 minutes, not the difference between the 5 ml and 10 ml group means. The interval does not include 0, which is consistent with the significant effect of oil in the ANOVA table.\n\n")
ultra_ci <- conf_intervals["Ultra", ]
cat("We can be 95% confident that the Ultra coefficient is between",
    round(ultra_ci[1], 2), "and", round(ultra_ci[2], 2),
    "ml of oil removed per additional minute of ultrasound (a per-minute slope at 0 ml of oil). The interval runs from a slight negative effect to a positive effect and includes 0, which is consistent with the lack of statistical significance for ultrasound time in the ANOVA table.\n")