── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.1 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.3 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
So called project 4
Dot plots of explanatory and response variables
ggplot(OilDeapsorbtion, aes(x=Ultra,y=Diff))+
geom_point()+
labs(x="Time Exposed to Ultrasound (Minutes)",y="Difference in Oil Removed From a Control Run")
ggplot(data=OilDeapsorbtion, aes(x=Oil,y=Diff))+
geom_point()+
labs(x="Volume of Oil in the sample (mL)",y="Difference in Oil Removed From a Control Run")
These plots do not tell us a great deal on their own, but they suggest the response may differ between the experimental conditions. From here, we grow curious whether the two variables interact and which of them best explains the response, which justifies considering an ANOVA test.
But, I hear you ask, what if these two variables interact with each other? Rest assured, we may check with yet another plot!
interaction.plot(x.factor = OilDeapsorbtion$Oil,
trace.factor = OilDeapsorbtion$Ultra,
response = OilDeapsorbtion$Diff,
fun = mean,
type = "l",
legend = TRUE,
xlab = "Oil Volume (mL)",
ylab = "Difference from Control in Volume (mL)",
trace.label = "Time Exposed to Ultrasound (Minutes)",
col = 1:2)
The lines are not even close to parallel, which points to a sizable difference in how oil volume affects the response at each ultrasound time. An interaction term is therefore likely and worth our consideration.
model <- lm(Diff ~ Oil + Ultra, data=OilDeapsorbtion)
plot(model, which=1)
Our data seem to have non-constant variance, with the residuals ticking up on both ends. Despite this, we proceed with an ANOVA test, partly out of our own curiosity and partly because the increase on both ends is relatively small.
model <- aov(Diff ~ Oil * Ultra, data = OilDeapsorbtion)
summary(model)
            Df Sum Sq Mean Sq F value  Pr(>F)
Oil          1  4.556   4.556   8.760 0.00542 **
Ultra        1  0.056   0.056   0.108 0.74417
Oil:Ultra    1  1.406   1.406   2.704 0.10883
Residuals   36 18.725   0.520
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, we see that our interaction term is not significant at our 5% significance level, or even at a generous 10% level (p = 0.109), and neither is the ultrasound duration (p = 0.744). We are therefore free to consider just the oil volume for our confidence interval.
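As a quick sanity check, the F statistics and p-values in the table can be recomputed from its sums of squares. This is only a sketch: the inputs are the rounded values printed in the table, so the results agree with it only to about two or three decimals, and the variable names (`ss`, `df_term`, `ms_resid`, and so on) are chosen here for illustration.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
# (rounded values, so results match the table only approximately).
ss       <- c("Oil" = 4.556, "Ultra" = 0.056, "Oil:Ultra" = 1.406)
df_term  <- c("Oil" = 1, "Ultra" = 1, "Oil:Ultra" = 1)
ss_resid <- 18.725
df_resid <- 36

# Residual mean square, the denominator of every F statistic here
ms_resid <- ss_resid / df_resid

# F = (term mean square) / (residual mean square)
f_value <- (ss / df_term) / ms_resid

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_term, df_resid, lower.tail = FALSE)

round(cbind(F = f_value, p = p_value), 4)
```

Only the oil volume falls below the 5% threshold, which matches the reading of the table.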
OilDeapsorbtion %>%
group_by(Oil == 5) %>%
summarise(mean_val = mean(Diff, na.rm = TRUE))
# A tibble: 2 × 2
  `Oil == 5` mean_val
  <lgl>         <dbl>
1 FALSE          1.32
2 TRUE           0.65
We then calculate the 95% confidence interval through the following steps:
(1.325-0.65)-(2.028*(0.520^0.5)*(1/20+1/20)^0.5)
[1] 0.2125448
(1.325-0.65)+(2.028*(0.520^0.5)*(1/20+1/20)^0.5)
[1] 1.137455
We may interpret this as “We are 95% confident that the mean difference in response between the 10 mL and 5 mL oil levels is between 0.2125448 and 1.137455”, or, to be less lengthy, between about 0.21 and 1.14.
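The same interval can be reproduced without a hard-coded critical value. The sketch below assumes, as the calculation above does, 20 observations at each oil level and a residual mean square of 0.520 on 36 degrees of freedom from the ANOVA table; `qt()` supplies the t quantile in place of the typed-in 2.028. The variable names are illustrative only.

```r
mean_10ml   <- 1.325  # mean of Diff where Oil == 5 is FALSE (the 10 mL level)
mean_5ml    <- 0.65   # mean of Diff where Oil == 5 is TRUE
mse         <- 0.520  # residual mean square from the ANOVA table
df_resid    <- 36     # residual degrees of freedom from the ANOVA table
n_per_group <- 20     # assumed from the 1/20 + 1/20 term in the calculation

# Two-sided 95% critical value of the t distribution
t_crit <- qt(0.975, df_resid)

# Standard error of the difference between two group means
se_diff <- sqrt(mse * (1 / n_per_group + 1 / n_per_group))

# Point estimate plus/minus the margin of error
estimate <- mean_10ml - mean_5ml
ci <- estimate + c(-1, 1) * t_crit * se_diff
ci
```

Because `qt()` returns the critical value at full precision rather than rounded to 2.028, the endpoints may differ from the hand calculation in the fourth or fifth decimal place.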