So-called project 4

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.3     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Dot plots of explanatory and response variables

ggplot(OilDeapsorbtion, aes(x = Ultra, y = Diff)) +
  geom_point() +
  labs(x = "Time Exposed to Ultrasound (Minutes)", y = "Difference in Oil Removed From a Control Run")

ggplot(data = OilDeapsorbtion, aes(x = Oil, y = Diff)) +
  geom_point() +
  labs(x = "Volume of Oil in the Sample (mL)", y = "Difference in Oil Removed From a Control Run")

These plots do not tell us much on their own, but they suggest the response may differ between the experimental conditions. That leaves two questions: whether the two factors interact, and which of them best explains the response. Both questions justify an ANOVA test.

But, I hear you ask, what if these two variables interact with each other? Rest assured, we may check with yet another plot!

interaction.plot(x.factor = OilDeapsorbtion$Oil, 
                  trace.factor = OilDeapsorbtion$Ultra, 
                  response = OilDeapsorbtion$Diff, 
                  fun = mean, 
                  type = "l",
                  legend = TRUE,
                  xlab = "Oil Volume (mL)",
                  ylab = "Difference from Control in Volume (mL)",
                  trace.label = "Time Exposed to Ultrasound (Minutes)",
                 col = 1:2)

The lines are not even close to parallel, which suggests that the effect of oil volume on the response depends on the ultrasound time. An interaction term is therefore worth including in our model.

model <- lm(Diff ~ Oil + Ultra, data=OilDeapsorbtion)
plot(model, which=1)

Our data seem to have non-constant variance, with the residuals ticking up at both ends of the plot. The increase at both ends is relatively small, though, so we proceed with an ANOVA test anyway, partly out of our own curiosity.

model <- aov(Diff ~ Oil * Ultra, data = OilDeapsorbtion)
summary(model)
            Df Sum Sq Mean Sq F value  Pr(>F)   
Oil          1  4.556   4.556   8.760 0.00542 **
Ultra        1  0.056   0.056   0.108 0.74417   
Oil:Ultra    1  1.406   1.406   2.704 0.10883   
Residuals   36 18.725   0.520                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here, we see that our interaction term is not significant at our 5% significance level, or even at a generous 10% (p = 0.109), and neither is the ultrasound duration (p = 0.744). Only the oil volume is significant (p = 0.005), so we are free to consider just the oil volume for our confidence interval.
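As a rough check on the table above, the Pr(>F) column can be recovered from the F values and degrees of freedom with the upper tail of the F distribution. The sketch below copies the F values from the printed table; the data frame and column names (anova_tbl, f_value, sig_05, and so on) are illustrative and not part of the original analysis. Because the printed F values are rounded, the recomputed p-values may differ slightly in the last digits.

```r
# Values copied from the ANOVA table above; names here are illustrative only.
anova_tbl <- data.frame(
  term    = c("Oil", "Ultra", "Oil:Ultra"),
  f_value = c(8.760, 0.108, 2.704),
  df_term = c(1, 1, 1)
)
df_resid <- 36  # residual degrees of freedom from the table

# Upper-tail probability of the F distribution gives Pr(>F)
anova_tbl$p_value <- pf(anova_tbl$f_value, anova_tbl$df_term, df_resid,
                        lower.tail = FALSE)

# Flag which terms clear the 5% and 10% thresholds
anova_tbl$sig_05 <- anova_tbl$p_value < 0.05
anova_tbl$sig_10 <- anova_tbl$p_value < 0.10
print(anova_tbl)
```

Only the Oil row should be flagged at either threshold, which is what lets us set the other two terms aside.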

OilDeapsorbtion %>%
  group_by(Oil == 5) %>%
  summarise(mean_val = mean(Diff, na.rm = TRUE))
# A tibble: 2 × 2
  `Oil == 5` mean_val
  <lgl>         <dbl>
1 FALSE          1.32
2 TRUE           0.65

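We then calculate the 95% confidence interval for the difference in mean response between the two oil levels. The steps below use the two group means (20 observations each), the residual mean square from the ANOVA table (0.520), and the t critical value for 36 residual degrees of freedom (2.028):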

(1.325-0.65)-(2.028*(0.520^0.5)*(1/20+1/20)^0.5)
[1] 0.2125448
(1.325-0.65)+(2.028*(0.520^0.5)*(1/20+1/20)^0.5)
[1] 1.137455

We may interpret this as “We are 95% confident that the mean difference from the control run is between 0.2125448 and 1.137455 mL higher at the 10 mL oil level than at the 5 mL level”, or, to be less lengthy and rounding the bounds outward, between 0.212 and 1.138 mL.
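The hand calculation above can also be written so that the critical value comes from qt() instead of being typed in. This is a minimal sketch that reuses only the summary numbers already reported (group means, residual mean square, residual degrees of freedom, and 20 observations per oil level); the variable names are made up for illustration.

```r
# Summary values copied from the output above; variable names are illustrative.
mean_10ml <- 1.325   # mean Diff at the 10 mL oil level
mean_5ml  <- 0.65    # mean Diff at the 5 mL oil level
mse       <- 0.520   # residual mean square from the ANOVA table
df_resid  <- 36      # residual degrees of freedom
n_per_grp <- 20      # observations at each oil level

# Two-sided 95% critical value from the t distribution
t_crit <- qt(0.975, df_resid)

# Standard error of the difference between two group means
se_diff <- sqrt(mse * (1 / n_per_grp + 1 / n_per_grp))

# Point estimate plus and minus the margin of error
estimate <- mean_10ml - mean_5ml
ci <- estimate + c(-1, 1) * t_crit * se_diff
print(ci)
```

The result should match the hand calculation closely, with small differences in the later decimals because qt() does not round the critical value to 2.028.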