Problem 1.

The built-in R data set “CO2” contains data from an experiment on the cold tolerance of the grass species Echinochloa crus-galli, which evaluated its carbon dioxide uptake rates, measured in µmol/m² sec, under two conditions: chilled and nonchilled. Before moving on, take a moment to look at the “CO2” data set in R to become familiar with its components and naming conventions. Those who conducted the experiment want to know if there is a difference in the carbon dioxide uptake between “chilled” and “nonchilled” plants. Using the built-in data set “CO2,” conduct a hypothesis test against a significance level of α = 0.01.
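A quick way to get familiar with the data set is sketched below, using only base R functions. One detail worth noticing is that the 84 rows are repeated measurements taken on 12 individual plants, which is relevant when judging the independence assumption.

```r
# Inspect the built-in CO2 data set (part of base R's datasets package).
data(CO2)

str(CO2)               # column names and types
head(CO2)              # first few rows
table(CO2$Treatment)   # observations per treatment: 42 nonchilled, 42 chilled
nlevels(CO2$Plant)     # number of distinct plants measured
```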

Null Hypothesis: μ1-μ2=0

Alternative Hypothesis: μ1-μ2≠0

α = 0.01

Where μ1 = the true mean carbon dioxide uptake of nonchilled plants and μ2 = the true mean carbon dioxide uptake of chilled plants.

Conditions:

Independence Assumption: It is assumed that the data for both the nonchilled plants and the chilled plants were properly randomized. The 42 nonchilled observations are ≤ 10% of all nonchilled plants, and the 42 chilled observations are ≤ 10% of all chilled plants.

Normal Population Assumption: We examine the normal probability plots of CO2 uptake for the chilled and nonchilled plants; if the plots appear roughly linear, the data are nearly normally distributed.

library(ggplot2)  # provides ggplot(), stat_qq(), and stat_qq_line()

ggplot(CO2, aes(sample = uptake)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~ Treatment) +
  labs(
    title = "Normal Probability Plots of CO2 Uptake by Treatment",
    x = "Theoretical Quantiles",
    y = "Sample Quantiles"
  )

There appears to be a slight curvature in both normal probability plots, which could indicate that the sample data are not normally distributed; however, we will assume they are and proceed with caution. We will also assume that the CO2 uptake of chilled plants is independent of the CO2 uptake of nonchilled plants.

Finally, before we can conduct our hypothesis test, we must determine whether the two populations have equal variances and thus whether the standard error can be pooled. To do this we will conduct an F-test in which the null hypothesis states that the ratio of the population variances is 1 (the variances are equal), and the alternative hypothesis states that the ratio is not equal to 1 (the variances are unequal). The F-test will be conducted against a significance level of α = 0.05.

with(CO2,var.test(uptake~Treatment))
## 
##  F test to compare two variances
## 
## data:  uptake by Treatment
## F = 0.79504, num df = 41, denom df = 41, p-value = 0.466
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.4273524 1.4790775
## sample estimates:
## ratio of variances 
##          0.7950392

Since the F-test p-value of 0.466 is above the significance level of 0.05, we fail to reject the F-test null hypothesis. There is no evidence that the population variances differ, so we will treat them as equal and pool the standard error.
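As a check on where the F statistic comes from, the sketch below recomputes it by hand from the two sample variances. The variable names are illustrative; the two-sided p-value is taken as twice the smaller tail area of the F distribution.

```r
# Split uptake by treatment; the factor levels are "nonchilled" then "chilled".
groups <- split(CO2$uptake, CO2$Treatment)

v1 <- var(groups$nonchilled)
v2 <- var(groups$chilled)

f_stat <- v1 / v2                       # ratio of sample variances
df1 <- length(groups$nonchilled) - 1    # numerator degrees of freedom
df2 <- length(groups$chilled) - 1       # denominator degrees of freedom

# Two-sided p-value: double the smaller of the two tail areas.
p_lower <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)

c(F = f_stat, p_value = p_value)
```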

It is OK to use the Student’s t-model with 42 + 42 − 2 = 82 degrees of freedom and a pooled Two-Sample t-Test for Means.

# The two groups are independent, so the default (unpaired) test applies.
with(CO2, t.test(uptake ~ Treatment, alternative = "two.sided", conf.level = 0.99, mu = 0, var.equal = TRUE))
## 
##  Two Sample t-test
## 
## data:  uptake by Treatment
## t = 3.0485, df = 82, p-value = 0.003096
## alternative hypothesis: true difference in means between group nonchilled and group chilled is not equal to 0
## 99 percent confidence interval:
##   0.9255755 12.7934722
## sample estimates:
## mean in group nonchilled    mean in group chilled 
##                 30.64286                 23.78333

Since the p-value of 0.003096 is below the significance level of 0.01, we reject the null hypothesis.
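The pooled test statistic can also be reproduced step by step, which makes the roles of the pooled variance and the 82 degrees of freedom explicit. This is a minimal sketch with illustrative variable names, not a replacement for `t.test()`.

```r
groups <- split(CO2$uptake, CO2$Treatment)
x1 <- groups$nonchilled
x2 <- groups$chilled
n1 <- length(x1)
n2 <- length(x2)

# Pooled variance: each sample variance weighted by its degrees of freedom.
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))

df      <- n1 + n2 - 2
t_stat  <- (mean(x1) - mean(x2)) / se
p_value <- 2 * pt(-abs(t_stat), df)      # two-sided p-value

# 99% confidence interval for mu1 - mu2
ci <- (mean(x1) - mean(x2)) + c(-1, 1) * qt(0.995, df) * se

c(t = t_stat, df = df, p_value = p_value)
ci
```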

Since the null hypothesis was rejected (due to the p-value being below 0.01), there is strong enough statistical evidence to say that there is a difference in carbon dioxide uptake between “chilled” and “nonchilled” plants.

ggplot(CO2, aes(x = Treatment, y = uptake, fill = Treatment)) +
  geom_boxplot(color = "black", alpha = 0.7) +
  labs(
    title = "CO2 Uptake by Treatment Type",
    x = "Treatment Type",
    y = "CO2 Uptake Rate"
  ) +
  scale_fill_manual(values = c("chilled" = "lightblue", "nonchilled" = "lightgreen"))

The box-and-whisker plot above shows that the nonchilled plants generally have higher CO₂ uptake than the chilled plants. The median uptake for nonchilled plants, 35.8, is noticeably greater than the median uptake for chilled plants, 21.0, and the overall distribution is shifted upward. This supports the hypothesis test result indicating a significant difference between the two treatments and suggests that chilling may reduce CO₂ uptake. Additionally, the spread (interquartile range) for nonchilled plants is somewhat narrower, suggesting more consistent uptake levels, whereas the chilled plants show lower and more variable uptake values. Overall, the visualization reinforces the statistical conclusion that there is a difference in CO₂ uptake levels between chilled and nonchilled plants.
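The medians and spreads read off the plot can be checked numerically. The sketch below computes the median and interquartile range for each treatment; note that `IQR()` uses R's default quantile method, so it may differ slightly from the box edges drawn in a box plot.

```r
# Median and IQR of uptake for each treatment group.
uptake_summary <- sapply(
  split(CO2$uptake, CO2$Treatment),
  function(x) c(median = median(x), IQR = IQR(x))
)

uptake_summary   # rows: median, IQR; columns: nonchilled, chilled
```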

Problem 2.

It has been observed that caffeine intake could have an effect on the ability to play video games. A curious Mario Kart enthusiast wanted to know if caffeine made players better (or worse). They enrolled a random sample of n = 30 other Mario Kart enthusiasts, all with similar experience levels, and asked them to play a round before having a caffeinated drink and then play another round an hour after having a caffeinated drink, recording the race times (in seconds) for each round. Further, they had the participants choose to either have a “Coffee” or an “Energy Drink” as their caffeinated beverage of choice.

  (a) Conduct a two sample hypothesis test looking at the difference in racing time before and after players had a caffeinated drink.

Null Hypothesis: μd=0

Alternative Hypothesis: μd≠0

α = 0.05

Conditions:

Paired Data Assumption: Racing times are paired by Mario Kart enthusiast.

Independence Assumption: It was stated that the sample was randomized and 30 Mario Kart enthusiasts is less than 10% of all Mario Kart enthusiasts.

Normal Population Assumption: We examine the normal probability plot of the differences in racing time before and after drinking a caffeinated drink; if the plot appears roughly linear, the differences are nearly normally distributed.

mariokart<-read.csv("mariokart.csv")
mariokart$Diff<-mariokart$Time_Before - mariokart$Time_After
ggplot(mariokart, aes(sample=Diff)) +
  stat_qq() +
  stat_qq_line() +
  labs(
    title = "Normal Probability Plot of Racing Time Differences (Before − After)",
    x = "Theoretical Quantiles",
    y = "Sample Quantiles"
  )

Since the Normal Probability Plot of Racing Time Differences appears roughly linear we can assume that the distribution of the differences in racing times is nearly normally distributed.

It is ok to use the Student’s t-model with 29 degrees of freedom and a Matched Pairs t-Test.

with(mariokart, t.test(Time_Before,Time_After,conf.level=0.95,paired=TRUE,alternative="two.sided",mu=0))
## 
##  Paired t-test
## 
## data:  Time_Before and Time_After
## t = 8.6977, df = 29, p-value = 1.415e-09
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  5.662463 9.144203
## sample estimates:
## mean difference 
##        7.403333

Since the p-value of 1.415e-09 is below the significance level of 0.05, we reject the null hypothesis.
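A matched pairs t-test is the same as a one-sample t-test on the per-player differences. The sketch below illustrates this on simulated data, since `mariokart.csv` is not reproduced here; the data frame `sim` and its values are hypothetical and only mirror the column names used above.

```r
set.seed(1)  # hypothetical data, for illustration only
n <- 30
sim <- data.frame(Time_Before = rnorm(n, mean = 100, sd = 10))
sim$Time_After <- sim$Time_Before - rnorm(n, mean = 7, sd = 5)
sim$Diff <- sim$Time_Before - sim$Time_After

# Paired test on the two columns, and one-sample test on the differences.
paired_fit <- t.test(sim$Time_Before, sim$Time_After, paired = TRUE)
diff_fit   <- t.test(sim$Diff, mu = 0)

# Same statistic by hand: mean difference divided by its standard error.
t_manual <- mean(sim$Diff) / (sd(sim$Diff) / sqrt(n))

c(paired = unname(paired_fit$statistic),
  one_sample = unname(diff_fit$statistic),
  manual = t_manual)
```

All three values agree, and the degrees of freedom are n − 1 = 29 in both tests.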

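  (b) Interpret the results from part (a) relative to the problem scenario.

Since the null hypothesis was rejected (due to the p-value being below 0.05), there is strong statistical evidence of a difference in mean race times before and after having a caffeinated drink. Because the differences were computed as Before − After and the sample mean difference is about 7.40 seconds, players finished faster on average after the caffeinated drink.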

  (c) Conduct an additional two sample hypothesis test looking at the difference in racing time between the types of caffeinated drinks.

Null Hypothesis: μ1-μ2=0

Alternative Hypothesis: μ1-μ2≠0

α = 0.05

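Where μ1 = the true mean change in racing time (Before − After) for players who had a cup of coffee and μ2 = the true mean change in racing time (Before − After) for players who had an energy drink. This ordering matches the order in which R reports the two groups.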

Conditions:

Independence Assumption: The experiment was stated to be randomized, and we assume that each Mario Kart enthusiast’s difference in race time is independent of the others and that the two drink groups are independent of each other.

Normal Population Assumption: By examining the normal probability plots of the difference in racing time for each caffeinated drink we can evaluate whether they are nearly normally distributed if the plots appear roughly linear.

ggplot(mariokart, aes(sample = Diff)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~ Type) +
  labs(
    title = "Normal Probability Plots of Difference in Racing Times in Seconds by Type of Caffeinated Drink Consumed",
    x = "Theoretical Quantiles",
    y = "Sample Quantiles"
  )

Since both normal probability plots appear roughly linear we can assume that the distribution of the differences in racing times for both drinks is nearly normally distributed.

Finally, before we can conduct our hypothesis test, we must determine whether the two populations have equal variances and thus whether the standard error can be pooled. To do this we will conduct an F-test in which the null hypothesis states that the ratio of the population variances is 1 (the variances are equal), and the alternative hypothesis states that the ratio is not equal to 1 (the variances are unequal). The F-test will be conducted against a significance level of α = 0.05.

with(mariokart, var.test(Diff~Type))
## 
##  F test to compare two variances
## 
## data:  Diff by Type
## F = 2.5058, num df = 19, denom df = 9, p-value = 0.1595
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.6803101 7.2168693
## sample estimates:
## ratio of variances 
##           2.505812

Since the F-test p-value of 0.1595 is above the significance level of 0.05, we fail to reject the F-test null hypothesis. There is no evidence that the population variances differ, so we will treat them as equal and pool the standard error.

It is OK to use the Student’s t-model with 20 + 10 − 2 = 28 degrees of freedom and a pooled Two-Sample t-Test for Means.

# The two drink groups are independent, so the default (unpaired) test applies.
with(mariokart, t.test(Diff ~ Type, alternative = "two.sided", conf.level = 0.95, mu = 0, var.equal = TRUE))
## 
##  Two Sample t-test
## 
## data:  Diff by Type
## t = -0.52222, df = 28, p-value = 0.6056
## alternative hypothesis: true difference in means between group Coffee and group Energy Drink is not equal to 0
## 95 percent confidence interval:
##  -4.70095  2.79095
## sample estimates:
##       mean in group Coffee mean in group Energy Drink 
##                      7.085                      8.040

Since the p-value of 0.6056 is above the significance level of 0.05, we fail to reject the null hypothesis.
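Because the two drink groups have unequal sizes (20 and 10), one optional sensitivity check is to compare the pooled test with Welch's test, which does not assume equal variances. The sketch below uses simulated data; the data frame `sim` and its values are hypothetical, and swapping in `mariokart` would run the same comparison on the real data.

```r
set.seed(3)  # hypothetical data, for illustration only
sim <- data.frame(
  Type = rep(c("Coffee", "Energy Drink"), times = c(20, 10)),
  Diff = c(rnorm(20, mean = 7, sd = 5), rnorm(10, mean = 8, sd = 3))
)

pooled <- t.test(Diff ~ Type, data = sim, var.equal = TRUE)   # pooled standard error
welch  <- t.test(Diff ~ Type, data = sim, var.equal = FALSE)  # Welch's approximation

# Pooled df is n1 + n2 - 2; Welch's df is estimated and never larger.
c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter))
c(pooled_p = pooled$p.value, welch_p = welch$p.value)
```

If both versions lead to the same decision, the conclusion does not hinge on the equal variance assumption.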

  (d) Interpret the results from part (c) relative to the problem scenario.

Since we failed to reject the null hypothesis, there is not strong enough statistical evidence to say that the mean change in racing time (Before − After) differs between Mario Kart enthusiasts who drank energy drinks and those who drank coffee.

  (e) Generate two separate plots which visualize and support the results of the hypothesis tests in parts (a) and (c) using different colors and appropriate labels on the axes. Discuss how each supports the respective hypothesis testing results.

boxplot(mariokart$Diff, horizontal=TRUE,
        main = "Distribution of Differences in Racing Times (Before - After)",
        xlab = "Change in Time (seconds)",
        col = "lightblue")

The box plot above shows that the distribution of differences in racing times lies almost completely above 0 seconds, with a median of 7.65 seconds and an IQR of 6.3 seconds; the middle 50% of the data lies between 4.45 and 10.75 seconds. Since the majority of the differences are above 0, this supports the result of the hypothesis test in part (a): there is likely a difference in Mario Kart racing times before and after having a caffeinated drink.
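When reading quartiles off a box plot, it helps to know that `boxplot()` draws the box at Tukey's hinges (as returned by `fivenum()`), which can differ slightly from the default `quantile()` values. The sketch below shows both on simulated differences; the vector `diff_sim` is hypothetical and stands in for `mariokart$Diff`.

```r
set.seed(2)  # hypothetical data, for illustration only
diff_sim <- rnorm(30, mean = 7, sd = 5)

hinges <- fivenum(diff_sim)                        # min, lower hinge, median, upper hinge, max
quart  <- quantile(diff_sim, c(0.25, 0.50, 0.75))  # default quantile method (type 7)
box    <- boxplot.stats(diff_sim)$stats            # values used to draw the box and whiskers

hinges[2:4]   # box edges and median as drawn by boxplot()
quart         # quartiles as reported by quantile() and used by IQR()
```

With 30 observations the lower hinge is the 8th smallest value, while `quantile()` interpolates between the 8th and 9th, so the two summaries agree on the median but can differ a little at the box edges.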

boxplot(Diff ~ Type, data = mariokart,
        main = "Difference in Racing Time by Caffeinated Drink Type",
        horizontal = TRUE,
        xlab = "Change in Time (seconds)",
        ylab = "Drink Type",
        col = c("tan", "lightgreen"))

As seen in the side-by-side box plots above, there is very strong overlap between the distributions of differences in racing times for Mario Kart enthusiasts who drank coffee and those who drank an energy drink. This supports the results of the hypothesis test from part (c), which found that there was not strong enough statistical evidence to claim a difference in improvement between the two drink groups.

  (f) Based on the results of both hypothesis tests, what conclusions can the Mario Kart enthusiast draw about the effect of caffeine and the differences between caffeinated drink types on playing video games?

Caffeine had a clear effect on Mario Kart performance. The paired t-test showed a significant difference in racing times after participants consumed a caffeinated drink, and the box plot of differences confirms this: most participants improved, with a median improvement of 7.65 seconds. This indicates that, in general, caffeine may help players reduce their race times.

However, there was no significant difference between the two types of caffeinated drinks. The two-sample t-test and the side-by-side box plots show that improvements for Coffee and Energy Drink participants were very similar, with strong overlap in their distributions. This suggests that the type of caffeinated drink does not significantly affect racing performance; both drinks appear to have a similar effect.