For all of the applicable problems, plots must be generated using the functions within the ggplot2 package.

Problem 1. Due to the constraint in size and space, laptops tend to have a slower processing speed than desktops, but, depending on the task that needs to be run, this may or may not be an issue. Data was collected on the data processing time (seconds) for a machine learning model between 10 randomly selected Dell Inspiron laptops, each with an Intel Core i5-1135G7, and 10 randomly selected Dell XPS desktops, each with an Intel Core i7-12700K, recording the data in the attached “processing.csv” file. Against a significance level of α = 0.05, those running the model want to determine if there is a noticeable difference in the processing time between the laptops and desktops, particularly if the laptops are slower than the desktops.

  (a) Conduct a two-sample hypothesis test looking at the difference in processing time (sec) between laptops and desktops, using the appropriate parameters in the “t.test()” function.

Before running a t-test, we need to determine whether the two groups, “Desktop” and “Laptop”, have equal variances. To do this, we run an F-test at a significance level of α = 0.05. The null hypothesis states that the true ratio of the group variances equals 1 (the variances are equal), and the alternative hypothesis states that the true ratio is not equal to 1.

var.test(Time ~ Computer, data = processing)
## 
##  F test to compare two variances
## 
## data:  Time by Computer
## F = 1.1786, num df = 9, denom df = 9, p-value = 0.8107
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.2927405 4.7449217
## sample estimates:
## ratio of variances 
##           1.178571

Since the F-test p-value of 0.8107 is above the significance level of 0.05, we fail to reject the null hypothesis. There is no significant evidence that the variances of the two groups differ, so we treat them as equal in the t-test (var.equal = TRUE).
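As a cross-check, the F-test p-value and confidence interval can be reproduced from the reported variance ratio and degrees of freedom alone. The sketch below assumes only the values printed in the var.test() output; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the var.test() output above
f_ratio <- 1.178571   # sample ratio of variances
df_num  <- 9          # numerator degrees of freedom
df_den  <- 9          # denominator degrees of freedom

# Two-sided p-value: double the smaller of the two tail probabilities
lower_tail <- pf(f_ratio, df_num, df_den)
f_p_value  <- 2 * min(lower_tail, 1 - lower_tail)

# 95% confidence interval for the true ratio of variances
f_ci <- c(f_ratio / qf(0.975, df_num, df_den),
          f_ratio / qf(0.025, df_num, df_den))

print(f_p_value)
print(f_ci)
```

Because the interval contains 1, it leads to the same conclusion as the p-value.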

Now we can continue with the t-test. Those running the model want to determine whether there is a noticeable difference in processing time between the laptops and desktops, in particular whether the laptops are slower than the desktops. Slower laptops would mean longer laptop processing times, so the test is one-sided, with the following null and alternative hypotheses:

Null Hypothesis: μDesktop − μLaptop = 0

Alternative Hypothesis: μDesktop − μLaptop < 0

α = 0.05

t.test(Time ~ Computer, data = processing,
       alternative = "less", var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  Time by Computer
## t = -10.23, df = 18, p-value = 3.142e-09
## alternative hypothesis: true difference in means between group Desktop and group Laptop is less than 0
## 95 percent confidence interval:
##       -Inf -12.70662
## sample estimates:
## mean in group Desktop  mean in group Laptop 
##                  55.1                  70.4

Since the p-value of 3.142e-09 is far below the significance level of α = 0.05, we reject the null hypothesis.
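The reported statistics are internally consistent, which the following sketch checks using only the numbers printed in the t.test() output (group means, t statistic, degrees of freedom). The standard error is backed out from the rounded t statistic, so the results agree with the output only approximately; all variable names are illustrative.

```r
# Values copied from the t.test() output above
mean_desktop <- 55.1
mean_laptop  <- 70.4
t_stat       <- -10.23   # rounded in the printed output
df_t         <- 18       # 10 + 10 - 2

# Difference in means (Desktop - Laptop) and the implied standard error
mean_diff <- mean_desktop - mean_laptop
std_error <- mean_diff / t_stat

# One-sided (lower-tail) p-value for the alternative "less"
t_p_value <- pt(t_stat, df_t)

# Upper bound of the one-sided 95% confidence interval
ci_upper <- mean_diff + qt(0.95, df_t) * std_error

print(t_p_value)
print(ci_upper)
```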

  (b) Interpret the results from part (a) relative to the problem scenario.

Since we rejected the null hypothesis in part (a), there is strong statistical evidence that laptops have a larger mean processing time than desktops (sample means of 70.4 and 55.1 seconds, respectively) and are thus slower at running the model.

  (c) Generate a plot which visualizes and supports the results of the hypothesis test in part (a) using colors and appropriate labels on the axes. Discuss how it supports the hypothesis testing results.

As seen in the box plots below, the two sample distributions do not overlap: the entire distribution of laptop processing times lies above the distribution of desktop times. This supports both the result of the hypothesis test in part (a) and the conclusion drawn in part (b), since the sample shows laptops taking longer than desktops to run the model.

library(ggplot2)

ggplot(processing, aes(x = Computer, y = Time, fill = Computer)) +
  geom_boxplot(alpha = 0.75) +
  scale_fill_brewer(palette = "Blues") +
  labs(title = "Boxplot Comparing Processing Times of Laptops and Desktops",
       x = "Computer Type",
       y = "Processing Time (seconds)") +
  coord_flip()

Problem 2. First, take a few minutes to look through the components of the built-in R data set “chickwts,” which contains the results of an experiment conducted to measure the effectiveness of 6 different feed supplements on the growth rate (measured by weight) of chickens. Then complete the following:

  (a) Using “ggplot()”, generate a plot to visually compare the differences in chicken weight between feed types, using different colors to distinguish the groups and appropriate labels on the axes. What preliminary observations can you make about the differences in weight between feed types?

chickwts <- chickwts  # copy the built-in data set into the workspace

ggplot(chickwts, aes(x = weight, y = feed, fill = feed)) +
  geom_boxplot(alpha = 0.75) +
  scale_fill_brewer(palette = "Oranges") +
  labs(title = "The Effectiveness of Feed Supplements on the Growth Rate of Chickens",
       x = "Chick Weight",
       y = "Feed Supplement")

The boxplots indicate general differences in chicken weight among the feed types. The casein and sunflower feeds appear to produce higher median weights than the other groups, while the horsebean feed shows the lowest median weight. The remaining feed types (linseed, soybean, and meatmeal) fall within an intermediate range. Considerable overlap among the distributions suggests that, although some differences in chick growth are visible, the spread of weights within each feed type is fairly broad.

  (b) Check and interpret the following one-way ANOVA assumptions:

    • Normality using both the Normality QQ Plot and Shapiro-Wilk Test, remembering to check this assumption for each feed type.

    • Equal Variances using Bartlett’s Test or Levene’s Test - choose which is best!

We evaluate normality first, because the result determines which test we use for equality of variances. If no group shows significant evidence against normality, we use Bartlett’s Test, which is more powerful for normally distributed data. If at least one group does not appear normally distributed, we instead use Levene’s Test: it is less powerful, but it is robust to non-normality and can still evaluate the equality of variances among the feed groups.

ggplot(chickwts, aes(sample = weight, colour = feed, shape = feed)) +
  geom_qq() +
  geom_qq_line() +
  labs(title = "Normal Q–Q Plot of Chicken Weights by Feed Type",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles") +
  scale_colour_brewer(palette = "Set2")
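With six groups overlaid on one panel, individual departures from the reference lines can be hard to see. As an optional alternative, the sketch below draws one Q-Q panel per feed type using facet_wrap(); the object name qq_by_feed and the plot title are illustrative choices, not part of the original analysis.

```r
library(ggplot2)

# One Q-Q panel per feed type, each with its own reference line
qq_by_feed <- ggplot(chickwts, aes(sample = weight, colour = feed)) +
  geom_qq() +
  geom_qq_line() +
  facet_wrap(~ feed) +
  labs(title = "Normal Q-Q Plot of Chicken Weights, One Panel per Feed Type",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles") +
  scale_colour_brewer(palette = "Set2") +
  theme(legend.position = "none")  # panel titles already name the feed
```

Calling print(qq_by_feed) renders the plot.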

with(chickwts, shapiro.test(weight[feed == "horsebean"]))
## 
##  Shapiro-Wilk normality test
## 
## data:  weight[feed == "horsebean"]
## W = 0.93758, p-value = 0.5264
with(chickwts, shapiro.test(weight[feed == "linseed"]))
## 
##  Shapiro-Wilk normality test
## 
## data:  weight[feed == "linseed"]
## W = 0.96931, p-value = 0.9035
with(chickwts, shapiro.test(weight[feed == "soybean"]))
## 
##  Shapiro-Wilk normality test
## 
## data:  weight[feed == "soybean"]
## W = 0.9464, p-value = 0.5064
with(chickwts, shapiro.test(weight[feed == "sunflower"]))
## 
##  Shapiro-Wilk normality test
## 
## data:  weight[feed == "sunflower"]
## W = 0.92809, p-value = 0.3603
with(chickwts, shapiro.test(weight[feed == "meatmeal"]))
## 
##  Shapiro-Wilk normality test
## 
## data:  weight[feed == "meatmeal"]
## W = 0.97914, p-value = 0.9612
with(chickwts, shapiro.test(weight[feed == "casein"]))
## 
##  Shapiro-Wilk normality test
## 
## data:  weight[feed == "casein"]
## W = 0.91663, p-value = 0.2592

All six feed groups produced Shapiro-Wilk p-values greater than the significance level of 0.05 (horsebean: 0.5264, linseed: 0.9035, soybean: 0.5064, sunflower: 0.3603, meatmeal: 0.9612, casein: 0.2592), so the null hypothesis of normality cannot be rejected for any group. This does not prove normality, but it means there is no significant evidence against it, so treating the chicken weights within each feed type as approximately normal is reasonable. The normal Q-Q plots support this: within each group, the points fall roughly along a straight reference line.
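The six separate shapiro.test() calls can also be collapsed into a single pass over the groups. The following is a minimal sketch using base R's split() and sapply(); the names weights_by_feed and shapiro_p are illustrative.

```r
# Split the weights by feed type and run Shapiro-Wilk on each group
weights_by_feed <- split(chickwts$weight, chickwts$feed)
shapiro_p <- sapply(weights_by_feed, function(w) shapiro.test(w)$p.value)

# One p-value per feed type, rounded for display
print(round(shapiro_p, 4))

# TRUE only if no group rejects normality at the 0.05 level
print(all(shapiro_p > 0.05))
```

This keeps the p-values in one named vector, which makes it easier to compare groups or to reuse the results later.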

Since no group showed significant evidence against normality, we can use Bartlett’s Test to assess the equality of the variances among the groups.

bartlett.test(weight ~ feed, data = chickwts)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  weight by feed
## Bartlett's K-squared = 3.2597, df = 5, p-value = 0.66

Since the p-value of 0.66 is greater than the significance level of 0.05, we fail to reject the null hypothesis of Bartlett’s Test, which states that the group variances are equal. There is no significant evidence that the variances differ, so the equal-variance assumption is reasonable.
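Had any group failed the normality check, Levene’s Test would have been the fallback. For comparison, here is a minimal base R sketch of its median-centered variant (often called the Brown-Forsythe test), which amounts to a one-way ANOVA on the absolute deviations from each group's median. It is an illustration rather than part of the original analysis, and the variable names are hypothetical; leveneTest() in the car package is a packaged alternative.

```r
# Absolute deviation of each weight from its own group's median
abs_dev <- abs(chickwts$weight -
               ave(chickwts$weight, chickwts$feed, FUN = median))

# One-way ANOVA on the deviations: a large F suggests unequal spread
levene_fit <- anova(lm(abs_dev ~ chickwts$feed))
levene_p   <- levene_fit[["Pr(>F)"]][1]

print(levene_fit)
```

A p-value above 0.05 here would point to the same conclusion as Bartlett’s Test.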

  (c) Conduct and interpret, against a significance level of α = 0.05, a one-way ANOVA to evaluate the differences between the feed types on the weight of chickens, then answer the following questions.

Null hypothesis: The mean chicken weight is the same across all feed types.

Alternative hypothesis: At least one feed type has a different mean weight.

α = 0.05

chick_aov <- aov(weight ~ feed, data = chickwts)
summary(chick_aov)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## feed         5 231129   46226   15.37 5.94e-10 ***
## Residuals   65 195556    3009                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value of 5.94e-10 is much less than the significance level of 0.05, we reject the null hypothesis. There is strong statistical evidence that at least one feed type has a different mean weight.
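To show where the numbers in the ANOVA table come from, the sketch below rebuilds the sums of squares, F statistic, and p-value directly from the data. It assumes only the built-in chickwts data set; the variable names are illustrative.

```r
weight <- chickwts$weight
feed   <- chickwts$feed

group_mean <- ave(weight, feed)   # each observation's own group mean
grand_mean <- mean(weight)

# Between-group (feed) and within-group (residual) sums of squares
ss_between <- sum((group_mean - grand_mean)^2)
ss_within  <- sum((weight - group_mean)^2)

# Degrees of freedom: groups - 1 and observations - groups
df_between <- nlevels(feed) - 1
df_within  <- length(weight) - nlevels(feed)

# F is the ratio of the two mean squares; the p-value is its upper tail
f_value <- (ss_between / df_between) / (ss_within / df_within)
anova_p <- pf(f_value, df_between, df_within, lower.tail = FALSE)

print(c(ss_between = ss_between, ss_within = ss_within, F = f_value))
print(anova_p)
```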

1) Does your plot in part (a) support this result? Why or why not? Explain your reasoning

Yes, the plot in part (a) supports the results of the ANOVA. The plot suggests that the casein and sunflower feeds produce higher median weights than the other groups, while the horsebean feed shows the lowest median weight. This is consistent with the ANOVA finding that at least one group mean differs significantly from the others.

2) Based on your assumption checks in part (b), is it likely the one-way ANOVA results are valid? Why or why not? Explain your reasoning

The Q-Q plots and the Shapiro-Wilk tests gave no significant evidence against normality in any group, and Bartlett's test of homogeneity of variances gave no significant evidence that the group variances differ, so both ANOVA assumptions are reasonably satisfied. We can therefore treat the data in each group as approximately normally distributed with equal variances, and the results of the one-way ANOVA are most likely valid. Had either assumption failed, the validity of the ANOVA could be called into question.

  (d) Conduct and interpret the results of a pairwise comparison between each pair of feed types using Tukey’s HSD test. Is there a feed type that appears to provide the highest amount of weight when compared to the other feed types?

TukeyHSD(chick_aov, conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = weight ~ feed, data = chickwts)
## 
## $feed
##                            diff         lwr       upr     p adj
## horsebean-casein    -163.383333 -232.346876 -94.41979 0.0000000
## linseed-casein      -104.833333 -170.587491 -39.07918 0.0002100
## meatmeal-casein      -46.674242 -113.906207  20.55772 0.3324584
## soybean-casein       -77.154762 -140.517054 -13.79247 0.0083653
## sunflower-casein       5.333333  -60.420825  71.08749 0.9998902
## linseed-horsebean     58.550000  -10.413543 127.51354 0.1413329
## meatmeal-horsebean   116.709091   46.335105 187.08308 0.0001062
## soybean-horsebean     86.228571   19.541684 152.91546 0.0042167
## sunflower-horsebean  168.716667   99.753124 237.68021 0.0000000
## meatmeal-linseed      58.159091   -9.072873 125.39106 0.1276965
## soybean-linseed       27.678571  -35.683721  91.04086 0.7932853
## sunflower-linseed    110.166667   44.412509 175.92082 0.0000884
## soybean-meatmeal     -30.480519  -95.375109  34.41407 0.7391356
## sunflower-meatmeal    52.007576  -15.224388 119.23954 0.2206962
## sunflower-soybean     82.488095   19.125803 145.85039 0.0038845

Based on the Tukey HSD results, 8 of the 15 pairwise comparisons are statistically significant at the 0.05 level. Chickens fed casein had significantly higher mean weights than those fed horsebean (p < 0.0001), linseed (p = 0.0002), and soybean (p = 0.0084). Chickens fed sunflower likewise had significantly higher mean weights than those fed horsebean (p < 0.0001), linseed (p = 0.0001), and soybean (p = 0.0039). Horsebean was also significantly lower than meatmeal (p = 0.0001) and soybean (p = 0.0042). The comparisons among meatmeal, linseed, and soybean all had p-values greater than 0.05, indicating no significant differences among them. Casein and sunflower do not differ significantly from each other (p = 0.9999), and neither differs significantly from meatmeal (p = 0.3325 and p = 0.2207), so no single feed type stands out as the best. Overall, casein and sunflower are associated with the highest chicken weights, while horsebean is associated with the lowest.
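Reading fifteen rows by eye is error-prone, so the significant pairs can also be pulled out of the Tukey table programmatically. The sketch below refits the same model on the built-in chickwts data; the names tukey_tbl, sig_pairs, and feed_means are illustrative.

```r
# Refit the one-way ANOVA and extract the Tukey HSD comparison table
chick_aov <- aov(weight ~ feed, data = chickwts)
tukey_tbl <- TukeyHSD(chick_aov, conf.level = 0.95)$feed

# Pairs whose adjusted p-value falls below the 0.05 level
sig_pairs <- rownames(tukey_tbl)[tukey_tbl[, "p adj"] < 0.05]

# Group means sorted from highest to lowest, for context
feed_means <- sort(tapply(chickwts$weight, chickwts$feed, mean),
                   decreasing = TRUE)

print(sig_pairs)
print(round(feed_means, 1))
```

Listing the sorted means next to the significant pairs makes it easier to judge whether the top-ranked feeds are actually distinguishable from one another.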