Homework Questions: 1. The datasets package (installed in R
by default) contains a dataset called InsectSprays that
shows the results of an experiment with six different kinds of
insecticide. For each kind of insecticide, *n* = 12
observations were conducted. Each observation represented the count of
insects killed by the spray. In this experiment, what is the dependent
variable (outcome) and what is the independent variable? What is the
total number of observations?
# The dependent variable is the count of insects killed. This is the outcome measured in each observation, and it is affected by the type of insecticide used.
# The independent variable is the type of insecticide (the spray factor, which has six levels).
# Since each of the six insecticides was tested 12 times, the total number of observations is 6 * 12 = 72.
# Compute the total number of observations from the number of unique categories of the independent variable and the number of observations per category
# Load the dataset
data(InsectSprays)
# Find the number of unique types of insecticides
num_types <- length(unique(InsectSprays$spray))
# Assuming each type of insecticide has the same number of observations
# Find the number of observations for one type of insecticide
observations_per_type <- sum(InsectSprays$spray == levels(InsectSprays$spray)[1])
# Calculate the total number of observations
total_observations <- num_types * observations_per_type
# Print the results
print(paste("There are", num_types, "types of insecticides."))
## [1] "There are 6 types of insecticides."
print(paste("Each type has", observations_per_type, "observations."))
## [1] "Each type has 12 observations."
print(paste("Total observations =", total_observations))
## [1] "Total observations = 72"
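The calculation above assumes that every spray has the same number of observations. As a cross-check that drops that assumption, here is a minimal sketch that counts the rows directly with base R's table() and nrow(). The names obs_per_spray and total_obs are illustrative and not part of the original answer.

```r
# Load the dataset that ships with the datasets package
data(InsectSprays)

# Count observations for each spray level (no equal-group-size assumption)
obs_per_spray <- table(InsectSprays$spray)
print(obs_per_spray)

# The total number of observations is the number of rows in the data frame
total_obs <- nrow(InsectSprays)
print(total_obs)
```

If the design were unbalanced, table() would show the unequal counts, and nrow() would still give the correct total.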
2. After running the aov() procedure on the
InsectSprays data set, the “Mean Sq” for spray is 533.8 and
the “Mean Sq” for Residuals is 15.4. Which one of these is the between‐groups
variance and which one is the within‐groups variance? Explain
your answers briefly in your own words.
# "Mean Sq" for spray (533.8) is the between-groups variance. It quantifies how much the mean insect counts of the different spray types vary from one another, that is, how much the type of spray contributes to the variability in the counts. A large value, as seen here, suggests that changing the type of spray changes the outcome substantially.
# "Mean Sq" for Residuals (15.4) is the within-groups variance. It measures the variation among observations that received the same spray, which the spray type cannot explain. It reflects the natural variability of the data under identical treatment conditions.
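To make the two "Mean Sq" values concrete, the sketch below rebuilds them by hand from the group means and group variances. It relies on the design being balanced (12 observations per spray), which is what lets the shortcuts n * var(group means) and mean(group variances) work. The variable names are illustrative.

```r
data(InsectSprays)

# Mean and variance of the insect counts within each spray group
group_means <- as.vector(tapply(InsectSprays$count, InsectSprays$spray, mean))
group_vars  <- as.vector(tapply(InsectSprays$count, InsectSprays$spray, var))

# Observations per group (balanced design assumed)
n_per_group <- 12

# Between-groups mean square: spread of the group means, scaled by group size
ms_between <- n_per_group * var(group_means)

# Within-groups mean square: average of the within-group variances
ms_within <- mean(group_vars)

print(round(c(between = ms_between, within = ms_within), 1))
```

With unequal group sizes these shortcuts no longer hold, and the sums of squares would need to be weighted by each group's size.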
3. Calculate an *F*‐ratio by hand or using a
calculator. Given everything you have learned about
*F*‐ratios, what do you think of this one? Hint: If you
had all the information you needed for a Null Hypothesis Significance
Test, would you reject the null? Why or why not?
# F = "Mean Square Between Groups" / "Mean Square Within Groups"
# Given "Mean Square for 'spray'" = 533.8 = Between-Groups Variance
# Given "Mean Square for 'residuals'" = 15.4 = Within-Groups Variance
# F-Ratio = 533.8 / 15.4
# F-Ratio = 34.66
# An F-ratio substantially greater than 1.0 is evidence that at least one of the groups comes from a population with a different mean. Here the F-ratio of 34.66 shows that the variation between the spray types is about 35 times the variation within each spray type.
# Given everything needed for a full Null Hypothesis Significance Test (the between-groups and within-groups degrees of freedom, plus a chosen alpha level), you would reject the null hypothesis, because an F-ratio this large would be very unlikely if all the sprays had the same population mean. The result supports the claim that the sprays are not all equally effective, although it doesn't show which sprays differ.
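The hand calculation can be checked against the F distribution with base R's qf() and pf(). This is a sketch under two assumptions: an alpha level of 0.05, and degrees of freedom taken from the six groups and 72 observations described in the exercise. The variable names are illustrative.

```r
# Mean squares reported in the question
ms_between <- 533.8
ms_within  <- 15.4

# F-ratio: between-groups variance divided by within-groups variance
f_ratio <- ms_between / ms_within

# Degrees of freedom for 6 groups and 72 observations
df_between <- 6 - 1
df_within  <- 72 - 6

# Critical F at alpha = 0.05 (assumed) and the upper-tail p-value
f_critical <- qf(0.95, df_between, df_within)
p_value    <- pf(f_ratio, df_between, df_within, lower.tail = FALSE)

print(round(f_ratio, 2))
print(f_ratio > f_critical)  # TRUE means reject the null at alpha = 0.05
print(p_value)
```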
4. In the InsectSprays example, there are
six groups where each one has *n* = 12 observations.
Calculate the degrees of freedom between groups and the degrees of
freedom within groups. Explain why the sum of these two values adds up
to one less than the total number of observations in the data
set.
# k = number of groups (sprays)
# DFbetween = k - 1
# DFbetween = 6 - 1
# DFbetween = 5
# N = total number of observations
# DFwithin = N - k
# DFwithin = 72 - 6
# DFwithin = 66
# TotalDF = DFbetween + DFwithin
# TotalDF = 5 + 66
# TotalDF = 71
# The total degrees of freedom equal N - 1 = 71 because one degree of freedom is used up in estimating the grand mean: once the grand mean is fixed, only N - 1 of the observations are free to vary. The remaining 71 degrees of freedom are split between the groups (k - 1 = 5) and within the groups (N - k = 66), and (k - 1) + (N - k) = N - 1.
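The same bookkeeping can be done from the data rather than by hand. The sketch below derives k and N from the InsectSprays data frame and compares the within-groups value with what a fitted aov() model reports through df.residual(). The variable names are illustrative.

```r
data(InsectSprays)

# Number of groups and total number of observations, taken from the data
k       <- nlevels(InsectSprays$spray)
n_total <- nrow(InsectSprays)

# Degrees of freedom between and within groups
df_between <- k - 1
df_within  <- n_total - k

# Cross-check the within-groups value against a fitted one-way ANOVA
fit <- aov(count ~ spray, data = InsectSprays)
df_from_model <- df.residual(fit)

print(c(between = df_between, within = df_within, total = n_total - 1))
```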
5. Use R to run the aov() command on
the InsectSprays data set. You will have to specify the
model correctly using the “~” character to separate the
dependent variable from the independent variable. Place the results of
the aov() command into a new object called
insectResults. Run the summary() command on
insectResults and interpret the results briefly in your own
words. As a matter of good practice, you should state the null
hypothesis, the alternative hypothesis, and what the results of the null
hypothesis significance test lead you to conclude.
# Run the ANOVA
insectResults <- aov(count ~ spray, data = InsectSprays)
# Output the summary of the ANOVA
summary(insectResults)
##             Df Sum Sq Mean Sq F value Pr(>F)
## spray        5   2669   533.8    34.7 <2e-16 ***
## Residuals   66   1015    15.4
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Null Hypothesis (H0): The mean insect count is the same for all six sprays, so the sprays do not differ in effectiveness.
# Alternative Hypothesis (HA): At least one spray has a mean insect count that differs from the others.
# The F-value of 34.7 on 5 and 66 degrees of freedom is far greater than 1, and its p-value (< 2e-16) is far below the conventional alpha level of 0.05. The null hypothesis is therefore rejected: differences this large among the group means would be very unlikely under chance alone, so the sprays are not all equally effective. The test doesn't identify which sprays differ.
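To tie the summary table back to the hand calculation, the following sketch pulls the F-value and p-value out of the summary object and confirms that F is the ratio of the two mean squares. It refits the same model so that it runs on its own. The name anova_table is illustrative.

```r
data(InsectSprays)

# Refit the one-way ANOVA so this block is self-contained
insectResults <- aov(count ~ spray, data = InsectSprays)

# summary() on an aov object returns a list; its first element is the ANOVA table
anova_table <- summary(insectResults)[[1]]

# Row 1 is the spray effect, row 2 the residuals
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# F should equal Mean Sq (spray) divided by Mean Sq (Residuals)
f_by_hand <- anova_table[["Mean Sq"]][1] / anova_table[["Mean Sq"]][2]

print(round(c(F = f_value, F_by_hand = f_by_hand), 2))
print(p_value < 0.05)  # TRUE means reject the null at alpha = 0.05
```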
6. Load the BayesFactor package and run the
anovaBF() command on the InsectSprays data set.
You will have to specify the model correctly using the “~”
character to separate the dependent variable from the independent
variable. Produce posterior distributions with the
posterior() command and display the resulting HDIs.
Interpret the results briefly in your own words, including an
interpretation of the BayesFactor produced by the grouping variable. As a
matter of good practice, you should state the two hypotheses that are
being compared. Using the rules of thumb offered by Kass and Raftery
(1995), what is the strength of this result?
# Install BayesFactor if it is not already installed (coda is pulled in as a dependency)
if (!require(BayesFactor)) {
  install.packages("BayesFactor", dependencies = TRUE)
}
## Loading required package: BayesFactor
## Loading required package: coda
## Loading required package: Matrix
## ************
## Welcome to BayesFactor 0.9.12-4.7. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
##
## Type BFManual() to open the manual.
## ************
# Load the BayesFactor package
library(BayesFactor)
library(coda)
# Perform Bayesian ANOVA
bayes_anova <- anovaBF(formula = count ~ spray, data = InsectSprays)
print(bayes_anova)
## Bayes factor analysis
## --------------
## [1] spray : 1.506706e+14 ±0%
##
## Against denominator:
## Intercept only
## ---
## Bayes factor type: BFlinearModel, JZS
# Obtain posterior distributions
posterior_distribution <- posterior(bayes_anova, iterations = 10000)
# Compute and display the HDIs
hdi_values <- HPDinterval(posterior_distribution)
print(hdi_values)
##              lower     upper
## mu        8.615871 10.483629
## spray-A   2.803039  6.834153
## spray-B   3.593197  7.745725
## spray-C  -9.254505 -5.056430
## spray-D  -6.433945 -2.425340
## spray-E  -7.868071 -3.768071
## spray-F   4.857397  9.010973
## sig2     11.163241 22.122971
## g_spray   0.443113  8.889598
## attr(,"Probability")
## [1] 0.95
# The two hypotheses being compared are an intercept-only model (null: all sprays share the same mean count) and a model that includes spray as the grouping variable (alternative: mean counts differ among sprays).
# The Bayes factor of 1.51e+14 gives odds of about 151 trillion to 1 in favor of the spray model over the intercept-only model, which is overwhelming evidence for the alternative hypothesis.
# The 95% HDIs for the spray effects are expressed as deviations from the grand mean. Sprays A, B, and F have intervals entirely above zero (counts above the grand mean), while sprays C, D, and E have intervals entirely below zero (counts below the grand mean). None of the six intervals overlaps zero.
# The HDI for the grand mean (mu) runs from about 8.62 to 10.48, and the HDI for the within-group variance (sig2) runs from about 11.16 to 22.12. The g_spray parameter (0.44 to 8.89) is a scaling parameter on the prior for the spray effects and isn't usually interpreted directly. Because the posterior is sampled by MCMC, these bounds shift slightly from run to run.
# These results can inform the choice among sprays based on their observed efficacy.
# Under the rules of thumb offered by Kass and Raftery (1995), a Bayes factor greater than 150 counts as very strong evidence against the null hypothesis. At roughly 1.5e+14, this result is far beyond that threshold, so the evidence for differences among the sprays is very strong.
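The Kass and Raftery (1995) rules of thumb can be written as a small lookup. The sketch below is illustrative only: classify_bayes_factor() is a hypothetical helper, not part of the BayesFactor package, and the Bayes factor is copied from the printed output above so that the block runs without the package installed.

```r
# Hypothetical helper: label a Bayes factor (in favor of the alternative)
# using the Kass and Raftery (1995) categories. Values below 1 return NA.
classify_bayes_factor <- function(bf) {
  labels <- c("not worth more than a bare mention",
              "positive", "strong", "very strong")
  as.character(cut(bf, breaks = c(1, 3, 20, 150, Inf),
                   labels = labels, include.lowest = TRUE))
}

# Bayes factor for spray vs. intercept only, copied from the output above
bf_spray <- 1.506706e+14

print(classify_bayes_factor(bf_spray))
```

With the package loaded, the numeric value could instead be read from the fitted object with extractBF(bayes_anova)$bf rather than typed in by hand.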