Gil_Raitses

Homework Questions: 1. The datasets package (installed in R by default) contains a dataset called InsectSprays that shows the results of an experiment with six different kinds of insecticide. For each kind of insecticide, *n* = 12 observations were conducted. Each observation represented the count of insects killed by the spray. In this experiment, what is the dependent variable (outcome) and what is the independent variable? What is the total number of observations?

# The dependent variable (outcome) is the count of insects killed, which is measured in each observation. The independent variable is the type of insecticide (the spray factor, with six levels), which is the condition the experiment varies.

#Since each kind of insecticide was tested 12 times and there are six different kinds of insecticides, the total number of observations in the dataset is:

# Compute the total number of observations based on the number of unique categories in an independent variable and the number of observations per category

# Load the dataset
data(InsectSprays)

# Find the number of unique types of insecticides
num_types <- length(unique(InsectSprays$spray))

# Assuming each type of insecticide has the same number of observations
# Find the number of observations for one type of insecticide
observations_per_type <- length(InsectSprays$count[InsectSprays$spray == unique(InsectSprays$spray)[1]])

# Calculate the total number of observations
total_observations <- num_types * observations_per_type

# Print the results
print(paste("There are", num_types, "types of insecticides."))

## [1] "There are 6 types of insecticides."

print(paste("Each type has", observations_per_type, "observations."))

## [1] "Each type has 12 observations."

print(paste("Total observations =", total_observations))

## [1] "Total observations = 72"
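# The product above relies on every spray having the same number of observations. As a minimal cross-check, assuming only base R and the built-in data set, the counts per spray can be tabulated and compared with the number of rows. The names obs_by_spray, is_balanced, and total_from_rows are illustrative and do not appear in the original code.

```r
# Cross-check the computed total against the data itself
data(InsectSprays)

# Observations per spray type (one entry per factor level)
obs_by_spray <- table(InsectSprays$spray)
print(obs_by_spray)

# TRUE when every spray has the same number of observations
is_balanced <- length(unique(as.vector(obs_by_spray))) == 1

# Total observations counted directly from the rows
total_from_rows <- nrow(InsectSprays)

# For a balanced design, groups x observations per group equals the row count
stopifnot(is_balanced,
          nlevels(InsectSprays$spray) * obs_by_spray[[1]] == total_from_rows)
```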

2. After running the aov() procedure on the InsectSprays data set, the “Mean Sq” for spray is 533.8 and the “Mean Sq” for Residuals is 15.4. Which one of these is the between‐groups variance and which one is the within‐groups variance? Explain your answers briefly in your own words.

# "Mean Sq" for spray (533.8) is the between-groups variance. It quantifies how much the mean insect counts of the six spray groups vary around the overall mean, that is, the variation in counts associated with the type of spray. A large value, as seen here, suggests that changing the type of spray changes the outcome substantially.

# "Mean Sq" for Residuals (15.4) is the within-groups variance. It measures how much counts vary among observations that received the same spray, so it is not explained by spray type. It reflects the natural variability of the data under identical treatment conditions.

3. Based on the information in question 2 and your response to that question, calculate an *F*‐ratio by hand or using a calculator. Given everything you have learned about *F*‐ratios, what do you think of this one? Hint: If you had all the information you needed for a Null Hypothesis Significance Test, would you reject the null? Why or why not?

# F = "Mean Square Between Groups" / "Mean Square Within Groups"
# Given "Mean Square for 'spray'" = 533.8 = Between-Groups Variance
# Given "Mean Square for 'residuals'" = 15.4 = Within-Groups Variance
# F-Ratio = 533.8 / 15.4
# F-Ratio = 34.66

# An F-ratio much greater than 1.0 is typically viewed as evidence that at least one of the groups comes from a population with a different mean. Here the F-ratio indicates that the variation between the spray types is large compared to the variation within each spray type: the between-groups variance is about 35 times the within-groups variance, far more than the ratio of roughly 1 expected under the null hypothesis.

# Given the remaining information needed for a full Null Hypothesis Significance Test (the between-groups and within-groups degrees of freedom), an F-ratio this large would lead to rejecting the null hypothesis. The analysis would then support the alternative that at least one spray differs in effectiveness from the others.
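# The hand calculation can be compared against the F distribution directly. The sketch below assumes six groups of twelve observations (so 5 and 66 degrees of freedom) and an alpha level of 0.05. It uses the rounded mean squares from the question, so the F-ratio may differ slightly from what aov() reports. All variable names are illustrative.

```r
# Mean squares as reported in the question (rounded values)
ms_between <- 533.8
ms_within  <- 15.4

# Degrees of freedom, assuming k = 6 groups and N = 72 observations
k <- 6
N <- 72
df_between <- k - 1
df_within  <- N - k

# F-ratio from the rounded mean squares
f_ratio <- ms_between / ms_within

# Critical value at the assumed alpha, and the upper-tail p-value
alpha <- 0.05
f_critical <- qf(1 - alpha, df_between, df_within)
p_value <- pf(f_ratio, df_between, df_within, lower.tail = FALSE)

print(round(f_ratio, 2))
print(f_critical)
print(p_value)
```

# If f_ratio exceeds f_critical (equivalently, if p_value is below alpha), the null hypothesis is rejected at that alpha level.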

4. Continuing with the InsectSprays example, there are six groups where each one has *n* = 12 observations. Calculate the degrees of freedom between groups and the degrees of freedom within groups. Explain why the sum of these two values adds up to one less than the total number of observations in the data set.

# k = number of groups (sprays)
# DFbetween = k - 1
# DFbetween = 6 - 1
# DFbetween = 5

# N = number of observations
# k = number of groups (sprays)
# DFwithin = N - k
# DFwithin = 72 - 6
# DFwithin = 66


# TotalDF = DFbetween + DFwithin
# TotalDF = 5 + 66
# TotalDF = 71

# The total degrees of freedom equal N - 1 = 71 because one degree of freedom is used up in estimating the overall (grand) mean, which acts as a constraint in ANOVA. The remaining degrees of freedom are split between the groups and within the groups: (k - 1) + (N - k) = N - 1, so 5 + 66 = 71, one less than the 72 observations.
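# The same degrees of freedom can be derived from the data instead of typed in by hand. This is a minimal sketch using base R only. The object name df_fit is illustrative, and df.residual() is used here to read the within-groups degrees of freedom from a fitted model.

```r
data(InsectSprays)

# Number of groups and total number of observations, taken from the data
k <- nlevels(InsectSprays$spray)
N <- nrow(InsectSprays)

# Degrees of freedom from the formulas above
df_between <- k - 1
df_within  <- N - k
df_total   <- N - 1

# Fit the one-way model only to read back its residual degrees of freedom
df_fit <- aov(count ~ spray, data = InsectSprays)

# Both checks must hold: the parts sum to N - 1, and the model agrees
stopifnot(df_between + df_within == df_total,
          df.residual(df_fit) == df_within)

print(c(between = df_between, within = df_within, total = df_total))
```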

5. Use R or R‐Studio to run the aov() command on the InsectSprays data set. You will have to specify the model correctly using the “~” character to separate the dependent variable from the independent variable. Place the results of the aov() command into a new object called insectResults. Run the summary() command on insectResults and interpret the results briefly in your own words. As a matter of good practice, you should state the null hypothesis, the alternative hypothesis, and what the results of the null hypothesis significance test lead you to conclude.

# Running the ANOVA
insectResults <- aov(count ~ spray, data = InsectSprays)

# Output the summary of the ANOVA
summary(insectResults)

##             Df Sum Sq Mean Sq F value Pr(>F)    
## spray        5   2669   533.8    34.7 <2e-16 ***
## Residuals   66   1015    15.4                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Null Hypothesis (H0): The mean number of insects killed is the same for all six sprays; there is no difference in effectiveness among them.

# Alternative Hypothesis (HA): At least one spray has a mean count that differs from the others.

# The F-value (34.7 on 5 and 66 degrees of freedom) is far greater than 1, and the p-value (< 2e-16) is far below the conventional alpha level of 0.05, so the result is statistically significant. The null hypothesis is therefore rejected: differences this large among the group means would be very unlikely if all sprays were equally effective, so the data support the conclusion that not all sprays are equally effective.
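# The F-value and p-value can also be read from the ANOVA table programmatically instead of from the printed summary. The sketch below refits the same model under the illustrative name insect_fit and assumes an alpha level of 0.05. Rows are indexed by position, since the row labels in the summary table are padded with spaces.

```r
data(InsectSprays)

# Refit the one-way ANOVA (same model as insectResults)
insect_fit <- aov(count ~ spray, data = InsectSprays)

# summary() on an aov object returns a list; the first element is the table
anova_table <- summary(insect_fit)[[1]]

# Row 1 is the spray effect, row 2 is the residuals
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# Decision at the assumed alpha level
alpha <- 0.05
reject_null <- p_value < alpha

print(round(f_value, 1))
print(p_value)
print(reject_null)
```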

6. Load the BayesFactor package and run the anovaBF() command on the InsectSprays data set. You will have to specify the model correctly using the “~” character to separate the dependent variable from the independent variable. Produce posterior distributions with the posterior() command and display the resulting HDIs. Interpret the results briefly in your own words, including an interpretation of the BayesFactor produced by the grouping variable. As a matter of good practice, you should state the two hypotheses that are being compared. Using the rules of thumb offered by Kass and Raftery (1995), what is the strength of this result?

# Install BayesFactor if not already installed (coda is loaded along with it as a dependency)
if (!require(BayesFactor)) {
    install.packages("BayesFactor", dependencies = TRUE)
}

## Loading required package: BayesFactor

## Loading required package: coda

## Loading required package: Matrix

## ************
## Welcome to BayesFactor 0.9.12-4.7. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
## 
## Type BFManual() to open the manual.
## ************

# Load the BayesFactor and coda packages
library(BayesFactor)
library(coda)

# Perform Bayesian ANOVA
bayes_anova <- anovaBF(formula = count ~ spray, data = InsectSprays)
print(bayes_anova)

## Bayes factor analysis
## --------------
## [1] spray : 1.506706e+14 ±0%
## 
## Against denominator:
##   Intercept only 
## ---
## Bayes factor type: BFlinearModel, JZS

# Obtain posterior distributions
posterior_distribution <- posterior(bayes_anova, iterations = 10000)

# Compute and display the HDIs
hdi_values <- HPDinterval(posterior_distribution)
print(hdi_values)

##             lower     upper
## mu       8.615871 10.483629
## spray-A  2.803039  6.834153
## spray-B  3.593197  7.745725
## spray-C -9.254505 -5.056430
## spray-D -6.433945 -2.425340
## spray-E -7.868071 -3.768071
## spray-F  4.857397  9.010973
## sig2    11.163241 22.122971
## g_spray  0.443113  8.889598
## attr(,"Probability")
## [1] 0.95
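# The spray-A to spray-F rows are expressed as deviations from the grand mean (mu). For comparison, the sample versions of those deviations can be computed with base R. This is only a sketch for orientation: the posterior intervals need not be centered exactly on these sample values, since the prior may pull the estimates slightly toward zero. The names grand_mean, group_means, and deviations are illustrative.

```r
data(InsectSprays)

# Overall mean count across all observations
grand_mean <- mean(InsectSprays$count)

# Mean count for each spray type
group_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)

# Sample deviation of each spray's mean from the grand mean
deviations <- group_means - grand_mean

print(grand_mean)
print(round(deviations, 2))
```

# Positive deviations correspond to sprays whose HDIs lie above zero, and negative deviations to sprays whose HDIs lie below zero.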

# The two hypotheses being compared are the null (intercept-only) model, in which all sprays share the same mean insect count, and the alternative model, in which mean counts differ by spray. The Bayes factor of 1.51e+14 means the observed data are about 151 trillion times more likely under the spray model than under the intercept-only model.

# The 95% HDIs for the spray parameters are deviations from the grand mean (mu, 8.62 to 10.48). Sprays A, B, and F have intervals entirely above zero, so their counts are credibly above the grand mean; sprays C, D, and E have intervals entirely below zero, so their counts are credibly below it. The residual variance within spray groups (sig2) has an HDI of 11.16 to 22.12. The g_spray parameter (0.44 to 8.89) is a scale hyperparameter of the prior on the spray effects rather than a direct measure of between-spray variance.

# By the rules of thumb of Kass and Raftery (1995), a Bayes factor above 150 is very strong evidence for the alternative. The value here exceeds that threshold by roughly twelve orders of magnitude, so the result falls in the strongest category: the data overwhelmingly favor the model in which the sprays differ in effectiveness.
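# The Kass and Raftery (1995) categories can be written as a small lookup. The function classify_bf below is a hypothetical helper, not part of the BayesFactor package, and it assumes a Bayes factor greater than 1 (evidence in favor of the alternative). The Bayes factor value is copied from the printed anovaBF() output above.

```r
# Hypothetical helper: map a Bayes factor (> 1) to a Kass and Raftery category
classify_bf <- function(bf) {
  as.character(cut(
    bf,
    breaks = c(1, 3, 20, 150, Inf),
    labels = c("not worth more than a bare mention",
               "positive",
               "strong",
               "very strong"),
    right = TRUE
  ))
}

# Bayes factor for spray against the intercept-only model (from the output above)
bf_spray <- 1.506706e+14

print(classify_bf(bf_spray))

# How many times larger the Bayes factor is than the top threshold of 150
print(bf_spray / 150)
```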

Gil_Raitses_HW6

Gil Raitses

2024-05-09