# Two-way ANOVA test using the basketball data set.
# Store the data in the variable mydata1
mydata1 <- read.csv('basketball.csv')
# Research Question: Is there any relationship between the number of basketball
# shots made and the time of day and shoes worn?
# Generate a frequency table. If the counts in all cells are the same, the design is balanced.
table(mydata1$Time, mydata1$Shoes)
##
##           Favorite Others
##   Morning        4      4
##   Night          4      4
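# A minimal sketch of checking balance in code rather than by eye.
# The names cell_counts and is_balanced are illustrative and not part of the
# original analysis.
cell_counts <- table(mydata1$Time, mydata1$Shoes)
# The design is balanced when every cell holds the same number of observations.
is_balanced <- length(unique(as.vector(cell_counts))) == 1
is_balanced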
# Visualize the data
boxplot(Made ~ Time * Shoes, mydata1)

interaction.plot(mydata1$Shoes, mydata1$Time, mydata1$Made)
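# As a numeric companion to the plots, the cell means can be tabulated with
# tapply(). A sketch, where cell_means is an illustrative name. Roughly parallel
# lines in the interaction plot correspond to a similar difference between the
# Shoes columns in each Time row of this table.
cell_means <- tapply(mydata1$Made, list(mydata1$Time, mydata1$Shoes), mean)
cell_means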

# Compute the two-way ANOVA test. We begin with a model that includes the
# interaction. If the interaction is not significant, use the additive model.
aovres1 <- aov(Made ~ Time * Shoes, mydata1)
summary(aovres1)
##             Df Sum Sq Mean Sq F value Pr(>F)
## Time         1   7.56    7.56   0.344  0.568
## Shoes        1  39.06   39.06   1.777  0.207
## Time:Shoes   1  18.06   18.06   0.822  0.382
## Residuals   12 263.75   21.98
# Interpret results:
# From the ANOVA results, using the p-values and a significance level of 0.05:
# 1. The p-value of Time is 0.568 > 0.05, so the levels of Time are not
# associated with significantly different numbers of shots made.
# We cannot conclude that a significant difference exists.
# 2. The p-value of Shoes is 0.207 > 0.05, so the type of Shoes is not
# associated with significantly different numbers of shots made.
# We cannot conclude that a significant difference exists.
# 3. The p-value for the Time:Shoes interaction is 0.382 > 0.05, so there is no
# evidence that the relationship between shoe type and shots made depends on
# the time of day.
# Because the ANOVA test is not significant for either main effect (Time and
# Shoes), we are done and there is no need to compute the Tukey test.
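# Since the interaction is not significant, the additive model mentioned in the
# model-fitting step can be fitted as follows. A sketch: aovres_add is an
# illustrative name, and its output is not shown because it was not part of
# the original run.
aovres_add <- aov(Made ~ Time + Shoes, mydata1)
summary(aovres_add)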
# 1. Check homogeneity of variance assumption
# 1.1 Residuals vs. fitted plot
# Residuals should "bounce randomly" around the 0 line with roughly equal spread
# at every fitted value, which suggests that the equal-variance assumption is reasonable.
# Three points are flagged as outliers: observation 14 (residual near 5),
# observation 1 (residual near -5) and observation 6 (residual near -10).
# If outliers exist, it can be useful to investigate them, and possibly remove
# them, to meet the test assumptions.
plot(aovres1, 1)
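# To look at the flagged observations before deciding what to do with them,
# the rows and their residuals can be pulled out directly. A sketch that assumes
# the labels 14, 1 and 6 in the plot are row numbers of mydata1; flagged is an
# illustrative name. Note that plot() labels the three most extreme residuals
# by default (id.n = 3), so a label alone does not make a point an outlier.
flagged <- c(1, 6, 14)
cbind(mydata1[flagged, ], residual = residuals(aovres1)[flagged])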

# 1.2 Levene's test
# If the p-value > significance level, we can assume homogeneity of variances
# across the treatment groups.
library(car)
## Loading required package: carData
leveneTest(Made ~ Time * Shoes, mydata1)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  3  0.6851 0.5781
##       12
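# The decision rule for Levene's test can be applied in code by reading the
# p-value off the test result. A sketch: leveneTest() returns an anova-style
# data frame, so the "Pr(>F)" column of the first row holds the p-value.
# The names lev, lev_p and alpha are illustrative.
library(car)  # already loaded above; repeated so the snippet stands alone
lev <- leveneTest(Made ~ Time * Shoes, mydata1)
alpha <- 0.05
lev_p <- lev[1, "Pr(>F)"]
lev_p > alpha  # TRUE means no evidence against equal variances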
# 2. Check normality assumptions
# 2.1 Normality plot of the residuals
# Quantiles of residuals are plotted against quantiles of normal distribution
# along with a 45-degree reference line.
# Verify assumption that residuals are normally distributed.
# The normal probability plot of the residuals should approximately follow a
# straight line, but observation 14 is an outlier that falls below the line.
plot(aovres1, 2)

# Histogram of the residuals; the distribution appears left-skewed.
aov_residuals <- residuals(aovres1)
hist(aov_residuals)
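# A formal test can complement the visual checks. This is an addition to the
# original analysis: shapiro.test() from base R runs the Shapiro-Wilk
# normality test on the residuals. A p-value above 0.05 would mean no
# significant departure from normality is detected. sw is an illustrative name.
sw <- shapiro.test(aov_residuals)
sw$p.value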
