
# Two-way ANOVA test using the basketball data set.

# Store the data in the variable mydata1
mydata1  <- read.csv('basketball.csv')

# Research question: Is the number of basketball shots made related to the
# time of day and the shoes worn?

# Generate a frequency table. If all cells have the same count, the design is balanced.
table(mydata1$Time, mydata1$Shoes)

##          
##           Favorite Others
##   Morning        4      4
##   Night          4      4
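
# A minimal sketch of checking balance in code rather than by eye. The helper
# name is_balanced is hypothetical (not part of the original lab), and the
# counts are copied from the frequency table printed above.
is_balanced <- function(tab) {
  # A design is balanced when every cell holds the same number of observations
  length(unique(as.vector(tab))) == 1
}
cell_counts <- matrix(c(4, 4, 4, 4), nrow = 2,
                      dimnames = list(c("Morning", "Night"), c("Favorite", "Others")))
is_balanced(cell_counts)
# With the data loaded, the same helper can be applied to the table directly
if (exists("mydata1")) is_balanced(table(mydata1$Time, mydata1$Shoes))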

# Visualize the data
boxplot(Made ~ Time * Shoes, mydata1)

interaction.plot(mydata1$Shoes, mydata1$Time, mydata1$Made)

# Compute the two-way ANOVA. We begin with a model that includes the interaction.
# If the interaction is not significant, an additive model can be used instead.
aovres1 <- aov(Made ~ Time * Shoes, mydata1)
summary(aovres1)

##             Df Sum Sq Mean Sq F value Pr(>F)
## Time         1   7.56    7.56   0.344  0.568
## Shoes        1  39.06   39.06   1.777  0.207
## Time:Shoes   1  18.06   18.06   0.822  0.382
## Residuals   12 263.75   21.98
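
# A worked sketch of where the F values and p-values in the table come from:
# each F is the effect mean square divided by the residual mean square, and the
# p-value is the upper tail of the F distribution. The numbers are copied from
# the printed table (so they carry its rounding); variable names are illustrative.
ss_effect <- c(Time = 7.56, Shoes = 39.06, `Time:Shoes` = 18.06)
df_effect <- c(1, 1, 1)
ss_resid <- 263.75
df_resid <- 12
ms_resid <- ss_resid / df_resid                 # residual mean square
f_values <- (ss_effect / df_effect) / ms_resid  # one F statistic per term
p_values <- pf(f_values, df_effect, df_resid, lower.tail = FALSE)
round(f_values, 3)
round(p_values, 3)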

# Interpret results:
# From the ANOVA results, using a significance level of 0.05:
# 1. The p-value of Time is 0.568 > 0.05, so we cannot conclude that the time of
# day is associated with a significant difference in shots made.

# 2. The p-value of Shoes is 0.207 > 0.05, so we cannot conclude that the type of
# shoes is associated with a significant difference in shots made.

# 3. The p-value for the Time:Shoes interaction is 0.382 > 0.05, so there is no
# evidence that the relationship between shoe type and shots made depends on the
# time of day.

# Neither main effect (Time, Shoes) is significant, so there is no need to
# compute the Tukey test.
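
# A sketch of the additive model mentioned earlier, derived from the printed
# table rather than refit. With aov's sequential sums of squares, dropping the
# interaction leaves the Time and Shoes sums of squares unchanged and moves the
# interaction's sum of squares and degree of freedom into the residuals.
# Variable names are illustrative; inputs carry the rounding of the printed table.
ss_resid_add <- 263.75 + 18.06   # residual SS plus interaction SS
df_resid_add <- 12 + 1           # residual Df plus interaction Df
ms_resid_add <- ss_resid_add / df_resid_add
f_add <- c(Time = 7.56, Shoes = 39.06) / ms_resid_add
p_add <- pf(f_add, 1, df_resid_add, lower.tail = FALSE)
round(f_add, 3)
round(p_add, 3)
# With the data loaded, the direct fit would be the usual additive formula
if (exists("mydata1")) summary(aov(Made ~ Time + Shoes, mydata1))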

# 1. Check homogeneity of variance assumption
# 1.1 Residuals vs. fitted plot
# Residuals should "bounce randomly" around the 0 line with similar spread at each
# fitted value, which suggests that the constant-variance assumption is reasonable.
# The plot labels observations 14 (residual near 5), 1 (residual near -5) and
# 6 (residual near -10) as the most extreme points.
# If outliers exist, it can be useful to remove them to meet test assumptions.
plot(aovres1, 1)

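# 1.2 Levene's test
# If the p-value is greater than the significance level, we can assume homogeneity
# of variances across the treatment groups. In the output below, p = 0.5781 > 0.05,
# so there is no evidence against equal variances.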
library(car)

## Loading required package: carData

leveneTest(Made ~ Time * Shoes, mydata1)

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  3  0.6851 0.5781
##       12
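
# A minimal sketch of what the median-centered Levene's test computes: a one-way
# ANOVA on the absolute deviations of each observation from its group median.
# The function name levene_manual is hypothetical; it is meant as an
# illustration of the idea, not a replacement for car::leveneTest.
levene_manual <- function(y, group) {
  group <- as.factor(group)
  # Absolute deviation of each value from the median of its own group
  abs_dev <- abs(y - ave(y, group, FUN = median))
  # One-way ANOVA on those deviations; its F test is the Levene statistic
  anova(lm(abs_dev ~ group))
}
# With the data loaded, the four Time x Shoes cells form the groups
if (exists("mydata1")) {
  levene_manual(mydata1$Made, interaction(mydata1$Time, mydata1$Shoes))
}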

# 2. Check normality assumptions
# 2.1 Normality plot of the residuals
# Quantiles of residuals are plotted against quantiles of normal distribution
# along with a 45-degree reference line.
# Verify assumption that residuals are normally distributed.
# The normal probability plot of residuals should approximately follow a straight
# line, but observation 14 appears as an outlier below the line.
plot(aovres1, 2)

# Histogram of the residuals; the distribution appears left-skewed.
aov_residuals <- residuals(aovres1)
hist(aov_residuals)
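
# A small sketch for putting a number on the skew seen in a histogram, using the
# moment-based sample skewness (third central moment over the 1.5 power of the
# second). Negative values indicate a longer left tail. The helper name
# sample_skewness is hypothetical and not part of the original lab.
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}
# Applied to the residuals when they are available in the session
if (exists("aov_residuals")) sample_skewness(aov_residuals)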

Lab4-Question2.R

arnabchakraboty

2020-02-11