Continuing with my project, I am now going to explore the Olympic dataset and perform a few ANOVA computations. Let’s ask whether the sport athletes play affects their height. Really, I am asking whether the mean height of athletes is the same across all sports.
\[ \begin{aligned} H_0 &: \mu_1 = \mu_2 = \cdots = \mu_n \\ H_A &: \mu_i \neq \mu_j \text{ for some } i \text{ and } j \end{aligned} \]
a <- aov(Height ~ Sport , data = data)
summary(a)
##                 Df   Sum Sq Mean Sq F value Pr(>F)
## Sport           58  6805266  117332    1497 <2e-16 ***
## Residuals   210886 16533167      78
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 60171 observations deleted due to missingness
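The F value in the table is the ratio of the two mean squares, each of which is a sum of squares divided by its degrees of freedom. Here is a quick check using the numbers above, along with a rough effect size. The effect size is eta squared, which the output does not report and is added here only as an illustration:

\[ \begin{aligned} F &= \frac{6805266/58}{16533167/210886} \approx \frac{117332}{78.4} \approx 1497 \\ \eta^2 &= \frac{6805266}{6805266 + 16533167} \approx 0.29 \end{aligned} \]

So sport accounts for roughly 29% of the variation in height among the athletes with a recorded height.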
With an F value of 1497 and a p-value below 2e-16, there is clearly some difference, so we will reject the null hypothesis. Let’s explore the assumptions of the test.
plot(a,1)
The red line sits right in the middle here, so, eyeballing it, the assumption of equal variances looks okay.
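A formal test can back up the eyeball check. The sketch below uses the Fligner-Killeen test from base R on a small simulated data frame called `toy`. Both `toy` and its values are hypothetical stand-ins, not the Olympic data; the same call should work with `data` in place of `toy`.

```r
# Toy stand-in for the Olympic data: simulated heights, NOT the real dataset
set.seed(1)
toy <- data.frame(
  Sport  = factor(rep(c("Basketball", "Gymnastics", "Swimming"), each = 50)),
  Height = c(rnorm(50, 195, 8), rnorm(50, 163, 8), rnorm(50, 180, 8))
)

# Fligner-Killeen test; H0: every sport has the same variance in height
vt <- fligner.test(Height ~ Sport, data = toy)
print(vt)

# Per-sport standard deviations, as a direct look at the spread
sds <- tapply(toy$Height, toy$Sport, sd)
print(sds)
```

With a sample as large as the real one, even small differences in spread can come out significant, so the residual plot is still worth reading alongside the p-value.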
plot(a,2)
## Warning: not plotting observations with leverage one:
## 76248
The QQ plot also looks acceptable, suggesting the residuals are approximately normal. So I feel confident rejecting the null hypothesis and saying that there is evidence that the mean height of athletes differs between sports.
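library(ggplot2)
# Overall mean height, ignoring missing values
y1 <- mean(data$Height, na.rm = TRUE)
ggplot(data = data, aes(x = Sport, y = Height)) +
  geom_jitter(color = "grey") +                        # raw observations
  stat_summary(fun.data = "mean_se", color = "red") +  # per-sport mean +/- one SE
  geom_hline(yintercept = y1, color = "blue", linetype = "dashed")  # overall mean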
## Warning: Removed 60171 rows containing non-finite values (stat_summary).
## Warning: Removed 60171 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_segment).
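Rejecting the overall null only says that at least one pair of sports differs; it does not say which. One common follow-up is Tukey's HSD, sketched below on a simulated data frame. `toy` and `fit` are hypothetical names and the values are made up, so treat this as an illustration rather than a result from the Olympic data.

```r
# Toy stand-in for the Olympic data: simulated heights, NOT the real dataset
set.seed(1)
toy <- data.frame(
  Sport  = factor(rep(c("Basketball", "Gymnastics", "Swimming"), each = 50)),
  Height = c(rnorm(50, 195, 8), rnorm(50, 163, 8), rnorm(50, 180, 8))
)

# One-way ANOVA, same form as the model fitted on the real data
fit <- aov(Height ~ Sport, data = toy)

# Tukey's HSD: all pairwise differences in mean height between sports,
# with confidence intervals and p-values adjusted for multiple comparisons
tk <- TukeyHSD(fit, "Sport")
print(tk$Sport)  # columns: diff, lwr, upr, p adj
```

On the real model, `TukeyHSD(a, "Sport")` would be the analogous call. With 59 sports that is 1711 pairwise comparisons, so the output is long and is best filtered or plotted rather than read line by line.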
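Next, I wonder whether gender plays a significant role too. I’ll add it to my ANOVA, this time with weight as the response, and explore. There will be three hypotheses here: one for each main effect (Sport and Sex) and one for their interaction.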
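Written out, the three null hypotheses might look like the following. The notation is an assumption introduced here, not something taken from the R output: \(\alpha_i\) is the effect of sport \(i\), \(\beta_j\) is the effect of sex \(j\), and \((\alpha\beta)_{ij}\) is their interaction in the usual two-way model for the response \(y_{ijk}\).

\[ \begin{aligned} y_{ijk} &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \\ H_0^{\text{Sport}} &: \alpha_i = 0 \text{ for all } i \\ H_0^{\text{Sex}} &: \beta_j = 0 \text{ for all } j \\ H_0^{\text{Sport:Sex}} &: (\alpha\beta)_{ij} = 0 \text{ for all } i \text{ and } j \end{aligned} \]

Each row of the ANOVA table tests one of these. Because `aov` uses sequential sums of squares, the order of the terms in the formula can matter when the data are unbalanced.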
a2 <- aov(Weight ~ Sport*Sex, data)
summary(a2)
##                 Df   Sum Sq Mean Sq  F value Pr(>F)
## Sport           55 10406985  189218  1716.33 <2e-16 ***
## Sex              1  9365021 9365021 84947.02 <2e-16 ***
## Sport:Sex       45   151088    3358    30.45 <2e-16 ***
## Residuals   208139 22946374     110
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 62875 observations deleted due to missingness