Olympics

Keeping with my project, I am now going to explore the Olympic dataset and perform a few ANOVA computations. Let’s ask whether the sport an athlete plays affects their height. Really, I am asking whether the mean height of athletes is the same across all sports.

\[ \begin{aligned} H_0 &: \mu_1 = \mu_2 = \cdots = \mu_n \\ H_A &: \mu_i \neq \mu_j \text{ for some } i \neq j \end{aligned} \]

a <- aov(Height ~ Sport , data = data)
summary(a)
##                 Df   Sum Sq Mean Sq F value Pr(>F)    
## Sport           58  6805266  117332    1497 <2e-16 ***
## Residuals   210886 16533167      78                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 60171 observations deleted due to missingness
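
To see where the F value comes from, the table can be rebuilt by hand. This is only a sketch: the sums of squares and degrees of freedom are copied from the output above, and the variable names are my own. Each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares.

```r
# Values copied from the summary(a) table above
ss_sport <- 6805266;  df_sport <- 58
ss_resid <- 16533167; df_resid <- 210886

# Mean square = sum of squares / degrees of freedom
ms_sport <- ss_sport / df_sport
ms_resid <- ss_resid / df_resid

# F statistic = between-group mean square / residual mean square
f_stat <- ms_sport / ms_resid

# Upper-tail probability of the F distribution with (58, 210886) df
p_value <- pf(f_stat, df_sport, df_resid, lower.tail = FALSE)

print(round(c(ms_sport = ms_sport, ms_resid = ms_resid, f_stat = f_stat)))
print(p_value < 2e-16)
```

Up to rounding, these match the Mean Sq and F value columns that `summary(a)` prints.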

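The F statistic is very large (F = 1497 on 58 and 210886 degrees of freedom, p < 2e-16), so we reject the null hypothesis: at least one sport has a mean height that differs from another's. Let’s explore the assumptions of the test.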

plot(a,1)

The red line sits right in the middle of the residuals-vs-fitted plot, so eyeballing it, the assumption of equal variances looks okay.
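
A numeric check can back up the eyeball test. Here is a minimal sketch on simulated toy data (the `toy` data frame, its sport labels and its values are hypothetical, not the Olympic data). It compares the per-sport standard deviations and runs Bartlett's test from base R. With a sample as large as the Olympic one, a formal test will flag even tiny differences, so the ratio of the largest to the smallest standard deviation is often the more useful number. Bartlett's test also needs at least two observations in every group.

```r
# Simulated stand-in data: three sports with the same spread by construction
set.seed(1)
toy <- data.frame(
  Sport  = factor(rep(c("Basketball", "Gymnastics", "Swimming"), each = 50)),
  Height = c(rnorm(50, 195, 8), rnorm(50, 162, 8), rnorm(50, 182, 8))
)

# Standard deviation of height within each sport
group_sd <- tapply(toy$Height, toy$Sport, sd, na.rm = TRUE)

# Ratio of largest to smallest SD; values near 1 suggest similar variances
sd_ratio <- max(group_sd) / min(group_sd)

# Bartlett's test of equal variances across groups (base R, stats package)
bt <- bartlett.test(Height ~ Sport, data = toy)

print(round(group_sd, 2))
print(sd_ratio)
print(bt$p.value)
```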

plot(a,2)
## Warning: not plotting observations with leverage one:
##   76248

The QQ plot also looks acceptable, suggesting the residuals are approximately normal. So I feel confident rejecting the null hypothesis and saying that there is evidence that the mean height of athletes differs across sports.
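
The leverage warning printed by `plot(a,2)` has a simple cause in a one-way ANOVA. The leverage of an observation is one over the number of observations in its group, so a leverage of exactly one means that observation is the only one in its sport with a recorded height. Here is a small sketch on made-up data (the `toy` data frame and its values are hypothetical), where sport "C" has a single athlete:

```r
# Made-up data: sports A and B have five athletes each, sport C has one
set.seed(42)
toy <- data.frame(
  Sport  = factor(c(rep("A", 5), rep("B", 5), "C")),
  Height = c(rnorm(5, 180, 5), rnorm(5, 170, 5), 165)
)

fit <- aov(Height ~ Sport, data = toy)

# Leverage (hat value) of each observation: 1 / group size
h <- hatvalues(fit)

print(table(toy$Sport))
print(round(h, 2))
```

The first ten hat values are 1/5 and the last is 1, which is the situation the warning describes.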

# Overall mean height, drawn below as a dashed reference line
y1 <- mean(data$Height, na.rm = TRUE)
ggplot(data = data, aes(x = Sport, y = Height)) +
  geom_jitter(color = "grey") +                        # raw observations
  stat_summary(fun.data = "mean_se", color = "red") +  # per-sport mean +/- 1 SE
  geom_hline(yintercept = y1, color = "blue", linetype = "dashed")
## Warning: Removed 60171 rows containing non-finite values (stat_summary).
## Warning: Removed 60171 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_segment).
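
For reference, the red summaries are the per-sport mean plus and minus one standard error. The sketch below mirrors that calculation in base R. The helper name `mean_and_se` is my own, not part of ggplot2, and the input heights are made up. It also hints at a likely reason for the single `geom_segment` warning: a sport with only one recorded height has no standard error, so its error bar cannot be drawn.

```r
# Mean and mean +/- one standard error, ignoring missing values
mean_and_se <- function(x) {
  x  <- x[!is.na(x)]
  se <- sd(x) / sqrt(length(x))   # sd() is NA when there is only one value
  c(y = mean(x), ymin = mean(x) - se, ymax = mean(x) + se)
}

three_athletes <- mean_and_se(c(170, 180, 190, NA))
one_athlete    <- mean_and_se(182)

print(three_athletes)
print(one_athlete)
```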

Two-Way ANOVA

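Next, I wonder whether gender plays a significant role too. I’ll add Sex to my ANOVA, along with its interaction with Sport, and explore. Note that the model below uses Weight, not Height, as the response. There will be three hypotheses here: one for the main effect of Sport, one for the main effect of Sex, and one for the Sport-by-Sex interaction.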
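
Written in the same style as the one-way case, the three null hypotheses can be sketched as follows. The effects notation is an assumption of this sketch and does not come from the original analysis: \(\alpha_i\) is the effect of sport \(i\), \(\beta_j\) the effect of sex \(j\), and \((\alpha\beta)_{ij}\) their interaction, in the model \(Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}\). In each case the alternative is that at least one of the corresponding effects is nonzero.

\[ \begin{aligned} H_0^{\text{Sport}} &: \alpha_i = 0 \text{ for all } i \\ H_0^{\text{Sex}} &: \beta_j = 0 \text{ for all } j \\ H_0^{\text{Sport:Sex}} &: (\alpha\beta)_{ij} = 0 \text{ for all } i, j \end{aligned} \]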

a2 <- aov(Weight ~ Sport*Sex, data)
summary(a2)
##                 Df   Sum Sq Mean Sq  F value Pr(>F)    
## Sport           55 10406985  189218  1716.33 <2e-16 ***
## Sex              1  9365021 9365021 84947.02 <2e-16 ***
## Sport:Sex       45   151088    3358    30.45 <2e-16 ***
## Residuals   208139 22946374     110                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 62875 observations deleted due to missingness
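
One thing worth keeping in mind when reading this table: `aov` computes sequential (Type I) sums of squares. With unbalanced data, the Sport row is therefore not adjusted for Sex, and swapping the order of the terms can change the main-effect rows. Here is a minimal sketch on simulated, deliberately unbalanced toy data (the `toy` data frame, its labels and all values are hypothetical):

```r
# Unbalanced toy design: sport A is mostly female, sport B mostly male
set.seed(7)
toy <- data.frame(
  Sport = factor(c(rep("A", 30), rep("B", 10), rep("A", 10), rep("B", 30))),
  Sex   = factor(rep(c("F", "M"), each = 40))
)
toy$Weight <- 60 + 10 * (toy$Sex == "M") + 5 * (toy$Sport == "B") +
  rnorm(80, sd = 4)

# Same model, terms entered in a different order
fit_sport_first <- aov(Weight ~ Sport * Sex, data = toy)
fit_sex_first   <- aov(Weight ~ Sex * Sport, data = toy)

# Sum Sq column; rows are (first term, second term, interaction, residuals)
ss_sport_first <- summary(fit_sport_first)[[1]][["Sum Sq"]]
ss_sex_first   <- summary(fit_sex_first)[[1]][["Sum Sq"]]

# Sum of squares attributed to Sport under each ordering
print(c(sport_entered_first  = ss_sport_first[1],
        sport_entered_second = ss_sex_first[2]))
```

The interaction and residual rows are the same under both orderings; only the split between the two main effects moves.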