============================================================================================================
Read the case study and provide your analysis.
Your friend from the Junior Soccer League is happy to know about your interest in her dataset. She mentions she also did another experiment with the same teams. This time, she wanted to know which food did the players liked most. For the November’s match, she offered oranges and bananas, and she offered spinach smoothies for the other match in December. She counted how many players liked each type of snack regarding the different teams. She wants to know if there is an important difference between preferences.
| Oranges and Bananas | Spinach Smoothie | |
|---|---|---|
| Team Tiger | 15 | 9 |
| Team Shark | 21 | 17 |
Evaluation metrics are going to consider:
1. How you determine the test or approach you are going to use.
Answer: The variables, team and month, are nominal. The team is either Team Tiger or Team Shark. The month is either November or December. The month and the food are associated. The dependent variable, preference, is boolean. An experiment is whether a soccer player like the certain food in each team, which means team is the independent variable. The goal of the study would be to examine whether the preference changes as the team is varied. Hence, the Chi-square test of independence is in use.
2. Explicitly mention what you need to use for this approach, e.g., the parameters, statistics, predictors necessary.
Answer: The Chi-square test of independence determines whether there is an association between two categorical variables (i.e., whether the variables are independent or related). It is a nonparametric test. Some requirements for running such a test are: two or more levels for each variable; there is no relationship between the subjects in each group; the categorical variables are not paired (e.g., pre-test and post-test); relatively large sample size. The null hypothesis (\(H_0\)) of the Chi-square test of independence is that there is no association between two categorical variables.
3. Report the results of your approach.
Answer: The Chi-square test of independence is run in the code chunk. The test statistic for the Chi-square test of independence is calculated as 0.088973. The p-value of 0.7655 is not statistically significant at an alpha level of 0.05, hence fail to reject the null hypothesis.
4. The written communication of the process and final recommendation.
Answer: If the null hypothesis is true, but we reject it, the false positive is a Type I error. If the null hypothesis is false, but we fail to reject it, the false negative is a Type II error.
A small p-value says the data is unlikely to occur if the null hypothesis is true. We, therefore, conclude that the null hypothesis is probably not true and that the alternative hypothesis is true instead.
We often choose a significance level as a benchmark for judging if the p-value is small enough. If the p-value is less than or equal to the significance level, we reject the null hypothesis and accept the alternative hypothesis instead.
If the p-value is greater than the significance level, we say we “fail to reject” the null hypothesis. We never say that we “accept” the null hypothesis. We just say that we don’t have enough evidence to reject it. This is equivalent to saying we don’t have enough evidence to support the alternative hypothesis. Reference
It is failed to reject that there is no association between the team and the food preference.
chisq.test(data.frame("Team Tiger"=c(15,9),"Team Shark"=c(21,17)))
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data.frame(`Team Tiger` = c(15, 9), `Team Shark` = c(21, 17))
## X-squared = 0.088973, df = 1, p-value = 0.7655