Week 3 Analysis Tanner Norton 5/8/2019
library(tidyverse)
library(DT)
library(pander)
library(readr)
library(car)
HSS <- read_csv("HighSchoolSeniors.csv")
HSS_ttest <- read_csv("HSS_ttest.csv")
#Remember: select "Session, Set Working Directory, To Source File Location", and then play this R-chunk into your console to read the HSS data into R.
In 2018, a program called Census at School randomly surveyed 500 high school seniors across the nation. The survey included both quantitative and qualitative questions. In this analysis, students were grouped into two categories: those who preferred to drink water, and those who preferred caffeinated drinks (coffee, tea, and caffeinated sodas or energy drinks). The purpose of this analysis is to determine whether students who prefer water get significantly more sleep on school nights than students who prefer caffeinated drinks.
\[ H_0: \mu_{\text{Water}} = \mu_{\text{Caffeinated}} \] \[ H_a: \mu_{\text{Water}} > \mu_{\text{Caffeinated}} \] The hypothesis is that students who prefer caffeinated drinks get less sleep on school nights than those who prefer water, so the alternative \(H_a\) is one-sided. Here \(\mu\) represents the mean number of hours slept on a school night for each group.
datatable(HSS_ttest, options=list(lengthMenu = c(10,50)), style = "default")
The five-number summary below (shown together with the mean and sample size) gives a good look at how the data are laid out. Each drink category includes more than the 30 observations required, and the spread of the data is very similar for those who prefer water and those who prefer caffeinated drinks.
HSS_ttest %>%
  group_by(Drink) %>%
  summarise(min = min(`Hours slept on school night`, na.rm = TRUE),
            "1st Quart" = quantile(`Hours slept on school night`, .25, na.rm = TRUE),
            median = median(`Hours slept on school night`, na.rm = TRUE),
            mean = mean(`Hours slept on school night`, na.rm = TRUE),
            "3rd Quart" = quantile(`Hours slept on school night`, .75, na.rm = TRUE),
            max = max(`Hours slept on school night`, na.rm = TRUE),
            sample = n()) %>%
  pander()
Drink | min | 1st Quart | median | mean | 3rd Quart | max | sample |
---|---|---|---|---|---|---|---|
Caffeinated | 2 | 6 | 6 | 6.315 | 7 | 9.5 | 73 |
Water | 3 | 6 | 7 | 6.704 | 7.5 | 10 | 298 |
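One caveat about the `sample` column: `n()` counts rows, including any rows where the hours value is missing, while the other statistics drop missing values through `na.rm = TRUE`. The sketch below shows one way to report both counts side by side. It uses a small made-up data frame (`toy_counts`, not the survey data) so that it runs on its own; the column names mirror those in `HSS_ttest`.

```r
library(dplyr)

# Toy data for illustration only; these are NOT the survey values.
toy_counts <- data.frame(
  Drink = c("Water", "Water", "Water", "Caffeinated", "Caffeinated"),
  `Hours slept on school night` = c(7, NA, 8, 6, NA),
  check.names = FALSE  # keep the column name with spaces intact
)

counts <- toy_counts %>%
  group_by(Drink) %>%
  summarise(
    rows = n(),                                               # all rows, NA included
    non_missing = sum(!is.na(`Hours slept on school night`))  # rows actually used
  )

print(counts)
```

If the two columns agree for the real data, the sample sizes in the table above are also the number of observations used in the test.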
While the spread of the data appears very similar for both groups, the Q-Q plots below show that the data are not normally distributed, as several observations lie outside the limits. Because of this, a t-test is not the most appropriate way to test this question; a Wilcoxon rank-sum or Kruskal-Wallis test, neither of which assumes normally distributed data, would be a better choice. However, to avoid further complicating things, we continue with a t-test.
qqPlot(`Hours slept on school night` ~ Drink, data = HSS_ttest, ylab = "Hours of sleep on schoolnights")
pander(t.test(`Hours slept on school night` ~ Drink, data = HSS_ttest, mu = 0, alternative = "less",
conf.level = 0.95), caption="T-test: Hours of sleep on School nights", split.table=Inf)
Test statistic | df | P value | Alternative hypothesis | mean in group Caffeinated | mean in group Water |
---|---|---|---|---|---|
-2.139 | 100.5 | 0.01742 * | less | 6.315 | 6.704 |
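The fractional degrees of freedom (100.5) indicate that `t.test` applied its default Welch correction, which does not assume equal variances in the two groups. As a rough sketch of what that computation involves, the helper below (`welch_t`, a hypothetical name introduced here) calculates the Welch statistic, the Welch-Satterthwaite degrees of freedom, and the one-sided p-value by hand, then compares them with `t.test`. The two vectors are made-up values for illustration, not the survey data.

```r
# Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.
welch_t <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  vx <- var(x) / length(x)  # squared standard error of mean(x)
  vy <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_less <- pt(t_stat, df)  # one-sided p-value for alternative = "less"
  c(t = t_stat, df = df, p = p_less)
}

# Toy vectors for illustration only; these are NOT the survey values.
caffeinated <- c(5, 6, 6.5, 7, 4.5, 6)
water <- c(7, 6.5, 8, 7.5, 6, 7, 9)

manual <- welch_t(caffeinated, water)
builtin <- t.test(caffeinated, water, alternative = "less")

print(manual)
print(builtin)
```

Because the first group is Caffeinated, `alternative = "less"` tests whether the Caffeinated mean is below the Water mean, which matches \(H_a\).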
The p-value of 0.01742 is below the 0.05 significance level, so there is sufficient evidence to reject the null hypothesis in favor of the alternative that students who prefer water get more sleep on school nights than students who prefer caffeinated drinks. On average, students who prefer caffeinated drinks slept about 0.39 hours less (6.315 versus 6.704 hours), so the hypothesis that these students get less sleep is supported by these results. However, as mentioned above, a t-test was not the most appropriate choice for this situation, so it is suggested that this analysis be repeated later in the semester, when tests that do not rely on the assumption of normality can be performed.
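As a starting point for that follow-up, the Wilcoxon rank-sum test mentioned earlier is available in base R as `wilcox.test` and accepts the same formula interface as `t.test`. The sketch below runs it on a small made-up data frame (`toy_sleep`, not the survey data) so that it is self-contained; replacing `toy_sleep` with `HSS_ttest` would be the assumed next step.

```r
# Toy data for illustration only; these are NOT the survey values.
toy_sleep <- data.frame(
  Drink = c(rep("Caffeinated", 5), rep("Water", 6)),
  `Hours slept on school night` = c(5, 6, 6.5, 4.5, 7,
                                    7.5, 8, 6.25, 9, 7.25, 8.5),
  check.names = FALSE  # keep the column name with spaces intact
)

# One-sided test: do Caffeinated values tend to be lower than Water values?
# Groups are ordered alphabetically, so "less" means Caffeinated < Water.
wilcox_result <- wilcox.test(
  `Hours slept on school night` ~ Drink,
  data = toy_sleep,
  alternative = "less"
)

print(wilcox_result)
```

The survey responses contain many repeated values (for example, 6 or 7 hours), so on the real data `wilcox.test` would be expected to warn that it cannot compute an exact p-value with ties and fall back to a normal approximation.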