This document is a final project for the UCSD Course BILD 5. My project builds upon the study referenced below. I have proposed a potential follow-up experiment to extend the study’s findings. The project utilizes simulated data, structured and analyzed as if collected from a real experiment. While the outcomes do not provide scientific evidence, they showcase my skills in formulating and addressing relevant biological questions through statistics, experimental design, and programming.
Buijze et al. “The Effect of Cold Showering on Health and Work: A Randomized Controlled Trial.” PLOS ONE 11, no. 9 (September 15, 2016): e0161749. https://doi.org/10.1371/journal.pone.0161749.
Does heart rate after a 30-second or 90-second cold shower differ from heart rate after a normal shower?
The null hypothesis is that mean heart rate does not differ among the control, 30-second, and 90-second shower groups.
The alternative hypothesis is that mean heart rate differs for at least one of the control, 30-second, and 90-second shower groups.
I use a one-way ANOVA to compare mean heart rate among the three shower groups.
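To show what the ANOVA computes, here is a minimal sketch in R that builds the F-statistic by hand from the between-group and within-group sums of squares and checks it against `anova(lm(...))`. The `toy` data frame, its group labels, and its values are hypothetical and chosen only for illustration; they are not the project's data.

```r
# Hypothetical toy data (not the project's data): 4 heart rates per group
toy <- data.frame(
  ShowerType = factor(rep(c("Control", "Cold30", "Cold90"), each = 4)),
  HeartRate  = c(80, 84, 79, 83,   # control group
                 78, 80, 77, 81,   # 30-second group
                 74, 76, 75, 73)   # 90-second group
)

k <- nlevels(toy$ShowerType)   # number of groups
n <- nrow(toy)                 # total number of observations

grand_mean <- mean(toy$HeartRate)
group_mean <- ave(toy$HeartRate, toy$ShowerType)   # each row's group mean

# Between-group SS: how far the group means sit from the grand mean
ss_between <- sum((group_mean - grand_mean)^2)
# Within-group SS: how far observations sit from their own group mean
ss_within  <- sum((toy$HeartRate - group_mean)^2)

# F = mean square between / mean square within, with (k - 1, n - k) df
f_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_manual <- pf(f_manual, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# The same test through R's linear model and ANOVA table
f_builtin <- anova(lm(HeartRate ~ ShowerType, data = toy))[["F value"]][1]
```

A large F means the group means are spread out relative to the noise within groups, which is exactly what the ANOVA p-value summarizes.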
## tibble [180 × 3] (S3: tbl_df/tbl/data.frame)
## $ X : int [1:180] 1 1 1 2 2 2 3 3 3 4 ...
## $ ShowerType: chr [1:180] "Normal.Shower..Control." "X90.Second.Shower" "X30.Second.Shower" "Normal.Shower..Control." ...
## $ HeartRate : num [1:180] 78 80.8 86 84.3 82 ...
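ANOVA rests on three assumptions: the residuals are approximately normally distributed, the groups have similar variances, and the observations are independent. To check normality, I fit a linear model, extract its residuals, plot them as a histogram, and perform a Kolmogorov-Smirnov (KS) test. Independence is a matter of study design rather than something tested afterward; because these data are simulated, it is assumed rather than checked. Equal variances can be checked with a test such as Bartlett's test. The model summary reports the overall F-test for a difference in means, and if its p-value is below 0.05, a Tukey HSD test identifies which pairs of groups differ.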
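The checks described above can be sketched in base R as follows. The `toy` data frame, the seed, and the group means and SD used to simulate it are assumptions for illustration, standing in for the project's actual data frame (whose name is not shown). Bartlett's test is used here for the equal-variance check; Levene's test is a common alternative that is less sensitive to non-normality.

```r
# Hypothetical simulated data standing in for the project's data frame;
# the seed, group size, means, and SD are illustrative assumptions.
set.seed(1)
toy <- data.frame(
  ShowerType = factor(rep(c("Control", "Cold30", "Cold90"), each = 60)),
  HeartRate  = c(rnorm(60, mean = 82, sd = 5),
                 rnorm(60, mean = 79, sd = 5),
                 rnorm(60, mean = 76, sd = 5))
)

model <- lm(HeartRate ~ ShowerType, data = toy)
res   <- resid(model)   # observed heart rate minus its group mean

# Normality: KS test of the residuals against a normal distribution whose
# mean and SD are estimated from the residuals (hist(res) gives the visual check)
ks_res <- ks.test(res, "pnorm", mean(res), sd(res))

# Equal variances: Bartlett's test across the three groups
bart_res <- bartlett.test(HeartRate ~ ShowerType, data = toy)

# Overall F-test; run Tukey HSD only if it is significant at 0.05
fit_aov   <- aov(HeartRate ~ ShowerType, data = toy)
p_overall <- summary(fit_aov)[[1]][["Pr(>F)"]][1]
tukey_res <- if (p_overall < 0.05) TukeyHSD(fit_aov) else NULL
```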
[Figure: histogram of the residuals from the initial model]
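# One-sample KS test: compare the residuals with a normal distribution
# whose mean and SD are estimated from the residuals themselves
ks_test1 <- ks.test(residuals, "pnorm", mean(residuals), sd(residuals))
ks_test1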
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: residuals
## D = 0.4061, p-value < 2.2e-16
## alternative hypothesis: two-sided
My exploratory data analysis (EDA) identified an outlier with a Cook's distance above 0.5, and the KS test rejected normality of the residuals (D = 0.4061, p < 2.2e-16). To address this, I remove the outlier from the dataset and test for normality again: I fit a new linear model to the corrected data, extract its residuals, plot a histogram, and rerun the KS test. A p-value well above 0.05 would mean there is no evidence that the residuals depart from normality, indicating that removing the outlier resolved the problem.
I remove the outlier and store the remaining 179 observations in a new data frame, CorrectedDF.
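A minimal sketch of the outlier step, assuming the outlier is flagged by Cook's distance with the 0.5 cutoff mentioned above. The simulated data, the planted value of 160, and the names `toy`, `first_fit`, `corrected`, and `second_fit` are hypothetical. Filtering with a logical condition, rather than `toy[-which(...), ]`, avoids accidentally dropping every row when no point exceeds the cutoff.

```r
# Hypothetical data with one planted extreme value (row 5) for illustration
set.seed(2)
toy <- data.frame(
  ShowerType = factor(rep(c("Control", "Cold30", "Cold90"), each = 20)),
  HeartRate  = c(rnorm(20, mean = 82, sd = 5),
                 rnorm(20, mean = 79, sd = 5),
                 rnorm(20, mean = 76, sd = 5))
)
toy$HeartRate[5] <- 160

first_fit <- lm(HeartRate ~ ShowerType, data = toy)
cooks     <- cooks.distance(first_fit)   # one value per observation

# Keep rows at or below the 0.5 cutoff; a logical filter stays correct
# even when no row is flagged
corrected  <- toy[cooks <= 0.5, ]
second_fit <- lm(HeartRate ~ ShowerType, data = corrected)
```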
## # A tibble: 179 × 3
## X ShowerType HeartRate
## <int> <chr> <dbl>
## 1 1 Normal.Shower..Control. 78.0
## 2 1 X90.Second.Shower 80.8
## 3 1 X30.Second.Shower 86.0
## 4 2 Normal.Shower..Control. 84.3
## 5 2 X90.Second.Shower 82.0
## 6 2 X30.Second.Shower 74.3
## 7 3 Normal.Shower..Control. 79.9
## 8 3 X90.Second.Shower 64.1
## 9 3 X30.Second.Shower 81.7
## 10 4 Normal.Shower..Control. 90.0
## # ℹ 169 more rows
## tibble [179 × 3] (S3: tbl_df/tbl/data.frame)
## $ X : int [1:179] 1 1 1 2 2 2 3 3 3 4 ...
## $ ShowerType: chr [1:179] "Normal.Shower..Control." "X90.Second.Shower" "X30.Second.Shower" "Normal.Shower..Control." ...
## $ HeartRate : num [1:179] 78 80.8 86 84.3 82 ...
[Figure: histogram of the residuals from the model fit without the outlier]
# Repeat the KS test on the residuals of the model fit without the outlier
ks_test2 <- ks.test(residuals2, "pnorm", mean(residuals2), sd(residuals2))
ks_test2
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: residuals2
## D = 0.049957, p-value = 0.763
## alternative hypothesis: two-sided
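The D values reported by the two KS tests (0.4061 before removing the outlier, 0.049957 after) are the largest vertical gap between the empirical CDF of the residuals and the fitted normal CDF. The sketch below computes D by hand for hypothetical data `x` and checks it against `ks.test()`. One caveat: when the normal mean and SD are estimated from the same data, the standard KS p-value tends to be too large (conservative); the Lilliefors variant or `shapiro.test()` are common alternatives for testing normality.

```r
# Hypothetical residual-like data, not the project's residuals
set.seed(3)
x <- rnorm(50)

m  <- mean(x)
s  <- sd(x)
xs <- sort(x)
n  <- length(xs)
ref_cdf <- pnorm(xs, mean = m, sd = s)   # reference CDF at each sorted point

# The empirical CDF jumps at each point, so check the gap just after
# the jump (i / n) and just before it ((i - 1) / n)
i <- seq_len(n)
d_manual <- max(pmax(i / n - ref_cdf, ref_cdf - (i - 1) / n))

d_builtin <- unname(ks.test(x, "pnorm", m, s)$statistic)
```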
# Coefficients, R-squared, and overall F-test for the model without the outlier
summary(second_model)
##
## Call:
## lm(formula = CorrectedDF$HeartRate ~ CorrectedDF$ShowerType)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.9018 -3.4696 -0.1771 2.8586 19.1188
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 81.9097 0.6972 117.479 < 2e-16
## CorrectedDF$ShowerTypeX30.Second.Shower -2.6788 0.9860 -2.717 0.00725
## CorrectedDF$ShowerTypeX90.Second.Shower -6.0248 0.9902 -6.084 7.11e-09
##
## (Intercept) ***
## CorrectedDF$ShowerTypeX30.Second.Shower **
## CorrectedDF$ShowerTypeX90.Second.Shower ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.401 on 176 degrees of freedom
## Multiple R-squared: 0.1743, Adjusted R-squared: 0.1649
## F-statistic: 18.58 on 2 and 176 DF, p-value: 4.795e-08
# Pairwise comparisons of group means with family-wise error control
TukeyHSD(aov(second_model))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = second_model)
##
## $`CorrectedDF$ShowerType`
## diff lwr upr
## X30.Second.Shower-Normal.Shower..Control. -2.678842 -5.009528 -0.3481556
## X90.Second.Shower-Normal.Shower..Control. -6.024784 -8.365325 -3.6842431
## X90.Second.Shower-X30.Second.Shower -3.345942 -5.686483 -1.0054016
## p adj
## X30.Second.Shower-Normal.Shower..Control. 0.0197457
## X90.Second.Shower-Normal.Shower..Control. 0.0000000
## X90.Second.Shower-X30.Second.Shower 0.0025673
The ANOVA and the boxplot both indicate that mean heart rate differs among the shower types. The overall F-test (F = 18.58 on 2 and 176 degrees of freedom, p = 4.795e-08) shows that at least one group mean differs, and the Tukey HSD test shows that every pairwise comparison is significant at the 0.05 level (adjusted p = 0.0197 for the 30-second shower vs. the control and p = 0.0026 for the 90-second vs. the 30-second shower). The largest difference is between the normal shower and the 90-second shower, whose adjusted p-value rounds to 0.0000000. This is visualized in the boxplot above, which compares heart rate across the three shower types.

The model coefficients show heart rate dropping from normal showers to 30-second cold showers to 90-second cold showers: relative to the control mean of 81.91, the 30-second group is 2.68 lower and the 90-second group is 6.02 lower. Given the overall p-value of 4.795e-08, shower type appears to be a significant predictor of heart rate an hour after the shower, with cold showers associated with lower heart rates. However, the R^2 of 0.1743 means that shower type explains only about 17% of the variance in heart rate, leaving substantial unexplained variability. To summarize, heart rate differs significantly across normal, 30-second cold, and 90-second cold showers, but further tests are needed to determine whether the difference can be attributed to the cold showers themselves.
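The coefficient table can be read directly as group means because R's default treatment coding makes the first factor level (here the normal-shower control) the baseline. The arithmetic below uses only the estimates printed by `summary(second_model)`; it reproduces the third Tukey difference and, up to the rounding of R^2, the overall F-statistic. The variable names are illustrative.

```r
# Estimates copied from the summary(second_model) output above
intercept <- 81.9097   # normal shower (control) mean
b30       <- -2.6788   # 30-second cold shower minus control
b90       <- -6.0248   # 90-second cold shower minus control

group_means <- c(Control = intercept,
                 Cold30  = intercept + b30,
                 Cold90  = intercept + b90)

# The 90-vs-30 Tukey difference follows from the other two coefficients
diff_90_vs_30 <- b90 - b30

# Overall F from R^2: F = (R^2 / df_model) / ((1 - R^2) / df_residual)
r2        <- 0.1743
f_from_r2 <- (r2 / 2) / ((1 - r2) / 176)
```

This gives estimated means of about 81.91 for the normal shower, 79.23 for the 30-second shower, and 75.88 for the 90-second shower.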