BILD 5 Final Project

This document is my final project for the UCSD course BILD 5. It builds on the study cited below by proposing a follow-up experiment that could extend the study's findings. The project uses simulated data, structured and analyzed as if they had been collected in a real experiment. Although the outcomes are not scientific evidence, they show how I formulate and address a relevant biological question through statistics, experimental design, and programming.

Article Citation

Buijze et al. “The Effect of Cold Showering on Health and Work: A Randomized Controlled Trial.” PLOS ONE 11, no. 9 (September 15, 2016): e0161749. https://doi.org/10.1371/journal.pone.0161749.

Research Question

Is there a difference in heart rate after 30-second and 90-second cold showers compared with normal showers?

Null Hypothesis

The null hypothesis is that there is no difference in mean heart rate among the control, 30-second, and 90-second shower groups.

Alternate Hypothesis

The alternate hypothesis is that at least one of the control, 30-second, and 90-second shower groups has a mean heart rate that differs from the others.

Test Performed

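I am using a one-way ANOVA to test whether mean heart rate differs among the three shower groups (control, 30-second cold, and 90-second cold). ANOVA does this by comparing the variation between the group means with the variation within the groups.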
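To make that comparison concrete, here is a minimal R sketch of a one-way ANOVA on simulated toy data. The values, group labels, and object names are hypothetical (the simulated means are loosely based on the group means reported later). It fits the model with aov() and then recomputes the F statistic by hand as the between-group mean square divided by the within-group mean square.

```r
# One-way ANOVA on simulated toy data (hypothetical values, not the project data)
set.seed(1)
toy <- data.frame(
  ShowerType = rep(c("Control", "Cold30", "Cold90"), each = 20),
  HeartRate  = c(rnorm(20, 82, 5), rnorm(20, 79, 5), rnorm(20, 76, 5)),
  stringsAsFactors = FALSE
)

# Built-in route: aov() fits the model; summary() prints the ANOVA table
fit <- aov(HeartRate ~ ShowerType, data = toy)
print(summary(fit))

# By hand: F = between-group mean square / within-group mean square
grand_mean  <- mean(toy$HeartRate)
group_means <- tapply(toy$HeartRate, toy$ShowerType, mean)
group_sizes <- tapply(toy$HeartRate, toy$ShowerType, length)
k <- length(group_means)  # number of groups
n <- nrow(toy)            # total number of observations

ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((toy$HeartRate - group_means[toy$ShowerType])^2)
f_by_hand  <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_by_hand  <- pf(f_by_hand, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
cat("F =", f_by_hand, " p =", p_by_hand, "\n")
```

The hand-computed F and p-value should match the "F value" and "Pr(>F)" columns printed by summary(fit).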

Import and Tidy the Data

## tibble [180 × 3] (S3: tbl_df/tbl/data.frame)
##  $ X         : int [1:180] 1 1 1 2 2 2 3 3 3 4 ...
##  $ ShowerType: chr [1:180] "Normal.Shower..Control." "X90.Second.Shower" "X30.Second.Shower" "Normal.Shower..Control." ...
##  $ HeartRate : num [1:180] 78 80.8 86 84.3 82 ...
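The ShowerType labels look like column names produced by R's make.names(), which read.csv() applies to headers by default, so the raw data were probably stored in a wide table with one column per shower type. The sketch below assumes that layout: the file name and the original header spellings are hypothetical, and the three rows reuse the rounded values of the first rows printed later in this report. It reshapes the wide table into the long format shown above using only base R.

```r
# Reshape a wide table (one column per shower type) into long format.
# wide <- read.csv("shower_heart_rates.csv")  # hypothetical file name
# The three rows below reuse rounded values printed later in this report.
wide <- data.frame(
  X                       = 1:3,
  Normal.Shower..Control. = c(78.0, 84.3, 79.9),
  X90.Second.Shower       = c(80.8, 82.0, 64.1),
  X30.Second.Shower       = c(86.0, 74.3, 81.7)
)

# read.csv() passes headers through make.names(), which yields labels like these
# (the original header spellings are an assumption)
mangled <- make.names(c("Normal Shower (Control)", "90 Second Shower", "30 Second Shower"))
print(mangled)

shower_cols <- setdiff(names(wide), "X")

# One row per (X, shower type) pair, ordered X = 1 1 1 2 2 2 ...
long_df <- data.frame(
  X          = rep(wide$X, each = length(shower_cols)),
  ShowerType = rep(shower_cols, times = nrow(wide)),
  HeartRate  = as.vector(t(as.matrix(wide[shower_cols]))),
  stringsAsFactors = FALSE
)
str(long_df)
```

Transposing the matrix before flattening is what produces the row-by-row ordering (1 1 1 2 2 2 ...) seen in the str() output above; lm() and aov() will treat the character ShowerType column as a factor.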

Exploratory Data Analysis

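The assumptions of an ANOVA are normality of the residuals, equal variance across groups, and independence of observations. To check normality, I fit a linear model, extract its residuals, plot them as a histogram, and perform a Kolmogorov-Smirnov (KS) test; I also check Cook's distance to find influential outliers. Independence has to be built into data collection through the study design, so it cannot be tested from these simulated data. Equal variance can be checked by comparing the spread of the residuals across groups or with a test such as Bartlett's test. After the model is fit, summary() reports the overall ANOVA F-test, and if its p-value is less than 0.05, TukeyHSD() identifies which pairs of groups differ.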

[Figure: histogram of the residuals from the initial linear model]

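# One-sample KS test comparing the residuals with a normal distribution
# that has the residuals' own mean and standard deviation
ks_test1 <- ks.test(residuals, "pnorm", mean(residuals), sd(residuals))
ks_test1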

## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  residuals
## D = 0.4061, p-value < 2.2e-16
## alternative hypothesis: two-sided
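The checks described in this section can be sketched in R as follows. The data are simulated toy values with one extreme point planted at row 45 to play the role of an outlier, and all object names are hypothetical. Because this KS test estimates the mean and standard deviation from the same residuals it is testing, its p-value is only approximate (the Lilliefors correction addresses this); shapiro.test() is a common alternative normality test.

```r
# Assumption checks on simulated toy data (hypothetical values and names)
set.seed(2)
toy <- data.frame(
  ShowerType = rep(c("Control", "Cold30", "Cold90"), each = 20),
  HeartRate  = c(rnorm(20, 82, 5), rnorm(20, 79, 5), rnorm(20, 76, 5)),
  stringsAsFactors = FALSE
)
toy$HeartRate[45] <- 200  # planted outlier, e.g. a data-entry error

model <- lm(HeartRate ~ ShowerType, data = toy)
res   <- residuals(model)

# Normality: histogram of the residuals plus a one-sample KS test
hist(res, breaks = 20, main = "Residuals", xlab = "Residual")
ks_result <- ks.test(res, "pnorm", mean(res), sd(res))
print(ks_result)

# Influence: Cook's distance per observation, flagged above the 0.5 cutoff
cooks <- cooks.distance(model)
influential <- which(cooks > 0.5)
print(influential)

# Equal variance: Bartlett's test across groups (itself sensitive to non-normality)
bartlett_result <- bartlett.test(HeartRate ~ ShowerType, data = toy)
print(bartlett_result)
```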

Description of Current Data and Plan for Fixes if Needed

My EDA showed that the data contained an outlier with a Cook's distance above 0.5, and the KS test gave a highly significant p-value (p < 2.2e-16), meaning the residuals departed from a normal distribution. To fix this, I must remove the outlier from the dataset and test for normality again: I will refit the linear model, extract the residuals, graph a histogram, and rerun the KS test on the corrected data. If the new KS test has a p-value well above 0.05, there will be no evidence against normality, which indicates that removing the outlier resolved the problem.
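A minimal sketch of this plan, again on simulated toy data with a planted outlier (hypothetical values; residuals2, second_model, and ks_test2 simply mirror names used later in this report). Rows whose Cook's distance exceeds the 0.5 cutoff used here are dropped, the model is refit, and the KS test is rerun. Cutoffs of 1 or 4/n are other common rules of thumb for flagging influential points.

```r
# Outlier removal and refit on simulated toy data (hypothetical values)
set.seed(3)
toy <- data.frame(
  ShowerType = rep(c("Control", "Cold30", "Cold90"), each = 20),
  HeartRate  = c(rnorm(20, 82, 5), rnorm(20, 79, 5), rnorm(20, 76, 5)),
  stringsAsFactors = FALSE
)
toy$HeartRate[45] <- 200  # planted outlier

first_model <- lm(HeartRate ~ ShowerType, data = toy)

# Keep only rows whose Cook's distance is at or below the 0.5 cutoff
keep      <- cooks.distance(first_model) <= 0.5
corrected <- toy[keep, ]
cat("Rows removed:", which(!keep), "\n")

# Refit on the corrected data and rerun the normality check
second_model <- lm(HeartRate ~ ShowerType, data = corrected)
residuals2   <- residuals(second_model)
ks_test2     <- ks.test(residuals2, "pnorm", mean(residuals2), sd(residuals2))
print(ks_test2)
```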

Correct Data Issues and Build Final Data Table

I removed the identified outlier and stored the remaining 179 observations in a new data frame, CorrectedDF.

## # A tibble: 179 × 3
##        X ShowerType              HeartRate
##    <int> <chr>                       <dbl>
##  1     1 Normal.Shower..Control.      78.0
##  2     1 X90.Second.Shower            80.8
##  3     1 X30.Second.Shower            86.0
##  4     2 Normal.Shower..Control.      84.3
##  5     2 X90.Second.Shower            82.0
##  6     2 X30.Second.Shower            74.3
##  7     3 Normal.Shower..Control.      79.9
##  8     3 X90.Second.Shower            64.1
##  9     3 X30.Second.Shower            81.7
## 10     4 Normal.Shower..Control.      90.0
## # ℹ 169 more rows

## tibble [179 × 3] (S3: tbl_df/tbl/data.frame)
##  $ X         : int [1:179] 1 1 1 2 2 2 3 3 3 4 ...
##  $ ShowerType: chr [1:179] "Normal.Shower..Control." "X90.Second.Shower" "X30.Second.Shower" "Normal.Shower..Control." ...
##  $ HeartRate : num [1:179] 78 80.8 86 84.3 82 ...

Demonstrate That Your New Data Fit the Assumptions of the Model

[Figure: histogram of the residuals from the linear model refit on the corrected data]

# Repeat the KS test on the residuals from the model refit on the corrected data
ks_test2 <- ks.test(residuals2, "pnorm", mean(residuals2), sd(residuals2))
ks_test2

## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  residuals2
## D = 0.049957, p-value = 0.763
## alternative hypothesis: two-sided

Test the Null Hypothesis of Your Research Question

summary(second_model)

## 
## Call:
## lm(formula = CorrectedDF$HeartRate ~ CorrectedDF$ShowerType)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.9018  -3.4696  -0.1771   2.8586  19.1188 
## 
## Coefficients:
##                                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)                              81.9097     0.6972 117.479  < 2e-16
## CorrectedDF$ShowerTypeX30.Second.Shower  -2.6788     0.9860  -2.717  0.00725
## CorrectedDF$ShowerTypeX90.Second.Shower  -6.0248     0.9902  -6.084 7.11e-09
##                                            
## (Intercept)                             ***
## CorrectedDF$ShowerTypeX30.Second.Shower ** 
## CorrectedDF$ShowerTypeX90.Second.Shower ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.401 on 176 degrees of freedom
## Multiple R-squared:  0.1743, Adjusted R-squared:  0.1649 
## F-statistic: 18.58 on 2 and 176 DF,  p-value: 4.795e-08
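The overall F-statistic in this output is the one-way ANOVA F-test, and it can be recovered from the reported R-squared with F = (R^2 / (k - 1)) / ((1 - R^2) / (n - k)), where k = 3 groups and n - k = 176 residual degrees of freedom. The short R check below uses only numbers printed above. As an aside, writing the model as lm(HeartRate ~ ShowerType, data = CorrectedDF) would give the same fit with shorter coefficient names, and calling anova() on that model prints the classic ANOVA table.

```r
# Recover the overall F-test from numbers printed in summary(second_model)
r2       <- 0.1743        # Multiple R-squared
k        <- 3             # groups: control, 30-second, 90-second
df_resid <- 176           # residual degrees of freedom
n        <- df_resid + k  # total observations (179)

f_stat  <- (r2 / (k - 1)) / ((1 - r2) / df_resid)
p_value <- pf(f_stat, df1 = k - 1, df2 = df_resid, lower.tail = FALSE)
cat("F =", round(f_stat, 2), "on", k - 1, "and", df_resid, "DF, p =", signif(p_value, 3), "\n")
```

Because R-squared is rounded to four digits, f_stat comes out at about 18.58, matching the printed F-statistic.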

TukeyHSD(aov(second_model))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = second_model)
## 
## $`CorrectedDF$ShowerType`
##                                                diff       lwr        upr
## X30.Second.Shower-Normal.Shower..Control. -2.678842 -5.009528 -0.3481556
## X90.Second.Shower-Normal.Shower..Control. -6.024784 -8.365325 -3.6842431
## X90.Second.Shower-X30.Second.Shower       -3.345942 -5.686483 -1.0054016
##                                               p adj
## X30.Second.Shower-Normal.Shower..Control. 0.0197457
## X90.Second.Shower-Normal.Shower..Control. 0.0000000
## X90.Second.Shower-X30.Second.Shower       0.0025673
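Each Tukey interval is the difference in group means plus or minus a studentized-range critical value times the standard error of that difference. In a one-way model, that half-width equals qtukey(0.95, 3, 176) / sqrt(2) times the corresponding lm standard error, so the 30-second vs. control interval can be rebuilt from numbers printed in this section. This is a sketch for checking the output, not part of the original analysis. The slightly larger standard error for the 90-second coefficient (0.9902 vs. 0.9860) is consistent with that group having one fewer observation after the outlier was removed.

```r
# Rebuild the 30-second vs. control Tukey interval from printed values
diff_30 <- -2.678842  # Tukey "diff" (same as the lm coefficient)
se_30   <- 0.9860     # Std. Error of that coefficient in summary(second_model)

# Studentized range critical value for 3 group means and 176 residual DF
q_crit     <- qtukey(0.95, nmeans = 3, df = 176)
half_width <- q_crit / sqrt(2) * se_30

interval <- c(lwr = diff_30 - half_width, upr = diff_30 + half_width)
print(round(interval, 4))
```

The result should agree with the printed lwr and upr values up to rounding of the standard error.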

Build the Figure That Best Tells the Story of Your Model
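A boxplot like the one the caption below describes can be sketched in base R as follows. Simulated toy data (hypothetical values) stand in for CorrectedDF, and the readable group labels and the beats-per-minute axis unit are assumptions, since the report does not state units. Group means are overlaid because the ANOVA compares means, while a boxplot shows medians.

```r
# Boxplot of heart rate by shower type, drawn with base R graphics.
# Simulated toy data (hypothetical values) stand in for CorrectedDF.
set.seed(4)
raw_levels <- c("Normal.Shower..Control.", "X30.Second.Shower", "X90.Second.Shower")
toy <- data.frame(
  ShowerType = rep(raw_levels, each = 20),
  HeartRate  = c(rnorm(20, 82, 5), rnorm(20, 79, 5), rnorm(20, 76, 5)),
  stringsAsFactors = FALSE
)

# Order the boxes by length of cold exposure and give them readable labels
toy$ShowerType <- factor(toy$ShowerType, levels = raw_levels,
                         labels = c("Normal (control)", "30-second cold", "90-second cold"))

box_stats <- boxplot(HeartRate ~ ShowerType, data = toy,
                     xlab = "Shower type",
                     ylab = "Heart rate (beats per minute, assumed unit)",
                     main = "Heart rate by shower type")

# Overlay each group's mean as a red diamond
points(seq_along(levels(toy$ShowerType)),
       tapply(toy$HeartRate, toy$ShowerType, mean),
       pch = 18, col = "red", cex = 1.5)
```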

Figure Caption

The ANOVA and the boxplot indicated that mean heart rate differed among the shower types. The overall F-test (F = 18.58 on 2 and 176 degrees of freedom, p = 4.795e-08) rejected the null hypothesis, and the Tukey test showed that every pairwise comparison was significant at the 0.05 level: 30-second vs. normal shower (adjusted p = 0.0197), 90-second vs. normal shower (adjusted p printed as 0.0000000, i.e., below the displayed precision), and 90-second vs. 30-second shower (adjusted p = 0.0026). The largest difference was between the normal shower and the 90-second shower, which is visualized in the boxplot above. The model coefficients estimated a mean heart rate of about 81.9 after a normal shower, about 2.7 lower after a 30-second cold shower, and about 6.0 lower after a 90-second cold shower, so heart rate dropped as cold exposure lengthened. Given the very small p-value, shower type appears to be a significant predictor of a lower heart rate an hour after the shower. However, the R^2 value of 0.1743 means the model explains only about 17% of the variance in heart rate, leaving a substantial amount of unexplained variability and uncertainty. To summarize, there is a significant difference in heart rate among the different types of cold (or warm) showers, but further tests are needed to know whether the difference can be attributed to the cold showers themselves.