Term Project Description

This document is a final project for the UCSD Course BILD 5. My project builds upon the study referenced below. I have proposed a potential follow-up experiment to extend the study’s findings. The project utilizes simulated data, structured and analyzed as if collected from a real experiment. While the outcomes do not provide scientific evidence, they showcase my skills in formulating and addressing relevant biological questions through statistics, experimental design, and programming.

Article Citation

Buijze, G. A., Sierevelt, I. N., van der Heijden, B. C., Dijkgraaf, M. G., & Frings-Dresen, M. H. (2016). The Effect of Cold Showering on Health and Work: A Randomized Controlled Trial. PloS one, 11(9), e0161749. https://doi.org/10.1371/journal.pone.0161749

Research Question

My research question for my final project is: does taking longer cold showers cause a decrease in adult body weight?

Null Hypothesis

My null hypothesis for my final project is: the duration of cold showers has no effect on adult body weight.

Alternative Hypothesis

My alternative hypothesis for my final project is: taking longer cold showers results in a decrease in adult body weight.

Test Performed

I perform a linear regression for my final project because both variables are continuous and I want to predict the dependent variable (weight) from the independent variable (duration of a cold shower).
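
Written out, the regression model and the two hypotheses could look like the block below. The symbols \beta_0, \beta_1, \varepsilon_i, and \sigma are notation introduced here for illustration, and the normal-error term is the standard linear regression assumption rather than something stated in this report. The `\text` command assumes the amsmath package.

```latex
\[
\text{Weight}_i = \beta_0 + \beta_1\,\text{Duration}_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
\]
\[
H_0:\ \beta_1 = 0
\qquad \text{versus} \qquad
H_A:\ \beta_1 < 0
\]
```

Under this notation, the null hypothesis says the slope of weight on duration is zero, and the alternative says the slope is negative.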

Import and Tidy the Data

## 'data.frame':    37 obs. of  2 variables:
##  $ Duration: num  12.68 13.86 10.82 13.69 6.63 ...
##  $ Weight  : num  179 179 180 178 175 ...

Exploratory Data Analysis

After checking that the data is tidy, I want to test the normality of the x and y variables (duration and weight, respectively). The normality of each variable can be checked first by making a histogram and then by running a KS test. If both columns of data pass the normality checks (p-value > 0.05), then I can go on to test the null hypothesis of my research question. If a variable fails the normality test, then I need to transform my data to fix any skew or remove any outliers.

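# One-sample KS test of Duration against a normal distribution
# with the sample mean and standard deviation
KS.Results.DUR <- ks.test(TIDY_DF$Duration,
                          'pnorm',
                          mean = mean(TIDY_DF$Duration),
                          sd = sd(TIDY_DF$Duration))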
KS.Results.DUR
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  TIDY_DF$Duration
## D = 0.11645, p-value = 0.6546
## alternative hypothesis: two-sided
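# One-sample KS test of Weight against a normal distribution
# with the sample mean and standard deviation
KS.Results.WEI <- ks.test(TIDY_DF$Weight,
                          'pnorm',
                          mean = mean(TIDY_DF$Weight),
                          sd = sd(TIDY_DF$Weight))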
KS.Results.WEI
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  TIDY_DF$Weight
## D = 0.10882, p-value = 0.7327
## alternative hypothesis: two-sided
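
The histogram check described above could be sketched as follows. This is only an illustration: `demo_df` is a hypothetical stand-in for `TIDY_DF`, and its means and standard deviations are assumptions, not the project's values. The sketch also runs a Shapiro-Wilk test as a second opinion, which is a different test from the KS test used above. It is included because the R documentation for `ks.test` notes that the distribution parameters should be pre-specified rather than estimated from the same data.

```r
# Hypothetical stand-in data: TIDY_DF is not reproduced here, so the
# means and standard deviations below are assumptions for illustration.
set.seed(5)
demo_df <- data.frame(
  Duration = rnorm(37, mean = 11, sd = 2.5),
  Weight   = rnorm(37, mean = 181, sd = 8.5)
)

# First check: a histogram of each variable.
hist_dur <- hist(demo_df$Duration,
                 main = "Cold shower duration", xlab = "Duration")
hist_wei <- hist(demo_df$Weight,
                 main = "Body weight", xlab = "Weight")

# Second check: Shapiro-Wilk test of normality for each variable
# (an alternative to the KS test, not the test used in this report).
sw_dur <- shapiro.test(demo_df$Duration)
sw_wei <- shapiro.test(demo_df$Weight)

sw_dur$p.value
sw_wei$p.value
```

As with the KS test, a p-value above 0.05 means the test does not detect a significant departure from normality.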

Description of Current Data and Plan for Fixes if Needed

The results from my EDA show that neither variable deviates significantly from a normal distribution: both histograms appear to be normally distributed, and the p-values of the KS tests are greater than 0.05 (0.6546 for duration and 0.7327 for weight). From this information, the null hypothesis of normality is not rejected for either variable. There is no problem with my file; I can continue with my data as is and proceed to testing the null hypothesis of my research question.

Correct Data Issues and Build Final Data Table

The data already meets the assumptions of the model, so no corrections are needed and the final data table is unchanged.

## 'data.frame':    37 obs. of  2 variables:
##  $ Duration: num  12.68 13.86 10.82 13.69 6.63 ...
##  $ Weight  : num  179 179 180 178 175 ...

Demonstrate That Your New Data Fit the Assumptions of the Model
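
One possible way to check the regression assumptions is sketched below. It is not the analysis run for this project: `demo_df` and `demo_model` are hypothetical names, and the simulated intercept, slope, and noise level are assumptions chosen only to resemble the scale of the data. The sketch looks at the residuals of the fitted model, since linear regression assumes that the residuals are roughly normal and evenly spread around zero.

```r
# Hypothetical stand-in data (assumed parameters, not the project's data).
set.seed(5)
demo_df <- data.frame(Duration = rnorm(37, mean = 11, sd = 2.5))
demo_df$Weight <- 198 - 1.5 * demo_df$Duration + rnorm(37, sd = 7.7)

# Fit the same kind of model as in the report and extract the residuals.
demo_model <- lm(Weight ~ Duration, data = demo_df)
demo_resid <- residuals(demo_model)

# Linearity and equal variance: the residuals should scatter evenly
# around the dashed zero line, with no curve or funnel shape.
plot(fitted(demo_model), demo_resid,
     xlab = "Fitted weight", ylab = "Residual")
abline(h = 0, lty = 2)

# Normality of residuals: points should follow the Q-Q reference line.
qqnorm(demo_resid)
qqline(demo_resid)

# Formal normality test on the residuals.
resid_sw <- shapiro.test(demo_resid)
resid_sw$p.value
```

Running the same steps on `ShowerModel` instead of `demo_model` would apply the check to the real model.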


Test the Null Hypothesis of Your Research Question

ShowerModel <- lm(Weight ~ Duration, data = TIDY_DF)
ShowerResults <- summary(ShowerModel)
ShowerResults
## 
## Call:
## lm(formula = Weight ~ Duration, data = TIDY_DF)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.7447  -5.9171  -0.6877   3.8532  20.8775 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  198.123      5.822  34.033  < 2e-16 ***
## Duration      -1.522      0.492  -3.094  0.00386 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.715 on 35 degrees of freedom
## Multiple R-squared:  0.2148, Adjusted R-squared:  0.1924 
## F-statistic: 9.575 on 1 and 35 DF,  p-value: 0.003865
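
The p-value that `summary()` reports for the slope is two-sided, while my alternative hypothesis is directional (a decrease in weight). As a rough sketch, the one-sided p-value and a 95% confidence interval for the slope can be recomputed from the rounded numbers printed above. The variable names are introduced here for illustration only, and the results will differ slightly from unrounded values.

```r
# Values copied (rounded) from the regression summary above.
slope    <- -1.522
slope_se <- 0.492
t_value  <- -3.094
df_resid <- 35

# Two-sided p-value, the kind reported by summary() for an lm fit.
p_two_sided <- 2 * pt(-abs(t_value), df = df_resid)

# One-sided p-value for the directional alternative (slope < 0).
p_one_sided <- pt(t_value, df = df_resid)

# 95% confidence interval for the slope: estimate +/- t critical value * SE.
ci_95 <- slope + c(-1, 1) * qt(0.975, df = df_resid) * slope_se

p_two_sided
p_one_sided
ci_95
```

Because the estimated slope is negative, the one-sided p-value is half of the two-sided one, and the whole confidence interval lies below zero.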

Build the Figure That Best Tells the Story of Your Model

## `geom_smooth()` using formula = 'y ~ x'
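
A figure of the kind described in the caption below could be built roughly as follows. This is a sketch rather than the original plotting code: `demo_df` is hypothetical simulated data with assumed parameters, and the colors, point size, and opacity values are illustrative choices. The axis units (minutes and pounds) are taken from the caption.

```r
library(ggplot2)

# Hypothetical stand-in data (assumed parameters, not the project's data).
set.seed(5)
demo_df <- data.frame(Duration = rnorm(37, mean = 11, sd = 2.5))
demo_df$Weight <- 198 - 1.5 * demo_df$Duration + rnorm(37, sd = 7.7)

shower_plot <- ggplot(demo_df, aes(x = Duration, y = Weight)) +
  # Data points: color, size, and opacity (alpha) are set explicitly.
  geom_point(color = "steelblue", size = 3, alpha = 0.7) +
  # Fitted regression line with its confidence band.
  geom_smooth(method = "lm", formula = y ~ x,
              color = "firebrick", fill = "grey70", alpha = 0.3) +
  labs(x = "Cold shower duration (minutes)",
       y = "Body weight (pounds)") +
  theme_classic()

shower_plot
```

Passing `formula = y ~ x` explicitly to `geom_smooth()` avoids the message printed above.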

Figure Caption

This figure shows that, in this simulated data set, individuals who take longer cold showers weigh less. The multiple R^2 value is 0.2148, which indicates that only about 21.5% of the variation in weight is explained by the model. However, the p-value of the model is 0.003865, indicating that the relationship is statistically significant and the null hypothesis can be rejected. Furthermore, the slope is negative (-1.522), indicating a negative relationship between the length of a cold shower and the weight of an individual: each additional minute of cold showering corresponds to roughly 1.5 pounds less body weight. If real, this finding would suggest that longer cold showers reduce body weight. Since cold showering would then have possible weight-reduction benefits in addition to assisting in the prevention of minor ailments, as the original paper that inspired this study found, other ways in which cold showers could benefit an individual’s overall health should be explored.

To build the figure, I copied and pasted the skeleton of the ggplot function from a different section assignment, and I asked ChatGPT for some ways to improve the graph. Specifically, I learned to color-code different parts of the graph, change the opacity of certain elements, and change the size of the dots representing the data.