2024-10-20

Context

Sleep is one of the most fundamental parts of human health and is widely recognized as central to well-being. At the same time, coffee and caffeine are consumed by the vast majority of Americans.

I’ll be using a dataset of 137 survey responses covering how much coffee each participant drinks, when they drink it, and where they drink it. From these responses, I’ll look at how coffee habits relate to sleep and present the results as graphs and plots.

Overview Plot

Each point represents a participant, positioned by daily coffee intake, hours of sleep, and whether they drink coffee after 5 PM, allowing us to visualize potential correlations among these three variables.

Location and Consumption

This bar chart illustrates the frequency of coffee consumption at different locations.

Location and Sleep

This box plot compares the distribution of sleep duration across different coffee consumption locations.
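A box plot like the one described here can be sketched in base R as follows. This is only an illustration: the data frame `box_toy` is made-up toy data, not the survey responses, and the column name `Location` is an assumption, since the dataset's actual location column is not shown in this presentation.

```r
# Toy data for illustration only; NOT the survey responses.
# `Location` is a hypothetical column name.
box_toy <- data.frame(
  Location  = c("Home", "Home", "Home", "Work", "Work", "Work", "Cafe", "Cafe", "Cafe"),
  Hrs_sleep = c(8, 7, 7.5, 6, 6.5, 7, 7, 8, 6)
)

# Median sleep per location: the centre line of each box
median_sleep <- tapply(box_toy$Hrs_sleep, box_toy$Location, median)
print(median_sleep)

# One box per location, showing the spread of sleep duration
boxplot(Hrs_sleep ~ Location, data = box_toy,
        xlab = "Coffee consumption location", ylab = "Hours of sleep")
```

Printing the medians alongside the plot makes it easier to compare groups whose boxes overlap.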

Location and Time

This bar chart shows the number of participants who consume coffee after 5 PM, segmented by location.
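The counts behind a chart like this could be produced by filtering to late-day drinkers and tabulating by location. The sketch below uses made-up toy data; `Location` is an assumed column name, while `After_5PM` (1 if yes) follows the coding used in the plotting code later in this presentation.

```r
# Toy data for illustration only; NOT the survey responses.
# `Location` is a hypothetical column name.
loc_toy <- data.frame(
  Location  = c("Home", "Home", "Work", "Cafe", "Work", "Home", "Cafe", "Work"),
  After_5PM = c(1, 0, 1, 1, 0, 1, 0, 0)
)

# Keep only participants who drink coffee after 5 PM, then count per location
late_counts <- table(loc_toy$Location[loc_toy$After_5PM == 1])
print(late_counts)

# Bar chart of those counts
barplot(late_counts, xlab = "Location",
        ylab = "Participants drinking coffee after 5 PM")
```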

Correlation Coefficient

The Pearson correlation coefficient \(r\) between daily coffee consumption and hours of sleep is calculated as follows:

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]

where \(x\) represents daily coffee intake and \(y\) represents hours of sleep.

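The result of the Pearson correlation is:

## [1] -0.1711758

This indicates a weak negative association: participants who drink more coffee per day tend to sleep slightly less, but coffee intake explains only about 3% of the variation in sleep (\(r^2 \approx 0.03\)).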
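As a sanity check on the formula, the sketch below computes \(r\) term by term and compares it with R's built-in `cor()`. The data frame `coffee_toy` is made-up toy data for illustration, not the survey responses, so its value of \(r\) will differ from the result reported here. The column names `Daily_Coffee` and `Hrs_sleep` match those used in the plotting code later in this presentation.

```r
# Toy data for illustration only; NOT the survey responses.
coffee_toy <- data.frame(
  Daily_Coffee = c(0, 1, 1, 2, 2, 3, 4, 5),
  Hrs_sleep    = c(8, 7.5, 8, 7, 6.5, 7, 6, 6.5)
)

x <- coffee_toy$Daily_Coffee
y <- coffee_toy$Hrs_sleep

# Direct translation of the formula for r
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Built-in equivalent; "pearson" is the default method
r_builtin <- cor(x, y, method = "pearson")

print(c(manual = r_manual, builtin = r_builtin))
```

The two values agree up to floating-point error, which confirms that `cor()` implements the same formula.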

Hypothesis Testing: T-Test

We conduct a t-test to determine if there is a significant difference in sleep duration between those who consume coffee after 5 PM and those who do not:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

where \(\bar{X}_1\) and \(\bar{X}_2\) are the mean sleep hours for each group, \(s_1^2\) and \(s_2^2\) are the variances, and \(n_1\) and \(n_2\) are the sample sizes.

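The result of our t-test is:

##        t 
## 1.180372

A statistic of this size falls short of the roughly 1.96 needed for two-sided significance at the 5% level, so the data do not show a significant difference in sleep duration between the two groups.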
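The statistic above has the unequal-variance (Welch) form, which is also what R's `t.test()` computes by default. The sketch below computes it by hand and compares it with `t.test()`. The data frame `sleep_toy` is made-up toy data, not the survey responses, and the ordering of the groups (those who do not drink coffee after 5 PM first) is an assumption that only affects the sign of \(t\).

```r
# Toy data for illustration only; NOT the survey responses.
sleep_toy <- data.frame(
  Hrs_sleep = c(8, 7.5, 7, 8, 6.5, 7, 6, 6.5, 7.5, 6),
  After_5PM = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
)

no_late <- sleep_toy$Hrs_sleep[sleep_toy$After_5PM == 0]
late    <- sleep_toy$Hrs_sleep[sleep_toy$After_5PM == 1]

# Direct translation of the formula: difference in means over its standard error
t_manual <- (mean(no_late) - mean(late)) /
  sqrt(var(no_late) / length(no_late) + var(late) / length(late))

# Built-in equivalent; var.equal = FALSE (Welch) is the default
welch <- t.test(Hrs_sleep ~ After_5PM, data = sleep_toy)
t_builtin <- unname(welch$statistic)

print(c(manual = t_manual, builtin = t_builtin))
print(welch$p.value)  # two-sided p-value for the difference in means
```

Reading the p-value from `t.test()` avoids looking up a critical value by hand, since Welch's test uses non-integer degrees of freedom.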

Code used for the first 3D plot

Below is the R code that generated the 3D scatter plot visualizing coffee intake, sleep duration, and late-day consumption, seen earlier in the presentation.

library(plotly)  # provides plot_ly(), layout(), and the %>% pipe
# One marker per participant; After_5PM (1 if yes) sets both the z position and the color
plot_ly(data = coffee_data,
        x = ~Daily_Coffee, y = ~Hrs_sleep, z = ~After_5PM,
        color = ~After_5PM, symbols = ~After_5PM,
        type = "scatter3d", mode = "markers") %>%
  layout(title = "Coffee Intake, Sleep Duration, and Late-Day Consumption",
         scene = list(xaxis = list(title = 'Daily Coffee'),
                      yaxis = list(title = 'Hours of Sleep'),
                      zaxis = list(title = 'Coffee After 5 PM (1 if Yes)')))