In this report, we explored how different forms of exercise affect sleep habits. Participants were divided into four exercise groups (None, Cardio, Weights, and Cardio+Weights). Their average hours of sleep were measured before and after the experiment, and their sleep efficiency was measured at the end of the experiment. We used descriptive statistics, visualizations, t-tests, and one-way ANOVAs to determine which type of exercise best improves sleep.
getwd()
## [1] "/Users/tiff/Desktop/R class"
list.files()
## [1] "Assignment #1.R"
## [2] "Continuing the Law Firm Analysis (Part 2).R"
## [3] "Midterm Practice Data.xlsx"
## [4] "midterm_sleep_exercise.xlsx"
## [5] "R class.Rproj"
## [6] "rsconnect"
## [7] "Week 3 Homework.R"
## [8] "XW-Continuing the Law Firm Analysis (Part 2).Rmd"
## [9] "XW-Continuing-the-Law-Firm-Analysis--Part-2-.html"
## [10] "XW-Midterm Script.r"
## [11] "XW-Midterm-Exercise & Sleep.Rmd"
## [12] "XW-Midterm-Exercise---Sleep.html"
## [13] "XW-Midterm-Exercise---Sleep.Rmd"
## [14] "xw-practice midterm.r"
## [15] "XW-R Script to R Markdown Report Assignment.Rmd"
## [16] "XW-R-Script-to-R-Markdown-Report-Assignment.html"
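# Packages for the functions used below (assumed; the setup chunk is not shown in this report)
library(tidyverse)  # dplyr, stringr, ggplot2
library(readxl)     # excel_sheets(), read_xlsx()
library(mosaic)     # favstats()
library(knitr)      # kable()
library(supernova)  # supernova()
excel_sheets("midterm_sleep_exercise.xlsx")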
## [1] "participant_info_midterm" "sleep_data_midterm"
participant_info_midterm <- read_xlsx("midterm_sleep_exercise.xlsx", sheet="participant_info_midterm")
sleep_data_midterm <- read_xlsx("midterm_sleep_exercise.xlsx", sheet="sleep_data_midterm")
glimpse(participant_info_midterm)
## Rows: 100
## Columns: 4
## $ ID <chr> "P001", "P002", "P003", "P004", "P005", "P006", "P007",…
## $ Exercise_Group <chr> "NONE", "Nonee", "None", "None", "None", "None", "None"…
## $ Sex <chr> "Male", "Malee", "Female", "Female", "Male", "Female", …
## $ Age <dbl> 35, 57, 26, 29, 33, 33, 32, 30, 37, 28, 30, 20, 42, 31,…
glimpse(sleep_data_midterm)
## Rows: 100
## Columns: 4
## $ ID <chr> "P001", "P002", "P003", "P004", "P005", "P006", "P007…
## $ Pre_Sleep <chr> "zzz-5.8", "Sleep-6.6", NA, "SLEEP-7.2", "score-7.4",…
## $ Post_Sleep <dbl> 4.7, 7.4, 6.2, 7.3, 7.4, 7.1, 6.7, 9.0, 5.1, 6.3, 6.2…
## $ Sleep_Efficiency <dbl> 81.6, 75.7, 82.9, 83.6, 83.5, 88.5, 83.6, 73.4, 88.2,…
We imported the Excel file midterm_sleep_exercise.xlsx, which contained two sheets: participant information (participant_info_midterm) and sleep data (sleep_data_midterm). We used the read_xlsx() function from the readxl package to load each sheet and used glimpse() to preview their structures.
The dataset contains 100 participants. The participant_info_midterm dataset includes ID, Exercise_Group, Sex, and Age, while the sleep_data_midterm dataset contains ID, Pre_Sleep, Post_Sleep, and Sleep_Efficiency.
colnames(participant_info_midterm)
## [1] "ID" "Exercise_Group" "Sex" "Age"
unique(participant_info_midterm$Exercise_Group)
## [1] "NONE" "Nonee" "None" "N"
## [5] "Cardio" "C" "WEIGHTZ" "WEIGHTS"
## [9] "WEIGHTSSS" "Cardio+Weights" "CW" "C+W"
participant_info_midterm <- participant_info_midterm %>%
  mutate(Exercise_Group = case_when(
    Exercise_Group %in% c("NONE", "Nonee", "N") ~ "None",
    Exercise_Group == "C" ~ "Cardio",
    Exercise_Group %in% c("WEIGHTZ", "WEIGHTS", "WEIGHTSSS") ~ "Weights",
    Exercise_Group %in% c("CW", "C+W") ~ "Cardio+Weights",
    TRUE ~ Exercise_Group))
unique(participant_info_midterm$Sex)
## [1] "Male" "Malee" "Female" "Femalee" "F" "M" "Fem"
## [8] "MALE" "Mal"
participant_info_midterm <- participant_info_midterm %>%
  mutate(Sex = case_when(
    Sex %in% c("Malee", "MALE", "Mal", "M") ~ "Male",
    Sex %in% c("Femalee", "Fem", "F") ~ "Female",
    TRUE ~ Sex))
midterm_data_combined <- merge(participant_info_midterm, sleep_data_midterm, by = "ID")
Once we successfully imported the data, we standardized the labels in the Exercise_Group and Sex columns so that each category has a single consistent value (e.g., “C” -> “Cardio”; “WEIGHTZ”, “WEIGHTSSS” -> “Weights”; “M”, “MALE” -> “Male”). These adjustments prevent grouping errors during analysis.
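The recoding step can also be written as a lookup table followed by a guard that fails if any unexpected label survives. The following is only a minimal base-R sketch on a toy vector that reuses the raw labels printed by unique() above; the names `group_lookup`, `clean_group`, and `expected` are hypothetical and not part of the original script.

```r
# Toy vector reusing the raw Exercise_Group labels printed by unique() above
raw_group <- c("NONE", "Nonee", "None", "N", "Cardio", "C",
               "WEIGHTZ", "WEIGHTS", "WEIGHTSSS", "Cardio+Weights", "CW", "C+W")

# Hypothetical lookup table: raw label -> standardized label
group_lookup <- c(
  "NONE" = "None", "Nonee" = "None", "N" = "None",
  "C" = "Cardio",
  "WEIGHTZ" = "Weights", "WEIGHTS" = "Weights", "WEIGHTSSS" = "Weights",
  "CW" = "Cardio+Weights", "C+W" = "Cardio+Weights"
)

# Recode labels found in the lookup; leave already-clean labels unchanged
clean_group <- ifelse(raw_group %in% names(group_lookup),
                      unname(group_lookup[raw_group]),
                      raw_group)

# Guard: stop if any label falls outside the four expected groups
expected <- c("None", "Cardio", "Weights", "Cardio+Weights")
stopifnot(all(clean_group %in% expected))

# Count how many raw labels collapsed into each standardized group
table(clean_group)
```

A guard like this makes a new misspelling in a future data file fail loudly instead of silently forming a fifth group.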
After cleaning these columns, we merged the two datasets by the ID column to create a single dataset (midterm_data_combined) containing each participant’s demographics and sleep data. The merged dataset contains ID, Exercise_Group, Sex, Age, Pre_Sleep, Post_Sleep, and Sleep_Efficiency.
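Because merge() keeps only IDs that appear in both tables by default, it can be worth confirming that no participant is dropped or duplicated. The sketch below uses small toy data frames (`info`, `sleep`) as stand-ins for the two sheets; the values are illustrative only and the object names are hypothetical.

```r
# Toy stand-ins for the two sheets (illustrative values, not the real data)
info  <- data.frame(ID = c("P001", "P002", "P003"),
                    Sex = c("Male", "Male", "Female"))
sleep <- data.frame(ID = c("P001", "P002", "P003"),
                    Post_Sleep = c(4.7, 7.4, 6.2))

# Each ID should appear exactly once per sheet
stopifnot(anyDuplicated(info$ID) == 0, anyDuplicated(sleep$ID) == 0)

# IDs present in one sheet but not the other would be dropped by merge()
unmatched <- union(setdiff(info$ID, sleep$ID), setdiff(sleep$ID, info$ID))

# Inner join on ID, as in the report
merged <- merge(info, sleep, by = "ID")

# The merge should neither lose nor duplicate participants
stopifnot(length(unmatched) == 0, nrow(merged) == nrow(info))
```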
colnames(midterm_data_combined)
## [1] "ID" "Exercise_Group" "Sex" "Age"
## [5] "Pre_Sleep" "Post_Sleep" "Sleep_Efficiency"
glimpse(midterm_data_combined)
## Rows: 100
## Columns: 7
## $ ID <chr> "P001", "P002", "P003", "P004", "P005", "P006", "P007…
## $ Exercise_Group <chr> "None", "None", "None", "None", "None", "None", "None…
## $ Sex <chr> "Male", "Male", "Female", "Female", "Male", "Female",…
## $ Age <dbl> 35, 57, 26, 29, 33, 33, 32, 30, 37, 28, 30, 20, 42, 3…
## $ Pre_Sleep <chr> "zzz-5.8", "Sleep-6.6", NA, "SLEEP-7.2", "score-7.4",…
## $ Post_Sleep <dbl> 4.7, 7.4, 6.2, 7.3, 7.4, 7.1, 6.7, 9.0, 5.1, 6.3, 6.2…
## $ Sleep_Efficiency <dbl> 81.6, 75.7, 82.9, 83.6, 83.5, 88.5, 83.6, 73.4, 88.2,…
midterm_data_combined <- midterm_data_combined %>%
  mutate(Pre_Sleep = str_extract(Pre_Sleep, "\\d+\\.?\\d*"),
         Pre_Sleep = as.numeric(Pre_Sleep))
midterm_data_combined <- midterm_data_combined %>%
  mutate(Sleep_Difference = Post_Sleep - Pre_Sleep)
midterm_data_combined <- midterm_data_combined %>%
  mutate(AgeGroup2 = case_when(
    Age < 40 ~ "<40",
    Age >= 40 ~ ">=40"))
sum(is.na(midterm_data_combined$Sleep_Difference))
## [1] 14
midterm_data_combined <- midterm_data_combined %>% filter(!is.na(Sleep_Difference))
After cleaning and merging, we created new variables to prepare the dataset for analysis. First, we cleaned the Pre_Sleep variable by extracting the numeric part of each entry (e.g., “zzz-5.8” -> 5.8) and converting it to numeric format. Then, we created a new variable, Sleep_Difference, which represents the change in average hours of sleep from before to after the experiment (Post_Sleep - Pre_Sleep).
We also created a new variable, AgeGroup2, to separate participants into two age groups: those under 40 (<40) and those 40 or older (>=40). Lastly, we removed the 14 rows with a missing Sleep_Difference so that every analysis uses the same complete cases. The dataset now has 86 rows.
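To illustrate what the extraction pattern does, here is a small base-R sketch applied to the first five rows shown in the glimpse() output. It uses regexpr() and regmatches() as a stand-in for str_extract(), which is a swap made only to keep the example free of package dependencies; the object names are hypothetical.

```r
# First five rows as shown in the glimpse() output above
raw_pre    <- c("zzz-5.8", "Sleep-6.6", NA, "SLEEP-7.2", "score-7.4")
post_sleep <- c(4.7, 7.4, 6.2, 7.3, 7.4)
age        <- c(35, 57, 26, 29, 33)

# Same pattern as in the report: digits, an optional dot, then more digits
pattern <- "\\d+\\.?\\d*"

# Base-R stand-in for str_extract(): first match per element, NA if none
m <- regexpr(pattern, raw_pre, perl = TRUE)
pre_sleep <- rep(NA_real_, length(raw_pre))
pre_sleep[!is.na(m) & m > 0] <- as.numeric(regmatches(raw_pre, m))

# Change in sleep (hours); NA wherever Pre_Sleep is missing
sleep_difference <- post_sleep - pre_sleep

# Two age groups split at 40
age_group <- ifelse(age < 40, "<40", ">=40")

# Number of rows that would be dropped by the is.na() filter
sum(is.na(sleep_difference))
```

Note that the hyphen in entries such as “zzz-5.8” is treated as a separator rather than a minus sign, because the pattern matches digits only.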
sleep_summary <- rbind(
  cbind(Variable = "Sleep_Difference",
        favstats(~ Sleep_Difference, data = midterm_data_combined)),
  cbind(Variable = "Sleep_Efficiency",
        favstats(~ Sleep_Efficiency, data = midterm_data_combined)))
kable(sleep_summary)
| Variable | min | Q1 | median | Q3 | max | mean | sd | n | missing |
|---|---|---|---|---|---|---|---|---|---|
| Sleep_Difference | -1.1 | 0.300 | 0.75 | 1.100 | 2.1 | 0.6825581 | 0.6610494 | 86 | 0 |
| Sleep_Efficiency | 71.7 | 79.975 | 83.30 | 88.425 | 101.5 | 83.7755814 | 5.9738043 | 86 | 0 |
Overall, participants had a mean Sleep_Difference of 0.68 hours (SD = 0.66), indicating a small average increase in sleep from before to after the experiment. Sleep_Efficiency had a mean of 83.78% (SD = 5.97), suggesting fairly high efficiency across participants.
exercise_sleep_summary <- rbind(
  cbind(Variable = "Sleep_Difference",
        favstats(Sleep_Difference ~ Exercise_Group, data = midterm_data_combined)),
  cbind(Variable = "Sleep_Efficiency",
        favstats(Sleep_Efficiency ~ Exercise_Group, data = midterm_data_combined))) %>%
  arrange(desc(mean))
kable(exercise_sleep_summary)
| Variable | Exercise_Group | min | Q1 | median | Q3 | max | mean | sd | n | missing |
|---|---|---|---|---|---|---|---|---|---|---|
| Sleep_Efficiency | Cardio+Weights | 74.5 | 83.50 | 88.7 | 90.5 | 96.3 | 86.8347826 | 5.9803169 | 23 | 0 |
| Sleep_Efficiency | Cardio | 75.9 | 81.30 | 85.5 | 88.0 | 101.5 | 85.4476190 | 5.9916291 | 21 | 0 |
| Sleep_Efficiency | Weights | 74.8 | 77.90 | 80.8 | 83.6 | 89.5 | 81.4571429 | 4.3113306 | 21 | 0 |
| Sleep_Efficiency | None | 71.7 | 76.60 | 81.5 | 83.6 | 90.4 | 81.0714286 | 5.5514992 | 21 | 0 |
| Sleep_Difference | Cardio | 0.3 | 0.70 | 1.2 | 1.4 | 2.1 | 1.1380952 | 0.4852589 | 21 | 0 |
| Sleep_Difference | Cardio+Weights | -0.1 | 0.65 | 0.9 | 1.1 | 1.5 | 0.8608696 | 0.3822649 | 23 | 0 |
| Sleep_Difference | Weights | -0.7 | 0.30 | 0.5 | 1.1 | 1.8 | 0.6666667 | 0.6126445 | 21 | 0 |
| Sleep_Difference | None | -1.1 | -0.40 | 0.1 | 0.6 | 0.9 | 0.0476190 | 0.6384505 | 21 | 0 |
Comparing group means by Exercise_Group, the Cardio+Weights (M = 86.83%) and Cardio (M = 85.45%) groups had the highest average sleep efficiency, while the None group had the lowest (M = 81.07%). Similarly, for Sleep_Difference, the Cardio group showed the largest average increase in sleep (M = 1.14 hours), followed by Cardio+Weights (M = 0.86 hours) and Weights (M = 0.67 hours), with the None group showing minimal change (M = 0.05 hours). These descriptive results suggest that cardio-based exercise may contribute to greater improvement in both sleep duration and efficiency.
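As a quick consistency check, the group means in the table should reproduce the overall means reported earlier when weighted by group size. The sketch below re-enters the group sizes and means from the table by hand; the data frame name `group_stats` and its column names are hypothetical.

```r
# Group sizes and means copied from the group-wise summary table above
group_stats <- data.frame(
  Exercise_Group  = c("Cardio+Weights", "Cardio", "Weights", "None"),
  n               = c(23, 21, 21, 21),
  mean_difference = c(0.8608696, 1.1380952, 0.6666667, 0.0476190),
  mean_efficiency = c(86.8347826, 85.4476190, 81.4571429, 81.0714286)
)

# Size-weighted average of the group means equals the overall mean
overall_difference <- with(group_stats, weighted.mean(mean_difference, n))
overall_efficiency <- with(group_stats, weighted.mean(mean_efficiency, n))

# Compare with the overall means in the first summary table (0.68 and 83.78)
round(c(overall_difference, overall_efficiency), 2)
```

Agreement between the two tables indicates that both summaries were computed on the same 86 complete cases.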
midterm_data_combined$Exercise_Group <- factor(
  midterm_data_combined$Exercise_Group,
  levels = c("None", "Cardio", "Weights", "Cardio+Weights"))
ggplot(midterm_data_combined, aes(x = Exercise_Group, y = Sleep_Difference)) +
  geom_boxplot(fill = "lightblue", color = "black") +
  labs(title = "Sleep difference by exercise group",
       x = "Type of Exercise",
       y = "Sleep Difference (hours)") +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(size = 17, family = "serif", face = "bold"),
    axis.title.x = element_text(size = 12, family = "serif"),
    axis.title.y = element_text(size = 12, family = "serif")
  )
The boxplot above shows how Sleep_Difference varied across the four exercise groups. Participants in the Cardio group showed the greatest improvement in sleep duration, followed by the Cardio+Weights group. The Weights group showed a more modest improvement, while the None group had the lowest and most spread-out values, meaning its sleep changes were smaller and less consistent. Overall, the plot suggests that cardio-based exercise is associated with the largest gains in sleep duration.
ggplot(midterm_data_combined, aes(x = Exercise_Group, y = Sleep_Efficiency)) +
  geom_boxplot(fill = "cadetblue", color = "black") +
  labs(title = "Sleep efficiency by exercise group",
       x = "Type of Exercise",
       y = "Sleep Efficiency") +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(size = 17, family = "serif", face = "bold"),
    axis.title.x = element_text(size = 12, family = "serif"),
    axis.title.y = element_text(size = 12, family = "serif")
  )
The boxplot above shows how Sleep_Efficiency varied across the four exercise groups. Participants in the Cardio+Weights group showed the highest sleep efficiency, followed by the Cardio group. The Weights and None groups had similarly lower efficiency (M = 81.46% and M = 81.07%, respectively), with the None group lowest on average. Overall, the plot suggests that cardio-based or combined exercise is associated with the highest sleep efficiency.
ggplot(midterm_data_combined, aes(x = Sleep_Efficiency, y = Sleep_Difference)) +
  geom_point(color = "steelblue") +
  geom_smooth(method = "lm", color = "darkblue") +
  labs(title = "Relationship between sleep efficiency and sleep difference",
       x = "Sleep Efficiency",
       y = "Sleep Difference (hours)") +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(size = 17, family = "serif", face = "bold"),
    axis.title.x = element_text(size = 12, family = "serif"),
    axis.title.y = element_text(size = 12, family = "serif")
  )
## `geom_smooth()` using formula = 'y ~ x'
The scatterplot above shows the relationship between Sleep_Efficiency and Sleep_Difference. The slight upward trend suggests that participants with higher sleep efficiency also tended to show greater improvements in sleep duration. However, the points are relatively spread out, meaning the relationship is weak. Overall, this suggests that while better sleep efficiency may be linked to longer sleep, the connection isn’t very strong or consistent across participants.
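One way to put a number on how weak the relationship is would be a Pearson correlation with its confidence interval. The sketch below runs cor.test() on simulated data, since the raw values are not reproduced in this report; the means, standard deviations, and slope are made-up placeholders, so the output says nothing about the actual dataset.

```r
# Simulated placeholder data (NOT the study data); parameters are assumptions
set.seed(1)
efficiency <- rnorm(86, mean = 84, sd = 6)
difference <- 0.02 * efficiency - 1 + rnorm(86, sd = 0.65)

# Pearson correlation with a 95% confidence interval
ct <- cor.test(efficiency, difference)

ct$estimate  # correlation coefficient r
ct$conf.int  # 95% confidence interval for r
ct$p.value   # test of H0: r = 0
```

On the real data, the same call would be `cor.test(~ Sleep_Efficiency + Sleep_Difference, data = midterm_data_combined)`.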
sex_t_test1 <- t.test(Sleep_Difference ~ Sex, data = midterm_data_combined)
sex_t_test1
##
## Welch Two Sample t-test
##
## data: Sleep_Difference by Sex
## t = 1.5801, df = 77.647, p-value = 0.1182
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
## -0.05865017 0.50972574
## sample estimates:
## mean in group Female mean in group Male
## 0.7795918 0.5540541
Based on the t-test comparing sleep differences between the two sex groups, females (M = 0.78) had a slightly higher average sleep difference than males (M = 0.55). However, this difference was not statistically significant, t(77.65) = 1.58, p = .118, 95% CI [-0.06, 0.51]. Because the p-value is greater than the 0.05 significance level, we fail to reject the null hypothesis: a difference of this size could plausibly arise from sampling variability alone. Hence, there is not enough evidence to conclude that sleep change differs by sex.
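The fractional degrees of freedom (77.65) come from the Welch correction, which t.test() applies by default when var.equal is not set. The sketch below recomputes the Welch statistic, degrees of freedom, and p-value by hand on simulated data and checks them against t.test(); the group sizes, means, and standard deviations are placeholder assumptions chosen only to resemble the report.

```r
# Simulated placeholder groups (NOT the study data)
set.seed(42)
female <- rnorm(49, mean = 0.78, sd = 0.65)
male   <- rnorm(37, mean = 0.55, sd = 0.65)

# Squared standard error of each group mean
se2_f <- var(female) / length(female)
se2_m <- var(male) / length(male)

# Welch t statistic: difference in means over the combined standard error
t_manual <- (mean(female) - mean(male)) / sqrt(se2_f + se2_m)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (se2_f + se2_m)^2 /
  (se2_f^2 / (length(female) - 1) + se2_m^2 / (length(male) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df_manual)

# t.test() uses the Welch test by default (var.equal = FALSE)
res <- t.test(female, male)
all.equal(unname(res$statistic), t_manual)
all.equal(unname(res$parameter), df_manual)
all.equal(res$p.value, p_manual)
```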
Age2_t_test2 <- t.test(Sleep_Difference ~ AgeGroup2, data = midterm_data_combined)
Age2_t_test2
##
## Welch Two Sample t-test
##
## data: Sleep_Difference by AgeGroup2
## t = -1.3746, df = 36.662, p-value = 0.1776
## alternative hypothesis: true difference in means between group <40 and group >=40 is not equal to 0
## 95 percent confidence interval:
## -0.50676303 0.09717936
## sample estimates:
## mean in group <40 mean in group >=40
## 0.6373134 0.8421053
Based on the t-test comparing sleep difference between the two age groups (<40 vs. >=40), participants aged 40 and older (M = 0.84) had a slightly higher average sleep difference than those younger than 40 (M = 0.64). However, this difference was not statistically significant, t(36.66) = -1.37, p = .178. Once again, the p-value is larger than the 0.05 threshold, so there is not enough evidence to conclude that sleep change differs between the age groups.
anova_difference <- aov(Sleep_Difference ~ Exercise_Group, data = midterm_data_combined)
summary(anova_difference)
## Df Sum Sq Mean Sq F value Pr(>F)
## Exercise_Group 3 13.56 4.520 15.72 3.67e-08 ***
## Residuals 82 23.58 0.288
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
supernova(anova_difference)
## Analysis of Variance Table (Type III SS)
## Model: Sleep_Difference ~ Exercise_Group
##
## SS df MS F PRE p
## ----- --------------- | ------ -- ----- ------ ----- -----
## Model (error reduced) | 13.560 3 4.520 15.717 .3651 .0000
## Error (from model) | 23.583 82 0.288
## ----- --------------- | ------ -- ----- ------ ----- -----
## Total (empty model) | 37.144 85 0.437
TukeyHSD(anova_difference)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Sleep_Difference ~ Exercise_Group, data = midterm_data_combined)
##
## $Exercise_Group
## diff lwr upr p adj
## Cardio-None 1.0904762 0.6564482 1.52450413 0.0000000
## Weights-None 0.6190476 0.1850197 1.05307556 0.0018927
## Cardio+Weights-None 0.8132505 0.3887628 1.23773822 0.0000171
## Weights-Cardio -0.4714286 -0.9054565 -0.03740063 0.0278779
## Cardio+Weights-Cardio -0.2772257 -0.7017134 0.14726203 0.3237562
## Cardio+Weights-Weights 0.1942029 -0.2302848 0.61869060 0.6287294
The one-way ANOVA revealed a significant effect of Exercise_Group on Sleep_Difference, F(3, 82) = 15.72, p < .001. The PRE value of 0.37 suggests that Exercise_Group explained about 37% of the total variance in Sleep_Difference.
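The PRE and F values can be recovered directly from the sums of squares in the supernova table. The short sketch below re-enters those printed values by hand, so small differences from the printed F are rounding effects; the variable names are hypothetical.

```r
# Sums of squares and degrees of freedom copied from the supernova table
ss_model <- 13.560
ss_error <- 23.583
ss_total <- 37.144
df_model <- 3
df_error <- 82

# PRE: proportion of total variation explained by the model
pre <- ss_model / ss_total

# F: mean square for the model divided by mean square for error
f_value <- (ss_model / df_model) / (ss_error / df_error)

# Upper-tail p-value of the F distribution
p_value <- pf(f_value, df_model, df_error, lower.tail = FALSE)

round(pre, 4)      # about 0.3651, i.e. roughly 37% of the variance
round(f_value, 2)  # about 15.72
```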
Post-hoc Tukey tests showed that the Cardio (p < .001), Weights (p = .0019), and Cardio+Weights (p < .001) groups each had significantly greater improvements in sleep compared to the None group. The Cardio group also showed a significantly greater increase in sleep compared to the Weights group (p = .028). However, differences between Cardio+Weights vs. Cardio (p = .324) and Cardio+Weights vs. Weights (p = .629) were not statistically significant.
anova_efficiency <- aov(Sleep_Efficiency ~ Exercise_Group, data = midterm_data_combined)
summary(anova_efficiency)
## Df Sum Sq Mean Sq F value Pr(>F)
## Exercise_Group 3 540.4 180.1 5.925 0.00104 **
## Residuals 82 2492.9 30.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
supernova(anova_efficiency)
## Analysis of Variance Table (Type III SS)
## Model: Sleep_Efficiency ~ Exercise_Group
##
## SS df MS F PRE p
## ----- --------------- | -------- -- ------- ----- ----- -----
## Model (error reduced) | 540.400 3 180.133 5.925 .1782 .0010
## Error (from model) | 2492.939 82 30.402
## ----- --------------- | -------- -- ------- ----- ----- -----
## Total (empty model) | 3033.339 85 35.686
TukeyHSD(anova_efficiency)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Sleep_Efficiency ~ Exercise_Group, data = midterm_data_combined)
##
## $Exercise_Group
## diff lwr upr p adj
## Cardio-None 4.3761905 -0.08623232 8.8386133 0.0566544
## Weights-None 0.3857143 -4.07670851 4.8481371 0.9958617
## Cardio+Weights-None 5.7633540 1.39901844 10.1276896 0.0046379
## Weights-Cardio -3.9904762 -8.45289899 0.4719466 0.0962888
## Cardio+Weights-Cardio 1.3871636 -2.97717203 5.7514992 0.8383629
## Cardio+Weights-Weights 5.3776398 1.01330416 9.7419753 0.0094267
The one-way ANOVA revealed a significant effect of Exercise_Group on Sleep_Efficiency, F(3, 82) = 5.93, p = .001. The PRE value of 0.18 means that Exercise_Group explained about 18% of the total variance in Sleep_Efficiency.
Post-hoc Tukey tests showed that the Cardio+Weights group had significantly higher sleep efficiency than both the None group (p = .0046) and the Weights group (p = .0094). The difference between the Cardio and None groups did not reach significance (p = .057), although it suggests a possible trend toward better sleep efficiency in the Cardio group. The remaining pairwise comparisons, Weights vs. None (p = .996), Weights vs. Cardio (p = .096), and Cardio+Weights vs. Cardio (p = .838), were not statistically significant.
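A Tukey comparison is significant at the 5% family-wise level exactly when its 95% interval excludes zero. The sketch below re-enters the intervals from the TukeyHSD() output for Sleep_Efficiency and flags them programmatically; the data frame `tukey_eff` and its column names are hypothetical.

```r
# Intervals and adjusted p-values copied from the TukeyHSD() output above
tukey_eff <- data.frame(
  comparison = c("Cardio-None", "Weights-None", "Cardio+Weights-None",
                 "Weights-Cardio", "Cardio+Weights-Cardio",
                 "Cardio+Weights-Weights"),
  lwr   = c(-0.08623232, -4.07670851, 1.39901844,
            -8.45289899, -2.97717203, 1.01330416),
  upr   = c(8.8386133, 4.8481371, 10.1276896,
            0.4719466, 5.7514992, 9.7419753),
  p_adj = c(0.0566544, 0.9958617, 0.0046379,
            0.0962888, 0.8383629, 0.0094267)
)

# An interval that excludes zero corresponds to p_adj < .05
tukey_eff$excludes_zero <- tukey_eff$lwr > 0 | tukey_eff$upr < 0

# The interval check and the p-value check should agree for every row
stopifnot(all(tukey_eff$excludes_zero == (tukey_eff$p_adj < 0.05)))

tukey_eff$comparison[tukey_eff$excludes_zero]
# "Cardio+Weights-None" "Cardio+Weights-Weights"
```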
After considering both the ANOVA and post-hoc test outcomes, I would recommend Cardio as the exercise regimen to improve overall sleep. The one-way ANOVA showed a significant effect of exercise group on sleep difference, F(3, 82) = 15.72, p < .001, with exercise group explaining a substantial proportion of the variance (37%). Tukey post-hoc tests revealed that the Cardio group had significantly greater improvements in sleep than both the None (p < .001) and Weights (p = .028) groups. Although the Cardio+Weights group also improved sleep, its gain was not significantly different from that of Cardio alone (p = .324). For sleep efficiency, Cardio+Weights had the highest mean (86.83%) and was the only group significantly higher than None, but it did not differ significantly from Cardio (85.45%, p = .838). Because Cardio produced the largest gain in sleep duration and a sleep efficiency comparable to Cardio+Weights, it remains the best single option for improving overall sleep.
I think the most challenging part about this midterm was making sure I transferred everything correctly when moving my code from the R script to R Markdown, especially because I prefer completing the script first. This time around, I felt more confident about writing the actual script and I’m becoming more familiar with the functions. Next time, I want to spend more time reviewing the results of my statistical tests to make sure I fully understand the meaning of each test so I can communicate the findings more effectively.