rm(list = ls())
Both exercise and sleep have a strong, positive, and widely documented correlation with well-being. The purpose of this study was to determine which of the two is more strongly correlated with well-being. The working hypothesis was that sleep quality is more strongly associated with well-being than exercise is. Participants were current full-time college students aged 18–22 who were members of the First-Year Research Immersion (FRI) program at Binghamton University. Data were recorded by two wearables, the FitBit Charge 6 and the Muse S, and logged by each participant in a daily survey. Participants wore the FitBit Charge 6 each day to track exercise data and the Muse S at night to track sleep data. They used the wearables for 7 days and filled out a more in-depth survey about mental well-being on Day 1 and Day 7. The survey also asked participants to rate their mental health throughout the day on a five-point Likert scale from poor to excellent. Using the results of this study, public health officials can determine whether it is more important for students to maintain high sleep quality or consistent and sufficient exercise, and thus provide recommendations to both students and colleges.
MUSE headband, step count, sleep score, college students, lifestyle mental health
rm(list = ls())
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.0 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(knitr)
library(tibble)
library(dplyr)
library(tidyr)
library(psych)
Attaching package: 'psych'
The following objects are masked from 'package:ggplot2':
%+%, alpha
library(scales) # for number formatting like comma()
Attaching package: 'scales'
The following objects are masked from 'package:psych':
alpha, rescale
The following object is masked from 'package:purrr':
discard
The following object is masked from 'package:readr':
col_factor
library(english) # to convert numbers to words
Attaching package: 'english'
The following object is masked from 'package:scales':
ordinal
library(stringr) # for text functions like str_c()
library(readxl)
# Import Excel file
onesevendata <- read_excel(
"10.27.25.Day1_7.Clean.xlsx",
col_names = TRUE)
onesevendata[onesevendata == -99] <- NA
onesevendata[onesevendata == -50] <- NA
##explanation: all -99 and -50 data will be treated as missing data
# View first 10 rows
head(onesevendata, 10)
# A tibble: 10 × 78
StartDate EndDate Status IPAddress Progress
<dttm> <dttm> <dbl> <chr> <dbl>
1 2025-06-30 15:29:14 2025-06-30 17:10:54 0 64.128.175.42 100
2 2025-06-30 15:52:41 2025-06-30 18:54:44 0 149.125.91.33 100
3 2025-06-30 21:07:27 2025-06-30 21:16:25 0 153.33.244.42 100
4 2025-06-30 23:33:05 2025-06-30 23:38:26 0 149.125.195.32 100
5 2025-06-24 16:55:07 2025-06-24 16:55:38 0 24.47.129.138 22
6 2025-07-06 21:21:04 2025-07-07 18:10:49 0 64.128.175.42 100
7 2025-07-08 07:04:12 2025-07-08 18:09:29 0 149.125.88.193 100
8 2025-07-09 04:53:03 2025-07-09 05:14:22 0 166.194.188.15 100
9 2025-08-25 12:24:10 2025-08-25 12:25:47 1 <NA> 100
10 2025-08-29 10:22:06 2025-08-29 10:25:02 1 <NA> 100
# ℹ 73 more variables: `Duration (in seconds)` <dbl>, Finished <dbl>,
# RecordedDate <dttm>, ResponseId <chr>, RecipientLastName <lgl>,
# RecipientFirstName <lgl>, RecipientEmail <lgl>, ExternalReference <lgl>,
# LocationLatitude <dbl>, LocationLongitude <dbl>, DistributionChannel <chr>,
# UserLanguage <chr>, Q_RecaptchaScore <dbl>, SURVEYDAY <dbl>,
# PASSWORD_COLOR <dbl>, PASSWORD <chr>, `7DAYS` <dbl>, YEAR <dbl>,
# PROGRAM <dbl>, LIVING <dbl>, `GENDER ` <dbl>, SEXUALIDENTITY <dbl>, …
#source: Importing Data Once (Hei & McCarty, 2025): https://shanemccarty.github.io/FRIplaybook/import-once.html
library(readxl)
# Import Excel file
dailydata <- read_excel(
"daily.survey.xlsx",
col_names = TRUE)
New names:
• `HEARTRATE` -> `HEARTRATE...39`
• `HEARTRATE` -> `HEARTRATE...43`
dailydata[dailydata == -99] <- NA
dailydata[dailydata == -50] <- NA
##explanation: all -99 and -50 data will be treated as missing data
# View first 10 rows
head(dailydata, 10)
# A tibble: 10 × 54
StartDate EndDate Status IPAddress Progress Duration (in seconds…¹ Finished
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Start Date End Da… Respo… IP Addre… Progress Duration (in seconds) Finished
2 45791.8068… 45791.… 0 149.125.… 44 264 0
3 45792.3557… 45795.… 0 66.67.6.… 89 285250 0
4 45811.8646… 45811.… 1 <NA> 100 26 1
5 45818.2859… 45818.… 0 149.125.… 100 604 1
6 45818.9734… 45818.… 0 149.125.… 100 434 1
7 45819.2866… 45819.… 0 149.125.… 100 1013 1
8 45819.8208… 45819.… 0 149.125.… 100 222 1
9 45820.8984… 45820.… 0 172.59.1… 100 171 1
10 45821.3397… 45821.… 0 149.125.… 100 1753 1
# ℹ abbreviated name: ¹`Duration (in seconds)`
# ℹ 47 more variables: RecordedDate <chr>, ResponseId <chr>,
# RecipientLastName <chr>, RecipientFirstName <chr>, RecipientEmail <chr>,
# ExternalReference <chr>, LocationLatitude <chr>, LocationLongitude <chr>,
# DistributionChannel <chr>, UserLanguage <chr>, Q_RecaptchaScore <chr>,
# PASSWORD_COLOR <chr>, PASSWORD <chr>, DAY <chr>, `STRESS _1` <chr>,
# STARTTIME <chr>, ENDTIME <chr>, TIMEINBED <chr>, SLEEPSCORE <chr>, …
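In the preview above, row 1 repeats the question labels ("Start Date", "Progress", ...) and every column is typed `<chr>`. Because the label row is read as data, comparisons such as `dailydata == -99` are made against text, and later numeric conversions produce "NAs introduced by coercion" warnings. One possible way to handle this, sketched here on a small made-up tibble (the values and the names `raw` and `clean` are hypothetical, not taken from the study files), is to drop the label row, let `readr::type_convert()` re-guess the column types, and only then recode the missing-data codes:

```r
library(tibble)
library(dplyr)
library(readr)

# Hypothetical miniature of a survey export whose first data row
# repeats the question labels, forcing every column to character.
raw <- tibble(
  PASSWORD   = c("Password", "blue42", "red17", "green08"),
  DAY        = c("What day of data collection", "1", "2", "1"),
  SLEEPSCORE = c("Sleep score", "71", "-99", "65"),
  STEPCOUNT  = c("Step count", "9605", "12377", "-50")
)

clean <- raw %>%
  slice(-1) %>%                          # drop the label row
  type_convert(col_types = cols()) %>%   # re-guess types column by column
  # recode the missing-data codes only in numeric columns
  mutate(across(where(is.numeric),
                ~ replace(.x, .x %in% c(-99, -50), NA)))

print(clean)
```

Whether the first row of the real file should be dropped is an assumption based on the preview; it should be checked against the spreadsheet before applying anything like this.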
#source: Importing Data Once (Hei & McCarty, 2025): https://shanemccarty.github.io/FRIplaybook/import-once.html
library(tidyr)
## Convert to wide format
wide_onesevendata <- onesevendata %>%
pivot_wider(
id_cols = PASSWORD,
names_from = SURVEYDAY,
values_from = c(`WELLBEING1`, `WELLBEING2`, `WELLBEING3`, `WELLBEING4`, `WELLBEING5`, `WELLBEING6`, `WELLBEING7`, `WELLBEING8`),
names_glue = "{.value}_T{SURVEYDAY}"
)
Warning: Values from `WELLBEING1`, `WELLBEING2`, `WELLBEING3`, `WELLBEING4`,
`WELLBEING5`, `WELLBEING6`, `WELLBEING7` and `WELLBEING8` are not uniquely
identified; output will contain list-cols.
• Use `values_fn = list` to suppress this warning.
• Use `values_fn = {summary_fun}` to summarise duplicates.
• Use the following dplyr code to identify duplicates.
{data} |>
dplyr::summarise(n = dplyr::n(), .by = c(PASSWORD, SURVEYDAY)) |>
dplyr::filter(n > 1L)
#source: https://dcl-prog.stanford.edu/list-columns.html
# i got help from danica on this part
#source: Tidying your Data (McCarty et al., 2025): https://shanemccarty.github.io/FRIplaybook/tidyr.html
library(tidyr)
## Convert to wide format
wide_daily_survey <- dailydata %>%
pivot_wider(
id_cols = PASSWORD,
names_from = DAY,
values_from = c(SLEEPSCORE, STEPCOUNT),
names_glue = "{.value}_T{DAY}"
)
Warning: Values from `SLEEPSCORE` and `STEPCOUNT` are not uniquely identified; output
will contain list-cols.
• Use `values_fn = list` to suppress this warning.
• Use `values_fn = {summary_fun}` to summarise duplicates.
• Use the following dplyr code to identify duplicates.
{data} |>
dplyr::summarise(n = dplyr::n(), .by = c(PASSWORD, DAY)) |>
dplyr::filter(n > 1L)
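Both `pivot_wider()` calls warn that some `PASSWORD` and day combinations occur more than once, which is why the wide tables contain list-columns. A minimal sketch of the two steps the warning suggests, on made-up data (the tibble `daily` and its values are hypothetical): first list the duplicated combinations, then pass a summary function through `values_fn` so each cell holds a single number.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical daily log: participant "blue42" submitted Day 1 twice.
daily <- tibble(
  PASSWORD   = c("blue42", "blue42", "blue42", "red17", "red17"),
  DAY        = c(1, 1, 2, 1, 2),
  SLEEPSCORE = c(70, 74, 68, 81, 77),
  STEPCOUNT  = c(8000, 9000, 10000, 12000, 11000)
)

# Step 1: which participant/day pairs appear more than once?
duplicates <- daily %>%
  count(PASSWORD, DAY) %>%
  filter(n > 1)
print(duplicates)

# Step 2: collapse duplicates while pivoting, here by averaging them.
wide <- daily %>%
  pivot_wider(
    id_cols     = PASSWORD,
    names_from  = DAY,
    values_from = c(SLEEPSCORE, STEPCOUNT),
    names_glue  = "{.value}_T{DAY}",
    values_fn   = mean
  )
print(wide)
```

Averaging is only one option. Passing `values_fn = dplyr::first` instead would mirror the first-element rule that this document applies when it unpacks the list-columns. Which rule fits the study is a judgment call that depends on why the duplicate submissions exist.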
#source: Tidying your Data (McCarty et al., 2025): https://shanemccarty.github.io/FRIplaybook/tidyr.html
library(readxl)
# Import Excel file
daily_survey_clean <- read_excel(
"daily_survey_clean.xlsx",
col_names = TRUE)
#source: Importing Data Once (Hei & McCarty, 2025): https://shanemccarty.github.io/FRIplaybook/import-once.html
#explanation: all -99 and -50 data will be treated as missing data
library(dplyr)
library(ggplot2)
# Fix list columns in wide_onesevendata
library(dplyr)
wide_onesevendata <- wide_onesevendata %>%
mutate(across(starts_with("WELLBEING"),
~ as.numeric(as.character(sapply(., `[`, 1)))))
Warning: There were 24 warnings in `mutate()`.
The first warning was:
ℹ In argument: `across(...)`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 23 remaining warnings.
# Select relevant WELLBEING columns + PASSWORD
masterdata <- wide_onesevendata %>%
select(starts_with("WELLBEING"), PASSWORD)
# Convert all WELLBEING columns to numeric
masterdata <- masterdata %>%
mutate(across(starts_with("WELLBEING"), as.numeric))
# Optional recoding (to labels if you want them as text — otherwise skip)
recode_labels <- function(x) {
case_when(
x == 1 ~ "Not at all",
x == 2 ~ "A little bit",
x == 3 ~ "Moderately",
x == 4 ~ "Quite a bit",
x == 5 ~ "Extremely",
TRUE ~ NA_character_
)
}
masterdata <- wide_onesevendata %>%
select(starts_with("WELLBEING"), PASSWORD) %>%
mutate(across(starts_with("WELLBEING"), ~ as.numeric(as.character(sapply(., `[`, 1)))))
# Create mean wellbeing score (numeric)
masterdata$WELLBEING <- rowMeans(masterdata %>% select(starts_with("WELLBEING")), na.rm = TRUE)
# Plot WELLBEING distribution
ggplot(masterdata, aes(x = WELLBEING)) +
geom_histogram(binwidth = 0.5, color = "black")
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_bin()`).
wide_daily_survey <- wide_daily_survey %>%
select(-contains("What day of data collection"))
# Convert STEPCOUNT columns to numeric before averaging
wide_daily_survey <- wide_daily_survey %>%
mutate(across(starts_with("STEPCOUNT"),
~ as.numeric(as.character(sapply(., `[`, 1)))))
Warning: There were 8 warnings in `mutate()`.
The first warning was:
ℹ In argument: `across(...)`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 7 remaining warnings.
# Compute average stepcount
wide_daily_survey$STEPCOUNT <- rowMeans(
wide_daily_survey %>% select(starts_with("STEPCOUNT")),
na.rm = TRUE
)
# Plot STEPCOUNT distribution
ggplot(wide_daily_survey, aes(x = STEPCOUNT)) +
geom_histogram(binwidth = 500, color = "black")
Warning: Removed 5 rows containing non-finite outside the scale range
(`stat_bin()`).
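Histograms like the ones above give a visual impression of each distribution. A Shapiro-Wilk test is one common numeric complement. The sketch below runs it on simulated values (the vector `step_means` is hypothetical and deliberately right-skewed; it is not the study's data). The `NaN` entry stands in for a participant with no valid days, since `rowMeans(..., na.rm = TRUE)` returns `NaN` when every value in a row is missing, and such rows are what the histogram warnings report as non-finite.

```r
# Simulated per-participant mean step counts (hypothetical values).
set.seed(1)
step_means <- c(rlnorm(40, meanlog = log(9500), sdlog = 0.4), NaN)

# shapiro.test() ignores missing values (NA and NaN) before testing.
sw <- shapiro.test(step_means)
print(sw)

# A small p-value is evidence against normality; with small samples the
# test has limited power, so it is best read alongside the histogram.
is_flagged <- sw$p.value < 0.05
```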
### Checking Normality of Sleep Score
for (col in grep("^SLEEPSCORE", names(wide_daily_survey), value = TRUE)) {
wide_daily_survey[[col]] <- as.numeric(as.character(sapply(wide_daily_survey[[col]], `[`, 1)))
}
Warning: NAs introduced by coercion
Warning: NAs introduced by coercion
Warning: NAs introduced by coercion
Warning: NAs introduced by coercion
Warning: NAs introduced by coercion
Warning: NAs introduced by coercion
Warning: NAs introduced by coercion
Warning: NAs introduced by coercion
wide_daily_survey$SLEEPSCORE <- rowMeans(wide_daily_survey[, c("SLEEPSCORE_T1", "SLEEPSCORE_T2", "SLEEPSCORE_T3", "SLEEPSCORE_T4","SLEEPSCORE_T5", "SLEEPSCORE_T6", "SLEEPSCORE_T7")], na.rm=TRUE)
library(ggplot2)
ggplot(wide_daily_survey, mapping=aes(x = SLEEPSCORE)) +
geom_histogram(binwidth = .5, color = "black")
Warning: Removed 12 rows containing non-finite outside the scale range
(`stat_bin()`).
#source: datacamp
Comparing Sleepscore and Wellbeing
#combining datasets
library(dplyr)
masterdata2 <- inner_join(masterdata, wide_daily_survey, by = "PASSWORD")
ggplot(masterdata2, aes(SLEEPSCORE,WELLBEING)) +
geom_smooth(method = "lm") +
geom_point(position = "jitter")
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 6 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 6 rows containing missing values or values outside the scale range
(`geom_point()`).
summary(masterdata2$WELLBEING)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  1.812   3.000   3.500   3.488   3.938   5.000       1
summary(masterdata2$SLEEPSCORE)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  39.50   65.52   71.67   70.03   76.58   83.33       6
lm(formula = WELLBEING ~ SLEEPSCORE, data = masterdata2)
Call:
lm(formula = WELLBEING ~ SLEEPSCORE, data = masterdata2)
Coefficients:
(Intercept) SLEEPSCORE
3.5172409 -0.0007238
#source: datacamp
ggplot(masterdata2, aes(STEPCOUNT,WELLBEING)) +
geom_smooth(method = "lm") +
geom_point(position = "jitter")
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
summary(masterdata2$WELLBEING)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  1.812   3.000   3.500   3.488   3.938   5.000       1
summary(masterdata2$STEPCOUNT)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    418    6970    9605   10079   12377   21341       2
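# Note: this model treats STEPCOUNT as the outcome and WELLBEING as the predictor,
# the reverse of the plot above and of the sleep model, where WELLBEING is the outcome.
lm(formula = STEPCOUNT ~ WELLBEING, data = masterdata2)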
Call:
lm(formula = STEPCOUNT ~ WELLBEING, data = masterdata2)
Coefficients:
(Intercept) WELLBEING
3064 2013
#source: datacamp
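The two slopes reported above cannot be compared directly. They are in different units (well-being points per sleep-score point versus steps per well-being point), and the two models swap outcome and predictor. Because the study's question is which variable correlates more strongly with well-being, one option is to compare Pearson correlations, which are unit-free and do not depend on which variable is treated as the outcome. The sketch below uses simulated stand-in data (the data frame `toy` and all its values are hypothetical); it shows only the mechanics, not the study's results.

```r
# Simulated stand-in data; the values are NOT the study's results.
set.seed(42)
n <- 40
toy <- data.frame(
  SLEEPSCORE = rnorm(n, mean = 70, sd = 9),
  STEPCOUNT  = rnorm(n, mean = 10000, sd = 4000)
)
toy$WELLBEING <- 3.5 +
  0.01 * (toy$SLEEPSCORE - 70) +
  0.00005 * (toy$STEPCOUNT - 10000) +
  rnorm(n, sd = 0.6)
toy$SLEEPSCORE[c(3, 17)] <- NA  # mimic missing wearable data

# cor.test() drops incomplete pairs and returns r, a 95% CI, and a p-value.
sleep_test <- cor.test(toy$WELLBEING, toy$SLEEPSCORE)
step_test  <- cor.test(toy$WELLBEING, toy$STEPCOUNT)

comparison <- data.frame(
  predictor = c("SLEEPSCORE", "STEPCOUNT"),
  r         = unname(c(sleep_test$estimate, step_test$estimate)),
  ci_low    = c(sleep_test$conf.int[1], step_test$conf.int[1]),
  ci_high   = c(sleep_test$conf.int[2], step_test$conf.int[2]),
  p_value   = c(sleep_test$p.value, step_test$p.value)
)
print(comparison)
```

Reading the two `r` values side by side shows which association is larger in the sample. Judging whether the difference between them is itself statistically reliable would need a test for dependent correlations, which this sketch does not include.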