Introduction and Motivations:

After entering my senior year of university, a heavy academic workload and anxiety about graduate school applications left me constantly sleep-deprived and mentally exhausted, which affected both my studies and my mental health. Still, I assumed my sleep pattern was normal because many people online described similar issues. This raised a question: are sleep problems a universal challenge? To find out, I started this project on sleep and lifestyle. Studying sleep is meaningful because sleep significantly affects mental health. Looking back, my freshman year was particularly challenging: because of the pandemic, I had to cope with jet lag while attending classes at night, which often disrupted my sleep, and I was subsequently diagnosed with anxiety. The purpose of this data visualization is therefore to demonstrate the importance of healthy sleep habits. Sleep is a critical aspect of well-being, and understanding our individual sleep patterns and comparing them to global trends can help us make informed decisions about our health and lifestyle. In this data-driven story, I present a comparative analysis of sleep patterns and the lifestyle habits that influence sleep duration, highlighting the differences between my own sleep duration and that of a broader community, which here stands in for the world's sleep duration.

Data Features:

Sleep Dataset from a Larger Community:

The dataset contains sleep and lifestyle data from 400 individuals and was shared by Laksika Tharmalingam on Kaggle. It has 400 rows and 13 columns.

Variables:
  • Person ID: An identifier for each individual.

  • Sleep Duration (hours): The number of hours the person sleeps per day.

  • Quality of Sleep (scale: 1-10): A subjective rating of the quality of sleep, ranging from 1 to 10.

  • Physical Activity Level (minutes/day): The number of minutes the person engages in physical activity daily.

  • Stress Level (scale: 1-10): A subjective rating of the stress level experienced by the person, ranging from 1 to 10.

  • Heart Rate (bpm): The resting heart rate of the person in beats per minute.

  • Daily Steps: The number of steps the person takes per day.

  • Age: The age of the person in years.

  • Occupation: The occupation or profession of the person.

  • Gender: The gender of the person (Male, Female).

  • BMI Category: The BMI category of the person (Underweight, Normal, Overweight).

  • Blood Pressure: The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure (Systolic, Diastolic).

  • Sleep Disorder: The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).

Personal Sleep Dataset:

The dataset comprises my personal sleep and lifestyle data recorded over nearly 2 months. It has 51 rows and 11 columns.

Variables:
  • Day: A sequential identifier for each recorded day.

  • Sleep Duration (hours): The number of hours I sleep.

  • Steps: The number of steps I take.

  • Physical Activity Level (minutes/day): The number of minutes I engage in physical activity, including aerobic and anaerobic exercise, that day.

  • Screen Time in 1 Hour before Sleep (minutes): The number of minutes I spend on my phone or laptop during the hour before sleep that day.

  • Coffee Consumption in Cups: The number of cups of coffee I drink that day.

  • Numbers of Assignment Due: The number of assignments due that day.

  • Date: The date of measurement.

  • Weekend or Weekday: Whether that day falls on a weekday or a weekend (weekday, weekend).

  • Go to Gym or Not: Whether I go to the gym to exercise that day (yes, no).

  • Have Carbohydrate in Dinner or Not: Whether my dinner includes carbohydrates that day (yes, no).

knitr::opts_chunk$set(message = FALSE)
library(ggplot2)
library(dplyr)
library(plotly)
library(patchwork)


global_sleep <- read.csv("/Users/clarawang/Desktop/sleep_project.csv")
my_sleep <- read.csv("/Users/clarawang/Desktop/my_sleep.csv")

Plot 1.

Comparing my sleep duration in October with that of a larger community, I found that I slept less than the global trend. I therefore decided to change my lifestyle over the following month to improve my sleep pattern.

oct.sleep <- my_sleep %>%
  filter(Day %in% c(1:25))

combined_sleep_oct <- rbind(
  data.frame(Dataset = "Global Sleep", Duration = global_sleep$Sleep.Duration),
  data.frame(Dataset = "My Sleep", Duration = oct.sleep$Sleep.Duration)
)

colors <- c("#666666", "#CC0033")

p_oct <- ggplot(combined_sleep_oct, aes(x = Dataset, y = Duration, fill = Dataset)) +
  geom_boxplot() +
  labs(title = "Comparison of Global Sleep Duration against Mine before Changing", x = "Dataset", y = "Sleep Duration (Hours)") +
  scale_fill_manual(values = colors) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_line(size = 0.5),
    panel.grid.major = element_line(size = 0),
    panel.grid.minor = element_line(size = 0)
  )
p_oct
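
The boxplot shows the gap visually, and a numeric summary can back it up. The following is a minimal sketch in base R, assuming a data frame with the same two columns as combined_sleep_oct. The name toy_sleep and all of its values are made up for illustration and are not taken from either dataset.

```r
# Toy stand-in for combined_sleep_oct; the durations are invented for illustration.
toy_sleep <- data.frame(
  Dataset  = c(rep("Global Sleep", 5), rep("My Sleep", 5)),
  Duration = c(6.5, 7.0, 7.2, 7.8, 8.0, 5.5, 6.0, 6.2, 6.4, 7.0)
)

# Median and interquartile range of Duration for each Dataset.
sleep_summary <- aggregate(
  Duration ~ Dataset,
  data = toy_sleep,
  FUN = function(x) c(median = median(x), iqr = IQR(x))
)
print(sleep_summary)

# Difference in medians (My Sleep minus Global Sleep); a negative value means I sleep less.
medians <- tapply(toy_sleep$Duration, toy_sleep$Dataset, median)
median_gap <- unname(medians["My Sleep"] - medians["Global Sleep"])
median_gap
```

With the real data, the same calls could be run on combined_sleep_oct in place of toy_sleep.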

Plot 2.

To start with, I looked at the impact of physical activity level on sleep duration, since I had often heard that exercise leads to good sleep. As I am female, I focused particularly on the female data and found a strong positive linear relationship between physical activity level and sleep duration, apart from several outliers. I therefore decided to exercise more in the following month.

colors <- c("Male" = "#666666", "Female" = "#CC0033")

female <- global_sleep %>%
  filter(Gender == "Female")
  
p1 <- ggplot(female, aes(x = Physical.Activity.Level, y = Sleep.Duration, color = "Female")) +
  geom_point(size = 2.3, alpha = 0.2) +
  labs(x = "Physical Activity Level", y = "Sleep Duration (Hours)", subtitle = "Female") +
  scale_color_manual(values = colors) + 
  theme_minimal() + 
  guides(color = "none") +
  theme(
    plot.subtitle = element_text(hjust = 0.5, color = "#CC0033", size = 13, margin = margin(0,0,-3,0)), 
    plot.margin = margin(t = 10, unit = "pt"), 
    axis.line = element_line(size = 0.5),
    panel.grid.major = element_line(size = 0.1),
    panel.grid.minor = element_line(size = 0.1)
  )


male <- global_sleep %>%
  filter(Gender == "Male")

# Build the male panel as its own plot so its layers attach to it explicitly
p_male <- ggplot(male, aes(x = Physical.Activity.Level, y = Sleep.Duration, color = "Male")) +
  geom_point(size = 2.3, alpha = 0.2) +
  labs(x = "Physical Activity Level", y = "Sleep Duration (Hours)", subtitle = "Male") +
  scale_color_manual(values = colors) + 
  guides(color = "none") +
  theme_minimal() + 
  theme(
    plot.subtitle = element_text(hjust = 0.5, color = "#666666", size = 13, margin = margin(0,0,-3,0)),
    axis.line = element_line(size = 0.5),
    panel.grid.major = element_line(size = 0.1),
    panel.grid.minor = element_line(size = 0.1)
  )

# Combine the two panels with patchwork and center one shared title above both
p2 <- p1 + p_male +
  plot_annotation(
    title = "Sleep Duration VS Physical Activity Level",
    theme = theme(plot.title = element_text(hjust = 0.5, size = 16, margin = margin(0,0,10,0)))
  )
p2
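
The strength of a linear relationship like the one described for Plot 2 can be quantified rather than read off the scatter plot. The sketch below assumes a data frame with the same two column names as the female subset. The name toy_female and its values are invented for illustration only.

```r
# Toy data: activity in minutes/day and sleep in hours (invented values).
toy_female <- data.frame(
  Physical.Activity.Level = c(30, 45, 60, 75, 90),
  Sleep.Duration          = c(6.0, 6.4, 7.1, 7.3, 8.0)
)

# Pearson correlation: values near +1 indicate a strong positive linear relationship.
r <- cor(toy_female$Physical.Activity.Level, toy_female$Sleep.Duration)

# Least-squares slope: estimated change in sleep hours per extra minute of activity.
fit <- lm(Sleep.Duration ~ Physical.Activity.Level, data = toy_female)
slope <- unname(coef(fit)["Physical.Activity.Level"])

round(c(correlation = r, slope = slope), 3)
```

On the real data, the same two calls could be run on the female data frame built above.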

Plot 3.

Then, I shifted my focus to the influence of daily steps, paying particular attention to the BMI category I belong to. The data also showed a moderately positive linear relationship between daily steps and sleep duration. Thus, I decided to walk more instead of scooting or driving in the following month.

knitr::include_graphics("/Users/clarawang/Desktop/plot3.png")

Plot 4.

I was also curious about the sleep patterns of scientists, since becoming a scientist is my ultimate career goal. Sadly, both the sleep duration and the sleep quality of scientists were almost the lowest among all surveyed occupations. Hence, I would like to change my lifestyle as quickly as possible so that I can sleep more during the rest of my undergraduate life before stepping into the real field of science.

knitr::include_graphics("/Users/clarawang/Desktop/plot4.png")

Plot 5.

After I changed my lifestyle habits on October 25, my sleep duration increased remarkably. Moreover, I slept longer on days when I went to the gym.

order <- my_sleep$Date

dt <- my_sleep %>%
  mutate(Date = factor(Date, levels = order))

colors <- c("#666666", "#CC0033")

p3 <- ggplot(dt, aes(x = Date, y = Sleep.Duration, fill = Go.to.Gym.or.Not)) +
  geom_bar(stat = "identity", position = position_dodge(width = 1.0), width = 0.9, alpha = 0.8) +
  labs(x = "Date", y = "Sleep Duration (Hours)", title = "Sleep Duration Change over Time") +
  scale_fill_manual(values = colors, name = "Go to Gym or Not") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_line(size = 0.5),
    panel.grid.major = element_line(size = 0),
    panel.grid.minor = element_line(size = 0),
    legend.key.width = unit(0.3, "cm"), 
    legend.key.height = unit(0.3, "cm"), 
    axis.text.x = element_text(angle = 45, hjust = 1, size = 5)
  )
p3
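
The two observations for Plot 5 (longer sleep after the change, and longer sleep on gym days) can also be checked with group means. This is a minimal sketch, assuming the same Day 1 to 25 split used for Plot 1. The name toy_my_sleep, the Period column, and all values are hypothetical.

```r
# Toy stand-in for my_sleep with invented values; Day 1-25 is the "before" period.
toy_my_sleep <- data.frame(
  Day              = c(10, 20, 25, 26, 30, 40),
  Sleep.Duration   = c(5.5, 6.0, 6.5, 7.0, 7.5, 8.0),
  Go.to.Gym.or.Not = c("no", "no", "yes", "no", "yes", "yes")
)

# Label each day by whether it falls before or after the lifestyle change.
toy_my_sleep$Period <- ifelse(toy_my_sleep$Day <= 25, "before", "after")

# Mean sleep duration per period and per gym status.
mean_by_period <- tapply(toy_my_sleep$Sleep.Duration, toy_my_sleep$Period, mean)
mean_by_gym    <- tapply(toy_my_sleep$Sleep.Duration, toy_my_sleep$Go.to.Gym.or.Not, mean)

mean_by_period
mean_by_gym
```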

Plot 6.

In November, I walked more, exercised more, and drank less coffee per day, all of which improved my sleep duration. I also slept longer on weekends.

setwd("/Users/clarawang/Desktop/shiny")
mysleep <- read.csv("/Users/clarawang/Desktop/my_sleep.csv")

library(shiny)
library(ggplot2)
library(dplyr)
library(RColorBrewer)

mysleep <- mysleep %>%
  mutate(Sleep_Duration = as.numeric(Sleep.Duration), 
         Step_Count = as.numeric(Steps), 
         Weekend_or_Weekday = as.character(Weekend.or.Weekday),
         Go_to_Gym_or_Not = as.character(Go.to.Gym.or.Not),
         Have_Carbohydrate_in_Dinner_or_Not = as.character(Have.Carbohydrate.in.Dinner.or.Not), 
         Physical_Activity_Level = as.numeric(Physical.Activity.Level),
         Coffee_Consumption_in_Cups = as.factor(Coffee.Consumption.in.Cups)
  ) %>%
  select(c("Sleep_Duration", "Step_Count", "Weekend_or_Weekday", "Go_to_Gym_or_Not", "Have_Carbohydrate_in_Dinner_or_Not", "Physical_Activity_Level", "Coffee_Consumption_in_Cups"))

all_vars <- colnames(mysleep)[-1]

ui <- fluidPage(
  titlePanel("My Sleep Information"),
  sidebarLayout(
    sidebarPanel(
      selectInput("variable", "Select a Variable:", choices = all_vars),
      checkboxInput("summary", "Show Summary of Selected Variable", TRUE)
    ),
    mainPanel(
      plotOutput("plot"),
      h3(""),
      verbatimTextOutput("summary"),
      h3(""),
      verbatimTextOutput("explain")
    )
  )
)

server <- function(input, output) {
  
  output$plot <- renderPlot({
    selected_var <- input$variable
    
    if (is.numeric(mysleep[[selected_var]])) {
      ggplot(mysleep, aes_string(x = selected_var, y = "Sleep_Duration")) +
        geom_point() +
        labs(x = selected_var, y = "Sleep Duration (Hours)", title = paste0("Sleep Duration VS ", selected_var, " (Scatter Plot)")) +
        geom_smooth(method = "lm", se = FALSE, aes(group = 1), size = 0.5, color = "grey") +
        theme_minimal() +
        theme(plot.title = element_text(hjust = 0.5), 
              axis.line = element_line(size = 0.5), 
              panel.grid.major = element_line(size = 0.1), 
              panel.grid.minor = element_line(size = 0.1))
    } else if (is.character(mysleep[[selected_var]])) {
      ggplot(mysleep, aes_string(x = selected_var, y = "Sleep_Duration")) +
        geom_boxplot() +
        labs(x = selected_var, y = "Sleep Duration (Hours)", title = paste0("Sleep Duration VS ", selected_var, " (Box Plot)")) +
        theme_minimal() + 
        theme(plot.title = element_text(hjust = 0.5), 
              axis.line = element_line(size = 0.5), 
              panel.grid.major = element_line(size = 0.1), 
              panel.grid.minor = element_line(size = 0.1))
    } else if (is.factor(mysleep[[selected_var]])) {
      ggplot(mysleep, aes_string(x = selected_var, y = "Sleep_Duration", fill = selected_var)) +
        geom_bar(stat = "identity", position = "dodge") +
        labs(x = selected_var, y = "Sleep Duration (Hours)", title = paste0("Sleep Duration VS ", selected_var, " (Bar Plot)")) +
        scale_fill_brewer(palette = "Greys") +
        theme_minimal() +
        theme(plot.title = element_text(hjust = 0.5), 
              axis.line = element_line(size = 0.5), 
              panel.grid.major = element_line(size = 0.1), 
              panel.grid.minor = element_line(size = 0.1))
    } else {
      return(NULL)
    }
  })
  
  output$summary <- renderPrint({
    if (!input$summary) {
      return(cat("Summary is hidden"))
    } else {
      selected_var <- input$variable
      if (is.numeric(mysleep[[selected_var]])) {
        return(summary(mysleep[[selected_var]]))
      } else if (is.character(mysleep[[selected_var]])) {
        return(table(mysleep[[selected_var]]))
      } else if (is.factor(mysleep[[selected_var]])) {
        return(table(mysleep[[selected_var]]))
      } else {
        return(cat("Unsupported data type"))
      }
    }
  })
  
  output$explain <- renderText({
    return("Minimum (Min.): The minimum represents the smallest value within a dataset. 

1st Quartile (1st Qu.): The first quartile represents the value below which 25% of the data fall. 

Median: The median is the middle value in a dataset when arranged in ascending order. 

Mean: The mean is the sum of all values in a dataset divided by the total number of observations. 

3rd Quartile (3rd Qu.): The third quartile represents the value below which 75% of the data fall. 

Maximum (Max.): The maximum represents the largest value within a dataset.")
  })
}

shinyApp(ui, server)

Plot 7.

I also found that my sleep duration increased when I spent less time on my phone or laptop during the hour before sleep.

accumulate_by <- function(dat, var) {
  var <- lazyeval::f_eval(var, dat)
  lvls <- plotly:::getLevels(var)
  dats <- lapply(seq_along(lvls), function(x) {
    cbind(dat[var %in% lvls[seq(1, x)], ], frame = lvls[[x]])
  })
  dplyr::bind_rows(dats)
}

animation_dt <- read.csv("/Users/clarawang/Desktop/animation_dt.csv")
fig <- animation_dt %>% 
  accumulate_by(~Date)

p4 <- fig %>%
  plot_ly(x = ~ Date, y = ~ Value, split = ~ Variable, frame = ~ frame, type = 'scatter', mode = 'lines', colors = c("#CC0033", "#666666"), color = ~ Variable) %>% 
  layout(title = "Sleep Duration VS Screen Time before Sleep", xaxis = list(title = "Day", showline = TRUE, showgrid = TRUE, zeroline = FALSE), yaxis = list(title = "Time", showline = TRUE, showgrid = TRUE, zeroline = FALSE)) %>% 
  animation_opts(frame = 100, transition = 0) %>% 
  animation_slider(hide = T) %>% 
  animation_button(x = 1.1, xanchor = "right", y = 0, yanchor = "bottom")

p4
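
The animation relies on cumulative frames: frame k contains every row up to the k-th date, so the lines appear to grow over time. The sketch below illustrates that idea in base R without the plotly internals. The function accumulate_frames and the data frame toy_dt are hypothetical names, and the values are invented.

```r
# Toy long-format data with the same column names as animation_dt (values invented).
toy_dt <- data.frame(
  Date     = c(1, 1, 2, 2, 3, 3),
  Variable = rep(c("Sleep Duration", "Screen Time"), times = 3),
  Value    = c(6.0, 50, 6.5, 40, 7.0, 30)
)

# For the i-th distinct value of `var`, keep every row belonging to the first
# i values and tag those rows with frame = that value.
accumulate_frames <- function(dat, var) {
  lvls <- sort(unique(dat[[var]]))
  frames <- lapply(seq_along(lvls), function(i) {
    cbind(dat[dat[[var]] %in% lvls[seq_len(i)], ], frame = lvls[i])
  })
  do.call(rbind, frames)
}

toy_frames <- accumulate_frames(toy_dt, "Date")

# Number of rows in each frame; later frames hold more rows.
table(toy_frames$frame)
```

Each frame repeats the earlier rows, so the output grows roughly quadratically with the number of dates, which is acceptable for a table of this size.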

Plot 8.

I also slept longer when I had fewer assignments due that day and the next day.

assign <- my_sleep %>%
  select(c(Numbers.of.Assignment.Due, Sleep.Duration))

library(scales)
# Fixing levels = 0:4 keeps each label matched to its count even if a count never occurs
ggplot(assign, 
       aes(x = factor(Numbers.of.Assignment.Due, levels = 0:4, labels = c("0 Assignment", "1 Assignment", "2 Assignments", "3 Assignments", "4 Assignments")), y = Sleep.Duration, color = Numbers.of.Assignment.Due)) +
  geom_jitter(alpha = 0.7) + 
  labs(title = "Sleep Duration VS Number of Assignments Due on that Day", y = "Sleep Duration (Hours)", x = "") +
  scale_color_gradient(low = "grey90", high = "black", name = "Numbers of\nAssignment Due") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5), 
        axis.line = element_line(size = 0.5), 
        panel.grid.major = element_line(size = 0.1), 
        panel.grid.minor = element_line(size = 0.1),
        legend.title = element_text(margin = margin(b = 10)))

Plot 9.

Finally, I found that my sleep duration in November exceeded that of the larger community, and that my overall sleep duration over the past two months matched the global trend.

nov.sleep <- my_sleep %>%
  filter(Day %in% c(26:51))

combined_sleep_nov <- rbind(
  data.frame(Dataset = "Global Sleep", Duration = global_sleep$Sleep.Duration),
  data.frame(Dataset = "My Sleep", Duration = nov.sleep$Sleep.Duration)
)

colors <- c("#666666", "#CC0033")

p_nov <- ggplot(combined_sleep_nov, aes(x = Dataset, y = Duration, fill = Dataset)) +
  geom_boxplot() +
  labs(title = "Comparison of Global Sleep Duration against Mine after Changing", x = "Dataset", y = "Sleep Duration (Hours)") +
  scale_fill_manual(values = colors) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_line(size = 0.5),
    panel.grid.major = element_line(size = 0),
    panel.grid.minor = element_line(size = 0)
  )
p_nov

combined_sleep_final <- rbind(
  data.frame(Dataset = "Global Sleep", Duration = global_sleep$Sleep.Duration),
  data.frame(Dataset = "My Sleep", Duration = my_sleep$Sleep.Duration)
)

colors <- c("#666666", "#CC0033")

p_final <- ggplot(combined_sleep_final, aes(x = Dataset, y = Duration, fill = Dataset)) +
  geom_boxplot() +
  labs(title = "Comparison of Global Sleep Duration against My Overall Sleep Duration", x = "Dataset", y = "Sleep Duration (Hours)") +
  scale_fill_manual(values = colors) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_line(size = 0.5),
    panel.grid.major = element_line(size = 0),
    panel.grid.minor = element_line(size = 0)
  )
p_final
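
The three comparisons (before, after, and overall) build their combined data frames in the same way, so that step could be factored into a small helper. This is only a sketch: combine_sleep is a hypothetical function, not part of the analysis above, and the toy values are invented.

```r
# Hypothetical helper: stack two vectors of durations under the labels used in the plots.
combine_sleep <- function(global_duration, my_duration) {
  rbind(
    data.frame(Dataset = "Global Sleep", Duration = global_duration),
    data.frame(Dataset = "My Sleep", Duration = my_duration)
  )
}

# Invented durations, for illustration only.
toy_global <- c(6.5, 7.0, 7.5, 8.0)
toy_mine <- data.frame(
  Day            = c(20, 25, 26, 40),
  Sleep.Duration = c(5.5, 6.0, 7.5, 8.0)
)

# The same helper covers the before, after, and overall comparisons.
combined_before <- combine_sleep(toy_global, toy_mine$Sleep.Duration[toy_mine$Day <= 25])
combined_after  <- combine_sleep(toy_global, toy_mine$Sleep.Duration[toy_mine$Day >= 26])
combined_all    <- combine_sleep(toy_global, toy_mine$Sleep.Duration)

table(combined_after$Dataset)
```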