Introduction and Motivations:

After entering my senior year of university, a heavy academic workload and anxiety about graduate school applications left me sleep-deprived and mentally exhausted, which affected both my studies and my mental health. Still, I assumed my sleep pattern was normal because many people online described similar problems. This raised a question: are sleep problems a universal challenge? To find out, I started this project on sleep and lifestyle. Studying sleep is meaningful because sleep significantly affects mental health. Looking back, my freshman year was particularly difficult because of the pandemic: I had to cope with jet lag while attending classes at night, which disrupted my sleep, and I was subsequently diagnosed with anxiety. The purpose of this data visualization is therefore to demonstrate the importance of healthy sleep habits. Sleep is a critical aspect of well-being, and understanding our own sleep patterns and comparing them with broader trends can help us make informed decisions about our health and lifestyle. In this data-driven story, I present a comparative analysis of sleep patterns and the lifestyle habits that influence sleep duration, highlighting the differences between my own sleep duration and that of a larger community, which here stands in for the world’s sleep duration.

Data Features:

Sleep Dataset from a Larger Community:

The dataset contains sleep and lifestyle data from 400 individuals and was shared on Kaggle by Laksika Tharmalingam. It has 400 rows and 13 columns.

Variables:
  • Person ID: An identifier for each individual.

  • Sleep Duration (hours): The number of hours the person sleeps per day.

  • Quality of Sleep (scale: 1-10): A subjective rating of the quality of sleep, ranging from 1 to 10.

  • Physical Activity Level (minutes/day): The number of minutes the person engages in physical activity daily.

  • Stress Level (scale: 1-10): A subjective rating of the stress level experienced by the person, ranging from 1 to 10.

  • Heart Rate (bpm): The resting heart rate of the person in beats per minute.

  • Daily Steps: The number of steps the person takes per day.

  • Age: The age of the person in years.

  • Occupation: The occupation or profession of the person.

  • Gender: The gender of the person (Male, Female).

  • BMI Category: The BMI category of the person (Underweight, Normal, Overweight).

  • Blood Pressure: The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure (Systolic, Diastolic).

  • Sleep Disorder: The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).

Personal Sleep Dataset:

The dataset comprises my personal sleep and lifestyle data recorded over nearly 2 months. It has 51 rows and 11 columns.

Variables:
  • Day: An identifier for date.

  • Sleep Duration (hours): The number of hours I sleep.

  • Steps: The number of steps I take.

  • Physical Activity Level (minutes/day): The number of minutes I engage in physical activity, including aerobic and anaerobic exercise, that day.

  • Screen Time in 1 Hour before Sleep (minutes): The time I spend on my phone or laptop in the hour before sleep that day.

  • Coffee Consumption in Cups: The number of cups of coffee I drink that day.

  • Numbers of Assignment Due: The number of assignments due that day.

  • Date: The date of measurement.

  • Weekend or Weekday: Whether that day falls on a weekday or a weekend (weekday, weekend).

  • Go to Gym or Not: Whether I go to the gym to exercise that day (yes, no).

  • Have Carbohydrate in Dinner or Not: Whether I have carbohydrates at dinner that day (yes, no).

knitr::opts_chunk$set(message = FALSE)
library(ggplot2)
library(dplyr)
library(plotly)
library(patchwork)


global_sleep <- read.csv("/Users/clarawang/Desktop/sleep_project.csv")
my_sleep <- read.csv("/Users/clarawang/Desktop/my_sleep.csv")

Plot 1.

By comparing my sleep duration in October with that of a larger community, I found that I slept less than the global trend. Therefore, I decided to change my lifestyle over the following month to improve my sleep pattern.

oct.sleep <- my_sleep %>%
  filter(Day %in% c(1:25))

combined_sleep_oct <- rbind(
  data.frame(Dataset = "Global Sleep", Duration = global_sleep$Sleep.Duration),
  data.frame(Dataset = "My Sleep", Duration = oct.sleep$Sleep.Duration)
)

colors <- c("#666666", "#CC0033")

p_oct <- ggplot(combined_sleep_oct, aes(x = Dataset, y = Duration, fill = Dataset)) +
  geom_boxplot() +
  labs(title = "Comparison of Global Sleep Duration against Mine before Changing", x = "Dataset", y = "Sleep Duration (Hours)") +
  scale_fill_manual(values = colors) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_line(size = 0.5),
    panel.grid.major = element_line(size = 0),
    panel.grid.minor = element_line(size = 0)
  )
p_oct
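
The boxplot comparison can also be read off numerically. Below is a minimal sketch that computes the median of each group, which is the center line a boxplot draws. The durations in toy_sleep are hypothetical stand-ins, not values from either dataset.

```r
# Hypothetical durations standing in for global_sleep$Sleep.Duration and
# oct.sleep$Sleep.Duration; these are NOT the real measurements.
toy_sleep <- rbind(
  data.frame(Dataset = "Global Sleep", Duration = c(6.1, 6.5, 7.2, 7.8, 8.0)),
  data.frame(Dataset = "My Sleep", Duration = c(5.0, 5.5, 6.0, 6.2, 6.8))
)

# Median sleep duration per group (the center line of each box).
medians <- tapply(toy_sleep$Duration, toy_sleep$Dataset, median)
medians

# Gap between the community median and my own median, in hours.
median_gap <- medians[["Global Sleep"]] - medians[["My Sleep"]]
median_gap
```

Replacing toy_sleep with combined_sleep_oct should give the medians behind the plot, assuming that data frame has been built as in the chunk above.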

Plot 2.

To start, I looked at the impact of physical activity level on sleep duration, since I had often heard that exercise brings good sleep. As I am female, I focused particularly on the female data and found a strong positive linear relationship between physical activity level and sleep duration, apart from several outliers. Therefore, I decided to exercise more in the next month.

colors <- c("Male" = "#666666", "Female" = "#CC0033")

female <- global_sleep %>%
  filter(Gender == "Female")
  
p1 <- ggplot(female, aes(x = Physical.Activity.Level, y = Sleep.Duration, color = "Female")) +
  geom_point(size = 2.3, alpha = 0.2) +
  labs(x = "Physical Activity Level", y = "Sleep Duration (Hours)", subtitle = "Female") +
  scale_color_manual(values = colors) + 
  theme_minimal() + 
  guides(color = "none") +
  theme(
    plot.subtitle = element_text(hjust = 0.5, color = "#CC0033", size = 13, margin = margin(0,0,-3,0)), 
    plot.margin = margin(t = 10, unit = "pt"), 
    axis.line = element_line(size = 0.5),
    panel.grid.major = element_line(size = 0.1),
    panel.grid.minor = element_line(size = 0.1)
  )


male <- global_sleep %>%
  filter(Gender == "Male")

p2 <- p1 + ggplot(male, aes(x = Physical.Activity.Level, y = Sleep.Duration, color = "Male")) +
  geom_point(size = 2.3, alpha = 0.2) +
  labs(x = "Physical Activity Level", y = "Sleep Duration (Hours)", subtitle = "Male", title = "Relationship between Physical Activity Level and Sleep Duration") +
  scale_color_manual(values = colors) + 
  guides(color = "none") +
  theme_minimal() + 
  theme(
    plot.subtitle = element_text(hjust = 0.5, color = "#666666", size = 13, margin = margin(0,0,-3,0)),
    plot.title = element_text(hjust = 1.0, size = 16, margin = margin(0,0,10,0)),
    axis.line = element_line(size = 0.5),
    panel.grid.major = element_line(size = 0.1),
    panel.grid.minor = element_line(size = 0.1)
    )
p2
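
A scatter plot shows the direction of a relationship, and a correlation coefficient puts a number on its strength. The sketch below is one way to check a "strong positive linear relationship" with base R; the values in toy_female are hypothetical and only illustrate the calls, so the printed numbers say nothing about the real data.

```r
# Hypothetical values standing in for the female subset of global_sleep.
toy_female <- data.frame(
  Physical.Activity.Level = c(30, 45, 60, 75, 90),
  Sleep.Duration = c(6.0, 6.4, 7.1, 7.5, 8.2)
)

# Pearson correlation (the default method of cor()); values near +1
# indicate a strong positive linear relationship.
r <- cor(toy_female$Physical.Activity.Level, toy_female$Sleep.Duration)
r

# A simple linear fit: the slope is the change in sleep hours per
# additional minute of daily physical activity.
fit <- lm(Sleep.Duration ~ Physical.Activity.Level, data = toy_female)
slope <- coef(fit)[["Physical.Activity.Level"]]
slope
```

Running the same two calls on the female data frame from the chunk above would quantify the pattern seen in the plot.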

Plot 3.

Then I shifted my focus to daily steps, paying particular attention to the BMI category I belong to. The data also showed a moderate positive linear relationship between daily steps and sleep duration. Thus, I decided to walk more instead of riding a scooter or driving in the next month.

knitr::include_graphics("/Users/clarawang/Desktop/plot3.png")
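
Only the saved image is included for this figure, not the code that produced it. One plausible way to draw a similar chart is sketched below. The column names Daily.Steps and BMI.Category assume read.csv()'s default renaming of the Kaggle headers, the rows in toy_global are hypothetical, and the styling is my own guess rather than a reconstruction of the original figure.

```r
library(ggplot2)

# Hypothetical rows standing in for global_sleep; NOT the real data.
toy_global <- data.frame(
  Daily.Steps = c(3000, 5000, 7000, 8000, 10000, 4000, 6000, 9000),
  Sleep.Duration = c(6.0, 6.5, 7.2, 7.5, 8.1, 5.9, 6.3, 6.8),
  BMI.Category = rep(c("Normal", "Overweight"), times = c(5, 3))
)

# Scatter plot of steps against sleep, colored by BMI category,
# with one least-squares trend line per category.
p_steps <- ggplot(toy_global,
                  aes(x = Daily.Steps, y = Sleep.Duration, color = BMI.Category)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Daily Steps", y = "Sleep Duration (Hours)", color = "BMI Category",
       title = "Relationship between Daily Steps and Sleep Duration") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
p_steps
```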

Plot 4.

I was also curious about the sleep patterns of scientists, since becoming a scientist is my ultimate career goal. Sadly, both the sleep duration and the sleep quality of scientists were almost the lowest among all the surveyed occupations. Hence, I would like to change my lifestyle as quickly as possible and sleep more during the rest of my undergraduate life, before stepping into the real field of science.

knitr::include_graphics("/Users/clarawang/Desktop/plot4.png")
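
The ranking behind this figure can be sketched as a per-occupation average. The code for the original image is not shown, so the following is only an assumption about how such a summary might be computed; the rows in toy_occ are hypothetical, and the column names assume read.csv()'s default renaming of the Kaggle headers.

```r
# Hypothetical rows standing in for global_sleep; NOT the real data.
toy_occ <- data.frame(
  Occupation = c("Scientist", "Scientist", "Engineer", "Engineer", "Nurse", "Nurse"),
  Sleep.Duration = c(6.0, 6.2, 7.9, 8.1, 7.0, 7.2),
  Quality.of.Sleep = c(5, 5, 8, 9, 7, 8)
)

# Mean sleep duration and mean sleep quality for each occupation.
occ_means <- aggregate(
  cbind(Sleep.Duration, Quality.of.Sleep) ~ Occupation,
  data = toy_occ,
  FUN = mean
)

# Sort from shortest to longest average sleep so the lowest group comes first.
occ_means <- occ_means[order(occ_means$Sleep.Duration), ]
occ_means
```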

Plot 5.

After I changed my lifestyle habits on October 25, my sleep duration increased remarkably. Moreover, I slept longer on days when I went to the gym.

order <- my_sleep$Date

dt <- my_sleep %>%
  mutate(Date = factor(Date, levels = order))

colors <- c("#666666", "#CC0033")

p3 <- ggplot(dt, aes(x = Date, y = Sleep.Duration, fill = Go.to.Gym.or.Not)) +
  geom_bar(stat = "identity", position = position_dodge(width = 1.0), width = 0.9, alpha = 0.8) +
  labs(x = "Date", y = "Sleep Duration (Hours)", title = "Sleep Duration Change over Time") +
  scale_fill_manual(values = colors, name = "Go to Gym or Not") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_line(size = 0.5),
    panel.grid.major = element_line(size = 0),
    panel.grid.minor = element_line(size = 0),
    legend.key.width = unit(0.3, "cm"), 
    legend.key.height = unit(0.3, "cm"), 
    axis.text.x = element_text(angle = 45, hjust = 1, size = 5)
  )
p3
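
The two comparisons in this plot (before versus after the change, gym days versus non-gym days) can be summarized in a small two-way table of means. This is a sketch on hypothetical rows; toy_my_sleep and its values are invented for illustration, and the day-25 cutoff mirrors the October and November subsets used elsewhere in this document.

```r
# Hypothetical rows standing in for my_sleep; NOT the recorded values.
toy_my_sleep <- data.frame(
  Day = c(5, 12, 20, 30, 38, 45),
  Sleep.Duration = c(5.5, 6.0, 5.8, 7.0, 7.6, 7.2),
  Go.to.Gym.or.Not = c("no", "yes", "no", "yes", "yes", "no")
)

# Days 1-25 fall before the lifestyle change, days 26-51 after it.
toy_my_sleep$Period <- ifelse(toy_my_sleep$Day <= 25, "Before", "After")

# Mean sleep duration for each Period x gym combination.
period_gym_means <- tapply(
  toy_my_sleep$Sleep.Duration,
  list(Period = toy_my_sleep$Period, Gym = toy_my_sleep$Go.to.Gym.or.Not),
  mean
)
period_gym_means
```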

Plot 6.

In November, I walked more, exercised more, and drank less coffee per day, all of which improved my sleep duration. I also slept longer on weekends.

setwd("/Users/clarawang/Desktop/shiny")
mysleep <- read.csv("/Users/clarawang/Desktop/my_sleep.csv")

library(shiny)
library(ggplot2)
library(dplyr)
library(RColorBrewer)

mysleep <- mysleep %>%
  mutate(Sleep_Duration = as.numeric(Sleep.Duration), 
         Step_Count = as.numeric(Steps), 
         Weekend_or_Weekday = as.character(Weekend.or.Weekday),
         Go_to_Gym_or_Not = as.character(Go.to.Gym.or.Not),
         Have_Carbohydrate_in_Dinner_or_Not = as.character(Have.Carbohydrate.in.Dinner.or.Not), 
         Physical_Activity_Level = as.numeric(Physical.Activity.Level),
         Coffee_Consumption_in_Cups = as.factor(Coffee.Consumption.in.Cups),
  ) %>%
  select(c("Sleep_Duration", "Step_Count", "Weekend_or_Weekday", "Go_to_Gym_or_Not", "Have_Carbohydrate_in_Dinner_or_Not", "Physical_Activity_Level", "Coffee_Consumption_in_Cups"))

all_vars <- colnames(mysleep)[-1]

ui <- fluidPage(
  titlePanel("My Sleep Information"),
  sidebarLayout(
    sidebarPanel(
      selectInput("variable", "Select a variable:", choices = all_vars),
      checkboxInput("summary", "Show Summary", TRUE)
    ),
    mainPanel(
      plotOutput("plot"),
      verbatimTextOutput("summary")
    )
  )
)

server <- function(input, output) {
  
  output$plot <- renderPlot({
    selected_var <- input$variable
    
    if (is.numeric(mysleep[[selected_var]])) {
      ggplot(mysleep, aes_string(x = selected_var, y = "Sleep_Duration")) +
        geom_point() +
        labs(x = selected_var, y = "Sleep Duration (Hours)", title = paste0(selected_var, " VS Sleep Duration")) +
        geom_smooth(method = "lm", se = FALSE, aes(group = 1), size = 0.5, color = "grey") +
        theme_minimal() +
        theme(plot.title = element_text(hjust = 0.5), 
              axis.line = element_line(size = 0.5), 
              panel.grid.major = element_line(size = 0.1), 
              panel.grid.minor = element_line(size = 0.1))
    } else if (is.character(mysleep[[selected_var]])) {
      ggplot(mysleep, aes_string(x = selected_var, y = "Sleep_Duration")) +
        geom_boxplot() +
        labs(x = selected_var, y = "Sleep Duration (Hours)", title = paste0(selected_var, " VS Sleep Duration")) +
        theme_minimal() + 
        theme(plot.title = element_text(hjust = 0.5), 
              axis.line = element_line(size = 0.5), 
              panel.grid.major = element_line(size = 0.1), 
              panel.grid.minor = element_line(size = 0.1))
    } else if (is.factor(mysleep[[selected_var]])) {
      ggplot(mysleep, aes_string(x = selected_var, y = "Sleep_Duration", fill = selected_var)) +
        geom_bar(stat = "identity", position = "dodge") +
        labs(x = selected_var, y = "Sleep Duration (Hours)", title = paste0(selected_var, " VS Sleep Duration")) +
        scale_fill_brewer(palette = "Greys") +
        theme_minimal() +
        theme(plot.title = element_text(hjust = 0.5), 
              axis.line = element_line(size = 0.5), 
              panel.grid.major = element_line(size = 0.1), 
              panel.grid.minor = element_line(size = 0.1))
    } else {
      return(NULL)
    }
  })
  
  output$summary <- renderPrint({
    if (!input$summary) {
      return(cat("Summary is hidden"))
    } else {
      selected_var <- input$variable
      if (is.numeric(mysleep[[selected_var]])) {
        return(summary(mysleep[[selected_var]]))
      } else if (is.character(mysleep[[selected_var]])) {
        return(table(mysleep[[selected_var]]))
      } else if (is.factor(mysleep[[selected_var]])) {
        return(table(mysleep[[selected_var]]))
      } else {
        return(cat("Unsupported data type"))
      }
    }
  })
}

shinyApp(ui, server)

Plot 7.

I also found that my sleep duration increased when I spent less time on my phone or laptop in the hour before sleep.

accumulate_by <- function(dat, var) {
  var <- lazyeval::f_eval(var, dat)
  lvls <- plotly:::getLevels(var)
  dats <- lapply(seq_along(lvls), function(x) {
    cbind(dat[var %in% lvls[seq(1, x)], ], frame = lvls[[x]])
  })
  dplyr::bind_rows(dats)
}

animation_dt <- read.csv("/Users/clarawang/Desktop/animation_dt.csv")
fig <- animation_dt %>% 
  accumulate_by(~Date)

p4 <- fig %>%
  plot_ly(x = ~ Date, y = ~ Value, split = ~ Variable, frame = ~ frame, type = 'scatter', mode = 'lines', colors = c("#CC0033", "#666666"), color = ~ Variable) %>% 
  layout(title = "Screen Time before Sleep and Sleep Duration", xaxis = list(title = "Day", showline = TRUE, showgrid = TRUE, zeroline = FALSE), yaxis = list(title = "Time", showline = TRUE, showgrid = TRUE, zeroline = FALSE)) %>% 
  animation_opts(frame = 100, transition = 0) %>% 
  animation_slider(hide = TRUE) %>% 
  animation_button(x = 1.1, xanchor = "right", y = 0, yanchor = "bottom")

p4
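
The accumulate_by() helper is what makes the lines grow over time: for each frame it keeps every row up to and including that frame's date, so later frames contain more of the series. The sketch below reproduces the same idea with base R only. It is a simplified stand-in, not the helper itself: accumulate_sketch is a hypothetical name, sort(unique()) replaces the internal plotly:::getLevels() call, and the rows in toy are made up.

```r
# Simplified, base-R version of the cumulative-frame idea.
# `values` is the vector that defines the frames (for example, the dates).
accumulate_sketch <- function(dat, values) {
  lvls <- sort(unique(values))
  frames <- lapply(seq_along(lvls), function(i) {
    # Keep all rows up to the i-th level and tag them with that frame.
    cbind(dat[values %in% lvls[seq_len(i)], , drop = FALSE], frame = lvls[[i]])
  })
  do.call(rbind, frames)
}

# Hypothetical three-day series.
toy <- data.frame(Date = c(1, 2, 3), Value = c(6.5, 7.0, 7.5))
acc <- accumulate_sketch(toy, toy$Date)

# Frame 1 holds 1 row, frame 2 holds 2 rows, frame 3 holds 3 rows.
table(acc$frame)
```

Because each frame repeats all earlier rows, the result grows roughly quadratically with the number of dates, which is fine for a series of about 50 days.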

Plot 8.

I also slept longer when I had fewer assignments due that day and the next day.

assign <- my_sleep %>%
  select(c(Numbers.of.Assignment.Due, Sleep.Duration))

library(scales)
ggplot(assign, 
       aes(x = factor(Numbers.of.Assignment.Due,labels = c("0 Assignment", "1 Assignment", "2 Assignments", "3 Assignments", "4 Assignments")), y = Sleep.Duration, color = Numbers.of.Assignment.Due)) +
  geom_jitter(alpha = 0.7) + 
  labs(title = "Sleep Duration by Number of Assignments Due on that Day", y = "Sleep Duration (Hours)", x = "") +
  scale_color_gradient(low = "grey90", high = "black", name = "Numbers of\nAssignment Due") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5), 
        axis.line = element_line(size = 0.5), 
        panel.grid.major = element_line(size = 0.1), 
        panel.grid.minor = element_line(size = 0.1),
        legend.title = element_text(margin = margin(b = 10)))

Plot 9.

Finally, I found that my sleep duration after the change exceeded that of the larger community, and that my overall sleep duration over the past two months matched the global trend.

nov.sleep <- my_sleep %>%
  filter(Day %in% c(26:51))

combined_sleep_nov <- rbind(
  data.frame(Dataset = "Global Sleep", Duration = global_sleep$Sleep.Duration),
  data.frame(Dataset = "My Sleep", Duration = nov.sleep$Sleep.Duration)
)

colors <- c("#666666", "#CC0033")

p_nov <- ggplot(combined_sleep_nov, aes(x = Dataset, y = Duration, fill = Dataset)) +
  geom_boxplot() +
  labs(title = "Comparison of Global Sleep Duration against Mine after Changing", x = "Dataset", y = "Sleep Duration (Hours)") +
  scale_fill_manual(values = colors) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_line(size = 0.5),
    panel.grid.major = element_line(size = 0),
    panel.grid.minor = element_line(size = 0)
  )
p_nov

combined_sleep_final <- rbind(
  data.frame(Dataset = "Global Sleep", Duration = global_sleep$Sleep.Duration),
  data.frame(Dataset = "My Sleep", Duration = my_sleep$Sleep.Duration)
)

colors <- c("#666666", "#CC0033")

p_final <- ggplot(combined_sleep_final, aes(x = Dataset, y = Duration, fill = Dataset)) +
  geom_boxplot() +
  labs(title = "Comparison of Global Sleep Duration against My Overall Sleep Duration", x = "Dataset", y = "Sleep Duration (Hours)") +
  scale_fill_manual(values = colors) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_line(size = 0.5),
    panel.grid.major = element_line(size = 0),
    panel.grid.minor = element_line(size = 0)
  )
p_final