Introduction

Before analyzing the data, it’s important to load the necessary libraries. Each library serves a specific purpose in our analysis.

ggplot2 is essential for creating data visualizations. It builds complex, polished plots by layering data, geometries, and statistics. plotly adds interactivity to visualizations, so we can zoom, pan, and hover over plot elements to explore the data in greater detail. dplyr is a powerful package for data manipulation. It simplifies tasks such as filtering, summarizing, and transforming datasets into the structure needed for analysis. lme4 provides tools for fitting linear, generalized linear, and nonlinear mixed-effects models. In this project, it also supplies the sleepstudy dataset used to analyze how reaction times change with sleep deprivation.

library(ggplot2)
library(plotly)
library(dplyr)
library(lme4)

Dataset Overview

For this project, we use the sleepstudy dataset, which is part of the lme4 package. This dataset examines how sleep deprivation impacts reaction time. It contains 180 observations from 18 participants, with three variables. Reaction is the measured reaction time of the participant in milliseconds. Days is the number of days the participant has been sleep-deprived. Subject is a unique identifier for each participant in the study.

Pre-Processing the Data

data("sleepstudy", package = "lme4")
print(head(sleepstudy))
##   Reaction Days Subject
## 1 249.5600    0     308
## 2 258.7047    1     308
## 3 250.8006    2     308
## 4 321.4398    3     308
## 5 356.8519    4     308
## 6 414.6901    5     308
str(sleepstudy)
## 'data.frame':    180 obs. of  3 variables:
##  $ Reaction: num  250 259 251 321 357 ...
##  $ Days    : num  0 1 2 3 4 5 6 7 8 9 ...
##  $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...
missing_values <- sum(is.na(sleepstudy))
cat("Number of missing values:", missing_values, "\n")
## Number of missing values: 0
summary_stats <- sleepstudy %>%
  summarize(
    min_reaction = min(Reaction),
    max_reaction = max(Reaction),
    mean_reaction = mean(Reaction),
    median_reaction = median(Reaction),
    sd_reaction = sd(Reaction)
  )
print(summary_stats)
##   min_reaction max_reaction mean_reaction median_reaction sd_reaction
## 1     194.3322     466.3535      298.5079        288.6508    56.32876

The goal is to determine:

  1. How does reaction time change with increasing sleep deprivation?
  2. Are there individual differences in the impact of sleep deprivation?

A first look at the data suggests that reaction time tends to increase with the number of days of sleep deprivation, indicating a potential decline in performance. The Subject column shows that the data are grouped by participant: each participant was measured repeatedly, once per day over the course of the study.
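The repeated-measures structure can be checked directly. The sketch below assumes only that lme4 is installed. It counts the observations per participant and checks that each participant has one row for every recorded value of Days. The object names are illustrative, not part of the original analysis.

```r
# Load the dataset without attaching lme4
data("sleepstudy", package = "lme4")

# Number of rows recorded for each participant
obs_per_subject <- table(sleepstudy$Subject)

# Number of distinct Days values recorded for each participant
days_per_subject <- tapply(sleepstudy$Days, sleepstudy$Subject,
                           function(d) length(unique(d)))

# TRUE when every participant has the same number of rows and
# exactly one row per distinct day
is_balanced <- length(unique(as.integer(obs_per_subject))) == 1 &&
  all(as.integer(days_per_subject) == as.integer(obs_per_subject))

print(obs_per_subject)
print(is_balanced)
```

A balanced design like this is what makes per-day averages directly comparable, since every day's mean is based on the same set of participants.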

To assess the distribution of reaction times, we plotted a boxplot using:

ggplot(sleepstudy, aes(x = factor(0), y = Reaction)) +
  geom_boxplot() +
  labs(title = "Boxplot of Reaction Times", x = "", y = "Reaction Time (ms)")

Exploratory Analysis

Reaction times range from approximately 194 to 466 milliseconds, with a few high values that may reflect extreme effects of sleep deprivation.
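The values that a boxplot flags as outliers can also be listed directly. As a rough check, the following sketch uses base R's boxplot.stats(), which applies the same default 1.5 × IQR whisker rule as geom_boxplot(). The two functions compute the box edges slightly differently, so borderline points may not match exactly. The object names are illustrative.

```r
data("sleepstudy", package = "lme4")

# Observed range of reaction times (ms)
reaction_range <- range(sleepstudy$Reaction)

# Values beyond the whiskers under the default 1.5 * IQR rule
outlier_values <- boxplot.stats(sleepstudy$Reaction)$out

# Rows of the dataset that correspond to the flagged values
outlier_rows <- sleepstudy[sleepstudy$Reaction %in% outlier_values, ]

print(reaction_range)
print(outlier_rows)
```

Printing the full rows rather than just the values shows which participant and which day each extreme reaction time belongs to.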

Our data exploration showed that the relationship between reaction time and days of sleep deprivation is central to both questions. The raw data, however, contain only individual observations per subject, which makes trends and variability hard to see without visualization. We therefore built visualizations at both the individual and the group level, using three techniques that each highlight a different aspect of the data. Below, we describe the steps taken for each visualization and the reasoning behind them.

Visualizations

Our first plot visualizes how reaction time changes across days of sleep deprivation for each individual subject. This allows us to explore individual-level trends and detect variability among participants. The first step was to map the variables: the Days column (representing days of sleep deprivation) to the x-axis, and the Reaction column (representing reaction time in milliseconds) to the y-axis. To distinguish between subjects, we mapped the Subject column to the color aesthetic and grouped the data by Subject. Using geom_line(), we connected each subject’s data points with a line to show trends over time, and we added geom_point() to highlight the actual data points for better visibility.

Reaction Time vs. Days of Sleep Deprivation

ggplot(sleepstudy, aes(x = Days, y = Reaction, group = Subject, color = as.factor(Subject))) +
  geom_line(linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 2) +
  labs(
    title = "Reaction Time Across Days of Sleep Deprivation",
    x = "Days of Sleep Deprivation",
    y = "Reaction Time (ms)",
    color = "Subject ID"
  ) +
  theme_minimal()

From here, we learned that reaction times generally increased as the number of days of sleep deprivation increased, though the rate of change varied by subject. This variability suggests that some individuals are more resilient to sleep deprivation than others, which we will explore further in later analyses.
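One way to put numbers on this variability is to estimate a separate slope for each participant. The sketch below is an illustration rather than part of the original analysis: it fits an ordinary linear regression of Reaction on Days within each subject and collects the slopes, which estimate the change in reaction time (ms) per additional day.

```r
data("sleepstudy", package = "lme4")

# Fit a separate simple linear regression for each participant
# and keep the slope, i.e. the estimated change in ms per day
subject_slopes <- sapply(
  split(sleepstudy, sleepstudy$Subject),
  function(d) coef(lm(Reaction ~ Days, data = d))[["Days"]]
)

# Participants ordered from the flattest to the steepest trend
print(sort(subject_slopes))
```

Participants near the low end of the sorted vector are the ones whose reaction times changed least over the study. Each slope rests on only ten observations, so individual estimates should be read with caution.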

To clarify the individual trends, we added a linear regression line for each participant. Because Subject is mapped to color, geom_smooth(method = "lm") fits one line per participant:

ggplot(sleepstudy, aes(x = Days, y = Reaction, color = as.factor(Subject))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Reaction Time Trends with Regression Lines",
    x = "Days of Sleep Deprivation",
    y = "Reaction Time (ms)"
  )

While the first plot focused on individual trends, our second plot summarizes the data by calculating the average reaction time across all subjects at each level of sleep deprivation. This aggregate-level view helps us understand the overall trend and assess the variability in reaction times. We first calculated the mean and standard deviation of Reaction for each Days value, then plotted the means with error bars:

avg_data <- sleepstudy %>%
  group_by(Days) %>%
  summarize(
    mean_reaction = mean(Reaction),
    sd_reaction = sd(Reaction)
  )

ggplot(avg_data, aes(x = Days, y = mean_reaction)) +
  geom_line(linewidth = 1.2, color = "blue") +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 3, color = "darkblue") +
  geom_errorbar(aes(ymin = mean_reaction - sd_reaction, ymax = mean_reaction + sd_reaction), width = 0.2) +
  labs(
    title = "Mean Reaction Time Across Sleep Deprivation Days",
    x = "Days of Sleep Deprivation",
    y = "Mean Reaction Time (ms)"
  ) +
  theme_light()
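The pattern in the mean plot can also be checked numerically. The following sketch recomputes the per-day summary and derives two illustrative quantities that are not part of the original analysis: the day-to-day change in the mean, and the ratio of the standard deviation on the last day to that on the first day.

```r
library(dplyr)
data("sleepstudy", package = "lme4")

# Per-day mean and standard deviation of reaction time
avg_data <- sleepstudy %>%
  group_by(Days) %>%
  summarize(
    mean_reaction = mean(Reaction),
    sd_reaction = sd(Reaction)
  )

# Day-to-day change in the mean; positive values indicate a rise
mean_changes <- diff(avg_data$mean_reaction)

# Spread on the last day relative to the spread on the first day
sd_ratio <- last(avg_data$sd_reaction) / first(avg_data$sd_reaction)

print(mean_changes)
print(sd_ratio)
```

A ratio above 1 means that reaction times are more spread out at the end of the study than at the start, which is what widening error bars show visually.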

In the plot of mean reaction times, the blue line represents the mean reaction time at each level of sleep deprivation, and the error bars show ±1 standard deviation. The mean reaction time increases steadily with more days of sleep deprivation, consistent with our expectation that sleep deprivation impairs reaction time. The error bars generally widen as the days progress, indicating greater variability in how participants respond to prolonged sleep deprivation.

Static plots can limit exploration, whereas an interactive plot lets users focus on specific participants or data points.

Statistical Analysis

Interactive Exploration of Reaction Times

The third step was to create an interactive visualization that allows users to explore individual trends in reaction times dynamically. By using plotly, we converted a static ggplot into an interactive experience. The interactive features allow users to hover over points to see exact reaction times and subject IDs.

p <- ggplot(sleepstudy, aes(x = Days, y = Reaction, color = as.factor(Subject))) +
  geom_point(size = 2) +
  geom_line(aes(group = Subject)) +
  labs(
    title = "Interactive Plot: Reaction Times by Sleep Deprivation Days",
    x = "Days of Sleep Deprivation",
    y = "Reaction Time (ms)"
  ) +
  theme_classic()

ggplotly(p)

This visualization provides the same information as the first plot but in an interactive format. By hovering over points, users can see the exact day of sleep deprivation, the corresponding reaction time, and the subject’s ID. The interactivity makes it easier to pick out specific outliers or trends, such as subjects whose reaction times change little despite sleep deprivation.
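The hover label can be customized if the default one is hard to read. The sketch below shows one common pattern, not the approach used above: a text aesthetic is attached to the points and passed to the tooltip argument of ggplotly(). The names p_labeled and interactive_plot are illustrative.

```r
library(ggplot2)
library(plotly)
data("sleepstudy", package = "lme4")

# ggplot2 does not draw the text aesthetic (it warns about an unknown
# aesthetic); plotly reads it to build the hover label
p_labeled <- ggplot(sleepstudy,
                    aes(x = Days, y = Reaction, color = Subject, group = Subject)) +
  geom_line() +
  geom_point(aes(text = paste0("Subject: ", Subject,
                               "<br>Day: ", Days,
                               "<br>Reaction: ", round(Reaction, 1), " ms")),
             size = 2) +
  labs(
    x = "Days of Sleep Deprivation",
    y = "Reaction Time (ms)",
    color = "Subject ID"
  ) +
  theme_classic()

# Show only the custom text in the hover box
interactive_plot <- ggplotly(p_labeled, tooltip = "text")
```

Printing interactive_plot in an interactive session or a knitted HTML document renders the widget. Because only the points carry the text aesthetic, hovering over the connecting lines shows no label.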

Correlation

ggplot(sleepstudy, aes(x = Days, y = Reaction)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(
    title = "Correlation Between Days of Sleep Deprivation and Reaction Time",
    x = "Days of Sleep Deprivation",
    y = "Reaction Time (ms)"
  ) +
  theme_minimal()
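The relationship shown in the scatter plot can be summarized numerically. The sketch below is a minimal illustration, not a result reported by the original analysis. It computes the pooled Pearson correlation, which treats all rows as independent, and then fits a linear mixed-effects model with lme4 that allows each participant their own baseline and their own rate of change. The model formula is an assumption chosen to match the two questions posed earlier.

```r
library(lme4)
data("sleepstudy", package = "lme4")

# Pooled Pearson correlation; ignores that rows are repeated
# measurements on the same participants
pooled_cor <- cor(sleepstudy$Days, sleepstudy$Reaction)

# Random intercept and random slope for Days within each participant
mixed_fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Fixed effects: average baseline (Intercept) and average change per day (Days)
fixed_effects <- fixef(mixed_fit)

print(pooled_cor)
print(fixed_effects)
```

The fixed effect for Days addresses how reaction time changes on average. The random-slope variance, which summary(mixed_fit) reports, addresses how much that change differs between participants.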

Conclusion

This analysis of the sleepstudy dataset highlights the cognitive decline associated with sleep deprivation. While average trends suggest a steady increase in reaction times, individual differences reveal that some participants are more resilient to sleep deprivation. Future work could explore additional variables (e.g., age, baseline cognitive ability) to explain this variability.