STAT3333_Final_projectrmd

Technology Usage & Stress Wellness

Abstract
As technology advances and society’s dependence on it deepens, there is growing concern that excessive technology use can raise users’ stress levels. The American Psychological Association found that high attachment to devices and constant use of technology are associated with higher stress levels, and that about a fifth of Americans identify technology use as a significant source of stress (American Psychological Association, 2017). Examining the relationship between technology usage and stress is therefore important for understanding how technology may affect its users’ wellness.

Problem
As noted above, technology usage has been associated with higher stress among users. Factors other than technology also influence stress and wellness, including but not limited to age, gender, and physical activity levels (Idrees, Blal et al.). Understanding which factors affect stress can help individuals manage it and improve personal wellness.

Purpose
The purpose of this project is to determine whether stress levels differ significantly by gender or age group, whether screen time is correlated with mental health scores, whether stress and physical activity are independent, and whether regression models can predict stress levels from tech use. Understanding how tech use impacts wellness can inform healthier digital habits and public health strategies.

The Dataset
The dataset used in this project was made available on Kaggle by Nagpal Prabhavalkar. It contains information on 5,000 participants and 25 variables: the participant’s user ID, age, and gender; daily screen time, phone usage, laptop usage, tablet usage, TV usage, social media usage, and work-related technology usage (each in hours per day); entertainment hours; gaming hours; sleep duration (in hours); sleep quality (rated on a scale of 1 to 5); mood rating (rated on a scale of 1 to 10); stress level (rated on a scale of 1 to 10); physical activity (in hours per week); location type (rural, suburban, or urban); mental health score (rated on a scale of 1 to 100); whether the participant uses wellness apps (true or false); whether the participant eats healthily (true or false); caffeine intake (in milligrams per day); weekly anxiety score (rated from 0 to 20); weekly depression score (rated from 0 to 20); and mindfulness score (in minutes per day).

This project focuses on gender, age, daily screen time, stress level, mental health score, physical activity, mood rating, sleep quality, and location type. Measures such as stress level, mental health score, mood rating, and sleep quality are subjective, so participants may not have interpreted or answered them consistently.

This project analyzes the relationship between technology usage (daily screen time, phone usage) and wellness outcomes (stress level, mental health score, sleep quality). We investigate differences across age groups, genders, and location types.
Research Questions:
• Do stress levels differ significantly by gender or age group?
• Is screen time correlated with mental health scores?
• Are stress and physical activity independent?
• Can regression models predict stress from tech use?
Relevance: Understanding how tech use impacts wellness can inform healthier digital habits and public health strategies.

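# Load packages used throughout the analysis
# (assumed: the original library calls are not shown in this document)
library(dplyr)      # %>%, group_by(), summarise(), select()
library(ggplot2)    # plots
library(infer)      # specify(), generate(), calculate(), get_ci()
library(patchwork)  # p1 + p2 side-by-side layouts

# Load the data
data <- read.csv("Tech_Use_Stress_Wellness.csv")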
# Exploratory Data Analysis
#View(data)
#str(data)
#glimpse(data)
#summary(data)
ggplot(data, aes(x = daily_screen_time_hours)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  labs(title = "Distribution of Daily Screen Time")

# Daily screen time by gender
ggplot(data, aes(x = daily_screen_time_hours, fill = gender)) +
  geom_histogram() +
  labs(title = "Daily Screen Time per Gender",
       x = "Daily Screen Time (hours)",
       y = "Count per Gender") +
  theme_minimal()

# Correlation between daily screen time and stress level
ggplot(data, aes(x = daily_screen_time_hours, y = stress_level, color = gender)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Screen Time vs Stress by Gender",
       x = "Daily Screen Time (hours)",
       y = "Stress Level") + theme_minimal()

# Distribution of Screen Time and Physical Activity
ggplot(data, aes(x = daily_screen_time_hours)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  labs(title = "Distribution of Daily Screen Time",
       x = "Daily Screen Time (hours)",
       y = "Count") +
  theme_minimal()

ggplot(data, aes(x = physical_activity_hours_per_week)) +
  geom_histogram(binwidth = 1, fill = "darkgreen", color = "white") +
  labs(title = "Distribution of Weekly Physical Activity",
       x = "Physical Activity (hours per week)",
       y = "Count") +
  theme_minimal()

# Wellness
ggplot(data, aes(x = daily_screen_time_hours, y = mental_health_score)) +
  geom_point(alpha = 0.6, color = "steelblue", size = 2) +   # softer points
  geom_smooth(method = "lm", se = TRUE, color = "darkred", linetype = "dashed", linewidth = 1.2) +  # linewidth replaces size for lines in ggplot2 >= 3.4
  labs(
    title = "Daily Screen Time vs Mental Health Score",
    x = "Daily Screen Time (hours)",
    y = "Mental Health Score") + theme_minimal()

# Summary statistics
data %>%
  group_by(gender) %>%
  summarise(mean_stress = mean(stress_level, na.rm = TRUE),
            mean_wellness = mean(mental_health_score, na.rm = TRUE))

## # A tibble: 3 × 3
##   gender mean_stress mean_wellness
##   <chr>        <dbl>         <dbl>
## 1 Female        5.72          64.7
## 2 Male          5.69          64.9
## 3 Other         6.07          63.4

data %>%
  group_by(location_type) %>%
  summarise(mean_stress = mean(stress_level, na.rm = TRUE),
            mean_wellness = mean(mental_health_score, na.rm = TRUE))

## # A tibble: 3 × 3
##   location_type mean_stress mean_wellness
##   <chr>               <dbl>         <dbl>
## 1 Rural                5.64          65.2
## 2 Suburban             5.63          65.1
## 3 Urban                5.81          64.4

data %>%
  group_by(age) %>%
  summarise(mean_stress = mean(stress_level, na.rm = TRUE),
            mean_wellness = mean(mental_health_score, na.rm = TRUE))

## # A tibble: 60 × 3
##      age mean_stress mean_wellness
##    <int>       <dbl>         <dbl>
##  1    15        8.96          52.3
##  2    16        8.56          54.3
##  3    17        8.99          52.1
##  4    18        8.71          52.6
##  5    19        8.9           52.5
##  6    20        8.94          52.4
##  7    21        8.13          55.1
##  8    22        8.89          52.4
##  9    23        8.68          52.5
## 10    24        8.94          51.2
## # ℹ 50 more rows

# Mean daily screen time by location type
data %>%
  group_by(location_type) %>%
  summarise(mean_screen_time = mean(daily_screen_time_hours, na.rm = TRUE))

## # A tibble: 3 × 2
##   location_type mean_screen_time
##   <chr>                    <dbl>
## 1 Rural                     5.00
## 2 Suburban                  4.94
## 3 Urban                     5.11

# 95% bootstrap percentile CI for the overall mean stress level, using the infer package
boot_dist <- data %>%
  specify(response = stress_level) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

boot_ci <- boot_dist %>%
  get_ci(level = 0.95, type = "percentile")
boot_ci

## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     5.63     5.80

# Manual bootstrap for mean stress level, adapted from Chap05Bootstrap.Rmd
# (Chihara, L., & Hesterberg, T. (2019). Mathematical Statistics with Resampling and R.)
x <- data$stress_level
n <- length(x)
N <- 10^4

data.mean <- numeric(N)
# set.seed(2025)  # uncomment for reproducible resamples
for (i in 1:N) {
  boot_sample <- sample(x, n, replace = TRUE)  # resample with replacement
  data.mean[i] <- mean(boot_sample)
}

# Check normality
hist(data.mean, main = "Bootstrap distribution of means")
abline(v = mean(data.mean), col = "blue", lty = 2)

qqnorm(data.mean)
qqline(data.mean)

#bootstrap mean
mean(data.mean)

## [1] 5.717703

#bootstrap standard error or std dev of the boot means
sd(data.mean)

## [1] 0.04138756

# 95% boot percentile CI
quantile(data.mean, c(0.025, 0.975))

##    2.5%   97.5% 
## 5.63760 5.79861

# Hypothesis test
# Example: Is mean stress > 8?
t.test(data$stress_level, mu = 8, alternative = "greater")

## 
##  One Sample t-test
## 
## data:  data$stress_level
## t = -55.345, df = 4999, p-value = 1
## alternative hypothesis: true mean is greater than 8
## 95 percent confidence interval:
##  5.650578      Inf
## sample estimates:
## mean of x 
##    5.7184

# Compare stress by gender: a standard two-sample t-test is insufficient because gender has 3 levels, so use pairwise t-tests
pairwise.t.test(data$stress_level, data$gender, p.adjust.method = "bonferroni")

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  data$stress_level and data$gender 
## 
##       Female Male
## Male  1.00   -   
## Other 0.34   0.24
## 
## P value adjustment method: bonferroni

# Correlation matrix for key numeric variables
numeric_vars <- data %>%
  select(stress_level,
         daily_screen_time_hours,
         mental_health_score,
         physical_activity_hours_per_week)

cor_matrix <- cor(numeric_vars, use = "complete.obs")

cor_matrix

##                                  stress_level daily_screen_time_hours
## stress_level                        1.0000000               0.6656726
## daily_screen_time_hours             0.6656726               1.0000000
## mental_health_score                -0.9038402              -0.6218312
## physical_activity_hours_per_week   -0.7421300              -0.4626024
##                                  mental_health_score
## stress_level                              -0.9038402
## daily_screen_time_hours                   -0.6218312
## mental_health_score                        1.0000000
## physical_activity_hours_per_week           0.7858499
##                                  physical_activity_hours_per_week
## stress_level                                           -0.7421300
## daily_screen_time_hours                                -0.4626024
## mental_health_score                                     0.7858499
## physical_activity_hours_per_week                        1.0000000

# One-sample CI and hypothesis test for mean stress level
mean_stress <- mean(data$stress_level, na.rm = TRUE)
sd_stress   <- sd(data$stress_level, na.rm = TRUE)
n_stress    <- sum(!is.na(data$stress_level))

# 95% CI for population mean stress (normal approximation, z = 1.96; n is large)
se_stress <- sd_stress / sqrt(n_stress)
lower_ci  <- mean_stress - 1.96 * se_stress
upper_ci  <- mean_stress + 1.96 * se_stress
c(lower_ci = lower_ci, upper_ci = upper_ci)

## lower_ci upper_ci 
## 5.637599 5.799201

# One-sample t-test: is mean stress > 5?
t.test(data$stress_level, mu = 5, alternative = "greater")

## 
##  One Sample t-test
## 
## data:  data$stress_level
## t = 17.426, df = 4999, p-value < 2.2e-16
## alternative hypothesis: true mean is greater than 5
## 95 percent confidence interval:
##  5.650578      Inf
## sample estimates:
## mean of x 
##    5.7184

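# Correlation between activity and stress
# (data$physical_activity resolves to physical_activity_hours_per_week through partial matching of `$` on data frames)
cor.test(data$physical_activity, data$stress_level)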

## 
##  Pearson's product-moment correlation
## 
## data:  data$physical_activity and data$stress_level
## t = -78.278, df = 4998, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7543317 -0.7294157
## sample estimates:
##      cor 
## -0.74213

# Linear regression: stress ~ screen time
lm_model <- lm(stress_level ~ daily_screen_time_hours, data = data)
summary(lm_model)

## 
## Call:
## lm(formula = stress_level ~ daily_screen_time_hours, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.6869 -1.6127 -0.0125  1.5731  6.2244 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              0.39280    0.08988    4.37 1.27e-05 ***
## daily_screen_time_hours  1.05711    0.01676   63.06  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.176 on 4998 degrees of freedom
## Multiple R-squared:  0.4431, Adjusted R-squared:  0.443 
## F-statistic:  3977 on 1 and 4998 DF,  p-value: < 2.2e-16
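
The fitted line above can be checked by hand. The sketch below hard-codes the intercept and slope from the summary output and the Pearson correlation from the correlation matrix; `predict_stress` is a hypothetical helper name introduced here for illustration and is not part of the original analysis.

```r
# Coefficients copied from the summary(lm_model) output above
intercept <- 0.39280
slope     <- 1.05711

# Hypothetical helper: predicted stress for a given daily screen time (hours)
predict_stress <- function(hours) intercept + slope * hours

pred <- predict_stress(c(2, 5, 8))
pred

# In simple linear regression, R-squared equals the squared Pearson correlation
r_screen_stress <- 0.6656726   # from the correlation matrix above
r_squared <- r_screen_stress^2
round(r_squared, 4)
```

Because the model is a straight line, predictions for very high or very low screen time can fall outside the 1 to 10 stress scale, so they are best read within the range of the observed data.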

# Plot regression
ggplot(data, aes(x = daily_screen_time_hours, y = stress_level)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_smooth(method = "lm", se = TRUE, color = "darkred", linetype = "dashed") +
  labs(title = "Screen Time vs Stress Level")+ theme_minimal()

# Simple linear regression: mental health ~ screen time
lm_mh_screen <- lm(mental_health_score ~ daily_screen_time_hours, data = data)
summary(lm_mh_screen)

## 
## Call:
## lm(formula = mental_health_score ~ daily_screen_time_hours, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.476  -7.280  -0.032   7.308  37.495 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             87.11546    0.42374  205.59   <2e-16 ***
## daily_screen_time_hours -4.43626    0.07903  -56.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.26 on 4998 degrees of freedom
## Multiple R-squared:  0.3867, Adjusted R-squared:  0.3866 
## F-statistic:  3151 on 1 and 4998 DF,  p-value: < 2.2e-16

# 95% CI for the screen-time slope
confint(lm_mh_screen, parm = "daily_screen_time_hours", level = 0.95)

##                             2.5 %    97.5 %
## daily_screen_time_hours -4.591194 -4.281327

# Simple linear regression: stress ~ physical activity
lm_stress_activity <- lm(stress_level ~ physical_activity_hours_per_week, data = data)
summary(lm_stress_activity)

## 
## Call:
## lm(formula = stress_level ~ physical_activity_hours_per_week, 
##     data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.625 -1.418 -0.058  1.753  9.921 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       8.23193    0.04236  194.31   <2e-16 ***
## physical_activity_hours_per_week -0.94517    0.01207  -78.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.954 on 4998 degrees of freedom
## Multiple R-squared:  0.5508, Adjusted R-squared:  0.5507 
## F-statistic:  6127 on 1 and 4998 DF,  p-value: < 2.2e-16

# 95% CI for the physical-activity slope
confint(lm_stress_activity,
        parm = "physical_activity_hours_per_week",
        level = 0.95)

##                                       2.5 %     97.5 %
## physical_activity_hours_per_week -0.9688433 -0.9215002

# Simple linear regression: stress ~ mental health score
lm_stress_mh <- lm(stress_level ~ mental_health_score, data = data)
summary(lm_stress_mh)

## 
## Call:
## lm(formula = stress_level ~ mental_health_score, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.2690 -0.8606 -0.0499  0.9084  5.1334 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         18.748775   0.089018   210.6   <2e-16 ***
## mental_health_score -0.201191   0.001347  -149.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.247 on 4998 degrees of freedom
## Multiple R-squared:  0.8169, Adjusted R-squared:  0.8169 
## F-statistic: 2.23e+04 on 1 and 4998 DF,  p-value: < 2.2e-16

# 95% CI for the slope (effect of mental health on stress)
confint(lm_stress_mh, parm = "mental_health_score", level = 0.95)

##                          2.5 %     97.5 %
## mental_health_score -0.2038321 -0.1985499

# ANOVA: stress by age group
data$age_group <- cut(
  data$age,
  breaks = c(0, 25, 40, 60, 80, 100),
  labels = c("≤25", "26-40", "41-60", "61-80", "81+"),
  right = TRUE
)

anova_age <- aov(stress_level ~ age_group, data = data)
summary(anova_age)

##               Df Sum Sq Mean Sq F value Pr(>F)    
## age_group      3  10928    3643   576.8 <2e-16 ***
## Residuals   4996  31551       6                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Diagnostics: Q-Q plot
plot(anova_age, which = 2)

# Permutation ANOVA for robustness
set.seed(2025)
observed <- anova(lm(stress_level ~ age_group, data = data))$`F value`[1]
N <- 1000
results <- numeric(N)
for (i in 1:N) {
  index <- sample(seq_along(data$stress_level))
  stress_perm <- data$stress_level[index]
  results[i] <- anova(lm(stress_perm ~ age_group, data = data))$`F value`[1]
}
perm_p <- (sum(results >= observed) + 1) / (N + 1)
perm_p

## [1] 0.000999001

# Post-hoc Tukey HSD
TukeyHSD(anova_age)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = stress_level ~ age_group, data = data)
## 
## $age_group
##                   diff        lwr        upr     p adj
## 26-40-≤25   -3.2093500 -3.4917621 -2.9269378 0.0000000
## 41-60-≤25   -3.6099167 -3.8781558 -3.3416777 0.0000000
## 61-80-≤25   -4.3777632 -4.6641864 -4.0913399 0.0000000
## 41-60-26-40 -0.4005668 -0.6415778 -0.1595558 0.0001164
## 61-80-26-40 -1.1684132 -1.4295117 -0.9073148 0.0000000
## 61-80-41-60 -0.7678465 -1.0135454 -0.5221475 0.0000000

# Plots side by side
p1 <- ggplot(data, aes(x = age_group, y = stress_level)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Stress Level by Age Group") + theme_minimal()

p2 <- ggplot(data, aes(x = age_group, y = daily_screen_time_hours)) +
  geom_jitter(width = 0.2, alpha = 0.5, color = "darkgreen") +
  labs(title = "Daily Screen Time by Age Group") + theme_minimal()

p1 + p2

Stress by Age Group
A one-way ANOVA was conducted to examine differences in stress levels across age groups. One group (81+) had no observations, so the analysis effectively compared four age groups (≤25, 26–40, 41–60, 61–80). The ANOVA indicated a strong overall effect of age group on stress, F(3, 4996) = 576.8, p < 0.001.
Tukey post-hoc comparisons showed that the youngest group (≤25) reported substantially higher mean stress than all older groups (differences ≈ 3.2–4.4 units on the stress scale, all p < 0.001). Mean stress decreased with each successive age group: 26–40 was lower than ≤25, 41–60 was lower than 26–40 (difference ≈ 0.4, p < 0.001), and 61–80 had the lowest stress overall.
A permutation ANOVA yielded p ≈ 0.001, the smallest value attainable with 1,000 permutations, which is consistent with the classical ANOVA and indicates that the age-related differences in stress are very unlikely to be due to chance. Visualizations of stress and daily screen time by age group suggested that younger participants reported both higher screen time and higher stress, pointing to a potential link between intensive tech use and stress in younger participants.
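
To put a size on the age effect, the sums of squares in the ANOVA table can be turned into an eta-squared value. This is a minimal sketch that copies the rounded sums of squares and degrees of freedom from the `summary(anova_age)` output, so the results are approximate; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from summary(anova_age)
ss_between <- 10928
ss_within  <- 31551
df_between <- 3
df_within  <- 4996

# Eta-squared: share of total variance in stress associated with age group
eta_sq <- ss_between / (ss_between + ss_within)

# Rebuild the F statistic from the mean squares as a consistency check
f_stat <- (ss_between / df_between) / (ss_within / df_within)

round(c(eta_squared = eta_sq, F = f_stat), 3)
```

An eta-squared of roughly 0.26 means that about a quarter of the variance in stress scores is associated with age group in this sample.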

# ANOVA: stress by location type
anova_loc <- aov(stress_level ~ location_type, data = data)
summary(anova_loc)

##                 Df Sum Sq Mean Sq F value Pr(>F)
## location_type    2     39  19.309   2.273  0.103
## Residuals     4997  42441   8.493

# Diagnostics
plot(anova_loc, which = 2)

# Permutation ANOVA
set.seed(2025)
observed <- anova(lm(stress_level ~ location_type, data = data))$`F value`[1]
N <- 1000
results <- numeric(N)
for (i in 1:N) {
  index <- sample(seq_along(data$stress_level))
  stress_perm <- data$stress_level[index]
  results[i] <- anova(lm(stress_perm ~ location_type, data = data))$`F value`[1]
}
perm_p <- (sum(results >= observed) + 1) / (N + 1)
perm_p

## [1] 0.1288711

# Post-hoc Tukey HSD
TukeyHSD(anova_loc)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = stress_level ~ location_type, data = data)
## 
## $location_type
##                       diff         lwr       upr     p adj
## Suburban-Rural -0.01390902 -0.29024488 0.2624268 0.9923526
## Urban-Rural     0.16735340 -0.08479586 0.4195027 0.2650430
## Urban-Suburban  0.18126242 -0.04329960 0.4058244 0.1409578

# Plots side by side
p3 <- ggplot(data, aes(x = location_type, y = stress_level)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Stress Level by Location") + theme_minimal()

p4 <- ggplot(data, aes(x = location_type, y = daily_screen_time_hours)) +
  geom_jitter(width = 0.2, alpha = 0.5, color = "darkgreen") +
  labs(title = "Daily Screen Time by Location") + theme_minimal()

p3 + p4

Stress by Location Type
A one-way ANOVA compared mean stress levels across location types (Urban, Suburban, Rural). The overall effect of location was not statistically significant, F(2, 4997) = 2.27, p = 0.10. A permutation ANOVA produced a similar p-value (≈ 0.13), indicating that any observed differences in mean stress by location could plausibly be due to random variation.
Tukey post-hoc comparisons showed that the estimated mean stress in urban areas was slightly higher than in rural and suburban areas, but all confidence intervals included zero and all adjusted p-values exceeded 0.10, so no pairwise differences met conventional significance thresholds.
Boxplots and jittered scatterplots suggest a small trend toward higher stress and higher screen time in urban participants, but given the non-significant test results, these patterns should be interpreted cautiously as suggestive rather than conclusive.
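
The permutation ANOVA is run twice in this report with nearly identical loops, so it could be wrapped in a small function. The sketch below is one possible way to do that, not the code used for the results above: `perm_anova_p` is a hypothetical helper, and the data are simulated purely to show the call.

```r
# Hypothetical helper: permutation p-value for a one-way ANOVA F statistic
perm_anova_p <- function(response, group, n_perm = 1000) {
  group <- droplevels(factor(group))   # drop empty levels such as "81+"
  f_of <- function(y) anova(lm(y ~ group))$`F value`[1]
  observed <- f_of(response)
  # Shuffle the response to break any link with the grouping variable
  permuted <- replicate(n_perm, f_of(sample(response)))
  (sum(permuted >= observed) + 1) / (n_perm + 1)
}

# Simulated data (not the project dataset): three groups with identical means
set.seed(2025)
sim_group  <- rep(c("Rural", "Suburban", "Urban"), each = 50)
sim_stress <- rnorm(150, mean = 5.7, sd = 2.9)

p_null <- perm_anova_p(sim_stress, sim_group, n_perm = 500)
p_null
```

With the project data, a call such as `perm_anova_p(data$stress_level, data$location_type)` would be expected to behave like the loop above, and the smallest p-value it can return is 1 / (n_perm + 1).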

# Chi-square test for independence of mood and sleep quality
tab <- table(data$mood_rating, data$sleep_quality)
chisq.test(tab)

## 
##  Pearson's Chi-squared test
## 
## data:  tab
## X-squared = 1083.8, df = 360, p-value < 2.2e-16

chi <- chisq.test(tab)
# chi$stdres
# Identify cells with large positive standardized residuals (> 3)
thr <- 3
idx <- which(chi$stdres > thr, arr.ind = TRUE)

cbind(
  mood_rating  = rownames(chi$stdres)[idx[,"row"]],
  sleep_quality = colnames(chi$stdres)[idx[,"col"]],
  std_residual = chi$stdres[idx]
)

##       mood_rating sleep_quality std_residual      
##  [1,] "1.1"       "1"           "9.57135908716979"
##  [2,] "1"         "2"           "5.09738157829577"
##  [3,] "2.8"       "2"           "3.382164807067"  
##  [4,] "3.7"       "2"           "3.2831630034317" 
##  [5,] "4"         "2"           "4.39496047660441"
##  [6,] "1"         "3"           "14.3292517589722"
##  [7,] "1.2"       "3"           "4.35712315816821"
##  [8,] "1.5"       "3"           "3.42716588748514"
##  [9,] "1.6"       "3"           "3.15054601419805"
## [10,] "7.5"       "5"           "3.94815298706102"
## [11,] "7.8"       "5"           "5.50404644494962"
## [12,] "7.9"       "5"           "4.74986302156519"
## [13,] "8"         "5"           "4.76489626598922"
## [14,] "8.2"       "5"           "3.2582363043431" 
## [15,] "8.3"       "5"           "4.4404007644662" 
## [16,] "8.4"       "5"           "4.19693667031868"
## [17,] "8.6"       "5"           "4.19166505701785"
## [18,] "8.7"       "5"           "3.7209925989437" 
## [19,] "8.9"       "5"           "4.63503982505443"
## [20,] "9.3"       "5"           "3.39793654540615"
## [21,] "9.4"       "5"           "3.17726105895486"
## [22,] "9.5"       "5"           "3.02666882845495"
## [23,] "9.6"       "5"           "3.33977948902774"
## [24,] "10"        "5"           "7.52878571901207"

# Bin mood ratings into categories
data$mood_cat <- cut(data$mood_rating,
                     breaks = c(0, 3, 7, 10),
                     labels = c("Low", "Medium", "High"),
                     include.lowest = TRUE)

# Drop rows with missing values
clean_df <- na.omit(data[, c("mood_cat", "sleep_quality")])

# Force both variables to be factors with explicit levels
clean_df$mood_cat <- factor(clean_df$mood_cat,
                            levels = c("Low", "Medium", "High"))

clean_df$sleep_quality <- factor(clean_df$sleep_quality,
                                 levels = c("1","2","3","4","5"))

# Build contingency table
tab <- table(clean_df$mood_cat, clean_df$sleep_quality)

library(knitr)

# Convert to data frame for kable
df_tab <- as.data.frame.matrix(tab)

# Create kable
kable(df_tab, caption = "Contingency Table of Mood Category by Sleep Quality",
      align = "c")

Contingency Table of Mood Category by Sleep Quality
Mood \ Sleep quality     1     2     3      4     5
Low                      1    22   599   1096   177
Medium                   0     9   336   1224   445
High                     0     0    44    578   469

Independence of Mood and Sleep Quality
A chi‑square test of independence was performed to assess the relationship between mood ratings and sleep quality. The test indicated a statistically significant association (χ² = 1083.8, df = 360, p < 0.001), although many cells of the full mood-by-sleep table are sparse, so the chi-square approximation should be read with caution. Standardized residuals above 3 showed that the largest contributions came from participants with very low mood ratings (≈1–1.1) combined with lower sleep quality (ratings of 1–3), where the largest residuals were about 9.6 and 14.3. This suggests that participants reporting very low mood were disproportionately likely to report poorer sleep. Conversely, high mood ratings (≈7.5–10) paired with excellent sleep quality (rating = 5) also showed large positive residuals (≈3–7.5), indicating that good sleep was associated with higher mood scores.
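
The binned contingency table above is built but not tested in the report. As a hedged follow-up, the sketch below re-enters its counts and runs the chi-square test on them. Collapsing sleep-quality ratings 1 to 3 into one column is an assumption made here to avoid very small expected counts, and Cramér's V is added as an effect size that the original analysis does not report.

```r
# Counts copied from the contingency table above
# (rows: mood category; columns: sleep quality 1-5)
counts <- matrix(
  c(1, 22, 599, 1096, 177,
    0,  9, 336, 1224, 445,
    0,  0,  44,  578, 469),
  nrow = 3, byrow = TRUE,
  dimnames = list(mood = c("Low", "Medium", "High"),
                  sleep = as.character(1:5))
)

# Assumption: collapse the sparse sleep-quality levels 1-3 into one column
collapsed <- cbind("1-3" = rowSums(counts[, 1:3]), counts[, 4:5])

chi_binned <- chisq.test(collapsed)
chi_binned

# Cramér's V as an effect size for the association
n_obs <- sum(collapsed)
cramers_v <- sqrt(unname(chi_binned$statistic) /
                    (n_obs * (min(dim(collapsed)) - 1)))
cramers_v
```

Binning trades detail for stability: the test has 4 degrees of freedom instead of 360, and every expected count is large enough for the chi-square approximation to be reasonable.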

# Bootstrap CI for mean stress level
# Manual bootstrap from Chap05Bootstrap.Rmd
x <- data$stress_level
n <- length(x)
N <- 1000
data.mean <- numeric(N)

for (i in 1:N) {
  boot_sample <- sample(x, n, replace = TRUE)
  data.mean[i] <- mean(boot_sample)
}

mean(data.mean)

## [1] 5.716836

sd(data.mean)

## [1] 0.04237383

quantile(data.mean, c(0.025, 0.975))

##     2.5%    97.5% 
## 5.636770 5.802415

# Compare to parametric t-CI
t.test(x)$conf.int

## [1] 5.63758 5.79922
## attr(,"conf.level")
## [1] 0.95

Bootstrap vs Parametric Confidence Intervals
To evaluate the robustness of the mean stress estimate, both a parametric t‑based confidence interval and a bootstrap percentile interval were computed. The parametric 95% CI for mean stress was (5.638, 5.799), narrow and centered on the sample mean of 5.718. The bootstrap percentile CI was (5.637, 5.802), with bounds differing from the parametric ones by less than 0.005. This agreement suggests that the parametric assumptions were reasonable for this dataset, and the bootstrap provides reassurance that the conclusion does not depend on normality of the stress scores. Together, these analyses show that stress levels vary meaningfully across age groups, that mood and sleep are interrelated, and that bootstrap methods can corroborate parametric inference.
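
The agreement between the two intervals can be quantified directly from the reported bounds. This is a small sketch using numbers copied from the outputs above; the variable names are illustrative and do not come from the original code.

```r
# Interval bounds copied from the outputs above
t_ci        <- c(lower = 5.63758,  upper = 5.79922)   # t.test(x)$conf.int
boot_pct_ci <- c(lower = 5.636770, upper = 5.802415)  # bootstrap percentile, N = 1000

# Compare widths and midpoints of the two intervals
widths    <- c(t = unname(diff(t_ci)), bootstrap = unname(diff(boot_pct_ci)))
midpoints <- c(t = mean(t_ci), bootstrap = mean(boot_pct_ci))

# Standard error implied by the t interval, for comparison with the
# bootstrap standard error reported above (0.04237383)
se_from_t <- unname(diff(t_ci)) / (2 * qt(0.975, df = 4999))

round(rbind(widths, midpoints), 4)
round(se_from_t, 4)
```

The implied parametric standard error (about 0.041) is close to the bootstrap standard error (about 0.042), which is the numerical basis for saying the two methods agree.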

Conclusion
This study investigated how technology use relates to stress and broader wellness outcomes using ANOVA, chi-square tests, correlation, regression, and bootstrap methods. Stress levels differed significantly across age groups, with the youngest participants (≤25) reporting the highest stress and mean stress decreasing with each older group. Gender differences were not statistically significant (Bonferroni-adjusted pairwise p-values of 0.24 or higher), and location type (Urban, Suburban, Rural) showed no statistically significant effect, though urban participants displayed a small, non-significant trend toward higher stress and screen time. In simple linear regressions, daily screen time was a strong positive predictor of stress (about 1.06 points per additional hour, R² = 0.44), while physical activity was strongly negatively correlated with stress (r = −0.74), underscoring the role of lifestyle factors in wellness. Mood and sleep quality were also closely linked: the chi-square test and its standardized residuals showed that very low mood was disproportionately associated with poorer sleep, while excellent sleep aligned with high mood ratings. Bootstrap confidence intervals for mean stress closely matched parametric t-based intervals, supporting the stability of that estimate.

Overall, these results are consistent with the abstract’s premise that technology use is associated with higher stress and lower wellness, particularly among younger participants, while also showing that wellness indicators such as sleep, mood, and physical activity are interrelated. The findings suggest that healthier digital habits, combined with lifestyle interventions, may help mitigate stress and promote better overall wellness.

Insights:
• Stress varies substantially across age, with younger adults (≤25) reporting the highest levels and older groups reporting lower stress.
• Mood and sleep quality are strongly associated, consistent with the idea that sleep and emotional wellbeing are intertwined.
• Bootstrap methods produced confidence intervals that closely matched parametric t-intervals, increasing confidence in the stability of the stress estimates.

Limitations:
• Data are self‑reported, which may introduce bias or measurement error.
• The study is observational, limiting causal inference.
• Unequal group sizes and potential confounders (e.g., occupation, health status) may influence results.

References
• Kaggle Dataset: Tech Use & Stress Wellness. https://www.kaggle.com/datasets/nagpalprabhavalkar/tech-use-and-stress-wellness?resource=download
• Chihara, L., & Hesterberg, T. (2019). Mathematical Statistics with Resampling and R.
• American Psychological Association (2017). Stress in America: Coping with Change. Stress in America™ Survey.

Marali Benitez, Kelly Thackeray, Lance John

2025-11-11
