This week, we explore time-based trends in page views related to our dataset. Since our original dataset doesn’t include a time column, we use Wikipedia page views for “Sleep deprivation” as a proxy for public interest in this topic over time.
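We pull daily English-Wikipedia page views for the article “Sleep_deprivation” from 2020-01-01 through 2023-12-31, along with three related articles (“Stress_(biology)”, “Screen_time”, and “Insomnia”) for comparison.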
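# Packages used throughout this section
library(wikipediatrend)  # wp_trend()
library(dplyr)           # select(), rename(), left_join(), mutate()
library(tsibble)         # as_tsibble()
library(fable)           # TSLM(), ETS(); attaches fabletools (model(), report(), augment(), components())
library(feasts)          # ACF()
library(ggplot2)
library(scales)          # comma
library(lubridate)       # year()
library(vars)            # VAR(); attaches MASS, which masks select(), hence dplyr::select() below

# Primary page: Sleep Deprivation
views_sd <- wp_trend(
  page = "Sleep_deprivation",
  from = "2020-01-01",
  to   = "2023-12-31",
  lang = "en"
)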
# Additional terms for comparison
views_stress <- wp_trend("Stress_(biology)", from = "2020-01-01", to = "2023-12-31", lang = "en")
views_screen <- wp_trend("Screen_time", from = "2020-01-01", to = "2023-12-31", lang = "en")
views_insomnia <- wp_trend("Insomnia", from = "2020-01-01", to = "2023-12-31", lang = "en")
# Clean and combine
views_all <- views_sd %>%
  dplyr::select(date, views) %>%
  dplyr::rename(SleepDeprivation = views) %>%
  left_join(views_stress %>% dplyr::select(date, views) %>% dplyr::rename(Stress = views), by = "date") %>%
  left_join(views_screen %>% dplyr::select(date, views) %>% dplyr::rename(ScreenTime = views), by = "date") %>%
  left_join(views_insomnia %>% dplyr::select(date, views) %>% dplyr::rename(Insomnia = views), by = "date") %>%
  dplyr::mutate(date = as.Date(date)) %>%
  as_tsibble(index = date)
head(views_all)
## # A tsibble: 6 x 5 [1D]
##   date       SleepDeprivation Stress ScreenTime Insomnia
##   <date>                <dbl>  <dbl>      <dbl>    <dbl>
## 1 2020-01-01             1127    400        203     2092
## 2 2020-01-02             1267    424        212     2737
## 3 2020-01-03             1348    469        233     2830
## 4 2020-01-04             1230    438        207     2723
## 5 2020-01-05             1234    428        271     2900
## 6 2020-01-06             1464    552        258     3225
views_all %>%
  ggplot(aes(x = date)) +
  geom_line(aes(y = SleepDeprivation, color = "Sleep Deprivation")) +
  geom_line(aes(y = Stress, color = "Stress")) +
  geom_line(aes(y = ScreenTime, color = "Screen Time")) +
  geom_line(aes(y = Insomnia, color = "Insomnia")) +
  labs(title = "Wikipedia Page Views: Sleep-Related Topics",
       x = "Date", y = "Page Views", color = "Topic") +
  scale_y_continuous(labels = comma)
lm_fit <- views_all %>%
  model(Trend = TSLM(SleepDeprivation ~ trend()))
report(lm_fit)
## Series: SleepDeprivation
## Model: TSLM
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -551.63 -206.52  -27.88  140.08 2494.20
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1166.51264   14.15543  82.407  < 2e-16 ***
## trend()       -0.04812    0.01677  -2.869  0.00418 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 270.4 on 1459 degrees of freedom
## Multiple R-squared: 0.00561, Adjusted R-squared: 0.004928
## F-statistic: 8.231 on 1 and 1459 DF, p-value: 0.0041782
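Interpretation:
The model is a linear regression, so the coefficients are read directly in units of daily page views; odds ratios belong to logistic regression and do not apply here. The intercept of about 1,167 is the fitted level at the start of the series in January 2020. The trend coefficient of -0.048 means the fitted line falls by roughly 0.05 views per day, or about 18 views per year. The decline is statistically significant (p = 0.004) but practically small: R-squared is 0.0056, so a linear trend explains less than 1% of the day-to-day variation in views.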
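To put the trend coefficient on a more readable scale, the short base-R sketch below converts the per-day slope from the TSLM output into a per-year change and a total change over the sample. The numbers are copied from the printed output; the variable names are illustrative, and the calculation assumes that `trend()` indexes the days as 1, 2, ..., n.

```r
# Coefficients copied from the TSLM output above
intercept <- 1166.51264
slope     <- -0.04812   # change in fitted daily views per one-day step
n_days    <- 1461       # 1459 residual df + 2 estimated coefficients

# Per-year change in fitted daily views (about -17.6)
slope_per_year <- slope * 365.25

# Fitted values on the first and last day, assuming trend() runs 1..n_days
fitted_start <- intercept + slope * 1
fitted_end   <- intercept + slope * n_days

# Total and relative change of the fitted line over the sample
total_change <- fitted_end - fitted_start          # about -70 views
pct_change   <- 100 * total_change / fitted_start  # about -6%

round(c(per_year = slope_per_year,
        total    = total_change,
        percent  = pct_change), 2)
```

Under these assumptions, the fitted line loses roughly 70 daily views over four years, which is small next to the residual standard error of 270.4.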
views_all %>%
  model(TSLM(SleepDeprivation ~ trend())) %>%
  augment() %>%
  ggplot(aes(x = date)) +
  geom_line(aes(y = SleepDeprivation), color = "gray60") +
  geom_line(aes(y = .fitted), color = "red") +
  labs(title = "Trend in 'Sleep Deprivation' Page Views",
       y = "Page Views", x = "Date")
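Interpretation:
The red line is the fitted linear trend from the TSLM model, drawn over the gray daily series. It drifts down from about 1,166 to about 1,096 views per day over the four years, while the daily values vary far more around it: the regression residuals range from about -552 to +2,494, so the largest departures from the trend are upward spikes.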
views_all %>%
  model(Smooth = ETS(SleepDeprivation)) %>%
  components() %>%
  autoplot() +
  labs(title = "Smoothed Components of Sleep Deprivation Interest")
Interpretation:
views_all %>%
  ACF(SleepDeprivation) %>%
  autoplot() +
  labs(title = "ACF of Page Views: Sleep Deprivation")
Interpretation:
views_all %>%
  mutate(Year = year(date)) %>%
  ggplot(aes(x = date, y = SleepDeprivation, color = factor(Year))) +
  geom_line(alpha = 0.8) +
  labs(title = "Sleep Deprivation Page Views: Year-over-Year Pattern",
       x = "Date", y = "Page Views", color = "Year") +
  scale_y_continuous(labels = comma)
Interpretation:
# Prep wide tsibble to VAR-compatible data frame
multi_ts <- views_all %>%
  dplyr::select(date, SleepDeprivation, Stress, ScreenTime, Insomnia) %>%
  as_tibble() %>%
  na.omit()
# Convert to time-series matrix
ts_data <- ts(multi_ts[, -1], start = c(2020, 1), frequency = 365)
# Fit simple VAR model
var_fit <- VAR(ts_data, p = 2, type = "const")
summary(var_fit)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: SleepDeprivation, Stress, ScreenTime, Insomnia
## Deterministic variables: const
## Sample size: 1459
## Log Likelihood: -35492.996
## Roots of the characteristic polynomial:
## 0.9103 0.7982 0.7715 0.5131 0.3245 0.3245 0.2981 0.1925
## Call:
## VAR(y = ts_data, p = 2, type = "const")
##
##
## Estimation results for equation SleepDeprivation:
## =================================================
## SleepDeprivation = SleepDeprivation.l1 + Stress.l1 + ScreenTime.l1 + Insomnia.l1 + SleepDeprivation.l2 + Stress.l2 + ScreenTime.l2 + Insomnia.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## SleepDeprivation.l1 0.394894 0.028648 13.784 < 2e-16 ***
## Stress.l1 0.003355 0.051509 0.065 0.948081
## ScreenTime.l1 0.356825 0.097056 3.676 0.000245 ***
## Insomnia.l1 0.145350 0.031305 4.643 3.74e-06 ***
## SleepDeprivation.l2 0.332689 0.028636 11.618 < 2e-16 ***
## Stress.l2 -0.062988 0.051108 -1.232 0.217982
## ScreenTime.l2 -0.303923 0.097667 -3.112 0.001896 **
## Insomnia.l2 -0.050933 0.031564 -1.614 0.106815
## const 160.767909 34.590587 4.648 3.66e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 184.9 on 1450 degrees of freedom
## Multiple R-Squared: 0.538, Adjusted R-squared: 0.5354
## F-statistic: 211 on 8 and 1450 DF, p-value: < 2.2e-16
##
##
## Estimation results for equation Stress:
## =======================================
## Stress = SleepDeprivation.l1 + Stress.l1 + ScreenTime.l1 + Insomnia.l1 + SleepDeprivation.l2 + Stress.l2 + ScreenTime.l2 + Insomnia.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## SleepDeprivation.l1 0.02652 0.01546 1.715 0.08655 .
## Stress.l1 0.27230 0.02780 9.796 < 2e-16 ***
## ScreenTime.l1 0.34818 0.05238 6.647 4.21e-11 ***
## Insomnia.l1 0.03562 0.01689 2.108 0.03519 *
## SleepDeprivation.l2 0.03382 0.01545 2.189 0.02878 *
## Stress.l2 0.11354 0.02758 4.117 4.06e-05 ***
## ScreenTime.l2 -0.13830 0.05271 -2.624 0.00878 **
## Insomnia.l2 -0.01752 0.01703 -1.028 0.30400
## const 128.03650 18.66734 6.859 1.02e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 99.77 on 1450 degrees of freedom
## Multiple R-Squared: 0.2645, Adjusted R-squared: 0.2604
## F-statistic: 65.17 on 8 and 1450 DF, p-value: < 2.2e-16
##
##
## Estimation results for equation ScreenTime:
## ===========================================
## ScreenTime = SleepDeprivation.l1 + Stress.l1 + ScreenTime.l1 + Insomnia.l1 + SleepDeprivation.l2 + Stress.l2 + ScreenTime.l2 + Insomnia.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## SleepDeprivation.l1 -0.022997 0.008162 -2.818 0.0049 **
## Stress.l1 -0.027795 0.014675 -1.894 0.0584 .
## ScreenTime.l1 0.551463 0.027651 19.944 < 2e-16 ***
## Insomnia.l1 0.019746 0.008919 2.214 0.0270 *
## SleepDeprivation.l2 0.019137 0.008158 2.346 0.0191 *
## Stress.l2 0.016624 0.014560 1.142 0.2538
## ScreenTime.l2 0.187737 0.027825 6.747 2.17e-11 ***
## Insomnia.l2 -0.023487 0.008992 -2.612 0.0091 **
## const 96.549752 9.854600 9.797 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 52.67 on 1450 degrees of freedom
## Multiple R-Squared: 0.4581, Adjusted R-squared: 0.4551
## F-statistic: 153.2 on 8 and 1450 DF, p-value: < 2.2e-16
##
##
## Estimation results for equation Insomnia:
## =========================================
## Insomnia = SleepDeprivation.l1 + Stress.l1 + ScreenTime.l1 + Insomnia.l1 + SleepDeprivation.l2 + Stress.l2 + ScreenTime.l2 + Insomnia.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## SleepDeprivation.l1 -0.04064 0.02572 -1.580 0.114315
## Stress.l1 -0.02938 0.04624 -0.635 0.525364
## ScreenTime.l1 0.30070 0.08713 3.451 0.000575 ***
## Insomnia.l1 0.63733 0.02810 22.677 < 2e-16 ***
## SleepDeprivation.l2 0.07292 0.02571 2.836 0.004628 **
## Stress.l2 -0.21250 0.04588 -4.631 3.96e-06 ***
## ScreenTime.l2 -0.36100 0.08768 -4.117 4.05e-05 ***
## Insomnia.l2 0.24340 0.02834 8.590 < 2e-16 ***
## const 296.83895 31.05452 9.559 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 166 on 1450 degrees of freedom
## Multiple R-Squared: 0.7137, Adjusted R-squared: 0.7121
## F-statistic: 451.8 on 8 and 1450 DF, p-value: < 2.2e-16
##
##
##
## Covariance matrix of residuals:
## SleepDeprivation Stress ScreenTime Insomnia
## SleepDeprivation 34175 5232 3252 12222
## Stress 5232 9953 1165 4827
## ScreenTime 3252 1165 2774 2685
## Insomnia 12222 4827 2685 27545
##
## Correlation matrix of residuals:
## SleepDeprivation Stress ScreenTime Insomnia
## SleepDeprivation 1.0000 0.2837 0.3340 0.3984
## Stress 0.2837 1.0000 0.2218 0.2916
## ScreenTime 0.3340 0.2218 1.0000 0.3072
## Insomnia 0.3984 0.2916 0.3072 1.0000
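Interpretation:
Each equation regresses one series on the previous two days of all four series. In the SleepDeprivation equation, the series' own lags dominate (0.39 at lag 1 and 0.33 at lag 2, both p < 2e-16). Lagged Insomnia views (lag 1: 0.145, p < 0.001) and lagged ScreenTime views (lag 1: 0.357; lag 2: -0.304) are also significant, while neither Stress lag is (p = 0.95 and p = 0.22). The two ScreenTime coefficients have opposite signs and largely cancel, so their net effect is small. The model explains about 54% of the variance in SleepDeprivation views (adjusted R-squared 0.535), compared with 71% for Insomnia, 46% for ScreenTime, and 26% for Stress. All roots of the characteristic polynomial are below 1 (the largest is 0.91), so the fitted VAR is stable. Residual correlations are positive and moderate (0.22 to 0.40), meaning same-day shocks tend to hit all four pages together. These coefficients describe predictive association between page-view series, not causal effects between the underlying conditions.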
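As a consistency check on the VAR output, the base-R sketch below rebuilds the residual correlation matrix from the printed residual covariance matrix. The values are copied from the output above (already rounded, so agreement is only to about three decimals); the object names `resid_cov` and `resid_cor` are illustrative.

```r
# Residual covariance matrix copied from the VAR summary above
series <- c("SleepDeprivation", "Stress", "ScreenTime", "Insomnia")
resid_cov <- matrix(
  c(34175,  5232, 3252, 12222,
     5232,  9953, 1165,  4827,
     3252,  1165, 2774,  2685,
    12222,  4827, 2685, 27545),
  nrow = 4, byrow = TRUE,
  dimnames = list(series, series)
)

# Scale each covariance by the product of the two standard deviations
resid_cor <- cov2cor(resid_cov)
round(resid_cor, 4)

# Square roots of the diagonal give the per-equation residual standard errors
round(sqrt(diag(resid_cov)), 2)
```

The off-diagonal entries reproduce the printed correlation matrix (for example, about 0.398 between SleepDeprivation and Insomnia), and the square roots of the diagonal match the residual standard errors reported for each equation (about 184.9, 99.8, 52.7, and 166).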
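Key Findings:
Daily views of the “Sleep deprivation” page were roughly flat from 2020 through 2023: the linear trend is slightly negative (about 18 fewer daily views per year) and explains under 1% of the variance. Short-run dynamics matter more than the long-run trend: in the VAR with two lags, the page's own recent views and recent views of the Insomnia and Screen time pages help predict the next day's views, whereas views of the Stress page do not. Page views are a proxy for public attention only, so these results say nothing about the prevalence of sleep deprivation itself.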