library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tsibble)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## 
## Attaching package: 'tsibble'
## 
## The following object is masked from 'package:lubridate':
## 
##     interval
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union
# Load the data
data <- read.csv("C:/Users/saisr/Downloads/pageview obesity data.csv")

# Preview the data
head(data)
##         Date Obesity Overweight Healthy.diet
## 1  10/6/2022    1335        188          778
## 2  10/7/2022    1234        197          837
## 3  10/8/2022    1075        199          694
## 4  10/9/2022    1214        190          829
## 5 10/10/2022    1277        215          822
## 6 10/11/2022    1430        167          849

Converting the Date Column

data <- data %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%Y"))

str(data)
## 'data.frame':    274 obs. of  4 variables:
##  $ Date        : Date, format: "2022-10-06" "2022-10-07" ...
##  $ Obesity     : int  1335 1234 1075 1214 1277 1430 1487 1469 1254 1240 ...
##  $ Overweight  : int  188 197 199 190 215 167 166 222 227 208 ...
##  $ Healthy.diet: int  778 837 694 829 822 849 911 902 694 674 ...
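
as.Date() returns NA silently for any string that does not match the supplied format, so a quick check after the conversion can be worthwhile. The following is a minimal sketch on a made-up vector; `raw_dates` and its values are hypothetical and are not taken from the CSV.

```r
# Hypothetical input in the same month/day/year layout as the Date column,
# plus one malformed entry to show what a failed parse looks like
raw_dates <- c("10/6/2022", "10/7/2022", "10/8/2022", "not a date")

parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Unparseable strings become NA instead of raising an error
n_failed <- sum(is.na(parsed))

# Gaps between consecutive valid dates, in days
# (all equal to 1 for a complete daily series)
day_gaps <- as.integer(diff(parsed[!is.na(parsed)]))

n_failed
day_gaps
```

On the real data, the same two checks would be applied to data$Date right after the mutate() step.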

Creating a tsibble Object

# Create a tsibble for the 'Obesity' column
data_tsibble <- data %>%
  as_tsibble(index = Date) %>%
  select(Date, Obesity)

Plotting Data Over Time

# Plot Obesity pageviews over the entire time range
data_tsibble %>%
  ggplot(aes(x = Date, y = Obesity)) +
  geom_line(color = "blue") +
  labs(title = "Obesity Pageviews Over Time",
       x = "Date",
       y = "Pageviews")

Different Windows of Time

# Filter for the first 6 months
data_tsibble %>%
  filter(Date <= as.Date("2023-03-31")) %>%
  ggplot(aes(x = Date, y = Obesity)) +
  geom_line(color = "blue") +
  labs(title = "Obesity Pageviews: First 6 Months",
       x = "Date",
       y = "Pageviews")

# Filter for the later window (2023-04-01 onward).
# With 274 daily rows starting 2022-10-06, this covers roughly three months, not six.
data_tsibble %>%
  filter(Date >= as.Date("2023-04-01")) %>%
  ggplot(aes(x = Date, y = Obesity)) +
  geom_line(color = "blue") +
  labs(title = "Obesity Pageviews: April 2023 Onward",
       x = "Date",
       y = "Pageviews")
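
Hard-coded cutoff dates can drift away from what the data actually covers. As a rough sketch, the windows could instead be derived from the observed date range. The vector `dates` below is an assumption: it reconstructs 274 contiguous daily dates starting on 2022-10-06, which is what str(data) suggests but does not prove.

```r
# Assumption: 274 contiguous daily rows starting on 2022-10-06
dates <- seq(as.Date("2022-10-06"), by = "day", length.out = 274)

# First and last date covered under that assumption
range(dates)

# Split at the midpoint of the observed range rather than at a fixed calendar date
span_days <- as.numeric(max(dates) - min(dates))
midpoint  <- min(dates) + floor(span_days / 2)

first_half  <- dates[dates <= midpoint]
second_half <- dates[dates >  midpoint]

length(first_half)
length(second_half)
```

With the real tsibble, `midpoint` would be computed from data_tsibble$Date and passed to filter() in place of the literal dates.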

Key Observations

  • The spike early in the dataset is the most prominent feature and likely corresponds to an anomaly or a significant event.
  • The later window (April 2023 onward) shows more consistent variability than the earlier period.

Seasonality Analysis

1. Smoothing to Detect Seasonality

# Load necessary libraries
library(ggplot2)

# Apply smoothing (LOESS smoothing)
ggplot(data_tsibble, aes(x = Date, y = Obesity)) +
  geom_line(color = "blue") +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.2, color = "red", se = FALSE) +
  labs(title = "Smoothing to Detect Seasonality",
       x = "Date",
       y = "Pageviews") +
  theme_minimal()

Interpretation
  • There is a large spike in pageviews early in the timeline, likely indicating an event or anomaly.
  • After the spike, the pageviews stabilize but exhibit some oscillations.
  • The red line, created using a LOESS smoothing method, provides a smoothed estimate of the trends in the data.
  • There appear to be some recurring patterns, possibly seasonal, in the smaller oscillations after the spike, although the period cannot be determined from this plot alone.
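
One point worth keeping in mind when reading the smoothed line: span is the fraction of points used in each local fit, so span = 0.2 on 274 observations averages over roughly 55 days at a time. A smoother that wide mostly tracks the trend, and any weekly cycle ends up in the residuals. The sketch below illustrates this on simulated data; the series `y` is hypothetical and only mimics a drift plus a weekly cycle.

```r
set.seed(1)
n   <- 274
day <- seq_len(n)

# Hypothetical series: slow drift + weekly cycle + noise
y <- 1200 + 0.2 * day + 80 * sin(2 * pi * day / 7) + rnorm(n, sd = 30)

# Same span as in the plot above
fit      <- loess(y ~ day, span = 0.2)
smoothed <- fitted(fit)
resid_y  <- residuals(fit)

# Autocorrelation of what the smoother left behind; element 8 is lag 7
acf_resid <- drop(acf(resid_y, lag.max = 14, plot = FALSE)$acf)
acf_resid[8]
```

A large lag-7 autocorrelation in the residuals indicates that the weekly pattern was not captured by the smooth, which is why the ACF and decomposition steps are needed to assess seasonality.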

Illustrating Seasonality Using ACF

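The autocorrelation function (ACF) helps detect periodicity: peaks that recur at regular lags (for daily data, lags 7, 14, 21 and so on) point to a seasonal cycle.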

acf(data_tsibble$Obesity, main = "ACF: Detecting Seasonality", lag.max = 50)
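
To show what a seasonal signature looks like in the ACF, here is a small sketch on simulated data with a known weekly cycle. The series `x` is hypothetical; the bound is the usual approximate 95% limit that the acf() plot draws as dashed lines.

```r
set.seed(42)
n   <- 274
day <- seq_len(n)

# Hypothetical daily series with a 7-day cycle plus noise
x <- 10 * sin(2 * pi * day / 7) + rnorm(n, sd = 3)

# Compute the ACF without plotting; the first element is lag 0
r <- drop(acf(x, lag.max = 21, plot = FALSE)$acf)
names(r) <- 0:21

# Approximate 95% bound for white noise
bound <- qnorm(0.975) / sqrt(n)

# Autocorrelation at multiples of the suspected period
round(r[c("7", "14", "21")], 2)
bound
```

On the pageview data, the early spike may distort the ACF, so it could be worth repeating the check on the post-spike portion only.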

Using PACF to Verify Seasonal Effects

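The partial autocorrelation function (PACF) shows the correlation at each lag after the effect of all shorter lags has been removed, which isolates direct relationships from indirect ones.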

pacf(data_tsibble$Obesity, main = "PACF: Verifying Seasonality", lag.max = 50)
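
For intuition on what removing indirect correlations means, consider an AR(1) process, where each value depends directly only on the previous one. Its ACF decays gradually, while its PACF is dominated by lag 1. The sketch below uses simulated data; `x` and the coefficient 0.7 are illustrative assumptions.

```r
set.seed(7)

# Hypothetical AR(1) series: x[t] = 0.7 * x[t - 1] + noise
x <- arima.sim(model = list(ar = 0.7), n = 500)

# PACF without plotting; unlike acf(), the first element is lag 1
pc <- drop(pacf(x, lag.max = 20, plot = FALSE)$acf)

# Approximate 95% bound for white noise
bound <- qnorm(0.975) / sqrt(length(x))

# Partial autocorrelation at lag 1, and the lags that exceed the bound
pc[1]
which(abs(pc) > bound)
```

In a seasonal series, a PACF bar that stands out at the seasonal lag (7 for a weekly cycle in daily data) suggests a direct dependence on the value one cycle earlier.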


Decomposing the Time Series

# Convert the Obesity column to a time series object.
# frequency = 7 assumes a weekly cycle in daily data; adjust if another period is suspected.
data_ts <- ts(data$Obesity, frequency = 7)

# Decompose the time series
decomposed <- stl(data_ts, s.window = "periodic")

# Plot the decomposed components
plot(decomposed, main = "Decomposition of Time Series")
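
Besides plotting, the components can be pulled out of the stl object and summarised numerically. The sketch below runs on simulated data (the series `y` is hypothetical) and computes a seasonal-strength ratio, 1 - Var(remainder) / Var(seasonal + remainder), floored at 0. This ratio is a commonly used summary rather than anything stl() reports itself.

```r
set.seed(123)
n   <- 280
day <- seq_len(n)

# Hypothetical daily series: drift + weekly cycle + noise
y <- ts(1200 + 0.3 * day + 80 * sin(2 * pi * day / 7) + rnorm(n, sd = 30),
        frequency = 7)

fit  <- stl(y, s.window = "periodic")
comp <- fit$time.series            # columns: seasonal, trend, remainder

seasonal  <- comp[, "seasonal"]
remainder <- comp[, "remainder"]

# The three components add back up to the original series
recomposed <- as.numeric(rowSums(comp))

# With s.window = "periodic", every cycle has the same seasonal shape
first_week  <- as.numeric(seasonal[1:7])
second_week <- as.numeric(seasonal[8:14])

# Seasonal strength: close to 1 means strong seasonality, close to 0 means weak
strength <- max(0, 1 - var(remainder) / var(seasonal + remainder))
strength
```

Applied to the real series, the same lines would use decomposed$time.series in place of `comp`.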

Insights

Top Panel (data)
  • This is the raw time series.
  • It shows an initial spike followed by more stable data with periodic fluctuations.

Second Panel (seasonal)
  • This shows the repeating cycle within the data. Because the series was built with frequency = 7, the cycle shown is weekly.
  • The consistent oscillations suggest a regular weekly pattern.
  • The amplitude is constant over time by construction: s.window = "periodic" forces the same seasonal shape in every cycle.

Third Panel (trend)
  • This represents the long-term movement in the data, with seasonal variation removed.
  • Initially, there is a sharp upward movement, corresponding to the spike in the data.
  • Afterward, the trend stabilizes and shows small fluctuations over time.

Bottom Panel (remainder)
  • This shows the residual variation that is not explained by the trend or seasonal components.
  • The large outlier corresponds to the initial spike, which the trend and seasonal components do not fully absorb.
  • The remainder is relatively stable after the spike, with small variations.

  • Seasonality: A strong seasonal pattern is evident in the data, indicating regular cycles.
  • Trend: The trend stabilizes after an initial spike, showing a consistent long-term movement.
  • Remainder: Anomalies (e.g., the spike) are captured as residuals, likely due to external events.
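
Since anomalies end up in the remainder, one possible follow-up is to flag them programmatically. The sketch below is only an illustration under stated assumptions: the series `y` is simulated, the spike at position 20 is injected by hand, robust = TRUE is used so the spike has less influence on the trend and seasonal fits, and the 3 x MAD threshold is an arbitrary but common choice.

```r
set.seed(99)
n   <- 280
day <- seq_len(n)

# Hypothetical daily series with a weekly cycle and one injected spike
y <- 1200 + 80 * sin(2 * pi * day / 7) + rnorm(n, sd = 30)
y[20] <- y[20] + 3000

# Robust STL downweights outliers when fitting trend and seasonal components
fit <- stl(ts(y, frequency = 7), s.window = "periodic", robust = TRUE)
rem <- as.numeric(fit$time.series[, "remainder"])

# Flag points whose remainder is far from the median, measured in MAD units
threshold <- 3 * mad(rem)
flagged   <- which(abs(rem - median(rem)) > threshold)

flagged
```

On the pageview data, the flagged positions could be mapped back to data$Date to see which days stand out.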