week-12
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'xts'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
library(tsibble)
##
## Attaching package: 'tsibble'
##
## The following object is masked from 'package:zoo':
##
## index
##
## The following object is masked from 'package:lubridate':
##
## interval
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
MLB <- read.csv("/Users/ba/Downloads/MLB.csv")
head(MLB)
## Date MLB
## 1 2015-07-01 158
## 2 2015-07-02 118
## 3 2015-07-03 119
## 4 2015-07-04 142
## 5 2015-07-05 151
## 6 2015-07-06 152
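# Parse the Date column so it can serve as the time index
MLB$Date <- as.Date(MLB$Date)
# Build a tsibble indexed by day
MLB_ts <- as_tsibble(MLB, index = Date) |>
  index_by(Date)
# Build an equivalent xts object with a named column
MLB_xts <- xts(x = MLB_ts$MLB,
               order.by = MLB_ts$Date)
MLB_xts <- setNames(MLB_xts, "MLB")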
Visualization of MLB Time-Series data
library(ggplot2)
MLB_ts |>
  ggplot(aes(x = Date, y = MLB)) +
  geom_line(color = 'lightgreen') +
  labs(title = "MLB Response Over Time") +
  theme_classic()
Identifying the trend using Linear Regression
MLB_ts |>
  filter_index("2021-01" ~ "2022-01") |>
  ggplot(mapping = aes(x = Date, y = MLB)) +
  geom_line() +
  geom_smooth(method = 'lm', color = 'lightgreen', se = FALSE) +
  labs(title = "MLB views") +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
Interpretation:
The plot depicting Major League Baseball (MLB) viewership over time reveals several notable features. Firstly, there are clear spikes in viewership at various points throughout the year, which may correspond to key events in the MLB calendar, such as the start of the season, playoffs, or the World Series. These peaks indicate heightened interest during crucial moments of the baseball season.
Secondly, aside from these spikes, the fitted trend line suggests a general decline in viewership over the plotted window (January 2021 to January 2022). This could indicate a broader shift in audience behavior or interest in MLB over that period.
Additionally, the variance in viewership is substantial. Although the general trend is downward, the gap between the peaks and the baseline is large, which suggests that specific events still draw substantial viewership even as general interest wanes.
The spikes also suggest there may be seasonal effects, such as particular events or series that draw more viewers. Segmenting the data around these events can show whether, and how strongly, they influence the overall viewership trend. The largest spikes may be outliers or influential points that disproportionately affect the regression model's slope and intercept. Refitting the model on subsets that exclude these points is one way to assess how robust the underlying trend is.
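The subset comparison described above can be sketched as follows. This is a minimal illustration on synthetic data, not the MLB file: the data frame `toy`, the spike positions, and the two-standard-deviation cutoff are all assumptions chosen for the example.

```r
# Synthetic daily series (assumption: NOT the real MLB data).
# A steady decline of 0.05 per day plus ten large spikes early in the window.
toy <- data.frame(Date = as.Date("2021-01-01") + 0:365)
toy$day <- as.numeric(toy$Date - min(toy$Date))   # days since start
toy$MLB <- 150 - 0.05 * toy$day
spike_days <- seq(10, 100, by = 10)               # hypothetical event days
on_spike <- toy$day %in% spike_days
toy$MLB[on_spike] <- toy$MLB[on_spike] + 100

# Fit the trend on all points.
fit_full <- lm(MLB ~ day, data = toy)

# Flag points far above the fitted line (the cutoff of 2 residual SDs is an
# arbitrary choice for illustration) and refit without them.
res <- residuals(fit_full)
is_spike <- res > 2 * sd(res)
fit_trim <- lm(MLB ~ day, data = toy[!is_spike, ])

slope_full <- unname(coef(fit_full)["day"])
slope_trim <- unname(coef(fit_trim)["day"])
print(c(full = slope_full, trimmed = slope_trim))
```

If the two slopes are close, the spikes are not driving the trend; a large gap means the spikes are steering the fitted line.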
Seasonality
acf(MLB_ts$MLB, lag.max = 24)
Interpretation: The autocorrelation function (ACF) plot shows the correlation between observations decreasing as the lag increases, with no recurring peaks at regular lags. This indicates that there is no strong short-period seasonal pattern in the data.
A steadily decaying ACF means the series does not repeat at fixed intervals within the lags shown. It points to persistence, where each day resembles its recent neighbors, or to a slowly changing level, rather than to a regular cycle. Note that the data are daily and lag.max = 24, so the plot covers only about three and a half weeks and cannot reveal a yearly cycle.
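To make the reading of an ACF plot concrete, the sketch below contrasts two synthetic daily series (both are assumptions, not the MLB data): a pure trend, whose ACF decays steadily, and a weekly cycle, whose ACF peaks at lags 7, 14, and 21.

```r
n <- 364                                   # 52 full weeks of daily data
day <- seq_len(n)
trend_series  <- 200 - 0.1 * day           # hypothetical steady decline
weekly_series <- sin(2 * pi * day / 7)     # hypothetical 7-day cycle

# plot = FALSE returns the values instead of drawing; element 1 is lag 0.
trend_acf  <- drop(acf(trend_series,  lag.max = 24, plot = FALSE)$acf)
weekly_acf <- drop(acf(weekly_series, lag.max = 24, plot = FALSE)$acf)

# Trend: the correlation shrinks at every successive lag.
print(round(trend_acf[c(2, 8, 15, 22)], 2))    # lags 1, 7, 14, 21
# Weekly cycle: peaks at multiples of 7, negative values in between.
print(round(weekly_acf[c(4, 8, 15, 22)], 2))   # lags 3, 7, 14, 21
```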
pacf(MLB_ts$MLB,lag.max = 24)
Interpretation: The partial autocorrelation function (PACF) plot shows no clear seasonality in the series. There are no notable spikes at lags 7, 14, or 21, which is where a weekly pattern would appear in daily data. A yearly pattern would sit near lag 365, outside the 24 lags plotted, so this plot speaks only to short-period seasonality. Within that range, the data do not display any significant seasonality.
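Since 24 daily lags cannot reach a yearly cycle, one option is to aggregate to monthly means so that lag 12 corresponds to one year. The sketch below does this on a synthetic daily series with a built-in annual cycle. The series, its amplitude, and the names `toy` and `monthly` are assumptions for illustration only; the same steps could be tried on the `MLB` data frame.

```r
# Synthetic daily series (assumption: NOT the real MLB data) with a yearly cycle.
toy <- data.frame(
  Date = seq(as.Date("2015-07-01"), by = "day", length.out = 6 * 365)
)
toy$MLB <- 150 + 30 * sin(2 * pi * as.numeric(toy$Date) / 365.25)

# Average within each calendar month; "%Y-%m" labels sort chronologically.
month_label <- format(toy$Date, "%Y-%m")
monthly <- tapply(toy$MLB, month_label, mean)

# With monthly data, lag 12 is one year and lag 6 is half a year.
monthly_acf <- drop(acf(as.numeric(monthly), lag.max = 24, plot = FALSE)$acf)
print(round(monthly_acf[c(7, 13, 25)], 2))   # lags 6, 12, 24
```

A strong positive value at lag 12 paired with a strong negative value at lag 6 is the signature of an annual cycle; values near zero at both lags would support the no-seasonality reading.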