Content
1. Importing Major League Baseball (MLB) Views Data
2. Creating an xts time series of MLB views
3. Visualizing the MLB time series data
4. Identifying Trend
- Linear Regression
- Moving Averages
- Lo(W)ess
5. Identifying Seasonality using ACF and PACF
6. Stationarity Check
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'xts'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
library(tsibble)
##
## Attaching package: 'tsibble'
##
## The following object is masked from 'package:zoo':
##
## index
##
## The following object is masked from 'package:lubridate':
##
## interval
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
MLB <- read.csv("/Users/anuragreddy/Desktop/Statistics with R/MLB.csv")
head(MLB)
## Date MLB
## 1 2015-07-01 158
## 2 2015-07-02 118
## 3 2015-07-03 119
## 4 2015-07-04 142
## 5 2015-07-05 151
## 6 2015-07-06 152
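# Parse the Date column so it can serve as the time index
MLB$Date <- as.Date(MLB$Date)
# Daily tsibble indexed by Date (used for the ggplot2 charts)
MLB_ts <- as_tsibble(MLB, index = Date) |>
  index_by(Date)
# The same series as an xts object (used for the rolling average)
MLB_xts <- xts(x = MLB_ts$MLB,
               order.by = MLB_ts$Date)
MLB_xts <- setNames(MLB_xts, "MLB")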
Visualization
library(ggplot2)
MLB_ts |>
  ggplot(aes(x = Date, y = MLB)) +
  geom_line(color = 'blue') +
  labs(title = "MLB Response Over Time")
Linear Regression
# Fit a straight line to the window from January 2021 through January 2022 only
MLB_ts |>
  filter_index("2021-01" ~ "2022-01") |>
  ggplot(mapping = aes(x = Date, y = MLB)) +
  geom_line() +
  geom_smooth(method = 'lm', color = 'blue', se = FALSE) +
  labs(title = "MLB views") +
  theme_economist()
## `geom_smooth()` using formula = 'y ~ x'
Rolling Averages
# 100-day rolling mean; na.rm = TRUE skips missing values inside each window
MLB_xts |>
  rollapply(width = 100, \(x) mean(x, na.rm = TRUE)) |>
  ggplot(aes(x = Index, y = MLB)) +
  geom_line() +
  labs(title = "MLB views over time", subtitle = "100-day Rolling Average") +
  theme_economist()
## Warning: Removed 99 rows containing missing values or values outside the scale range
## (`geom_line()`).
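The 99 removed rows are the leading positions where a full 100-day window is not yet available. A minimal base-R sketch of the same trailing average, run on simulated values rather than the real MLB data (the `views` vector is a hypothetical stand-in), makes this visible. It calls `stats::filter()` explicitly because `dplyr::filter()` masks it once the tidyverse is loaded.

```r
# Simulated stand-in for the daily views (hypothetical values, not the MLB data)
set.seed(1)
views <- 150 + rnorm(400, sd = 20)

# Trailing 100-day mean: equal weights, sides = 1 uses the current and past values only
ma100 <- stats::filter(views, rep(1 / 100, 100), sides = 1)

# Positions 1 to 99 have no complete window, so they are NA
sum(is.na(ma100))

# The first available value equals the plain mean of the first 100 observations
all.equal(as.numeric(ma100[100]), mean(views[1:100]))
```

A wider window gives a smoother line but loses more observations at the start of the series.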
Lowess
# method = 'loess' is set explicitly: with 1,000 or more rows geom_smooth()
# otherwise defaults to a GAM and ignores span
MLB_ts |>
  ggplot(aes(x = Date, y = MLB)) +
  geom_point(size = 1, shape = 'o') +
  geom_smooth(method = 'loess', span = 0.5, color = 'blue', se = FALSE) +
  labs(title = "MLB views over time") +
  theme_economist()
## `geom_smooth()` using formula = 'y ~ x'
Interpretation: Three methods were used to identify the trend in MLB views: a linear regression fitted to the window from January 2021 through January 2022, a 100-day rolling average, and a smooth curve fitted with geom_smooth() over the full series. All three indicate a downward trend in MLB views over the years covered.
Agreement across the methods makes the finding more credible than any single fit would. It also gives a starting point for investigating which factors are driving the decline in MLB views.
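The direction of the trend can also be expressed as a number. The sketch below fits a linear regression of views on the date and converts the slope to an approximate change per year. It runs on a simulated series with a built-in decline (the data frame `sim` and its values are assumptions for illustration); with the real data, `MLB` would take the place of `sim`.

```r
# Hypothetical daily series with a known decline of 0.03 views per day
set.seed(42)
dates <- seq(as.Date("2015-07-01"), by = "day", length.out = 1000)
views <- 150 - 0.03 * seq_along(dates) + rnorm(length(dates), sd = 10)
sim <- data.frame(Date = dates, MLB = views)

# Regress views on the date; as.numeric(Date) counts days, so the slope is per day
fit <- lm(MLB ~ as.numeric(Date), data = sim)
slope_per_day <- unname(coef(fit)[2])

# Approximate change in views per year
slope_per_day * 365

# Estimate, standard error, t value and p-value for the intercept and the slope
summary(fit)$coefficients
```

A negative slope with a small p-value would support the downward trend seen in the plots, subject to the usual caveat that residuals of a time series regression are often autocorrelated.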
Seasonality
1. ACF
acf(MLB_ts$MLB, lag.max = 24)
Interpretation: The autocorrelation function (ACF) decays gradually as the lag increases, so each observation is correlated with its recent past and that correlation weakens with distance. A slow decay of this kind is typical of a series with a trend or strong persistence.
Seasonality would instead appear as spikes that recur at fixed lags, for example lags 7, 14, and 21 for a weekly cycle in daily data, and no such pattern stands out here. Because lag.max = 24 covers only 24 days, this plot can speak to a weekly cycle but not to an annual one.
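Reading seasonal lags off a plot can be checked numerically. The following sketch, on a simulated series that deliberately contains a weekly cycle (the `views` values are hypothetical), extracts the autocorrelations at multiples of 7 and compares them with the approximate 95% band that `acf()` draws.

```r
# Hypothetical daily series with a weekly cycle (not the MLB data)
set.seed(7)
n <- 700
views <- 150 + 15 * sin(2 * pi * seq_len(n) / 7) + rnorm(n, sd = 10)

# Compute the ACF without plotting
ac <- acf(views, lag.max = 28, plot = FALSE)

# Approximate 95% significance band under the white-noise assumption
bound <- qnorm(0.975) / sqrt(n)

# ac$acf[1] is lag 0, so lag k sits at index k + 1
seasonal_lags <- c(7, 14, 21, 28)
seasonal_acf <- ac$acf[seasonal_lags + 1]
data.frame(lag = seasonal_lags,
           acf = seasonal_acf,
           significant = abs(seasonal_acf) > bound)
```

For a series without weekly seasonality, the values at these lags should not stand out from their neighbours. Checking an annual cycle in daily data would need `lag.max` of at least 365.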
2. PACF
pacf(MLB_ts$MLB,lag.max = 24)
Interpretation: The PACF plot shows no significant spikes at lag 7 or its multiples, so there is no evidence of weekly seasonality. With lag.max = 24 on daily data the plot cannot reveal a 12-month cycle, which would require lags of about 365 days, so annual seasonality is not assessed here.
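The same numeric check works for the PACF, with one indexing difference: the values returned by `pacf()` start at lag 1, not lag 0. The sketch below uses a simulated AR(1) series (an assumption for illustration, not the MLB data) and lists the lags whose partial autocorrelation falls outside the approximate 95% band.

```r
# Hypothetical AR(1) series with coefficient 0.7
set.seed(11)
x <- arima.sim(model = list(ar = 0.7), n = 1000)

# Compute the PACF without plotting
pc <- pacf(x, lag.max = 24, plot = FALSE)

# Approximate 95% significance band
bound <- qnorm(0.975) / sqrt(length(x))

# pc$acf[k] is the partial autocorrelation at lag k (there is no lag-0 entry)
sig_lags <- which(abs(pc$acf) > bound)
sig_lags
```

For an AR(1) process the spike at lag 1 dominates. A weekly seasonal component would add clear spikes at lag 7 and possibly at 14 and 21.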
Stationarity Check
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
adf.test(MLB_ts$MLB)
## Warning in adf.test(MLB_ts$MLB): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: MLB_ts$MLB
## Dickey-Fuller = -8.3345, Lag order = 14, p-value = 0.01
## alternative hypothesis: stationary
Interpretation: The Augmented Dickey-Fuller test was run on the MLB views series. The test statistic is -8.3345 with a lag order of 14. The printed p-value is 0.01, and the accompanying warning indicates that the actual p-value is smaller still.
Because the p-value is below the usual 0.05 significance level, we reject the null hypothesis of a unit root and conclude that the series is stationary. Note that adf.test() includes a constant and a linear trend in its regression, so the result indicates stationarity around a deterministic trend; it does not contradict the downward trend identified earlier.
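A common cross-check is to pair the ADF test with the KPSS test from the same tseries package, since the two have opposite null hypotheses. The sketch below is one possible way to do this, run on a simulated trend-stationary series (the `views` values are hypothetical); with the real data, `MLB_ts$MLB` would take its place.

```r
library(tseries)

# Hypothetical trend-stationary series: linear decline plus white noise
set.seed(123)
n <- 500
views <- 150 - 0.05 * seq_len(n) + rnorm(n, sd = 10)

# ADF: the null hypothesis is a unit root
adf_res <- adf.test(views)

# KPSS: the null hypothesis is stationarity, around a constant or a linear trend
kpss_lvl <- kpss.test(views, null = "Level")
kpss_trd <- kpss.test(views, null = "Trend")

# Collect the p-values (tseries truncates them to the range of its tables)
c(adf = adf_res$p.value,
  kpss_level = kpss_lvl$p.value,
  kpss_trend = kpss_trd$p.value)
```

A small ADF p-value together with a small level-KPSS p-value and a large trend-KPSS p-value would point to a series that is stationary around a trend, not around a constant level.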