This project aims to analyze time series models to forecast the future performance of Netflix’s stock price (NFLX). The dataset covers the period from 2018 to 2022, offering insights into the changes in NFLX’s price over time. The analysis focuses on comparing SARIMA (Seasonal Autoregressive Integrated Moving Average) and GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models to evaluate their predictive accuracy. By comparing the predicted values to actual values, the performance of these models is assessed to determine the most reliable forecasting method.
This project focuses on analyzing and forecasting Netflix’s stock price. The goal is to produce accurate predictions of Netflix’s future stock price by applying two models—SARIMA and GARCH—on historical stock price data.
Netflix is a global leader in streaming services, with over 233 million paid memberships as of 2023, spanning more than 190 countries. I selected Netflix’s stock (NFLX) because of its prominent role in the market and its widespread use in everyday life. As Netflix has experienced significant growth in both revenue and subscribers, its stock price has garnered substantial attention from investors.
In this project, SARIMA and GARCH models are applied to a training dataset to analyze the stock price of NFLX. We examine the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots to identify key features, such as trends, seasonality, stationarity, and the need for data transformations. Once the optimal model parameters are determined, diagnostic checks are performed to select the best-fitting model.
The dataset used for analyzing Netflix’s stock price was obtained from Kaggle and sourced from Yahoo Finance. It contains daily stock prices (in U.S. dollars) for Netflix (NFLX) from February 5, 2018, to February 4, 2022. The dataset includes 1,009 observations across seven variables: Date, Open, High, Low, Close, Adjusted Close, and Volume.
In this analysis, the Close price is used as it represents the final
trading price during business hours, providing a comprehensive
reflection of the stock’s value. This dataset is particularly
significant because it spans a period where Netflix’s stock price
exhibited substantial growth, including its highest historical peak. The
lowest stock price within this period was $233.88 on December 24, 2018,
and the highest was $691.69 on November 17, 2021.
It’s also important to note that the dataset frequency is 252 trading
days per year rather than the standard 365 days, as the stock market
does not operate on weekends or holidays.
The SARIMA (Seasonal Autoregressive Integrated Moving Average) model is an extension of the ARIMA model, designed to handle time series data with both seasonal patterns and non-stationary characteristics. SARIMA requires the specification of two parameter sets: the non-seasonal parameters (p, d, q) and the seasonal parameters (P, D, Q), along with the seasonal period (s).
(p, d, q):
* p: The order of autoregressive (AR) terms
* d: The number of differences required to achieve
stationarity
* q: The order of moving average (MA) terms
(P, D, Q):
* P: The order of the seasonal autoregressive part
* D: The number of seasonal differences
* Q: The order of the seasonal moving average part
s: The length of theseasonal cycle in the data
To determine the optimal parameters for the SARIMA model, we analyze the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots. After selecting several models, we use criteria such as Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to identify the best-fitting model.
The GARCH (Generalized Autoregressive Conditional Heteroskedasticity) model is an extension of the ARCH model, frequently used to model financial data with periods of high volatility clustered together. GARCH captures time-varying volatility by using an autoregressive structure for the conditional variance.
To fit the GARCH model, we first apply an AR model to the dataset and analyze the ACF of the squared residuals. Diagnostic tests are conducted to confirm the validity of the GARCH model. In this project, we use the AR(8) model, selected based on the earlier analysis, to capture the volatility dynamics and forecast stock price fluctuations.
To begin, the time series of Netflix’s stock price was set up using the NFLX dataset, with a frequency of 252 trading days per year (since the stock market only operates on weekdays). Over the five-year period, the stock price shows an overall upward trend, without a clear seasonal pattern.
The initial analysis of the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) indicated a slowly decaying ACF and a high value of PACF at lag 1. As a result, we applied a log transformation to stabilize variance and differenced the data to achieve stationarity.
After performing the log transformation, the overall shape of the time series remained similar, but the scale was reduced.
In order to compare our prediction with the actual values later, the dataset is splitted into the training set (989 observations) and the testing set (20 observations) as 4-week stock price prediction.
From the ACF and PACF of the differenced training set, we identified the values of \(d\) = 1, \(p_{max}\) = 8, and \(q_{max}\) = 1. Since the data did not display any seasonal patterns, the seasonal components were set to \(Q\) = 0 and \(D\) = 0.
Now, we will find the best two models with the smallest Akaike Information Criterion (AIC) value and the smallest Bayesian Information Criterion (BIC) value.After comparing multiple models based on AIC and BIC, ARIMA(8, 1, 0) was found to have the lowest AIC (-4479.758), and ARIMA(1, 1, 0) had the lowest BIC (-4465.775).
Diagnostics of ARIMA(8, 1, 0):
## initial value -3.685749
## iter 2 value -3.699052
## iter 3 value -3.699326
## iter 4 value -3.699330
## iter 5 value -3.699331
## iter 5 value -3.699331
## iter 5 value -3.699331
## final value -3.699331
## converged
## initial value -3.695088
## iter 2 value -3.695129
## iter 3 value -3.695132
## iter 3 value -3.695132
## iter 3 value -3.695132
## final value -3.695132
## converged
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8
## -0.0876 0.0242 0.0090 0.0477 -0.0737 -0.0547 0.0037 -0.0844
## s.e. 0.0318 0.0319 0.0319 0.0319 0.0319 0.0319 0.0320 0.0319
##
## sigma^2 estimated as 0.0006172: log likelihood = 2248.88, aic = -4479.76
##
## $degrees_of_freedom
## [1] 980
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.0876 0.0318 -2.7597 0.0059
## ar2 0.0242 0.0319 0.7571 0.4492
## ar3 0.0090 0.0319 0.2810 0.7788
## ar4 0.0477 0.0319 1.4984 0.1344
## ar5 -0.0737 0.0319 -2.3119 0.0210
## ar6 -0.0547 0.0319 -1.7110 0.0874
## ar7 0.0037 0.0320 0.1151 0.9084
## ar8 -0.0844 0.0319 -2.6461 0.0083
##
## $AIC
## [1] -4.534168
##
## $AICc
## [1] -4.534019
##
## $BIC
## [1] -4.489571
Diagnostics of ARIMA(1, 1, 0):
## initial value -3.682936
## iter 2 value -3.686978
## iter 2 value -3.686978
## iter 2 value -3.686978
## final value -3.686978
## converged
## initial value -3.685925
## iter 1 value -3.685925
## final value -3.685925
## converged
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1
## -0.0898
## s.e. 0.0317
##
## sigma^2 estimated as 0.0006287: log likelihood = 2239.78, aic = -4475.57
##
## $degrees_of_freedom
## [1] 987
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.0898 0.0317 -2.8298 0.0048
##
## $AIC
## [1] -4.529925
##
## $AICc
## [1] -4.529921
##
## $BIC
## [1] -4.520015
Overall, while the ARIMA(1,1,0) model produced reasonable results, its diagnostics suggest that it may be less accurate compared to ARIMA(8,1,0), as indicated by the residual patterns and Ljung-Box test results.
Therefore, the best fitting model would be ARIMA(8,1,0), and this
model will be used to conduct forecasting.
Using the ARIMA(8, 1, 0) model, 20 forecasts are generated for the test dataset, and these are compared to the true stock prices. The plot reveals a discrepancy between the predicted and actual values: the actual stock price dropped significantly, while the model predicted an increase. This indicated that although the ARIMA(8, 1, 0) model performed well in fitting the training data, it failed to capture some external factors that affected Netflix’s stock price, such as a subscription price that was increased by $1-2 a month and intensified competition in the streaming industry following the success of “Squid Game.”
For GARCH model, I used the AR(8) because it was identified as the best-fitting model from the previous analysis. This AR(8) model was used as the basis for capturing volatility in the NFLX stock price. The goal was to model the changing volatility over time and predict future fluctuations.
## initial value -3.686125
## iter 2 value -3.699671
## iter 3 value -3.699957
## iter 4 value -3.699963
## iter 5 value -3.699963
## iter 6 value -3.699964
## iter 7 value -3.699965
## iter 8 value -3.699965
## iter 8 value -3.699965
## iter 8 value -3.699965
## final value -3.699965
## converged
## initial value -3.695839
## iter 2 value -3.695882
## iter 3 value -3.695885
## iter 4 value -3.695886
## iter 5 value -3.695888
## iter 6 value -3.695888
## iter 6 value -3.695888
## iter 6 value -3.695888
## final value -3.695888
## converged
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.07 0.10 0.05 0.05 0.10 0.09 0.07 0.02 0.02 0.05 0.04 0.03 0.08
## PACF 0.07 0.09 0.03 0.03 0.09 0.07 0.04 -0.01 0.00 0.03 0.02 0.00 0.06
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF 0.03 0.04 0.01 0.01 0.00 0.02 0.01 0.01 0.01 0.01 0.03 0.00
## PACF 0.01 0.02 -0.01 -0.01 -0.01 0.01 0.00 0.00 0.01 0.01 0.02 -0.01
## [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF 0.01 0.02 0.01 0.03 0.01 -0.01 0.02 0.02 0.01 0.03 0.00 0.00
## PACF 0.00 0.01 0.00 0.02 0.00 -0.01 0.02 0.02 -0.01 0.02 -0.01 -0.01
## [,38] [,39] [,40] [,41] [,42]
## ACF -0.02 0.03 0 -0.01 0.01
## PACF -0.03 0.03 0 -0.02 0.00
We first analyzed the residuals from the AR(8) model by plotting the ACF of the squared residuals. The ACF plot displayed a notable spike at lag 2, and the PACF also showed a spike at lag 2, indicating that the GARCH(2, 2) model would be appropriate for capturing the volatility clusters in the data.
##
## Series Initialization:
## ARMA Model: arma
## Formula Mean: ~ arma(0, 0)
## GARCH Model: garch
## Formula Variance: ~ garch(2, 2)
## ARMA Order: 0 0
## Max ARMA Order: 0
## GARCH Order: 2 2
## Max GARCH Order: 2
## Maximum Order: 2
## Conditional Dist: norm
## h.start: 3
## llh.start: 1
## Length of Series: 988
## Recursion Init: mci
## Series Scale: 0.02517583
##
## Parameter Initialization:
## Initial Parameters: $params
## Limits of Transformations: $U, $V
## Which Parameters are Fixed? $includes
## Parameter Matrix:
## U V params includes
## mu -0.31258890 0.3125889 0.03125889 TRUE
## omega 0.00000100 100.0000000 0.10000000 TRUE
## alpha1 0.00000001 1.0000000 0.05000000 TRUE
## alpha2 0.00000001 1.0000000 0.05000000 TRUE
## gamma1 -0.99999999 1.0000000 0.10000000 FALSE
## gamma2 -0.99999999 1.0000000 0.10000000 FALSE
## beta1 0.00000001 1.0000000 0.40000000 TRUE
## beta2 0.00000001 1.0000000 0.40000000 TRUE
## delta 0.00000000 2.0000000 2.00000000 FALSE
## skew 0.10000000 10.0000000 1.00000000 FALSE
## shape 1.00000000 10.0000000 4.00000000 FALSE
## Index List of Parameters to be Optimized:
## mu omega alpha1 alpha2 beta1 beta2
## 1 2 3 4 7 8
## Persistence: 0.9
##
##
## --- START OF TRACE ---
## Selected Algorithm: nlminb
##
## R coded nlminb Solver:
##
## 0: 1365.6502: 0.0312589 0.100000 0.0500000 0.0500000 0.400000 0.400000
## 1: 1365.6353: 0.0312591 0.0994931 0.0496107 0.0498982 0.399753 0.399770
## 2: 1365.6101: 0.0312614 0.0982603 0.0486158 0.0517544 0.400786 0.400993
## 3: 1365.5302: 0.0312666 0.0949084 0.0447754 0.0538297 0.401916 0.402575
## 4: 1365.5008: 0.0312743 0.0941847 0.0405583 0.0541916 0.404360 0.405677
## 5: 1365.4461: 0.0312860 0.0892456 0.0404033 0.0565630 0.405351 0.407390
## 6: 1365.4437: 0.0312872 0.0891940 0.0400116 0.0567266 0.405697 0.407818
## 7: 1365.4395: 0.0312905 0.0887983 0.0396200 0.0564307 0.405796 0.408082
## 8: 1365.4368: 0.0312948 0.0885487 0.0394930 0.0563254 0.406117 0.408608
## 9: 1365.4325: 0.0313070 0.0876811 0.0392644 0.0558442 0.406423 0.409424
## 10: 1365.4241: 0.0313599 0.0864032 0.0388834 0.0564655 0.406389 0.411089
## 11: 1365.4204: 0.0314054 0.0869458 0.0371159 0.0576009 0.405490 0.411503
## 12: 1365.4188: 0.0314622 0.0863832 0.0374355 0.0568085 0.405247 0.413345
## 13: 1365.4152: 0.0315225 0.0854998 0.0379139 0.0560799 0.404465 0.414736
## 14: 1365.4131: 0.0315903 0.0855114 0.0379325 0.0567563 0.403172 0.415717
## 15: 1365.4112: 0.0316584 0.0854477 0.0376918 0.0568685 0.401897 0.416884
## 16: 1365.4052: 0.0320224 0.0844288 0.0379133 0.0550234 0.396268 0.425159
## 17: 1365.3838: 0.0331815 0.0870730 0.0348293 0.0602864 0.371279 0.445485
## 18: 1365.3683: 0.0343104 0.0867325 0.0351163 0.0620605 0.345900 0.467962
## 19: 1365.3395: 0.0354109 0.0877129 0.0356633 0.0641055 0.320352 0.491608
## 20: 1365.3091: 0.0391644 0.0931616 0.0366873 0.0708397 0.215815 0.582648
## 21: 1365.3002: 0.0406226 0.0934410 0.0385874 0.0674040 0.235927 0.563647
## 22: 1365.2976: 0.0404934 0.0905390 0.0384077 0.0659472 0.242041 0.562127
## 23: 1365.2966: 0.0400880 0.0910725 0.0382766 0.0653291 0.235052 0.569317
## 24: 1365.2963: 0.0404703 0.0910081 0.0386222 0.0654788 0.232170 0.571760
## 25: 1365.2963: 0.0403776 0.0909464 0.0384968 0.0655437 0.233289 0.570761
## 26: 1365.2963: 0.0403819 0.0909569 0.0385079 0.0655332 0.233101 0.570937
## 27: 1365.2963: 0.0403820 0.0909564 0.0385081 0.0655335 0.233100 0.570938
##
## Final Estimate of the Negative LLH:
## LLH: -2272.392 norm LLH: -2.299992
## mu omega alpha1 alpha2 beta1 beta2
## 1.016651e-03 5.765022e-05 3.850815e-02 6.553346e-02 2.330999e-01 5.709383e-01
##
## R-optimhess Difference Approximated Hessian Matrix:
## mu omega alpha1 alpha2 beta1
## mu -1829076.2275 -3495344 5330.747 7.467187e+02 -1040.505
## omega -3495344.1499 -45251467675 -18178536.417 -1.814487e+07 -24478638.768
## alpha1 5330.7471 -18178536 -11927.666 -1.075800e+04 -11311.199
## alpha2 746.7187 -18144865 -10757.997 -1.111924e+04 -11324.587
## beta1 -1040.5050 -24478639 -11311.199 -1.132459e+04 -14090.772
## beta2 -1553.5649 -24673190 -11295.435 -1.133957e+04 -14191.447
## beta2
## mu -1553.565
## omega -24673190.008
## alpha1 -11295.435
## alpha2 -11339.572
## beta1 -14191.447
## beta2 -14308.696
## attr(,"time")
## Time difference of 0.0105319 secs
##
## --- END OF TRACE ---
##
##
## Time to Estimate Parameters:
## Time difference of 0.03513312 secs
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(2, 2), data = ydiff, con.dist = "std")
##
## Mean and Variance Equation:
## data ~ garch(2, 2)
## <environment: 0x127f2d990>
## [data = ydiff]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1 alpha2 beta1 beta2
## 0.00101665 0.00005765 0.03850815 0.06553346 0.23309991 0.57093834
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 1.017e-03 7.447e-04 1.365 0.1722
## omega 5.765e-05 2.640e-05 2.183 0.0290 *
## alpha1 3.851e-02 2.801e-02 1.375 0.1692
## alpha2 6.553e-02 3.577e-02 1.832 0.0669 .
## beta1 2.331e-01 3.273e-01 0.712 0.4763
## beta2 5.709e-01 2.938e-01 1.944 0.0519 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 2272.392 normalized: 2.299992
##
## Description:
## Sun Oct 13 20:31:12 2024 by user:
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 887.031 0
## Shapiro-Wilk Test R W 0.9584548 3.929184e-16
## Ljung-Box Test R Q(10) 15.72122 0.1078957
## Ljung-Box Test R Q(15) 22.57332 0.09362887
## Ljung-Box Test R Q(20) 35.16912 0.01922185
## Ljung-Box Test R^2 Q(10) 2.123062 0.995307
## Ljung-Box Test R^2 Q(15) 2.613227 0.9998311
## Ljung-Box Test R^2 Q(20) 4.098247 0.9999433
## LM Arch Test R TR^2 2.043445 0.9993363
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -4.587838 -4.558107 -4.587912 -4.576532
The GARCH(2,2) model was used to forecast future volatility, with 95% confidence intervals. The results showed that the predicted volatility followed the same pattern as the historical data, with reasonable confidence intervals. This indicates that the model is capable of predicting volatility accurately, although the forecast remains sensitive to extreme external shocks that the model cannot fully anticipate.
Overall, while the GARCH(2,2) model performed well in capturing and forecasting volatility, external factors not accounted for in the model could still influence stock price movements.
The goal of this project was to provide an accurate forecast of Netflix’s future stock prices by applying time series analysis to past data. The analysis began with a log transformation to address non-stationarity and increasing trends in the raw data.
After examining the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF), several ARIMA models were tested, and ARIMA(8, 1, 0) was selected based on its superior performance using AIC and BIC values. This model was then used for forecasting the stock price, but it failed to capture the sudden drop in Netflix’s stock price. This limitation likely stems from external factors, such as Netflix’s subscription price increase and heightened competition in the streaming market, which were not included in the model.
For the GARCH model, the AR(8) model was applied to analyze and forecast the stock price volatility. By investigating the ACF and PACF of the squared residuals, GARCH(2,2) was identified as the optimal model for capturing the volatility in the dataset. The GARCH model successfully accounted for the clustering of volatility and provided reasonable confidence intervals for future volatility. However, like the ARIMA model, the GARCH model is sensitive to external shocks, which the model cannot fully predict.
The analysis in the project report helped understand the stock price trends and predict future price. Although the two models had satisfactory result on diagnostic tests, some further study could improve the results of prediction. Incorporating external variables, such as market sentiment, competitor dynamics, or macroeconomic indicators, could enhance the accuracy of both price and volatility forecasts.