The task is to perform a time series analysis of a stock price: select a stock and complete a series of analysis tasks.
Selected stock price: Netflix (NFLX)
“Netflix is one of the world’s leading entertainment services with over 238 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time”. (Netflix - Overview - Profile, 2018)
library(plotly)
library(xts)
library(dplyr)
library(zoo)
library(tseries)
library(stats)
library(forecast)
library(astsa)
library(corrplot)
library(AER)
library(vars)
library(dynlm)
library(TSstudio)
library(tidyverse)
library(sarima)
library(dygraphs)
Netflix <- read.csv("C:\\Users\\danyb\\OneDrive - Instituto Tecnologico y de Estudios Superiores de Monterrey\\Docs\\Documentos\\Business Intelligence\\Quinto Semestre\\Introduction to Econometrics\\Examen\\entretainment_stocks.csv")
Before analyzing the data, we inspect its structure.
str(Netflix)
## 'data.frame': 192 obs. of 7 variables:
## $ Date : chr "1/1/2007" "2/1/2007" "3/1/2007" "4/1/2007" ...
## $ Disney_Adj_Close : num 29.1 28.4 28.5 29 29.3 ...
## $ Netflix_Adj_Close : num 3.26 3.22 3.31 3.17 3.13 2.77 2.46 2.5 2.96 3.78 ...
## $ Nintendo_Adj_Close : num 37.1 33.1 36.3 40 43.6 ...
## $ WBD_Adj_Close : num 8.47 8.21 9.78 11.11 11.95 ...
## $ EA_Adj_Close : num 49.5 49.9 49.8 49.9 48.4 ...
## $ Paramount_Adj_Close: num 22.1 21.5 21.6 22.6 23.7 ...
The Date column is stored as character (chr). To plot the time series, it has to be converted to the Date type.
Netflix$Date <- as.Date(Netflix$Date,"%m/%d/%Y")
str(Netflix)
## 'data.frame': 192 obs. of 7 variables:
## $ Date : Date, format: "2007-01-01" "2007-02-01" ...
## $ Disney_Adj_Close : num 29.1 28.4 28.5 29 29.3 ...
## $ Netflix_Adj_Close : num 3.26 3.22 3.31 3.17 3.13 2.77 2.46 2.5 2.96 3.78 ...
## $ Nintendo_Adj_Close : num 37.1 33.1 36.3 40 43.6 ...
## $ WBD_Adj_Close : num 8.47 8.21 9.78 11.11 11.95 ...
## $ EA_Adj_Close : num 49.5 49.9 49.8 49.9 48.4 ...
## $ Paramount_Adj_Close: num 22.1 21.5 21.6 22.6 23.7 ...
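The format string has to match the file's month/day/year layout. A quick check on strings laid out like the CSV's Date column (the third date is illustrative) shows what a correct format returns, what a swapped format silently returns, and that a non-matching string becomes NA:

```r
# Strings in the same month/day/year layout as the CSV's Date column
raw <- c("1/1/2007", "2/1/2007", "12/1/2022")

# %m/%d/%Y: month first, then day, then four-digit year
d <- as.Date(raw, format = "%m/%d/%Y")
print(d)  # 2007-01-01, 2007-02-01, 2022-12-01

# Swapping month and day gives a valid but wrong date, with no warning
wrong <- as.Date("2/1/2007", format = "%d/%m/%Y")
print(wrong)  # 2007-01-02

# A string that does not fit the format becomes NA, so check for NAs afterwards
bad <- as.Date("2007-01-01", format = "%m/%d/%Y")
print(is.na(bad))  # TRUE
```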
colSums(is.na(Netflix))
## Date Disney_Adj_Close Netflix_Adj_Close Nintendo_Adj_Close
## 0 0 0 0
## WBD_Adj_Close EA_Adj_Close Paramount_Adj_Close
## 0 0 0
There are no missing values in the dataset.
To get an interactive time series chart, we use plot_ly() from the plotly package.
# Hover text is set inside plot_ly(); a separate add_trace() would draw a second copy of the line
plot_ly(data = Netflix, x = ~Date, y = ~Netflix_Adj_Close, type = "scatter", mode = "lines",
        text = ~paste("Adj Close: $", Netflix_Adj_Close, "Date:", Date),
        hoverinfo = "text") %>%
  layout(title = "Netflix Stock Price",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Netflix Adj Close"))
Overall, Netflix’s stock price shows a clear upward trend. The first significant increase took place in 2018, but the most notable one occurred in 2020, with consistent growth up to a peak in October 2021. The price then fell steeply, from $690.31 USD to $174.87 USD by April 2022.
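The size of that drop can be expressed as a percentage of the peak, using the two prices quoted above. The helper max_drawdown() is hypothetical (it is not part of the original analysis). It applies the same peak-to-trough calculation to a whole price vector:

```r
# Prices quoted in the text (adjusted close, USD)
peak   <- 690.31  # October 2021
trough <- 174.87  # April 2022

pct_drop <- (peak - trough) / peak * 100
round(pct_drop, 1)  # 74.7

# Hypothetical helper: largest peak-to-trough decline in a price vector,
# expressed as a fraction of the running peak
max_drawdown <- function(p) {
  running_peak <- cummax(p)
  max((running_peak - p) / running_peak)
}
max_drawdown(c(100, 120, 60, 90))  # 0.5
```

With the real data, max_drawdown(Netflix$Netflix_Adj_Close) would report the deepest decline over the whole 2007 to 2022 sample.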
NFLXts<-ts(Netflix$Netflix_Adj_Close,start=c(2007,1),end=c(2022,12),frequency=12)
Netdec<-decompose(NFLXts)
plot(Netdec)
i. Do the time series data show a trend?
Yes. The trend component of the decomposition rises over the sample period, which shows a clear positive trend.
ii. Do the time series data show seasonality? How is the change of the seasonal component over time?
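The seasonal component shows a repeating yearly pattern, which indicates seasonality. It does not change over time. However, this is a property of the method: decompose() estimates one seasonal value per month by averaging across all years and repeats that pattern every year. A constant seasonal component therefore does not, by itself, show that the seasonality is stable.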
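The sketch below shows that decompose() produces an identical seasonal pattern every year. It uses the built-in AirPassengers series as a stand-in for NFLXts (an assumption). It then contrasts decompose() with stats::stl(), whose finite s.window lets the seasonal pattern evolve. The value 13 is an illustrative choice.

```r
# Stand-in series (assumption): log of the built-in monthly AirPassengers data
x <- log(AirPassengers)

# decompose(): one seasonal figure per month, repeated for every year
dec <- decompose(x)
seas <- matrix(dec$seasonal, nrow = 12)          # rows = months, columns = years
const_seasonal <- all(abs(seas - seas[, 1]) < 1e-12)
print(const_seasonal)  # TRUE: every year has exactly the same seasonal values

# stl() with a finite seasonal window allows the pattern to change over time
fit_stl <- stl(x, s.window = 13)
seas_stl <- matrix(fit_stl$time.series[, "seasonal"], nrow = 12)
jan_range <- diff(range(seas_stl[1, ]))          # spread of January's effect across years
print(jan_range)                                  # greater than 0: the seasonal effect varies
```

With the real data, stl(NFLXts, s.window = 13) would show whether Netflix's seasonal component actually changes over time.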
The random (remainder) component is also interesting. It stays fairly stable at first, starts to fluctuate around 2017, and shows a noticeable spike that appears to fall in 2022, coinciding with Netflix’s largest downturn. Part of the growing fluctuation may reflect scale: decompose() is additive by default, so a series whose price rose from about $3 to several hundred dollars will show larger absolute remainders later in the sample. Further research into that year turned up several possible contributing factors: price increases on Netflix plans, increased competition in the market, Netflix’s exit from Russia, and its measures against password sharing.
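The sketch below checks how much of a growing remainder can come from scale alone. It simulates a series whose level grows about 70-fold with constant 5% relative noise (an assumption, not the Netflix data), then compares the additive and multiplicative remainders from decompose():

```r
# Simulated data (assumption): exponential level with 5% multiplicative noise
set.seed(1)
n <- 144
level <- exp(0.03 * seq_len(n))
x <- ts(level * (1 + 0.05 * rnorm(n)), start = c(2007, 1), frequency = 12)

# Remainders from both decomposition types (the NAs at the ends are dropped)
add_rem <- as.numeric(na.omit(decompose(x, type = "additive")$random))
mul_rem <- as.numeric(na.omit(decompose(x, type = "multiplicative")$random))

# Spread of the last 48 remainders relative to the first 48
spread_ratio <- function(r) sd(tail(r, 48)) / sd(head(r, 48))

spread_ratio(add_rem)  # well above 1: absolute remainders grow with the level
spread_ratio(mul_rem)  # near 1: relative remainders stay comparable
```

With the real data, decompose(NFLXts, type = "multiplicative") or decompose(log(NFLXts)) would be the analogous check.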
# Stationarity Test
adf.test(Netflix$Netflix_Adj_Close)
##
## Augmented Dickey-Fuller Test
##
## data: Netflix$Netflix_Adj_Close
## Dickey-Fuller = -2.9019, Lag order = 5, p-value = 0.1987
## alternative hypothesis: stationary
### H0: non-stationary; HA: stationary.
### The p-value is higher than 0.05, so we fail to reject H0 and treat the series as non-stationary.
plot(Netflix$Netflix_Adj_Close,type="l",ylab="Original Price Stock",main = "Netflix Stock Price")
With a p-value of 0.1987 from the ADF test, we fail to reject H0, which suggests that the time series is non-stationary. The plot supports this: the price drifts upward over time instead of fluctuating around a constant mean.
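The Dickey-Fuller statistic printed above is the t-ratio on the lagged price level. It comes from a regression of the first difference on that level, a constant, a linear trend, and k lagged differences. tseries::adf.test() uses this form with k = trunc((n - 1)^(1/3)), which gives the reported "Lag order = 5" for 192 observations. Below is a minimal base-R sketch of that regression. The simulated series and the approximate 5% critical value of -3.43 (from standard Dickey-Fuller tables for the constant-plus-trend case) are assumptions for illustration. Unlike an ordinary t-ratio, this statistic must be compared with Dickey-Fuller critical values, not with the normal or t distribution.

```r
# Minimal ADF regression with constant and linear trend (sketch)
adf_stat <- function(y, k = trunc((length(y) - 1)^(1/3))) {
  dy <- diff(y)
  z <- embed(dy, k + 1)           # col 1: dy_t; cols 2..k+1: dy_{t-1}..dy_{t-k}
  n <- nrow(z)
  y_lag <- y[(k + 1):(k + n)]     # lagged level y_{t-1}, aligned with dy_t
  tt <- (k + 1):(k + n)           # deterministic linear trend
  fit <- if (k > 0) {
    lags <- z[, -1, drop = FALSE]
    lm(z[, 1] ~ y_lag + tt + lags)
  } else {
    lm(z[, 1] ~ y_lag + tt)
  }
  # The test statistic is the t-ratio of the lagged-level coefficient
  summary(fit)$coefficients["y_lag", "t value"]
}

set.seed(123)
random_walk <- cumsum(rnorm(192))  # unit root: usually does not reject H0
white_noise <- rnorm(192)          # stationary: should reject H0
crit_5pct <- -3.43                 # approximate 5% critical value (with trend)

stat_rw <- adf_stat(random_walk, k = 1)
stat_wn <- adf_stat(white_noise, k = 1)
c(random_walk = stat_rw, white_noise = stat_wn)
```

If the sketch mirrors adf.test()'s regression, which is our assumption, adf_stat(Netflix$Netflix_Adj_Close) should give a value close to the -2.9019 printed above. That value lies above -3.43, so H0 is not rejected.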
# Serial Autocorrelation
# ACF plot: the correlation between observations of a time series at different lags is called the autocorrelation function (ACF)
acf(Netflix$Netflix_Adj_Close,main="Significant Autocorrelations") # Generally, non-stationary time series data show the presence of serial autocorrelation.
The plot shows significant autocorrelations: the price in one period is strongly related to the price in previous periods. This is a problem when estimating a model on the raw series, because a model that ignores this dependence may not adequately capture the structure of the time series.
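For reference, the sample autocorrelation at lag k is the sum of products of demeaned values k periods apart, divided by the sum of squared deviations. The dashed lines drawn by acf() mark approximately ±1.96/sqrt(n). The sketch below computes both by hand and checks them against stats::acf(). It uses a simulated random walk as a stand-in for the price series (an assumption).

```r
# Simulated random walk standing in for the price series (assumption)
set.seed(7)
x <- cumsum(rnorm(192))

# Sample autocorrelation r_k for lags 1..max_lag
manual_acf <- function(x, max_lag) {
  xc <- x - mean(x)
  n <- length(xc)
  denom <- sum(xc^2)
  sapply(seq_len(max_lag), function(k) sum(xc[1:(n - k)] * xc[(1 + k):n]) / denom)
}

r_manual <- manual_acf(x, 10)
r_stats <- as.numeric(acf(x, lag.max = 10, plot = FALSE)$acf)[-1]  # drop lag 0
all.equal(r_manual, r_stats)  # TRUE

# Approximate 95% white-noise bounds (the dashed lines in the acf() plot)
bound <- qnorm(0.975) / sqrt(length(x))
sum(abs(r_manual) > bound)    # how many of the first 10 lags exceed the bounds
```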
1. ARMA transforming to diff and log
summary(NET_ARMA<-arma(diff(log(Netflix$Netflix_Adj_Close)),order=c(1,1)))
##
## Call:
## arma(x = diff(log(Netflix$Netflix_Adj_Close)), order = c(1, 1))
##
## Model:
## ARMA(1,1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.733068 -0.077751 0.006555 0.077652 0.539031
##
## Coefficient(s):
## Estimate Std. Error t value Pr(>|t|)
## ar1 0.36348 0.34178 1.063 0.288
## ma1 -0.22747 0.35478 -0.641 0.521
## intercept 0.01518 0.01175 1.292 0.196
##
## Fit:
## sigma^2 estimated as 0.02369, Conditional Sum-of-Squares = 4.48, AIC = -166.81
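Written out, the model that arma() estimates on the monthly log returns is shown below. The parameterization, with an intercept a_0 rather than a mean, follows the tseries documentation for arma(), and the numbers are the estimates printed above. None of the three estimates is individually significant at the 5% level (p = 0.288, 0.521 and 0.196).

```latex
\begin{aligned}
r_t &= \log P_t - \log P_{t-1} \\
r_t &= a_0 + a_1 r_{t-1} + b_1 \varepsilon_{t-1} + \varepsilon_t \\
\hat r_t &= 0.01518 + 0.36348\, r_{t-1} - 0.22747\, \hat\varepsilon_{t-1} \\
\mu &= \frac{a_0}{1 - a_1} = \frac{0.01518}{1 - 0.36348} = \frac{0.01518}{0.63652} \approx 0.0238
\end{aligned}
```

The implied long-run mean log return is about 0.024 per month, or roughly 2.4% average monthly growth.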
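2. ARIMA transforming to diff and log
Note that Arima() with d = 1 differences its input once more, so applying it to diff(log(...)) models the second difference of the log price. An estimated ma1 of -1.0000, as in the output below, is a typical symptom of this over-differencing. Fitting Arima(log(Netflix$Netflix_Adj_Close), order = c(1,1,1)) would difference the log price only once.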
NET_ARIMA <- Arima(diff(log(Netflix$Netflix_Adj_Close)),order=c(1,1,1))
summary(NET_ARIMA)
## Series: diff(log(Netflix$Netflix_Adj_Close))
## ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## 0.1438 -1.0000
## s.e. 0.0719 0.0217
##
## sigma^2 = 0.02387: log likelihood = 83.78
## AIC=-161.55 AICc=-161.42 BIC=-151.81
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.002676514 0.1532704 0.1071805 -Inf Inf 0.6820546 -0.01191508
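The base-R sketch below shows the over-differencing symptom. It uses simulated white noise as a stand-in for an already stationary series such as log returns (an assumption). Applying an unnecessary difference pushes the MA(1) coefficient to about -1, while fitting the same series without the extra difference gives a coefficient near 0.

```r
# Simulated stationary series standing in for log returns (assumption)
set.seed(2024)
w <- rnorm(190)

# d = 1 differences w again before fitting the MA(1) term
fit_over <- arima(w, order = c(0, 1, 1))
coef(fit_over)["ma1"]  # close to -1: the extra difference was not needed

# Same series without the extra difference
fit_ok <- arima(w, order = c(0, 0, 1))
coef(fit_ok)["ma1"]    # close to 0
```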
# model evaluation
# Testing serial autocorrelation in regression residuals
# Ho: There is no serial autocorrelation
# Ha: There is serial autocorrelation
NET_ARMA_R<-NET_ARMA$residuals
Box.test(NET_ARMA_R,lag=1,type="Ljung-Box")
##
## Box-Ljung test
##
## data: NET_ARMA_R
## X-squared = 0.0027965, df = 1, p-value = 0.9578
# Fail to reject H0. The p-value is > 0.05, indicating that the ARMA model does not show residual serial autocorrelation.
With a p-value of 0.9578 (> 0.05), we fail to reject H0: the ARMA residuals show no significant serial autocorrelation at lag 1, consistent with uncorrelated residuals.
# The first fitted value and residual of this ARMA model are NA, so we omit them.
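The Ljung-Box statistic pools the squared residual autocorrelations up to lag h, Q = n(n+2) * sum over k = 1..h of r_k^2 / (n - k). It is compared with a chi-squared distribution whose degrees of freedom are h minus the number of fitted ARMA parameters. The tests above use lag = 1, which only checks first-order autocorrelation. The sketch below computes Q by hand on simulated residuals (an assumption) and checks it against Box.test(). It uses fitdf = 2, as for ARMA(1,1) residuals, and the choice h = 10 is illustrative.

```r
# Simulated residuals standing in for the model residuals (assumption)
set.seed(99)
e <- rnorm(190)
h <- 10                     # number of lags pooled by the test

r <- as.numeric(acf(e, lag.max = h, plot = FALSE)$acf)[-1]  # r_1..r_h
n <- length(e)
q_manual <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))

lb <- Box.test(e, lag = h, type = "Ljung-Box", fitdf = 2)   # fitdf = p + q
all.equal(q_manual, unname(lb$statistic))   # TRUE
lb$parameter                                # df = h - fitdf = 8
pchisq(q_manual, df = h - 2, lower.tail = FALSE)  # same p-value as lb$p.value
```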
NET_ARMA$fitted.values <- na.omit(NET_ARMA$fitted.values)
NET_ARMA$residuals <- na.omit(NET_ARMA$residuals)
#Testing residuals
adf.test(NET_ARMA$residuals)
## Warning in adf.test(NET_ARMA$residuals): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: NET_ARMA$residuals
## Dickey-Fuller = -6.3544, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
For the ADF test on the residuals, the p-value is below 0.05 (the warning indicates the true value is smaller than the printed 0.01), so we reject H0 and conclude that the residuals are stationary.
hist(NET_ARMA$residuals)
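The histogram is roughly bell-shaped and centered near zero, but a histogram alone does not establish normality. The model summary also hints at heavier tails than a normal distribution: the quartiles are only about ±0.078, while the extreme residuals reach -0.733 and 0.539.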
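That impression can be checked with simple arithmetic on numbers printed in the arma() summary. For a normal distribution, the interquartile range equals 2 * qnorm(0.75), about 1.349, standard deviations. The standard deviation implied by the quartiles can therefore be compared with the one implied by the residual variance.

```r
# Values copied from the arma() summary above
q1 <- -0.077751     # first quartile of the residuals
q3 <-  0.077652     # third quartile of the residuals
sigma2 <- 0.02369   # "sigma^2 estimated as"

# Under normality, IQR = 2 * qnorm(0.75) * sd (about 1.349 * sd)
sd_from_iqr <- (q3 - q1) / (2 * qnorm(0.75))
sd_from_fit <- sqrt(sigma2)
round(c(sd_from_iqr, sd_from_fit), 3)  # 0.115 0.154

# A formal check on the residual vector itself (not run here, requires NET_ARMA):
# shapiro.test(NET_ARMA$residuals)
# qqnorm(NET_ARMA$residuals); qqline(NET_ARMA$residuals)
```

The quartiles imply a standard deviation of about 0.115, well below the 0.154 implied by the residual variance. A gap like this is consistent with heavy tails, so a formal test is worth running before calling the residuals normal.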
#testing values
adf.test(NET_ARMA$fitted.values)
## Warning in adf.test(NET_ARMA$fitted.values): p-value smaller than printed
## p-value
##
## Augmented Dickey-Fuller Test
##
## data: NET_ARMA$fitted.values
## Dickey-Fuller = -5.7383, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The p-value is again 0.01 or smaller, below 0.05, so we reject H0: the fitted values form a stationary series.
plot(diff(log(Netflix$Netflix_Adj_Close)),type="l",ylab="diff Price Stock",main = "Netflix Stock Price")
Transforming the prices with log and first difference (that is, into monthly log returns) therefore produced stationary data.
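The log-difference transformation and its exact inverse can be checked on the first five adjusted closes shown in the str() output. diff(log(p)) gives monthly log returns, and exponentiating their cumulative sum from the first price recovers the prices:

```r
# First five monthly adjusted closes from the str() output above (USD)
p <- c(3.26, 3.22, 3.31, 3.17, 3.13)

r <- diff(log(p))                          # log returns: log(p_t / p_{t-1})
all.equal(r, log(p[-1] / p[-length(p)]))   # TRUE

# Undo both transformations: cumulative sum of log returns, then exp()
p_back <- p[1] * exp(cumsum(r))
all.equal(p_back, p[-1])                   # TRUE

# stats::diffinv() performs the same cumulative sum, including the starting value
all.equal(exp(diffinv(r, xi = log(p[1]))), p)  # TRUE
```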
# model evaluation
# Testing serial autocorrelation in regression residuals
# Ho: There is no serial autocorrelation
# Ha: There is serial autocorrelation
NET_ARIMA_R<-NET_ARIMA$residuals
Box.test(NET_ARIMA_R,lag=1,type="Ljung-Box")
##
## Box-Ljung test
##
## data: NET_ARIMA_R
## X-squared = 0.027544, df = 1, p-value = 0.8682
# Fail to reject H0. The p-value is > 0.05, indicating that the ARIMA model does not show residual serial autocorrelation.
With a p-value of 0.8682 (> 0.05), we fail to reject H0: the ARIMA residuals show no significant serial autocorrelation at lag 1, consistent with uncorrelated residuals.
#Testing residuals.
adf.test(NET_ARIMA$residuals)
## Warning in adf.test(NET_ARIMA$residuals): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: NET_ARIMA$residuals
## Dickey-Fuller = -6.2735, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
For the ADF test on the ARIMA residuals, the p-value is below 0.05 (the warning indicates the true value is smaller than the printed 0.01), so we reject H0 and conclude that the residuals are stationary.
#Testing values
adf.test(NET_ARIMA$fitted)
## Warning in adf.test(NET_ARIMA$fitted): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: NET_ARIMA$fitted
## Dickey-Fuller = -5.1742, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
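The p-value is again 0.01 or smaller, below 0.05, so we reject H0: the ARIMA fitted values form a stationary series.
Both the ARMA and the ARIMA model pass these diagnostic tests. The ARMA model has the lower AIC (-166.81 versus -161.55) and the higher Ljung-Box p-value (0.9578 versus 0.8682), so it is the model selected here. The AIC comparison should be read with caution, however. The ARIMA(1,1,1) was fitted to a further-differenced version of the series the ARMA used, so the two criteria are computed on different data. In addition, tseries::arma() fits by conditional least squares while forecast::Arima() uses maximum likelihood by default.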
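acf(NET_ARMA$residuals,main="ACF of ARMA(1,1) Residuals") # residuals, not fitted values, show whether autocorrelation remains
The residual ACF is the relevant check of whether this model removes the serial autocorrelation: the residual autocorrelations should stay within the dashed significance bounds. The Ljung-Box p-value of 0.9578 at lag 1 already points in that direction.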
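For the next part, the fitted values of the model need to be transformed back to the price scale by reversing the log and the first difference. Each fitted value is a log return, so the exact inverse is: fitted price = previous observed price × exp(fitted log return). The code below takes a simpler route. It converts the fitted values and the original prices into plain vectors with c(), adds exp() of the fitted values to the original prices, and converts the result back into a time series for forecasting. This is not an exact inversion. exp() of a log return is a gross return close to 1, so the sum is essentially the observed price plus about one dollar. Also, the two vectors have different lengths (190 fitted values versus 192 prices), so R recycles the shorter one, which triggers the warning shown.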
fv_exp<-c(exp(NET_ARMA$fitted.values)) # exp() of each fitted log return: a gross return p_t / p_(t-1), close to 1
IBv<-c(Netflix$Netflix_Adj_Close) # original prices as a plain vector (length 192)
Fixed_Values<-fv_exp+IBv # adds the two vectors; the 190 values of fv_exp are recycled to length 192
## Warning in fv_exp + IBv: longer object length is not a multiple of shorter
## object length
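Below is a sketch of the exact inversion for one-step fitted values. It assumes that the fitted log returns, after dropping the first NA, correspond to prices 3 to N. Each fitted price is then the previous observed price times exp() of the fitted log return, and the lengths match without any recycling. The inputs are hypothetical numbers.

```r
# Hypothetical inputs: prices p_1..p_N and fitted log returns whose first
# value was dropped (as after na.omit() on the arma() fitted values)
p  <- c(100, 104, 101, 110, 108, 115)
fv <- c(0.010, -0.020, 0.030, 0.000)   # predict log(p_t / p_{t-1}) for t = 3..N

N <- length(p)
m <- length(fv)
prev_price   <- p[(N - m):(N - 1)]     # price each fitted return is applied to
fitted_price <- prev_price * exp(fv)   # one-step fitted prices for p_{N-m+1}..p_N

# Fitted prices line up with the last m observed prices
cbind(observed = p[(N - m + 1):N], fitted = round(fitted_price, 2))
```

With the document's objects, the analogous call would be Netflix$Netflix_Adj_Close[2:191] * exp(as.numeric(NET_ARMA$fitted.values)), again assuming that alignment. Wrapping the result in ts(..., end = c(2022, 12), frequency = 12) would keep the calendar dates.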
ts_Fixed_Values <- ts(Fixed_Values, start = 1, end = length(Fixed_Values), frequency = 1) #Make it time series
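Now the forecast can be generated. When forecast() receives a plain ts object like ts_Fixed_Values, it automatically fits an exponential smoothing (ETS) model to that series. The values below therefore come from that model, not directly from the ARMA(1,1) coefficients.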
NStock_forecast<-forecast(ts_Fixed_Values,h=5) #h=5 for 5 periods.
NStock_forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 193 296.1937 240.2485 352.1388 210.6329 381.7544
## 194 296.4846 216.8991 376.0702 174.7690 418.2002
## 195 296.7756 198.7233 394.8279 146.8175 446.7337
## 196 297.0666 183.1687 410.9644 122.8748 471.2583
## 197 297.3575 169.2521 425.4629 101.4372 493.2778
This forecast gives estimates of the stock price for the next 5 periods. Because the series was indexed from 1 to 192, periods 193 to 197 correspond to January to May 2023. The point forecasts rise only slightly, from $296.19 to $297.36. Meanwhile, the 95% prediction intervals widen from [$210.63, $381.75] for period 193 to [$101.44, $493.28] for period 197.
The graphs for this forecast are plotted below.
plot(NStock_forecast)
autoplot(NStock_forecast)
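One way to forecast with the selected ARMA(1,1) itself is to forecast the log returns and compound them from the last observed price. The base-R sketch below uses stats::arima() on simulated prices. The simulated series is an assumption that only stands in for Netflix_Adj_Close; with the real data, one would pass diff(log(Netflix$Netflix_Adj_Close)). arima() estimates by maximum likelihood, so its coefficients will not match tseries::arma() exactly.

```r
# Simulated monthly prices standing in for the Netflix series (assumption)
set.seed(11)
r_sim <- arima.sim(model = list(ar = 0.36, ma = -0.23), n = 191, sd = 0.15) + 0.024
p <- 3.26 * exp(cumsum(c(0, r_sim)))            # 192 prices, first value 3.26

fit <- arima(diff(log(p)), order = c(1, 0, 1))  # ARMA(1,1) with mean on log returns
fc  <- predict(fit, n.ahead = 5)                # forecasts of the next 5 log returns

# Compound the forecast log returns from the last observed price
price_path <- tail(p, 1) * exp(cumsum(fc$pred))
round(price_path, 2)
```

Compounding point forecasts of log returns gives a median-type price path. Prediction intervals would also require the variance of the cumulated log returns, which this sketch leaves out.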
References
Netflix - Overview - Profile. (2018). Netflix.net. https://ir.netflix.net/ir-overview/profile/default.aspx