Hey there! Welcome to another R project of mine. This time, I played with two models in order to forecast future stock prices for Activision Blizzard (ATVI). The models I used are simple linear regression and Holt-Winters. In my last project, which looked at predicting successful NFL field goals, I followed a structure of training a model on one set of data then evaluating performance on another set of data. I’m skipping that process for this project so I can talk more about the difference between the two models.
I’m trying to predict Blizzard’s stock prices because I am a huge Warcraft fanboy (WC3 + WoW). In fact, WoW Classic releases early next week. DM me if you play Horde.
The process for this project is as follows:
The main packages used for this project are tidyverse, readxl, stats, highcharter, forecast, and cowplot.
And with that…
One of the highest rated South Park episodes was “Make Love, Not Warcraft.”
So, here is where I’m going to import and adjust the data locally: Blizzard_Monthly_Data
I made this dataset myself by pulling daily historical data from Yahoo! Finance. I was able to pull historical opens, closes, volumes, adjusted closes, and returns from 2008-2019. I chose 2008 as my start date since Activision Blizzard was formed in 2008 after a merger. After pulling this data for Blizzard, I also pulled it for the Dow Jones Industrial Average (DJIA). I even went through the painstaking effort of adding an event column to the original dataset to track patch/expansion releases for WoW and the annual Blizzcon convention. I was going to track the impact of these events on stock price, but it got messy once I had to compress daily data into monthly data, and then I got lazy. I had to compress the data into monthly figures since Holt-Winters needs evenly recurring periods to work well. There are always 12 months in a year, but there is not a consistent number of trading days per year for the stock market (about 251, though). Furthermore, a regression I ran with the event variable (not shown here) showed no impact on stock price.
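To give a feel for the daily-to-monthly compression described above, here is a minimal sketch. The data is made up, and the column names `date` and `close` are my assumptions rather than the actual Yahoo! Finance export; weekdays are used as a rough stand-in for trading days.

```r
library(dplyr)

# Hypothetical daily closes (NOT the real ATVI data)
set.seed(42)
dates <- seq(as.Date("2008-08-01"), as.Date("2008-10-31"), by = "day")
# Drop Sundays (0) and Saturdays (6) as a rough stand-in for non-trading days
dates <- dates[!as.POSIXlt(dates)$wday %in% c(0, 6)]
daily <- data.frame(
  date  = dates,
  close = 15 + cumsum(rnorm(length(dates), sd = 0.3))
)

# Collapse to one row per month: the average close and the number of days used
monthly <- daily %>%
  mutate(month = format(date, "%Y-%m")) %>%
  group_by(month) %>%
  summarise(avg_close = mean(close), trading_days = n()) %>%
  mutate(id = row_number())

monthly
```

The `trading_days` column shows why monthly periods are easier for Holt-Winters: the number of days per month varies, but every year still has exactly 12 rows.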
Anyhow, after importing, I want to quickly adjust the data to work for regression. The date column comes in as text (a factor), which a linear regression cannot treat as a continuous time axis, so it needs a numeric index in place of the date. Using the “mutate” function from the “dplyr” package, we can add a column to a dataset and indicate what we want the values to be. Since the data is pre-sorted from oldest to newest, we only need the row number, which we store as “id”. Instead of using head() to inspect the dataset, we use glimpse() to do practically the same thing.
library(tidyverse)  # provides %>%, mutate(), glimpse(), and ggplot()
# Read the monthly averages exported from the daily data
Blizzard_Monthly_Data <- read.csv(file="Month Year Blizz.csv")
# Add a sequential time index (1 = August 2008)
Blizzard_Monthly_Data <- Blizzard_Monthly_Data %>% mutate(id = row_number())
glimpse(Blizzard_Monthly_Data)
## Observations: 132
## Variables: 4
## $ ï..Month.of.Date <fct> August 2008, September 2008, October 2008, No...
## $ Avg..Close <dbl> 17.008333, 16.431905, 12.433478, 11.212105, 9...
## $ Avg..DJIA.Close <dbl> 11557.056, 11072.842, 9079.750, 8440.312, 863...
## $ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...
Let’s dive into actual analytics
Linear regression is useful for this problem since it is good at describing the relationship between a numeric outcome and numeric predictors. Factor data and time data get a little messier.
Below we run the regression looking at the average monthly close price for Blizzard’s stock vs the time index ID. What makes this a “simple” linear regression is the use of only one variable to predict the stock price. If we added another predictor variable, such as the DJIA close, we would have a multiple linear regression. While this might be more accurate, I found time to be highly correlated to the DJIA close, which is like double dipping. Besides, I won’t exactly know the future DJIA closes, but I will know the future time index, which builds a better case for forecasting.
Looking at the summary for the regression, we see two nice metrics: Pr(>|t|) and Adjusted R-squared. Pr(>|t|) is the p-value for each coefficient: roughly, the chance of seeing an estimate this far from zero if the variable truly had no effect. We want it to be as low as possible. Our cutoff is generally 0.05, and a predictor with a higher value is a candidate to throw out. The adjusted R-squared of 0.7463 can be read loosely as: “The time ID explains about 74.6% of the variation in Blizzard’s stock price.” (The 0.7482 in the output is the unadjusted Multiple R-squared; with a single predictor the two are nearly identical.) We want this to be as high as possible, but should be skeptical when it is too high (> 95%).
Lastly we look at the “Estimate” column, which gives the magnitude of each predictor’s effect. The way this reads is: “for every 1 increase in time ID (one month), Blizzard’s stock price is predicted to go up by about $0.45.” Furthermore, stock price should be predicted with the following formula (note the -2.36913 is the intercept):
stock price = 0.45274(id) - 2.36913 + error
So now we have something we can use to predict future stock prices.
Blizzard_Regression<-lm(Blizzard_Monthly_Data$Avg..Close ~ Blizzard_Monthly_Data$id)
summary(Blizzard_Regression)
##
## Call:
## lm(formula = Blizzard_Monthly_Data$Avg..Close ~ Blizzard_Monthly_Data$id)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.094 -8.494 -2.590 6.786 26.162
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.36913 1.76556 -1.342 0.182
## Blizzard_Monthly_Data$id 0.45274 0.02304 19.654 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.08 on 130 degrees of freedom
## Multiple R-squared: 0.7482, Adjusted R-squared: 0.7463
## F-statistic: 386.3 on 1 and 130 DF, p-value: < 2.2e-16
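As a quick worked example, the fitted formula can be applied by hand to the 24 months after the data ends (ids 133 to 156, since the dataset has 132 rows). This is only a sketch with the coefficients copied from the summary output above; the helper name `predict_close` is made up for illustration.

```r
# Coefficients copied from the summary() output above
intercept <- -2.36913
slope     <- 0.45274

# Hypothetical helper: point forecast for a given time index
predict_close <- function(id) intercept + slope * id

# The dataset ends at id 132, so the next 24 months are ids 133 to 156
future_ids  <- 133:156
forecast_lm <- data.frame(
  id              = future_ids,
  predicted_close = predict_close(future_ids)
)

head(forecast_lm, 3)
```

One caveat: because the model above was fit with `Blizzard_Monthly_Data$...` inside the formula, `predict()` with a `newdata` argument will not pick up new ids as you might expect. Refitting with `lm(Avg..Close ~ id, data = Blizzard_Monthly_Data)` should avoid that.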
Next, we visualize the regression using scatter plots with trendlines fitted to them. I did two separate plots just to show how closely related (correlated) time and DJIA close are. This is why we could only use one as a predictor in the above model.
Note the colors: for those who have a life and never played World of Warcraft, the two factions are the Horde and the Alliance. The Horde classically consists of Orcs, Taurens (minotaurs), Trolls, and Undead (zombies). The Horde is generally the cooler of the two factions; their colors are red and black. The Alliance classically consists of Humans (boring), Night Elves (purple elitists), Dwarves, and Gnomes. The Alliance is generally full of middle schoolers; their colors are blue and gold.
DJIAvStock <- ggplot(data=Blizzard_Monthly_Data, mapping = aes(Avg..DJIA.Close,Avg..Close))+
geom_point(color = "red") +
geom_smooth(method = "lm", color = "black")+
labs(x="DJIA Average Close", y = "Blizzard Average Close")+
ggtitle("Horde")
TimevStock <- ggplot(data=Blizzard_Monthly_Data, mapping = aes(id, Avg..Close))+
geom_point(color="blue") +
geom_smooth(method = "lm", color = "gold")+
labs(x="Monthly Time Index, starting 8/2008", y = "Blizzard Average Close")+
ggtitle("Alliance")
library(cowplot)  # provides plot_grid()
plot_grid(DJIAvStock, TimevStock, labels = "AUTO")
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
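The overlap between the time index and the DJIA close can also be checked numerically with a correlation. The sketch below uses made-up stand-in data (the `demo` data frame is hypothetical), with two missing DJIA values to mimic the "Removed 2 rows" warnings above; which rows are actually missing in the real dataset is an assumption on my part.

```r
# Hypothetical stand-in for Blizzard_Monthly_Data (NOT the real values)
set.seed(1)
demo <- data.frame(id = 1:132)
demo$Avg..DJIA.Close <- 9000 + 120 * demo$id + rnorm(132, sd = 800)
demo$Avg..DJIA.Close[c(131, 132)] <- NA  # assumed location of the missing rows

# With missing values present, the default cor() returns NA
cor(demo$id, demo$Avg..DJIA.Close)

# use = "complete.obs" drops incomplete rows before computing the correlation
r <- cor(demo$id, demo$Avg..DJIA.Close, use = "complete.obs")
r
```

A value close to 1 would back up the decision to keep only one of the two as a predictor.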
Holt-Winters is a relatively simple yet strong model that works specifically with time-series data. A core concept of this model is smoothing. Exponential smoothing essentially gives less weight to older observations, with the alpha parameter controlling how quickly that weight fades. This makes sense since stock prices from 2010 should not be as important as stock prices from yesterday. Holt-Winters also has additional smoothing parameters for trend (beta) and seasonality (gamma), the latter of which looks at cyclical patterns on monthly or quarterly frequencies. Stocks are not supposed to be seasonal, but I kept gamma activated anyway.
The nice thing about Holt-Winters is that you can just state how far ahead you want to predict; in our case it is 24 months, or two years. The shaded areas around the forecast line are prediction intervals (the forecast package defaults to 80% and 95%), showing the range the future stock price will likely fall in. You can create similar shaded areas for linear regression by adding and subtracting a multiple of the prediction standard error around the fitted line.
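To make the alpha idea concrete, here is a minimal sketch of simple exponential smoothing done by hand. It covers only the level component, not the full Holt-Winters model with trend and seasonality, and the function name `exp_smooth` is made up for illustration. The prices are the first four monthly closes from the glimpse() output, rounded.

```r
# Hypothetical helper: level-only exponential smoothing
exp_smooth <- function(x, alpha) {
  level <- numeric(length(x))
  level[1] <- x[1]  # start the level at the first observation
  for (t in seq_along(x)[-1]) {
    # New level = alpha * newest value + (1 - alpha) * previous level
    level[t] <- alpha * x[t] + (1 - alpha) * level[t - 1]
  }
  level
}

prices <- c(17.0, 16.4, 12.4, 11.2)  # rounded closes, Aug to Nov 2008

exp_smooth(prices, alpha = 0.5)  # 17.000 16.700 14.550 12.875
exp_smooth(prices, alpha = 1)    # identical to prices
```

With alpha = 1 the smoothed level is just the latest observation, which is worth keeping in mind when reading the fitted alpha in the model output.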
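library(forecast)     # provides forecast()
library(highcharter)  # provides hchart()
# Monthly series starting August 2008, 12 periods per year
Blizz_ts <- ts(Blizzard_Monthly_Data$Avg..Close, start = c(2008,8), frequency = 12)
Blizz_HW <- HoltWinters(Blizz_ts)
Blizz_HW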
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = Blizz_ts)
##
## Smoothing parameters:
## alpha: 1
## beta : 0.002181349
## gamma: 0.3417505
##
## Coefficients:
## [,1]
## a 49.08978600
## b 0.09164321
## s1 0.80441185
## s2 0.60653258
## s3 0.68838417
## s4 -0.15539180
## s5 -0.33418725
## s6 -0.60099376
## s7 -1.95591228
## s8 -1.02822557
## s9 -0.51899062
## s10 0.41626773
## s11 1.56289143
## s12 0.51521350
hchart(forecast(Blizz_HW, h = 24))
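For comparison, here is a sketch of how similar bands could be produced for the linear regression using `predict()` with `interval = "prediction"`. The data is a made-up stand-in (the `demo` data frame and its values are hypothetical), and the model is fit with a `data =` argument so that `newdata` works.

```r
# Hypothetical stand-in data; swap in Blizzard_Monthly_Data in practice
set.seed(7)
demo <- data.frame(id = 1:132)
demo$Avg..Close <- -2.4 + 0.45 * demo$id + rnorm(132, sd = 10)

# Fit with data = so that predict() can accept new ids
fit <- lm(Avg..Close ~ id, data = demo)

# Next 24 months after the end of the data
future <- data.frame(id = 133:156)

# Returns a matrix with columns fit (point forecast), lwr, and upr
pred <- predict(fit, newdata = future, interval = "prediction", level = 0.95)
head(pred)
```

The `lwr` and `upr` columns play the same role as the shaded region in the Holt-Winters chart.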
So, this was my attempt at forecasting. Thanks!