1. Project Overview

Stock market forecasting is one of the most challenging and interesting areas of forecasting. A strong forecast model can have a large financial payoff. Stock market data is normally volatile due to multiple dependencies which may include but are not limited to microeconomic and macroeconomic factors. That’s why judgmental forecasting (Deep Domain Knowledge) can be beneficial in the stock market forecast. The S&P 500, or just the S&P, is an American stock market index based on the market capitalizations of 500 large companies having common stocks listed on the NYSE, NASDAQ, or the Cboe BZX Exchange. The provided dataset includes historical stock prices (5 years of data up to February 2018) for all companies currently found on the S&P 500 index. The data has been acquired by querying against IEX group Rest API.

2. Top 10 Asset Holdings

Assumption: Total Asset = Average of daily stock price * Daily Volume

#load csv data
df <- read.csv(file = "all_SPstocks_5yr.csv", header = TRUE)

#Calculate the average stock price, asset and year
df$avg <- rowMeans(df[, 2:5])
df$asset <- df$avg*df$volume
df$date <- as.Date(df$date)
df$year <- year(df$date)

head(df)
##         date  open  high   low close   volume Name     avg     asset year
## 1 2013-02-08 15.07 15.12 14.63 14.75  8407500  AAL 14.8925 125208694 2013
## 2 2013-02-11 14.89 15.01 14.26 14.46  8882000  AAL 14.6550 130165710 2013
## 3 2013-02-12 14.45 14.51 14.10 14.27  8126000  AAL 14.3325 116465895 2013
## 4 2013-02-13 14.30 14.94 14.25 14.66 10259500  AAL 14.5375 149147481 2013
## 5 2013-02-14 14.94 14.96 13.16 13.99 31879900  AAL 14.2625 454687074 2013
## 6 2013-02-15 13.93 14.61 13.93 14.50 15628000  AAL 14.2425 222581790 2013

Assumption: When determining the top 10 holdings of total assets, the total assets on the last day of each year are used. For instance, the top 10 holdings in 2013 are ordered by the total assets on 12/31/2013. Moreover, since we have incomplete data of 2018, the top 10 holdings in 2018 are determined by the total assets on 02/07/2018.

#Find the last trading day of each year
last_day <- aggregate(df$date, by = list(year = df$year), max)

#Find the top 10 holdings of each year
top10 <- data.frame()
for (i in c(1:6)){
  asset <- df[df$date == last_day[i, 2], c("Name", "asset", "year") ]
  top10 <- rbind(top10, head(asset[order(asset$asset, decreasing = TRUE),], 10))
}

#Reshape the top10 list
top10_list <- data.frame("2013" = top10[1:10,], "2014" = top10[11:20,], "2015" = top10[21:30,], "2016" = top10[31:40,], "2017" = top10[41:50,], "2018" = top10[51:60,])

#Top 10 holdings in 2018
top10_2018 <- tail(top10, 10)

writeLines("List of top 10 asset holdings in 2018:")

## List of top 10 asset holdings in 2018:

top10_2018

##         Name       asset year
## 47646   AMZN 10281971451 2018
## 2518    AAPL  8323089876 2018
## 219868    FB  5032939624 2018
## 423892  NVDA  4617113596 2018
## 391457  MSFT  3710679561 2018
## 74129    BAC  3161132240 2018
## 251567 GOOGL  2779883968 2018
## 601414  WYNN  2774291664 2018
## 76647     BA  2745773152 2018
## 592670   WFC  2611720886 2018

The top 10 holdings by total assets in 2018 are Amazon(AMZN), Apple(APPL), Facebook(FB), Nvidia(NVDA), Microsoft (MSFT), Bank of America(BAC), Alphabet (GOOGL), Wynn Resorts (WYNN), Boeing (BA), and Wells Fargo (WFC). 6 out of 10 are all technology companies, whose total assets sum up to 34.75 billion.

#Subset of the original dataset
df_10 <- df[df$Name %in% top10_2018$Name, ]

#Reshape the data
melt_df_10 <- melt(df_10, id.vars = c("date", "Name","avg", "asset", "year"))

p6 <- ggplot(top10_list[,1:2], 
       aes(x = reorder(X2013.Name, -X2013.asset), y = X2013.asset )) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = paste(format(round(top10_list$X2013.asset/1e6,1), trim = TRUE), "M", sep = "")), 
            vjust = -0.3, size = 3) +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank()) +
  ggtitle("2013 Top 10 Holdings by Total Assets")

p5 <- ggplot(top10_list[,4:5], 
             aes(x = reorder(X2014.Name, -X2014.asset), y = X2014.asset )) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = paste(format(round(top10_list$X2014.asset/1e6,1), trim = TRUE), "M", sep = "")), 
            vjust = -0.3, size = 3) +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank()) +
  ggtitle("2014 Top 10 Holdings by Total Assets")

p4 <- ggplot(top10_list[,7:8], 
             aes(x = reorder(X2015.Name, -X2015.asset), y = X2015.asset )) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = paste(format(round(top10_list$X2015.asset/1e6,1), trim = TRUE), "M", sep = "")), 
            vjust = -0.3, size = 3) +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank()) +
  ggtitle("2015 Top 10 Holdings by Total Assets")

p3 <- ggplot(top10_list[,10:11], 
             aes(x = reorder(X2016.Name, -X2016.asset), y = X2016.asset )) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = paste(format(round(top10_list$X2016.asset/1e6,1), trim = TRUE), "M", sep = "")), 
            vjust = -0.3, size = 3) +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank()) +
  ggtitle("2016 Top 10 Holdings by Total Assets")

p2 <- ggplot(top10_list[13:14], 
             aes(x = reorder(X2017.Name, -X2017.asset), y = X2017.asset )) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = paste(format(round(top10_list$X2017.asset/1e6,1), trim = TRUE), "M", sep = "")), 
            vjust = -0.3, size = 3) +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank()) +
  ggtitle("2017 Top 10 Holdings by Total Assets")

p1 <- ggplot(top10_list[,16:17], 
             aes(x = reorder(X2018.Name, -X2018.asset), y = X2018.asset )) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = paste(format(round(top10_list$X2018.asset/1e6,1), trim = TRUE), "M", sep = "")), 
            vjust = -0.3, size = 3) +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank()) +
  ggtitle("2018 Top 10 Holdings by Total Assets")

grid.arrange(p1, p2, p3, p4, p5, p6, ncol = 2)

According to the figure above, we can observe that:

2018 was a phenomenal year for the top 10 companies. All these companies at least doubled their assest within three months.
There are a lot of changes happening in 2016. Even Apple still have the largest assets, its total assets are smaller than that in 2015. However, 2016 was a great year for Nvidia that jumps to the second place for the first time.
Technology companies continue showing strong capabilities of holding giant assests since 2013.
Amazon is a rising star, whose total assets increase dramatically since 2015. Specially, its assets increases by over 300% within 3 months in 2018.
Apple occupies the first place for a long time till Amazon exceeds it in 2018.
It is the first time that Wynn Resort appears in the top 10 list.

Assmption: Invest 100,000 dollars today, no inflation rate or other factors considered when calculate the return rate of investment.


#Define function to caluclate the ROI
ROI_calculation <- function(current_price, future_price, total_investment){
  rate <- ((total_investment/current_price)*future_price-total_investment)/total_investment
  rate <- percent(rate, accuracy = .01)
  return(rate)
}

3. Forecasting

3.1 Amazon

Assumption: The NYSE and NASDAQ average about 253 trading days a year.

days_in_2013 <- nrow(df_10[df_10$Name == "AMZN" & df_10$year == 2013,])

#Convert data to time series
AMZN_avg <- ts(df_10[df_10$Name == "AMZN", "avg"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

AMZN_asset <- ts(df_10[df_10$Name == "AMZN", "asset"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

AMZN_volume <- ts(df_10[df_10$Name == "AMZN", "volume"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

p1 <- autoplot(AMZN_asset) +
  xlab("Daily") + 
  ylab("Total Assets (dollars)") +
  ggtitle("AMZN Total Assets (Feb 2013 - Feb 2018)")

p2 <- ggAcf(AMZN_asset)

p3 <- autoplot(AMZN_avg) + 
  xlab("Daily") + 
  ylab("Average Stock Price (dollars)") +
  ggtitle("AMZN Stock Price (Feb 2013 - Feb 2018)")

p4 <- ggAcf(AMZN_avg)

p5 <- autoplot(AMZN_volume) + 
  xlab("Daily") + 
  ylab("Stock Volume") +
  ggtitle("AMZN Stock Volume (Feb 2013 - Feb 2018)")

p6 <- ggAcf(AMZN_volume)

grid.arrange(p1, p2, p3, p4, p5, p6, nrow = 3)

1. Total Assets: According to the time plot, the total assests of Amazon has large variation over time, with a slightly increasing trend. The correlogram verifies the increasing trend. Moreover, there is a seasaonal or cyclic pattern in this time series.

Stock Price: According to the time plot, there is a dramatically increasing trend over time. The variation seems constant over time. The correlogram verifies the existance of a trend. There might be seasonality in the time series, but it is not very significant.
Stock Volume: In the time plot, the stock volumnes have significant changes over time, but there is no clear trend. According to the correlogram, it is interesting to notice that the volumes show constant seasonality, whose frequency is about a quarter.

Based on the observations above, we can conclude that the increasing trend of total assets are attributed to the dramatic increase of stock price; meanwhile, the seasonal pattern shown in the total assets mainly come from the stock volumes.

decomp <- stl(AMZN_avg, t.window = 15, s.window = "periodic", robust = TRUE)

autoplot(decomp) + ggtitle("STL Decomposition of Amazon Stock Price")

According to the decomposition results, the trend component is increasing over time, which is consistant with our observation in the time plot. In terms of the seasonal component, even though the decomposition result shows the sesonal component, the scale of the seasonal component is relatively small that will not impact the stock price significantly. Moreover, when checking the remainders, we notice that at the beginning of each year, the remaindars have large variance, resulting from the seasonal decomposition. Therefore, when making forecast, we will focus on the trend component.

Short-term Forecast


#Short term forecasting (3 months)
fcast_short1 <- holt(AMZN_avg, h = 65)
fcast_short2 <- holt(AMZN_avg, damped = TRUE, phi = 0.98, h = 65)
fcast_short3 <- rwf(AMZN_avg, drift = TRUE, h = 65)
fcast_short4 <- stlf(AMZN_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 65)
fcast_short5 <- forecast(tslm(AMZN_avg~trend), h = 65)
fcast_short6 <- forecast(tslm(AMZN_avg~trend, lambda = 0), h = 65)

autoplot(AMZN_avg) + 
  autolayer(fcast_short1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_short2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_short3, PI = FALSE, series = "Drift") +
  autolayer(fcast_short4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_short5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_short6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("AMZN Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Short-term Forecast"))


RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_short1)[2], accuracy(fcast_short2)[2], accuracy(fcast_short3)[2],
               accuracy(fcast_short4)[2], accuracy(fcast_short5)[2], accuracy(fcast_short6)[2]))
  

print(RMSE_comp)
##                   Method      RMSE
## 1           Holt's Trend  9.038417
## 2    Damped Holt's Trend  9.047351
## 3                  Drift  9.052496
## 4                    ETS 12.754332
## 5      Linear Regression 97.109638
## 6 Exponential Regression 63.635949

Becasue the volatility of stock market, we assume that short-term forecast will heavily depend on the most recent prices. In other words, we are more aggressive for short-term forecasting. According to the comparative analysis, forecasts using the damped Holt’s trend method best fit our assumption and the time series. Furthermore, the RMSE of this method is smaller than most of the methods, except the Holt’s method. The Holt’s method is too optimistic for stock forecasting.

checkresiduals(fcast_short2)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt's method
## Q* = 342.8, df = 246.8, p-value = 5.007e-05
## 
## Model df: 5.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(AMZN_avg,1))

#Stock price three months later
price_3month <- as.numeric(tail(fcast_short2$mean, 1))

writeLines("ROI of short-term investment on AMZN is:")

## ROI of short-term investment on AMZN is:

ROI_calculation(current, price_3month, 100000)

## [1] "9.25%"

According to the Ljung-Box test, the damped Holt’s trend model does have lack of fit. The residual plots show that the variation of residuals increase over time.This is attributed to the dramatically changes in the time series. Since this model gives the best forecasting results, we can accept this minor issue with the residuals. Moreover, the distribution of residual follows a normal distribution with zero-mean.Therefore, we select the damped Holt’s trend model for short-term forecasting. The return of investment rate is 9.25%.

Long-term Forecast

#Long term forecasting (3 years)
fcast_long1 <- holt(AMZN_avg, h = 253*3)
fcast_long2 <- holt(AMZN_avg, damped = TRUE, h = 253*3)
fcast_long3 <- rwf(AMZN_avg, drift = TRUE, h = 253*3)
fcast_long4 <- stlf(AMZN_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 253*3)
fcast_long5 <- forecast(tslm(AMZN_avg~trend), h = 253*3)
fcast_long6 <- forecast(tslm(AMZN_avg~trend, lambda = 0), h = 253*3)

autoplot(AMZN_avg) + 
  autolayer(fcast_long1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_long2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_long3, PI = FALSE, series = "Drift") +
  autolayer(fcast_long4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_long5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_long6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("AMZN Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Long-term Forecast"))

RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_long1)[2], accuracy(fcast_long2)[2], accuracy(fcast_long3)[2],
               accuracy(fcast_long4)[2], accuracy(fcast_long5)[2], accuracy(fcast_long6)[2]))
  

print(RMSE_comp)

##                   Method      RMSE
## 1           Holt's Trend  9.038417
## 2    Damped Holt's Trend  9.050041
## 3                  Drift  9.052496
## 4                    ETS 12.754332
## 5      Linear Regression 97.109638
## 6 Exponential Regression 63.635949

When selecting models for long-term investment, we will emphasis the long-term performance of such company or industry. Thus, we will consider the trend of stock price in a larger range. According to the comparative analysis, forecasts using the drift method best fit our assumption and the time series. Furthermore, the RMSE of this method is smaller than most of the methods, except the Holt’s method and the damped Holt’s method. These two methods are too optimistic for long-term forecasting.

checkresiduals(fcast_long3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 340.43, df = 250.8, p-value = 0.0001395
## 
## Model df: 1.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(AMZN_avg,1))

#Stock price three years later
price_3year<- as.numeric(tail(fcast_long3$mean, 1))

writeLines("ROI of long-term investment on AMZN is:")

## ROI of long-term investment on AMZN is:

ROI_calculation(current, price_3year, 100000)

## [1] "49.31%"

According to the Ljung-Box test, the drift model does have lack of fit. The residual plots show that the variation of residuals increase over time.This is attributed to the dramatically changes in the time series. Since this model gives the best forecasting results, we can accept this minor issue with the residuals. Moreover, the distribution of residual follows a normal distribution with zero-mean.Therefore, we select the drift model for long-term forecasting. The return of investment rate is 49.31%.

3.2 Apple

#Convert data to time series
AAPL_avg <- ts(df_10[df_10$Name == "AAPL", "avg"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

AAPL_asset <- ts(df_10[df_10$Name == "AAPL", "asset"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

AAPL_volume <- ts(df_10[df_10$Name == "AAPL", "volume"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

p1 <- autoplot(AAPL_asset) +
  xlab("Daily") + 
  ylab("Total Assets (dollars)") +
  ggtitle("AAPL Total Assets (Feb 2013 - Feb 2018)")

p2 <- ggAcf(AAPL_asset)

p3 <- autoplot(AAPL_avg) + 
  xlab("Daily") + 
  ylab("Average Stock Price (dollars)") +
  ggtitle("AAPL Stock Price (Feb 2013 - Feb 2018)")

p4 <- ggAcf(AAPL_avg)

p5 <- autoplot(AAPL_volume) + 
  xlab("Daily") + 
  ylab("Stock Volume") +
  ggtitle("AAPL Stock Volume (Feb 2013 - Feb 2018)")

p6 <- ggAcf(AAPL_volume)

grid.arrange(p1, p2, p3, p4, p5, p6, nrow = 3)

1. Total Assets: According to the time plot, the total assests of Apple has large variation over time, with a slightly decreasing trend. The correlogram verifies the decreasing trend. Moreover, there is cyclic pattern observed in this time series.

Stock Price: According to the time plot, there is an overall increasing trend over time. The variation seems constant over time. The correlogram verifies the existance of a trend.
Stock Volume: In the time plot, the stock volumnes have significant changes over time, but there is a decreasing trend. According to the correlogram, the stock volume has some cyclic patterns.

Based on the observations above, we can conclude that the slightly decreasing trend of total assets are attributed to the decrease of stock volume; meanwhile, the cyclic pattern shown in the total assets mainly come from the stock volumes.

decomp <- stl(AAPL_avg, t.window = 13, s.window = "periodic", robust = TRUE)

autoplot(decomp) + ggtitle("STL Decomposition of Apple Stock Price")

According to the decomposition results, the trend component is increasing over time, which is consistant with our observation in the time plot. In terms of the seasonal component, even though the decomposition result shows the sesonal component, the scale of the seasonal component is relatively small that will not impact the stock price significantly. Therefore, when making forecast, we will focus on the trend component.

Short-term Forecast


#Short term forecasting (3 months)
fcast_short1 <- holt(AAPL_avg, h = 65)
fcast_short2 <- holt(AAPL_avg, damped = TRUE, phi = 0.98, h = 65)
fcast_short3 <- rwf(AAPL_avg, drift = TRUE, h = 65)
fcast_short4 <- stlf(AAPL_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 65)
fcast_short5 <- forecast(tslm(AAPL_avg~trend), h = 65)
fcast_short6 <- forecast(tslm(AAPL_avg~trend, lambda = 0), h = 65)

autoplot(AAPL_avg) + 
  autolayer(fcast_short1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_short2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_short3, PI = FALSE, series = "Drift") +
  autolayer(fcast_short4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_short5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_short6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("AAPL Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Short-term Forecast"))


RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_short1)[2], accuracy(fcast_short2)[2], accuracy(fcast_short3)[2],
               accuracy(fcast_short4)[2], accuracy(fcast_short5)[2], accuracy(fcast_short6)[2]))
  

print(RMSE_comp)
##                   Method      RMSE
## 1           Holt's Trend  1.340898
## 2    Damped Holt's Trend  1.341282
## 3                  Drift  1.341328
## 4                    ETS  1.430296
## 5      Linear Regression 14.467210
## 6 Exponential Regression 14.449197

checkresiduals(fcast_short1)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 331.49, df = 247.8, p-value = 0.0002985
## 
## Model df: 4.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(AAPL_avg,1))

#Stock price three months later
price_3month <- as.numeric(tail(fcast_short1$mean, 1))

writeLines("ROI of short-term investment on AAPL is:")

## ROI of short-term investment on AAPL is:

ROI_calculation(current, price_3month, 100000)

## [1] "3.17%"

According to the Ljung-Box test, the Holt’s trend model does have lack of fit. However, the residual plots show that the distribution of residual follows a normal distribution with zero-mean, and the variation are constant over time. This meets the requirements of a good forecasting model. Therefore, we select the Holt’s trend model for short-term forecasting. The return of investment rate is 3.17%.

Long-term Forecast

#Long term forecasting (3 years)
fcast_long1 <- holt(AAPL_avg, h = 253*3)
fcast_long2 <- holt(AAPL_avg, damped = TRUE, h = 253*3)
fcast_long3 <- rwf(AAPL_avg, drift = TRUE, h = 253*3)
fcast_long4 <- stlf(AAPL_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 253*3)
fcast_long5 <- forecast(tslm(AAPL_avg~trend), h = 253*3)
fcast_long6 <- forecast(tslm(AAPL_avg~trend, lambda = 0), h = 253*3)

autoplot(AAPL_avg) + 
  autolayer(fcast_long1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_long2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_long3, PI = FALSE, series = "Drift") +
  autolayer(fcast_long4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_long5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_long6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("AAPL Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Long-term Forecast"))

RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_long1)[2], accuracy(fcast_long2)[2], accuracy(fcast_long3)[2],
               accuracy(fcast_long4)[2], accuracy(fcast_long5)[2], accuracy(fcast_long6)[2]))
  

print(RMSE_comp)

##                   Method      RMSE
## 1           Holt's Trend  1.340898
## 2    Damped Holt's Trend  1.334498
## 3                  Drift  1.341328
## 4                    ETS  1.430296
## 5      Linear Regression 14.467210
## 6 Exponential Regression 14.449197

checkresiduals(fcast_long1)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 331.49, df = 247.8, p-value = 0.0002985
## 
## Model df: 4.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(AAPL_avg,1))

#Stock price three years later
price_3year<- as.numeric(tail(fcast_long1$mean, 1))

writeLines("ROI of long-term investment on APPLE is:")

## ROI of long-term investment on APPLE is:

ROI_calculation(current, price_3year, 100000)

## [1] "37.06%"

3.3 Facebook

#Convert data to time series
FB_avg <- ts(df_10[df_10$Name == "FB", "avg"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

FB_asset <- ts(df_10[df_10$Name == "FB", "asset"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

FB_volume <- ts(df_10[df_10$Name == "FB", "volume"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

p1 <- autoplot(FB_asset) +
  xlab("Daily") + 
  ylab("Total Assets (dollars)") +
  ggtitle("FB Total Assets (Feb 2013 - Feb 2018)")

p2 <- ggAcf(FB_asset)

p3 <- autoplot(FB_avg) + 
  xlab("Daily") + 
  ylab("Average Stock Price (dollars)") +
  ggtitle("FB Stock Price (Feb 2013 - Feb 2018)")

p4 <- ggAcf(FB_avg)

p5 <- autoplot(FB_volume) + 
  xlab("Daily") + 
  ylab("Stock Volume") +
  ggtitle("FB Stock Volume (Feb 2013 - Feb 2018)")

p6 <- ggAcf(FB_volume)

grid.arrange(p1, p2, p3, p4, p5, p6, nrow = 3)

1. Total Assets: According to the time plot, the total assests of Facebook has large variation over time. There is no trend or seasonality observed.

Stock Price: According to the time plot, there is a dramatically increasing trend over time, but no seasonality. The correlogram verifies the existance of a trend.
Stock Volume: In the time plot, the stock volumnes have significant changes over time. The volumes decrease since 2014.

autoplot(stl(FB_avg, t.window = 13, s.window = "periodic", robust = TRUE)) + ggtitle("STL Decomposition of Facebook Stock Price")

Short-term Forecast


#Short term forecasting (3 months)
fcast_short1 <- holt(FB_avg, h = 65)
fcast_short2 <- holt(FB_avg, damped = TRUE, phi = 0.98, h = 65)
fcast_short3 <- rwf(FB_avg, drift = TRUE, h = 65)
fcast_short4 <- stlf(FB_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 65)
fcast_short5 <- forecast(tslm(FB_avg~trend), h = 65)
fcast_short6 <- forecast(tslm(FB_avg~trend, lambda = 0), h = 65)

autoplot(FB_avg) + 
  autolayer(fcast_short1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_short2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_short3, PI = FALSE, series = "Drift") +
  autolayer(fcast_short4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_short5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_short6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("FB Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Short-term Forecast"))


RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_short1)[2], accuracy(fcast_short2)[2], accuracy(fcast_short3)[2],
               accuracy(fcast_short4)[2], accuracy(fcast_short5)[2], accuracy(fcast_short6)[2]))
  

print(RMSE_comp)
##                   Method      RMSE
## 1           Holt's Trend  1.366646
## 2    Damped Holt's Trend  1.371608
## 3                  Drift  1.367055
## 4                    ETS  1.599945
## 5      Linear Regression  7.895288
## 6 Exponential Regression 10.045526

Becasue the volatility of stock market, we assume that short-term forecast will heavily depend on the most recent prices. In other words, we are more aggressive for short-term forecasting. According to the comparative analysis, forecasts using the drift method or Holt’s trend method (overlapping) best fit our assumption and the time series. Furthermore, the RMSE’s of these methods are smaller than most of the methods, except the ETS method. The ETS has underestimated the stock price too much.

checkresiduals(fcast_short3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 320.01, df = 250.8, p-value = 0.002028
## 
## Model df: 1.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(FB_avg,1))

#Stock price three months later
price_3month <- as.numeric(tail(fcast_short3$mean, 1))

writeLines("ROI of short-term investment on FB is:")

## ROI of short-term investment on FB is:

ROI_calculation(current, price_3month, 100000)

## [1] "4.35%"

According to the Ljung-Box test, the drift model does have lack of fit. However, the residual plots show that the distribution of residual follows a normal distribution with zero-mean, and the variation are constant over time. This meets the requirements of a good forecasting model. Therefore, we select the drift model for short-term forecasting. The return of investment rate is 4.35%.

Long-term Forecast

#Long term forecasting (3 years)
fcast_long1 <- holt(FB_avg, h = 253*3)
fcast_long2 <- holt(FB_avg, damped = TRUE, h = 253*3)
fcast_long3 <- rwf(FB_avg, drift = TRUE, h = 253*3)
fcast_long4 <- stlf(FB_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 253*3)
fcast_long5 <- forecast(tslm(FB_avg~trend), h = 253*3)
fcast_long6 <- forecast(tslm(FB_avg~trend, lambda = 0), h = 253*3)

autoplot(FB_avg) + 
  autolayer(fcast_long1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_long2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_long3, PI = FALSE, series = "Drift") +
  autolayer(fcast_long4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_long5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_long6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("FB Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Long-term Forecast"))

RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_long1)[2], accuracy(fcast_long2)[2], accuracy(fcast_long3)[2],
               accuracy(fcast_long4)[2], accuracy(fcast_long5)[2], accuracy(fcast_long6)[2]))
  

print(RMSE_comp)

##                   Method      RMSE
## 1           Holt's Trend  1.366646
## 2    Damped Holt's Trend  1.371211
## 3                  Drift  1.367055
## 4                    ETS  1.599945
## 5      Linear Regression  7.895288
## 6 Exponential Regression 10.045526

When selecting models for long-term investment, we will emphasis the long-term performance of such company or industry. Thus, we will consider the trend of stock price in a larger range. According to the comparative analysis, forecasts using the drift method or the Holt’s trend method (overlapping) best fit our assumption and the time series. Furthermore, the RMSE of these two methods are the smallest among all the methods.

checkresiduals(fcast_long3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 320.01, df = 250.8, p-value = 0.002028
## 
## Model df: 1.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(FB_avg,1))

#Stock price three years later
price_3year<- as.numeric(tail(fcast_long3$mean, 1))

writeLines("ROI of long-term investment on FB is:")

## ROI of long-term investment on FB is:

ROI_calculation(current, price_3year, 100000)

## [1] "50.81%"

According to the Ljung-Box test, the drift model does have lack of fit. However, the residual plots show that the distribution of residual follows a normal distribution with zero-mean, and the variation are constant over time. This meets the requirements of a good forecasting model. Therefore, we select the drift model for long-term forecasting. The return of investment rate is 50.81%.

3.4 NVIDIA

#Convert data to time series
NVDA_avg <- ts(df_10[df_10$Name == "NVDA", "avg"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

NVDA_asset <- ts(df_10[df_10$Name == "NVDA", "asset"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

NVDA_volume <- ts(df_10[df_10$Name == "NVDA", "volume"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

p1 <- autoplot(NVDA_asset) +
  xlab("Daily") + 
  ylab("Total Assets (dollars)") +
  ggtitle("NVDA Total Assets (Feb 2013 - Feb 2018)")

p2 <- ggAcf(NVDA_asset)

p3 <- autoplot(NVDA_avg) + 
  xlab("Daily") + 
  ylab("Average Stock Price (dollars)") +
  ggtitle("NVDA Stock Price (Feb 2013 - Feb 2018)")

p4 <- ggAcf(NVDA_avg)

p5 <- autoplot(NVDA_volume) + 
  xlab("Daily") + 
  ylab("Stock Volume") +
  ggtitle("NVDA Stock Volume (Feb 2013 - Feb 2018)")

p6 <- ggAcf(NVDA_volume)

grid.arrange(p1, p2, p3, p4, p5, p6, nrow = 3)

1. Total Assets: According to the time plot, the total assests of Amazon has an increasing variation as trend increases since 2017.Moreover, there is cyclic pattern in this time series.

Stock Price: According to the time plot, there is a dramatically increasing trend over time, especially after 2017. The correlogram verifies the existance of a trend. No seasonality is observed.
Stock Volume: In the time plot, the stock volumnes have significant changes over time, but there is no clear trend. According to the correlogram, it is interesting to notice that the volumes show some cyclic patterns over time.

Based on the observations above, we can conclude that the increasing trend of total assets are attributed to the dramatic increase of stock price; meanwhile, the cyclic pattern shown in the total assets mainly come from the stock volumes.

autoplot(stl(NVDA_avg, t.window = 13, s.window = "periodic", robust = TRUE)) + ggtitle("Classical Multiplicative Decomposition of NVIDIA Stock Price")

According to the decomposition results, the trend component is increasing over time, which is consistant with our observation in the time plot. In terms of the seasonal component, even though the decomposition result shows the sesonal component, the scale of the seasonal component is relatively small that will not impact the stock price significantly. Moreover, when checking the remainders, we notice that since the 2017, the remainders varies significantly. This observation also means inconsistant patterns in the original data. Therefore, when making forecast, we will focus on the trend component.

Short-term Forecast


#Short term forecasting (3 months)
fcast_short1 <- holt(NVDA_avg, h = 65)
fcast_short2 <- holt(NVDA_avg, damped = TRUE, phi = 0.98, h = 65)
fcast_short3 <- rwf(NVDA_avg, drift = TRUE, h = 65)
fcast_short4 <- stlf(NVDA_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 65)
fcast_short5 <- forecast(tslm(NVDA_avg~trend), h = 65)
fcast_short6 <- forecast(tslm(NVDA_avg~trend, lambda = 0), h = 65)

autoplot(NVDA_avg) + 
  autolayer(fcast_short1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_short2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_short3, PI = FALSE, series = "Drift") +
  autolayer(fcast_short4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_short5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_short6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("NVDA Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Short-term Forecast"))


RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_short1)[2], accuracy(fcast_short2)[2], accuracy(fcast_short3)[2],
               accuracy(fcast_short4)[2], accuracy(fcast_short5)[2], accuracy(fcast_short6)[2]))
  

print(RMSE_comp)
##                   Method      RMSE
## 1           Holt's Trend  1.733307
## 2    Damped Holt's Trend  1.739568
## 3                  Drift  1.738368
## 4                    ETS  2.144262
## 5      Linear Regression 33.046401
## 6 Exponential Regression 24.199647

Becasue the volatility of stock market, we assume that short-term forecast will heavily depend on the most recent prices. In other words, we are more aggressive for short-term forecasting. According to the comparative analysis, forecasts using the Holt’s trend method best fit our assumption and the time series. Furthermore, the RMSE of this method is the smallest among other methods.

checkresiduals(fcast_short1)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 635.61, df = 247.8, p-value < 2.2e-16
## 
## Model df: 4.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(NVDA_avg,1))

#Stock price three months later
price_3month <- as.numeric(tail(fcast_short1$mean, 1))

writeLines("ROI of short-term investment on NVDA is:")

## ROI of short-term investment on NVDA is:

ROI_calculation(current, price_3month, 100000)

## [1] "12.61%"

According to the Ljung-Box test, the Holt’s trend model does have lack of fit. The residual plots show that the variation of residuals increase over time.This is attributed to the dramatically changes in the original data. Since this model gives the best forecasting results, we can accept this minor issue with the residuals. Moreover, the distribution of residual follows a normal distribution with zero-mean.Therefore, we select the damped Holt’s trend model for short-term forecasting. The return of investment rate is 12.61%.

Long-term Forecast

#Long term forecasting (3 years)
fcast_long1 <- holt(NVDA_avg, h = 253*3)
fcast_long2 <- holt(NVDA_avg, damped = TRUE, h = 253*3)
fcast_long3 <- rwf(NVDA_avg, drift = TRUE, h = 253*3)
fcast_long4 <- stlf(NVDA_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 253*3)
fcast_long5 <- forecast(tslm(NVDA_avg~trend), h = 253*3)
fcast_long6 <- forecast(tslm(NVDA_avg~trend, lambda = 0), h = 253*3)

autoplot(NVDA_avg) + 
  autolayer(fcast_long1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_long2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_long3, PI = FALSE, series = "Drift") +
  autolayer(fcast_long4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_long5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_long6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("NVDA Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Long-term Forecast"))

RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_long1)[2], accuracy(fcast_long2)[2], accuracy(fcast_long3)[2],
               accuracy(fcast_long4)[2], accuracy(fcast_long5)[2], accuracy(fcast_long6)[2]))
  

print(RMSE_comp)

##                   Method      RMSE
## 1           Holt's Trend  1.733307
## 2    Damped Holt's Trend  1.731359
## 3                  Drift  1.738368
## 4                    ETS  2.144262
## 5      Linear Regression 33.046401
## 6 Exponential Regression 24.199647

When selecting models for long-term investment, we will emphasis the long-term performance of such company or industry. Thus, we will consider the trend of stock price in a larger range. According to the comparative analysis, forecasts using the drift method best fit our assumption and the time series. Furthermore, the RMSE of this method is the smaller than most of methods, except the Holt’s trend method and damped Holt’s trend method. However, the Holt’s trend method is too optimistic for long-term forecasting; while the constant forecasts of the damped Holt’s trend does not make sense.

checkresiduals(fcast_long3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 635.88, df = 250.8, p-value < 2.2e-16
## 
## Model df: 1.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(NVDA_avg,1))

#Stock price three years later
price_3year<- as.numeric(tail(fcast_long3$mean, 1))

writeLines("ROI of long-term investment on NVDA is:")

## ROI of long-term investment on NVDA is:

ROI_calculation(current, price_3year, 100000)

## [1] "57.09%"

According to the Ljung-Box test, the Holt’s trend model does have lack of fit. The residual plots show that the variation of residuals increase over time.This is attributed to the dramatically changes in the original data. Since this model gives the best forecasting results, we can accept this minor issue with the residuals. Moreover, the distribution of residual follows a normal distribution with zero-mean.Therefore, we select the drift model for long-term forecasting. The return of investment rate is 57.09%.

3.5 Microsoft

#Convert data to time series
MSFT_avg <- ts(df_10[df_10$Name == "MSFT", "avg"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

MSFT_asset <- ts(df_10[df_10$Name == "MSFT", "asset"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

MSFT_volume <- ts(df_10[df_10$Name == "MSFT", "volume"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

p1 <- autoplot(MSFT_asset) +
  xlab("Daily") + 
  ylab("Total Assets (dollars)") +
  ggtitle("MSFT Total Assets (Feb 2013 - Feb 2018)")

p2 <- ggAcf(MSFT_asset)

p3 <- autoplot(MSFT_avg) + 
  xlab("Daily") + 
  ylab("Average Stock Price (dollars)") +
  ggtitle("MSFT Stock Price (Feb 2013 - Feb 2018)")

p4 <- ggAcf(MSFT_avg)

p5 <- autoplot(MSFT_volume) + 
  xlab("Daily") + 
  ylab("Stock Volume") +
  ggtitle("MSFT Stock Volume (Feb 2013 - Feb 2018)")

p6 <- ggAcf(MSFT_volume)

grid.arrange(p1, p2, p3, p4, p5, p6, nrow = 3)

1. Total Assets: According to the time plot, the total assests of Microsoft has large variation over time, without a clear trend. The correlogram shows there is seasonal or cyclic pattern in this time series.

Stock Price: According to the time plot, there is an overall increasing trend over time. The variation seems constant over time. The correlogram verifies the existance of a trend.
Stock Volume: In the time plot, the stock volumnes have significant changes over time, but there is a decreasing trend. According to the correlogram, the stock volume has some cyclic patterns.

Based on the observations above, we can conclude that the cyclic pattern shown in the total assets mainly come from the stock volumes.

autoplot(stl(MSFT_avg, t.window = 13, s.window = "periodic", robust = TRUE)) + ggtitle("STL Decomposition of Microsoft Stock Price")

According to the decomposition results, the trend component is increasing over time, which is consistant with our observation in the time plot. In terms of the seasonal component, even though the decomposition result shows the sesonal component, the scale of the seasonal component is relatively small that will not impact the stock price significantly. Therefore, when making forecast, we will focus on the trend component.

Short-term Forecast


#Short term forecasting (3 months)
fcast_short1 <- holt(MSFT_avg, h = 65)
fcast_short2 <- holt(MSFT_avg, damped = TRUE, phi = 0.98, h = 65)
fcast_short3 <- rwf(MSFT_avg, drift = TRUE, h = 65)
fcast_short4 <- stlf(MSFT_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 65)
fcast_short5 <- forecast(tslm(MSFT_avg~trend), h = 65)
fcast_short6 <- forecast(tslm(MSFT_avg~trend, lambda = 0), h = 65)

autoplot(MSFT_avg) + 
  autolayer(fcast_short1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_short2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_short3, PI = FALSE, series = "Drift") +
  autolayer(fcast_short4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_short5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_short6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("MSFT Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Short-term Forecast"))


RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_short1)[2], accuracy(fcast_short2)[2], accuracy(fcast_short3)[2],
               accuracy(fcast_short4)[2], accuracy(fcast_short5)[2], accuracy(fcast_short6)[2]))
  

print(RMSE_comp)
##                   Method      RMSE
## 1           Holt's Trend 0.5877113
## 2    Damped Holt's Trend 0.5894088
## 3                  Drift 0.5874820
## 4                    ETS 0.6542756
## 5      Linear Regression 4.6765349
## 6 Exponential Regression 3.7138033

Becasue the volatility of stock market, we assume that short-term forecast will heavily depend on the most recent prices. In other words, we are more aggressive for short-term forecasting. According to the comparative analysis, forecasts using the drift method best fit our assumption and the time series. Furthermore, the RMSE of this method is the smallest among all the methods.

checkresiduals(fcast_short3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 340.4, df = 250.8, p-value = 0.0001401
## 
## Model df: 1.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(MSFT_avg,1))

#Stock price three months later
price_3month <- as.numeric(tail(fcast_short3$mean, 1))

writeLines("ROI of short-term investment on MSFT is:")

## ROI of short-term investment on MSFT is:

ROI_calculation(current, price_3month, 100000)

## [1] "3.59%"

According to the Ljung-Box test, the Holt’s trend model does have lack of fit. However, the residual plots show that the distribution of residual follows a normal distribution with zero-mean, and the variation are constant over time. This meets the requirements of a good forecasting model. Therefore, we select the drift model for short-term forecasting. The return of investment rate is 3.59%.

Long-term Forecast

#Long term forecasting (3 years)
fcast_long1 <- holt(MSFT_avg, h = 253*3)
fcast_long2 <- holt(MSFT_avg, damped = TRUE, h = 253*3)
fcast_long3 <- rwf(MSFT_avg, drift = TRUE, h = 253*3)
fcast_long4 <- stlf(MSFT_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 253*3)
fcast_long5 <- forecast(tslm(MSFT_avg~trend), h = 253*3)
fcast_long6 <- forecast(tslm(MSFT_avg~trend, lambda = 0), h = 253*3)

autoplot(MSFT_avg) + 
  autolayer(fcast_long1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_long2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_long3, PI = FALSE, series = "Drift") +
  autolayer(fcast_long4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_long5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_long6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("MSFT Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Long-term Forecast"))

RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_long1)[2], accuracy(fcast_long2)[2], accuracy(fcast_long3)[2],
               accuracy(fcast_long4)[2], accuracy(fcast_long5)[2], accuracy(fcast_long6)[2]))
  

print(RMSE_comp)

##                   Method      RMSE
## 1           Holt's Trend 0.5877113
## 2    Damped Holt's Trend 0.5878188
## 3                  Drift 0.5874820
## 4                    ETS 0.6542756
## 5      Linear Regression 4.6765349
## 6 Exponential Regression 3.7138033

When selecting models for long-term investment, we will emphasis the long-term performance of such company or industry. Thus, we will consider the trend of stock price in a larger range. According to the comparative analysis, forecasts using the drift method or the Holt’s trend method (overlapping) best fit our assumption and the time series. Furthermore, the RMSE of these two methods are the smallest among all the methods.

checkresiduals(fcast_long3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 340.4, df = 250.8, p-value = 0.0001401
## 
## Model df: 1.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(MSFT_avg,1))

#Stock price three years later
price_3year<- as.numeric(tail(fcast_long3$mean, 1))

writeLines("ROI of long-term investment on MSFT is:")

## ROI of long-term investment on MSFT is:

ROI_calculation(current, price_3year, 100000)

## [1] "41.97%"

According to the Ljung-Box test, the drift model does have lack of fit. However, the residual plots show that the distribution of residual follows a normal distribution with zero-mean, and the variation are constant over time. This meets the requirements of a good forecasting model. Therefore, we select the drift model for long-term forecasting. The return of investment rate is 41.97%.

3.6 Bank of America

#Convert data to time series
BAC_avg <- ts(df_10[df_10$Name == "BAC", "avg"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

BAC_asset <- ts(df_10[df_10$Name == "BAC", "asset"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

BAC_volume <- ts(df_10[df_10$Name == "BAC", "volume"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

p1 <- autoplot(BAC_asset) +
  xlab("Daily") + 
  ylab("Total Assets (dollars)") +
  ggtitle("BAC Total Assets (Feb 2013 - Feb 2018)")

p2 <- ggAcf(BAC_asset)

p3 <- autoplot(BAC_avg) + 
  xlab("Daily") + 
  ylab("Average Stock Price (dollars)") +
  ggtitle("BAC Stock Price (Feb 2013 - Feb 2018)")

p4 <- ggAcf(BAC_avg)

p5 <- autoplot(BAC_volume) + 
  xlab("Daily") + 
  ylab("Stock Volume") +
  ggtitle("BAC Stock Volume (Feb 2013 - Feb 2018)")

p6 <- ggAcf(BAC_volume)

grid.arrange(p1, p2, p3, p4, p5, p6, nrow = 3)

1. Total Assets: According to the time plot, the total assests of Bank of America has large variation over time, without a clear trend. The correlogram shows the existance of sesonal or cyclic pattern.

Stock Price: According to the time plot, there is an overall increasing trend over time. The variation seems constant over time. The correlogram verifies the existance of a trend.
Stock Volume: In the time plot, the stock volumnes have significant changes over time, but there is a decreasing trend. According to the correlogram, the stock volume has some cyclic patterns.

decomp <- stl(BAC_avg, t.window = 13, s.window = "periodic", robust = TRUE)

autoplot(decomp) + ggtitle("STL Decomposition of Bank of America Stock Price")

According to the decomposition results, the trend component is increasing over time, which is consistant with our observation in the time plot. In terms of the seasonal component, even though the decomposition result shows the sesonal component, the scale of the seasonal component is relatively small that will not impact the stock price significantly. Therefore, when making forecast, we will focus on the trend component.

Short-term Forecast


#Short term forecasting (3 months)
fcast_short1 <- holt(BAC_avg, h = 65)
fcast_short2 <- holt(BAC_avg, damped = TRUE, phi = 0.98, h = 65)
fcast_short3 <- rwf(BAC_avg, drift = TRUE, h = 65)
fcast_short4 <- stlf(BAC_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 65)
fcast_short5 <- forecast(tslm(BAC_avg~trend), h = 65)
fcast_short6 <- forecast(tslm(BAC_avg~trend, lambda = 0), h = 65)

autoplot(BAC_avg) + 
  autolayer(fcast_short1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_short2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_short3, PI = FALSE, series = "Drift") +
  autolayer(fcast_short4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_short5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_short6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("BAC Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Short-term Forecast"))


RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_short1)[2], accuracy(fcast_short2)[2], accuracy(fcast_short3)[2],
               accuracy(fcast_short4)[2], accuracy(fcast_short5)[2], accuracy(fcast_short6)[2]))
  

print(RMSE_comp)
##                   Method      RMSE
## 1           Holt's Trend 0.2351029
## 2    Damped Holt's Trend 0.2385554
## 3                  Drift 0.2351336
## 4                    ETS 0.2558375
## 5      Linear Regression 2.9724854
## 6 Exponential Regression 2.8568694

Becasue the volatility of stock market, we assume that short-term forecast will heavily depend on the most recent prices. In other words, we are more aggressive for short-term forecasting. According to the comparative analysis, forecasts using the Holt’s trend method best fit our assumption and the time series. Furthermore, the RMSE of this method is the smallest.

checkresiduals(fcast_short1)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 340.92, df = 247.8, p-value = 7.86e-05
## 
## Model df: 4.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(BAC_avg,1))

#Stock price three months later
price_3month <- as.numeric(tail(fcast_short1$mean, 1))

writeLines("ROI of short-term investment on BAC is:")

## ROI of short-term investment on BAC is:

ROI_calculation(current, price_3month, 100000)

## [1] "6.51%"

Long-term Forecast

#Long term forecasting (3 years)
fcast_long1 <- holt(BAC_avg, h = 253*3)
fcast_long2 <- holt(BAC_avg, damped = TRUE, h = 253*3)
fcast_long3 <- rwf(BAC_avg, drift = TRUE, h = 253*3)
fcast_long4 <- stlf(BAC_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 253*3)
fcast_long5 <- forecast(tslm(BAC_avg~trend), h = 253*3)
fcast_long6 <- forecast(tslm(BAC_avg~trend, lambda = 0), h = 253*3)

autoplot(BAC_avg) + 
  autolayer(fcast_long1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_long2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_long3, PI = FALSE, series = "Drift") +
  autolayer(fcast_long4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_long5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_long6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("BAC Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Long-term Forecast"))

RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_long1)[2], accuracy(fcast_long2)[2], accuracy(fcast_long3)[2],
               accuracy(fcast_long4)[2], accuracy(fcast_long5)[2], accuracy(fcast_long6)[2]))
  

print(RMSE_comp)

##                   Method      RMSE
## 1           Holt's Trend 0.2351029
## 2    Damped Holt's Trend 0.2337044
## 3                  Drift 0.2351336
## 4                    ETS 0.2558375
## 5      Linear Regression 2.9724854
## 6 Exponential Regression 2.8568694

When selecting models for long-term investment, we will emphasis the long-term performance of such company or industry. Thus, we will consider the trend of stock price in a larger range. According to the comparative analysis, forecasts using the drift best fit our assumption and the time series. Moreover, the RMSE of this method is smaller than most of other methods, except the Holt’s method and the damped Holt’s method. However, the Holt’s method is too optimistic for long-term forecast; while the constant forecasts of damped Holt’s method do not make sense.

checkresiduals(fcast_long3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 338.99, df = 250.8, p-value = 0.0001708
## 
## Model df: 1.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(BAC_avg,1))

#Stock price three years later
price_3year<- as.numeric(tail(fcast_long3$mean, 1))

writeLines("ROI of long-term investment on BAC is:")

## ROI of long-term investment on BAC is:

ROI_calculation(current, price_3year, 100000)

## [1] "37.53%"

According to the Ljung-Box test, the Holt’s trend model does have lack of fit. However, the residual plots show that the distribution of residual follows a normal distribution with zero-mean, and the variation are constant over time. This meets the requirements of a good forecasting model. Therefore, we select the drift model for short-term forecasting. The return of investment rate is 37.53%.

3.7 Alphabet

#Convert data to time series
GOOGL_avg <- ts(df_10[df_10$Name == "GOOGL", "avg"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

GOOGL_asset <- ts(df_10[df_10$Name == "GOOGL", "asset"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

GOOGL_volume <- ts(df_10[df_10$Name == "GOOGL", "volume"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

p1 <- autoplot(GOOGL_asset) +
  xlab("Daily") + 
  ylab("Total Assets (dollars)") +
  ggtitle("GOOGL Total Assets (Feb 2013 - Feb 2018)")

p2 <- ggAcf(GOOGL_asset)

p3 <- autoplot(GOOGL_avg) + 
  xlab("Daily") + 
  ylab("Average Stock Price (dollars)") +
  ggtitle("GOOGL Stock Price (Feb 2013 - Feb 2018)")

p4 <- ggAcf(GOOGL_avg)

p5 <- autoplot(GOOGL_volume) + 
  xlab("Daily") + 
  ylab("Stock Volume") +
  ggtitle("GOOGL Stock Volume (Feb 2013 - Feb 2018)")

p6 <- ggAcf(GOOGL_volume)

grid.arrange(p1, p2, p3, p4, p5, p6, nrow = 3)

1. Total Assets: According to the time plot, the total assests of Alphabet has large variation over time, without a clear trend. The correlogram shows there is cyclic pattern in this time series.

Stock Price: According to the time plot, there is an overall increasing trend over time. The variation seems constant over time. The correlogram verifies the existance of a trend.
Stock Volume: In the time plot, the stock volumnes have significant changes over time, but there is a decreasing trend. According to the correlogram, the stock volume has some cyclic patterns.

Based on the observations above, we can conclude that the cyclic pattern shown in the total assets mainly come from the stock volumes.

autoplot(stl(GOOGL_avg, t.window = 13, s.window = "periodic", robust = TRUE)) + ggtitle("STL Decomposition of Alphabet Stock Price")

According to the decomposition results, the trend component is increasing over time, which is consistant with our observation in the time plot. In terms of the seasonal component, even though the decomposition result shows the sesonal component, the scale of the seasonal component is relatively small that will not impact the stock price significantly. Therefore, when making forecast, we will focus on the trend component.

Short-term Forecast


#Short term forecasting (3 months)
fcast_short1 <- holt(GOOGL_avg, h = 65)
fcast_short2 <- holt(GOOGL_avg, damped = TRUE, phi = 0.98, h = 65)
fcast_short3 <- rwf(GOOGL_avg, drift = TRUE, h = 65)
fcast_short4 <- stlf(GOOGL_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 65)
fcast_short5 <- forecast(tslm(GOOGL_avg~trend), h = 65)
fcast_short6 <- forecast(tslm(GOOGL_avg~trend, lambda = 0), h = 65)

autoplot(GOOGL_avg) + 
  autolayer(fcast_short1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_short2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_short3, PI = FALSE, series = "Drift") +
  autolayer(fcast_short4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_short5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_short6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("GOOGL Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Short-term Forecast"))


RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_short1)[2], accuracy(fcast_short2)[2], accuracy(fcast_short3)[2],
               accuracy(fcast_short4)[2], accuracy(fcast_short5)[2], accuracy(fcast_short6)[2]))
  

print(RMSE_comp)
##                   Method      RMSE
## 1           Holt's Trend  8.003362
## 2    Damped Holt's Trend  8.017877
## 3                  Drift  8.004661
## 4                    ETS  9.414580
## 5      Linear Regression 56.866105
## 6 Exponential Regression 47.529928

Becasue the volatility of stock market, we assume that short-term forecast will heavily depend on the most recent prices. In other words, we are more aggressive for short-term forecasting. According to the comparative analysis, forecasts using the drift method or Holt’s trend method (overlapping) best fit our assumption and the time series. Furthermore, the RMSE’s of these methods are smaller than the other methods.

checkresiduals(fcast_short3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 465.99, df = 250.8, p-value = 4.219e-15
## 
## Model df: 1.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(GOOGL_avg,1))

#Stock price three months later
price_3month <- as.numeric(tail(fcast_short3$mean, 1))

writeLines("ROI of short-term investment on GOOGL is:")

## ROI of short-term investment on GOOGL is:

ROI_calculation(current, price_3month, 100000)

## [1] "3.28%"

Long-term Forecast

#Long term forecasting (3 years)
fcast_long1 <- holt(GOOGL_avg, h = 253*3)
fcast_long2 <- holt(GOOGL_avg, damped = TRUE, h = 253*3)
fcast_long3 <- rwf(GOOGL_avg, drift = TRUE, h = 253*3)
fcast_long4 <- stlf(GOOGL_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 253*3)
fcast_long5 <- forecast(tslm(GOOGL_avg~trend), h = 253*3)
fcast_long6 <- forecast(tslm(GOOGL_avg~trend, lambda = 0), h = 253*3)

autoplot(GOOGL_avg) + 
  autolayer(fcast_long1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_long2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_long3, PI = FALSE, series = "Drift") +
  autolayer(fcast_long4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_long5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_long6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("GOOGL Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Long-term Forecast"))

RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_long1)[2], accuracy(fcast_long2)[2], accuracy(fcast_long3)[2],
               accuracy(fcast_long4)[2], accuracy(fcast_long5)[2], accuracy(fcast_long6)[2]))
  

print(RMSE_comp)

##                   Method      RMSE
## 1           Holt's Trend  8.003362
## 2    Damped Holt's Trend  7.936226
## 3                  Drift  8.004661
## 4                    ETS  9.414580
## 5      Linear Regression 56.866105
## 6 Exponential Regression 47.529928

checkresiduals(fcast_long1)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 466.99, df = 247.8, p-value = 1.221e-15
## 
## Model df: 4.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(GOOGL_avg,1))

#Stock price three years later
price_3year<- as.numeric(tail(fcast_long1$mean, 1))

writeLines("ROI of long-term investment on GOOGL is:")

## ROI of long-term investment on GOOGL is:

ROI_calculation(current, price_3year, 100000)

## [1] "39.77%"

According to the Ljung-Box test, the drift model does have lack of fit. However, the residual plots show that the distribution of residual follows a normal distribution with zero-mean, and the variation are constant over time. This meets the requirements of a good forecasting model. Therefore, we select the Holt’s trend model for long-term forecasting. The return of investment rate is 39.77%.

3.8 Wynn

#Convert data to time series
WYNN_avg <- ts(df_10[df_10$Name == "WYNN", "avg"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

WYNN_asset <- ts(df_10[df_10$Name == "WYNN", "asset"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

WYNN_volume <- ts(df_10[df_10$Name == "WYNN", "volume"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

p1 <- autoplot(WYNN_asset) +
  xlab("Daily") + 
  ylab("Total Assets (dollars)") +
  ggtitle("WYNN Total Assets (Feb 2013 - Feb 2018)")

p2 <- ggAcf(WYNN_asset)

p3 <- autoplot(WYNN_avg) + 
  xlab("Daily") + 
  ylab("Average Stock Price (dollars)") +
  ggtitle("WYNN Stock Price (Feb 2013 - Feb 2018)")

p4 <- ggAcf(WYNN_avg)

p5 <- autoplot(WYNN_volume) + 
  xlab("Daily") + 
  ylab("Stock Volume") +
  ggtitle("WYNN Stock Volume (Feb 2013 - Feb 2018)")

p6 <- ggAcf(WYNN_volume)

grid.arrange(p1, p2, p3, p4, p5, p6, nrow = 3)

1. Total Assets: According to the time plot, the total assests of Wynn has large variation over time, without a clear trend. No cyclic or seasonal pattern is observed.

Stock Price: According to the time plot, the price increases since 2016, but drop significantly at the beginning of 2018.
Stock Volume: In the time plot, the stock volumnes have significant changes over time.

decomp <- stl(WYNN_avg, t.window = 13, s.window = "periodic", robust = TRUE)

autoplot(decomp) + ggtitle("STL Decomposition of Wynn Resort Stock Price")

According to the decomposition results, the trend component is increasing since 2016, which is consistant with our observation in the time plot. In terms of the seasonal component, even though the decomposition result shows the sesonal component, the scale of the seasonal component is relatively small that will not impact the stock price significantly. Therefore, when making forecast, we will focus on the trend component.

Short-term Forecast


#Short term forecasting (3 months)
fcast_short1 <- holt(WYNN_avg, h = 65)
fcast_short2 <- holt(WYNN_avg, damped = TRUE, phi = 0.98, h = 65)
fcast_short3 <- rwf(WYNN_avg, drift = TRUE, h = 65)
fcast_short4 <- stlf(WYNN_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 65)
fcast_short5 <- forecast(tslm(WYNN_avg~trend), h = 65)
fcast_short6 <- forecast(tslm(WYNN_avg~trend, lambda = 0), h = 65)

autoplot(WYNN_avg) + 
  autolayer(fcast_short1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_short2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_short3, PI = FALSE, series = "Drift") +
  autolayer(fcast_short4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_short5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_short6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("WYNN Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Short-term Forecast"))


RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_short1)[2], accuracy(fcast_short2)[2], accuracy(fcast_short3)[2],
               accuracy(fcast_short4)[2], accuracy(fcast_short5)[2], accuracy(fcast_short6)[2]))
  

print(RMSE_comp)
##                   Method      RMSE
## 1           Holt's Trend  2.558230
## 2    Damped Holt's Trend  2.557376
## 3                  Drift  2.559897
## 4                    ETS  2.553492
## 5      Linear Regression 40.605924
## 6 Exponential Regression 40.896997

Becasue the volatility of stock market, we assume that short-term forecast will heavily depend on the most recent prices. In other words, we are more aggressive for short-term forecasting. According to the comparative analysis, forecasts using the ETS method best fit our assumption and the time series. Furthermore, the RMSE of this method is the smallest.

checkresiduals(fcast_short4)

## Warning in checkresiduals(fcast_short4): The fitted degrees of freedom is
## based on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(A,Ad,N)
## Q* = 311.98, df = 246.8, p-value = 0.003084
## 
## Model df: 5.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(WYNN_avg,1))

#Stock price three months later
price_3month <- as.numeric(tail(fcast_short4$mean, 1))

writeLines("ROI of short-term investment on WYNN is:")

## ROI of short-term investment on WYNN is:

ROI_calculation(current, price_3month, 100000)

## [1] "1.70%"

Long-term Forecast

#Long term forecasting (3 years)
fcast_long1 <- holt(WYNN_avg, h = 253*3)
fcast_long2 <- holt(WYNN_avg, damped = TRUE, h = 253*3)
fcast_long3 <- rwf(WYNN_avg, drift = TRUE, h = 253*3)
fcast_long4 <- stlf(WYNN_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 253*3)
fcast_long5 <- forecast(tslm(WYNN_avg~trend), h = 253*3)
fcast_long6 <- forecast(tslm(WYNN_avg~trend, lambda = 0), h = 253*3)

autoplot(WYNN_avg) + 
  autolayer(fcast_long1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_long2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_long3, PI = FALSE, series = "Drift") +
  autolayer(fcast_long4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_long5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_long6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("WYNN Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Long-term Forecast"))

RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_long1)[2], accuracy(fcast_long2)[2], accuracy(fcast_long3)[2],
               accuracy(fcast_long4)[2], accuracy(fcast_long5)[2], accuracy(fcast_long6)[2]))
  

print(RMSE_comp)

##                   Method      RMSE
## 1           Holt's Trend  2.558230
## 2    Damped Holt's Trend  2.541134
## 3                  Drift  2.559897
## 4                    ETS  2.553492
## 5      Linear Regression 40.605924
## 6 Exponential Regression 40.896997

checkresiduals(fcast_long3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 389.87, df = 250.8, p-value = 4.149e-08
## 
## Model df: 1.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(WYNN_avg,1))

#Stock price three years later
price_3year<- as.numeric(tail(fcast_long3$mean, 1))

writeLines("ROI of long-term investment on Wynn is:")

## ROI of long-term investment on Wynn is:

ROI_calculation(current, price_3year, 100000)

## [1] "17.56%"

According to the Ljung-Box test, the Holt’s trend model does have lack of fit. However, the residual plots show that the distribution of residual follows a normal distribution with zero-mean, and the variation are constant over time. This meets the requirements of a good forecasting model. Therefore, we select the drift model for long-term forecasting. The return of investment rate is 17.56%.

3.9 Boeing

#Convert data to time series
BA_avg <- ts(df_10[df_10$Name == "BA", "avg"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

BA_asset <- ts(df_10[df_10$Name == "BA", "asset"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

BA_volume <- ts(df_10[df_10$Name == "BA", "volume"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

p1 <- autoplot(BA_asset) +
  xlab("Daily") + 
  ylab("Total Assets (dollars)") +
  ggtitle("BA Total Assets (Feb 2013 - Feb 2018)")

p2 <- ggAcf(BA_asset)

p3 <- autoplot(BA_avg) + 
  xlab("Daily") + 
  ylab("Average Stock Price (dollars)") +
  ggtitle("BA Stock Price (Feb 2013 - Feb 2018)")

p4 <- ggAcf(BA_avg)

p5 <- autoplot(BA_volume) + 
  xlab("Daily") + 
  ylab("Stock Volume") +
  ggtitle("BA Stock Volume (Feb 2013 - Feb 2018)")

p6 <- ggAcf(BA_volume)

grid.arrange(p1, p2, p3, p4, p5, p6, nrow = 3)

1. Total Assets: According to the time plot, the total assests of Boeing has an increasing variation as trend increases since 2017.Moreover, there is cyclic pattern in this time series.

Stock Price: According to the time plot, there is a dramatically increasing trend over time, especially after 2017. The correlogram verifies the existance of a trend. No seasonality is observed.
Stock Volume: In the time plot, the stock volumnes have significant changes over time, but there is no clear trend. According to the correlogram, it is interesting to notice that the volumes show some cyclic patterns over time.

autoplot(stl(BA_avg, t.window = 13, s.window = "periodic", robust = TRUE)) + ggtitle("Classical Multiplicative Decomposition of Boeing Stock Price")

According to the decomposition results, the trend component is increasing over time, which is consistant with our observation in the time plot. In terms of the seasonal component, even though the decomposition result shows the sesonal component, the scale of the seasonal component is relatively small that will not impact the stock price significantly. Moreover, when checking the remainders, we notice that since the 2017, the remainders varies significantly. This observation also means inconsistant patterns in the original data. Therefore, when making forecast, we will focus on the trend component.

Short-term Forecast


#Short term forecasting (3 months)
fcast_short1 <- holt(BA_avg, h = 65)
fcast_short2 <- holt(BA_avg, damped = TRUE, phi = 0.98, h = 65)
fcast_short3 <- rwf(BA_avg, drift = TRUE, h = 65)
fcast_short4 <- stlf(BA_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 65)
fcast_short5 <- forecast(tslm(BA_avg~trend), h = 65)
fcast_short6 <- forecast(tslm(BA_avg~trend, lambda = 0), h = 65)

autoplot(BA_avg) + 
  autolayer(fcast_short1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_short2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_short3, PI = FALSE, series = "Drift") +
  autolayer(fcast_short4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_short5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_short6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("BA Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Short-term Forecast"))


RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_short1)[2], accuracy(fcast_short2)[2], accuracy(fcast_short3)[2],
               accuracy(fcast_short4)[2], accuracy(fcast_short5)[2], accuracy(fcast_short6)[2]))
  

print(RMSE_comp)
##                   Method      RMSE
## 1           Holt's Trend  1.859472
## 2    Damped Holt's Trend  1.865038
## 3                  Drift  1.871985
## 4                    ETS  2.498656
## 5      Linear Regression 31.713798
## 6 Exponential Regression 30.002004

Becasue the volatility of stock market, we assume that short-term forecast will heavily depend on the most recent prices. In other words, we are more aggressive for short-term forecasting. According to the comparative analysis, forecasts using the damped Holt’s trend method best fit our assumption and the time series. Furthermore, the RMSE of this method is smaller than most of the methods, except the Holt’s trend method. The Holt’s method is too optimistic for shor-term forecast.

checkresiduals(fcast_short2)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt's method
## Q* = 407.35, df = 246.8, p-value = 5.253e-10
## 
## Model df: 5.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(BA_avg,1))

#Stock price three months later
price_3month <- as.numeric(tail(fcast_short2$mean, 1))

writeLines("ROI of short-term investment on BA is:")

## ROI of short-term investment on BA is:

ROI_calculation(current, price_3month, 100000)

## [1] "8.11%"

Long-term Forecast

#Long term forecasting (3 years)
fcast_long1 <- holt(BA_avg, h = 253*3)
fcast_long2 <- holt(BA_avg, damped = TRUE, h = 253*3)
fcast_long3 <- rwf(BA_avg, drift = TRUE, h = 253*3)
fcast_long4 <- stlf(BA_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 253*3)
fcast_long5 <- forecast(tslm(BA_avg~trend), h = 253*3)
fcast_long6 <- forecast(tslm(BA_avg~trend, lambda = 0), h = 253*3)

autoplot(BA_avg) + 
  autolayer(fcast_long1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_long2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_long3, PI = FALSE, series = "Drift") +
  autolayer(fcast_long4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_long5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_long6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("BA Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Long-term Forecast"))

RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_long1)[2], accuracy(fcast_long2)[2], accuracy(fcast_long3)[2],
               accuracy(fcast_long4)[2], accuracy(fcast_long5)[2], accuracy(fcast_long6)[2]))
  

print(RMSE_comp)

##                   Method      RMSE
## 1           Holt's Trend  1.859472
## 2    Damped Holt's Trend  1.859422
## 3                  Drift  1.871985
## 4                    ETS  2.498656
## 5      Linear Regression 31.713798
## 6 Exponential Regression 30.002004

When selecting models for long-term investment, we will emphasis the long-term performance of such company or industry. Thus, we will consider the trend of stock price in a larger range. According to the comparative analysis, forecasts using the drift method best fit our assumption and the time series. Furthermore, the RMSE of this method is the smaller than most of methods, except the Holt’s trend method and damped Holt’s trend method. However, the Holt’s trend method is too optimistic for long-term forecasting; while the constant forecasts of the damped Holt’s trend does not make sense.

checkresiduals(fcast_long3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 430.77, df = 250.8, p-value = 1.156e-11
## 
## Model df: 1.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(BA_avg,1))

#Stock price three years later
price_3year<- as.numeric(tail(fcast_long3$mean, 1))

writeLines("ROI of long-term investment on BA is:")

## ROI of long-term investment on BA is:

ROI_calculation(current, price_3year, 100000)

## [1] "46.91%"

According to the Ljung-Box test, the Holt’s trend model does have lack of fit. The residual plots show that the variation of residuals increase over time.This is attributed to the dramatically changes in the original data. Since this model gives the best forecasting results, we can accept this minor issue with the residuals. Moreover, the distribution of residual follows a normal distribution with zero-mean.Therefore, we select the drift model for long-term forecasting. The return of investment rate is 46.91%.

3.10 Wells Fargo

#Convert data to time series
WFC_avg <- ts(df_10[df_10$Name == "WFC", "avg"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

WFC_asset <- ts(df_10[df_10$Name == "WFC", "asset"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

WFC_volume <- ts(df_10[df_10$Name == "WFC", "volume"], 
            frequency = 253, start = c(2013, 253-days_in_2013))

p1 <- autoplot(WFC_asset) +
  xlab("Daily") + 
  ylab("Total Assets (dollars)") +
  ggtitle("WFC Total Assets (Feb 2013 - Feb 2018)")

p2 <- ggAcf(WFC_asset)

p3 <- autoplot(WFC_avg) + 
  xlab("Daily") + 
  ylab("Average Stock Price (dollars)") +
  ggtitle("WFC Stock Price (Feb 2013 - Feb 2018)")

p4 <- ggAcf(WFC_avg)

p5 <- autoplot(WFC_volume) + 
  xlab("Daily") + 
  ylab("Stock Volume") +
  ggtitle("WFC Stock Volume (Feb 2013 - Feb 2018)")

p6 <- ggAcf(WFC_volume)

grid.arrange(p1, p2, p3, p4, p5, p6, nrow = 3)

1. Total Assets: According to the time plot, the total assests of Wynn has large variation over time, without a clear trend. There exists cyclic or seasonal pattern as shown in the correlogram.

Stock Price: According to the time plot, the price increases since 2017, but drop significantly at the beginning of 2018.Cyclic pattern is observed.
Stock Volume: In the time plot, the stock volumnes have significant changes over time; and the cyclic pattern is attributed to the stock volume.

decomp <- stl(WFC_avg, t.window = 13, s.window = "periodic", robust = TRUE)

autoplot(decomp) + ggtitle("STL Decomposition of Wells Fargo Stock Price")

According to the decomposition results, the trend component is increasing since 2017, which is consistant with our observation in the time plot. In terms of the seasonal component, even though the decomposition result shows the sesonal component, the scale of the seasonal component is relatively small that will not impact the stock price significantly. Therefore, when making forecast, we will focus on the trend component.

Short-term Forecast


#Short term forecasting (3 months)
fcast_short1 <- holt(WFC_avg, h = 65)
fcast_short2 <- holt(WFC_avg, damped = TRUE, phi = 0.98, h = 65)
fcast_short3 <- rwf(WFC_avg, drift = TRUE, h = 65)
fcast_short4 <- stlf(WFC_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 65)
fcast_short5 <- forecast(tslm(WFC_avg~trend), h = 65)
fcast_short6 <- forecast(tslm(WFC_avg~trend, lambda = 0), h = 65)

autoplot(WFC_avg) + 
  autolayer(fcast_short1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_short2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_short3, PI = FALSE, series = "Drift") +
  autolayer(fcast_short4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_short5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_short6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("WFC Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Short-term Forecast"))


RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_short1)[2], accuracy(fcast_short2)[2], accuracy(fcast_short3)[2],
               accuracy(fcast_short4)[2], accuracy(fcast_short5)[2], accuracy(fcast_short6)[2]))
  

print(RMSE_comp)
##                   Method      RMSE
## 1           Holt's Trend 0.5302463
## 2    Damped Holt's Trend 0.5302293
## 3                  Drift 0.5304005
## 4                    ETS 0.5230858
## 5      Linear Regression 4.2939391
## 6 Exponential Regression 4.3710081

Becasue the volatility of stock market, we assume that short-term forecast will heavily depend on the most recent prices. In other words, we are more aggressive for short-term forecasting. According to the comparative analysis, forecasts using the ETS method best fit our assumption and the time series. Furthermore, the RMSE of this method is the smallest.

checkresiduals(fcast_short4)

## Warning in checkresiduals(fcast_short4): The fitted degrees of freedom is
## based on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(M,N,N)
## Q* = 304.78, df = 249.8, p-value = 0.009941
## 
## Model df: 2.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(WFC_avg,1))

#Stock price three months later
price_3month <- as.numeric(tail(fcast_short4$mean, 1))

writeLines("ROI of short-term investment on WFC is:")

## ROI of short-term investment on WFC is:

ROI_calculation(current, price_3month, 100000)

## [1] "-3.83%"

According to the Ljung-Box test, the Holt’s trend model does have lack of fit. However, the residual plots show that the distribution of residual follows a normal distribution with zero-mean, and the variation are constant over time. This meets the requirements of a good forecasting model. Therefore, we select the ETS model for long-term forecasting. The return of investment rate is -3.83%.

Long-term Forecast

#Long term forecasting (3 years)
fcast_long1 <- holt(WFC_avg, h = 253*3)
fcast_long2 <- holt(WFC_avg, damped = TRUE, h = 253*3)
fcast_long3 <- rwf(WFC_avg, drift = TRUE, h = 253*3)
fcast_long4 <- stlf(WFC_avg, t.window = 13, s.window = "periodic", robust = TRUE, method = "ets", h = 253*3)
fcast_long5 <- forecast(tslm(WFC_avg~trend), h = 253*3)
fcast_long6 <- forecast(tslm(WFC_avg~trend, lambda = 0), h = 253*3)

autoplot(WFC_avg) + 
  autolayer(fcast_long1, PI = FALSE, series = "Holt's Trend") +
  autolayer(fcast_long2, PI = FALSE, series = "Damped Holt's Trend") +
  autolayer(fcast_long3, PI = FALSE, series = "Drift") +
  autolayer(fcast_long4, PI = FALSE, series = "ETS") + 
  autolayer(fcast_long5, PI = FALSE, series = "Linear Regression") +
  autolayer(fcast_long6, PI = FALSE, series = "Exponential Regression") + 
  xlab("Daily") + ylab("Stock Price (dollars)") +
  ggtitle("WFC Stock Price Forecast") + 
  guides(colour = guide_legend(title = "Long-term Forecast"))

RMSE_comp <- data.frame(Method = c("Holt's Trend", "Damped Holt's Trend", "Drift", "ETS", "Linear Regression", "Exponential Regression"),
                        RMSE = c(accuracy(fcast_long1)[2], accuracy(fcast_long2)[2], accuracy(fcast_long3)[2],
               accuracy(fcast_long4)[2], accuracy(fcast_long5)[2], accuracy(fcast_long6)[2]))
  

print(RMSE_comp)

##                   Method      RMSE
## 1           Holt's Trend 0.5302463
## 2    Damped Holt's Trend 0.5273329
## 3                  Drift 0.5304005
## 4                    ETS 0.5230858
## 5      Linear Regression 4.2939391
## 6 Exponential Regression 4.3710081

checkresiduals(fcast_long4)

## Warning in checkresiduals(fcast_long4): The fitted degrees of freedom is
## based on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(M,N,N)
## Q* = 304.78, df = 249.8, p-value = 0.009941
## 
## Model df: 2.   Total lags used: 251.8

#Stock price on 02/07/2018
current <- as.numeric(tail(WFC_avg,1))

#Stock price three years later
price_3year<- as.numeric(tail(fcast_long4$mean, 1))

writeLines("ROI of long-term investment on WFC is:")

## ROI of long-term investment on WFC is:

ROI_calculation(current, price_3year, 100000)

## [1] "0.00%"

According to the Ljung-Box test, the Holt’s trend model does have lack of fit. However, the residual plots show that the distribution of residual follows a normal distribution with zero-mean, and the variation are constant over time. This meets the requirements of a good forecasting model. Therefore, we select the ETS model for long-term forecasting. The return of investment rate is 0.0%.

4. Conclusion

For short-term investment, the top three stocks are:

DSCI-FXP-SP19- Project #1 Stock Market Forecast

Yiling He

March 21, 2019

1. Project Overview

2. Top 10 Asset Holdings

3. Forecasting

3.1 Amazon

Short-term Forecast

Long-term Forecast

3.2 Apple

Short-term Forecast

Long-term Forecast

3.3 Facebook

Short-term Forecast

Long-term Forecast

3.4 NVIDIA

Short-term Forecast

Long-term Forecast

3.5 Microsoft

Short-term Forecast

Long-term Forecast

3.6 Bank of America

Short-term Forecast

Long-term Forecast

3.7 Alphabet

Short-term Forecast

Long-term Forecast

3.8 Wynn

Short-term Forecast

Long-term Forecast

3.9 Boeing

Short-term Forecast

Long-term Forecast

3.10 Wells Fargo

Short-term Forecast

Long-term Forecast

4. Conclusion