Objective - Understand the evolution of the Google stock price over the years

Data Loading

The character “-” represents a missing value

goog <- read.csv("goog.csv", na.strings = "-")

Data structure:

dim(goog)
## [1] 2514    6
head(goog)
##    ï..Date   Open   High    Low  Close  Volume
## 1 8-Apr-16 743.97 745.45 735.55 739.15 1290787
## 2 7-Apr-16 745.37 747.00 736.28 740.28 1429504
## 3 6-Apr-16 735.77 746.24 735.56 745.69 1050193
## 4 5-Apr-16 738.00 742.80 735.37 737.80 1129829
## 5 4-Apr-16 750.06 752.80 742.43 745.29 1131843
## 6 1-Apr-16 738.60 750.34 737.00 749.91 1574870
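The `ï..Date` header in the output above is the typical symptom of a UTF-8 byte order mark (BOM) at the start of the file, which is likely why the first column gets renamed later on. As a minimal sketch, assuming the file is UTF-8 with a BOM, `read.csv` can strip the mark itself through `fileEncoding = "UTF-8-BOM"`. The temporary file and its three columns below are made up for illustration; they only mimic the layout of `goog.csv`.

```r
# Build a tiny CSV that starts with a UTF-8 BOM (bytes EF BB BF).
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
body <- charToRaw("Date,Open,Volume\n8-Apr-16,743.97,1290787\n7-Apr-16,745.37,-\n")
writeBin(c(bom, body), tmp)

# "UTF-8-BOM" tells R to drop the mark before parsing the header.
# na.strings = "-" only matches fields that are exactly "-",
# so the dashes inside "8-Apr-16" are left alone.
demo <- read.csv(tmp, na.strings = "-", fileEncoding = "UTF-8-BOM")

print(names(demo))
print(demo)
```

With this option the first column should already be called `Date`, so the rename step becomes a harmless no-op.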

Modifying the Date field:

* Change the name
* Change the type to Date
* Insert a new Month column

Sys.setlocale("LC_TIME", "English")
## [1] "English_United States.1252"
names(goog)[1] <- "Date"
goog$Date <- as.Date(goog$Date, "%d-%b-%y")
goog$Month <- format(goog$Date, "%Y-%m")
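The date conversion depends on the locale, because `%b` matches abbreviated month names in the current `LC_TIME` language. The sketch below repeats the same conversion on two sample strings. It assumes the portable `"C"` locale instead of `"English"`, which is a Windows-specific locale name. Note also that `%y` reads two-digit years 00 to 68 as 20xx and 69 to 99 as 19xx, which is fine for this 2006 to 2016 data.

```r
# "C" exists on every platform and uses English month abbreviations.
invisible(Sys.setlocale("LC_TIME", "C"))

raw <- c("8-Apr-16", "12-Apr-06")            # same layout as the Date column
dates <- as.Date(raw, format = "%d-%b-%y")   # day, abbreviated month, 2-digit year
months <- format(dates, "%Y-%m")             # "YYYY-MM" key used for monthly grouping

print(dates)
print(months)

# A failed parse yields NA rather than an error, so it is worth checking.
print(anyNA(dates))
```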

Most of the Volume entries are missing: 2001 of 2514 rows, about 80%. With that many gaps, imputation could be inaccurate.

sapply(goog, function(x) sum(is.na(x)))
##   Date   Open   High    Low  Close Volume  Month 
##      0      0      0      0      0   2001      0
summary(goog)
##       Date                 Open            High            Low       
##  Min.   :2006-04-12   Min.   :131.1   Min.   :134.6   Min.   :123.5  
##  1st Qu.:2008-10-09   1st Qu.:242.3   1st Qu.:245.3   1st Qu.:240.2  
##  Median :2011-04-09   Median :296.0   Median :298.4   Median :293.4  
##  Mean   :2011-04-10   Mean   :354.7   Mean   :358.1   Mean   :351.1  
##  3rd Qu.:2013-10-08   3rd Qu.:455.0   3rd Qu.:459.5   3rd Qu.:451.0  
##  Max.   :2016-04-08   Max.   :784.5   Max.   :789.9   Max.   :766.9  
##                                                                      
##      Close           Volume            Month          
##  Min.   :128.6   Min.   :  527223   Length:2514       
##  1st Qu.:242.4   1st Qu.: 1445878   Class :character  
##  Median :295.6   Median : 1788506   Mode  :character  
##  Mean   :354.6   Mean   : 2058055                     
##  3rd Qu.:454.9   3rd Qu.: 2258918                     
##  Max.   :776.6   Max.   :11164943                     
##                  NA's   :2001
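The missing-value counts are easier to judge as proportions. A small sketch on a made-up data frame (the `toy` values are illustrative, not taken from the full data set): `colMeans(is.na(...))` gives the share of missing entries per column, which could then be compared against whatever threshold is chosen for dropping or imputing a column.

```r
# Toy frame with the same kind of gap as the Volume column.
toy <- data.frame(
  Open   = c(743.97, 745.37, 735.77, 738.00),
  Volume = c(1290787, NA, NA, NA)
)

na_count <- colSums(is.na(toy))    # number of missing entries per column
na_share <- colMeans(is.na(toy))   # fraction of missing entries per column

print(na_count)
print(round(100 * na_share, 1))    # as percentages
```

Applied to the real data, the same call would be `colMeans(is.na(goog))`.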

Compute the mean Open price per month

goog_date <- aggregate(goog$Open, list(Date = goog$Month), FUN = mean)
goog_date$Date <- as.Date(paste(goog_date$Date, "-01", sep = ""))
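Passing a bare vector to `aggregate` is why the averaged column ends up with the generic name `x`. As an alternative sketch, the formula interface keeps the original column name. The `toy` data frame below is invented for illustration and only has the two columns involved.

```r
toy <- data.frame(
  Month = c("2016-03", "2016-03", "2016-04"),
  Open  = c(10, 20, 30)
)

# One row per month; the result column is called "Open" instead of "x".
monthly <- aggregate(Open ~ Month, data = toy, FUN = mean)

# Turn the "YYYY-MM" key into a real date (first day of the month) for plotting.
monthly$Date <- as.Date(paste0(monthly$Month, "-01"))

print(monthly)
```

Because the `"YYYY-MM"` keys sort alphabetically in the same order as chronologically, the rows come out in time order, which matters when the values are later turned into a time series.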

The monthly mean Open price shows an overall upward trend over the years

library(ggplot2)
g <- ggplot()
g+geom_line(data = goog_date, aes(Date,x), colour = "red" )

Convert the data into a time series with a 12-month seasonal period

ts.goog <- ts(goog_date$x, frequency = 12, start = c(2006,4))
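A `ts` object does not store dates; it only records a start, a frequency, and the order of the values. If a month were missing from the aggregated data, every later value would silently shift to the wrong month. The sketch below shows a quick sanity check on a placeholder series. The length of 121 is an assumption: it is the number of months from April 2006 to April 2016 if none is missing.

```r
# Placeholder values; only the time index matters here.
x <- ts(seq_len(121), frequency = 12, start = c(2006, 4))

print(start(x))      # first (year, month)
print(end(x))        # last (year, month) implied by length and frequency
print(frequency(x))

# Expected number of monthly observations between the two end points.
expected_n <- 12 * (2016 - 2006) + (4 - 4) + 1
print(length(x) == expected_n)
```

For the real series, comparing `end(ts.goog)` with the last month in the data would confirm that no month was skipped.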

Forecasting

Decompose the Google series into seasonal, trend, and remainder components.

stl.goog <- stl(ts.goog, s.window = "periodic" )
plot(stl.goog)
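Beyond the plot, the three components can be read from the `time.series` element of the fitted object. The sketch below uses a synthetic monthly series (a linear trend plus a sine wave plus noise, all invented for illustration) to show that the components add back up to the original series and that `s.window = "periodic"` forces the seasonal pattern to be the same every year.

```r
set.seed(1)
n <- 120
t_idx <- seq_len(n)

# Synthetic monthly series: trend + yearly cycle + noise.
y <- ts(100 + 0.5 * t_idx + 10 * sin(2 * pi * t_idx / 12) + rnorm(n),
        frequency = 12, start = c(2006, 4))

fit <- stl(y, s.window = "periodic")
parts <- fit$time.series            # columns: seasonal, trend, remainder
print(colnames(parts))

# The decomposition is additive: the three columns sum to the data.
recon <- rowSums(parts)
print(max(abs(recon - as.numeric(y))))

# With a periodic window, year 1 and year 2 share one seasonal pattern.
print(all.equal(as.numeric(parts[1:12, "seasonal"]),
                as.numeric(parts[13:24, "seasonal"])))
```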

Holt-Winters

This forecast uses Holt-Winters exponential smoothing with a 12-month seasonal period.

library(forecast)
## Warning: package 'forecast' was built under R version 3.2.4
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 7.0
fit.hw <- HoltWinters(ts.goog)
goog.forecast.holt1 <- forecast(fit.hw, h = 12, level = 95)
plot(goog.forecast.holt1)

ARIMA model

fit1 <- auto.arima(ts.goog)
fcast_ar <- forecast(fit1, h = 12, level = 95)
plot(fcast_ar)

ETS

fit <- ets(ts.goog)
fcast_ets <- forecast(fit, h=12, level = 95)
plot(fcast_ets)

Neural network

fit_nn <- nnetar(ts.goog)
fcast_nn <- forecast(fit_nn, h=12, level = 95)
plot(fcast_nn)

Comparison

This study used a simplistic approach: all the data was used to fit each model, and the accuracy reported below is measured on that same training data. This kind of approach favors overfitted models, whose small in-sample errors do not reflect the models' true performance on unseen data. This first document therefore serves as a baseline for future work, allowing us to see how the accuracy changes under a more robust evaluation method.
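One common way to get an out-of-sample estimate is to hold out the last 12 months, fit on the rest, and score the forecast against the held-out values. The sketch below is only an illustration under stated assumptions: it uses the built-in `AirPassengers` series as a stand-in for `ts.goog`, base R's `HoltWinters` and `predict` rather than the `forecast` package, and a hypothetical helper `rmse` defined inline.

```r
y <- AirPassengers                       # stand-in monthly series (1949 to 1960)

# Split: everything up to Dec 1959 for fitting, 1960 for testing.
train <- window(y, end = c(1959, 12))
test  <- window(y, start = c(1960, 1))

fit  <- HoltWinters(train)               # fitted on the training part only
pred <- predict(fit, n.ahead = length(test))

# Hypothetical helper: root mean square error of a vector of errors.
rmse <- function(err) sqrt(mean(err^2))

in_sample  <- rmse(residuals(fit))                        # error on data the model saw
out_sample <- rmse(as.numeric(test) - as.numeric(pred))   # error on held-out data

print(c(in_sample = in_sample, out_sample = out_sample))
```

With the `forecast` package, passing the held-out values as a second argument, as in `accuracy(fc, test)`, reports a test-set row next to the training-set row.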

Accuracy for Holt-Winters Method

accuracy(goog.forecast.holt1)
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.987552 25.49542 20.13194 -0.5283023 6.597947 0.2918411
##                   ACF1
## Training set 0.1258745

Accuracy for ARIMA Method

accuracy(fcast_ar)
##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 0.05268505 20.90343 16.52174 -0.3981689 5.050276 0.2222985
##                      ACF1
## Training set -0.003889585

Accuracy for ETS Method

accuracy(fcast_ets)
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 0.3993283 22.47849 17.87291 -0.5052694 5.509053 0.2404784
##                  ACF1
## Training set 0.306915

Accuracy for Neural Network Method

accuracy(fcast_nn)
##                        ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.007483225 22.23893 17.93012 -0.4740918 5.368472 0.2412481
##                   ACF1
## Training set 0.2892273

This comparison shows that the ARIMA model has the lowest training-set Root Mean Square Error (RMSE, 20.90) and Mean Absolute Error (MAE, 16.52).
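To make the ranking explicit, the training-set figures printed above can be collected into one table and sorted. The sketch below copies the RMSE and MAE values by hand from the four accuracy outputs; the `metrics` data frame is just an illustrative container.

```r
# Training-set values copied from the accuracy() outputs above.
metrics <- data.frame(
  model = c("Holt-Winters", "ARIMA", "ETS", "Neural network"),
  RMSE  = c(25.49542, 20.90343, 22.47849, 22.23893),
  MAE   = c(20.13194, 16.52174, 17.87291, 17.93012),
  stringsAsFactors = FALSE
)

# Sort from best to worst by RMSE.
ranked <- metrics[order(metrics$RMSE), ]
print(ranked, row.names = FALSE)

best_rmse <- metrics$model[which.min(metrics$RMSE)]
best_mae  <- metrics$model[which.min(metrics$MAE)]
```

The two metrics agree on the best and worst models but not on the middle: the neural network edges out ETS on RMSE, while ETS is slightly ahead on MAE.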