In the raw CSV file, the character “-” represents a missing value, so it is mapped to NA when the data is read.
goog <- read.csv("goog.csv", na.strings = "-")
dim(goog)
## [1] 2514 6
head(goog)
## ï..Date Open High Low Close Volume
## 1 8-Apr-16 743.97 745.45 735.55 739.15 1290787
## 2 7-Apr-16 745.37 747.00 736.28 740.28 1429504
## 3 6-Apr-16 735.77 746.24 735.56 745.69 1050193
## 4 5-Apr-16 738.00 742.80 735.37 737.80 1129829
## 5 4-Apr-16 750.06 752.80 742.43 745.29 1131843
## 6 1-Apr-16 738.60 750.34 737.00 749.91 1574870
Modifying the Date field:
* Change the name
* Change the type to Date
* Insert a new Month column
Sys.setlocale("LC_TIME", "English")
## [1] "English_United States.1252"
names(goog)[1] <- "Date"
goog$Date <- as.Date(goog$Date, "%d-%b-%y")
goog$Month <- format(goog$Date, "%Y-%m")
Most of the Volume entries are missing. Since the missing values account for roughly 80% of the rows (2001 of 2514), imputing them would likely be inaccurate.
sapply(goog, function(x) sum(is.na(x)))
## Date Open High Low Close Volume Month
## 0 0 0 0 0 2001 0
summary(goog)
## Date Open High Low
## Min. :2006-04-12 Min. :131.1 Min. :134.6 Min. :123.5
## 1st Qu.:2008-10-09 1st Qu.:242.3 1st Qu.:245.3 1st Qu.:240.2
## Median :2011-04-09 Median :296.0 Median :298.4 Median :293.4
## Mean :2011-04-10 Mean :354.7 Mean :358.1 Mean :351.1
## 3rd Qu.:2013-10-08 3rd Qu.:455.0 3rd Qu.:459.5 3rd Qu.:451.0
## Max. :2016-04-08 Max. :784.5 Max. :789.9 Max. :766.9
##
## Close Volume Month
## Min. :128.6 Min. : 527223 Length:2514
## 1st Qu.:242.4 1st Qu.: 1445878 Class :character
## Median :295.6 Median : 1788506 Mode :character
## Mean :354.6 Mean : 2058055
## 3rd Qu.:454.9 3rd Qu.: 2258918
## Max. :776.6 Max. :11164943
## NA's :2001
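Since imputing this many values is not advisable, one option (an assumption here; the original analysis simply leaves the column untouched) is to confirm the missing fraction and drop Volume, as it is not used in the rest of the analysis.
mean(is.na(goog$Volume))  # proportion of missing Volume values, roughly 0.8
goog$Volume <- NULL       # drop the column rather than impute it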
# Aggregate daily opening prices into monthly means
goog_date <- aggregate(goog$Open, list(Date = goog$Month), FUN = mean)
goog_date$Date <- as.Date(paste(goog_date$Date, "-01", sep = ""))
The stock shows a continuous upward trend in its monthly averages over the years.
library(ggplot2)
ggplot(goog_date, aes(Date, x)) +
  geom_line(colour = "red")
Transform the data into a monthly time series with a 12-month seasonality.
ts.goog <- ts(goog_date$x, frequency = 12, start = c(2006,4))
Decomposing the Google series into seasonal, trend, and remainder components with STL.
stl.goog <- stl(ts.goog, s.window = "periodic" )
plot(stl.goog)
This forecast uses Holt-Winters exponential smoothing with a 12-month seasonality.
library(forecast)
## Warning: package 'forecast' was built under R version 3.2.4
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 7.0
fit.hw <- HoltWinters(ts.goog)
goog.forecast.holt1 <- forecast(fit.hw, h = 12, level = 95)
plot(goog.forecast.holt1)
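For comparison, an ARIMA model is fitted automatically with auto.arima and used to forecast the same 12-month horizon.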
fit1 <- auto.arima(ts.goog)
fcast_ar <- forecast(fit1, h = 12, level = 95)
plot(fcast_ar)
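An exponential smoothing state space (ETS) model is fitted in the same way.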
fit <- ets(ts.goog)
fcast_ets <- forecast(fit, h = 12, level = 95)
plot(fcast_ets)
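Finally, a neural network autoregression (nnetar) model is fitted.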
fit_nn <- nnetar(ts.goog)
fcast_nn <- forecast(fit_nn, h = 12, level = 95)
plot(fcast_nn)
This study takes a simplistic approach: all of the data is used to fit each model, and future values are then predicted from that same fitted model. Such an approach leads to overfitted models whose small in-sample errors do not reflect true performance on new data. This first document therefore serves as a baseline for future developments, giving a sense of how the accuracy changes once a more solid method, such as evaluation on a held-out test set, is used (a sketch follows below).
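As a minimal sketch of such a method (an assumption, not part of the original analysis), the last 12 months could be held out as a test set, the models refitted on the remaining data only, and accuracy measured on the held-out period:
h <- 12
split_time <- time(ts.goog)[length(ts.goog) - h]       # last time point kept for training
ts.train <- window(ts.goog, end = split_time)          # training window
ts.test <- window(ts.goog, start = split_time + 1/12)  # 12 held-out months
fc.hw.test <- forecast(HoltWinters(ts.train), h = h)   # refit on training data only
fc.ar.test <- forecast(auto.arima(ts.train), h = h)
accuracy(fc.hw.test, ts.test)                          # "Test set" rows give out-of-sample errors
accuracy(fc.ar.test, ts.test)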
Accuracy for Holt-Winters Method
accuracy(goog.forecast.holt1)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.987552 25.49542 20.13194 -0.5283023 6.597947 0.2918411
## ACF1
## Training set 0.1258745
Accuracy for ARIMA Method
accuracy(fcast_ar)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.05268505 20.90343 16.52174 -0.3981689 5.050276 0.2222985
## ACF1
## Training set -0.003889585
Accuracy for ETS Method
accuracy(fcast_ets)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.3993283 22.47849 17.87291 -0.5052694 5.509053 0.2404784
## ACF1
## Training set 0.306915
Accuracy for Neural Network Method
accuracy(fcast_nn)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.007483225 22.23893 17.93012 -0.4740918 5.368472 0.2412481
## ACF1
## Training set 0.2892273
This comparison shows that ARIMA is the model with the lowest root mean square error (RMSE) and mean absolute error (MAE) on the training set.
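For easier side-by-side reading, the training-set metrics of the four models can be collected into a single table (a small convenience sketch assuming the forecast objects above are still in the workspace):
models <- list(HoltWinters = goog.forecast.holt1, ARIMA = fcast_ar,
               ETS = fcast_ets, NNAR = fcast_nn)
# One row per model with its training-set RMSE, MAE and MAPE
round(t(sapply(models, function(f) accuracy(f)[1, c("RMSE", "MAE", "MAPE")])), 3)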