ADEC7460 Homework #3

Jordan Harrop

Wikipedia Trend Data

For the third homework assignment, I thought it would be clever to run a time series analysis on the number of daily views of the Wikipedia "Time series" article. The code below uses the wikipediatrend package to pull daily page views from July 1, 2015 to May 1, 2018.

library(forecast)
library(wikipediatrend)
library(prophet)

data <- wp_trend(page = "Time_series",
                 from = "2015-07-01",
                 to = "2018-05-01")

# wp_cache_reset()

head(data)

##   project   language article     access     agent      granularity
## 1 wikipedia en       Time_series all-access all-agents daily      
## 2 wikipedia en       Time_series all-access all-agents daily      
## 3 wikipedia en       Time_series all-access all-agents daily      
## 4 wikipedia en       Time_series all-access all-agents daily      
## 5 wikipedia en       Time_series all-access all-agents daily      
## 6 wikipedia en       Time_series all-access all-agents daily      
##   date       views
## 1 2015-07-01 1124 
## 2 2015-07-02 1036 
## 3 2015-07-03  861 
## 4 2015-07-04  604 
## 5 2015-07-05  627 
## 6 2015-07-06 1087
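
As a quick sanity check on the requested range, the expected number of daily observations and the day-of-year of the first date can be computed with base R dates alone. This is a minimal sketch that does not call wikipediatrend; the variable names `dates`, `n.obs` and `first.doy` are illustrative and not part of the original analysis.

```r
# Daily dates covered by the request (both end points included)
dates <- seq(as.Date("2015-07-01"), as.Date("2018-05-01"), by = "day")

n.obs <- length(dates)                           # expected number of daily rows
first.doy <- as.integer(format(dates[1], "%j"))  # day-of-year of the first date

n.obs      # 1036
first.doy  # 182
```

The count agrees with the 1,000 + 36 split used for training and testing, and the day-of-year can be compared with the start argument passed to ts().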

Before fitting any models, the code below creates some exploratory plots. The first, from tsdisplay(), shows the series together with its ACF and PACF. The ACF plot shows strong autocorrelation in the series. The data are then transformed with a Box-Cox transformation, with lambda estimated by BoxCox.lambda(), and the same transformation is applied to the data frame prepared for Prophet.

# Prep Data For ETS, ARIMA Models
# Named ts.raw so the object does not share a name with the ts() function
ts.raw <- ts(data$views, frequency = 365, start = c(2015, 183))
tsdisplay(ts.raw)

# Prep Data for Facebook Prophet Model
ds <- data$date
y <- data$views
df <- data.frame(ds, y)

# Box-Cox Transformation
lam <- BoxCox.lambda(data$views)
data.clean <- BoxCox(data$views, lambda = lam)
ts.tfm <- ts(data.clean, frequency = 365, start = c(2015, 183))
plot(ts.tfm)

# Transformation for Prophet Model
df$y <- BoxCox(df$y, lambda = lam)
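
To make the transformation concrete, here is a minimal base R sketch of the Box-Cox formula for positive data and its inverse. The helper names `box_cox` and `inv_box_cox` are hypothetical stand-ins written for illustration (the forecast package provides BoxCox() and InvBoxCox() for real use), and `lam.demo` is an arbitrary value, not the lambda estimated above.

```r
# Box-Cox transform for positive data; lambda = 0 is the log transform
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Inverse transform, to bring values back to the original scale
inv_box_cox <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

views <- c(1124, 1036, 861, 604, 627, 1087)  # first six daily counts from head(data)
lam.demo <- 0.5                              # illustrative value, NOT the estimated lambda

z <- box_cox(views, lam.demo)      # transformed scale
back <- inv_box_cox(z, lam.demo)   # round trip back to page views
```

Every model in this report is fitted to the transformed series, so forecasts and error measures are on the transformed scale until an inverse like this is applied.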

Exploratory Data Analysis

This section examines the components of the series through an STL decomposition. The code also experiments with first- and second-order differencing of the transformed data to obtain a stationary series.

ts.decomp <- stl(ts.tfm, s.window = "periodic", robust = TRUE)
plot(ts.decomp)

ts.tfm.d1 <- diff(ts.tfm)
plot(ts.tfm.d1)

ts.decomp <- stl(ts.tfm.d1, s.window = "periodic", robust = TRUE)
plot(ts.decomp)

tsdisplay(ts.tfm.d1)

ts.tfm.d2 <- diff(ts.tfm.d1)
plot(ts.tfm.d2)
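
The effect of differencing is easiest to see on a series with a known structure. The sketch below uses a synthetic random walk (an assumption for illustration, not the Wikipedia data) and compares the lag-1 sample autocorrelation before and after one difference. The names `x`, `d1`, `d2`, `r1.level` and `r1.diff` are illustrative.

```r
set.seed(1)
x <- cumsum(rnorm(500))   # synthetic random walk, NOT the Wikipedia data

d1 <- diff(x)             # first difference: x[t] - x[t-1]
d2 <- diff(d1)            # second difference, same as diff(x, differences = 2)

# Lag-1 sample autocorrelation before and after differencing
r1.level <- acf(x,  plot = FALSE)$acf[2]
r1.diff  <- acf(d1, plot = FALSE)$acf[2]

c(level = r1.level, differenced = r1.diff)
```

For a random walk, the level series has a lag-1 autocorrelation close to one while the first difference is close to white noise. If a first difference already looks stationary in this way, a second difference is usually unnecessary.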

ARIMA Model, ETS & Neural Net

Four models are tested in total: ARIMA, ETS, a neural network autoregression (nnetar) and Facebook's Prophet. Even though Prophet was not directly covered in the course textbook, it came up indirectly in the context of the course. The code below fits the ARIMA, ETS and neural net models to the transformed series and forecasts one year (365 days) ahead.

fit.ari <- Arima(ts.tfm,  order = c(3,1,3))
fcst.arima <- forecast(fit.ari, h=365)
plot(fcst.arima)

fcst.ets <- forecast(ts.tfm, h= 365)
fcst.ets$model

## ETS(A,A,N) 
## 
## Call:
##  ets(y = x, model = etsmodel, allow.multiplicative.trend = allow.multiplicative.trend) 
## 
##   Smoothing parameters:
##     alpha = 0.0002 
##     beta  = 0.0002 
## 
##   Initial states:
##     l = 176.1935 
##     b = -0.0186 
## 
##   sigma:  19.8045
## 
##      AIC     AICc      BIC 
## 13385.87 13385.93 13410.59

plot(fcst.ets)

fit.nn <- nnetar(ts.tfm)
fcst.nn <- forecast(fit.nn, h=365)
plot(fcst.nn)
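
The ETS(A,A,N) model reported above can be read as Holt's linear trend recursions driven by additive one-step errors. The following is a minimal sketch of those recursions in base R; `holt_forecast` is a hypothetical helper written for illustration, not a function from the forecast package, and it takes the smoothing parameters and initial states as given instead of estimating them.

```r
# ETS(A,A,N) recursions: additive error, additive trend, no seasonality
holt_forecast <- function(y, alpha, beta, l0, b0, h) {
  level <- l0
  trend <- b0
  for (obs in y) {
    err   <- obs - (level + trend)          # one-step-ahead forecast error
    level <- level + trend + alpha * err    # level update
    trend <- trend + beta * err             # trend update
  }
  level + seq_len(h) * trend                # h-step-ahead point forecasts
}

# Toy series that is exactly linear, so every one-step error is zero
y.demo <- 10 + 2 * (1:5)
holt_forecast(y.demo, alpha = 0.5, beta = 0.1, l0 = 10, b0 = 2, h = 3)
```

With alpha and beta as close to zero as in the fitted model above (both 0.0002), the level and trend barely react to new errors, so the forecast is close to a straight line extrapolated from the initial states.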

Facebook Prophet Model

The Prophet model requires the data in a different format: a data frame with a date column named ds and a value column named y. As with the models above, the code below forecasts one year ahead. Plots of the trend, weekly seasonality and yearly seasonality components are also provided.

fit.prophet <- prophet(df)

## Initial log joint probability = -7.66155
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance

h <- make_future_dataframe(fit.prophet, periods = 365)

fcst.prophet <- predict(fit.prophet, h)

plot(fit.prophet, fcst.prophet)

prophet_plot_components(fit.prophet, fcst.prophet)
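
The data layout used above can be sketched without the prophet package. The base R stand-in below builds a small history frame with `ds` and `y` columns and then a frame of dates that covers the history plus a few future days, which is my understanding of what make_future_dataframe() returns by default for daily data. The names `hist.df`, `periods.demo` and `future.demo` are illustrative.

```r
# History in the layout Prophet expects: a date column `ds` and a value column `y`
hist.df <- data.frame(
  ds = seq(as.Date("2015-07-01"), by = "day", length.out = 6),
  y  = c(1124, 1036, 861, 604, 627, 1087)   # first six daily counts from head(data)
)

# Dates to predict on: the history dates followed by `periods.demo` new days
periods.demo <- 3
future.demo <- data.frame(
  ds = seq(min(hist.df$ds), by = "day",
           length.out = nrow(hist.df) + periods.demo)
)

tail(future.demo, periods.demo)  # the rows that extend past the history
```

Because the history rows come first, the forecasts for the new period sit in the last `periods.demo` rows of the prediction output.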

Training / Test Performance

To measure performance, the data are split into a training set and a test set. The training set consists of the first 1,000 observations and the test set of the last 36. The ARIMA model is an ARIMA(3,1,3): three autoregressive lags, one order of differencing and three moving-average (lagged error) terms. The ETS model is ETS(A,A,N), which has additive errors and an additive trend with no seasonal component. This is very similar to Holt's linear trend method with additive errors.

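The two best models according to test-set RMSE, Prophet (15.88) and ETS (39.88), are plotted against the actual data. All accuracy measures are computed on the Box-Cox-transformed scale. For this analysis, the Prophet model performed best.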

train <- ts(ts.tfm[1:1000], frequency = 365, start = c(2015, 183))
test <- ts(ts.tfm[1001:1036], frequency = 365, start = c(2018, 84))

df.train <- df[1:1000,]
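
Forecast objects carry their own time index, so it can help to derive the start of the test series from the end of the training series instead of hard-coding a day-of-year. The sketch below shows one way to do that with tsp(); it uses placeholder random values, not the transformed page views, and the names ending in `.demo` are illustrative.

```r
set.seed(42)
x.demo <- rnorm(1036)   # placeholder values, NOT the transformed page views

train.demo <- ts(x.demo[1:1000], frequency = 365, start = c(2015, 183))

# Start the test series exactly one period after the last training point
test.start <- tsp(train.demo)[2] + 1 / frequency(train.demo)
test.demo <- ts(x.demo[1001:1036], frequency = 365, start = test.start)

start(test.demo)  # year and position within the year of the first test point
```

With start = c(2015, 183) and frequency = 365, this places the 1,001st observation at c(2018, 88), so it may be worth comparing that against any hard-coded start before reading the accuracy tables.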

# Fit Models
fit.arima <- Arima(train, order = c(3,1,3))
fcst.arima <- forecast(fit.arima, h=36)

fcst.ets <- forecast(train, h= 36)

fit.nn <- nnetar(train)
fcst.nn <- forecast(fit.nn, h=36)

# Prophet Model
fit.prophet <- prophet(df.train)

## Initial log joint probability = -8.31303
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance

h <- make_future_dataframe(fit.prophet, periods = 36)

fcst.prophet <- predict(fit.prophet, h)
fcst.p1 <- ts(fcst.prophet$yhat[1001:1036], frequency = 365, start = c(2018, 84))

accuracy(fcst.arima, test)

##                      ME     RMSE      MAE        MPE      MAPE      MASE
## Training set  0.1446658 18.85363 14.84543 -0.8962761  8.403095 0.6242161
## Test set     -1.5318756 44.84244 38.18420 -4.6196272 20.018370 1.6055575
##                    ACF1 Theil's U
## Training set -0.1081297        NA
## Test set      0.5164302  1.260804

accuracy(fcst.ets, test)

##                     ME     RMSE      MAE       MPE      MAPE      MASE
## Training set 0.1465272 19.25279 15.00438 -1.151956  8.639017 0.6308993
## Test set     0.2285948 39.87756 34.67006 -2.739269 17.901141 1.4577957
##                   ACF1 Theil's U
## Training set 0.3534594        NA
## Test set     0.4654387  1.190379

accuracy(fcst.nn, test)

##                        ME      RMSE       MAE         MPE      MAPE
## Training set  0.006590431  2.267121  1.608666 -0.04000878  0.865049
## Test set     -5.293059057 55.909551 48.570054 -6.92458770 25.190018
##                    MASE       ACF1 Theil's U
## Training set 0.06764068 0.04153165        NA
## Test set     2.04225843 0.58130725  1.571049

accuracy(fcst.p1, test)

##                ME     RMSE      MAE      MPE     MAPE      ACF1 Theil's U
## Test set 3.532557 15.88078 13.11913 1.024287 6.368782 0.7100999 0.4970761
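
For reference, the main columns in the accuracy tables above can be reproduced by hand. The sketch below defines them with errors taken as actual minus forecast; the helper names `me`, `rmse`, `mae` and `mape` are hypothetical and written for illustration, and the toy numbers are made up, not taken from the models.

```r
# Error measures, with error defined as actual minus forecast
me   <- function(actual, fc) mean(actual - fc)                        # mean error (bias)
rmse <- function(actual, fc) sqrt(mean((actual - fc)^2))              # root mean squared error
mae  <- function(actual, fc) mean(abs(actual - fc))                   # mean absolute error
mape <- function(actual, fc) mean(abs((actual - fc) / actual)) * 100  # mean absolute % error

actual.demo <- c(100, 200, 300)   # made-up values for illustration
fc.demo     <- c(110, 190, 330)

c(ME   = me(actual.demo, fc.demo),
  RMSE = rmse(actual.demo, fc.demo),
  MAE  = mae(actual.demo, fc.demo),
  MAPE = mape(actual.demo, fc.demo))
```

RMSE penalizes large misses more heavily than MAE, which is one reason the two can rank models differently. In this report both are measured on the transformed scale.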

# Test data vs. ETS forecasts (transformed scale).
# flwd and plot.conf are plot.forecast arguments, not plot.ts ones, so they are dropped.
plot(test, type = "o", ylab = "Wiki Time Series Views Transformed")
lines(fcst.ets$mean, col = 2)
legend("topleft", lty = 1, pch = c(1, NA), col = 1:2,
       legend = c("Data", "ETS"))

# Test data vs. Prophet forecasts (transformed scale).
# flwd and plot.conf are plot.forecast arguments, not plot.ts ones, so they are dropped.
plot(test, type = "o", ylab = "Wiki Time Series Views Transformed")
lines(fcst.p1, col = 2)
legend("topleft", lty = 1, pch = c(1, NA), col = 1:2,
       legend = c("Data", "Prophet"))