============================================================================================================

Analyze the attached time series data on the average monthly value of company stock. Forecast one year into the future and report the projected average value and low and high ends using a 95% confidence interval.

 

Question: Based on the forecast, would you advise a client to buy the stock?
Answer: Yes, I would advise a client to buy the stock. The trend component of the time series decomposition rises over the observed period, and Figure 6 shows that the forecast of next year's stock prices (blue line) keeps rising.

 

Question: And if so, what is the most they should pay and why?
Answer: To determine the most a client should pay, an investor would need to discount the expected future value to its present value. This answer, however, does not carry out a full discounted cash flow (DCF) analysis.

The last observed value (month 37) is 521.40 USD. The projected average value and the low and high ends of the 95% confidence interval are stored, respectively, in the variables avg, low95, and high95 in the last code chunk. Twelve months ahead, the projected average value is about 602.18 USD, with a 95% confidence interval of about 450.02 to 754.33 USD. If a client buys one share now, the expected gain one year later is 80.78 USD, while the 95% confidence interval ranges from a loss of 71.38 USD to a gain of 232.93 USD.
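The arithmetic behind these figures can be reproduced from the three differences printed in the last code chunk. The sketch below turns them into projected price levels and percentage changes, recovers the implied forecast standard error from the interval's half-width (assuming the usual normal approximation), and adds a simplified present-value check. The 8% required return and the other rates are hypothetical values chosen only for illustration; the check ignores dividends and uses only the point forecast, so it is not a full DCF analysis.

```r
# Figures reported in the last code chunk (USD), relative to the last observation series[37]
last_close <- 521.40
change <- c(low95 = -71.37899, avg = 80.77739, high95 = 232.93380)

# Projected levels 12 months ahead and changes as a percentage of the last close
projected <- last_close + change
pct <- 100 * change / last_close
round(projected, 2)   # low95 450.02, avg 602.18, high95 754.33
round(pct, 1)         # low95 -13.7, avg 15.5, high95 44.7

# The interval is symmetric around the mean, so its half-width implies the
# forecast standard error at h = 12 under a normal approximation
half_width <- projected[["high95"]] - projected[["avg"]]
se_12 <- half_width / qnorm(0.975)

# Simplified present-value check: HYPOTHETICAL 8% required annual return,
# no dividends, point forecast only (not a full DCF analysis)
required_return <- 0.08
max_price <- projected[["avg"]] / (1 + required_return)
round(max_price, 2)   # 557.57

# Sensitivity of the maximum price to the assumed rate (all rates hypothetical)
rates <- c(0.05, 0.08, 0.12)
round(projected[["avg"]] / (1 + rates), 2)
```

Under the hypothetical 8% rate, the discounted forecast mean is above the last observed price, which is consistent with the buy recommendation; a higher required return or a more pessimistic forecast would lower the maximum price.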

 

Submit a file with your work including the main points of the guideline. Along with the written results and recommendation, you should also submit your code with annotations.  

Evaluation metrics are going to consider:
1. How you determine the test or approach you are going to use.
Answer: The dataset contains 37 monthly observations of two variables, Month and Value. Autocorrelation occurs when the residuals are not independent of each other. This typically happens with stock prices, where the current price depends on previous prices. The Durbin-Watson test can be used to check for this kind of autocorrelation. Because ordinary regression assumes independent errors, an autoregressive integrated moving average (ARIMA) model, which explicitly models the dependence on past values and past errors, is fitted instead to understand the time series and forecast future points.
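The Durbin-Watson test is mentioned here but not run in the code. A minimal sketch of the statistic, computed directly from its definition on the residuals of a linear-trend regression, follows. The 37 values are simulated (a random walk with drift chosen for illustration), not the original data. Packages such as lmtest provide dwtest() with a p-value; base R's Box.test() gives a related Ljung-Box check.

```r
set.seed(23)
# SIMULATED stand-in: 37 monthly prices following a random walk with drift (hypothetical)
value <- 400 + cumsum(rnorm(37, mean = 3, sd = 10))
month <- seq_along(value)

fit <- lm(value ~ month)   # ordinary regression on a linear time trend
e   <- residuals(fit)

# Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation,
# values well below 2 suggest positive autocorrelation
dw <- sum(diff(e)^2) / sum(e^2)
dw

# Ljung-Box test on the same residuals (base R); H0: no autocorrelation up to lag 1
Box.test(e, lag = 1, type = "Ljung-Box")
```

For a random walk, the detrended residuals typically stay on one side of the trend line for long stretches, which pushes the statistic well below 2 and is the pattern that motivates a time series model.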

2. Explicitly mention what you need to use for this approach, e.g., the parameters, statistics, predictors necessary.
Answer: Time series data may be nonstationary and may have time-varying volatility. The ARIMA model's two major assumptions are that the data are (1) stationary, after any differencing, and (2) univariate. The first step is to run a unit root test, such as the Augmented Dickey-Fuller (ADF) test, to check whether the series is driven by a stochastic trend. The ADF test's null hypothesis is that the series has a unit root, i.e., that it is nonstationary. If the null cannot be rejected, the next step is to remove the trend and seasonality, for example by differencing, to eliminate the nonstationarity.

A nonseasonal ARIMA model is denoted ARIMA(p,d,q), where p is the order of the autoregressive (“AR”) part, d is the degree of differencing (“I”), and q is the order of the moving average (“MA”) part. The parameter p means that the current value is regressed on its own p lagged values. The parameter d means that the series has been replaced by its differences, and differencing may be applied more than once. The parameter q means that the regression error is a linear combination of the q most recent past error terms. A seasonal ARIMA model is denoted ARIMA(p,d,q)(P,D,Q)[m], where P, D, and Q are the seasonal counterparts of p, d, and q, and m is the seasonal period (frequency).
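The two ideas in this answer can be written out as equations. The ADF line assumes the constant-and-trend regression used by tseries::adf.test() with its default lag order k = floor((n - 1)^(1/3)), which gives k = 3 for the 37 observations and k = 2 for the 17 values left after the lag-20 difference, matching the "Lag order" values printed in the unit root section. The ARIMA lines follow the sign convention of R's arima(), where MA terms enter with a plus sign, so the fitted polynomial uses the coefficients printed by summary(model) directly.

```latex
% ADF regression (constant, linear trend, k lagged differences)
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0:\ \gamma = 0 \ (\text{unit root}), \quad H_1:\ \gamma < 0 \ (\text{stationary})

% Nonseasonal ARIMA(p,d,q) with backshift operator B (B y_t = y_{t-1})
\phi(B)\,(1-B)^{d}\, y_t = c + \theta(B)\,\varepsilon_t, \qquad
\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^{p}, \qquad
\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^{q}

% Seasonal ARIMA(p,d,q)(P,D,Q)[m]
\phi(B)\,\Phi(B^{m})\,(1-B)^{d}(1-B^{m})^{D}\, y_t = c + \theta(B)\,\Theta(B^{m})\,\varepsilon_t

% Fitted ARIMA(0,2,2) from summary(model) (no constant term reported)
(1-B)^{2}\, y_t = (1 - 0.5789\,B - 0.2724\,B^{2})\,\varepsilon_t
\;\Longleftrightarrow\;
y_t = 2y_{t-1} - y_{t-2} + \varepsilon_t - 0.5789\,\varepsilon_{t-1} - 0.2724\,\varepsilon_{t-2}
```

The ADF statistic is the t-ratio on gamma, but it is compared with Dickey-Fuller critical values rather than the Student t distribution, because under the null the regressor y_{t-1} is nonstationary.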

3. Report the results of your approach.
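Answer: The ADF p-value (0.9878) is not significant at an alpha level of 0.05, so we fail to reject the null hypothesis of a unit root: the original series is nonstationary. The ADF test first indicates stationarity (p = 0.01) for a difference at lag 20, computed as diff(series, 20), i.e., y[t] - y[t-20]. This is a single difference at lag 20, not 20th-order differencing, and it leaves only 17 of the 37 observations, so a model built on it may already have lost much of its accuracy. The ACF and PACF are then used to choose the parameters q and p, respectively. In Figure 3 and Figure 4, computed on the lag-20 difference, some spikes are statistically significant at lags 2 and 3 in the ACF and at lag 1 in the PACF. However, fitting an ARIMA(1,20,0) model, which requests 20th-order differencing rather than a lag-20 difference, fails with the error "initial value in 'vmmin' is not finite".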

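An automatically selected model, ARIMA(0,2,2), is fitted instead with auto.arima(). However, the ADF checks did not indicate stationarity at d = 2, which may imply that an ARIMA model does not fit these observations well. Because those checks used lag-k differences rather than d-th order differences, and auto.arima() selects d with a KPSS test by default, the two results are not directly comparable.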
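The distinction between a lag-k difference and d-th order differencing can be checked directly in base R: diff() takes lag and differences as separate arguments, and diff(series, 20) sets lag = 20. The toy vector below is illustrative only. The last lines show that, for white noise, applying (1 - B) twenty times multiplies the variance by choose(40, 20), roughly 1.4e11. This is offered as one plausible contributor to the numerical failure of ARIMA(1,20,0), not a confirmed diagnosis.

```r
x <- c(1, 4, 9, 16, 25, 36)          # illustrative toy series (squares)

diff(x, lag = 2)                      # x[t] - x[t-2]: 8 12 16 20
diff(x, differences = 2)              # (1 - B)^2 x[t]: 2 2 2 2

# Both operations leave 17 of 37 values, but they are different transformations
length(diff(1:37, lag = 20))          # 17
length(diff(1:37, differences = 20))  # 17

# Variance inflation of 20th-order differencing applied to white noise:
# Var((1 - B)^20 e_t) = sum_k choose(20, k)^2 * sigma^2 = choose(40, 20) * sigma^2
inflation <- sum(choose(20, 0:20)^2)
inflation                             # about 1.38e11
inflation == choose(40, 20)           # TRUE
```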

4. The written communication of the process and final recommendation.
Answer: Figure 6 shows the forecasts from the ARIMA(0,2,2) model. The black line is the observed series, the red line is the fitted values, and the blue line is the 12-month forecast with shaded 80% and 95% confidence intervals. The stock price appears to have a rising tendency in the long run, so a buy rating is recommended, although the lower end of the 95% interval lies below the current price.

 

Descriptive Analysis

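library(readxl)    # read_excel()
library(ggplot2)   # ggplot()
library(forecast)  # tsclean(), auto.arima(), forecast()
library(tseries)   # adf.test()
data <- read_excel("~/Documents/HU/ANLY 525-50-B/final exam/Final Exam 525 - Analysis Question A.xlsx")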
sum(is.na(data)) # no missing values, so tsclean() is not needed
## [1] 0
#data$Value <- tsclean(data$Value)
series <- ts(data$Value, frequency = 12) # monthly time series: 12 observations per year

ggplot(data, aes(Month, Value)) + geom_line() + geom_point() +
  xlab("Trading Month Indices") + ylab("Stock Prices (USD)") +
  ggtitle("Figure 1. Stock Trading Price in Three Years") + theme_classic()

 

Unit root test

The Augmented Dickey-Fuller (ADF) unit root test checks whether the series has a unit root, i.e., a stochastic trend that makes it nonstationary. If it does, taking differences can remove the trend and seasonality, and taking logarithms can stabilize a variance that grows with the level.
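A minimal sketch of the test, transform, retest loop follows. It uses base R's Phillips-Perron test, PP.test(), which has the same unit-root null hypothesis, in place of tseries::adf.test() so that it runs without extra packages. The series is simulated (multiplicative growth chosen for illustration), not the original data.

```r
set.seed(54)
# SIMULATED stand-in: 37 monthly prices with multiplicative growth (hypothetical)
y <- ts(400 * exp(cumsum(rnorm(37, mean = 0.01, sd = 0.03))), frequency = 12)

log_y <- log(y)       # logarithm: stabilizes a variance that grows with the level
d1    <- diff(log_y)  # first difference: removes a stochastic (unit-root) trend

# Phillips-Perron unit-root test from base R (H0: unit root), used here in place of
# tseries::adf.test(); the p-value is interpolated from a table of critical values
p_level <- PP.test(log_y)$p.value
p_diff  <- PP.test(d1)$p.value
c(log_level = p_level, first_difference = p_diff)
```

A large p-value on the level and a small one on the first difference would point to d = 1; the log step alone leaves the trend in place.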

plot(decompose(series),xlab="Time (Frequency = 12 months)")

adf.test(series) # p = 0.9878: cannot reject the unit-root null, so the series is not stationary
## 
##  Augmented Dickey-Fuller Test
## 
## data:  series
## Dickey-Fuller = -0.23969, Lag order = 3, p-value = 0.9878
## alternative hypothesis: stationary
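# not stationary with no differencing or with lag-1 to lag-19 differences (not shown); lag 20 is the first that passes
# note: diff(series, 20) is a single difference at lag 20, i.e., series[t] - series[t-20], not 20th-order differencing
adf.test(diff(series, 20)) # p = 0.01, the smallest p-value adf.test() reports: stationary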
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(series, 20)
## Dickey-Fuller = -4.4364, Lag order = 2, p-value = 0.01
## alternative hypothesis: stationary

 

Autocorrelation function (ACF) and partial autocorrelation function (PACF) plots

par(mfrow=c(1,2))
acf(diff(data$Value, 20), lag.max = 30, main = "Figure 3. ACF Plot, lag-20 difference") # only 17 values remain, so acf() caps lag.max at 16
pacf(diff(data$Value, 20), lag.max = 30, main = "Figure 4. PACF Plot, lag-20 difference")
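With only 17 values after the lag-20 difference, the dashed significance bounds in Figures 3 and 4 are wide. The sketch below computes the approximate 95% bound that acf() and pacf() draw under a white-noise null, and then flags lags on a simulated stand-in series (hypothetical data, not the original values) whose sample autocorrelation exceeds it.

```r
# Approximate 95% bounds drawn by acf()/pacf() (white-noise null): +/- qnorm(0.975) / sqrt(n)
n_used <- 37 - 20                    # values left after diff(series, 20)
bound  <- qnorm(0.975) / sqrt(n_used)
round(bound, 3)                      # 0.475

# SIMULATED stand-in series: flag lags whose sample autocorrelation exceeds the bound
set.seed(80)
x <- diff(cumsum(rnorm(37)), lag = 20)
r <- drop(acf(x, lag.max = 10, plot = FALSE)$acf)[-1]   # drop lag 0
which(abs(r) > bound)                # lags that would appear "significant"
```

With bounds near plus or minus 0.48 and many lags inspected, an occasional spike can cross the line by chance, so the spikes at lags 1 to 3 are best read as tentative guidance for p and q.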

 

Fitting an ARIMA model

model <- auto.arima(series) # automatic order selection by AICc; d is chosen with a KPSS test by default
summary(model)
## Series: series 
## ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1      ma2
##       -0.5789  -0.2724
## s.e.   0.1589   0.1488
## 
## sigma^2 estimated as 109.7:  log likelihood=-131.41
## AIC=268.83   AICc=269.6   BIC=273.49
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 2.035183 9.889485 7.313185 0.4523609 1.620453 0.2303072
##                     ACF1
## Training set -0.05102664
tsdiag(model) # residual diagnostics: standardized residuals, residual ACF, Ljung-Box p-values
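The information criteria in summary(model) can be reproduced from the printed log-likelihood. The sketch assumes that the parameter count is the number of coefficients plus one for the innovation variance, and that n minus d observations are used after differencing. Under these assumptions, the recomputed values agree with the printed AIC, AICc, and BIC to within the rounding of the log-likelihood.

```r
# Values reported by summary(model) for ARIMA(0,2,2)
loglik <- -131.41
n      <- 37            # observations
d      <- 2             # order of differencing
npar   <- 2 + 1         # ma1, ma2, plus the innovation variance sigma^2

n_used <- n - d                                             # 35 observations after differencing
aic  <- -2 * loglik + 2 * npar                              # 268.82
aicc <- aic + 2 * npar * (npar + 1) / (n_used - npar - 1)   # about 269.59
bic  <- aic + npar * (log(n_used) - 2)                      # about 273.49
round(c(AIC = aic, AICc = aicc, BIC = bic), 2)
```

The small-sample correction in AICc adds about 0.77 here, which matters when auto.arima() compares candidate models fitted to only 35 differenced values.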

#model <- arima(series, order=c(1,20,0)) # d = 20 differences the series 20 times; this attempt fails with:
# Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE,  : 
#   initial value in 'vmmin' is not finite
# In addition: Warning message:
# In log(s2) : NaNs produced

“Fitting the ARIMA model with Maximum Likelihood (method = "ML") requires optimising (minimising) the ARIMA model negative log-likelihood over the parameters. This turns out to be a constrained optimisation problem as the parameters must result in a stationary model. This nonlinear constraint is accounted for with the negative log-likelihood returning Inf (infinity) if the constraint is not satisfied. If the MLE is near the boundary of the constraint, evaluation of the negative log-likelihood near the MLE could return infinity. As the hessian is obtained with numerical differentiation by evaluating the negative log-likelihood near the MLE this can result in the nonfinite finite difference error you obtained. So if the hessian is not required put hessian = FALSE. Otherwise, this error depends on the MLE solution so an alternative optimisation algorithm (Nelder-Mead) might return an MLE sufficiently far from the boundary of the constraint that the error is avoided.” Reference
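Because diff(series, 20) is a lag-20 difference, the model suggested by Figures 3 and 4 can be written without 20th-order differencing as ARIMA(1,0,0)(0,1,0)[20], i.e., an AR(1) on y[t] - y[t-20]. The sketch below fits that specification with base R's stats::arima() on a simulated stand-in series. The period of 20 is derived from the lag used above and is not a seasonal period claimed by the analysis. stats::arima() also has an optim.method argument (default "BFGS", the method whose internal routine is vmmin), which is where the quoted suggestion to try Nelder-Mead could be applied.

```r
set.seed(112)
# SIMULATED stand-in for the 37 monthly values (hypothetical)
y <- ts(400 + cumsum(rnorm(37, mean = 3, sd = 10)), frequency = 12)

# A lag-20 difference, y[t] - y[t-20], is a seasonal difference with period 20,
# so an AR(1) on diff(y, 20) corresponds to ARIMA(1,0,0)(0,1,0)[20]
fit_lag20 <- arima(y, order = c(1, 0, 0),
                   seasonal = list(order = c(0, 1, 0), period = 20))
fit_lag20
fit_lag20$nobs   # observations used after the seasonal difference
```

Only 17 values enter the likelihood, so even when this fit succeeds its estimates are imprecise, which supports the decision to fall back on auto.arima().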

 

Prediction

pred <- forecast(model,h=12) #prediction for the next 12 months
low95 <- pred$lower[, "95%"] # lower 95% bound for each of the 12 months
high95 <- pred$upper[, "95%"] # upper 95% bound for each of the 12 months
avg <- pred$mean # point forecasts

par(mfrow=c(1,1))
plot(pred,xlab="Time (years, frequency = 12)",ylab="Stock Prices (USD)",main="Figure 6. Stock Trading Price in Four Years\n Forecasts from ARIMA(0,2,2)",lwd=2) # x-axis is in years because series has frequency 12
lines(pred$fitted,col="red")
legend("topleft",legend=c("Fitted","Predicted","Observed"),col=c("red","blue","black"),lty=c(1,1,1),lwd=c(2,2,2),bty="n")

low95[12]-series[37] # change at the low end of the 95% interval (a loss)
## [1] -71.37899
high95[12]-series[37] # change at the high end of the 95% interval (a gain)
## [1] 232.9338
avg[12]-series[37] # expected change (average gain)
## [1] 80.77739
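The 95% bounds used above come from a Gaussian forecast interval, point forecast plus or minus about 1.96 standard errors. The sketch below reproduces that construction with base R's predict() on an ARIMA(0,2,2) fitted to a simulated stand-in series (hypothetical data). forecast() applies the same normal-quantile construction for ARIMA models as far as we understand, so treat this as an illustration rather than a reimplementation of that package.

```r
set.seed(133)
# SIMULATED stand-in: a twice-integrated series of 37 monthly values (hypothetical)
y <- ts(400 + cumsum(cumsum(rnorm(37, mean = 0.1, sd = 2))), frequency = 12)

fit <- arima(y, order = c(0, 2, 2))   # same order as the model chosen above
fc  <- predict(fit, n.ahead = 12)     # $pred: point forecasts, $se: standard errors

z <- qnorm(0.975)                     # about 1.96 for a two-sided 95% interval
lower95 <- fc$pred - z * fc$se
upper95 <- fc$pred + z * fc$se

# Month-12 forecast and bounds, comparable to avg[12], low95[12], high95[12] above
c(mean = fc$pred[12], low95 = lower95[12], high95 = upper95[12])
```

With d = 2, the standard errors grow quickly with the horizon, which is why the 12-month interval in Figure 6 is so much wider than the fitted errors (RMSE about 9.9) would suggest.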