The stock market is a public, organized environment for trading securities such as stocks, stock options, and real estate funds. Transactions can take place on stock exchanges or in so-called over-the-counter (OTC) markets, where securities not listed on an exchange are traded directly between parties, within the rules set by law and regulation.
The motivation for this study is to explore the nature of financial data. I therefore chose to analyze the stock price of a well-known company whose shares are affected by broad market trends. The goal of this project is to uncover interesting characteristics of the company's stock price and to develop a suitable forecasting model.
In the stock market, purchase and sale transactions are carried out for profit. In this analysis, we begin by fitting a linear regression model to predict the stock's closing price over a period of time.
First, it is important to establish what we are aiming to solve. Broadly, stock market analysis is divided into two parts: fundamental analysis and technical analysis.
Our focus will be on technical analysis, for which we will use a dataset of Microsoft stock prices from April 2015 to April 2021 to build a model capable of estimating the closing stock price.
The data originally comes from an open dataset on Kaggle, which was gathered using Google Sheets. It is OHLC (Open, High, Low, Close) data, recording the opening price, highest price, lowest price, and closing price for each trading day, along with the traded volume.
library(ggplot2)
library(mltools)
library(forecast)
library(tseries)
library(xts)
library(prophet)
library(tidyverse)
library(caret)
data <- read.csv("E:/Forcasting/MS_Stock_data/Microsoft_Stock.csv")
#Viewing the structure of data
str(data)
## 'data.frame': 1511 obs. of 6 variables:
## $ Date : chr "4/1/2015 16:00:00" "4/2/2015 16:00:00" "4/6/2015 16:00:00" "4/7/2015 16:00:00" ...
## $ Open : num 40.6 40.7 40.3 41.6 41.5 ...
## $ High : num 40.8 40.7 41.8 41.9 41.7 ...
## $ Low : num 40.3 40.1 40.2 41.3 41 ...
## $ Close : num 40.7 40.3 41.5 41.5 41.4 ...
## $ Volume: int 36865322 37487476 39223692 28809375 24753438 25723861 28022002 30276692 24244382 27343581 ...
#Converting date column to date format
data['Date'] <- as.Date(data$Date, "%m/%d/%Y")
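#Optional sanity check (a minimal sketch): confirm the conversion produced Date
#objects and that the series spans roughly April 2015 to April 2021
class(data$Date)
range(data$Date)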
#Checking for missing values
colSums(is.na(data))
## Date Open High Low Close Volume
## 0 0 0 0 0 0
There is no missing data in the dataset.
ggplot(data = data, aes(x = Date, y = Close)) +
geom_line(colour = "purple") +
labs(title = "Stock Price of Microsoft over the Years",
subtitle = "2015-2021",
x = "Date", y = "Stock Price ($)")
The time series plot depicts the price of Microsoft stock. It shows that the price increases over time, although with considerable volatility. The series is therefore not stationary, since its mean and variance change over time.
The histogram below shows the distribution of the closing price, which is clearly not normal; the wide spread of values reflects the changing mean and variance over the period. Because of this violation of normality, using a linear model to predict the closing price is not a good choice (as the later analysis will confirm).
hist(data$Close, prob = T, col ="purple", breaks = 100,
xlim = c(min(data$Close) - 1 ,max(data$Close) + 1),
main = "Histogram of Stock Closing Price", xlab = "Closing Price")
To check normality we can also draw a Q-Q plot, shown below. It confirms that the distribution of the closing price is not normal.
ggplot(data=data, aes(sample = Close),) +
stat_qq() +
stat_qq_line(col='purple') + ggtitle('QQ plot of Closing Price')
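If a formal check is preferred in addition to the visual ones, a Shapiro-Wilk test can be applied to the closing prices. This is only a minimal sketch; with about 1,500 observations even small departures from normality will be flagged as significant.
shapiro.test(data$Close)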
Box plots of the numerical variables are used to check for the presence of outliers. The plots below suggest there are none.
par(mfrow = c(3,2), oma = c(1,1,0,0) + 0.1, mar = c(3,3,1,1) + 0.1)
boxplot(data$Open,main="Open")
boxplot(data$High,main="High")
boxplot(data$Low,main="Low")
boxplot(data$Close,main="Close")
boxplot(data$Volume,main="Volume")
sample_index <- 1:(nrow(data)*0.70)
train_data <- data[sample_index,]
test_data <- data[-sample_index,]
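#A quick check of the time-ordered 70/30 split (a minimal sketch): sizes and
#date ranges of the two partitions
nrow(train_data); range(train_data$Date)
nrow(test_data); range(test_data$Date)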
close.lm <- lm(Close ~ Date ,data = train_data)
summary(close.lm)
##
## Call:
## lm(formula = Close ~ Date, data = train_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.4561 -5.8235 -0.3349 5.4332 16.1337
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.616e+02 7.537e+00 -114.3 <2e-16 ***
## Date 5.418e-02 4.357e-04 124.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.272 on 1055 degrees of freedom
## Multiple R-squared: 0.9361, Adjusted R-squared: 0.936
## F-statistic: 1.546e+04 on 1 and 1055 DF, p-value: < 2.2e-16
Using 70% of the data as the training set, we fit a regression of closing price on date and obtain the results above. The p-value for the Date coefficient is below the significance level α = 0.05, so we have enough evidence to reject the null hypothesis and conclude that there is a significant linear relationship between date and closing price.
Using the test set (the remaining 30% of the data), we predict the closing price as follows.
predict_close <- predict(close.lm,newdata = test_data[c('Date')] )
test_data_pred <- test_data
test_data_pred['predict_close'] <- predict_close
Using the rmse() function from mltools, we compute the RMSE of the linear model on the test set; the value is about 54.
rmse_close_lm <- rmse(actuals = test_data_pred$Close,
preds = test_data_pred$predict_close)
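As a cross-check, the same RMSE can be computed directly from its definition; it should agree with the mltools value up to rounding.
sqrt(mean((test_data_pred$Close - test_data_pred$predict_close)^2))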
Below we plot the training data (purple), the actual test values (orange), and the predicted values from the regression (green). The plot shows that a linear model is not a good choice for this time series. Note also the drop in the actual prices in early 2020, when the Covid-19 pandemic began to affect the stock, which a straight regression line cannot capture. To improve our predictions we will try other modelling techniques, namely auto ARIMA and Prophet.
ggplot(data = train_data, aes(x = Date, y = Close)) +
geom_line(colour = "purple") +
geom_line(data = test_data, aes(x = Date, y = Close),colour = "orange") +
geom_line(data = test_data_pred, aes(x = Date, y = predict_close),colour = "green") +
labs(title = "Stock Price of Microsoft over the Years(Predicted)",
subtitle = "2015-2021",
x = "Date", y = "Stock Price ($)")
Before starting with ARIMA modeling, we check whether the closing price series is stationary. A series is stationary when its mean and variance are constant over time and its autocovariance depends only on the lag, not on time. From the "Stock Price of Microsoft over the Years" plot above, the closing price increases over time despite the volatility, so the series is not stationary: its mean and variance change over time.
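A quick visual check of the changing mean and variance is to plot rolling statistics of the closing price. The sketch below assumes the zoo package (a dependency of xts, which is loaded above) is available; the 60-day window is an arbitrary choice.
library(zoo)
roll_mean <- rollmean(train_data$Close, k = 60, fill = NA) #60-day rolling mean
roll_sd <- rollapply(train_data$Close, width = 60, FUN = sd, fill = NA) #60-day rolling SD
plot(train_data$Date, roll_mean, type = "l", main = "60-Day Rolling Mean of Closing Price", xlab = "Date", ylab = "Mean")
plot(train_data$Date, roll_sd, type = "l", main = "60-Day Rolling SD of Closing Price", xlab = "Date", ylab = "SD")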
We shall run ADF & KPSS tests to check for stationarity.
#Augmented Dickey-Fuller
adf.test(train_data$Close)
##
## Augmented Dickey-Fuller Test
##
## data: train_data$Close
## Dickey-Fuller = -1.5753, Lag order = 10, p-value = 0.7581
## alternative hypothesis: stationary
From the results of the ADF test, the p-value is higher than the 0.05 significance level, so we cannot reject the null hypothesis of a unit root and conclude that the closing price is not stationary.
#KPSS test
kpss.test(train_data$Close)
##
## KPSS Test for Level Stationarity
##
## data: train_data$Close
## KPSS Level = 12.768, Truncation lag parameter = 7, p-value = 0.01
The KPSS test rejects its null hypothesis of level stationarity (p = 0.01), again indicating that the closing price is not stationary. We therefore apply first-order differencing to remove the non-stationarity and verify the result by plotting the differenced closing prices. (The differenced series looks stationary for most of the period, with some instability near the end due to the effect of the Covid-19 pandemic.)
plot(diff(train_data$Close), type="l",main="Microsoft Stock Closing price",ylab="Close Differences",xlab="Days")
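As a cross-check on the amount of differencing required, the forecast package provides ndiffs(), which estimates the number of differences needed for stationarity; for this series we would expect it to return 1. (A minimal sketch.)
ndiffs(train_data$Close)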
Now, we will perform stationarity tests on differenced data.
#Augmented Dickey-Fuller
adf.test(diff(train_data$Close))
##
## Augmented Dickey-Fuller Test
##
## data: diff(train_data$Close)
## Dickey-Fuller = -11.6, Lag order = 10, p-value = 0.01
## alternative hypothesis: stationary
#KPSS
kpss.test(diff(train_data$Close))
##
## KPSS Test for Level Stationarity
##
## data: diff(train_data$Close)
## KPSS Level = 0.16595, Truncation lag parameter = 7, p-value = 0.1
On the differenced data the ADF test rejects the unit-root null (p = 0.01) and the KPSS test fails to reject level stationarity (p ≥ 0.1), so both tests point to stationarity. As an additional confirmation, we run the Phillips-Perron test on the differenced data.
pp.test(diff(train_data$Close),alternative="stationary")
## Warning in pp.test(diff(train_data$Close), alternative = "stationary"): p-value
## smaller than printed p-value
##
## Phillips-Perron Unit Root Test
##
## data: diff(train_data$Close)
## Dickey-Fuller Z(alpha) = -1072.1, Truncation lag parameter = 7, p-value
## = 0.01
## alternative hypothesis: stationary
The differenced series is stationary, so we now plot the ACF and PACF to study its behavior. ARIMA models use past values to predict future values, and they have three important parameters:
- p: the order of the autoregressive (AR) part
- d: the degree of differencing
- q: the order of the moving average (MA) part
From the below ACF & PACF plots we can estimate the parameters p & q.
acf(diff(train_data$Close))
pacf(diff(train_data$Close))
From the ACF plot we can also infer that there is no seasonality in the data, which is what we would expect for Microsoft's stock series. The ACF pattern suggests a moving average component, and together the ACF and PACF plots let us make a rough estimate of the order of the series. We therefore build the following ARIMA models; the one with the lowest AIC and BIC values will be our preferred choice.
#ARIMA(0,1,1)
model1 <- Arima(train_data$Close,order=c(0,1,1),include.constant=T)
model1
## Series: train_data$Close
## ARIMA(0,1,1) with drift
##
## Coefficients:
## ma1 drift
## -0.1400 0.0860
## s.e. 0.0329 0.0311
##
## sigma^2 = 1.381: log likelihood = -1668.01
## AIC=3342.01 AICc=3342.03 BIC=3356.9
checkresiduals(model1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1) with drift
## Q* = 35.845, df = 8, p-value = 1.874e-05
##
## Model df: 2. Total lags used: 10
#ARIMA(0,1,2)
model2 <- Arima(train_data$Close,order=c(0,1,2),include.constant=T)
model2
## Series: train_data$Close
## ARIMA(0,1,2) with drift
##
## Coefficients:
## ma1 ma2 drift
## -0.1191 -0.0945 0.0860
## s.e. 0.0311 0.0331 0.0283
##
## sigma^2 = 1.372: log likelihood = -1664.01
## AIC=3336.03 AICc=3336.07 BIC=3355.88
checkresiduals(model2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,2) with drift
## Q* = 27.397, df = 7, p-value = 0.0002827
##
## Model df: 3. Total lags used: 10
#ARIMA(0,1,3)
model3 <- Arima(train_data$Close,order=c(0,1,3),include.constant=T)
model3
## Series: train_data$Close
## ARIMA(0,1,3) with drift
##
## Coefficients:
## ma1 ma2 ma3 drift
## -0.1184 -0.0926 0.0483 0.0862
## s.e. 0.0307 0.0327 0.0313 0.0301
##
## sigma^2 = 1.37: log likelihood = -1662.83
## AIC=3335.65 AICc=3335.71 BIC=3360.46
checkresiduals(model3)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,3) with drift
## Q* = 22.948, df = 6, p-value = 0.0008142
##
## Model df: 4. Total lags used: 10
#ARIMA(0,1,4)
model4 <- Arima(train_data$Close,order=c(0,1,4),include.constant=T)
model4
## Series: train_data$Close
## ARIMA(0,1,4) with drift
##
## Coefficients:
## ma1 ma2 ma3 ma4 drift
## -0.1313 -0.0867 0.0540 -0.0730 0.0858
## s.e. 0.0312 0.0312 0.0312 0.0355 0.0274
##
## sigma^2 = 1.366: log likelihood = -1660.74
## AIC=3333.47 AICc=3333.55 BIC=3363.25
checkresiduals(model4)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,4) with drift
## Q* = 19.534, df = 5, p-value = 0.001528
##
## Model df: 5. Total lags used: 10
Tuning the ARIMA parameters by hand is time-consuming, so we also use auto.arima, which automatically searches for the combination of (p, d, q) that gives the best fit (lowest AICc by default).
#Auto ARIMA
model5<-auto.arima((train_data$Close), seasonal=FALSE)
model5
## Series: (train_data$Close)
## ARIMA(3,1,3) with drift
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 drift
## 0.1546 -0.3164 0.6935 -0.2780 0.2418 -0.6800 0.0846
## s.e. 0.1104 0.0788 0.0706 0.1165 0.0857 0.0766 0.0216
##
## sigma^2 = 1.346: log likelihood = -1651.89
## AIC=3319.78 AICc=3319.91 BIC=3359.47
checkresiduals(model5)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,3) with drift
## Q* = 3.4724, df = 3, p-value = 0.3244
##
## Model df: 7. Total lags used: 10
For the ARIMA(3,1,3) model the Ljung-Box test is not significant (p = 0.32), so its residuals are indistinguishable from white noise; the simpler MA models above all show significant residual autocorrelation.
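The Ljung-Box statistic reported by checkresiduals() can also be computed directly with the base Box.test() function; this minimal sketch uses lag = 10 and fitdf = 7 to match the lags and the seven estimated coefficients above.
Box.test(residuals(model5), lag = 10, type = "Ljung-Box", fitdf = 7)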
The table below lists the AIC and BIC values of all the ARIMA models built (a sketch for assembling such a table from the fitted model objects follows it). The ARIMA(3,1,3) model chosen by auto.arima has the lowest AIC, although its BIC is slightly higher than that of the simpler MA models, and it is the only model whose residuals pass the Ljung-Box test. We therefore proceed with ARIMA(3,1,3).
| models_name | models_AIC | models_BIC |
|---|---|---|
| ARIMA(0,1,1) | 3342.01 | 3356.90 |
| ARIMA(0,1,2) | 3336.03 | 3355.88 |
| ARIMA(0,1,3) | 3335.65 | 3360.46 |
| ARIMA(0,1,4) | 3333.47 | 3363.25 |
| Auto ARIMA(3,1,3) | 3319.78 | 3359.47 |
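The comparison table can also be assembled programmatically from the fitted model objects. A minimal sketch, assuming the Arima fits expose their aic and bic components (as forecast's Arima objects normally do):
models <- list("ARIMA(0,1,1)" = model1, "ARIMA(0,1,2)" = model2,
               "ARIMA(0,1,3)" = model3, "ARIMA(0,1,4)" = model4,
               "Auto ARIMA(3,1,3)" = model5)
data.frame(models_name = names(models),
           models_AIC = sapply(models, function(m) m$aic),
           models_BIC = sapply(models, function(m) m$bic))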
The multi-step forecast below projects the next 150 trading days using only the training data, without reference to the test set.
plot(forecast(model5,h=150),main="ARIMA(3,1,3) Forecast",ylab="Closing Price",xlab="Date")
The accuracy of the model can be seen below. Applying the fitted ARIMA(3,1,3) model to the test data (one-step errors with the parameters held fixed) gives an RMSE of about 3.9.
test_pred <- Arima(test_data$Close, model=model5)
accuracy(test_pred)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.2346034 3.856331 2.640417 0.09741845 1.456463 0.9915436
## ACF1
## Training set -0.2150405
Prophet, developed and open-sourced by Facebook, is a time series forecasting library that requires very little data preprocessing and is extremely simple to use. Its input is a dataframe with two columns, the date and the target, named ds and y. Prophet tries to capture the seasonality in the historical data and works well when the dataset is large.
prophet_data = train_data %>%
rename(ds = Date, y = Close)
p_model = prophet(prophet_data)
future = make_future_dataframe(p_model,periods = 365)
forecast = predict(p_model,future)
plot(p_model,forecast) +
ylab("Closing Price") + xlab("Date") + theme_bw()
dyplot.prophet(p_model,forecast)
Time series data can exhibit a variety of patterns, and it is often helpful to split a series into several components, each representing an underlying pattern category. The plots below visualize the components of the fitted Prophet model.
prophet_plot_components(p_model,forecast)
plot(p_model,forecast) +
add_changepoints_to_plot(p_model) +
theme_bw() +
ylab("Closing Price") +
xlab("Date")
There are essentially two ways to model the seasonality of a time series: additive and multiplicative. In the additive model the seasonal effect is constant over time: the series changes by the same absolute amount in each cycle, on top of the trend.
The model fitted above uses the additive seasonality assumption, since the default value of the seasonality.mode parameter is 'additive'.
In the multiplicative model the seasonal effect scales with the level of the series: the trend and seasonal components are multiplied before the error component is added, so the seasonal swings grow (or shrink) with the trend and the pattern follows a curved rather than a straight line.
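As a schematic illustration only (not Prophet's internal model), the sketch below shows how a hypothetical trend and seasonal component combine under the two assumptions:
#Additive: y(t) = trend(t) + seasonal(t) + error(t)
#Multiplicative: y(t) = trend(t) * (1 + seasonal(t)) + error(t)
day <- 1:730
trend <- 40 + 0.05 * day #hypothetical linear trend
seasonal <- sin(2 * pi * day / 365) #hypothetical yearly cycle
y_add <- trend + 5 * seasonal #seasonal swing stays a constant amount
y_mult <- trend * (1 + 0.1 * seasonal) #seasonal swing grows with the trend level
matplot(day, cbind(y_add, y_mult), type = "l", lty = 1, xlab = "Day", ylab = "Value", main = "Additive vs Multiplicative Seasonality")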
Let's fit a multiplicative model and perform forecasting below:
multi = prophet(prophet_data,seasonality.mode = "multiplicative")
multi_forecast = predict(multi,future)
dyplot.prophet(multi,multi_forecast)
prophet_plot_components(multi,multi_forecast)
We can see that the multiplicative model does not forecast well here: it produces negative stock prices, which is impossible.
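A quick way to check this is to count how many of the forecasted values (or their lower bounds) fall below zero (a minimal sketch):
sum(multi_forecast$yhat < 0) #negative point forecasts
sum(multi_forecast$yhat_lower < 0) #negative lower prediction bounds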
model_holidays = prophet(prophet_data,fit = FALSE)
model_holidays = add_country_holidays(model_holidays,country_name = "US")
model_holidays = fit.prophet(model_holidays,prophet_data)
forecast_holidays = predict(model_holidays,future)
prophet_plot_components(model_holidays,forecast_holidays)
forecast_holidays %>%
filter(holidays != 0) %>%
dplyr::select(-ds:-additive_terms_upper, -holidays:-holidays_upper, -weekly:-yhat, -contains("upper"), -contains("lower")) %>%
mutate_all(~ if_else(. == 0, as.numeric(NA), .)) %>%
summarize_if(is.numeric, ~ max(., na.rm = T)) %>%
pivot_longer(
cols = `Christmas Day`:`Washington's Birthday`,
names_to = 'holiday',
values_to = 'effect'
) %>%
ggplot() +
geom_col(aes(effect,holiday))+
theme_bw()
From the plot above, Veterans Day and Columbus Day appear to be the holidays with the largest estimated effect on Microsoft's stock price.
We will calculate RMSE, MAE, and MAPE to assess the model's performance on the test data. Before that, we extend the forecasting horizon so that it covers all the dates available in the test dataset.
test_dates = test_data[,1]
future_2 = make_future_dataframe(p_model,periods = 700)
forecast_2 = predict(p_model,future_2)
forecast_metric_data = forecast_2 %>%
as_tibble() %>%
mutate(ds = as.Date(ds)) %>%
filter(ds %in% (test_dates))
RMSE = sqrt(mean((test_data$Close - forecast_metric_data$yhat)^2))
MAE = mean(abs(test_data$Close - forecast_metric_data$yhat))
MAPE = mean(abs((test_data$Close - forecast_metric_data$yhat)/test_data$Close))
print(paste("RMSE:",round(RMSE,2)))
## [1] "RMSE: 39.93"
print(paste("MAE:",round(MAE,2)))
## [1] "MAE: 32.91"
print(paste("MAPE:",round(MAPE,2)))
## [1] "MAPE: 0.16"
df.cv <- cross_validation(p_model, initial = 730, period = 30, horizon = 30, units = 'days')
head(df.cv)
## y ds yhat yhat_lower yhat_upper cutoff
## 1 67.53 2017-04-24 65.29656 63.91355 66.66080 2017-04-23
## 2 67.92 2017-04-25 65.18932 63.79759 66.56335 2017-04-23
## 3 67.83 2017-04-26 65.18328 63.84720 66.61522 2017-04-23
## 4 68.27 2017-04-27 65.05404 63.60974 66.42490 2017-04-23
## 5 68.46 2017-04-28 64.97823 63.72417 66.29444 2017-04-23
## 6 69.41 2017-05-01 64.66387 63.20180 66.14533 2017-04-23
unique(df.cv$cutoff)
## [1] "2017-04-23 GMT" "2017-05-23 GMT" "2017-06-22 GMT" "2017-07-22 GMT"
## [5] "2017-08-21 GMT" "2017-09-20 GMT" "2017-10-20 GMT" "2017-11-19 GMT"
## [9] "2017-12-19 GMT" "2018-01-18 GMT" "2018-02-17 GMT" "2018-03-19 GMT"
## [13] "2018-04-18 GMT" "2018-05-18 GMT" "2018-06-17 GMT" "2018-07-17 GMT"
## [17] "2018-08-16 GMT" "2018-09-15 GMT" "2018-10-15 GMT" "2018-11-14 GMT"
## [21] "2018-12-14 GMT" "2019-01-13 GMT" "2019-02-12 GMT" "2019-03-14 GMT"
## [25] "2019-04-13 GMT" "2019-05-13 GMT"
df.cv %>%
ggplot()+
geom_point(aes(ds,y)) +
geom_point(aes(ds,yhat,color = factor(cutoff)))+
theme_bw()+
xlab("Date")+
ylab("Closing Price")+
scale_color_discrete(name = 'Cutoff')
We can generate a table of performance metrics for different forecast horizons.
performance_metrics(df.cv)
## horizon mse rmse mae mape mdape smape
## 1 3 days 9.258266 3.042740 2.246496 0.02292406 0.01577315 0.02288683
## 2 4 days 13.499756 3.674201 2.657491 0.02682402 0.01419638 0.02672558
## 3 5 days 17.746545 4.212665 3.090287 0.03099727 0.02163692 0.03090786
## 4 6 days 21.677617 4.655923 3.385956 0.03396338 0.02526102 0.03376395
## 5 7 days 26.101180 5.108931 3.674707 0.03698395 0.02805215 0.03666708
## 6 8 days 25.519320 5.051665 3.633358 0.03712603 0.02959705 0.03686874
## 7 9 days 26.443754 5.142349 3.683692 0.03807638 0.02959705 0.03779097
## 8 10 days 28.072759 5.298373 3.786991 0.03976921 0.02828739 0.03941697
## 9 11 days 29.256258 5.408905 3.823282 0.03981966 0.02828739 0.03951853
## 10 12 days 30.269546 5.501777 3.886161 0.03981611 0.02735668 0.03975067
## 11 13 days 30.061244 5.482814 3.972235 0.03963228 0.02560659 0.03992493
## 12 14 days 35.477348 5.956286 4.468172 0.04467092 0.03237950 0.04471008
## 13 15 days 35.122291 5.926406 4.453522 0.04536710 0.03547567 0.04527155
## 14 16 days 32.560974 5.706222 4.229467 0.04372185 0.03151711 0.04353098
## 15 17 days 31.532093 5.615344 4.173718 0.04302798 0.03317428 0.04301686
## 16 18 days 33.609581 5.797377 4.332762 0.04409168 0.03501626 0.04411983
## 17 19 days 32.769370 5.724454 4.252051 0.04316524 0.03141626 0.04323926
## 18 20 days 32.326262 5.685619 4.034710 0.04032537 0.02604089 0.04044021
## 19 21 days 32.759297 5.723574 4.112874 0.04092266 0.02893690 0.04090769
## 20 22 days 33.423048 5.781267 4.192185 0.04166204 0.03006686 0.04161086
## 21 23 days 29.822982 5.461042 4.175911 0.04219432 0.03565251 0.04217370
## 22 24 days 24.251086 4.924539 3.869859 0.03975605 0.03399120 0.03984401
## 23 25 days 25.797846 5.079158 4.003124 0.04078213 0.03414902 0.04081244
## 24 26 days 26.294472 5.127814 3.935535 0.03949667 0.03065525 0.03954483
## 25 27 days 29.319786 5.414775 4.100798 0.04018234 0.03414902 0.04036263
## 26 28 days 30.885771 5.557497 4.215261 0.04106138 0.03406208 0.04122067
## 27 29 days 33.044848 5.748465 4.385580 0.04316954 0.03406208 0.04343926
## 28 30 days 34.334708 5.859583 4.534927 0.04582468 0.03406208 0.04586384
## coverage
## 1 0.5332671
## 2 0.5241090
## 3 0.4558093
## 4 0.4011917
## 5 0.3333333
## 6 0.3878407
## 7 0.4276730
## 8 0.4039956
## 9 0.3704071
## 10 0.3843098
## 11 0.3878407
## 12 0.3096559
## 13 0.2907880
## 14 0.3360849
## 15 0.3187686
## 16 0.2939424
## 17 0.2830189
## 18 0.3584906
## 19 0.3584906
## 20 0.3700210
## 21 0.2907880
## 22 0.2889772
## 23 0.2899702
## 24 0.3783515
## 25 0.3892751
## 26 0.3396226
## 27 0.3140954
## 28 0.2704403
Now, we can plot visualizations for metrics like RMSE, MAE, MAPE, MDAPE, SMAPE.
plot_cross_validation_metric(df.cv, metric = 'rmse')
plot_cross_validation_metric(df.cv, metric = 'mae')
plot_cross_validation_metric(df.cv, metric = 'mape')
plot_cross_validation_metric(df.cv, metric = 'mdape')
plot_cross_validation_metric(df.cv, metric = 'smape')