1 Introduction and Background

The dataset I examined today tracked the day-to-day performance of Amazon’s stock from May 15, 1997 to October 27, 2021. These daily observations recorded the stock’s price at market opening and closing, as well as its high, low, adjusted closing rate and trading volume for the day.

For the purposes of my analysis, I chose to track the history of Amazon’s market performance on a month-by-month basis. I took the stock’s average opening price for each month, and constrained the timeline from January 2009 to October of 2021, a total of 154 samples of monthly data. I used this subset of the data to create a number of time series models with the hopes of formulating at least one that can forecast Amazon’s average market opening on a month-by-month timeline with relative accuracy.

This is not an associative analysis that attempts to greatly explain why the stock’s opening price has historically moved the way it has, but rather simply a predictive tool. The primary methods of forecast modeling I utilized include exponential smoothing, Holt and Holt-Winters calculations.

Data_Url= "https://raw.githubusercontent.com/ChrisB2323/STA321/refs/heads/main/Amazon.csv"
Original_Data = read_delim(Data_Url, delim = ",", show_col_types = FALSE)

#(glimpse(Original_Data))
  # Everything stored the right way ^^

#sum(is.na(Original_Data)) = 0
Data = mutate(Original_Data,
       Month = month(Date, label = TRUE),
       Year = year(Date),
  )

Data = group_by(Data, Month, Year)
Data = summarize(Data, mean(Open))
  # ^^ Since I am looking at month-by-month opens, need to summarize and group
Data = sort_by(Data, Data$Year, Data$Month)
  # ^^ Prioritized sorting, first sort by year (oldest first), and then by month of year (January to December)

# Restricting my analysis from Jan 2009 to Oct 2021, rounding opening price to nearest cent
Data = Data[141:294,]
colnames(Data)[3] = "Open"
Data$Open = round(Data$Open,2)  

Data$Season[Data$Month == "Dec" | Data$Month == "Jan" | Data$Month == "Feb"] = "Winter"
Data$Season[Data$Month == "Mar" | Data$Month == "Apr" | Data$Month == "May"] = "Spring"
Data$Season[Data$Month == "Jun" | Data$Month == "Jul" | Data$Month == "Aug"] = "Summer"
Data$Season[Data$Month == "Sep" | Data$Month == "Oct" | Data$Month == "Nov"] = "Fall"
  # Added variable Season to better analyze seasonality of data over time (more with this later)

Winter_Months = filter(Data, Season == "Winter")
Spring_Months = filter(Data, Season == "Spring")
Summer_Months = filter(Data, Season == "Summer")
Fall_Months = filter(Data, Season == "Fall")

2 Data Visualization

Before beginning model creation, I wanted to visualize two essential parts of the data at hand, its trend and seasonal components. These elements play an important role in understanding the model-to-model accuracy comparisons that will be examined later.

2.1 Trend

Below I plotted the chosen analysis interval (January 2009 to October 2021). What we see is a tale of two very different half-decades; with a rather mildly increasing trend from 2009 to about 2016, and then a large and exponential increase in market value from 2016 onward.

Time_Series = ts(Data$Open, frequency = 12, start = c(2009,1))

plot(Time_Series,
     main = "Amazon Stock Opening Price (2009 - 2021)",
     xlab = "Year",
     ylab = "Price ($USD)",
     lwd = 3)

2.2 Seasonality

While the trend component of Amazon’s stock opening is rather straightforward, it is also valuable to breakdown the seasonal aspect of its historical performance. Below is a table averaging its opening price per season for each respective year, with the total across all the cited years at the bottom.

# Making a season-by-season breakdown throughout the designated data range
Season_By_Season = group_by(Data, Season, Year)
Season_By_Season = summarize(Season_By_Season, mean(Open))
colnames(Season_By_Season)[3] = "Open"
Season_By_Season = sort_by(Season_By_Season, Season_By_Season$Year, Season_By_Season$Season)
Season_By_Season$Open = round(Season_By_Season$Open,2)

Years = data.frame(c("2009", "2010", "2011", "2012", "2013","2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021"))
colnames(Years) = "Year"

Winter_Seasons = Season_By_Season[Season_By_Season$Season == "Winter", ]
Spring_Seasons = Season_By_Season[Season_By_Season$Season == "Spring", ]
Summer_Seasons = Season_By_Season[Season_By_Season$Season == "Summer", ]
Fall_Seasons = Season_By_Season[Season_By_Season$Season == "Fall", ]

# Subsetting that season-by-season breakdown to look at each season throughout the years

All_Seasons = cbind(Years, Winter_Seasons$Open, Spring_Seasons$Open, Summer_Seasons$Open, Fall_Seasons$Open)
  colnames(All_Seasons) = c("Year", "Winter", "Spring", "Summer", "Fall")

# To have an averages for the seasons across the years
Season_Totals = data.frame("All Years",
      round(mean(Winter_Months$Open),2),
      round(mean(Spring_Months$Open),2),
      round(mean(Summer_Months$Open),2),
      round(mean(Fall_Months$Open),2))
  colnames(Season_Totals) = c("Year", "Winter", "Spring", "Summer", "Fall")
  invisible(as.numeric(Season_Totals[,(2:5)]))
  
All_Seasons = rbind(All_Seasons, Season_Totals)
kable(All_Seasons, caption = "Amazon Stock Opening Season-By-Season", align = "c")
Amazon Stock Opening Season-By-Season
Year Winter Spring Summer Fall
2009 83.52 74.82 83.55 104.99
2010 141.88 133.00 121.50 158.39
2011 182.57 183.51 202.11 218.72
2012 206.41 201.55 227.00 245.09
2013 308.55 263.60 287.88 330.99
2014 353.48 330.20 330.07 318.70
2015 448.61 398.81 477.79 581.10
2016 633.37 626.20 740.80 793.35
2017 936.48 906.07 991.98 1036.30
2018 1440.47 1536.16 1792.45 1797.06
2019 1681.56 1818.38 1872.48 1775.40
2020 2383.95 2156.80 2967.89 3189.74
2021 3237.10 3227.87 3427.83 3379.21
All Years 865.18 912.07 1040.26 1010.73

As we can see, the summer has historically been Amazon’s best season according to the market, averaging a market open of $1040.26. Fall’s yearly average trails slightly behind at $1010.73, with spring and winter’s averages being noticeably lower. Beyond that, it is also visible that the difference in the stock’s lowest and highest seasonal average tends to increase in magnitude as the years went on. In 2009, the difference between Amazon’s best and worst seasonal average was $30.17. In 2021, that same metric was $199.96. This increasing magnitude of in-year seasonal difference can be seen at greater ease with the plots below.

###### Make a plot of all the data
plot(x = Winter_Seasons$Year, y = Winter_Seasons$Open,
     main = "Amazon Stock Opening Season-By-Season",
     xlab = "Year",
     ylab = "Value($USD)")
  lines(x = Winter_Seasons$Year, y = Winter_Seasons$Open, col = "red", lwd = 1.5)
    points(x = Winter_Seasons$Year, y = Winter_Seasons$Open, col = "red", pch = 21, bg = "red")
    
  lines(x = Spring_Seasons$Year, y = Spring_Seasons$Open, col = "black", lwd = 1.5)
    points(x = Spring_Seasons$Year, y = Spring_Seasons$Open, col = "black", pch = 21, bg = "black")
    
  lines(x = Summer_Seasons$Year, y = Summer_Seasons$Open, col = "green", lwd = 1.5)
    points(x = Summer_Seasons$Year, y = Summer_Seasons$Open, col = "green", pch = 21, bg = "green")
    
  lines(x = Fall_Seasons$Year, y = Fall_Seasons$Open, col = "purple", lwd = 1.5)
    points(x = Fall_Seasons$Year, y = Fall_Seasons$Open, col = "purple", pch = 21, bg = "purple")
    
    legend("topleft",
       legend = c("Winter", "Spring", "Summer", "Fall"),
       col = c("red", "black", "green", "purple"),
       lty = 1, pch = 16, lwd = 2)

###### Make a reduced plot, 2015 to 2021
Reduced_Winter_Seasons = Winter_Seasons[7:13,]
Reduced_Spring_Seasons = Spring_Seasons[7:13,]
Reduced_Summer_Seasons = Summer_Seasons[7:13,]
Reduced_Fall_Seasons = Fall_Seasons[7:13,]

plot(x = Reduced_Winter_Seasons$Year, y = Reduced_Winter_Seasons$Open,
     main = "Amazon Stock Opening Season-By-Season \n (2015 - 2021)",
     xlab = "Year",
     ylab = "Value($USD)")
  lines(x = Reduced_Winter_Seasons$Year, y = Reduced_Winter_Seasons$Open, col = "red", lwd = 2)
    points(x = Reduced_Winter_Seasons$Year, y = Reduced_Winter_Seasons$Open, col = "red", pch = 21, bg = "red")
    
  lines(x = Reduced_Spring_Seasons$Year, y = Reduced_Spring_Seasons$Open, col = "black", lwd = 2)
    points(x = Reduced_Spring_Seasons$Year, y = Reduced_Spring_Seasons$Open, col = "black", pch = 21, bg = "black")
    
  lines(x = Reduced_Summer_Seasons$Year, y = Reduced_Summer_Seasons$Open, col = "green", lwd = 2)
    points(x = Reduced_Summer_Seasons$Year, y = Reduced_Summer_Seasons$Open, col = "green", pch = 21, bg = "green")
    
  lines(x = Reduced_Fall_Seasons$Year, y = Reduced_Fall_Seasons$Open, col = "purple", lwd = 2)
    points(x = Reduced_Fall_Seasons$Year, y = Reduced_Fall_Seasons$Open, col = "purple", pch = 21, bg = "purple")
    
   legend("topleft",
     legend = c("Winter", "Spring", "Summer", "Fall"),
     col = c("red", "black", "green", "purple"),
     lty = 1, pch = 16, lwd = 2)  

These plots offer a visual confirmation for the numerical patterns present in the table above. Graphically speaking, seasonal differences in average opening price are rather unnoticeable from 2009 to about 2016 or 2017. But as time continues beyond that point, the seasonal oscillation increases. This is a similar phenomenon to the trend component of the graph seen earlier, meaning that Amazon’s stock opening price season-by-season is inconsistent, and that inconsistency expands with the upward trend of the overall opening data.

3 Forecast Modeling

Now with a solid grasp of the data’s trend and seasonality, I moved forward and began the forecast modeling process. In total, I created eight models - one simple exponential smoothing, three Holt and four Holt-Winters.

3.1 Training and Testing Sets

Since it is poor statistical practice to test a model on the same data which it is trained/built on, I separated my data into two distinct subsets. The training dataset consisted of monthly observations from the nearly 12 year span of January 2009 to October 2020. Meanwhile, the testing dataset (the observations which my time series models will attempt to computationally predict or forecast), consists of monthly averages from November 2020 to October 2021.

Training_Data = Data[1:142,]
Testing_Data = Data[143:154,]

Time_Series_Training = ts(Training_Data$Open, frequency = 12, start = c(2009,1))
Time_Series_Testing = ts(Testing_Data$Open, frequency = 12, start = c(2020,11))

3.2 Model Creation

Setting each model’s forecast interval equal to 12, I constructed 8 time series models. A breakdown of each model type with a corresponding brief explanation is below.

  • Simple Exponential Smoothing (SES): Older observations are given an exponentially decreasing degree of weight in the calculation of forecasted values. This is done with the use of what is called smoothing constants. Forecasts from an SES model are all equal to the same value, as SES does not consider a dataset’s trend or seasonality, but rather simply calculates a weighted moving average.

  • Holt Additive: A generalization of the SES approach. Forecast calculations include the addition of the data’s trend parameter.

  • Holt Additive W/ Damp: A modification of the additive Holt approach, as it includes a damping parameter that reduces the absolute value of a trend’s increasing or decreasing effect on forecasts over time. This is a popular choice when it is believed that a trend will “flatten out” in the somewhat near future.

  • Holt Multiplicative W/ Damp: Carries the idea of its additive version, but the damping parameter is multiplied and takes the role of a proportional weight, rather than added as an absolute value. *Holt models do not explicitly include a dataset’s seasonal component in its calculations, however the multiplicative damp model is still seen as preferable to the additive when seasonal oscillations are inconsistent throughout the time length of the data.

  • Holt-Winters (HW) Additive: Parallel to the Holt additive model, but also takes into account the data’s seasonal oscillations.

  • HW Additive W/ Damp: Adds parameters representative of both the trend and seasonality respectively to the forecast calculations. Used when there is an assumption of a “flattening curve” phenomenon regarding the dataset’s trend.

  • HW Multiplicative:: Builds on the Holt multiplicative model by multiplying that calculation by a seasonal factor.

  • HW Multiplicative W/ Damp:: Used when we assume a decrease in the severity of data’s upward or downward trend. Factors in, via multiplication, both the response’s trend and seasonal oscillation.

# Use h = 12 for all these forecasts

##### Simple Exponential Smoothing
Simple_Model = ses(Time_Series_Training, h = 12) #Model 1
    Simple_Predictions = data.frame(round(Simple_Model$mean,2)) # the next 10 forecasts

##### Holt
  # "damped" is the assumption that the magnitude of time series' trend or seasonal components will DECREASE over time
Holt_Model_Add = holt(Time_Series_Training, h = 12, damped = FALSE) #Model 2
    Holt_Add_Predictions = data.frame(round(Holt_Model_Add$mean,2))
    
Holt_Model_Add_Damp = holt(Time_Series_Training, h = 12, damped = TRUE) #Model 3
    Holt_Add_Damp_Predictions = data.frame(round(Holt_Model_Add_Damp$mean,2))
    
Holt_Model_Mult_Damp = holt(Time_Series_Training, h = 12, exponential = TRUE, damped = TRUE) #Model 4
    Holt_Mult_Damp_Predictions = data.frame(round(Holt_Model_Mult_Damp$mean,2))

##### Holt-Winters (HW)
HW_Model_Add = hw(Time_Series_Training, h = 12, damped = FALSE, seasonal = "additive") #Model 5
    HW_Add_Predictions = data.frame(round(HW_Model_Add$mean,2))
  
HW_Model_Mult = hw(Time_Series_Training, h = 12, damped = FALSE, seasonal = "multiplicative") #Model 6
    HW_Mult_Predictions = data.frame(round(HW_Model_Mult$mean,2))
  
HW_Model_Add_Damp = hw(Time_Series_Training, h = 12, damped = TRUE, seasonal = "additive") #Model 7
    HW_Add_Damp_Predictions = data.frame(round(HW_Model_Add_Damp$mean,2))
    
HW_Model_Mult_Damp = hw(Time_Series_Training, h = 12, damped = TRUE, seasonal = "multiplicative") #Model 8
    HW_Mult_Damp_Predictions = data.frame(round(HW_Model_Mult_Damp$mean,2))    

Below we can see a breakdown of each model’s forecasted stock opening values for the 12 periods following the training set’s last observation (Nov 2020 - Oct 2021). A summary of some interesting takeaways are also available.

##### Putting them all together  
Predictions = cbind(Simple_Predictions, #1
                    Holt_Add_Predictions, #2
                    Holt_Add_Damp_Predictions, #3
                    Holt_Mult_Damp_Predictions, #4
                    HW_Add_Predictions, #5
                    HW_Mult_Predictions, #6
                    HW_Add_Damp_Predictions, #7 
                    HW_Mult_Damp_Predictions) #8

colnames(Predictions) = c("SES", "Holt Add", "Holt Add W/ Damp", "Holt Multiply W/ Damp", "HW Add", "HW Multiply", "HW Add W/ Damp", "HW Multiply W/ Damp")
kable(Predictions, caption = "Model Forecasts for Monthly Amazon Stock Openings", align = "c")
Model Forecasts for Monthly Amazon Stock Openings
SES Holt Add Holt Add W/ Damp Holt Multiply W/ Damp HW Add HW Multiply HW Add W/ Damp HW Multiply W/ Damp
3241.35 3340.39 3289.48 3300.31 3331.56 3212.56 3318.53 3197.97
3241.35 3439.42 3327.95 3360.33 3419.13 3237.60 3389.56 3200.23
3241.35 3538.45 3358.72 3409.14 3526.80 3386.01 3470.33 3326.71
3241.35 3637.48 3383.34 3448.69 3636.39 3473.67 3557.02 3472.25
3241.35 3736.51 3403.03 3480.66 3719.15 3493.00 3614.45 3430.89
3241.35 3835.53 3418.79 3506.45 3847.23 3780.15 3711.32 3696.34
3241.35 3934.56 3431.39 3527.23 3979.00 3984.88 3811.29 3830.22
3241.35 4033.59 3441.48 3543.93 4087.03 4274.96 3880.84 3946.92
3241.35 4132.62 3449.54 3557.35 4207.22 4678.05 3961.56 4258.44
3241.35 4231.65 3456.00 3568.13 4293.29 4670.11 4008.19 4273.38
3241.35 4330.68 3461.16 3576.77 4395.84 4695.49 4064.64 4165.20
3241.35 4429.70 3465.29 3583.70 4479.89 4599.32 4099.32 4050.32

First, we see that our SES model’s forecasts were all equal to $3241.35. This means that, if we were to assume no future deviation due to trend or seasonal complication, Amazon’s stock should continue to have an average opening of that amount. Next, the highest predicted values, by a substantial amount, came from the Holt additive, Holt-Winters additive and Holt-Winters multiplicative models. This is not surprising as they are the three models out of the eight created ones that take into account the increasing trend that Amazon’s market open had from January 2009 to October 2020 and operate under the assumption that such trend will continue at approximately the same rate.

3.3 Model Evaluation/Ranking Metrics

After looking at each models’ predicted values, I chose to run standard accuracy metrics on each one, using the training data to do so. Ultimate error analysis down the line will utilize the testing data.

Accuracy_Table = data.frame(rbind(round(accuracy(Simple_Model),4),
                       round(accuracy(Holt_Model_Add),4),
                       round(accuracy(Holt_Model_Add_Damp),4),
                       round(accuracy(Holt_Model_Mult_Damp),4),
                       round(accuracy(HW_Model_Add),4),
                       round(accuracy(HW_Model_Mult),4),
                       round(accuracy(HW_Model_Add_Damp),4),
                       round(accuracy(HW_Model_Mult_Damp),4)
                       ))
                       
rownames(Accuracy_Table) = colnames(Predictions)
kable(Accuracy_Table, caption = "Model Accuracy Summary Table", align = "c")
Model Accuracy Summary Table
ME RMSE MAE MPE MAPE MASE ACF1
SES 22.4633 78.0381 43.8711 2.6342 5.8786 0.1942 0.2736
Holt Add 6.3619 72.6341 39.5756 0.3233 5.5699 0.1752 0.1400
Holt Add W/ Damp 10.2700 72.0711 39.6539 1.1240 5.7019 0.1755 -0.0207
Holt Multiply W/ Damp 8.7280 71.7120 39.3588 0.9515 5.7406 0.1742 -0.0186
HW Add 6.1619 70.4673 41.9370 0.3457 7.9333 0.1856 0.1267
HW Multiply 6.1625 67.4218 41.9806 0.1399 6.6633 0.1858 0.0722
HW Add W/ Damp 9.0695 70.5768 42.2699 0.7657 8.0253 0.1871 0.1107
HW Multiply W/ Damp 11.6052 61.5790 39.8682 0.9885 6.2923 0.1765 0.1686

Along with creating the table seen above, I also created sorted subtables for each accuracy metric, ranking model performance from best to worst. I took particular note of the SES, Holt additive with damp and Holt multiplicative with damp models, as their predicted values are the most similar to the true opening stock values present in our testing set. The others all overforecasted to a noticeable degree.

Regarding the three models of heightened interest, the SES model actually ranked last in all metrics except MAPE, and both the damped Holt additive and damped Holt multiplicative models ranked anywhere between 3rd and 5th on most metrics. While this seems contradictory to my previous observation that these three models’ forecasts were most similar to the testing set’s true monthly average Amazon market opens, it is crucial to remember that the accuracy calculations at hand were performed via the training data. Therefore, this is something to note and to keep in mind while moving forward in the analysis.

Ranked_ME = data.frame(rownames(Accuracy_Table), Accuracy_Table$ME)
colnames(Ranked_ME) = c("Model", "ME")
  Ranked_ME = arrange(Ranked_ME, ME)
  
Ranked_RMSE = data.frame(rownames(Accuracy_Table), Accuracy_Table$RMSE)
colnames(Ranked_RMSE) = c("Model", "RMSE")
  Ranked_RMSE = arrange(Ranked_RMSE, RMSE)
  
Ranked_MAE = data.frame(rownames(Accuracy_Table), Accuracy_Table$MAE)
colnames(Ranked_MAE) = c("Model", "MAE")
  Ranked_MAE = arrange(Ranked_MAE, MAE)
  
Ranked_MPE = data.frame(rownames(Accuracy_Table), Accuracy_Table$MPE)
colnames(Ranked_MPE) = c("Model", "MPE")
  Ranked_MPE = arrange(Ranked_MPE, MPE)
  
Ranked_MAPE = data.frame(rownames(Accuracy_Table), Accuracy_Table$MAPE)
colnames(Ranked_MAPE) = c("Model", "MAPE")
  Ranked_MAPE = arrange(Ranked_MAPE, MAPE)
  
Ranked_MASE = data.frame(rownames(Accuracy_Table), Accuracy_Table$MASE)
colnames(Ranked_MASE) = c("Model", "MASE")
  Ranked_MASE = arrange(Ranked_MASE, MASE)
  
Ranked_ACF1 = data.frame(rownames(Accuracy_Table), Accuracy_Table$ACF1)
colnames(Ranked_ACF1) = c("Model", "ACF1")
  Ranked_ACF1 = arrange(Ranked_ACF1, ACF1)
  
kable(Ranked_ME)
kable(Ranked_RMSE)
kable(Ranked_MAE)
kable(Ranked_MPE)
kable(Ranked_MAPE)
kable(Ranked_MASE)
kable(Ranked_ACF1)

3.4 Model Plotting

Below I plotted the forecasts of all eight models in comparison to the values from our testing dataset. For greater visual ease, my first plot contains the four models whose predictions do not factor in seasonal oscillations. While my second plot displays the four seasonally-considerate models (the Holt-Winters ones).

# Seasonal
  colnames(Simple_Predictions) = "Open"
  colnames(Holt_Add_Predictions) = "Open"
  colnames(Holt_Add_Damp_Predictions) = "Open"
  colnames(Holt_Mult_Damp_Predictions) = "Open"

# Non-seasonal
  colnames(HW_Add_Predictions) = "Open"
  colnames(HW_Add_Damp_Predictions) = "Open"
  colnames(HW_Mult_Predictions) = "Open"
  colnames(HW_Mult_Damp_Predictions) = "Open"

List_Of_Dates = as.Date(c("2020-11-01", "2020-12-01", "2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01", "2021-05-01", "2021-06-01", "2021-07-01", "2021-08-01", "2021-09-01", "2021-10-01"))  
 
Testing_Data = cbind(List_Of_Dates, Testing_Data)
colnames(Testing_Data)[1] = "Date"

###### Plot 1: Non-Seasonal Methods
plot(x = Testing_Data$Date, y = Testing_Data$Open,
     main = "Non-Seasonal Forecasting Methods", 
     mtext("Simple Exponential Smoothing and Holt Models", side = 3, line = 0.5, cex = 0.8),
     xlab = "Time (Nov 2020 - Oct 2021)",
     ylab = "Amazon Stock Opening ($USD)",
     ylim = c(3000, 4500))
  lines(x = Testing_Data$Date, y = Testing_Data$Open, col = "black", lwd = 3)
    points(x = Testing_Data$Date, y = Testing_Data$Open, col = "black", pch = 8, bg = "black")
   
  # Simple   
  lines(x = Testing_Data$Date, y = Simple_Predictions$Open, col = "darkblue", lwd = 2)
    points(x = Testing_Data$Date, y = Simple_Predictions$Open, col = "darkblue", pch = 21, bg = "darkblue")
    
  # Holt Add W/O Damp
  lines(x = Testing_Data$Date, y = Holt_Add_Predictions$Open, col = "darkred", lwd = 2)
    points(x = Testing_Data$Date, y = Holt_Add_Predictions$Open, col = "darkred", pch = 21, bg = "darkred")
    
  # Holt Add W/ Damp
  lines(x = Testing_Data$Date, y = Holt_Add_Damp_Predictions$Open, col = "darkgreen", lwd = 2)
    points(x = Testing_Data$Date, y = Holt_Add_Damp_Predictions$Open, col = "darkgreen", pch = 21, bg = "darkgreen") 
    
  # Holt Mult W/ Damp
  lines(x = Testing_Data$Date, y = Holt_Mult_Damp_Predictions$Open, col = "gray", lwd = 2)
    points(x = Testing_Data$Date, y = Holt_Mult_Damp_Predictions$Open, col = "gray", pch = 21, bg = "gray")
    
    legend("topleft",
     legend = c("True Values", "Simple", "Holt Add", "Holt Add W/ Damp", "Holt Mult W/ Damp"),
     col = c("black", "darkblue", "darkred", "darkgreen", "gray"),
     lty = 1, pch = 16, lwd = 2)

From our non-seasonal plot, it is noticeable right away that the SES and damped Holt models performed far better predictively than the non-damped additive Holt model. Meanwhile, the multiplicative Holt model overforecasted more than the additive Holt or SES model.

###### Plot 2: Seasonal HW Methods
plot(x = Testing_Data$Date, y = Testing_Data$Open,
     main = "Seasonal Forecasting Methods", 
     mtext("Holt-Winters Models", side = 3, line = 0.5, cex = 0.8),
     xlab = "Time (Nov 2020 - Oct 2021)",
     ylab = "Amazon Stock Opening ($USD)",
     ylim = c(3000, 4800))
  lines(x = Testing_Data$Date, y = Testing_Data$Open, col = "black", lwd = 3)
    points(x = Testing_Data$Date, y = Testing_Data$Open, col = "black", pch = 8, bg = "black")
   
  # Simple   
  lines(x = Testing_Data$Date, y = HW_Add_Predictions$Open, col = "darkblue", lwd = 2)
    points(x = Testing_Data$Date, y = HW_Add_Predictions$Open, col = "darkblue", pch = 21, bg = "darkblue")
    
  # Holt Add W/O Damp
  lines(x = Testing_Data$Date, y = HW_Add_Damp_Predictions$Open, col = "darkred", lwd = 2)
    points(x = Testing_Data$Date, y = HW_Add_Damp_Predictions$Open, col = "darkred", pch = 21, bg = "darkred")
    
  # Holt Add W/ Damp
  lines(x = Testing_Data$Date, y = HW_Mult_Predictions$Open, col = "darkgreen", lwd = 2)
    points(x = Testing_Data$Date, y = HW_Mult_Predictions$Open, col = "darkgreen", pch = 21, bg = "darkgreen") 
    
  # Holt Mult W/ Damp
  lines(x = Testing_Data$Date, y = HW_Mult_Damp_Predictions$Open, col = "gray", lwd = 2)
    points(x = Testing_Data$Date, y = HW_Mult_Damp_Predictions$Open, col = "gray", pch = 21, bg = "gray")
    
   legend("topleft",
     legend = c("True Values", "HW Add", "HW Add W/ Damp", "HW Mult", "HW Mult W/ Damp"),
     col = c("black", "darkblue", "darkred", "darkgreen", "gray"),
     lty = 1, pch = 16, lwd = 2)

After taking a look at the four Holts-Winters (HW) models - the ones that consider a dataset’s seasonal component - I saw clear and consistent overprediction. Even the damped HW models, which operate under the presumption that an increasing or decreasing trend previously seen in the data will mitigate in effect over time, did not have a forecasted value close to matching the true recorded average opening market value for Amazon those selected months.

After reviewing each of the eight models’ forecasted values, and then seeing them all plotted in relation to the known values from our testing dataset, I chose to designate the damped additive Holt model and the SES model as the two most effective in their prediction accuracy. Next up I took a closer look at the two, running error analysis, this time by comparing each model’s forecasts to the testing data values.

3.5 Final Model Selection

Finally, before selecting which of these models was most effective in this predictive analysis scenario, I wanted to look at each model’s errors, both observation-by-observation and holistically. I created a table that broke down each model’s error for each of the 12 months’ forecasts, as well as calculating the respective mean square error (MSE) and mean absolute prediction error (MAPE). Per these calculations, I deemed the SES model as the most effective for forecasting purposes of the eight available.

Simple_Errors = Simple_Predictions - Testing_Data$Open
Holt_Add_Damp_Errors = Holt_Add_Damp_Predictions - Testing_Data$Open

Model_Select_Table = cbind(Testing_Data$Open, 
                           Simple_Predictions, Simple_Errors,
                           Holt_Add_Damp_Predictions, Holt_Add_Damp_Errors)

colnames(Model_Select_Table) = c("True Values", "Simple Smoothing", "Simple Smoothing Errors", "Holt Addition W/ Damp", "Holt Errors")
kable(Model_Select_Table, caption = "Prediction and Error Breakdown for SES and Additive Holt Models", align = "c")
Prediction and Error Breakdown for SES and Additive Holt Models
True Values Simple Smoothing Simple Smoothing Errors Holt Addition W/ Damp Holt Errors
3147.33 3241.35 94.02 3289.48 142.15
3199.93 3241.35 41.42 3327.95 128.02
3206.54 3241.35 34.81 3358.72 152.18
3267.66 3241.35 -26.31 3383.34 115.68
3074.58 3241.35 166.77 3403.03 328.45
3347.73 3241.35 -106.38 3418.79 71.06
3261.31 3241.35 -19.96 3431.39 170.08
3360.01 3241.35 -118.66 3441.48 81.47
3612.71 3241.35 -371.36 3449.54 -163.17
3310.76 3241.35 -69.41 3456.00 145.24
3432.44 3241.35 -191.09 3461.16 28.72
3325.98 3241.35 -84.63 3465.29 139.31
# MSE
Simple_Err_MSE = sum(Simple_Errors^2)/nrow(Simple_Errors)
Holt_Add_Damp_Errors_MSE = sum(Holt_Add_Damp_Errors^2)/nrow(Holt_Add_Damp_Errors)

# MAPE
Simple_Err_MAPE = (sum(abs((Simple_Errors/Testing_Data$Open)*100)))/nrow(Simple_Errors)
Holt_Add_Damp_Errors_MAPE = (sum(abs((Holt_Add_Damp_Errors/Testing_Data$Open)*100)))/nrow(Holt_Add_Damp_Errors)

Error_Summary = data.frame(rbind(Simple_Err_MSE, Holt_Add_Damp_Errors_MSE),
                            rbind(Simple_Err_MAPE, Holt_Add_Damp_Errors_MAPE))
colnames(Error_Summary) = c("MSE", "MAPE")
rownames(Error_Summary) = c("Simple Exponential Smoothing", "Holt Additive W/ Damp")
kable(Error_Summary, caption = "Accuracy Calculations for SES and Additive Holt Models", align = "c")
Accuracy Calculations for SES and Additive Holt Models
MSE MAPE
Simple Exponential Smoothing 21039.21 3.283775
Holt Additive W/ Damp 24131.76 4.264767

4 Conclusion

Ultimately, of the eight models that were created throughout this analysis, the most straightforward one - simple exponential smoothing (SES) - was the most effective. Retrospectively, I believe this can be attributed to two reasons; the first being the spike in the rate of Amazon’s stock increase from about 2018 to October of 2021 (the ending quarter of the data used for this analysis). The magnitude of the rise in value, in relation to the amount of time passed during that interval, was so much greater than the first three quarters of the dataset. Because of this, all of the non-damped models (both Holt and HW), drastically and consistently overforecasted. Essentially, the assumption of these non-damped models that the trend visible in the dataset (and more heavily influenced by more recent observations) was representative of a new constant rate of increase, was the downfall in their forecasting accuracy.

The second explanation for SES turning out to be the most accurate in this instance is that the data’s seasonal component was rather parallel to the trend in its behavior relative to the time period. It was visible in the season-by-season plots that the seasonal differences in average Amazon stock opening during the last years of the time interval were much greater than said differences in the early years of the time interval. All the HW models, both damped and non-damped, take seasonality into calculative consideration. Too much weight was given to the increase in seasonal disparity in the most recent years, leading to consistent overforecasting.

In the future, it would be useful to analyze Amazon’s historical monthly average market opening price with a method less vulnerable to near-term and recent spikes in overall increasing/decreasing trends and/or seasonal oscillations. # References:

Original Dataset Source:

Dataset Download Link via Github:

---
title: "Time Series Analysis of Amazon's Month-to-Month Stock Price"
author: "Chris Bahm"
date: "2025-11-25"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: true
      smooth_scroll: true
    toc_depth: 4
    fig_width: 6
    fig_height: 4
    fig_caption: true
    number_sections: true
    code_folding: hide
    code_download: true
    theme: lumen
    highlight: tango
  pdf_document:
    toc: true
    toc_depth: 4
    fig_caption: true
    number_sections: true
  word_document:
    toc: true
    toc_depth: 4
---

```{css, echo = FALSE}
div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 24px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkRed;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.

if (!require("knitr")) {                      # use conditional statement to detect
   install.packages("knitr")                  # whether a package was installed in
   library(knitr)                             # your machine. If not, install it and
}                                             # load it to the working directory.

if (!require(tidyverse)) {library(tidyvserse)} 

if (!require(GGally)) {library(GGally)} 

if (!require(kableExtra)) {library(kableExtra)} 

if (!require(ggplot2)) {library(ggplot2)} 

if (!require(car)) {library(car)} 

if (!require(dplyr)) {library(dplyr)} 

if (!require(pander)) {library(pander)} 

if (!require(forecast)) {library(forecast)} 

if (!require(lubridate)) {library(lubridate)} 

if (!require("scales")) {
install.packages("scales")                                        
library("scales") 
}

knitr::opts_chunk$set(
	echo = TRUE,
	message = FALSE,
	warning = FALSE,
	comment = NA,
	results = TRUE
)
```

# Introduction and Background 
The dataset I examined today tracked the day-to-day performance of Amazon's stock from May 15, 1997 to October 27, 2021. These daily observations recorded the stock's price at market opening and closing, as well as its high, low, adjusted closing rate and trading volume for the day.

For the purposes of my analysis, I chose to track the history of Amazon's market performance on a month-by-month basis. I took the stock's average opening price for each month, and constrained the timeline from January 2009 to October of 2021, a total of 154 samples of monthly data. I used this subset of the data to create a number of time series models with the hopes of formulating at least one that can forecast Amazon's average market opening on a month-by-month timeline with relative accuracy. 

This is not an associative analysis that attempts to greatly explain *why* the stock's opening price has historically moved the way it has, but rather simply a predictive tool. The primary methods of forecast modeling I utilized include exponential smoothing, Holt and Holt-Winters calculations.

```{r Data Loading and Cleaning}
Data_Url= "https://raw.githubusercontent.com/ChrisB2323/STA321/refs/heads/main/Amazon.csv"
Original_Data = read_delim(Data_Url, delim = ",", show_col_types = FALSE)

#(glimpse(Original_Data))
  # Everything stored the right way ^^

#sum(is.na(Original_Data)) = 0
Data = mutate(Original_Data,
       Month = month(Date, label = TRUE),
       Year = year(Date),
  )

Data = group_by(Data, Month, Year)
Data = summarize(Data, mean(Open))
  # ^^ Since I am looking at month-by-month opens, need to summarize and group
Data = sort_by(Data, Data$Year, Data$Month)
  # ^^ Prioritized sorting, first sort by year (oldest first), and then by month of year (January to December)

# Restricting my analysis from Jan 2009 to Oct 2021, rounding opening price to nearest cent
Data = Data[141:294,]
colnames(Data)[3] = "Open"
Data$Open = round(Data$Open,2)  

Data$Season[Data$Month == "Dec" | Data$Month == "Jan" | Data$Month == "Feb"] = "Winter"
Data$Season[Data$Month == "Mar" | Data$Month == "Apr" | Data$Month == "May"] = "Spring"
Data$Season[Data$Month == "Jun" | Data$Month == "Jul" | Data$Month == "Aug"] = "Summer"
Data$Season[Data$Month == "Sep" | Data$Month == "Oct" | Data$Month == "Nov"] = "Fall"
  # Added variable Season to better analyze seasonality of data over time (more with this later)

Winter_Months = filter(Data, Season == "Winter")
Spring_Months = filter(Data, Season == "Spring")
Summer_Months = filter(Data, Season == "Summer")
Fall_Months = filter(Data, Season == "Fall")
```

# Data Visualization
Before beginning model creation, I wanted to visualize two essential parts of the data at hand, its trend and seasonal components. These elements play an important role in understanding the model-to-model accuracy comparisons that will be examined later.

## Trend
Below I plotted the chosen analysis interval (January 2009 to October 2021). What we see is a tale of two very different half-decades; with a rather mildly increasing trend from 2009 to about 2016, and then a large and exponential increase in market value from 2016 onward.
```{r, fig.align='center', fig.height = 5}
Time_Series = ts(Data$Open, frequency = 12, start = c(2009,1))

plot(Time_Series,
     main = "Amazon Stock Opening Price (2009 - 2021)",
     xlab = "Year",
     ylab = "Price ($USD)",
     lwd = 3)
```

## Seasonality
While the trend component of Amazon's stock opening is rather straightforward, it is also valuable to breakdown the seasonal aspect of its historical performance. Below is a table averaging its opening price per season for each respective year, with the total across all the cited years at the bottom. 

```{r Data Visualization, fig.align='center', fig.height = 5}
# Making a season-by-season breakdown throughout the designated data range
Season_By_Season = group_by(Data, Season, Year)
Season_By_Season = summarize(Season_By_Season, mean(Open))
colnames(Season_By_Season)[3] = "Open"
Season_By_Season = sort_by(Season_By_Season, Season_By_Season$Year, Season_By_Season$Season)
Season_By_Season$Open = round(Season_By_Season$Open,2)

Years = data.frame(c("2009", "2010", "2011", "2012", "2013","2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021"))
colnames(Years) = "Year"

Winter_Seasons = Season_By_Season[Season_By_Season$Season == "Winter", ]
Spring_Seasons = Season_By_Season[Season_By_Season$Season == "Spring", ]
Summer_Seasons = Season_By_Season[Season_By_Season$Season == "Summer", ]
Fall_Seasons = Season_By_Season[Season_By_Season$Season == "Fall", ]

# Subsetting that season-by-season breakdown to look at each season throughout the years

All_Seasons = cbind(Years, Winter_Seasons$Open, Spring_Seasons$Open, Summer_Seasons$Open, Fall_Seasons$Open)
  colnames(All_Seasons) = c("Year", "Winter", "Spring", "Summer", "Fall")

# To have an averages for the seasons across the years
Season_Totals = data.frame("All Years",
      round(mean(Winter_Months$Open),2),
      round(mean(Spring_Months$Open),2),
      round(mean(Summer_Months$Open),2),
      round(mean(Fall_Months$Open),2))
  colnames(Season_Totals) = c("Year", "Winter", "Spring", "Summer", "Fall")
  invisible(as.numeric(Season_Totals[,(2:5)]))
  
All_Seasons = rbind(All_Seasons, Season_Totals)
kable(All_Seasons, caption = "Amazon Stock Opening Season-By-Season", align = "c")
```

As we can see, the summer has historically been Amazon's best season according to the market, averaging a market open of $1040.26. Fall's yearly average trails slightly behind at $1010.73, with spring and winter's averages being noticeably lower. Beyond that, it is also visible that the difference in the stock's lowest and highest seasonal average tends to increase in magnitude as the years went on. In 2009, the difference between Amazon's best and worst seasonal average was $30.17. In 2021, that same metric was $199.96. This increasing magnitude of in-year seasonal difference can be seen at greater ease with the plots below.

```{r, fig.align='center', fig.height = 5}
###### Make a plot of all the data
plot(x = Winter_Seasons$Year, y = Winter_Seasons$Open,
     main = "Amazon Stock Opening Season-By-Season",
     xlab = "Year",
     ylab = "Value($USD)")
  lines(x = Winter_Seasons$Year, y = Winter_Seasons$Open, col = "red", lwd = 1.5)
    points(x = Winter_Seasons$Year, y = Winter_Seasons$Open, col = "red", pch = 21, bg = "red")
    
  lines(x = Spring_Seasons$Year, y = Spring_Seasons$Open, col = "black", lwd = 1.5)
    points(x = Spring_Seasons$Year, y = Spring_Seasons$Open, col = "black", pch = 21, bg = "black")
    
  lines(x = Summer_Seasons$Year, y = Summer_Seasons$Open, col = "green", lwd = 1.5)
    points(x = Summer_Seasons$Year, y = Summer_Seasons$Open, col = "green", pch = 21, bg = "green")
    
  lines(x = Fall_Seasons$Year, y = Fall_Seasons$Open, col = "purple", lwd = 1.5)
    points(x = Fall_Seasons$Year, y = Fall_Seasons$Open, col = "purple", pch = 21, bg = "purple")
    
    legend("topleft",
       legend = c("Winter", "Spring", "Summer", "Fall"),
       col = c("red", "black", "green", "purple"),
       lty = 1, pch = 16, lwd = 2)
    
###### Make a reduced plot, 2015 to 2021
Reduced_Winter_Seasons = Winter_Seasons[7:13,]
Reduced_Spring_Seasons = Spring_Seasons[7:13,]
Reduced_Summer_Seasons = Summer_Seasons[7:13,]
Reduced_Fall_Seasons = Fall_Seasons[7:13,]

plot(x = Reduced_Winter_Seasons$Year, y = Reduced_Winter_Seasons$Open,
     main = "Amazon Stock Opening Season-By-Season \n (2015 - 2021)",
     xlab = "Year",
     ylab = "Value($USD)")
  lines(x = Reduced_Winter_Seasons$Year, y = Reduced_Winter_Seasons$Open, col = "red", lwd = 2)
    points(x = Reduced_Winter_Seasons$Year, y = Reduced_Winter_Seasons$Open, col = "red", pch = 21, bg = "red")
    
  lines(x = Reduced_Spring_Seasons$Year, y = Reduced_Spring_Seasons$Open, col = "black", lwd = 2)
    points(x = Reduced_Spring_Seasons$Year, y = Reduced_Spring_Seasons$Open, col = "black", pch = 21, bg = "black")
    
  lines(x = Reduced_Summer_Seasons$Year, y = Reduced_Summer_Seasons$Open, col = "green", lwd = 2)
    points(x = Reduced_Summer_Seasons$Year, y = Reduced_Summer_Seasons$Open, col = "green", pch = 21, bg = "green")
    
  lines(x = Reduced_Fall_Seasons$Year, y = Reduced_Fall_Seasons$Open, col = "purple", lwd = 2)
    points(x = Reduced_Fall_Seasons$Year, y = Reduced_Fall_Seasons$Open, col = "purple", pch = 21, bg = "purple")
    
   legend("topleft",
     legend = c("Winter", "Spring", "Summer", "Fall"),
     col = c("red", "black", "green", "purple"),
     lty = 1, pch = 16, lwd = 2)  
```

These plots offer a visual confirmation for the numerical patterns present in the table above. Graphically speaking, seasonal differences in average opening price are rather unnoticeable from 2009 to about 2016 or 2017. But as time continues beyond that point, the seasonal oscillation increases. This is a similar phenomenon to the trend component of the graph seen earlier, meaning that Amazon's stock opening price season-by-season is inconsistent, and that inconsistency expands with the upward trend of the overall opening data.

# Forecast Modeling
Now with a solid grasp of the data's trend and seasonality, I moved forward and began the forecast modeling process. In total, I created eight models - one simple exponential smoothing, three Holt and four Holt-Winters.

## Training and Testing Sets
Since it is poor statistical practice to test a model on the same data which it is trained/built on, I separated my data into two distinct subsets. The training dataset consisted of monthly observations from the nearly 12 year span of January 2009 to October 2020. Meanwhile, the testing dataset (the observations which my time series models will attempt to computationally predict or forecast), consists of monthly averages from November 2020 to October 2021.
```{r}
Training_Data = Data[1:142,]
Testing_Data = Data[143:154,]

Time_Series_Training = ts(Training_Data$Open, frequency = 12, start = c(2009,1))
Time_Series_Testing = ts(Testing_Data$Open, frequency = 12, start = c(2020,11))
```

## Model Creation
Setting each model's forecast interval equal to 12, I constructed 8 time series models. A breakdown of each model type with a corresponding brief explanation is below.

- Simple Exponential Smoothing (SES): Older observations are given an exponentially decreasing degree of weight in the calculation of forecasted values. This is done with the use of what is called smoothing constants. Forecasts from an SES model are all equal to the same value, as SES does not consider a dataset's trend or seasonality, but rather simply calculates a weighted moving average.

- <span style="color:red;">Holt Additive</span>: A generalization of the SES approach. Forecast calculations include the **addition** of the data's trend parameter. 

- <span style="color:red;">Holt Additive W/ Damp</span>: A modification of the additive Holt approach, as it includes a damping parameter that reduces the absolute value of a trend's increasing or decreasing effect on forecasts over time. This is a popular choice when it is believed that a trend will "flatten out" in the somewhat near future.

- <span style="color:red;">Holt Multiplicative W/ Damp</span>: Carries the idea of its additive version, but the damping parameter is **multiplied** and takes the role of a proportional weight, rather than added as an absolute value.  *Holt models do not explicitly include a dataset's seasonal component in its calculations, however the multiplicative damp model is still seen as preferable to the additive when seasonal oscillations are inconsistent throughout the time length of the data.

- <span style="color:#0000FF;">Holt-Winters (HW) Additive</span>: Parallel to the Holt additive model, but also takes into account the data's seasonal oscillations. 

- <span style="color:#0000FF;">HW Additive W/ Damp</span>: Adds parameters representative of both the trend **and** seasonality respectively to the forecast calculations. Used when there is an assumption of a "flattening curve" phenomenon regarding the dataset's trend.

- <span style="color:#0000FF;">HW Multiplicative:</span>: Builds on the Holt multiplicative model by multiplying that calculation by a seasonal factor.

- <span style="color:#0000FF;">HW Multiplicative W/ Damp:</span>: Used when we assume a decrease in the severity of data's upward or downward trend. Factors in, via multiplication, both the response's trend and seasonal oscillation. 
```{r, fig.align='center', fig.width=5} 
# Use h = 12 for all these forecasts

##### Simple Exponential Smoothing
Simple_Model = ses(Time_Series_Training, h = 12) #Model 1
    Simple_Predictions = data.frame(round(Simple_Model$mean,2)) # the next 10 forecasts

##### Holt
  # "damped" is the assumption that the magnitude of time series' trend or seasonal components will DECREASE over time
Holt_Model_Add = holt(Time_Series_Training, h = 12, damped = FALSE) #Model 2
    Holt_Add_Predictions = data.frame(round(Holt_Model_Add$mean,2))
    
Holt_Model_Add_Damp = holt(Time_Series_Training, h = 12, damped = TRUE) #Model 3
    Holt_Add_Damp_Predictions = data.frame(round(Holt_Model_Add_Damp$mean,2))
    
Holt_Model_Mult_Damp = holt(Time_Series_Training, h = 12, exponential = TRUE, damped = TRUE) #Model 4
    Holt_Mult_Damp_Predictions = data.frame(round(Holt_Model_Mult_Damp$mean,2))

##### Holt-Winters (HW)
HW_Model_Add = hw(Time_Series_Training, h = 12, damped = FALSE, seasonal = "additive") #Model 5
    HW_Add_Predictions = data.frame(round(HW_Model_Add$mean,2))
  
HW_Model_Mult = hw(Time_Series_Training, h = 12, damped = FALSE, seasonal = "multiplicative") #Model 6
    HW_Mult_Predictions = data.frame(round(HW_Model_Mult$mean,2))
  
HW_Model_Add_Damp = hw(Time_Series_Training, h = 12, damped = TRUE, seasonal = "additive") #Model 7
    HW_Add_Damp_Predictions = data.frame(round(HW_Model_Add_Damp$mean,2))
    
HW_Model_Mult_Damp = hw(Time_Series_Training, h = 12, damped = TRUE, seasonal = "multiplicative") #Model 8
    HW_Mult_Damp_Predictions = data.frame(round(HW_Model_Mult_Damp$mean,2))    
```

Below we can see a breakdown of each model's forecasted stock opening values for the 12 periods following the training set's last observation (Nov 2020 - Oct 2021). A summary of some interesting takeaways are also available.

```{r}
##### Putting them all together  
Predictions = cbind(Simple_Predictions, #1
                    Holt_Add_Predictions, #2
                    Holt_Add_Damp_Predictions, #3
                    Holt_Mult_Damp_Predictions, #4
                    HW_Add_Predictions, #5
                    HW_Mult_Predictions, #6
                    HW_Add_Damp_Predictions, #7 
                    HW_Mult_Damp_Predictions) #8

colnames(Predictions) = c("SES", "Holt Add", "Holt Add W/ Damp", "Holt Multiply W/ Damp", "HW Add", "HW Multiply", "HW Add W/ Damp", "HW Multiply W/ Damp")
kable(Predictions, caption = "Model Forecasts for Monthly Amazon Stock Openings", align = "c")
```

First, we see that our SES model's forecasts were all equal to $3241.35. This means that, if we were to assume no future deviation due to trend or seasonal complication, Amazon's stock should continue to have an average opening of that amount. Next, the highest predicted values, by a substantial amount, came from the Holt additive, Holt-Winters additive and Holt-Winters multiplicative models. This is not surprising as they are the three models out of the eight created ones that take into account the increasing trend that Amazon's market open had from January 2009 to October 2020 **and** operate under the assumption that such trend will continue at approximately the same rate.

## Model Evaluation/Ranking Metrics
After looking at each models' predicted values, I chose to run standard accuracy metrics on each one, using the **training** data to do so. Ultimate error analysis down the line will utilize the testing data.

```{r, fig.align='center'}
Accuracy_Table = data.frame(rbind(round(accuracy(Simple_Model),4),
                       round(accuracy(Holt_Model_Add),4),
                       round(accuracy(Holt_Model_Add_Damp),4),
                       round(accuracy(Holt_Model_Mult_Damp),4),
                       round(accuracy(HW_Model_Add),4),
                       round(accuracy(HW_Model_Mult),4),
                       round(accuracy(HW_Model_Add_Damp),4),
                       round(accuracy(HW_Model_Mult_Damp),4)
                       ))
                       
rownames(Accuracy_Table) = colnames(Predictions)
kable(Accuracy_Table, caption = "Model Accuracy Summary Table", align = "c")
```

Along with creating the table seen above, I also created sorted subtables for each accuracy metric, ranking model performance from best to worst. I took particular note of the SES, Holt additive with damp and Holt multiplicative with damp models, as their predicted values are the most similar to the true opening stock values present in our testing set. The others all overforecasted to a noticeable degree. 

Regarding the three models of heightened interest, the SES model actually ranked *last* in all metrics except MAPE, and both the damped Holt additive and damped Holt multiplicative models ranked anywhere between 3rd and 5th on most metrics. While this seems contradictory to my previous observation that these three models' forecasts were most similar to the testing set's true monthly average Amazon market opens, it is crucial to remember that the accuracy calculations at hand were performed via the training data. Therefore, this is something to note and to keep in mind while moving forward in the analysis.

```{r, results='hide'}
Ranked_ME = data.frame(rownames(Accuracy_Table), Accuracy_Table$ME)
colnames(Ranked_ME) = c("Model", "ME")
  Ranked_ME = arrange(Ranked_ME, ME)
  
Ranked_RMSE = data.frame(rownames(Accuracy_Table), Accuracy_Table$RMSE)
colnames(Ranked_RMSE) = c("Model", "RMSE")
  Ranked_RMSE = arrange(Ranked_RMSE, RMSE)
  
Ranked_MAE = data.frame(rownames(Accuracy_Table), Accuracy_Table$MAE)
colnames(Ranked_MAE) = c("Model", "MAE")
  Ranked_MAE = arrange(Ranked_MAE, MAE)
  
Ranked_MPE = data.frame(rownames(Accuracy_Table), Accuracy_Table$MPE)
colnames(Ranked_MPE) = c("Model", "MPE")
  Ranked_MPE = arrange(Ranked_MPE, MPE)
  
Ranked_MAPE = data.frame(rownames(Accuracy_Table), Accuracy_Table$MAPE)
colnames(Ranked_MAPE) = c("Model", "MAPE")
  Ranked_MAPE = arrange(Ranked_MAPE, MAPE)
  
Ranked_MASE = data.frame(rownames(Accuracy_Table), Accuracy_Table$MASE)
colnames(Ranked_MASE) = c("Model", "MASE")
  Ranked_MASE = arrange(Ranked_MASE, MASE)
  
Ranked_ACF1 = data.frame(rownames(Accuracy_Table), Accuracy_Table$ACF1)
colnames(Ranked_ACF1) = c("Model", "ACF1")
  Ranked_ACF1 = arrange(Ranked_ACF1, ACF1)
  
kable(Ranked_ME)
kable(Ranked_RMSE)
kable(Ranked_MAE)
kable(Ranked_MPE)
kable(Ranked_MAPE)
kable(Ranked_MASE)
kable(Ranked_ACF1)
```

## Model Plotting
Below I plotted the forecasts of all eight models in comparison to the values from our testing dataset.
For greater visual ease, my first plot contains the four models whose predictions do not factor in seasonal oscillations. While my second plot displays the four seasonally-considerate models (the Holt-Winters ones).

```{r, fig.height = 5, fig.align='center'}
# Seasonal
  colnames(Simple_Predictions) = "Open"
  colnames(Holt_Add_Predictions) = "Open"
  colnames(Holt_Add_Damp_Predictions) = "Open"
  colnames(Holt_Mult_Damp_Predictions) = "Open"

# Non-seasonal
  colnames(HW_Add_Predictions) = "Open"
  colnames(HW_Add_Damp_Predictions) = "Open"
  colnames(HW_Mult_Predictions) = "Open"
  colnames(HW_Mult_Damp_Predictions) = "Open"

List_Of_Dates = as.Date(c("2020-11-01", "2020-12-01", "2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01", "2021-05-01", "2021-06-01", "2021-07-01", "2021-08-01", "2021-09-01", "2021-10-01"))  
 
Testing_Data = cbind(List_Of_Dates, Testing_Data)
colnames(Testing_Data)[1] = "Date"

###### Plot 1: Non-Seasonal Methods
plot(x = Testing_Data$Date, y = Testing_Data$Open,
     main = "Non-Seasonal Forecasting Methods", 
     mtext("Simple Exponential Smoothing and Holt Models", side = 3, line = 0.5, cex = 0.8),
     xlab = "Time (Nov 2020 - Oct 2021)",
     ylab = "Amazon Stock Opening ($USD)",
     ylim = c(3000, 4500))
  lines(x = Testing_Data$Date, y = Testing_Data$Open, col = "black", lwd = 3)
    points(x = Testing_Data$Date, y = Testing_Data$Open, col = "black", pch = 8, bg = "black")
   
  # Simple   
  lines(x = Testing_Data$Date, y = Simple_Predictions$Open, col = "darkblue", lwd = 2)
    points(x = Testing_Data$Date, y = Simple_Predictions$Open, col = "darkblue", pch = 21, bg = "darkblue")
    
  # Holt Add W/O Damp
  lines(x = Testing_Data$Date, y = Holt_Add_Predictions$Open, col = "darkred", lwd = 2)
    points(x = Testing_Data$Date, y = Holt_Add_Predictions$Open, col = "darkred", pch = 21, bg = "darkred")
    
  # Holt Add W/ Damp
  lines(x = Testing_Data$Date, y = Holt_Add_Damp_Predictions$Open, col = "darkgreen", lwd = 2)
    points(x = Testing_Data$Date, y = Holt_Add_Damp_Predictions$Open, col = "darkgreen", pch = 21, bg = "darkgreen") 
    
  # Holt Mult W/ Damp
  lines(x = Testing_Data$Date, y = Holt_Mult_Damp_Predictions$Open, col = "gray", lwd = 2)
    points(x = Testing_Data$Date, y = Holt_Mult_Damp_Predictions$Open, col = "gray", pch = 21, bg = "gray")
    
    legend("topleft",
     legend = c("True Values", "Simple", "Holt Add", "Holt Add W/ Damp", "Holt Mult W/ Damp"),
     col = c("black", "darkblue", "darkred", "darkgreen", "gray"),
     lty = 1, pch = 16, lwd = 2)
```

From our non-seasonal plot, it is noticeable right away that the SES and damped Holt models performed **far** better predictively than the non-damped additive Holt model. Meanwhile, the multiplicative Holt model overforecasted more than the additive Holt or SES model.

```{r, fig.height = 5, fig.align='center'}  
###### Plot 2: Seasonal HW Methods
plot(x = Testing_Data$Date, y = Testing_Data$Open,
     main = "Seasonal Forecasting Methods", 
     mtext("Holt-Winters Models", side = 3, line = 0.5, cex = 0.8),
     xlab = "Time (Nov 2020 - Oct 2021)",
     ylab = "Amazon Stock Opening ($USD)",
     ylim = c(3000, 4800))
  lines(x = Testing_Data$Date, y = Testing_Data$Open, col = "black", lwd = 3)
    points(x = Testing_Data$Date, y = Testing_Data$Open, col = "black", pch = 8, bg = "black")
   
  # Simple   
  lines(x = Testing_Data$Date, y = HW_Add_Predictions$Open, col = "darkblue", lwd = 2)
    points(x = Testing_Data$Date, y = HW_Add_Predictions$Open, col = "darkblue", pch = 21, bg = "darkblue")
    
  # Holt Add W/O Damp
  lines(x = Testing_Data$Date, y = HW_Add_Damp_Predictions$Open, col = "darkred", lwd = 2)
    points(x = Testing_Data$Date, y = HW_Add_Damp_Predictions$Open, col = "darkred", pch = 21, bg = "darkred")
    
  # Holt Add W/ Damp
  lines(x = Testing_Data$Date, y = HW_Mult_Predictions$Open, col = "darkgreen", lwd = 2)
    points(x = Testing_Data$Date, y = HW_Mult_Predictions$Open, col = "darkgreen", pch = 21, bg = "darkgreen") 
    
  # Holt Mult W/ Damp
  lines(x = Testing_Data$Date, y = HW_Mult_Damp_Predictions$Open, col = "gray", lwd = 2)
    points(x = Testing_Data$Date, y = HW_Mult_Damp_Predictions$Open, col = "gray", pch = 21, bg = "gray")
    
   legend("topleft",
     legend = c("True Values", "HW Add", "HW Add W/ Damp", "HW Mult", "HW Mult W/ Damp"),
     col = c("black", "darkblue", "darkred", "darkgreen", "gray"),
     lty = 1, pch = 16, lwd = 2)
```

After taking a look at the four Holts-Winters (HW) models - the ones that consider a dataset's seasonal component - I saw clear and consistent overprediction. Even the damped HW models, which operate under the presumption that an increasing or decreasing trend previously seen in the data will mitigate in effect over time, did not have a forecasted value close to matching the true recorded average opening market value for Amazon those selected months.

After reviewing each of the eight models' forecasted values, and then seeing them all plotted in relation to the known values from our testing dataset, I chose to designate the damped additive Holt model and the SES model as the two most effective in their prediction accuracy. Next up I took a closer look at the two, running error analysis, this time by comparing each model's forecasts to the **testing** data values.

## Final Model Selection
Finally, before selecting which of these models was most effective in this predictive analysis scenario, I wanted to look at each model's errors, both observation-by-observation and holistically.
I created a table that broke down each model's error for each of the 12 months' forecasts, as well as calculating the respective mean square error (MSE) and mean absolute prediction error (MAPE). Per these calculations, I deemed the SES model as the most effective for forecasting purposes of the eight available.
```{r}
Simple_Errors = Simple_Predictions - Testing_Data$Open
Holt_Add_Damp_Errors = Holt_Add_Damp_Predictions - Testing_Data$Open

Model_Select_Table = cbind(Testing_Data$Open, 
                           Simple_Predictions, Simple_Errors,
                           Holt_Add_Damp_Predictions, Holt_Add_Damp_Errors)

colnames(Model_Select_Table) = c("True Values", "Simple Smoothing", "Simple Smoothing Errors", "Holt Addition W/ Damp", "Holt Errors")
kable(Model_Select_Table, caption = "Prediction and Error Breakdown for SES and Additive Holt Models", align = "c")

# MSE
Simple_Err_MSE = sum(Simple_Errors^2)/nrow(Simple_Errors)
Holt_Add_Damp_Errors_MSE = sum(Holt_Add_Damp_Errors^2)/nrow(Holt_Add_Damp_Errors)

# MAPE
Simple_Err_MAPE = (sum(abs((Simple_Errors/Testing_Data$Open)*100)))/nrow(Simple_Errors)
Holt_Add_Damp_Errors_MAPE = (sum(abs((Holt_Add_Damp_Errors/Testing_Data$Open)*100)))/nrow(Holt_Add_Damp_Errors)

Error_Summary = data.frame(rbind(Simple_Err_MSE, Holt_Add_Damp_Errors_MSE),
                            rbind(Simple_Err_MAPE, Holt_Add_Damp_Errors_MAPE))
colnames(Error_Summary) = c("MSE", "MAPE")
rownames(Error_Summary) = c("Simple Exponential Smoothing", "Holt Additive W/ Damp")
kable(Error_Summary, caption = "Accuracy Calculations for SES and Additive Holt Models", align = "c")
```

# Conclusion
Ultimately, of the eight models that were created throughout this analysis, the most straightforward one - simple exponential smoothing (SES) - was the most effective. Retrospectively, I believe this can be attributed to two reasons; the first being the spike in the rate of Amazon's stock increase from about 2018 to October of 2021 (the ending quarter of the data used for this analysis). The magnitude of the rise in value, in relation to the amount of time passed during that interval, was so much greater than the first three quarters of the dataset. Because of this, all of the non-damped models (both Holt and HW), drastically and consistently overforecasted. Essentially, the assumption of these non-damped models that the trend visible in the dataset (and more heavily influenced by more recent observations) was representative of a new constant rate of increase, was the downfall in their forecasting accuracy.

The second explanation for SES turning out to be the most accurate in this instance is that the data's seasonal component was rather parallel to the trend in its behavior relative to the time period. It was visible in the season-by-season plots that the seasonal differences in average Amazon stock opening during the last years of the time interval were much greater than said differences in the early years of the time interval. All the HW models, both damped and non-damped, take seasonality into calculative consideration. Too much weight was given to the increase in seasonal disparity in the most recent years, leading to consistent overforecasting.

In the future, it would be useful to analyze Amazon's historical monthly average market opening price with a method less vulnerable to near-term and recent spikes in overall increasing/decreasing trends and/or seasonal oscillations.
# References:

Original Dataset Source:

- https://www.kaggle.com/datasets/kannan1314/amazon-stock-price-all-time/data

Dataset Download Link via Github:

- https://raw.githubusercontent.com/ChrisB2323/STA321/refs/heads/main/Amazon.csv
