2025-12-11

What Did I Do and Why?

  • Thesis: Analyzed Amazon’s monthly opening price

  • Scope: Jan 2009 to Oct 2021

  • Personal Motivation: Improved financial literacy, valuable market knowledge

  • Target Audience: Investors, economists, media analysts

Methods and Technology Used

  • R and RStudio (knitr, forecast, lubridate)

  • Exponential Smoothing

  • Holt / Holt-Winters

  • “Weighted Averaging” ~ think a step above Moving Average

Background of Dataset

  • Sourced from Kaggle

  • High data workability

    • No missing values, consistent data entry and variable encoding
  • Came with a daily observation for every market-open day from May 15, 1997 to October 27, 2021

    • First few observations seen below
Date Open High Low Close Adj Close Volume
1997-05-15 2.437500 2.500000 1.927083 1.958333 1.958333 72156000
1997-05-16 1.968750 1.979167 1.708333 1.729167 1.729167 14700000
1997-05-19 1.760417 1.770833 1.625000 1.708333 1.708333 6106800
1997-05-20 1.729167 1.750000 1.635417 1.635417 1.635417 5467200
1997-05-21 1.635417 1.645833 1.375000 1.427083 1.427083 18853200
1997-05-22 1.437500 1.447917 1.312500 1.395833 1.395833 11776800

My Procedure

  • Research Question: How has Amazon’s opening price behaved historically, and how can future values be predicted?

  • Constraints: Only the market open on the first trading day of each month, from January 2009 to October 2021

    • Opening price = widely understood measure

    • Day-to-day = too many observations

    • Year-to-year = too few observations

    • Chosen timeline: n = 154

    • First few observations of dataset after programming manipulation

Month Year Open Season
Jan 2009 52.01 Winter
Feb 2009 62.87 Winter
Mar 2009 68.35 Spring
Apr 2009 78.28 Spring
May 2009 77.84 Spring
Jun 2009 83.19 Summer
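The manipulation described above (daily rows reduced to one opening price per month, plus a season label) can be sketched roughly as follows. This is an illustrative Python sketch, not the R code used for the analysis; the function name `first_open_per_month` and the month-to-season mapping are assumptions inferred from the sample rows.

```python
from datetime import date

MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sept", "Oct", "Nov", "Dec"]

# Assumed month-to-season mapping, inferred from the sample rows
# (Jan/Feb = Winter, Mar-May = Spring, Jun = Summer).
SEASON_BY_MONTH = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "Spring", 4: "Spring", 5: "Spring",
                   6: "Summer", 7: "Summer", 8: "Summer",
                   9: "Fall", 10: "Fall", 11: "Fall"}


def first_open_per_month(daily_rows):
    """Reduce (date, open) daily rows to one row per month.

    The opening price kept for each month is the one from the earliest
    trading day present in that month.
    """
    first_seen = {}
    for day, open_price in sorted(daily_rows):
        key = (day.year, day.month)
        if key not in first_seen:  # earliest trading day of this month
            first_seen[key] = open_price
    return [
        {"Month": MONTH_ABBR[month - 1], "Year": year,
         "Open": open_price, "Season": SEASON_BY_MONTH[month]}
        for (year, month), open_price in sorted(first_seen.items())
    ]


# Two daily rows from the raw data shown earlier, deliberately out of order.
sample = [(date(1997, 5, 16), 1.968750), (date(1997, 5, 15), 2.437500)]
print(first_open_per_month(sample))
```

Sorting before grouping means the input does not need to arrive in date order, and months whose first calendar day is a holiday or weekend still resolve to the first day the market was open.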

Data Visualization

  • Priority 1: Observe Trend

  • What we see from plot below:

    • Positive
    • Not linear
    • (Increase ~ 2017 to 2021) >>> (Increase ~ 2009 to 2016)

Data Visualization

  • Priority 2: Observe Seasonality
    • Seasonal differences grow larger as time goes on

Training Vs Testing Data

  • Testing on the same data used for training:
    • Poses risk of overfitting
    • Does not reflect a true “real world” application
  • Training: Jan 2009 to Oct 2020 (142-month period)
  • Testing: Nov 2020 to Oct 2021 (12-month period)
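A minimal sketch of this chronological split, assuming the 154 monthly observations are held in order. The helper names `month_labels` and `split_tail` are hypothetical; the key point is that the last 12 months are held out without shuffling, since order matters in a time series.

```python
def month_labels(start_year, start_month, n):
    """Return n consecutive (year, month) pairs starting at the given month."""
    labels = []
    year, month = start_year, start_month
    for _ in range(n):
        labels.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels


def split_tail(series, test_size):
    """Hold out the final test_size observations; order is preserved."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


months = month_labels(2009, 1, 154)      # Jan 2009 through Oct 2021
train, test = split_tail(months, 12)
print(len(train), train[-1], len(test), test[0], test[-1])
```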

Model Creation - Exponential Smoothing and Holt

  • Credit for these ideas goes to Charles Holt, Professor of Business and Finance at the University of Texas at Austin

  • Developed throughout the mid to late 1950s

  • Holt expanded on the foundations of SES (developed in the previous decade) by considering trend

Model Type | Explanation | Trend? | Seasonality? | Assumptions
Simple Exponential Smoothing (SES) | Weighted average; More recent observations = More weight | No | No | Values are not changing much from what they’ve recently been
Holt Additive | Generalized SES, but ADDS trend parameter | Yes | No | Values increasing/decreasing in LINEAR fashion
Holt Additive W/Damp | Generalized SES, ADDS trend parameter, but DAMPS parameter to reduce abs[trend’s impact] | Yes | No | Linear increase/decrease, but will weaken in strength over time
Holt Multiplicative W/Damp | Like additive iteration, but MULTIPLIES trend parameter | Yes | No | Values increasing/decreasing by a RATE
  • Note: Holt Multiplicative without Damp is rarely used
    • Forecasts approach \(\infty\) as the horizon grows
    • Computationally unstable
    • Not realistic
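To make the SES and additive Holt rows above concrete, here is a small Python sketch of the recursions. It is illustrative only: the smoothing parameters are passed in by hand rather than estimated (the analysis relied on R's forecast package for that), and the initial level and trend are simple assumed choices.

```python
def ses_forecast(y, alpha, horizon):
    """Simple exponential smoothing: a level only, so the forecast is flat."""
    level = y[0]  # assumed initialisation: first observation
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * horizon


def holt_forecast(y, alpha, beta, horizon, phi=1.0):
    """Holt's additive trend method; phi < 1 damps the trend, phi = 1 does not."""
    level, trend = y[0], y[1] - y[0]  # assumed simple initialisation
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    forecasts, damp_sum = [], 0.0
    for h in range(1, horizon + 1):
        damp_sum += phi ** h  # phi + phi^2 + ... + phi^h
        forecasts.append(level + damp_sum * trend)
    return forecasts


# First six monthly opens from the table above; parameter values are
# illustrative placeholders, not the fitted values from the analysis.
opens = [52.01, 62.87, 68.35, 78.28, 77.84, 83.19]
print(ses_forecast(opens, alpha=0.8, horizon=3))
print(holt_forecast(opens, alpha=0.8, beta=0.2, horizon=3))
print(holt_forecast(opens, alpha=0.8, beta=0.2, horizon=3, phi=0.8))
```

Because SES carries no trend term, every forecast equals the final smoothed level, which is why a single SES value repeats across an entire forecast horizon.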

Model Creation - Holt-Winters

  • Holt and a student of his, Peter Winters, then went even further and developed a method that also factored in seasonality

  • Winters officially published these findings in the 1960 paper “Forecasting Sales by Exponentially Weighted Moving Averages”

Model Type | Explanation | Trend? | Seasonality? | Assumptions
Holt-Winters (HW) Additive | Adds data’s trend and constant oscillation due to seasonality | Yes | Yes | Linear upward/downward overall trend; seasonal fluctuations of constant size
HW Additive W/Damp | Adds a DAMPED trend and constant seasonality | Yes | Yes | Upward/downward overall trend appears linear, but its strength is decreasing; seasonal fluctuations remain constant
HW Multiplicative | Multiplies data’s trend and proportional oscillation due to seasonality | Yes | Yes | Upward/downward overall trend per a certain rate; seasonal fluctuations behave proportionally
HW Multiplicative W/Damp | Multiplies a DAMPED trend and proportional seasonality | Yes | Yes | Upward/downward overall trend per a certain rate, but will decrease in magnitude; proportional seasonal fluctuations
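For reference, the additive Holt-Winters row is commonly written as the recursions below. The notation is the standard textbook one and is not taken from the slides: \(\ell_t\) is the level, \(b_t\) the trend, \(s_t\) the seasonal term, \(m\) the season length (12 for monthly data), and \(\alpha\), \(\beta\), \(\gamma\) the smoothing parameters.

\[
\begin{aligned}
\hat{y}_{t+h|t} &= \ell_t + h\,b_t + s_{t+h-m(k+1)}, \qquad k = \lfloor (h-1)/m \rfloor \\
\ell_t &= \alpha\,(y_t - s_{t-m}) + (1-\alpha)\,(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1} \\
s_t &= \gamma\,(y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)\,s_{t-m}
\end{aligned}
\]

The damped additive variant replaces \(h\,b_t\) in the forecast with \((\phi + \phi^2 + \dots + \phi^h)\,b_t\) and \(b_{t-1}\) in the updates with \(\phi\,b_{t-1}\). With multiplicative seasonality, the seasonal term is divided out of \(y_t\) and multiplied into the forecast rather than subtracted and added.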

Model Forecasts

Model Forecasts for Monthly Amazon Stock Openings ($USD)
Month | True Values | SES | Holt Add | Holt Add W/ Damp | Holt Multiply W/ Damp | HW Add | HW Multiply | HW Add W/ Damp | HW Multiply W/ Damp
Nov 2020 | 3147.33 | 3241.35 | 3340.39 | 3289.48 | 3300.31 | 3331.56 | 3212.56 | 3318.53 | 3197.97
Dec 2020 | 3199.93 | 3241.35 | 3439.42 | 3327.95 | 3360.33 | 3419.13 | 3237.60 | 3389.56 | 3200.23
Jan 2021 | 3206.54 | 3241.35 | 3538.45 | 3358.72 | 3409.14 | 3526.80 | 3386.01 | 3470.33 | 3326.71
Feb 2021 | 3267.66 | 3241.35 | 3637.48 | 3383.34 | 3448.69 | 3636.39 | 3473.67 | 3557.02 | 3472.25
Mar 2021 | 3074.58 | 3241.35 | 3736.51 | 3403.03 | 3480.66 | 3719.15 | 3493.00 | 3614.45 | 3430.89
Apr 2021 | 3347.73 | 3241.35 | 3835.53 | 3418.79 | 3506.45 | 3847.23 | 3780.15 | 3711.32 | 3696.34
May 2021 | 3261.31 | 3241.35 | 3934.56 | 3431.39 | 3527.23 | 3979.00 | 3984.88 | 3811.29 | 3830.22
Jun 2021 | 3360.01 | 3241.35 | 4033.59 | 3441.48 | 3543.93 | 4087.03 | 4274.96 | 3880.84 | 3946.92
Jul 2021 | 3612.71 | 3241.35 | 4132.62 | 3449.54 | 3557.35 | 4207.22 | 4678.05 | 3961.56 | 4258.44
Aug 2021 | 3310.76 | 3241.35 | 4231.65 | 3456.00 | 3568.13 | 4293.29 | 4670.11 | 4008.19 | 4273.38
Sept 2021 | 3432.44 | 3241.35 | 4330.68 | 3461.16 | 3576.77 | 4395.84 | 4695.49 | 4064.64 | 4165.20
Oct 2021 | 3325.98 | 3241.35 | 4429.70 | 3465.29 | 3583.70 | 4479.89 | 4599.32 | 4099.32 | 4050.32
  • The non-damped Holt and HW models drastically overestimated

  • The damped HW models overestimated but not as much

  • Most accurate were SES and damped Holt models

Model Evaluation

  • Before plotting, wanted to use a table to review each model’s accuracy measures

  • Using R’s accuracy function (forecast package), calculated error metrics below

    • Note that calculations are based on training data
  • Models with the best value for each metric are marked

Model Accuracy Summary Table
Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1
SES | 22.4633 | 78.0381 | 43.8711 | 2.6342 | 5.8786 | 0.1942 | 0.2736
Holt Add | 6.3619 | 72.6341 | 39.5756 | 0.3233 | 5.5699 | 0.1752 | 0.14
Holt Add W/ Damp | 10.27 | 72.0711 | 39.6539 | 1.124 | 5.7019 | 0.1755 | -0.0207
Holt Multiply W/ Damp | 8.728 | 71.712 | 39.3588 | 0.9515 | 5.7406 | 0.1742 | -0.0186
HW Add | 6.1619 | 70.4673 | 41.937 | 0.3457 | 7.9333 | 0.1856 | 0.1267
HW Multiply | 6.1625 | 67.4218 | 41.9806 | 0.1399 | 6.6633 | 0.1858 | 0.0722
HW Add W/ Damp | 9.0695 | 70.5768 | 42.2699 | 0.7657 | 8.0253 | 0.1871 | 0.1107
HW Multiply W/ Damp | 11.6052 | 61.579 | 39.8682 | 0.9885 | 6.2923 | 0.1765 | 0.1686
  • Damped Holt Multiplicative model had best ranking in three of the seven error measurements (MAE, MASE, ACF1)
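One way to tally the rankings from the summary table is sketched below in Python. It assumes that "best" means the value closest to zero in every column, which is a judgment call for the signed measures (ME, MPE, ACF1); the helper name `best_model_per_metric` is hypothetical.

```python
from collections import Counter

METRICS = ["ME", "RMSE", "MAE", "MPE", "MAPE", "MASE", "ACF1"]

# Training-set accuracy values copied from the summary table.
TABLE = {
    "SES":                   [22.4633, 78.0381, 43.8711, 2.6342, 5.8786, 0.1942, 0.2736],
    "Holt Add":              [6.3619, 72.6341, 39.5756, 0.3233, 5.5699, 0.1752, 0.14],
    "Holt Add W/ Damp":      [10.27, 72.0711, 39.6539, 1.124, 5.7019, 0.1755, -0.0207],
    "Holt Multiply W/ Damp": [8.728, 71.712, 39.3588, 0.9515, 5.7406, 0.1742, -0.0186],
    "HW Add":                [6.1619, 70.4673, 41.937, 0.3457, 7.9333, 0.1856, 0.1267],
    "HW Multiply":           [6.1625, 67.4218, 41.9806, 0.1399, 6.6633, 0.1858, 0.0722],
    "HW Add W/ Damp":        [9.0695, 70.5768, 42.2699, 0.7657, 8.0253, 0.1871, 0.1107],
    "HW Multiply W/ Damp":   [11.6052, 61.579, 39.8682, 0.9885, 6.2923, 0.1765, 0.1686],
}


def best_model_per_metric(table, metrics):
    """For each metric, pick the model whose value is closest to zero (assumed criterion)."""
    return {
        metric: min(table, key=lambda name: abs(table[name][i]))
        for i, metric in enumerate(metrics)
    }


best = best_model_per_metric(TABLE, METRICS)
wins = Counter(best.values())
for metric in METRICS:
    print(metric, "->", best[metric])
print(wins.most_common())
```

These figures describe fit to the training data only, so a high tally here says nothing yet about accuracy on the held-out test months.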

Model Plotting

  • Next step was to look at plots (real and recorded values versus what models predicted)

  • SES and damped Holt models are only ones that remotely resemble the true values

Choosing Between Top Models

Prediction and Error Breakdown for SES and Additive Holt Models
Month | True Values | SES | SES Errors | Holt Add W/ Damp | Holt Errors
Nov 2020 | 3147.33 | 3241.35 | 94.02 | 3289.48 | 142.15
Dec 2020 | 3199.93 | 3241.35 | 41.42 | 3327.95 | 128.02
Jan 2021 | 3206.54 | 3241.35 | 34.81 | 3358.72 | 152.18
Feb 2021 | 3267.66 | 3241.35 | -26.31 | 3383.34 | 115.68
Mar 2021 | 3074.58 | 3241.35 | 166.77 | 3403.03 | 328.45
Apr 2021 | 3347.73 | 3241.35 | -106.38 | 3418.79 | 71.06
May 2021 | 3261.31 | 3241.35 | -19.96 | 3431.39 | 170.08
Jun 2021 | 3360.01 | 3241.35 | -118.66 | 3441.48 | 81.47
Jul 2021 | 3612.71 | 3241.35 | -371.36 | 3449.54 | -163.17
Aug 2021 | 3310.76 | 3241.35 | -69.41 | 3456.00 | 145.24
Sept 2021 | 3432.44 | 3241.35 | -191.09 | 3461.16 | 28.72
Oct 2021 | 3325.98 | 3241.35 | -84.63 | 3465.29 | 139.31
Average | 3295.58 | 3241.35 | -54.23 | 3407.18 | 111.60
  • The SES model:
    • Four overestimations
    • Eight underestimations
    • Range of Error [19.96 to 371.36]
  • The Holt Additive Damped model:
    • Eleven overestimations
    • One underestimation
    • Range of Error [28.72 to 328.45]
  • Slight edge to SES model, still need to dig further

Choosing Between Top Models

  • Will calculate forecasts from each of these two models versus the known values (test data) to make ultimate decision

  • Two measures of absolute error (ME and MSE)

  • Two measures of relative error (MPE and MAPE)

Measure | SES | Holt Add W/ Damp
ME | -54.23 | 111.60
MSE | 21039.21 | 24131.76
MPE | -1.49 | 3.51
MAPE | 3.28 | 4.26
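The four test-set measures can be recomputed from the forecasts and true values with a few lines of Python. This is a sketch rather than the code used in the analysis; errors are taken as forecast minus actual, matching the sign of the error columns in the breakdown table (the opposite convention, actual minus forecast, is also common), and the function name `error_measures` is hypothetical.

```python
ACTUAL = [3147.33, 3199.93, 3206.54, 3267.66, 3074.58, 3347.73,
          3261.31, 3360.01, 3612.71, 3310.76, 3432.44, 3325.98]
SES = [3241.35] * 12
HOLT_ADD_DAMPED = [3289.48, 3327.95, 3358.72, 3383.34, 3403.03, 3418.79,
                   3431.39, 3441.48, 3449.54, 3456.00, 3461.16, 3465.29]


def error_measures(actual, forecast):
    """Return ME, MSE, MPE and MAPE, with error = forecast - actual."""
    errors = [f - a for a, f in zip(actual, forecast)]
    pct_errors = [100 * e / a for e, a in zip(errors, actual)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "MSE": sum(e * e for e in errors) / n,
        "MPE": sum(pct_errors) / n,
        "MAPE": sum(abs(p) for p in pct_errors) / n,
    }


ses_measures = error_measures(ACTUAL, SES)
holt_measures = error_measures(ACTUAL, HOLT_ADD_DAMPED)
for name, measures in [("SES", ses_measures), ("Holt Add W/ Damp", holt_measures)]:
    print(name, {k: round(v, 2) for k, v in measures.items()})
```

ME and MPE let positive and negative errors cancel, so they indicate bias (over- or under-forecasting), while MSE and MAPE measure the size of the misses regardless of direction.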

Final Model Selection

  • Simple Exponential Smoothing (SES) was the most effective and accurate for the purposes of this analysis

SES Accuracy Measures Calculated via Testing Data
ME -54.23
MSE 21039.21
MPE -1.49
MAPE 3.28

Conclusions/Explanations

Why did the simplest model perform the best?

1.) The massive spike in the rate of Amazon’s stock increase from about 2018 to October of 2021

  • Because of the “out of norm” rate of increase, all the non-damped models (Holt and HW) drastically overforecast

Conclusions/Explanations

2.) Seasonal oscillations mirrored the overall data trend; disparities grew at an exponential rate over the last ~4 years of the data set

  • Damped HW models assume the trend weakens over time, not the season-by-season disparities
    • This is because seasonal oscillations are usually constant (Additive) or proportional (Multiplicative), regardless of whether the trend is increasing or decreasing

Conclusions/Explanations

3.) Damping parameter phi (pronounced “fee”) (\(\phi\)) was too high

  • \(\phi\) always lies between 0 and 1
    • If \(\phi\) = 1, then there is no damping
  • Estimated automatically by the model-fitting functions in R’s forecast package
  • Per the $ selector on the fitted models, \(\phi\) ≈ 0.80 for the damped Holt Additive and damped Holt Multiplicative models

  • I did attempt to manually rebuild these two Holt models with a lower phi value, but was consistently met with errors
    • The values of phi, alpha, and beta are all interdependent
    • I could not even lower phi’s value to 0.79
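The effect of \(\phi\) can be read directly off the damped Holt Additive forecasts. In a damped additive trend the h-step forecast is the level plus \((\phi + \phi^2 + \dots + \phi^h)\) times the trend, so each month's increase should be \(\phi\) times the previous month's increase. The Python sketch below checks that on the forecast column from the table; the function name `implied_damping` is hypothetical.

```python
# Damped Holt Additive forecasts, Nov 2020 through Oct 2021, from the forecast table.
HOLT_ADD_DAMPED = [3289.48, 3327.95, 3358.72, 3383.34, 3403.03, 3418.79,
                   3431.39, 3441.48, 3449.54, 3456.00, 3461.16, 3465.29]


def implied_damping(forecasts):
    """Ratios of successive forecast increments.

    For a damped additive trend each ratio equals phi; for an undamped
    trend (phi = 1) the increments are constant and every ratio is 1.
    """
    steps = [later - earlier for earlier, later in zip(forecasts, forecasts[1:])]
    return [later / earlier for earlier, later in zip(steps, steps[1:])]


ratios = implied_damping(HOLT_ADD_DAMPED)
print([round(r, 3) for r in ratios])
```

A ratio this close to 1 means the trend fades slowly: after 12 months the forecast is still climbing, which is consistent with the damped Holt model overshooting the test data in most months.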

Takeaways / Lessons

  • Subsetting into training vs testing sets is very important

  • “More complex” does not necessarily = better fit for situation at hand

    • Sometimes more accuracy is retained by not considering all possible factors
  • The value of investing long term in companies you believe in

    • Say you bought 100 shares of Amazon in January of 2009 for $5,201
      • Those shares would be worth $30,288 by January of 2015
      • By January of 2018, they would be worth $130,138
      • If you continued to hold, they would be worth $332,598 by October of 2021
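The arithmetic behind this example is just shares times opening price. A short Python check using the two opening prices that appear in the tables of this deck (Jan 2009 and Oct 2021) follows; the name `position_value` is hypothetical.

```python
SHARES = 100

# Opening prices taken from the tables in this deck.
OPEN_JAN_2009 = 52.01
OPEN_OCT_2021 = 3325.98


def position_value(shares, price):
    """Market value of a holding at a given share price."""
    return shares * price


start_value = position_value(SHARES, OPEN_JAN_2009)
end_value = position_value(SHARES, OPEN_OCT_2021)
growth_multiple = end_value / start_value

print(f"Start: ${start_value:,.0f}  End: ${end_value:,.0f}  Multiple: {growth_multiple:.1f}x")
```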

Changes/Ideas for Future Analyses

  • Perform an associative and external analysis, looking at what specific factors across timeline (Jan 2009 - October 2021) contributed to stock’s monthly increase/decrease

  • Perform a time series analysis on more recent data and on a tighter timeline for Amazon

    • Ex: Jan 2022 to Dec 2025, observations at weekly intervals (sample size ~ 200)
      • As we saw, the year of 2020 had a massive impact on both trend and seasonality
  • Perform a time series analysis on the same timeline but with a different company

References