QUESTION 7.1
Describe a situation or problem from your job, everyday life, current events, etc., for which exponential smoothing would be appropriate. What data would you need? Would you expect the value of a (the first smoothing parameter) to be closer to 0 or 1, and why?
I currently work for a retail company. Exponential smoothing is well-suited for forecasting in scenarios like inventory management for a retail store, where you aim to predict product demand that fluctuates but does not exhibit strong seasonal or trend patterns.
The value of α, the first smoothing parameter, depends on the fluctuation pattern of the sales data:
For stable sales with minor random fluctuations, α should be closer to 0, emphasizing the historical data to smooth out the randomness.
For more volatile sales where recent data strongly indicate future trends, α should be closer to 1, giving more weight to the latest observations.
Typically, α might range between 0.1 and 0.5, balancing between stability and responsiveness to recent changes, chosen based on historical data analysis to minimize forecast error.
QUESTION 7.2
Using the 20 years of daily high temperature data for Atlanta (July through October) from Question 6.2 (file temps.txt), build and use an exponential smoothing model to help make a judgment of whether the unofficial end of summer has gotten later over the 20 years.
Clear our environment and load any libraries we might need.
rm(list = ls())
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Load up the Atlanta Temperature Data from the previous HW.
temps <- read.table("/Users/djmariano/Downloads/hw4-SP22/temps.txt", header = TRUE, stringsAsFactors = FALSE)
Preview our data
head(temps)
## DAY X1996 X1997 X1998 X1999 X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007
## 1 1-Jul 98 86 91 84 89 84 90 73 82 91 93 95
## 2 2-Jul 97 90 88 82 91 87 90 81 81 89 93 85
## 3 3-Jul 97 93 91 87 93 87 87 87 86 86 93 82
## 4 4-Jul 90 91 91 88 95 84 89 86 88 86 91 86
## 5 5-Jul 89 84 91 90 96 86 93 80 90 89 90 88
## 6 6-Jul 93 84 89 91 96 87 93 84 90 82 81 87
## X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015
## 1 85 95 87 92 105 82 90 85
## 2 87 90 84 94 93 85 93 87
## 3 91 89 83 95 99 76 87 79
## 4 90 91 85 92 98 77 84 85
## 5 88 80 88 90 100 83 86 84
## 6 82 87 89 90 98 83 87 84
summary(temps)
## DAY X1996 X1997 X1998
## Length:123 Min. :60.00 Min. :55.00 Min. :63.00
## Class :character 1st Qu.:79.00 1st Qu.:78.50 1st Qu.:79.50
## Mode :character Median :84.00 Median :84.00 Median :86.00
## Mean :83.72 Mean :81.67 Mean :84.26
## 3rd Qu.:90.00 3rd Qu.:88.50 3rd Qu.:89.00
## Max. :99.00 Max. :95.00 Max. :95.00
## X1999 X2000 X2001 X2002
## Min. :57.00 Min. : 55.00 Min. :51.00 Min. :57.00
## 1st Qu.:75.00 1st Qu.: 77.00 1st Qu.:78.00 1st Qu.:78.00
## Median :86.00 Median : 86.00 Median :84.00 Median :87.00
## Mean :83.36 Mean : 84.03 Mean :81.55 Mean :83.59
## 3rd Qu.:91.00 3rd Qu.: 91.00 3rd Qu.:87.00 3rd Qu.:91.00
## Max. :99.00 Max. :101.00 Max. :93.00 Max. :97.00
## X2003 X2004 X2005 X2006
## Min. :57.00 Min. :62.00 Min. :54.00 Min. :53.00
## 1st Qu.:78.00 1st Qu.:78.00 1st Qu.:81.50 1st Qu.:79.00
## Median :84.00 Median :82.00 Median :85.00 Median :85.00
## Mean :81.48 Mean :81.76 Mean :83.36 Mean :83.05
## 3rd Qu.:87.00 3rd Qu.:87.00 3rd Qu.:88.00 3rd Qu.:91.00
## Max. :91.00 Max. :95.00 Max. :94.00 Max. :98.00
## X2007 X2008 X2009 X2010
## Min. : 59.0 Min. :50.00 Min. :51.00 Min. :67.00
## 1st Qu.: 81.0 1st Qu.:79.50 1st Qu.:75.00 1st Qu.:82.00
## Median : 86.0 Median :85.00 Median :83.00 Median :90.00
## Mean : 85.4 Mean :82.51 Mean :80.99 Mean :87.21
## 3rd Qu.: 89.5 3rd Qu.:88.50 3rd Qu.:88.00 3rd Qu.:93.00
## Max. :104.0 Max. :95.00 Max. :95.00 Max. :97.00
## X2011 X2012 X2013 X2014
## Min. :59.00 Min. : 56.00 Min. :56.00 Min. :63.00
## 1st Qu.:79.00 1st Qu.: 79.50 1st Qu.:77.00 1st Qu.:81.50
## Median :89.00 Median : 85.00 Median :84.00 Median :86.00
## Mean :85.28 Mean : 84.65 Mean :81.67 Mean :83.94
## 3rd Qu.:94.00 3rd Qu.: 90.50 3rd Qu.:88.00 3rd Qu.:89.00
## Max. :99.00 Max. :105.00 Max. :92.00 Max. :95.00
## X2015
## Min. :56.0
## 1st Qu.:77.0
## Median :85.0
## Mean :83.3
## 3rd Qu.:90.0
## Max. :97.0
Now lets convert our data into a vector. We will ignore the first column and then preview the data.
tempsvector <- as.vector(unlist(temps[,2:21]))
str(tempsvector)
## int [1:2460] 98 97 97 90 89 93 93 91 93 93 ...
Now let’s convert our data into a time series and then plot it. We choose 123 for our frequency because there are 123 days in each of the recorded years.
temps_ts <- ts(data = tempsvector, frequency=123)
Now let’s plot and view our data.
plot.ts(temps_ts)
In the last assignment, we tried to find when the “unofficial end” of summer was (i.e the dates when the temperatures have dropped) for each year. We also tried to estimate whether if the summers have become hotter as the years progressed or not. For this assignment, we will build an exponential smoothing model that lets us judge whether the unofficial end of summer has gotten later over the 20 years.
We will use Holt-Winters to build an exponential smoothing model for our data. Holt-Winters is a forecasting algorithm that is used for time series data, especially data with trends and seasonality.
We set alpha, beta, and gamma to null to find their optimal values. There are two options for seasonality: additive and multiplicative. We will try both and see what gives us the lower SSE (sum of squared error)
We start with multiplicative:
temps_hwm <- HoltWinters(temps_ts,
alpha = NULL,
beta = NULL,
gamma = NULL,
seasonal = "multiplicative")
Let’s view the data.
plot(temps_hwm)
temps_hwm$SSE
## [1] 68904.57
Now we do additive.
temps_hwa <- HoltWinters(temps_ts,
alpha = NULL,
beta = NULL,
gamma = NULL,
seasonal = "additive")
View the data.
plot(temps_hwa)
temps_hwa$SSE
## [1] 66244.25
Hmm, I thought this data would have been a better fit for multiplicative. Anyways, we will use additive since it has a lower SSE.
Now let’s check out what our model’s alpha, beta, and gamma values are.
temps_hwa$alpha
## alpha
## 0.6610618
temps_hwa$beta
## beta
## 0
temps_hwa$gamma
## gamma
## 0.6248076
It selected an alpha of .66 which indicates that recent temperatures are more indicative of upcoming temperatures compared to earlier observations. This makes sense, because in the summer, the temperature will typically drop off steeply with lower temperatures that are more likely to be observed towards the end of the summer. Therefore, this would indicate that the lower temperatures are likely to continue.
It selected a beta of 0 which indicates that there is no trend in the data.
Now let’s plot our model to see these values fitted against our data.
fitted_temps <- temps_hwa$fitted
plot(fitted_temps)
We can now individually see our level, trend, and seasonality along with our fitted data. Next, we will export our data to a CSV to perform a CUSUM analysis in Excel to judge whether the end of summer occurs later than it used to.
temps_excel = data.frame(fitted_temps)
write.csv(temps_excel, file = "/Users/djmariano/Downloads/hw4-SP22/fitted_data")
The remainder of the assignment is finished in Excel. You can find my workbook attached titled hw4_7.2_cusum with my CUSUM analysis. My judgement is that the data does not prove an indication that the unofficial end date of the summer is occuring later in more recent years.