QUESTION 7.1

Describe a situation or problem from your job, everyday life, current events, etc., for which exponential smoothing would be appropriate. What data would you need? Would you expect the value of a (the first smoothing parameter) to be closer to 0 or 1, and why?

I currently work for a retail company. Exponential smoothing is well-suited for forecasting in scenarios like inventory management for a retail store, where you aim to predict product demand that fluctuates but does not exhibit strong seasonal or trend patterns.

The value of α, the first smoothing parameter, depends on the fluctuation pattern of the sales data:

For stable sales with minor random fluctuations, α should be closer to 0, emphasizing the historical data to smooth out the randomness.

For more volatile sales where recent data strongly indicate future trends, α should be closer to 1, giving more weight to the latest observations.

Typically, α might range between 0.1 and 0.5, balancing between stability and responsiveness to recent changes, chosen based on historical data analysis to minimize forecast error.

QUESTION 7.2

Using the 20 years of daily high temperature data for Atlanta (July through October) from Question 6.2 (file temps.txt), build and use an exponential smoothing model to help make a judgment of whether the unofficial end of summer has gotten later over the 20 years.

Clear our environment and load any libraries we might need.

rm(list = ls())
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load up the Atlanta Temperature Data from the previous HW.

temps <- read.table("/Users/djmariano/Downloads/hw4-SP22/temps.txt", header = TRUE, stringsAsFactors = FALSE)

Preview our data

head(temps)
##     DAY X1996 X1997 X1998 X1999 X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007
## 1 1-Jul    98    86    91    84    89    84    90    73    82    91    93    95
## 2 2-Jul    97    90    88    82    91    87    90    81    81    89    93    85
## 3 3-Jul    97    93    91    87    93    87    87    87    86    86    93    82
## 4 4-Jul    90    91    91    88    95    84    89    86    88    86    91    86
## 5 5-Jul    89    84    91    90    96    86    93    80    90    89    90    88
## 6 6-Jul    93    84    89    91    96    87    93    84    90    82    81    87
##   X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015
## 1    85    95    87    92   105    82    90    85
## 2    87    90    84    94    93    85    93    87
## 3    91    89    83    95    99    76    87    79
## 4    90    91    85    92    98    77    84    85
## 5    88    80    88    90   100    83    86    84
## 6    82    87    89    90    98    83    87    84
summary(temps)
##      DAY                X1996           X1997           X1998      
##  Length:123         Min.   :60.00   Min.   :55.00   Min.   :63.00  
##  Class :character   1st Qu.:79.00   1st Qu.:78.50   1st Qu.:79.50  
##  Mode  :character   Median :84.00   Median :84.00   Median :86.00  
##                     Mean   :83.72   Mean   :81.67   Mean   :84.26  
##                     3rd Qu.:90.00   3rd Qu.:88.50   3rd Qu.:89.00  
##                     Max.   :99.00   Max.   :95.00   Max.   :95.00  
##      X1999           X2000            X2001           X2002      
##  Min.   :57.00   Min.   : 55.00   Min.   :51.00   Min.   :57.00  
##  1st Qu.:75.00   1st Qu.: 77.00   1st Qu.:78.00   1st Qu.:78.00  
##  Median :86.00   Median : 86.00   Median :84.00   Median :87.00  
##  Mean   :83.36   Mean   : 84.03   Mean   :81.55   Mean   :83.59  
##  3rd Qu.:91.00   3rd Qu.: 91.00   3rd Qu.:87.00   3rd Qu.:91.00  
##  Max.   :99.00   Max.   :101.00   Max.   :93.00   Max.   :97.00  
##      X2003           X2004           X2005           X2006      
##  Min.   :57.00   Min.   :62.00   Min.   :54.00   Min.   :53.00  
##  1st Qu.:78.00   1st Qu.:78.00   1st Qu.:81.50   1st Qu.:79.00  
##  Median :84.00   Median :82.00   Median :85.00   Median :85.00  
##  Mean   :81.48   Mean   :81.76   Mean   :83.36   Mean   :83.05  
##  3rd Qu.:87.00   3rd Qu.:87.00   3rd Qu.:88.00   3rd Qu.:91.00  
##  Max.   :91.00   Max.   :95.00   Max.   :94.00   Max.   :98.00  
##      X2007           X2008           X2009           X2010      
##  Min.   : 59.0   Min.   :50.00   Min.   :51.00   Min.   :67.00  
##  1st Qu.: 81.0   1st Qu.:79.50   1st Qu.:75.00   1st Qu.:82.00  
##  Median : 86.0   Median :85.00   Median :83.00   Median :90.00  
##  Mean   : 85.4   Mean   :82.51   Mean   :80.99   Mean   :87.21  
##  3rd Qu.: 89.5   3rd Qu.:88.50   3rd Qu.:88.00   3rd Qu.:93.00  
##  Max.   :104.0   Max.   :95.00   Max.   :95.00   Max.   :97.00  
##      X2011           X2012            X2013           X2014      
##  Min.   :59.00   Min.   : 56.00   Min.   :56.00   Min.   :63.00  
##  1st Qu.:79.00   1st Qu.: 79.50   1st Qu.:77.00   1st Qu.:81.50  
##  Median :89.00   Median : 85.00   Median :84.00   Median :86.00  
##  Mean   :85.28   Mean   : 84.65   Mean   :81.67   Mean   :83.94  
##  3rd Qu.:94.00   3rd Qu.: 90.50   3rd Qu.:88.00   3rd Qu.:89.00  
##  Max.   :99.00   Max.   :105.00   Max.   :92.00   Max.   :95.00  
##      X2015     
##  Min.   :56.0  
##  1st Qu.:77.0  
##  Median :85.0  
##  Mean   :83.3  
##  3rd Qu.:90.0  
##  Max.   :97.0

Now lets convert our data into a vector. We will ignore the first column and then preview the data.

tempsvector <- as.vector(unlist(temps[,2:21]))
str(tempsvector)
##  int [1:2460] 98 97 97 90 89 93 93 91 93 93 ...

Now let’s convert our data into a time series and then plot it. We choose 123 for our frequency because there are 123 days in each of the recorded years.

temps_ts <- ts(data = tempsvector, frequency=123)

Now let’s plot and view our data.

plot.ts(temps_ts)

In the last assignment, we tried to find when the “unofficial end” of summer was (i.e the dates when the temperatures have dropped) for each year. We also tried to estimate whether if the summers have become hotter as the years progressed or not. For this assignment, we will build an exponential smoothing model that lets us judge whether the unofficial end of summer has gotten later over the 20 years.

We will use Holt-Winters to build an exponential smoothing model for our data. Holt-Winters is a forecasting algorithm that is used for time series data, especially data with trends and seasonality.

We set alpha, beta, and gamma to null to find their optimal values. There are two options for seasonality: additive and multiplicative. We will try both and see what gives us the lower SSE (sum of squared error)

We start with multiplicative:

temps_hwm <- HoltWinters(temps_ts,
                                 alpha = NULL,
                                 beta = NULL,
                                 gamma = NULL,
                                 seasonal = "multiplicative")

Let’s view the data.

plot(temps_hwm)

temps_hwm$SSE
## [1] 68904.57

Now we do additive.

temps_hwa <- HoltWinters(temps_ts,
                         alpha = NULL,
                         beta = NULL,
                         gamma = NULL,
                         seasonal = "additive")

View the data.

plot(temps_hwa)

temps_hwa$SSE
## [1] 66244.25

Hmm, I thought this data would have been a better fit for multiplicative. Anyways, we will use additive since it has a lower SSE.

Now let’s check out what our model’s alpha, beta, and gamma values are.

temps_hwa$alpha
##     alpha 
## 0.6610618
temps_hwa$beta
## beta 
##    0
temps_hwa$gamma
##     gamma 
## 0.6248076

It selected an alpha of .66 which indicates that recent temperatures are more indicative of upcoming temperatures compared to earlier observations. This makes sense, because in the summer, the temperature will typically drop off steeply with lower temperatures that are more likely to be observed towards the end of the summer. Therefore, this would indicate that the lower temperatures are likely to continue.

It selected a beta of 0 which indicates that there is no trend in the data.

Now let’s plot our model to see these values fitted against our data.

fitted_temps <- temps_hwa$fitted
plot(fitted_temps)

We can now individually see our level, trend, and seasonality along with our fitted data. Next, we will export our data to a CSV to perform a CUSUM analysis in Excel to judge whether the end of summer occurs later than it used to.

temps_excel = data.frame(fitted_temps)
write.csv(temps_excel, file = "/Users/djmariano/Downloads/hw4-SP22/fitted_data")

The remainder of the assignment is finished in Excel. You can find my workbook attached titled hw4_7.2_cusum with my CUSUM analysis. My judgement is that the data does not prove an indication that the unofficial end date of the summer is occuring later in more recent years.