Question 7.1 - Describe a situation or problem from your
job, everyday life, current events, etc., for which exponential
smoothing would be appropriate. What data would you need? Would you
expect the value of alpha (the first smoothing parameter) to be closer
to 0 or 1, and why?
In previous weeks, all my examples have related to the environment.
Continuing that theme, one area where exponential smoothing might be
helpful is examining El Niño cycles. El Niño recurs roughly every 2-7
years, so smoothing helps dampen the noise created by the unpredictable
nature of these cycles. We could take El Niño indicators such as
rainfall and temperature and use their smoothed values to anticipate
these sporadic cycles. I would expect alpha to be closer to 0 than to 1
because of the fluctuating nature of the data: an El Niño cycle takes
months to years to develop, so we want the previous smoothed value to be
weighted more heavily than each new observation.
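To make the role of alpha concrete, here is a minimal sketch on simulated (hypothetical) data. Simple exponential smoothing updates via S_t = alpha * x_t + (1 - alpha) * S_(t-1), so a small alpha leans on the previous smoothed value while a large alpha chases each new observation.
# Simulated (hypothetical) series: a slow cycle plus noise.
set.seed(42)
x = ts(sin(2 * pi * (1:100) / 25) + rnorm(100, sd = 0.5))
# Small alpha: the previous smoothed value dominates (heavy smoothing).
smooth_low = HoltWinters(x, alpha = 0.1, beta = FALSE, gamma = FALSE)
# Large alpha: each new observation dominates (tracks the noise).
smooth_high = HoltWinters(x, alpha = 0.9, beta = FALSE, gamma = FALSE)
plot(x, col = "grey")
lines(smooth_low$fitted[, "xhat"], col = "blue")
lines(smooth_high$fitted[, "xhat"], col = "red")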
Question 7.2 - Using the 20 years of daily high
temperature data for Atlanta (July through October) from Question 6.2
(file temps.txt), build and use an exponential smoothing model to help
make a judgment of whether the unofficial end of summer has gotten later
over the 20 years. (Part of the point of this assignment is for you to
think about how you might use exponential smoothing to answer this
question. Feel free to combine it with other models if you’d like to.
There’s certainly more than one reasonable
approach.)
Note: in R, you can use either HoltWinters (simpler to
use) or the smooth package’s es function (harder to use, but more
general). If you use es, the Holt-Winters model uses model="AAM" in the
function call (the first and second constants are used "A"dditively, and
the third (seasonality) is used "M"ultiplicatively; the documentation
doesn't make that clear).
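For completeness, a sketch of the es() call the note describes (assumes the smooth package is installed and that temp_ts, the time series built below, already exists; not pursued further here).
# Sketch only: Holt-Winters via the smooth package's es(), per the note above.
library(smooth)
es_model = es(temp_ts, model = "AAM")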
Methodology - We begin by reading in our data as we did last week. However, instead of jumping straight into a CUSUM model, we first test different HoltWinters specifications. There are four possibilities: simple exponential smoothing with neither trend nor seasonality; one with a trend but no seasonality; one with a trend and additive seasonality; and one with a trend and multiplicative seasonality. Let us examine these four models in R.
temp_data <- read.table("C:/Users/james/OneDrive/Desktop/Georgia Tech/ISYE 6501/Homeworks/Homework 4/Homework4_ISYE6501/Homework4_ISYE6501/data 7.2/temps.txt", header = TRUE)
head(temp_data)
Let us use unlist together with as.vector to put all our temperatures into a single vector. Once we have that vector, we can format it as a time-series object using the ts function.
all_temps = as.vector(unlist(temp_data[,2:21]))
head(all_temps)
## [1] 98 97 97 90 89 93
# Note: specifying end = 2015 truncates the series at the first observation
# of 2015, so 2015 is effectively excluded from the models that follow.
temp_ts = ts(all_temps, start = 1996, end = 2015, frequency = 123)
plot(temp_ts)
Let us now fit each model and inspect the head of its fitted values.
# We fit four HoltWinters models: a basic one (level only), one with a trend,
# and two with trend plus seasonality, one additive and one multiplicative.
ex_smooth1 = HoltWinters(temp_ts, beta = FALSE, gamma = FALSE)
head(ex_smooth1$fitted)
## Time Series:
## Start = c(1996, 2)
## End = c(1996, 7)
## Frequency = 123
## xhat level
## 1996.008 98.00000 98.00000
## 1996.016 97.16037 97.16037
## 1996.024 97.02572 97.02572
## 1996.033 91.12671 91.12671
## 1996.041 89.34106 89.34106
## 1996.049 92.41322 92.41322
ex_smooth2 = HoltWinters(temp_ts, gamma = FALSE)
head(ex_smooth2$fitted)
## Time Series:
## Start = c(1996, 3)
## End = c(1996, 8)
## Frequency = 123
## xhat level trend
## 1996.016 96.00000 97.00000 -1.0000000
## 1996.024 95.84872 96.84553 -0.9968058
## 1996.033 89.88796 90.90345 -1.0154880
## 1996.041 88.11884 89.13716 -1.0183244
## 1996.049 91.24328 92.24601 -1.0027328
## 1996.057 91.73152 92.72864 -0.9971214
ex_smooth3 = HoltWinters(temp_ts, seasonal = "additive")
head(ex_smooth3$fitted)
## Time Series:
## Start = c(1997, 1)
## End = c(1997, 6)
## Frequency = 123
## xhat level trend season
## 1997.000 87.17619 82.87739 -0.004362918 4.303159
## 1997.008 90.32137 82.08762 -0.004362918 8.238119
## 1997.016 92.95607 81.86865 -0.004362918 11.091777
## 1997.024 90.93226 81.89363 -0.004362918 9.042997
## 1997.033 83.99752 81.93450 -0.004362918 2.067387
## 1997.041 84.04359 81.93179 -0.004362918 2.116168
ex_smooth4 = HoltWinters(temp_ts, seasonal = "multiplicative")
head(ex_smooth4$fitted)
## Time Series:
## Start = c(1997, 1)
## End = c(1997, 6)
## Frequency = 123
## xhat level trend season
## 1997.000 87.23653 82.87739 -0.004362918 1.052653
## 1997.008 90.41281 82.14241 -0.004362918 1.100742
## 1997.016 92.99079 81.90479 -0.004362918 1.135413
## 1997.024 90.93790 81.90547 -0.004362918 1.110338
## 1997.033 83.99871 81.93589 -0.004362918 1.025231
## 1997.041 84.04480 81.93231 -0.004362918 1.025838
From these summaries, it would be hard to judge when summer ends using model 1 or model 2, since they give us only a level (and a trend) with no seasonal component. The last two models, however, include a seasonal factor: in the additive model the factors sum to roughly zero over each season, while in the multiplicative model they average out to roughly 1. For ease of interpretation, we will use the multiplicative model. We will then combine exponential smoothing with the cumulative sum method to detect when the seasonal factor drops below the threshold of 1, which we take to mark the start of the new season.
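As a quick sanity check on that interpretation (a sketch; column names follow the fitted output shown above), we can average the seasonal factors and also compare the in-sample squared errors of the four fits:
# Multiplicative factors should average to roughly 1, additive ones to roughly 0.
mean(ex_smooth4$fitted[, "season"])
mean(ex_smooth3$fitted[, "season"])
# In-sample sum of squared errors for each of the four models.
c(simple = ex_smooth1$SSE, trend = ex_smooth2$SSE,
  additive = ex_smooth3$SSE, multiplicative = ex_smooth4$SSE)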
ex_smooth_op = ex_smooth4
plot(ex_smooth_op)
head(ex_smooth_op$fitted)
## Time Series:
## Start = c(1997, 1)
## End = c(1997, 6)
## Frequency = 123
## xhat level trend season
## 1997.000 87.23653 82.87739 -0.004362918 1.052653
## 1997.008 90.41281 82.14241 -0.004362918 1.100742
## 1997.016 92.99079 81.90479 -0.004362918 1.135413
## 1997.024 90.93790 81.90547 -0.004362918 1.110338
## 1997.033 83.99871 81.93589 -0.004362918 1.025231
## 1997.041 84.04480 81.93231 -0.004362918 1.025838
Here, we pull all our seasonal factors, assign them to their years, and find when we cross our threshold in each year.
# The seasonal factor is our baseline for the season. While it is above 1, we
# know it is warmer and probably still summer; once it drops below 1, things
# cool off and we enter fall. So we take each year's seasonal factors and find
# when they fall below the line y = 1.
seasonal_values = as.vector(ex_smooth_op$fitted[,4])
seasonal_values[1]
## [1] 1.052653
# Drop the final fitted value so the vector holds exactly 18 full years
# (18 * 123 = 2214) covering 1997-2014.
seasonal_values = seasonal_values[-length(seasonal_values)]
Now let us apply a CUSUM-style function to detect when the seasonal factor drops below 1 in each year.
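For reference, the textbook one-sided CUSUM for detecting a decrease below a baseline $\mu$ accumulates evidence as

$$S_t = \max\left(0,\ S_{t-1} + (\mu - x_t - c)\right), \qquad \text{flag a change when } S_t \ge T,$$

where here $\mu = 1$ and $x_t$ is the fitted seasonal factor on day $t$. One caveat: the implementation below is a slight variant that resets the running sum to zero whenever a single day's increment $(1 - x_t - c)$ is non-positive, so it demands somewhat more sustained evidence before crossing the threshold.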
cusum_fun = function(c, T){
  # One CUSUM pass per year of seasonal factors (1997-2014).
  for (i in 1:18){
    result = rep(0, 123)
    for (j in 1:123){
      # Decrease below the baseline of 1, less the slack c.
      data_differ = 1 - seasonal_values[(i-1)*123 + j] - c
      if (data_differ <= 0){
        # Reset the running sum when there is no evidence of a decrease.
        result[j] = 0
      }
      else{
        if (j == 1){
          # Start of a year: nothing accumulated yet.
          result[j] = data_differ
        }
        else{
          result[j] = result[j-1] + data_differ
        }
      }
    }
    # First day the running sum crosses the threshold T.
    cross = which(result >= T)[1]
    # Column names are X1997, X1998, ...; strip the leading "X".
    output = paste(substring(colnames(temp_data)[i+2], 2), ":", temp_data[cross,1])
    print(output)
  }
}
Now that we have built our CUSUM function, we can test it, but first we must select values for c and T. I want to treat roughly a 5% dip in the seasonal factor as meaningful, so we select c = .05, and a cumulative threshold of around 20%, so t = .2.
summary(all_temps)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 50.00 79.00 85.00 83.34 90.00 105.00
c = .05
t = .20
cusum_fun(c,t)
## [1] "1997 : 30-Sep"
## [1] "1998 : 1-Oct"
## [1] "1999 : 1-Oct"
## [1] "2000 : 1-Oct"
## [1] "2001 : 1-Oct"
## [1] "2002 : 1-Oct"
## [1] "2003 : 2-Oct"
## [1] "2004 : 2-Oct"
## [1] "2005 : 3-Oct"
## [1] "2006 : 3-Oct"
## [1] "2007 : 3-Oct"
## [1] "2008 : 3-Oct"
## [1] "2009 : 4-Oct"
## [1] "2010 : 3-Oct"
## [1] "2011 : 3-Oct"
## [1] "2012 : 2-Oct"
## [1] "2013 : 2-Oct"
## [1] "2014 : 3-Oct"
Looking at these dates, the detected end of summer drifts from September 30 in 1997 out to around October 3-4 by the mid-2000s and stays in early October thereafter, so I think we can tentatively conclude that summer got slightly longer from 1997 to 2014.
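To make "later" a bit more concrete, here is a minimal follow-up sketch: the detected dates above, converted to day-of-year by hand (ignoring leap years), with a simple linear trend fitted over 1997-2014. The slope estimate gives a rough rate, in days per year, at which the detected end of summer moves.
# Day-of-year for the detections printed above (30-Sep = 273, 1-Oct = 274,
# and so on in a non-leap year), transcribed by hand from the output.
end_day = c(273, 274, 274, 274, 274, 274, 275, 275, 276, 276,
            276, 276, 277, 276, 276, 275, 275, 276)
years = 1997:2014
# Estimated trend in days per year.
summary(lm(end_day ~ years))$coefficients
For comparison, let us now re-run last week's pure CUSUM function on the raw temperatures and see how the two approaches differ.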
cusum_1 = function(C, T) {
# Last week's approach: one CUSUM pass per year column (columns 2 through 21),
# using the July/August mean as the baseline.
for (i in 2:21){
year_data = temp_data[,i]
year_results = rep(0, length(year_data))
# 62 days in July and August combined; their mean is the summer baseline.
year_mean = mean(year_data[1:62])
for (j in 1:length(year_data)){
year_differ = year_mean - year_data[j] - C
if (year_differ <= 0){
year_results[j] = 0
}
else{
if (j == 1){
year_results[j] = year_differ
}
else{
year_results[j] = year_results[j-1] + year_differ
}
}
}
year_cross = which(year_results >= T)[1]
output = paste(substring(colnames(temp_data)[i],2), ":", temp_data[year_cross,1])
print(output)
}
}
# Standard deviation of the raw temperatures, used to set the scale of C and T.
st_dev_pre = sd(all_temps)
st_dev_pre
## [1] 8.620253
c_pre = 5   # a bit over half of one standard deviation
t_pre = 20  # a little more than two standard deviations
cusum_1(c_pre, t_pre)
## [1] "1996 : 20-Sep"
## [1] "1997 : 26-Sep"
## [1] "1998 : 6-Oct"
## [1] "1999 : 13-Jul"
## [1] "2000 : 26-Jul"
## [1] "2001 : 25-Sep"
## [1] "2002 : 24-Sep"
## [1] "2003 : 29-Sep"
## [1] "2004 : 16-Sep"
## [1] "2005 : 7-Oct"
## [1] "2006 : 14-Sep"
## [1] "2007 : 16-Sep"
## [1] "2008 : 17-Sep"
## [1] "2009 : 2-Sep"
## [1] "2010 : 28-Sep"
## [1] "2011 : 6-Sep"
## [1] "2012 : 14-Sep"
## [1] "2013 : 17-Aug"
## [1] "2014 : 26-Sep"
## [1] "2015 : 14-Sep"
Accordingly, we can now compare the dates from last week's plain CUSUM function with those from this week's exponential smoothing plus CUSUM approach.
Discussion of Results - As we can see, the results differ between last week and this week. Last week, most of the detected dates fell in late September, with occasional August/early-September and October dates (including two mid-July detections that look like false alarms). This week, the noise reduction is clear: every detection lands within a few days of October 1, and the year-to-year drift toward later dates is easier to see than it was last week. By smoothing the series first, we get a much more consistent picture of the yearly temperature cycle, which demonstrates the value of using exponential smoothing in conjunction with the CUSUM method.