Prob 7.1

Describe a situation or problem from your job, everyday life, current events, etc., for which exponential smoothing would be appropriate. What data would you need? Would you expect the value of alpha (the first smoothing parameter) to be closer to 0 or 1, and why?

We are forecasting Operational Expenses (time series data is needed) for some of our workforce and we are using exponential smoothing functions. For expenses that are seasonal and do not have sparsity issues, the TES (Holts-Winter) does really well. For expenses that are not seasonal and do not contain trends, the SES does quite well sometimes and we let the algorithm pick the optimal alpha values based on the data. Our alpha values are between 0.5 to 0.8 in general; As alpha gets closer to one, this means more weight is given to the more recent observations

require(ggthemes)
library(tidyverse)
library(magrittr)
library(TTR)
library(tidyr)
library(dplyr)
library(lubridate)
library(ggplot2)
library(plotly)
library(fpp2)   
library(forecast)
library(caTools)
library(reshape2)
library(psych) 
require(graphics)
library(qcc)
library(ggQC)

Load Data

df <- read.csv(file="temps.csv",stringsAsFactors = F, header=T) 
#Replacing Unwanted headers
df1 <- df %>% rename_at(vars(starts_with("X")),funs(str_replace(.,"X","")))
head(df1,2)

##     DAY 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
## 1 1-Jul   98   86   91   84   89   84   90   73   82   91   93   95   85   95
## 2 2-Jul   97   90   88   82   91   87   90   81   81   89   93   85   87   90
##   2010 2011 2012 2013 2014 2015
## 1   87   92  105   82   90   85
## 2   84   94   93   85   93   87

Prob 7.2

#make wide to long format
df2 <- melt(df1, id.vars = c("DAY"),variable.name = "YEAR",value.name="TEMP")
#head(df2,20)
df3 = df2 %>% unite(date, DAY, YEAR, sep = "-", remove = FALSE)
#df3
New_Date <- as.Date(df3$date, format="%d-%b-%Y")
#class(New_Date)
df4 <- df3 %>% mutate(date=New_Date) %>% select(c("date","TEMP"))
head(df4,5)

##         date TEMP
## 1 1996-07-01   98
## 2 1996-07-02   97
## 3 1996-07-03   97
## 4 1996-07-04   90
## 5 1996-07-05   89

Distribution of Temperatures: Summary statistics & Histogram

summary(df4$TEMP)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   50.00   79.00   85.00   83.34   90.00  105.00

glimpse(describe(df4$TEMP))

## Observations: 1
## Variables: 13
## $ vars     <dbl> 1
## $ n        <dbl> 2460
## $ mean     <dbl> 83.33902
## $ sd       <dbl> 8.620253
## $ median   <dbl> 85
## $ trimmed  <dbl> 84.0747
## $ mad      <dbl> 7.413
## $ min      <dbl> 50
## $ max      <dbl> 105
## $ range    <dbl> 55
## $ skew     <dbl> -0.8208976
## $ kurtosis <dbl> 0.5796116
## $ se       <dbl> 0.1738011

# Histogram 
hist(df4$TEMP, xlab="Temperature",col="yellow",,main="Fig1a: Histogram of Temperatures in Atlanta",probability=TRUE)
s = sd(df4$TEMP)
m = mean(df4$TEMP)
curve(dnorm(x, mean=m, sd=s), add=TRUE,col = "red", lwd = 3)

Interactive BoxPlot

Temperature over 20 years in Atlanta

#making interactive Boxplot
fig1 <- plot_ly(df2, y = ~TEMP, x = ~YEAR, type = "box")

fig1 <- fig1 %>% layout(title = "Fig1b: Boxplots of Temperature over 20 yrs in Atlanta",
         xaxis = list(title = "Year"),
         yaxis = list (title = "Degree Fahrenheit")) %>% layout(plot_bgcolor='rgb(555, 666, 188)')
fig1

Observation1:

A left-skewed histogram and boxplot with low temperature outliers indicated the data set contained more lower temperatures than higher temperatures over the 20 year span
The question is whether there exist enough statistical evidence to support such a hypothesis? We shall find out….

Holts Winters Plot

Original plot of data
Timeseries conversion
Making Temperatures into Timeseries Object to feed into Holts-Winter’s function call

#Plot original dataset via Holts-Winters
dat_ts <- ts(df4[, 2], start = c(1996,1), frequency = 123)
plot(dat_ts)

Fitting via Holts Winters

#Fit Holt-Winters exponential smoothing model to data and show summary
fit_hw <- HoltWinters(dat_ts,seasonal="multiplicative", optim.start = c(alpha = 0.3, beta = 0.1, gamma = 0.1))
summary(fit_hw)

##              Length Class  Mode     
## fitted       9348   mts    numeric  
## x            2460   ts     numeric  
## alpha           1   -none- numeric  
## beta            1   -none- numeric  
## gamma           1   -none- numeric  
## coefficients  125   -none- numeric  
## seasonal        1   -none- character
## SSE             1   -none- numeric  
## call            4   -none- call

#Plot the fitted values overlaying original data
plot(fit_hw )

plot(fitted(fit_hw))

The “Greeks”

alpha, beta & gamma

fit_hw$alpha

##    alpha 
## 0.615003

fit_hw$beta

## beta 
##    0

fit_hw$gamma

##     gamma 
## 0.5495256

Seasonality Factors

#binding dates and seasonality factors together to make dataframe
df4.shorten<-df4 %>% slice(124:2460)
test<-as.vector(fit_hw$fitted[,4])
df.XmR<- df4.shorten %>% mutate(seaonality.factor=test)%>% select(c("date","seaonality.factor"))
head(df.XmR,5)

##         date seaonality.factor
## 1 1997-07-01          1.052653
## 2 1997-07-02          1.100742
## 3 1997-07-03          1.135413
## 4 1997-07-04          1.110338
## 5 1997-07-05          1.025231

Line Charts of seasonality factors

19 years of seasonality factors (excluding first year)
Year 1996 was “lost” in the computation because Holts-Winters needed a full cycle (123 days in our case) to determine seasonality factors

#Grouping avg SI by year
df3.shorten<-df3 %>% slice(124:2460) %>%  select(c("date","YEAR"))%>% mutate(seaonality.factor=test)
df.sf<-df3.shorten %>% group_by(YEAR) %>% summarise(Seasonality.Index=1*mean(seaonality.factor))
head(df.sf,5)

## # A tibble: 5 x 2
##   YEAR  Seasonality.Index
##   <fct>             <dbl>
## 1 1997              1    
## 2 1998              0.998
## 3 1999              0.998
## 4 2000              0.998
## 5 2001              0.997

#line charts of seasonality factors
ggplot(data=df.sf, aes(x=YEAR, y=Seasonality.Index, group=1)) +geom_line(color="red")+geom_point()+ggtitle("Fig1C:Line Chart of Avg Seasonality Index over 20 Yrs in Atlanta")+theme_excel()

Observation2:

At this point, Holts-Winter decomposition and figures 1a to 1c are telling 2 different stories:

All the 3 Figures visually showed temperatures were trending lower year over year (YoY)

However, the decomposed trend graph from Holts-Winters indicated NO trending at all!

Holts-Winter indicated that there were some randomness (alpha=0.615003) and seasonalities (gamma=0.5495256) but No Trend; zero beta
Holts-Winter decomposed plots clearly showed flat trend and level, with strong influence of seasonalities

Control chart

Plotting seasonality factors vs. control limits
95% confidence level (alpha=0.05); control limits = +/- 2 Std Errors from the mean

#mean Temp over entire 20 years
m <- matrix(fit_hw$fitted[,4],ncol=123)
temp.mean<-mean(m)#mean ~ 1.0
#median Temp over entire 20 years
temp.median<-median(m)#median ~ 1.01
#Temp Standard Deviation over entire 20 years
temp.sd<-sd(m)#std dev. ~ 0.09 => 2SD = 0.18
# Using 95% confidence level equates to control limits of +/-2 Std Dev from mean
sd.2<-2*temp.sd
mean.1.2<-sd.2+temp.mean
mean.0.8<-temp.mean-sd.2
#control charts with upper and lower limits
ggplot(data=df.XmR, aes(x=date, y=seaonality.factor, group=1))+geom_line(color='blue')+ggtitle("Fig2:Line Chart of Avg seasonality factors over 20 Yrs in Atlanta")+theme_excel()+ geom_hline(yintercept=mean.1.2, linetype="dashed", color = "red",size=1.25)+ geom_hline(yintercept=mean.0.8, linetype="dashed", color = "red",size=1.5)

Summary

- All Visual indicators and even the ploted average seasonal indices were pointing to lower temperatures 

- Fig1a thru Fig1c displayed YoY downward temperature trends

- Decomposition of Time Series indicated there were indeed no trend at all; beta was computed to be zero

Conclusion

Like the saying goes, looks can be deceiving. The histogram, boxplot and line chart all were telling us the data were trending lower while in fact, Holts-Winter’s decomposition told us a different story:

- No trends either upwards or downwards were statistically evident

- The average Seasonal Index were lower YoY but the decreases were too small and not statistical strong enough

- Since NO normality assumption is needed in fitting an exponential smoothing model, see reference 1, we can apply the standard confidence intervals to check if temperature spikes were statistically significant 

- Fig 2:Plotting the seaonsal factors with a 95% confidence level in the control chart, we witnessed the data were quite stationary peppered with seasonalities. Any violations against upper or lower bounds were purely due to seasonality effects

Bottomline, it is really hard to tell if unofficial end of summer has gotten later over the 20 years based on this data. i.e, temperatures getting warmer YoY. However, we discovered no statistical evidence to support downward temperature trend over 20 years. And higher temperature spikes especially in the later years of 2011 thru 2015 were driven by seasonalities

Reference 1:

https://stats.stackexchange.com/questions/64911/does-the-holt-winters-algorithm-for-exponential-smoothing-in-time-series-modelli

HW4_ISYE_6501

9/11/2020