Describe a situation or problem from your job, everyday life, current events, etc., for which exponential smoothing would be appropriate. What data would you need? Would you expect the value of α (the first smoothing parameter) to be closer to 0 or 1, and why?
I’m a consultant for hospitals, so we look at a lot of volume-trending data. Exponential smoothing could be used to analyze month-to-month trends in hospital discharges. I could use hospital discharge data or insurance claims data to track the month of discharge and the volume of discharges in that month, and I could also look at particular diagnosis-related groups to see whether there were real increases in any particular services. I would expect alpha (the first smoothing parameter) to be closer to 1 than to 0, because hospital discharge trends are generally quite consistent month over month; with relatively little randomness in the series, the model would be more likely to treat increases in volume as actual increases rather than noise. The opposite might be true in the outpatient space, where there is much more randomness in the services patients receive, so a value closer to 0 might be better there.
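As a rough sketch of what that might look like in R (the vector name and every number below are made up purely for illustration, not real hospital data):

# Hypothetical monthly discharge counts for one hospital (fabricated for illustration)
discharges <- c(412, 405, 431, 420, 418, 440, 452, 447, 439, 455, 460, 471)
discharges_ts <- ts(discharges, frequency = 12, start = c(2014, 1))
# beta = FALSE and gamma = FALSE reduce Holt-Winters to single exponential smoothing;
# alpha = NULL lets HoltWinters() choose alpha by minimizing the squared error
discharges_hw <- HoltWinters(discharges_ts, alpha = NULL, beta = FALSE, gamma = FALSE)
discharges_hw$alpha # a value near 1 means the model leans heavily on the most recent months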
Using the 20 years of daily high temperature data for Atlanta (July through October) from Question 6.2 (file temps.txt), build and use an exponential smoothing model to help make a judgment of whether the unofficial end of summer has gotten later over the 20 years. (Part of the point of this assignment is for you to think about how you might use exponential smoothing to answer this question. Feel free to combine it with other models if you’d like to. There’s certainly more than one reasonable approach.)
rm(list = ls())
# Load temps data as time series object
temps <- read.table('temps.txt', stringsAsFactors = FALSE, header = TRUE)
head(temps)
## DAY X1996 X1997 X1998 X1999 X2000 X2001 X2002 X2003 X2004 X2005 X2006
## 1 1-Jul 98 86 91 84 89 84 90 73 82 91 93
## 2 2-Jul 97 90 88 82 91 87 90 81 81 89 93
## 3 3-Jul 97 93 91 87 93 87 87 87 86 86 93
## 4 4-Jul 90 91 91 88 95 84 89 86 88 86 91
## 5 5-Jul 89 84 91 90 96 86 93 80 90 89 90
## 6 6-Jul 93 84 89 91 96 87 93 84 90 82 81
## X2007 X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015
## 1 95 85 95 87 92 105 82 90 85
## 2 85 87 90 84 94 93 85 93 87
## 3 82 91 89 83 95 99 76 87 79
## 4 86 90 91 85 92 98 77 84 85
## 5 88 88 80 88 90 100 83 86 84
## 6 87 82 87 89 90 98 83 87 84
# Flatten the 20 yearly columns (X1996 through X2015) into one long vector, then build a
# time series with frequency 123, the number of days in each July-October season
temps_vec <- as.vector(unlist(temps[,2:21]))
temps_ts <- ts(data = temps_vec, frequency=123, start=1996)
summary(temps_ts)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 50.00 79.00 85.00 83.34 90.00 105.00
# Plot the temps time series
ts.plot(temps_ts)
# Use Holt-Winters to smooth the data
# alpha, beta, and gamma are set to NULL so the function will find the optimal values
# the data show clear seasonal variation, so set seasonal to "multiplicative"
temps_hw <- HoltWinters(temps_ts, alpha=NULL, beta=NULL, gamma=NULL, seasonal = "multiplicative")
summary(temps_hw)
## Length Class Mode
## fitted 9348 mts numeric
## x 2460 ts numeric
## alpha 1 -none- numeric
## beta 1 -none- numeric
## gamma 1 -none- numeric
## coefficients 125 -none- numeric
## seasonal 1 -none- character
## SSE 1 -none- numeric
## call 6 -none- call
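# Before plotting, it's worth printing the fitted smoothing parameters themselves
# (output omitted here; the exact values depend on the optimizer)
temps_hw$alpha
temps_hw$beta
temps_hw$gamma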
plot(temps_hw)
# It looks like some smoothing occurred. Let's look at the temps_hw$fitted object more closely, since "fitted" contains
# the smoothed series (xhat) as well as the level, trend, and seasonal components.
head(temps_hw$fitted)
## xhat level trend season
## [1,] 87.23653 82.87739 -0.004362918 1.052653
## [2,] 90.42182 82.15059 -0.004362918 1.100742
## [3,] 92.99734 81.91055 -0.004362918 1.135413
## [4,] 90.94030 81.90763 -0.004362918 1.110338
## [5,] 83.99917 81.93634 -0.004362918 1.025231
## [6,] 84.04496 81.93247 -0.004362918 1.025838
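For reference, these columns correspond to the standard multiplicative Holt-Winters updates (textbook notation with seasonal period $L = 123$; this is a sketch of the usual form, not R's exact internals):

$$
\begin{aligned}
\ell_t &= \alpha \frac{x_t}{s_{t-L}} + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta) b_{t-1} \\
s_t &= \gamma \frac{x_t}{\ell_t} + (1-\gamma) s_{t-L}
\end{aligned}
$$

Consistent with this, each row printed above satisfies xhat = (level + trend) × season.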
# Let's pull the seasonal factors out into a separate matrix, one column per fitted year; we'll add row/column names shortly
temps_hw_sf <- matrix(temps_hw$fitted[,4], nrow=123)
head(temps_hw_sf)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 1.052653 1.049468 1.120607 1.103336 1.118390 1.108172 1.140906
## [2,] 1.100742 1.099653 1.108025 1.098323 1.110184 1.116213 1.126827
## [3,] 1.135413 1.135420 1.139096 1.142831 1.143201 1.138495 1.129678
## [4,] 1.110338 1.110492 1.117079 1.125774 1.134539 1.126117 1.130758
## [5,] 1.025231 1.025233 1.044684 1.067291 1.084725 1.097239 1.115055
## [6,] 1.025838 1.025722 1.028169 1.042340 1.053954 1.067494 1.080203
## [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 1.140574 1.125438 1.122063 1.161415 1.198102 1.198910 1.243012
## [2,] 1.154074 1.142187 1.131889 1.144549 1.134661 1.153433 1.165431
## [3,] 1.156092 1.165657 1.147982 1.149459 1.135756 1.153310 1.155197
## [4,] 1.137722 1.150639 1.146992 1.142497 1.150162 1.151169 1.157751
## [5,] 1.103877 1.120818 1.133733 1.132167 1.142714 1.139244 1.112909
## [6,] 1.094312 1.102680 1.092178 1.075766 1.088547 1.082185 1.103092
## [,15] [,16] [,17] [,18] [,19]
## [1,] 1.243781 1.238435 1.300204 1.290647 1.254521
## [2,] 1.172935 1.190735 1.191956 1.219190 1.228826
## [3,] 1.157286 1.169773 1.189915 1.172309 1.169045
## [4,] 1.163844 1.159343 1.166605 1.167993 1.158956
## [5,] 1.132435 1.132045 1.145230 1.168161 1.170449
## [6,] 1.115071 1.118575 1.121598 1.134962 1.145475
tail(temps_hw_sf)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [118,] 0.9202878 0.9167266 0.9316249 0.9483276 0.9356822 0.8960745
## [119,] 0.9203791 0.8752448 0.8890989 0.9189986 0.9279178 0.8676299
## [120,] 0.9934182 0.9493421 0.9386713 0.9343247 0.9395149 0.9269720
## [121,] 1.0052839 1.0170234 0.9947869 0.9777463 0.9568681 0.9754782
## [122,] 1.0049851 1.0160082 1.0182545 1.0152111 1.0042247 1.0293585
## [123,] 0.9920411 0.9788295 0.9796735 0.9865343 0.9910570 1.0001962
## [,7] [,8] [,9] [,10] [,11] [,12]
## [118,] 0.9115292 0.8880224 0.8919086 0.9016390 0.9096762 0.9076538
## [119,] 0.8817807 0.8675880 0.8778461 0.8903499 0.9073530 0.9042270
## [120,] 0.9360466 0.8945719 0.8856237 0.8857793 0.8844382 0.8961658
## [121,] 0.9685817 0.9862772 0.9533857 0.9449564 0.9580790 0.9214616
## [122,] 0.9928965 1.0143191 1.0032329 1.0049513 1.0001026 0.9921573
## [123,] 0.9651261 0.9733806 0.9879126 0.9947735 0.9800279 0.9931484
## [,13] [,14] [,15] [,16] [,17] [,18]
## [118,] 0.9056794 0.8805521 0.8824280 0.8730733 0.8639161 0.8614441
## [119,] 0.8709238 0.8536790 0.8396531 0.8495173 0.8266791 0.8544568
## [120,] 0.8523899 0.8794887 0.8631928 0.8499712 0.7994624 0.8024431
## [121,] 0.9301216 0.9347985 0.8985886 0.8409815 0.8223158 0.8457842
## [122,] 1.0005049 0.9591606 0.9478091 0.9185031 0.8930257 0.8944247
## [123,] 1.0024638 0.9948792 0.9941488 0.9852592 0.9909004 0.9581546
## [,19]
## [118,] 0.8717307
## [119,] 0.8598048
## [120,] 0.8002607
## [121,] 0.8257107
## [122,] 0.8615051
## [123,] 0.9137532
# Pull the smoothed xhat values out into a matching matrix as well
temps_hw_smoothed <- matrix(temps_hw$fitted[,1], nrow=123)
head(temps_hw_smoothed)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 87.23653 65.04516 90.29613 83.39938 87.68863 78.07509 73.10059
## [2,] 90.42182 84.87634 85.44878 86.44444 84.78855 86.02384 72.13247
## [3,] 92.99734 89.61560 85.65942 92.85774 88.70570 90.23022 77.77739
## [4,] 90.94030 88.47600 84.80741 91.55309 86.98750 87.27931 83.52416
## [5,] 83.99917 83.11178 81.14293 88.80208 81.40681 86.06745 83.86090
## [6,] 84.04496 88.00054 85.21673 91.04477 81.83758 87.87757 78.93483
## [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 87.27074 92.29714 78.50826 81.58696 84.72917 79.51855 86.74604
## [2,] 85.01878 92.85614 88.18138 88.52648 80.39548 85.65722 81.47324
## [3,] 82.68648 92.33884 92.43570 86.72311 84.53380 88.31357 82.29310
## [4,] 83.37312 87.29596 92.69774 83.30574 89.62822 88.56597 82.90566
## [5,] 83.64904 84.25223 90.58916 84.18954 89.27001 89.12501 80.92784
## [6,] 86.79140 85.75665 86.91496 82.21750 84.28967 79.32562 84.52016
## [,15] [,16] [,17] [,18] [,19]
## [1,] 93.88371 82.30605 84.88750 102.54643 90.07756
## [2,] 87.43846 92.55001 76.18707 89.57468 85.16854
## [3,] 90.24836 91.18746 81.46207 88.15080 82.09161
## [4,] 93.69353 95.13130 76.56780 87.11605 79.49314
## [5,] 90.14667 94.60910 75.42085 85.20682 83.69666
## [6,] 88.67069 96.75445 78.42462 83.25423 82.08838
# Let's add the row and column names back in for ease of use. Note that the fitted values begin in 1997,
# since HoltWinters uses the first full year to initialize the model, so the columns are X1997 through X2015.
colnames(temps_hw_sf) <- colnames(temps[,3:21])
rownames(temps_hw_sf) <- temps[,1]
colnames(temps_hw_smoothed) <- colnames(temps[,3:21])
rownames(temps_hw_smoothed) <- temps[,1]
# One interesting thing we could look at is the average seasonal factor for each year and for the dataset as a whole.
meanSFAllYears <- mean(temps_hw_sf)
meanSFAllYears
## [1] 0.9954727
meanSFEachYear <- vector()
for (i in 1:ncol(temps_hw_sf)){
  meanSFEachYear[i] = mean(temps_hw_sf[,i])
}
meanSFEachYear
## [1] 1.0000000 0.9981882 0.9980547 0.9975095 0.9971271 0.9961954 0.9955108
## [8] 0.9957393 0.9956360 0.9949667 0.9948162 0.9946495 0.9943300 0.9941211
## [15] 0.9940949 0.9933907 0.9932476 0.9935215 0.9928824
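# Equivalently, base R's colMeans() collapses that loop into a single call (same numbers as above):
colMeans(temps_hw_sf)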
# We could look at the average SF in October, which I think everyone would agree is a fall month.
which(temps$DAY=="1-Oct")
## [1] 93
mean(temps_hw_sf[93:123,])
## [1] 0.8751098
# Let's also look at the average seasonal factor for the first fitted year (1997, the first column):
meanSF_firstYear <- mean(temps_hw_sf[,1])
meanSF_firstYear
## [1] 1
# Interesting. The mean seasonal factor for the first fitted year is exactly 1, which makes sense: that first year
# supplies the baseline seasonal estimates against which later years are measured.
# So, one way to define the end of summer is the day when the seasonal factor falls below the baseline SF of 1, which
# would indicate that the weather is cooling and fall has arrived. We can use CUSUM to find the point in each year at
# which that threshold is breached; the detection statistic is sketched below, followed by an implementation.
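For a decrease, the CUSUM statistic accumulates shortfall below the mean and signals once it crosses a threshold (here $\mu$ is the baseline mean, $C$ the slack term, and $T$ the detection threshold, matching the function arguments below):

$$S_t = \max\{0,\; S_{t-1} + (\mu - x_t - C)\}, \qquad \text{signal a change when } S_t \ge T$$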
cusum_decrease = function(data, mean, T, C){
  # Returns the first row index at which the decrease CUSUM crosses T, or NA if it never does
  result = NA
  cusum = 0
  for (rowCounter in 1:nrow(data)){
    current = data[rowCounter,]
    cusum = max(0, cusum + (mean - current - C))
    if (cusum >= T) {
      result = rowCounter
      break
    }
  }
  return(result)
}
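# Quick sanity check on a toy series (hypothetical numbers): ten values at the mean of 1,
# then a drop to 0.8, so with this T and C the detector should fire a few rows after row 10
toy <- matrix(c(rep(1, 10), rep(0.8, 10)), ncol = 1)
cusum_decrease(data = toy, mean = 1, T = 0.5, C = 0.05)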
# Let's see what value to use for C by defining it as 0.5x the standard deviation of the first year's seasonal factors, and let's use 3x that standard deviation for T.
C_var = sd(temps_hw_sf[,1])*0.5
T_var = sd(temps_hw_sf[,1])*3
# OK. Now let's run CUSUM on each year of data to see on which day the accumulated drop in the SF breached the threshold we set.
# I'll use the first year's mean seasonal factor (the baseline of 1) as the mean when applying CUSUM.
result_vector = vector()
for (col in 1:ncol(temps_hw_sf)){
  result_vector[col] = cusum_decrease(data = as.matrix(temps_hw_sf[,col]), mean = 1, T = T_var, C = C_var)
}
result_df = data.frame(Year = colnames(temps_hw_sf),Day = temps[result_vector,1])
result_df
## Year Day
## 1 X1997 30-Sep
## 2 X1998 1-Oct
## 3 X1999 1-Oct
## 4 X2000 1-Oct
## 5 X2001 2-Oct
## 6 X2002 2-Oct
## 7 X2003 3-Oct
## 8 X2004 3-Oct
## 9 X2005 4-Oct
## 10 X2006 4-Oct
## 11 X2007 5-Oct
## 12 X2008 5-Oct
## 13 X2009 5-Oct
## 14 X2010 5-Oct
## 15 X2011 3-Oct
## 16 X2012 3-Oct
## 17 X2013 3-Oct
## 18 X2014 4-Oct
## 19 X2015 4-Oct
From this analysis, it looks like the day fall arrives got gradually later over the 20 years, though the detected dates (September 30 through October 5) fall in a much tighter range than in the previous homework, where we ran CUSUM on the unsmoothed temperatures. That tighter range is plausibly the effect of the exponential smoothing, which removes much of the day-to-day noise before the change detection is applied.
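As a quick follow-up sketch (reusing the day indices already stored in result_vector; the years vector below is new), we could regress the detected end-of-summer day on the year, where a positive slope would support the "gotten later" reading:

years <- 1997:2015
summary(lm(result_vector ~ years))$coefficients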