Handling Time-Series Data in R

Cassie (Xi) Guo

Outline

  • Introduction to time series in R
  • R libraries for time series

    • 'stats'
    • 'zoo'
    • 'forecast'
  • An example

Time series basics

  • Observations collected sequentially over time

  • We use time series to:

    • Understand the history
    • Model the data
    • Predict the future

Two kinds of time series

  • Stationary (too good to be true)
  • Nonstationary


“Experience with real-world data, however, soon convinces one that both stationarity and Gaussianity are fairy tales invented for the amusement of undergraduates.” (Thompson 1994)
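To make the two kinds concrete, here is a minimal sketch on simulated data (not the data used in these slides). It assumes Gaussian white noise as the stationary case and a random walk as the nonstationary case; the seed and series length are arbitrary.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 200

# Stationary: independent Gaussian noise with constant mean and variance
noise <- ts(rnorm(n))

# Nonstationary: a random walk, i.e. the cumulative sum of the same noise
walk <- ts(cumsum(noise))

# Differencing the random walk recovers the stationary increments
increments <- diff(walk)

# One panel per series: the noise hovers around 0, the walk drifts
plot.ts(cbind(noise, walk), main = "Stationary vs. nonstationary")
```

Differencing is a common first step for turning a drifting series into something closer to stationary.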

Libraries for time series

  • stats

    • Basic functions
  • zoo

    • Irregular time series
  • forecast

    • Prediction & modeling

'stats' Package

  • Create ts object

    • ts()
  • plot

    • plot.ts()
    • ts.plot()
  • Trend analysis

    • lm()
    • decompose()
  • Modeling

    • HoltWinters()

ts()

  • start, end, frequency
setwd('/Users/LilSummer/Desktop/TS_analysis')
## skip = 6 drops the first 6 lines of the file before the header row
men <- read.csv(file = 'men_clothing.csv', header = TRUE, skip = 6)

###  make a ts object ###
men.ts <- ts(men$Value,
             start = c(1992, 1),
             end = c(2014, 12),
             frequency = 12)
men.ts
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1992  701  658  731  816  856  853  714  777  762  841  913 1557
1993  695  618  706  796  809  792  722  729  748  836  913 1598
1994  688  632  762  774  791  819  719  754  754  836  935 1568
1995  661  607  674  726  721  740  642  697  722  756  903 1466
1996  639  651  708  718  774  759  660  761  744  793  919 1420
1997  697  633  746  716  809  793  737  846  761  870  987 1474
1998  716  644  735  844  836  817  757  844  754  865  955 1429
1999  684  610  708  812  803  791  723  789  714  802  896 1335
2000  635  611  707  759  770  750  719  789  751  787  897 1332
2001  620  588  681  675  710  702  626  743  610  704  793 1173
2002  540  534  657  632  665  650  592  695  604  669  761 1113
2003  539  498  601  631  679  657  602  741  625  704  781 1191
2004  574  559  637  674  675  657  639  720  634  727  809 1261
2005  575  576  699  672  698  719  646  740  634  717  809 1252
2006  570  557  660  693  693  704  623  716  718  781  823 1306
2007  625  600  679  731  740  718  629  636  656  717  804 1237
2008  609  586  680  710  757  715  622  655  636  664  708 1009
2009  524  497  543  669  650  607  575  551  579  611  620  928
2010  476  471  568  631  627  598  544  517  564  634  669  988
2011  487  497  599  690  677  661  588  548  643  682  707 1083
2012  526  545  623  710  726  703  614  619  687  710  730 1079
2013  599  561  681  717  749  677  648  689  683  787  802 1090
2014  577  642  707  790  806  706  657  695  695  757  803 1066
  • Optional: start, end
  • Frequency

    • Default: 1
    • frequency = 52 (weekly)
    • frequency = 4 (quarterly)
men.ts <- ts(men$Value, end = c(2014, 12), frequency = 12)
men.ts <- ts(men$Value, start = c(1992, 1), frequency = 12)
men.ts <- ts(men$Value, frequency = 12)
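A small sketch of how ts() fills in whichever of start and end is missing. The vector x is made up for illustration; only ts(), start() and end() from 'stats' are used.

```r
x <- 1:24                                   # 24 made-up observations

monthly   <- ts(x, start = c(2000, 1), frequency = 12)
quarterly <- ts(x, start = c(2000, 1), frequency = 4)
yearly    <- ts(x, start = 2000)            # frequency defaults to 1

end(monthly)     # 2001 12  (24 months after Jan 2000)
end(quarterly)   # 2005 4   (24 quarters after Q1 2000)
end(yearly)      # 2023 1   (24 years after 2000)

# Only end given: start is counted backwards from it
from_end <- ts(x, end = c(2001, 12), frequency = 12)
start(from_end)  # 2000 1

# Neither given: the series starts at time 1, period 1
no_dates <- ts(x, frequency = 12)
start(no_dates)  # 1 1
```

With neither start nor end, the time axis carries no calendar meaning, so giving at least one of them is usually worthwhile.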

plot.ts()

plot.ts(men.ts, lwd = 3)


clothes <- cbind(women.ts, men.ts)
plot.ts(clothes, plot.type = 'multiple') ## 'multiple' is the default, so this is the same as:
plot.ts(clothes)


plot.ts(clothes, plot.type = 'single', col = c('dark red', 'dark green'))
legend("topleft", legend = c('women', 'men'), col = c('dark red', 'dark green'), lty = 1)


ts.plot()

ts.plot(clothes, gpars = list(col = c('dark red', 'dark green')), lwd = 3)
legend("topleft", legend = c('women', 'men'), col = c('dark red', 'dark green'), lty = 1)
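One practical difference between the two functions: ts.plot() accepts series that cover different time spans, as long as they share a frequency. A minimal sketch with two simulated monthly series (a and b are hypothetical, not the clothing data):

```r
set.seed(2)                                 # arbitrary seed
a <- ts(rnorm(36, mean = 10), start = c(2000, 1), frequency = 12)
b <- ts(rnorm(24, mean = 12), start = c(2001, 7), frequency = 12)

# Both series on one set of axes, each over its own time span
ts.plot(a, b, gpars = list(col = c("darkred", "darkgreen"), lwd = 2))
legend("topleft", legend = c("a", "b"),
       col = c("darkred", "darkgreen"), lty = 1)

# ts.union() lines the series up on a common time axis, padding with NA
both <- ts.union(a, b)
dim(both)        # 42 rows (Jan 2000 to Jun 2003), 2 columns
```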


lm() -- Trend analysis


model <- lm(men.ts ~ time(men.ts))
summary(model)

Call:
lm(formula = men.ts ~ time(men.ts))

Residuals:
    Min      1Q  Median      3Q     Max 
-247.89  -95.98  -41.83   27.04  783.24 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  15795.156   3309.274   4.773 2.96e-06 ***
time(men.ts)    -7.513      1.652  -4.548 8.12e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 182.2 on 274 degrees of freedom
Multiple R-squared:  0.07021,   Adjusted R-squared:  0.06681 
F-statistic: 20.69 on 1 and 274 DF,  p-value: 8.123e-06


time(men.ts)
          Jan      Feb      Mar      Apr      May      Jun      Jul
1992 1992.000 1992.083 1992.167 1992.250 1992.333 1992.417 1992.500
1993 1993.000 1993.083 1993.167 1993.250 1993.333 1993.417 1993.500
1994 1994.000 1994.083 1994.167 1994.250 1994.333 1994.417 1994.500
1995 1995.000 1995.083 1995.167 1995.250 1995.333 1995.417 1995.500
1996 1996.000 1996.083 1996.167 1996.250 1996.333 1996.417 1996.500
1997 1997.000 1997.083 1997.167 1997.250 1997.333 1997.417 1997.500
1998 1998.000 1998.083 1998.167 1998.250 1998.333 1998.417 1998.500
1999 1999.000 1999.083 1999.167 1999.250 1999.333 1999.417 1999.500
2000 2000.000 2000.083 2000.167 2000.250 2000.333 2000.417 2000.500
2001 2001.000 2001.083 2001.167 2001.250 2001.333 2001.417 2001.500
2002 2002.000 2002.083 2002.167 2002.250 2002.333 2002.417 2002.500
2003 2003.000 2003.083 2003.167 2003.250 2003.333 2003.417 2003.500
2004 2004.000 2004.083 2004.167 2004.250 2004.333 2004.417 2004.500
2005 2005.000 2005.083 2005.167 2005.250 2005.333 2005.417 2005.500
2006 2006.000 2006.083 2006.167 2006.250 2006.333 2006.417 2006.500
2007 2007.000 2007.083 2007.167 2007.250 2007.333 2007.417 2007.500
2008 2008.000 2008.083 2008.167 2008.250 2008.333 2008.417 2008.500
2009 2009.000 2009.083 2009.167 2009.250 2009.333 2009.417 2009.500
2010 2010.000 2010.083 2010.167 2010.250 2010.333 2010.417 2010.500
2011 2011.000 2011.083 2011.167 2011.250 2011.333 2011.417 2011.500
2012 2012.000 2012.083 2012.167 2012.250 2012.333 2012.417 2012.500
2013 2013.000 2013.083 2013.167 2013.250 2013.333 2013.417 2013.500
2014 2014.000 2014.083 2014.167 2014.250 2014.333 2014.417 2014.500
          Aug      Sep      Oct      Nov      Dec
1992 1992.583 1992.667 1992.750 1992.833 1992.917
1993 1993.583 1993.667 1993.750 1993.833 1993.917
1994 1994.583 1994.667 1994.750 1994.833 1994.917
1995 1995.583 1995.667 1995.750 1995.833 1995.917
1996 1996.583 1996.667 1996.750 1996.833 1996.917
1997 1997.583 1997.667 1997.750 1997.833 1997.917
1998 1998.583 1998.667 1998.750 1998.833 1998.917
1999 1999.583 1999.667 1999.750 1999.833 1999.917
2000 2000.583 2000.667 2000.750 2000.833 2000.917
2001 2001.583 2001.667 2001.750 2001.833 2001.917
2002 2002.583 2002.667 2002.750 2002.833 2002.917
2003 2003.583 2003.667 2003.750 2003.833 2003.917
2004 2004.583 2004.667 2004.750 2004.833 2004.917
2005 2005.583 2005.667 2005.750 2005.833 2005.917
2006 2006.583 2006.667 2006.750 2006.833 2006.917
2007 2007.583 2007.667 2007.750 2007.833 2007.917
2008 2008.583 2008.667 2008.750 2008.833 2008.917
2009 2009.583 2009.667 2009.750 2009.833 2009.917
2010 2010.583 2010.667 2010.750 2010.833 2010.917
2011 2011.583 2011.667 2011.750 2011.833 2011.917
2012 2012.583 2012.667 2012.750 2012.833 2012.917
2013 2013.583 2013.667 2013.750 2013.833 2013.917
2014 2014.583 2014.667 2014.750 2014.833 2014.917
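Because time() counts in years (one month is 1/12), the slope from lm() is a change per year, not per observation. A minimal sketch on a simulated monthly series; y, its trend of -1 per month and the noise level are assumptions made for illustration.

```r
set.seed(3)                                 # arbitrary seed
n <- 120                                    # 10 years of monthly data

# Hypothetical series: falls by 1 unit per month (12 per year), plus noise
y <- ts(1000 - (0:(n - 1)) + rnorm(n, sd = 5),
        start = c(2000, 1), frequency = 12)

fit <- lm(y ~ time(y))
slope_per_year  <- unname(coef(fit)[2])     # close to -12
slope_per_month <- slope_per_year / 12      # close to -1

# Overlay the fitted trend line on the series
plot.ts(y)
abline(fit, col = "darkred", lwd = 2)
```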

decompose()

men.ts.de <- decompose(men.ts)
men.ts.de$seasonal
             Jan         Feb         Mar         Apr         May
1992 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
1993 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
1994 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
1995 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
1996 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
1997 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
1998 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
1999 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2000 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2001 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2002 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2003 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2004 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2005 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2006 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2007 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2008 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2009 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2010 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2011 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2012 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2013 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
2014 -139.097538 -163.334280  -70.142992  -23.993371   -5.671402
             Jun         Jul         Aug         Sep         Oct
1992  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
1993  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
1994  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
1995  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
1996  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
1997  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
1998  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
1999  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2000  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2001  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2002  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2003  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2004  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2005  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2006  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2007  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2008  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2009  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2010  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2011  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2012  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2013  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
2014  -24.078598  -91.118371  -35.625947  -61.595644    7.135417
             Nov         Dec
1992   81.870265  525.652462
1993   81.870265  525.652462
1994   81.870265  525.652462
1995   81.870265  525.652462
1996   81.870265  525.652462
1997   81.870265  525.652462
1998   81.870265  525.652462
1999   81.870265  525.652462
2000   81.870265  525.652462
2001   81.870265  525.652462
2002   81.870265  525.652462
2003   81.870265  525.652462
2004   81.870265  525.652462
2005   81.870265  525.652462
2006   81.870265  525.652462
2007   81.870265  525.652462
2008   81.870265  525.652462
2009   81.870265  525.652462
2010   81.870265  525.652462
2011   81.870265  525.652462
2012   81.870265  525.652462
2013   81.870265  525.652462
2014   81.870265  525.652462
plot(men.ts.de, lwd = 4)  ##lwd doesn't work
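The seasonal component printed above repeats the same 12 values every year because decompose() estimates one value per calendar position and recycles it. A minimal sketch using co2, a monthly series that ships with R (chosen here only so the code runs without the clothing data):

```r
d <- decompose(co2)                 # additive decomposition by default

# One seasonal value per month, centred so the 12 values sum to zero
d$figure

# The seasonal series is just that 12-value pattern repeated
first_year <- as.numeric(d$seasonal)[1:12]

# Additive model: observed = trend + seasonal + random.
# The moving-average trend is NA for the first and last 6 months.
rebuilt <- d$trend + d$seasonal + d$random
ok <- !is.na(rebuilt)
all.equal(as.numeric(rebuilt)[ok], as.numeric(co2)[ok])   # TRUE
```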


'zoo' Package

  • Particularly aimed at irregular time series of numeric vectors/matrices and factors
  • Create zoo object

    • zooreg()
  • Subset

    • window()
  • Plot

    • autoplot.zoo()
  • Handle missing data

    • na.approx()
    • na.locf()
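The two missing-data helpers behave differently: na.approx() interpolates linearly between neighbours, while na.locf() carries the last observation forward. A minimal sketch on a made-up five-day series (assumes the zoo package is installed):

```r
library(zoo)

days <- as.Date("2015-01-01") + 0:4
z <- zoo(c(1, NA, 3, NA, 5), order.by = days)   # made-up values with two gaps

filled_linear <- na.approx(z)   # 1 2 3 4 5  (linear interpolation)
filled_locf   <- na.locf(z)     # 1 1 3 3 5  (last observation carried forward)
```

Interpolation suits smoothly varying measurements such as temperature; carrying the last value forward suits quantities that stay fixed until the next report.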

zooreg()

Useful for analyzing daily data

library(zoo)
Daily.z <- zooreg(Daily$t.ave, start = as.Date('1960-01-01'), end = as.Date("2015-12-31"), frequency = 1)
str(Daily.z)
'zooreg' series from 1960-01-01 to 2015-12-30
  Data: num [1:20453] -3.3 -5.3 -15.3 -18.9 -20.3 -7.5 0.25 -10.3 -6.15 -5.55 ...
  Index:  Date[1:20453], format: "1960-01-01" "1960-01-02" "1960-01-03" "1960-01-04" ...
  Frequency: 1 

window()

data <- window(Daily.z, start = as.Date('2015-01-01'), end = as.Date('2015-03-01'))
data
2015-01-01 2015-01-02 2015-01-03 2015-01-04 2015-01-05 2015-01-06 
     -9.45      -6.10     -11.65     -21.10     -20.30     -18.60 
2015-01-07 2015-01-08 2015-01-09 2015-01-10 2015-01-11 2015-01-12 
    -17.20     -13.30     -17.20     -11.40     -14.45     -21.15 
2015-01-13 2015-01-14 2015-01-15 2015-01-16 2015-01-17 2015-01-18 
    -15.55      -5.30      -3.90      -3.90       0.55      -0.30 
2015-01-19 2015-01-20 2015-01-21 2015-01-22 2015-01-23 2015-01-24 
      3.05      -1.40      -6.40      -4.70       1.10       1.10 
2015-01-25 2015-01-26 2015-01-27 2015-01-28 2015-01-29 2015-01-30 
     -0.85       2.25       0.55      -0.80      -6.95      -5.30 
2015-01-31 2015-02-01 2015-02-02 2015-02-03 2015-02-04 2015-02-05 
     -5.00     -15.80     -15.85     -12.75     -17.50     -12.50 
2015-02-06 2015-02-07 2015-02-08 2015-02-09 2015-02-10 2015-02-11 
     -5.30      -1.40      -1.70      -2.50      -4.15     -14.15 
2015-02-12 2015-02-13 2015-02-14 2015-02-15 2015-02-16 2015-02-17 
    -13.30      -8.35     -15.85     -13.65     -10.85     -15.85 
2015-02-18 2015-02-19 2015-02-20 2015-02-21 2015-02-22 2015-02-23 
    -19.20     -15.25      -5.00     -13.65     -19.45     -10.00 
2015-02-24 2015-02-25 2015-02-26 2015-02-27 2015-02-28 2015-03-01 
     -5.25     -16.65     -16.95     -15.30     -13.05      -9.15 
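window() works the same way when the observations are irregularly spaced, which is the case zoo is aimed at. A minimal sketch with zoo() and hypothetical readings on five uneven dates (the dates and values are made up):

```r
library(zoo)

# Hypothetical readings taken on irregularly spaced days
when <- as.Date(c("2015-01-02", "2015-01-05", "2015-01-06",
                  "2015-01-20", "2015-02-03"))
temp <- zoo(c(-6.1, -20.3, -18.6, -1.4, -12.8), order.by = when)

diff(index(temp))     # gaps of 3, 1, 14 and 14 days

# Keep only January; both endpoints are inclusive
jan <- window(temp, start = as.Date("2015-01-01"), end = as.Date("2015-01-31"))
length(jan)           # 4
```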

autoplot.zoo()

autoplot.zoo(Daily.z[1:100])


'forecast' Package

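  • Modeling

    • forecast()
    • forecast.HoltWinters() (S3 method, called through forecast())
    • forecast.Arima() (S3 method, called through forecast())
  • Plotting

    • plot.forecast() (S3 method, called through plot())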

An example

plot.ts(women.ts, lwd = 3)


HoltWinters()

sale.Holt <- HoltWinters(women.ts)
sale.Holt
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = women.ts)

Smoothing parameters:
 alpha: 0.2933085
 beta : 0
 gamma: 0.7588565

Coefficients:
            [,1]
a    3648.737436
b       5.231498
s1  -1027.307620
s2   -688.517871
s3    112.539820
s4     82.626857
s5    236.175512
s6   -282.368713
s7   -337.807758
s8    -98.808560
s9   -267.604667
s10   -45.937374
s11   295.438372
s12  1297.456790
plot(sale.Holt, lwd = 3)


library(forecast)
sale.forecast <- forecast(sale.Holt, h = 48)  ## dispatches to the HoltWinters method
plot(sale.forecast, lwd = 3)                  ## dispatches to the plot method for forecasts
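If the 'forecast' package is not available, a HoltWinters fit can also be projected with predict() from 'stats'. This is an alternative to the approach on this slide, not the same code path. A minimal sketch on co2, the monthly series that ships with R, used here in place of women.ts:

```r
hw <- HoltWinters(co2)              # co2 ends in Dec 1997

# 12 months ahead, with 95% prediction bounds
fc <- predict(hw, n.ahead = 12, prediction.interval = TRUE, level = 0.95)
colnames(fc)                        # fit, upr, lwr
start(fc)                           # 1998 1

# Fitted values plus the projection and its bounds
plot(hw, fc)
```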


Model diagnostics

acf(sale.forecast$residuals, lag.max = 20)


Box.test(sale.forecast$residuals, lag = 20, type = 'Ljung-Box')

    Box-Ljung test

data:  sale.forecast$residuals
X-squared = 59.553, df = 20, p-value = 8.351e-06
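The null hypothesis of the Ljung-Box test is that the series has no autocorrelation up to the given lag, so the small p-value above suggests the residuals still contain structure the model has not captured. A minimal sketch of how the test behaves on two simulated series (white and ar1 are made up for illustration):

```r
set.seed(4)                                           # arbitrary seed
white <- rnorm(500)                                   # no autocorrelation by construction
ar1   <- arima.sim(model = list(ar = 0.8), n = 500)   # strongly autocorrelated

bt_white <- Box.test(white, lag = 20, type = "Ljung-Box")
bt_ar1   <- Box.test(ar1,   lag = 20, type = "Ljung-Box")

bt_white$p.value   # usually well above 0.05 for white noise
bt_ar1$p.value     # very small: the null is rejected
```

Residuals from a well-fitting model should look like the first case.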

Other packages

  • TSA

  • quantmod

  • TTR

References

  • “All models are wrong but some are useful” (George Box, statistician)

  • http://xkcd.com/1725/