outline:

  1. Time series basics: create a (time series) ts object from scratch, read in from files, plot

  2. Check stationarity for classical times series models by simulation.

  3. Handle non-stationary time series. (The goal is to turn non-stationary time series into stationary time series through a series of filters (e.g. detrending, autocorrelation). Illustrate using real data (electricity and temperature)

Import libraries

library(tseries)
library(ggplot2)
library(zoo)
library(data.table)
library(forecast)
library(tidyverse)
library(fma)

PART I

Time series basics

Create a single time series object

create a time series (ts) object with (1) value: \(2t + e_t\), where \(e_t \sim N(0,20)\), \(t=1,2,\cdots,120\); (2) time: from Jan-2019 to Dec-2028. Name this object “tseries”.

Use ?ts to check documents.

set.seed(100) #  sets the starting number used to generate a sequence of random numbers – it ensures that you get the same result if you start with that same seed each time you run the same process.

t <- 2 * seq(from = 1, to = 120, by = 1)  + rnorm(120, mean = 0, sd = 20)

tseries <- ts(data = t, start=c(2019,1), end=c(2028,12), frequency = 12) # needs to be in equal time increments 
# a time series is an ordered sequence of values of a variable at equally spaced time intervals.

print(tseries)
##             Jan        Feb        Mar        Apr        May        Jun
## 2019  -8.043847   6.630623   4.421658  25.735696  12.339425  18.372602
## 2020  21.967321  42.796810  32.467590  31.413666  26.222915  46.217125
## 2021  33.712418  43.230989  39.595569  60.618891  34.845411  64.941520
## 2022  77.658154  84.346466  99.308047  99.404040  79.967415 112.064070
## 2023  98.875581  62.426882  93.058756  69.228041 109.577297 145.949314
## 2024 116.760085 122.623119 118.422329 179.639179 132.596683 117.739500
## 2025 122.751614 180.970435 108.758080 152.254994 132.249433 161.410790
## 2026 184.191632 168.841899 178.327357 192.347242 212.543515 177.924594
## 2027 177.350084 204.270397 174.426337 176.519305 195.341533 231.262274
## 2028 242.802029 217.851324 225.451870 229.092025 213.709323 199.415698
##             Jul        Aug        Sep        Oct        Nov        Dec
## 2019   2.364186  30.290654   1.494811  12.802757  23.797723  25.925489
## 2020  19.723716  86.205936  33.238200  59.281212  51.239226  63.468092
## 2021  60.177729  99.147512  63.241408  65.776130  56.199714  67.564115
## 2022  50.464487 100.457348  79.554333 118.444619  86.731193 122.381315
## 2023  64.561490 131.609283  86.023488 152.497448 145.625975 103.222962
## 2024 146.759885 140.033832 136.601661 138.150202 150.978065 122.712887
## 2025 178.169037 118.511905 179.936445 163.000085 139.093014 129.375769
## 2026 170.857554 212.566029 168.140852 164.848575 179.394071 240.913655
## 2027 196.617053 224.857513 180.840126 203.993882 198.471654 208.614070
## 2028 223.380491 234.567721 254.362400 230.888526 231.949180 272.303814

read in from file

Use read.table to read the data from “ts.sample.txt” into a ts object and print it out.

?read.table
ts.sample <- read.csv("ts.sample.txt", header=TRUE, sep="") # defaults between read.table and read.csv are different
ts.sample
##              x
## 1   1.73067773
## 2  -1.08274980
## 3  -2.21151312
## 4   1.50704389
## 5   0.14582993
## 6   0.40603023
## 7   1.30107501
## 8  -0.19889175
## 9  -0.81168049
## 10  0.90951703
## 11  0.08432402
## 12 -0.31358711
## 13  1.72252075
## 14 -0.18193383
## 15 -1.32927078
## 16 -0.27564470
## 17 -0.23653474
## 18  0.78487600
## 19  0.99247488
## 20  0.69994919
## 21 -1.91246878
## 22 -1.24671867
## 23  0.49855245
## 24 -0.45423027
## 25  1.49903922
## 26  1.01092209
## 27 -0.22414683
## 28 -1.60582153
## 29 -0.06655503
## 30 -0.65764414
## 31  0.14960412
## 32  1.30539387
## 33 -0.50526480
## 34 -1.12061501
## 35  0.77018017
## 36 -0.18834248
## 37 -0.31537627
## 38  0.89977692
## 39  0.68375782
## 40 -1.44608804
ts.sample <- read.table("ts.sample.txt")
ts.sample
##              x
## 1   1.73067773
## 2  -1.08274980
## 3  -2.21151312
## 4   1.50704389
## 5   0.14582993
## 6   0.40603023
## 7   1.30107501
## 8  -0.19889175
## 9  -0.81168049
## 10  0.90951703
## 11  0.08432402
## 12 -0.31358711
## 13  1.72252075
## 14 -0.18193383
## 15 -1.32927078
## 16 -0.27564470
## 17 -0.23653474
## 18  0.78487600
## 19  0.99247488
## 20  0.69994919
## 21 -1.91246878
## 22 -1.24671867
## 23  0.49855245
## 24 -0.45423027
## 25  1.49903922
## 26  1.01092209
## 27 -0.22414683
## 28 -1.60582153
## 29 -0.06655503
## 30 -0.65764414
## 31  0.14960412
## 32  1.30539387
## 33 -0.50526480
## 34 -1.12061501
## 35  0.77018017
## 36 -0.18834248
## 37 -0.31537627
## 38  0.89977692
## 39  0.68375782
## 40 -1.44608804
ts.sample = ts(ts.sample) # we don't know the frequency of this
ts.sample
## Time Series:
## Start = 1 
## End = 40 
## Frequency = 1 
##                 x
##  [1,]  1.73067773
##  [2,] -1.08274980
##  [3,] -2.21151312
##  [4,]  1.50704389
##  [5,]  0.14582993
##  [6,]  0.40603023
##  [7,]  1.30107501
##  [8,] -0.19889175
##  [9,] -0.81168049
## [10,]  0.90951703
## [11,]  0.08432402
## [12,] -0.31358711
## [13,]  1.72252075
## [14,] -0.18193383
## [15,] -1.32927078
## [16,] -0.27564470
## [17,] -0.23653474
## [18,]  0.78487600
## [19,]  0.99247488
## [20,]  0.69994919
## [21,] -1.91246878
## [22,] -1.24671867
## [23,]  0.49855245
## [24,] -0.45423027
## [25,]  1.49903922
## [26,]  1.01092209
## [27,] -0.22414683
## [28,] -1.60582153
## [29,] -0.06655503
## [30,] -0.65764414
## [31,]  0.14960412
## [32,]  1.30539387
## [33,] -0.50526480
## [34,] -1.12061501
## [35,]  0.77018017
## [36,] -0.18834248
## [37,] -0.31537627
## [38,]  0.89977692
## [39,]  0.68375782
## [40,] -1.44608804

convinient functions for ts

  1. Extract times from tseries as a vector using function time; (2) Extract start time, end time and frequency using function start, end, frequency; (3) extract a subset from Jan-2019 to June-2020 using function window.
time(tseries) # goes by twelfves  
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2019 2019.000 2019.083 2019.167 2019.250 2019.333 2019.417 2019.500 2019.583
## 2020 2020.000 2020.083 2020.167 2020.250 2020.333 2020.417 2020.500 2020.583
## 2021 2021.000 2021.083 2021.167 2021.250 2021.333 2021.417 2021.500 2021.583
## 2022 2022.000 2022.083 2022.167 2022.250 2022.333 2022.417 2022.500 2022.583
## 2023 2023.000 2023.083 2023.167 2023.250 2023.333 2023.417 2023.500 2023.583
## 2024 2024.000 2024.083 2024.167 2024.250 2024.333 2024.417 2024.500 2024.583
## 2025 2025.000 2025.083 2025.167 2025.250 2025.333 2025.417 2025.500 2025.583
## 2026 2026.000 2026.083 2026.167 2026.250 2026.333 2026.417 2026.500 2026.583
## 2027 2027.000 2027.083 2027.167 2027.250 2027.333 2027.417 2027.500 2027.583
## 2028 2028.000 2028.083 2028.167 2028.250 2028.333 2028.417 2028.500 2028.583
##           Sep      Oct      Nov      Dec
## 2019 2019.667 2019.750 2019.833 2019.917
## 2020 2020.667 2020.750 2020.833 2020.917
## 2021 2021.667 2021.750 2021.833 2021.917
## 2022 2022.667 2022.750 2022.833 2022.917
## 2023 2023.667 2023.750 2023.833 2023.917
## 2024 2024.667 2024.750 2024.833 2024.917
## 2025 2025.667 2025.750 2025.833 2025.917
## 2026 2026.667 2026.750 2026.833 2026.917
## 2027 2027.667 2027.750 2027.833 2027.917
## 2028 2028.667 2028.750 2028.833 2028.917
start(tseries) 
## [1] 2019    1
end(tseries)
## [1] 2028   12
frequency(tseries)
## [1] 12
window(tseries, start=c(2019, 1), end=c(2020,6))
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2019 -8.043847  6.630623  4.421658 25.735696 12.339425 18.372602  2.364186
## 2020 21.967321 42.796810 32.467590 31.413666 26.222915 46.217125          
##            Aug       Sep       Oct       Nov       Dec
## 2019 30.290654  1.494811 12.802757 23.797723 25.925489
## 2020

ts plot

Plot the ts object tseries you create using plot.ts function. (You can also try plot or ggplot)

# plot
plot.ts(tseries, col='steelblue2', lty=1, lwd=2)

#plot(tseries, col='steelblue2', lty=1, lwd=2) # how to do in ggplot?
# ggplot
df.ts = data.frame(month=time(tseries), tseries)
ggplot(df.ts, aes(x=month, y=tseries)) + geom_line(col='steelblue2', size=1)
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

Can you show both points and line on the plot?

plot(tseries, type='l', col='steelblue2', lty=1, lwd=2) 

# If you prefer points, specify type='p'. If you prefer both points and line, specify type='b'

check stationary property in simulation data

For selected classical time series, check the ts plot to see if it is stationary.

Gaussian White noise

set.seed(1)
m = 100 # length of time series
x.wn = rnorm(m, 0, 1) 
plot(x.wn, type='l', col='steelblue2', lwd=2) 
abline(h=0, lty=2)

Wite noise with drift / intercept

Try to add a constant drift / intercept to the white noise series we generate in the previous section and make a plot.

In probability theory, stochastic drift is the change of the average value of a stochastic (random) process. A related concept is the drift rate, which is the rate at which the average changes. For example, a process that counts the number of heads in a series of.

a = 2 # drift / intercept - shifted
x.wn.drift = a + rnorm(m, 0, 1) 
plot(x.wn.drift, type='l', col='steelblue2', lwd=2) 
abline(h=a, lty=2)

White noise with a linear trend

set.seed(1)
b = .1 # slope of the linear trend # see what happens as you increase or decrease b? 
x.wn.trend = b*(1:m) + rnorm(m, 0, 1) 
plot(x.wn.trend, type='l', col='steelblue2', lwd=2) 
abline(a=0, b=b, lty=2)

Try generating a White noise with quadratic time trend and check its stationarity

set.seed(1)
c = .001 # scale factor for quadratic term
x.wn.quad = c*(1:m)^2 + rnorm(m, 0, 1) 
plot(x.wn.quad, type='l', col='steelblue2', lwd=2) 
lines(1:m, c*(1:m)^2, lty=2)

Random walk: \(x_t = x_{t-1} + w_t\) wih \(w_t \sim N(0,1)\)

# was here!!


set.seed(2)
m = 100 # number of time points
# function to generate a simple random walk process
random_walk = function(sd=1, m=100) cumsum(rnorm(m, 0, sd))
y.rw = random_walk(1, m)
plot(y.rw, type='l', col='steelblue2', lwd=2)
abline(h=0, lty=2)

multiple threads in one plot

set.seed(2)
n = 10 # number of threads
threads = replicate(n, random_walk(1, m))
# dim(res)
matplot(threads, type="l", lty="solid",  las = 1, ylab = expression(italic(y[t])), xlab = "Time",col = gray(0.6, 1))
abline(h=0, col='darkred', lty=2)

If you prefer ggplot …

threads.df = as.data.frame(threads)
threads.df = threads.df %>% mutate(time=1:m) %>% gather(key='thread', value='value', 1:n)
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(n)` instead of `n` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
ggplot(threads.df) + geom_line(aes(x=time, y=value, group=thread), col='gray60') + geom_hline(yintercept=0, col='darkred', linetype='dashed')

Notice that the variance of random walk increase with time. This may not be observed in a single-thread plot.

PART II

ACF plot

ACF plot of white noise (up to lag 15)

set.seed(1)
m = 100 # length of time series
x.wn = rnorm(m, 0, 1) 
acf(x.wn, lag.max = 15, main='')

Acf(x.wn, lag.max = 15, main='')

try to plot ACF of white noise with linear time trend we generated previously

set.seed(1)
b = .1 # slope of the linear trend
x.wn.trend = b*(1:m) + rnorm(m, 0, 1) 
acf(x.wn.trend, main='')

try to plot ACF of random walk we generated previously

set.seed(2)
m = 100 # number of time points
# function to generate a simple random walk process
random_walk = function(sd=1, m=100) cumsum(rnorm(m, 0, sd))
y.rw = random_walk(1, m)
acf(y.rw, main='')

# [put your code here]

Transformation, differencing, regression on time trend

electricity data

First perform square root transformation, then use first and twelfth differencing to remove trend and seasonality

A simple but often effective way to stabilize the variance across time is to apply a power transformation (square root, cube root, log, etc) to the time series. Here are some examples:

Differencing is a method of transforming a time series dataset. It can be used to remove the series dependence on time, so-called temporal dependence. … Differencing can help stabilize the mean of the time series by removing changes in the level of a time series, and so eliminating (or reducing) trend and seasonality.

Here is a plot of the famous airline data along with its log to stabilize the variance, the 1st difference of the log to eliminate the linear trend, and the 12th difference of the 1st difference of the log to eliminate the annual cycle. (our frequency is 12)

?diff
## Help on topic 'diff' was found in the following packages:
## 
##   Package               Library
##   timeDate              /Library/Frameworks/R.framework/Versions/4.0/Resources/library
##   base                  /Library/Frameworks/R.framework/Resources/library
## 
## 
## Using the first match ...
plot.ts(cbind(elec, 
              sqrt(elec), 
              d_1=diff(sqrt(elec), lag=1), 
              d_1_12=diff(diff(sqrt(elec), lag=1), lag=12)), col='steelblue2', lwd=2, main='differencing orders 1')

# exchange 1st differencing and 12th differencing leads to the same result
plot.ts(cbind(elec, 
              sqrt(elec), 
              d_12=diff(sqrt(elec), lag=12), 
              d_12_1=diff(diff(sqrt(elec), lag=12), lag=1)), col='steelblue2', lwd=2, main='differencing orders 2')

identical(diff(diff(sqrt(elec), lag = 1), lag = 12), diff(diff(sqrt(elec), lag = 12), lag = 1))
## [1] TRUE

What if we take difference first and then do transformation?

plot.ts(cbind(elec, 
              d_1=diff(elec, 1), 
              d_1_12 = diff(diff(elec,1),12), 
              trans=sqrt(diff(diff(elec,1),12))), col='red4', lwd=2, main='transformation after diferencing')
## Warning in sqrt(diff(diff(elec, 1), 12)): NaNs produced

global temperature data

read in data and create ts object

temp = fread("temp.txt", header = F)
colnames(temp) = c('year', 'raw','smoothing')
temp.ts = ts(temp$raw, start=1880, end=2019, frequency = 1)
temp.ts
## Time Series:
## Start = 1880 
## End = 2019 
## Frequency = 1 
##   [1] -0.16 -0.07 -0.10 -0.16 -0.28 -0.32 -0.30 -0.35 -0.17 -0.10 -0.34 -0.22
##  [13] -0.27 -0.31 -0.30 -0.22 -0.11 -0.10 -0.26 -0.17 -0.08 -0.15 -0.28 -0.37
##  [25] -0.46 -0.26 -0.22 -0.38 -0.42 -0.48 -0.43 -0.44 -0.35 -0.34 -0.15 -0.14
##  [37] -0.36 -0.46 -0.29 -0.27 -0.27 -0.19 -0.28 -0.26 -0.27 -0.22 -0.11 -0.22
##  [49] -0.20 -0.36 -0.16 -0.09 -0.16 -0.29 -0.13 -0.20 -0.15 -0.03  0.00 -0.02
##  [61]  0.13  0.19  0.07  0.09  0.20  0.09 -0.07 -0.03 -0.11 -0.11 -0.17 -0.07
##  [73]  0.01  0.08 -0.13 -0.14 -0.19  0.05  0.06  0.03 -0.03  0.06  0.03  0.05
##  [85] -0.20 -0.11 -0.06 -0.02 -0.08  0.05  0.03 -0.08  0.01  0.16 -0.07 -0.01
##  [97] -0.10  0.18  0.07  0.17  0.26  0.32  0.14  0.31  0.16  0.12  0.18  0.32
## [109]  0.39  0.27  0.45  0.41  0.22  0.23  0.32  0.45  0.33  0.47  0.61  0.39
## [121]  0.40  0.54  0.63  0.62  0.54  0.68  0.64  0.67  0.55  0.66  0.73  0.61
## [133]  0.65  0.68  0.75  0.90  1.02  0.93  0.86  0.99

Produce ts plot and ACF plot and check if this is a stationary time series

par(mfrow=c(1,2))
plot.ts(temp.ts, col='darkred', lwd=2, xlab='Year', ylab='Temperature Anomaly')
abline(h=0, col='gray', lwd=2)
acf(temp.ts, main='')

How is the stationarity after we take first difference? Check with ts plot and ACF plot.

temp.ts.d1 = diff(temp.ts, lag=1)
par(mfrow=c(1,2))
plot.ts(temp.ts.d1, col='darkred', lwd=2, xlab='Year', ylab='Temperature Anomaly 1st diff')
abline(h=0, col='gray', lwd=2)
acf(temp.ts.d1, main='')

regression on non-linear time trend. Here we consider a quadratic form time trend. After regressing out this trend effect, is the residual sequence stationary? Check this with ts plot and ACF plot.

In time series context, residuals must be stationary in order to avoid spurious regressions (Woolridge, 2012), if there are no properties of stationarity among the residuals, then basically our results tend to produce fake relationships in our model.

x=1:length(temp.ts)-length(temp.ts)/2
lm.fit = lm(temp.ts~x+I(x^2))
# summary(lm.fit)
fitted.val = lm.fit$fitted.values
plot(c(temp.ts), col='darkred', lwd=2, xlab='Year', ylab='', type='l')
lines(1:140,fitted.val, col='steelblue3', lwd=3)

residuals = lm.fit$residuals
par(mfrow=c(1,2))
plot.ts(residuals,col='darkred', lwd=2, ylim=c(-0.3,0.8))
abline(h=0, col='gray', lwd=3)

acf(residuals, main='')