Time series basics: create a (time series) ts object from scratch, read in from files, plot
Check stationarity for classical times series models by simulation.
Handle non-stationary time series. (The goal is to turn non-stationary time series into stationary time series through a series of filters (e.g. detrending, autocorrelation). Illustrate using real data (electricity and temperature)
Import libraries
library(tseries)
library(ggplot2)
library(zoo)
library(data.table)
library(forecast)
library(tidyverse)
library(fma)
create a time series (ts) object with (1) value: \(2t + e_t\), where \(e_t \sim N(0,20)\), \(t=1,2,\cdots,120\); (2) time: from Jan-2019 to Dec-2028. Name this object “tseries”.
Use ?ts
to check documents.
set.seed(100) # sets the starting number used to generate a sequence of random numbers – it ensures that you get the same result if you start with that same seed each time you run the same process.
t <- 2 * seq(from = 1, to = 120, by = 1) + rnorm(120, mean = 0, sd = 20)
tseries <- ts(data = t, start=c(2019,1), end=c(2028,12), frequency = 12) # needs to be in equal time increments
# a time series is an ordered sequence of values of a variable at equally spaced time intervals.
print(tseries)
## Jan Feb Mar Apr May Jun
## 2019 -8.043847 6.630623 4.421658 25.735696 12.339425 18.372602
## 2020 21.967321 42.796810 32.467590 31.413666 26.222915 46.217125
## 2021 33.712418 43.230989 39.595569 60.618891 34.845411 64.941520
## 2022 77.658154 84.346466 99.308047 99.404040 79.967415 112.064070
## 2023 98.875581 62.426882 93.058756 69.228041 109.577297 145.949314
## 2024 116.760085 122.623119 118.422329 179.639179 132.596683 117.739500
## 2025 122.751614 180.970435 108.758080 152.254994 132.249433 161.410790
## 2026 184.191632 168.841899 178.327357 192.347242 212.543515 177.924594
## 2027 177.350084 204.270397 174.426337 176.519305 195.341533 231.262274
## 2028 242.802029 217.851324 225.451870 229.092025 213.709323 199.415698
## Jul Aug Sep Oct Nov Dec
## 2019 2.364186 30.290654 1.494811 12.802757 23.797723 25.925489
## 2020 19.723716 86.205936 33.238200 59.281212 51.239226 63.468092
## 2021 60.177729 99.147512 63.241408 65.776130 56.199714 67.564115
## 2022 50.464487 100.457348 79.554333 118.444619 86.731193 122.381315
## 2023 64.561490 131.609283 86.023488 152.497448 145.625975 103.222962
## 2024 146.759885 140.033832 136.601661 138.150202 150.978065 122.712887
## 2025 178.169037 118.511905 179.936445 163.000085 139.093014 129.375769
## 2026 170.857554 212.566029 168.140852 164.848575 179.394071 240.913655
## 2027 196.617053 224.857513 180.840126 203.993882 198.471654 208.614070
## 2028 223.380491 234.567721 254.362400 230.888526 231.949180 272.303814
Use read.table
to read the data from “ts.sample.txt” into a ts object and print it out.
?read.table
ts.sample <- read.csv("ts.sample.txt", header=TRUE, sep="") # defaults between read.table and read.csv are different
ts.sample
## x
## 1 1.73067773
## 2 -1.08274980
## 3 -2.21151312
## 4 1.50704389
## 5 0.14582993
## 6 0.40603023
## 7 1.30107501
## 8 -0.19889175
## 9 -0.81168049
## 10 0.90951703
## 11 0.08432402
## 12 -0.31358711
## 13 1.72252075
## 14 -0.18193383
## 15 -1.32927078
## 16 -0.27564470
## 17 -0.23653474
## 18 0.78487600
## 19 0.99247488
## 20 0.69994919
## 21 -1.91246878
## 22 -1.24671867
## 23 0.49855245
## 24 -0.45423027
## 25 1.49903922
## 26 1.01092209
## 27 -0.22414683
## 28 -1.60582153
## 29 -0.06655503
## 30 -0.65764414
## 31 0.14960412
## 32 1.30539387
## 33 -0.50526480
## 34 -1.12061501
## 35 0.77018017
## 36 -0.18834248
## 37 -0.31537627
## 38 0.89977692
## 39 0.68375782
## 40 -1.44608804
ts.sample <- read.table("ts.sample.txt")
ts.sample
## x
## 1 1.73067773
## 2 -1.08274980
## 3 -2.21151312
## 4 1.50704389
## 5 0.14582993
## 6 0.40603023
## 7 1.30107501
## 8 -0.19889175
## 9 -0.81168049
## 10 0.90951703
## 11 0.08432402
## 12 -0.31358711
## 13 1.72252075
## 14 -0.18193383
## 15 -1.32927078
## 16 -0.27564470
## 17 -0.23653474
## 18 0.78487600
## 19 0.99247488
## 20 0.69994919
## 21 -1.91246878
## 22 -1.24671867
## 23 0.49855245
## 24 -0.45423027
## 25 1.49903922
## 26 1.01092209
## 27 -0.22414683
## 28 -1.60582153
## 29 -0.06655503
## 30 -0.65764414
## 31 0.14960412
## 32 1.30539387
## 33 -0.50526480
## 34 -1.12061501
## 35 0.77018017
## 36 -0.18834248
## 37 -0.31537627
## 38 0.89977692
## 39 0.68375782
## 40 -1.44608804
ts.sample = ts(ts.sample) # we don't know the frequency of this
ts.sample
## Time Series:
## Start = 1
## End = 40
## Frequency = 1
## x
## [1,] 1.73067773
## [2,] -1.08274980
## [3,] -2.21151312
## [4,] 1.50704389
## [5,] 0.14582993
## [6,] 0.40603023
## [7,] 1.30107501
## [8,] -0.19889175
## [9,] -0.81168049
## [10,] 0.90951703
## [11,] 0.08432402
## [12,] -0.31358711
## [13,] 1.72252075
## [14,] -0.18193383
## [15,] -1.32927078
## [16,] -0.27564470
## [17,] -0.23653474
## [18,] 0.78487600
## [19,] 0.99247488
## [20,] 0.69994919
## [21,] -1.91246878
## [22,] -1.24671867
## [23,] 0.49855245
## [24,] -0.45423027
## [25,] 1.49903922
## [26,] 1.01092209
## [27,] -0.22414683
## [28,] -1.60582153
## [29,] -0.06655503
## [30,] -0.65764414
## [31,] 0.14960412
## [32,] 1.30539387
## [33,] -0.50526480
## [34,] -1.12061501
## [35,] 0.77018017
## [36,] -0.18834248
## [37,] -0.31537627
## [38,] 0.89977692
## [39,] 0.68375782
## [40,] -1.44608804
tseries
as a vector using function time
; (2) Extract start time, end time and frequency using function start
, end
, frequency
; (3) extract a subset from Jan-2019 to June-2020 using function window
.time(tseries) # goes by twelfves
## Jan Feb Mar Apr May Jun Jul Aug
## 2019 2019.000 2019.083 2019.167 2019.250 2019.333 2019.417 2019.500 2019.583
## 2020 2020.000 2020.083 2020.167 2020.250 2020.333 2020.417 2020.500 2020.583
## 2021 2021.000 2021.083 2021.167 2021.250 2021.333 2021.417 2021.500 2021.583
## 2022 2022.000 2022.083 2022.167 2022.250 2022.333 2022.417 2022.500 2022.583
## 2023 2023.000 2023.083 2023.167 2023.250 2023.333 2023.417 2023.500 2023.583
## 2024 2024.000 2024.083 2024.167 2024.250 2024.333 2024.417 2024.500 2024.583
## 2025 2025.000 2025.083 2025.167 2025.250 2025.333 2025.417 2025.500 2025.583
## 2026 2026.000 2026.083 2026.167 2026.250 2026.333 2026.417 2026.500 2026.583
## 2027 2027.000 2027.083 2027.167 2027.250 2027.333 2027.417 2027.500 2027.583
## 2028 2028.000 2028.083 2028.167 2028.250 2028.333 2028.417 2028.500 2028.583
## Sep Oct Nov Dec
## 2019 2019.667 2019.750 2019.833 2019.917
## 2020 2020.667 2020.750 2020.833 2020.917
## 2021 2021.667 2021.750 2021.833 2021.917
## 2022 2022.667 2022.750 2022.833 2022.917
## 2023 2023.667 2023.750 2023.833 2023.917
## 2024 2024.667 2024.750 2024.833 2024.917
## 2025 2025.667 2025.750 2025.833 2025.917
## 2026 2026.667 2026.750 2026.833 2026.917
## 2027 2027.667 2027.750 2027.833 2027.917
## 2028 2028.667 2028.750 2028.833 2028.917
start(tseries)
## [1] 2019 1
end(tseries)
## [1] 2028 12
frequency(tseries)
## [1] 12
window(tseries, start=c(2019, 1), end=c(2020,6))
## Jan Feb Mar Apr May Jun Jul
## 2019 -8.043847 6.630623 4.421658 25.735696 12.339425 18.372602 2.364186
## 2020 21.967321 42.796810 32.467590 31.413666 26.222915 46.217125
## Aug Sep Oct Nov Dec
## 2019 30.290654 1.494811 12.802757 23.797723 25.925489
## 2020
Plot the ts object tseries
you create using plot.ts
function. (You can also try plot
or ggplot
)
# plot
plot.ts(tseries, col='steelblue2', lty=1, lwd=2)
#plot(tseries, col='steelblue2', lty=1, lwd=2) # how to do in ggplot?
# ggplot
df.ts = data.frame(month=time(tseries), tseries)
ggplot(df.ts, aes(x=month, y=tseries)) + geom_line(col='steelblue2', size=1)
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
Can you show both points and line on the plot?
plot(tseries, type='l', col='steelblue2', lty=1, lwd=2)
# If you prefer points, specify type='p'. If you prefer both points and line, specify type='b'
For selected classical time series, check the ts plot to see if it is stationary.
set.seed(1)
m = 100 # length of time series
x.wn = rnorm(m, 0, 1)
plot(x.wn, type='l', col='steelblue2', lwd=2)
abline(h=0, lty=2)
Try to add a constant drift / intercept to the white noise series we generate in the previous section and make a plot.
In probability theory, stochastic drift is the change of the average value of a stochastic (random) process. A related concept is the drift rate, which is the rate at which the average changes. For example, a process that counts the number of heads in a series of.
a = 2 # drift / intercept - shifted
x.wn.drift = a + rnorm(m, 0, 1)
plot(x.wn.drift, type='l', col='steelblue2', lwd=2)
abline(h=a, lty=2)
set.seed(1)
b = .1 # slope of the linear trend # see what happens as you increase or decrease b?
x.wn.trend = b*(1:m) + rnorm(m, 0, 1)
plot(x.wn.trend, type='l', col='steelblue2', lwd=2)
abline(a=0, b=b, lty=2)
set.seed(1)
c = .001 # scale factor for quadratic term
x.wn.quad = c*(1:m)^2 + rnorm(m, 0, 1)
plot(x.wn.quad, type='l', col='steelblue2', lwd=2)
lines(1:m, c*(1:m)^2, lty=2)
# was here!!
set.seed(2)
m = 100 # number of time points
# function to generate a simple random walk process
random_walk = function(sd=1, m=100) cumsum(rnorm(m, 0, sd))
y.rw = random_walk(1, m)
plot(y.rw, type='l', col='steelblue2', lwd=2)
abline(h=0, lty=2)
set.seed(2)
n = 10 # number of threads
threads = replicate(n, random_walk(1, m))
# dim(res)
matplot(threads, type="l", lty="solid", las = 1, ylab = expression(italic(y[t])), xlab = "Time",col = gray(0.6, 1))
abline(h=0, col='darkred', lty=2)
If you prefer ggplot …
threads.df = as.data.frame(threads)
threads.df = threads.df %>% mutate(time=1:m) %>% gather(key='thread', value='value', 1:n)
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(n)` instead of `n` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
ggplot(threads.df) + geom_line(aes(x=time, y=value, group=thread), col='gray60') + geom_hline(yintercept=0, col='darkred', linetype='dashed')
Notice that the variance of random walk increase with time. This may not be observed in a single-thread plot.
set.seed(1)
m = 100 # length of time series
x.wn = rnorm(m, 0, 1)
acf(x.wn, lag.max = 15, main='')
Acf(x.wn, lag.max = 15, main='')
set.seed(1)
b = .1 # slope of the linear trend
x.wn.trend = b*(1:m) + rnorm(m, 0, 1)
acf(x.wn.trend, main='')
set.seed(2)
m = 100 # number of time points
# function to generate a simple random walk process
random_walk = function(sd=1, m=100) cumsum(rnorm(m, 0, sd))
y.rw = random_walk(1, m)
acf(y.rw, main='')
# [put your code here]
First perform square root transformation, then use first and twelfth differencing to remove trend and seasonality
A simple but often effective way to stabilize the variance across time is to apply a power transformation (square root, cube root, log, etc) to the time series. Here are some examples:
Differencing is a method of transforming a time series dataset. It can be used to remove the series dependence on time, so-called temporal dependence. … Differencing can help stabilize the mean of the time series by removing changes in the level of a time series, and so eliminating (or reducing) trend and seasonality.
Here is a plot of the famous airline data along with its log to stabilize the variance, the 1st difference of the log to eliminate the linear trend, and the 12th difference of the 1st difference of the log to eliminate the annual cycle. (our frequency is 12)
?diff
## Help on topic 'diff' was found in the following packages:
##
## Package Library
## timeDate /Library/Frameworks/R.framework/Versions/4.0/Resources/library
## base /Library/Frameworks/R.framework/Resources/library
##
##
## Using the first match ...
plot.ts(cbind(elec,
sqrt(elec),
d_1=diff(sqrt(elec), lag=1),
d_1_12=diff(diff(sqrt(elec), lag=1), lag=12)), col='steelblue2', lwd=2, main='differencing orders 1')
# exchange 1st differencing and 12th differencing leads to the same result
plot.ts(cbind(elec,
sqrt(elec),
d_12=diff(sqrt(elec), lag=12),
d_12_1=diff(diff(sqrt(elec), lag=12), lag=1)), col='steelblue2', lwd=2, main='differencing orders 2')
identical(diff(diff(sqrt(elec), lag = 1), lag = 12), diff(diff(sqrt(elec), lag = 12), lag = 1))
## [1] TRUE
What if we take difference first and then do transformation?
plot.ts(cbind(elec,
d_1=diff(elec, 1),
d_1_12 = diff(diff(elec,1),12),
trans=sqrt(diff(diff(elec,1),12))), col='red4', lwd=2, main='transformation after diferencing')
## Warning in sqrt(diff(diff(elec, 1), 12)): NaNs produced
read in data and create ts object
temp = fread("temp.txt", header = F)
colnames(temp) = c('year', 'raw','smoothing')
temp.ts = ts(temp$raw, start=1880, end=2019, frequency = 1)
temp.ts
## Time Series:
## Start = 1880
## End = 2019
## Frequency = 1
## [1] -0.16 -0.07 -0.10 -0.16 -0.28 -0.32 -0.30 -0.35 -0.17 -0.10 -0.34 -0.22
## [13] -0.27 -0.31 -0.30 -0.22 -0.11 -0.10 -0.26 -0.17 -0.08 -0.15 -0.28 -0.37
## [25] -0.46 -0.26 -0.22 -0.38 -0.42 -0.48 -0.43 -0.44 -0.35 -0.34 -0.15 -0.14
## [37] -0.36 -0.46 -0.29 -0.27 -0.27 -0.19 -0.28 -0.26 -0.27 -0.22 -0.11 -0.22
## [49] -0.20 -0.36 -0.16 -0.09 -0.16 -0.29 -0.13 -0.20 -0.15 -0.03 0.00 -0.02
## [61] 0.13 0.19 0.07 0.09 0.20 0.09 -0.07 -0.03 -0.11 -0.11 -0.17 -0.07
## [73] 0.01 0.08 -0.13 -0.14 -0.19 0.05 0.06 0.03 -0.03 0.06 0.03 0.05
## [85] -0.20 -0.11 -0.06 -0.02 -0.08 0.05 0.03 -0.08 0.01 0.16 -0.07 -0.01
## [97] -0.10 0.18 0.07 0.17 0.26 0.32 0.14 0.31 0.16 0.12 0.18 0.32
## [109] 0.39 0.27 0.45 0.41 0.22 0.23 0.32 0.45 0.33 0.47 0.61 0.39
## [121] 0.40 0.54 0.63 0.62 0.54 0.68 0.64 0.67 0.55 0.66 0.73 0.61
## [133] 0.65 0.68 0.75 0.90 1.02 0.93 0.86 0.99
Produce ts plot and ACF plot and check if this is a stationary time series
par(mfrow=c(1,2))
plot.ts(temp.ts, col='darkred', lwd=2, xlab='Year', ylab='Temperature Anomaly')
abline(h=0, col='gray', lwd=2)
acf(temp.ts, main='')
How is the stationarity after we take first difference? Check with ts plot and ACF plot.
temp.ts.d1 = diff(temp.ts, lag=1)
par(mfrow=c(1,2))
plot.ts(temp.ts.d1, col='darkred', lwd=2, xlab='Year', ylab='Temperature Anomaly 1st diff')
abline(h=0, col='gray', lwd=2)
acf(temp.ts.d1, main='')
regression on non-linear time trend. Here we consider a quadratic form time trend. After regressing out this trend effect, is the residual sequence stationary? Check this with ts plot and ACF plot.
In time series context, residuals must be stationary in order to avoid spurious regressions (Woolridge, 2012), if there are no properties of stationarity among the residuals, then basically our results tend to produce fake relationships in our model.
x=1:length(temp.ts)-length(temp.ts)/2
lm.fit = lm(temp.ts~x+I(x^2))
# summary(lm.fit)
fitted.val = lm.fit$fitted.values
plot(c(temp.ts), col='darkred', lwd=2, xlab='Year', ylab='', type='l')
lines(1:140,fitted.val, col='steelblue3', lwd=3)
residuals = lm.fit$residuals
par(mfrow=c(1,2))
plot.ts(residuals,col='darkred', lwd=2, ylim=c(-0.3,0.8))
abline(h=0, col='gray', lwd=3)
acf(residuals, main='')