Borrowed from: ryankelly
library(astsa, quietly=TRUE, warn.conflicts=FALSE)
library(ggplot2)
library(knitr)
library(printr)
library(plyr)
library(dplyr)
library(lubridate)
library(gridExtra)
library(reshape2)
library(TTR)
Here we use the file http://robjhyndman.com/tsdldata/misc/kings.dat contains data on the age of death of successive kings of England, starting with William the Conqueror (original source: Hipel and Mcleod, 1994). You can read data into R using the scan() function, which assumes that your data for successive time points is in a simple text file with one column. The first three lines contain some comment on the data, and we want to ignore this when we read the data into R.
kings <- scan('http://robjhyndman.com/tsdldata/misc/kings.dat', skip=3)
head(kings)
## [1] 60 43 67 50 56 42
Once you have read the time series data into R, the next step is to store the data in a time series object in R, so that you can use R’s many functions for analysing time series data. To store the data in a time series object, we use the ts()
function in R.
kings <- ts(kings)
kings
## Time Series:
## Start = 1
## End = 42
## Frequency = 1
## [1] 60 43 67 50 56 42 50 65 68 43 65 34 47 34 49 41 13 35 53 56 16 43 69
## [24] 59 48 59 86 55 68 51 33 49 67 77 81 67 71 81 68 70 77 56
However, it is common to come across time series that have been collected at regular intervals that are less than the one year of the kings
dataset, for example, monthly, weekly or quarterly. In these cases we can specify the number of times that data was collected per year by using the frequency
parameter in the ts( )
function. For monthly data, we set frequency = 12
. We can also specify the first year that the data were collected and the first interval in that year by using the ‘stat’ parameter. For example, the third quarter of 1909 would be `start = c(1909, 3).
Next we load in a dataset of number of births per month in New York city, from January 1946 to December 1958.
births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
births <- ts(births, frequency = 12, start = c(1946, 1))
births
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1946 26.663 23.598 26.931 24.740 25.806 24.364 24.477 23.901 23.175 23.227
## 1947 21.439 21.089 23.709 21.669 21.752 20.761 23.479 23.824 23.105 23.110
## 1948 21.937 20.035 23.590 21.672 22.222 22.123 23.950 23.504 22.238 23.142
## 1949 21.548 20.000 22.424 20.615 21.761 22.874 24.104 23.748 23.262 22.907
## 1950 22.604 20.894 24.677 23.673 25.320 23.583 24.671 24.454 24.122 24.252
## 1951 23.287 23.049 25.076 24.037 24.430 24.667 26.451 25.618 25.014 25.110
## 1952 23.798 22.270 24.775 22.646 23.988 24.737 26.276 25.816 25.210 25.199
## 1953 24.364 22.644 25.565 24.062 25.431 24.635 27.009 26.606 26.268 26.462
## 1954 24.657 23.304 26.982 26.199 27.210 26.122 26.706 26.878 26.152 26.379
## 1955 24.990 24.239 26.721 23.475 24.767 26.219 28.361 28.599 27.914 27.784
## 1956 26.217 24.218 27.914 26.975 28.527 27.139 28.982 28.169 28.056 29.136
## 1957 26.589 24.848 27.543 26.896 28.878 27.390 28.065 28.141 29.048 28.484
## 1958 27.132 24.924 28.963 26.589 27.931 28.009 29.229 28.759 28.405 27.945
## 1959 26.076 25.286 27.660 25.951 26.398 25.565 28.865 30.000 29.261 29.012
## Nov Dec
## 1946 21.672 21.870
## 1947 21.759 22.073
## 1948 21.059 21.573
## 1949 21.519 22.025
## 1950 22.084 22.991
## 1951 22.964 23.981
## 1952 23.162 24.707
## 1953 25.246 25.180
## 1954 24.712 25.688
## 1955 25.693 26.881
## 1956 26.291 26.987
## 1957 26.634 27.735
## 1958 25.912 26.619
## 1959 26.992 27.897
Next loading data on beach town souvenir shop.
gift <- scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
gift<- ts(gift, frequency=12, start=c(1987,1))
gift
## Jan Feb Mar Apr May Jun Jul
## 1987 1664.81 2397.53 2840.71 3547.29 3752.96 3714.74 4349.61
## 1988 2499.81 5198.24 7225.14 4806.03 5900.88 4951.34 6179.12
## 1989 4717.02 5702.63 9957.58 5304.78 6492.43 6630.80 7349.62
## 1990 5921.10 5814.58 12421.25 6369.77 7609.12 7224.75 8121.22
## 1991 4826.64 6470.23 9638.77 8821.17 8722.37 10209.48 11276.55
## 1992 7615.03 9849.69 14558.40 11587.33 9332.56 13082.09 16732.78
## 1993 10243.24 11266.88 21826.84 17357.33 15997.79 18601.53 26155.15
## Aug Sep Oct Nov Dec
## 1987 3566.34 5021.82 6423.48 7600.60 19756.21
## 1988 4752.15 5496.43 5835.10 12600.08 28541.72
## 1989 8176.62 8573.17 9690.50 15151.84 34061.01
## 1990 7979.25 8093.06 8476.70 17914.66 30114.41
## 1991 12552.22 11637.39 13606.89 21822.11 45060.69
## 1992 19888.61 23933.38 25391.35 36024.80 80721.71
## 1993 28586.52 30505.41 30821.33 46634.38 104660.67
Plot the kings
data.
plot.ts(kings)
At this point we could guess that this time series could be described using an additive model, since the random fluctuations in the data are roughly constant in size over time.
Plotting the births
data.
plot.ts(births)
We can see from this time series that there is certainly some seasonal variation in the number of births per month; there is a peak every summer, and a trough every winter. Again the it seems like this could be described using an additive model, as the seasonal fluctuations are roughly constant in size over time and do not seem to depend on the level of the time series, and the random fluctuations seem constant over time.
plot.ts(gift)
In this case, an additive model is not appropriate since the size of the seasonal and random fluctuations change over time and the level of the time series. It is then appropriate to transform the time series so that we can model the data with a classic additive model.
logGift <- log(gift)
plot.ts(logGift)
Decomposing a time series means separating it into it’s constituent components, which are often a trend component and a random component, and if the data is seasonal, a seasonal component.
Recall that non-seasonal time series consist of a trend component and a random component. Decomposing the time series involves tying to separate the time series into these individual components.
One way to do this is using some smoothing method
, such as a simple moving average. The SMA()
function in the TTR
R package can be used to smooth time series data using a moving average. The SMA
function takes a span argument as n
order. To calculate the moving average of order 5, we set n = 5
.
Lets start with n=3
to see a clearer picture of the Kings dataset trend component
kingsSMA3 <- SMA(kings, n=3)
plot.ts(kingsSMA3)
It seems like there is still some random fluctuations in the data, we might want to try a big larger of a smoother. Lets try n=8
.
kingsSMA3 <- SMA(kings, n=8)
plot.ts(kingsSMA3)
This is better, we can see that the death of English kings has declined from ~55 years to ~40 years for a brief period, followed by a rapid increase in the next 20 years to ages in the 70’s.
A seasonal time series, in addition to the trend and random components, also has a seasonal component. Decomposing a seasonal time series means separating the time series into these three components. In R we can use the decompose()
function to estimate the three components of the time series.
Lets estimate the trend, seasonal, and random components of the New York births dataset.
birthsComp <- decompose(births)
birthsComp
## $x
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1946 26.663 23.598 26.931 24.740 25.806 24.364 24.477 23.901 23.175 23.227
## 1947 21.439 21.089 23.709 21.669 21.752 20.761 23.479 23.824 23.105 23.110
## 1948 21.937 20.035 23.590 21.672 22.222 22.123 23.950 23.504 22.238 23.142
## 1949 21.548 20.000 22.424 20.615 21.761 22.874 24.104 23.748 23.262 22.907
## 1950 22.604 20.894 24.677 23.673 25.320 23.583 24.671 24.454 24.122 24.252
## 1951 23.287 23.049 25.076 24.037 24.430 24.667 26.451 25.618 25.014 25.110
## 1952 23.798 22.270 24.775 22.646 23.988 24.737 26.276 25.816 25.210 25.199
## 1953 24.364 22.644 25.565 24.062 25.431 24.635 27.009 26.606 26.268 26.462
## 1954 24.657 23.304 26.982 26.199 27.210 26.122 26.706 26.878 26.152 26.379
## 1955 24.990 24.239 26.721 23.475 24.767 26.219 28.361 28.599 27.914 27.784
## 1956 26.217 24.218 27.914 26.975 28.527 27.139 28.982 28.169 28.056 29.136
## 1957 26.589 24.848 27.543 26.896 28.878 27.390 28.065 28.141 29.048 28.484
## 1958 27.132 24.924 28.963 26.589 27.931 28.009 29.229 28.759 28.405 27.945
## 1959 26.076 25.286 27.660 25.951 26.398 25.565 28.865 30.000 29.261 29.012
## Nov Dec
## 1946 21.672 21.870
## 1947 21.759 22.073
## 1948 21.059 21.573
## 1949 21.519 22.025
## 1950 22.084 22.991
## 1951 22.964 23.981
## 1952 23.162 24.707
## 1953 25.246 25.180
## 1954 24.712 25.688
## 1955 25.693 26.881
## 1956 26.291 26.987
## 1957 26.634 27.735
## 1958 25.912 26.619
## 1959 26.992 27.897
##
## $seasonal
## Jan Feb Mar Apr May Jun
## 1946 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1947 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1948 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1949 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1950 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1951 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1952 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1953 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1954 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1955 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1956 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1957 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1958 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## 1959 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## Jul Aug Sep Oct Nov Dec
## 1946 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1947 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1948 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1949 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1950 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1951 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1952 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1953 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1954 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1955 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1956 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1957 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1958 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
## 1959 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
##
## $trend
## Jan Feb Mar Apr May Jun Jul
## 1946 NA NA NA NA NA NA 23.98433
## 1947 22.35350 22.30871 22.30258 22.29479 22.29354 22.30562 22.33483
## 1948 22.43038 22.43667 22.38721 22.35242 22.32458 22.27458 22.23754
## 1949 22.06375 22.08033 22.13317 22.16604 22.17542 22.21342 22.27625
## 1950 23.21663 23.26967 23.33492 23.42679 23.50638 23.57017 23.63888
## 1951 24.00083 24.12350 24.20917 24.28208 24.35450 24.43242 24.49496
## 1952 24.27204 24.27300 24.28942 24.30129 24.31325 24.35175 24.40558
## 1953 24.78646 24.84992 24.92692 25.02362 25.16308 25.26963 25.30154
## 1954 25.92446 25.92317 25.92967 25.92137 25.89567 25.89458 25.92963
## 1955 25.64612 25.78679 25.93192 26.06388 26.16329 26.25388 26.35471
## 1956 27.21104 27.21900 27.20700 27.26925 27.35050 27.37983 27.39975
## 1957 27.44221 27.40283 27.44300 27.45717 27.44429 27.48975 27.54354
## 1958 27.68642 27.76067 27.75963 27.71037 27.65783 27.58125 27.49075
## 1959 26.96858 27.00512 27.09250 27.17263 27.26208 27.36033 NA
## Aug Sep Oct Nov Dec
## 1946 23.66213 23.42333 23.16112 22.86425 22.54521
## 1947 22.31167 22.26279 22.25796 22.27767 22.35400
## 1948 22.21988 22.16983 22.07721 22.01396 22.02604
## 1949 22.35750 22.48862 22.70992 22.98563 23.16346
## 1950 23.75713 23.86354 23.89533 23.87342 23.88150
## 1951 24.48379 24.43879 24.36829 24.29192 24.27642
## 1952 24.44475 24.49325 24.58517 24.70429 24.76017
## 1953 25.34125 25.42779 25.57588 25.73904 25.87513
## 1954 25.98246 26.01054 25.88617 25.67087 25.57312
## 1955 26.40496 26.45379 26.64933 26.95183 27.14683
## 1956 27.44150 27.45229 27.43354 27.44488 27.46996
## 1957 27.56933 27.63167 27.67804 27.62579 27.61212
## 1958 27.46183 27.42262 27.34175 27.25129 27.08558
## 1959 NA NA NA NA NA
##
## $random
## Jan Feb Mar Apr May
## 1946 NA NA NA NA NA
## 1947 -0.237305288 0.863252404 0.543893429 0.175887019 -0.793193109
## 1948 0.183819712 -0.318705929 0.340268429 0.121262019 -0.354234776
## 1949 0.161444712 0.002627404 -0.571689904 -0.749362981 -0.666068109
## 1950 0.064569712 -0.292705929 0.479560096 1.047887019 1.561973558
## 1951 -0.036638622 1.008460737 0.004310096 0.556595353 -0.176151442
## 1952 0.203153045 0.079960737 -0.376939904 -0.853612981 -0.576901442
## 1953 0.254736378 -0.122955929 -0.224439904 -0.159946314 0.016265224
## 1954 -0.590263622 -0.536205929 0.189810096 1.079303686 1.062681891
## 1955 0.021069712 0.535169071 -0.073439904 -1.787196314 -1.647943109
## 1956 -0.316846955 -0.918039263 -0.155523237 0.507428686 0.924848558
## 1957 -0.176013622 -0.471872596 -0.762523237 0.240512019 1.182056891
## 1958 0.122778045 -0.753705929 0.340851763 -0.319696314 0.021515224
## 1959 -0.215388622 0.363835737 -0.295023237 -0.419946314 -1.115734776
## Jun Jul Aug Sep Oct
## 1946 NA -0.963379006 -0.925718750 -0.939949519 -0.709369391
## 1947 -1.391369391 -0.311879006 0.347739583 0.150592147 0.076797276
## 1948 0.001672276 0.256412660 0.119531250 -0.623449519 0.289547276
## 1949 0.813838942 0.371704327 0.225906250 0.081758814 -0.578161058
## 1950 0.166088942 -0.423920673 -0.467718750 -0.433157853 -0.418577724
## 1951 0.387838942 0.499995994 -0.030385417 -0.116407853 -0.033536058
## 1952 0.538505609 0.414370994 0.206656250 0.025133814 -0.161411058
## 1953 -0.481369391 0.251412660 0.100156250 0.148592147 0.110880609
## 1954 0.380672276 -0.679670673 -0.269052083 -0.550157853 -0.282411058
## 1955 0.118380609 0.550245994 1.029447917 0.768592147 0.359422276
## 1956 -0.087577724 0.126204327 -0.437093750 -0.087907853 0.927213942
## 1957 0.053505609 -0.934587340 -0.592927083 0.724717147 0.030713942
## 1958 0.581005609 0.282204327 0.132572917 0.290758814 -0.171994391
## 1959 -1.642077724 NA NA NA NA
## Nov Dec
## 1946 -0.082484776 -0.298388622
## 1947 0.591098558 0.095819712
## 1948 0.154806891 -0.076221955
## 1949 -0.356859776 -0.761638622
## 1950 -0.679651442 -0.513680288
## 1951 -0.218151442 0.081403045
## 1952 -0.432526442 0.323653045
## 1953 0.616723558 -0.318305288
## 1954 0.150890224 0.491694712
## 1955 -0.149068109 0.110986378
## 1956 -0.044109776 -0.106138622
## 1957 0.117973558 0.499694712
## 1958 -0.229526442 -0.089763622
## 1959 NA NA
##
## $figure
## [1] -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
## [7] 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
##
## $type
## [1] "additive"
##
## attr(,"class")
## [1] "decomposed.ts"
Now lets plot the components .
plot(birthsComp)
If you have a seasonal time series, you can seasonally adjust the series by estimating the seasonal component, and subtracting it from the original time series. We can see below that time time series simply consists of the trend and random components.
birthsSeasonAdj <- births - birthsComp$seasonal
plot.ts(birthsSeasonAdj)