This vignette explores and demonstrates some common techniques for working with time series data in R, with the aim of understanding how to process and interpret such data. A further motivation is to build up the skills and tools in R needed to analyse more complicated datasets, such as equity market data (e.g. Dow Jones weekly returns or earnings per share). The vignette introduces the time series object (ts) and only scratches the surface; readers are encouraged to seek out more advanced manuals and instructions.
To illustrate, the vignette uses the “austres” dataset as sample data, together with functions from the “astsa” package.
The “austres” dataset is a quarterly time series of the number of Australian residents (in thousands) from 1971 to 1993. Further information can be found via this link: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/austres.html “astsa” is the “Applied Statistical Time Series Analysis” package. Further information can be found via this link: http://www.stat.pitt.edu/stoffer/tsa4/
library(help="astsa")
library(astsa)
## Warning: package 'astsa' was built under R version 3.4.3
austres
##         Qtr1    Qtr2    Qtr3    Qtr4
## 1971         13067.3 13130.5 13198.4
## 1972 13254.2 13303.7 13353.9 13409.3
## 1973 13459.2 13504.5 13552.6 13614.3
## 1974 13669.5 13722.6 13772.1 13832.0
## 1975 13862.6 13893.0 13926.8 13968.9
## 1976 14004.7 14033.1 14066.0 14110.1
## 1977 14155.6 14192.2 14231.7 14281.5
## 1978 14330.3 14359.3 14396.6 14430.8
## 1979 14478.4 14515.7 14554.9 14602.5
## 1980 14646.4 14695.4 14746.6 14807.4
## 1981 14874.4 14923.3 14988.7 15054.1
## 1982 15121.7 15184.2 15239.3 15288.9
## 1983 15346.2 15393.5 15439.0 15483.5
## 1984 15531.5 15579.4 15628.5 15677.3
## 1985 15736.7 15788.3 15839.7 15900.6
## 1986 15961.5 16018.3 16076.9 16139.0
## 1987 16203.0 16263.3 16327.9 16398.9
## 1988 16478.3 16538.2 16621.6 16697.0
## 1989 16777.2 16833.1 16891.6 16956.8
## 1990 17026.3 17085.4 17106.9 17169.4
## 1991 17239.4 17292.0 17354.2 17414.2
## 1992 17447.3 17482.6 17526.0 17568.7
## 1993 17627.1 17661.5
summary(austres)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   13067   14110   15184   15273   16399   17662
plot(austres, xlab = "Year", ylab = "Number of Australian Residents", main = "Australian Population (thousands)")
Although the printout above looks like a table with rows and columns, the time series object has no dimensions: it is a plain vector of values with time information attached. Care must therefore be taken when processing time series.
“austres” is a time series object (class ts) holding 89 observations.
A few ordinary R commands demonstrate the nature of a time series:
austres[1] # show the first element
## [1] 13067.3
length(austres) # show the number of elements
## [1] 89
dim(austres) # no dimensions..
## NULL
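To see where the time information lives, the following sketch inspects the attributes that a ts object carries on top of the plain numeric vector. It uses only base R functions (class, start, end, frequency, time, cycle, window); the name y1990 is introduced here purely for illustration.

```r
# austres ships with base R (package 'datasets'), so no extra library is needed
class(austres)        # the object is of class "ts"
start(austres)        # first observation: year 1971, quarter 2
end(austres)          # last observation: year 1993, quarter 2
frequency(austres)    # 4 observations per year (quarterly data)

# time() gives the decimal date of each observation, cycle() its quarter
time(austres)[1:4]
cycle(austres)[1:4]

# window() subsets by date rather than by position
y1990 <- window(austres, start = c(1990, 1), end = c(1990, 4))
y1990
```

Subsetting with window() keeps the time attributes, whereas positional indexing such as austres[1:4] returns a bare numeric vector.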
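Next, examine the dataset further using other commands and functions. First take the first difference of the logged series, d_austres, which approximates the quarterly growth rate, and plot it. Then test it for normality and check its correlation structure using the lag plot function lag1.plot, which produces a grid of scatterplots of d_austres[t] against its lagged values (here lags 1 to 4):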
d_austres <- diff(log(austres)) # first difference of the logged series
plot(d_austres) # plot it
shapiro.test(d_austres) # Shapiro-Wilk test; the null hypothesis is that the data are normally distributed
##
## Shapiro-Wilk normality test
##
## data: d_austres
## W = 0.98999, p-value = 0.7433
lag1.plot(d_austres, 4) # grid of scatterplots of the series versus its lagged values (lags 1 to 4)
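The lag plot shows visually how d_austres[t] relates to its own past. As a rough numeric companion, the sketch below first checks that the log difference is close to the simple quarterly growth rate, then computes the lag-1 correlation by hand and with acf() from base R. The names x, growth, d and r1 are introduced here for illustration only and are not part of the original analysis.

```r
d_austres <- diff(log(austres))

# Simple quarterly growth rate: x[t] / x[t-1] - 1
x <- as.numeric(austres)
n <- length(x)
growth <- x[-1] / x[-n] - 1

# For small changes, log(x[t]) - log(x[t-1]) is close to the growth rate
d <- as.numeric(d_austres)
max(abs(growth - d))

# Lag-1 correlation by hand: pair each value with the one before it
m <- length(d)
r1 <- cor(d[-1], d[-m])
r1

# Sample autocorrelations up to lag 4; for quarterly data the lags
# are reported in years (0.25 = one quarter)
acf(d_austres, lag.max = 4, plot = FALSE)
```

The hand-computed r1 and the lag-1 value from acf() need not match exactly, because acf() uses the overall mean and variance of the series rather than those of the two shifted sub-series.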
Finally, fit a linear trend with quarterly indicators to the logged series:
Q <- factor(cycle(austres)) # quarter of each observation, as a factor
trend <- time(austres) - 1980 # center time at 1980, roughly mid-sample, so the quarter coefficients are levels at 1980
reg <- lm(log(austres) ~ 0 + trend + Q, na.action = NULL) # regression without intercept; na.action = NULL keeps the time series attributes
summary(reg)
##
## Call:
## lm(formula = log(austres) ~ 0 + trend + Q, na.action = NULL)
##
## Residuals:
##        Min         1Q     Median         3Q        Max
## -0.0074945 -0.0035330 -0.0004821  0.0038895  0.0074295
##
## Coefficients:
##        Estimate Std. Error t value Pr(>|t|)
## trend 1.370e-02  6.964e-05   196.7   <2e-16 ***
## Q1    9.599e+00  9.160e-04 10480.2   <2e-16 ***
## Q2    9.599e+00  8.934e-04 10745.0   <2e-16 ***
## Q3    9.599e+00  9.100e-04 10548.5   <2e-16 ***
## Q4    9.599e+00  9.128e-04 10516.1   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.004218 on 84 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 9.279e+07 on 5 and 84 DF, p-value: < 2.2e-16
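One way to read the regression above, sketched under the assumption that the fit is reproduced exactly as shown: because the response is log(austres), the trend coefficient is an approximate annual growth rate (about 1.4% per year), and the four quarter coefficients are log population levels at 1980. Their near-identical values suggest little seasonal variation. The reported R-squared of 1 should be read with caution, since for a model without an intercept R measures it against zero rather than against the mean of the response. The names annual_growth and reg_int are hypothetical and introduced only for this illustration.

```r
Q     <- factor(cycle(austres))
trend <- time(austres) - 1980
reg   <- lm(log(austres) ~ 0 + trend + Q, na.action = NULL)

# Annual growth implied by the trend coefficient (log scale to percent)
annual_growth <- exp(coef(reg)["trend"]) - 1
100 * annual_growth

# Quarter coefficients back-transformed to population levels (thousands) at 1980
exp(coef(reg)[c("Q1", "Q2", "Q3", "Q4")])

# Same model written with an intercept: the fitted values are identical,
# but R-squared is now computed relative to the mean of log(austres)
reg_int <- lm(log(austres) ~ trend + Q, na.action = NULL)
all.equal(as.numeric(fitted(reg)), as.numeric(fitted(reg_int)))
summary(reg_int)$r.squared
```

The intercept version is only a reparameterisation of the same fit; it is shown here because its R-squared is the more conventional measure of how much variation the trend and quarter terms explain.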