Time Series Analysis 1

Visit my website for more like this! I would love to hear your feedback (seriously).

require(astsa, quietly=TRUE, warn.conflicts=FALSE)
require(knitr)
## Loading required package: knitr
library(ggplot2)

Data Sources:

Heavily borrowed from:

Overview of Time Series

This lesson describes some of the important features to consider when analyzing a time series. Here we focus on a single time series; future lessons will incorporate more series.

  • A univariate time series is a sequence of measurements of the same variable collected over time. Most often, the measurements are taken at regular time intervals.

However, the major difference from standard linear models is that the observations are not necessarily independent and identically distributed. The ordering matters, so there is often dependence between neighboring observations.
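
To make the point about ordering concrete, here is a minimal sketch using base R only. The series is simulated (a random walk with made-up dates), so the numbers mean nothing in themselves; the names `x`, `y`, `r_ordered` and `r_shuffled` are just for illustration. It builds a `ts` object and compares the lag-1 autocorrelation of the ordered series with that of the same values after shuffling.

```r
# Hypothetical data: 48 monthly measurements starting January 2020.
set.seed(1)
x <- cumsum(rnorm(48))                       # each value builds on the previous one
y <- ts(x, start = c(2020, 1), frequency = 12)

# Lag-1 autocorrelation: how strongly y[t] is related to y[t-1].
# acf() returns lag 0 first, so element 2 is lag 1.
r_ordered <- acf(y, lag.max = 1, plot = FALSE)$acf[2]

# Shuffling keeps exactly the same values but destroys the time ordering.
r_shuffled <- acf(sample(x), lag.max = 1, plot = FALSE)$acf[2]

print(time(y)[1:3])                          # the regular time index of the series
print(c(ordered = r_ordered, shuffled = r_shuffled))
```

If the ordering did not matter, the two autocorrelations would look alike. For a dependent series like this one, the ordered version should come out much larger.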

Basic Analysis objectives:

The core objective of time series analysis is to generate a model that describes the true underlying trend of the time series. With a properly specified model we can:

  1. Describe important features of the time series pattern.
  2. Explain how the past affects the future.
  3. Explain how two time series interact.
  4. Forecast future values of the time series.
  5. Other real-life applications like serving as a control standard for a variable that measures some manufacturing operation.

Types of models:

There are two basic types of “time domain” models:

  1. ARIMA (Autoregressive Integrated Moving Average) models, which relate the present value of a series to past values and past prediction errors.
  2. Ordinary regression models that use the time index (or other time series) as x variables. As in classical statistics, these are often helpful for a first look at the data, and serve as a starting point for some forecasting methods.
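
As a rough illustration of the first type, the sketch below simulates an AR(1) series and fits it with `arima()` from base R. The coefficient 0.7, the sample size and the names `sim`, `fit_ar` and `fc` are assumptions chosen for the example, not anything taken from the data used in this lesson.

```r
set.seed(42)

# Simulate 500 observations where each value depends on the previous one
# (AR coefficient 0.7 is an assumed value for illustration).
sim <- arima.sim(model = list(ar = 0.7), n = 500)

# order = c(p, d, q): p past values, d differences, q past prediction errors.
# Here we fit a pure AR(1): one past value, no differencing, no MA terms.
fit_ar <- arima(sim, order = c(1, 0, 0))
print(coef(fit_ar))                  # the ar1 estimate should land near 0.7

# Forecast the next 5 values from the fitted model.
fc <- predict(fit_ar, n.ahead = 5)
print(fc$pred)
```

The `order` argument is where the three parts of the ARIMA name show up, so changing it to, say, `c(0, 1, 1)` would fit a differenced series with one moving-average term instead.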

Important Considerations

When first looking at a time series it is important to ask…

  • Is there an underlying trend?
  • Is there seasonality, or a regular repeating pattern of highs and lows?
  • Is there a long-run cycle or period unrelated to the seasonality?
  • Are there outliers?
  • Is there constant variance over time, or is it non-constant?
  • Are there any abrupt changes to any aspects of the series?
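
Some of the questions above can be given a quick numerical first pass. The sketch below is one possible way to do that, using the `AirPassengers` series that ships with base R (it is not one of the datasets in this lesson, just a convenient stand-in with a clear trend and seasonality). The names `ap`, `dec` and `yearly_sd` are illustrative.

```r
# AirPassengers: monthly airline passenger totals, 1949-1960 (base R dataset).
ap <- AirPassengers

# Classical decomposition into trend, seasonal and random components.
# "multiplicative" is an assumption: the seasonal swings grow with the level.
dec <- decompose(ap, type = "multiplicative")

# Seasonality: one index per month; values above 1 mark the seasonal highs.
print(round(dec$figure, 2))

# Trend: the smoothed level (NA at both ends, where the moving average is undefined).
print(range(dec$trend, na.rm = TRUE))

# Variance: standard deviation within each calendar year.
yearly_sd <- aggregate(ap, nfrequency = 1, FUN = sd)
print(round(yearly_sd, 1))
```

Calling `plot(dec)` shows the same components as stacked panels. A yearly standard deviation that keeps growing, as it does here, is a hint that the variance is not constant over time.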

Let’s look at some examples:

Ex 1.0 Global Temperature

Linear models are used frequently in time series analysis, as in statistics generally. Time series can be described by simple linear models or, in more complex cases, by local regression models, polynomial models, and splines.

Let’s create a simple linear model for global temperature data. If you need a refresher on regression, I have compiled a modest tutorial in an IPython notebook.

# Regress global temperature on the time index; keep the fit for plotting below
summary(fit <- lm(gtemp ~ time(gtemp)))
## 
## Call:
## lm(formula = gtemp ~ time(gtemp))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3195 -0.0972  0.0008  0.0825  0.2938 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.12e+01   5.69e-01   -19.7   <2e-16 ***
## time(gtemp)  5.75e-03   2.92e-04    19.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.125 on 128 degrees of freedom
## Multiple R-squared:  0.751,  Adjusted R-squared:  0.749 
## F-statistic:  386 on 1 and 128 DF,  p-value: <2e-16
plot(gtemp, type='o', ylab='Global Temperature')
abline(fit) # Add regression line


  • There is an apparently steady upward trend, and the time series seems to wander slowly up and down along this rising mean. The series stays fairly centered around the fitted regression line.
  • There is no obvious seasonality.
  • There are no obvious outliers.
  • It is hard to judge visually whether the variance remains constant, but it appears fairly regular.
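
The "wandering" around the trend line can also be checked numerically by looking at the autocorrelation of the regression residuals. The sketch below shows the idea on `nhtemp`, a yearly temperature series that ships with base R. It is only a stand-in so the example runs without extra packages; the same steps should apply to the `gtemp` fit above. The names `fit_nh`, `res`, `res_acf` and `bound` are illustrative.

```r
# nhtemp: mean annual temperature in New Haven, 1912-1971 (base R dataset).
# Used here only as a stand-in for gtemp.
fit_nh <- lm(nhtemp ~ time(nhtemp))

# The slope is per year; multiply by 10 to express it per decade.
slope_per_decade <- 10 * coef(fit_nh)[2]
print(slope_per_decade)

# Residuals = what is left after removing the linear trend.
res <- ts(resid(fit_nh), start = start(nhtemp), frequency = frequency(nhtemp))

# Autocorrelation of the residuals at lags 1 to 10 (element 1 is lag 0).
res_acf <- acf(res, lag.max = 10, plot = FALSE)
print(round(res_acf$acf[-1], 2))

# Approximate 95% bound for the autocorrelations of white noise.
bound <- 1.96 / sqrt(length(res))
print(round(bound, 2))
```

Residual autocorrelations well outside the bound suggest that the straight line has not captured all of the structure, which is where the ARIMA models mentioned earlier come in. For the `gtemp` fit, the same per-decade conversion gives 10 × 5.75e-03 ≈ 0.058 in the units of `gtemp`.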

Now consider a more complex multiple regression model that predicts cardiovascular death using temperature and particulate matter pollution.

par(mfrow=c(3, 1))

plot(cmort, main='Cardiovascular Mortality')
plot(tempr, main='Temperature')
plot(part, main='Particulates')


pairs(cbind(Mortality=cmort, Temperature=tempr, Particulates=part))
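
One way the multiple regression described above might be set up is sketched below. It assumes the `astsa` package is installed and still provides `cmort`, `tempr` and `part`. The choice of predictors (a time trend, centered temperature, particulates) and the names `trend`, `temp` and `fit_mort` are assumptions for illustration, not necessarily the model this lesson goes on to fit.

```r
library(astsa)  # provides cmort, tempr and part

trend <- time(cmort)            # time index, to absorb the long-run trend
temp  <- tempr - mean(tempr)    # centered temperature (easier-to-read intercept)

# Mortality as a linear function of trend, temperature and particulates.
fit_mort <- lm(cmort ~ trend + temp + part)
print(summary(fit_mort)$coefficients)

# Numeric counterpart of the scatterplot matrix: pairwise correlations.
print(round(cor(cbind(Mortality = cmort, Temperature = tempr, Particulates = part)), 2))
```

As with the temperature example, it is worth inspecting the residuals of this fit afterwards, since ordinary regression assumes independent errors and time series residuals are often autocorrelated.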