Today in class we talked about Modeling Trends and Autocorrelation.

Modeling Trends

When modeling trends, we are working with time series data. The model is: Y_t = TR_t + E_t. Y_t is the value of our time series at time t, TR_t is the trend at time t, and E_t is the error term. The error terms are assumed to be independent and identically distributed, with a mean of 0 and a variance of sigma².

Types

There are 4 options for trends:

No trend. TR_t = B₀
Linear trend. TR_t = B₀ + B₁t
Quadratic trend. TR_t = B₀ + B₁t + B₂t²
Pth-order polynomial trend. TR_t = B₀ + B₁t + … + B_p t^p
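
As a rough sketch, each of these trend types can be fit with lm() by changing the formula. The data below is simulated (it is not from class), and the names t, y, no_trend, linear, quadratic, and cubic are made up for illustration. The cubic fit stands in for a p-th order polynomial with p = 3.

```r
# Simulated series (an assumption for illustration, not the class data)
set.seed(1)
t <- 1:100
y <- 5 + 0.3 * t + rnorm(100, sd = 2)   # the true trend here is linear

no_trend  <- lm(y ~ 1)                       # TR_t = B0
linear    <- lm(y ~ t)                       # TR_t = B0 + B1*t
quadratic <- lm(y ~ t + I(t^2))              # TR_t = B0 + B1*t + B2*t^2
cubic     <- lm(y ~ poly(t, 3, raw = TRUE))  # p-th order polynomial with p = 3

# Count the estimated coefficients in each model
n_coef <- sapply(list(no_trend, linear, quadratic, cubic),
                 function(m) length(coef(m)))
print(n_coef)
```

Each step up adds one more coefficient to estimate, which is why we only move to a higher order when the plot or the diagnostics suggest we need it.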

Which to Use?

We can decide which trend to use by plotting the data. If the points scatter around a horizontal line, there is no trend. If they gather around an increasing or decreasing straight line, the trend is linear. If the points follow a single curve, try a quadratic trend. Lastly, if the data shows a reversal in curvature, we can try a higher-order (p-th order) polynomial.

Autocorrelation

Autocorrelation occurs when the residuals are correlated through time, which violates the assumption that the error terms are independent.

Types of Autocorrelation

Positive autocorrelation: a positive [or negative] residual is likely to be followed by another positive [negative] residual.
Negative autocorrelation: a positive [or negative] residual is likely to be followed by a negative [positive] residual.
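
To see the two types side by side, here is a small simulation sketch. It is not from class: the helpers sim_ar1 and lag1_cor are hypothetical names, and the values 0.7 and -0.7 are picked only for illustration. Each error is built from the previous one, and then we measure the correlation between neighboring errors.

```r
set.seed(42)
n <- 500

# Hypothetical helper: generate errors e_t = rho * e_{t-1} + w_t
sim_ar1 <- function(rho, n) {
  w <- rnorm(n)
  e <- numeric(n)
  e[1] <- w[1]
  for (i in 2:n) e[i] <- rho * e[i - 1] + w[i]
  e
}

# Hypothetical helper: sample correlation between e_t and e_{t-1}
lag1_cor <- function(e) cor(e[-1], e[-length(e)])

pos <- sim_ar1(0.7, n)    # positive autocorrelation
neg <- sim_ar1(-0.7, n)   # negative autocorrelation

r_pos <- lag1_cor(pos)
r_neg <- lag1_cor(neg)
print(c(positive = r_pos, negative = r_neg))
```

A line plot of pos should show long runs above or below zero, while neg should flip sign almost every step.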

First Order Autocorrelation

First-order autocorrelation is what we will be working with today. First order means that we only look one time unit away: the error at time t is correlated with the error at time t-1. For example, the temperature today is correlated with the temperature yesterday and the temperature tomorrow.

How To Detect

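Plot: plot the residuals against time with the plot function, e.g. plot(residuals ~ time), and look for runs of same-sign residuals or a back-and-forth pattern.
Durbin-Watson statistic: the Durbin-Watson test checks whether the residuals of a model show first-order autocorrelation (positive or negative). It gives us a test statistic and a p-value that we use to decide whether to reject the null hypothesis of no autocorrelation.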
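
The Durbin-Watson statistic itself is easy to compute by hand: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The sketch below uses simulated data (an assumption, not the class data), with errors built to have a lag-1 dependence of 0.6.

```r
set.seed(7)
t <- 1:200

# Simulated AR(1) errors: e_t = 0.6 * e_{t-1} + w_t (0.6 is an assumed value)
e <- as.numeric(stats::filter(rnorm(200), filter = 0.6, method = "recursive"))
y <- 10 + 0.05 * t + e

fit <- lm(y ~ t)
res <- residuals(fit)

# Durbin-Watson statistic: squared successive differences over squared residuals
dw <- sum(diff(res)^2) / sum(res^2)

# Lag-1 residual correlation; DW is roughly 2 * (1 - r1)
r1 <- cor(res[-1], res[-length(res)])
print(c(DW = dw, approx = 2 * (1 - r1)))
```

This only computes the statistic. The p-value still has to come from a test function such as dwtest in the lmtest package.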

Example!

We will use the data set aatemp to practice these methods.

library(faraway)

## Warning: package 'faraway' was built under R version 3.4.4

data("aatemp")
names(aatemp)

## [1] "year" "temp"

We can plot the data to see which type of trend will work best.

plot(temp ~ year, data = aatemp, type = "l")

Our data looks like it is slightly increasing, so we will construct a linear model first.

linmod <- lm(temp ~ year, data = aatemp)
summary(linmod)

## 
## Call:
## lm(formula = temp ~ year, data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9843 -0.9113 -0.0820  0.9946  3.5343 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 24.005510   7.310781   3.284  0.00136 **
## year         0.012237   0.003768   3.247  0.00153 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared:  0.08536,    Adjusted R-squared:  0.07727 
## F-statistic: 10.55 on 1 and 113 DF,  p-value: 0.001533

Looking at the summary output, our year variable is significant (p = 0.00153), so there is evidence of a linear trend, although the R-squared is low (about 0.085). We will check the diagnostic plots to double check.

plot(linmod)

Looking at the first plot (residuals vs. fitted), the line is slightly curved. It is not too bad, but we can fit a quadratic model to check whether it is a better fit.

quadmod <- lm(temp ~ year + I(year^2), data = aatemp)
summary(quadmod)

## 
## Call:
## lm(formula = temp ~ year + I(year^2), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0412 -0.9538 -0.0624  0.9959  3.5820 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.127e+02  3.837e+02  -0.554    0.580
## year         2.567e-01  3.962e-01   0.648    0.518
## I(year^2)   -6.307e-05  1.022e-04  -0.617    0.539
## 
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared:  0.08846,    Adjusted R-squared:  0.07218 
## F-statistic: 5.434 on 2 and 112 DF,  p-value: 0.005591

The quadratic term is not significant (p = 0.539), and the adjusted R-squared drops slightly (0.07218 vs. 0.07727), so we should stick with the linear model.
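
Another way to frame the linear vs. quadratic comparison is a nested-model F-test with anova(). When only one term is added, the F-test p-value matches the t-test p-value on that term. The sketch below uses simulated data and made-up names (year, temp, lin, quad), not the aatemp data, so the numbers will not match the output above.

```r
set.seed(3)

# Simulated yearly series that is linear by construction (assumed values)
year <- 1900:2000
temp <- 24 + 0.012 * year + rnorm(length(year), sd = 1.5)

lin  <- lm(temp ~ year)
quad <- lm(temp ~ year + I(year^2))

# F-test: does adding the quadratic term reduce the residual sum of squares enough?
cmp <- anova(lin, quad)
p_F <- cmp[["Pr(>F)"]][2]

# t-test p-value on the quadratic coefficient, for comparison
p_t <- summary(quad)$coefficients["I(year^2)", "Pr(>|t|)"]
print(c(F_test = p_F, t_test = p_t))
```

The anova() approach becomes more useful when the larger model adds several terms at once, since the single t-tests no longer answer that question.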

Now that we have our chosen model, we can check for autocorrelation.

NOTE: the autocorrelation check can be run on any model; to save time we will just run it on the model we are using.

library(lmtest)

## Warning: package 'lmtest' was built under R version 3.4.4

## Loading required package: zoo

## Warning: package 'zoo' was built under R version 3.4.3

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

dwtest(linmod)

## 
##  Durbin-Watson test
## 
## data:  linmod
## DW = 1.6177, p-value = 0.01524
## alternative hypothesis: true autocorrelation is greater than 0

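Unfortunately, our p-value (0.01524) is small, which means that at the 0.05 level we reject the null hypothesis (no autocorrelation) in favor of our alternative hypothesis. Since the alternative here is that the true autocorrelation is greater than 0, we conclude there is positive autocorrelation in the residuals of the linear model.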

Learning Log (Day 16)

Claire Seng

April 3, 2018