Well, I started by accessing the dataset in R using data(lynx) and then looking at its basic dimensions using class(), start(), end(), and frequency() commands.
Or, more simply:
data(lynx); trapped <- lynx
class(trapped)
## [1] "ts"
tsp(trapped)
## [1] 1821 1934 1
One can see that ‘lynx’ is a time series with numbers of annual lynx trappings in Canada from 1821-1934. Using the length() command in R, it is evident that the dataset contains 114 values. Makes sense.
Ok, so it is a nice, simple dataset with no missing values. But what about all of the missing lynx?! Glorious though these data may be, the practice of trapping lynx most definitely is not. In 1821, the year that lynx trappin’ data collection began, 269 of these animals were trapped, and by the time that somebody finally grew tired of recording data (and of trapping lynx, let’s hope), they’d trapped a total of 175334 of these poor creatures! Lord only knows what they used these for…fur coats? Anyhow, let’s plot a histogram of the data, throw a normal curve on there, and examine the shape.
## Warning: package 'moments' was built under R version 3.1.3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 39.0 348.2 771.0 1538.0 2567.0 6991.0
This is quite skewed toward the high end, suggesting that the median value of 771 lynx trapped per year is probably more informative than the mean value of 1538. Let’s get a visual to see why this might be the case…
Whoa, interesting! Lots of up and down, possibly with some sort of regular periodicity. However, as the skewness indicates, rather than oscillating cyclicly around some central value, it seems that there is some usual number of lynx trappings, a sort of baseline range, and then every 10 or so years the number of trappings spikes. Assuming that the number of lynx trappings is associated with the size of the lynx population, there must be some sort of ecological explanation for the longer-than-seasonal periodicity of this dataset, perhaps based on longer-term reproduction cycles or resource availability.
I then fit a linear model to the data to start to assess the trend as a function of time (the time-travelin’ trappin’ trend). I began by creating a vector of the times of the observations, in order to then use a linear model (this was already done invisibly in the R Markdown code above) to fit a model of lynx trappings as a function of time:
trapped.time <- time(trapped)
trapped.lm <- lm(trapped ~ trapped.time)
summary(trapped.lm)
##
## Call:
## lm(formula = trapped ~ trapped.time)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1594 -1211 -755 1032 5366
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4630.034 8493.112 -0.545 0.587
## trapped.time 3.285 4.523 0.726 0.469
##
## Residual standard error: 1589 on 112 degrees of freedom
## Multiple R-squared: 0.004689, Adjusted R-squared: -0.004198
## F-statistic: 0.5276 on 1 and 112 DF, p-value: 0.4691
According to the model, over this time period there was an average increase of 3.29 lynx trappings per year.
Next I used the filter() function to smooth things out, because nobody wants to look at sharp pointy peaks on a plot. I even dimmed out the original trace and added a snazzy legend, like so:
ma10 <- filter(x=trapped, filter=rep(x=1/10,times=10), sides=2)
ma5 <- filter(x=trapped, filter=rep(x=1/5,times=5), sides=2)
plot(trapped,col="grey", ylab = "Lynx Trapped", xlab = "Year")
lines(ma10,col="red",lwd=2)
lines(ma5,col="purple",lwd=2)
legend("topright", c(" 5-Year Moving Average", "10-Year Moving Average"), bty = "n", col = c("purple","red"), pch = 15)
abline(trapped.lm, col="black",lwd=2, lty="dashed")
The plot indicates that the 10-year moving average probably isn’t the best way to go. This makes sense, because the periodicity seems to be on the order of about a decade, so if you go 5 years back and 5 years forward from any particular year (as the 10-year ma does) then you run into another cycle. This messes up the average and results in a moving average line that does not track the cycles in the data. Just for fun, and for practice, let’s see what sort of moving average we can get away with without overlapping cycles. It should be somewhere between 5 and 10 years…
It appears that the best cut-off may be around the 7 year mark, indicating that the cycle is 7-8 years long. Anything longer than that and the moving average does not consistently reflect the oscillations. Read on, to see how this manifests itself in the analysis of the autocorrelation structure of this dataset…
Here are the ACF and PACF plots for lynx:
These correlograms indicate that the autocorrelation for lynx trappings is a higher-order process, in line with what I concluded based on the plot of lynx trappings over time. What order is it? Let’s look at the AIC values and the AR values, as calculated by the ar() function, which tries to fit the model as an AR process:
Relying strictly on the AIC values to determine the order, I would conclude that this is an AR(8) process, as the AIC is minimized at lag 8. The plot of the AR values, however, indicates that this is probably not an AR process, but rather an ARMA process, because of the way that the autocorrelation shifts back and forth between negative and positive values.