library(forecast)
library(ggplot2)
library(xtable)
library(knitr)
sep11_df <- read.csv("Sept11Travel.csv")
air_ts <- ts(sep11_df$Air, start = c(1990, 1),end = c(2001, 8), freq = 12)
rail_ts <- ts(sep11_df$Rail, start = c(1990, 1),end = c(2001, 8), freq = 12)
vmt_ts <- ts(sep11_df$VMT, start = c(1990, 1),end = c(2001, 8), freq = 12)
air_ts <- air_ts / 1000000
rail_ts <- rail_ts /1000000The data provided gave us monthly totals of the miles travelled across 3 major modes of transportation: air, rail, and vehicle. We’re going to be looking at each of these modes seperately to investigate which time series aspects are present in each one. Specifically, we’re concerned with the presence of trends, seasonality, and levels.
plot(air_ts, xlab = "Year", main = "Total Monthly Passenger Air Miles 1990-2001",
ylab = "Passenger Air Miles (billions)", yaxt = "n", bty = "l")
ytm <- pretty(range(air_ts))
axis(2, at = ytm, labels = ytm)a_plot <- recordPlot()Immediately we can see that passenger air miles are trending higher. Moreover, the 9-11 terrorism attack can be spotted by the severe dip in air travel that occured within that year. In fact, if the attack had happened months earlier in the summer, the data would show an even more drastic rejection of the trend. With autumn being a time of reduced air travel already, we can see that the overall trend was preserved through the catastrophe and beyond.
We may be able to spot a trend by eye but we still have much to investigate to ensure we’re getting the full story. We’re going to employ a few techniques to seperate seasonality from the overall trend of the data.
air_yr <- aggregate(air_ts, nfrequency = 1, FUN = sum)
plot(air_yr, xlab = "Year", main = "Total Annual Passenger Air Miles 1990-2001",
ylab = "Passenger Air Miles (billions)", yaxt = "n", bty = "l")Above the data has been aggregated annually to avoid seasonal fluctuations. We see reinforced evidence of the upward trend.
This plot seperates out the passenger miles travelled each month by the year they occured in. This is useful in showing that every year passenger miles are increasing while monthly trends are so consistent that there is hardly an overlap (significant month fluxuations Y/o/Y) within the dataset we’ve been provided.
ggseasonplot(air_ts, continuous = TRUE, year.labels = FALSE) + ggtitle("1990-2001 Passenger Air Miles: Annual Seasonality")ggsubseriesplot(air_ts) + ggtitle("1990-2001 Passenger Air Miles: Annual Seasonality")The chart abovve provides a similar, but also insightful, view on seasonality. Each line represents the changes Y/o/Y within that monthly period. For example, February experienced a reduction in travel in 1991 (demonstrated by the downward slope of the second series) before increasing consistently Y/o/Y. From this view we can see that our visual perception of an upward trend is most likely seperate from any seasonal phenomena that may be occurring.
By plotting a linear model to the series we can see that we have a signficant upward trend (p value on the below table confirms this) that is independent of any seasonal factors.
air_lm <- tslm(air_ts ~ trend)
a_plot
print(xtable(air_lm), type = "html")| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 35.7284 | 0.8347 | 42.80 | 0.0000 |
| trend | 0.1771 | 0.0103 | 17.24 | 0.0000 |
lines(air_lm$fitted.values, lwd = 3)plot(rail_ts, xlab = "Year", main = "Total Monthly Passenger Rail Miles 1990-2001",
ylab = "Passenger Rail Miles (millions)", yaxt = "n", bty = "l")
ytm <- pretty(range(rail_ts))
axis(2, at = ytm, labels = ytm)r_plot <- recordPlot()Unlike air miles, rail travel doesn’t immediately signal any clear trends. However, the raw plot is useful in identifying that there may be some seasonality present in the data.
Just like before we’re going to aggregate the rail data by year to eliminate the impact of any seasonal trends that may exists.
rail_yr <- aggregate(rail_ts, nfrequency = 1, FUN = sum)
plot(rail_yr, xlab = "Year", main = "Total Annual Passenger Rail Miles 1990-2001",
ylab = "Passenger Rail Miles (millions)", yaxt = "n", bty = "l")This view of the data reveals a local trend within the data. Between 1990 and 1996 rail traffic experienced a significant decline. The second half of the series however shows rail miles establishing a new base level that might be looking to establish a new, somewhat muted, upward trend.
Again the seasonal chart shows very little Y/o/Y crossover in monthly fluxuations but now we see the opposite pattern that we did in air travel. Did you notice that the more current years are all beneath the older ones? This reinforces the argument that rail traffic experienced a significant decrease in passenger miles far outside the bounds of simple seasonality.
ggseasonplot(rail_ts, continuous = TRUE, year.labels = FALSE) + ggtitle("1990-2001 Passenger Rail Miles: Annual Seasonality")ggsubseriesplot(rail_ts) + ggtitle("1990-2001 Passenger Rail Miles: Annual Seasonality")The one insight that the monthly seasonality view gives us is the perceivable turnaround point at which rail traffic began increasing. It was clearly free of seasonal influence as demonstrated by the sharp direction reversal of every series in the chart.
This series never seemed to fit a linear model just based on its ebb and flow visible to the naked eye. That doesn’t mean we can’t prescribe some type of forecast trend. By utilizing a polynomial function within our model, we can observe that rail traffic is following a statistically significant trend (p value < .01) which may lead to future increases.
rail_lm <- tslm(rail_ts ~ trend + I(trend^2))
r_plot
print(xtable(rail_lm), type = "html")| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 576.8287 | 15.6184 | 36.93 | 0.0000 |
| trend | -2.0714 | 0.5114 | -4.05 | 0.0001 |
| I(trend^2) | 0.0087 | 0.0035 | 2.49 | 0.0140 |
lines(rail_lm$fitted.values, lwd = 3)plot(vmt_ts, xlab = "Year", main = "Total Monthly Passenger Vehicle Miles 1990-2001",
ylab = "Passenger Vehicle Miles (billions)", yaxt = "n", bty = "l")
ytm <- pretty(range(vmt_ts))
axis(2, at = ytm, labels = ytm)v_plot <- recordPlot()Much like passenger air miles we can eyeball that some type of positive trend exists in the time series.From the peaks and valleys it would appear that a large degree of seasonality exists as well.
Let’s take a look at a yearly aggregated model of vehicle miles.
vmt_yr <- aggregate(vmt_ts, nfrequency = 1, FUN = sum)
plot(vmt_yr, xlab = "Year", main = "Total Annual Passenger Vehicle Miles 1990-2001",
ylab = "Passenger Vehicle Miles (billions)", yaxt = "n", bty = "l")It’s quite clear that vehicle miles are exhibiting a positive growth trend and are incredibly consistent in Y/o/Y growth.
To ensure we’ve done our due diligence and accounted for any seasonal trends we’ll review the same perspectives we saw with air and rail passenger miles.
Again very little overlap, but consistent growth with very defined peaks and valleys throughout the year.
ggseasonplot(vmt_ts, continuous = TRUE, year.labels = FALSE) + ggtitle("1990-2001 Passenger Vehicle Miles: Annual Seasonality")ggsubseriesplot(vmt_ts) + ggtitle("1990-2001 Passenger Vehicle Miles: Annual Seasonality")Below we see that the monthly perspective reveals the same information with the added bonus of learning that January seems to be the most unpredictable month of vehicle travel within the dataset we are given.
Finally we pair a linear model with the vehicle miles time series and, not surprisingly, the trend is significant (p value < .01).
vmt_lm <- tslm(vmt_ts ~ trend)
v_plot
print(xtable(vmt_lm), type = "html")| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 173.1714 | 2.4209 | 71.53 | 0.0000 |
| trend | 0.4425 | 0.0298 | 14.85 | 0.0000 |
lines(vmt_lm$fitted.values, lwd = 3)app_df <- read.csv("ApplianceShipments.csv")
app_ts <- ts(app_df$Shipments, start = c(1985,1), end = c(1989,4), freq = 4)
app_lm <- tslm(app_ts ~ trend)
plot(app_ts, bty = "l", ylab = "Appliance Shipments", xlab = "Year", main = "Monthly Appliance Shipments 1985-1989")app_plot <- recordPlot()The appliance shipment data provided is already known to include noise and level (these are elements common to any time series).
Below we can see from our previous tests of seasonality that appliance sales do not seem to follow any conventional pattern of peaks and valleys. However, we can observe that quarter 2 seems to represent the peak sales period of the year , followed by a steady decline through December.
ggseasonplot(app_ts) + ggtitle("1995-1997 Shampoo Sales: Annual Seasonality")ggsubseriesplot(app_ts) + labs(y = "Appliance Shipments")While not as strong as our previous cases, fitting a linear model to the appliance sales still proves significant (p < .05). Which goes to show that sometimes the naked eye is not immune to missing trends that exist right before our very eyes.
app_plot
lines(app_lm$fitted.value)print(xtable(app_lm), type = "html")| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 4167.6632 | 111.0352 | 37.53 | 0.0000 |
| trend | 24.4940 | 9.2690 | 2.64 | 0.0165 |
shampoo_df <- read.csv("ShampooSales.csv")
shampoo_ts <- ts(shampoo_df$Shampoo.Sales, start = c(1995,1), end = c(1997,12), freq = 12)
plot(shampoo_ts, xlab = "Year", main = "Monthly Shampoo Sales 1995-1997",
ylab = "Shampoo Sales", bty = "l")s_plot <- recordPlot()Eyeballing the time series we see a distinct upward trend that could be either linear or polynomial at its most significant. As always level and noise are present within the data but what we’re not certain of is the presence of seasonal changes.
Below we’ll use two different plots to analyze the three years composing the data.
ggseasonplot(shampoo_ts) + ggtitle("1995-1997 Shampoo Sales: Annual Seasonality")ggseasonplot(shampoo_ts, polar = TRUE) + ggtitle("1995-1997 Shampoo Sales: Annual Seasonality")The polar representation of monthly sales seems to indicate that sales in the second half of the year are at a consistently higher level than the previous half.
Lets explore this further by viewing each month as a seperate series.
ggsubseriesplot(shampoo_ts)shampoo_qtr <- aggregate(shampoo_ts, nfrequency = 4, FUN = sum)With only 3 years to work with, this representation doesn’t seem to lend much to an argument of seasonality. It may be worth while to zoom out and view the data at another aggregate level.
Below we’ve grouped the time series by quarter, again we can see a distinct boost to sales in the second half of the year. While this type of seasonality may just be a result of noise, we’ll acknowledge that there may be a consistent boost in shampoo sales in the 3rd and 4th quarters of the year.
In reality though, shampoo is a staple product and I wouldn’t expect large degrees of seasonality simply based on its use throughout the year by the population at large. Certain factors may lead to increases and decreases in sales but the overall trend would seem more related to population changes rather than seasons.
ggsubseriesplot(shampoo_qtr) + geom_vline(aes(xintercept = 3),colour="#990000", linetype="dashed") + labs(y = "Shampoo Sales")Finally we are ready to plot the trend. For this time series the best fit line is polynomial and is statistically significant (p < .01).
shampoo_lm <- tslm(shampoo_ts ~ trend + I(trend^2))
s_plot
lines(shampoo_lm$fitted.values, lwd = 2)print(xtable(shampoo_lm), type = "html")| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 202.8789 | 33.3002 | 6.09 | 0.0000 |
| trend | -5.8801 | 4.1501 | -1.42 | 0.1659 |
| I(trend^2) | 0.4854 | 0.1088 | 4.46 | 0.0001 |