Revised 2/5/2016 to display code in output file.
The goal of the study is descriptive. While the method of the study uses forecasting techniques, the primary goal was to, “provide a greater understanding…”. In other words, to describe changes in behavior related to the September 11, 2001 attacks.
In this study, the forecast horizon includes the months from September 2001 through July 2004 (the text mentions April 2004, but the actual study produced forecasts through July 2004). In total, the study had a time horizon of 34 periods - months in this context. Next-month forecasts would not be sufficient for this study. The researchers’ approach required that they create a forecast based on their historical data for the entire 34-month forecast window.
This forecasting exercise requires a low level of automation - perhaps no automation at all, though some automation may be desired to improve reproduce-ability. There are only three time series to be forecasted, the forecasts are a one-time event for the purpose of the analysis, there is no specialized software mentioned in the study, and there is a very high level of expertise in the organization performing the analysis.
t1, t2, and t3 in the Air series represent the January, February, and March air travel revenue miles for 1990. t1 specifically represents the forecast for January, 1990.
y1, y2, and y3 are the forecasted values for Air, in January, February, and March of 1990. The values are y(1) = 35,153,577, y(2) = 32,965,187, y(3) = 39,993,913.
I accessed this version of the report in answering these questions: https://www.rita.dot.gov/bts/sites/rita.dot.gov.bts/files/publications/estimated_impacts_of_9_11_on_us_travel/pdf/entire.pdf
setwd("~/Documents/MBA 678/Unit 1")
library(forecast)
library(knitr)
library(ggvis)
study_data <- read.csv("911_study_data.csv", header = TRUE)
#str(study_data)
#creates time series object
air_pre.ts <- ts(study_data$Air..000s., start = c(1990,1), end = c(2001,8), freq = 12)
#Identifies the min and max of the time series' values
yrange = range(air_pre.ts)
#Creates empty plot
plot(c(1990, 2001.8), yrange, type = "n", xlab = "Year", ylab = "Air Revenue Passenger Miles (000's)", bty = "l", xaxt = "n", yaxt = "n")
#plots time series data
lines (air_pre.ts, bty = "l")
#controls axis parameters
axis(1, at=seq(1990,2004,1), labels=format(seq(1990,2004,1)))
axis(2, at=seq(29500,69500,8000), labels=format(seq(29500,69500,8000)), cex.axis=0.5, las=2)
#plot(air_pre.ts, xlab = "Time", ylab = "Air Revenue Passenger Miles (000's)", bty = "l")
rail_pre.ts <- ts(study_data$Rail..000s., start = c(1990,1), end = c(2001,8), freq = 12)
rail_pre_yrange = range(rail_pre.ts)
plot(rail_pre.ts, xlab = "Time", ylab = "Rail Passenger Miles (000's)", bty = "l", xaxt = "n", yaxt = "n")
axis(1, at=seq(1990,2004,1), labels=format(seq(1990,2004,1)))
axis(2, at=seq(325000,665000,60000), labels=format(seq(325000,665000,60000)), cex.axis=0.5, las=2)
auto_pre.ts <- ts(study_data$VMT.Billions, start = c(1990,1), end = c(2001,8), freq = 12)
auto_pre_yrange = range(auto_pre.ts)
plot(auto_pre.ts, xlab = "Time", ylab = "Automobile Passenger Miles (Billions)", bty = "l", xaxt = "n", yaxt = "n")
axis(1, at=seq(1990,2004,1), labels=format(seq(1990,2004,1)))
axis(2, at=seq(150,300,25), labels=format(seq(150,300,25)), cex.axis=0.5, las=2)
air_pre.lm <- tslm(air_pre.ts ~ trend)
#Creates empty plot
plot(c(1990, 2001.8), yrange, type = "n", xlab = "Year", ylab = "Air Revenue Passenger Miles (000's)", bty = "l", xaxt = "n", yaxt = "n")
#plots time series data
lines (air_pre.ts, bty = "l")
#plots best fit linear
lines(air_pre.lm$fitted, lwd = 3)
#controls axis parameters
axis(1, at=seq(1990,2004,1), labels=format(seq(1990,2004,1)))
axis(2, at=seq(29500,69500,8000), labels=format(seq(29500,69500,8000)), cex.axis=0.5, las=2)
In the chart above, we see a linear best-fit of the air travel data. In this case, the linear model fits the data well, and clearly demonstrates the upward slope of the air travel trend.
To better understand the seasonality pattern of the air travel series, it may be helpful to zoom into the data. In this case, I’d like to examine one specific year with a typical seasonality curve: 1998.
air_zoom <- window(air_pre.ts, 1998, c(1998,12))
plot(air_zoom, ylab = "Air Revenue Passenger Miles (000's)", ylim=c(45000, 61000), bty="l")
We find from zooming in that air travel typically peaks in summertime, with lowest periods of travel in winter and moderate values in autumn and spring.
Finally, we’d like to take another look at the air revenue series and attempt to suppress the seasonality we see in the first plot. To accomplish this, we group our monthly observations by year.
air_pre_yearly <- aggregate(air_pre.ts, nfrequency=1, FUN=sum)
plot(air_pre_yearly, type = "b" , ylab = "Annual Air Revenue Passenger Miles (000's)", bty="l")
By grouping (via the aggregate function) our data to the year level, we suppress seasonality and clearly see the near-linear trend in the time series.
Moving onto the rail time series, we find that a linear trend line doesn’t match the plot nearly as well. In this case, we find a better fit with a polynomial trend line:
rail_pre_lin.lm <- tslm(rail_pre.ts ~ trend)
rail_pre_poly.lm <- tslm(rail_pre.ts ~ trend + I(trend^2))
plot(rail_pre.ts, xlab = "Time", ylab = "Rail Passenger Miles (000's)", bty = "l", xaxt = "n", yaxt = "n")
axis(1, at=seq(1990,2004,1), labels=format(seq(1990,2004,1)))
axis(2, at=seq(325000,665000,60000), labels=format(seq(325000,665000,60000)), cex.axis=0.5, las=2)
lines(rail_pre_lin.lm$fitted, lty = 2, lwd = 1)
lines(rail_pre_poly.lm$fitted, lty = 1, lwd = 3)
With the rail data, we see that a quadratic trend line fits the data better than a linear trend line - in other words, around 1997, a decreasing trend in rail ridership reversed direction, and ridership began to increase.
The seasonality seems to have changed in the rail data too, with a different seasonal pattern from 1990-1995 and from 1995 forward. In this time series, let’s take a more detailed view at the year 2000:
rail_zoom <- window(rail_pre.ts, 2000, c(2000,12))
plot(rail_zoom, ylab = "Rail Passenger Miles (000's)", ylim=c(350000, 585000), bty="l")
With this more-detailed view of rail’s annual pattern, we see a similar pattern to the air travel seasonality - highest amounts of travel in the summer months, with winter months having the lowest values, with spring and autumn falling in the middle.
If we take the same approach to suppressing seasonality as with air, we find an annual series with the following shape:
rail_pre_yearly <- aggregate(rail_pre.ts, nfrequency=1, FUN=sum)
plot(rail_pre_yearly, type = "b" , ylab = "Annual Rail Passenger Miles (000's)", bty="l")
This plot indeed suppresses seasonality and helps to see the change in trend indicated by our trend line. This view confirms that rail travel ‘bottomed out’ in 1996 and began to increase for 1997.
Finally, let’s experiment with changing the scale for this time series to logarithmic:
plot(rail_pre.ts, log="y", xlab="Time", ylab="Ridership (log scale)", bty="l")
As noted in the lecture, in this case we see almost no difference from the original plot.
We can repeat the same exercise for the automobile travel time series, which behaves quite similarly to the air travel one.:
auto_pre_lin.lm <- tslm(auto_pre.ts ~ trend)
#We exclude the polynomial version because the trend is very linear.
#rail_pre_poly.lm <- tslm(rail_pre.ts ~ trend + I(trend^2))
plot(auto_pre.ts, xlab = "Time", ylab = "Auto Passenger Miles (Billions)", bty = "l", xaxt = "n", yaxt = "n")
axis(1, at=seq(1990,2004,1), labels=format(seq(1990,2004,1)))
axis(2, at=seq(29500,69500,8000), labels=format(seq(29500,69500,8000)), cex.axis=0.5, las=2)
lines(auto_pre_lin.lm$fitted, lty = 1, lwd = 3)
auto_zoom <- window(auto_pre.ts, 2000, c(2000,12))
plot(auto_zoom, ylab = "Auto Passenger Miles (Billions)", ylim=c(195, 250), bty="l")
auto_pre_yearly <- aggregate(auto_pre.ts, nfrequency=1, FUN=sum)
plot(auto_pre_yearly, type = "b" , ylab = "Annual Auto Passenger Miles (Billions)", bty="l")
appliance_data <-read.csv("Appliance_Shipments.csv", header = TRUE)
#creates time series object
appliance.ts <- ts(appliance_data$Shipments, start = c(1985,1), end = c(1989,4), freq = 4)
#Identifies the min and max of the time series' values
yrange_appliance = range(appliance.ts)
#Creates empty plot
plot(c(1985, 1990), yrange_appliance, type = "n", xlab = "Year", ylab = "Appliance Shipments", bty = "l", xaxt = "n", yaxt = "n")
#plots time series data
lines (appliance.ts, bty = "l")
#controls axis parameters
axis(1, at=seq(1985,1990,1), labels=format(seq(1985,1990,1)))
axis(2, at=seq(3940, 4900, 160), labels=format(seq(3940, 4900, 160)), cex.axis=0.5, las=2)
#Create trendline
appliance_lin.lm <- tslm(appliance.ts ~ trend)
#Plot with trendline
#Creates empty plot
plot(c(1985, 1990), yrange_appliance, type = "n", xlab = "Year", ylab = "Appliance Shipments", bty = "l", xaxt = "n", yaxt = "n")
#plots time series data
lines (appliance.ts, bty = "l")
lines(appliance_lin.lm$fitted, lty = 2, lwd = 2)
#controls axis parameters
axis(1, at=seq(1985,1990,1), labels=format(seq(1985,1990,1)))
axis(2, at=seq(3940, 4900, 160), labels=format(seq(3940, 4900, 160)), cex.axis=0.5, las=2)
Though not a great fit, the linear trend line shows almost no slope. I would conclude trend is not present in this time series.
Seasonality is also inconsistent in this time series. To help see if there is any consistency, we’ll zoom in on two separate years of data:
appliance_zoom1 <- window(appliance.ts, 1985, c(1985,4))
plot(appliance_zoom1, ylab = "Appliance Shipments", bty="l")
appliance_zoom2 <- window(appliance.ts, 1988, c(1988,4))
plot(appliance_zoom2, ylab = "Appliance Shipments", bty="l")
These two very different annual curves support the conclusion that the base time series demonstrates lack of seasonality. This time series exhibits a high impact from noise. We can hypothesize that perhaps external events disrupted an underlying seasonality or trend, but this would be difficult to support without additional domain knowledge. In this case, the noise to signal ratio seems fairly high, making future forecasting more of a challenge given these base data.
shampoo_data <-read.csv("Shampoo.csv", header = TRUE)
#creates time series object
shampoo.ts <- ts(shampoo_data$Shampoo.Sales, start = c(1995,1), end = c(1997,12), freq = 12)
#Identifies the min and max of the time series' values
yrange_shampoo = range(shampoo.ts)
#Creates empty plot
plot(c(1995, 1998), yrange_shampoo, type = "n", xlab = "Year", ylab = "Shampoo Sales", bty = "l", xaxt = "n", yaxt = "n")
#plots time series data
lines (shampoo.ts, bty = "l")
#controls axis parameters
axis(1, at=seq(1995,1998,1), labels=format(seq(1995,1998,1)))
axis(2, at=seq(100, 700, 100), labels=format(seq(100, 700, 100)), cex.axis=0.5, las=2)
shampoo_quad.lm <- tslm(shampoo.ts ~ trend + I(trend^2))
yrange_shampoo = range(shampoo.ts)
#Creates empty plot
plot(c(1995, 1998), yrange_shampoo, type = "n", xlab = "Year", ylab = "Shampoo Sales", bty = "l", xaxt = "n", yaxt = "n")
#plots time series data and trendline
lines (shampoo.ts, bty = "l")
lines(shampoo_quad.lm$fitted, lty = 1, lwd = 3)
#controls axis parameters
axis(1, at=seq(1995,1998,1), labels=format(seq(1995,1998,1)))
axis(2, at=seq(100, 700, 100), labels=format(seq(100, 700, 100)), cex.axis=0.5, las=2)
In this case, the quadratic trend line does seem to match the time series quite well.
Seasonality is less obvious. Let’s compare the data from 1995 through 1997 in more detail:
shampoo_zoom1 <- window(shampoo.ts, 1995, c(1995,12))
plot(shampoo_zoom1, ylab = "Shampoo Sales", ylim=c(100, 400), bty="l")
shampoo_zoom2 <- window(shampoo.ts, 1996, c(1996,12))
plot(shampoo_zoom2, ylab = "Shampoo Sales", ylim=c(150, 450), bty="l")
shampoo_zoom3 <- window(shampoo.ts, 1997, c(1997,12))
plot(shampoo_zoom3, ylab = "Shampoo Sales", ylim=c(300, 700), bty="l")
After considering all three years individually, a clear seasonality is still difficult to see. Let’s try one more approach - grouping the data from month to quarter:
shampoo_quarterly <- aggregate(shampoo.ts, nfrequency=4, FUN=sum)
plot(shampoo_quarterly, type = "b" , ylab = "Quarterly Shampoo Sales", bty="l")
By grouping to quarters, we suppress some of the noise in the monthly time series. The quarterly time series view still does not offer an obvious sign of seasonality, so this student concludes seasonality is not present in the shampoo sales data.