MBA 678 Assignment 1

Chapter 2 Problems

Air, Rail, and Auto Travel

Plot each of the three pre-event time series (Air, Rail, Car). In these plots, we carry the series forward to August, 2001:

setwd("~/Documents/MBA 678/Unit 1")
library(forecast)
library(knitr)
library(ggvis)

study_data <- read.csv("911_study_data.csv", header = TRUE)
#str(study_data)
#creates time series object
air_pre.ts <- ts(study_data$Air..000s., start = c(1990,1), end = c(2001,8), freq = 12)
#Identifies the min and max of the time series' values
yrange = range(air_pre.ts)
#Creates empty plot
plot(c(1990, 2001.8), yrange, type = "n", xlab = "Year", ylab = "Air Revenue Passenger Miles (000's)", bty = "l", xaxt = "n", yaxt = "n") 
#plots time series data
lines (air_pre.ts, bty = "l")
#controls axis parameters
axis(1, at=seq(1990,2004,1), labels=format(seq(1990,2004,1)))
axis(2, at=seq(29500,69500,8000), labels=format(seq(29500,69500,8000)), cex.axis=0.5, las=2)

#plot(air_pre.ts, xlab = "Time", ylab = "Air Revenue Passenger Miles (000's)", bty = "l")

rail_pre.ts <- ts(study_data$Rail..000s., start = c(1990,1), end = c(2001,8), freq = 12)
rail_pre_yrange = range(rail_pre.ts)
plot(rail_pre.ts, xlab = "Time", ylab = "Rail Passenger Miles (000's)", bty = "l", xaxt = "n", yaxt = "n")
axis(1, at=seq(1990,2004,1), labels=format(seq(1990,2004,1)))
axis(2, at=seq(325000,665000,60000), labels=format(seq(325000,665000,60000)), cex.axis=0.5, las=2)

auto_pre.ts <- ts(study_data$VMT.Billions, start = c(1990,1), end = c(2001,8), freq = 12)
auto_pre_yrange = range(auto_pre.ts)
plot(auto_pre.ts, xlab = "Time", ylab = "Automobile Passenger Miles (Billions)", bty = "l", xaxt = "n", yaxt = "n")
axis(1, at=seq(1990,2004,1), labels=format(seq(1990,2004,1)))
axis(2, at=seq(150,300,25), labels=format(seq(150,300,25)), cex.axis=0.5, las=2)

These plots demonstrate all three time series components: Level, Trend, and Seasonality.
We see fairly pronounced seasonality with all three time series, but different trend shapes at play:

air_pre.lm <- tslm(air_pre.ts ~ trend)
#Creates empty plot
plot(c(1990, 2001.8), yrange, type = "n", xlab = "Year", ylab = "Air Revenue Passenger Miles (000's)", bty = "l", xaxt = "n", yaxt = "n") 
#plots time series data
lines (air_pre.ts, bty = "l")
#plots best fit linear
lines(air_pre.lm$fitted, lwd = 3)
#controls axis parameters
axis(1, at=seq(1990,2004,1), labels=format(seq(1990,2004,1)))
axis(2, at=seq(29500,69500,8000), labels=format(seq(29500,69500,8000)), cex.axis=0.5, las=2)

In the chart above, we see a linear best-fit of the air travel data. In this case, the linear model fits the data well, and clearly demonstrates the upward slope of the air travel trend.

To better understand the seasonality pattern of the air travel series, it may be helpful to zoom into the data. In this case, I’d like to examine one specific year with a typical seasonality curve: 1998.

air_zoom <- window(air_pre.ts, 1998, c(1998,12))
plot(air_zoom, ylab = "Air Revenue Passenger Miles (000's)", ylim=c(45000, 61000), bty="l")

We find from zooming in that air travel typically peaks in summertime, with lowest periods of travel in winter and moderate values in autumn and spring.

Finally, we’d like to take another look at the air revenue series and attempt to suppress the seasonality we see in the first plot. To accomplish this, we group our monthly observations by year.

air_pre_yearly <- aggregate(air_pre.ts, nfrequency=1, FUN=sum)
plot(air_pre_yearly, type = "b" , ylab = "Annual Air Revenue Passenger Miles (000's)", bty="l")

By grouping (via the aggregate function) our data to the year level, we suppress seasonality and clearly see the near-linear trend in the time series.

Moving onto the rail time series, we find that a linear trend line doesn’t match the plot nearly as well. In this case, we find a better fit with a polynomial trend line:

rail_pre_lin.lm <- tslm(rail_pre.ts ~ trend)
rail_pre_poly.lm <- tslm(rail_pre.ts ~ trend + I(trend^2))
plot(rail_pre.ts, xlab = "Time", ylab = "Rail Passenger Miles (000's)", bty = "l", xaxt = "n", yaxt = "n")
axis(1, at=seq(1990,2004,1), labels=format(seq(1990,2004,1)))
axis(2, at=seq(325000,665000,60000), labels=format(seq(325000,665000,60000)), cex.axis=0.5, las=2)
lines(rail_pre_lin.lm$fitted, lty = 2, lwd = 1)
lines(rail_pre_poly.lm$fitted, lty = 1, lwd = 3)

With the rail data, we see that a quadratic trend line fits the data better than a linear trend line - in other words, around 1997, a decreasing trend in rail ridership reversed direction, and ridership began to increase.

The seasonality seems to have changed in the rail data too, with a different seasonal pattern from 1990-1995 and from 1995 forward. In this time series, let’s take a more detailed view at the year 2000:

rail_zoom <- window(rail_pre.ts, 2000, c(2000,12))
plot(rail_zoom, ylab = "Rail Passenger Miles (000's)", ylim=c(350000, 585000), bty="l")

With this more-detailed view of rail’s annual pattern, we see a similar pattern to the air travel seasonality - highest amounts of travel in the summer months, with winter months having the lowest values, with spring and autumn falling in the middle.

If we take the same approach to suppressing seasonality as with air, we find an annual series with the following shape:

rail_pre_yearly <- aggregate(rail_pre.ts, nfrequency=1, FUN=sum)
plot(rail_pre_yearly, type = "b" , ylab = "Annual Rail Passenger Miles (000's)", bty="l")

This plot indeed suppresses seasonality and helps to see the change in trend indicated by our trend line. This view confirms that rail travel ‘bottomed out’ in 1996 and began to increase for 1997.

Finally, let’s experiment with changing the scale for this time series to logarithmic:

plot(rail_pre.ts, log="y", xlab="Time", ylab="Ridership (log scale)", bty="l")

As noted in the lecture, in this case we see almost no difference from the original plot.

We can repeat the same exercise for the automobile travel time series, which behaves quite similarly to the air travel one.:

auto_pre_lin.lm <- tslm(auto_pre.ts ~ trend)
#We exclude the polynomial version because the trend is very linear.
#rail_pre_poly.lm <- tslm(rail_pre.ts ~ trend + I(trend^2))
plot(auto_pre.ts, xlab = "Time", ylab = "Auto Passenger Miles (Billions)", bty = "l", xaxt = "n", yaxt = "n")
axis(1, at=seq(1990,2004,1), labels=format(seq(1990,2004,1)))
axis(2, at=seq(29500,69500,8000), labels=format(seq(29500,69500,8000)), cex.axis=0.5, las=2)
lines(auto_pre_lin.lm$fitted, lty = 1, lwd = 3)

auto_zoom <- window(auto_pre.ts, 2000, c(2000,12))
plot(auto_zoom, ylab = "Auto Passenger Miles (Billions)", ylim=c(195, 250), bty="l")

auto_pre_yearly <- aggregate(auto_pre.ts, nfrequency=1, FUN=sum)
plot(auto_pre_yearly, type = "b" , ylab = "Annual Auto Passenger Miles (Billions)", bty="l")

3. Shipments of household appliances:

Time plot of the appliance data:

appliance_data <-read.csv("Appliance_Shipments.csv", header = TRUE)
#creates time series object
appliance.ts <- ts(appliance_data$Shipments, start = c(1985,1), end = c(1989,4), freq = 4)
#Identifies the min and max of the time series' values
yrange_appliance = range(appliance.ts)
#Creates empty plot
plot(c(1985, 1990), yrange_appliance, type = "n", xlab = "Year", ylab = "Appliance Shipments", bty = "l", xaxt = "n", yaxt = "n") 
#plots time series data
lines (appliance.ts, bty = "l")
#controls axis parameters
axis(1, at=seq(1985,1990,1), labels=format(seq(1985,1990,1)))
axis(2, at=seq(3940, 4900, 160), labels=format(seq(3940, 4900, 160)), cex.axis=0.5, las=2)

Level: All time series’ have level. In this case, the level seems to be about 4400.
Trend: Trend is a bit harder to discern in this series. Eyeballing the data, an obvious trend is not apparent, though there may be a slight downward lean. A linear trend line might help to identify a trend:

#Create trendline
appliance_lin.lm <- tslm(appliance.ts ~ trend)
#Plot with trendline
#Creates empty plot
plot(c(1985, 1990), yrange_appliance, type = "n", xlab = "Year", ylab = "Appliance Shipments", bty = "l", xaxt = "n", yaxt = "n") 
#plots time series data
lines (appliance.ts, bty = "l")
lines(appliance_lin.lm$fitted, lty = 2, lwd = 2)
#controls axis parameters
axis(1, at=seq(1985,1990,1), labels=format(seq(1985,1990,1)))
axis(2, at=seq(3940, 4900, 160), labels=format(seq(3940, 4900, 160)), cex.axis=0.5, las=2)

Though not a great fit, the linear trend line shows almost no slope. I would conclude trend is not present in this time series.
Seasonality is also inconsistent in this time series. To help see if there is any consistency, we’ll zoom in on two separate years of data:

appliance_zoom1 <- window(appliance.ts, 1985, c(1985,4))
plot(appliance_zoom1, ylab = "Appliance Shipments", bty="l")

appliance_zoom2 <- window(appliance.ts, 1988, c(1988,4))
plot(appliance_zoom2, ylab = "Appliance Shipments", bty="l")

These two very different annual curves support the conclusion that the base time series demonstrates lack of seasonality. This time series exhibits a high impact from noise. We can hypothesize that perhaps external events disrupted an underlying seasonality or trend, but this would be difficult to support without additional domain knowledge. In this case, the noise to signal ratio seems fairly high, making future forecasting more of a challenge given these base data.

6. Shampoo Sales:

Create a well-formatted time plot of the data:

shampoo_data <-read.csv("Shampoo.csv", header = TRUE)
#creates time series object
shampoo.ts <- ts(shampoo_data$Shampoo.Sales, start = c(1995,1), end = c(1997,12), freq = 12)
#Identifies the min and max of the time series' values
yrange_shampoo = range(shampoo.ts)
#Creates empty plot
plot(c(1995, 1998), yrange_shampoo, type = "n", xlab = "Year", ylab = "Shampoo Sales", bty = "l", xaxt = "n", yaxt = "n") 
#plots time series data
lines (shampoo.ts, bty = "l")
#controls axis parameters
axis(1, at=seq(1995,1998,1), labels=format(seq(1995,1998,1)))
axis(2, at=seq(100, 700, 100), labels=format(seq(100, 700, 100)), cex.axis=0.5, las=2)

Level is present, as with all time series data. In this case, the level is approximately 300.
Trend is also present in the data. In this case, there seems to be an inflection point in late 1995, with the trend transitioning from nearly flat to clearly positive. Let’s see if a quadratic trend line supports this ‘eyeball’ conclusion:

shampoo_quad.lm <- tslm(shampoo.ts ~ trend + I(trend^2))
yrange_shampoo = range(shampoo.ts)
#Creates empty plot
plot(c(1995, 1998), yrange_shampoo, type = "n", xlab = "Year", ylab = "Shampoo Sales", bty = "l", xaxt = "n", yaxt = "n") 
#plots time series data and trendline
lines (shampoo.ts, bty = "l")
lines(shampoo_quad.lm$fitted, lty = 1, lwd = 3)
#controls axis parameters
axis(1, at=seq(1995,1998,1), labels=format(seq(1995,1998,1)))
axis(2, at=seq(100, 700, 100), labels=format(seq(100, 700, 100)), cex.axis=0.5, las=2)

In this case, the quadratic trend line does seem to match the time series quite well.
Seasonality is less obvious. Let’s compare the data from 1995 through 1997 in more detail:

shampoo_zoom1 <- window(shampoo.ts, 1995, c(1995,12))
plot(shampoo_zoom1, ylab = "Shampoo Sales", ylim=c(100, 400), bty="l")

shampoo_zoom2 <- window(shampoo.ts, 1996, c(1996,12))
plot(shampoo_zoom2, ylab = "Shampoo Sales", ylim=c(150, 450), bty="l")

shampoo_zoom3 <- window(shampoo.ts, 1997, c(1997,12))
plot(shampoo_zoom3, ylab = "Shampoo Sales", ylim=c(300, 700), bty="l")

After considering all three years individually, a clear seasonality is still difficult to see. Let’s try one more approach - grouping the data from month to quarter:

shampoo_quarterly <- aggregate(shampoo.ts, nfrequency=4, FUN=sum)
plot(shampoo_quarterly, type = "b" , ylab = "Quarterly Shampoo Sales", bty="l")

By grouping to quarters, we suppress some of the noise in the monthly time series. The quarterly time series view still does not offer an obvious sign of seasonality, so this student concludes seasonality is not present in the shampoo sales data.

I don’t expect to find seasonal changes in shampoo sales. Nothing in my experience would suggest a seasonal pattern with shampoo sales - since we can assume shampoo sales are closely correlated with shampoo use. For most people, their frequency of shampoo use does not change with the time of year. Instead, we could expect the trend in shampoo sales might be more strongly related with the population of shampoo users rather than a seasonal pattern.

MBA 678 Assignment 1

J. McHenry

1/26/2017

Chapter 1 Problems

Chapter 2 Problems

Air, Rail, and Auto Travel

3. Shipments of household appliances:

6. Shampoo Sales: