Assignment

Due Monday, January 29th, 2018 at 11:59PM: Problems 1, 3, and 6 from Chapter 2 of Shmueli




Impact of September 11 on Air Travel in the United States

The Research and Innovative Technology Administration’s Bureau of Transportation Statistics (BTS) conducted a study to evaluate the impact of the September 11, 2001, terrorist attack on U.S. transportation … The report analyzes monthly passenger movement data between January 1990 and April 2004. Data on three monthly time series are given in the file Sept11Travel.xls for this period: (1) actual airline revenue passenger miles (Air), (2) rail passenger miles (Rail), and (3) vehicle miles traveled (Auto).

In order to assess the impact of September 11, BTS took the following approach: Using data before September 11, it forecasted future data (under the assumption of no terrorist attack). Then, BTS compared the forecasted series with the actual data to assess the impact of the event.

Plot each of the three pre-event time series (Air, Rail, Car).

Beginning with the Sept11Travel.csv data file per the textbook, Practical Time Series Forecasting with R, I cleaned up some of the data results by converting both Rail and Air miles to reflect miles per million (mm), doing so allowed for an easier read of the time series data along the vertical axis. However, it is important to note that Car miles were NOT converted. Along with scaling the points of Air and Rail miles traveled, I shortened the data window to reflect January 1990 to August 2001 (pre-September 11th).

Table 1.1 - Time Series Rail (mm), Air (mm), and Car Miles Traveled by Month from 1990 to 2004
#cleaning up the Sept11Travel.csv file before converting to a time series
#mutate the Rail and Air miles to per 1,000,000 miles

Travel <- read.csv("Sept11Travel.csv")
Travel <- Travel %>%
  mutate(Rail_Miles = ((round(((Rail/1000000)), digits=1)))) %>%
  mutate(Air_Miles = ((round(((Air/1000000)), digits=1))))

Travel <- select(Travel, "Month", "Rail_Miles", "Air_Miles", "VMT")
colnames(Travel) <- c("Month", "Rail Miles (mm)", "Air Miles (mm)", "Car Miles")
Travel <- ts(Travel[,2:4], start = 1990, frequency = 12)

#plot the three (Rail, Air, and Car) charts indicating miles traveled
autoplot(Travel, facets = TRUE, xlab="Time", ylab="Miles Traveled", main="Miles Travelled by Month from 1990 to 2004")

Table 1.2 - Pre-Event Time Series Rail Miles (mm) Traveled by Month from Jan. 1990 to Aug. 2001
Rail_Miles <- ts(Travel[,1], start = c(1990, 1), end = c(2001, 8), frequency = 12)
ggseasonplot(Rail_Miles, xlab="Month", ylab="Miles Traveled (mm)", main="ggseasonplot: Rail Miles Traveled (mm) by Month from Jan. 1990 to Aug. 2001")

ggsubseriesplot(Rail_Miles, xlab="Month", ylab="Miles Traveled (mm)", main="ggsubseriesplot: Rail Miles Traveled (mm) by Month from Jan. 1990 to Aug. 2001")

Table 1.3 - Pre-Event Time Series Air Miles (mm) Traveled by Month from Jan. 1990 to Aug. 2001
Air_Miles <- ts(Travel[,2], start = c(1990, 1), end = c(2001, 8), frequency = 12)
ggseasonplot(Air_Miles, xlab="Month", ylab="Miles Traveled (mm)", main="ggseasonplot: Air Miles Traveled (mm) by Month from Jan. 1990 to Aug. 2001")

ggsubseriesplot(Air_Miles, xlab="Month", ylab="Miles Traveled (mm)", main="ggsubseriesplot: Air Miles Traveled (mm) by Month from Jan. 1990 to Aug. 2001")

Table 1.4 - Pre-Event Time Series Car Miles Traveled by Month from Jan. 1990 to Aug. 2001
Car_Miles <- ts(Travel[,3], start = c(1990, 1), end = c(2001, 8), frequency = 12)
ggseasonplot(Car_Miles, xlab="Month", ylab="Miles Traveled", main="ggseasonplot: Car Miles Traveled by Month from Jan. 1990 to Aug. 2001")

ggsubseriesplot(Car_Miles, xlab="Month", ylab="Miles Traveled", main="ggsubseriesplot: Car Miles Traveled by Month from Jan. 1990 to Aug. 2001")

(a) What time series components appear from the plot?

As mentioned by Galit Shmueli and Kenneth C. Lichtendahl Jr., Time series components include:

  • Level - describes the average value of the series
  • Trend - is the change in the series from one period to the next
  • Seasonality - describes a short-term cyclical behavior that can be observed several times within the given series
  • Noise - is the non-systematic part of a time series

Based on the pre-event time series plots above, there exists a seasonality trend to travel — whether by rail, air, or car. We see from the ggsubseriesplots that relatively consistent levels exist and that there is a significant indication that not only is there a seasonality factor at play but that there is an overall trend in the growing number of people traveling.



Shipments of Household Appliances

The file ApplianceShipments.xls contains the series of quarterly shipments (in millions of USD) of U.S. household appliances between 1985-1989.

(a) Create a well-formatted time plot of the data

Table 2.1 - Number of Appliances Shipped by Quarter from 1985 to 1989
#uploading the ApplianceShipments.csv into a data table before converting to a time series set

Appliance <- read.csv("ApplianceShipments.csv")
Appliance <- ts(Appliance[,2], start = c(1985,1), frequency = 4)
autoplot(Appliance, facets = FALSE, xlab="Year", ylab="Appliances Shipped", main="Number of Appliances Shipped by Quarter from 1985 to 1989")

ggsubseriesplot(Appliance, xlab="Quarter", ylab="Appliances Shipped", main="Number of Appliances Shipped by Quarter from 1985 to 1989")

ggseasonplot(Appliance, xlab="Quarter", ylab="Appliances Shipped", main="Number of Appliances Shipped by Quarter from 1985 to 1989")



(b) Which of the four components (level, trend, seasonality, noise) seem to be present in this series?

Based on the data series, it appears as though there has been an upward trend in the number of appliances shipped during the time period of 1985 to 1989 with some fluctuation quarter-over-quarter. That said, through the data we can observe a consistent peak in appliances shipped during Q2 (with the exception of 1986 where that lift occured during Q3). One hypothesis may suggest that consumers receive their tax refunds at the beginning of Q2 and use the additional income to place larger ticket items — i.e., dishwashers, washer machines, driers, etc. From both the autoplot and ggseasonplot of the time series, it’s difficult to determine whether is truly any significant noise to the data set. My immediate inclination would be to further investigate the lag in shipped appliances in Q3 of 1986 to determine whether that spike is attributable to additional noise or whether the data is clean and there is some other phenomenon at play.













Forecasting of Shampoo Sales

The file ShampooSales.csv contains data on the monthly sales of a certain shampoo over a 3-year period.

(a) Create a well-formatted time plot of the data

Table 3.1 - Number of Appliances Shipped by Quarter from 1985 to 1989
#uploading the ShampooSales.csv into a data table before converting to a time series set

Shampoo <- read.csv("ShampooSales.csv")
Shampoo <- ts(Shampoo[,2], start = c(1995,1), frequency = 12)
autoplot(Shampoo, facets = FALSE, xlab="Month", ylab="Shampoo Sales", main="Sales of Shampoo X from Jan. 1995 to Dec. 1997")

ggsubseriesplot(Shampoo, xlab="Month", ylab="Shampoo Sales", main="Sales of Shampoo X from Jan. 1995 to Dec. 1997")

ggseasonplot(Shampoo, xlab="Month", ylab="Shampoo Sales", main="Sales of Shampoo X from Jan. 1995 to Dec. 1997")

ggseasonplot(Shampoo, polar = TRUE, xlab="Month", ylab="Shampoo Sales", main="Sales of Shampoo X from Jan. 1995 to Dec. 1997")



(b) Which of the four components (level, trend, seasonality, noise) seem to be present in this series?

From the ggseasonplot of the ShampooSales.csv data series, it’s apparent that several time series components exists within the data. While levels exist within the data series, there is a general upward trend in the sales of Shampoo X. And initially when I tried to suppress for seasonality, the initial ggseasonplot didn’t uncover any surprising results. In fact, it wasn’t until I enabled the polar viewpoint that a seasonal trend emerged from the data series.



(c) Do you expect to see seasonality in sales of shampoo?

One may typically not expect to see a seasonal trend to the purchase of shampoo, given that it’s a product that is (for the most part) used throughout the year on a daily or semi-daily basis. However, Upon enabling the polar viewpoint of the ggseasonplot, it is apparent from the star-like pattern that the purchase of shampoo is a cyclical purchase every other month … give or take. I, unscientifically, examined my own shampoo bottle and noticed that it is approximately 26oz in size, which — assuming a person uses 0.5oz per daily application — would last approximately 52 days, requiring repurchase every other month. This would also explain why sometimes this pattern breaks as well.