Due Monday, January 29th, 2018 at 11:59PM: Problems 1, 3, and 6 from Chapter 2 of Shmueli
The Research and Innovative Technology Administration’s Bureau of Transportation Statistics (BTS) conducted a study to evaluate the impact of the September 11, 2001, terrorist attack on U.S. transportation … The report analyzes monthly passenger movement data between January 1990 and April 2004. Data on three monthly time series are given in the file Sept11Travel.xls for this period: (1) actual airline revenue passenger miles (Air), (2) rail passenger miles (Rail), and (3) vehicle miles traveled (Auto).
In order to assess the impact of September 11, BTS took the following approach: Using data before September 11, it forecasted future data (under the assumption of no terrorist attack). Then, BTS compared the forecasted series with the actual data to assess the impact of the event.
Plot each of the three pre-event time series (Air, Rail, Car).
Beginning with the Sept11Travel.csv data file per the textbook, Practical Time Series Forecasting with R, I cleaned up some of the data results by converting both Rail and Air miles to reflect miles per million (mm), doing so allowed for an easier read of the time series data along the vertical axis. However, it is important to note that Car miles were NOT converted. Along with scaling the points of Air and Rail miles traveled, I shortened the data window to reflect January 1990 to August 2001 (pre-September 11th).
#cleaning up the Sept11Travel.csv file before converting to a time series
#mutate the Rail and Air miles to per 1,000,000 miles
Travel <- read.csv("Sept11Travel.csv")
Travel <- Travel %>%
mutate(Rail_Miles = ((round(((Rail/1000000)), digits=1)))) %>%
mutate(Air_Miles = ((round(((Air/1000000)), digits=1))))
Travel <- select(Travel, "Month", "Rail_Miles", "Air_Miles", "VMT")
colnames(Travel) <- c("Month", "Rail Miles (mm)", "Air Miles (mm)", "Car Miles")
Travel <- ts(Travel[,2:4], start = 1990, frequency = 12)
#plot the three (Rail, Air, and Car) charts indicating miles traveled
autoplot(Travel, facets = TRUE, xlab="Time", ylab="Miles Traveled", main="Miles Travelled by Month from 1990 to 2004")
Rail_Miles <- ts(Travel[,1], start = c(1990, 1), end = c(2001, 8), frequency = 12)
ggseasonplot(Rail_Miles, xlab="Month", ylab="Miles Traveled (mm)", main="ggseasonplot: Rail Miles Traveled (mm) by Month from Jan. 1990 to Aug. 2001")
ggsubseriesplot(Rail_Miles, xlab="Month", ylab="Miles Traveled (mm)", main="ggsubseriesplot: Rail Miles Traveled (mm) by Month from Jan. 1990 to Aug. 2001")
Air_Miles <- ts(Travel[,2], start = c(1990, 1), end = c(2001, 8), frequency = 12)
ggseasonplot(Air_Miles, xlab="Month", ylab="Miles Traveled (mm)", main="ggseasonplot: Air Miles Traveled (mm) by Month from Jan. 1990 to Aug. 2001")
ggsubseriesplot(Air_Miles, xlab="Month", ylab="Miles Traveled (mm)", main="ggsubseriesplot: Air Miles Traveled (mm) by Month from Jan. 1990 to Aug. 2001")
Car_Miles <- ts(Travel[,3], start = c(1990, 1), end = c(2001, 8), frequency = 12)
ggseasonplot(Car_Miles, xlab="Month", ylab="Miles Traveled", main="ggseasonplot: Car Miles Traveled by Month from Jan. 1990 to Aug. 2001")
ggsubseriesplot(Car_Miles, xlab="Month", ylab="Miles Traveled", main="ggsubseriesplot: Car Miles Traveled by Month from Jan. 1990 to Aug. 2001")
As mentioned by Galit Shmueli and Kenneth C. Lichtendahl Jr., Time series components include:
Based on the pre-event time series plots above, there exists a seasonality trend to travel — whether by rail, air, or car. We see from the ggsubseriesplots that relatively consistent levels exist and that there is a significant indication that not only is there a seasonality factor at play but that there is an overall trend in the growing number of people traveling.
Along with a growing trend in the number of miles traveled per month by year, we can see that by suppressing seasonality it is clear that various traveling behaviors exist. For instance, following the new year we see a decline in travel in February that is followed by a lift in traveled miles for March. This month-over-month lift may be attributable to Spring Break and school vacations that typically occur during this time, as parents pack up suitcases and head out from the house with the family. Another decline occurs into April, however, from March and throughout the Summer we see consistent month-over-month growth that culminates in August before school returns in September. Had we included September 2001 in our data series, we would have observed a significant drop from August that left air miles close to their lowest in 1990!
The file ApplianceShipments.xls contains the series of quarterly shipments (in millions of USD) of U.S. household appliances between 1985-1989.
#uploading the ApplianceShipments.csv into a data table before converting to a time series set
Appliance <- read.csv("ApplianceShipments.csv")
Appliance <- ts(Appliance[,2], start = c(1985,1), frequency = 4)
autoplot(Appliance, facets = FALSE, xlab="Year", ylab="Appliances Shipped", main="Number of Appliances Shipped by Quarter from 1985 to 1989")
ggsubseriesplot(Appliance, xlab="Quarter", ylab="Appliances Shipped", main="Number of Appliances Shipped by Quarter from 1985 to 1989")
ggseasonplot(Appliance, xlab="Quarter", ylab="Appliances Shipped", main="Number of Appliances Shipped by Quarter from 1985 to 1989")
Based on the data series, it appears as though there has been an upward trend in the number of appliances shipped during the time period of 1985 to 1989 with some fluctuation quarter-over-quarter. That said, through the data we can observe a consistent peak in appliances shipped during Q2 (with the exception of 1986 where that lift occured during Q3). One hypothesis may suggest that consumers receive their tax refunds at the beginning of Q2 and use the additional income to place larger ticket items — i.e., dishwashers, washer machines, driers, etc. From both the autoplot and ggseasonplot of the time series, it’s difficult to determine whether is truly any significant noise to the data set. My immediate inclination would be to further investigate the lag in shipped appliances in Q3 of 1986 to determine whether that spike is attributable to additional noise or whether the data is clean and there is some other phenomenon at play.
The file ShampooSales.csv contains data on the monthly sales of a certain shampoo over a 3-year period.
#uploading the ShampooSales.csv into a data table before converting to a time series set
Shampoo <- read.csv("ShampooSales.csv")
Shampoo <- ts(Shampoo[,2], start = c(1995,1), frequency = 12)
autoplot(Shampoo, facets = FALSE, xlab="Month", ylab="Shampoo Sales", main="Sales of Shampoo X from Jan. 1995 to Dec. 1997")
ggsubseriesplot(Shampoo, xlab="Month", ylab="Shampoo Sales", main="Sales of Shampoo X from Jan. 1995 to Dec. 1997")
ggseasonplot(Shampoo, xlab="Month", ylab="Shampoo Sales", main="Sales of Shampoo X from Jan. 1995 to Dec. 1997")
ggseasonplot(Shampoo, polar = TRUE, xlab="Month", ylab="Shampoo Sales", main="Sales of Shampoo X from Jan. 1995 to Dec. 1997")
From the ggseasonplot of the ShampooSales.csv data series, it’s apparent that several time series components exists within the data. While levels exist within the data series, there is a general upward trend in the sales of Shampoo X. And initially when I tried to suppress for seasonality, the initial ggseasonplot didn’t uncover any surprising results. In fact, it wasn’t until I enabled the polar viewpoint that a seasonal trend emerged from the data series.
One may typically not expect to see a seasonal trend to the purchase of shampoo, given that it’s a product that is (for the most part) used throughout the year on a daily or semi-daily basis. However, Upon enabling the polar viewpoint of the ggseasonplot, it is apparent from the star-like pattern that the purchase of shampoo is a cyclical purchase every other month … give or take. I, unscientifically, examined my own shampoo bottle and noticed that it is approximately 26oz in size, which — assuming a person uses 0.5oz per daily application — would last approximately 52 days, requiring repurchase every other month. This would also explain why sometimes this pattern breaks as well.