Chapter 1 Questions

Question 1: Is the goal of this study descriptive or predictive?

The goal of this study is descriptive. It is to determine what effect, if any, the September 11, 2001 terrorist attack had on travel by air, rail, and car.

Question 2: What is the forecast horizon to consider in this task? Are next-month forecasts sufficient?

The forecast horizon depends on how precise the result needs to be, but it should extend far enough to show whether the effect is a lasting pattern or a simple one-time spike. Month-by-month forecasts should be sufficient for assessing the overall effect. Air travel is seasonal, but that should show up in the plot.

Question 4: What is the meaning of t = 1, 2, 3 in the Air series? Which time period does t = 1 refer to?

t = 1, 2, 3 index the first, second, and third time periods in the series. Here t = 1 refers to January 1990.

Question 5: What are the values for y1, y2, and y3 in the Air series?

y1 is 35153577, the first value of the miles of air travel; y2 is 32965187; y3 is 39993913.
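
The mapping from the index t to calendar periods can be checked directly in R. This is a minimal sketch that uses only the first three Air values quoted above; the names `air_first` and `lookup` are made up for illustration.

```r
# First three Air values, as quoted in the answer above
air_first <- ts(c(35153577, 32965187, 39993913),
                start = c(1990, 1), frequency = 12)

# t = 1, 2, 3 are positions in the series; time() and cycle() map them to dates
lookup <- data.frame(
  t     = seq_along(air_first),
  year  = floor(as.numeric(time(air_first)) + 1e-8),
  month = month.abb[as.integer(cycle(air_first))],
  y     = as.numeric(air_first)
)
print(lookup)
```

Each row pairs t with its calendar month and the corresponding y value, so y1, y2, and y3 can be read off the `y` column.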

Chapter 2 Questions

Question 1: Impact of September 11 on Air Travel in the United States

The Research and Innovative Technology Administration’s Bureau of Transportation Statistics (BTS) conducted a study to evaluate the impact of the September 11, 2001, terrorist attack on U.S. transportation. The study report and the data can be found at www.bts.gov/publications/estimated_impacts_of_9_11_on_us_travel. The goal of the study was stated as follows:

      The purpose of this study is to provide a greater understanding of the passenger travel behavior patterns of persons making long distance trips before and after September 11.

The report analyzes monthly passenger movement data between January 1990 and April 2004. Data on three monthly time series are given in the file Sept11Travel.xls for this period: (1) actual airline revenue passenger miles (Air), (2) rail passenger miles (Rail), and (3) vehicle miles traveled (Auto).

In order to assess the impact of September 11, BTS took the following approach: Using data before September 11, it forecasted future data (under the assumption of no terrorist attack). Then, BTS compared the forecasted series with the actual data to assess the impact of the event.

Plot each of the three pre-event time series (Air, Rail, Car).

I first read in the dataset used to assess the impact that the September 11, 2001 terrorist attack may have had on transportation. I then used “str” to check that the data looked reasonable.
transport911 <- read.csv("/Users/wendyhayes/Desktop/MBA 678-Predictive Analytics/Sept11Travel_Updated.csv")
str(transport911)
## 'data.frame':    172 obs. of  4 variables:
##  $ Month: Factor w/ 172 levels "1-Apr","1-Aug",..: 86 75 119 42 130 108 97 53 163 152 ...
##  $ Air  : int  35153577 32965187 39993913 37981886 38419672 42819023 45770315 48763670 38173223 39051877 ...
##  $ Rail : int  454115779 435086002 568289732 568101697 539628385 570694457 618571581 609210368 488444939 514253920 ...
##  $ VMT  : num  163 153 178 179 189 ...
Next, I plotted each mode of transportation as a time series. For each series, I used the summary function to find the minimum and maximum values to use as the y-axis limits. I plotted the full series, including the months after September 11, although the question may have intended only the pre-September 11 data; this wasn’t entirely clear to me. I started with air transportation:
transport911air.ts <- ts(transport911$Air, start=c(1990,1), end=c(2004,4), freq = 12)
summary(transport911air.ts)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 29670000 42660000 49460000 49260000 55360000 69000000
I then plotted the Air Travel data:
plot(transport911air.ts, xlab="Time", ylab="Air Miles Traveled", ylim=c(29670000, 69000000), bty="l")

transport911Rail.ts <- ts(transport911$Rail, start=c(1990,1), end=c(2004,4), freq = 12)
summary(transport911Rail.ts)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 326900000 440200000 477100000 482800000 537700000 664000000
I then plotted the miles traveled by rail:
plot(transport911Rail.ts, xlab="Time", ylab="Rail Miles Traveled", ylim=c(326900000, 664000000), bty="l")

transport911Car.ts <- ts(transport911$VMT, start=c(1990,1), end=c(2004,4), freq = 12)
summary(transport911Car.ts)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   153.2   192.9   209.7   210.3   229.8   261.3
I then plotted the miles traveled by car:
plot(transport911Car.ts, xlab="Time", ylab="Car Miles Traveled", ylim=c(153.2, 261.3), bty="l")
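
The values printed by summary() above are rounded to four significant digits, so typing them into ylim by hand can clip the true minimum or maximum by a small amount. A minimal sketch of an alternative is to take the limits from range(). It uses only the first ten Air values shown in the str() output, and the names `air_head` and `y_limits` are made up for illustration.

```r
# First ten Air values, copied from the str() output shown earlier
air_head <- ts(c(35153577, 32965187, 39993913, 37981886, 38419672,
                 42819023, 45770315, 48763670, 38173223, 39051877),
               start = c(1990, 1), frequency = 12)

# range() returns the exact minimum and maximum, so no point is clipped
y_limits <- range(air_head)

plot(air_head, xlab = "Time", ylab = "Air Miles Traveled",
     ylim = y_limits, bty = "l")
```

The same pattern would apply to the Rail and Car series without retyping any numbers.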

(a) What time series components appear from the plot?

(b) What type of trend appears? Change the scale of the series, add trend lines, and suppress seasonality to better visualize the trend pattern.

Air travel was clearly affected by the September 11 attack, so that is the series I focused on. I plotted a subset of the data, from January 1995 through August 2001, ending just before September 11, 2001.
Air911 <- window(transport911air.ts,1995,c(2001,8))
plot(Air911, ylab="Air Travel Miles",ylim=c(min(transport911air.ts),max(transport911air.ts)), bty="l")

I then aggregated the series to quarterly totals to suppress seasonality and added a linear trend line to see what the trend would have been:
quarterly <- aggregate(Air911,nfrequency=4, FUN=sum)
plot(quarterly, bty="l")
library(forecast)
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.3.2
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 7.3
miles <- tslm(quarterly~trend)
summary(miles)
## 
## Call:
## tslm(formula = quarterly ~ trend)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -13289365  -7370567  -2153420   7331377  15908917 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 136299691    4057275  33.594  < 2e-16 ***
## trend         1697426     262718   6.461 1.11e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10050000 on 24 degrees of freedom
## Multiple R-squared:  0.635,  Adjusted R-squared:  0.6197 
## F-statistic: 41.74 on 1 and 24 DF,  p-value: 1.109e-06
lines(miles$fitted, lwd=2)
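
Quarterly totals can still carry a within-year pattern, because each quarter covers only part of the seasonal cycle. The following is a minimal sketch of two other ways to suppress seasonality: annual totals and a centered 12-month moving average. The series `demo` is made up for illustration and is not the Air data.

```r
# Hypothetical monthly series: linear trend plus a repeating 12-month pattern
set.seed(1)
n <- 72
demo <- ts(100 + 0.5 * (1:n) + 10 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 2),
           start = c(1995, 1), frequency = 12)

# Annual totals: each value sums one full seasonal cycle
annual <- aggregate(demo, nfrequency = 1, FUN = sum)

# Centered 12-month moving average: half weights on the two end months
w <- c(0.5, rep(1, 11), 0.5) / 12
smooth <- stats::filter(demo, filter = w, sides = 2)

plot(demo, ylab = "Value", bty = "l")
lines(smooth, lwd = 2)
```

The moving average leaves the first and last six months undefined, which is the price of centering the window.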

With an R-squared of 0.635, the linear trend fitted to the pre-September 11 quarterly series is a reasonably good fit. The next check was to see the overall impact that the September 11 attack had on air travel.

quarterly <- aggregate(transport911air.ts,nfrequency=4, FUN=sum)
plot(quarterly, bty="l")
miles <- tslm(quarterly~trend)
summary(miles)
## 
## Call:
## tslm(formula = quarterly ~ trend)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -31619784  -9925875  -1516367   9227172  27670090 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 114352086    3460507   33.05  < 2e-16 ***
## trend         1144819     103789   11.03 1.48e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12890000 on 55 degrees of freedom
## Multiple R-squared:  0.6887, Adjusted R-squared:  0.683 
## F-statistic: 121.7 on 1 and 55 DF,  p-value: 1.481e-15
lines(miles$fitted, lwd=2)

When the series is extended through September 11 and the months that follow, air miles traveled drop and do not recover over the next few years. The fitted trend is still upward, though its slope is somewhat flatter (about 1.14 million per quarter over 1990 to early 2004, compared with about 1.70 million per quarter over 1995 to August 2001). More recent data would show whether ridership recovered, which would allow the overall effect of the attack on air travel to be determined.
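
The BTS approach described in the question (fit on pre-event data, forecast forward, compare with the actual values) can be sketched in base R as follows. The data here are simulated with an artificial drop after period 30; the names `demo`, `event`, and `impact` are made up for illustration, and none of the numbers come from the Air series.

```r
# Hypothetical quarterly series with a level drop after period 30
set.seed(42)
n <- 40
event <- 30
t_idx <- 1:n
y <- 100 + 2 * t_idx + rnorm(n, sd = 3)
y[(event + 1):n] <- y[(event + 1):n] - 25   # simulated post-event drop

demo <- data.frame(t = t_idx, y = y)
pre  <- demo[demo$t <= event, ]
post <- demo[demo$t >  event, ]

# Fit the trend on pre-event data only, then forecast the post-event periods
fit <- lm(y ~ t, data = pre)
forecasted <- predict(fit, newdata = post)

# Estimated impact: actual minus forecast (negative means below the expected path)
impact <- post$y - forecasted
print(mean(impact))
```

Fitting only on the pre-event window keeps the post-event drop from pulling the trend line down, which is what happens when a single line is fitted to the whole series.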

Question 3: Shipments of Household Appliances: The file ApplianceShipments.xls contains the series of quarterly shipments (in millions of USD) of U.S. household appliances between 1985 and 1989.

(a) Create a well-formatted time plot of the data.

appliances <- read.csv("/Users/wendyhayes/Desktop/MBA 678-Predictive Analytics/ApplianceShipments.csv")
appliances.ts <- ts(appliances$Shipments, start=c(1985,1), end=c(1989,4), freq=4)
str(appliances.ts)
##  Time-Series [1:20] from 1985 to 1990: 4009 4123 4493 4595 4245 4321 4522 4806 4799 4900 ...
plot(appliances.ts, ylab="Shipments",ylim=c(min(appliances.ts),max(appliances.ts)),bty="l", type="b")
sold <- tslm(appliances.ts~trend)
lines(sold$fitted, lwd=2)

(b) Which of the four components (level, trend, seasonality, noise) seem to be present in this series?

This series has a level, since level refers to the average value of the series and every series has one. The trend appears to be roughly constant. The series looks non-seasonal, as the values don’t repeat at any regular frequency, and there is noise, as the variation shows no periodicity.
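
A visual judgment about seasonality can be cross-checked numerically by comparing detrended averages for each quarter. This is a minimal sketch on made-up data in which a quarterly pattern has been built in deliberately; `demo` and `quarter_means` are hypothetical names, and the values are not the appliance shipments.

```r
# Hypothetical quarterly series, 1985 Q1 to 1989 Q4 (values are made up)
set.seed(7)
demo <- ts(4000 + 20 * (1:20) + rep(c(-100, 50, 150, -100), 5) + rnorm(20, sd = 30),
           start = c(1985, 1), frequency = 4)

# Remove a fitted linear trend so it does not mask the quarterly pattern
t_idx <- seq_along(demo)
detrended <- residuals(lm(as.numeric(demo) ~ t_idx))

# Average detrended value per quarter; large, consistent gaps suggest seasonality
quarter_means <- tapply(detrended, as.integer(cycle(demo)), mean)
print(round(quarter_means, 1))
```

If the four averages are close to one another relative to the spread of the residuals, the series shows little evidence of seasonality.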

Question 6: Forecasting Shampoo Sales: The file ShampooSales.xls contains data on the monthly sales of a certain shampoo over a 3-year period.

(a) Create a well-formatted time plot of the data.

shampoo <- read.csv("/Users/wendyhayes/Desktop/MBA 678-Predictive Analytics/ShampooSales.csv")
str(shampoo)
## 'data.frame':    36 obs. of  2 variables:
##  $ Month        : Factor w/ 36 levels "Apr-95","Apr-96",..: 13 10 22 1 25 19 16 4 34 31 ...
##  $ Shampoo.Sales: num  266 146 183 119 180 ...
shampoo.ts <- ts(shampoo$Shampoo.Sales, start=c(1995,1), end=c(1997,12), freq = 12)
summary(shampoo.ts)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   119.3   192.5   280.2   312.6   411.1   682.0
plot(shampoo.ts, ylab="Shampoo Sales",ylim=c(min(shampoo.ts),max(shampoo.ts)),bty="l", type="b")
shampoosales <- tslm(shampoo.ts~trend)
lines(shampoosales$fitted, lwd=2)

I decided to look at a quarterly aggregate to see if anything stood out in terms of seasonality:

quarterly <- aggregate(shampoo.ts,nfrequency=4, FUN=sum)
plot(quarterly, bty="l")
sales <- tslm(quarterly~trend)
lines(sales$fitted, lwd=2)

Nothing really stood out from this. There seems to be a low point around March 1995, but the next one was around January 1996. This wasn’t very helpful, so I decided to look at a shorter window of the series (January 1996 through June 1997) to check for seasonality:
shampoo <- window(shampoo.ts,1996,c(1997,6))
plot(shampoo, ylab="Shampoo Sales",ylim=c(min(shampoo.ts),max(shampoo.ts)), bty="l")
sold <- tslm(shampoo~trend)
summary(sold)
## 
## Call:
## tslm(formula = shampoo ~ trend)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -73.846 -37.629  -0.824  29.156 107.002 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  164.302     23.886   6.879 3.71e-06 ***
## trend         15.030      2.207   6.811 4.18e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 48.57 on 16 degrees of freedom
## Multiple R-squared:  0.7435, Adjusted R-squared:  0.7275 
## F-statistic: 46.39 on 1 and 16 DF,  p-value: 4.183e-06
lines(sold$fitted, lwd=2)
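
The linear trend fit and its R-squared can also be reproduced in base R. This sketch assumes that the `trend` regressor in tslm is the time index 1, 2, ..., n, and it uses made-up data rather than the shampoo series; `demo` and `r_squared` are hypothetical names.

```r
# Hypothetical 18-month series with an upward trend (values are made up)
set.seed(3)
demo <- ts(160 + 15 * (1:18) + rnorm(18, sd = 40),
           start = c(1996, 1), frequency = 12)

# The time index 1, 2, ..., n plays the role of the trend regressor (assumption)
trend <- seq_along(demo)
fit <- lm(as.numeric(demo) ~ trend)

# R-squared by hand: share of variance explained by the fitted line
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((demo - mean(demo))^2)
r_squared <- 1 - ss_res / ss_tot

plot(demo, ylab = "Value", bty = "l")
lines(ts(fitted(fit), start = start(demo), frequency = frequency(demo)), lwd = 2)
```

The hand-computed value matches the "Multiple R-squared" line that summary() reports for the same fit.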

(b) Which of the four components (level, trend, seasonality, noise) seem to be present in this series?

This series has a level, since level refers to the average value of the series and every series has one. It has an upward linear trend. There is no apparent seasonality, as the values don’t repeat at any regular frequency, and there is noise, as the variation doesn’t show any periodicity.

(c) Do you expect to see seasonality in sales of shampoo? Why?