MBA 678: Assignment 1


Chapter 1: Questions 1-5

  1. The goal of the Impact of 9/11 on Air Travel in the United States is descriptive, as they are seeking to identify what impact the past event had.

  2. The forecast horizon in this example would be long-term, spanning years. A next-month forecast would not be sufficient.

  3. The level of automation required for this forecast is limited. As it is a one-time process that does not need to be done repeatedly and over time. It will be one forecast using past data and a comparison to actual information.

4.The t = 1,2,3 indicates the time-series in the data set, so for this example it would be the monthly time period. t = 1 refers to January 1990, the first month in the series.

  1. The values for y in the air series, are the the actual airline revenue passenger miles. The values are as follows:
y RPM
y1 35153577
y2 32965187
y3 39993913

Chapter 2: Questions 1, 3 & 6

    1. To begin, I read in the .csv that contained the data I would need. I then broke the different data sets out, so I could plot flight, rail and vehicle travel individually.
setwd("C:/Users/Jordan/Documents/MBA678")
travel <- read.csv("Sept11Travel_Updated.csv")

head(travel)
##    Month      Air      Rail    VMT
## 1 Jan-90 35153577 454115779 163.28
## 2 Feb-90 32965187 435086002 153.25
## 3 Mar-90 39993913 568289732 178.42
## 4 Apr-90 37981886 568101697 178.68
## 5 May-90 38419672 539628385 188.88
## 6 Jun-90 42819023 570694457 189.16

I seperated the three modes of travel into their own plot, but selecting just the related data set, I set out to plot them as time series.

flights.ts <- ts(travel$Air, start = c(1990,1), end = c(2001,8), freq =12)
plot(flights.ts, xlab = "Time (year)", ylab = "Revenue Passenger Miles Travelled")

rail.ts <- ts(travel$Rail, start = c(1990,1), end = c(2001,8), freq =12)
plot(rail.ts, xlab = "Time (year)", ylab = "Rail Passenger Miles Travelled")

auto.ts <- ts(travel$VMT, start = c(1990,1), end = c(2001,8), freq =12)
plot(auto.ts, xlab = "Time (year)", ylab = "Vehicle Miles Travelled")

    1. For part b of question 1, I used the same plots as above, but adjusted to zoom in on a smaller time period and added trend lines. I started with the flight data and plotted with log(y), zoomed in, and with a trend line.
library(forecast)
## Warning: package 'forecast' was built under R version 3.3.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.3.2
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.3.2
## This is forecast 7.3
plot(flights.ts, xlab = "Time (year)", ylab = "Miles Travelled by Air")

plot(flights.ts, log="y", xlab = "Time (year)", ylab = "Miles Travelled by Air")

zoom.flights.ts <- window(flights.ts, start = c(1999,1), end = c(2001,8))
plot(zoom.flights.ts, xlab = "Time", ylab = "Miles Travelled by Air")

travelLinear <- tslm(flights.ts ~ trend)
summary(travelLinear)
## 
## Call:
## tslm(formula = flights.ts ~ trend)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -9466409 -3410590  -681183  3360750 11823514 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 35728435     834749   42.80   <2e-16 ***
## trend         177097      10272   17.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4912000 on 138 degrees of freedom
## Multiple R-squared:  0.6829, Adjusted R-squared:  0.6806 
## F-statistic: 297.2 on 1 and 138 DF,  p-value: < 2.2e-16
plot(flights.ts, xlab = "Time (year)", ylab = "Miles Travelled by Air")
lines(travelLinear$fitted, lwd=2)

supseasonality <- aggregate(flights.ts, nfrequency=4, FUN=sum)
plot(supseasonality, ylab ="Miles Travelled by Air (without seasonality)", bty="l")

I then did the trend line and zoomed in plots for the other two data sets. I mixed up the length for the zoomed in time period as well.

zoom.rail.ts <- window(rail.ts, start = c(1999,1), end = c(2001,8))
plot(zoom.flights.ts, xlab = "Time", ylab = "Miles Travelled by Rail")

railLinear <- tslm(rail.ts ~ trend)
summary(railLinear)
## 
## Call:
## tslm(formula = rail.ts ~ trend)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -142225897  -40750574   -4370229   41272192  133135874 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 547632872   10511773  52.097  < 2e-16 ***
## trend         -837744     129357  -6.476 1.53e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 61860000 on 138 degrees of freedom
## Multiple R-squared:  0.2331, Adjusted R-squared:  0.2275 
## F-statistic: 41.94 on 1 and 138 DF,  p-value: 1.532e-09
plot(rail.ts, xlab = "Time (year)", ylab = "Miles Travelled by Rail")
lines(railLinear$fitted, lwd=2)

zoom.auto.ts <- window(auto.ts, start = c(1999,1), end = c(2001,8))
plot(zoom.auto.ts, xlab = "Time", ylab = "Miles Travelled by Automobiles")

autoLinear <- tslm(auto.ts ~ trend)
summary(autoLinear)
## 
## Call:
## tslm(formula = auto.ts ~ trend)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.554  -9.185   0.559  11.087  23.272 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 173.17137    2.42094   71.53   <2e-16 ***
## trend         0.44249    0.02979   14.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.25 on 138 degrees of freedom
## Multiple R-squared:  0.6152, Adjusted R-squared:  0.6124 
## F-statistic: 220.6 on 1 and 138 DF,  p-value: < 2.2e-16
plot(auto.ts, xlab = "Time (year)", ylab = "Miles Travelled by Automobiles")
lines(autoLinear$fitted, lwd=2)

zoomautoLinear <- tslm(zoom.auto.ts ~ trend)
summary(zoomautoLinear)
## 
## Call:
## tslm(formula = zoom.auto.ts ~ trend)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.259  -5.425   2.244  11.962  22.279 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 217.4082     5.6759  38.304   <2e-16 ***
## trend         0.6446     0.3002   2.147     0.04 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.68 on 30 degrees of freedom
## Multiple R-squared:  0.1332, Adjusted R-squared:  0.1043 
## F-statistic: 4.611 on 1 and 30 DF,  p-value: 0.03996
plot(zoom.auto.ts, xlab = "Time (year)", ylab = "Miles Travelled by Automobiles")
lines(zoomautoLinear$fitted, lwd=2)

    1. I read in the Appliance Shipments file to start. I had noted that the csv file was formatted with Quarter-Year, so I wasn’t certain how that would come through if I attempted to plot it. Therefore I did a test and then did some basic formatting to get the data in the correct order to plot as a time series. Although the time series plot looks okay, when I selected the top of appliance.ts, the first observations are all Q1 for each year, rather than in chronological order.
appliances <- read.csv("ApplianceShipments.csv")

appliance.ts <- ts(appliances$Shipments, start = c(1985), end = c(1989), freq =12)

head(appliance.ts)
## [1] 4009 4321 4224 3944 4123 4522
plot(appliance.ts, xlab = "Time (year)", ylab = "Appliances Shipped", bty="l")

I have manually reordered the csv so that it was in the correct order, as I wasn’t sure how to do this in R easily and with so few observations it was a quick fix in Excel.

appliancesADJ <- read.csv("ApplianceShipmentsADJUSTED.csv")

applianceADJ.ts <- ts(appliancesADJ$Shipments, start = c(1985), end = c(1989), freq =12)

head(applianceADJ.ts)
## [1] 4009 4321 4224 3944 4123 4522
plot(applianceADJ.ts, xlab = "Time (year)", ylab = "Appliances Shipped", bty="l")

    1. There appears to be seasonality to appliance sales, but no linear trend.

6.(a) I began by taking a look at the .csv file in Excel to get a sense of the format, then read in the file. I plotted as time series and set up limits for the y-variable.

shampoo <- read.csv("ShampooSales.csv")

shampoo.ts <- ts(shampoo$Shampoo.Sales, start = c(1995,1), end = c(1997,12), freq =12)

plot(shampoo.ts, xlab = "Time (year)", ylab = "Shampoo Sales", ylim = c(100,700), bty="l")

shampooLinear <- tslm(shampoo.ts ~ trend)
summary(shampooLinear)
## 
## Call:
## tslm(formula = shampoo.ts ~ trend)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -108.74  -52.12  -16.13   43.48  194.25 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    89.14      26.72   3.336  0.00207 ** 
## trend          12.08       1.26   9.590 3.37e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 78.5 on 34 degrees of freedom
## Multiple R-squared:  0.7301, Adjusted R-squared:  0.7222 
## F-statistic: 91.97 on 1 and 34 DF,  p-value: 3.368e-11
plot(shampoo.ts, xlab = "Time (year)", ylab = "Shampoo Sales")
lines(shampooLinear$fitted, lwd=2)

6(b). There is a strong linear trend, with increasing shampoo sales over time. However there isn’t seasonality to the data, which makes sense as shampoo is a product used yearround with likely little differences over a seasonal period. I would imagine sales of most staple items or basics wouldn’t have seasonality, shampoo included. There isn’t very visual noise, but as the book suggests there is always some noise.