First Assignment

library(knitr)
library(forecast)
library(ggvis)
library(dplyr)
library(zoo)

Chapter 1

1. Is the goal of this study descriptive or predictive?

The question states that the purpose of the study was to provide a greater understanding about long distance travel behavior before and after the 9/11 attacks and to asses the impact of the events, I would therefore have to conclude that the goal is a descriptive analysis.

There is a predictive element in this study insofar as part of the impact analysis involved forecasting. The pre 9/11 data was used to forecast post-9/11 and this forcast was compared to the actual data.

2. What is the forecast horizon to consider in this task? Are next-month forecasts sufficient?

Because the goal is to understand the impact of the 9/11 terrorist events, next month forecasts would not be sufficient. I would hypothesize that there would be both short and long term ramifications of such an event, therefore next month forecasts would provide an incomplete picture. Because long distance travel has a great deal of seasonal variability, it would be important to try and assess the impact over a longer horizon. Refreshing the forecast based on recent data as it comes in would also be important. Perhaps trends of increasing or decreasing long distance travel would emerge. Finally, for were this data provided for forecasting, a next-month forecast would be insufficient because of the seasonality of travel and the necessity of longer term planning for such an costly business.

3. What level of automation does this forecasting task require? Consider the four questions related to automation.

For this descriptive study, a high level of automation is not necessary. Patterns, trends, and a high level assessment of the impact of the terrorist events on travel can be gleaned from crude time series plots. The data we are given is monthly so there is a manageable amount of data to perform the analysis. Additionally, because this is a one time study with a fixed and finite amount of data, a low level of automation is all that is necessary.

4. What is the meaning of t = 1, 2, 3 in the Air series? Which time period does t = 1 refer to? It is the notation used for indexing the time period of interest. In the Air series, t = 1 refers to Jan-90, the first date in the series.

5. What are the values for y1, y2, and y3 in the Air series? The y values are the variable measurements over the noted time period. Thereore \(y_{1}\) refers to the variable measurements in the first time period. For:

air travel \(y_{1} = 35153577 RPM (000s)\)

rail travel \(y_{2} = 32965187 RPM (000s)\)

vehicle travel \(y_{3} = 39993913 RPM (000s)\)

Chapter 2

travel <- read.csv("travel911.csv", header = TRUE, stringsAsFactors = FALSE)
revisedtravel <- read.csv("Sept11Travel.csv", header = TRUE, stringsAsFactors = FALSE)
#kable(head(travel))
names(travel)[1:4] <- c("month_yr", "air.miles", "rail.miles", "car.miles.in.billions")
revisedtravel <- revisedtravel %>% mutate(logAir = log(Air), logRail = log(Rail), logCar = log(VMT))
kable(head(revisedtravel))

Month	Air	Rail	VMT	logAir	logRail	logCar
Jan-90	35153577	454115779	163.28	17.37524	19.93386	5.095466
Feb-90	32965187	435086002	153.25	17.31096	19.89105	5.032071
Mar-90	39993913	568289732	178.42	17.50424	20.15814	5.184140
Apr-90	37981886	568101697	178.68	17.45262	20.15781	5.185596
May-90	38419672	539628385	188.88	17.46408	20.10639	5.241112
Jun-90	42819023	570694457	189.16	17.57249	20.16236	5.242593

1. Plot each of the three pre-event time series (Air, Rail, Car)

logAir.ts <- ts(revisedtravel$logAir,  start = c(1990, 1), end = c(2001, 8), frequency = 12)
logAir.plot <- plot(logAir.ts, ylab = "log Air Passenger Miles", type="l", bty="l",  main = "Airline Revenue Passenger Miles before 9/11 Terrorist Attacks")

  logRail.ts <- ts(revisedtravel$logRail,  start = c(1990, 1), end = c(2001, 8), frequency = 12)
  logRail.plot <- plot(logRail.ts, ylab = "log Rail Passenger Miles", type="l", bty="l", main = "Rail Passenger Miles before 9/11 Terrorist Attacks")

  logCar.ts <- ts(revisedtravel$logCar,  start = c(1990, 1), end = c(2001, 8), frequency = 12)
  logCar.plot <- plot(logCar.ts, ylab = "log Car Passenger Miles", type="l", bty="l", main = "Monthly Car Miles Travelled before 9/11 Terrorist Attacks")

(a) What time series components appear from the plot? (b) What type of trend appears? Change the scale of the series, add trend lines, and suppress seasonality to better visualize the trend pattern.

Air Plots: An upward, linear trend and seasonality. This linear trend is exhibited in plots with trend lines and in plots where the seasonality is suppressed. Seasonality is evident when zooming and in the season plots.

airlinear <- tslm(logAir.ts  ~ trend)
plot(logAir.ts, xlab = "Year", ylab = "log Airline Revenue Passenger Miles", bty = "l", main = "Trend: Air Revenue Passenger Miles")
lines(airlinear$fitted, lwd=2)

quarterlyAir <- aggregate(logAir.ts, nfrequency=4, FUN=sum)
plot(quarterlyAir, xlab = "Year", ylab = "log Air Passenger Miles",bty="l", main = "Air Miles by Quarter:Seasonality Suppressed")

yearlyAir <- aggregate(logAir.ts, nfrequency=1, FUN=sum)
plot(yearlyAir, xlab = "Year", ylab = "log Air Passenger Miles",bty="l", main = "Air Miles Yearly:Seasonality Suppressed")

air.ts.zoom <- window(logAir.ts, start = c(1999, 1), end = c(2000, 12))
plot(air.ts.zoom, xlab = "Year", ylab = "log Airline Revenue Passenger Miles", bty = "l", main = "Zoom: Air Revenue Passenger Miles")

ggseasonplot(logAir.ts, col =rainbow(12), year.labels=TRUE, ylab = "log Air Passenger Revenue Miles", main = "Seasonal Plot for Air Passenger Revenue")

Rail Plot2: show a downward trend and seasonality. A combination of time series visualizations demonstrates these components.

rail.ts <- ts(revisedtravel$Rail, start = c(1990, 1), end = c(2001, 8), frequency = 12)
trainlinear <- tslm(logRail.ts ~ trend)
plot(logRail.ts, xlab = "Year", ylab = "log Rail Passenger Miles", bty = "l", main = "Trend: Rail Passenger Miles")
lines(trainlinear$fitted, lwd = 2)

#Quarterly rail
quarterlyRail <- aggregate(logRail.ts, nfrequency=4, FUN=sum)
plot(quarterlyRail, xlab = "Year", ylab = "log Rail Passenger Miles", bty="l", main = "Rail Miles by Quarter:Seasonality Suppressed")

#Yearly rail
yearlyRail <- aggregate(logRail.ts, nfrequency=1, FUN=sum)
plot(yearlyRail, xlab = "Year", ylab = "log Car Passenger Miles (in billions)",bty="l", main = "Rail Miles by Year:Seasonality Suppressed")

ggseasonplot(logRail.ts, col =rainbow(12), year.labels=TRUE, ylab = "log Rail Passenger Revenue Miles", main = "Season Plot for Rail Miles Travelled")

rail.ts.zoom <- window(logRail.ts, start = c(1998, 1), end = c(2000, 12))
plot(rail.ts.zoom, xlab = "Year", ylab = "log Rail Passenger Miles", bty = "l", main = "Zoom: Rail Passenger Miles")

Car Plots: show an upward trend and seasonality. Again, a combination of time series visualizations demonstrates these components.

carlinear <- tslm(logCar.ts ~ trend)
plot(logCar.ts, xlab = "Year", ylab = "log Car Passenger Miles (in billions)", bty = "l", main = "Trend: Car Passenger Miles")
lines(carlinear$fitted, lwd = 2)

#Quarterly car
quarterlyAuto <- aggregate(logCar.ts, nfrequency=4, FUN=sum)
plot(quarterlyAuto, xlab = "Year", ylab = "log Car Passenger Miles(in billions)",main = "Quarterly Auto: Seasonality Suppressed", bty="l")

yearlyAuto <- aggregate(logCar.ts, nfrequency=1, FUN=sum)
plot(yearlyAuto, xlab = "Year", ylab = "log Car Passenger Miles(in billions)", main = "Yearly Auto:Seasonality Suppressed",  bty="l")

ggseasonplot(logCar.ts, col =rainbow(12), year.labels=TRUE, ylab = "log Car Passenger Miles")

car.ts.zoom <- window(logCar.ts, start = c(1998, 1), end = c(2000, 12))
plot(rail.ts.zoom, xlab = "Year", ylab = "log Car Passenger Miles (in billions)", bty = "l", main = "Zoom: Car Passenger Miles")

3. Shipments of Household Appliances: The file ApplianceShipments.xls contains the series of quarterly shipments (in millions of USD) of U.S. household appliances between 1985- 1989. 6 6 Data courtesy of Ken Black (b) Which of the four components (level, trend, seasonality, noise) seem to be present in this series?

#read data
Appliance <- read.csv("ApplianceShipments.csv", header = TRUE)
appliance.ts. <- ts(Appliance$Shipments, start = c(1985,1), frequency = 4)
appliance.ts <- ts(Appliance$Shipments, start = 1985, frequency = 4)
#zoo ts for better delineation of quarters
zoo.appliance.ts <- zooreg(Appliance$Shipments, start = as.yearqtr("1985-1"), frequency = 4)
#zoo zoom
ZAp_ts <- window(zoo.appliance.ts, start = as.yearqtr("1985-1"), end = as.yearqtr("1987-4"))
#reg zoom
appliance.zoom <- window(appliance.ts., start = c(1985,1), end = c(1987, 4))
#aggregate
YearlyAppliance <- aggregate(appliance.ts, nfrequency = 1, FUN = sum)
#trendlines
shipmentlinear <- tslm(appliance.ts ~ trend)
shipmentQuad <- tslm(appliance.ts ~ trend + I(trend^2))

(a) Create a well-formatted time plot of the data.

plot(zoo.appliance.ts, col="blue", lty=1, lwd=2, xlab = "Year", ylab = "Appliance Shipments in millions of US $", ylim = c(3900, 5000), main = "Quarterly Appliance Shipments")

(b) Which of the four components (level, trend, seasonality, noise) seem to be present in this series?

There is a usually a seasonal trend in the Appliance Shipment data consisting of an apex in the second quarter and a nadir in the fourth quarter. This can be seen in the season plot and the Decomposition plot.

There is an upward linear trend which is evident from the yearly plot and the decomposition plot.

Noise is present to a certain degree in all time series data. Some is eliminated in the aggregate plot. The noise is also delineated in the decomposed plot.

plot(appliance.ts, col="blue", #lty=1, lwd=2, 
     xlab = "Year", ylab = "Appliance Shipments",  ylim = c(3800, 5000), #pch = 20, type = "b", 
     main = "Quarterly Appliance Shipments 1985-1987 with trend lines")
#par(mfrow = c(2, 1))
lines(shipmentlinear$fitted, lwd = 2)
lines(shipmentQuad$fitted, lty=2, lwd=3)

Zoom plot to show inderlying patterns

plot(appliance.zoom, xlab = "Time", ylab = "Appliance Shipments", bty = "l", main = "Appliance Shipments from 1985-7")

Aggregate Plot to suppress seasonality

plot(YearlyAppliance, col="blue", bty="l")

Seasonal Plot to show seasonality

ggseasonplot(appliance.ts, col =rainbow(12), year.labels=TRUE, ylab = "Shipments", main = "Seasonal Patterns in Appliance Shipments")

Systematic components decomposed showing seasonal and trend patterns.

ApplianceDecomp <- stl(appliance.ts, s.window = 5,  t.window=15)
plot(ApplianceDecomp, main = "Systematic Components of Appliance Shipments")

Rearranging Data for Appliance

library(dplyr)
Appliance2 <-read.csv("ApplianceShipments1.csv", stringsAsFactors = FALSE)
str(Appliance2)

## 'data.frame':    20 obs. of  4 variables:
##  $ Quarter                                          : chr  "Q1-1985" "Q1-1986" "Q1-1987" "Q1-1988" ...
##  $ Shipments                                        : int  4009 4123 4493 4595 4245 4321 4522 4806 4799 4900 ...
##  $ X                                                : logi  NA NA NA NA NA NA ...
##  $ From.Ken.Black..Business.Statistics...5th.Edition: logi  NA NA NA NA NA NA ...

#get rid of extraneous columns
Appliance2 <- Appliance2[,-c(3:4)]
head(Appliance2)

##   Quarter Shipments
## 1 Q1-1985      4009
## 2 Q1-1986      4123
## 3 Q1-1987      4493
## 4 Q1-1988      4595
## 5 Q1-1989      4245
## 6 Q2-1985      4321

#split up the first column
col1Split <- strsplit(Appliance2$Quarter, "-")
head(col1Split)

## [[1]]
## [1] "Q1"   "1985"
## 
## [[2]]
## [1] "Q1"   "1986"
## 
## [[3]]
## [1] "Q1"   "1987"
## 
## [[4]]
## [1] "Q1"   "1988"
## 
## [[5]]
## [1] "Q1"   "1989"
## 
## [[6]]
## [1] "Q2"   "1985"

#notice it's a matrix
str(col1Split)

## List of 20
##  $ : chr [1:2] "Q1" "1985"
##  $ : chr [1:2] "Q1" "1986"
##  $ : chr [1:2] "Q1" "1987"
##  $ : chr [1:2] "Q1" "1988"
##  $ : chr [1:2] "Q1" "1989"
##  $ : chr [1:2] "Q2" "1985"
##  $ : chr [1:2] "Q2" "1986"
##  $ : chr [1:2] "Q2" "1987"
##  $ : chr [1:2] "Q2" "1988"
##  $ : chr [1:2] "Q2" "1989"
##  $ : chr [1:2] "Q3" "1985"
##  $ : chr [1:2] "Q3" "1986"
##  $ : chr [1:2] "Q3" "1987"
##  $ : chr [1:2] "Q3" "1988"
##  $ : chr [1:2] "Q3" "1989"
##  $ : chr [1:2] "Q4" "1985"
##  $ : chr [1:2] "Q4" "1986"
##  $ : chr [1:2] "Q4" "1987"
##  $ : chr [1:2] "Q4" "1988"
##  $ : chr [1:2] "Q4" "1989"

#join the new columns to the Appliance 2 data frame
newApp_df <- data.frame(matrix(unlist(col1Split), nrow = length(col1Split), byrow = T), Appliance2$Shipments)
head(newApp_df)

##   X1   X2 Appliance2.Shipments
## 1 Q1 1985                 4009
## 2 Q1 1986                 4123
## 3 Q1 1987                 4493
## 4 Q1 1988                 4595
## 5 Q1 1989                 4245
## 6 Q2 1985                 4321

#reorder by year, then quarter
newApp_df <- newApp_df[order(newApp_df$X2, newApp_df$X1),]
head(newApp_df)

##    X1   X2 Appliance2.Shipments
## 1  Q1 1985                 4009
## 6  Q2 1985                 4321
## 11 Q3 1985                 4224
## 16 Q4 1985                 3944
## 2  Q1 1986                 4123
## 7  Q2 1986                 4522

#put the columns back together
AppliancesFixed <- paste(newApp_df$X1, "-", newApp_df$X2)
head(AppliancesFixed)

## [1] "Q1 - 1985" "Q2 - 1985" "Q3 - 1985" "Q4 - 1985" "Q1 - 1986" "Q2 - 1986"

#join dates with reordered shipments
ApplianceShipments <- data.frame(AppliancesFixed, newApp_df$Appliance2.Shipments)
head(ApplianceShipments)

##   AppliancesFixed newApp_df.Appliance2.Shipments
## 1       Q1 - 1985                           4009
## 2       Q2 - 1985                           4321
## 3       Q3 - 1985                           4224
## 4       Q4 - 1985                           3944
## 5       Q1 - 1986                           4123
## 6       Q2 - 1986                           4522

#rename columns
names(ApplianceShipments)[1] <- "Quarter"
names(ApplianceShipments)[2] <- "Shipments"
head(ApplianceShipments)

##     Quarter Shipments
## 1 Q1 - 1985      4009
## 2 Q2 - 1985      4321
## 3 Q3 - 1985      4224
## 4 Q4 - 1985      3944
## 5 Q1 - 1986      4123
## 6 Q2 - 1986      4522

ApplianceShipments.ts <- ts(ApplianceShipments$Shipments, start = c(1985,1), end = c(1989, 4), frequency = 4)

library(forecast)
shipmentlinear <- tslm(ApplianceShipments.ts ~ trend)
shipmentQuad <- tslm(ApplianceShipments.ts ~ trend + I(trend^2))

plot(ApplianceShipments.ts, bty="l", xlab="Year", ylab="Appliance Shipments (millions $US)", las = 1)
lines(shipmentlinear$fitted, lwd = 2)

6. Forecasting Shampoo Sales: The file ShampooSales.xls contains data on the monthly sales of a certain shampoo over a 3-year period.

Shampoo <- read.csv("ShampooSales.csv")
shampoo.ts <- ts(Shampoo$Shampoo.Sales, start = c(1995, 1), frequency = 12)
shampoolinear <- tslm(shampoo.ts ~ trend)
shampooQuad <- tslm(shampoo.ts ~ trend + I(trend^2))
shampoo.zoom <- window(shampoo.ts, start = c(1996,1), end = c(1997, 12))
#aggregate
YearlyShampoo <- aggregate(shampoo.ts, nfrequency = 1, FUN = sum)
QuarterlyShampoo <- aggregate(shampoo.ts, nfrequency = 12, FUN = sum)

(a) Create a well-formatted time plot of the data.

plot(shampoo.ts, col="blue", lty=1, lwd=2, xlab = "Month_Year", ylab = "Shampoo Sales",main = "Monthly Shampoo Sales")

(b) Which of the four components (level, trend, seasonality, noise) seem to be present in this series?

Trend is present. There is an upward linear trend shown in the Monthly Shampoo Sales with trend lines plot.

Level and noise are present in all time series data sets.

Seasonality is a vague component in this series of data. In the season plots, the seasons do not mimic each other in a significant manner.

plot(shampoo.ts, col="blue", lty=1, lwd=2, xlab = "Month_Year", ylab = "Shampoo Sales",main = "Monthly Shampoo Sales with trend lines")
lines(shampoolinear$fitted, lwd = 2)
lines(shampooQuad$fitted, lty = 2, lwd = 3)

plot(shampoo.zoom, xlab = "Month_Year", ylab = "Shampoo Sales", main = "Shampoo Sales 1996-1997", col = "blue")

plot(YearlyShampoo, col = "blue", bty = "l", xlab = "Month_Year", ylab = "Shampoo Sales", main = "Shampoo Sales Aggregated by Year")

plot(QuarterlyShampoo, col = "blue", bty = "l", xlab = "Month_Year", ylab = "Shampoo Sales", main = "Shampoo Sales Aggregated by Quarter")

ggseasonplot(shampoo.ts, col = rainbow(3), year.labels=TRUE, ylab = "Shampoo Sales", main = "Seasonal Patterns in Shampoo Sales")

ShampooDecomp <- stl(shampoo.ts, s.window = 5,  t.window=15)
plot(ShampooDecomp, main = "Systematic Components of Shampoo Sales")

(c) Do you expect to see seasonality in sales of shampoo? Why?

I would not expect shampoo sales to manifest a seasonal pattern because it is a standard product that is generally used year round. Perhaps, beauty salons purchase shampoo products where seasonal cycles are manifest…for example, if they are busiest in the summers, however, for home use, I would guess the variability in sales is more random than seasonal.