Chapter 1
Questions 1,2,3,4 & 5
1. Is the goal of this study descriptive or predictive?
The goal of the study is to understand the travel behavior patterns of people making long-distance trips before and after September 11. Stated as such, the purpose of the study is descriptive, because it uses the collected data to describe events that occurred in the past. The book does mention that the analysts used the data to predict what travel patterns would have been if the terror attacks had not occurred; however, these forecasts were used to measure potential impact, not to describe actual patterns before and after 9/11.
2. What is the forecast horizon to consider in this task? Are next-month forecasts sufficient?
The forecast horizon is how far into the future a forecast should be made; in this example the horizon would be month by month through April 2004. This would allow the analysts to compare predicted travel, assuming 9/11 did not happen, to the actual data gathered after 9/11. Next-month forecasts should be sufficient since the data were summarized on a monthly basis.
3. What level of automation does this forecasting task require? Consider the four questions related to automation.
If the analysts only intend to forecast travel through April 2004, this would be a one-time task and no automation would be required. However, since they would need to predict approximately 31 data points, they may want to automate the process to some extent, depending on the complexity of their prediction model.
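The number of forecast periods can be checked with a quick count. This is a minimal sketch, assuming the first month to forecast is September 2001 and the last is April 2004; the variable names are illustrative and not part of the assignment.

```r
# Months to forecast, assuming the horizon runs from September 2001
# (first month affected by the event) through April 2004 (end of the series).
forecastMonths <- seq(as.Date("2001-09-01"), as.Date("2004-04-01"), by = "month")

# Number of monthly forecasts needed under that assumption
horizonLength <- length(forecastMonths)
print(horizonLength)
```

Counting both endpoints gives 32 months; starting from October 2001 instead would give 31, so the exact figure depends on which month is treated as the first forecast period.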
4. What is the meaning of t = 1,2,3 in the Air series? Which time period does t = 1 refer to?
Here t = 1, 2, 3 indexes the time periods of interest; in this scenario, t = 1 refers to January 1990.
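The mapping from the index t to a calendar month can be sketched in R. The series below holds placeholder values (only the time index matters), and it assumes monthly data starting in January 1990; `airIndex.ts` and `periodOf` are hypothetical names introduced for illustration.

```r
# Placeholder values stand in for the Air series; only the time index matters.
# Assumes monthly observations starting in January 1990.
airIndex.ts <- ts(seq_len(140), start = c(1990, 1), frequency = 12)

# Return the calendar year and month number for time index t
periodOf <- function(series, t) {
  yearFraction <- time(series)[t]            # e.g. 1990.000, 1990.083, ...
  c(year = floor(yearFraction + 1e-8),       # small offset guards against rounding error
    month = cycle(series)[t])                # position within the year, 1 to 12
}

print(periodOf(airIndex.ts, 1))
print(periodOf(airIndex.ts, 140))
```

Under these assumptions t = 1 maps to month 1 of 1990, and t = 140 maps to month 8 of 2001, the last pre-event month.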
5. What are the values for y1, y2, and y3 in the Air series?
The y values represent the number of actual airline revenue passenger miles recorded for each time period. According to the Excel document, y1 = 35.15 billion, y2 = 32.97 billion and y3 = 39.99 billion.
Question 1
Data are plotted through August 2001, just prior to September 11th. The exception is the plot that zooms in on the last 4 years, where the series ends in December 2000.
#Plot all data pre-September 11th
railMiles.ts <- ts(Rail_Miles_m$Rail_Miles_millions, start = c(1990,1), end = c(2001, 8), frequency = 12)
plot(railMiles.ts, xlab = "Date", ylab = "Rail Miles (millions)", main = "Travel by Rail", bty = "l")

#Zooming in first 4 years
railFirst4Years <- window(railMiles.ts, start = c(1990,1), end = c(1993,12))
plot(railFirst4Years, ylab = "Rail Miles (millions)", main = "Travel by Rail", sub = "First 4 Years", bty = "l")

#Zooming in last 4 years
railLast4Years <- window(railMiles.ts, start = 1997, end = c(2000,12))
plot(railLast4Years, ylab = "Rail Miles (millions)", main = "Travel by Rail", sub = "Last 4 Years", bty = "l")

#Add trend lines, Linear and Quadratic
library(forecast) #provides tslm()
railMilesLinear <- tslm(railMiles.ts ~ trend)
railMilesQuad <- tslm(railMiles.ts ~ trend + I(trend^2))
plot(railMiles.ts, xlab = "Date", ylab = "Rail Miles (millions)", main = "Travel by Rail", bty = "l")
#Base graphics calls are not chained with "+"; each line is added with its own call
lines(railMilesLinear$fitted.values, lwd = 2)
lines(railMilesQuad$fitted.values, lty = 2, lwd = 3)

Answers pertaining to the Rail Time Series
a) Level and noise occur in every time series, but you can see that seasonality is also present here. Trend is more difficult to see in this plot.
b) Using the examples in the book as a guide, I would say that this time series shows a 3rd-order polynomial trend with additive seasonality.
#Suppress seasonality
##Quarterly
railQuarterly <- aggregate(railMiles.ts, nfrequency = 4, FUN = sum)
plot(railQuarterly, main = "Travel by Rail", sub = "Summarized by Quarter", bty = "l")

##Yearly
railYearly <- aggregate(railMiles.ts, nfrequency = 1, FUN = sum)
plot(railYearly, main = "Travel by Rail", sub = "Summarized by Year", bty = "l")
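One detail of `aggregate()` on a `ts` object is worth checking when the series ends mid-year: trailing months that do not fill a complete period are left out of the result. The sketch below uses a synthetic series of ones (not the rail data) over the same January 1990 to August 2001 span, so each sum simply counts the months behind it.

```r
# Synthetic monthly series of ones, January 1990 through August 2001 (140 months).
# Each aggregated value therefore equals the number of months it summarizes.
ones.ts <- ts(rep(1, 140), start = c(1990, 1), end = c(2001, 8), frequency = 12)

yearlyCounts <- aggregate(ones.ts, nfrequency = 1, FUN = sum)
quarterlyCounts <- aggregate(ones.ts, nfrequency = 4, FUN = sum)

# Number of periods kept and the months contributing to each
print(length(yearlyCounts))
print(range(yearlyCounts))
print(length(quarterlyCounts))
print(range(quarterlyCounts))
```

With this span the yearly series keeps 11 complete years (1990 to 2000) and the quarterly series keeps 46 complete quarters, so the partial 2001 data do not produce an artificially low final point.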

#Plot all data pre-September 11th
airMiles.ts <- ts(Air_Miles_b$Air_Miles_billions, start = c(1990,1), end = c(2001, 8), frequency = 12)
plot(airMiles.ts, xlab = "Date", ylab = "Air Miles (billions)", main = "Travel by Air", bty = "l")

#Zooming in first 4 years
airFirst4Years <- window(airMiles.ts, start = c(1990,1), end = c(1993,12))
plot(airFirst4Years, ylab = "Air Miles (billions)", main = "Travel by Air", sub = "First 4 Years", bty = "l")

#Zooming in last 4 years
airLast4Years <- window(airMiles.ts, start = 1997, end = c(2000,12))
plot(airLast4Years, ylab = "Air Miles (billions)", main = "Travel by Air", sub = "Last 4 Years", bty = "l")

#What did air travel look like just around September 11th (2001 - 2002)?
#Build the full series from January 1990 with no end date, then window it;
#relabeling the start as 2001 would attach 2001-2002 dates to the 1990-1991 values
airMilesAll.ts <- ts(Air_Miles_b$Air_Miles_billions, start = c(1990,1), frequency = 12)
airSept11 <- window(airMilesAll.ts, start = c(2001,1), end = c(2002,12))
plot(airSept11, ylab = "Air Miles (billions)", main = "Travel by Air", sub = "Around Sept. 11th (2001 - 2002)", bty = "l")

#Add trend lines, Linear and Quadratic
airMilesLinear <- tslm(airMiles.ts ~ trend)
airMilesQuad <- tslm(airMiles.ts ~ trend + I(trend^2))
plot(airMiles.ts, xlab = "Date", ylab = "Air Miles (billions)", main = "Travel by Air", bty = "l")
#Base graphics calls are not chained with "+"; each line is added with its own call
lines(airMilesLinear$fitted.values, lwd = 2)
lines(airMilesQuad$fitted.values, lty = 2, lwd = 3)

Answers pertaining to the Air Time Series
a) Trend and seasonality are present in this time series.
b) This plot displays upward linear trend with additive seasonality. There is a slight curve in the quadratic trend line, which may indicate an upward exponential trend, but to be more definitive, I would like to see data going back into the 1980s.
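One way to probe whether curvature points to an exponential trend is to fit the trend on the log scale, where exponential growth becomes a straight line. The sketch below uses a synthetic series with made-up level and growth rate (not the Air data) and base R `lm()` with an explicit period index rather than `tslm()`.

```r
# Synthetic series with a purely exponential trend (illustrative values only)
period <- 1:140
y <- 30 * exp(0.004 * period)

linearFit <- lm(y ~ period)                 # straight line on the original scale
quadFit   <- lm(y ~ period + I(period^2))   # quadratic term captures the curvature
logFit    <- lm(log(y) ~ period)            # exponential trend is linear in log(y)

# A positive quadratic coefficient means the fitted curve bends upward
print(coef(quadFit)[["I(period^2)"]])

# The slope on the log scale estimates the per-period growth rate
print(coef(logFit)[["period"]])
```

On real data, comparing the residuals of `linearFit` and `logFit` over time would show which scale leaves less systematic curvature.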
#Suppress seasonality
##Quarterly
airQuarterly <- aggregate(airMiles.ts, nfrequency = 4, FUN = sum)
plot(airQuarterly, main = "Travel by Air", sub = "Summarized by Quarter", bty = "l")

##Yearly
airYearly <- aggregate(airMiles.ts, nfrequency = 1, FUN = sum)
plot(airYearly, main = "Travel by Air", sub = "Summarized by Year", bty = "l")

#Plot all data pre-September 11th
carMiles.ts <- ts(Car_Miles_b$Car_Miles_billions, start = c(1990,1), end = c(2001, 8), frequency = 12)
plot(carMiles.ts, xlab = "Date", ylab = "Car Miles (billions)", main = "Travel by Car", bty = "l")

#Zooming in first 4 years
carFirst4Years <- window(carMiles.ts, start = c(1990,1), end = c(1993,12))
plot(carFirst4Years, ylab = "Car Miles (billions)", main = "Travel by Car", sub = "First 4 Years", bty = "l")

#Zooming in last 4 years
carLast4Years <- window(carMiles.ts, start = 1997, end = c(2000,12))
plot(carLast4Years, ylab = "Car Miles (billions)", main = "Travel by Car", sub = "Last 4 Years", bty = "l")

#Add trend lines, Linear and Quadratic
carMilesLinear <- tslm(carMiles.ts ~ trend)
carMilesQuad <- tslm(carMiles.ts ~ trend + I(trend^2))
plot(carMiles.ts, xlab = "Date", ylab = "Car Miles (billions)", main = "Travel by Car", bty = "l")
#Base graphics calls are not chained with "+"; each line is added with its own call
lines(carMilesLinear$fitted.values, lwd = 2)
lines(carMilesQuad$fitted.values, lty = 2, lwd = 3)

Answers pertaining to the Car Time Series
a) Trend and seasonality are present in this time series.
b) This time plot displays an upward linear trend with additive seasonality.
#Suppress seasonality
##Quarterly
carQuarterly <- aggregate(carMiles.ts, nfrequency = 4, FUN = sum)
plot(carQuarterly, main = "Travel by Car", sub = "Summarized by Quarter", bty = "l")

##Yearly
carYearly <- aggregate(carMiles.ts, nfrequency = 1, FUN = sum)
plot(carYearly, main = "Travel by Car", sub = "Summarized by Year", bty = "l")

Question 3
library(readxl)  #read_excel()
library(ggplot2) #ggplot() and its layers
ApplianceShipments <- read_excel("~/rProjects/Assignment_1/ApplianceShipments.xlsx")
appliancePlot <- ggplot(data = ApplianceShipments, aes(x = Quarter, y = Shipments, group = 1)) +
  geom_line() +
  theme(axis.line = element_line(colour = "black")) +
  theme(axis.text.x = element_text(angle = -45, hjust = .001)) +
  theme(plot.margin = unit(c(1,1.75,1,1), "cm")) +
  labs(title = "Quarterly Appliance Shipments", x = "Quarter", y = "Shipments in Millions U.S. $") +
  stat_smooth(method = "lm") #overlay a linear trend line
appliancePlot

b) Level and noise are always present. There also appears to be an upward trend, along with seasonality in which Q1 and Q4 are significantly lower than Q2 and Q3; the exception is between ’87 and ’88, where the ending and starting quarters did not drop as much. Using the examples from the book, this graph has an upward linear trend with additive seasonality.
Question 6
ShampooSales <- read_excel("~/rProjects/Assignment_1/ShampooSales.xlsx")
shampooPlot <- ggplot(data = ShampooSales, aes(x = Month, y = ShampooSales, group = 1)) +
  geom_line() +
  theme(axis.line = element_line(colour = "black")) +
  theme(axis.text.x = element_text(angle = -45, hjust = .001)) +
  theme(plot.margin = unit(c(1,1.75,1,1), "cm")) +
  labs(title = "Monthly Shampoo Sales", x = "Month", y = "Shampoo Sales") +
  stat_smooth(method = "lm")
shampooPlot

shampooSales.ts <- ts(ShampooSales$ShampooSales, start = c(1995,1), end = c(1997, 12), frequency = 12)
shampoo1995 <- window(shampooSales.ts, start = 1995, end = c(1995,12))
plot(shampoo1995, ylab = "Shampoo Sales", xlab = "Date", main = "1995 Shampoo Sales by Month", bty = "l")

shampoo1996 <- window(shampooSales.ts, start = 1996, end = c(1996,12))
plot(shampoo1996, ylab = "Shampoo Sales", xlab = "Date", main = "1996 Shampoo Sales by Month", bty = "l")

shampoo1997 <- window(shampooSales.ts, start = 1997, end = c(1997,12))
plot(shampoo1997, ylab = "Shampoo Sales", xlab = "Date", main = "1997 Shampoo Sales by Month", bty = "l")

ts.plot(shampoo1995, shampoo1996, shampoo1997, gpars=list(xlab="Date", ylab="Shampoo Sales", main = "Shampoo Sales Broken Out by Year", lty=c(1:3)))

b) After examining the first chart, it is clear that there is an upward trend with mild seasonality. I wanted to take a closer look, so I zoomed in on each of the specific years and then re-plotted the time series with the years made distinct, hoping to better identify seasonality. The single consistency that I see across the years is that sales spike in the fall.
c) I would not expect to see seasonality in shampoo sales; I would hope that people wash their hair consistently throughout the year. However, sales tend to spike each year around September, October, or November.
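A simple numeric check for a recurring fall spike is to remove the trend and then average the remaining values by calendar month. The sketch below runs on a synthetic three-year series with a built-in October bump (made-up values, not the ShampooSales data); all object names are illustrative.

```r
# Synthetic three-year monthly series: steady growth plus a bump every October
baseline <- seq(100, 170, length.out = 36)
bump <- rep(c(rep(0, 9), 40, rep(0, 2)), times = 3)  # +40 in month 10 of each year
demo.ts <- ts(baseline + bump, start = c(1995, 1), frequency = 12)

# Remove a linear trend first so later months are not favored by growth alone
values <- as.numeric(demo.ts)
timeIndex <- as.numeric(time(demo.ts))
detrended <- residuals(lm(values ~ timeIndex))

# Average the detrended values by calendar month (1 = January, ..., 12 = December)
monthMeans <- tapply(detrended, as.integer(cycle(demo.ts)), mean)
peakMonth <- as.integer(names(which.max(monthMeans)))
print(peakMonth)
```

With only three years of data each monthly mean rests on three observations, so a peak found this way is suggestive rather than conclusive.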