Assignment 1

Chapter 2 Questions

library(readr)
library(forecast)

1. Impact of September 11 on Air Travel in the United States:*

Sept11Travel <- read_csv("C:/Users/Edrick/Google Drive/678 - Predictive Analytics/R Files/Sept11Travel_Updated.csv")
str(Sept11Travel)

## Classes 'tbl_df', 'tbl' and 'data.frame':    172 obs. of  4 variables:
##  $ Month: chr  "Jan-90" "Feb-90" "Mar-90" "Apr-90" ...
##  $ Air  : int  35153577 32965187 39993913 37981886 38419672 42819023 45770315 48763670 38173223 39051877 ...
##  $ Rail : int  454115779 435086002 568289732 568101697 539628385 570694457 618571581 609210368 488444939 514253920 ...
##  $ VMT  : num  163 153 178 179 189 ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 4
##   .. ..$ Month: list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Air  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ Rail : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ VMT  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"

head(Sept11Travel)

## # A tibble: 6 × 4
##    Month      Air      Rail    VMT
##    <chr>    <int>     <int>  <dbl>
## 1 Jan-90 35153577 454115779 163.28
## 2 Feb-90 32965187 435086002 153.25
## 3 Mar-90 39993913 568289732 178.42
## 4 Apr-90 37981886 568101697 178.68
## 5 May-90 38419672 539628385 188.88
## 6 Jun-90 42819023 570694457 189.16

min(Sept11Travel$Air)

## [1] 29672427

max(Sept11Travel$Air)

## [1] 69003617

Air.Sept11Travel.ts <- ts(Sept11Travel$Air, start=c(1990,1), end=c(2001,8), freq=12)
plot(Air.Sept11Travel.ts/1000000, xlab="Time", ylab="Air_Miles (in millions)",ylim = c(29, 70), bty="l", main="Air Miles Traveled Before 9/11")

min(Sept11Travel$Rail)

## [1] 326874247

max(Sept11Travel$Rail)

## [1] 664013874

Rail.Sept11Travel.ts <- ts(Sept11Travel$Rail, start=c(1990,1), end=c(2001,8), freq=12)
plot(Rail.Sept11Travel.ts/1000000, xlab="Time", ylab="Rail_Miles (in millions)",ylim = c(300, 700), bty="l", main="Rail Miles Traveled Before 9/11")

min(Sept11Travel$VMT)

## [1] 153.25

max(Sept11Travel$VMT)

## [1] 261.3

Auto.Sept11Travel.ts <- ts(Sept11Travel$VMT, start=c(1990,1), end=c(2001,8), freq=12)
plot(Auto.Sept11Travel.ts, xlab="Time", ylab="Auto_Miles",ylim = c(150, 300), bty="l", main="Auto Miles Traveled Before 9/11")

A) Time Series Components in the plot:

*Additive - all components on all plots appear to be additive

*Level - they all have an average level.

*Seasonality - In all series seasonality is clearly present, with an annual pattern in miles traveled. Summer season appears to be an annual peak.

*Trend - Both Air and Auto show upward linear trends, while Rail miles show a 3rd order polynomial trend.

*Noise - present in all plots to some unknown degree.

Changes were already previously made to the scale to look at just a relevant range for the series data. Additional changes are made below to look at trend lines and to suppress the seasonality

Added Trend Lines

Air.Linear <- tslm(Air.Sept11Travel.ts ~ trend)
plot(Air.Sept11Travel.ts/1000000, xlab="Time", ylab="Air Miles (in millions)", ylim=c(29, 70), bty="l", main="Air Miles Traveled Before 9/11")
lines(Air.Linear$fitted/1000000)

Rail.Linear <- tslm(Rail.Sept11Travel.ts ~ trend)
Rail.Quad <- tslm(Rail.Sept11Travel.ts ~ trend + I(trend^2))
plot(Rail.Sept11Travel.ts/1000000, xlab="Time", ylab="Rail Miles (in millions)", ylim=c(300, 700), bty="l", main="Rail Miles Traveled Before 9/11")
lines(Rail.Linear$fitted/1000000)
lines(Rail.Quad$fitted/1000000, lty=2, col="blue")

Auto.Linear <- tslm(Auto.Sept11Travel.ts ~ trend)
plot(Auto.Sept11Travel.ts, xlab="Time", ylab="Auto Miles", ylim=c(150, 300), bty="l", main="Auto Miles Traveled Before 9/11")
lines(Auto.Linear$fitted)

Supress Seasonality

Air.yearly <- aggregate(Air.Sept11Travel.ts/1000000, nfrequency=1, FUN=sum)
plot(Air.yearly, bty="l")

Rail.yearly <- aggregate(Rail.Sept11Travel.ts/1000000, nfrequency=1, FUN=sum)
plot(Rail.yearly, bty="l")

Auto.yearly <- aggregate(Auto.Sept11Travel.ts, nfrequency=1, FUN=sum)
plot(Auto.yearly, bty="l")

Types of trends that appear:

Overall, Air and Auto both show Upward Linear Trends with Additive Seasonality; Rail shows a 3rd Order Polynomial Trend with Additive Seasonality.

3. Shipments of Household Appliances:

Appliances <- read_csv("C:/Users/Edrick/Google Drive/678 - Predictive Analytics/R Files/ApplianceShipments.csv")

## Parsed with column specification:
## cols(
##   Quarter = col_character(),
##   Shipments = col_integer()
## )

str(Appliances)

## Classes 'tbl_df', 'tbl' and 'data.frame':    20 obs. of  2 variables:
##  $ Quarter  : chr  "Q1-1985" "Q2-1985" "Q3-1985" "Q4-1985" ...
##  $ Shipments: int  4009 4321 4224 3944 4123 4522 4657 4030 4493 4806 ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 2
##   .. ..$ Quarter  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Shipments: list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"

head(Appliances)

## # A tibble: 6 × 2
##   Quarter Shipments
##     <chr>     <int>
## 1 Q1-1985      4009
## 2 Q2-1985      4321
## 3 Q3-1985      4224
## 4 Q4-1985      3944
## 5 Q1-1986      4123
## 6 Q2-1986      4522

#Time Series
#Had to manipulate Q format in Excel for data to be in chrono order.
Appliances.ts <- ts(Appliances$Shipments, start = c(1985,1), frequency = 4)

A) Create a well-formatted time plot of the data.

#Determine scale
min(Appliances$Shipments)

## [1] 3944

max(Appliances$Shipments)

## [1] 4900

#Plot
plot(Appliances.ts, xlab="Time", ylab="Shipments (in $millions)",ylim = c(3800, 5000), bty="l", main="Quarterly Shipments of Household Appliances")

#Plot with trend lines
Appliances.Linear <- tslm(Appliances.ts ~ trend)
Appliances.Quad <- tslm(Appliances.ts ~ trend + I(trend^2))
plot(Appliances.ts, xlab="Time", ylab="Shipments (in $millions)", ylim=c(3800, 5000), bty="l", main="Quarterly Shipments of Household Appliances")
lines(Appliances.Linear$fitted)
lines(Appliances.Quad$fitted, lty=2, col="blue")

Suppress Seasonality

Appliances.yearly <- aggregate(Appliances.ts, nfrequency=1, FUN=sum)
plot(Appliances.yearly, bty="l")

B) Time Series Components in the plot:

*Level - level is calculated as follows:

mean(Appliances$Shipments)

## [1] 4424.85

*Seasonality - There is definitely seasonality. Q2 is each year spikes, followed by a rapid decline in shipments in Q3 and Q4.

*Trend - From plot above, the trend is positive. While it looks generally linear, it could be a quadratic function, as seen by both trend lines fitted to the data.

*Noise - present in all plots to some unknown degree.

6. Forecasting Shampoo Sales:

Shampoo <- read_csv("C:/Users/Edrick/Google Drive/678 - Predictive Analytics/R Files/ShampooSales.csv")

## Parsed with column specification:
## cols(
##   Month = col_character(),
##   `Shampoo Sales` = col_double()
## )

str(Shampoo)

## Classes 'tbl_df', 'tbl' and 'data.frame':    36 obs. of  2 variables:
##  $ Month        : chr  "Jan-95" "Feb-95" "Mar-95" "Apr-95" ...
##  $ Shampoo Sales: num  266 146 183 119 180 ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 2
##   .. ..$ Month        : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Shampoo Sales: list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"

head(Shampoo)

## # A tibble: 6 × 2
##    Month `Shampoo Sales`
##    <chr>           <dbl>
## 1 Jan-95           266.0
## 2 Feb-95           145.9
## 3 Mar-95           183.1
## 4 Apr-95           119.3
## 5 May-95           180.3
## 6 Jun-95           168.5

#Time Series
Shampoo.ts <- ts(Shampoo$'Shampoo Sales', start = c(1995,1), frequency = 12)

A) Create a well-formatted time plot of the data.

#Determine scale
min(Shampoo$`Shampoo Sales`)

## [1] 119.3

max(Shampoo$`Shampoo Sales`)

## [1] 682

#Plot
plot(Shampoo.ts, xlab="Time", ylab="Shampoo Sales", ylim = c(0, 700), bty="l", main="Monthly Shampoo Sales of a Certain Shampoo")

#Plot with trend lines
Shampoo.Linear <- tslm(Shampoo.ts ~ trend)
Shampoo.Quad <- tslm(Shampoo.ts ~ trend + I(trend^2))
plot(Shampoo.ts, xlab="Time", ylab="Shampoo Sales", ylim=c(0,700), bty="l", main="Monthly Shampoo Sales of a Certain Shampoo")
lines(Shampoo.Linear$fitted)
lines(Shampoo.Quad$fitted, lty=2, col="blue")

#Plot with trend lines
Shampoo.Season <- tslm(Shampoo.ts ~ trend + season)
plot(Shampoo.ts, xlab="Time", ylab="Shampoo Sales", ylim=c(0,700), bty="l", main="Monthly Shampoo Sales of a Certain Shampoo w/ Seasonality")
lines(Shampoo.Season$fitted, lty=2, col="blue")

The above plot shows both the trend and the seasonality (for any given month of the series).

Suppress Seasonality

Shampoo.quarterly <- aggregate(Shampoo.ts, nfrequency=4, FUN=sum)
plot(Shampoo.quarterly, ylim=c(500,2000), bty="l")

B) Time Series Components in the plot:

*Level - level is calculated as follows:

mean(Shampoo$`Shampoo Sales`)

## [1] 312.6

*Seasonality - There is definitely some seasonality.

*Trend - All plots show a positive trend. Given the dramatic uptick in the most recent year, it suggests that this trend is quadratic. With a wider data series we could make a better determination as to the best type of trend of this data.

*Noise - present in all plots to some unknown degree.

C) I would expect to see some seasonality in Shampoo Sales. Looking at the quarterly data, the data plots show consistent jumps in Q2 and Q3 (summer months). One could argue that more showers are taken in the summer due do the hot weather. Further, vacations where people are buying additional travel supplies would generate larger growth. And back-to-school and sport practice schedules would promote shampoo sales. The opposite is seen routinely in Q1 where sales grow the slowest. Another consideration would be the brand of shampoo - if this is an Axe-like brand, the above arguments are very conceivable.

Assignment 1

Edrick Wittes

February 5, 2017

Chapter 1 Questions

Chapter 2 Questions

1. Impact of September 11 on Air Travel in the United States:*

Added Trend Lines

Supress Seasonality

3. Shipments of Household Appliances:

Suppress Seasonality

6. Forecasting Shampoo Sales:

Suppress Seasonality