Descriptive. Used past data to attain a better understanding of behavior - not to predict future behaviors.
Longer horizons are most likely more beneficial in this task. Next-month forecasts are probably too short a time frame as the effects of an event (terrorist attack) most likely have a changing, namely declining, effect over time - especially on airline passenger miles. That is, next-month after a terrorist attack may have signifcant reduction in long distance trips, but that may normalize on a longer horizon. I would recommend a horizon of a full year, and to reevaluate every 3 months to account for changing behavior since the event.
While there are 3 series to be forcasted, this task is essetially a one-time analysis - it will not need to be run every day with new data. Simple software programs (such as RStudio) can be used to perform the mechanics necessary for this task. Further, an average data analyst can manage these models, preventing the need to further automation. For the current goal and scope, this task requires little automation
They are the first three time periods for observations. In this file, the time periods are as follow:
t1 = Jan-90
t2 = Feb-90
t3 = Mar-90
The y values are the Air RPM (in thousands) for the first three periods as follow:
y1 = 35,153,577
y2 = 32,965,187
y3 = 39,993,913
library(readr)
library(forecast)
Sept11Travel <- read_csv("C:/Users/Edrick/Google Drive/678 - Predictive Analytics/R Files/Sept11Travel_Updated.csv")
str(Sept11Travel)
## Classes 'tbl_df', 'tbl' and 'data.frame': 172 obs. of 4 variables:
## $ Month: chr "Jan-90" "Feb-90" "Mar-90" "Apr-90" ...
## $ Air : int 35153577 32965187 39993913 37981886 38419672 42819023 45770315 48763670 38173223 39051877 ...
## $ Rail : int 454115779 435086002 568289732 568101697 539628385 570694457 618571581 609210368 488444939 514253920 ...
## $ VMT : num 163 153 178 179 189 ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 4
## .. ..$ Month: list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ Air : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ Rail : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ VMT : list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
head(Sept11Travel)
## # A tibble: 6 × 4
## Month Air Rail VMT
## <chr> <int> <int> <dbl>
## 1 Jan-90 35153577 454115779 163.28
## 2 Feb-90 32965187 435086002 153.25
## 3 Mar-90 39993913 568289732 178.42
## 4 Apr-90 37981886 568101697 178.68
## 5 May-90 38419672 539628385 188.88
## 6 Jun-90 42819023 570694457 189.16
min(Sept11Travel$Air)
## [1] 29672427
max(Sept11Travel$Air)
## [1] 69003617
Air.Sept11Travel.ts <- ts(Sept11Travel$Air, start=c(1990,1), end=c(2001,8), freq=12)
plot(Air.Sept11Travel.ts/1000000, xlab="Time", ylab="Air_Miles (in millions)",ylim = c(29, 70), bty="l", main="Air Miles Traveled Before 9/11")
min(Sept11Travel$Rail)
## [1] 326874247
max(Sept11Travel$Rail)
## [1] 664013874
Rail.Sept11Travel.ts <- ts(Sept11Travel$Rail, start=c(1990,1), end=c(2001,8), freq=12)
plot(Rail.Sept11Travel.ts/1000000, xlab="Time", ylab="Rail_Miles (in millions)",ylim = c(300, 700), bty="l", main="Rail Miles Traveled Before 9/11")
min(Sept11Travel$VMT)
## [1] 153.25
max(Sept11Travel$VMT)
## [1] 261.3
Auto.Sept11Travel.ts <- ts(Sept11Travel$VMT, start=c(1990,1), end=c(2001,8), freq=12)
plot(Auto.Sept11Travel.ts, xlab="Time", ylab="Auto_Miles",ylim = c(150, 300), bty="l", main="Auto Miles Traveled Before 9/11")
A) Time Series Components in the plot:
*Additive - all components on all plots appear to be additive
*Level - they all have an average level.
*Seasonality - In all series seasonality is clearly present, with an annual pattern in miles traveled. Summer season appears to be an annual peak.
*Trend - Both Air and Auto show upward linear trends, while Rail miles show a 3rd order polynomial trend.
*Noise - present in all plots to some unknown degree.
Air.Linear <- tslm(Air.Sept11Travel.ts ~ trend)
plot(Air.Sept11Travel.ts/1000000, xlab="Time", ylab="Air Miles (in millions)", ylim=c(29, 70), bty="l", main="Air Miles Traveled Before 9/11")
lines(Air.Linear$fitted/1000000)
Rail.Linear <- tslm(Rail.Sept11Travel.ts ~ trend)
Rail.Quad <- tslm(Rail.Sept11Travel.ts ~ trend + I(trend^2))
plot(Rail.Sept11Travel.ts/1000000, xlab="Time", ylab="Rail Miles (in millions)", ylim=c(300, 700), bty="l", main="Rail Miles Traveled Before 9/11")
lines(Rail.Linear$fitted/1000000)
lines(Rail.Quad$fitted/1000000, lty=2, col="blue")
Auto.Linear <- tslm(Auto.Sept11Travel.ts ~ trend)
plot(Auto.Sept11Travel.ts, xlab="Time", ylab="Auto Miles", ylim=c(150, 300), bty="l", main="Auto Miles Traveled Before 9/11")
lines(Auto.Linear$fitted)
Air.yearly <- aggregate(Air.Sept11Travel.ts/1000000, nfrequency=1, FUN=sum)
plot(Air.yearly, bty="l")
Rail.yearly <- aggregate(Rail.Sept11Travel.ts/1000000, nfrequency=1, FUN=sum)
plot(Rail.yearly, bty="l")
Auto.yearly <- aggregate(Auto.Sept11Travel.ts, nfrequency=1, FUN=sum)
plot(Auto.yearly, bty="l")
Types of trends that appear:
Overall, Air and Auto both show Upward Linear Trends with Additive Seasonality; Rail shows a 3rd Order Polynomial Trend with Additive Seasonality.
Appliances <- read_csv("C:/Users/Edrick/Google Drive/678 - Predictive Analytics/R Files/ApplianceShipments.csv")
## Parsed with column specification:
## cols(
## Quarter = col_character(),
## Shipments = col_integer()
## )
str(Appliances)
## Classes 'tbl_df', 'tbl' and 'data.frame': 20 obs. of 2 variables:
## $ Quarter : chr "Q1-1985" "Q2-1985" "Q3-1985" "Q4-1985" ...
## $ Shipments: int 4009 4321 4224 3944 4123 4522 4657 4030 4493 4806 ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 2
## .. ..$ Quarter : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ Shipments: list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
head(Appliances)
## # A tibble: 6 × 2
## Quarter Shipments
## <chr> <int>
## 1 Q1-1985 4009
## 2 Q2-1985 4321
## 3 Q3-1985 4224
## 4 Q4-1985 3944
## 5 Q1-1986 4123
## 6 Q2-1986 4522
#Time Series
#Had to manipulate Q format in Excel for data to be in chrono order.
Appliances.ts <- ts(Appliances$Shipments, start = c(1985,1), frequency = 4)
A) Create a well-formatted time plot of the data.
#Determine scale
min(Appliances$Shipments)
## [1] 3944
max(Appliances$Shipments)
## [1] 4900
#Plot
plot(Appliances.ts, xlab="Time", ylab="Shipments (in $millions)",ylim = c(3800, 5000), bty="l", main="Quarterly Shipments of Household Appliances")
#Plot with trend lines
Appliances.Linear <- tslm(Appliances.ts ~ trend)
Appliances.Quad <- tslm(Appliances.ts ~ trend + I(trend^2))
plot(Appliances.ts, xlab="Time", ylab="Shipments (in $millions)", ylim=c(3800, 5000), bty="l", main="Quarterly Shipments of Household Appliances")
lines(Appliances.Linear$fitted)
lines(Appliances.Quad$fitted, lty=2, col="blue")
Appliances.yearly <- aggregate(Appliances.ts, nfrequency=1, FUN=sum)
plot(Appliances.yearly, bty="l")
B) Time Series Components in the plot:
*Level - level is calculated as follows:
mean(Appliances$Shipments)
## [1] 4424.85
*Seasonality - There is definitely seasonality. Q2 is each year spikes, followed by a rapid decline in shipments in Q3 and Q4.
*Trend - From plot above, the trend is positive. While it looks generally linear, it could be a quadratic function, as seen by both trend lines fitted to the data.
*Noise - present in all plots to some unknown degree.
Shampoo <- read_csv("C:/Users/Edrick/Google Drive/678 - Predictive Analytics/R Files/ShampooSales.csv")
## Parsed with column specification:
## cols(
## Month = col_character(),
## `Shampoo Sales` = col_double()
## )
str(Shampoo)
## Classes 'tbl_df', 'tbl' and 'data.frame': 36 obs. of 2 variables:
## $ Month : chr "Jan-95" "Feb-95" "Mar-95" "Apr-95" ...
## $ Shampoo Sales: num 266 146 183 119 180 ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 2
## .. ..$ Month : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ Shampoo Sales: list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
head(Shampoo)
## # A tibble: 6 × 2
## Month `Shampoo Sales`
## <chr> <dbl>
## 1 Jan-95 266.0
## 2 Feb-95 145.9
## 3 Mar-95 183.1
## 4 Apr-95 119.3
## 5 May-95 180.3
## 6 Jun-95 168.5
#Time Series
Shampoo.ts <- ts(Shampoo$'Shampoo Sales', start = c(1995,1), frequency = 12)
A) Create a well-formatted time plot of the data.
#Determine scale
min(Shampoo$`Shampoo Sales`)
## [1] 119.3
max(Shampoo$`Shampoo Sales`)
## [1] 682
#Plot
plot(Shampoo.ts, xlab="Time", ylab="Shampoo Sales", ylim = c(0, 700), bty="l", main="Monthly Shampoo Sales of a Certain Shampoo")
#Plot with trend lines
Shampoo.Linear <- tslm(Shampoo.ts ~ trend)
Shampoo.Quad <- tslm(Shampoo.ts ~ trend + I(trend^2))
plot(Shampoo.ts, xlab="Time", ylab="Shampoo Sales", ylim=c(0,700), bty="l", main="Monthly Shampoo Sales of a Certain Shampoo")
lines(Shampoo.Linear$fitted)
lines(Shampoo.Quad$fitted, lty=2, col="blue")
#Plot with trend lines
Shampoo.Season <- tslm(Shampoo.ts ~ trend + season)
plot(Shampoo.ts, xlab="Time", ylab="Shampoo Sales", ylim=c(0,700), bty="l", main="Monthly Shampoo Sales of a Certain Shampoo w/ Seasonality")
lines(Shampoo.Season$fitted, lty=2, col="blue")
The above plot shows both the trend and the seasonality (for any given month of the series).
Shampoo.quarterly <- aggregate(Shampoo.ts, nfrequency=4, FUN=sum)
plot(Shampoo.quarterly, ylim=c(500,2000), bty="l")
B) Time Series Components in the plot:
*Level - level is calculated as follows:
mean(Shampoo$`Shampoo Sales`)
## [1] 312.6
*Seasonality - There is definitely some seasonality.
*Trend - All plots show a positive trend. Given the dramatic uptick in the most recent year, it suggests that this trend is quadratic. With a wider data series we could make a better determination as to the best type of trend of this data.
*Noise - present in all plots to some unknown degree.
C) I would expect to see some seasonality in Shampoo Sales. Looking at the quarterly data, the data plots show consistent jumps in Q2 and Q3 (summer months). One could argue that more showers are taken in the summer due do the hot weather. Further, vacations where people are buying additional travel supplies would generate larger growth. And back-to-school and sport practice schedules would promote shampoo sales. The opposite is seen routinely in Q1 where sales grow the slowest. Another consideration would be the brand of shampoo - if this is an Axe-like brand, the above arguments are very conceivable.