MBA 678 - Predictive Analytics
Assignment #1, due Feb 06 2017
Yusuf Sultan
—————————————————————————————————————————————————————————–
Chapter 1 - Problems:
1. Is the goal of this study descriptive or predictive?
The goal of this study is to provide an understanding of the travel behavior patterns of passengers making long-distance trips before and after September 11, which means two types of analytics are involved:
a. It is descriptive because the first purpose is to study travel before September 11. Descriptive analytics looks at data and analyzes past events for insight into how to approach the future.
b. It is predictive because the second purpose is to study travel after September 11. Predictive analytics uses data to determine the probable future outcome of an event or the likelihood of a situation occurring.
2. What is the forecast horizon to consider in this task? Are next-month forecasts sufficient?
The forecast horizon, denoted k, is the number of periods ahead that we must forecast, and F_{t+k} is a k-step-ahead forecast. In our September 11 study, one-month-ahead forecasts (F_{t+1}) were considered.
Whether they are sufficient depends on the purpose of the study: one-month-ahead forecasts may be enough for setting flexible prices, while three-month-ahead forecasts (F_{t+3}) are more likely to be needed for scheduling and procurement purposes.
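As a rough illustration of the k-step-ahead notation, the sketch below fits a simple linear trend and produces F_{t+1}, F_{t+2} and F_{t+3}. It uses the built-in AirPassengers series as a stand-in (an assumption; it is not the Sept11Travel data), and the names t_idx, fit and fc are made up for this example.

```r
# Stand-in monthly series (built-in dataset), NOT the Sept11Travel data
y <- AirPassengers
n <- length(y)            # t = n is the last observed period
t_idx <- seq_len(n)       # time index t = 1, 2, ..., n

# Simple linear-trend model fitted on all observed periods
fit <- lm(as.numeric(y) ~ t_idx)

# k-step-ahead forecasts F_{t+k} for horizons k = 1, 2, 3
k <- 1:3
fc <- predict(fit, newdata = data.frame(t_idx = n + k))
names(fc) <- paste0("F_t+", k)
fc
```

A longer horizon only changes the vector k; whether the model is still trustworthy that far ahead is a separate question.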
3. What level of automation does this forecasting task require? Consider the four questions related to automation.
The level of automation required depends on the nature of the forecasting task and on how the forecasts will be used. The four questions to ask are:
a. How many series need to be forecasted? Three series need to be forecasted (Air, Rail and Car).
b. Is the forecasting an ongoing process or a one-time event? The task is to study travel before and after September 11, so the forecasting is an ongoing process and not a one-time event.
c. Which data and software will be available during the forecasting period? This is not easy to answer because many software packages could be used to forecast this data. Since we are studying time series forecasting with R, the September 11 travel data and R are probably available if we are asked to forecast it.
d. What forecasting expertise will be available at the organization during the forecasting period? The expertise needed is a good understanding of the behavior of long-distance passengers and the ability to predict which mode of travel they are more likely to use in the future.
4. What is the meaning of t = 1, 2, 3 in the Air series? Which time period does t = 1 refer to?
t is an index denoting the time period of interest, and t = 1 is the first period in a series. In our data (Sept11Travel), t = 1 is January 1990, t = 2 is February 1990 and t = 3 is March 1990 for the Air series. So t = 1 refers to the first period in the series, January 1990.
5. What are the values for y1, y2, and y3 in the Air series?
y1, y2, ..., yn are a series of n values measured over n time periods. For the Air series the first three values are:
y1 = 35153577, y2 = 32965187 and y3 = 39993913, which are the values measured at t = 1, t = 2 and t = 3.
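The link between the index t, the calendar period and the value y_t can be checked with a small sketch. It assumes only the six Air values printed by head() in the R output of this report; the object names air and labels are made up for this example.

```r
# First six Air values as printed by head(Spt11) in this report
air <- ts(c(35153577, 32965187, 39993913, 37981886, 38419672, 42819023),
          start = c(1990, 1), frequency = 12)

# Calendar label for each time index t
labels <- paste(month.abb[as.integer(cycle(air))],
                floor(as.numeric(time(air))))

# t, period and y_t side by side for t = 1, 2, 3
data.frame(t = 1:3, period = labels[1:3], y = as.numeric(air)[1:3])
```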
———————————————————————————————————————-
Problem 1: The September 11 study analyzes monthly passenger movement data between January 1990 and April 2004. Plot each of the three pre-event time series (Air, Rail, Car).
Set the working directory, read the data into R, display its structure, and display the first 6 observations.
getwd()
## [1] "C:/Users/Yusuf/Desktop"
setwd("C:/Users/Yusuf/Desktop/predictive/")
Spt11 <- read.csv("Sept11Travel.csv")
str(Spt11)
## 'data.frame': 172 obs. of 4 variables:
## $ Month: Factor w/ 172 levels "1-Apr","1-Aug",..: 86 75 119 42 130 108 97 53 163 152 ...
## $ Air : int 35153577 32965187 39993913 37981886 38419672 42819023 45770315 48763670 38173223 39051877 ...
## $ Rail : int 454115779 435086002 568289732 568101697 539628385 570694457 618571581 609210368 488444939 514253920 ...
## $ Car : num 163 153 178 179 189 ...
head(Spt11)
## Month Air Rail Car
## 1 Jan-90 35153577 454115779 163.28
## 2 Feb-90 32965187 435086002 153.25
## 3 Mar-90 39993913 568289732 178.42
## 4 Apr-90 37981886 568101697 178.68
## 5 May-90 38419672 539628385 188.88
## 6 Jun-90 42819023 570694457 189.16
Create a time series object for each of the three pre-event series (series 1 = Air, 2 = Rail, 3 = Car) and plot it.
Spt11Air.ts <- ts(Spt11$Air, start=c(1990,1), end=c(2001,8), freq=12)
Spt11Rail.ts <- ts(Spt11$Rail, start=c(1990,1), end=c(2001,8), freq=12)
Spt11Car.ts <- ts(Spt11$Car, start=c(1990,1), end=c(2001,8), freq=12)
plot(Spt11Air.ts, xlab="Time", ylab="Air", ylim=c(25000000, 70000000), bty="l")

plot(Spt11Rail.ts, xlab="Time", ylab="Rail", ylim=c(300000000, 700000000), bty="l")

plot(Spt11Car.ts, xlab="Time", ylab="Car", ylim=c(150, 250), bty="l")

a. What time series components appear from the plot?
Air series (seasonality): the first plot shows strong seasonality. A similar pattern in air travel repeats within each year, with a period of one year.
Air series (trend): the plot shows a strong increasing (upward) trend. (A trend exists when there is a long-term increase or decrease in the data.)
Air series (level): the series has a level, which is the average value of the series.
Air series (noise): the series has noise, which is the random variation that remains, for example from measurement error.
==========
Rail series (seasonality): the second plot shows strong seasonality. A similar pattern in rail travel repeats within each year, with a period of one year.
Rail series (trend): the plot shows a decreasing (downward) trend over the period.
Rail series (level): the series has a level, which is the average value of the series.
Rail series (noise): the series has noise, which is the random variation that remains, for example from measurement error.
================
Car series (seasonality): the third plot shows strong seasonality. A similar pattern in car travel repeats within each year, with a period of one year.
Car series (trend): the plot shows a strong increasing (upward) trend.
Car series (level): the series has a level, which is the average value of the series.
Car series (noise): the series has noise, which is the random variation that remains, for example from measurement error.
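The four components named above can also be separated numerically. The sketch below is only an illustration under assumptions: it uses the built-in AirPassengers series as a stand-in for the travel data, and it uses classical decomposition with decompose() from base R, which is not a method used elsewhere in this assignment.

```r
# Stand-in monthly series (built-in dataset), NOT the Sept11Travel data
y <- AirPassengers

# Classical decomposition into trend, seasonal and random (noise) parts
parts <- decompose(y, type = "multiplicative")

level <- mean(y)            # level: the average value of the series
head(parts$trend, 12)       # trend (NA at the ends, where the moving average is undefined)
round(parts$figure, 2)      # 12 seasonal indices, one per month
plot(parts)                 # observed, trend, seasonal and random panels
```

Seasonal indices clearly different from 1 indicate seasonality; indices all close to 1 would suggest there is little of it.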
================================================================
plot(Spt11Air.ts, xlab="Time", ylab="Air", ylim=c(25000000, 70000000), bty="l")

plot(Spt11Rail.ts, xlab="Time", ylab="Rail", ylim=c(300000000, 700000000), bty="l")

plot(Spt11Car.ts, xlab="Time", ylab="Car", ylim=c(150, 250), bty="l")

plot(Spt11Air.ts, log="y", xlab="Time", ylab="Air (log scale)", ylim=c(25000000, 70000000), bty="l")

plot(Spt11Rail.ts, log="y", xlab="Time", ylab="Rail (log scale)", ylim=c(300000000, 700000000), bty="l")

plot(Spt11Car.ts, log="y", xlab="Time", ylab="Car (log scale)", ylim=c(150, 250), bty="l")

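# Log-transform the pre-event series (the Spt11 data frame also contains the post-event months)
logAir.ts <- log(Spt11Air.ts)
plot(logAir.ts, xlab="Time", ylab="log(Air)", bty="l")

logRail.ts <- log(Spt11Rail.ts)
plot(logRail.ts, xlab="Time", ylab="log(Rail)", bty="l")

logCar.ts <- log(Spt11Car.ts)
plot(logCar.ts, xlab="Time", ylab="log(Car)", bty="l")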

library(forecast)
## Warning: package 'forecast' was built under R version 3.3.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.3.2
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.3.2
## This is forecast 7.3
AirLinear <- tslm(Spt11Air.ts ~ trend)
summary(AirLinear)
##
## Call:
## tslm(formula = Spt11Air.ts ~ trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9466409 -3410590 -681183 3360750 11823514
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35728435 834749 42.80 <2e-16 ***
## trend 177097 10272 17.24 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4912000 on 138 degrees of freedom
## Multiple R-squared: 0.6829, Adjusted R-squared: 0.6806
## F-statistic: 297.2 on 1 and 138 DF, p-value: < 2.2e-16
plot(Spt11Air.ts, xlab="Time", ylab="Air", ylim=c(25000000, 70000000), bty="l")
lines(AirLinear$fitted, lwd=2)
AirQuad <- tslm(Spt11Air.ts ~ trend + I(trend^2))
summary(AirQuad)
##
## Call:
## tslm(formula = Spt11Air.ts ~ trend + I(trend^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -10706560 -3385499 -494312 3334001 11901573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.745e+07 1.253e+06 29.897 <2e-16 ***
## trend 1.042e+05 4.102e+04 2.541 0.0122 *
## I(trend^2) 5.169e+02 2.818e+02 1.834 0.0688 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4870000 on 137 degrees of freedom
## Multiple R-squared: 0.6905, Adjusted R-squared: 0.686
## F-statistic: 152.8 on 2 and 137 DF, p-value: < 2.2e-16
lines(AirQuad$fitted, lty=2, lwd=3)

RailLinear <- tslm(Spt11Rail.ts ~ trend)
summary(RailLinear)
##
## Call:
## tslm(formula = Spt11Rail.ts ~ trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -142225897 -40750574 -4370229 41272192 133135874
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 547632872 10511773 52.097 < 2e-16 ***
## trend -837744 129357 -6.476 1.53e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 61860000 on 138 degrees of freedom
## Multiple R-squared: 0.2331, Adjusted R-squared: 0.2275
## F-statistic: 41.94 on 1 and 138 DF, p-value: 1.532e-09
plot(Spt11Rail.ts, xlab="Time", ylab="Rail", ylim=c(300000000, 700000000), bty="l")
lines(RailLinear$fitted, lwd=2)
RailQuad <- tslm(Spt11Rail.ts ~ trend + I(trend^2))
summary(RailQuad)
##
## Call:
## tslm(formula = Spt11Rail.ts ~ trend + I(trend^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -137634923 -37327572 -1559409 41652351 125112936
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 576828666 15618390 36.93 < 2e-16 ***
## trend -2071369 511394 -4.05 8.52e-05 ***
## I(trend^2) 8749 3513 2.49 0.014 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 60720000 on 137 degrees of freedom
## Multiple R-squared: 0.2663, Adjusted R-squared: 0.2556
## F-statistic: 24.86 on 2 and 137 DF, p-value: 6.14e-10
lines(RailQuad$fitted, lty=2, lwd=3)

CarLinear <- tslm(Spt11Car.ts ~ trend)
summary(CarLinear)
##
## Call:
## tslm(formula = Spt11Car.ts ~ trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.554 -9.185 0.559 11.087 23.272
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 173.17137 2.42094 71.53 <2e-16 ***
## trend 0.44249 0.02979 14.85 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.25 on 138 degrees of freedom
## Multiple R-squared: 0.6152, Adjusted R-squared: 0.6124
## F-statistic: 220.6 on 1 and 138 DF, p-value: < 2.2e-16
plot(Spt11Car.ts, xlab="Time", ylab="Car", ylim=c(150, 250), bty="l")
lines(CarLinear$fitted, lwd=2)
CarQuad <- tslm(Spt11Car.ts ~ trend + I(trend^2))
summary(CarQuad)
##
## Call:
## tslm(formula = Spt11Car.ts ~ trend + I(trend^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.503 -8.745 0.798 10.974 23.752
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.745e+02 3.674e+00 47.487 < 2e-16 ***
## trend 3.867e-01 1.203e-01 3.214 0.00163 **
## I(trend^2) 3.956e-04 8.266e-04 0.479 0.63297
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.29 on 137 degrees of freedom
## Multiple R-squared: 0.6158, Adjusted R-squared: 0.6102
## F-statistic: 109.8 on 2 and 137 DF, p-value: < 2.2e-16
lines(CarQuad$fitted, lty=2, lwd=3)

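b. What type of trend appears? Change the scale of the series, add trend lines, and suppress seasonality to better visualize the trend pattern.
The plots above show each pre-event series on a log scale and, on the original scale, with a linear trend line (solid) and a quadratic trend line (dashed) fitted with tslm().
Two types of trend appear in the three series:
The Air and Car series show an upward trend. The quadratic term is not significant at the 5% level for either series (p = 0.069 for Air, p = 0.633 for Car), so a linear trend describes them adequately.
The Rail series shows a downward trend. Its quadratic term is positive and significant (p = 0.014), so the decline flattens over time.
To suppress seasonality, each monthly series is aggregated into quarterly totals: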
# aggregate by quarter for Air
quarterly <- aggregate(Spt11Air.ts, nfrequency=4, FUN=sum)
plot(quarterly, bty="l")

# aggregate by quarter for Rail
quarterly <- aggregate(Spt11Rail.ts, nfrequency=4, FUN=sum)
plot(quarterly, bty="l")

# aggregate by quarter for Car
quarterly <- aggregate(Spt11Car.ts, nfrequency=4, FUN=sum)
plot(quarterly, bty="l")
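Aggregating to quarters is one way to suppress seasonality; a centered 12-month moving average is another. The sketch below is a minimal example of that alternative, assuming the built-in AirPassengers series as a stand-in for the travel data. The names w and ma12 are made up for this example.

```r
# Stand-in monthly series (built-in dataset), NOT the Sept11Travel data
y <- AirPassengers

# Centered 12-month moving average: half weight on the two end months
# so that the window is symmetric around each month
w <- c(0.5, rep(1, 11), 0.5) / 12
ma12 <- stats::filter(y, filter = w, sides = 2)

# Original series with the smoothed trend on top
plot(y, xlab = "Time", ylab = "Passengers", bty = "l")
lines(ma12, lwd = 2)
```

The first and last six months of ma12 are NA because the window does not fit there, so the smoothed line is shorter than the series.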

Chapter 2 - Problem 3:
a. Create a well-formatted time plot of the data.
setwd("C:/Users/Yusuf/Desktop/predictive/")
Ship <- read.csv("ApplianceShipments.csv")
str(Ship)
## 'data.frame': 20 obs. of 2 variables:
## $ Quarter : Factor w/ 20 levels "Q1-1985","Q1-1986",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Shipments: int 4009 4123 4493 4595 4245 4321 4522 4806 4799 4900 ...
head(Ship)
## Quarter Shipments
## 1 Q1-1985 4009
## 2 Q1-1986 4123
## 3 Q1-1987 4493
## 4 Q1-1988 4595
## 5 Q1-1989 4245
## 6 Q2-1985 4321
Ship.ts <- ts(Ship$Shipments, start=c(1985), end=c(1989), freq=4)
plot(Ship.ts, xlab="Time", ylab="Shipments", ylim=c(3500, 5000), bty="l")

quarterly <- aggregate(Ship.ts, nfrequency=4, FUN=sum)
plot(quarterly, bty="l")

library(forecast)
ShipLinear <- tslm(Ship.ts ~ trend)
summary(ShipLinear)
##
## Call:
## tslm(formula = Ship.ts ~ trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -486.03 -202.27 72.75 174.00 474.48
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4417.9926 154.0939 28.67 1.62e-14 ***
## trend 0.7525 15.0380 0.05 0.961
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 303.8 on 15 degrees of freedom
## Multiple R-squared: 0.0001669, Adjusted R-squared: -0.06649
## F-statistic: 0.002504 on 1 and 15 DF, p-value: 0.9608
plot(Ship.ts, xlab="Time", ylab="Shipments", ylim=c(3500, 5000), bty="l")
lines(ShipLinear$fitted, lwd=2)
ShipQuad <- tslm(Ship.ts ~ trend + I(trend^2))
summary(ShipQuad)
##
## Call:
## tslm(formula = Ship.ts ~ trend + I(trend^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -401.42 -100.41 -1.57 152.97 275.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3850.426 172.440 22.329 2.4e-12 ***
## trend 179.984 44.102 4.081 0.001123 **
## I(trend^2) -9.957 2.381 -4.182 0.000923 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 209.7 on 14 degrees of freedom
## Multiple R-squared: 0.5554, Adjusted R-squared: 0.4919
## F-statistic: 8.745 on 2 and 14 DF, p-value: 0.003433
lines(ShipQuad$fitted, lty=2, lwd=3)

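b. Which of the four components (level, trend, seasonality, noise) seem to be in the series?
- Seasonality: none is visible in the series as plotted; the values do not repeat at any regular frequency.
- Trend: there is no linear trend, since the slope of the linear fit is not significant (p = 0.961). The quadratic fit is significant (p = 0.003) with a negative squared term, which points to a rise followed by a decline rather than a steady trend.
- Level: the series has a level, since level refers to the average value of the series and every series has one.
- Noise: there is noise, the irregular variation that shows no periodicity.
These conclusions depend on how the series was built: head(Ship) shows the rows sorted by quarter label (Q1-1985, Q1-1986, ...) rather than by date, and end=c(1989) stops the series at the first quarter of 1989, so only 17 of the 20 values are used (15 residual degrees of freedom in the linear fit).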
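A sketch of how quarterly rows labelled like "Q1-1985" could be put in date order before calling ts(). The data frame Ship_toy and its Shipments values are made up for this example (only the label format is taken from the output above), and the code assumes the series starts in the first quarter of the earliest year.

```r
# Toy data in the same layout as the file: sorted by quarter label, not by date.
# The Shipments values are MADE UP for illustration.
Ship_toy <- data.frame(
  Quarter   = c("Q1-1985", "Q1-1986", "Q2-1985", "Q2-1986",
                "Q3-1985", "Q3-1986", "Q4-1985", "Q4-1986"),
  Shipments = c(100, 110, 120, 130, 140, 150, 160, 170),
  stringsAsFactors = FALSE
)

# Split the label into quarter and year, then sort chronologically
qtr  <- as.integer(substr(as.character(Ship_toy$Quarter), 2, 2))
year <- as.integer(substr(as.character(Ship_toy$Quarter), 4, 7))
Ship_sorted <- Ship_toy[order(year, qtr), ]

# Leaving out 'end' lets ts() keep every observation
Ship_toy.ts <- ts(Ship_sorted$Shipments, start = c(min(year), 1), frequency = 4)
Ship_toy.ts
```

With the real file, the same three steps would give a 20-quarter series, and the trend and seasonality answers would need to be checked again on that plot.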
Shampoo sales (ShampooSales.csv):
a. Create a well-formatted time plot of the data.
setwd("C:/Users/Yusuf/Desktop/predictive/")
Shmpoo <- read.csv("ShampooSales.csv")
str(Shmpoo)
## 'data.frame': 36 obs. of 2 variables:
## $ Month: Factor w/ 36 levels "Apr-95","Apr-96",..: 13 10 22 1 25 19 16 4 34 31 ...
## $ Sales: num 266 146 183 119 180 ...
head(Shmpoo)
## Month Sales
## 1 Jan-95 266.0
## 2 Feb-95 145.9
## 3 Mar-95 183.1
## 4 Apr-95 119.3
## 5 May-95 180.3
## 6 Jun-95 168.5
Shmpoo.ts <- ts(Shmpoo$Sales, start=c(1995,1), end=c(1997,12), freq=12)
plot(Shmpoo.ts, xlab="Time", ylab="Sales", ylim=c(100, 700), bty="l")

quarterly <- aggregate(Shmpoo.ts, nfrequency=4, FUN=sum)
plot(quarterly, bty="l")

library(forecast)
ShmpooLinear <- tslm(Shmpoo.ts ~ trend)
summary(ShmpooLinear)
##
## Call:
## tslm(formula = Shmpoo.ts ~ trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -108.74 -52.12 -16.13 43.48 194.25
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 89.14 26.72 3.336 0.00207 **
## trend 12.08 1.26 9.590 3.37e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 78.5 on 34 degrees of freedom
## Multiple R-squared: 0.7301, Adjusted R-squared: 0.7222
## F-statistic: 91.97 on 1 and 34 DF, p-value: 3.368e-11
plot(Shmpoo.ts, xlab="Time", ylab="Sales", ylim=c(100, 700), bty="l")
lines(ShmpooLinear$fitted, lwd=2)
ShmpooQuad <- tslm(Shmpoo.ts ~ trend + I(trend^2))
summary(ShmpooQuad)
##
## Call:
## tslm(formula = Shmpoo.ts ~ trend + I(trend^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -104.148 -42.075 -8.438 33.924 144.582
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 202.8789 33.3002 6.092 7.35e-07 ***
## trend -5.8801 4.1501 -1.417 0.166
## I(trend^2) 0.4854 0.1088 4.461 8.93e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 62.93 on 33 degrees of freedom
## Multiple R-squared: 0.8316, Adjusted R-squared: 0.8214
## F-statistic: 81.51 on 2 and 33 DF, p-value: 1.708e-13
lines(ShmpooQuad$fitted, lty=2, lwd=3)

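b. Which of the four components (level, trend, seasonality, noise) seem to be in the series?
There is no visible seasonality in the data. There is an upward trend; the quadratic fit (adjusted R-squared 0.82) describes it better than the linear fit (adjusted R-squared 0.72), so sales grow faster toward the end of the series.
The series has a level, since level refers to the average value of the series and every series has one. There is also noise, although it is not very prominent in the plot.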
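c. Do you expect to see seasonality in sales of shampoo? Why?
Not much. Shampoo is an everyday product that people buy throughout the year, so sales should not depend strongly on the month, which agrees with part (b), where no seasonality was seen. Some seasonality could still come from promotions or holiday periods, but three years of monthly data are too few to confirm it.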
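Whether a monthly series has seasonality can also be checked with a regression instead of by eye. The sketch below compares a trend-only model with a trend-plus-month model using an F-test. It is an illustration under assumptions: it runs on the built-in AirPassengers series (known to be seasonal), not on ShampooSales.csv, and the names fit_trend, fit_season and p_season are made up for this example.

```r
# Stand-in monthly series (built-in dataset), NOT the shampoo data
y     <- as.numeric(log(AirPassengers))
t_idx <- seq_along(y)
month <- factor(as.integer(cycle(AirPassengers)))   # 1..12

fit_trend  <- lm(y ~ t_idx)           # trend only
fit_season <- lm(y ~ t_idx + month)   # trend plus monthly dummies

# F-test: do the month dummies add explanatory power?
cmp <- anova(fit_trend, fit_season)
p_season <- cmp[["Pr(>F)"]][2]
p_season
```

A small p-value supports seasonality. With only 36 months, as in the shampoo data, such a test has little power, so a large p-value would not rule seasonality out.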