MBA 678 - Predictive Analytics

Assignment #1 due Feb 06 2017

Yusuf Sultan

 —————————————————————————————————————————————————————————–

Chapter 1 - Problems:

1. Is the goal of this study descriptive or predictive?

The goal of the study is to understand the travel behavior of passengers making long-distance trips before and after September 11, so it involves both types of analytics:

a. It is descriptive because the first purpose is to study the period before September 11. Descriptive analytics looks at data and analyzes past events for insight into how to approach the future.

b. It is predictive because the second purpose is to study the period after September 11. Predictive analytics uses data to estimate the probable future outcome of an event or the likelihood of a situation occurring.

2. What is the forecast horizon to consider in this task? Are next-month forecasts sufficient?

The forecast horizon, denoted k, is the number of periods ahead that we must forecast, and F_{t+k} is a k-step-ahead forecast. In the September 11 study, one-month-ahead forecasts (F_{t+1}) were considered.

Whether that is sufficient depends on the purpose of the study: one-month-ahead forecasts may be enough for setting flexible prices, while three-month-ahead forecasts (F_{t+3}) are more likely to be needed for scheduling and procurement purposes.
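
As a rough illustration of the horizon k, the sketch below fits a straight-line trend to a short made-up monthly series (not the Sept11Travel data) and produces forecasts for k = 1, 2, 3 steps ahead. The series values and the names y, train, fit, and fc are assumptions for this example only.

```r
# Synthetic monthly series for illustration only (not the Sept11Travel data)
y <- ts(c(100, 104, 109, 113, 118, 121, 127, 130, 136, 139, 145, 148),
        start = c(1990, 1), frequency = 12)

# Regress the values on the time index 1, ..., n
train <- data.frame(t_idx = seq_along(y), y = as.numeric(y))
fit <- lm(y ~ t_idx, data = train)

# Forecast horizon: k = 3 periods ahead of the last observed period
k <- 3
future <- data.frame(t_idx = nrow(train) + 1:k)
fc <- predict(fit, newdata = future)
names(fc) <- paste0("F_t+", 1:k)
print(fc)
```

The first element is the one-month-ahead forecast and the third is the three-month-ahead forecast; a longer horizon only changes k.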

 

3. What level of automation does this forecasting task require? Consider the four questions related to automation.

The required level of automation depends on the nature of the forecasting task and on how the forecasts will be used. Four questions help decide:

a. How many series need to be forecasted? Three series need to be forecasted (Air, Rail, and Car).

b. Is the forecasting an ongoing process or a one-time event? The task compares travel before and after September 11, which means the forecasting is treated here as an ongoing process rather than a one-time event.

c. Which data and software will be available during the forecasting period? This is hard to answer in general because many software packages could be used, but since we are studying time series forecasting with R, the September 11 travel data and R are probably what would be available if we were asked to produce the forecasts.

d. What forecasting expertise will be available at the organization during the forecasting period? The expertise needed is a good understanding of the behavior of passengers on long-distance trips, in order to predict which mode of travel is more likely to be used in the future.

4. What is the meaning of t = 1, 2, 3 in the Air series? Which time period does t = 1 refer to?

t is an index denoting the time period of interest, and t = 1 is the first period in the series. In our data (Sept11Travel), t = 1 is January 1990, t = 2 is February 1990, and t = 3 is March 1990 for the Air series. So t = 1 refers to the first period in the series, January 1990.

5. What are the values for y1, y2, and y3 in the Air series?

y1, y2, ..., yn denote a series of n values measured over n time periods. For the Air series:

y1 = 35153577, y2 = 32965187, and y3 = 39993913, which are the values measured at t = 1, t = 2, and t = 3.
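
The mapping between the index t, the calendar period, and the value y_t can be checked directly in R. The sketch below uses only the first six Air values printed by head(Spt11) later in this document; the names air and periods are assumptions for the example.

```r
# First six Air values, copied from the head(Spt11) output in this document
air <- ts(c(35153577, 32965187, 39993913, 37981886, 38419672, 42819023),
          start = c(1990, 1), frequency = 12)

# Map the index t to calendar periods and values y_t
periods <- data.frame(
  t     = seq_along(air),
  year  = as.numeric(floor(time(air) + 1e-8)),
  month = month.abb[as.integer(cycle(air))],
  y     = as.numeric(air),
  stringsAsFactors = FALSE
)
print(periods[1:3, ])
```

Rows 1 to 3 of the printed table correspond to t = 1, 2, 3, that is, January, February, and March 1990.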

———————————————————————————————————————-

 

 

Chapter 2 - Problems:

 

Problem 1

The September 11 study analyzes monthly passenger movement data between January 1990 and April 2004.

1. Plot each of the three pre-event time series (Air, Rail, Car).

Set the working directory, read the data into R, display its structure, and display the first 6 observations.

getwd()
## [1] "C:/Users/Yusuf/Desktop"
setwd("C:/Users/Yusuf/Desktop/predictive/")
Spt11 <- read.csv("Sept11Travel.csv")
str(Spt11)
## 'data.frame':    172 obs. of  4 variables:
##  $ Month: Factor w/ 172 levels "1-Apr","1-Aug",..: 86 75 119 42 130 108 97 53 163 152 ...
##  $ Air  : int  35153577 32965187 39993913 37981886 38419672 42819023 45770315 48763670 38173223 39051877 ...
##  $ Rail : int  454115779 435086002 568289732 568101697 539628385 570694457 618571581 609210368 488444939 514253920 ...
##  $ Car  : num  163 153 178 179 189 ...
head(Spt11)
##    Month      Air      Rail    Car
## 1 Jan-90 35153577 454115779 163.28
## 2 Feb-90 32965187 435086002 153.25
## 3 Mar-90 39993913 568289732 178.42
## 4 Apr-90 37981886 568101697 178.68
## 5 May-90 38419672 539628385 188.88
## 6 Jun-90 42819023 570694457 189.16

Create a time series object for each series and plot the three pre-event time series (Air, Rail, Car).

Series 1 = Air, series 2 = Rail, and series 3 = Car.

Spt11Air.ts <- ts(Spt11$Air, start=c(1990,1), end=c(2001,8), freq=12)
Spt11Rail.ts <- ts(Spt11$Rail, start=c(1990,1), end=c(2001,8), freq=12)
Spt11Car.ts <- ts(Spt11$Car, start=c(1990,1), end=c(2001,8), freq=12)
plot(Spt11Air.ts, xlab="Time", ylab="Air", ylim=c(25000000, 70000000), bty="l")

plot(Spt11Rail.ts, xlab="Time", ylab="Rail", ylim=c(300000000, 700000000), bty="l")

plot(Spt11Car.ts, xlab="Time", ylab="Car", ylim=c(150, 250), bty="l")

a. What time series components appear from the plot?

 

Air series (Seasonality): The first plot shows strong seasonality: a seasonal pattern in air mileage repeats within each year, with a period of one year.

Air series (Trend): The plot shows a strong increasing (upward) trend. (A trend exists when there is a long-term increase or decrease in the data.)

Air series (Level): The series has a level, which is the average value of the series.

Air series (Noise): The series has noise, which is the random variation that results from measurement error.

==========


Rail series (Seasonality): The second plot shows strong seasonality: a seasonal pattern in rail mileage repeats within each year, with a period of one year.

Rail series (Trend): The plot shows a decreasing (downward) trend. (A trend exists when there is a long-term increase or decrease in the data.)

Rail series (Level): The series has a level, which is the average value of the series.

Rail series (Noise): The series has noise, which is the random variation that results from measurement error.
================

Car series (Seasonality): The third plot shows strong seasonality: a seasonal pattern in car mileage repeats within each year, with a period of one year.

Car series (Trend): The plot shows a strong increasing (upward) trend. (A trend exists when there is a long-term increase or decrease in the data.)

Car series (Level): The series has a level, which is the average value of the series.

Car series (Noise): The series has noise, which is the random variation that results from measurement error.
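
The four components can also be separated numerically instead of being read off the plot. The sketch below is one possible approach, using decompose() from base R on a synthetic series with a known trend, yearly seasonality, and noise; the series and the names y and dec are assumptions, and it is not the analysis used for the answers above.

```r
# Synthetic monthly series: level + linear trend + yearly seasonality + noise
set.seed(42)
n <- 72
t_idx <- 1:n
y <- ts(200 + 0.5 * t_idx + 10 * sin(2 * pi * t_idx / 12) + rnorm(n, sd = 2),
        start = c(1990, 1), frequency = 12)

# Classical additive decomposition from base R (stats package)
dec <- decompose(y, type = "additive")

# dec$trend:    smoothed level and trend
# dec$seasonal: repeating yearly pattern
# dec$random:   what is left over (noise)
plot(dec)
round(dec$figure, 1)  # the 12 estimated monthly seasonal effects
```

The same call could be applied to Spt11Air.ts, Spt11Rail.ts, or Spt11Car.ts, since each is a monthly ts object covering more than two full years.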

================================================================

plot(Spt11Air.ts, xlab="Time", ylab="Air", ylim=c(25000000, 70000000), bty="l")

plot(Spt11Rail.ts, xlab="Time", ylab="Rail", ylim=c(300000000, 700000000), bty="l")

plot(Spt11Car.ts, xlab="Time", ylab="Car", ylim=c(150, 250), bty="l")

plot(Spt11Air.ts, log="y", xlab="Time", ylab="Air (log scale)", ylim=c(25000000, 70000000), bty="l")

plot(Spt11Rail.ts, log="y", xlab="Time", ylab="Rail (log scale)", ylim=c(300000000, 700000000), bty="l")

plot(Spt11Car.ts, log="y", xlab="Time", ylab="Car (log scale)", ylim=c(150, 250), bty="l")

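# Log-transform each series and plot it on a linear axis.
# Note: these columns come from the full data frame (all 172 months),
# so the plots below also include the post-event period.
Spt11["logAir"] <- log(Spt11$Air)
plot(Spt11$logAir, type="l", bty="l")

Spt11["logRail"] <- log(Spt11$Rail)
plot(Spt11$logRail, type="l", bty="l")

Spt11["logCar"] <- log(Spt11$Car)
plot(Spt11$logCar, type="l", bty="l")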
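
The reason for trying a log scale is that a series growing by a constant percentage looks curved on a linear axis but straight on a log axis. The sketch below illustrates this with a synthetic series growing 2% per period; the growth rate and the names t_idx and y are assumptions for the example.

```r
# Synthetic series growing 2% per period (illustration only)
t_idx <- 1:60
y <- 100 * 1.02^(t_idx - 1)

# On the original scale the increments keep growing ...
range(diff(y))
# ... but on the log scale every increment equals log(1.02)
range(diff(log(y)))

# Same idea visually: a curve on the linear axis, a straight line on the log axis
par(mfrow = c(1, 2))
plot(t_idx, y, type = "l", bty = "l", main = "Linear scale")
plot(t_idx, y, type = "l", bty = "l", log = "y", main = "Log scale")
```

If a series looks more linear after the log transform, an exponential trend is a plausible description of it.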

library(forecast)
## Warning: package 'forecast' was built under R version 3.3.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.3.2
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.3.2
## This is forecast 7.3
AirLinear <- tslm(Spt11Air.ts ~ trend)
summary(AirLinear)
## 
## Call:
## tslm(formula = Spt11Air.ts ~ trend)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -9466409 -3410590  -681183  3360750 11823514 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 35728435     834749   42.80   <2e-16 ***
## trend         177097      10272   17.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4912000 on 138 degrees of freedom
## Multiple R-squared:  0.6829, Adjusted R-squared:  0.6806 
## F-statistic: 297.2 on 1 and 138 DF,  p-value: < 2.2e-16
plot(Spt11Air.ts, xlab="Time", ylab="Air", ylim=c(25000000, 70000000), bty="l")
lines(AirLinear$fitted, lwd=2)

AirQuad <- tslm(Spt11Air.ts ~ trend + I(trend^2))
summary(AirQuad)
## 
## Call:
## tslm(formula = Spt11Air.ts ~ trend + I(trend^2))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -10706560  -3385499   -494312   3334001  11901573 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.745e+07  1.253e+06  29.897   <2e-16 ***
## trend       1.042e+05  4.102e+04   2.541   0.0122 *  
## I(trend^2)  5.169e+02  2.818e+02   1.834   0.0688 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4870000 on 137 degrees of freedom
## Multiple R-squared:  0.6905, Adjusted R-squared:  0.686 
## F-statistic: 152.8 on 2 and 137 DF,  p-value: < 2.2e-16
lines(AirQuad$fitted, lty=2, lwd=3)

RailLinear <- tslm(Spt11Rail.ts ~ trend)
summary(RailLinear)
## 
## Call:
## tslm(formula = Spt11Rail.ts ~ trend)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -142225897  -40750574   -4370229   41272192  133135874 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 547632872   10511773  52.097  < 2e-16 ***
## trend         -837744     129357  -6.476 1.53e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 61860000 on 138 degrees of freedom
## Multiple R-squared:  0.2331, Adjusted R-squared:  0.2275 
## F-statistic: 41.94 on 1 and 138 DF,  p-value: 1.532e-09
plot(Spt11Rail.ts, xlab="Time", ylab="Rail", ylim=c(300000000, 700000000), bty="l")
lines(RailLinear$fitted, lwd=2)

RailQuad <- tslm(Spt11Rail.ts ~ trend + I(trend^2))
summary(RailQuad)
## 
## Call:
## tslm(formula = Spt11Rail.ts ~ trend + I(trend^2))
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -137634923  -37327572   -1559409   41652351  125112936 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 576828666   15618390   36.93  < 2e-16 ***
## trend        -2071369     511394   -4.05 8.52e-05 ***
## I(trend^2)       8749       3513    2.49    0.014 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 60720000 on 137 degrees of freedom
## Multiple R-squared:  0.2663, Adjusted R-squared:  0.2556 
## F-statistic: 24.86 on 2 and 137 DF,  p-value: 6.14e-10
lines(RailQuad$fitted, lty=2, lwd=3)

CarLinear <- tslm(Spt11Car.ts ~ trend)
summary(CarLinear)
## 
## Call:
## tslm(formula = Spt11Car.ts ~ trend)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.554  -9.185   0.559  11.087  23.272 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 173.17137    2.42094   71.53   <2e-16 ***
## trend         0.44249    0.02979   14.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.25 on 138 degrees of freedom
## Multiple R-squared:  0.6152, Adjusted R-squared:  0.6124 
## F-statistic: 220.6 on 1 and 138 DF,  p-value: < 2.2e-16
plot(Spt11Car.ts, xlab="Time", ylab="Car", ylim=c(150, 250), bty="l")
lines(CarLinear$fitted, lwd=2)

CarQuad <- tslm(Spt11Car.ts ~ trend + I(trend^2))
summary(CarQuad)
## 
## Call:
## tslm(formula = Spt11Car.ts ~ trend + I(trend^2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -33.503  -8.745   0.798  10.974  23.752 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.745e+02  3.674e+00  47.487  < 2e-16 ***
## trend       3.867e-01  1.203e-01   3.214  0.00163 ** 
## I(trend^2)  3.956e-04  8.266e-04   0.479  0.63297    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.29 on 137 degrees of freedom
## Multiple R-squared:  0.6158, Adjusted R-squared:  0.6102 
## F-statistic: 109.8 on 2 and 137 DF,  p-value: < 2.2e-16
lines(CarQuad$fitted, lty=2, lwd=3)

Changing the Scale

First plot the entire time series again.

 

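b. What type of trend appears? Change the scale of the series, add trend lines, and suppress seasonality to better visualize the trend pattern.

Two types of trend appear in the three plots:

The Air and Car series show an upward trend. For both, the linear trend term is highly significant, while the added quadratic term is not significant at the 5% level (p = 0.0688 for Air, p = 0.633 for Car), so a linear trend describes them adequately.

The Rail series shows a downward trend. Its quadratic term is significant (p = 0.014), which suggests that the decline flattens over time, although the trend explains only about a quarter of the variation (R-squared of 0.23 to 0.27).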
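
One way to decide between a linear and a quadratic trend is to fit both and test whether the extra term helps. The sketch below does this with base lm() on a synthetic series; the data and the names fit_lin, fit_quad, and cmp are assumptions, and it only illustrates the comparison rather than reproducing the tslm() fits in this report.

```r
set.seed(1)
t_idx <- 1:120
# Synthetic series with a curved (quadratic) trend plus noise
y <- 50 + 0.2 * t_idx + 0.01 * t_idx^2 + rnorm(length(t_idx), sd = 5)

fit_lin  <- lm(y ~ t_idx)
fit_quad <- lm(y ~ t_idx + I(t_idx^2))

# F-test for whether the quadratic term improves on the linear trend
cmp <- anova(fit_lin, fit_quad)
print(cmp)

# Adjusted R-squared penalizes the extra parameter
c(linear    = summary(fit_lin)$adj.r.squared,
  quadratic = summary(fit_quad)$adj.r.squared)
```

A small p-value in the anova table, together with a higher adjusted R-squared, points to the quadratic trend; otherwise the simpler linear trend is usually preferred.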

# aggregate by quarter for Air
quarterly <- aggregate(Spt11Air.ts, nfrequency=4, FUN=sum)
plot(quarterly, bty="l")

# aggregate by quarter for Rail
quarterly <- aggregate(Spt11Rail.ts, nfrequency=4, FUN=sum)
plot(quarterly, bty="l")

# aggregate by quarter for Car
quarterly <- aggregate(Spt11Car.ts, nfrequency=4, FUN=sum)
plot(quarterly, bty="l")
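
Quarterly totals smooth the monthly pattern but can still differ from quarter to quarter within a year. As a hedged alternative, the sketch below shows two other ways to suppress yearly seasonality on a synthetic monthly series: annual totals and a centered 12-month moving average. The series and the names annual and ma12 are assumptions for the example.

```r
# Synthetic monthly series: upward trend plus a yearly seasonal pattern
t_idx <- 1:48
y <- ts(100 + t_idx + 15 * sin(2 * pi * t_idx / 12),
        start = c(1990, 1), frequency = 12)

# Annual totals: each value covers one full seasonal cycle
annual <- aggregate(y, nfrequency = 1, FUN = sum)

# Centered 12-month moving average (2x12 MA), keeps monthly resolution;
# the first and last six values are NA
ma12 <- stats::filter(y, filter = c(0.5, rep(1, 11), 0.5) / 12, sides = 2)

plot(y, bty = "l", col = "grey")
lines(ma12, lwd = 2)
```

The moving average keeps one value per month, so it is easier to overlay on the original plot than the aggregated series.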

 

Chapter 2 - Problem 3:

a. Create a well-formatted time plot of the data.

setwd("C:/Users/Yusuf/Desktop/predictive/")
Ship <- read.csv("ApplianceShipments.csv")
str(Ship)
## 'data.frame':    20 obs. of  2 variables:
##  $ Quarter  : Factor w/ 20 levels "Q1-1985","Q1-1986",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Shipments: int  4009 4123 4493 4595 4245 4321 4522 4806 4799 4900 ...
head(Ship)
##   Quarter Shipments
## 1 Q1-1985      4009
## 2 Q1-1986      4123
## 3 Q1-1987      4493
## 4 Q1-1988      4595
## 5 Q1-1989      4245
## 6 Q2-1985      4321
Ship.ts <- ts(Ship$Shipments, start=c(1985), end=c(1989), freq=4)
plot(Ship.ts, xlab="Time", ylab="Shipments", ylim=c(3500, 5000), bty="l")

quarterly <- aggregate(Ship.ts, nfrequency=4, FUN=sum)
plot(quarterly, bty="l")

library(forecast)
ShipLinear <- tslm(Ship.ts ~ trend)
summary(ShipLinear)
## 
## Call:
## tslm(formula = Ship.ts ~ trend)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -486.03 -202.27   72.75  174.00  474.48 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4417.9926   154.0939   28.67 1.62e-14 ***
## trend          0.7525    15.0380    0.05    0.961    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 303.8 on 15 degrees of freedom
## Multiple R-squared:  0.0001669,  Adjusted R-squared:  -0.06649 
## F-statistic: 0.002504 on 1 and 15 DF,  p-value: 0.9608
plot(Ship.ts, xlab="Time", ylab="Shipments", ylim=c(3500, 5000), bty="l")
lines(ShipLinear$fitted, lwd=2)

ShipQuad <- tslm(Ship.ts ~ trend + I(trend^2))
summary(ShipQuad)
## 
## Call:
## tslm(formula = Ship.ts ~ trend + I(trend^2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -401.42 -100.41   -1.57  152.97  275.21 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3850.426    172.440  22.329  2.4e-12 ***
## trend        179.984     44.102   4.081 0.001123 ** 
## I(trend^2)    -9.957      2.381  -4.182 0.000923 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 209.7 on 14 degrees of freedom
## Multiple R-squared:  0.5554, Adjusted R-squared:  0.4919 
## F-statistic: 8.745 on 2 and 14 DF,  p-value: 0.003433
lines(ShipQuad$fitted, lty=2, lwd=3)

 

b. Which of the four components (level, trend, seasonality, noise) seem to be in the series?

  • Seasonality: none is visible in the plot, as the values do not repeat at any regular frequency.

  • Trend: the linear trend is essentially flat (its slope is not significant, p = 0.961), which means there is no clear trend in this data.

  • Level: the series has a level, since level refers to the average value of the series and every series has one.

  • Noise: there is noise, since the variation shows no periodicity.
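
One thing worth checking before reading components off the plot: head(Ship) lists Q1-1985, Q1-1986, ..., Q2-1985, so the rows appear to be sorted by quarter label rather than chronologically, and end=c(1989) stops the series at the first quarter of 1989 (17 points, which matches the 15 residual degrees of freedom in the linear fit). The sketch below, on a small made-up data frame with the same label layout, shows one possible way to sort the rows by time and to cover all four quarters of the last year. The values and the name ship_demo are assumptions, not the ApplianceShipments data.

```r
# Made-up data in the same "Qn-YYYY" layout, sorted by label (not by time)
ship_demo <- data.frame(
  Quarter   = c("Q1-1985", "Q1-1986", "Q2-1985", "Q2-1986",
                "Q3-1985", "Q3-1986", "Q4-1985", "Q4-1986"),
  Shipments = c(10, 50, 20, 60, 30, 70, 40, 80),
  stringsAsFactors = FALSE
)

# Extract quarter and year from the label, then sort chronologically
qtr <- as.integer(substr(ship_demo$Quarter, 2, 2))
yr  <- as.integer(substr(ship_demo$Quarter, 4, 7))
ship_demo <- ship_demo[order(yr, qtr), ]

# Give both year and quarter in start/end so the last year is complete
ship_demo.ts <- ts(ship_demo$Shipments, start = c(1985, 1), end = c(1986, 4),
                   frequency = 4)
print(ship_demo.ts)

# For comparison: end = c(1989) alone stops at 1989 Q1
length(ts(1:20, start = c(1985), end = c(1989), frequency = 4))
```

If the real file is sorted the same way, re-plotting after this reordering could change which components are visible, seasonality in particular.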

 

Chapter 2 - Shampoo sales problem:

setwd("C:/Users/Yusuf/Desktop/predictive/")
Shmpoo <- read.csv("ShampooSales.csv")
str(Shmpoo)
## 'data.frame':    36 obs. of  2 variables:
##  $ Month: Factor w/ 36 levels "Apr-95","Apr-96",..: 13 10 22 1 25 19 16 4 34 31 ...
##  $ Sales: num  266 146 183 119 180 ...
head(Shmpoo)
##    Month Sales
## 1 Jan-95 266.0
## 2 Feb-95 145.9
## 3 Mar-95 183.1
## 4 Apr-95 119.3
## 5 May-95 180.3
## 6 Jun-95 168.5
Shmpoo.ts <- ts(Shmpoo$Sales, start=c(1995,1), end=c(1997,12), freq=12)
plot(Shmpoo.ts, xlab="Time", ylab="Sales", ylim=c(100, 700), bty="l")

quarterly <- aggregate(Shmpoo.ts, nfrequency=4, FUN=sum)
plot(quarterly, bty="l")

library(forecast)
ShmpooLinear <- tslm(Shmpoo.ts ~ trend)
summary(ShmpooLinear)
## 
## Call:
## tslm(formula = Shmpoo.ts ~ trend)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -108.74  -52.12  -16.13   43.48  194.25 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    89.14      26.72   3.336  0.00207 ** 
## trend          12.08       1.26   9.590 3.37e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 78.5 on 34 degrees of freedom
## Multiple R-squared:  0.7301, Adjusted R-squared:  0.7222 
## F-statistic: 91.97 on 1 and 34 DF,  p-value: 3.368e-11
plot(Shmpoo.ts, xlab="Time", ylab="Sales", ylim=c(100, 700), bty="l")
lines(ShmpooLinear$fitted, lwd=2)

ShmpooQuad <- tslm(Shmpoo.ts ~ trend + I(trend^2))
summary(ShmpooQuad)
## 
## Call:
## tslm(formula = Shmpoo.ts ~ trend + I(trend^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -104.148  -42.075   -8.438   33.924  144.582 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 202.8789    33.3002   6.092 7.35e-07 ***
## trend        -5.8801     4.1501  -1.417    0.166    
## I(trend^2)    0.4854     0.1088   4.461 8.93e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 62.93 on 33 degrees of freedom
## Multiple R-squared:  0.8316, Adjusted R-squared:  0.8214 
## F-statistic: 81.51 on 2 and 33 DF,  p-value: 1.708e-13
lines(ShmpooQuad$fitted, lty=2, lwd=3)

 

b. Which of the four components (level, trend, seasonality, noise) seem to be in the series?

There is no visible seasonality in the data. There is an upward trend; the significant quadratic term (p = 8.93e-05) suggests that the increase accelerates rather than being strictly linear.

The series has a level, since level refers to the average value of the series and every series has one. There is also noise, although it is not very pronounced visually.

 

 

c. Do you expect to see seasonality in sales of shampoo? Why?

 

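Yes. Seasonality would appear as a pattern in sales that repeats in the same months each year, and identifying it gives an idea of how many sales to expect in each month. In the three years of data plotted above, however, no clear seasonal pattern is visible.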
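
Whether a monthly series really has seasonality can be checked more formally than by eye. The sketch below is one possible check on a synthetic series: it compares a trend-only regression with one that adds monthly dummy variables. The data and the names sales, d, fit_trend, fit_season, and cmp are assumptions; this was not run on ShampooSales.csv.

```r
set.seed(7)
t_idx <- 1:36
# Synthetic monthly sales with a trend and a yearly seasonal pattern
sales <- ts(100 + 3 * t_idx + 20 * sin(2 * pi * t_idx / 12) + rnorm(36, sd = 5),
            start = c(1995, 1), frequency = 12)

d <- data.frame(y = as.numeric(sales), t_idx = t_idx,
                month = factor(as.integer(cycle(sales))))

fit_trend  <- lm(y ~ t_idx, data = d)
fit_season <- lm(y ~ t_idx + month, data = d)

# F-test: do the 11 monthly dummies add explanatory power beyond the trend?
cmp <- anova(fit_trend, fit_season)
print(cmp)
```

A small p-value would support seasonality; with only three years of data, each month is observed just three times, so the result should be read cautiously.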