1 Visualization

1.1 univariate plot

plot(lydia_ts[, "Prod_sales"], col = "red", lty = 2, lwd = 2, ylab = " ", ylim = c(min(lydia_ts), 
    max(lydia_ts)))
lines(lydia_ts[, "Advert_cost"], type = "l", col = "blue", lwd = 2)
legend("topright", c("Advert_cost", "Prod_sales"), col = c("blue", "red"), lty = c(1, 
    2), lwd = c(2, 2), bty = "n")
title("Lydia Pinkham’s Vegetable Compund : \n Advertisment Cost and Product Sales per 1000 USD in scale")
grid()

1.2 bivariate plot

plot((lydia$Advert_cost), (lydia$Prod_sales), lwd = 2, col = "red", xlab = "Advert_cost", 
    ylab = "Prod_sales")
title("Lydia Pinkham’s Vegetable Compund : \n  Product Sales Vs Advertisment Cost  per 1000 USD in scale")
grid()

2 unit root test: Prod_sales

Unitroot_None_Prod_sales = ur.df(lydia$Prod_sales, type = "none", lags = 1)

2.1 type = “none”

summary(Unitroot_None_Prod_sales)


############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression none 


Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)

Residuals:
    Min      1Q  Median      3Q     Max 
-526.10 -102.99   29.54  147.78  585.76 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
z.lag.1    -0.009589   0.015507  -0.618   0.5392   
z.diff.lag  0.437487   0.129116   3.388   0.0014 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 217.7 on 49 degrees of freedom
Multiple R-squared:  0.1914,    Adjusted R-squared:  0.1584 
F-statistic: 5.798 on 2 and 49 DF,  p-value: 0.005494


Value of test-statistic is: -0.6183 

Critical values for test statistics: 
     1pct  5pct 10pct
tau1 -2.6 -1.95 -1.61

2.2 type = “drift”

Unitroot_Drift_Prod_sales = ur.df(lydia$Prod_sales, type = "drift", lags = 1)
summary(Unitroot_Drift_Prod_sales)


############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression drift 


Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)

Residuals:
    Min      1Q  Median      3Q     Max 
-431.24 -144.71  -17.14  125.07  501.25 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 218.33670   94.34711   2.314 0.024983 *  
z.lag.1      -0.11512    0.04796  -2.400 0.020312 *  
z.diff.lag    0.47453    0.12476   3.803 0.000403 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 208.6 on 48 degrees of freedom
Multiple R-squared:  0.2719,    Adjusted R-squared:  0.2416 
F-statistic: 8.963 on 2 and 48 DF,  p-value: 0.0004926


Value of test-statistic is: -2.4002 2.8859 

Critical values for test statistics: 
      1pct  5pct 10pct
tau2 -3.51 -2.89 -2.58
phi1  6.70  4.71  3.86

2.3 type = “trend”

Unitroot_Trend_Prod_sales = ur.df(lydia$Prod_sales, type = "trend", lags = 1)
summary(Unitroot_Trend_Prod_sales)


############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression trend 


Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
    Min      1Q  Median      3Q     Max 
-414.18 -125.31   -2.19  123.87  479.60 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 254.27187  104.72720   2.428 0.019067 *  
z.lag.1      -0.11045    0.04849  -2.278 0.027327 *  
tt           -1.64719    2.05003  -0.803 0.425734    
z.diff.lag    0.45172    0.12841   3.518 0.000977 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 209.4 on 47 degrees of freedom
Multiple R-squared:  0.2818,    Adjusted R-squared:  0.2359 
F-statistic: 6.146 on 3 and 47 DF,  p-value: 0.001297


Value of test-statistic is: -2.2778 2.1249 3.1821 

Critical values for test statistics: 
      1pct  5pct 10pct
tau3 -4.04 -3.45 -3.15
phi2  6.50  4.88  4.16
phi3  8.73  6.49  5.47

2.4 Sub-Result

The above three output results suggested that Prod_sales is integrated without drift and without trend.

3 unit root test: Advert_cost

3.1 type = “none”

Unitroot_None_Advert_cost = ur.df(lydia$Advert_cost, type = "none", lags = 1)
summary(Unitroot_None_Advert_cost)


############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression none 


Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)

Residuals:
    Min      1Q  Median      3Q     Max 
-673.54  -30.12   38.32  162.82  428.63 

Coefficients:
           Estimate Std. Error t value Pr(>|t|)
z.lag.1    -0.02489    0.03091  -0.805    0.425
z.diff.lag  0.06982    0.14269   0.489    0.627

Residual standard error: 224.7 on 49 degrees of freedom
Multiple R-squared:  0.01619,   Adjusted R-squared:  -0.02397 
F-statistic: 0.4031 on 2 and 49 DF,  p-value: 0.6704


Value of test-statistic is: -0.8051 

Critical values for test statistics: 
     1pct  5pct 10pct
tau1 -2.6 -1.95 -1.61

3.2 type = “drift”

Unitroot_Drift_Advert_cost = ur.df(lydia$Advert_cost, type = "drift", lags = 1)
summary(Unitroot_Drift_Advert_cost)


############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression drift 


Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)

Residuals:
    Min      1Q  Median      3Q     Max 
-588.47 -104.31   -8.37  139.67  415.91 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) 204.48493   87.47618   2.338   0.0236 *
z.lag.1      -0.21343    0.08591  -2.484   0.0165 *
z.diff.lag    0.15632    0.14153   1.105   0.2749  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 215.2 on 48 degrees of freedom
Multiple R-squared:  0.1167,    Adjusted R-squared:  0.07993 
F-statistic: 3.172 on 2 and 48 DF,  p-value: 0.05084


Value of test-statistic is: -2.4842 3.0858 

Critical values for test statistics: 
      1pct  5pct 10pct
tau2 -3.51 -2.89 -2.58
phi1  6.70  4.71  3.86

3.3 type = “trend”

Unitroot_Trend_Advert_cost = ur.df(lydia$Advert_cost, type = "trend", lags = 1)
summary(Unitroot_Trend_Advert_cost)


############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression trend 


Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
    Min      1Q  Median      3Q     Max 
-589.89 -116.83   14.77  139.87  413.39 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) 241.7269   101.6749   2.377   0.0216 *
z.lag.1      -0.2096     0.0865  -2.423   0.0193 *
tt           -1.5140     2.0770  -0.729   0.4697  
z.diff.lag    0.1420     0.1436   0.989   0.3278  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 216.2 on 47 degrees of freedom
Multiple R-squared:  0.1266,    Adjusted R-squared:  0.07086 
F-statistic: 2.271 on 3 and 47 DF,  p-value: 0.09252


Value of test-statistic is: -2.4229 2.2143 3.3213 

Critical values for test statistics: 
      1pct  5pct 10pct
tau3 -4.04 -3.45 -3.15
phi2  6.50  4.88  4.16
phi3  8.73  6.49  5.47

3.4 Sub-Result

The above three output results suggested that log(Advert_cost) is integrated without drift and without trend.

4 Cross-correlation Check

par(mar = c(5.1, 4.1, 6.1, 2.1))
ccf(as.numeric(lydia_demean$Prod_sales), as.numeric(as.numeric(lydia_demean$Advert_cost)), 
    main = "Cross-correlogram without pre-whitening \n Prod_sales & Advert_cost \n Demeaned", 
    ylab = "CCF")

The plot indicated that Prod_sales and Advert_cost have very strong cross correlations with their lags and leads

5 Case I: Advert_cost is granger-caused to Prod_sale

5.1 Prewhiting series before Linear Transfer Function Identification

Prewhiting \(\mathrm{\bigtriangleup Advert\_cost }\)

fit_diff_Advert_cost_demean = auto.arima(diff(lydia_demean$Advert_cost), max.p = 10, 
    max.q = 0, max.order = 10, max.P = 10, max.Q = 0)
fit_diff_Advert_cost_demean

Series: diff(lydia_demean$Advert_cost) 
ARIMA(2,0,0) with zero mean 

Coefficients:
         ar1      ar2
      0.0778  -0.3912
s.e.  0.1266   0.1240

sigma^2 estimated as 42036:  log likelihood=-349.74
AIC=705.47   AICc=705.97   BIC=711.33

checkresiduals(fit_diff_Advert_cost_demean)


    Ljung-Box test

data:  Residuals from ARIMA(2,0,0) with zero mean
Q* = 13.129, df = 8, p-value = 0.1075

Model df: 2.   Total lags used: 10

residuals and coefficients of this model

fit_diff_Advert_cost_demean$residuals

Time Series:
Start = 1 
End = 52 
Frequency = 1 
 [1]   71.669843    8.869368   11.428146   30.878066  -32.909922   64.257420
 [7]   17.486106  -86.675857  268.298732 -199.377962  356.845443  -69.758262
[13]  247.109943  333.893583  153.921011  251.097152  229.928477  175.358052
[19] -647.851185  254.566531  -51.774360   -5.178687 -488.536761   91.694784
[25]  173.216919   43.980291 -541.729645 -393.813085  -13.286238  -17.456374
[31]   77.009560  184.287461  164.772532   50.827995  175.738763  -62.734044
[37]   90.861655 -160.603289 -148.827607   66.658248  -37.029942   30.968843
[43] -191.805345  167.445615  -49.362596  -96.171158    7.119832  -45.149400
[49]  -41.619017 -123.423852    2.673045 -131.642761

fit_diff_Advert_cost_demean$coef

        ar1         ar2 
 0.07780944 -0.39124972

Filter the \(\bigtriangleup \mathrm{Prod\_sales}\) series using the \(\bigtriangleup \mathrm{Advert\_cost}\) Model

y = diff(lydia_demean$Prod_sales)
x = diff(lydia_demean$Advert_cost)
ytilde = filter(y, filter = c(1, -fit_diff_Advert_cost_demean$coef), method = "convolution", 
    sides = 1)
w = filter(x, filter = c(1, -fit_diff_Advert_cost_demean$coef), method = "convolution", 
    sides = 1)
n = length(y)
ytilde1 = ytilde[seq(length(fit_diff_Advert_cost_demean$coef) + 1, n)]
w1 = w[seq(length(fit_diff_Advert_cost_demean$coef) + 1, n)]
ccf(ytilde1, w1, ylab = "CCF")

y0 = y[seq(length(fit_diff_Advert_cost_demean$coef) + 2 - 0, n - 0)]
y1 = y[seq(length(fit_diff_Advert_cost_demean$coef) + 2 - 1, n - 1)]
y2 = y[seq(length(fit_diff_Advert_cost_demean$coef) + 2 - 2, n - 2)]
y3 = y[seq(length(fit_diff_Advert_cost_demean$coef) + 2 - 3, n - 3)]
x0 = x[seq(length(fit_diff_Advert_cost_demean$coef) + 2 - 0, n - 0)]
x1 = x[seq(length(fit_diff_Advert_cost_demean$coef) + 2 - 1, n - 1)]
x2 = x[seq(length(fit_diff_Advert_cost_demean$coef) + 2 - 2, n - 2)]

5.2 result_11

r_11 = lm(y0 ~ x1 + x2 + y1 + y2 + y3 - 1)
summary(r_11)


Call:
lm(formula = y0 ~ x1 + x2 + y1 + y2 + y3 - 1)

Residuals:
    Min      1Q  Median      3Q     Max 
-567.76 -124.21   13.80   83.87  537.86 

Coefficients:
   Estimate Std. Error t value Pr(>|t|)   
x1 -0.04794    0.18452  -0.260  0.79623   
x2 -0.32144    0.17357  -1.852  0.07075 . 
y1  0.50366    0.16959   2.970  0.00481 **
y2  0.02754    0.19113   0.144  0.88611   
y3  0.10475    0.15516   0.675  0.50314   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 219.7 on 44 degrees of freedom
Multiple R-squared:  0.2597,    Adjusted R-squared:  0.1756 
F-statistic: 3.088 on 5 and 44 DF,  p-value: 0.01787

5.3 result_12

r_12 = lm(y0 ~ x2 + y1 - 1)
summary(r_12)


Call:
lm(formula = y0 ~ x2 + y1 - 1)

Residuals:
    Min      1Q  Median      3Q     Max 
-534.64 -131.59    8.80   92.47  536.04 

Coefficients:
   Estimate Std. Error t value Pr(>|t|)    
x2  -0.2676     0.1390  -1.925 0.060234 .  
y1   0.4878     0.1302   3.747 0.000489 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 214.6 on 47 degrees of freedom
Multiple R-squared:  0.2454,    Adjusted R-squared:  0.2133 
F-statistic: 7.641 on 2 and 47 DF,  p-value: 0.001339

checkresiduals(r_12)


    Breusch-Godfrey test for serial correlation of order up to 10

data:  Residuals
LM test = 4.2839, df = 10, p-value = 0.9336

par(mfrow = c(2, 2))
plot(r_12)

5.4 Conclusion: Case I

Referring to statistical result_12, it found that

6 Case II: Prod_sale is granger-caused to Advert_cost

Prewhiting \(\mathrm{\bigtriangleup Prod\_sale }\)

fit_diff_Prod_sales_demean = auto.arima(diff(lydia_demean$Prod_sales), max.p = 20, 
    max.q = 0, max.order = 10, max.P = 10, max.Q = 0)
fit_diff_Prod_sales_demean

Series: diff(lydia_demean$Prod_sales) 
ARIMA(1,0,0) with zero mean 

Coefficients:
         ar1
      0.4228
s.e.  0.1237

sigma^2 estimated as 45891:  log likelihood=-352.46
AIC=708.93   AICc=709.17   BIC=712.83

checkresiduals(fit_diff_Prod_sales_demean)


    Ljung-Box test

data:  Residuals from ARIMA(1,0,0) with zero mean
Q* = 6.9285, df = 9, p-value = 0.6446

Model df: 1.   Total lags used: 10

residuals and coefficients of this model

fit_diff_Prod_sales_demean$residuals

Time Series:
Start = 1 
End = 52 
Frequency = 1 
 [1]   11.78078   36.50334  -63.75844  141.44972   80.41596 -150.81224
 [7]   38.16790   67.84564  147.67106  575.58368  -31.83300 -122.74526
[13]  319.45640   80.50298  369.36216  -28.07438   16.81188 -557.78534
[19] -337.71078  116.93356    6.31558  -66.39592 -269.06030  -33.03990
[25]  238.49684 -115.87940 -233.39592 -308.44936  338.47030  138.08034
[31] -137.52374  365.14100  248.54992    8.40892  192.00650 -196.47012
[37]  154.51688 -510.31558  -62.50280   98.66474   78.22820 -228.28868
[43]  -14.70446  218.43636  -44.83914 -224.68460   38.63784   45.56382
[49]  -98.14768 -141.79184   72.68478  -96.73154

fit_diff_Prod_sales_demean$coef

    ar1 
0.42282

Filter the \(\bigtriangleup \mathrm{Advert\_cost}\) series using the \(\bigtriangleup \mathrm{Prod\_sales}\) Model

y = diff(lydia_demean$Advert_cost)
x = diff(lydia_demean$Prod_sales)

ytilde = filter(y, filter = c(1, -fit_diff_Prod_sales_demean$coef), method = "convolution", 
    sides = 1)
w = filter(x, filter = c(1, -fit_diff_Prod_sales_demean$coef), method = "convolution", 
    sides = 1)
n = length(y)
ytilde1 = ytilde[seq(length(fit_diff_Prod_sales_demean$coef) + 1, n)]
w1 = w[seq(length(fit_diff_Prod_sales_demean$coef) + 1, n)]
ccf(ytilde1, w1, ylab = "CCF")

y0 = y[seq(length(fit_diff_Prod_sales_demean$coef) + 2 - 0, n - 0)]
y1 = y[seq(length(fit_diff_Prod_sales_demean$coef) + 2 - 1, n - 1)]
y2 = y[seq(length(fit_diff_Prod_sales_demean$coef) + 2 - 2, n - 2)]
x0 = x[seq(length(fit_diff_Prod_sales_demean$coef) + 2 - 0, n - 0)]
x1 = x[seq(length(fit_diff_Prod_sales_demean$coef) + 2 - 1, n - 1)]

6.1 result_21

r_21 = lm(y0 ~ x1 + y1 + y2 - 1)
summary(r_21)


Call:
lm(formula = y0 ~ x1 + y1 + y2 - 1)

Residuals:
    Min      1Q  Median      3Q     Max 
-481.59  -89.52    9.07  114.20  340.10 

Coefficients:
   Estimate Std. Error t value Pr(>|t|)    
x1   0.5456     0.1367   3.991 0.000229 ***
y1  -0.2546     0.1425  -1.787 0.080414 .  
y2  -0.5172     0.1186  -4.360 7.04e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 182.5 on 47 degrees of freedom
Multiple R-squared:  0.3777,    Adjusted R-squared:  0.3379 
F-statistic: 9.508 on 3 and 47 DF,  p-value: 5.091e-05

6.2 Conclusion: Case II

Referring to statistical result_21, it found that

7 Summary

With prewhitening series technique, the linear transfer function of the given data suggest two kinds granger causality between Advertisment Cost and Product Sales and listed the computational results below :

Assignment_05

DKCH

2020-05-10