Libraries to use.

1 Probability and Statistics

1.1 Sampling Variability

A dataset containing Height and Weight information for 1000 male and female. Get the data from https://www.kaggle.com/mustafaali96/weight-height and place in the same folder as the RMarkdown script.

    Gender         Height          Weight      
 Female:5000   Min.   :137.8   Min.   : 29.35  
 Male  :5000   1st Qu.:161.3   1st Qu.: 61.61  
               Median :168.4   Median : 73.12  
               Mean   :168.6   Mean   : 73.23  
               3rd Qu.:175.7   3rd Qu.: 84.90  
               Max.   :200.7   Max.   :122.47  

Population consisting of 10000 members. The mean for the population height is

[1] 168.5736

Take a simple random sample of size 10.

Find its mean.

[1] 162.384

Repeat this 1000 times.

A few of the Mean Heights
heightSM
172.909
170.725
163.015
170.169
167.784
172.352

Plot a histogram of the repeated resampling procedure.

1.2 Simple Random Sampling

Take a sample of size n from a vector x.

 [1] 40  2 24 58 37 48 16 56 13 42  5 23

1.3 Basic Statistical Visualisation

Produce a histogram of height.

Histogram for height with density on the y-axis and a smooth curve. Adjusted so the histogram’s area (i.e., the total area defined by the boxes) equals one.


  The decimal point is 1 digit(s) to the right of the |

  13 | 899
  14 | 011122222233333444444444444444
  14 | 55555555555555555666666666666666667777777777777777777777777777888888+73
  15 | 00000000000000000000000000000000000000000000000000000000000000111111+462
  15 | 55555555555555555555555555555555555555555555555555555555555555555555+1139
  16 | 00000000000000000000000000000000000000000000000000000000000000000000+1571
  16 | 55555555555555555555555555555555555555555555555555555555555555555555+1695
  17 | 00000000000000000000000000000000000000000000000000000000000000000000+1651
  17 | 55555555555555555555555555555555555555555555555555555555555555555555+1360
  18 | 00000000000000000000000000000000000000000000000000000000000000000000+880
  18 | 55555555555555555555555555555555555555555555555555555555555555555555+297
  19 | 00000000000000000000000000000000001111111111111111111112222222222222+17
  19 | 55555555556666777899
  20 | 01

1.3.1 Dataset: Bear Information

Scatter plots for pairs.

Correlation plots.

1.3.2 Dataset Iris

The Iris flower data set or Fisher’s Iris data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 https://en.wikipedia.org/wiki/Iris_flower_data_set. The dataset contains a set of 150 records under five attributes - petal length, petal width, sepal length, sepal width and species.

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

1.4 Populations/Samples Parameters

x
 blue green   red 
  606   284   110 
x
 blue green   red 
0.606 0.284 0.110 
y
 blue green   red 
   79    12     9 
y
 blue green   red 
 0.79  0.12  0.09 
 [1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1  0.0  0.1  0.2  0.3
[15]  0.4  0.5  0.6  0.7  0.8  0.9  1.0
[1] 5.283162e-17
[1] 0.385
[1] 0.6204837
[1] 0.3666667
[1] TRUE

1.5 Population/Sample Variance

A few of the Sample/Population Variance Heights
heightSV heightPV
258.72013 206.97610
37.39507 29.91606
158.52195 126.81756
114.71627 91.77302
83.59060 66.87248
87.68842 70.15074

Plot a histogram of the repeated resampling procedure.

Population variance is biased for samples, this is shown by the red line lagging behind the blue line. The True variance and sample variance almost match more closely.

     computation.mean grp.mean
1 Population Variance 76.45993
2     Sample Variance 95.57491
3       True Variance 95.49720

1.6 Probability

Factorial

[1] 24

Combination of \(k\) units out of \(n\).

[1] 6

To actually have the combinations and permutations use gtools library

     [,1] [,2]
[1,]    1    2
[2,]    1    3
[3,]    1    4
[4,]    2    3
[5,]    2    4
[6,]    3    4
      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    2    1
 [5,]    2    3
 [6,]    2    4
 [7,]    3    1
 [8,]    3    2
 [9,]    3    4
[10,]    4    1
[11,]    4    2
[12,]    4    3

For Exe 10.

[1] 0.001085973

2 Random Variables

 [1] 1 0 0 0 0 0 1 1 0 0
 [1] 0.69601752 0.78701285 0.88207892 0.34780817 0.20874892 0.01817447
 [7] 0.12665581 0.97636847 0.61139940 0.60878280

2.1 Distributions

  • p for “probability”, the cumulative distribution function (c. d. f.)
  • q for “quantile”, the inverse c. d. f.
  • d for “density”, the density function (p. f. or p. d. f.)
  • r for “random”, a random variable having the specified distribution

Distribution Functions
Beta pbeta qbeta dbeta rbeta
Binomial pbinom qbinom dbinom rbinom
Cauchy pcauchy qcauchy dcauchy rcauchy
Chi-Square pchisq qchisq dchisq rchisq
Exponential pexp qexp dexp rexp
F pf qf df rf
Gamma pgamma qgamma dgamma rgamma
Geometric pgeom qgeom dgeom rgeom
Hypergeometric phyper qhyper dhyper rhyper
Logistic plogis qlogis dlogis rlogis
Log Normal plnorm qlnorm dlnorm rlnorm
Negative Binomial pnbinom qnbinom dnbinom rnbinom
Normal pnorm qnorm dnorm rnorm
Poisson ppois qpois dpois rpois
Student t pt qt dt rt
Studentized Range ptukey qtukey dtukey rtukey
Uniform punif qunif dunif runif
Weibull pweibull qweibull dweibull rweibull
Wilcoxon Rank Sum Statistic pwilcox qwilcox dwilcox rwilcox
Wilcoxon Signed Rank Statistic psignrank qsignrank dsignrank rsignrank

[1] 0.1733431
[1] 0.9502129
[1] 0.9502129
[1] 0.7768698
[1] 0.7768698

2.2 Discrete

Binomial Bin(15, 0.7)

2.3 Continous

Normal N(0.5, 1)

2.4 Integrate

2.300524 with absolute error < 0.00011
1 with absolute error < 1.1e-14

3 Pearson’s Correlation and Regression

3.1 Correlation

[1] 1.090688
[1] 0.4309702

    Pearson's product-moment correlation

data:  x and y
t = 6.7204, df = 198, p-value = 1.885e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3108137 0.5375682
sample estimates:
      cor 
0.4309702 

Call:
lm(formula = weight ~ height, data = df1)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.6909 -1.3864  0.1708  1.4898  8.3337 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.4157     0.8338   11.29  < 2e-16 ***
height        1.0907     0.1623    6.72 1.89e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.259 on 198 degrees of freedom
Multiple R-squared:  0.1857,    Adjusted R-squared:  0.1816 
F-statistic: 45.16 on 1 and 198 DF,  p-value: 1.885e-10
Analysis of Variance Table

Response: weight
           Df  Sum Sq Mean Sq F value    Pr(>F)    
height      1  230.38 230.379  45.164 1.885e-10 ***
Residuals 198 1009.98   5.101                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


    Shapiro-Wilk normality test

data:  linearMod$residuals
W = 0.99191, p-value = 0.3324
-----------------------------------------------
       Test             Statistic       pvalue  
-----------------------------------------------
Shapiro-Wilk              0.9919         0.3324 
Kolmogorov-Smirnov        0.0407         0.8948 
Cramer-von Mises         11.5763         0.0000 
Anderson-Darling          0.3397         0.4957 
-----------------------------------------------

 F Test for Heteroskedasticity
 -----------------------------
 Ho: Variance is homogenous
 Ha: Variance is not homogenous

 Variables: fitted values of weight 

      Test Summary        
 -------------------------
 Num DF     =    1 
 Den DF     =    198 
 F          =    2.713106 
 Prob > F   =    0.1011132 

 Score Test for Heteroskedasticity
 ---------------------------------
 Ho: Variance is homogenous
 Ha: Variance is not homogenous

 Variables: fitted values of weight 

        Test Summary         
 ----------------------------
 DF            =    1 
 Chi2          =    2.703467 
 Prob > Chi2   =    0.1001303 

In 1973, Frank Anscombe published a set of examples showing situations where the results of r could be misleading. (Anscombe FJ, “Graphs in Statistical Analysis,” The American Statistician, 27, 17-21).

The examples involves the analysis of four sets of paired data where all of them will give the same value for r. However, the scatterplot of each set reveals how different the relationships between pairs of variables are.

3.2 Regression


Call:
lm(formula = v ~ x, data = datareg)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2332 -0.4799  0.1090  0.6271  1.6003 

Coefficients:
            Estimate Std. Error t value    Pr(>|t|)    
(Intercept)  -0.2061     0.4459  -0.462       0.649    
x             1.1516     0.1525   7.552 0.000000551 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.035 on 18 degrees of freedom
Multiple R-squared:  0.7601,    Adjusted R-squared:  0.7468 
F-statistic: 57.04 on 1 and 18 DF,  p-value: 0.000000551
Analysis of Variance Table

Response: v
          Df Sum Sq Mean Sq F value      Pr(>F)    
x          1 61.071  61.071  57.037 0.000000551 ***
Residuals 18 19.273   1.071                        
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.3 Model Checking


    Shapiro-Wilk normality test

data:  linearMod$residuals
W = 0.9782, p-value = 0.9087
[1] 50
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22

Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-29.069  -9.525  -2.272   9.215  43.201 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -17.5791     6.7584  -2.601   0.0123 *  
speed         3.9324     0.4155   9.464 1.49e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared:  0.6511,    Adjusted R-squared:  0.6438 
F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
Analysis of Variance Table

Response: dist
          Df Sum Sq Mean Sq F value   Pr(>F)    
speed      1  21186 21185.5  89.567 1.49e-12 ***
Residuals 48  11354   236.5                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.4 Residuals


    Shapiro-Wilk normality test

data:  linearMod$residuals
W = 0.94509, p-value = 0.02152
  youtube facebook newspaper sales
1  276.12    45.36     83.04 26.52
2   53.40    47.16     54.12 12.48
3   20.64    55.08     83.16 11.16
4  181.80    49.56     70.20 22.20
5  216.96    12.96     70.08 15.48

Call:
lm(formula = sales ~ youtube + facebook, data = marketing)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.5572  -1.0502   0.2906   1.4049   3.3994 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.50532    0.35339   9.919   <2e-16 ***
youtube      0.04575    0.00139  32.909   <2e-16 ***
facebook     0.18799    0.00804  23.382   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.018 on 197 degrees of freedom
Multiple R-squared:  0.8972,    Adjusted R-squared:  0.8962 
F-statistic: 859.6 on 2 and 197 DF,  p-value: < 2.2e-16


    Shapiro-Wilk normality test

data:  model$residuals
W = 0.91804, p-value = 0.00000000419

3.5 olsrr library Model Checking

olsrr offers tools for detecting violation of standard regression assumptions. Here we take a look at residual diagnostics. The standard regression assumptions include the following about residuals/errors:

  • The error has a normal distribution (normality assumption).

  • The errors have mean zero.

  • The errors have same but unknown variance (homoscedasticity assumption).

  • The error are independent of each other (independent errors assumption).

Graph for detecting violation of normality assumption.

Residual Normality Test Test for detecting violation of normality assumption.

-----------------------------------------------
       Test             Statistic       pvalue  
-----------------------------------------------
Shapiro-Wilk              0.918          0.0000 
Kolmogorov-Smirnov        0.1253         0.0037 
Cramer-von Mises          9.459          0.0000 
Anderson-Darling          3.6974         0.0000 
-----------------------------------------------

Correlation between observed residuals and expected residuals under normality.

[1] 0.9564499

Residual vs Fitted Values Plot It is a scatter plot of residuals on the y axis and fitted values on the x axis to detect non-linearity, unequal error variances, and outliers.

Characteristics of a well behaved residual vs fitted plot:

  • The residuals spread randomly around the 0 line indicating that the relationship is linear.

  • The residuals form an approximate horizontal band around the 0 line indicating homogeneity of error variance.

  • No one residual is visibly away from the random pattern of the residuals indicating that there are no outliers.

Residual Histogram

Histogram of residuals for detecting violation of normality assumption.

3.6 Prediction

       fit      lwr      upr
1 5.934320 5.343442 6.525197
2 7.946552 7.418504 8.474599
3 9.115297 8.610423 9.620171

4 Approximation

[1] -9.238514
[1] -1.959964
[1] 2.833333
                2.5 %   97.5 %
(Intercept) 0.9077786 4.758888

5 Time Series

5.1 Creating Time Series.

          Qtr1      Qtr2      Qtr3      Qtr4
2010            1.995615  1.763062  1.452173
2011  4.779934  5.718965  8.779271  8.389820
2012 10.365962  5.398546  5.021821  6.601211
2013  8.987069 10.687979 13.110406 10.572172
2014  8.693758  7.936813 11.620355 11.202458
2015 18.759441 12.399895 13.134812 11.559743
2016 14.003463 13.301188 15.671638 14.750984
2017 15.016441 11.195322 14.881908          
Time Series:
Start = 1980 
End = 2014 
Frequency = 1 
 [1]  1.995615  1.763062  1.452173  4.779934  5.718965  8.779271  8.389820
 [8] 10.365962  5.398546  5.021821  6.601211  8.987069 10.687979 13.110406
[15] 10.572172  8.693758  7.936813 11.620355 11.202458 18.759441 12.399895
[22] 13.134812 11.559743 14.003463 13.301188 15.671638 14.750984 15.016441
[29] 11.195322 14.881908  1.995615  1.763062  1.452173  4.779934  5.718965
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
                [,1]
2016-01-01  1.995615
2016-01-02  1.763062
2016-01-03  1.452173
2016-01-04  4.779934
2016-01-05  5.718965
2016-01-06  8.779271
2016-01-07  8.389820
2016-01-08 10.365962
2016-01-09  5.398546
2016-01-10  5.021821
2016-01-11  6.601211
2016-01-12  8.987069
2016-01-13 10.687979
2016-01-14 13.110406
2016-01-15 10.572172
2016-01-16  8.693758
2016-01-17  7.936813
2016-01-18 11.620355
2016-01-19 11.202458
2016-01-20 18.759441
2016-01-21 12.399895
2016-01-22 13.134812
2016-01-23 11.559743
2016-01-24 14.003463
2016-01-25 13.301188
2016-01-26 15.671638
2016-01-27 14.750984
2016-01-28 15.016441
2016-01-29 11.195322
2016-01-30 14.881908

5.3 Visualisation

5.3.2 ggplot

5.3.3 Plotly

5.3.4 dygraphs

5.4 Data For Example

5.4.1 Lag, Difference, and Autocorrelation

           Jan       Feb       Mar       Apr       May       Jun       Jul
2009                                                                      
2010  1.000000 -1.839818 -3.017105 -1.842691  5.234162  9.841046  8.316776
2011 22.237127 16.637862  3.658659  7.686553 12.333570 17.485469 24.962183
2012 25.156549 26.243030 23.531001 20.708705 11.434616 12.531440 22.490719
2013 24.744799 24.425151 24.882243 36.510455 46.747396 47.942016 47.605089
2014 56.182945                                                            
           Aug       Sep       Oct       Nov       Dec
2009                                          0.000000
2010  6.511892  6.177134  8.671431 11.451401 19.952744
2011 28.333312 28.007451 30.626409 35.790274 38.104631
2012 37.753707 36.636157 36.766465 35.771098 25.376246
2013 56.433489 61.217564 48.260981 35.207093 40.722804
2014                                                  
             Jan         Feb         Mar         Apr         May
2010                           1.0000000  -2.8398178  -1.1772875
2011   2.7799700   8.5013429   2.2843828  -5.5992647 -12.9792034
2012   5.1638647   2.3143578 -12.9480828   1.0864811  -2.7120285
2013  -0.9953678 -10.3948518  -0.6314469  -0.3196474   0.4570921
2014 -13.0538878   5.5157105  15.4601410                        
             Jun         Jul         Aug         Sep         Oct
2010   1.1744144   7.0768531   4.6068836  -1.5242693  -1.8048846
2011   4.0278940   4.6470171   5.1518996   7.4767133   3.3711296
2012  -2.8222960  -9.2740888   1.0968231   9.9592793  15.2629881
2013  11.6282120  10.2369407   1.1946202  -0.3369278   8.8284007
2014                                                            
             Nov         Dec
2010  -0.3347581   2.4942974
2011  -0.3258617   2.6189584
2012  -1.1175503   0.1303086
2013   4.7840752 -12.9565837
2014                        

Autocorrelations of series 'mytsdata', by lag

0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 
 1.000  0.859  0.708  0.609  0.519  0.405  0.337  0.325  0.314  0.278 
0.8333 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 
 0.245  0.223  0.186  0.156  0.122  0.049 -0.016 

5.4.3 Trend


Call:
lm(formula = mytsdata ~ time(mytsdata))

Residuals:
     Min       1Q   Median       3Q      Max 
-17.0685  -5.6420   0.6023   6.3084  16.8171 

Coefficients:
                  Estimate  Std. Error t value Pr(>|t|)    
(Intercept)    -23966.4357   1958.8331  -12.23 2.30e-16 ***
time(mytsdata)     11.9230      0.9735   12.25 2.22e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.278 on 48 degrees of freedom
Multiple R-squared:  0.7576,    Adjusted R-squared:  0.7525 
F-statistic:   150 on 1 and 48 DF,  p-value: 2.221e-16
Analysis of Variance Table

Response: mytsdata
               Df  Sum Sq Mean Sq F value    Pr(>F)    
time(mytsdata)  1 10279.2 10279.2     150 2.221e-16 ***
Residuals      48  3289.4    68.5                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


    Shapiro-Wilk normality test

data:  linearMod$residuals
W = 0.98465, p-value = 0.7566

5.4.4 Regression with Lagged Time Series


Time series regression with "ts" data:
Start = 2010(3), End = 2014(3)

Call:
dynlm(formula = mytsdata ~ L(mytsdata, 1))

Residuals:
     Min       1Q   Median       3Q      Max 
-14.5796  -3.5823   0.0598   3.2617  15.4817 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     2.72080    1.68965    1.61    0.114    
L(mytsdata, 1)  0.93266    0.05967   15.63   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.675 on 47 degrees of freedom
Multiple R-squared:  0.8387,    Adjusted R-squared:  0.8352 
F-statistic: 244.3 on 1 and 47 DF,  p-value: < 2.2e-16

5.4.5 Seasonality

5.4.6 Making Stationary


    Augmented Dickey-Fuller Test

data:  mytsdata
Dickey-Fuller = -3.3154, Lag order = 3, p-value = 0.07932
alternative hypothesis: stationary
[1] 0
[1] 1

    Augmented Dickey-Fuller Test

data:  stationaryTS
Dickey-Fuller = -3.9373, Lag order = 3, p-value = 0.01973
alternative hypothesis: stationary

5.4.7 Forecast

Series: ts.stl$time.series[, 3] 
ARIMA(2,0,0) with zero mean 

Coefficients:
         ar1      ar2
      0.8735  -0.6074
s.e.  0.1167   0.1158

sigma^2 estimated as 18.39:  log likelihood=-143.36
AIC=292.72   AICc=293.24   BIC=298.45

Training set error measures:
                    ME     RMSE      MAE      MPE    MAPE      MASE
Training set 0.1290637 4.201792 3.438581 124.7528 161.216 0.4287052
                   ACF1
Training set 0.03800343

5.4.8 Entir Process

The data are clearly non-stationary, as the series wanders up and down for long periods. Consequently, we will take a first difference of the data.

The PACF shown is suggestive of an AR(2) model. So an initial candidate model is an ARIMA(2,1,0). There are no other obvious candidate models.

Series: mytsdatadj 
ARIMA(2,1,0) 

Coefficients:
         ar1      ar2
      0.3716  -0.4716
s.e.  0.1341   0.1319

sigma^2 estimated as 28.68:  log likelihood=-151.02
AIC=308.04   AICc=308.57   BIC=313.71


    Ljung-Box test

data:  Residuals from ARIMA(2,1,0)
Q* = 7.5718, df = 8, p-value = 0.4764

Model df: 2.   Total lags used: 10

5.4.9 Seasonal Package

5.5 Examples

5.5.1 White Noise

Smoothing

5.5.2 Trend and White Noise

5.5.3 Autoregresive

5.5.4 Seasonality and White Noise

5.5.5 Cross-correlation


Attaching package: 'astsa'
The following object is masked from 'package:seasonal':

    unemp
The following object is masked from 'package:forecast':

    gas

5.6 Real Life Examples

data(jj)
plot(jj, type="o", ylab="Quarterly Earnings per Share")

jjdata<-data.frame(Y=as.matrix(jj), date=as.Date(time(jj)))
jjdata %>%
plot_ly(x = ~date, y = ~Y, mode = 'lines+markers', type = "scatter", name="Confirm", width = 850, height = 750)

library(dygraphs)
dygraph(jj)%>% 
  dyOptions(drawPoints = TRUE, pointSize = 4)%>%
  dyHighlight(highlightSeriesOpts = list(strokeWidth = 3))%>% 
  dyRangeSelector()

dygraph(globtemp)%>% 
  dyOptions(drawPoints = TRUE, pointSize = 4)%>%
  dyHighlight(highlightSeriesOpts = list(strokeWidth = 3))%>% 
  dyRangeSelector()

dygraph(speech)%>% 
  dyOptions(drawPoints = TRUE, pointSize = 4)%>%
  dyHighlight(highlightSeriesOpts = list(strokeWidth = 3))%>% 
  dyRangeSelector()

#as.xts(AirPassengers)
dygraph(AirPassengers)%>% 
  dyOptions(drawPoints = TRUE, pointSize = 4)%>%
  dyHighlight(highlightSeriesOpts = list(strokeWidth = 3))%>% 
  dyRangeSelector()

lungDeaths <- cbind(mdeaths, fdeaths)
dygraph(lungDeaths)%>% 
  dyOptions(drawPoints = TRUE, pointSize = 4)%>%
  dyHighlight(highlightSeriesOpts = list(strokeWidth = 3))%>% 
  dyRangeSelector()

SoiRec <- cbind(soi, rec) # Southern Oscillation Index and Recruitment
dy_graph <- list(dygraph(soi)%>% 
  dyOptions(drawPoints = TRUE, pointSize = 4)%>%
  dyHighlight(highlightSeriesOpts = list(strokeWidth = 3))%>% 
  dyRangeSelector(),
  dygraph(rec)%>% 
  dyOptions(drawPoints = TRUE, pointSize = 4)%>%
  dyHighlight(highlightSeriesOpts = list(strokeWidth = 3))%>% 
  dyRangeSelector())
htmltools::browsable(htmltools::tagList(dy_graph))

SoiRecdata<-data.frame(as.matrix(SoiRec), date=as.Date(time(SoiRec)))
subplot(SoiRecdata %>%
plot_ly(x = ~date, y = ~soi, mode = 'lines+markers', type = "scatter", name="soi"),
SoiRecdata %>%
plot_ly(x = ~date, y = ~rec, mode = 'lines+markers', type = "scatter", name="rec"), nrows = 2, shareX = TRUE)

dy_graph <- list(
dygraph(fmri1[,2:5])%>% 
  dyOptions(drawPoints = TRUE, pointSize = 4)%>%
  dyHighlight(highlightSeriesOpts = list(strokeWidth = 3))%>% 
  dyRangeSelector(),

dygraph(fmri1[,6:9])%>% 
  dyOptions(drawPoints = TRUE, pointSize = 4)%>%
  dyHighlight(highlightSeriesOpts = list(strokeWidth = 3))%>% 
  dyRangeSelector())
htmltools::browsable(htmltools::tagList(dy_graph))

5.7 Exercise

Response mytsX :

Call:
lm(formula = mytsX ~ time(mytsdata))

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5583 -0.6055  0.1441  0.8457  2.5433 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    426.38967   32.54121   13.10   <2e-16 ***
time(mytsdata)  -0.21471    0.01633  -13.15   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.179 on 98 degrees of freedom
Multiple R-squared:  0.6382,    Adjusted R-squared:  0.6345 
F-statistic: 172.9 on 1 and 98 DF,  p-value: < 2.2e-16


Response mytsY :

Call:
lm(formula = mytsY ~ time(mytsdata))

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5659 -1.0276 -0.2509  1.0234  3.0951 

Coefficients:
                  Estimate  Std. Error t value Pr(>|t|)    
(Intercept)    -1189.71594    39.87865  -29.83   <2e-16 ***
time(mytsdata)     0.60070     0.02001   30.02   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.444 on 98 degrees of freedom
Multiple R-squared:  0.9019,    Adjusted R-squared:  0.9009 
F-statistic: 900.9 on 1 and 98 DF,  p-value: < 2.2e-16
Analysis of Variance Table

               Df  Pillai approx F num Df den Df    Pr(>F)    
(Intercept)     1 0.96816  1474.56      2     97 < 2.2e-16 ***
time(mytsdata)  1 0.92730   618.61      2     97 < 2.2e-16 ***
Residuals      98                                             
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


    Augmented Dickey-Fuller Test

data:  mytsdata[, 1]
Dickey-Fuller = -3.4149, Lag order = 4, p-value = 0.05601
alternative hypothesis: stationary
[1] 0
[1] 1

    Augmented Dickey-Fuller Test

data:  stationaryTS
Dickey-Fuller = -5.7598, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary

    Augmented Dickey-Fuller Test

data:  mytsdata[, 2]
Dickey-Fuller = -2.801, Lag order = 4, p-value = 0.2448
alternative hypothesis: stationary
[1] 0
[1] 1

    Augmented Dickey-Fuller Test

data:  stationaryTS
Dickey-Fuller = -5.3845, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary

The data are clearly non-stationary, as the series wanders up and down for long periods. Consequently, we will take a first difference of the data.

The PACF shown is suggestive of an AR(1) model. So an initial candidate model is an ARIMA(1,1,0). There are no other obvious candidate models.

Series: mytsdatadj 
ARIMA(1,1,0) 

Coefficients:
          ar1
      -0.5128
s.e.   0.0861

sigma^2 estimated as 1.542:  log likelihood=-161.56
AIC=327.12   AICc=327.24   BIC=332.31


    Ljung-Box test

data:  Residuals from ARIMA(1,1,0)
Q* = 12.871, df = 7, p-value = 0.07533

Model df: 1.   Total lags used: 8

Series: mytsdatadj 
ARIMA(0,1,1) with drift 

Coefficients:
          ma1    drift
      -0.7453  -0.0545
s.e.   0.1093   0.0305

sigma^2 estimated as 1.362:  log likelihood=-155.16
AIC=316.33   AICc=316.58   BIC=324.11


    Ljung-Box test

data:  Residuals from ARIMA(0,1,1) with drift
Q* = 8.1736, df = 6, p-value = 0.2257

Model df: 2.   Total lags used: 8

6 Some Code for Exercise

[1] 67.159
 [1] 253.80 748.08 123.19 143.74  68.08  72.23 122.97 252.78   8.36  17.23
[1] 181.0443
[1] 13.45527
[1] 72.848
[1] 104.90 469.24 281.84 312.51   6.56
[1] 293.7633
[1] 17.13953
[1] -3.000000e-01 -2.000000e-01 -1.000000e-01  5.551115e-17  1.000000e-01
[6]  2.000000e-01  3.000000e-01
[1] 0
[1] 0.04666667
[1] 1
[1] 0.04666667
[1] 0
[1] 4.666667
[1] 1
[1] 4.666667