Part I
2.11a)
library(astsa)
## Warning: package 'astsa' was built under R version 4.1.3
w=rnorm(500)
acf1(w,20)
## [1] 0.05 -0.04 0.06 -0.05 0.06 -0.04 -0.01 -0.02 0.02 -0.06 -0.03 0.06
## [13] 0.06 0.04 0.08 -0.01 -0.04 0.09 -0.01 -0.05
mean(acf1(w,20))
## [1] 0.008
sd(acf1(w,20))
## [1] 0.0497996
1/sqrt(500)
## [1] 0.04472136
The true ACF of white noise is zero at every nonzero lag, because observations at different times are uncorrelated. The expected sample ACF value is therefore approximately zero for every lag h where h is nonzero. For large n, the sample ACF values are approximately Normal with mean zero and standard deviation 1/sqrt(n), and the output above is consistent with this: the 20 values have a mean of 0.008 and a standard deviation of about 0.050, close to 1/sqrt(500) = 0.045. Roughly 95% of the values should fall within two standard errors of zero (2/sqrt(500) is about 0.089), and here only 1 of the 20 values (0.09 at lag 18) reaches that limit.
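The two-standard-error check described above can be sketched in base R as follows. This is an illustrative sketch rather than the code behind the output above: it uses `stats::acf` instead of `astsa::acf1`, and the seed and the names `n`, `rho_hat`, `bound`, and `inside` are assumptions introduced here.

```r
# Sketch: compare the sample ACF of white noise with the +/- 2/sqrt(n) limit.
set.seed(1)                       # arbitrary seed (assumption), for reproducibility only
n <- 500
w <- rnorm(n)                     # Gaussian white noise
# stats::acf returns lag 0 first; drop it to keep lags 1..20
rho_hat <- as.numeric(acf(w, lag.max = 20, plot = FALSE)$acf)[-1]
bound <- 2 / sqrt(n)              # approximate two-standard-error limit
inside <- mean(abs(rho_hat) <= bound)   # share of lags inside the limit
print(round(bound, 4))
print(inside)
```

Because only 20 lags are examined, the share inside the limit can only move in steps of 5%, so a single run landing at 90% or 100% is still consistent with the nominal 95%.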
2.11b)
y=rnorm(50)
acf1(y,20)
## [1] -0.22 0.10 -0.19 0.09 -0.23 0.14 -0.11 0.08 -0.16 0.17 -0.09 0.11
## [13] -0.21 0.13 -0.10 0.10 -0.09 0.13 -0.15 0.15
mean(acf1(y,20))
## [1] -0.0175
sd(acf1(y,20))
## [1] 0.1474654
1/sqrt(50)
## [1] 0.1414214
With the smaller sample size the results are similar: the sample ACF values are centered near zero and their standard deviation is about 1/sqrt(n). The main difference is that the smaller sample produces a larger standard deviation; the sample SD for n=50 (0.147) is about 3 times the sample SD for n=500 (0.050), in line with the theoretical ratio sqrt(500/50) = sqrt(10), or about 3.16.
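The comparison of the two standard deviations can be checked with a few lines of arithmetic. This is a minimal sketch; the helper `se` is a hypothetical name introduced here, and the two sample SDs are copied from the output above.

```r
# Sketch: large-sample standard error of the white-noise sample ACF is 1/sqrt(n).
se <- function(n) 1 / sqrt(n)            # hypothetical helper, not from the original code
ratio_theory <- se(50) / se(500)         # equals sqrt(500/50) = sqrt(10)
ratio_sample <- 0.1474654 / 0.0497996    # sample SDs reported in 2.11b and 2.11a
print(c(theory = ratio_theory, sample = ratio_sample))
```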
2.12a)
v=filter(w, sides=2, filter=rep(1/3,3))
acf1(v,20)
## [1] 0.69 0.33 0.03 0.00 0.02 -0.03 -0.02 -0.05 -0.05 -0.06 0.00 0.08
## [13] 0.14 0.14 0.10 0.06 0.04 0.04 0.01 -0.01
mean(acf1(v,20))
## [1] 0.073
The sample ACF for the moving average is highly significant at lag 1 and lag 2, which is expected because the autocovariance function of the three-point moving average is nonzero when |h| is less than or equal to 2. The theoretical autocorrelations for h=1 and h=2 are 2/3 and 1/3 respectively, which are very close to the sample values of 0.69 and 0.33. The theoretical autocorrelation at all larger lags is zero, and the remaining sample ACF values do appear to be centered near zero and mostly within two standard errors of zero. The graph does appear to show some periodicity.
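The theoretical values 2/3 and 1/3 quoted above follow from the filter weights alone. The sketch below computes them directly; the names `psi`, `gamma`, and `rho` are introduced here for illustration.

```r
# Sketch: theoretical ACF of v_t = (w_{t-1} + w_t + w_{t+1}) / 3 for white noise w_t.
psi <- rep(1/3, 3)                         # filter weights, as in filter=rep(1/3,3)
# Autocovariance at lag h, in units of the noise variance: sum_j psi_j * psi_{j+h}
gamma <- sapply(0:4, function(h) {
  if (h >= length(psi)) return(0)          # no overlapping weights beyond lag 2
  sum(psi[1:(length(psi) - h)] * psi[(1 + h):length(psi)])
})
rho <- gamma / gamma[1]                    # normalise by the lag-0 value
print(round(rho, 3))
```

The overlap count of the weights falls from 3 to 2 to 1 to 0 as the lag increases, which is why the autocorrelation drops linearly and is exactly zero from lag 3 onward.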
2.12b)
y=rnorm(50)
v=filter(y, sides=2, filter=rep(1/3,3))
acf1(v,20)
## [1] 0.72 0.52 0.29 0.26 0.17 0.05 0.06 0.08 0.05 -0.01 -0.08 -0.08
## [13] -0.09 -0.18 -0.23 -0.27 -0.20 -0.21 -0.30 -0.41
mean(acf1(v,20))
## [1] 0.007
Just as in the white noise problem, the smaller sample size produces broadly similar results: the mean of the sample ACF values is still near zero and there still appears to be some periodicity. The smaller sample does produce sample ACF values that are further from the theoretical values at lag 1 and lag 2 (0.72 and 0.52 against 2/3 and 1/3). There does not seem to be much difference in the sample standard deviations.
2.13)
## [1] 0.82 0.48 0.10 -0.20 -0.36 -0.38 -0.29 -0.16 -0.01 0.09 0.13 0.13
## [13] 0.08 0.00 -0.10 -0.18 -0.23 -0.25 -0.22 -0.15 -0.04 0.07 0.16 0.21
## [25] 0.22 0.19 0.12 0.04 -0.04 -0.11 -0.19 -0.23 -0.21 -0.15 -0.04 0.07
## [37] 0.18 0.24 0.23 0.13 0.00 -0.12 -0.20 -0.22 -0.17 -0.06 0.06 0.16
## [49] 0.22 0.20
## [1] 4e-04
The sample ACF for the AR model shows clear periodicity. The values at lag 1 and lag 2 are highly significant, which is expected since x(t-1) and x(t-2) are used to define x(t). The values oscillate between positive and negative correlation as the lag increases: there is typically a strong negative correlation between values that are 6 lags apart, and a weaker but still significant positive correlation between values that are 12-14 lags apart. Beyond this, the exact period of the cycles seems to vary greatly from sample to sample. As expected, far more than 5% of the ACF values lie more than two standard errors from zero, since this is no longer a white noise series.
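The oscillation described above can be related to the roots of the AR polynomial. The sketch below assumes the model is x(t) = 1.5 x(t-1) - 0.75 x(t-2) + w(t); these coefficients are not stated in this write-up and are an assumption made here for illustration, as are the names `phi`, `roots`, `period`, and `rho`.

```r
# Sketch: why an AR(2) with complex roots has a cyclic ACF.
# ASSUMPTION: x_t = 1.5 x_{t-1} - 0.75 x_{t-2} + w_t (coefficients not given in the text).
phi <- c(1.5, -0.75)
# Roots of the AR polynomial 1 - 1.5 z + 0.75 z^2 (coefficients in increasing order)
roots <- polyroot(c(1, -phi))
period <- 2 * pi / abs(Arg(roots[1]))      # pseudo-period implied by the complex roots
rho <- ARMAacf(ar = phi, lag.max = 12)     # theoretical ACF, named by lags "0".."12"
print(period)
print(round(rho[c("1", "6", "12")], 2))
```

Under this assumed model the theoretical ACF is negative near half the pseudo-period and positive again near the full pseudo-period, which is the same qualitative pattern seen in the sample ACF.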
2.14)
## [1] 0.99 0.97 0.93 0.87 0.81 0.73 0.64 0.54 0.43 0.31 0.19 0.07
## [13] -0.05 -0.17 -0.29 -0.40 -0.51 -0.61 -0.69 -0.77 -0.83 -0.88 -0.92 -0.94
## [25] -0.95 -0.94 -0.92 -0.88 -0.83 -0.77 -0.69 -0.61 -0.51 -0.41 -0.30 -0.19
## [37] -0.07 0.05 0.16 0.27 0.38 0.48 0.57 0.66 0.73 0.79 0.84 0.87
## [49] 0.89 0.90 0.89 0.87 0.84 0.79 0.73 0.66 0.57 0.48 0.39 0.28
## [61] 0.18 0.07 -0.04 -0.15 -0.26 -0.36 -0.45 -0.54 -0.62 -0.69 -0.75 -0.79
## [73] -0.82 -0.84 -0.85 -0.84 -0.82 -0.79 -0.74 -0.69 -0.62 -0.54 -0.46 -0.36
## [85] -0.27 -0.17 -0.06 0.04 0.14 0.24 0.34 0.43 0.51 0.58 0.65 0.70
## [97] 0.75 0.78 0.79 0.80
## [1] 0.68 0.63 0.61 0.57 0.52 0.46 0.41 0.37 0.28 0.20 0.11 0.04
## [13] -0.02 -0.11 -0.19 -0.25 -0.33 -0.38 -0.45 -0.51 -0.56 -0.57 -0.62 -0.62
## [25] -0.60 -0.60 -0.58 -0.58 -0.54 -0.50 -0.44 -0.37 -0.31 -0.28 -0.19 -0.11
## [37] -0.03 0.06 0.12 0.18 0.27 0.31 0.35 0.42 0.46 0.50 0.52 0.57
## [49] 0.56 0.57 0.58 0.56 0.54 0.52 0.47 0.41 0.36 0.34 0.26 0.17
## [61] 0.09 0.03 -0.04 -0.09 -0.16 -0.24 -0.29 -0.36 -0.41 -0.46 -0.49 -0.53
## [73] -0.53 -0.54 -0.56 -0.58 -0.53 -0.52 -0.47 -0.42 -0.40 -0.37 -0.28 -0.22
## [85] -0.17 -0.11 -0.06 0.03 0.11 0.16 0.24 0.29 0.34 0.39 0.41 0.48
## [97] 0.48 0.49 0.50 0.51
## [1] 0.14 0.07 0.07 0.06 0.03 0.01 0.03 0.08 0.03 0.01 -0.03 -0.02
## [13] 0.04 -0.02 -0.03 -0.01 -0.05 -0.02 -0.06 -0.07 -0.10 -0.06 -0.13 -0.08
## [25] -0.02 -0.02 -0.02 -0.10 -0.08 -0.06 -0.02 0.00 0.00 -0.09 -0.04 0.01
## [37] 0.01 0.07 0.05 0.02 0.09 0.02 -0.03 0.04 0.02 0.03 0.01 0.08
## [49] 0.01 0.03 0.07 0.06 0.05 0.09 0.06 0.01 0.02 0.11 0.06 -0.02
## [61] -0.05 -0.03 -0.04 0.02 0.01 -0.03 -0.03 -0.07 -0.07 -0.09 -0.07 -0.08
## [73] -0.03 -0.04 -0.07 -0.13 -0.04 -0.08 -0.02 0.03 -0.03 -0.07 0.01 0.02
## [85] -0.01 -0.02 -0.04 0.01 0.06 0.03 0.07 0.06 0.05 0.05 0.02 0.11
## [97] 0.04 0.02 0.01 0.03
In the ACF plots for the series with no white noise and with white noise of standard deviation 1, the cyclic nature of the series is very clear. It is easy to see that the data have a period of 50: values that are 50 units apart show high positive correlation and values that are 25 units apart show high negative correlation. Adding white noise with SD=1 produces smaller sample ACF values, but the pattern is still very clear. In the sample with white noise where SD=5, the white noise has substantially altered the ACF values. Since the signal has an amplitude of only 2, the high variability of the white noise masks much of the cyclic pattern that is so clear in the other sample ACF plots. Although the pattern of positive and negative correlations generally remains, it is more difficult to discern the exact period of the series.
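The effect of the noise level on the ACF can be sketched as follows. The amplitude of 2, the period of 50, and the noise SDs of 0, 1, and 5 come from the discussion above; the series length n = 500, the phase 0.6*pi, the seed, and all variable names are assumptions made here for illustration. The sketch uses `stats::acf` rather than `astsa::acf1`.

```r
# Sketch: ACF of a cosine signal with added white noise.
# ASSUMPTIONS: n = 500 and phase 0.6*pi are illustrative choices, not taken from the text.
set.seed(1)                                   # arbitrary seed (assumption)
n <- 500
signal <- 2 * cos(2 * pi * (1:n) / 50 + 0.6 * pi)
sds <- c(0, 1, 5)
# Sample ACF at lags 25 and 50 for each noise level
acf_at <- sapply(sds, function(s) {
  x <- signal + rnorm(n, sd = s)
  r <- as.numeric(acf(x, lag.max = 50, plot = FALSE)$acf)
  c(lag25 = r[26], lag50 = r[51])             # r[1] is lag 0
})
colnames(acf_at) <- paste0("sd=", sds)
# Share of total variance due to the signal: (A^2/2) / (A^2/2 + sd^2) with A = 2
shrink <- 2 / (2 + sds^2)
print(round(acf_at, 2))
print(round(shrink, 3))
```

The `shrink` factor gives a rough sense of how much the cyclic ACF is scaled down: a cosine of amplitude 2 has variance 2, so noise with variance 25 leaves the signal with only a small share of the total variance.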
2.15) The calculations are included in the attached handwritten page.
Part II
IIa)
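set.seed(14203)        # fix the seed so the simulation is reproducible
w=rnorm(99)            # white noise increments for x
z=rnorm(99)            # independent white noise increments for y
x=c(0,cumsum(w))       # random walk of length 100 starting at x[1]=0
y=c(0,cumsum(z))       # random walk of length 100 starting at y[1]=0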
IIi)
There does not appear to be any correlation between x and y in the plot; the variables seem to be independent of each other. This is expected, since x and y were produced from two independent white noise series. It is interesting to note that when this plot is generated many times, several of the plots do appear to show a strong positive or negative correlation by random chance.
IIii)
We would expect to fail to reject the null hypothesis that the slope is zero. If x and y are truly independent of each other, then the expected slope of the regression line is zero. We would expect this to hold roughly for the sample, so the hypothesis test should not produce a significant result.
IIiii)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.0803 -2.5631  0.3213  1.6945  7.3958
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.03191    0.57579  -5.266 8.27e-07 ***
## x           -0.04941    0.05810  -0.850    0.397
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.785 on 98 degrees of freedom
## Multiple R-squared: 0.007326, Adjusted R-squared: -0.002803
## F-statistic: 0.7232 on 1 and 98 DF, p-value: 0.3972
Fitting the linear model to these data gives a B1 estimate of -0.04941 with a standard error of 0.0581, which yields a t value of -0.850 and a p-value of 0.397. As expected, we fail to reject the null hypothesis; there is no convincing evidence that B1 is different from zero.
IIb) Proportion of 1000 trials that reject the null hypothesis:
## [1] 0.754
This does not support the expectation stated in part (ii). Based on what was stated above, we would expect that the null hypothesis would be rejected about 5% of the time by chance, but running 1000 simulations produces a rejection rate of roughly 75%.
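A simulation of this kind can be sketched as follows. This is not the code that produced the 0.754 above, which is not shown in this write-up; the seed, the 5% significance level applied inside the loop, and the names `n_trials`, `reject`, and `rate` are assumptions made here for illustration.

```r
# Sketch: how often does lm() reject slope = 0 for two independent random walks?
set.seed(1)                                  # arbitrary seed (assumption)
n_trials <- 1000
reject <- replicate(n_trials, {
  x <- c(0, cumsum(rnorm(99)))               # random walk, same construction as in IIa
  y <- c(0, cumsum(rnorm(99)))               # independent random walk
  p <- summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
  p < 0.05                                   # TRUE when the slope test rejects at the 5% level
})
rate <- mean(reject)                         # proportion of trials that reject
print(rate)
```

The exact proportion depends on the seed, but it should land far above the nominal 5% that would hold if the regression assumptions were satisfied.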
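Checking the conditions of the linear regression model, we see that the condition of independence is violated: each x term depends on the previous x term, and the same is true of y, so the observations (and hence the regression errors) are not independent. For a random walk the autocovariance between x(s) and x(t) grows with min(s,t), so it is nonzero for every pair of terms that does not involve the fixed starting value x(1)=0.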
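The lack of independence can be seen directly in the residuals of the Part IIa fit. The sketch below rebuilds x and y with the same seed and draw order as Part IIa and reports the lag-1 sample autocorrelation of the residuals; it uses `stats::acf`, and the names `res` and `r1` are introduced here for illustration.

```r
# Sketch: lag-1 autocorrelation of the residuals from the Part IIa regression.
set.seed(14203)                       # seed used in Part IIa
x <- c(0, cumsum(rnorm(99)))          # first 99 draws drive x, as in IIa
y <- c(0, cumsum(rnorm(99)))          # next 99 draws drive y, as in IIa
res <- resid(lm(y ~ x))
# stats::acf returns lag 0 first, so element 2 is the lag-1 autocorrelation
r1 <- as.numeric(acf(res, lag.max = 1, plot = FALSE)$acf)[2]
print(round(r1, 2))
```

If the errors were independent, this value would be expected to fall within about 2/sqrt(100) = 0.2 of zero.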
Looking at the regression diagnostic plots, we can also see that the condition of Normality may not be met: the QQ plot of the residuals deviates from the expected quantiles, especially for the values further from the mean. There is also some evidence of nonconstant variance, as the residuals decrease for the smaller half of the x values and then increase for the larger half of the x values.