Problem 2.1
Solution:
From Sect. R.4, we can use the following R code:
library(astsa)
trend = time(jj) - 1970 #helps to 'center' time
Q = factor(cycle(jj) ) #make (Q)uarter factors
reg1 = lm(log(jj)~0 + trend + Q, na.action=NULL) #no intercept
summary(reg1)
##
## Call:
## lm(formula = log(jj) ~ 0 + trend + Q, na.action = NULL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29318 -0.09062 -0.01180 0.08460 0.27644
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## trend 0.167172 0.002259 74.00 <2e-16 ***
## Q1 1.052793 0.027359 38.48 <2e-16 ***
## Q2 1.080916 0.027365 39.50 <2e-16 ***
## Q3 1.151024 0.027383 42.03 <2e-16 ***
## Q4 0.882266 0.027412 32.19 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1254 on 79 degrees of freedom
## Multiple R-squared: 0.9935, Adjusted R-squared: 0.9931
## F-statistic: 2407 on 5 and 79 DF, p-value: < 2.2e-16
With the above result, we can identify the fitted model as \(\small \hat{x}_t = 0.1672\,t + 1.0528\,Q_1(t) + 1.0809\,Q_2(t) + 1.1510\,Q_3(t) + 0.8823\,Q_4(t)\), where \(\small x_t\) is the logged earnings, \(\small t\) is time centered at 1970, and \(\small Q_i(t)\) is the indicator for quarter \(\small i\). The residual standard error is 0.1254 on 79 degrees of freedom.
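As a quick optional check, the estimates used above can be read directly off the fit:
round(coef(reg1), 4) #estimated trend and quarter coefficients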
Solution:
Based on the results generated in (a), the estimated average annual increase in the logged earnings per share is the trend coefficient, since time is measured in years: \(\small \hat{\beta} = 0.167172\). That is, the logged earnings increase by about 0.167 per year, which corresponds to a multiplicative growth factor of \(\small e^{0.167172} \approx 1.18\) per year on the original scale.
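A minimal sketch of reading this value off the fit, together with an optional 95% confidence interval:
coef(reg1)["trend"] #estimated annual increase in the logged earnings
confint(reg1, "trend") #optional 95% confidence interval for the trend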
Solution:
Assuming the model in (a) is correct, the change in the average logged earnings from the third quarter to the fourth quarter can be computed as \[\small \alpha_4 - \alpha_3 = 0.882266 - 1.151024 = -0.268758\] In terms of a ratio or percentage relative to the third quarter, this is
\[\small \frac{\alpha_4 - \alpha_3}{\alpha_3} \times 100\% = \frac{0.882266 - 1.151024}{1.151024} \times 100\% = -23.3495\%\]
Therefore, the average logged earnings decrease from the third quarter to the fourth quarter by 0.268758, or about 23.35%.
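The same numbers can be reproduced from the stored fit; a small sketch using the coefficient names shown in the summary above:
a = coef(reg1)
unname(a["Q4"] - a["Q3"]) #change in average logged earnings, Q3 to Q4
unname(100*(a["Q4"] - a["Q3"])/a["Q3"]) #as a percentage of the Q3 level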
Solution:
By considering an intercept term in the model in (a), we can write the following R code:
reg2 = lm(log(jj)~trend + Q, na.action=NULL) #with intercept
summary(reg2)
##
## Call:
## lm(formula = log(jj) ~ trend + Q, na.action = NULL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29318 -0.09062 -0.01180 0.08460 0.27644
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.052793 0.027359 38.480 < 2e-16 ***
## trend 0.167172 0.002259 73.999 < 2e-16 ***
## Q2 0.028123 0.038696 0.727 0.4695
## Q3 0.098231 0.038708 2.538 0.0131 *
## Q4 -0.170527 0.038729 -4.403 3.31e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1254 on 79 degrees of freedom
## Multiple R-squared: 0.9859, Adjusted R-squared: 0.9852
## F-statistic: 1379 on 4 and 79 DF, p-value: < 2.2e-16
With the above result, we can observe that R drops \(\small Q_1\) to avoid collinearity with the intercept: the intercept now plays the role of the first-quarter effect, and the \(\small Q_2\), \(\small Q_3\), and \(\small Q_4\) coefficients become differences from the first quarter rather than quarter effects in their own right. This parameterization is less convenient here, since our aim is to evaluate the effect of each quarter separately.
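To confirm that the two parameterizations describe the same fit, a quick optional check recovers the per-quarter effects of reg1 from reg2 (using the coefficient names shown above):
coef(reg2)["(Intercept)"] + c(Q1=0, coef(reg2)[c("Q2","Q3","Q4")]) #matches the Q1-Q4 estimates from reg1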
Solution:
To graph the data, the fitted values and the residuals, the following R code can be used:
par(mfrow=c(1,2))
plot(log(jj), main="Plot of data (R) & fitted value (B)", col="#990000") #data
lines(fitted(reg1), col="#003399") #fitted
plot(log(jj) - fitted(reg1), main="Plot of residuals", col="#10710a")
Looking at the graphs above, we can observe that the residuals do not appear to follow any pattern; they look white, and the fit seems quite good.
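One further optional check on whiteness is the sample ACF of the residuals; if the noise is white, the sample autocorrelations should mostly fall within the confidence bands:
acf(resid(reg1), 20) #sample ACF of the regression residuals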
Problem 2.2
Solution:
We can extend the regression model \(\small M_t = \beta_0 + \beta_1 t + \beta_2 (T_t - \bar{T}) + \beta_3 (T_t - \bar{T})^2 + \beta_4 P_t + w_t\) presented in Example 2.2, where \(\small \bar{T}\) is the mean temperature, by adding another component that accounts for the particulate count four weeks prior. That is, we add \(\small P_{t-4}\) to the regression model. Hence, we will have \[\small M_t = \beta_0 + \beta_1 t + \beta_2 (T_t - \bar{T}) + \beta_3 (T_t - \bar{T})^2 + \beta_4 P_t + \beta_5 P_{t-4} + w_t\] The estimated parameters \(\small (\beta_0 ,\beta_1 ,\beta_2 ,\beta_3 ,\beta_4 ,\beta_5)\) can be obtained using the following R code:
n = length(tempr)
temp1 = tempr - mean(tempr) #center temperature
temp2 = temp1^2
trend = time(cmort) #time
fit1 = lm(cmort~trend + temp1 + temp2 + part, na.action=NULL)
fit2 = lm(cmort[5:n]~trend[5:n] + temp1[5:n] + temp2[5:n] + part[5:n]
+ part[1:(n-4)], na.action=NULL)
summary(fit2)
##
## Call:
## lm(formula = cmort[5:n] ~ trend[5:n] + temp1[5:n] + temp2[5:n] +
## part[5:n] + part[1:(n - 4)], na.action = NULL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.228 -4.314 -0.614 3.713 27.800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.808e+03 1.989e+02 14.123 < 2e-16 ***
## trend[5:n] -1.385e+00 1.006e-01 -13.765 < 2e-16 ***
## temp1[5:n] -4.058e-01 3.528e-02 -11.503 < 2e-16 ***
## temp2[5:n] 2.155e-02 2.803e-03 7.688 8.02e-14 ***
## part[5:n] 2.029e-01 2.266e-02 8.954 < 2e-16 ***
## part[1:(n - 4)] 1.030e-01 2.485e-02 4.147 3.96e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.287 on 498 degrees of freedom
## Multiple R-squared: 0.608, Adjusted R-squared: 0.6041
## F-statistic: 154.5 on 5 and 498 DF, p-value: < 2.2e-16
The summary of the fit above suggests that all predictors are statistically significant. The new fitted model can be written as \[\small \hat{M}_t = 2808 - 1.385\,t - 0.4058\,(T_t - \bar{T}) + 0.02155\,(T_t - \bar{T})^2 + 0.2029\,P_t + 0.1030\,P_{t-4}\]
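An alternative way to build the lagged regressor is to let ts.intersect align the series automatically; a minimal sketch under the same setup (dat, part4, and fit2b are names introduced here for illustration):
dat = ts.intersect(cmort, trend=time(cmort), temp1, temp2, part, part4=stats::lag(part,-4), dframe=TRUE)
fit2b = lm(cmort~trend + temp1 + temp2 + part + part4, data=dat, na.action=NULL) #same regression as fit2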
Solution:
We can use this R code to generate the required scatterplot matrix:
x = cbind('M_t'=cmort[5:n], 'T_t'=temp1[5:n], 'P_t'=part[5:n],
'P_t-4'=part[1:(n-4)])
pairs(x)
To calculate the pairwise correlations between the series, we can use the code below.
cor(x)
## M_t T_t P_t P_t-4
## M_t 1.0000000 -0.4369648 0.4422896 0.5209993
## T_t -0.4369648 1.0000000 -0.0148241 -0.3990848
## P_t 0.4422896 -0.0148241 1.0000000 0.5340505
## P_t-4 0.5209993 -0.3990848 0.5340505 1.0000000
As we can observe from the results above, the correlation between mortality (\(\small M_t\)) and the particulate count four weeks prior (\(\small P_{t-4}\)) is 0.5210, which is stronger than the correlation between mortality and the current particulate count (\(\small P_t\)) at 0.4423. Together with its significant regression coefficient in (a), this supports treating \(\small P_{t-4}\) as a relevant variable and including it in the regression.
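As an optional corroboration (not required by the problem), we can compare information criteria for the models with and without \(\small P_{t-4}\), refitting the smaller model on the same subsample so the criteria are comparable (fit1s is a name introduced here):
fit1s = lm(cmort[5:n]~trend[5:n] + temp1[5:n] + temp2[5:n] + part[5:n], na.action=NULL) #without P_{t-4}
c(AIC(fit1s), AIC(fit2)) #lower is better
c(BIC(fit1s), BIC(fit2))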
Problem 2.3
par(mfrow=c(2,2), mar=c(2.5,2.5,0,0)+.5, mgp=c(1.6,.6,0)) # set up
for (i in 1:4){
x = ts(cumsum(rnorm(100,.01,1))) # data
regx = lm(x~0+time(x), na.action=NULL) # regression
plot(x, ylab='Random Walk w Drift') # plots
abline(a=0, b=.01, col=2, lty=2) # true mean (red - dashed)
abline(regx, col=4) # fitted line (blue - solid)
}
Solution:
Using least squares in the R code below, we can plot the data, the true mean function and the fitted line.
set.seed(2)
n = 100
delta = 0.01
time = 1:n
par(mfrow=c(2,2), mar=c(2.5,2.5,0.5,0)+0.5, mgp=c(1.6,0.6,0)) #setup
for (i in 1:4){
w_a = rnorm(n, 0, 1)
x = c()
for (t in time){
x[t] = delta * t + sum(w_a[1:t]) #data: x_t = delta*t + w_1 + ... + w_t
}
mu_a = delta*time #true mean function
fit_a = lm(x~0 + time) #regression
min_ya = floor(min(x))-1; max_ya = ceiling(max(x)) #sets y-axis limit
plot(time, x, ylim=c(min_ya,max_ya), type="l", main=paste("Random Walk with Drift",i)) #plots
lines(time, fitted(fit_a), col="#003399") #fitted line (blue, solid)
lines(time, mu_a, col="#990000",lty=2) #true mean (red, dashed)
}
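As an aside, the inner loop over t can be replaced by a cumulative sum; a minimal equivalent sketch (to be used inside the outer loop, after w_a is drawn):
x = delta*(1:n) + cumsum(w_a) #x_t = delta*t + w_1 + ... + w_t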
Solution:
Using a similar approach to (a), but with the process \(\small y_t = 0.01t + w_t\), we can use the R code below to plot the data, the true mean function, and the fitted line.
set.seed(2)
n = 100
delta = 0.01
time = 1:n
par(mfrow=c(2,2), mar=c(2.5,2.5,0.5,0)+0.5, mgp=c(1.6,0.6,0)) #setup
for (i in 1:4){
w_b = rnorm(n, 0, 1)
y = delta * time + w_b #data
mu_b = delta*time #true mean function
fit_b = lm(y~0 + time) #regression
min_yb = floor(min(y)); max_yb = ceiling(max(y)) #sets y-axis limit
plot(time, y, ylim=c(min_yb,max_yb), type="l", main=paste("Linear Trend Plus Noise",i)) #plots
lines(time, fitted(fit_b), col="#003399") #fitted line (blue, solid)
lines(time, mu_b, col="#990000",lty=2) #true mean (red, dashed)
}
Solution:
The fitted line in part (b) stays much closer to the true mean function than in part (a). This is because the errors in \(\small y_t\) are independent, which satisfies a key assumption of linear regression, whereas the errors in \(\small x_t\) are correlated due to the accumulation of the white noise terms, so the random walk can wander far from its mean function and the least-squares slope is highly variable.
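A small simulation (an illustration added here, not part of the original solution) makes the difference concrete: across many replications, the least-squares slope varies far more under the random walk of (a) than under the trend-plus-noise model of (b):
set.seed(2)
tt = 1:100
slopes = replicate(1000, {
w = rnorm(100)
xa = 0.01*tt + cumsum(w) #random walk with drift, as in (a)
yb = 0.01*tt + rnorm(100) #linear trend plus noise, as in (b)
c(coef(lm(xa~0 + tt)), coef(lm(yb~0 + tt)))
})
apply(slopes, 1, sd) #spread of the fitted slopes about the true value 0.01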