model1 <- lm(Y ~ x1)
summary(model1)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40652.8077 367.18172 110.71577 1.865144e-61
## x1 -515.9282 12.05658 -42.79225 4.671062e-41
Answer: \(\hat{Y} = 4.0652808\times 10^{4} + -515.9282 \times x_1\)
summary(model1)$r.squared
## [1] 0.9734209
Answer: \(R^2 = 0.9734\)
This indicates that 97.34% of the variation in Y is explained by x₁.
plot(residuals(model1), type = "o", pch = 19, col = "blue",
main = "Residuals vs Time Order",
xlab = "Observation Number (Time Order)",
ylab = "Residuals")
abline(h = 0, col = "red", lty = 2, lwd = 2)
Problem: The residual plot shows a systematic pattern where residuals cluster together by sign (positive residuals followed by positive residuals, then negative residuals followed by negative residuals). This indicates autocorrelation (serial correlation) in the errors, which violates the independence assumption of linear regression.
Hypotheses:
runs_result <- runs.test(residuals(model1))
runs_result
##
## Runs Test
##
## data: residuals(model1)
## statistic = -6.7226, runs = 3, n1 = 26, n2 = 26, n = 52, p-value =
## 1.785e-11
## alternative hypothesis: nonrandomness
Test Statistic: \(Z = -6.7226\)
P-value: \(0.0000000000178465\)
Conclusion: Since the p-value (0.0000000000178465) is less than \(\alpha = 0.01\), we reject the null hypothesis at the 0.01 significance level. There is sufficient evidence to conclude that the residuals are not randomly distributed, indicating the presence of autocorrelation in the errors.
Hypotheses:
dw_result <- dwtest(model1, alternative = "two.sided")
dw_result
##
## Durbin-Watson test
##
## data: model1
## DW = 0.11683, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is not 0
Durbin-Watson Statistic: \(DW = 0.1168\)
P-value: \(0.000000000000000000000000000377435\)
Interpretation: The DW statistic of 0.1168 is much less than 2, indicating positive autocorrelation.
Conclusion: Since the p-value (0.000000000000000000000000000377435) is less than \(\alpha = 0.01\), we reject the null hypothesis at the 0.01 significance level. There is sufficient evidence to conclude that first-order autocorrelation exists in the residuals. The DW statistic indicates positive autocorrelation.
model2a <- lm(y ~ n)
summary(model2a)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.805191 2.04724607 -0.8817654 3.789728e-01
## n 1.060624 0.03519549 30.1352171 6.736090e-76
Answer: \(\hat{y} = -1.8052 + 1.0606 \times n\)
plot(fitted(model2a), rstandard(model2a), pch = 19, col = "blue",
main = "Standardized Residuals vs Fitted Values",
xlab = "Fitted Values",
ylab = "Standardized Residuals")
abline(h = 0, col = "red", lty = 2, lwd = 2)
abline(h = c(-2, 2), col = "orange", lty = 2)
Problem: The residual plot shows a funnel (megaphone) shape where the spread of residuals increases as fitted values increase. This indicates heteroscedasticity (non-constant variance), which violates the assumption of constant error variance (homoscedasticity) in linear regression.
model2c <- lm(sqrt(y) ~ n)
summary(model2c)
##
## Call:
## lm(formula = sqrt(y) ~ n)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.59428 -0.62398 0.04937 0.64028 2.03262
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.612745 0.138339 18.89 <2e-16 ***
## n 0.081519 0.002378 34.28 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9709 on 198 degrees of freedom
## Multiple R-squared: 0.8558, Adjusted R-squared: 0.8551
## F-statistic: 1175 on 1 and 198 DF, p-value: < 2.2e-16
The model has been fitted with \(\sqrt{y}\) as the response variable to stabilize the variance.
beta0_c <- coef(model2c)[1]
beta1_c <- coef(model2c)[2]
On transformed scale: \(\sqrt{\hat{y}} = 2.6127 + 0.0815 \times n\)
On original scale: \(\hat{y} = \left(2.6127 + 0.0815 \times n\right)^2\)
n_value <- 40
pred_sqrt <- predict(model2c, newdata = data.frame(n = n_value))
pred_original <- pred_sqrt^2
Calculation:
Step 1: Calculate on \(\sqrt{y}\)
scale
\(\sqrt{\hat{y}} = 2.6127 + 0.0815 \times 40 =
5.8735\)
Step 2: Square to get original scale
\(\hat{y} = \left(5.8735\right)^2 =
34.4983\)
Answer: When \(n = 40\), \(\hat{y} = 34.4983\)
model2f <- lm(log(y) ~ n + I(n^2))
summary(model2f)
##
## Call:
## lm(formula = log(y) ~ n + I(n^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4744 -0.1900 0.0220 0.2386 0.7751
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5104037 0.0768787 19.65 <2e-16 ***
## n 0.0674594 0.0035136 19.20 <2e-16 ***
## I(n^2) -0.0003820 0.0000337 -11.34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3552 on 197 degrees of freedom
## Multiple R-squared: 0.8619, Adjusted R-squared: 0.8605
## F-statistic: 614.8 on 2 and 197 DF, p-value: < 2.2e-16
A polynomial regression model with log-transformed response has been fitted: \(\log(y) = \beta_0 + \beta_1 n + \beta_2 n^2 + \epsilon\)
beta0_f <- coef(model2f)[1]
beta1_f <- coef(model2f)[2]
beta2_f <- coef(model2f)[3]
On log scale:
\(\log(\hat{y}) = 1.510404 + 0.067459 \times n
+ -3.82\times 10^{-4} \times n^2\)
On original scale:
\(\hat{y} = \exp\left(1.510404 + 0.067459
\times n + -3.82\times 10^{-4} \times n^2\right)\)
Or equivalently:
\(\hat{y} = e^{1.510404 + 0.067459 \times n +
-3.82\times 10^{-4} \times n^2}\)
# Load data and fit model
load("dataQ1.RData")
model1 <- lm(Y ~ x1)
# 1(a) & 1(b)
summary(model1)
# 1(c) Residual plot
plot(residuals(model1), type="o")
abline(h=0, col="red", lty=2)
# 1(d) Runs test
library(randtests)
runs.test(residuals(model1))
# 1(e) Durbin-Watson test
library(lmtest)
dwtest(model1, alternative="two.sided")
# Load data and fit model
load("dataQ2.RData")
model2a <- lm(y ~ n)
# 2(a) & 2(b)
summary(model2a)
plot(fitted(model2a), rstandard(model2a))
abline(h=0, col="red", lty=2)
# 2(c) & 2(d) Square-root transformation
model2c <- lm(sqrt(y) ~ n)
summary(model2c)
# 2(e) Prediction at n=40
predict(model2c, newdata=data.frame(n=40))^2
# 2(f) & 2(g) Log polynomial
model2f <- lm(log(y) ~ n + I(n^2))
summary(model2f)